* <https://bugs.libre-soc.org/show_bug.cgi?id=887> fmvis
* <https://bugs.libre-soc.org/show_bug.cgi?id=1015> int-fp RFC
* [[int_fp_mv/appendix]]
-* [[sv/rfc/ls002]] - `fmvis` and `fishmv` External RFC Formal Submission
+* [[sv/rfc/ls002.fmi]] - `fmvis` and `fishmv` External RFC Formal Submission
* [[sv/rfc/ls006]] - int-fp-mv External RFC Formal Submission
Trademarks:
--- /dev/null
+# RFC ls002 v2 Floating-Point Load-Immediate
+
+**URLs**:
+
+* <https://libre-soc.org/openpower/sv/int_fp_mv/#fmvis>
+* <https://libre-soc.org/openpower/sv/rfc/ls002/>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=944>
+* <https://git.openpower.foundation/isa/PowerISA/issues/87>
+
+**Severity**: Major
+
+**Status**: New
+
+**Date**: 05 Oct 2022
+
+**Target**: v3.2B
+
+**Source**: v3.0B
+
+**Books and Section affected**:
+
+```
+ Book I Scalar Floating-Point 4.6.2.1
+ Appendix E Power ISA sorted by opcode
+ Appendix F Power ISA sorted by version
+ Appendix G Power ISA sorted by Compliancy Subset
+ Appendix H Power ISA sorted by mnemonic
+```
+
+**Summary**
+
+```
+ Instructions added
+ fmvis - Floating-Point Move Immediate, Shifted
+ fishmv - Floating-Point Immediate, Second-half Move
+```
+
+**Submitter**: Luke Leighton (Libre-SOC)
+
+**Requester**: Libre-SOC
+
+**Impact on processor**:
+
+```
+ Addition of two new FPR-based instructions
+```
+
+**Impact on software**:
+
+```
+ Requires support for new instructions in assembler, debuggers,
+ and related tools.
+```
+
+**Keywords**:
+
+```
+ FPR, Floating-point, Load-immediate, BF16, bfloat16, BFP32
+```
+
+**Motivation**
+
+These instructions are similar to `lxvkq`, but load a bfloat16 immediate
+with one 32-bit instruction and a full FP32 immediate with two 32-bit
+instructions. They always save a Data Load and its associated L1 and TLB
+lookup. At present, even quickly clearing an FPR to zero needs a Load.
+
+**Notes and Observations**:
+
+1. There is no need for an Rc=1 variant because this is an immediate
+ loading instruction (an FPR equivalent to `li`)
+2. There is no need for Special Registers (FP Flags) because this
+ is an immediate loading instruction. No FPR Load Operations
+ alter `FPSCR`, neither does `lxvkq`, and on that basis neither
+ should these instructions.
+3. `fishmv` as a FRT-only Read-Modify-Write (instead of an unnecessary
+ FRT,FRA pair) saves five potential bits, making
+ the difference between a 5-bit XO (VA/DX-Form) and requiring an entire
+ Primary Opcode.
+
+**Changes**
+
+Add the following entries to:
+
+* the Appendices of Book I
+* Instructions of Book I as a new Section 4.6.2.1
+* DX-Form of Book I Section 1.6.1.6 and 1.6.2
+* Floating-Point Data Format of Book I Section 4.3.1
+
+----------------
+
+\newpage{}
+
+# Floating-Point Move Immediate
+
+`fmvis FRT, D`
+
+| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form |
+|--------|------|-------|-------|-------|-----|---------|
+| Major | FRT | d1 | d0 | XO | d2 | DX-Form |
+
+Pseudocode:
+
+```
+ bf16 <- d0 || d1 || d2 # create bfloat16 immediate
+ bfp32 <- bf16 || [0]*16 # convert bfloat16 to BFP32
+ FRT <- DOUBLE(bfp32) # convert BFP32 to BFP64
+```
+
+Special registers altered:
+
+ None
+
+The value `D << 16` is interpreted as a 32-bit float, converted to a
+64-bit float and written to `FRT`. This is equivalent to reinterpreting
+`D` as a `bfloat16` and converting to 64-bit float.
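
A minimal Python sketch of the behaviour described above, intended only as a software model and not as a reference implementation. The function name `fmvis` and its error handling are illustrative assumptions; the bit manipulation follows the pseudocode (`bf16 || [0]*16`, then widening to BFP64).

```python
import struct

def fmvis(d: int) -> float:
    """Software model of fmvis: treat the 16-bit immediate D as a bfloat16."""
    if not 0 <= d <= 0xFFFF:
        raise ValueError("D must fit in 16 bits")
    bfp32_bits = d << 16  # bf16 || [0]*16
    # Reinterpret the 32-bit pattern as an IEEE 754 single. Python floats
    # are doubles, so unpacking also performs the BFP32 -> BFP64 widening.
    (value,) = struct.unpack(">f", struct.pack(">I", bfp32_bits))
    return value

one = fmvis(0x3F80)             # +1.0
minus_1_5 = fmvis(0xBFC0)       # -1.5
almost_two = fmvis(0x3FFF)      # +1.9921875
```

Widening from single to double is exact, so the model loses no information for any of the 65536 possible immediates.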
+
+Examples:
+
+```
+ fmvis f4, 0 # writes +0.0 to f4 (clears an FPR)
+ fmvis f4, 0x8000 # writes -0.0 to f4
+ fmvis f4, 0x3F80 # writes +1.0 to f4
+ fmvis f4, 0xBFC0 # writes -1.5 to f4
+ fmvis f4, 0x7FC0 # writes +qNaN to f4
+ fmvis f4, 0x7F80 # writes +Infinity to f4
+ fmvis f4, 0xFF80 # writes -Infinity to f4
+ fmvis f4, 0x3FFF # writes +1.9921875 to f4
+```
+
+# Floating-Point Immediate Second-Half Move
+
+`fishmv FRT, D`
+
+DX-Form:
+
+| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form |
+|--------|------|-------|-------|-------|-----|---------|
+| Major | FRT | d1 | d0 | XO | d2 | DX-Form |
+
+Pseudocode:
+
+```
+ n <- (FRT) # read FRT
+ bfp32 <- SINGLE(n) # convert to BFP32
+ bfp32[16:31] <- d0 || d1 || d2 # replace LSB half
+ FRT <- DOUBLE(bfp32) # convert back to BFP64
+```
+
+Special registers altered:
+
+ None
+
+An additional 16 bits of immediate are
+inserted into the low-order half of the single-format value
+corresponding to the contents of FRT.
+
+**This instruction performs a Read-Modify-Write on FRT.**
+In hardware, `fishmv` may be macro-op-fused with `fmvis`.
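
The Read-Modify-Write can be modelled in Python as below. This is a hedged sketch: the names `single_bits` and `fishmv` are illustrative, and it assumes FRT already holds a value exactly representable as a BFP32 (as it does after `fmvis`), because `struct.pack(">f", ...)` rounds, which `SINGLE(n)` in the pseudocode does not.

```python
import struct

def single_bits(value: float) -> int:
    """Bit pattern of `value` as a BFP32 (stands in for SINGLE(n))."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def fishmv(frt: float, d: int) -> float:
    """Software model of fishmv: replace the low 16 bits of FRT's BFP32 form."""
    if not 0 <= d <= 0xFFFF:
        raise ValueError("D must fit in 16 bits")
    bfp32 = single_bits(frt)              # read FRT, convert to BFP32
    bfp32 = (bfp32 & 0xFFFF0000) | d      # replace the low-order half
    # Unpacking as a single and storing in a Python float widens to BFP64.
    return struct.unpack(">f", struct.pack(">I", bfp32))[0]

# f4 held +1.0 (for instance after "fmvis f4, 0x3F80"); now set the low half.
f4 = fishmv(1.0, 0x8000)
f4_bits = struct.pack(">d", f4).hex()
```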
+
+Programmer's note:
+The use of these two instructions is strategically similar to
+how `li` combined with `oris` may be used to construct 32-bit Integers.
+If a prior `fmvis` instruction has been used to set the upper 16 bits
+of a BFP32 value, `fishmv` may be used to set the lower 16 bits.
+
+Example:
+
+```
+ # these two combined instructions write 0x3f808000
+ # into f4 as a BFP32 to be converted to a BFP64.
+ # actual contents in f4 after conversion: 0x3ff0_1000_0000_0000
+ # first the upper bits, happens to be +1.0
+ fmvis f4, 0x3F80 # writes +1.0 to f4
+ # now write the lower 16 bits of a BFP32
+ fishmv f4, 0x8000 # writes +1.00390625 to f4
+```
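
As an illustration of how a toolchain might pick the two immediates, the sketch below splits a BFP32 constant into its upper and lower halves. The helper name `fp32_immediates` is hypothetical and not part of any existing assembler; the constant is first rounded to single precision by `struct.pack`.

```python
import struct

def fp32_immediates(x: float) -> tuple[int, int]:
    """Return (fmvis immediate, fishmv immediate) for x rounded to BFP32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16, bits & 0xFFFF

hi, lo = fp32_immediates(1.00390625)               # the example above
pi_hi, pi_lo = fp32_immediates(3.141592653589793)  # pi rounded to BFP32
```

When the lower half is zero the value is already a bfloat16, so `fmvis` alone is sufficient and no `fishmv` is needed.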
+[[!tag opf_rfc]]
+
+-------------
+
+\newpage{}
+
+# DX-Form
+
+Add the following to Book I, 1.6.1.6, DX-Form
+
+```
+ |0 |6 |11 |16 |26 |31
+ | PO | FRT| d1| d0| XO|d2
+```
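
The field layout can be sketched in Python as follows, assuming `D = d0 || d1 || d2` as in the instruction pseudocode (10, 5 and 1 bits respectively). The function names are illustrative, and `po` and `xo` are left as parameters because this RFC does not assign Primary Opcode or XO values.

```python
def split_d(d: int) -> tuple[int, int, int]:
    """Split a 16-bit immediate into the DX-Form fields (d0, d1, d2)."""
    d0 = (d >> 6) & 0x3FF   # upper 10 bits, instruction bits 16:25
    d1 = (d >> 1) & 0x1F    # middle 5 bits, instruction bits 11:15
    d2 = d & 0x1            # lowest bit,    instruction bit 31
    return d0, d1, d2

def join_d(d0: int, d1: int, d2: int) -> int:
    """Rebuild D as d0 || d1 || d2."""
    return (d0 << 6) | (d1 << 1) | d2

def encode_dx(po: int, frt: int, d: int, xo: int) -> int:
    """Assemble a 32-bit DX-Form word; po and xo are caller-supplied."""
    d0, d1, d2 = split_d(d)
    # MSB0 bit n of the instruction corresponds to a left shift of (31 - n).
    return (po << 26) | (frt << 21) | (d1 << 16) | (d0 << 6) | (xo << 1) | d2

# Placeholder po=0 and xo=0 are used purely to show field placement.
word = encode_dx(po=0, frt=4, d=0x3F80, xo=0)
```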
+
+Add `DX` to `FRT` Field in Book I, 1.6.2
+
+```
+ FRT (6:10)
+ Field used to specify an FPR to be used as a
+ target.
+ Formats: D, X, DX
+```
+
+# bfloat16 definition
+
+Add the following to Book I, 4.3.1:
+
+The format may be a 16-bit bfloat16, 32-bit single format for a
+single-precision value...
+
+The bfloat16 format is used as an immediate.
+
+The structure of the bfloat16, single and double formats is shown below.
+
+```
+ |S |EXP| FRACTION|
+ |0 |1 8|9 15|
+```
+
+Figure #. Binary floating-point 16-bit format (bfloat16)
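
For illustration, the three fields in the figure can be extracted and evaluated as in the sketch below. It assumes the usual bfloat16 conventions (exponent bias 127, 7-bit fraction with an implicit leading one for normal numbers), which match the upper half of the single format; the function names are illustrative.

```python
def bf16_fields(d: int) -> tuple[int, int, int]:
    """Return (S, EXP, FRACTION) for a 16-bit bfloat16 pattern."""
    return (d >> 15) & 0x1, (d >> 7) & 0xFF, d & 0x7F

def bf16_value(d: int) -> float:
    """Numeric value of a bfloat16 pattern, computed field by field."""
    s, exp, frac = bf16_fields(d)
    sign = -1.0 if s else 1.0
    if exp == 0xFF:                     # Infinity or NaN
        return sign * float("inf") if frac == 0 else float("nan")
    if exp == 0:                        # zero or denormal
        return sign * (frac / 128) * 2.0 ** -126
    return sign * (1 + frac / 128) * 2.0 ** (exp - 127)

fields = bf16_fields(0xBFC0)   # sign 1, exponent 0x7F, fraction 0x40
value = bf16_value(0xBFC0)     # -1.5
```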
+
+# Appendices
+
+ Appendix E Power ISA sorted by opcode
+ Appendix F Power ISA sorted by version
+ Appendix G Power ISA sorted by Compliancy Subset
+ Appendix H Power ISA sorted by mnemonic
+
+| Form | Book | Page | Version | mnemonic | Description |
+|------|------|------|---------|----------|-------------|
+| DX | I | # | 3.0B | fmvis | Floating-point Move Immediate, Shifted |
+| DX | I | # | 3.0B | fishmv | Floating-point Immediate, Second-half Move |
+
+++ /dev/null
-# RFC ls002 v2 Floating-Point Load-Immediate
-
-**URLs**:
-
-* <https://libre-soc.org/openpower/sv/int_fp_mv/#fmvis>
-* <https://libre-soc.org/openpower/sv/rfc/ls002/>
-* <https://bugs.libre-soc.org/show_bug.cgi?id=944>
-* <https://git.openpower.foundation/isa/PowerISA/issues/87>
-
-**Severity**: Major
-
-**Status**: New
-
-**Date**: 05 Oct 2022
-
-**Target**: v3.2B
-
-**Source**: v3.0B
-
-**Books and Section affected**:
-
-```
- Book I Scalar Floating-Point 4.6.2.1
- Appendix E Power ISA sorted by opcode
- Appendix F Power ISA sorted by version
- Appendix G Power ISA sorted by Compliancy Subset
- Appendix H Power ISA sorted by mnemonic
-```
-
-**Summary**
-
-```
- Instructions added
- fmvis - Floating-Point Move Immediate, Shifted
- fishmv - Floating-Point Immediate, Second-half Move
-```
-
-**Submitter**: Luke Leighton (Libre-SOC)
-
-**Requester**: Libre-SOC
-
-**Impact on processor**:
-
-```
- Addition of two new FPR-based instructions
-```
-
-**Impact on software**:
-
-```
- Requires support for new instructions in assembler, debuggers,
- and related tools.
-```
-
-**Keywords**:
-
-```
- FPR, Floating-point, Load-immediate, BF16, bfloat16, BFP32
-```
-
-**Motivation**
-
-Similar to `lxvkq` but extended to a bfloat16 with one
-32-bit instruction and a full FP32 in two 32-bit instructions
-these instructions always save a Data Load and associated L1
-and TLB lookup. Even quickly clearing an FPR to zero presently needs Load.
-
-**Notes and Observations**:
-
-1. There is no need for an Rc=1 variant because this is an immediate
- loading instruction (an FPR equivalent to `li`)
-2. There is no need for Special Registers (FP Flags) because this
- is an immediate loading instruction. No FPR Load Operations
- alter `FPSCR`, neither does `lxvkq`, and on that basis neither
- should these instructions.
-3. `fishmv` as a FRT-only Read-Modify-Write (instead of an unnecessary
- FRT,FRA pair) saves five potential bits, making
- the difference between a 5-bit XO (VA/DX-Form) and requiring an entire
- Primary Opcode.
-
-**Changes**
-
-Add the following entries to:
-
-* the Appendices of Book I
-* Instructions of Book I as a new Section 4.6.2.1
-* DX-Form of Book I Section 1.6.1.6 and 1.6.2
-* Floating-Point Data a Format of Book I Section 4.3.1
-
-----------------
-
-\newpage{}
-
-# Floating-Point Move Immediate
-
-`fmvis FRT, D`
-
-| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form |
-|--------|------|-------|-------|-------|-----|---------|
-| Major | FRT | d1 | d0 | XO | d2 | DX-Form |
-
-Pseudocode:
-
-```
- bf16 <- d0 || d1 || d2 # create bfloat16 immediate
- bfp32 <- bf16 || [0]*16 # convert bfloat16 to BFP32
- FRT <- DOUBLE(bfp32) # convert BFP32 to BFP64
-```
-
-Special registers altered:
-
- None
-
-The value `D << 16` is interpreted as a 32-bit float, converted to a
-64-bit float and written to `FRT`. This is equivalent to reinterpreting
-`D` as a `bfloat16` and converting to 64-bit float.
-
-Examples:
-
-```
- fmvis f4, 0 # writes +0.0 to f4 (clears an FPR)
- fmvis f4, 0x8000 # writes -0.0 to f4
- fmvis f4, 0x3F80 # writes +1.0 to f4
- fmvis f4, 0xBFC0 # writes -1.5 to f4
- fmvis f4, 0x7FC0 # writes +qNaN to f4
- fmvis f4, 0x7F80 # writes +Infinity to f4
- fmvis f4, 0xFF80 # writes -Infinity to f4
- fmvis f4, 0x3FFF # writes +1.9921875 to f4
-```
-
-# Floating-Point Immediate Second-Half Move
-
-`fishmv FRT, D`
-
-DX-Form:
-
-| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form |
-|--------|------|-------|-------|-------|-----|---------|
-| Major | FRT | d1 | d0 | XO | d2 | DX-Form |
-
-Pseudocode:
-
-```
- n <- (FRT) # read FRT
- bfp32 <- SINGLE(n) # convert to BFP32
- bfp32[16:31] <- d0 || d1 || d2 # replace LSB half
- FRT <- DOUBLE(bfp32) # convert back to BFP64
-```
-
-Special registers altered:
-
- None
-
-An additional 16-bits of immediate is
-inserted into the low-order half of the single-format value
-corresponding to the contents of FRT.
-
-**This instruction performs a Read-Modify-Write on FRT.**
-In hardware, `fishmv` may be macro-op-fused with `fmvis`.
-
-Programmer's note:
-The use of these two instructions is strategically similar to
-how `li` combined with `oris` may be used to construct 32-bit Integers.
-If a prior `fmvis` instruction had been used to
-set the upper 16-bits from a BFP32 value, `fishmv` may be used
-to set the
-lower 16-bits.
-Example:
-
-```
- # these two combined instructions write 0x3f808000
- # into f4 as a BFP32 to be converted to a BFP64.
- # actual contents in f4 after conversion: 0x3ff0_1000_0000_0000
- # first the upper bits, happens to be +1.0
- fmvis f4, 0x3F80 # writes +1.0 to f4
- # now write the lower 16 bits of a BFP32
- fishmv f4, 0x8000 # writes +1.00390625 to f4
-```
-[[!tag opf_rfc]]
-
--------------
-
-\newpage{}
-
-# DX-Form
-
-Add the following to Book I, 1.6.1.6, DX-Form
-
-```
- |0 |6 |11 |16 |26 |31
- | PO | FRT| d1| d0| XO|d2
-```
-
-Add `DX` to `FRT` Field in Book I, 1.6.2
-
-```
- FRT (6:10)
- Field used to specify an FPR to be used as a
- source.
- Formats: D, X, DX
-```
-
-# bfloat16 definition
-
-Add the following to Book I, 4.3.1:
-
-The format may be a 16-bit bfloat16, 32-bit single format for a
-single-precision value...
-
-The bfloat16 format is used as an immediate.
-
-The structure of the bfloat16, single and double formats is shown below.
-
-```
- |S |EXP| FRACTION|
- |0 |1 8|9 15|
-```
-
-Figure #. Binary floating-point half-precision format (bfloat16)
-
-# Appendices
-
- Appendix E Power ISA sorted by opcode
- Appendix F Power ISA sorted by version
- Appendix G Power ISA sorted by Compliancy Subset
- Appendix H Power ISA sorted by mnemonic
-
-| Form | Book | Page | Version | mnemonic | Description |
-|------|------|------|---------|----------|-------------|
-| DX | I | # | 3.0B | fmvis | Floating-point Move Immediate, Shifted |
-| DX | I | # | 3.0B | fishmv | Floating-point Immediate, Second-half Move |
-
| 4 | FPR LD/ST-Shifted-PostIncrement-Update (ditto) | [[ls011]] | |
| 26 | GPR LD/ST-Shifted (again saves hugely in hot-loops) | [[ls004]] | |
| 11 | FPR LD/ST-Shifted (ditto) | [[ls004]] | |
-| 2 | Float-Load-Immediate (always saves one LD L1/2/3 D-Cache op) | [[ls002]] | |
+| 2 | Float-Load-Immediate (always saves one LD L1/2/3 D-Cache op) | [[ls002.fmi]] | |
| 5 | Big-Integer Chained 3-in 2-out (64-bit Carry) | [[ls003]] | [[sv/biginteger]] |
| 6 | Bitmanip LUT2/3 operations. high cost high reward | [[ls007]] | [[sv/bitmanip]] |
| 1 | fclass (Scalar variant of xvtstdcsp) |TBD| [[sv/fclass]] |
## Float-Load-Immediate
-Very easily justified. As explained in [[ls002]] these always saves one
+Very easily justified. As explained in [[ls002.fmi]] these always save one
LD L1/2/3 D-Cache memory-lookup operation, by virtue of the Immediate
FP value being in the I-Cache side. It is such a high priority that
these instructions are easily justifiable adding into EXT0xx, despite
binlut, ls007, high, 6, yes, TBD, no, sv/bitmanip, 3R1W, SFFS, no
crbinlut, ls007, high, 5, yes, TBD, no, sv/bitmanip, 3r1w, SV/D, no
# Float-Load-Immediate (always saves one LD L1/2/3 D-Cache op)
-fmvis, ls002, high, 5, yes, TBD, no, sv/bitmanip, 1W, SFFS, yes
-fishmv, ls002, high, 5, yes, TBD, no, sv/bitmanip, 1R1W, SFFS, yes
+fmvis, ls002.fmi, high, 5, yes, TBD, no, sv/bitmanip, 1W, SFFS, yes
+fishmv, ls002.fmi, high, 5, yes, TBD, no, sv/bitmanip, 1R1W, SFFS, yes
# Shift-and-Add (mitigates LD-ST-Shift; Cryptography e.g. twofish)
shadd, ls004, med, 7, yes, EXT0xx, no, sv/bitmanip, 2R1W1w, SFFS, yes
shadduw, ls004, med, 7, yes, EXT0xx, no, sv/bitmanip, 2R1W1w, SFFS, yes