From: Luke Kenneth Casson Leighton
Date: Wed, 24 May 2023 11:01:09 +0000 (+0100)
Subject: rename ls002 to ls002.fmi
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=4c20c551b0abc00cbe5da71ac660ba83d8992b23;p=libreriscv.git

rename ls002 to ls002.fmi
---

diff --git a/openpower/sv/int_fp_mv.mdwn b/openpower/sv/int_fp_mv.mdwn
index af1262135..6da7b0e97 100644
--- a/openpower/sv/int_fp_mv.mdwn
+++ b/openpower/sv/int_fp_mv.mdwn
@@ -17,7 +17,7 @@ Links:
 * fmvis
 * int-fp RFC
 * [[int_fp_mv/appendix]]
-* [[sv/rfc/ls002]] - `fmvis` and `fishmv` External RFC Formal Submission
+* [[sv/rfc/ls002.fmi]] - `fmvis` and `fishmv` External RFC Formal Submission
 * [[sv/rfc/ls006]] - int-fp-mv External RFC Formal Submission
 
 Trademarks:
diff --git a/openpower/sv/rfc/ls002.fmi.mdwn b/openpower/sv/rfc/ls002.fmi.mdwn
new file mode 100644
index 000000000..21613b31b
--- /dev/null
+++ b/openpower/sv/rfc/ls002.fmi.mdwn
@@ -0,0 +1,232 @@
+# RFC ls002 v2 Floating-Point Load-Immediate
+
+**URLs**:
+
+*
+*
+*
+*
+
+**Severity**: Major
+
+**Status**: New
+
+**Date**: 05 Oct 2022
+
+**Target**: v3.2B
+
+**Source**: v3.0B
+
+**Books and Section affected**:
+
+```
+    Book I      Scalar Floating-Point 4.6.2.1
+    Appendix E  Power ISA sorted by opcode
+    Appendix F  Power ISA sorted by version
+    Appendix G  Power ISA sorted by Compliancy Subset
+    Appendix H  Power ISA sorted by mnemonic
+```
+
+**Summary**
+
+```
+    Instructions added
+    fmvis - Floating-Point Move Immediate, Shifted
+    fishmv - Floating-Point Immediate, Second-half Move
+```
+
+**Submitter**: Luke Leighton (Libre-SOC)
+
+**Requester**: Libre-SOC
+
+**Impact on processor**:
+
+```
+    Addition of two new FPR-based instructions
+```
+
+**Impact on software**:
+
+```
+    Requires support for new instructions in assembler, debuggers,
+    and related tools.
+```
+
+**Keywords**:
+
+```
+    FPR, Floating-point, Load-immediate, BF16, bfloat16, BFP32
+```
+
+**Motivation**
+
+Similar to `lxvkq` but extended to a bfloat16 with one
+32-bit instruction and a full FP32 in two 32-bit instructions
+these instructions always save a Data Load and associated L1
+and TLB lookup. Even quickly clearing an FPR to zero presently needs Load.
+
+**Notes and Observations**:
+
+1. There is no need for an Rc=1 variant because this is an immediate
+   loading instruction (an FPR equivalent to `li`)
+2. There is no need for Special Registers (FP Flags) because this
+   is an immediate loading instruction. No FPR Load Operations
+   alter `FPSCR`, neither does `lxvkq`, and on that basis neither
+   should these instructions.
+3. `fishmv` as a FRT-only Read-Modify-Write (instead of an unnecessary
+   FRT,FRA pair) saves five potential bits, making
+   the difference between a 5-bit XO (VA/DX-Form) and requiring an entire
+   Primary Opcode.
+
+**Changes**
+
+Add the following entries to:
+
+* the Appendices of Book I
+* Instructions of Book I as a new Section 4.6.2.1
+* DX-Form of Book I Section 1.6.1.6 and 1.6.2
+* Floating-Point Data a Format of Book I Section 4.3.1
+
+----------------
+
+\newpage{}
+
+# Floating-Point Move Immediate
+
+`fmvis FRT, D`
+
+| 0-5    | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
+|--------|------|-------|-------|-------|-----|---------|
+| Major  | FRT  | d1    | d0    | XO    | d2  | DX-Form |
+
+Pseudocode:
+
+```
+    bf16 <- d0 || d1 || d2   # create bfloat16 immediate
+    bfp32 <- bf16 || [0]*16  # convert bfloat16 to BFP32
+    FRT <- DOUBLE(bfp32)     # convert BFP32 to BFP64
+```
+
+Special registers altered:
+
+    None
+
+The value `D << 16` is interpreted as a 32-bit float, converted to a
+64-bit float and written to `FRT`. This is equivalent to reinterpreting
+`D` as a `bfloat16` and converting to 64-bit float.
+
+Examples:
+
+```
+    fmvis f4, 0       # writes +0.0 to f4 (clears an FPR)
+    fmvis f4, 0x8000  # writes -0.0 to f4
+    fmvis f4, 0x3F80  # writes +1.0 to f4
+    fmvis f4, 0xBFC0  # writes -1.5 to f4
+    fmvis f4, 0x7FC0  # writes +qNaN to f4
+    fmvis f4, 0x7F80  # writes +Infinity to f4
+    fmvis f4, 0xFF80  # writes -Infinity to f4
+    fmvis f4, 0x3FFF  # writes +1.9921875 to f4
+```
+
+# Floating-Point Immediate Second-Half Move
+
+`fishmv FRT, D`
+
+DX-Form:
+
+| 0-5    | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
+|--------|------|-------|-------|-------|-----|---------|
+| Major  | FRT  | d1    | d0    | XO    | d2  | DX-Form |
+
+Pseudocode:
+
+```
+    n <- (FRT)                      # read FRT
+    bfp32 <- SINGLE(n)              # convert to BFP32
+    bfp32[16:31] <- d0 || d1 || d2  # replace LSB half
+    FRT <- DOUBLE(bfp32)            # convert back to BFP64
+```
+
+Special registers altered:
+
+    None
+
+An additional 16-bits of immediate is
+inserted into the low-order half of the single-format value
+corresponding to the contents of FRT.
+
+**This instruction performs a Read-Modify-Write on FRT.**
+In hardware, `fishmv` may be macro-op-fused with `fmvis`.
+
+Programmer's note:
+The use of these two instructions is strategically similar to
+how `li` combined with `oris` may be used to construct 32-bit Integers.
+If a prior `fmvis` instruction had been used to
+set the upper 16-bits from a BFP32 value, `fishmv` may be used
+to set the
+lower 16-bits.
+Example:
+
+```
+    # these two combined instructions write 0x3f808000
+    # into f4 as a BFP32 to be converted to a BFP64.
+    # actual contents in f4 after conversion: 0x3ff0_1000_0000_0000
+    # first the upper bits, happens to be +1.0
+    fmvis f4, 0x3F80   # writes +1.0 to f4
+    # now write the lower 16 bits of a BFP32
+    fishmv f4, 0x8000  # writes +1.00390625 to f4
+```
+[[!tag opf_rfc]]
+
+-------------
+
+\newpage{}
+
+# DX-Form
+
+Add the following to Book I, 1.6.1.6, DX-Form
+
+```
+    |0     |6     |11    |16    |26    |31
+    | PO   | FRT  | d1   | d0   | XO   |d2
+```
+
+Add `DX` to `FRT` Field in Book I, 1.6.2
+
+```
+    FRT (6:10)
+        Field used to specify an FPR to be used as a
+        source.
+        Formats: D, X, DX
+```
+
+# bfloat16 definition
+
+Add the following to Book I, 4.3.1:
+
+The format may be a 16-bit bfloat16, 32-bit single format for a
+single-precision value...
+
+The bfloat16 format is used as an immediate.
+
+The structure of the bfloat16, single and double formats is shown below.
+
+```
+    |S |EXP    |FRACTION  |
+    |0 |1     8|9       15|
+```
+
+Figure #. Binary floating-point half-precision format (bfloat16)
+
+# Appendices
+
+    Appendix E  Power ISA sorted by opcode
+    Appendix F  Power ISA sorted by version
+    Appendix G  Power ISA sorted by Compliancy Subset
+    Appendix H  Power ISA sorted by mnemonic
+
+| Form | Book | Page | Version | mnemonic | Description |
+|------|------|------|---------|----------|-------------|
+| DX   | I    | #    | 3.0B    | fmvis    | Floating-point Move Immediate, Shifted |
+| DX   | I    | #    | 3.0B    | fishmv   | Floating-point Immediate, Second-half Move |
+
diff --git a/openpower/sv/rfc/ls002.mdwn b/openpower/sv/rfc/ls002.mdwn
deleted file mode 100644
index 21613b31b..000000000
--- a/openpower/sv/rfc/ls002.mdwn
+++ /dev/null
@@ -1,232 +0,0 @@
-# RFC ls002 v2 Floating-Point Load-Immediate
-
-**URLs**:
-
-*
-*
-*
-*
-
-**Severity**: Major
-
-**Status**: New
-
-**Date**: 05 Oct 2022
-
-**Target**: v3.2B
-
-**Source**: v3.0B
-
-**Books and Section affected**:
-
-```
-    Book I      Scalar Floating-Point 4.6.2.1
-    Appendix E  Power ISA sorted by opcode
-    Appendix F  Power ISA sorted by version
-    Appendix G  Power ISA sorted by Compliancy Subset
-    Appendix H  Power ISA sorted by mnemonic
-```
-
-**Summary**
-
-```
-    Instructions added
-    fmvis - Floating-Point Move Immediate, Shifted
-    fishmv - Floating-Point Immediate, Second-half Move
-```
-
-**Submitter**: Luke Leighton (Libre-SOC)
-
-**Requester**: Libre-SOC
-
-**Impact on processor**:
-
-```
-    Addition of two new FPR-based instructions
-```
-
-**Impact on software**:
-
-```
-    Requires support for new instructions in assembler, debuggers,
-    and related tools.
-```
-
-**Keywords**:
-
-```
-    FPR, Floating-point, Load-immediate, BF16, bfloat16, BFP32
-```
-
-**Motivation**
-
-Similar to `lxvkq` but extended to a bfloat16 with one
-32-bit instruction and a full FP32 in two 32-bit instructions
-these instructions always save a Data Load and associated L1
-and TLB lookup. Even quickly clearing an FPR to zero presently needs Load.
-
-**Notes and Observations**:
-
-1. There is no need for an Rc=1 variant because this is an immediate
-   loading instruction (an FPR equivalent to `li`)
-2. There is no need for Special Registers (FP Flags) because this
-   is an immediate loading instruction. No FPR Load Operations
-   alter `FPSCR`, neither does `lxvkq`, and on that basis neither
-   should these instructions.
-3. `fishmv` as a FRT-only Read-Modify-Write (instead of an unnecessary
-   FRT,FRA pair) saves five potential bits, making
-   the difference between a 5-bit XO (VA/DX-Form) and requiring an entire
-   Primary Opcode.
-
-**Changes**
-
-Add the following entries to:
-
-* the Appendices of Book I
-* Instructions of Book I as a new Section 4.6.2.1
-* DX-Form of Book I Section 1.6.1.6 and 1.6.2
-* Floating-Point Data a Format of Book I Section 4.3.1
-
-----------------
-
-\newpage{}
-
-# Floating-Point Move Immediate
-
-`fmvis FRT, D`
-
-| 0-5    | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
-|--------|------|-------|-------|-------|-----|---------|
-| Major  | FRT  | d1    | d0    | XO    | d2  | DX-Form |
-
-Pseudocode:
-
-```
-    bf16 <- d0 || d1 || d2   # create bfloat16 immediate
-    bfp32 <- bf16 || [0]*16  # convert bfloat16 to BFP32
-    FRT <- DOUBLE(bfp32)     # convert BFP32 to BFP64
-```
-
-Special registers altered:
-
-    None
-
-The value `D << 16` is interpreted as a 32-bit float, converted to a
-64-bit float and written to `FRT`. This is equivalent to reinterpreting
-`D` as a `bfloat16` and converting to 64-bit float.
-
-Examples:
-
-```
-    fmvis f4, 0       # writes +0.0 to f4 (clears an FPR)
-    fmvis f4, 0x8000  # writes -0.0 to f4
-    fmvis f4, 0x3F80  # writes +1.0 to f4
-    fmvis f4, 0xBFC0  # writes -1.5 to f4
-    fmvis f4, 0x7FC0  # writes +qNaN to f4
-    fmvis f4, 0x7F80  # writes +Infinity to f4
-    fmvis f4, 0xFF80  # writes -Infinity to f4
-    fmvis f4, 0x3FFF  # writes +1.9921875 to f4
-```
-
-# Floating-Point Immediate Second-Half Move
-
-`fishmv FRT, D`
-
-DX-Form:
-
-| 0-5    | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
-|--------|------|-------|-------|-------|-----|---------|
-| Major  | FRT  | d1    | d0    | XO    | d2  | DX-Form |
-
-Pseudocode:
-
-```
-    n <- (FRT)                      # read FRT
-    bfp32 <- SINGLE(n)              # convert to BFP32
-    bfp32[16:31] <- d0 || d1 || d2  # replace LSB half
-    FRT <- DOUBLE(bfp32)            # convert back to BFP64
-```
-
-Special registers altered:
-
-    None
-
-An additional 16-bits of immediate is
-inserted into the low-order half of the single-format value
-corresponding to the contents of FRT.
-
-**This instruction performs a Read-Modify-Write on FRT.**
-In hardware, `fishmv` may be macro-op-fused with `fmvis`.
-
-Programmer's note:
-The use of these two instructions is strategically similar to
-how `li` combined with `oris` may be used to construct 32-bit Integers.
-If a prior `fmvis` instruction had been used to
-set the upper 16-bits from a BFP32 value, `fishmv` may be used
-to set the
-lower 16-bits.
-Example:
-
-```
-    # these two combined instructions write 0x3f808000
-    # into f4 as a BFP32 to be converted to a BFP64.
-    # actual contents in f4 after conversion: 0x3ff0_1000_0000_0000
-    # first the upper bits, happens to be +1.0
-    fmvis f4, 0x3F80   # writes +1.0 to f4
-    # now write the lower 16 bits of a BFP32
-    fishmv f4, 0x8000  # writes +1.00390625 to f4
-```
-[[!tag opf_rfc]]
-
--------------
-
-\newpage{}
-
-# DX-Form
-
-Add the following to Book I, 1.6.1.6, DX-Form
-
-```
-    |0     |6     |11    |16    |26    |31
-    | PO   | FRT  | d1   | d0   | XO   |d2
-```
-
-Add `DX` to `FRT` Field in Book I, 1.6.2
-
-```
-    FRT (6:10)
-        Field used to specify an FPR to be used as a
-        source.
-        Formats: D, X, DX
-```
-
-# bfloat16 definition
-
-Add the following to Book I, 4.3.1:
-
-The format may be a 16-bit bfloat16, 32-bit single format for a
-single-precision value...
-
-The bfloat16 format is used as an immediate.
-
-The structure of the bfloat16, single and double formats is shown below.
-
-```
-    |S |EXP    |FRACTION  |
-    |0 |1     8|9       15|
-```
-
-Figure #. Binary floating-point half-precision format (bfloat16)
-
-# Appendices
-
-    Appendix E  Power ISA sorted by opcode
-    Appendix F  Power ISA sorted by version
-    Appendix G  Power ISA sorted by Compliancy Subset
-    Appendix H  Power ISA sorted by mnemonic
-
-| Form | Book | Page | Version | mnemonic | Description |
-|------|------|------|---------|----------|-------------|
-| DX   | I    | #    | 3.0B    | fmvis    | Floating-point Move Immediate, Shifted |
-| DX   | I    | #    | 3.0B    | fishmv   | Floating-point Immediate, Second-half Move |
-
diff --git a/openpower/sv/rfc/ls012.mdwn b/openpower/sv/rfc/ls012.mdwn
index bedbfa602..4eb3b3c4a 100644
--- a/openpower/sv/rfc/ls012.mdwn
+++ b/openpower/sv/rfc/ls012.mdwn
@@ -89,7 +89,7 @@ Audio/Visual, High-Performance Compute, GPU workloads and DSP.
 | 4 | FPR LD/ST-Shifted-PostIncrement-Update (ditto) | [[ls011]] | |
 | 26 | GPR LD/ST-Shifted (again saves hugely in hot-loops) | [[ls004]] | |
 | 11 | FPR LD/ST-Shifted (ditto) | [[ls004]] | |
-| 2 | Float-Load-Immediate (always saves one LD L1/2/3 D-Cache op) | [[ls002]] | |
+| 2 | Float-Load-Immediate (always saves one LD L1/2/3 D-Cache op) | [[ls002.fmi]] | |
 | 5 | Big-Integer Chained 3-in 2-out (64-bit Carry) | [[ls003]] | [[sv/biginteger]] |
 | 6 | Bitmanip LUT2/3 operations. high cost high reward | [[ls007]] | [[sv/bitmanip]] |
 | 1 | fclass (Scalar variant of xvtstdcsp) |TBD| [[sv/fclass]] |
@@ -379,7 +379,7 @@ instructions into one. However it is still not a huge priority unlike
 
 ## Float-Load-Immediate
 
-Very easily justified. As explained in [[ls002]] these always saves one
+Very easily justified. As explained in [[ls002.fmi]] these always saves one
 LD L1/2/3 D-Cache memory-lookup operation, by virtue of the Immediate
 FP value being in the I-Cache side. It is such a high priority that these
 instructions are easily justifiable adding into EXT0xx, despite
diff --git a/openpower/sv/rfc/ls012/optable.csv b/openpower/sv/rfc/ls012/optable.csv
index c98e77e88..52ef3a25a 100644
--- a/openpower/sv/rfc/ls012/optable.csv
+++ b/openpower/sv/rfc/ls012/optable.csv
@@ -91,8 +91,8 @@ crternlogi, ls007, high, 5, yes, TBD, yes, sv/bitmanip, 3r1w, SV/D, no
 binlut, ls007, high, 6, yes, TBD, no, sv/bitmanip, 3R1W, SFFS, no
 crbinlut, ls007, high, 5, yes, TBD, no, sv/bitmanip, 3r1w, SV/D, no
 # Float-Load-Immediate (always saves one LD L1/2/3 D-Cache op)
-fmvis, ls002, high, 5, yes, TBD, no, sv/bitmanip, 1W, SFFS, yes
-fishmv, ls002, high, 5, yes, TBD, no, sv/bitmanip, 1R1W, SFFS, yes
+fmvis, ls002.fmi, high, 5, yes, TBD, no, sv/bitmanip, 1W, SFFS, yes
+fishmv, ls002.fmi, high, 5, yes, TBD, no, sv/bitmanip, 1R1W, SFFS, yes
 # Shift-and-Add (mitigates LD-ST-Shift; Cryptography e.g. twofish)
 shadd, ls004, med, 7, yes, EXT0xx, no, sv/bitmanip, 2R1W1w, SFFS, yes
 shadduw, ls004, med, 7, yes, EXT0xx, no, sv/bitmanip, 2R1W1w, SFFS, yes
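
The `fmvis` and `fishmv` pseudocode in the renamed RFC can be modelled in a few lines of Python, which also lets the worked examples in the RFC text be checked by hand. This is a minimal sketch and not part of the commit. All helper names (`split_dx_immediate`, `join_dx_immediate`, `bfp32_bits_to_float`, `float_to_bfp32_bits`, `bfp64_bits`) are hypothetical. Python's `struct` conversion is assumed to stand in for the ISA's `SINGLE()` and `DOUBLE()`, which only matches when the FRT value is exactly representable as a BFP32 (as it is after `fmvis`); NaN payload handling is not modelled.

```python
import struct


def split_dx_immediate(d):
    """Split a 16-bit immediate D into DX-Form fields, with D = d0 || d1 || d2."""
    d0 = (d >> 6) & 0x3FF  # 10 bits, instruction bits 16-25
    d1 = (d >> 1) & 0x1F   # 5 bits, instruction bits 11-15
    d2 = d & 0x1           # 1 bit, instruction bit 31
    return d0, d1, d2


def join_dx_immediate(d0, d1, d2):
    """Rebuild D from the three DX-Form fields (d0 is most significant)."""
    return (d0 << 6) | (d1 << 1) | d2


def bfp32_bits_to_float(bits):
    """Reinterpret a 32-bit pattern as BFP32; Python widens it to BFP64."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]


def float_to_bfp32_bits(value):
    """Assumed stand-in for SINGLE(): the BFP32 bit pattern of a Python float."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def bfp64_bits(value):
    """Bit pattern of the BFP64 value as it would sit in the FPR."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def fmvis(d):
    """bf16 <- D; bfp32 <- bf16 || [0]*16; FRT <- DOUBLE(bfp32)."""
    return bfp32_bits_to_float((d & 0xFFFF) << 16)


def fishmv(frt, d):
    """Read-modify-write: replace the low 16 bits of SINGLE(FRT) with D."""
    bits = float_to_bfp32_bits(frt)
    bits = (bits & 0xFFFF0000) | (d & 0xFFFF)
    return bfp32_bits_to_float(bits)


if __name__ == "__main__":
    f4 = fmvis(0x3F80)       # upper half of 0x3f808000
    f4 = fishmv(f4, 0x8000)  # lower half
    print(f4, hex(bfp64_bits(f4)))
```

Splitting the immediate in a separate helper keeps the field layout (`d0`, `d1`, `d2`) independent of the value semantics, which mirrors how the RFC separates the DX-Form definition from the instruction pseudocode.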