   1 # RFC ls002.fmi v2 Floating-Point Load-Immediate
   2
   3 **URLs**:
   4
   5 * <https://libre-soc.org/openpower/sv/int_fp_mv/#fmvis>
   6 * <https://libre-soc.org/openpower/sv/rfc/ls002.fmi/>
   7 * <https://bugs.libre-soc.org/show_bug.cgi?id=1092>
   8 * <https://git.openpower.foundation/isa/PowerISA/issues/87>
   9
  10 **Severity**: Major
  11
  12 **Status**: New
  13
  14 **Date**: 05 Oct 2022 v3 TODO
  15
  16 **Target**: v3.2B
  17
  18 **Source**: v3.0B
  19
  20 **Books and Section affected**:
  21
  22 ```
  23     Book I Scalar Floating-Point 4.6.2.1
  24     Appendix E Power ISA sorted by opcode
  25     Appendix F Power ISA sorted by version
  26     Appendix G Power ISA sorted by Compliancy Subset
  27     Appendix H Power ISA sorted by mnemonic
  28 ```
  29
  30 **Summary**
  31
  32 ```
  33     Instructions added
  34     fmvis - Floating-Point Move Immediate, Shifted
  35     fishmv - Floating-Point Immediate, Second-half Move
  36 ```
  37
  38 **Submitter**: Luke Leighton (Libre-SOC)
  39
  40 **Requester**: Libre-SOC
  41
  42 **Impact on processor**:
  43
  44 ```
  45     Addition of two new FPR-based instructions
  46 ```
  47
  48 **Impact on software**:
  49
  50 ```
  51     Requires support for new instructions in assembler, debuggers,
  52     and related tools.
  53 ```
  54
  55 **Keywords**:
  56
  57 ```
  58     FPR, Floating-point, Load-immediate, BF16, bfloat16, BFP32
  59 ```
  60
  61 **Motivation**
  62
  63 Similar to `lxvkq`, but extended to load a bfloat16 with one
  64 32-bit instruction and a full FP32 with two 32-bit instructions.
  65 These instructions always save a Data Load and the associated L1
  66 and TLB lookups. At present, even clearing an FPR to zero needs a Load.
  67
  68 **Notes and Observations**:
  69
  70 1. There is no need for an Rc=1 variant because this is an
  71   immediate-loading instruction (an FPR equivalent to `li`).
  72 2. There is no need for Special Registers (FP Flags) because this
  73   is an immediate-loading instruction.  No FPR Load Operations
  74   alter `FPSCR`, nor does `lxvkq`, and on that basis neither
  75   should these instructions.
  76 3. `fishmv` as an FRT-only Read-Modify-Write (instead of an unnecessary
  77   FRT,FRA pair) saves five potential bits, making
  78   the difference between a 5-bit XO (VA/DX-Form) and requiring an entire
  79   Primary Opcode.
  80
  81 **Changes**
  82
  83 Add the following entries to:
  84
  85 * the Appendices of Book I
  86 * Instructions of Book I as a new Section 4.6.2.1
  87 * DX-Form of Book I Section 1.6.1.6 and 1.6.2
  88 * Floating-Point Data Format of Book I Section 4.3.1
  89
  90 ----------------
  91
  92 \newpage{}
  93
  94 # Floating-Point Move Immediate, Shifted
  95
  96 `fmvis FRT, D`
  97
  98 |  0-5   | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
  99 |--------|------|-------|-------|-------|-----|---------|
 100 |  Major | FRT  | d1    | d0    | XO    | d2  | DX-Form |
 101
 102 Pseudocode:
 103
 104 ```
 105     bf16 <- d0 || d1 || d2  # create bfloat16 immediate
 106     bfp32 <- bf16 || [0]*16 # convert bfloat16 to BFP32
 107     FRT <- DOUBLE(bfp32)    # convert BFP32 to BFP64
 108 ```
 109
 110 Special registers altered:
 111
 112     None
 113
 114 The value `D << 16` is interpreted as a 32-bit float, converted to a
 115 64-bit float and written to `FRT`.  This is equivalent to reinterpreting
 116 `D` as a `bfloat16` and converting to 64-bit float.
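
The semantics above can be modelled in a few lines of Python. This is a minimal sketch, not part of the proposed ISA text: the helper names `fmvis` and `fpr_bits` are illustrative, and the model relies on the standard `struct` module to reinterpret the 32-bit pattern `D << 16` as a single-precision value, which Python then holds as a double.

```python
import struct


def fmvis(d):
    """Return the BFP64 value that fmvis would write to FRT for immediate D."""
    if not 0 <= d <= 0xFFFF:
        raise ValueError("D must be a 16-bit unsigned immediate")
    bfp32_bits = d << 16  # bf16 || [0]*16
    # Reinterpret the 32-bit pattern as a single-precision float. Python
    # floats are doubles, so unpacking performs the BFP32 -> BFP64 widening.
    (value,) = struct.unpack(">f", struct.pack(">I", bfp32_bits))
    return value


def fpr_bits(value):
    """Return the 64-bit register image of a double (illustrative helper)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return bits


if __name__ == "__main__":
    for d in (0x0000, 0x3F80, 0xBFC0, 0x3FFF):
        print(f"fmvis 0x{d:04X} -> {fmvis(d)!r} (0x{fpr_bits(fmvis(d)):016X})")
```

Because every BFP32 value is exactly representable as a BFP64, the widening step loses no information, which is consistent with the instruction altering no special registers.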
 117
 118 Examples:
 119
 120 ```
 121     fmvis f4, 0 # writes +0.0 to f4 (clears an FPR)
 122     fmvis f4, 0x8000 # writes -0.0 to f4
 123     fmvis f4, 0x3F80 # writes +1.0 to f4
 124     fmvis f4, 0xBFC0 # writes -1.5 to f4
 125     fmvis f4, 0x7FC0 # writes +qNaN to f4
 126     fmvis f4, 0x7F80 # writes +Infinity to f4
 127     fmvis f4, 0xFF80 # writes -Infinity to f4
 128     fmvis f4, 0x3FFF # writes +1.9921875 to f4
 129 ```
 130
 131 # Floating-Point Immediate, Second-Half Move
 132
 133 `fishmv FRT, D`
 134
 135 DX-Form:
 136
 137 |  0-5   | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
 138 |--------|------|-------|-------|-------|-----|---------|
 139 |  Major | FRT  | d1    | d0    | XO    | d2  | DX-Form |
 140
 141 Pseudocode:
 142
 143 ```
 144     n <- (FRT)                      # read FRT
 145     bfp32 <- SINGLE(n)              # convert to BFP32
 146     bfp32[16:31] <- d0 || d1 || d2  # replace LSB half
 147     FRT <- DOUBLE(bfp32)            # convert back to BFP64
 148 ```
 149
 150 Special registers altered:
 151
 152     None
 153
 154 A further 16 bits of immediate are
 155 inserted into the low-order half of the single-format value
 156 corresponding to the contents of FRT.
 157
 158 **This instruction performs a Read-Modify-Write on FRT.**
 159 In hardware, `fishmv` may be macro-op-fused with `fmvis`.
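
The Read-Modify-Write behaviour can be sketched in Python as follows. This is an illustrative model under stated assumptions rather than a reference implementation: the function name `fishmv` mirrors the mnemonic, and the model assumes FRT already holds a value exactly representable in single format (as it does after a preceding `fmvis`). For other inputs, `struct.pack` rounds or raises `OverflowError`, which is not claimed to match the behaviour of `SINGLE()`.

```python
import struct


def fishmv(frt, d):
    """Replace the low 16 bits of the single-format image of FRT with D."""
    if not 0 <= d <= 0xFFFF:
        raise ValueError("D must be a 16-bit unsigned immediate")
    # SINGLE(n): assumes frt is exactly representable in single format.
    (bfp32_bits,) = struct.unpack(">I", struct.pack(">f", frt))
    # bfp32[16:31] <- d0 || d1 || d2 (the low-order half in MSB0 numbering).
    bfp32_bits = (bfp32_bits & 0xFFFF0000) | d
    # DOUBLE(bfp32): unpacking as a Python float widens to BFP64.
    (value,) = struct.unpack(">f", struct.pack(">I", bfp32_bits))
    return value


if __name__ == "__main__":
    # FRT holds +1.0 (for example after fmvis f4, 0x3F80).
    print(fishmv(1.0, 0x8000))
```

Note that the upper 16 bits of the single-format image, and therefore the sign and exponent, are left untouched.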
 160
 161 Programmer's note:
 162 The use of these two instructions is similar in strategy to
 163 how `li` combined with `oris` may be used to construct 32-bit Integers.
 164 If a prior `fmvis` instruction has been used to
 165 set the upper 16 bits of a BFP32 value, `fishmv` may be used
 166 to set the lower 16 bits.
 167
 168 Example:
 169
 170 ```
 171     # these two combined instructions write 0x3f808000
 172     # into f4 as a BFP32 to be converted to a BFP64.
 173     # actual contents in f4 after conversion: 0x3ff0_1000_0000_0000
 174     # first the upper bits, happens to be +1.0
 175     fmvis f4, 0x3F80 # writes +1.0 to f4
 176     # now write the lower 16 bits of a BFP32
 177     fishmv f4, 0x8000 # writes +1.00390625 to f4
 178 ```
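
For an arbitrary single-precision constant, the two immediates can be derived by splitting its BFP32 bit pattern into halves. The following is a hedged sketch of how an assembler or compiler might do this. The helper name `split_fp32_immediates` is hypothetical, and the input is assumed to be exactly representable in single format.

```python
import struct


def split_fp32_immediates(value):
    """Return (upper, lower) 16-bit immediates for fmvis and fishmv."""
    # Assumes value is exactly representable as a BFP32; otherwise
    # struct.pack rounds it to the nearest single-precision value.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    return bits >> 16, bits & 0xFFFF


if __name__ == "__main__":
    hi, lo = split_fp32_immediates(1.00390625)
    print(f"fmvis  f4, 0x{hi:04X}")
    print(f"fishmv f4, 0x{lo:04X}")
```

When the lower half is zero, the constant is already a valid bfloat16 and a single `fmvis` suffices.
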
 179 [[!tag opf_rfc]]
 180
 181 -------------
 182
 183 \newpage{}
 184
 185 # DX-Form
 186
 187 Add the following to Book I, 1.6.1.6, DX-Form
 188
 189 ```
 190   |0    |6   |11   |16   |26   |31
 191   | PO  | FRT|   d1|   d0|   XO|d2
 192 ```
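
The split of the 16-bit immediate across `d0`, `d1` and `d2` can be illustrated with a small Python sketch. It follows the bit positions in the layout above (MSB0 numbering, bit 0 most significant) and the concatenation `d0 || d1 || d2` used in the pseudocode. The `po` and `xo` arguments are placeholders: this document does not give their values, so any numbers passed for them are hypothetical.

```python
def split_d(d):
    """Split a 16-bit immediate into the DX-Form fields (d0, d1, d2)."""
    if not 0 <= d <= 0xFFFF:
        raise ValueError("D must be a 16-bit unsigned immediate")
    d0 = (d >> 6) & 0x3FF  # most significant 10 bits
    d1 = (d >> 1) & 0x1F   # next 5 bits
    d2 = d & 0x1           # least significant bit
    return d0, d1, d2


def encode_dx(po, frt, d, xo):
    """Assemble a 32-bit DX-Form word; po and xo are caller-supplied."""
    if not (0 <= po < 64 and 0 <= frt < 32 and 0 <= xo < 32):
        raise ValueError("field out of range")
    d0, d1, d2 = split_d(d)
    # A field ending at MSB0 bit b is shifted left by 31 - b.
    return (po << 26) | (frt << 21) | (d1 << 16) | (d0 << 6) | (xo << 1) | d2


def decode_d(insn):
    """Recover the 16-bit immediate D = d0 || d1 || d2 from a DX-Form word."""
    d0 = (insn >> 6) & 0x3FF
    d1 = (insn >> 16) & 0x1F
    d2 = insn & 0x1
    return (d0 << 6) | (d1 << 1) | d2
```

A round trip such as `decode_d(encode_dx(po, 4, 0x3F80, xo))` returns `0x3F80` for any in-range `po` and `xo`.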
 193
 194 Add `DX` to `FRT` Field in Book I, 1.6.2
 195
 196 ```
 197  FRT (6:10)
 198      Field used to specify an FPR to be used as a
 199      target.
 200      Formats: D, X, DX
 201 ```
 202
 203 # bfloat16 definition
 204
 205 Add the following to Book I, 4.3.1:
 206
 207 The format may be a 16-bit bfloat16, 32-bit single format for a
 208 single-precision value...
 209
 210 The bfloat16 format is used as an immediate.
 211
 212 The structure of the bfloat16, single and double formats is shown below.
 213
 214 ```
 215   |S |EXP| FRACTION|
 216   |0 |1 8|9      15|
 217 ```
 218
 219 Figure #. Binary floating-point half-precision format (bfloat16)
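
As an illustration of the field layout in the figure, the sketch below extracts S, EXP and FRACTION from a 16-bit pattern and evaluates normal numbers. The function names are hypothetical. The exponent bias of 127 is the same as in single format, since bfloat16 is the upper half of a BFP32. Zero, denormal, infinity and NaN encodings are deliberately left out of the value calculation.

```python
def bf16_fields(d):
    """Return (S, EXP, FRACTION) for a 16-bit bfloat16 pattern."""
    if not 0 <= d <= 0xFFFF:
        raise ValueError("pattern must fit in 16 bits")
    sign = (d >> 15) & 0x1     # bit 0 (MSB0 numbering)
    exponent = (d >> 7) & 0xFF  # bits 1:8
    fraction = d & 0x7F        # bits 9:15
    return sign, exponent, fraction


def bf16_normal_value(d):
    """Evaluate a normal bfloat16 pattern; other classes are rejected."""
    sign, exponent, fraction = bf16_fields(d)
    if exponent in (0x00, 0xFF):
        raise ValueError("zero, denormal, infinity and NaN are not handled")
    # Implicit leading 1, 7 fraction bits, exponent bias of 127.
    return (-1.0) ** sign * (1 + fraction / 128) * 2.0 ** (exponent - 127)
```

For example, `0xBFC0` splits into S=1, EXP=0x7F, FRACTION=0x40 and evaluates to -1.5, matching the `fmvis` example with the same immediate.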
 220
 221 # Appendices
 222
 223     Appendix E Power ISA sorted by opcode
 224     Appendix F Power ISA sorted by version
 225     Appendix G Power ISA sorted by Compliancy Subset
 226     Appendix H Power ISA sorted by mnemonic
 227
 228 | Form | Book | Page | Version | mnemonic | Description |
 229 |------|------|------|---------|----------|-------------|
 230 | DX   | I    | #    | 3.0B    | fmvis    | Floating-Point Move Immediate, Shifted |
 231 | DX   | I    | #    | 3.0B    | fishmv   | Floating-Point Immediate, Second-half Move |
 232