# FPR-to-GPR and GPR-to-FPR

Introduction:

High-performance CPU/GPU software often needs to convert between integers
and floating-point, therefore fast conversion/data-movement instructions
are needed.  Also, given that initialisation of floats tends to take up
considerable space (even just to load 0.0), the inclusion of a float
load-immediate (using BF16 as the immediate format) is up for consideration.

Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS), i.e. it does not implement VMX/VSX,
and with its focus on modern 3D GPU hybrid workloads it represents an
important new potential use-case for OpenPOWER.

The progressive development of the Scalar parts of the Power ISA assumed
that VSX would be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs.  Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.

In addition, the vast majority of GPR <-> FPR data-transfers occur as part
of an FP <-> Integer conversion sequence, therefore keeping the number
of instructions required to a minimum is important.

Therefore, we are proposing adding:

* FPR load-immediate using `BF16` as the constant
* FPR <-> GPR data-transfer instructions that just copy bits without conversion
* FPR <-> GPR combined data-transfer/conversion instructions that do
  Integer <-> FP conversions

If we're adding new Integer <-> FP conversion instructions, we may
as well take this opportunity to modernise the instructions and make them
well suited for common/important conversion sequences:

* standard Integer -> FP IEEE754 conversion (used by most languages and CPUs)
* standard OpenPower FP -> Integer conversion (saturation with NaN
  converted to minimum valid integer)
* Rust FP -> Integer conversion (saturation with NaN converted to 0)
* JavaScript FP -> Integer conversion (modular with Inf/NaN converted to 0)

The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: JavaScript takes 35
scalar instructions, including four branches.

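# A bit more research into integer - fp conversion

There is no single semantics for converting between integers and
floating-point.  Integer -> FP conversion is essentially uniform (IEEE754,
with only the rounding mode varying), but FP -> Integer conversion differs
between architectures and programming languages in:

* how NaN is handled (converted to the minimum integer, or to 0)
* how out-of-range values are handled (saturated to the minimum/maximum
  representable integer, or wrapped modulo 2^N)
* which rounding mode is used (read from `FPSCR`, or always truncate)

The following sub-sections describe each of the common conventions.  They
serve as background research and as justification for why the ISA should
support each of them directly: the exact behaviour required by each
language has to be specified precisely, as *nothing* can be left to chance
or guesswork.
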
## standard Integer -> FP conversion

This conversion is specified by the IEEE754 standard.  It is used
by nearly all programming languages and CPUs.  In the case of OpenPOWER,
the rounding mode is read from `FPSCR`.

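## FP -> Integer conversions

### standard OpenPower FP -> Integer conversion

Values that are out of range (including infinities) saturate to the
minimum or maximum value of the destination integer type, and NaN is
converted to the minimum valid integer (for example -2^31 for a signed
32-bit result, or 0 for an unsigned result).

  - For signed results, NaN handling matches x86, which also returns the
    minimum signed integer for NaN (x86 returns that same value for
    positive overflow, however, rather than saturating)
  - Has instructions for both:
    * rounding mode read from FPSCR
    * rounding mode is always truncate
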
### Rust FP -> Integer conversion

Values that are out of range (including infinities) saturate to the
minimum or maximum value of the destination integer type, and NaN is
converted to 0.

These semantics are required by all of the following:

* Rust's FP -> Integer conversion using the
  [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
* Java's
  [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
* LLVM's
  [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
  [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
* SPIR-V's OpenCL dialect's
  [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
  [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
  instructions when decorated with
  [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).

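### JavaScript FP -> Integer conversion

NaN and infinities are converted to 0.  All other values are truncated
toward zero and then reduced modulo 2^N (i.e. they wrap around) to fit the
N-bit destination integer.  These are the semantics required by
JavaScript's `ToInt32` and `ToUint32` operations, which are used by, for
example, its bitwise operators.
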
### Other languages

TODO: review and investigate other language semantics

# Links

* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>

# Proposed New Scalar Instructions

All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to an FPR.

This can be overridden by SimpleV, which sets the following
operation "reinterpretation" rules:

* any operation whose assembler mnemonic does not end in "s"
  (being defined in v3.0B as a "double" operation) is
  instead an operation at the overridden elwidth for the
  relevant operand.
* any operation nominally defined as a "single" FP operation
  is redefined to be **half the elwidth** rather than
  "half of 64 bit".

Examples:

* `sv.fmvtg/sw=32 RT.v, FRA.v` is defined as treating FRA
  as a vector of *FP32* source operands each *32* bits wide
  which are to be placed into *64* bit integer destination elements.
* `sv.fmvfgs/dw=32 FRT.v, RA.v` is defined as taking the bottom
  32 bits of each RA integer source, then performing a **32 bit**
  FP32 to **FP16** conversion and storing the result in the
  **32 bits** of an FRT destination element.

"Single" is therefore redefined in SVP64 to be "half elwidth",
rather than Double width being hardcoded to 64 and Single width
hardcoded to 32.  This allows a full range of conversions
between FP64, FP32, FP16 and BF16.
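
The two reinterpretation rules amount to a small width calculation. Below is a minimal sketch in Python. `effective_fp_width` is a hypothetical helper, not part of any specification. The sketch assumes that the only meaningful results are the 16, 32 and 64 bit widths of the formats listed above.

```python
# Widths (in bits) of the FP formats mentioned above; BF16 is also 16 bits.
FP_WIDTHS = (16, 32, 64)


def effective_fp_width(elwidth: int, is_single: bool) -> int:
    """Width of an FP operand under the SVP64 reinterpretation rules.

    Hypothetical helper: "double" operations (mnemonic not ending in "s")
    use the operand's elwidth directly, while "single" operations use half
    of it.  Without an override elwidth is 64, which gives back the usual
    64-bit double / 32-bit single split.
    """
    width = elwidth // 2 if is_single else elwidth
    if width not in FP_WIDTHS:
        raise ValueError(f"no FP format of width {width}")
    return width


# sv.fmvtg/sw=32: fmvtg is a "double" operation, so FRA elements are FP32
print(effective_fp_width(32, is_single=False))  # prints 32
# sv.fmvfgs/dw=32: fmvfgs is a "single" operation, so half of 32 (FP16)
print(effective_fp_width(32, is_single=True))  # prints 16
```

Both results agree with the two examples above.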

## FPR to GPR moves

* `fmvtg RT, FRA`
* `fmvtg. RT, FRA`

Move a 64-bit float from an FPR to a GPR, copying the bits directly.
Rc=1 tests RT and sets CR0.

* `fmvtgs RT, FRA`
* `fmvtgs. RT, FRA`

Move a 32-bit float from an FPR to a GPR. Converts the 64-bit float in
`FRA` to a 32-bit float, then writes the bits of the 32-bit float to
`RT`.
Rc=1 tests RT and sets CR0.
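
Below is a behavioural sketch of the two FPR-to-GPR moves in Python. FPR contents are modelled as Python floats (IEEE754 64-bit) and GPR contents as unsigned integers. The sketch makes two assumptions that the proposal does not specify. First, `fmvtgs` zero-extends its 32-bit result into `RT`. Second, the FP64 to FP32 conversion rounds to nearest-even, whereas hardware would use the `FPSCR` rounding mode. Rc=1 and CR0 are not modelled.

```python
import math
import struct


def fmvtg(fra: float) -> int:
    """fmvtg RT, FRA: copy the 64 bits of the float unchanged."""
    return struct.unpack("<Q", struct.pack("<d", fra))[0]


def fmvtgs(fra: float) -> int:
    """fmvtgs RT, FRA: round to a 32-bit float and return its 32 bits.

    Assumes round-to-nearest-even and a zero-extended result.
    """
    try:
        packed = struct.pack("<f", fra)
    except OverflowError:
        # struct refuses values that round to infinity; under
        # round-to-nearest these become an infinity of the same sign.
        packed = struct.pack("<f", math.copysign(math.inf, fra))
    return struct.unpack("<I", packed)[0]


print(hex(fmvtg(1.0)))    # prints 0x3ff0000000000000
print(hex(fmvtgs(1.0)))   # prints 0x3f800000
```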

## GPR to FPR moves

`fmvfg FRT, RA`

Move a 64-bit float from a GPR to an FPR, copying the bits directly. No
exceptions are raised and no flags of any kind are altered.

TODO: Rc=1 variants?

`fmvfgs FRT, RA`

Move a 32-bit float from a GPR to an FPR. Converts the
32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg` and `frsp`.
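
Below is a matching sketch of the two GPR-to-FPR moves, using the same model (GPRs as unsigned integers, FPRs as Python floats). It assumes that `fmvfgs` reads the 32-bit float from the low 32 bits of `RA`; the text above does not say which half is used. Exception and flag behaviour of the conversion part is not modelled, since that is still an open TODO.

```python
import struct


def fmvfg(ra: int) -> float:
    """fmvfg FRT, RA: reinterpret the 64 GPR bits as a 64-bit float."""
    return struct.unpack("<d", struct.pack("<Q", ra & 0xFFFF_FFFF_FFFF_FFFF))[0]


def fmvfgs(ra: int) -> float:
    """fmvfgs FRT, RA: reinterpret the low 32 bits as a 32-bit float.

    Widening FP32 to FP64 is exact, and Python floats are already FP64,
    so unpacking the 32-bit value is all that is needed.
    """
    return struct.unpack("<f", struct.pack("<I", ra & 0xFFFF_FFFF))[0]


print(fmvfg(0x3FF0000000000000))  # prints 1.0
print(fmvfgs(0xBFC00000))         # prints -1.5
```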

TODO: Rc=1 variants?

TODO: state clearly whether any exceptions are raised or flags are set by the **FP** conversion part of `fmvfgs` (not the integer bit-copy part, the conversion part).  The semantics should really be the same as `frsp`.

v3.0C section 4.6.7.1 (the definition of `frsp`) states:

FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.

    Special Registers Altered:
      FPRF FR FI
      FX OX UX XX VXSNAN
      CR1 (if Rc=1)

### Float load immediate (kinda a variant of `fmvfg`)

`fmvis FRT, FI`

Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
64-bit float and written to `FRT`.  This is equivalent to reinterpreting
`FI` as a `BF16` and converting to 64-bit float.
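
Below is a minimal Python sketch of the value `fmvis` writes to `FRT`, following the description above. The function name `fmvis` simply mirrors the mnemonic. The result can be checked against the example values listed below.

```python
import struct


def fmvis(fi: int) -> float:
    """Value written to FRT by `fmvis FRT, FI` (sketch).

    FI is a 16-bit BF16 immediate.  Shifting it left by 16 gives the bit
    pattern of a 32-bit float, which is then widened (exactly) to 64 bits.
    """
    if not 0 <= fi <= 0xFFFF:
        raise ValueError("FI must be a 16-bit immediate")
    fp32_bits = fi << 16
    return struct.unpack("<f", struct.pack("<I", fp32_bits))[0]


print(fmvis(0xBFC0))  # prints -1.5
print(fmvis(0x3FFF))  # prints 1.9921875
```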

Example:

```
# clearing an FPR
fmvis f4, 0 # writes +0.0 to f4
# loading handy constants
fmvis f4, 0x8000 # writes -0.0 to f4
fmvis f4, 0x3F80 # writes +1.0 to f4
fmvis f4, 0xBF80 # writes -1.0 to f4
fmvis f4, 0xBFC0 # writes -1.5 to f4
fmvis f4, 0x7FC0 # writes +qNaN to f4
fmvis f4, 0x7F80 # writes +Infinity to f4
fmvis f4, 0xFF80 # writes -Infinity to f4
fmvis f4, 0x3FFF # writes +1.9921875 to f4

# clearing 128 FPRs with 2 SVP64 instructions
# by issuing 32 vec4 (subvector length 4) ops
setvli VL=MVL=32
sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127
```

Important: if the float load-immediate instruction(s) are left out,
all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions) should
be changed to instead write `+0.0` if `RA` is register `0`, so that FPRs
can at least be cleared.

|  0-5   | 6-10 | 11-25 | 26-30 | 31  |
|--------|------|-------|-------|-----|
|  Major | FRT  | FI    | XO    | FI0 |

The above fits reasonably well with Minor 19 and follows the
pattern shown by `addpcis`, which uses an entire column of Minor 19
XO.  The 15 bits of FI fit into bits 11 to 25, and the top bit FI0
(bit 0 of the 16-bit immediate in MSB0 numbering, placed in instruction
bit 31) makes 16.

    bf16 = FI0 || FI
    fp32 = bf16 || [0]*16
    FRT = Single_to_Double(fp32)
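
A hypothetical encoder/decoder illustrates the field layout in the table together with `bf16 = FI0 || FI`. Primary opcode 19 follows from the "Minor 19" remark above. The proposal does not give an XO value, so the one in the usage line is made up. The helper names are illustrative only.

```python
def encode_fmvis(major: int, frt: int, imm16: int, xo: int) -> int:
    """Pack `fmvis FRT, FI` into a 32-bit word (sketch, MSB0 numbering).

    Bits 0-5 major, 6-10 FRT, 11-25 FI (low 15 bits of the immediate),
    26-30 XO, 31 FI0 (top bit of the immediate).  A field ending at MSB0
    bit e is shifted left by (31 - e).
    """
    if not (0 <= major < 64 and 0 <= frt < 32 and 0 <= xo < 32
            and 0 <= imm16 < 1 << 16):
        raise ValueError("field out of range")
    fi0 = imm16 >> 15
    fi = imm16 & 0x7FFF
    return (major << 26) | (frt << 21) | (fi << 6) | (xo << 1) | fi0


def decode_fmvis_imm(word: int) -> int:
    """Recover bf16 = FI0 || FI from an encoded word."""
    fi = (word >> 6) & 0x7FFF
    fi0 = word & 1
    return (fi0 << 15) | fi


word = encode_fmvis(19, 4, 0xBFC0, xo=0b00010)  # XO value is hypothetical
print(hex(decode_fmvis_imm(word)))  # prints 0xbfc0
```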

## FPR to GPR conversions

<div id="fpr-to-gpr-conversion-mode"></div>

X-Form:

|  0-5   | 6-10 | 11-15  | 16-20 | 21-30 | 31 |
|--------|------|--------|-------|-------|----|
|  Major | RT   | //Mode | FRA   | XO    | Rc |
|  Major | FRT  | //Mode | RA    | XO    | Rc |

Mode values:

| Mode | `rounding_mode` | Semantics                        |
|------|-----------------|----------------------------------|
| 000  | from `FPSCR`    | [OpenPower semantics]            |
| 001  | Truncate        | [OpenPower semantics]            |
| 010  | from `FPSCR`    | [Rust semantics]                 |
| 011  | Truncate        | [Rust semantics]                 |
| 100  | from `FPSCR`    | [JavaScript semantics]           |
| 101  | Truncate        | [JavaScript semantics]           |
| rest | --              | illegal instruction trap for now |

[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
[Rust semantics]: #fp-to-int-rust-conversion-semantics
[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics
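
The Mode table has a regular structure. The least-significant bit chooses between the `FPSCR` rounding mode and truncation, and the upper two bits choose the semantics. The hypothetical Python decoder below makes this explicit and traps on the two values marked illegal. The names `decode_mode` and `ConversionMode` are illustrative only.

```python
from typing import NamedTuple


class ConversionMode(NamedTuple):
    rounding_mode: str  # "FPSCR" or "truncate"
    semantics: str      # "OpenPower", "Rust" or "JavaScript"


_SEMANTICS = ("OpenPower", "Rust", "JavaScript")


def decode_mode(mode: int) -> ConversionMode:
    """Decode the 3-bit Mode field of the FPR-to-GPR conversions (sketch)."""
    if not 0 <= mode <= 0b101:
        # 110 and 111 are "illegal instruction trap for now"
        raise ValueError(f"illegal Mode {mode:03b}")
    rounding = "truncate" if mode & 1 else "FPSCR"
    return ConversionMode(rounding, _SEMANTICS[mode >> 1])


print(decode_mode(0b011))  # prints ConversionMode(rounding_mode='truncate', semantics='Rust')
```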

`fcvttgw RT, FRA, Mode`

Convert from 64-bit float to 32-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvttguw RT, FRA, Mode`

Convert from 64-bit float to 32-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvttgd RT, FRA, Mode`

Convert from 64-bit float to 64-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvttgud RT, FRA, Mode`

Convert from 64-bit float to 64-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstgw RT, FRA, Mode`

Convert from 32-bit float to 32-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstguw RT, FRA, Mode`

Convert from 32-bit float to 32-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstgd RT, FRA, Mode`

Convert from 32-bit float to 64-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstgud RT, FRA, Mode`

Convert from 32-bit float to 64-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

[mode `Mode`]: #fpr-to-gpr-conversion-mode
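
The eight mnemonics above follow a regular pattern. The pattern described here is inferred from the descriptions and is not stated explicitly. Each mnemonic starts with `fcvt`, has an optional `s` when the source is a 32-bit float, then `tg` ("to GPR"), an optional `u` for an unsigned result, and `w` (32-bit) or `d` (64-bit) for the integer width. The hypothetical decoder below shows that reading.

```python
import re
from typing import NamedTuple


class FprToGpr(NamedTuple):
    fp_bits: int   # width of the source float in FRA
    int_bits: int  # width of the integer written to RT
    signed: bool


# Inferred naming scheme: fcvt [s] tg [u] (w|d)
_PATTERN = re.compile(r"fcvt(s?)tg(u?)([wd])")


def decode_fpr_to_gpr(mnemonic: str) -> FprToGpr:
    m = _PATTERN.fullmatch(mnemonic)
    if m is None:
        raise ValueError(f"not an FPR-to-GPR conversion: {mnemonic}")
    single, unsigned, size = m.groups()
    return FprToGpr(
        fp_bits=32 if single else 64,
        int_bits=32 if size == "w" else 64,
        signed=not unsigned,
    )


print(decode_fpr_to_gpr("fcvtstguw"))  # prints FprToGpr(fp_bits=32, int_bits=32, signed=False)
```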

## GPR to FPR conversions

All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.

`fcvtfgw FRT, RA`

Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in `FRT`.

`fcvtfgws FRT, RA`

Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in `FRT`.

`fcvtfguw FRT, RA`

Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`.

`fcvtfguws FRT, RA`

Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`.

`fcvtfgd FRT, RA`

Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in `FRT`.

`fcvtfgds FRT, RA`

Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in `FRT`.

`fcvtfgud FRT, RA`

Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`.

`fcvtfguds FRT, RA`

Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`.
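
Below is a Python sketch of the four conversions that produce a 64-bit float. It makes two assumptions. First, the 32-bit variants read the low 32 bits of `RA`. Second, the `FPSCR` rounding mode is round-to-nearest-even, which is what Python's `float()` does for an `int`; the other rounding modes are not modelled. The `...s` variants are left out on purpose. Converting a 64-bit integer to FP64 and then rounding again to FP32 can give a double-rounded result that differs from a direct conversion.

```python
def _low_bits(ra: int, bits: int, signed: bool) -> int:
    """Extract the low `bits` bits of a GPR value as a Python int."""
    value = ra & ((1 << bits) - 1)
    if signed and value >> (bits - 1):
        value -= 1 << bits  # two's complement sign extension
    return value


# float(int) is correctly rounded (ties to even) in Python
def fcvtfgw(ra: int) -> float:
    return float(_low_bits(ra, 32, signed=True))


def fcvtfguw(ra: int) -> float:
    return float(_low_bits(ra, 32, signed=False))


def fcvtfgd(ra: int) -> float:
    return float(_low_bits(ra, 64, signed=True))


def fcvtfgud(ra: int) -> float:
    return float(_low_bits(ra, 64, signed=False))


print(fcvtfgw(0xFFFFFFFF), fcvtfguw(0xFFFFFFFF))  # prints -1.0 4294967295.0
```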

# FP to Integer Conversion Pseudo-code

Key for pseudo-code:

| term                      | result type | definition                                                                                         |
|---------------------------|-------------|----------------------------------------------------------------------------------------------------|
| `fp`                      | --          | `f32` or `f64` (or other types from SimpleV)                                                       |
| `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                              |
| `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                |
| `int::BITS`               | `int`       | the bit-width of `int`                                                                             |
| `int::MIN_VALUE`          | `int`       | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed)                  |
| `int::MAX_VALUE`          | `int`       | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
| `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.           |
| `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`      |

<div id="fp-to-int-openpower-conversion-semantics"></div>
OpenPower conversion semantics (section A.2, page 999 (page 1023 of the PDF) of the OpenPower ISA v3.1):

```
def fp_to_int_open_power<fp, int>(v: fp) -> int:
    if v is NaN:
        return int::MIN_VALUE
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```
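
Below is a runnable Python sketch of the pseudo-code above. The destination integer type is passed as a bit-width plus a signedness flag. `rounding_mode` is one of four hypothetical mode names (`"nearest_even"`, `"truncate"`, `"floor"`, `"ceil"`), which stand in for the choices the `FPSCR` provides. Python compares floats and integers exactly, so the range checks behave like the mathematical comparisons in the pseudo-code.

```python
import math

# Hypothetical names for the rounding modes; each maps a finite float to an int.
_ROUNDERS = {
    "nearest_even": round,  # Python's round() ties to even
    "truncate": math.trunc,
    "floor": math.floor,
    "ceil": math.ceil,
}


def fp_to_int_open_power(v: float, bits: int, signed: bool, rounding_mode: str) -> int:
    # int::MIN_VALUE and int::MAX_VALUE for the destination type
    min_value = -(1 << (bits - 1)) if signed else 0
    max_value = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return min_value
    if v >= max_value:  # also catches +infinity
        return max_value
    if v <= min_value:  # also catches -infinity
        return min_value
    return _ROUNDERS[rounding_mode](v)


print(fp_to_int_open_power(float("nan"), 32, True, "truncate"))  # prints -2147483648
```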

<div id="fp-to-int-rust-conversion-semantics"></div>
Rust [conversion semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics) (with adjustment to add non-truncate rounding modes):

```
def fp_to_int_rust<fp, int>(v: fp) -> int:
    if v is NaN:
        return 0
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```
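
The Rust semantics differ from the OpenPower ones only in the NaN case. The Python sketch below mirrors the pseudo-code above. It uses the same assumptions: hypothetical rounding-mode names, and the integer type given as a width plus signedness. With `"truncate"` it should agree with Rust's `as` casts; for example, `300.0_f64 as u8` is `255`.

```python
import math

# Hypothetical names for the rounding modes; each maps a finite float to an int.
_ROUNDERS = {
    "nearest_even": round,  # Python's round() ties to even
    "truncate": math.trunc,
    "floor": math.floor,
    "ceil": math.ceil,
}


def fp_to_int_rust(v: float, bits: int, signed: bool, rounding_mode: str) -> int:
    min_value = -(1 << (bits - 1)) if signed else 0
    max_value = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return 0  # the only difference from the OpenPower semantics
    if v >= max_value:
        return max_value
    if v <= min_value:
        return min_value
    return _ROUNDERS[rounding_mode](v)


print(fp_to_int_rust(300.0, 8, False, "truncate"))  # prints 255
```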

<div id="fp-to-int-javascript-conversion-semantics"></div>
JavaScript [conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):

```
def fp_to_int_java_script<fp, int>(v: fp) -> int:
    if v is NaN or infinite:
        return 0
    v = rint(v, rounding_mode)
    v = v mod int::VALUE_COUNT  # 2^32 for i32, 2^64 for i64, result is non-negative
    bits = (uint)v
    return (int)bits
```
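
Below is a Python sketch of the JavaScript semantics, under the same kind of assumptions: hypothetical rounding-mode names, and the integer type given as a width plus signedness. With `"truncate"`, 32 bits and `signed=True` it is intended to match ECMAScript `ToInt32`. For example, `4294967295.5 | 0` evaluates to `-1` in JavaScript.

```python
import math

# Hypothetical names for the rounding modes; each maps a finite float to an int.
_ROUNDERS = {
    "nearest_even": round,  # Python's round() ties to even
    "truncate": math.trunc,
    "floor": math.floor,
    "ceil": math.ceil,
}


def fp_to_int_java_script(v: float, bits: int, signed: bool, rounding_mode: str) -> int:
    if math.isnan(v) or math.isinf(v):
        return 0
    r = _ROUNDERS[rounding_mode](v)
    # Python's % is non-negative for a positive modulus, like `mod` above
    wrapped = r % (1 << bits)
    if signed and wrapped >= 1 << (bits - 1):
        return wrapped - (1 << bits)  # reinterpret the bits as two's complement
    return wrapped


print(fp_to_int_java_script(4294967295.5, 32, True, "truncate"))  # prints -1
```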

# Equivalent OpenPower ISA v3.0 Assembly Language for FP -> Integer Conversion Modes

Moved to [[int_fp_mv/appendix]]