openpower/sv/int_fp_mv.mdwn

   1 # FPR-to-GPR and GPR-to-FPR
   2
   3 **Draft Status** under development, for submission as an RFC
   4
   5 Links:
   6
   7 * <https://bugs.libre-soc.org/show_bug.cgi?id=650>
   8 * <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
   9 * <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
  10 * <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
  11
  12 Introduction:
  13
High-performance CPU/GPU software often needs to convert between integers
and floating-point, so fast conversion/data-movement instructions
are needed.  Also, given that initialisation of floats tends to take up
considerable space (even just to load 0.0), the inclusion of a compact
float-immediate format using BF16 is up for consideration.
  19
Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS), i.e. it does not implement VMX/VSX,
and with its focus on modern 3D GPU hybrid workloads it represents an
important new potential use-case for OpenPOWER.
  24
The progressive development of the Scalar parts of the Power ISA assumed
that VSX would be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs.  For a 3D GPU this kills any modern competitive edge.
Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.
  34
  35 In addition, the vast majority of GPR <-> FPR data-transfers are as part
  36 of a FP <-> Integer conversion sequence, therefore reducing the number
  37 of instructions required to the minimum seems necessary.
  38
  39 Therefore, we are proposing adding:
  40
  41 * FPR load-immediate using `BF16` as the constant
  42 * FPR <-> GPR data-transfer instructions that just copy bits without conversion
  43 * FPR <-> GPR combined data-transfer/conversion instructions that do
  44   Integer <-> FP conversions
  45
  46 If we're adding new Integer <-> FP conversion instructions, we may
  47 as well take this opportunity to modernise the instructions and make them
  48 well suited for common/important conversion sequences:
  49
  50 * standard Integer -> FP IEEE754 conversion (used by most languages and CPUs)
  51 * standard OpenPower FP -> Integer conversion (saturation with NaN
  52   converted to minimum valid integer)
  53 * Rust FP -> Integer conversion (saturation with NaN converted to 0)
  54 * JavaScript FP -> Integer conversion (modular with Inf/NaN converted to 0)
  55
The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: JavaScript takes 35
scalar instructions, including four branches.
  59
  60 ## FP -> Integer conversions
  61
Different programming languages turn out to have completely different
semantics for FP to Integer conversion.  This section gives an overview
of the different variants, listing the languages and hardware that
implement each variant.
  66
### standard Integer -> FP conversion

This conversion is outlined in the IEEE754 specification.  It is used
by nearly all programming languages and CPUs.  In the case of OpenPOWER,
the rounding mode is read from FPSCR.
  72
  73 ### standard OpenPower FP -> Integer conversion
  74
This conversion, instead of exact IEEE754 compliance, performs
"saturation with NaN converted to minimum valid integer". This
is also exactly the same as the x86 ISA conversion semantics.
OpenPOWER, however, has instructions for both:

* rounding mode read from FPSCR
* rounding mode always set to truncate
  82
  83 ### Rust FP -> Integer conversion
  84
  85 For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Rust's `as` operator will be referred to as [Rust conversion semantics](#fp-to-int-rust-conversion-semantics).
  86
  87 Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method):
  88
  89 * Rust's FP -> Integer conversion using the
  90   [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
  91 * Java's
  92   [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
  93 * LLVM's
  94   [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
  95   [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
  96 * SPIR-V's OpenCL dialect's
  97   [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
  98   [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
  99   instructions when decorated with
 100   [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).
 101
 102 ### JavaScript FP -> Integer conversion
 103
For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).
 105
 106 ### Other languages
 107
 108 TODO: review and investigate other language semantics
 109
 110 # Proposed New Scalar Instructions
 111
 112 All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR.  All integers however are sourced/stored in the *GPR*.
 113
Integer operands and results being in the GPR is the key differentiator
(the entire rationale) between the proposed instructions and the existing
Scalar Power ISA. In all existing Power ISA Scalar conversion instructions,
all operands are FPRs, even if the format of the source or destination
data is actually a scalar integer.
 119
Note that source and destination widths can be overridden by SimpleV
SVP64, and that SVP64 also has Saturation Modes *in addition*
to those independently described here. SVP64 Overrides and Saturation
work on *both* Fixed *and* Floating Point.
The interactions with SVP64 are explained in the [[int_fp_mv/appendix]].
 126
 127 ## FPR to GPR moves
 128
 129 * `fmvtg RT, FRA`
 130 * `fmvtg. RT, FRA`
 131
 132 move a 64-bit float from a FPR to a GPR, just copying bits directly.
 133 As a direct bitcopy, no exceptions occur and no status flags are set.
 134
 135 Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
 136 operations.
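
As an illustration only, the bit-copy and the Rc=1 test can be modelled in Python. The function names `fmvtg` and `cr0_for` are hypothetical helpers for this sketch, the GPR is assumed to be 64 bits wide, and the SO bit of CR0 is left out.

```python
import struct

def fmvtg(fra: float) -> int:
    """Return the raw 64-bit pattern of a double, as it would land in RT."""
    return struct.unpack("<Q", struct.pack("<d", fra))[0]

def cr0_for(rt: int) -> str:
    """Signed compare of a 64-bit GPR value against zero (SO bit omitted)."""
    signed = rt - (1 << 64) if rt & (1 << 63) else rt
    if signed < 0:
        return "LT"
    if signed > 0:
        return "GT"
    return "EQ"

print(hex(fmvtg(1.0)))       # raw bits of +1.0
print(cr0_for(fmvtg(-0.0)))  # integer test of the bits of -0.0
```

Because the test is an integer one, `-0.0` (sign bit set, all other bits clear) is classified as a negative integer rather than as zero in this model.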
 137
 138 * `fmvtgs RT, FRA`
 139 * `fmvtgs. RT, FRA`
 140
move a 32-bit float from a FPR to a GPR, just copying bits. Converts the
64-bit float in `FRA` to a 32-bit float, then writes the 32-bit float to
`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
and therefore has exactly the same exception and flags behaviour as `frsp`.
 146 Unlike `frsp` however, with RT being a GPR, Rc=1 follows
 147 standard *integer* behaviour, i.e. tests RT and sets CR0.
 148
 149 ## GPR to FPR moves
 150
 151 `fmvfg FRT, RA`
 152
 153 move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions
 154 are raised, no flags are altered of any kind.
 155
 156 Rc=1 tests FRT and sets CR1
 157
 158 `fmvfgs FRT, RA`
 159
move a 32-bit float from a GPR to a FPR, just copying bits. Converts the
32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
therefore has exactly the same exception and flags behaviour as `frsp`.
 164
 165 Rc=1 tests FRT and sets CR1
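
A minimal Python sketch of the value semantics of the two moves, ignoring exceptions, FPSCR flags and CR1. The helper names mirror the mnemonics but are hypothetical, and the 32-bit float is assumed to sit in the low 32 bits of `RA`.

```python
import struct

def fmvfg(ra: int) -> float:
    """Reinterpret a 64-bit GPR pattern as a double."""
    return struct.unpack("<d", struct.pack("<Q", ra & 0xFFFF_FFFF_FFFF_FFFF))[0]

def fmvfgs(ra: int) -> float:
    """Reinterpret the low 32 bits of a GPR as a single, widened to double.

    Assumption: the 32-bit float occupies the low word of RA.
    """
    return struct.unpack("<f", struct.pack("<I", ra & 0xFFFF_FFFF))[0]

print(fmvfg(0x3FF0000000000000))  # bits of the double +1.0
print(fmvfgs(0x3FC00000))         # bits of the single +1.5
```

Widening a single to a double is exact, so in this model the only conversion step of `fmvfgs` loses no information.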
 166
TODO: clear statement on evaluation as to whether exceptions or flags are raised as part of the **FP** conversion (not the integer bitcopy part, the conversion part). The semantics should really be the same as `frsp`.
 168
 169 v3.0C section 4.6.7.1 states:
 170
 171 FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.
 172
 173     Special Registers Altered:
 174       FPRF FR FI
 175       FX OX UX XX VXSNAN
 176       CR1 (if Rc=1)
 177
 178 ### Float load immediate (kinda a variant of `fmvfg`)
 179
 180 `fmvis FRT, FI`
 181
 182 Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
 183 64-bit float and written to `FRT`.  This is equivalent to reinterpreting
 184 `FI` as a `BF16` and converting to 64-bit float.
 185
 186 Example:
 187
```
# clearing a FPR
fmvis f4, 0 # writes +0.0 to f4
# loading handy constants
fmvis f4, 0x8000 # writes -0.0 to f4
fmvis f4, 0x3F80 # writes +1.0 to f4
fmvis f4, 0xBF80 # writes -1.0 to f4
fmvis f4, 0xBFC0 # writes -1.5 to f4
fmvis f4, 0x7FC0 # writes +qNaN to f4
fmvis f4, 0x7F80 # writes +Infinity to f4
fmvis f4, 0xFF80 # writes -Infinity to f4
fmvis f4, 0x3FFF # writes +1.9921875 to f4

# clearing 128 FPRs with 2 SVP64 instructions
# by issuing 32 vec4 (subvector length 4) ops
setvli VL=MVL=32
sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127
```

Important: If the float load immediate instruction(s) are left out,
change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
to instead write `+0.0` if `RA` is register `0`, at least
allowing clearing FPRs.
 210
 211 |  0-5   | 6-10 | 11-25 | 26-30 | 31  |
 212 |--------|------|-------|-------|-----|
 213 |  Major | FRT  | FI    | XO    | FI0 |
 214
The above fits reasonably well with Minor 19 and follows the
pattern shown by `addpcis`, which uses an entire column of Minor 19
XO.  The 15 bits of FI fit into bits 11 to 25;
the top bit FI0 (bit 0 in MSB0 numbering) makes 16.
 219
    bf16 = FI0 || FI
    fp32 = bf16 || [0]*16
    FRT = Single_to_Double(fp32)
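
The pseudo-code above can be checked against the constants in the example with a small Python model. This is only a sketch: `fmvis` and `fmvis_imm` are hypothetical helper names, with `fmvis_imm` taking the 16-bit immediate as the assembler syntax writes it and splitting it into the `FI0` and `FI` fields.

```python
import struct

def fmvis(fi0: int, fi: int) -> float:
    """Model of the pseudo-code: bf16 = FI0 || FI, fp32 = bf16 || [0]*16."""
    bf16 = ((fi0 & 1) << 15) | (fi & 0x7FFF)
    fp32_bits = bf16 << 16
    # Reinterpret as a single; Python floats are doubles, so unpacking
    # performs the (exact) Single_to_Double step.
    return struct.unpack("<f", struct.pack("<I", fp32_bits))[0]

def fmvis_imm(imm16: int) -> float:
    """Split a 16-bit immediate into FI0 (top bit) and FI (low 15 bits)."""
    return fmvis(imm16 >> 15, imm16 & 0x7FFF)

for imm in (0x0000, 0x3F80, 0xBFC0, 0x3FFF, 0xFF80):
    print(hex(imm), fmvis_imm(imm))
```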
 223
 224 ## FPR to GPR conversions
 225
 226 <div id="fpr-to-gpr-conversion-mode"></div>
 227
 228 X-Form:
 229
 230 |  0-5   | 6-10 | 11-15  | 16-25 | 26-30 | 31 |
 231 |--------|------|--------|-------|-------|----|
 232 |  Major | RT   | //Mode | FRA   | XO    | Rc |
 233 |  Major | FRT  | //Mode | RA    | XO    | Rc |
 234
 235 Mode values:
 236
 237 | Mode | `rounding_mode` | Semantics                        |
 238 |------|-----------------|----------------------------------|
 239 | 000  | from `FPSCR`    | [OpenPower semantics]            |
 240 | 001  | Truncate        | [OpenPower semantics]            |
 241 | 010  | from `FPSCR`    | [Rust semantics]                 |
 242 | 011  | Truncate        | [Rust semantics]                 |
 243 | 100  | from `FPSCR`    | [JavaScript semantics]           |
 244 | 101  | Truncate        | [JavaScript semantics]           |
 245 | rest | --              | illegal instruction trap for now |
 246
 247 [OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
 248 [Rust semantics]: #fp-to-int-rust-conversion-semantics
 249 [JavaScript semantics]: #fp-to-int-javascript-conversion-semantics
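
A decoder for the Mode field might look like the following Python sketch. The table contents are taken from the Mode values above; the names `MODES`, `decode_mode` and `IllegalInstruction` are assumptions made for illustration.

```python
class IllegalInstruction(Exception):
    """Hypothetical stand-in for the illegal instruction trap."""

# Mode value -> (source of rounding_mode, conversion semantics)
MODES = {
    0b000: ("FPSCR", "OpenPower"),
    0b001: ("Truncate", "OpenPower"),
    0b010: ("FPSCR", "Rust"),
    0b011: ("Truncate", "Rust"),
    0b100: ("FPSCR", "JavaScript"),
    0b101: ("Truncate", "JavaScript"),
}

def decode_mode(mode: int) -> tuple:
    """Look up a 3-bit Mode value; unlisted values trap."""
    try:
        return MODES[mode]
    except KeyError:
        raise IllegalInstruction(f"reserved Mode value {mode:03b}") from None

print(decode_mode(0b011))
```

In this table the lowest Mode bit selects between the `FPSCR` rounding mode and truncation, and the upper two bits select the semantics.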
 250
 251 `fcvttgw RT, FRA, Mode`
 252
 253 Convert from 64-bit float to 32-bit signed integer, writing the result
 254 to the GPR `RT`. Converts using [mode `Mode`]
 255
 256 `fcvttguw RT, FRA, Mode`
 257
 258 Convert from 64-bit float to 32-bit unsigned integer, writing the result
 259 to the GPR `RT`. Converts using [mode `Mode`]
 260
 261 `fcvttgd RT, FRA, Mode`
 262
 263 Convert from 64-bit float to 64-bit signed integer, writing the result
 264 to the GPR `RT`. Converts using [mode `Mode`]
 265
 266 `fcvttgud RT, FRA, Mode`
 267
 268 Convert from 64-bit float to 64-bit unsigned integer, writing the result
 269 to the GPR `RT`. Converts using [mode `Mode`]
 270
 271 `fcvtstgw RT, FRA, Mode`
 272
 273 Convert from 32-bit float to 32-bit signed integer, writing the result
 274 to the GPR `RT`. Converts using [mode `Mode`]
 275
 276 `fcvtstguw RT, FRA, Mode`
 277
 278 Convert from 32-bit float to 32-bit unsigned integer, writing the result
 279 to the GPR `RT`. Converts using [mode `Mode`]
 280
 281 `fcvtstgd RT, FRA, Mode`
 282
 283 Convert from 32-bit float to 64-bit signed integer, writing the result
 284 to the GPR `RT`. Converts using [mode `Mode`]
 285
 286 `fcvtstgud RT, FRA, Mode`
 287
 288 Convert from 32-bit float to 64-bit unsigned integer, writing the result
 289 to the GPR `RT`. Converts using [mode `Mode`]
 290
 291 [mode `Mode`]: #fpr-to-gpr-conversion-mode
 292
 293 ## GPR to FPR conversions
 294
 295 All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.
 296
 297 `fcvtfgw FRT, RA`
 298
 299 Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in `FRT`.
 300
 301 `fcvtfgws FRT, RA`
 302
 303 Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in `FRT`.
 304
 305 `fcvtfguw FRT, RA`
 306
 307 Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`.
 308
 309 `fcvtfguws FRT, RA`
 310
 311 Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`.
 312
 313 `fcvtfgd FRT, RA`
 314
 315 Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in `FRT`.
 316
 317 `fcvtfgds FRT, RA`
 318
 319 Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in `FRT`.
 320
 321 `fcvtfgud FRT, RA`
 322
 323 Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`.
 324
 325 `fcvtfguds FRT, RA`
 326
 327 Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`.
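
The difference between the signed and unsigned variants comes down to how the GPR bits are interpreted before conversion. The Python sketch below models only the variants that produce a 64-bit float, and assumes that the `FPSCR` rounding mode is round-to-nearest-even (which is what Python's `float()` uses) and that 32-bit integers are taken from the low 32 bits of `RA`. The helper `to_signed` is hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def fcvtfgw(ra: int) -> float:
    return float(to_signed(ra, 32))

def fcvtfguw(ra: int) -> float:
    return float(ra & 0xFFFF_FFFF)

def fcvtfgd(ra: int) -> float:
    return float(to_signed(ra, 64))

def fcvtfgud(ra: int) -> float:
    return float(ra & 0xFFFF_FFFF_FFFF_FFFF)

# The same bit pattern gives different results per variant.
print(fcvtfgw(0xFFFF_FFFF), fcvtfguw(0xFFFF_FFFF))
```

The variants with a 32-bit float result are deliberately left out: converting to a double first and then rounding again to single would round twice, which is not the same as a single correctly rounded conversion.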
 328
 329 # FP to Integer Conversion Pseudo-code
 330
 331 Key for pseudo-code:
 332
 333 | term                      | result type | definition                                                                                         |
 334 |---------------------------|-------------|----------------------------------------------------------------------------------------------------|
 335 | `fp`                      | --          | `f32` or `f64` (or other types from SimpleV)                                                       |
 336 | `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                              |
 337 | `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                |
 338 | `int::BITS`               | `int`       | the bit-width of `int`                                                                             |
 339 | `int::MIN_VALUE`          | `int`       | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed)                  |
 340 | `int::MAX_VALUE`          | `int`       | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
 341 | `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.           |
 342 | `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`      |
 343
 344 <div id="fp-to-int-openpower-conversion-semantics"></div>
 345 OpenPower conversion semantics (section A.2 page 999 (page 1023) of OpenPower ISA v3.1):
 346
```
def fp_to_int_open_power<fp, int>(v: fp) -> int:
    if v is NaN:
        return int::MIN_VALUE
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```
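
A runnable Python rendering of the pseudo-code above, offered as a sketch rather than a reference model. The `bits`/`signed` parameters stand in for the `int` type parameter, Python floats stand in for `f64`, and the rounding mode names (`"truncate"`, `"nearest_even"`, `"floor"`, `"ceil"`) are labels invented for this example.

```python
import math

def rint(v: float, rounding_mode: str) -> int:
    """Round a finite float to an integer under the named rounding mode."""
    if rounding_mode == "truncate":
        return math.trunc(v)
    if rounding_mode == "nearest_even":
        return round(v)  # Python rounds halfway cases to even
    if rounding_mode == "floor":
        return math.floor(v)
    if rounding_mode == "ceil":
        return math.ceil(v)
    raise ValueError(rounding_mode)

def fp_to_int_open_power(v, bits=32, signed=True, rounding_mode="truncate"):
    min_value = -(1 << (bits - 1)) if signed else 0
    max_value = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return min_value  # NaN becomes the minimum valid integer
    if v >= max_value:
        return max_value  # saturate high (covers +Infinity)
    if v <= min_value:
        return min_value  # saturate low (covers -Infinity)
    return rint(v, rounding_mode)

print(fp_to_int_open_power(float("nan")))
print(fp_to_int_open_power(1e10), fp_to_int_open_power(-1e10))
```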
 357
 358 <div id="fp-to-int-rust-conversion-semantics"></div>
 359 Rust [conversion semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics) (with adjustment to add non-truncate rounding modes):
 360
```
def fp_to_int_rust<fp, int>(v: fp) -> int:
    if v is NaN:
        return 0
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```
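
The only difference from the OpenPower semantics is the NaN result. The Python sketch below makes that concrete; it models just two rounding modes under invented names, and uses an 8-bit unsigned target in the usage lines to mirror well-known results of Rust's `as` operator (`300.0 as u8` is 255, `-1.0 as u8` is 0, `f32::NAN as u8` is 0).

```python
import math

def fp_to_int_rust(v, bits=32, signed=True, rounding_mode="truncate"):
    min_value = -(1 << (bits - 1)) if signed else 0
    max_value = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return 0  # the one difference from the OpenPower semantics
    if v >= max_value:
        return max_value
    if v <= min_value:
        return min_value
    # "truncate" matches Rust's `as`; "nearest_even" is the added adjustment
    return math.trunc(v) if rounding_mode == "truncate" else round(v)

print(fp_to_int_rust(300.0, bits=8, signed=False))
print(fp_to_int_rust(-1.0, bits=8, signed=False))
print(fp_to_int_rust(float("nan"), bits=8, signed=False))
```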
 371
 372 <div id="fp-to-int-javascript-conversion-semantics"></div>
 373 Section 7.1 of the ECMAScript / JavaScript
 374 [conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):
 375
```
def fp_to_int_java_script<fp, int>(v: fp) -> int:
    if v is NaN or infinite:
        return 0
    v = rint(v, rounding_mode)
    v = v mod int::VALUE_COUNT  # 2^32 for i32, 2^64 for i64, result is non-negative
    bits = (uint)v
    return (int)bits
```
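
The modular (wrapping) behaviour is easiest to see with concrete values. The following Python sketch follows the pseudo-code above, again with `bits`/`signed` standing in for the `int` type parameter and with only two rounding modes under invented names. It relies on Python's `%` returning a non-negative result for a positive modulus.

```python
import math

def fp_to_int_java_script(v, bits=32, signed=True, rounding_mode="truncate"):
    if math.isnan(v) or math.isinf(v):
        return 0
    r = math.trunc(v) if rounding_mode == "truncate" else round(v)
    r %= 1 << bits  # wrap into 0 .. 2^bits - 1 (non-negative in Python)
    if signed and r >= 1 << (bits - 1):
        r -= 1 << bits  # reinterpret the unsigned bits as two's complement
    return r

print(fp_to_int_java_script(4294967301.0))  # 2^32 + 5 wraps around
print(fp_to_int_java_script(2147483648.0))  # 2^31 wraps to the minimum
print(fp_to_int_java_script(-1.0, signed=False))
```

With the default truncate mode and a signed 32-bit target this corresponds to JavaScript's `x | 0`; the unsigned case corresponds to `x >>> 0`.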
 385
 386 # Equivalent OpenPower ISA v3.0 Assembly Language for FP -> Integer Conversion Modes
 387
 388 Moved to [[int_fp_mv/appendix]]