openpower/sv/int_fp_mv.mdwn

[[!tag standards]]

# FPR-to-GPR and GPR-to-FPR

**Draft Status** under development, for submission as an RFC

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
* [[int_fp_mv/appendix]]

Introduction:

High-performance CPU/GPU software often needs to convert between integers
and floating-point, so fast conversion/data-movement instructions
are needed.  Also, given that initialisation of floats tends to take up
considerable space (even just to load 0.0), the inclusion of a compact
format float immediate is up for consideration, using BF16 as a base.

Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS), i.e. it is not implementing VMX/VSX,
and with its focus on modern 3D GPU hybrid workloads represents an
important new potential use-case for OpenPOWER.

Prior to the formation of the Compliancy Levels first introduced
in v3.0C and v3.1,
the progressive historic development of the Scalar parts of the Power ISA assumed
that VSX would always be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs.  For a 3D GPU this kills any modern competitive edge.
Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.

In addition, the vast majority of GPR <-> FPR data-transfers are part
of a FP <-> Integer conversion sequence, so reducing the number
of instructions required to the minimum seems necessary.

Therefore, we are proposing adding:

* FPR load-immediate using `BF16` as the constant
* FPR <-> GPR data-transfer instructions that just copy bits without conversion
* FPR <-> GPR combined data-transfer/conversion instructions that do
  Integer <-> FP conversions

If adding new Integer <-> FP conversion instructions,
the opportunity may be taken to modernise the instructions and make them
well-suited for common/important conversion sequences:

* **standard IEEE754** - used by most languages and CPUs
* **standard OpenPOWER** - saturation with NaN
  converted to minimum valid integer
* **Java** - saturation with NaN converted to 0
* **JavaScript** - modulo wrapping with Inf/NaN converted to 0

The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: the JavaScript conversion
takes 32 scalar instructions, including seven branch instructions.

## FP -> Integer conversions

Different programming languages turn out to have completely different
semantics for FP to Integer conversion.  This section gives an overview
of the different variants, listing the languages and hardware that
implement each variant.

### standard IEEE754 conversion

This conversion is outlined in the IEEE754 specification.  It is used
by nearly all programming languages and CPUs.  In the case of OpenPOWER,
the rounding mode is read from FPSCR.

### standard OpenPower conversion

This conversion, instead of exact IEEE754 compliance, performs
"saturation with NaN converted to minimum valid integer". This
is also exactly the same as the x86 ISA conversion semantics.
OpenPOWER however has instructions for both:

* rounding mode read from FPSCR
* rounding mode always set to truncate

### Java conversion

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Java (and by Rust's `as` operator) will be referred to as
[Java conversion semantics](#fp-to-int-java-conversion-semantics).

Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method):

* Java's
  [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
* Rust's FP -> Integer conversion using the
  [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
* LLVM's
  [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
  [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
* SPIR-V's OpenCL dialect's
  [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
  [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
  instructions when decorated with
  [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).

### JavaScript conversion

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).

This conversion is available in ARM assembler as the `FJCVTZS` instruction:
<https://developer.arm.com/documentation/dui0801/g/hko1477562192868>

### Other languages

TODO: review and investigate other language semantics

# Proposed New Scalar Instructions

All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR.  All integers however are sourced/stored in the *GPR*.

Integer operands and results being in the GPR is the key differentiator
(the entire rationale) between the proposed instructions and the existing Scalar Power ISA.
In all existing Power ISA Scalar conversion instructions, all
operands are FPRs, even if the format of the source or destination
data is actually a scalar integer.

Note that source and destination widths can be overridden by SimpleV
SVP64, and that SVP64 also has Saturation Modes *in addition*
to those independently described here. SVP64 Overrides and Saturation
work on *both* Fixed *and* Floating Point operands and results.
The interactions with SVP64
are explained in the [[int_fp_mv/appendix]].

## FPR to GPR moves

* `fmvtg RT, FRA`
* `fmvtg. RT, FRA`

move a 64-bit float from a FPR to a GPR, just copying bits directly.
As a direct bitcopy, no exceptions occur and no status flags are set.

Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
operations.

* `fmvtgs RT, FRA`
* `fmvtgs. RT, FRA`

move a 32-bit float from a FPR to a GPR, just copying bits. Converts the
64-bit float in `FRA` to a 32-bit float, then writes the 32-bit float to
`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
and therefore has the exact same exception and flags behaviour as `frsp`.

Unlike `frsp` however, with RT being a GPR, Rc=1 follows
standard *integer* behaviour, i.e. tests RT and sets CR0.

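As a rough illustration of the two moves above, the following Python sketch models their bit-level behaviour with the standard `struct` module. The function names mirror the proposed mnemonics, but this is only a model under stated assumptions: it always rounds to nearest-even instead of reading FPSCR, it does not model exception flags, and it assumes that `fmvtgs` leaves the upper 32 bits of `RT` zero (not specified here). The `cr0` helper is hypothetical and covers only the LT/GT/EQ outcome of Rc=1.

```python
import struct

def fmvtg(fra: float) -> int:
    """Copy the 64 bits of a double into a 64-bit GPR image, unchanged."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", fra))
    return bits

def fmvtgs(fra: float) -> int:
    """Round to single precision (as frsp would), then copy the 32-bit pattern.

    Assumptions: round-to-nearest-even, upper 32 bits of RT are zero.
    struct raises OverflowError for finite values outside the single range,
    where real hardware would produce an infinity instead.
    """
    (bits,) = struct.unpack(">I", struct.pack(">f", fra))
    return bits

def cr0(rt: int) -> str:
    """Hypothetical Rc=1 model: compare RT, as a signed 64-bit integer, with 0."""
    signed = rt - (1 << 64) if rt & (1 << 63) else rt
    return "LT" if signed < 0 else "GT" if signed > 0 else "EQ"

print(hex(fmvtg(1.0)))     # 0x3ff0000000000000
print(hex(fmvtgs(1.0)))    # 0x3f800000
print(cr0(fmvtg(-0.0)))    # LT
```

The last line shows one consequence of using integer Rc=1 behaviour: `-0.0` has only its sign bit set, so as an integer it tests as negative rather than as zero.
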
## GPR to FPR moves

`fmvfg FRT, RA`

move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions
are raised and no flags of any kind are altered.

Rc=1 tests FRT and sets CR1.

`fmvfgs FRT, RA`

move a 32-bit float from a GPR to a FPR, just copying bits. Converts the
32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
therefore has the exact same exception and flags behaviour as `frsp`.

Rc=1 tests FRT and sets CR1.

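The reverse direction can be sketched the same way. This Python model is illustrative only: the function names mirror the proposed mnemonics, flags and CR1 are not modelled, and it assumes that `fmvfgs` ignores the upper 32 bits of `RA`, which is not stated above.

```python
import struct

def fmvfg(ra: int) -> float:
    """Reinterpret the 64 bits of a GPR as a double, unchanged."""
    (value,) = struct.unpack(">d", struct.pack(">Q", ra & 0xFFFFFFFFFFFFFFFF))
    return value

def fmvfgs(ra: int) -> float:
    """Reinterpret the low 32 bits of a GPR as a single, widened to double.

    Assumption: the upper 32 bits of RA are ignored.
    """
    (value,) = struct.unpack(">f", struct.pack(">I", ra & 0xFFFFFFFF))
    return value

print(fmvfg(0x3FF0000000000000))   # 1.0
print(fmvfgs(0xBFC00000))          # -1.5
```

Every 32-bit float value is exactly representable as a 64-bit float, so the widening step in this model needs no rounding mode.
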
TODO: clear statement on evaluation as to whether exceptions or flags are raised as part of the **FP** conversion (not the int bitcopy part, the conversion part; the semantics should really be the same as `frsp`).

v3.0C section 4.6.7.1 states:

FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.

    Special Registers Altered:
      FPRF FR FI
      FX OX UX XX VXSNAN
      CR1 (if Rc=1)

## Float load immediate <a name="fmvis"></a>

This is like a variant of `fmvfg`

`fmvis FRT, FI`

Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
64-bit float and written to `FRT`.  This is equivalent to reinterpreting
`FI` as a `BF16` and converting to 64-bit float.

There is no need for an Rc=1 variant because this is an immediate loading
instruction. This frees up one extra bit in the X-Form format for packing
a full `BF16`.

Example:

```
# clearing a FPR
fmvis f4, 0 # writes +0.0 to f4
# loading handy constants
fmvis f4, 0x8000 # writes -0.0 to f4
fmvis f4, 0x3F80 # writes +1.0 to f4
fmvis f4, 0xBF80 # writes -1.0 to f4
fmvis f4, 0xBFC0 # writes -1.5 to f4
fmvis f4, 0x7FC0 # writes +qNaN to f4
fmvis f4, 0x7F80 # writes +Infinity to f4
fmvis f4, 0xFF80 # writes -Infinity to f4
fmvis f4, 0x3FFF # writes +1.9921875 to f4

# clearing 128 FPRs with 2 SVP64 instructions
# by issuing 32 vec4 (subvector length 4) ops
setvli VL=MVL=32
sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127
```

Important: If the float load immediate instruction(s) are left out,
change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
to instead write `+0.0` if `RA` is register `0`, at least
allowing clearing FPRs.

`fmvis` fits well with DX-Form:

|  0-5   | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
|--------|------|-------|-------|-------|-----|---------|
|  Major | FRT  | d1    | d0    | XO    | d2  | DX-Form |

    bf16 = d0 || d1 || d2
    fp32 = bf16 || [0]*16
    FRT = Single_to_Double(fp32)

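The immediate decoding above can be checked with a small Python model. This is a sketch rather than a reference implementation: `fmvis` mirrors the proposed mnemonic, while `dx_fields` is a hypothetical helper that splits a 16-bit immediate into the `d0`, `d1` and `d2` fields following `bf16 = d0 || d1 || d2`.

```python
import struct

def fmvis(fi: int) -> float:
    """Model of fmvis: treat a 16-bit immediate as BF16, widen to double."""
    if not 0 <= fi <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    fp32_bits = fi << 16  # fp32 = bf16 || [0]*16
    # Reinterpret the pattern as a 32-bit float; Python then holds it
    # as a 64-bit float, which is an exact widening.
    (value,) = struct.unpack(">f", struct.pack(">I", fp32_bits))
    return value

def dx_fields(fi: int) -> tuple:
    """Hypothetical helper: split the immediate into (d0, d1, d2)."""
    d0 = (fi >> 6) & 0x3FF  # most significant 10 bits
    d1 = (fi >> 1) & 0x1F   # next 5 bits
    d2 = fi & 0x1           # least significant bit
    return d0, d1, d2

for imm in (0x0000, 0x3F80, 0xBFC0, 0x3FFF):
    print(f"{imm:#06x} -> {fmvis(imm)}")
```

The loop reproduces several of the constants from the assembly example, which gives a quick sanity check on the BF16 interpretation.
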
## FPR to GPR conversions

<div id="fpr-to-gpr-conversion-mode"></div>

X-Form:

|  0-5   | 6-10 | 11-15  | 16-20 | 21-30 | 31 |
|--------|------|--------|-------|-------|----|
|  Major | RT   | //Mode | FRA   | XO    | Rc |
|  Major | FRT  | //Mode | RA    | XO    | Rc |

Mode values:

| Mode | `rounding_mode` | Semantics                        |
|------|-----------------|----------------------------------|
| 000  | from `FPSCR`    | [OpenPower semantics]            |
| 001  | Truncate        | [OpenPower semantics]            |
| 010  | from `FPSCR`    | [Java semantics]                 |
| 011  | Truncate        | [Java semantics]                 |
| 100  | from `FPSCR`    | [JavaScript semantics]           |
| 101  | Truncate        | [JavaScript semantics]           |
| rest | --              | illegal instruction trap for now |

[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
[Java semantics]: #fp-to-int-java-conversion-semantics
[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics

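Read as an encoding, the Mode table has a regular structure: the least significant bit selects truncation and the upper two bits select the semantics. The Python sketch below decodes it on that reading. The observation is derived from the table rather than stated by the proposal, and `decode_mode` is a hypothetical helper name.

```python
SEMANTICS = ("OpenPower", "Java", "JavaScript")

def decode_mode(mode):
    """Return (rounding_mode, semantics) for a 3-bit Mode value."""
    if not 0 <= mode <= 0b101:
        # The remaining encodings currently trap as illegal instructions.
        raise ValueError(f"reserved Mode value {mode}")
    rounding_mode = "Truncate" if mode & 1 else "FPSCR"
    return rounding_mode, SEMANTICS[mode >> 1]

print(decode_mode(0b011))  # ('Truncate', 'Java')
```
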
* `fcvttgw RT, FRA, Mode`
  Convert from 64-bit float to 32-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvttguw RT, FRA, Mode`
  Convert from 64-bit float to 32-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvttgd RT, FRA, Mode`
  Convert from 64-bit float to 64-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvttgud RT, FRA, Mode`
  Convert from 64-bit float to 64-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvtstgw RT, FRA, Mode`
  Convert from 32-bit float to 32-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvtstguw RT, FRA, Mode`
  Convert from 32-bit float to 32-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvtstgd RT, FRA, Mode`
  Convert from 32-bit float to 64-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvtstgud RT, FRA, Mode`
  Convert from 32-bit float to 64-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]

[mode `Mode`]: #fpr-to-gpr-conversion-mode

## GPR to FPR conversions

All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.

* `fcvtfgw FRT, RA`
  Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfgws FRT, RA`
  Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in
  `FRT`.
* `fcvtfguw FRT, RA`
  Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfguws FRT, RA`
  Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in
  `FRT`.
* `fcvtfgd FRT, RA`
  Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfgds FRT, RA`
  Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in
  `FRT`.
* `fcvtfgud FRT, RA`
  Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfguds FRT, RA`
  Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in
  `FRT`.

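The difference between the signed and unsigned variants is easiest to see on a register that holds all ones. The Python sketch below models only the four conversions to 64-bit float. It assumes that the 32-bit variants read the low 32 bits of `RA` (not stated above) and that the FPSCR rounding mode is round-to-nearest-even, which is what Python's `float()` applies to integers. The 32-bit float results are left out because going through a double first could round twice.

```python
def _as_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def fcvtfgw(ra):
    """32-bit signed integer -> double (always exact)."""
    return float(_as_signed(ra, 32))

def fcvtfguw(ra):
    """32-bit unsigned integer -> double (always exact)."""
    return float(ra & 0xFFFFFFFF)

def fcvtfgd(ra):
    """64-bit signed integer -> double (rounds when above 53 significant bits)."""
    return float(_as_signed(ra, 64))

def fcvtfgud(ra):
    """64-bit unsigned integer -> double (rounds when above 53 significant bits)."""
    return float(ra & 0xFFFFFFFFFFFFFFFF)

all_ones = 0xFFFFFFFFFFFFFFFF
print(fcvtfgw(all_ones), fcvtfguw(all_ones))
print(fcvtfgd(all_ones), fcvtfgud(all_ones))
```

Only the 64-bit sources can lose precision here: a double has 53 significant bits, so every 32-bit integer converts exactly.
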
# FP to Integer Conversion Pseudo-code

Key for pseudo-code:

| term                      | result type | definition                                                                                         |
|---------------------------|-------------|----------------------------------------------------------------------------------------------------|
| `fp`                      | --          | `f32` or `f64` (or other types from SimpleV)                                                       |
| `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                              |
| `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                |
| `int::BITS`               | `int`       | the bit-width of `int`                                                                             |
| `int::MIN_VALUE`          | `int`       | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed)                  |
| `int::MAX_VALUE`          | `int`       | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
| `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.           |
| `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`      |

<div id="fp-to-int-openpower-conversion-semantics"></div>
OpenPower conversion semantics (section A.2 page 999 (page 1023) of OpenPower ISA v3.1):

```
def fp_to_int_open_power<fp, int>(v: fp) -> int:
    if v is NaN:
        return int::MIN_VALUE
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```

<div id="fp-to-int-java-conversion-semantics"></div>
[Java conversion semantics](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
/
[Rust semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
(with adjustment to add non-truncate rounding modes):

```
def fp_to_int_java<fp, int>(v: fp) -> int:
    if v is NaN:
        return 0
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```

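The two saturating variants above differ only in what NaN converts to, which the following runnable Python sketch makes explicit. It is a model of the pseudo-code, not of any hardware: the integer type is described by hypothetical `bits` and `signed` parameters, `rounding_mode` is passed explicitly, and only two rounding modes are modelled.

```python
import math

def rint(v, rounding_mode):
    """Round a finite float to an exact Python int."""
    if rounding_mode == "truncate":
        return math.trunc(v)
    if rounding_mode == "nearest-even":
        return round(v)  # built-in round() rounds halves to even
    raise ValueError(f"unmodelled rounding mode: {rounding_mode}")

def int_range(bits, signed):
    """Return (MIN_VALUE, MAX_VALUE) of the integer type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

def fp_to_int_saturating(v, bits, signed, nan_is_zero, rounding_mode):
    """Shared body: saturate out-of-range inputs, then round."""
    min_value, max_value = int_range(bits, signed)
    if math.isnan(v):
        return 0 if nan_is_zero else min_value
    if v >= max_value:  # also catches +Infinity
        return max_value
    if v <= min_value:  # also catches -Infinity
        return min_value
    return rint(v, rounding_mode)

def fp_to_int_open_power(v, bits=32, signed=True, rounding_mode="truncate"):
    return fp_to_int_saturating(v, bits, signed, False, rounding_mode)

def fp_to_int_java(v, bits=32, signed=True, rounding_mode="truncate"):
    return fp_to_int_saturating(v, bits, signed, True, rounding_mode)

nan = float("nan")
print(fp_to_int_open_power(nan), fp_to_int_java(nan))
print(fp_to_int_java(1e10), fp_to_int_java(-1.7))
```

Python compares floats against arbitrary-precision integers exactly, so the range checks do not suffer from `int::MAX_VALUE` being unrepresentable in the float type.
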
<div id="fp-to-int-javascript-conversion-semantics"></div>
Section 7.1 of the ECMAScript / JavaScript
[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):

```
def fp_to_int_java_script<fp, int>(v: fp) -> int:
    if v is NaN or infinite:
        return 0
    v = rint(v, rounding_mode)
    v = v mod int::VALUE_COUNT  # 2^32 for i32, 2^64 for i64, result is non-negative
    bits = (uint)v
    return (int)bits
```

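For comparison with the saturating variants, here is a runnable Python sketch of the modulo-wrapping behaviour. It models the truncating case only (the one `ToInt32` itself uses), and the `bits` and `signed` parameters are a hypothetical stand-in for the `int` type of the pseudo-code.

```python
import math

def fp_to_int_java_script(v, bits=32, signed=True):
    """Modulo-wrapping conversion, truncating rounding only."""
    if math.isnan(v) or math.isinf(v):
        return 0
    value_count = 1 << bits
    # Python's % with a positive modulus always gives a non-negative result.
    wrapped = math.trunc(v) % value_count
    # Reinterpret the unsigned bit pattern as two's complement if signed.
    if signed and wrapped >= value_count // 2:
        wrapped -= value_count
    return wrapped

print(fp_to_int_java_script(2.0**32 + 5.0))   # 5
print(fp_to_int_java_script(2147483648.0))    # -2147483648
print(fp_to_int_java_script(1e10))            # 1410065408
```

Where the saturating semantics would clamp `1e10` to the maximum 32-bit value, this conversion keeps the low 32 bits of the truncated value instead.
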