openpower/sv/int_fp_mv.mdwn

[[!tag standards]]

# FPR-to-GPR and GPR-to-FPR

**Draft Status**: under development, for submission as an RFC

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
* [[int_fp_mv/appendix]]

Introduction:

High-performance CPU/GPU software often needs to convert between integers
and floating-point, so fast conversion/data-movement instructions
are needed.  Also, given that initialisation of floats tends to take up
considerable space (even just to load 0.0), the inclusion of a compact-format
float immediate, using BF16 as a base, is up for consideration.

Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS), i.e. it is not implementing VMX/VSX,
and with its focus on modern 3D GPU hybrid workloads represents an
important new potential use-case for OpenPOWER.

Prior to the formation of the Compliancy Levels first introduced
in v3.0C and v3.1,
the progressive historic development of the Scalar parts of the Power ISA assumed
that VSX would always be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs.  For a 3D GPU this kills any modern competitive edge.
Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.

In addition, the vast majority of GPR <-> FPR data-transfers are part
of a FP <-> Integer conversion sequence, so reducing the number
of instructions required to the minimum seems necessary.

Therefore, we are proposing adding:

* FPR load-immediate using `BF16` as the constant
* FPR <-> GPR data-transfer instructions that just copy bits without conversion
* FPR <-> GPR combined data-transfer/conversion instructions that do
  Integer <-> FP conversions

If new Integer <-> FP conversion instructions are added,
the opportunity may be taken to modernise the instructions and make them
well-suited for common/important conversion sequences:

* **standard IEEE754** - used by most languages and CPUs
* **standard OpenPOWER** - saturation with NaN
  converted to minimum valid integer
* **Java** - saturation with NaN converted to 0
* **JavaScript** - modulo wrapping with Inf/NaN converted to 0

The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: JavaScript takes 32
scalar instructions, including seven branch instructions.

# Proposed New Scalar Instructions

All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR.  All integers however are sourced/stored in the *GPR*.

Integer operands and results being in the GPR is the key differentiator
(the entire rationale) between the proposed instructions and the existing
Scalar Power ISA.
In all existing Power ISA Scalar conversion instructions, all
operands are FPRs, even if the format of the source or destination
data is actually a scalar integer.

Note that source and destination widths can be overridden by SimpleV
SVP64, and that SVP64 also has Saturation Modes *in addition*
to those independently described here. SVP64 Overrides and Saturation
work on *both* Fixed *and* Floating Point operands and results.
The interactions with SVP64
are explained in the [[int_fp_mv/appendix]].

# FPR to GPR moves

* `fmvtg RT, FRA`
* `fmvtg. RT, FRA`

Move a 64-bit float from a FPR to a GPR, just copying bits directly.
As a direct bitcopy, no exceptions occur and no status flags are set.

Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
operations.

* `fmvtgs RT, FRA`
* `fmvtgs. RT, FRA`

Move a 32-bit float from a FPR to a GPR, just copying bits. Converts the
64-bit float in `FRA` to a 32-bit float, then writes the 32-bit float to
`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
and therefore has exactly the same exception and flags behaviour as `frsp`.

Unlike `frsp` however, with RT being a GPR, Rc=1 follows
standard *integer* behaviour, i.e. tests RT and sets CR0.

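The bit-level effect of the two moves above can be sketched in Python. This is only an illustration under stated assumptions: the function names mirror the mnemonics, `cr0_of` is a hypothetical helper, `struct` rounds with the host's default rounding mode rather than the one in `FPSCR`, and no exception or status flags are modelled.

```python
import struct


def fmvtg(fra: float) -> int:
    """Return the raw 64-bit pattern of a double, with no conversion."""
    return struct.unpack("<Q", struct.pack("<d", fra))[0]


def fmvtgs(fra: float) -> int:
    """Round to single precision (the frsp step), then return the 32-bit pattern.

    Unlike hardware, struct raises OverflowError for finite values that are
    too large for a 32-bit float instead of setting status flags.
    """
    return struct.unpack("<I", struct.pack("<f", fra))[0]


def cr0_of(rt: int) -> str:
    """Hypothetical helper: LT/GT/EQ from a signed 64-bit test of RT (SO bit omitted)."""
    signed = rt - (1 << 64) if rt >> 63 else rt
    if signed < 0:
        return "LT"
    return "GT" if signed > 0 else "EQ"


print(hex(fmvtg(1.0)))    # raw double pattern of 1.0
print(hex(fmvtgs(-1.5)))  # raw single pattern of -1.5
```

In this sketch `cr0_of(fmvtg(-0.0))` yields `LT`, because the Rc=1 test is applied to the integer bit pattern in RT rather than to the floating-point value.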
# GPR to FPR moves

`fmvfg FRT, RA`

Move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions
are raised and no flags of any kind are altered.

Rc=1 tests FRT and sets CR1.

`fmvfgs FRT, RA`

Move a 32-bit float from a GPR to a FPR, just copying bits. Converts the
32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
therefore has exactly the same exception and flags behaviour as `frsp`.

Rc=1 tests FRT and sets CR1.

TODO: clear statement on evaluation as to whether exceptions or flags are raised as part of the **FP** conversion (not the int bitcopy part, the conversion part; the semantics should really be the same as `frsp`).

v3.0C section 4.6.7.1 states:

FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.

    Special Registers Altered:
      FPRF FR FI
      FX OX UX XX VXSNAN
      CR1 (if Rc=1)

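A minimal Python sketch of the data path of the two GPR to FPR moves follows. It assumes that the 32-bit float occupies the low 32 bits of `RA` (the text above does not say which half is used), and it leaves out the flag and exception behaviour that the TODO above is still open on.

```python
import struct


def fmvfg(ra: int) -> float:
    """Reinterpret the 64-bit integer pattern in RA as a double."""
    return struct.unpack("<d", struct.pack("<Q", ra & 0xFFFFFFFFFFFFFFFF))[0]


def fmvfgs(ra: int) -> float:
    """Reinterpret the low 32 bits of RA (an assumption) as a single.

    Python floats are doubles, so the unpacked value is already widened
    to 64-bit format; that widening is exact.
    """
    return struct.unpack("<f", struct.pack("<I", ra & 0xFFFFFFFF))[0]


print(fmvfg(0x3FF0000000000000))
print(fmvfgs(0xBFC00000))
```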
# Float load immediate <a name="fmvis"></a>

This is like a variant of `fmvfg`.

`fmvis FRT, FI`

Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
64-bit float and written to `FRT`.  This is equivalent to reinterpreting
`FI` as a `BF16` and converting to 64-bit float.

There is no need for an Rc=1 variant because this is an immediate loading
instruction. This frees up one extra bit in the X-Form format for packing
a full `BF16`.

Example:

```
# clearing a FPR
fmvis f4, 0 # writes +0.0 to f4
# loading handy constants
fmvis f4, 0x8000 # writes -0.0 to f4
fmvis f4, 0x3F80 # writes +1.0 to f4
fmvis f4, 0xBF80 # writes -1.0 to f4
fmvis f4, 0xBFC0 # writes -1.5 to f4
fmvis f4, 0x7FC0 # writes +qNaN to f4
fmvis f4, 0x7F80 # writes +Infinity to f4
fmvis f4, 0xFF80 # writes -Infinity to f4
fmvis f4, 0x3FFF # writes +1.9921875 to f4

# clearing 128 FPRs with 2 SVP64 instructions
# by issuing 32 vec4 (subvector length 4) ops
setvli VL=MVL=32
sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127
```

Important: If the float load immediate instruction(s) are left out,
change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
to instead write `+0.0` if `RA` is register `0`, at least
allowing clearing FPRs.

`fmvis` fits well with DX-Form:

|  0-5   | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
|--------|------|-------|-------|-------|-----|---------|
|  Major | FRT  | d1    | d0    | XO    | d2  | DX-Form |

    bf16 = d0 || d1 || d2
    fp32 = bf16 || [0]*16
    FRT = Single_to_Double(fp32)

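The decoding steps above can be checked with a short Python sketch. The helper names are illustrative only, and the field widths (10 bits for `d0`, 5 for `d1`, 1 for `d2`) are read off the DX-Form table.

```python
import struct


def dx_fields_to_bf16(d0: int, d1: int, d2: int) -> int:
    """bf16 = d0 || d1 || d2, with d0 as the most significant 10 bits."""
    return ((d0 & 0x3FF) << 6) | ((d1 & 0x1F) << 1) | (d2 & 0x1)


def fmvis(fi: int) -> float:
    """fp32 = bf16 || [0]*16, then widen to double (exact)."""
    fp32_bits = (fi & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", fp32_bits))[0]


# Reproduce some of the constants from the assembly example.
for imm in (0x3F80, 0xBFC0, 0x3FFF):
    print(hex(imm), fmvis(imm))
```

Running it reproduces the `+1.0`, `-1.5` and `+1.9921875` values listed in the assembly example.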
# Conversions

Unlike the move instructions,
these instructions perform conversions between Integer and
Floating Point. Truncation can therefore occur, as well
as exceptions.

Mode values:

| Mode | `rounding_mode` | Semantics                        |
|------|-----------------|----------------------------------|
| 000  | from `FPSCR`    | [OpenPower semantics]            |
| 001  | Truncate        | [OpenPower semantics]            |
| 010  | from `FPSCR`    | [Java semantics]                 |
| 011  | Truncate        | [Java semantics]                 |
| 100  | from `FPSCR`    | [JavaScript semantics]           |
| 101  | Truncate        | [JavaScript semantics]           |
| rest | --              | illegal instruction trap for now |

[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
[Java semantics]: #fp-to-int-java-conversion-semantics
[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics

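Reading the table, the low bit of `Mode` appears to select the rounding source and the upper two bits the semantics. A small Python sketch of that observation follows. `decode_mode` is a hypothetical name, and treating `Mode` as exactly three bits is an assumption taken from the table.

```python
def decode_mode(mode: int):
    """Map a 3-bit Mode value to (rounding_mode, semantics) as in the table."""
    if not 0 <= mode <= 0b111:
        raise ValueError("Mode is assumed to be a 3-bit value")
    if mode > 0b101:
        # The 'rest' row: illegal instruction trap for now.
        raise ValueError("reserved Mode value")
    semantics = ("OpenPower", "Java", "JavaScript")[mode >> 1]
    rounding_mode = "Truncate" if mode & 1 else "from FPSCR"
    return rounding_mode, semantics


for mode in range(6):
    print(format(mode, "03b"), decode_mode(mode))
```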
## GPR to FPR conversions

**Format**

|  0-5   | 6-10 | 11-15  | 16-25 | 26-30 | 31 | Form |
|--------|------|--------|-------|-------|----|------|
|  Major | FRT  | //Mode | RA    | XO    | Rc |X-Form|

All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.

* `fcvtfgw FRT, RA`
  Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfgws FRT, RA`
  Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in
  `FRT`.
* `fcvtfguw FRT, RA`
  Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfguws FRT, RA`
  Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in
  `FRT`.
* `fcvtfgd FRT, RA`
  Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfgds FRT, RA`
  Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in
  `FRT`.
* `fcvtfgud FRT, RA`
  Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfguds FRT, RA`
  Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in
  `FRT`.

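As a rough illustration of the 32-bit source variants listed above, here is a Python sketch with three assumptions: the integer is taken from the low 32 bits of `RA`, rounding is round-to-nearest-even (the host default) rather than whatever `FPSCR` selects, and flags are not modelled. The 64-bit source variants that produce a 32-bit float are deliberately left out, since going through a double first would round twice.

```python
import struct


def _low32(ra: int, signed: bool) -> int:
    """Extract the low 32 bits of RA (an assumption), optionally sign-extended."""
    value = ra & 0xFFFFFFFF
    if signed and value & 0x80000000:
        value -= 1 << 32
    return value


def _round_to_single(x: float) -> float:
    """Round a double to the nearest 32-bit float, returned as a double."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fcvtfgw(ra: int) -> float:
    return float(_low32(ra, signed=True))    # exact: 32 bits fit in a double


def fcvtfguw(ra: int) -> float:
    return float(_low32(ra, signed=False))   # exact


def fcvtfgws(ra: int) -> float:
    return _round_to_single(fcvtfgw(ra))     # single rounding step


def fcvtfguws(ra: int) -> float:
    return _round_to_single(fcvtfguw(ra))    # single rounding step


print(fcvtfgw(0xFFFFFFFF), fcvtfguw(0xFFFFFFFF), fcvtfguws(0xFFFFFFFF))
```

The same bit pattern gives different results depending on the signed and unsigned interpretation, and the `s` variants can lose precision because a 32-bit float has only 24 significand bits.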
## FPR to GPR (Integer) conversions

<div id="fpr-to-gpr-conversion-mode"></div>

Different programming languages turn out to have completely different
semantics for FP to Integer conversion.  Below is an overview
of the different variants, listing the languages and hardware that
implement each variant.

**Standard IEEE754 conversion**

This conversion is outlined in the IEEE754 specification.  It is used
by nearly all programming languages and CPUs.  In the case of OpenPOWER,
the rounding mode is read from FPSCR.

**Standard OpenPower conversion**

This conversion, instead of exact IEEE754 Compliance, performs
"saturation with NaN converted to minimum valid integer". This
is also exactly the same as the x86 ISA conversion semantics.
OpenPOWER however has instructions for both:

* rounding mode read from FPSCR
* rounding mode always set to truncate

**Java conversion**

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Java (and by Rust's `as` operator) will be referred to as
[Java conversion semantics](#fp-to-int-java-conversion-semantics).

Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method):

* Java's
  [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
* Rust's FP -> Integer conversion using the
  [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
* LLVM's
  [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
  [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
* SPIR-V's OpenCL dialect's
  [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
  [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
  instructions when decorated with
  [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).

**JavaScript conversion**

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).

This conversion is present in ARM assembler as the `FJCVTZS` instruction:
<https://developer.arm.com/documentation/dui0801/g/hko1477562192868>

**Format**

|  0-5   | 6-10 | 11-15  | 16-25 | 26-30 | 31 | Form |
|--------|------|--------|-------|-------|----|------|
|  Major | RT   | //Mode | FRA   | XO    | Rc |X-Form|

**Instructions**

* `fcvttgw RT, FRA, Mode`
  Convert from 64-bit float to 32-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttguw RT, FRA, Mode`
  Convert from 64-bit float to 32-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttgd RT, FRA, Mode`
  Convert from 64-bit float to 64-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttgud RT, FRA, Mode`
  Convert from 64-bit float to 64-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgw RT, FRA, Mode`
  Convert from 32-bit float to 32-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstguw RT, FRA, Mode`
  Convert from 32-bit float to 32-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgd RT, FRA, Mode`
  Convert from 32-bit float to 64-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgud RT, FRA, Mode`
  Convert from 32-bit float to 64-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].

[mode `Mode`]: #fpr-to-gpr-conversion-mode

## FP to Integer Conversion Pseudo-code

Key for pseudo-code:

| term                      | result type | definition                                                                                         |
|---------------------------|-------------|----------------------------------------------------------------------------------------------------|
| `fp`                      | --          | `f32` or `f64` (or other types from SimpleV)                                                       |
| `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                              |
| `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                |
| `int::BITS`               | `int`       | the bit-width of `int`                                                                             |
| `int::MIN_VALUE`          | `int`       | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed)                  |
| `int::MAX_VALUE`          | `int`       | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
| `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.           |
| `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`      |

<div id="fp-to-int-openpower-conversion-semantics"></div>

OpenPower conversion semantics (section A.2 page 999 (page 1023) of OpenPower ISA v3.1):

```
def fp_to_int_open_power<fp, int>(v: fp) -> int:
    if v is NaN:
        return int::MIN_VALUE
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```

<div id="fp-to-int-java-conversion-semantics"></div>

[Java conversion semantics](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
/
[Rust semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
(with adjustment to add non-truncate rounding modes):

```
def fp_to_int_java<fp, int>(v: fp) -> int:
    if v is NaN:
        return 0
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```
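The OpenPower and Java pseudo-code differ only in the value returned for NaN, which a runnable Python sketch makes easy to see. The names `int_limits` and `fp_to_int_saturating`, the keyword parameters, and the rounding-mode strings are all illustrative assumptions; status flags and exceptions are not modelled.

```python
import math


def rint(v: float, rounding_mode: str) -> int:
    """Round a finite float to an integer; the mode names are illustrative."""
    rounders = {
        "truncate": math.trunc,
        "nearest-even": round,   # Python's round() ties to even
        "floor": math.floor,
        "ceil": math.ceil,
    }
    return rounders[rounding_mode](v)


def int_limits(bits: int, signed: bool):
    """Return (MIN_VALUE, MAX_VALUE) for the target integer type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fp_to_int_saturating(v, bits, signed, rounding_mode, nan_result):
    lo, hi = int_limits(bits, signed)
    if math.isnan(v):
        return nan_result
    if v >= hi:          # also catches +Infinity
        return hi
    if v <= lo:          # also catches -Infinity
        return lo
    return rint(v, rounding_mode)


def fp_to_int_open_power(v, bits=32, signed=True, rounding_mode="truncate"):
    lo, _ = int_limits(bits, signed)
    return fp_to_int_saturating(v, bits, signed, rounding_mode, nan_result=lo)


def fp_to_int_java(v, bits=32, signed=True, rounding_mode="truncate"):
    return fp_to_int_saturating(v, bits, signed, rounding_mode, nan_result=0)


nan = float("nan")
print(fp_to_int_open_power(nan), fp_to_int_java(nan))
print(fp_to_int_open_power(1e10), fp_to_int_java(-1e10))
```

Because Python integers are unbounded, the final `(int)` cast of the pseudo-code is a no-op here; the range checks before rounding already guarantee the result fits.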

<div id="fp-to-int-javascript-conversion-semantics"></div>

Section 7.1 of the ECMAScript / JavaScript
[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):

```
def fp_to_int_java_script<fp, int>(v: fp) -> int:
    if v is NaN or infinite:
        return 0
    v = rint(v, rounding_mode)
    v = v mod int::VALUE_COUNT  # 2^32 for i32, 2^64 for i64, result is non-negative
    bits = (uint)v
    return (int)bits
```

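A runnable Python sketch of the JavaScript variant is given here for comparison with the saturating ones. The parameter names and the two rounding-mode strings are assumptions made for illustration, and only truncation and round-to-nearest-even are covered.

```python
import math


def fp_to_int_java_script(v, bits=32, signed=True, rounding_mode="truncate"):
    if math.isnan(v) or math.isinf(v):
        return 0
    rounders = {"truncate": math.trunc, "nearest-even": round}
    n = rounders[rounding_mode](v)   # exact Python integer
    n %= 1 << bits                   # non-negative, as the pseudo-code requires
    if signed and n >= 1 << (bits - 1):
        n -= 1 << bits               # reinterpret the uint bits as a signed int
    return n


print(fp_to_int_java_script(4294967301.0))  # 2^32 + 5 wraps around
print(fp_to_int_java_script(3e9))           # wraps into the negative range
print(fp_to_int_java_script(float("inf")))
```

Unlike the OpenPower and Java semantics, out-of-range inputs wrap modulo `2^int::BITS` instead of saturating, so `3e9` becomes a negative 32-bit value rather than `int::MAX_VALUE`.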