openpower/sv/int_fp_mv.mdwn

   1 [[!tag standards]]
   2
   3 Note on considered alternative naming schemes: we decided to switch to using the reduced mnemonic naming scheme (over some people's objections) since it would be 5 instructions instead of dozens, though we did consider trying to match PowerISA's existing naming scheme for the instructions rather than only for the instruction aliases. <https://bugs.libre-soc.org/show_bug.cgi?id=1015#c7>
   4
   5 # FPR-to-GPR and GPR-to-FPR
   6
   7 TODO special constants instruction (e, tau/N, ln 2, sqrt 2, etc.) -- exclude any constants available through fmvis
   8
   9 **Draft Status** under development, for submission as an RFC
  10
  11 Links:
  12
  13 * <https://bugs.libre-soc.org/show_bug.cgi?id=650>
  14 * <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
  15 * <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
  16 * <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
  17 * <https://bugs.libre-soc.org/show_bug.cgi?id=887> fmvis
  18 * <https://bugs.libre-soc.org/show_bug.cgi?id=1015> int-fp RFC
  19 * [[int_fp_mv/appendix]]
  20 * [[sv/rfc/ls002]] - `fmvis` and `fishmv` External RFC Formal Submission
  21 * [[sv/rfc/ls006]] - int-fp-mv External RFC Formal Submission
  22
  23 Trademarks:
  24
  25 * Rust is a Trademark of the Rust Foundation
  26 * Java and Javascript are Trademarks of Oracle
  27 * LLVM is a Trademark of the LLVM Foundation
  28 * SPIR-V is a Trademark of the Khronos Group
  29 * OpenCL is a Trademark of Apple, Inc.
  30
  31 Referring to these Trademarks within this document
  32 is by necessity, in order to put the semantics of each language
  33 into context, and is considered "fair use" under Trademark
  34 Law.
  35
  36 Introduction:
  37
High-performance CPU/GPU software often needs to convert between integers
and floating-point, so fast conversion/data-movement instructions
are needed.  Also, given that initialisation of floats tends to take up
considerable space (even just to load 0.0), two compact-format
float-immediate instructions using 16-bit immediates are up for
consideration. BF16 is one of the formats; a second instruction allows a
full-accuracy FP32 to be constructed.
  45
Libre-SOC will be compliant with the
**Scalar Fixed-Point + Floating-Point Subset** (SFFS), i.e. it is not
implementing VMX/VSX, and with its focus on modern 3D GPU hybrid workloads
represents an important new potential use-case for OpenPOWER.
  50
Prior to the formation of the Compliancy Levels first introduced
in v3.0C and v3.1,
the progressive historic development of the Scalar parts of the Power ISA assumed
that VSX would always be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require
a sequence of load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs.  For a modern 3D GPU this kills any possibility of a
competitive edge.
Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.
  64
  65 In addition, the vast majority of GPR <-> FPR data-transfers are as part
  66 of a FP <-> Integer conversion sequence, therefore reducing the number
  67 of instructions required is a priority.
  68
  69 Therefore, we are proposing adding:
  70
  71 * FPR load-immediate instructions, one equivalent to `BF16`, the
  72   other increasing accuracy to `FP32`
  73 * FPR <-> GPR data-transfer instructions that just copy bits without conversion
  74 * FPR <-> GPR combined data-transfer/conversion instructions that do
  75   Integer <-> FP conversions
  76
  77 If adding new Integer <-> FP conversion instructions,
  78 the opportunity may be taken to modernise the instructions and make them
  79 well-suited for common/important conversion sequences:
  80
  81 * **standard IEEE754** - used by most languages and CPUs
  82 * **standard OpenPOWER** - saturation with NaN
  83   converted to minimum valid integer
  84 * **Java** - saturation with NaN converted to 0
  85 * **JavaScript** - modulo wrapping with Inf/NaN converted to 0
  86
The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: JavaScript, the
worst case, is 32 scalar instructions including seven branch instructions.
  90
  91 # Proposed New Scalar Instructions
  92
All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to an FPR.  All integers, however, are sourced from and stored in the *GPR*.

Having integer operands and results in the GPR is the key differentiator
(and the entire rationale) of the proposed instructions compared to the
existing Scalar Power ISA.
In all existing Power ISA Scalar conversion instructions, all
operands are FPRs, even if the format of the source or destination
data is actually a scalar integer.
 100
 101 *(The existing Scalar instructions being FP-FP only is based on an assumption
 102 that VSX will be implemented, and VSX is not part of the SFFS Compliancy
 103 Level. An earlier version of the Power ISA used to have similar
 104 FPR<->GPR instructions to these:
 105 they were deprecated due to this incorrect assumption that VSX would
 106 always be present).*
 107
Note that source and destination widths can be overridden by SimpleV
SVP64, and that SVP64 also has Saturation Modes *in addition*
to those independently described here. SVP64 Overrides and Saturation
work on *both* Fixed *and* Floating Point operands and results.
The interactions with SVP64
are explained in the [[int_fp_mv/appendix]].
 114
 115 # Float load immediate  <a name="fmvis"></a>
 116
These instructions behave like a combination of `fmvfg` and `oris`.
Power ISA currently requires a large
number of instructions to get Floating Point constants into registers.
`fmvis` on its own is equivalent to BF16 to FP32/64 conversion,
but if followed up by `fishmv` an additional 16 bits of accuracy in the
mantissa may be achieved.
 123
 124 These instructions **always** save
 125 resources compared to FP-load for exactly the same reason
 126 that `li` saves resources: an L1-Data-Cache and memory read
 127 is avoided.
 128
 129 *IBM may consider it worthwhile to extend these two instructions to
 130 v3.1 Prefixed (`pfmvis` and `pfishmv`: 8RR, imm0 extended).
 131 If so it is recommended that
 132 `pfmvis` load a full FP32 immediate and `pfishmv` supplies the three high
 133 missing exponent bits (numbered 8 to 10) and the lower additional
 134 29 mantissa bits (23 to 51) needed to construct a full FP64 immediate.
 135 Strictly speaking the sequence `fmvis fishmv pfishmv` achieves the
 136 same effect in the same number of bytes as `pfmvis pfishmv`,
 137 making `pfmvis` redundant.*
 138
Just as Floating-point Load does not set FP Flags, neither does `fmvis` nor `fishmv`.
As `fishmv` is specifically intended to work in conjunction with `fmvis`
to provide additional accuracy, all bits other than those which
would have been set by a prior `fmvis` instruction are deliberately ignored.
(If these instructions involved reading from registers rather than immediates
it would be a different story).
 145
 146 ## Load BF16 Immediate
 147
 148 `fmvis FRS, D`
 149
 150 Reinterprets `D << 16` as a 32-bit float, which is then converted to a
 151 64-bit float and written to `FRS`.  This is equivalent to reinterpreting
 152 `D` as a `BF16` and converting to 64-bit float.
 153 There is no need for an Rc=1 variant because this is an immediate loading
 154 instruction.
 155
 156 Example:
 157
 158 ```
 159 # clearing a FPR
 160 fmvis f4, 0 # writes +0.0 to f4
 161 # loading handy constants
 162 fmvis f4, 0x8000 # writes -0.0 to f4
 163 fmvis f4, 0x3F80 # writes +1.0 to f4
 164 fmvis f4, 0xBF80 # writes -1.0 to f4
 165 fmvis f4, 0xBFC0 # writes -1.5 to f4
 166 fmvis f4, 0x7FC0 # writes +qNaN to f4
 167 fmvis f4, 0x7F80 # writes +Infinity to f4
 168 fmvis f4, 0xFF80 # writes -Infinity to f4
 169 fmvis f4, 0x3FFF # writes +1.9921875 to f4
 170
 171 # clearing 128 FPRs with 2 SVP64 instructions
 172 # by issuing 32 vec4 (subvector length 4) ops
 173 setvli VL=MVL=32
 174 sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127
 175 ```
 176 Important: If the float load immediate instruction(s) are left out,
 177 change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
 178 to instead write `+0.0` if `RA` is register `0`, at least
 179 allowing clearing FPRs.
 180
 181 `fmvis` fits with DX-Form:
 182
 183 |  0-5   | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
 184 |--------|------|-------|-------|-------|-----|---------|
 185 |  Major | FRS  | d1    | d0    | XO    | d2  | DX-Form |
 186
 187 Pseudocode:
 188
 189     bf16 = d0 || d1 || d2 # create BF16 immediate
 190     fp32 = bf16 || [0]*16 # convert BF16 to FP32
 191     FRS = DOUBLE(fp32)    # convert FP32 to FP64
 192
 193 Special registers altered:
 194
 195     None
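
The `fmvis` pseudocode above can be modelled in a few lines of Python. This is only a sketch: the helper names `split_dx_immediate` and `fmvis` are illustrative and not part of any toolchain, and the field split assumes the DX-Form layout shown in the table above (`d0` 10 bits, `d1` 5 bits, `d2` 1 bit). A Python `float` stands in for the FP64 contents of `FRS`.

```python
import struct

def split_dx_immediate(d):
    """Split a 16-bit immediate D into the DX-Form fields (d0, d1, d2)."""
    d &= 0xFFFF
    d0 = d >> 6            # upper 10 bits
    d1 = (d >> 1) & 0x1F   # middle 5 bits
    d2 = d & 1             # lowest bit
    return d0, d1, d2

def fmvis(d0, d1, d2):
    """Sketch of fmvis: return the value written to FRS as a Python float."""
    bf16 = (d0 << 6) | (d1 << 1) | d2   # bf16 = d0 || d1 || d2
    fp32_bits = bf16 << 16              # fp32 = bf16 || [0]*16
    # Reinterpret the 32 bits as an FP32; Python widens it to FP64 exactly.
    (value,) = struct.unpack(">f", struct.pack(">I", fp32_bits))
    return value

for imm in (0x3F80, 0xBFC0, 0x3FFF):
    print(hex(imm), fmvis(*split_dx_immediate(imm)))
```

The three immediates are taken from the example listing above and should reproduce the values given in its comments.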
 196
 197 ## Float Immediate Second-Half MV <a name="fishmv"></a>
 198
 199 `fishmv FRS, D`
 200
 201 DX-Form:
 202
 203 |  0-5   | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
 204 |--------|------|-------|-------|-------|-----|---------|
 205 |  Major | FRS  | d1    | d0    | XO    | d2  | DX-Form |
 206
 207 Strategically similar to how `oris` is used to construct
 208 32-bit Integers, an additional 16-bits of immediate is
 209 inserted into `FRS` to extend its accuracy to
 210 a full FP32 (stored as usual in FP64 Format within the FPR).
 211 If a prior `fmvis` instruction had been used to
 212 set the upper 16-bits of an FP32 value, `fishmv` contains the
 213 lower 16-bits.
 214
 215 The key difference between using `li` and `oris` to construct 32-bit
 216 GPR Immediates and `fishmv` is that the `fmvis` will have converted
 217 the `BF16` immediate to FP64 (Double) format.
 218 This is taken into consideration
 219 as can be seen in the pseudocode below.
 220
 221 Pseudocode:
 222
 223     fp32 <- SINGLE((FRS))            # convert to FP32
 224     fp32[16:31] <- d0 || d1 || d2    # replace LSB half
 225     FRS <- DOUBLE(fp32)              # convert back to FP64
 226
 227 Special registers altered:
 228
 229     None
 230
 231 **This instruction performs a Read-Modify-Write.** *FRS is read, the additional
 232 16 bit immediate inserted, and the result also written to FRS*
 233
 234 Example:
 235
 236 ```
 237 # these two combined instructions write 0x3f808000
 238 # into f4 as an FP32 to be converted to an FP64.
 239 # actual contents in f4 after conversion: 0x3ff0_1000_0000_0000
 240 # first the upper bits, happens to be +1.0
 241 fmvis f4, 0x3F80 # writes +1.0 to f4
 242 # now write the lower 16 bits of an FP32
 243 fishmv f4, 0x8000 # writes +1.00390625 to f4
 244 ```
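
The read-modify-write behaviour can be checked with a small Python sketch. The function names are illustrative only, and the model assumes `FRS` already holds a value exactly representable as FP32 (as it would after a prior `fmvis`), so that `struct` packing stands in for `SINGLE` and `DOUBLE` without any rounding.

```python
import struct

def fmvis(d):
    """Illustrative helper: BF16 immediate -> FP64 value (Python float)."""
    (value,) = struct.unpack(">f", struct.pack(">I", (d & 0xFFFF) << 16))
    return value

def fishmv(frs, d):
    """Sketch of fishmv: replace the low 16 bits of the FP32 form of FRS."""
    (fp32_bits,) = struct.unpack(">I", struct.pack(">f", frs))    # SINGLE((FRS))
    fp32_bits = (fp32_bits & 0xFFFF0000) | (d & 0xFFFF)           # fp32[16:31] <- D
    (value,) = struct.unpack(">f", struct.pack(">I", fp32_bits))  # DOUBLE(fp32)
    return value

def fp64_bits(value):
    """Raw 64-bit pattern of a Python float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return bits

f4 = fishmv(fmvis(0x3F80), 0x8000)
print(f4, hex(fp64_bits(f4)))  # prints: 1.00390625 0x3ff0100000000000
```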
 245
 246 # Immediate Tables
 247
 248 Tables that are used by `fmvtg`/`fmvfg`/`fcvttg`/`fcvtfg`:
 249
 250 ## `RCS` -- `Rc` and `s`
 251
 252 | `RCS` | `Rc` | FP Single Mode | Assembly Alias Mnemonic |
 253 |-------|------|----------------|-------------------------|
 254 | 0     | 0    | Double         | `<op>`                  |
 255 | 1     | 1    | Double         | `<op>.`                 |
 256 | 2     | 0    | Single         | `<op>s`                 |
 257 | 3     | 1    | Single         | `<op>s.`                |
 258
 259 ## `IT` -- Integer Type
 260
 261 | `IT` | Integer Type    | Assembly Alias Mnemonic |
 262 |------|-----------------|-------------------------|
 263 | 0    | Signed 32-bit   | `<op>w`                 |
 264 | 1    | Unsigned 32-bit | `<op>uw`                |
 265 | 2    | Signed 64-bit   | `<op>d`                 |
 266 | 3    | Unsigned 64-bit | `<op>ud`                |
 267
 268 ## `CVM` -- Float to Integer Conversion Mode
 269
 270 | `CVM` | `rounding_mode` | Semantics                        |
 271 |-------|-----------------|----------------------------------|
 272 | 000   | from `FPSCR`    | [OpenPower semantics]            |
 273 | 001   | Truncate        | [OpenPower semantics]            |
 274 | 010   | from `FPSCR`    | [Java semantics]                 |
 275 | 011   | Truncate        | [Java semantics]                 |
 276 | 100   | from `FPSCR`    | [JavaScript semantics]           |
 277 | 101   | Truncate        | [JavaScript semantics]           |
 278 | rest  | --              | illegal instruction trap for now |
 279
 280 [OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
 281 [Java semantics]: #fp-to-int-java-conversion-semantics
 282 [JavaScript semantics]: #fp-to-int-javascript-conversion-semantics
 283
 284 # Moves
 285
 286 These instructions perform a straight unaltered bit-level copy from one Register
 287 File to another.
 288
 289 ## FPR to GPR move
 290
 291 `fmvtg RT, FRB, RCS`
 292
 293 | 0-5 | 6-10 | 11-15 | 16-20 | 21-29 | 30-31 | Form   |
 294 |-----|------|-------|-------|-------|-------|--------|
 295 | PO  | RT   | 0     | FRB   | XO    | RCS   | X-Form |
 296
 297 ```
 298 if RCS[0] = 1 then  # if Single mode
 299     RT <- [0] * 32 || SINGLE((FRB))  # SINGLE since that's what stfs uses
 300 else
 301     RT <- (FRB)
 302 ```
 303
Moves a 32/64-bit float from an FPR to a GPR, copying the bits of the IEEE 754 representation directly. This is equivalent to `stfs` followed by `lwz` (Single mode) or to `stfd` followed by `ld` (Double mode).
As `fmvtg` just copies bits, `FPSCR` is not affected in any way.
 306
 307 Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
 308 operations.
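
As a rough illustration of the bit-copy semantics, the following Python sketch models `fmvtg` with a Python `float` standing in for the FPR contents. The function name and the `single` flag (corresponding to `RCS[0]`) are illustrative, and the Single path assumes the FPR value is exactly representable as FP32.

```python
import struct

def fmvtg(frb, single):
    """Sketch of fmvtg: FPR value (Python float) -> 64-bit GPR contents."""
    if single:
        # RT <- [0]*32 || SINGLE((FRB)): FP32 bit pattern, zero-extended
        (bits,) = struct.unpack(">I", struct.pack(">f", frb))
    else:
        # RT <- (FRB): raw FP64 bit pattern
        (bits,) = struct.unpack(">Q", struct.pack(">d", frb))
    return bits

print(hex(fmvtg(1.0, single=False)))  # prints: 0x3ff0000000000000
print(hex(fmvtg(1.0, single=True)))   # prints: 0x3f800000
```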
 309
 310 ### Assembly Aliases
 311
 312 | Assembly Alias    | Full Instruction   |
 313 |-------------------|--------------------|
 314 | `fmvtg RT, FRB`   | `fmvtg RT, FRB, 0` |
 315 | `fmvtg. RT, FRB`  | `fmvtg RT, FRB, 1` |
 316 | `fmvtgs RT, FRB`  | `fmvtg RT, FRB, 2` |
 317 | `fmvtgs. RT, FRB` | `fmvtg RT, FRB, 3` |
 318
 319 ## GPR to FPR move
 320
 321 `fmvfg FRT, RB, RCS`
 322
 323 | 0-5 | 6-10 | 11-15 | 16-20 | 21-29 | 30-31 | Form   |
 324 |-----|------|-------|-------|-------|-------|--------|
 325 | PO  | FRT  | 0     | RB    | XO    | RCS   | X-Form |
 326
 327 ```
 328 if RCS[0] = 1 then  # if Single mode
 329     FRT <- DOUBLE((RB)[32:63])  # DOUBLE since that's what lfs uses
 330 else
 331     FRT <- (RB)
 332 ```
 333
Moves a 32/64-bit float from a GPR to an FPR, copying the bits of the IEEE 754 representation directly. This is equivalent to `stw` followed by `lfs` (Single mode) or to `std` followed by `lfd` (Double mode). As `fmvfg` just copies bits, `FPSCR` is not affected in any way.
 335
 336 Rc=1 tests FRT and sets CR1, exactly like all other Scalar Floating-Point
 337 operations.
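
The reverse direction can be sketched the same way. Again the function name and the `single` flag are illustrative; the point of the example is that in Single mode only `(RB)[32:63]`, the low 32 bits of the GPR, contribute to the result.

```python
import struct

def fmvfg(rb, single):
    """Sketch of fmvfg: 64-bit GPR contents -> FPR value (Python float)."""
    rb &= 0xFFFFFFFFFFFFFFFF
    if single:
        # FRT <- DOUBLE((RB)[32:63]): low 32 bits reinterpreted as FP32
        (value,) = struct.unpack(">f", struct.pack(">I", rb & 0xFFFFFFFF))
    else:
        # FRT <- (RB): all 64 bits reinterpreted as FP64
        (value,) = struct.unpack(">d", struct.pack(">Q", rb))
    return value

print(fmvfg(0x3F800000, single=True))           # prints: 1.0
print(fmvfg(0xFFFFFFFF3F800000, single=True))   # prints: 1.0 (upper half ignored)
print(fmvfg(0x4000000000000000, single=False))  # prints: 2.0
```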
 338
 339 ### Assembly Aliases
 340
 341 | Assembly Alias    | Full Instruction   |
 342 |-------------------|--------------------|
 343 | `fmvfg FRT, RB`   | `fmvfg FRT, RB, 0` |
 344 | `fmvfg. FRT, RB`  | `fmvfg FRT, RB, 1` |
 345 | `fmvfgs FRT, RB`  | `fmvfg FRT, RB, 2` |
 346 | `fmvfgs. FRT, RB` | `fmvfg FRT, RB, 3` |
 347
 348 # Conversions
 349
 350 Unlike the move instructions
 351 these instructions perform conversions between Integer and
 352 Floating Point. Truncation can therefore occur, as well
 353 as exceptions.
 354
 355 ## Floating-point Convert From GPR
 356
 357 | 0-5 | 6-10 | 11-12 | 13-15 | 16-20 | 21-29 | 30-31 | Form   |
 358 |-----|------|-------|-------|-------|-------|-------|--------|
 359 | PO  | FRT  | IT    | 0     | RB    | XO    | RCS   | X-Form |
 360
 361 `fcvtfg FRT, RB, IT, RCS`
 362
 363 ```
 364 if IT[0] = 0 and RCS[0] = 0 then  # 32-bit int -> 64-bit float
 365     # rounding never necessary, so don't touch FPSCR
 366     # based off xvcvsxwdp
 367     if IT = 0 then  # Signed 32-bit
 368         src <- bfp_CONVERT_FROM_SI32((RB)[32:63])
 369     else  # IT = 1 -- Unsigned 32-bit
 370         src <- bfp_CONVERT_FROM_UI32((RB)[32:63])
 371     FRT <- bfp64_CONVERT_FROM_BFP(src)
 372 else
 373     # rounding may be necessary
 374     # based off xscvuxdsp
 375     reset_xflags()
 376     switch(IT)
 377         case(0):  # Signed 32-bit
 378             src <- bfp_CONVERT_FROM_SI32((RB)[32:63])
 379         case(1):  # Unsigned 32-bit
 380             src <- bfp_CONVERT_FROM_UI32((RB)[32:63])
 381         case(2):  # Signed 64-bit
 382             src <- bfp_CONVERT_FROM_SI64((RB))
 383         default:  # Unsigned 64-bit
 384             src <- bfp_CONVERT_FROM_UI64((RB))
 385     if RCS[0] = 1 then  # Single
 386         rnd <- bfp_ROUND_TO_BFP32(FPSCR.RN, src)
 387         result32 <- bfp32_CONVERT_FROM_BFP(rnd)
 388         cls <- fprf_CLASS_BFP32(result32)
 389         result <- DOUBLE(result32)
 390     else
 391         rnd <- bfp_ROUND_TO_BFP64(FPSCR.RN, src)
 392         result <- bfp64_CONVERT_FROM_BFP(rnd)
 393         cls <- fprf_CLASS_BFP64(result)
 394
 395     if xx_flag = 1 then SetFX(FPSCR.XX)
 396
 397     FRT <- result
 398     FPSCR.FPRF <- cls
 399     FPSCR.FR <- inc_flag
 400     FPSCR.FI <- xx_flag
 401 ```
 402
Converts an unsigned/signed 32/64-bit integer in RB to a 32/64-bit float in FRT, following the usual 32-bit float in 64-bit float format.

If converting from an unsigned/signed 32-bit integer to a 64-bit float, rounding is never necessary, so `FPSCR` is unmodified and exceptions are never raised. Otherwise, `FPSCR` is modified and exceptions are raised as usual.
 406
 407 Rc=1 tests FRT and sets CR1, exactly like all other Scalar Floating-Point
 408 operations.
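
The rounding behaviour can be explored with the following Python sketch. It is a simplified model rather than a reference implementation: it covers only round-to-nearest-even (`FPSCR.RN = 0b00`), the names `round_to_precision` and `fcvtfg` are illustrative, and the returned `inexact` flag stands in for `xx_flag`. Rounding is done on the integer itself (to 24 or 53 significant bits) so that the Single result is not rounded twice.

```python
def round_to_precision(n, p):
    """Round integer n to p significant bits, ties-to-even.

    Returns (rounded_integer, inexact)."""
    sign = -1 if n < 0 else 1
    m = abs(n)
    excess = m.bit_length() - p
    if excess <= 0:
        return n, False                 # already fits, no rounding
    rem = m & ((1 << excess) - 1)       # bits that will be discarded
    half = 1 << (excess - 1)
    m >>= excess
    if rem > half or (rem == half and m & 1):
        m += 1                          # round up (ties go to even)
    return sign * (m << excess), rem != 0

def fcvtfg(rb, it, single):
    """Sketch of fcvtfg: returns (FRT value as Python float, inexact)."""
    rb &= 0xFFFFFFFFFFFFFFFF
    if it in (0, 1):                    # 32-bit source: (RB)[32:63]
        src = rb & 0xFFFFFFFF
        if it == 0 and src >= 1 << 31:  # signed: two's complement
            src -= 1 << 32
    elif it == 2:                       # signed 64-bit
        src = rb - (1 << 64) if rb >= 1 << 63 else rb
    else:                               # unsigned 64-bit
        src = rb
    rounded, inexact = round_to_precision(src, 24 if single else 53)
    return float(rounded), inexact      # exact: at most 53 significant bits

print(fcvtfg(0xFFFFFFFFFFFFFFFF, it=0, single=False))  # (-1.0, False)
print(fcvtfg((1 << 24) + 1, it=0, single=True))        # (16777216.0, True)
```

A 32-bit source never needs more than 32 significant bits, so with a Double result `inexact` is always `False`, which matches the special case at the top of the pseudocode.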
 409
 410 ### Assembly Aliases
 411
 412 | Assembly Alias       | Full Instruction       |
 413 |----------------------|------------------------|
 414 | `fcvtfgw FRT, RB`    | `fcvtfg FRT, RB, 0, 0` |
 415 | `fcvtfgw. FRT, RB`   | `fcvtfg FRT, RB, 0, 1` |
 416 | `fcvtfgws FRT, RB`   | `fcvtfg FRT, RB, 0, 2` |
 417 | `fcvtfgws. FRT, RB`  | `fcvtfg FRT, RB, 0, 3` |
 418 | `fcvtfguw FRT, RB`   | `fcvtfg FRT, RB, 1, 0` |
 419 | `fcvtfguw. FRT, RB`  | `fcvtfg FRT, RB, 1, 1` |
 420 | `fcvtfguws FRT, RB`  | `fcvtfg FRT, RB, 1, 2` |
 421 | `fcvtfguws. FRT, RB` | `fcvtfg FRT, RB, 1, 3` |
 422 | `fcvtfgd FRT, RB`    | `fcvtfg FRT, RB, 2, 0` |
 423 | `fcvtfgd. FRT, RB`   | `fcvtfg FRT, RB, 2, 1` |
 424 | `fcvtfgds FRT, RB`   | `fcvtfg FRT, RB, 2, 2` |
 425 | `fcvtfgds. FRT, RB`  | `fcvtfg FRT, RB, 2, 3` |
 426 | `fcvtfgud FRT, RB`   | `fcvtfg FRT, RB, 3, 0` |
 427 | `fcvtfgud. FRT, RB`  | `fcvtfg FRT, RB, 3, 1` |
 428 | `fcvtfguds FRT, RB`  | `fcvtfg FRT, RB, 3, 2` |
 429 | `fcvtfguds. FRT, RB` | `fcvtfg FRT, RB, 3, 3` |
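
The alias names in the table follow mechanically from the `IT` and `RCS` tables. A short Python sketch (the function name `fcvtfg_alias` is illustrative) shows the rule: append the integer-type suffix, then `s` for Single mode, then `.` for Rc=1.

```python
IT_SUFFIX = {0: "w", 1: "uw", 2: "d", 3: "ud"}

def fcvtfg_alias(it, rcs):
    """Build the fcvtfg alias mnemonic for the given IT and RCS fields."""
    single = (rcs >> 1) & 1   # RCS[0] in MSB0 numbering: FP Single Mode
    rc = rcs & 1              # RCS[1]: Rc
    return "fcvtfg" + IT_SUFFIX[it] + ("s" if single else "") + ("." if rc else "")

for it in range(4):
    print([fcvtfg_alias(it, rcs) for rcs in range(4)])
```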
 430
 431 ## Floating-point to Integer Conversion Overview
 432
 433 <div id="fpr-to-gpr-conversion-mode"></div>
 434
 435 Different programming languages turn out to have completely different
 436 semantics for FP to Integer conversion.  Below is an overview
 437 of the different variants, listing the languages and hardware that
 438 implements each variant.
 439
 440 **Standard IEEE754 conversion**
 441
This conversion is outlined in the IEEE754 specification.  It is used
by nearly all programming languages and CPUs.  In the case of OpenPOWER,
the rounding mode is read from FPSCR.
 445
 446 **Standard OpenPower conversion**
 447
 448 This conversion, instead of exact IEEE754 Compliance, performs
 449 "saturation with NaN converted to minimum valid integer". This
 450 is also exactly the same as the x86 ISA conversion semantics.
 451 OpenPOWER however has instructions for both:
 452
 453 * rounding mode read from FPSCR
 454 * rounding mode always set to truncate
 455
 456 **Java conversion**
 457
 458 For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Java's semantics (and Rust's `as` operator) will be referred to as
 459 [Java conversion semantics](#fp-to-int-java-conversion-semantics).
 460
 461 Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method):
 462
 463 * Java's
 464   [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
 465 * Rust's FP -> Integer conversion using the
 466   [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
 467 * LLVM's
 468   [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
 469   [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
 470 * SPIR-V's OpenCL dialect's
 471   [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
 472   [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
 473   instructions when decorated with
 474   [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).
* WebAssembly has also introduced
  [trunc_sat_u](https://webassembly.github.io/spec/core/exec/numerics.html#op-trunc-sat-u) and
  [trunc_sat_s](https://webassembly.github.io/spec/core/exec/numerics.html#op-trunc-sat-s)
 478
 479 **JavaScript conversion**
 480
For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).
 482
 483 This instruction is present in ARM assembler as FJCVTZS
 484 <https://developer.arm.com/documentation/dui0801/g/hko1477562192868>
 485
 486 **Rc=1 and OE=1**
 487
 488 All of these instructions have an Rc=1 mode which sets CR0
 489 in the normal way for any instructions producing a GPR result.
 490 Additionally, when OE=1, if the numerical value of the FP number
 491 is not 100% accurately preserved (due to truncation or saturation
 492 and including when the FP number was NaN) then this is considered
 493 to be an integer Overflow condition, and CR0.SO, XER.SO and XER.OV
 494 are all set as normal for any GPR instructions that overflow.
 495
 496 ### FP to Integer Conversion Simplified Pseudo-code
 497
 498 Key for pseudo-code:
 499
 500 | term                      | result type | definition                                                                                         |
 501 |---------------------------|-------------|----------------------------------------------------------------------------------------------------|
 502 | `fp`                      | --          | `f32` or `f64` (or other types from SimpleV)                                                       |
 503 | `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                              |
 504 | `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                |
 505 | `int::BITS`               | `int`       | the bit-width of `int`                                                                             |
 506 | `uint::MIN_VALUE`         | `uint`      | the minimum value `uint` can store: `0`                   |
 507 | `uint::MAX_VALUE`          | `uint`       | the maximum value `uint` can store: `2^int::BITS - 1`  |
 508 | `int::MIN_VALUE`          | `int`       | the minimum value `int` can store : `-2^(int::BITS-1)`              |
 509 | `int::MAX_VALUE`          | `int`       | the maximum value `int` can store :  `2^(int::BITS-1) - 1`  |
 510 | `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.           |
 511 | `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`      |
 512
 513 <div id="fp-to-int-openpower-conversion-semantics"></div>
 514 OpenPower conversion semantics (section A.2 page 999 (page 1023) of OpenPower ISA v3.1):
 515
 516 ```
 517 def fp_to_int_open_power<fp, int>(v: fp) -> int:
 518     if v is NaN:
 519         return int::MIN_VALUE
 520     if v >= int::MAX_VALUE:
 521         return int::MAX_VALUE
 522     if v <= int::MIN_VALUE:
 523         return int::MIN_VALUE
 524     return (int)rint(v, rounding_mode)
 525 ```
 526
 527 <div id="fp-to-int-java-conversion-semantics"></div>
 528 [Java conversion semantics](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
 529 /
 530 [Rust semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
 531 (with adjustment to add non-truncate rounding modes):
 532
 533 ```
 534 def fp_to_int_java<fp, int>(v: fp) -> int:
 535     if v is NaN:
 536         return 0
 537     if v >= int::MAX_VALUE:
 538         return int::MAX_VALUE
 539     if v <= int::MIN_VALUE:
 540         return int::MIN_VALUE
 541     return (int)rint(v, rounding_mode)
 542 ```
 543
 544 <div id="fp-to-int-javascript-conversion-semantics"></div>
 545 Section 7.1 of the ECMAScript / JavaScript
 546 [conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):
 547
 548 ```
 549 def fp_to_int_java_script<fp, int>(v: fp) -> int:
 550     if v is NaN or infinite:
 551         return 0
 552     v = rint(v, rounding_mode)  # assume no loss of precision in result
 553     v = v mod int::VALUE_COUNT  # 2^32 for i32, 2^64 for i64, result is non-negative
 554     bits = (uint)v
 555     return (int)bits
 556 ```
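
The three simplified definitions above translate almost directly into runnable Python, which makes it easy to compare them on edge cases. This sketch assumes the Truncate rounding mode only (via `math.trunc`), and replaces the generic `int` type parameter with explicit `bits` and `signed` arguments; those parameter names and `int_range` are illustrative.

```python
import math

def int_range(bits, signed):
    """Return (MIN_VALUE, MAX_VALUE) for the target integer type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

def fp_to_int_open_power(v, bits, signed):
    lo, hi = int_range(bits, signed)
    if math.isnan(v):
        return lo                       # NaN -> minimum valid integer
    if v >= hi:
        return hi
    if v <= lo:
        return lo
    return math.trunc(v)

def fp_to_int_java(v, bits, signed):
    lo, hi = int_range(bits, signed)
    if math.isnan(v):
        return 0                        # NaN -> 0
    if v >= hi:
        return hi
    if v <= lo:
        return lo
    return math.trunc(v)

def fp_to_int_java_script(v, bits, signed):
    if math.isnan(v) or math.isinf(v):
        return 0
    wrapped = math.trunc(v) % (1 << bits)      # non-negative, modulo 2^bits
    if signed and wrapped >= 1 << (bits - 1):  # reinterpret as two's complement
        wrapped -= 1 << bits
    return wrapped

nan = float("nan")
print(fp_to_int_open_power(nan, 32, True))         # prints: -2147483648
print(fp_to_int_java(nan, 32, True))               # prints: 0
print(fp_to_int_java(1e20, 32, True))              # prints: 2147483647
print(fp_to_int_java_script(3000000000.0, 32, True))  # prints: -1294967296
```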
 557
 558 ## Floating-point Convert To GPR
 559
 560 | 0-5 | 6-10 | 11-12 | 13-15 | 16-20 | 21-28 | 29     | 30 |  31    | Form    |
 561 |-----|------|-------|-------|-------|-------|--------|----|--------|---------|
 562 | PO  | RT   | IT    | CVM   | FRB   | XO    | RCS[0] | OE | RCS[1] | XO-Form |
 563
 564 `fcvttg RT, FRB, CVM, IT, RCS`
 565 `fcvttgo RT, FRB, CVM, IT, RCS`
 566
```
# based on xscvdpuxws
reset_xflags()

if RCS[0] = 1 then  # if Single mode
    src <- bfp_CONVERT_FROM_BFP32(SINGLE((FRB)))
else
    src <- bfp_CONVERT_FROM_BFP64((FRB))

switch(IT)
    case(0):  # Signed 32-bit
        range_min <- bfp_CONVERT_FROM_SI32(0x8000_0000)
        range_max <- bfp_CONVERT_FROM_SI32(0x7FFF_FFFF)
        js_mask <- 0xFFFF_FFFF
    case(1):  # Unsigned 32-bit
        range_min <- bfp_CONVERT_FROM_UI32(0)
        range_max <- bfp_CONVERT_FROM_UI32(0xFFFF_FFFF)
        js_mask <- 0xFFFF_FFFF
    case(2):  # Signed 64-bit
        range_min <- bfp_CONVERT_FROM_SI64(-0x8000_0000_0000_0000)
        range_max <- bfp_CONVERT_FROM_SI64(0x7FFF_FFFF_FFFF_FFFF)
        js_mask <- 0xFFFF_FFFF_FFFF_FFFF
    default:  # Unsigned 64-bit
        range_min <- bfp_CONVERT_FROM_UI64(0)
        range_max <- bfp_CONVERT_FROM_UI64(0xFFFF_FFFF_FFFF_FFFF)
        js_mask <- 0xFFFF_FFFF_FFFF_FFFF

if CVM[2] = 1 or FPSCR.RN = 0b01 then
    rnd <- bfp_ROUND_TO_INTEGER_TRUNC(src)
else if FPSCR.RN = 0b00 then
    rnd <- bfp_ROUND_TO_INTEGER_NEAR_EVEN(src)
else if FPSCR.RN = 0b10 then
    rnd <- bfp_ROUND_TO_INTEGER_CEIL(src)
else if FPSCR.RN = 0b11 then
    rnd <- bfp_ROUND_TO_INTEGER_FLOOR(src)

# set conversion flags
switch(IT)
    case(0):  # Signed 32-bit
        si32_CONVERT_FROM_BFP(rnd)
    case(1):  # Unsigned 32-bit
        ui32_CONVERT_FROM_BFP(rnd)
    case(2):  # Signed 64-bit
        si64_CONVERT_FROM_BFP(rnd)
    default:  # Unsigned 64-bit
        ui64_CONVERT_FROM_BFP(rnd)

switch(CVM)
    case(0, 1):  # OpenPower semantics
        if IsNaN(rnd) then
            result <- si64_CONVERT_FROM_BFP(range_min)
        else if bfp_COMPARE_GT(rnd, range_max) then
            result <- ui64_CONVERT_FROM_BFP(range_max)
        else if bfp_COMPARE_LT(rnd, range_min) then
            result <- si64_CONVERT_FROM_BFP(range_min)
        else if IT[1] = 1 then  # Unsigned 32/64-bit
            result <- ui64_CONVERT_FROM_BFP(rnd)  # in range: use rounded value
        else  # Signed 32/64-bit
            result <- si64_CONVERT_FROM_BFP(rnd)  # in range: use rounded value
    case(2, 3):  # Java semantics
        if IsNaN(rnd) then
            result <- [0] * 64
        else if bfp_COMPARE_GT(rnd, range_max) then
            result <- ui64_CONVERT_FROM_BFP(range_max)
        else if bfp_COMPARE_LT(rnd, range_min) then
            result <- si64_CONVERT_FROM_BFP(range_min)
        else if IT[1] = 1 then  # Unsigned 32/64-bit
            result <- ui64_CONVERT_FROM_BFP(rnd)  # in range: use rounded value
        else  # Signed 32/64-bit
            result <- si64_CONVERT_FROM_BFP(rnd)  # in range: use rounded value
    default:  # JavaScript semantics
        # CVM = 6, 7 are illegal instructions

        if IsInf(rnd) or IsNaN(rnd) then
            result <- [0] * 64
        else
            # this works because the largest type we try to
            # convert from has 53 significand bits, and the
            # largest type we try to convert to has 64 bits,
            # and the sum of those is strictly less than the
            # 128 bits of the intermediate result.
            result128 <- si128_CONVERT_FROM_BFP(rnd)
            result <- result128[64:127] & js_mask

switch(IT)
    case(0):  # Signed 32-bit
        result <- EXTS64(result[32:63])
        result_bfp <- bfp_CONVERT_FROM_SI32(result[32:63])
    case(1):  # Unsigned 32-bit
        result <- EXTZ64(result[32:63])
        result_bfp <- bfp_CONVERT_FROM_UI32(result[32:63])
    case(2):  # Signed 64-bit
        result_bfp <- bfp_CONVERT_FROM_SI64(result)
    default:  # Unsigned 64-bit
        result_bfp <- bfp_CONVERT_FROM_UI64(result)

if vxsnan_flag = 1 then SetFX(FPSCR.VXSNAN)
if vxcvi_flag = 1 then SetFX(FPSCR.VXCVI)
if xx_flag = 1 then SetFX(FPSCR.XX)

vx_flag <- vxsnan_flag | vxcvi_flag
vex_flag <- FPSCR.VE & vx_flag

if vex_flag = 0 then
    RT <- result
    FPSCR.FPRF <- undefined
    FPSCR.FR <- inc_flag
    FPSCR.FI <- xx_flag
    if IsNaN(src) or not bfp_COMPARE_EQ(src, result_bfp) then
        overflow <- 1  # signals SO only when OE = 1
else
    FPSCR.FR <- 0
    FPSCR.FI <- 0
```
 681
Converts a 32/64-bit float in FRB to an unsigned/signed 32/64-bit integer in RT, with the conversion overflow/rounding semantics following the chosen `CVM` value, following the usual 32-bit float in 64-bit float format.
 683
 684 `FPSCR` is modified and exceptions are raised as usual.
 685
 686 Both of these instructions have an Rc=1 mode which sets CR0
 687 in the normal way for any instructions producing a GPR result.
 688 Additionally, when OE=1, if the numerical value of the FP number
 689 is not 100% accurately preserved (due to truncation or saturation
 690 and including when the FP number was NaN) then this is considered
 691 to be an integer Overflow condition, and CR0.SO, XER.SO and XER.OV
 692 are all set as normal for any GPR instructions that overflow.
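
The overflow condition described above (source is NaN, or the integer result does not compare equal to the source) can be illustrated with a small Python sketch. It models only one combination, Java semantics with Truncate rounding, and ignores `FPSCR` and the `VE` trap-enable path; the function name and its `bits`/`signed` parameters are illustrative.

```python
import math

def fcvttg_java_trunc(frb, bits, signed):
    """Sketch: return (integer result, overflow) for Java/Truncate mode."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    if math.isnan(frb):
        result = 0
    elif frb >= hi:
        result = hi
    elif frb <= lo:
        result = lo
    else:
        result = math.trunc(frb)
    # Python compares int and float exactly, mirroring
    # "IsNaN(src) or not bfp_COMPARE_EQ(src, result_bfp)".
    overflow = math.isnan(frb) or frb != result
    return result, overflow

for value in (3.0, 3.5, -0.0, 1e20, float("nan")):
    print(value, fcvttg_java_trunc(value, 32, True))
```

Under this model an exactly representable input such as `3.0` (or `-0.0`) leaves the overflow indication clear, while truncation, saturation and NaN all set it.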
 693
 694 ### Assembly Aliases
 695
 696 For brevity, `[o]` is used to mean `o` is optional there.
 697
 698 | Assembly Alias               | Full Instruction               |
 699 |------------------------------|--------------------------------|
 700 | `fcvttgw[o] RT, FRB, CVM`    | `fcvttg[o] RT, FRB, CVM, 0, 0` |
 701 | `fcvttgw[o]. RT, FRB, CVM`   | `fcvttg[o] RT, FRB, CVM, 0, 1` |
 702 | `fcvtstgw[o] RT, FRB, CVM`   | `fcvttg[o] RT, FRB, CVM, 0, 2` |
 703 | `fcvtstgw[o]. RT, FRB, CVM`  | `fcvttg[o] RT, FRB, CVM, 0, 3` |
 704 | `fcvttguw[o] RT, FRB, CVM`   | `fcvttg[o] RT, FRB, CVM, 1, 0` |
 705 | `fcvttguw[o]. RT, FRB, CVM`  | `fcvttg[o] RT, FRB, CVM, 1, 1` |
 706 | `fcvtstguw[o] RT, FRB, CVM`  | `fcvttg[o] RT, FRB, CVM, 1, 2` |
 707 | `fcvtstguw[o]. RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 1, 3` |
 708 | `fcvttgd[o] RT, FRB, CVM`    | `fcvttg[o] RT, FRB, CVM, 2, 0` |
 709 | `fcvttgd[o]. RT, FRB, CVM`   | `fcvttg[o] RT, FRB, CVM, 2, 1` |
 710 | `fcvtstgd[o] RT, FRB, CVM`   | `fcvttg[o] RT, FRB, CVM, 2, 2` |
 711 | `fcvtstgd[o]. RT, FRB, CVM`  | `fcvttg[o] RT, FRB, CVM, 2, 3` |
 712 | `fcvttgud[o] RT, FRB, CVM`   | `fcvttg[o] RT, FRB, CVM, 3, 0` |
 713 | `fcvttgud[o]. RT, FRB, CVM`  | `fcvttg[o] RT, FRB, CVM, 3, 1` |
 714 | `fcvtstgud[o] RT, FRB, CVM`  | `fcvttg[o] RT, FRB, CVM, 3, 2` |
 715 | `fcvtstgud[o]. RT, FRB, CVM` | `fcvttg[o] RT, FRB, CVM, 3, 3` |
 716
 717 [mode `Mode`]: #fpr-to-gpr-conversion-mode
 718