X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;ds=sidebyside;f=openpower%2Fsv%2Fint_fp_mv.mdwn;h=723ac5385c528d6bb74e8762a5036109e8417a54;hb=b8b9aae2a9c89c4bcc3111cb05d2cd5ca9c3ab60;hp=c497f6f74ed2302f9fcca97f441387a9d2c560fd;hpb=4749616898f83960b9fc59e8eee5754740d2f27d;p=libreriscv.git diff --git a/openpower/sv/int_fp_mv.mdwn b/openpower/sv/int_fp_mv.mdwn index c497f6f74..723ac5385 100644 --- a/openpower/sv/int_fp_mv.mdwn +++ b/openpower/sv/int_fp_mv.mdwn @@ -1,231 +1,281 @@ +[[!tag standards]] + # FPR-to-GPR and GPR-to-FPR +TODO special constants instruction (e, tau/N, ln 2, sqrt 2, etc.) -- exclude any constants available through fmvis + +**Draft Status** under development, for submission as an RFC + +Links: + +* +* +* +* +* fmvis +* [[int_fp_mv/appendix]] + +Trademarks: + +* Rust is a Trademark of the Rust Foundation +* Java and Javascript are Trademarks of Oracle +* LLVM is a Trademark of the LLVM Foundation +* SPIR-V is a Trademark of the Khronos Group +* OpenCL is a Trademark of Apple, Inc. + +Referring to these Trademarks within this document +is by necessity, in order to put the semantics of each language +into context, and is considered "fair use" under Trademark +Law. + Introduction: High-performance CPU/GPU software needs to often convert between integers and floating-point, therefore fast conversion/data-movement instructions are needed. Also given that initialisation of floats tends to take up -considerable space (even to just load 0.0) the inclusion of float immediate -is up for consideration (BF16 as immediates) +considerable space (even to just load 0.0) the inclusion of two compact +format float immediate instructions is up for consideration using 16-bit +immediates. BF16 is one of the formats: a second instruction allows a full +accuracy FP32 to be constructed. Libre-SOC will be compliant with the **Scalar Floating-Point Subset** (SFFS) i.e. 
is not implementing VMX/VSX, and with its focus on modern 3D GPU hybrid workloads represents an important new potential use-case for OpenPOWER. -With VMX/VSX not available in the SFFS Compliancy Level, the -existing non-VSX conversion/data-movement instructions require load/store + +Prior to the formation of the Compliancy Levels first introduced +in v3.0C and v3.1, +the progressive historic development of the Scalar parts of the Power ISA assumed +that VSX would always be there to complement it. However, with VMX/VSX +**not available** in the newly-introduced SFFS Compliancy Level, the +existing non-VSX conversion/data-movement instructions require +a Vector of load/store instructions (slow and expensive) to transfer data between the FPRs and -the GPRs. Also, because SimpleV needs efficient scalar instructions in +the GPRs. For a modern 3D GPU this kills any possibility of a +competitive edge. +Also, because SimpleV needs efficient scalar instructions in order to generate efficient vector instructions, adding new instructions -for data-transfer/conversion between FPRs and GPRs seems necessary. +for data-transfer/conversion between FPRs and GPRs multiplies the savings. In addition, the vast majority of GPR <-> FPR data-transfers are as part of a FP <-> Integer conversion sequence, therefore reducing the number -of instructions required to the minimum seems necessary. +of instructions required is a priority. 
Therefore, we are proposing adding: -* FPR load-immediate using `BF16` as the constant +* FPR load-immediate instructions, one equivalent to `BF16`, the + other increasing accuracy to `FP32` * FPR <-> GPR data-transfer instructions that just copy bits without conversion * FPR <-> GPR combined data-transfer/conversion instructions that do Integer <-> FP conversions -If we're adding new Integer <-> FP conversion instructions, we may -as well take this opportunity to modernise the instructions and make them -well suited for common/important conversion sequences: - -* standard Integer -> FP conversion (**TODO, which standard?** can it - be described in words? how does it differ from the other "standards"?) -* standard OpenPower FP -> Integer conversion (saturation with NaN - converted to minimum valid integer) -* Rust FP -> Integer conversion (saturation with NaN converted to 0) -* JavaScript FP -> Integer conversion (modular with Inf/NaN converted to 0) - -# A bit more research into integer - fp conversion - -here is a paragraph which explains that there are different semantics -for conversion, i don't know what the paragraph should say, but it needs -to be here, to give some background. it also acts as a lead-in to the -sub-sections, introducing them and explaining why they are here, as -justifications and background research as to why the ISA should support -the feature being proposed. - -*nothing* can be left to chance or guesswork. 
+If adding new Integer <-> FP conversion instructions, +the opportunity may be taken to modernise the instructions and make them +well-suited for common/important conversion sequences: -## standard Integer -> FP conversion +* **standard IEEE754** - used by most languages and CPUs +* **standard OpenPOWER** - saturation with NaN + converted to minimum valid integer +* **Java** - saturation with NaN converted to 0 +* **JavaScript** - modulo wrapping with Inf/NaN converted to 0 -TODO, explain this further +The assembly listings in the [[int_fp_mv/appendix]] show how costly +some of these language-specific conversions are: Javascript, the +worst case, is 32 scalar instructions including seven branch instructions. -- rounding mode read from FPSCR +# Proposed New Scalar Instructions -# standard OpenPower FP -> Integer conversion +All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR. All integers however are sourced/stored in the *GPR*. + +Integer operands and results being in the GPR is the key differentiator between the proposed instructions +(the entire rationale) compared to existing Scalar Power ISA. +In all existing Power ISA Scalar conversion instructions, all +operands are FPRs, even if the format of the source or destination +data is actually a scalar integer. + +*(The existing Scalar instructions being FP-FP only is based on an assumption +that VSX will be implemented, and VSX is not part of the SFFS Compliancy +Level. An earlier version of the Power ISA used to have similar +FPR<->GPR instructions to these: +they were deprecated due to this incorrect assumption that VSX would +always be present).* + +Note that source and destination widths can be overridden by SimpleV +SVP64, and that SVP64 also has Saturation Modes *in addition* +to those independently described here. SVP64 Overrides and Saturation +work on *both* Fixed *and* Floating Point operands and results. 
+ The interactions with SVP64 +are explained in the [[int_fp_mv/appendix]] + +# Float load immediate + +These are like a variant of `fmvfg` and `oris`, combined. +Power ISA currently requires a large +number of instructions to get Floating Point constants into registers. +`fmvis` on its own is equivalent to BF16 to FP32/64 conversion, +but if followed up by `frlsi` an additional 16 bits of accuracy in the +mantissa may be achieved. + +*IBM may consider it worthwhile to extend these two instructions to +v3.1 Prefixed (`pfmvis` and `pfrlsi`). If so it is recommended that +`pfmvis` load a full FP32 immediate and `pfrlsi` supplies the three high +missing exponent bits (numbered 8 to 10) and the lower additional +29 mantissa bits (23 to 51) needed to construct a full FP64 immediate.* + +## Load BF16 Immediate + +`fmvis FRS, D` + +Reinterprets `D << 16` as a 32-bit float, which is then converted to a +64-bit float and written to `FRS`. This is equivalent to reinterpreting +`D` as a `BF16` and converting to 64-bit float. +There is no need for an Rc=1 variant because this is an immediate loading +instruction. 
-TODO, explain this further, make this a complete sentence: -"saturation with NaN converted to minimum valid integer" +Example: - - Matches x86's conversion semantics - - Has instructions for both: - * rounding mode read from FPSCR - * rounding mode is always truncate +``` +# clearing a FPR +fmvis f4, 0 # writes +0.0 to f4 +# loading handy constants +fmvis f4, 0x8000 # writes -0.0 to f4 +fmvis f4, 0x3F80 # writes +1.0 to f4 +fmvis f4, 0xBF80 # writes -1.0 to f4 +fmvis f4, 0xBFC0 # writes -1.5 to f4 +fmvis f4, 0x7FC0 # writes +qNaN to f4 +fmvis f4, 0x7F80 # writes +Infinity to f4 +fmvis f4, 0xFF80 # writes -Infinity to f4 +fmvis f4, 0x3FFF # writes +1.9921875 to f4 -## Rust FP -> Integer conversion +# clearing 128 FPRs with 2 SVP64 instructions +# by issuing 32 vec4 (subvector length 4) ops +setvli VL=MVL=32 +sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127 +``` +Important: If the float load immediate instruction(s) are left out, +change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions) +to instead write `+0.0` if `RA` is register `0`, at least +allowing clearing FPRs. -TODO, explain this further, the following is not a complete sentence, -"saturation with NaN converted to 0" +`fmvis` fits with DX-Form: -Semantics required by all of: -(what does this mean, what is "required"? -what semantics are being referred to? 
the sentence needs completing: -"For Rust integer conversion, the semantics required are shown by the -following, all of which are supported in XYZ" something like that) +| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form | +|--------|------|-------|-------|-------|-----|-----| +| Major | FRS | d1 | d0 | XO | d2 | DX-Form | -* Rust's FP -> Integer conversion using the - [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics) -* Java's - [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3) -* LLVM's - [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and - [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics -* SPIR-V's OpenCL dialect's - [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and - [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS) - instructions when decorated with - [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration). +Pseudocode: -## JavaScript FP -> Integer conversion + bf16 = d0 || d1 || d2 + fp32 = bf16 || [0]*16 + FRS = Single_to_Double(fp32) -modular with Inf/NaN converted to 0 +## Float Replace Lower-Half Single, Immediate -TODO, explain this further, it is not a sentence: -"Semantics required by JavaScript" +`frlsi FRS, D` -## Other languages +DX-Form: -TODO: review and investigate other language semantics +| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form | +|--------|------|-------|-------|-------|-----|-----| +| Major | FRS | d1 | d0 | XO | d2 | DX-Form | -# Links +Strategically similar to how `oris` is used to construct +32-bit Integers, an additional 16-bits of immediate is +inserted into `FRS` to extend its accuracy to +a full FP32 (stored as usual in FP64 Format within the FPR). 
+If a prior `fmvis` instruction had been used to +set the upper 16-bits of an FP32 value, `frlsi` contains the +lower 16-bits. -* -* -* -* +The key difference between using `li` and `oris` to construct 32-bit +GPR Immediates and `frlsi` is that the `fmvis` will have converted +the `BF16` immediate to FP64 (Double) format. +This is taken into consideration +as can be seen in the pseudocode below. -# Proposed New Scalar Instructions +Pseudocode: -All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR. + fp32 = Double_to_Single(FRS) + n = fp32[0:15] || d0 || d1 || d2 + FRS = Single_to_Double(n) -This can be overridden by SimpleV, which sets the following -operation "reinterpretation" rules: +*This instruction performs a Read-Modify-Write. FRS is read, the additional +16 bit immediate inserted, and the result also written to FRS* -* any operation whose assembler mnemonic does not end in "s" - (being defined in v3.0B as a "double" operation) is - instead an operation at the overridden elwidth for the - relevant operand. -* any operation nominally defined as a "single" FP operation - is redefined to be **half the elwidth** rather than - "half of 64 bit". +Example: -Examples: +``` +# these two combined instructions write 0x3f808000 +# into f4 as an FP32 to be converted to an FP64. +# actual contents in f4 after conversion: 0x3ff0_1000_0000_0000 +# first the upper bits, happens to be +1.0 +fmvis f4, 0x3F80 # writes +1.0 to f4 +# now write the lower 16 bits of an FP32 +frlsi f4, 0x8000 # writes +1.00390625 to f4 +``` -* `sv.fmvtg/sw=32 RT.v, FRA.v` is defined as treating FRA - as a vector of *FP32* source operands each *32* bits wide - which are to be placed into *64* bit integer destination elements. 
-* `sv.fmvfgs/dw=32 FRT.v, RA.v` is defined as taking the bottom - 32 bits of each RA integer source, then performing a **32 bit** - FP32 to **FP16** conversion and storing the result in the - **32 bits** of an FRT destination element. +# Moves -"Single" is therefore redefined in SVP64 to be "half elwidth" -rather than Double width hardcoded to 64 and Single width -hardcoded to 32. This allows a full range of conversions -between FP64, FP32, FP16 and BF16. +These instructions perform a straight unaltered bit-level copy from one Register +File to another. -## FPR to GPR moves +# FPR to GPR moves * `fmvtg RT, FRA` * `fmvtg. RT, FRA` move a 64-bit float from a FPR to a GPR, just copying bits directly. -Rc=1 tests RT and sets CR0 +As a direct bitcopy, no exceptions occur and no status flags are set. + +Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point +operations. * `fmvtgs RT, FRA` * `fmvtgs. RT, FRA` move a 32-bit float from a FPR to a GPR, just copying bits. Converts the 64-bit float in `FRA` to a 32-bit float, then writes the 32-bit float to -`RT`. -Rc=1 tests RT and sets CR0 +`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg` +and therefore has the exact same exception and flags behaviour of `frsp` + +Unlike `frsp` however, with RT being a GPR, Rc=1 follows +standard *integer* behaviour, i.e. tests RT and sets CR0. -## GPR to FPR moves +# GPR to FPR moves `fmvfg FRT, RA` -move a 64-bit float from a GPR to a FPR, just copying bits. +move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions +are raised, no flags are altered of any kind. + +Rc=1 tests FRT and sets CR1 `fmvfgs FRT, RA` move a 32-bit float from a GPR to a FPR, just copying bits. Converts the 32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to -`FRT`. - -TODO: Rc=1 variants? +`FRT`. 
Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and +therefore has the exact same exception and flags behaviour of `frsp` -### Float load immediate (kinda a variant of `fmvfg`) +Rc=1 tests FRT and sets CR1 -`fmvis FRT, FI` +TODO: clear statement on evaluation as to whether exceptions or flags raised as part of the **FP** conversion (not the int bitcopy part, the conversion part. the semantics should really be the same as frsp) -Reinterprets `FI << 16` as a 32-bit float, which is then converted to a -64-bit float and written to `FRT`. This is equivalent to reinterpreting -`FI` as a `BF16` and converting to 64-bit float. +v3.0C section 4.6.7.1 states: -Example: +FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1. -``` -# clearing a FPR -fmvis f4, 0 # writes +0.0 to f4 -# loading handy constants -fmvis f4, 0x8000 # writes -0.0 to f4 -fmvis f4, 0x3F80 # writes +1.0 to f4 -fmvis f4, 0xBF80 # writes -1.0 to f4 -fmvis f4, 0xBFC0 # writes -1.5 to f4 -fmvis f4, 0x7FC0 # writes +qNaN to f4 -fmvis f4, 0x7F80 # writes +Infinity to f4 -fmvis f4, 0xFF80 # writes -Infinity to f4 -fmvis f4, 0x3FFF # writes +1.9921875 to f4 + Special Registers Altered: + FPRF FR FI + FX OX UX XX VXSNAN + CR1 (if Rc=1) -# clearing 128 FPRs with 2 SVP64 instructions -# by issuing 32 vec4 (subvector length 4) ops -setvli VL=MVL=32 -sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127 -``` -Important: If the float load immediate instruction(s) are left out, -change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions) -to instead write `+0.0` if `RA` is register `0`, at least -allowing clearing FPRs. - -| 0-5 | 6-10 | 11-25 | 26-30 | 31 | -|--------|------|-------|-------|-----| -| Major | FRT | FI | XO | FI0 | - -The above fits reasonably well with Minor 19 and follows the -pattern shown by `addpcis`, which uses an entire column of Minor 19 -XO. 15 bits of FI fit into bits 11 to 25, -the top bit FI0 (MSB0 numbered 0) makes 16. 
- - bf16 = FI0 || FI - fp32 = bf16 || [0]*16 - FRT = Single_to_Double(fp32) - -## FPR to GPR conversions - -
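The `fmvis` pseudocode (`bf16 || [0]*16` followed by `Single_to_Double`) and the `frlsi` read-modify-write can be modelled in a few lines of Python. This is only a sketch: the helper names `single_bits_to_double` and `double_to_single_bits` are hypothetical, and Python's `struct` rounding conversion is assumed to stand in for the Power ISA `Double_to_Single`, which is adequate here because every value produced by `fmvis` is exactly representable as FP32.

```python
import struct

def single_bits_to_double(bits32: int) -> float:
    """Reinterpret a 32-bit pattern as FP32; Python floats are FP64, so widening is exact."""
    return struct.unpack(">f", struct.pack(">I", bits32 & 0xFFFFFFFF))[0]

def double_to_single_bits(value: float) -> int:
    """Convert an FP64 value to FP32 and return its raw 32-bit pattern (assumed model)."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def fmvis(d: int) -> float:
    """Model of fmvis: the 16-bit immediate becomes the upper half of an FP32."""
    return single_bits_to_double((d & 0xFFFF) << 16)

def frlsi(frs: float, d: int) -> float:
    """Model of frlsi: keep the upper 16 bits of FRS (viewed as FP32), replace the lower 16."""
    upper = double_to_single_bits(frs) >> 16
    return single_bits_to_double((upper << 16) | (d & 0xFFFF))

if __name__ == "__main__":
    f4 = fmvis(0x3F80)       # +1.0
    f4 = frlsi(f4, 0x8000)   # lower half of the FP32 filled in
    print(f4, struct.pack(">d", f4).hex())
```

The final line prints the value together with its raw FP64 encoding, which can be compared against the register contents quoted in the `frlsi` example.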
+# Conversions -X-Form: - -| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | -|--------|------|--------|-------|-------|----| -| Major | RT | //Mode | FRA | XO | Rc | -| Major | FRT | //Mode | RA | XO | Rc | +Unlike the move instructions +these instructions perform conversions between Integer and +Floating Point. Truncation can therefore occur, as well +as exceptions. Mode values: @@ -233,95 +283,152 @@ Mode values: |------|-----------------|----------------------------------| | 000 | from `FPSCR` | [OpenPower semantics] | | 001 | Truncate | [OpenPower semantics] | -| 010 | from `FPSCR` | [Rust semantics] | -| 011 | Truncate | [Rust semantics] | +| 010 | from `FPSCR` | [Java semantics] | +| 011 | Truncate | [Java semantics] | | 100 | from `FPSCR` | [JavaScript semantics] | | 101 | Truncate | [JavaScript semantics] | | rest | -- | illegal instruction trap for now | [OpenPower semantics]: #fp-to-int-openpower-conversion-semantics -[Rust semantics]: #fp-to-int-rust-conversion-semantics +[Java semantics]: #fp-to-int-java-conversion-semantics [JavaScript semantics]: #fp-to-int-javascript-conversion-semantics -`fcvttgw RT, FRA, Mode` - -Convert from 64-bit float to 32-bit signed integer, writing the result -to the GPR `RT`. Converts using [mode `Mode`] - -`fcvttguw RT, FRA, Mode` - -Convert from 64-bit float to 32-bit unsigned integer, writing the result -to the GPR `RT`. Converts using [mode `Mode`] - -`fcvttgd RT, FRA, Mode` - -Convert from 64-bit float to 64-bit signed integer, writing the result -to the GPR `RT`. Converts using [mode `Mode`] - -`fcvttgud RT, FRA, Mode` - -Convert from 64-bit float to 64-bit unsigned integer, writing the result -to the GPR `RT`. Converts using [mode `Mode`] - -`fcvtstgw RT, FRA, Mode` - -Convert from 32-bit float to 32-bit signed integer, writing the result -to the GPR `RT`. Converts using [mode `Mode`] - -`fcvtstguw RT, FRA, Mode` - -Convert from 32-bit float to 32-bit unsigned integer, writing the result -to the GPR `RT`. 
Converts using [mode `Mode`] - -`fcvtstgd RT, FRA, Mode` - -Convert from 32-bit float to 64-bit signed integer, writing the result -to the GPR `RT`. Converts using [mode `Mode`] - -`fcvtstgud RT, FRA, Mode` - -Convert from 32-bit float to 64-bit unsigned integer, writing the result -to the GPR `RT`. Converts using [mode `Mode`] - -[mode `Mode`]: #fpr-to-gpr-conversion-mode - ## GPR to FPR conversions -All of the following GPR to FPR conversions use the rounding mode from `FPSCR`. - -`fcvtfgw FRT, RA` +**Format** -Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in `FRT`. +| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form | +|--------|------|--------|-------|-------|----|------| +| Major | FRT | //Mode | RA | XO | Rc |X-Form| -`fcvtfgws FRT, RA` +All of the following GPR to FPR conversions use the rounding mode from `FPSCR`. -Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in `FRT`. +* `fcvtfgw FRT, RA` + Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in + `FRT`. +* `fcvtfgws FRT, RA` + Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in + `FRT`. +* `fcvtfguw FRT, RA` + Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in + `FRT`. +* `fcvtfguws FRT, RA` + Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in + `FRT`. +* `fcvtfgd FRT, RA` + Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in + `FRT`. +* `fcvtfgds FRT, RA` + Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in + `FRT`. +* `fcvtfgud FRT, RA` + Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in + `FRT`. +* `fcvtfguds FRT, RA` + Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in + `FRT`. + +## FPR to GPR (Integer) conversions -`fcvtfguw FRT, RA` +
-Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`. +Different programming languages turn out to have completely different +semantics for FP to Integer conversion. Below is an overview +of the different variants, listing the languages and hardware that +implements each variant. -`fcvtfguws FRT, RA` +**Standard IEEE754 conversion** -Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`. +This conversion is outlined in the IEEE754 specification. It is used +by nearly all programming languages and CPUs. In the case of OpenPOWER, +the rounding mode is read from FPSCR -`fcvtfgd FRT, RA` +**Standard OpenPower conversion** -Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in `FRT`. +This conversion, instead of exact IEEE754 Compliance, performs +"saturation with NaN converted to minimum valid integer". This +is also exactly the same as the x86 ISA conversion semantics. +OpenPOWER however has instructions for both: -`fcvtfgds FRT, RA` +* rounding mode read from FPSCR +* rounding mode always set to truncate -Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in `FRT`. +**Java conversion** -`fcvtfgud FRT, RA` +For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Java's semantics (and Rust's `as` operator) will be referred to as +[Java conversion semantics](#fp-to-int-java-conversion-semantics). -Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`. 
+Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method): -`fcvtfguds FRT, RA` +* Java's + [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3) +* Rust's FP -> Integer conversion using the + [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics) +* LLVM's + [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and + [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics +* SPIR-V's OpenCL dialect's + [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and + [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS) + instructions when decorated with + [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration). +* WebAssembly has also introduced + [trunc_sat_u](https://webassembly.github.io/spec/core/exec/numerics.html#op-trunc-sat-u) and + [trunc_sat_s](https://webassembly.github.io/spec/core/exec/numerics.html#op-trunc-sat-s) + +**JavaScript conversion** + +For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics). + +This instruction is present in ARM assembler as FJCVTZS + + +**Format** + +| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form | +|--------|------|--------|-------|-------|----|------| +| Major | RT | //Mode | FRA | XO | Rc |X-Form| + +**Rc=1 and OE=1** + +All of these instructions have an Rc=1 mode which sets CR0 +in the normal way for any instructions producing a GPR result. 
+Additionally, when OE=1, if the numerical value of the FP number +is not 100% accurately preserved (due to truncation or saturation +and including when the FP number was NaN) then this is considered +to be an integer Overflow condition, and CR0.SO, XER.SO and XER.OV +are all set as normal for any GPR instructions that overflow. + +**Instructions** + +* `fcvttgw RT, FRA, Mode` + Convert from 64-bit float to 32-bit signed integer, writing the result + to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctiw` or `fctiwz` +* `fcvttguw RT, FRA, Mode` + Convert from 64-bit float to 32-bit unsigned integer, writing the result + to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctiwu` or `fctiwuz` +* `fcvttgd RT, FRA, Mode` + Convert from 64-bit float to 64-bit signed integer, writing the result + to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctid` or `fctidz` +* `fcvttgud RT, FRA, Mode` + Convert from 64-bit float to 64-bit unsigned integer, writing the result + to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctidu` or `fctiduz` +* `fcvtstgw RT, FRA, Mode` + Convert from 32-bit float to 32-bit signed integer, writing the result + to the GPR `RT`. Converts using [mode `Mode`] +* `fcvtstguw RT, FRA, Mode` + Convert from 32-bit float to 32-bit unsigned integer, writing the result + to the GPR `RT`. Converts using [mode `Mode`] +* `fcvtstgd RT, FRA, Mode` + Convert from 32-bit float to 64-bit signed integer, writing the result + to the GPR `RT`. Converts using [mode `Mode`] +* `fcvtstgud RT, FRA, Mode` + Convert from 32-bit float to 64-bit unsigned integer, writing the result + to the GPR `RT`. Converts using [mode `Mode`] -Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`. 
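As an illustration of how the `Mode` field and the OE=1 rule might be modelled, the following Python sketch decodes the 3-bit `Mode` value using the Mode table and flags an integer Overflow whenever the converted result does not preserve the numerical value of the FP input. The names `MODES`, `decode_mode` and `overflow_flag` are hypothetical, and the overflow test is one reading of "not 100% accurately preserved", not a normative definition.

```python
import math

# Mode field -> (rounding source, conversion semantics), transcribed from the Mode table.
MODES = {
    0b000: ("fpscr", "openpower"),
    0b001: ("truncate", "openpower"),
    0b010: ("fpscr", "java"),
    0b011: ("truncate", "java"),
    0b100: ("fpscr", "javascript"),
    0b101: ("truncate", "javascript"),
}

def decode_mode(mode: int) -> tuple[str, str]:
    """Return (rounding, semantics); reserved encodings are treated as illegal."""
    if mode not in MODES:
        raise ValueError(f"Mode {mode:#05b} is reserved: illegal instruction trap")
    return MODES[mode]

def overflow_flag(v: float, result: int) -> bool:
    """True when the integer result does not exactly equal the FP input (assumed OE=1 rule)."""
    if math.isnan(v) or math.isinf(v):
        return True
    # Python compares int and float exactly, so no precision is lost in this test.
    return result != v

if __name__ == "__main__":
    print(decode_mode(0b011))                  # rounding and semantics for Mode 011
    print(overflow_flag(2.0**63, 2**63 - 1))   # saturated 64-bit signed result
```

Comparing the integer against the original float directly avoids converting a 64-bit result back to FP64, which could round and hide a saturation.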
+[mode `Mode`]: #fpr-to-gpr-conversion-mode -# FP to Integer Conversion Pseudo-code +## FP to Integer Conversion Pseudo-code Key for pseudo-code: @@ -331,8 +438,10 @@ Key for pseudo-code: | `int` | -- | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV) | | `uint` | -- | the unsigned integer of the same bit-width as `int` | | `int::BITS` | `int` | the bit-width of `int` | -| `int::MIN_VALUE` | `int` | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed) | -| `int::MAX_VALUE` | `int` | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) | +| `uint::MIN_VALUE` | `uint` | the minimum value `uint` can store: `0` | +| `uint::MAX_VALUE` | `uint` | the maximum value `uint` can store: `2^int::BITS - 1` | +| `int::MIN_VALUE` | `int` | the minimum value `int` can store : `-2^(int::BITS-1)` | +| `int::MAX_VALUE` | `int` | the maximum value `int` can store : `2^(int::BITS-1) - 1` | | `int::VALUE_COUNT` | Integer | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`. | | `rint(fp, rounding_mode)` | `fp` | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode` | @@ -350,11 +459,14 @@ def fp_to_int_open_power(v: fp) -> int: return (int)rint(v, rounding_mode) ``` -
-Rust [conversion semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics) (with adjustment to add non-truncate rounding modes): +
+[Java conversion semantics](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3) +/ +[Rust semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics) +(with adjustment to add non-truncate rounding modes): ``` -def fp_to_int_rust(v: fp) -> int: +def fp_to_int_java(v: fp) -> int: if v is NaN: return 0 if v >= int::MAX_VALUE: @@ -365,18 +477,16 @@ def fp_to_int_rust(v: fp) -> int: ```
-JavaScript [conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes): +Section 7.1 of the ECMAScript / JavaScript +[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes): ``` def fp_to_int_java_script(v: fp) -> int: if v is NaN or infinite: return 0 - v = rint(v, rounding_mode) + v = rint(v, rounding_mode) # assume no loss of precision in result v = v mod int::VALUE_COUNT # 2^32 for i32, 2^64 for i64, result is non-negative bits = (uint)v return (int)bits ``` -# Equivalent OpenPower ISA v3.0 Assembly Language for FP -> Integer Conversion Modes - -Moved to [[int_fp_mv/appendix]]
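The three conversion variants in the pseudo-code can also be written as runnable Python, which may help when checking corner cases such as NaN, infinities and out-of-range inputs. This is a sketch under stated assumptions: the `bits`, `signed` and `mode` parameters and the `int_range` helper are hypothetical stand-ins for `int` and `rounding_mode`, only two rounding modes are modelled, and the OpenPower variant is reconstructed from the prose description ("saturation with NaN converted to minimum valid integer").

```python
import math

def rint(v: float, mode: str = "truncate") -> int:
    """Round a finite float to an integer; only two rounding modes are sketched."""
    if mode == "truncate":
        return math.trunc(v)
    if mode == "nearest_even":
        return round(v)  # Python's round() uses round-half-to-even
    raise ValueError(f"unsupported rounding mode: {mode}")

def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return (MIN_VALUE, MAX_VALUE) for the target integer type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

def _saturate(v: float, bits: int, signed: bool, mode: str, nan_result: int) -> int:
    lo, hi = int_range(bits, signed)
    if math.isnan(v):
        return nan_result
    if v >= hi:   # also catches +Infinity
        return hi
    if v <= lo:   # also catches -Infinity
        return lo
    return rint(v, mode)

def fp_to_int_open_power(v, bits=32, signed=True, mode="truncate"):
    """Saturating conversion, NaN becomes the minimum valid integer."""
    return _saturate(v, bits, signed, mode, int_range(bits, signed)[0])

def fp_to_int_java(v, bits=32, signed=True, mode="truncate"):
    """Saturating conversion, NaN becomes 0."""
    return _saturate(v, bits, signed, mode, 0)

def fp_to_int_java_script(v, bits=32, signed=True, mode="truncate"):
    """Modulo-wrapping conversion, NaN and infinities become 0."""
    if math.isnan(v) or math.isinf(v):
        return 0
    wrapped = rint(v, mode) % (1 << bits)  # non-negative for a positive modulus
    if signed and wrapped >= (1 << (bits - 1)):
        wrapped -= 1 << bits               # reinterpret the bits as signed
    return wrapped

if __name__ == "__main__":
    for f in (fp_to_int_open_power, fp_to_int_java, fp_to_int_java_script):
        print(f.__name__, f(float("nan")), f(1e20), f(-1.5))
```

Python integers are arbitrary precision, so the intermediate `rint` result never loses bits, matching the "assume no loss of precision" remark in the JavaScript pseudo-code.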