X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=openpower%2Fsv%2Fint_fp_mv.mdwn;h=225fd684b7290baf53560b57a9f805ddce11e2bc;hb=a8a393c4653918330ae43f2f8b0a21f26046352d;hp=8c62fc954872ef3e5fe03b437f502e23f136636d;hpb=ac0eb020d225cf8aaaad36a1351cf67bd678a9a7;p=libreriscv.git diff --git a/openpower/sv/int_fp_mv.mdwn b/openpower/sv/int_fp_mv.mdwn index 8c62fc954..225fd684b 100644 --- a/openpower/sv/int_fp_mv.mdwn +++ b/openpower/sv/int_fp_mv.mdwn @@ -1,28 +1,48 @@ +[[!tag standards]] + # FPR-to-GPR and GPR-to-FPR +**Draft Status** under development, for submission as an RFC + +Links: + +* +* +* +* + Introduction: High-performance CPU/GPU software needs to often convert between integers and floating-point, therefore fast conversion/data-movement instructions are needed. Also given that initialisation of floats tends to take up -considerable space Geven to just load 0.0) the inclusion of float immediate -is up for consideration. +considerable space (even to just load 0.0) the inclusion of compact +format float immediate is up for consideration using BF16 Libre-SOC will be compliant with the -**Scalar Floating-Point Subset** (SFFS) i.e. is not implementing VMX/VSX. -With VMX/VSX not available in the SFFS Compliancy Level, the +**Scalar Floating-Point Subset** (SFFS) i.e. is not implementing VMX/VSX, +and with its focus on modern 3D GPU hybrid workloads represents an +important new potential use-case for OpenPOWER. + +Prior to the formation of the Compliancy Levels first introduced +in v3.0C and v3.1 +the progressive historic development of the Scalar parts of the Power ISA assumed +that VSX would always be there to complement it. However With VMX/VSX +**not available** in the newly-introduced SFFS Compliancy Level, the existing non-VSX conversion/data-movement instructions require load/store instructions (slow and expensive) to transfer data between the FPRs and -the GPRs. Also, because SimpleV needs efficient scalar instructions in +the GPRs. For a 3D GPU this kills any modern competitive edge. +Also, because SimpleV needs efficient scalar instructions in order to generate efficient vector instructions, adding new instructions -for data-transfer/conversion between FPRs and GPRs seems necessary. +for data-transfer/conversion between FPRs and GPRs multiplies the savings. In addition, the vast majority of GPR <-> FPR data-transfers are as part of a FP <-> Integer conversion sequence, therefore reducing the number of instructions required to the minimum seems necessary. -Therefore, we are proposing adding both: +Therefore, we are proposing adding: +* FPR load-immediate using `BF16` as the constant * FPR <-> GPR data-transfer instructions that just copy bits without conversion * FPR <-> GPR combined data-transfer/conversion instructions that do Integer <-> FP conversions @@ -31,17 +51,44 @@ If we're adding new Integer <-> FP conversion instructions, we may as well take this opportunity to modernise the instructions and make them well suited for common/important conversion sequences: -* standard Integer -> FP conversion - - rounding mode read from FPSCR -* standard OpenPower FP -> Integer conversion -- saturation with NaN - converted to minimum valid integer - - Matches x86's conversion semantics - - Has instructions for both: - * rounding mode read from FPSCR - * rounding mode is always truncate -* Rust FP -> Integer conversion -- saturation with NaN converted to 0 +* standard Integer -> FP IEEE754 conversion (used by most languages and CPUs) +* standard OpenPower FP -> Integer conversion (saturation with NaN + converted to minimum valid integer) +* Rust FP -> Integer conversion (saturation with NaN converted to 0) +* JavaScript FP -> Integer conversion (modular with Inf/NaN converted to 0) + +The assembly listings in the [[int_fp_mv/appendix]] show how costly +some of these language-specific conversions are: Javascript is 35 +scalar instructions, including four branches. + +## FP -> Integer conversions + +Different programming languages turn out to have completely different +semantics for FP to Integer conversion. This section gives an overview +of the different variants, listing the languages and hardware that +implements each variant. -Semantics required by all of: +## standard Integer -> FP conversion + +This conversion is outlined in the IEEE754 specification. It is used +by nearly all programming languages and CPUs. In the case of OpenPOWER, +the rounding mode is read from FPSCR + +### standard OpenPower FP -> Integer conversion + +This conversion, instead of exact IEEE754 Compliance, performs +"saturation with NaN converted to minimum valid integer". This +is also exactly the same as the x86 ISA conversion senantics. +OpenPOWER however has instructions for both: + +* rounding mode read from FPSCR +* rounding mode always set to truncate + +### Rust FP -> Integer conversion + +For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Rust's `as` operator will be referred to as [Rust conversion semantics](#fp-to-int-rust-conversion-semantics). + +Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method): * Rust's FP -> Integer conversion using the [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics) @@ -55,54 +102,97 @@ Semantics required by all of: [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS) instructions when decorated with [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration). -* JavaScript FP -> Integer conversion -- modular with Inf/NaN converted to 0 -Semantics required by JavaScript +### JavaScript FP -> Integer conversion -TODO: review and investigate other language semantics +For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScripts's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics). -# Links +This instruction is present in ARM assembler as FJCVTZS + -* -* -* -* +### Other languages + +TODO: review and investigate other language semantics # Proposed New Scalar Instructions -All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR. This can be overridden by SimpleV. +All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR. All integers however are sourced/stored in the *GPR*. + +Integer operands and results being in the GPR is the key differentiator between the proposed instructions +(the entire rationale) compated to existing Scalar Power ISA. +In all existing Power ISA Scalar conversion instructions, all +operands are FPRs, even if the format of the source or destination +data is actually a scalar integer. + +Note that source and destination widths can be overridden by SimpleV +SVP64, and that SVP64 also has Saturation Modes *in addition* +to those independently described here. SVP64 Overrides and Saturation +work on *both* Fixed *and* Floating Point operands and results. + The interactions with SVP64 +are explained in the [[int_fp_mv/appendix]] ## FPR to GPR moves -`fmvtg RT, FRA` +* `fmvtg RT, FRA` +* `fmvtg. RT, FRA` + +move a 64-bit float from a FPR to a GPR, just copying bits directly. +As a direct bitcopy, no exceptions occur and no status flags are set. -move a 64-bit float from a FPR to a GPR, just copying bits. +Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point +operations. -`fmvtgs RT, FRA` +* `fmvtgs RT, FRA` +* `fmvtgs. RT, FRA` move a 32-bit float from a FPR to a GPR, just copying bits. Converts the 64-bit float in `FRA` to a 32-bit float, then writes the 32-bit float to -`RT`. +`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg` +and therefore has the exact same exception and flags behaviour of `frsp` + +Unlike `frsp` however, with RT being a GPR, Rc=1 follows +standard *integer* behaviour, i.e. tests RT and sets CR0. ## GPR to FPR moves `fmvfg FRT, RA` -move a 64-bit float from a GPR to a FPR, just copying bits. +move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions +are raised, no flags are altered of any kind. + +Rc=1 tests FRT and sets CR1 `fmvfgs FRT, RA` move a 32-bit float from a GPR to a FPR, just copying bits. Converts the 32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to -`FRT`. +`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and +therefore has the exact same exception and flags behaviour of `frsp` + +Rc=1 tests FRT and sets CR1 + +TODO: clear statement on evaluation as to whether exceptions or flags raised as part of the **FP** conversion (not the int bitcopy part, the conversion part. the semantics should really be the same as frsp) + +v3.0C section 4.6.7.1 states: + +FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1. + + Special Registers Altered: + FPRF FR FI + FX OX UX XX VXSNAN + CR1 (if Rc=1) ### Float load immediate (kinda a variant of `fmvfg`) -`fmvis FRT, UI` +`fmvis FRT, FI` -Reinterprets `UI << 16` as a 32-bit float, which is then converted to a +Reinterprets `FI << 16` as a 32-bit float, which is then converted to a 64-bit float and written to `FRT`. This is equivalent to reinterpreting -`UI` as a bf16 and converting to 64-bit float, writing to `FRT`. +`FI` as a `BF16` and converting to 64-bit float. + +There is no need for an Rc=1 variant because this is an immediate loading +instruction. This frees up one extra bit in the X-Form format for packing +a full `BF16`. Example: @@ -118,15 +208,41 @@ fmvis f4, 0x7FC0 # writes +qNaN to f4 fmvis f4, 0x7F80 # writes +Infinity to f4 fmvis f4, 0xFF80 # writes -Infinity to f4 fmvis f4, 0x3FFF # writes +1.9921875 to f4 + +# clearing 128 FPRs with 2 SVP64 instructions +# by issuing 32 vec4 (subvector length 4) ops +setvli VL=MVL=32 +sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127 ``` Important: If the float load immediate instruction(s) are left out, change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions) -to instead write `+0.0` if `RA` is register `0`, allowing clearing FPRs. +to instead write `+0.0` if `RA` is register `0`, at least +allowing clearing FPRs. + +| 0-5 | 6-10 | 11-25 | 26-30 | 31 | +|--------|------|-------|-------|-----| +| Major | FRT | FI | XO | FI0 | + +The above fits reasonably well with Minor 19 and follows the +pattern shown by `addpcis`, which uses an entire column of Minor 19 +XO. 15 bits of FI fit into bits 11 to 25, +the top bit FI0 (MSB0 numbered 0) makes 16. + + bf16 = FI0 || FI + fp32 = bf16 || [0]*16 + FRT = Single_to_Double(fp32) ## FPR to GPR conversions
+X-Form: + +| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | +|--------|------|--------|-------|-------|----| +| Major | RT | //Mode | FRA | XO | Rc | +| Major | FRT | //Mode | RA | XO | Rc | + Mode values: | Mode | `rounding_mode` | Semantics | @@ -265,7 +381,8 @@ def fp_to_int_rust(v: fp) -> int: ```
-JavaScript [conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes): +Section 7.1 of the ECMAScript / JavaScript +[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes): ``` def fp_to_int_java_script(v: fp) -> int: @@ -276,3 +393,7 @@ def fp_to_int_java_script(v: fp) -> int: bits = (uint)v return (int)bits ``` + +# Equivalent OpenPower ISA v3.0 Assembly Language for FP -> Integer Conversion Modes + +Moved to [[int_fp_mv/appendix]]