(no commit message)

diff --git a/openpower/sv/int_fp_mv.mdwn b/openpower/sv/int_fp_mv.mdwn

index 531de77d285fdc125645603ead23ef96dad3d147..225fd684b7290baf53560b57a9f805ddce11e2bc 100644 (file)
--- a/openpower/sv/int_fp_mv.mdwn
+++ b/openpower/sv/int_fp_mv.mdwn
@@ -1,23 +1,40 @@
+[[!tag standards]]
+
  # FPR-to-GPR and GPR-to-FPR
  
+**Draft Status** under development, for submission as an RFC
+
+Links:
+
+* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
+
  Introduction:
  
  High-performance CPU/GPU software needs to often convert between integers
  and floating-point, therefore fast conversion/data-movement instructions
  are needed.  Also given that initialisation of floats tends to take up
-considerable space (even to just load 0.0) the inclusion of float immediate
-is up for consideration (BF16 as immediates)
+considerable space (even to just load 0.0), the inclusion of a compact-format
+float immediate, using BF16, is up for consideration.
  
  Libre-SOC will be compliant with the
  **Scalar Floating-Point Subset** (SFFS) i.e. is not implementing VMX/VSX,
  and with its focus on modern 3D GPU hybrid workloads represents an
  important new potential use-case for OpenPOWER.
-With VMX/VSX not available in the SFFS Compliancy Level, the 
+
+Prior to the formation of the Compliancy Levels, first introduced
+in v3.0C and v3.1,
+the progressive historic development of the Scalar parts of the Power ISA assumed
+that VSX would always be there to complement them. However, with VMX/VSX
+**not available** in the newly-introduced SFFS Compliancy Level, the
  existing non-VSX conversion/data-movement instructions require load/store
  instructions (slow and expensive) to transfer data between the FPRs and
-the GPRs.  Also, because SimpleV needs efficient scalar instructions in
+the GPRs.  For a 3D GPU this kills any modern competitive edge.
+Also, because SimpleV needs efficient scalar instructions in
  order to generate efficient vector instructions, adding new instructions
-for data-transfer/conversion between FPRs and GPRs seems necessary.
+for data-transfer/conversion between FPRs and GPRs multiplies the savings.
  
  In addition, the vast majority of GPR <-> FPR data-transfers are as part
  of a FP <-> Integer conversion sequence, therefore reducing the number
@@ -34,72 +51,85 @@ If we're adding new Integer <-> FP conversion instructions, we may
  as well take this opportunity to modernise the instructions and make them
  well suited for common/important conversion sequences:
  
-* standard Integer -> FP conversion
-  - rounding mode read from FPSCR
-* standard OpenPower FP -> Integer conversion -- saturation with NaN
-  converted to minimum valid integer
-  - Matches x86's conversion semantics
-  - Has instructions for both:
-    * rounding mode read from FPSCR
-    * rounding mode is always truncate
-* Rust FP -> Integer conversion -- saturation with NaN converted to 0
-
-    Semantics required by all of:
-
-    * Rust's FP -> Integer conversion using the
-      [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
-    * Java's
-      [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
-    * LLVM's
-      [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
-      [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
-    * SPIR-V's OpenCL dialect's
-      [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
-      [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
-      instructions when decorated with
-      [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).
-* JavaScript FP -> Integer conversion -- modular with Inf/NaN converted to 0
-
-    Semantics required by JavaScript
+* standard Integer -> FP IEEE754 conversion (used by most languages and CPUs)
+* standard OpenPower FP -> Integer conversion (saturation with NaN
+  converted to minimum valid integer)
+* Rust FP -> Integer conversion (saturation with NaN converted to 0)
+* JavaScript FP -> Integer conversion (modular with Inf/NaN converted to 0)
  
-TODO: review and investigate other language semantics
+The assembly listings in the [[int_fp_mv/appendix]] show how costly
+some of these language-specific conversions are: JavaScript takes 35
+scalar instructions, including four branches.
  
-# Links
+## FP -> Integer conversions
  
-* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
-* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
-* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
-* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
+Different programming languages turn out to have completely different
+semantics for FP to Integer conversion.  This section gives an overview
+of the different variants, listing the languages and hardware that
+implement each variant.
  
-# Proposed New Scalar Instructions
+### standard Integer -> FP conversion
+
+This conversion is outlined in the IEEE754 specification.  It is used
+by nearly all programming languages and CPUs.  In the case of OpenPOWER,
+the rounding mode is read from FPSCR.
+
+### standard OpenPower FP -> Integer conversion
+
+This conversion, instead of exact IEEE754 compliance, performs
+"saturation with NaN converted to minimum valid integer". This
+is exactly the same as the x86 ISA conversion semantics.
+OpenPOWER however has instructions for both:
+
+* rounding mode read from FPSCR
+* rounding mode always set to truncate
+
+### Rust FP -> Integer conversion
+
+For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Rust's `as` operator will be referred to as [Rust conversion semantics](#fp-to-int-rust-conversion-semantics).
  
-All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR.
+Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method):
  
-This can be overridden by SimpleV, which sets the following
-operation "reinterpretation" rules:
+* Rust's FP -> Integer conversion using the
+  [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
+* Java's
+  [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
+* LLVM's
+  [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
+  [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
+* SPIR-V's OpenCL dialect's
+  [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
+  [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
+  instructions when decorated with
+  [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).
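The difference between the OpenPower and Rust variants described above comes down to the NaN result. A minimal Python sketch, truncate-only and ignoring FPSCR flags, may help; the function name and its `nan_to_min` parameter are illustrative assumptions and are not part of the proposal:

```python
import math

def fp_to_int_saturating(v: float, bits: int = 32, signed: bool = True,
                         nan_to_min: bool = False) -> int:
    """Hypothetical model: truncate toward zero, clamping to the integer range.

    nan_to_min=False gives the Rust-style result (NaN -> 0);
    nan_to_min=True gives the OpenPower-style result (NaN -> minimum integer).
    """
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return lo if nan_to_min else 0
    if math.isinf(v):
        return hi if v > 0 else lo
    # math.trunc returns an exact Python int, so clamping cannot overflow
    return max(lo, min(hi, math.trunc(v)))
```

For example, `fp_to_int_saturating(1e20)` clamps to `2147483647`, while a NaN input gives `0` by default and `-2147483648` with `nan_to_min=True`.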
  
-* any operation whose assembler mnemonic does not end in "s"
-  (being defined in v3.0B as a "double" operation) is
-  instead an operation at the overridden elwidth for the
-  relevant operand.
-* any operation nominally defined as a "single" FP operation
-  is redefined to be **half the elwidth** rather than
-  "half of 64 bit".
+### JavaScript FP -> Integer conversion
  
-Examples:
+For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).
+
+A conversion with these semantics is present in the ARM ISA as `FJCVTZS`:
+<https://developer.arm.com/documentation/dui0801/g/hko1477562192868>
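As a rough illustration of "modular with Inf/NaN converted to 0", the following Python sketch models the 32-bit signed, truncating case only (the non-truncate rounding modes are not modelled). The function name is a hypothetical one chosen for this example:

```python
import math

def fp_to_int32_javascript(v: float) -> int:
    """Hypothetical model of ToInt32-style conversion (truncate, wrap mod 2**32)."""
    if math.isnan(v) or math.isinf(v):
        return 0
    n = math.trunc(v)        # exact: Python integers have arbitrary precision
    n &= 0xFFFFFFFF          # wrap modulo 2**32 (also correct for negative n)
    # reinterpret the low 32 bits as a signed two's-complement value
    return n - (1 << 32) if n >= (1 << 31) else n
```

Unlike the saturating variants, an out-of-range input such as `2.0**32 + 5` wraps around to `5` rather than clamping to the maximum integer.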
+
+### Other languages
+
+TODO: review and investigate other language semantics
+
+# Proposed New Scalar Instructions
  
-* `sv.fmvtg/sw=32 RT.v, FRA.v` is defined as treating FRA
-   as a vector of *FP32* source operands each *32* bits wide
-   which are to be placed into *64* bit integer destination elements.
-* `sv.fmvfgs/dw=32 FRT.v, RA.v` is defined as taking the bottom
-   32 bits of each RA integer source, then performing a **32 bit**
-   FP32 to **FP16** conversion and storing the result in the
-   **32 bits** of an FRT destination element.
+All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR.  All integers however are sourced/stored in the *GPR*.
  
-"Single" is therefore redefined in SVP64 to be "half elwidth"
-rather than Double width hardcoded to 64 and Single width
-hardcoded to 32.  This allows a full range of conversions
-between FP64, FP32, FP16 and BF16.
+Having integer operands and results in the GPRs is the key differentiator
+(and the entire rationale) of the proposed instructions compared to the existing Scalar Power ISA.
+In all existing Power ISA Scalar conversion instructions, all
+operands are FPRs, even if the format of the source or destination
+data is actually a scalar integer.
+
+Note that source and destination widths can be overridden by SimpleV
+SVP64, and that SVP64 also has Saturation Modes *in addition*
+to those independently described here. SVP64 Overrides and Saturation
+work on *both* Fixed *and* Floating Point operands and results.
+The interactions with SVP64
+are explained in the [[int_fp_mv/appendix]].
  
  ## FPR to GPR moves
  
@@ -107,29 +137,50 @@ between FP64, FP32, FP16 and BF16.
  * `fmvtg. RT, FRA`
  
  move a 64-bit float from a FPR to a GPR, just copying bits directly.
-Rc=1 tests RT and sets CR0
+As a direct bitcopy, no exceptions occur and no status flags are set.
+
+Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
+operations.
  
  * `fmvtgs RT, FRA`
  * `fmvtgs. RT, FRA`
  
  move a 32-bit float from a FPR to a GPR, just copying bits. Converts the
  64-bit float in `FRA` to a 32-bit float, then writes the 32-bit float to
-`RT`.
-Rc=1 tests RT and sets CR0
+`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
+and therefore has exactly the same exception and flags behaviour as `frsp`.
+
+Unlike `frsp` however, with RT being a GPR, Rc=1 follows
+standard *integer* behaviour, i.e. tests RT and sets CR0.
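The data movement performed by the two FPR to GPR moves can be sketched in Python with the standard `struct` module. This is only a model of the bit patterns: CR0, FPSCR flags and rounding-mode selection are not modelled, `struct` always rounds to nearest, and it raises `OverflowError` for finite values beyond the FP32 range where hardware would apply its overflow handling. The Python function names simply mirror the mnemonics for readability:

```python
import struct

def fmvtg(fra: float) -> int:
    """Model of fmvtg: return the raw 64-bit pattern of a double, unchanged."""
    return struct.unpack("<Q", struct.pack("<d", fra))[0]

def fmvtgs(fra: float) -> int:
    """Model of fmvtgs: round the double to single precision (the frsp step),
    then return the raw 32-bit pattern (the bitcopy step)."""
    return struct.unpack("<I", struct.pack("<f", fra))[0]
```

For instance, `hex(fmvtg(1.0))` is `0x3ff0000000000000`, whereas `hex(fmvtgs(1.0))` is `0x3f800000`.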
  
  ## GPR to FPR moves
  
  `fmvfg FRT, RA`
  
-move a 64-bit float from a GPR to a FPR, just copying bits.
+move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions
+are raised and no flags of any kind are altered.
+
+Rc=1 tests FRT and sets CR1
  
  `fmvfgs FRT, RA`
  
  move a 32-bit float from a GPR to a FPR, just copying bits. Converts the
  32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
-`FRT`.
+`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
+therefore has exactly the same exception and flags behaviour as `frsp`.
+
+Rc=1 tests FRT and sets CR1
+
+TODO: add a clear statement on whether exceptions or flags are raised as part of the **FP** conversion (the conversion part, not the integer bitcopy part). The semantics should really be the same as `frsp`.
+
+v3.0C section 4.6.7.1 states:
  
-TODO: Rc=1 variants?
+FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.
+
+    Special Registers Altered:
+      FPRF FR FI
+      FX OX UX XX VXSNAN
+      CR1 (if Rc=1)
  
  ### Float load immediate (kinda a variant of `fmvfg`)
  
@@ -139,6 +190,10 @@ Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
  64-bit float and written to `FRT`.  This is equivalent to reinterpreting
  `FI` as a `BF16` and converting to 64-bit float.
  
+There is no need for an Rc=1 variant because this is an immediate loading
+instruction. This frees up one extra bit in the X-Form format for packing
+a full `BF16`.
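The equivalence between "reinterpret `FI << 16` as a 32-bit float" and "treat `FI` as a `BF16`" can be checked with a small Python sketch. The function name `bf16_imm_to_double` is a hypothetical one used only for this illustration:

```python
import struct

def bf16_imm_to_double(fi: int) -> float:
    """Hypothetical model: place the 16-bit immediate in the top half of a
    32-bit word, reinterpret that word as FP32, and widen to a double."""
    bits32 = (fi & 0xFFFF) << 16
    # unpacking "<f" yields a Python float, which is already a 64-bit double
    return struct.unpack("<f", struct.pack("<I", bits32))[0]
```

Because BF16 is the upper 16 bits of an FP32 value, every BF16 immediate widens exactly: `bf16_imm_to_double(0x3F80)` gives `1.0` and `bf16_imm_to_double(0x4049)` gives `3.140625`.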
+
  Example:
  
  ```
@@ -326,7 +381,8 @@ def fp_to_int_rust<fp, int>(v: fp) -> int:
  ```
  
  <div id="fp-to-int-javascript-conversion-semantics"></div>
-JavaScript [conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):
+Section 7.1 of the ECMAScript / JavaScript
+[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):
  
  ```
  def fp_to_int_java_script<fp, int>(v: fp) -> int:
@@ -340,133 +396,4 @@ def fp_to_int_java_script<fp, int>(v: fp) -> int:
  
  # Equivalent OpenPower ISA v3.0 Assembly Language for FP -> Integer Conversion Modes
  
-## Rust
-
-https://rust.godbolt.org/z/jervW7ofb
-
-### 64-bit float -> 64-bit signed integer
-
-```
-.LCPI0_0:
-        .long   0xdf000000
-.LCPI0_1:
-        .quad   0x43dfffffffffffff
-example::fcvttgd_rust:
-.Lfunc_gep0:
-        addis 2, 12, .TOC.-.Lfunc_gep0@ha
-        addi 2, 2, .TOC.-.Lfunc_gep0@l
-        addis 3, 2, .LCPI0_0@toc@ha
-        fctidz 2, 1
-        fcmpu 5, 1, 1
-        li 4, 1
-        li 5, -1
-        lfs 0, .LCPI0_0@toc@l(3)
-        addis 3, 2, .LCPI0_1@toc@ha
-        rldic 4, 4, 63, 0
-        fcmpu 0, 1, 0
-        lfd 0, .LCPI0_1@toc@l(3)
-        stfd 2, -8(1)
-        ld 3, -8(1)
-        fcmpu 1, 1, 0
-        cror 24, 0, 3
-        isel 3, 4, 3, 24
-        rldic 4, 5, 0, 1
-        isel 3, 4, 3, 5
-        isel 3, 0, 3, 23
-        blr
-        .long   0
-        .quad   0
-```
-
-### 64-bit float -> 64-bit unsigned integer
-
-```
-.LCPI1_0:
-        .long   0x00000000
-.LCPI1_1:
-        .quad   0x43efffffffffffff
-example::fcvttgud_rust:
-.Lfunc_gep1:
-        addis 2, 12, .TOC.-.Lfunc_gep1@ha
-        addi 2, 2, .TOC.-.Lfunc_gep1@l
-        addis 3, 2, .LCPI1_0@toc@ha
-        fctiduz 2, 1
-        li 4, -1
-        lfs 0, .LCPI1_0@toc@l(3)
-        addis 3, 2, .LCPI1_1@toc@ha
-        fcmpu 0, 1, 0
-        lfd 0, .LCPI1_1@toc@l(3)
-        stfd 2, -8(1)
-        ld 3, -8(1)
-        fcmpu 1, 1, 0
-        cror 20, 0, 3
-        isel 3, 0, 3, 20
-        isel 3, 4, 3, 5
-        blr
-        .long   0
-        .quad   0
-```
-
-### 64-bit float -> 32-bit signed integer
-
-```
-.LCPI2_0:
-        .long   0xcf000000
-.LCPI2_1:
-        .quad   0x41dfffffffc00000
-example::fcvttgw_rust:
-.Lfunc_gep2:
-        addis 2, 12, .TOC.-.Lfunc_gep2@ha
-        addi 2, 2, .TOC.-.Lfunc_gep2@l
-        addis 3, 2, .LCPI2_0@toc@ha
-        fctiwz 2, 1
-        lis 4, -32768
-        lis 5, 32767
-        lfs 0, .LCPI2_0@toc@l(3)
-        addis 3, 2, .LCPI2_1@toc@ha
-        fcmpu 0, 1, 0
-        lfd 0, .LCPI2_1@toc@l(3)
-        addi 3, 1, -4
-        stfiwx 2, 0, 3
-        fcmpu 5, 1, 1
-        lwz 3, -4(1)
-        fcmpu 1, 1, 0
-        cror 24, 0, 3
-        isel 3, 4, 3, 24
-        ori 4, 5, 65535
-        isel 3, 4, 3, 5
-        isel 3, 0, 3, 23
-        blr
-        .long   0
-        .quad   0
-```
-
-### 64-bit float -> 32-bit unsigned integer
-
-```
-.LCPI3_0:
-        .long   0x00000000
-.LCPI3_1:
-        .quad   0x41efffffffe00000
-example::fcvttguw_rust:
-.Lfunc_gep3:
-        addis 2, 12, .TOC.-.Lfunc_gep3@ha
-        addi 2, 2, .TOC.-.Lfunc_gep3@l
-        addis 3, 2, .LCPI3_0@toc@ha
-        fctiwuz 2, 1
-        li 4, -1
-        lfs 0, .LCPI3_0@toc@l(3)
-        addis 3, 2, .LCPI3_1@toc@ha
-        fcmpu 0, 1, 0
-        lfd 0, .LCPI3_1@toc@l(3)
-        addi 3, 1, -4
-        stfiwx 2, 0, 3
-        lwz 3, -4(1)
-        fcmpu 1, 1, 0
-        cror 20, 0, 3
-        isel 3, 0, 3, 20
-        isel 3, 4, 3, 5
-        blr
-        .long   0
-        .quad   0
-```
+Moved to [[int_fp_mv/appendix]]