(no commit message)

[libreriscv.git] / openpower / sv / int_fp_mv.mdwn
diff --git a/openpower/sv/int_fp_mv.mdwn b/openpower/sv/int_fp_mv.mdwn

index 475fc250ffcefb7d34195a971345cc600371ef72..ad1740f0fb9078984e909a3b30b659dd1e63ee5d 100644 (file)
--- a/openpower/sv/int_fp_mv.mdwn
+++ b/openpower/sv/int_fp_mv.mdwn
@@ -12,6 +12,19 @@ Links:
  * <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
  * [[int_fp_mv/appendix]]
  
+Trademarks:
+
+* Rust is a Trademark of the Rust Foundation
+* Java and JavaScript are Trademarks of Oracle
+* LLVM is a Trademark of the LLVM Foundation
+* SPIR-V is a Trademark of the Khronos Group
+* OpenCL is a Trademark of Apple, Inc.
+
+Referring to these Trademarks within this document
+is by necessity, in order to put the semantics of each language
+into context, and is considered "fair use" under Trademark
+Law.
+
  Introduction:
  
  High-performance CPU/GPU software often needs to convert between integers
@@ -32,9 +45,11 @@ in v3.0C and v3.1
  the progressive historic development of the Scalar parts of the Power ISA assumed
  that VSX would always be there to complement it. However, with VMX/VSX
  **not available** in the newly-introduced SFFS Compliancy Level, the
-existing non-VSX conversion/data-movement instructions require load/store
+existing non-VSX conversion/data-movement instructions require 
+a Vector of load/store
  instructions (slow and expensive) to transfer data between the FPRs and
-the GPRs.  For a 3D GPU this kills any modern competitive edge.
+the GPRs.  For a modern 3D GPU this kills any possibility of a
+competitive edge.
  Also, because SimpleV needs efficient scalar instructions in
  order to generate efficient vector instructions, adding new instructions
  for data-transfer/conversion between FPRs and GPRs multiplies the savings.
@@ -95,21 +110,22 @@ These are like a variant of `fmvfg` and `oris`, combined.
  Power ISA currently requires a large
  number of instructions to get Floating Point constants into registers.
  `fmvis` on its own is equivalent to BF16 to FP32/64 conversion,
-but if followed up by `fishmv` an additional 16 bits of accuracy in the
+but if followed up by `frlsi` an additional 16 bits of accuracy in the
  mantissa may be achieved.
  
  *IBM may consider it worthwhile to extend these two instructions to
-v3.1 Prefixed (`pfmvis` and `pfishmv`). If so it is recommended that
-`pfmvis` load a full FP32 immediate and `pfishmv` extend the lower
-32-bits to construct a full FP64 immediate.*
+v3.1 Prefixed (`pfmvis` and `pfrlsi`). If so it is recommended that
+`pfmvis` load a full FP32 immediate and `pfrlsi` supply the three
+missing high exponent bits (numbered 8 to 10) and the additional lower
+29 mantissa bits (23 to 51) needed to construct a full FP64 immediate.*
  
  ## Load BF16 Immediate
  
-`fmvis FRS, FI`
+`fmvis FRS, D`
  
-Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
+Reinterprets `D << 16` as a 32-bit float, which is then converted to a
  64-bit float and written to `FRS`.  This is equivalent to reinterpreting
-`FI` as a `BF16` and converting to 64-bit float.
+`D` as a `BF16` and converting to 64-bit float.
  There is no need for an Rc=1 variant because this is an immediate loading
  instruction.
  
@@ -150,9 +166,9 @@ Pseudocode:
      fp32 = bf16 || [0]*16
      FRS = Single_to_Double(fp32)
  
-## Float Immediate, Second Half <a name="fishmv"></a>
+## Float Replace Lower-Half Single, Immediate <a name="frlsi"></a>
  
-`fishmv FRS, FI`
+`frlsi FRS, D`
  
  DX-Form:
  
@@ -163,14 +179,16 @@ DX-Form:
  Strategically similar to how `oris` is used to construct
  32-bit Integers, an additional 16-bits of immediate is
  inserted into `FRS` to extend its accuracy to
-a full FP32. If a prior `fmvis` instruction had been used to
-set the upper 16-bits of an FP32 value, `fishmv` contains the
+a full FP32 (stored as usual in FP64 Format within the FPR).
+If a prior `fmvis` instruction had been used to
+set the upper 16-bits of an FP32 value, `frlsi` contains the
  lower 16-bits.
  
-The key difference between using `li` and `oris` to construxt 32-bit
-GPR Immediates and `fishmv` is that the `fmvis` will have converted
-the `BF16` to FP64 (Double) format. This is taken into consideration
-as can be seen in the pseudocode below
+The key difference between using `li` and `oris` to construct 32-bit
+GPR Immediates and `frlsi` is that the `fmvis` will have converted
+the `BF16` immediate to FP64 (Double) format.
+This is taken into consideration
+as can be seen in the pseudocode below.
  
  Pseudocode:
  
@@ -190,7 +208,7 @@ Example:
  # first the upper bits, happens to be +1.0
  fmvis f4, 0x3F80 # writes +1.0 to f4
  # now write the lower 16 bits of an FP32
-fishmv f4, 0x8000 # writes +1.00390625 to f4
+frlsi f4, 0x8000 # writes +1.00390625 to f4
  ```
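
The example above can be checked with a small Python model that works directly on bit patterns. This is only a sketch under stated assumptions: the function names `fmvis` and `frlsi` mirror the mnemonics but are hypothetical helpers, and `frlsi` is assumed to convert the FPR contents back to FP32, replace the lower 16 bits, and convert the result back to FP64. That round trip is only exact when `FRS` already holds a value representable in FP32, as it does after `fmvis`.

```python
import struct


def fmvis(d):
    """Model of `fmvis`: reinterpret D << 16 as FP32, widen to FP64.

    Python floats are FP64, so unpacking an FP32 performs the widening.
    """
    bits32 = (d & 0xFFFF) << 16
    return struct.unpack(">f", struct.pack(">I", bits32))[0]


def frlsi(frs, d):
    """Assumed model of `frlsi`: replace the lower 16 bits of the FP32 view.

    Assumption: `frs` is exactly representable as FP32 (for example, it was
    produced by `fmvis`), so narrowing to FP32 loses nothing.
    """
    upper = struct.unpack(">I", struct.pack(">f", frs))[0] & 0xFFFF0000
    bits32 = upper | (d & 0xFFFF)
    return struct.unpack(">f", struct.pack(">I", bits32))[0]


f4 = fmvis(0x3F80)      # upper 16 bits of FP32 +1.0
f4 = frlsi(f4, 0x8000)  # sets mantissa bit 15, adding 2**-8
print(f4)
```

Setting bit 15 of the 23-bit FP32 mantissa adds 2^15 / 2^23 = 2^-8 = 0.00390625 to the value, which is where the +1.00390625 in the assembler example comes from.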
  
  # Moves
@@ -326,7 +344,7 @@ the rounding mode is read from FPSCR
  
  This conversion, instead of exact IEEE754 Compliance, performs
  "saturation with NaN converted to minimum valid integer". This
-is also exactly the same as the x86 ISA conversion senantics.
+is also exactly the same as the x86 ISA conversion semantics.
  OpenPOWER however has instructions for both:
  
  * rounding mode read from FPSCR
@@ -351,6 +369,9 @@ Those same semantics are used in some way by all of the following languages (not
    [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
    instructions when decorated with
    [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).
+* WebAssembly has also introduced
+ [trunc_sat_u](https://webassembly.github.io/spec/core/exec/numerics.html#op-trunc-sat-u) and
+ [trunc_sat_s](https://webassembly.github.io/spec/core/exec/numerics.html#op-trunc-sat-s)
  
  **JavaScript conversion**
  
@@ -365,20 +386,30 @@ This instruction is present in ARM assembler as FJCVTZS
  |--------|------|--------|-------|-------|----|------|
  |  Major | RT   | //Mode | FRA   | XO    | Rc |X-Form|
  
+**Rc=1 and OE=1**
+
+All of these instructions have an Rc=1 mode which sets CR0
+in the normal way for any instructions producing a GPR result.
+Additionally, when OE=1, if the numerical value of the FP number
+is not preserved exactly (due to truncation or saturation,
+and including when the FP number was NaN) then this is considered
+to be an integer Overflow condition, and CR0.SO, XER.SO and XER.OV
+are all set as normal for any GPR instructions that overflow.
+
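The OE=1 rule described above can be illustrated with a short Python sketch. It is not taken from the specification: the function name `convert_with_overflow` and its parameters are hypothetical, rounding is assumed to be truncation toward zero, and the NaN result is passed in as `nan_result` because it depends on the conversion mode selected. The returned flag models the condition under which CR0.SO, XER.SO and XER.OV would be set.

```python
import math


def convert_with_overflow(v, bits=32, signed=True, nan_result=0):
    """Return (integer result, overflow flag) for a saturating conversion.

    Hypothetical model: the flag is True whenever the integer result does
    not have exactly the same numerical value as the input `v`.
    """
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return nan_result, True          # NaN can never be preserved
    if math.isinf(v):
        return (hi if v > 0 else lo), True
    truncated = math.trunc(v)            # assumed round-toward-zero
    result = min(max(truncated, lo), hi)  # saturate to the integer range
    # Python compares int and float exactly, so this detects any loss
    # from either truncation or saturation.
    return result, result != v


print(convert_with_overflow(1.0))    # exact
print(convert_with_overflow(1.5))    # truncated
print(convert_with_overflow(3.0e9))  # saturated
```
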
  **Instructions**
  
  * `fcvttgw RT, FRA, Mode`
    Convert from 64-bit float to 32-bit signed integer, writing the result
-  to the GPR `RT`. Converts using [mode `Mode`]
+  to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctiw` or `fctiwz`
  * `fcvttguw RT, FRA, Mode`
    Convert from 64-bit float to 32-bit unsigned integer, writing the result
-  to the GPR `RT`. Converts using [mode `Mode`]
+  to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctiwu` or `fctiwuz`
  * `fcvttgd RT, FRA, Mode`
    Convert from 64-bit float to 64-bit signed integer, writing the result
-  to the GPR `RT`. Converts using [mode `Mode`]
+  to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctid` or `fctidz`
  * `fcvttgud RT, FRA, Mode`
    Convert from 64-bit float to 64-bit unsigned integer, writing the result
-  to the GPR `RT`. Converts using [mode `Mode`]
+  to the GPR `RT`. Converts using [mode `Mode`]. Similar to `fctidu` or `fctiduz`
  * `fcvtstgw RT, FRA, Mode`
    Convert from 32-bit float to 32-bit signed integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`]
@@ -404,8 +435,10 @@ Key for pseudo-code:
  | `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                              |
  | `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                |
  | `int::BITS`               | `int`       | the bit-width of `int`                                                                             |
-| `int::MIN_VALUE`          | `int`       | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed)                  |
-| `int::MAX_VALUE`          | `int`       | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
+| `uint::MIN_VALUE`         | `uint`      | the minimum value `uint` can store: `0`                                                            |
+| `uint::MAX_VALUE`         | `uint`      | the maximum value `uint` can store: `2^int::BITS - 1`                                              |
+| `int::MIN_VALUE`          | `int`       | the minimum value `int` can store: `-2^(int::BITS-1)`                                              |
+| `int::MAX_VALUE`          | `int`       | the maximum value `int` can store: `2^(int::BITS-1) - 1`                                           |
  | `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.           |
  | `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`      |
  
@@ -448,7 +481,7 @@ Section 7.1 of the ECMAScript / JavaScript
  def fp_to_int_java_script<fp, int>(v: fp) -> int:
      if v is NaN or infinite:
          return 0
-    v = rint(v, rounding_mode)
+    v = rint(v, rounding_mode)  # assume no loss of precision in result
      v = v mod int::VALUE_COUNT  # 2^32 for i32, 2^64 for i64, result is non-negative
      bits = (uint)v
      return (int)bits
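
The pseudocode above can be sketched as runnable Python. This is an illustrative model only: the `bits` parameter is a hypothetical stand-in for the generic `int` type, and rounding is fixed to truncation toward zero (as in ECMAScript's `ToInt32`) instead of taking the `rounding_mode` used by the pseudocode.

```python
import math


def fp_to_int_java_script(v, bits=32):
    """Convert a float to a signed `bits`-wide integer, JavaScript style."""
    if math.isnan(v) or math.isinf(v):
        return 0
    v = math.trunc(v)        # assumed rounding: toward zero, exact in Python
    v %= 1 << bits           # Python's % yields a non-negative result here
    if v >= 1 << (bits - 1):
        v -= 1 << bits       # reinterpret the unsigned bit pattern as signed
    return v


print(fp_to_int_java_script(2.0**32 + 5.0))  # wraps modulo 2**32
print(fp_to_int_java_script(-1.5))
print(fp_to_int_java_script(2147483648.0))   # wraps to the most negative i32
```

Unlike the saturating conversions, out-of-range values wrap around modulo `2^bits` instead of clamping to the integer limits.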