+* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
+
+Introduction:
+
+High-performance CPU/GPU software needs to often convert between integers
+and floating-point, therefore fast conversion/data-movement instructions
+are needed. Also given that initialisation of floats tends to take up
+considerable space (even to just load 0.0) the inclusion of compact
+format float immediate is up for consideration using BF16
+
+Libre-SOC will be compliant with the
+**Scalar Floating-Point Subset** (SFFS) i.e. is not implementing VMX/VSX,
+and with its focus on modern 3D GPU hybrid workloads represents an
+important new potential use-case for OpenPOWER.
+
+Prior to the formation of the Compliancy Levels first introduced
+in v3.0C and v3.1
+the progressive historic development of the Scalar parts of the Power ISA assumed
+that VSX would always be there to complement it. However With VMX/VSX
+**not available** in the newly-introduced SFFS Compliancy Level, the
+existing non-VSX conversion/data-movement instructions require load/store
+instructions (slow and expensive) to transfer data between the FPRs and
+the GPRs. For a 3D GPU this kills any modern competitive edge.
+Also, because SimpleV needs efficient scalar instructions in
+order to generate efficient vector instructions, adding new instructions
+for data-transfer/conversion between FPRs and GPRs multiplies the savings.
+
+In addition, the vast majority of GPR <-> FPR data-transfers are as part
+of a FP <-> Integer conversion sequence, therefore reducing the number
+of instructions required to the minimum seems necessary.
+
+Therefore, we are proposing adding:
+
+* FPR load-immediate using `BF16` as the constant
+* FPR <-> GPR data-transfer instructions that just copy bits without conversion
+* FPR <-> GPR combined data-transfer/conversion instructions that do
+ Integer <-> FP conversions
+
+If we're adding new Integer <-> FP conversion instructions, we may
+as well take this opportunity to modernise the instructions and make them
+well suited for common/important conversion sequences:
+
+* standard Integer -> FP IEEE754 conversion (used by most languages and CPUs)
+* standard OpenPower FP -> Integer conversion (saturation with NaN
+ converted to minimum valid integer)
+* Rust FP -> Integer conversion (saturation with NaN converted to 0)
+* JavaScript FP -> Integer conversion (modular with Inf/NaN converted to 0)
+
+The assembly listings in the [[int_fp_mv/appendix]] show how costly
+some of these language-specific conversions are: Javascript is 35
+scalar instructions, including four branches.
+
+## FP -> Integer conversions
+
+Different programming languages turn out to have completely different
+semantics for FP to Integer conversion. This section gives an overview
+of the different variants, listing the languages and hardware that
+implements each variant.
+
+## standard Integer -> FP conversion
+
+This conversion is outlined in the IEEE754 specification. It is used
+by nearly all programming languages and CPUs. In the case of OpenPOWER,
+the rounding mode is read from FPSCR
+
+### standard OpenPower FP -> Integer conversion
+
+This conversion, instead of exact IEEE754 Compliance, performs
+"saturation with NaN converted to minimum valid integer". This
+is also exactly the same as the x86 ISA conversion senantics.
+OpenPOWER however has instructions for both:
+
+* rounding mode read from FPSCR
+* rounding mode always set to truncate
+
+### Rust FP -> Integer conversion
+
+For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Rust's `as` operator will be referred to as [Rust conversion semantics](#fp-to-int-rust-conversion-semantics).
+
+Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method):
+
+* Rust's FP -> Integer conversion using the
+ [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
+* Java's
+ [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
+* LLVM's
+ [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
+ [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
+* SPIR-V's OpenCL dialect's
+ [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
+ [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
+ instructions when decorated with
+ [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).
+
+### JavaScript FP -> Integer conversion
+
+For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScripts's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).
+
+This instruction is present in ARM assembler as FJCVTZS
+<https://developer.arm.com/documentation/dui0801/g/hko1477562192868>
+
+### Other languages
+
+TODO: review and investigate other language semantics
+
+# Proposed New Scalar Instructions
+
+All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR. All integers however are sourced/stored in the *GPR*.
+
+Integer operands and results being in the GPR is the key differentiator between the proposed instructions
+(the entire rationale) compated to existing Scalar Power ISA.
+In all existing Power ISA Scalar conversion instructions, all
+operands are FPRs, even if the format of the source or destination
+data is actually a scalar integer.
+
+Note that source and destination widths can be overridden by SimpleV
+SVP64, and that SVP64 also has Saturation Modes *in addition*
+to those independently described here. SVP64 Overrides and Saturation
+work on *both* Fixed *and* Floating Point operands and results.
+ The interactions with SVP64
+are explained in the [[int_fp_mv/appendix]]
+
+## FPR to GPR moves
+
+* `fmvtg RT, FRA`
+* `fmvtg. RT, FRA`
+
+move a 64-bit float from a FPR to a GPR, just copying bits directly.
+As a direct bitcopy, no exceptions occur and no status flags are set.
+
+Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
+operations.
+
+* `fmvtgs RT, FRA`
+* `fmvtgs. RT, FRA`
+
+move a 32-bit float from a FPR to a GPR, just copying bits. Converts the
+64-bit float in `FRA` to a 32-bit float, then writes the 32-bit float to
+`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
+and therefore has the exact same exception and flags behaviour of `frsp`
+
+Unlike `frsp` however, with RT being a GPR, Rc=1 follows
+standard *integer* behaviour, i.e. tests RT and sets CR0.
+
+## GPR to FPR moves
+
+`fmvfg FRT, RA`
+
+move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions
+are raised, no flags are altered of any kind.
+
+Rc=1 tests FRT and sets CR1
+
+`fmvfgs FRT, RA`
+
+move a 32-bit float from a GPR to a FPR, just copying bits. Converts the
+32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
+`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
+therefore has the exact same exception and flags behaviour of `frsp`
+
+Rc=1 tests FRT and sets CR1
+
+TODO: clear statement on evaluation as to whether exceptions or flags raised as part of the **FP** conversion (not the int bitcopy part, the conversion part. the semantics should really be the same as frsp)
+
+v3.0C section 4.6.7.1 states:
+
+FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.
+
+ Special Registers Altered:
+ FPRF FR FI
+ FX OX UX XX VXSNAN
+ CR1 (if Rc=1)
+
+### Float load immediate (kinda a variant of `fmvfg`)
+
+`fmvis FRT, FI`
+
+Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
+64-bit float and written to `FRT`. This is equivalent to reinterpreting
+`FI` as a `BF16` and converting to 64-bit float.
+
+There is no need for an Rc=1 variant because this is an immediate loading
+instruction. This frees up one extra bit in the X-Form format for packing
+a full `BF16`.
+
+Example:
+
+```
+# clearing a FPR
+fmvis f4, 0 # writes +0.0 to f4
+# loading handy constants
+fmvis f4, 0x8000 # writes -0.0 to f4
+fmvis f4, 0x3F80 # writes +1.0 to f4
+fmvis f4, 0xBF80 # writes -1.0 to f4
+fmvis f4, 0xBFC0 # writes -1.5 to f4
+fmvis f4, 0x7FC0 # writes +qNaN to f4
+fmvis f4, 0x7F80 # writes +Infinity to f4
+fmvis f4, 0xFF80 # writes -Infinity to f4
+fmvis f4, 0x3FFF # writes +1.9921875 to f4
+
+# clearing 128 FPRs with 2 SVP64 instructions
+# by issuing 32 vec4 (subvector length 4) ops
+setvli VL=MVL=32
+sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127
+```
+Important: If the float load immediate instruction(s) are left out,
+change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
+to instead write `+0.0` if `RA` is register `0`, at least
+allowing clearing FPRs.
+
+| 0-5 | 6-10 | 11-25 | 26-30 | 31 |
+|--------|------|-------|-------|-----|
+| Major | FRT | FI | XO | FI0 |
+
+The above fits reasonably well with Minor 19 and follows the
+pattern shown by `addpcis`, which uses an entire column of Minor 19
+XO. 15 bits of FI fit into bits 11 to 25,
+the top bit FI0 (MSB0 numbered 0) makes 16.
+
+ bf16 = FI0 || FI
+ fp32 = bf16 || [0]*16
+ FRT = Single_to_Double(fp32)
+
+## FPR to GPR conversions
+
+<div id="fpr-to-gpr-conversion-mode"></div>
+
+X-Form:
+
+| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 |
+|--------|------|--------|-------|-------|----|
+| Major | RT | //Mode | FRA | XO | Rc |
+| Major | FRT | //Mode | RA | XO | Rc |
+
+Mode values:
+
+| Mode | `rounding_mode` | Semantics |
+|------|-----------------|----------------------------------|
+| 000 | from `FPSCR` | [OpenPower semantics] |
+| 001 | Truncate | [OpenPower semantics] |
+| 010 | from `FPSCR` | [Rust semantics] |
+| 011 | Truncate | [Rust semantics] |
+| 100 | from `FPSCR` | [JavaScript semantics] |
+| 101 | Truncate | [JavaScript semantics] |
+| rest | -- | illegal instruction trap for now |
+
+[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
+[Rust semantics]: #fp-to-int-rust-conversion-semantics
+[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics
+
+`fcvttgw RT, FRA, Mode`
+
+Convert from 64-bit float to 32-bit signed integer, writing the result
+to the GPR `RT`. Converts using [mode `Mode`]
+
+`fcvttguw RT, FRA, Mode`
+
+Convert from 64-bit float to 32-bit unsigned integer, writing the result
+to the GPR `RT`. Converts using [mode `Mode`]
+
+`fcvttgd RT, FRA, Mode`
+
+Convert from 64-bit float to 64-bit signed integer, writing the result
+to the GPR `RT`. Converts using [mode `Mode`]
+
+`fcvttgud RT, FRA, Mode`
+
+Convert from 64-bit float to 64-bit unsigned integer, writing the result
+to the GPR `RT`. Converts using [mode `Mode`]
+
+`fcvtstgw RT, FRA, Mode`
+
+Convert from 32-bit float to 32-bit signed integer, writing the result
+to the GPR `RT`. Converts using [mode `Mode`]
+
+`fcvtstguw RT, FRA, Mode`
+
+Convert from 32-bit float to 32-bit unsigned integer, writing the result
+to the GPR `RT`. Converts using [mode `Mode`]
+
+`fcvtstgd RT, FRA, Mode`
+
+Convert from 32-bit float to 64-bit signed integer, writing the result
+to the GPR `RT`. Converts using [mode `Mode`]
+
+`fcvtstgud RT, FRA, Mode`
+
+Convert from 32-bit float to 64-bit unsigned integer, writing the result
+to the GPR `RT`. Converts using [mode `Mode`]
+
+[mode `Mode`]: #fpr-to-gpr-conversion-mode
+
+## GPR to FPR conversions
+
+All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.
+
+`fcvtfgw FRT, RA`
+
+Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in `FRT`.
+
+`fcvtfgws FRT, RA`
+
+Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in `FRT`.
+
+`fcvtfguw FRT, RA`
+
+Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`.
+
+`fcvtfguws FRT, RA`
+
+Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`.
+
+`fcvtfgd FRT, RA`
+
+Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in `FRT`.
+
+`fcvtfgds FRT, RA`
+
+Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in `FRT`.
+
+`fcvtfgud FRT, RA`
+
+Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`.
+
+`fcvtfguds FRT, RA`
+
+Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`.
+
+# FP to Integer Conversion Pseudo-code
+
+Key for pseudo-code:
+
+| term | result type | definition |
+|---------------------------|-------------|----------------------------------------------------------------------------------------------------|
+| `fp` | -- | `f32` or `f64` (or other types from SimpleV) |
+| `int` | -- | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV) |
+| `uint` | -- | the unsigned integer of the same bit-width as `int` |
+| `int::BITS` | `int` | the bit-width of `int` |
+| `int::MIN_VALUE` | `int` | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed) |
+| `int::MAX_VALUE` | `int` | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
+| `int::VALUE_COUNT` | Integer | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`. |
+| `rint(fp, rounding_mode)` | `fp` | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode` |