High-performance CPU/GPU software needs to often convert between integers
and floating-point, therefore fast conversion/data-movement instructions
are needed. Also given that initialisation of floats tends to take up
-considerable space (even to just load 0.0) the inclusion of compact
-format float immediate is up for consideration using BF16 as a base.
+considerable space (even to just load 0.0), two compact-format
+float immediate instructions using 16-bit immediates are up for
+consideration. BF16 is one of the formats.
Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS) i.e. is not implementing VMX/VSX,
Therefore, we are proposing adding:
-* FPR load-immediate using `BF16` as the constant
+* FPR load-immediate, partially equivalent to `BF16`
* FPR <-> GPR data-transfer instructions that just copy bits without conversion
* FPR <-> GPR combined data-transfer/conversion instructions that do
Integer <-> FP conversions
The interactions with SVP64
are explained in the [[int_fp_mv/appendix]]
-# Float load immediate
+# Float load immediate <a name="fmvis"></a>
-These arelike a variant of `fmvfg`. Power ISA currently requires a large
+These are like a variant of `fmvfg` and `oris`, combined.
+Power ISA currently requires a large
number of instructions to get Floating Point constants into registers.
-FP16 and BF16 Formats both fit into 16-bit immediates.
+`fmvis` on its own is equivalent to BF16 to FP32/64 conversion,
+but if followed up by `fishmv` an additional 16 bits of accuracy in the
+mantissa may be achieved.
-## Load BF16 Immediate <a name="fmvis"></a>
+## Load BF16 Immediate
`fmvis FRT, FI`
fp32 = bf16 || [0]*16
FRT = Single_to_Double(fp32)
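
As an illustration of the `fmvis` pseudocode above, here is a minimal Python sketch. The function name `fmvis` and the use of `struct` to model `Single_to_Double` are assumptions made for this example only, not part of the proposal. A Python `float` is a 64-bit double, so unpacking the FP32 bit pattern stands in for the widening step.

```python
import struct

def fmvis(fi: int) -> float:
    """Hypothetical model of `fmvis FRT, FI`: FI is treated as a BF16 value."""
    if not 0 <= fi <= 0xFFFF:
        raise ValueError("FI must fit in 16 bits")
    # fp32 = bf16 || [0]*16 : the immediate becomes the upper half of an FP32
    fp32_bits = fi << 16
    # Single_to_Double: reinterpret the 32 bits as FP32; Python widens to double
    (value,) = struct.unpack(">f", struct.pack(">I", fp32_bits))
    return value

print(fmvis(0x3F80))  # BF16 encoding of 1.0
print(fmvis(0x4049))  # BF16 approximation of pi
```

With only 7 mantissa bits available, `0x4049` yields 3.140625 rather than pi to full FP32 accuracy.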
-## Load FP16 Immediate <a name="fishmv"></a>
+## Floating Extend Immediate <a name="fishmv"></a>
-`fishmv FRT, FI`
+`fishmv FRS, FI`
-Interprets `FI` as an IEEE754 16-bit float, which is then converted to a
-64-bit float and written to `FRT`. This is equivalent to interpreting
-`FI` as a `FP16` and converting to 64-bit float.
-
-There is no need for an Rc=1 variant because this is an immediate loading
-instruction. This frees up one extra bit in the DX-Form format for packing
-a full `FP16`.
+Analogous to `oris`: an additional 16 bits of immediate are
+inserted into `FRS`, extending its accuracy to
+a full FP32, provided that a prior `fmvis` instruction set
+the upper 16 bits.
`fishmv` fits with DX-Form:
| 0-5 | 6-10 | 11-15 | 16-25 | 26-30 | 31 | Form |
|--------|------|-------|-------|-------|-----|-----|
-| Major | FRT | d1 | d0 | XO | d2 | DX-Form |
+| Major | FRS | d1 | d0 | XO | d2 | DX-Form |
Pseudocode:
- fp16 = d0 || d1 || d2
- FRT = Half_to_Double(fp16)
+ fp32 = FRS[48:63] || d0 || d1 || d2
+ FRS = Single_to_Double(fp32)
+
+*This instruction performs a Read-Modify-Write: FRS is read, the additional
+16-bit immediate is inserted, and the result is written back to FRS.*
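
The read-modify-write behaviour can be sketched in Python as follows. This is an interpretation of the prose description, not a definitive model. It assumes the register value is first narrowed to its FP32 bit pattern, that the upper 16 bits are kept, and that the immediate fills the lower 16 bits. The function name `fishmv` and its argument layout are hypothetical.

```python
import struct

def fishmv(frs: float, fi: int) -> float:
    """Hypothetical model of `fishmv FRS, FI` (read-modify-write on FRS)."""
    if not 0 <= fi <= 0xFFFF:
        raise ValueError("FI must fit in 16 bits")
    # Read: narrow the current register value to its FP32 bit pattern
    (fp32_bits,) = struct.unpack(">I", struct.pack(">f", frs))
    # Modify: keep the upper 16 bits, insert the immediate below them
    fp32_bits = (fp32_bits & 0xFFFF0000) | fi
    # Write: widen back to double (Single_to_Double)
    (value,) = struct.unpack(">f", struct.pack(">I", fp32_bits))
    return value

# 3.140625 is the value a prior `fmvis` with FI=0x4049 would leave in FRS;
# 0x0FDB supplies the remaining mantissa bits of the FP32 encoding of pi.
print(fishmv(3.140625, 0x0FDB))
```

Under these assumptions, two 16-bit immediates together reconstruct any FP32 constant, in the same way that an `oris`-style pair builds a 32-bit integer.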
# Moves