From: lkcl <lkcl@web>
Date: Wed, 25 May 2022 11:29:00 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2101
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=95a4fab2a7fd1ee1c8b33744d7ab7f25d6f419fa;p=libreriscv.git

---
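Note (not part of the commit): a minimal Python sketch of how the
`fmvis` / `fishmv` pair described in the patch below could build a full
FP32 constant from two 16-bit immediates. The function names mirror the
mnemonics, but the helper `_bits_to_single` and the use of Python's `struct`
module are illustrative assumptions. In particular, `fishmv` here follows the
prose description (replace the low 16 bits of the FP32 image of `FRS`) rather
than the literal `FRS[48:63]` selection in the pseudocode, so it is one
possible reading and not a definitive implementation.

```python
import struct


def _bits_to_single(fp32_bits: int) -> float:
    """Hypothetical helper: reinterpret a 32-bit pattern as an IEEE754 single.

    Python floats are doubles, so the returned value also models the
    Single_to_Double step of the pseudocode.
    """
    return struct.unpack(">f", struct.pack(">I", fp32_bits & 0xFFFFFFFF))[0]


def fmvis(fi: int) -> float:
    """Model of `fmvis FRT, FI`: fp32 = bf16 || [0]*16."""
    return _bits_to_single((fi & 0xFFFF) << 16)


def fishmv(frs: float, fi: int) -> float:
    """Model of `fishmv FRS, FI` as a read-modify-write on FRS.

    Assumption: the low 16 bits of the FP32 image of FRS are replaced by FI,
    and the upper 16 bits (set by a prior fmvis) are kept.
    """
    fp32_bits = struct.unpack(">I", struct.pack(">f", frs))[0]
    return _bits_to_single((fp32_bits & 0xFFFF0000) | (fi & 0xFFFF))


# Build the FP32 approximation of pi (bit pattern 0x40490FDB) in two steps.
upper_only = fmvis(0x4049)               # BF16 precision only
full_fp32 = fishmv(upper_only, 0x0FDB)   # extended to full FP32 precision
```

Under these assumptions `fmvis` alone gives the BF16-precision value, and the
follow-up `fishmv` refines it to the nearest FP32 value, mirroring the way
`oris` fills in the remaining half of an integer constant.
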

diff --git a/openpower/sv/int_fp_mv.mdwn b/openpower/sv/int_fp_mv.mdwn
index 3baf1fa83..cfa6dbba4 100644
--- a/openpower/sv/int_fp_mv.mdwn
+++ b/openpower/sv/int_fp_mv.mdwn
@@ -17,8 +17,9 @@ Introduction:
 High-performance CPU/GPU software needs to often convert between integers
 and floating-point, therefore fast conversion/data-movement instructions
 are needed.  Also given that initialisation of floats tends to take up
-considerable space (even to just load 0.0) the inclusion of compact
-format float immediate is up for consideration using BF16 as a base.
+considerable space (even to just load 0.0) the inclusion of two compact
+format float immediate instructions is up for consideration using 16-bit
+immediates. BF16 is one of the formats.
 
 Libre-SOC will be compliant with the
 **Scalar Floating-Point Subset** (SFFS) i.e. is not implementing VMX/VSX,
@@ -43,7 +44,7 @@ of instructions required to the minimum seems necessary.
 
 Therefore, we are proposing adding:
 
-* FPR load-immediate using `BF16` as the constant
+* FPR load-immediate using 16-bit immediates (`fmvis` alone is equivalent to loading a `BF16`)
 * FPR <-> GPR data-transfer instructions that just copy bits without conversion
 * FPR <-> GPR combined data-transfer/conversion instructions that do
   Integer <-> FP conversions
@@ -79,13 +80,16 @@ work on *both* Fixed *and* Floating Point operands and results.
  The interactions with SVP64
 are explained in the  [[int_fp_mv/appendix]]
 
-# Float load immediate
+# Float load immediate  <a name="fmvis"></a>
 
-These arelike a variant of `fmvfg`. Power ISA currently requires a large
+These are like a variant of `fmvfg` and `oris`, combined.
+Power ISA currently requires a large
 number of instructions to get Floating Point constants into registers.
-FP16 and BF16 Formats both fit into 16-bit immediates.
+`fmvis` on its own is equivalent to BF16 to FP32/64 conversion,
+but if followed up by `fishmv` an additional 16 bits of precision in the
+mantissa may be achieved.
 
-## Load BF16 Immediate <a name="fmvis"></a>
+## Load BF16 Immediate
 
 `fmvis FRT, FI`
 
@@ -134,28 +138,28 @@ Pseudocode:
     fp32 = bf16 || [0]*16
     FRT = Single_to_Double(fp32)
 
-## Load FP16 Immediate <a name="fishmv"></a>
+## Floating Extend Immediate <a name="fishmv"></a>
 
-`fishmv FRT, FI`
+`fishmv FRS, FI`
 
-Interprets `FI` as an IEEE754 16-bit float, which is then converted to a
-64-bit float and written to `FRT`.  This is equivalent to interpreting
-`FI` as a `FP16` and converting to 64-bit float.
-
-There is no need for an Rc=1 variant because this is an immediate loading
-instruction. This frees up one extra bit in the DX-Form format for packing
-a full `FP16`.
+Analogous to `oris`: an additional 16-bit immediate is
+inserted into `FRS` to extend its precision to
+a full FP32, provided a prior `fmvis` instruction was used to
+set the upper 16 bits.
 
 `fishmv` fits with DX-Form:
 
 |  0-5   | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form |
 |--------|------|-------|-------|-------|-----|-----|
-|  Major | FRT  | d1    | d0    | XO    | d2  | DX-Form |
+|  Major | FRS  | d1    | d0    | XO    | d2  | DX-Form |
 
 Pseudocode:
 
-    fp16 = d0 || d1 || d2
-    FRT = Half_to_Double(fp16)
+    fp32 = FRS[48:63] || d0 || d1 || d2
+    FRS = Single_to_Double(fp32)
+
+*This instruction performs a Read-Modify-Write: FRS is read, the additional
+16-bit immediate is inserted, and the result is written back to FRS.*
 
 # Moves