# FPR-to-GPR and GPR-to-FPR

**Draft Status**: under development, for submission as an RFC

* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>

High-performance CPU/GPU software often needs to convert between integers
and floating-point, so fast conversion and data-movement instructions
are needed. In addition, because initialisation of floats tends to take up
considerable space (even just to load 0.0), a compact float-immediate
format based on BF16 is up for consideration.

Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS), i.e. it does not implement VMX/VSX,
and with its focus on modern 3D GPU hybrid workloads it represents an
important new potential use-case for OpenPOWER.

The progressive development of the Scalar parts of the Power ISA assumed
that VSX would be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs. For a 3D GPU this kills any modern competitive edge.
Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.

In addition, the vast majority of GPR <-> FPR data-transfers are part
of a FP <-> Integer conversion sequence, so reducing the number
of instructions required to the minimum seems necessary.

Therefore, we are proposing adding:

* FPR load-immediate using `BF16` as the constant
* FPR <-> GPR data-transfer instructions that just copy bits without conversion
* FPR <-> GPR combined data-transfer/conversion instructions that do
  Integer <-> FP conversions

If we're adding new Integer <-> FP conversion instructions, we may
as well take this opportunity to modernise the instructions and make them
well suited for common/important conversion sequences:

* standard Integer -> FP IEEE754 conversion (used by most languages and CPUs)
* standard OpenPower FP -> Integer conversion (saturation with NaN
  converted to minimum valid integer)
* Rust FP -> Integer conversion (saturation with NaN converted to 0)
* JavaScript FP -> Integer conversion (modular with Inf/NaN converted to 0)

The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: JavaScript takes 35
scalar instructions, including four branches.

## FP -> Integer conversions

Different programming languages turn out to have completely different
semantics for FP to Integer conversion. This section gives an overview
of the different variants, listing the languages and hardware that
implement each variant.

## standard Integer -> FP conversion

This conversion is outlined in the IEEE754 specification. It is used
by nearly all programming languages and CPUs. In the case of OpenPOWER,
the rounding mode is read from FPSCR.

### standard OpenPower FP -> Integer conversion

This conversion, instead of exact IEEE754 Compliance, performs
"saturation with NaN converted to minimum valid integer". This
is also exactly the same as the x86 ISA conversion semantics.
OpenPOWER however has instructions for both:

* rounding mode read from FPSCR
* rounding mode always set to truncate

### Rust FP -> Integer conversion

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Rust's `as` operator will be referred to as [Rust conversion semantics](#fp-to-int-rust-conversion-semantics).

Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method):

* Rust's FP -> Integer conversion using the
  [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
* Java's
  [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
* LLVM's
  [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
  [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
* SPIR-V's OpenCL dialect's
  [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
  [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
  instructions when decorated with
  [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).

### JavaScript FP -> Integer conversion

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).

TODO: review and investigate other language semantics

# Proposed New Scalar Instructions

All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR. All integers, however, are sourced from or stored in the *GPR*.

Integer operands and results being in the GPR is the key differentiator
(and the entire rationale) of the proposed instructions compared to the
existing Scalar Power ISA.
In all existing Power ISA Scalar conversion instructions, all
operands are FPRs, even if the format of the source or destination
data is actually a scalar integer.

Note that source and destination widths can be overridden by SimpleV
SVP64, and that SVP64 also has Saturation Modes *in addition*
to those independently described here. SVP64 Overrides and Saturation
work on *both* Fixed *and* Floating Point.
The interactions with SVP64
are explained in the [[int_fp_mv/appendix]].

`fmvtg`: move a 64-bit float from a FPR to a GPR, just copying bits directly.
As a direct bitcopy, no exceptions occur and no status flags are set.

Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
operations.

`fmvtgs`: move a 32-bit float from a FPR to a GPR. Converts the
64-bit float in `FRA` to a 32-bit float, then writes the bits of the
32-bit float to `RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
and therefore has exactly the same exception and flags behaviour as `frsp`.

Unlike `frsp` however, with RT being a GPR, Rc=1 follows
standard *integer* behaviour, i.e. tests RT and sets CR0.

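The two FPR-to-GPR moves described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the upper 32 bits of `RT` are assumed to be zero for `fmvtgs`, only round-to-nearest is modelled for the `frsp` step, and exceptions, FPSCR flags and Rc=1 are not modelled at all.

```python
import math
import struct

def fmvtg_model(fra: float) -> int:
    """Hypothetical model of fmvtg: copy the 64 bits of a double unchanged."""
    return struct.unpack("<Q", struct.pack("<d", fra))[0]

def fmvtgs_model(fra: float) -> int:
    """Hypothetical model of fmvtgs: round to binary32, then copy 32 bits.

    Assumption: the upper 32 bits of the result are zero.
    """
    try:
        packed = struct.pack("<f", fra)  # rounds to nearest binary32
    except OverflowError:
        # Magnitude too large for binary32: round-to-nearest gives infinity.
        packed = struct.pack("<f", math.copysign(math.inf, fra))
    return struct.unpack("<I", packed)[0]

print(hex(fmvtg_model(1.0)))    # bit pattern of the 64-bit float 1.0
print(hex(fmvtgs_model(-1.5)))  # bit pattern of the 32-bit float -1.5
```

The only difference between the two models is the rounding step before the bit copy, which is why `fmvtgs` inherits the behaviour of `frsp` while `fmvtg` cannot raise anything.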
`fmvfg`: move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions
are raised, and no flags of any kind are altered.

Rc=1 tests FRT and sets CR1.

`fmvfgs`: move a 32-bit float from a GPR to a FPR. Converts the
32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
therefore has exactly the same exception and flags behaviour as `frsp`.

Rc=1 tests FRT and sets CR1.

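As a rough illustration of the GPR-to-FPR moves, the following Python sketch reinterprets integer bit patterns as floats. The function names are hypothetical, and it is an assumption (not stated in the text) that `fmvfgs` reads only the low 32 bits of `RA`. Exceptions, flags and Rc=1 are not modelled.

```python
import struct

def fmvfg_model(ra: int) -> float:
    """Hypothetical model of fmvfg: reinterpret 64 GPR bits as a double."""
    return struct.unpack("<d", struct.pack("<Q", ra & 0xFFFF_FFFF_FFFF_FFFF))[0]

def fmvfgs_model(ra: int) -> float:
    """Hypothetical model of fmvfgs: reinterpret 32 bits as binary32.

    Assumption: only the low 32 bits of RA are used. Python then widens
    the value to a 64-bit float, which is exact for every binary32 value.
    """
    return struct.unpack("<f", struct.pack("<I", ra & 0xFFFF_FFFF))[0]

print(fmvfg_model(0x3FF0000000000000))  # 64-bit pattern of 1.0
print(fmvfgs_model(0xBFC00000))         # 32-bit pattern of -1.5
```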
TODO: add a clear statement, after evaluation, on whether exceptions or flags are raised as part of the **FP** conversion (the conversion part, not the integer bitcopy part). The semantics should really be the same as `frsp`.

v3.0C section 4.6.7.1 states:

FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.

Special Registers Altered:

### Float load immediate (kinda a variant of `fmvfg`)

`fmvis`: reinterprets `FI << 16` as a 32-bit float, which is then converted to a
64-bit float and written to `FRT`. This is equivalent to reinterpreting
`FI` as a `BF16` and converting to 64-bit float.

    fmvis f4, 0      # writes +0.0 to f4
    # loading handy constants
    fmvis f4, 0x8000 # writes -0.0 to f4
    fmvis f4, 0x3F80 # writes +1.0 to f4
    fmvis f4, 0xBF80 # writes -1.0 to f4
    fmvis f4, 0xBFC0 # writes -1.5 to f4
    fmvis f4, 0x7FC0 # writes +qNaN to f4
    fmvis f4, 0x7F80 # writes +Infinity to f4
    fmvis f4, 0xFF80 # writes -Infinity to f4
    fmvis f4, 0x3FFF # writes +1.9921875 to f4

    # clearing 128 FPRs with 2 SVP64 instructions
    # by issuing 32 vec4 (subvector length 4) ops
    sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127

Important: If the float load immediate instruction(s) are left out,
change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
to instead write `+0.0` if `RA` is register `0`, at least
allowing clearing FPRs.

| 0-5    | 6-10 | 11-25 | 26-30 | 31  |
|--------|------|-------|-------|-----|
| Major  | FRT  | FI    | XO    | FI0 |

The above fits reasonably well with Minor 19 and follows the
pattern shown by `addpcis`, which uses an entire column of Minor 19
XO. 15 bits of FI fit into bits 11 to 25;
the top bit, FI0 (bit 0 in MSB0 numbering), makes 16.

    bf16 = FI0 || FI
    fp32 = bf16 || [0]*16
    FRT = Single_to_Double(fp32)

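The pseudocode above can be checked against the listed constants with a small Python model. This is a sketch only: `fmvis_model` and `split_immediate` are hypothetical names, and the field split assumes, as the text states, that FI0 is the most significant bit of the 16-bit immediate.

```python
import struct

def split_immediate(imm16: int):
    """Split a 16-bit immediate into (FI0, FI); FI0 is assumed to be the top bit."""
    return (imm16 >> 15) & 1, imm16 & 0x7FFF

def fmvis_model(imm16: int) -> float:
    """Hypothetical model of fmvis: BF16 immediate -> binary32 -> binary64."""
    fi0, fi = split_immediate(imm16)
    bf16 = (fi0 << 15) | fi          # bf16 = FI0 || FI
    fp32_bits = bf16 << 16           # fp32 = bf16 || [0]*16
    # Unpacking binary32 into a Python float is an exact widening to binary64.
    return struct.unpack("<f", struct.pack("<I", fp32_bits))[0]

for imm in (0x0000, 0x3F80, 0xBFC0, 0x3FFF):
    print(hex(imm), fmvis_model(imm))
```

Because BF16 is simply the top half of a binary32 value, no rounding is involved anywhere in this path: every 16-bit immediate maps to exactly one 64-bit float.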
## FPR to GPR conversions

<div id="fpr-to-gpr-conversion-mode"></div>

| 0-5    | 6-10 | 11-15  | 16-25 | 26-30 | 31 |
|--------|------|--------|-------|-------|----|
| Major  | RT   | //Mode | FRA   | XO    | Rc |
| Major  | FRT  | //Mode | RA    | XO    | Rc |

| Mode | `rounding_mode` | Semantics                        |
|------|-----------------|----------------------------------|
| 000  | from `FPSCR`    | [OpenPower semantics]            |
| 001  | Truncate        | [OpenPower semantics]            |
| 010  | from `FPSCR`    | [Rust semantics]                 |
| 011  | Truncate        | [Rust semantics]                 |
| 100  | from `FPSCR`    | [JavaScript semantics]           |
| 101  | Truncate        | [JavaScript semantics]           |
| rest | --              | illegal instruction trap for now |

[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
[Rust semantics]: #fp-to-int-rust-conversion-semantics
[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics

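Reading the Mode table bit by bit, the last bit selects the rounding source and the first two bits select the semantics. The following Python sketch decodes the 3-bit Mode value on that basis. `decode_mode` and `IllegalInstruction` are hypothetical names used for illustration only, and the bitwise reading is an observation about the table as it stands, not a stated design rule.

```python
class IllegalInstruction(Exception):
    """Hypothetical stand-in for the illegal instruction trap."""

# First two bits of Mode -> conversion semantics (value 0b11 is unassigned).
SEMANTICS = {0b00: "OpenPower", 0b01: "Rust", 0b10: "JavaScript"}

def decode_mode(mode: int):
    """Return (rounding source, semantics) for a 3-bit Mode value."""
    if not 0 <= mode <= 0b111:
        raise IllegalInstruction(f"Mode out of range: {mode}")
    semantics = SEMANTICS.get(mode >> 1)
    if semantics is None:
        raise IllegalInstruction(f"unassigned Mode: {mode:03b}")
    rounding = "Truncate" if mode & 1 else "FPSCR"
    return rounding, semantics

for mode in range(6):
    print(f"{mode:03b}", decode_mode(mode))
```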
`fcvttgw RT, FRA, Mode`

Convert from 64-bit float to 32-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvttguw RT, FRA, Mode`

Convert from 64-bit float to 32-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvttgd RT, FRA, Mode`

Convert from 64-bit float to 64-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvttgud RT, FRA, Mode`

Convert from 64-bit float to 64-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstgw RT, FRA, Mode`

Convert from 32-bit float to 32-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstguw RT, FRA, Mode`

Convert from 32-bit float to 32-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstgd RT, FRA, Mode`

Convert from 32-bit float to 64-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstgud RT, FRA, Mode`

Convert from 32-bit float to 64-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

[mode `Mode`]: #fpr-to-gpr-conversion-mode

## GPR to FPR conversions

All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.

* Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in `FRT`.
* Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in `FRT`.
* Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`.
* Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`.
* Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in `FRT`.
* Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in `FRT`.
* Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`.
* Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`.

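The variants above differ in how the bits of `RA` are interpreted (width and signedness) before conversion. The Python sketch below illustrates that for a 64-bit float destination only. `gpr_int_to_f64` is a hypothetical name, it is an assumption that 32-bit sources use the low 32 bits of `RA`, and only round-to-nearest-even is modelled (Python's built-in behaviour) rather than the full set of `FPSCR` rounding modes.

```python
def gpr_int_to_f64(ra: int, bits: int, signed: bool) -> float:
    """Hypothetical model: interpret the low `bits` bits of RA, convert to double."""
    value = ra & ((1 << bits) - 1)
    if signed and value >> (bits - 1):
        value -= 1 << bits           # two's-complement sign interpretation
    # float() rounds to nearest-even when the integer needs more than 53 bits.
    return float(value)

# The same GPR bit pattern gives different floats depending on signedness.
print(gpr_int_to_f64(0xFFFF_FFFF, 32, signed=True))
print(gpr_int_to_f64(0xFFFF_FFFF, 32, signed=False))
```

Every 32-bit integer converts exactly to a 64-bit float, so rounding only matters for 64-bit sources (and for the 32-bit float destinations, which this sketch does not cover).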
# FP to Integer Conversion Pseudo-code

| term                      | result type | definition                                                                                          |
|---------------------------|-------------|-----------------------------------------------------------------------------------------------------|
| `fp`                      | --          | `f32` or `f64` (or other types from SimpleV)                                                        |
| `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                               |
| `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                 |
| `int::BITS`               | `int`       | the bit-width of `int`                                                                              |
| `int::MIN_VALUE`          | `int`       | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed)                   |
| `int::MAX_VALUE`          | `int`       | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed)  |
| `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.            |
| `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`       |

<div id="fp-to-int-openpower-conversion-semantics"></div>
OpenPower conversion semantics (section A.2 page 999 (page 1023) of OpenPower ISA v3.1):

    def fp_to_int_open_power<fp, int>(v: fp) -> int:
        if v is NaN:
            return int::MIN_VALUE
        if v >= int::MAX_VALUE:
            return int::MAX_VALUE
        if v <= int::MIN_VALUE:
            return int::MIN_VALUE
        return (int)rint(v, rounding_mode)

<div id="fp-to-int-rust-conversion-semantics"></div>
Rust [conversion semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics) (with adjustment to add non-truncate rounding modes):

    def fp_to_int_rust<fp, int>(v: fp) -> int:
        if v is NaN:
            return 0
        if v >= int::MAX_VALUE:
            return int::MAX_VALUE
        if v <= int::MIN_VALUE:
            return int::MIN_VALUE
        return (int)rint(v, rounding_mode)

<div id="fp-to-int-javascript-conversion-semantics"></div>
Section 7.1 of the ECMAScript / JavaScript
[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):

    def fp_to_int_java_script<fp, int>(v: fp) -> int:
        if v is NaN or infinite:
            return 0
        v = rint(v, rounding_mode)
        v = v mod int::VALUE_COUNT  # 2^32 for i32, 2^64 for i64, result is non-negative
        bits = (uint)v
        return (int)bits

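The three variants can be compared side by side with a runnable Python sketch. This is an illustrative model rather than a reference implementation: the function signatures (passing `bits` and `signed` instead of a type parameter) and the rounding mode names `"truncate"` and `"nearest_even"` are assumptions made for the example, and only those two rounding modes are modelled.

```python
import math

def int_range(bits: int, signed: bool):
    """Return (MIN_VALUE, MAX_VALUE) for the integer type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

def rint(v: float, rounding_mode: str) -> int:
    """Round a finite float to an integer (two modes only, as an assumption)."""
    if rounding_mode == "truncate":
        return math.trunc(v)
    if rounding_mode == "nearest_even":
        return round(v)              # Python rounds halves to even
    raise ValueError(f"unsupported rounding mode: {rounding_mode}")

def _saturating(v, bits, signed, nan_result, rounding_mode):
    lo, hi = int_range(bits, signed)
    if math.isnan(v):
        return nan_result
    if v >= hi:                      # also catches +Infinity
        return hi
    if v <= lo:                      # also catches -Infinity
        return lo
    return rint(v, rounding_mode)

def fp_to_int_open_power(v, bits, signed, rounding_mode="truncate"):
    """Saturate; NaN becomes the minimum valid integer."""
    return _saturating(v, bits, signed, int_range(bits, signed)[0], rounding_mode)

def fp_to_int_rust(v, bits, signed, rounding_mode="truncate"):
    """Saturate; NaN becomes 0."""
    return _saturating(v, bits, signed, 0, rounding_mode)

def fp_to_int_java_script(v, bits, signed, rounding_mode="truncate"):
    """Modular wrap; NaN and infinities become 0."""
    if math.isnan(v) or math.isinf(v):
        return 0
    r = rint(v, rounding_mode) % (1 << bits)   # non-negative in Python
    if signed and r >= 1 << (bits - 1):
        r -= 1 << bits               # reinterpret the bits as signed
    return r

for f in (fp_to_int_open_power, fp_to_int_rust, fp_to_int_java_script):
    print(f.__name__, f(float("nan"), 32, True), f(3e9, 32, True))
```

The saturating variants differ only in what NaN maps to, which is why they share one helper here, while the JavaScript variant wraps out-of-range values instead of clamping them.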
# Equivalent OpenPower ISA v3.0 Assembly Language for FP -> Integer Conversion Modes

Moved to [[int_fp_mv/appendix]]