# FPR-to-GPR and GPR-to-FPR

**Draft Status**: under development, for submission as an RFC

* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
* [[int_fp_mv/appendix]]
High-performance CPU/GPU software often needs to convert between integers
and floating-point, so fast conversion/data-movement instructions
are needed. Also, given that initialisation of floats tends to take up
considerable space (even just to load 0.0), the inclusion of a compact-format
float immediate is up for consideration, using BF16 as a base.

Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS), i.e. it is not implementing VMX/VSX,
and with its focus on modern 3D GPU hybrid workloads represents an
important new potential use-case for OpenPOWER.

Prior to the formation of the Compliancy Levels,
the progressive historic development of the Scalar parts of the Power ISA assumed
that VSX would always be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs. For a 3D GPU this kills any modern competitive edge.
Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.

In addition, the vast majority of GPR <-> FPR data-transfers are part
of a FP <-> Integer conversion sequence, so reducing the number
of instructions required to the minimum seems necessary.

Therefore, we are proposing adding:

* FPR load-immediate using `BF16` as the constant
* FPR <-> GPR data-transfer instructions that just copy bits without conversion
* FPR <-> GPR combined data-transfer/conversion instructions that do
  Integer <-> FP conversions

If adding new Integer <-> FP conversion instructions,
the opportunity may be taken to modernise the instructions and make them
well-suited for common/important conversion sequences:

* **standard IEEE754** - used by most languages and CPUs
* **standard OpenPOWER** - saturation with NaN
  converted to minimum valid integer
* **Java** - saturation with NaN converted to 0
* **JavaScript** - modulo wrapping with Inf/NaN converted to 0

The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: JavaScript takes 32
scalar instructions, including seven branch instructions.
## FP -> Integer conversions

Different programming languages turn out to have completely different
semantics for FP to Integer conversion. This section gives an overview
of the different variants, listing the languages and hardware that
implement each variant.

### standard IEEE754 conversion

This conversion is outlined in the IEEE754 specification. It is used
by nearly all programming languages and CPUs. In the case of OpenPOWER,
the rounding mode is read from FPSCR.

### standard OpenPower conversion

This conversion, instead of exact IEEE754 compliance, performs
"saturation with NaN converted to minimum valid integer". This
is also exactly the same as the x86 ISA conversion semantics.
OpenPOWER however has instructions for both:

* rounding mode read from FPSCR
* rounding mode always set to truncate
### Java conversion

For the sake of simplicity, the FP -> Integer conversion semantics
generalized from those used by Java (and Rust's `as` operator)
will be referred to as
[Java conversion semantics](#fp-to-int-java-conversion-semantics).

Those same semantics are used in some way by all of the following
languages (not necessarily for the default conversion method):

* Java's
  [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
* Rust's FP -> Integer conversion using the
  [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
* LLVM's
  [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
  [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
* SPIR-V's OpenCL dialect's
  [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
  [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
  instructions when decorated with
  [the `SaturatedConversion` decoration](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).
### JavaScript conversion

For the sake of simplicity, the FP -> Integer conversion semantics
generalized from those used by JavaScript's `ToInt32` abstract operation
will be referred to as
[JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).

ARM provides this conversion as the `FJCVTZS` instruction:
<https://developer.arm.com/documentation/dui0801/g/hko1477562192868>
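As a rough illustration of the modulo-wrapping behaviour, the following Python sketch models `ToInt32` with the truncating rounding that JavaScript itself uses. The function name `js_to_int32` is made up for this example and does not come from any specification.

```python
import math


def js_to_int32(v: float) -> int:
    """Sketch of ECMAScript ToInt32 (hypothetical helper name)."""
    # NaN and +/-Infinity are converted to 0
    if math.isnan(v) or math.isinf(v):
        return 0
    n = math.trunc(v)   # round toward zero
    n %= 1 << 32        # Python's % gives a non-negative result here
    # reinterpret the low 32 bits as a two's complement signed integer
    return n - (1 << 32) if n >= (1 << 31) else n
```

For example, `js_to_int32(2.0**32 + 5.5)` wraps around to `5`, where a saturating conversion would instead clamp to the largest 32-bit integer.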
TODO: review and investigate other language semantics

# Proposed New Scalar Instructions

All of the following instructions use the standard OpenPower conversion
to/from 64-bit float format when reading/writing a 32-bit float from/to
a FPR. All integers however are sourced/stored in the *GPR*.

Integer operands and results being in the GPR is the key differentiator
(the entire rationale) of the proposed instructions compared to the
existing Scalar Power ISA.
In all existing Power ISA Scalar conversion instructions, all
operands are FPRs, even if the format of the source or destination
data is actually a scalar integer.

Note that source and destination widths can be overridden by SimpleV
SVP64, and that SVP64 also has Saturation Modes *in addition*
to those independently described here. SVP64 Overrides and Saturation
work on *both* Fixed *and* Floating Point operands and results.
The interactions with SVP64
are explained in the [[int_fp_mv/appendix]].
`fmvtg`: move a 64-bit float from a FPR to a GPR, just copying bits directly.
As a direct bitcopy, no exceptions occur and no status flags are set.

Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
operations.

`fmvtgs`: move a 32-bit float from a FPR to a GPR, just copying bits. Converts the
64-bit float in `FRA` to a 32-bit float, then writes the 32-bit float to
`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
and therefore has exactly the same exception and flags behaviour as `frsp`.

Unlike `frsp` however, with RT being a GPR, Rc=1 follows
standard *integer* behaviour, i.e. tests RT and sets CR0.
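The two FPR to GPR moves can be modelled in Python with the standard `struct` module. This is only a sketch under several assumptions: the function names are hypothetical, the binary32 rounding is whatever `struct` does on the host (typically round-to-nearest, not a mode read from `FPSCR`), no exception or flag behaviour is modelled, and values too large for binary32 make `struct.pack` raise `OverflowError` instead of producing an infinity.

```python
import struct


def fmvtg_model(fra: float) -> int:
    """Bit-copy: return the raw IEEE754 binary64 pattern of `fra`."""
    return struct.unpack("<Q", struct.pack("<d", fra))[0]


def fmvtgs_model(fra: float) -> int:
    """Round to binary32 first (the `frsp` step), then return the
    32-bit pattern (the bit-copy step)."""
    return struct.unpack("<I", struct.pack("<f", fra))[0]
```

With these helpers, `fmvtg_model(1.0)` gives `0x3FF0000000000000` while `fmvtgs_model(1.0)` gives `0x3F800000`, showing that only the second form changes the encoding.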
`fmvfg`: move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions
are raised and no flags of any kind are altered.

Rc=1 tests FRT and sets CR1.

`fmvfgs`: move a 32-bit float from a GPR to a FPR, just copying bits. Converts the
32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
therefore has exactly the same exception and flags behaviour as `frsp`.

Rc=1 tests FRT and sets CR1.
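A matching Python sketch for the GPR to FPR direction is given below. The function names are hypothetical, and the sketch assumes that the 32-bit float occupies the low 32 bits of `RA`, which the text above does not state explicitly. Exceptions, flags and CR1 updates are not modelled.

```python
import struct


def fmvfg_model(ra: int) -> float:
    """Bit-copy: reinterpret the 64-bit integer `ra` as a binary64."""
    return struct.unpack("<d", struct.pack("<Q", ra & 0xFFFFFFFFFFFFFFFF))[0]


def fmvfgs_model(ra: int) -> float:
    """Reinterpret the low 32 bits of `ra` (an assumption) as a binary32,
    then widen to binary64, which Python floats already are."""
    return struct.unpack("<f", struct.pack("<I", ra & 0xFFFFFFFF))[0]
```

Widening binary32 to binary64 is exact, so `fmvfgs_model(0xBFC00000)` returns exactly `-1.5`.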
TODO: a clear statement is needed, after evaluation, as to whether exceptions
or flags are raised as part of the **FP** conversion (not the int bitcopy
part, the conversion part; the semantics should really be the same as `frsp`).

v3.0C section 4.6.7.1 states:

FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.

Special Registers Altered:

## Float load immediate <a name="fmvis"></a>

This is like a variant of `fmvfg`.

Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
64-bit float and written to `FRT`. This is equivalent to reinterpreting
`FI` as a `BF16` and converting to 64-bit float.
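The reinterpretation step can be checked with a few lines of Python. This is a sketch only: `fmvis_model` is a hypothetical name, and `struct` is used merely to reinterpret `FI << 16` as a binary32 (the widening to a Python float, i.e. binary64, is exact).

```python
import struct


def fmvis_model(fi: int) -> float:
    """Value written to FRT for a 16-bit immediate `fi` (BF16 pattern)."""
    fp32_bits = (fi & 0xFFFF) << 16  # BF16 is the top half of a binary32
    return struct.unpack("<f", struct.pack("<I", fp32_bits))[0]
```

For instance, `fmvis_model(0x3F80)` is `1.0` and `fmvis_model(0xBFC0)` is `-1.5`.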
There is no need for an Rc=1 variant because this is an immediate loading
instruction. This frees up one extra bit in the X-Form format for packing

    fmvis f4, 0         # writes +0.0 to f4
    # loading handy constants
    fmvis f4, 0x8000    # writes -0.0 to f4
    fmvis f4, 0x3F80    # writes +1.0 to f4
    fmvis f4, 0xBF80    # writes -1.0 to f4
    fmvis f4, 0xBFC0    # writes -1.5 to f4
    fmvis f4, 0x7FC0    # writes +qNaN to f4
    fmvis f4, 0x7F80    # writes +Infinity to f4
    fmvis f4, 0xFF80    # writes -Infinity to f4
    fmvis f4, 0x3FFF    # writes +1.9921875 to f4

    # clearing 128 FPRs with 2 SVP64 instructions
    # by issuing 32 vec4 (subvector length 4) ops
    sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127
Important: If the float load immediate instruction(s) are left out,
change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
to instead write `+0.0` if `RA` is register `0`, at least
allowing clearing FPRs.

`fmvis` fits well with DX-Form:

| 0-5    | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
|--------|------|-------|-------|-------|-----|---------|
| Major  | FRT  | d1    | d0    | XO    | d2  | DX-Form |

    bf16 = d0 || d1 || d2
    fp32 = bf16 || [0]*16
    FRT = Single_to_Double(fp32)
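The field concatenation above can be sketched in Python as follows. The helper names are hypothetical; the only facts taken from the table are the field widths (`d0` is 10 bits, `d1` is 5 bits, `d2` is 1 bit) and the order `d0 || d1 || d2`, with `d0` as the most significant part.

```python
def split_bf16(imm: int) -> tuple[int, int, int]:
    """Split a 16-bit immediate into (d0, d1, d2), d0 most significant."""
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    d0 = imm >> 6           # top 10 bits
    d1 = (imm >> 1) & 0x1F  # next 5 bits
    d2 = imm & 1            # lowest bit
    return d0, d1, d2


def join_bf16(d0: int, d1: int, d2: int) -> int:
    """bf16 = d0 || d1 || d2"""
    return (d0 << 6) | (d1 << 1) | d2


def fp32_pattern(d0: int, d1: int, d2: int) -> int:
    """fp32 = bf16 || [0]*16"""
    return join_bf16(d0, d1, d2) << 16
```

As a usage example, the immediate `0x3F80` (+1.0) splits into `d0 = 0xFE`, `d1 = 0`, `d2 = 0`, and `fp32_pattern(0xFE, 0, 0)` is `0x3F800000`.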
## FPR to GPR conversions

<div id="fpr-to-gpr-conversion-mode"></div>

| 0-5    | 6-10 | 11-15  | 16-25 | 26-30 | 31 |
|--------|------|--------|-------|-------|----|
| Major  | RT   | //Mode | FRA   | XO    | Rc |
| Major  | FRT  | //Mode | RA    | XO    | Rc |

| Mode | `rounding_mode` | Semantics                        |
|------|-----------------|----------------------------------|
| 000  | from `FPSCR`    | [OpenPower semantics]            |
| 001  | Truncate        | [OpenPower semantics]            |
| 010  | from `FPSCR`    | [Java semantics]                 |
| 011  | Truncate        | [Java semantics]                 |
| 100  | from `FPSCR`    | [JavaScript semantics]           |
| 101  | Truncate        | [JavaScript semantics]           |
| rest | --              | illegal instruction trap for now |

[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
[Java semantics]: #fp-to-int-java-conversion-semantics
[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics
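Reading the Mode table, the lowest bit selects the rounding source and the upper two bits select the semantics. The following Python sketch decodes the field on that observation; `decode_mode` and the string labels are hypothetical and chosen only for illustration.

```python
SEMANTICS = ("OpenPower", "Java", "JavaScript")


def decode_mode(mode: int) -> tuple[str, str]:
    """Return (rounding source, semantics) for a 3-bit Mode value."""
    if not 0 <= mode <= 0b101:
        # remaining encodings: "illegal instruction trap for now"
        raise ValueError(f"reserved Mode value {mode}")
    rounding = "Truncate" if mode & 1 else "FPSCR"
    return rounding, SEMANTICS[mode >> 1]
```

For example, `decode_mode(0b011)` yields `("Truncate", "Java")`, matching the fourth row of the table.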
* `fcvttgw RT, FRA, Mode`
  Convert from 64-bit float to 32-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvttguw RT, FRA, Mode`
  Convert from 64-bit float to 32-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvttgd RT, FRA, Mode`
  Convert from 64-bit float to 64-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvttgud RT, FRA, Mode`
  Convert from 64-bit float to 64-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvtstgw RT, FRA, Mode`
  Convert from 32-bit float to 32-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvtstguw RT, FRA, Mode`
  Convert from 32-bit float to 32-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvtstgd RT, FRA, Mode`
  Convert from 32-bit float to 64-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]
* `fcvtstgud RT, FRA, Mode`
  Convert from 32-bit float to 64-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`]

[mode `Mode`]: #fpr-to-gpr-conversion-mode
## GPR to FPR conversions

All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.

* Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in
  `FRT`.
* Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in
  `FRT`.
* Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfguws FRT, RA`
  Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in
  `FRT`.
* Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in
  `FRT`.
* Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in
  `FRT`.
* Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfguds FRT, RA`
  Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in
  `FRT`.
# FP to Integer Conversion Pseudo-code

| term                      | result type | definition                                                                                         |
|---------------------------|-------------|----------------------------------------------------------------------------------------------------|
| `fp`                      | --          | `f32` or `f64` (or other types from SimpleV)                                                       |
| `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                              |
| `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                |
| `int::BITS`               | `int`       | the bit-width of `int`                                                                             |
| `int::MIN_VALUE`          | `int`       | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed)                  |
| `int::MAX_VALUE`          | `int`       | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
| `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.           |
| `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`      |

<div id="fp-to-int-openpower-conversion-semantics"></div>
OpenPower conversion semantics (section A.2 page 999 (page 1023) of OpenPower ISA v3.1):

    def fp_to_int_open_power<fp, int>(v: fp) -> int:
        if v is NaN:
            return int::MIN_VALUE
        if v >= int::MAX_VALUE:
            return int::MAX_VALUE
        if v <= int::MIN_VALUE:
            return int::MIN_VALUE
        return (int)rint(v, rounding_mode)
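A runnable Python rendering of the OpenPower pseudo-code is sketched below. Several things here are assumptions made for illustration: the `bits`/`signed` parameters stand in for the generic `int` type, the `truncate` flag stands in for `rounding_mode`, and Python's round-half-to-even `round()` stands in for a mode read from `FPSCR`. NaN maps to the minimum value, as stated earlier for OpenPower.

```python
import math


def fp_to_int_open_power(v: float, bits: int, signed: bool,
                         truncate: bool = True) -> int:
    """Saturating conversion with NaN -> minimum valid integer."""
    min_value = -(1 << (bits - 1)) if signed else 0
    max_value = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return min_value
    if v >= max_value:      # also catches +Infinity
        return max_value
    if v <= min_value:      # also catches -Infinity
        return min_value
    # truncate toward zero, or round half to even as a stand-in mode
    return math.trunc(v) if truncate else round(v)
```

Under these assumptions, `fp_to_int_open_power(float("nan"), 32, True)` returns `-2**31` and `fp_to_int_open_power(1e10, 32, True)` returns `2**31 - 1`.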
<div id="fp-to-int-java-conversion-semantics"></div>
[Java conversion semantics](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3) /
[Rust semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
(with adjustment to add non-truncate rounding modes):

    def fp_to_int_java<fp, int>(v: fp) -> int:
        if v is NaN:
            return 0
        if v >= int::MAX_VALUE:
            return int::MAX_VALUE
        if v <= int::MIN_VALUE:
            return int::MIN_VALUE
        return (int)rint(v, rounding_mode)
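The Java variant differs from the OpenPower one only in the NaN result. A Python sketch, with the same caveats (the `bits`/`signed` parameters and the `rint` callable are illustrative stand-ins for the generic `int` type and for `rounding_mode`, not part of the proposal):

```python
import math
from typing import Callable


def fp_to_int_java(v: float, bits: int, signed: bool,
                   rint: Callable[[float], int] = math.trunc) -> int:
    """Saturating conversion with NaN -> 0 (Java / Rust `as` style)."""
    min_value = -(1 << (bits - 1)) if signed else 0
    max_value = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return 0
    if v >= max_value:      # also catches +Infinity
        return max_value
    if v <= min_value:      # also catches -Infinity
        return min_value
    return rint(v)          # default: truncate toward zero
```

Passing `rint=round` gives a round-half-to-even variant, which is one way of picturing the "non-truncate rounding modes" adjustment mentioned above.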
<div id="fp-to-int-javascript-conversion-semantics"></div>
Section 7.1 of the ECMAScript / JavaScript
[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):

    def fp_to_int_java_script<fp, int>(v: fp) -> int:
        if v is NaN or infinite:
            return 0
        v = rint(v, rounding_mode)
        v = v mod int::VALUE_COUNT # 2^32 for i32, 2^64 for i64, result is non-negative