# FPR-to-GPR and GPR-to-FPR

**Draft Status**: under development, for submission as an RFC

* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
* [[int_fp_mv/appendix]]

High-performance CPU/GPU software often needs to convert between integers
and floating-point, so fast conversion/data-movement instructions
are needed. Also, given that initialisation of floats tends to take up
considerable space (even just to load 0.0), the inclusion of a compact-format
float immediate is up for consideration, using BF16 as a base.

Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS), i.e. it is not implementing VMX/VSX,
and with its focus on modern 3D GPU hybrid workloads it represents an
important new potential use-case for OpenPOWER.

Prior to the formation of the Compliancy Levels,
the progressive historic development of the Scalar parts of the Power ISA assumed
that VSX would always be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs. For a 3D GPU this kills any modern competitive edge.
Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.

In addition, the vast majority of GPR <-> FPR data-transfers are part
of an FP <-> Integer conversion sequence, so reducing the number
of instructions required to the minimum seems necessary.

Therefore, we propose adding:

* FPR load-immediate using `BF16` as the constant
* FPR <-> GPR data-transfer instructions that just copy bits without conversion
* FPR <-> GPR combined data-transfer/conversion instructions that do
  Integer <-> FP conversions

If new Integer <-> FP conversion instructions are added,
the opportunity may be taken to modernise the instructions and make them
well-suited for common/important conversion sequences:

* **standard IEEE754** - used by most languages and CPUs
* **standard OpenPOWER** - saturation with NaN
  converted to minimum valid integer
* **Java** - saturation with NaN converted to 0
* **JavaScript** - modulo wrapping with Inf/NaN converted to 0

The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: JavaScript takes 32
scalar instructions, including seven branch instructions.

# Proposed New Scalar Instructions

All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR. All integers, however, are sourced/stored in the *GPR*.

Integer operands and results being in the GPR is the key differentiator
(and the entire rationale) of the proposed instructions
compared to the existing Scalar Power ISA.
In all existing Power ISA Scalar conversion instructions, all
operands are FPRs, even if the format of the source or destination
data is actually a scalar integer.

Note that source and destination widths can be overridden by SimpleV
SVP64, and that SVP64 also has Saturation Modes *in addition*
to those independently described here. SVP64 Overrides and Saturation
work on *both* Fixed *and* Floating Point operands and results.
The interactions with SVP64
are explained in the [[int_fp_mv/appendix]].

Move a 64-bit float from a FPR to a GPR, just copying bits directly.
As a direct bitcopy, no exceptions occur and no status flags are set.

Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point

Move a 32-bit float from a FPR to a GPR. Converts the
64-bit float in `FRA` to a 32-bit float, then copies the bits of that 32-bit float to
`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
and therefore has exactly the same exception and flags behaviour as `frsp`.

Unlike `frsp`, however, with RT being a GPR, Rc=1 follows
standard *integer* behaviour, i.e. tests RT and sets CR0.

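The two FPR-to-GPR moves described above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the function names simply mirror the `fmvtg` and `fmvtgs` mnemonics mentioned in the text, the upper 32 bits of `RT` are assumed to be zero for the 32-bit variant (the text above does not say), and the `frsp` step is modelled with round-to-nearest-even only, ignoring `FPSCR` and all exception flags.

```python
import struct


def fmvtg(fra: float) -> int:
    """Model of the 64-bit move: reinterpret the f64 bits as an integer."""
    return struct.unpack("<Q", struct.pack("<d", fra))[0]


def fmvtgs(fra: float) -> int:
    """Model of the 32-bit move: round f64 -> f32, then copy the 32 bits.

    Assumption: the result is zero-extended into the 64-bit GPR.
    Note: struct raises OverflowError for finite values too large for f32,
    whereas real hardware would follow the frsp overflow rules.
    """
    return struct.unpack("<I", struct.pack("<f", fra))[0]


if __name__ == "__main__":
    print(hex(fmvtg(1.0)))    # bit pattern of the 64-bit float 1.0
    print(hex(fmvtgs(1.0)))   # bit pattern of the 32-bit float 1.0
```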
Move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions
are raised and no flags of any kind are altered.

Rc=1 tests FRT and sets CR1.

Move a 32-bit float from a GPR to a FPR. Converts the
32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
therefore has exactly the same exception and flags behaviour as `frsp`.

Rc=1 tests FRT and sets CR1.

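For the opposite direction, a similar hedged sketch shows what the GPR-to-FPR moves do to the bits. The names mirror the `fmvfg` and `fmvfgs` mnemonics from the text. It is an assumption here that the 32-bit variant reads the low 32 bits of `RA`. Flags, CR1 and signalling-NaN handling are not modelled.

```python
import struct

MASK64 = 0xFFFF_FFFF_FFFF_FFFF
MASK32 = 0xFFFF_FFFF


def fmvfg(ra: int) -> float:
    """Model of the 64-bit move: reinterpret the 64 GPR bits as an f64."""
    return struct.unpack("<d", struct.pack("<Q", ra & MASK64))[0]


def fmvfgs(ra: int) -> float:
    """Model of the 32-bit move: reinterpret the low 32 bits as an f32.

    Python floats are f64, so unpacking an f32 already performs the
    (exact) widening to 64-bit float that is written to FRT.
    """
    return struct.unpack("<f", struct.pack("<I", ra & MASK32))[0]


if __name__ == "__main__":
    print(fmvfg(0x3FF0000000000000))   # 64-bit pattern of 1.0
    print(fmvfgs(0xBFC00000))          # 32-bit pattern of -1.5
```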
TODO: clear statement on evaluation as to whether exceptions or flags are raised as part of the **FP** conversion (not the integer bitcopy part, but the conversion part; the semantics should really be the same as `frsp`).

v3.0C section 4.6.7.1 states:

FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.

Special Registers Altered:

# Float load immediate <a name="fmvis"></a>

This is like a variant of `fmvfg`.

Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
64-bit float and written to `FRT`. This is equivalent to reinterpreting
`FI` as a `BF16` and converting to 64-bit float.

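The `FI << 16` reinterpretation can be checked with a short Python sketch. The function name `fmvis` is borrowed from the mnemonic, and the range check on `FI` is an assumption added for illustration. The sketch only computes the value that would be written to `FRT`.

```python
import struct


def fmvis(fi: int) -> float:
    """Value written to FRT for the 16-bit immediate `fi` (a BF16 pattern)."""
    if not 0 <= fi <= 0xFFFF:
        raise ValueError("FI must fit in 16 bits")
    fp32_bits = fi << 16  # BF16 is the upper half of an IEEE754 binary32
    # Reinterpret as f32; Python then holds it (exactly) as an f64.
    return struct.unpack("<f", struct.pack("<I", fp32_bits))[0]


if __name__ == "__main__":
    for imm in (0x0000, 0x3F80, 0xBF80, 0xBFC0, 0x3FFF):
        print(f"fmvis 0x{imm:04X} -> {fmvis(imm)!r}")
```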
There is no need for an Rc=1 variant because this is an immediate-loading
instruction. This frees up one extra bit in the X-Form format for packing

    fmvis f4, 0 # writes +0.0 to f4
    # loading handy constants
    fmvis f4, 0x8000 # writes -0.0 to f4
    fmvis f4, 0x3F80 # writes +1.0 to f4
    fmvis f4, 0xBF80 # writes -1.0 to f4
    fmvis f4, 0xBFC0 # writes -1.5 to f4
    fmvis f4, 0x7FC0 # writes +qNaN to f4
    fmvis f4, 0x7F80 # writes +Infinity to f4
    fmvis f4, 0xFF80 # writes -Infinity to f4
    fmvis f4, 0x3FFF # writes +1.9921875 to f4

    # clearing 128 FPRs with 2 SVP64 instructions
    # by issuing 32 vec4 (subvector length 4) ops
    sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127

Important: If the float load immediate instruction(s) are left out,
change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
to instead write `+0.0` if `RA` is register `0`, which at least
allows FPRs to be cleared.

`fmvis` fits well with DX-Form:

| 0-5    | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
|--------|------|-------|-------|-------|-----|---------|
| Major  | FRT  | d1    | d0    | XO    | d2  | DX-Form |

    bf16 = d0 || d1 || d2
    fp32 = bf16 || [0]*16
    FRT = Single_to_Double(fp32)

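As an illustration of the field concatenation above, the following sketch splits a 16-bit `BF16` immediate into the three DX-Form fields and joins them back. The field widths (10 bits for `d0`, 5 for `d1`, 1 for `d2`) are read off the bit ranges in the table. The helper names `split_bf16` and `join_bf16` are hypothetical and exist only for this example.

```python
def split_bf16(bf16: int) -> tuple[int, int, int]:
    """Split a 16-bit immediate into (d0, d1, d2) per bf16 = d0 || d1 || d2."""
    if not 0 <= bf16 <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    d0 = (bf16 >> 6) & 0x3FF  # most-significant 10 bits (instruction bits 16-25)
    d1 = (bf16 >> 1) & 0x1F   # next 5 bits (instruction bits 11-15)
    d2 = bf16 & 0x1           # least-significant bit (instruction bit 31)
    return d0, d1, d2


def join_bf16(d0: int, d1: int, d2: int) -> int:
    """Inverse of split_bf16: concatenate d0 || d1 || d2."""
    return (d0 << 6) | (d1 << 1) | d2


if __name__ == "__main__":
    print(split_bf16(0x3F80))  # fields for the +1.0 immediate
```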
Unlike the move instructions,
these instructions perform conversions between Integer and
Floating Point. Truncation can therefore occur, as well

| Mode | `rounding_mode` | Semantics                        |
|------|-----------------|----------------------------------|
| 000  | from `FPSCR`    | [OpenPower semantics]            |
| 001  | Truncate        | [OpenPower semantics]            |
| 010  | from `FPSCR`    | [Java semantics]                 |
| 011  | Truncate        | [Java semantics]                 |
| 100  | from `FPSCR`    | [JavaScript semantics]           |
| 101  | Truncate        | [JavaScript semantics]           |
| rest | --              | illegal instruction trap for now |

[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
[Java semantics]: #fp-to-int-java-conversion-semantics
[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics

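Reading the table, the least-significant `Mode` bit appears to select the rounding source and the upper two bits the conversion semantics. The sketch below decodes a mode on that observation. It assumes `Mode` is the 3-bit value listed in the table, and the function name `decode_mode` is hypothetical.

```python
SEMANTICS = ("OpenPower", "Java", "JavaScript")


def decode_mode(mode: int) -> tuple[str, str]:
    """Return (rounding_mode source, semantics) for a 3-bit Mode value."""
    if not 0 <= mode <= 0b111:
        raise ValueError("Mode is assumed to be a 3-bit value")
    semantics_index = mode >> 1
    if semantics_index >= len(SEMANTICS):
        # 110 and 111 are the "rest" row of the table
        raise ValueError("reserved Mode: illegal instruction trap")
    rounding = "Truncate" if mode & 1 else "FPSCR"
    return rounding, SEMANTICS[semantics_index]


if __name__ == "__main__":
    for m in range(6):
        print(f"{m:03b}", decode_mode(m))
```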
## GPR to FPR conversions

| 0-5    | 6-10 | 11-15  | 16-25 | 26-30 | 31 | Form   |
|--------|------|--------|-------|-------|----|--------|
| Major  | FRT  | //Mode | RA    | XO    | Rc | X-Form |

All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.

* Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in
  `FRT`.
* Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in
  `FRT`.
* Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfguws FRT, RA`
  Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in
  `FRT`.
* Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in
  `FRT`.
* Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in
  `FRT`.
* Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in
  `FRT`.
* `fcvtfguds FRT, RA`
  Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in
  `FRT`.

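The main difference between these variants is how the GPR contents are interpreted before conversion. The sketch below illustrates that step for the 64-bit float results only. It assumes the 32-bit variants read the low 32 bits of `RA`, it models only round-to-nearest-even (Python's `float(int)` behaviour) rather than the `FPSCR` rounding mode, and the helper names are hypothetical.

```python
def gpr_to_int(ra: int, bits: int, signed: bool) -> int:
    """Interpret the low `bits` bits of a GPR value as a signed/unsigned integer."""
    value = ra & ((1 << bits) - 1)
    if signed and value >> (bits - 1):
        value -= 1 << bits  # two's complement: the top bit carries negative weight
    return value


def int_to_f64(ra: int, bits: int, signed: bool) -> float:
    """Convert the selected integer to a 64-bit float (nearest-even only)."""
    return float(gpr_to_int(ra, bits, signed))


if __name__ == "__main__":
    print(int_to_f64(0xFFFF_FFFF, 32, True))    # word read as signed
    print(int_to_f64(0xFFFF_FFFF, 32, False))   # same bits read as unsigned
```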
## FPR to GPR (Integer) conversions

<div id="fpr-to-gpr-conversion-mode"></div>

Different programming languages turn out to have completely different
semantics for FP to Integer conversion. Below is an overview
of the different variants, listing the languages and hardware that
implement each variant.

**Standard IEEE754 conversion**

This conversion is outlined in the IEEE754 specification. It is used
by nearly all programming languages and CPUs. In the case of OpenPOWER,
the rounding mode is read from FPSCR.

**Standard OpenPower conversion**

This conversion, instead of exact IEEE754 Compliance, performs
"saturation with NaN converted to minimum valid integer". This
is also exactly the same as the x86 ISA conversion semantics.
OpenPOWER however has instructions for both:

* rounding mode read from FPSCR
* rounding mode always set to truncate

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Java (and Rust's `as` operator) will be referred to as
[Java conversion semantics](#fp-to-int-java-conversion-semantics).

Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method):

* Java's
  [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
* Rust's FP -> Integer conversion using the
  [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
* LLVM's
  [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
  [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
* SPIR-V's OpenCL dialect's
  [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
  [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
  instructions when decorated with
  [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).

**JavaScript conversion**

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).

This conversion is available as an instruction in ARM assembler as `FJCVTZS`:
<https://developer.arm.com/documentation/dui0801/g/hko1477562192868>

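To make the "modulo wrapping with Inf/NaN converted to 0" behaviour concrete, here is a small Python model of ECMAScript's `ToInt32` (truncating rounding only). It is a sketch of the language-level semantics, not of any proposed instruction encoding, and the name `to_int32` is chosen only for this example.

```python
import math


def to_int32(v: float) -> int:
    """Model of ECMAScript ToInt32: truncate, wrap modulo 2^32, read as signed."""
    if math.isnan(v) or math.isinf(v):
        return 0
    n = math.trunc(v)   # round toward zero, giving an exact Python int
    n %= 1 << 32        # Python's % yields a non-negative result here
    if n >= 1 << 31:    # reinterpret the 32-bit pattern as two's complement
        n -= 1 << 32
    return n


if __name__ == "__main__":
    for x in (3.9, -1.5, 2147483648.0, 4294967297.0, float("nan")):
        print(x, "->", to_int32(x))
```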
| 0-5    | 6-10 | 11-15  | 16-25 | 26-30 | 31 | Form   |
|--------|------|--------|-------|-------|----|--------|
| Major  | RT   | //Mode | FRA   | XO    | Rc | X-Form |

* `fcvttgw RT, FRA, Mode`
  Convert from 64-bit float to 32-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttguw RT, FRA, Mode`
  Convert from 64-bit float to 32-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttgd RT, FRA, Mode`
  Convert from 64-bit float to 64-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttgud RT, FRA, Mode`
  Convert from 64-bit float to 64-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgw RT, FRA, Mode`
  Convert from 32-bit float to 32-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstguw RT, FRA, Mode`
  Convert from 32-bit float to 32-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgd RT, FRA, Mode`
  Convert from 32-bit float to 64-bit signed integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgud RT, FRA, Mode`
  Convert from 32-bit float to 64-bit unsigned integer, writing the result
  to the GPR `RT`. Converts using [mode `Mode`].

[mode `Mode`]: #fpr-to-gpr-conversion-mode

## FP to Integer Conversion Pseudo-code

| term                      | result type | definition                                                                                         |
|---------------------------|-------------|----------------------------------------------------------------------------------------------------|
| `fp`                      | --          | `f32` or `f64` (or other types from SimpleV)                                                       |
| `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                              |
| `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                |
| `int::BITS`               | `int`       | the bit-width of `int`                                                                             |
| `int::MIN_VALUE`          | `int`       | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed)                  |
| `int::MAX_VALUE`          | `int`       | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
| `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.           |
| `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`      |

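The `rint(fp, rounding_mode)` helper in the table can be modelled in Python as follows. The rounding-mode names used here (`"nearest_even"`, `"truncate"`, `"ceil"`, `"floor"`) are hypothetical labels for this sketch. How they map onto `FPSCR` encodings is not covered.

```python
import math


def rint(v: float, rounding_mode: str) -> float:
    """Round `v` to an integral floating-point value using the given mode."""
    if math.isnan(v) or math.isinf(v):
        return v  # nothing to round
    if rounding_mode == "nearest_even":
        r = round(v)          # Python rounds halfway cases to even
    elif rounding_mode == "truncate":
        r = math.trunc(v)     # toward zero
    elif rounding_mode == "ceil":
        r = math.ceil(v)      # toward +infinity
    elif rounding_mode == "floor":
        r = math.floor(v)     # toward -infinity
    else:
        raise ValueError(f"unknown rounding mode: {rounding_mode}")
    # Rounding never flips the sign; copysign keeps -0.0 for small negatives.
    return math.copysign(float(r), v)


if __name__ == "__main__":
    for mode in ("nearest_even", "truncate", "ceil", "floor"):
        print(mode, rint(2.5, mode), rint(-2.5, mode))
```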
<div id="fp-to-int-openpower-conversion-semantics"></div>
OpenPower conversion semantics (section A.2 page 999 (page 1023) of OpenPower ISA v3.1):

    def fp_to_int_open_power<fp, int>(v: fp) -> int:
        if v is NaN:
            return int::MIN_VALUE
        if v >= int::MAX_VALUE:
            return int::MAX_VALUE
        if v <= int::MIN_VALUE:
            return int::MIN_VALUE
        return (int)rint(v, rounding_mode)

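A runnable Python rendering of the pseudo-code above may help when checking edge cases. It is a sketch only: the integer type is passed as explicit `bits` and `signed` parameters (an assumption of this example, standing in for the generic `int` type), and only truncation and round-to-nearest-even are modelled.

```python
import math


def fp_to_int_open_power(v: float, bits: int, signed: bool,
                         truncate: bool = True) -> int:
    """Saturating conversion with NaN mapped to the minimum valid integer."""
    min_value = -(1 << (bits - 1)) if signed else 0
    max_value = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return min_value
    if v >= max_value:   # Python compares float and int exactly
        return max_value
    if v <= min_value:
        return min_value
    return math.trunc(v) if truncate else round(v)


if __name__ == "__main__":
    print(fp_to_int_open_power(float("nan"), 32, True))
    print(fp_to_int_open_power(1e10, 32, True))
    print(fp_to_int_open_power(-1.0, 32, False))
```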
<div id="fp-to-int-java-conversion-semantics"></div>
[Java conversion semantics](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3) /
[Rust semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
(with adjustment to add non-truncate rounding modes):

    def fp_to_int_java<fp, int>(v: fp) -> int:
        if v is NaN:
            return 0
        if v >= int::MAX_VALUE:
            return int::MAX_VALUE
        if v <= int::MIN_VALUE:
            return int::MIN_VALUE
        return (int)rint(v, rounding_mode)

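The Java-style variant differs from the OpenPower one only in its treatment of NaN. The sketch below expresses it as "truncate, then clamp", which should give the same results as the pseudo-code for the truncating case. As before, `bits` and `signed` are assumed parameters standing in for the generic `int` type, and other rounding modes are not modelled.

```python
import math


def fp_to_int_java(v: float, bits: int, signed: bool) -> int:
    """Saturating, truncating conversion with NaN mapped to 0."""
    if math.isnan(v):
        return 0
    lo = -(1 << (bits - 1)) if signed else 0
    hi = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isinf(v):
        return hi if v > 0 else lo   # math.trunc cannot handle infinities
    return max(lo, min(hi, math.trunc(v)))


if __name__ == "__main__":
    print(fp_to_int_java(float("nan"), 32, True))
    print(fp_to_int_java(1e10, 32, True))
    print(fp_to_int_java(-3.9, 32, True))
```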
<div id="fp-to-int-javascript-conversion-semantics"></div>
Section 7.1 of the ECMAScript / JavaScript
[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):

    def fp_to_int_java_script<fp, int>(v: fp) -> int:
        if v is NaN or infinite:
            return 0
        v = rint(v, rounding_mode)
        v = v mod int::VALUE_COUNT # 2^32 for i32, 2^64 for i64, result is non-negative