# FPR-to-GPR and GPR-to-FPR

**Draft Status** under development, for submission as an RFC

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>

Introduction:

High-performance CPU/GPU software often needs to convert between integers
and floating-point, so fast conversion/data-movement instructions
are needed. Also, given that initialisation of floats tends to take up
considerable space (even just to load 0.0), the inclusion of a compact
float immediate format, using BF16, is up for consideration.

Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS), i.e. it does not implement VMX/VSX,
and with its focus on modern 3D GPU hybrid workloads it represents an
important new potential use-case for OpenPOWER.

The progressive development of the Scalar parts of the Power ISA assumed
that VSX would be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs. For a 3D GPU this kills any modern competitive edge.
Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.

In addition, the vast majority of GPR <-> FPR data-transfers are part
of an FP <-> Integer conversion sequence, so reducing the number
of instructions required to a minimum seems necessary.

Therefore, we are proposing adding:

* FPR load-immediate using `BF16` as the constant
* FPR <-> GPR data-transfer instructions that just copy bits without conversion
* FPR <-> GPR combined data-transfer/conversion instructions that do
  Integer <-> FP conversions

If we're adding new Integer <-> FP conversion instructions, we may
as well take this opportunity to modernise the instructions and make them
well suited for common/important conversion sequences:

* standard Integer -> FP IEEE754 conversion (used by most languages and CPUs)
* standard OpenPower FP -> Integer conversion (saturation with NaN
  converted to minimum valid integer)
* Rust FP -> Integer conversion (saturation with NaN converted to 0)
* JavaScript FP -> Integer conversion (modular with Inf/NaN converted to 0)

The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: JavaScript is 35
scalar instructions, including four branches.

## standard Integer -> FP conversion

This conversion is outlined in the IEEE754 specification. It is used
by nearly all programming languages and CPUs. In the case of OpenPOWER,
the rounding mode is read from FPSCR.

## FP -> Integer conversions

Different programming languages turn out to have completely different
semantics for FP to Integer conversion. This section gives an overview
of the different variants, listing the languages and hardware that
implement each variant.

### standard OpenPower FP -> Integer conversion

This conversion, instead of exact IEEE754 compliance, performs
"saturation with NaN converted to minimum valid integer". This
is similar to the x86 ISA conversion semantics, except that x86 returns
the minimum integer (its "integer indefinite" value) for *all* invalid
inputs, including large positive values, rather than saturating them.
OpenPOWER has instructions for both:

* rounding mode read from FPSCR
* rounding mode always set to truncate

### Rust FP -> Integer conversion

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Rust's `as` operator will be referred to as [Rust conversion semantics](#fp-to-int-rust-conversion-semantics).

Those same semantics are used in some way by all of the following languages and intermediate representations (not necessarily for the default conversion method):

* Rust's FP -> Integer conversion using the
  [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
* Java's
  [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
* LLVM's
  [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
  [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
* SPIR-V's OpenCL dialect's
  [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
  [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
  instructions when decorated with
  [the `SaturatedConversion` decoration](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).

### JavaScript FP -> Integer conversion

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).

### Other languages

TODO: review and investigate other language semantics

# Proposed New Scalar Instructions

All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to an FPR. All integers, however, are sourced from/stored in the *GPR*.

Having integer operands and results in the GPR is the key differentiator
(indeed the entire rationale) between the proposed instructions and the
existing Scalar Power ISA. In all existing Power ISA Scalar conversion
instructions, all operands are FPRs, even if the format of the source or
destination data is actually a scalar integer.

Note that source and destination widths can be overridden by SimpleV
SVP64, and that SVP64 also has Saturation Modes *in addition*
to those independently described here. SVP64 Overrides and Saturation
work on *both* Fixed *and* Floating Point.
The interactions with SVP64
are explained in the [[int_fp_mv/appendix]].

## FPR to GPR moves

* `fmvtg RT, FRA`
* `fmvtg. RT, FRA`

Move a 64-bit float from an FPR to a GPR, just copying bits directly.
As a direct bitcopy, no exceptions occur and no status flags are set.

Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
operations.

* `fmvtgs RT, FRA`
* `fmvtgs. RT, FRA`

Move a 32-bit float from an FPR to a GPR. Converts the
64-bit float in `FRA` to a 32-bit float, then writes the bits of the
32-bit float to `RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
and therefore has exactly the same exception and flags behaviour as `frsp`.

Unlike `frsp`, however, with RT being a GPR, Rc=1 follows
standard *integer* behaviour, i.e. tests RT and sets CR0.

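The Python sketch below shows the bit patterns `fmvtg` and `fmvtgs` would place in `RT`, plus the CR0 comparison used for Rc=1. It is not an official model. It uses only the standard `struct` module, and the function names are hypothetical.

The sketch makes these assumptions:

* `fmvtgs` zero-extends the 32-bit pattern into the 64-bit GPR. The text above does not say how the upper bits are filled.
* Only round-to-nearest is modelled.
* No FPSCR flags are tracked and CR0's SO bit is ignored.
* Inputs are in single-precision range. `struct`'s `"f"` format raises `OverflowError` for finite values too large for single precision, so such inputs fall outside the sketch.

```python
import struct

MASK64 = (1 << 64) - 1


def fmvtg(fra: float) -> int:
    """fmvtg: copy the 64-bit IEEE754 pattern of FRA into RT unchanged."""
    return struct.unpack("<Q", struct.pack("<d", fra))[0]


def fmvtgs(fra: float) -> int:
    """fmvtgs sketch: round FRA to single precision (nearest only), then return
    the 32-bit pattern zero-extended to 64 bits (zero-extension is assumed)."""
    return struct.unpack("<I", struct.pack("<f", fra))[0]


def cr0_for(rt: int) -> str:
    """Rc=1 sketch: compare RT as a signed 64-bit integer with zero (SO not modelled)."""
    rt &= MASK64
    signed = rt - (1 << 64) if rt >> 63 else rt
    if signed < 0:
        return "lt"
    return "gt" if signed > 0 else "eq"


print(hex(fmvtg(1.0)))       # 0x3ff0000000000000
print(hex(fmvtgs(0.1)))      # 0x3dcccccd
print(cr0_for(fmvtg(-1.0)))  # lt
```

Because Rc=1 looks only at the integer value of `RT`, `fmvtg.` of `-0.0` would set LT in CR0, unlike an FP comparison.
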
## GPR to FPR moves

`fmvfg FRT, RA`

Move a 64-bit float from a GPR to an FPR, just copying bits. No exceptions
are raised and no flags of any kind are altered.

Rc=1 tests FRT and sets CR1.

`fmvfgs FRT, RA`

Move a 32-bit float from a GPR to an FPR. Converts the
32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
therefore has exactly the same exception and flags behaviour as `frsp`.

Rc=1 tests FRT and sets CR1.

TODO: clear statement on evaluation as to whether exceptions or flags are raised as part of the **FP** conversion (not the int bitcopy part, the conversion part; the semantics should really be the same as `frsp`).

v3.0C section 4.6.7.1 states:

    FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.

    Special Registers Altered:
        FPRF FR FI
        FX OX UX XX VXSNAN
        CR1 (if Rc=1)

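For comparison, here is a minimal Python sketch of the two GPR to FPR moves, using the standard `struct` module. The function names are hypothetical.

The sketch assumes that `fmvfgs` takes the 32-bit float from the low 32 bits of `RA` and ignores the upper bits. The text above does not specify this. Widening single to double precision is exact for every non-NaN value. NaN payload handling and FPSCR flags are not modelled.

```python
import struct

MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def fmvfg(ra: int) -> float:
    """fmvfg: reinterpret the 64-bit GPR value as a double, bit for bit."""
    return struct.unpack("<d", struct.pack("<Q", ra & MASK64))[0]


def fmvfgs(ra: int) -> float:
    """fmvfgs sketch: reinterpret the low 32 bits of RA (assumed) as a single;
    Python then holds it as an exactly widened double."""
    return struct.unpack("<f", struct.pack("<I", ra & MASK32))[0]


print(fmvfg(0x4009_21FB_5444_2D18))  # 3.141592653589793
print(fmvfgs(0x3FC0_0000))           # 1.5
```
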
### Float load immediate (kinda a variant of `fmvfg`)

`fmvis FRT, FI`

Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
64-bit float and written to `FRT`. This is equivalent to reinterpreting
`FI` as a `BF16` and converting to 64-bit float.

Example:

```
# clearing an FPR
fmvis f4, 0 # writes +0.0 to f4
# loading handy constants
fmvis f4, 0x8000 # writes -0.0 to f4
fmvis f4, 0x3F80 # writes +1.0 to f4
fmvis f4, 0xBF80 # writes -1.0 to f4
fmvis f4, 0xBFC0 # writes -1.5 to f4
fmvis f4, 0x7FC0 # writes +qNaN to f4
fmvis f4, 0x7F80 # writes +Infinity to f4
fmvis f4, 0xFF80 # writes -Infinity to f4
fmvis f4, 0x3FFF # writes +1.9921875 to f4

# clearing 128 FPRs with 2 SVP64 instructions
# by issuing 32 vec4 (subvector length 4) ops
setvli VL=MVL=32
sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127
```

Important: If the float load immediate instruction(s) are left out,
change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
to instead write `+0.0` if `RA` is register `0`, so that FPRs can at
least still be cleared.

| 0-5   | 6-10 | 11-25 | 26-30 | 31  |
|-------|------|-------|-------|-----|
| Major | FRT  | FI    | XO    | FI0 |

The above fits reasonably well with Minor 19 and follows the
pattern shown by `addpcis`, which uses an entire column of Minor 19
XO. 15 bits of FI fit into bits 11 to 25, and the top bit FI0
(bit 0 of the immediate in MSB0 numbering), placed in bit 31, makes 16.

    bf16 = FI0 || FI
    fp32 = bf16 || [0]*16
    FRT = Single_to_Double(fp32)

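The Python sketch below covers the encoding table and the `bf16 = FI0 || FI` pseudocode above, using the standard `struct` module. It converts the table's MSB0 bit positions into shifts: MSB0 bit *n* of a 32-bit word is `1 << (31 - n)`.

The primary opcode 19 follows the "Minor 19" suggestion. `XO_FMVIS` is a hypothetical placeholder, since no XO value has been allocated. All function names are illustrative only.

```python
import struct

PO = 19           # primary opcode, following the "Minor 19" suggestion above
XO_FMVIS = 0      # hypothetical placeholder: no XO value has been allocated


def encode_fmvis(frt: int, bf16: int) -> int:
    """Pack FRT and the 16-bit immediate; MSB0 bit n of the word is 1 << (31 - n)."""
    fi0 = (bf16 >> 15) & 1          # top immediate bit -> instruction bit 31
    fi = bf16 & 0x7FFF              # low 15 immediate bits -> instruction bits 11-25
    return (PO << 26) | ((frt & 0x1F) << 21) | (fi << 6) | ((XO_FMVIS & 0x1F) << 1) | fi0


def decode_fmvis(insn: int) -> tuple:
    """Recover (FRT, bf16) with bf16 = FI0 || FI."""
    frt = (insn >> 21) & 0x1F
    fi = (insn >> 6) & 0x7FFF
    fi0 = insn & 1
    return frt, (fi0 << 15) | fi


def bf16_to_double(bf16: int) -> float:
    """fp32 = bf16 || [0]*16, then Single_to_Double (exact widening in Python)."""
    fp32 = (bf16 & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", fp32))[0]


insn = encode_fmvis(4, 0xBFC0)        # fmvis f4, 0xBFC0
print(decode_fmvis(insn) == (4, 0xBFC0), bf16_to_double(0xBFC0))  # True -1.5
```

Splitting FI0 off into bit 31 keeps the 15-bit FI field contiguous, and the decoder only has to re-join the two pieces before the shift by 16.
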
## FPR to GPR conversions

<div id="fpr-to-gpr-conversion-mode"></div>

X-Form:

| 0-5   | 6-10 | 11-15  | 16-20 | 21-30 | 31 |
|-------|------|--------|-------|-------|----|
| Major | RT   | //Mode | FRA   | XO    | Rc |
| Major | FRT  | //Mode | RA    | XO    | Rc |

Mode values:

| Mode | `rounding_mode` | Semantics                        |
|------|-----------------|----------------------------------|
| 000  | from `FPSCR`    | [OpenPower semantics]            |
| 001  | Truncate        | [OpenPower semantics]            |
| 010  | from `FPSCR`    | [Rust semantics]                 |
| 011  | Truncate        | [Rust semantics]                 |
| 100  | from `FPSCR`    | [JavaScript semantics]           |
| 101  | Truncate        | [JavaScript semantics]           |
| rest | --              | illegal instruction trap for now |

[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
[Rust semantics]: #fp-to-int-rust-conversion-semantics
[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics

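One way to read the Mode table is that the least-significant Mode bit selects truncation and the upper two bits select the semantics. Below is a minimal Python decoder under that reading. The names `ConvMode`, `SEMANTICS` and `decode_mode` are hypothetical and not part of the proposal.

```python
from typing import NamedTuple


class ConvMode(NamedTuple):
    rounding: str   # "FPSCR" (rounding mode read from FPSCR) or "truncate"
    semantics: str  # "openpower", "rust" or "javascript"


SEMANTICS = {0b00: "openpower", 0b01: "rust", 0b10: "javascript"}


def decode_mode(mode: int) -> ConvMode:
    """Decode the 3-bit Mode field; 110 and 111 are reserved (illegal instruction)."""
    if not 0 <= mode <= 0b111 or (mode >> 1) not in SEMANTICS:
        raise ValueError(f"Mode {mode:03b}: illegal instruction trap")
    return ConvMode("truncate" if mode & 1 else "FPSCR", SEMANTICS[mode >> 1])


print(decode_mode(0b011))  # ConvMode(rounding='truncate', semantics='rust')
```
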
`fcvttgw RT, FRA, Mode`

Convert from 64-bit float to 32-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvttguw RT, FRA, Mode`

Convert from 64-bit float to 32-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvttgd RT, FRA, Mode`

Convert from 64-bit float to 64-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvttgud RT, FRA, Mode`

Convert from 64-bit float to 64-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstgw RT, FRA, Mode`

Convert from 32-bit float to 32-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstguw RT, FRA, Mode`

Convert from 32-bit float to 32-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstgd RT, FRA, Mode`

Convert from 32-bit float to 64-bit signed integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

`fcvtstgud RT, FRA, Mode`

Convert from 32-bit float to 64-bit unsigned integer, writing the result
to the GPR `RT`. Converts using [mode `Mode`].

[mode `Mode`]: #fpr-to-gpr-conversion-mode

## GPR to FPR conversions

All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.

`fcvtfgw FRT, RA`

Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in `FRT`.

`fcvtfgws FRT, RA`

Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in `FRT`.

`fcvtfguw FRT, RA`

Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`.

`fcvtfguws FRT, RA`

Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`.

`fcvtfgd FRT, RA`

Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in `FRT`.

`fcvtfgds FRT, RA`

Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in `FRT`.

`fcvtfgud FRT, RA`

Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in `FRT`.

`fcvtfguds FRT, RA`

Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in `FRT`.

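Below is a sketch of the double-precision variants (`fcvtfgw`, `fcvtfguw`, `fcvtfgd`, `fcvtfgud`). The Python interprets `RA` and rounds with the built-in `float()`, which Python correctly rounds to nearest, ties to even. The function names are hypothetical.

The sketch has these limitations and assumptions:

* It models only the round-to-nearest FPSCR mode. The other FPSCR rounding modes are not modelled.
* It assumes the 32-bit variants read the low 32 bits of `RA`.
* It leaves out the single-precision `...s` variants. For 64-bit sources, rounding to a double first and then to single can double-round, so a faithful model would round the integer directly to single precision.

```python
def gpr_to_int(ra: int, bits: int, signed: bool) -> int:
    """Interpret the low `bits` bits of the 64-bit GPR value (assumed) as an integer."""
    v = ra & ((1 << bits) - 1)
    if signed and v >> (bits - 1):
        v -= 1 << bits          # two's complement
    return v


def fcvtfg_double(ra: int, bits: int, signed: bool) -> float:
    """Integer -> 64-bit float, round to nearest, ties to even (Python's float())."""
    return float(gpr_to_int(ra, bits, signed))


print(fcvtfg_double(0xFFFF_FFFF, 32, signed=True))   # fcvtfgw:  -1.0
print(fcvtfg_double(0xFFFF_FFFF, 32, signed=False))  # fcvtfguw: 4294967295.0
print(fcvtfg_double(2**53 + 1, 64, signed=False))    # fcvtfgud: 9007199254740992.0
```

The last line shows why the FPSCR rounding mode matters for the 64-bit sources: 2^53 + 1 is not representable as a double, and ties-to-even rounds it down to 2^53.
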
# FP to Integer Conversion Pseudo-code

Key for pseudo-code:

| term | result type | definition |
|---------------------------|-------------|----------------------------------------------------------------------------------------------------|
| `fp` | -- | `f32` or `f64` (or other types from SimpleV) |
| `int` | -- | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV) |
| `uint` | -- | the unsigned integer of the same bit-width as `int` |
| `int::BITS` | `int` | the bit-width of `int` |
| `int::MIN_VALUE` | `int` | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed) |
| `int::MAX_VALUE` | `int` | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
| `int::VALUE_COUNT` | Integer | the number of different values `int` can store (`2^int::BITS`), too big to fit in `int` |
| `rint(fp, rounding_mode)` | `fp` | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode` |

<div id="fp-to-int-openpower-conversion-semantics"></div>

OpenPower conversion semantics (section A.2 page 999 (page 1023) of OpenPower ISA v3.1):

```
def fp_to_int_open_power<fp, int>(v: fp) -> int:
    if v is NaN:
        return int::MIN_VALUE
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```

<div id="fp-to-int-rust-conversion-semantics"></div>

Rust [conversion semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics) (with adjustment to add non-truncate rounding modes):

```
def fp_to_int_rust<fp, int>(v: fp) -> int:
    if v is NaN:
        return 0
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```

<div id="fp-to-int-javascript-conversion-semantics"></div>

Section 7.1 of the ECMAScript / JavaScript
[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):

```
def fp_to_int_java_script<fp, int>(v: fp) -> int:
    if v is NaN or infinite:
        return 0
    v = rint(v, rounding_mode)
    v = v mod int::VALUE_COUNT # 2^32 for i32, 2^64 for i64, result is non-negative
    bits = (uint)v
    return (int)bits
```

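The three pseudo-code functions above can be turned into a runnable Python sketch. This is an illustration, not a reference model.

The sketch departs from the pseudo-code in a few places:

* Integer types are modelled as a bit-width plus a signed flag.
* `rint` returns an exact Python `int` rather than an `fp` value.
* Only two rounding modes are modelled: truncation and round-to-nearest-even (Python's `round()`). The other FPSCR modes are omitted.
* Inputs are Python floats, so an `f32` source is assumed to have already been widened exactly to double.

```python
import math


def int_range(bits: int, signed: bool) -> tuple:
    """(int::MIN_VALUE, int::MAX_VALUE) for the given integer type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def rint(v: float, rounding_mode: str) -> int:
    """Round a finite float to an integer; only two rounding modes are modelled."""
    if rounding_mode == "truncate":
        return math.trunc(v)
    if rounding_mode == "nearest_even":
        return round(v)  # Python rounds halfway cases to even
    raise ValueError(f"unmodelled rounding mode {rounding_mode!r}")


def fp_to_int_open_power(v: float, bits: int, signed: bool, rounding_mode: str) -> int:
    lo, hi = int_range(bits, signed)
    if math.isnan(v):
        return lo
    if v >= hi:       # Python compares float and int values exactly
        return hi
    if v <= lo:
        return lo
    return rint(v, rounding_mode)


def fp_to_int_rust(v: float, bits: int, signed: bool, rounding_mode: str) -> int:
    if math.isnan(v):
        return 0      # the only difference from the OpenPower semantics
    return fp_to_int_open_power(v, bits, signed, rounding_mode)


def fp_to_int_java_script(v: float, bits: int, signed: bool, rounding_mode: str) -> int:
    if math.isnan(v) or math.isinf(v):
        return 0
    r = rint(v, rounding_mode) % (1 << bits)  # v mod int::VALUE_COUNT, non-negative
    if signed and r >> (bits - 1):
        r -= 1 << bits                        # reinterpret the uint bits as signed
    return r


print(fp_to_int_open_power(float("nan"), 32, True, "truncate"))  # -2147483648
print(fp_to_int_rust(float("nan"), 32, True, "truncate"))        # 0
print(fp_to_int_java_script(2.0**31, 32, True, "truncate"))      # -2147483648
```

Writing the Rust variant as a NaN check in front of the OpenPower one makes explicit that the two differ only in how NaN is handled.
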
# Equivalent OpenPower ISA v3.0 Assembly Language for FP -> Integer Conversion Modes

Moved to [[int_fp_mv/appendix]]