[[!tag standards]]

# FPR-to-GPR and GPR-to-FPR

**Draft Status** under development, for submission as an RFC

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
* [[int_fp_mv/appendix]]

Introduction:

High-performance CPU/GPU software often needs to convert between integers
and floating-point, therefore fast conversion/data-movement instructions
are needed. Also, given that initialisation of floats tends to take up
considerable space (even to just load 0.0), the inclusion of a compact
format float immediate is up for consideration, using BF16 as a base.

Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS), i.e. is not implementing VMX/VSX,
and with its focus on modern 3D GPU hybrid workloads represents an
important new potential use-case for OpenPOWER.

Prior to the formation of the Compliancy Levels first introduced
in v3.0C and v3.1,
the progressive historic development of the Scalar parts of the Power ISA assumed
that VSX would always be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs. For a 3D GPU this kills any modern competitive edge.
Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.

In addition, the vast majority of GPR <-> FPR data-transfers are part
of a FP <-> Integer conversion sequence, therefore reducing the number
of instructions required to the minimum seems necessary.

Therefore, we are proposing adding:

* FPR load-immediate using `BF16` as the constant
* FPR <-> GPR data-transfer instructions that just copy bits without conversion
* FPR <-> GPR combined data-transfer/conversion instructions that do
  Integer <-> FP conversions

If adding new Integer <-> FP conversion instructions,
the opportunity may be taken to modernise the instructions and make them
well-suited for common/important conversion sequences:

* **standard IEEE754** - used by most languages and CPUs
* **standard OpenPOWER** - saturation with NaN
  converted to minimum valid integer
* **Java** - saturation with NaN converted to 0
* **JavaScript** - modulo wrapping with Inf/NaN converted to 0

The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: the JavaScript conversion
takes 32 scalar instructions, including seven branch instructions.

## FP -> Integer conversions

Different programming languages turn out to have completely different
semantics for FP to Integer conversion. This section gives an overview
of the different variants, listing the languages and hardware that
implement each variant.

### standard IEEE754 conversion

This conversion is outlined in the IEEE754 specification. It is used
by nearly all programming languages and CPUs. In the case of OpenPOWER,
the rounding mode is read from FPSCR.

### standard OpenPower conversion

This conversion, instead of exact IEEE754 Compliance, performs
"saturation with NaN converted to minimum valid integer". This
is also exactly the same as the x86 ISA conversion semantics.
OpenPOWER however has instructions for both:

* rounding mode read from FPSCR
* rounding mode always set to truncate

### Java conversion

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Java (and Rust's `as` operator) will be referred to as
[Java conversion semantics](#fp-to-int-java-conversion-semantics).

Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method):

* Java's
  [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
* Rust's FP -> Integer conversion using the
  [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
* LLVM's
  [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
  [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
* SPIR-V's OpenCL dialect's
  [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
  [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
  instructions when decorated with
  [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).

### JavaScript conversion

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).

This conversion is present in ARM assembler as the `FJCVTZS` instruction:
<https://developer.arm.com/documentation/dui0801/g/hko1477562192868>

### Other languages

TODO: review and investigate other language semantics

# Proposed New Scalar Instructions

All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR. All integers however are sourced/stored in the *GPR*.

Integer operands and results being in the GPR is the key differentiator
(and the entire rationale) of the proposed instructions compared to the
existing Scalar Power ISA.
In all existing Power ISA Scalar conversion instructions, all
operands are FPRs, even if the format of the source or destination
data is actually a scalar integer.

Note that source and destination widths can be overridden by SimpleV
SVP64, and that SVP64 also has Saturation Modes *in addition*
to those independently described here. SVP64 Overrides and Saturation
work on *both* Fixed *and* Floating Point operands and results.
The interactions with SVP64
are explained in the [[int_fp_mv/appendix]].

## FPR to GPR moves

* `fmvtg RT, FRA`
* `fmvtg. RT, FRA`

Move a 64-bit float from a FPR to a GPR, just copying bits directly.
As a direct bitcopy, no exceptions occur and no status flags are set.

Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
operations.

* `fmvtgs RT, FRA`
* `fmvtgs. RT, FRA`

Move a 32-bit float from a FPR to a GPR, just copying bits. Converts the
64-bit float in `FRA` to a 32-bit float, then writes the 32-bit float to
`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
and therefore has the exact same exception and flags behaviour as `frsp`.

Unlike `frsp` however, with RT being a GPR, Rc=1 follows
standard *integer* behaviour, i.e. tests RT and sets CR0.

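The two FPR to GPR moves can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the function names simply mirror the mnemonics, a Python `float` stands in for the 64-bit FPR, the upper 32 bits of `RT` are assumed to be zero for `fmvtgs` (the text above does not specify them), and neither the Rc=1 update of CR0 nor the `frsp` exceptions and flags are modelled.

```python
import struct

def fmvtg(fra: float) -> int:
    """Model of fmvtg: return the raw 64 bits of the double held in FRA."""
    return struct.unpack(">Q", struct.pack(">d", fra))[0]

def fmvtgs(fra: float) -> int:
    """Model of fmvtgs: round FRA to single precision, then return the
    raw 32 bits (assumed zero-extended into RT).

    Only round-to-nearest-even is modelled; struct.pack raises
    OverflowError for finite values too large for a single, where real
    hardware would follow frsp behaviour instead.
    """
    return struct.unpack(">I", struct.pack(">f", fra))[0]

print(hex(fmvtg(1.0)))    # bit pattern of the double 1.0
print(hex(fmvtgs(1.0)))   # bit pattern of the single 1.0
```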
## GPR to FPR moves

`fmvfg FRT, RA`

Move a 64-bit float from a GPR to a FPR, just copying bits. No exceptions
are raised and no flags of any kind are altered.

Rc=1 tests FRT and sets CR1.

`fmvfgs FRT, RA`

Move a 32-bit float from a GPR to a FPR, just copying bits. Converts the
32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
therefore has the exact same exception and flags behaviour as `frsp`.

Rc=1 tests FRT and sets CR1.

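A minimal Python model of the two GPR to FPR moves is sketched here. The function names mirror the mnemonics, a Python `float` stands in for the 64-bit FPR, and it is an assumption of this sketch that `fmvfgs` reads the low 32 bits of `RA`. CR1 updates and any exception or flag behaviour are not modelled.

```python
import struct

def fmvfg(ra: int) -> float:
    """Model of fmvfg: reinterpret the 64 bits of RA as a double."""
    return struct.unpack(">d", struct.pack(">Q", ra & 0xFFFF_FFFF_FFFF_FFFF))[0]

def fmvfgs(ra: int) -> float:
    """Model of fmvfgs: reinterpret the low 32 bits of RA (an assumption)
    as a single, then widen it to a double. Widening is always exact."""
    return struct.unpack(">f", struct.pack(">I", ra & 0xFFFF_FFFF))[0]

print(fmvfg(0x3FF0000000000000))  # double bit pattern for 1.0
print(fmvfgs(0xBFC00000))         # single bit pattern for -1.5
```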
TODO: clear statement on evaluation as to whether exceptions or flags are raised as part of the **FP** conversion (not the int bitcopy part, the conversion part; the semantics should really be the same as `frsp`).

v3.0C section 4.6.7.1 states:

    FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.

    Special Registers Altered:
        FPRF FR FI
        FX OX UX XX VXSNAN
        CR1 (if Rc=1)

## Float load immediate <a name="fmvis"></a>

This is like a variant of `fmvfg`:

`fmvis FRT, FI`

Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
64-bit float and written to `FRT`. This is equivalent to reinterpreting
`FI` as a `BF16` and converting to 64-bit float.

There is no need for an Rc=1 variant because this is an immediate loading
instruction. This frees up one extra bit in the X-Form format for packing
a full `BF16`.

Example:

```
# clearing a FPR
fmvis f4, 0       # writes +0.0 to f4
# loading handy constants
fmvis f4, 0x8000  # writes -0.0 to f4
fmvis f4, 0x3F80  # writes +1.0 to f4
fmvis f4, 0xBF80  # writes -1.0 to f4
fmvis f4, 0xBFC0  # writes -1.5 to f4
fmvis f4, 0x7FC0  # writes +qNaN to f4
fmvis f4, 0x7F80  # writes +Infinity to f4
fmvis f4, 0xFF80  # writes -Infinity to f4
fmvis f4, 0x3FFF  # writes +1.9921875 to f4

# clearing 128 FPRs with 2 SVP64 instructions
# by issuing 32 vec4 (subvector length 4) ops
setvli VL=MVL=32
sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127
```

Important: If the float load immediate instruction(s) are left out,
change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
to instead write `+0.0` if `RA` is register `0`, at least
allowing clearing FPRs.

`fmvis` fits well with DX-Form:

| 0-5    | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
|--------|------|-------|-------|-------|-----|---------|
| Major  | FRT  | d1    | d0    | XO    | d2  | DX-Form |

    bf16 = d0 || d1 || d2
    fp32 = bf16 || [0]*16
    FRT = Single_to_Double(fp32)

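The `fmvis` pseudo-code can be tried out with a short Python sketch. The helper names are hypothetical; the field widths (10 bits for `d0`, 5 for `d1`, 1 for `d2`) are read off the DX-Form table above, and a Python `float` stands in for the 64-bit value written to `FRT`.

```python
import struct

def bf16_from_dx_fields(d0: int, d1: int, d2: int) -> int:
    """Reassemble the 16-bit immediate as d0 || d1 || d2
    (10, 5 and 1 bits wide respectively)."""
    return ((d0 & 0x3FF) << 6) | ((d1 & 0x1F) << 1) | (d2 & 0x1)

def fmvis(fi: int) -> float:
    """Model of fmvis: treat FI as a BF16, i.e. the upper half of an
    IEEE 754 single, and widen the result to a double."""
    fp32_bits = (fi & 0xFFFF) << 16        # fp32 = bf16 || [0]*16
    return struct.unpack(">f", struct.pack(">I", fp32_bits))[0]

for imm in (0x0000, 0x3F80, 0xBFC0, 0x3FFF):
    print(f"fmvis f4, {imm:#06x} -> {fmvis(imm)!r}")
```

Because a BF16 is just a truncated single, no rounding is involved and every one of the 65536 immediates maps to an exactly representable double.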
## FPR to GPR conversions

<div id="fpr-to-gpr-conversion-mode"></div>

X-Form:

| 0-5    | 6-10 | 11-15  | 16-25 | 26-30 | 31 |
|--------|------|--------|-------|-------|----|
| Major  | RT   | //Mode | FRA   | XO    | Rc |
| Major  | FRT  | //Mode | RA    | XO    | Rc |

Mode values:

| Mode | `rounding_mode` | Semantics                        |
|------|-----------------|----------------------------------|
| 000  | from `FPSCR`    | [OpenPower semantics]            |
| 001  | Truncate        | [OpenPower semantics]            |
| 010  | from `FPSCR`    | [Java semantics]                 |
| 011  | Truncate        | [Java semantics]                 |
| 100  | from `FPSCR`    | [JavaScript semantics]           |
| 101  | Truncate        | [JavaScript semantics]           |
| rest | --              | illegal instruction trap for now |

[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
[Java semantics]: #fp-to-int-java-conversion-semantics
[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics

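The Mode table can be read as a simple lookup. The sketch below is illustrative only: `MODES`, `decode_mode` and the string names are hypothetical and not part of the proposal, and the reserved encodings are modelled as a Python exception rather than an actual trap.

```python
# Hypothetical decode table for the 3-bit Mode field:
# Mode -> (source of rounding_mode, conversion semantics).
MODES = {
    0b000: ("fpscr", "openpower"),
    0b001: ("truncate", "openpower"),
    0b010: ("fpscr", "java"),
    0b011: ("truncate", "java"),
    0b100: ("fpscr", "javascript"),
    0b101: ("truncate", "javascript"),
}

def decode_mode(mode: int) -> tuple[str, str]:
    """Return (rounding_mode source, semantics) for a Mode value.

    Reserved encodings raise ValueError, standing in for the
    illegal instruction trap.
    """
    try:
        return MODES[mode]
    except KeyError:
        raise ValueError(f"illegal instruction: Mode {mode:#05b} is reserved") from None

print(decode_mode(0b011))
```

In this encoding the low bit selects truncation and the upper two bits select the semantics, which is why the table pairs up so regularly.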
* `fcvttgw RT, FRA, Mode`
    Convert from 64-bit float to 32-bit signed integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttguw RT, FRA, Mode`
    Convert from 64-bit float to 32-bit unsigned integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttgd RT, FRA, Mode`
    Convert from 64-bit float to 64-bit signed integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttgud RT, FRA, Mode`
    Convert from 64-bit float to 64-bit unsigned integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgw RT, FRA, Mode`
    Convert from 32-bit float to 32-bit signed integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstguw RT, FRA, Mode`
    Convert from 32-bit float to 32-bit unsigned integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgd RT, FRA, Mode`
    Convert from 32-bit float to 64-bit signed integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgud RT, FRA, Mode`
    Convert from 32-bit float to 64-bit unsigned integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].

[mode `Mode`]: #fpr-to-gpr-conversion-mode

## GPR to FPR conversions

All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.

* `fcvtfgw FRT, RA`
    Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in
    `FRT`.
* `fcvtfgws FRT, RA`
    Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in
    `FRT`.
* `fcvtfguw FRT, RA`
    Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in
    `FRT`.
* `fcvtfguws FRT, RA`
    Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in
    `FRT`.
* `fcvtfgd FRT, RA`
    Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in
    `FRT`.
* `fcvtfgds FRT, RA`
    Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in
    `FRT`.
* `fcvtfgud FRT, RA`
    Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in
    `FRT`.
* `fcvtfguds FRT, RA`
    Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in
    `FRT`.

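A few of these conversions are sketched in Python below, purely as an illustration. The assumptions are: the 32-bit source is taken from the low 32 bits of `RA`, only round-to-nearest-even is modelled (the instructions themselves use the `FPSCR` rounding mode), and a Python `float` stands in for `FRT`. The 64-bit to 32-bit float variants are left out because going via a double would round twice.

```python
import struct

def _to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def _round_to_single(x: float) -> float:
    """Round a double to single precision and widen it back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def fcvtfgw(ra: int) -> float:
    """32-bit signed integer -> 64-bit float (always exact)."""
    return float(_to_signed(ra, 32))

def fcvtfguw(ra: int) -> float:
    """32-bit unsigned integer -> 64-bit float (always exact)."""
    return float(ra & 0xFFFF_FFFF)

def fcvtfgws(ra: int) -> float:
    """32-bit signed integer -> 32-bit float (may round)."""
    return _round_to_single(fcvtfgw(ra))

def fcvtfgd(ra: int) -> float:
    """64-bit signed integer -> 64-bit float (may round)."""
    return float(_to_signed(ra, 64))

print(fcvtfgw(0xFFFF_FFFF), fcvtfguw(0xFFFF_FFFF))
```

The same register contents give very different results depending on the signedness encoded in the mnemonic, which is why separate signed and unsigned instructions are listed.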
# FP to Integer Conversion Pseudo-code

Key for pseudo-code:

| term                      | result type | definition                                                                                         |
|---------------------------|-------------|----------------------------------------------------------------------------------------------------|
| `fp`                      | --          | `f32` or `f64` (or other types from SimpleV)                                                       |
| `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                              |
| `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                |
| `int::BITS`               | `int`       | the bit-width of `int`                                                                             |
| `int::MIN_VALUE`          | `int`       | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed)                  |
| `int::MAX_VALUE`          | `int`       | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
| `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.           |
| `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`      |

<div id="fp-to-int-openpower-conversion-semantics"></div>

OpenPower conversion semantics (section A.2 page 999 (page 1023) of OpenPower ISA v3.1):

```
def fp_to_int_open_power<fp, int>(v: fp) -> int:
    if v is NaN:
        return int::MIN_VALUE
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```

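The OpenPower pseudo-code translates almost line for line into runnable Python. In this sketch the `bits`/`signed` parameters replace the `int` type parameter, and the `ROUNDERS` table with its mode names is hypothetical, standing in for `rint(v, rounding_mode)`.

```python
import math

# Hypothetical rounding-mode names mapped onto Python rounding functions.
ROUNDERS = {
    "nearest_even": round,   # Python's round() rounds ties to even
    "truncate": math.trunc,
    "floor": math.floor,
    "ceil": math.ceil,
}

def fp_to_int_open_power(v: float, bits: int, signed: bool,
                         rounding_mode: str = "truncate") -> int:
    min_value = -(1 << (bits - 1)) if signed else 0
    max_value = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return min_value          # NaN -> minimum valid integer
    if v >= max_value:            # also catches +Infinity
        return max_value
    if v <= min_value:            # also catches -Infinity
        return min_value
    return ROUNDERS[rounding_mode](v)

print(fp_to_int_open_power(float("nan"), 32, True))
print(fp_to_int_open_power(1e10, 32, True))
```

Python compares floats and arbitrary-precision integers exactly, so the saturation tests behave correctly even for 64-bit limits that a double cannot represent.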
<div id="fp-to-int-java-conversion-semantics"></div>

[Java conversion semantics](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
/
[Rust semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
(with adjustment to add non-truncate rounding modes):

```
def fp_to_int_java<fp, int>(v: fp) -> int:
    if v is NaN:
        return 0
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```

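A runnable Python rendering of the Java/Rust pseudo-code is sketched below. The `bits`/`signed` parameters replace the `int` type parameter, and the `rint` argument (a plain Python function, defaulting to truncation as in Java's own casts) is an assumption of this sketch standing in for `rint(v, rounding_mode)`.

```python
import math

def fp_to_int_java(v: float, bits: int = 32, signed: bool = True,
                   rint=math.trunc) -> int:
    min_value = -(1 << (bits - 1)) if signed else 0
    max_value = (1 << (bits - 1)) - 1 if signed else (1 << bits) - 1
    if math.isnan(v):
        return 0                  # NaN -> 0
    if v >= max_value:            # saturate, including +Infinity
        return max_value
    if v <= min_value:            # saturate, including -Infinity
        return min_value
    return rint(v)

# Same inputs as a Java (int) cast would see:
for x in (float("nan"), 1e10, -1e10, 3.99, -3.99):
    print(x, "->", fp_to_int_java(x))
```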
<div id="fp-to-int-javascript-conversion-semantics"></div>

Section 7.1 of the ECMAScript / JavaScript
[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):

```
def fp_to_int_java_script<fp, int>(v: fp) -> int:
    if v is NaN or infinite:
        return 0
    v = rint(v, rounding_mode)
    v = v mod int::VALUE_COUNT  # 2^32 for i32, 2^64 for i64, result is non-negative
    bits = (uint)v
    return (int)bits
```

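The JavaScript pseudo-code can likewise be written as runnable Python. As before, this is a sketch: `bits`/`signed` replace the `int` type parameter, and the `rint` argument (defaulting to truncation, which is what `ToInt32` itself uses) stands in for `rint(v, rounding_mode)`.

```python
import math

def fp_to_int_java_script(v: float, bits: int = 32, signed: bool = True,
                          rint=math.trunc) -> int:
    if math.isnan(v) or math.isinf(v):
        return 0
    # Python's % with a positive modulus always yields a non-negative
    # result, matching "v mod int::VALUE_COUNT" in the pseudo-code.
    wrapped = rint(v) % (1 << bits)
    # (int)bits: reinterpret the unsigned value as two's complement.
    if signed and wrapped >= 1 << (bits - 1):
        wrapped -= 1 << bits
    return wrapped

for x in (4294967301.0, 2147483648.0, -1.9, float("inf")):
    print(x, "->", fp_to_int_java_script(x))
```

Unlike the saturating variants, out-of-range values wrap around, so `2147483648.0` becomes the most negative 32-bit integer rather than the most positive one.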