[[!tag standards]]

# FPR-to-GPR and GPR-to-FPR

**Draft Status** under development, for submission as an RFC

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=650>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c71>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c74>
* <https://bugs.libre-soc.org/show_bug.cgi?id=230#c76>
* [[int_fp_mv/appendix]]

Introduction:

High-performance CPU/GPU software often needs to convert between integers
and floating-point, so fast conversion/data-movement instructions
are needed. In addition, because initialisation of floats tends to take up
considerable space (even just to load 0.0), the inclusion of a compact-format
float immediate is up for consideration, using BF16 as a base.

Libre-SOC will be compliant with the
**Scalar Floating-Point Subset** (SFFS), i.e. it does not implement VMX/VSX,
and with its focus on modern 3D GPU hybrid workloads it represents an
important new potential use-case for OpenPOWER.

Prior to the formation of the Compliancy Levels first introduced
in v3.0C and v3.1,
the progressive historic development of the Scalar parts of the Power ISA assumed
that VSX would always be there to complement it. However, with VMX/VSX
**not available** in the newly-introduced SFFS Compliancy Level, the
existing non-VSX conversion/data-movement instructions require load/store
instructions (slow and expensive) to transfer data between the FPRs and
the GPRs. For a 3D GPU this kills any modern competitive edge.
Also, because SimpleV needs efficient scalar instructions in
order to generate efficient vector instructions, adding new instructions
for data-transfer/conversion between FPRs and GPRs multiplies the savings.

In addition, the vast majority of GPR <-> FPR data-transfers are part
of a FP <-> Integer conversion sequence, so reducing the number
of instructions required to the minimum seems necessary.

Therefore, we are proposing adding:

* FPR load-immediate using `BF16` as the constant
* FPR <-> GPR data-transfer instructions that just copy bits without conversion
* FPR <-> GPR combined data-transfer/conversion instructions that do
  Integer <-> FP conversions

If new Integer <-> FP conversion instructions are added,
the opportunity may be taken to modernise the instructions and make them
well-suited for common/important conversion sequences:

* **standard IEEE754** - used by most languages and CPUs
* **standard OpenPOWER** - saturation with NaN
  converted to minimum valid integer
* **Java** - saturation with NaN converted to 0
* **JavaScript** - modulo wrapping with Inf/NaN converted to 0

The assembly listings in the [[int_fp_mv/appendix]] show how costly
some of these language-specific conversions are: JavaScript takes 32
scalar instructions, including seven branch instructions.

# Proposed New Scalar Instructions

All of the following instructions use the standard OpenPower conversion to/from 64-bit float format when reading/writing a 32-bit float from/to a FPR. All integers however are sourced/stored in the *GPR*.

Having integer operands and results in the GPR is the key differentiator
(and the entire rationale) of the proposed instructions compared to the existing Scalar Power ISA.
In all existing Power ISA Scalar conversion instructions, all
operands are FPRs, even if the format of the source or destination
data is actually a scalar integer.

Note that source and destination widths can be overridden by SimpleV
SVP64, and that SVP64 also has Saturation Modes *in addition*
to those independently described here. SVP64 Overrides and Saturation
work on *both* Fixed *and* Floating Point operands and results.
The interactions with SVP64
are explained in the [[int_fp_mv/appendix]].

# FPR to GPR moves

* `fmvtg RT, FRA`
* `fmvtg. RT, FRA`

Moves a 64-bit float from a FPR to a GPR, just copying bits directly.
As a direct bitcopy, no exceptions occur and no status flags are set.

Rc=1 tests RT and sets CR0, exactly like all other Scalar Fixed-Point
operations.

* `fmvtgs RT, FRA`
* `fmvtgs. RT, FRA`

Moves a 32-bit float from a FPR to a GPR, just copying bits. Converts the
64-bit float in `FRA` to a 32-bit float, then writes the 32-bit float to
`RT`. Effectively, `fmvtgs` is a macro-fusion of `frsp fmvtg`
and therefore has exactly the same exception and flags behaviour as `frsp`.

Unlike `frsp` however, with RT being a GPR, Rc=1 follows
standard *integer* behaviour, i.e. tests RT and sets CR0.
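
As a rough illustration of the data flow only, the two moves can be modelled in Python with the standard `struct` module. This is a sketch under stated assumptions, not a reference model: the function names simply reuse the mnemonics, the single-precision rounding is whatever `struct` does (round-to-nearest-even, whereas `frsp` follows `FPSCR`), and no exceptions, status flags or CR0 updates are modelled.

```python
import struct

def fmvtg(fra: float) -> int:
    """Sketch of fmvtg: return the 64-bit pattern of a double, unchanged."""
    return struct.unpack("<Q", struct.pack("<d", fra))[0]

def fmvtgs(fra: float) -> int:
    """Sketch of fmvtgs: round to single precision, return the 32-bit pattern.

    Assumption: round-to-nearest-even (struct's behaviour), not FPSCR.
    struct raises OverflowError for finite values too large for a single,
    which real hardware would instead handle as frsp does.
    """
    return struct.unpack("<I", struct.pack("<f", fra))[0]

print(hex(fmvtg(1.0)))    # 64-bit pattern of +1.0
print(hex(fmvtgs(-1.5)))  # 32-bit pattern of -1.5
```

The 32-bit result shows why `fmvtgs` is more than a bitcopy: the value is narrowed first, and only then are the bits transferred.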

# GPR to FPR moves

`fmvfg FRT, RA`

Moves a 64-bit float from a GPR to a FPR, just copying bits. No exceptions
are raised and no flags of any kind are altered.

Rc=1 tests FRT and sets CR1.

`fmvfgs FRT, RA`

Moves a 32-bit float from a GPR to a FPR, just copying bits. Converts the
32-bit float in `RA` to a 64-bit float, then writes the 64-bit float to
`FRT`. Effectively, `fmvfgs` is a macro-fusion of `fmvfg frsp` and
therefore has exactly the same exception and flags behaviour as `frsp`.

Rc=1 tests FRT and sets CR1.
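
The reverse direction can be sketched the same way in Python. The function names reuse the mnemonics purely for illustration, and the sketch assumes that `fmvfgs` takes the 32-bit float from the low 32 bits of `RA`, which the text above does not state explicitly. Exceptions, flags and CR1 are not modelled.

```python
import struct

def fmvfg(ra: int) -> float:
    """Sketch of fmvfg: reinterpret a 64-bit GPR value as a double."""
    return struct.unpack("<d", struct.pack("<Q", ra & 0xFFFF_FFFF_FFFF_FFFF))[0]

def fmvfgs(ra: int) -> float:
    """Sketch of fmvfgs: reinterpret the low 32 bits as a single, then widen.

    Assumption: the single-precision value sits in the low 32 bits of RA.
    Python floats are doubles, so unpacking "<f" performs the widening.
    """
    return struct.unpack("<f", struct.pack("<I", ra & 0xFFFF_FFFF))[0]

print(fmvfg(0x3FF0000000000000))  # bit pattern of +1.0 as a double
print(fmvfgs(0xBFC00000))         # bit pattern of -1.5 as a single
```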

TODO: clear statement on evaluation as to whether exceptions or flags are raised as part of the **FP** conversion (not the int bitcopy part, the conversion part; the semantics should really be the same as `frsp`).

v3.0C section 4.6.7.1 states:

    FPRF is set to the class and sign of the result, except for Invalid Operation Exceptions when VE=1.

    Special Registers Altered:
      FPRF FR FI
      FX OX UX XX VXSNAN
      CR1 (if Rc=1)

# Float load immediate <a name="fmvis"></a>

This is like a variant of `fmvfg`.

`fmvis FRT, FI`

Reinterprets `FI << 16` as a 32-bit float, which is then converted to a
64-bit float and written to `FRT`. This is equivalent to reinterpreting
`FI` as a `BF16` and converting to 64-bit float.

There is no need for an Rc=1 variant because this is an immediate loading
instruction. This frees up one extra bit in the X-Form format for packing
a full `BF16`.

Example:

```
# clearing a FPR
fmvis f4, 0 # writes +0.0 to f4
# loading handy constants
fmvis f4, 0x8000 # writes -0.0 to f4
fmvis f4, 0x3F80 # writes +1.0 to f4
fmvis f4, 0xBF80 # writes -1.0 to f4
fmvis f4, 0xBFC0 # writes -1.5 to f4
fmvis f4, 0x7FC0 # writes +qNaN to f4
fmvis f4, 0x7F80 # writes +Infinity to f4
fmvis f4, 0xFF80 # writes -Infinity to f4
fmvis f4, 0x3FFF # writes +1.9921875 to f4

# clearing 128 FPRs with 2 SVP64 instructions
# by issuing 32 vec4 (subvector length 4) ops
setvli VL=MVL=32
sv.fmvis/vec4 f0, 0 # writes +0.0 to f0-f127
```

Important: If the float load immediate instruction(s) are left out,
change all [GPR to FPR conversion instructions](#GPR-to-FPR-conversions)
to instead write `+0.0` if `RA` is register `0`, at least
allowing clearing FPRs.

`fmvis` fits well with DX-Form:

| 0-5    | 6-10 | 11-15 | 16-25 | 26-30 | 31  | Form    |
|--------|------|-------|-------|-------|-----|---------|
| Major  | FRT  | d1    | d0    | XO    | d2  | DX-Form |

    bf16 = d0 || d1 || d2
    fp32 = bf16 || [0]*16
    FRT = Single_to_Double(fp32)
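
The field concatenation and widening above can be checked with a short Python sketch. The helper names `split_immediate`, `join_immediate` and `fmvis` are hypothetical, chosen only for this illustration; the field widths (10, 5 and 1 bits for `d0`, `d1` and `d2`) are read off the bit ranges in the DX-Form table.

```python
import struct

def split_immediate(bf16: int):
    """Split a 16-bit immediate into the DX-Form fields (d0, d1, d2)."""
    bf16 &= 0xFFFF
    d0 = bf16 >> 6           # upper 10 bits, instruction bits 16-25
    d1 = (bf16 >> 1) & 0x1F  # middle 5 bits, instruction bits 11-15
    d2 = bf16 & 0x1          # lowest bit, instruction bit 31
    return d0, d1, d2

def join_immediate(d0: int, d1: int, d2: int) -> int:
    """bf16 = d0 || d1 || d2"""
    return ((d0 & 0x3FF) << 6) | ((d1 & 0x1F) << 1) | (d2 & 0x1)

def fmvis(bf16: int) -> float:
    """fp32 = bf16 || [0]*16, then widen the single to a double."""
    fp32_bits = (bf16 & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", fp32_bits))[0]

for imm in (0x0000, 0x3F80, 0xBFC0, 0x3FFF):
    fields = split_immediate(imm)
    print(hex(imm), fields, fmvis(join_immediate(*fields)))
```

Because a `BF16` is simply the upper half of a 32-bit float, the widening is exact and no rounding is involved.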

# Conversions

Unlike the move instructions,
these instructions perform conversions between Integer and
Floating Point. Truncation can therefore occur, as well
as exceptions.

Mode values:

| Mode | `rounding_mode` | Semantics                        |
|------|-----------------|----------------------------------|
| 000  | from `FPSCR`    | [OpenPower semantics]            |
| 001  | Truncate        | [OpenPower semantics]            |
| 010  | from `FPSCR`    | [Java semantics]                 |
| 011  | Truncate        | [Java semantics]                 |
| 100  | from `FPSCR`    | [JavaScript semantics]           |
| 101  | Truncate        | [JavaScript semantics]           |
| rest | --              | illegal instruction trap for now |

[OpenPower semantics]: #fp-to-int-openpower-conversion-semantics
[Java semantics]: #fp-to-int-java-conversion-semantics
[JavaScript semantics]: #fp-to-int-javascript-conversion-semantics
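
One way to read the Mode table is as a plain lookup, sketched below in Python. `MODE_TABLE` and `decode_mode` are hypothetical names used only for this illustration, and the `ValueError` merely stands in for the illegal instruction trap. In the encodings listed, the low bit happens to select truncation and the upper two bits select the semantics; that is an observation about the table, not something the draft states as a rule.

```python
# Hypothetical lookup mirroring the Mode table: Mode -> (rounding_mode, semantics)
MODE_TABLE = {
    0b000: ("fpscr", "openpower"),
    0b001: ("truncate", "openpower"),
    0b010: ("fpscr", "java"),
    0b011: ("truncate", "java"),
    0b100: ("fpscr", "javascript"),
    0b101: ("truncate", "javascript"),
}

def decode_mode(mode: int):
    """Return (rounding_mode, semantics); reserved encodings raise ValueError."""
    if mode not in MODE_TABLE:
        # Stands in for the "illegal instruction trap for now" row.
        raise ValueError(f"reserved Mode value {mode!r}")
    return MODE_TABLE[mode]

print(decode_mode(0b011))
```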

## GPR to FPR conversions

**Format**

| 0-5    | 6-10 | 11-15  | 16-25 | 26-30 | 31 | Form |
|--------|------|--------|-------|-------|----|------|
| Major  | FRT  | //Mode | RA    | XO    | Rc |X-Form|

All of the following GPR to FPR conversions use the rounding mode from `FPSCR`.

* `fcvtfgw FRT, RA`
    Convert from 32-bit signed integer in the GPR `RA` to 64-bit float in
    `FRT`.
* `fcvtfgws FRT, RA`
    Convert from 32-bit signed integer in the GPR `RA` to 32-bit float in
    `FRT`.
* `fcvtfguw FRT, RA`
    Convert from 32-bit unsigned integer in the GPR `RA` to 64-bit float in
    `FRT`.
* `fcvtfguws FRT, RA`
    Convert from 32-bit unsigned integer in the GPR `RA` to 32-bit float in
    `FRT`.
* `fcvtfgd FRT, RA`
    Convert from 64-bit signed integer in the GPR `RA` to 64-bit float in
    `FRT`.
* `fcvtfgds FRT, RA`
    Convert from 64-bit signed integer in the GPR `RA` to 32-bit float in
    `FRT`.
* `fcvtfgud FRT, RA`
    Convert from 64-bit unsigned integer in the GPR `RA` to 64-bit float in
    `FRT`.
* `fcvtfguds FRT, RA`
    Convert from 64-bit unsigned integer in the GPR `RA` to 32-bit float in
    `FRT`.
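
The list above can be approximated in Python for six of the eight variants. The sketch assumes round-to-nearest-even (Python's only built-in int-to-float rounding) in place of the `FPSCR` rounding mode, and assumes that the 32-bit variants read the low 32 bits of `RA`. `fcvtfgds` and `fcvtfguds` are deliberately left out: going through a double first would round twice, which can differ from a single correctly-rounded conversion.

```python
import struct

def _signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def _to_single(x: float) -> float:
    """Round a double to single precision (nearest-even) and widen it again."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fcvtfgw(ra: int) -> float:    # i32 -> f64, always exact
    return float(_signed(ra, 32))

def fcvtfguw(ra: int) -> float:   # u32 -> f64, always exact
    return float(ra & 0xFFFF_FFFF)

def fcvtfgd(ra: int) -> float:    # i64 -> f64, may round
    return float(_signed(ra, 64))

def fcvtfgud(ra: int) -> float:   # u64 -> f64, may round
    return float(ra & 0xFFFF_FFFF_FFFF_FFFF)

def fcvtfgws(ra: int) -> float:   # i32 -> f32: exact in f64, then one rounding
    return _to_single(float(_signed(ra, 32)))

def fcvtfguws(ra: int) -> float:  # u32 -> f32: exact in f64, then one rounding
    return _to_single(float(ra & 0xFFFF_FFFF))

print(fcvtfgw(0xFFFF_FFFF), fcvtfguw(0xFFFF_FFFF), fcvtfguws(0xFFFF_FFFF))
```

The same GPR bit pattern gives very different results depending on the signedness and destination width selected by the mnemonic.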

## FPR to GPR (Integer) conversions

<div id="fpr-to-gpr-conversion-mode"></div>

Different programming languages turn out to have completely different
semantics for FP to Integer conversion. Below is an overview
of the different variants, listing the languages and hardware that
implement each variant.

**Standard IEEE754 conversion**

This conversion is outlined in the IEEE754 specification. It is used
by nearly all programming languages and CPUs. In the case of OpenPOWER,
the rounding mode is read from FPSCR.

**Standard OpenPower conversion**

This conversion, instead of exact IEEE754 Compliance, performs
"saturation with NaN converted to minimum valid integer". This
is also exactly the same as the x86 ISA conversion semantics.
OpenPOWER however has instructions for both:

* rounding mode read from FPSCR
* rounding mode always set to truncate

**Java conversion**

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by Java (and Rust's `as` operator) will be referred to as
[Java conversion semantics](#fp-to-int-java-conversion-semantics).

Those same semantics are used in some way by all of the following languages (not necessarily for the default conversion method):

* Java's
  [FP -> Integer conversion](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
* Rust's FP -> Integer conversion using the
  [`as` operator](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
* LLVM's
  [`llvm.fptosi.sat`](https://llvm.org/docs/LangRef.html#llvm-fptosi-sat-intrinsic) and
  [`llvm.fptoui.sat`](https://llvm.org/docs/LangRef.html#llvm-fptoui-sat-intrinsic) intrinsics
* SPIR-V's OpenCL dialect's
  [`OpConvertFToU`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToU) and
  [`OpConvertFToS`](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#OpConvertFToS)
  instructions when decorated with
  [the `SaturatedConversion` decorator](https://www.khronos.org/registry/spir-v/specs/unified1/SPIRV.html#_a_id_decoration_a_decoration).

**JavaScript conversion**

For the sake of simplicity, the FP -> Integer conversion semantics generalized from those used by JavaScript's `ToInt32` abstract operation will be referred to as [JavaScript conversion semantics](#fp-to-int-javascript-conversion-semantics).

This conversion is present in ARM assembler as the `FJCVTZS` instruction:
<https://developer.arm.com/documentation/dui0801/g/hko1477562192868>

**Format**

| 0-5    | 6-10 | 11-15  | 16-25 | 26-30 | 31 | Form |
|--------|------|--------|-------|-------|----|------|
| Major  | RT   | //Mode | FRA   | XO    | Rc |X-Form|

**Instructions**

* `fcvttgw RT, FRA, Mode`
    Convert from 64-bit float to 32-bit signed integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttguw RT, FRA, Mode`
    Convert from 64-bit float to 32-bit unsigned integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttgd RT, FRA, Mode`
    Convert from 64-bit float to 64-bit signed integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvttgud RT, FRA, Mode`
    Convert from 64-bit float to 64-bit unsigned integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgw RT, FRA, Mode`
    Convert from 32-bit float to 32-bit signed integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstguw RT, FRA, Mode`
    Convert from 32-bit float to 32-bit unsigned integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgd RT, FRA, Mode`
    Convert from 32-bit float to 64-bit signed integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].
* `fcvtstgud RT, FRA, Mode`
    Convert from 32-bit float to 64-bit unsigned integer, writing the result
    to the GPR `RT`. Converts using [mode `Mode`].

[mode `Mode`]: #fpr-to-gpr-conversion-mode

## FP to Integer Conversion Pseudo-code

Key for pseudo-code:

| term                      | result type | definition                                                                                         |
|---------------------------|-------------|----------------------------------------------------------------------------------------------------|
| `fp`                      | --          | `f32` or `f64` (or other types from SimpleV)                                                       |
| `int`                     | --          | `u32`/`u64`/`i32`/`i64` (or other types from SimpleV)                                              |
| `uint`                    | --          | the unsigned integer of the same bit-width as `int`                                                |
| `int::BITS`               | `int`       | the bit-width of `int`                                                                             |
| `int::MIN_VALUE`          | `int`       | the minimum value `int` can store (`0` if unsigned, `-2^(int::BITS-1)` if signed)                  |
| `int::MAX_VALUE`          | `int`       | the maximum value `int` can store (`2^int::BITS - 1` if unsigned, `2^(int::BITS-1) - 1` if signed) |
| `int::VALUE_COUNT`        | Integer     | the number of different values `int` can store (`2^int::BITS`). too big to fit in `int`.           |
| `rint(fp, rounding_mode)` | `fp`        | rounds the floating-point value `fp` to an integer according to rounding mode `rounding_mode`      |

<div id="fp-to-int-openpower-conversion-semantics"></div>
OpenPower conversion semantics (section A.2 page 999 (page 1023) of OpenPower ISA v3.1):

```
def fp_to_int_open_power<fp, int>(v: fp) -> int:
    if v is NaN:
        return int::MIN_VALUE
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```

<div id="fp-to-int-java-conversion-semantics"></div>
[Java conversion semantics](https://docs.oracle.com/javase/specs/jls/se16/html/jls-5.html#jls-5.1.3)
/
[Rust semantics](https://doc.rust-lang.org/reference/expressions/operator-expr.html#semantics)
(with adjustment to add non-truncate rounding modes):

```
def fp_to_int_java<fp, int>(v: fp) -> int:
    if v is NaN:
        return 0
    if v >= int::MAX_VALUE:
        return int::MAX_VALUE
    if v <= int::MIN_VALUE:
        return int::MIN_VALUE
    return (int)rint(v, rounding_mode)
```
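
The OpenPower and Java pseudo-code differ only in the value returned for NaN, which a runnable Python sketch makes easy to see. The parameters `bits`, `signed` and `truncate` are illustrative stand-ins for the pseudo-code's `int` type and `rounding_mode`; only truncation and round-to-nearest-even are modelled, and no `FPSCR` flags or exceptions are produced.

```python
import math

def _limits(bits: int, signed: bool):
    """Return (int::MIN_VALUE, int::MAX_VALUE) for the chosen integer type."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1

def _saturating(v, bits, signed, nan_result, truncate):
    min_value, max_value = _limits(bits, signed)
    if math.isnan(v):
        return nan_result
    # Python compares float against int exactly, so infinities and
    # out-of-range values saturate here without intermediate rounding.
    if v >= max_value:
        return max_value
    if v <= min_value:
        return min_value
    return math.trunc(v) if truncate else round(v)

def fp_to_int_open_power(v, bits=32, signed=True, truncate=True):
    """NaN converts to the minimum valid integer."""
    return _saturating(v, bits, signed, _limits(bits, signed)[0], truncate)

def fp_to_int_java(v, bits=32, signed=True, truncate=True):
    """NaN converts to 0."""
    return _saturating(v, bits, signed, 0, truncate)

nan = float("nan")
print(fp_to_int_open_power(nan), fp_to_int_java(nan))
print(fp_to_int_java(1e10), fp_to_int_java(-1.9))
```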

<div id="fp-to-int-javascript-conversion-semantics"></div>
Section 7.1 of the ECMAScript / JavaScript
[conversion semantics](https://262.ecma-international.org/11.0/#sec-toint32) (with adjustment to add non-truncate rounding modes):

```
def fp_to_int_java_script<fp, int>(v: fp) -> int:
    if v is NaN or infinite:
        return 0
    v = rint(v, rounding_mode)
    v = v mod int::VALUE_COUNT # 2^32 for i32, 2^64 for i64, result is non-negative
    bits = (uint)v
    return (int)bits
```
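
A runnable Python counterpart of the modulo-wrapping pseudo-code is sketched below. As before, `bits`, `signed` and `truncate` are illustrative parameters standing in for the `int` type and `rounding_mode`, and only truncation and round-to-nearest-even are modelled. Python integers are arbitrary precision, so the `mod` step is exact even for values far outside the target range.

```python
import math

def fp_to_int_java_script(v, bits=32, signed=True, truncate=True):
    """Modulo-wrapping conversion; NaN and infinities convert to 0."""
    if math.isnan(v) or math.isinf(v):
        return 0
    rounded = math.trunc(v) if truncate else round(v)
    # Python's % with a positive modulus always yields a non-negative result,
    # matching "v mod int::VALUE_COUNT" in the pseudo-code.
    wrapped = rounded % (1 << bits)
    # Reinterpret the unsigned bit pattern as signed, i.e. (int)bits.
    if signed and wrapped >= 1 << (bits - 1):
        wrapped -= 1 << bits
    return wrapped

print(fp_to_int_java_script(4294967301.0))  # 2^32 + 5 wraps around
print(fp_to_int_java_script(2147483648.0))  # 2^31 wraps to the most negative i32
```

Unlike the saturating variants, large magnitudes do not clamp: they wrap, so the result depends on the low-order bits of the rounded value.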