# RFC ls013 Min/Max GPR/FPR

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls013/>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1057>

**Severity**: Major

**Status**: New

**Date**: 14 Apr 2023

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**:

```
Book I Fixed-Point and Floating-Point Instructions
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of new GPR-based and FPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, FPR, min, max, fmin, fmax
```

**Motivation**

Minimum/Maximum are common operations that can take an astounding number of
instructions to implement in software. Additionally, Vector Reduce-Min/Max are
common vector operations, and SVP64 Parallel Reduction needs a single Scalar
instruction in order to implement Reduce-Min/Max effectively.

**Notes and Observations**:

1. SVP64 REMAP Parallel Reduction needs a single Scalar instruction to
   work with, for best effectiveness. With no SFFS minimum/maximum
   instructions, Simple-V min/max Parallel Reduction is severely compromised.
2. Once one FP min/max mode is implemented, the rest need little additional hardware.
3. Similar instructions exist in VSX (although not following IEEE 754-2019).
   This is frequently used to justify not adding them. However, SVP64/VSX may
   have a different meaning from SVP64/SFFS, so it is *really* crucial to have
   SFFS ops, even if "equivalent" to VSX, in order for SVP64 not to be
   compromised (non-orthogonal).
4. FP min/max are rather complex to implement in software: the most commonly
   used FP max function, `fmax` from glibc, compiles to an astounding
   32 instructions for SFFS.
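
To illustrate note 1, here is a minimal sketch of a generic pairwise (tree) reduction. It assumes nothing about the real SVP64 REMAP schedule, which it does not reproduce, and `tree_reduce` is a hypothetical helper. Each step is a batch of independent applications of one binary scalar operation, so that operation should be a single instruction. If the scalar min/max is instead a multi-instruction, branching sequence (as with the 32-instruction `fmax` in note 4), every node of the tree pays that cost.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")


def tree_reduce(values: Iterable[T], op: Callable[[T, T], T]) -> Tuple[T, int]:
    """Pairwise (tree) reduction of `values` using the binary scalar `op`.

    Returns (result, steps). All calls to `op` within one step are
    independent of each other, so n elements need ceil(log2(n)) steps
    instead of n - 1 dependent ones.
    """
    vals = list(values)
    if not vals:
        raise ValueError("cannot reduce an empty sequence")
    steps = 0
    while len(vals) > 1:
        # combine neighbouring pairs; these calls could all run in parallel
        nxt = [op(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])  # an odd element is carried to the next step
        vals = nxt
        steps += 1
    return vals[0], steps


print(tree_reduce([5.0, 3.0, 9.0, 1.0, 7.0], min))  # (1.0, 3)
```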

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I 3.3.9 Fixed-Point Arithmetic Instructions
* Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# Floating-Point Instructions

This group provides Floating-Point min/max. However, since IEEE 754 was revised
in 2019, there are now subtle differences between the available definitions.
These are selectable with a Mode Field, `FMM`.

## `FMM` -- Floating Min/Max Mode

<a id="fmm-floating-min-max-mode"></a>

<!-- hyphens in table determine width of columns for pandoc -- -->
| `FMM`| Extended Mnemonic | Origin | Semantics |
|------|-------------------------------|--------------------|--------------------------------------------|
| 0000 | fminnum08[s] FRT,FRA,FRB | IEEE 754-2008 | minNum(FRA,FRB) (1) |
| 0001 | fmin19[s] FRT,FRA,FRB | IEEE 754-2019 | minimum(FRA,FRB) |
| 0010 | fminnum19[s] FRT,FRA,FRB | IEEE 754-2019 | minimumNumber(FRA,FRB) |
| 0011 | fminc[s] FRT,FRA,FRB | x86 minss (4) | FRA\<FRB ? FRA:FRB |
| 0100 | fminmagnum08[s] FRT,FRA,FRB | IEEE 754-2008 (3) | mmmag(FRA,FRB,False,fminnum08) (2) |
| 0101 | fminmag19[s] FRT,FRA,FRB | IEEE 754-2019 | mmmag(FRA,FRB,False,fmin19) (2) |
| 0110 | fminmagnum19[s] FRT,FRA,FRB | IEEE 754-2019 | mmmag(FRA,FRB,False,fminnum19) (2) |
| 0111 | fminmagc[s] FRT,FRA,FRB | - | mmmag(FRA,FRB,False,fminc) (2) |
| 1000 | fmaxnum08[s] FRT,FRA,FRB | IEEE 754-2008 | maxNum(FRA,FRB) (1) |
| 1001 | fmax19[s] FRT,FRA,FRB | IEEE 754-2019 | maximum(FRA,FRB) |
| 1010 | fmaxnum19[s] FRT,FRA,FRB | IEEE 754-2019 | maximumNumber(FRA,FRB) |
| 1011 | fmaxc[s] FRT,FRA,FRB | x86 maxss (4) | FRA\>FRB ? FRA:FRB |
| 1100 | fmaxmagnum08[s] FRT,FRA,FRB | IEEE 754-2008 (3) | mmmag(FRA,FRB,True,fmaxnum08) (2) |
| 1101 | fmaxmag19[s] FRT,FRA,FRB | IEEE 754-2019 | mmmag(FRA,FRB,True,fmax19) (2) |
| 1110 | fmaxmagnum19[s] FRT,FRA,FRB | IEEE 754-2019 | mmmag(FRA,FRB,True,fmaxnum19) (2) |
| 1111 | fmaxmagc[s] FRT,FRA,FRB | - | mmmag(FRA,FRB,True,fmaxc) (2) |

Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
+0.0. This is left unspecified in IEEE 754-2008.

Note (2): mmmag(x, y, is_max, fallback) is defined as:

```python
def mmmag(x, y, is_max, fallback):
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)
```

Note (3): TODO: confirm whether IEEE 754-2008 has min/maxMagNum operations
corresponding to IEEE 754-2019's minimum/maximumMagnitudeNumber.

Note (4): or, equivalently, Win32's `min`/`max` macros.
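
Reading the table column-wise, `FMM[0]` selects max over min, `FMM[1]` selects the magnitude-based variant, and `FMM[2:3]` selects the base semantics. The Python sketch below encodes that decomposition at the level of ordinary floats. `fmm_mnemonic`, `fminmax_float` and the underscore-prefixed helpers are hypothetical names, and `mmmag` is copied from Note (2). It is only an approximation: Python floats cannot portably carry signalling NaNs, so sNaN quieting, the VXSNAN flag and NaN payload rules are not modelled. As a result the `num08` and `num19` variants coincide here.

```python
import math


def mmmag(x, y, is_max, fallback):
    # copied from Note (2)
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)


def _lt_zero_ordered(x, y):
    """x < y, except that -0.0 is treated as less than +0.0 (Note (1))."""
    if x == 0.0 and y == 0.0:
        return math.copysign(1.0, x) < math.copysign(1.0, y)
    return x < y


def _pick(x, y, is_max, lt):
    # min keeps x if x < y; max keeps x if y < x; ties select y
    return x if (lt(y, x) if is_max else lt(x, y)) else y


def _num(x, y, is_max):
    # minNum/maxNum and minimumNumber/maximumNumber: a quiet NaN loses
    if math.isnan(x) and math.isnan(y):
        return x
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return _pick(x, y, is_max, _lt_zero_ordered)


def _ieee19(x, y, is_max):
    # minimum/maximum: any NaN input propagates
    if math.isnan(x):
        return x
    if math.isnan(y):
        return y
    return _pick(x, y, is_max, _lt_zero_ordered)


def _c(x, y, is_max):
    # FRA<FRB ? FRA:FRB (or >): NaNs and equal zeros select FRB
    return _pick(x, y, is_max, lambda p, q: p < q)


_BASE = {0b00: ("num08", _num), 0b01: ("19", _ieee19),
         0b10: ("num19", _num), 0b11: ("c", _c)}


def fmm_mnemonic(fmm):
    """Build the extended mnemonic (without [s]) from the FMM sub-fields."""
    op = "fmax" if fmm & 0b1000 else "fmin"   # FMM[0]
    mag = "mag" if fmm & 0b0100 else ""       # FMM[1]
    return op + mag + _BASE[fmm & 0b11][0]    # FMM[2:3]


def fminmax_float(x, y, fmm):
    is_max = bool(fmm & 0b1000)
    base = _BASE[fmm & 0b11][1]

    def fallback(p, q):
        return base(p, q, is_max)

    if fmm & 0b0100:
        return mmmag(x, y, is_max, fallback)
    return fallback(x, y)
```

For example, `fmm_mnemonic(0b1110)` gives `"fmaxmagnum19"`, and `fminmax_float(-3.0, 2.0, 0b0100)` selects `2.0`, the operand of smaller magnitude.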

----------------

\newpage{}

## Floating Minimum/Maximum MM-form

* fminmax FRT, FRA, FRB, FMM
* fminmax. FRT, FRA, FRB, FMM

```
|0     |6     |11    |16    |21    |25    |31  |
| PO   | FRT  | FRA  | FRB  | FMM  | XO   | Rc |
```

```
result <- [0] * 64
a <- (FRA)
b <- (FRB)
abs_a <- 0b0 || a[1:63]
abs_b <- 0b0 || b[1:63]
a_is_nan <- abs_a >u 0x7FF0_0000_0000_0000
a_is_snan <- a_is_nan and a[12] = 0
b_is_nan <- abs_b >u 0x7FF0_0000_0000_0000
b_is_snan <- b_is_nan and b[12] = 0
any_snan <- a_is_snan or b_is_snan
a_quieted <- a
a_quieted[12] <- 1
b_quieted <- b
b_quieted[12] <- 1
if a_is_nan or b_is_nan then
    if FMM[2:3] = 0b00 then # min/maxnum08
        if a_is_snan then result <- a_quieted
        else if b_is_snan then result <- b_quieted
        else if a_is_nan and b_is_nan then result <- a_quieted
        else if a_is_nan then result <- b
        else result <- a
    if FMM[2:3] = 0b01 then # min/max19
        if a_is_nan then result <- a_quieted
        else result <- b_quieted
    if FMM[2:3] = 0b10 then # min/maxnum19
        if a_is_nan and b_is_nan then result <- a_quieted
        else if a_is_nan then result <- b
        else result <- a
    if FMM[2:3] = 0b11 then # min/maxc
        result <- b
else
    cmp_l <- a
    cmp_r <- b
    if FMM[1] then # min/maxmag
        if abs_a != abs_b then
            cmp_l <- abs_a
            cmp_r <- abs_b
    if FMM[2:3] = 0b11 then # min/maxc
        if abs_a = 0 then cmp_l <- 0
        if abs_b = 0 then cmp_r <- 0
    if FMM[0] then # max
        # swap cmp_* so comparison goes the other way
        cmp_l, cmp_r <- cmp_r, cmp_l
    if cmp_l[0] = 1 then
        if cmp_r[0] = 0 then result <- a
        else if cmp_l >u cmp_r then
            # IEEE 754 is sign-magnitude,
            # so bigger magnitude negative is smaller
            result <- a
        else result <- b
    else if cmp_r[0] = 1 then result <- b
    else if cmp_l <u cmp_r then result <- a
    else result <- b
if any_snan then SetFX(FPSCR.VXSNAN)
# FRT is left unaltered only for an enabled Invalid Operation Exception
if FPSCR.VE = 0 or ¬any_snan then (FRT) <- result
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.
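
As a cross-check of the pseudocode above, the following is a minimal Python transliteration that operates on raw 64-bit patterns. `fminmax_bits` and `bits` are hypothetical names. The `ve` parameter stands in for `FPSCR.VE`, and the returned flag stands in for `FPSCR.VXSNAN`. The CR1 update for `Rc=1` and all other FPSCR effects are not modelled. The sketch writes FRT unless `ve` is set and an sNaN input is present, in which case it returns `None`. This is the usual Power ISA treatment of an enabled invalid-operation exception.

```python
import struct

SIGN = 1 << 63                        # MSB0 bit 0
QUIET = 1 << 51                       # MSB0 bit 12: the quiet bit of a NaN
EXP_ALL_ONES = 0x7FF0_0000_0000_0000  # +infinity


def bits(x):
    """Raw IEEE 754 binary64 encoding of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def fminmax_bits(a, b, fmm, ve=False):
    """Return (frt, vxsnan); frt is None when FRT is left unaltered."""
    abs_a, abs_b = a & ~SIGN, b & ~SIGN
    a_is_nan = abs_a > EXP_ALL_ONES
    b_is_nan = abs_b > EXP_ALL_ONES
    a_is_snan = a_is_nan and not a & QUIET
    b_is_snan = b_is_nan and not b & QUIET
    any_snan = a_is_snan or b_is_snan
    a_quieted, b_quieted = a | QUIET, b | QUIET
    variant = fmm & 0b11                     # FMM[2:3]
    if a_is_nan or b_is_nan:
        if variant == 0b00:                  # min/maxnum08
            if a_is_snan:
                result = a_quieted
            elif b_is_snan:
                result = b_quieted
            elif a_is_nan and b_is_nan:
                result = a_quieted
            else:
                result = b if a_is_nan else a
        elif variant == 0b01:                # min/max19
            result = a_quieted if a_is_nan else b_quieted
        elif variant == 0b10:                # min/maxnum19
            if a_is_nan and b_is_nan:
                result = a_quieted
            else:
                result = b if a_is_nan else a
        else:                                # min/maxc
            result = b
    else:
        cmp_l, cmp_r = a, b
        if fmm & 0b0100 and abs_a != abs_b:  # FMM[1]: min/maxmag
            cmp_l, cmp_r = abs_a, abs_b
        if variant == 0b11:                  # min/maxc: -0.0 equals +0.0
            if abs_a == 0:
                cmp_l = 0
            if abs_b == 0:
                cmp_r = 0
        if fmm & 0b1000:                     # FMM[0]: max
            cmp_l, cmp_r = cmp_r, cmp_l
        if cmp_l & SIGN:
            # sign-magnitude: the larger-magnitude negative is smaller
            result = a if (not cmp_r & SIGN or cmp_l > cmp_r) else b
        elif cmp_r & SIGN:
            result = b
        else:
            result = a if cmp_l < cmp_r else b
    frt = result if (not ve or not any_snan) else None
    return frt, any_snan
```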

Special Registers altered:

```
FX VXSNAN
CR1 (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

## Floating Minimum/Maximum Single MM-form

* fminmaxs FRT, FRA, FRB, FMM
* fminmaxs. FRT, FRA, FRB, FMM

```
|0     |6     |11    |16    |21    |25    |31  |
| PO   | FRT  | FRA  | FRB  | FMM  | XO   | Rc |
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Special Registers altered:

```
FX VXSNAN
CR1 (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

\newpage{}

# Fixed-Point Instructions

These cover signed and unsigned, word and doubleword, minimum and maximum.
SVP64 Prefixing defines Saturation semantics; therefore Saturated variants of
these instructions need not be proposed.

## `MMM` -- Integer Min/Max Mode

<a id="mmm-integer-min-max-mode"></a>

* bit 0: set if word variant, else dword
* bit 1: set if signed, else unsigned
* bit 2: set if max, else min

| `MMM` | Extended Mnemonic | Semantics |
|-------|-------------------|----------------------------------------------|
| 000 | `minu RT,RA,RB` | `(uint64_t)RA < (uint64_t)RB ? RA : RB` |
| 001 | `maxu RT,RA,RB` | `(uint64_t)RA > (uint64_t)RB ? RA : RB` |
| 010 | `mins RT,RA,RB` | ` (int64_t)RA < (int64_t)RB ? RA : RB` |
| 011 | `maxs RT,RA,RB` | ` (int64_t)RA > (int64_t)RB ? RA : RB` |
| 100 | `minuw RT,RA,RB` | `(uint32_t)RA < (uint32_t)RB ? RA : RB` |
| 101 | `maxuw RT,RA,RB` | `(uint32_t)RA > (uint32_t)RB ? RA : RB` |
| 110 | `minsw RT,RA,RB` | ` (int32_t)RA < (int32_t)RB ? RA : RB` |
| 111 | `maxsw RT,RA,RB` | ` (int32_t)RA > (int32_t)RB ? RA : RB` |
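
When testing an implementation, it may help to have a minimal Python reference model that reads the `MMM` table literally, C-style casts and all. `minmax_ref` and `MNEMONICS` are hypothetical names. MSB0 bit 0 is taken as the most significant of the three `MMM` bits. `RA` is treated as an ordinary register value, so the `RA=0` case of `(RA|0)` is not modelled.

```python
MASK64 = (1 << 64) - 1
# index by MMM value, following the table
MNEMONICS = ["minu", "maxu", "mins", "maxs", "minuw", "maxuw", "minsw", "maxsw"]


def _signed(v, width):
    """Interpret the low `width` bits of v as a two's-complement integer."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def minmax_ref(ra, rb, mmm):
    """Literal reading of the MMM table; ra and rb are 64-bit register values."""
    word = bool(mmm & 0b100)    # MMM bit 0: word variant
    signed = bool(mmm & 0b010)  # MMM bit 1: signed
    is_max = bool(mmm & 0b001)  # MMM bit 2: max
    width = 32 if word else 64
    if signed:
        x, y = _signed(ra, width), _signed(rb, width)
    else:
        x, y = ra & ((1 << width) - 1), rb & ((1 << width) - 1)
    take_ra = x > y if is_max else x < y
    # the selected register is returned whole, even in the word variants
    return (ra if take_ra else rb) & MASK64
```

For example, with `RA = -1` (all ones) and `RB = 1`, `minu` selects `RB` while `mins` selects `RA`.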

## Minimum/Maximum MM-Form

* minmax RT, RA, RB, MMM
* minmax. RT, RA, RB, MMM

```
|0     |6     |11    |16    |21    |24 |25    |31  |
| PO   | RT   | RA   | RB   | MMM  | / | XO   | Rc |
```

```
a <- (RA|0)
b <- (RB)
if MMM[0] then # word mode
    # shift left by XLEN/2 to make the dword comparison
    # do word comparison of the original inputs
    a <- a[XLEN/2:XLEN-1] || [0] * XLEN/2
    b <- b[XLEN/2:XLEN-1] || [0] * XLEN/2
if MMM[1] then # signed mode
    # invert sign bits to make the unsigned comparison
    # do signed comparison of the original inputs
    a[0] <- ¬a[0]
    b[0] <- ¬b[0]
# if Rc = 1 then store the result of comparing a and b to CR0
if Rc = 1 then
    if a <u b then
        CR0 <- 0b100 || XER.SO
    if a = b then
        CR0 <- 0b001 || XER.SO
    if a >u b then
        CR0 <- 0b010 || XER.SO
if MMM[2] then # max mode
    # swap a and b to make the less than comparison do
    # greater than comparison of the original inputs
    t <- a
    a <- b
    b <- t
# store the entire selected source (even in word mode)
if a <u b then RT <- (RA|0)
else RT <- (RB)
```

Compute the integer minimum/maximum according to `MMM` of `(RA|0)` and `(RB)`
and store the result in `RT`.
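
The pseudocode reduces all eight modes to one unsigned `XLEN`-bit comparison. Word mode shifts the low word into the upper half. Signed mode inverts the sign bit, which maps two's-complement order onto unsigned order. The minimal Python transliteration below can be used to check that reasoning. `minmax_mm` is a hypothetical name, and `xer_so` stands in for `XER.SO`. Register values are passed in directly, so the `RA=0` case of `(RA|0)` is left to the caller.

```python
def minmax_mm(ra, rb, mmm, rc=False, xer_so=0, xlen=64):
    """Transliteration of the minmax pseudocode.

    Returns (rt, cr0); cr0 is a 4-bit LT/GT/EQ/SO value, or None if rc is False.
    """
    mask = (1 << xlen) - 1
    half = xlen // 2
    sign = 1 << (xlen - 1)
    a, b = ra & mask, rb & mask
    if mmm & 0b100:  # MMM[0], word mode: move the low half up, zero-fill
        a = (a << half) & mask
        b = (b << half) & mask
    if mmm & 0b010:  # MMM[1], signed mode: flipping the sign bit maps
        a ^= sign    # two's-complement order onto unsigned order
        b ^= sign
    cr0 = None
    if rc:           # CR0 compares the original operand order, even for max
        lt_gt_eq = 0b100 if a < b else (0b010 if a > b else 0b001)
        cr0 = (lt_gt_eq << 1) | (xer_so & 1)
    if mmm & 0b001:  # MMM[2], max mode: swap so "less than" acts as "greater than"
        a, b = b, a
    rt = (ra if a < b else rb) & mask  # whole source register, even in word mode
    return rt, cr0
```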

Special Registers altered:

```
CR0 (if Rc=1)
```

Extended Mnemonics:

see [`MMM` -- Integer Min/Max Mode](#mmm-integer-min-max-mode)

----------

\newpage{}

# Instruction Formats

Add the following entries to Book I 1.6.1 Word Instruction Formats:

## MM-FORM

```
|0     |6     |11    |16    |21    |24 |25    |31  |
| PO   | FRT  | FRA  | FRB  | FMM      | XO   | Rc |
| PO   | RT   | RA   | RB   | MMM  | / | XO   | Rc |
```

Add the following new fields to Book I 1.6.2 Word Instruction Fields:

```
FMM (21:24)
    Field used to specify minimum/maximum mode for fminmax[s].

    Formats: MM

MMM (21:23)
    Field used to specify minimum/maximum mode for integer minmax.

    Formats: MM
```

Add `MM` to the `Formats:` list for all of `FRT`, `FRA`, `FRB`, `XO (25:30)`,
`Rc`, `RT`, `RA` and `RB`.
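
To make the field boundaries concrete, here is a minimal Python sketch that packs the two MM-form layouts into a 32-bit instruction word, using MSB0 bit numbering as in the diagrams. This RFC does not assign primary or extended opcode values, so `po` and `xo` are caller-supplied placeholders. `encode_mm_fp`, `encode_mm_int` and `_field` are hypothetical names.

```python
def _field(value, start, end):
    """Place value in MSB0 bits start..end (inclusive) of a 32-bit word."""
    width = end - start + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return value << (31 - end)


def encode_mm_fp(po, frt, fra, frb, fmm, xo, rc):
    """fminmax[s][.] layout: PO | FRT | FRA | FRB | FMM(21:24) | XO | Rc."""
    return (_field(po, 0, 5) | _field(frt, 6, 10) | _field(fra, 11, 15)
            | _field(frb, 16, 20) | _field(fmm, 21, 24)
            | _field(xo, 25, 30) | _field(rc, 31, 31))


def encode_mm_int(po, rt, ra, rb, mmm, xo, rc):
    """minmax[.] layout: PO | RT | RA | RB | MMM(21:23) | / | XO | Rc."""
    # bit 24 is reserved ("/") and is left as zero
    return (_field(po, 0, 5) | _field(rt, 6, 10) | _field(ra, 11, 15)
            | _field(rb, 16, 20) | _field(mmm, 21, 23)
            | _field(xo, 25, 30) | _field(rc, 31, 31))
```

Because `FMM` ends at bit 24 and `MMM` at bit 23, the same mode value lands one bit lower in the floating-point form. For example, an all-ones `FMM` contributes `0x780`, while an all-ones `MMM` contributes `0x700`.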

----------

\newpage{}

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | Mnemonic | Description |
|------|------|------|---------|----------|-------------|
| MM | I | # | 3.2B | fminmax | Floating Minimum/Maximum |
| MM | I | # | 3.2B | fminmaxs | Floating Minimum/Maximum Single |
| MM | I | # | 3.2B | minmax | Minimum/Maximum |

## fmax instruction count

32 instructions are required in SFFS to emulate `fmax`.

```
#include <stdint.h>
#include <string.h>

inline uint64_t asuint64(double f) {
    union {
        double f;
        uint64_t i;
    } u = {f};
    return u.i;
}

inline int issignaling(double v) {
    // copied from glibc:
    // https://github.com/bminor/glibc/blob/e2756903/sysdeps/ieee754/dbl-64/math_config.h#L101
    uint64_t ix = asuint64(v);
    return 2 * (ix ^ 0x0008000000000000) > 2 * 0x7ff8000000000000ULL;
}

double fmax(double x, double y) {
    // copied from glibc:
    // https://github.com/bminor/glibc/blob/e2756903/math/s_fmax_template.c
    if(__builtin_isgreaterequal(x, y))
        return x;
    else if(__builtin_isless(x, y))
        return y;
    else if(issignaling(x) || issignaling(y))
        return x + y;
    else
        return __builtin_isnan(y) ? x : y;
}
```

Assembly listing:

```
fmax(double, double):
        fcmpu 0,1,2
        fmr 0,1
        cror 30,1,2
        beq 7,.L12
        blt 0,.L13
        stfd 1,-16(1)
        lis 9,0x8
        li 8,-1
        sldi 9,9,32
        rldicr 8,8,0,11
        ori 2,2,0
        ld 10,-16(1)
        xor 10,10,9
        sldi 10,10,1
        cmpld 0,10,8
        bgt 0,.L5
        stfd 2,-16(1)
        ori 2,2,0
        ld 10,-16(1)
        xor 9,10,9
        sldi 9,9,1
        cmpld 0,9,8
        ble 0,.L6
.L5:
        fadd 1,0,2
        blr
.L13:
        fmr 1,2
        blr
.L6:
        fcmpu 0,2,2
        fmr 1,2
        bnulr 0
.L12:
        fmr 1,0
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
```

[[!tag opf_rfc]]
