# RFC ls013 Min/Max GPR/FPR

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls013/>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1057>

**Severity**: Major

**Status**: New

**Date**: 14 Apr 2023

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**:

```
Book I Fixed-Point and Floating-Point Instructions
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of new GPR-based and FPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, FPR, min, max, fmin, fmax
```

**Motivation**

TODO

**Notes and Observations**:

1. SVP64 REMAP Parallel Reduction needs a single Scalar instruction to
   work with, for best effectiveness. With no SFFS minimum/maximum
   instructions, Simple-V min/max Parallel Reduction is severely compromised.
2. Once one FP min/max mode is implemented, the rest require little
   additional hardware.
3. Similar instructions exist in VSX (although not the IEEE 754-2019 ones).
   This is frequently used to justify not adding them. However, SVP64/VSX
   may have a different meaning from SVP64/SFFS, so it is *really* crucial
   to have SFFS ops, even if "equivalent" to VSX, in order for SVP64 not to
   be compromised (non-orthogonal).
4. FP min/max are rather complex to implement in software: the most commonly
   used FP max function, `fmax` from glibc, compiles for SFFS to an
   astounding 32 instructions.
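
To illustrate why observation 1 asks for a single two-operand Scalar
instruction, here is a minimal sketch of a generic pairwise (tree) reduction.
It is not the actual SVP64 REMAP schedule; `tree_reduce` and its argument
names are hypothetical and chosen only for this example. The point is that
every step is one application of the same binary operation.

```python
from typing import Callable, List


def tree_reduce(values: List[float],
                op: Callable[[float, float], float]) -> float:
    """Pairwise (tree) reduction built from one two-operand operation.

    Illustrative only: this is a generic schedule, not SVP64 REMAP.
    """
    if not values:
        raise ValueError("cannot reduce an empty vector")
    work = list(values)
    step = 1
    while step < len(work):
        # combine elements that are `step` apart, storing into the lower one
        for i in range(0, len(work) - step, 2 * step):
            work[i] = op(work[i], work[i + step])
        step *= 2
    return work[0]


print(tree_reduce([3.0, 7.0, 1.0, 5.0, 9.0], max))
print(tree_reduce([3.0, 7.0, 1.0, 5.0, 9.0], min))
```

If `op` needs a multi-instruction sequence instead of a single instruction,
every node of the tree pays that cost.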

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I 3.3.9 Fixed-Point Arithmetic Instructions
* Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# Floating-Point Instructions

This group provides Floating-Point min/max. However, with IEEE 754 having
advanced to the 2019 revision, there are now subtle differences between the
definitions. These are selectable with a Mode Field, `FMM`.

## `FMM` -- Floating Min/Max Mode

<a id="fmm-floating-min-max-mode"></a>

| `FMM` | Assembly Alias                | Origin                         | Semantics                                       |
|-------|-------------------------------|--------------------------------|-------------------------------------------------|
| 0000  | fminnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = minNum(FRA, FRB) (1)                      |
| 0001  | fmin19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = minimum(FRA, FRB)                         |
| 0010  | fminnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minimumNumber(FRA, FRB)                   |
| 0011  | fminc[s] FRT, FRA, FRB        | x86 minss or Win32's min macro | FRT = FRA \< FRB ? FRA : FRB                    |
| 0100  | fminmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2) |
| 0101  | fminmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fmin19) (2)    |
| 0110  | fminmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2) |
| 0111  | fminmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, False, fminc) (2)     |
| 1000  | fmaxnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = maxNum(FRA, FRB) (1)                      |
| 1001  | fmax19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = maximum(FRA, FRB)                         |
| 1010  | fmaxnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = maximumNumber(FRA, FRB)                   |
| 1011  | fmaxc[s] FRT, FRA, FRB        | x86 maxss or Win32's max macro | FRT = FRA > FRB ? FRA : FRB                     |
| 1100  | fmaxmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2)  |
| 1101  | fmaxmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmax19) (2)     |
| 1110  | fmaxmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2)  |
| 1111  | fmaxmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2)      |

Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
+0.0. This is left unspecified in IEEE 754-2008.

Note (2): minmaxmag(x, y, is_max, fallback) is defined as:

```python
def minmaxmag(x, y, is_max, fallback):
    # compare magnitudes; both tests are False if either input is NaN
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)
```
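
As a worked example of Note (2), the sketch below repeats `minmaxmag` so that
it runs on its own, and pairs it with `fminc` and `fmaxc` written directly
from the `FRA < FRB ? FRA : FRB` and `FRA > FRB ? FRA : FRB` rows of the
table. Python floats stand in for FPR contents, which is an assumption of the
example: exception flags and signalling NaNs are not modelled.

```python
import math


def minmaxmag(x, y, is_max, fallback):
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)


def fminc(x, y):
    # table row 0011: FRA < FRB ? FRA : FRB
    return x if x < y else y


def fmaxc(x, y):
    # table row 1011: FRA > FRB ? FRA : FRB
    return x if x > y else y


# different magnitudes: the fallback is never consulted
print(minmaxmag(-3.0, 2.0, False, fminc))   # smaller magnitude is 2.0
print(minmaxmag(-3.0, 2.0, True, fmaxc))    # larger magnitude is -3.0

# equal magnitudes: the fallback decides on the signed values
print(minmaxmag(-2.0, 2.0, False, fminc))
print(minmaxmag(-2.0, 2.0, True, fmaxc))

# NaN input: both magnitude tests are False, so the fallback decides
print(minmaxmag(math.nan, 1.0, False, fminc))
```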

Note (3): TODO: check whether IEEE 754-2008 has min/maxMagNum operations like
IEEE 754-2019's minimum/maximumMagnitudeNumber.
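
The differences between the non-magnitude minimum modes can be sketched as
follows. This is a simplified model, not a reference implementation: Python
floats stand in for FPR values, exception flags such as VXSNAN are ignored,
and signalling NaNs are not modelled. Under those assumptions `fminnum08`
would behave like `fminnum19` here, since the two differ only in how a
signalling NaN is handled, so it is not repeated. The helper `_lt` is
hypothetical and applies the -0.0 < +0.0 ordering of Note (1).

```python
import math


def _lt(x, y):
    """x < y, with -0.0 ordered below +0.0 as in Note (1)."""
    if x == 0.0 and y == 0.0:
        return math.copysign(1.0, x) < math.copysign(1.0, y)
    return x < y


def fmin19(x, y):
    """IEEE 754-2019 minimum: any NaN input gives NaN."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if _lt(x, y) else y


def fminnum19(x, y):
    """IEEE 754-2019 minimumNumber: a single NaN input is ignored."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if _lt(x, y) else y


def fminc(x, y):
    """FRA < FRB ? FRA : FRB: NaN or equal inputs select the second operand."""
    return x if x < y else y


for name, fn in (("fmin19", fmin19), ("fminnum19", fminnum19),
                 ("fminc", fminc)):
    print(name, fn(1.0, math.nan), fn(math.nan, 1.0), fn(0.0, -0.0),
          fn(-0.0, 0.0))
```

Note that `fminc` is not commutative: with a NaN or with zeros of opposite
sign, the result depends on operand order.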

----------------

\newpage{}

## Floating Minimum/Maximum

A-Form

* fminmax FRT, FRA, FRB, FMM
* fminmax. FRT, FRA, FRB, FMM

```
|0     |6     |11    |16    |21          |26    |31  |
| PO   | FRT  | FRA  | FRB  | FMM[0:3] / | XO   | Rc |
```

Special Registers altered:

```
FX VXSNAN
CR1 (if Rc=1)
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Assembly Aliases: see
[`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)
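
The alias names in the `FMM` table follow a regular pattern, which the sketch
below reconstructs. The bit assignment used here (first bit selects max,
second bit selects the magnitude variant, last two bits select the base
definition) is inferred from the table rather than stated by the RFC, and
`fmm_alias` is a hypothetical helper name.

```python
# last two FMM bits -> suffix of the base definition, as read off the table
BASE_SUFFIX = {0b00: "num08", 0b01: "19", 0b10: "num19", 0b11: "c"}


def fmm_alias(fmm: int, single: bool = False) -> str:
    """Return the assembly alias for a 4-bit FMM value (inferred pattern)."""
    if not 0 <= fmm <= 0b1111:
        raise ValueError("FMM is a 4-bit field")
    is_max = bool(fmm & 0b1000)   # FMM bit 0 (MSB0 numbering)
    is_mag = bool(fmm & 0b0100)   # FMM bit 1
    name = "f" + ("max" if is_max else "min")
    if is_mag:
        name += "mag"
    name += BASE_SUFFIX[fmm & 0b0011]   # FMM bits 2:3
    return name + ("s" if single else "")


for fmm in range(16):
    print(f"{fmm:04b}", fmm_alias(fmm))
```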

----------

## Floating Minimum/Maximum Single

A-Form

* fminmaxs FRT, FRA, FRB, FMM
* fminmaxs. FRT, FRA, FRB, FMM

```
|0     |6     |11    |16    |21          |26    |31  |
| PO   | FRT  | FRA  | FRB  | FMM[0:3] / | XO   | Rc |
```

Special Registers altered:

```
FX VXSNAN
CR1 (if Rc=1)
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Assembly Aliases: see
[`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

\newpage{}

# Fixed-Point Instructions

These are signed and unsigned, min or max. SVP64 Prefixing defines Saturation
semantics, therefore Saturated variants of these instructions need not be
proposed.

## Integer Min/Max Mode

Bits are numbered MSB0, so bit 0 is the leftmost digit of `IMM` in the table:

* bit 0: set if word variant else dword
* bit 1: set if signed else unsigned
* bit 2: set if max else min

| `IMM` | Assembly Alias   | Semantics                                    |
|-------|------------------|----------------------------------------------|
| 000   | `minu RT,RA,RB`  | `RT = (uint64_t)RA < (uint64_t)RB ? RA : RB` |
| 001   | `maxu RT,RA,RB`  | `RT = (uint64_t)RA > (uint64_t)RB ? RA : RB` |
| 010   | `mins RT,RA,RB`  | `RT = (int64_t)RA < (int64_t)RB ? RA : RB`   |
| 011   | `maxs RT,RA,RB`  | `RT = (int64_t)RA > (int64_t)RB ? RA : RB`   |
| 100   | `minuw RT,RA,RB` | `RT = (uint32_t)RA < (uint32_t)RB ? RA : RB` |
| 101   | `maxuw RT,RA,RB` | `RT = (uint32_t)RA > (uint32_t)RB ? RA : RB` |
| 110   | `minsw RT,RA,RB` | `RT = (int32_t)RA < (int32_t)RB ? RA : RB`   |
| 111   | `maxsw RT,RA,RB` | `RT = (int32_t)RA > (int32_t)RB ? RA : RB`   |
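
The table can be read as one parameterised operation. The sketch below is a
literal transcription of the C-like semantics column into Python, assuming
that register values are held as unsigned 64-bit Python integers; the name
`int_minmax` is hypothetical. As in the table, the word variants compare only
the low 32 bits but copy the whole selected register to RT.

```python
def int_minmax(ra: int, rb: int, imm: int) -> int:
    """Integer min/max selected by the 3-bit IMM mode (illustrative model)."""
    if not 0 <= imm <= 0b111:
        raise ValueError("IMM is a 3-bit field")
    word = bool(imm & 0b100)     # IMM bit 0 (MSB0): word else dword
    signed = bool(imm & 0b010)   # IMM bit 1: signed else unsigned
    is_max = bool(imm & 0b001)   # IMM bit 2: max else min
    width = 32 if word else 64

    def view(value: int) -> int:
        # reinterpret the low `width` bits, like the casts in the table
        value &= (1 << width) - 1
        if signed and value >> (width - 1):
            value -= 1 << width
        return value

    a, b = view(ra), view(rb)
    take_ra = a > b if is_max else a < b
    return ra if take_ra else rb


ALL_ONES = 0xFFFF_FFFF_FFFF_FFFF
print(hex(int_minmax(ALL_ONES, 1, 0b000)))   # minu: all-ones is the largest
print(hex(int_minmax(ALL_ONES, 1, 0b010)))   # mins: all-ones is -1
```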

## Minimum Unsigned

X-Form

* minu RT, RA, RB
* minu. RT, RA, RB

```
|0     |6     |11    |16    |21    |31  |
| PO   | RT   | RA   | RB   | XO   | Rc |
```

```
if (RA) <u (RB) then
    RT <- (RA)
else
    RT <- (RB)
```

Special Registers altered:

```
CR0 (if Rc=1)
```

Compute the unsigned minimum of RA and RB and store the result in RT.

## Maximum Unsigned

X-Form

* maxu RT, RA, RB
* maxu. RT, RA, RB

```
|0     |6     |11    |16    |21    |31  |
| PO   | RT   | RA   | RB   | XO   | Rc |
```

```
if (RA) >u (RB) then
    RT <- (RA)
else
    RT <- (RB)
```

Special Registers altered:

```
CR0 (if Rc=1)
```

Compute the unsigned maximum of RA and RB and store the result in RT.

\newpage{}

## Minimum

X-Form

* min RT, RA, RB
* min. RT, RA, RB

```
|0     |6     |11    |16    |21    |31  |
| PO   | RT   | RA   | RB   | XO   | Rc |
```

```
if (RA) < (RB) then
    RT <- (RA)
else
    RT <- (RB)
```

Special Registers altered:

```
CR0 (if Rc=1)
```

Compute the signed minimum of RA and RB and store the result in RT.

## Maximum

X-Form

* max RT, RA, RB
* max. RT, RA, RB

```
|0     |6     |11    |16    |21    |31  |
| PO   | RT   | RA   | RB   | XO   | Rc |
```

```
if (RA) > (RB) then
    RT <- (RA)
else
    RT <- (RB)
```

Special Registers altered:

```
CR0 (if Rc=1)
```

Compute the signed maximum of RA and RB and store the result in RT.
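
For the record forms (`Rc=1`), this RFC only lists CR0 as altered. The sketch
below assumes the usual Power ISA fixed-point rule applies unchanged: in
64-bit mode the result is compared as a signed value against zero to set LT,
GT and EQ, and the fourth bit is a copy of XER.SO. That assumption, and the
helper name `cr0_for_result`, are not taken from this RFC.

```python
MASK64 = (1 << 64) - 1


def cr0_for_result(result: int, so: int = 0) -> int:
    """Return CR0 as a 4-bit value (LT, GT, EQ, SO), assuming 64-bit mode."""
    result &= MASK64
    if result >> 63:          # sign bit set: negative as a signed value
        lt, gt, eq = 1, 0, 0
    elif result != 0:
        lt, gt, eq = 0, 1, 0
    else:
        lt, gt, eq = 0, 0, 1
    return (lt << 3) | (gt << 2) | (eq << 1) | (so & 1)


# e.g. `minu.` of all-ones and all-ones leaves all-ones in RT,
# which is negative when viewed as a signed value
print(f"{cr0_for_result(0xFFFF_FFFF_FFFF_FFFF):04b}")
print(f"{cr0_for_result(0):04b}")
```

Under this assumption even the unsigned instructions would report LT for
results with the top bit set, because CR0 reflects a signed comparison with
zero.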

----------

\newpage{}

# Instruction Formats

Add the following entries to Book I 1.6.1.15 X-FORM:

```
|0     |6     |11    |16    |21          |26    |31  |
| PO   | FRT  | FRA  | FRB  | FMM[0:3] / | XO   | Rc |
```

Add a new field to Book I 1.6.2 Word Instruction Fields:

```
FMM (21:24)
Field used to specify minimum/maximum mode for fminmax[s].

Formats: A
```
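
The field positions above can be checked by packing them into a 32-bit word.
The sketch below follows the layout shown (PO 0:5, FRT 6:10, FRA 11:15,
FRB 16:20, FMM 21:24, one reserved bit, XO 26:30, Rc 31, all MSB0). The PO
and XO values used in the example are placeholders, since this RFC does not
assign opcodes, and the function names are hypothetical.

```python
def encode_fminmax(po: int, frt: int, fra: int, frb: int,
                   fmm: int, xo: int, rc: int) -> int:
    """Pack the fields, most significant first, into one 32-bit word."""
    fields = [(po, 6), (frt, 5), (fra, 5), (frb, 5),
              (fmm, 4), (0, 1),           # bit 25 is reserved ('/')
              (xo, 5), (rc, 1)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode_fmm(word: int) -> int:
    """Extract FMM (bits 21:24 MSB0, i.e. bits 10..7 counting from the LSB)."""
    return (word >> 7) & 0b1111


# PO=59 and XO=0b01010 are placeholders only, not proposed opcode values
word = encode_fminmax(po=59, frt=1, fra=2, frb=3, fmm=0b1001, xo=0b01010, rc=0)
print(f"{word:032b}", f"{decode_fmm(word):04b}")
```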

----------

\newpage{}

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description                     |
|------|------|------|---------|----------|---------------------------------|
| A    | I    | #    | 3.2B    | fminmax  | Floating Minimum/Maximum        |
| A    | I    | #    | 3.2B    | fminmaxs | Floating Minimum/Maximum Single |
| ???  | I    | #    | 3.2B    | minmax   | Minimum/Maximum Signed/Unsigned |

## fmax instruction count

32 instructions are required in SFFS to emulate fmax:
<https://gcc.godbolt.org/z/6xba61To6>

```
fmax(double, double):
        fcmpu 0,1,2
        fmr 0,1
        cror 30,1,2
        beq 7,.L12
        blt 0,.L13
        stfd 1,-16(1)
        lis 9,0x8
        li 8,-1
        sldi 9,9,32
        rldicr 8,8,0,11
        ori 2,2,0
        ld 10,-16(1)
        xor 10,10,9
        sldi 10,10,1
        cmpld 0,10,8
        bgt 0,.L5
        stfd 2,-16(1)
        ori 2,2,0
        ld 10,-16(1)
        xor 9,10,9
        sldi 9,9,1
        cmpld 0,9,8
        ble 0,.L6
.L5:
        fadd 1,0,2
        blr
.L13:
        fmr 1,2
        blr
.L6:
        fcmpu 0,2,2
        fmr 1,2
        bnulr 0
.L12:
        fmr 1,0
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
```
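
The control flow of the listing can be followed more easily in a high-level
form. The sketch below is a reading of the assembly above, not glibc's source:
the function and constant names are hypothetical, and the signalling-NaN test
is written on raw bit patterns to mirror the `xor` / `sldi` / `cmpld`
sequence. Python floats are used for the ordinary comparisons.

```python
import math
import struct

QUIET_BIT = 0x0008_0000_0000_0000   # built by `lis 9,0x8` + `sldi 9,9,32`
LIMIT = 0xFFF0_0000_0000_0000       # built by `li 8,-1` + `rldicr 8,8,0,11`
MASK64 = (1 << 64) - 1


def bits_of(x: float) -> int:
    """Raw IEEE 754 binary64 bit pattern (the stfd/ld round trip)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def is_signaling_bits(bits: int) -> bool:
    """Flip the quiet bit, shift out the sign, compare (xor/sldi/cmpld)."""
    return (((bits ^ QUIET_BIT) << 1) & MASK64) > LIMIT


def fmax_sketch(x: float, y: float) -> float:
    if x >= y:                # fcmpu + cror + beq: return x
        return x
    if x < y:                 # blt: return y
        return y
    # unordered from here on: at least one input is a NaN
    if is_signaling_bits(bits_of(x)) or is_signaling_bits(bits_of(y)):
        return x + y          # .L5: fadd, turns a signalling NaN quiet
    return x if math.isnan(y) else y   # .L6 / .L12


print(fmax_sketch(2.0, 3.0), fmax_sketch(math.nan, 1.0))
print(is_signaling_bits(0x7FF0_0000_0000_0001))   # a signalling NaN pattern
```

Most of the 32 instructions are spent on the two signalling-NaN checks, which
need a store and reload to move each FPR value into a GPR.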

[[!tag opf_rfc]]
