# RFC ls013 Min/Max GPR/FPR

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls013/>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1057>

**Severity**: Major

**Status**: New

**Date**: 14 Apr 2023

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**:

```
Book I Fixed-Point and Floating-Point Instructions
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of new GPR-based and FPR-based instructions
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, FPR, min, max, fmin, fmax
```

**Motivation**

Minimum and maximum are common operations that can take a surprisingly large
number of instructions to implement in software. Additionally, Vector
Reduce-Min/Max are common vector operations, and SVP64 Parallel Reduction
needs a single Scalar instruction in order to implement Reduce-Min/Max
effectively.

**Notes and Observations**:

1. SVP64 REMAP Parallel Reduction needs a single Scalar instruction to
   work with, for best effectiveness. Without SFFS minimum/maximum
   instructions, Simple-V min/max Parallel Reduction is severely compromised.
2. Once one FP min/max mode is implemented, the remaining modes require
   little additional hardware.
3. Similar instructions exist in VSX (although not the IEEE 754-2019
   variants). This is frequently used to justify not adding them to SFFS.
   However, SVP64/VSX may have a different meaning from SVP64/SFFS,
   so it is *really* crucial to have SFFS ops, even if "equivalent" to VSX,
   in order for SVP64 not to be compromised (made non-orthogonal).
4. FP min/max are rather complex to implement in software: the most commonly
   used FP max function, `fmax` from glibc, compiles to an astounding
   32 instructions for SFFS.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I 3.3.9 Fixed-Point Arithmetic Instructions
* Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# Floating-Point Instructions

This group provides Floating-Point min/max. With IEEE 754 having advanced to
its 2019 revision, there are now subtle differences between the min/max
variants. The variant is selected with a Mode Field, `FMM`.

## `FMM` -- Floating Min/Max Mode

<a id="fmm-floating-min-max-mode"></a>

| `FMM` | Assembly Alias                | Origin                         | Semantics                                       |
|-------|-------------------------------|--------------------------------|-------------------------------------------------|
| 0000  | fminnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = minNum(FRA, FRB) (1)                      |
| 0001  | fmin19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = minimum(FRA, FRB)                         |
| 0010  | fminnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minimumNumber(FRA, FRB)                   |
| 0011  | fminc[s] FRT, FRA, FRB        | x86 minss or Win32's min macro | FRT = FRA \< FRB ? FRA : FRB                    |
| 0100  | fminmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2) |
| 0101  | fminmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fmin19) (2)    |
| 0110  | fminmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2) |
| 0111  | fminmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, False, fminc) (2)     |
| 1000  | fmaxnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = maxNum(FRA, FRB) (1)                      |
| 1001  | fmax19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = maximum(FRA, FRB)                         |
| 1010  | fmaxnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = maximumNumber(FRA, FRB)                   |
| 1011  | fmaxc[s] FRT, FRA, FRB        | x86 maxss or Win32's max macro | FRT = FRA > FRB ? FRA : FRB                     |
| 1100  | fmaxmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2)  |
| 1101  | fmaxmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmax19) (2)     |
| 1110  | fmaxmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2)  |
| 1111  | fmaxmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2)      |

Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
+0.0. This is left unspecified in IEEE 754-2008.

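The differences between the modes show up mainly for NaN inputs and signed
zeros. The following is a minimal Python sketch of three of the minimum modes,
not a normative definition. The helper `lt_zero_aware` is a name introduced
here for illustration, and the sketch assumes quiet NaNs only: Python floats
do not distinguish signaling NaNs, and FPSCR side effects such as VXSNAN are
not modelled.

```python
import math

def lt_zero_aware(x, y):
    """x < y, with -0.0 ordered below +0.0 (illustrative helper)."""
    if x == 0.0 and y == 0.0:
        return math.copysign(1.0, x) < math.copysign(1.0, y)
    return x < y

def fminc(x, y):
    # FMM=0011: plain comparison; a NaN in either input selects y
    return x if x < y else y

def fmin19(x, y):
    # FMM=0001: IEEE 754-2019 minimum; any NaN input gives NaN
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if lt_zero_aware(x, y) else y

def fminnum19(x, y):
    # FMM=0010: IEEE 754-2019 minimumNumber; a lone NaN is ignored
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if lt_zero_aware(x, y) else y

if __name__ == "__main__":
    for name, fn in (("fminc", fminc), ("fmin19", fmin19),
                     ("fminnum19", fminnum19)):
        print(name, fn(math.nan, 1.0), fn(1.0, math.nan), fn(-0.0, 0.0))
```

With a NaN in the first operand, `fminc` returns the second operand, `fmin19`
returns NaN and `fminnum19` returns the number. For `(-0.0, +0.0)`, `fminc`
returns `+0.0` because the two zeros compare equal, while the two IEEE 754-2019
modes return `-0.0`.
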
Note (2): minmaxmag(x, y, is_max, fallback) is defined as:

```python
def minmaxmag(x, y, is_max, fallback):
    # compare magnitudes; both are False if either input is NaN
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)
```

Note (3): TODO: confirm whether IEEE 754-2008 has min/maxMagNum operations
like IEEE 754-2019's minimum/maximumMagnitudeNumber.

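As an illustration of how the magnitude modes compose with a fallback, the
sketch below repeats `minmaxmag` from Note (2) and wires it to the
comparison-based `fminc`/`fmaxc` semantics from the table. The Python function
names simply mirror the assembly aliases; they are illustrative wrappers, not
part of the proposal.

```python
import math

def minmaxmag(x, y, is_max, fallback):
    # as defined in Note (2)
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)

def fminc(x, y):
    return x if x < y else y

def fmaxc(x, y):
    return x if x > y else y

def fminmagc(x, y):
    # FMM=0111
    return minmaxmag(x, y, False, fminc)

def fmaxmagc(x, y):
    # FMM=1111
    return minmaxmag(x, y, True, fmaxc)

if __name__ == "__main__":
    print(fminmagc(-3.0, 2.0))      # smaller magnitude wins
    print(fmaxmagc(-3.0, 2.0))      # larger magnitude wins
    print(fminmagc(-2.0, 2.0))      # equal magnitudes: fallback decides
    print(fminmagc(math.nan, 1.0))  # NaN input: fallback decides
```

When the magnitudes differ, the fallback is never consulted. It only decides
ties such as `(-2.0, 2.0)` and cases with a NaN input, which is why each
magnitude mode inherits its NaN behaviour from the corresponding plain mode.
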
----------------

\newpage{}

## Floating Minimum/Maximum

A-Form

* fminmax FRT, FRA, FRB, FMM
* fminmax. FRT, FRA, FRB, FMM

```
|0   |6    |11   |16   |21          |26  |31  |
| PO | FRT | FRA | FRB | FMM[0:3] / | XO | Rc |
```

Special Registers altered:

```
FX VXSNAN
CR1 (if Rc=1)
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Assembly Aliases: see
[`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

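The field layout above can be sketched as a packing function. This is only an
illustration of the bit positions: the RFC does not assign the `PO` and `XO`
values, so they are taken as caller-supplied parameters here, and the function
name `encode_fminmax` is hypothetical. Power ISA numbers bits MSB0, so MSB0 bit
`n` of the 32-bit word sits at conventional (LSB0) position `31 - n`.

```python
def encode_fminmax(po, xo, frt, fra, frb, fmm, rc=0):
    """Pack fminmax fields into a 32-bit word (illustrative sketch).

    po and xo are placeholders: the RFC does not assign their values.
    """
    fields = (("PO", po, 6), ("FRT", frt, 5), ("FRA", fra, 5),
              ("FRB", frb, 5), ("FMM", fmm, 4), ("XO", xo, 5), ("Rc", rc, 1))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return ((po << 26)      # bits 0:5
            | (frt << 21)   # bits 6:10
            | (fra << 16)   # bits 11:15
            | (frb << 11)   # bits 16:20
            | (fmm << 7)    # bits 21:24; bit 25 is reserved (zero)
            | (xo << 1)     # bits 26:30
            | rc)           # bit 31

if __name__ == "__main__":
    # fmax19. f1, f2, f3 with placeholder PO=0, XO=0
    word = encode_fminmax(po=0, xo=0, frt=1, fra=2, frb=3, fmm=0b1001, rc=1)
    print(f"{word:#010x}")
```
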
----------

## Floating Minimum/Maximum Single

A-Form

* fminmaxs FRT, FRA, FRB, FMM
* fminmaxs. FRT, FRA, FRB, FMM

```
|0   |6    |11   |16   |21          |26  |31  |
| PO | FRT | FRA | FRB | FMM[0:3] / | XO | Rc |
```

Special Registers altered:

```
FX VXSNAN
CR1 (if Rc=1)
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Assembly Aliases: see
[`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

\newpage{}

# Fixed-Point Instructions

These instructions compute the signed or unsigned minimum or maximum. SVP64
Prefixing defines Saturation semantics, therefore Saturated variants of these
instructions need not be proposed.

## Integer Min/Max Mode

Bits are numbered MSB0, so bit 0 is the leftmost bit of `MMM` in the table:

* bit 0: set if word variant else dword
* bit 1: set if signed else unsigned
* bit 2: set if max else min

| `MMM` | Assembly Alias   | Semantics                                    |
|-------|------------------|----------------------------------------------|
| 000   | `minu RT,RA,RB`  | `RT = (uint64_t)RA < (uint64_t)RB ? RA : RB` |
| 001   | `maxu RT,RA,RB`  | `RT = (uint64_t)RA > (uint64_t)RB ? RA : RB` |
| 010   | `mins RT,RA,RB`  | `RT = (int64_t)RA < (int64_t)RB ? RA : RB`   |
| 011   | `maxs RT,RA,RB`  | `RT = (int64_t)RA > (int64_t)RB ? RA : RB`   |
| 100   | `minuw RT,RA,RB` | `RT = (uint32_t)RA < (uint32_t)RB ? RA : RB` |
| 101   | `maxuw RT,RA,RB` | `RT = (uint32_t)RA > (uint32_t)RB ? RA : RB` |
| 110   | `minsw RT,RA,RB` | `RT = (int32_t)RA < (int32_t)RB ? RA : RB`   |
| 111   | `maxsw RT,RA,RB` | `RT = (int32_t)RA > (int32_t)RB ? RA : RB`   |

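Because the mode bits use MSB0 numbering, bit 0 is the most significant of the
three. A short sketch of the decode, mapping the 3-bit mode field (`MMM` in the
instruction form) to the alias names in the table, may make this concrete. The
function name `minmax_alias` is made up for illustration.

```python
def minmax_alias(mmm):
    """Return the assembly alias for a 3-bit integer min/max mode."""
    if not 0 <= mmm <= 0b111:
        raise ValueError("mode must fit in 3 bits")
    word = (mmm >> 2) & 1    # bit 0 (MSB0): word variant
    signed = (mmm >> 1) & 1  # bit 1: signed comparison
    is_max = mmm & 1         # bit 2: max rather than min
    return (("max" if is_max else "min")
            + ("s" if signed else "u")
            + ("w" if word else ""))

if __name__ == "__main__":
    for mode in range(8):
        print(f"{mode:03b} {minmax_alias(mode)}")
```
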
## Integer Min/Max MM-Form

* minmax RT, RA, RB, MMM
* minmax. RT, RA, RB, MMM

```
|0   |6   |11  |16  |21   |24 |25  |31  |
| PO | RT | RA | RB | MMM | / | XO | Rc |
```

```
a <- (RA)
b <- (RB)
if MMM[0] then  # word mode
    # shift left by XLEN/2 to make the dword comparison
    # do word comparison of the original inputs
    a <- a[XLEN/2:XLEN-1] || [0] * XLEN/2
    b <- b[XLEN/2:XLEN-1] || [0] * XLEN/2
if MMM[1] then  # signed mode
    # invert sign bits to make the unsigned comparison
    # do signed comparison of the original inputs
    a[0] <- !a[0]  # convert
    b[0] <- !b[0]
if MMM[2] then  # max mode
    # swap a and b to make the less than comparison do
    # greater than comparison of the original inputs
    t <- a
    a <- b
    b <- t
# store the entire selected source (even in word mode)
if a <u b then RT <- (RA)
else RT <- (RB)
```

Compute the integer minimum/maximum according to `MMM` of `RA` and `RB` and
store the result in `RT`.

Special Registers altered:

```
CR0 (if Rc=1)
```

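The pseudocode reduces all eight modes to a single unsigned less-than
comparison. The following Python sketch models it for an assumed `XLEN` of 64,
representing registers as non-negative integers. The function name and the
masking are modelling choices made here, not part of the specification, and
the `Rc=1` update of CR0 is not modelled.

```python
XLEN = 64
MASK = (1 << XLEN) - 1
SIGN = 1 << (XLEN - 1)  # MSB0 bit 0

def minmax(ra, rb, mmm):
    """Model of the minmax pseudocode; returns the value written to RT."""
    ra &= MASK
    rb &= MASK
    a, b = ra, rb
    if mmm & 0b100:  # MMM[0]: word mode
        # move the low half to the top so a dword compare acts on words
        a = (a << (XLEN // 2)) & MASK
        b = (b << (XLEN // 2)) & MASK
    if mmm & 0b010:  # MMM[1]: signed mode
        # flipping the sign bits turns an unsigned compare into a signed one
        a ^= SIGN
        b ^= SIGN
    if mmm & 0b001:  # MMM[2]: max mode
        a, b = b, a
    # the entire selected source is returned, even in word mode
    return ra if a < b else rb

if __name__ == "__main__":
    minus_one = MASK  # two's complement -1
    print(hex(minmax(minus_one, 1, 0b000)))  # minu
    print(hex(minmax(minus_one, 1, 0b010)))  # mins
    print(hex(minmax(0x1_0000_0005, 0x7, 0b100)))  # minuw
```

In word mode only the low halves are compared, but the full source register is
stored: `minuw` on `0x100000005` and `0x7` selects `0x100000005`, since
5 is less than 7, whereas `minu` on the same inputs selects `0x7`.
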
----------

\newpage{}

# Instruction Formats

Add the following entries to Book I 1.6.1.15 X-FORM:

```
|0   |6    |11   |16   |21          |26  |31  |
| PO | FRT | FRA | FRB | FMM[0:3] / | XO | Rc |
```

Add a new field to Book I 1.6.2 Word Instruction Fields:

```
FMM (21:24)
    Field used to specify minimum/maximum mode for fminmax[s].

    Formats: A
```

----------

\newpage{}

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description                     |
|------|------|------|---------|----------|---------------------------------|
| A    | I    | #    | 3.2B    | fminmax  | Floating Minimum/Maximum        |
| A    | I    | #    | 3.2B    | fminmaxs | Floating Minimum/Maximum Single |
| ???  | I    | #    | 3.2B    | minmax   | Minimum/Maximum Signed/Unsigned |

## fmax instruction count

32 instructions are required in SFFS to emulate fmax.
<https://gcc.godbolt.org/z/6xba61To6>

```
fmax(double, double):
        fcmpu 0,1,2
        fmr 0,1
        cror 30,1,2
        beq 7,.L12
        blt 0,.L13
        stfd 1,-16(1)
        lis 9,0x8
        li 8,-1
        sldi 9,9,32
        rldicr 8,8,0,11
        ori 2,2,0
        ld 10,-16(1)
        xor 10,10,9
        sldi 10,10,1
        cmpld 0,10,8
        bgt 0,.L5
        stfd 2,-16(1)
        ori 2,2,0
        ld 10,-16(1)
        xor 9,10,9
        sldi 9,9,1
        cmpld 0,9,8
        ble 0,.L6
.L5:
        fadd 1,0,2
        blr
.L13:
        fmr 1,2
        blr
.L6:
        fcmpu 0,2,2
        fmr 1,2
        bnulr 0
.L12:
        fmr 1,0
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
```

[[!tag opf_rfc]]
