# RFC ls013 Min/Max GPR/FPR

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls013/>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1057>

**Severity**: Major

**Status**: New

**Date**: 14 Apr 2023

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**:

```
    Book I Fixed-Point and Floating-Point Instructions
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of new GPR-based and FPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, FPR, min, max, fmin, fmax
```

**Motivation**

Minimum and maximum are common operations that can take a surprisingly large
number of instructions to implement in software. Additionally, Vector
Reduce-Min/Max are common vector operations, and SVP64 Parallel Reduction needs
a single Scalar instruction in order to implement Reduce-Min/Max effectively.

**Notes and Observations**:

1. SVP64 REMAP Parallel Reduction works best with a single Scalar
    instruction.  Without SFFS minimum/maximum instructions, Simple-V
    min/max Parallel Reduction is severely compromised.
2. Once one FP min/max mode is implemented, the remaining modes require
    little additional hardware.
3. Similar instructions exist in VSX (although not the IEEE 754-2019 ones).
    This is frequently used to justify not adding them. However, SVP64/VSX may
    have a different meaning from SVP64/SFFS, so it is *really* crucial to have
    SFFS ops, even if "equivalent" to VSX, so that SVP64 is not
    compromised (non-orthogonal).
4. FP min/max are rather complex to implement in software: the most commonly
    used FP max function, `fmax` from glibc, compiles to 32 instructions
    for SFFS.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I 3.3.9 Fixed-Point Arithmetic Instructions
* Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# Floating-Point Instructions

This group provides Floating-Point min/max. Because IEEE 754 has been revised
up to the 2019 edition, there are now subtle differences between the available
definitions. These are selectable with a Mode Field, `FMM`.

## `FMM` -- Floating Min/Max Mode

<a id="fmm-floating-min-max-mode"></a>

| `FMM` | Extended Mnemonic             | Origin                         | Semantics                                       |
|-------|-------------------------------|--------------------------------|-------------------------------------------------|
| 0000  | fminnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = minNum(FRA, FRB)  (1)                     |
| 0001  | fmin19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = minimum(FRA, FRB)                         |
| 0010  | fminnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minimumNumber(FRA, FRB)                   |
| 0011  | fminc[s] FRT, FRA, FRB        | x86 minss or Win32's min macro | FRT = FRA \< FRB ? FRA : FRB                    |
| 0100  | fminmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2) |
| 0101  | fminmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fmin19) (2)    |
| 0110  | fminmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2) |
| 0111  | fminmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, False, fminc) (2)     |
| 1000  | fmaxnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = maxNum(FRA, FRB)  (1)                     |
| 1001  | fmax19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = maximum(FRA, FRB)                         |
| 1010  | fmaxnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = maximumNumber(FRA, FRB)                   |
| 1011  | fmaxc[s] FRT, FRA, FRB        | x86 maxss or Win32's max macro | FRT = FRA > FRB ? FRA : FRB                     |
| 1100  | fmaxmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2)  |
| 1101  | fmaxmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmax19) (2)     |
| 1110  | fmaxmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2)  |
| 1111  | fmaxmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2)      |

Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
    +0.0. This is left unspecified in IEEE 754-2008.

Note (2): minmaxmag(x, y, is_max, fallback) is defined as:

```python
def minmaxmag(x, y, is_max, fallback):
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)
```

Note (3): TODO: check whether IEEE 754-2008 defines min/maxMagNum operations
    equivalent to IEEE 754-2019's minimum/maximumMagnitudeNumber

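To make the role of the `fallback` argument in Note (2) concrete, the following
is a minimal sketch of three of the `min` fallbacks operating on Python floats.
It is an illustration only, not part of the proposed specification: the helper
`_lt` is a hypothetical name, -0.0 is assumed to order below +0.0 for the
IEEE 754-2019 modes as well, and signalling NaNs and FPSCR updates are not
modelled at all.

```python
import math

def _lt(x, y):
    """x < y with -0.0 ordered below +0.0; False if either input is NaN.
    (Hypothetical helper, used only by this sketch.)"""
    if x == 0.0 and y == 0.0:
        return math.copysign(1.0, x) < math.copysign(1.0, y)
    return x < y

def fmin19(x, y):
    # minimum(): any NaN input produces a NaN result
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if _lt(x, y) else y

def fminnum19(x, y):
    # minimumNumber(): a single NaN input is ignored
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if _lt(x, y) else y

def fminc(x, y):
    # C-style select: y is returned whenever the comparison is false,
    # which includes the case where either input is NaN
    return x if x < y else y

def minmaxmag(x, y, is_max, fallback):
    # as defined in Note (2)
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)

print(minmaxmag(-3.0, 2.0, False, fminnum19))     # 2.0  (smaller magnitude)
print(minmaxmag(-2.0, 2.0, False, fminnum19))     # -2.0 (tie: fallback decides)
print(minmaxmag(math.nan, 1.0, False, fmin19))    # nan  (NaN: fallback decides)
```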
----------------

\newpage{}

## Floating Minimum/Maximum MM-form

* fminmax FRT, FRA, FRB, FMM
* fminmax. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21   |25  |31  |
    | PO  | FRT | FRA | FRB | FMM | XO | Rc |
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

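The extended mnemonics in the `FMM` table follow a regular pattern, which an
assembler or disassembler could exploit. The sketch below is an observation
read off the table rather than a normative decomposition of the field:
it treats the most significant `FMM` bit as min/max, the next bit as the
magnitude variant, and the low two bits as the semantics selector. The
function name `fmm_mnemonic` is hypothetical.

```python
# suffix selected by the two least significant bits of FMM
_SUFFIX = ("num08", "19", "num19", "c")

def fmm_mnemonic(fmm, single=False):
    """Return the extended mnemonic for a 4-bit FMM value (0..15)."""
    if not 0 <= fmm <= 0b1111:
        raise ValueError("FMM is a 4-bit field")
    op = "fmax" if fmm & 0b1000 else "fmin"    # most significant bit
    mag = "mag" if fmm & 0b0100 else ""        # magnitude variants
    return op + mag + _SUFFIX[fmm & 0b0011] + ("s" if single else "")

# list all sixteen double-precision mnemonics in table order
for fmm in range(16):
    print(format(fmm, "04b"), fmm_mnemonic(fmm))
```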
----------

## Floating Minimum/Maximum Single MM-form

* fminmaxs FRT, FRA, FRB, FMM
* fminmaxs. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21   |25  |31  |
    | PO  | FRT | FRA | FRB | FMM | XO | Rc |
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

\newpage{}

# Fixed-Point Instructions

These instructions compute the signed or unsigned minimum or maximum.
Because SVP64 Prefixing defines Saturation semantics, saturated variants of
these instructions need not be proposed.

## `MMM` -- Integer Min/Max Mode

<a id="mmm-integer-min-max-mode"></a>

* bit 0 (the most significant bit of `MMM`): set if word variant else dword
* bit 1: set if signed else unsigned
* bit 2: set if max else min

| `MMM` | Extended Mnemonic | Semantics                                    |
|-------|-------------------|----------------------------------------------|
| 000   | `minu RT,RA,RB`   | `RT = (uint64_t)RA < (uint64_t)RB ? RA : RB` |
| 001   | `maxu RT,RA,RB`   | `RT = (uint64_t)RA > (uint64_t)RB ? RA : RB` |
| 010   | `mins RT,RA,RB`   | `RT =  (int64_t)RA < (int64_t)RB  ? RA : RB` |
| 011   | `maxs RT,RA,RB`   | `RT =  (int64_t)RA > (int64_t)RB  ? RA : RB` |
| 100   | `minuw RT,RA,RB`  | `RT = (uint32_t)RA < (uint32_t)RB ? RA : RB` |
| 101   | `maxuw RT,RA,RB`  | `RT = (uint32_t)RA > (uint32_t)RB ? RA : RB` |
| 110   | `minsw RT,RA,RB`  | `RT =  (int32_t)RA < (int32_t)RB  ? RA : RB` |
| 111   | `maxsw RT,RA,RB`  | `RT =  (int32_t)RA > (int32_t)RB  ? RA : RB` |

## Minimum/Maximum MM-Form

* minmax RT, RA, RB, MMM
* minmax. RT, RA, RB, MMM

```
    |0    |6    |11   |16   |21   |24 |25  |31  |
    | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
```

```
    a <- (RA|0)
    b <- (RB)
    if MMM[0] then  # word mode
        # shift left by XLEN/2 so that the dword comparison
        # performs a word comparison of the original inputs
        a <- a[XLEN/2:XLEN-1] || [0] * XLEN/2
        b <- b[XLEN/2:XLEN-1] || [0] * XLEN/2
    if MMM[1] then  # signed mode
        # invert sign bits so that the unsigned comparison
        # performs a signed comparison of the original inputs
        a[0] <- ¬a[0]
        b[0] <- ¬b[0]
    if MMM[2] then  # max mode
        # swap a and b so that the less-than comparison performs
        # a greater-than comparison of the original inputs
        t <- a
        a <- b
        b <- t
    # store the entire selected source (even in word mode)
    # if Rc = 1 then store the result of comparing a and b to CR0
    if a <u b then
        RT <- (RA|0)
        if Rc = 1 then CR0 <- 0b100 || XER.SO
    if a = b then
        RT <- (RB)
        if Rc = 1 then CR0 <- 0b001 || XER.SO
    if a >u b then
        RT <- (RB)
        if Rc = 1 then CR0 <- 0b010 || XER.SO
```

Compute the integer minimum/maximum according to `MMM` of `(RA|0)` and `(RB)`
and store the result in `RT`.

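The pseudocode above can be exercised with a small Python model. This is a
sketch under stated assumptions, not a reference implementation: `XLEN` is
fixed at 64, register values are passed in directly as unsigned integers
(so the `(RA|0)` register-zero case is not modelled), and the function
returns only the three LT/GT/EQ bits of CR0, leaving out `XER.SO`. The
function name `minmax_model` is hypothetical.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def minmax_model(ra, rb, mmm):
    """Return (RT, CR0 LT/GT/EQ bits) for XLEN-bit values ra, rb and 3-bit mmm."""
    ra &= MASK
    rb &= MASK
    a, b = ra, rb
    if mmm & 0b100:  # MMM[0], word mode: move the low word into the top half
        a = (a << (XLEN // 2)) & MASK
        b = (b << (XLEN // 2)) & MASK
    if mmm & 0b010:  # MMM[1], signed mode: invert the sign bits
        a ^= 1 << (XLEN - 1)
        b ^= 1 << (XLEN - 1)
    if mmm & 0b001:  # MMM[2], max mode: swap the comparison operands
        a, b = b, a
    # the entire selected source register is stored, even in word mode
    if a < b:
        return ra, 0b100
    if a == b:
        return rb, 0b001
    return rb, 0b010

# maxs of -1 and 1 (MMM = 0b011) selects RB, i.e. 1
print(minmax_model(MASK, 1, 0b011))
# minuw (MMM = 0b100) compares only the low words but returns all of RA
print(tuple(hex(v) for v in minmax_model(0x1_0000_0005, 0x7, 0b100)))
```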
Special Registers altered:

```
    CR0     (if Rc=1)
```

Extended Mnemonics:

see [`MMM` -- Integer Min/Max Mode](#mmm-integer-min-max-mode)

----------

\newpage{}

# Instruction Formats

Add the following entries to Book I 1.6.1 Word Instruction Formats:

## MM-FORM

```
    |0    |6    |11   |16   |21   |24 |25  |31  |
    | PO  | FRT | FRA | FRB | FMM     | XO | Rc |
    | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
```

Add the following new fields to Book I 1.6.2 Word Instruction Fields:

```
    FMM (21:24)
        Field used to specify minimum/maximum mode for fminmax[s].

        Formats: MM

    MMM (21:23)
        Field used to specify minimum/maximum mode for integer minmax.

        Formats: MM
```

Add `MM` to the `Formats:` list for all of `FRT`, `FRA`, `FRB`, `XO (25:30)`,
`Rc`, `RT`, `RA` and `RB`.

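As an illustration of the MM-Form layout, the sketch below packs the fields
into a 32-bit instruction word using the MSB0 bit positions given above.
This RFC does not assign `PO` or `XO` values, so both are left as caller
parameters, and the values in the example are placeholders only. The helper
and function names are hypothetical.

```python
def _field(value, start, width):
    """Place `value` in a field `width` bits wide that starts at MSB0 bit
    `start` of a 32-bit word (hypothetical helper)."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the field")
    return value << (32 - start - width)

def encode_mm_fp(po, frt, fra, frb, fmm, xo, rc):
    """Pack the FRT/FRA/FRB/FMM layout of MM-Form."""
    return (_field(po, 0, 6) | _field(frt, 6, 5) | _field(fra, 11, 5) |
            _field(frb, 16, 5) | _field(fmm, 21, 4) | _field(xo, 25, 6) |
            _field(rc, 31, 1))

def encode_mm_int(po, rt, ra, rb, mmm, xo, rc):
    """Pack the RT/RA/RB/MMM layout of MM-Form; bit 24 is reserved (zero)."""
    return (_field(po, 0, 6) | _field(rt, 6, 5) | _field(ra, 11, 5) |
            _field(rb, 16, 5) | _field(mmm, 21, 3) | _field(xo, 25, 6) |
            _field(rc, 31, 1))

# placeholder PO/XO values, NOT allocated opcodes
PO_PLACEHOLDER, XO_PLACEHOLDER = 0b111011, 0b000000
word = encode_mm_int(PO_PLACEHOLDER, 3, 4, 5, 0b011, XO_PLACEHOLDER, 0)
print(format(word, "032b"))
```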
----------

\newpage{}

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | Mnemonic | Description |
|------|------|------|---------|----------|-------------|
| MM   | I    | #    | 3.2B    | fminmax  | Floating Minimum/Maximum |
| MM   | I    | #    | 3.2B    | fminmaxs | Floating Minimum/Maximum Single |
| MM   | I    | #    | 3.2B    | minmax   | Minimum/Maximum |

## fmax instruction count

32 instructions are required in SFFS to emulate fmax.

```
#include <stdint.h>
#include <string.h>

inline uint64_t asuint64(double f) {
    union {
        double f;
        uint64_t i;
    } u = {f};
    return u.i;
}

inline int issignaling(double v) {
    // copied from glibc:
    // https://github.com/bminor/glibc/blob/e2756903329365134089d23548e9083d23bc3dd9/sysdeps/ieee754/dbl-64/math_config.h#L101
    uint64_t ix = asuint64(v);
    return 2 * (ix ^ 0x0008000000000000) > 2 * 0x7ff8000000000000ULL;
}

double fmax(double x, double y) {
    // copied from glibc:
    // https://github.com/bminor/glibc/blob/e2756903329365134089d23548e9083d23bc3dd9/math/s_fmax_template.c
    if(__builtin_isgreaterequal(x, y))
        return x;
    else if(__builtin_isless(x, y))
        return y;
    else if(issignaling(x) || issignaling(y))
        return x + y;
    else
        return __builtin_isnan(y) ? x : y;
}
```

Assembly listing:

```
    fmax(double, double):
        fcmpu 0,1,2
        fmr 0,1
        cror 30,1,2
        beq 7,.L12
        blt 0,.L13
        stfd 1,-16(1)
        lis 9,0x8
        li 8,-1
        sldi 9,9,32
        rldicr 8,8,0,11
        ori 2,2,0
        ld 10,-16(1)
        xor 10,10,9
        sldi 10,10,1
        cmpld 0,10,8
        bgt 0,.L5
        stfd 2,-16(1)
        ori 2,2,0
        ld 10,-16(1)
        xor 9,10,9
        sldi 9,9,1
        cmpld 0,9,8
        ble 0,.L6
.L5:
        fadd 1,0,2
        blr
.L13:
        fmr 1,2
        blr
.L6:
        fcmpu 0,2,2
        fmr 1,2
        bnulr 0
.L12:
        fmr 1,0
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
```

[[!tag opf_rfc]]
