# RFC ls013 Min/Max GPR/FPR

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls013/>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1057>

**Severity**: Major

**Status**: New

**Date**: 14 Apr 2023

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**:

```
    Book I Fixed-Point and Floating-Point Instructions
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of new GPR-based and FPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, FPR, min, max, fmin, fmax
```

**Motivation**

Minimum and maximum are common operations that can take a surprisingly large
number of instructions to implement in software. Additionally, Vector
Reduce-Min/Max are common vector operations, and SVP64 Parallel Reduction needs
a single Scalar instruction in order to implement Reduce-Min/Max effectively.

**Notes and Observations**:

1. SVP64 REMAP Parallel Reduction works best with a single Scalar
    instruction. Without SFFS minimum/maximum instructions, Simple-V
    min/max Parallel Reduction is severely compromised.
2. Once one FP min/max mode is implemented, the rest need little additional
    hardware.
3. Similar instructions exist in VSX (though not the IEEE 754-2019 variants).
    This is frequently used to justify not adding them. However, SVP64/VSX may
    have a different meaning from SVP64/SFFS, so it is crucial to have
    SFFS ops, even if "equivalent" to VSX, for SVP64 not to be
    compromised (non-orthogonal).
4. FP min/max are rather complex to implement in software: the most commonly
    used FP max function, `fmax` from glibc, compiles to 32 instructions
    for SFFS.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I 3.3.9 Fixed-Point Arithmetic Instructions
* Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# Floating-Point Instructions

This group provides Floating-Point min/max. However, IEEE 754 changed between
its 2008 and 2019 revisions, and the min/max operations now differ subtly.
The variants are selectable with a Mode Field, `FMM`.

## `FMM` -- Floating Min/Max Mode

<a id="fmm-floating-min-max-mode"></a>

<!-- hyphens in table determine width of columns for pandoc -- -->
| `FMM`| Extended Mnemonic             | Origin             | Semantics                                  |
|------|-------------------------------|--------------------|--------------------------------------------|
| 0000 | fminnum08[s] FRT,FRA,FRB      | IEEE 754-2008      | minNum(FRA,FRB)  (1)                       |
| 0001 | fmin19[s] FRT,FRA,FRB         | IEEE 754-2019      | minimum(FRA,FRB)                           |
| 0010 | fminnum19[s] FRT,FRA,FRB      | IEEE 754-2019      | minimumNumber(FRA,FRB)                     |
| 0011 | fminc[s] FRT,FRA,FRB          | x86 minss (4)      | FRA\<FRB ? FRA:FRB                      |
| 0100 | fminmagnum08[s] FRT,FRA,FRB   | IEEE 754-2008 (3)  | mmmag(FRA,FRB,False,fminnum08) (2)   |
| 0101 | fminmag19[s] FRT,FRA,FRB      | IEEE 754-2019      | mmmag(FRA,FRB,False,fmin19) (2)      |
| 0110 | fminmagnum19[s] FRT,FRA,FRB   | IEEE 754-2019      | mmmag(FRA,FRB,False,fminnum19) (2)   |
| 0111 | fminmagc[s] FRT,FRA,FRB       | -                  | mmmag(FRA,FRB,False,fminc) (2)       |
| 1000 | fmaxnum08[s] FRT,FRA,FRB      | IEEE 754-2008      | maxNum(FRA,FRB)  (1)                       |
| 1001 | fmax19[s] FRT,FRA,FRB         | IEEE 754-2019      | maximum(FRA,FRB)                           |
| 1010 | fmaxnum19[s] FRT,FRA,FRB      | IEEE 754-2019      | maximumNumber(FRA,FRB)                     |
| 1011 | fmaxc[s] FRT,FRA,FRB          | x86 maxss (4)      | FRA\>FRB ? FRA:FRB                       |
| 1100 | fmaxmagnum08[s] FRT,FRA,FRB   | IEEE 754-2008 (3)  | mmmag(FRA,FRB,True,fmaxnum08) (2)    |
| 1101 | fmaxmag19[s] FRT,FRA,FRB      | IEEE 754-2019      | mmmag(FRA,FRB,True,fmax19) (2)       |
| 1110 | fmaxmagnum19[s] FRT,FRA,FRB   | IEEE 754-2019      | mmmag(FRA,FRB,True,fmaxnum19) (2)    |
| 1111 | fmaxmagc[s] FRT,FRA,FRB       | -                  | mmmag(FRA,FRB,True,fmaxc) (2)        |

Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
    +0.0. This is left unspecified in IEEE 754-2008.

Note (2): mmmag(x, y, is_max, fallback) is defined as:

```python
def mmmag(x, y, is_max, fallback):
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)
```
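
As an illustration of Note (2), the sketch below exercises `mmmag` on ordinary Python floats. The helpers `plain_min` and `plain_max` are hypothetical stand-ins for a fallback mode (they behave like the `fminc`/`fmaxc` column of the table for non-NaN inputs) and are not part of this RFC.

```python
def mmmag(x, y, is_max, fallback):
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)


def plain_min(x, y):
    # hypothetical fallback: x < y ? x : y
    return x if x < y else y


def plain_max(x, y):
    # hypothetical fallback: x > y ? x : y
    return x if x > y else y


# Differing magnitudes: the sign is ignored when choosing the operand,
# but the chosen operand is returned with its original sign.
smaller_magnitude = mmmag(-3.0, 2.0, False, plain_min)
larger_magnitude = mmmag(-3.0, 2.0, True, plain_max)

# Equal magnitudes: the fallback decides between -2.0 and +2.0.
tie_min = mmmag(-2.0, 2.0, False, plain_min)
tie_max = mmmag(-2.0, 2.0, True, plain_max)
```

With differing magnitudes the fallback is never called; it only matters for ties and NaN inputs, which is why each magnitude mode in the table names its non-magnitude counterpart as the fallback.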

Note (3): TODO: confirm whether IEEE 754-2008 defines magnitude variants of
    minNum/maxNum, analogous to IEEE 754-2019's
    minimumMagnitudeNumber/maximumMagnitudeNumber.

Note (4): or, equivalently, Win32's `min`/`max` macros.
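
The table's mnemonics follow a regular pattern in the `FMM` bits, which the following sketch makes explicit. It assumes MSB0 bit numbering as used elsewhere in this RFC (bit 0 selects max, bit 1 selects the magnitude variant, bits 2:3 select the base mode); the function name `fmm_mnemonic` is hypothetical.

```python
def fmm_mnemonic(fmm, single=False):
    """Return the extended mnemonic for a 4-bit FMM value (MSB0 bit order)."""
    if not 0 <= fmm <= 0b1111:
        raise ValueError("FMM is a 4-bit field")
    is_max = (fmm >> 3) & 1   # FMM[0]
    is_mag = (fmm >> 2) & 1   # FMM[1]
    mode = fmm & 0b11         # FMM[2:3]
    suffix = ("num08", "19", "num19", "c")[mode]
    return (
        "f"
        + ("max" if is_max else "min")
        + ("mag" if is_mag else "")
        + suffix
        + ("s" if single else "")
    )


# Regenerate the Extended Mnemonic column of the table, in FMM order.
all_mnemonics = [fmm_mnemonic(fmm) for fmm in range(16)]
```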

----------------

\newpage{}

## Floating Minimum/Maximum MM-form

* fminmax FRT, FRA, FRB, FMM
* fminmax. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21   |25  |31  |
    | PO  | FRT | FRA | FRB | FMM | XO | Rc |
```

```
    result <- [0] * 64
    a <- (FRA)
    b <- (FRB)
    abs_a <- 0b0 || a[1:63]
    abs_b <- 0b0 || b[1:63]
    a_is_nan <- abs_a >u 0x7FF0_0000_0000_0000
    a_is_snan <- a_is_nan and a[12] = 0
    b_is_nan <- abs_b >u 0x7FF0_0000_0000_0000
    b_is_snan <- b_is_nan and b[12] = 0
    any_snan <- a_is_snan or b_is_snan
    a_quieted <- a
    a_quieted[12] <- 1
    b_quieted <- b
    b_quieted[12] <- 1
    if a_is_nan or b_is_nan then
        if FMM[2:3] = 0b00 then  # min/maxnum08
            if a_is_snan then result <- a_quieted
            else if b_is_snan then result <- b_quieted
            else if a_is_nan and b_is_nan then result <- a_quieted
            else if a_is_nan then result <- b
            else result <- a
        if FMM[2:3] = 0b01 then  # min/max19
            if a_is_nan then result <- a_quieted
            else result <- b_quieted
        if FMM[2:3] = 0b10 then  # min/maxnum19
            if a_is_nan and b_is_nan then result <- a_quieted
            else if a_is_nan then result <- b
            else result <- a
        if FMM[2:3] = 0b11 then  # min/maxc
            result <- b
    else
        cmp_l <- a
        cmp_r <- b
        if FMM[1] then  # min/maxmag
            if abs_a != abs_b then
                cmp_l <- abs_a
                cmp_r <- abs_b
        if FMM[2:3] = 0b11 then  # min/maxc
            if abs_a = 0 then cmp_l <- 0
            if abs_b = 0 then cmp_r <- 0
        if FMM[0] then  # max
            # swap cmp_* so comparison goes the other way
            cmp_l, cmp_r <- cmp_r, cmp_l
        if cmp_l[0] = 1 then
            if cmp_r[0] = 0 then result <- a
            else if cmp_l >u cmp_r then
                # IEEE 754 is sign-magnitude,
                # so bigger magnitude negative is smaller
                result <- a
            else result <- b
        else if cmp_r[0] = 1 then result <- b
        else if cmp_l <u cmp_r then result <- a
        else result <- b
    if any_snan then SetFX(FPSCR.VXSNAN)
    # FRT is left unchanged only when an enabled exception occurs
    if FPSCR.VE = 0 or ¬any_snan then (FRT) <- result
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.
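
The pseudocode above can be modelled in Python on raw 64-bit patterns, as sketched below. This is an illustrative model under stated assumptions rather than a reference implementation: the names `fminmax_model`, `to_bits` and `from_bits` are hypothetical, the FPSCR is not modelled (the function returns an `any_snan` flag and leaves the `VXSNAN`/`VE` handling to the caller), and Rc=1 is ignored.

```python
import struct

SIGN = 1 << 63               # a[0] in MSB0 numbering
INF = 0x7FF0_0000_0000_0000  # +infinity; larger magnitudes are NaNs
QUIET = 1 << 51              # a[12]: set for quiet NaNs


def to_bits(value):
    """Reinterpret a Python float as its 64-bit pattern."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def from_bits(bits):
    """Reinterpret a 64-bit pattern as a Python float."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def fminmax_model(a, b, fmm):
    """Return (result_bits, any_snan) for 64-bit patterns a, b and 4-bit fmm."""
    is_max, is_mag, mode = (fmm >> 3) & 1, (fmm >> 2) & 1, fmm & 0b11
    abs_a, abs_b = a & ~SIGN, b & ~SIGN
    a_nan, b_nan = abs_a > INF, abs_b > INF
    a_snan = a_nan and not (a & QUIET)
    b_snan = b_nan and not (b & QUIET)
    any_snan = a_snan or b_snan
    if a_nan or b_nan:
        if mode == 0b00 and any_snan:   # min/maxnum08: first sNaN, quieted
            result = (a if a_snan else b) | QUIET
        elif mode == 0b01:              # min/max19: any NaN propagates
            result = (a if a_nan else b) | QUIET
        elif mode == 0b11:              # min/maxc: always the second operand
            result = b
        elif a_nan and b_nan:           # num08 (quiet NaNs only) and num19
            result = a | QUIET
        else:                           # exactly one NaN: return the number
            result = b if a_nan else a
        return result, any_snan
    cmp_l, cmp_r = a, b
    if is_mag and abs_a != abs_b:       # min/maxmag: compare magnitudes
        cmp_l, cmp_r = abs_a, abs_b
    if mode == 0b11:                    # min/maxc: -0.0 and +0.0 compare equal
        if abs_a == 0:
            cmp_l = 0
        if abs_b == 0:
            cmp_r = 0
    if is_max:                          # swap so "less than" acts as "greater"
        cmp_l, cmp_r = cmp_r, cmp_l
    if cmp_l & SIGN:
        # sign-magnitude: among negatives, the bigger pattern is the smaller value
        pick_a = not (cmp_r & SIGN) or cmp_l > cmp_r
    else:
        pick_a = not (cmp_r & SIGN) and cmp_l < cmp_r
    return (a if pick_a else b), any_snan


# fmaxmagnum19 (FMM=0b1110) picks the operand with the larger magnitude.
bits, snan_seen = fminmax_model(to_bits(-3.0), to_bits(2.0), 0b1110)
largest_magnitude = from_bits(bits)
```

Working on integer bit patterns rather than Python floats keeps the signalling/quiet NaN distinction and the sign of zero visible, both of which the modes treat differently.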

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

## Floating Minimum/Maximum Single MM-form

* fminmaxs FRT, FRA, FRB, FMM
* fminmaxs. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21   |25  |31  |
    | PO  | FRT | FRA | FRB | FMM | XO | Rc |
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Extended Mnemonics:

see [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

\newpage{}

# Fixed-Point Instructions

These are signed and unsigned, min or max. SVP64 Prefixing defines Saturation
semantics, so Saturated variants of these instructions need not be proposed.

## `MMM` -- Integer Min/Max Mode

<a id="mmm-integer-min-max-mode"></a>

* bit 0: set if word variant else dword
* bit 1: set if signed else unsigned
* bit 2: set if max else min

| `MMM` | Extended Mnemonic | Semantics                                    |
|-------|-------------------|----------------------------------------------|
| 000   | `minu RT,RA,RB`   | `(uint64_t)RA < (uint64_t)RB ? RA : RB` |
| 001   | `maxu RT,RA,RB`   | `(uint64_t)RA > (uint64_t)RB ? RA : RB` |
| 010   | `mins RT,RA,RB`   | ` (int64_t)RA < (int64_t)RB  ? RA : RB` |
| 011   | `maxs RT,RA,RB`   | ` (int64_t)RA > (int64_t)RB  ? RA : RB` |
| 100   | `minuw RT,RA,RB`  | `(uint32_t)RA < (uint32_t)RB ? RA : RB` |
| 101   | `maxuw RT,RA,RB`  | `(uint32_t)RA > (uint32_t)RB ? RA : RB` |
| 110   | `minsw RT,RA,RB`  | ` (int32_t)RA < (int32_t)RB  ? RA : RB` |
| 111   | `maxsw RT,RA,RB`  | ` (int32_t)RA > (int32_t)RB  ? RA : RB` |

## Minimum/Maximum MM-form

* minmax RT, RA, RB, MMM
* minmax. RT, RA, RB, MMM

```
    |0    |6    |11   |16   |21   |24 |25  |31  |
    | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
```

```
    a <- (RA|0)
    b <- (RB)
    if MMM[0] then  # word mode
        # shift left by XLEN/2 to make the dword comparison
        # do word comparison of the original inputs
        a <- a[XLEN/2:XLEN-1] || [0] * XLEN/2
        b <- b[XLEN/2:XLEN-1] || [0] * XLEN/2
    if MMM[1] then  # signed mode
        # invert sign bits to make the unsigned comparison
        # do signed comparison of the original inputs
        a[0] <- ¬a[0]
        b[0] <- ¬b[0]
    # if Rc = 1 then store the result of comparing a and b to CR0
    if Rc = 1 then
        if a <u b then
            CR0 <- 0b100 || XER.SO
        if a = b then
            CR0 <- 0b001 || XER.SO
        if a >u b then
            CR0 <- 0b010 || XER.SO
    if MMM[2] then  # max mode
        # swap a and b to make the less than comparison do
        # greater than comparison of the original inputs
        t <- a
        a <- b
        b <- t
    # store the entire selected source (even in word mode)
    if a <u b then RT <- (RA|0)
    else RT <- (RB)
```

Compute the integer minimum/maximum according to `MMM` of `(RA|0)` and `(RB)`
and store the result in `RT`.
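
The two tricks in the pseudocode above (shifting the low word into the high half, and inverting the sign bit so that an unsigned comparison orders signed values) can be checked with a small Python model. This is a sketch under stated assumptions: `minmax_model` is a hypothetical name, XLEN is fixed at 64, the caller is assumed to have already resolved `(RA|0)` to an operand value, and the returned `cr0` is the 4-bit value that would be written when Rc=1.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def minmax_model(ra, rb, mmm, xer_so=0):
    """Return (rt, cr0) for XLEN-bit patterns ra, rb and a 3-bit mmm.

    cr0 holds LT, GT, EQ, SO from most to least significant bit.
    """
    is_word, is_signed, is_max = (mmm >> 2) & 1, (mmm >> 1) & 1, mmm & 1
    a, b = ra & MASK, rb & MASK
    if is_word:
        # move the low word into the high half so that a full-width
        # comparison only sees the original low word
        a = (a << (XLEN // 2)) & MASK
        b = (b << (XLEN // 2)) & MASK
    if is_signed:
        # inverting the sign bit maps signed order onto unsigned order
        a ^= 1 << (XLEN - 1)
        b ^= 1 << (XLEN - 1)
    if a < b:
        cr0 = 0b1000
    elif a > b:
        cr0 = 0b0100
    else:
        cr0 = 0b0010
    cr0 |= xer_so & 1
    if is_max:
        # swap so that "less than" selects the greater original input
        a, b = b, a
    # the entire selected source is returned, even in word mode
    rt = ra if a < b else rb
    return rt & MASK, cr0


# minsw: the low word of RA is -1, so RA is selected in full,
# including its nonzero upper word.
rt, cr0 = minmax_model(0x0000_0001_FFFF_FFFF, 0x0000_0000_0000_0001, 0b110)
```

Note that the CR0 bits reflect the comparison before the max-mode swap, so they describe the relation between the two inputs rather than which one was selected.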

Special Registers altered:

```
    CR0     (if Rc=1)
```

Extended Mnemonics:

see [`MMM` -- Integer Min/Max Mode](#mmm-integer-min-max-mode)

----------

\newpage{}

# Instruction Formats

Add the following entries to Book I 1.6.1 Word Instruction Formats:

## MM-FORM

```
    |0    |6    |11   |16   |21   |24 |25  |31  |
    | PO  | FRT | FRA | FRB | FMM     | XO | Rc |
    | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
```

Add the following new fields to Book I 1.6.2 Word Instruction Fields:

```
    FMM (21:24)
        Field used to specify minimum/maximum mode for fminmax[s].

        Formats: MM

    MMM (21:23)
        Field used to specify minimum/maximum mode for integer minmax.

        Formats: MM
```

Add `MM` to the `Formats:` list for all of `FRT`, `FRA`, `FRB`, `XO (25:30)`,
`Rc`, `RT`, `RA` and `RB`.
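
As a rough illustration of the MM-FORM layout, the sketch below packs the fields into a 32-bit word using the MSB0 bit positions from the diagram. The function names are hypothetical, and because this RFC text does not state the `PO` and `XO` values, they are taken as parameters; any values passed in the usage line are placeholders only.

```python
def _field(value, width, last_bit):
    """Place `value` in a field `width` bits wide ending at MSB0 bit `last_bit`."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"value {value} does not fit in {width} bits")
    return value << (31 - last_bit)


def encode_fminmax(po, xo, frt, fra, frb, fmm, rc=0):
    """Pack the FPR variant: PO, FRT, FRA, FRB, FMM (21:24), XO (25:30), Rc."""
    return (
        _field(po, 6, 5)
        | _field(frt, 5, 10)
        | _field(fra, 5, 15)
        | _field(frb, 5, 20)
        | _field(fmm, 4, 24)
        | _field(xo, 6, 30)
        | _field(rc, 1, 31)
    )


def encode_minmax(po, xo, rt, ra, rb, mmm, rc=0):
    """Pack the GPR variant: MMM occupies 21:23 and bit 24 is reserved (0)."""
    return (
        _field(po, 6, 5)
        | _field(rt, 5, 10)
        | _field(ra, 5, 15)
        | _field(rb, 5, 20)
        | _field(mmm, 3, 23)
        | _field(xo, 6, 30)
        | _field(rc, 1, 31)
    )


# Placeholder PO/XO of 0, chosen only to show the operand field positions.
word = encode_fminmax(po=0, xo=0, frt=1, fra=2, frb=3, fmm=0b1001)
```

The only difference between the two variants is the mode field: `FMM` uses all of bits 21:24, whereas `MMM` uses bits 21:23 and leaves bit 24 reserved.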

----------

\newpage{}

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | Mnemonic | Description |
|------|------|------|---------|----------|-------------|
| MM   | I    | #    | 3.2B    | fminmax  | Floating Minimum/Maximum |
| MM   | I    | #    | 3.2B    | fminmaxs | Floating Minimum/Maximum Single |
| MM   | I    | #    | 3.2B    | minmax   | Minimum/Maximum |

## fmax instruction count

Emulating `fmax` in SFFS requires 32 instructions.

```
    #include <stdint.h>
    #include <string.h>

    inline uint64_t asuint64(double f) {
        union {
            double f;
            uint64_t i;
        } u = {f};
        return u.i;
    }

    inline int issignaling(double v) {
        // copied from glibc:
        // https://github.com/bminor/glibc/blob/e2756903/sysdeps/ieee754/dbl-64/math_config.h#L101
        uint64_t ix = asuint64(v);
        return 2 * (ix ^ 0x0008000000000000) > 2 * 0x7ff8000000000000ULL;
    }

    double fmax(double x, double y) {
        // copied from glibc:
        // https://github.com/bminor/glibc/blob/e2756903/math/s_fmax_template.c
        if(__builtin_isgreaterequal(x, y))
            return x;
        else if(__builtin_isless(x, y))
            return y;
        else if(issignaling(x) || issignaling(y))
            return x + y;
        else
            return __builtin_isnan(y) ? x : y;
    }
```

Assembly listing:

```
    fmax(double, double):
        fcmpu 0,1,2
        fmr 0,1
        cror 30,1,2
        beq 7,.L12
        blt 0,.L13
        stfd 1,-16(1)
        lis 9,0x8
        li 8,-1
        sldi 9,9,32
        rldicr 8,8,0,11
        ori 2,2,0
        ld 10,-16(1)
        xor 10,10,9
        sldi 10,10,1
        cmpld 0,10,8
        bgt 0,.L5
        stfd 2,-16(1)
        ori 2,2,0
        ld 10,-16(1)
        xor 9,10,9
        sldi 9,9,1
        cmpld 0,9,8
        ble 0,.L6
    .L5:
        fadd 1,0,2
        blr
    .L13:
        fmr 1,2
        blr
    .L6:
        fcmpu 0,2,2
        fmr 1,2
        bnulr 0
    .L12:
        fmr 1,0
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
```

[[!tag opf_rfc]]