openpower/sv/rfc/ls013.mdwn

   1 # RFC ls013 Min/Max GPR/FPR
   2
   3 **URLs**:
   4
   5 * <https://libre-soc.org/openpower/sv/rfc/ls013/>
   6 * <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
   7 * <https://bugs.libre-soc.org/show_bug.cgi?id=1057>
   8
   9 **Severity**: Major
  10
  11 **Status**: New
  12
  13 **Date**: 14 Apr 2023
  14
  15 **Target**: v3.2B
  16
  17 **Source**: v3.1B
  18
  19 **Books and Section affected**:
  20
  21 ```
  22     Book I Fixed-Point and Floating-Point Instructions
  23     Appendix E Power ISA sorted by opcode
  24     Appendix F Power ISA sorted by version
  25     Appendix G Power ISA sorted by Compliancy Subset
  26     Appendix H Power ISA sorted by mnemonic
  27 ```
  28
  29 **Summary**
  30
  31 ```
  32     Instructions added
  33 ```
  34
  35 **Submitter**: Luke Leighton (Libre-SOC)
  36
  37 **Requester**: Libre-SOC
  38
  39 **Impact on processor**:
  40
  41 ```
  42     Addition of new GPR-based and FPR-based instructions
  43 ```
  44
  45 **Impact on software**:
  46
  47 ```
  48     Requires support for new instructions in assembler, debuggers,
  49     and related tools.
  50 ```
  51
  52 **Keywords**:
  53
  54 ```
  55     GPR, FPR, min, max, fmin, fmax
  56 ```
  57
  58 **Motivation**
  59
   60 Minimum and maximum are common operations that can take a surprisingly large
   61 number of instructions to implement in software. Additionally, Vector
   62 Reduce-Min/Max are common vector operations, and SVP64 Parallel Reduction needs
   63 a single Scalar instruction in order to implement Reduce-Min/Max effectively.
  64
  65 **Notes and Observations**:
  66
   67 1. SVP64 REMAP Parallel Reduction needs a single Scalar instruction to
   68     work with, for best effectiveness.  Without SFFS minimum/maximum
   69     instructions, Simple-V min/max Parallel Reduction is severely compromised.
   70 2. Once one FP min/max mode is implemented, the rest need little extra hardware.
   71 3. Similar instructions exist in VSX (although not IEEE754-2019 ones).
   72     This is frequently used to justify not adding them. However, SVP64/VSX may
   73     have a different meaning from SVP64/SFFS, so it is *really* crucial to have
   74     SFFS ops, even if "equivalent" to VSX, in order for SVP64 not to be
   75     compromised (non-orthogonal).
   76 4. FP min/max are rather complex to implement in software: the most commonly
   77     used FP max function, `fmax` from glibc, compiled for SFFS is an astounding
   78     32 instructions.
  79
  80 **Changes**
  81
  82 Add the following entries to:
  83
  84 * the Appendices of Book I
  85 * Book I 3.3.9 Fixed-Point Arithmetic Instructions
  86 * Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
  87 * Book I 1.6.1 and 1.6.2
  88
  89 ----------------
  90
  91 \newpage{}
  92
  93 # Floating-Point Instructions
  94
   95 This group provides Floating-Point min/max. However, with IEEE754 having advanced
   96 to 2019, there are now subtle differences between versions. These are selectable with a Mode Field, `FMM`.
  97
  98 ## `FMM` -- Floating Min/Max Mode
  99
 100 <a id="fmm-floating-min-max-mode"></a>
 101
 102 | `FMM` | Assembly Alias                | Origin                         | Semantics                                       |
 103 |-------|-------------------------------|--------------------------------|-------------------------------------------------|
 104 | 0000  | fminnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = minNum(FRA, FRB)  (1)                     |
 105 | 0001  | fmin19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = minimum(FRA, FRB)                         |
 106 | 0010  | fminnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minimumNumber(FRA, FRB)                   |
 107 | 0011  | fminc[s] FRT, FRA, FRB        | x86 minss or Win32's min macro | FRT = FRA \< FRB ? FRA : FRB                    |
 108 | 0100  | fminmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2) |
 109 | 0101  | fminmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fmin19) (2)    |
 110 | 0110  | fminmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2) |
 111 | 0111  | fminmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, False, fminc) (2)     |
 112 | 1000  | fmaxnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = maxNum(FRA, FRB)  (1)                     |
 113 | 1001  | fmax19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = maximum(FRA, FRB)                         |
 114 | 1010  | fmaxnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = maximumNumber(FRA, FRB)                   |
 115 | 1011  | fmaxc[s] FRT, FRA, FRB        | x86 maxss or Win32's max macro | FRT = FRA > FRB ? FRA : FRB                     |
 116 | 1100  | fmaxmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2)  |
 117 | 1101  | fmaxmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmax19) (2)     |
 118 | 1110  | fmaxmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2)  |
 119 | 1111  | fmaxmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2)      |
 120
 121 Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
 122     +0.0. This is left unspecified in IEEE 754-2008.
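
The differences between the modes show up mainly with NaN and signed-zero inputs. The following is a minimal Python sketch of three of the `min` modes, not normative text. It assumes quiet NaNs only: signalling NaNs (and therefore VXSNAN and the difference between `fminnum08` and `fminnum19`) are not modelled. The helper name `_lt` is hypothetical and exists only for this illustration.

```python
import math

def _lt(x, y):
    """x < y, treating -0.0 as less than +0.0 (see Note (1))."""
    if x == 0.0 and y == 0.0:
        return math.copysign(1.0, x) < math.copysign(1.0, y)
    return x < y

def fminnum08(x, y):
    """minNum (quiet NaNs only): a single NaN input is ignored."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if _lt(x, y) else y

def fmin19(x, y):
    """minimum: any NaN input gives a NaN result."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if _lt(x, y) else y

def fminc(x, y):
    """FRA < FRB ? FRA : FRB, exactly as written in the table."""
    return x if x < y else y
```

In this sketch `fminc` is sensitive to operand order: `fminc(nan, 1.0)` gives `1.0` whereas `fminc(1.0, nan)` gives NaN, and for two zeros of opposite sign it always returns the second operand.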
 123
  124 Note (2): minmaxmag(x, y, is_max, fallback) is defined as:
 125
 126 ```python
 127 def minmaxmag(x, y, is_max, fallback):
 128     a = abs(x) < abs(y)
 129     b = abs(x) > abs(y)
 130     if is_max:
 131         a, b = b, a  # swap
 132     if a:
 133         return x
 134     if b:
 135         return y
 136     # equal magnitudes, or NaN input(s)
 137     return fallback(x, y)
 138 ```
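
As an illustration of how `minmaxmag` composes with a fallback, the sketch below builds the `fminmagc` and `fmaxmagc` aliases from the table. `minmaxmag` is repeated from Note (2) so that the block is self-contained; `fminc` and `fmaxc` follow the table's semantics column. This is an informal model only, not part of the proposed specification text.

```python
def minmaxmag(x, y, is_max, fallback):
    # repeated from Note (2)
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)

def fminc(x, y):
    return x if x < y else y  # FRA < FRB ? FRA : FRB

def fmaxc(x, y):
    return x if x > y else y  # FRA > FRB ? FRA : FRB

def fminmagc(x, y):
    return minmaxmag(x, y, False, fminc)

def fmaxmagc(x, y):
    return minmaxmag(x, y, True, fmaxc)
```

For example, `fminmagc(-3.0, 2.0)` selects `2.0` (smaller magnitude) and `fmaxmagc(-3.0, 2.0)` selects `-3.0`. With equal magnitudes, such as `2.0` and `-2.0`, neither magnitude comparison succeeds and the result comes from the fallback.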
 139
  140 Note (3): TODO: confirm whether IEEE 754-2008 has min/maxMagNum like IEEE 754-2019's
  141     minimum/maximumMagnitudeNumber
 142
 143 ----------------
 144
 145 \newpage{}
 146
 147 ## Floating Minimum/Maximum MM-form
 148
 149 * fminmax FRT, FRA, FRB, FMM
 150 * fminmax. FRT, FRA, FRB, FMM
 151
 152 ```
 153     |0    |6    |11   |16   |21   |25  |31  |
 154     | PO  | FRT | FRA | FRB | FMM | XO | Rc |
 155 ```
 156
 157 Special Registers altered:
 158
 159 ```
 160     FX VXSNAN
 161     CR1     (if Rc=1)
 162 ```
 163 Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
 164 result in FRT.
 165
 166 Assembly Aliases: see
 167 [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)
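
Reading the `FMM` table, the encoding appears to be regular: in MSB0 numbering, `FMM[0]` selects max, `FMM[1]` selects the magnitude variant, and `FMM[2:3]` selects the semantics. That decomposition is an observation drawn from the table rather than something the text states explicitly. Under that assumption, a small Python sketch (the function name `fmm_alias` is hypothetical) can map a mode value to its assembly alias:

```python
# FMM[2:3] -> alias suffix, in table order
_SEMANTICS = ("num08", "19", "num19", "c")

def fmm_alias(fmm, single=False):
    """Return the assembly alias for a 4-bit FMM value."""
    if not 0 <= fmm <= 0b1111:
        raise ValueError("FMM is a 4-bit field")
    op = "max" if fmm & 0b1000 else "min"  # FMM[0]
    mag = "mag" if fmm & 0b0100 else ""    # FMM[1]
    name = "f" + op + mag + _SEMANTICS[fmm & 0b0011]
    # the [s] suffix in the table marks the Single variant
    return name + ("s" if single else "")
```

A disassembler could use such a mapping to print `fminmax FRT, FRA, FRB, 0b0101` as `fminmag19 FRT, FRA, FRB`.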
 168
 169 ----------
 170
 171 ## Floating Minimum/Maximum Single MM-form
 172
 173 * fminmaxs FRT, FRA, FRB, FMM
 174 * fminmaxs. FRT, FRA, FRB, FMM
 175
 176 ```
 177     |0    |6    |11   |16   |21   |25  |31  |
 178     | PO  | FRT | FRA | FRB | FMM | XO | Rc |
 179 ```
 180
 181 Special Registers altered:
 182
 183 ```
 184     FX VXSNAN
 185     CR1     (if Rc=1)
 186 ```
 187
 188
 189 Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
 190 result in FRT.
 191
 192 Assembly Aliases: see
 193 [`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)
 194
 195 ----------
 196
 197 \newpage{}
 198
 199 # Fixed-Point Instructions
 200
  201 These are signed and unsigned, min or max.  SVP64 Prefixing defines Saturation
  202 semantics, therefore Saturated variants of these instructions need not be proposed.
 203
 204 ## `MMM` -- Integer Min/Max Mode
 205
 206 <a id="mmm-integer-min-max-mode"></a>
 207
 208 * bit 0: set if word variant else dword
 209 * bit 1: set if signed else unsigned
 210 * bit 2: set if max else min
 211
 212 | `MMM` | Assembly Alias   | Semantics                                    |
 213 |-------|------------------|----------------------------------------------|
 214 | 000   | `minu RT,RA,RB`  | `RT = (uint64_t)RA < (uint64_t)RB ? RA : RB` |
 215 | 001   | `maxu RT,RA,RB`  | `RT = (uint64_t)RA > (uint64_t)RB ? RA : RB` |
 216 | 010   | `mins RT,RA,RB`  | `RT =  (int64_t)RA < (int64_t)RB  ? RA : RB` |
 217 | 011   | `maxs RT,RA,RB`  | `RT =  (int64_t)RA > (int64_t)RB  ? RA : RB` |
 218 | 100   | `minuw RT,RA,RB` | `RT = (uint32_t)RA < (uint32_t)RB ? RA : RB` |
 219 | 101   | `maxuw RT,RA,RB` | `RT = (uint32_t)RA > (uint32_t)RB ? RA : RB` |
 220 | 110   | `minsw RT,RA,RB` | `RT =  (int32_t)RA < (int32_t)RB  ? RA : RB` |
 221 | 111   | `maxsw RT,RA,RB` | `RT =  (int32_t)RA > (int32_t)RB  ? RA : RB` |
 222
 223 ## Minimum/Maximum MM-Form
 224
 225 * minmax RT, RA, RB, MMM
 226 * minmax. RT, RA, RB, MMM
 227
 228 ```
 229     |0    |6    |11   |16   |21   |24 |25  |31  |
 230     | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
 231 ```
 232
 233 ```
 234     a <- (RA|0)
 235     b <- (RB)
 236     if MMM[0] then  # word mode
 237         # shift left by XLEN/2 to make the dword comparison
 238         # do word comparison of the original inputs
 239         a <- a[XLEN/2:XLEN-1] || [0] * XLEN/2
 240         b <- b[XLEN/2:XLEN-1] || [0] * XLEN/2
 241     if MMM[1] then  # signed mode
 242         # invert sign bits to make the unsigned comparison
 243         # do signed comparison of the original inputs
 244         a[0] <- ¬a[0]
 245         b[0] <- ¬b[0]
 246     if MMM[2] then  # max mode
 247         # swap a and b to make the less than comparison do
 248         # greater than comparison of the original inputs
 249         t <- a
 250         a <- b
 251         b <- t
 252     # store the entire selected source (even in word mode)
 253     # if Rc = 1 then store the result of comparing a and b to CR0
 254     if a <u b then
 255         RT <- (RA|0)
 256         if Rc = 1 then CR0 <- 0b100 || XER.SO
 257     if a = b then
 258         RT <- (RB)
 259         if Rc = 1 then CR0 <- 0b001 || XER.SO
 260     if a >u b then
 261         RT <- (RB)
 262         if Rc = 1 then CR0 <- 0b010 || XER.SO
 263 ```
 264
 265 Compute the integer minimum/maximum according to `MMM` of `(RA|0)` and `(RB)`
 266 and store the result in `RT`.
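
The pseudocode above can be modelled in Python as a cross-check. This is a minimal sketch assuming XLEN=64, taking the two source register values directly (the `(RA|0)` handling is not modelled) and returning only the three LT/GT/EQ bits of CR0 without `XER.SO`. The function name and return shape are choices made for this illustration.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def minmax(ra, rb, mmm):
    """Return (RT, CR0[0:2]) for the given sources and 3-bit MMM."""
    word = bool(mmm & 0b100)    # MMM[0] in MSB0 numbering
    signed = bool(mmm & 0b010)  # MMM[1]
    is_max = bool(mmm & 0b001)  # MMM[2]
    ra &= MASK
    rb &= MASK
    a, b = ra, rb
    if word:
        # move the low word into the high half, zero the rest
        a = (a << (XLEN // 2)) & MASK
        b = (b << (XLEN // 2)) & MASK
    if signed:
        # flip sign bits so unsigned compare orders like signed
        a ^= 1 << (XLEN - 1)
        b ^= 1 << (XLEN - 1)
    if is_max:
        a, b = b, a
    # the entire selected source is stored, even in word mode
    if a < b:
        return ra, 0b100
    if a == b:
        return rb, 0b001
    return rb, 0b010
```

For example, `minmax(0x1_0000_0002, 3, 0b100)` (`minuw`) compares only the low words 2 and 3, yet returns the full value `0x1_0000_0002`, matching the note that the entire selected source is stored.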
 267
 268 Special Registers altered:
 269
 270 ```
 271     CR0     (if Rc=1)
 272 ```
 273
 274 Assembly Aliases: see
 275 [`MMM` -- Integer Min/Max Mode](#mmm-integer-min-max-mode)
 276
 277 ----------
 278
 279 \newpage{}
 280
 281 # Instruction Formats
 282
 283 Add the following entries to Book I 1.6.1 Word Instruction Formats:
 284
 285 ## MM-FORM
 286
 287 ```
 288     |0    |6    |11   |16   |21   |24 |25  |31  |
 289     | PO  | FRT | FRA | FRB | FMM     | XO | Rc |
 290     | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
 291 ```
 292
 293 Add the following new fields to Book I 1.6.2 Word Instruction Fields:
 294
 295 ```
 296     FMM (21:24)
 297         Field used to specify minimum/maximum mode for fminmax[s].
 298
 299         Formats: MM
 300
 301     MMM (21:23)
 302         Field used to specify minimum/maximum mode for integer minmax.
 303
 304         Formats: MM
 305 ```
 306
 307 Add `MM` to the `Formats:` list for all of `FRT`, `FRA`, `FRB`, `XO (25:30)`,
 308 `Rc`, `RT`, `RA` and `RB`.
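
To make the field positions concrete, the following Python sketch packs MM-form fields into a 32-bit instruction word using the MSB0 bit ranges given above (a field ending at bit n is shifted left by 31-n). The function name `encode_mm` is hypothetical, and since this RFC does not state PO or XO values here, any values passed for them are placeholders supplied by the caller.

```python
def encode_mm(po, t, a, b, mode, xo, rc, fp):
    """Pack MM-form fields; fp=True uses FMM (21:24), else MMM (21:23)."""
    mode_width = 4 if fp else 3
    fields = (("PO", po, 6), ("T", t, 5), ("A", a, 5), ("B", b, 5),
              ("mode", mode, mode_width), ("XO", xo, 6), ("Rc", rc, 1))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    # FMM ends at bit 24, MMM at bit 23 (bit 24 is then reserved, 0)
    mode_shift = (31 - 24) if fp else (31 - 23)
    return ((po << 26) | (t << 21) | (a << 16) | (b << 11)
            | (mode << mode_shift) | (xo << 1) | rc)
```

With all other fields zero, an `MMM` of `0b111` lands in bits 21:23 (`0b111 << 8`) while an `FMM` of `0b1111` lands in bits 21:24 (`0b1111 << 7`), which shows why the integer form leaves bit 24 reserved.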
 309
 310 ----------
 311
 312 \newpage{}
 313
 314 # Appendices
 315
 316     Appendix E Power ISA sorted by opcode
 317     Appendix F Power ISA sorted by version
 318     Appendix G Power ISA sorted by Compliancy Subset
 319     Appendix H Power ISA sorted by mnemonic
 320
 321 | Form | Book | Page | Version | Mnemonic | Description |
 322 |------|------|------|---------|----------|-------------|
 323 | MM   | I    | #    | 3.2B    | fminmax  | Floating Minimum/Maximum |
 324 | MM   | I    | #    | 3.2B    | fminmaxs | Floating Minimum/Maximum Single |
 325 | MM   | I    | #    | 3.2B    | minmax   | Minimum/Maximum |
 326
 327 ## fmax instruction count
 328
 329 32 instructions are required in SFFS to emulate fmax.
 330 <https://gcc.godbolt.org/z/6xba61To6>
 331
 332
 333 ```
 334     fmax(double, double):
 335         fcmpu 0,1,2
 336         fmr 0,1
 337         cror 30,1,2
 338         beq 7,.L12
 339         blt 0,.L13
 340         stfd 1,-16(1)
 341         lis 9,0x8
 342         li 8,-1
 343         sldi 9,9,32
 344         rldicr 8,8,0,11
 345         ori 2,2,0
 346         ld 10,-16(1)
 347         xor 10,10,9
 348         sldi 10,10,1
 349         cmpld 0,10,8
 350         bgt 0,.L5
 351         stfd 2,-16(1)
 352         ori 2,2,0
 353         ld 10,-16(1)
 354         xor 9,10,9
 355         sldi 9,9,1
 356         cmpld 0,9,8
 357         ble 0,.L6
 358 .L5:
 359         fadd 1,0,2
 360         blr
 361 .L13:
 362         fmr 1,2
 363         blr
 364 .L6:
 365         fcmpu 0,2,2
 366         fmr 1,2
 367         bnulr 0
 368 .L12:
 369         fmr 1,0
 370         blr
 371         .long 0
 372         .byte 0,9,0,0,0,0,0,0
 373 ```
 374
 375 [[!tag opf_rfc]]
 376