# RFC ls013 Min/Max GPR/FPR

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls013/>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1057>

**Severity**: Major

**Status**: New

**Date**: 14 Apr 2023

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**:

```
    Book I Fixed-Point and Floating-Point Instructions
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of new GPR-based and FPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, FPR, min, max, fmin, fmax
```

**Motivation**

Minimum/Maximum are common operations that can take a surprisingly large
number of instructions to implement in software (32 for glibc's `fmax` on
SFFS; see "fmax instruction count" at the end of this RFC). Additionally,
Vector Reduce-Min/Max are common vector operations, and SVP64 Parallel
Reduction needs a single Scalar instruction in order to effectively
implement Reduce-Min/Max.

**Notes and Observations**:

1. SVP64 REMAP Parallel Reduction needs a single Scalar instruction to
   work with, for best effectiveness. Without SFFS minimum/maximum
   instructions, Simple-V min/max Parallel Reduction is severely compromised.
2. Once one FP min/max mode is implemented, the remaining modes need little
   additional hardware.
3. Similar instructions exist in VSX (though not the IEEE 754-2019 variants).
   This is frequently used to justify not adding them. However, SVP64/VSX
   may have a different meaning from SVP64/SFFS, so it is *really* crucial
   to have SFFS ops, even if "equivalent" to VSX, in order for SVP64 not to
   be compromised (non-orthogonal).
4. FP min/max are rather complex to implement in software: the most commonly
   used FP max function, `fmax` from glibc, compiles for SFFS to an
   astounding 32 instructions.

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I 3.3.9 Fixed-Point Arithmetic Instructions
* Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# Floating-Point Instructions

This group provides Floating-Point min/max. IEEE 754 has since advanced to the
2019 edition, and the min/max operations of the different editions differ in
subtle ways. The variant is therefore selectable with a Mode Field, `FMM`.

## `FMM` -- Floating Min/Max Mode

<a id="fmm-floating-min-max-mode"></a>

| `FMM` | Assembly Alias                | Origin                         | Semantics                                       |
|-------|-------------------------------|--------------------------------|-------------------------------------------------|
| 0000  | fminnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = minNum(FRA, FRB)  (1)                     |
| 0001  | fmin19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = minimum(FRA, FRB)                         |
| 0010  | fminnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minimumNumber(FRA, FRB)                   |
| 0011  | fminc[s] FRT, FRA, FRB        | x86 minss or Win32's min macro | FRT = FRA \< FRB ? FRA : FRB                    |
| 0100  | fminmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2) |
| 0101  | fminmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fmin19) (2)    |
| 0110  | fminmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2) |
| 0111  | fminmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, False, fminc) (2)     |
| 1000  | fmaxnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = maxNum(FRA, FRB)  (1)                     |
| 1001  | fmax19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = maximum(FRA, FRB)                         |
| 1010  | fmaxnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = maximumNumber(FRA, FRB)                   |
| 1011  | fmaxc[s] FRT, FRA, FRB        | x86 maxss or Win32's max macro | FRT = FRA > FRB ? FRA : FRB                     |
| 1100  | fmaxmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2)  |
| 1101  | fmaxmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmax19) (2)     |
| 1110  | fmaxmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2)  |
| 1111  | fmaxmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2)      |

Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
    +0.0. This is left unspecified in IEEE 754-2008.

Note (2): minmaxmag(x, y, is_max, fallback) is defined as:

```python
def minmaxmag(x, y, is_max, fallback):
    # a: x has the smaller magnitude; b: x has the larger magnitude.
    # Both are False when the magnitudes are equal or an input is NaN.
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a  # swap, so that `a` now selects the larger magnitude
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)
```

Note (3): TODO: check whether IEEE 754-2008 has min/maxMagNum like IEEE 754-2019's
    minimum/maximumMagnitudeNumber
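
The practical differences between the modes show up with NaN and signed-zero
inputs. The following is a minimal Python sketch of three of the min modes
from the table, not a reference implementation. It assumes quiet NaNs only
(Python floats do not expose signalling NaNs, which is where the 2008 minNum
is expected to differ from minimumNumber), and the helper `_less` is a
hypothetical name introduced here to apply the -0.0 < +0.0 ordering of Note (1).

```python
import math

def minmaxmag(x, y, is_max, fallback):
    """Note (2): select by magnitude, deferring ties and NaNs to fallback."""
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a
    if a:
        return x
    if b:
        return y
    return fallback(x, y)

def _less(x, y):
    """x < y, except that -0.0 is ordered below +0.0 (hypothetical helper)."""
    if x == 0.0 and y == 0.0:
        return math.copysign(1.0, x) < math.copysign(1.0, y)
    return x < y

def fmin19(x, y):
    """minimum: any NaN input gives a NaN result."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if _less(x, y) else y

def fminnum19(x, y):
    """minimumNumber: a number is preferred over a (quiet) NaN."""
    if math.isnan(x):
        return y  # still NaN if y is NaN as well
    if math.isnan(y):
        return x
    return x if _less(x, y) else y

def fminc(x, y):
    """FRA < FRB ? FRA : FRB, so y is returned whenever the test is false."""
    return x if x < y else y

nan = math.nan
print(fmin19(nan, 1.0), fminnum19(nan, 1.0))  # nan 1.0
print(fminc(nan, 1.0), fminc(1.0, nan))       # 1.0 nan
print(fmin19(0.0, -0.0), fminc(-0.0, 0.0))    # -0.0 0.0
print(minmaxmag(-3.0, 2.0, False, fmin19))    # 2.0
print(minmaxmag(-2.0, 2.0, False, fmin19))    # -2.0
```

Note that `fminc` is not commutative for NaN or signed-zero inputs, which is
the main reason it is listed as a separate mode.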

----------------

\newpage{}

## Floating Minimum/Maximum

A-Form

* fminmax FRT, FRA, FRB, FMM
* fminmax. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21          |26  |31  |
    | PO  | FRT | FRA | FRB | FMM[0:3] / | XO | Rc |
```

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Assembly Aliases: see
[`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)
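
Reading the `FMM` table, the alias names follow a regular pattern: the most
significant bit selects max over min, the next bit selects the magnitude
variants, and the low two bits select the base semantics. The sketch below
rebuilds an alias name from an `FMM` value under that reading. The function
`fmm_alias` and the `BASE_MODES` list are hypothetical names used only for
illustration; they are not defined by this RFC.

```python
# Base semantics selected by the two least significant FMM bits,
# in the order they appear in the FMM table.
BASE_MODES = ["num08", "19", "num19", "c"]

def fmm_alias(fmm, single=False):
    """Return the assembly alias for a 4-bit FMM value (illustrative only)."""
    if not 0 <= fmm <= 0b1111:
        raise ValueError("FMM is a 4-bit field")
    is_max = bool(fmm & 0b1000)   # FMM[0] in MSB0 numbering
    is_mag = bool(fmm & 0b0100)   # FMM[1]
    base = BASE_MODES[fmm & 0b11] # FMM[2:3]
    name = "f" + ("max" if is_max else "min") + ("mag" if is_mag else "") + base
    return name + ("s" if single else "")

print(fmm_alias(0b0000))               # fminnum08
print(fmm_alias(0b0101))               # fminmag19
print(fmm_alias(0b1011, single=True))  # fmaxcs
```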

----------

## Floating Minimum/Maximum Single

A-Form

* fminmaxs FRT, FRA, FRB, FMM
* fminmaxs. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21          |26  |31  |
    | PO  | FRT | FRA | FRB | FMM[0:3] / | XO | Rc |
```

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Assembly Aliases: see
[`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

\newpage{}

# Fixed-Point Instructions

These are signed and unsigned, min or max. SVP64 Prefixing defines Saturation
semantics, therefore Saturated variants of these instructions need not be proposed.

## Integer Min/Max Mode

The 3-bit `MMM` field is decoded as follows (MSB0 numbering, so bit 0 is the
leftmost bit in the table below):

* bit 0: set if word variant else dword
* bit 1: set if signed else unsigned
* bit 2: set if max else min

| `MMM` | Assembly Alias   | Semantics                                    |
|-------|------------------|----------------------------------------------|
| 000   | `minu RT,RA,RB`  | `RT = (uint64_t)RA < (uint64_t)RB ? RA : RB` |
| 001   | `maxu RT,RA,RB`  | `RT = (uint64_t)RA > (uint64_t)RB ? RA : RB` |
| 010   | `mins RT,RA,RB`  | `RT =  (int64_t)RA < (int64_t)RB  ? RA : RB` |
| 011   | `maxs RT,RA,RB`  | `RT =  (int64_t)RA > (int64_t)RB  ? RA : RB` |
| 100   | `minuw RT,RA,RB` | `RT = (uint32_t)RA < (uint32_t)RB ? RA : RB` |
| 101   | `maxuw RT,RA,RB` | `RT = (uint32_t)RA > (uint32_t)RB ? RA : RB` |
| 110   | `minsw RT,RA,RB` | `RT =  (int32_t)RA < (int32_t)RB  ? RA : RB` |
| 111   | `maxsw RT,RA,RB` | `RT =  (int32_t)RA > (int32_t)RB  ? RA : RB` |

## Integer Min/Max MM-Form

* minmax RT, RA, RB, MMM
* minmax. RT, RA, RB, MMM

```
    |0    |6    |11   |16   |21   |24 |25  |31  |
    | PO  | RT  | RA  | RB  | MMM | / | XO | Rc |
```

```
    a <- (RA)
    b <- (RB)
    if MMM[0] then  # word mode
        # shift left by XLEN/2 so that the dword comparison
        # does a word comparison of the original inputs
        a <- a[XLEN/2:XLEN-1] || [0] * XLEN/2
        b <- b[XLEN/2:XLEN-1] || [0] * XLEN/2
    if MMM[1] then  # signed mode
        # invert sign bits so that the unsigned comparison
        # does a signed comparison of the original inputs
        a[0] <- !a[0]
        b[0] <- !b[0]
    if MMM[2] then  # max mode
        # swap a and b so that the less-than comparison does a
        # greater-than comparison of the original inputs
        t <- a
        a <- b
        b <- t
    # store the entire selected source (even in word mode)
    if a <u b then RT <- (RA)
    else           RT <- (RB)
```

Compute the integer minimum/maximum according to `MMM` of `RA` and `RB` and
store the result in `RT`.

Special Registers altered:

```
    CR0     (if Rc=1)
```
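
The pseudocode above reduces all eight modes to a single unsigned less-than
comparison. As a minimal sketch, it can be modelled in Python as below,
assuming XLEN=64 and register values held as unsigned Python integers. The
function name `minmax` mirrors the mnemonic, while `XLEN` and `MASK` as module
constants are assumptions of this sketch. CR0 and Rc=1 are not modelled.

```python
XLEN = 64
MASK = (1 << XLEN) - 1

def minmax(ra, rb, mmm):
    """Model of the minmax pseudocode; returns the new value of RT."""
    a, b = ra & MASK, rb & MASK
    if mmm & 0b100:  # MMM[0]: word mode
        # move the low half into the high half, zero-filling the low half
        a = (a << (XLEN // 2)) & MASK
        b = (b << (XLEN // 2)) & MASK
    if mmm & 0b010:  # MMM[1]: signed mode
        # flip the sign bit (bit 0 in MSB0 numbering, i.e. the top bit)
        a ^= 1 << (XLEN - 1)
        b ^= 1 << (XLEN - 1)
    if mmm & 0b001:  # MMM[2]: max mode
        a, b = b, a
    # the entire selected source is returned, even in word mode
    return (ra & MASK) if a < b else (rb & MASK)

minus_one = MASK  # two's complement -1 as a 64-bit pattern
print(minmax(minus_one, 1, 0b000) == 1)          # minu: True
print(minmax(minus_one, 1, 0b010) == minus_one)  # mins: True
print(hex(minmax(0x1_0000_0005, 7, 0b100)))      # minuw: 0x100000005
```

The last line shows the word-mode behaviour stated in the pseudocode comment:
only the low words (5 and 7) are compared, but the whole of `RA`, including
its upper half, is written to `RT`.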

----------

\newpage{}

# Instruction Formats

Add the following entry to Book I 1.6.1, A-FORM:

```
    |0    |6    |11   |16   |21          |26  |31  |
    | PO  | FRT | FRA | FRB | FMM[0:3] / | XO | Rc |
```

Add a new field to Book I 1.6.2 Word Instruction Fields:

```
    FMM (21:24)
        Field used to specify minimum/maximum mode for fminmax[s].

        Formats: A
```
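
Because the Power ISA numbers bits from the most significant end (MSB0), the
`FMM (21:24)` field sits at a shift of 31 - 24 = 7 in a conventional
LSB0 view of the 32-bit word. A minimal sketch of inserting and extracting
fields under that numbering follows. The helpers `field` and `place` are
hypothetical names, and PO and XO are left as zero because this RFC does not
assign them.

```python
def field(insn, first, last, width=32):
    """Extract bits first..last (inclusive, MSB0 numbering) from insn."""
    nbits = last - first + 1
    shift = (width - 1) - last
    return (insn >> shift) & ((1 << nbits) - 1)

def place(value, first, last, width=32):
    """Position value in bits first..last (inclusive, MSB0 numbering)."""
    nbits = last - first + 1
    if not 0 <= value < (1 << nbits):
        raise ValueError("value does not fit in the field")
    return value << ((width - 1) - last)

# FRT=1, FRA=2, FRB=3, FMM=0b1001 (fmax19), Rc=1; PO and XO left as 0
# because the RFC does not assign opcode values.
word = (place(1, 6, 10) | place(2, 11, 15) | place(3, 16, 20)
        | place(0b1001, 21, 24) | place(1, 31, 31))

print(field(word, 21, 24))  # 9
print(field(word, 6, 10))   # 1
print(field(word, 31, 31))  # 1
```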

----------

\newpage{}

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description                     |
|------|------|------|---------|----------|---------------------------------|
| A    | I    | #    | 3.2B    | fminmax  | Floating Minimum/Maximum        |
| A    | I    | #    | 3.2B    | fminmaxs | Floating Minimum/Maximum Single |
| ???  | I    | #    | 3.2B    | minmax   | Minimum/Maximum Signed/Unsigned |

## fmax instruction count

32 instructions are required in SFFS to emulate fmax.
<https://gcc.godbolt.org/z/6xba61To6>

```
    fmax(double, double):
        fcmpu 0,1,2
        fmr 0,1
        cror 30,1,2
        beq 7,.L12
        blt 0,.L13
        stfd 1,-16(1)
        lis 9,0x8
        li 8,-1
        sldi 9,9,32
        rldicr 8,8,0,11
        ori 2,2,0
        ld 10,-16(1)
        xor 10,10,9
        sldi 10,10,1
        cmpld 0,10,8
        bgt 0,.L5
        stfd 2,-16(1)
        ori 2,2,0
        ld 10,-16(1)
        xor 9,10,9
        sldi 9,9,1
        cmpld 0,9,8
        ble 0,.L6
.L5:
        fadd 1,0,2
        blr
.L13:
        fmr 1,2
        blr
.L6:
        fcmpu 0,2,2
        fmr 1,2
        bnulr 0
.L12:
        fmr 1,0
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
```

[[!tag opf_rfc]]
