# RFC ls013 Min/Max GPR/FPR

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls013/>
* <https://git.openpower.foundation/isa/PowerISA/issues/TODO>
* <https://bugs.libre-soc.org/show_bug.cgi?id=1057>

**Severity**: Major

**Status**: New

**Date**: 14 Apr 2023

**Target**: v3.2B

**Source**: v3.1B

**Books and Section affected**:

```
    Book I Fixed-Point and Floating-Point Instructions
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of new GPR-based and FPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, FPR, min, max, fmin, fmax
```

**Motivation**

TODO

**Notes and Observations**:

1. SVP64 REMAP Parallel Reduction works best with a single Scalar
   instruction. Without SFFS minimum/maximum instructions, Simple-V
   min/max Parallel Reduction is severely compromised.
2. Once one FP min/max mode is implemented, the remaining modes need
   little additional hardware.
3. Similar instructions exist in VSX (although not the IEEE 754-2019
   variants). This is frequently used to justify not adding them.
   However, SVP64/VSX may have a different meaning from SVP64/SFFS,
   so it is *really* crucial to have SFFS ops, even if "equivalent" to
   VSX, in order for SVP64 not to be compromised (non-orthogonal).
4. FP min/max are rather complex to implement in software: the most
   commonly used FP max function, `fmax` from glibc, compiled for SFFS
   is an astounding 32 instructions.

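To illustrate note 1, the sketch below shows a generic pairwise (tree)
reduction driven by a single two-operand function. It is only meant to show
why one scalar max operation is the natural building block for a reduction.
It does not model the actual SVP64 REMAP schedule, and `scalar_max` and
`tree_reduce` are hypothetical names introduced here.

```python
def scalar_max(a, b):
    """Stand-in for a single two-operand scalar max instruction."""
    return a if a > b else b


def tree_reduce(op, values):
    """Pairwise reduction: combine neighbours until one element is left."""
    items = list(values)
    if not items:
        raise ValueError("cannot reduce an empty sequence")
    while len(items) > 1:
        paired = [op(items[i], items[i + 1])
                  for i in range(0, len(items) - 1, 2)]
        if len(items) % 2:
            paired.append(items[-1])  # odd element carried to the next level
        items = paired
    return items[0]


print(tree_reduce(scalar_max, [3, 9, 2, 7, 5]))  # prints 9
```

If `op` had to be a 32-instruction software routine instead of one
instruction, every node of the tree would pay that cost.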

**Changes**

Add the following entries to:

* the Appendices of Book I
* Book I 3.3.9 Fixed-Point Arithmetic Instructions
* Book I 4.6.6.1 Floating-Point Elementary Arithmetic Instructions
* Book I 1.6.1 and 1.6.2

----------------

\newpage{}

# Floating-Point Instructions

This group provides Floating-Point min/max. However, with IEEE 754 having
advanced to the 2019 revision, there are now subtle differences between the
available definitions. These are selectable with a Mode Field, `FMM`.

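The differences mostly concern NaN operands and signed zeros. The following
is a minimal sketch of three of the modes, using the alias names from the
`FMM` table as Python function names. It assumes Python floats stand in for
FPR values, the helper `lt_zero_aware` is hypothetical, and signaling NaNs
(and therefore VXSNAN) are not modelled at all.

```python
import math


def lt_zero_aware(x, y):
    """x < y, with -0.0 ordered below +0.0."""
    if x == 0.0 and y == 0.0:
        return math.copysign(1.0, x) < math.copysign(1.0, y)
    return x < y


def fmin19(x, y):
    """IEEE 754-2019 minimum: any NaN operand gives NaN."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if lt_zero_aware(x, y) else y


def fminnum19(x, y):
    """IEEE 754-2019 minimumNumber: a lone NaN operand is ignored."""
    if math.isnan(x):
        return y  # also covers both-NaN: the result is then NaN
    if math.isnan(y):
        return x
    return x if lt_zero_aware(x, y) else y


def fminc(x, y):
    """C-style ternary: FRA < FRB ? FRA : FRB."""
    return x if x < y else y


nan = math.nan
print(fmin19(1.0, nan), fminnum19(1.0, nan), fminc(1.0, nan), fminc(nan, 1.0))
# prints: nan 1.0 nan 1.0
```

In this model `fminc` is also order-sensitive for zeros: `fminc(-0.0, 0.0)`
returns `+0.0`, whereas `fmin19(-0.0, 0.0)` returns `-0.0`.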

## `FMM` -- Floating Min/Max Mode

<a id="fmm-floating-min-max-mode"></a>

| `FMM` | Assembly Alias                | Origin                         | Semantics                                       |
|-------|-------------------------------|--------------------------------|-------------------------------------------------|
| 0000  | fminnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = minNum(FRA, FRB)  (1)                     |
| 0001  | fmin19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = minimum(FRA, FRB)                         |
| 0010  | fminnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minimumNumber(FRA, FRB)                   |
| 0011  | fminc[s] FRT, FRA, FRB        | x86 minss or Win32's min macro | FRT = FRA \< FRB ? FRA : FRB                    |
| 0100  | fminmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, False, fminnum08) (2) |
| 0101  | fminmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fmin19) (2)    |
| 0110  | fminmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, False, fminnum19) (2) |
| 0111  | fminmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, False, fminc) (2)     |
| 1000  | fmaxnum08[s] FRT, FRA, FRB    | IEEE 754-2008                  | FRT = maxNum(FRA, FRB)  (1)                     |
| 1001  | fmax19[s] FRT, FRA, FRB       | IEEE 754-2019                  | FRT = maximum(FRA, FRB)                         |
| 1010  | fmaxnum19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = maximumNumber(FRA, FRB)                   |
| 1011  | fmaxc[s] FRT, FRA, FRB        | x86 maxss or Win32's max macro | FRT = FRA > FRB ? FRA : FRB                     |
| 1100  | fmaxmagnum08[s] FRT, FRA, FRB | IEEE 754-2008 (TODO: (3))      | FRT = minmaxmag(FRA, FRB, True, fmaxnum08) (2)  |
| 1101  | fmaxmag19[s] FRT, FRA, FRB    | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmax19) (2)     |
| 1110  | fmaxmagnum19[s] FRT, FRA, FRB | IEEE 754-2019                  | FRT = minmaxmag(FRA, FRB, True, fmaxnum19) (2)  |
| 1111  | fmaxmagc[s] FRT, FRA, FRB     | -                              | FRT = minmaxmag(FRA, FRB, True, fmaxc) (2)      |

Note (1): for the purposes of minNum/maxNum, -0.0 is defined to be less than
+0.0. This is left unspecified in IEEE 754-2008.

Note (2): minmaxmag(x, y, is_max, fallback) is defined as:

```python
def minmaxmag(x, y, is_max, fallback):
    """Select x or y by magnitude; defer to fallback on ties or NaN."""
    a = abs(x) < abs(y)  # x has the smaller magnitude
    b = abs(x) > abs(y)  # x has the larger magnitude
    if is_max:
        a, b = b, a  # swap
    if a:
        return x
    if b:
        return y
    # equal magnitudes, or NaN input(s)
    return fallback(x, y)
```

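As a usage sketch for Note (2), the block below restates `minmaxmag` so that
it runs on its own and pairs it with simple models of the `fminc` and `fmaxc`
modes as fallbacks. Treating Python floats as FPR values is an assumption of
this example, and signaling NaNs are not modelled.

```python
import math


def minmaxmag(x, y, is_max, fallback):
    # restated from Note (2) so that this block is self-contained
    a = abs(x) < abs(y)
    b = abs(x) > abs(y)
    if is_max:
        a, b = b, a
    if a:
        return x
    if b:
        return y
    return fallback(x, y)


def fminc(x, y):
    """Assumed model of the fminc mode: FRA < FRB ? FRA : FRB."""
    return x if x < y else y


def fmaxc(x, y):
    """Assumed model of the fmaxc mode: FRA > FRB ? FRA : FRB."""
    return x if x > y else y


print(minmaxmag(-3.0, 2.0, False, fminc))      # 2.0: smaller magnitude wins
print(minmaxmag(-3.0, 2.0, True, fmaxc))       # -3.0: larger magnitude wins
print(minmaxmag(-2.0, 2.0, False, fminc))      # -2.0: tie, fallback decides
print(minmaxmag(math.nan, 1.0, False, fminc))  # 1.0: NaN, fallback decides
```

The last two calls show that sign and NaN handling come entirely from the
fallback, which is why each magnitude mode inherits the behaviour of its
non-magnitude counterpart.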

Note (3): TODO: check whether IEEE 754-2008 has a min/maxMagNum operation
like IEEE 754-2019's minimumMagnitudeNumber/maximumMagnitudeNumber.

----------------

\newpage{}

## Floating Minimum/Maximum

A-Form

* fminmax FRT, FRA, FRB, FMM
* fminmax. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21          |26  |31  |
    | PO  | FRT | FRA | FRB | FMM[0:3] / | XO | Rc |
```

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Assembly Aliases: see
[`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

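Reading the alias table, the four `FMM` bits appear to decompose regularly:
bit 0 selects max over min, bit 1 selects the magnitude variants, and bits
2:3 select the base semantics. This decomposition is an observation from the
table, not something the RFC states, and `fmm_alias` is a hypothetical helper.
A sketch that regenerates the alias names on that assumption:

```python
BASE_SEMANTICS = ("num08", "19", "num19", "c")  # FMM bits 2:3


def fmm_alias(fmm, single=False):
    """Return the assembly alias for a 4-bit FMM value (MSB0 bit order)."""
    if not 0 <= fmm <= 0b1111:
        raise ValueError("FMM is a 4-bit field")
    op = "max" if fmm & 0b1000 else "min"   # FMM bit 0
    mag = "mag" if fmm & 0b0100 else ""     # FMM bit 1
    base = BASE_SEMANTICS[fmm & 0b0011]     # FMM bits 2:3
    return "f" + op + mag + base + ("s" if single else "")


for fmm in range(16):
    print(f"{fmm:04b}", fmm_alias(fmm))
```

An assembler could use the inverse of this mapping to translate each alias
into `fminmax` or `fminmaxs` with the corresponding `FMM` value.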

----------

## Floating Minimum/Maximum Single

A-Form

* fminmaxs FRT, FRA, FRB, FMM
* fminmaxs. FRT, FRA, FRB, FMM

```
    |0    |6    |11   |16   |21          |26  |31  |
    | PO  | FRT | FRA | FRB | FMM[0:3] / | XO | Rc |
```

Special Registers altered:

```
    FX VXSNAN
    CR1     (if Rc=1)
```

Compute the minimum/maximum of FRA and FRB, according to FMM, and store the
result in FRT.

Assembly Aliases: see
[`FMM` -- Floating Min/Max Mode](#fmm-floating-min-max-mode)

----------

\newpage{}

# Fixed-Point Instructions

These instructions provide signed and unsigned variants of min and max.
SVP64 Prefixing defines Saturation semantics, so Saturated variants of these
instructions need not be proposed.

## Integer Min/Max Mode

* bit 0: set if word variant else dword
* bit 1: set if signed else unsigned
* bit 2: set if max else min

| `IMM` | Assembly Alias   | Semantics                                    |
|-------|------------------|----------------------------------------------|
| 000   | `minu RT,RA,RB`  | `RT = (uint64_t)RA < (uint64_t)RB ? RA : RB` |
| 001   | `maxu RT,RA,RB`  | `RT = (uint64_t)RA > (uint64_t)RB ? RA : RB` |
| 010   | `mins RT,RA,RB`  | `RT =  (int64_t)RA < (int64_t)RB  ? RA : RB` |
| 011   | `maxs RT,RA,RB`  | `RT =  (int64_t)RA > (int64_t)RB  ? RA : RB` |
| 100   | `minuw RT,RA,RB` | `RT = (uint32_t)RA < (uint32_t)RB ? RA : RB` |
| 101   | `maxuw RT,RA,RB` | `RT = (uint32_t)RA > (uint32_t)RB ? RA : RB` |
| 110   | `minsw RT,RA,RB` | `RT =  (int32_t)RA < (int32_t)RB  ? RA : RB` |
| 111   | `maxsw RT,RA,RB` | `RT =  (int32_t)RA > (int32_t)RB  ? RA : RB` |

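The mode bits and the table above can be sketched as a small model. It
assumes registers are represented as unsigned 64-bit Python integers and that
`IMM` bit 0 is the most significant of the three bits, as the table ordering
implies. `as_operand` and `int_minmax` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def as_operand(value, bits, signed):
    """Interpret the low `bits` bits of a register value for comparison."""
    value &= (1 << bits) - 1
    if signed and value >> (bits - 1):
        value -= 1 << bits
    return value


def int_minmax(ra, rb, imm):
    """Model of the IMM table; ra and rb are 64-bit register values."""
    bits = 32 if imm & 0b100 else 64   # IMM bit 0: word variant
    signed = bool(imm & 0b010)         # IMM bit 1: signed compare
    is_max = bool(imm & 0b001)         # IMM bit 2: max rather than min
    a = as_operand(ra, bits, signed)
    b = as_operand(rb, bits, signed)
    take_ra = a > b if is_max else a < b
    # as in the table, the full register value is returned, not the cast one
    return (ra if take_ra else rb) & MASK64


minus_one = MASK64  # 0xFFFF_FFFF_FFFF_FFFF
print(hex(int_minmax(minus_one, 1, 0b000)))  # minu -> 0x1
print(hex(int_minmax(minus_one, 1, 0b010)))  # mins -> 0xffffffffffffffff
```

The same bit pattern is the largest unsigned value and the signed value -1,
which is why the two calls pick different operands.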

## Minimum Unsigned

X-Form

```
    |0   |6   |11  |16   |21  |31  |
    | PO | RT | RA | RB  | XO | Rc |
```

* minu RT, RA, RB
* minu. RT, RA, RB

```
    if (RA) <u (RB) then
        RT <- (RA)
    else
        RT <- (RB)
```

Special Registers altered:

```
    CR0     (if Rc=1)
```

Compute the unsigned minimum of RA and RB and store the result in RT.

## Maximum Unsigned

X-Form

```
    maxu RT, RA, RB
    maxu. RT, RA, RB
```

```
    |0   |6   |11  |16   |21  |31  |
    | PO | RT | RA | RB  | XO | Rc |
```

```
    if (RA) >u (RB) then
        RT <- (RA)
    else
        RT <- (RB)
```

Special Registers altered:

```
    CR0     (if Rc=1)
```

Compute the unsigned maximum of RA and RB and store the result in RT.

\newpage{}

## Minimum

X-Form

```
    min RT, RA, RB
    min. RT, RA, RB
```

```
    |0   |6   |11  |16   |21  |31  |
    | PO | RT | RA | RB  | XO | Rc |
```

```
    if (RA) < (RB) then
        RT <- (RA)
    else
        RT <- (RB)
```

Special Registers altered:

```
    CR0     (if Rc=1)
```

Compute the signed minimum of RA and RB and store the result in RT.

## Maximum

X-Form

```
    max RT, RA, RB
    max. RT, RA, RB
```

```
    |0   |6   |11  |16   |21  |31  |
    | PO | RT | RA | RB  | XO | Rc |
```

```
    if (RA) > (RB) then
        RT <- (RA)
    else
        RT <- (RB)
```

Special Registers altered:

```
    CR0     (if Rc=1)
```

Compute the signed maximum of RA and RB and store the result in RT.

----------

\newpage{}

# Instruction Formats

Add the following entries to Book I 1.6.1.15 X-FORM:

```
    |0    |6    |11   |16   |21          |26  |31  |
    | PO  | FRT | FRA | FRB | FMM[0:3] / | XO | Rc |
```

Add a new field to Book I 1.6.2 Word Instruction Fields:

```
    FMM (21:24)
        Field used to specify minimum/maximum mode for fminmax[s].

        Formats: A
```

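The field layout above can be sketched as a packing routine. Bit 0 is the
most significant bit of the 32-bit word, so `FMM` (21:24) sits at a left shift
of 7 in conventional LSB-first terms, and the reserved bit 25 is left as zero.
The RFC does not allocate PO or XO values, so the ones used below are
placeholders only, and `encode_fminmax` and `extract_fmm` are hypothetical
names.

```python
def encode_fminmax(po, frt, fra, frb, fmm, xo, rc):
    """Pack the A-Form fields into a 32-bit word (bit 0 is the MSB)."""
    fields = (
        (po, 6, 26),    # PO   bits 0:5
        (frt, 5, 21),   # FRT  bits 6:10
        (fra, 5, 16),   # FRA  bits 11:15
        (frb, 5, 11),   # FRB  bits 16:20
        (fmm, 4, 7),    # FMM  bits 21:24 (bit 25 is reserved, left as 0)
        (xo, 5, 1),     # XO   bits 26:30
        (rc, 1, 0),     # Rc   bit 31
    )
    word = 0
    for value, width, shift in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value out of range")
        word |= value << shift
    return word


def extract_fmm(word):
    """Recover FMM (bits 21:24) from an instruction word."""
    return (word >> 7) & 0b1111


# PO=63 and XO=0b10000 are placeholders only, not allocated opcodes
word = encode_fminmax(po=63, frt=1, fra=2, frb=3, fmm=0b1001, xo=0b10000, rc=0)
print(f"{word:#010x}", f"{extract_fmm(word):04b}")
```

Range-checking each field before packing keeps an out-of-range `FMM` from
silently spilling into the neighbouring FRB or XO bits.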

----------

\newpage{}

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| A    | I    | #    | 3.2B    | fminmax  | Floating Minimum/Maximum |
| A    | I    | #    | 3.2B    | fminmaxs | Floating Minimum/Maximum Single |
| ???  | I    | #    | 3.2B    | minmax   | Minimum/Maximum Signed/Unsigned |

## fmax instruction count

32 instructions are required in SFFS to emulate `fmax`.
<https://gcc.godbolt.org/z/6xba61To6>

```
    fmax(double, double):
        fcmpu 0,1,2
        fmr 0,1
        cror 30,1,2
        beq 7,.L12
        blt 0,.L13
        stfd 1,-16(1)
        lis 9,0x8
        li 8,-1
        sldi 9,9,32
        rldicr 8,8,0,11
        ori 2,2,0
        ld 10,-16(1)
        xor 10,10,9
        sldi 10,10,1
        cmpld 0,10,8
        bgt 0,.L5
        stfd 2,-16(1)
        ori 2,2,0
        ld 10,-16(1)
        xor 9,10,9
        sldi 9,9,1
        cmpld 0,9,8
        ble 0,.L6
.L5:
        fadd 1,0,2
        blr
.L13:
        fmr 1,2
        blr
.L6:
        fcmpu 0,2,2
        fmr 1,2
        bnulr 0
.L12:
        fmr 1,0
        blr
        .long 0
        .byte 0,9,0,0,0,0,0,0
```

[[!tag opf_rfc]]
