# RFC ls004 Shift-And-Add

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* <https://git.openpower.foundation/isa/PowerISA/issues/125>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
    Book I Fixed-Point Shift Instructions 3.3.14.2
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
    shadd - Shift and Add
    shaddw - Shift and Add Signed Word
    shadduw - Shift and Add Unsigned Word
    Also under consideration: LD/ST-Indexed-Shifted
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of three new GPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
```

**Motivation**

Power ISA is missing LD/ST Indexed with shift, which is present in both ARM
and x86.  Adding the shifted LD/ST variants would take thirty-seven new
instructions, so a compromise is to add shift-and-add instead.  It replaces
a pair of explicit instructions (a shift followed by an add) in hot loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for performing address offsets,
    as the second operand (RB) is constrained to its lower 32 bits
    and zero-extended.
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and AArch64,
    since they are useful for general arithmetic as well as for
    computing addresses, even when not immediately followed
    by a load/store.
5. `shaddw` is often more useful than `shadduw` because C/C++ programmers
    like to use `int`, a signed type, for array indexing.  For additional
    details see <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
6. Even the Motorola 68020 has LD/ST-Indexed-Shifted (scaled index
    addressing): <https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>
7. Open questions: should average-add also be included?  What about CA?
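
Note 4's point about general arithmetic can be made concrete. When RA and RB hold the same value x, a shift-and-add computes x + (x << (sm + 1)), which is a multiply by 3, 5, 9 or 17 in one instruction. The following is a minimal Python sketch. `shift_add` and `multiply_small` are hypothetical helpers that model the `shadd` pseudocode, not part of any toolchain:

```python
MASK64 = (1 << 64) - 1  # results are truncated to 64 bits, like sum[0:63]


def shift_add(ra: int, rb: int, sm: int) -> int:
    """Hypothetical model of shadd: (RB << (sm + 1)) + RA, modulo 2**64."""
    return ((rb << (sm + 1)) + ra) & MASK64


# With both source operands equal, factor = 1 + 2**(sm + 1).
SM_FOR_FACTOR = {3: 0, 5: 1, 9: 2, 17: 3}


def multiply_small(x: int, factor: int) -> int:
    """Multiply x by 3, 5, 9 or 17 using a single shift-and-add."""
    return shift_add(x, x, SM_FOR_FACTOR[factor])
```

For example, `multiply_small(7, 9)` computes 7 + (7 << 3) and returns 63.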

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following lists the alternative instructions that could be considered
for addition instead of shift-and-add.  All of them use a 9-bit XO:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 5 Floating-Point Store Indexed Shifted (with Update)
* 6 Load Indexed Shifted Update Post-Increment
* 4 Store Indexed Shifted Update Post-Increment
* 2 Floating-Point Load Indexed Shifted Update Post-Increment
* 2 Floating-Point Store Indexed Shifted Update Post-Increment

Total count: 51 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode (51 of the 512 available
9-bit XO values, fitting within one eighth of the XO space).  Hot loops
save instructions with these forms, as their inclusion in top-end ISAs
such as x86 and ARM shows, so the cost may be considered justifiable.
However, there is no point in placing the 37 Shifted-only instructions
in EXT2xx: they need to be in EXT0xx.  A 64-bit encoding occupies the
same space as the 32-bit shift-and-add plus 32-bit LD/ST pair it would
replace, so the intended reduction in binary size is not achieved.
Post-Increment-Shifted, on the other hand, could reasonably be proposed
in EXT2xx.
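
The counts above can be cross-checked against the rows of the three tables that follow. This small Python tally is a sketch: the grouping labels are editorial and hypothetical, not ISA terminology, and the numbers are simply row counts from the tables:

```python
# Row counts per group, taken from the LD/ST-Shifted, LD/ST-Shifted-Update
# and Post-Increment-Update LD/ST-Shifted tables (labels are illustrative).
SHIFTED = {"load": 6, "load byte-reverse": 3, "store": 4,
           "store byte-reverse": 3, "fp load": 4, "fp store": 3}
SHIFTED_UPDATE = {"load": 6, "store": 4, "fp load": 2, "fp store": 2}
POST_INCREMENT = {"load": 6, "store": 4, "fp load": 2, "fp store": 2}

XO_BITS = 9                  # XO occupies bits 23-31
XO_SLOTS = 1 << XO_BITS      # 512 XO values within one Primary Opcode


def tally() -> tuple[int, int, float]:
    """Return (Shifted-only count, overall count, fraction of XO space used)."""
    shifted_only = sum(SHIFTED.values()) + sum(SHIFTED_UPDATE.values())
    total = shifted_only + sum(POST_INCREMENT.values())
    return shifted_only, total, total / XO_SLOTS
```

`tally()` returns `(37, 51, 0.099609375)`. The 51 instructions fit inside a 64-entry block of XO values, which is one eighth (three bits' worth) of the 512 available.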

**LD/ST-Shifted**

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhasx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwasx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldsx RT,RA,RB,sm     |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhbrsx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwbrsx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldbrsx RT,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthbrsx RS,RA,RB,sm  |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwbrsx RS,RA,RB,sm  |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdbrsx RS,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsxs FRT,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfdxs FRT,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwaxs FRT,RA,RB,sm |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwzxs FRT,RA,RB,sm |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsxs FRS,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfdxs FRS,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfiwxs FRS,RA,RB,sm |

**LD/ST-Shifted-Update**

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzusx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzusx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhausx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzusx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwausx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldusx RT,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdusx RS,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsuxs FRT,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfduxs FRT,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsuxs FRS,RA,RB,sm |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfduxs FRS,RA,RB,sm |

**Post-Increment-Update LD/ST-Shifted**

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction           |
|-------|------|-------|-------|-------|-------|-----------------------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzuspx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzuspx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhauspx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzuspx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwauspx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lduspx RT,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbuspx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthuspx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwuspx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stduspx RS,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfdupxs FRT,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsupxs FRT,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfdupxs FRS,RA,RB,sm |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsupxs FRS,RA,RB,sm |

----------------

\newpage{}

# Shift-and-Add

`shadd RT, RA, RB, sm`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |
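
The `sm` field occupies bits 21-22 of the instruction word, between RB and XO. Power ISA numbers bits MSB-first (bit 0 is the most significant), so a field whose last bit is b is shifted left by 31 - b. The following is a minimal Python sketch of packing a Z23-Form word. `encode_z23` is a hypothetical helper. No PO or XO values have been allocated for `shadd`, so they are left as parameters:

```python
# Z23-Form fields as (name, width, last bit), using Power ISA MSB0 numbering.
Z23_FIELDS = [("po", 6, 5), ("rt", 5, 10), ("ra", 5, 15), ("rb", 5, 20),
              ("sm", 2, 22), ("xo", 8, 30), ("rc", 1, 31)]


def encode_z23(po: int, rt: int, ra: int, rb: int, sm: int, xo: int,
               rc: int = 0) -> int:
    """Pack Z23-Form fields into a 32-bit instruction word (hypothetical helper)."""
    values = {"po": po, "rt": rt, "ra": ra, "rb": rb,
              "sm": sm, "xo": xo, "rc": rc}
    word = 0
    for name, width, last_bit in Z23_FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << (31 - last_bit)  # MSB0 bit b is LSB0 position 31 - b
    return word
```

For `shadd r4, r1, r2, 2` this would be `encode_z23(po, 4, 1, 2, 2, xo)`. Here `po` and `xo` stand in for whatever opcodes are eventually allocated, and `rc=1` selects the Rc=1 form.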

Pseudocode:

```
    shift <- sm + 1                     # Shift is between 1-4
    sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
    RT <- sum                           # Result stored in RT
```

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result is stored in RT.

`sm` is a 2-bit field selecting a shift of 1 to 4 bits, i.e. multiplication
of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT, are all 64-bit unsigned integers;
the sum is truncated to 64 bits.
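
The `shadd` pseudocode can be modelled directly in Python. Register values are held as non-negative integers, and the sum is truncated to 64 bits as `sum[0:63]` implies. This is a sketch for checking examples, not a normative definition; `shadd` here is a hypothetical function name:

```python
MASK64 = (1 << 64) - 1


def shadd(ra: int, rb: int, sm: int) -> int:
    """Model of shadd RT,RA,RB,sm: RT = ((RB) << (sm + 1)) + (RA), modulo 2**64."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0-3)")
    shift = sm + 1                                   # shift amount is 1 to 4
    return (((rb & MASK64) << shift) + (ra & MASK64)) & MASK64
```

`shadd(0x1000, 3, 2)` returns `0x1018`, the address of element 3 of an array of 8-byte elements based at `0x1000`.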

The final assembler operand is the `sm` field value itself (the shift
amount minus one).

Examples:

```
    # r4 = r1 + (r2 * 8)
    shadd r4, r1, r2, 2
```

# Shift-and-Add Signed Word

`shaddw RT, RA, RB, sm`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                  # Shift is between 1-4
    n <- EXTS64((RB)[32:63])         # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
    RT <- sum                        # Result stored in RT
```

When `sm` is zero, the lower word of register RB, sign-extended to 64 bits,
is multiplied by 2, added to the contents of register RA, and the result
is stored in RT.

`sm` is a 2-bit field selecting a shift of 1 to 4 bits, i.e. multiplication
of the sign-extended lower word of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT, are all signed integers: RA and RT
are 64-bit, while only the lower 32 bits of RB are used, sign-extended to
64 bits.
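
The following is a Python sketch of `shaddw`. Function names are hypothetical, and 64-bit values are held as non-negative integers. The sign extension lets a negative C `int` index in the low word of RB step backwards from the base in RA, whatever the upper 32 bits of RB contain:

```python
MASK64 = (1 << 64) - 1


def exts64_word(value: int) -> int:
    """EXTS64 of the low 32 bits of value, as a 64-bit two's-complement pattern."""
    word = value & 0xFFFF_FFFF
    if word & 0x8000_0000:          # negative as a signed 32-bit value
        word -= 1 << 32
    return word & MASK64


def shaddw(ra: int, rb: int, sm: int) -> int:
    """Model of shaddw RT,RA,RB,sm: RT = (EXTS64((RB)[32:63]) << (sm + 1)) + (RA)."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0-3)")
    n = exts64_word(rb)             # upper 32 bits of RB are ignored
    return ((n << (sm + 1)) + (ra & MASK64)) & MASK64
```

With the index -1 in the low word (`rb = 0xFFFF_FFFF`), `shaddw(0x1000, 0xFFFF_FFFF, 2)` returns `0xFF8`, one 8-byte element below the base.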

*Programmer's Note:
This instruction is primarily useful for computing address offsets: RA
holds the 64-bit base address, and RB holds a signed 32-bit index into the
data structure (for example a C `int` array index).*

Examples:

```
# r4 = r1 + (sign-extended r2[32:63]) * 16
shaddw r4, r1, r2, 3
```

----------------

\newpage{}

# Shift-and-Add Unsigned Word

`shadduw RT, RA, RB, sm`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                  # Shift is between 1-4
    n <- (RB)[32:63]                 # Lower 32 bits of RB, zero-extended
    sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
    RT <- sum                        # Result stored in RT
```

When `sm` is zero, the lower word of register RB, zero-extended to 64 bits,
is multiplied by 2, added to the contents of register RA, and the result
is stored in RT.

`sm` is a 2-bit field selecting a shift of 1 to 4 bits, i.e. multiplication
of the zero-extended lower word of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT, are all unsigned integers: RA and RT
are 64-bit, while only the lower 32 bits of RB are used, zero-extended to
64 bits.
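
The unsigned counterpart, `shadduw`, can be sketched the same way; the function name is hypothetical. It differs from `shaddw` only in zero-extending rather than sign-extending the low word of RB. That difference matters whenever bit 32 of RB is set:

```python
MASK64 = (1 << 64) - 1


def shadduw(ra: int, rb: int, sm: int) -> int:
    """Model of shadduw RT,RA,RB,sm: RT = ((RB)[32:63] << (sm + 1)) + (RA), mod 2**64."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0-3)")
    n = rb & 0xFFFF_FFFF            # low word, zero-extended; upper 32 bits ignored
    return ((n << (sm + 1)) + (ra & MASK64)) & MASK64
```

Take the register contents that represent -1 as a signed word. `shadduw(0x1000, 0xFFFF_FFFF, 2)` returns `0x800000FF8` (0x1000 + 0xFFFF_FFFF * 8), which is why note 5 favours `shaddw` for C `int` indices.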

*Programmer's Note:
This instruction is primarily useful for computing address offsets: RA
holds the 64-bit base address, and RB holds an unsigned 32-bit index into
the data structure.*

Examples:

```
# r4 = r1 + (zero-extended r2[32:63]) * 8
shadduw r4, r1, r2, 2
```

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add |
| Z23  | I    | #    | 3.0B    | shaddw   | Shift-and-Add Signed Word |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |

[[!tag opf_rfc]]