# RFC ls004 Shift-And-Add

**URLs**:

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
* <https://git.openpower.foundation/isa/PowerISA/issues/91>
* shift-and-add: <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
    Book I Fixed-Point Shift Instructions 3.3.14.2
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
    shadd - Shift and Add
    shadduw - Shift and Add Unsigned Word
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of two new GPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
```

**Motivation**

Power ISA lacks LD/ST Indexed with a shifted index register, which is
present in both ARM and x86.  Adding shifted variants of every LD/ST Indexed
instruction would require thirty-seven new instructions, so a compromise is
to add shift-and-add instead.  This replaces the pair of explicit instructions
(a shift followed by an add) otherwise needed in hot loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for computing address offsets,
    as the second operand is constrained to its lower 32 bits
    and zero-extended.
3. Both are 2-in 1-out instructions.
4. Shift-and-add operations are present in both x86 and AArch64,
    since they are useful for general arithmetic as well as for
    computing addresses, even when not immediately followed
    by a load/store.
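
As a sketch of the general-arithmetic use in item 4: when RA and RB name the same register, `shadd` computes x + (x << (sm + 1)), which multiplies x by 3, 5, 9 or 17 in one instruction, much as x86 `lea` is commonly used for small constant multiplies. The Python below is a hypothetical behavioural model written from the `shadd` pseudocode in this RFC (the function names are illustrative only, not part of any toolchain); it also chains two `shadd` operations to multiply by 45.

```python
MASK64 = (1 << 64) - 1


def shadd(ra: int, rb: int, sm: int) -> int:
    """Hypothetical model of shadd: RT <- ((RB) << (sm + 1)) + (RA), 64-bit."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0-3)")
    return ((rb << (sm + 1)) + ra) & MASK64


def times_factor(x: int, sm: int) -> int:
    """shadd with RA == RB == x gives x * (2**(sm + 1) + 1), modulo 2**64."""
    return shadd(x, x, sm)


def times_45(x: int) -> int:
    """Two chained shadds: 45 * x == 9 * (5 * x), modulo 2**64."""
    y = shadd(x, x, 1)     # y = x + 4x = 5x
    return shadd(y, y, 2)  # y + 8y = 9y = 45x


for sm in range(4):
    print(f"sm={sm}: 7 * {(1 << (sm + 1)) + 1} = {times_factor(7, sm)}")
print("7 * 45 =", times_45(7))
```

The same pattern extends to other products of 3, 5, 9 and 17 at one `shadd` per factor.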

TODO: a signed 32-bit shift-and-add should be added; this needs to be addressed
before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following table lists the alternative instructions that could be
considered for addition. They all use a 9-bit XO, which is not hugely
costly.  The totals are:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 5 Floating-Point Store Indexed Shifted (with Update)

Total count: 37 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode.  Given the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable.  However, there is no point in placing these in EXT2xx: they
need to be in EXT0xx, because if added as 64-bit encodings the intended
benefit, a reduction in binary size, is not achieved.

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lbzusx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhzusx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhasx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhausx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzsx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwzusx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwasx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwausx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldsx RT,RA,RB,sm     |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldusx RT,RA,RB,sm    |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lhbrsx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | lwbrsx RT,RA,RB,sm   |
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | ldbrsx RT,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stbusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdsx RS,RA,RB,sm    |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdusx RS,RA,RB,sm   |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | sthbrsx RS,RA,RB,sm  |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stwbrsx RS,RA,RB,sm  |
|  PO   | RS   |  RA   |  RB   |  sm   |  XO   | stdbrsx RS,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsxs FRT,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfsuxs FRT,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfdxs FRT,RA,RB,sm   |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfduxs FRT,RA,RB,sm  |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwaxs FRT,RA,RB,sm |
|  PO   | FRT  |  RA   |  RB   |  sm   |  XO   | lfiwzxs FRT,RA,RB,sm |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsxs FRS,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfsuxs FRS,RA,RB,sm |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfdxs FRS,RA,RB,sm  |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfduxs FRS,RA,RB,sm |
|  PO   | FRS  |  RA   |  RB   |  sm   |  XO   | stfiwxs FRS,RA,RB,sm |
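
One way to read the "3 bits" figure, sketched here under that assumption: the mnemonics in the table above need ceil(log2(count)) XO bits to be told apart, so within a 9-bit XO they fit in one aligned block of 2^6 = 64 encodings, selected by fixing the remaining 3 XO bits, i.e. one eighth of the XO space. The Python below recounts the table rows and performs that arithmetic; `xo_cost` is a hypothetical helper, not part of any tool.

```python
# Mnemonics copied from the LD/ST-Indexed-Shift table, one group per line.
MNEMONICS = """
lbzsx lbzusx lhzsx lhzusx lhasx lhausx lwzsx lwzusx lwasx lwausx ldsx ldusx
lhbrsx lwbrsx ldbrsx
stbsx stbusx sthsx sthusx stwsx stwusx stdsx stdusx
sthbrsx stwbrsx stdbrsx
lfsxs lfsuxs lfdxs lfduxs lfiwaxs lfiwzxs
stfsxs stfsuxs stfdxs stfduxs stfiwxs
""".split()

XO_BITS = 9  # XO occupies bits 23-31 in the table


def xo_cost(count: int, xo_bits: int = XO_BITS) -> tuple[int, int]:
    """Return (XO bits needed to distinguish `count` opcodes, XO bits fixed).

    Assumes the instructions are packed into one aligned power-of-two block
    of the XO space; the fixed bits are the ones that select that block.
    """
    needed = (count - 1).bit_length()  # exact ceil(log2(count)) for count >= 2
    if needed > xo_bits:
        raise ValueError("does not fit in the XO field")
    return needed, xo_bits - needed


count = len(MNEMONICS)
needed, fixed = xo_cost(count)
print(f"{count} instructions: {needed} XO bits to distinguish, "
      f"{fixed} XO bits fixed, i.e. 1/{2 ** fixed} of the XO space")
```

Integer `bit_length()` is used instead of `math.log2` so the rounding is exact.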

----------------

\newpage{}

# Shift-and-Add

`shadd RT,RA,RB,sm`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |
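
To show where `sm` sits in the instruction word, the following Python sketch packs and unpacks the Z23-Form fields using the bit numbering of the table above (Power ISA numbers bit 0 as the most-significant bit of the 32-bit word). No PO or XO value is allocated for `shadd` in this RFC, so the values used below are placeholders only, and `encode_z23`/`decode_z23` are hypothetical helper names.

```python
# Z23-Form fields as (name, first_bit, last_bit), MSB-0 numbering as in the table.
Z23_FIELDS = [
    ("PO", 0, 5),
    ("RT", 6, 10),
    ("RA", 11, 15),
    ("RB", 16, 20),
    ("sm", 21, 22),
    ("XO", 23, 30),
    ("Rc", 31, 31),
]
FIELD_NAMES = {name for name, _, _ in Z23_FIELDS}


def encode_z23(**fields: int) -> int:
    """Pack Z23-Form fields (missing fields default to 0) into a 32-bit word."""
    unknown = set(fields) - FIELD_NAMES
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, first, last in Z23_FIELDS:
        width = last - first + 1
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << (31 - last)  # MSB-0 bit 31 is the least-significant bit
    return word


def decode_z23(word: int) -> dict:
    """Unpack a 32-bit word into its Z23-Form fields."""
    return {
        name: (word >> (31 - last)) & ((1 << (last - first + 1)) - 1)
        for name, first, last in Z23_FIELDS
    }


# shadd r4,r1,r2,2 with placeholder PO/XO values (NOT allocated opcodes).
PLACEHOLDER_PO, PLACEHOLDER_XO = 22, 0x55
word = encode_z23(PO=PLACEHOLDER_PO, RT=4, RA=1, RB=2, sm=2, XO=PLACEHOLDER_XO)
print(f"{word:#010x}", decode_z23(word))
```

Because `sm` occupies bits 21-22, it takes the two most-significant bits of what would otherwise be a 10-bit X-Form XO.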

Pseudocode:

```
    shift <- sm + 1                     # Shift is between 1 and 4
    sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
    RT <- sum                           # Result stored in RT
```

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result is stored in RT.

`sm` is a 2-bit field, allowing RB to be multiplied by 2, 4, 8 or 16.

Operands RA and RB and the result RT are all 64-bit unsigned integers;
the result is truncated to 64 bits.
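
The pseudocode can be cross-checked with a small behavioural model. The Python below is a sketch written from the pseudocode above, not a reference implementation; `shadd` and `sldi_then_add` are hypothetical names. It also models the two-instruction sequence that `shadd` replaces, a shift left (as performed by the `sldi` extended mnemonic) followed by `add`, and checks that the two agree for every `sm`.

```python
MASK64 = (1 << 64) - 1


def shadd(ra: int, rb: int, sm: int) -> int:
    """Model of shadd RT,RA,RB,sm on 64-bit unsigned register values."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0-3)")
    shift = sm + 1                        # shift is between 1 and 4
    return ((rb << shift) + ra) & MASK64  # sum[0:63]: keep the low 64 bits


def sldi_then_add(ra: int, rb: int, n: int) -> int:
    """Model of the pair shadd replaces: sldi tmp,RB,n then add RT,tmp,RA."""
    tmp = (rb << n) & MASK64    # bits shifted out of the register are lost
    return (tmp + ra) & MASK64  # 64-bit modular addition


samples = [0, 1, 0x1000, 0x7FFF_FFFF_FFFF_FFFF, MASK64]
for ra in samples:
    for rb in samples:
        for sm in range(4):
            assert shadd(ra, rb, sm) == sldi_then_add(ra, rb, sm + 1)

print(hex(shadd(0x1000, 5, 2)))  # base 0x1000 plus index 5 scaled by 8: 0x1028
```

The two always agree because truncating after the shift and again after the add gives the same result modulo 2^64 as truncating once at the end.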

**TODO: more examples needed (not sure how to embed `sm`).**

Examples:

```
    # r4 <- r1 + (r2 << 3): adds r1 to (r2*8), since sm=2 gives shift=3
    shadd r4, r1, r2, 2
```

# Shift-and-Add Unsigned Word

`shadduw RT,RA,RB,sm`

|  0-5  | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
|  PO   | RT   |  RA   |  RB   |  sm   |  XO   | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                  # Shift is between 1 and 4
    n <- [0]*32 || (RB)[32:63]       # Lower 32 bits of RB, zero-extended
    sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
    RT <- sum                        # Result stored in RT
```

When `sm` is zero, the lower word of register RB, zero-extended to 64 bits,
is multiplied by 2, added to the contents of register RA, and the result
is stored in RT.

`sm` is a 2-bit field, allowing the lower word of RB to be multiplied by
2, 4, 8 or 16.

Operand RA and the result RT are 64-bit unsigned integers; only the lower
32 bits of RB are used, as an unsigned 32-bit integer.
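
A matching sketch for `shadduw`, again a hypothetical Python model written from the pseudocode above rather than a reference implementation. It assumes, as Note 2 states, that the lower word of RB is zero-extended to 64 bits before the shift, so no index bits are lost for any `sm`, and it shows the intended address-offset use: whatever is in the upper word of RB is ignored.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shadduw(ra: int, rb: int, sm: int) -> int:
    """Model of shadduw RT,RA,RB,sm on 64-bit unsigned register values."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field (0-3)")
    n = rb & MASK32  # (RB)[32:63] is the low word in MSB-0 numbering
    # n is treated as zero-extended to 64 bits, so the shift drops no index bits.
    return ((n << (sm + 1)) + ra) & MASK64


# Address of element 3 of an array of 8-byte elements at base 0x10000,
# with stale data in the upper word of the index register.
base = 0x10000
index_reg = 0xDEAD_BEEF_0000_0003
print(hex(shadduw(base, index_reg, 2)))  # 0x10018: base + 3*8
```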

*Programmer's Note:
This instruction is intended for computing address offsets: RA holds
the 64-bit base address, and RB holds an offset into the data structure
that is limited to 32 bits.*

Examples:

```
    # r4 <- r1 + (zero-extended lower word of r2 << 2), i.e. r1 + r2*4
    shadduw r4, r1, r2, 1
```

[[!tag opf_rfc]]

# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |