# RFC ls004 Shift-And-Add

* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* <https://git.openpower.foundation/isa/PowerISA/issues/125>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Books and Section affected**:

* Book I Fixed-Point Shift Instructions 3.3.14.2
* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

Instructions added:

* shadd - Shift and Add
* shaddw - Shift and Add Signed Word
* shadduw - Shift and Add Unsigned Word

Also under consideration: LD/ST-Indexed-Shifted.

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

Addition of three new GPR-based instructions

**Impact on software**:

Requires support for new instructions in assembler, debuggers,

GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing

Power ISA lacks LD/ST Indexed with shift, which is present in both ARM
and x86. Adding shifted LD/ST variants would require 37 new instructions,
so the compromise is to add shift-and-add, which replaces a pair of
explicit instructions in hot loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for performing address offsets,
   as the second operand is constrained to the lower 32 bits.
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and aarch64,
   since they are useful for both general arithmetic and for
   computing addresses even when not immediately followed
   by a load or store.
5. `shaddw` is often more useful than `shadduw` because C/C++ programmers like
   to use `int` for array indexing. For additional details see
   <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
6. Even the Motorola 68000 family (from the MC68020 onward) has LD/ST-Indexed-Shifted: <https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>
7. Should average-add also be included? What about CA?

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

# Table of LD/ST-Indexed-Shift

The following lists the alternative instructions that could
be considered for addition. All of them use a 9-bit XO:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 5 Floating-Point Store Indexed Shifted (with Update)
* 6 Load Indexed Shifted Update Post-Increment
* 4 Store Indexed Shifted Update Post-Increment
* 2 Floating-Point Load Indexed Shifted Update Post-Increment
* 2 Floating-Point Store Indexed Shifted Update Post-Increment
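
As a quick cross-check of the group sizes, the mnemonics can be tallied directly from the three LD/ST tables in this section. This is only a counting sketch: the dictionary keys are informal labels chosen here for illustration, and the mnemonics are copied from the table rows as written.

```python
# Mnemonics copied from the three LD/ST tables of this RFC.
# The dictionary keys are informal labels, not names used by the RFC.
TABLES = {
    "indexed-shifted": """
        lbzsx lhzsx lhasx lwzsx lwasx ldsx lhbrsx lwbrsx ldbrsx
        stbsx sthsx stwsx stdsx sthbrsx stwbrsx stdbrsx
        lfsxs lfdxs lfiwaxs lfiwzxs stfsxs stfdxs stfiwxs
    """.split(),
    "shifted-update": """
        lbzusx lhzusx lhausx lwzusx lwausx ldusx
        stbusx sthusx stwusx stdusx
        lfsuxs lfduxs stfsuxs stfduxs
    """.split(),
    "post-increment": """
        lbzuspx lhzuspx lhauspx lwzuspx lwauspx lduspx
        stbuspx sthuspx stwuspx stduspx
        lfdupxs lfsupxs stfdupxs stfsupxs
    """.split(),
}

# Rows per table, and the overall number of candidate instructions.
counts = {name: len(rows) for name, rows in TABLES.items()}
total = sum(counts.values())
all_mnemonics = [m for rows in TABLES.values() for m in rows]

print(counts)
print("total:", total)
```

Counting table rows rather than typing the group sizes by hand keeps the tally tied to the instruction lists themselves.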

Total count: 51 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
that these instructions offer in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing the 37
Shifted-only instructions in EXT2xx. They need to be in EXT0xx, because
as 64-bit encodings they would not deliver the reduction in binary size.
Post-Increment-Shifted, on the other hand, could reasonably be proposed
in EXT2xx.

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
| PO    | RT   | RA    | RB    | sm    | XO    | lbzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhasx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lwasx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | ldsx RT,RA,RB,sm     |
| PO    | RT   | RA    | RB    | sm    | XO    | lhbrsx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwbrsx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | ldbrsx RT,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stbsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | sthsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stwsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stdsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | sthbrsx RS,RA,RB,sm  |
| PO    | RS   | RA    | RB    | sm    | XO    | stwbrsx RS,RA,RB,sm  |
| PO    | RS   | RA    | RB    | sm    | XO    | stdbrsx RS,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsxs FRT,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfdxs FRT,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfiwaxs FRT,RA,RB,sm |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfiwzxs FRT,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsxs FRS,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfdxs FRS,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfiwxs FRS,RA,RB,sm |

**LD/ST-Shifted-Update**

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
| PO    | RT   | RA    | RB    | sm    | XO    | lbzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhausx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwausx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | ldusx RT,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stbusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | sthusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stwusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stdusx RS,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsuxs FRT,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfduxs FRT,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsuxs FRS,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfduxs FRS,RA,RB,sm |

**Post-Increment-Update LD/ST-Shifted**

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction           |
|-------|------|-------|-------|-------|-------|-----------------------|
| PO    | RT   | RA    | RB    | sm    | XO    | lbzuspx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzuspx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhauspx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzuspx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwauspx RT,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stbuspx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | sthuspx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stwuspx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stduspx RS,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lduspx RT,RA,RB,sm    |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfdupxs FRT,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsupxs FRT,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfdupxs FRS,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsupxs FRS,RA,RB,sm |
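
The field layout shared by the three tables above can be illustrated with a small packing routine. This is a sketch only: `encode_ldst_shifted` is a hypothetical helper, and the PO and XO values in the usage line are placeholders, since the tables give them only symbolically. Power ISA numbers bits from the most significant end, so a field ending at bit `b` sits at a left shift of `31 - b`.

```python
def encode_ldst_shifted(po, rt, ra, rb, sm, xo):
    """Pack the table's fields into one 32-bit instruction word.

    Hypothetical helper for illustration; not part of any assembler.
    """
    # (value, width in bits, left shift), following the table columns.
    fields = [
        (po, 6, 26),  # bits 0-5:   Primary Opcode
        (rt, 5, 21),  # bits 6-10:  RT / RS / FRT / FRS
        (ra, 5, 16),  # bits 11-15: RA
        (rb, 5, 11),  # bits 16-20: RB
        (sm, 2, 9),   # bits 21-22: shift amount field
        (xo, 9, 0),   # bits 23-31: 9-bit extended opcode
    ]
    word = 0
    for value, width, shift in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word |= value << shift
    return word


# Placeholder PO=31 and XO=0x155: assumptions, chosen only for illustration.
word = encode_ldst_shifted(po=31, rt=3, ra=4, rb=5, sm=2, xo=0x155)
print(hex(word))  # 0x7c642d55
```

Because `sm` occupies bits 21-22, each mnemonic in the tables covers four distinct encodings, one per shift amount.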

# Shift-and-Add

`shadd RT, RA, RB, sm`

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

    shift <- sm + 1                      # Shift is between 1-4
    sum[0:63] <- ((RB) << shift) + (RA)  # Shift RB, add RA
    RT <- sum                            # Result stored in RT

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit field that allows multiplication of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT are all 64-bit, unsigned integers.

**NEED EXAMPLES (not sure how to embed sm)!!!**
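
As a rough executable model of the `shadd` pseudo-code, the following sketch treats registers as Python integers masked to 64 bits. The function name `shadd` and the sample register values are illustrative assumptions, not part of the RFC.

```python
MASK64 = (1 << 64) - 1


def shadd(ra, rb, sm):
    """Model of shadd: RT <- ((RB) << (sm + 1)) + (RA), modulo 2**64."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field, so it must be 0..3")
    shift = sm + 1  # shift amount is between 1 and 4
    return ((rb << shift) + ra) & MASK64


# Models `shadd RT, RA, RB, 2` with RA = 0x1000 and RB = 5:
# RB is multiplied by 8 and added to RA.
print(hex(shadd(0x1000, 5, 2)))  # 0x1028

# sm = 0..3 scales RB by 2, 4, 8 and 16 respectively.
print([shadd(0, 1, sm) for sm in range(4)])  # [2, 4, 8, 16]
```

The final mask models the 64-bit result register: bits shifted out past bit 0 (the most significant bit in Power ISA numbering) are discarded.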

# Shift-and-Add Signed Word

`shaddw RT, RA, RB, sm`

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

    shift <- sm + 1                   # Shift is between 1-4
    n <- EXTS64((RB)[32:63])          # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)  # Shift n, add RA
    RT <- sum                         # Result stored in RT

When `sm` is zero, the lower word of register RB is sign-extended, multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit field that allows multiplication of the sign-extended
lower word of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT are all 64-bit, signed integers.

*The advantage of this instruction is doing address offsets. RA is the base 64-bit
address. RB is the offset into the data structure, limited to 32 bits.*
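
The sign-extension step is what makes `shaddw` suit a C `int` index. A minimal Python sketch of the pseudo-code above follows; the helper names `exts64_word` and `shaddw` and the sample values are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1


def exts64_word(value):
    """Sign-extend the low 32 bits of value, as EXTS64((RB)[32:63])."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def shaddw(ra, rb, sm):
    """Model of shaddw: RT <- (EXTS64(RB[32:63]) << (sm + 1)) + (RA)."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field, so it must be 0..3")
    n = exts64_word(rb)
    return ((n << (sm + 1)) + ra) & MASK64


base = 0x1000
index = 0xFFFFFFFF  # 32-bit two's-complement encoding of int -1

# sm = 1 scales by 4, so this models base + (-1 * 4): one word below base.
print(hex(shaddw(base, index, 1)))  # 0xffc
```

With zero-extension instead, the same register contents would be read as an index of 4294967295, which is why the signed variant matters for code that indexes arrays with `int`.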

# Shift-and-Add Unsigned Word

`shadduw RT, RA, RB, sm`

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

    shift <- sm + 1                   # Shift is between 1-4
    n <- (RB)[32:63]                  # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)  # Shift n, add RA
    RT <- sum                         # Result stored in RT

When `sm` is zero, the lower word of register RB is zero-extended, multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit field that allows multiplication of the zero-extended
lower word of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT are all 64-bit, unsigned integers.

*The advantage of this instruction is doing address offsets. RA is the base 64-bit
address. RB is the offset into the data structure, limited to 32 bits.*

Example:

    shadduw r4, r1, r2, 2
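
To make the `shadduw r4, r1, r2, 2` example concrete, the sketch below models the pseudo-code in Python. The function name `shadduw` and the register contents are assumptions chosen for illustration. The upper word of `r2` is deliberately filled with junk to show that it is ignored.

```python
MASK64 = (1 << 64) - 1


def shadduw(ra, rb, sm):
    """Model of shadduw: RT <- (RB[32:63] << (sm + 1)) + (RA)."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field, so it must be 0..3")
    n = rb & 0xFFFFFFFF  # lower 32 bits only, zero-extended
    return ((n << (sm + 1)) + ra) & MASK64


r1 = 0x10000             # base address (hypothetical value)
r2 = 0xDEADBEEF00000003  # index 3 in the lower word, junk in the upper word

# shadduw r4, r1, r2, 2: sm = 2 gives a shift of 3, so the index is scaled by 8.
r4 = shadduw(r1, r2, 2)
print(hex(r4))  # 0x10018
```

A scale of 8 matches indexing an array of 64-bit elements: element 3 lies 24 bytes past the base address.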

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | Mnemonic | Description                 |
|------|------|------|---------|----------|-----------------------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add               |
| Z23  | I    | #    | 3.0B    | shaddw   | Shift-and-Add Signed Word   |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |