# RFC ls004 v2 Shift-And-Add and LD/ST-Shifted

* Funded by NLnet under the Privacy and Enhanced Trust Programme, EU
  Horizon2020 Grant 825310, and NGI0 Entrust No 101069594
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* <https://git.openpower.foundation/isa/PowerISA/issues/125>
* feedback: <https://bugs.libre-soc.org/show_bug.cgi?id=1091>

* initial shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add saddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>
* consider LD/ST-Shifted <https://bugs.libre-soc.org/show_bug.cgi?id=1055>
**Books and Section affected**:

    Book I Fixed-Point Shift Instructions 3.3.14.2
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

    saddw - Shift and Add Signed Word
    sadduw - Shift and Add Unsigned Word
    Also LD/ST-Indexed-Shifted (Fixed and Floating)

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

    Addition of three new GPR-based instructions

**Impact on software**:

    Requires support for new instructions in assembler, debuggers,

    GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing

Power ISA is missing LD/ST-Indexed with shift, which is present in both ARM
and x86. Adding more LD/ST is thirty-seven instructions; a compromise is to
add shift-and-add, which replaces a pair of explicit instructions in hot loops.
Adding actual LD/ST-Shifted saves even further.

**Notes and Observations**:

1. `sadd` and `sadduw` operate on unsigned integers.
2. `sadduw` is intended for performing address offsets,
   as the second operand is constrained to the lower 32 bits of RB.
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and aarch64,
   since they are useful for both general arithmetic and for
   computing addresses even when not immediately followed
   by a load or store.
5. `saddw` is often more useful than `sadduw` because C/C++ programmers like
   to use `int` for array indexing. For additional details see
   <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
6. Even the Motorola 68000 family has LD/ST-Indexed-Shifted, from the MC68020 onward: <https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>
7. Should average-shift-add also be included? What about CA-in / CA-out?

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

# Table of LD/ST-Indexed-Shift

The following demonstrates the alternative instructions that could
be considered for addition. They are all 9-bit XO:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 5 Floating-Point Store Indexed Shifted (with Update)
* 6 Load Indexed Shifted Update Post-Increment
* 4 Store Indexed Shifted Update Post-Increment
* 2 Floating-Point Load Indexed Shifted Update Post-Increment
* 2 Floating-Point Store Indexed Shifted Update Post-Increment

Total count: 51 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing the 37 instructions of the
Shifted-only group in EXT2xx: they need to be in EXT0xx, because if they are
added as 64-bit encodings the benefit of reduced binary size is not achieved.
Post-Increment-Shifted on the other hand could reasonably be proposed

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | SH | XO | lbzsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhzsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhasx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwzsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwasx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhbrsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwbrsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldbrsx RT,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stbsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthbrsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwbrsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdbrsx RS,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfsxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfdxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfiwaxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfiwzxs FRT,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfsxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfdxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfiwxs FRS,RA,RB,SH |

**LD/ST-Shifted-Update**

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | SH | XO | lbzusx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhzusx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhausx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwzusx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwausx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldusx RT,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stbusx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthusx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwusx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdusx RS,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfsuxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfduxs FRT,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfsuxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfduxs FRS,RA,RB,SH |

**Post-Increment-Update LD/ST-Shifted**

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | SH | XO | lbzupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhzupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhaupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwzupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwaupsx RT,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stbupsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthupsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwupsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdupsx RS,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldupsx RT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfdupxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfsupxs FRT,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfdupxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfsupxs FRS,RA,RB,SH |
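
As a cross-check of the totals quoted in this section, the mnemonics in the three tables above can be tallied directly. The following is only a sketch in Python: the group labels (`shifted`, `shifted_update`, `post_increment`) are names chosen here for illustration and are not RFC terminology.

```python
# Mnemonics copied from the three LD/ST-Indexed-Shifted tables above.
# The dictionary keys are illustrative labels, not names defined by the RFC.
groups = {
    "shifted": """
        lbzsx lhzsx lhasx lwzsx lwasx ldsx lhbrsx lwbrsx ldbrsx
        stbsx sthsx stwsx stdsx sthbrsx stwbrsx stdbrsx
        lfsxs lfdxs lfiwaxs lfiwzxs stfsxs stfdxs stfiwxs
    """.split(),
    "shifted_update": """
        lbzusx lhzusx lhausx lwzusx lwausx ldusx
        stbusx sthusx stwusx stdusx
        lfsuxs lfduxs stfsuxs stfduxs
    """.split(),
    "post_increment": """
        lbzupsx lhzupsx lhaupsx lwzupsx lwaupsx ldupsx
        stbupsx sthupsx stwupsx stdupsx
        lfdupxs lfsupxs stfdupxs stfsupxs
    """.split(),
}

# Count the mnemonics per group and overall.
counts = {name: len(mnemonics) for name, mnemonics in groups.items()}
total = sum(counts.values())

# Sanity check: no mnemonic appears in more than one table row.
all_mnemonics = [m for mnemonics in groups.values() for m in mnemonics]
assert len(set(all_mnemonics)) == total

print(counts)
print("total:", total)
```

By this tally the tables list 23 Shifted, 14 Shifted-Update and 14 Post-Increment mnemonics, 51 in all.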

# Shift-and-Add

`sadd RT, RA, RB, SH`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |

    shift <- SH + 1                       # Shift is between 1-4
    sum[0:63] <- ((RB) << shift) + (RA)   # Shift RB, add RA
    RT <- sum                             # Result stored in RT

When `SH` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`SH` is a 2-bit field, allowing multiplication of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT are all 64-bit, unsigned integers.

**NEED EXAMPLES (not sure how to embed SH)!!!**
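
A worked example may help here. The sketch below models the `sadd` pseudocode in Python. The function name `sadd`, the constant `MASK64` and the register values are illustrative only, and the sketch assumes that the `SH` operand is the raw 2-bit field value (so the actual shift is `SH + 1`), which the text above does not settle for assembler syntax.

```python
MASK64 = (1 << 64) - 1  # results are truncated to 64 bits, as in sum[0:63]


def sadd(ra: int, rb: int, sh: int) -> int:
    """Model of RT <- ((RB) << (SH + 1)) + (RA), modulo 2**64."""
    if not 0 <= sh <= 3:
        raise ValueError("SH is a 2-bit field, so it must be in 0..3")
    shift = sh + 1  # shift amount is between 1 and 4
    return (((rb & MASK64) << shift) + (ra & MASK64)) & MASK64


# Address of element 5 in an array of 8-byte items at (made-up) base 0x1000:
# SH=2 gives a shift of 3, i.e. multiplication of RB by 8.
print(hex(sadd(0x1000, 5, 2)))  # 0x1028

# The four possible scale factors applied to RB=1 with RA=0.
print([sadd(0, 1, sh) for sh in range(4)])  # [2, 4, 8, 16]
```

Because the result is truncated to 64 bits, bits shifted out of the top of RB are discarded, for example `sadd(1, 1 << 63, 0)` gives 1.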

# Shift-and-Add Signed Word

`saddw RT, RA, RB, SH`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |

    shift <- SH + 1                    # Shift is between 1-4
    n <- EXTS64((RB)[32:63])           # Sign-extend lower 32 bits of RB
    sum[0:63] <- (n << shift) + (RA)   # Shift n, add RA
    RT <- sum                          # Result stored in RT

When `SH` is zero, the lower word of register RB is sign-extended to 64 bits,
multiplied by 2, added to the contents of register RA, and the result stored
in RT.

`SH` is a 2-bit field, allowing multiplication of the sign-extended lower word
of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT are all 64-bit, signed integers.

*This instruction is useful for computing address offsets: RA is the 64-bit
base address and RB is the offset into the data structure, limited to 32 bits.*
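
The effect of the sign extension can be illustrated with a short Python model of the pseudocode above. This is a sketch only: `saddw`, `exts64_word` and `MASK64` are illustrative names, and the base address and index values are made up.

```python
MASK64 = (1 << 64) - 1  # results are truncated to 64 bits


def exts64_word(value: int) -> int:
    """Sign-extend the lower 32 bits of value, like EXTS64((RB)[32:63])."""
    word = value & 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


def saddw(ra: int, rb: int, sh: int) -> int:
    """Model of RT <- (EXTS64(RB[32:63]) << (SH + 1)) + (RA), modulo 2**64."""
    if not 0 <= sh <= 3:
        raise ValueError("SH is a 2-bit field, so it must be in 0..3")
    n = exts64_word(rb)
    return ((n << (sh + 1)) + ra) & MASK64


# A C `int` index of -1 (0xFFFFFFFF in the low word) with 8-byte elements
# steps back one element from the base address.
print(hex(saddw(0x1000, 0xFFFFFFFF, 2)))  # 0xff8

# The upper 32 bits of RB are ignored entirely.
print(hex(saddw(0x1000, 0xDEADBEEF00000002, 1)))  # 0x1008
```

This matches the C/C++ habit of indexing with a signed `int`: a negative index produces an address below the base.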

# Shift-and-Add Unsigned Word

`sadduw RT, RA, RB, SH`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |

    shift <- SH + 1                    # Shift is between 1-4
    n <- (RB)[32:63]                   # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)   # Shift n, add RA
    RT <- sum                          # Result stored in RT

When `SH` is zero, the lower word of register RB is treated as an unsigned
value, multiplied by 2, added to the contents of register RA, and the result
stored in RT.

`SH` is a 2-bit field, allowing multiplication of the lower word of RB
by 2, 4, 8 or 16.

Operands RA and RB, and the result RT are all 64-bit, unsigned integers.

*This instruction is useful for computing address offsets: RA is the 64-bit
base address and RB is the offset into the data structure, limited to 32 bits.*
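
For comparison, a Python model of the unsigned-word pseudocode is sketched below. It assumes that the 32-bit value `n` is widened with zeros before shifting, which is what "unsigned" implies here. The names `sadduw` and `MASK64` and the operand values are illustrative.

```python
MASK64 = (1 << 64) - 1  # results are truncated to 64 bits


def sadduw(ra: int, rb: int, sh: int) -> int:
    """Model of RT <- ((RB)[32:63] << (SH + 1)) + (RA), modulo 2**64."""
    if not 0 <= sh <= 3:
        raise ValueError("SH is a 2-bit field, so it must be in 0..3")
    n = rb & 0xFFFFFFFF  # lower word of RB, treated as unsigned (assumption)
    return ((n << (sh + 1)) + ra) & MASK64


# The same low word 0xFFFFFFFF is now a large positive offset, not -1.
print(hex(sadduw(0x1000, 0xFFFFFFFF, 2)))  # 0x800000ff8

# The upper 32 bits of RB are still ignored.
print(hex(sadduw(0x1000, 0xDEADBEEF00000002, 1)))  # 0x1008
```

The only difference from the signed-word form is how a low word with its top bit set is interpreted: here 0xFFFFFFFF is an offset of 4294967295 elements rather than minus one.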

[[!inline pages="openpower/isa/fixedloadshift" raw=yes ]]

[[!inline pages="openpower/isa/fixedstoreshift" raw=yes ]]

[[!inline pages="openpower/isa/fploadshift" raw=yes ]]

[[!inline pages="openpower/isa/fpstoreshift" raw=yes ]]

[[!inline pages="openpower/isa/pifixedloadshift" raw=yes ]]

[[!inline pages="openpower/isa/pifixedstoreshift" raw=yes ]]

[[!inline pages="openpower/isa/pifploadshift" raw=yes ]]

[[!inline pages="openpower/isa/pifpstoreshift" raw=yes ]]

# Instruction Formats

**Add the following to Book I 1.6.1**

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |
| PO | RS | RA | RB | SH | XO | Rc | Z23-Form |
| PO | FRT | RA | RB | SH | XO | Rc | Z23-Form |
| PO | FRS | RA | RB | SH | XO | Rc | Z23-Form |

Add `Z23` to the Formats of the following fields in Book I 1.6.2: `FRS FRT RT RA RB XO Rc`

Add the following new fields:

    SH (21:22)
        Field used to specify a shift amount.
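
To show how `SH` sits in the instruction word, the Z23-Form layout from the table above can be packed field by field. This is a hedged sketch rather than an assembler: `encode_z23` and `extract_sh` are hypothetical helper names, and because no PO or XO values are assigned in this text, zero is used purely as a placeholder for both.

```python
# Z23-Form fields in MSB0 order with their widths in bits, from the table:
# PO 0-5, RT 6-10, RA 11-15, RB 16-20, SH 21-22, XO 23-30, Rc 31.
Z23_WIDTHS = (("PO", 6), ("RT", 5), ("RA", 5), ("RB", 5),
              ("SH", 2), ("XO", 8), ("Rc", 1))


def encode_z23(po: int, rt: int, ra: int, rb: int,
               sh: int, xo: int, rc: int = 0) -> int:
    """Pack the fields into a 32-bit word, most significant field first."""
    values = (po, rt, ra, rb, sh, xo, rc)
    word = 0
    for (name, width), value in zip(Z23_WIDTHS, values):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def extract_sh(word: int) -> int:
    """Recover SH: MSB0 bits 21-22 are LSB0 bits 10-9 of the 32-bit word."""
    return (word >> 9) & 0b11


# PO=0 and XO=0 are placeholders only; real values are not given here.
word = encode_z23(po=0, rt=3, ra=4, rb=5, sh=2, xo=0)
print(f"{word:#010x}")   # 0x00642c00
print(extract_sh(word))  # 2
```

The same packing would apply to the RS, FRT and FRS rows of the table, since those fields occupy the same bits 6-10 as RT.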

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23 | I | # | 3.0B | sadd | Shift-and-Add |
| Z23 | I | # | 3.0B | saddw | Shift-and-Add Signed Word |
| Z23 | I | # | 3.0B | sadduw | Shift-and-Add Unsigned Word |