# RFC ls004 Shift-And-Add

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* <https://git.openpower.foundation/isa/PowerISA/issues/125>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
    Book I Fixed-Point Shift Instructions 3.3.14.2
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
    shadd - Shift and Add
    shaddw - Shift and Add Signed Word
    shadduw - Shift and Add Unsigned Word
    Also under consideration LD/ST-Indexed-Shifted
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of three new GPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
```

**Motivation**

Power ISA lacks LD/ST Indexed with shift, which is present in both ARM
and x86. Adding shifted LD/ST would take thirty-seven new instructions;
a compromise is to add shift-and-add, which replaces a pair of explicit
instructions in hot loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for computing address offsets,
   as the second operand is constrained to its lower 32 bits
   and zero-extended.
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and aarch64,
   since they are useful both for general arithmetic and for
   computing addresses, even when not immediately followed
   by a load/store.
5. `shaddw` is often more useful than `shadduw` because C/C++ programmers like
   to use `int` for array indexing. For additional details see
   <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
6. Even the Motorola 68000 family (MC68020) has LD/ST-Indexed-Shifted: <https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>
7. Should average-add also be included? What about CA?

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following lists the alternative instructions that could
be considered for addition. They all use a 9-bit XO:

* 12 Load Indexed Shifted (with and without Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with and without Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with and without Update)
* 5 Floating-Point Store Indexed Shifted (with and without Update)
* 6 Load Indexed Shifted Update Post-Increment
* 4 Store Indexed Shifted Update Post-Increment
* 2 Floating-Point Load Indexed Shifted Update Post-Increment
* 2 Floating-Point Store Indexed Shifted Update Post-Increment

Total count: 51 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing the 37-instruction
Shifted-only group in EXT2xx: they need to be in EXT0xx, because if they
are added as a 64-bit encoding the benefit of reduced binary size is not
achieved. Post-Increment-Shifted, on the other hand, could reasonably be
proposed in EXT2xx.

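One plausible reading of the "3 bits" figure is that 51 instructions round up to a block of 64 code points out of the 512 available in a 9-bit XO, which is one eighth of the space. The sketch below works through that arithmetic. The rounding to a power-of-two block is an assumption made for illustration and is not stated in this RFC.

```python
XO_BITS = 9            # width of the XO field (bits 23-31)
NEW_INSTRUCTIONS = 51  # total count quoted above

# Assumption: encodings are reserved in a power-of-two sized block.
block_bits = (NEW_INSTRUCTIONS - 1).bit_length()  # bits needed to number 51 items
block_size = 1 << block_bits                      # code points reserved
xo_space = 1 << XO_BITS                           # code points in a 9-bit XO

# Fixing (XO_BITS - block_bits) bits of XO selects one block of that size.
cost_bits = XO_BITS - block_bits
fraction = block_size / xo_space

print(block_size, xo_space, cost_bits, fraction)
```

Under this reading, the cost is equivalent to dedicating a 3-bit prefix of the XO field to the whole group.
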
**LD/ST-Shifted**

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
| PO    | RT   | RA    | RB    | sm    | XO    | lbzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhasx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lwasx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | ldsx RT,RA,RB,sm     |
| PO    | RT   | RA    | RB    | sm    | XO    | lhbrsx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwbrsx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | ldbrsx RT,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stbsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | sthsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stwsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stdsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | sthbrsx RS,RA,RB,sm  |
| PO    | RS   | RA    | RB    | sm    | XO    | stwbrsx RS,RA,RB,sm  |
| PO    | RS   | RA    | RB    | sm    | XO    | stdbrsx RS,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsxs FRT,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfdxs FRT,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfiwaxs FRT,RA,RB,sm |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfiwzxs FRT,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsxs FRS,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfdxs FRS,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfiwxs FRS,RA,RB,sm |

**LD/ST-Shifted-Update**

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
| PO    | RT   | RA    | RB    | sm    | XO    | lbzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhausx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwausx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | ldusx RT,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stbusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | sthusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stwusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stdusx RS,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsuxs FRT,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfduxs FRT,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsuxs FRS,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfduxs FRS,RA,RB,sm |

**Post-Increment-Update LD/ST-Shifted**

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction           |
|-------|------|-------|-------|-------|-------|-----------------------|
| PO    | RT   | RA    | RB    | sm    | XO    | lbzuspx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzuspx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhauspx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzuspx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwauspx RT,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stbuspx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | sthuspx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stwuspx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stduspx RS,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lduspx RT,RA,RB,sm    |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfdupxs FRT,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsupxs FRT,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfdupxs FRS,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsupxs FRS,RA,RB,sm |

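The field layout shared by the three tables above can be sketched as a small packing routine. Power ISA numbers bits from the most significant end, so a field spanning bits `a-b` sits `31-b` positions up from the least significant bit. This RFC assigns no PO or XO values: the values used below are placeholders, and `encode_ldst_shifted` and `decode_ldst_shifted` are hypothetical helper names introduced only for illustration.

```python
# (name, first_bit, last_bit) in Power ISA MSB0 numbering, taken from the tables.
FIELDS = [
    ("PO", 0, 5),
    ("RT", 6, 10),   # RS / FRT / FRS occupy the same bits
    ("RA", 11, 15),
    ("RB", 16, 20),
    ("sm", 21, 22),
    ("XO", 23, 31),
]

def encode_ldst_shifted(**values):
    """Pack field values into a 32-bit instruction word."""
    word = 0
    for name, first, last in FIELDS:
        width = last - first + 1
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << (31 - last)
    return word

def decode_ldst_shifted(word):
    """Unpack a 32-bit instruction word into a dict of field values."""
    out = {}
    for name, first, last in FIELDS:
        width = last - first + 1
        out[name] = (word >> (31 - last)) & ((1 << width) - 1)
    return out

# Placeholder opcode values: PO=31 and XO=0x155 are NOT assigned by this RFC.
fields = dict(PO=31, RT=4, RA=1, RB=2, sm=3, XO=0x155)
word = encode_ldst_shifted(**fields)
print(f"{word:#010x}", decode_ldst_shifted(word))
```

Driving both directions from one field table keeps the encoder and decoder in step if the layout changes during review.
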
----------------

\newpage{}

# Shift-and-Add

`shadd RT, RA, RB, sm`

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                          # Shift is between 1-4
    sum[0:63] <- ((RB) << shift) + (RA)      # Shift RB, add RA
    RT <- sum                                # Result stored in RT
```

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit bit-field, and allows multiplication of RB by 2, 4, 8, 16.

Operands RA and RB, and the result RT are all 64-bit, unsigned integers.

**NEED EXAMPLES (not sure how to embed sm)!!!**

Examples:

```
# adds r1 to (r2*16): sm=3 gives a shift of 4
shadd r4, r1, r2, 3
```

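The `shadd` pseudocode can be modelled in a few lines of Python, as a minimal sketch for checking examples by hand. The function name `shadd` and the base address are illustrative only. Python integers are unbounded, so the model masks the result to 64 bits to mirror `sum[0:63]`.

```python
MASK64 = (1 << 64) - 1

def shadd(ra, rb, sm):
    """Model of the pseudocode: RT = ((RB) << (sm + 1)) + (RA), modulo 2**64."""
    if not 0 <= sm <= 3:
        raise ValueError("sm is a 2-bit field")
    shift = sm + 1
    return ((rb << shift) + ra) & MASK64

# Address of element 5 in an array of 8-byte items at a hypothetical base.
base = 0x1000
assert shadd(base, 5, 2) == base + 5 * 8

# The four sm values multiply RB by 2, 4, 8 and 16.
print([shadd(0, 1, sm) for sm in range(4)])  # [2, 4, 8, 16]

# Bits shifted out of the top are lost, since the sum is only 64 bits wide.
print(hex(shadd(0, 1 << 63, 0)))  # 0x0
```
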
# Shift-and-Add Signed Word

`shaddw RT, RA, RB, sm`

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                          # Shift is between 1-4
    n <- EXTS64((RB)[32:63])                 # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)         # Shift n, add RA
    RT <- sum                                # Result stored in RT
```

When `sm` is zero, the lower word of register RB is sign-extended,
multiplied by 2, added to the contents of register RA, and the result
stored in RT.

`sm` is a 2-bit bit-field, and allows multiplication of RB by 2, 4, 8, 16.

Operands RA and RB, and the result RT are all 64-bit, signed integers.

*Programmer's Note:
This instruction is useful for computing address offsets. RA is the 64-bit
base address. RB is the signed offset into a data structure, limited to
32 bits.*

Examples:

```
# r4 = r1 + (r2*16)
shaddw r4, r1, r2, 3
```

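A minimal Python model of the `shaddw` pseudocode shows how a negative `int` index behaves. The names `exts64_word` and `shaddw` and the base address are illustrative only, and the result is masked to 64 bits to mirror `sum[0:63]`.

```python
MASK64 = (1 << 64) - 1

def exts64_word(value):
    """Sign-extend the low 32 bits to 64 bits (models EXTS64((RB)[32:63]))."""
    low = value & 0xFFFF_FFFF
    if low & 0x8000_0000:
        low -= 1 << 32      # negative as a Python integer
    return low & MASK64     # back to a 64-bit two's-complement pattern

def shaddw(ra, rb, sm):
    """Model of the shaddw pseudocode, with the result taken modulo 2**64."""
    shift = sm + 1
    n = exts64_word(rb)
    return ((n << shift) + ra) & MASK64

# An `int` index of -1 held in the low word of RB, scaled by 4 (sm=1).
base = 0x2000
index = 0xFFFF_FFFF          # 32-bit pattern for -1
print(hex(shaddw(base, index, 1)))  # 0x1ffc, i.e. base - 4
```

The sign extension is what lets a negative index step backwards from the base address instead of producing a large positive offset.
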
----------------

\newpage{}


# Shift-and-Add Unsigned Word

`shadduw RT, RA, RB, sm`

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                          # Shift is between 1-4
    n <- (RB)[32:63]                         # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)         # Shift n, add RA
    RT <- sum                                # Result stored in RT
```

When `sm` is zero, the lower word of register RB is zero-extended,
multiplied by 2, added to the contents of register RA, and the result
stored in RT.

`sm` is a 2-bit bit-field, and allows multiplication of RB by 2, 4, 8, 16.

Operands RA and RB, and the result RT are all 64-bit, unsigned integers.

*Programmer's Note:
This instruction is useful for computing address offsets. RA is the 64-bit
base address. RB is the unsigned offset into a data structure, limited to
32 bits.*

Examples:

```
# r4 = r1 + (zero-extended lower word of r2)*8
shadduw r4, r1, r2, 2
```

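The `shadduw` pseudocode can be modelled the same way, as a sketch that assumes `n` is zero-extended to 64 bits before the shift (as stated in the Notes and Observations). The function name `shadduw` and the base address are illustrative only.

```python
MASK64 = (1 << 64) - 1

def shadduw(ra, rb, sm):
    """Model of the shadduw pseudocode, assuming n is zero-extended to 64 bits."""
    shift = sm + 1
    n = rb & 0xFFFF_FFFF     # (RB)[32:63]: lower word only, upper word ignored
    return ((n << shift) + ra) & MASK64

base = 0x2000

# The upper 32 bits of RB do not affect the result.
assert shadduw(base, 0xDEAD_BEEF_0000_0003, 2) == base + 3 * 8

# 0xFFFFFFFF is treated as 4294967295, not as -1.
print(hex(shadduw(base, 0xFFFF_FFFF, 1)))  # 0x400001ffc
```

The last line is the case where the choice between the signed and unsigned variants matters: the same bit pattern in RB yields a large forward offset here.
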
# Appendices

    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description                 |
|------|------|------|---------|----------|-----------------------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add               |
| Z23  | I    | #    | 3.0B    | shaddw   | Shift-and-Add Signed Word   |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |

[[!tag opf_rfc]]
