# RFC ls004 v2 Shift-And-Add and LD/ST-Shifted

* Funded by NLnet under the Privacy and Enhanced Trust Programme, EU
  Horizon2020 Grant 825310, and NGI0 Entrust No 101069594
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* <https://git.openpower.foundation/isa/PowerISA/issues/125>
* feedback: <https://bugs.libre-soc.org/show_bug.cgi?id=1091>

**Changes**:

* initial shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add saddw <https://bugs.libre-soc.org/show_bug.cgi?id=996>
* consider LD/ST-Shifted <https://bugs.libre-soc.org/show_bug.cgi?id=1055>

**Severity**: Major

**Status**: New

**Date**: 07 Feb 2024

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
Book I Fixed-Point Shift Instructions 3.3.14.2
Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
Instructions added
sadd - Shift and Add
saddw - Shift and Add Signed Word
sadduw - Shift and Add Unsigned Word
Also LD/ST-Indexed-Shifted (Fixed and Floating)
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
Addition of three new GPR-based instructions, plus optionally
the LD/ST-Indexed-Shifted instructions (Fixed and Floating)
```

**Impact on software**:

```
Requires support for new instructions in assembler, debuggers,
and related tools.
```

**Keywords**:

```
GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
```

**Motivation**

Power ISA is missing LD/ST-Indexed-Shifted addressing, which is present in
both ARM and x86. Adding the full set of LD/ST-Shifted instructions costs
thirty-seven instructions; a compromise is to add shift-and-add, which
replaces a pair of explicit instructions (a shift followed by an add) in
hot-loops. Adding actual LD/ST-Shifted instructions saves even further.

**Notes and Observations**:

1. `sadd` and `sadduw` operate on unsigned integers.
2. `sadduw` is intended for computing address offsets,
   as the second operand (RB) is constrained to its lower 32 bits
   and zero-extended.
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and AArch64,
   since they are useful both for general arithmetic and for
   computing addresses, even when not immediately followed
   by a load/store.
5. `saddw` is often more useful than `sadduw` because C/C++ programmers like
   to use `int` for array indexing. For additional details see
   <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
6. Even the Motorola 68020 has LD/ST-Indexed-Shifted (scaled index) addressing:
   <https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>
7. Should average-shift-add also be included? What about CA-in / CA-out?

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following lists the alternative instructions that could
be considered for addition. All of them use a 9-bit XO:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 5 Floating-Point Store Indexed Shifted (with Update)
* 6 Load Indexed Shifted Update Post-Increment
* 4 Store Indexed Shifted Update Post-Increment
* 2 Floating-Point Load Indexed Shifted Update Post-Increment
* 2 Floating-Point Store Indexed Shifted Update Post-Increment

Total count: 51 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
that these instructions represent in hot-loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing the 37-instruction
Shifted-only group (including the Update forms) in EXT2xx: it needs to be
in EXT0xx, because as 64-bit encodings these instructions would not achieve
the intended reduction in binary size. Post-Increment-Shifted, on the other
hand, could reasonably be proposed in EXT2xx.

**LD/ST-Shifted**

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | SH | XO | lbzsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhzsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhasx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwzsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwasx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhbrsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwbrsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldbrsx RT,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stbsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthbrsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwbrsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdbrsx RS,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfsxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfdxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfiwaxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfiwzxs FRT,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfsxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfdxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfiwxs FRS,RA,RB,SH |
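
The table above lists mnemonics only; the effective-address pseudocode lives in the inlined `fixedloadshift` and `fixedstoreshift` pages. As a rough illustration, the Python sketch below *assumes* that the non-Update forms compute EA = (RA|0) + ((RB) << (SH+1)), i.e. the same scaling as `sadd`. That formula, the helper name `ea_indexed_shifted` and the 32-entry `gpr` list are assumptions made for this sketch, not definitions taken from this RFC.

```python
MASK64 = (1 << 64) - 1  # GPRs are 64 bits wide


def ea_indexed_shifted(gpr: list, ra: int, rb: int, sh: int) -> int:
    """Hypothetical EA for a non-Update LD/ST-Indexed-Shifted (e.g. lbzsx).

    ASSUMPTION: EA = (RA|0) + ((RB) << (SH + 1)), mirroring sadd.
    """
    if not 0 <= sh <= 3:
        raise ValueError("SH is a 2-bit field (0-3)")
    base = 0 if ra == 0 else gpr[ra]  # (RA|0): RA=0 selects a zero base
    return (base + (gpr[rb] << (sh + 1))) & MASK64


gpr = [0] * 32
gpr[1] = 0x2000  # base address of an array of 8-byte elements
gpr[2] = 3       # element index
# Hypothetical "ldsx r4, r1, r2, 2": SH=2 scales the index by 8
print(hex(ea_indexed_shifted(gpr, 1, 2, 2)))  # 0x2018
```

Under this assumption a single `ldsx` would replace the `sadd` plus `ldx` pair, and with RA=0 the base is zero, so `ea_indexed_shifted(gpr, 0, 2, 2)` gives 24.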

**LD/ST-Shifted-Update**

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | SH | XO | lbzusx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhzusx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhausx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwzusx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwausx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldusx RT,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stbusx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthusx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwusx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdusx RS,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfsuxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfduxs FRT,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfsuxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfduxs FRS,RA,RB,SH |

**Post-Increment-Update LD/ST-Shifted**

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction |
|-------|------|-------|-------|-------|-------|----------------------|
| PO | RT | RA | RB | SH | XO | lbzupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhzupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lhaupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwzupsx RT,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | lwaupsx RT,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stbupsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | sthupsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stwupsx RS,RA,RB,SH |
| PO | RS | RA | RB | SH | XO | stdupsx RS,RA,RB,SH |
| PO | RT | RA | RB | SH | XO | ldupsx RT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfdupxs FRT,RA,RB,SH |
| PO | FRT | RA | RB | SH | XO | lfsupxs FRT,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfdupxs FRS,RA,RB,SH |
| PO | FRS | RA | RB | SH | XO | stfsupxs FRS,RA,RB,SH |
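
The difference between the Update and Post-Increment groups is not spelled out in this section. The Python sketch below *assumes* conventional semantics: an Update form (by analogy with the existing Power ISA `lbzux`) accesses memory at EA = (RA) + ((RB) << (SH+1)) and writes EA back to RA, whereas a Post-Increment form accesses memory at (RA) and only afterwards advances RA by the scaled RB. Both formulas and both function names are hypothetical; the inlined pseudocode pages are authoritative.

```python
MASK64 = (1 << 64) - 1


def update_shifted(ra_val: int, rb_val: int, sh: int) -> tuple:
    """ASSUMED Update form: access at EA = RA + (RB << (SH+1)); RA <- EA."""
    ea = (ra_val + (rb_val << (sh + 1))) & MASK64
    return ea, ea  # (address accessed, new RA)


def post_increment_shifted(ra_val: int, rb_val: int, sh: int) -> tuple:
    """ASSUMED Post-Increment form: access at RA; then RA <- RA + (RB << (SH+1))."""
    new_ra = (ra_val + (rb_val << (sh + 1))) & MASK64
    return ra_val, new_ra  # (address accessed, new RA)


# Walk an array of 8-byte elements one element at a time (RB=1, SH=2 -> x8)
ra = 0x1000
accessed = []
for _ in range(3):
    addr, ra = post_increment_shifted(ra, 1, 2)
    accessed.append(addr)
print([hex(a) for a in accessed])  # ['0x1000', '0x1008', '0x1010']
```

With the same inputs the assumed Update form would skip the first element: `update_shifted(0x1000, 1, 2)` accesses 0x1008, which is why a Post-Increment variant suits loops that start at the base address.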

----------------

\newpage{}

# Shift-and-Add

`sadd RT, RA, RB, SH`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |

Pseudocode:

```
shift <- SH + 1 # Shift is between 1-4
sum[0:63] <- ((RB) << shift) + (RA) # Shift RB, add RA
RT <- sum # Result stored in RT
```

When `SH` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result is stored in RT.

`SH` is a 2-bit field, and allows multiplication of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT, are all 64-bit unsigned integers.
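
To make the pseudocode concrete, here is a minimal Python model of `sadd`. It assumes register values are passed as unsigned Python integers in the range 0 to 2^64-1 and that, as `sum[0:63]` implies, the result wraps modulo 2^64. The function name `sadd` is only a label for this sketch, not part of any existing library.

```python
MASK64 = (1 << 64) - 1  # RT keeps only bits 0:63 (64 bits)


def sadd(ra: int, rb: int, sh: int) -> int:
    """Model of sadd RT,RA,RB,SH: RT <- ((RB) << (SH + 1)) + (RA)."""
    if not 0 <= sh <= 3:
        raise ValueError("SH is a 2-bit field (0-3)")
    shift = sh + 1  # shift amount is between 1 and 4
    return ((rb << shift) + ra) & MASK64  # wrap like sum[0:63]


# SH=0 doubles RB; SH=3 multiplies RB by 16
print(hex(sadd(0x1000, 5, 0)))  # 0x100a
print(hex(sadd(0x1000, 5, 3)))  # 0x1050
```

Because the shift happens before the add, bits of RB shifted out beyond 64 bits are simply lost: `sadd(1, 1 << 63, 0)` returns 1.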

**NEED MORE EXAMPLES (assembler syntax for SH still to be confirmed)**

Examples:

```
# r4 = r1 + (r2*8)
sadd r4, r1, r2, 2
```

# Shift-and-Add Signed Word

`saddw RT, RA, RB, SH`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |

Pseudocode:

```
shift <- SH + 1 # Shift is between 1-4
n <- EXTS64((RB)[32:63]) # Only use lower 32-bits of RB
sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
RT <- sum # Result stored in RT
```

When `SH` is zero, the lower word of register RB is sign-extended to 64 bits,
multiplied by 2, added to the contents of register RA, and the result is
stored in RT.

`SH` is a 2-bit field, and allows multiplication of RB by 2, 4, 8 or 16.

Operand RA and the result RT are 64-bit signed integers; only the lower
32 bits of RB are used, sign-extended to 64 bits.

*Programmer's Note:
This instruction is intended for computing address offsets: RA holds the
64-bit base address, and RB holds a signed 32-bit offset into a data
structure (for example a C `int` index).*
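
A minimal Python model of `saddw`, again assuming unsigned Python integers stand in for 64-bit registers and the result wraps modulo 2^64. The helper `exts64_word` plays the role of `EXTS64((RB)[32:63])`; both names are illustrative only. The example shows why the sign extension matters for a negative C `int` index, and that the upper 32 bits of RB are ignored.

```python
MASK64 = (1 << 64) - 1


def exts64_word(x: int) -> int:
    """Sign-extend bits 32:63 (the low word) of x to a 64-bit pattern."""
    w = x & 0xFFFF_FFFF
    if w & 0x8000_0000:  # sign bit of the low word is set
        w |= 0xFFFF_FFFF_0000_0000
    return w


def saddw(ra: int, rb: int, sh: int) -> int:
    """Model of saddw: RT <- (EXTS64((RB)[32:63]) << (SH + 1)) + (RA)."""
    if not 0 <= sh <= 3:
        raise ValueError("SH is a 2-bit field (0-3)")
    n = exts64_word(rb)
    return ((n << (sh + 1)) + ra) & MASK64


# Index -1 (low word 0xFFFFFFFF) with 16-byte elements steps back 16 bytes
print(hex(saddw(0x1000, 0xFFFF_FFFF, 3)))            # 0xff0
# Garbage in the upper word of RB is ignored
print(hex(saddw(0x1000, 0xDEAD_BEEF_0000_0002, 3)))  # 0x1020
```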

Examples:

```
# r4 = r1 + (r2*16)
saddw r4, r1, r2, 3
```

----------------

\newpage{}

# Shift-and-Add Unsigned Word

`sadduw RT, RA, RB, SH`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |

Pseudocode:

```
shift <- SH + 1 # Shift is between 1-4
n <- EXTZ64((RB)[32:63]) # Only use lower 32-bits of RB, zero-extended
sum[0:63] <- (n << shift) + (RA) # Shift n, add RA
RT <- sum # Result stored in RT
```

When `SH` is zero, the lower word of register RB is zero-extended to 64 bits,
multiplied by 2, added to the contents of register RA, and the result is
stored in RT.

`SH` is a 2-bit field, and allows multiplication of RB by 2, 4, 8 or 16.

Operand RA and the result RT are 64-bit unsigned integers; only the lower
32 bits of RB are used, zero-extended to 64 bits.

*Programmer's Note:
This instruction is intended for computing address offsets: RA holds the
64-bit base address, and RB holds an unsigned 32-bit offset into a data
structure.*
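
For comparison with `saddw`, a minimal Python model of `sadduw` under the same assumptions (unsigned Python integers standing in for 64-bit registers, result wrapped modulo 2^64; the function name is illustrative). A low word of `0xFFFFFFFF`, which `saddw` sign-extends to -1, is treated here as 4294967295.

```python
MASK64 = (1 << 64) - 1


def sadduw(ra: int, rb: int, sh: int) -> int:
    """Model of sadduw: RT <- (EXTZ64((RB)[32:63]) << (SH + 1)) + (RA)."""
    if not 0 <= sh <= 3:
        raise ValueError("SH is a 2-bit field (0-3)")
    n = rb & 0xFFFF_FFFF  # keep bits 32:63 only; upper 32 bits become zero
    return ((n << (sh + 1)) + ra) & MASK64


# Low word 0xFFFFFFFF is zero-extended: a large positive offset
print(hex(sadduw(0x1000, 0xFFFF_FFFF, 2)))            # 0x800000ff8
# The upper word of RB is ignored here too
print(hex(sadduw(0x1000, 0xDEAD_BEEF_0000_0002, 2)))  # 0x1010
```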

Examples:

```
# r4 = r1 + (zero-extended lower word of r2 * 8)
sadduw r4, r1, r2, 2
```

\newpage{}
[[!inline pages="openpower/isa/fixedloadshift" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/fixedstoreshift" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/fploadshift" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/fpstoreshift" raw=yes ]]

\newpage{}
[[!inline pages="openpower/isa/pifixedloadshift" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/pifixedstoreshift" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/pifploadshift" raw=yes ]]
\newpage{}
[[!inline pages="openpower/isa/pifpstoreshift" raw=yes ]]

\newpage{}

# Instruction Formats

**Add the following to Book I 1.6.1**

Z23-Form:

```
| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form |
|-------|------|-------|-------|-------|-------|----|----------|
| PO | RT | RA | RB | SH | XO | Rc | Z23-Form |
| PO | RS | RA | RB | SH | XO | Rc | Z23-Form |
| PO | FRT | RA | RB | SH | XO | Rc | Z23-Form |
| PO | FRS | RA | RB | SH | XO | Rc | Z23-Form |
```

# Instruction Fields

Add Z23 to the following Formats in Book I 1.6.2: `FRS FRT RS RT RA RB XO Rc`

Add the following new fields:

```
SH (21:22)
Field used to specify a shift amount.
Formats: Z23
```
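
Power ISA numbers instruction bits from 0 (most significant) to 31, so a field occupying bits a:b is placed by shifting its value left by 31 - b. The Python sketch below packs and unpacks the Z23-Form fields on that basis, showing where `SH` (bits 21:22) lands in the instruction word. It follows the Z23-Form table (XO in 23:30, Rc in 31) rather than the 9-bit XO shown in the LD/ST tables. This RFC does not assign opcodes, so the PO and XO values used are arbitrary placeholders, not proposed allocations, and the function names are hypothetical.

```python
# Z23-Form field positions in Power ISA MSB-0 bit numbering:
# name -> (first_bit, last_bit)
Z23_FIELDS = {
    "PO": (0, 5),
    "RT": (6, 10),   # also RS / FRT / FRS
    "RA": (11, 15),
    "RB": (16, 20),
    "SH": (21, 22),
    "XO": (23, 30),
    "Rc": (31, 31),
}


def encode_z23(**fields: int) -> int:
    """Pack Z23-Form fields into a 32-bit instruction word."""
    unknown = set(fields) - set(Z23_FIELDS)
    if unknown:
        raise ValueError(f"unknown Z23 fields: {sorted(unknown)}")
    word = 0
    for name, (first, last) in Z23_FIELDS.items():
        width = last - first + 1
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << (31 - last)  # bit 31 is the least significant
    return word


def decode_z23(word: int) -> dict:
    """Unpack a 32-bit instruction word into its Z23-Form fields."""
    out = {}
    for name, (first, last) in Z23_FIELDS.items():
        width = last - first + 1
        out[name] = (word >> (31 - last)) & ((1 << width) - 1)
    return out


# Hypothetical "sadd r4, r1, r2, 2": PO=5 and XO=0x12 are placeholders only
insn = encode_z23(PO=5, RT=4, RA=1, RB=2, SH=2, XO=0x12, Rc=0)
print(hex(insn))               # 0x14811424
print(decode_z23(insn)["SH"])  # 2
```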

# Appendices

Appendix E Power ISA sorted by opcode
Appendix F Power ISA sorted by version
Appendix G Power ISA sorted by Compliancy Subset
Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description |
|------|------|------|---------|----------|-------------|
| Z23 | I | # | 3.0B | sadd | Shift-and-Add |
| Z23 | I | # | 3.0B | saddw | Shift-and-Add Signed Word |
| Z23 | I | # | 3.0B | sadduw | Shift-and-Add Unsigned Word |

[[!tag opf_rfc]]
