# RFC ls004 Shift-And-Add

**URLs**:

* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* <https://git.openpower.foundation/isa/PowerISA/issues/125>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Severity**: Major

**Status**: New

**Date**: 31 Oct 2022

**Target**: v3.2B

**Source**: v3.0B

**Books and Section affected**:

```
    Book I Fixed-Point Shift Instructions 3.3.14.2
    Appendix E Power ISA sorted by opcode
    Appendix F Power ISA sorted by version
    Appendix G Power ISA sorted by Compliancy Subset
    Appendix H Power ISA sorted by mnemonic
```

**Summary**

```
    Instructions added
    shadd - Shift and Add
    shaddw - Shift and Add Signed Word
    shadduw - Shift and Add Unsigned Word
    Also under consideration LD/ST-Indexed-Shifted
```

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

```
    Addition of three new GPR-based instructions
```

**Impact on software**:

```
    Requires support for new instructions in assembler, debuggers,
    and related tools.
```

**Keywords**:

```
    GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing
```

**Motivation**

Power ISA is missing LD/ST Indexed with shift, which is present in both ARM
and x86. Adding shifted variants of the Indexed LD/ST instructions would take
37 new instructions (51 including the Post-Increment forms), so a compromise
is to add shift-and-add instead. Each shift-and-add replaces a pair of explicit
instructions (a shift followed by an add) in hot-loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for computing address offsets,
   as the second operand is constrained to the lower 32 bits
   and zero-extended.
3. All three are 2-in 1-out instructions.
4. Shift-add operations are present in both x86 and aarch64,
   since they are useful both for general arithmetic and for
   computing addresses, even when not immediately followed
   by a load/store.
5. `shaddw` is often more useful than `shadduw` because C/C++ programmers like
   to use `int` for array indexing. For additional details see
   <https://bugs.libre-soc.org/show_bug.cgi?id=996>.
6. Even the Motorola 68000 family (from the MC68020 onwards) has
   LD/ST-Indexed-Shifted: <https://tack.sourceforge.net/olddocs/m68020.html#2.2.2.%20Extra%20MC68020%20addressing%20modes>
7. Should average-add also be included? What about CA?

**Changes**

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

----------------

\newpage{}

# Table of LD/ST-Indexed-Shift

The following table lists the alternative instructions that could
be considered for addition. They all use a 9-bit XO:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 5 Floating-Point Store Indexed Shifted (with Update)
* 6 Load Indexed Shifted Update Post-Increment
* 4 Store Indexed Shifted Update Post-Increment
* 2 Floating-Point Load Indexed Shifted Update Post-Increment
* 2 Floating-Point Store Indexed Shifted Update Post-Increment

Total count: 51 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. With the savings
that these instructions represent in hot-loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing the 37-instruction
Shifted-only group in EXT2xx: they need to be in EXT0xx, because if added
as 64-bit Encodings the benefit of reduced binary size is not achieved.
Post-Increment-Shifted, on the other hand, could reasonably be proposed
in EXT2xx.

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-----|------|-------|-------|-------|-------|----------------------|
| PO  | RT   | RA    | RB    | sm    | XO    | lbzsx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lbzusx RT,RA,RB,sm   |
| PO  | RT   | RA    | RB    | sm    | XO    | lhzsx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lhzusx RT,RA,RB,sm   |
| PO  | RT   | RA    | RB    | sm    | XO    | lhasx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lhausx RT,RA,RB,sm   |
| PO  | RT   | RA    | RB    | sm    | XO    | lwzsx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lwzusx RT,RA,RB,sm   |
| PO  | RT   | RA    | RB    | sm    | XO    | lwasx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lwausx RT,RA,RB,sm   |
| PO  | RT   | RA    | RB    | sm    | XO    | ldsx RT,RA,RB,sm     |
| PO  | RT   | RA    | RB    | sm    | XO    | ldusx RT,RA,RB,sm    |
| PO  | RT   | RA    | RB    | sm    | XO    | lhbrsx RT,RA,RB,sm   |
| PO  | RT   | RA    | RB    | sm    | XO    | lwbrsx RT,RA,RB,sm   |
| PO  | RT   | RA    | RB    | sm    | XO    | ldbrsx RT,RA,RB,sm   |
| PO  | RS   | RA    | RB    | sm    | XO    | stbsx RS,RA,RB,sm    |
| PO  | RS   | RA    | RB    | sm    | XO    | stbusx RS,RA,RB,sm   |
| PO  | RS   | RA    | RB    | sm    | XO    | sthsx RS,RA,RB,sm    |
| PO  | RS   | RA    | RB    | sm    | XO    | sthusx RS,RA,RB,sm   |
| PO  | RS   | RA    | RB    | sm    | XO    | stwsx RS,RA,RB,sm    |
| PO  | RS   | RA    | RB    | sm    | XO    | stwusx RS,RA,RB,sm   |
| PO  | RS   | RA    | RB    | sm    | XO    | stdsx RS,RA,RB,sm    |
| PO  | RS   | RA    | RB    | sm    | XO    | stdusx RS,RA,RB,sm   |
| PO  | RS   | RA    | RB    | sm    | XO    | sthbrsx RS,RA,RB,sm  |
| PO  | RS   | RA    | RB    | sm    | XO    | stwbrsx RS,RA,RB,sm  |
| PO  | RS   | RA    | RB    | sm    | XO    | stdbrsx RS,RA,RB,sm  |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfsxs FRT,RA,RB,sm   |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfsuxs FRT,RA,RB,sm  |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfdxs FRT,RA,RB,sm   |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfduxs FRT,RA,RB,sm  |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfiwaxs FRT,RA,RB,sm |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfiwzxs FRT,RA,RB,sm |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfsxs FRS,RA,RB,sm  |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfsuxs FRS,RA,RB,sm |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfdxs FRS,RA,RB,sm  |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfduxs FRS,RA,RB,sm |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfiwxs FRS,RA,RB,sm |
| PO  | RT   | RA    | RB    | sm    | XO    | lbzuspx RT,RA,RB,sm  |
| PO  | RT   | RA    | RB    | sm    | XO    | lhzuspx RT,RA,RB,sm  |
| PO  | RT   | RA    | RB    | sm    | XO    | lhauspx RT,RA,RB,sm  |
| PO  | RT   | RA    | RB    | sm    | XO    | lwzuspx RT,RA,RB,sm  |
| PO  | RT   | RA    | RB    | sm    | XO    | lwauspx RT,RA,RB,sm  |
| PO  | RS   | RA    | RB    | sm    | XO    | stbuspx RS,RA,RB,sm  |
| PO  | RS   | RA    | RB    | sm    | XO    | sthuspx RS,RA,RB,sm  |
| PO  | RS   | RA    | RB    | sm    | XO    | stwuspx RS,RA,RB,sm  |
| PO  | RS   | RA    | RB    | sm    | XO    | stduspx RS,RA,RB,sm  |
| PO  | RT   | RA    | RB    | sm    | XO    | lduspx RT,RA,RB,sm   |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfdupxs FRT,RA,RB,sm |
| PO  | FRT  | RA    | RB    | sm    | XO    | lfsupxs FRT,RA,RB,sm |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfdupxs FRS,RA,RB,sm |
| PO  | FRS  | RA    | RB    | sm    | XO    | stfsupxs FRS,RA,RB,sm |

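As a cross-check on the instruction counts, the rows of the table above can be tallied per group. The sketch below is only an illustration: the group names follow the bullet list, the mnemonics are copied from the table, and the variable names are made up for this example. It also prints the share of a 9-bit XO space (512 encodings) that 51 instructions would occupy, which may or may not be how the "3 bits" estimate was arrived at.

```python
# Mnemonics copied from the table above, grouped as in the bullet list.
GROUPS = {
    "Load Indexed Shifted":
        "lbzsx lbzusx lhzsx lhzusx lhasx lhausx "
        "lwzsx lwzusx lwasx lwausx ldsx ldusx",
    "Load Indexed Shifted Byte-reverse": "lhbrsx lwbrsx ldbrsx",
    "Store Indexed Shifted":
        "stbsx stbusx sthsx sthusx stwsx stwusx stdsx stdusx",
    "Store Indexed Shifted Byte-reverse": "sthbrsx stwbrsx stdbrsx",
    "Floating-Point Load Indexed Shifted":
        "lfsxs lfsuxs lfdxs lfduxs lfiwaxs lfiwzxs",
    "Floating-Point Store Indexed Shifted":
        "stfsxs stfsuxs stfdxs stfduxs stfiwxs",
    "Load Indexed Shifted Update Post-Increment":
        "lbzuspx lhzuspx lhauspx lwzuspx lwauspx lduspx",
    "Store Indexed Shifted Update Post-Increment":
        "stbuspx sthuspx stwuspx stduspx",
    "Floating-Point Load Indexed Shifted Update Post-Increment":
        "lfdupxs lfsupxs",
    "Floating-Point Store Indexed Shifted Update Post-Increment":
        "stfdupxs stfsupxs",
}

# Number of mnemonics in each group.
counts = {name: len(mnemonics.split()) for name, mnemonics in GROUPS.items()}

total = sum(counts.values())
post_increment = sum(n for name, n in counts.items() if "Post-Increment" in name)
shifted_only = total - post_increment

for name, n in counts.items():
    print(f"{n:3d}  {name}")
print(f"shifted-only: {shifted_only}, post-increment: {post_increment}, total: {total}")
print(f"share of a 9-bit XO space: {total}/512 = {total / 512:.3f}")
```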
----------------

\newpage{}

# Shift-and-Add

`shadd RT, RA, RB, sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-----|------|-------|-------|-------|-------|----|----------|
| PO  | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                           # Shift is between 1-4
    sum[0:63] <- ((RB) << shift) + (RA)       # Shift RB, add RA
    RT <- sum                                 # Result stored in RT
```
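
The Z23-Form field layout in the table above can be illustrated by packing the fields into a 32-bit word. This is a minimal sketch, assuming the usual Power ISA MSB0 bit numbering (bit 0 is the most significant bit). The helper name `encode_z23` is hypothetical, and the `PO` and `XO` values used at the end are placeholders, not allocated opcodes.

```python
# Field positions taken from the Z23-Form table (MSB0 numbering:
# bit 0 is the most significant bit of the 32-bit instruction word).
Z23_FIELDS = {          # name: (first_bit, last_bit)
    "PO": (0, 5),
    "RT": (6, 10),
    "RA": (11, 15),
    "RB": (16, 20),
    "sm": (21, 22),
    "XO": (23, 30),
    "Rc": (31, 31),
}


def encode_z23(**values):
    """Pack the named fields into a 32-bit word; omitted fields are zero."""
    unknown = set(values) - set(Z23_FIELDS)
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    word = 0
    for name, (first, last) in Z23_FIELDS.items():
        width = last - first + 1
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        # In MSB0 numbering, bit `last` sits (31 - last) places above the LSB.
        word |= value << (31 - last)
    return word


# PO and XO here are PLACEHOLDERS for illustration only.
insn = encode_z23(PO=0b111111, RT=4, RA=1, RB=2, sm=2, XO=0, Rc=0)
print(f"{insn:#010x}")
```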

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result stored in RT.

`sm` is a 2-bit field that selects a shift of 1 to 4 bits (`sm + 1`),
allowing multiplication of RB by 2, 4, 8 or 16.

Operands RA and RB, and the result RT, are all 64-bit unsigned integers.

**NEED EXAMPLES (not sure how to embed sm)!!!**

Examples:

```
# r4 = r1 + (r2*8): sm=2 gives shift=3
shadd r4, r1, r2, 2
```
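
The `shadd` pseudocode can be modelled in a few lines of Python to check how `sm` maps to a multiplier. This is a sketch only: the function name `shadd` and the sample base address are chosen for illustration, and the 64-bit truncation follows from `sum[0:63]` in the pseudocode.

```python
MASK64 = (1 << 64) - 1


def shadd(ra, rb, sm):
    """Model of shadd: RT = ((RB << (sm + 1)) + RA), truncated to 64 bits."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field (0..3)")
    shift = sm + 1                      # shift is between 1 and 4
    return ((rb << shift) + ra) & MASK64


# Address of element 5 in an array of 8-byte items at a made-up base address.
base, index = 0x1000, 5
address = shadd(base, index, 2)         # sm=2 -> shift=3 -> multiply by 8
print(hex(address))

# The same result using the explicit shift-then-add pair that shadd replaces.
shifted = (index << 3) & MASK64
assert address == (shifted + base) & MASK64
```

With `sm` running from 0 to 3 the multiplier is 2, 4, 8 and 16 respectively, which is why a multiply-by-8 needs `sm=2` rather than `sm=3`.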

# Shift-and-Add Signed Word

`shaddw RT, RA, RB, sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-----|------|-------|-------|-------|-------|----|----------|
| PO  | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                           # Shift is between 1-4
    n <- EXTS64((RB)[32:63])                  # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)          # Shift n, add RA
    RT <- sum                                 # Result stored in RT
```

When `sm` is zero, the lower word of register RB is sign-extended to 64 bits,
multiplied by 2, added to the contents of register RA, and the result stored
in RT.

`sm` is a 2-bit field that selects a shift of 1 to 4 bits (`sm + 1`),
allowing multiplication of the sign-extended lower word of RB by 2, 4, 8 or 16.

Operand RA and the result RT are 64-bit signed integers. Operand RB is treated
as a 32-bit signed integer: only its lower word is used.

*Programmer's Note:
The advantage of this instruction is in computing address offsets. RA is the
64-bit base address. RB is a signed offset into a data structure, limited to
32 bits.*

Examples:

```
# r4 = r1 + (r2*16), using the sign-extended lower word of r2
shaddw r4, r1, r2, 3
```
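
The effect of the sign-extension can be seen with a small Python model of the `shaddw` pseudocode. This is an illustrative sketch: `exts32` and `shaddw` are names chosen here, and the base address is made up. A negative `int` index held in the lower word of RB (for example `-1`, stored as `0xFFFFFFFF`) yields an address below the base, and the upper word of RB is ignored.

```python
MASK64 = (1 << 64) - 1


def exts32(value):
    """Sign-extend the lower 32 bits of value to a (possibly negative) int."""
    word = value & 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


def shaddw(ra, rb, sm):
    """Model of shaddw: RT = ((EXTS64(RB[32:63]) << (sm + 1)) + RA) mod 2**64."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field (0..3)")
    shift = sm + 1
    n = exts32(rb)                      # only the lower word of RB is used
    return ((n << shift) + ra) & MASK64


base = 0x1000                           # made-up base address
minus_one = 0xFFFFFFFF                  # a C `int` holding -1, in the low word
print(hex(shaddw(base, minus_one, 2)))  # base - 8

# Upper word of RB does not affect the result.
assert shaddw(base, 0xDEADBEEF00000003, 3) == base + 3 * 16
```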

----------------

\newpage{}

# Shift-and-Add Unsigned Word

`shadduw RT, RA, RB, sm`

| 0-5 | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-----|------|-------|-------|-------|-------|----|----------|
| PO  | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

Pseudocode:

```
    shift <- sm + 1                           # Shift is between 1-4
    n <- (RB)[32:63]                          # Only use lower 32-bits of RB (zero-extended)
    sum[0:63] <- (n << shift) + (RA)          # Shift n, add RA
    RT <- sum                                 # Result stored in RT
```

When `sm` is zero, the lower word of register RB is zero-extended to 64 bits,
multiplied by 2, added to the contents of register RA, and the result stored
in RT.

`sm` is a 2-bit field that selects a shift of 1 to 4 bits (`sm + 1`),
allowing multiplication of the zero-extended lower word of RB by 2, 4, 8 or 16.

Operand RA and the result RT are 64-bit unsigned integers. Operand RB is
treated as a 32-bit unsigned integer: only its lower word is used.

*Programmer's Note:
The advantage of this instruction is in computing address offsets. RA is the
64-bit base address. RB is an unsigned offset into a data structure, limited
to 32 bits.*

Examples:

```
# r4 = r1 + (r2*8), using the zero-extended lower word of r2 (sm=2, shift=3)
shadduw r4, r1, r2, 2
```
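
For comparison with the signed form, the `shadduw` pseudocode can be modelled the same way. This sketch assumes the lower word of RB is zero-extended before the shift, as stated in the Notes and Observations; the function name and base address are illustrative only. A lower word of `0xFFFFFFFF` is then a large positive offset rather than `-1`.

```python
MASK64 = (1 << 64) - 1


def shadduw(ra, rb, sm):
    """Model of shadduw: RT = ((zero-extended RB[32:63] << (sm + 1)) + RA) mod 2**64."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field (0..3)")
    shift = sm + 1
    n = rb & 0xFFFFFFFF                 # lower word of RB, zero-extended
    return ((n << shift) + ra) & MASK64


base = 0x1000                           # made-up base address
result = shadduw(base, 0xFFFFFFFF, 2)
print(hex(result))

# The offset is 0xFFFFFFFF * 8 above the base, not 8 below it.
assert result == base + 0xFFFFFFFF * 8
```

This difference is the reason a C `int` index that may be negative wants `shaddw`, while `shadduw` suits `unsigned int` indices.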

# Appendices

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description                 |
|------|------|------|---------|----------|-----------------------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add               |
| Z23  | I    | #    | 3.0B    | shaddw   | Shift-and-Add Signed Word   |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |

[[!tag opf_rfc]]