# RFC ls004 Shift-And-Add

* <https://libre-soc.org/openpower/sv/biginteger/analysis/>
* <https://libre-soc.org/openpower/sv/rfc/ls004/>
* bigint: <https://bugs.libre-soc.org/show_bug.cgi?id=960> TODO: maybe remove this link due to confusion and irrelevance?
* <https://git.openpower.foundation/isa/PowerISA/issues/91>
* shift-and-add <https://bugs.libre-soc.org/show_bug.cgi?id=968>
* add shaddw: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

**Books and Section affected**:

* Book I Fixed-Point Shift Instructions 3.3.14.2
* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

**Instructions added**:

* shadd - Shift and Add
* shadduw - Shift and Add Unsigned Word

**Submitter**: Luke Leighton (Libre-SOC)

**Requester**: Libre-SOC

**Impact on processor**:

Addition of two new GPR-based instructions

**Impact on software**:

Requires support for new instructions in assembler, debuggers,
and related tools

**Keywords**:

GPR, Bit-manipulation, Shift, Arithmetic, Array Indexing

**Motivation**:

Power ISA lacks LD/ST Indexed with shift, which is present in both ARM
and x86. Adding shifted variants of the LD/ST Indexed instructions would take
thirty-eight new instructions; a compromise is to add shift-and-add, which
replaces a pair of explicit instructions in hot loops.

**Notes and Observations**:

1. `shadd` and `shadduw` operate on unsigned integers.
2. `shadduw` is intended for computing address offsets,
   as the second operand is constrained to the lower 32 bits of RB.
3. Both are 2-in 1-out instructions.
4. Shift-and-add operations are present in both x86 and aarch64,
   since they are useful both for general arithmetic and for
   computing addresses, even when not immediately followed
   by a load or store.

TODO: signed 32-bit shift-and-add should be added; this needs to be addressed
before submitting the RFC: <https://bugs.libre-soc.org/show_bug.cgi?id=996>

Add the following entries to:

* the Appendices of Book I
* Instructions of Book I added to Section 3.3.14.2

# Table of LD/ST-Indexed-Shift

The following table shows the alternative instructions that could
be considered for addition. They all use a 9-bit XO, which is not hugely
costly. The totals are:

* 12 Load Indexed Shifted (with Update)
* 3 Load Indexed Shifted Byte-reverse
* 8 Store Indexed Shifted (with Update)
* 3 Store Indexed Shifted Byte-reverse
* 6 Floating-Point Load Indexed Shifted (with Update)
* 6 Floating-Point Store Indexed Shifted (with Update)

Total count: 38 new 9-bit XO instructions, for an approximate total
XO cost of 3 bits within a single Primary Opcode. Given the savings
that these instructions represent in hot loops, as evidenced by their
inclusion in top-end ISAs such as x86 and ARM, the cost may be considered
justifiable. However, there is no point in placing them in EXT2xx; they
need to be in EXT0xx, because if added as 64-bit encodings the benefit of
reduced binary size is not achieved.

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-31 | Instruction          |
|-------|------|-------|-------|-------|-------|----------------------|
| PO    | RT   | RA    | RB    | sm    | XO    | lbzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lbzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lhasx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhausx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzsx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lwzusx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwasx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lwausx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | ldsx RT,RA,RB,sm     |
| PO    | RT   | RA    | RB    | sm    | XO    | ldusx RT,RA,RB,sm    |
| PO    | RT   | RA    | RB    | sm    | XO    | lhbrsx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | lwbrsx RT,RA,RB,sm   |
| PO    | RT   | RA    | RB    | sm    | XO    | ldbrsx RT,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stbsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stbusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | sthsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | sthusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stwsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stwusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | stdsx RS,RA,RB,sm    |
| PO    | RS   | RA    | RB    | sm    | XO    | stdusx RS,RA,RB,sm   |
| PO    | RS   | RA    | RB    | sm    | XO    | sthbrsx RS,RA,RB,sm  |
| PO    | RS   | RA    | RB    | sm    | XO    | stwbrsx RS,RA,RB,sm  |
| PO    | RS   | RA    | RB    | sm    | XO    | stdbrsx RS,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsxs FRT,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfsuxs FRT,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfdxs FRT,RA,RB,sm   |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfduxs FRT,RA,RB,sm  |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfiwaxs FRT,RA,RB,sm |
| PO    | FRT  | RA    | RB    | sm    | XO    | lfiwzxs FRT,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsxs FRS,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfsuxs FRS,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfdxs FRS,RA,RB,sm  |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfduxs FRS,RA,RB,sm |
| PO    | FRS  | RA    | RB    | sm    | XO    | stfiwxs FRS,RA,RB,sm |
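
The table gives only encodings; the semantics of these instructions are not spelled out here. As a rough sketch, assuming `sm` selects a left shift of `sm + 1` applied to RB before it is added to RA (the same convention used for `shadd`), the effective address of an indexed-shifted load could be modelled as follows. The helper names, the flat `bytearray` memory, and the big-endian byte order are illustrative assumptions, not part of the proposal.

```python
MASK64 = (1 << 64) - 1

def ea_indexed_shifted(ra, rb, sm):
    """Assumed effective address: RA + (RB << (sm + 1)), modulo 2**64."""
    shift = sm + 1  # assumption: same 1..4 range as shadd
    return (ra + (rb << shift)) & MASK64

def load_dword(mem, ea):
    """Read 8 bytes from a flat memory model (big-endian, for illustration)."""
    return int.from_bytes(mem[ea:ea + 8], "big")

# Toy memory holding an array of four 64-bit elements at byte offset 16.
base = 16
elements = [0x11, 0x22, 0x33, 0x44]
mem = bytearray(base + 8 * len(elements))
for i, value in enumerate(elements):
    mem[base + 8 * i: base + 8 * (i + 1)] = value.to_bytes(8, "big")

# One hypothetical indexed-shifted load (sm = 2, i.e. index * 8) ...
fused = load_dword(mem, ea_indexed_shifted(base, 3, sm=2))

# ... versus the explicit pair it replaces: shift the index, then add the base.
offset = (3 << 3) & MASK64
paired = load_dword(mem, (base + offset) & MASK64)
```

Both paths read the same element; the difference is only in how many instructions are needed to form the address.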

# Shift-and-Add

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |
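
A sketch of how the fields in the table above could be packed into a 32-bit instruction word, using Power ISA MSB0 bit numbering (bit 0 is the most significant bit). The function names and the PO and XO values in the usage lines are placeholders chosen for illustration; no opcode assignment is given in this section.

```python
def encode_z23(po, rt, ra, rb, sm, xo, rc=0):
    """Pack Z23-Form fields into a 32-bit word (MSB0 bit numbering)."""
    fields = [
        # (value, first bit, last bit), bit ranges taken from the form table
        (po, 0, 5),
        (rt, 6, 10),
        (ra, 11, 15),
        (rb, 16, 20),
        (sm, 21, 22),
        (xo, 23, 30),
        (rc, 31, 31),
    ]
    word = 0
    for value, first, last in fields:
        width = last - first + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in bits {first}-{last}")
        # In MSB0 numbering, bit 31 is the least significant bit.
        word |= value << (31 - last)
    return word

def decode_sm(word):
    """Extract the 2-bit sm field (bits 21-22)."""
    return (word >> (31 - 22)) & 0b11

# Placeholder opcode values, NOT the real shadd encoding.
HYPOTHETICAL_PO = 0b111111
HYPOTHETICAL_XO = 0b10101010
word = encode_z23(HYPOTHETICAL_PO, rt=3, ra=4, rb=5, sm=2, xo=HYPOTHETICAL_XO)
```

Since `sm` occupies bits 21-22, it sits nine bits above the least significant bit of the word, which is what `decode_sm` relies on.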

    shift <- sm + 1                      # Shift is between 1-4
    sum[0:63] <- ((RB) << shift) + (RA)  # Shift RB, add RA
    RT <- sum                            # Result stored in RT

When `sm` is zero, the contents of register RB are multiplied by 2,
added to the contents of register RA, and the result is stored in RT.

`sm` is a 2-bit field that allows multiplication of RB by 2, 4, 8, or 16.

Operands RA and RB and the result RT are all 64-bit unsigned integers.

**NEED EXAMPLES (not sure how to embed sm)!!!**
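
As a minimal sketch of the pseudocode above, the following Python model treats registers as plain integers masked to 64 bits. The function name and the sample operand values are illustrative only, and no assembler syntax is implied.

```python
MASK64 = (1 << 64) - 1

def shadd(ra, rb, sm):
    """Model of shadd: RT = ((RB << (sm + 1)) + RA) mod 2**64."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field (0..3)")
    shift = sm + 1                      # shift is between 1 and 4
    return ((rb << shift) + ra) & MASK64

# Address of element 5 in an array of 8-byte items starting at 0x1000:
# sm = 2 selects a shift of 3, i.e. multiplication by 8.
addr = shadd(0x1000, 5, sm=2)           # 0x1000 + 5 * 8 = 0x1028

# sm = 0 also gives a cheap "multiply by 3": x + (x << 1).
triple = shadd(7, 7, sm=0)              # 21

# Results wrap modulo 2**64, as sum[0:63] keeps only 64 bits.
wrapped = shadd(1, 1 << 63, sm=0)       # 1
```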

# Shift-and-Add Unsigned Word

| 0-5   | 6-10 | 11-15 | 16-20 | 21-22 | 23-30 | 31 | Form     |
|-------|------|-------|-------|-------|-------|----|----------|
| PO    | RT   | RA    | RB    | sm    | XO    | Rc | Z23-Form |

    shift <- sm + 1                      # Shift is between 1-4
    n <- (RB)[32:63]                     # Only use lower 32-bits of RB
    sum[0:63] <- (n << shift) + (RA)     # Shift n, add RA
    RT <- sum                            # Result stored in RT

When `sm` is zero, the contents of the lower word of register RB are multiplied by 2,
added to the contents of register RA, and the result is stored in RT.

`sm` is a 2-bit field that allows multiplication of the lower word of RB by 2, 4, 8, or 16.

Operands RA and RB and the result RT are all 64-bit unsigned integers;
only the lower 32 bits of RB contribute to the result.

*The advantage of this instruction is in computing address offsets: RA is the
64-bit base address, and RB is the offset into the data structure, limited to 32 bits.*
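
The address-offset use case can be sketched with a small Python model of the pseudocode above. The function name and operand values are illustrative; treating `(RB)[32:63]` as zero-extended before the shift is the reading assumed here.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def shadduw(ra, rb, sm):
    """Model of shadduw: RT = (((RB & 0xFFFFFFFF) << (sm + 1)) + RA) mod 2**64."""
    if sm not in range(4):
        raise ValueError("sm is a 2-bit field (0..3)")
    shift = sm + 1
    n = rb & MASK32                     # (RB)[32:63]: lower 32 bits, zero-extended
    return ((n << shift) + ra) & MASK64

# Upper 32 bits of RB are ignored, so stale high bits in an index register
# do not disturb the computed address.
base = 0x0000_7FFF_0000_1000
index = 0xDEAD_BEEF_0000_0005
addr = shadduw(base, index, sm=2)       # base + 5 * 8

# The shifted offset can exceed 32 bits: the shift happens after zero-extension.
big = shadduw(0, 0xFFFF_FFFF, sm=3)     # 0xFFFFFFFF << 4
```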

* Appendix E Power ISA sorted by opcode
* Appendix F Power ISA sorted by version
* Appendix G Power ISA sorted by Compliancy Subset
* Appendix H Power ISA sorted by mnemonic

| Form | Book | Page | Version | mnemonic | Description                 |
|------|------|------|---------|----------|-----------------------------|
| Z23  | I    | #    | 3.0B    | shadd    | Shift-and-Add               |
| Z23  | I    | #    | 3.0B    | shadduw  | Shift-and-Add Unsigned Word |