1 # Rewrite of SVP64 for OpenPower ISA v3.1
2
3 * [[svp64/discussion]]
4
5 The plan is to create an encoding for SVP64, then to create an encoding for
6 SVP48, then to reorganize them both to improve field overlap, reducing the
7 amount of decoder hardware necessary.
8
All bit numbers are in MSB0 form: bits are numbered from 0 at the MSB,
counting up towards the LSB. All bit ranges are inclusive (so
`4:6` means bits 4, 5, and 6).
12
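As an illustration of the MSB0 convention, the following is a minimal sketch of a helper that extracts an inclusive bit range from a 32-bit instruction word. The name `msb0_bits` is hypothetical and is not part of this spec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract the inclusive MSB0 bit range first:last
 * from a 32-bit word. Bit 0 is the MSB, bit 31 is the LSB. */
static uint32_t msb0_bits(uint32_t word, unsigned first, unsigned last)
{
    assert(first <= last && last <= 31);
    unsigned width = last - first + 1;
    /* move the requested range down to the LSB end */
    uint32_t shifted = word >> (31 - last);
    /* avoid an undefined 32-bit shift when the whole word is requested */
    uint32_t mask = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    return shifted & mask;
}
```

With this helper, `msb0_bits(word, 4, 6)` returns bits 4, 5, and 6 as a 3-bit value.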
13 64-bit instructions are split into two 32-bit words, the prefix and the suffix. The prefix always comes before the suffix in PC order.
14
## Definition of Reserved in this spec
16
17 For the new fields added in SVP64, instructions that have any of their fields set to a reserved value must cause an illegal instruction trap, to allow emulation of future instruction sets.
18
19 This is unlike OpenPower ISA v3.1, which doesn't require a CPU to trap.
20
21 ## Remapped Encoding (`RM[0:23]`)
22
23 To allow relatively easy remapping of which portions of the Prefix Opcode Map
24 are used for SVP64 without needing to rewrite a large portion of the SVP64
25 spec, a mapping is defined from the OpenPower v3.1 prefix bits to a new 24-bit
26 Remapped Encoding denoted `RM[0]` at the MSB to `RM[23]` at the LSB.
27
28 The mapping from the OpenPower v3.1 prefix bits to the Remapped Encoding is
29 defined in the Prefix Fields section.
30
31 ## Remapped Encoding Fields
32
This section shows all fields in the Remapped Encoding `RM[0:23]` for all instruction variants. There are two categories: Single and Twin Predication.
34
35 ### Single Predication (N(src) > 1)
36
37
38 | Remapped Encoding Field Name | Field bits | Description |
39 |------------------------------|------------|---------------------------------------------------------------------------|
40 | MASK_KIND | `0` | Execution Mask Kind |
41 | MASK | `1:3` | Execution Mask |
42 | ELWIDTH | `4:5` | Element Width |
43 | SUBVL | `6:7` | Sub-vector length |
44 | Rdest_EXTRA | `8:10` | extra bits for Rdest (Uses R\*_EXTRA Encoding) |
45 | Rsrc1_EXTRA | `11:13` | extra bits for Rsrc1 (Uses R\*_EXTRA Encoding) |
46 | Rsrc2_EXTRA | `14:16` | extra bits for Rsrc2 (Uses R\*_EXTRA Encoding) |
47 | Rsrc3_EXTRA | `17:18` | extra bits for Rsrc3 (Uses 2-bit R\*_EXTRA Encoding) |
48 | MODE | `19:23` | see [[discussion]] |
49
50 ### Twin Predication (src=1, dest=1)
51
52 | Remapped Encoding Field Name | Field bits | Description |
53 |------------------------------|------------|---------------------------------------------------------------------------|
54 | MASK_KIND | `0` | Execution Mask Kind |
55 | MASK | `1:3` | Execution Mask |
56 | ELWIDTH | `4:5` | Element Width |
57 | SUBVL | `6:7` | Sub-vector length |
58 | Rdest_EXTRA | `8:10` | extra bits for Rdest (Uses R\*_EXTRA Encoding) |
59 | Rsrc1_EXTRA | `11:13` | extra bits for Rsrc1 (Uses R\*_EXTRA Encoding) |
60 | MASK_SRC | `14:16` | Execution Mask for Source (only on instructions with twin-predication) |
61 | ELWIDTH_SRC | `17:18` | Element Width for Source (only on instructions with twin-predication) |
62 | MODE | `19:23` | see [[discussion]] |
63
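The two field layouts above can be sketched as a single decoder. This is only an illustration: `struct rm_fields`, `rm_bits` and `rm_decode` are hypothetical names, and whether an instruction is twin-predicated is assumed to be known from the suffix opcode and passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields; not defined by the spec. */
struct rm_fields {
    unsigned mask_kind, mask, elwidth, subvl;
    unsigned rdest_extra, rsrc1_extra;
    unsigned rsrc2_extra, rsrc3_extra; /* single predication only */
    unsigned mask_src, elwidth_src;    /* twin predication only */
    unsigned mode;
};

/* Return RM[first:last] (inclusive, MSB0) from a 24-bit RM value. */
static unsigned rm_bits(uint32_t rm, unsigned first, unsigned last)
{
    assert(first <= last && last <= 23);
    return (rm >> (23 - last)) & ((1u << (last - first + 1)) - 1u);
}

/* Split RM into fields; bits 14:18 are interpreted according to `twin`. */
static struct rm_fields rm_decode(uint32_t rm, bool twin)
{
    struct rm_fields f = {0};
    f.mask_kind   = rm_bits(rm, 0, 0);
    f.mask        = rm_bits(rm, 1, 3);
    f.elwidth     = rm_bits(rm, 4, 5);
    f.subvl       = rm_bits(rm, 6, 7);
    f.rdest_extra = rm_bits(rm, 8, 10);
    f.rsrc1_extra = rm_bits(rm, 11, 13);
    if (twin) {
        f.mask_src    = rm_bits(rm, 14, 16);
        f.elwidth_src = rm_bits(rm, 17, 18);
    } else {
        f.rsrc2_extra = rm_bits(rm, 14, 16);
        f.rsrc3_extra = rm_bits(rm, 17, 18);
    }
    f.mode = rm_bits(rm, 19, 23);
    return f;
}
```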
Note (see [[discussion]]): adding a second SUBVL was evaluated and rejected. A second SUBVL makes no sense except for mv, and that is covered by [[mv.vec]].
65
66 ## R\*_EXTRA Encoding
67
68 (**TODO: 2-bit version of the table, just like in the original SVPrefix. This is important, to save bits on 4-operand instructions such as fmadd**)
69
70 In the following table, `<N>` denotes the value of the corresponding register field in the SVP64 suffix word.
71
72 (**Jacob: these tables are not in the slightest bit understandable due to the use of register names that are impossible to interpret clearly**)
73
74 | R\*_EXTRA | Vector/Scalar<br/>Mode | CR Register | Int/FP<br/>Register |
75 |-----------|------------------------|---------------|---------------------|
76 | 000 | Scalar | `SVCR<N>_000` | `SV[F]R<N>_00` |
77 | 001 | Scalar | `SVCR<N>_010` | `SV[F]R<N>_01` |
78 | 010 | Scalar | `SVCR<N>_100` | `SV[F]R<N>_10` |
79 | 011 | Scalar | `SVCR<N>_110` | `SV[F]R<N>_11` |
80 | 100 | Vector | `SVCR<N>_000` | `SV[F]R<N>_00` |
81 | 101 | Vector | `SVCR<N>_010` | `SV[F]R<N>_01` |
82 | 110 | Vector | `SVCR<N>_100` | `SV[F]R<N>_10` |
83 | 111 | Vector | `SVCR<N>_110` | `SV[F]R<N>_11` |
84
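Read as an algorithm, the 3-bit table above says that the top bit of R\*_EXTRA selects Vector or Scalar and the low two bits supply the extra register-number bits. The following is a sketch under that reading; `struct sv_operand` and both function names are hypothetical, and the returned index is the flat `(N << bits) + M` number described under Register Naming.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: vector/scalar flag plus flat SV register index. */
struct sv_operand {
    bool is_vector;
    unsigned index;
};

/* Int/FP operand: SV[F]R<N>_<ee>, where ee is the low 2 bits of EXTRA. */
static struct sv_operand decode_extra3_intfp(unsigned n, unsigned extra)
{
    assert(n < 32 && extra < 8);
    struct sv_operand op = { (extra & 4u) != 0, (n << 2) | (extra & 3u) };
    return op;
}

/* CR operand: SVCR<N>_<ee>0, i.e. the 2 EXTRA bits followed by a zero bit. */
static struct sv_operand decode_extra3_cr(unsigned n, unsigned extra)
{
    assert(n < 8 && extra < 8);
    struct sv_operand op = { (extra & 4u) != 0, (n << 3) | ((extra & 3u) << 1) };
    return op;
}
```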
85 ## ELWIDTH Encoding
86
87 | Instruction Kind | ELWIDTH Value | Mnemonic | Description |
88 |------------------|---------------|---------------------------|-------------------------------------------------------------------------------------|
89 | Integer | 00 | `ELWIDTH=b` | Byte: 8-bit integer |
90 | Integer | 01 | `ELWIDTH=h` | Halfword: 16-bit integer |
91 | Integer | 10 | `ELWIDTH=w` | Word: 32-bit integer |
92 | Integer | 11 | `ELWIDTH=d` | Doubleword: 64-bit integer |
93 | FP | 00 | `ELWIDTH=bf16` (Reserved) | Reserved for [`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) |
94 | FP | 01 | `ELWIDTH=f16` | 16-bit IEEE 754 Half floating-point |
95 | FP | 10 | `ELWIDTH=f32` | 32-bit IEEE 754 Single floating-point |
96 | FP | 11 | `ELWIDTH=f64` | 64-bit IEEE 754 Double floating-point |
97
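Every non-reserved row of the table above follows the pattern "8 shifted left by the ELWIDTH value". A small sketch of that observation follows; the function names are hypothetical, and returning 0 for the reserved FP encoding is just this example's convention (a real decoder would trap, as described under the definition of Reserved).

```c
#include <assert.h>

/* Element width in bits for integer instructions: 8, 16, 32 or 64. */
static unsigned int_elwidth_bits(unsigned elwidth)
{
    assert(elwidth < 4);
    return 8u << elwidth;
}

/* Element width in bits for FP instructions; 0 marks the reserved
 * (bf16) encoding in this sketch. */
static unsigned fp_elwidth_bits(unsigned elwidth)
{
    assert(elwidth < 4);
    return (elwidth == 0) ? 0u : (8u << elwidth);
}
```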
98 ## SUBVL Encoding
99
100 | SUBVL Value | Mnemonic | Description |
101 |-------------|---------------------|------------------------|
102 | 00 | `SUBVL=4` | Sub-vector length of 4 |
103 | 01 | `SUBVL=1` (default) | Sub-vector length of 1 |
104 | 10 | `SUBVL=2` | Sub-vector length of 2 |
105 | 11 | `SUBVL=3` | Sub-vector length of 3 |
106
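The SUBVL encoding is the sub-vector length itself, except that `00` stands for 4. As a one-line sketch (the function name is hypothetical):

```c
#include <assert.h>

/* Decode the 2-bit SUBVL field into a sub-vector length of 1 to 4. */
static unsigned subvl_length(unsigned subvl)
{
    assert(subvl < 4);
    return (subvl == 0) ? 4u : subvl;
}
```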
107 ## MASK/MASK_SRC & MASK_KIND Encoding
108
One bit (`MASK_KIND`) indicates the mode: CR or Int predication. The two types may not be mixed.
110
111 | MASK_KIND Value | Description |
112 |-----------------|------------------------------------------------------|
113 | 0 | MASK/MASK_SRC are encoded using Integer Predication |
114 | 1 | MASK/MASK_SRC are encoded using CR-based Predication |
115
Integer twin predication has a second set of 3 bits that uses the same encoding, thus allowing either the same register (r3 or r10) to be used for both src and dest, or different regs (one for src, one for dest).

Likewise, CR-based twin predication has a second set of 3 bits, allowing a different test to be applied.
119
120 ### Integer Predication (MASK_KIND=0)
121
When `MASK_KIND` is zero, the 3 bits are interpreted as below.
Twin predication has an identical 3-bit field, similarly encoded.
124
125 | MASK/MASK_SRC<br/>Value | Mnemonic | Description |
126 |-------------------------|----------|--------------------------------------------------------|
127 | 000 | ALWAYS | Operation is not masked (mask set to all 1s) |
128 | 001 | 1 << R3 | Element `i` is enabled if `i == R3` |
129 | 010 | R3 | Element `i` is enabled if `R3 & (1 << i)` is non-zero |
130 | 011 | ~R3 | Element `i` is enabled if `R3 & (1 << i)` is zero |
131 | 100 | R10 | Element `i` is enabled if `R10 & (1 << i)` is non-zero |
132 | 101 | ~R10 | Element `i` is enabled if `R10 & (1 << i)` is zero |
133 | 110 | R30 | Element `i` is enabled if `R30 & (1 << i)` is non-zero |
134 | 111 | ~R30 | Element `i` is enabled if `R30 & (1 << i)` is zero |
135
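The integer predication table can be sketched as a per-element test. This is an illustration only: `int_pred_enabled` is a hypothetical name, `gpr` is assumed to hold the 32 standard 64-bit integer registers, and the element index is assumed to be below 64 so that it fits in a 64-bit mask register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if element i is enabled under integer predication.
 * mask is the 3-bit MASK (or MASK_SRC) field. */
static bool int_pred_enabled(unsigned mask, const uint64_t gpr[32], unsigned i)
{
    assert(mask < 8 && i < 64);
    switch (mask) {
    case 0:
        return true;                 /* ALWAYS: not masked */
    case 1:
        return gpr[3] == i;          /* 1 << R3: only element R3 */
    default: {
        /* 010/011 use R3, 100/101 use R10, 110/111 use R30 */
        static const unsigned regnum[3] = { 3, 10, 30 };
        unsigned r = regnum[(mask >> 1) - 1];
        bool bit = ((gpr[r] >> i) & 1u) != 0;
        /* odd encodings are the inverted (~) form */
        return (mask & 1u) ? !bit : bit;
    }
    }
}
```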
136 ### CR-based Predication (MASK_KIND=1)
137
When `MASK_KIND` is one, the 3 bits are interpreted as below. Twin predication has an identical 3-bit field, similarly encoded.
139
140 | MASK/MASK_SRC<br/>Value | Mnemonic | Description |
141 |-------------------------|----------|-------------------------------------------------|
142 | 000 | lt | Element `i` is enabled if `CR[6+i].LT` is set |
143 | 001 | nl/ge | Element `i` is enabled if `CR[6+i].LT` is clear |
144 | 010 | gt | Element `i` is enabled if `CR[6+i].GT` is set |
145 | 011 | ng/le | Element `i` is enabled if `CR[6+i].GT` is clear |
146 | 100 | eq | Element `i` is enabled if `CR[6+i].EQ` is set |
147 | 101 | ne | Element `i` is enabled if `CR[6+i].EQ` is clear |
148 | 110 | so/un | Element `i` is enabled if `CR[6+i].FU` is set |
149 | 111 | ns/nu | Element `i` is enabled if `CR[6+i].FU` is clear |
150
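In the CR-based table above, the upper two bits of the 3-bit value select one of the four bits of a CR field and the lowest bit inverts the test. A sketch of that test on a single 4-bit CR field follows. The name `cr_pred_enabled` is hypothetical, the field is assumed to be held with LT as the most significant of its four bits (LT, GT, EQ, SO/FU), and choosing which CR field belongs to element `i` is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if an element is enabled, given the 3-bit MASK (or MASK_SRC)
 * value and the element's 4-bit CR field (LT=8, GT=4, EQ=2, SO/FU=1). */
static bool cr_pred_enabled(unsigned mask, unsigned cr_field)
{
    assert(mask < 8 && cr_field < 16);
    unsigned bit_select = mask >> 1; /* 0=LT, 1=GT, 2=EQ, 3=SO/FU */
    bool bit = ((cr_field >> (3 - bit_select)) & 1u) != 0;
    /* odd encodings test for the bit being clear */
    return (mask & 1u) ? !bit : bit;
}
```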
TODO for CR-based predication: select an alternate CR for twin predication? See [[discussion]]. Overlap of the two CR-based predicates must be taken into account. The options are: give one of them a suitably high starting point; accept that for twin predication VL must not exceed the range where overlap would occur; *or* have both use the same starting point but select different *bits* of the same CRs.
152
153
154 ## Prefix Opcode Map (64-bit instruction encoding) (prefix bits 6:11)
155
156 (shows both PowerISA v3.1 instructions as well as new SVP instructions; empty spaces are yet-to-be-allocated Illegal Instructions)
157
158 | bits 6:11 | ---000 | ---001 | ---010 | ---011 | ---100 | ---101 | ---110 | ---111 |
159 |-----------|----------|------------|----------|----------|----------|----------|----------|----------|
160 | 000--- | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form |
161 | 001--- | | | | | | | | |
162 | 010--- | 8RR-form | | | | SVP64 | SVP64 | SVP64 | SVP64 |
163 | 011--- | | | | | SVP64 | SVP64 | SVP64 | SVP64 |
164 | 100--- | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form |
165 | 101--- | | | | | | | | |
166 | 110--- | MRR-form | | | | SVP64 | SVP64 | SVP64 | SVP64 |
167 | 111--- | | MMIRR-form | | | SVP64 | SVP64 | SVP64 | SVP64 |
168
169 ## Prefix Fields
170
171 | Prefix Field Name | Field bits | Constant Value | Description |
172 |---------------------|------------|----------------|--------------------------------------------|
173 | PO (Primary Opcode) | `0:5` | `1` | Indicates this is a 64-bit instruction |
174 | `RM[0]` | `6` | | Bit 0 of the Remapped Encoding |
175 | SVP64_7 | `7` | `1` | Indicates this is a SVP64 instruction |
176 | `RM[1]` | `8` | | Bit 1 of the Remapped Encoding |
177 | SVP64_9 | `9` | `1` | Indicates this is a SVP64 instruction |
178 | `RM[2:23]` | `10:31` | | Bits 2 through 23 of the Remapped Encoding |
179
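The Prefix Fields table can be sketched as a function that recognises an SVP64 prefix and gathers the 24 Remapped Encoding bits. The name `svp64_prefix_to_rm` is hypothetical. The returned value holds `RM[0]` in its bit 23 and `RM[23]` in its bit 0, so that it reads as an ordinary 24-bit number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* If prefix is an SVP64 prefix word, store RM[0:23] in *rm and return true.
 * Otherwise return false and leave *rm untouched. */
static bool svp64_prefix_to_rm(uint32_t prefix, uint32_t *rm)
{
    assert(rm != NULL);
    uint32_t po   = prefix >> 26;        /* bits 0:5, must be 1 */
    uint32_t bit7 = (prefix >> 24) & 1u; /* SVP64_7, must be 1  */
    uint32_t bit9 = (prefix >> 22) & 1u; /* SVP64_9, must be 1  */
    if (po != 1u || !bit7 || !bit9)
        return false;
    uint32_t rm0    = (prefix >> 25) & 1u;  /* bit 6      -> RM[0]    */
    uint32_t rm1    = (prefix >> 23) & 1u;  /* bit 8      -> RM[1]    */
    uint32_t rm2_23 = prefix & 0x3FFFFFu;   /* bits 10:31 -> RM[2:23] */
    *rm = (rm0 << 23) | (rm1 << 22) | rm2_23;
    return true;
}
```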
180 # Twin Predication
181
This is a novel concept that allows predication to be applied to a single source and a single dest register. The following types of traditional Vector operations may be encoded with it, *without requiring explicit opcodes to do so*:
183
184 * VSPLAT (a single scalar distributed across a vector)
185 * VEXTRACT (like LLVM IR [`extractelement`](https://releases.llvm.org/11.0.0/docs/LangRef.html#extractelement-instruction))
186 * VINSERT (like LLVM IR [`insertelement`](https://releases.llvm.org/11.0.0/docs/LangRef.html#insertelement-instruction))
187 * VCOMPRESS (like LLVM IR [`llvm.masked.compressstore.*`](https://releases.llvm.org/11.0.0/docs/LangRef.html#llvm-masked-compressstore-intrinsics))
188 * VEXPAND (like LLVM IR [`llvm.masked.expandload.*`](https://releases.llvm.org/11.0.0/docs/LangRef.html#llvm-masked-expandload-intrinsics))
189
190 Those patterns (and more) may be applied to:
191
* mv (the usual way that V\* operations are created)
* exts\* sign-extension
* rlwinm and other RS-RA shift operations
* LD and ST (treating AGEN as one source)
* FP fclass, fsgn, fneg, fabs, fcvt, frecip, fsqrt etc.
* Condition Register ops mfcr, mtcr and similar
198
This is a huge list that creates extremely powerful combinations, particularly given that one of the predicate options is `(1<<r3)`.
200
201 Additional unusual capabilities of Twin Predication include a back-to-back version of VCOMPRESS-VEXPAND which is effectively the ability to do an ordered multiple VINSERT.
202
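This page does not spell out the element loop for twin predication. The following is a sketch under the assumption that source and destination each step over their own masked-out elements independently, so that the n-th enabled source element lands in the n-th enabled destination element. All names are hypothetical, and the masks are modelled as plain 64-bit integers (as with Integer Predication), so VL is limited to 64 here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of a twin-predicated mv. Returns the number of elements moved. */
static size_t twin_pred_mv(uint64_t *dest, const uint64_t *src,
                           uint64_t dest_mask, uint64_t src_mask, size_t vl)
{
    assert(vl <= 64);
    size_t i = 0, j = 0, moved = 0;
    for (;;) {
        /* skip source elements whose predicate bit is clear */
        while (i < vl && !((src_mask >> i) & 1u))
            i++;
        /* skip destination elements whose predicate bit is clear */
        while (j < vl && !((dest_mask >> j) & 1u))
            j++;
        if (i >= vl || j >= vl)
            break;
        dest[j++] = src[i++];
        moved++;
    }
    return moved;
}
```

Under this model, an all-ones destination mask with a sparse source mask behaves like VCOMPRESS, and the reverse behaves like VEXPAND.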
203 ## Twin Predication
204
There are two different encodings: single-predication (typically arithmetic operations, i.e. with more than one source register) and twin-predication (one source, one destination).
206
207 # Register Naming
208
209 SV Registers are numbered using the notation `SV[F|C]R<N>_<M>` where `<N>` is a decimal integer and `<M>` is a binary integer. Two integers are used to enable future register expansions to add more registers by appending more LSB bits to `<M>`.
210
211 For all `SV[F|C]R<N>_<M>` registers, the N is the
212 upper bits in decimal and the M is the lower bits in binary, so `SVR5_01` is
213 SV integer register `(5 << 2) + 0b01`, `SVCR6_011` is SV condition register
214 `(6 << 3) + 0b011`, and `SVFR20_10` is SV floating-point register
215 `(20 << 2) + 0b10`.
216
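The naming rule above amounts to a shift and an OR. The following sketch converts in both directions; the names are hypothetical, and `m_bits` is the number of binary digits in `<M>` (2 for SVR and SVFR, 3 for SVCR in this draft).

```c
#include <assert.h>

/* Hypothetical pair holding the <N> and <M> parts of a register name. */
struct sv_reg_name {
    unsigned n; /* upper bits, written in decimal */
    unsigned m; /* lower bits, written in binary  */
};

/* Flat register index for SV[F|C]R<N>_<M>. */
static unsigned sv_reg_index(unsigned n, unsigned m, unsigned m_bits)
{
    assert(m_bits < 16 && m < (1u << m_bits));
    return (n << m_bits) | m;
}

/* Inverse: split a flat register index back into <N> and <M>. */
static struct sv_reg_name sv_reg_split(unsigned index, unsigned m_bits)
{
    assert(m_bits < 16);
    struct sv_reg_name name = { index >> m_bits, index & ((1u << m_bits) - 1u) };
    return name;
}
```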
217 ## Example Code
218
A vectorized 32-bit add:

    add SVR3_01, SVR6_10, SVR10_00, elwidth=w, subvl=1, mask=lt

does the following:

    const size_t start_cr = (6 << 3) + 0b000; // starting at SVCR6_000
    // pretend for the moment that type-punning actually works in C/C++
    uint32_t *rt = (uint32_t *)&regs[(3 << 2) + 0b01]; // SVR3_01
    uint32_t *ra = (uint32_t *)&regs[(6 << 2) + 0b10]; // SVR6_10
    uint32_t *rb = (uint32_t *)&regs[(10 << 2) + 0b00]; // SVR10_00
    for(size_t i = 0; i < VL; i++) {
        // the CR field index wraps from SVCR7_111 back to SVCR0_000
        if(CRs[(start_cr + i) % 64].lt) {
            rt[i] = ra[i] + rb[i];
        }
    }
235
236 ## Integer Registers
237
    setvli ..., VL=7
    add r20, r25, r30, elwidth=d, subvl=1

where `r20`, `r25`, and `r30` are standard OpenPower register names.
Those names correspond to `SVR20_00`, `SVR25_00`, and `SVR30_00`.

pseudocode:

    const size_t STD_TO_SV_SHIFT = 2; // gets bigger as reg files expand to 256, 512, ... registers

    VL = 7; // setvli (omitting maxvl here)

    for(size_t i = 0; i < VL; i++) {
        regs[(20 << STD_TO_SV_SHIFT) + i] = regs[(25 << STD_TO_SV_SHIFT) + i]
                                          + regs[(30 << STD_TO_SV_SHIFT) + i];
    }
254
255 Standard PowerISA Integer registers are aliased to some of the SV integer registers:
256
257 (**Jacob these names are impossible to interpret due to them not being sequential numbering and there being no compact algorithm given that shows how they're created. the original SVPrefix was dead easy to understand**)
258
259 | Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register |
260 |----------------------|-------------------------|----------------------|-------------------------|----------------------|-------------------------|----------------------|-------------------------|
261 | R0 | SVR0_00 | R8 | SVR8_00 | R16 | SVR16_00 | R24 | SVR24_00 |
262 | | SVR0_01 | | SVR8_01 | | SVR16_01 | | SVR24_01 |
263 | | SVR0_10 | | SVR8_10 | | SVR16_10 | | SVR24_10 |
264 | | SVR0_11 | | SVR8_11 | | SVR16_11 | | SVR24_11 |
265 | R1 | SVR1_00 | R9 | SVR9_00 | R17 | SVR17_00 | R25 | SVR25_00 |
266 | | SVR1_01 | | SVR9_01 | | SVR17_01 | | SVR25_01 |
267 | | SVR1_10 | | SVR9_10 | | SVR17_10 | | SVR25_10 |
268 | | SVR1_11 | | SVR9_11 | | SVR17_11 | | SVR25_11 |
269 | R2 | SVR2_00 | R10 | SVR10_00 | R18 | SVR18_00 | R26 | SVR26_00 |
270 | | SVR2_01 | | SVR10_01 | | SVR18_01 | | SVR26_01 |
271 | | SVR2_10 | | SVR10_10 | | SVR18_10 | | SVR26_10 |
272 | | SVR2_11 | | SVR10_11 | | SVR18_11 | | SVR26_11 |
273 | R3 | SVR3_00 | R11 | SVR11_00 | R19 | SVR19_00 | R27 | SVR27_00 |
274 | | SVR3_01 | | SVR11_01 | | SVR19_01 | | SVR27_01 |
275 | | SVR3_10 | | SVR11_10 | | SVR19_10 | | SVR27_10 |
276 | | SVR3_11 | | SVR11_11 | | SVR19_11 | | SVR27_11 |
277 | R4 | SVR4_00 | R12 | SVR12_00 | R20 | SVR20_00 | R28 | SVR28_00 |
278 | | SVR4_01 | | SVR12_01 | | SVR20_01 | | SVR28_01 |
279 | | SVR4_10 | | SVR12_10 | | SVR20_10 | | SVR28_10 |
280 | | SVR4_11 | | SVR12_11 | | SVR20_11 | | SVR28_11 |
281 | R5 | SVR5_00 | R13 | SVR13_00 | R21 | SVR21_00 | R29 | SVR29_00 |
282 | | SVR5_01 | | SVR13_01 | | SVR21_01 | | SVR29_01 |
283 | | SVR5_10 | | SVR13_10 | | SVR21_10 | | SVR29_10 |
284 | | SVR5_11 | | SVR13_11 | | SVR21_11 | | SVR29_11 |
285 | R6 | SVR6_00 | R14 | SVR14_00 | R22 | SVR22_00 | R30 | SVR30_00 |
286 | | SVR6_01 | | SVR14_01 | | SVR22_01 | | SVR30_01 |
287 | | SVR6_10 | | SVR14_10 | | SVR22_10 | | SVR30_10 |
288 | | SVR6_11 | | SVR14_11 | | SVR22_11 | | SVR30_11 |
289 | R7 | SVR7_00 | R15 | SVR15_00 | R23 | SVR23_00 | R31 | SVR31_00 |
290 | | SVR7_01 | | SVR15_01 | | SVR23_01 | | SVR31_01 |
291 | | SVR7_10 | | SVR15_10 | | SVR23_10 | | SVR31_10 |
292 | | SVR7_11 | | SVR15_11 | | SVR23_11 | | SVR31_11 |
293
294 ## Floating-Point Registers
295
296 Standard PowerISA floating-point and VSX registers are aliased to some of the SV floating-point registers:
297
298 (**Jacob these names are impossible to interpret due to them not being sequential numbering and there being no compact algorithm given that shows how they're created. the original SVPrefix was dead easy to understand**)
299
300 | FP<br/>Register | VSX Register | SV FP<br/>Register | FP<br/>Register | VSX Register | SV FP<br/>Register |
301 |-----------------|-----------------------|--------------------|-----------------|-----------------------|--------------------|
302 | FPR\[0\] | VSR\[0\]\.dword\[0\] | SVFR0\_00 | FPR\[16\] | VSR\[16\]\.dword\[0\] | SVFR16\_00 |
303 | | VSR\[0\]\.dword\[1\] | SVFR0\_01 | | VSR\[16\]\.dword\[1\] | SVFR16\_01 |
304 | | VSR\[32\]\.dword\[0\] | SVFR0\_10 | | VSR\[48\]\.dword\[0\] | SVFR16\_10 |
305 | | VSR\[32\]\.dword\[1\] | SVFR0\_11 | | VSR\[48\]\.dword\[1\] | SVFR16\_11 |
306 | FPR\[1\] | VSR\[1\]\.dword\[0\] | SVFR1\_00 | FPR\[17\] | VSR\[17\]\.dword\[0\] | SVFR17\_00 |
307 | | VSR\[1\]\.dword\[1\] | SVFR1\_01 | | VSR\[17\]\.dword\[1\] | SVFR17\_01 |
308 | | VSR\[33\]\.dword\[0\] | SVFR1\_10 | | VSR\[49\]\.dword\[0\] | SVFR17\_10 |
309 | | VSR\[33\]\.dword\[1\] | SVFR1\_11 | | VSR\[49\]\.dword\[1\] | SVFR17\_11 |
310 | FPR\[2\] | VSR\[2\]\.dword\[0\] | SVFR2\_00 | FPR\[18\] | VSR\[18\]\.dword\[0\] | SVFR18\_00 |
311 | | VSR\[2\]\.dword\[1\] | SVFR2\_01 | | VSR\[18\]\.dword\[1\] | SVFR18\_01 |
312 | | VSR\[34\]\.dword\[0\] | SVFR2\_10 | | VSR\[50\]\.dword\[0\] | SVFR18\_10 |
313 | | VSR\[34\]\.dword\[1\] | SVFR2\_11 | | VSR\[50\]\.dword\[1\] | SVFR18\_11 |
314 | FPR\[3\] | VSR\[3\]\.dword\[0\] | SVFR3\_00 | FPR\[19\] | VSR\[19\]\.dword\[0\] | SVFR19\_00 |
315 | | VSR\[3\]\.dword\[1\] | SVFR3\_01 | | VSR\[19\]\.dword\[1\] | SVFR19\_01 |
316 | | VSR\[35\]\.dword\[0\] | SVFR3\_10 | | VSR\[51\]\.dword\[0\] | SVFR19\_10 |
317 | | VSR\[35\]\.dword\[1\] | SVFR3\_11 | | VSR\[51\]\.dword\[1\] | SVFR19\_11 |
318 | FPR\[4\] | VSR\[4\]\.dword\[0\] | SVFR4\_00 | FPR\[20\] | VSR\[20\]\.dword\[0\] | SVFR20\_00 |
319 | | VSR\[4\]\.dword\[1\] | SVFR4\_01 | | VSR\[20\]\.dword\[1\] | SVFR20\_01 |
320 | | VSR\[36\]\.dword\[0\] | SVFR4\_10 | | VSR\[52\]\.dword\[0\] | SVFR20\_10 |
321 | | VSR\[36\]\.dword\[1\] | SVFR4\_11 | | VSR\[52\]\.dword\[1\] | SVFR20\_11 |
322 | FPR\[5\] | VSR\[5\]\.dword\[0\] | SVFR5\_00 | FPR\[21\] | VSR\[21\]\.dword\[0\] | SVFR21\_00 |
323 | | VSR\[5\]\.dword\[1\] | SVFR5\_01 | | VSR\[21\]\.dword\[1\] | SVFR21\_01 |
324 | | VSR\[37\]\.dword\[0\] | SVFR5\_10 | | VSR\[53\]\.dword\[0\] | SVFR21\_10 |
325 | | VSR\[37\]\.dword\[1\] | SVFR5\_11 | | VSR\[53\]\.dword\[1\] | SVFR21\_11 |
326 | FPR\[6\] | VSR\[6\]\.dword\[0\] | SVFR6\_00 | FPR\[22\] | VSR\[22\]\.dword\[0\] | SVFR22\_00 |
327 | | VSR\[6\]\.dword\[1\] | SVFR6\_01 | | VSR\[22\]\.dword\[1\] | SVFR22\_01 |
328 | | VSR\[38\]\.dword\[0\] | SVFR6\_10 | | VSR\[54\]\.dword\[0\] | SVFR22\_10 |
329 | | VSR\[38\]\.dword\[1\] | SVFR6\_11 | | VSR\[54\]\.dword\[1\] | SVFR22\_11 |
330 | FPR\[7\] | VSR\[7\]\.dword\[0\] | SVFR7\_00 | FPR\[23\] | VSR\[23\]\.dword\[0\] | SVFR23\_00 |
331 | | VSR\[7\]\.dword\[1\] | SVFR7\_01 | | VSR\[23\]\.dword\[1\] | SVFR23\_01 |
332 | | VSR\[39\]\.dword\[0\] | SVFR7\_10 | | VSR\[55\]\.dword\[0\] | SVFR23\_10 |
333 | | VSR\[39\]\.dword\[1\] | SVFR7\_11 | | VSR\[55\]\.dword\[1\] | SVFR23\_11 |
334 | FPR\[8\] | VSR\[8\]\.dword\[0\] | SVFR8\_00 | FPR\[24\] | VSR\[24\]\.dword\[0\] | SVFR24\_00 |
335 | | VSR\[8\]\.dword\[1\] | SVFR8\_01 | | VSR\[24\]\.dword\[1\] | SVFR24\_01 |
336 | | VSR\[40\]\.dword\[0\] | SVFR8\_10 | | VSR\[56\]\.dword\[0\] | SVFR24\_10 |
337 | | VSR\[40\]\.dword\[1\] | SVFR8\_11 | | VSR\[56\]\.dword\[1\] | SVFR24\_11 |
338 | FPR\[9\] | VSR\[9\]\.dword\[0\] | SVFR9\_00 | FPR\[25\] | VSR\[25\]\.dword\[0\] | SVFR25\_00 |
339 | | VSR\[9\]\.dword\[1\] | SVFR9\_01 | | VSR\[25\]\.dword\[1\] | SVFR25\_01 |
340 | | VSR\[41\]\.dword\[0\] | SVFR9\_10 | | VSR\[57\]\.dword\[0\] | SVFR25\_10 |
341 | | VSR\[41\]\.dword\[1\] | SVFR9\_11 | | VSR\[57\]\.dword\[1\] | SVFR25\_11 |
342 | FPR\[10\] | VSR\[10\]\.dword\[0\] | SVFR10\_00 | FPR\[26\] | VSR\[26\]\.dword\[0\] | SVFR26\_00 |
343 | | VSR\[10\]\.dword\[1\] | SVFR10\_01 | | VSR\[26\]\.dword\[1\] | SVFR26\_01 |
344 | | VSR\[42\]\.dword\[0\] | SVFR10\_10 | | VSR\[58\]\.dword\[0\] | SVFR26\_10 |
345 | | VSR\[42\]\.dword\[1\] | SVFR10\_11 | | VSR\[58\]\.dword\[1\] | SVFR26\_11 |
346 | FPR\[11\] | VSR\[11\]\.dword\[0\] | SVFR11\_00 | FPR\[27\] | VSR\[27\]\.dword\[0\] | SVFR27\_00 |
347 | | VSR\[11\]\.dword\[1\] | SVFR11\_01 | | VSR\[27\]\.dword\[1\] | SVFR27\_01 |
348 | | VSR\[43\]\.dword\[0\] | SVFR11\_10 | | VSR\[59\]\.dword\[0\] | SVFR27\_10 |
349 | | VSR\[43\]\.dword\[1\] | SVFR11\_11 | | VSR\[59\]\.dword\[1\] | SVFR27\_11 |
350 | FPR\[12\] | VSR\[12\]\.dword\[0\] | SVFR12\_00 | FPR\[28\] | VSR\[28\]\.dword\[0\] | SVFR28\_00 |
351 | | VSR\[12\]\.dword\[1\] | SVFR12\_01 | | VSR\[28\]\.dword\[1\] | SVFR28\_01 |
352 | | VSR\[44\]\.dword\[0\] | SVFR12\_10 | | VSR\[60\]\.dword\[0\] | SVFR28\_10 |
353 | | VSR\[44\]\.dword\[1\] | SVFR12\_11 | | VSR\[60\]\.dword\[1\] | SVFR28\_11 |
354 | FPR\[13\] | VSR\[13\]\.dword\[0\] | SVFR13\_00 | FPR\[29\] | VSR\[29\]\.dword\[0\] | SVFR29\_00 |
355 | | VSR\[13\]\.dword\[1\] | SVFR13\_01 | | VSR\[29\]\.dword\[1\] | SVFR29\_01 |
356 | | VSR\[45\]\.dword\[0\] | SVFR13\_10 | | VSR\[61\]\.dword\[0\] | SVFR29\_10 |
357 | | VSR\[45\]\.dword\[1\] | SVFR13\_11 | | VSR\[61\]\.dword\[1\] | SVFR29\_11 |
358 | FPR\[14\] | VSR\[14\]\.dword\[0\] | SVFR14\_00 | FPR\[30\] | VSR\[30\]\.dword\[0\] | SVFR30\_00 |
359 | | VSR\[14\]\.dword\[1\] | SVFR14\_01 | | VSR\[30\]\.dword\[1\] | SVFR30\_01 |
360 | | VSR\[46\]\.dword\[0\] | SVFR14\_10 | | VSR\[62\]\.dword\[0\] | SVFR30\_10 |
361 | | VSR\[46\]\.dword\[1\] | SVFR14\_11 | | VSR\[62\]\.dword\[1\] | SVFR30\_11 |
362 | FPR\[15\] | VSR\[15\]\.dword\[0\] | SVFR15\_00 | FPR\[31\] | VSR\[31\]\.dword\[0\] | SVFR31\_00 |
363 | | VSR\[15\]\.dword\[1\] | SVFR15\_01 | | VSR\[31\]\.dword\[1\] | SVFR31\_01 |
364 | | VSR\[47\]\.dword\[0\] | SVFR15\_10 | | VSR\[63\]\.dword\[0\] | SVFR31\_10 |
365 | | VSR\[47\]\.dword\[1\] | SVFR15\_11 | | VSR\[63\]\.dword\[1\] | SVFR31\_11 |
366
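The table above follows a regular pattern: the low bit of `<M>` selects the dword, and the high bit of `<M>` adds 32 to the VSR number. A sketch of that pattern, with hypothetical names, taking the flat SV FP register index `(N << 2) + M`:

```c
#include <assert.h>

/* Hypothetical result type: a VSR number and one of its two dwords. */
struct vsr_dword {
    unsigned vsr;   /* 0..63 */
    unsigned dword; /* 0 or 1 */
};

/* Map SVFR<N>_<M>, given as the flat index (N << 2) + M, to its VSX alias. */
static struct vsr_dword svfr_to_vsr(unsigned svfr_index)
{
    assert(svfr_index < 128);
    unsigned n = svfr_index >> 2;
    unsigned m = svfr_index & 3u;
    struct vsr_dword r = { n + ((m & 2u) ? 32u : 0u), m & 1u };
    return r;
}
```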
367 # Operation
368
369 ## CR fields as inputs/outputs of vector operations
370
When vectorized, CR inputs/outputs are read from or written to 4-bit CR fields
starting at SVCR6_000 and incrementing from there. If SVCR7_111 is reached, the next CR
field used wraps around to SVCR0_000 and continues incrementing from there
(see [[discussion]], where some alternative schemes are described).

SVCR6_000 was chosen as a balance between two goals: avoiding the need to save
CR2-CR4 (which are callee-saved) just to use SV vectors with VL <= 61, and having the first
vector CR field readily accessible to standard CR instructions and branches.
Additionally, SVCR6_000 is used as the implicit result of an OpenPower ISA v3.1
standard vector (SIMD) instruction with Rc=1.
381
382 ## Table of CR fields
383
384 CR[i] is the notation used by the OpenPower spec to refer to CR field #i,
385 so FP instructions with Rc=1 write to CR[1] aka SVCR1_000.
386
387 There are 3 new SPRs for holding CRs: CR_EXT1, CR_EXT2, and CR_EXT3.
388
389 The 64 SV CRs are arranged similarly to the way the 128 integer registers are arranged:
390
391 (**Jacob these names are impossible to interpret due to them not being sequential numbering and there being no compact algorithm given that shows how they're created. the original SVPrefix was dead easy to understand**)
392
393 | CR<br/>Register | SPR<br/>Field | SV CR<br/>Register | CR<br/>Register | SPR<br/>Field | SV CR<br/>Register |
394 |-----------------|----------------|--------------------|-----------------|----------------|--------------------|
395 | CR[0] | CR[32:35] | SVCR0_000 | CR[4] | CR[48:51] | SVCR4_000 |
396 | | CR_EXT1[32:35] | SVCR0_001 | | CR_EXT1[48:51] | SVCR4_001 |
397 | | CR_EXT2[32:35] | SVCR0_010 | | CR_EXT2[48:51] | SVCR4_010 |
398 | | CR_EXT3[32:35] | SVCR0_011 | | CR_EXT3[48:51] | SVCR4_011 |
399 | *CR[-8]* | CR[0:3] | SVCR0_100 | *CR[-4]* | CR[16:19] | SVCR4_100 |
400 | | CR_EXT1[0:3] | SVCR0_101 | | CR_EXT1[16:19] | SVCR4_101 |
401 | | CR_EXT2[0:3] | SVCR0_110 | | CR_EXT2[16:19] | SVCR4_110 |
402 | | CR_EXT3[0:3] | SVCR0_111 | | CR_EXT3[16:19] | SVCR4_111 |
403 | CR[1] | CR[36:39] | SVCR1_000 | CR[5] | CR[52:55] | SVCR5_000 |
404 | | CR_EXT1[36:39] | SVCR1_001 | | CR_EXT1[52:55] | SVCR5_001 |
405 | | CR_EXT2[36:39] | SVCR1_010 | | CR_EXT2[52:55] | SVCR5_010 |
406 | | CR_EXT3[36:39] | SVCR1_011 | | CR_EXT3[52:55] | SVCR5_011 |
407 | *CR[-7]* | CR[4:7] | SVCR1_100 | *CR[-3]* | CR[20:23] | SVCR5_100 |
408 | | CR_EXT1[4:7] | SVCR1_101 | | CR_EXT1[20:23] | SVCR5_101 |
409 | | CR_EXT2[4:7] | SVCR1_110 | | CR_EXT2[20:23] | SVCR5_110 |
410 | | CR_EXT3[4:7] | SVCR1_111 | | CR_EXT3[20:23] | SVCR5_111 |
411 | CR[2] | CR[40:43] | SVCR2_000 | CR[6] | CR[56:59] | SVCR6_000 |
412 | | CR_EXT1[40:43] | SVCR2_001 | | CR_EXT1[56:59] | SVCR6_001 |
413 | | CR_EXT2[40:43] | SVCR2_010 | | CR_EXT2[56:59] | SVCR6_010 |
414 | | CR_EXT3[40:43] | SVCR2_011 | | CR_EXT3[56:59] | SVCR6_011 |
415 | *CR[-6]* | CR[8:11] | SVCR2_100 | *CR[-2]* | CR[24:27] | SVCR6_100 |
416 | | CR_EXT1[8:11] | SVCR2_101 | | CR_EXT1[24:27] | SVCR6_101 |
417 | | CR_EXT2[8:11] | SVCR2_110 | | CR_EXT2[24:27] | SVCR6_110 |
418 | | CR_EXT3[8:11] | SVCR2_111 | | CR_EXT3[24:27] | SVCR6_111 |
419 | CR[3] | CR[44:47] | SVCR3_000 | CR[7] | CR[60:63] | SVCR7_000 |
420 | | CR_EXT1[44:47] | SVCR3_001 | | CR_EXT1[60:63] | SVCR7_001 |
421 | | CR_EXT2[44:47] | SVCR3_010 | | CR_EXT2[60:63] | SVCR7_010 |
422 | | CR_EXT3[44:47] | SVCR3_011 | | CR_EXT3[60:63] | SVCR7_011 |
423 | *CR[-5]* | CR[12:15] | SVCR3_100 | *CR[-1]* | CR[28:31] | SVCR7_100 |
424 | | CR_EXT1[12:15] | SVCR3_101 | | CR_EXT1[28:31] | SVCR7_101 |
425 | | CR_EXT2[12:15] | SVCR3_110 | | CR_EXT2[28:31] | SVCR7_110 |
426 | | CR_EXT3[12:15] | SVCR3_111 | | CR_EXT3[28:31] | SVCR7_111 |
427
Note: CR[-8] through CR[-1] are not part of OpenPower v3.1; they are the MSB half of the 64-bit CR SPR.
429
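The CR table above can also be read as a compact rule: the low two bits of `<M>` select the SPR (CR, CR_EXT1, CR_EXT2 or CR_EXT3), and the high bit of `<M>` selects the MSB half of that SPR instead of the LSB half. A sketch with hypothetical names, taking the flat SV CR index `(N << 3) + M`:

```c
#include <assert.h>

/* Hypothetical result type: which SPR holds the field, and where. */
struct cr_location {
    unsigned spr;       /* 0 = CR, 1..3 = CR_EXT1..CR_EXT3 */
    unsigned first_bit; /* MSB0 bit number of the 4-bit field's first bit */
};

/* Map SVCR<N>_<M>, given as the flat index (N << 3) + M, to its SPR field. */
static struct cr_location svcr_to_spr(unsigned svcr_index)
{
    assert(svcr_index < 64);
    unsigned n = svcr_index >> 3;
    unsigned m = svcr_index & 7u;
    struct cr_location loc;
    loc.spr = m & 3u;
    /* M high bit clear: bits 32:63 (standard CR half); set: bits 0:31 */
    loc.first_bit = 4u * n + ((m & 4u) ? 0u : 32u);
    return loc;
}
```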
430 # Register Profiles
431
432 Instructions are broken down by Register Profiles as listed in the following auto-generated page:
433 [[opcode_regs_deduped]]. "Non-SV" indicates that the operations with this Register Profile cannot be Vectorised (mtspr, bc, dcbz, twi)
434
435 ## LDST-1R-1W-imm
436 TBD
437 ## LDST-1R-2W-imm
438 TBD
439 ## LDST-2R-imm
440 TBD
441 ## LDST-2R-1W
442 TBD
443 ## LDST-2R-1W-imm
444 TBD
445 ## LDST-2R-2W
446 TBD
447 ## LDST-3R
448 TBD
449 ## LDST-3R-CRo
450 TBD
451 ## LDST-3R-1W
452 TBD
453 ## CRio
454 TBD
455 ## CR=2R1W
456
457 Remapped Encoding Fields:
458
459 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
460 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
461 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
462
463 ## 1W-CRi
464
465 Remapped Encoding Fields:
466
467 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
468 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
469 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
470
471 ## 1R-CRo
472
473 Remapped Encoding Fields:
474
475 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
476 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
477 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
478
479 ## 1R-CRio
480
481 Remapped Encoding Fields:
482
483 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
484 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
485 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
486
487 ## 1R-1W
488
489 Remapped Encoding Fields:
490
491 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
492 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
493 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
494
495 ## 1R-1W-imm
496
497 Remapped Encoding Fields:
498
499 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
500 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
501 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
502
503 ## 1R-1W-CRo
504
505 Remapped Encoding Fields:
506
507 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
508 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
509 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
510
511 ## 1R-1W-CRio
512
513 Remapped Encoding Fields:
514
515 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
516 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
517 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
518
519 ## 2R-CRo
520
521 Remapped Encoding Fields:
522
523 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
524 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
525 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
526
527 ## 2R-CRio
528
529 Remapped Encoding Fields:
530
531 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
532 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
533 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
534
535 ## 2R-1W
536
537 Remapped Encoding Fields:
538
539 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
540 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
541 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
542
543 ## 2R-1W-CRo
544
545 Remapped Encoding Fields:
546
547 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
548 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
549 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
550
551 <!-- comment needed to stop ikiwiki markdown from mis-parsing table -->
552
553 ## 2R-1W-CRo (rl(w|d)imi)
554
555 Remapped Encoding Fields:
556
557 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:23` |
558 |-----------|-------|---------|-------|-------------|-------------|---------|
559 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | TBD |
560
561 ## 2R-1W-CRi
562 TBD
563 ## 2R-1W-CRio
564
565 Remapped Encoding Fields:
566
567 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
568 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
569 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
570
571 ## 3R-1W-CRio
572
573 Remapped Encoding Fields:
574
575 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:19` | `20:23` |
576 |-----------|-------|---------|-------|-------------|-------------|-------------|-------------|----------|
577 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | Rsrc3_EXTRA | Reserved |