# Rewrite of SVP64 for OpenPower ISA v3.1

* [[svp64/discussion]]

The plan is to create an encoding for SVP64, then to create an encoding for
SVP48, then to reorganize them both to improve field overlap, reducing the
amount of decoder hardware necessary.

All bit numbers are in MSB0 form (the bits are numbered from 0 at the MSB,
counting up towards the LSB). All bit ranges are inclusive (so
`4:6` means bits 4, 5, and 6).
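As an illustration of the MSB0 convention, the following minimal sketch extracts an inclusive bit range from a 32-bit word. The helper name `msb0_field` is hypothetical and is not part of this spec.

```c
#include <stdint.h>

/* Hypothetical helper: extract bits first..last (inclusive, MSB0 numbering)
 * from a 32-bit word. Bit 0 is the most significant bit, bit 31 the least. */
static inline uint32_t msb0_field(uint32_t word, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    unsigned shift = 31 - last;  /* distance of the field's LSB from bit 31 */
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> shift) & mask;
}
```

For example, `msb0_field(word, 4, 6)` returns bits 4, 5, and 6 as a 3-bit value.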

64-bit instructions are split into two 32-bit words, the prefix and the suffix. The prefix always comes before the suffix in PC order.

SVP64 is designed so that when the prefix is all zeros, no augmentation occurs: all standard OpenPOWER v3.0B instructions remain available in full, and SV is quiescent. The corollary is that when the SV prefix is nonzero, alternative meanings may be given to any and all instructions.

# Definition of Reserved in this spec

For the new fields added in SVP64, instructions that have any of their fields set to a reserved value must cause an illegal instruction trap, to allow emulation of future instruction sets.

This is unlike OpenPower ISA v3.1, which doesn't require a CPU to trap.

# Remapped Encoding (`RM[0:23]`)

To allow relatively easy remapping of which portions of the Prefix Opcode Map
are used for SVP64 without needing to rewrite a large portion of the SVP64
spec, a mapping is defined from the OpenPower v3.1 prefix bits to a new 24-bit
Remapped Encoding denoted `RM[0]` at the MSB to `RM[23]` at the LSB.

The mapping from the OpenPower v3.1 prefix bits to the Remapped Encoding is
defined in the Prefix Fields section.

## Prefix Opcode Map (64-bit instruction encoding) (prefix bits 6:11)

(Shows both PowerISA v3.1 instructions and new SVP instructions; empty cells are yet-to-be-allocated Illegal Instructions.)

| bits 6:11 | ---000   | ---001     | ---010   | ---011   | ---100   | ---101   | ---110   | ---111   |
|-----------|----------|------------|----------|----------|----------|----------|----------|----------|
| 000---    | 8LS-form | 8LS-form   | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form |
| 001---    |          |            |          |          |          |          |          |          |
| 010---    | 8RR-form |            |          |          | SVP64    | SVP64    | SVP64    | SVP64    |
| 011---    |          |            |          |          | SVP64    | SVP64    | SVP64    | SVP64    |
| 100---    | MLS-form | MLS-form   | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form |
| 101---    |          |            |          |          |          |          |          |          |
| 110---    | MRR-form |            |          |          | SVP64    | SVP64    | SVP64    | SVP64    |
| 111---    |          | MMIRR-form |          |          | SVP64    | SVP64    | SVP64    | SVP64    |

## Prefix Fields

| Prefix Field Name   | Field bits | Constant Value | Description                                |
|---------------------|------------|----------------|--------------------------------------------|
| PO (Primary Opcode) | `0:5`      | `1`            | Indicates this is a 64-bit instruction     |
| `RM[0]`             | `6`        |                | Bit 0 of the Remapped Encoding             |
| SVP64_7             | `7`        | `1`            | Indicates this is an SVP64 instruction     |
| `RM[1]`             | `8`        |                | Bit 1 of the Remapped Encoding             |
| SVP64_9             | `9`        | `1`            | Indicates this is an SVP64 instruction     |
| `RM[2:23]`          | `10:31`    |                | Bits 2 through 23 of the Remapped Encoding |
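The Prefix Fields table can be read as a small decoding routine. The sketch below is illustrative only: the function names `svp64_is_prefix` and `svp64_rm` are hypothetical, and the 24-bit result is held in an integer with `RM[0]` as its most significant bit (integer bit 23).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the 32-bit prefix word has PO == 1 and both
 * SVP64 marker bits (MSB0 bits 7 and 9) set, as in the Prefix Fields table. */
static bool svp64_is_prefix(uint32_t prefix)
{
    uint32_t po = prefix >> 26;           /* MSB0 bits 0:5 */
    uint32_t b7 = (prefix >> 24) & 1u;    /* MSB0 bit 7    */
    uint32_t b9 = (prefix >> 22) & 1u;    /* MSB0 bit 9    */
    return po == 1u && b7 == 1u && b9 == 1u;
}

/* Hypothetical helper: gather RM[0:23] from prefix bits 6, 8 and 10:31. */
static uint32_t svp64_rm(uint32_t prefix)
{
    uint32_t rm0    = (prefix >> 25) & 1u;   /* MSB0 bit 6      -> RM[0]    */
    uint32_t rm1    = (prefix >> 23) & 1u;   /* MSB0 bit 8      -> RM[1]    */
    uint32_t rm2_23 = prefix & 0x3FFFFFu;    /* MSB0 bits 10:31 -> RM[2:23] */
    return (rm0 << 23) | (rm1 << 22) | rm2_23;
}
```

Keeping the marker bits separate from the `RM` bits in this way means the rest of a decoder only ever sees the 24-bit Remapped Encoding, which is the stated purpose of the remapping.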


# Remapped Encoding Fields

Shows all fields in the Remapped Encoding `RM[0:23]` for all instruction variants. There are two categories: Single and Twin Predication. Due to space considerations, Single Predication is further subdivided by whether the number of src operands is 2 or 3.

## Single Predication dest/src1/2/3

Applies to 4-operand instructions (fmadd, isel, madd).

| Field Name   | Field bits | Description                                |
|--------------|------------|--------------------------------------------|
| MASK_KIND    | `0`        | Execution Mask Kind                        |
| MASK         | `1:3`      | Execution Mask                             |
| ELWIDTH      | `4:5`      | Element Width                              |
| SUBVL        | `6:7`      | Sub-vector length                          |
| Rdest_EXTRA2 | `8:9`      | extra bits for Rdest (R\*_EXTRA2 Encoding) |
| Rsrc1_EXTRA2 | `10:11`    | extra bits for Rsrc1 (R\*_EXTRA2 Encoding) |
| Rsrc2_EXTRA2 | `12:13`    | extra bits for Rsrc2 (R\*_EXTRA2 Encoding) |
| Rsrc3_EXTRA2 | `14:15`    | extra bits for Rsrc3 (R\*_EXTRA2 Encoding) |
| reserved     | `16`       | reserved                                   |
| MODE         | `19:23`    | see [[discussion]]                         |


## Single Predication dest/src1/2

Applies to 3-operand instructions (src1 src2 dest).

| Field Name   | Field bits | Description                                     |
|--------------|------------|-------------------------------------------------|
| MASK_KIND    | `0`        | Execution Mask Kind                             |
| MASK         | `1:3`      | Execution Mask                                  |
| ELWIDTH      | `4:5`      | Element Width                                   |
| SUBVL        | `6:7`      | Sub-vector length                               |
| Rdest_EXTRA3 | `8:10`     | extra bits for Rdest (Uses R\*_EXTRA3 Encoding) |
| Rsrc1_EXTRA3 | `11:13`    | extra bits for Rsrc1 (Uses R\*_EXTRA3 Encoding) |
| Rsrc2_EXTRA3 | `14:16`    | extra bits for Rsrc2 (Uses R\*_EXTRA3 Encoding) |
| MODE         | `19:23`    | see [[discussion]]                              |

## Twin Predication (src=1, dest=1)

| Field Name   | Field bits | Description                |
|--------------|------------|----------------------------|
| MASK_KIND    | `0`        | Execution Mask Kind        |
| MASK         | `1:3`      | Execution Mask             |
| ELWIDTH      | `4:5`      | Element Width              |
| SUBVL        | `6:7`      | Sub-vector length          |
| Rdest_EXTRA3 | `8:10`     | extra bits for Rdest       |
| Rsrc1_EXTRA3 | `11:13`    | extra bits for Rsrc1       |
| MASK_SRC     | `14:16`    | Execution Mask for Source  |
| ELWIDTH_SRC  | `17:18`    | Element Width for Source   |
| MODE         | `19:23`    | see [[discussion]]         |

Note in [[discussion]]: TODO, evaluate whether a second SUBVL should be added. Conclusion: no. A second SUBVL makes no sense except for mv, and that is covered by [[mv.vec]].
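To make the Twin Predication layout concrete, here is a minimal sketch that splits a 24-bit `RM` value into the fields of the table above. The struct and function names are hypothetical, and `RM[0]` is assumed to sit in bit 23 of the integer holding the Remapped Encoding.

```c
#include <stdint.h>

/* Hypothetical container for the Twin Predication fields of RM[0:23]. */
typedef struct {
    unsigned mask_kind;     /* RM[0]     */
    unsigned mask;          /* RM[1:3]   */
    unsigned elwidth;       /* RM[4:5]   */
    unsigned subvl;         /* RM[6:7]   */
    unsigned rdest_extra3;  /* RM[8:10]  */
    unsigned rsrc1_extra3;  /* RM[11:13] */
    unsigned mask_src;      /* RM[14:16] */
    unsigned elwidth_src;   /* RM[17:18] */
    unsigned mode;          /* RM[19:23] */
} sv_rm_twin_t;

/* Extract RM[first:last] (inclusive, MSB0 within the 24-bit encoding). */
static unsigned rm_bits(uint32_t rm, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    return (rm >> (23 - last)) & ((1u << width) - 1u);
}

static sv_rm_twin_t sv_rm_decode_twin(uint32_t rm)
{
    sv_rm_twin_t f;
    f.mask_kind    = rm_bits(rm, 0, 0);
    f.mask         = rm_bits(rm, 1, 3);
    f.elwidth      = rm_bits(rm, 4, 5);
    f.subvl        = rm_bits(rm, 6, 7);
    f.rdest_extra3 = rm_bits(rm, 8, 10);
    f.rsrc1_extra3 = rm_bits(rm, 11, 13);
    f.mask_src     = rm_bits(rm, 14, 16);
    f.elwidth_src  = rm_bits(rm, 17, 18);
    f.mode         = rm_bits(rm, 19, 23);
    return f;
}
```

An all-zero `RM` decodes to all-zero fields, which matches the "zero means no augmentation" convention used throughout this page.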

## R\*_EXTRA2 and R\*_EXTRA3 Encoding

(**TODO: 2-bit version of the table, just like in the original SVPrefix. This is important, to save bits on 4-operand instructions such as fmadd**)

In the following table, `<N>` denotes the value of the corresponding register field in the SVP64 suffix word.

(**Jacob: these tables are not in the slightest bit understandable due to the use of register names that are impossible to interpret clearly**)

3-bit version

| R\*_EXTRA3 | Vector/Scalar<br/>Mode | CR Register   | Int/FP<br/>Register |
|-----------|------------------------|---------------|---------------------|
| 000       | Scalar                 | `SVCR<N>_000` | `SV[F]R<N>_00`      |
| 001       | Scalar                 | `SVCR<N>_010` | `SV[F]R<N>_01`      |
| 010       | Scalar                 | `SVCR<N>_100` | `SV[F]R<N>_10`      |
| 011       | Scalar                 | `SVCR<N>_110` | `SV[F]R<N>_11`      |
| 100       | Vector                 | `SVCR<N>_000` | `SV[F]R<N>_00`      |
| 101       | Vector                 | `SVCR<N>_010` | `SV[F]R<N>_01`      |
| 110       | Vector                 | `SVCR<N>_100` | `SV[F]R<N>_10`      |
| 111       | Vector                 | `SVCR<N>_110` | `SV[F]R<N>_11`      |

An alternative which is understandable and which, if EXTRA3 is zero, maps to "no effect" (scalar OpenPOWER ISA field naming):

| R\*_EXTRA3 | Mode   | CR Register   | Int/FP<br/>Register |
|-----------|--------|---------------|---------------------|
| 000       | Scalar | ``            | `0b00 RA`           |
| 001       | Scalar | ``            | `0b01 RA`           |
| 010       | Scalar | ``            | `0b10 RA`           |
| 011       | Scalar | ``            | `0b11 RA`           |
| 100       | Vector | ``            | `RA 0b00`           |
| 101       | Vector | ``            | `RA 0b01`           |
| 110       | Vector | ``            | `RA 0b10`           |
| 111       | Vector | ``            | `RA 0b11`           |
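One possible reading of the alternative EXTRA3 table, sketched in C: the top bit of EXTRA3 selects Scalar or Vector, and the low two bits are placed above the 5-bit `RA` field for scalars or below it for vectors, giving a 7-bit register number. This interpretation of `0b01 RA` and `RA 0b01` as bit concatenation is an assumption, and the names below are hypothetical. Only the Int/FP column is covered, since the CR column of that table is not yet filled in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: Vector/Scalar flag plus 7-bit register number. */
typedef struct {
    bool     is_vector;
    unsigned reg;        /* 0..127 */
} sv_extra3_reg_t;

/* Assumed reading of the alternative EXTRA3 table:
 *   Scalar: reg = {extra[1:2], RA}  (extra bits above RA)
 *   Vector: reg = {RA, extra[1:2]}  (extra bits below RA) */
static sv_extra3_reg_t sv_decode_extra3(unsigned extra3, unsigned ra)
{
    sv_extra3_reg_t r;
    unsigned ext = extra3 & 3u;        /* low two bits of EXTRA3 */
    ra &= 31u;                         /* 5-bit field from the suffix word */
    r.is_vector = (extra3 & 4u) != 0;  /* top bit of EXTRA3 */
    r.reg = r.is_vector ? ((ra << 2) | ext) : ((ext << 5) | ra);
    return r;
}
```

With `extra3 == 0` the result is a scalar register whose number equals `RA`, which is the "no effect" property the alternative table is aiming for.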

2-bit version

(**TODO, i simply cannot interpret the names, they have absolutely zero meaning to me so i have no idea how to fill in the table. this is a bad sign, indicative that the names have to go, to be replaced by something clear and obvious**)

| R\*_EXTRA2 | Mode    | CR Register   | Int/FP<br/>Register |
|-----------|---------|---------------|---------------------|
| 00        | Scalar  | `SVCR<N>_000` | `SV[F]R<N>_00`      |
| 01        | Scalar  | `SVCR<N>_100` | `SV[F]R<N>_10`      |
| 10        | Vector  | `SVCR<N>_000` | `SV[F]R<N>_00`      |
| 11        | Vector  | `SVCR<N>_100` | `SV[F]R<N>_10`      |

An alternative which is understandable and which, if EXTRA2 is zero, maps to "no effect", i.e. Scalar OpenPOWER register naming:

| R\*_EXTRA2 | Mode   | CR Register   | Int/FP<br/>Register |
|-----------|--------|---------------|---------------------|
| 00        | Scalar | ``            | `0b00 RA`           |
| 01        | Scalar | ``            | `0b01 RA`           |
| 10        | Vector | ``            | `RA 0b00`           |
| 11        | Vector | ``            | `RA 0b10`           |
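The alternative EXTRA2 table can be sketched the same way. Under the assumption that `0b01 RA` and `RA 0b10` denote bit concatenation into a 7-bit register number, the top bit of EXTRA2 selects Vector and the low bit supplies a single extra register bit. The function name and out-parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed reading of the alternative EXTRA2 table:
 *   00 -> Scalar {0b00, RA}    01 -> Scalar {0b01, RA}
 *   10 -> Vector {RA, 0b00}    11 -> Vector {RA, 0b10}
 * Returns the 7-bit register number; *is_vector receives the mode. */
static unsigned sv_decode_extra2(unsigned extra2, unsigned ra, bool *is_vector)
{
    unsigned ext = extra2 & 1u;        /* the single extra register bit */
    ra &= 31u;                         /* 5-bit field from the suffix word */
    *is_vector = (extra2 & 2u) != 0;   /* top bit of EXTRA2 */
    if (*is_vector)
        return (ra << 2) | (ext << 1); /* extra bit lands in the 0b10 position */
    return (ext << 5) | ra;            /* extra bit lands just above RA */
}
```

Note the asymmetry taken directly from the table: a scalar with the extra bit set reaches registers 32 to 63, while a vector with the extra bit set starts at an even offset of 2 within its group of four.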

## ELWIDTH Encoding

Default behaviour is set to 0b00 so that zeros follow the convention of "not doing anything". In this case it means that elwidth overrides are not applicable. Thus if an instruction operates on 32-bit data, `elwidth=0b00` specifies that this behaviour is unmodified. Likewise when a processor is switched from 64-bit to 32-bit mode, `elwidth=0b00` states that, again, the behaviour is not to be modified.

Only when elwidth is nonzero is the element width overridden to the explicitly required value.

| Op Kind | Value | Mnemonic              | Description                           |
|---------|-------|-----------------------|---------------------------------------|
| Integer | 00    | DEFAULT               | default behaviour for operation       |
| Integer | 01    | `ELWIDTH=b`           | Byte: 8-bit integer                   |
| Integer | 10    | `ELWIDTH=h`           | Halfword: 16-bit integer              |
| Integer | 11    | `ELWIDTH=w`           | Word: 32-bit integer                  |
| FP      | 00    | DEFAULT               | default behaviour for FP operation    |
| FP      | 01    | `ELWIDTH=bf16` (rsvd) | Reserved for [`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) |
| FP      | 10    | `ELWIDTH=f16`         | 16-bit IEEE 754 Half floating-point   |
| FP      | 11    | `ELWIDTH=f32`         | 32-bit IEEE 754 Single floating-point |
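For the Integer rows of the table, the override rule can be sketched as a small lookup. The function name is hypothetical, and `default_bits` stands for whatever width the unmodified instruction would use (for example 64 or 32, depending on the operation and processor mode).

```c
/* Hypothetical helper: element width in bits for an integer operation.
 * elwidth is the 2-bit ELWIDTH field; 0b00 leaves the default untouched. */
static unsigned sv_int_elwidth_bits(unsigned elwidth, unsigned default_bits)
{
    switch (elwidth & 3u) {
    case 1:  return 8;             /* ELWIDTH=b */
    case 2:  return 16;            /* ELWIDTH=h */
    case 3:  return 32;            /* ELWIDTH=w */
    default: return default_bits;  /* DEFAULT: no override */
    }
}
```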

## SUBVL Encoding

The default for SUBVL is 1 and its encoding is 0b00, indicating that SUBVL is effectively disabled (a SUBVL for-loop of only one element). This lines up with all other "default is all zeros" behaviour.

| SUBVL Value | Mnemonic            | Description            |
|-------------|---------------------|------------------------|
| 00          | `SUBVL=1` (default) | Sub-vector length of 1 |
| 01          | `SUBVL=2`           | Sub-vector length of 2 |
| 10          | `SUBVL=3`           | Sub-vector length of 3 |
| 11          | `SUBVL=4`           | Sub-vector length of 4 |

## MASK/MASK_SRC & MASK_KIND Encoding

One bit (`MASK_KIND`) indicates the mode: CR or Int predication. The two types may not be mixed.

Special note: to get default behaviour (SV disabled), `MASK_KIND` must be set to zero and the Integer Predication `MASK` field must also be set to 0b000. This has the effect of enabling "all 1s" in the predicate mask, which is equivalent to "not having any predication at all" and consequently, in combination with all other default zeros, fully disables SV.

| MASK_KIND Value | Description                                          |
|-----------------|------------------------------------------------------|
| 0               | MASK/MASK_SRC are encoded using Integer Predication  |
| 1               | MASK/MASK_SRC are encoded using CR-based Predication |

Integer Twin predication has a second set of 3 bits that uses the same encoding, thus allowing either the same register (r3 or r10) to be used for both src and dest, or different regs (one for src, one for dest).

Likewise CR-based twin predication has a second set of 3 bits, allowing a different test to be applied.

### Integer Predication (MASK_KIND=0)

When the predicate mode bit is zero the 3 bits are interpreted as below.
Twin predication has an identical 3-bit field similarly encoded.

| MASK/MASK_SRC<br/>Value | Mnemonic | Description                                            |
|-------------------------|----------|--------------------------------------------------------|
| 000                     | ALWAYS   | Operation is not masked (mask set to all 1s)           |
| 001                     | 1 << R3  | Element `i` is enabled if `i == R3`                    |
| 010                     | R3       | Element `i` is enabled if `R3 & (1 << i)` is non-zero  |
| 011                     | ~R3      | Element `i` is enabled if `R3 & (1 << i)` is zero      |
| 100                     | R10      | Element `i` is enabled if `R10 & (1 << i)` is non-zero |
| 101                     | ~R10     | Element `i` is enabled if `R10 & (1 << i)` is zero     |
| 110                     | R30      | Element `i` is enabled if `R30 & (1 << i)` is non-zero |
| 111                     | ~R30     | Element `i` is enabled if `R30 & (1 << i)` is zero     |
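The Integer Predication table can be expressed as a per-element test. This is a minimal sketch: the function name is hypothetical, `gpr` is assumed to be an array holding the 32 standard integer registers, and `i` is assumed to be less than 64 so that the shift is well defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is element i enabled under Integer Predication?
 * mask is the 3-bit MASK (or MASK_SRC) field; gpr[] holds r0..r31. */
static bool sv_int_pred_enabled(unsigned mask, unsigned i, const uint64_t gpr[32])
{
    /* predicate register selected by mask >> 1 (entry 0 is unused) */
    static const unsigned pred_reg[4] = {3, 3, 10, 30};

    if (mask == 0)
        return true;                 /* ALWAYS: mask is all 1s */
    if (mask == 1)
        return gpr[3] == i;          /* 1 << R3: only element R3 is enabled */

    bool bit = ((gpr[pred_reg[mask >> 1]] >> i) & 1u) != 0;
    return (mask & 1u) ? !bit : bit; /* odd encodings invert the test */
}
```

In a vector loop this would be called once per element index before the element operation is carried out.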

### CR-based Predication (MASK_KIND=1)

When the predicate mode bit is one the 3 bits are interpreted as below. Twin predication has an identical 3-bit field similarly encoded.

| MASK/MASK_SRC<br/>Value | Mnemonic | Description                                     |
|-------------------------|----------|-------------------------------------------------|
| 000                     | lt       | Element `i` is enabled if `CR[6+i].LT` is set   |
| 001                     | nl/ge    | Element `i` is enabled if `CR[6+i].LT` is clear |
| 010                     | gt       | Element `i` is enabled if `CR[6+i].GT` is set   |
| 011                     | ng/le    | Element `i` is enabled if `CR[6+i].GT` is clear |
| 100                     | eq       | Element `i` is enabled if `CR[6+i].EQ` is set   |
| 101                     | ne       | Element `i` is enabled if `CR[6+i].EQ` is clear |
| 110                     | so/un    | Element `i` is enabled if `CR[6+i].FU` is set   |
| 111                     | ns/nu    | Element `i` is enabled if `CR[6+i].FU` is clear |
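The CR-based table has a regular structure: the upper two bits of the field select one of the four bits of a CR field, and the lowest bit inverts the test. The sketch below takes the 4-bit CR field for element `i` as a parameter, so that it does not depend on how `CR[6+i]` is located. The function name is hypothetical, and the field is assumed to be packed with LT as its most significant bit, followed by GT, EQ, and SO/FU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is an element enabled under CR-based Predication?
 * mask is the 3-bit MASK (or MASK_SRC) field; cr_field is the element's
 * 4-bit CR field, assumed packed as LT=8, GT=4, EQ=2, SO/FU=1. */
static bool sv_cr_pred_enabled(unsigned mask, uint8_t cr_field)
{
    unsigned bit_sel = (mask >> 1) & 3u;  /* 0=LT, 1=GT, 2=EQ, 3=SO/FU */
    bool bit = ((cr_field >> (3u - bit_sel)) & 1u) != 0;
    return (mask & 1u) ? !bit : bit;      /* odd encodings test for "clear" */
}
```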

CR-based predication. TODO: select alternate CR for twin predication? See [[discussion]]. Overlap of the two CR-based predicates must be taken into account, so the starting point for one of them must be suitably high, or accept that for twin predication VL must not exceed the range where overlap will occur, *or* that they use the same starting point but select different *bits* of the same CRs.


# Twin Predication

This is a novel concept that allows predication to be applied to a single source and a single dest register. The following types of traditional Vector operations may be encoded with it, *without requiring explicit opcodes to do so*:

* VSPLAT (a single scalar distributed across a vector)
* VEXTRACT (like LLVM IR [`extractelement`](https://releases.llvm.org/11.0.0/docs/LangRef.html#extractelement-instruction))
* VINSERT (like LLVM IR [`insertelement`](https://releases.llvm.org/11.0.0/docs/LangRef.html#insertelement-instruction))
* VCOMPRESS (like LLVM IR [`llvm.masked.compressstore.*`](https://releases.llvm.org/11.0.0/docs/LangRef.html#llvm-masked-compressstore-intrinsics))
* VEXPAND (like LLVM IR [`llvm.masked.expandload.*`](https://releases.llvm.org/11.0.0/docs/LangRef.html#llvm-masked-expandload-intrinsics))

Those patterns (and more) may be applied to:

* mv (the usual way that V\* operations are created)
* exts\* sign-extension
* rlwinm and other RS-RA shift operations (**note**: excluding
  those that take RA as both a src and dest. These are not
  1-src 1-dest, they are 2-src, 1-dest)
* LD and ST (treating AGEN as one source)
* FP fclass, fsgn, fneg, fabs, fcvt, frecip, fsqrt etc.
* Condition Register ops mfcr, mtcr and other similar

This is a huge list that creates extremely powerful combinations, particularly given that one of the predicate options is `(1<<r3)`.

Additional unusual capabilities of Twin Predication include a back-to-back version of VCOMPRESS-VEXPAND which is effectively the ability to do an ordered multiple VINSERT.

## Twin Predication

Single-predication (typically arithmetic operations, i.e. with more than one source register) and twin-predication (one source, one destination) require different encodings.

# Register Naming

SV Registers are numbered using the notation `SV[F|C]R<N>_<M>` where `<N>` is a decimal integer and `<M>` is a binary integer. Two integers are used to enable future register expansions to add more registers by appending more LSB bits to `<M>`.

For all `SV[F|C]R<N>_<M>` registers, the N is the
upper bits in decimal and the M is the lower bits in binary, so `SVR5_01` is
SV integer register `(5 << 2) + 0b01`, `SVCR6_011` is SV condition register
`(6 << 3) + 0b011`, and `SVFR20_10` is SV floating-point register
`(20 << 2) + 0b10`.
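The naming rule amounts to a shift and an add. A minimal sketch, with a hypothetical function name, where `m_bits` is the number of binary digits in `<M>` (2 for integer and FP registers, 3 for CR fields in the examples above):

```c
/* Hypothetical helper: flat register number for SV[F|C]R<N>_<M>.
 * n is the decimal part, m the binary part, m_bits the width of m. */
static unsigned sv_reg_number(unsigned n, unsigned m, unsigned m_bits)
{
    return (n << m_bits) + m;
}
```

Applied to the three examples in the text: `SVR5_01` is `sv_reg_number(5, 1, 2)`, which is 21; `SVCR6_011` is `sv_reg_number(6, 3, 3)`, which is 51; and `SVFR20_10` is `sv_reg_number(20, 2, 2)`, which is 82.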

## Example Code

A vectorized 32-bit add:

    add SVR3_01, SVR6_10, SVR10_00, elwidth=w, subvl=1, mask=lt

does the following:

    const size_t start_cr = (6 << 3) + 0b000; // starting at SVCR6_000
    // pretend for the moment that type-punning actually works in C/C++
    uint32_t *rt = (uint32_t *)&regs[(3 << 2) + 0b01]; // SVR3_01
    uint32_t *ra = (uint32_t *)&regs[(6 << 2) + 0b10]; // SVR6_10
    uint32_t *rb = (uint32_t *)&regs[(10 << 2) + 0b00]; // SVR10_00
    for(size_t i = 0; i < VL; i++) {
        if(CRs[(start_cr + i) % 64].lt) {
            rt[i] = ra[i] + rb[i];
        }
    }

## Integer Registers

    setvli ..., VL=7
    add r20, r25, r30, elwidth=64, subvl=1

where `r20`, `r25`, and `r30` are standard OpenPower register names.
Those names correspond to `SVR20_00`, `SVR25_00`, and `SVR30_00`.

Pseudocode:

    const size_t STD_TO_SV_SHIFT = 2; // gets bigger as reg files expand to 256, 512, ... registers

    VL = 7; // setvli (omitting maxvl here)

    for(size_t i = 0; i < VL; i++) {
        regs[(20 << STD_TO_SV_SHIFT) + i] = regs[(25 << STD_TO_SV_SHIFT) + i]
                                          + regs[(30 << STD_TO_SV_SHIFT) + i];
    }

Standard PowerISA Integer registers are aliased to some of the SV integer registers:

(**Jacob: these names are impossible to interpret due to them not being sequential numbering and there being no compact algorithm given that shows how they're created. the original SVPrefix was dead easy to understand**)

| Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register |
|----------------------|-------------------------|----------------------|-------------------------|----------------------|-------------------------|----------------------|-------------------------|
| R0 | SVR0_00 | R8 | SVR8_00 | R16 | SVR16_00 | R24 | SVR24_00 |
| | SVR0_01 | | SVR8_01 | | SVR16_01 | | SVR24_01 |
| | SVR0_10 | | SVR8_10 | | SVR16_10 | | SVR24_10 |
| | SVR0_11 | | SVR8_11 | | SVR16_11 | | SVR24_11 |
| R1 | SVR1_00 | R9 | SVR9_00 | R17 | SVR17_00 | R25 | SVR25_00 |
| | SVR1_01 | | SVR9_01 | | SVR17_01 | | SVR25_01 |
| | SVR1_10 | | SVR9_10 | | SVR17_10 | | SVR25_10 |
| | SVR1_11 | | SVR9_11 | | SVR17_11 | | SVR25_11 |
| R2 | SVR2_00 | R10 | SVR10_00 | R18 | SVR18_00 | R26 | SVR26_00 |
| | SVR2_01 | | SVR10_01 | | SVR18_01 | | SVR26_01 |
| | SVR2_10 | | SVR10_10 | | SVR18_10 | | SVR26_10 |
| | SVR2_11 | | SVR10_11 | | SVR18_11 | | SVR26_11 |
| R3 | SVR3_00 | R11 | SVR11_00 | R19 | SVR19_00 | R27 | SVR27_00 |
| | SVR3_01 | | SVR11_01 | | SVR19_01 | | SVR27_01 |
| | SVR3_10 | | SVR11_10 | | SVR19_10 | | SVR27_10 |
| | SVR3_11 | | SVR11_11 | | SVR19_11 | | SVR27_11 |
| R4 | SVR4_00 | R12 | SVR12_00 | R20 | SVR20_00 | R28 | SVR28_00 |
| | SVR4_01 | | SVR12_01 | | SVR20_01 | | SVR28_01 |
| | SVR4_10 | | SVR12_10 | | SVR20_10 | | SVR28_10 |
| | SVR4_11 | | SVR12_11 | | SVR20_11 | | SVR28_11 |
| R5 | SVR5_00 | R13 | SVR13_00 | R21 | SVR21_00 | R29 | SVR29_00 |
| | SVR5_01 | | SVR13_01 | | SVR21_01 | | SVR29_01 |
| | SVR5_10 | | SVR13_10 | | SVR21_10 | | SVR29_10 |
| | SVR5_11 | | SVR13_11 | | SVR21_11 | | SVR29_11 |
| R6 | SVR6_00 | R14 | SVR14_00 | R22 | SVR22_00 | R30 | SVR30_00 |
| | SVR6_01 | | SVR14_01 | | SVR22_01 | | SVR30_01 |
| | SVR6_10 | | SVR14_10 | | SVR22_10 | | SVR30_10 |
| | SVR6_11 | | SVR14_11 | | SVR22_11 | | SVR30_11 |
| R7 | SVR7_00 | R15 | SVR15_00 | R23 | SVR23_00 | R31 | SVR31_00 |
| | SVR7_01 | | SVR15_01 | | SVR23_01 | | SVR31_01 |
| | SVR7_10 | | SVR15_10 | | SVR23_10 | | SVR31_10 |
| | SVR7_11 | | SVR15_11 | | SVR23_11 | | SVR31_11 |

## Floating-Point Registers

Standard PowerISA floating-point and VSX registers are aliased to some of the SV floating-point registers:

(**Jacob: these names are impossible to interpret due to them not being sequential numbering and there being no compact algorithm given that shows how they're created. the original SVPrefix was dead easy to understand**)

| FP<br/>Register | VSX Register | SV FP<br/>Register | FP<br/>Register | VSX Register | SV FP<br/>Register |
|-----------------|-----------------------|--------------------|-----------------|-----------------------|--------------------|
| FPR\[0\] | VSR\[0\]\.dword\[0\] | SVFR0\_00 | FPR\[16\] | VSR\[16\]\.dword\[0\] | SVFR16\_00 |
| | VSR\[0\]\.dword\[1\] | SVFR0\_01 | | VSR\[16\]\.dword\[1\] | SVFR16\_01 |
| | VSR\[32\]\.dword\[0\] | SVFR0\_10 | | VSR\[48\]\.dword\[0\] | SVFR16\_10 |
| | VSR\[32\]\.dword\[1\] | SVFR0\_11 | | VSR\[48\]\.dword\[1\] | SVFR16\_11 |
| FPR\[1\] | VSR\[1\]\.dword\[0\] | SVFR1\_00 | FPR\[17\] | VSR\[17\]\.dword\[0\] | SVFR17\_00 |
| | VSR\[1\]\.dword\[1\] | SVFR1\_01 | | VSR\[17\]\.dword\[1\] | SVFR17\_01 |
| | VSR\[33\]\.dword\[0\] | SVFR1\_10 | | VSR\[49\]\.dword\[0\] | SVFR17\_10 |
| | VSR\[33\]\.dword\[1\] | SVFR1\_11 | | VSR\[49\]\.dword\[1\] | SVFR17\_11 |
| FPR\[2\] | VSR\[2\]\.dword\[0\] | SVFR2\_00 | FPR\[18\] | VSR\[18\]\.dword\[0\] | SVFR18\_00 |
| | VSR\[2\]\.dword\[1\] | SVFR2\_01 | | VSR\[18\]\.dword\[1\] | SVFR18\_01 |
| | VSR\[34\]\.dword\[0\] | SVFR2\_10 | | VSR\[50\]\.dword\[0\] | SVFR18\_10 |
| | VSR\[34\]\.dword\[1\] | SVFR2\_11 | | VSR\[50\]\.dword\[1\] | SVFR18\_11 |
| FPR\[3\] | VSR\[3\]\.dword\[0\] | SVFR3\_00 | FPR\[19\] | VSR\[19\]\.dword\[0\] | SVFR19\_00 |
| | VSR\[3\]\.dword\[1\] | SVFR3\_01 | | VSR\[19\]\.dword\[1\] | SVFR19\_01 |
| | VSR\[35\]\.dword\[0\] | SVFR3\_10 | | VSR\[51\]\.dword\[0\] | SVFR19\_10 |
| | VSR\[35\]\.dword\[1\] | SVFR3\_11 | | VSR\[51\]\.dword\[1\] | SVFR19\_11 |
| FPR\[4\] | VSR\[4\]\.dword\[0\] | SVFR4\_00 | FPR\[20\] | VSR\[20\]\.dword\[0\] | SVFR20\_00 |
| | VSR\[4\]\.dword\[1\] | SVFR4\_01 | | VSR\[20\]\.dword\[1\] | SVFR20\_01 |
| | VSR\[36\]\.dword\[0\] | SVFR4\_10 | | VSR\[52\]\.dword\[0\] | SVFR20\_10 |
| | VSR\[36\]\.dword\[1\] | SVFR4\_11 | | VSR\[52\]\.dword\[1\] | SVFR20\_11 |
| FPR\[5\] | VSR\[5\]\.dword\[0\] | SVFR5\_00 | FPR\[21\] | VSR\[21\]\.dword\[0\] | SVFR21\_00 |
| | VSR\[5\]\.dword\[1\] | SVFR5\_01 | | VSR\[21\]\.dword\[1\] | SVFR21\_01 |
| | VSR\[37\]\.dword\[0\] | SVFR5\_10 | | VSR\[53\]\.dword\[0\] | SVFR21\_10 |
| | VSR\[37\]\.dword\[1\] | SVFR5\_11 | | VSR\[53\]\.dword\[1\] | SVFR21\_11 |
| FPR\[6\] | VSR\[6\]\.dword\[0\] | SVFR6\_00 | FPR\[22\] | VSR\[22\]\.dword\[0\] | SVFR22\_00 |
| | VSR\[6\]\.dword\[1\] | SVFR6\_01 | | VSR\[22\]\.dword\[1\] | SVFR22\_01 |
| | VSR\[38\]\.dword\[0\] | SVFR6\_10 | | VSR\[54\]\.dword\[0\] | SVFR22\_10 |
| | VSR\[38\]\.dword\[1\] | SVFR6\_11 | | VSR\[54\]\.dword\[1\] | SVFR22\_11 |
| FPR\[7\] | VSR\[7\]\.dword\[0\] | SVFR7\_00 | FPR\[23\] | VSR\[23\]\.dword\[0\] | SVFR23\_00 |
| | VSR\[7\]\.dword\[1\] | SVFR7\_01 | | VSR\[23\]\.dword\[1\] | SVFR23\_01 |
| | VSR\[39\]\.dword\[0\] | SVFR7\_10 | | VSR\[55\]\.dword\[0\] | SVFR23\_10 |
| | VSR\[39\]\.dword\[1\] | SVFR7\_11 | | VSR\[55\]\.dword\[1\] | SVFR23\_11 |
| FPR\[8\] | VSR\[8\]\.dword\[0\] | SVFR8\_00 | FPR\[24\] | VSR\[24\]\.dword\[0\] | SVFR24\_00 |
| | VSR\[8\]\.dword\[1\] | SVFR8\_01 | | VSR\[24\]\.dword\[1\] | SVFR24\_01 |
| | VSR\[40\]\.dword\[0\] | SVFR8\_10 | | VSR\[56\]\.dword\[0\] | SVFR24\_10 |
| | VSR\[40\]\.dword\[1\] | SVFR8\_11 | | VSR\[56\]\.dword\[1\] | SVFR24\_11 |
| FPR\[9\] | VSR\[9\]\.dword\[0\] | SVFR9\_00 | FPR\[25\] | VSR\[25\]\.dword\[0\] | SVFR25\_00 |
| | VSR\[9\]\.dword\[1\] | SVFR9\_01 | | VSR\[25\]\.dword\[1\] | SVFR25\_01 |
| | VSR\[41\]\.dword\[0\] | SVFR9\_10 | | VSR\[57\]\.dword\[0\] | SVFR25\_10 |
| | VSR\[41\]\.dword\[1\] | SVFR9\_11 | | VSR\[57\]\.dword\[1\] | SVFR25\_11 |
| FPR\[10\] | VSR\[10\]\.dword\[0\] | SVFR10\_00 | FPR\[26\] | VSR\[26\]\.dword\[0\] | SVFR26\_00 |
| | VSR\[10\]\.dword\[1\] | SVFR10\_01 | | VSR\[26\]\.dword\[1\] | SVFR26\_01 |
| | VSR\[42\]\.dword\[0\] | SVFR10\_10 | | VSR\[58\]\.dword\[0\] | SVFR26\_10 |
| | VSR\[42\]\.dword\[1\] | SVFR10\_11 | | VSR\[58\]\.dword\[1\] | SVFR26\_11 |
| FPR\[11\] | VSR\[11\]\.dword\[0\] | SVFR11\_00 | FPR\[27\] | VSR\[27\]\.dword\[0\] | SVFR27\_00 |
| | VSR\[11\]\.dword\[1\] | SVFR11\_01 | | VSR\[27\]\.dword\[1\] | SVFR27\_01 |
| | VSR\[43\]\.dword\[0\] | SVFR11\_10 | | VSR\[59\]\.dword\[0\] | SVFR27\_10 |
| | VSR\[43\]\.dword\[1\] | SVFR11\_11 | | VSR\[59\]\.dword\[1\] | SVFR27\_11 |
| FPR\[12\] | VSR\[12\]\.dword\[0\] | SVFR12\_00 | FPR\[28\] | VSR\[28\]\.dword\[0\] | SVFR28\_00 |
| | VSR\[12\]\.dword\[1\] | SVFR12\_01 | | VSR\[28\]\.dword\[1\] | SVFR28\_01 |
| | VSR\[44\]\.dword\[0\] | SVFR12\_10 | | VSR\[60\]\.dword\[0\] | SVFR28\_10 |
| | VSR\[44\]\.dword\[1\] | SVFR12\_11 | | VSR\[60\]\.dword\[1\] | SVFR28\_11 |
| FPR\[13\] | VSR\[13\]\.dword\[0\] | SVFR13\_00 | FPR\[29\] | VSR\[29\]\.dword\[0\] | SVFR29\_00 |
| | VSR\[13\]\.dword\[1\] | SVFR13\_01 | | VSR\[29\]\.dword\[1\] | SVFR29\_01 |
| | VSR\[45\]\.dword\[0\] | SVFR13\_10 | | VSR\[61\]\.dword\[0\] | SVFR29\_10 |
| | VSR\[45\]\.dword\[1\] | SVFR13\_11 | | VSR\[61\]\.dword\[1\] | SVFR29\_11 |
| FPR\[14\] | VSR\[14\]\.dword\[0\] | SVFR14\_00 | FPR\[30\] | VSR\[30\]\.dword\[0\] | SVFR30\_00 |
| | VSR\[14\]\.dword\[1\] | SVFR14\_01 | | VSR\[30\]\.dword\[1\] | SVFR30\_01 |
| | VSR\[46\]\.dword\[0\] | SVFR14\_10 | | VSR\[62\]\.dword\[0\] | SVFR30\_10 |
| | VSR\[46\]\.dword\[1\] | SVFR14\_11 | | VSR\[62\]\.dword\[1\] | SVFR30\_11 |
| FPR\[15\] | VSR\[15\]\.dword\[0\] | SVFR15\_00 | FPR\[31\] | VSR\[31\]\.dword\[0\] | SVFR31\_00 |
| | VSR\[15\]\.dword\[1\] | SVFR15\_01 | | VSR\[31\]\.dword\[1\] | SVFR31\_01 |
| | VSR\[47\]\.dword\[0\] | SVFR15\_10 | | VSR\[63\]\.dword\[0\] | SVFR31\_10 |
| | VSR\[47\]\.dword\[1\] | SVFR15\_11 | | VSR\[63\]\.dword\[1\] | SVFR31\_11 |
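The table above follows a regular pattern that can be written compactly. The sketch below is derived only from the rows of the table; the type and function names are hypothetical. For `SVFR<N>_<M>`, the upper bit of `<M>` selects VSR `N` or VSR `N + 32`, and the lower bit selects the doubleword.

```c
/* Hypothetical result type: which VSX register and doubleword back an
 * SV floating-point register. */
typedef struct {
    unsigned vsr;    /* 0..63 */
    unsigned dword;  /* 0 or 1 */
} sv_vsr_loc_t;

/* Map SVFR<n>_<m> (n in 0..31, m in 0..3) to its VSR doubleword,
 * following the pattern of the table above. */
static sv_vsr_loc_t svfr_to_vsr(unsigned n, unsigned m)
{
    sv_vsr_loc_t loc;
    loc.vsr   = n + ((m & 2u) ? 32u : 0u);  /* upper bit of m: VSR n or n+32 */
    loc.dword = m & 1u;                     /* lower bit of m: doubleword    */
    return loc;
}
```

For instance, `svfr_to_vsr(17, 3)` yields VSR 49, doubleword 1, matching the `SVFR17_11` row.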

# Operation

## CR fields as inputs/outputs of vector operations

When vectorized, the CR inputs/outputs are read/written to 4-bit CR fields
starting from SVCR6_000 and incrementing from there. If SVCR7_111 is reached, the next CR
field used wraps around to SVCR0_000, then increments from there.
(See [[discussion]]; some alternative schemes are described there.)
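A minimal sketch of the wrap-around rule, using the flat numbering `SVCR<N>_<M>` = `(N << 3) + M` from the Register Naming section, so that SVCR6_000 is index 48 and SVCR7_111 is index 63. The constant and function names are hypothetical.

```c
#include <stddef.h>

enum {
    SV_NUM_CRS  = 64,            /* total number of SV CR fields */
    SV_CR_START = (6 << 3) + 0   /* SVCR6_000 */
};

/* Hypothetical helper: flat SV CR index used by vector element i. */
static size_t sv_vector_cr_index(size_t i)
{
    return (SV_CR_START + i) % SV_NUM_CRS;
}
```

Element 15 uses SVCR7_111 (index 63), and element 16 wraps around to SVCR0_000 (index 0).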

SVCR6_000 was chosen to balance avoiding needing to save CR2-CR4 (which are
callee-saved) just to use SV vectors with VL <= 61 as well as having the first
vector CR field readily accessible to standard CR instructions and branches.
Additionally, SVCR6_000 is used as the implicit result of an OpenPower ISA v3.1
standard vector (SIMD) instruction with Rc=1.

## Table of CR fields

CR[i] is the notation used by the OpenPower spec to refer to CR field #i,
so FP instructions with Rc=1 write to CR[1], aka SVCR1_000.

There are 3 new SPRs for holding CRs: CR_EXT1, CR_EXT2, and CR_EXT3.

The 64 SV CRs are arranged similarly to the way the 128 integer registers are arranged:

(**Jacob: these names are impossible to interpret due to them not being sequential numbering and there being no compact algorithm given that shows how they're created. the original SVPrefix was dead easy to understand**)

| CR<br/>Register | SPR<br/>Field  | SV CR<br/>Register | CR<br/>Register | SPR<br/>Field  | SV CR<br/>Register |
|-----------------|----------------|--------------------|-----------------|----------------|--------------------|
| CR[0]           | CR[32:35]      | SVCR0_000          | CR[4]           | CR[48:51]      | SVCR4_000          |
|                 | CR_EXT1[32:35] | SVCR0_001          |                 | CR_EXT1[48:51] | SVCR4_001          |
|                 | CR_EXT2[32:35] | SVCR0_010          |                 | CR_EXT2[48:51] | SVCR4_010          |
|                 | CR_EXT3[32:35] | SVCR0_011          |                 | CR_EXT3[48:51] | SVCR4_011          |
| *CR[-8]*        | CR[0:3]        | SVCR0_100          | *CR[-4]*        | CR[16:19]      | SVCR4_100          |
|                 | CR_EXT1[0:3]   | SVCR0_101          |                 | CR_EXT1[16:19] | SVCR4_101          |
|                 | CR_EXT2[0:3]   | SVCR0_110          |                 | CR_EXT2[16:19] | SVCR4_110          |
|                 | CR_EXT3[0:3]   | SVCR0_111          |                 | CR_EXT3[16:19] | SVCR4_111          |
| CR[1]           | CR[36:39]      | SVCR1_000          | CR[5]           | CR[52:55]      | SVCR5_000          |
|                 | CR_EXT1[36:39] | SVCR1_001          |                 | CR_EXT1[52:55] | SVCR5_001          |
|                 | CR_EXT2[36:39] | SVCR1_010          |                 | CR_EXT2[52:55] | SVCR5_010          |
|                 | CR_EXT3[36:39] | SVCR1_011          |                 | CR_EXT3[52:55] | SVCR5_011          |
| *CR[-7]*        | CR[4:7]        | SVCR1_100          | *CR[-3]*        | CR[20:23]      | SVCR5_100          |
|                 | CR_EXT1[4:7]   | SVCR1_101          |                 | CR_EXT1[20:23] | SVCR5_101          |
|                 | CR_EXT2[4:7]   | SVCR1_110          |                 | CR_EXT2[20:23] | SVCR5_110          |
|                 | CR_EXT3[4:7]   | SVCR1_111          |                 | CR_EXT3[20:23] | SVCR5_111          |
| CR[2]           | CR[40:43]      | SVCR2_000          | CR[6]           | CR[56:59]      | SVCR6_000          |
|                 | CR_EXT1[40:43] | SVCR2_001          |                 | CR_EXT1[56:59] | SVCR6_001          |
|                 | CR_EXT2[40:43] | SVCR2_010          |                 | CR_EXT2[56:59] | SVCR6_010          |
|                 | CR_EXT3[40:43] | SVCR2_011          |                 | CR_EXT3[56:59] | SVCR6_011          |
| *CR[-6]*        | CR[8:11]       | SVCR2_100          | *CR[-2]*        | CR[24:27]      | SVCR6_100          |
|                 | CR_EXT1[8:11]  | SVCR2_101          |                 | CR_EXT1[24:27] | SVCR6_101          |
|                 | CR_EXT2[8:11]  | SVCR2_110          |                 | CR_EXT2[24:27] | SVCR6_110          |
|                 | CR_EXT3[8:11]  | SVCR2_111          |                 | CR_EXT3[24:27] | SVCR6_111          |
| CR[3]           | CR[44:47]      | SVCR3_000          | CR[7]           | CR[60:63]      | SVCR7_000          |
|                 | CR_EXT1[44:47] | SVCR3_001          |                 | CR_EXT1[60:63] | SVCR7_001          |
|                 | CR_EXT2[44:47] | SVCR3_010          |                 | CR_EXT2[60:63] | SVCR7_010          |
|                 | CR_EXT3[44:47] | SVCR3_011          |                 | CR_EXT3[60:63] | SVCR7_011          |
| *CR[-5]*        | CR[12:15]      | SVCR3_100          | *CR[-1]*        | CR[28:31]      | SVCR7_100          |
|                 | CR_EXT1[12:15] | SVCR3_101          |                 | CR_EXT1[28:31] | SVCR7_101          |
|                 | CR_EXT2[12:15] | SVCR3_110          |                 | CR_EXT2[28:31] | SVCR7_110          |
|                 | CR_EXT3[12:15] | SVCR3_111          |                 | CR_EXT3[28:31] | SVCR7_111          |

Note: CR[-8] through CR[-1] are not part of OpenPower v3.1; they are the MSB half of the 64-bit CR SPR.
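The rows of the CR table follow a pattern that can be stated compactly. The sketch below is derived only from the table; the type and function names are hypothetical, and SPR index 0 to 3 is used here to stand for CR, CR_EXT1, CR_EXT2, and CR_EXT3 respectively.

```c
/* Hypothetical result type: which SPR holds an SV CR field, and the MSB0
 * bit number at which its 4-bit field starts. */
typedef struct {
    unsigned spr;        /* 0 = CR, 1 = CR_EXT1, 2 = CR_EXT2, 3 = CR_EXT3 */
    unsigned first_bit;  /* field occupies first_bit .. first_bit + 3      */
} sv_cr_loc_t;

/* Locate SVCR<n>_<m> (n in 0..7, m in 0..7), following the table above:
 * the low two bits of m pick the SPR, the top bit of m picks the
 * MSB half (bits 0:31) instead of the standard half (bits 32:63). */
static sv_cr_loc_t svcr_location(unsigned n, unsigned m)
{
    sv_cr_loc_t loc;
    loc.spr       = m & 3u;
    loc.first_bit = ((m & 4u) ? 0u : 32u) + 4u * n;
    return loc;
}
```

For example, `svcr_location(6, 0)` gives SPR 0 (CR) starting at bit 56, matching the `SVCR6_000` row, and `svcr_location(4, 6)` gives SPR 2 (CR_EXT2) starting at bit 16, matching `SVCR4_110`.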

# Register Profiles

Instructions are broken down by Register Profiles as listed in the following auto-generated page:
[[opcode_regs_deduped]]. "Non-SV" indicates that the operations with this Register Profile cannot be Vectorised (mtspr, bc, dcbz, twi).

## LDST-1R-1W-imm

TBD

## LDST-1R-2W-imm

TBD

## LDST-2R-imm

TBD

## LDST-2R-1W

TBD

## LDST-2R-1W-imm

TBD

## LDST-2R-2W

TBD

## LDST-3R

TBD

## LDST-3R-CRo

TBD

## LDST-3R-1W

TBD

## CRio

TBD

## CR=2R1W

Remapped Encoding Fields:

| `0`       | `1:3` | `4:5`   | `6:7` | `8:10`      | `11:13`     | `14:16`     | `17:23` |
|-----------|-------|---------|-------|-------------|-------------|-------------|---------|
| MASK_KIND | MASK  | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD     |

## 1W-CRi

Remapped Encoding Fields:

| `0`       | `1:3` | `4:5`   | `6:7` | `8:10`      | `11:13`     | `14:16`  | `17:18`     | `19:20`   | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK  | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD     |

## 1R-CRo

Remapped Encoding Fields:

| `0`       | `1:3` | `4:5`   | `6:7` | `8:10`      | `11:13`     | `14:16`  | `17:18`     | `19:20`   | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK  | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD     |

## 1R-CRio

Remapped Encoding Fields:

| `0`       | `1:3` | `4:5`   | `6:7` | `8:10`      | `11:13`     | `14:16`  | `17:18`     | `19:20`   | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK  | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD     |

## 1R-1W

Remapped Encoding Fields:

| `0`       | `1:3` | `4:5`   | `6:7` | `8:10`      | `11:13`     | `14:16`  | `17:18`     | `19:20`   | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK  | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD     |

## 1R-1W-imm

Remapped Encoding Fields:

| `0`       | `1:3` | `4:5`   | `6:7` | `8:10`      | `11:13`     | `14:16`  | `17:18`     | `19:20`   | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK  | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD     |

## 1R-1W-CRo

Remapped Encoding Fields:

| `0`       | `1:3` | `4:5`   | `6:7` | `8:10`      | `11:13`     | `14:16`  | `17:18`     | `19:20`   | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK  | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD     |

## 1R-1W-CRio

Remapped Encoding Fields:

| `0`       | `1:3` | `4:5`   | `6:7` | `8:10`      | `11:13`     | `14:16`  | `17:18`     | `19:20`   | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK  | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD     |

## 2R-CRo

Remapped Encoding Fields:

| `0`       | `1:3` | `4:5`   | `6:7` | `8:10`      | `11:13`     | `14:16`     | `17:23` |
|-----------|-------|---------|-------|-------------|-------------|-------------|---------|
| MASK_KIND | MASK  | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD     |

## 2R-CRio
594 Remapped Encoding Fields:
595
596 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
597 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
598 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
599
600 ## 2R-1W
601
602 Remapped Encoding Fields:
603
604 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
605 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
606 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
607
608 ## 2R-1W-CRo
609
610 Remapped Encoding Fields:
611
612 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
613 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
614 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
615
616 <!-- comment needed to stop ikiwiki markdown from mis-parsing table -->
617
618 ## 2R-1W-CRo (rlwimi, rldimi)
619
620 Remapped Encoding Fields:
621
622 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:23` |
623 |-----------|-------|---------|-------|-------------|-------------|---------|
624 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | TBD |
625
626 ## 2R-1W-CRi
627 TBD
628 ## 2R-1W-CRio
629
630 Remapped Encoding Fields:
631
632 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
633 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
634 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
635
636 ## 3R-1W-CRio
637
638 Remapped Encoding Fields:
639
640 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:19` | `20:23` |
641 |-----------|-------|---------|-------|-------------|-------------|-------------|-------------|----------|
642 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | Rsrc3_EXTRA | Reserved |
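
Because this layout is the only one above that assigns all 24 bits, it is a convenient place to sanity-check a field table. The sketch below checks that a layout covers `RM[0:23]` contiguously with no gaps or overlaps, and that the `Reserved` bits are clear. `covers_all_bits`, `reserved_is_clear` and `LAYOUT_3SRC` are hypothetical names. Treating any nonzero `Reserved` bit as grounds for an illegal instruction trap is an assumption here, based on the Definition of Reserved section rather than on anything stated for this specific field.

```python
# Field positions copied from the table above (MSB0, inclusive).
LAYOUT_3SRC = [
    ("MASK_KIND", 0, 0),
    ("MASK", 1, 3),
    ("ELWIDTH", 4, 5),
    ("SUBVL", 6, 7),
    ("Rdest_EXTRA", 8, 10),
    ("Rsrc1_EXTRA", 11, 13),
    ("Rsrc2_EXTRA", 14, 16),
    ("Rsrc3_EXTRA", 17, 19),
    ("Reserved", 20, 23),
]


def covers_all_bits(layout, width=24):
    """True if the fields, in order, cover bits 0..width-1 exactly once."""
    expected_start = 0
    for _name, start, end in layout:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == width


def reserved_is_clear(rm):
    """True if RM[20:23] (the four least significant bits) are all zero.

    Assumption: a nonzero value here would be treated as illegal.
    """
    return (rm & 0xF) == 0
```

A decoder model could call `reserved_is_clear` before any other field is interpreted, so that an encoding using the reserved bits is rejected rather than partially decoded.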