# Rewrite of SVP64 for OpenPower ISA v3.1

* [[svp64/discussion]]

The plan is to create an encoding for SVP64, then to create an encoding for
SVP48, then to reorganize them both to improve field overlap, reducing the
amount of decoder hardware necessary.

All bit numbers are in MSB0 form (the bits are numbered from 0 at the MSB and
counting up as you move to the LSB end). All bit ranges are inclusive (so
`4:6` means bits 4, 5, and 6).

64-bit instructions are split into two 32-bit words, the prefix and the suffix. The prefix always comes before the suffix in PC order.
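
To illustrate the MSB0 convention, here is a minimal C sketch that extracts an inclusive MSB0 bit range from a value of a given width. The helper `msb0_field` is hypothetical and is not part of any spec or library. With `width = 32` it applies to either the prefix or the suffix word. With `width = 24` it applies to the Remapped Encoding described below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns bits first:last (MSB0, inclusive) of a
 * width-bit value. MSB0 bit k is LSB0 bit (width - 1 - k). */
uint32_t msb0_field(uint32_t value, unsigned width, unsigned first, unsigned last)
{
    assert(width >= 1 && width <= 32 && first <= last && last < width);
    unsigned len = last - first + 1;
    unsigned shift = width - 1 - last;          /* LSB0 position of bit `last` */
    uint32_t mask = (len == 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
    return (value >> shift) & mask;
}
```

For example, `msb0_field(prefix, 32, 0, 5)` returns the primary opcode (bits `0:5`) of a prefix word.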

## Definition of Reserved in this spec

For the new fields added in SVP64, instructions that have any of their fields set to a reserved value must cause an illegal instruction trap, to allow emulation of future instruction sets.

This is unlike OpenPower ISA v3.1, which doesn't require a CPU to trap.

## Remapped Encoding (`RM[0:23]`)

To allow relatively easy remapping of which portions of the Prefix Opcode Map
are used for SVP64 without needing to rewrite a large portion of the SVP64
spec, a mapping is defined from the OpenPower v3.1 prefix bits to a new 24-bit
Remapped Encoding denoted `RM[0]` at the MSB to `RM[23]` at the LSB.

The mapping from the OpenPower v3.1 prefix bits to the Remapped Encoding is
defined in the Prefix Fields section.

## Remapped Encoding Fields

This section shows all fields in the Remapped Encoding `RM[0:23]` for all instruction variants. There are two categories: Single and Twin Predication. Due to space considerations, Single Predication is further subdivided based on whether the number of source operands is 2 or 3.

### Single Predication (N(src) > 1)

| Field Name | Field bits | Description |
|------------|------------|------------------------------------------------|
| MASK_KIND | `0` | Execution Mask Kind |
| MASK | `1:3` | Execution Mask |
| ELWIDTH | `4:5` | Element Width |
| SUBVL | `6:7` | Sub-vector length |
| EXTRA | `8:16` | Extra fields qualifying registers |
| MODE | `19:23` | see [[discussion]] |

Extra2: applies to 4-operand instructions such as fmadd (src1 src2 src3 dest)

| Field Name | Field bits | Description |
|--------------|---------|--------------------------------------------------|
| Rdest_EXTRA2 | `8:9` | extra bits for Rdest (R\*_EXTRA2 Encoding) |
| Rsrc1_EXTRA2 | `10:11` | extra bits for Rsrc1 (R\*_EXTRA2 Encoding) |
| Rsrc2_EXTRA2 | `12:13` | extra bits for Rsrc2 (R\*_EXTRA2 Encoding) |
| Rsrc3_EXTRA2 | `14:15` | extra bits for Rsrc3 (R\*_EXTRA2 Encoding) |
| reserved | `16` | reserved |

Extra3: applies to 3-operand instructions (src1 src2 dest)

| Field Name | Field bits | Description |
|--------------|---------|--------------------------------------------------|
| Rdest_EXTRA3 | `8:10` | extra bits for Rdest (Uses R\*_EXTRA3 Encoding) |
| Rsrc1_EXTRA3 | `11:13` | extra bits for Rsrc1 (Uses R\*_EXTRA3 Encoding) |
| Rsrc2_EXTRA3 | `14:16` | extra bits for Rsrc2 (Uses R\*_EXTRA3 Encoding) |

### Twin Predication (src=1, dest=1)

| Remapped Encoding Field Name | Field bits | Description |
|------------------------------|------------|---------------------------------------------------------------------------|
| MASK_KIND | `0` | Execution Mask Kind |
| MASK | `1:3` | Execution Mask |
| ELWIDTH | `4:5` | Element Width |
| SUBVL | `6:7` | Sub-vector length |
| Rdest_EXTRA3 | `8:10` | extra bits for Rdest (Uses R\*_EXTRA3 Encoding) |
| Rsrc1_EXTRA3 | `11:13` | extra bits for Rsrc1 (Uses R\*_EXTRA3 Encoding) |
| MASK_SRC | `14:16` | Execution Mask for Source (only on instructions with twin-predication) |
| ELWIDTH_SRC | `17:18` | Element Width for Source (only on instructions with twin-predication) |
| MODE | `19:23` | see [[discussion]] |

Note (see [[discussion]]): a TODO was to evaluate whether a second SUBVL should be added. Conclusion: no. A second SUBVL makes no sense except for mv, and that is covered by [[mv.vec]].
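
The three layouts above (Single Predication with EXTRA2 or EXTRA3, and Twin Predication) can be unpacked from `RM[0:23]` as sketched below. The sketch assumes the decoder already knows which layout applies, for example from the instruction's Register Profile. The type and function names are hypothetical. `RM[0]` is the MSB of the 24-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* RM[first:last] (MSB0, inclusive) of the 24-bit Remapped Encoding. */
static unsigned rm_field(uint32_t rm, unsigned first, unsigned last)
{
    assert(rm < (UINT32_C(1) << 24) && first <= last && last < 24);
    return (unsigned)(rm >> (23 - last)) & ((1u << (last - first + 1)) - 1);
}

/* Hypothetical names for the three layouts described above. */
typedef enum { RM_SINGLE_EXTRA2, RM_SINGLE_EXTRA3, RM_TWIN } rm_layout;

typedef struct {
    unsigned mask_kind, mask, elwidth, subvl, mode;
    unsigned extra[4];               /* Rdest, Rsrc1, Rsrc2, Rsrc3 */
    unsigned mask_src, elwidth_src;  /* twin predication only */
} rm_fields;

rm_fields decode_rm(uint32_t rm, rm_layout layout)
{
    rm_fields f = {0};
    f.mask_kind = rm_field(rm, 0, 0);   /* fields common to all layouts */
    f.mask      = rm_field(rm, 1, 3);
    f.elwidth   = rm_field(rm, 4, 5);
    f.subvl     = rm_field(rm, 6, 7);
    f.mode      = rm_field(rm, 19, 23);
    switch (layout) {
    case RM_SINGLE_EXTRA2:              /* 2 bits per operand, bit 16 reserved */
        for (unsigned k = 0; k < 4; k++)
            f.extra[k] = rm_field(rm, 8 + 2 * k, 9 + 2 * k);
        break;
    case RM_SINGLE_EXTRA3:              /* 3 bits each for Rdest, Rsrc1, Rsrc2 */
        for (unsigned k = 0; k < 3; k++)
            f.extra[k] = rm_field(rm, 8 + 3 * k, 10 + 3 * k);
        break;
    case RM_TWIN:                       /* one source, one destination */
        f.extra[0]    = rm_field(rm, 8, 10);
        f.extra[1]    = rm_field(rm, 11, 13);
        f.mask_src    = rm_field(rm, 14, 16);
        f.elwidth_src = rm_field(rm, 17, 18);
        break;
    }
    return f;
}
```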

## R\*_EXTRA2 and R\*_EXTRA3 Encoding

(**TODO: 2-bit version of the table, just like in the original SVPrefix. This is important, to save bits on 4-operand instructions such as fmadd**)

In the following tables, `<N>` denotes the value of the corresponding register field in the SVP64 suffix word.

(**Jacob: these tables are not in the slightest bit understandable due to the use of register names that are impossible to interpret clearly**)

3-bit version

| R\*_EXTRA3 | Vector/Scalar<br/>Mode | CR Register | Int/FP<br/>Register |
|-----------|------------------------|---------------|---------------------|
| 000 | Scalar | `SVCR<N>_000` | `SV[F]R<N>_00` |
| 001 | Scalar | `SVCR<N>_010` | `SV[F]R<N>_01` |
| 010 | Scalar | `SVCR<N>_100` | `SV[F]R<N>_10` |
| 011 | Scalar | `SVCR<N>_110` | `SV[F]R<N>_11` |
| 100 | Vector | `SVCR<N>_000` | `SV[F]R<N>_00` |
| 101 | Vector | `SVCR<N>_010` | `SV[F]R<N>_01` |
| 110 | Vector | `SVCR<N>_100` | `SV[F]R<N>_10` |
| 111 | Vector | `SVCR<N>_110` | `SV[F]R<N>_11` |
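
The 3-bit table follows a simple pattern when read alongside the naming rule in the Register Naming section. The top bit of R\*_EXTRA3 selects scalar or vector, and the low two bits are appended below `<N>`. For CR fields they are shifted up by one, so only the `SVCR<N>_xx0` fields are reachable. Below is a C sketch of that reading. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: scalar/vector flag plus flat SV register number. */
typedef struct {
    bool is_vector;
    unsigned reg;
} sv_operand;

/* Int/FP: EXTRA3 = v b1 b2  ->  SV[F]R<N>_<b1 b2> = (N << 2) | b1b2 */
sv_operand extra3_int_fp(unsigned extra3, unsigned n)
{
    assert(extra3 < 8 && n < 32);
    sv_operand op = { (extra3 >> 2) & 1, (n << 2) | (extra3 & 3) };
    return op;
}

/* CR: EXTRA3 = v b1 b2  ->  SVCR<N>_<b1 b2 0> = (N << 3) | (b1b2 << 1) */
sv_operand extra3_cr(unsigned extra3, unsigned n)
{
    assert(extra3 < 8 && n < 8);
    sv_operand op = { (extra3 >> 2) & 1, (n << 3) | ((extra3 & 3) << 1) };
    return op;
}
```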

2-bit version

(**TODO: I simply cannot interpret the names; they have absolutely zero meaning to me, so I have no idea how to fill in the table. This is a bad sign, indicating that the names have to go, to be replaced by something clear and obvious**)

| R\*_EXTRA2 | Vector/Scalar<br/>Mode | CR Register | Int/FP<br/>Register |
|-----------|------------------------|---------------|---------------------|
| 00 | Scalar | `SVCR<N>_000` | `SV[F]R<N>_00` |
| 01 | Scalar | `SVCR<N>_100` | `SV[F]R<N>_10` |
| 10 | Vector | `SVCR<N>_000` | `SV[F]R<N>_00` |
| 11 | Vector | `SVCR<N>_100` | `SV[F]R<N>_10` |
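
The first 2-bit table above can be read the same way. This does not cover the `RA || ...` variant that follows it. The top bit selects scalar or vector, and the single low bit becomes the `b` in `SV[F]R<N>_b0` or `SVCR<N>_b00`. Below is a sketch under that reading. The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type: scalar/vector flag plus flat SV register number. */
typedef struct {
    bool is_vector;
    unsigned reg;
} sv_operand;

/* Int/FP: EXTRA2 = v b  ->  SV[F]R<N>_<b 0> = (N << 2) | (b << 1) */
sv_operand extra2_int_fp(unsigned extra2, unsigned n)
{
    assert(extra2 < 4 && n < 32);
    sv_operand op = { (extra2 >> 1) & 1, (n << 2) | ((extra2 & 1) << 1) };
    return op;
}

/* CR: EXTRA2 = v b  ->  SVCR<N>_<b 0 0> = (N << 3) | (b << 2) */
sv_operand extra2_cr(unsigned extra2, unsigned n)
{
    assert(extra2 < 4 && n < 8);
    sv_operand op = { (extra2 >> 1) & 1, (n << 3) | ((extra2 & 1) << 2) };
    return op;
}
```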

| R\*_EXTRA2 | Mode | CR Register | Int/FP<br/>Register |
|-----------|-------|---------------|---------------------|
| 00 | Scalar | | `RA` |
| 01 | Scalar | | `RA || 0b00` |
| 10 | Vector | | `RA || 0b00` |
| 11 | Vector | | `RA || 0b10` |

## ELWIDTH Encoding

| Instruction Kind | ELWIDTH Value | Mnemonic | Description |
|------------------|---------------|---------------------------|-------------------------------------------------------------------------------------|
| Integer | 00 | `ELWIDTH=b` | Byte: 8-bit integer |
| Integer | 01 | `ELWIDTH=h` | Halfword: 16-bit integer |
| Integer | 10 | `ELWIDTH=w` | Word: 32-bit integer |
| Integer | 11 | `ELWIDTH=d` | Doubleword: 64-bit integer |
| FP | 00 | `ELWIDTH=bf16` (Reserved) | Reserved for [`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) |
| FP | 01 | `ELWIDTH=f16` | 16-bit IEEE 754 Half floating-point |
| FP | 10 | `ELWIDTH=f32` | 32-bit IEEE 754 Single floating-point |
| FP | 11 | `ELWIDTH=f64` | 64-bit IEEE 754 Double floating-point |
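
Below is a lookup-table sketch of the ELWIDTH encoding. The function name is hypothetical. It returns 0 for the reserved FP encoding `00`. Per the Definition of Reserved above, such an instruction must cause an illegal instruction trap, which is left to the caller here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: element width in bits for a 2-bit ELWIDTH value.
 * Returns 0 for the reserved FP encoding 00 (bf16), which must trap. */
unsigned elwidth_bits(unsigned elwidth, bool is_fp)
{
    static const unsigned int_bits[4] = { 8, 16, 32, 64 };
    static const unsigned fp_bits[4]  = { 0, 16, 32, 64 };
    assert(elwidth < 4);
    return is_fp ? fp_bits[elwidth] : int_bits[elwidth];
}
```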

## SUBVL Encoding

| SUBVL Value | Mnemonic | Description |
|-------------|---------------------|------------------------|
| 00 | `SUBVL=4` | Sub-vector length of 4 |
| 01 | `SUBVL=1` (default) | Sub-vector length of 1 |
| 10 | `SUBVL=2` | Sub-vector length of 2 |
| 11 | `SUBVL=3` | Sub-vector length of 3 |
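
The SUBVL encoding is the sub-vector length modulo 4. This is why the default length of 1 is `01` and a length of 4 is `00`. Below is a hypothetical one-line decoder.

```c
#include <assert.h>

/* Hypothetical helper: the 2-bit SUBVL field stores the sub-vector
 * length modulo 4, so 00 means 4 and the default length 1 is 01. */
unsigned subvl_length(unsigned subvl)
{
    assert(subvl < 4);
    return subvl == 0 ? 4 : subvl;
}
```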

## MASK/MASK_SRC & MASK_KIND Encoding

One bit (`MASK_KIND`) indicates the mode: CR or Int predication. The two types may not be mixed.

| MASK_KIND Value | Description |
|-----------------|------------------------------------------------------|
| 0 | MASK/MASK_SRC are encoded using Integer Predication |
| 1 | MASK/MASK_SRC are encoded using CR-based Predication |

Integer twin predication has a second set of 3 bits (MASK_SRC) that uses the same encoding. This allows either the same register (r3, r10, or r30) to be used for both src and dest, or different registers (one for src, one for dest).

Likewise, CR-based twin predication has a second set of 3 bits, allowing a different test to be applied.

### Integer Predication (MASK_KIND=0)

When the predicate mode bit (MASK_KIND) is zero, the 3 bits are interpreted as below.
Twin predication has an identical 3-bit field (MASK_SRC), encoded the same way.

| MASK/MASK_SRC<br/>Value | Mnemonic | Description |
|-------------------------|----------|--------------------------------------------------------|
| 000 | ALWAYS | Operation is not masked (mask set to all 1s) |
| 001 | 1 << R3 | Element `i` is enabled if `i == R3` |
| 010 | R3 | Element `i` is enabled if `R3 & (1 << i)` is non-zero |
| 011 | ~R3 | Element `i` is enabled if `R3 & (1 << i)` is zero |
| 100 | R10 | Element `i` is enabled if `R10 & (1 << i)` is non-zero |
| 101 | ~R10 | Element `i` is enabled if `R10 & (1 << i)` is zero |
| 110 | R30 | Element `i` is enabled if `R30 & (1 << i)` is non-zero |
| 111 | ~R30 | Element `i` is enabled if `R30 & (1 << i)` is zero |
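
The integer predicate table can be expanded into a bitmask in which bit `i` enables element `i`. This sketch assumes VL <= 64 so that one 64-bit mask covers every element. Under that assumption, in the `1 << R3` case an `R3` of 64 or more matches no element, as the table's `i == R3` rule implies. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: expands a 3-bit MASK/MASK_SRC value (MASK_KIND=0)
 * into a bitmask in which bit i enables element i. gpr[] holds r0..r31.
 * Assumes VL <= 64 so one 64-bit mask covers every element. */
uint64_t int_pred_mask(unsigned mask, const uint64_t gpr[32])
{
    assert(mask < 8);
    switch (mask) {
    case 0:  return ~UINT64_C(0);                      /* ALWAYS  */
    case 1:  return gpr[3] < 64                        /* 1 << R3 */
                    ? UINT64_C(1) << gpr[3] : 0;
    case 2:  return gpr[3];                            /* R3      */
    case 3:  return ~gpr[3];                           /* ~R3     */
    case 4:  return gpr[10];                           /* R10     */
    case 5:  return ~gpr[10];                          /* ~R10    */
    case 6:  return gpr[30];                           /* R30     */
    default: return ~gpr[30];                          /* ~R30    */
    }
}
```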

### CR-based Predication (MASK_KIND=1)

When the predicate mode bit (MASK_KIND) is one, the 3 bits are interpreted as below. Twin predication has an identical 3-bit field (MASK_SRC), encoded the same way.

| MASK/MASK_SRC<br/>Value | Mnemonic | Description |
|-------------------------|----------|-------------------------------------------------|
| 000 | lt | Element `i` is enabled if `CR[6+i].LT` is set |
| 001 | nl/ge | Element `i` is enabled if `CR[6+i].LT` is clear |
| 010 | gt | Element `i` is enabled if `CR[6+i].GT` is set |
| 011 | ng/le | Element `i` is enabled if `CR[6+i].GT` is clear |
| 100 | eq | Element `i` is enabled if `CR[6+i].EQ` is set |
| 101 | ne | Element `i` is enabled if `CR[6+i].EQ` is clear |
| 110 | so/un | Element `i` is enabled if `CR[6+i].FU` is set |
| 111 | ns/nu | Element `i` is enabled if `CR[6+i].FU` is clear |
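
The CR table has a regular structure. The upper two bits of the MASK value choose LT, GT, EQ or SO/FU, and the lowest bit selects "clear" rather than "set". Below is a sketch of that decoding. It assumes the caller supplies the CR field tested for each element (`CR[6+i]` in the table) and that VL <= 64. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 4-bit CR field; so_fu is the SO/FU bit. */
typedef struct { bool lt, gt, eq, so_fu; } cr_field;

/* MASK value (MASK_KIND=1): upper two bits pick the CR bit to test,
 * the lowest bit inverts the test ("is clear" instead of "is set"). */
bool cr_pred_enabled(unsigned mask, cr_field cr)
{
    assert(mask < 8);
    bool bit;
    switch (mask >> 1) {
    case 0:  bit = cr.lt;    break;
    case 1:  bit = cr.gt;    break;
    case 2:  bit = cr.eq;    break;
    default: bit = cr.so_fu; break;
    }
    return (mask & 1) ? !bit : bit;
}

/* Builds a predicate over vl elements; crs[i] is the CR field for element i. */
uint64_t cr_pred_mask(unsigned mask, const cr_field *crs, size_t vl)
{
    assert(vl <= 64);
    uint64_t m = 0;
    for (size_t i = 0; i < vl; i++)
        if (cr_pred_enabled(mask, crs[i]))
            m |= UINT64_C(1) << i;
    return m;
}
```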

CR-based predication TODO: select an alternate CR for twin predication? See [[discussion]]. Overlap of the two CR-based predicates must be taken into account. Either the starting point for one of them must be suitably high, or it must be accepted that for twin predication VL must not exceed the range where overlap would occur, *or* both use the same starting point but select different *bits* of the same CRs.

## Prefix Opcode Map (64-bit instruction encoding) (prefix bits 6:11)

(shows both PowerISA v3.1 instructions as well as new SVP instructions; empty spaces are yet-to-be-allocated Illegal Instructions)

| bits 6:11 | ---000 | ---001 | ---010 | ---011 | ---100 | ---101 | ---110 | ---111 |
|-----------|----------|------------|----------|----------|----------|----------|----------|----------|
| 000--- | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form |
| 001--- | | | | | | | | |
| 010--- | 8RR-form | | | | SVP64 | SVP64 | SVP64 | SVP64 |
| 011--- | | | | | SVP64 | SVP64 | SVP64 | SVP64 |
| 100--- | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form |
| 101--- | | | | | | | | |
| 110--- | MRR-form | | | | SVP64 | SVP64 | SVP64 | SVP64 |
| 111--- | | MMIRR-form | | | SVP64 | SVP64 | SVP64 | SVP64 |

## Prefix Fields

| Prefix Field Name | Field bits | Constant Value | Description |
|---------------------|------------|----------------|--------------------------------------------|
| PO (Primary Opcode) | `0:5` | `1` | Indicates this is a 64-bit instruction |
| `RM[0]` | `6` | | Bit 0 of the Remapped Encoding |
| SVP64_7 | `7` | `1` | Indicates this is an SVP64 instruction |
| `RM[1]` | `8` | | Bit 1 of the Remapped Encoding |
| SVP64_9 | `9` | `1` | Indicates this is an SVP64 instruction |
| `RM[2:23]` | `10:31` | | Bits 2 through 23 of the Remapped Encoding |
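
The Prefix Fields table translates directly into code. A prefix word is SVP64 when PO is 1 and bits 7 and 9 are both 1. `RM[0:23]` is gathered from bits 6, 8 and 10:31. Below is a minimal C sketch. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSB0 bit k of a 32-bit word (bit 0 is the MSB). */
static uint32_t bit32(uint32_t word, unsigned k)
{
    assert(k < 32);
    return (word >> (31 - k)) & 1;
}

/* True if PO (bits 0:5) is 1 and SVP64_7 and SVP64_9 are both 1. */
bool is_svp64_prefix(uint32_t prefix)
{
    return (prefix >> 26) == 1 && bit32(prefix, 7) && bit32(prefix, 9);
}

/* Gathers RM[0:23]: RM[0] = bit 6, RM[1] = bit 8, RM[2:23] = bits 10:31. */
uint32_t svp64_rm(uint32_t prefix)
{
    return (bit32(prefix, 6) << 23)
         | (bit32(prefix, 8) << 22)
         | (prefix & UINT32_C(0x3FFFFF));   /* bits 10:31 = low 22 bits */
}
```

For example, `0x05400000` has PO = 1 and prefix bits `6:11` = `010100`. That lands in the `010---` row and `---100` column of the opcode map, an SVP64 slot, and its `RM` is 0.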

# Twin Predication

This is a novel concept that allows predication to be applied to a single source and a single dest register. The following types of traditional Vector operations may be encoded with it, *without requiring explicit opcodes to do so*:

* VSPLAT (a single scalar distributed across a vector)
* VEXTRACT (like LLVM IR [`extractelement`](https://releases.llvm.org/11.0.0/docs/LangRef.html#extractelement-instruction))
* VINSERT (like LLVM IR [`insertelement`](https://releases.llvm.org/11.0.0/docs/LangRef.html#insertelement-instruction))
* VCOMPRESS (like LLVM IR [`llvm.masked.compressstore.*`](https://releases.llvm.org/11.0.0/docs/LangRef.html#llvm-masked-compressstore-intrinsics))
* VEXPAND (like LLVM IR [`llvm.masked.expandload.*`](https://releases.llvm.org/11.0.0/docs/LangRef.html#llvm-masked-expandload-intrinsics))

Those patterns (and more) may be applied to:

* mv (the usual way that V\* operations are created)
* exts\* sign-extension
* rlwinm and other RS-RA shift operations
* LD and ST (treating AGEN as one source)
* FP fclass, fsgn, fneg, fabs, fcvt, frecip, fsqrt etc.
* Condition Register ops mfcr, mtcr and other similar ops

This is a huge list that creates extremely powerful combinations, particularly given that one of the predicate options is `(1<<r3)`, which enables exactly one element.

Additional unusual capabilities of Twin Predication include a back-to-back version of VCOMPRESS-VEXPAND, which is effectively the ability to do an ordered multiple VINSERT.
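
This page does not spell out the element-stepping rules, so the following C sketch of a twin-predicated mv rests on an assumption. It assumes source and destination indices advance independently, each skipping elements whose predicate bit is 0. Under that assumption, a sparse source mask with an all-ones destination mask behaves like VCOMPRESS, and the reverse behaves like VEXPAND. The function name is hypothetical, and VL <= 64 is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical twin-predicated vector mv (assumed semantics): the next
 * enabled source element is copied to the next enabled destination
 * element until either side runs out. Assumes vl <= 64. */
void twin_pred_mv(uint64_t *dst, const uint64_t *src, size_t vl,
                  uint64_t src_mask, uint64_t dst_mask)
{
    assert(vl <= 64);
    size_t i = 0, j = 0;
    for (;;) {
        while (i < vl && !((src_mask >> i) & 1))
            i++;                       /* next enabled source element */
        while (j < vl && !((dst_mask >> j) & 1))
            j++;                       /* next enabled destination element */
        if (i >= vl || j >= vl)
            break;
        dst[j++] = src[i++];
    }
}
```

With `src_mask = 1` and `dst_mask = 1 << r3`, the same loop copies element 0 of the source into element `r3` of the destination, which is a VINSERT.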

## Single and Twin Predication Encodings

There are two kinds of predication, and they require different encodings: single-predication (typically arithmetic operations, i.e. those with more than one source register) and twin-predication (one source, one destination).

# Register Naming

SV Registers are numbered using the notation `SV[F|C]R<N>_<M>` where `<N>` is a decimal integer and `<M>` is a binary integer. Two integers are used to enable future register expansions to add more registers by appending more LSB bits to `<M>`.

For all `SV[F|C]R<N>_<M>` registers, N is the upper bits in decimal and M is
the lower bits in binary. So `SVR5_01` is SV integer register
`(5 << 2) + 0b01` (21), `SVCR6_011` is SV condition register
`(6 << 3) + 0b011` (51), and `SVFR20_10` is SV floating-point register
`(20 << 2) + 0b10` (82).
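
The naming rule can be checked mechanically. The hypothetical C helper below turns a name such as `SVR5_01` into its flat register number. It shifts `<N>` left by the number of `<M>` digits, which is 2 for integer and FP registers and 3 for CR fields, as in the examples above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: flat number of an SV register name such as
 * "SVR5_01", "SVFR20_10" or "SVCR6_011"; returns -1 if malformed.
 * The value is (N << number_of_M_digits) | M. */
long sv_reg_number(const char *name)
{
    assert(name != NULL);
    const char *p;
    if (strncmp(name, "SVCR", 4) == 0 || strncmp(name, "SVFR", 4) == 0)
        p = name + 4;
    else if (strncmp(name, "SVR", 3) == 0)
        p = name + 3;
    else
        return -1;

    if (*p < '0' || *p > '9')
        return -1;
    char *end;
    unsigned long n = strtoul(p, &end, 10);   /* <N>: decimal upper bits */
    if (*end != '_')
        return -1;

    long m = 0;                               /* <M>: binary lower bits */
    int m_bits = 0;
    for (p = end + 1; *p == '0' || *p == '1'; p++, m_bits++)
        m = (m << 1) | (*p - '0');
    if (m_bits == 0 || *p != '\0')
        return -1;
    return ((long)n << m_bits) | m;
}
```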

## Example Code

A vectorized 32-bit add:

    add SVR3_01, SVR6_10, SVR10_00, elwidth=w, subvl=1, mask=lt

does the following:

    const size_t start_cr = (6 << 3) + 0b000; // starting at SVCR6_000
    // pretend for the moment that type-punning actually works in C/C++
    uint32_t *rt = (uint32_t *)&regs[(3 << 2) + 0b01]; // SVR3_01
    uint32_t *ra = (uint32_t *)&regs[(6 << 2) + 0b10]; // SVR6_10
    uint32_t *rb = (uint32_t *)&regs[(10 << 2) + 0b00]; // SVR10_00
    for(size_t i = 0; i < VL; i++) {
        // mask=lt: test the CR field for element i, wrapping after SVCR7_111
        if(CRs[(start_cr + i) % 64].lt) {
            rt[i] = ra[i] + rb[i];
        }
    }
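
The pseudocode above deliberately pretends that type-punning works. Below is a self-contained C sketch of the same operation that avoids it by copying 32-bit elements with `memcpy`. The following are assumptions for illustration: the byte-array model of the 128 SV integer registers, host byte order, and the names `sv_regfile`, `w_elem_off` and `sv_add_w_mask_lt`. Only the LT bit of each CR field is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SV_INT_REGS 128   /* 128 SV integer registers, 8 bytes each */

/* Hypothetical model: the SV integer register file as raw bytes, with
 * elements packed contiguously from the starting register (as the
 * pointer arithmetic in the pseudocode implies), in host byte order. */
typedef struct {
    uint8_t bytes[SV_INT_REGS * 8];
} sv_regfile;

/* Byte offset of 32-bit element i of a vector starting at register reg. */
static size_t w_elem_off(unsigned reg, size_t i)
{
    size_t off = (size_t)reg * 8 + i * sizeof(uint32_t);
    assert(off + sizeof(uint32_t) <= SV_INT_REGS * 8);
    return off;
}

/* add rt, ra, rb, elwidth=w, subvl=1, mask=lt.
 * cr_lt[k] is the LT bit of flat SV CR field k. */
void sv_add_w_mask_lt(sv_regfile *r, const bool cr_lt[64], size_t start_cr,
                      unsigned rt, unsigned ra, unsigned rb, size_t vl)
{
    for (size_t i = 0; i < vl; i++) {
        if (!cr_lt[(start_cr + i) % 64])
            continue;                     /* masked out: rt[i] untouched */
        uint32_t a, b, sum;
        memcpy(&a, &r->bytes[w_elem_off(ra, i)], sizeof a);
        memcpy(&b, &r->bytes[w_elem_off(rb, i)], sizeof b);
        sum = a + b;                      /* wraps modulo 2^32 */
        memcpy(&r->bytes[w_elem_off(rt, i)], &sum, sizeof sum);
    }
}
```

Calling it with `start_cr = 48` (SVCR6_000), `rt = 13` (SVR3_01), `ra = 26` (SVR6_10) and `rb = 40` (SVR10_00) mirrors the assembly example.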

## Integer Registers

    setvli ..., VL=7
    add r20, r25, r30, elwidth=64, subvl=1

where `r20`, `r25`, and `r30` are standard OpenPower register names.
Those names correspond to `SVR20_00`, `SVR25_00`, and `SVR30_00`.

pseudocode:

    const size_t STD_TO_SV_SHIFT = 2; // gets bigger as reg files expand to 256, 512, ... registers

    VL = 7; // setvli (omitting maxvl here)

    for(size_t i = 0; i < VL; i++) {
        regs[(20 << STD_TO_SV_SHIFT) + i] = regs[(25 << STD_TO_SV_SHIFT) + i]
                                          + regs[(30 << STD_TO_SV_SHIFT) + i];
    }

Standard PowerISA Integer registers are aliased to some of the SV integer registers:

(**Jacob these names are impossible to interpret due to them not being sequential numbering and there being no compact algorithm given that shows how they're created. the original SVPrefix was dead easy to understand**)

| Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register |
|----------------------|-------------------------|----------------------|-------------------------|----------------------|-------------------------|----------------------|-------------------------|
| R0 | SVR0_00 | R8 | SVR8_00 | R16 | SVR16_00 | R24 | SVR24_00 |
| | SVR0_01 | | SVR8_01 | | SVR16_01 | | SVR24_01 |
| | SVR0_10 | | SVR8_10 | | SVR16_10 | | SVR24_10 |
| | SVR0_11 | | SVR8_11 | | SVR16_11 | | SVR24_11 |
| R1 | SVR1_00 | R9 | SVR9_00 | R17 | SVR17_00 | R25 | SVR25_00 |
| | SVR1_01 | | SVR9_01 | | SVR17_01 | | SVR25_01 |
| | SVR1_10 | | SVR9_10 | | SVR17_10 | | SVR25_10 |
| | SVR1_11 | | SVR9_11 | | SVR17_11 | | SVR25_11 |
| R2 | SVR2_00 | R10 | SVR10_00 | R18 | SVR18_00 | R26 | SVR26_00 |
| | SVR2_01 | | SVR10_01 | | SVR18_01 | | SVR26_01 |
| | SVR2_10 | | SVR10_10 | | SVR18_10 | | SVR26_10 |
| | SVR2_11 | | SVR10_11 | | SVR18_11 | | SVR26_11 |
| R3 | SVR3_00 | R11 | SVR11_00 | R19 | SVR19_00 | R27 | SVR27_00 |
| | SVR3_01 | | SVR11_01 | | SVR19_01 | | SVR27_01 |
| | SVR3_10 | | SVR11_10 | | SVR19_10 | | SVR27_10 |
| | SVR3_11 | | SVR11_11 | | SVR19_11 | | SVR27_11 |
| R4 | SVR4_00 | R12 | SVR12_00 | R20 | SVR20_00 | R28 | SVR28_00 |
| | SVR4_01 | | SVR12_01 | | SVR20_01 | | SVR28_01 |
| | SVR4_10 | | SVR12_10 | | SVR20_10 | | SVR28_10 |
| | SVR4_11 | | SVR12_11 | | SVR20_11 | | SVR28_11 |
| R5 | SVR5_00 | R13 | SVR13_00 | R21 | SVR21_00 | R29 | SVR29_00 |
| | SVR5_01 | | SVR13_01 | | SVR21_01 | | SVR29_01 |
| | SVR5_10 | | SVR13_10 | | SVR21_10 | | SVR29_10 |
| | SVR5_11 | | SVR13_11 | | SVR21_11 | | SVR29_11 |
| R6 | SVR6_00 | R14 | SVR14_00 | R22 | SVR22_00 | R30 | SVR30_00 |
| | SVR6_01 | | SVR14_01 | | SVR22_01 | | SVR30_01 |
| | SVR6_10 | | SVR14_10 | | SVR22_10 | | SVR30_10 |
| | SVR6_11 | | SVR14_11 | | SVR22_11 | | SVR30_11 |
| R7 | SVR7_00 | R15 | SVR15_00 | R23 | SVR23_00 | R31 | SVR31_00 |
| | SVR7_01 | | SVR15_01 | | SVR23_01 | | SVR31_01 |
| | SVR7_10 | | SVR15_10 | | SVR23_10 | | SVR31_10 |
| | SVR7_11 | | SVR15_11 | | SVR23_11 | | SVR31_11 |
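
The table above follows a compact rule. Standard `rN` is `SVR<N>_00`, i.e. SV integer register `N << 2`. The `SVR<N>_01`, `SVR<N>_10` and `SVR<N>_11` registers have no standard alias. Below is a sketch with hypothetical helper names.

```c
#include <assert.h>

/* Standard GPR rN is SVR<N>_00, i.e. SV integer register N << 2. */
unsigned sv_int_from_std(unsigned n)
{
    assert(n < 32);
    return n << 2;
}

/* Standard GPR aliased by SV integer register sv, or -1 if it has none. */
int std_from_sv_int(unsigned sv)
{
    assert(sv < 128);
    return (sv & 3) == 0 ? (int)(sv >> 2) : -1;
}
```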

## Floating-Point Registers

Standard PowerISA floating-point and VSX registers are aliased to some of the SV floating-point registers:

(**Jacob these names are impossible to interpret due to them not being sequential numbering and there being no compact algorithm given that shows how they're created. the original SVPrefix was dead easy to understand**)

| FP<br/>Register | VSX Register | SV FP<br/>Register | FP<br/>Register | VSX Register | SV FP<br/>Register |
|-----------------|-----------------------|--------------------|-----------------|-----------------------|--------------------|
| FPR\[0\] | VSR\[0\]\.dword\[0\] | SVFR0\_00 | FPR\[16\] | VSR\[16\]\.dword\[0\] | SVFR16\_00 |
| | VSR\[0\]\.dword\[1\] | SVFR0\_01 | | VSR\[16\]\.dword\[1\] | SVFR16\_01 |
| | VSR\[32\]\.dword\[0\] | SVFR0\_10 | | VSR\[48\]\.dword\[0\] | SVFR16\_10 |
| | VSR\[32\]\.dword\[1\] | SVFR0\_11 | | VSR\[48\]\.dword\[1\] | SVFR16\_11 |
| FPR\[1\] | VSR\[1\]\.dword\[0\] | SVFR1\_00 | FPR\[17\] | VSR\[17\]\.dword\[0\] | SVFR17\_00 |
| | VSR\[1\]\.dword\[1\] | SVFR1\_01 | | VSR\[17\]\.dword\[1\] | SVFR17\_01 |
| | VSR\[33\]\.dword\[0\] | SVFR1\_10 | | VSR\[49\]\.dword\[0\] | SVFR17\_10 |
| | VSR\[33\]\.dword\[1\] | SVFR1\_11 | | VSR\[49\]\.dword\[1\] | SVFR17\_11 |
| FPR\[2\] | VSR\[2\]\.dword\[0\] | SVFR2\_00 | FPR\[18\] | VSR\[18\]\.dword\[0\] | SVFR18\_00 |
| | VSR\[2\]\.dword\[1\] | SVFR2\_01 | | VSR\[18\]\.dword\[1\] | SVFR18\_01 |
| | VSR\[34\]\.dword\[0\] | SVFR2\_10 | | VSR\[50\]\.dword\[0\] | SVFR18\_10 |
| | VSR\[34\]\.dword\[1\] | SVFR2\_11 | | VSR\[50\]\.dword\[1\] | SVFR18\_11 |
| FPR\[3\] | VSR\[3\]\.dword\[0\] | SVFR3\_00 | FPR\[19\] | VSR\[19\]\.dword\[0\] | SVFR19\_00 |
| | VSR\[3\]\.dword\[1\] | SVFR3\_01 | | VSR\[19\]\.dword\[1\] | SVFR19\_01 |
| | VSR\[35\]\.dword\[0\] | SVFR3\_10 | | VSR\[51\]\.dword\[0\] | SVFR19\_10 |
| | VSR\[35\]\.dword\[1\] | SVFR3\_11 | | VSR\[51\]\.dword\[1\] | SVFR19\_11 |
| FPR\[4\] | VSR\[4\]\.dword\[0\] | SVFR4\_00 | FPR\[20\] | VSR\[20\]\.dword\[0\] | SVFR20\_00 |
| | VSR\[4\]\.dword\[1\] | SVFR4\_01 | | VSR\[20\]\.dword\[1\] | SVFR20\_01 |
| | VSR\[36\]\.dword\[0\] | SVFR4\_10 | | VSR\[52\]\.dword\[0\] | SVFR20\_10 |
| | VSR\[36\]\.dword\[1\] | SVFR4\_11 | | VSR\[52\]\.dword\[1\] | SVFR20\_11 |
| FPR\[5\] | VSR\[5\]\.dword\[0\] | SVFR5\_00 | FPR\[21\] | VSR\[21\]\.dword\[0\] | SVFR21\_00 |
| | VSR\[5\]\.dword\[1\] | SVFR5\_01 | | VSR\[21\]\.dword\[1\] | SVFR21\_01 |
| | VSR\[37\]\.dword\[0\] | SVFR5\_10 | | VSR\[53\]\.dword\[0\] | SVFR21\_10 |
| | VSR\[37\]\.dword\[1\] | SVFR5\_11 | | VSR\[53\]\.dword\[1\] | SVFR21\_11 |
| FPR\[6\] | VSR\[6\]\.dword\[0\] | SVFR6\_00 | FPR\[22\] | VSR\[22\]\.dword\[0\] | SVFR22\_00 |
| | VSR\[6\]\.dword\[1\] | SVFR6\_01 | | VSR\[22\]\.dword\[1\] | SVFR22\_01 |
| | VSR\[38\]\.dword\[0\] | SVFR6\_10 | | VSR\[54\]\.dword\[0\] | SVFR22\_10 |
| | VSR\[38\]\.dword\[1\] | SVFR6\_11 | | VSR\[54\]\.dword\[1\] | SVFR22\_11 |
| FPR\[7\] | VSR\[7\]\.dword\[0\] | SVFR7\_00 | FPR\[23\] | VSR\[23\]\.dword\[0\] | SVFR23\_00 |
| | VSR\[7\]\.dword\[1\] | SVFR7\_01 | | VSR\[23\]\.dword\[1\] | SVFR23\_01 |
| | VSR\[39\]\.dword\[0\] | SVFR7\_10 | | VSR\[55\]\.dword\[0\] | SVFR23\_10 |
| | VSR\[39\]\.dword\[1\] | SVFR7\_11 | | VSR\[55\]\.dword\[1\] | SVFR23\_11 |
| FPR\[8\] | VSR\[8\]\.dword\[0\] | SVFR8\_00 | FPR\[24\] | VSR\[24\]\.dword\[0\] | SVFR24\_00 |
| | VSR\[8\]\.dword\[1\] | SVFR8\_01 | | VSR\[24\]\.dword\[1\] | SVFR24\_01 |
| | VSR\[40\]\.dword\[0\] | SVFR8\_10 | | VSR\[56\]\.dword\[0\] | SVFR24\_10 |
| | VSR\[40\]\.dword\[1\] | SVFR8\_11 | | VSR\[56\]\.dword\[1\] | SVFR24\_11 |
| FPR\[9\] | VSR\[9\]\.dword\[0\] | SVFR9\_00 | FPR\[25\] | VSR\[25\]\.dword\[0\] | SVFR25\_00 |
| | VSR\[9\]\.dword\[1\] | SVFR9\_01 | | VSR\[25\]\.dword\[1\] | SVFR25\_01 |
| | VSR\[41\]\.dword\[0\] | SVFR9\_10 | | VSR\[57\]\.dword\[0\] | SVFR25\_10 |
| | VSR\[41\]\.dword\[1\] | SVFR9\_11 | | VSR\[57\]\.dword\[1\] | SVFR25\_11 |
| FPR\[10\] | VSR\[10\]\.dword\[0\] | SVFR10\_00 | FPR\[26\] | VSR\[26\]\.dword\[0\] | SVFR26\_00 |
| | VSR\[10\]\.dword\[1\] | SVFR10\_01 | | VSR\[26\]\.dword\[1\] | SVFR26\_01 |
| | VSR\[42\]\.dword\[0\] | SVFR10\_10 | | VSR\[58\]\.dword\[0\] | SVFR26\_10 |
| | VSR\[42\]\.dword\[1\] | SVFR10\_11 | | VSR\[58\]\.dword\[1\] | SVFR26\_11 |
| FPR\[11\] | VSR\[11\]\.dword\[0\] | SVFR11\_00 | FPR\[27\] | VSR\[27\]\.dword\[0\] | SVFR27\_00 |
| | VSR\[11\]\.dword\[1\] | SVFR11\_01 | | VSR\[27\]\.dword\[1\] | SVFR27\_01 |
| | VSR\[43\]\.dword\[0\] | SVFR11\_10 | | VSR\[59\]\.dword\[0\] | SVFR27\_10 |
| | VSR\[43\]\.dword\[1\] | SVFR11\_11 | | VSR\[59\]\.dword\[1\] | SVFR27\_11 |
| FPR\[12\] | VSR\[12\]\.dword\[0\] | SVFR12\_00 | FPR\[28\] | VSR\[28\]\.dword\[0\] | SVFR28\_00 |
| | VSR\[12\]\.dword\[1\] | SVFR12\_01 | | VSR\[28\]\.dword\[1\] | SVFR28\_01 |
| | VSR\[44\]\.dword\[0\] | SVFR12\_10 | | VSR\[60\]\.dword\[0\] | SVFR28\_10 |
| | VSR\[44\]\.dword\[1\] | SVFR12\_11 | | VSR\[60\]\.dword\[1\] | SVFR28\_11 |
| FPR\[13\] | VSR\[13\]\.dword\[0\] | SVFR13\_00 | FPR\[29\] | VSR\[29\]\.dword\[0\] | SVFR29\_00 |
| | VSR\[13\]\.dword\[1\] | SVFR13\_01 | | VSR\[29\]\.dword\[1\] | SVFR29\_01 |
| | VSR\[45\]\.dword\[0\] | SVFR13\_10 | | VSR\[61\]\.dword\[0\] | SVFR29\_10 |
| | VSR\[45\]\.dword\[1\] | SVFR13\_11 | | VSR\[61\]\.dword\[1\] | SVFR29\_11 |
| FPR\[14\] | VSR\[14\]\.dword\[0\] | SVFR14\_00 | FPR\[30\] | VSR\[30\]\.dword\[0\] | SVFR30\_00 |
| | VSR\[14\]\.dword\[1\] | SVFR14\_01 | | VSR\[30\]\.dword\[1\] | SVFR30\_01 |
| | VSR\[46\]\.dword\[0\] | SVFR14\_10 | | VSR\[62\]\.dword\[0\] | SVFR30\_10 |
| | VSR\[46\]\.dword\[1\] | SVFR14\_11 | | VSR\[62\]\.dword\[1\] | SVFR30\_11 |
| FPR\[15\] | VSR\[15\]\.dword\[0\] | SVFR15\_00 | FPR\[31\] | VSR\[31\]\.dword\[0\] | SVFR31\_00 |
| | VSR\[15\]\.dword\[1\] | SVFR15\_01 | | VSR\[31\]\.dword\[1\] | SVFR31\_01 |
| | VSR\[47\]\.dword\[0\] | SVFR15\_10 | | VSR\[63\]\.dword\[0\] | SVFR31\_10 |
| | VSR\[47\]\.dword\[1\] | SVFR15\_11 | | VSR\[63\]\.dword\[1\] | SVFR31\_11 |
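
The FP table also reduces to a short rule. For `SVFR<N>_<M>`, the low bit of `<M>` selects the doubleword, and the high bit of `<M>` adds 32 to the VSX register number. So `FPR[N]` (`SVFR<N>_00`) is `VSR[N].dword[0]`. Below is the rule as a hypothetical C helper.

```c
#include <assert.h>

/* Hypothetical result type: a VSX register and one of its two doublewords. */
typedef struct {
    unsigned vsr;    /* 0..63 */
    unsigned dword;  /* 0 or 1 */
} vsr_dword;

/* Maps flat SV FP register (N << 2) | M to its VSX location:
 * the low bit of M picks the doubleword, the high bit of M adds 32. */
vsr_dword svfr_to_vsr(unsigned svfr)
{
    assert(svfr < 128);
    unsigned n = svfr >> 2, m = svfr & 3;
    vsr_dword loc = { n + 32 * (m >> 1), m & 1 };
    return loc;
}
```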

# Operation

## CR fields as inputs/outputs of vector operations

When vectorized, the CR inputs/outputs are read from/written to 4-bit CR fields
starting at SVCR6_000 and incrementing from there. After SVCR7_111, the next CR
field used wraps around to SVCR0_000, and incrementing continues from there
(see [[discussion]]; some alternative schemes are described there).

SVCR6_000 was chosen to balance two goals. The first is avoiding the need to
save CR2-CR4 (which are callee-saved) just to use SV vectors with VL <= 61. The
second is having the first vector CR field readily accessible to standard CR
instructions and branches. Additionally, SVCR6_000 is used as the implicit
result of an OpenPower ISA v3.1 standard vector (SIMD) instruction with Rc=1.
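
The wrap-around rule can be sketched as follows. The helper names are hypothetical. Element `i` uses flat SV CR field `(48 + i) % 64`, where 48 is `SVCR6_000`. So element 15 uses `SVCR7_111` and element 16 wraps to `SVCR0_000`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SVCR6_000 48u   /* (6 << 3) + 0b000 */

/* Hypothetical helper: flat SV CR field used by vector element i,
 * starting at SVCR6_000 and wrapping from SVCR7_111 (63) to SVCR0_000. */
unsigned vector_cr_field(size_t i)
{
    return (unsigned)((SVCR6_000 + i) % 64);
}

/* Formats a flat SV CR field number as "SVCR<N>_<MMM>". */
void svcr_name(unsigned field, char *buf, size_t len)
{
    assert(field < 64 && len >= 10);
    snprintf(buf, len, "SVCR%u_%u%u%u", field >> 3,
             (field >> 2) & 1, (field >> 1) & 1, field & 1);
}
```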

## Table of CR fields

CR[i] is the notation used by the OpenPower spec to refer to CR field #i,
so FP instructions with Rc=1 write to CR[1], aka SVCR1_000.

There are 3 new SPRs for holding CRs: CR_EXT1, CR_EXT2, and CR_EXT3.

The 64 SV CRs are arranged similarly to the way the 128 integer registers are arranged:

(**Jacob these names are impossible to interpret due to them not being sequential numbering and there being no compact algorithm given that shows how they're created. the original SVPrefix was dead easy to understand**)

| CR<br/>Register | SPR<br/>Field | SV CR<br/>Register | CR<br/>Register | SPR<br/>Field | SV CR<br/>Register |
|-----------------|----------------|--------------------|-----------------|----------------|--------------------|
| CR[0] | CR[32:35] | SVCR0_000 | CR[4] | CR[48:51] | SVCR4_000 |
| | CR_EXT1[32:35] | SVCR0_001 | | CR_EXT1[48:51] | SVCR4_001 |
| | CR_EXT2[32:35] | SVCR0_010 | | CR_EXT2[48:51] | SVCR4_010 |
| | CR_EXT3[32:35] | SVCR0_011 | | CR_EXT3[48:51] | SVCR4_011 |
| *CR[-8]* | CR[0:3] | SVCR0_100 | *CR[-4]* | CR[16:19] | SVCR4_100 |
| | CR_EXT1[0:3] | SVCR0_101 | | CR_EXT1[16:19] | SVCR4_101 |
| | CR_EXT2[0:3] | SVCR0_110 | | CR_EXT2[16:19] | SVCR4_110 |
| | CR_EXT3[0:3] | SVCR0_111 | | CR_EXT3[16:19] | SVCR4_111 |
| CR[1] | CR[36:39] | SVCR1_000 | CR[5] | CR[52:55] | SVCR5_000 |
| | CR_EXT1[36:39] | SVCR1_001 | | CR_EXT1[52:55] | SVCR5_001 |
| | CR_EXT2[36:39] | SVCR1_010 | | CR_EXT2[52:55] | SVCR5_010 |
| | CR_EXT3[36:39] | SVCR1_011 | | CR_EXT3[52:55] | SVCR5_011 |
| *CR[-7]* | CR[4:7] | SVCR1_100 | *CR[-3]* | CR[20:23] | SVCR5_100 |
| | CR_EXT1[4:7] | SVCR1_101 | | CR_EXT1[20:23] | SVCR5_101 |
| | CR_EXT2[4:7] | SVCR1_110 | | CR_EXT2[20:23] | SVCR5_110 |
| | CR_EXT3[4:7] | SVCR1_111 | | CR_EXT3[20:23] | SVCR5_111 |
| CR[2] | CR[40:43] | SVCR2_000 | CR[6] | CR[56:59] | SVCR6_000 |
| | CR_EXT1[40:43] | SVCR2_001 | | CR_EXT1[56:59] | SVCR6_001 |
| | CR_EXT2[40:43] | SVCR2_010 | | CR_EXT2[56:59] | SVCR6_010 |
| | CR_EXT3[40:43] | SVCR2_011 | | CR_EXT3[56:59] | SVCR6_011 |
| *CR[-6]* | CR[8:11] | SVCR2_100 | *CR[-2]* | CR[24:27] | SVCR6_100 |
| | CR_EXT1[8:11] | SVCR2_101 | | CR_EXT1[24:27] | SVCR6_101 |
| | CR_EXT2[8:11] | SVCR2_110 | | CR_EXT2[24:27] | SVCR6_110 |
| | CR_EXT3[8:11] | SVCR2_111 | | CR_EXT3[24:27] | SVCR6_111 |
| CR[3] | CR[44:47] | SVCR3_000 | CR[7] | CR[60:63] | SVCR7_000 |
| | CR_EXT1[44:47] | SVCR3_001 | | CR_EXT1[60:63] | SVCR7_001 |
| | CR_EXT2[44:47] | SVCR3_010 | | CR_EXT2[60:63] | SVCR7_010 |
| | CR_EXT3[44:47] | SVCR3_011 | | CR_EXT3[60:63] | SVCR7_011 |
| *CR[-5]* | CR[12:15] | SVCR3_100 | *CR[-1]* | CR[28:31] | SVCR7_100 |
| | CR_EXT1[12:15] | SVCR3_101 | | CR_EXT1[28:31] | SVCR7_101 |
| | CR_EXT2[12:15] | SVCR3_110 | | CR_EXT2[28:31] | SVCR7_110 |
| | CR_EXT3[12:15] | SVCR3_111 | | CR_EXT3[28:31] | SVCR7_111 |

Note: CR[-8] through CR[-1] are not part of OpenPower v3.1; they are the MSB half of the 64-bit CR SPR.
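
The CR layout above reduces to a rule. For `SVCR<N>_<M>`, the low two bits of `<M>` select the SPR (CR, CR_EXT1, CR_EXT2 or CR_EXT3). The high bit of `<M>` selects the MSB half (bits `0:31`) instead of the LSB half (bits `32:63`). `<N>` then picks the 4-bit field within that half. Below is a hypothetical C sketch.

```c
#include <assert.h>

/* Hypothetical result type: which SPR holds the field, and where. */
typedef struct {
    unsigned spr;    /* 0 = CR, 1 = CR_EXT1, 2 = CR_EXT2, 3 = CR_EXT3 */
    unsigned first;  /* first (MSB0) bit of the 4-bit field in that SPR */
} svcr_location;

/* Maps flat SV CR field (N << 3) | M to the SPR bits holding it. */
svcr_location svcr_locate(unsigned svcr)
{
    assert(svcr < 64);
    unsigned n = svcr >> 3, m = svcr & 7;
    svcr_location loc;
    loc.spr = m & 3;                            /* low two bits of M: SPR   */
    loc.first = ((m & 4) ? 0 : 32) + 4 * n;     /* high bit of M: MSB half */
    return loc;
}
```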

# Register Profiles

Instructions are broken down by Register Profiles as listed in the following auto-generated page:
[[opcode_regs_deduped]]. "Non-SV" indicates that the operations with this Register Profile cannot be Vectorised (mtspr, bc, dcbz, twi).

## LDST-1R-1W-imm
TBD
## LDST-1R-2W-imm
TBD
## LDST-2R-imm
TBD
## LDST-2R-1W
TBD
## LDST-2R-1W-imm
TBD
## LDST-2R-2W
TBD
## LDST-3R
TBD
## LDST-3R-CRo
TBD
## LDST-3R-1W
TBD
## CRio
TBD
## CR=2R1W

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
|-----------|-------|---------|-------|-------------|-------------|-------------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |

## 1W-CRi

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |

## 1R-CRo

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |

## 1R-CRio

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |

## 1R-1W

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |

## 1R-1W-imm

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |

## 1R-1W-CRo

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |

## 1R-1W-CRio

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
|-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |

## 2R-CRo

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
|-----------|-------|---------|-------|-------------|-------------|-------------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |

## 2R-CRio

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
|-----------|-------|---------|-------|-------------|-------------|-------------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |

## 2R-1W

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
|-----------|-------|---------|-------|-------------|-------------|-------------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |

## 2R-1W-CRo

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
|-----------|-------|---------|-------|-------------|-------------|-------------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |

<!-- comment needed to stop ikiwiki markdown from mis-parsing table -->

## 2R-1W-CRo (rl(w|d)imi)

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:23` |
|-----------|-------|---------|-------|-------------|-------------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | TBD |

## 2R-1W-CRi
TBD
## 2R-1W-CRio

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
|-----------|-------|---------|-------|-------------|-------------|-------------|---------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |

## 3R-1W-CRio

Remapped Encoding Fields:

| `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:19` | `20:23` |
|-----------|-------|---------|-------|-------------|-------------|-------------|-------------|----------|
| MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | Rsrc3_EXTRA | Reserved |