1 # Rewrite of SVP64 for OpenPower ISA v3.1
2
3 * [[svp64/discussion]]
4
5 The plan is to create an encoding for SVP64, then to create an encoding for
6 SVP48, then to reorganize them both to improve field overlap, reducing the
7 amount of decoder hardware necessary.
8
9 All bit numbers are in MSB0 form (the bits are numbered from 0 at the MSB and
10 counting up as you move to the LSB end). All bit ranges are inclusive (so
11 `4:6` means bits 4, 5, and 6).
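
As a minimal sketch of the numbering convention above, the following C helper extracts an inclusive MSB0 bit range from a value of a given width. The name `msb0_bits` and its signature are hypothetical and are not part of this spec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract bits first:last (MSB0 numbering, inclusive)
 * from a value that is `width` bits wide. Bit 0 is the most significant bit. */
static uint32_t msb0_bits(uint32_t value, unsigned width,
                          unsigned first, unsigned last)
{
    assert(width >= 1 && width <= 32);
    assert(first <= last && last < width);
    unsigned len = last - first + 1;     /* the range is inclusive */
    unsigned shift = width - 1 - last;   /* distance of bit `last` from the LSB */
    uint32_t mask = (len == 32) ? 0xFFFFFFFFu : ((1u << len) - 1u);
    return (value >> shift) & mask;
}
```

For example, `msb0_bits(word, 32, 4, 6)` returns bits 4, 5, and 6 of a 32-bit word as a 3-bit value.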
12
13 64-bit instructions are split into two 32-bit words, the prefix and the suffix. The prefix always comes before the suffix in PC order.
14
15 ## Definition of Reserved in this spec.
16
17 For the new fields added in SVP64, instructions that have any of their fields set to a reserved value must cause an illegal instruction trap, to allow emulation of future instruction sets.
18
19 This is unlike OpenPower ISA v3.1, which doesn't require a CPU to trap.
20
21 ## Remapped Encoding (`RM[0:23]`)
22
23 To allow relatively easy remapping of which portions of the Prefix Opcode Map
24 are used for SVP64 without needing to rewrite a large portion of the SVP64
25 spec, a mapping is defined from the OpenPower v3.1 prefix bits to a new 24-bit
26 Remapped Encoding denoted `RM[0]` at the MSB to `RM[23]` at the LSB.
27
28 The mapping from the OpenPower v3.1 prefix bits to the Remapped Encoding is
29 defined in the Prefix Fields section.
30
31 ## Remapped Encoding Fields
32
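The following table lists every field in the Remapped Encoding `RM[0:23]` across all instruction variants. Fields whose bit ranges overlap (for example `Rsrc2_EXTRA` and `MASK_SRC`) belong to different instruction variants.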
34
35 | Remapped Encoding Field Name | Field bits | Description |
36 |------------------------------|------------|---------------------------------------------------------------------------|
37 | MASK_KIND | `0` | Execution Mask Kind |
38 | MASK | `1:3` | Execution Mask |
39 | ELWIDTH | `4:5` | Element Width |
40 | SUBVL | `6:7` | Sub-vector length |
41 | Rdest_EXTRA | `8:10` | extra bits for Rdest (Uses R\*_EXTRA Encoding) |
42 | Rsrc1_EXTRA | `11:13` | extra bits for Rsrc1 (Uses R\*_EXTRA Encoding) |
43 | Rsrc2_EXTRA | `14:16` | extra bits for Rsrc2 (Uses R\*_EXTRA Encoding) |
44 | Rsrc3_EXTRA | `17:19` | extra bits for Rsrc3 (Uses R\*_EXTRA Encoding) |
45 | MASK_SRC | `14:16` | Execution Mask for Source (only on instructions with twin-predication) |
46 | ELWIDTH_SRC | `17:18` | Element Width for Source (only on instructions with twin-predication) |
47 | SUBVL_SRC | `19:20` | Sub-vector length for Source (only on instructions with twin-predication) |
48 | TBD | `21:23` | TBD |
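
The field positions in the table above can be sketched in C as follows. This assumes the 24-bit Remapped Encoding is held in the low 24 bits of a `uint32_t`, with `RM[0]` at bit 23 counting from the LSB; the struct and function names are hypothetical. Only the fields that sit at the same position in every variant listed under Register Profiles are decoded.

```c
#include <stdint.h>

/* Hypothetical container for the fields shared by all listed variants. */
struct rm_common {
    unsigned mask_kind;   /* RM[0]     */
    unsigned mask;        /* RM[1:3]   */
    unsigned elwidth;     /* RM[4:5]   */
    unsigned subvl;       /* RM[6:7]   */
    unsigned rdest_extra; /* RM[8:10]  */
    unsigned rsrc1_extra; /* RM[11:13] */
};

/* rm holds RM[0:23] in its low 24 bits, so MSB0 bit k sits at LSB0 bit 23-k. */
static struct rm_common rm_decode_common(uint32_t rm)
{
    struct rm_common f;
    f.mask_kind   = (rm >> 23) & 0x1u;
    f.mask        = (rm >> 20) & 0x7u;
    f.elwidth     = (rm >> 18) & 0x3u;
    f.subvl       = (rm >> 16) & 0x3u;
    f.rdest_extra = (rm >> 13) & 0x7u;
    f.rsrc1_extra = (rm >> 10) & 0x7u;
    return f;
}
```

Bits `14:23` are left undecoded here because their meaning depends on the Register Profile of the instruction.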
49
50 ## R\*_EXTRA Encoding
51
52 In the following table, `<N>` denotes the value of the corresponding register field in the SVP64 suffix word.
53
54 | R\*_EXTRA | Vector/Scalar<br/>Mode | CR Register | Int/FP<br/>Register |
55 |-----------|------------------------|---------------|---------------------|
56 | 000 | Scalar | `SVCR<N>_000` | `SV[F]R<N>_00` |
57 | 001 | Scalar | `SVCR<N>_010` | `SV[F]R<N>_01` |
58 | 010 | Scalar | `SVCR<N>_100` | `SV[F]R<N>_10` |
59 | 011 | Scalar | `SVCR<N>_110` | `SV[F]R<N>_11` |
60 | 100 | Vector | `SVCR<N>_000` | `SV[F]R<N>_00` |
61 | 101 | Vector | `SVCR<N>_010` | `SV[F]R<N>_01` |
62 | 110 | Vector | `SVCR<N>_100` | `SV[F]R<N>_10` |
63 | 111 | Vector | `SVCR<N>_110` | `SV[F]R<N>_11` |
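
The table above can be read as a small computation: the top bit of `R*_EXTRA` selects Vector mode, and the low two bits extend the register number. The C sketch below assumes the flat register numbering described under Register Naming (`(N << 2) + M` for Int/FP, `(N << 3) + M` for CR); the type and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical result type: mode plus flat SV register number. */
struct sv_reg {
    bool     is_vector; /* true when R*_EXTRA selects Vector mode */
    unsigned index;     /* flat SV register number */
};

/* Int/FP registers: the low two EXTRA bits become <M> directly. */
static struct sv_reg extra_to_int_fp_reg(unsigned n, unsigned extra)
{
    struct sv_reg r;
    r.is_vector = (extra & 0x4u) != 0;
    r.index = (n << 2) | (extra & 0x3u);
    return r;
}

/* CR registers: the low two EXTRA bits become the top two bits of the
 * 3-bit <M>, so only even-numbered <M> values are reachable. */
static struct sv_reg extra_to_cr_reg(unsigned n, unsigned extra)
{
    struct sv_reg r;
    r.is_vector = (extra & 0x4u) != 0;
    r.index = (n << 3) | ((extra & 0x3u) << 1);
    return r;
}
```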
64
65 ## ELWIDTH Encoding
66
67 | Instruction Kind | ELWIDTH Value | Mnemonic | Description |
68 |------------------|---------------|---------------------------|-------------------------------------------------------------------------------------|
69 | Integer | 00 | `ELWIDTH=b` | Byte: 8-bit integer |
70 | Integer | 01 | `ELWIDTH=h` | Halfword: 16-bit integer |
71 | Integer | 10 | `ELWIDTH=w` | Word: 32-bit integer |
72 | Integer | 11 | `ELWIDTH=d` | Doubleword: 64-bit integer |
73 | FP | 00 | `ELWIDTH=bf16` (Reserved) | Reserved for [`bf16`](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) |
74 | FP | 01 | `ELWIDTH=f16` | 16-bit IEEE 754 Half floating-point |
75 | FP | 10 | `ELWIDTH=f32` | 32-bit IEEE 754 Single floating-point |
76 | FP | 11 | `ELWIDTH=f64` | 64-bit IEEE 754 Double floating-point |
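
A minimal C sketch of the ELWIDTH table, assuming a caller that already knows whether the instruction is Integer or FP. The function name is hypothetical, and returning 0 for the reserved FP encoding is an illustrative convention only; per the definition of Reserved above, hardware must trap in that case.

```c
#include <stdbool.h>

/* Hypothetical helper: element width in bits for a 2-bit ELWIDTH value.
 * Returns 0 for the reserved FP encoding (00, set aside for bf16). */
static unsigned elwidth_bits(unsigned elwidth, bool is_fp)
{
    unsigned v = elwidth & 0x3u;
    if (is_fp && v == 0)
        return 0;      /* reserved: must cause an illegal instruction trap */
    return 8u << v;    /* 00 -> 8, 01 -> 16, 10 -> 32, 11 -> 64 */
}
```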
77
78 ## SUBVL Encoding
79
80 | SUBVL Value | Mnemonic | Description |
81 |-------------|---------------------|------------------------|
82 | 00 | `SUBVL=4` | Sub-vector length of 4 |
83 | 01 | `SUBVL=1` (default) | Sub-vector length of 1 |
84 | 10 | `SUBVL=2` | Sub-vector length of 2 |
85 | 11 | `SUBVL=3` | Sub-vector length of 3 |
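
Note that the encoding is not a plain binary count: `00` means a sub-vector length of 4. A short C sketch (the function name is hypothetical):

```c
/* Hypothetical helper: decode the 2-bit SUBVL field into a length of 1..4. */
static unsigned subvl_length(unsigned subvl)
{
    unsigned v = subvl & 0x3u;
    return v == 0 ? 4u : v;   /* 00 encodes 4; 01, 10, 11 encode 1, 2, 3 */
}
```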
86
87 ## MASK/MASK_SRC & MASK_KIND Encoding
88
One bit (`MASK_KIND`) selects the predication mode: Integer or CR-based. The two types may not be mixed.
90
91 | MASK_KIND Value | Description |
92 |-----------------|------------------------------------------------------|
93 | 0 | MASK/MASK_SRC are encoded using Integer Predication |
94 | 1 | MASK/MASK_SRC are encoded using CR-based Predication |
95
Integer twin predication has a second set of 3 bits (`MASK_SRC`) that uses the same encoding, allowing either the same register (for example r3 or r10) to be used for both source and destination, or different registers (one for source, one for destination).

Likewise, CR-based twin predication has a second set of 3 bits, allowing a different test to be applied.
99
100 ### Integer Predication (MASK_KIND=0)
101
When `MASK_KIND` is 0, the 3 bits are interpreted as below.
Twin predication has an identical 3-bit field, encoded the same way.
104
105 | MASK/MASK_SRC<br/>Value | Mnemonic | Description |
106 |-------------------------|----------|--------------------------------------------------------|
107 | 000 | ALWAYS | Operation is not masked (mask set to all 1s) |
108 | 001 | 1 << R3 | Element `i` is enabled if `i == R3` |
109 | 010 | R3 | Element `i` is enabled if `R3 & (1 << i)` is non-zero |
110 | 011 | ~R3 | Element `i` is enabled if `R3 & (1 << i)` is zero |
111 | 100 | R10 | Element `i` is enabled if `R10 & (1 << i)` is non-zero |
112 | 101 | ~R10 | Element `i` is enabled if `R10 & (1 << i)` is zero |
113 | 110 | R30 | Element `i` is enabled if `R30 & (1 << i)` is non-zero |
114 | 111 | ~R30 | Element `i` is enabled if `R30 & (1 << i)` is zero |
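
The table above can be sketched as a single C predicate. The function name is hypothetical, the register values are passed in explicitly, and the sketch assumes `i < 64` for the register-bitmask encodings because this section does not define behaviour beyond the width of a 64-bit register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: is element i enabled under Integer Predication?
 * mask is the 3-bit MASK (or MASK_SRC) value. */
static bool int_pred_enabled(unsigned mask, unsigned i,
                             uint64_t r3, uint64_t r10, uint64_t r30)
{
    uint64_t reg;
    switch (mask & 0x7u) {
    case 0: return true;               /* ALWAYS */
    case 1: return (uint64_t)i == r3;  /* 1 << R3: only element R3 */
    case 2: case 3: reg = r3;  break;
    case 4: case 5: reg = r10; break;
    default:        reg = r30; break;  /* 6 and 7 */
    }
    assert(i < 64);                    /* assumption, see lead-in */
    bool bit_set = ((reg >> i) & 1u) != 0;
    /* Odd encodings (~R3, ~R10, ~R30) invert the test. */
    return (mask & 1u) ? !bit_set : bit_set;
}
```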
115
116 ### CR-based Predication (MASK_KIND=1)
117
When `MASK_KIND` is 1, the 3 bits are interpreted as below. Twin predication has an identical 3-bit field, encoded the same way.
119
120 | MASK/MASK_SRC<br/>Value | Mnemonic | Description |
121 |-------------------------|----------|-------------------------------------------------|
122 | 000 | lt | Element `i` is enabled if `CR[6+i].LT` is set |
123 | 001 | nl/ge | Element `i` is enabled if `CR[6+i].LT` is clear |
124 | 010 | gt | Element `i` is enabled if `CR[6+i].GT` is set |
125 | 011 | ng/le | Element `i` is enabled if `CR[6+i].GT` is clear |
126 | 100 | eq | Element `i` is enabled if `CR[6+i].EQ` is set |
127 | 101 | ne | Element `i` is enabled if `CR[6+i].EQ` is clear |
128 | 110 | so/un | Element `i` is enabled if `CR[6+i].FU` is set |
129 | 111 | ns/nu | Element `i` is enabled if `CR[6+i].FU` is clear |
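
In the table above, the upper two bits of the mask value select which CR field bit is tested and the lowest bit inverts the test. The C sketch below assumes the CR field is passed as a 4-bit value in the usual Power ISA order (LT, GT, EQ, then SO/FU, with LT as the most significant of the four); the function name is hypothetical, and selecting which CR field belongs to element `i` is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical helper: apply a 3-bit CR-based predicate to one 4-bit
 * CR field laid out as LT (0x8), GT (0x4), EQ (0x2), SO/FU (0x1). */
static bool cr_pred_enabled(unsigned mask, unsigned field)
{
    unsigned which = (mask >> 1) & 0x3u;   /* 0=LT 1=GT 2=EQ 3=SO/FU */
    bool set = ((field >> (3u - which)) & 1u) != 0;
    return (mask & 1u) ? !set : set;       /* odd encodings test for clear */
}
```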
130
TODO: select an alternate CR for twin predication? See [[discussion]]. Overlap of the two CR-based predicates must be taken into account. The options are: give one of them a suitably high starting point; accept that, for twin predication, VL must not exceed the range where overlap would occur; *or* use the same starting point for both but select different *bits* of the same CRs.
132
133
134 ## Prefix Opcode Map (64-bit instruction encoding) (prefix bits 6:11)
135
136 (shows both PowerISA v3.1 instructions as well as new SVP instructions; empty spaces are yet-to-be-allocated Illegal Instructions)
137
138 | bits 6:11 | ---000 | ---001 | ---010 | ---011 | ---100 | ---101 | ---110 | ---111 |
139 |-----------|----------|------------|----------|----------|----------|----------|----------|----------|
140 | 000--- | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form | 8LS-form |
141 | 001--- | | | | | | | | |
142 | 010--- | 8RR-form | | | | SVP64 | SVP64 | SVP64 | SVP64 |
143 | 011--- | | | | | SVP64 | SVP64 | SVP64 | SVP64 |
144 | 100--- | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form | MLS-form |
145 | 101--- | | | | | | | | |
146 | 110--- | MRR-form | | | | SVP64 | SVP64 | SVP64 | SVP64 |
147 | 111--- | | MMIRR-form | | | SVP64 | SVP64 | SVP64 | SVP64 |
148
149 ## Prefix Fields
150
151 | Prefix Field Name | Field bits | Constant Value | Description |
152 |---------------------|------------|----------------|--------------------------------------------|
153 | PO (Primary Opcode) | `0:5` | `1` | Indicates this is a 64-bit instruction |
154 | `RM[0]` | `6` | | Bit 0 of the Remapped Encoding |
155 | SVP64_7 | `7` | `1` | Indicates this is a SVP64 instruction |
156 | `RM[1]` | `8` | | Bit 1 of the Remapped Encoding |
157 | SVP64_9 | `9` | `1` | Indicates this is a SVP64 instruction |
158 | `RM[2:23]` | `10:31` | | Bits 2 through 23 of the Remapped Encoding |
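
As a sketch of the Prefix Fields table, the C function below checks the three constant fields and, if they match, gathers `RM[0:23]` into the low 24 bits of a word (with `RM[0]` at bit 23 counting from the LSB). The function name and the packed layout of the result are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true and writes RM[0:23] to *rm if `prefix`
 * is an SVP64 prefix word; returns false and leaves *rm alone otherwise.
 * MSB0 bit k of the 32-bit prefix is LSB0 bit 31-k. */
static bool svp64_extract_rm(uint32_t prefix, uint32_t *rm)
{
    bool po_ok = (prefix >> 26) == 1u;         /* bits 0:5 (PO) == 1 */
    bool bit7  = ((prefix >> 24) & 1u) != 0;   /* SVP64_7 must be 1  */
    bool bit9  = ((prefix >> 22) & 1u) != 0;   /* SVP64_9 must be 1  */
    if (!(po_ok && bit7 && bit9))
        return false;
    uint32_t rm0    = (prefix >> 25) & 1u;     /* prefix bit 6       */
    uint32_t rm1    = (prefix >> 23) & 1u;     /* prefix bit 8       */
    uint32_t rm2_23 = prefix & 0x3FFFFFu;      /* prefix bits 10:31  */
    *rm = (rm0 << 23) | (rm1 << 22) | rm2_23;
    return true;
}
```

The two constant bits at positions 7 and 9 correspond to the SVP64 cells of the Prefix Opcode Map: rows `010`, `011`, `110`, `111` and columns `100` through `111`.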
159
160 # Twin Predication
161
This is a novel concept that allows predication to be applied independently to a single source and a single destination register. The following types of traditional Vector operations may be encoded with it, *without requiring explicit opcodes to do so*:
163
164 * VSPLAT (a single scalar distributed across a vector)
165 * VEXTRACT (like LLVM IR [`extractelement`](https://releases.llvm.org/11.0.0/docs/LangRef.html#extractelement-instruction))
166 * VINSERT (like LLVM IR [`insertelement`](https://releases.llvm.org/11.0.0/docs/LangRef.html#insertelement-instruction))
167 * VCOMPRESS (like LLVM IR [`llvm.masked.compressstore.*`](https://releases.llvm.org/11.0.0/docs/LangRef.html#llvm-masked-compressstore-intrinsics))
168 * VEXPAND (like LLVM IR [`llvm.masked.expandload.*`](https://releases.llvm.org/11.0.0/docs/LangRef.html#llvm-masked-expandload-intrinsics))
169
170 Those patterns (and more) may be applied to:
171
172 * mv (the usual way that V\* operations are created)
173 * exts\* sign-extension
* rlwinm and other RS-RA shift operations
175 * LD and ST (treating AGEN as one source)
176 * FP fclass, fsgn, fneg, fabs, fcvt, frecip, fsqrt etc.
177 * Condition Register ops mfcr, mtcr and other similar
178
This list creates extremely powerful combinations, particularly given that one of the predicate options is `(1<<r3)`, which enables a single element selected by index.
180
181 Additional unusual capabilities of Twin Predication include a back-to-back version of VCOMPRESS-VEXPAND which is effectively the ability to do an ordered multiple VINSERT.
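
This page does not spell out the element-pairing rule for twin predication, so the following C sketch is an assumption for illustration only: each enabled source element is paired, in order, with the next enabled destination element. Under that assumption a twin-predicated `mv` behaves like VCOMPRESS when only the source is masked and like VEXPAND when only the destination is masked. The function name is hypothetical, and both operands are treated as vectors of 64-bit elements with bitmask predicates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a twin-predicated vector mv.
 * Returns the number of elements copied. */
static size_t twin_pred_mv(uint64_t *dst, const uint64_t *src, size_t vl,
                           uint64_t src_mask, uint64_t dst_mask)
{
    assert(vl <= 64);                 /* one mask bit per element */
    size_t i = 0, j = 0, copied = 0;
    while (i < vl && j < vl) {
        if (!((src_mask >> i) & 1u)) { i++; continue; }  /* skip disabled source */
        if (!((dst_mask >> j) & 1u)) { j++; continue; }  /* skip disabled dest   */
        dst[j++] = src[i++];
        copied++;
    }
    return copied;
}
```

With `src_mask = 0xA` and `dst_mask = 0xF` on a 4-element vector, source elements 1 and 3 land in destination elements 0 and 1 (compress); swapping the two masks places source elements 0 and 1 into destination elements 1 and 3 (expand). A scalar source (VSPLAT) is not modelled here.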
182
183 ## Twin Predication
184
There are two predication schemes: single-predication (typically arithmetic operations, i.e. those with more than one source register) and twin-predication (one source, one destination). They require different encodings.
186
187 # Register Naming
188
189 SV Registers are numbered using the notation `SV[F|C]R<N>_<M>` where `<N>` is a decimal integer and `<M>` is a binary integer. Two integers are used to enable future register expansions to add more registers by appending more LSB bits to `<M>`.
190
191 For all `SV[F|C]R<N>_<M>` registers, the N is the
192 upper bits in decimal and the M is the lower bits in binary, so `SVR5_01` is
193 SV integer register `(5 << 2) + 0b01`, `SVCR6_011` is SV condition register
194 `(6 << 3) + 0b011`, and `SVFR20_10` is SV floating-point register
195 `(20 << 2) + 0b10`.
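
The naming rule can be written as a one-line C function. The name `sv_reg_number` is hypothetical; `m_bits` is the number of binary digits in `<M>` (2 for integer and floating-point registers, 3 for condition registers).

```c
#include <assert.h>

/* Hypothetical helper: flat register number for SV[F|C]R<N>_<M>. */
static unsigned sv_reg_number(unsigned n, unsigned m, unsigned m_bits)
{
    assert(m < (1u << m_bits));   /* <M> must fit in m_bits binary digits */
    return (n << m_bits) + m;
}
```

For the three examples above, this gives 21 for `SVR5_01`, 51 for `SVCR6_011`, and 82 for `SVFR20_10`.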
196
197 ## Example Code
198
A vectorized 32-bit add:

    add SVR3_01, SVR6_10, SVR10_00, elwidth=w, subvl=1, mask=lt

does the following:

    // regs[] is the flat array of SV integer registers, CRs[] the 64 SV CR fields
    const size_t start_cr = (6 << 3) + 0b000; // starting at SVCR6_000
    // pretend for the moment that type-punning actually works in C/C++
    uint32_t *rt = (uint32_t *)&regs[(3 << 2) + 0b01]; // SVR3_01
    uint32_t *ra = (uint32_t *)&regs[(6 << 2) + 0b10]; // SVR6_10
    uint32_t *rb = (uint32_t *)&regs[(10 << 2) + 0b00]; // SVR10_00
    for(size_t i = 0; i < VL; i++) {
        // mask=lt: element i is enabled only if its CR field has LT set
        if(CRs[(start_cr + i) % 64].lt) {
            rt[i] = ra[i] + rb[i];
        }
    }
215
216 ## Integer Registers
217
    setvli ..., VL=7
    add r20, r25, r30, elwidth=d, subvl=1
220
221 where `r20`, `r25`, and `r30` are standard OpenPower register names.
222 Those names correspond to `SVR20_00`, `SVR25_00`, and `SVR30_00`.
223
Pseudocode:

    const size_t STD_TO_SV_SHIFT = 2; // gets bigger as reg files expand to 256, 512, ... registers

    VL = 7; // setvli (omitting maxvl here)

    for(size_t i = 0; i < VL; i++) {
        regs[(20 << STD_TO_SV_SHIFT) + i] = regs[(25 << STD_TO_SV_SHIFT) + i]
                                          + regs[(30 << STD_TO_SV_SHIFT) + i];
    }
234
235 Standard PowerISA Integer registers are aliased to some of the SV integer registers:
236
237 | Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register | Integer<br/>Register | SV Integer<br/>Register |
238 |----------------------|-------------------------|----------------------|-------------------------|----------------------|-------------------------|----------------------|-------------------------|
239 | R0 | SVR0_00 | R8 | SVR8_00 | R16 | SVR16_00 | R24 | SVR24_00 |
240 | | SVR0_01 | | SVR8_01 | | SVR16_01 | | SVR24_01 |
241 | | SVR0_10 | | SVR8_10 | | SVR16_10 | | SVR24_10 |
242 | | SVR0_11 | | SVR8_11 | | SVR16_11 | | SVR24_11 |
243 | R1 | SVR1_00 | R9 | SVR9_00 | R17 | SVR17_00 | R25 | SVR25_00 |
244 | | SVR1_01 | | SVR9_01 | | SVR17_01 | | SVR25_01 |
245 | | SVR1_10 | | SVR9_10 | | SVR17_10 | | SVR25_10 |
246 | | SVR1_11 | | SVR9_11 | | SVR17_11 | | SVR25_11 |
247 | R2 | SVR2_00 | R10 | SVR10_00 | R18 | SVR18_00 | R26 | SVR26_00 |
248 | | SVR2_01 | | SVR10_01 | | SVR18_01 | | SVR26_01 |
249 | | SVR2_10 | | SVR10_10 | | SVR18_10 | | SVR26_10 |
250 | | SVR2_11 | | SVR10_11 | | SVR18_11 | | SVR26_11 |
251 | R3 | SVR3_00 | R11 | SVR11_00 | R19 | SVR19_00 | R27 | SVR27_00 |
252 | | SVR3_01 | | SVR11_01 | | SVR19_01 | | SVR27_01 |
253 | | SVR3_10 | | SVR11_10 | | SVR19_10 | | SVR27_10 |
254 | | SVR3_11 | | SVR11_11 | | SVR19_11 | | SVR27_11 |
255 | R4 | SVR4_00 | R12 | SVR12_00 | R20 | SVR20_00 | R28 | SVR28_00 |
256 | | SVR4_01 | | SVR12_01 | | SVR20_01 | | SVR28_01 |
257 | | SVR4_10 | | SVR12_10 | | SVR20_10 | | SVR28_10 |
258 | | SVR4_11 | | SVR12_11 | | SVR20_11 | | SVR28_11 |
259 | R5 | SVR5_00 | R13 | SVR13_00 | R21 | SVR21_00 | R29 | SVR29_00 |
260 | | SVR5_01 | | SVR13_01 | | SVR21_01 | | SVR29_01 |
261 | | SVR5_10 | | SVR13_10 | | SVR21_10 | | SVR29_10 |
262 | | SVR5_11 | | SVR13_11 | | SVR21_11 | | SVR29_11 |
263 | R6 | SVR6_00 | R14 | SVR14_00 | R22 | SVR22_00 | R30 | SVR30_00 |
264 | | SVR6_01 | | SVR14_01 | | SVR22_01 | | SVR30_01 |
265 | | SVR6_10 | | SVR14_10 | | SVR22_10 | | SVR30_10 |
266 | | SVR6_11 | | SVR14_11 | | SVR22_11 | | SVR30_11 |
267 | R7 | SVR7_00 | R15 | SVR15_00 | R23 | SVR23_00 | R31 | SVR31_00 |
268 | | SVR7_01 | | SVR15_01 | | SVR23_01 | | SVR31_01 |
269 | | SVR7_10 | | SVR15_10 | | SVR23_10 | | SVR31_10 |
270 | | SVR7_11 | | SVR15_11 | | SVR23_11 | | SVR31_11 |
271
272 ## Floating-Point Registers
273
274 Standard PowerISA floating-point and VSX registers are aliased to some of the SV floating-point registers:
275
276 | FP<br/>Register | VSX Register | SV FP<br/>Register | FP<br/>Register | VSX Register | SV FP<br/>Register |
277 |-----------------|-----------------------|--------------------|-----------------|-----------------------|--------------------|
278 | FPR\[0\] | VSR\[0\]\.dword\[0\] | SVFR0\_00 | FPR\[16\] | VSR\[16\]\.dword\[0\] | SVFR16\_00 |
279 | | VSR\[0\]\.dword\[1\] | SVFR0\_01 | | VSR\[16\]\.dword\[1\] | SVFR16\_01 |
280 | | VSR\[32\]\.dword\[0\] | SVFR0\_10 | | VSR\[48\]\.dword\[0\] | SVFR16\_10 |
281 | | VSR\[32\]\.dword\[1\] | SVFR0\_11 | | VSR\[48\]\.dword\[1\] | SVFR16\_11 |
282 | FPR\[1\] | VSR\[1\]\.dword\[0\] | SVFR1\_00 | FPR\[17\] | VSR\[17\]\.dword\[0\] | SVFR17\_00 |
283 | | VSR\[1\]\.dword\[1\] | SVFR1\_01 | | VSR\[17\]\.dword\[1\] | SVFR17\_01 |
284 | | VSR\[33\]\.dword\[0\] | SVFR1\_10 | | VSR\[49\]\.dword\[0\] | SVFR17\_10 |
285 | | VSR\[33\]\.dword\[1\] | SVFR1\_11 | | VSR\[49\]\.dword\[1\] | SVFR17\_11 |
286 | FPR\[2\] | VSR\[2\]\.dword\[0\] | SVFR2\_00 | FPR\[18\] | VSR\[18\]\.dword\[0\] | SVFR18\_00 |
287 | | VSR\[2\]\.dword\[1\] | SVFR2\_01 | | VSR\[18\]\.dword\[1\] | SVFR18\_01 |
288 | | VSR\[34\]\.dword\[0\] | SVFR2\_10 | | VSR\[50\]\.dword\[0\] | SVFR18\_10 |
289 | | VSR\[34\]\.dword\[1\] | SVFR2\_11 | | VSR\[50\]\.dword\[1\] | SVFR18\_11 |
290 | FPR\[3\] | VSR\[3\]\.dword\[0\] | SVFR3\_00 | FPR\[19\] | VSR\[19\]\.dword\[0\] | SVFR19\_00 |
291 | | VSR\[3\]\.dword\[1\] | SVFR3\_01 | | VSR\[19\]\.dword\[1\] | SVFR19\_01 |
292 | | VSR\[35\]\.dword\[0\] | SVFR3\_10 | | VSR\[51\]\.dword\[0\] | SVFR19\_10 |
293 | | VSR\[35\]\.dword\[1\] | SVFR3\_11 | | VSR\[51\]\.dword\[1\] | SVFR19\_11 |
294 | FPR\[4\] | VSR\[4\]\.dword\[0\] | SVFR4\_00 | FPR\[20\] | VSR\[20\]\.dword\[0\] | SVFR20\_00 |
295 | | VSR\[4\]\.dword\[1\] | SVFR4\_01 | | VSR\[20\]\.dword\[1\] | SVFR20\_01 |
296 | | VSR\[36\]\.dword\[0\] | SVFR4\_10 | | VSR\[52\]\.dword\[0\] | SVFR20\_10 |
297 | | VSR\[36\]\.dword\[1\] | SVFR4\_11 | | VSR\[52\]\.dword\[1\] | SVFR20\_11 |
298 | FPR\[5\] | VSR\[5\]\.dword\[0\] | SVFR5\_00 | FPR\[21\] | VSR\[21\]\.dword\[0\] | SVFR21\_00 |
299 | | VSR\[5\]\.dword\[1\] | SVFR5\_01 | | VSR\[21\]\.dword\[1\] | SVFR21\_01 |
300 | | VSR\[37\]\.dword\[0\] | SVFR5\_10 | | VSR\[53\]\.dword\[0\] | SVFR21\_10 |
301 | | VSR\[37\]\.dword\[1\] | SVFR5\_11 | | VSR\[53\]\.dword\[1\] | SVFR21\_11 |
302 | FPR\[6\] | VSR\[6\]\.dword\[0\] | SVFR6\_00 | FPR\[22\] | VSR\[22\]\.dword\[0\] | SVFR22\_00 |
303 | | VSR\[6\]\.dword\[1\] | SVFR6\_01 | | VSR\[22\]\.dword\[1\] | SVFR22\_01 |
304 | | VSR\[38\]\.dword\[0\] | SVFR6\_10 | | VSR\[54\]\.dword\[0\] | SVFR22\_10 |
305 | | VSR\[38\]\.dword\[1\] | SVFR6\_11 | | VSR\[54\]\.dword\[1\] | SVFR22\_11 |
306 | FPR\[7\] | VSR\[7\]\.dword\[0\] | SVFR7\_00 | FPR\[23\] | VSR\[23\]\.dword\[0\] | SVFR23\_00 |
307 | | VSR\[7\]\.dword\[1\] | SVFR7\_01 | | VSR\[23\]\.dword\[1\] | SVFR23\_01 |
308 | | VSR\[39\]\.dword\[0\] | SVFR7\_10 | | VSR\[55\]\.dword\[0\] | SVFR23\_10 |
309 | | VSR\[39\]\.dword\[1\] | SVFR7\_11 | | VSR\[55\]\.dword\[1\] | SVFR23\_11 |
310 | FPR\[8\] | VSR\[8\]\.dword\[0\] | SVFR8\_00 | FPR\[24\] | VSR\[24\]\.dword\[0\] | SVFR24\_00 |
311 | | VSR\[8\]\.dword\[1\] | SVFR8\_01 | | VSR\[24\]\.dword\[1\] | SVFR24\_01 |
312 | | VSR\[40\]\.dword\[0\] | SVFR8\_10 | | VSR\[56\]\.dword\[0\] | SVFR24\_10 |
313 | | VSR\[40\]\.dword\[1\] | SVFR8\_11 | | VSR\[56\]\.dword\[1\] | SVFR24\_11 |
314 | FPR\[9\] | VSR\[9\]\.dword\[0\] | SVFR9\_00 | FPR\[25\] | VSR\[25\]\.dword\[0\] | SVFR25\_00 |
315 | | VSR\[9\]\.dword\[1\] | SVFR9\_01 | | VSR\[25\]\.dword\[1\] | SVFR25\_01 |
316 | | VSR\[41\]\.dword\[0\] | SVFR9\_10 | | VSR\[57\]\.dword\[0\] | SVFR25\_10 |
317 | | VSR\[41\]\.dword\[1\] | SVFR9\_11 | | VSR\[57\]\.dword\[1\] | SVFR25\_11 |
318 | FPR\[10\] | VSR\[10\]\.dword\[0\] | SVFR10\_00 | FPR\[26\] | VSR\[26\]\.dword\[0\] | SVFR26\_00 |
319 | | VSR\[10\]\.dword\[1\] | SVFR10\_01 | | VSR\[26\]\.dword\[1\] | SVFR26\_01 |
320 | | VSR\[42\]\.dword\[0\] | SVFR10\_10 | | VSR\[58\]\.dword\[0\] | SVFR26\_10 |
321 | | VSR\[42\]\.dword\[1\] | SVFR10\_11 | | VSR\[58\]\.dword\[1\] | SVFR26\_11 |
322 | FPR\[11\] | VSR\[11\]\.dword\[0\] | SVFR11\_00 | FPR\[27\] | VSR\[27\]\.dword\[0\] | SVFR27\_00 |
323 | | VSR\[11\]\.dword\[1\] | SVFR11\_01 | | VSR\[27\]\.dword\[1\] | SVFR27\_01 |
324 | | VSR\[43\]\.dword\[0\] | SVFR11\_10 | | VSR\[59\]\.dword\[0\] | SVFR27\_10 |
325 | | VSR\[43\]\.dword\[1\] | SVFR11\_11 | | VSR\[59\]\.dword\[1\] | SVFR27\_11 |
326 | FPR\[12\] | VSR\[12\]\.dword\[0\] | SVFR12\_00 | FPR\[28\] | VSR\[28\]\.dword\[0\] | SVFR28\_00 |
327 | | VSR\[12\]\.dword\[1\] | SVFR12\_01 | | VSR\[28\]\.dword\[1\] | SVFR28\_01 |
328 | | VSR\[44\]\.dword\[0\] | SVFR12\_10 | | VSR\[60\]\.dword\[0\] | SVFR28\_10 |
329 | | VSR\[44\]\.dword\[1\] | SVFR12\_11 | | VSR\[60\]\.dword\[1\] | SVFR28\_11 |
330 | FPR\[13\] | VSR\[13\]\.dword\[0\] | SVFR13\_00 | FPR\[29\] | VSR\[29\]\.dword\[0\] | SVFR29\_00 |
331 | | VSR\[13\]\.dword\[1\] | SVFR13\_01 | | VSR\[29\]\.dword\[1\] | SVFR29\_01 |
332 | | VSR\[45\]\.dword\[0\] | SVFR13\_10 | | VSR\[61\]\.dword\[0\] | SVFR29\_10 |
333 | | VSR\[45\]\.dword\[1\] | SVFR13\_11 | | VSR\[61\]\.dword\[1\] | SVFR29\_11 |
334 | FPR\[14\] | VSR\[14\]\.dword\[0\] | SVFR14\_00 | FPR\[30\] | VSR\[30\]\.dword\[0\] | SVFR30\_00 |
335 | | VSR\[14\]\.dword\[1\] | SVFR14\_01 | | VSR\[30\]\.dword\[1\] | SVFR30\_01 |
336 | | VSR\[46\]\.dword\[0\] | SVFR14\_10 | | VSR\[62\]\.dword\[0\] | SVFR30\_10 |
337 | | VSR\[46\]\.dword\[1\] | SVFR14\_11 | | VSR\[62\]\.dword\[1\] | SVFR30\_11 |
338 | FPR\[15\] | VSR\[15\]\.dword\[0\] | SVFR15\_00 | FPR\[31\] | VSR\[31\]\.dword\[0\] | SVFR31\_00 |
339 | | VSR\[15\]\.dword\[1\] | SVFR15\_01 | | VSR\[31\]\.dword\[1\] | SVFR31\_01 |
340 | | VSR\[47\]\.dword\[0\] | SVFR15\_10 | | VSR\[63\]\.dword\[0\] | SVFR31\_10 |
341 | | VSR\[47\]\.dword\[1\] | SVFR15\_11 | | VSR\[63\]\.dword\[1\] | SVFR31\_11 |
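
The pattern in the table above is regular: bit 0 of `<M>` selects the doubleword, and bit 1 of `<M>` selects the upper bank of 32 VSX registers. A C sketch, with hypothetical type and function names, taking the flat register number `(N << 2) + M`:

```c
#include <assert.h>

/* Hypothetical result type: which VSR, and which doubleword of it. */
struct vsr_slot {
    unsigned vsr;    /* 0..63 */
    unsigned dword;  /* 0 or 1 */
};

/* Hypothetical helper: map flat SV FP register number to its VSX alias. */
static struct vsr_slot svfr_to_vsr(unsigned svfr_index)
{
    assert(svfr_index < 128);
    unsigned n = svfr_index >> 2;   /* <N>: the standard FPR number */
    unsigned m = svfr_index & 0x3u; /* <M>: two binary digits       */
    struct vsr_slot s;
    s.vsr = n + ((m & 0x2u) ? 32u : 0u);
    s.dword = m & 0x1u;
    return s;
}
```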
342
343 # Operation
344
345 ## CR fields as inputs/outputs of vector operations
346
347 When vectorized, the CR inputs/outputs are read/written to 4-bit CR fields
348 starting from SVCR6_000 and incrementing from there. If SVCR7_111 is reached, the next CR
349 field used wraps around to SVCR0_000, then incrementing from there.
(See [[discussion]]; some alternative schemes are described there.)
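
The wrap-around rule can be sketched in C as follows, using the flat numbering `(N << 3) + M` from Register Naming, under which SVCR6_000 is field 48 and SVCR7_111 is field 63. The function name is hypothetical.

```c
/* Hypothetical helper: flat SV CR field number used by vector element i. */
static unsigned vector_cr_field(unsigned i)
{
    const unsigned start = (6u << 3) + 0u;  /* SVCR6_000 */
    return (start + i) % 64u;               /* wraps to SVCR0_000 after SVCR7_111 */
}
```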
351
SVCR6_000 was chosen as a balance between two goals: avoiding the need to save CR2-CR4 (which are
callee-saved) just to use SV vectors with VL <= 61, and having the first
vector CR field readily accessible to standard CR instructions and branches.
Additionally, SVCR6_000 is used as the implicit result of an OpenPower ISA v3.1
standard vector (SIMD) instruction with Rc=1.
357
358 ## Table of CR fields
359
360 CR[i] is the notation used by the OpenPower spec to refer to CR field #i,
361 so FP instructions with Rc=1 write to CR[1] aka SVCR1_000.
362
363 There are 3 new SPRs for holding CRs: CR_EXT1, CR_EXT2, and CR_EXT3.
364
365 The 64 SV CRs are arranged similarly to the way the 128 integer registers are arranged:
366
367 | CR<br/>Register | SPR<br/>Field | SV CR<br/>Register | CR<br/>Register | SPR<br/>Field | SV CR<br/>Register |
368 |-----------------|----------------|--------------------|-----------------|----------------|--------------------|
369 | CR[0] | CR[32:35] | SVCR0_000 | CR[4] | CR[48:51] | SVCR4_000 |
370 | | CR_EXT1[32:35] | SVCR0_001 | | CR_EXT1[48:51] | SVCR4_001 |
371 | | CR_EXT2[32:35] | SVCR0_010 | | CR_EXT2[48:51] | SVCR4_010 |
372 | | CR_EXT3[32:35] | SVCR0_011 | | CR_EXT3[48:51] | SVCR4_011 |
373 | *CR[-8]* | CR[0:3] | SVCR0_100 | *CR[-4]* | CR[16:19] | SVCR4_100 |
374 | | CR_EXT1[0:3] | SVCR0_101 | | CR_EXT1[16:19] | SVCR4_101 |
375 | | CR_EXT2[0:3] | SVCR0_110 | | CR_EXT2[16:19] | SVCR4_110 |
376 | | CR_EXT3[0:3] | SVCR0_111 | | CR_EXT3[16:19] | SVCR4_111 |
377 | CR[1] | CR[36:39] | SVCR1_000 | CR[5] | CR[52:55] | SVCR5_000 |
378 | | CR_EXT1[36:39] | SVCR1_001 | | CR_EXT1[52:55] | SVCR5_001 |
379 | | CR_EXT2[36:39] | SVCR1_010 | | CR_EXT2[52:55] | SVCR5_010 |
380 | | CR_EXT3[36:39] | SVCR1_011 | | CR_EXT3[52:55] | SVCR5_011 |
381 | *CR[-7]* | CR[4:7] | SVCR1_100 | *CR[-3]* | CR[20:23] | SVCR5_100 |
382 | | CR_EXT1[4:7] | SVCR1_101 | | CR_EXT1[20:23] | SVCR5_101 |
383 | | CR_EXT2[4:7] | SVCR1_110 | | CR_EXT2[20:23] | SVCR5_110 |
384 | | CR_EXT3[4:7] | SVCR1_111 | | CR_EXT3[20:23] | SVCR5_111 |
385 | CR[2] | CR[40:43] | SVCR2_000 | CR[6] | CR[56:59] | SVCR6_000 |
386 | | CR_EXT1[40:43] | SVCR2_001 | | CR_EXT1[56:59] | SVCR6_001 |
387 | | CR_EXT2[40:43] | SVCR2_010 | | CR_EXT2[56:59] | SVCR6_010 |
388 | | CR_EXT3[40:43] | SVCR2_011 | | CR_EXT3[56:59] | SVCR6_011 |
389 | *CR[-6]* | CR[8:11] | SVCR2_100 | *CR[-2]* | CR[24:27] | SVCR6_100 |
390 | | CR_EXT1[8:11] | SVCR2_101 | | CR_EXT1[24:27] | SVCR6_101 |
391 | | CR_EXT2[8:11] | SVCR2_110 | | CR_EXT2[24:27] | SVCR6_110 |
392 | | CR_EXT3[8:11] | SVCR2_111 | | CR_EXT3[24:27] | SVCR6_111 |
393 | CR[3] | CR[44:47] | SVCR3_000 | CR[7] | CR[60:63] | SVCR7_000 |
394 | | CR_EXT1[44:47] | SVCR3_001 | | CR_EXT1[60:63] | SVCR7_001 |
395 | | CR_EXT2[44:47] | SVCR3_010 | | CR_EXT2[60:63] | SVCR7_010 |
396 | | CR_EXT3[44:47] | SVCR3_011 | | CR_EXT3[60:63] | SVCR7_011 |
397 | *CR[-5]* | CR[12:15] | SVCR3_100 | *CR[-1]* | CR[28:31] | SVCR7_100 |
398 | | CR_EXT1[12:15] | SVCR3_101 | | CR_EXT1[28:31] | SVCR7_101 |
399 | | CR_EXT2[12:15] | SVCR3_110 | | CR_EXT2[28:31] | SVCR7_110 |
400 | | CR_EXT3[12:15] | SVCR3_111 | | CR_EXT3[28:31] | SVCR7_111 |
401
Note: CR[-8] through CR[-1] are not part of OpenPower v3.1; they are the names used here for the fields in the MSB half of the 64-bit CR SPR.
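
The table above follows a regular pattern that can be sketched in C: the low two bits of `<M>` select the SPR, and the top bit of `<M>` selects the MSB half of that SPR. The type and function names are hypothetical, and SPRs are identified by a small integer (0 for CR, 1 to 3 for CR_EXT1 to CR_EXT3) purely for illustration.

```c
#include <assert.h>

/* Hypothetical result type: which SPR holds the field, and the MSB0 number
 * of the first of its 4 bits. */
struct cr_slot {
    unsigned spr;        /* 0 = CR, 1..3 = CR_EXT1..CR_EXT3 */
    unsigned first_bit;  /* field occupies first_bit : first_bit + 3 */
};

/* Hypothetical helper: locate flat SV CR field number (N << 3) + M. */
static struct cr_slot svcr_to_spr(unsigned svcr_index)
{
    assert(svcr_index < 64);
    unsigned n = svcr_index >> 3;   /* <N>: standard CR field number */
    unsigned m = svcr_index & 0x7u; /* <M>: three binary digits      */
    struct cr_slot s;
    s.spr = m & 0x3u;
    s.first_bit = ((m & 0x4u) ? 0u : 32u) + 4u * n;
    return s;
}
```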
403
404 # Register Profiles
405
406 Instructions are broken down by Register Profiles as listed in the following auto-generated page:
407 [[opcode_regs_deduped]]. "Non-SV" indicates that the operations with this Register Profile cannot be Vectorised (mtspr, bc, dcbz, twi)
408
409 ## LDST-1R-1W-imm
410 TBD
411 ## LDST-1R-2W-imm
412 TBD
413 ## LDST-2R-imm
414 TBD
415 ## LDST-2R-1W
416 TBD
417 ## LDST-2R-1W-imm
418 TBD
419 ## LDST-2R-2W
420 TBD
421 ## LDST-3R
422 TBD
423 ## LDST-3R-CRo
424 TBD
425 ## LDST-3R-1W
426 TBD
427 ## CRio
428 TBD
429 ## CR=2R1W
430
431 Remapped Encoding Fields:
432
433 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
434 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
435 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
436
437 ## 1W-CRi
438
439 Remapped Encoding Fields:
440
441 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
442 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
443 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
444
445 ## 1R-CRo
446
447 Remapped Encoding Fields:
448
449 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
450 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
451 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
452
453 ## 1R-CRio
454
455 Remapped Encoding Fields:
456
457 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
458 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
459 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
460
461 ## 1R-1W
462
463 Remapped Encoding Fields:
464
465 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
466 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
467 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
468
469 ## 1R-1W-imm
470
471 Remapped Encoding Fields:
472
473 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
474 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
475 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
476
477 ## 1R-1W-CRo
478
479 Remapped Encoding Fields:
480
481 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
482 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
483 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
484
485 ## 1R-1W-CRio
486
487 Remapped Encoding Fields:
488
489 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:18` | `19:20` | `21:23` |
490 |-----------|-------|---------|-------|-------------|-------------|----------|-------------|-----------|---------|
491 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | MASK_SRC | ELWIDTH_SRC | SUBVL_SRC | TBD |
492
493 ## 2R-CRo
494
495 Remapped Encoding Fields:
496
497 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
498 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
499 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
500
501 ## 2R-CRio
502
503 Remapped Encoding Fields:
504
505 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
506 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
507 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
508
509 ## 2R-1W
510
511 Remapped Encoding Fields:
512
513 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
514 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
515 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
516
517 ## 2R-1W-CRo
518
519 Remapped Encoding Fields:
520
521 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
522 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
523 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
524
525 <!-- comment needed to stop ikiwiki markdown from mis-parsing table -->
526
527 ## 2R-1W-CRo (rl(w|d)imi)
528
529 Remapped Encoding Fields:
530
531 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:23` |
532 |-----------|-------|---------|-------|-------------|-------------|---------|
533 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | TBD |
534
535 ## 2R-1W-CRi
536 TBD
537 ## 2R-1W-CRio
538
539 Remapped Encoding Fields:
540
541 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:23` |
542 |-----------|-------|---------|-------|-------------|-------------|-------------|---------|
543 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | TBD |
544
545 ## 3R-1W-CRio
546
547 Remapped Encoding Fields:
548
549 | `0` | `1:3` | `4:5` | `6:7` | `8:10` | `11:13` | `14:16` | `17:19` | `20:23` |
550 |-----------|-------|---------|-------|-------------|-------------|-------------|-------------|----------|
551 | MASK_KIND | MASK | ELWIDTH | SUBVL | Rdest_EXTRA | Rsrc1_EXTRA | Rsrc2_EXTRA | Rsrc3_EXTRA | Reserved |