1 % https://bugs.libre-soc.org/show_bug.cgi?id=213
2 % SimpleV Prefix (SVprefix) Proposal v0.3
3 % https://libre-soc.org/simple_v_extension/sv_prefix_proposal/
4
5 \newcommand{\Specification}{{\href{https://libre-soc.org/simple_v_extension/specification/}{Specification}}}
6
7 \chapter{SimpleV Prefix Proposal -- v0.3}
8
9 \paragraph{}
10
11 Copyright (c) Jacob Lifshay, 2019
12 Copyright (c) Luke Kenneth Casson Leighton, 2019
13
14 This proposal is designed to be able to operate without SVorig, but not to
15 require the absence of SVorig. See \Specification.
16
17 Principle: SVprefix embeds (unmodified) RVC and 32-bit scalar opcodes into 32,
18 48 and 64 bit RV formats, to provide Vectorisation context on a per-instruction
19 basis.
20
21 \section{Options}
22
23
24 The following partial / full implementation options are possible:
25
26 \begin{itemize}
27 \item
28 SVPrefix augments the main \Specification
29
30 \item
31 SVPrefix operates independently, without the main spec VL (and MVL) \gls{CSR}s
32 (in any privilege level)
33
34 \item
SVPrefix operates independently, without the main spec SUBVL CSRs (in any privilege level)
36
37 \item
38 SVPrefix has no support for VL (or MVL) overrides in the 64 bit instruction
39 format (VLtyp=0 as the only legal permitted value)
40
41 \item
SVPrefix has no support for svlen overrides in either the 48 or 64 bit
instruction format (svlen=0 as the only legal permitted value).
44
45 \end{itemize}
46
47 All permutations of the above options are permitted, and the UNIX platform must
48 raise illegal instruction exceptions on implementations that do not support
49 each option. For example, an implementation that has no support for VLtyp that
50 sees an opcode with a nonzero VLtyp must raise an illegal instruction exception.
51
52 Note that SVPrefix (VLtyp and svlen) has its own STATE CSR, SVPSTATE. This
53 allows Prefixed operations to be re-entrant on traps, and to not affect VBLOCK
54 use of VL or SUBVL.
55
56 If the main \Specification CSRs and features are to be supported (VBLOCK), then
57 when VLtyp or svlen are "default" they utilise the main \Specification VBLOCK VL
58 and/or SUBVL, and, correspondingly, the main VBLOCK STATE CSR will be updated
59 and used to track hardware loops.
60
If however VLtyp is set to nondefault, then the SVPSTATE srcoffs and destoffs
fields are used instead to create the hardware loops, and likewise if svlen is
set to nondefault, SVPSTATE's svoffs field is used.
64
65 \section{Half-Precision Floating Point (FP16)}
66
67 If the F extension is supported, SVprefix adds support for FP16 in the base FP
68 instructions by using 10 (H) in the floating-point format field fmt and using
69 001 (H) in the floating-point load/store width field.
70
71 \section{Compressed Instructions}
72
Compressed instructions are under evaluation. The approach being considered
takes the same prefix as used in P48 and embeds it, together with standard RVC
opcodes (minus their RVC prefix), into a 32-bit space. This would use the three
remaining Major "custom" opcodes (0-2),
% TODO discussion ???
one for each of the three RVC Quadrants. See \textbf{discussion ???}.
78
79 \section{48-bit Prefixed Instructions}
80
81 All 48-bit prefixed instructions contain a 32-bit "base" instruction as the
82 last 4 bytes. Since all 32-bit instructions have bits 1:0 set to 11, those bits
83 are reused for additional encoding space in the 48-bit instructions.
84
85 \section{64-bit Prefixed Instructions}
86
The 48 bit format is further extended to give the full range of 128 registers
on all source and destination registers, and the option to set both
SVPSTATE.VL and SVPSTATE.MAXVL is provided.
90
91 \section{48-bit Instruction Encodings}
92
In the following table, Rsvd (reserved) entries must be zero. RV32 equivalent encodings are included for side-by-side comparison (and listed below, separately).
94
95 First, bits 17:0:
96
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|} \hline
Encoding & 17 & 16 & 15 & 14 & 13 & 12 & 11:7 & 6 & 5:0 \\ \hline
P48-LD-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
P48-ST-type & vitp7[6] & rs1[5] & rs2[5] & vs2 & vs1 & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
P48-R-type & rd[5] & rs1[5] & rs2[5] & vs2 & vs1 & \multicolumn{2}{l|}{vitp6} & Rsvd & 011111 \\ \hline
P48-I-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
P48-U-type & rd[5] & Rsvd & Rsvd & vd & Rsvd & \multicolumn{2}{l|}{vitp6} & Rsvd & 011111 \\ \hline
P48-FR-type & rd[5] & rs1[5] & rs2[5] & vs2 & vs1 & Rsvd & vtp5 & Rsvd & 011111 \\ \hline
P48-FI-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
P48-FR4-type & rd[5] & rs1[5] & rs2[5] & vs2 & rs3[5] & vs3 [1] & vtp5 & Rsvd & 011111 \\ \hline
\end{tabular}
108
109 \fixme{ The link to [1] is easily confused with the likes of [5]}
110
111 [1] Only vs2 and vs3 are included in the P48-FR4-type encoding because there is
112 not enough space for vs1 as well, and because it is more useful to have a
113 scalar argument for each of the multiplication and addition portions of fmadd
114 than to have two scalars on the multiplication portion.
115
Table showing the correspondence between P48-*-type and RV32-*-type encodings. These are bits 47:18 (RV32 shifted up by 16 bits):
117
118 \begin{tabular}{|l|l|} \hline
119 Encoding & RV32 Encoding \\ \hline
47:18 & 31:2 \\ \hline
121 P48-LD-type & RV32-I-type \\ \hline
122 P48-ST-type & RV32-S-Type \\ \hline
123 P48-R-type & RV32-R-Type \\ \hline
124 P48-I-type & RV32-I-Type \\ \hline
125 P48-U-type & RV32-U-Type \\ \hline
126 P48-FR-type & RV32-FR-Type \\ \hline
127 P48-FI-type & RV32-I-Type \\ \hline
128 P48-FR4-type & RV32-FR4-type \\ \hline
129 \end{tabular}
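
The bit correspondence above can be sketched in a few lines of Python. This is
only an illustration under the stated layout (P48 bits 47:18 carry RV32 bits
31:2, and the implied low bits 1:0 are 0b11); the function names are
hypothetical and not part of the proposal.

```python
def p48_base_rv32(insn48: int) -> int:
    """Recover the embedded 32-bit base instruction from a 48-bit prefixed one."""
    upper = (insn48 >> 18) & ((1 << 30) - 1)  # P48 bits 47:18 hold RV32 bits 31:2
    return (upper << 2) | 0b11                # restore the implied low bits 1:0


def p48_embed_rv32(rv32: int, bits_17_0: int) -> int:
    """Inverse direction: place RV32 bits 31:2 above an 18-bit prefix field."""
    return ((rv32 >> 2) << 18) | (bits_17_0 & 0x3FFFF)
```

Embedding a base instruction and extracting it again returns the original
32-bit word, which is a quick sanity check on the shift amounts.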
130
131 Table showing Standard RV32 encodings:
132
133 \begin{tabular}{|l|l|l|l|l|l|l|l|l|} \hline
134 Encoding & 31:27 & 26:25 & 24:20 & 19:15 & 14:12 & 11:7 & 6:2 & 1:0 \\ \hline
135 RV32-R-type & funct7 & & rs2[4:0] & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline
136 RV32-S-type & imm[11:5] & & rs2[4:0] & rs1[4:0] & funct3 & imm[4:0] & opcode & 0b11 \\ \hline
137 RV32-I-type & imm[11:0] & & & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline
138 RV32-U-type & imm[31:12] & & & & & rd[4:0] & opcode & 0b11 \\ \hline
139 RV32-FR4-type & rs3[4:0] & fmt & rs2[4:0] & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline
140 RV32-FR-type & funct5 & fmt & rs2[4:0] & rs1[4:0] & rm & rd[4:0] & opcode & 0b11 \\ \hline
141 \end{tabular}
142
143 \section{64-bit Instruction Encodings}
144
145 Where in the 48 bit format the prefix is "0b0011111" in bits 0 to 6, this is now set to "0b0111111".
146
\begin{tabular}{|l|l|l|l|} \hline
63:48 & 47:18 & 17:7 & 6:0 \\ \hline
64 bit prefix & RV32[31:2] & P48[17:7] & 0b0111111 \\ \hline
\end{tabular}

\begin{itemize}
\item
Bits 48 to 63 contain the 64 bit prefix (format below)

\item
Bits 18 to 47 contain bits 2 to 31 of a standard RV32 format

\item
Bits 7 to 17 contain bits 7 through 17 of the P48 format

\item
Bits 0 to 6 contain the standard RV 64-bit prefix 0b0111111

\end{itemize}
166
167 64 bit prefix format:
168
169 \begin{tabular}{|l|l|l|l|l|l|} \hline
170 Encoding & 63 & 62 & 61 & 60 & 59:48 \\ \hline
171 P64-LD-type & rd[6] & rs1[6] & & Rsvd & VLtyp \\ \hline
172 P64-ST-type & & rs1[6] & rs2[6] & Rsvd & VLtyp \\ \hline
173 P64-R-type & rd[6] & rs1[6] & rs2[6] & vd & VLtyp \\ \hline
174 P64-I-type & rd[6] & rs1[6] & & Rsvd & VLtyp \\ \hline
175 P64-U-type & rd[6] & & & Rsvd & VLtyp \\ \hline
176 P64-FR-type & & rs1[6] & rs2[6] & vd & VLtyp \\ \hline
177 P64-FI-type & rd[6] & rs1[6] & rs2[6] & vd & VLtyp \\ \hline
178 P64-FR4-type & rd[6] & rs1[6] & rs2[6] & rs3[6] & VLtyp \\ \hline
179 \end{tabular}
180
181 The extra bit for src and dest registers provides the full range of up to 128
182 registers, when combined with the extra bit from the 48 bit prefix as well.
183 VLtyp encodes how (whether) to set SVPSTATE.VL and SVPSTATE.MAXVL.
184
185 \section{VLtyp field encoding}
186
187 NOTE: VL and MVL below are local to SVPrefix and, if non-default, will update
188 the src and dest element offsets in SVPSTATE, not the main \Specification STATE.
189 If default (all zeros) then STATE VL and MVL apply to this instruction, and
190 STATE.srcoffs (etc) will be used.
191
192 \begin{tabular}{|l|l|l|l|l|} \hline
193 VLtyp[11] & VLtyp[10:6] & VLtyp[5:1] & VLtyp[0] & comment \\ \hline
194 0 & 00000 & 00000 & 0 & no change to VL/MVL \\ \hline
195 0 & VLdest & VLEN & vlt & VL imm/reg mode (vlt) \\ \hline
196 1 & VLdest & MVL+VL-immed & 0 & MVL+VL immed mode \\ \hline
197 1 & VLdest & MVL-immed & 1 & MVL immed mode \\ \hline
198 \end{tabular}
199
200 Note: when VLtyp is all zeros, the main \Specification VL and MVL apply to this
201 instruction. If called outside of a VBLOCK or if sv.setvl has not set VL, the
202 operation is "scalar".
203
204 Just as in the VBLOCK format, when bit 11 of VLtyp is zero:
205
206 \begin{itemize}
207 \item
208 if vlt is zero, bits 1 to 5 specify the VLEN as a 5 bit immediate (offset
209 by 1: 0b00000 represents VL=1, 0b00001 represents VL=2 etc.)
210
211 \item
212 if vlt is 1, bits 1 to 5 specify the scalar (RV standard) register from
213 which VL is set. x0 is not permitted
214
215 \item
216 VL goes into the scalar register VLdest (if VLdest is not x0)
217
218 \end{itemize}
219
When bit 11 of VLtyp is 1:
221
222 \begin{itemize}
223 \item
224 if VLtyp[0] is zero, both SVPSTATE.MAXVL and SVPSTATE.VL are set to
225 (imm+1). The same value goes into the scalar register VLdest (if VLdest is
226 not x0)
227
228 \item
229 if VLtyp[0] is 1, SVPSTATE.MAXVL is set to (imm+1). SVPSTATE.VL will be
230 truncated to within the new range (if VL was greater than the new MAXVL).
231 The new VL goes into the scalar register VLdest (if VLdest is not x0).
232
233 \end{itemize}
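
The four VLtyp cases can be summarised by a small decoder. The following is a
sketch only: the names \texttt{VLtyp} and \texttt{decode\_vltyp} and the mode
strings are hypothetical, and raising \texttt{ValueError} for an x0 source
register is an assumption, since the text only says that x0 is not permitted.

```python
from dataclasses import dataclass


@dataclass
class VLtyp:
    mode: str    # "default", "vl-imm", "vl-reg", "mvl+vl-imm" or "mvl-imm"
    vldest: int  # scalar register that receives VL (0 means x0: no write)
    value: int   # immediate with the +1 offset applied, or a source register number


def decode_vltyp(vltyp: int) -> VLtyp:
    if vltyp == 0:
        return VLtyp("default", 0, 0)        # main Specification VL/MVL apply
    top = (vltyp >> 11) & 1                  # VLtyp[11]
    vldest = (vltyp >> 6) & 0x1F             # VLtyp[10:6]
    field = (vltyp >> 1) & 0x1F              # VLtyp[5:1]
    low = vltyp & 1                          # VLtyp[0] (vlt when bit 11 is 0)
    if top == 0:
        if low == 0:
            return VLtyp("vl-imm", vldest, field + 1)
        if field == 0:
            raise ValueError("x0 is not permitted as the VL source register")
        return VLtyp("vl-reg", vldest, field)
    if low == 0:
        return VLtyp("mvl+vl-imm", vldest, field + 1)
    return VLtyp("mvl-imm", vldest, field + 1)
```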
234
This gives the option to set up SVPSTATE.VL in a "loop mode" (VLtyp[11]=0) or
in a "one-off" mode (VLtyp[11]=1) which sets both MVL and VL to the same
immediate value. This may be most useful for one-off Vectorised operations such
as LOAD-MULTI / STORE-MULTI, for saving and restoration of large batches of
registers in context-switches or function calls.

Note that VLtyp's VL and MVL are not the same as the main \Specification VL or
MVL. Loops alter srcoffs and destoffs in SVPSTATE when VLtyp is nondefault, but
alter the srcoffs and destoffs in STATE when VLtyp=0.

Furthermore, the execution order and exception handling must be exactly the
same as in the main spec (Program Order must be preserved).
247
248 Pseudocode for SVPSTATE.VL:
249
\begin{verbatim}
// pseudocode

regs = [0u64; 128];
vl = 0;

// instruction fields:
rd = get_rd_field();
rs1 = get_rs1_field();
vlmax = get_immed_field();

// handle illegal instruction decoding
if vlmax > XLEN {
    trap()
}

// calculate VL
if rs1 == 0 {
    // rs1 is x0: request the maximum
    vl = vlmax
} else {
    vl = min(regs[rs1], vlmax)
}

// write rd
if rd != 0 {
    // rd is not x0
    regs[rd] = vl
}
\end{verbatim}
278
279
280 \section{vs\#/vd Fields' Encoding}
281
282 % Note tabularx - as the 3rd field needs to wrap otherwise it overflows the line
283 \begin{tabularx}{\textwidth}{|l|l|X|} \hline
284 vs\#/vd & Mnemonic & Meaning \\ \hline
285 0 & S & the rs\#/rd field specifies a scalar (single sub-vector);
286 the rs\#/rd field is zero-extended to get the actual 7-bit register number
287 \\ \hline
288 1 & V & the rs\#/rd field specifies a vector; the rs\#/rd field is decoded using
289 the Vector Register Number Encoding to get the actual 7-bit register number
290 \\ \hline
291 \end{tabularx}
292
293 \fixme{Vector Register Number Encoding should be a link }
294
295 If a vs\#/vd field is not present, it is as if it was present with a value that
296 is the bitwise-or of all present vs\#/vd fields.
297
298 \begin{itemize}
299 \item
Scalar register numbers do NOT increment when allocated in the hardware
for-loop. The same scalar register number is handed to every ALU.

\item
Vector register numbers DO increase when allocated in the hardware
for-loop. Sequentially-increasing register data is handed to sequential
ALUs.
307
308 \end{itemize}
309
310 \section{Vector Register Number Encoding}
311
312 For the 48 bit format, when vs\#/vd is 1, the actual 7-bit register number is
313 derived from the corresponding 6-bit rs\#/rd field:
314
315 \begin{tabular}{|l|l|l|} \hline
316 \multicolumn{3}{|c|}{Actual 7-bit register number} \\ \hline
317 Bit 6 & Bits 5:1 & Bit 0 \\ \hline
318 rs\#/rd[0] & rs\#/rd[5:1] & 0 \\ \hline
319 \end{tabular}
320
For the 64 bit format, the 7 bit register number is constructed from three
fields: bits 0 to 4 from the 32 bit RV Standard format, bit 5 from the 48 bit
prefix and bit 6 from the 64 bit prefix. Thus in the 64 bit format the full
range of up to 128 registers is directly available. This applies in both scalar
and vector mode.
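
As an illustration, the two register-number constructions can be written as
follows. This is a minimal sketch; the function names are hypothetical, and the
scalar case follows the zero-extension rule given in the vs\#/vd table.

```python
def regnum_p48(field6: int, is_vector: bool) -> int:
    """7-bit register number from a 6-bit rs#/rd field (48 bit format)."""
    field6 &= 0x3F
    if not is_vector:
        return field6  # scalar: the 6-bit field is zero-extended
    # vector: bit 6 = field[0], bits 5:1 = field[5:1], bit 0 = 0
    return ((field6 & 1) << 6) | (field6 & 0x3E)


def regnum_p64(rv32_bits: int, p48_bit: int, p64_bit: int) -> int:
    """7-bit register number in the 64 bit format (scalar or vector)."""
    return ((p64_bit & 1) << 6) | ((p48_bit & 1) << 5) | (rv32_bits & 0x1F)
```

In the 48 bit vector case only even register numbers are reachable, because
bit 0 is always zero; the 64 bit format has no such restriction.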
326
327 \section{Load/Store Kind (lsk) Field Encoding}
328
329 \begin{tabular}{|l|l|l|} \hline
330 vd/vs2 & vs1 & Meaning \\ \hline
331 0 & 0 & srcbase is scalar, LD/ST is pure scalar. \\ \hline
332 1 & 0 & srcbase is scalar, LD/ST is unit strided \\ \hline
333 0 & 1 & srcbase is a vector (gather/scatter aka array of srcbases). VSPLAT and VSELECT \\ \hline
334 1 & 1 & srcbase is a vector, LD/ST is a full vector LD/ST. \\ \hline
335 \end{tabular}
336
337 Notes:
338 \begin{itemize}
339 \item
340 A register strided LD/ST would require 5 registers. srcbase, vd/vs2,
341 predicate 1, predicate 2 and the stride register.
342
343 \item
344 Complex strides may all be done with a general purpose vector of srcbases.
345
346 \item
347 Twin predication may be used even when vd/vs1 is a scalar, to give VSPLAT
348 and VSELECT, because the hardware loop ends on the first occurrence of a 1
349 in the predicate when a predicate is applied to a scalar.
350
351 \item
Full vectorised gather/scatter is enabled when both registers are marked as
vectorised; however, unlike e.g.\ Intel AVX512, twin predication can be
applied.
355
356 \end{itemize}
357
Open question: RVV overloads the width field of LOAD-FP/STORE-FP using bit
2 to indicate additional interpretation of the 11 bit immediate. Should this be
considered?
361
362 \section{Sub-Vector Length (svlen) Field Encoding}
363
NOTE: svlen is not the same as the main spec SUBVL. When nondefault (not zero)
SVPSTATE context is used for sub-vector loops. However if svlen is zero, STATE
and SUBVL are used instead.
367
368 Bitwidth, from VL's perspective, is a multiple of the elwidth times svlen. So
369 within each loop of VL there are svlen sub-elements of elwidth in size, just
370 like in a SIMD architecture. When svlen is set to 0b00 (indicating svlen=1) no
371 such SIMD-like behaviour exists and the subvectoring is disabled.
372
373 Predicate bits do not apply to the individual sub-vector elements, they apply
374 to the entire subvector group. This saves instructions on setup of the
375 predicate.
376
377 \begin{tabular}{|l|l|} \hline
378 svlen Encoding & Value \\ \hline
379 00 & SUBVL \\ \hline
380 01 & 2 \\ \hline
381 10 & 3 \\ \hline
382 11 & 4 \\ \hline
383 \end{tabular}
384
In independent standalone implementations that do not implement the main
\Specification, the value of SUBVL in the above table (svlen=0b00) is set to 1,
such that svlen is also 1.
388
Behaviour of operations that set svlen is identical to that of the main spec.
390 See section on VLtyp, above.
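
The svlen table reduces to a one-line rule, sketched here. The function name
and the \texttt{main\_subvl} parameter are hypothetical; the default of 1
models a standalone implementation without the main \Specification.

```python
def effective_subvl(svlen: int, main_subvl: int = 1) -> int:
    """Sub-vector length selected by the 2-bit svlen field."""
    svlen &= 0b11
    # 0b00 defers to the main spec SUBVL; 0b01..0b11 encode 2..4 directly
    return main_subvl if svlen == 0 else svlen + 1
```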
391
392 \section{Predication (pred) Field Encoding}
393
394 \begin{tabular}{|l|l|l|l|} \hline
395 pred & Mnemonic & Predicate Register & Meaning \\ \hline
396 000 & None & None & The instruction is unpredicated \\ \hline
397 001 & Reserved & Reserved & \\ \hline
398 010 & !x9 & \multirow{2}{*}{x9 (s1)} & execute vector op[0..i] on x9[i] == 0 \\ \cline{1-2} \cline{4-4}
399 011 & x9 & & execute vector op[0..i] on x9[i] == 1 \\ \hline
400 100 & !x10 & \multirow{2}{*}{x10 (a0)} & execute vector op[0..i] on x10[i] == 0 \\ \cline{1-2} \cline{4-4}
401 101 & x10 & & execute vector op[0..i] on x10[i] == 1 \\ \hline
402 110 & !x11 & \multirow{2}{*}{x11 (a1)} & execute vector op[0..i] on x11[i] == 0 \\ \cline{1-2} \cline{4-4}
403 111 & x11 & & execute vector op[0..i] on x11[i] == 1 \\ \hline
404 \end{tabular}
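
One reading of the table above is that element i executes when bit i of the
selected register matches the required polarity. The sketch below follows that
reading; the function names are hypothetical, and raising \texttt{ValueError}
for the reserved encoding is an assumption.

```python
def decode_pred(pred: int):
    """Return None if unpredicated, else (register number, inverted)."""
    pred &= 0b111
    if pred == 0:
        return None
    if pred == 1:
        raise ValueError("pred=0b001 is reserved")
    reg = 9 + (pred >> 1) - 1      # 01x -> x9, 10x -> x10, 11x -> x11
    inverted = (pred & 1) == 0     # even encodings are the "!" forms
    return reg, inverted


def element_enabled(pred: int, regs: list, i: int) -> bool:
    """True if element i of the vector operation should execute."""
    decoded = decode_pred(pred)
    if decoded is None:
        return True
    reg, inverted = decoded
    bit = (regs[reg] >> i) & 1
    return bit == (0 if inverted else 1)
```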
405
406 \section{Twin-predication (tpred) Field Encoding}
407
408 Twin-predication (ability to associate two predicate registers with an
409 instruction) applies to MV, FCLASS, LD and ST. The same format also applies to
410 integer-branch-compare operations although it is not to be considered "twin"
411 predication. In the case of integer-branch-compare operations, the second
412 register (if enabled) stores the results of the element comparisons. See
413 Appendix for details.
414
415 \fixme{Appendix above is link to http://libre\-riscv.org/simple\_v\_extension/appendix/ }
416
417 \begin{tabular}{|l|l|l|l|} \hline
418 pred & Mnemonic & Predicate Register & Meaning \\ \hline
419 000 & None & None & The instruction is unpredicated \\ \hline
420 001 & x9,off & src=x9, dest=none & src[0..i] uses x9[i], dest unpredicated \\ \hline
421 010 & off,x10 & src=none, dest=x10 & dest[0..i] uses x10[i], src unpredicated \\ \hline
011 & x9,x10 & src=x9, dest=x10 & src[0..i] uses x9[i], dest[0..i] uses x10[i] \\ \hline
423 100 & None & RESERVED & Instruction is unpredicated (TBD) \\ \hline
424 101 & !x9,off & src=!x9, dest=none & \\ \hline
425 110 & off,!x10 & src=none, dest=!x10 & \\ \hline
426 111 & !x9,!x10 & src=!x9, dest=!x10 & \\ \hline
427 \end{tabular}
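
The tpred table has a regular structure: bit 0 enables x9 as the source
predicate, bit 1 enables x10 as the destination predicate, and bit 2 inverts
whichever predicates are enabled. A hedged sketch of that reading (the function
name and dictionary keys are hypothetical):

```python
def decode_tpred(tpred: int) -> dict:
    """Split the 3-bit tpred field into source/destination predicate choices."""
    tpred &= 0b111
    return {
        "src": 9 if tpred & 0b001 else None,    # x9 predicates the source
        "dest": 10 if tpred & 0b010 else None,  # x10 predicates the destination
        "invert": bool(tpred & 0b100),          # "!" forms
    }
```

Under this reading, 0b100 selects no predicate register at all, which matches
the table marking it as reserved and unpredicated.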
428
429 \fixme{In table above some in col 3 might be vertically joined}
430
431 \section{Integer Element Type (itype) Field Encoding}
432
433 \begin{tabularx}{\textwidth}{|l|l|l|X|X|X|} \hline
Signedness [2] & itype & Element Type & Mnemonic in Integer Instructions & Mnemonic in FP Instructions (such as fmv.x) & Meaning (INT may be un/signed, FP just re-sized) \\ \hline
435 Unsigned & 01 & u8 & BU & BU & Unsigned 8-bit \\ \hline
436 & 10 & u16 & HU & HU & Unsigned 16-bit \\ \hline
437 & 11 & u32 & WU & WU & Unsigned 32-bit \\ \hline
438 & 00 & uXLEN & WU/DU/QU & WU/LU/TU & Unsigned XLEN-bit \\ \hline
439 Signed & 01 & i8 & BS & BS & Signed 8-bit \\ \hline
440 & 10 & i16 & HS & HS & Signed 16-bit \\ \hline
441 & 11 & i32 & W & W & Signed 32-bit \\ \hline
442 & 00 & iXLEN & W/D/Q & W/L/T & Signed XLEN-bit \\ \hline
443 \end{tabularx}
444
[2] Signedness is defined in Signedness Decision Procedure
446
447 Note: vector mode is effectively a type-cast of the register file as if it was
448 a sequential array being typecast to typedef itype[] (c syntax). The starting
449 point of the "typecast" is the vector register rs\#/rd.
450
451 Example: if itype=0b10 (u16), and rd is set to "vector", and VL is set to 4,
452 the 64-bit register at rd is subdivided into FOUR 16-bit destination elements.
453 It is NOT four separate 64-bit destination registers (rd+0, rd+1, rd+2, rd+3)
454 that are sign-extended from the source width size out to 64-bit, because that
455 is itype=0b00 (uXLEN).
456
457 Note also: changing elwidth creates packed elements that, depending on VL, may
create vectors that do not fit perfectly onto XLEN sized register file
459 bit-boundaries. This does NOT result in the destruction of the MSBs of the last
460 register written to at the end of a VL loop. More details on how to handle this
461 are described in the main \Specification.
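
The packing arithmetic in the example can be checked with a short sketch. It
assumes elements are packed contiguously starting at the least significant bit
of the first register; that layout, and the function names, are assumptions
made for illustration, and the main \Specification remains the authority.

```python
def regs_spanned(vl: int, elwidth_bits: int, xlen: int = 64) -> int:
    """Number of XLEN-bit registers touched by VL packed elements."""
    total_bits = vl * elwidth_bits
    return -(-total_bits // xlen)  # ceiling division


def element_location(i: int, elwidth_bits: int, xlen: int = 64):
    """(register offset from rd, bit offset within that register) of element i."""
    bit = i * elwidth_bits
    return bit // xlen, bit % xlen
```

With VL=4 and 16-bit elements a single 64-bit register is used, whereas VL=5
spills one element into the next register, leaving 48 bits there that must be
preserved.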
462
463 \section{Signedness Decision Procedure}
464
465 \begin{enumerate}
466 \item
467 If the opcode field is either OP or OP-IMM, then
468
469 \indent Signedness is Unsigned.
470
471 \item
472 If the opcode field is either OP-32 or OP-IMM-32, then
473
474 \indent Signedness is Signed.
475
476 \item
477 If Signedness is encoded in a field of the base instruction, [3] then
478
479 \indent Signedness uses the encoded value.
480
481 \item
482 Otherwise,
483
484 \indent Signedness is Unsigned.
485
486 \end{enumerate}
487
488 [3] Like in fcvt.d.l[u], but unlike in fmv.x.w, since there is no fmv.x.wu
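
The decision procedure is order-sensitive, which a small function makes
explicit. This is a sketch only: the opcode names are taken from the list
above, while the function name, the string results and the \texttt{encoded}
parameter (standing for a signedness bit found in the base instruction) are
hypothetical.

```python
from typing import Optional


def signedness(opcode: str, encoded: Optional[str] = None) -> str:
    """Apply the four rules in order; the first match wins."""
    if opcode in ("OP", "OP-IMM"):
        return "unsigned"
    if opcode in ("OP-32", "OP-IMM-32"):
        return "signed"
    if encoded is not None:      # e.g. the [u] suffix of fcvt.d.l[u]
        return encoded
    return "unsigned"
```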
489
490 \section{Vector Type and Predication 5-bit (vtp5) Field Encoding}
491
492 In the following table, X denotes a wildcard that is 0 or 1 and can be a
493 different value for every occurrence.
494
495 \begin{tabular}{|l|l|l|} \hline
496 vtp5 & pred & svlen \\ \hline
497 1XXXX & vtp5[4:2] & vtp5[1:0] \\ \hline
498 01XXX & & \\ \hline
499 000XX & & \\ \hline
500 001XX & Reserved & \\ \hline
501 \end{tabular}
502
503 \section{Vector Integer Type and Predication 6-bit (vitp6) Field Encoding}
504
505 In the following table, X denotes a wildcard that is 0 or 1 and can be a
506 different value for every occurrence.
507
508 \begin{tabular}{|l|l|l|l|l|} \hline
509 vitp6 & itype & pred[2] & pred[0:1] & svlen \\ \hline
510 XX1XXX & vitp6[5:4] & 0 & vitp6[3:2] & vitp6[1:0] \\ \hline
511 XX00XX & & & & \\ \hline
512 XX01XX & Reserved & & & \\ \hline
513 \end{tabular}
514
515 \fixme{spanning cols/rows above}
516
517 vitp7 field: only tpred
518
519 \begin{tabular}{|l|l|l|l|l|} \hline
520 vitp7 & itype & tpred[2] & tpred[0:1] & svlen \\ \hline
521 XXXXXXX & vitp7[5:4] & vitp7[6] & vitp7[3:2] & vitp7[1:0] \\ \hline
522 \end{tabular}
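
For the vitp7 field the split is fixed, so it can be sketched directly. The
function name is hypothetical, and the sketch assumes the column headed
tpred[0:1] denotes the two low-order bits of tpred in the usual order.

```python
def split_vitp7(vitp7: int):
    """Return (itype, tpred, svlen) from the 7-bit vitp7 field."""
    vitp7 &= 0x7F
    itype = (vitp7 >> 4) & 0b11                                   # vitp7[5:4]
    tpred = (((vitp7 >> 6) & 1) << 2) | ((vitp7 >> 2) & 0b11)     # vitp7[6], vitp7[3:2]
    svlen = vitp7 & 0b11                                          # vitp7[1:0]
    return itype, tpred, svlen
```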
523
524 \section{48-bit Instruction Encoding Decision Procedure}
525
526 In the following decision procedure, \textit{Reserved} means that there is not yet a
527 defined 48-bit instruction encoding for the base instruction.
528
529 \begin{enumerate}
530
531 \item
532 If the base instruction is a load instruction, then
533
534 \begin{enumerate}
535 \item
536 If the base instruction is an I-type instruction, then
537 \begin{enumerate}
538 \item
539 The encoding is P48-LD-type.
540
541 \end{enumerate}
542
543 \item
544 Otherwise
545 \begin{enumerate}
546 \item
547 The encoding is \textit{Reserved}.
548
549 \end{enumerate}
550
551 \end{enumerate}
552 \item
553 If the base instruction is a store instruction, then
554
555 \begin{enumerate}
556 \item
557 If the base instruction is an S-type instruction, then
558 \begin{enumerate}
559 \item
560 The encoding is P48-ST-type.
561
562 \end{enumerate}
563
564 \item
565 Otherwise
566 \begin{enumerate}
567 \item
568 The encoding is \textit{Reserved}.
569
570 \end{enumerate}
571
572 \end{enumerate}
573
574 \item
575 If the base instruction is a SYSTEM instruction, then
576
577 \begin{enumerate}
578 \item
579 The encoding is \textit{Reserved}.
580
581 \end{enumerate}
582
583 \item
584 If the base instruction is an integer instruction, then
585
586 \begin{enumerate}
587
588 \item
589 If the base instruction is an R-type instruction, then
590 \begin{enumerate}
591 \item
592 The encoding is P48-R-type.
593
594 \end{enumerate}
595
596 \item
597 If the base instruction is an I-type instruction, then
598 \begin{enumerate}
599 \item
600 The encoding is P48-I-type.
601
602 \end{enumerate}
603
604 \item
605 If the base instruction is an S-type instruction, then
606 \begin{enumerate}
607 \item
608 The encoding is \textit{Reserved}.
609
610 \end{enumerate}
611
612 \item
If the base instruction is a B-type instruction, then
614 \begin{enumerate}
615 \item
616 The encoding is \textit{Reserved}.
617
618 \end{enumerate}
619
620 \item
If the base instruction is a U-type instruction, then
622 \begin{enumerate}
623 \item
624 The encoding is P48-U-type.
625
626 \end{enumerate}
627
628 \item
If the base instruction is a J-type instruction, then
630 \begin{enumerate}
631 \item
632 The encoding is \textit{Reserved}.
633
634 \end{enumerate}
635
636 \item
637 Otherwise
638 \begin{enumerate}
639 \item
640 The encoding is \textit{Reserved}.
641
642 \end{enumerate}
643
644 \end{enumerate}
645
646 \item
647 If the base instruction is a floating-point instruction, then
648
649 \begin{enumerate}
650
651 \item
652 If the base instruction is an R-type instruction, then
653 \begin{enumerate}
654 \item
655 The encoding is P48-FR-type.
656
657 \end{enumerate}
658
659 \item
660 If the base instruction is an I-type instruction, then
661 \begin{enumerate}
662 \item
663 The encoding is P48-FI-type.
664
665 \end{enumerate}
666
667 \item
668 If the base instruction is an S-type instruction, then
669 \begin{enumerate}
670 \item
671 The encoding is \textit{Reserved}.
672
673 \end{enumerate}
674
675 \item
If the base instruction is a B-type instruction, then
677 \begin{enumerate}
678 \item
679 The encoding is \textit{Reserved}.
680
681 \end{enumerate}
682
683 \item
If the base instruction is a U-type instruction, then
685 \begin{enumerate}
686 \item
687 The encoding is \textit{Reserved}.
688
689 \end{enumerate}
690
691 \item
If the base instruction is a J-type instruction, then
693 \begin{enumerate}
694 \item
695 The encoding is \textit{Reserved}.
696
697 \end{enumerate}
698
699 \item
700 If the base instruction is an R4-type instruction, then
701 \begin{enumerate}
702 \item
703 The encoding is P48-FR4-type.
704
705 \end{enumerate}
706
707 \item
708 Otherwise
709 \begin{enumerate}
710 \item
711 The encoding is \textit{Reserved}.
712
713 \end{enumerate}
714 \end{enumerate}
715
716 \item
717 Otherwise
718 The encoding is \textit{Reserved}.
719
720 \end{enumerate}
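
Since every branch of the procedure ends in either a named encoding or
\textit{Reserved}, it can be condensed into a lookup table. The sketch below
assumes the caller has already classified the base instruction; the class
strings ("load", "store", "integer", "fp") and the function name are
hypothetical labels, not terms defined by the proposal.

```python
P48_ENCODINGS = {
    ("load", "I"): "P48-LD-type",
    ("store", "S"): "P48-ST-type",
    ("integer", "R"): "P48-R-type",
    ("integer", "I"): "P48-I-type",
    ("integer", "U"): "P48-U-type",
    ("fp", "R"): "P48-FR-type",
    ("fp", "I"): "P48-FI-type",
    ("fp", "R4"): "P48-FR4-type",
}


def p48_encoding(kind: str, fmt: str) -> str:
    """Every combination not listed (including SYSTEM) is Reserved."""
    return P48_ENCODINGS.get((kind, fmt), "Reserved")
```

The classification order still matters: loads, stores and SYSTEM instructions
are tested before the generic integer and floating-point cases.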
721
722 \section{CSR Registers}
723
724 CSRs are the same as in the main \Specification, if associated functionality is implemented. They have the exact same meaning as in the main \Specification.
725
726 \begin{itemize}
727 \item
728 VL
729
730 \item
731 MVL
732
733 \item
734 SVPSTATE
735
736 \item
737 SUBVL
738
739 \end{itemize}
740
741 Associated SET and GET on the CSRs is exactly as in the main spec as well
742 (including CSRRWI and CSRRW differences).
743
744 Note that if both VLtyp and svlen are not implemented, SVPSTATE is not
745 required. Also if VL and SUBVL are not implemented, STATE from the main
746 \Specification is not required either.
747
However if partial functionality is implemented, the unimplemented bits in
STATE and SVPSTATE must be zero, and, in the UNIX Platform, an illegal
instruction exception MUST be raised if unsupported bits are written to.
751
752 SVPSTATE fields are exactly the same layout as STATE:
753
754 \begin{tabular}{|l|l|l|l|l|l|l|} \hline
(31..28) & (27..26) & (25..24) & (23..18) & (17..12) & (11..6) & (5..0) \\ \hline
756 rsvd & dsvoffs & subvl & destoffs & srcoffs & vl & maxvl \\ \hline
757 \end{tabular}
758
759 However note that where STATE stores the scalar register number to be used as
760 VL, SVPSTATE.VL actually contains the actual VL value, in an identical fashion
761 to RVV.
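
The SVPSTATE layout can be exercised with a pack/unpack pair. This sketch
handles raw field values only and makes no assumption about any offset applied
to vl or maxvl; the names \texttt{SVPSTATE\_FIELDS}, \texttt{pack\_svpstate}
and \texttt{unpack\_svpstate} are hypothetical.

```python
# (name, least significant bit, width) taken from the table above
SVPSTATE_FIELDS = [
    ("maxvl", 0, 6),
    ("vl", 6, 6),
    ("srcoffs", 12, 6),
    ("destoffs", 18, 6),
    ("subvl", 24, 2),
    ("dsvoffs", 26, 2),
]


def unpack_svpstate(value: int) -> dict:
    """Split a 32-bit SVPSTATE value into its named fields."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, lsb, width in SVPSTATE_FIELDS}


def pack_svpstate(**fields: int) -> int:
    """Assemble a 32-bit value; bits 31..28 (rsvd) are left as zero."""
    value = 0
    for name, lsb, width in SVPSTATE_FIELDS:
        v = fields.get(name, 0)
        if v < 0 or v >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        value |= v << lsb
    return value
```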
762
763 \section{Additional Instructions}
764
765 \begin{itemize}
766 \item
767 Add instructions to convert between integer types.
768
769 \item
770 Add instructions to swizzle elements in sub-vectors. Note that the
771 sub-vector lengths of the source and destination won't necessarily match.
772
773 \item
774 Add instructions to transpose (2-4)x(2-4) element matrices.
775
776 \item
777 Add instructions to insert or extract a sub-vector from a vector, with the
778 index allowed to be both immediate and from a register (immediate can be
779 covered by twin-predication, register might be, by virtue of predicates
780 being registers)
781
782 \item
783 Add a register gather instruction (aka MV.X: regfile[rd] =
784 regfile[regfile[rs1]])
785
786 \end{itemize}
787
788 subelement swizzle example:
789
790 velswizzle x32, x64, SRCSUBVL=3, DESTSUBVL=4, ELTYPE=u8, elements=[0, 0, 2, 1]
791
792 \section{Questions}
793
794 Moved to the discussion page (link at top of this page)
795
796 \section{TODO}
797
798 Work out a way to do sub-element swizzling.