% https://bugs.libre-soc.org/show_bug.cgi?id=213
% SimpleV Prefix (SVprefix) Proposal v0.3
% https://libre-soc.org/simple_v_extension/sv_prefix_proposal/

\newcommand{\Specification}{{\href{https://libre-soc.org/simple_v_extension/specification/}{Specification}}}

\chapter{SimpleV Prefix Proposal -- v0.3}

Copyright (c) Jacob Lifshay, 2019

Copyright (c) Luke Kenneth Casson Leighton, 2019

This proposal is designed to be able to operate without SVorig, but not to
require the absence of SVorig. See \Specification.

Principle: SVprefix embeds (unmodified) RVC and 32-bit scalar opcodes into 32,
48 and 64 bit RV formats, to provide Vectorisation context on a per-instruction
basis.

The following partial / full implementation options are possible:

\begin{itemize}
\item SVPrefix augments the main \Specification
\item SVPrefix operates independently, without the main spec VL (and MVL)
  \gls{CSR}s (in any privilege level)
\item SVPrefix operates independently, without the main spec SUBVL CSRs
  (in any privilege level)
\item SVPrefix has no support for VL (or MVL) overrides in the 64 bit
  instruction format (VLtyp=0 as the only legal permitted value)
\item SVPrefix has no support for svlen overrides in either the 48 or 64 bit
  instruction format (svlen=0 as the only legal permitted value).
\end{itemize}

All permutations of the above options are permitted, and the UNIX platform must
raise illegal instruction exceptions on implementations that do not support
each option. For example, an implementation that has no support for VLtyp that
sees an opcode with a nonzero VLtyp must raise an illegal instruction exception.

Note that SVPrefix (VLtyp and svlen) has its own STATE CSR, SVPSTATE. This
allows Prefixed operations to be re-entrant on traps, and to not affect VBLOCK

If the main \Specification{} CSRs and features are to be supported (VBLOCK), then
when VLtyp or svlen are ``default'' they utilise the main \Specification{} VBLOCK VL
and/or SUBVL, and, correspondingly, the main VBLOCK STATE CSR will be updated
and used to track hardware loops.

If however VLtyp is set to nondefault, then the SVPSTATE srcoffs and destoffs
fields are used instead to create the hardware loops, and likewise if svlen is
set to nondefault, SVPSTATE's svoffs field is used.

\section{Half-Precision Floating Point (FP16)}

If the F extension is supported, SVprefix adds support for FP16 in the base FP
instructions by using 10 (H) in the floating-point format field fmt and using
001 (H) in the floating-point load/store width field.

\section{Compressed Instructions}

Compressed instructions are under evaluation by taking the same prefix as used
in P48, embedding that and standard RVC opcodes (minus their RVC prefix) into a
32-bit space. This is done by taking the three remaining Major ``custom''
opcodes (0-2), one for each of the three RVC Quadrants. See
\textbf{discussion ???}.

\section{48-bit Prefixed Instructions}

All 48-bit prefixed instructions contain a 32-bit ``base'' instruction as the
last 4 bytes. Since all 32-bit instructions have bits 1:0 set to 11, those bits
are reused for additional encoding space in the 48-bit instructions.

\section{64-bit Prefixed Instructions}

The 48 bit format is further extended with the full 128-register range on all
source and destination registers, and the option to set both SVPSTATE.VL and
SVPSTATE.MAXVL is provided.

\section{48-bit Instruction Encodings}

In the following table, Rsvd (reserved) entries must be zero. RV32 equivalent
encodings are included for side-by-side comparison (and listed below, separately).

\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|} \hline
Encoding & 17 & 16 & 15 & 14 & 13 & 12 & 11:7 & 6 & 5:0 \\ \hline
P48-LD-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
P48-ST-type & vitp7[6] & rs1[5] & rs2[5] & vs2 & vs1 & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
P48-R-type & rd[5] & rs1[5] & rs2[5] & vs2 & vs1 & \multicolumn{2}{l|}{vitp6} & Rsvd & 011111 \\ \hline
P48-I-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
P48-U-type & rd[5] & Rsvd & Rsvd & vd & Rsvd & \multicolumn{2}{l|}{vitp6} & Rsvd & 011111 \\ \hline
P48-FR-type & rd[5] & rs1[5] & rs2[5] & vs2 & vs1 & Rsvd & vtp5 & Rsvd & 011111 \\ \hline
P48-FI-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
P48-FR4-type & rd[5] & rs1[5] & rs2[5] & vs2 & rs3[5] & vs3 [1] & vtp5 & Rsvd & 011111 \\ \hline
\end{tabular}

\fixme{The link to [1] is easily confused with the likes of [5]}

[1] Only vs2 and vs3 are included in the P48-FR4-type encoding because there is
not enough space for vs1 as well, and because it is more useful to have a
scalar argument for each of the multiplication and addition portions of fmadd
than to have two scalars on the multiplication portion.

Table showing correspondence between P48-*-type and RV32-*-type. These are bits
47:18 (RV32 shifted up by 16 bits):

\begin{tabular}{|l|l|} \hline
Encoding & RV32 Encoding \\ \hline
47:18 & 31:2 \\ \hline
P48-LD-type & RV32-I-type \\ \hline
P48-ST-type & RV32-S-type \\ \hline
P48-R-type & RV32-R-type \\ \hline
P48-I-type & RV32-I-type \\ \hline
P48-U-type & RV32-U-type \\ \hline
P48-FR-type & RV32-FR-type \\ \hline
P48-FI-type & RV32-I-type \\ \hline
P48-FR4-type & RV32-FR4-type \\ \hline
\end{tabular}
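
To illustrate the correspondence above, the following is a minimal Python sketch of recovering the embedded RV32 instruction from a 48-bit prefixed instruction. It assumes, as stated above, that bits 47:18 carry RV32 bits 31:2 and that the dropped RV32 bits 1:0 are always 0b11. The function names are hypothetical and are not part of this proposal.

```python
P48_PREFIX = 0b011111  # bits 5:0 of every 48-bit prefixed instruction


def p48_to_rv32(insn48: int) -> int:
    """Recover the embedded 32-bit base instruction from a P48 instruction."""
    if insn48 & 0x3F != P48_PREFIX:
        raise ValueError("not a 48-bit prefixed instruction")
    # Bits 47:18 are RV32 bits 31:2 shifted up by 16; bits 1:0 are always 0b11.
    return ((insn48 >> 16) & 0xFFFF_FFFC) | 0b11


def rv32_to_p48(rv32: int, prefix_bits_17_7: int = 0) -> int:
    """Embed an RV32 instruction; prefix_bits_17_7 is the 11-bit P48 field."""
    if rv32 & 0b11 != 0b11:
        raise ValueError("not a 32-bit RV instruction")
    body = (rv32 & 0xFFFF_FFFC) << 16          # RV32[31:2] -> bits 47:18
    prefix = (prefix_bits_17_7 & 0x7FF) << 7   # P48 fields -> bits 17:7
    return body | prefix | P48_PREFIX          # bit 6 (Rsvd) stays zero
```

The round trip `p48_to_rv32(rv32_to_p48(x))` returns `x` for any valid 32-bit instruction, which reflects the principle that the base opcode is embedded unmodified.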

Table showing Standard RV32 encodings:

\begin{tabular}{|l|l|l|l|l|l|l|l|l|} \hline
Encoding & 31:27 & 26:25 & 24:20 & 19:15 & 14:12 & 11:7 & 6:2 & 1:0 \\ \hline
RV32-R-type & \multicolumn{2}{l|}{funct7} & rs2[4:0] & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline
RV32-S-type & \multicolumn{2}{l|}{imm[11:5]} & rs2[4:0] & rs1[4:0] & funct3 & imm[4:0] & opcode & 0b11 \\ \hline
RV32-I-type & \multicolumn{3}{l|}{imm[11:0]} & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline
RV32-U-type & \multicolumn{5}{l|}{imm[31:12]} & rd[4:0] & opcode & 0b11 \\ \hline
RV32-FR4-type & rs3[4:0] & fmt & rs2[4:0] & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline
RV32-FR-type & funct5 & fmt & rs2[4:0] & rs1[4:0] & rm & rd[4:0] & opcode & 0b11 \\ \hline
\end{tabular}

\section{64-bit Instruction Encodings}

Where in the 48 bit format the prefix is ``0b0011111'' in bits 0 to 6, this is
now set to ``0b0111111''.

\begin{tabular}{|l|l|l|l|} \hline
63:48 & 47:18 & 17:7 & 6:0 \\ \hline
64 bit prefix & RV32[31:2] & P48[17:7] & 0b0111111 \\ \hline
\end{tabular}

\begin{itemize}
\item The 64 bit prefix format is below
\item Bits 18 to 47 contain bits 2 to 31 of a standard RV32 format
\item Bits 7 to 17 contain bits 7 through 17 of the P48 format
\item Bits 0 to 6 contain the standard RV 64-bit prefix 0b0111111
\end{itemize}
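
As an illustration of how a decoder might tell the formats apart, the sketch below classifies an instruction by its low-order bits, using the standard RISC-V length encoding together with the 0b011111 and 0b0111111 prefixes given above, and then splits a 64-bit instruction into its three parts. The helper names are hypothetical, and the RV32 field is assumed to carry bits 31:2 of the base instruction in bits 47:18, by analogy with the 48-bit layout.

```python
def insn_length_bits(low16: int) -> int:
    """Instruction length in bits, judged from the lowest 16 bits."""
    if low16 & 0b11 != 0b11:
        return 16                      # RVC
    if low16 & 0b11100 != 0b11100:
        return 32                      # standard 32-bit
    if low16 & 0b111111 == 0b011111:
        return 48                      # P48 prefix
    if low16 & 0b1111111 == 0b0111111:
        return 64                      # P64 prefix
    raise ValueError("longer or reserved length encoding")


def split_p64(insn64: int) -> dict:
    """Split a 64-bit prefixed instruction into its component fields."""
    if insn_length_bits(insn64 & 0xFFFF) != 64:
        raise ValueError("not a 64-bit prefixed instruction")
    return {
        "prefix64": (insn64 >> 48) & 0xFFFF,            # bits 63:48
        "rv32": ((insn64 >> 16) & 0xFFFF_FFFC) | 0b11,  # bits 47:18
        "p48": (insn64 >> 7) & 0x7FF,                   # bits 17:7
    }
```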

64 bit prefix format:

\begin{tabular}{|l|l|l|l|l|l|} \hline
Encoding & 63 & 62 & 61 & 60 & 59:48 \\ \hline
P64-LD-type & rd[6] & rs1[6] & & Rsvd & VLtyp \\ \hline
P64-ST-type & & rs1[6] & rs2[6] & Rsvd & VLtyp \\ \hline
P64-R-type & rd[6] & rs1[6] & rs2[6] & vd & VLtyp \\ \hline
P64-I-type & rd[6] & rs1[6] & & Rsvd & VLtyp \\ \hline
P64-U-type & rd[6] & & & Rsvd & VLtyp \\ \hline
P64-FR-type & & rs1[6] & rs2[6] & vd & VLtyp \\ \hline
P64-FI-type & rd[6] & rs1[6] & rs2[6] & vd & VLtyp \\ \hline
P64-FR4-type & rd[6] & rs1[6] & rs2[6] & rs3[6] & VLtyp \\ \hline
\end{tabular}

The extra bit for src and dest registers provides the full range of up to 128
registers, when combined with the extra bit from the 48 bit prefix as well.
VLtyp encodes how (whether) to set SVPSTATE.VL and SVPSTATE.MAXVL.

\section{VLtyp field encoding}

NOTE: VL and MVL below are local to SVPrefix and, if non-default, will update
the src and dest element offsets in SVPSTATE, not the main \Specification{} STATE.
If default (all zeros) then STATE VL and MVL apply to this instruction, and
STATE.srcoffs (etc) will be used.

\begin{tabular}{|l|l|l|l|l|} \hline
VLtyp[11] & VLtyp[10:6] & VLtyp[5:1] & VLtyp[0] & comment \\ \hline
0 & 00000 & 00000 & 0 & no change to VL/MVL \\ \hline
0 & VLdest & VLEN & vlt & VL imm/reg mode (vlt) \\ \hline
1 & VLdest & MVL+VL-immed & 0 & MVL+VL immed mode \\ \hline
1 & VLdest & MVL-immed & 1 & MVL immed mode \\ \hline
\end{tabular}

Note: when VLtyp is all zeros, the main \Specification{} VL and MVL apply to this
instruction. If called outside of a VBLOCK or if sv.setvl has not set VL, the
operation is ``scalar''.

Just as in the VBLOCK format, when bit 11 of VLtyp is zero:

\begin{itemize}
\item if vlt is zero, bits 1 to 5 specify the VLEN as a 5 bit immediate (offset
  by 1: 0b00000 represents VL=1, 0b00001 represents VL=2 etc.)
\item if vlt is 1, bits 1 to 5 specify the scalar (RV standard) register from
  which VL is set. x0 is not permitted
\item VL goes into the scalar register VLdest (if VLdest is not x0)
\end{itemize}

When bit 11 of VLtyp is 1:

\begin{itemize}
\item if VLtyp[0] is zero, both SVPSTATE.MAXVL and SVPSTATE.VL are set to
  (imm+1). The same value goes into the scalar register VLdest (if VLdest is
  not x0).
\item if VLtyp[0] is 1, SVPSTATE.MAXVL is set to (imm+1). SVPSTATE.VL will be
  truncated to within the new range (if VL was greater than the new MAXVL).
  The new VL goes into the scalar register VLdest (if VLdest is not x0).
\end{itemize}

This gives the option to set up SVPSTATE.VL in a ``loop mode'' (VLtyp[11]=0) or
in a ``one-off'' mode (VLtyp[11]=1) which sets both MVL and VL to the same
immediate value. This may be most useful for one-off Vectorised operations such
as LOAD-MULTI / STORE-MULTI, for saving and restoration of large batches of
registers in context-switches or function calls.

Note that VLtyp's VL and MVL are not the same as the main \Specification{} VL or
MVL, and that loops will alter srcoffs and destoffs in SVPSTATE in VLtyp
nondefault mode, but the srcoffs and destoffs in STATE, if VLtyp=0.

Furthermore, the execution order and exception handling must be exactly the
same as in the main spec (Program Order must be preserved).

Pseudocode for SVPSTATE.VL:

\begin{verbatim}
// instruction fields:
...
vlmax = get_immed_field();
...
// handle illegal instruction decoding
...
if rs1 == 0 { // rs1 is x0
...
vl = min(regs[rs1], vlmax)
...
\end{verbatim}
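
The VLtyp rules described in this section can be sketched in Python as follows. This is an illustrative model only: the `SVPState` class and `apply_vltyp` function are hypothetical names, and clamping the requested VL to MAXVL in ``loop mode'' is an assumption suggested by the `min(...)` in the pseudocode fragment above rather than something stated in the prose.

```python
from dataclasses import dataclass


@dataclass
class SVPState:
    vl: int = 1
    maxvl: int = 1


def apply_vltyp(vltyp: int, regs: list, st: SVPState) -> None:
    """Update SVPSTATE and the scalar regfile from a 12-bit VLtyp field."""
    if vltyp == 0:
        return                          # default: main-spec VL/MVL apply
    mode = (vltyp >> 11) & 1            # VLtyp[11]
    vldest = (vltyp >> 6) & 0x1F        # VLtyp[10:6]
    field = (vltyp >> 1) & 0x1F         # VLtyp[5:1]
    low = vltyp & 1                     # VLtyp[0]
    if mode == 0:
        if low == 0:                    # vlt=0: immediate, offset by 1
            requested = field + 1
        else:                           # vlt=1: VL from a scalar register
            if field == 0:
                raise ValueError("illegal instruction: x0 not permitted")
            requested = regs[field]
        st.vl = min(requested, st.maxvl)    # assumption: clamp to MAXVL
    elif low == 0:                      # MVL+VL immediate mode
        st.maxvl = st.vl = field + 1
    else:                               # MVL immediate mode
        st.maxvl = field + 1
        st.vl = min(st.vl, st.maxvl)    # truncate VL to the new range
    if vldest != 0:                     # x0 as VLdest discards the result
        regs[vldest] = st.vl
```

For example, a VLtyp of `(1 << 11) | (6 << 6) | (15 << 1)` selects the ``one-off'' mode, sets both MAXVL and VL to 16, and copies 16 into x6.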

\section{vs\#/vd Fields' Encoding}

% Note tabularx - as the 3rd field needs to wrap otherwise it overflows the line
\begin{tabularx}{\textwidth}{|l|l|X|} \hline
vs\#/vd & Mnemonic & Meaning \\ \hline
0 & S & the rs\#/rd field specifies a scalar (single sub-vector);
the rs\#/rd field is zero-extended to get the actual 7-bit register number \\ \hline
1 & V & the rs\#/rd field specifies a vector; the rs\#/rd field is decoded using
the Vector Register Number Encoding to get the actual 7-bit register number \\ \hline
\end{tabularx}

\fixme{Vector Register Number Encoding should be a link}

If a vs\#/vd field is not present, it is as if it was present with a value that
is the bitwise-or of all present vs\#/vd fields.

\begin{itemize}
\item Scalar register numbers do NOT increment when allocated in the hardware
  for-loop. The same scalar register number is handed to every ALU.
\item Vector register numbers DO increase when allocated in the hardware
  for-loop. Sequentially-increasing register data is handed to sequential
  ALUs.
\end{itemize}

\section{Vector Register Number Encoding}

For the 48 bit format, when vs\#/vd is 1, the actual 7-bit register number is
derived from the corresponding 6-bit rs\#/rd field:

\begin{tabular}{|l|l|l|} \hline
\multicolumn{3}{|c|}{Actual 7-bit register number} \\ \hline
Bit 6 & Bits 5:1 & Bit 0 \\ \hline
rs\#/rd[0] & rs\#/rd[5:1] & 0 \\ \hline
\end{tabular}

For the 64 bit format, the 7 bit register number is constructed from three
fields: bits 0 to 4 from the 32 bit RV Standard format, bit 5 from the 48 bit
prefix and bit 6 from the 64 bit prefix. Thus in the 64 bit format the full
range of up to 128 registers is directly available. This applies when either
scalar or vector is selected.
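
A minimal Python sketch of the two register-number rules above is given here for illustration. The function names are hypothetical. In the 48-bit case the 6-bit field is taken to be the prefix bit (bit 5) concatenated with the 5-bit RV32 field, as described in the encoding tables.

```python
def regnum_p48(field6: int, is_vector: bool) -> int:
    """7-bit register number from a 6-bit rs#/rd field (48-bit format)."""
    field6 &= 0x3F
    if not is_vector:
        return field6                       # scalar: zero-extended
    bit6 = field6 & 1                       # field bit 0 becomes bit 6
    bits5_1 = (field6 >> 1) & 0x1F          # field bits 5:1 stay in place
    return (bit6 << 6) | (bits5_1 << 1)     # bit 0 is always 0


def regnum_p64(rv32_bits: int, p48_bit: int, p64_bit: int) -> int:
    """7-bit register number in the 64-bit format (scalar or vector)."""
    return ((p64_bit & 1) << 6) | ((p48_bit & 1) << 5) | (rv32_bits & 0x1F)
```

Note that in the 48-bit vector case only even register numbers are reachable, since bit 0 of the result is fixed at zero.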

\section{Load/Store Kind (lsk) Field Encoding}

\begin{tabular}{|l|l|l|} \hline
vd/vs2 & vs1 & Meaning \\ \hline
0 & 0 & srcbase is scalar, LD/ST is pure scalar. \\ \hline
1 & 0 & srcbase is scalar, LD/ST is unit strided \\ \hline
0 & 1 & srcbase is a vector (gather/scatter aka array of srcbases). VSPLAT and VSELECT \\ \hline
1 & 1 & srcbase is a vector, LD/ST is a full vector LD/ST. \\ \hline
\end{tabular}

\begin{itemize}
\item A register strided LD/ST would require 5 registers: srcbase, vd/vs2,
  predicate 1, predicate 2 and the stride register.
\item Complex strides may all be done with a general purpose vector of srcbases.
\item Twin predication may be used even when vd/vs1 is a scalar, to give VSPLAT
  and VSELECT, because the hardware loop ends on the first occurrence of a 1
  in the predicate when a predicate is applied to a scalar.
\item Full vectorised gather/scatter is enabled when both registers are marked as
  vectorised, however unlike e.g. Intel AVX512, twin predication can be
\end{itemize}

Open question: RVV overloads the width field of LOAD-FP/STORE-FP using the bit
2 to indicate additional interpretation of the 11 bit immediate. Should this be

\section{Sub-Vector Length (svlen) Field Encoding}

NOTE: svlen is not the same as the main spec SUBVL. When nondefault (not zero)
SVPSTATE context is used for Sub vector loops. However if svlen is zero, STATE
and SUBVL are used instead.

Bitwidth, from VL's perspective, is a multiple of the elwidth times svlen. So
within each loop of VL there are svlen sub-elements of elwidth in size, just
like in a SIMD architecture. When svlen is set to 0b00 (indicating svlen=1) no
such SIMD-like behaviour exists and the subvectoring is disabled.

Predicate bits do not apply to the individual sub-vector elements, they apply
to the entire subvector group. This saves instructions on setup of the

\begin{tabular}{|l|l|} \hline
svlen Encoding & Value \\ \hline
\end{tabular}

In independent standalone implementations that do not implement the main
\Specification, the value of SUBVL in the above table (svtyp=0b00) is set to 1,
such that svlen is also 1.

Behaviour of operations that set svlen is identical to that of the main spec.
See section on VLtyp, above.

\section{Predication (pred) Field Encoding}

\begin{tabular}{|l|l|l|l|} \hline
pred & Mnemonic & Predicate Register & Meaning \\ \hline
000 & None & None & The instruction is unpredicated \\ \hline
001 & Reserved & Reserved & \\ \hline
010 & !x9 & \multirow{2}{*}{x9 (s1)} & execute vector op[0..i] on x9[i] == 0 \\ \cline{1-2} \cline{4-4}
011 & x9 & & execute vector op[0..i] on x9[i] == 1 \\ \hline
100 & !x10 & \multirow{2}{*}{x10 (a0)} & execute vector op[0..i] on x10[i] == 0 \\ \cline{1-2} \cline{4-4}
101 & x10 & & execute vector op[0..i] on x10[i] == 1 \\ \hline
110 & !x11 & \multirow{2}{*}{x11 (a1)} & execute vector op[0..i] on x11[i] == 0 \\ \cline{1-2} \cline{4-4}
111 & x11 & & execute vector op[0..i] on x11[i] == 1 \\ \hline
\end{tabular}
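
The pred table above can be read as a two-bit register selector plus a one-bit polarity. The following Python sketch, using hypothetical helper names, decodes the field and lists which element indices of a VL-length operation would execute, assuming bit i of the predicate register governs element i.

```python
PRED_REGS = {0b01: 9, 0b10: 10, 0b11: 11}   # pred[2:1] -> x9, x10, x11


def decode_pred(pred: int):
    """Return None if unpredicated, else (register number, active bit value)."""
    if pred == 0b000:
        return None
    if pred == 0b001:
        raise ValueError("reserved pred encoding")
    return PRED_REGS[pred >> 1], pred & 1


def active_elements(pred: int, regs: list, vl: int) -> list:
    """Indices of the elements on which the vector operation executes."""
    decoded = decode_pred(pred)
    if decoded is None:
        return list(range(vl))
    reg, active = decoded
    return [i for i in range(vl) if ((regs[reg] >> i) & 1) == active]
```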

\section{Twin-predication (tpred) Field Encoding}

Twin-predication (ability to associate two predicate registers with an
instruction) applies to MV, FCLASS, LD and ST. The same format also applies to
integer-branch-compare operations although it is not to be considered ``twin''
predication. In the case of integer-branch-compare operations, the second
register (if enabled) stores the results of the element comparisons. See
Appendix for details.

\fixme{Appendix above is link to http://libre\-riscv.org/simple\_v\_extension/appendix/}

\begin{tabular}{|l|l|l|l|} \hline
pred & Mnemonic & Predicate Register & Meaning \\ \hline
000 & None & None & The instruction is unpredicated \\ \hline
001 & x9,off & src=x9, dest=none & src[0..i] uses x9[i], dest unpredicated \\ \hline
010 & off,x10 & src=none, dest=x10 & dest[0..i] uses x10[i], src unpredicated \\ \hline
011 & x9,x10 & src=x9, dest=x10 & src[0..i] uses x9[i], dest[0..i] uses x10[i] \\ \hline
100 & None & RESERVED & Instruction is unpredicated (TBD) \\ \hline
101 & !x9,off & src=!x9, dest=none & \\ \hline
110 & off,!x10 & src=none, dest=!x10 & \\ \hline
111 & !x9,!x10 & src=!x9, dest=!x10 & \\ \hline
\end{tabular}

\fixme{In table above some in col 3 might be vertically joined}
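
Reading the tpred table above, bit 0 enables x9 as the source predicate, bit 1 enables x10 as the destination predicate, and bit 2 inverts whichever predicates are enabled, with 0b100 reserved. A short Python sketch under that reading (the function name is hypothetical):

```python
def decode_tpred(tpred: int):
    """Return (src_pred_reg, dest_pred_reg, inverted); None means unpredicated."""
    tpred &= 0b111
    if tpred == 0b100:
        return None, None, False        # reserved; unpredicated for now (TBD)
    inverted = bool(tpred & 0b100)
    src = 9 if tpred & 0b001 else None      # x9 predicates the source
    dest = 10 if tpred & 0b010 else None    # x10 predicates the destination
    return src, dest, inverted
```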

\section{Integer Element Type (itype) Field Encoding}

\begin{tabularx}{\textwidth}{|l|l|l|X|X|X|} \hline
Signedness [2] & itype & Element Type & Mnemonic in Integer Instructions & Mnemonic in FP Instructions (such as fmv.x) & Meaning (INT may be un/signed, FP just re-sized) \\ \hline
Unsigned & 01 & u8 & BU & BU & Unsigned 8-bit \\ \hline
 & 10 & u16 & HU & HU & Unsigned 16-bit \\ \hline
 & 11 & u32 & WU & WU & Unsigned 32-bit \\ \hline
 & 00 & uXLEN & WU/DU/QU & WU/LU/TU & Unsigned XLEN-bit \\ \hline
Signed & 01 & i8 & BS & BS & Signed 8-bit \\ \hline
 & 10 & i16 & HS & HS & Signed 16-bit \\ \hline
 & 11 & i32 & W & W & Signed 32-bit \\ \hline
 & 00 & iXLEN & W/D/Q & W/L/T & Signed XLEN-bit \\ \hline
\end{tabularx}

[2] Signedness is defined in Signedness Decision Procedure

Note: vector mode is effectively a type-cast of the register file as if it was
a sequential array being typecast to typedef itype[] (c syntax). The starting
point of the ``typecast'' is the vector register rs\#/rd.

Example: if itype=0b10 (u16), and rd is set to ``vector'', and VL is set to 4,
the 64-bit register at rd is subdivided into FOUR 16-bit destination elements.
It is NOT four separate 64-bit destination registers (rd+0, rd+1, rd+2, rd+3)
that are sign-extended from the source width size out to 64-bit, because that
is itype=0b00 (uXLEN).
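
The example above can be modelled in a few lines of Python. This sketch treats the register file as a list of XLEN-bit integers and packs elements starting at register rd. Placing element 0 in the least significant bits is an assumption made for illustration, and the function names are hypothetical.

```python
ITYPE_BITS = {0b01: 8, 0b10: 16, 0b11: 32}   # 0b00 selects XLEN


def elwidth(itype: int, xlen: int = 64) -> int:
    """Element width in bits for a given itype."""
    return ITYPE_BITS.get(itype, xlen)


def write_elements(regfile: list, rd: int, values: list, itype: int,
                   xlen: int = 64) -> None:
    """Write values as packed elements, starting at register rd."""
    width = elwidth(itype, xlen)
    per_reg = xlen // width
    mask = (1 << width) - 1
    for i, v in enumerate(values):
        reg = rd + i // per_reg              # which register holds element i
        shift = (i % per_reg) * width        # assumed: element 0 at the LSBs
        regfile[reg] = (regfile[reg] & ~(mask << shift)) | ((v & mask) << shift)
```

With itype=0b10 and four values, only `regfile[rd]` changes. With itype=0b00 the same four values occupy rd through rd+3.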

Note also: changing elwidth creates packed elements that, depending on VL, may
create vectors that do not fit perfectly onto XLEN sized register file
bit-boundaries. This does NOT result in the destruction of the MSBs of the last
register written to at the end of a VL loop. More details on how to handle this
are described in the main \Specification.

\section{Signedness Decision Procedure}

\begin{enumerate}
\item If the opcode field is either OP or OP-IMM, then Signedness is Unsigned.
\item If the opcode field is either OP-32 or OP-IMM-32, then Signedness is Signed.
\item If Signedness is encoded in a field of the base instruction, [3] then
  Signedness uses the encoded value.
\item Otherwise, Signedness is Unsigned.
\end{enumerate}

[3] Like in fcvt.d.l[u], but unlike in fmv.x.w, since there is no fmv.x.wu
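
The procedure above amounts to a short chain of tests evaluated in order. A Python sketch follows, in which opcode names are passed as strings and `encoded` is a hypothetical parameter standing for signedness carried by the base instruction itself (for example the [u] in fcvt.d.l[u]):

```python
def signedness(opcode: str, encoded: str = None) -> str:
    """Return "signed" or "unsigned" following the decision procedure."""
    if opcode in ("OP", "OP-IMM"):
        return "unsigned"
    if opcode in ("OP-32", "OP-IMM-32"):
        return "signed"
    if encoded is not None:        # signedness is a field of the base insn
        return encoded
    return "unsigned"              # final fallback
```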

\section{Vector Type and Predication 5-bit (vtp5) Field Encoding}

In the following table, X denotes a wildcard that is 0 or 1 and can be a
different value for every occurrence.

\begin{tabular}{|l|l|l|} \hline
vtp5 & pred & svlen \\ \hline
1XXXX & vtp5[4:2] & vtp5[1:0] \\ \hline
001XX & Reserved & \\ \hline
\end{tabular}

\section{Vector Integer Type and Predication 6-bit (vitp6) Field Encoding}

In the following table, X denotes a wildcard that is 0 or 1 and can be a
different value for every occurrence.

\begin{tabular}{|l|l|l|l|l|} \hline
vitp6 & itype & pred[2] & pred[0:1] & svlen \\ \hline
XX1XXX & vitp6[5:4] & 0 & vitp6[3:2] & vitp6[1:0] \\ \hline
XX00XX & & & & \\ \hline
XX01XX & Reserved & & & \\ \hline
\end{tabular}

\fixme{spanning cols/rows above}

vitp7 field: only tpred

\begin{tabular}{|l|l|l|l|l|} \hline
vitp7 & itype & tpred[2] & tpred[0:1] & svlen \\ \hline
XXXXXXX & vitp7[5:4] & vitp7[6] & vitp7[3:2] & vitp7[1:0] \\ \hline
\end{tabular}
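
For illustration, the vitp7 layout in the table above can be unpacked as below. The sketch assumes that tpred[2] is the most significant bit of tpred and that vitp7[3:2] supplies the two low tpred bits in the same order. The table's ``tpred[0:1]'' notation leaves the bit order open, so this ordering is an assumption, as is the function name.

```python
def decode_vitp7(vitp7: int) -> dict:
    """Split a 7-bit vitp7 field into itype, tpred and svlen."""
    vitp7 &= 0x7F
    tpred_hi = (vitp7 >> 6) & 1          # vitp7[6]   -> tpred[2]
    tpred_lo = (vitp7 >> 2) & 0b11       # vitp7[3:2] -> low two tpred bits
    return {
        "itype": (vitp7 >> 4) & 0b11,    # vitp7[5:4]
        "tpred": (tpred_hi << 2) | tpred_lo,
        "svlen": vitp7 & 0b11,           # vitp7[1:0]
    }
```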

\section{48-bit Instruction Encoding Decision Procedure}

In the following decision procedure, \textit{Reserved} means that there is not
yet a defined 48-bit instruction encoding for the base instruction.

\begin{enumerate}
\item If the base instruction is a load instruction, then
  \begin{enumerate}
  \item If the base instruction is an I-type instruction, then the encoding is P48-LD-type.
  \item Otherwise, the encoding is \textit{Reserved}.
  \end{enumerate}
\item If the base instruction is a store instruction, then
  \begin{enumerate}
  \item If the base instruction is an S-type instruction, then the encoding is P48-ST-type.
  \item Otherwise, the encoding is \textit{Reserved}.
  \end{enumerate}
\item If the base instruction is a SYSTEM instruction, then the encoding is \textit{Reserved}.
\item If the base instruction is an integer instruction, then
  \begin{enumerate}
  \item If the base instruction is an R-type instruction, then the encoding is P48-R-type.
  \item If the base instruction is an I-type instruction, then the encoding is P48-I-type.
  \item If the base instruction is an S-type instruction, then the encoding is \textit{Reserved}.
  \item If the base instruction is a B-type instruction, then the encoding is \textit{Reserved}.
  \item If the base instruction is a U-type instruction, then the encoding is P48-U-type.
  \item If the base instruction is a J-type instruction, then the encoding is \textit{Reserved}.
  \item Otherwise, the encoding is \textit{Reserved}.
  \end{enumerate}
\item If the base instruction is a floating-point instruction, then
  \begin{enumerate}
  \item If the base instruction is an R-type instruction, then the encoding is P48-FR-type.
  \item If the base instruction is an I-type instruction, then the encoding is P48-FI-type.
  \item If the base instruction is an S-type instruction, then the encoding is \textit{Reserved}.
  \item If the base instruction is a B-type instruction, then the encoding is \textit{Reserved}.
  \item If the base instruction is a U-type instruction, then the encoding is \textit{Reserved}.
  \item If the base instruction is a J-type instruction, then the encoding is \textit{Reserved}.
  \item If the base instruction is an R4-type instruction, then the encoding is P48-FR4-type.
  \item Otherwise, the encoding is \textit{Reserved}.
  \end{enumerate}
\item Otherwise, the encoding is \textit{Reserved}.
\end{enumerate}
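
The decision procedure above reduces to a small lookup keyed on the instruction class and its base format. A Python sketch, in which the `kind` and `fmt` strings are hypothetical labels chosen for this example:

```python
RESERVED = "Reserved"


def p48_encoding(kind: str, fmt: str) -> str:
    """Map (instruction class, base format) to a P48 encoding name."""
    if kind == "load":
        return "P48-LD-type" if fmt == "I" else RESERVED
    if kind == "store":
        return "P48-ST-type" if fmt == "S" else RESERVED
    if kind == "system":
        return RESERVED
    if kind == "integer":
        table = {"R": "P48-R-type", "I": "P48-I-type", "U": "P48-U-type"}
        return table.get(fmt, RESERVED)
    if kind == "float":
        table = {"R": "P48-FR-type", "I": "P48-FI-type", "R4": "P48-FR4-type"}
        return table.get(fmt, RESERVED)
    return RESERVED
```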

\section{CSR Registers}

CSRs are the same as in the main \Specification, if associated functionality is
implemented. They have the exact same meaning as in the main \Specification.

Associated SET and GET on the CSRs is exactly as in the main spec as well
(including CSRRWI and CSRRW differences).

Note that if both VLtyp and svlen are not implemented, SVPSTATE is not
required. Also if VL and SUBVL are not implemented, STATE from the main
\Specification{} is not required either.

However if partial functionality is implemented, the unimplemented bits in
STATE and SVPSTATE must be zero, and, in the UNIX Platform, an illegal
instruction exception MUST be raised if unsupported bits are written to.

SVPSTATE fields are exactly the same layout as STATE:

\begin{tabular}{|l|l|l|l|l|l|l|} \hline
(31..28) & (27..26) & (25..24) & (23..18) & (17..12) & (11..6) & (5..0) \\ \hline
rsvd & dsvoffs & subvl & destoffs & srcoffs & vl & maxvl \\ \hline
\end{tabular}

However note that where STATE stores the scalar register number to be used as
VL, SVPSTATE.VL actually contains the actual VL value, in an identical fashion
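
As an illustration of the SVPSTATE layout in the table above, the following Python sketch packs and unpacks the fields. The helper names are hypothetical. The reserved bits 31..28 are left at zero.

```python
# name -> (least significant bit, width in bits), per the layout table
SVPSTATE_FIELDS = {
    "maxvl": (0, 6), "vl": (6, 6), "srcoffs": (12, 6),
    "destoffs": (18, 6), "subvl": (24, 2), "dsvoffs": (26, 2),
}


def unpack_svpstate(value: int) -> dict:
    """Split a 32-bit SVPSTATE value into its named fields."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in SVPSTATE_FIELDS.items()}


def pack_svpstate(**fields: int) -> int:
    """Build a 32-bit SVPSTATE value; unspecified fields are zero."""
    value = 0
    for name, field_value in fields.items():
        lsb, width = SVPSTATE_FIELDS[name]
        if not 0 <= field_value < (1 << width):
            raise ValueError(f"{name} out of range")
        value |= field_value << lsb
    return value
```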

\section{Additional Instructions}

\begin{itemize}
\item Add instructions to convert between integer types.
\item Add instructions to swizzle elements in sub-vectors. Note that the
  sub-vector lengths of the source and destination won't necessarily match.
\item Add instructions to transpose (2-4)x(2-4) element matrices.
\item Add instructions to insert or extract a sub-vector from a vector, with the
  index allowed to be both immediate and from a register (immediate can be
  covered by twin-predication, register might be, by virtue of predicates
\item Add a register gather instruction (aka MV.X: regfile[rd] =
  regfile[regfile[rs1]])
\end{itemize}
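
The register gather (MV.X) formula above is an indirect register read. A minimal Python model is sketched below. Extending it across VL elements by incrementing both rd and rs1 is an assumption made for illustration, since the text above only gives the scalar form.

```python
def mv_x(regfile: list, rd: int, rs1: int, vl: int = 1) -> None:
    """regfile[rd] = regfile[regfile[rs1]], repeated for vl elements."""
    for i in range(vl):
        index = regfile[rs1 + i]             # register number held in rs1+i
        regfile[rd + i] = regfile[index]     # indirect read, direct write
```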

subelement swizzle example:

\begin{verbatim}
velswizzle x32, x64, SRCSUBVL=3, DESTSUBVL=4, ELTYPE=u8, elements=[0,0,2,1]
\end{verbatim}
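
One plausible reading of the swizzle example above is that each destination sub-element j receives source sub-element elements[j], so that the length of `elements` equals DESTSUBVL and every index is below SRCSUBVL. The Python sketch below models only that reading. The semantics are an assumption, as the instruction is not yet defined.

```python
def velswizzle(src_subvec: list, elements: list) -> list:
    """Build the destination sub-vector by indexing into the source."""
    for index in elements:
        if not 0 <= index < len(src_subvec):
            raise ValueError("swizzle index outside source sub-vector")
    return [src_subvec[index] for index in elements]
```

Under this reading, a 3-element source `[10, 20, 30]` with `elements=[0, 0, 2, 1]` yields the 4-element destination `[10, 10, 30, 20]`.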

Moved to the discussion page (link at top of this page)

Work out a way to do sub-element swizzling.