   1 % https://bugs.libre-soc.org/show_bug.cgi?id=213
   2 % SimpleV Prefix (SVprefix) Proposal v0.3
   3 % https://libre-soc.org/simple_v_extension/sv_prefix_proposal/
   4
   5 \newcommand{\Specification}{{\href{https://libre-soc.org/simple_v_extension/specification/}{Specification}}}
   6
   7 \chapter{SimpleV Prefix Proposal -- v0.3}
   8
   9 \paragraph{}
  10
  11 Copyright (c) Jacob Lifshay, 2019
  12 Copyright (c) Luke Kenneth Casson Leighton, 2019
  13
  14 This proposal is designed to be able to operate without SVorig, but not to
  15 require the absence of SVorig. See \Specification.
  16
  17 Principle: SVprefix embeds (unmodified) RVC and 32-bit scalar opcodes into 32,
  18 48 and 64 bit RV formats, to provide Vectorisation context on a per-instruction
  19 basis.
  20
  21 \section{Options}
  22
  23
  24 The following partial / full implementation options are possible:
  25
  26 \begin{itemize}
  27 \item
  28     SVPrefix augments the main \Specification
  29
  30 \item
  31     SVPrefix operates independently, without the main spec VL (and MVL) \gls{CSR}s
  32     (in any privilege level)
  33
  34 \item
  35     SVPrefix operates independently, without the main spec SUBVL CSRs (in any priv level)
  36
  37 \item
    SVPrefix has no support for VL (or MVL) overrides in the 64 bit instruction
    format (VLtyp=0 is the only permitted value)

\item
    SVPrefix has no support for svlen overrides in either the 48 or 64 bit
    instruction format (svlen=0 is the only permitted value).
  44
  45 \end{itemize}
  46
  47 All permutations of the above options are permitted, and the UNIX platform must
  48 raise illegal instruction exceptions on implementations that do not support
  49 each option. For example, an implementation that has no support for VLtyp that
  50 sees an opcode with a nonzero VLtyp must raise an illegal instruction exception.
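
The rule above can be sketched as a decode-time check. This is a minimal
illustration in Rust, not part of the proposal: the \texttt{Capabilities}
struct and the \texttt{is\_legal} function are hypothetical names, and only the
VLtyp and svlen options are modelled.

\begin{verbatim}
    /// Optional SVPrefix features of a (hypothetical) implementation.
    pub struct Capabilities {
        pub vltyp_overrides: bool,
        pub svlen_overrides: bool,
    }

    /// Returns false where an illegal instruction exception must be
    /// raised: a nonzero VLtyp or svlen on an implementation that
    /// does not support the corresponding override.
    pub fn is_legal(caps: &Capabilities, vltyp: u16, svlen: u8) -> bool {
        (caps.vltyp_overrides || vltyp == 0)
            && (caps.svlen_overrides || svlen == 0)
    }
\end{verbatim}

An implementation supporting neither option would therefore accept only
opcodes in which both fields are zero.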
  51
  52 Note that SVPrefix (VLtyp and svlen) has its own STATE CSR, SVPSTATE. This
  53 allows Prefixed operations to be re-entrant on traps, and to not affect VBLOCK
  54 use of VL or SUBVL.
  55
  56 If the main \Specification CSRs and features are to be supported (VBLOCK), then
  57 when VLtyp or svlen are "default" they utilise the main \Specification VBLOCK VL
  58 and/or SUBVL, and, correspondingly, the main VBLOCK STATE CSR will be updated
  59 and used to track hardware loops.
  60
If, however, VLtyp is set to a nondefault value, then the SVPSTATE srcoffs and
destoffs fields are used instead to create the hardware loops, and likewise if
svlen is set to a nondefault value, SVPSTATE's svoffs field is used.
  64
  65 \section{Half-Precision Floating Point (FP16)}
  66
  67 If the F extension is supported, SVprefix adds support for FP16 in the base FP
  68 instructions by using 10 (H) in the floating-point format field fmt and using
  69 001 (H) in the floating-point load/store width field.
  70
  71 \section{Compressed Instructions}
  72
Compressed instructions are under evaluation. The approach takes the same
prefix as used in P48 and embeds it, together with standard RVC opcodes (minus
their RVC prefix), into a 32-bit space. This is done by taking the three
remaining Major "custom" opcodes (0-2),
% TODO discussion ???
one for each of the three RVC Quadrants. See \textbf{discussion ???}.
  78
  79 \section{48-bit Prefixed Instructions}
  80
  81 All 48-bit prefixed instructions contain a 32-bit "base" instruction as the
  82 last 4 bytes. Since all 32-bit instructions have bits 1:0 set to 11, those bits
  83 are reused for additional encoding space in the 48-bit instructions.
  84
  85 \section{64-bit Prefixed Instructions}
  86
The 48 bit format is further extended to give the full 128-register range on
all source and destination registers, and the option to set both SVPSTATE.VL
and SVPSTATE.MAXVL is provided.
  90
  91 \section{48-bit Instruction Encodings}
  92
In the following table, Rsvd (reserved) entries must be zero. RV32 equivalent encodings are included for side-by-side comparison (and are listed separately below).
  94
  95 First, bits 17:0:
  96
  97 \begin{tabular}{|l|l|l|l|l|l|l|l|l|l|}                                                                   \hline
  98     Encoding        & 17       & 16     & 15       & 14  & 13     & 12         & 11:7 & 6    & 5:0    \\ \hline
    P48-LD-type     & rd[5]    & rs1[5] & vitp7[6] & vd  & vs1    & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
    P48-ST-type     & vitp7[6] & rs1[5] & rs2[5]   & vs2 & vs1    & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
    P48-R-type      & rd[5]    & rs1[5] & rs2[5]   & vs2 & vs1    & \multicolumn{2}{l|}{vitp6}      & Rsvd & 011111 \\ \hline
    P48-I-type      & rd[5]    & rs1[5] & vitp7[6] & vd  & vs1    & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
    P48-U-type      & rd[5]    & Rsvd   & Rsvd     & vd  & Rsvd   & \multicolumn{2}{l|}{vitp6}      & Rsvd & 011111 \\ \hline
    P48-FR-type     & rd[5]    & rs1[5] & rs2[5]   & vs2 & vs1    & Rsvd       & vtp5               & Rsvd & 011111 \\ \hline
    P48-FI-type     & rd[5]    & rs1[5] & vitp7[6] & vd  & vs1    & \multicolumn{2}{l|}{vitp7[5:0]} & Rsvd & 011111 \\ \hline
    P48-FR4-type    & rd[5]    & rs1[5] & rs2[5]   & vs2 & rs3[5] & vs3 [1]    & vtp5               & Rsvd & 011111 \\ \hline
 107 \end{tabular}
 108
 109 \fixme{ The link to [1] is easily confused with the likes of [5]}
 110
 111 [1] Only vs2 and vs3 are included in the P48-FR4-type encoding because there is
 112 not enough space for vs1 as well, and because it is more useful to have a
 113 scalar argument for each of the multiplication and addition portions of fmadd
 114 than to have two scalars on the multiplication portion.
 115
Table showing the correspondence between each P48 type and its RV32 type. The RV32 fields occupy bits 47:18 (RV32 bits 31:2, shifted up by 16 bits):
 117
 118 \begin{tabular}{|l|l|}                  \hline
 119     Encoding        & RV32 Encoding  \\ \hline
    47:18           & 31:2           \\ \hline
    P48-LD-type     & RV32-I-type    \\ \hline
    P48-ST-type     & RV32-S-type    \\ \hline
    P48-R-type      & RV32-R-type    \\ \hline
    P48-I-type      & RV32-I-type    \\ \hline
    P48-U-type      & RV32-U-type    \\ \hline
    P48-FR-type     & RV32-FR-type   \\ \hline
    P48-FI-type     & RV32-I-type    \\ \hline
 128     P48-FR4-type    & RV32-FR4-type  \\ \hline
 129 \end{tabular}
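
The correspondence above can be illustrated with a short Rust sketch. It is
not part of the proposal: the function names are hypothetical, and the 48-bit
instruction is assumed to sit in the low 48 bits of a \texttt{u64}. Since bits
47:18 carry RV32 bits 31:2, the base instruction is recovered by extracting
those 30 bits and restoring the two low bits, which are 0b11 for every 32-bit
instruction.

\begin{verbatim}
    /// Recovers the 32-bit base instruction from a 48-bit prefixed
    /// instruction held in the low 48 bits of `insn`.
    pub fn p48_base_instruction(insn: u64) -> u32 {
        // Bits 47:18 of the prefixed form are RV32 bits 31:2.
        let upper30 = ((insn >> 18) & 0x3FFF_FFFF) as u32;
        // Re-insert the implicit 0b11 in bits 1:0.
        (upper30 << 2) | 0b11
    }

    /// Returns bits 17:7 of the 48-bit instruction, i.e. the prefix
    /// fields that sit between the base instruction and bits 6:0.
    pub fn p48_prefix_bits(insn: u64) -> u16 {
        ((insn >> 7) & 0x7FF) as u16
    }
\end{verbatim}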
 130
 131 Table showing Standard RV32 encodings:
 132
 133 \begin{tabular}{|l|l|l|l|l|l|l|l|l|}                                                                         \hline
 134     Encoding        & 31:27       & 26:25   & 24:20    & 19:15     & 14:12   & 11:7      & 6:2     & 1:0  \\ \hline
    RV32-R-type     & \multicolumn{2}{l|}{funct7}     & rs2[4:0] & rs1[4:0]  & funct3  & rd[4:0]   & opcode  & 0b11 \\ \hline
    RV32-S-type     & \multicolumn{2}{l|}{imm[11:5]}  & rs2[4:0] & rs1[4:0]  & funct3  & imm[4:0]  & opcode  & 0b11 \\ \hline
    RV32-I-type     & \multicolumn{3}{l|}{imm[11:0]}             & rs1[4:0]  & funct3  & rd[4:0]   & opcode  & 0b11 \\ \hline
    RV32-U-type     & \multicolumn{5}{l|}{imm[31:12]}                                  & rd[4:0]   & opcode  & 0b11 \\ \hline
 139     RV32-FR4-type   & rs3[4:0]    & fmt     & rs2[4:0] & rs1[4:0]  & funct3  & rd[4:0]   & opcode  & 0b11 \\ \hline
 140     RV32-FR-type    & funct5      & fmt     & rs2[4:0] & rs1[4:0]  & rm      & rd[4:0]   & opcode  & 0b11 \\ \hline
 141 \end{tabular}
 142
 143 \section{64-bit Instruction Encodings}
 144
Where the 48 bit format has the prefix "0b0011111" in bits 0 to 6, the 64 bit format sets these bits to "0b0111111".
 146
 147 \begin{tabular}{|l|l|l|l|}                                    \hline
 148     63:48          & 47:18       & 17:7       & 6:0        \\ \hline
    64 bit prefix  & RV32[31:2]  & P48[17:7]  & 0b0111111  \\ \hline
\end{tabular}

\begin{itemize}
\item
    Bits 48 to 63 contain the 64 bit prefix, whose format is given below

\item
    Bits 18 to 47 contain bits 2 to 31 of a standard RV32 format
 158
 159 \item
 160     Bits 7 to 17 contain bits 7 through 17 of the P48 format
 161
 162 \item
 163     Bits 0 to 6 contain the standard RV 64-bit prefix 0b0111111
 164
 165 \end{itemize}
 166
 167 64 bit prefix format:
 168
 169 \begin{tabular}{|l|l|l|l|l|l|}                                       \hline
 170     Encoding      & 63      & 62      & 61      & 60      & 59:48 \\ \hline
 171     P64-LD-type   & rd[6]   & rs1[6]  &         & Rsvd    & VLtyp \\ \hline
 172     P64-ST-type   &         & rs1[6]  & rs2[6]  & Rsvd    & VLtyp \\ \hline
 173     P64-R-type    & rd[6]   & rs1[6]  & rs2[6]  & vd      & VLtyp \\ \hline
 174     P64-I-type    & rd[6]   & rs1[6]  &         & Rsvd    & VLtyp \\ \hline
 175     P64-U-type    & rd[6]   &         &         & Rsvd    & VLtyp \\ \hline
 176     P64-FR-type   &         & rs1[6]  & rs2[6]  & vd      & VLtyp \\ \hline
 177     P64-FI-type   & rd[6]   & rs1[6]  & rs2[6]  & vd      & VLtyp \\ \hline
 178     P64-FR4-type  & rd[6]   & rs1[6]  & rs2[6]  & rs3[6]  & VLtyp \\ \hline
 179 \end{tabular}
 180
 181 The extra bit for src and dest registers provides the full range of up to 128
 182 registers, when combined with the extra bit from the 48 bit prefix as well.
 183 VLtyp encodes how (whether) to set SVPSTATE.VL and SVPSTATE.MAXVL.
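
The layout of this section can be sketched as a field splitter in Rust. This
is an illustration only: \texttt{P64Fields} and \texttt{split\_p64} are
hypothetical names, and bits 47:18 are assumed to carry RV32 bits 31:2, as in
the 48 bit format.

\begin{verbatim}
    /// Top-level fields of a 64-bit prefixed instruction.
    #[derive(Debug, PartialEq, Eq)]
    pub struct P64Fields {
        /// Bits 63:60: extra register-number bits (or Rsvd).
        pub reg_ext: u8,
        /// Bits 59:48: the VLtyp field.
        pub vltyp: u16,
        /// Bits 47:18, with 0b11 appended to restore a 32-bit opcode.
        pub base: u32,
        /// Bits 17:7, shared with the 48-bit format.
        pub p48_bits: u16,
    }

    /// Splits a 64-bit prefixed instruction; returns None when bits
    /// 6:0 are not the 64-bit prefix 0b0111111.
    pub fn split_p64(insn: u64) -> Option<P64Fields> {
        if (insn & 0x7F) != 0b011_1111 {
            return None;
        }
        Some(P64Fields {
            reg_ext: (insn >> 60) as u8,
            vltyp: ((insn >> 48) & 0xFFF) as u16,
            base: ((((insn >> 18) & 0x3FFF_FFFF) as u32) << 2) | 0b11,
            p48_bits: ((insn >> 7) & 0x7FF) as u16,
        })
    }
\end{verbatim}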
 184
 185 \section{VLtyp field encoding}
 186
 187 NOTE: VL and MVL below are local to SVPrefix and, if non-default, will update
 188 the src and dest element offsets in SVPSTATE, not the main \Specification STATE.
 189 If default (all zeros) then STATE VL and MVL apply to this instruction, and
 190 STATE.srcoffs (etc) will be used.
 191
 192 \begin{tabular}{|l|l|l|l|l|}                                                                       \hline
 193     VLtyp[11]     & VLtyp[10:6]     & VLtyp[5:1]      & VLtyp[0]        & comment               \\ \hline
 194     0             & 00000           & 00000           & 0               & no change to VL/MVL   \\ \hline
 195     0             & VLdest          & VLEN            & vlt             & VL imm/reg mode (vlt) \\ \hline
 196     1             & VLdest          & MVL+VL-immed    & 0               & MVL+VL immed mode     \\ \hline
 197     1             & VLdest          & MVL-immed       & 1               & MVL immed mode        \\ \hline
 198 \end{tabular}
 199
 200 Note: when VLtyp is all zeros, the main \Specification VL and MVL apply to this
 201 instruction. If called outside of a VBLOCK or if sv.setvl has not set VL, the
 202 operation is "scalar".
 203
 204 Just as in the VBLOCK format, when bit 11 of VLtyp is zero:
 205
 206 \begin{itemize}
 207 \item
 208     if vlt is zero, bits 1 to 5 specify the VLEN as a 5 bit immediate (offset
 209     by 1: 0b00000 represents VL=1, 0b00001 represents VL=2 etc.)
 210
 211 \item
    if vlt is 1, bits 1 to 5 specify the scalar (RV standard) register from
    which VL is set. x0 is not permitted.
 214
 215 \item
 216     VL goes into the scalar register VLdest (if VLdest is not x0)
 217
 218 \end{itemize}
 219
When bit 11 of VLtyp is 1:
 221
 222 \begin{itemize}
 223 \item
 224     if VLtyp[0] is zero, both SVPSTATE.MAXVL and SVPSTATE.VL are set to
 225     (imm+1). The same value goes into the scalar register VLdest (if VLdest is
 226     not x0)
 227
 228 \item
 229     if VLtyp[0] is 1, SVPSTATE.MAXVL is set to (imm+1). SVPSTATE.VL will be
 230     truncated to within the new range (if VL was greater than the new MAXVL).
 231     The new VL goes into the scalar register VLdest (if VLdest is not x0).
 232
 233 \end{itemize}
 234
This gives the option to set up SVPSTATE.VL in a "loop mode" (VLtyp[11]=0) or
in a "one-off" mode (VLtyp[11]=1) which sets both MVL and VL to the same
immediate value. This may be most useful for one-off Vectorised operations such
as LOAD-MULTI / STORE-MULTI, for saving and restoration of large batches of
registers in context-switches or function calls.
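
The VLtyp encoding described above can be sketched as a decoder in Rust. This
is a hedged illustration, not a reference implementation: the \texttt{VlTyp}
enum, its variant names and \texttt{decode\_vltyp} are hypothetical, and
returning \texttt{None} for register mode naming x0 is an assumption (the text
only says x0 is not permitted).

\begin{verbatim}
    /// Decoded form of the 12-bit VLtyp field (names are illustrative).
    #[derive(Debug, PartialEq, Eq)]
    pub enum VlTyp {
        /// All zeros: no change to VL/MVL.
        Default,
        /// VLtyp[11]=0, vlt=0: VL is the 5-bit immediate plus one.
        VlImmediate { vldest: u8, vl: u8 },
        /// VLtyp[11]=0, vlt=1: VL is taken from scalar register `rs`.
        VlFromRegister { vldest: u8, rs: u8 },
        /// VLtyp[11]=1, VLtyp[0]=0: MAXVL and VL both become imm+1.
        MvlAndVlImmediate { vldest: u8, value: u8 },
        /// VLtyp[11]=1, VLtyp[0]=1: MAXVL becomes imm+1.
        MvlImmediate { vldest: u8, maxvl: u8 },
    }

    /// Decodes VLtyp; None marks the forbidden x0 register mode.
    pub fn decode_vltyp(vltyp: u16) -> Option<VlTyp> {
        let vltyp = vltyp & 0xFFF;
        if vltyp == 0 {
            return Some(VlTyp::Default);
        }
        let vldest = ((vltyp >> 6) & 0x1F) as u8;
        let field = ((vltyp >> 1) & 0x1F) as u8;
        let bit11 = ((vltyp >> 11) & 1) == 1;
        let bit0 = (vltyp & 1) == 1;
        match (bit11, bit0) {
            (false, false) => Some(VlTyp::VlImmediate { vldest, vl: field + 1 }),
            (false, true) if field == 0 => None,
            (false, true) => Some(VlTyp::VlFromRegister { vldest, rs: field }),
            (true, false) => {
                Some(VlTyp::MvlAndVlImmediate { vldest, value: field + 1 })
            }
            (true, true) => Some(VlTyp::MvlImmediate { vldest, maxvl: field + 1 }),
        }
    }
\end{verbatim}

The all-zeros test comes first because that pattern would otherwise decode as
an immediate VL of 1 with VLdest set to x0.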
 240
Note that VLtyp's VL and MVL are not the same as the main \Specification VL or
MVL. When VLtyp is nondefault, loops alter srcoffs and destoffs in SVPSTATE;
when VLtyp=0, they alter srcoffs and destoffs in STATE.

Furthermore, the execution order and exception handling must be exactly the
same as in the main spec (Program Order must be preserved).
 247
 248 Pseudocode for SVPSTATE.VL:
 249
\begin{verbatim}
    // pseudocode, written as a Rust function
    const XLEN: u64 = 64; // assumed register width (regs are u64)

    // Returns the new VL, or None where an illegal instruction
    // trap must be raised. rd, rs1 and vlmax are the decoded
    // instruction fields.
    fn set_vl(regs: &mut [u64; 128], rd: usize, rs1: usize,
              vlmax: u64) -> Option<u64> {
        // handle illegal instruction decoding
        if vlmax > XLEN {
            return None; // trap()
        }

        // calculate VL
        let vl = if rs1 == 0 {
            // rs1 is x0
            vlmax
        } else {
            regs[rs1].min(vlmax)
        };

        // write rd
        if rd != 0 {
            // rd is not x0
            regs[rd] = vl;
        }
        Some(vl)
    }
\end{verbatim}
 278
 279
 280 \section{vs\#/vd Fields' Encoding}
 281
 282 % Note tabularx - as the 3rd field needs to wrap otherwise it overflows the line
 283 \begin{tabularx}{\textwidth}{|l|l|X|}   \hline
 284     vs\#/vd   & Mnemonic   & Meaning \\ \hline
 285     0         & S          & the rs\#/rd field specifies a scalar (single sub-vector);
 286                              the rs\#/rd field is zero-extended to get the actual 7-bit register number
 287                                      \\ \hline
 288     1         & V          & the rs\#/rd field specifies a vector; the rs\#/rd field is decoded using
 289                              the Vector Register Number Encoding to get the actual 7-bit register number
 290                                      \\ \hline
 291 \end{tabularx}
 292
 293 \fixme{Vector Register Number Encoding should be a link }
 294
 295 If a vs\#/vd field is not present, it is as if it was present with a value that
 296 is the bitwise-or of all present vs\#/vd fields.
 297
 298 \begin{itemize}
 299 \item
    Scalar register numbers do NOT increment when allocated in the hardware
    for-loop. The same scalar register number is handed to every ALU.

\item
    Vector register numbers DO increase when allocated in the hardware
    for-loop. Sequentially-increasing register data is handed to sequential
    ALUs.
 307
 308 \end{itemize}
 309
 310 \section{Vector Register Number Encoding}
 311
 312 For the 48 bit format, when vs\#/vd is 1, the actual 7-bit register number is
 313 derived from the corresponding 6-bit rs\#/rd field:
 314
 315 \begin{tabular}{|l|l|l|}                                  \hline
 316     \multicolumn{3}{|c|}{Actual 7-bit register number} \\ \hline
 317     Bit 6        & Bits 5:1        & Bit 0             \\ \hline
 318     rs\#/rd[0]   & rs\#/rd[5:1]    & 0                 \\ \hline
 319 \end{tabular}
 320
For the 64 bit format, the 7 bit register number is constructed from three
fields: bits 0 to 4 from the 32 bit RV Standard format, bit 5 from the 48 bit
prefix and bit 6 from the 64 bit prefix. Thus in the 64 bit format the full
range of up to 128 registers is directly available. This applies in both scalar
and vector mode.
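
Both register-number constructions can be written out as a short Rust sketch.
The function names \texttt{reg\_number\_p48} and \texttt{reg\_number\_p64} are
hypothetical; the bit placement follows the table and paragraph above.

\begin{verbatim}
    /// 48-bit format: maps the 6-bit rs#/rd field to the actual
    /// 7-bit register number.
    pub fn reg_number_p48(field6: u8, is_vector: bool) -> u8 {
        let f = field6 & 0x3F;
        if is_vector {
            // bit 6 <- field[0], bits 5:1 <- field[5:1], bit 0 <- 0
            ((f & 1) << 6) | (f & 0x3E)
        } else {
            // scalar: the field is zero-extended
            f
        }
    }

    /// 64-bit format: bits 4:0 come from the RV32 field, bit 5 from
    /// the 48-bit prefix and bit 6 from the 64-bit prefix.
    pub fn reg_number_p64(rv32_field5: u8, p48_bit5: bool, p64_bit6: bool) -> u8 {
        (rv32_field5 & 0x1F) | ((p48_bit5 as u8) << 5) | ((p64_bit6 as u8) << 6)
    }
\end{verbatim}

In the 48 bit vector case only even register numbers are reachable, with the
low bit of the field selecting the upper half of the register file.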
 326
 327 \section{Load/Store Kind (lsk) Field Encoding}
 328
 329 \begin{tabular}{|l|l|l|}                                                                                 \hline
 330     vd/vs2 & vs1     & Meaning                                                                        \\ \hline
 331     0      & 0       & srcbase is scalar, LD/ST is pure scalar.                                       \\ \hline
 332     1      & 0       & srcbase is scalar, LD/ST is unit strided                                       \\ \hline
 333     0      & 1       & srcbase is a vector (gather/scatter aka array of srcbases). VSPLAT and VSELECT \\ \hline
 334     1      & 1       & srcbase is a vector, LD/ST is a full vector LD/ST.                             \\ \hline
 335 \end{tabular}
 336
 337 Notes:
 338 \begin{itemize}
 339 \item
    A register strided LD/ST would require 5 registers: srcbase, vd/vs2,
    predicate 1, predicate 2 and the stride register.
 342
 343 \item
 344     Complex strides may all be done with a general purpose vector of srcbases.
 345
 346 \item
 347     Twin predication may be used even when vd/vs1 is a scalar, to give VSPLAT
 348     and VSELECT, because the hardware loop ends on the first occurrence of a 1
 349     in the predicate when a predicate is applied to a scalar.
 350
 351 \item
    Full vectorised gather/scatter is enabled when both registers are marked as
    vectorised; however, unlike e.g.\ Intel AVX512, twin predication can be
    applied.
 355
 356 \end{itemize}
 357
Open question: RVV overloads the width field of LOAD-FP/STORE-FP, using bit 2
to indicate additional interpretation of the 11 bit immediate. Should this be
considered?
 361
 362 \section{Sub-Vector Length (svlen) Field Encoding}
 363
NOTE: svlen is not the same as the main spec SUBVL. When nondefault (not zero),
SVPSTATE context is used for Sub vector loops. However, if svlen is zero, STATE
and SUBVL are used instead.
 367
 368 Bitwidth, from VL's perspective, is a multiple of the elwidth times svlen. So
 369 within each loop of VL there are svlen sub-elements of elwidth in size, just
 370 like in a SIMD architecture. When svlen is set to 0b00 (indicating svlen=1) no
 371 such SIMD-like behaviour exists and the subvectoring is disabled.
 372
 373 Predicate bits do not apply to the individual sub-vector elements, they apply
 374 to the entire subvector group. This saves instructions on setup of the
 375 predicate.
 376
 377 \begin{tabular}{|l|l|}         \hline
 378     svlen Encoding  & Value \\ \hline
 379     00              & SUBVL \\ \hline
 380     01              & 2     \\ \hline
 381     10              & 3     \\ \hline
 382     11              & 4     \\ \hline
 383 \end{tabular}
 384
In independent standalone implementations that do not implement the main
\Specification, the value of SUBVL in the above table (svlen=0b00) is set to 1,
such that svlen is also 1.
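
The table and the standalone rule can be summarised in a few lines of Rust.
This is a sketch under stated assumptions: \texttt{resolve\_svlen} is a
hypothetical name, and the main specification's SUBVL is passed in as an
\texttt{Option}, with \texttt{None} standing for a standalone implementation.

\begin{verbatim}
    /// Resolves the 2-bit svlen field to a sub-vector length.
    /// `main_subvl` is the main-specification SUBVL, or None on a
    /// standalone implementation, where SUBVL is fixed at 1.
    pub fn resolve_svlen(svlen: u8, main_subvl: Option<u8>) -> u8 {
        match svlen & 0b11 {
            0b00 => main_subvl.unwrap_or(1),
            n => n + 1, // 01 -> 2, 10 -> 3, 11 -> 4
        }
    }
\end{verbatim}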
 388
Behaviour of operations that set svlen is identical to that of the main spec.
See section on VLtyp, above.
 391
 392 \section{Predication (pred) Field Encoding}
 393
 394 \begin{tabular}{|l|l|l|l|}                                                                      \hline
 395     pred  & Mnemonic    & Predicate Register        & Meaning                                \\ \hline
 396     000   & None        & None                      & The instruction is unpredicated        \\ \hline
 397     001   & Reserved    & Reserved                  &                                        \\ \hline
 398     010   & !x9         & \multirow{2}{*}{x9 (s1)}  & execute vector op[0..i] on x9[i] == 0  \\ \cline{1-2} \cline{4-4}
 399     011   & x9          &                           & execute vector op[0..i] on x9[i] == 1  \\ \hline
 400     100   & !x10        & \multirow{2}{*}{x10 (a0)} & execute vector op[0..i] on x10[i] == 0 \\ \cline{1-2} \cline{4-4}
 401     101   & x10         &                           & execute vector op[0..i] on x10[i] == 1 \\ \hline
 402     110   & !x11        & \multirow{2}{*}{x11 (a1)} & execute vector op[0..i] on x11[i] == 0 \\ \cline{1-2} \cline{4-4}
 403     111   & x11         &                           & execute vector op[0..i] on x11[i] == 1 \\ \hline
 404 \end{tabular}
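
A minimal Rust sketch of the pred table follows. The \texttt{Predicate} struct
and both function names are hypothetical, and the predicate register is
assumed to be read as a 64-bit value with element $i$ governed by bit $i$.

\begin{verbatim}
    /// A decoded predicate: which register, and whether it is inverted.
    #[derive(Debug, PartialEq, Eq)]
    pub struct Predicate {
        pub reg: u8,        // 9, 10 or 11 (x9, x10, x11)
        pub inverted: bool, // true for the !x9, !x10, !x11 forms
    }

    /// Decodes the 3-bit pred field. Ok(None) means unpredicated.
    pub fn decode_pred(pred: u8) -> Result<Option<Predicate>, &'static str> {
        match pred & 0b111 {
            0b000 => Ok(None),
            0b001 => Err("reserved"),
            // bits 2:1 select x9..x11, bit 0 clear selects inversion
            p => Ok(Some(Predicate { reg: 8 + (p >> 1), inverted: (p & 1) == 0 })),
        }
    }

    /// True if element `i` (i < 64) executes, given the value held
    /// in the predicate register.
    pub fn element_enabled(pred: &Predicate, reg_value: u64, i: u32) -> bool {
        let bit = ((reg_value >> i) & 1) == 1;
        bit != pred.inverted
    }
\end{verbatim}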
 405
 406 \section{Twin-predication (tpred) Field Encoding}
 407
 408 Twin-predication (ability to associate two predicate registers with an
 409 instruction) applies to MV, FCLASS, LD and ST. The same format also applies to
 410 integer-branch-compare operations although it is not to be considered "twin"
 411 predication. In the case of integer-branch-compare operations, the second
 412 register (if enabled) stores the results of the element comparisons. See
 413 Appendix for details.
 414
 415 \fixme{Appendix above is link to http://libre\-riscv.org/simple\_v\_extension/appendix/ }
 416
 417 \begin{tabular}{|l|l|l|l|}                                                                       \hline
 418     pred & Mnemonic   & Predicate Register    & Meaning                                       \\ \hline
 419     000  & None       & None                  & The instruction is unpredicated               \\ \hline
 420     001  & x9,off     & src=x9, dest=none     & src[0..i] uses x9[i], dest unpredicated       \\ \hline
 421     010  & off,x10    & src=none, dest=x10    & dest[0..i] uses x10[i], src unpredicated      \\ \hline
    011  & x9,x10     & src=x9, dest=x10      & src[0..i] uses x9[i], dest[0..i] uses x10[i]  \\ \hline
 423     100  & None       & RESERVED              & Instruction is unpredicated (TBD)             \\ \hline
 424     101  & !x9,off    & src=!x9, dest=none    &                                               \\ \hline
 425     110  & off,!x10   & src=none, dest=!x10   &                                               \\ \hline
 426     111  & !x9,!x10   & src=!x9, dest=!x10    &                                               \\ \hline
 427 \end{tabular}
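
Read bitwise, the table above uses bit 0 to enable the source predicate (x9),
bit 1 to enable the destination predicate (x10) and bit 2 to invert both. The
Rust sketch below encodes that reading; \texttt{TwinPredicate} and
\texttt{decode\_tpred} are hypothetical names, and the reserved value 0b100 is
reported as \texttt{None} since its behaviour is still marked TBD.

\begin{verbatim}
    /// Decoded twin predication: optional source and destination
    /// predicate registers, plus a shared inversion flag.
    #[derive(Debug, PartialEq, Eq, Clone, Copy)]
    pub struct TwinPredicate {
        pub src: Option<u8>,
        pub dest: Option<u8>,
        pub inverted: bool,
    }

    /// Decodes the 3-bit tpred field; None marks the reserved 0b100.
    pub fn decode_tpred(tpred: u8) -> Option<TwinPredicate> {
        let t = tpred & 0b111;
        if t == 0b100 {
            return None;
        }
        Some(TwinPredicate {
            src: if (t & 0b001) != 0 { Some(9) } else { None },
            dest: if (t & 0b010) != 0 { Some(10) } else { None },
            inverted: (t & 0b100) != 0,
        })
    }
\end{verbatim}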
 428
 429 \fixme{In table above some in col 3 might be vertically joined}
 430
 431 \section{Integer Element Type (itype) Field Encoding}
 432
 433 \begin{tabularx}{\textwidth}{|l|l|l|X|X|X|}   \hline
    Signedness [2] & itype & Element Type  & Mnemonic in Integer Instructions  & Mnemonic in FP Instructions (such as fmv.x) & Meaning (INT may be un/signed, FP just re-sized) \\ \hline
 435     Unsigned       & 01    & u8            & BU                                & BU                                          & Unsigned 8-bit                                  \\ \hline
 436                    & 10    & u16           & HU                                & HU                                          & Unsigned 16-bit                                 \\ \hline
 437                    & 11    & u32           & WU                                & WU                                          & Unsigned 32-bit                                 \\ \hline
 438                    & 00    & uXLEN         & WU/DU/QU                          & WU/LU/TU                                    & Unsigned XLEN-bit                               \\ \hline
 439     Signed         & 01    & i8            & BS                                & BS                                          & Signed 8-bit                                    \\ \hline
 440                    & 10    & i16           & HS                                & HS                                          & Signed 16-bit                                   \\ \hline
 441                    & 11    & i32           & W                                 & W                                           & Signed 32-bit                                   \\ \hline
 442                    & 00    & iXLEN         & W/D/Q                             & W/L/T                                       & Signed XLEN-bit                                 \\ \hline
 443 \end{tabularx}
 444
[2]   Signedness is defined in the Signedness Decision Procedure section.
 446
Note: vector mode is effectively a type-cast of the register file as if it was
a sequential array being typecast to typedef itype[] (C syntax). The starting
point of the "typecast" is the vector register rs\#/rd.
 450
 451 Example: if itype=0b10 (u16), and rd is set to "vector", and VL is set to 4,
 452 the 64-bit register at rd is subdivided into FOUR 16-bit destination elements.
 453 It is NOT four separate 64-bit destination registers (rd+0, rd+1, rd+2, rd+3)
 454 that are sign-extended from the source width size out to 64-bit, because that
 455 is itype=0b00 (uXLEN).
 456
Note also: changing elwidth creates packed elements that, depending on VL, may
create vectors that do not fit perfectly onto XLEN sized register file
bit-boundaries. This does NOT result in the destruction of the MSBs of the last
register written to at the end of a VL loop. More details on how to handle this
are described in the main \Specification.
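
The "typecast" view can be made concrete by computing where element $i$ of a
vector lives. The Rust sketch below is illustrative only:
\texttt{element\_location} is a hypothetical name, and it assumes elements are
packed contiguously starting from the least significant bits of the first
register, which this document does not state explicitly.

\begin{verbatim}
    /// Returns (register number, bit offset within that register)
    /// for element `index` of a vector starting at register `base`.
    /// Returns None if the element width is zero or exceeds XLEN.
    pub fn element_location(base: u32, elwidth_bits: u32, xlen: u32,
                            index: u32) -> Option<(u32, u32)> {
        if elwidth_bits == 0 || elwidth_bits > xlen {
            return None;
        }
        let per_reg = xlen / elwidth_bits; // elements per register
        Some((base + index / per_reg, (index % per_reg) * elwidth_bits))
    }
\end{verbatim}

With itype=0b10 (16-bit elements) and XLEN=64, elements 0 to 3 all land in the
register at rd, matching the example above; with itype=0b00 each element takes
a whole register.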
 462
 463 \section{Signedness Decision Procedure}
 464
 465 \begin{enumerate}
 466 \item
 467     If the opcode field is either OP or OP-IMM, then
 468
 469 \indent  Signedness is Unsigned.
 470
 471 \item
 472     If the opcode field is either OP-32 or OP-IMM-32, then
 473
 474 \indent  Signedness is Signed.
 475
 476 \item
 477     If Signedness is encoded in a field of the base instruction, [3] then
 478
 479 \indent  Signedness uses the encoded value.
 480
 481 \item
 482     Otherwise,
 483
 484 \indent  Signedness is Unsigned.
 485
 486 \end{enumerate}
 487
[3]   As in fcvt.d.l[u], but unlike fmv.x.w, since there is no fmv.x.wu.
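
The decision procedure can be sketched as a single Rust function. The enum and
function names are hypothetical; \texttt{encoded} stands for a signedness
field in the base instruction, where one exists.

\begin{verbatim}
    #[derive(Debug, PartialEq, Eq, Clone, Copy)]
    pub enum Signedness {
        Unsigned,
        Signed,
    }

    /// The major opcodes the procedure distinguishes.
    #[derive(Debug, PartialEq, Eq, Clone, Copy)]
    pub enum MajorOpcode {
        Op,
        OpImm,
        Op32,
        OpImm32,
        Other,
    }

    /// Applies the four rules in order; the first match wins.
    pub fn signedness(opcode: MajorOpcode,
                      encoded: Option<Signedness>) -> Signedness {
        match opcode {
            MajorOpcode::Op | MajorOpcode::OpImm => Signedness::Unsigned,
            MajorOpcode::Op32 | MajorOpcode::OpImm32 => Signedness::Signed,
            // Rule 3 if a field exists, otherwise rule 4.
            MajorOpcode::Other => encoded.unwrap_or(Signedness::Unsigned),
        }
    }
\end{verbatim}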
 489
 490 \section{Vector Type and Predication 5-bit (vtp5) Field Encoding}
 491
 492 In the following table, X denotes a wildcard that is 0 or 1 and can be a
 493 different value for every occurrence.
 494
 495 \begin{tabular}{|l|l|l|}                \hline
 496     vtp5    & pred       & svlen     \\ \hline
 497     1XXXX   & vtp5[4:2]  & vtp5[1:0] \\ \hline
 498     01XXX   &            &           \\ \hline
 499     000XX   &            &           \\ \hline
 500     001XX   & Reserved   &           \\ \hline
 501 \end{tabular}
 502
 503 \section{Vector Integer Type and Predication 6-bit (vitp6) Field Encoding}
 504
 505 In the following table, X denotes a wildcard that is 0 or 1 and can be a
 506 different value for every occurrence.
 507
 508 \begin{tabular}{|l|l|l|l|l|}                                       \hline
 509     vitp6    & itype       & pred[2] & pred[0:1]  & svlen       \\ \hline
 510     XX1XXX   & vitp6[5:4]  & 0       & vitp6[3:2] & vitp6[1:0]  \\ \hline
 511     XX00XX   &             &         &            &             \\ \hline
 512     XX01XX   & Reserved    &         &            &             \\ \hline
 513 \end{tabular}
 514
 515 \fixme{spanning cols/rows above}
 516
 517 vitp7 field: only tpred
 518
 519 \begin{tabular}{|l|l|l|l|l|}                                         \hline
 520     vitp7    & itype       & tpred[2]  & tpred[0:1]  & svlen      \\ \hline
 521     XXXXXXX  & vitp7[5:4]  & vitp7[6]  & vitp7[3:2]  & vitp7[1:0] \\ \hline
 522 \end{tabular}
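
The vitp7 split is a fixed bit-slice, sketched here in Rust. \texttt{Vitp7}
and \texttt{decode\_vitp7} are hypothetical names, and tpred[0:1] is assumed
to mean the two low bits of tpred in the same order as vitp7[3:2].

\begin{verbatim}
    /// The three sub-fields carried by the 7-bit vitp7 field.
    #[derive(Debug, PartialEq, Eq)]
    pub struct Vitp7 {
        pub itype: u8, // vitp7[5:4]
        pub tpred: u8, // {vitp7[6], vitp7[3:2]}
        pub svlen: u8, // vitp7[1:0]
    }

    pub fn decode_vitp7(vitp7: u8) -> Vitp7 {
        let v = vitp7 & 0x7F;
        Vitp7 {
            itype: (v >> 4) & 0b11,
            tpred: (((v >> 6) & 1) << 2) | ((v >> 2) & 0b11),
            svlen: v & 0b11,
        }
    }
\end{verbatim}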
 523
 524 \section{48-bit Instruction Encoding Decision Procedure}
 525
 526 In the following decision procedure, \textit{Reserved} means that there is not yet a
 527 defined 48-bit instruction encoding for the base instruction.
 528
 529 \begin{enumerate}
 530
 531 \item
 532     If the base instruction is a load instruction, then
 533
 534     \begin{enumerate}
 535     \item
 536         If the base instruction is an I-type instruction, then
 537             \begin{enumerate}
 538             \item
 539                 The encoding is P48-LD-type.
 540
 541             \end{enumerate}
 542
 543     \item
 544         Otherwise
 545             \begin{enumerate}
 546             \item
 547                 The encoding is \textit{Reserved}.
 548
 549             \end{enumerate}
 550
 551     \end{enumerate}
 552 \item
 553     If the base instruction is a store instruction, then
 554
 555     \begin{enumerate}
 556     \item
 557         If the base instruction is an S-type instruction, then
 558             \begin{enumerate}
 559             \item
 560                 The encoding is P48-ST-type.
 561
 562             \end{enumerate}
 563
 564     \item
 565         Otherwise
 566             \begin{enumerate}
 567             \item
 568                 The encoding is \textit{Reserved}.
 569
 570             \end{enumerate}
 571
 572     \end{enumerate}
 573
 574 \item
 575     If the base instruction is a SYSTEM instruction, then
 576
 577     \begin{enumerate}
 578     \item
 579         The encoding is \textit{Reserved}.
 580
 581     \end{enumerate}
 582
 583 \item
 584     If the base instruction is an integer instruction, then
 585
 586     \begin{enumerate}
 587
 588     \item
 589         If the base instruction is an R-type instruction, then
 590             \begin{enumerate}
 591             \item
 592                 The encoding is P48-R-type.
 593
 594             \end{enumerate}
 595
 596     \item
 597         If the base instruction is an I-type instruction, then
 598             \begin{enumerate}
 599             \item
 600                 The encoding is P48-I-type.
 601
 602             \end{enumerate}
 603
 604     \item
 605         If the base instruction is an S-type instruction, then
 606             \begin{enumerate}
 607             \item
 608                 The encoding is \textit{Reserved}.
 609
 610             \end{enumerate}
 611
 612     \item
        If the base instruction is a B-type instruction, then
 614             \begin{enumerate}
 615             \item
 616                 The encoding is \textit{Reserved}.
 617
 618             \end{enumerate}
 619
 620     \item
        If the base instruction is a U-type instruction, then
 622             \begin{enumerate}
 623             \item
 624                 The encoding is P48-U-type.
 625
 626             \end{enumerate}
 627
 628     \item
        If the base instruction is a J-type instruction, then
 630             \begin{enumerate}
 631             \item
 632                 The encoding is \textit{Reserved}.
 633
 634             \end{enumerate}
 635
 636     \item
 637         Otherwise
 638             \begin{enumerate}
 639             \item
 640                 The encoding is \textit{Reserved}.
 641
 642             \end{enumerate}
 643
 644     \end{enumerate}
 645
 646 \item
 647     If the base instruction is a floating-point instruction, then
 648
 649     \begin{enumerate}
 650
 651     \item
 652         If the base instruction is an R-type instruction, then
 653             \begin{enumerate}
 654             \item
 655                 The encoding is P48-FR-type.
 656
 657             \end{enumerate}
 658
 659     \item
 660         If the base instruction is an I-type instruction, then
 661             \begin{enumerate}
 662             \item
 663                 The encoding is P48-FI-type.
 664
 665             \end{enumerate}
 666
 667     \item
 668         If the base instruction is an S-type instruction, then
 669             \begin{enumerate}
 670             \item
 671                 The encoding is \textit{Reserved}.
 672
 673             \end{enumerate}
 674
 675     \item
        If the base instruction is a B-type instruction, then
 677             \begin{enumerate}
 678             \item
 679                 The encoding is \textit{Reserved}.
 680
 681             \end{enumerate}
 682
 683     \item
        If the base instruction is a U-type instruction, then
 685             \begin{enumerate}
 686             \item
 687                 The encoding is \textit{Reserved}.
 688
 689             \end{enumerate}
 690
 691     \item
        If the base instruction is a J-type instruction, then
 693             \begin{enumerate}
 694             \item
 695                 The encoding is \textit{Reserved}.
 696
 697             \end{enumerate}
 698
 699     \item
 700         If the base instruction is an R4-type instruction, then
 701             \begin{enumerate}
 702             \item
 703                 The encoding is P48-FR4-type.
 704
 705             \end{enumerate}
 706
 707     \item
 708         Otherwise
 709             \begin{enumerate}
 710             \item
 711                 The encoding is \textit{Reserved}.
 712
 713             \end{enumerate}
 714     \end{enumerate}
 715
 716 \item
 717     Otherwise
 718             The encoding is \textit{Reserved}.
 719
 720 \end{enumerate}
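
The whole procedure reduces to a lookup on two properties of the base
instruction, sketched below in Rust. All names are hypothetical. The sketch
assumes the caller has already classified the instruction by testing the
categories in the order given above (load, store, SYSTEM, integer,
floating-point), and \texttt{None} stands for \textit{Reserved}.

\begin{verbatim}
    #[derive(Debug, PartialEq, Eq, Clone, Copy)]
    pub enum InsnClass { Load, Store, System, Integer, FloatingPoint, Other }

    #[derive(Debug, PartialEq, Eq, Clone, Copy)]
    pub enum BaseFormat { R, I, S, B, U, J, R4 }

    #[derive(Debug, PartialEq, Eq, Clone, Copy)]
    pub enum P48Encoding { Ld, St, R, I, U, Fr, Fi, Fr4 }

    /// Returns the 48-bit encoding, or None where it is Reserved.
    pub fn p48_encoding(class: InsnClass,
                        format: BaseFormat) -> Option<P48Encoding> {
        match (class, format) {
            (InsnClass::Load, BaseFormat::I) => Some(P48Encoding::Ld),
            (InsnClass::Store, BaseFormat::S) => Some(P48Encoding::St),
            (InsnClass::Integer, BaseFormat::R) => Some(P48Encoding::R),
            (InsnClass::Integer, BaseFormat::I) => Some(P48Encoding::I),
            (InsnClass::Integer, BaseFormat::U) => Some(P48Encoding::U),
            (InsnClass::FloatingPoint, BaseFormat::R) => Some(P48Encoding::Fr),
            (InsnClass::FloatingPoint, BaseFormat::I) => Some(P48Encoding::Fi),
            (InsnClass::FloatingPoint, BaseFormat::R4) => Some(P48Encoding::Fr4),
            // every other combination, including SYSTEM, is Reserved
            _ => None,
        }
    }
\end{verbatim}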
 721
 722 \section{CSR Registers}
 723
CSRs are the same as in the main \Specification, if the associated functionality is implemented. They have exactly the same meaning as in the main \Specification.
 725
 726 \begin{itemize}
 727 \item
 728     VL
 729
 730 \item
 731     MVL
 732
 733 \item
 734     SVPSTATE
 735
 736 \item
 737     SUBVL
 738
 739 \end{itemize}
 740
Associated SET and GET operations on the CSRs are exactly as in the main spec
as well (including CSRRWI and CSRRW differences).
 743
 744 Note that if both VLtyp and svlen are not implemented, SVPSTATE is not
 745 required. Also if VL and SUBVL are not implemented, STATE from the main
 746 \Specification is not required either.
 747
However, if partial functionality is implemented, the unimplemented bits in
STATE and SVPSTATE must be zero, and, in the UNIX Platform, an illegal
instruction exception MUST be raised if unsupported bits are written to.

SVPSTATE fields have exactly the same layout as STATE:
 753
 754 \begin{tabular}{|l|l|l|l|l|l|l|}                                                 \hline
    (31..28) & (27..26) & (25..24) & (23..18) & (17..12) & (11..6) & (5..0)   \\ \hline
 756     rsvd     & dsvoffs  & subvl    & destoffs & srcoffs  & vl      & maxvl    \\ \hline
 757 \end{tabular}
 758
However, note that where STATE stores the scalar register number to be used as
VL, SVPSTATE.VL contains the actual VL value, in an identical fashion to RVV.
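
The field layout in the table above can be sketched as a pack/unpack pair in
Rust. The struct and function names are hypothetical, and the sketch stores
each field's raw bits without interpreting them (for example, it does not
assume any offset-by-one convention for vl or maxvl).

\begin{verbatim}
    /// The SVPSTATE fields, held as raw bit patterns.
    #[derive(Debug, PartialEq, Eq, Clone, Copy)]
    pub struct SvpState {
        pub maxvl: u8,    // bits 5..0
        pub vl: u8,       // bits 11..6
        pub srcoffs: u8,  // bits 17..12
        pub destoffs: u8, // bits 23..18
        pub subvl: u8,    // bits 25..24
        pub dsvoffs: u8,  // bits 27..26
    }

    /// Extracts the fields; the reserved bits 31..28 are ignored.
    pub fn unpack_svpstate(raw: u32) -> SvpState {
        SvpState {
            maxvl: (raw & 0x3F) as u8,
            vl: ((raw >> 6) & 0x3F) as u8,
            srcoffs: ((raw >> 12) & 0x3F) as u8,
            destoffs: ((raw >> 18) & 0x3F) as u8,
            subvl: ((raw >> 24) & 0x3) as u8,
            dsvoffs: ((raw >> 26) & 0x3) as u8,
        }
    }

    /// Rebuilds the CSR value, leaving the reserved bits zero.
    pub fn pack_svpstate(s: &SvpState) -> u32 {
        ((s.maxvl as u32) & 0x3F)
            | (((s.vl as u32) & 0x3F) << 6)
            | (((s.srcoffs as u32) & 0x3F) << 12)
            | (((s.destoffs as u32) & 0x3F) << 18)
            | (((s.subvl as u32) & 0x3) << 24)
            | (((s.dsvoffs as u32) & 0x3) << 26)
    }
\end{verbatim}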
 762
 763 \section{Additional Instructions}
 764
 765 \begin{itemize}
 766 \item
 767     Add instructions to convert between integer types.
 768
 769 \item
 770     Add instructions to swizzle elements in sub-vectors. Note that the
 771     sub-vector lengths of the source and destination won't necessarily match.
 772
 773 \item
 774     Add instructions to transpose (2-4)x(2-4) element matrices.
 775
 776 \item
 777     Add instructions to insert or extract a sub-vector from a vector, with the
 778     index allowed to be both immediate and from a register (immediate can be
 779     covered by twin-predication, register might be, by virtue of predicates
 780     being registers)
 781
 782 \item
 783     Add a register gather instruction (aka MV.X: regfile[rd] =
 784     regfile[regfile[rs1]])
 785
 786 \end{itemize}
 787
Sub-element swizzle example:

\begin{verbatim}
    velswizzle x32, x64, SRCSUBVL=3, DESTSUBVL=4, ELTYPE=u8, elements=[0, 0, 2, 1]
\end{verbatim}
 791
 792 \section{Questions}
 793
 794 Moved to the discussion page (link at top of this page)
 795
 796 \section{TODO}
 797
 798 Work out a way to do sub-element swizzling.