--- /dev/null
+% https://bugs.libre-soc.org/show_bug.cgi?id=213
+% SimpleV Prefix (SVprefix) Proposal v0.3
+% https://libre-soc.org/simple_v_extension/sv_prefix_proposal/
+
+\newcommand{\Specification}{{\href{https://libre-soc.org/simple_v_extension/specification/}{Specification}}}
+
+\chapter{SimpleV Prefix Proposal -- v0.3}
+
+\paragraph{}
+
+Copyright (c) Jacob Lifshay, 2019
+Copyright (c) Luke Kenneth Casson Leighton, 2019
+
+This proposal is designed to be able to operate without SVorig, but not to
+require the absence of SVorig. See \Specification.
+
+Principle: SVprefix embeds (unmodified) RVC and 32-bit scalar opcodes into 32,
+48 and 64 bit RV formats, to provide Vectorisation context on a per-instruction
+basis.
+
+\section{Options}
+
+
+The following partial / full implementation options are possible:
+
+\begin{itemize}
+\item
+ SVPrefix augments the main \Specification
+
+\item
+ SVPrefix operates independently, without the main spec VL (and MVL) \gls{CSR}s
+ (in any privilege level)
+
+\item
+ SVPrefix operates independently, without the main spec SUBVL CSRs (in any priv level)
+
+\item
+ SVPrefix has no support for VL (or MVL) overrides in the 64 bit instruction
+ format (VLtyp=0 as the only legal permitted value)
+
+\item
+ SVPrefix has no support for svlen overrides in either the 48 or 64 bit
+ instruction format (svlen=0 as the only legal permitted value).
+
+\end{itemize}
+
+All permutations of the above options are permitted, and the UNIX platform must
+raise illegal instruction exceptions on implementations that do not support
+each option. For example, an implementation that has no support for VLtyp that
+sees an opcode with a nonzero VLtyp must raise an illegal instruction exception.
+
+Note that SVPrefix (VLtyp and svlen) has its own STATE CSR, SVPSTATE. This
+allows Prefixed operations to be re-entrant on traps, and to not affect VBLOCK
+use of VL or SUBVL.
+
+If the main \Specification CSRs and features are to be supported (VBLOCK), then
+when VLtyp or svlen are "default" they utilise the main \Specification VBLOCK VL
+and/or SUBVL, and, correspondingly, the main VBLOCK STATE CSR will be updated
+and used to track hardware loops.
+
+If however VLtyp is set to nondefault, then the SVPSTATE srcoffs and destoffs
+fields are used instead to create the hardware loops, and likewise if svlen is
+set to nondefault, SVPSTATE's svoffs field is used.
+
+\section{Half-Precision Floating Point (FP16)}
+
+If the F extension is supported, SVprefix adds support for FP16 in the base FP
+instructions by using 10 (H) in the floating-point format field fmt and using
+001 (H) in the floating-point load/store width field.
+
+\section{Compressed Instructions}
+
+Compressed instructions are under evaluation. The approach is to take the same
+prefix as used in P48 and embed it, together with standard RVC opcodes (minus
+their RVC prefix), into a 32-bit space. This is done by taking the three
+remaining Major "custom" opcodes (0-2),
+% TODO discussion ???
+one for each of the three RVC Quadrants. See \textbf{discussion ???}.
+
+\section{48-bit Prefixed Instructions}
+
+All 48-bit prefixed instructions contain a 32-bit "base" instruction as the
+last 4 bytes. Since all 32-bit instructions have bits 1:0 set to 11, those bits
+are reused for additional encoding space in the 48-bit instructions.
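+
+As a rough illustration of the layout just described, the following Python
+sketch recovers the base instruction from a 48-bit word. It assumes, per the
+encoding tables in this proposal, that RV32 bits 31:2 sit in bits 47:18 and
+that bits 17:7 carry the prefix fields. The function names are hypothetical
+and are not part of the proposal.

```python
P48_PREFIX = 0b011111  # bits 5:0 of a 48-bit prefixed instruction (bit 6 is Rsvd)


def embed_rv32_in_p48(rv32, prefix_bits_17_7=0):
    """Place RV32 bits 31:2 into bits 47:18 of a 48-bit word (illustrative)."""
    if rv32 & 0b11 != 0b11:
        raise ValueError("not a 32-bit RV instruction: bits 1:0 must be 11")
    upper = (rv32 >> 2) << 18                  # RV32[31:2] -> bits 47:18
    middle = (prefix_bits_17_7 & 0x7FF) << 7   # 11 prefix bits -> bits 17:7
    return upper | middle | P48_PREFIX


def extract_rv32_from_p48(insn48):
    """Rebuild the base instruction, restoring the implied bits 1:0 = 11."""
    return (((insn48 >> 18) & 0x3FFFFFFF) << 2) | 0b11
```

+Because bits 1:0 of every 32-bit instruction are known to be 11, nothing is
+lost by dropping them on embedding and restoring them on extraction.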
+
+\section{64-bit Prefixed Instructions}
+
+The 48 bit format is further extended with the full 128-register range on all
+source and destination registers, and the option to set both SVPSTATE.VL and
+SVPSTATE.MVL is provided.
+
+\section{48-bit Instruction Encodings}
+
+In the following table, Rsvd (reserved) entries must be zero. RV32 equivalent encodings are included for side-by-side comparison (and listed below, separately).
+
+First, bits 17:0:
+
+\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|} \hline
+ Encoding & 17 & 16 & 15 & 14 & 13 & 12 & 11:7 & 6 & 5:0 \\ \hline
+ P48-LD-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & vitp7[5:0] & & Rsvd & 011111 \\ \hline
+ P48-ST-type & vitp7[6] & rs1[5] & rs2[5] & vs2 & vs1 & vitp7[5:0] & & Rsvd & 011111 \\ \hline
+ P48-R-type & rd[5] & rs1[5] & rs2[5] & vs2 & vs1 & vitp6 & & Rsvd & 011111 \\ \hline
+ P48-I-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & vitp7[5:0] & & Rsvd & 011111 \\ \hline
+ P48-U-type & rd[5] & Rsvd & Rsvd & vd & Rsvd & vitp6 & & Rsvd & 011111 \\ \hline
+ P48-FR-type & rd[5] & rs1[5] & rs2[5] & vs2 & vs1 & Rsvd & vtp5 & Rsvd & 011111 \\ \hline
+ P48-FI-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & vitp7[5:0] & & Rsvd & 011111 \\ \hline
+ P48-FR4-type & rd[5] & rs1[5] & rs2[5] & vs2 & rs3[5] & vs3 [1] & vtp5 & Rsvd & 011111 \\ \hline
+\end{tabular}
+
+\fixme{ The link to [1] is easily confused with the likes of [5]}
+
+[1] Only vs2 and vs3 are included in the P48-FR4-type encoding because there is
+not enough space for vs1 as well, and because it is more useful to have a
+scalar argument for each of the multiplication and addition portions of fmadd
+than to have two scalars on the multiplication portion.
+
+Table showing correspondence between P48--type and RV32--type. These are bits 47:18 (RV32 shifted up by 16 bits):
+
+\begin{tabular}{|l|l|} \hline
+ Encoding & RV32 Encoding \\ \hline
+ 47:18 & 31:2 \\ \hline
+ P48-LD-type & RV32-I-type \\ \hline
+ P48-ST-type & RV32-S-Type \\ \hline
+ P48-R-type & RV32-R-Type \\ \hline
+ P48-I-type & RV32-I-Type \\ \hline
+ P48-U-type & RV32-U-Type \\ \hline
+ P48-FR-type & RV32-FR-Type \\ \hline
+ P48-FI-type & RV32-I-Type \\ \hline
+ P48-FR4-type & RV32-FR4-type \\ \hline
+\end{tabular}
+
+Table showing Standard RV32 encodings:
+
+\begin{tabular}{|l|l|l|l|l|l|l|l|l|} \hline
+ Encoding & 31:27 & 26:25 & 24:20 & 19:15 & 14:12 & 11:7 & 6:2 & 1:0 \\ \hline
+ RV32-R-type & funct7 & & rs2[4:0] & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline
+ RV32-S-type & imm[11:5] & & rs2[4:0] & rs1[4:0] & funct3 & imm[4:0] & opcode & 0b11 \\ \hline
+ RV32-I-type & imm[11:0] & & & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline
+ RV32-U-type & imm[31:12] & & & & & rd[4:0] & opcode & 0b11 \\ \hline
+ RV32-FR4-type & rs3[4:0] & fmt & rs2[4:0] & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline
+ RV32-FR-type & funct5 & fmt & rs2[4:0] & rs1[4:0] & rm & rd[4:0] & opcode & 0b11 \\ \hline
+\end{tabular}
+
+\section{64-bit Instruction Encodings}
+
+Where in the 48 bit format the prefix is "0b0011111" in bits 0 to 6, this is now set to "0b0111111".
+
+\begin{tabular}{|l|l|l|l|} \hline
+ 63:48 & 47:18 & 17:7 & 6:0 \\ \hline
+ 64 bit prefix & RV32[31:2] & P48[17:7] & 0b0111111 \\ \hline
+\end{tabular}
+
+\begin{itemize}
+\item
+ The 64 bit prefix format is below
+
+\item
+ Bits 18 to 47 contain bits 2 to 31 of a standard RV32 format
+
+\item
+ Bits 7 to 17 contain bits 7 through 17 of the P48 format
+
+\item
+ Bits 0 to 6 contain the standard RV 64-bit prefix 0b0111111
+
+\end{itemize}
+
+64 bit prefix format:
+
+\begin{tabular}{|l|l|l|l|l|l|} \hline
+ Encoding & 63 & 62 & 61 & 60 & 59:48 \\ \hline
+ P64-LD-type & rd[6] & rs1[6] & & Rsvd & VLtyp \\ \hline
+ P64-ST-type & & rs1[6] & rs2[6] & Rsvd & VLtyp \\ \hline
+ P64-R-type & rd[6] & rs1[6] & rs2[6] & vd & VLtyp \\ \hline
+ P64-I-type & rd[6] & rs1[6] & & Rsvd & VLtyp \\ \hline
+ P64-U-type & rd[6] & & & Rsvd & VLtyp \\ \hline
+ P64-FR-type & & rs1[6] & rs2[6] & vd & VLtyp \\ \hline
+ P64-FI-type & rd[6] & rs1[6] & rs2[6] & vd & VLtyp \\ \hline
+ P64-FR4-type & rd[6] & rs1[6] & rs2[6] & rs3[6] & VLtyp \\ \hline
+\end{tabular}
+
+The extra bit for src and dest registers provides the full range of up to 128
+registers, when combined with the extra bit from the 48 bit prefix as well.
+VLtyp encodes how (whether) to set SVPSTATE.VL and SVPSTATE.MAXVL.
+
+\section{VLtyp field encoding}
+
+NOTE: VL and MVL below are local to SVPrefix and, if non-default, will update
+the src and dest element offsets in SVPSTATE, not the main \Specification STATE.
+If default (all zeros) then STATE VL and MVL apply to this instruction, and
+STATE.srcoffs (etc) will be used.
+
+\begin{tabular}{|l|l|l|l|l|} \hline
+ VLtyp[11] & VLtyp[10:6] & VLtyp[5:1] & VLtyp[0] & comment \\ \hline
+ 0 & 00000 & 00000 & 0 & no change to VL/MVL \\ \hline
+ 0 & VLdest & VLEN & vlt & VL imm/reg mode (vlt) \\ \hline
+ 1 & VLdest & MVL+VL-immed & 0 & MVL+VL immed mode \\ \hline
+ 1 & VLdest & MVL-immed & 1 & MVL immed mode \\ \hline
+\end{tabular}
+
+Note: when VLtyp is all zeros, the main \Specification VL and MVL apply to this
+instruction. If called outside of a VBLOCK or if sv.setvl has not set VL, the
+operation is "scalar".
+
+Just as in the VBLOCK format, when bit 11 of VLtyp is zero:
+
+\begin{itemize}
+\item
+ if vlt is zero, bits 1 to 5 specify the VLEN as a 5 bit immediate (offset
+ by 1: 0b00000 represents VL=1, 0b00001 represents VL=2 etc.)
+
+\item
+ if vlt is 1, bits 1 to 5 specify the scalar (RV standard) register from
+ which VL is set. x0 is not permitted.
+
+\item
+ VL goes into the scalar register VLdest (if VLdest is not x0)
+
+\end{itemize}
+
+When bit 11 of VLtyp is 1:
+
+\begin{itemize}
+\item
+ if VLtyp[0] is zero, both SVPSTATE.MAXVL and SVPSTATE.VL are set to
+ (imm+1). The same value goes into the scalar register VLdest (if VLdest is
+ not x0)
+
+\item
+ if VLtyp[0] is 1, SVPSTATE.MAXVL is set to (imm+1). SVPSTATE.VL will be
+ truncated to within the new range (if VL was greater than the new MAXVL).
+ The new VL goes into the scalar register VLdest (if VLdest is not x0).
+
+\end{itemize}
+
+This gives the option to set up SVPSTATE.VL in a "loop mode" (VLtyp[11]=0) or
+in a "one-off" mode (VLtyp[11]=1) which sets both MVL and VL to the same
+immediate value. This may be most useful for one-off Vectorised operations such
+as LOAD-MULTI / STORE-MULTI, for saving and restoration of large batches of
+registers in context-switches or function calls.
+
+Note that VLtyp's VL and MVL are not the same as the main \Specification VL or
+MVL. Loops alter srcoffs and destoffs in SVPSTATE when VLtyp is nondefault,
+and alter srcoffs and destoffs in STATE when VLtyp=0.
+
+Furthermore, the execution order and exception handling must be exactly the
+same as in the main spec (Program Order must be preserved).
+
+Pseudocode for SVPSTATE.VL:
+
+\begin{verbatim}
+ // pseudocode
+
+ regs = [0u64; 128];
+ vl = 0;
+
+ // instruction fields:
+ rd = get_rd_field();
+ rs1 = get_rs1_field(); // assumed accessor, by analogy with get_rd_field()
+ vlmax = get_immed_field();
+
+ // handle illegal instruction decoding
+ if vlmax > XLEN {
+ trap()
+ }
+
+ // calculate VL
+ if rs1 == 0 { // rs1 is x0
+ vl = vlmax
+ } else {
+ vl = min(regs[rs1], vlmax)
+ }
+
+ // write rd
+ if rd != 0 {
+ // rd is not x0
+ regs[rd] = vl
+ }
+\end{verbatim}
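+
+The VLtyp field layout described in this section can also be unpacked
+mechanically. The following Python sketch is one reading of the table and
+bullet lists above, not a reference decoder: the function name
+\texttt{decode\_vltyp} and the returned dictionary keys are hypothetical.

```python
def decode_vltyp(vltyp):
    """Split the 12-bit VLtyp field into its documented sub-fields."""
    if not 0 <= vltyp < (1 << 12):
        raise ValueError("VLtyp is a 12-bit field")
    if vltyp == 0:
        return {"mode": "default"}          # main-spec VL/MVL apply
    bit11 = (vltyp >> 11) & 1
    vldest = (vltyp >> 6) & 0x1F            # VLtyp[10:6]
    field = (vltyp >> 1) & 0x1F             # VLtyp[5:1]
    bit0 = vltyp & 1                        # vlt when bit 11 is zero
    if bit11 == 0:
        if bit0 == 0:
            # immediate is offset by one: 0b00000 represents VL=1
            return {"mode": "vl_imm", "vldest": vldest, "vl": field + 1}
        if field == 0:
            raise ValueError("x0 is not permitted as the VL source register")
        return {"mode": "vl_reg", "vldest": vldest, "vl_reg": field}
    if bit0 == 0:
        # both MAXVL and VL are set to (imm+1)
        return {"mode": "mvl_vl_imm", "vldest": vldest,
                "mvl": field + 1, "vl": field + 1}
    # only MAXVL is set to (imm+1); VL is truncated by the hardware
    return {"mode": "mvl_imm", "vldest": vldest, "mvl": field + 1}
```

+In every non-default mode a \texttt{vldest} of zero means that no scalar
+register is written, matching the "if VLdest is not x0" condition above.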
+
+
+\section{vs\#/vd Fields' Encoding}
+
+% Note tabularx - as the 3rd field needs to wrap otherwise it overflows the line
+\begin{tabularx}{\textwidth}{|l|l|X|} \hline
+ vs\#/vd & Mnemonic & Meaning \\ \hline
+ 0 & S & the rs\#/rd field specifies a scalar (single sub-vector);
+ the rs\#/rd field is zero-extended to get the actual 7-bit register number
+ \\ \hline
+ 1 & V & the rs\#/rd field specifies a vector; the rs\#/rd field is decoded using
+ the Vector Register Number Encoding to get the actual 7-bit register number
+ \\ \hline
+\end{tabularx}
+
+\fixme{Vector Register Number Encoding should be a link }
+
+If a vs\#/vd field is not present, it is as if it was present with a value that
+is the bitwise-or of all present vs\#/vd fields.
+
+\begin{itemize}
+\item
+ Scalar register numbers do NOT increment when allocated in the hardware
+ for-loop. The same scalar register number is handed to every ALU.
+
+\item
+ Vector register numbers DO increase when allocated in the hardware
+ for-loop. Sequentially-increasing register data is handed to sequential
+ ALUs.
+
+\end{itemize}
+
+\section{Vector Register Number Encoding}
+
+For the 48 bit format, when vs\#/vd is 1, the actual 7-bit register number is
+derived from the corresponding 6-bit rs\#/rd field:
+
+\begin{tabular}{|l|l|l|} \hline
+ \multicolumn{3}{|c|}{Actual 7-bit register number} \\ \hline
+ Bit 6 & Bits 5:1 & Bit 0 \\ \hline
+ rs\#/rd[0] & rs\#/rd[5:1] & 0 \\ \hline
+\end{tabular}
+
+For the 64 bit format, the 7 bit register number is constructed from three
+fields: bits 0 to 4 from the 32 bit RV Standard format, bit 5 from the 48 bit
+prefix and bit 6 from the 64 bit prefix. Thus in the 64 bit format the full
+range of up to 128 registers is directly available. This applies in both
+scalar and vector mode.
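+
+The two register-number constructions in this section amount to a few bit
+operations. The Python sketch below is illustrative only; the function names
+are hypothetical, and \texttt{field6} stands for the 6-bit rs\#/rd value
+(bits 4:0 from the base instruction, bit 5 from the 48 bit prefix).

```python
def regnum_p48(field6, is_vector):
    """7-bit register number for the 48 bit format."""
    if not 0 <= field6 < 64:
        raise ValueError("rs#/rd is a 6-bit field in the 48 bit format")
    if not is_vector:
        return field6                        # scalar: zero-extended
    # vector: bit 6 <- field[0], bits 5:1 <- field[5:1], bit 0 <- 0
    return ((field6 & 1) << 6) | (field6 & 0b111110)


def regnum_p64(rv32_bits_4_0, p48_bit5, p64_bit6):
    """7-bit register number for the 64 bit format (scalar or vector)."""
    return ((p64_bit6 & 1) << 6) | ((p48_bit5 & 1) << 5) | (rv32_bits_4_0 & 0x1F)
```

+In the 48 bit vector case the result is always even, so only every second
+register can start a vector, but registers 64 and above become reachable.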
+
+\section{Load/Store Kind (lsk) Field Encoding}
+
+\begin{tabular}{|l|l|l|} \hline
+ vd/vs2 & vs1 & Meaning \\ \hline
+ 0 & 0 & srcbase is scalar, LD/ST is pure scalar. \\ \hline
+ 1 & 0 & srcbase is scalar, LD/ST is unit strided \\ \hline
+ 0 & 1 & srcbase is a vector (gather/scatter aka array of srcbases). VSPLAT and VSELECT \\ \hline
+ 1 & 1 & srcbase is a vector, LD/ST is a full vector LD/ST. \\ \hline
+\end{tabular}
+
+Notes:
+\begin{itemize}
+\item
+ A register strided LD/ST would require 5 registers. srcbase, vd/vs2,
+ predicate 1, predicate 2 and the stride register.
+
+\item
+ Complex strides may all be done with a general purpose vector of srcbases.
+
+\item
+ Twin predication may be used even when vd/vs1 is a scalar, to give VSPLAT
+ and VSELECT, because the hardware loop ends on the first occurrence of a 1
+ in the predicate when a predicate is applied to a scalar.
+
+\item
+ Full vectorised gather/scatter is enabled when both registers are marked as
+ vectorised, however unlike e.g. Intel AVX512, twin predication can be
+ applied.
+
+\end{itemize}
+
+Open question: RVV overloads the width field of LOAD-FP/STORE-FP, using bit 2
+to indicate additional interpretation of the 11 bit immediate. Should this be
+considered?
+
+\section{Sub-Vector Length (svlen) Field Encoding}
+
+NOTE: svlen is not the same as the main spec SUBVL. When nondefault (not zero)
+SVPSTATE context is used for Sub vector loops. However if svlen is zero, STATE
+and SUBVL are used instead.
+
+Bitwidth, from VL's perspective, is a multiple of the elwidth times svlen. So
+within each loop of VL there are svlen sub-elements of elwidth in size, just
+like in a SIMD architecture. When svlen is set to 0b00 (indicating svlen=1) no
+such SIMD-like behaviour exists and the subvectoring is disabled.
+
+Predicate bits do not apply to the individual sub-vector elements, they apply
+to the entire subvector group. This saves instructions on setup of the
+predicate.
+
+\begin{tabular}{|l|l|} \hline
+ svlen Encoding & Value \\ \hline
+ 00 & SUBVL \\ \hline
+ 01 & 2 \\ \hline
+ 10 & 3 \\ \hline
+ 11 & 4 \\ \hline
+\end{tabular}
+
+In independent standalone implementations that do not implement the main
+\Specification, the value of SUBVL in the above table (svlen=0b00) is set to 1,
+such that svlen is also 1.
+
+Behaviour of operations that set svlen is identical to that of the main spec.
+See section on VLtyp, above.
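+
+A minimal Python sketch of the svlen table, under the assumption that a
+standalone implementation passes 1 for the main-spec SUBVL. The function
+names and the \texttt{main\_subvl} parameter are hypothetical.

```python
def effective_subvl(svlen, main_subvl=1):
    """Sub-vector length selected by the 2-bit svlen field."""
    if not 0 <= svlen <= 0b11:
        raise ValueError("svlen is a 2-bit field")
    if svlen == 0b00:
        return main_subvl        # defer to SUBVL (1 when standalone)
    return svlen + 1             # 01 -> 2, 10 -> 3, 11 -> 4


def group_bits(elwidth, svlen, main_subvl=1):
    """Bits consumed per VL iteration: elwidth times the sub-vector length."""
    return elwidth * effective_subvl(svlen, main_subvl)
```

+For example, 8-bit elements with svlen=0b11 give 32 bits per VL iteration,
+which is the SIMD-like grouping described above.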
+
+\section{Predication (pred) Field Encoding}
+
+\begin{tabular}{|l|l|l|l|} \hline
+ pred & Mnemonic & Predicate Register & Meaning \\ \hline
+ 000 & None & None & The instruction is unpredicated \\ \hline
+ 001 & Reserved & Reserved & \\ \hline
+ 010 & !x9 & \multirow{2}{*}{x9 (s1)} & execute vector op[0..i] on x9[i] == 0 \\ \cline{1-2} \cline{4-4}
+ 011 & x9 & & execute vector op[0..i] on x9[i] == 1 \\ \hline
+ 100 & !x10 & \multirow{2}{*}{x10 (a0)} & execute vector op[0..i] on x10[i] == 0 \\ \cline{1-2} \cline{4-4}
+ 101 & x10 & & execute vector op[0..i] on x10[i] == 1 \\ \hline
+ 110 & !x11 & \multirow{2}{*}{x11 (a1)} & execute vector op[0..i] on x11[i] == 0 \\ \cline{1-2} \cline{4-4}
+ 111 & x11 & & execute vector op[0..i] on x11[i] == 1 \\ \hline
+\end{tabular}
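+
+The pred table can be read as "upper two bits select the register, lowest bit
+selects the polarity". A Python sketch of that reading follows; the function
+names are hypothetical, and the per-element test assumes that predicate bit i
+guards element i.

```python
def decode_pred(pred):
    """Return None if unpredicated, else (register_number, inverted)."""
    if not 0 <= pred <= 0b111:
        raise ValueError("pred is a 3-bit field")
    if pred == 0b000:
        return None
    if pred == 0b001:
        raise ValueError("pred=001 is reserved")
    reg = 9 + (pred >> 1) - 1        # 01x -> x9, 10x -> x10, 11x -> x11
    inverted = (pred & 1) == 0       # even encodings are the "!" forms
    return reg, inverted


def element_enabled(pred_reg_value, i, inverted):
    """True if element i executes under the given predicate register value."""
    bit = (pred_reg_value >> i) & 1
    return bit == (0 if inverted else 1)
```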
+
+\section{Twin-predication (tpred) Field Encoding}
+
+Twin-predication (ability to associate two predicate registers with an
+instruction) applies to MV, FCLASS, LD and ST. The same format also applies to
+integer-branch-compare operations although it is not to be considered "twin"
+predication. In the case of integer-branch-compare operations, the second
+register (if enabled) stores the results of the element comparisons. See
+Appendix for details.
+
+\fixme{Appendix above is link to http://libre\-riscv.org/simple\_v\_extension/appendix/ }
+
+\begin{tabular}{|l|l|l|l|} \hline
+ pred & Mnemonic & Predicate Register & Meaning \\ \hline
+ 000 & None & None & The instruction is unpredicated \\ \hline
+ 001 & x9,off & src=x9, dest=none & src[0..i] uses x9[i], dest unpredicated \\ \hline
+ 010 & off,x10 & src=none, dest=x10 & dest[0..i] uses x10[i], src unpredicated \\ \hline
+ 011 & x9,x10 & src=x9, dest=x10 & src[0..i] uses x9[i], dest[0..i] uses x10[i] \\ \hline
+ 100 & None & RESERVED & Instruction is unpredicated (TBD) \\ \hline
+ 101 & !x9,off & src=!x9, dest=none & \\ \hline
+ 110 & off,!x10 & src=none, dest=!x10 & \\ \hline
+ 111 & !x9,!x10 & src=!x9, dest=!x10 & \\ \hline
+\end{tabular}
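+
+Read bitwise, the tpred table uses bit 0 to enable the source predicate (x9),
+bit 1 to enable the destination predicate (x10) and bit 2 to invert both. The
+Python sketch below encodes that reading; the function name and the returned
+keys are hypothetical, and tpred=100 is treated as unpredicated, as the table
+currently states (TBD).

```python
def decode_tpred(tpred):
    """Decode the 3-bit twin-predication field."""
    if not 0 <= tpred <= 0b111:
        raise ValueError("tpred is a 3-bit field")
    src = 9 if tpred & 0b001 else None      # x9 predicates the source
    dest = 10 if tpred & 0b010 else None    # x10 predicates the destination
    predicated = src is not None or dest is not None
    inverted = bool(tpred & 0b100) and predicated
    return {"src": src, "dest": dest, "inverted": inverted}
```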
+
+\fixme{In table above some in col 3 might be vertically joined}
+
+\section{Integer Element Type (itype) Field Encoding}
+
+\begin{tabularx}{\textwidth}{|l|l|l|X|X|X|} \hline
+ Signedness [2] & itype & Element Type & Mnemonic in Integer Instructions & Mnemonic in FP Instructions (such as fmv.x) & Meaning (INT may be un/signed, FP just re-sized) \\ \hline
+ Unsigned & 01 & u8 & BU & BU & Unsigned 8-bit \\ \hline
+ & 10 & u16 & HU & HU & Unsigned 16-bit \\ \hline
+ & 11 & u32 & WU & WU & Unsigned 32-bit \\ \hline
+ & 00 & uXLEN & WU/DU/QU & WU/LU/TU & Unsigned XLEN-bit \\ \hline
+ Signed & 01 & i8 & BS & BS & Signed 8-bit \\ \hline
+ & 10 & i16 & HS & HS & Signed 16-bit \\ \hline
+ & 11 & i32 & W & W & Signed 32-bit \\ \hline
+ & 00 & iXLEN & W/D/Q & W/L/T & Signed XLEN-bit \\ \hline
+\end{tabularx}
+
+[2] (1, 2) Signedness is defined in Signedness Decision Procedure
+
+Note: vector mode is effectively a type-cast of the register file as if it was
+a sequential array being typecast to typedef itype[] (C syntax). The starting
+point of the "typecast" is the vector register rs\#/rd.
+
+Example: if itype=0b10 (u16), and rd is set to "vector", and VL is set to 4,
+the 64-bit register at rd is subdivided into FOUR 16-bit destination elements.
+It is NOT four separate 64-bit destination registers (rd+0, rd+1, rd+2, rd+3)
+that are sign-extended from the source width size out to 64-bit, because that
+is itype=0b00 (uXLEN).
+
+Note also: changing elwidth creates packed elements that, depending on VL, may
+create vectors that do not fit perfectly onto XLEN sized register file
+bit-boundaries. This does NOT result in the destruction of the MSBs of the last
+register written to at the end of a VL loop. More details on how to handle this
+are described in the main \Specification.
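+
+To make the packing concrete, here is a small Python model of writing
+narrow elements into a register file viewed as a sequential array. It is a
+sketch under assumptions not stated in this proposal: elements are packed from
+the least significant end of each register upwards, and the names
+\texttt{write\_element} and \texttt{write\_vector} are hypothetical.

```python
def write_element(regfile, base, index, value, elwidth, xlen=64):
    """Write one elwidth-bit element, leaving all other bits untouched."""
    per_reg = xlen // elwidth                # elements held by one register
    reg = base + index // per_reg            # spill into following registers
    shift = (index % per_reg) * elwidth
    mask = ((1 << elwidth) - 1) << shift
    regfile[reg] = (regfile[reg] & ~mask) | ((value << shift) & mask)


def write_vector(regfile, base, values, elwidth, xlen=64):
    """Write len(values) elements (i.e. VL of them) starting at register base."""
    for i, v in enumerate(values):
        write_element(regfile, base, i, v, elwidth, xlen)
```

+With VL=3 and 16-bit elements only the low 48 bits of the destination register
+are modified, which illustrates the point above that the MSBs of the last
+register written are not destroyed.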
+
+\section{Signedness Decision Procedure}
+
+\begin{enumerate}
+\item
+ If the opcode field is either OP or OP-IMM, then
+
+\indent Signedness is Unsigned.
+
+\item
+ If the opcode field is either OP-32 or OP-IMM-32, then
+
+\indent Signedness is Signed.
+
+\item
+ If Signedness is encoded in a field of the base instruction, [3] then
+
+\indent Signedness uses the encoded value.
+
+\item
+ Otherwise,
+
+\indent Signedness is Unsigned.
+
+\end{enumerate}
+
+[3] Like in fcvt.d.l[u], but unlike in fmv.x.w, since there is no fmv.x.wu
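+
+The procedure above is a first-match rule list, which the following Python
+sketch mirrors. The function name, the string labels and the
+\texttt{encoded} parameter (the signedness carried by the base instruction,
+if any) are hypothetical.

```python
def signedness(opcode, encoded=None):
    """Apply the Signedness Decision Procedure in order; first match wins."""
    if opcode in ("OP", "OP-IMM"):
        return "unsigned"
    if opcode in ("OP-32", "OP-IMM-32"):
        return "signed"
    if encoded is not None:          # e.g. the [u] suffix of fcvt.d.l[u]
        return encoded
    return "unsigned"
```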
+
+\section{Vector Type and Predication 5-bit (vtp5) Field Encoding}
+
+In the following table, X denotes a wildcard that is 0 or 1 and can be a
+different value for every occurrence.
+
+\begin{tabular}{|l|l|l|} \hline
+ vtp5 & pred & svlen \\ \hline
+ 1XXXX & vtp5[4:2] & vtp5[1:0] \\ \hline
+ 01XXX & & \\ \hline
+ 000XX & & \\ \hline
+ 001XX & Reserved & \\ \hline
+\end{tabular}
+
+\section{Vector Integer Type and Predication 6-bit (vitp6) Field Encoding}
+
+In the following table, X denotes a wildcard that is 0 or 1 and can be a
+different value for every occurrence.
+
+\begin{tabular}{|l|l|l|l|l|} \hline
+ vitp6 & itype & pred[2] & pred[1:0] & svlen \\ \hline
+ XX1XXX & vitp6[5:4] & 0 & vitp6[3:2] & vitp6[1:0] \\ \hline
+ XX00XX & & & & \\ \hline
+ XX01XX & Reserved & & & \\ \hline
+\end{tabular}
+
+\fixme{spanning cols/rows above}
+
+vitp7 field: only tpred
+
+\begin{tabular}{|l|l|l|l|l|} \hline
+ vitp7 & itype & tpred[2] & tpred[1:0] & svlen \\ \hline
+ XXXXXXX & vitp7[5:4] & vitp7[6] & vitp7[3:2] & vitp7[1:0] \\ \hline
+\end{tabular}
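+
+The vitp7 and vitp6 tables can be unpacked as shown in the Python sketch
+below. For vitp6 it assumes that the blank cells in the XX00XX row inherit the
+mapping of the first row, so that pred[2] is always 0 and pred[1:0] is
+vitp6[3:2]. That reading is an assumption, and the function names are
+hypothetical.

```python
def decode_vitp7(vitp7):
    """Split vitp7 into itype, tpred and svlen."""
    if not 0 <= vitp7 < (1 << 7):
        raise ValueError("vitp7 is a 7-bit field")
    return {
        "itype": (vitp7 >> 4) & 0b11,
        # tpred[2] comes from vitp7[6], tpred[1:0] from vitp7[3:2]
        "tpred": (((vitp7 >> 6) & 1) << 2) | ((vitp7 >> 2) & 0b11),
        "svlen": vitp7 & 0b11,
    }


def decode_vitp6(vitp6):
    """Split vitp6 into itype, pred and svlen (pred[2] is always 0)."""
    if not 0 <= vitp6 < (1 << 6):
        raise ValueError("vitp6 is a 6-bit field")
    pred = (vitp6 >> 2) & 0b11
    if pred == 0b01:
        raise ValueError("vitp6 pattern XX01XX is reserved")
    return {"itype": (vitp6 >> 4) & 0b11, "pred": pred, "svlen": vitp6 & 0b11}
```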
+
+\section{48-bit Instruction Encoding Decision Procedure}
+
+In the following decision procedure, \textit{Reserved} means that there is not yet a
+defined 48-bit instruction encoding for the base instruction.
+
+\begin{enumerate}
+
+\item
+ If the base instruction is a load instruction, then
+
+ \begin{enumerate}
+ \item
+ If the base instruction is an I-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is P48-LD-type.
+
+ \end{enumerate}
+
+ \item
+ Otherwise
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+ \end{enumerate}
+\item
+ If the base instruction is a store instruction, then
+
+ \begin{enumerate}
+ \item
+ If the base instruction is an S-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is P48-ST-type.
+
+ \end{enumerate}
+
+ \item
+ Otherwise
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+ \end{enumerate}
+
+\item
+ If the base instruction is a SYSTEM instruction, then
+
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+\item
+ If the base instruction is an integer instruction, then
+
+ \begin{enumerate}
+
+ \item
+ If the base instruction is an R-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is P48-R-type.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is an I-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is P48-I-type.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is an S-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is a B-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is a U-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is P48-U-type.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is a J-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+ \item
+ Otherwise
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+ \end{enumerate}
+
+\item
+ If the base instruction is a floating-point instruction, then
+
+ \begin{enumerate}
+
+ \item
+ If the base instruction is an R-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is P48-FR-type.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is an I-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is P48-FI-type.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is an S-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is a B-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is a U-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is a J-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+ \item
+ If the base instruction is an R4-type instruction, then
+ \begin{enumerate}
+ \item
+ The encoding is P48-FR4-type.
+
+ \end{enumerate}
+
+ \item
+ Otherwise
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+ \end{enumerate}
+
+\item
+ Otherwise
+ \begin{enumerate}
+ \item
+ The encoding is \textit{Reserved}.
+
+ \end{enumerate}
+
+\end{enumerate}
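+
+Since every branch of the decision procedure above ends in either a named
+encoding or \textit{Reserved}, it collapses to a lookup table. The Python
+sketch below assumes the caller has already classified the base instruction
+in the order given (load, store, SYSTEM, integer, floating-point); the
+category strings and the function name are hypothetical.

```python
P48_ENCODINGS = {
    ("load", "I"): "P48-LD-type",
    ("store", "S"): "P48-ST-type",
    ("integer", "R"): "P48-R-type",
    ("integer", "I"): "P48-I-type",
    ("integer", "U"): "P48-U-type",
    ("fp", "R"): "P48-FR-type",
    ("fp", "I"): "P48-FI-type",
    ("fp", "R4"): "P48-FR4-type",
}


def p48_encoding(category, rv_format):
    """Return the P48 encoding name, or "Reserved" if none is defined yet."""
    return P48_ENCODINGS.get((category, rv_format), "Reserved")
```

+Anything not listed, including all SYSTEM, B-type and J-type instructions,
+falls through to "Reserved".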
+
+\section{CSR Registers}
+
+CSRs are the same as in the main \Specification, if associated functionality is implemented. They have the exact same meaning as in the main \Specification.
+
+\begin{itemize}
+\item
+ VL
+
+\item
+ MVL
+
+\item
+ SVPSTATE
+
+\item
+ SUBVL
+
+\end{itemize}
+
+Associated SET and GET on the CSRs are exactly as in the main spec as well
+(including CSRRWI and CSRRW differences).
+
+Note that if both VLtyp and svlen are not implemented, SVPSTATE is not
+required. Also if VL and SUBVL are not implemented, STATE from the main
+\Specification is not required either.
+
+However if partial functionality is implemented, the unimplemented bits in
+STATE and SVPSTATE must be zero, and, in the UNIX Platform, an illegal
+instruction exception MUST be raised if unsupported bits are written to.
+
+SVPSTATE fields are exactly the same layout as STATE:
+
+\begin{tabular}{|l|l|l|l|l|l|l|} \hline
+ (31..28) & (27..26) & (25..24) & (23..18) & (17..12) & (11..6) & (5..0) \\ \hline
+ rsvd & dsvoffs & subvl & destoffs & srcoffs & vl & maxvl \\ \hline
+\end{tabular}
+
+However note that where STATE stores the scalar register number to be used as
+VL, SVPSTATE.VL actually contains the actual VL value, in an identical fashion
+to RVV.
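+
+The SVPSTATE layout in the table above can be packed and unpacked as in the
+following Python sketch. The field positions are taken from the table; the
+helper names \texttt{pack\_svpstate} and \texttt{unpack\_svpstate} are
+hypothetical and serve only to illustrate the bit layout.

```python
# (name, least significant bit, width) for each SVPSTATE field
SVPSTATE_FIELDS = (
    ("maxvl", 0, 6),
    ("vl", 6, 6),
    ("srcoffs", 12, 6),
    ("destoffs", 18, 6),
    ("subvl", 24, 2),
    ("dsvoffs", 26, 2),
    ("rsvd", 28, 4),
)


def unpack_svpstate(value):
    """Split a 32-bit SVPSTATE value into a dict of named fields."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, lsb, width in SVPSTATE_FIELDS}


def pack_svpstate(**fields):
    """Build a 32-bit SVPSTATE value; omitted fields default to zero."""
    value = 0
    for name, lsb, width in SVPSTATE_FIELDS:
        v = fields.get(name, 0)
        if v < 0 or v >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        value |= v << lsb
    return value
```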
+
+\section{Additional Instructions}
+
+\begin{itemize}
+\item
+ Add instructions to convert between integer types.
+
+\item
+ Add instructions to swizzle elements in sub-vectors. Note that the
+ sub-vector lengths of the source and destination won't necessarily match.
+
+\item
+ Add instructions to transpose (2-4)x(2-4) element matrices.
+
+\item
+ Add instructions to insert or extract a sub-vector from a vector, with the
+ index allowed to be both immediate and from a register (immediate can be
+ covered by twin-predication, register might be, by virtue of predicates
+ being registers)
+
+\item
+ Add a register gather instruction (aka MV.X: regfile[rd] =
+ regfile[regfile[rs1]])
+
+\end{itemize}
+
+subelement swizzle example:
+
+ velswizzle x32, x64, SRCSUBVL=3, DESTSUBVL=4, ELTYPE=u8, elements=[0, 0, 2, 1]
+
+\section{Questions}
+
+Moved to the discussion page (link at top of this page).
+
+\section{TODO}
+
+Work out a way to do sub-element swizzling.
% Glossary
+% Yes: some of them are obvious, but it does no harm.
% In the main documentation text I have not tagged every use of the glossary
% entries below, I have tagged the first in a chapter, or first use for some
% number of paragraphs.
-% Put the definition of terms of the glossary terms in here
-% Try to keep in alphabetic order - for easier editing, they will
-% be generated (in the PDF) in alphabetic order regardless of the order below
+% Put the definitions of the glossary terms in here. I did try to keep them
+% in alphabetic order - for easier editing, but then decided that keeping
+% related terms together helped (in which case put some blank lines
+% before/after the related items). They will be generated (in the PDF) in
+% alphabetic order regardless of the order below
% To use one do something like: \gls{PowerPC}
% Note that the entries are case sensitive.
% entries below to appear. This seems to be if an entry is only mentioned in another
% glossary entry.
+\newglossaryentry{ALU}
+{
+ name=ALU,
+ description={
+ Arithmetic Logic Unit.
+ The part of the computer that does calculations of integer data.
+ Contrast to the \gls{FPU}.
+ See: \href{https://en.wikipedia.org/wiki/Arithmetic_logic_unit}{Wikipedia}
+ }
+}
+
\newglossaryentry{Binutils}
{
name=Binutils,
}
}
+\newglossaryentry{FPU}
+{
+ name=FPU,
+ description={
+ Floating Point Unit.
+ The part of the computer that does calculations of data in, probably, \gls{IEEE754} format.
+ See: \href{https://en.wikipedia.org/wiki/Floating-point_unit}{Wikipedia}
+ }
+}
+
+\newglossaryentry{FU}
+{
+ name=FU,
+ description={
+ Functional Unit.
+ A computer typically has five main functional units:
+ \gls{CPU}, Input (get data from: keyboard, network, microphone, ...),
+ Output (send data to: screen, network, audio, ...), Memory (RAM, disk, ...) \&
+ Control (coordinates the other FUs).
+ Many of these FUs are built out of smaller FUs.
+ }
+}
+
\newglossaryentry{gcc}
{
name=gcc,
A popular standard way of representing and manipulating floating point numbers.
Initiated by the Institute of Electrical and Electronics Engineers in 1985.
Different precisions from 16 to 256 bits are described.
- See: \href{https://en.wikipedia.org/wiki/IEEE_754}{Wikipedia}
+ See: \gls{FPU} \href{https://en.wikipedia.org/wiki/IEEE_754}{Wikipedia}
}
}
Input Output Memory Management Unit.
Mediates between Input/Output devices and main memory mapping virtual
addresses to physical ones and, maybe, enforcing protection restrictions.
- See: \href{https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit}{Wikipedia}
+ See: \href{https://en.wikipedia.org/wiki/Input\%E2\%80\%93output_memory_management_unit}{Wikipedia}
}
}
}
}
+
+
+\newglossaryentry{LSB}
+{
+ name=LSB,
+ description={
+ Least Significant Bit.
+ In an integer represented in binary, the bit that has the smallest value.
+ In this document the LSB is called 'bit 0', if this is the only bit set the
+ integer will have the value 1.
+ See: \gls{MSB}
+% See: \href{}{}
+ }
+}
+
+\newglossaryentry{MSB}
+{
+ name=MSB,
+ description={
+ Most Significant Bit.
+ In an integer represented in binary, the bit that has the greatest value.
+ If the integer is signed, this bit will make the integer negative if it is 1.
+ In this document the MSB is given the highest number in the integer, eg:
+ in 8 bits it is called 'bit 7'; in 32 bits it is called 'bit 31'.
+ See: \gls{LSB}
+% See: \href{}{}
+ }
+}
+
+
+
+% microwatt https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl
+% https://www.zephyrproject.org/microwatt-and-the-power-isa-support-in-renode/
+% I did see references to this somewhere ... cannot see them now
+\newglossaryentry{microwatt}
+{
+ name=microwatt,
+ description={
+ An open source implementation of POWER by IBM in 2019.
+% See: \href{}{}
+ }
+}
+
\newglossaryentry{MISA}
{
name=MISA,
The ability to run more than one \gls{ISA} on the same hardware.
A setting in a \gls{CSR} controls which instructions will be
recognised at any time.
- See: \href{}{}
+% See: \href{}{}
}
}
% https://ieeexplore.ieee.org/document/6136696 - paywalled
}
}
+\newglossaryentry{PC}
+{
+ name=PC,
+ description={
+ Program Counter.
+ A register that holds the address of the instruction being executed.
+ }
+}
+
\newglossaryentry{PowerPC}
{
name=PowerPC,
}
}
+\newglossaryentry{SP}
+{
+ name=SP,
+ description={
+ Stack Pointer.
+ A register that holds the address of the current function stack
+ frame -- used for variables local to a function.
+ }
+}
+
\newglossaryentry{SPARC}
{
name=SPARC,
description={
Video Processing Unit.
Similar to a \gls{CPU} but has extra hardware instructions to speed up things
- like the decoding and encoding of \gls{H.265}, or \gls{VP9}.
-% See: \href{}{}
+ like the decoding and encoding of video.
+% like the decoding and encoding of \gls{H.265}, or \gls{VP9}.
}
}
}
}
+
+
+\newglossaryentry{LE}
+{
+ name=LE,
+ description={
+ Little Endian.
+ When 2/4/8 bytes are loaded into a 16/32/64 bit register the bytes at \textbf{lower}
+ memory addresses are put into \textbf{lower -- less significant} places in the register.
+ Intel/AMD are LE.
+ See: \gls{BE} and \gls{endian}
+ }
+}
+\newglossaryentry{BE}
+{
+ name=BE,
+ description={
+ Big Endian.
+ When 2/4/8 bytes are loaded into a 16/32/64 bit register the bytes at \textbf{lower}
+ memory addresses are put into \textbf{higher -- more significant} places in the register.
+ IBM z is BE.
+ See: \gls{LE} and \gls{endian}
+ }
+}
+\newglossaryentry{endian}
+{
+ name=endian,
+ description={
+ Describes which byte in a multi-byte word contains the most significant bits.
+ Two choices, Little Endian (\gls{LE}) and Big Endian (\gls{BE}), predominate,
+ but it can be more complicated when a word is made of 4 or more bytes.
+ \gls{PowerPC}, ARM \& \gls{SPARC} can be either LE or BE.
+ See: \href{https://en.wikipedia.org/wiki/Endianness}{Wikipedia}
+ }
+}
+
+
+
% Other entries to consider:
% emulator
% namespace
% MSB
% PCR
% SIMD
-% ALU
% RA
% RB
-% microwatt https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl
% 6600 https://libre-soc.org/3d_gpu/architecture/6600scoreboard/
% DAG Directed Acyclic Graph
% SR latch
-% FU Functional Unit
-% FPU float point unit
+%
% WAR https://libre-soc.org/3d_gpu/architecture/6600scoreboard/ #10
-% ALU
% FU-FU function to function https://libre-soc.org/3d_gpu/architecture/6600scoreboard/ #14
% GORD GOWR GO read/write
% ISANS
% SIE
% WARL
% WLRL
+% xepc -- in isamux.tex
+% RV - as in RV mode Description isamux
+% RVS - see SVPrefix
+% VL MVL see SVPrefix
+% SVorig