From: Alain D D Williams Date: Tue, 25 Aug 2020 19:18:42 +0000 (+0100) Subject: Latest additions X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=09020a5d1e22d10c3166dfdac23e19a5db6e7c92;p=libresoc-isa-manual.git Latest additions --- diff --git a/powerpc-add/build/Makefile b/powerpc-add/build/Makefile index b204024..b659a1a 100644 --- a/powerpc-add/build/Makefile +++ b/powerpc-add/build/Makefile @@ -29,6 +29,7 @@ srcdir = src # Relative to output: relsrcdir = ../$(srcdir) +# What we want to produce: doc_target = power-spec docs_with_bib = power-spec diff --git a/powerpc-add/src/SVPrefix.tex b/powerpc-add/src/SVPrefix.tex new file mode 100644 index 0000000..72b2251 --- /dev/null +++ b/powerpc-add/src/SVPrefix.tex @@ -0,0 +1,798 @@ +% https://bugs.libre-soc.org/show_bug.cgi?id=213 +% SimpleV Prefix (SVprefix) Proposal v0.3 +% https://libre-soc.org/simple_v_extension/sv_prefix_proposal/ + +\newcommand{\Specification}{{\href{https://libre-soc.org/simple_v_extension/specification/}{Specification}}} + +\chapter{SimpleV Prefix Proposal -- v0.3} + +\paragraph{} + +Copyright (c) Jacob Lifshay, 2019 +Copyright (c) Luke Kenneth Casson Leighton, 2019 + +This proposal is designed to be able to operate without SVorig, but not to +require the absence of SVorig. See \Specification. + +Principle: SVprefix embeds (unmodified) RVC and 32-bit scalar opcodes into 32, +48 and 64 bit RV formats, to provide Vectorisation context on a per-instruction +basis. 
+ +\section{Options} + + +The following partial / full implementation options are possible: + +\begin{itemize} +\item + SVPrefix augments the main \Specification + +\item + SVPrefix operates independently, without the main spec VL (and MVL) \gls{CSR}s + (in any privilege level) + +\item + SVPrefix operates independently, without the main spec SUBVL CSRs (in any privilege level) + +\item + SVPrefix has no support for VL (or MVL) overrides in the 64 bit instruction + format (VLtyp=0 as the only permitted value) + +\item + SVPrefix has no support for svlen overrides in either the 48 or 64 bit + instruction format (svlen=0 as the only permitted value). + +\end{itemize} + +All permutations of the above options are permitted, and on the UNIX platform an +implementation must raise an illegal instruction exception for each option that +it does not support. For example, an implementation that has no support for VLtyp that +sees an opcode with a nonzero VLtyp must raise an illegal instruction exception. + +Note that SVPrefix (VLtyp and svlen) has its own STATE CSR, SVPSTATE. This +allows Prefixed operations to be re-entrant on traps, and to not affect VBLOCK +use of VL or SUBVL. + +If the main \Specification CSRs and features are to be supported (VBLOCK), then +when VLtyp or svlen are "default" they utilise the main \Specification VBLOCK VL +and/or SUBVL, and, correspondingly, the main VBLOCK STATE CSR will be updated +and used to track hardware loops. + +If however VLtyp is set to nondefault, then the SVPSTATE srcoffs and destoffs +fields are used instead to create the hardware loops, and likewise if svlen is +set to nondefault, SVPSTATE's svoffs field is used. + +\section{Half-Precision Floating Point (FP16)} + +If the F extension is supported, SVprefix adds support for FP16 in the base FP +instructions by using 10 (H) in the floating-point format field fmt and using +001 (H) in the floating-point load/store width field. 
+ +\section{Compressed Instructions} + +Compressed instructions are under evaluation. The approach takes the same prefix as used +in P48 and embeds it, together with standard RVC opcodes (minus their RVC prefix), into a +32-bit space. This would use the three remaining Major "custom" opcodes (0-2), +% TODO discussion ??? +one for each of the three RVC Quadrants. See \textbf{discussion ???}. + +\section{48-bit Prefixed Instructions} + +All 48-bit prefixed instructions contain a 32-bit "base" instruction as the +last 4 bytes. Since all 32-bit instructions have bits 1:0 set to 11, those bits +are reused for additional encoding space in the 48-bit instructions. + +\section{64-bit Prefixed Instructions} + +The 48 bit format is further extended with the full 128-register range on all source +and destination registers, and the option to set both SVSTATE.VL and +SVSTATE.MVL is provided. + +\section{48-bit Instruction Encodings} + +In the following table, Rsvd (reserved) entries must be zero. RV32 equivalent encodings are included for side-by-side comparison (and listed below, separately). 
+ +First, bits 17:0: + +\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|} \hline + Encoding & 17 & 16 & 15 & 14 & 13 & 12 & 11:7 & 6 & 5:0 \\ \hline + P48-LD-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & vitp7[5:0] & & Rsvd & 011111 \\ \hline + P48-ST-type & vitp7[6] & rs1[5] & rs2[5] & vs2 & vs1 & vitp7[5:0] & & Rsvd & 011111 \\ \hline + P48-R-type & rd[5] & rs1[5] & rs2[5] & vs2 & vs1 & vitp6 & & Rsvd & 011111 \\ \hline + P48-I-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & vitp7[5:0] & & Rsvd & 011111 \\ \hline + P48-U-type & rd[5] & Rsvd & Rsvd & vd & Rsvd & vitp6 & & Rsvd & 011111 \\ \hline + P48-FR-type & rd[5] & rs1[5] & rs2[5] & vs2 & vs1 & Rsvd & vtp5 & Rsvd & 011111 \\ \hline + P48-FI-type & rd[5] & rs1[5] & vitp7[6] & vd & vs1 & vitp7[5:0] & & Rsvd & 011111 \\ \hline + P48-FR4-type & rd[5] & rs1[5] & rs2[5] & vs2 & rs3[5] & vs3 [1] & vtp5 & Rsvd & 011111 \\ \hline +\end{tabular} + +\fixme{ The link to [1] is easily confused with the likes of [5]} + +[1] Only vs2 and vs3 are included in the P48-FR4-type encoding because there is +not enough space for vs1 as well, and because it is more useful to have a +scalar argument for each of the multiplication and addition portions of fmadd +than to have two scalars on the multiplication portion. + +Table showing correspondence between P48--type and RV32--type. 
These are bits 47:18 (RV32 shifted up by 16 bits): + +\begin{tabular}{|l|l|} \hline + Encoding & RV32 Encoding \\ \hline + 47:32 & 31:2 \\ \hline + P48-LD-type & RV32-I-type \\ \hline + P48-ST-type & RV32-S-Type \\ \hline + P48-R-type & RV32-R-Type \\ \hline + P48-I-type & RV32-I-Type \\ \hline + P48-U-type & RV32-U-Type \\ \hline + P48-FR-type & RV32-FR-Type \\ \hline + P48-FI-type & RV32-I-Type \\ \hline + P48-FR4-type & RV32-FR4-type \\ \hline +\end{tabular} + +Table showing Standard RV32 encodings: + +\begin{tabular}{|l|l|l|l|l|l|l|l|l|} \hline + Encoding & 31:27 & 26:25 & 24:20 & 19:15 & 14:12 & 11:7 & 6:2 & 1:0 \\ \hline + RV32-R-type & funct7 & & rs2[4:0] & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline + RV32-S-type & imm[11:5] & & rs2[4:0] & rs1[4:0] & funct3 & imm[4:0] & opcode & 0b11 \\ \hline + RV32-I-type & imm[11:0] & & & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline + RV32-U-type & imm[31:12] & & & & & rd[4:0] & opcode & 0b11 \\ \hline + RV32-FR4-type & rs3[4:0] & fmt & rs2[4:0] & rs1[4:0] & funct3 & rd[4:0] & opcode & 0b11 \\ \hline + RV32-FR-type & funct5 & fmt & rs2[4:0] & rs1[4:0] & rm & rd[4:0] & opcode & 0b11 \\ \hline +\end{tabular} + +\section{64-bit Instruction Encodings} + +Where in the 48 bit format the prefix is "0b0011111" in bits 0 to 6, this is now set to "0b0111111". 
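As a rough illustration of the 48-bit layout described in the preceding sections, the following Python sketch checks bits 6:0 for the P48 or P64 prefix and rebuilds the embedded RV32 instruction from bits 47:18. The function and constant names are hypothetical and not part of the proposal, and the sketch assumes the whole instruction has already been assembled into one integer with bit 0 as the LSB.

```python
P48_PREFIX = 0b0011111  # bit 6 reserved (zero), bits 5:0 = 011111
P64_PREFIX = 0b0111111  # bits 6:0 of the 64-bit format


def sv_prefix_kind(instr):
    """Return "P48", "P64" or None from bits 6:0 (hypothetical helper)."""
    low7 = instr & 0x7F
    if low7 == P48_PREFIX:
        return "P48"
    if low7 == P64_PREFIX:
        return "P64"
    return None


def p48_base_instruction(instr):
    """Rebuild the RV32 base instruction held in bits 47:18 of a P48 word."""
    # RV32 bits 31:2 sit 16 places higher; bits 1:0 are always 0b11.
    return ((instr >> 16) & 0xFFFFFFFC) | 0b11


# RV32 "add x1, x2, x3", wrapped with an all-zero P48 prefix for illustration
rv32_add = 0x003100B3
p48_word = ((rv32_add & 0xFFFFFFFC) << 16) | P48_PREFIX
```

Only the P48 case is reconstructed here, since that is the layout for which the text gives the shift of 16 bits explicitly.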
+ +\begin{tabular}{|l|l|l|l|} \hline + 63:48 & 47:18 & 17:7 & 6:0 \\ \hline + 64 bit prefix & RV32[31:3] & P48[17:7] & 0b0111111 \\ \hline +\end{tabular} + +\begin{itemize} +\item + The 64 bit prefix format is below + +\item + Bits 18 to 47 contain bits 3 to 31 of a standard RV32 format + +\item + Bits 7 to 17 contain bits 7 through 17 of the P48 format + +\item + Bits 0 to 6 contain the standard RV 64-bit prefix 0b0111111 + +\end{itemize} + +64 bit prefix format: + +\begin{tabular}{|l|l|l|l|l|l|} \hline + Encoding & 63 & 62 & 61 & 60 & 59:48 \\ \hline + P64-LD-type & rd[6] & rs1[6] & & Rsvd & VLtyp \\ \hline + P64-ST-type & & rs1[6] & rs2[6] & Rsvd & VLtyp \\ \hline + P64-R-type & rd[6] & rs1[6] & rs2[6] & vd & VLtyp \\ \hline + P64-I-type & rd[6] & rs1[6] & & Rsvd & VLtyp \\ \hline + P64-U-type & rd[6] & & & Rsvd & VLtyp \\ \hline + P64-FR-type & & rs1[6] & rs2[6] & vd & VLtyp \\ \hline + P64-FI-type & rd[6] & rs1[6] & rs2[6] & vd & VLtyp \\ \hline + P64-FR4-type & rd[6] & rs1[6] & rs2[6] & rs3[6] & VLtyp \\ \hline +\end{tabular} + +The extra bit for src and dest registers provides the full range of up to 128 +registers, when combined with the extra bit from the 48 bit prefix as well. +VLtyp encodes how (whether) to set SVPSTATE.VL and SVPSTATE.MAXVL. + +\section{VLtyp field encoding} + +NOTE: VL and MVL below are local to SVPrefix and, if non-default, will update +the src and dest element offsets in SVPSTATE, not the main \Specification STATE. +If default (all zeros) then STATE VL and MVL apply to this instruction, and +STATE.srcoffs (etc) will be used. 
+ +\begin{tabular}{|l|l|l|l|l|} \hline + VLtyp[11] & VLtyp[10:6] & VLtyp[5:1] & VLtyp[0] & comment \\ \hline + 0 & 00000 & 00000 & 0 & no change to VL/MVL \\ \hline + 0 & VLdest & VLEN & vlt & VL imm/reg mode (vlt) \\ \hline + 1 & VLdest & MVL+VL-immed & 0 & MVL+VL immed mode \\ \hline + 1 & VLdest & MVL-immed & 1 & MVL immed mode \\ \hline +\end{tabular} + +Note: when VLtyp is all zeros, the main \Specification VL and MVL apply to this +instruction. If called outside of a VBLOCK or if sv.setvl has not set VL, the +operation is "scalar". + +Just as in the VBLOCK format, when bit 11 of VLtyp is zero: + +\begin{itemize} +\item + if vlt is zero, bits 1 to 5 specify the VLEN as a 5 bit immediate (offset + by 1: 0b00000 represents VL=1, 0b00001 represents VL=2 etc.) + +\item + if vlt is 1, bits 1 to 5 specify the scalar (RV standard) register from + which VL is set. x0 is not permitted. + +\item + VL goes into the scalar register VLdest (if VLdest is not x0) + +\end{itemize} + +When bit 11 of VLtyp is 1: + +\begin{itemize} +\item + if VLtyp[0] is zero, both SVPSTATE.MAXVL and SVPSTATE.VL are set to + (imm+1). The same value goes into the scalar register VLdest (if VLdest is + not x0) + +\item + if VLtyp[0] is 1, SVPSTATE.MAXVL is set to (imm+1). SVPSTATE.VL will be + truncated to within the new range (if VL was greater than the new MAXVL). + The new VL goes into the scalar register VLdest (if VLdest is not x0). + +\end{itemize} + +This gives the option to set up SVPSTATE.VL in a "loop mode" (VLtyp[11]=0) or +in a "one-off" mode (VLtyp[11]=1) which sets both MVL and VL to the same +immediate value. This may be most useful for one-off Vectorised operations such +as LOAD-MULTI / STORE-MULTI, for saving and restoration of large batches of +registers in context-switches or function calls. 
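The VLtyp field split described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name, the returned dictionary keys and the mode labels are hypothetical, and the forbidden x0 case is reported with a Python exception where real hardware would be expected to trap.

```python
def decode_vltyp(vltyp):
    """Decode a 12-bit VLtyp value into a dict describing the requested mode."""
    if not 0 <= vltyp < (1 << 12):
        raise ValueError("VLtyp is a 12-bit field")
    if vltyp == 0:
        # All zeros: VL and MVL are left unchanged.
        return {"mode": "default"}

    bit11 = (vltyp >> 11) & 0x1    # VLtyp[11]
    vldest = (vltyp >> 6) & 0x1F   # VLtyp[10:6]
    field = (vltyp >> 1) & 0x1F    # VLtyp[5:1]
    bit0 = vltyp & 0x1             # VLtyp[0], called vlt when bit 11 is 0

    if bit11 == 0:
        if bit0 == 0:
            # 5-bit immediate, offset by 1
            return {"mode": "vl-imm", "vldest": vldest, "vl": field + 1}
        if field == 0:
            # Assumption: the forbidden x0 case is reported as an error here.
            raise ValueError("x0 is not permitted as the VL source register")
        return {"mode": "vl-reg", "vldest": vldest, "vl_reg": field}

    if bit0 == 0:
        # MAXVL and VL are both set to imm+1
        return {"mode": "mvl+vl-imm", "vldest": vldest,
                "maxvl": field + 1, "vl": field + 1}
    # Only MAXVL is set; VL is truncated separately if it exceeds MAXVL
    return {"mode": "mvl-imm", "vldest": vldest, "maxvl": field + 1}
```

For instance, `decode_vltyp(3 << 1)` selects the immediate mode with VL=4 and no destination register, because VLdest is x0.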
+ +Note that VLtyp's VL and MVL are not the same as the main \Specification VL or +MVL, and that loops will alter srcoffs and destoffs in SVPSTATE in VLtyp +nondefault mode, but the srcoffs and destoffs in STATE, if VLtyp=0. + +Furthermore, the execution order and exception handling must be exactly the +same as in the main spec (Program Order must be preserved). + +Pseudocode for SVPSTATE.VL: + +\begin{verbatim} + // pseudocode + + regs = [0u64; 128]; + vl = 0; + + // instruction fields: + rd = get_rd_field(); + rs1 = get_rs1_field(); // assumed: rs1 is read in the same way as rd + vlmax = get_immed_field(); + + // handle illegal instruction decoding + if vlmax > XLEN { + trap() + } + + // calculate VL + if rs1 == 0 { // rs1 is x0 + vl = vlmax + } else { + vl = min(regs[rs1], vlmax) + } + + // write rd + if rd != 0 { + // rd is not x0 + regs[rd] = vl + } +\end{verbatim} + + +\section{vs\#/vd Fields' Encoding} + +% Note tabularx - as the 3rd field needs to wrap otherwise it overflows the line +\begin{tabularx}{\textwidth}{|l|l|X|} \hline + vs\#/vd & Mnemonic & Meaning \\ \hline + 0 & S & the rs\#/rd field specifies a scalar (single sub-vector); + the rs\#/rd field is zero-extended to get the actual 7-bit register number + \\ \hline + 1 & V & the rs\#/rd field specifies a vector; the rs\#/rd field is decoded using + the Vector Register Number Encoding to get the actual 7-bit register number + \\ \hline +\end{tabularx} + +\fixme{Vector Register Number Encoding should be a link } + +If a vs\#/vd field is not present, it is as if it was present with a value that +is the bitwise-or of all present vs\#/vd fields. + +\begin{itemize} +\item + Scalar register numbers do NOT increment when allocated in the hardware + for-loop. The same scalar register number is handed to every ALU. + +\item + Vector register numbers DO increase when allocated in the hardware + for-loop. Sequentially-increasing register data is handed to sequential + ALUs. 
+ +\end{itemize} + +\section{Vector Register Number Encoding} + +For the 48 bit format, when vs\#/vd is 1, the actual 7-bit register number is +derived from the corresponding 6-bit rs\#/rd field: + +\begin{tabular}{|l|l|l|} \hline + \multicolumn{3}{|c|}{Actual 7-bit register number} \\ \hline + Bit 6 & Bits 5:1 & Bit 0 \\ \hline + rs\#/rd[0] & rs\#/rd[5:1] & 0 \\ \hline +\end{tabular} + +For the 64 bit format, the 7 bit register is constructed from the 7 bit fields: +bits 0 to 4 from the 32 bit RV Standard format, bit 5 from the 48 bit prefix +and bit 6 from the 64 bit prefix. Thus in the 64 bit format the full range of +up to 128 registers is directly available. This applies whether scalar or +vector mode is set. + +\section{Load/Store Kind (lsk) Field Encoding} + +\begin{tabular}{|l|l|l|} \hline + vd/vs2 & vs1 & Meaning \\ \hline + 0 & 0 & srcbase is scalar, LD/ST is pure scalar. \\ \hline + 1 & 0 & srcbase is scalar, LD/ST is unit strided \\ \hline + 0 & 1 & srcbase is a vector (gather/scatter aka array of srcbases). VSPLAT and VSELECT \\ \hline + 1 & 1 & srcbase is a vector, LD/ST is a full vector LD/ST. \\ \hline +\end{tabular} + +Notes: +\begin{itemize} +\item + A register strided LD/ST would require 5 registers: srcbase, vd/vs2, + predicate 1, predicate 2 and the stride register. + +\item + Complex strides may all be done with a general purpose vector of srcbases. + +\item + Twin predication may be used even when vd/vs1 is a scalar, to give VSPLAT + and VSELECT, because the hardware loop ends on the first occurrence of a 1 + in the predicate when a predicate is applied to a scalar. + +\item + Full vectorised gather/scatter is enabled when both registers are marked as + vectorised; however, unlike e.g. Intel AVX512, twin predication can be + applied. + +\end{itemize} + +Open question: RVV overloads the width field of LOAD-FP/STORE-FP using the bit +2 to indicate additional interpretation of the 11 bit immediate. Should this be +considered? 
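Returning to the Vector Register Number Encoding section, the two register-number rules can be sketched in Python as below. The function names are hypothetical; the sketch only restates the bit placement given in the tables and text (scalar fields are zero-extended, vector fields move bit 0 up to bit 6, and the 64 bit format concatenates its three sources directly).

```python
def p48_register_number(field6, is_vector):
    """Map a 6-bit rs#/rd field of the 48 bit format to a 7-bit register number."""
    if not 0 <= field6 < 64:
        raise ValueError("rs#/rd is a 6-bit field in the 48 bit format")
    if not is_vector:
        # vs#/vd = 0: zero-extend the field
        return field6
    # vs#/vd = 1: bit 6 <- field[0], bits 5:1 <- field[5:1], bit 0 <- 0
    return ((field6 & 0x1) << 6) | (field6 & 0x3E)


def p64_register_number(rv32_bits, p48_bit5, p64_bit6):
    """Concatenate the 5-bit RV32 field with the two prefix bits (64 bit format)."""
    return ((p64_bit6 & 0x1) << 6) | ((p48_bit5 & 0x1) << 5) | (rv32_bits & 0x1F)
```

One consequence visible in the sketch is that a 48 bit vector operand can only name even-numbered registers, while the 64 bit format reaches all 128 directly.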
+ +\section{Sub-Vector Length (svlen) Field Encoding} + +NOTE: svlen is not the same as the main spec SUBVL. When nondefault (not zero) +SVPSTATE context is used for Sub vector loops. However if svlen is zero, STATE +and SUBVL are used instead. + +Bitwidth, from VL's perspective, is a multiple of the elwidth times svlen. So +within each loop of VL there are svlen sub-elements of elwidth in size, just +like in a SIMD architecture. When svlen is set to 0b00 (indicating svlen=1) no +such SIMD-like behaviour exists and the subvectoring is disabled. + +Predicate bits do not apply to the individual sub-vector elements, they apply +to the entire subvector group. This saves instructions on setup of the +predicate. + +\begin{tabular}{|l|l|} \hline + svlen Encoding & Value \\ \hline + 00 & SUBVL \\ \hline + 01 & 2 \\ \hline + 10 & 3 \\ \hline + 11 & 4 \\ \hline +\end{tabular} + +In independent standalone implementations that do not implement the main +\Specification, the value of SUBVL in the above table (svlen=0b00) is set to 1, +such that svlen is also 1. + +Behaviour of operations that set svlen is identical to that of the main spec. +See section on VLtyp, above. 
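The svlen table above amounts to a very small lookup, sketched here in Python. The function name and the `subvl` parameter are hypothetical; the default of 1 models a standalone implementation without the main spec SUBVL.

```python
def effective_svlen(svlen_field, subvl=1):
    """Return the sub-vector length selected by the 2-bit svlen field."""
    if not 0 <= svlen_field <= 3:
        raise ValueError("svlen is a 2-bit field")
    if svlen_field == 0:
        # 0b00 defers to SUBVL (1 when the main spec is not implemented)
        return subvl
    # 0b01 -> 2, 0b10 -> 3, 0b11 -> 4
    return svlen_field + 1
```

Since predicate bits apply to whole sub-vector groups, the number of elements touched per enabled predicate bit is the value returned here.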
+ +\section{Predication (pred) Field Encoding} + +\begin{tabular}{|l|l|l|l|} \hline + pred & Mnemonic & Predicate Register & Meaning \\ \hline + 000 & None & None & The instruction is unpredicated \\ \hline + 001 & Reserved & Reserved & \\ \hline + 010 & !x9 & \multirow{2}{*}{x9 (s1)} & execute vector op[0..i] on x9[i] == 0 \\ \cline{1-2} \cline{4-4} + 011 & x9 & & execute vector op[0..i] on x9[i] == 1 \\ \hline + 100 & !x10 & \multirow{2}{*}{x10 (a0)} & execute vector op[0..i] on x10[i] == 0 \\ \cline{1-2} \cline{4-4} + 101 & x10 & & execute vector op[0..i] on x10[i] == 1 \\ \hline + 110 & !x11 & \multirow{2}{*}{x11 (a1)} & execute vector op[0..i] on x11[i] == 0 \\ \cline{1-2} \cline{4-4} + 111 & x11 & & execute vector op[0..i] on x11[i] == 1 \\ \hline +\end{tabular} + +\section{Twin-predication (tpred) Field Encoding} + +Twin-predication (ability to associate two predicate registers with an +instruction) applies to MV, FCLASS, LD and ST. The same format also applies to +integer-branch-compare operations although it is not to be considered "twin" +predication. In the case of integer-branch-compare operations, the second +register (if enabled) stores the results of the element comparisons. See +Appendix for details. 
+ +\fixme{Appendix above is link to http://libre\-riscv.org/simple\_v\_extension/appendix/ } + +\begin{tabular}{|l|l|l|l|} \hline + pred & Mnemonic & Predicate Register & Meaning \\ \hline + 000 & None & None & The instruction is unpredicated \\ \hline + 001 & x9,off & src=x9, dest=none & src[0..i] uses x9[i], dest unpredicated \\ \hline + 010 & off,x10 & src=none, dest=x10 & dest[0..i] uses x10[i], src unpredicated \\ \hline + 011 & x9,x10 & src=x9, dest=x10 & src[0..i] uses x9[i], dest[0..i] uses x10[i] \\ \hline + 100 & None & RESERVED & Instruction is unpredicated (TBD) \\ \hline + 101 & !x9,off & src=!x9, dest=none & \\ \hline + 110 & off,!x10 & src=none, dest=!x10 & \\ \hline + 111 & !x9,!x10 & src=!x9, dest=!x10 & \\ \hline +\end{tabular} + +\fixme{In table above some in col 3 might be vertically joined} + +\section{Integer Element Type (itype) Field Encoding} + +\begin{tabularx}{\textwidth}{|l|l|l|X|X|X|} \hline + Signedness [2] & itype & Element Type & Mnemonic in Integer Instructions & Mnemonic in FP Instructions (such as fmv.x) & Meaning (INT may be un/signed, FP just re-sized) \\ \hline + Unsigned & 01 & u8 & BU & BU & Unsigned 8-bit \\ \hline + & 10 & u16 & HU & HU & Unsigned 16-bit \\ \hline + & 11 & u32 & WU & WU & Unsigned 32-bit \\ \hline + & 00 & uXLEN & WU/DU/QU & WU/LU/TU & Unsigned XLEN-bit \\ \hline + Signed & 01 & i8 & BS & BS & Signed 8-bit \\ \hline + & 10 & i16 & HS & HS & Signed 16-bit \\ \hline + & 11 & i32 & W & W & Signed 32-bit \\ \hline + & 00 & iXLEN & W/D/Q & W/L/T & Signed XLEN-bit \\ \hline +\end{tabularx} + +[2] Signedness is defined in Signedness Decision Procedure. + +Note: vector mode is effectively a type-cast of the register file as if it was +a sequential array being typecast to typedef itype[] (C syntax). The starting +point of the "typecast" is the vector register rs\#/rd. 
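The "typecast" view in the note above can be modelled with a byte array standing in for the register file. The Python sketch below is an illustration only: the helper names are hypothetical, and it assumes 64-bit registers and little-endian byte order, neither of which is stated at this point in the text. It packs four 16-bit elements into a single register rather than spreading them over four registers.

```python
XLEN_BYTES = 8  # assumes a 64-bit register file


def write_packed(regfile, rd, elements, elwidth_bits):
    """Store elements back-to-back starting at register rd (hypothetical helper)."""
    width = elwidth_bits // 8
    base = rd * XLEN_BYTES
    mask = (1 << elwidth_bits) - 1
    for i, value in enumerate(elements):
        start = base + i * width
        # Little-endian byte order is an assumption of this sketch.
        regfile[start:start + width] = (value & mask).to_bytes(width, "little")


def read_register(regfile, reg):
    """Read one whole XLEN-wide register back as an unsigned integer."""
    start = reg * XLEN_BYTES
    return int.from_bytes(regfile[start:start + XLEN_BYTES], "little")


regfile = bytearray(128 * XLEN_BYTES)  # 128 registers, all zero
write_packed(regfile, 3, [0x1111, 0x2222, 0x3333, 0x4444], 16)
```

After the call, all four elements live inside register 3 and register 4 is untouched, which is the packing behaviour the note describes.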
+ +Example: if itype=0b10 (u16), and rd is set to "vector", and VL is set to 4, +the 64-bit register at rd is subdivided into FOUR 16-bit destination elements. +It is NOT four separate 64-bit destination registers (rd+0, rd+1, rd+2, rd+3) +that are sign-extended from the source width size out to 64-bit, because that +is itype=0b00 (uXLEN). + +Note also: changing elwidth creates packed elements that, depending on VL, may +create vectors that do not fit perfectly onto XLEN sized register file +bit-boundaries. This does NOT result in the destruction of the MSBs of the last +register written to at the end of a VL loop. More details on how to handle this +are described in the main \Specification. + +\section{Signedness Decision Procedure} + +\begin{enumerate} +\item + If the opcode field is either OP or OP-IMM, then + +\indent Signedness is Unsigned. + +\item + If the opcode field is either OP-32 or OP-IMM-32, then + +\indent Signedness is Signed. + +\item + If Signedness is encoded in a field of the base instruction, [3] then + +\indent Signedness uses the encoded value. + +\item + Otherwise, + +\indent Signedness is Unsigned. + +\end{enumerate} + +[3] Like in fcvt.d.l[u], but unlike in fmv.x.w, since there is no fmv.x.wu + +\section{Vector Type and Predication 5-bit (vtp5) Field Encoding} + +In the following table, X denotes a wildcard that is 0 or 1 and can be a +different value for every occurrence. + +\begin{tabular}{|l|l|l|} \hline + vtp5 & pred & svlen \\ \hline + 1XXXX & vtp5[4:2] & vtp5[1:0] \\ \hline + 01XXX & & \\ \hline + 000XX & & \\ \hline + 001XX & Reserved & \\ \hline +\end{tabular} + +\section{Vector Integer Type and Predication 6-bit (vitp6) Field Encoding} + +In the following table, X denotes a wildcard that is 0 or 1 and can be a +different value for every occurrence. 
+ +\begin{tabular}{|l|l|l|l|l|} \hline + vitp6 & itype & pred[2] & pred[0:1] & svlen \\ \hline + XX1XXX & vitp6[5:4] & 0 & vitp6[3:2] & vitp6[1:0] \\ \hline + XX00XX & & & & \\ \hline + XX01XX & Reserved & & & \\ \hline +\end{tabular} + +\fixme{spanning cols/rows above} + +vitp7 field: only tpred + +\begin{tabular}{|l|l|l|l|l|} \hline + vitp7 & itype & tpred[2] & tpred[0:1] & svlen \\ \hline + XXXXXXX & vitp7[5:4] & vitp7[6] & vitp7[3:2] & vitp7[1:0] \\ \hline +\end{tabular} + +\section{48-bit Instruction Encoding Decision Procedure} + +In the following decision procedure, \textit{Reserved} means that there is not yet a +defined 48-bit instruction encoding for the base instruction. + +\begin{enumerate} + +\item + If the base instruction is a load instruction, then + + \begin{enumerate} + \item + If the base instruction is an I-type instruction, then + \begin{enumerate} + \item + The encoding is P48-LD-type. + + \end{enumerate} + + \item + Otherwise + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + + \end{enumerate} +\item + If the base instruction is a store instruction, then + + \begin{enumerate} + \item + If the base instruction is an S-type instruction, then + \begin{enumerate} + \item + The encoding is P48-ST-type. + + \end{enumerate} + + \item + Otherwise + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + + \end{enumerate} + +\item + If the base instruction is a SYSTEM instruction, then + + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + +\item + If the base instruction is an integer instruction, then + + \begin{enumerate} + + \item + If the base instruction is an R-type instruction, then + \begin{enumerate} + \item + The encoding is P48-R-type. + + \end{enumerate} + + \item + If the base instruction is an I-type instruction, then + \begin{enumerate} + \item + The encoding is P48-I-type. 
+ + \end{enumerate} + + \item + If the base instruction is an S-type instruction, then + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + + \item + If the base instruction is a B-type instruction, then + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + + \item + If the base instruction is a U-type instruction, then + \begin{enumerate} + \item + The encoding is P48-U-type. + + \end{enumerate} + + \item + If the base instruction is a J-type instruction, then + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + + \item + Otherwise + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + + \end{enumerate} + +\item + If the base instruction is a floating-point instruction, then + + \begin{enumerate} + + \item + If the base instruction is an R-type instruction, then + \begin{enumerate} + \item + The encoding is P48-FR-type. + + \end{enumerate} + + \item + If the base instruction is an I-type instruction, then + \begin{enumerate} + \item + The encoding is P48-FI-type. + + \end{enumerate} + + \item + If the base instruction is an S-type instruction, then + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + + \item + If the base instruction is a B-type instruction, then + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + + \item + If the base instruction is a U-type instruction, then + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + + \item + If the base instruction is a J-type instruction, then + \begin{enumerate} + \item + The encoding is \textit{Reserved}. + + \end{enumerate} + + \item + If the base instruction is an R4-type instruction, then + \begin{enumerate} + \item + The encoding is P48-FR4-type. + + \end{enumerate} + + \item + Otherwise + \begin{enumerate} + \item + The encoding is \textit{Reserved}. 
+ + \end{enumerate} + \end{enumerate} + +\item + Otherwise + The encoding is \textit{Reserved}. + +\end{enumerate} + +\section{CSR Registers} + +CSRs are the same as in the main \Specification, if associated functionality is implemented. They have the exact same meaning as in the main \Specification. + +\begin{itemize} +\item + VL + +\item + MVL + +\item + SVPSTATE + +\item + SUBVL + +\end{itemize} + +Associated SET and GET on the CSRs are exactly as in the main spec as well +(including CSRRWI and CSRRW differences). + +Note that if both VLtyp and svlen are not implemented, SVPSTATE is not +required. Also if VL and SUBVL are not implemented, STATE from the main +\Specification is not required either. + +However if partial functionality is implemented, the unimplemented bits in +STATE and SVPSTATE must be zero, and, in the UNIX Platform, an illegal +instruction exception MUST be raised if unsupported bits are written to. + +SVPSTATE fields are exactly the same layout as STATE: + +\begin{tabular}{|l|l|l|l|l|l|l|} \hline + (31..28) & (27..26) & (25..24) & (23..18) & (17..12) & (11..6) & (5..0) \\ \hline + rsvd & dsvoffs & subvl & destoffs & srcoffs & vl & maxvl \\ \hline +\end{tabular} + +However note that where STATE stores the scalar register number to be used as +VL, SVPSTATE.VL actually contains the actual VL value, in an identical fashion +to RVV. + +\section{Additional Instructions} + +\begin{itemize} +\item + Add instructions to convert between integer types. + +\item + Add instructions to swizzle elements in sub-vectors. Note that the + sub-vector lengths of the source and destination won't necessarily match. + +\item + Add instructions to transpose (2-4)x(2-4) element matrices. 
+ +\item + Add instructions to insert or extract a sub-vector from a vector, with the + index allowed to be both immediate and from a register (immediate can be + covered by twin-predication, register might be, by virtue of predicates + being registers) + +\item + Add a register gather instruction (aka MV.X: regfile[rd] = + regfile[regfile[rs1]]) + +\end{itemize} + +subelement swizzle example: + + velswizzle x32, x64, SRCSUBVL=3, DESTSUBVL=4, ELTYPE=u8, elements=[0, 0, 2, 1] + +\section{Questions} + +Moved to the discussion page (link at top of this page) + +\section{TODO} + +Work out a way to do sub-element swizzling. diff --git a/powerpc-add/src/conventions.tex b/powerpc-add/src/conventions.tex new file mode 100644 index 0000000..2280dd1 --- /dev/null +++ b/powerpc-add/src/conventions.tex @@ -0,0 +1,42 @@ +% Conventions used in this document + +\chapter{Conventions used in this document} + +\begin{itemize} +\parskip 0pt +\itemsep 1pt + +\item + +Bits are numbered starting from 0 at the LSB, so bit 3 is 1 in the integer 8. + +\item + +Bit ranges are inclusive on both ends, so 5:3 means bits 5, 4, and 3. + +\item + +Operations work on variable-length vectors of sub-vectors up to VL in length, +where each sub-vector has a length svlen, and svlen elements of type etype. + +\item + +The actual total number of elements is therefore svlen times VL. + +\item + +When the vectors are stored in registers, all elements are packed so that there +is no padding in-between elements of the same vector. + +\item + +The register file itself is thus best viewed as a byte-level SRAM that is +typecast to an array of etypes + +\item + +The number of bytes in a sub-vector, svsz, is the product of svlen and the +element size in bytes. 
+ +\end{itemize} + diff --git a/powerpc-add/src/glossary.tex b/powerpc-add/src/glossary.tex index 941adf2..0ddc36e 100644 --- a/powerpc-add/src/glossary.tex +++ b/powerpc-add/src/glossary.tex @@ -1,12 +1,15 @@ % Glossary +% Yes: some of them are obvious, but it does no harm. % In the main documentation text I have not tagged every use of the glossary % entries below, I have tagged the first in a chapter, or first use for some % number of paragraphs. -% Put the definition of terms of the glossary terms in here -% Try to keep in alphabetic order - for easier editing, they will -% be generated (in the PDF) in alphabetic order regardless of the order below +% Put the definition of terms of the glossary terms in here I did try to keep +% in alphabetic order - for easier editing, but then decided that keeping +% related terms together helped (in which case put some blank lines +% before/after the related items). They will be generated (in the PDF) in +% alphabetic order regardless of the order below % To use one do something like: \gls{PowerPC} % Note that the entries are case sensitive. @@ -46,6 +49,17 @@ % entries below to appear. This seems to be if an entry is only mentioned in another % glossary entry. +\newglossaryentry{ALU} +{ + name=ALU, + description={ + Arithmetic Logic Unit. + The part of the computer that does calculations of integer data. + Contrast to the \gls{FPU}. + See: \href{https://en.wikipedia.org/wiki/Arithmetic_logic_unit}{Wikipedia} + } +} + \newglossaryentry{Binutils} { name=Binutils, @@ -90,6 +104,29 @@ } } +\newglossaryentry{FPU} +{ + name=FPU, + description={ + Floating Point Unit. + The part of the computer that does calculations of data in, probably, \gls{IEEE754} format. + See: \href{https://en.wikipedia.org/wiki/Floating-point_unit}{Wikipedia} + } +} + +\newglossaryentry{FU} +{ + name=FU, + description={ + Functional Unit. 
+ A computer typically has five main functional units: + \gls{CPU}, Input (get data from: keyboard, network, microphone, ...), + Output (send data to: screen, network, audio, ...), Memory (RAM, disk, ...) \& + Control (coordinates the other FUs). + Many of these FUs are built out of smaller FUs. + } +} + \newglossaryentry{gcc} { name=gcc, @@ -141,7 +178,7 @@ A popular standard way of representing and manipulating floating point numbers. Initiated by the Institute of Electrical and Electronics Engineers in 1985. Different precisions from 16 to 256 bits are described. - See: \href{https://en.wikipedia.org/wiki/IEEE_754}{Wikipedia} + See: \gls{FPU} \href{https://en.wikipedia.org/wiki/IEEE_754}{Wikipedia} } } @@ -152,7 +189,7 @@ Input Output Memory Management Unit. Mediates between Input/Output devices and main memory mapping virtual addresses to physical ones and, maybe, enforcing protection restrictions. - See: \href{https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit}{Wikipedia} + See: \href{https://en.wikipedia.org/wiki/Input\%E2\%80\%93output_memory_management_unit}{Wikipedia} } } @@ -225,6 +262,49 @@ } } + + +\newglossaryentry{LSB} +{ + name=LSB, + description={ + Least Significant Bit. + In an integer represented in binary, the bit that has the smallest value. + In this document the LSB is called 'bit 0', if this is the only bit set the + integer will have the value 1. + See: \gls{MSB} +% See: \href{}{} + } +} + +\newglossaryentry{MSB} +{ + name=MSB, + description={ + Most Significant Bit. + In an integer represented in binary, the bit that has the greatest value. + If the integer is signed, this bit will make the integer negative if it is 1. + In this document the MSB is given the highest number in the integer, eg: + in 8 bits it is called 'bit 7'; in 32 bits it is called 'bit 31'. 
+ See: \gls{LSB} +% See: \href{}{} + } +} + + + +% microwatt https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl +% https://www.zephyrproject.org/microwatt-and-the-power-isa-support-in-renode/ +% I did see references to this somewhere ... cannot see them now +\newglossaryentry{microwatt} +{ + name=microwatt, + description={ + An open source implementation of POWER by IBM in 2019. +% See: \href{}{} + } +} + \newglossaryentry{MISA} { name=MISA, @@ -233,7 +313,7 @@ The ability to run more than one \gls{ISA} on the same hardware. A setting in a \gls{CSR} controls which instructions will be recognised at any time. - See: \href{}{} +% See: \href{}{} } } % https://ieeexplore.ieee.org/document/6136696 - paywalled @@ -249,6 +329,15 @@ } } +\newglossaryentry{PC} +{ + name=PC, + description={ + Program Counter. + A register that holds the address of the instruction being executed. + } +} + \newglossaryentry{PowerPC} { name=PowerPC, @@ -291,6 +380,16 @@ } } +\newglossaryentry{SP} +{ + name=SP, + description={ + Stack Pointer. + A register that holds the address of the current function stack + frame -- used for variables local to a function. + } +} + \newglossaryentry{SPARC} { name=SPARC, @@ -336,8 +435,7 @@ description={ Video Processing Unit. Similar to a \gls{CPU} but has extra hardware instructions to speed up things - like the decoding and encoding of \gls{H.265}, or \gls{VP9}. -% See: \href{}{} +% like the decoding and encoding of \gls{H.265}, or \gls{VP9}. } } @@ -351,23 +449,57 @@ } } + + +\newglossaryentry{LE} +{ + name=LE, + description={ + Little Endian. + When 2/4/8 bytes are loaded into a 16/32/64 bit register the bytes at \textbf{lower} + memory addresses are put into \textbf{lower -- less significant} places in the register. + Intel/AMD are LE. + See: \gls{BE} and \gls{endian} + } +} +\newglossaryentry{BE} +{ + name=BE, + description={ + Big Endian. 
+ When 2/4/8 bytes are loaded into a 16/32/64 bit register the bytes at \textbf{lower} + memory addresses are put into \textbf{higher -- more significant} places in the register. + IBM z is BE. + See: \gls{LE} and \gls{endian} + } +} +\newglossaryentry{endian} +{ + name=endian, + description={ + Describes which byte of a multi-byte word contains the most significant bits. + Two choices, Little Endian \gls{LE} and Big Endian \gls{BE}, predominate, + but it can be more complicated when a word is made of 4 or more bytes. + \gls{PowerPC}, ARM \& \gls{SPARC} can be either LE or BE. + See: \href{https://en.wikipedia.org/wiki/Endianness}{Wikipedia} + } +} + + + % Other entries to consider: % emulator % namespace % MSB % PCR % SIMD -% ALU % RA % RB -% microwatt https://github.com/antonblanchard/microwatt/blob/master/decode1.vhdl % 6600 https://libre-soc.org/3d_gpu/architecture/6600scoreboard/ % DAG Directed Acyclic Graph % SR latch -% FU Functional Unit -% FPU float point unit +% % WAR https://libre-soc.org/3d_gpu/architecture/6600scoreboard/ #10 -% ALU % FU-FU function to function https://libre-soc.org/3d_gpu/architecture/6600scoreboard/ #14 % GORD GOWR GO read/write % ISANS @@ -376,3 +508,8 @@ % SIE % WARL % WLRL +% xepc -- in isamux.tex +% RV - as in RV mode Description isamux +% RVS - see SVPrefix +% VL MVL see SVPrefix +% SVorig diff --git a/powerpc-add/src/intro.tex b/powerpc-add/src/intro.tex index 98032de..7ab4a79 100644 --- a/powerpc-add/src/intro.tex +++ b/powerpc-add/src/intro.tex @@ -1,4 +1,4 @@ -% +% Introduction \chapter{Introduction} \section{Why has Libre-SOC chosen PowerPC ?} diff --git a/powerpc-add/src/isamux.tex b/powerpc-add/src/isamux.tex index 8d7a485..4352421 100644 --- a/powerpc-add/src/isamux.tex +++ b/powerpc-add/src/isamux.tex @@ -1,7 +1,7 @@ % ISAMUX % https://bugs.libre-soc.org/show_bug.cgi?id=214 -\chapter{Introduction} +\chapter{ISAMUX} \paragraph{} @@ -91,7 +91,7 @@ Bits 24 thru 31 are for custom usage.
\item -bit 6 (\textbf{B}) is endian-selection: LE/BE +bit 6 (\textbf{B}) is \gls{endian}-selection: \gls{LE}/\gls{BE} \end{itemize} @@ -147,7 +147,8 @@ Foreign Arch Mode \item -when bit 0 is 1, \textbf{Foreign arch} mode is selected. +when bit 0 (the \gls{LSB}) is 1, \textbf{Foreign arch} mode is selected. +% part of the reason for having LSB here is to avoid glossary ordering problems \item @@ -155,7 +156,7 @@ Bits 1 thru 7 are a table of foreign arches. \item -when the MSB is 1, this is for custom use. +when the \gls{MSB} is 1, this is for custom use. \item @@ -256,14 +257,15 @@ state!) which is quite severely burdensome and getting exceptionally complex. \paragraph{} -Switching \gls{CSR}, PC (and potentially SP) and other state on a NS change in the +Switching \gls{CSR}, \gls{PC} (and potentially \gls{SP}) and other state on a NS change in the RISCV unary NS therefore needs to be done wisely and responsibly, i.e. minimised! \paragraph{} To be discussed. Context -href=https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/x-uFZDXiOxY/27QDW5KvBQAJ +href=https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/\\ +x-uFZDXiOxY/27QDW5KvBQAJ \section{Privileged Modes / Traps} \label{privtraps} @@ -275,7 +277,7 @@ another called \textbf{TRAP-ISANS} These mirrors the ISANS CSR, and, on a trap, the current ISANS in that privilege level is atomically transferred into LAST-ISANS by the hardware, and ISANS in that trap -is set to TRAP-ISANS. Hardware is \textbf{only then} permitted to modify the PC to +is set to TRAP-ISANS. Hardware is \textbf{only then} permitted to modify the \gls{PC} to begin execution of the trap. \paragraph{} @@ -365,7 +367,7 @@ the trap handler routine is written. 
\paragraph{} -Open question: see https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/IAhyOqEZoWA/BM0G3J2zBgAJ +Open question: see https://groups.google.com/a/groups.riscv.org/d/msg/isa\-dev/IAhyOqEZoWA/BM0G3J2zBgAJ \begin{verbatim} trap_entry(x_cause) @@ -383,7 +385,7 @@ trap_exit(x_cause): } \end{verbatim} -\subsection{Is this like \gls{MISA} ?} \label{misa} +\subsection{Is this like MISA ?} \label{misa} \paragraph{} @@ -395,7 +397,7 @@ No. \item -MISA's space is entirely taken up (and running out). +\gls{MISA}'s space is entirely taken up (and running out). \item diff --git a/powerpc-add/src/power-spec.tex b/powerpc-add/src/power-spec.tex index 86514f4..7340eda 100644 --- a/powerpc-add/src/power-spec.tex +++ b/powerpc-add/src/power-spec.tex @@ -38,8 +38,9 @@ Please cite as: ``The PowerPC Instruction Set Additions, Document Version \specr Alain Williams, Libre-SOC, \specmonthyear. -\markboth{Volume I: PowerPC ISAMUX \specrev} -{Volume I: PowerPC ISAMUX \specrev} +\markboth{PowerPC ISAMUX \specrev: Volume I} +{PowerPC ISAMUX \specrev: Volume I} + \thispagestyle{empty} \frontmatter @@ -52,13 +53,18 @@ Alain Williams, Libre-SOC, \specmonthyear. 
\mainmatter +% These will need to be sorted - put into an order that helps understanding \input{intro} +\input{conventions} + + \input{isamux} \input{atomics} \input{varenc} \input{isa_op_protocol} % https://bugs.libre-soc.org/show_bug.cgi?id=238 \input{fp16} +\input{SVPrefix} % \input{rv32} % \input{zifencei} diff --git a/powerpc-add/src/preamble.tex b/powerpc-add/src/preamble.tex index d96b80d..5b906d7 100644 --- a/powerpc-add/src/preamble.tex +++ b/powerpc-add/src/preamble.tex @@ -1,5 +1,7 @@ % Package includes +\usepackage{tabularx} % For tables with wide columns + \usepackage{graphicx} \usepackage{geometry} \usepackage{array} @@ -13,7 +15,7 @@ \usepackage{float} \usepackage{listings} \usepackage{comment} -\usepackage{enumitem} +\usepackage{enumitem} % https://ctan.org/pkg/enumitem?lang=en \usepackage{verbatimbox} \usepackage{amsmath} @@ -65,10 +67,13 @@ } % Custom list environments - +% https://ctan.org/pkg/enumitem?lang=en \newlist{tightlist}{itemize}{1} \setlist[tightlist]{label=\textbullet,nosep} +% Remove too much space before lists: +\setlist[itemize]{topsep=0pt, partopsep=0pt} + \newenvironment{titledtightlist}[1] {\noindent ~~\textbf{#1} @@ -92,28 +97,16 @@ } {\endlist} -%\newenvironment{discussion} -%{ \vspace{-1.5mm} -% \list{}{ -% \topsep 0mm -% \partopsep 0mm -% \listparindent 1.5em -% \itemindent \listparindent -% \rightmargin \leftmargin -% \parsep 0mm -% } -% \item -% \small\em -% \noindent\nopagebreak\rule{\linewidth}{1pt}\par -% \noindent\textbf{Discussion:} -%} -%{\endlist} - % Other commands and parameters -\pagestyle{myheadings} +\pagestyle{myheadings} % page headers/footers, see \markboth{} in power-spec.tex \setlength{\parindent}{0in} + +% How much after paragraph + \setlength{\parskip}{10pt} +%\setlength{\parskip}{5pt} + \sloppy \raggedbottom \clubpenalty=10000 @@ -161,3 +154,4 @@ \newcommand{\warl}{\textbf{WARL}} \newcommand{\unspecified}{\textsc{unspecified}} + diff --git a/powerpc-add/src/preface.tex b/powerpc-add/src/preface.tex index
baa330d..097ea66 100644 --- a/powerpc-add/src/preface.tex +++ b/powerpc-add/src/preface.tex @@ -2,4 +2,5 @@ This document describes the Libre-SOC ISAMUX additions to the PowerPC architecture. +% Something to keep bibtex happy until we give it something real: \cite{foo-bar}