-# CSRs <a name="csrs"></a>
-
-There are a number of CSRs needed, which are used at the instruction
-decode phase to re-interpret RV opcodes (a practice that has
-precedent in the setting of MISA to enable / disable extensions).
-
-* Integer Register N is Vector of length M: r(N) -> r(N..N+M-1)
-* Integer Register N is of implicit bitwidth M (M=default,8,16,32,64)
-* Floating-point Register N is Vector of length M: r(N) -> r(N..N+M-1)
-* Floating-point Register N is of implicit bitwidth M (M=default,8,16,32,64)
-* Integer Register N is a Predication Register (note: a key-value store)
-* Vector Length CSR (VSETVL, VGETVL)
-
-Also (see Appendix, "Context Switch Example") it may turn out to be important
-to have a separate (smaller) set of CSRs for M-Mode (and S-Mode) so that
-Vectorised LOAD / STORE may be used to load and store multiple registers:
-something that is missing from the Base RV ISA.
-
-Notes:
-
-* for the purposes of LOAD / STORE, Integer Registers which are
- marked as a Vector will result in a Vector LOAD / STORE.
-* Vector Lengths are *not* the same as vsetl but are an integral part
- of vsetl.
-* Actual vector length is *multipled* by how many blocks of length
- "bitwidth" may fit into an XLEN-sized register file.
-* Predication is a key-value store due to the implicit referencing,
- as opposed to having the predicate register explicitly in the instruction.
-* Whilst the predication CSR is a key-value store it *generates* easier-to-use
- state information.
-* TODO: assess whether the same technique could be applied to the other
- Vector CSRs, particularly as pointed out in Section 17.8 (Draft RV 0.4,
- V2.3-Draft ISA Reference) it becomes possible to greatly reduce state
- needed for context-switches (empty slots need never be stored).
-
-## Predication CSR <a name="predication_csr_table"></a>
-
-The Predication CSR is a key-value store indicating whether, if a given
-destination register (integer or floating-point) is referred to in an
-instruction, it is to be predicated. The first entry is whether predication
-is enabled. The second entry is whether the register index refers to a
-floating-point or an integer register. The third entry is the index
-of that register which is to be predicated (if referred to). The fourth entry
-is the integer register that is treated as a bitfield, indexable by the
-vector element index.
-
-| PrCSR | 7 | 6 | 5 | (4..0) | (4..0) |
-| ----- | - | - | - | ------- | ------- |
-| 0 | zero0 | inv0 | i/f | regidx | predidx |
-| 1 | zero1 | inv1 | i/f | regidx | predidx |
-| .. | zero.. | inv.. | i/f | regidx | predidx |
-| 15 | zero15 | inv15 | i/f | regidx | predidx |
-
-The Predication CSR Table is a key-value store, so implementation-wise
-it will be faster to turn the table around (maintain topologically
-equivalent state):
-
- struct pred {
- bool zero;
- bool inv;
- bool enabled;
- int predidx; // redirection: actual int register to use
- }
-
- struct pred fp_pred_reg[32];
- struct pred int_pred_reg[32];
-
- for (i = 0; i < 16; i++)
- tb = int_pred_reg if CSRpred[i].type == 0 else fp_pred_reg;
- idx = CSRpred[i].regidx
- tb[idx].zero = CSRpred[i].zero
- tb[idx].inv = CSRpred[i].inv
- tb[idx].predidx = CSRpred[i].predidx
- tb[idx].enabled = true
-
-So when an operation is to be predicated, it is the internal state that
-is used. In Section 6.4.2 of Hwacha's Manual (EECS-2015-262) the following
-pseudo-code for operations is given, where p is the explicit (direct)
-reference to the predication register to be used:
-
- for (int i=0; i<vl; ++i)
- if ([!]preg[p][i])
- (d ? vreg[rd][i] : sreg[rd]) =
- iop(s1 ? vreg[rs1][i] : sreg[rs1],
- s2 ? vreg[rs2][i] : sreg[rs2]); // for insts with 2 inputs
-
-This instead becomes an *indirect* reference using the *internal* state
-table generated from the Predication CSR key-value store, which iwws used
-as follows.
-
- if type(iop) == INT:
- preg = int_pred_reg[rd]
- else:
- preg = fp_pred_reg[rd]
-
- for (int i=0; i<vl; ++i)
- predidx = preg[rd].predidx; // the indirection takes place HERE
- if (!preg[rd].enabled)
- predicate = ~0x0; // all parallel ops enabled
- else:
- predicate = intregfile[predidx]; // get actual reg contents HERE
- if (preg[rd].inv) // invert if requested
- predicate = ~predicate;
- if (predicate && (1<<i))
- (d ? regfile[rd+i] : regfile[rd]) =
- iop(s1 ? regfile[rs1+i] : regfile[rs1],
- s2 ? regfile[rs2+i] : regfile[rs2]); // for insts with 2 inputs
- else if (preg[rd].zero)
- // TODO: place zero in dest reg
-
-Note:
-
-* d, s1 and s2 are booleans indicating whether destination,
- source1 and source2 are vector or scalar
-* key-value CSR-redirection of rd, rs1 and rs2 have NOT been included
- above, for clarity. rd, rs1 and rs2 all also must ALSO go through
- register-level redirection (from the Register CSR table) if they are
- vectors.
-
-## MAXVECTORDEPTH
-
-MAXVECTORDEPTH is the same concept as MVL in RVV. However in Simple-V,
-given that its primary (base, unextended) purpose is for 3D, Video and
-other purposes (not requiring supercomputing capability), it makes sense
-to limit MAXVECTORDEPTH to the regfile bitwidth (32 for RV32, 64 for RV64
-and so on).
-
-The reason for setting this limit is so that predication registers, when
-marked as such, may fit into a single register as opposed to fanning out
-over several registers. This keeps the implementation a little simpler.
-Note also (as also described in the VSETVL section) that the *minimum*
-for MAXVECTORDEPTH must be the total number of registers (15 for RV32E
-and 31 for RV32 or RV64).
-
-Note that RVV on top of Simple-V may choose to over-ride this decision.
-
-## Vector-length CSRs
-
-Vector lengths are interpreted as meaning "any instruction referring to
-r(N) generates implicit identical instructions referring to registers
-r(N+M-1) where M is the Vector Length". Vector Lengths may be set to
-use up to 16 registers in the register file.
-
-One separate CSR table is needed for each of the integer and floating-point
-register files:
-
-| RegNo | (3..0) |
-| ----- | ------ |
-| r0 | vlen0 |
-| r1 | vlen1 |
-| .. | vlen.. |
-| r31 | vlen31 |
-
-An array of 32 4-bit CSRs is needed (4 bits per register) to indicate
-whether a register was, if referred to in any standard instructions,
-implicitly to be treated as a vector.
-
-Note:
-
-* A vector length of 1 indicates that it is to be treated as a scalar.
- Bitwidths (on the same register) are interpreted and meaningful.
-* A vector length of 0 indicates that the parallelism is to be switched
- off for this register (treated as a scalar). When length is 0,
- the bitwidth CSR for the register is *ignored*.
-
-Internally, implementations may choose to use the non-zero vector length
-to set a bit-field per register, to be used in the instruction decode phase.
-In this way any standard (current or future) operation involving
-register operands may detect if the operation is to be vector-vector,
-vector-scalar or scalar-scalar (standard) simply through a single
-bit test.
-
-Note that when using the "vsetl rs1, rs2" instruction (caveat: when the
-bitwidth is specifically not set) it becomes:
-
- CSRvlength = MIN(MIN(CSRvectorlen[rs1], MAXVECTORDEPTH), rs2)
-
-This is in contrast to RVV:
-
- CSRvlength = MIN(MIN(rs1, MAXVECTORDEPTH), rs2)
-
-## Element (SIMD) bitwidth CSRs
-
-Element bitwidths may be specified with a per-register CSR, and indicate
-how a register (integer or floating-point) is to be subdivided.
-
-| RegNo | (2..0) |
-| ----- | ------ |
-| r0 | vew0 |
-| r1 | vew1 |
-| .. | vew.. |
-| r31 | vew31 |