The purpose of the Register table is to mark which registers change behaviour
if used in a "Standard" (normally scalar) opcode.
-16 bit format:
-
-| RegCAM | 15 | (14..8) | 7 | (6..5) | (4..0) |
-| ------ | - | - | - | ------ | ------- |
-| 0 | isvec0 | regidx0 | i/f | vew0 | regkey |
-| 1 | isvec1 | regidx1 | i/f | vew1 | regkey |
-| 2 | isvec2 | regidx2 | i/f | vew2 | regkey |
-| 3 | isvec3 | regidx3 | i/f | vew3 | regkey |
-
-8 bit format:
-
-| RegCAM | | 7 | (6..5) | (4..0) |
-| ------ | | - | ------ | ------- |
-| 0 | | i/f | vew0 | regnum |
-
-Mapping the 8-bit to 16-bit format:
-
-| RegCAM | 15 | (14..8) | 7 | (6..5) | (4..0) |
-| ------ | - | - | - | ------ | ------- |
-| 0 | isvec=1 | regnum0<<2 | i/f | vew0 | regnum0 |
-| 1 | isvec=1 | regnum1<<2 | i/f | vew1 | regnum1 |
-| 2 | isvec=1 | regnum2<<2 | i/f | vew2 | regnum2 |
-| 3 | isvec=1 | regnum2<<2 | i/f | vew3 | regnum3 |
+[[!inline raw="yes" pages="simple_v_extension/reg_table_format" ]]
Fields:
As the above table is a CAM (key-value store) it may be appropriate
(faster and with fewer gates, implementation-wise) to expand it as follows:
- struct vectorised {
- bool isvector:1;
- int vew:2;
- bool enabled:1;
- int predidx:7;
- }
-
- struct vectorised fp_vec[32], int_vec[32];
-
- for (i = 0; i < len; i++) // from VBLOCK Format
- tb = int_vec if CSRvec[i].type == 0 else fp_vec
- idx = CSRvec[i].regkey // INT/FP src/dst reg in opcode
- tb[idx].elwidth = CSRvec[i].elwidth
- tb[idx].regidx = CSRvec[i].regidx // indirection
- tb[idx].isvector = CSRvec[i].isvector // 0=scalar
- tb[idx].enabled = true;
+[[!inline raw="yes" pages="simple_v_extension/reg_table" ]]
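
As an illustration only, the expansion of the CAM into per-register lookup tables can be sketched in Python. This is a minimal sketch, not the reference implementation: it assumes the 16-bit entry layout (bit 15 isvec, bits 14..8 regidx, bit 7 i/f, bits 6..5 vew, bits 4..0 regkey) and assumes that i/f = 0 selects the integer table. The names `RegEntry`, `decode_entry` and `expand_cam` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RegEntry:
    """One expanded table slot (hypothetical name and field set)."""
    enabled: bool = False
    isvector: bool = False
    vew: int = 0
    regidx: int = 0


def decode_entry(raw):
    """Split one 16-bit Register Table entry into its fields."""
    return {
        "isvec": (raw >> 15) & 0x1,    # bit 15
        "regidx": (raw >> 8) & 0x7F,   # bits 14..8
        "is_fp": (raw >> 7) & 0x1,     # bit 7 (assumed: 0 = int, 1 = fp)
        "vew": (raw >> 5) & 0x3,       # bits 6..5
        "regkey": raw & 0x1F,          # bits 4..0
    }


def expand_cam(raw_entries):
    """Expand the CAM entries into two directly indexed 32-slot tables."""
    int_vec = [RegEntry() for _ in range(32)]
    fp_vec = [RegEntry() for _ in range(32)]
    for raw in raw_entries:
        f = decode_entry(raw)
        tb = fp_vec if f["is_fp"] else int_vec
        # regkey is the register number as it appears in the opcode;
        # regidx is the (redirected) register actually used.
        tb[f["regkey"]] = RegEntry(True, bool(f["isvec"]), f["vew"], f["regidx"])
    return int_vec, fp_vec
```

With the tables expanded this way, a lookup at instruction decode is a plain index by opcode register number rather than an associative search over the entries.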
## Predication Table <a name="predication_csr_table"></a>
The handling of each (trap or conditional test) is slightly different:
see the Instruction sections for further details.
-16 bit format:
-
-| PrCSR | (15..11) | 10 | 9 | 8 | (7..1) | 0 |
-| ----- | - | - | - | - | ------- | ------- |
-| 0 | predidx | zero0 | inv0 | i/f | regidx | ffirst0 |
-| 1 | predidx | zero1 | inv1 | i/f | regidx | ffirst1 |
-| 2 | predidx | zero2 | inv2 | i/f | regidx | ffirst2 |
-| 3 | predidx | zero3 | inv3 | i/f | regidx | ffirst3 |
-
-Note: predidx=x0, zero=1, inv=1 is a RESERVED encoding. Its use must
-generate an illegal instruction trap.
-
-8 bit format:
-
-| PrCSR | 7 | 6 | 5 | (4..0) |
-| ----- | - | - | - | ------- |
-| 0 | zero0 | inv0 | i/f | regnum |
-
-Mapping from 8 to 16 bit format, the table becomes:
-
-| PrCSR | (15..11) | 10 | 9 | 8 | (7..1) | 0 |
-| ----- | - | - | - | - | ------- | ------- |
-| 0 | x9 | zero0 | inv0 | i/f | regnum | ff=0 |
-| 1 | x10 | zero1 | inv1 | i/f | regnum | ff=0 |
-| 2 | x11 | zero2 | inv2 | i/f | regnum | ff=0 |
-| 3 | x12 | zero3 | inv3 | i/f | regnum | ff=0 |
+[[!inline raw="yes" pages="simple_v_extension/pred_table_format" ]]
Pseudocode for predication:
- struct pred {
- bool zero; // zeroing
- bool inv; // register at predidx is inverted
- bool ffirst; // fail-on-first
- bool enabled; // use this to tell if the table-entry is active
- int predidx; // redirection: actual int register to use
- }
-
- struct pred fp_pred_reg[32];
- struct pred int_pred_reg[32];
-
- for (i = 0; i < len; i++) // number of Predication entries in VBLOCK
- tb = int_pred_reg if PredicateTable[i].type == 0 else fp_pred_reg;
- idx = VBLOCKPredicateTable[i].regidx
- tb[idx].zero = CSRpred[i].zero
- tb[idx].inv = CSRpred[i].inv
- tb[idx].ffirst = CSRpred[i].ffirst
- tb[idx].predidx = CSRpred[i].predidx
- tb[idx].enabled = true
-
- def get_pred_val(bool is_fp_op, int reg):
- tb = int_reg if is_fp_op else fp_reg
- if (!tb[reg].enabled):
- return ~0x0, False // all enabled; no zeroing
- tb = int_pred if is_fp_op else fp_pred
- if (!tb[reg].enabled):
- return ~0x0, False // all enabled; no zeroing
- predidx = tb[reg].predidx // redirection occurs HERE
- predicate = intreg[predidx] // actual predicate HERE
- if (tb[reg].inv):
- predicate = ~predicate // invert ALL bits
- return predicate, tb[reg].zero
+[[!inline raw="yes" pages="simple_v_extension/pred_table" ]]
+[[!inline raw="yes" pages="simple_v_extension/get_pred_value" ]]
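
The predicate lookup can likewise be sketched in Python. This is a simplified illustration under stated assumptions: a 64-bit register width (`XLEN`), the floating-point table being selected when `is_fp_op` is true, and the preliminary check against the Register table being omitted. The names `PredEntry`, `XLEN` and `ALL_ONES` are hypothetical.

```python
from dataclasses import dataclass

XLEN = 64                   # assumed register width
ALL_ONES = (1 << XLEN) - 1  # mask with every element enabled


@dataclass
class PredEntry:
    """One expanded Predication Table slot (hypothetical name)."""
    enabled: bool = False
    zero: bool = False    # zeroing
    inv: bool = False     # register at predidx is inverted
    ffirst: bool = False  # fail-on-first
    predidx: int = 0      # redirection: actual int register to use


def get_pred_val(int_pred, fp_pred, intreg, is_fp_op, reg):
    """Return (predicate mask, zeroing flag) for opcode register `reg`."""
    tb = fp_pred if is_fp_op else int_pred
    entry = tb[reg]
    if not entry.enabled:
        return ALL_ONES, False          # all enabled; no zeroing
    predicate = intreg[entry.predidx]   # redirection, then actual predicate
    if entry.inv:
        predicate = ~predicate & ALL_ONES  # invert all bits, keep XLEN wide
    return predicate, entry.zero
```

An unpredicated register therefore behaves exactly as if it had an all-ones mask, which keeps the element loop free of special cases.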
## Fail-on-First Mode <a name="ffirst-mode"></a>
will have the effect of creating a mask of all ones, allowing ffirst
to be set.
-### Fail-on-first traps
-
-Except for the first element, ffault stops sequential element processing
-when a trap occurs. The first element is treated normally (as if ffirst
-is clear). Should any subsequent element instruction require a trap,
-instead it and subsequent indexed elements are ignored (or cancelled in
-out-of-order designs), and VL is set to the *last* instruction that did
-not take the trap.
+See [[appendix]] for more details on fail-on-first modes.
-Note that predicated-out elements (where the predicate mask bit is zero)
-are clearly excluded (i.e. the trap will not occur). However, note that
-the loop still had to test the predicate bit: thus on return,
-VL is set to include elements that did not take the trap *and* includes
-the elements that were predicated (masked) out (not tested up to the
-point where the trap occurred).
+# Simplified Pseudo-code example
-If SUBVL is being used (SUBVL!=1), the first *sub-group* of elements
-will cause a trap as normal (as if ffirst is not set); subsequently,
-the trap must not occur in the *sub-group* of elements. SUBVL will **NOT**
-be modified.
+A greatly simplified example illustrating (just) the VL hardware for-loop
+is as follows:
-Given that predication bits apply to SUBVL groups, the same rules apply
-to predicated-out (masked-out) sub-groups in calculating the value that VL
-is set to.
+[[!inline raw="yes" pages="simple_v_extension/simple_add_example" ]]
-### Fail-on-first conditional tests
+Note that zeroing, elwidth handling, SUBVL and PCVLIW have all been
+left out, for clarity. For examples of how to handle each, see
+[[appendix]].
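
For readers without the inlined page to hand, the VL for-loop for an integer add might look roughly like the following Python sketch. It is an assumption-laden illustration, not the specification's own example: `op_add`, `ireg`, `isvector` and `predval` are hypothetical names, the predicate mask is passed in directly, and stopping after the first result when the destination is scalar is an assumption of this sketch.

```python
def op_add(ireg, isvector, vl, predval, rd, rs1, rs2):
    """Scalar 'add' opcode executed as a hardware-style loop over VL.

    ireg     -- integer register file (list of ints), modified in place
    isvector -- per-register flags: True if the register is tagged as a vector
    predval  -- predicate mask; bit i enables element i
    """
    id_ = irs1 = irs2 = 0  # element offsets for rd, rs1, rs2
    for i in range(vl):
        if predval & (1 << i):
            ireg[rd + id_] = ireg[rs1 + irs1] + ireg[rs2 + irs2]
            if not isvector[rd]:
                break  # assumed: a scalar destination takes one result only
        # Only registers tagged as vectors step on to the next element;
        # scalar operands are re-read for every element.
        if isvector[rd]:
            id_ += 1
        if isvector[rs1]:
            irs1 += 1
        if isvector[rs2]:
            irs2 += 1
```

Masked-out elements are skipped without zeroing, so the destination keeps its previous contents at those positions.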
-ffault stops sequential element conditional testing on the first element result
-being zero. VL is set to the number of elements that were processed before
-the fail-condition was encountered.
+# Vector Block Format <a name="vliw-format"></a>
-Note that just as with traps, if SUBVL!=1, the first of any of the *sub-group*
-will cause the processing to end, and, even if there were elements within
-the *sub-group* that passed the test, that sub-group is still (entirely)
-excluded from the count (from setting VL). i.e. VL is set to the total
-number of *sub-groups* that had no fail-condition up until execution was
-stopped.
+The Vector Block format uses the RISC-V 80-192 bit format from Section 1.5
+of the RISC-V Spec. It permits an optional VL/MVL/SUBVL block, up to 4
+16-bit (or 8 8-bit) Register Table entries, and likewise for Predicate
+Table entries. The rest of the instruction may be either standard RV
+opcodes or the SVPrefix opcodes ([[sv_prefix_proposal]]).
-Note again that, just as with traps, predicated-out (masked-out) elements
-are included in the count leading up to the fail-condition, even though they
-were not tested.
+[[!inline raw="yes" pages="simple_v_extension/vblock_format_table" ]]
-The pseudo-code for Predication makes this clearer and simpler than it is
-in words (the loop ends, VL is set to the current element index, "i").
+For full details see the ancillary resource: [[vblock_format]].
# Exceptions
-TODO: expand.
+Exception handling **MUST** be precise, in-order, and exactly
+like Standard RISC-V as far as the instruction execution order is
+concerned, regardless of whether it is PC, PCVBLK, VL or SUBVL that
+is currently being incremented.
# Hints
No specific hints are yet defined in Simple-V.
-# Vector Block Format <a name="vliw-format"></a>
-
-See ancillary resource: [[vblock_format]]
-
# Subsets of RV functionality
It is permitted to only implement SVprefix and not the VBLOCK instruction