being bits 19-23 of SVP64 `RM`, is laid out as
follows:
-| 0-1 | 2 | 3 4 | description |
-| --- | --- |---------|-------------------------- |
-| 00 | 0 | dz sz | simple mode |
-| 00 | 1 | 0 RG | scalar reduce mode (mapreduce) |
-| 00 | 1 | 1 / | reserved |
-| 01 | inv | CR-bit | Rc=1: ffirst CR sel |
-| 01 | inv | VLi RC1 | Rc=0: ffirst z/nonz |
-| 10 | N | dz sz | sat mode: N=0/1 u/s |
-| 11 | / | / / | reserved |
+| 0-1 | 2 | 3 4 | description |
+| ------ | --- |---------|-------------------------- |
+| 0 0 | 0 | dz sz | simple mode |
+| 0 0 | 1 | 0 RG | scalar reduce mode (mapreduce) |
+| 0 0 | 1 | 1 / | reserved |
+| 1 0 | N | dz sz | sat mode: N=0/1 u/s |
+| VLi 1 | inv | CR-bit | Rc=1: ffirst CR sel |
+| VLi 1 | inv | zz RC1 | Rc=0: ffirst z/nonz |
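
As a reading aid for the new table, the following sketch decodes the 5-bit mode field. It is a hedged illustration and not the normative decoder. It rests on these assumptions:

* `decode_mode` and the mode names are hypothetical.
* The bits are passed as a list in MSB0 order, so list index 0 holds mode bit 0.
* The caller supplies the instruction's Rc bit, which selects between the two fail-first rows.
* The 2-bit `CR-bit` selector is read with bit 3 as its most significant bit.

```python
from typing import Dict, List, Tuple


def decode_mode(bits: List[int], rc: int) -> Tuple[str, Dict[str, int]]:
    """Decode the 5-bit SVP64 mode field (hypothetical helper).

    bits[i] holds mode bit i (MSB0 numbering); rc is the instruction's Rc bit.
    """
    if len(bits) != 5 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected five 0/1 values")
    b0, b1, b2, b3, b4 = bits
    if b1 == 1:
        # bits 0-1 = "VLi 1": data-dependent fail-first, bit 0 is VLi
        if rc:
            # Assumption: CR-bit is a 2-bit selector with bit 3 most significant
            return "ffirst_cr", {"VLi": b0, "inv": b2, "CR_bit": (b3 << 1) | b4}
        return "ffirst_rc1", {"VLi": b0, "inv": b2, "zz": b3, "RC1": b4}
    if b0 == 1:
        # bits 0-1 = "1 0": saturation, N=0 unsigned / N=1 signed
        return "sat", {"N": b2, "dz": b3, "sz": b4}
    # bits 0-1 = "0 0": simple, mapreduce or reserved
    if b2 == 0:
        return "simple", {"dz": b3, "sz": b4}
    if b3 == 0:
        return "reduce", {"RG": b4}
    return "reserved", {}


# Example: fail-first encoding with VLi=1, inv=0, CR-bit=2, on an Rc=1 instruction
print(decode_mode([1, 1, 0, 1, 0], rc=1))
```

Checking bit 1 first mirrors the table. Every row whose bits 0-1 read "VLi 1" is fail-first, whatever the value of bit 0.
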
Fields:
to be executed in sequential Program Order. When REMAP is not active,
element 0 would be the first.
-Data-driven (CR-field-driven) fail-on-first activates when Rc=1 or other
-CR-creating operation produces a result (including cmp). Similar to
-Branch-Conditional,
-an analysis of the CR is performed and if the test fails, the
-vector operation terminates and discards all element operations **at and
-above the current one**, and VL is truncated to either the *previous*
+Arithmetic/Logical Data-driven (CR-field-driven) fail-on-first performs a
+test of the result, similar to Branch-Conditional `BO` field testing.
+If the test fails, the Vector Loop operation terminates and VL is
+truncated to either the *previous*
element or the current one, depending on whether VLi (VL "inclusive")
is clear or set, respectively.
Thus the new VL comprises a contiguous vector of results, all of which
pass the testing criteria (equal to zero, less than zero etc as defined
-by the CR-bit test).
+by the CR-bit test). When Rc=1, the Condition Register Field for
+the element just tested is always written out (regardless of VLi).
+
+* **VLi=0** Only elements that passed the test are written out. When Rc=1,
+ the co-result CR Field element is written out (even if the current test failed).
+ Vector length is truncated to "elements that passed".
+* **VLi=1** Elements that were *tested* are written out. When Rc=1,
+ the co-result CR Field element is written out.
+ Vector length is truncated to "elements tested up to and including the first fail point".
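
The truncation rules for VLi=0 and VLi=1 can be modelled in a few lines of Python. This is a hedged behavioural sketch and not the specification's pseudocode. It makes the following assumptions:

* `ffirst_loop` and `cr_field` are hypothetical names.
* The CR Field is reduced to its `lt`/`gt`/`eq` bits, and SO is ignored.
* Each element operation is a simple unary function.
* The test fails when the selected CR bit equals `inv`. This generalises the rule given for RC1 mode, where CR.eq is compared with `inv`.

```python
from typing import Callable, Dict, List, Optional, Tuple

CR = Dict[str, bool]


def cr_field(value: int) -> CR:
    # Simplified CR Field: only the LT/GT/EQ bits (SO omitted).
    return {"lt": value < 0, "gt": value > 0, "eq": value == 0}


def ffirst_loop(op: Callable[[int], int], srcs: List[int], vl: int,
                crbit: str, inv: int, vli: int, rc: int
                ) -> Tuple[List[Optional[int]], List[Optional[CR]], int]:
    """Return (results written, CR Fields written, new VL). Hypothetical model."""
    dest: List[Optional[int]] = [None] * vl
    crs: List[Optional[CR]] = [None] * vl
    for i in range(vl):
        result = op(srcs[i])
        cr = cr_field(result)
        if rc:
            # The CR Field of the element just tested is always written.
            crs[i] = cr
        if cr[crbit] == bool(inv):
            # Test failed: VLi=1 keeps the tested element, VLi=0 drops it.
            if vli:
                dest[i] = result
                return dest, crs, i + 1
            return dest, crs, i
        dest[i] = result  # test passed: element is written
    return dest, crs, vl


# CPIR-like search: subtract 3 and stop at the first zero result
# (crbit="eq", inv=1 means "fail when the result is zero").
srcs = [5, 4, 3, 2]
print(ffirst_loop(lambda x: x - 3, srcs, 4, "eq", 1, vli=0, rc=1)[2])  # new VL: 2
print(ffirst_loop(lambda x: x - 3, srcs, 4, "eq", 1, vli=1, rc=1)[2])  # new VL: 3
```

With VLi=0, the zero result at element 2 is computed but not written, while its CR Field still is. With VLi=1, that element is written and counted in the new VL.
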
*Note: when VLi is clear, the behaviour at first seems counter-intuitive.
A result is calculated but if the test fails it is prohibited from being
a generalised form of Z80's `CPIR` instruction. It is extremely useful
for reducing instruction count, however requires speculative execution
involving modifications of VL to get high performance implementations.
-An additional mode (RC1=1) effectively turns what would otherwise be an
-arithmetic operation into a type of `cmp`. The CR is stored (and the
+An additional mode (RC1=1) allows instructions that would not normally
+have an Rc=1 mode to at least be tested for zero or non-zero.
+The CR is stored (and the
CR.eq bit tested against the `inv` field). If the CR.eq bit is equal to
`inv` then the Vector is truncated and the loop ends.
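
In RC1 mode, the truncation point depends only on whether each result is zero. The hedged sketch below shows how `inv` and VLi combine. The function `rc1_new_vl` is hypothetical, and results are assumed to be plain integers. The example scans a C-style string: with `inv=1` the loop ends at the first zero byte, and VLi decides whether that terminating zero is counted.

```python
from typing import Sequence


def rc1_new_vl(results: Sequence[int], inv: int, vli: int) -> int:
    """Return the truncated VL for RC1 mode (hypothetical helper)."""
    for i, r in enumerate(results):
        eq = 1 if r == 0 else 0      # CR.eq as set from the result
        if eq == inv:                # CR.eq equal to inv: truncate, loop ends
            return i + 1 if vli else i
    return len(results)              # no element failed: VL unchanged


data = b"hi\x00xy"                     # iterating bytes yields ints
print(rc1_new_vl(data, inv=1, vli=0))  # 2: length without the terminator
print(rc1_new_vl(data, inv=1, vli=1))  # 3: terminating zero included
```
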
standard Out-of-Order Execution and the modification effort to efficiently
support Data-Dependent Fail-First within a pre-existing Multi-Issue
Out-of-Order Engine is anticipated to be minimal. In-Order systems on
-the other hand are expected, unavoidably, to be low-performance*.
+the other hand are expected to be low-performance unless they
+also make use of `SVSTATE.hphint` and exploit it to safely implement rudimentary
+Shadow-Commit-Hold normally only found in Out-of-Order systems*.
Two extremely important aspects of ffirst are: