| 0 | 0 | 0 | zz els | simple mode |
| 0 | 0 | 1 | PI LF | post-increment and Fault-First |
| 1 | 0 | N | zz els | sat mode: N=0/1 u/s |
-|VLi| 1 | inv | CR-bit | Rc=1: ffirst CR sel |
-|VLi| 1 | inv | els RC1 | Rc=0: ffirst z/nonz |
+|VLi| 1 | inv | CR-bit | ffirst CR sel |
The `els` bit is only relevant when `RA.isvec` is clear, in which case it
indicates whether the stride is unit or element:
| 0 | 1 | 2 | 3 4 | description |
|---|---| --- |---------|--------------------------- |
|els| 0 | SEA | dz sz | simple mode |
-|VLi| 1 | inv | CR-bit | Rc=1: ffirst CR sel |
-|VLi| 1 | inv | els RC1 | Rc=0: ffirst z/nonz |
+|VLi| 1 | inv | CR-bit | ffirst CR sel |
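The table above can be read as a small decoder in which bit 1 selects between simple mode and ffirst mode, and the meaning of the remaining bits depends on that choice. The following is a minimal sketch that follows the post-change (`+`) row. It assumes MSB0 bit numbering, as used throughout the Power ISA, and assumes that bits 3 and 4 form a 2-bit CR-bit selector with bit 3 as its most significant bit. The names `LdStMode`, `bit` and `decode_ldst_mode` are hypothetical and are used for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LdStMode:
    """Decoded fields of the 5-bit mode field (hypothetical layout sketch)."""
    mode: str                      # "simple" or "ffirst"
    els: Optional[int] = None      # simple mode: els bit (stride select)
    sea: Optional[int] = None      # simple mode: SEA bit
    dz: Optional[int] = None       # simple mode: dz zeroing bit
    sz: Optional[int] = None       # simple mode: sz zeroing bit
    vli: Optional[int] = None      # ffirst mode: VL inclusive
    inv: Optional[int] = None      # ffirst mode: invert the CR-bit test
    cr_bit: Optional[int] = None   # ffirst mode: which CR Field bit to test


def bit(field: int, n: int, width: int = 5) -> int:
    """Bit n of `field` in MSB0 numbering (bit 0 is the most significant)."""
    return (field >> (width - 1 - n)) & 1


def decode_ldst_mode(field: int) -> LdStMode:
    """Decode per the table: bit 1 selects simple (0) or ffirst (1)."""
    if bit(field, 1) == 0:
        return LdStMode("simple", els=bit(field, 0), sea=bit(field, 2),
                        dz=bit(field, 3), sz=bit(field, 4))
    # Assumption: bits 3-4 form a 2-bit selector, bit 3 being its MSB.
    return LdStMode("ffirst", vli=bit(field, 0), inv=bit(field, 2),
                    cr_bit=(bit(field, 3) << 1) | bit(field, 4))
```

For example, under these assumptions `0b11110` decodes as ffirst mode with `vli=1`, `inv=1` and `cr_bit=2`.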
Vector Indexed Strided Mode is qualified as follows:
Thus it can be seen that the use of Indexed REMAP saves copying
and manual reordering of the Vector of RB offsets.
-## LD/ST ffirst
+## LD/ST ffirst (Fault-First)
LD/ST ffirst treats the first LD/ST in a vector (element 0 if REMAP
is not active) as an ordinary one, with all behaviour with respect to
## Data-Dependent Fail-First (not Fail/Fault-First)
Not to be confused with Fail/Fault First, Data-Fail-First performs an
-additional check on the data into a Condition Register Field and if a test
-on the CR Field fails then VL is truncated and further looping terminates.
+additional check on the data, and if the test
+fails then VL is truncated and further looping terminates.
This is precisely the same as Arithmetic Data-Dependent Fail-First,
-the only difference being that the result comes from the LD/ST.
+the only difference being that the result comes from the LD/ST
+rather than from an Arithmetic operation.
+
+There is also a crucial difference between Arithmetic and LD/ST
+Data-Dependent Fail-First: except for Store-Conditional, a 4-bit
+Condition Register Field test is created for testing purposes
+*but not stored* (thus there is no RC1 Mode as there is in Arithmetic).
+A CR Field is not stored because Load/Store instructions, particularly
+the Update variants, are already expensive in register terms,
+and adding an extra Vector write would be too costly in hardware.
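The behaviour described above can be modelled as a loop. Each loaded element gets a temporary 4-bit CR Field test, the selected bit is checked, and VL is truncated at the first failure. This is a minimal sketch under several assumptions:

- Memory is a plain dictionary.
- The value is compared against zero to form LT/GT/EQ, and SO is always 0.
- The `inv` polarity is assumed: the test fails when the selected bit equals `inv`.
- Store-Conditional is not modelled.

The function `ld_ddffirst` and its parameters are hypothetical. The CR Field exists only as a local variable and is never written back, which reflects the absence of RC1 Mode.

```python
from typing import Dict, List, Tuple

# CR Field bit order as in the Power ISA: LT, GT, EQ, SO
LT, GT, EQ, SO = 0, 1, 2, 3


def cr_test(value: int, xlen: int = 64) -> List[int]:
    """4-bit CR Field test of `value` (signed xlen-bit) against zero.

    SO is not modelled in this sketch and is always 0.
    """
    if value & (1 << (xlen - 1)):
        value -= 1 << xlen
    return [int(value < 0), int(value > 0), int(value == 0), 0]


def ld_ddffirst(memory: Dict[int, int], addrs: List[int], vl: int,
                cr_bit: int, inv: int, vli: int) -> Tuple[List[int], int]:
    """Hypothetical model of a Data-Dependent Fail-First Vector Load.

    Returns (elements kept within the new VL, new VL). The CR Field test
    is a local temporary and is never written to any CR register.
    """
    loaded: List[int] = []
    for i in range(vl):
        value = memory[addrs[i]]
        loaded.append(value)
        crf = cr_test(value)              # created for testing only
        if crf[cr_bit] == inv:            # assumed failure condition
            new_vl = i + 1 if vli else i  # VLi=1 keeps the failing element
            return loaded[:new_vl], new_vl
    return loaded, vl


# Truncate a Vector of pointers on finding a NULL (zero) pointer.
pointers = {0: 0x1000, 8: 0x2000, 16: 0, 24: 0x3000}
print(ld_ddffirst(pointers, [0, 8, 16, 24], 4, EQ, 1, 0))  # ([4096, 8192], 2)
print(ld_ddffirst(pointers, [0, 8, 16, 24], 4, EQ, 1, 1))  # ([4096, 8192, 0], 3)
```

Because no CR Field survives the Load, any later decision that needs CR bits has to compute them again with a separate compare over the truncated VL.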
+
+*Programmer's note: Data-Dependent Load may be used with a test to
+truncate VL, and may then be followed up with `sv.cmpi` or another
+operation (for example to obtain CR Fields, which the Load itself does
+not store). The important aspect is that the Vector Load was truncated,
+for example on finding a NULL pointer.*
In the case of Store operations there is a quirk when VLi ("VL inclusive",
meaning the element that failed the test is still counted as "Valid") is
clear. Bear in mind that the criterion is: when VLi is set, the *current
failed test* is permitted to be included in the truncated Vector. Thus,
when VLi is clear, the actual update (store) to Memory is **not permitted
to take place** should the test fail. Therefore, on testing the value to be stored,
-and after updating the corresponding CR Field Element, when VLi=0 and
-finding that the test fails the Memory store must **not** occur.
+when VLi=0 and the test fails, the Memory store must **not** occur.
Additionally, when VLi=0 and a test fails, RA does **not** receive a
copy of the Effective Address. Hardware implementations with Out-of-Order