Data-driven (CR-driven) fail-on-first activates when Rc=1 or when another
CR-creating operation (including cmp) produces a result. Similar to
branch, an analysis of the CR is performed and if the test fails, the
-vector operation terminates and discards all element operations at and
-above the current one, and VL is truncated to either
+vector operation terminates and discards all element operations **at and
+above the current one**, and VL is truncated to either
the *previous* element or the current one, depending on whether
-VLi (VL "inclusive") is set.
+VLi (VL "inclusive") is clear or set, respectively.
Thus the new VL comprises a contiguous vector of results,
all of which pass the testing criteria (equal to zero, less than zero etc
as defined by the CR-bit test).
+*Note: when VLi is clear, the behaviour at first seems counter-intuitive.
+A result is calculated but if the test fails it is prohibited from being
+actually written. This becomes intuitive again when it is remembered
+that the length that VL is set to is the number of **written** elements,
+and only when VLi is set will the current element be included in that
+count.*
+
The CR-based data-driven fail-on-first is "new" and not found in ARM
SVE or RVV. At the same time it is "old" because it is almost
identical to a generalised form of Z80's `CPIR` instruction.
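
The truncation rule described above can be sketched as a small behavioural model. This is only an illustration under stated assumptions, not the specification's pseudocode: the function name `ddffirst`, its parameters, and the use of plain Python callables for the element operation and the CR-bit test are all hypothetical. It also assumes that, when VLi is set, the failing element's result is written and counted, as the note on "written" elements suggests.

```python
from typing import Callable, List, Tuple


def ddffirst(
    src: List[int],
    op: Callable[[int], int],
    cr_test: Callable[[int], bool],
    vl: int,
    vli: bool,
) -> Tuple[List[int], int]:
    """Hypothetical model of CR-based data-driven fail-on-first.

    Returns (written_results, new_vl). Elements are processed strictly
    in order, so the outcome depends only on the data and the test.
    """
    written: List[int] = []
    for i in range(vl):
        result = op(src[i])          # the element result is always calculated
        if not cr_test(result):      # stand-in for the CR-bit test
            if vli:
                # Assumption: VLi set, so the failing element is written
                # and included in the new VL.
                written.append(result)
            # VLi clear: the failing result is discarded, VL ends at the
            # previous element. Nothing above this element is executed.
            return written, len(written)
        written.append(result)
    return written, vl               # no test failed: VL is unchanged


# Stop at the first zero, in the spirit of a CPIR-style search.
data = [5, 3, 0, 7]
exclusive = ddffirst(data, lambda x: x, lambda r: r != 0, vl=4, vli=False)
inclusive = ddffirst(data, lambda x: x, lambda r: r != 0, vl=4, vli=True)
```

In this sketch `exclusive` holds only the elements that passed the test, while `inclusive` additionally contains the element that failed it; in both cases the new VL equals the number of written elements.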
suitable reason. Beyond the first element LD/ST Failfirst is
arbitrarily speculative and 100% non-deterministic.
* CR-based data-dependent first on the other hand MUST NOT truncate VL
-arbitrarily to a length decided by the hardware: VL MUST only be
-truncated based explicitly on whether a test fails.
-This because it is a precise Deterministic test on which algorithms
-can and will will rely.
+ arbitrarily to a length decided by the hardware: VL MUST only be
+ truncated based explicitly on whether a test fails.
+ This is because it is a precise Deterministic test on which algorithms
+ can and will rely.
**Floating-point Exceptions**