and FP.
* **reduce mode**. a mapreduce is performed. the result is a scalar. a result vector however is required, as the upper elements may be used to store intermediary computations. the result of the mapreduce is in the first element with a nonzero predicate bit. see [[svp64/appendix]]
note that there are comprehensive caveats when using this mode.
-* **pred-result** will test the result (CR testing selects a bit of CR and inverts it, just like branch testing) and if the test fails it is as if the
-*destination* predicate bit was zero. When Rc=1 the CR element however is still stored in the CR regfile, even if the test failed. See appendix for details.
+* **pred-result** will test the result (CR testing selects a bit of CR and inverts it, just like branch conditional testing) and if the test fails it
+is as if the
+*destination* predicate bit was zero even before starting the operation.
+When Rc=1 the CR element however is still stored in the CR regfile, even if the test failed. See appendix for details.
Note that ffirst and reduce modes are not anticipated to be high-performance in some implementations. ffirst due to interactions with VL, and reduce due to it requiring additional operations to produce a result. normal, saturate and pred-result are however inter-element independent and may easily be parallelised to give high performance, regardless of the value of VL.
overflow bit is therefore simply set to zero if saturation did not occur,
and to one if it did.
-Note also that saturate on operations that produce a carry output are
-prohibited due to the conflicting use of the CR.so bit for storing if
-saturation occurred.
-
-Post-analysis of the Vector of CRs to find out if any given element hit
-saturation may be done using a mapreduced CR op (cror), or by using the
-new crweird instruction, transferring the relevant CR bits to a scalar
-integer and testing it for nonzero. see [[sv/cr_int_predication]]
+Note also that saturate on operations that set OE=1 must raise an
+Illegal Instruction due to the conflicting use of the CR.so bit for
+storing whether
+saturation occurred. For Integer Operations that produce a Carry-Out (CA, CA32),
+these two bits will be `UNDEFINED` if saturation is also requested.
Note that the operation takes place at the maximum bitwidth (max of
src and dest elwidth) and that truncation occurs to the range of the
dest elwidth.
+*Programmer's Note: Post-analysis of the Vector of CRs to find out if any given element hit
+saturation may be done using a mapreduced CR op (cror), or by using the
+new crrweird instruction with Rc=1, which will transfer the required
+CR bits to a scalar integer and update CR0, which will allow testing
+the scalar integer for nonzero. see [[sv/cr_int_predication]]*
+
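As an illustration of the saturation rules above, the following is a minimal Python sketch, not the specification's pseudocode. The names `saturate` and `sat_add_vector` are hypothetical, and Python's unbounded integers stand in for "the maximum bitwidth" at which the operation takes place. Each element is clamped to the range of the destination elwidth, and a per-element overflow flag is set to one if saturation occurred and to zero if it did not.

```python
def saturate(value, dest_bits, signed=True):
    """Clamp value to the range of a dest_bits-wide integer.

    Returns (clamped_value, overflow_flag), where overflow_flag is 1
    if clamping changed the value (saturation occurred) and 0 otherwise.
    """
    if signed:
        lo = -(1 << (dest_bits - 1))
        hi = (1 << (dest_bits - 1)) - 1
    else:
        lo = 0
        hi = (1 << dest_bits) - 1
    clamped = max(lo, min(hi, value))
    return clamped, int(clamped != value)


def sat_add_vector(src_a, src_b, dest_bits, signed=True):
    """Element-wise saturating add (hypothetical helper).

    The add itself is done at full precision, then each result is
    truncated to the range of the destination element width.
    """
    results, so_bits = [], []
    for a, b in zip(src_a, src_b):
        r, so = saturate(a + b, dest_bits, signed)
        results.append(r)
        so_bits.append(so)
    return results, so_bits


if __name__ == "__main__":
    res, so = sat_add_vector([100, -100, 5], [100, -100, 5], dest_bits=8)
    print(res, so)
    # OR-reducing the flags mirrors the post-analysis described above
    print("any element saturated:", any(so))
```

OR-ing the per-element flags together plays the same role as the mapreduced `cror` post-analysis mentioned in the Programmer's Note: a single nonzero result indicates that at least one element hit saturation.
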
# Reduce mode
Reduction in SVP64 is similar in essence to other Vector Processing
ISAs, but leverages the underlying scalar Base v3.0B operations.
Thus it is more a convention that the programmer may utilise to give
-the appearance and effect of a Horizontal Vector Reduction.
-Details are in the [[svp64/appendix]]
+the appearance and effect of a Horizontal Vector Reduction. Due
+to the unusual decoupling it is also possible to perform
+prefix-sum in certain circumstances. Details are in the [[svp64/appendix]]
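
The idea that reduction is a convention built from ordinary element-ordered scalar operations can be sketched in Python as follows. This is an illustrative model only, under stated assumptions: the function names are hypothetical, elements with a zero predicate bit are simply left untouched, and the authoritative rules (including the caveats) are those in the appendix. The first function folds the active elements and places the result in the first element with a nonzero predicate bit; the second shows how the same element-by-element execution, with each result feeding the next element, gives a prefix-sum.

```python
import operator


def mapreduce(op, vec, predicate):
    """Fold the predicated elements of vec with op (hypothetical model).

    The result is written to the first element whose predicate bit is
    nonzero; all other elements are returned unchanged in this sketch.
    """
    result = list(vec)
    active = [i for i, p in enumerate(predicate) if p]
    if not active:
        return result  # nothing enabled: nothing to do
    first = active[0]
    acc = vec[first]
    for i in active[1:]:
        acc = op(acc, vec[i])
    result[first] = acc
    return result


def prefix_sum(op, vec):
    """Running fold: element i depends on the already-updated element i-1."""
    out = list(vec)
    for i in range(1, len(out)):
        out[i] = op(out[i - 1], out[i])
    return out


if __name__ == "__main__":
    print(mapreduce(operator.add, [1, 2, 3, 4], [0, 1, 1, 1]))
    print(prefix_sum(operator.add, [1, 2, 3, 4]))
```
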
# Fail-on-first
case the assumption is that vector elements are required to appear to be
executed in sequential Program Order, element 0 being the first.
-
* Data-driven (CR-driven) fail-on-first activates when Rc=1 or other
CR-creating operation produces a result (including cmp). Similar to
branch, an analysis of the CR is performed and if the test fails, the
In CR-based data-driven fail-on-first there is only the option to select
and test one bit of each CR (just as with branch BO). For more complex
-tests this may be insufficient. If that is the case, a vectorised crops
+tests this may be insufficient. If that is the case, a vectorised crop
(crand, cror) may be used, and ffirst applied to the crop instead of to
-the arithmetic vector.
+the arithmetic vector. Note that crops are covered by
+the [[sv/cr_ops]] Mode format.
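
The single-bit CR test can be modelled with a short Python sketch. It is a hedged illustration rather than the specification's pseudocode: `cr_from_result` and `ffirst` are hypothetical names, the CR field is modelled as a signed comparison of the result against zero, and the sketch assumes the failing element is excluded from the truncated VL, which this section does not state.

```python
import operator


def cr_from_result(value):
    """Model a CR field for a result compared against zero (assumption)."""
    return {"lt": value < 0, "gt": value > 0, "eq": value == 0, "so": False}


def ffirst(op, src_a, src_b, vl, cr_bit, invert=False):
    """Element-ordered operation with CR-driven fail-on-first.

    One bit of each element's CR is selected and optionally inverted,
    as with a branch test. On the first failing test VL is truncated.
    Assumption: the failing element is excluded from the new VL.
    Returns (results, new_vl).
    """
    results = []
    for i in range(vl):
        r = op(src_a[i], src_b[i])
        cr = cr_from_result(r)
        test_passes = cr[cr_bit] != invert
        if not test_passes:
            return results, i  # truncate VL at the failing element
        results.append(r)
    return results, vl


if __name__ == "__main__":
    # Stop at the first element whose result is zero (test: "not eq")
    print(ffirst(operator.sub, [5, 4, 3, 9], [1, 2, 3, 1], 4, "eq", invert=True))
```

Because only one CR bit is tested per element, a condition such as "less than or equal" cannot be expressed directly in this model either; that is the situation in which the text suggests combining CR bits with a vectorised crop first and applying ffirst to that instead.
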
-One extremely important aspect of ffirst is:
+Two extremely important aspects of ffirst are:
* LDST ffirst may never set VL equal to zero. This is because on the first
element an exception must be raised "as normal".
vectorised operations are effectively `nops` which is
*precisely the desired and intended behaviour*.
-CR-based data-dependent first on the other hand MUST not truncate VL
+The second crucial aspect, compared to LDST ffirst:
+
+* LDST ffirst may (beyond the initial first element
+ conditions) truncate VL for any architecturally
+ suitable reason.
+* CR-based data-dependent first on the other hand MUST NOT truncate VL
arbitrarily to a length decided by the hardware: VL MUST only be
truncated based explicitly on whether a test fails.
This is because it is a precise test on which algorithms