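| 0 | 1 | 2   | 3      | 4      | description                                 |
|---|---|-----|--------|--------|---------------------------------------------|
| 0 | 0 | M   | sz     | CRM    | reduce mode (M=1)                           |
| 0 | 1 | inv | CR-bit | CR-bit | Rc=1: ffirst, bits 3-4 select the CR bit    |
| 0 | 1 | inv | sz     | dz     | Rc=0: ffirst on zero/nonzero result         |
| 1 | 0 | N   | sz     | dz     | sat mode: N=0 unsigned, N=1 signed          |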
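The mode-bit table above can be read as a small decoder. The sketch below is illustrative only: the function name `decode_mode`, the dict keys, and the assumption that bit 3 is the more significant half of the 2-bit CR-bit field are not from the spec. Rc is passed in separately because the two "0 1" rows are distinguished only by Rc.

```python
def decode_mode(bits, rc):
    """Hypothetical decoder for the five mode bits (table columns 0-4).

    bits: sequence of five ints (0/1), indexed exactly as the table columns.
    rc:   the instruction's Rc bit, needed to tell the two "0 1" rows apart.
    """
    b0, b1, b2, b3, b4 = bits
    if (b0, b1) == (0, 0):
        # reduce mode is enabled when M (bit 2) is 1
        return {"group": "reduce", "M": b2, "sz": b3, "CRM": b4}
    if (b0, b1) == (0, 1):
        if rc:
            # Rc=1: bits 3-4 select the CR bit to test (ASSUMPTION: bit 3 is MSB)
            return {"group": "ffirst", "test": "CR bit", "inv": b2,
                    "cr_bit": (b3 << 1) | b4}
        # Rc=0: test the result for zero/nonzero
        return {"group": "ffirst", "test": "zero/nonzero", "inv": b2,
                "sz": b3, "dz": b4}
    if (b0, b1) == (1, 0):
        # sat mode: N (bit 2) = 0 unsigned, 1 signed
        return {"group": "sat", "signed": b2 == 1, "sz": b3, "dz": b4}
    raise ValueError("bits 0-1 = 1 1 are not defined by this table")
```

For example, `decode_mode([1, 0, 1, 0, 0], rc=0)` selects signed saturation with both zeroing bits clear.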
* **ffirst** or data-dependent fail-on-first: see separate section.
* **sat mode** or saturation: clamps the result to the minimum or maximum representable value rather than overflowing and wrapping. Both signed and unsigned clamping are supported (N=0 unsigned, N=1 signed).
* **reduce mode**: when M=1 a mapreduce is performed and the result is a scalar. A vector destination is nevertheless required, because it may be used to store intermediate computations. The result is placed in the first element whose predicate bit is nonzero. Note that reduce mode applies only to two-source operations.
* **pred-result** tests the result (CR testing selects a bit of the CR and optionally inverts it, just like branch testing); if the test fails, it is as if the predicate bit were zero. When Rc=1, however, the CR element (CR0) is still stored in the CR register file. This scheme does not apply to CR operations (crand, cror).
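A minimal model of pred-result for an element-wise add with Rc=1 may help. It is a sketch under stated assumptions, not the spec: CR fields are modelled as a hypothetical 4-tuple `(LT, GT, EQ, SO)` with SO always 0, `cr_bit` indexes into that tuple, and registers are plain Python lists. The point it shows is that a failed test suppresses the write to RT exactly as a zero predicate bit would, while the CR element is still written.

```python
def cr_field(value):
    """CR0-style comparison of a signed result against zero: (LT, GT, EQ, SO)."""
    return (int(value < 0), int(value > 0), int(value == 0), 0)

def add_pred_result(rt, ra, rb, pred, cr_bit, inv, crs):
    """Hypothetical element-wise add with Rc=1 and pred-result enabled."""
    for i in range(len(ra)):
        if not pred[i]:
            continue                 # masked out by the ordinary predicate
        result = ra[i] + rb[i]
        cr = cr_field(result)
        crs[i] = cr                  # Rc=1: CR element stored regardless of the test
        if cr[cr_bit] ^ inv:         # select a CR bit, optionally invert it
            rt[i] = result           # test passed: as if predicate bit were 1
        # test failed: as if the predicate bit were zero, rt[i] left untouched
    return rt, crs
```

With `cr_bit=1` (GT) and `inv=0`, only positive results are written back; setting `inv=1` flips this so that only non-positive results are written.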
# Notes about rounding, clamp and saturate
If there are spare bits, it would be very good to use some of them to specify the mode, because otherwise an SPR has to be used, which must be set and then unset again. This can get costly.
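To make the difference between clamping and wrapping concrete, here is a minimal sketch of an element-wise saturating add. The N flag follows the table (N=0 unsigned, N=1 signed); the function names and the 8-bit default element width are assumptions chosen for illustration.

```python
def saturate(value, width, signed):
    """Clamp an exact integer result to the range of a width-bit element."""
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    return max(lo, min(hi, value))

def wrap(value, width, signed):
    """Ordinary modular (wrapping) result, shown for comparison."""
    value &= (1 << width) - 1
    if signed and value >= 1 << (width - 1):
        value -= 1 << width
    return value

def sat_add(ra, rb, n, width=8):
    """Hypothetical sat-mode add: n=0 clamps unsigned, n=1 clamps signed."""
    return [saturate(a + b, width, signed=bool(n)) for a, b in zip(ra, rb)]
```

With 8-bit elements, unsigned 250 + 10 clamps to 255 under sat mode but wraps to 4 without it.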
# Notes about reduce mode

1. Limited to single-predicated two-source operations (e.g. add RT, RA, RB), and to three-source operations where one of the inputs is a scalar (e.g. isel).
2. Limited to operations that make sense: multiply, add, logical bitwise OR, and CR operations. Divide is excluded, as is subtract, because neither is associative, so the result would depend on the reduction order (see note 4).
3. The destination is a vector, but the final result is stored in the first element whose predicate bit is nonzero. All other elements with a nonzero predicate bit are left undefined.
4. Implementations may use any ordering and any algorithm to reduce the elements to a single result, provided the outcome is equivalent to a straight application of mapreduce. The destination vector (except masked-out elements) may be used to store intermediate results, which may be left in the vector (their values are undefined).
5. CRM applies when Rc=1. When CRM is zero, the CR associated with the result means "some results met the standard CR result criteria". When CRM is one, it means "all results met the standard CR criteria".
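The reduce-mode notes can be illustrated with a small sketch. It assumes, purely for illustration, that the values being reduced are the predicated elements of a single source vector `src` (how RA and RB are combined is not specified here); all function names are hypothetical. It shows that a tree reduction (one of the orderings note 4 permits) matches a straight sequential reduction for an associative operation such as add or OR but not for subtract, that the result lands in the first active element, and how CRM turns per-element CR tests into "some" or "all".

```python
import operator
from functools import reduce

def sequential_reduce(values, op):
    """Straight left-to-right application: ((v0 op v1) op v2) ..."""
    return reduce(op, values)

def tree_reduce(values, op):
    """Pairwise (tree) reduction, one possible implementation ordering."""
    while len(values) > 1:
        paired = [op(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])   # odd element carried to the next level
        values = paired
    return values[0]

def reduce_into_vector(rt, src, pred, op, reducer=tree_reduce):
    """Reduce the predicated elements of src into the first active element of rt."""
    active = [i for i, p in enumerate(pred) if p]
    if not active:
        return rt   # ASSUMPTION: nothing written when no element is active
    rt[active[0]] = reducer([src[i] for i in active], op)
    # other active elements of rt are undefined (note 3); this model leaves them alone
    return rt

def crm_summary(element_tests, crm):
    """Note 5: CRM=0 -> some results met the criteria, CRM=1 -> all did."""
    return all(element_tests) if crm else any(element_tests)
```

For `[3, 1, 4, 1, 5]`, both orderings give 14 for add, but subtract gives -8 sequentially and -6 as a tree, which is why subtract is excluded.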

TODO: Rc=1 on logical operations? Is this possible?

# Fail-on-first
Data-dependent fail-on-first has two distinct variants: one for LD/ST, the other for arithmetic operations (which is actually CR-driven). Note that in each case the assumption is that vector elements are required to appear to be executed in sequential Program Order, element 0 being the first.