Sub-Vector rules.
By contrast, when SVM is set and SUBVL!=1, a Horizontal
-Subvector mode is enabled, which behaves very much more
-like a traditional Vector Processor Reduction instruction.
+Subvector mode is enabled, applying the Parallel Reduction
+Algorithm to the Subvector elements. The Parallel Reduction
+is applied independently VL times, once to each group of
+SUBVL elements. Bear in mind that predication is never applied
+to individual Subvector elements: it only selects whether the
+*entire* Parallel Reduction on each group is performed or not.
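As a worked illustration, the tree below assumes that the reduction pairs elements at strides 1, 2, 4, and so on. This stride order is an assumption of the illustration, not a quote of the specification. The elements of one group are written $s_0 \ldots s_3$, and $\mathrm{op}$ stands for the instruction's operation. Both cases need SUBVL−1 operations, spread over $\lceil \log_2 \text{SUBVL} \rceil$ steps.

```latex
\begin{aligned}
\text{SUBVL}=4:&\quad s_0 \leftarrow \mathrm{op}(s_0, s_1),\; s_2 \leftarrow \mathrm{op}(s_2, s_3) && \text{(stride 1)}\\
&\quad s_0 \leftarrow \mathrm{op}(s_0, s_2) && \text{(stride 2)}\\
\text{SUBVL}=3:&\quad s_0 \leftarrow \mathrm{op}(s_0, s_1) && \text{(stride 1; } s_2 \text{ has no partner)}\\
&\quad s_0 \leftarrow \mathrm{op}(s_0, s_2) && \text{(stride 2)}
\end{aligned}
```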
-Example for a vec2:
-
- for i in range(VL):
- iregs[RT+i] = op(iregs[RA+i].x, iregs[RB+i].y)
-
-Example for a vec3:
-
- for i in range(VL):
- iregs[RT+i] = op(iregs[RA+i].x, iregs[RB+i].y)
- iregs[RT+i] = op(iregs[RT+i] , iregs[RB+i].z)
+    for (i = 0; i < VL; i++)
+        if (predval & 1<<i) # predication
+            subvecparallelreduction(...)
-Example for a vec4:
+Note that, as this is a Parallel Reduction, it works best as an
+overwrite (in-place) operation: the result of the Horizontal
+Reduction of each Subvector is left in the first Subvector
+element.
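The following is a minimal Python sketch of the predicated loop above. It is a model for illustration, not the specification's pseudocode, and it rests on three assumptions:

* Subvector element k of group i is held at `regs[i*SUBVL + k]`.
* The tree pairs elements at strides 1, 2, 4, and so on.
* `subvec_parallel_reduction` and `horizontal_subvector_reduce` are hypothetical names.

The sketch shows two behaviours. A clear predicate bit skips the whole group, and each group's result overwrites its first element.

```python
import operator
from typing import Callable, List


def subvec_parallel_reduction(regs: List[int], base: int, subvl: int,
                              op: Callable[[int, int], int]) -> None:
    """Hypothetical helper: reduce regs[base:base+subvl] in place.

    Assumption: a stride-doubling tree (step = 1, 2, 4, ...). The final
    result overwrites the first Subvector element, regs[base]; other
    elements of the group may be left holding partial results.
    """
    step = 1
    while step < subvl:
        for i in range(0, subvl, step * 2):
            other = i + step
            if other < subvl:  # an element with no partner waits for a later step
                regs[base + i] = op(regs[base + i], regs[base + other])
        step *= 2


def horizontal_subvector_reduce(regs: List[int], vl: int, subvl: int,
                                predval: int,
                                op: Callable[[int, int], int]) -> None:
    """Hypothetical model of SVM=1, SUBVL!=1: one tree reduction per group."""
    for i in range(vl):
        if predval & (1 << i):  # predication selects the *entire* group
            subvec_parallel_reduction(regs, i * subvl, subvl, op)


# VL=3 groups of vec4; predicate 0b101 skips group 1.
regs = [1, 2, 3, 4, 10, 20, 30, 40, 5, 6, 7, 8]
horizontal_subvector_reduce(regs, vl=3, subvl=4, predval=0b101, op=operator.add)
print(regs)  # [10, 2, 7, 4, 10, 20, 30, 40, 26, 6, 15, 8]
```

Group 0 reduces to 10 in its first element. Group 1 is left untouched because its predicate bit is clear. Group 2 reduces to 26. The remaining elements of each reduced group keep intermediate values, as expected of an overwrite operation.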
- for i in range(VL):
- iregs[RT+i] = op(iregs[RA+i].x, iregs[RB+i].y)
- iregs[RT+i] = op(iregs[RT+i] , iregs[RB+i].z)
- iregs[RT+i] = op(iregs[RT+i] , iregs[RB+i].w)
-
-In this mode, when Rc=1 the Vector of CRs is as normal: each result
-element creates a corresponding CR element (for the final, reduced, result).
-
-Note:
-
-1. that the destination (RT) is inherently used as an "Accumulator"
- register, and consequently the Sub-Vector Loop is interruptible.
- If RT is a Scalar then as usual the main VL Loop terminates at the
- first predicated element (or the first element if unpredicated).
-2. that the Sub-Vector designation applies to RA and RB *but not RT*.
-3. that the number of operations executed is one less than the Sub-vector
- length
+Also note that use of Rc=1 is `UNDEFINED` behaviour.
# Fail-on-first <a name="fail-first"> </a>