## Sub-Vector Horizontal Reduction
-Note that when SVM is clear and SUBVL!=1 the sub-elements are
-*independent*, i.e. they are mapreduced per *sub-element* as a result.
-illustration with a vec2, assuming RA==RT, e.g `sv.add/mr/vec2 r4, r4, r16.v`
-
- for i in range(0, VL):
- # RA==RT in the instruction. does not have to be
- iregs[RT].x = op(iregs[RT].x, iregs[RB+i].x)
- iregs[RT].y = op(iregs[RT].y, iregs[RB+i].y)
-
-Thus logically there is nothing special or unanticipated about
-`SVM=0`: it is expected behaviour according to standard SVP64
-Sub-Vector rules.
+Note that when SVM is clear and SUBVL!=1, a Parallel Reduction is performed
+on the first element of every Subvector, followed by a separate, independent
+Parallel Reduction on the second element of every Subvector, and so on.
By contrast, when SVM is set and SUBVL!=1, a Horizontal
Subvector mode is enabled, applying the Parallel Reduction
to the elements of each Subvector.
It should be an overwrite operation, where the result for
the Horizontal Reduction of each Subvector will be in the
first Subvector element.
-
Also note that use of Rc=1 is `UNDEFINED` behaviour.
+In essence, Structure Packing is being combined with Parallel Reduction.
+If the Subvectors are laid out as a 2D matrix, one Subvector per row,
+then with `SVM` **set** the Parallel Reduction is applied to each row.
+With `SVM` **clear** the matrix is first transposed (like Pack/Unpack)
+and the Parallel Reduction is then still applied to each **row**.
+
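The difference between the two `SVM` settings can be sketched in a few lines of Python. This is an illustrative model only, not the specification's algorithm: the function name `reduce_subvectors` and its parameters are hypothetical, the register file is modelled as a flat list packed one Subvector after another, and a plain left-to-right fold stands in for the Parallel Reduction schedule (the results agree only for associative operations). Predication and the location where each result is written are not modelled.

```python
from functools import reduce
from operator import add


def reduce_subvectors(op, vec, subvl, svm):
    """Hypothetical model of Sub-Vector reduction.

    vec   -- flat list of VL*SUBVL sub-elements, one Subvector after another
    subvl -- number of sub-elements per Subvector
    svm   -- True: reduce within each Subvector (Horizontal mode)
             False: reduce each sub-element position across all Subvectors
    """
    vl = len(vec) // subvl
    # Lay the data out as a 2D matrix with one Subvector per row.
    rows = [vec[i * subvl:(i + 1) * subvl] for i in range(vl)]
    if not svm:
        # SVM clear: transpose first, so that each row now holds the
        # same sub-element position taken from every Subvector.
        rows = [list(col) for col in zip(*rows)]
    # In both cases the reduction is applied per row.
    # Assumption: a sequential fold replaces the parallel tree schedule.
    return [reduce(op, row) for row in rows]


# Three vec2 Subvectors: (1,2), (3,4), (5,6)
data = [1, 2, 3, 4, 5, 6]
print(reduce_subvectors(add, data, 2, svm=False))  # [9, 12]
print(reduce_subvectors(add, data, 2, svm=True))   # [3, 7, 11]
```

With `svm=False` there is one result per sub-element position (all `.x` values, then all `.y` values); with `svm=True` there is one result per Subvector.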
# Fail-on-first <a name="fail-first"> </a>
Data-dependent fail-on-first has two distinct variants: one for LD/ST