Note that when SVM is clear and SUBVL!=1 the sub-elements are
*independent*, i.e. they are mapreduced per *sub-element* as a result.
-illustration with a vec2, assuming RA==RT, e.g `sv.add/mr/vec2 r4, r4, r16`
+illustration with a vec2, assuming RA==RT, e.g. `sv.add/mr/vec2 r4, r4, r16.v`
for i in range(0, VL):
# RA==RT in the instruction. does not have to be
Example for a vec2:
for i in range(VL):
- iregs[RT+i] = op(iregs[RA+i].x, iregs[RA+i].y)
+ iregs[RT+i] = op(iregs[RA+i].x, iregs[RB+i].y)
Example for a vec3:
for i in range(VL):
- iregs[RT+i] = op(iregs[RA+i].x, iregs[RA+i].y)
- iregs[RT+i] = op(iregs[RT+i] , iregs[RA+i].z)
+ iregs[RT+i] = op(iregs[RA+i].x, iregs[RB+i].y)
+ iregs[RT+i] = op(iregs[RT+i] , iregs[RB+i].z)
Example for a vec4:
for i in range(VL):
- iregs[RT+i] = op(iregs[RA+i].x, iregs[RA+i].y)
- iregs[RT+i] = op(iregs[RT+i] , iregs[RA+i].z)
- iregs[RT+i] = op(iregs[RT+i] , iregs[RA+i].w)
+ iregs[RT+i] = op(iregs[RA+i].x, iregs[RB+i].y)
+ iregs[RT+i] = op(iregs[RT+i] , iregs[RB+i].z)
+ iregs[RT+i] = op(iregs[RT+i] , iregs[RB+i].w)
In this mode, when Rc=1 the Vector of CRs is as normal: each result
element creates a corresponding CR element (for the final, reduced, result).
-Note that the destination (RT) is automatically used as an "Accumulator"
-register, and consequently the Sub-Vector Loop is interruptible.
-If RT is a Scalar then as usual the main VL Loop terminates at the
-first predicated element (or the first element if unpredicated).
+Note:
+
+1. The destination (RT) is inherently used as an "Accumulator"
+   register, and consequently the Sub-Vector Loop is interruptible.
+   If RT is a Scalar then, as usual, the main VL Loop terminates at the
+   first predicated element (or the first element if unpredicated).
+2. The Sub-Vector designation applies to RA and RB *but not RT*.
+3. The number of operations executed per element is one less than the
+   Sub-Vector length.
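
The vec2, vec3 and vec4 examples above can be sketched as one runnable Python model. This is a minimal illustration, not the specification's reference pseudocode: the function name `subvector_reduce`, the list-of-lists representation of sub-vectors (index 0 for `.x`, 1 for `.y`, and so on) and the returned operation count are all assumptions made for the sketch. Predication, Rc=1 and the scalar-RT early exit are not modelled.

```python
from operator import add


def subvector_reduce(op, ra_vec, rb_vec):
    """Hypothetical model of the per-element Sub-Vector reduction.

    ra_vec, rb_vec: lists of VL sub-vectors, each holding SUBVL values
    (SUBVL >= 2; index 0 is .x, 1 is .y, 2 is .z, 3 is .w).
    Returns (results, op_count), where results has one scalar per element.
    """
    results = []
    op_count = 0
    for a, b in zip(ra_vec, rb_vec):
        # First step pairs RA's .x with RB's .y, as in the examples.
        acc = op(a[0], b[1])
        op_count += 1
        # Remaining steps accumulate RB's .z (and .w) into the result.
        for j in range(2, len(b)):
            acc = op(acc, b[j])
            op_count += 1
        results.append(acc)
    return results, op_count


# vec3 with VL=2 (values chosen arbitrarily for illustration)
ra = [[1, 2, 3], [4, 5, 6]]
rb = [[10, 20, 30], [40, 50, 60]]
results, op_count = subvector_reduce(add, ra, rb)
print(results, op_count)
```

With a vec3 each element takes two operations, so VL=2 gives four in total, in line with the count of one less than the Sub-Vector length.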
# Fail-on-first