on all first Subvector elements, followed by another separate independent
Parallel Reduction on all the second Subvector elements and so on.
+    for selectsubelement in (x,y,z,w):
+        parallelreduce(0..VL-1, selectsubelement)
+
By contrast, when SVM is set and SUBVL!=1, a Horizontal
Subvector mode is enabled, applying the Parallel Reduction
Algorithm to the Subvector Elements. The Parallel Reduction
     for (i = 0; i < VL; i++)
         if (predval & 1<<i) # predication
-            subvecparallelreduction(...)
+            el = element[i]
+            parallelreduction([el.x, el.y, el.z, el.w])
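
As a rough illustration of the difference between the two modes, the following Python sketch models each element as a tuple of subvector values. It is not the specification's algorithm: `tree_reduce`, `vertical_reduce` and `horizontal_reduce` are hypothetical names, a plain pairwise tree stands in for the Parallel Reduction, and the way the predicate mask is applied in the first mode is an assumption. `VL` is assumed to be at least 1.

```python
from operator import add


def tree_reduce(values, op):
    """Pairwise (tree) reduction; a simplified stand-in, not the spec's schedule."""
    vals = list(values)
    if not vals:
        return None  # nothing to reduce (for example, all elements masked out)
    step = 1
    while step < len(vals):
        # combine each surviving slot with its neighbour 'step' positions away
        for i in range(0, len(vals) - step, step * 2):
            vals[i] = op(vals[i], vals[i + step])
        step *= 2
    return vals[0]


def vertical_reduce(elements, predval, op):
    """First mode: one independent reduction per subvector position (x, y, z, w)."""
    subvl = len(elements[0])
    # assumption: masked-out elements are simply skipped
    active = [el for i, el in enumerate(elements) if predval & (1 << i)]
    return [tree_reduce([el[lane] for el in active], op) for lane in range(subvl)]


def horizontal_reduce(elements, predval, op):
    """Second mode: reduce across the subvector of each predicated element."""
    return {
        i: tree_reduce(el, op)
        for i, el in enumerate(elements)
        if predval & (1 << i)  # predication
    }


elements = [(1, 2, 3, 4), (10, 20, 30, 40), (100, 200, 300, 400)]
print(vertical_reduce(elements, 0b111, add))    # [111, 222, 333, 444]
print(horizontal_reduce(elements, 0b101, add))  # {0: 10, 2: 1000}
```

The first call yields one result per subvector position, while the second yields one result per unmasked element, which is the distinction the two pseudocode fragments draw.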
Note that as this is a Parallel Reduction, for best results
it should be an overwrite operation, where the result for