// reduction operation -- we still use this algorithm even
// if the reduction operation isn't associative or
// commutative.
-/// `temp_pred` is a user-visible Vector Condition register
+ XXX VIOLATION OF SVP64 DESIGN PRINCIPLES XXXX
+/// XXX `pred` is a user-visible Vector Condition register XXXX
+ XXX VIOLATION OF SVP64 DESIGN PRINCIPLES XXXX
///
/// all input arrays have length `vl`
def reduce(vl, vec, pred):
pred[i] |= other_pred;
```
-The principle in SVP64 being violated is that SVP64 is a fully-independent
+The first principle in SVP64 being violated is that SVP64 is a fully-independent
Abstraction of hardware-looping in between issue and execute phases
that has no relation to the operation it issues. The above pseudocode
conditionally changes not only the type of element operation issued
At the very least, for Vertical-First Mode this will result in unexpected
behaviour in the middle of loops (maximising "surprises" for programmers)
that will be far too hard to explain.
-An alternative algorithm is therefore required that does not perform MVs.
+The second principle being violated is re-entrancy: the above algorithm
+expects temporary storage to be available for a modified predicate, and there
+is no such space. SVP64 is founded on the principle that all operations are
+"re-entrant" with respect to interrupts and exceptions: SVSTATE must
+be saved and restored alongside PC and MSR, but nothing more. It is perfectly
+fine for context-switching back to the operation to be somewhat slower,
+through "reconstruction" of temporary internal state based on what SVSTATE
+contains, provided that no further state has to be saved.
+
+An alternative algorithm is therefore required that does not perform MVs,
+and does not require additional state to be saved on context-switching.
```
def reduce(vl, vec, pred):
halfstep = step // 2
for i in (0..vl).step_by(step)
other = vi[i + halfstep]
- i = vi[i]
+ ir = vi[i]
other_pred = other < vl && pred[other]
if pred[i] && other_pred
- vec[i] += vec[other]
- pred[i] |= other_pred
+ vec[ir] += vec[other]
+ else if other_pred:
+ vi[ir] = vi[other] # index redirection, no MV
+ pred[ir] |= other_pred # reconstructed on context-switch
step *= 2
-
```
+In this version an explicit MV is made unnecessary by instead
+leaving elements *in situ* and redirecting the index. The internal
+modifications to the predicate may, because the reduction is entirely
+deterministic, be "reconstructed" on a context-switch. Implementations
+that rely on such reconstruction may be slower to resume.
+
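The index-redirection idea can be sketched as runnable Python. This is only an illustrative sketch, not the specification's algorithm: the name `reduce_in_situ`, the `op` parameter and the working list `live` are assumptions made for the example, and `live` is indexed by tree slot rather than by element, which may differ from the pseudocode above. The caller's predicate is never written to, and no element is ever moved.

```python
def reduce_in_situ(vec, pred, op=lambda a, b: a + b):
    """Tree reduction that redirects indices instead of moving elements.

    Hypothetical sketch: `live` stands in for the temporary predicate
    state that a real implementation would have to be able to rebuild.
    """
    vl = len(vec)
    vec = list(vec)           # work on a copy of the data
    vi = list(range(vl))      # vi[slot]: element holding that slot's partial result
    live = list(pred)         # working copy; the caller's pred is left undamaged
    step = 1
    while step < vl:
        halfstep = step
        step *= 2
        for i in range(0, vl, step):
            other = i + halfstep
            if other >= vl or not live[other]:
                continue      # nothing to combine from the right-hand slot
            if live[i]:
                # both sides active: combine, keeping left-to-right order
                vec[vi[i]] = op(vec[vi[i]], vec[vi[other]])
            else:
                # only the right side is active: redirect the index, no MV
                vi[i] = vi[other]
                live[i] = True
    result = vec[vi[0]] if vl > 0 and live[0] else None
    return result, vec, vi
```

With `vec = [1, 2, 3, 4]` and `pred = [False, True, False, True]` the sketch returns 6, held in element 1 (that is, `vi[0] == 1`), and the masked-out elements 0 and 2 are left untouched. Because every step depends only on `vl`, the predicate and the current step, replaying the loop from the start rebuilds `vi` and `live` exactly, which is the sense in which such state could be "reconstructed" rather than saved.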
*Implementor's Note: many SIMD-based Parallel Reduction Algorithms are
implemented in hardware with MVs that ensure lane-crossing is minimised.
In SIMD ISAs the internal SIMD Architectural design is exposed and imposed on the programmer. Cray-style Vector ISAs on the other hand provide convenient,