# Proposed Parallel-reduction algorithm
-**This algorithm contains a MV operation and may NOT be used. Removal
-of the MV operation may be achieved by using index-redirection as was
-achieved in DCT and FFT REMAP**
-
-```
-/// reference implementation of proposed SimpleV reduction semantics.
-///
- // reduction operation -- we still use this algorithm even
- // if the reduction operation isn't associative or
- // commutative.
- XXX VIOLATION OF SVP64 DESIGN PRINCIPLES XXXX
-/// XXX `pred` is a user-visible Vector Condition register XXXX
- XXX VIOLATION OF SVP64 DESIGN PRINCIPLES XXXX
-///
-/// all input arrays have length `vl`
-def reduce(vl, vec, pred):
-    pred = copy(pred) # must not damage predicate
-    step = 1;
-    while step < vl
-        step *= 2;
-        for i in (0..vl).step_by(step)
-            other = i + step / 2;
-            other_pred = other < vl && pred[other];
-            if pred[i] && other_pred
-                vec[i] += vec[other];
-            else if other_pred
-                XXX VIOLATION OF SVP64 DESIGN XXX
-                XXX vec[i] = vec[other]; XXX
-                XXX VIOLATION OF SVP64 DESIGN XXX
-            pred[i] |= other_pred;
-```
-
-The first principle in SVP64 being violated is that SVP64 is a fully-independent
+The principle of SVP64 is that SVP64 is a fully-independent
Abstraction of hardware-looping in between issue and execute phases
-that has no relation to the operation it issues. The above pseudocode
-conditionally changes not only the type of element operation issued
-(a MV in some cases) but also the number of arguments (2 for a MV).
-At the very least, for Vertical-First Mode this will result in unanticipated and unexpected behaviour (maximise "surprises" for programmers) in
-the middle of loops, that will be far too hard to explain.
-
-The second principle being violated by the above algorithm is the expectation
-that temporary storage is available for a modified predicate: there is no
-such space, and predicates are read-only to reduce complexity at the
-micro-architectural level.
-SVP64 is founded on the principle that all operations are
-"re-entrant" with respect to interrupts and exceptions: SVSTATE must
-be saved and restored alongside PC and MSR, but nothing more. It is perfectly
-fine to have context-switching back to the operation be somewhat slower,
-through "reconstruction" of temporary internal state based on what SVSTATE
-contains, but nothing more.
-
-An alternative algorithm is therefore required that does not perform MVs,
-and does not require additional state to be saved on context-switching.
+that has no relation to the operation it issues.
+Additional state cannot be saved on context-switching beyond that
+of SVSTATE.
```
-def reduce( vl, vec, pred ):
+def preducei(vl, vec, pred):
+    vec = copy(vec)
     pred = copy(pred) # must not damage predicate
-    j = 0
-    vi = [] # array of lookup indices to skip nonpredicated
-    for i, pbit in enumerate(pred):
-        if pbit:
-            vi[j] = i
-            j += 1
-    step = 2
-    while step <= vl
-        halfstep = step // 2
-        for i in (0..vl).step_by(step)
-            other = vi[i + halfstep]
-            ir = vi[i]
-            other_pred = other < vl && pred[other]
-            if pred[i] && other_pred
-                vec[ir] += vec[other]
-            else if other_pred:
-                vi[ir] = vi[other] # index redirection, no MV
-            pred[ir] |= other_pred # reconstructed on context-switch
-        step *= 2
+    step = 1
+    ix = list(range(vl)) # indices move rather than copy data
+    print(" start", step, pred, vec)
+    while step < vl:
+        step *= 2
+        for i in range(0, vl, step):
+            other = i + step // 2
+            ci = ix[i]
+            oi = ix[other] if other < vl else None
+            other_pred = other < vl and pred[oi]
+            if pred[ci] and other_pred:
+                vec[ci] += vec[oi]
+            elif other_pred:
+                ix[i] = oi # leave data in-place, copy index instead
+            pred[ci] |= other_pred
+            print(" row", step, pred, vec, ix)
+    return vec
```
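The index-redirection idea in `preducei` can be tried out directly in Python. The following is a minimal sketch, not a normative definition: `preducei_sketch` is a hypothetical name, the debug prints are dropped, and the index table `ix` is returned alongside the data (an assumption made here so that the caller can see where the result ended up). With at least one predicate bit set, the reduced value is found at `vec[ix[0]]` rather than necessarily at `vec[0]`, because a masked-out element is skipped by copying an index instead of moving data.

```python
from copy import copy


def preducei_sketch(vl, vec, pred):
    """Predicated tree reduction (sum) using index redirection.

    Hypothetical helper mirroring the preducei pseudocode; returns the
    working copy of the data and the index table, leaving inputs untouched.
    """
    vec = copy(vec)
    pred = copy(pred)          # must not damage the caller's predicate
    ix = list(range(vl))       # ix[i] says where slot i's data really lives
    step = 1
    while step < vl:
        step *= 2
        for i in range(0, vl, step):
            other = i + step // 2
            ci = ix[i]
            oi = ix[other] if other < vl else None
            other_pred = other < vl and pred[oi]
            if pred[ci] and other_pred:
                vec[ci] += vec[oi]   # both halves valid: combine in place
            elif other_pred:
                ix[i] = oi           # only the other half valid: redirect
            pred[ci] |= other_pred
    return vec, ix


# Element 0 is masked out, so the sum 2 + 3 + 4 lands in element 1.
out, ix = preducei_sketch(4, [1, 2, 3, 4], [False, True, True, True])
print(out[ix[0]])  # 9
```

In this example no element is ever copied: element 0 keeps its original value, and only `ix[0]` changes to point at element 1.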
In this version the need for an explicit MV is made unnecessary by instead