From: lkcl <lkcl@web>
Date: Tue, 21 Jun 2022 15:11:25 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~1623
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=df08ebaa5f5e2d9f05ec2b98b7bbbeee66d83018;p=libreriscv.git

---
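
Note on the second hunk: the added text describes a reduction that leaves
unpaired data in place and refers to its position on later layers, rather
than issuing a MV. The block below is a minimal Python sketch of that idea
under stated assumptions. It is not the code from `preduce.py`: the name
`preduce_sketch`, the `op` parameter and the returned index are hypothetical,
chosen only for illustration.

```python
def preduce_sketch(vl, vec, pred, op=lambda a, b: a + b):
    """Predicated tree reduction that leaves unpaired elements in place.

    Hypothetical illustration only. Returns (vec, idx), where idx is the
    position holding the final result (meaningful only if pred[idx] is set).
    """
    vec = list(vec)        # work on a copy so the caller's data is untouched
    ix = list(range(vl))   # ix[i]: where the live data for slot i currently sits
    step = 1
    while step < vl:
        step *= 2
        for i in range(0, vl, step):
            other = i + step // 2
            if other >= vl:
                continue                        # no partner on this layer
            ci, oi = ix[i], ix[other]
            if pred[ci] and pred[oi]:
                vec[ci] = op(vec[ci], vec[oi])  # both live: reduce into ci
            elif pred[oi]:
                ix[i] = oi                      # only partner live: redirect the index
    return vec, ix[0]
```

In this sketch the predicate is never modified: liveness of a slot is always
read through `ix`, so a redirected slot inherits its partner's predicate bit.
With a single predicate bit set, no element is changed and only the index
moves, which matches the programmer's note in the patch.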

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index a8b9c1644..626c3640a 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -963,13 +963,13 @@ For modes:
   - mr OR crm: "normal" map-reduce mode or CR-mode.
   - mr.svm OR crm.svm: when vec2/3/4 set, sub-vector mapreduce is enabled
 
-# Proposed Parallel-reduction algorithm
+# Parallel-reduction algorithm
 
 The principle of SVP64 is that SVP64 is a fully-independent
 Abstraction of hardware-looping in between issue and execute phases 
 that has no relation to the operation it issues.
 Additional state cannot be saved on context-switching beyond that
-of SVSTATE.
+of SVSTATE, making things slightly tricky.
 
 Executable demo pseudocode, full version
 [here](https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/preduce.py;hb=HEAD)
@@ -997,10 +997,24 @@ def preducei(vl, vec, pred):
     return vec
 ```
 
-In this version the need for an explicit MV is made unnecessary by instead
-leaving elements *in situ*.  The internal modifications to the predicate may,
-due to the reduction being entirely deterministic, be "reconstructed"
-on a context-switch. This may make some implementations slower.
+This algorithm works by noting when data remains in-place rather than
+being reduced, and referring to that alternative position on subsequent
+layers of reduction.  It is re-entrant.  If, however, it is interrupted
+and restored, some implementations may take longer to re-establish
+the context.
+
+By default it is applied as follows:
+
+* RA, FRA or BFA is the first operand
+  (ci index offset in the above pseudocode)
+* RB, FRB or BFB is the second (co index offset)
+* RT (result) also uses ci **if RA==RT**
+
+For more complex applications a REMAP Schedule must be used.
+
+*Programmer's note:
+if passed a predicate mask with only one bit set, this algorithm
+takes no action, similar to when a predicate mask is all zero.*
 
 *Implementor's Note: many SIMD-based Parallel Reduction Algorithms are
 implemented in hardware with MVs that ensure lane-crossing is minimised.