(no commit message)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn

index a8b9c16446a0e3361954c082a6034cbf2bbff3e3..626c3640ab3d20f73437cebcde45925dc0971c00 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -963,13 +963,13 @@ For modes:
    - mr OR crm: "normal" map-reduce mode or CR-mode.
    - mr.svm OR crm.svm: when vec2/3/4 set, sub-vector mapreduce is enabled
  
-# Proposed Parallel-reduction algorithm
+# Parallel-reduction algorithm
  
  The principle of SVP64 is that SVP64 is a fully-independent
  Abstraction of hardware-looping in between issue and execute phases 
  that has no relation to the operation it issues.
  Additional state cannot be saved on context-switching beyond that
-of SVSTATE.
+of SVSTATE, making things slightly tricky.
  
  Executable demo pseudocode, full version
  [here](https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/preduce.py;hb=HEAD)
@@ -997,10 +997,24 @@ def preducei(vl, vec, pred):
      return vec
  ```
  
-In this version the need for an explicit MV is made unnecessary by instead
-leaving elements *in situ*.  The internal modifications to the predicate may,
-due to the reduction being entirely deterministic, be "reconstructed"
-on a context-switch. This may make some implementations slower.
+This algorithm works by noting when data remains in-place rather than
+being reduced, and referring to that alternative position on subsequent
+layers of reduction.  It is re-entrant. If however interrupted and
+restored, some implementations may take longer to re-establish the
+context.
+
+Its application by default is that:
+
+* RA, FRA or BFA is the first register as the first operand
+  (ci index offset in the above pseudocode)
+* RB, FRB or BFB is the second (co index offset)
+* RT (result) also uses ci **if RA==RT**
+
+For more complex applications a REMAP Schedule must be used.
+
+*Programmer's note:
+if passed a predicate mask with only one bit set, this algorithm
+takes no action, similar to when a predicate mask is all zero.*
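
The in-place behaviour described above can be illustrated with a minimal Python sketch. It is not the linked `preduce.py` and not a normative definition: the function name `preduce_in_place`, the `op` parameter and the returned index are hypothetical, chosen only for illustration. The sketch assumes a pairwise tree reduction in which a slot whose left element is masked out records the position of its right element instead of moving the data.

```python
def preduce_in_place(vec, pred, op=lambda a, b: a + b):
    """Illustrative sketch only: tree-reduce the predicated elements of vec.

    Returns (new_vec, result_index). Names here are hypothetical.
    """
    vl = len(vec)
    vec = list(vec)           # work on a copy
    ix = list(range(vl))      # ix[i]: where the live value for slot i sits
    step = 1
    while step < vl:
        step *= 2
        for i in range(0, vl, step):
            other = i + step // 2
            if other >= vl:
                continue      # no partner at this layer
            ci, oi = ix[i], ix[other]
            if pred[ci] and pred[oi]:
                # both operands live: reduce into the first position
                vec[ci] = op(vec[ci], vec[oi])
            elif pred[oi]:
                # only the second is live: leave data in place and
                # refer to its position on subsequent layers
                ix[i] = oi
    return vec, ix[0]
```

With `vec = [1, 2, 3, 4]` and `pred = [False, True, True, True]` the sketch returns `([1, 9, 7, 4], 1)`: the sum of the enabled elements ends up at index 1, and nothing is moved into index 0. With only one predicate bit set, or none, the vector comes back unmodified, matching the programmer's note.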
  
  *Implementor's Note: many SIMD-based Parallel Reduction Algorithms are
  implemented in hardware with MVs that ensure lane-crossing is minimised.