From: lkcl <lkcl@web>
Date: Mon, 11 Apr 2022 08:14:01 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2793
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=51ea7133be63382cb1a7026b2ff643bbcb40a812;p=libreriscv.git

---
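Note: the revised `reduce` pseudocode in the second hunk below can be sketched as runnable Python. This is a minimal illustration under stated assumptions, not the specification's algorithm: the name `reduce_in_situ`, the `op` parameter, the working list `live` and the return convention (index of the element holding the result) are all hypothetical and do not appear in the commit. It shows only the idea of index redirection in place of a MV.

```python
import operator


def reduce_in_situ(vec, pred, op=operator.add):
    """Tree-reduce the predicated elements of vec without moving any element.

    Hypothetical sketch: returns the index in vec that holds the result,
    or None if no predicate bit is set. vec is updated in place.
    """
    vl = len(vec)
    vi = list(range(vl))            # index redirection table (assumed name)
    live = [bool(p) for p in pred]  # working copy; caller's pred is untouched
    step = 1
    while step < vl:
        step *= 2
        halfstep = step // 2
        for i in range(0, vl, step):
            other = i + halfstep
            other_live = other < vl and live[other]
            if live[i] and other_live:
                # both halves hold a partial result: combine them in place
                vec[vi[i]] = op(vec[vi[i]], vec[vi[other]])
            elif other_live:
                # only the upper half is live: redirect the index, no MV
                vi[i] = vi[other]
            live[i] = live[i] or other_live
    return vi[0] if vl > 0 and live[0] else None
```

With `vec = [1, 2, 3, 4]` and `pred = [0, 1, 1, 1]` the result is left at index 1 rather than being moved to index 0. The `live` list stands in for the internal predicate modifications; how an implementation would reconstruct it from SVSTATE is not modelled here.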

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index 18be50fba..6f23727ba 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -836,7 +836,9 @@ achieved in DCT and FFT REMAP**
                 // reduction operation -- we still use this algorithm even
                 // if the reduction operation isn't associative or
                 // commutative.
-/// `temp_pred` is a user-visible Vector Condition register 
+    XXX VIOLATION OF SVP64 DESIGN PRINCIPLES               XXXX
+/// XXX `pred` is a user-visible Vector Condition register XXXX
+    XXX VIOLATION OF SVP64 DESIGN PRINCIPLES               XXXX
 ///
 /// all input arrays have length `vl`
 def reduce(vl, vec, pred):
@@ -855,7 +857,7 @@ def reduce(vl, vec, pred):
             pred[i] |= other_pred;
 ```
 
-The principle in SVP64 being violated is that SVP64 is a fully-independent
+The first principle in SVP64 being violated is that SVP64 is a fully-independent
 Abstraction of hardware-looping in between issue and execute phases 
 that has no relation to the operation it issues.  The above pseudocode
 conditionally changes not only the type of element operation issued
@@ -863,7 +865,17 @@ conditionally changes not only the type of element operation issued
 At the very least, in Vertical-First Mode this will result in unexpected behaviour (maximising "surprises" for programmers) in
 the middle of loops that will be far too hard to explain.
 
-An alternative algorithm is therefore required that does not perform MVs.
+The second principle being violated by the above algorithm concerns state:
+it expects temporary storage to be available for a modified predicate, and
+there is no such space.  SVP64 is founded on the principle that all operations are
+"re-entrant" with respect to interrupts and exceptions: SVSTATE must
+be saved and restored alongside PC and MSR, but nothing more. It is perfectly
+fine for context-switching back to the operation to be somewhat slower,
+through "reconstruction" of temporary internal state from what SVSTATE
+contains, provided that no additional state has to be saved.
+
+An alternative algorithm is therefore required that does not perform MVs,
+and does not require additional state to be saved on context-switching.
 
 ```
 def reduce(  vl,  vec, pred, pred,):
@@ -878,15 +890,21 @@ def reduce(  vl,  vec, pred, pred,):
         halfstep = step // 2
         for i in (0..vl).step_by(step)
             other = vi[i + halfstep]
-            i = vi[i]
+            ir = vi[i]
             other_pred = other < vl && pred[other]
             if pred[i] && other_pred
-                vec[i] += vec[other]
-            pred[i] |= other_pred
+                vec[ir] += vec[other]
+            else if other_pred
+                vi[ir] = vi[other] # index redirection, no MV
+            pred[ir] |= other_pred # reconstructed on context-switch
          step *= 2
-
 ```
 
+In this version the need for an explicit MV is removed by instead
+leaving elements *in situ*.  The internal modifications to the predicate may,
+because the reduction is entirely deterministic, be "reconstructed"
+on a context-switch. This may make context-switching slower in some implementations.
+
 *Implementor's Note: many SIMD-based Parallel Reduction Algorithms are
 implemented in hardware with MVs that ensure lane-crossing is minimised.
 In SIMD ISAs the internal SIMD Architectural design is exposed and imposed on the programmer. Cray-style Vector ISAs on the other hand provide convenient,