From: lkcl <lkcl@web>
Date: Mon, 11 Dec 2023 02:43:20 +0000 (+0000)
Subject: (no commit message)
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=ea249820c03;p=libreriscv.git

---

diff --git a/openpower/sv/svp64_quirks.mdwn b/openpower/sv/svp64_quirks.mdwn
index b87bf0da4..fa4a32a31 100644
--- a/openpower/sv/svp64_quirks.mdwn
+++ b/openpower/sv/svp64_quirks.mdwn
@@ -484,7 +484,7 @@ effect a change of VL:
 
 ```
 for i in range(VL):
-    result = element_operation(GPR(RA+i), GPR(RB+i))
+    GPR(RT+i) = result = operation(GPR(RA+i), GPR(RB+i))
     if test(result):
         VL = i
         break
@@ -501,6 +501,37 @@ beyond the Vector Truncation point.  In-order systems will have a slightly
 harder time and may choose to execute one element only at a time, reducing
 performance as a result.
 
+# Data-Dependent Fail-First implicit mapreduce mode
+
+This mode is best illustrated first with pseudocode, which should be
+compared with the above. It is crucial to note that both
+RT and RA are scalar and only RB is Vector, yet, just as with
+mapreduce mode, looping *continues*.
+
+```
+for i in range(VL):
+    GPR(RT) = result = operation(GPR(RA), GPR(RB+i))
+    if test(result):
+        VL = i
+        break
+```
+
+The "normal" rule for SV Looping is that looping
+terminates at the first scalar result (if destination is
+set to scalar). This rule is *disabled* for mapreduce mode,
+allowing a scalar to be used as an "accumulator" by
+setting the result (RT, FRT, BF) to be the exact same
+register as one of the sources.
+
+It turned out to be extremely useful to have *conditional*
+termination of such "mapreducing" style accumulation,
+for example to terminate and truncate dotproduct
+accumulation should the arithmetic accumulator overflow,
+or, in the [[openpower/sv/cookbook/fortran_maxloc]]
+example, to terminate the parallel max-search at the
+first instance where the element currently tested is
+no longer greater than that previously found.
+
 # OE=1
 
 The hardware cost of Sticky Overflow in a parallel environment is immense.