```
for i in range(VL):
-    result = element_operation(GPR(RA+i), GPR(RB+i))
+    GPR(RT+i) = result = operation(GPR(RA+i), GPR(RB+i))
    if test(result):
        VL = i
        break
```
harder time and may choose to execute one element only at a time, reducing
performance as a result.
+# Data-Dependent Fail-First implicit mapreduce mode
+
+This mode is best illustrated with pseudocode, which
+should be compared with the above. It is crucial to note
+that both RT and RA are scalar: only RB is a Vector, yet,
+just as with mapreduce mode, looping *continues*.
+
+```
+for i in range(VL):
+    GPR(RT) = result = operation(GPR(RA), GPR(RB+i))
+    if test(result):
+        VL = i
+        break
+```
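+
+As a rough illustration only, here is a minimal Python
+sketch of the loop above. The name `ddff_mapreduce`, the
+`regs` list standing in for the GPRs, and the
+`operation`/`test` callbacks are invented for this
+example and are not part of the specification:
+
+```
+# rt and ra are scalar register numbers, rb is the start
+# of the vector operand; test() decides early termination
+def ddff_mapreduce(regs, rt, ra, rb, VL, operation, test):
+    for i in range(VL):
+        result = operation(regs[ra], regs[rb + i])
+        regs[rt] = result   # scalar RT overwritten each step
+        if test(result):
+            VL = i          # truncate VL at the failing element
+            break
+    return VL
+```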
+
+The "normal" rule for SV Looping is that looping
+terminates at the first scalar result (if destination is
+set to scalar). This rule is *disabled* for mapreduce mode,
+allowing a scalar to be used as an "accumulator" by
+setting the result (RT, FRT, BF) to be the exact same
+register as one of the sources.
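+
+Reusing the hypothetical `ddff_mapreduce` sketch above,
+setting `rt` equal to `ra` is what turns the loop into an
+accumulator: each iteration reads back the result written
+by the previous one. A small invented example, with a test
+that never fires:
+
+```
+regs = [0] * 32
+regs[3] = 0                  # r3 is both RT and RA: the accumulator
+regs[8:12] = [5, 7, 11, 13]  # vector source starting at RB=r8
+VL = ddff_mapreduce(regs, rt=3, ra=3, rb=8, VL=4,
+                    operation=lambda a, b: a + b,
+                    test=lambda r: False)
+assert regs[3] == 36 and VL == 4
+```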
+
+It turned out to be extremely useful to have *conditional*
+termination of such "mapreduce"-style accumulation: for
+example, to terminate and truncate dot-product
+accumulation should the arithmetic accumulator overflow.
+Or, in the [[openpower/sv/cookbook/fortran_maxloc]]
+example, to terminate the parallel max-search at the
+first instance where the element currently being tested
+is no longer greater than the maximum previously found.
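+
+The same hypothetical sketch can model such conditional
+termination: below, a running sum is truncated as soon as
+the accumulator no longer fits in a signed byte, standing
+in for the "accumulator overflow" case. Note that, exactly
+as in the pseudocode above, the failing result is still
+written to the scalar destination before VL is truncated.
+
+```
+regs = [0] * 32
+regs[3] = 0                      # accumulator, RT == RA == r3
+regs[8:12] = [100, 20, 50, 60]   # vector source at RB=r8
+VL = ddff_mapreduce(regs, rt=3, ra=3, rb=8, VL=4,
+                    operation=lambda a, b: a + b,
+                    test=lambda r: r > 127)  # "overflow" test
+# the third element takes the sum to 170 > 127, so VL = 2
+assert VL == 2 and regs[3] == 170
+```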
+
# OE=1
The hardware cost of Sticky Overflow in a parallel environment is immense.