From 875cfd9dc46f3961819c0a500c22231dcfeef1af Mon Sep 17 00:00:00 2001
From: lkcl
Date: Wed, 15 Sep 2021 14:50:32 +0100
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 37 ++++++++++++++++++--------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index 90fa26488..1a528809d 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -201,22 +201,27 @@ dest elwidth.
 
 # Reduce mode
 
-There are two variants here. The first is when the destination is scalar
-and at least one of the sources is Vector. The second is more complex
-and involves map-reduction on vectors.
-
-The first defining characteristic distinguishing Scalar-dest reduce mode
-from Vector reduce mode is that Scalar-dest reduce issues VL element
-operations, whereas Vector reduce mode performs an actual map-reduce
-(tree reduction): typically `O(VL log VL)` actual computations.
-
-The second defining characteristic of scalar-dest reduce mode is that it
-is, in simplistic and shallow terms *serial and sequential in nature*,
-whereas the Vector reduce mode is definitely inherently paralleliseable.
-
-The reason why scalar-dest reduce mode is "simplistically" serial and
-sequential is that in certain circumstances (such as an `OR` operation
-or a MIN/MAX operation) it may be possible to parallelise the reduction.
+Reduction in SVP64 is deterministic and somewhat of a misnomer. A normal
+Vector ISA would have explicit Reduce opcodes with defined characteristics
+per operation: in SX Aurora there is even an additional scalar argument
+containing the initial reduction value. SVP64 fundamentally has to
+utilise *existing* Scalar Power ISA v3.0B operations, which presents some
+unique challenges.
+
+The solution turns out to be to simply define reduction as permitting
+deterministic element-based schedules to be issued using the base Scalar
+operations, and to rely on the underlying microarchitecture to resolve
+Register Hazards at the element level. This goes back to
+the fundamental principle that SV is nothing more than a Sub-Program-Counter
+sitting between the Decode and Issue phases.
+
+Microarchitectures *may* take opportunities to parallelise the reduction
+but only if in doing so they preserve Program Order at the Element Level.
+Opportunities where this is possible include an `OR` operation
+or a MIN/MAX operation, where the result is the same regardless of
+the order of evaluation. For Floating Point, however, parallelisation
+is not permitted, because different results would be obtained if the
+reduction were not executed in strict sequential order.
 
 ## Scalar result reduce mode
 
-- 
2.30.2
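
As an illustration of the deterministic element-level schedule that the
new text describes, below is a minimal Python-style sketch of one plausible
schedule for a scalar-result reduce using integer add, assuming the
accumulator form in which the scalar destination is also a source operand.
The names `iregs`, `VL`, `RT` and `RA` are placeholders for illustration
only, not the specification's exact pseudocode:

    # illustrative sketch only: VL element operations are issued using the
    # base Scalar "add", all reading and writing the same scalar RT
    def scalar_result_reduce_add(iregs, RT, RA, VL):
        for i in range(VL):
            # each element operation creates a Read-after-Write hazard on
            # RT, which the microarchitecture resolves at the element
            # level, preserving Program Order
            iregs[RT] = iregs[RT] + iregs[RA + i]
        return iregs[RT]

With such a schedule, an associative and commutative operation (`OR`,
MIN/MAX) could legally be evaluated in any order, since the final value of
RT is identical; a Floating Point add could not, which is why strict
sequential execution is required in that case.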