From 6578de46990c3df8f6f911bdb2b794cbcec7e362 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Fri, 18 Jun 2021 23:39:39 +0100
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index 4d921a2f1..86740e628 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -218,7 +218,7 @@ or a MIN/MAX operation) it may be possible to parallelise the reduction.
 ## Scalar result reduce mode
 
 In this mode, which is suited to operations involving carry or overflow,
-one register is identified as being the "accumulator".
+one register must be identified by the programmer as being the "accumulator".
 Scalar reduction is thus categorised by:
 
 * One of the sources is a Vector
@@ -226,7 +226,7 @@ Scalar reduction is thus categorised by:
 * optionally but most usefully when one source register is also the destination
 * That the source register type is the same as the destination register
   type identified as the "accumulator".  scalar reduction on `cmp`,
-  `setb` or `isel` is not possible for example because of the mixture
+  `setb` or `isel` makes no sense, for example, because of the mixture
   between CRs and GPRs.
 
 Typical applications include simple operations such as `ADD r3, r10.v,
@@ -242,6 +242,12 @@ However, *unless* the operation is marked as "mapreduce", SV ordinarily
 operation as "mapreduce" will it continue to issue multiple sub-looped
 (element) instructions in `Program Order`.
 
+To perform the loop in reverse order, the ```RG``` (reverse gear) bit must be set.  This is useful for leaving a cumulative suffix sum in reverse order:
+
+    for i in (VL-1 downto 0):
+        # RT-1 = RA gives a suffix sum
+        iregs[RT+i] = iregs[RA+i] - iregs[RB+i]
+
 Other examples include shift-mask operations where a Vector of inserts
 into a single destination register is required, as a way to construct
 a value quickly from multiple arbitrary bit-ranges and bit-offsets.
@@ -270,8 +276,8 @@ Reduce Mode.
 If an interrupt or exception occurs in the middle of the scalar mapreduce,
 the scalar destination register **MUST** be updated with the current
 (intermediate) result, because this is how ```Program Order``` is
-preserved (Vector Loops are to be considered to be just another instruction
-being executed in Program Order).  In this way, after return from interrupt,
+preserved (Vector Loops are to be considered just another way of issuing instructions
+in Program Order).  In this way, after return from interrupt,
 the scalar mapreduce may continue where it left off.  This provides
 "precise" exception behaviour.
 
-- 
2.30.2