From ee6ff936c36d0e5523b1ee430f81b076ca392824 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Mon, 21 Dec 2020 14:12:07 +0000
Subject: [PATCH]

---
 openpower/sv/svp_rewrite/svp64.mdwn | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/svp_rewrite/svp64.mdwn b/openpower/sv/svp_rewrite/svp64.mdwn
index 4b9022883..d4d512994 100644
--- a/openpower/sv/svp_rewrite/svp64.mdwn
+++ b/openpower/sv/svp_rewrite/svp64.mdwn
@@ -229,7 +229,7 @@ These are the modes:
 * **sat mode** or saturation: clamps each elemrnt result to a min/max rather than overflows / wraps.  allows signed and unsigned clamping. 
 * **reduce mode**. a mapreduce is performed.  the result is a scalar.  a result vector however is required, as the upper elements may be used to store intermediary computations.  the result of the mapreduce is in the first element with a nonzero predicate bit.  see separate section below.
   note that there are comprehensive caveats when using this mode.
-* **pred-result** will test the result (CR testing selects a bit of CR and inverts it, just like branch testing) and if the test fails it is as if the predicate bit was zero.  When Rc=1 the CR element (CR0) however is still stored in the CR regfile.  This scheme does not apply to crops (crand, cror).
+* **pred-result** will test the result (CR testing selects a bit of CR and inverts it, just like branch testing) and if the test fails it is as if the predicate bit was zero.  When Rc=1 the CR element however is still stored in the CR regfile, even if the test failed.  This scheme does not apply to crops (crand, cror).  See appendix for details.
 
 Note that ffirst and reduce modes are not anticipated to be high-performance in some implementations.  ffirst due to interactions with VL, and reduce due to it requiring additional operations to produce a result.  normal, saturate and pred-result are however independent and may easily be parallelised to give high performance, regardless of the value of VL.
 
@@ -653,6 +653,26 @@ One extremely important aspect of ffirst is:
   vectorised operations are effectively `nops` which is
   *precisely the desired and intended behaviour*.
 
+## pred-result mode
+
+This mode merges common CR testing with predication, saving on instruction count
+
+    for i in range(VL):
+        if predicate_masked_out(i):
+             continue
+        result = op(iregs[RA+i], iregs[RB+i])
+        CRnew = analyse(result) # calculates eq/lt/gt
+        # Rc=1 always stores the CR
+        if Rc=1:
+            crregs[offs+i] = CRnew
+        # now test CR, similar to branch
+        if CRnew[BO[0:1]] != BO[2]:
+            continue # test failed: cancel store
+        # result optionally stored but CR always is
+        iregs[RT+i] = result
+
+The reason for allowing the CR element to be stored is so that post-analysis
+of the CR Vector may be carried out.  For example: Saturation may have occurred (and been prevented from updating, by the test) but it is desirable to know *which* elements fail saturation.
 
 ## CR Operations
 
-- 
2.30.2