From 3a91dd8e37802ffe4f38c80c62f07cab5628040e Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Sun, 10 Apr 2022 16:33:29 +0100
Subject: [PATCH]

---
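Reviewer note (not part of the commit): the Big Integer Add observation
in the hunk below can be checked with a small model. This is a minimal
sketch, not the SVP64 reference implementation. The helper names
(`adde`, `sv_adde`, `to_limbs`, `from_limbs`) and the flat register list
are hypothetical, and it assumes element 0 holds the least significant
64 bits.

```python
MASK64 = (1 << 64) - 1

def adde(ra, rb, ca):
    """Model one scalar 64-bit add-extended: returns (result, carry_out)."""
    total = ra + rb + ca
    return total & MASK64, total >> 64

def sv_adde(regs, rt, ra, rb, vl, ca=0):
    """Hypothetical model of a vectorised adde: a for-loop of scalar
    adde operations in program order, chaining CA between elements."""
    for i in range(vl):
        regs[rt + i], ca = adde(regs[ra + i], regs[rb + i], ca)
    return ca  # carry-out of the final element

def to_limbs(value, n):
    """Split an integer into n 64-bit limbs, least significant first."""
    return [(value >> (64 * i)) & MASK64 for i in range(n)]

def from_limbs(limbs):
    """Reassemble an integer from 64-bit limbs, least significant first."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

# VL=4 with operands in r4-r7 and r8-r11, result in r0-r3 (as in the patch)
a, b = (1 << 256) - 1, 1
regs = [0] * 12
regs[4:8] = to_limbs(a, 4)
regs[8:12] = to_limbs(b, 4)
carry = sv_adde(regs, rt=0, ra=4, rb=8, vl=4)
assert from_limbs(regs[0:4]) == (a + b) & ((1 << 256) - 1)
assert carry == (a + b) >> 256  # the 257th bit
```

With all-ones plus one, every element overflows in turn, so the result
registers are zero and the final carry is set.
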
 openpower/sv/svp64/appendix.mdwn | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index fac669f0c..3067efe4f 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -16,29 +16,47 @@ Table of contents:
 
 Vector systems are expected to be high performance.  This is achieved
 through parallelism, which requires that elements in the vector be
-independent.  XER SO and other global "accumulation" flags (CR.OV) cause
+independent.  XER.SO/OV and other global "accumulation" flags (CR.SO) cause
 Read-Write Hazards on single-bit global resources, having a significant
 detrimental effect.
 
-Consequently in SV, XER.SO and CR.OV behaviour is disregarded (including
+Consequently in SV, XER.SO and OV behaviour is disregarded (including
 in `cmp` instructions).  XER is simply neither read nor written.
 This includes when `scalar identity behaviour` occurs.  If precise
 OpenPOWER v3.0/1 scalar behaviour is desired then OpenPOWER v3.0/1
 instructions should be used without an SV Prefix.
 
+Of note here is that XER.SO and OV may already be disregarded in the
+Power ISA v3.0/1 SFFS (Scalar Fixed and Floating) Compliancy Subset.
+SVP64 simply makes it mandatory to disregard them even for other Subsets,
+but only for SVP64 Prefixed Operations.
+
 An interesting side-effect of this decision is that the OE flag is now
-free for other uses when SV Prefixing is used.
+free for other uses when SV Prefixing is used, and CR.SO may likewise be
+used for other purposes (saturation for example).
 
 XER.CA/CA32 on the other hand is expected and required to be implemented
 according to standard Power ISA Scalar behaviour.  Interestingly, due
 to SVP64 being in effect a hardware for-loop around Scalar instructions
 executing in precise Program Order, a little thought shows that a Vectorised
-Carry-In-Out add is in effect a Big Integer Add, taking a single bit CarryIn
+Carry-In-Out add is in effect a Big Integer Add, taking a single bit Carry In
 and producing, at the end, a single bit Carry out.  High performance
 implementations may exploit this observation to deploy efficient
 Parallel Carry Lookahead.
 
-    sv.
+    # assume VL=4, this results in 4 sequential ops (below)
+    sv.adde r0.v, r4.v, r8.v
+
+    # instructions that get executed in backend hardware:
+    adde r0, r4, r8 # takes carry-in, produces carry-out
+    adde r1, r5, r9 # takes carry from previous
+    ...
+    adde r3, r7, r11 # likewise
+
+It can clearly be seen that the carry chains from one
+64-bit add to the next, the end result being that a
+256-bit "Big Integer Add" has been performed, and that
+CA contains the 257th bit.
 
 # v3.0B/v3.1 relevant instructions
 
-- 
2.30.2