From 3a91dd8e37802ffe4f38c80c62f07cab5628040e Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 10 Apr 2022 16:33:29 +0100
Subject: [PATCH]

---
 openpower/sv/svp64/appendix.mdwn | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/openpower/sv/svp64/appendix.mdwn b/openpower/sv/svp64/appendix.mdwn
index fac669f0c..3067efe4f 100644
--- a/openpower/sv/svp64/appendix.mdwn
+++ b/openpower/sv/svp64/appendix.mdwn
@@ -16,29 +16,47 @@ Table of contents:
 
 Vector systems are expected to be high performance. This is achieved
 through parallelism, which requires that elements in the vector be
-independent. XER SO and other global "accumulation" flags (CR.OV) cause
+independent. XER SO/OV and other global "accumulation" flags (CR.SO) cause
 Read-Write Hazards on single-bit global resources, having a significant
 detrimental effect.
 
-Consequently in SV, XER.SO and CR.OV behaviour is disregarded (including
+Consequently in SV, XER.SO and OV behaviour is disregarded (including
 in `cmp` instructions). XER is simply neither read nor written.
 This includes when `scalar identity behaviour` occurs. If precise
 OpenPOWER v3.0/1 scalar behaviour is desired then OpenPOWER v3.0/1
 instructions should be used without an SV Prefix.
 
+Of note here is that XER.SO and OV may already be disregarded in the
+Power ISA v3.0/1 SFFS (Scalar Fixed and Floating) Compliancy Subset.
+SVP64 simply makes it mandatory to disregard even for other Subsets,
+but only for SVP64 Prefixed Operations.
+
 An interesting side-effect of this decision is that the OE flag is now
-free for other uses when SV Prefixing is used.
+free for other uses when SV Prefixing is used, and CR.SO may likewise
+used for other purposes (saturation for example).
 
 XER.CA/CA32 on the other hand is expected and required to be implemented
 according to standard Power ISA Scalar behaviour.
 Interestingly, due to SVP64 being in effect a hardware for-loop around Scalar instructions
 executing in precise Program Order, a little thought shows that a Vectorised
-Carry-In-Out add is in effect a Big Integer Add, taking a single bit CarryIn
+Carry-In-Out add is in effect a Big Integer Add, taking a single bit Carry In
 and producing, at the end, a single bit Carry out. High performance
 implementations may exploit this observation to deploy efficient
 Parallel Carry Lookahead.
 
-    sv.
+    # assume VL=4, this results in 4 sequential ops (below)
+    sv.adde r0.v, r4.v, r8.v
+
+    # instructions that get executed in backend hardware:
+    adde r0, r4, r8 # takes carry-in, produces carry-out
+    adde r1, r5, r9 # takes carry from previous
+    ...
+    adde r3, r7, r11 # likewise
+
+It can clearly be seen that the carry chains from one
+64 bit add to the next, the end result being that a
+256-bit "Big Integer Add" has been performed, and that
+CA contains the 257th bit.
 
 # v3.0B/v3.1 relevant instructions
 
-- 
2.30.2
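
The carry chaining that this patch describes for `sv.adde` with VL=4 can be modelled in a few lines of Python. This is only an illustrative sketch, not the project's simulator. The names `adde`, `sv_adde`, `to_limbs` and `from_limbs` are hypothetical, the register file is modelled as a plain list of 64-bit integers, and element 0 is assumed to be the least significant 64-bit limb, since the carry propagates from the first add to the next.

```python
MASK64 = (1 << 64) - 1  # registers are modelled as 64 bits wide


def adde(ra, rb, ca):
    """Model one scalar adde: return (64-bit sum, carry-out)."""
    total = ra + rb + ca
    return total & MASK64, total >> 64


def sv_adde(regs, rt, ra, rb, vl, ca=0):
    """Hypothetical model of sv.adde: VL scalar adde ops in program order.

    Each element takes the carry-out of the previous element as its
    carry-in. The final carry-out is returned.
    """
    for i in range(vl):
        regs[rt + i], ca = adde(regs[ra + i], regs[rb + i], ca)
    return ca


def to_limbs(value, n):
    """Split an integer into n 64-bit limbs, least significant first."""
    return [(value >> (64 * i)) & MASK64 for i in range(n)]


def from_limbs(limbs):
    """Join 64-bit limbs (least significant first) into one integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))
```

Loading two 256-bit values into `regs[4:8]` and `regs[8:12]` and calling `sv_adde(regs, 0, 4, 8, 4)` leaves the low 256 bits of the sum in `regs[0:4]`, and the returned carry is bit 257 of the result, matching the description in the patch.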