From bff6bf4de1b8c84d52bc082c8c7c5290c30f851a Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Wed, 27 Apr 2022 10:22:21 +0100
Subject: [PATCH]

---
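Note on the shift paragraph added below: a minimal Python sketch of the
idea, not SVP64 code. It assumes 64-bit elements stored least-significant
first. The names `bigint_shl`, `to_elements` and `from_elements` are
hypothetical helpers introduced only for illustration. The whole-element
part of the shift is handled purely by indexing from a different starting
element (standing in for a register move or an alternate starting
register), and only the remainder needs an actual bit shift.

```python
ELWIDTH = 64                      # assumed element bitwidth
MASK = (1 << ELWIDTH) - 1


def to_elements(value, count):
    """Split an integer into `count` elements, least significant first."""
    return [(value >> (ELWIDTH * i)) & MASK for i in range(count)]


def from_elements(elements):
    """Recombine little-endian elements into a single integer."""
    return sum(e << (ELWIDTH * i) for i, e in enumerate(elements))


def bigint_shl(elements, shift):
    """Shift left by `shift` bits, keeping the same number of elements.

    Bits shifted out of the top element are dropped.
    """
    # whole: number of complete elements to move by (the "alternate
    # starting point"); part: remaining shift, less than ELWIDTH.
    whole, part = divmod(shift, ELWIDTH)
    n = len(elements)
    result = [0] * n
    for i in range(whole, n):
        src = i - whole
        below = elements[src - 1] if src > 0 else 0
        # Combine this element's shifted bits with the bits carried up
        # from the element below it.
        result[i] = ((elements[src] << part) | (below >> (ELWIDTH - part))) & MASK
    return result
```

With `part` equal to zero the loop degenerates to a plain element copy,
which is the case where no shift instruction is needed at all.
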
 openpower/sv/biginteger/analysis.mdwn | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
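
Note on the multiply paragraph touched below: a small Python sketch of
what "both HI and LO halves" means for a 64-bit multiply, and why the
multiply-and-accumulate form is 3-in 2-out. The function names
`mul_hi_lo` and `madd_hi_lo` are hypothetical and are not Power ISA or
x86 mnemonics. The sketch only models the arithmetic, under the
assumption of unsigned 64-bit operands.

```python
MASK64 = (1 << 64) - 1


def mul_hi_lo(a, b):
    """Unsigned 64x64 multiply: 2 inputs, 2 outputs (HI, LO)."""
    product = a * b
    return product >> 64, product & MASK64


def madd_hi_lo(a, b, c):
    """Unsigned 64x64 multiply plus 64-bit addend: 3 inputs, 2 outputs.

    The sum always fits in 128 bits, because
    (2**64 - 1)**2 + (2**64 - 1) is less than 2**128.
    """
    total = a * b + c
    return total >> 64, total & MASK64
```

Even with all three inputs at their maximum value the result does not
overflow the HI:LO pair, so no additional carry output is required.
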

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index 703f51235..b731f2bee 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -129,6 +129,12 @@ With Scalar shift and rotate operations in the Power ISA already being
 complex and very comprehensive, it is hard to justify creating complex
 3-in 2-out variants when a sequence of 3 simple instructions will suffice.
 
+For shift amounts larger than an element bitwidth, standard register move
+operations may be used. Alternatively, if the shift amount is static,
+the instruction may simply reference an alternate starting point in
+the registers containing the Vector elements,
+because SVP64 sits on top of a standard Scalar register file.
+
 # Vector Multiply
 
 Long-multiply, assuming an O(N^2) algorithm, is performed by summing
@@ -149,12 +155,13 @@ the results of the multiply must be explicitly extracted. High
 performance RISC advocates
 recommend "macro-op fusion" which is in effect where the second instruction
 gains access to the cached copy of the HI half of the
-multiply redult, which had already been
+multiply result, which had already been
 computed by the first. This approach quickly complicates the internal
 microarchitecture, especially at the decode phase.
 
 Instead, Intel, in 2012, specifically added a `mulx` instruction, allowing
-both HI and LO halves of the multiply to reach registers.  If done as a
+both HI and LO halves of the multiply to reach registers with a single
+instruction.  If, however, this is done as a
 multiply-and-accumulate this becomes quite an expensive operation:
 (3 64-Bit in, 2 64-bit registers out).
 
-- 
2.30.2