From 834f6148570fc806851b2d538718808c5c2405e8 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Tue, 19 Apr 2022 14:48:53 +0100
Subject: [PATCH]

---
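Note (not part of the commit): the row-at-a-time scheme added below,
with C left out of the multiply and added afterwards, can be sketched
in Python roughly as follows. This is only an illustrative sketch,
assuming 64-bit digits stored least-significant first. The names
`mul_row`, `add_vec` and `to_int` are hypothetical helpers and do not
correspond to any existing instruction or library.

```python
MASK = (1 << 64) - 1  # assumed digit width: 64 bits


def mul_row(a, b0):
    """Multiply digit vector a (least-significant first) by one digit b0.

    Each step is 3-in (a[i], b0, HI half of the previous product)
    and 2-out (result digit, new HI half).
    """
    hi = 0
    row = []
    for ai in a:
        # (2^64-1)^2 + (2^64-1) < 2^128, so prod fits in two digits
        prod = ai * b0 + hi
        row.append(prod & MASK)  # LO half plus incoming HI
        hi = prod >> 64          # HI half, fed into the next digit
    row.append(hi)               # top digit of the row
    return row


def add_vec(c, row):
    """Add two equal-length digit vectors, propagating the carry.

    Returns (digits, carry_out); this plays the role of the later add.
    """
    carry = 0
    out = []
    for ci, ri in zip(c, row):
        s = ci + ri + carry
        out.append(s & MASK)
        carry = s >> 64
    return out, carry


def to_int(digits):
    """Convert a least-significant-first digit vector to a Python int."""
    return sum(d << (64 * i) for i, d in enumerate(digits))
```

With a four-digit A and a five-digit C, `add_vec(c, mul_row(a, b0))`
yields R0..R4 plus a possible carry out of R4. Splitting the work this
way keeps the multiply step at three inputs, at the cost of a second
pass over the row.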
 openpower/sv/biginteger.mdwn | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/biginteger.mdwn b/openpower/sv/biginteger.mdwn
index 4edd00d58..32773ef66 100644
--- a/openpower/sv/biginteger.mdwn
+++ b/openpower/sv/biginteger.mdwn
@@ -18,7 +18,9 @@ Dynamic SIMD ALUs for maximum performance and effectiveness.
 
 # Analysis
 
-This section covers an analysis of big integer operations
+This section covers an analysis of big integer operations.  Breaking
+them down into smaller sub-operations is a given: worst-case, for N-word
+operands, addition is O(N) whilst multiply and divide are O(N^2).
 
 ## Add and Subtract
 
@@ -52,3 +54,23 @@ Instead, Intel, in 2012, specifically added a `mulx` instruction, allowing
 both HI and LO halves of the multiply to reach registers.  If done as a
 multiply-and-accumulate this becomes quite an expensive operation:
 3 64-Bit in, 2 64-bit registers out).
+
+Long-multiplication may be performed a row at a time, starting
+with B0:
+
+    C4 C3 C2 C1 C0
+             A0xB0
+          A1xB0
+       A2xB0
+    A3xB0
+    R4 R3 R2 R1 R0
+
+* R0 contains C0 plus the LO half of A0 times B0
+* R1 contains C1 plus the LO half of A1 times B0
+  plus the HI half of A0 times B0 (plus any carry out of R0).
+
+On the face of it this would be a 4-in operation:
+the upper half of a previous multiply, two new operands
+to multiply, and an additional accumulator (C). However, if
+C is left out (and added afterwards with a Vector-Add),
+things become more manageable.
-- 
2.30.2