From: lkcl <lkcl@web>
Date: Thu, 21 Apr 2022 09:27:39 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2655
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=4b032e88978b47d7af96aebea9bd5958ed19fb53;p=libreriscv.git

---

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index 371e81926..52174220f 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -188,6 +188,43 @@ afterwards.  Essentially there are three phases:
 * Carry-Correction with a big integer add, if the estimate from
   phase 1 was wrong by one digit.
 
+From Knuth's Algorithm D, implemented in
+[mulmnu.c](https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/bitmanip/mulmnu.c;hb=HEAD),
+Phase 2 is expressed in C as:
+
+```
+      // Multiply and subtract.
+      k = 0;
+      for (i = 0; i < n; i++) {
+         p = qhat*vn[i]; // 64-bit product
+         t = un[i+j] - k - (p & 0xFFFFFFFFLL);
+         un[i+j] = t;
+         k = (p >> 32) - (t >> 32);
+      }
+```
+
+Analysis of this algorithm shows that, if a temporary vector is acceptable,
+it can be split into two loops in exactly the same way as Algorithm M,
+this time using subtract instead of add.
+
+```
+      // this becomes the basis for sv.msubed in RS=RC Mode,
+      // where k is RC
+      k = 0;
+      for (i = 0; i < m; i++) {
+         unsigned product = k - u[i]*v[j];
+         k = product>>16;
+         plo[i] = product; // & 0xffff
+      }
+      // this is simply sv.subfe where k is XER.CA
+      k = 1; // CA=1 on entry means "no borrow"
+      for (i = 0; i < m; i++) {
+         t = w[i + j] + k + (~plo[i] & 0xffff); // subfe: ~RA + RB + CA
+         w[i + j] = t;          // (i.e., t & 0xFFFF).
+         k = t >> 16; // carry out, 1 bit: 1 = no borrow, 0 = borrow
+      }
+```
+
 In essence then the primary focus of Vectorised Big-Int divide is in
 fact big-integer multiply (more specifically, mul-and-subtract).
 
@@ -195,3 +232,7 @@ fact big-integer multiply (more specifically, mul-and-subtract).
     RT = lowerhalf(product)
     RS = upperhalf(product)
 
+Whether the fixup (phase 3) is needed is determined by the Carry (borrow)
+bit at the end. Logically: if a borrow was required then the qhat estimate
+was too large and the correction is required, which is nothing more than
+a Vectorised big-integer add (one instruction).
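
The two-pass split and the borrow-driven fixup described in this patch can be checked with a small scalar model. The following is a minimal sketch under stated assumptions, not the sv.* instruction semantics. It uses 16-bit digits (as in the second sketch above) and a plain multiply-with-carry first pass instead of the `k - u[i]*v[j]` form. The names `N`, `mul_pass`, `sub_pass`, `add_back`, `mul_sub_fixup` and `check_example` are hypothetical, and `un` is assumed to already point at digit `j` of the dividend.

```c
#include <assert.h>
#include <stdint.h>

enum { N = 3 };  /* hypothetical divisor length, in 16-bit digits */

/* Pass 1 (multiply): plo[0..N] = qhat * vn[0..N-1]; k chains the upper half. */
static void mul_pass(uint16_t plo[N + 1], const uint16_t vn[N], uint16_t qhat)
{
    uint32_t k = 0;
    for (int i = 0; i < N; i++) {
        uint32_t product = (uint32_t)qhat * vn[i] + k; /* fits in 32 bits */
        plo[i] = (uint16_t)product;
        k = product >> 16;
    }
    plo[N] = (uint16_t)k;
}

/* Pass 2 (subtract): un -= plo, computed per digit as ~plo + un + CA.
 * Returns the final carry: 1 = no borrow, 0 = borrow. */
static unsigned sub_pass(uint16_t un[N + 1], const uint16_t plo[N + 1])
{
    uint32_t ca = 1; /* carry-in of 1 means "no borrow" */
    for (int i = 0; i <= N; i++) {
        uint32_t t = (uint32_t)un[i] + (uint16_t)~plo[i] + ca;
        un[i] = (uint16_t)t;
        ca = t >> 16;
    }
    return ca;
}

/* Phase 3 (fixup): add the divisor back; the carry out of the top digit
 * is deliberately dropped, cancelling the earlier borrow. */
static void add_back(uint16_t un[N + 1], const uint16_t vn[N])
{
    uint32_t ca = 0;
    for (int i = 0; i < N; i++) {
        uint32_t t = (uint32_t)un[i] + vn[i] + ca;
        un[i] = (uint16_t)t;
        ca = t >> 16;
    }
    un[N] = (uint16_t)(un[N] + ca);
}

/* Phases 2 and 3 together. Returns 1 if a borrow occurred (qhat was too
 * large); in that case *qhat is decremented and un is corrected. */
static int mul_sub_fixup(uint16_t un[N + 1], const uint16_t vn[N],
                         uint16_t *qhat)
{
    uint16_t plo[N + 1];
    mul_pass(plo, vn, *qhat);
    int borrow = !sub_pass(un, plo);
    if (borrow) {
        (*qhat)--;
        add_back(un, vn);
    }
    return borrow;
}

/* Example: U = 5*V + 7, digits stored least-significant first. */
static void check_example(void)
{
    const uint16_t vn[N] = { 0x0003, 0x0002, 0x8001 };
    uint16_t un[N + 1] = { 0x0016, 0x000A, 0x8005, 0x0002 };
    uint16_t qhat = 6; /* deliberately one too large */

    assert(mul_sub_fixup(un, vn, &qhat) == 1); /* borrow detected */
    assert(qhat == 5);
    assert(un[0] == 7 && un[1] == 0 && un[2] == 0 && un[3] == 0);
}
```

Calling `check_example()` from a `main` exercises the case where the estimate is one too large. Running the same digits with `qhat = 5` takes the no-borrow path and leaves the same remainder of 7 without the add-back.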