From 8ec39aa19ae008e04de6f9abccf2627a42c63b2f Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Sun, 24 Apr 2022 22:00:16 +0100
Subject: [PATCH]

---
 openpower/sv/biginteger/analysis.mdwn | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index c77a7a859..1f422337c 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -369,12 +369,24 @@ In this way a Scalar Integer divide can be performed in the same
 time-order as Newton-Raphson, using two hardware multipliers
 and a subtract.
 
+There is, however, another reason for having a 128/64 division
+instruction: it is effectively the reverse of `madded`.
+Look closely at Algorithm D when the divisor is only a scalar
+(`v[0]`):
+
 ```
         k = 0; // the case of a
         for (j = m - 1; j >= 0; j--)
         {                                 // single-digit
-            uint64_t dig2 = (k * b + u[j]);
+            uint64_t dig2 = ((k << 32) | u[j]);
             q[j] = dig2 / v[0]; // divisor here.
-            k = dig2 % v[0]; // modulo bak into next loop
+            k = dig2 % v[0]; // modulo back into next loop
         }
 ```
+
+Here, just as `madded` can feed the hi-half of the 128-bit product
+back in as a form of 64-bit carry, dividing a vector dividend by a scalar
+divisor feeds the remainder back in as the hi-half of a 128/64-bit divide.
+By a nice coincidence this is exactly the same 128/64-bit operation
+needed for the `qhat` estimate, provided that it produces both the
+quotient and the remainder.
-- 
2.30.2