* Carry-Correction with a big integer add, if the estimate from
phase 1 was wrong by one digit.
+In Knuth's Algorithm D, as implemented in
+[mulmnu.c](https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/bitmanip/mulmnu.c;hb=HEAD),
+Phase 2 is expressed in C as:
+
+```
+    // Multiply and subtract.
+    k = 0;
+    for (i = 0; i < n; i++) {
+        p = qhat*vn[i];                      // 64-bit product
+        t = un[i+j] - k - (p & 0xFFFFFFFFLL);
+        un[i+j] = t;
+        k = (p >> 32) - (t >> 32);
+    }
+```
+
+Analysis of this algorithm shows that, if a temporary vector is acceptable,
+the loop can be split into two in exactly the same way as Algorithm M,
+this time using subtract instead of add.
+
+```
+    // this becomes the basis for sv.msubed in RS=RC Mode,
+    // where k is RC
+    k = 0;
+    for (i = 0; i < m; i++) {
+        unsigned product = k - u[i]*v[j];
+        k = product>>16;
+        plo[i] = product; // & 0xffff
+    }
+    // this is simply sv.subfe where k is XER.CA
+    k = 1; // borrow not carry
+    for (i = 0; i < m; i++) {
+        t = w[i + j] + k - plo[i];
+        w[i + j] = t; // (I.e., t & 0xFFFF).
+        k = t >> 16;  // borrow: should only be 1 bit
+    }
+```
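
For comparison, the same two-pass idea can be sketched in portable C with 32-bit digits, matching the digit width of the Knuth loop above. This is only an illustration under stated assumptions, not the semantics of `sv.msubed` or `sv.subfe`: the function name `mul_sub_split`, the temporary `prod` and the bound `MAX_DIGITS` are hypothetical, and the multiply pass is written as a plain multiply-and-carry rather than the subtract form used above.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DIGITS 64 /* hypothetical upper bound on n, sizes the temporary */

/* Hypothetical helper: un[j..j+n] -= qhat * vn[0..n-1], done in two passes.
 * Returns the final borrow: 1 if the result went negative (qhat too large). */
static unsigned mul_sub_split(uint32_t *un, const uint32_t *vn,
                              uint32_t qhat, int n, int j)
{
    uint32_t prod[MAX_DIGITS + 1]; /* temporary vector holding qhat * vn */
    uint32_t k = 0;                /* upper half carried to the next digit */
    unsigned borrow = 0;

    assert(n > 0 && n <= MAX_DIGITS);

    /* Pass 1: big-integer multiply by a single digit. */
    for (int i = 0; i < n; i++) {
        uint64_t p = (uint64_t)qhat * vn[i] + k; /* cannot overflow 64 bits */
        prod[i] = (uint32_t)p;
        k = (uint32_t)(p >> 32);
    }
    prod[n] = k;

    /* Pass 2: big-integer subtract with borrow propagation. */
    for (int i = 0; i <= n; i++) {
        uint64_t t = (uint64_t)un[i + j] - prod[i] - borrow;
        un[i + j] = (uint32_t)t;
        borrow = (unsigned)(t >> 63); /* top bit set means it wrapped below zero */
    }
    return borrow;
}
```

The returned borrow is the value that a caller would inspect to decide whether a correction step is needed.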
+
In essence then the primary focus of Vectorised Big-Int divide is in
fact big-integer multiply (more specifically, mul-and-subtract).
RT = lowerhalf(product)
RS = upperhalf(product)
+Whether the fixup (phase 3) is needed is determined by the Carry (borrow)
+bit at the end of phase 2. Logically: if a borrow was required then the qhat
+estimate was too large and the correction is required, which is nothing more
+than a Vectorised big-integer add (one instruction).
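
As a scalar reference for that correction, the add-back step of Knuth's Algorithm D can be sketched in C as below. It assumes 32-bit digits and that the caller supplies the final borrow from phase 2; the helper name `fixup_if_borrow` and its parameter layout are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: phase 3 fixup. If phase 2 ended with a borrow,
 * the quotient digit estimate was one too large: decrement it and add
 * the divisor vn[0..n-1] back into un[j..j+n]. */
static void fixup_if_borrow(uint32_t *un, const uint32_t *vn,
                            uint32_t *qdigit, int n, int j, unsigned borrow)
{
    uint32_t carry = 0;

    assert(n > 0);
    if (!borrow)
        return; /* estimate was correct, nothing to undo */

    *qdigit -= 1;
    for (int i = 0; i < n; i++) {
        uint64_t t = (uint64_t)un[i + j] + vn[i] + carry;
        un[i + j] = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }
    un[j + n] += carry; /* the final carry cancels the earlier borrow */
}
```

For example, with `un = {0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF}` (the result of subtracting 2*3 from 5), `vn = {3, 0}` and a quotient digit of 2, the fixup leaves `un = {2, 0, 0}` and a quotient digit of 1, which is 5 - 1*3.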