From: lkcl
Date: Thu, 21 Apr 2022 09:27:39 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2655
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=4b032e88978b47d7af96aebea9bd5958ed19fb53;p=libreriscv.git

---

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index 371e81926..52174220f 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -188,6 +188,43 @@ afterwards. Essentially there are three phases:
 * Carry-Correction with a big integer add, if the estimate from
   phase 1 was wrong by one digit.
 
+From Knuth's Algorithm D, implemented in
+[mulmnu.c](https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/bitmanip/mulmnu.c;hb=HEAD),
+Phase 2 is expressed in C as:
+
+```
+    // Multiply and subtract.
+    k = 0;
+    for (i = 0; i < n; i++) {
+        p = qhat*vn[i]; // 64-bit product
+        t = un[i+j] - k - (p & 0xFFFFFFFFLL);
+        un[i+j] = t;
+        k = (p >> 32) - (t >> 32);
+    }
+```
+
+Analysis of this algorithm shows that, if a temporary vector is acceptable,
+it can be split into two in exactly the same way as Algorithm M,
+this time using subtract instead of add.
+
+```
+    // this becomes the basis for sv.msubed in RS=RC Mode,
+    // where k is RC
+    k = 0;
+    for (i = 0; i < m; i++) {
+        unsigned product = k - u[i]*v[j];
+        k = product>>16;
+        plo[i] = product; // & 0xffff
+    }
+    // this is simply sv.subfe where k is XER.CA
+    k = 1; // borrow not carry
+    for (i = 0; i < m; i++) {
+        t = w[i + j] + k - plo[i];
+        w[i + j] = t; // (I.e., t & 0xFFFF).
+        k = t >> 16; // borrow: should only be 1 bit
+    }
+```
+
 In essence then the primary focus of Vectorised Big-Int divide is in
 fact big-integer multiply (more specifically, mul-and-subtract).
 
@@ -195,3 +232,7 @@ fact big-integer multiply (more specifically, mul-and-subtract).
     RT = lowerhalf(product)
     RS = upperhalf(product)
 
+Detection of the fixup (phase 3) is determined by the Carry (borrow)
+bit at the end. Logically: if a borrow was required, the qhat estimate
+was too large and the correction is required, which is nothing more than
+a Vectorised big-integer add (one instruction).
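The three phases touched by this patch can be checked against each other with a small C sketch. It assumes 32-bit digits stored least-significant first (as in mulmnu.c). `mulsub_fused` mirrors the single-loop excerpt. `mulsub_split` performs the two-step version: a scalar-by-vector multiply-with-carry into a temporary `plo`, then a big-integer subtract whose carry follows the Power `subfe` convention (carry 1 means "no borrow"). `add_back` is the phase-3 correction. All three function names are hypothetical. The multiply step is written as an ordinary multiply-with-carry rather than the negative-accumulating form sketched above for `sv.msubed`. This is an illustration of the decomposition, not the Libre-SOC implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Phase 2 as one loop, after the mulmnu.c excerpt:
 * un[j..j+n] -= qhat * vn[0..n-1].  Returns 1 if the result went
 * negative (qhat was too large).  Assumes >> on a negative int64_t is
 * an arithmetic shift, as gcc and clang implement it (the C standard
 * leaves this implementation-defined). */
static int mulsub_fused(uint32_t *un, const uint32_t *vn, int n, int j,
                        uint32_t qhat)
{
    int64_t k = 0, t;
    for (int i = 0; i < n; i++) {
        uint64_t p = (uint64_t)qhat * vn[i];          /* 64-bit product */
        t = (int64_t)un[i + j] - k - (int64_t)(p & 0xFFFFFFFFu);
        un[i + j] = (uint32_t)t;                      /* low 32 bits */
        k = (int64_t)(p >> 32) - (t >> 32);           /* carry minus borrow */
    }
    t = (int64_t)un[j + n] - k;
    un[j + n] = (uint32_t)t;
    return t < 0;
}

/* The same operation split in two (hypothetical helper).
 * Step 1: plo[0..n] = qhat * vn[0..n-1], a multiply-with-carry chain.
 * Step 2: un[j..j+n] -= plo[0..n] with a subfe-style carry chain. */
static int mulsub_split(uint32_t *un, const uint32_t *vn, int n, int j,
                        uint32_t qhat, uint32_t *plo /* n + 1 digits */)
{
    uint32_t k = 0;                                   /* running upper half */
    for (int i = 0; i < n; i++) {
        /* at most (2^32-1)^2 + (2^32-1), so this cannot overflow */
        uint64_t product = (uint64_t)qhat * vn[i] + k;
        plo[i] = (uint32_t)product;                   /* lower half */
        k = (uint32_t)(product >> 32);                /* upper half, carried on */
    }
    plo[n] = k;

    uint32_t ca = 1;                                  /* start with "no borrow" */
    for (int i = 0; i <= n; i++) {
        /* subfe: un - plo - 1 + ca computed as un + ~plo + ca */
        uint64_t t = (uint64_t)un[i + j] + (uint32_t)~plo[i] + ca;
        un[i + j] = (uint32_t)t;
        ca = (uint32_t)(t >> 32);                     /* 1 = no borrow, 0 = borrow */
    }
    return ca == 0;                                   /* borrow out of the top digit */
}

/* Phase 3: qhat was one too large, so add vn back into un[j..j+n].
 * The carry out of the top digit cancels the earlier borrow and is dropped. */
static void add_back(uint32_t *un, const uint32_t *vn, int n, int j)
{
    uint64_t c = 0;
    for (int i = 0; i < n; i++) {
        uint64_t t = (uint64_t)un[i + j] + vn[i] + c;
        un[i + j] = (uint32_t)t;
        c = t >> 32;
    }
    un[j + n] += (uint32_t)c;
}
```

Both variants should leave identical digits in `un` and report the same borrow. Take `vn = {0xFFFFFFFF, 0x80000000}`, `un = {0, 0, 0x80000000}` and `qhat = 0xFFFFFFFF`. Both variants report a borrow, and `add_back` then leaves `{0xFFFFFFFE, 2, 0}`, which is the result for `qhat - 1`. Keeping the multiply and the subtract as separate passes is what lets each map onto one Vectorised instruction, at the cost of the `n + 1` digit temporary.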