smaller sub-operations is a given: worst-case, addition is O(N)
whilst multiply and divide are O(N^2).
+Links
+
+* <http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-large-integer-arithmetic-paper.pdf>
+* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-April/004700.html>
+* <https://news.ycombinator.com/item?id=21151646>
+
# Add and Subtract
Surprisingly, no new instructions are required to perform
C is left out (and added afterwards with a Vector-Add)
things become more manageable.
+The following C fragment demonstrates a row-based multiply using a temporary vector.
+It is adapted from a simple implementation
+of Knuth M: <https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=openpower/sv/bitmanip/mulmnu.c;hb=HEAD>
+
+```
+ // this becomes the basis for sv.madded in RS=RC Mode,
+ // where k is RC
+ k = 0;
+ for (i = 0; i < m; i++) {
+ unsigned product = u[i]*v[j] + k;
+ k = product>>16;
+ plo[i] = product; // & 0xffff
+ }
+ // this is simply sv.adde where k is XER.CA
+ k = 0;
+ for (i = 0; i < m; i++) {
+ t = plo[i] + w[i + j] + k;
+ w[i + j] = t; // (I.e., t & 0xFFFF).
+ k = t >> 16; // carry: should only be 1 bit
+ }
+```
+
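For readers who want to compile the fragment above, here is a minimal self-contained sketch of the same two-phase row multiply. It is not taken from `mulmnu.c`: the names `mul_rows`, `MAX_DIGITS` and `khi` are hypothetical, digits are assumed to be 16-bit and stored least-significant first, and the handling of the top digit of each row (`khi` plus the final carry) is an assumption about what surrounds the two loops shown in the fragment.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DIGITS 64 /* hypothetical limit on the length of u */

/* Sketch: w[0..m+n-1] = u[0..m-1] * v[0..n-1], 16-bit digits,
 * least-significant digit first. Names are illustrative only. */
static void mul_rows(uint16_t *w, const uint16_t *u, const uint16_t *v,
                     int m, int n)
{
    uint16_t plo[MAX_DIGITS]; /* temporary vector: low halves of one row */
    int i, j;

    assert(m >= 1 && m <= MAX_DIGITS && n >= 1);
    for (i = 0; i < m + n; i++)
        w[i] = 0;

    for (j = 0; j < n; j++) {
        /* Phase 1: multiply the whole of u by the single digit v[j].
         * The high half of each product feeds the next multiply
         * (this is the k / RC chain in the fragment above). */
        uint32_t k = 0;
        for (i = 0; i < m; i++) {
            /* cast first: uint16_t operands would otherwise promote to int */
            uint32_t product = (uint32_t)u[i] * v[j] + k;
            k = product >> 16;
            plo[i] = (uint16_t)product;
        }
        uint32_t khi = k; /* top digit of the row, at most 0xfffe */

        /* Phase 2: a plain add-with-carry of the row into w,
         * where the carry plays the role of XER.CA. */
        uint32_t carry = 0;
        for (i = 0; i < m; i++) {
            uint32_t t = (uint32_t)plo[i] + w[i + j] + carry;
            w[i + j] = (uint16_t)t;
            carry = t >> 16; /* always 0 or 1 */
        }
        /* w[j + m] has not been written by earlier rows, and
         * khi + carry cannot exceed 0xffff, so a store suffices. */
        w[j + m] = (uint16_t)(khi + carry);
    }
}
```

Calling `mul_rows(w, u, v, 2, 2)` with both inputs set to `{0xffff, 0xffff}` leaves `w` holding `{0x0001, 0x0000, 0xfffe, 0xffff}`, the digits of 0xfffffffe00000001. Keeping the two phases as separate loops is what lets each one map onto a single vectorised instruction.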
We therefore propose an operation that is 3-in, 2-out,
that, noting that the connection between successive
mul-adds has the UPPER half of the previous operation