that, noting that the connection between successive
mul-adds has the UPPER half of the previous operation
as its input, writes the UPPER half of the current
-product into a second output register for exactly that
-purpose.
+product into a second output register for exactly the
+purpose of letting it be added onto the next BigInt digit.
product = RA*RB+RC
RT = lowerhalf(product)
RC = upperhalf(product)
-Successive iterations effectively use RC as a 64-bit carry, and
+Successive iterations thus effectively use RC as a 64-bit carry, and
as noted by Intel in their notes on mulx,
RA*RB+RC+RD cannot overflow, so does not require
-setting an additional CA flag, we first cover the chain of
+setting an additional CA flag. We first cover the chain of
RA*RB+RC as follows:
RT0, RC0 = RA0 * RB0 + RC
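The no-overflow claim and the use of RC as a 64-bit carry can be checked numerically. The following is a minimal Python sketch, not part of the patch itself: the names `MASK64`, `mul_add_chain` and `to_int` are hypothetical, and Python's arbitrary-precision integers stand in for the 128-bit intermediate product.

```python
MASK64 = (1 << 64) - 1  # largest value a 64-bit digit can hold (assumed digit width)

def mul_add_chain(digits, rb, rc=0):
    """Multiply little-endian 64-bit digits by scalar rb, chaining RC as the carry."""
    out = []
    for ra in digits:
        product = ra * rb + rc        # product = RA*RB+RC, fits in 128 bits
        out.append(product & MASK64)  # RT = lowerhalf(product)
        rc = product >> 64            # RC = upperhalf(product), fed to the next digit
    return out, rc

def to_int(digits):
    """Reassemble little-endian 64-bit digits into a single Python integer."""
    return sum(d << (64 * i) for i, d in enumerate(digits))

# Worst case for RA*RB+RC+RD: every operand at its 64-bit maximum.
# The sum is exactly 2**128 - 1, so it still fits in 128 bits.
worst = MASK64 * MASK64 + MASK64 + MASK64
assert worst == (1 << 128) - 1

# Cross-check the chained result against a plain big-integer multiply.
digits = [MASK64, 0x0123456789ABCDEF, MASK64]
result, carry = mul_add_chain(digits, MASK64)
assert to_int(result) + (carry << (64 * len(digits))) == to_int(digits) * MASK64
```

Because the worst case lands exactly on 2**128 - 1, the upper half always fits in a single 64-bit register, which is why no separate carry flag is needed for the multiply chain in this model.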
which are scalar initialisation:
li r16, 0 # zero accumulator
- sv.madde r0.v, r8.v, r17, r16 # mul vector
addic r16, r16, 0 # CA to zero as well
+ sv.madde r0.v, r8.v, r17, r16 # mul vector
sv.adde r24.v, r24.v, r0.v # big-add row to result
Normally, in a Scalar ISA, the use of a register as both a source