setting an additional CA flag.
Combined with a Vectorised big-int `sv.addeo` the key inner loop of
-Knuth's Algorithm M may be achieved in four instructions:
+Knuth's Algorithm M may be achieved in four instructions, two of
+which are scalar initialisation:
- li r16, 0 # carry-accululator to zero
- addicc r16, r16, 0 # CA to zero as well
- sv.mulx r0.v, r8.v, r16 # mul vector using r16
- sv.addeo
+ li r16, 0 # zero accumulator
+ addic r16, r16, 0 # CA to zero as well
+ sv.madde r0.v, r8.v, r17, r16 # mul vector
+ sv.addeo r24.v, r24.v, r0.v # big-add row to result
-
Normally, in a Scalar ISA, the use of a register as both a source
and destination like this would create costly Dependency Hazards, so
such an instruction would never be proposed. However: it turns out