then a partial-sum may be carried out with LD and ST in a standard
Cray-style Vector Loop:
-
+    aptr = A address
+    bptr = B address
+    rptr = Result address
+    li r0, 0              # used to help clear CA
+    addic r0, r0, 0       # 0 + 0 cannot carry: CA cleared
+    setmvli 8             # set MAXVL to 8
+    loop:
+      setvl t0, n         # VL = t0 = min(MAXVL, n), n = digits left
+      mulli t1, t0, 8     # 8 bytes per element
+      sv.ldu a0, aptr, t1 # update advances pointer
+      sv.ldu b0, bptr, t1 # likewise
+      sv.adde r0, a0, b0  # VL digits; takes in CA, chains it, updates CA
+      sv.stu rptr, r0, t1 # pointer advances too
+      sub. n, n, t0       # sets CR0 only; CA survives to the next pass
+      bne loop            # n not yet zero: do more digits
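
As a rough model of what the loop computes (not of SVP64 itself), the Python sketch below treats each digit as a 64-bit unsigned integer stored least significant first (an assumption), keeps CA as a single bit that is cleared once and then survives across passes, and lets one "instruction" cover up to MAXVL digits. The helpers `sv_adde` and `bigint_add` are hypothetical names used only for illustration.

```python
MASK64 = (1 << 64) - 1  # one 64-bit digit


def sv_adde(a, b, ca):
    """Model one sv.adde over VL elements (hypothetical helper).

    Element i's carry-out becomes element i+1's carry-in, and the last
    carry-out is returned as the new CA.
    """
    out = []
    for x, y in zip(a, b):
        s = x + y + ca
        out.append(s & MASK64)   # low 64 bits go to the result element
        ca = s >> 64             # 0 or 1: the carry-out
    return out, ca


def bigint_add(a, b, maxvl=8):
    """Strip-mined add of equal-length little-endian 64-bit digit lists.

    Like the assembly loop, the final carry is left in CA (returned)
    rather than stored as an extra digit.
    """
    assert len(a) == len(b)
    n = len(a)
    pos = 0                      # digit index standing in for aptr/bptr/rptr
    result = []
    ca = 0                       # li/addic: CA cleared once, before the loop
    while n > 0:
        vl = min(maxvl, n)       # setvl t0, n
        chunk, ca = sv_adde(a[pos:pos + vl], b[pos:pos + vl], ca)
        result.extend(chunk)     # sv.stu: VL result digits stored
        pos += vl                # pointers advance by t1 = VL * 8 bytes
        n -= vl                  # sub. n, n, t0: CA is not touched
    return result, ca


# Usage: (2**640 - 1) + 1 ripples a carry through all ten digits,
# crossing the boundary between the two passes (8 digits, then 2).
res, ca = bigint_add([MASK64] * 10, [1] + [0] * 9)
print(res, ca)  # → [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] 1
```

The usage case shows why the decrement must not touch CA: the carry produced by the last element of the first pass is the carry-in for the first element of the second pass.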
+
+The assembly loop is not much different from a Scalar Big-Int add:
+as with all Cray-style Vectorisation, the only change is that one
+instruction covers a variable number of elements. Of interest to
+people unfamiliar with Cray-style Vectors: if VL is not permitted
+to exceed 1 (MAXVL set to 1), each pass handles exactly one digit
+with one `adde`, and the loop becomes a Scalar Big-Int add algorithm.
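The MAXVL=1 observation can be checked with a small Python model. `strip_mined_add` is a hypothetical helper, digits are assumed to be 64-bit and stored least significant first, and CA is modelled as one bit carried from element to element and from pass to pass. With `maxvl=1` every pass is a single scalar add-with-carry, yet the digits and final CA match a wider MAXVL; only the number of passes differs.

```python
MASK64 = (1 << 64) - 1  # one 64-bit digit


def strip_mined_add(a, b, maxvl):
    """Add equal-length little-endian 64-bit digit lists, maxvl digits per pass.

    Returns (digits, final_ca, passes); passes counts loop iterations.
    """
    result, ca, i, passes = [], 0, 0, 0
    n = len(a)
    while i < n:
        vl = min(maxvl, n - i)          # setvl: VL = min(MAXVL, remaining)
        for j in range(i, i + vl):      # one vector adde covering VL elements
            s = a[j] + b[j] + ca
            result.append(s & MASK64)
            ca = s >> 64                # CA chains element to element
        i += vl
        passes += 1
    return result, ca, passes


a = [MASK64, 5, MASK64, 0, 9]
b = [1, MASK64, 2, MASK64, 3]
scalar = strip_mined_add(a, b, maxvl=1)
vector = strip_mined_add(a, b, maxvl=8)
assert scalar[:2] == vector[:2]         # same digits and same final CA
print(scalar[2], vector[2])             # → 5 1
```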
# Multiply