only store the last XER.CA. The size of the underlying back-end SIMD ALU
is entirely at the discretion of the implementer.
-If there is pressure on the register file (multi-million-digit big integers)
+If there is pressure on the register file (or
+multi-million-digit big integers)
then a partial-sum may be carried out with LD and ST in a standard
Cray-style Vector Loop:
- aptr = A address
- bptr = B address
- rptr = Result address
- li r0, 0 # used to help clear CA
- addic r0, r0, 0 # CA to zero as well
- setmvli 8 # set MAXVL to 8
+ aptr = A address
+ bptr = B address
+ rptr = Result address
+ li r0, 0 # used to help clear CA
+ addic r0, r0, 0 # CA to zero as well
+ setmvli 8 # set MAXVL to 8
loop:
setvl t0, n # n is the number of digits
- mulli t1, t0, 8 # 8 bytes per element
+ mulli t1, t0, 8 # 8 bytes per digit/element
sv.ldu a0, aptr, t1 # update advances pointer
sv.ldu b0, bptr, t1 # likewise
- sv.adde r0, a0, b0 # partial, of length VL
+ sv.adde r0, a0, b0 # takes in CA, updates CA
sv.stu rptr, r0, t1 # pointer advances too
sub. n, n, t0 # should not alter CA
bnz loop # do more digits
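
As an illustration only, the loop above can be modelled in Python. This is a minimal sketch under stated assumptions, not the reference behaviour of the instructions: digits are taken to be 64-bit and stored least-significant first, and the names `bigint_add`, `MAXVL` and `DIGIT_MASK` are hypothetical, chosen for this example. It shows the single carry bit (standing in for XER.CA) surviving from one chunk of at most MAXVL digits to the next, which is why the loop-counter subtract must not alter CA.

```python
MAXVL = 8                      # mirrors "setmvli 8" in the loop above
DIGIT_MASK = (1 << 64) - 1     # assumption: 64-bit digits (8 bytes each)


def bigint_add(a, b, maxvl=MAXVL):
    """Add two equal-length digit lists (least-significant digit first).

    Returns (result_digits, carry_out). The variable `ca` plays the
    role of XER.CA and is deliberately kept across chunk boundaries.
    """
    assert len(a) == len(b)
    result = []
    ca = 0                     # models clearing CA before the loop
    pos = 0                    # models the advancing aptr/bptr/rptr
    n = len(a)                 # digits still to process
    while n != 0:
        vl = min(n, maxvl)     # models "setvl t0, n"
        for i in range(vl):    # models one "sv.adde" of length VL
            s = a[pos + i] + b[pos + i] + ca
            result.append(s & DIGIT_MASK)
            ca = s >> 64       # carry out feeds the next digit
        pos += vl              # models the load/store pointer update
        n -= vl                # models "sub. n, n, t0" (CA untouched)
    return result, ca
```

For example, adding 1 to ten digits that are all `0xFFFFFFFFFFFFFFFF` ripples the carry through the first chunk of eight digits and on into the second chunk of two, giving ten zero digits and a final carry of 1.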