# SVP64 Power ISA version
+In SVP64 Power ISA assembler, the algorithm is almost deceptively simple and
+straightforward, even though it parallelises easily in hardware. There are,
+however, some key additions over Standard Scalar (SFFS Subset) Power ISA 3.0
+that need explaining.
+
+```
+# r5: n count; r6: x ptr; r7: y ptr; fp1: a
+1 mtctr 5 # move n to CTR
+2 .L2
+3 setvl MAXVL=32,VL=CTR # actually VL=MIN(MAXVL,CTR)
+4 sv.lfdup *32,8(6) # load x into fp32-63, incr x
+5 sv.lfd/els *64,8(7) # load y into fp64-95, NO INC
+6 sv.fmadd *64,*32,1,*64 # (*y) = (*x) * a + (*y)
+7 sv.stfdup *64,8(7) # store at y, post-incr y
+8 sv.bc/ctr .L2 # decr CTR by VL, jump !zero
+9 blr # return
+```
+
The first instruction is simple: the plan is to use CTR for looping, so
n (r5) is copied into CTR. The next step, at the start of the loop (.L2),
is less obvious: setvl sets MAXVL to 32, and VL to MIN(MAXVL,CTR).
The end result is an exceptionally compact daxpy that is easy to read
and understand.
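
The loop structure can be sketched in Python as a rough model. This is not SVP64 semantics in full: the function name `daxpy_strip_mined`, the `pos` index standing in for the pointer updates, and the returned list of VL values are all assumptions made for illustration. It only shows how CTR and VL interact: each pass handles VL = MIN(MAXVL, CTR) elements, then CTR drops by VL.

```python
def daxpy_strip_mined(n, a, x, y, maxvl=32):
    """Illustrative model of the loop only: y[i] = a * x[i] + y[i]."""
    ctr = n    # mtctr 5: CTR holds the number of elements still to process
    pos = 0    # hypothetical index standing in for the x and y pointers
    vls = []   # VL used on each pass, recorded purely for inspection
    while ctr > 0:
        vl = min(maxvl, ctr)            # setvl MAXVL=32,VL=CTR
        for i in range(pos, pos + vl):  # the vector instructions cover VL elements
            y[i] = a * x[i] + y[i]
        pos += vl                       # pointers advance by VL elements
        ctr -= vl                       # sv.bc/ctr: CTR is decremented by VL
        vls.append(vl)
    return vls
```

With n = 70 and MAXVL = 32, the model makes three passes with VL = 32, 32 and 6. The final short pass needs no separate tail loop, because VL simply takes the remaining count.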
-```
-# r5: n count; r6: x ptr; r7: y ptr; fp1: a
-1 mtctr 5 # move n to CTR
-2 .L2
-3 setvl MAXVL=32,VL=CTR # actually VL=MIN(MAXVL,CTR)
-4 sv.lfdup *32,8(6) # load x into fp32-63, incr x
-5 sv.lfd/els *64,8(7) # load y into fp64-95, NO INC
-6 sv.fmadd *64,*64,1,*32 # (*y) = (*y) * (*x) + a
-7 sv.stfdup *64,8(7) # store at y, post-incr y
-8 sv.bc/ctr .L2 # decr CTR by VL, jump !zero
-9 blr # return
-```
-
# RVV version