From 90857dced783436245e25e58d7a3b5d93a418b1a Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 5 Jan 2024 16:13:47 +0000
Subject: [PATCH]

---
 openpower/sv/cookbook/daxpy_example.mdwn | 31 ++++++++++++++----------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/openpower/sv/cookbook/daxpy_example.mdwn b/openpower/sv/cookbook/daxpy_example.mdwn
index fc026cf79..aba10f784 100644
--- a/openpower/sv/cookbook/daxpy_example.mdwn
+++ b/openpower/sv/cookbook/daxpy_example.mdwn
@@ -23,6 +23,24 @@ Summary
 
 # SVP64 Power ISA version
 
+In SVP64 Power ISA assembler, the algorithm, despite easy parallelism in
+hardware, is almost deceptively simple and straightforward. There are however
+some key additions over Standard Scalar (SFFS Subset) Power ISA 3.0 that
+need explaining.
+
+```
+# r5: n count; r6: x ptr; r7: y ptr; fp1: a
+1 mtctr 5 # move n to CTR
+2 .L2
+3 setvl MAXVL=32,VL=CTR # actually VL=MIN(MAXVL,CTR)
+4 sv.lfdup *32,8(6) # load x into fp32-63, incr x
+5 sv.lfd/els *64,8(7) # load y into fp64-95, NO INC
+6 sv.fmadd *64,*64,1,*32 # (*y) = (*y) * (*x) + a
+7 sv.stfdup *64,8(7) # store at y, post-incr y
+8 sv.bc/ctr .L2 # decr CTR by VL, jump !zero
+9 blr # return
+```
+
 The first instruction is simple: the plan is to use CTR for looping.
 Therefore, copy n (r5) into CTR. Next however, at the start of the loop
 (L2) is not so obvious: MAXVL is being set to 32
@@ -88,19 +106,6 @@ since its inception: we propose in SVP64 to add "Decrement CTR by VL".
 The end result is an exceptionally compact daxpy that is easy to read
 and understand.
 
-```
-# r5: n count; r6: x ptr; r7: y ptr; fp1: a
-1 mtctr 5 # move n to CTR
-2 .L2
-3 setvl MAXVL=32,VL=CTR # actually VL=MIN(MAXVL,CTR)
-4 sv.lfdup *32,8(6) # load x into fp32-63, incr x
-5 sv.lfd/els *64,8(7) # load y into fp64-95, NO INC
-6 sv.fmadd *64,*64,1,*32 # (*y) = (*y) * (*x) + a
-7 sv.stfdup *64,8(7) # store at y, post-incr y
-8 sv.bc/ctr .L2 # decr CTR by VL, jump !zero
-9 blr # return
-```
-
 # RVV version
 
 
-- 
2.30.2
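
Reviewer note (not part of the patch): the control flow of the listing that this patch moves can be sketched in plain C as a strip-mined loop. This is a minimal sketch under stated assumptions, not the SVP64 semantics themselves. The function name `daxpy_stripmined` and its return value are hypothetical. The arithmetic follows the conventional BLAS definition `y[i] = a*x[i] + y[i]`, whereas the comment on the `sv.fmadd` line in the patch reads `(*y) * (*x) + a`, so the operand order in the listing may be worth checking against the instruction definition.

```c
#include <stddef.h>

#define MAXVL 32 /* mirrors MAXVL=32 in the listing */

/* Hypothetical helper: strip-mined daxpy, returns the number of loop passes.
 * Assumes the conventional BLAS definition y[i] = a*x[i] + y[i]. */
size_t daxpy_stripmined(size_t n, double a, const double *x, double *y)
{
    size_t ctr = n;    /* "mtctr 5": copy n into the loop counter */
    size_t passes = 0;

    do {
        /* "setvl MAXVL=32,VL=CTR": vl = MIN(MAXVL, ctr) */
        size_t vl = ctr < MAXVL ? ctr : MAXVL;

        /* One vector's worth of elements (load x, load y, fmadd, store y) */
        for (size_t i = 0; i < vl; i++)
            y[i] = a * x[i] + y[i];

        x += vl;       /* post-increment of the x pointer */
        y += vl;       /* post-increment of the y pointer */
        ctr -= vl;     /* "decrement CTR by VL" */
        passes++;
    } while (ctr != 0); /* bottom-tested, like the branch at the end of the listing */

    return passes;
}
```

With `n = 70` the sketch takes three passes of 32, 32 and 6 elements, which is the reason the listing sets VL from CTR on every pass rather than once before the loop.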