From bbb186ab1ab26cc0d36065521a82b5d53cfa607e Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Tue, 11 Apr 2023 18:58:44 +0100
Subject: [PATCH] add Vectorised strncpy example and move Definitions and EXT09 to end of ls001

---
 openpower/sv/rfc/ls001.mdwn | 43 +++++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/rfc/ls001.mdwn b/openpower/sv/rfc/ls001.mdwn
index 67ac35117..65bbb2585 100644
--- a/openpower/sv/rfc/ls001.mdwn
+++ b/openpower/sv/rfc/ls001.mdwn
@@ -874,8 +874,6 @@ on their merits.
 
 \newpage{}
 
-[[!inline pages="openpower/sv/po9_encoding" raw=yes ]]
-
 **EXT000-EXT063**
 
@@ -1295,6 +1293,47 @@ performance and also greatly simplify unlimited-length biginteger algorithms.
 
+\newpage{}
+# Vectorised strncpy
+
+Aside from the `blr` return instruction this is an entire fully-functional
+implementation of `strncpy` which demonstrates some of the remarkably
+powerful capabilities of Simple-V. Load Fault-First avoids instruction
+traps and page faults in the middle of the Vectorised Load, providing
+the *micro-architecture* with the opportunity to notify the program of
+the successful Vector Length. `sv.cmpi` is the next strategically-critical
+instruction, as it searches for a zero and yet *includes* it in a new
+Vector Length - bearing in mind that the previous instruction (the Load)
+*also* truncated down to the valid number of LDs performed. Finally,
+a Vectorised Branch-Conditional automatically decrements CTR by the number
+of elements copied (VL), rather than decrementing simply by one.
+
+```
+    "mtspr 9, 3",                    # move r3 to CTR
+    "addi 0,0,0",                    # initialise r0 to zero
+    # chr-copy loop starts here:
+    #   for (i = 0; i < n && src[i] != '\0'; i++)
+    #       dest[i] = src[i];
+    # VL (and r1) = MIN(CTR,MAXVL=4)
+    "setvl 1,0,%d,0,1,1" % maxvl,
+    # load VL bytes (update r10 addr)
+    "sv.lbzu/pi *16, 1(10)",
+    "sv.cmpi/ff=eq/vli *0,1,*16,0",  # compare against zero, truncate VL
+    # store VL bytes (update r12 addr)
+    "sv.stbu/pi *16, 1(12)",
+    "sv.bc/all 0, *2, -0x1c",        # test CTR, stop if cmpi failed
+    # zeroing loop starts here:
+    #   for ( ; i < n; i++)
+    #       dest[i] = '\0';
+    # VL (and r1) = MIN(CTR,MAXVL=4)
+    "setvl 1,0,%d,0,1,1" % maxvl,
+    # store VL zeros (update r12 addr)
+    "sv.stbu/pi 0, 1(12)",
+    "sv.bc 16, *0, -0xc",            # decrement CTR by VL, stop at zero
+```
+
+[[!inline pages="openpower/sv/po9_encoding" raw=yes ]]
+
 [[!tag opf_rfc]]
 
 [^zolc]: first introduced in DSPs, Zero-Overhead Loops are astoundingly effective in reducing total number of instructions executed or needed. [ZOLC](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.301.4646&rep=rep1&type=pdf) reduces instructions by **25 to 80 percent**.
-- 
2.30.2
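
Reviewer note (not part of the patch): the control flow described in the added section can be sketched as a plain Python model. This is only an illustration under stated assumptions, not the Simple-V semantics themselves. The function name `strncpy_vl_model` and the `readable` parameter are hypothetical. Fault-First is modelled as "bytes at index `readable` or beyond would fault", and the copy loop is assumed to consume VL from CTR on every pass, as the prose describes.

```python
def strncpy_vl_model(src: bytes, n: int, maxvl: int = 4, readable=None) -> bytes:
    """Behavioural sketch of the two loops in the listing (hypothetical helper)."""
    if readable is None:
        readable = len(src)           # assumption: the whole buffer is mapped
    dest = bytearray(b"\xff" * n)     # sentinel fill so untouched bytes are visible
    ctr = n                           # models "mtspr 9, 3": CTR = n
    s = d = 0                         # source / destination offsets
    found_nul = False

    # chr-copy loop
    while ctr > 0 and not found_nul:
        vl = min(ctr, maxvl)          # models setvl: VL = MIN(CTR, MAXVL)
        avail = readable - s
        if avail <= 0:
            # A fault on the very first element is modelled as an ordinary trap.
            raise MemoryError("fault on first element of the load")
        vl = min(vl, avail)           # models Load Fault-First truncating VL
        chunk = src[s:s + vl]
        if 0 in chunk:
            # models the fail-first compare: truncate VL but *include* the zero
            vl = chunk.index(0) + 1
            chunk = chunk[:vl]
            found_nul = True
        dest[d:d + vl] = chunk        # models the store of VL bytes
        s += vl
        d += vl
        ctr -= vl                     # assumption: CTR decrements by VL, not by one

    # zeroing loop
    while ctr > 0:
        vl = min(ctr, maxvl)
        dest[d:d + vl] = bytes(vl)    # store VL zero bytes
        d += vl
        ctr -= vl
    return bytes(dest)


if __name__ == "__main__":
    print(strncpy_vl_model(b"hello\0world", 10))
```

Because the terminating zero is copied as part of the truncated vector, the model needs no separate scalar store for it, which appears to be the point the added section makes about `sv.cmpi`.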