\newpage{}
-[[!inline pages="openpower/sv/po9_encoding" raw=yes ]]
-
**EXT000-EXT063**
<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_bigint.py;hb=HEAD>
+\newpage{}
+# Vectorised strncpy
+
+Aside from the `blr` return instruction, the listing below is a complete,
+fully-functional implementation of `strncpy`, and it demonstrates some of
+the more powerful capabilities of Simple-V. Load Fault-First avoids
+traps and page faults in the middle of the Vectorised Load, giving
+the *micro-architecture* the opportunity to notify the program of
+the Vector Length (VL) that was successfully loaded. `sv.cmpi` is the next
+strategically-critical instruction: it searches for a zero and yet
+*includes* that zero in a new, shorter Vector Length (the `/vli` qualifier),
+bearing in mind that the previous instruction (the Load) has *also*
+truncated VL down to the number of loads that actually completed. Finally,
+a Vectorised Branch-Conditional automatically decrements CTR by the number
+of elements copied (VL), rather than by just one.
+
+```
+"mtspr 9, 3",                        # move r3 to CTR
+"addi 0,0,0",                        # initialise r0 to zero
+# chr-copy loop starts here:
+#   for (i = 0; i < n && src[i] != '\0'; i++)
+#       dest[i] = src[i];
+# VL (and r1) = MIN(CTR,MAXVL=4)
+"setvl 1,0,%d,0,1,1" % maxvl,
+# load VL bytes (update r10 addr)
+"sv.lbzu/pi *16, 1(10)",
+"sv.cmpi/ff=eq/vli *0,1,*16,0",      # compare against zero, truncate VL
+# store VL bytes (update r12 addr)
+"sv.stbu/pi *16, 1(12)",
+"sv.bc/all 0, *2, -0x1c",            # test CTR, stop if cmpi failed
+# zeroing loop starts here:
+#   for ( ; i < n; i++)
+#       dest[i] = '\0';
+# VL (and r1) = MIN(CTR,MAXVL=4)
+"setvl 1,0,%d,0,1,1" % maxvl,
+# store VL zeros (update r12 addr)
+"sv.stbu/pi 0, 1(12)",
+"sv.bc 16, *0, -0xc",                # decrement CTR by VL, stop at zero
+```
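
As a rough cross-check of the listing above, the same control flow can be modelled in plain Python. This is only a sketch under stated assumptions: `strncpy_model` and its parameter names are hypothetical and not part of openpower-isa, Fault-First truncation is approximated as "the source buffer ran out", and the register file, CR fields and address updates are reduced to a few integers.

```python
MAXVL = 4  # the listing's comments use MAXVL=4


def strncpy_model(dest, src, n, maxvl=MAXVL):
    """Hypothetical model: copy at most n bytes of src into dest, then zero-pad."""
    assert len(dest) >= n, "dest must hold at least n bytes"
    ctr = n      # mtspr 9, 3: CTR holds the number of bytes still to write
    s = d = 0    # offsets standing in for the r10 (src) and r12 (dest) addresses
    found_zero = False

    # chr-copy loop
    while ctr > 0 and not found_zero:
        vl = min(ctr, maxvl)            # setvl: VL = MIN(CTR, MAXVL)
        vl = min(vl, len(src) - s)      # assumed stand-in for Fault-First truncation
        if vl == 0:
            raise ValueError("src is neither zero-terminated nor n bytes long")
        chunk = src[s:s + vl]
        if 0 in chunk:                  # sv.cmpi/ff=eq/vli: cut VL at the zero,
            vl = chunk.index(0) + 1     # keeping the zero itself in the vector
            chunk = chunk[:vl]
            found_zero = True
        dest[d:d + vl] = chunk          # store VL bytes
        s += vl
        d += vl
        ctr -= vl                       # branch: CTR decremented by VL, not by 1

    # zeroing loop
    while ctr > 0:
        vl = min(ctr, maxvl)            # setvl again: VL = MIN(CTR, MAXVL)
        dest[d:d + vl] = bytes(vl)      # store VL zero bytes
        d += vl
        ctr -= vl
    return dest
```

For example, copying `b"hello\x00"` into an 8-byte `bytearray` with `n = 8` takes two passes of the copy loop in this model (VL of 4, then VL truncated to 2 so that the zero is included) and one pass of the zeroing loop (VL of 2).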
+
+[[!inline pages="openpower/sv/po9_encoding" raw=yes ]]
+
[[!tag opf_rfc]]
[^zolc]: First introduced in DSPs, Zero-Overhead Loops are highly effective at reducing the total number of instructions executed or needed. [ZOLC](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.301.4646&rep=rep1&type=pdf) reduces instructions by **25 to 80 percent**.