Notes:
-* Setting MVL to 8 is just an example. If enough registers are spare it may be set to XLEN which will require a bank of 8 registers for a1, a3 and t0.
-* obviously if that is done, t0 is not separated by 8 full registers, and would overwrite t1 etc. x80 would work well, as an example, instead.
+* Setting MVL to 8 is just an example. If enough registers are spare it may be set to XLEN which will require a bank of 8 scalar registers for a1, a3 and t0.
+* obviously if that is done, t0 is not separated by 8 full registers, and would overwrite t1 thru t7. x80 would work well, as an example, instead.
* with the exception of GETVL (a pseudocode alias for csrr), every single instruction above may use RVC.
* RVC C.BNEZ can be used because rs1' may be extended to the full 128 registers through redirection
* RVC C.LW and C.SW may be used because the W format may be overridden by the 8 bit format. All of t0, a3 and a1 are overridden to make that work.
* with the exception of the GETVL, all Vector Context may be done in VBLOCK form.
-* setting predication to x0 and invert on t0 is a trick to enable just ffirst on t0
+* setting predication to x0 (zero) and invert on t0 is a trick to enable just ffirst on t0
* ldb and bne are both using t0, both in ffirst mode
* ldb will stop at the first illegal memory address and reduce VL, but by then it has already copied all the preceding bytes into t0
* bne t0 x0 tests up to the NEW VL for nonzero, vector t0 against scalar x0
* SETVL sets *exactly* the requested amount into VL.
* the SETVL just after allnonzero label is needed in case the ldb ffirst activates but the bne allzeros does not.
* this would cause the stb to copy up to the end of the legal memory
-* of course, on the next loop the ldb would throw a trap, as a1 points to the first illegal mem location.
+* of course, on the next loop the ldb would throw a trap, as a1 now points to the first illegal mem location.
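
The interaction between the two fail-first steps described in the notes above can be sketched as a small behavioural model. This is a simplified illustration only, not the Simple-V specification: the names `MemTrap`, `ffirst_load`, `ffirst_nonzero` and `strncpy_model` are hypothetical, memory is modelled as a flat byte buffer with a single `legal_end` boundary, and the zero-padding that C `strncpy` performs is omitted.

```python
class MemTrap(Exception):
    """Hypothetical trap: element 0 of a fail-first load hit illegal memory."""


def ffirst_load(mem, legal_end, addr, vl):
    """Model of ldb in ffirst mode: returns (data, new_vl).

    Traps only if the very first element is illegal; otherwise VL is
    truncated to the number of bytes that could legally be loaded.
    """
    if addr >= legal_end:
        raise MemTrap(addr)
    new_vl = min(vl, legal_end - addr)
    return list(mem[addr:addr + new_vl]), new_vl


def ffirst_nonzero(data, vl):
    """Model of bne t0, x0 in ffirst mode: returns (all_nonzero, new_vl).

    Tests only up to the (possibly already reduced) VL; on the first zero
    element, VL is truncated to the count of leading nonzero elements.
    """
    for i in range(vl):
        if data[i] == 0:
            return False, i
    return True, vl


def strncpy_model(mem, legal_end, src, dst_buf, n, mvl=8):
    """Copy up to n bytes from mem[src:] into dst_buf; returns bytes written."""
    dst = 0
    while n > 0:
        vl = min(n, mvl)                        # SETVL: request up to MVL
        data, vl = ffirst_load(mem, legal_end, src, vl)
        all_nonzero, vl = ffirst_nonzero(data, vl)
        if not all_nonzero:
            vl += 1                             # include the terminating zero
            dst_buf[dst:dst + vl] = data[:vl]
            return dst + vl
        # allnonzero path: VL may have been reduced by the load alone,
        # so the store copies only up to the end of legal memory.
        dst_buf[dst:dst + vl] = data[:vl]
        src += vl
        dst += vl
        n -= vl
    return dst
```

With an unterminated source that runs up to `legal_end`, the model stores the remaining legal bytes on one iteration and raises `MemTrap` on the next, because the source pointer then sits on the first illegal location, which is the behaviour the last three notes describe.
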
RVV version: