regfiles for counters), simply two instructions:
setvli r0, MVL=64, VL=64
- ld r0.v, 0(r30) # load exactly 64 registers from memory
+ sv.ld *r0, 0(r30) # load exactly 64 registers from memory
Page Faults etc. aside, this is *guaranteed* 100% without fail to perform
64 unit-strided LDs starting from the address pointed to by r30 and put
the contents into r0 through r63. Thus it becomes a "LOAD-MULTI". Twin
Predication could even be used to only load relevant registers from
the stack. This *only works if VL is set to the requested value* rather
-than, as in RVV, allowing the hardware to set VL to an arbitrary value.
+than, as in RVV, allowing the hardware to set VL to an arbitrary value
+(due to variances in implementation choices).
Also available is the option to set VL from CTR (`VL = MIN(CTR, MVL)`).
In combination with SVP64 [[sv/branches]] this can save one instruction
-inside critical inner loops. Note: to avoid having an extra opcode
-bit in `setvl`,
-to select CTR is slightly convoluted.
+inside critical inner loops. A caveat: to avoid having an extra opcode
+bit in `setvl`, selection of CTR mode is slightly convoluted.
# Format