getvl r5 : setvl r5, r0, vf=0, vs=0, ms=0
getvl. r5 : setvl. r5, r0, vf=0, vs=0, ms=0
+Note that whilst it is possible to set both MVL and VL from the same
+immediate, it is not possible to set them to different immediates in
+the same instruction. Doing so would require two instructions.
+
+**Selecting sources for VL**
+
+Opcode space is under considerable pressure. Consequently, when `vs=1`
+the source from which VL is set is selected by `RA` and `RT`, as follows:
+
+| condition | effect |
+| - | - |
+| `vs=1, RA=0, RT!=0` | VL,RT set to MIN(MVL, CTR) |
+| `vs=1, RA=0, RT=0` | VL set to MIN(MVL, SVi+1) |
+| `vs=1, RA!=0, RT=0` | VL set to MIN(MVL, RA) |
+| `vs=1, RA!=0, RT!=0` | VL,RT set to MIN(MVL, RA) |
+
+The reasoning here is that the opportunity to set RT equal to the
+immediate `SVi+1` is sacrificed in favour of setting from CTR.
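
The table above can be read as a small decision function. The following is a minimal sketch rather than the specification's pseudocode: the function name `select_vl` and the `gpr` dictionary are hypothetical, and it assumes that "RA" in the `MIN(MVL, RA)` rows means the contents of the GPR numbered `RA`.

```python
def select_vl(mvl, ra, rt, svi, ctr, gpr):
    """Model the vs=1 rows of the table.

    Returns (new VL, value written to RT), where the second item is None
    when RT=0 and no register is written.
    Assumption: `gpr` maps a register number to its contents.
    """
    if ra == 0 and rt != 0:
        requested = ctr          # row 1: source is CTR
    elif ra == 0 and rt == 0:
        requested = svi + 1      # row 2: source is the immediate SVi+1
    else:
        requested = gpr[ra]      # rows 3 and 4: source is register RA
    vl = min(mvl, requested)     # VL never exceeds MVL
    return vl, (vl if rt != 0 else None)
```

For example, `select_vl(8, 0, 0, 2, 100, {})` takes the immediate path and yields a VL of 3 with no write to RT.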
+
+# Unusual Rc=1 behaviour
+
+Normally, the return result from an instruction is in `RT`. With
+it being possible for `RT=0` to mean that `CTR` mode is to be read,
+some different semantics are needed.
+
+CR Field 0, when `Rc=1`, may be set even if `RT=0`. The reason is that
+overflow may occur: `VL`, if set either from an immediate or from `CTR`,
+may not exceed `MAXVL`, and if the requested value does, `CR0.SO` must be set.
+
+Additionally, in reality it is **`VL`** being set. Therefore, rather
+than `CR0` testing `RT` when `Rc=1`, CR0.EQ is set if `VL=0`, CR0.GE
+is set if `VL` is non-zero.
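
A minimal sketch of the `Rc=1` behaviour described above, using the bit names exactly as the text gives them. The function name `cr0_for_setvl` is hypothetical, and clamping the requested value to `MAXVL` on overflow is an assumption taken from the `MIN(MVL, ...)` table entries rather than from this section.

```python
def cr0_for_setvl(requested, maxvl):
    """Return (VL, CR0 flags) for a setvl with Rc=1.

    Assumption: on overflow VL is clamped to MAXVL and SO is set.
    The flags test VL itself, not RT.
    """
    vl = min(requested, maxvl)
    flags = {
        "EQ": vl == 0,             # set when VL is zero
        "GE": vl != 0,             # set when VL is non-zero
        "SO": requested > maxvl,   # overflow: request exceeded MAXVL
    }
    return vl, flags
```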
+
+# Vertical First Mode
+
+Vertical First is effectively like an implicit single bit predicate
+applied to every SVP64 instruction. **ONLY** one element in each
+SVP64 Vector instruction is executed; srcstep and dststep do **not**
+increment, and the Program Counter progresses **immediately** to
+the next instruction just as it would for any standard scalar v3.0B
+instruction.
+
+An explicit mode of setvl is then used to move srcstep and
+dststep on to the next element, still respecting predicate
+masks.
+
+In other words, where normal SVP64 Vectorisation acts "horizontally"
+by looping first through 0 to VL-1 and only then moving the PC
+to the next instruction, Vertical-First moves the PC onwards
+(vertically) through multiple instructions **with the same
+srcstep and dststep**, after which an explicit instruction is used to
+advance srcstep/dststep. An outer loop (using a branch instruction)
+is expected to be used to complete a series of Vector operations.
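
The difference in execution order can be illustrated with a small model. This is a sketch under simplifying assumptions: predicate masks are ignored, srcstep and dststep are treated as one shared `step`, and the function names are hypothetical.

```python
def horizontal(program, vl):
    """Normal SVP64: each instruction loops over 0..VL-1 before the PC moves on."""
    trace = []
    for insn in program:
        for step in range(vl):
            trace.append((insn, step))
    return trace


def vertical_first(program, vl):
    """Vertical-First: the PC walks every instruction at one step value,
    then an explicit stepping instruction advances the step."""
    trace = []
    step = 0
    while step < vl:              # outer loop, closed by a branch
        for insn in program:      # one element per instruction
            trace.append((insn, step))
        step += 1                 # the explicit step-advancing instruction
    return trace
```

With a two-instruction program and `vl=2`, `horizontal` visits `a0 a1 b0 b1` whereas `vertical_first` visits `a0 b0 a1 b1`.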
+
+`svfstep` mode is enabled when vf=1, vs=0 and ms=0.
+When Rc=1 it is possible to determine when any level of
+loops has reached an end condition, or whether VL has been reached. The
+immediate is reinterpreted as indicating which SVSTATE (0-3)
+should be tested, with the result placed into CR0 (when Rc=1).
+
+When RT is not zero, an internal stepping index may also be returned,
+either the REMAP index or srcstep or dststep. This table is identical
+to that of [[sv/svstep]]:
+
+* `SVi=1`: also include inner, middle and outer
+ loop end conditions from SVSTATE0 into CR.EQ, CR.LE and CR.GT
+* `SVi=2`: test SVSTATE1 (and return conditions)
+* `SVi=3`: test SVSTATE2 (and return conditions)
+* `SVi=4`: test SVSTATE3 (and return conditions)
+* `SVi=5`: `SVSTATE.srcstep` is returned.
+* `SVi=6`: `SVSTATE.dststep` is returned.
+
+Testing any end condition of any loop of any REMAP state allows branches to be used to create loops.
+
+*Programmers should be aware that VL, srcstep and dststep are global in nature.
+Nested looping with different schedules is perfectly possible, as is
+calling of functions; however, SVSTATE (and any associated REMAP state) should be saved on the stack.*
+
+**SUBVL**
+
+Sub-vector elements are not to be considered "Vertical". The vec2/3/4
+is to be treated as if it were a "single element". Caveats exist for
+[[sv/mv.swizzle]] and [[sv/mv.vec]] when Pack/Unpack is enabled,
+due to the order in which VL and SUBVL loops are applied being
+swapped (outer-inner becomes inner-outer).
+
+# Examples
+
+## Core concept loop
+
+```
+loop:
+ setvl a3, a0, MVL=8 # update a3 with vl
+ # (# of elements this iteration)
+ # set MVL to 8
+ # do vector operations at up to 8 length (MVL=8)
+ # ...
+ sub a0, a0, a3 # Decrement count by vl
+ bnez a0, loop # Any more?
+```
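
The strip-mining pattern in the loop above can be modelled in a few lines. This is a sketch only: `strip_mine` is a hypothetical name, `count` stands for the remaining element count (`a0`), and a non-negative starting count is assumed.

```python
def strip_mine(count, mvl):
    """Return the VL used on each iteration when processing `count` elements."""
    chunks = []
    while count != 0:            # "Any more?"
        vl = min(count, mvl)     # setvl: VL = MIN(remaining, MVL)
        chunks.append(vl)        # vector operations run at this length
        count -= vl              # decrement count by vl
    return chunks
```

Processing 20 elements with `MVL=8` therefore takes three iterations of lengths 8, 8 and 4.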
+
+## Loop using Rc=1
+
+    my_fn:
+        li r3, 1000
+        b test
+    loop:
+        sub r3, r3, r4
+        ...
+    test:
+        setvli. r4, r3, MVL=64
+        bne cr0, loop
+    end:
+        blr
+
+## Load/Store-Multi (selective)
+
+Up to 64 FPRs will be loaded here. `r3` has one bit set
+for each FP register that is to be loaded. The block of memory
+from which the registers are loaded is contiguous (no gaps):
+any FP register which has a corresponding zero bit in `r3`
+is left *unaltered*. In essence this is a selective LD-multi with
+"Scatter" capability.
+
+    setvli r0, MVL=64, VL=64
+    sv.lfd/dm=r3 *fp0, 0(r30) # selective load 64 FP registers
+
+Up to 64 FPRs will be saved here. Again, `r3` has one bit set
+for each FP register that is to be saved.
+
+    setvli r0, MVL=64, VL=64
+    sv.stfd/sm=r3 *fp0, 0(r30) # selective store 64 FP registers
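
The effect of the mask in `r3` on the two sequences above can be sketched as follows. The function names are hypothetical, and the mapping of mask bit `i` (counting from the least significant bit) to FP register `i` is an assumption made for illustration.

```python
def selective_load(regs, mask, memory):
    """Load consecutive memory items into only those registers whose mask bit is set.

    Registers with a zero bit are left unaltered; memory is read with no gaps.
    """
    out = list(regs)
    src = 0
    for i in range(len(out)):
        if (mask >> i) & 1:
            out[i] = memory[src]   # next contiguous memory item
            src += 1
    return out


def selective_store(regs, mask):
    """Return the contiguous block written to memory: selected registers only."""
    return [regs[i] for i in range(len(regs)) if (mask >> i) & 1]
```

A store followed by a load with the same mask restores exactly the selected registers, which is the property a selective save/restore pair relies on.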
+
-------------
\newpage{}