SO bit set on the very last element, when all loops reach their maximum
extent.
-*Programmer's note (1): VL in some situations, particularly larger Matrices,
-may exceed 64,
-meaning that `sv.svshape` returning a considerable number of values. Under
-such circumstances `sv.svshape/ew=8` is recommended.*
+*Programmer's note: in some situations, particularly with larger Matrices
+(5x7x3 will set MAXVL=105),
+VL is large enough that `sv.svstep` returns a considerable number of values.
+Under such circumstances `sv.svstep/ew=8` is recommended.*
-*Programmer's note (2): having conveniently obtained a pre-computed
+*Programmer's note: having conveniently obtained a pre-computed
Schedule with `sv.svstep`,
it may then be used as the input to Indexed REMAP Mode
to achieve the exact same Schedule. It is evident however that
before use some of the Indices may be arbitrarily altered as desired.
`sv.svstep` helps the programmer avoid having to manually recreate
Indices for certain
-types of common Loop patterns, and in its simplest form, without REMAP
+types of common Loop patterns. Its simplest form, without REMAP
(SVi=5 or SVi=6),
is equivalent to the `iota` instruction found in other Vector ISAs*
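
As a rough illustration of the two notes above, the following Python sketch models only the arithmetic, not the instruction itself. The helper names `iota` and `regs_needed` are hypothetical, and the 64-bit register width with densely packed elements is an assumption made for the estimate. It shows that a 5x7x3 Schedule yields 105 indices, that every index fits in 8 bits, and how much the register footprint shrinks at `ew=8`.

```python
import math

def iota(vl):
    """Hypothetical model of the simplest (non-REMAP) case: indices 0..vl-1."""
    return list(range(vl))

def regs_needed(vl, elwidth_bits, reg_bits=64):
    """Registers occupied by vl elements, ASSUMING dense packing into
    reg_bits-wide registers (an assumption for this estimate only)."""
    return -(-(vl * elwidth_bits) // reg_bits)  # integer ceiling division

dims = (5, 7, 3)          # the Matrix dimensions quoted in the note
vl = math.prod(dims)      # total number of elements in the Schedule
indices = iota(vl)

# Every index must be representable at the chosen element width.
fits_in_8_bits = max(indices) < 2 ** 8

print(vl, fits_in_8_bits, regs_needed(vl, 64), regs_needed(vl, 8))
```

Under these assumptions the indices occupy 105 registers at 64-bit element width but only 14 at 8-bit, which is the motivation for recommending `ew=8`.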
Vertical First is effectively like an implicit single bit predicate
applied to every SVP64 instruction. **ONLY** one element in each
SVP64 Vector instruction is executed; srcstep and dststep do **not**
-increment, and the Program Counter progresses **immediately** to
+increment automatically on completion of one instruction,
+and the Program Counter progresses **immediately** to
the next instruction just as it would for any standard scalar v3.0B
instruction.
*This includes in Vertical-First Mode*, and programmers should be keenly
aware that srcstep or dststep or both *may* jump by more than one as
a result, because the actual request under these circumstances was to execute
-on the first available next *non-masked-out* element.
-
-*Programmers should be aware that VL, srcstep and dststep are global in nature.
+on the first available next *non-masked-out* element. It should be
+evident that it is the `sv.svstep` instruction that must be Predicated
+in order for the **entire** loop to use the Predicate correctly, and
+it is strongly recommended for all instructions within the same
+Vertical-First Loop to utilise the exact same Predicate Mask(s).*
+
+Programmers should be aware that VL, srcstep, dststep and
+the SUBVL substeps are global in nature.
Nested looping with different schedules is perfectly possible, as is
-calling of functions, however SVSTATE (and any associated SVSTATE) should
-obviously be stored on the stack in order to achieve this benefit*
+calling of functions. However, SVSTATE (and any associated SVSHAPEs
+if REMAP is being used) should
+obviously be stored on the stack in order to achieve this benefit,
+which is not normally found in Vector ISAs.
[[!tag standards]]