SO bit set on the very last element, when all loops reach their maximum
extent.
-*Programmer's note: VL in some situations, particularly larger Matrices
-(5x7x3 will set MAXVL=105),
-will cause `sv.svstep` to return a considerable number of values. Under
-such circumstances `sv.svstep/ew=8` is recommended.*
-
-*Programmer's note: having conveniently obtained a pre-computed
-Schedule with `sv.svstep`,
-it may then be used as the input to Indexed REMAP Mode
-to achieve the exact same Schedule. It is evident however that
+*Programmer's note: VL in some situations, particularly larger
+Matrices (5x7x3 will set MAXVL=105), will cause `sv.svstep` to return a
+considerable number of values. Under such circumstances `sv.svstep/ew=8`
+is recommended.*
+
+*Programmer's note: having conveniently obtained a pre-computed Schedule
+with `sv.svstep`, it may then be used as the input to Indexed REMAP
+Mode to achieve the exact same Schedule. It is evident however that
before use some of the Indices may be arbitrarily altered as desired.
`sv.svstep` helps the programmer avoid having to manually recreate
-Indices for certain
-types of common Loop patterns. In its simplest form, without REMAP
-(SVi=5 or SVi=6),
-is equivalent to the `iota` instruction found in other Vector ISAs*
+Indices for certain types of common Loop patterns. In its simplest form,
+without REMAP (SVi=5 or SVi=6), it is equivalent to the `iota` instruction
+found in other Vector ISAs.*
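The `iota` comparison, and the reuse of a pre-computed Schedule as a set of Indices, can be illustrated with a minimal Python sketch. It is a behavioural model only, not the instruction's definition: the names `svstep_iota` and `apply_indices` are hypothetical, and predication and element-width are deliberately left out.

```python
def svstep_iota(vl):
    """Hypothetical model of the simplest (non-REMAP) case:
    element i of the result holds the step value at element i."""
    return [step for step in range(vl)]


def apply_indices(data, indices):
    """Hypothetical model of using a list of Indices as a Schedule:
    element i is read from data[indices[i]]."""
    return [data[idx] for idx in indices]


# Obtain a Schedule, then arbitrarily alter some Indices before use.
schedule = svstep_iota(4)                          # 0, 1, 2, 3
schedule[0], schedule[3] = schedule[3], schedule[0]
reordered = apply_indices(["a", "b", "c", "d"], schedule)
```

The point of the sketch is only that the Schedule is ordinary data once it has been written out: it may be edited like any other vector of integers before being used as Indices.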
**Vertical First Mode**
Vertical First is effectively like an implicit single bit predicate
-applied to every SVP64 instruction. **ONLY** one element in each
-SVP64 Vector instruction is executed; srcstep and dststep do **not**
-increment automatically on completion of one instruction,
-and the Program Counter progresses **immediately** to
-the next instruction just as it would for any standard scalar v3.0B
-instruction.
+applied to every SVP64 instruction. **ONLY** one element in each SVP64
+Vector instruction is executed; srcstep and dststep do **not** increment
+automatically on completion of one instruction, and the Program Counter
+progresses **immediately** to the next instruction just as it would for
+any standard scalar v3.0B instruction.
-A mode of srcstep (SVi=0) is called which can move srcstep and
-dststep on to the next element, still respecting predicate
-masks.
+A mode of `svstep` (SVi=0) may be called which moves srcstep and dststep
+on to the next element, still respecting predicate masks.
In other words, where normal SVP64 Vectorisation acts "horizontally"
-by looping first through 0 to VL-1 and only then moving the PC
-to the next instruction, Vertical-First moves the PC onwards
-(vertically) through multiple instructions **with the same
-srcstep and dststep**, then an explict instruction used to
-advance srcstep/dststep. An outer loop is expected to be
-used (branch instruction) which completes a series of
-Vector operations.
-
-Testing any end condition of any loop of any REMAP state allows branches to be
-used to create loops.
-
-Programmer's note: when Predicate Non-Zeroing is used this indicates to
+by looping first through 0 to VL-1 and only then moving the PC to the
+next instruction, Vertical-First moves the PC onwards (vertically)
+through multiple instructions **with the same srcstep and dststep**,
+then an explicit instruction is used to advance srcstep/dststep. An outer
+loop is expected to be used (branch instruction) which completes a series
+of Vector operations.
+
+Testing any end condition of any loop of any REMAP state allows branches
+to be used to create loops.
+
+*Programmer's note: when Predicate Non-Zeroing is used this indicates to
the underlying hardware that any masked-out element must be skipped.
-*This includes in Vertical-First Mode*, and programmers should be keenly
-aware that srcstep or dststep or both *may* jump by more than one as
-a result, because the actual request under these circumstances was to execute
-on the first available next *non-masked-out* element. It should be
-evident that it is the `sv.svstep` instruction that must be Predicated
-in order for the **entire** loop to use the Predicate correctly, and
-it is strongly recommended for all instructions within the same
-Vertical-First Loop to utilise the exact same Predicate Mask(s).*
-
-Programmers should be aware that VL, srcstep and dststep and
-the SUBVL substeps are global in nature.
-Nested looping with different schedules is perfectly possible, as is
-calling of functions, however SVSTATE (and any associated SVSHAPEs
-if REMAP is being used) should
-obviously be stored on the stack in order to achieve this benefit
-not normally found in Vector ISAs.
+*This includes Vertical-First Mode*, and programmers should be
+keenly aware that srcstep or dststep or both *may* jump by more than
+one as a result, because the actual request under these circumstances
+was to execute on the first available next *non-masked-out* element.
+It should be evident that it is the `sv.svstep` instruction that must
+be Predicated in order for the **entire** loop to use the Predicate
+correctly, and it is strongly recommended for all instructions within
+the same Vertical-First Loop to utilise the exact same Predicate Mask(s).*
+
+Programmers should be aware that VL, srcstep and dststep and the SUBVL
+substeps are global in nature. Nested looping with different schedules
+is perfectly possible, as is calling of functions, however SVSTATE
+(and any associated SVSHAPEs if REMAP is being used) must be saved
+on the stack and restored afterwards in order to achieve this benefit,
+which is not normally found in Vector ISAs.
-------------
# Appendix
-**SVSTATE_NEXT**
+**src_iterate**
-```
- if SVi = 1 then return REMAP SVSHAPE0 current offset
- if SVi = 2 then return REMAP SVSHAPE1 current offset
- if SVi = 3 then return REMAP SVSHAPE2 current offset
- if SVi = 4 then return REMAP SVSHAPE3 current offset
- if SVi = 5 then return SVSTATE.srcstep
- if SVi = 6 then return SVSTATE.dststep
- if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step
- if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step
- # SVi=0, explicit iteration requezted
- src_iterate();
- dst_iterate();
- return 0
-```
+Note that `srcstep` and `ssubstep` are not the absolute final Element
+(and Sub-Element) offsets. `srcstep` still has to go through individual
+`REMAP` translation before becoming a per-operand (RA, RB, RC, RT, RS)
+Element-level Source offset.
-**ADVANCE_STEPS**
+Note also critically that `PACK` mode simply inverts the outer/inner
+loops, making SUBVL the outer loop and VL the inner.
```
- def src_iterate(): # source-stepping iterator
- subvl = self.subvl
- vl = self.svstate.vl
- pack = self.svstate.pack
- unpack = self.svstate.unpack
- ssubstep = self.svstate.ssubstep
- end_ssub = ssubstep == subvl
- end_src = self.svstate.srcstep == vl-1
- # first source step
- srcstep = self.svstate.srcstep
- srcmask = self.srcmask
- if pack:
- # pack advances subvl in *outer* loop
+ # source-stepping iterator
+ subvl = SVSTATE.subvl
+ vl = SVSTATE.vl
+ pack = SVSTATE.pack
+ unpack = SVSTATE.unpack
+ ssubstep = SVSTATE.ssubstep
+ end_ssub = ssubstep == subvl
+ end_src = SVSTATE.srcstep == vl-1
+ # first source step.
+ srcstep = SVSTATE.srcstep
+ # used below:
+ # sz - from RM.MODE, source-zeroing
+ # srcmask - from RM.MODE, the source predicate
+ if pack:
+ # pack advances subvl in *outer* loop
+ while True:
+ assert srcstep <= vl-1
+ end_src = srcstep == vl-1
+ if end_src:
+ if end_ssub:
+ loopend = True
+ else:
+ SVSTATE.ssubstep += 1
+ srcstep = 0 # reset
+ break
+ else:
+ srcstep += 1 # advance srcstep
+ if not sz:
+ break
+ if ((1 << srcstep) & srcmask) != 0:
+ break
+ else:
+ # advance subvl in *inner* loop
+ if end_ssub:
while True:
assert srcstep <= vl-1
end_src = srcstep == vl-1
- if end_src:
- if end_ssub:
- self.loopend = True
- else:
- self.svstate.ssubstep += SelectableInt(1, 2)
- srcstep = 0 # reset
+ if end_src: # end-point
+ loopend = True
+ srcstep = 0
+ break
+ else:
+ srcstep += 1
+ if not sz:
+ break
+ if ((1 << srcstep) & srcmask) != 0:
break
else:
- srcstep += 1 # advance srcstep
- if not self.srcstep_skip:
- break
- if ((1 << srcstep) & srcmask) != 0:
- break
+ log(" sskip", bin(srcmask), bin(1 << srcstep))
+ SVSTATE.ssubstep = 0b00 # reset
else:
- # advance subvl in *inner* loop
- if end_ssub:
- while True:
- assert srcstep <= vl-1
- end_src = srcstep == vl-1
- if end_src: # end-point
- self.loopend = True
- srcstep = 0
- break
- else:
- srcstep += 1
- if not self.srcstep_skip:
- break
- if ((1 << srcstep) & srcmask) != 0:
- break
- else:
- log(" sskip", bin(srcmask), bin(1 << srcstep))
- self.svstate.ssubstep = SelectableInt(0, 2) # reset
+ # advance ssubstep
+ SVSTATE.ssubstep += 1
+
+ SVSTATE.srcstep = srcstep
+```
+
+-------------
+
+\newpage{}
+
+**dest_iterate**
+
+Note that `dststep` and `dsubstep` are not the absolute final Element
+(and Sub-Element) offsets. `dststep` still has to go through individual
+`REMAP` translation before becoming a per-operand (RT, RS/EA) destination
+Element-level offset, and `dsubstep` may also go through `(f)mv.swizzle`
+reordering.
+
+Note also critically that `UNPACK` mode simply inverts the outer/inner
+loops, making SUBVL the outer loop and VL the inner.
+
+```
+ # dest step iterator
+ vl = SVSTATE.vl
+ subvl = SVSTATE.subvl
+ unpack = SVSTATE.unpack
+ dsubstep = SVSTATE.dsubstep
+ end_dsub = dsubstep == subvl
+ dststep = SVSTATE.dststep
+ end_dst = dststep == vl-1
+ # used below:
+ # dz - from RM.MODE, destination-zeroing
+ # dstmask - from RM.MODE, the destination predicate
+ if unpack:
+ # unpack advances subvl in *outer* loop
+ while True:
+ assert dststep <= vl-1
+ end_dst = dststep == vl-1
+ if end_dst:
+ if end_dsub:
+ loopend = True
+ else:
+ SVSTATE.dsubstep += 1
+ dststep = 0 # reset
+ break
else:
- # advance ssubstep
- self.svstate.ssubstep += SelectableInt(1, 2)
-
- self.svstate.srcstep = SelectableInt(srcstep, 7)
-
- def dst_iterate(): # dest step iterator
- vl = self.svstate.vl
- subvl = self.subvl
- pack = self.svstate.pack
- unpack = self.svstate.unpack
- dsubstep = self.svstate.dsubstep
- end_dsub = dsubstep == subvl
- dststep = self.svstate.dststep
- end_dst = dststep == vl-1
- dstmask = self.dstmask
- # now dest step
- if unpack:
- # unpack advances subvl in *outer* loop
+ dststep += 1 # advance dststep
+ if not dz:
+ break
+ if ((1 << dststep) & dstmask) != 0:
+ break
+ else:
+ # advance subvl in *inner* loop
+ if end_dsub:
while True:
assert dststep <= vl-1
end_dst = dststep == vl-1
- if end_dst:
- if end_dsub:
- self.loopend = True
- else:
- self.svstate.dsubstep += SelectableInt(1, 2)
- dststep = 0 # reset
+ if end_dst: # end-point
+ loopend = True
+ dststep = 0
break
else:
- dststep += 1 # advance dststep
- if not self.dststep_skip:
- break
- if ((1 << dststep) & dstmask) != 0:
- break
+ dststep += 1
+ if not dz:
+ break
+ if ((1 << dststep) & dstmask) != 0:
+ break
+ SVSTATE.dsubstep = 0b00 # reset
else:
- # advance subvl in *inner* loop
- if end_dsub:
- while True:
- assert dststep <= vl-1
- end_dst = dststep == vl-1
- if end_dst: # end-point
- self.loopend = True
- dststep = 0
- break
- else:
- dststep += 1
- if not self.dststep_skip:
- break
- if ((1 << dststep) & dstmask) != 0:
- break
- self.svstate.dsubstep = SelectableInt(0, 2) # reset
- else:
- # advance ssubstep
- self.svstate.dsubstep += SelectableInt(1, 2)
+ # advance dsubstep
+ SVSTATE.dsubstep += 1
+
+ SVSTATE.dststep = dststep
+```
+
+-------------
+
+\newpage{}
- self.svstate.dststep = SelectableInt(dststep, 7)
+**SVSTATE_NEXT**
+```
+ if SVi = 1 then return REMAP SVSHAPE0 current offset
+ if SVi = 2 then return REMAP SVSHAPE1 current offset
+ if SVi = 3 then return REMAP SVSHAPE2 current offset
+ if SVi = 4 then return REMAP SVSHAPE3 current offset
+ if SVi = 5 then return SVSTATE.srcstep # VL source step
+ if SVi = 6 then return SVSTATE.dststep # VL dest step
+ if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step
+ if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step
+
+ # SVi=0, explicit iteration requested
+ src_iterate();
+ dest_iterate();
+ return 0
+```
+
+**at_loopend**
+
+Both Vertical-First and Horizontal-First may use this algorithm to
+determine if the "end-of-looping" (end of Sub-Program-Counter) has
+been reached. Horizontal-First Mode will immediately move to the
+next instruction, where `svstep.` will set `CR0.EQ` to 1.
+
+```
+ # tells if this is the last possible element.
+ subvl = SVSTATE.subvl
+ vl = SVSTATE.vl
+ end_ssub = SVSTATE.ssubstep == subvl
+ end_dsub = SVSTATE.dsubstep == subvl
+ if SVSTATE.srcstep == vl-1 and end_ssub:
+ return True
+ if SVSTATE.dststep == vl-1 and end_dsub:
+ return True
+ return False
```
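The `at_loopend` pseudocode above can be transcribed into runnable Python as a minimal sketch. The `SVState` dataclass is a hypothetical stand-in holding only the fields the pseudocode reads; `subvl` is compared against the substeps exactly as the pseudocode does, so it is assumed to hold the value at which a substep is at its final position.

```python
from dataclasses import dataclass


@dataclass
class SVState:
    """Hypothetical container for the SVSTATE fields used by at_loopend."""
    vl: int
    subvl: int
    srcstep: int = 0
    dststep: int = 0
    ssubstep: int = 0
    dsubstep: int = 0


def at_loopend(state):
    """Return True if either the source or the destination stepping
    is at the last possible element (and sub-element)."""
    end_ssub = state.ssubstep == state.subvl
    end_dsub = state.dsubstep == state.subvl
    if state.srcstep == state.vl - 1 and end_ssub:
        return True
    if state.dststep == state.vl - 1 and end_dsub:
        return True
    return False
```

Note that the test is an "or" across source and destination: reaching the end on either side is sufficient for the loop to be considered finished.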
[[!tag standards]]