From 1800a1e96faad3652632f310d07547ce47ee003a Mon Sep 17 00:00:00 2001 From: Luke Kenneth Casson Leighton Date: Fri, 14 Apr 2023 19:10:32 +0100 Subject: [PATCH] cleanup on svstep pseudocode --- openpower/sv/svstep.mdwn | 359 ++++++++++++++++++++++----------------- 1 file changed, 202 insertions(+), 157 deletions(-) diff --git a/openpower/sv/svstep.mdwn b/openpower/sv/svstep.mdwn index 4a5fc363d..1f19a136d 100644 --- a/openpower/sv/svstep.mdwn +++ b/openpower/sv/svstep.mdwn @@ -75,66 +75,60 @@ the LE bit for the second, the GT bit for the outermost loop and the SO bit set on the very last element, when all loops reach their maximum extent. -*Programmer's note: VL in some situations, particularly larger Matrices -(5x7x3 will set MAXVL=105), -will cause `sv.svstep` to return a considerable number of values. Under -such circumstances `sv.svstep/ew=8` is recommended.* - -*Programmer's note: having conveniently obtained a pre-computed -Schedule with `sv.svstep`, -it may then be used as the input to Indexed REMAP Mode -to achieve the exact same Schedule. It is evident however that +*Programmer's note: VL in some situations, particularly larger +Matrices (5x7x3 will set MAXVL=105), will cause `sv.svstep` to return a +considerable number of values. Under such circumstances `sv.svstep/ew=8` +is recommended.* + +*Programmer's note: having conveniently obtained a pre-computed Schedule +with `sv.svstep`, it may then be used as the input to Indexed REMAP +Mode to achieve the exact same Schedule. It is evident however that before use some of the Indices may be arbitrarily altered as desired. `sv.svstep` helps the programmer avoid having to manually recreate -Indices for certain -types of common Loop patterns. In its simplest form, without REMAP -(SVi=5 or SVi=6), -is equivalent to the `iota` instruction found in other Vector ISAs* +Indices for certain types of common Loop patterns. 
In its simplest form, +without REMAP (SVi=5 or SVi=6), it is equivalent to the `iota` instruction +found in other Vector ISAs* **Vertical First Mode** Vertical First is effectively like an implicit single bit predicate -applied to every SVP64 instruction. **ONLY** one element in each -SVP64 Vector instruction is executed; srcstep and dststep do **not** -increment automatically on completion of one instruction, -and the Program Counter progresses **immediately** to -the next instruction just as it would for any standard scalar v3.0B -instruction. +applied to every SVP64 instruction. **ONLY** one element in each SVP64 +Vector instruction is executed; srcstep and dststep do **not** increment +automatically on completion of one instruction, and the Program Counter +progresses **immediately** to the next instruction just as it would for +any standard scalar v3.0B instruction. -A mode of srcstep (SVi=0) is called which can move srcstep and -dststep on to the next element, still respecting predicate -masks. +A mode of srcstep (SVi=0) is called which can move srcstep and dststep +on to the next element, still respecting predicate masks. In other words, where normal SVP64 Vectorisation acts "horizontally" -by looping first through 0 to VL-1 and only then moving the PC -to the next instruction, Vertical-First moves the PC onwards -(vertically) through multiple instructions **with the same -srcstep and dststep**, then an explict instruction used to -advance srcstep/dststep. An outer loop is expected to be -used (branch instruction) which completes a series of -Vector operations. - -Testing any end condition of any loop of any REMAP state allows branches to be -used to create loops.
- -Programmer's note: when Predicate Non-Zeroing is used this indicates to +by looping first through 0 to VL-1 and only then moving the PC to the +next instruction, Vertical-First moves the PC onwards (vertically) +through multiple instructions **with the same srcstep and dststep**, +then an explicit instruction is used to advance srcstep/dststep. An outer +loop is expected to be used (branch instruction) which completes a series +of Vector operations. + +Testing any end condition of any loop of any REMAP state allows branches +to be used to create loops. + +*Programmer's note: when Predicate Non-Zeroing is used this indicates to the underlying hardware that any masked-out element must be skipped. -*This includes in Vertical-First Mode*, and programmers should be keenly -aware that srcstep or dststep or both *may* jump by more than one as -a result, because the actual request under these circumstances was to execute -on the first available next *non-masked-out* element. It should be -evident that it is the `sv.svstep` instruction that must be Predicated -in order for the **entire** loop to use the Predicate correctly, and -it is strongly recommended for all instructions within the same -Vertical-First Loop to utilise the exact same Predicate Mask(s).* - -Programmers should be aware that VL, srcstep and dststep and -the SUBVL substeps are global in nature. -Nested looping with different schedules is perfectly possible, as is -calling of functions, however SVSTATE (and any associated SVSHAPEs -if REMAP is being used) should -obviously be stored on the stack in order to achieve this benefit -not normally found in Vector ISAs. +*This includes in Vertical-First Mode*, and programmers should be +keenly aware that srcstep or dststep or both *may* jump by more than +one as a result, because the actual request under these circumstances +was to execute on the first available next *non-masked-out* element.
+It should be evident that it is the `sv.svstep` instruction that must +be Predicated in order for the **entire** loop to use the Predicate +correctly, and it is strongly recommended for all instructions within +the same Vertical-First Loop to utilise the exact same Predicate Mask(s).* + +Programmers should be aware that VL, srcstep and dststep and the SUBVL +substeps are global in nature. Nested looping with different schedules +is perfectly possible, as is calling of functions, however SVSTATE +(and any associated SVSHAPEs if REMAP is being used) should obviously +be stored on the stack in order to achieve this benefit not normally +found in Vector ISAs. ------------- @@ -142,132 +136,183 @@ not normally found in Vector ISAs. # Appendix -**SVSTATE_NEXT** +**src_iterate** -``` - if SVi = 1 then return REMAP SVSHAPE0 current offset - if SVi = 2 then return REMAP SVSHAPE1 current offset - if SVi = 3 then return REMAP SVSHAPE2 current offset - if SVi = 4 then return REMAP SVSHAPE3 current offset - if SVi = 5 then return SVSTATE.srcstep - if SVi = 6 then return SVSTATE.dststep - if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step - if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step - # SVi=0, explicit iteration requezted - src_iterate(); - dst_iterate(); - return 0 -``` +Note that `srcstep` and `ssubstep` are not the absolute final Element +(and Sub-Element) offsets. `srcstep` still has to go through individual +`REMAP` translation before becoming a per-operand (RA, RB, RC, RT, RS) +Element-level Source offset. -**ADVANCE_STEPS** +Note also critically that `PACK` mode simply inverts the outer/inner +loops, making SUBVL the outer loop and VL the inner.
``` - def src_iterate(): # source-stepping iterator - subvl = self.subvl - vl = self.svstate.vl - pack = self.svstate.pack - unpack = self.svstate.unpack - ssubstep = self.svstate.ssubstep - end_ssub = ssubstep == subvl - end_src = self.svstate.srcstep == vl-1 - # first source step - srcstep = self.svstate.srcstep - srcmask = self.srcmask - if pack: - # pack advances subvl in *outer* loop + # source-stepping iterator + subvl = SVSTATE.subvl + vl = SVSTATE.vl + pack = SVSTATE.pack + unpack = SVSTATE.unpack + ssubstep = SVSTATE.ssubstep + end_ssub = ssubstep == subvl + end_src = SVSTATE.srcstep == vl-1 + # first source step. + srcstep = SVSTATE.srcstep + # used below: + # sz - from RM.MODE, source-zeroing + # srcmask - from RM.MODE, the source predicate + if pack: + # pack advances subvl in *outer* loop + while True: + assert srcstep <= vl-1 + end_src = srcstep == vl-1 + if end_src: + if end_ssub: + loopend = True + else: + SVSTATE.ssubstep += 1 + srcstep = 0 # reset + break + else: + srcstep += 1 # advance srcstep + if not sz: + break + if ((1 << srcstep) & srcmask) != 0: + break + else: + # advance subvl in *inner* loop + if end_ssub: while True: assert srcstep <= vl-1 end_src = srcstep == vl-1 - if end_src: - if end_ssub: - self.loopend = True - else: - self.svstate.ssubstep += SelectableInt(1, 2) - srcstep = 0 # reset + if end_src: # end-point + loopend = True + srcstep = 0 + break + else: + srcstep += 1 + if not sz: + break + if ((1 << srcstep) & srcmask) != 0: break else: - srcstep += 1 # advance srcstep - if not self.srcstep_skip: - break - if ((1 << srcstep) & srcmask) != 0: - break + log(" sskip", bin(srcmask), bin(1 << srcstep)) + SVSTATE.ssubstep = 0b00 # reset else: - # advance subvl in *inner* loop - if end_ssub: - while True: - assert srcstep <= vl-1 - end_src = srcstep == vl-1 - if end_src: # end-point - self.loopend = True - srcstep = 0 - break - else: - srcstep += 1 - if not self.srcstep_skip: - break - if ((1 << srcstep) & srcmask) != 0: - break - 
else: - log(" sskip", bin(srcmask), bin(1 << srcstep)) - self.svstate.ssubstep = SelectableInt(0, 2) # reset + # advance ssubstep + SVSTATE.ssubstep += 1 + + SVSTATE.srcstep = srcstep +``` + +------------- + +\newpage{} + +**dest_iterate** + +Note that `dststep` and `dsubstep` are not the absolute final Element +(and Sub-Element) offsets. `dststep` still has to go through individual +`REMAP` translation before becoming a per-operand (RT, RS/EA) destination +Element-level offset, and `dsubstep` may also go through `(f)mv.swizzle` +reordering. + +Note also critically that `UNPACK` mode simply inverts the outer/inner +loops, making SUBVL the outer loop and VL the inner. + +``` + # dest step iterator + vl = SVSTATE.vl + subvl = SVSTATE.subvl + unpack = SVSTATE.unpack + dsubstep = SVSTATE.dsubstep + end_dsub = dsubstep == subvl + dststep = SVSTATE.dststep + end_dst = dststep == vl-1 + # used below: + # dz - from RM.MODE, destination-zeroing + # dstmask - from RM.MODE, the destination predicate + if unpack: + # unpack advances subvl in *outer* loop + while True: + assert dststep <= vl-1 + end_dst = dststep == vl-1 + if end_dst: + if end_dsub: + loopend = True + else: + SVSTATE.dsubstep += 1 + dststep = 0 # reset + break else: - # advance ssubstep - self.svstate.ssubstep += SelectableInt(1, 2) - - self.svstate.srcstep = SelectableInt(srcstep, 7) - - def dst_iterate(): # dest step iterator - vl = self.svstate.vl - subvl = self.subvl - pack = self.svstate.pack - unpack = self.svstate.unpack - dsubstep = self.svstate.dsubstep - end_dsub = dsubstep == subvl - dststep = self.svstate.dststep - end_dst = dststep == vl-1 - dstmask = self.dstmask - # now dest step - if unpack: - # unpack advances subvl in *outer* loop + dststep += 1 # advance dststep + if not dz: + break + if ((1 << dststep) & dstmask) != 0: + break + else: + # advance subvl in *inner* loop + if end_dsub: while True: assert dststep <= vl-1 end_dst = dststep == vl-1 - if end_dst: - if end_dsub: - self.loopend = True
- else: - self.svstate.dsubstep += SelectableInt(1, 2) - dststep = 0 # reset + if end_dst: # end-point + loopend = True + dststep = 0 break else: - dststep += 1 # advance dststep - if not self.dststep_skip: - break - if ((1 << dststep) & dstmask) != 0: - break + dststep += 1 + if not dz: + break + if ((1 << dststep) & dstmask) != 0: + break + SVSTATE.dsubstep = 0b00 # reset else: - # advance subvl in *inner* loop - if end_dsub: - while True: - assert dststep <= vl-1 - end_dst = dststep == vl-1 - if end_dst: # end-point - self.loopend = True - dststep = 0 - break - else: - dststep += 1 - if not self.dststep_skip: - break - if ((1 << dststep) & dstmask) != 0: - break - self.svstate.dsubstep = SelectableInt(0, 2) # reset - else: - # advance ssubstep - self.svstate.dsubstep += SelectableInt(1, 2) + # advance dsubstep + SVSTATE.dsubstep += 1 + + SVSTATE.dststep = dststep +``` + +------------- + +\newpage{} - self.svstate.dststep = SelectableInt(dststep, 7) +**SVSTATE_NEXT** +``` + if SVi = 1 then return REMAP SVSHAPE0 current offset + if SVi = 2 then return REMAP SVSHAPE1 current offset + if SVi = 3 then return REMAP SVSHAPE2 current offset + if SVi = 4 then return REMAP SVSHAPE3 current offset + if SVi = 5 then return SVSTATE.srcstep # VL source step + if SVi = 6 then return SVSTATE.dststep # VL dest step + if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step + if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step + + # SVi=0, explicit iteration requested + src_iterate(); + dst_iterate(); + return 0 +``` + +**at_loopend** + +Both Vertical-First and Horizontal-First may use this algorithm to +determine if the "end-of-looping" (end of Sub-Program-Counter) has +been reached. Horizontal-First Mode will immediately move to the +next instruction, where `svstep.` will set `CR0.EQ` to 1. + +``` + # tells if this is the last possible element.
+ subvl = SVSTATE.subvl + vl = SVSTATE.vl + end_ssub = SVSTATE.ssubstep == subvl + end_dsub = SVSTATE.dsubstep == subvl + if SVSTATE.srcstep == vl-1 and end_ssub: + return True + if SVSTATE.dststep == vl-1 and end_dsub: + return True + return False ``` [[!tag standards]] -- 2.30.2
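
The `at_loopend` pseudocode added by this patch can be tried out as plain Python. The following is only a minimal sketch for experimentation: the `SVState` dataclass and its plain-integer fields are hypothetical stand-ins (the real SVSTATE is a packed register), and `subvl` is assumed to hold the encoded value such that `ssubstep == subvl` marks the last sub-element, as the comparison in the pseudocode implies.

```python
from dataclasses import dataclass


@dataclass
class SVState:
    """Hypothetical stand-in for SVSTATE; field names follow the pseudocode."""
    vl: int            # Vector Length
    subvl: int         # assumed: encoded SUBVL (last valid sub-step index)
    srcstep: int = 0
    dststep: int = 0
    ssubstep: int = 0
    dsubstep: int = 0


def at_loopend(state: SVState) -> bool:
    """Return True if either source or destination is on its last element."""
    end_ssub = state.ssubstep == state.subvl
    end_dsub = state.dsubstep == state.subvl
    if state.srcstep == state.vl - 1 and end_ssub:
        return True
    if state.dststep == state.vl - 1 and end_dsub:
        return True
    return False


# Usage: VL=4, no sub-vectors (subvl encoded as 0).
start = SVState(vl=4, subvl=0)
last = SVState(vl=4, subvl=0, srcstep=3, dststep=3)
# Source is on its last element but not yet on its last sub-element.
mid_sub = SVState(vl=4, subvl=1, srcstep=3, ssubstep=0)
```

Note that the check fires when either the source or the destination side reaches its final step, so with differing predicate masks the two sides need not finish on the same call.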