cleanup on svstep pseudocode
author Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Fri, 14 Apr 2023 18:10:32 +0000 (19:10 +0100)
committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Fri, 14 Apr 2023 18:10:32 +0000 (19:10 +0100)
openpower/sv/svstep.mdwn

index 4a5fc363d1bae3a603b9879dffa33eb91b77a119..1f19a136dbc484cb870d6705fd1c35ac917db35b 100644 (file)
@@ -75,66 +75,60 @@ the LE bit for the second, the GT bit for the outermost loop and the
 SO bit set on the very last element, when all loops reach their maximum
 extent.
 
-*Programmer's note: VL in some situations, particularly larger Matrices
-(5x7x3 will set MAXVL=105),
-will cause `sv.svstep` to return a considerable number of values. Under
-such circumstances `sv.svstep/ew=8` is recommended.*
-
-*Programmer's note: having conveniently obtained a pre-computed
-Schedule with `sv.svstep`,
-it may then be used as the input to Indexed REMAP Mode
-to achieve the exact same Schedule. It is evident however that
+*Programmer's note: VL in some situations, particularly larger
+Matrices (5x7x3 will set MAXVL=105), will cause `sv.svstep` to return a
+considerable number of values. Under such circumstances `sv.svstep/ew=8`
+is recommended.*
+
+*Programmer's note: having conveniently obtained a pre-computed Schedule
+with `sv.svstep`, it may then be used as the input to Indexed REMAP
+Mode to achieve the exact same Schedule. It is evident however that
 before use some of the Indices may be arbitrarily altered as desired.
 `sv.svstep` helps the programmer avoid having to manually recreate
-Indices for certain
-types of common Loop patterns. In its simplest form, without REMAP
-(SVi=5 or SVi=6),
-is equivalent to the `iota` instruction found in other Vector ISAs*
+Indices for certain types of common Loop patterns. In its simplest form,
+without REMAP (SVi=5 or SVi=6), it is equivalent to the `iota` instruction
+found in other Vector ISAs.*
 
 **Vertical First Mode**
 
 Vertical First is effectively like an implicit single bit predicate
-applied to every SVP64 instruction.  **ONLY** one element in each
-SVP64 Vector instruction is executed; srcstep and dststep do **not**
-increment automatically on completion of one instruction,
-and the Program Counter progresses **immediately** to
-the next instruction just as it would for any standard scalar v3.0B
-instruction.
+applied to every SVP64 instruction.  **ONLY** one element in each SVP64
+Vector instruction is executed; srcstep and dststep do **not** increment
+automatically on completion of one instruction, and the Program Counter
+progresses **immediately** to the next instruction just as it would for
+any standard scalar v3.0B instruction.
 
-A mode of srcstep (SVi=0) is called which can move srcstep and
-dststep on to the next element, still respecting predicate
-masks.
+A mode of `svstep` (SVi=0) is called which can move srcstep and dststep
+on to the next element, still respecting predicate masks.
 
 In other words, where normal SVP64 Vectorisation acts "horizontally"
-by looping first through 0 to VL-1 and only then moving the PC
-to the next instruction, Vertical-First moves the PC onwards
-(vertically) through multiple instructions **with the same
-srcstep and dststep**, then an explict instruction used to
-advance srcstep/dststep. An outer loop is expected to be
-used (branch instruction) which completes a series of
-Vector operations.
-
-Testing any end condition of any loop of any REMAP state allows branches to be
-used to create loops.
-
-Programmer's note: when Predicate Non-Zeroing is used this indicates to
+by looping first through 0 to VL-1 and only then moving the PC to the
+next instruction, Vertical-First moves the PC onwards (vertically)
+through multiple instructions **with the same srcstep and dststep**,
+then an explicit instruction is used to advance srcstep/dststep. An outer
+loop is expected to be used (branch instruction) which completes a series
+of Vector operations.
+
+Testing any end condition of any loop of any REMAP state allows branches
+to be used to create loops.
+
+*Programmer's note: when Predicate Non-Zeroing is used this indicates to
 the underlying hardware that any masked-out element must be skipped.
-*This includes in Vertical-First Mode*, and programmers should be keenly
-aware that srcstep or dststep or both *may* jump by more than one as
-a result, because the actual request under these circumstances was to execute
-on the first available next *non-masked-out* element.  It should be
-evident that it is the `sv.svstep` instruction that must be Predicated
-in order for the **entire** loop to use the Predicate correctly, and
-it is strongly recommended for all instructions within the same
-Vertical-First Loop to utilise the exact same Predicate Mask(s).*
-
-Programmers should be aware that VL, srcstep and dststep and
-the SUBVL substeps are global in nature.
-Nested looping with different schedules is perfectly possible, as is
-calling of functions, however SVSTATE (and any associated SVSHAPEs
-if REMAP is being used) should
-obviously be stored on the stack in order to achieve this benefit
-not normally found in Vector ISAs.
+*This includes in Vertical-First Mode*, and programmers should be
+keenly aware that srcstep or dststep or both *may* jump by more than
+one as a result, because the actual request under these circumstances
+was to execute on the first available next *non-masked-out* element.
+It should be evident that it is the `sv.svstep` instruction that must
+be Predicated in order for the **entire** loop to use the Predicate
+correctly, and it is strongly recommended for all instructions within
+the same Vertical-First Loop to utilise the exact same Predicate Mask(s).*
+
+Programmers should be aware that VL, srcstep and dststep and the SUBVL
+substeps are global in nature.  Nested looping with different schedules
+is perfectly possible, as is calling of functions, however SVSTATE
+(and any associated SVSHAPEs if REMAP is being used) should obviously
+be stored on the stack in order to achieve this benefit not normally
+found in Vector ISAs.
 
 -------------
 
@@ -142,132 +136,183 @@ not normally found in Vector ISAs.
 
 # Appendix
 
-**SVSTATE_NEXT**
+**src_iterate**
 
-```
-    if SVi = 1 then return REMAP SVSHAPE0 current offset
-    if SVi = 2 then return REMAP SVSHAPE1 current offset
-    if SVi = 3 then return REMAP SVSHAPE2 current offset
-    if SVi = 4 then return REMAP SVSHAPE3 current offset
-    if SVi = 5 then return SVSTATE.srcstep
-    if SVi = 6 then return SVSTATE.dststep
-    if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step
-    if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step
-    # SVi=0, explicit iteration requezted
-    src_iterate();
-    dst_iterate();
-    return 0
-```
+Note that `srcstep` and `ssubstep` are not the absolute final Element
+(and Sub-Element) offsets.  `srcstep` still has to go through individual
+`REMAP` translation before becoming a per-operand (RA, RB, RC, RT, RS)
+Element-level Source offset.
 
-**ADVANCE_STEPS**
+Note also critically that `PACK` mode simply inverts the outer/inner
+loops, making SUBVL the outer loop and VL the inner.
 
 ```
-    def src_iterate(): # source-stepping iterator
-        subvl = self.subvl
-        vl = self.svstate.vl
-        pack = self.svstate.pack
-        unpack = self.svstate.unpack
-        ssubstep = self.svstate.ssubstep
-        end_ssub = ssubstep == subvl
-        end_src = self.svstate.srcstep == vl-1
-        # first source step
-        srcstep = self.svstate.srcstep
-        srcmask = self.srcmask
-        if pack:
-            # pack advances subvl in *outer* loop
+    # source-stepping iterator
+    subvl = SVSTATE.subvl
+    vl = SVSTATE.vl
+    pack = SVSTATE.pack
+    unpack = SVSTATE.unpack
+    ssubstep = SVSTATE.ssubstep
+    end_ssub = ssubstep == subvl
+    end_src = SVSTATE.srcstep == vl-1
+    # first source step.
+    srcstep = SVSTATE.srcstep
+    # used below:
+    #       sz      - from RM.MODE, source-zeroing
+    #       srcmask - from RM.MODE, the source predicate
+    if pack:
+        # pack advances subvl in *outer* loop
+        while True:
+            assert srcstep <= vl-1
+            end_src = srcstep == vl-1
+            if end_src:
+                if end_ssub:
+                    loopend = True
+                else:
+                    SVSTATE.ssubstep += 1
+                srcstep = 0  # reset
+                break
+            else:
+                srcstep += 1  # advance srcstep
+                if not sz:
+                    break
+                if ((1 << srcstep) & srcmask) != 0:
+                    break
+    else:
+        # advance subvl in *inner* loop
+        if end_ssub:
             while True:
                 assert srcstep <= vl-1
                 end_src = srcstep == vl-1
-                if end_src:
-                    if end_ssub:
-                        self.loopend = True
-                    else:
-                        self.svstate.ssubstep += SelectableInt(1, 2)
-                    srcstep = 0  # reset
+                if end_src:  # end-point
+                    loopend = True
+                    srcstep = 0
+                    break
+                else:
+                    srcstep += 1
+                if not sz:
+                    break
+                if ((1 << srcstep) & srcmask) != 0:
                     break
                 else:
-                    srcstep += 1  # advance srcstep
-                    if not self.srcstep_skip:
-                        break
-                    if ((1 << srcstep) & srcmask) != 0:
-                        break
+                    log("      sskip", bin(srcmask), bin(1 << srcstep))
+            SVSTATE.ssubstep = 0b00  # reset
         else:
-            # advance subvl in *inner* loop
-            if end_ssub:
-                while True:
-                    assert srcstep <= vl-1
-                    end_src = srcstep == vl-1
-                    if end_src:  # end-point
-                        self.loopend = True
-                        srcstep = 0
-                        break
-                    else:
-                        srcstep += 1
-                    if not self.srcstep_skip:
-                        break
-                    if ((1 << srcstep) & srcmask) != 0:
-                        break
-                    else:
-                        log("      sskip", bin(srcmask), bin(1 << srcstep))
-                self.svstate.ssubstep = SelectableInt(0, 2)  # reset
+            # advance ssubstep
+            SVSTATE.ssubstep += 1
+
+    SVSTATE.srcstep = srcstep
+```
+
+-------------
+
+\newpage{}
+
+**dest_iterate**
+
+Note that `dststep` and `dsubstep` are not the absolute final Element
+(and Sub-Element) offsets.  `dststep` still has to go through individual
+`REMAP` translation before becoming a per-operand (RT, RS/EA) destination
+Element-level offset, and `dsubstep` may also go through `(f)mv.swizzle`
+reordering.
+
+Note also critically that `UNPACK` mode simply inverts the outer/inner
+loops, making SUBVL the outer loop and VL the inner.
+
+```
+    # dest step iterator
+    vl = SVSTATE.vl
+    subvl = SVSTATE.subvl
+    unpack = SVSTATE.unpack
+    dsubstep = SVSTATE.dsubstep
+    end_dsub = dsubstep == subvl
+    dststep = SVSTATE.dststep
+    end_dst = dststep == vl-1
+    # used below:
+    #       dz      - from RM.MODE, destination-zeroing
+    #       dstmask - from RM.MODE, the destination predicate
+    if unpack:
+        # unpack advances subvl in *outer* loop
+        while True:
+            assert dststep <= vl-1
+            end_dst = dststep == vl-1
+            if end_dst:
+                if end_dsub:
+                    loopend = True
+                else:
+                    SVSTATE.dsubstep += 1
+                dststep = 0  # reset
+                break
             else:
-                # advance ssubstep
-                self.svstate.ssubstep += SelectableInt(1, 2)
-
-        self.svstate.srcstep = SelectableInt(srcstep, 7)
-
-    def dst_iterate(): # dest step iterator
-        vl = self.svstate.vl
-        subvl = self.subvl
-        pack = self.svstate.pack
-        unpack = self.svstate.unpack
-        dsubstep = self.svstate.dsubstep
-        end_dsub = dsubstep == subvl
-        dststep = self.svstate.dststep
-        end_dst = dststep == vl-1
-        dstmask = self.dstmask
-        # now dest step
-        if unpack:
-            # unpack advances subvl in *outer* loop
+                dststep += 1  # advance dststep
+                if not dz:
+                    break
+                if ((1 << dststep) & dstmask) != 0:
+                    break
+    else:
+        # advance subvl in *inner* loop
+        if end_dsub:
             while True:
                 assert dststep <= vl-1
                 end_dst = dststep == vl-1
-                if end_dst:
-                    if end_dsub:
-                        self.loopend = True
-                    else:
-                        self.svstate.dsubstep += SelectableInt(1, 2)
-                    dststep = 0  # reset
+                if end_dst:  # end-point
+                    loopend = True
+                    dststep = 0
                     break
                 else:
-                    dststep += 1  # advance dststep
-                    if not self.dststep_skip:
-                        break
-                    if ((1 << dststep) & dstmask) != 0:
-                        break
+                    dststep += 1
+                if not dz:
+                    break
+                if ((1 << dststep) & dstmask) != 0:
+                    break
+            SVSTATE.dsubstep = 0b00  # reset
         else:
-            # advance subvl in *inner* loop
-            if end_dsub:
-                while True:
-                    assert dststep <= vl-1
-                    end_dst = dststep == vl-1
-                    if end_dst:  # end-point
-                        self.loopend = True
-                        dststep = 0
-                        break
-                    else:
-                        dststep += 1
-                    if not self.dststep_skip:
-                        break
-                    if ((1 << dststep) & dstmask) != 0:
-                        break
-                self.svstate.dsubstep = SelectableInt(0, 2)  # reset
-            else:
-                # advance ssubstep
-                self.svstate.dsubstep += SelectableInt(1, 2)
+            # advance dsubstep
+            SVSTATE.dsubstep += 1
+
+    SVSTATE.dststep = dststep
+```
+
+-------------
+
+\newpage{}
 
-        self.svstate.dststep = SelectableInt(dststep, 7)
+**SVSTATE_NEXT**
 
+```
+    if SVi = 1 then return REMAP SVSHAPE0 current offset
+    if SVi = 2 then return REMAP SVSHAPE1 current offset
+    if SVi = 3 then return REMAP SVSHAPE2 current offset
+    if SVi = 4 then return REMAP SVSHAPE3 current offset
+    if SVi = 5 then return SVSTATE.srcstep  # VL source step
+    if SVi = 6 then return SVSTATE.dststep  # VL dest step
+    if SVi = 7 then return SVSTATE.ssubstep # SUBVL source step
+    if SVi = 8 then return SVSTATE.dsubstep # SUBVL dest step
+
+    # SVi=0, explicit iteration requested
+    src_iterate();
+    dst_iterate();
+    return 0
+```
+
+**at_loopend**
+
+Both Vertical-First and Horizontal-First may use this algorithm to
+determine if the "end-of-looping" (end of Sub-Program-Counter) has
+been reached.  Horizontal-First Mode will immediately move to the
+next instruction, where `svstep.` will set `CR0.EQ` to 1.
+
+```
+    # tells if this is the last possible element.
+    subvl = SVSTATE.subvl
+    vl = SVSTATE.vl
+    end_ssub = SVSTATE.ssubstep == subvl
+    end_dsub = SVSTATE.dsubstep == subvl
+    if SVSTATE.srcstep == vl-1 and end_ssub:
+        return True
+    if SVSTATE.dststep == vl-1 and end_dsub:
+        return True
+    return False
 ```
 
 [[!tag standards]]
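
Note on the non-pack path of `src_iterate`: the following is a minimal Python sketch, not part of the commit, that may help when reading the pseudocode above. It models only the source side with SUBVL in the inner loop. The names `State`, `walk` and `skip_masked` are hypothetical; `skip_masked` stands in for whichever condition enables predicate skipping (the earlier revision called it `srcstep_skip`). As in the pseudocode, `ssubstep` is compared directly against `subvl`, so sub-steps run from 0 to `subvl` inclusive.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical stand-in for the SVSTATE fields used by src_iterate."""
    vl: int
    subvl: int          # compared directly against ssubstep, as in the pseudocode
    srcstep: int = 0
    ssubstep: int = 0
    loopend: bool = False


def src_iterate(st, srcmask=-1, skip_masked=False):
    """Advance one step on the non-pack path (SUBVL is the inner loop)."""
    if st.ssubstep != st.subvl:
        st.ssubstep += 1            # still inside the sub-vector
        return
    srcstep = st.srcstep
    while True:
        assert srcstep <= st.vl - 1
        if srcstep == st.vl - 1:    # end-point reached
            st.loopend = True
            srcstep = 0
            break
        srcstep += 1
        if not skip_masked:         # no predicate skipping requested
            break
        if (1 << srcstep) & srcmask:
            break                   # found the next non-masked-out element
    st.ssubstep = 0                 # reset the inner loop
    st.srcstep = srcstep


def walk(vl, subvl, srcmask=-1, skip_masked=False):
    """List every (srcstep, ssubstep) pair seen before loopend is set."""
    st = State(vl, subvl)
    visited = []
    while not st.loopend:
        visited.append((st.srcstep, st.ssubstep))
        src_iterate(st, srcmask, skip_masked)
    return visited
```

With skipping enabled and a mask of `0b1001` over four elements, `walk(4, 0, 0b1001, True)` visits only elements 0 and 3, which illustrates how srcstep may jump by more than one in Vertical-First Mode.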