CPU cycles not only to run them but also to load the predicate
mask repeatedly for each one. 3D GPU ISAs can test for this scenario
and jump over the fully-masked-out operations, by spotting that
-all Conditions are zero.
+*all* Conditions are false. Or, conversely, they only call the function if at least
+one Condition) is set.
Therefore, in order to be commercially competitive, `sv.bc` and
other Vector-aware Branch Conditional instructions are a high priority
for 3D GPU workloads.
NAND and NOR may be synthesised by
inverting `BO[2]` which just leaves two modes:
-* Branch takes place on the first CR test to succeed
+* Branch takes place on the first CR Field test to succeed
(a Great Big OR of all condition tests)
-* Branch takes place only if **all** CR tests succeed:
+* Branch takes place only if **all** CR field tests succeed:
a Great Big AND of all condition tests
(including those where the predicate is masked out
and the corresponding CR Field is considered to be
set to `SNZ`)
-When the CR Fields selected by SVP64 Augmented `BI` is marked as scalar,
-then as usual the loop ends at the first element tested, after taking
-predication into consideration. Thus, as usual, when `sz` is zero, srcstep
+When the CR Fields selected by SVP64-Augmented `BI` is marked as scalar,
+then as the usual SVP64 rules apply,
+the loop ends at the first element tested, after taking
+predication into consideration. Thus, also as usual, when a predicate mask is
+given, and `BI` marked as scalar, and `sz` is zero, srcstep
skips forward to the first non-zero predicated element, and only that
one element is tested.
SVP64 Branch Conditional operations, exactly as they may be applied to
other SVP64 operations. When `sz` is zero, any masked-out Branch-element
operations are not included in condition testing, exactly like all other
-SVP64 operations. This *includes* side-effects such as decrementing of
-CTR, which is also skipped on masked-out CR Field elements, when `sz`
-is zero.
-
-However when `sz` is non-zero, this normally requests insertion of a zero
+SVP64 operations, *including* side-effects such as potentially updating
+LR or CTR, which will also be skipped. There is *one* exception here,
+which is when
+`BO[2]=0, sz=0, CTR-test=0, CTi=1` and the relevant element
+predicate mask bit is also zero:
+under these special circumstances CTR will also decrement.
+
+When `sz` is non-zero, this normally requests insertion of a zero
in place of the input data, when the relevant predicate mask bit is zero.
This would mean that a zero is inserted in place of `CR[BI+32]` for
testing against `BO`, which may not be desirable in all circumstances.
| - | - | - | - | -- | -- | --- |---------|----------------- |
|ALL|LRu| / | / | 0 | 0 | / | SNZ sz | normal mode |
|ALL|LRu| / |VSb| 0 | 1 | VLI | SNZ sz | VLSET mode |
-|ALL|LRu| / | / | 1 | 0 | / | SNZ sz | CTR mode |
-|ALL|LRu| / |VSb| 1 | 1 | VLI | SNZ sz | CTR+VLSET mode |
+|ALL|LRu|CTi| / | 1 | 0 | / | SNZ sz | CTR-test mode |
+|ALL|LRu|CTi|VSb| 1 | 1 | VLI | SNZ sz | CTR-test+VLSET mode |
Fields:
* **VSb** is most relevant for Vertical-First VLSET Mode. After testing,
if VSb is set, VL is truncated if the branch succeeds. If VSb is clear,
VL is truncated if the branch did **not** take place.
+* **CTi** CTR inversion. CTR Mode normally decrements per element
+ tested. CTR inversion decrements if a test *fails*.
-CTR mode will subtract VL from CTR rather than just decrement
-CTR by one. Just as when v3.0B Branch-Conditional saves at
+Normally, CTR mode will decrement once per Condition Test, resulting
+under normal circumstances that CTR reduces by up to VL in Horizontal-First
+Mode. Just as when v3.0B Branch-Conditional saves at
least one instruction on tight inner loops through auto-decrementation
of CTR, likewise it is also possible to save instruction count for
-SVP64 loops in both Vertical-First and Horizontal-First Mode.
-Setting CTR Mode in Vertical-First results in `UNDEFINED`
-behaviour. Given that Vertical-First steps through one element
-at a time, standard single (v3.0B) CTR decrementing should
-correspondingly be used instead.
-
-If CTR+VLSET Modes are requested, the amount that CTR is decremented
-by is the value of VL *after* truncation (should that occur).
+SVP64 loops in both Vertical-First and Horizontal-First Mode, particularly
+in circumstances where there is conditional interaction between the
+element computation and testing, and the continuation (or otherwise)
+of a given loop. The potential combinations of interactions is why CTR
+testing options have been added.
+
+If both CTR-test and VLSET Modes are requested, then because the CTR decrement is on a per element basis, the total amount that CTR is decremented
+by will end up being VL *after* truncation (should that occur). In
+other words, the order is (as can be seen in pseudocode, below):
+
+1. compute the test
+2. (optionally) decrement CTR
+3. (optionally) truncate VL
+4. decide (based on step 1) whether to terminate looping
+ (including not executing step 5)
+5. decide whether to branch.
+
+CTR-test mode and CTi interaction is as follows: note that
+`BO[2]` is still required to be clear for decrements to be
+considered.
+
+* **CTR-test=0, CTi=0**: CTR decrements on a per-element basis
+ if `BO[2]` is zero. Masked-out elements when `sz=0` are
+ skipped.
+* **CTR-test=0, CTi=1**: CTR decrements on a per-element basis
+ if `BO[2]` is zero and a masked-out element is skipped
+ (`sz=0` and predicate bit is zero). This one special case is the
+ **opposite** of other combinations.
+* **CTR-test=1, CTi=0**: CTR decrements on a per-element basis
+ if `BO[2]` is zero and the Condition Test succeeds.
+ Masked-out elements when `sz=0` are skipped.
+* **CTR-test=1, CTi=1**: CTR decrements on a per-element basis
+ if `BO[2]` is zero and the Condition Test *fails*.
+ Masked-out elements when `sz=0` are skipped.
Note that, interestingly, due to the side-effects of `VLSET` mode
it is actually useful to use Branch Conditional even
testbit = CRbits[BI & 0b11]
# testbit = CR[BI+32+srcstep*4]
else if not SVRMmode.sz:
+ # inverted CTR test skip mode
+ if ¬BO[2] & CTRtest & ¬CTI then
+ CTR = CTR - 1
continue
else
testbit = SVRMmode.SNZ
CR{SVCRf+srcstep} = CRbits
testbit = CRbits[BI & 0b11]
else if not SVRMmode.sz:
+ # inverted CTR test skip mode
+ if ¬BO[2] & CTRtest & ¬CTI then
+ CTR = CTR - 1
SVSTATE.srcstep = new_srcstep
exit # no branch testing
else
SVSTATE.VL = new_srcstep
```
-v3.0B branch pseudocode including LRu
+v3.0B branch pseudocode including LRu and CTR skipping
```
if (mode_is_64bit) then M <- 0
else M <- 32
-if ¬BO[2] then CTR <- CTR - 1
-ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
cond_ok <- BO[0] | ¬(CR[BI+32] ^ BO[1])
+ctrdec = ¬BO[2]
+if CTRtest & (cond_ok ^ CTi) then
+ ctrdec = 0b0
+if ctrdec then CTR <- CTR - 1
+ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
lr_ok <- SVRMmode.LRu
if ctr_ok & cond_ok then
if AA then NIA <-iea EXTS(BD || 0b00)