CPU cycles not only to run them but also to load the predicate
mask repeatedly for each one. 3D GPU ISAs can test for this scenario
and jump over the fully-masked-out operations, by spotting that
-all Conditions are zero.
+*all* Conditions are false. Or, conversely, they only call the function if at least
+one Condition) is set.
Therefore, in order to be commercially competitive, `sv.bc` and
other Vector-aware Branch Conditional instructions are a high priority
for 3D GPU workloads.
The `BI` field of Branch Conditional operations is five bits, in scalar
-v3.0B this would select one bit of the 32 bit CR. In SVP64 there are
+v3.0B this would select one bit of the 32 bit CR,
+comprising eight CR Fields of 4 bits each. In SVP64 there are
16 32 bit CRs, containing 128 4-bit CR Fields. Therefore, the 2 LSBs of
`BI` select the bit from the CR Field (EQ LT GT SO), and the top 3 bits
are extended to either scalar or vector and to select CR Fields 0..127
NAND and NOR may be synthesised by
inverting `BO[2]` which just leaves two modes:
-* Branch takes place on the first CR test to succeed
+* Branch takes place on the first CR Field test to succeed
(a Great Big OR of all condition tests)
-* Branch takes place only if **all** CR tests succeed:
+* Branch takes place only if **all** CR field tests succeed:
a Great Big AND of all condition tests
(including those where the predicate is masked out
and the corresponding CR Field is considered to be
set to `SNZ`)
-When the CR Fields selected by SVP64 Augmented `BI` is marked as scalar,
-then as usual the loop ends at the first element tested, after taking
-predication into consideration. Thus, as usual, when `sz` is zero, srcstep
+When the CR Fields selected by SVP64-Augmented `BI` is marked as scalar,
+then as the usual SVP64 rules apply,
+the loop ends at the first element tested, after taking
+predication into consideration. Thus, also as usual, when a predicate mask is
+given, and `BI` marked as scalar, and `sz` is zero, srcstep
skips forward to the first non-zero predicated element, and only that
one element is tested.
SVP64 Branch Conditional operations, exactly as they may be applied to
other SVP64 operations. When `sz` is zero, any masked-out Branch-element
operations are not included in condition testing, exactly like all other
-SVP64 operations. This *includes* side-effects such as decrementing of
-CTR, which is also skipped on masked-out CR Field elements, when `sz`
-is zero.
-
-However when `sz` is non-zero, this normally requests insertion of a zero
+SVP64 operations, *including* side-effects such as potentially updating
+LR or CTR, which will also be skipped. There is *one* exception here,
+which is when
+`BO[2]=0, sz=0, CTR-test=0, CTi=1` and the relevant element
+predicate mask bit is also zero:
+under these special circumstances CTR will also decrement.
+
+When `sz` is non-zero, this normally requests insertion of a zero
in place of the input data, when the relevant predicate mask bit is zero.
This would mean that a zero is inserted in place of `CR[BI+32]` for
testing against `BO`, which may not be desirable in all circumstances.
| - | - | - | - | -- | -- | --- |---------|----------------- |
|ALL|LRu| / | / | 0 | 0 | / | SNZ sz | normal mode |
|ALL|LRu| / |VSb| 0 | 1 | VLI | SNZ sz | VLSET mode |
-|ALL|LRu| / | / | 1 | 0 | / | SNZ sz | CTR mode |
-|ALL|LRu| / |VSb| 1 | 1 | VLI | SNZ sz | CTR+VLSET mode |
+|ALL|LRu|CTi| / | 1 | 0 | / | SNZ sz | CTR-test mode |
+|ALL|LRu|CTi|VSb| 1 | 1 | VLI | SNZ sz | CTR-test+VLSET mode |
Fields:
* **VSb** is most relevant for Vertical-First VLSET Mode. After testing,
if VSb is set, VL is truncated if the branch succeeds. If VSb is clear,
VL is truncated if the branch did **not** take place.
+* **CTi** CTR inversion. CTR Mode normally decrements per element
+ tested. CTR inversion decrements if a test *fails*.
-CTR mode will subtract VL from CTR rather than just decrement
-CTR by one. Just as when v3.0B Branch-Conditional saves at
+Normally, CTR mode will decrement once per Condition Test, resulting
+under normal circumstances that CTR reduces by up to VL in Horizontal-First
+Mode. Just as when v3.0B Branch-Conditional saves at
least one instruction on tight inner loops through auto-decrementation
of CTR, likewise it is also possible to save instruction count for
-SVP64 loops in both Vertical-First and Horizontal-First Mode.
-Setting CTR Mode in Vertical-First results in `UNDEFINED`
-behaviour. Given that Vertical-First steps through one element
-at a time, standard single (v3.0B) CTR decrementing should
-correspondingly be used.
-
-Note that, interestingly, due to the useful side-effects of `VLSET` mode
+SVP64 loops in both Vertical-First and Horizontal-First Mode, particularly
+in circumstances where there is conditional interaction between the
+element computation and testing, and the continuation (or otherwise)
+of a given loop. The potential combinations of interactions is why CTR
+testing options have been added.
+
+If both CTR-test and VLSET Modes are requested, then because the CTR decrement is on a per element basis, the total amount that CTR is decremented
+by will end up being VL *after* truncation (should that occur). In
+other words, the order is (as can be seen in pseudocode, below):
+
+1. compute the test
+2. (optionally) decrement CTR
+3. (optionally) truncate VL
+4. decide (based on step 1) whether to terminate looping
+ (including not executing step 5)
+5. decide whether to branch.
+
+CTR-test mode and CTi interaction is as follows: note that
+`BO[2]` is still required to be clear for decrements to be
+considered.
+
+* **CTR-test=0, CTi=0**: CTR decrements on a per-element basis
+ if `BO[2]` is zero. Masked-out elements when `sz=0` are
+ skipped.
+* **CTR-test=0, CTi=1**: CTR decrements on a per-element basis
+ if `BO[2]` is zero and a masked-out element is skipped
+ (`sz=0` and predicate bit is zero). This one special case is the
+ **opposite** of other combinations.
+* **CTR-test=1, CTi=0**: CTR decrements on a per-element basis
+ if `BO[2]` is zero and the Condition Test succeeds.
+ Masked-out elements when `sz=0` are skipped.
+* **CTR-test=1, CTi=1**: CTR decrements on a per-element basis
+ if `BO[2]` is zero and the Condition Test *fails*.
+ Masked-out elements when `sz=0` are skipped.
+
+Note that, interestingly, due to the side-effects of `VLSET` mode
it is actually useful to use Branch Conditional even
to perform no actual branch operation, i.e to point to the instruction
-after the branch.
-If VLSET mode was requested with REMAP, VL will have been set to the
-length of one of the loop endpoints, as specified by the bit from
-the Branch `BI` field.
+after the branch. Truncation of VL would thus conditionally occur yet control
+flow alteration would not.
Also, the unconditional bit `BO[0]` is still relevant when Predication
is applied to the Branch because in `ALL` mode all nonmasked bits have
is used for explicit looping, where the looping is to terminate if the end
of the Vector, VL, is reached. If however that loop is terminated early
because VL is truncated, VLSET with Vertical-First becomes meaningless.
-Therefore, the option to decide whether truncation should occur if the
+Therefore, with `VSb`, the option to decide whether truncation should occur if the
branch succeeds *or* if the branch condition fails allows for flexibility
required.
mode, the Vector CR Field is to be overwritten or not: in some cases it
is useful to know but in others all that is needed is the branch itself.
+*Programming note: One important point is that SVP64 instructions are 64 bit.
+(8 bytes not 4). This needs to be taken into consideration when computing
+branch offsets: the offset is relative to the start of the instruction,
+which includes the SVP64 Prefix*
+
Pseudocode for Horizontal-First Mode:
```
testbit = CRbits[BI & 0b11]
# testbit = CR[BI+32+srcstep*4]
else if not SVRMmode.sz:
+ # inverted CTR test skip mode
+ if ¬BO[2] & CTRtest & ¬CTI then
+ CTR = CTR - 1
continue
else
testbit = SVRMmode.SNZ
CR{SVCRf+srcstep} = CRbits
testbit = CRbits[BI & 0b11]
else if not SVRMmode.sz:
+ # inverted CTR test skip mode
+ if ¬BO[2] & CTRtest & ¬CTI then
+ CTR = CTR - 1
SVSTATE.srcstep = new_srcstep
exit # no branch testing
else
SVSTATE.VL = new_srcstep
```
-v3.0B branch pseudocode including LRu
+v3.0B branch pseudocode including LRu and CTR skipping
```
if (mode_is_64bit) then M <- 0
else M <- 32
-if ¬BO[2] then CTR <- CTR - 1
-ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
cond_ok <- BO[0] | ¬(CR[BI+32] ^ BO[1])
+ctrdec = ¬BO[2]
+if CTRtest & (cond_ok ^ CTi) then
+ ctrdec = 0b0
+if ctrdec then CTR <- CTR - 1
+ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
lr_ok <- SVRMmode.LRu
if ctr_ok & cond_ok then
if AA then NIA <-iea EXTS(BD || 0b00)