SVP64 Branch Conditional operations, exactly as they may be applied to
other SVP64 operations. When `sz` is zero, any masked-out Branch-element
operations are not included in condition testing, exactly like all other
-SVP64 operations. This *includes* side-effects such as decrementing of
-CTR, which is also skipped on masked-out CR Field elements, when `sz`
-is zero.
-
-However when `sz` is non-zero, this normally requests insertion of a zero
+SVP64 operations, *including* side-effects such as potentially updating
+LR or CTR, which will also be skipped. There is *one* exception:
+when `BO[2]=0, sz=0, CTR-test=0, CTi=1` and the relevant element
+predicate mask bit is also zero, CTR will still decrement.
+
+When `sz` is non-zero, this normally requests insertion of a zero
in place of the input data, when the relevant predicate mask bit is zero.
This would mean that a zero is inserted in place of `CR[BI+32]` for
testing against `BO`, which may not be desirable in all circumstances.
| - | - | - | - | -- | -- | --- |---------|----------------- |
|ALL|LRu| / | / | 0 | 0 | / | SNZ sz | normal mode |
|ALL|LRu| / |VSb| 0 | 1 | VLI | SNZ sz | VLSET mode |
-|ALL|LRu|Csk| / | 1 | 0 | / | SNZ sz | CTR mode |
-|ALL|LRu|Csk|VSb| 1 | 1 | VLI | SNZ sz | CTR+VLSET mode |
+|ALL|LRu|CTi| / | 1 | 0 | / | SNZ sz | CTR-test mode |
+|ALL|LRu|CTi|VSb| 1 | 1 | VLI | SNZ sz | CTR-test+VLSET mode |
Fields:
* **VSb** is most relevant for Vertical-First VLSET Mode. After testing,
if VSb is set, VL is truncated if the branch succeeds. If VSb is clear,
VL is truncated if the branch did **not** take place.
-* **Csk** CTR skipping. CTR Mode normally subtracts VL from CTR.
- Csk refines that further
+* **CTi** CTR inversion. CTR Mode normally decrements per element
+ tested. CTR inversion decrements if a test *fails*.
-Normally, CTR mode will subtract VL from CTR rather than just decrement
-CTR by one. Just as when v3.0B Branch-Conditional saves at
+Normally, CTR mode will decrement once per Condition Test, so that
+under normal circumstances CTR reduces by up to VL in Horizontal-First
+Mode. Just as when v3.0B Branch-Conditional saves at
least one instruction on tight inner loops through auto-decrementation
of CTR, likewise it is also possible to save instruction count for
-SVP64 loops in both Vertical-First and Horizontal-First Mode.
-Setting CTR Mode in Vertical-First results in `UNDEFINED`
-behaviour. Given that Vertical-First steps through one element
-at a time, standard single (v3.0B) CTR decrementing should
-correspondingly be used instead.
-
-If both CTR+VLSET Modes are requested, the amount that CTR is decremented
-by is the value of VL *after* truncation (should that occur).
-
-Enabling CTR Skipping (Csk) has a number of options, which need explaining:
-
-* **Standard SVP64 CTR Mode** Csk=0, sz=0, no predicate specified.
- VL will be subtracted from CTR (as already explained above)
-* **Predicated CTR Mode** Csk=1, predicate is specified.
- Regardless of whether the Condition Test passes or fails,
- masked-out elements are *not included* in the
- count subtracted from CTR. If VL=3 but the predicate mask
- is 0b101 and all CR Field Conditions are tested then CTR
- will be reduced by two, *not* three (because only 2 predicate
- mask bits are enabled). This includes when sz=1.
-* **Non-predicated CTR Skip Mode**, Csk=1, sz=0, no
- predicate specified.
- Only the number of elements which pass the Condition Test (in
- both ALL or ANY mode) will be subtracted from CTR
-* **Non-predicated CTR Skip inverted**, Csk=1, sz=1,
- no predicate specified.
- Only the number of elements which **fail** the Condition
- test will be subtracted from CTR
+SVP64 loops in both Vertical-First and Horizontal-First Mode, particularly
+in circumstances where there is conditional interaction between the
+element computation and testing, and the continuation (or otherwise)
+of a given loop. The potential combinations of interactions are why CTR
+testing options have been added.
+
+If both CTR-test and VLSET Modes are requested, then because the CTR
+decrement is on a per-element basis, the total amount that CTR is
+decremented by will end up being VL *after* truncation (should that
+occur). In other words, the order is (as can be seen in pseudocode, below):
+
+1. compute the test
+2. (optionally) decrement CTR
+3. (optionally) truncate VL
+4. decide (based on step 1) whether to terminate looping
+ (including not executing step 5)
+5. decide whether to branch.
+
+CTR-test mode and CTi interaction is as follows: note that
+`BO[2]` is still required to be clear for decrements to be
+considered.
+
+* **CTR-test=0, CTi=0**: CTR decrements on a per-element basis
+ if `BO[2]` is zero. Masked-out elements when `sz=0` are
+ skipped.
+* **CTR-test=0, CTi=1**: CTR decrements on a per-element basis
+ if `BO[2]` is zero and a masked-out element is skipped
+ (`sz=0` and predicate bit is zero). This one special case is the
+ **opposite** of other combinations.
+* **CTR-test=1, CTi=0**: CTR decrements on a per-element basis
+ if `BO[2]` is zero and the Condition Test succeeds.
+ Masked-out elements when `sz=0` are skipped.
+* **CTR-test=1, CTi=1**: CTR decrements on a per-element basis
+ if `BO[2]` is zero and the Condition Test *fails*.
+ Masked-out elements when `sz=0` are skipped.
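
The four combinations above can be sketched as a single per-element decision. This is a minimal Python illustration and not part of the specification: the helper name `ctr_decrements` and its parameter names are hypothetical. It follows the bullet list as written, and it additionally assumes that when CTR-test=0 an element that is actually tested always decrements CTR (the list does not state this explicitly for CTi=1).

```python
def ctr_decrements(bo2, ctr_test, cti, sz, mask_bit, cond_ok):
    """Return True if CTR would decrement for one element (hypothetical helper)."""
    if bo2:
        # BO[2] set: CTR decrement is never considered
        return False
    # an element is skipped when it is masked out and sz=0
    skipped = (mask_bit == 0) and (sz == 0)
    if skipped:
        # only the CTR-test=0, CTi=1 special case decrements on a skipped element
        return ctr_test == 0 and cti == 1
    if ctr_test == 0:
        # ASSUMPTION: with CTR-test=0, every tested element decrements CTR
        return True
    # CTR-test=1: decrement on success when CTi=0, on failure when CTi=1
    return bool(cond_ok) != bool(cti)


# Example: VL=3, predicate mask 0b101, sz=0, CTR-test=0, CTi=0, all tests pass.
# Only the two enabled elements (bits 0 and 2) contribute a decrement.
mask = 0b101
total = sum(
    ctr_decrements(0, 0, 0, 0, (mask >> i) & 1, 1) for i in range(3)
)
```

In this example `total` is 2, matching the rule that masked-out elements are skipped when `sz=0`. Under the same assumptions, switching to CTi=1 would count the masked-out element as well.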
Note that, interestingly, due to the side-effects of `VLSET` mode
it is actually useful to use Branch Conditional even
testbit = CRbits[BI & 0b11]
# testbit = CR[BI+32+srcstep*4]
else if not SVRMmode.sz:
+ # inverted CTR test skip mode
+ if ¬BO[2] & ¬CTRtest & CTi then
+ CTR = CTR - 1
continue
else
testbit = SVRMmode.SNZ
CR{SVCRf+srcstep} = CRbits
testbit = CRbits[BI & 0b11]
else if not SVRMmode.sz:
+ # inverted CTR test skip mode
+ if ¬BO[2] & ¬CTRtest & CTi then
+ CTR = CTR - 1
SVSTATE.srcstep = new_srcstep
exit # no branch testing
else
SVSTATE.VL = new_srcstep
```
-v3.0B branch pseudocode including LRu
+v3.0B branch pseudocode including LRu and CTR skipping
```
if (mode_is_64bit) then M <- 0
else M <- 32
-if ¬BO[2] then CTR <- CTR - 1
-ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
cond_ok <- BO[0] | ¬(CR[BI+32] ^ BO[1])
+ctrdec = ¬BO[2]
+if CTRtest & ¬(cond_ok ^ CTi) then
+ ctrdec = 0b0
+if ctrdec then CTR <- CTR - 1
+ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
lr_ok <- SVRMmode.LRu
if ctr_ok & cond_ok then
if AA then NIA <-iea EXTS(BD || 0b00)