by comparison to a single Vector-aware Branch.
Therefore, in order to be commercially competitive, `sv.bc` and
other Vector-aware Branch Conditional instructions are a high priority
-for 3D GPU (and CUDA) workloads.
+for 3D GPU (and OpenCL-style) workloads.
Given that Power ISA v3.0B is already quite powerful, particularly
the Condition Registers and their interaction with Branches, there
SVP64 RM `MODE` (includes `ELWIDTH` and `ELWIDTH_SRC` bits) for Branch
Conditional:
-| 4 | 5 | 6 | 7 | 19 | 20 | 21 | 22 23 | description |
-| - | - | - | - | -- | -- | --- |---------|----------------- |
-|ALL|SNZ| / | / | 0 | 0 | / | LRu sz | normal mode |
-|ALL|SNZ| / |VSb| 0 | 1 | VLI | LRu sz | VLSET mode |
-|ALL|SNZ|CTi| / | 1 | 0 | / | LRu sz | CTR-test mode |
-|ALL|SNZ|CTi|VSb| 1 | 1 | VLI | LRu sz | CTR-test+VLSET mode |
+| 4 | 5 | 6 | 7 | 17 | 18 | 19 | 20 | 21 | 22 23 | description |
+| - | - | - | - | -- | -- | -- | -- | --- |--------|----------------- |
+|ALL|SNZ| / | / | | | 0 | 0 | / | LRu sz | normal mode |
+|ALL|SNZ| / |VSb| | | 0 | 1 | VLI | LRu sz | VLSET mode |
+|ALL|SNZ|CTi| / | | | 1 | 0 | / | LRu sz | CTR-test mode |
+|ALL|SNZ|CTi|VSb| | | 1 | 1 | VLI | LRu sz | CTR-test+VLSET mode |
+
+TODO: bits 17 and 18 for SVSTATE-variant of LR and LRu.
Brief description of fields:
in CTR-test Mode.
LRu and CTR-test modes are where SVP64 Branches subtly differ from
-Scalar v3.0B Branches. `bclr` for example will always update LR, whereas
-`sv.bclr/lru` will only update LR if the branch succeeds.
+Scalar v3.0B Branches. `sv.bcl` for example will always update LR, whereas
+`sv.bcl/lru` will only update LR if the branch succeeds.
Of special interest is that when using ALL Mode (Great Big AND
of all Condition Tests), if `VL=0`,
CTR would be subtracted, in a fully-deterministic and parallel
fashion. A SIMD-based Branch Unit, receiving and processing
multiple CR Fields covered by multiple predicate bits, would
-do the exact same thing.*
+do the exact same thing. Obviously, however, if CTR is modified
+within any given loop (for example by `mtctr`), the behaviour of
+CTR is no longer deterministic.*
## Link Register Update
if ¬predicate_bit & ¬SVRMmode.sz then
if ¬BO[2] & CTRtest & ¬CTi then
CTR = CTR - 1
- stop # instruction finishes here
-if ¬BO[2] & ¬(CTRtest & (cond_ok ^ CTi)) then CTR <- CTR - 1
-lr_ok <- LK
-if ctr_ok & cond_ok then
- if AA then NIA <-iea EXTS(BD || 0b00)
- else NIA <-iea CIA + EXTS(BD || 0b00)
- if SVRMmode.LRu then lr_ok <- ¬lr_ok
-if lr_ok then LR <-iea CIA + 4
+ # instruction finishes here
+else
+ if ¬BO[2] & ¬(CTRtest & (cond_ok ^ CTi)) then CTR <- CTR - 1
+ if VLSET and VSb = (cond_ok & ctr_ok) then
+ if SVRMmode.VLI then SVSTATE.VL = srcstep+1
+ else SVSTATE.VL = srcstep
+ lr_ok <- LK
+ if ctr_ok & cond_ok then
+ if AA then NIA <-iea EXTS(BD || 0b00)
+ else NIA <-iea CIA + EXTS(BD || 0b00)
+ if SVRMmode.LRu then lr_ok <- ¬lr_ok
+ if lr_ok then LR <-iea CIA + 4
```
Below is the pseudocode for SVP64 Branches, which is a little less
cond_ok |= (el_cond_ok & ctr_ok)
# test for VL to be set (and exit)
if VLSET and VSb = (el_cond_ok & ctr_ok) then
- if SVRMmode.VLI
- SVSTATE.VL = srcstep+1
- else
- SVSTATE.VL = srcstep
+ if SVRMmode.VLI then SVSTATE.VL = srcstep+1
+ else SVSTATE.VL = srcstep
break
# early exit?
if SVRMmode.ALL != (el_cond_ok & ctr_ok):
```
# TODO LRu example
-show why LRu would be useful in a loop.
+show why LRu would be useful in a loop. Imagine the following
+C code:
+
+```
+for (int i = 0; i < 8; i++) {
+ if (x < y) break;
+}
+```
+
+Under these circumstances exiting from the loop is not only
+based on CTR: it has also become conditional on a CR result.
+Thus it is desirable that NIA *and* LR only be modified
+if the conditions are met.
+
v3.0 pseudocode for `bclrl`: