From 5ca41cc07cad5dca7a71bdd6e2ed3fca15659ee9 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Thu, 26 Aug 2021 14:27:58 +0100
Subject: [PATCH]

---
 openpower/sv/branches.mdwn | 39 ++++++++++++++++++++++++++++++--------
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/openpower/sv/branches.mdwn b/openpower/sv/branches.mdwn
index f61e28e0a..6fdaba7a1 100644
--- a/openpower/sv/branches.mdwn
+++ b/openpower/sv/branches.mdwn
@@ -34,7 +34,8 @@ instructions may be masked out to `nop`, and it would waste
 CPU cycles not only to run them but also to load the predicate
 mask repeatedly for each one. 3D GPU ISAs can test for this scenario
 and jump over the fully-masked-out operations, by spotting that
-all Conditions are zero.
+*all* Conditions are false. Or, conversely, they only call the function if at least
+one Condition is set.
 Therefore, in order to be commercially competitive, `sv.bc` and
 other Vector-aware Branch Conditional instructions are a high priority
 for 3D GPU workloads.
@@ -52,16 +53,17 @@ AND, OR, NAND and NOR of all Conditions.
 NAND and NOR may be synthesised by inverting `BO[2]` which just leaves
 two modes:
 
-* Branch takes place on the first CR test to succeed
+* Branch takes place on the first CR Field test to succeed
   (a Great Big OR of all condition tests)
-* Branch takes place only if **all** CR tests succeed:
+* Branch takes place only if **all** CR Field tests succeed:
   a Great Big AND of all condition tests
   (including those where the predicate is masked out
   and the corresponding CR Field is considered to be
   set to `SNZ`)
 
 When the CR Fields selected by SVP64-Augmented `BI` is marked as scalar,
-then as usual the loop ends at the first element tested, after taking
+then as the usual SVP64 rules apply,
+the loop ends at the first element tested, after taking
 predication into consideration. Thus, also as usual, when a predicate mask
 is given, and `BI` marked as scalar, and `sz` is zero, srcstep
 skips forward to the first non-zero predicated element, and only that
@@ -113,8 +115,8 @@ Conditional:
 | - | - | - | - | -- | -- | --- |---------|----------------- |
 |ALL|LRu| / | / | 0 | 0 | / | SNZ sz | normal mode |
 |ALL|LRu| / |VSb| 0 | 1 | VLI | SNZ sz | VLSET mode |
-|ALL|LRu| / | / | 1 | 0 | / | SNZ sz | CTR mode |
-|ALL|LRu| / |VSb| 1 | 1 | VLI | SNZ sz | CTR+VLSET mode |
+|ALL|LRu|Csk| / | 1 | 0 | / | SNZ sz | CTR mode |
+|ALL|LRu|Csk|VSb| 1 | 1 | VLI | SNZ sz | CTR+VLSET mode |
 
 Fields:
 
@@ -138,8 +140,10 @@ Fields:
 * **VSb** is most relevant for Vertical-First VLSET Mode. After testing,
   if VSb is set, VL is truncated if the branch succeeds. If VSb is clear,
   VL is truncated if the branch did **not** take place.
+* **Csk** CTR skipping. CTR Mode normally subtracts VL from CTR.
+  Csk refines that further.
 
-CTR mode will subtract VL from CTR rather than just decrement
+Normally, CTR mode will subtract VL from CTR rather than just decrement
 CTR by one. Just as when v3.0B Branch-Conditional saves at
 least one instruction on tight inner loops through auto-decrementation
 of CTR, likewise it is also possible to save instruction count for
@@ -149,9 +153,28 @@ behaviour. Given that Vertical-First steps through one element at a time,
 standard single (v3.0B) CTR decrementing should correspondingly be
 used instead.
 
-If CTR+VLSET Modes are requested, the amount that CTR is decremented
+If both CTR+VLSET Modes are requested, the amount that CTR is decremented
 by is the value of VL *after* truncation (should that occur).
 
+Enabling CTR Skipping (Csk) has a number of options, which need explaining:
+
+* **Standard SVP64 CTR Mode** Csk=0, sz=0, no predicate specified.
+  VL will be subtracted from CTR (as already explained above).
+* **Predicated CTR Mode** Csk=1, predicate is specified.
+  Masked-out elements are *not included* in the
+  count subtracted from CTR. If VL=3 but the predicate mask
+  is 0b101 and all CR Field Conditions are tested then CTR
+  will be reduced by two, *not* three (because only 2 predicate
+  mask bits are enabled). This includes when sz=1.
+* **Non-predicated CTR Skip Mode**, Csk=1, sz=0, no
+  predicate specified.
+  Only those elements which pass the Condition Test (in
+  both ALL or ANY mode) will be subtracted from CTR.
+* **Non-predicated CTR Skip inverted**, Csk=1, sz=1,
+  no predicate specified.
+  Only those elements which **fail** the Condition
+  test will be subtracted from CTR.
+
 Note that, interestingly, due to the side-effects of `VLSET` mode
 it is actually useful to use Branch Conditional even to perform
 no actual branch operation, i.e to point to the instruction
-- 
2.30.2
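
Reviewer note: the four Csk cases added to `branches.mdwn` above can be read as a small counting rule. The following is a minimal Python sketch of that rule, not the specification's pseudocode. The function name `ctr_decrement`, its parameters, and the convention that bit `i` of the predicate mask corresponds to element `i` are all assumptions made for illustration. The patch does not state what happens when Csk=0 and a predicate is given; the sketch assumes VL is subtracted in that case, following the "normally subtracts VL" wording. The caller is assumed to pass VL *after* any VLSET truncation.

```python
def ctr_decrement(vl, csk, sz, pred_mask=None, cond_pass=None):
    """Hypothetical model: amount subtracted from CTR in Horizontal-First CTR mode.

    vl        -- Vector Length (after VLSET truncation, if any)
    csk       -- CTR-skip bit (0 or 1)
    sz        -- sz bit (0 or 1)
    pred_mask -- integer predicate mask, or None if no predicate is specified
                 (assumed: bit i corresponds to element i)
    cond_pass -- per-element Condition Test results (list of bool);
                 only needed when csk=1 and no predicate is specified
    """
    if not csk:
        # Standard SVP64 CTR Mode: subtract VL.
        # Assumption: also applies when a predicate is given with Csk=0.
        return vl
    if pred_mask is not None:
        # Predicated CTR Mode: masked-out elements are not counted,
        # regardless of sz.
        return sum(1 for i in range(vl) if (pred_mask >> i) & 1)
    # Non-predicated CTR Skip: sz=0 counts elements that pass the test,
    # sz=1 (inverted) counts elements that fail it.
    count_passes = (sz == 0)
    return sum(1 for i in range(vl) if bool(cond_pass[i]) == count_passes)


# Worked example from the patch text: VL=3, predicate mask 0b101.
print(ctr_decrement(3, csk=1, sz=0, pred_mask=0b101))  # -> 2
```

With `cond_pass=[True, False, True, True]`, VL=4 and no predicate, the same function gives 3 for sz=0 and 1 for sz=1, matching the "Skip" and "Skip inverted" bullets.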