# SVP64 Branch Conditional behaviour

Please note: although similar, SVP64 Branch instructions should be
considered completely separate and distinct from
standard scalar OpenPOWER-approved v3.0B branches.
**v3.0B branches are in no way impacted, altered,
changed or modified in any way, shape or form by
the SVP64 Vectorised Variants**.

It is also extremely important to note that Branches are the
sole semi-exception in SVP64 to `Scalar Identity Behaviour`.
SVP64 Branches contain additional modes that are useful
for scalar operations (i.e. even when VL=1 or when
using single-bit predication).

* <https://bugs.libre-soc.org/show_bug.cgi?id=664>
* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-August/003416.html>
* [[openpower/isa/branch]]

Scalar 3.0B Branch Conditional operations, `bc`, `bctar` etc. test a
bit of the Condition Register. However, for parallel processing it is
impossible to perform multiple independent branches: the Program Counter
cannot branch to multiple destinations based on multiple conditions.
The best that can be done is
to test multiple Conditions and make a *single* branch decision,
based on analysis of a *Vector* of CR Fields
which have just been calculated from a *Vector* of results.

In 3D Shader binaries, which are inherently parallelised and predicated,
testing all or some results and branching based on multiple tests is
extremely common, and a fundamental part of Shader Compilers. For example,
without such multi-condition
test-and-branch, if a predicate mask is all zeros a large batch of
instructions may be masked out to `nop`, and it would waste
CPU cycles to run them. 3D GPU ISAs can test for this scenario
and, with the appropriate predicate-analysis instruction,
jump over fully-masked-out operations by spotting that
*all* Conditions are false.

Unless Branches are aware of, and capable of, such analysis, additional
instructions would be required to perform Horizontal Cumulative
analysis of Vectorised Condition Register Fields, in order to
reduce the Vector of CR Fields down to one single yes-or-no
decision that a Scalar-only v3.0B Branch-Conditional could cope with.
Such instructions would be unavoidable, and costly
by comparison to a single Vector-aware Branch.
Therefore, in order to be commercially competitive, `sv.bc` and
other Vector-aware Branch Conditional instructions are a high priority
for 3D GPU (and CUDA) workloads.

Given that Power ISA v3.0B is already quite powerful, particularly
the Condition Registers and their interaction with Branches, there
are opportunities to create extremely flexible and compact
Vectorised Branch behaviour. In addition, the side-effects (updating
of CTR, truncation of VL, described below) make it a useful instruction
even if the branch points to the next instruction (no actual branch).

When considering an "array" of branch-tests, there are four
primarily-useful modes:
AND, OR, NAND and NOR of all Conditions.
NAND and NOR may be synthesised from AND and OR by
inverting `BO[1]`, which leaves just two modes:

* Branch takes place on the **first** CR Field test to succeed
  (a Great Big OR of all condition tests)
* Branch takes place only if **all** CR Field tests succeed
  (a Great Big AND of all condition tests)

Early-exit is enacted such that the Vectorised Branch does not
perform needless extra tests, which helps reduce reads on
the Condition Register file.

*Note: Early-exit is **MANDATORY** (required) behaviour.
Branches **MUST** exit at the first sequentially-encountered
test that decides the outcome (the first failure in ALL mode, the
first success otherwise), for
exactly the same reasons for which it is mandatory in
programming languages doing early-exit: to avoid
damaging side-effects and to provide deterministic
behaviour. Speculative testing of Condition
Register Fields is permitted, as is speculative updating
of CTR, as long as, as usual in any Out-of-Order microarchitecture,
that speculative testing is cancelled should an early-exit occur.*

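
A minimal Python sketch of the two reduction modes with mandatory early exit follows. It assumes each CR Field test has already been reduced to a boolean, and `reduce_tests` is a hypothetical name. It also counts how many CR Fields were actually read, which is the saving that early exit provides.

```python
from typing import List, Tuple


def reduce_tests(tests: List[bool], all_mode: bool) -> Tuple[bool, int]:
    """Great Big AND (all_mode=True) or Great Big OR (all_mode=False)
    over per-element Condition tests, exiting at the first element
    that decides the outcome.

    Returns (branch_taken, number_of_CR_Fields_read).
    """
    reads = 0
    for ok in tests:
        reads += 1
        if all_mode and not ok:
            return False, reads  # first failure decides a Great Big AND
        if not all_mode and ok:
            return True, reads   # first success decides a Great Big OR
    # No deciding element found: AND defaults to True, OR to False
    return all_mode, reads


tests = [True, False, True, True]
print(reduce_tests(tests, all_mode=True))   # (False, 2)
print(reduce_tests(tests, all_mode=False))  # (True, 1)
```

In both calls elements 2 and 3 are never read, which is exactly the reduction in Condition Register file reads described above.
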
Also note that when early-exit occurs in Horizontal-First Mode,
srcstep, dststep etc. are all reset, ready to begin looping from the
beginning for the next instruction. However, in Vertical-First
Mode srcstep etc. are incremented "as usual", i.e. the early-exit
has no special impact. This can leave srcstep etc. in an unusual
state on exit from a loop, and it is up to the programmer to
reset srcstep, dststep etc. to known-good values.

Additional useful behaviour involves two primary Modes (both of
which may be enabled and combined):

* **VLSET Mode**: identical to Data-Dependent Fail-First Mode
  for Arithmetic SVP64 operations, with more
  flexibility and a close interaction and integration into the
  underlying base Scalar v3.0B Branch instruction.
  Truncation of VL takes place around the early-exit point.
* **CTR-test Mode**: gives much more flexibility over when and why
  CTR is decremented, including options to decrement if a Condition
  test succeeds *or if it fails*.

With these side-effects, basic Boolean Logic Analysis advises that
it is important to provide a means
to enact each of them based on whether testing succeeds *or fails*. This
results in a not-insignificant number of additional Mode Augmentation bits,
accompanying VLSET and CTR-test Modes respectively.

Predicate skipping or zeroing may, as usual with SVP64, be controlled
by `sz`. Where the predicate is masked out and
zeroing is enabled, the same Boolean Logic Analysis dictates that,
rather than testing only against zero, the option to test
against one is also prudent. This introduces a new
immediate field, `SNZ`, which works in conjunction with `sz`.

Vectorised Branches can be used
in either SVP64 Horizontal-First or Vertical-First Mode. Essentially,
at an element level, the behaviour is identical in both Modes,
although the `ALL` bit is meaningless in Vertical-First Mode.

It is important to bear in mind that, fundamentally, Vectorised
Branch-Conditional is still extremely close to the Scalar v3.0B
Branch-Conditional instructions, and that the same v3.0B Scalar
Branch-Conditional instructions are still
*completely separate and independent*, being unaltered and
unaffected by their SVP64 variants in every conceivable way.

*Programming note: One important point is that SVP64 instructions are
64 bit (8 bytes, not 4). This needs to be taken into consideration when
computing branch offsets: the offset is relative to the start of the
instruction, which **includes** the SVP64 Prefix.*

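
The offset arithmetic can be sketched in Python. This assumes a relative branch (`AA=0`) with the standard 14-bit signed word offset `BD` (byte offset `EXTS(BD || 0b00)`), and `sv_bc_bd` is a hypothetical helper rather than part of any assembler.

```python
def sv_bc_bd(cia: int, target: int) -> int:
    """BD field (14-bit signed word offset) for a relative branch at `cia`.

    For an SVP64 branch, `cia` is the address of the SVP64 Prefix, i.e.
    the start of the whole 8-byte instruction.
    """
    offset = target - cia
    if offset % 4:
        raise ValueError("branch target must be word-aligned")
    bd = offset >> 2
    if not -(1 << 13) <= bd < (1 << 13):
        raise ValueError("target out of range for a 14-bit BD field")
    return bd


# sv.bc at 0x1000 skipping the next two 32-bit instructions: the branch
# itself is 8 bytes long, so the target is 0x1000 + 8 + 2*4 = 0x1010
print(sv_bc_bd(0x1000, 0x1010))  # 4
```

Assuming a 4-byte branch instead would give a target of 0x100C, landing on the second skipped instruction rather than after it.
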
With element-width overrides being meaningless for Condition
Register Fields, bits 4 to 7 of SVP64 RM may be used for additional
Mode bits.

SVP64 RM `MODE` (includes `ELWIDTH` and `ELWIDTH_SRC` bits) for Branch
Conditional:

| 4 | 5 | 6 | 7 | 19 | 20 | 21 | 22 23 | description |
| - | - | - | - | -- | -- | --- |---------|----------------- |
|ALL|SNZ| / | / | 0 | 0 | / | LRu sz | normal mode |
|ALL|SNZ| / |VSb| 0 | 1 | VLI | LRu sz | VLSET mode |
|ALL|SNZ|CTi| / | 1 | 0 | / | LRu sz | CTR-test mode |
|ALL|SNZ|CTi|VSb| 1 | 1 | VLI | LRu sz | CTR-test+VLSET mode |

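
The table above can be decoded with a short Python sketch. It assumes the 24-bit SVP64 RM field uses the Power ISA MSB0 bit numbering (bit 0 is the most significant), and `rm_bit` and `decode_branch_mode` are hypothetical names.

```python
def rm_bit(rm: int, n: int) -> int:
    """Bit n of a 24-bit SVP64 RM field, MSB0 numbering (bit 0 = MSB)."""
    return (rm >> (23 - n)) & 1


def decode_branch_mode(rm: int) -> dict:
    """Decode the Branch-Conditional Mode bits following the table."""
    ctr_test, vlset = rm_bit(rm, 19), rm_bit(rm, 20)
    fields = {
        "ALL": rm_bit(rm, 4),
        "SNZ": rm_bit(rm, 5),
        "CTR-test": ctr_test,
        "VLSET": vlset,
        "LRu": rm_bit(rm, 22),
        "sz": rm_bit(rm, 23),
    }
    # CTi, VSb and VLI only exist in some rows of the table
    if ctr_test:
        fields["CTi"] = rm_bit(rm, 6)
    if vlset:
        fields["VSb"] = rm_bit(rm, 7)
        fields["VLI"] = rm_bit(rm, 21)
    return fields


# VLSET mode with ALL and VLI set: bits 4, 20 and 21
rm = (1 << (23 - 4)) | (1 << (23 - 20)) | (1 << (23 - 21))
print(decode_branch_mode(rm))
```
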
Brief description of fields:

* **sz=1** If predication is enabled and `sz=1` and a predicate
  element bit is zero, `SNZ` will
  be substituted in place of the CR bit selected by `BI`
  as the Condition tested.
  Contrast this with
  normal SVP64 `sz=1` behaviour, where *only* a zero is put in
  place of masked-out predicate bits.
* **sz=0** When `sz=0` skipping occurs as usual on
  masked-out elements, but unlike all
  other SVP64 behaviour, which entirely skips an element with
  no related side-effects at all, there are certain
  special circumstances where CTR
  may be decremented. See CTR-test Mode, below.
* **ALL** When set, all branch conditional tests must pass in order for
  the branch to succeed. When clear, it is the first sequentially
  encountered successful test that causes the branch to succeed.
  This is identical behaviour to how programming languages perform
  early-exit on Boolean Logic chains.
* **VLI** VLSET is identical to Data-dependent Fail-First mode.
  In VLSET mode, VL *may* (depending on `VSb`) be truncated.
  If VLI (Vector Length Inclusive) is clear,
  VL is truncated to *exclude* the current element, otherwise it is
  included. SVSTATE.MVL is not altered: only VL.
* **LRu** Link Register Update. When set, the Link Register will
  only be updated if the Branch Condition succeeds. This avoids
  destruction of LR during loops (particularly Vertical-First
  ones).
* **VSb** In VLSET Mode, after testing,
  if VSb is set, VL is truncated if the test succeeds. If VSb is clear,
  VL is truncated if a test *fails*. Masked-out (skipped)
  elements are not tested.
* **CTi** CTR inversion. CTR-test Mode normally decrements per element
  tested. CTR inversion decrements if a test *fails*. Only relevant
  in CTR-test Mode.

LRu and CTR-test modes are where SVP64 Branches subtly differ from
Scalar v3.0B Branches. `bclr` with `LK=1`, for example, will always
update LR, whereas `sv.bclr/lru` will only update LR if the branch succeeds.

Of special interest is that when using ALL Mode (Great Big AND
of all Condition Tests), if `VL=0`,
which is rare but can occur in Data-Dependent Modes, the Branch
will always take place because there will be no failing Condition
Tests to prevent it. Likewise, when not using ALL Mode (Great Big OR
of all Condition Tests) and `VL=0`, the Branch is guaranteed not
to occur because there will be no *successful* Condition Tests
to make it happen.

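
The same empty-sequence identities hold for Python's built-in `all()` and `any()`, which makes a quick sanity check of the `VL=0` behaviour (a Vector of zero CR Field tests):

```python
tests = []  # VL=0: no Condition Tests at all

print(all(tests))  # True: ALL mode (Great Big AND), branch taken
print(any(tests))  # False: non-ALL mode (Great Big OR), branch not taken
```
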
# Vectorised CR Field numbering, and Scalar behaviour

It is important to keep in mind that, just like all SVP64 instructions,
the `BI` field of the base v3.0B Branch Conditional instruction
may be extended by SVP64 EXTRA augmentation, as well as be marked
as either Scalar or Vector. It is also crucially important to keep in mind
that for CRs, SVP64 sequentially increments the CR *Field* numbers.
CR *Fields* are treated as elements, not bit-numbers of the CR *register*.

The `BI` operand of Branch Conditional operations is five bits: in scalar
v3.0B it selects one bit of the 32-bit CR,
comprising eight CR Fields of 4 bits each. In SVP64 there are
16 32-bit CRs, containing 128 4-bit CR Fields. Therefore, the 2 LSBs of
`BI` select the bit from the CR Field (LT GT EQ SO), and the top 3 bits
are extended to either scalar or vector and to select CR Fields 0..127
as specified in SVP64 [[sv/svp64/appendix]].

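
A small Python sketch of this split follows. It assumes the EXTRA-augmented starting CR Field has already been decoded (the appendix rules are not modelled), and `bi_decode` and `vector_cr_fields` are hypothetical names. The bit selected within each Field stays the same for every element; only the Field number steps.

```python
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")  # bit order within a 4-bit CR Field


def bi_decode(bi: int):
    """Split a 5-bit scalar BI into (CR Field number, bit name)."""
    if not 0 <= bi < 32:
        raise ValueError("BI is a 5-bit field")
    return bi >> 2, CR_BIT_NAMES[bi & 0b11]


def vector_cr_fields(base_field: int, vl: int):
    """CR Fields tested by a Vector BI: one consecutive CR *Field* per
    element, starting at base_field (assumed already decoded, 0..127)."""
    if base_field + vl > 128:
        raise ValueError("Vector runs past CR Field 127")
    return [base_field + i for i in range(vl)]


print(bi_decode(6))             # (1, 'EQ')
print(vector_cr_fields(80, 4))  # [80, 81, 82, 83]
```
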
When the CR Field selected by SVP64-Augmented `BI` is marked as scalar,
the usual SVP64 rules apply:
the Vector loop ends at the first element tested
(the first CR *Field*), after taking
predication into consideration. Thus, also as usual, when a predicate mask is
given, `BI` is marked as scalar, and `sz` is zero, srcstep
skips forward to the first non-zero predicated element, and only that
one element is tested.

In other words, the fact that this is a Branch
Operation (instead of an arithmetic one) does not result, ultimately,
in significant changes as to
how SVP64 is fundamentally applied, except with respect to:

* the unique properties associated with conditionally
  changing the Program
  Counter (aka "a Branch"), resulting in early-out
  opportunities
* CTR-testing

Both are outlined below.

# Horizontal-First and Vertical-First Modes

In SVP64 Horizontal-First Mode, the first failure in ALL mode (Great Big
AND) results in early exit: no more updates to CTR occur (if requested),
no branch occurs, and LR is not updated (if requested). Likewise for
non-ALL mode (Great Big OR), early exit also occurs on the first success,
however this time with the Branch proceeding. In both cases the testing
of the Vector of CRs must be done in linear sequential order (or in
REMAP re-sequenced order), such that tests that are sequentially beyond
the exit point are *not* carried out. (*Note: it is standard practice in
programming languages to exit early from conditional tests, however
a little unusual to consider in an ISA that is designed for Parallel
Vector Processing. The reason is to have strictly-defined, guaranteed
behaviour.*)

In Vertical-First Mode, setting the `ALL` bit results in `UNDEFINED`
behaviour. Given that only one element is being tested at a time
in Vertical-First Mode, a test designed to be done on multiple
elements is meaningless.

# Description and Modes

Predication in both INT and CR modes may be applied to `sv.bc` and other
SVP64 Branch Conditional operations, exactly as it may be applied to
other SVP64 operations. When `sz` is zero, any masked-out Branch-element
operations are not included in condition testing, exactly like all other
SVP64 operations, *including* side-effects such as potentially updating
LR or CTR, which will also be skipped. There is *one* exception here,
namely when
`BO[2]=0, sz=0, CTR-test=0, CTi=1` and the relevant element
predicate mask bit is also zero:
under these special circumstances CTR will also decrement.

When `sz` is non-zero, this normally requests insertion of a zero
in place of the input data when the relevant predicate mask bit is zero.
This would mean that a zero is inserted in place of `CR[BI+32]` for
testing against `BO`, which may not be desirable in all circumstances.
Therefore, an extra field, `SNZ`, is provided which, if set, will insert
a **one** in place of a masked-out element instead of a zero.

(*Note: Both options are provided because it is useful to deliberately
cause the Branch-Conditional Vector testing to fail at a specific point,
controlled by the Predicate mask. This is particularly useful in `VLSET`
mode, which will truncate SVSTATE.VL at the point of the first failed
test.*)

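
The per-element choice between the CR bit, skipping, and `SNZ` can be sketched in Python, followed by the element test `cond_ok <- BO[0] | ¬(testbit ^ BO[1])` from the pseudocode. `select_testbit` and `element_passes` are hypothetical names, and the special CTR decrement on skipped elements is deliberately not modelled here.

```python
from typing import Optional


def select_testbit(cr_bit: int, pred_bit: int, sz: int, snz: int) -> Optional[int]:
    """Bit tested for one element, or None when the element is skipped."""
    if pred_bit:
        return cr_bit  # element enabled: test the CR bit selected by BI
    if not sz:
        return None    # sz=0: masked-out element is skipped entirely
    return snz         # sz=1: SNZ (0 or 1) replaces the CR bit


def element_passes(testbit: int, bo0: int, bo1: int) -> bool:
    """Element passes when BO[0] is set or the bit equals BO[1]."""
    return bool(bo0) or testbit == bo1


# Masked-out element, sz=1, SNZ=1, testing for "bit set" (BO[1]=1)
tb = select_testbit(cr_bit=0, pred_bit=0, sz=1, snz=1)
print(tb, element_passes(tb, bo0=0, bo1=1))  # 1 True
```

With `SNZ=0` the same masked-out element would fail the test, which is how the predicate mask can force testing to fail at a chosen point.
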
Normally, CTR mode will decrement once per Condition Test, with the result
that under normal circumstances CTR reduces by up to VL in Horizontal-First
Mode. Just as v3.0B Branch-Conditional saves at
least one instruction on tight inner loops through auto-decrementation
of CTR, it is also possible to save instruction count for
SVP64 loops in both Vertical-First and Horizontal-First Mode, particularly
in circumstances where there is conditional interaction between the
element computation and testing, and the continuation (or otherwise)
of a given loop. The potential combinations of interactions are why CTR
testing options have been added.

Also, the unconditional bit `BO[0]` is still relevant when Predication
is applied to the Branch, because in `ALL` mode all nonmasked bits have
to be tested, and when `sz=0` skipping occurs.
Even when VLSET mode is not used, CTR
may still be decremented by the total number of nonmasked elements,
acting in effect as either a popcount or cntlz depending on which
Mode bits are set.

In short, Vectorised Branch becomes an extremely powerful tool.

Where a standard Scalar v3.0B branch unconditionally decrements
CTR when `BO[2]` is clear, CTR-test Mode introduces more flexibility
which allows CTR to be used for many more types of Vector loops.

CTR-test mode and CTi interaction is as follows. Note that
`BO[2]` is still required to be clear for CTR decrements to be
considered, exactly as is the case in Scalar Power ISA v3.0B:

* **CTR-test=0, CTi=0**: CTR decrements on a per-element basis
  if `BO[2]` is zero. Masked-out elements when `sz=0` are
  skipped (i.e. CTR is *not* decremented when the predicate
  bit is zero and `sz=0`).
* **CTR-test=0, CTi=1**: CTR decrements on a per-element basis
  if `BO[2]` is zero and a masked-out element is skipped
  (`sz=0` and predicate bit is zero). This one special case is the
  **opposite** of other combinations, as well as being
  completely different from normal SVP64 `sz=0` behaviour.
* **CTR-test=1, CTi=0**: CTR decrements on a per-element basis
  if `BO[2]` is zero and the Condition Test succeeds.
  Masked-out elements when `sz=0` are skipped (including
  not decrementing CTR).
* **CTR-test=1, CTi=1**: CTR decrements on a per-element basis
  if `BO[2]` is zero and the Condition Test *fails*.
  Masked-out elements when `sz=0` are skipped (including
  not decrementing CTR).

`CTR-test=0, CTi=1, sz=0` requires special emphasis because it is the
only time in the entirety of SVP64 that has side-effects when
a predicate mask bit is clear. **All** other SVP64 operations
entirely skip an element when `sz=0` and a predicate mask bit is zero.
It is also critical to emphasise that in this unusual mode
no other side-effects occur: **only** CTR is decremented, i.e. the
rest of the Branch operation is skipped.

VLSET Mode truncates the Vector Length so that subsequent instructions
operate on a reduced Vector Length. This is similar to
Data-dependent Fail-First and LD/ST Fail-First, where for VLSET the
truncation occurs at the Branch decision-point.

Interestingly, due to the side-effects of `VLSET` mode
it is actually useful to use Branch Conditional even
to perform no actual branch operation, i.e. to point to the instruction
after the branch. Truncation of VL would thus conditionally occur, yet control
flow alteration would not.

`VLSET` mode with Vertical-First is particularly unusual. Vertical-First
is designed to be used for explicit looping, where an explicit call to
`svstep` is required to move both srcstep and dststep on to
the next element, until VL (or another condition) is reached.
Vertical-First Looping is expected (required) to terminate if the end
of the Vector, VL, is reached. If, however, that loop is terminated early
because VL is truncated, VLSET with Vertical-First becomes meaningless.
Resolving this would require two branches: one Conditional, the other
branching unconditionally to create the loop, where the Conditional
one jumps over the unconditional one.

Therefore, with `VSb`, the option to decide whether truncation should occur
if the branch succeeds *or* if the branch condition fails allows for the
flexibility required. This allows a Vertical-First Branch to *either* be used as
a branch-back (loop) *or* as part of a conditional exit or function
call from *inside* a loop, and for VLSET to be integrated into both
types of decision-making.

In the case of a Vertical-First branch-back (loop), with `VSb=0` the branch
takes place if success conditions are met, but on exit from that loop
(branch condition fails), VL will be truncated.

`VLSET` mode with Horizontal-First when `VSb=0` is still
useful, because it can be used to truncate VL to the first predicated
(non-masked-out) element.

The truncation point for VL, when VLi is clear, must not include skipped
elements that preceded the current element being tested.
Example: `sz=0, VLi=0, predicate mask = 0b110010`, and the Condition
Register failure point is at CR Field element 4.

* Testing at element 0 is skipped because its predicate bit is zero
* Testing at element 1 passes
* Testing at elements 2 and 3 is skipped because their
  respective predicate mask bits are zero
* Testing at element 4 fails, therefore VL is truncated to **2**,
  not 4, due to elements 2 and 3 being skipped.

If `sz=1` in the above example *then* VL would have been set to 4
(assuming the `SNZ` value substituted for masked-out elements makes their
tests pass), because in zeroing mode the zeroed elements are still
effectively part of the Vector (with their respective elements set to `SNZ`).

If `VLI=1` then VL would be set to 5 regardless of `sz`, due to being inclusive
of the element actually being tested.

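
The worked example can be reproduced with a Python sketch of the truncation rule. It assumes every element tested before the failure point passed (including `SNZ`-substituted elements when `sz=1`), and `vlset_truncate` is a hypothetical name.

```python
def vlset_truncate(mask: int, fail_at: int, sz: int, vli: int) -> int:
    """New VL when the test at element `fail_at` triggers VLSET truncation.

    Assumes every element tested before `fail_at` passed.
    """
    if vli:
        return fail_at + 1  # inclusive of the element being tested
    if sz:
        return fail_at      # zeroed elements still count as part of the Vector
    # sz=0: exclude skipped elements that precede the failing one
    tested = [i for i in range(fail_at) if (mask >> i) & 1]
    return tested[-1] + 1 if tested else 0


mask = 0b110010
print(vlset_truncate(mask, 4, sz=0, vli=0))  # 2
print(vlset_truncate(mask, 4, sz=1, vli=0))  # 4
print(vlset_truncate(mask, 4, sz=0, vli=1))  # 5
```
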
## VLSET and CTR-test combined

If both CTR-test and VLSET Modes are requested, it is important to
observe the correct order. What occurs depends on whether VLi
is enabled, because VLi affects the length, VL.

If VLi (VL truncate inclusive) is set:

1. compute the test, including whether CTR triggers
2. (optionally) decrement CTR
3. (optionally) truncate VL (VSb inverts the decision)
4. decide (based on step 1) whether to terminate looping
   (including not executing step 5)
5. decide whether to branch.

If VLi is clear, then when a test fails, that element
should **not** be considered part of the Vector. Consequently:

1. compute the branch test, including whether CTR triggers
2. if the test fails against VSb, truncate VL to the *previous*
   element, and terminate looping. No further steps are executed.
3. (optionally) decrement CTR
4. decide whether to branch.

# Boolean Logic combinations

In a Scalar ISA, Branch-Conditional testing even of vector
results may be performed through inversion of tests. NOR of
all tests may be performed by inversion of the scalar condition
and branching *out* from the scalar loop around elements,
using scalar operations.

In a parallel (Vector) ISA it is the ISA itself which must perform
the prerequisite logic manipulation.
Thus for SVP64 there is an extraordinary number of necessary combinations
which provide completely different and useful behaviour.
Available options to combine:

* `BO[0]` to make an unconditional branch would seem irrelevant if
  it were not for predication and for side-effects (CTR Mode, for example)
* Enabling CTR-test Mode and clearing `BO[2]` can still result in the
  Branch taking place, not because the Condition Test itself failed, but
  because CTR reached zero **because**, as required by CTR-test mode,
  CTR was decremented as a **result** of Condition Tests failing.
* `BO[1]` to select whether the CR bit being tested is zero or nonzero
* `R30` and `~R30` and other predicate mask options, including CR and
  inverted CR bit testing
* `sz` and `SNZ` to insert either zeros or ones in place of masked-out
  predicate bits
* `ALL` or `ANY` behaviour corresponding to `AND` of all tests and
  `OR` of all tests, respectively
* Predicate Mask bits, which combine in effect with the CR being
  tested
* Inversion of Predicate Masks (`~r3` instead of `r3`, or using
  `NE` rather than `EQ`), which results in an additional
  level of possible ANDing, ORing etc. that would otherwise
  need explicit instructions.

The most obviously useful combinations here are to set `BO[1]` to zero
in order to turn `ALL` into Great-Big-NOR (an AND of the inverted tests)
and `ANY` into Great-Big-NAND (an OR of the inverted tests). Other Mode
bits which perform behavioural inversion then
have to work around the fact that the Condition Testing is NOR or NAND.
The alternative to having additional behavioural inversion
(`SNZ`, `VSb`, `CTi`) would be to have a second (unconditional)
branch directly after the first, which the first branch jumps over.
This contrivance is avoided by the behavioural inversion bits.

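
The De Morgan reasoning behind `BO[1]=0` can be checked exhaustively with a Python sketch. It ignores `BO[0]`, CTR and predication, and `vector_branch` is a hypothetical name. Each element passes when its CR bit equals `BO[1]`, so with `BO[1]=0` an AND of the element results is the NOR of the CR bits, and an OR of them is the NAND.

```python
from itertools import product


def vector_branch(crbits, bo1: int, all_mode: bool) -> bool:
    """ALL ANDs the per-element results, non-ALL ORs them."""
    results = [bit == bo1 for bit in crbits]
    return all(results) if all_mode else any(results)


# Check every 3-element Vector of CR bits
for bits in product((0, 1), repeat=3):
    nor = not any(bits)
    nand = not all(bits)
    assert vector_branch(bits, bo1=0, all_mode=True) == nor
    assert vector_branch(bits, bo1=0, all_mode=False) == nand
print("ok")
```
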
# Pseudocode and examples

Please see [[svp64/appendix]] regarding CR bit ordering and for
the definition of `CR{n}`.

For comparative purposes, this is a copy of the v3.0B `bc` pseudocode:

```
if (mode_is_64bit) then M <- 0
else M <- 32
if ¬BO[2] then CTR <- CTR - 1
ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
cond_ok <- BO[0] | ¬(CR[BI+32] ^ BO[1])
if ctr_ok & cond_ok then
    if AA then NIA <-iea EXTS(BD || 0b00)
    else NIA <-iea CIA + EXTS(BD || 0b00)
if LK then LR <-iea CIA + 4
```

Simplified pseudocode including LRu and CTR skipping, which illustrates
clearly that SVP64 Scalar Branches (VL=1) are **not** identical to
v3.0B Scalar Branches. The key areas where differences occur are
the inclusion of predication (which can still be used when VL=1),
when and why CTR is decremented (CTR-test Mode), and whether LR is
updated (which is unconditional in v3.0B when LK=1, and conditional
in SVP64 when LRu=1).

```
if (mode_is_64bit) then M <- 0
else M <- 32
testbit = CR[BI+32]
if ¬predicate_bit then testbit = SVRMmode.SNZ
ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
cond_ok <- BO[0] | ¬(testbit ^ BO[1])
if ¬predicate_bit & ¬SVRMmode.sz then
    if ¬BO[2] & CTRtest & ¬CTi then
        CTR <- CTR - 1
    stop # instruction finishes here
if ¬BO[2] & ¬(CTRtest & (cond_ok ^ CTi)) then CTR <- CTR - 1
lr_ok <- ¬SVRMmode.LRu # LRu=0: LR always updated, as in v3.0B
if ctr_ok & cond_ok then
    if AA then NIA <-iea EXTS(BD || 0b00)
    else NIA <-iea CIA + EXTS(BD || 0b00)
    lr_ok <- 0b1 # branch taken: LR may be updated
if LK & lr_ok then LR <-iea CIA + 4
```

Below is the pseudocode for SVP64 Branches, which is a little less
obvious but identical to the above. The lack of obviousness is down
to the early-exit opportunities.

Pseudocode for Horizontal-First Mode:

```
if (mode_is_64bit) then M <- 0
else M <- 32
cond_ok = SVRMmode.ALL # AND starts true, OR starts false
last_tested = -1 # most recent element actually tested
for srcstep in range(VL):
    # select predicate bit or zero/one
    if predicate[srcstep]:
        # get SVP64 extended CR field 0..127
        SVCRf = SVP64EXTRA(BI>>2)
        CRbits = CR{SVCRf}
        testbit = CRbits[BI & 0b11]
        # testbit = CR[BI+32+srcstep*4]
    else if not SVRMmode.sz:
        # inverted CTR test skip mode
        if ¬BO[2] & CTRtest & ¬CTi then
            CTR <- CTR - 1
        continue # skip to next element
    else
        testbit = SVRMmode.SNZ
    # actual element test here
    ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
    el_cond_ok <- BO[0] | ¬(testbit ^ BO[1])
    # check if CTR dec should occur
    ctrdec = ¬BO[2]
    if CTRtest & (el_cond_ok ^ CTi) then
        ctrdec = 0b0
    if ctrdec then CTR <- CTR - 1
    # merge in the test
    if SVRMmode.ALL:
        cond_ok &= (el_cond_ok & ctr_ok)
    else
        cond_ok |= (el_cond_ok & ctr_ok)
    # test for VL to be set (and exit)
    if VLSET and VSb = (el_cond_ok & ctr_ok) then
        if VLI then
            SVSTATE.VL = srcstep+1
        else
            # exclude skipped elements preceding this one
            SVSTATE.VL = last_tested+1
        break
    last_tested = srcstep
    # early exit?
    if SVRMmode.ALL != (el_cond_ok & ctr_ok):
        break
    # SVP64 rules about Scalar registers still apply!
    if SVCRf is scalar:
        break
# loop finally done, now test if branch (and update LR)
lr_ok <- ¬SVRMmode.LRu
if cond_ok then
    if AA then NIA <-iea EXTS(BD || 0b00)
    else NIA <-iea CIA + EXTS(BD || 0b00)
    lr_ok <- 0b1
if LK & lr_ok then LR <-iea CIA + 4
```

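
For experimentation, here is a Python sketch of the Horizontal-First loop under stated assumptions: CTR is ignored (as if `BO[2]=1`), LR and the branch target are not modelled, and the CR bit selected by `BI` is supplied per element as a list. `BranchMode` and `sv_bc_horizontal` are hypothetical names. The condition accumulator starts true for `ALL` (Great Big AND) and false otherwise, giving the `VL=0` behaviour described earlier, and truncation with VLI clear excludes skipped elements as in the VLSET worked example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BranchMode:
    """Hypothetical container for the SVP64 RM Branch Mode fields."""
    ALL: bool = False
    sz: bool = False
    SNZ: int = 0
    VLSET: bool = False
    VSb: bool = False
    VLI: bool = False


def sv_bc_horizontal(crbits: List[int], pred: List[int], bo0: int, bo1: int,
                     mode: BranchMode, scalar_bi: bool = False) -> Tuple[bool, int]:
    """Condition reduction and VLSET only. Returns (branch_taken, new VL)."""
    vl = len(crbits)
    cond_ok = mode.ALL       # identity for AND is True, for OR is False
    last_tested = -1         # most recent element actually tested
    for srcstep in range(vl):
        if pred[srcstep]:
            testbit = crbits[srcstep]
        elif not mode.sz:
            continue         # masked out and skipped: no test
        else:
            testbit = mode.SNZ  # masked out, zeroing: SNZ substitutes
        el_ok = bool(bo0) or testbit == bo1
        # merge into the Great Big AND / Great Big OR
        cond_ok = (cond_ok and el_ok) if mode.ALL else (cond_ok or el_ok)
        # VLSET: truncate when the element result matches VSb, then stop
        if mode.VLSET and mode.VSb == el_ok:
            vl = srcstep + 1 if mode.VLI else last_tested + 1
            break
        last_tested = srcstep
        # early exit: first failure (ALL) or first success (non-ALL)
        if mode.ALL != el_ok:
            break
        if scalar_bi:        # a Scalar BI tests only one element
            break
    return cond_ok, vl


# VLSET worked example: mask 0b110010, element 1 passes, element 4 fails
crbits = [0, 1, 0, 0, 0, 1]
pred = [(0b110010 >> i) & 1 for i in range(6)]
m = BranchMode(ALL=True, VLSET=True)
print(sv_bc_horizontal(crbits, pred, bo0=0, bo1=1, mode=m))  # (False, 2)
```
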
Pseudocode for Vertical-First Mode:

```
# get SVP64 extended CR field 0..127
SVCRf = SVP64EXTRA(BI>>2)
CRbits = CR{SVCRf}
# select predicate bit or zero/one
if predicate[srcstep]:
    testbit = CRbits[BI & 0b11]
else if not SVRMmode.sz:
    # inverted CTR test skip mode
    if ¬BO[2] & CTRtest & ¬CTi then
        CTR <- CTR - 1
    SVSTATE.srcstep = new_srcstep
    exit # no branch testing
else
    testbit = SVRMmode.SNZ
# actual element test here
cond_ok <- BO[0] | ¬(testbit ^ BO[1])
# test for VL to be set (and exit)
if VLSET and cond_ok = VSb then
    if VLI then
        SVSTATE.VL = new_srcstep+1
    else
        SVSTATE.VL = new_srcstep
```

# Example Shader code

```
// assume f() g() or h() modify a and/or b
while(a > 2) {
    if(b < 5)
        f();
    else
        g();
    h();
}
```

which compiles to something like:

```
pred loop_pred = a > 2;
// loop continues while any of a elements greater than 2
while(loop_pred.any()) {
    // vector of predicate bits
    pred if_pred = loop_pred & (b < 5);
    // only call f() if at least 1 bit set
    if(if_pred.any()) {
        f(if_pred);
    }
    // loop mask ANDs with inverted if-test
    pred else_pred = loop_pred & ~if_pred;
    // only call g() if at least 1 bit set
    if(else_pred.any()) {
        g(else_pred);
    }
    h(loop_pred);
    // re-evaluate the loop condition
    loop_pred = a > 2;
}
```

which will end up as:

```
# start from while loop test point
b looptest
while_loop:
sv.cmpi CR80.v, b.v, 5 # vector compare b into CR80 Vector
sv.bc/m=r30/~ALL/sz CR80.v.LT skip_f # skip when none
# only calculate loop_pred & pred_b because needed in f()
sv.crand CR80.v.SO, CR60.v.GT, CR80.v.LT # if = loop & pred_b
# (call f() here, predicated on CR80.v.SO)
skip_f:
# illustrate inversion of pred_b. invert r30, test ALL
# rather than SOME, but masked-out zero test would FAIL,
# therefore masked-out instead is tested against 1 not 0
sv.bc/m=~r30/ALL/SNZ CR80.v.LT skip_g
# else = loop & ~pred_b, need this because used in g()
sv.crternari(A&~B) CR80.v.SO, CR60.v.GT, CR80.v.LT
# (call g() here, predicated on CR80.v.SO)
skip_g:
# conditionally call h(r30) if any loop pred set
sv.bclr/m=r30/~ALL/sz BO[1]=1 h()
looptest:
sv.cmpi CR60.v, a.v, 2 # vector compare a into CR60 vector
sv.crweird r30, CR60.GT # transfer GT vector to r30
sv.bc/m=r30/~ALL/sz BO[1]=1 while_loop
```