[[!tag standards]]
# SVP64 Branch Conditional behaviour

**DRAFT STATUS**

Please note: SVP64 Branch instructions should be
considered completely separate and distinct from
standard scalar OpenPOWER-approved v3.0B branches.
**v3.0B branches are in no way impacted, altered,
changed or modified in any way, shape or form by
the SVP64 Vectorised Variants**.

Links

* <https://bugs.libre-soc.org/show_bug.cgi?id=664>
* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-August/003416.html>
* [[openpower/isa/branch]]

# Rationale

Scalar 3.0B Branch Conditional operations, `bc`, `bctar` etc. test a
Condition Register. For parallel processing, however, it is impossible
to perform multiple independent branches: the Program Counter
cannot branch to multiple destinations based on multiple conditions.
The best that can be done is
to test multiple Conditions and make a decision on a *single* branch,
based on analysis of a *Vector* of CR Fields
which have just been calculated from a *Vector* of results.

In 3D Shader
binaries, which are inherently parallelised and predicated, testing all or
some results and branching based on multiple tests is extremely common,
and a fundamental part of Shader Compilers. Example:
without such multi-condition
test-and-branch, if a predicate mask is all zeros a large batch of
instructions may be masked out to `nop`, and it would waste
CPU cycles to run them. 3D GPU ISAs can test for this scenario
and, with the appropriate predicate-analysis instruction,
jump over fully-masked-out operations, by spotting that
*all* Conditions are false.

Unless Branches are aware and capable of such analysis, additional
instructions would be required which perform Horizontal Cumulative
analysis of Vectorised Condition Register Fields, in order to
reduce the Vector of CR Fields down to one single yes or no
decision that a Scalar-only v3.0B Branch-Conditional could cope with.
Such instructions would be unavoidable and costly
by comparison to a single Vector-aware Branch.
Therefore, in order to be commercially competitive, `sv.bc` and
other Vector-aware Branch Conditional instructions are a high priority
for 3D GPU workloads.

Given that Power ISA v3.0B is already quite powerful, particularly
the Condition Registers and their interaction with Branches, there
are opportunities to create extremely flexible and compact
Vectorised Branch behaviour. In addition, the side-effects (updating
of CTR, truncation of VL, described below) make it a useful instruction
even if the branch points to the next instruction (no actual branch).

# Overview

When considering an "array" of branch-tests, there are four useful modes:
AND, OR, NAND and NOR of all Conditions.
NAND and NOR may be synthesised from AND and OR by
inverting `BO[1]`, which leaves just two modes:

* Branch takes place on the **first** CR Field test to succeed
  (a Great Big OR of all condition tests)
* Branch takes place only if **all** CR Field tests succeed
  (a Great Big AND of all condition tests)

Early-exit is enacted such that the Vectorised Branch does not
perform needless extra tests, which will help reduce reads on
the Condition Register file.

*Note: Early-exit is **MANDATORY** (required) behaviour.
Branches **MUST** exit at the first failure point, for
exactly the same reasons for which it is mandatory in
programming languages doing early-exit: to avoid
damaging side-effects. Speculative testing of Condition
Register Fields is permitted, as is speculative updating
of CTR, as long as, as usual in any Out-of-Order microarchitecture,
that speculation is cancelled should an early-exit occur.*
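
As a rough illustration of the early-exit rule, the following Python sketch
reduces a sequence of per-element test results to one branch decision.
It is a model only: `vector_branch_test` and its arguments are hypothetical
names, and predication, CTR and VL side-effects are deliberately left out.

```python
from typing import Iterable, Tuple


def vector_branch_test(results: Iterable[bool], all_mode: bool) -> Tuple[bool, int]:
    """Reduce per-element test results to a single branch decision.

    Returns (branch_taken, tests_performed).
    Hypothetical helper for illustration, not part of the specification.
    """
    tested = 0
    for ok in results:
        tested += 1
        if all_mode and not ok:
            # Great Big AND: the first failure ends testing, no branch
            return False, tested
        if not all_mode and ok:
            # Great Big OR: the first success ends testing, branch taken
            return True, tested
    # No early exit: every test passed (AND) or every test failed (OR)
    return all_mode, tested


print(vector_branch_test([True, False, True], all_mode=True))
print(vector_branch_test([False, True, True], all_mode=False))
```

In both calls the third element is never examined, which is the property
that keeps later side-effects from occurring.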

Additional useful behaviour involves two primary Modes (both of
which may be enabled and combined):

* **VLSET Mode**: identical to Data-Dependent Fail-First Mode
  for Arithmetic SVP64 operations, with more
  flexibility and a close interaction and integration into the
  underlying base Scalar v3.0B Branch instruction.
  Truncation of VL takes place around the early-exit point.
* **CTR-test Mode**: gives much more flexibility over when and why
  CTR is decremented, including options to decrement if a Condition
  test succeeds *or if it fails*.

With these side-effects, basic Boolean Logic Analysis advises that
it is important to provide a means
to enact each of them based on whether testing succeeds *or fails*. This
results in a not-insignificant number of additional Mode Augmentation bits,
accompanying VLSET and CTR-test Modes respectively.

Predicate skipping or zeroing may, as usual with SVP64, be controlled
by `sz`.
Where the predicate is masked out and
zeroing is enabled,
the same Boolean Logic Analysis dictates that
rather than testing only against zero, the option to test
against one is also prudent. This introduces a new
immediate field, `SNZ`, which works in conjunction with
`sz`.

Vectorised Branches can be used
in either SVP64 Horizontal-First or Vertical-First Mode. Essentially,
at an element level, the behaviour is identical in both Modes,
although the `ALL` bit is meaningless in Vertical-First Mode.

It is also important
to bear in mind that, fundamentally, Vectorised Branch-Conditional
is still extremely close to the Scalar v3.0B Branch-Conditional
instructions, and that the same v3.0B Scalar Branch-Conditional
instructions are still
*completely separate and independent*, being unaltered and
unaffected by their SVP64 variants in every conceivable way.

*Programming note: SVP64 instructions are 64 bit
(8 bytes, not 4). This needs to be taken into consideration when computing
branch offsets: the offset is relative to the start of the instruction,
which **includes** the SVP64 Prefix.*
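
A minimal sketch of the offset arithmetic in the programming note, assuming
byte addresses and a relative (non-absolute) branch. The function and
constant names are hypothetical; the word-alignment check reflects the
`BD || 0b00` displacement form of the base instruction.

```python
SVP64_INSN_BYTES = 8  # 4-byte SVP64 Prefix followed by the 4-byte base instruction


def svp64_branch_offset(branch_addr: int, target_addr: int) -> int:
    """Byte displacement for a relative SVP64 branch.

    branch_addr is the address of the SVP64 Prefix, i.e. the start of the
    64-bit instruction, not the address of the 32-bit suffix.
    """
    offset = target_addr - branch_addr
    if offset % 4 != 0:
        raise ValueError("branch displacement must be a multiple of 4 bytes")
    return offset


# A branch at 0x1000 targeting the very next instruction needs +8, not +4
print(svp64_branch_offset(0x1000, 0x1000 + SVP64_INSN_BYTES))
```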

# Format and fields

With element-width overrides being meaningless for Condition
Register Fields, bits 4 through 7 of SVP64 RM may be used for additional
Mode bits.

SVP64 RM `MODE` (includes `ELWIDTH` and `ELWIDTH_SRC` bits) for Branch
Conditional:

| 4 | 5 | 6 | 7 | 19 | 20 |  21 | 22 23   |  description        |
| - | - | - | - | -- | -- | --- |---------|-------------------- |
|ALL|SNZ| / | / | 0  | 0  | /   | LRu sz  | normal mode         |
|ALL|SNZ| / |VSb| 0  | 1  | VLI | LRu sz  | VLSET mode          |
|ALL|SNZ|CTi| / | 1  | 0  | /   | LRu sz  | CTR-test mode       |
|ALL|SNZ|CTi|VSb| 1  | 1  | VLI | LRu sz  | CTR-test+VLSET mode |

Brief description of fields:

* **sz=1** If predication is enabled and `sz=1` and a predicate
  element bit is zero, `SNZ` will
  be substituted in place of the CR bit selected by `BI`,
  as the Condition tested.
  Contrast this with
  normal SVP64 `sz=1` behaviour, where *only* a zero is put in
  place of masked-out predicate bits.
* **sz=0** When `sz=0` skipping occurs as usual on
  masked-out elements, but unlike all
  other SVP64 behaviour which entirely skips an element with
  no related side-effects at all, there are certain
  special circumstances where CTR
  may be decremented. See CTR-test Mode, below.
* **ALL** When set, all branch conditional tests must pass in order for
  the branch to succeed. When clear, it is the first sequentially
  encountered successful test that causes the branch to succeed.
  This is identical behaviour to how programming languages perform
  early-exit on Boolean Logic chains.
* **VLI** VLSET is identical to Data-dependent Fail-First mode.
  In VLSET mode, VL *may* (depending on `VSb`) be truncated.
  If VLI (Vector Length Inclusive) is clear,
  VL is truncated to *exclude* the current element, otherwise it is
  included. SVSTATE.MVL is not altered: only VL.
* **LRu** Link Register Update. When set, Link Register will
  only be updated if the Branch Condition succeeds. This avoids
  destruction of LR during loops (particularly Vertical-First
  ones).
* **VSb** In VLSET Mode, after testing,
  if VSb is set, VL is truncated if the branch succeeds. If VSb is clear,
  VL is truncated if the branch did **not** take place.
* **CTi** CTR inversion. CTR-test Mode normally decrements per element
  tested. CTR inversion decrements if a test *fails*. Only relevant
  in CTR-test Mode.

# Vectorised CR Field numbering, and Scalar behaviour

It is important to keep in mind that just like all SVP64 instructions,
the `BI` field of the base v3.0B Branch Conditional instruction
may be extended by SVP64 EXTRA augmentation, as well as be marked
as either Scalar or Vector.

The `BI` field of Branch Conditional operations is five bits. In scalar
v3.0B this selects one bit of the 32 bit CR,
comprising eight CR Fields of 4 bits each. In SVP64 there are
16 32 bit CRs, containing 128 4-bit CR Fields. Therefore, the 2 LSBs of
`BI` select the bit from the CR Field (LT GT EQ SO), and the top 3 bits
are extended to either scalar or vector and to select CR Fields 0..127
as specified in SVP64 [[sv/svp64/appendix]].
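
The split of `BI` can be sketched in Python as below. This covers only the
unaugmented 5-bit field: `split_bi` is a hypothetical helper, and the
extension of the 3-bit field number to 0..127 is defined by the EXTRA rules
in the appendix and is not modelled here. Bit names follow the Power ISA
order within a CR Field (LT, GT, EQ, SO).

```python
from typing import Tuple

CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")  # bit 0..3 within one CR Field


def split_bi(bi: int) -> Tuple[int, int]:
    """Split a 5-bit BI into (CR Field number before EXTRA extension, bit index)."""
    if not 0 <= bi < 32:
        raise ValueError("BI is a 5-bit field")
    return bi >> 2, bi & 0b11


field, bit = split_bi(0b01110)
print(field, CR_BIT_NAMES[bit])
```

The same `BI>>2` and `BI & 0b11` expressions appear in the pseudocode later
in this page.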

When the CR Field selected by SVP64-Augmented `BI` is marked as scalar,
the usual SVP64 rules apply:
the Vector loop ends at the first element tested, after taking
predication into consideration. Thus, also as usual, when a predicate mask is
given, `BI` is marked as scalar, and `sz` is zero, srcstep
skips forward to the first non-zero predicated element, and only that
one element is tested.

In other words, the fact that this is a Branch
Operation (instead of an arithmetic one) does not result, ultimately,
in significant changes as to
how SVP64 is fundamentally applied, except with respect to:

* the unique properties associated with conditionally
  changing the Program
  Counter (aka "a Branch"), resulting in early-out
  opportunities
* CTR-testing

Both are outlined below.

# Horizontal-First and Vertical-First Modes

In SVP64 Horizontal-First Mode, the first failure in ALL mode (Great Big
AND) results in early exit: no more updates to CTR occur (if requested);
no branch occurs, and LR is not updated (if requested). Likewise for
non-ALL mode (Great Big OR) on first success early exit also occurs,
however this time with the Branch proceeding. In both cases the testing
of the Vector of CRs should be done in linear sequential order (or in
REMAP re-sequenced order), such that tests that are sequentially beyond
the exit point are *not* carried out. (*Note: it is standard practice in
Programming languages to exit early from conditional tests, however
a little unusual to consider in an ISA that is designed for Parallel
Vector Processing. The reason is to have strictly-defined guaranteed
behaviour.*)

In Vertical-First Mode, setting the `ALL` bit results in `UNDEFINED`
behaviour. Given that only one element is being tested at a time
in Vertical-First Mode, a test designed to be done on multiple
bits is meaningless.

# Description and Modes

Predication in both INT and CR modes may be applied to `sv.bc` and other
SVP64 Branch Conditional operations, exactly as they may be applied to
other SVP64 operations. When `sz` is zero, any masked-out Branch-element
operations are not included in condition testing, exactly like all other
SVP64 operations, *including* side-effects such as potentially updating
LR or CTR, which will also be skipped. There is *one* exception here,
which is when
`BO[2]=0, sz=0, CTR-test=0, CTi=1` and the relevant element
predicate mask bit is also zero:
under these special circumstances CTR will also decrement.

When `sz` is non-zero, this normally requests insertion of a zero
in place of the input data, when the relevant predicate mask bit is zero.
This would mean that a zero is inserted in place of `CR[BI+32]` for
testing against `BO`, which may not be desirable in all circumstances.
Therefore, an extra field, `SNZ`, is provided which, if set, will insert
a **one** in place of a masked-out element, instead of a zero.

(*Note: Both options are provided because it is useful to deliberately
cause the Branch-Conditional Vector testing to fail at a specific point,
controlled by the Predicate mask. This is particularly useful in `VLSET`
mode, which will truncate SVSTATE.VL at the point of the first failed
test.*)
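
The selection of the bit that is actually tested can be sketched as follows.
This is an illustrative Python model, not normative: the function names are
hypothetical, single-bit values are passed as plain integers, and the
element test reuses the scalar formula `BO[0] | ¬(testbit ^ BO[1])`.

```python
from typing import Optional


def select_testbit(pred_bit: int, cr_bit: int, sz: int, snz: int) -> Optional[int]:
    """Return the bit to test for one element, or None if the element is skipped."""
    if pred_bit:
        return cr_bit  # element enabled: test the CR bit selected by BI
    if not sz:
        return None    # sz=0: a masked-out element is skipped
    return snz         # sz=1: SNZ (zero or one) is substituted


def element_test(testbit: int, bo0: int, bo1: int) -> bool:
    """Per-element condition: BO[0] | not (testbit xor BO[1])."""
    return bool(bo0) or testbit == bo1


# Masked-out element, zeroing enabled, SNZ=1, testing for a set bit (BO[1]=1)
bit = select_testbit(pred_bit=0, cr_bit=0, sz=1, snz=1)
print(bit, element_test(bit, bo0=0, bo1=1))
```

With `SNZ=0` instead, the same masked-out element would fail the test, which
is how a predicate mask can be used to force a failure at a chosen point.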

Normally, CTR mode decrements once per Condition Test, with the
result that, under normal circumstances, CTR reduces by up to VL
in Horizontal-First
Mode. Just as v3.0B Branch-Conditional saves at
least one instruction on tight inner loops through auto-decrementation
of CTR, it is likewise possible to save instruction count for
SVP64 loops in both Vertical-First and Horizontal-First Mode, particularly
in circumstances where there is conditional interaction between the
element computation and testing, and the continuation (or otherwise)
of a given loop. The potential combinations of interactions are why CTR
testing options have been added.

Also, the unconditional bit `BO[0]` is still relevant when Predication
is applied to the Branch because in `ALL` mode all nonmasked bits have
to be tested, and when `sz=0` skipping occurs.
Even when VLSET mode is not used, CTR
may still be decremented by the total number of nonmasked elements,
acting in effect as either a popcount or cntlz depending on which
mode bits are set.
In short, Vectorised Branch becomes an extremely powerful tool.

## CTR-test

Where a standard Scalar v3.0B branch unconditionally decrements
CTR when `BO[2]` is clear, CTR-test Mode introduces more flexibility
which allows CTR to be used for many more types of Vector loop
constructs.

CTR-test mode and CTi interaction is as follows: note that
`BO[2]` is still required to be clear for CTR decrements to be
considered, exactly as is the case in Scalar Power ISA v3.0B.

* **CTR-test=0, CTi=0**: CTR decrements on a per-element basis
  if `BO[2]` is zero. Masked-out elements when `sz=0` are
  skipped (i.e. CTR is *not* decremented when the predicate
  bit is zero and `sz=0`).
* **CTR-test=0, CTi=1**: CTR decrements on a per-element basis
  if `BO[2]` is zero and a masked-out element is skipped
  (`sz=0` and predicate bit is zero). This one special case is the
  **opposite** of other combinations, as well as being
  completely different from normal SVP64 `sz=0` behaviour.
* **CTR-test=1, CTi=0**: CTR decrements on a per-element basis
  if `BO[2]` is zero and the Condition Test succeeds.
  Masked-out elements when `sz=0` are skipped (including
  not decrementing CTR).
* **CTR-test=1, CTi=1**: CTR decrements on a per-element basis
  if `BO[2]` is zero and the Condition Test *fails*.
  Masked-out elements when `sz=0` are skipped (including
  not decrementing CTR).

`CTR-test=0, CTi=1, sz=0` requires special emphasis because it is the
only time in the entirety of SVP64 that has side-effects when
a predicate mask bit is clear. **All** other SVP64 operations
entirely skip an element when `sz=0` and a predicate mask bit is zero.
It is also critical to emphasise that in this unusual mode,
no other side-effects occur: **only** CTR is decremented, i.e. the
rest of the Branch operation is skipped.
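
The four combinations above can be written as a single per-element decision.
The Python below is a sketch of one reading of the bullet list, not a
normative definition: `ctr_decrements` and its parameters are hypothetical,
and the `CTR-test=0, CTi=1` case is assumed to decrement *only* for skipped
elements.

```python
from typing import Optional


def ctr_decrements(bo2: int, ctr_test: int, cti: int, sz: int,
                   pred_bit: int, test_ok: Optional[bool]) -> bool:
    """Decide whether CTR is decremented for one element.

    test_ok is the element's Condition Test result, or None if the
    element was skipped (predicate bit zero and sz=0).
    """
    if bo2:
        return False  # BO[2]=1: CTR is never decremented, as in scalar v3.0B
    masked_skip = (not pred_bit) and (not sz)
    if not ctr_test:
        if cti:
            return masked_skip   # special case: decrement on skipped elements
        return not masked_skip   # decrement once per element actually tested
    if masked_skip:
        return False             # CTR-test=1: skipped elements never decrement
    # CTi=0 decrements on success, CTi=1 decrements on failure
    return bool(test_ok) != bool(cti)


# CTR-test=0, CTi=1, sz=0 with a masked-out element: CTR still decrements
print(ctr_decrements(bo2=0, ctr_test=0, cti=1, sz=0, pred_bit=0, test_ok=None))
```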

# VLSET Mode

VLSET Mode truncates the Vector Length so that subsequent instructions
operate on a reduced Vector Length. This is similar to
Data-dependent Fail-First and LD/ST Fail-First, where for VLSET the
truncation occurs at the Branch decision-point.

Interestingly, due to the side-effects of `VLSET` mode
it is actually useful to use Branch Conditional even
to perform no actual branch operation, i.e. to point to the instruction
after the branch. Truncation of VL would thus conditionally occur yet control
flow alteration would not.

`VLSET` mode with Vertical-First is particularly unusual. Vertical-First
is designed to be used for explicit looping, where an explicit call to
`svstep` is required to move both srcstep and dststep on to
the next element, until VL (or other condition) is reached.
Vertical-First Looping is expected (required) to terminate if the end
of the Vector, VL, is reached. If however that loop is terminated early
because VL is truncated, VLSET with Vertical-First becomes meaningless.
Resolving this would require two branches: one Conditional, the other
branching unconditionally to create the loop, where the Conditional
one jumps over it.

Therefore, with `VSb`, the option to decide whether truncation should occur if the
branch succeeds *or* if the branch condition fails allows for the flexibility
required. This allows a Vertical-First Branch to *either* be used as
a branch-back (loop) *or* as part of a conditional exit or function
call from *inside* a loop, and for VLSET to be integrated into both
types of decision-making.

In the case of a Vertical-First branch-back (loop), with `VSb=0` the branch takes
place if success conditions are met, but on exit from that loop
(branch condition fails), VL will be truncated. This is extremely
useful.

`VLSET` mode with Horizontal-First when `VSb=0` is still
useful, because it can be used to truncate VL to the first predicated
(non-masked-out) element.

The truncation point for VL, when `VLI` is clear, must not include skipped
elements that preceded the current element being tested.
Example: `sz=0, VLI=0, predicate mask = 0b110010` and the Condition
failure point is at element 4.

* Testing at element 0 is skipped because its predicate bit is zero
* Testing at element 1 passed
* Testing at elements 2 and 3 is skipped because their
  respective predicate mask bits are zero
* Testing at element 4 fails, therefore VL is truncated to **2**
  not 4 due to elements 2 and 3 being skipped.

If `sz=1` in the above example then VL would have been set to 4 because
in zeroing mode the zeroed elements are still effectively part of the
Vector (with their respective elements set to `SNZ`).

If `VLI=1` then VL would be set to 5 regardless of `sz`, due to being inclusive
of the element actually being tested.
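
The worked example can be reproduced with a short Python sketch. It encodes
one reading of the rule (skipped elements directly before the failure point
are dropped when `sz=0` and `VLI=0`); `truncated_vl` is a hypothetical
helper, and predicate bit 0 is taken to correspond to element 0.

```python
def truncated_vl(mask: int, fail_at: int, sz: int, vli: int) -> int:
    """VL after truncation when the Condition fails at element fail_at."""
    if vli:
        return fail_at + 1  # inclusive of the element being tested
    if sz:
        return fail_at      # zeroed elements still count as part of the Vector
    # sz=0: walk back over skipped (masked-out) elements before the failure point
    vl = fail_at
    while vl > 0 and not ((mask >> (vl - 1)) & 1):
        vl -= 1
    return vl


mask = 0b110010
print(truncated_vl(mask, fail_at=4, sz=0, vli=0))  # 2
print(truncated_vl(mask, fail_at=4, sz=1, vli=0))  # 4
print(truncated_vl(mask, fail_at=4, sz=0, vli=1))  # 5
```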

## VLSET and CTR-test combined

If both CTR-test and VLSET Modes are requested, it is important to
observe the correct order. What occurs depends on whether `VLI`
is enabled, because `VLI` affects the length, VL.

If `VLI` (VL truncate inclusive) is set:

1. compute the test including whether CTR triggers
2. (optionally) decrement CTR
3. (optionally) truncate VL (VSb inverts the decision)
4. decide (based on step 1) whether to terminate looping
   (including not executing step 5)
5. decide whether to branch.

If `VLI` is clear, then when a test fails that element
and any following it
should **not** be considered part of the Vector. Consequently:

1. compute the branch test including whether CTR triggers
2. if the test fails against VSb, truncate VL to the *previous*
   element, and terminate looping. No further steps executed.
3. (optionally) decrement CTR
4. decide whether to branch.

# Boolean Logic combinations

There are an extraordinary number of different combinations which
provide completely different and useful behaviour.
Available options to combine:

* `BO[0]` to make an unconditional branch would seem irrelevant if
  it were not for predication and for side-effects (CTR Mode
  for example)
* Enabling CTR-test Mode and setting `BO[2]` can still result in the
  Branch
  taking place, not because the Condition Test itself failed, but
  because CTR reached zero **because**, as required by CTR-test mode,
  CTR was decremented as a **result** of Condition Tests failing.
* `BO[1]` to select whether the CR bit being tested is zero or nonzero
* `R30` and `~R30` and other predicate mask options including CR and
  inverted CR bit testing
* `sz` and `SNZ` to insert either zeros or ones in place of masked-out
  predicate bits
* `ALL` or `ANY` behaviour corresponding to `AND` of all tests and
  `OR` of all tests, respectively.
* Predicate Mask bits, which combine in effect with the CR being
  tested.
* Inversion of Predicate Masks (`~r3` instead of `r3`, or using
  `NE` rather than `EQ`) which results in an additional
  level of possible ANDing, ORing etc. that would otherwise
  need explicit instructions.

The most obviously useful combinations here are to set `BO[1]` to zero,
which inverts each CR bit before it is tested. By De Morgan's laws this
turns `ALL` into Great-Big-NOR of the CR bits and `ANY` into
Great-Big-NAND. Other Mode bits which perform behavioural inversion then
have to work round the fact that the Condition Testing is NOR or NAND.
Without the additional behavioural inversion bits
(`SNZ`, `VSb`, `CTi`), the alternative would be to have a second (unconditional)
branch directly after the first, which the first branch jumps over.
This contrived construct is avoided by the behavioural inversion bits.

# Pseudocode and examples

Pseudocode for Horizontal-First Mode:

```
cond_ok = SVRMmode.ALL   # AND starts from 1, OR starts from 0
for srcstep in range(VL):
    # select predicate bit or zero/one
    if predicate[srcstep]:
        # get SVP64 extended CR field 0..127
        SVCRf = SVP64EXTRA(BI>>2)
        CRbits = CR{SVCRf}
        testbit = CRbits[BI & 0b11]
        # testbit = CR[BI+32+srcstep*4]
    else if not SVRMmode.sz:
        # inverted CTR test skip mode
        if ¬BO[2] & CTRtest & ¬CTI then
            CTR = CTR - 1
        continue
    else
        testbit = SVRMmode.SNZ
    # actual element test here
    el_cond_ok <- BO[0] | ¬(testbit ^ BO[1])
    # merge in the test
    if SVRMmode.ALL:
        cond_ok &= el_cond_ok
    else
        cond_ok |= el_cond_ok
    # test for VL to be set (and exit)
    if VLSET and VSb = el_cond_ok then
        if SVRMmode.VLI
            SVSTATE.VL = srcstep+1
        else
            SVSTATE.VL = srcstep
        break
    # early exit?
    if SVRMmode.ALL:
        if ~el_cond_ok:
            break
    else
        if el_cond_ok:
            break
    if SVCRf.scalar:
        break
```

Pseudocode for Vertical-First Mode:

```
# get SVP64 extended CR field 0..127
SVCRf = SVP64EXTRA(BI>>2)
CRbits = CR{SVCRf}
# select predicate bit or zero/one
if predicate[srcstep]:
    if BRc = 1 then # CR0 vectorised
        CR{SVCRf+srcstep} = CRbits
    testbit = CRbits[BI & 0b11]
else if not SVRMmode.sz:
    # inverted CTR test skip mode
    if ¬BO[2] & CTRtest & ¬CTI then
        CTR = CTR - 1
    SVSTATE.srcstep = new_srcstep
    exit # no branch testing
else
    testbit = SVRMmode.SNZ
# actual element test here
cond_ok <- BO[0] | ¬(testbit ^ BO[1])
# test for VL to be set (and exit)
if VLSET and cond_ok = VSb then
    if SVRMmode.VLI
        SVSTATE.VL = new_srcstep+1
    else
        SVSTATE.VL = new_srcstep
```

v3.0B branch pseudocode including LRu and CTR skipping:

```
if (mode_is_64bit) then M <- 0
else M <- 32
cond_ok <- BO[0] | ¬(CR[BI+32] ^ BO[1])
ctrdec = ¬BO[2]
if CTRtest & (cond_ok ^ CTi) then
    ctrdec = 0b0
if ctrdec then CTR <- CTR - 1
ctr_ok <- BO[2] | ((CTR[M:63] != 0) ^ BO[3])
lr_ok <- SVRMmode.LRu
if ctr_ok & cond_ok then
    if AA then NIA <-iea EXTS(BD || 0b00)
    else       NIA <-iea CIA + EXTS(BD || 0b00)
    lr_ok <- 0b1
if LK & lr_ok then LR <-iea CIA + 4
```

# Example Shader code

```
while(a > 2) {
    if(b < 5)
        f();
    else
        g();
    h();
}
```

which compiles to something like:

```
vec<i32> a, b;
// ...
pred loop_pred = a > 2;
while(loop_pred.any()) {
    pred if_pred = loop_pred & (b < 5);
    if(if_pred.any()) {
        f(if_pred);
    }
label1:
    pred else_pred = loop_pred & ~if_pred;
    if(else_pred.any()) {
        g(else_pred);
    }
    h(loop_pred);
}
```

which will end up as:

```
    sv.cmpi CR60.v, a.v, 2      # vector compare a into CR60 vector
    sv.crweird r30, CR60.GT     # transfer GT vector to r30
while_loop:
    sv.cmpi CR80.v, b.v, 5      # vector compare b into CR80 vector
    sv.bc/m=r30/~ALL/sz CR80.v.LT skip_f # skip when none
    # only calculate loop_pred & pred_b because needed in f()
    sv.crand CR80.v.SO, CR60.v.GT, CR80.v.LT # if = loop & pred_b
    f(CR80.v.SO)
skip_f:
    # illustrate inversion of pred_b. invert r30, test ALL
    # rather than SOME, but masked-out zero test would FAIL,
    # therefore masked-out instead is tested against 1 not 0
    sv.bc/m=~r30/ALL/SNZ CR80.v.LT skip_g
    # else = loop & ~pred_b, need this because used in g()
    sv.crternari(A&~B) CR80.v.SO, CR60.v.GT, CR80.v.LT
    g(CR80.v.SO)
skip_g:
    # conditionally call h(r30) if any loop pred set
    sv.bclr/m=r30/~ALL/sz BO[1]=1 h()
    sv.bc/m=r30/~ALL/sz BO[1]=1 while_loop
```