# SVP64 Branch Conditional behaviour

Links

* <https://bugs.libre-soc.org/show_bug.cgi?id=664>
* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-August/003416.html>
* [[openpower/isa/branch]]

Scalar 3.0B Branch Conditional operations (`bc`, `bctar` etc.) test a
Condition Register bit. In a Vector Context it is reasonable and logical
to test a *Vector* of CR Fields instead. In 3D Shader binaries, which are
inherently parallelised and predicated, testing all or some results and
branching based on multiple tests is extremely common. Therefore, `sv.bc`
and other Vector-aware Branch Conditional instructions are worth including.

The `BI` field of Branch Conditional operations is five bits wide. In
scalar v3.0B it selects one bit of the 32-bit CR. In SVP64 there are
16 32-bit CRs, containing 128 4-bit CR Fields. Therefore, the 2 LSBs of
`BI` select the bit within a CR Field, and the top 3 bits are extended,
as either scalar or vector, to select CR Fields 0..127 as specified in
the SVP64 [[sv/svp64/appendix]].
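
The split of `BI` described above can be sketched in Python. The helper name `split_bi` is hypothetical and not part of the specification. The extension of the 3-bit selector to CR Fields 0..127 is defined in the appendix and is deliberately not modelled here.

```python
def split_bi(bi):
    """Split a 5-bit BI value into (field selector, bit index)."""
    if not 0 <= bi < 32:
        raise ValueError("BI is a 5-bit field")
    field_sel = bi >> 2    # top 3 bits: input to the SVP64 EXTRA extension
    bit_idx = bi & 0b11    # 2 LSBs: which of the 4 bits in the CR Field
    return field_sel, bit_idx
```

For example, `split_bi(0b01110)` gives selector 3 and bit index 2.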

When considering an "array" of branches, there are two useful modes:

* Branch takes place on the first CR test to succeed
  (a Great Big OR of all condition tests)
* Branch takes place only if **all** CR tests succeed
  (a Great Big AND of all condition tests, including those
  where the predicate is masked out and the corresponding
  CR Field is considered to be set to `SNZ`)
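
As a minimal illustration, the two modes correspond to Python's built-in `any` and `all` applied to the per-element test results. The function name `branch_decision` is hypothetical, and the results are assumed to have been computed already, with masked-out elements substituted or dropped.

```python
def branch_decision(tests, all_mode):
    """Combine per-element test results (a list of booleans)."""
    return all(tests) if all_mode else any(tests)
```

Both built-ins short-circuit, which mirrors the early exit that a sequential implementation can take.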

In Vertical-First Mode, the `ALL` bit should not be used. If set,
behaviour is `UNDEFINED`. (*The reason is that Vertical-First hints may
permit multiple elements, up to the hint length, to be executed in
parallel, but the actual number is entirely up to implementors. Testing
an indeterminate number of conditions is impossible to define, and
efforts to enforce defined behaviour would interfere with the
opportunistic parallelism of Vertical-First mode.*)

In `svstep` mode, the whole CR Field selected by the top 3 bits of `BI`
is updated by incrementing srcstep and dststep and performing the same
tests as [[sv/svstep]]. The Branch Conditional instruction then proceeds
as normal, reading and testing the CR bit just updated if the `BO` field
requests a CR test. Note that the SVSTATE fields and the CR Field are
still updated even if the `BO` bits do not require CR testing.

Predication in both INT and CR modes may be applied to `sv.bc` and other
SVP64 Branch Conditional operations, exactly as it may be applied to
other SVP64 operations. When `sz` is zero, any masked-out Branch-element
operations are not executed, exactly like all other SVP64 operations.

When `sz` is non-zero, however, this normally requests insertion of a
zero in place of the input data when the relevant predicate mask bit is
zero. A zero would therefore be inserted in place of `CR[BI+32]` for
testing against `BO`, which may not be desirable in all circumstances.
An extra field, `SNZ`, is therefore provided which, if set, inserts a
**one** in place of a masked-out element instead of a zero.
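
The selection of the bit to test for a single element can be sketched as follows. The function name `select_testbit` and the use of `None` to signal a skipped element are illustrative assumptions, not part of the specification.

```python
def select_testbit(pred_bit, cr_bit, sz, snz):
    """Return the bit to test for one element, or None if it is skipped."""
    if pred_bit:
        return cr_bit        # element is active: test the real CR bit
    if not sz:
        return None          # masked out and sz=0: element is not executed
    return 1 if snz else 0   # masked out and sz=1: substitute SNZ
```

With `sz=1`, choosing `SNZ` decides whether a masked-out element passes or fails the `BO[1]` comparison.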

(*Note: Both options are provided because it is useful to deliberately
cause the Branch-Conditional Vector testing to fail at a specific point,
controlled by the Predicate mask. This is particularly useful in `VLSET`
mode, which truncates SVSTATE.VL at the point of the first failed test.*)

SVP64 RM `MODE` for Branch Conditional:

| 0-1 | 2   | 3   | 4  | description                            |
| --- | --- | --- | -- | -------------------------------------- |
| 00  | SNZ | ALL | sz | normal mode                            |
| 01  | VLI | ALL | sz | VLSET mode                             |
| 10  | SNZ | ALL | sz | svstep mode                            |
| 11  | VLI | ALL | sz | svstep VLSET mode, in Horizontal-First |
| 11  | VLI | SNZ | sz | svstep VLSET mode, in Vertical-First   |
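
The table can be read as a small decoder. The sketch below is hypothetical: `decode_mode` is not a name from the specification, and it assumes the five bits are supplied as a string of digits in the same left-to-right order as the table columns (bit 0 first), which sidesteps any question of integer bit numbering.

```python
def decode_mode(bits, vertical_first=False):
    """Decode a 5-digit MODE string written in table order (bit 0 first)."""
    if len(bits) != 5 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 5 binary digits")
    b = [int(c) for c in bits]
    svstep, vlset = bool(b[0]), bool(b[1])
    fields = {"svstep": svstep, "VLSET": vlset, "sz": b[4]}
    # bit 2 is VLI in the two VLSET modes, SNZ otherwise
    fields["VLI" if vlset else "SNZ"] = b[2]
    # bit 3 is ALL, except in svstep VLSET mode under Vertical-First
    if svstep and vlset and vertical_first:
        fields["SNZ"] = b[3]
    else:
        fields["ALL"] = b[3]
    return fields
```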

Fields:

* **sz** if set, and predication is enabled, 4 copies of `SNZ` are put in
  place of the src CR Field when the predicate bit is zero. Otherwise the
  masked-out element is ignored or skipped, depending on context.
* **ALL** when set, all branch conditional tests must pass in order for
  the branch to succeed.
* **VLI** in VLSET mode, VL is truncated at the first branch test which
  fails. If VLI (Vector Length Inclusive) is clear, VL is truncated
  to *exclude* the current element, otherwise it is included.
  SVSTATE.MVL is not changed.
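
The effect of `VLI` on truncation can be sketched as below. This follows the Horizontal-First pseudocode in this document, which truncates at the first failing element test. The function name `truncate_vl` and the `None` return for "VL unchanged" are assumptions made for illustration.

```python
def truncate_vl(test_results, vli):
    """Return the new VL, or None if no test fails (VL left unchanged)."""
    for step, ok in enumerate(test_results):
        if not ok:
            # VLI set: include the failing element, otherwise exclude it
            return step + 1 if vli else step
    return None
```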

svstep mode runs an increment of SVSTATE srcstep and dststep (which is
still useful in Horizontal-First Mode). Unlike `svstep.`, however, which
updates only CR0 with the results of testing REMAP loop progress, the
CR Field to update is taken from the branch `BI` field, and is updated
prior to each element's branch conditional test.

Note that, due to the side-effects of `VLSET` mode and `svstep` mode, it
is useful to use Branch Conditional even when no actual branch takes
place, i.e. with the target pointing to the instruction after the branch.
In particular, svstep mode is still useful in Horizontal-First Mode,
especially in combination with REMAP. All "loop end" conditions are
tested on a per-element basis and placed into a Vector of CR Fields
starting from the point specified by the Branch `BI` field. This Vector
of CR Fields may subsequently be used as a Predicate Mask and,
furthermore, if VLSET mode was requested, VL will have been set to the
length of one of the loop endpoints, again as specified by the bit from
the Branch `BI` field.

Available options to combine:

* `BO[1]` to select whether the CR bit being tested is zero or nonzero
* `R30` and `~R30` and other predicate mask options, including CR and
  inverted CR bit testing
* `sz` and `SNZ` to insert either zeros or ones in place of masked-out
  predicate bits
* `ALL` or `ANY` behaviour, corresponding to `AND` of all tests and
  `OR` of all tests, respectively.

Pseudocode for Horizontal-First Mode:

```
if BO[0]:
    # BO[0] set: the CR is not tested
    cond_ok = 1
else:
    # AND-reduction (ALL) starts from 1, OR-reduction (ANY) from 0
    cond_ok = SVRMmode.ALL
    for srcstep in range(VL):
        new_srcstep, CRbits = SVSTATE_NEXT(srcstep)
        # select predicate bit or zero/one
        if predicate[srcstep]:
            # get SVP64 extended CR field 0..127
            SVCRf = SVP64EXTRA(BI>>2)
            CR{SVCRf+srcstep} = CRbits
            testbit = CRbits[BI & 0b11]
            # testbit = CR[BI+32+srcstep*4]
        else if not SVRMmode.sz:
            # masked-out element without zeroing: skip it
            continue
        else:
            testbit = SVRMmode.SNZ
        # actual element test here
        el_cond_ok = not (testbit ^ BO[1])
        # merge in the test
        if SVRMmode.ALL:
            cond_ok &= el_cond_ok
        else:
            cond_ok |= el_cond_ok
        # test for VL to be set (and exit)
        if not el_cond_ok and VLSET:
            if SVRMmode.VLI:
                SVSTATE.VL = srcstep+1
            else:
                SVSTATE.VL = srcstep
            break
        # early exit?
        if SVRMmode.ALL:
            if not el_cond_ok:
                break
        else:
            if el_cond_ok:
                break
```
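
A runnable Python model of the Horizontal-First flow may help when experimenting with the options. It is only a sketch under stated assumptions: the function name and argument layout are hypothetical, CR Fields are plain 4-element lists that are read rather than regenerated (so `svstep` mode and `SVSTATE_NEXT` are not modelled), and the reduction is started from 1 for `ALL` and 0 for `ANY` so that the AND and OR behave as described in the text.

```python
def sv_bc_horizontal(cr_fields, predicate, bit_idx, bo0, bo1,
                     all_mode, sz, snz, vlset=False, vli=False):
    """Model the Horizontal-First branch decision.

    cr_fields: list of CR Fields, each a list of 4 bits [b0, b1, b2, b3]
    predicate: list of 0/1 mask bits, one per element
    Returns (cond_ok, new_vl); new_vl is None when VL is unchanged.
    """
    if bo0:
        return True, None               # BO[0] set: no CR test at all
    all_mode = bool(all_mode)
    cond_ok = all_mode                  # ASSUMPTION: AND starts True, OR False
    new_vl = None
    for step, (field, pbit) in enumerate(zip(cr_fields, predicate)):
        if pbit:
            testbit = field[bit_idx]    # active element: real CR bit
        elif not sz:
            continue                    # masked out, not zeroed: skipped
        else:
            testbit = 1 if snz else 0   # masked out, zeroed: use SNZ
        el_ok = bool(testbit) == bool(bo1)
        cond_ok = (cond_ok and el_ok) if all_mode else (cond_ok or el_ok)
        if not el_ok and vlset:
            new_vl = step + 1 if vli else step
            break
        if el_ok != all_mode:           # ALL stops on a fail, ANY on a pass
            break
    return cond_ok, new_vl
```

For instance, with four elements whose selected bits are 1, 1, 0, 1 and `BO[1]` set, `ANY` mode branches on the first element, while `ALL` mode with `VLSET` and `VLI` clear reports a failed branch and a new VL of 2.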

Pseudocode for Vertical-First Mode:

```
new_srcstep, CRbits = SVSTATE_NEXT(srcstep)
if BO[0]:
    # BO[0] set: the CR is not tested
    cond_ok = 1
else:
    # select predicate bit or zero/one
    if predicate[srcstep]:
        # get SVP64 extended CR field 0..127
        SVCRf = SVP64EXTRA(BI>>2)
        CR{SVCRf+srcstep} = CRbits
        testbit = CRbits[BI & 0b11]
    else if not SVRMmode.sz:
        SVSTATE.srcstep = new_srcstep
        exit # no branch testing
    else:
        testbit = SVRMmode.SNZ
    # actual element test here
    cond_ok = not (testbit ^ BO[1])
    # test for VL to be set (and exit)
    if not cond_ok and VLSET:
        if SVRMmode.VLI:
            SVSTATE.VL = new_srcstep+1
        else:
            SVSTATE.VL = new_srcstep
SVSTATE.srcstep = new_srcstep
```
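
A single Vertical-First step can be modelled in the same spirit. This is a sketch only: the function name and the `state` dictionary are hypothetical, the CR Field is passed in rather than produced by `SVSTATE_NEXT`, and `new_srcstep` is assumed to be simply `srcstep + 1`, which the pseudocode above does not state.

```python
def sv_bc_vertical_step(state, field, pbit, bit_idx, bo0, bo1,
                        sz, snz, vlset=False, vli=False):
    """One Vertical-First element step.

    state: dict with 'srcstep' and 'VL', updated in place.
    Returns cond_ok, or None when the element is skipped (no branch test).
    """
    new_srcstep = state["srcstep"] + 1  # ASSUMPTION: stand-in for SVSTATE_NEXT
    cond_ok = True                      # BO[0] set: no CR test
    if not bo0:
        if pbit:
            testbit = field[bit_idx]    # active element: real CR bit
        elif not sz:
            state["srcstep"] = new_srcstep
            return None                 # masked out, not zeroed: no test
        else:
            testbit = 1 if snz else 0   # masked out, zeroed: use SNZ
        cond_ok = bool(testbit) == bool(bo1)
        if not cond_ok and vlset:
            state["VL"] = new_srcstep + 1 if vli else new_srcstep
    state["srcstep"] = new_srcstep
    return cond_ok
```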