# SVP64 Branch Conditional behaviour

Links

* <https://bugs.libre-soc.org/show_bug.cgi?id=664>
* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-August/003416.html>
* [[openpower/isa/branch]]

Scalar 3.0B Branch Conditional operations (`bc`, `bctar` etc.) test a
Condition Register. In a Vector Context it is reasonable and logical
to test a *Vector* of CR Fields. In 3D Shader binaries, which are
inherently parallelised and predicated, testing all or some results and
branching based on multiple tests is extremely common, and is a
fundamental part of Shader Compilers. Therefore, `sv.bc` and other
Vector-aware Branch Conditional instructions are worth including.

The `BI` field of Branch Conditional operations is five bits: in scalar
v3.0B it selects one bit of the 32-bit CR. In SVP64 there are sixteen
32-bit CRs, containing 128 4-bit CR Fields. Therefore, the 2 LSBs of
`BI` select the bit within the CR Field (LT, GT, EQ, SO), and the top
3 bits are extended, as either scalar or vector, to select CR Fields
0..127 as specified in the SVP64 [[sv/svp64/appendix]].

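As a minimal sketch of the split described above (illustrative Python,
not part of the specification; `split_bi` and `CR_BIT_NAMES` are
hypothetical names), the 2 LSBs pick the bit within a CR Field and the
top 3 bits give the field number before SVP64 extension. The extension
to CR Fields 0..127 is defined in the appendix and is deliberately not
modelled here.

```python
# Bit order within a 4-bit CR Field, indexed by BI & 0b11.
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")

def split_bi(bi):
    """Split a 5-bit BI into (unextended CR Field number, bit within field)."""
    if not 0 <= bi < 32:
        raise ValueError("BI is a 5-bit field")
    return bi >> 2, bi & 0b11

field, bit = split_bi(0b01110)
print(field, CR_BIT_NAMES[bit])  # prints: 3 EQ
```
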
When considering an "array" of branches, there are two useful modes:

* Branch takes place on the first CR test to succeed
  (a Great Big OR of all condition tests)
* Branch takes place only if **all** CR tests succeed:
  a Great Big AND of all condition tests
  (including those where the predicate is masked out
  and the corresponding CR Field is considered to be
  set to `SNZ`)

In SVP64 Horizontal-First Mode, the first failure in ALL mode (Great Big
AND) results in early exit: no more updates to CTR occur (if requested);
no branch occurs, and LR is not updated (if requested). Likewise in
non-ALL mode (Great Big OR), early exit occurs on the first success,
this time with the Branch proceeding. In both cases the testing
of the Vector of CRs should be done in linear sequential order (or in
REMAP re-sequenced order), such that tests sequentially beyond
the exit point are *not* carried out. (*Note: it is standard practice in
programming languages to exit early from conditional tests*)

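The early-exit rule can be illustrated with a small Python model. This
is a sketch only: `vector_branch` is a hypothetical name, the
per-element test results are assumed to be already computed, and CTR
and LR updates are not modelled.

```python
def vector_branch(tests, all_mode):
    """Return (branch_taken, tests_performed) for per-element test results.

    tests is a sequence of booleans in element (or REMAP) order.
    """
    performed = 0
    for ok in tests:
        performed += 1
        if all_mode and not ok:
            return False, performed   # ALL: first failure, no branch
        if not all_mode and ok:
            return True, performed    # ANY: first success, branch
    # No early exit: ALL saw only passes, ANY saw only failures.
    return all_mode, performed

print(vector_branch([True, False, True], all_mode=True))    # (False, 2)
print(vector_branch([False, True, True], all_mode=False))   # (True, 2)
```

In both calls the third element is never examined, which is the
property the paragraph above asks implementations to preserve.
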
In Vertical-First Mode, the `ALL` bit should not be used. If set,
behaviour is `UNDEFINED`. (*The reason is that Vertical-First hints may
permit multiple elements, up to the hint length, to be executed in
parallel; however the number is entirely up to implementors. Testing
an arbitrary, indeterminate number of Conditional tests is impossible
to define, and efforts to enforce such defined behaviour would interfere
with Vertical-First mode parallel opportunistic behaviour.*)

In `svstep` mode, the whole CR Field, part of which is selected by `BI`
(top 3 bits), is updated based on incrementing srcstep and dststep, and
performing the same tests as [[sv/svstep]]. Following the step update,
which involves writing to the exact CR Field about to be tested, the
Branch Conditional instruction proceeds as normal (reading and testing
the CR bit just updated, if the relevant `BO` bit is set). Note that
the SVSTATE fields and the CR Field are still updated even if the
`BO` bits do not require CR testing.

Predication in both INT and CR modes may be applied to `sv.bc` and other
SVP64 Branch Conditional operations, exactly as it may be applied to
other SVP64 operations. When `sz` is zero, any masked-out Branch-element
operations are not executed, exactly like all other SVP64 operations.

However when `sz` is non-zero, this normally requests insertion of a zero
in place of the input data when the relevant predicate mask bit is zero.
This would mean that a zero is inserted in place of `CR[BI+32]` for
testing against `BO`, which may not be desirable in all circumstances.
Therefore, an extra field, `SNZ`, is provided which, if set, will insert
a **one** in place of a masked-out element instead of a zero.

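A minimal Python sketch of the per-element selection just described
(`element_test` is a hypothetical helper; bits are modelled as the
integers 0 and 1, and the final expression mirrors the
`BO[0] | ¬(testbit ^ BO[1])` form used in the pseudocode on this page):

```python
def element_test(cr_bit, pred_bit, sz, snz, bo0, bo1):
    """Return None when the element is skipped, else the element's test result."""
    if pred_bit:
        testbit = cr_bit
    elif not sz:
        return None            # masked out and sz=0: element not executed
    else:
        testbit = snz          # masked out and sz=1: substitute SNZ
    # same form as the scalar test: BO[0] | not (testbit xor BO[1])
    return bool(bo0) or (testbit == bo1)

# Masked-out element, BO[0]=0 (conditional), BO[1]=1 (bit must be set).
print(element_test(cr_bit=1, pred_bit=0, sz=0, snz=0, bo0=0, bo1=1))  # None
print(element_test(cr_bit=1, pred_bit=0, sz=1, snz=0, bo0=0, bo1=1))  # False
print(element_test(cr_bit=1, pred_bit=0, sz=1, snz=1, bo0=0, bo1=1))  # True
```

With `sz` set, the choice of `SNZ` alone decides whether a masked-out
element passes or fails, regardless of the CR bit actually stored.
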
(*Note: Both options are provided because it is useful to deliberately
cause the Branch-Conditional Vector testing to fail at a specific point,
controlled by the Predicate mask. This is particularly useful in `VLSET`
mode, which will truncate SVSTATE.VL at the point of the first failed
test.*)

SVP64 RM `MODE` for Branch Conditional:

| 0-1 |  2  |  3   4  | description                            |
| --- | --- | ------- | -------------------------------------- |
| 00  | SNZ | ALL  sz | normal mode                            |
| 01  | VLI | ALL  sz | VLSET mode                             |
| 10  | SNZ | ALL  sz | svstep mode                            |
| 11  | VLI | ALL  sz | svstep VLSET mode, in Horizontal-First |
| 11  | VLI | SNZ  sz | svstep VLSET mode, in Vertical-First   |

Fields:

* **sz** when set (and predication is enabled), 4 copies of `SNZ` are
  put in place of the src CR Field when the predicate bit is zero.
  Otherwise (`sz` clear) the element is ignored or skipped, depending
  on context.
* **ALL** when set, all branch conditional tests must pass in order for
  the branch to succeed.
* **VLI** In VLSET mode, VL is set equal (truncated) to the point of the
  first branch test which fails, as in the pseudocode below. If VLI
  (Vector Length Inclusive) is clear, VL is truncated to *exclude* the
  current element, otherwise it is included. SVSTATE.MVL is not changed.

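The effect of VLI can be sketched in Python as follows. This follows
the pseudocode on this page, which truncates VL at the first failing
element test; `truncate_vl` is a hypothetical name, and predication
and the branch decision itself are not modelled.

```python
def truncate_vl(vl, results, vli):
    """Return the new VL given per-element test results (True = passed)."""
    for step in range(vl):
        if not results[step]:
            # VLI set: include the failing element; clear: exclude it
            return step + 1 if vli else step
    return vl   # no failing test: VL unchanged

results = [True, True, False, True]
print(truncate_vl(4, results, vli=False))  # 2
print(truncate_vl(4, results, vli=True))   # 3
```
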
svstep mode will run an increment of SVSTATE srcstep and dststep
(which is still useful in Horizontal-First Mode). Unlike `svstep.`,
however, which updates only CR0 with the testing of REMAP loop progress,
the CR Field is taken from the branch `BI` field, and is updated prior
to proceeding to each element's branch conditional testing.

Note that, interestingly, due to the useful side-effects of `VLSET` mode
and `svstep` mode, it is actually useful to use Branch Conditional even
to perform no actual branch operation, i.e. to point to the instruction
after the branch.

In particular, svstep mode is still useful in Horizontal-First Mode,
particularly in combination with REMAP. All "loop end" conditions
will be tested on a per-element basis and placed into a Vector of CRs
starting from the point specified by the Branch `BI` field. This Vector
of CR Fields may then be subsequently used as a Predicate Mask, and,
furthermore, if VLSET mode was requested, VL will have been set to the
length of one of the loop endpoints, again as specified by the bit from
the Branch `BI` field.

Also, the unconditional bit `BO[0]` is still relevant when Predication
is applied to the Branch, because in `ALL` mode all nonmasked bits have
to be tested. Even when svstep mode or VLSET mode are not used, CTR
may still be decremented by the total number of nonmasked elements.
In short, Vectorised Branch becomes an extremely powerful tool.

Available options to combine:

* `BO[0]` to make an unconditional branch would seem irrelevant if
  it were not for predication and for side-effects.
* `BO[1]` to select whether the CR bit being tested is zero or nonzero
* `R30` and `~R30` and other predicate mask options including CR and
  inverted CR bit testing
* `sz` and `SNZ` to insert either zeros or ones in place of masked-out
  predicate bits
* `ALL` or `ANY` behaviour corresponding to `AND` of all tests and
  `OR` of all tests, respectively.

In addition to the above, it is necessary to select whether, in `svstep`
mode, the Vector CR Field is to be overwritten or not: in some cases
it is useful to know the results, but in others all that is needed is
the branch itself. In the case of `sv.bc` there is no additional
bitspace, so the `AA` field is re-interpreted instead to be `Rc`.
For `sv.bclr` there is free bitspace, and so bit 16 has been chosen
as `Rc`.

**These interpretations are only available for sv.bc, they are NOT
available for Power ISA v3.0B** i.e. only when embedded in an SVP64
Prefix Context do these and all other parts of this specification
apply. To repeat: **Standard Scalar v3.0B Branch is in
absolutely no way impacted or altered in any way shape or form by
the SVP64 variant of the same**

Pseudocode for Rc in sv.bc

```
# Use bit 30 as Rc, disable AA
Rc = AA
AA = 0
```

Pseudocode for Rc in sv.bclr

```
# use bit 16 of opcode as Rc
Rc = instr[16]
```

Pseudocode for Horizontal-First Mode:

```
# start from the identity of the reduction:
# 1 for ALL (Great Big AND), 0 for ANY (Great Big OR)
cond_ok = SVRMmode.ALL
for srcstep in range(VL):
    # select predicate bit or zero/one
    if predicate[srcstep]:
        # get SVP64 extended CR field 0..127
        SVCRf = SVP64EXTRA(BI>>2)
        if svstep_mode then
            new_srcstep, CRbits = SVSTATE_NEXT(srcstep)
        else
            CRbits = CR{SVCRf}
        if Rc = 1 then # CR0 Vectorised
            CR{0+srcstep} = CRbits
        testbit = CRbits[BI & 0b11]
        # testbit = CR[BI+32+srcstep*4]
    else if not SVRMmode.sz:
        continue
    else
        testbit = SVRMmode.SNZ
    # actual element test here
    el_cond_ok <- BO[0] | ¬(testbit ^ BO[1])
    # merge in the test
    if SVRMmode.ALL:
        cond_ok &= el_cond_ok
    else
        cond_ok |= el_cond_ok
    # test for VL to be set (and exit)
    if ~el_cond_ok and VLSET
        if SVRMmode.VLI
            SVSTATE.VL = srcstep+1
        else
            SVSTATE.VL = srcstep
        break
    # early exit?
    if SVRMmode.ALL:
        if ~el_cond_ok:
            break
    else
        if el_cond_ok:
            break
    if svstep_mode then
        SVSTATE.srcstep = new_srcstep
```

Pseudocode for Vertical-First Mode:

```
# get SVP64 extended CR field 0..127
SVCRf = SVP64EXTRA(BI>>2)
if svstep_mode then
    new_srcstep, CRbits = SVSTATE_NEXT(srcstep)
else
    CRbits = CR{SVCRf}
# select predicate bit or zero/one
if predicate[srcstep]:
    if Rc = 1 then # CR0 vectorised
        CR{0+srcstep} = CRbits
    testbit = CRbits[BI & 0b11]
else if not SVRMmode.sz:
    SVSTATE.srcstep = new_srcstep
    exit # no branch testing
else
    testbit = SVRMmode.SNZ
# actual element test here
cond_ok <- BO[0] | ¬(testbit ^ BO[1])
# test for VL to be set (and exit)
if ~cond_ok and VLSET
    if SVRMmode.VLI
        SVSTATE.VL = new_srcstep+1
    else
        SVSTATE.VL = new_srcstep
if svstep_mode then
    SVSTATE.srcstep = new_srcstep
```

# Example Shader code

```
while(a > 2) {
    if(b < 5)
        f();
    else
        g();
    h();
}
```

which compiles to something like:

```
vec<i32> a, b;
// ...
pred loop_pred = a > 2;
while(loop_pred.any()) {
    pred if_pred = loop_pred & (b < 5);
    if(if_pred.any()) {
        f(if_pred);
    }
label1:
    pred else_pred = loop_pred & ~if_pred;
    if(else_pred.any()) {
        g(else_pred);
    }
    h(loop_pred);
}
```

which will end up as:

```
    sv.cmpi CR60.v, a.v, 2     # vector compare a into CR60 vector
    sv.crweird r30, CR60.GT    # transfer GT vector to r30
while_loop:
    sv.cmpi CR80.v, b.v, 5     # vector compare b into CR80 vector
    sv.bc/m=r30/~ALL/sz CR80.v.LT skip_f # skip when none
    # only calculate loop_pred & pred_b because needed in f()
    sv.crand CR80.v.SO, CR60.v.GT, CR80.v.LT # if = loop & pred_b
    f(CR80.v.SO)
skip_f:
    # illustrate inversion of pred_b. invert r30, test ALL
    # rather than SOME, but masked-out zero test would FAIL,
    # therefore masked-out instead is tested against 1 not 0
    sv.bc/m=~r30/ALL/SNZ CR80.v.LT skip_g
    # else = loop & ~pred_b, need this because used in g()
    sv.crternari(A&~B) CR80.v.SO, CR60.v.GT, CR80.v.LT
    g(CR80.v.SO)
skip_g:
    # conditionally call h(r30) if any loop pred set
    sv.bclr/m=r30/~ALL/sz BO[1]=1 h()
    sv.bc/m=r30/~ALL/sz BO[1]=1 while_loop
```