# SVP64 Branch Conditional behaviour

Links

* <https://bugs.libre-soc.org/show_bug.cgi?id=664>
* <http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-August/003416.html>
* [[openpower/isa/branch]]

Scalar 3.0B Branch Conditional operations (`bc`, `bctar` etc.) test a Condition Register.
In a Vector Context it is reasonable and logical to test a *Vector* of
CR Fields instead. In 3D Shader binaries, which are inherently parallelised
and predicated, testing all or some results and branching based on
multiple tests is extremely common.
`sv.bc` and the other Branch Conditional instructions are therefore worth
including.

The `BI` field of Branch Conditional operations is five bits wide.
In scalar v3.0B it selects one bit of the 32-bit CR.
In SVP64 there are 16 32-bit CRs, containing 128 4-bit CR Fields.
Therefore, the 2 LSBs of `BI` select the bit within a CR Field, and the
top 3 bits are extended, as either scalar or vector, to
select CR Fields 0..127 as specified
in the SVP64 [[sv/svp64/appendix]].

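The field split described above can be sketched in Python. This is a minimal illustration only: the name `split_bi` and the `CR_BIT_NAMES` table are hypothetical, and the SVP64 scalar/vector extension of the 3-bit field selector (defined in the appendix) is deliberately left out.

```python
# Names of the four bits inside one CR Field, in Power ISA (MSB0) order.
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")


def split_bi(bi):
    """Split a 5-bit BI value into (field_selector, bit_index).

    field_selector is the raw 3-bit value *before* any SVP64 extension;
    bit_index selects one of the 4 bits within the chosen CR Field.
    """
    if not 0 <= bi < 32:
        raise ValueError("BI is a 5-bit field")
    field_selector = bi >> 2   # top 3 bits
    bit_index = bi & 0b11      # 2 LSBs
    return field_selector, bit_index
```

For example, `split_bi(0b10110)` yields field selector 5 and bit index 2, i.e. the `EQ` bit of the selected CR Field.
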
When considering an "array" of branches, there are two useful modes:

* Branch takes place on the first CR test to succeed
  (a Great Big OR of all condition tests)
* Branch takes place only if **all** CR tests succeed:
  a Great Big AND of all condition tests
  (including those where the predicate is masked out
  and the corresponding CR Field is considered to be
  set to `SNZ`)

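The two modes amount to an OR-reduction and an AND-reduction over the per-element test results. A minimal sketch, assuming the element tests have already been evaluated into a list of booleans (the function name `reduce_tests` is hypothetical):

```python
def reduce_tests(results, all_mode):
    """Combine per-element branch tests.

    all_mode=True  -> Great Big AND (every test must succeed)
    all_mode=False -> Great Big OR  (first success is enough)
    """
    # all() and any() both short-circuit, mirroring an early exit
    # at the first failing (AND) or first succeeding (OR) element.
    return all(results) if all_mode else any(results)
```

Note that with no elements tested at all, the AND form is vacuously true and the OR form is false, which is simply the behaviour of Python's `all` and `any` on an empty sequence.
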
In Vertical-First Mode, the `ALL` bit should
not be used. If set, behaviour is `UNDEFINED`.
(*The reason is that Vertical-First hints may permit
multiple elements, up to the hint length, to be executed
in parallel, but the number is entirely up to
implementors. Testing an arbitrary,
indeterminate number of Conditional tests is impossible
to define, and efforts to enforce such defined behaviour
interfere with Vertical-First mode's parallel
opportunistic behaviour.*)

In `svstep` mode,
the whole CR Field selected by the top 3 bits of `BI`
is updated based on
incrementing srcstep and dststep and performing the
same tests as [[sv/svstep]]. The Branch
Conditional instruction then proceeds as normal, reading
and testing the CR bit just updated if the relevant
`BO` bit requests it. Note that the SVSTATE fields
and the CR Field are still updated
even if the `BO` bits do not require CR testing.

Predication in both INT and CR modes may be applied to
`sv.bc` and other SVP64 Branch Conditional operations,
exactly as it may be applied to other SVP64 operations.
When `sz` is zero, any masked-out Branch-element operations
are not executed, exactly like all other SVP64
operations.

However, a non-zero `sz` normally requests insertion
of a zero in place of the input data when the relevant predicate
mask bit is zero. This would mean that a zero is inserted in
place of `CR[BI+32]` for testing against `BO`, which may not
be desirable in all circumstances. Therefore an extra field,
`SNZ`, is provided which, if set, will insert a **one** in
place of a masked-out element instead of a zero.

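The interaction of the predicate bit, `sz` and `SNZ` can be sketched as a small selection function. This is an illustration under stated assumptions: the name `select_testbit` is hypothetical, and returning `None` is just this sketch's way of signalling "element skipped, no test performed".

```python
def select_testbit(cr_bit, pred_bit, sz, snz):
    """Pick the bit that is tested against BO for one element.

    cr_bit   -- the CR bit that would normally be tested (0 or 1)
    pred_bit -- predicate mask bit for this element (0 = masked out)
    sz, snz  -- the sz and SNZ mode bits
    Returns 0 or 1, or None when the element is skipped entirely.
    """
    if pred_bit:
        return cr_bit          # element active: test the real CR bit
    if not sz:
        return None            # masked out, sz=0: element not executed
    return 1 if snz else 0     # masked out, sz=1: substitute SNZ
```

With `sz` set, a masked-out element therefore behaves as a constant 0 or 1 chosen by `SNZ`, which is what allows the predicate mask to force a test to pass or fail at a chosen position.
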
(*Note: Both options are provided because it is useful to
deliberately cause the Branch-Conditional Vector testing
to fail at a specific point, controlled by the Predicate
mask. This is particularly useful in `VLSET` mode, which
will truncate SVSTATE.VL at the point of the first failed
test.*)

SVP64 RM `MODE` for Branch Conditional:

| 0-1 | 2   | 3   | 4   | description       |
| --- | --- | --- | --- | ----------------- |
| 00  | SNZ | ALL | sz  | normal mode       |
| 01  | VLI | ALL | sz  | VLSET mode        |
| 10  | SNZ | ALL | sz  | svstep mode       |
| 11  | VLI | ALL | sz  | svstep VLSET mode |

Fields:

* **sz** when set, and if predication is enabled, puts 4 copies of `SNZ` in place of the src CR Field when the predicate bit is zero. Otherwise the element is ignored or skipped, depending on context.
* **ALL** when set, all branch conditional tests must pass in order for
  the branch to succeed.
* **VLI** In VLSET mode, VL is set equal (truncated) to the first branch
  which succeeds. If VLI (Vector Length Inclusive) is clear, VL is truncated
  to *exclude* the current element, otherwise it is included. SVSTATE.MVL is not changed.

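The inclusive/exclusive effect of `VLI` can be sketched as follows. The sketch follows the Horizontal-First pseudocode in this document, where truncation is triggered by the first element whose test does not pass (`~el_cond_ok and VLSET`). The function name `truncated_vl` and the list-of-results input are assumptions made purely for illustration.

```python
def truncated_vl(el_results, vl, vli):
    """Return the new VL after a VLSET-mode branch.

    el_results -- per-element test outcomes (truthy = test passed)
    vl         -- current vector length
    vli        -- Vector Length Inclusive bit
    MVL is never touched, so it does not appear here.
    """
    for step, ok in enumerate(el_results[:vl]):
        if not ok:
            # VLI set: keep the triggering element; clear: exclude it.
            return step + 1 if vli else step
    return vl  # no element triggered truncation
```

For a 4-element vector whose third test triggers truncation, this gives VL=2 with `VLI` clear and VL=3 with `VLI` set.
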
svstep mode will run an increment of SVSTATE srcstep and dststep
(which is still useful in Horizontal-First Mode). Unlike `svstep.`, however,
which updates only CR0 with the testing of REMAP loop progress,
the CR Field is taken from the branch `BI` field and updated
prior to each element's branch conditional test.

Note that, due to the useful side-effects of `VLSET` mode
and `svstep` mode, it is actually useful to use Branch Conditional even
to perform no actual branch operation, i.e. to point to the instruction
after the branch.
In particular, svstep mode is still useful for Horizontal-First Mode,
particularly in combination with REMAP. All "loop end" conditions
will be tested on a per-element basis and placed into a Vector of
CRs starting from the point specified by the Branch `BI` field.
This Vector of CR Fields may then be subsequently used as a Predicate
Mask and, furthermore, if VLSET mode was requested, VL will have
been set to the length of one of the loop endpoints, again as specified
by the bit from the Branch `BI` field.

Available options to combine:

* `BO[1]` to select whether the CR bit being tested is zero or nonzero
* `R30` and `~R30` and other predicate mask options including CR and
  inverted CR bit testing
* `sz` and `SNZ` to insert either zeros or ones in place of masked-out
  predicate bits
* `ALL` or `ANY` behaviour corresponding to `AND` of all tests and
  `OR` of all tests, respectively.

Pseudocode for Horizontal-First Mode:

```
if BO[0]:
    cond_ok = 1
else:
    # start from the identity of the reduction:
    # 1 for AND (ALL set), 0 for OR (ALL clear)
    cond_ok = SVRMmode.ALL
    for srcstep in range(VL):
        new_srcstep, CRbits = SVSTATE_NEXT(srcstep)
        # select predicate bit or zero/one
        if predicate[srcstep]:
            # get SVP64 extended CR field 0..127
            SVCRf = SVP64EXTRA(BI>>2)
            CR{SVCRf+srcstep} = CRbits
            testbit = CRbits[BI & 0b11]
            # testbit = CR[BI+32+srcstep*4]
        else if not SVRMmode.sz:
            continue
        else:
            testbit = SVRMmode.SNZ
        # actual element test here
        el_cond_ok = not (testbit ^ BO[1])
        # merge in the test
        if SVRMmode.ALL:
            cond_ok &= el_cond_ok
        else:
            cond_ok |= el_cond_ok
        # test for VL to be set (and exit)
        if not el_cond_ok and VLSET:
            if SVRMmode.VLI:
                SVSTATE.VL = srcstep+1
            else:
                SVSTATE.VL = srcstep
            break
        # early exit?
        if SVRMmode.ALL:
            if not el_cond_ok:
                break
        else:
            if el_cond_ok:
                break
```

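The Horizontal-First loop can be exercised with a small runnable Python model. This is a sketch under stated assumptions, not a reference implementation: `SVSTATE_NEXT`, the CR Field write-back and `SVP64EXTRA` are omitted, the per-element CR bits are passed in directly as `testbits`, the `Mode` class and `sv_bc_horizontal` are hypothetical names, and `cond_ok` is seeded with the identity of the reduction (true for `ALL`, false otherwise), which is an assumption about the intended behaviour.

```python
from dataclasses import dataclass


@dataclass
class Mode:
    """Hypothetical container for the mode bits used by the sketch."""
    ALL: bool = False
    sz: bool = False
    SNZ: bool = False
    VLSET: bool = False
    VLI: bool = False


def sv_bc_horizontal(bo0, bo1, testbits, predicate, vl, mode):
    """Model one Horizontal-First sv.bc; returns (cond_ok, new_vl)."""
    if bo0:
        return True, vl            # BO[0] set: CR test not required
    cond_ok = mode.ALL             # AND starts true, OR starts false
    for srcstep in range(vl):
        if predicate[srcstep]:
            testbit = testbits[srcstep]
        elif not mode.sz:
            continue               # masked out, sz=0: skip element
        else:
            testbit = mode.SNZ     # masked out, sz=1: substitute SNZ
        el_cond_ok = bool(testbit) == bool(bo1)
        if mode.ALL:
            cond_ok = cond_ok and el_cond_ok
        else:
            cond_ok = cond_ok or el_cond_ok
        # VLSET: truncate VL at the first element that does not pass
        if not el_cond_ok and mode.VLSET:
            vl = srcstep + 1 if mode.VLI else srcstep
            break
        # early exit once the overall result can no longer change
        if mode.ALL and not el_cond_ok:
            break
        if not mode.ALL and el_cond_ok:
            break
    return cond_ok, vl
```

For instance, with `ALL` and `VLSET` set, `BO[1]=1`, CR bits `[1, 0, 1]` and an all-ones predicate, the model reports the branch as not taken and VL truncated to 1 (or to 2 when `VLI` is also set).
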
Pseudocode for Vertical-First Mode:

```
new_srcstep, CRbits = SVSTATE_NEXT(srcstep)
if BO[0]:
    cond_ok = 1
else:
    # select predicate bit or zero/one
    if predicate[srcstep]:
        # get SVP64 extended CR field 0..127
        SVCRf = SVP64EXTRA(BI>>2)
        CR{SVCRf+srcstep} = CRbits
        testbit = CRbits[BI & 0b11]
    else if not SVRMmode.sz:
        SVSTATE.srcstep = new_srcstep
        exit # no branch testing
    else:
        testbit = SVRMmode.SNZ
    # actual element test here
    cond_ok = not (testbit ^ BO[1])
    # test for VL to be set (and exit)
    if not cond_ok and VLSET:
        if SVRMmode.VLI:
            SVSTATE.VL = new_srcstep+1
        else:
            SVSTATE.VL = new_srcstep
SVSTATE.srcstep = new_srcstep
```