[[!tag standards]]
# Condition Register SVP64 Operations

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>
* [[svp64]]
* [[sv/branches]]
* [[openpower/isa/sprset]]
* [[openpower/isa/condition]]
* [[openpower/isa/comparefixed]]

Condition Register Fields are only 4 bits wide, which presents some
interesting conceptual challenges for SVP64, particularly with respect
to element width: this is clearly meaningless for a 4-bit
collation of Conditions (LT, GT, EQ, SO). Likewise, arithmetic saturation
(an important part of Arithmetic SVP64)
has no meaning. Additionally, extra modes are required that only make
sense for Vectorised CR Operations. Consequently an alternative Mode
Format is required, and given that elwidths are meaningless for CR Fields,
the `ELWIDTH` bits in SVP64 `RM` may be used for other purposes.

This alternative mapping **only** applies to instructions that **only**
reference a CR Field or CR bit as the sole exclusive result. This section
**does not** apply to instructions which primarily produce arithmetic
results that also, as an aside, produce a corresponding
CR Field (such as when Rc=1).
Instructions that involve Rc=1 are definitively arithmetic in nature,
where the corresponding Condition Register Field can be considered to
be a "co-result". Such CR Field "co-result" arithmetic operations
are firmly out of scope for
this section.

* Examples of v3.0B instructions to which this section does
  apply include
  - `mfcr` and `cmpi` (3 bit operands) and
  - `crnor` (5 bit operands).
* Examples to which this section does **not** apply include
  `fadds.` and `subf.` which both produce arithmetic results
  (and a CR Field co-result).

The CR Mode Format still applies to `sv.cmpi` because, despite
taking a GPR as input, the output from the Base Scalar v3.0B `cmpi`
instruction is purely to a Condition Register Field.

Other modes are still applicable and include:

* **Data-dependent fail-first**.
  Useful for truncating VL based on
  analysis of a Condition Register result bit.
* **Scalar and parallel reduction**.
  Reduction is useful
  for analysing a Vector of Condition Register Fields
  and reducing it to one
  single Condition Register Field.
* **Predicate-result**.
  An augmentation to predication in that only elements which pass a test
  on the result carried out *by the instruction itself*
  will end up actually being modified. This is in effect the same
  as ANDing the Condition Test with the destination predicate
  mask (hence the name, "predicate-result").

Predicate-result is a particularly powerful strategic mode
in that it is the interaction of a source predicate, destination predicate,
input operands *and* the output result, all combining to influence
what actually goes into the Condition Register File. Given that
predicates may themselves be Condition Registers, it can be seen that
there could potentially be up to **six** CR Fields involved in
the execution of Predicate-result Mode.

A reminder: just as with other SVP64 Modes, and unlike v3.1 64-bit
Prefixing, there are insufficient spare bits in the prefix to mark
the type. Therefore the SVP64 Mode must be identified by first
decoding the suffix (the 32-bit scalar operation). Only once
the instruction is identified (`cmpi`, `mfcr`, `crweird`)
may the type of SVP64 Mode (normal, branch, LDST, CR 3-bit
or CR 5-bit) be decoded.

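The suffix-first decode order described above can be sketched as a simple lookup. This is only an illustration: the table name `CR_RESULT_WIDTH` and the function `svp64_mode_type` are hypothetical, and the table lists only mnemonics that this page itself classifies as 3-bit or 5-bit.

```python
# Hypothetical table: only mnemonics that this page itself classifies.
CR_RESULT_WIDTH = {"mcrf": 3, "crand": 5, "cror": 5, "crnor": 5}


def svp64_mode_type(suffix_mnemonic):
    """Pick the SVP64 Mode type from the already-decoded 32-bit suffix."""
    width = CR_RESULT_WIDTH.get(suffix_mnemonic)
    if width is None:
        # normal, branch or LDST modes: out of scope for this sketch
        return "other"
    return f"CR {width}-bit"


print(svp64_mode_type("mcrf"))
print(svp64_mode_type("crand"))
```

The point of the sketch is the ordering: nothing in the prefix is consulted until the suffix mnemonic is known.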
# Format

SVP64 RM `MODE` (includes `ELWIDTH` bits) for CR-based operations:

| 4 | 5 | 19-20 | 21  | 22 23   | description                                |
| - | - | ----- | --- | ------- | ------------------------------------------ |
|sz |SNZ| 00    | 0   | dz /    | normal mode                                |
|sz |SNZ| 00    | 1   | 0 RG    | scalar reduce mode (mapreduce), SUBVL=1    |
|sz |SNZ| 00    | 1   | 1 /     | parallel reduce mode (mapreduce), SUBVL=1  |
|sz |SNZ| 00    | 1   | SVM RG  | subvector reduce mode, SUBVL>1             |
|sz |SNZ| 01/10 | inv | CR-bit  | Ffirst 3-bit mode                          |
|sz |SNZ| 01/10 | inv | dz /    | Ffirst 5-bit mode                          |
|sz |SNZ| 11    | inv | CR-bit  | 3-bit pred-result CR sel                   |
|sz |SNZ| 11    | inv | dz /    | 5-bit pred-result z/nonz                   |

In Ffirst mode, `VLi=0` when bits 19-20=0b01 and
`VLi=1` when bits 19-20=0b10.

Fields:

* **sz / dz** if predication is enabled, puts zeros into the dest
  (or uses them as src, in the case of twin predication) when the
  predicate bit is zero. Otherwise the element is ignored or skipped,
  depending on context.
* **SNZ** when sz=1 and SNZ=1, a value "1" is put in place of zeros when
  the predicate bit is clear.
* **inv CR bit** just as in branches (BO), these bits allow testing of
  a CR bit and whether it is set (inv=0) or unset (inv=1)
* **RG** reverses the Vector Loop order, running from VL-1 down to 0
  rather than the normal 0..VL-1
* **SVM** sets "subvector" reduce mode
* **VLi** VL inclusive: in fail-first mode, the truncation of
  VL *includes* the current element at the failure point rather
  than excludes it from the count.

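A minimal sketch of how the Format table might be decoded is shown here. Several things are assumptions rather than part of the specification: `rm` is a sequence indexed by the same bit numbers the table uses, bit 19 is treated as the more significant bit of the 19-20 pair (likewise bit 22 for `CRbit`), and `is_5bit` and `subvl` are supplied by the caller from the suffix decode and from SUBVL. The names `CRMode` and `decode_cr_mode` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CRMode:
    name: str
    sz: int
    snz: int
    dz: Optional[int] = None
    inv: Optional[int] = None
    crbit: Optional[int] = None
    rg: Optional[int] = None
    svm: Optional[int] = None
    vli: Optional[int] = None


def decode_cr_mode(rm: Sequence[int], is_5bit: bool, subvl: int = 1) -> CRMode:
    """Decode the CR-ops MODE fields; rm[n] is the bit the table calls n."""
    sz, snz = rm[4], rm[5]
    mode = (rm[19] << 1) | rm[20]       # assumption: bit 19 is the MSB
    b21, b22, b23 = rm[21], rm[22], rm[23]
    if mode == 0b00:
        if b21 == 0:
            return CRMode("normal", sz, snz, dz=b22)
        if subvl > 1:                   # same bits, told apart by SUBVL
            return CRMode("subvector reduce", sz, snz, svm=b22, rg=b23)
        if b22 == 0:
            return CRMode("scalar reduce", sz, snz, rg=b23)
        return CRMode("parallel reduce", sz, snz)
    # 01/10 are fail-first (VLi=0 / VLi=1), 11 is predicate-result
    name = "pred-result" if mode == 0b11 else "ffirst"
    vli = None if mode == 0b11 else int(mode == 0b10)
    if is_5bit:
        return CRMode(name + " 5-bit", sz, snz, dz=b22, inv=b21, vli=vli)
    crbit = (b22 << 1) | b23            # assumption: bit 22 is the MSB
    return CRMode(name + " 3-bit", sz, snz, inv=b21, crbit=crbit, vli=vli)


rm = [0] * 24
rm[19], rm[21], rm[22] = 1, 1, 1        # bits 19-20 = 0b10, inv = 1, CRbit = 0b10
print(decode_cr_mode(rm, is_5bit=False))
```

Note that the 3-bit and 5-bit rows share identical bit patterns, so the sketch cannot tell them apart from `rm` alone: that information has to come from the decoded suffix.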
# Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if, while
sequentially evaluating each element's Condition Test,
a test is encountered which fails,
then VL (Vector Length) is truncated at that point. In the case
of Arithmetic SVP64 Operations the Condition Register Field generated from
Rc=1 is used as the basis for the truncation decision.
However, with CR-based operations the CR Field result to be
tested is provided
*by the operation itself*.

Data-dependent SVP64 Vectorised Operations involving the creation or
modification of a CR can require an extra two bits, which are not available
in the compact space of the SVP64 RM `MODE` Field. With the concept of element
width overrides being meaningless for CR Fields, it is possible to use the
`ELWIDTH` field for alternative purposes.

Condition Register based operations such as `sv.mfcr` and `sv.crand` can thus
be made more flexible. However, the rules that apply in this section
also apply to future CR-based instructions.

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
  whole 32-bit CR)

Examining these two types, the difference is that the 5-bit variant
*already* provides the prerequisite information about which CR Field
bit (LT, GT, EQ, SO) is to be operated on by the instruction.
Thus, logically, we may set the following rules:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used, i.e. the bit of the CR field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used, the 3-bit variant
  must be used, providing as it does the missing `CRbit` field
  in order to select which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO).

The 3-bit CR variant needs the additional `CRbit` field because the
3-bit CR Field operand from the base Power ISA v3.0B operation does
not contain the two CR Field Selector bits. These two bits (to select
LT, GT, EQ or SO) must therefore be provided in another way.

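The two rules can be illustrated with a small sketch that picks the bit to be tested. It assumes the CR is modelled as a list of 32 bits with CR0 first and each field ordered LT, GT, EQ, SO, so that a 5-bit operand indexes the list directly and a 3-bit operand selects a group of four. The function name `bit_to_test` is hypothetical.

```python
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")  # assumed order within a CR Field


def bit_to_test(cr, operand_width, result_operand, crbit=None):
    """Return the CR bit that fail-first (or pred-result) would test.

    cr             -- list of 32 ints (0/1), index 0 assumed to be CR0.LT
    operand_width  -- 3 (CR Field operand) or 5 (CR bit operand)
    result_operand -- value of the result operand (e.g. BF or BT)
    crbit          -- 2-bit selector from RM, needed for 3-bit operands only
    """
    if operand_width == 5:
        # the operand already names one exact bit of the 32-bit CR
        return cr[result_operand]
    if operand_width == 3:
        if crbit is None:
            raise ValueError("3-bit operands need CRbit from the RM field")
        return cr[result_operand * 4 + crbit]
    raise ValueError("operand_width must be 3 or 5")


cr = [0] * 32
cr[2 * 4 + 1] = 1                        # set CR2.GT
print(bit_to_test(cr, 5, 9))             # 5-bit operand 9 is CR2.GT
print(bit_to_test(cr, 3, 2, crbit=1))    # field 2, CRbit selects GT
```

Both calls reach the same bit: the 5-bit operand carries the selector itself, whereas the 3-bit operand must borrow it from `CRbit`.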
Examples of both types:

* crand, cror, crnor. These are all 5-bit (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`
* mcrf. This has only 3-bit operands (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used.
  With `CRbit` coming from the SVP64 RM bits 22-23, the bit
  of BF to be tested is identified.

Just as with SVP64 [[sv/branches]] there is the option to truncate
VL to include the element being tested (`VLi=1`) or to exclude it
(`VLi=0`).

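A minimal sketch of the truncation rule follows. It assumes the tested bit of each element has already been extracted into a list, and that a test passes when the bit is set with `inv=0` (or clear with `inv=1`), as the `inv` field description states. The function name `ffirst_new_vl` is hypothetical.

```python
def ffirst_new_vl(tested_bits, inv, vli):
    """Return the truncated VL for data-dependent fail-first.

    tested_bits -- per-element bit under test, in element order
    inv         -- 0: test passes when the bit is set; 1: when it is clear
    vli         -- 1: include the failing element in VL; 0: exclude it
    """
    for i, bit in enumerate(tested_bits):
        if bit == inv:                  # Condition Test fails on element i
            return i + 1 if vli else i
    return len(tested_bits)             # no failure: VL is unchanged


bits = [1, 1, 0, 1]
print(ffirst_new_vl(bits, inv=0, vli=0))
print(ffirst_new_vl(bits, inv=0, vli=1))
```

With the failure at element 2, VL becomes 2 when `VLi=0` and 3 when `VLi=1`. Element 3 is never considered in either case.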
# Reduction and Iteration

Bearing in mind that, as described in the [[svp64/appendix]], SVP64
Horizontal Reduction is a deterministic schedule on top of base Scalar
v3.0 operations, the same rules apply to CR Operations: programmers must
follow certain conventions in order for the *end result* of a
reduction to be achieved. Unlike
other Vector ISAs, *there are no explicit reduction opcodes*
in SVP64.

Due to these conventions, reduction is only meaningful on operations
such as `crand` and `cror`, because these have Condition Register Fields
as both input and output.

Also bear in mind that 'Reverse Gear' may be enabled, which can be
used in combination with overlapping CR operations to iteratively accumulate
results. Issuing a `sv.crand` operation, for example, with `BA`
differing from `BB` by one Condition Register Field would
result in a cascade effect, where the first CR Field encountered
with a zero bit sets its result to zero, and with it the results
of all subsequent CR Field elements:

    # sv.crand/mr/rg CR4.gt.v, CR5.gt.v, CR4.gt.v
    for i in VL-1 downto 0 # reverse gear
        CR[4+i].gt &= CR[5+i].gt

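The cascade can be checked with a short simulation. It models only one bit per CR Field as a list of integers indexed by CR Field number, and compares reverse gear with the normal element order. The function name `crand_overlapping` and the list layout are assumptions made for the illustration.

```python
def crand_overlapping(field_bits, base, vl, reverse_gear):
    """Apply CR[base+i] &= CR[base+1+i] over VL elements, in place.

    field_bits   -- one bit per CR Field, indexed by CR Field number
    reverse_gear -- True: i runs VL-1 downto 0; False: i runs 0..VL-1
    """
    order = reversed(range(vl)) if reverse_gear else range(vl)
    for i in order:
        field_bits[base + i] &= field_bits[base + 1 + i]
    return field_bits


# CR4..CR6 hold 1, CR7 holds 0; operate on elements CR4..CR6
start = [0, 0, 0, 0, 1, 1, 1, 0]
rev = crand_overlapping(list(start), base=4, vl=3, reverse_gear=True)
fwd = crand_overlapping(list(start), base=4, vl=3, reverse_gear=False)
print(rev[4:7])   # zero from CR7 cascades through CR6, CR5, CR4
print(fwd[4:7])   # only CR6 sees the zero
```

In reverse gear each element reads a neighbour that has already been updated, which is what lets a single zero propagate. In normal order each element reads a neighbour that is still unmodified.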
# Predicate-result Condition Register operations

These are again slightly different compared to SVP64 arithmetic
pred-result (described in [[svp64/appendix]]). The reason is that,
again, for arithmetic operations the production of a CR Field when
Rc=1 is a *co-result* accompanying the main arithmetic result, whereas
for CR-based operations the CR Field (referred to by a 3-bit
v3.0B base operand from e.g. `mfcr`) or CR bit (referred to by a
5-bit operand from e.g. `crnor`)
*is* itself the explicit and sole result of the operation.

Therefore, logically, Predicate-result needs to be adapted to
test the actual result of the CR-based instruction (rather than
test the co-resultant CR when Rc=1, as is done for Arithmetic SVP64).

    for i in range(VL):
        # predication test, skip all masked out elements.
        # skips when sz=0
        if sz == 0 and predicate_masked_out(i):
            continue
        if predicate_masked_out(i):
            if 5bit mode:
                # only one bit of CR to update
                result = SNZ
            else:
                # four copies of SNZ (|| is concatenation)
                result = SNZ || SNZ || SNZ || SNZ
        else:
            # result is to go into CR. may be a 4-bit CR Field
            # (3-bit mode) or just a single bit (5-bit mode)
            result = op(...)
        if 5bit mode:
            # if this CR op has 5-bit CR result operands
            # the single bit result is what must be tested
            to_test = result
        else:
            # if however this is a 3-bit CR *field* result
            # then the bit to be tested must be selected
            to_test = result[CRbit]
        # now test CR, similar to branch:
        # inv=0 tests for the bit being set, inv=1 for it being unset
        if to_test == inv:
            continue # test failed: cancel store
        # test passed: result is stored
        update_CR(result)
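
A runnable sketch of the same loop is given here for experimentation. It is not the reference implementation: the predicate is modelled as an integer bitmask, a CR Field result as a list of four bits assumed to be ordered LT, GT, EQ, SO, and the test is assumed to pass when the tested bit is set with `inv=0` (or unset with `inv=1`), following the `inv` field description. The names `pred_result` and `example_op` are hypothetical.

```python
def pred_result(vl, pred_mask, op, *, five_bit, inv, sz=0, snz=0, crbit=0):
    """Return {element index: result} for the elements actually stored."""
    stored = {}
    for i in range(vl):
        masked_out = ((pred_mask >> i) & 1) == 0
        if masked_out and sz == 0:
            continue                    # skipped entirely when sz=0
        if masked_out:
            # zeroing: one bit (5-bit mode) or four copies of SNZ
            result = snz if five_bit else [snz] * 4
        else:
            result = op(i)
        # 5-bit ops produce the bit to test; 3-bit ops need CRbit
        to_test = result if five_bit else result[crbit]
        if to_test == inv:
            continue                    # test failed: cancel store
        stored[i] = result
    return stored


def example_op(i):
    # hypothetical 3-bit op: LT set on even elements, GT set on odd ones
    return [1, 0, 0, 0] if i % 2 == 0 else [0, 1, 0, 0]


# element 2 is masked out; test the LT bit (crbit=0) for being set (inv=0)
print(pred_result(4, 0b1011, example_op, five_bit=False, inv=0))
print(pred_result(4, 0b1011, example_op, five_bit=False, inv=0, sz=1, snz=1))
```

In the first call only element 0 is stored. In the second, `sz=1` with `SNZ=1` makes the masked-out element 2 produce an all-ones field, which then passes the test and is stored as well.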