[[!tag standards]]
# Condition Register SVP64 Operations

**DRAFT STATUS**

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>
* [[svp64]]
* [[sv/branches]]
* [[sv/cr_int_predication]]
* [[openpower/isa/sprset]]
* [[openpower/isa/condition]]
* [[openpower/isa/comparefixed]]

Condition Register Fields are only 4 bits wide: this presents some
interesting conceptual challenges for SVP64, which was designed
primarily for vectors of arithmetic and logical operations. However
if predicates may be bits of CR Fields it makes sense to extend
SVP64 to cover CR Operations.

Element width however is clearly meaningless for a 4-bit
collation of Conditions, LT GT EQ SO. Likewise, arithmetic saturation
(an important part of Arithmetic SVP64)
has no meaning. An alternative Mode Format is required, and given that
elwidths are meaningless for CR Fields
the elwidth bits in SVP64 `RM` may be used for other purposes.

This alternative mapping **only** applies to instructions that **only**
reference a CR Field or CR bit as the sole exclusive result. This section
**does not** apply to instructions which primarily produce arithmetic
results that also, as an aside, produce a corresponding
CR Field (such as when Rc=1).
Instructions that involve Rc=1 are definitively arithmetic in nature,
where the corresponding Condition Register Field can be considered to
be a "co-result". Such CR Field "co-result" arithmetic operations
are firmly out of scope for
this section.

* Examples of v3.0B instructions to which this section does
  apply include
  - `mcrf` and `cmpi` (3 bit operands) and
  - `crnor` and `crand` (5 bit operands).
* Examples to which this section does **not** apply include
  `fadds.` and `subf.` which both produce arithmetic results
  (and a CR Field co-result).

The CR Mode Format still applies to `sv.cmpi` because despite
taking a GPR as input, the output from the Base Scalar v3.0B `cmpi`
instruction is purely to a Condition Register Field.

Other modes are still applicable and include:

* **Data-dependent fail-first**.
  Useful to truncate VL based on
  analysis of a Condition Register result bit.
* **Scalar and parallel reduction**.
  Reduction is useful
  for analysing a Vector of Condition Register Fields
  and reducing it to one
  single Condition Register Field.
* **Pack/Unpack Mode**.
  Like VSX `vpack` and `vunpack` the source and destination
  elements are reordered.

Predicate-result mode does not make any sense for CR Operations.
In Arithmetic SVP64, Rc=1 creates a co-result (a CR Field), and testing
that co-result allows the decision to be made to store or not store
the main result. For CR Ops the CR Field result *is*
the main result, so there is no separate co-result to test.

# Format

SVP64 RM `MODE` (includes `ELWIDTH_SRC` bits) for CR-based operations:

| 6 | 7 | 19-20 | 21  | 22 23  | description                               |
| - | - | ----- | --- | ------ | ----------------------------------------- |
|sz |SNZ| 0 RG  | 0   | dz /   | normal mode                               |
|sz |SNZ| 0 RG  | 1   | 0 /    | scalar reduce mode (mapreduce), SUBVL=1   |
|zz |SNZ| 0 RG  | 1   | 1 /    | parallel reduce mode (mapreduce), SUBVL=1 |
|zz |SNZ| 0 RG  | 1   | SVM 0  | subvector reduce mode, SUBVL>1            |
|zz |SNZ| 0 RG  | 1   | SVM 1  | Pack/Unpack mode, SUBVL>1                 |
|zz |SNZ| 1 VLI | inv | CR-bit | Ffirst 3-bit mode                         |
|sz |SNZ| 1 VLI | inv | dz /   | Ffirst 5-bit mode                         |

Fields:

* **sz / dz** if predication is enabled, zeros are put into the dest
  (or as src in the case of twin pred) when the predicate bit is zero.
  Otherwise the element is ignored or skipped, depending on context.
* **zz** set both sz and dz equal to this flag
* **SNZ** when sz=1 and SNZ=1 a value "1" is put in place of zeros when
  the predicate bit is clear (on both source and destination masks)
* **inv CR bit** just as in branches (BO) these bits allow testing of a
  CR bit and whether it is set (inv=0) or unset (inv=1)
* **RG** inverts the Vector Loop order (VL-1 downto 0) rather
  than the normal 0..VL-1
* **SVM** sets "subvector" reduce mode
* **VLI** VL inclusive: in fail-first mode, the truncation of
  VL *includes* the current element at the failure point rather
  than excludes it from the count.

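One possible reading of the table above can be sketched as a small decoder. This is an illustration only, not part of the specification: the function name `decode_cr_mode`, the `CRMode` tuple, the mode name strings and the `three_bit_operand` flag are all hypothetical, and treating bit 22 as the most significant of the two `CR-bit` selector bits is an assumption.

```python
from typing import Dict, NamedTuple


class CRMode(NamedTuple):
    name: str               # hypothetical label for the table row
    fields: Dict[str, int]  # decoded field values for that row


def decode_cr_mode(bits: Dict[int, int], subvl: int = 1,
                   three_bit_operand: bool = False) -> CRMode:
    """Decode RM bits 6, 7 and 19-23 following the CR-ops MODE table.

    bits maps an RM bit number to 0 or 1 (missing bits read as 0).
    three_bit_operand says whether the instruction's CR result operand
    is 3-bit (CR Field) rather than 5-bit (CR bit).
    """
    def b(n: int) -> int:
        return bits.get(n, 0)

    if b(19) == 1:
        # fail-first rows: bit 20 is VLI, bit 21 is inv
        common = {"SNZ": b(7), "VLI": b(20), "inv": b(21)}
        if three_bit_operand:
            # ASSUMPTION: bit 22 is the high bit of the 2-bit selector
            crbit = (b(22) << 1) | b(23)
            return CRMode("ffirst-3bit",
                          {**common, "zz": b(6), "CRbit": crbit})
        return CRMode("ffirst-5bit", {**common, "sz": b(6), "dz": b(22)})

    # all other rows: bit 20 is RG (reverse gear)
    common = {"SNZ": b(7), "RG": b(20)}
    if b(21) == 0:
        return CRMode("normal", {**common, "sz": b(6), "dz": b(22)})
    if subvl == 1:
        if b(22) == 0:
            return CRMode("scalar-reduce", {**common, "sz": b(6)})
        return CRMode("parallel-reduce", {**common, "zz": b(6)})
    # SUBVL > 1: bit 22 is SVM, bit 23 picks reduce or Pack/Unpack
    name = "pack-unpack" if b(23) else "subvector-reduce"
    return CRMode(name, {**common, "zz": b(6), "SVM": b(22)})
```

For example, `decode_cr_mode({19: 1, 20: 1, 21: 1, 22: 1}, three_bit_operand=True)` selects the Ffirst 3-bit row with `VLI=1`, `inv=1` and a `CRbit` value of 2.
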
# Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if, during
the course of sequentially evaluating each element's Condition Test,
one such test is encountered which fails,
then VL (Vector Length) is truncated at that point. In the case
of Arithmetic SVP64 Operations the Condition Register Field generated from
Rc=1 is used as the basis for the truncation decision.
However with CR-based operations the CR Field result to be
tested is provided
*by the operation itself*.

Data-dependent SVP64 Vectorised Operations involving the creation or
modification of a CR can require an extra two bits, which are not available
in the compact space of the SVP64 RM `MODE` Field. With the concept of element
width overrides being meaningless for CR Fields it is possible to use the
`ELWIDTH` field for alternative purposes.

Condition Register based operations such as `sv.mfcr` and `sv.crand` can thus
be made more flexible. However the rules that apply in this section
also apply to future CR-based instructions.

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
  whole 32-bit CR)

Examining these two types it is observed that the
difference may be considered to be that the 5-bit variant
*already* provides the
prerequisite information about which CR Field bit (LT, GT, EQ, SO) is to
be operated on by the instruction.
Thus, logically, we may set the following rule:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used, i.e. the bit of the CR field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used the 3-bit variant
  must be used, providing as it does the missing `CRbit` field
  in order to select which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO)

The 3-bit CR variant needs the additional CR-bit
field because the 3-bit CR Field operand
from the base Power ISA v3.0B operation does not contain
the two CR Field Selector bits. Thus, these two
bits (to select LT, GT, EQ or SO) must be provided in another
way.

Examples of each type:

* crand, cror, crnor. These all are 5-bit (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`
* mcrf. This has only 3-bit (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used.
  With `CRbit` coming from the SVP64 RM bits 22-23 the bit
  of BF to be tested is identified.

Just as with SVP64 [[sv/branches]] there is the option to truncate
VL to include the element being tested (`VLI=1`) or to exclude it
(`VLI=0`).

Also, just as with [[sv/normal]] fail-first, VL cannot (unlike
[[sv/ldst]]) be set to an arbitrary value. Deterministic behaviour
is *required*.

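The truncation rule can be sketched as follows. This is a minimal model under stated assumptions, not the normative definition: the function name `ffirst_vl` is hypothetical, each CR Field result is represented as a tuple ordered (LT, GT, EQ, SO), and a test is taken to pass when the selected bit is set with `inv=0` or clear with `inv=1`. The sketch only computes the new VL; it says nothing about whether the failing element's result is written.

```python
from typing import Sequence


def ffirst_vl(results: Sequence[Sequence[int]], crbit: int,
              inv: int, vli: int) -> int:
    """Return the truncated VL for a vector of CR Field results.

    results[i] is the CR Field produced for element i as (LT, GT, EQ, SO).
    crbit selects which of those four bits is tested (0..3).
    inv=0 expects the bit to be set, inv=1 expects it to be clear.
    vli=1 includes the failing element in VL, vli=0 excludes it.
    """
    for i, field in enumerate(results):
        bit = field[crbit]
        passed = (bit == 1) if inv == 0 else (bit == 0)
        if not passed:
            # first failure: elements are evaluated strictly in order,
            # so the outcome is deterministic
            return i + 1 if vli else i
    return len(results)  # no failure: VL is unchanged
```

Given four results whose EQ bits are 1, 1, 0, 1, testing EQ with `inv=0` fails at element 2, so VL becomes 2 with `vli=0` and 3 with `vli=1`.
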
# Reduction and Iteration

Bearing in mind, as described in the [[svp64/appendix]], that SVP64 Horizontal
Reduction is a deterministic schedule on top of base Scalar v3.0 operations,
the same rules apply to CR Operations, i.e. that programmers must
follow certain conventions in order for an *end result* of a
reduction to be achieved. Unlike
other Vector ISAs *there are no explicit reduction opcodes*
in SVP64.

Due to these conventions only reduction on operations such as `crand`
and `cror` is meaningful because these have Condition Register Fields
as both input and output.
Meaningless operations are not prohibited because the cost in hardware
of doing so is prohibitive, but neither are they `UNDEFINED`. Implementations
are still required to execute them but are at liberty to optimise out
any operations that would ultimately be overwritten, as long as Strict
Program Order is still observable by the programmer.

Also bear in mind that 'Reverse Gear' may be enabled, which can be
used in combination with overlapping CR operations to iteratively accumulate
results. Issuing a `sv.crand` operation for example with `BA`
differing from `BB` by one Condition Register Field would
result in a cascade effect, where the first-encountered CR Field bit
that is zero would set the result to zero, and also all subsequent
CR Field elements thereafter:

    # sv.crand/mr/rg CR4.ge.v, CR5.ge.v, CR4.ge.v
    for i in VL-1 downto 0 # reverse gear
        CR[4+i].ge &= CR[5+i].ge

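The cascade can be tried out with a small runnable model of the loop above. It is a sketch only: the function name `crand_overlap` is hypothetical, and each CR Field is reduced to the single bit being operated on, held in a plain list indexed by CR Field number.

```python
from typing import List


def crand_overlap(bits: List[int], vl: int, base: int = 4,
                  reverse_gear: bool = True) -> List[int]:
    """Model bits[base+i] &= bits[base+1+i] over VL elements.

    bits[n] stands for the one selected bit of CR Field n.
    With reverse_gear the loop runs i = VL-1 downto 0, so each step
    reads a value that the previous step has just written.
    """
    out = list(bits)  # leave the caller's list untouched
    order = range(vl - 1, -1, -1) if reverse_gear else range(vl)
    for i in order:
        out[base + i] &= out[base + 1 + i]
    return out


start = [1, 1, 1, 1, 1, 1, 1, 0, 1, 1]  # only CR Field 7 holds a zero
cascaded = crand_overlap(start, vl=4)
forward = crand_overlap(start, vl=4, reverse_gear=False)
```

Here `cascaded` is `[1, 1, 1, 1, 0, 0, 0, 0, 1, 1]`: the zero in CR Field 7 propagates down through Fields 6, 5 and 4. Without Reverse Gear, `forward` is `[1, 1, 1, 1, 1, 1, 0, 0, 1, 1]`, because each element reads its neighbour before that neighbour has been updated.
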
# LD/ST Pack/Unpack Mode

As described in [[sv/normal]],
Structured Pack/Unpack is similar to VSX `vpack` and `vunpack` except
generalised not only to a Schedule to be applied to any operation but
also extended to vec2/3/4.

Like in [[sv/normal]] and [[sv/ldst]] operations,
setting this mode changes the meaning of bits 4-5 in `RM` from being
`ELWIDTH` to a pair of Pack/Unpack bits.
*Unlike* in other operation categories however,
the `ELWIDTH_SRC` bits (6-7) are in use for `sz`/`zz` and `SNZ`.
Therefore **it is not possible to use elwidth overrides and Pack/Unpack**
at the same time. With elwidths being meaningless for CR Fields this was
considered an acceptable compromise: the operations particularly affected
are extremely weird CR ops.

# Unusual and quirky CR operations

## cmp and other compare ops

`cmp` and `cmpi` etc take GPRs as sources and create a CR Field as a result.

    cmpli BF,L,RA,UI
    cmpeqb BF,RA,RB

With `ELWIDTH` applying to the source GPR operands this is perfectly fine
(caveat: except if Pack/Unpack is needed as well).

## crweird operations

There are 4 weird CR-GPR operations and one reasonable one in
the [[cr_int_predication]] set:

* crrweird
* mtcrweird
* crweirder
* crweird
* mcrfm - reasonably normal and referring to CR Fields for src and dest.