# Condition Register SVP64 Operations

**DRAFT STATUS**

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>
* <https://bugs.libre-soc.org/show_bug.cgi?id=936> write on failfirst
* [[svp64]]
* [[sv/branches]]
* [[sv/cr_int_predication]]
* [[openpower/isa/sprset]]
* [[openpower/isa/condition]]
* [[openpower/isa/comparefixed]]

Condition Register Fields are only 4 bits wide: this presents some
interesting conceptual challenges for SVP64, which was designed
primarily for vectors of arithmetic and logical operations. However,
if predicates may be bits of CR Fields, it makes sense to extend
Simple-V to cover CR Operations, especially given that the results of
Vectorised Rc=1 may be processed by Vectorised CR Operations that in turn
may usefully become Predicate Masks for yet more Vector operations, like so:

```
sv.cmpi/ew=8 *B,*ra,0     # compare bytes against zero
sv.cmpi/ew=8 *B2,*ra,13   # and against newline
sv.cror PM.EQ,B.EQ,B2.EQ  # OR compares to create mask
sv.stb/sm=EQ ...          # store only nonzero/newline
```

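As a rough illustration of the data flow in the sequence above, the following Python sketch models each step with plain lists. It is not an SVP64 simulator: the helper names (`eq_bits`, `cror`, `predicated_store`) and the example byte values are hypothetical, and a set mask bit is assumed to enable the store for that element.

```python
# Illustrative model only: plain Python lists stand in for vector registers
# and for the EQ bits of the CR Field vectors B, B2 and PM.

def eq_bits(values, immediate):
    """Model sv.cmpi/ew=8: one EQ bit per byte-wide element."""
    return [int((v & 0xFF) == immediate) for v in values]

def cror(a_bits, b_bits):
    """Model sv.cror on one bit of each CR Field: element-wise OR."""
    return [a | b for a, b in zip(a_bits, b_bits)]

def predicated_store(values, mask):
    """Model a predicated store: keep (index, value) where the mask bit is set."""
    return [(i, v) for i, (v, m) in enumerate(zip(values, mask)) if m]

ra = [0x48, 0x00, 0x69, 0x0D, 0x21]    # hypothetical byte elements
b_eq = eq_bits(ra, 0)                  # sv.cmpi/ew=8 *B,*ra,0
b2_eq = eq_bits(ra, 13)                # sv.cmpi/ew=8 *B2,*ra,13
pm_eq = cror(b_eq, b2_eq)              # sv.cror PM.EQ,B.EQ,B2.EQ
stored = predicated_store(ra, pm_eq)   # sv.stb/sm=EQ ...
```
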
Element width, however, is clearly meaningless for a 4-bit collation of
Conditions (LT, GT, EQ, SO). Likewise, arithmetic saturation (an important
part of Arithmetic SVP64) has no meaning. An alternative Mode Format is
required, and given that elwidths are meaningless for CR Fields the bits
in SVP64 `RM` may be used for other purposes.

This alternative mapping **only** applies to instructions that **only**
reference a CR Field or CR bit as the sole exclusive result. This section
**does not** apply to instructions which primarily produce arithmetic
results that also, as an aside, produce a corresponding CR Field (such as
when Rc=1). Instructions that involve Rc=1 are definitively arithmetic
in nature, where the corresponding Condition Register Field can be
considered to be a "co-result". Such CR Field "co-result" arithmetic
operations are firmly out of scope for this section, being covered fully
by [[sv/normal]].

* Examples of v3.0B instructions to which this section does
  apply are
  - `mfcr` and `cmpi` (3 bit operands) and
  - `crnor` and `crand` (5 bit operands).
* Examples to which this section does **not** apply include
  `fadds.` and `subf.` which both produce arithmetic results
  (and a CR Field co-result).

The CR Mode Format still applies to `sv.cmpi` because despite
taking a GPR as input, the output from the Base Scalar v3.0B `cmpi`
instruction is purely to a Condition Register Field.

Other modes are still applicable and include:

* **Data-dependent fail-first**.
  Useful for truncating VL based on analysis of a Condition Register result bit.
* **Reduction**.
  Reduction is useful for analysing a Vector of Condition Register Fields
  and reducing it to one single Condition Register Field.

Predicate-result does not make any sense for CR Operations. In Arithmetic
SVP64, Rc=1 creates a co-result (a CR Field), and testing that co-result
allows the decision to be made to store or not store the main result.
For CR Ops, however, the CR Field result *is* the main result.

## Format

SVP64 RM `MODE` (includes `ELWIDTH_SRC` bits) for CR-based operations:

|6 | 7 |19-20| 21  | 22-23   | description      |
|--|---|-----| --- |---------|----------------- |
|/ | / |0 RG | 0   | dz sz   | simple mode      |
|/ | / |0 RG | 1   | dz sz   | scalar reduce mode (mapreduce) |
|zz|SNZ|1 VLI| inv | CR-bit  | Ffirst 3-bit mode |
|/ |SNZ|1 VLI| inv | dz sz   | Ffirst 5-bit mode (implies CR-bit from result) |

Fields:

* **sz / dz** if predication is enabled, zeros are put into the dest
  (or used as the src in the case of twin predication) when the predicate
  bit is zero. Otherwise the element is ignored or skipped, depending
  on context.
* **zz** sets both sz and dz equal to this flag.
* **SNZ** In fail-first mode, on the bit being tested, when sz=1 and
  SNZ=1 a value "1" is put in place of "0".
* **inv CR-bit** just as in branches (BO) these bits allow testing of
  a CR bit and whether it is set (inv=0) or unset (inv=1).
* **RG** reverses the Vector Loop order to VL-1 downto 0, rather
  than the normal 0..VL-1.
* **SVM** sets "subvector" reduce mode.
* **VLi** (`VLI` in the table) VL inclusive: in fail-first mode, the
  truncation of VL *includes* the current element at the failure point
  rather than excludes it from the count.

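One way to read the table is as a small decoder. The sketch below is a hypothetical Python helper (`decode_cr_mode` is not part of any Libre-SOC tool) that takes each RM bit as a separate 0/1 argument. It assumes that two-name columns are read left to right (bit 22 is `dz`, bit 23 is `sz`), that bit 22 is the more significant bit of `CR-bit`, and that the choice between the two fail-first rows comes from the instruction's operand width rather than from the MODE bits.

```python
def decode_cr_mode(b6, b7, b19, b20, b21, b22, b23, three_bit_operand):
    """Decode the CR-ops MODE table; each bN argument is RM bit N (0 or 1)."""
    if b19 == 0:
        # Rows 1 and 2: bit 21 selects simple (0) or scalar reduce (1).
        mode = "reduce" if b21 else "simple"
        return {"mode": mode, "RG": b20, "dz": b22, "sz": b23}
    # Rows 3 and 4: fail-first. The instruction's operand width, not the
    # MODE bits, decides which of the two rows applies (assumption).
    if three_bit_operand:
        return {"mode": "ffirst3", "zz": b6, "SNZ": b7, "VLI": b20,
                "inv": b21, "CRbit": (b22 << 1) | b23}
    return {"mode": "ffirst5", "SNZ": b7, "VLI": b20, "inv": b21,
            "dz": b22, "sz": b23}
```

For instance, `decode_cr_mode(0, 0, 0, 1, 1, 1, 0, three_bit_operand=False)` selects scalar reduce mode with `RG=1`, `dz=1` and `sz=0`.
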
## Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if, during the course
of sequentially evaluating an element's Condition Test, one such test
is encountered which fails, then VL (Vector Length) is truncated (set)
at that point. In the case of Arithmetic SVP64 Operations the Condition
Register Field generated from Rc=1 is used as the basis for the truncation
decision. However with CR-based operations that CR Field result to be
tested is provided *by the operation itself*.

Data-dependent SVP64 Vectorised Operations involving the creation
or modification of a CR can require an extra two bits, which are not
available in the compact space of the SVP64 RM `MODE` Field. With the
concept of element width overrides being meaningless for CR Fields it
is possible to use the `ELWIDTH` field for alternative purposes.

Condition Register based operations such as `sv.mfcr` and `sv.crand`
can thus be made more flexible. However the rules that apply in this
section also apply to future CR-based instructions.

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
  whole 32-bit CR)

Examining these two types it is observed that the difference may
be considered to be that the 5-bit variant *already* provides the
prerequisite information about which CR Field bit (LT, GT, EQ, SO) is
to be operated on by the instruction. Thus, logically, we may set the
following rule:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used, i.e. the bit of the CR field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used the 3-bit variant
  must be used, providing as it does the missing `CRbit` field
  in order to select which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO).

The reason why the 3-bit CR variant needs the additional CR-bit field
is that the 3-bit CR Field operand of the base Power ISA v3.0B operation
does not contain the two CR Field Selector bits. Thus, these two bits
(to select LT, GT, EQ or SO) must be provided in another way.

Examples of each type:

* `crand`, `cror`, `crnor`. These are all 5-bit (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`.
* `mcrf`. This has only 3-bit operands (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used.
  With `CRbit` coming from the SVP64 RM bits 22-23 the bit
  of BF to be tested is identified.

Just as with SVP64 [[sv/branches]] there is the option to truncate
VL to include the element being tested (`VLi=1`) or to exclude it
(`VLi=0`).

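The effect of `inv` and `VLi` on truncation can be sketched as follows. This is a simplified model, not the specification's pseudocode: `ffirst_truncate` is a hypothetical helper, predication and `SNZ` are ignored, and a test is assumed to pass when the tested bit is set (`inv=0`) or clear (`inv=1`).

```python
def ffirst_truncate(tested_bits, vl, inv=0, vli=0):
    """Return the new VL after data-dependent fail-first.

    tested_bits[i] is the CR bit (0 or 1) produced by element i.
    """
    for i in range(vl):
        passed = tested_bits[i] != inv   # inv=0: pass when set; inv=1: pass when clear
        if not passed:
            # VLi=1 keeps the failing element in the count, VLi=0 drops it.
            return i + 1 if vli else i
    return vl                            # no test failed: VL is unchanged
```
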
Also exactly as with [[sv/normal]] fail-first, VL cannot, unlike
[[sv/ldst]], be set to an arbitrary value. Deterministic behaviour
is *required*.

## Reduction and Iteration

Bearing in mind that, as described in the [[svp64/appendix]], SVP64 Horizontal
Reduction is a deterministic schedule on top of base Scalar v3.0
operations, the same rules apply to CR Operations, i.e. that programmers
must follow certain conventions in order for an *end result* of a
reduction to be achieved. Unlike other Vector ISAs *there are no explicit
reduction opcodes* in SVP64: Schedules however achieve the same effect.

Due to these conventions, only reduction on operations such as `crand`
and `cror` is meaningful, because these have Condition Register Fields
as both input and output. Meaningless operations are not prohibited
because the cost in hardware of doing so is prohibitive, but neither
are they `UNDEFINED`. Implementations are still required to execute them
but are at liberty to optimise out any operations that would ultimately
be overwritten, as long as Strict Program Order is still observable by
the programmer.

Also bear in mind that 'Reverse Gear' may be enabled, which can be
used in combination with overlapping CR operations to iteratively
accumulate results. Issuing a `sv.crand` operation, for example, with
`BA` differing from `BB` by one Condition Register Field would result
in a cascade effect, where the first zero encountered would set its
result to zero, and in turn the results of all subsequent CR Field
elements:

```
# sv.crand/mr/rg CR4.gt.v, CR5.gt.v, CR4.gt.v
for i in VL-1 downto 0        # reverse gear
    CR.field[4+i].gt &= CR.field[5+i].gt
```

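The cascade can be checked with a small Python model of the loop above. It is a sketch under simplifying assumptions: `crand_overlap` is a hypothetical helper, a list of single bits stands in for one bit of each of CR Fields 4 to 7, and predication is ignored. Running the same overlapping operation in normal order is included for contrast, to show that the cascade depends on the loop direction.

```python
def crand_overlap(bits, vl, reverse_gear):
    """bits[k] models one bit of CR Field 4+k; element i does bits[i] &= bits[i+1]."""
    bits = list(bits)                     # work on a copy
    order = range(vl - 1, -1, -1) if reverse_gear else range(vl)
    for i in order:
        bits[i] &= bits[i + 1]
    return bits

start = [1, 1, 1, 0]                      # CR Fields 4..7, VL = 3
reverse = crand_overlap(start, 3, True)   # the zero in Field 7 cascades down to Field 4
forward = crand_overlap(start, 3, False)  # each element reads a neighbour not yet modified
```
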
`sv.crxor` with reduction would be particularly useful for parity
calculation for example, although there are many ways in which the same
calculation could be carried out after transferring a vector of CR Fields
to a GPR using crweird operations.

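As a minimal sketch of the end result only (not of the SVP64 reduction Schedule itself), parity over one bit taken from each CR Field of a vector is the XOR of those bits. The helper name `crxor_reduce` is hypothetical.

```python
from functools import reduce
from operator import xor

def crxor_reduce(bits):
    """XOR together one bit from each CR Field: 1 if an odd number are set."""
    return reduce(xor, bits, 0)
```
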
Implementations are free and clear to optimise these reductions in any way
they see fit, as long as the end-result is compatible with Strict Program
Order being observed, and Interrupt latency is not adversely impacted.

## Unusual and quirky CR operations

**cmp and other compare ops**

`cmp` and `cmpi` etc take GPRs as sources and create a CR Field as a result.

```
cmpli BF,L,RA,UI
cmpeqb BF,RA,RB
```

With `ELWIDTH` applying to the source GPR operands this is perfectly fine.

**crweird operations**

There are 4 weird CR-GPR operations and one reasonable one in
the [[cr_int_predication]] set:

* crrweird
* mtcrweird
* crweirder
* crweird
* mcrfm - reasonably normal and referring to CR Fields for src and dest.

The "weird" operations have a non-standard behaviour, being able to
treat *individual bits* of a GPR effectively as elements. They are
expected to be Micro-coded by most Hardware implementations.

--------

[[!tag standards]]

\newpage{}