[[!tag standards]]
# Condition Register SVP64 Operations

**DRAFT STATUS**

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>
* [[svp64]]
* [[sv/branches]]
* [[sv/cr_int_predication]]
* [[openpower/isa/sprset]]
* [[openpower/isa/condition]]
* [[openpower/isa/comparefixed]]

Condition Register Fields are only 4 bits wide: this presents some
interesting conceptual challenges for SVP64, which was designed
primarily for vectors of arithmetic and logical operations. However
if predicates may be bits of CR Fields it makes sense to extend
Simple-V to cover CR Operations, especially given that the results of
Vectorised Rc=1 may be processed by Vectorised CR Operations that in turn
may usefully become Predicate Masks for yet more Vector operations, like so:

    sv.cmpi/ew=8 *B,*ra,0     # compare bytes against zero
    sv.cmpi/ew=8 *B2,*ra,13   # and against newline
    sv.cror PM.EQ,B.EQ,B2.EQ  # OR compares to create mask
    sv.stb/sm=EQ ...          # store only nonzero/newline
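
The chain above can be modelled in a few lines of Python. This is only an illustrative sketch, not the specification's pseudocode: each CR Field is reduced to its EQ bit alone, and the helper names (`cmpi_eq`, `cror`, `predicated_store`) are hypothetical.

```python
# Illustrative model only: every CR Field is represented by its EQ bit.
# All function names here are hypothetical, chosen for this sketch.

def cmpi_eq(elements, imm):
    """EQ bit of a byte-wide compare of each element against an immediate."""
    return [int((e & 0xFF) == imm) for e in elements]

def cror(a_bits, b_bits):
    """Element-wise OR of two vectors of CR bits."""
    return [a | b for a, b in zip(a_bits, b_bits)]

def predicated_store(memory, base, elements, mask):
    """Store one byte per element, but only where the mask bit is set."""
    out = list(memory)
    for i, (e, m) in enumerate(zip(elements, mask)):
        if m:
            out[base + i] = e & 0xFF
    return out

ra = [ord("h"), ord("i"), 13, 0, ord("!")]
b = cmpi_eq(ra, 0)     # compare bytes against zero
b2 = cmpi_eq(ra, 13)   # and against 13
pm = cror(b, b2)       # OR the compares to create the mask
mem = predicated_store([0xAA] * 5, 0, ra, pm)
```

Only the elements whose mask bit is set reach `mem`; every other location keeps its previous contents.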

Element width however is clearly meaningless for a 4-bit
collation of Conditions (LT, GT, EQ, SO). Likewise, arithmetic saturation
(an important part of Arithmetic SVP64) has no meaning. An alternative
Mode Format is required, and given that elwidths are meaningless for
CR Fields the bits in SVP64 `RM` may be used for other purposes.

This alternative mapping **only** applies to instructions that **only**
reference a CR Field or CR bit as the sole exclusive result. This section
**does not** apply to instructions which primarily produce arithmetic
results that also, as an aside, produce a corresponding
CR Field (such as when Rc=1).
Instructions that involve Rc=1 are definitively arithmetic in nature,
where the corresponding Condition Register Field can be considered to
be a "co-result". Such CR Field "co-result" arithmetic operations
are firmly out of scope for
this section, being covered fully by [[sv/normal]].

* Examples of v3.0B instructions to which this section does
  apply are
    - `mfcr` and `cmpi` (3 bit operands) and
    - `crnor` and `crand` (5 bit operands).
* Examples to which this section does **not** apply include
  `fadds.` and `subf.` which both produce arithmetic results
  (and a CR Field co-result).

The CR Mode Format still applies to `sv.cmpi` because despite
taking a GPR as input, the output from the Base Scalar v3.0B `cmpi`
instruction is purely to a Condition Register Field.

Other modes are still applicable and include:

* **Data-dependent fail-first**.
  Useful to truncate VL based on
  analysis of a Condition Register result bit.
* **Scalar and parallel reduction**.
  Reduction is useful
  for analysing a Vector of Condition Register Fields
  and reducing it to one
  single Condition Register Field.
* **Pack/Unpack Mode**.
  Like VSX `vpack` and `vunpack` the source and destination
  elements are reordered.

Predicate-result does not make any sense for CR Operations.
In Arithmetic SVP64, Rc=1 creates a co-result (a CR Field), and testing
that co-result allows the decision to be made to store or not store the
main result. For CR Ops the CR Field result *is* the main result,
so there is no separate co-result to test.

# Format

SVP64 RM `MODE` (includes `ELWIDTH_SRC` bits) for CR-based operations:

| 6 | 7 | 19-20 |  21 | 22 23   |  description     |
| - | - |-------| --- |---------|----------------- |
|sz |SNZ| 0 RG  | 0   | dz  /   | simple mode                      |
|sz |SNZ| 0 RG  | 1   | 0  /    | scalar reduce mode (mapreduce), SUBVL=1 |
|zz |SNZ| 0 RG  | 1   | 1  /    | parallel reduce mode (mapreduce), SUBVL=1 |
|zz |SNZ| 0 RG  | 1   | SVM 0   | subvector reduce mode, SUBVL>1   |
|zz |SNZ| 0 RG  | 1   | SVM 1   | Pack/Unpack mode, SUBVL>1   |
|zz |SNZ| 1 VLI | inv | CR-bit  | Ffirst 3-bit mode      |
|sz |SNZ| 1 VLI | inv |  dz  /  | Ffirst 5-bit mode      |

Fields:

* **sz / dz** if predication is enabled, zeros are put into the dest (or used as the src in the case of twin pred) when the predicate bit is zero. Otherwise the element is ignored or skipped, depending on context.
* **zz** sets both sz and dz equal to this flag
* **SNZ** when sz=1 and SNZ=1 a value "1" is put in place of zeros when
  the predicate bit is clear (on both source and destination masks)
* **inv CR-bit** just as in branches (BO) these bits allow testing of a CR bit and whether it is set (inv=0) or unset (inv=1)
* **RG** reverses the Vector Loop order, running from VL-1 down to 0 rather
  than the normal 0..VL-1
* **SVM** sets "subvector" reduce mode
* **VLi** (`VLI` in the table above) VL inclusive: in fail-first mode, the truncation of
  VL *includes* the current element at the failure point rather
  than excludes it from the count.

# Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if, during
the course of sequentially evaluating each element's Condition Test,
one such test is encountered which fails,
then VL (Vector Length) is truncated (set) at that point. In the case
of Arithmetic SVP64 Operations the Condition Register Field generated from
Rc=1 is used as the basis for the truncation decision.
However with CR-based operations the CR Field result to be
tested is provided
*by the operation itself*.

Data-dependent SVP64 Vectorised Operations involving the creation or
modification of a CR can require an extra two bits, which are not available
in the compact space of the SVP64 RM `MODE` Field. With the concept of element
width overrides being meaningless for CR Fields it is possible to use the
`ELWIDTH` field for alternative purposes.

Condition Register based operations such as `sv.mfcr` and `sv.crand` can thus
be made more flexible. However the rules that apply in this section
also apply to future CR-based instructions.

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
  whole 32-bit CR)

Examining these two types it is observed that the
difference may be considered to be that the 5-bit variant
*already* provides the
prerequisite information about which CR Field bit (LT, GT, EQ, SO) is to
be operated on by the instruction.
Thus, logically, we may set the following rule:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used. i.e. the bit of the CR field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used the 3-bit variant
  must be used, providing as it does the missing `CRbit` field
  in order to select which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO)

The 3-bit CR variant needs the additional CR-bit
field because the 3-bit CR Field operand
of the base Power ISA v3.0B operation identifies only a whole CR Field:
it does not contain the two CR Field Selector bits. Thus, these two
bits (to select LT, GT, EQ or SO) must be provided in another
way.

Examples of each type:

* `crand`, `cror`, `crnor`. These all have 5-bit operands (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`
* `mcrf`. This has only 3-bit operands (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used.
  With `CRbit` coming from the SVP64 RM bits 22-23 the bit
  of BF to be tested is identified.

Just as with SVP64 [[sv/branches]] there is the option to truncate
VL to include the element being tested (`VLi=1`) or to exclude it
(`VLi=0`).

Also exactly as with [[sv/normal]] fail-first, VL cannot, unlike
[[sv/ldst]], be set to an arbitrary value. Deterministic behaviour
is *required*.
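
A minimal sketch of the truncation decision, assuming the test passes when the selected bit is set (`inv=0`) or clear (`inv=1`) as described under Fields. The function name `ffirst_truncate` is hypothetical, and the sketch models only the new value of VL, not which element results are written.

```python
def ffirst_truncate(bits, inv=0, vli=0):
    """Return the truncated VL for a vector of already-computed CR bits.

    bits: the tested CR bit of each element result, in element order.
    inv:  0 tests for the bit being set, 1 tests for it being clear.
    vli:  1 includes the failing element in the new VL, 0 excludes it.
    """
    for i, bit in enumerate(bits):
        passed = (bit == 1) if inv == 0 else (bit == 0)
        if not passed:
            # First failure: VL is truncated deterministically here.
            return i + 1 if vli else i
    return len(bits)  # no test failed, VL is unchanged

excl = ffirst_truncate([1, 1, 0, 1])            # fails at element 2
incl = ffirst_truncate([1, 1, 0, 1], vli=1)     # same, but inclusive
inverted = ffirst_truncate([0, 0, 1, 0], inv=1)
unchanged = ffirst_truncate([1, 1, 1])
```

Because elements are examined strictly in order and the first failure fixes VL, the result never depends on implementation-specific behaviour.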

# Reduction and Iteration

Bearing in mind, as described in the [[svp64/appendix]], that SVP64 Horizontal
Reduction is a deterministic schedule on top of base Scalar v3.0 operations,
the same rules apply to CR Operations, i.e. that programmers must
follow certain conventions in order for an *end result* of a
reduction to be achieved. Unlike
other Vector ISAs *there are no explicit reduction opcodes*
in SVP64: Schedules however achieve the same effect.

Due to these conventions only reduction on operations such as `crand`
and `cror` is meaningful because these have Condition Register Fields
as both input and output.
Meaningless operations are not prohibited because the cost in hardware
of doing so is prohibitive, but neither are they `UNDEFINED`. Implementations
are still required to execute them but are at liberty to optimise out
any operations that would ultimately be overwritten, as long as Strict
Program Order is still observable by the programmer.

Also bear in mind that 'Reverse Gear' may be enabled, which can be
used in combination with overlapping CR operations to iteratively accumulate
results. Issuing a `sv.crand` operation for example with `BA`
differing from `BB` by one Condition Register Field would
result in a cascade effect, where the first-encountered CR Field
that is zero would set the result to zero, and also all subsequent CR Field
elements thereafter:

    # sv.crand/mr/rg CR4.ge.v, CR5.ge.v, CR4.ge.v
    for i in VL-1 downto 0 # reverse gear
        CR[4+i].ge &= CR[5+i].ge
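
The cascade is easy to check with a runnable model. The sketch below is illustrative only: it tracks a single bit per CR Field in a plain Python list, and the function name `crand_overlapped` is hypothetical. A forward-order run is included for contrast.

```python
def crand_overlapped(cr_bits, base, vl, reverse_gear=True):
    """Model CR[base+i] &= CR[base+1+i] over VL elements, one bit per Field."""
    bits = list(cr_bits)
    order = range(vl - 1, -1, -1) if reverse_gear else range(vl)
    for i in order:
        # Source and destination overlap by one Field, so in reverse gear
        # each step reads the result written by the previous step.
        bits[base + i] &= bits[base + 1 + i]
    return bits

#          CR0..CR3    CR4 CR5 CR6 CR7 CR8
cr = [0, 0, 0, 0,   1,  1,  0,  1,  1]
cascaded = crand_overlapped(cr, base=4, vl=4, reverse_gear=True)
forward = crand_overlapped(cr, base=4, vl=4, reverse_gear=False)
```

In reverse gear the zero in CR6 propagates down into CR5 and CR4, whereas in forward order only the immediate neighbour CR5 is affected.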

`sv.crxor` with reduction would be particularly useful for parity calculation
for example, although there are many ways in which the same calculation
could be carried out after transferring a vector of CR Fields to a GPR
using crweird operations.
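
Both routes to a parity bit can be sketched in Python. This is a hedged model rather than SVP64 pseudocode: `crxor_reduce` stands in for a scalar reduction with a scalar accumulator, and `parity_via_gpr` stands in for packing the bits into an integer (the role a crweird transfer would play) followed by a population count. Both function names are hypothetical.

```python
def crxor_reduce(bits, acc=0):
    """Scalar reduction: the accumulator is XORed with each element in turn."""
    for b in bits:
        acc ^= b
    return acc

def parity_via_gpr(bits):
    """Alternative: pack the bits into one integer, then count the ones."""
    gpr = 0
    for b in bits:
        gpr = (gpr << 1) | (b & 1)
    return bin(gpr).count("1") & 1

sample = [1, 0, 1, 1]
odd = crxor_reduce(sample)
same = parity_via_gpr(sample)
```

The two functions agree for any input, which is the sense in which the reduction and the GPR route are interchangeable.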

Implementations are free to optimise these reductions in any
way they see fit, as long as the end-result is compatible with Strict Program
Order being observed, and Interrupt latency is not adversely impacted.

# LD/ST Pack/Unpack Mode

As described in [[sv/normal]],
Structured Pack/Unpack is similar to VSX `vpack` and `vunpack` except
generalised not only to a Schedule to be applied to any operation but
also extended to vec2/3/4.

As in [[sv/normal]] and [[sv/ldst]] operations,
setting this mode changes the meaning of bits 4-5 in `RM` from being
`ELWIDTH` to a pair of Pack/Unpack bits.
*Unlike* in other operation categories however,
the `SRC_ELWIDTH` bits (6-7) are in use for `SNZ`.
Therefore **it is not possible to use elwidth overrides and Pack/Unpack**
at the same time. With elwidths being meaningless for CR Fields this was
considered an acceptable compromise: the operations particularly affected
are extremely weird CR ops.

# Unusual and quirky CR operations

## cmp and other compare ops

`cmp` and `cmpi` etc take GPRs as sources and create a CR Field as a result.

    cmpli BF,L,RA,UI
    cmpeqb BF,RA,RB

With `ELWIDTH` applying to the source GPR operands this is perfectly fine
(caveat: except if Pack/Unpack is needed as well).

## crweird operations

There are 4 weird CR-GPR operations and one reasonable one in
the [[cr_int_predication]] set:

* crrweird
* mtcrweird
* crweirder
* crweird
* mcrfm - reasonably normal and referring to CR Fields for src and dest.