[[!tag standards]]
# Condition Register SVP64 Operations

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>
* [[svp64]]
* [[sv/branches]]
* [[sv/cr_int_predication]]
* [[openpower/isa/sprset]]
* [[openpower/isa/condition]]
* [[openpower/isa/comparefixed]]

Condition Register Fields are only 4 bits wide: this presents some
interesting conceptual challenges for SVP64, particularly with respect to element
width (which is clearly meaningless for a 4-bit
collation of Conditions: LT, GT, EQ and SO). Likewise, arithmetic saturation
(an important part of Arithmetic SVP64)
has no meaning. Additionally, extra modes are required that only make
sense for Vectorised CR Operations. Consequently an alternative Mode
Format is required, and because elwidths are meaningless for CR Fields,
the bits in SVP64 `RM` that would otherwise specify them may be used
for other purposes.

This alternative mapping **only** applies to instructions that **only**
reference a CR Field or CR bit as the sole exclusive result. This section
**does not** apply to instructions which primarily produce arithmetic
results that also, as an aside, produce a corresponding
CR Field (such as when Rc=1).
Instructions that involve Rc=1 are definitively arithmetic in nature,
where the corresponding Condition Register Field can be considered to
be a "co-result". Such CR Field "co-result" arithmetic operations
are firmly out of scope for this section.

* Examples of v3.0B instructions to which this section does
  apply are:
    - `mfcr` and `cmpi` (3-bit operands), and
    - `crnor` (5-bit operands).
* Examples to which this section does **not** apply include
  `fadds.` and `subf.`, which both produce arithmetic results
  (and a CR Field co-result).

The CR Mode Format still applies to `sv.cmpi` because, despite
taking a GPR as input, the output from the Base Scalar v3.0B `cmpi`
instruction is purely to a Condition Register Field.

Other modes are still applicable and include:

* **Data-dependent fail-first**, useful to truncate VL based on
  analysis of a Condition Register result bit.
* **Scalar and parallel reduction**. Reduction is useful
  for analysing a Vector of Condition Register Fields
  and reducing it to a single Condition Register Field.

Predicate-result unfortunately does not make any sense for CR Ops.
With arithmetic operations, Rc=1 creates a co-result (a CR Field), and
testing that co-result allows the decision to be made whether or not
to store the main result. For CR Ops, however, the CR Field result *is*
the main result, so there is no separate result for the test to gate.

A reminder that, just as with other SVP64 Modes, and unlike v3.1 64-bit
Prefixing, there are insufficient spare bits in the prefix to mark
the type. Therefore, the SVP64 Mode must be identified by first
decoding the suffix (the 32-bit scalar operation). Only once
the instruction has been identified (cmpi, mfcr, crweird)
may the type of SVP64 Mode (normal, branch, LDST, CR 3-bit
or CR 5-bit) be decoded.

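As a rough illustration of this two-stage decode, the Python sketch below assumes the suffix has already been decoded into a mnemonic, and uses a small, hypothetical lookup table (`MODE_KIND_BY_MNEMONIC`, not part of any real decoder) to pick which interpretation of RM `MODE` applies. The entries follow the operand widths of the base v3.0B instructions: `BT` of `crand` is a 5-bit CR bit number, whereas `BF` of `mcrf` is a 3-bit CR Field number.

```python
from enum import Enum


class SVP64ModeKind(Enum):
    NORMAL = "normal"
    BRANCH = "branch"
    LDST = "LDST"
    CR_3BIT = "CR 3-bit"
    CR_5BIT = "CR 5-bit"


# Hypothetical, illustrative table. Stage 1 (decoding the 32-bit suffix)
# is assumed to have already produced the mnemonic used as the key.
MODE_KIND_BY_MNEMONIC = {
    "crand": SVP64ModeKind.CR_5BIT,   # BT is a 5-bit CR bit number
    "cror": SVP64ModeKind.CR_5BIT,
    "crnor": SVP64ModeKind.CR_5BIT,
    "mcrf": SVP64ModeKind.CR_3BIT,    # BF is a 3-bit CR Field number
    "bc": SVP64ModeKind.BRANCH,
    "ld": SVP64ModeKind.LDST,
    "subf.": SVP64ModeKind.NORMAL,    # arithmetic result plus CR co-result
    "fadds.": SVP64ModeKind.NORMAL,
}


def mode_kind(mnemonic: str) -> SVP64ModeKind:
    """Stage 2: RM.MODE can only be interpreted once the suffix is known."""
    if mnemonic not in MODE_KIND_BY_MNEMONIC:
        raise ValueError(f"no mode mapping for {mnemonic!r} in this sketch")
    return MODE_KIND_BY_MNEMONIC[mnemonic]


print(mode_kind("crand").value)  # CR 5-bit
```

The point of the ordering is that the same RM bits mean different things depending on the table entry, so no part of `MODE` can be decoded speculatively before the suffix.
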
# Format

SVP64 RM `MODE` (including the `ELWIDTH` bits) for CR-based operations.
Column headings are SVP64 RM bit numbers:

| 4 | 5 | 19-20 | 21  | 22 23   | description                             |
| - | - | ----- | --- | ------- | --------------------------------------- |
|sz |SNZ| 00    | 0   | dz /    | normal mode                             |
|sz |SNZ| 00    | 1   | 0 RG    | scalar reduce mode (mapreduce), SUBVL=1 |
|sz |SNZ| 00    | 1   | 1 /     | parallel reduce mode (mapreduce), SUBVL=1 |
|sz |SNZ| 00    | 1   | SVM RG  | subvector reduce mode, SUBVL>1          |
|sz |SNZ| 01/10 | inv | CR-bit  | Ffirst 3-bit mode                       |
|sz |SNZ| 01/10 | inv | dz /    | Ffirst 5-bit mode                       |
|sz |SNZ| 11    | rsv | rsvd    | reserved                                |

`VLi=0` when bits 19-20 are 0b01, and `VLi=1` when bits 19-20 are 0b10.

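The table can be read as a decode function. The sketch below is hypothetical: `decode_cr_mode` takes RM bits 19-23 as separate integers (sidestepping bit-ordering questions), plus the SUBVL value and whether the suffix is a 3-bit or 5-bit CR operation, which is only known after the suffix has been decoded. Treating bit 22 as the more significant bit of `CRbit`, and mapping `CRbit` values 0-3 to LT, GT, EQ, SO, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")  # order of bits within a CR Field


@dataclass
class CRMode:
    name: str
    vli: Optional[int] = None    # fail-first: 1 = truncated VL includes failing element
    inv: Optional[int] = None    # fail-first: 1 = test for bit clear
    crbit: Optional[int] = None  # 3-bit fail-first: which bit of the field
    rg: Optional[int] = None     # reduce modes: reverse gear
    svm: Optional[int] = None    # subvector reduce
    dz: Optional[int] = None     # destination zeroing


def decode_cr_mode(b19_20: int, b21: int, b22: int, b23: int,
                   subvl: int, operand_bits: int) -> CRMode:
    """Interpret RM bits 19-23 for a CR operation (illustrative sketch)."""
    if b19_20 == 0b00:
        if b21 == 0:
            return CRMode("normal", dz=b22)
        if subvl > 1:
            return CRMode("subvector reduce", svm=b22, rg=b23)
        if b22 == 0:
            return CRMode("scalar reduce", rg=b23)
        return CRMode("parallel reduce")
    if b19_20 in (0b01, 0b10):
        vli = 1 if b19_20 == 0b10 else 0
        if operand_bits == 3:
            # Assumption: bit 22 is the more significant CRbit bit.
            return CRMode("ffirst 3-bit", vli=vli, inv=b21,
                          crbit=(b22 << 1) | b23)
        return CRMode("ffirst 5-bit", vli=vli, inv=b21, dz=b22)
    return CRMode("reserved")


m = decode_cr_mode(0b10, 1, 1, 0, subvl=1, operand_bits=3)
print(m.name, m.vli, m.inv, CR_BIT_NAMES[m.crbit])  # ffirst 3-bit 1 1 EQ
```

Checking SUBVL before bit 22 mirrors the table: bit 22 means "parallel" only when SUBVL=1, and becomes `SVM` when SUBVL>1.
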
Fields:

* **sz / dz**: if predication is enabled, zeros are put into the
  destination (or used as the source, in the case of twin predication)
  when the predicate bit is zero. Otherwise the element is ignored or
  skipped, depending on context.
* **SNZ**: when sz=1 and SNZ=1, a value "1" is put in place of zeros when
  the predicate bit is clear.
* **inv CR-bit**: just as in branches (BO), these bits allow testing of a
  CR bit for whether it is set (inv=0) or clear (inv=1).
* **RG**: inverts the Vector Loop order (VL-1 downto 0) rather
  than the normal 0..VL-1.
* **SVM**: sets "subvector" reduce mode.
* **VLi**: VL inclusive. In fail-first mode, the truncation of
  VL *includes* the current element at the failure point rather
  than excluding it from the count.

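The zeroing rules for `sz` and `SNZ` can be sketched for single-bit CR elements (as produced by the 5-bit operations). `apply_predicate` is a hypothetical helper, not part of any SVP64 reference code; it only encodes the field descriptions above: a masked-out element is skipped when sz=0, becomes 0 when sz=1, or becomes 1 when sz=1 and SNZ=1.

```python
from typing import List, Optional


def apply_predicate(elements: List[int], mask: List[int],
                    sz: int, snz: int) -> List[Optional[int]]:
    """Per-element value under predication; None means skipped."""
    out: List[Optional[int]] = []
    for value, pred in zip(elements, mask):
        if pred:
            out.append(value)             # predicate bit set: use as-is
        elif sz:
            out.append(1 if snz else 0)   # zeroing, or "1" when SNZ=1
        else:
            out.append(None)              # no zeroing: ignored/skipped
    return out


bits, mask = [1, 0, 1, 1], [1, 0, 0, 1]
print(apply_predicate(bits, mask, sz=0, snz=0))  # [1, None, None, 1]
print(apply_predicate(bits, mask, sz=1, snz=0))  # [1, 0, 0, 1]
print(apply_predicate(bits, mask, sz=1, snz=1))  # [1, 1, 1, 1]
```

Note that SNZ has no effect when sz=0, which is why the `sz` test comes first.
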
# Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if, during
the course of sequentially evaluating each element's Condition Test,
a test is encountered which fails,
then VL (Vector Length) is truncated at that point. In the case
of Arithmetic SVP64 Operations, the Condition Register Field generated from
Rc=1 is used as the basis for the truncation decision.
However, with CR-based operations the CR Field result to be
tested is provided *by the operation itself*.

Data-dependent SVP64 Vectorised Operations involving the creation or
modification of a CR can require an extra two bits, which are not available
in the compact space of the SVP64 RM `MODE` Field. Because the concept of
element width overrides is meaningless for CR Fields, the `ELWIDTH` field
may be used for these alternative purposes.

Condition Register based operations such as `sv.mfcr` and `sv.crand` can thus
be made more flexible. The rules in this section also apply to future
CR-based instructions.

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
  whole 32-bit CR)

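In the scalar v3.0B CR the two operand kinds are related by simple arithmetic: a 5-bit CR bit number is the 3-bit CR Field number times four plus the position of the bit within that Field (the familiar assembler form `4*cr4+eq` is bit 18). The hypothetical helpers below show this for the 32-bit CR only; SVP64's extension of CR operands beyond eight Fields is not modelled.

```python
from typing import Tuple

CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")  # order of bits within a CR Field


def split_cr_bit(bi: int) -> Tuple[int, str]:
    """5-bit CR bit number -> (3-bit CR Field number, bit within Field)."""
    if not 0 <= bi < 32:
        raise ValueError("expected a 5-bit CR bit number")
    return bi >> 2, CR_BIT_NAMES[bi & 0b11]


def join_cr_bit(bf: int, crbit: int) -> int:
    """3-bit CR Field number plus 2-bit selector -> 5-bit CR bit number."""
    if not (0 <= bf < 8 and 0 <= crbit < 4):
        raise ValueError("expected a 3-bit Field and a 2-bit selector")
    return (bf << 2) | crbit


print(split_cr_bit(18))   # (4, 'EQ')
print(join_cr_bit(4, 2))  # 18
```

The 2-bit selector that `join_cr_bit` needs is exactly what a 3-bit operand lacks.
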
Examining these two types, it is observed that the
difference may be considered to be that the 5-bit variant
*already* provides the
prerequisite information about which CR Field bit (LT, GT, EQ, SO) is to
be operated on by the instruction.
Thus, logically, we may set the following rule:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used, i.e. the bit of the CR field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used, the 3-bit variant
  must be used, providing as it does the missing `CRbit` field
  in order to select which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO).

The reason why the 3-bit CR variant needs the additional CR-bit
field should be obvious: the 3-bit CR Field operand of the base
Power ISA v3.0B operation does not contain the two bits that
select a bit within the CR Field. Thus, these two
bits (to select LT, GT, EQ or SO) must be provided in another
way.

Examples of each type:

* crand, cror, crnor. These are all 5-bit (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`.
* mcrf. This has only 3-bit operands (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used:
  with `CRbit` taken from SVP64 RM bits 22-23, the bit
  of BF to be tested is identified.

Just as with SVP64 [[sv/branches]], there is the option to truncate
VL to include the element being tested (`VLi=1`) or to exclude it
(`VLi=0`).

Also, just as with [[sv/normal]] fail-first, and unlike
[[sv/ldst]], VL cannot be set to an arbitrary value: deterministic
behaviour is *required*.

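A minimal sketch of the VL truncation for a Vectorised `crand` using the 5-bit variant, assuming unpredicated operation. `crand_ffirst` is a hypothetical name, and it models only which elements remain inside the new VL: whether the failing element's own result is also written when `VLi=0` is deliberately left out.

```python
from typing import List, Tuple


def crand_ffirst(ba: List[int], bb: List[int], vl: int,
                 inv: int, vli: int) -> Tuple[List[int], int]:
    """Return (results inside the new VL, new VL) for sv.crand with ffirst.

    Each element's BT = BA & BB is the bit tested: with inv=0 the test
    passes when BT is set, with inv=1 when BT is clear.
    """
    new_vl = vl
    for i in range(vl):
        bt = ba[i] & bb[i]
        if bt == inv:                      # condition test fails here
            new_vl = i + 1 if vli else i   # VLi=1 keeps element i
            break                          # later elements never execute
    results = [ba[i] & bb[i] for i in range(new_vl)]
    return results, new_vl


ba, bb = [1, 1, 0, 1], [1, 1, 1, 1]
print(crand_ffirst(ba, bb, 4, inv=0, vli=0))  # ([1, 1], 2)
print(crand_ffirst(ba, bb, 4, inv=0, vli=1))  # ([1, 1, 0], 3)
```

Because the new VL depends only on the data, the same inputs always produce the same VL, which is the determinism requirement stated above.
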
# Reduction and Iteration

Bearing in mind, as described in the [[svp64/appendix]], that SVP64 Horizontal
Reduction is a deterministic schedule on top of base Scalar v3.0 operations,
the same rules apply to CR Operations, i.e. programmers must
follow certain conventions in order for an *end result* of a
reduction to be achieved. Unlike
other Vector ISAs, *there are no explicit reduction opcodes*
in SVP64.

Due to these conventions, reduction is only meaningful on operations
such as `crand` and `cror`, because their inputs and output are all
in the Condition Register.

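Since there are no reduction opcodes, a reduction is simply a schedule of ordinary scalar `crand` (or `cror`) operations. The sketch below generates two such schedules over CR bit element indices and executes them: a sequential accumulation into element 0, and a generic pairwise tree. The tree is an illustrative assumption and is not claimed to be the exact parallel-reduction schedule defined in [[svp64/appendix]]; for an associative operation such as AND both give the same end result.

```python
from typing import List, Tuple

Step = Tuple[int, int, int]  # (BT, BA, BB) element indices for one crand


def scalar_reduce_schedule(vl: int) -> List[Step]:
    """Accumulate into element 0: e0 = e0 & e1, then e0 = e0 & e2, ..."""
    return [(0, 0, i) for i in range(1, vl)]


def tree_reduce_schedule(vl: int) -> List[Step]:
    """Generic pairwise tree (not necessarily SVP64's exact schedule)."""
    steps: List[Step] = []
    stride = 1
    while stride < vl:
        for i in range(0, vl - stride, 2 * stride):
            steps.append((i, i, i + stride))
        stride *= 2
    return steps


def run(bits: List[int], steps: List[Step]) -> List[int]:
    """Execute each step as a scalar crand on a copy of the CR bits."""
    cr = list(bits)
    for bt, ba, bb in steps:
        cr[bt] = cr[ba] & cr[bb]
    return cr


bits = [1, 1, 0, 1, 1]
print(run(bits, scalar_reduce_schedule(5))[0])  # 0
print(run(bits, tree_reduce_schedule(5))[0])    # 0
```
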
Also bear in mind that 'Reverse Gear' may be enabled, which can be
used in combination with overlapping CR operations to iteratively accumulate
results. For example, issuing a `sv.crand` operation with `BA`
differing from `BB` by one Condition Register Field would
result in a cascade effect, where the first-encountered CR Field
whose tested bit is clear would set the result to zero, and also
every subsequent CR Field element thereafter:

    # sv.crand/mr/rg CR4.ge.v, CR5.ge.v, CR4.ge.v
    for i in VL-1 downto 0 # reverse gear
        CR[4+i].ge &= CR[5+i].ge

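The cascade can be checked with a small Python model of the loop above. It is a sketch: `crand_overlap` is a hypothetical name, and `ge` simply holds the tested bit of each CR Field, indexed by Field number (the list has twelve entries only to give the Vector room). Running the same operation without reverse gear shows why RG matters: in forward order each element reads a neighbour that has not yet been updated, so no cascade occurs.

```python
from typing import List


def crand_overlap(ge: List[int], vl: int, reverse_gear: bool,
                  bt: int = 4, ba: int = 5) -> List[int]:
    """Model sv.crand with BT=CR[bt+i], BA=CR[ba+i], BB=CR[bt+i].

    Returns an updated copy of the per-Field bits.
    """
    cr = list(ge)
    order = range(vl - 1, -1, -1) if reverse_gear else range(vl)
    for i in order:
        cr[bt + i] &= cr[ba + i]
    return cr


ge = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1]  # CR0..CR11; CR7 is clear
print(crand_overlap(ge, 7, reverse_gear=True)[4:])
# [0, 0, 0, 0, 1, 1, 1, 1]
print(crand_overlap(ge, 7, reverse_gear=False)[4:])
# [1, 1, 0, 0, 1, 1, 1, 1]
```

With reverse gear the clear bit in CR7 propagates down through CR6, CR5 and CR4; in forward order only CR6 and CR7 end up clear.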