# Condition Register SVP64 Operations

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>
* [[openpower/isa/sprset]]
* [[openpower/isa/condition]]
* [[openpower/isa/comparefixed]]

Condition Register Fields are only 4 bits wide: this presents some
interesting conceptual challenges for SVP64, particularly with respect to element
width (which is clearly meaningless for a 4-bit
collation of Conditions: LT, GT, EQ, SO). Likewise, arithmetic saturation
(an important part of Arithmetic SVP64)
has no meaning. Additionally, extra modes are required that only make
sense for Vectorised CR Operations. Consequently an alternative Mode Format is required.

This alternative mapping **only** applies to instructions that **only**
reference a CR Field or CR bit as the sole exclusive result. This section
**does not** apply to instructions which primarily produce arithmetic
results that also, as an aside, produce a corresponding
CR Field (such as when Rc=1).
Instructions that involve Rc=1 are definitively arithmetic in nature,
where the corresponding Condition Register Field can be considered to
be a "co-result". Such CR Field "co-result" arithmetic operations
are firmly out of scope for this section.

* Examples of v3.0B instructions to which this section does
  apply include
  - `mfcr` and `cmpi` (3 bit operands) and
  - `crnor` and `crand` (5 bit operands).
* Examples to which this section does **not** apply include
  `fadds.` and `subf.` which both produce arithmetic results
  (and a CR Field co-result).

The CR Mode Format still applies to `sv.cmpi` because, despite
taking a GPR as input, the output from the Base Scalar v3.0B `cmpi`
instruction is purely to a Condition Register Field.

Other modes are still applicable and include:

* **Data-dependent fail-first**.
  Useful to truncate VL based on
  analysis of a Condition Register result bit.
* **Scalar and parallel reduction**.
  Useful for analysing a Vector of Condition Register Fields
  and reducing it to one
  single Condition Register Field.
* **Predicate-result**.
  Comparable to Python `filter`, in that only elements which pass a test
  will end up actually being modified. This is in effect the same
  as ANDing the Condition Test with the destination predicate
  mask (hence the name, "predicate-result").

Predicate-result is a particularly powerful strategic mode
in that it is the interaction of a source predicate, destination predicate,
input operands *and* the output result, all combining to influence
what actually goes into the Condition Register File. Given that
predicates may themselves be Condition Registers it can be seen that
there could potentially be up to **six** CR Fields involved in
the execution of Predicate-result Mode.

SVP64 RM `MODE` (includes `ELWIDTH` bits) for CR-based operations:

| 4  | 5   | 19-20 | 21  | 22 23   | description                              |
| -- | --- | ----- | --- | ------- | ---------------------------------------- |
| sz | SNZ | 00    | 0   | dz /    | normal mode                              |
| /  | /   | 00    | 1   | 0 RG    | scalar reduce mode (mapreduce), SUBVL=1   |
| /  | /   | 00    | 1   | 1 CRM   | parallel reduce mode (mapreduce), SUBVL=1 |
| /  | /   | 00    | 1   | SVM RG  | subvector reduce mode, SUBVL>1            |
| sz | SNZ | 01/10 | inv | CR-bit  | Ffirst 3-bit mode                        |
| sz | SNZ | 01/10 | inv | dz /    | Ffirst 5-bit mode                        |
| sz | SNZ | 11    | inv | CR-bit  | 3-bit pred-result CR sel                 |
| sz | SNZ | 11    | inv | dz /    | 5-bit pred-result z/nonz                 |

* `VLi=0` when bits 19-20=0b01.
* `VLi=1` when bits 19-20=0b10.

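
As a rough illustration of how the table above might be read, the following Python sketch maps already-extracted field values to a mode description. The function name, argument names and returned strings are hypothetical and not part of the specification. The `five_bit` flag stands for whether the base instruction uses 5-bit or 3-bit CR operands, which is what distinguishes otherwise identical rows of the table.

```python
def describe_cr_mode(b19_20: int, b21: int, b22: int,
                     subvl: int, five_bit: bool) -> str:
    """Return a human-readable name for a CR-ops MODE encoding.

    b19_20 : value of RM bits 19-20 (0..3)
    b21    : RM bit 21 (reduce enable when b19_20 == 0, otherwise `inv`)
    b22    : RM bit 22 (only inspected for the reduce modes)
    subvl  : effective SUBVL (1..4)
    five_bit : True if the base instruction has 5-bit CR operands
    """
    if b19_20 == 0b00:
        if b21 == 0:
            return "normal"
        if subvl > 1:
            return "subvector reduce"
        return "parallel reduce" if b22 else "scalar reduce"

    width = "5-bit" if five_bit else "3-bit"
    if b19_20 == 0b11:
        return f"{width} pred-result"

    # 0b01 and 0b10 are both fail-first; they differ only in VLi
    vli = 1 if b19_20 == 0b10 else 0
    return f"{width} ffirst VLi={vli}"
```

For instance, `describe_cr_mode(0b10, 0, 0, 1, True)` describes a 5-bit fail-first encoding with inclusive truncation.
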
Fields:

* **sz / dz** if predication is enabled, zeros are put into the dest (or used as the src, in the case of twin predication) when the predicate bit is zero. Otherwise the element is ignored or skipped, depending on context.
* **SNZ** when sz=1 and SNZ=1 a value "1" is put in place of zeros when
  the predicate bit is clear.
* **inv CR bit** just as in branches (BO) these bits allow testing of a CR bit and whether it is set (inv=0) or unset (inv=1).
* **RG** inverts the Vector Loop order (VL-1 downto 0) rather
  than the normal 0..VL-1.
* **CRM** affects the CR on reduce mode when Rc=1.
* **SVM** sets "subvector" reduce mode.
* **VLi** VL inclusive: in fail-first mode, the truncation of
  VL *includes* the current element at the failure point rather
  than excludes it from the count.

# Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if a Condition Test
fails then VL (Vector Length) is truncated at that point. In the case
of Arithmetic SVP64 Operations the Condition Register Field generated from
Rc=1 is used; with CR-based operations, however, that CR result is provided
by the operation itself.

Data-dependent SVP64 Vectorised Operations involving the creation or
modification of a CR can require an extra two bits, which are not available
in the compact space of the SVP64 RM `MODE` Field. With the concept of element
width overrides being meaningless for CR Fields it is possible to use the
`ELWIDTH` field for alternative purposes.

Condition Register based operations such as `sv.mfcr` and `sv.crand` can thus
be made more flexible. However the rules that apply in this section
also apply to future CR-based instructions.

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
  whole 32-bit CR)

Examining these two types it is observed that the
difference may be considered to be that the 5-bit variant provides
additional information about which CR Field bit (LT, GT, EQ, SO) is to
be operated on by the instruction.
Thus, logically, we may set the following rule:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used, i.e. the bit of the CR field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used the 3-bit variant
  must be used, providing as it does the missing `CRbit`
  in order to select which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO).

The reason why the 3-bit CR variant needs the additional CR-bit
field follows from the fact that the 3-bit CR Field
from the base Power ISA v3.0B operation does not contain
the two CR Field Selector bits. Thus, these two
bits (to select LT, GT, EQ or SO) must be provided in another
way.

Examples of each type:

* crand, cror, crnor. These all are 5-bit (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`.
* mcrf. This has only 3-bit (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used.
  With `CRbit` coming from the SVP64 RM bits 22-23 the bit
  of BF to be tested is identified.

Just as with SVP64 [[sv/branches]] there is the option to truncate
VL to include the element being tested (`VLi=1`) and to exclude it
(`VLi=0`).

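
The effect of `inv` and `VLi` on truncation can be sketched as below. This is a simplified model under stated assumptions: predication is ignored, the per-element tested bits are supplied as a plain list of 0/1 values, and the function name is hypothetical. The pass condition follows the field description given earlier (inv=0 expects the bit set, inv=1 expects it clear).

```python
def ffirst_truncated_vl(tested_bits: list[int], inv: int, vli: int) -> int:
    """Return the new VL after data-dependent fail-first.

    tested_bits : the CR bit selected for testing, one value per element
    inv         : 0 to require the bit set, 1 to require it clear
    vli         : 1 to include the failing element in VL, 0 to exclude it
    """
    for i, bit in enumerate(tested_bits):
        passed = (bit == 1) if inv == 0 else (bit == 0)
        if not passed:
            # truncate at the first failure point
            return i + 1 if vli else i
    # no element failed: VL is left unchanged
    return len(tested_bits)
```

With `tested_bits=[1, 1, 0, 1]` and `inv=0`, element 2 is the first failure, so VL becomes 2 when `vli=0` and 3 when `vli=1`.
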
# Predicate-result Condition Register operations

These are again slightly different compared to SVP64 arithmetic
pred-result (described in [[svp64/appendix]]). The reason is that,
again, for arithmetic operations the production of a CR Field when
Rc=1 is a *co-result* accompanying the main arithmetic result, whereas
for CR-based operations the CR Field (referred to by a 3-bit
v3.0B base operand from e.g. `mfcr`) or CR bit (referred to by a 5-bit operand from e.g. `crnor`)
*is* itself the explicit and sole result of the operation.

Therefore, logically, Predicate-result needs to be adapted to
test the actual result of the CR-based instruction (rather than
test the co-resultant CR when Rc=1, as is done for Arithmetic SVP64).


    for i in range(VL):
        # predication test, skip all masked out elements.
        if sz == 0 and predicate_masked_out(i):
            continue
        if predicate_masked_out(i):
            if five_bit_mode:
                # only one bit of CR to update
                result = SNZ
            else:
                # four copies of SNZ, one per bit of the CR Field
                result = SNZ || SNZ || SNZ || SNZ
        else:
            # result is to go into CR. may be a 4-bit CR Field
            # (3-bit mode) or just a single bit (5-bit mode)
            result = op(...)  # placeholder: the CR operation on element i
        if five_bit_mode:
            # if this CR op has 5-bit CR result operands
            # the single bit result is what must be tested
            to_test = result
        else:
            # if however this is a 3-bit CR *field* result
            # then the bit to be tested must be selected
            to_test = result[CRbit]
        # now test CR, similar to branch
        # (inv=0 expects the bit set, inv=1 expects it clear)
        if to_test == inv:
            continue # test failed: cancel store
        # result optionally stored

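
A runnable approximation of the loop above is sketched below in Python. It is an illustration under assumptions rather than a reference model: the function and parameter names are hypothetical, the predicate is modelled as an integer bitmask, a 4-bit CR Field is modelled as a list `[LT, GT, EQ, SO]`, and the test is taken to pass when the selected bit is set with inv=0 or clear with inv=1.

```python
def pred_result_cr(vl, pred_mask, sz, snz, inv, five_bit, crbit, op):
    """Return {element index: result} for the elements actually written.

    op(i) yields the CR result for element i: a single bit (0/1) in
    5-bit mode, or a list of four bits [LT, GT, EQ, SO] in 3-bit mode.
    """
    written = {}
    for i in range(vl):
        masked_out = not ((pred_mask >> i) & 1)
        # without zeroing, masked-out elements are skipped entirely
        if masked_out and not sz:
            continue
        if masked_out:
            # zeroing: substitute SNZ (one bit, or four copies of it)
            result = snz if five_bit else [snz] * 4
        else:
            result = op(i)
        # pick the single bit that the test applies to
        to_test = result if five_bit else result[crbit]
        if to_test == inv:
            continue  # test failed: cancel store
        written[i] = result
    return written


def example_op(i):
    # hypothetical per-element CR Field: EQ set on even elements, LT on odd
    return [0, 0, 1, 0] if i % 2 == 0 else [1, 0, 0, 0]
```

Only elements that are both enabled by the predicate and pass the test appear in the returned dictionary, which is the "filter"-like behaviour that gives the mode its name.
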