# Condition Register SVP64 Operations

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>
* [[svp64]]
* [[sv/branches]]
* [[openpower/isa/sprset]]
* [[openpower/isa/condition]]
* [[openpower/isa/comparefixed]]

Condition Register Fields are only 4 bits wide: this presents some
interesting conceptual challenges for SVP64, particularly with respect to element
width (which is clearly meaningless for a 4-bit
collation of Conditions: LT, GT, EQ, SO). Likewise, arithmetic saturation
(an important part of Arithmetic SVP64)
has no meaning. Additionally, extra modes are required that only make
sense for Vectorised CR Operations. Consequently an alternative Mode Format is required.

This alternative mapping **only** applies to instructions whose
sole, exclusive result is a CR Field or CR bit. This section
**does not** apply to instructions which primarily produce arithmetic
results and, as an aside, also produce a corresponding
CR Field (such as when Rc=1).
Instructions that involve Rc=1 are definitively arithmetic in nature,
where the corresponding Condition Register Field can be considered to
be a "co-result". Such CR Field "co-result" arithmetic operations
are firmly out of scope for
this section.

* Examples of v3.0B instructions to which this section does
  apply are `mfcr` and `cmpi` (3-bit operands) and `crnor`
  (5-bit operands).
* Examples to which this section does **not** apply include
  `fadds.` and `subf.`, which both produce arithmetic results
  (and a CR Field co-result).

The CR Mode Format still applies to `sv.cmpi` because, despite
taking a GPR as input, the output from the Base Scalar v3.0B `cmpi`
instruction is purely to a Condition Register Field.

Other modes are still applicable and include:

* **Data-dependent fail-first**.
  Useful for truncating VL based on
  analysis of a Condition Register result bit.
* **Scalar and parallel reduction**.
  Reduction is useful
  for analysing a Vector of Condition Register Fields
  and reducing it to one
  single Condition Register Field.
* **Predicate-result**.
  Equivalent
  to Python's `filter`, in that only elements which pass a test
  will end up actually being modified. This is in effect the same
  as ANDing the Condition Test with the destination predicate
  mask (hence the name, "predicate-result").

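The "ANDing" description of Predicate-result can be illustrated with a small Python sketch. This is only an analogy, not the specified behaviour: the names `effective_mask` and `apply_pred_result` are hypothetical, CR Fields are modelled as plain integers, and the handling of `sz`/`SNZ` is deliberately left out.

```python
def effective_mask(dest_predicate, test_passed):
    """AND the per-element Condition Test with the destination predicate."""
    return [bool(p) and bool(t) for p, t in zip(dest_predicate, test_passed)]


def apply_pred_result(dest, results, dest_predicate, test_passed):
    """Return a copy of dest, updated only where the combined mask is set."""
    mask = effective_mask(dest_predicate, test_passed)
    return [r if m else d for d, r, m in zip(dest, results, mask)]


# Illustrative values only (not taken from the specification).
dest_predicate = [1, 1, 0, 1]
test_passed = [True, False, True, True]
old_fields = [0b0000, 0b0000, 0b0000, 0b0000]
new_fields = [0b1000, 0b0100, 0b0010, 0b0001]
print(apply_pred_result(old_fields, new_fields, dest_predicate, test_passed))
```

Element 1 is dropped because its test failed and element 2 because it is masked out by the predicate; only elements 0 and 3 are modified.
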
Predicate-result is a particularly powerful mode
in that it is the interaction of a source predicate, destination predicate,
input operands *and* the output result, all combining to influence
what actually goes into the Condition Register File. Given that
predicates may themselves be Condition Registers,
there could potentially be up to **six** CR Fields involved in
the execution of Predicate-result Mode.

SVP64 RM `MODE` (includes `ELWIDTH` bits) for CR-based operations:

| 4  | 5   | 19-20 | 21  | 22 23   | description                              |
| -- | --- | ----- | --- | ------- | ---------------------------------------- |
| sz | SNZ | 00    | 0   | dz /    | normal mode                              |
| /  | /   | 00    | 1   | 0 RG    | scalar reduce mode (mapreduce), SUBVL=1  |
| /  | /   | 00    | 1   | 1 CRM   | parallel reduce mode (mapreduce), SUBVL=1 |
| /  | /   | 00    | 1   | SVM RG  | subvector reduce mode, SUBVL>1           |
| dz | SNZ | 01/10 | inv | CR-bit  | Ffirst 3-bit mode                        |
| sz | SNZ | 01/10 | inv | dz /    | Ffirst 5-bit mode                        |
| sz | SNZ | 11    | inv | CR-bit  | 3-bit pred-result CR sel                 |
| sz | SNZ | 11    | inv | dz /    | 5-bit pred-result z/nonz                 |

* `VLI=0` when bits 19-20=0b01.
* `VLI=1` when bits 19-20=0b10.

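As a reading aid, the table above can be turned into a small decoder. The following is a sketch under stated assumptions, not a reference implementation: `decode_cr_mode` and its parameter names are hypothetical, bits 19-20 are passed as a single two-bit integer (bit 19 as the most significant bit), and whether the 3-bit or 5-bit variant applies is supplied by the caller because it depends on the instruction's operand width rather than on the `MODE` bits.

```python
def decode_cr_mode(bits_19_20, bit_21, bit_22, subvl=1, five_bit=False):
    """Classify a CR-ops MODE encoding according to the table above.

    bits_19_20: two-bit value (0..3); bit_21, bit_22: single bits.
    subvl:      SUBVL value, only relevant for the reduce modes.
    five_bit:   True if the instruction has 5-bit CR operands (assumption:
                known to the caller from the base instruction).
    """
    if bits_19_20 not in (0b00, 0b01, 0b10, 0b11):
        raise ValueError("bits 19-20 must be a two-bit value")
    if bits_19_20 == 0b00:
        if bit_21 == 0:
            return {"mode": "normal"}
        if subvl > 1:
            return {"mode": "subvector reduce"}
        return {"mode": "parallel reduce" if bit_22 else "scalar reduce"}
    variant = "5-bit" if five_bit else "3-bit"
    if bits_19_20 == 0b11:
        return {"mode": "pred-result", "variant": variant, "inv": bit_21}
    # 0b01 and 0b10 are both fail-first; they differ only in VLI.
    vli = 1 if bits_19_20 == 0b10 else 0
    return {"mode": "ffirst", "variant": variant, "inv": bit_21, "VLI": vli}


print(decode_cr_mode(0b10, 1, 0, five_bit=True))
```
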
Fields:

TODO

# Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if a Condition Test
fails then VL (Vector Length) is truncated at that point. In the case
of Arithmetic SVP64 Operations the Condition Register Field generated from
Rc=1 is used. With CR-based operations, however, that CR result is provided
by the operation itself.

Data-dependent SVP64 Vectorised Operations involving the creation or
modification of a CR can require an extra two bits, which are not available
in the compact space of the SVP64 RM `MODE` Field. Because element
width overrides are meaningless for CR Fields, it is possible to use the
`ELWIDTH` field for alternative purposes.

Condition Register based operations such as `sv.mfcr` and `sv.crand` can thus
be made more flexible. The rules in this section
also apply to future CR-based instructions.

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
  whole 32-bit CR)

Examining these two types, the
difference may be considered to be that the 5-bit variant provides
additional information about which CR Field bit (LT, GT, EQ, SO) is to
be operated on by the instruction.
Thus, logically, we may set the following rules:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used, i.e. the bit of the CR field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used, the 3-bit variant
  must be used, providing as it does the missing `CRbit`
  in order to select which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO).

The 3-bit CR variant needs the additional CR-bit
field because the 3-bit CR Field operand
from the base Power ISA v3.0B operation does not contain
the two CR Field Selector bits. These two
bits (to select LT, GT, EQ or SO) must therefore be provided in another
way.

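The relationship between the two operand widths can be sketched in Python. In Power ISA a 5-bit CR operand such as `BT` indexes one of the 32 CR bits, so its upper three bits identify the CR Field and its lower two bits identify the bit within that field (0=LT, 1=GT, 2=EQ, 3=SO). The helper names below are hypothetical, a CR Field is modelled as a 4-bit integer with LT as the most significant bit, and it is an assumption (not stated in this section) that `CRbit` uses the same 0..3 ordering.

```python
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")


def split_cr_bit_operand(bt):
    """Split a 5-bit CR bit operand into (CR Field number, bit-in-field)."""
    if not 0 <= bt < 32:
        raise ValueError("5-bit operand expected")
    return bt >> 2, bt & 0b11


def bit_to_test(cr_field_value, crbit):
    """Select one bit of a 4-bit CR Field value (LT is the MSB).

    Assumption: crbit is numbered 0=LT, 1=GT, 2=EQ, 3=SO.
    """
    return (cr_field_value >> (3 - crbit)) & 1


field, bit = split_cr_bit_operand(6)       # i.e. 4*cr1 + eq
print(field, CR_BIT_NAMES[bit])            # 1 EQ
print(bit_to_test(0b0010, bit))            # 1: EQ is set in 0b0010
```

A 3-bit operand supplies only the `field` half of this pair, which is why the 3-bit variant takes the missing selector from the SVP64 RM bits instead.
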
Examples of each type:

* `crand`, `cror`, `crnor`. These are all 5-bit (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`.
* `mcrf`. This has only 3-bit operands (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used.
  With `CRbit` coming from the SVP64 RM bits 22-23, the bit
  of BF to be tested is identified.

Just as with SVP64 [[sv/branches]] there is the option to truncate
VL either to include the element being tested (`VLi=1`) or to exclude it
(`VLi=0`).

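The truncation rule can be sketched as follows. This is a minimal model under stated assumptions rather than the specified behaviour: `ffirst_truncate` is a hypothetical helper, the per-element bits to be tested are assumed to have been computed already, and the fail condition (`bit != inv`) is borrowed from the Predicate-result pseudocode later in this document. Whether the failing element's result is written is not modelled.

```python
def ffirst_truncate(vl, test_bits, inv, vli):
    """Return the new VL after data-dependent fail-first.

    test_bits: per-element bit selected for testing (0 or 1).
    inv:       value the bit is compared against (assumed fail on mismatch).
    vli:       1 to include the failing element in VL, 0 to exclude it.
    """
    for i in range(vl):
        if test_bits[i] != inv:
            return i + 1 if vli else i
    return vl  # no element failed: VL is unchanged


bits = [1, 1, 1, 0, 1, 1, 1, 1]
print(ffirst_truncate(8, bits, inv=1, vli=0))  # 3
print(ffirst_truncate(8, bits, inv=1, vli=1))  # 4
```
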
# Predicate-result Condition Register operations

These are again slightly different compared to SVP64 arithmetic
pred-result (described in [[svp64/appendix]]). The reason is that,
again, for arithmetic operations the production of a CR Field when
Rc=1 is a *co-result* accompanying the main arithmetic result, whereas
for CR-based operations the CR Field (referred to by a 3-bit
v3.0B base operand from e.g. `mfcr`) or CR bit (referred to by a 5-bit operand from e.g. `crnor`)
*is* itself the explicit and sole result of the operation.

Therefore, logically, Predicate-result needs to be adapted to
test the actual result of the CR-based instruction (rather than
testing the co-resultant CR when Rc=1, as is done for Arithmetic SVP64).

    for i in range(VL):
        # predication test, skip all masked out elements.
        # skips when sz=0
        if sz == 0 and predicate_masked_out(i):
            continue
        if predicate_masked_out(i):
            if 5bit mode:
                # only one bit of CR to update
                result = SNZ
            else:
                # four copies of SNZ
                result = SNZ || SNZ || SNZ || SNZ
        else:
            # result is to go into CR. may be a 4-bit CR Field
            # (3-bit mode) or just a single bit (5-bit mode)
            result = op(...)
            if 5bit mode:
                # if this CR op has 5-bit CR result operands
                # the single bit result is what must be tested
                to_test = result
            else:
                # if however this is a 3-bit CR *field* result
                # then the bit to be tested must be selected
                to_test = result[CRbit]
            # now test CR, similar to branch
            if to_test != inv:
                continue # test failed: cancel store
        # result optionally stored
        update_CR(result)
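
For experimentation, the pseudocode above can be approximated by a runnable Python model. This is a sketch, not the specification: `pred_result` and its parameters are hypothetical names, the predicate is a list of 0/1 values, a 4-bit CR Field is modelled as a list `[LT, GT, EQ, SO]` indexed by `crbit` (an assumed ordering), and `op` is a caller-supplied function standing in for the CR operation. Instead of updating a register file, the function returns a dictionary of the elements that would be written.

```python
def pred_result(vl, predicate, sz, snz, inv, op, five_bit, crbit=0):
    """Model Predicate-result for CR ops; returns {element index: result}."""
    written = {}
    for i in range(vl):
        masked_out = not predicate[i]
        if masked_out and not sz:
            continue  # sz=0: masked-out elements are skipped entirely
        if masked_out:
            # sz=1: masked-out elements receive SNZ (one bit, or four copies)
            result = snz if five_bit else [snz] * 4
        else:
            result = op(i)
            # 5-bit ops produce a single bit; 3-bit ops produce a field
            # from which the bit to test is selected by crbit
            to_test = result if five_bit else result[crbit]
            if to_test != inv:
                continue  # test failed: cancel store
        written[i] = result
    return written


def compare_with_two(i):
    """Stand-in op: compare element index with 2, giving [LT, GT, EQ, SO]."""
    return [int(i < 2), int(i > 2), int(i == 2), 0]


print(pred_result(4, [1, 1, 0, 1], sz=1, snz=1, inv=1,
                  op=compare_with_two, five_bit=False, crbit=0))
```

In this run elements 0 and 1 pass the LT test and are stored, element 2 is masked out and receives four copies of `SNZ`, and element 3 fails the test so nothing is stored for it.
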