# Condition Register SVP64 Operations

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>
* [[svp64]]
* [[sv/branches]]

Condition Register Fields are only 4 bits wide: this presents some
interesting conceptual challenges for SVP64, particularly with respect to
element width (which is clearly meaningless for a 4-bit collation of
Conditions: LT, GT, EQ, SO). Likewise, arithmetic saturation (an important
part of Arithmetic SVP64) has no meaning. Additionally, extra modes are
required that only make sense for Vectorised CR Operations. Consequently
an alternative Mode Format is required.

This alternative mapping **only** applies to instructions that **only**
reference a CR Field or CR bit as the sole exclusive result. This section
**does not** apply to instructions which primarily produce arithmetic
results that also, as an aside, produce a corresponding CR Field (such as
when Rc=1). Instructions that involve Rc=1 are definitively arithmetic in
nature, where the corresponding Condition Register Field can be considered
to be a "co-result". Such CR Field "co-result" arithmetic operations are
firmly out of scope for this section.

* Examples of v3.0B instructions to which this section does
  apply are `mfcr` (3 bit operands) and `crnor` (5 bit operands).
* Examples to which this section does **not** apply include
  `fadds.` and `subf.` which both produce arithmetic results
  (and a CR Field co-result).

Other modes are still applicable and include:

* **Data-dependent fail-first**.
  Useful to truncate VL based on
  analysis of a Condition Register result bit.
* **Scalar and parallel reduction**.
  Reduction is useful
  for turning a Vector of Condition Register Fields into one
  single Condition Register.
* **Predicate-result**.
  Equivalent
  to python "filter", in that only elements which pass a test
  will end up actually being modified. This is in effect the same
  as ANDing the Condition Test with the destination predicate
  mask (hence the name, "predicate-result").
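
As an illustration only, scalar reduction can be pictured as folding a
vector of per-element CR bits into a single bit. The sketch below is plain
Python and does not model SVP64 semantics: `eq_bits` and `scalar_reduce`
are hypothetical names, the data is made up, and AND/OR are merely two
possible choices of reduction operation.

```python
from functools import reduce

def scalar_reduce(bits, op):
    """Fold a vector of single CR bits into one bit (illustrative only)."""
    return reduce(op, bits)

# hypothetical data: the EQ bit taken from each of four CR Fields
eq_bits = [1, 1, 0, 1]

all_eq = scalar_reduce(eq_bits, lambda a, b: a & b)  # AND, as crand computes
any_eq = scalar_reduce(eq_bits, lambda a, b: a | b)  # OR, as cror computes
```

Here `all_eq` answers "did every element compare equal?" and `any_eq`
answers "did at least one?", which is the kind of question a reduced
Condition Register result is typically used for.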

Predicate-result is a particularly powerful mode in that a source
predicate, destination predicate, input operands *and* the output result
all combine to influence what actually goes into the Condition Register
File. Given that predicates may themselves be Condition Registers it can
be seen that there could potentially be up to **six** CR Fields involved in
the execution of Predicate-result Mode.

SVP64 RM `MODE` (includes `ELWIDTH` bits) for CR-based operations:

| 4  | 5   | 19-20 | 21  | 22 23   | description                               |
| -  | -   | ----- | --- | ------- | ----------------------------------------- |
| /  | /   | 00    | 0   | dz sz   | normal mode                               |
| /  | /   | 00    | 1   | 0 RG    | scalar reduce mode (mapreduce), SUBVL=1   |
| /  | /   | 00    | 1   | 1 CRM   | parallel reduce mode (mapreduce), SUBVL=1 |
| /  | /   | 00    | 1   | SVM RG  | subvector reduce mode, SUBVL>1            |
| dz | VLi | 01    | inv | CR-bit  | Ffirst 3-bit mode                         |
| sz | VLi | 01    | inv | dz Rc1  | Ffirst 5-bit mode                         |
| /  | /   | 10    | /   | / /     | RESERVED                                  |
| sz | SNZ | 11    | inv | CR-bit  | 3-bit pred-result CR sel                  |
| /  | SNZ | 11    | inv | dz sz   | 5-bit pred-result z/nonz                  |

Fields:

TODO

# Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if a Condition Test
fails then VL (Vector Length) is truncated at that point. In the case
of Arithmetic SVP64 Operations the Condition Register Field generated from
Rc=1 is used; with CR-based operations, however, that CR result is provided
by the operation itself.

Data-dependent SVP64 Vectorised Operations involving the creation or
modification of a CR can require an extra two bits, which are not available
in the compact space of the SVP64 RM `MODE` Field. Because element
width overrides are meaningless for CR Fields, the `ELWIDTH` field can be
used for alternative purposes.

Condition Register based operations such as `sv.mfcr` and `sv.crand` can thus
be made more flexible. The rules in this section also apply to future
CR-based instructions.

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
  whole 32-bit CR)

Examining these two types, the difference is that the 5-bit variant
provides additional information about which CR Field bit (LT, GT, EQ, SO)
is to be operated on by the instruction.
Thus, logically, we may set the following rules:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used, i.e. the bit of the CR field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used the 3-bit variant
  must be used, providing as it does the missing `CRbit`
  in order to select which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO).

The reason why the 3-bit CR variant needs the additional CR-bit
field is that the 3-bit CR Field operand of the base Power ISA v3.0B
operation does not contain the two CR Field Selector bits. These two
bits (to select LT, GT, EQ or SO) must therefore be provided in another
way.

Examples of each type:

* crand, cror, crnor. These are all 5-bit (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`.
* mcrf. This has only 3-bit operands (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used.
  With `CRbit` coming from the SVP64 RM bits 22-23 the bit
  of BF to be tested is identified.
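
The two cases can be sketched in Python as follows. This is a minimal
illustration, not a definitive decoder: `select_tested_bit` and
`CR_BIT_NAMES` are hypothetical names, and the sketch assumes that the
2-bit `CRbit` value uses the same LT, GT, EQ, SO ordering as the low two
bits of a 5-bit Power ISA CR bit operand.

```python
# Bit order within a 4-bit CR Field, as in the Power ISA.
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")

def select_tested_bit(operand, is_5bit, crbit=None):
    """Return (CR Field number, name of the bit to test).

    operand: BT (0..31) for 5-bit ops, BF (0..7) for 3-bit ops.
    crbit:   2-bit selector from the RM, needed for 3-bit ops only
             (ordering is an assumption of this sketch).
    """
    if is_5bit:
        if not 0 <= operand < 32:
            raise ValueError("5-bit operand out of range")
        # upper 3 bits pick the CR Field, lower 2 bits pick the bit
        field, bit = operand >> 2, operand & 0b11
    else:
        if not 0 <= operand < 8:
            raise ValueError("3-bit operand out of range")
        if crbit is None:
            raise ValueError("3-bit mode needs CRbit from the RM")
        field, bit = operand, crbit
    return field, CR_BIT_NAMES[bit]

five_bit_case = select_tested_bit(6, is_5bit=True)            # e.g. BT=6
three_bit_case = select_tested_bit(3, is_5bit=False, crbit=3)  # e.g. BF=3
```

With `BT=6` the operand itself already names field 1, bit EQ; with `BF=3`
the field is known but the bit is not, which is why the extra selector is
required.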

Just as with SVP64 [[sv/branches]] there is the option to truncate
VL to include the element being tested (`VLi=1`) or to exclude it
(`VLi=0`).
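
The effect of `VLi` on truncation can be sketched as below. This is a
simplified model under stated assumptions, not the specification's
algorithm: `ffirst_vl` is a hypothetical helper, `tested_bits` stands in
for the already-selected CR bit of each element's result, predication is
ignored, and a test is taken to fail when the tested bit differs from
`inv` (mirroring the polarity used in the predicate-result pseudocode on
this page).

```python
def ffirst_vl(tested_bits, inv, vli, vl):
    """Return the truncated VL after data-dependent fail-first.

    tested_bits: one already-selected CR bit per element (hypothetical input)
    inv:         value the tested bit is compared against
    vli:         1 to include the failing element in VL, 0 to exclude it
    vl:          incoming Vector Length
    """
    for i in range(vl):
        if tested_bits[i] != inv:  # assumed polarity: mismatch means "fail"
            return i + 1 if vli else i
    return vl  # no element failed: VL is unchanged

bits = [1, 1, 0, 1]
vl_excl = ffirst_vl(bits, inv=1, vli=0, vl=4)  # failing element excluded
vl_incl = ffirst_vl(bits, inv=1, vli=1, vl=4)  # failing element included
```

Element 2 is the first to fail, so VL becomes 2 when the failing element
is excluded and 3 when it is included.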

# Predicate-result Condition Register operations

These are again slightly different compared to SVP64 arithmetic
pred-result (described in [[svp64/appendix]]). The reason is that,
again, for arithmetic operations the production of a CR Field when
Rc=1 is a *co-result* accompanying the main arithmetic result, whereas
for CR-based operations the CR Field (referred to by a 3-bit
v3.0B base operand from e.g. `mfcr`) or CR bit (referred to by a 5-bit
operand from e.g. `crnor`) *is* itself the explicit and sole result of
the operation.

Therefore, logically, Predicate-result needs to be adapted to
test the actual result of the CR-based instruction (rather than
test the co-resultant CR when Rc=1, as is done for Arithmetic SVP64).

    for i in range(VL):
        # predication test, skip all masked out elements.
        # skips when sz=0
        if sz == 0 and predicate_masked_out(i):
            continue
        if predicate_masked_out(i):
            if 5bit mode:
                # only one bit of CR to update
                result = SNZ
            else:
                # four copies of SNZ
                result = SNZ || SNZ || SNZ || SNZ
        else:
            # result is to go into CR. may be a 4-bit CR Field
            # (3-bit mode) or just a single bit (5-bit mode)
            result = op(...)
            if 5bit mode:
                # if this CR op has 5-bit CR result operands
                # the single bit result is what must be tested
                to_test = result
            else:
                # if however this is a 3-bit CR *field* result
                # then the bit to be tested must be selected
                to_test = result[CRbit]
            # now test CR, similar to branch
            if to_test != inv:
                continue # test failed: cancel store
        # result optionally stored
        update_CR(result)
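
A runnable Python rendering of the pseudocode above may help when
experimenting. It is a sketch under several assumptions rather than a
reference model: `pred_result` and all of its parameter names are
hypothetical, `op` is a caller-supplied function returning either a single
bit (5-bit mode) or a list of four bits in LT, GT, EQ, SO order (3-bit
mode), and the CR file is modelled as a plain list with one slot per
element.

```python
def pred_result(vl, pred_mask, sz, snz, inv, five_bit, crbit, op, cr):
    """Apply a predicate-result CR operation element by element.

    pred_mask: 1 = element enabled, 0 = masked out
    cr:        list standing in for the destination CR storage (modified
               in place and also returned)
    """
    for i in range(vl):
        masked_out = not pred_mask[i]
        if masked_out and not sz:
            continue                      # sz=0: skip masked-out elements
        if masked_out:
            # sz=1: write SNZ (one bit, or four copies for a CR Field)
            result = snz if five_bit else [snz] * 4
        else:
            result = op(i)
            # 5-bit mode tests the single result bit; 3-bit mode must
            # select one bit of the 4-bit field using CRbit
            to_test = result if five_bit else result[crbit]
            if to_test != inv:
                continue                  # test failed: cancel store
        cr[i] = result
    return cr

# hypothetical per-element results of a 3-bit (CR Field) operation
fields = [[0, 0, 1, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 1, 1, 1]]
out = pred_result(vl=4, pred_mask=[1, 0, 1, 1], sz=1, snz=1, inv=1,
                  five_bit=False, crbit=2, op=lambda i: fields[i],
                  cr=[None] * 4)
```

In this run element 1 is masked out and receives four copies of SNZ,
element 2 fails the EQ test and is left untouched (`None`), and elements
0 and 3 pass the test and are stored.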