# Condition Register SVP64 Operations

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>

Condition Register Fields are only 4 bits wide: this presents some
interesting conceptual challenges for SVP64, particularly with respect to
element width, which is clearly meaningless for a 4-bit collation of
Conditions (LT, GT, EQ, SO). Likewise, arithmetic saturation (an important
part of Arithmetic SVP64) has no meaning. Additionally, extra modes are
required that only make sense for Vectorised CR Operations. Consequently
an alternative Mode Format is required.

This alternative mapping **only** applies to instructions that **only**
produce a CR Field or CR bit as their sole result. This section
**does not** apply to instructions which primarily produce arithmetic
results and, as an aside, also produce a corresponding
CR Field (such as when Rc=1).
Instructions that involve Rc=1 are definitively arithmetic in nature:
the corresponding Condition Register Field can be considered to
be a "co-result". Thus, if the arithmetic result is Vectorised, so
is the CR Field "co-result", which puts both firmly out of scope for
this section.

Examples of v3.0B instructions to which this section does
apply are `mfcr` (3-bit operands) and `crnor` (5-bit operands).
Examples to which this section does **not** apply include
`fadds.` and `subf.`, which both produce arithmetic results
(and a CR Field co-result).

Other modes are still applicable and include:

* **Data-dependent fail-first**.
  Useful to truncate VL based on
  analysis of a Condition Register result bit.
* **Scalar and parallel reduction**.
  Reduction is useful for turning a Vector of Condition Register
  Fields into one single Condition Register Field.
* **Predicate-result**.
  Equivalent to Python's `filter`, in that only elements which pass a
  test will actually be modified. This is in effect the same as ANDing
  the Condition Test with the destination predicate mask (hence the
  name, "predicate-result").
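
The predicate-result description above amounts to an element-wise AND of the destination predicate mask with the per-element Condition Test. The Python sketch below models only that equivalence. The function names and the list-of-bits representation are illustrative assumptions and are not part of SVP64.

```python
from typing import List


def effective_mask(pred_mask: List[int], cond_test: List[int]) -> List[int]:
    """AND the destination predicate mask with the per-element condition test."""
    return [p & c for p, c in zip(pred_mask, cond_test)]


def apply_pred_result(dest: List[int], results: List[int],
                      pred_mask: List[int], cond_test: List[int]) -> List[int]:
    """Return a copy of dest where only elements passing both tests are updated."""
    out = list(dest)
    for i, enabled in enumerate(effective_mask(pred_mask, cond_test)):
        if enabled:
            out[i] = results[i]
    return out


if __name__ == "__main__":
    dest = [0, 0, 0, 0]
    results = [1, 1, 1, 1]
    pred = [1, 1, 0, 1]   # element 2 masked out by the predicate
    cond = [1, 0, 1, 1]   # element 1 fails the condition test
    print(apply_pred_result(dest, results, pred, cond))  # [1, 0, 0, 1]
```

As with `filter`, an element survives only if it passes the test. Here a further condition applies: it must also be enabled by the predicate.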

SVP64 RM `MODE` (includes `ELWIDTH` bits) for CR-based operations:

| 4 | 5 | 19-20 | 21 | 22 23 | description |
| - | - | ----- | --- |---------|----------------- |
| / | / | 00 | 0 | dz sz | normal mode |
| / | / | 00 | 1 | 0 RG | scalar reduce mode (mapreduce), SUBVL=1 |
| / | / | 00 | 1 | 1 CRM | parallel reduce mode (mapreduce), SUBVL=1 |
| / | / | 00 | 1 | SVM RG | subvector reduce mode, SUBVL>1 |
|dz |VLi| 01 | inv | CR-bit | Ffirst 3-bit mode |
|sz |VLi| 01 | inv | dz Rc1 | Ffirst 5-bit mode |
| / | / | 10 | / | / / | RESERVED |
|sz |SNZ| 11 | inv | CR-bit | 3-bit pred-result CR sel |
| / |SNZ| 11 | inv | dz sz | 5-bit pred-result z/nonz |

Fields:

TODO
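
The sketch below is a reading aid for the table. It classifies the CR-op MODE bits into the table rows and names the remaining fields of each row. It makes the following assumptions:

* Bit 19 is the more significant bit of the 19-20 pair.
* Bit 22 is the more significant bit of the `CR-bit` pair.
* SUBVL and the operand width (3-bit or 5-bit) are passed in as inputs, because the table's row selection depends on them.

`decode_cr_mode` is a hypothetical helper. It is not a decoder from any existing tool.

```python
from typing import Dict, Tuple


def decode_cr_mode(rm: Dict[int, int], five_bit: bool,
                   subvl: int = 1) -> Tuple[str, Dict[str, int]]:
    """Classify CR-op MODE bits per the table; rm maps RM bit number -> 0/1.

    Hypothetical helper. Assumes bit 19 is the MSB of the 19-20 pair and
    bit 22 is the MSB of the 2-bit CR-bit selector.
    """
    mode = (rm[19] << 1) | rm[20]
    crbit = (rm[22] << 1) | rm[23]
    if mode == 0b00:
        if rm[21] == 0:
            return "normal", {"dz": rm[22], "sz": rm[23]}
        if subvl > 1:
            return "subvector reduce", {"SVM": rm[22], "RG": rm[23]}
        if rm[22] == 0:
            return "scalar reduce", {"RG": rm[23]}
        return "parallel reduce", {"CRM": rm[23]}
    if mode == 0b01:
        if five_bit:
            return "ffirst 5-bit", {"sz": rm[4], "VLi": rm[5], "inv": rm[21],
                                    "dz": rm[22], "Rc1": rm[23]}
        return "ffirst 3-bit", {"dz": rm[4], "VLi": rm[5], "inv": rm[21],
                                "CRbit": crbit}
    if mode == 0b10:
        return "RESERVED", {}
    if five_bit:
        return "pred-result 5-bit", {"SNZ": rm[5], "inv": rm[21],
                                     "dz": rm[22], "sz": rm[23]}
    return "pred-result 3-bit", {"sz": rm[4], "SNZ": rm[5], "inv": rm[21],
                                 "CRbit": crbit}


if __name__ == "__main__":
    rm = {4: 0, 5: 1, 19: 0, 20: 1, 21: 1, 22: 1, 23: 0}
    print(decode_cr_mode(rm, five_bit=False))
    # ('ffirst 3-bit', {'dz': 0, 'VLi': 1, 'inv': 1, 'CRbit': 2})
```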

# Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if a Condition Test
fails then VL (Vector Length) is truncated at that point. In the case
of Arithmetic SVP64 Operations, the Condition Register Field generated
from Rc=1 is tested. With CR-based operations, however, the CR result
being tested is the one produced by the operation itself.
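
The following is a minimal Python model of that principle for a CR-based operation. The value being tested is the instruction's own result, not an Rc=1 co-result. The `op` and `test` callables and the crand-like example are illustrative assumptions. Predicate masks and the `VLi` bit from the MODE table are deliberately left out.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def cr_fail_first(inputs: Sequence[T], op: Callable[[T], int],
                  test: Callable[[int], bool]) -> Tuple[int, List[int]]:
    """Apply a CR-producing op per element; truncate VL at the first failed test.

    op stands in for a CR-based instruction: its return value *is* the CR
    result, so the Condition Test is applied to it directly (no Rc=1
    co-result). Predication and the VLi option are not modelled.
    """
    results: List[int] = []
    for i, x in enumerate(inputs):
        cr = op(x)
        if not test(cr):
            return i, results      # new VL: elements 0..i-1 survive
        results.append(cr)
    return len(inputs), results    # no failure: VL unchanged


if __name__ == "__main__":
    # crand-like element op on pairs of CR bits; test passes when the bit is 1
    pairs = [(1, 1), (1, 1), (1, 0), (1, 1)]
    vl, kept = cr_fail_first(pairs, lambda p: p[0] & p[1], lambda bit: bit == 1)
    print(vl, kept)  # 2 [1, 1]
```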

Data-dependent SVP64 Vectorised Operations involving the creation or
modification of a CR can require an extra two bits, which are not available
in the compact space of the SVP64 RM `MODE` Field. Because element-width
overrides are meaningless for CR Fields, the `ELWIDTH` field can be
used for these alternative purposes instead.

Condition Register based operations such as `sv.mfcr` and `sv.crand` can thus
be made more flexible. The rules that apply in this section
also apply to future CR-based instructions.

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
  whole 32-bit CR)
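
In the base Power ISA, a 5-bit CR bit operand consists of two parts. Its upper 3 bits are a CR Field number. Its low 2 bits select a bit within that Field (0=LT, 1=GT, 2=EQ, 3=SO). These two selector bits are exactly what a 3-bit operand lacks. The short Python sketch below makes the relationship explicit. The helper names are hypothetical.

```python
from typing import Tuple

# Bit order within a CR Field, most significant first (Power ISA).
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")


def split_cr_bit_operand(bt: int) -> Tuple[int, str]:
    """Split a 5-bit CR bit operand (e.g. BT of crand) into (CR Field, bit name)."""
    if not 0 <= bt <= 31:
        raise ValueError("a 5-bit CR operand must be in the range 0..31")
    return bt >> 2, CR_BIT_NAMES[bt & 0b11]


def join_cr_bit_operand(bf: int, crbit: int) -> int:
    """Combine a 3-bit CR Field number (e.g. BF of mcrf) with a 2-bit bit selector."""
    if not (0 <= bf <= 7 and 0 <= crbit <= 3):
        raise ValueError("bf must be 0..7 and crbit 0..3")
    return (bf << 2) | crbit


if __name__ == "__main__":
    print(split_cr_bit_operand(6))     # (1, 'EQ'): CR Field 1, EQ bit
    print(join_cr_bit_operand(1, 2))   # 6
```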

Examining these two types, the difference may be considered to be
that the 5-bit variant provides additional information about which
CR Field bit (LT, GT, EQ, SO) is to be operated on by the instruction.
Thus, logically, we may set the following rules:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used, i.e. the bit of the CR Field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used, the 3-bit variant
  must be used, as it provides the missing `CRbit`
  that selects which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO).

The 3-bit CR variant needs the additional CR-bit field because the
3-bit CR Field operand of the base Power ISA v3.0B instruction selects
only a whole CR Field. It lacks the two bits that select a bit
within that Field. These two bits (selecting LT, GT, EQ or SO) must
therefore be provided in another way.

Examples of each type:

* crand, cror, crnor. These are all 5-bit (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`.
* mcrf. This has only 3-bit operands (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used. With
  `CRbit` coming from SVP64 RM bits 22-23, the bit
  of BF to be tested is identified.
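
The rules and examples above reduce to one selection step:

* In 5-bit mode, the bit just written is tested.
* In 3-bit mode, `CRbit` picks one bit of the 4-bit Field.

The minimal Python sketch below assumes a 4-bit CR Field is represented as a list `[LT, GT, EQ, SO]`, so that the `CRbit` value indexes it directly. The helper name is hypothetical.

```python
from typing import Optional, Sequence, Union

CRResult = Union[int, Sequence[int]]


def bit_to_test(result: CRResult, five_bit: bool,
                crbit: Optional[int] = None) -> int:
    """Select the CR bit that data-dependent fail-first tests (hypothetical helper).

    5-bit ops (crand, cror, crnor): result is the single bit written to BT.
    3-bit ops (mcrf): result is a 4-bit CR Field [LT, GT, EQ, SO] and
    crbit (SVP64 RM bits 22-23) selects which bit of it is tested.
    """
    if five_bit:
        return int(result)
    if crbit is None:
        raise ValueError("3-bit CR ops need CRbit from SVP64 RM bits 22-23")
    return result[crbit]


if __name__ == "__main__":
    print(bit_to_test(1, five_bit=True))                       # 1
    print(bit_to_test([0, 0, 1, 0], five_bit=False, crbit=2))  # 1 (EQ)
```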

# Predicate-result Condition Register operations

These are again slightly different compared to SVP64 arithmetic
pred-result (described in [[svp64/appendix]]). The reason is that,
again, for arithmetic operations the production of a CR Field when
Rc=1 is a *co-result* accompanying the main arithmetic result, whereas
for CR-based operations the CR Field (referred to by a 3-bit
v3.0B base operand from e.g. `mfcr`) or CR bit (referred to by a 5-bit
operand from e.g. `crnor`) *is* itself the explicit and sole result of
the operation.

Therefore, logically, Predicate-result needs to be adapted to
test the actual result of the CR-based instruction, rather than
the co-resultant CR produced when Rc=1.

    for i in range(VL):
        # predication test: when sz=0, masked-out elements
        # are skipped entirely
        if sz == 0 and predicate_masked_out(i):
            continue
        if predicate_masked_out(i):
            # sz=1: SNZ is substituted for the result
            if 5bit mode:
                # only one bit of CR to update
                result = SNZ
            else:
                # four copies of SNZ ("||" is concatenation)
                result = SNZ || SNZ || SNZ || SNZ
        else:
            # result is to go into CR. may be a 4-bit CR Field
            # (3-bit mode) or just a single bit (5-bit mode)
            result = op(...)
        if 5bit mode:
            # if this CR op has 5-bit CR result operands
            # the single bit result is what must be tested
            to_test = result
        else:
            # if however this is a 3-bit CR *field* result
            # then the bit to be tested must be selected
            to_test = result[CRbit]
        # now test CR, similar to branch
        if to_test != inv:
            continue # test failed: cancel store
        # result optionally stored
        update_CR(result)
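
The pseudocode can be turned into a small runnable Python model for experimentation. This is a sketch under the following assumptions:

* `op(i)` stands in for the CR-based instruction.
* The CR destination is a plain Python list. Each element is one int bit in 5-bit mode, or a `[LT, GT, EQ, SO]` list in 3-bit mode.
* The Condition Test is applied to both the computed result and the SNZ substitute.
* A test fails when `to_test != inv`, as written in the pseudocode.

The function and parameter names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def cr_pred_result(dest: List, vl: int, op: Callable[[int], object],
                   pred: Sequence[int], sz: int, snz: int, inv: int,
                   five_bit: bool, crbit: Optional[int] = None) -> List:
    """Model of CR pred-result (hypothetical helper): returns an updated copy of dest."""
    out = list(dest)
    for i in range(vl):
        masked_out = not pred[i]
        if sz == 0 and masked_out:
            continue                          # sz=0: element skipped entirely
        if masked_out:
            # sz=1: SNZ substituted (one bit, or four copies for a CR Field)
            result = snz if five_bit else [snz] * 4
        else:
            result = op(i)                    # the CR op's own result
        # 5-bit: test the single result bit; 3-bit: CRbit selects the bit
        to_test = result if five_bit else result[crbit]
        if to_test != inv:
            continue                          # test failed: cancel store
        out[i] = result                       # test passed: result stored
    return out


if __name__ == "__main__":
    # 5-bit mode: element 1 fails the test, element 2 is masked out (sz=0)
    bits = [1, 0, 1, 1]
    print(cr_pred_result([-1] * 4, 4, lambda i: bits[i], pred=[1, 1, 0, 1],
                         sz=0, snz=0, inv=1, five_bit=True))
    # [1, -1, -1, 1]
```

In the usage example, `-1` is used only as a sentinel for "element left unchanged".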