# Condition Register SVP64 Operations

**DRAFT STATUS**

Links:

* <https://bugs.libre-soc.org/show_bug.cgi?id=687>
* <https://bugs.libre-soc.org/show_bug.cgi?id=936> write on failfirst
* [[svp64]]
* [[sv/branches]]
* [[sv/cr_int_predication]]
* [[openpower/isa/sprset]]
* [[openpower/isa/condition]]
* [[openpower/isa/comparefixed]]

Condition Register Fields are only 4 bits wide: this presents some
interesting conceptual challenges for SVP64, which was designed
primarily for vectors of arithmetic and logical operations. However,
if predicates may be bits of CR Fields, it makes sense to extend
Simple-V to cover CR Operations. In particular, the CR Fields produced
by Vectorised Rc=1 may be processed by Vectorised CR Operations, whose
results may in turn become Predicate Masks for yet more Vector
operations, like so:

```
    sv.cmpi/ew=8 *B,*ra,0    # compare bytes against zero
    sv.cmpi/ew=8 *B2,*ra,10  # and against newline
    sv.cror PM.EQ,B.EQ,B2.EQ # OR compares to create mask
    sv.stb/sm=EQ    ...      # store only where mask EQ is set
```
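
The data flow of this sequence can be sketched in Python. This is only an illustrative model of the EQ bit of each CR Field: the helper names (`cmpi_eq`, `cror`, `predicated_store`) are hypothetical, and the newline value (ASCII 10) is an assumption, none of which are part of SVP64 itself.

```python
NEWLINE = 10  # assumption: ASCII line feed


def cmpi_eq(values, imm):
    """Return the EQ bit of a per-element compare against an immediate."""
    return [1 if v == imm else 0 for v in values]


def cror(a_bits, b_bits):
    """Element-wise OR of two vectors of CR bits."""
    return [a | b for a, b in zip(a_bits, b_bits)]


def predicated_store(memory, values, mask):
    """Store values[i] into memory[i] only where mask[i] is set."""
    for i, (value, bit) in enumerate(zip(values, mask)):
        if bit:
            memory[i] = value
    return memory


data = list(b"ab\ncd\x00")
b_eq = cmpi_eq(data, 0)          # models sv.cmpi *B,*ra,0
b2_eq = cmpi_eq(data, NEWLINE)   # models the compare against newline
pm_eq = cror(b_eq, b2_eq)        # models sv.cror PM.EQ,B.EQ,B2.EQ
stored = predicated_store([None] * len(data), data, pm_eq)
```

Here `pm_eq` plays the role of the Predicate Mask: only the positions whose byte matched one of the two compares are written by the final predicated store.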

Element width however is clearly meaningless for a 4-bit collation of
Conditions (LT, GT, EQ, SO). Likewise, arithmetic saturation (an important
part of Arithmetic SVP64) has no meaning. An alternative Mode Format is
required, and given that elwidths are meaningless for CR Fields the bits
in SVP64 `RM` may be used for other purposes.

This alternative mapping **only** applies to instructions that **only**
reference a CR Field or CR bit as the sole exclusive result. This section
**does not** apply to instructions which primarily produce arithmetic
results that also, as an aside, produce a corresponding CR Field (such as
when Rc=1).  Instructions that involve Rc=1 are definitively arithmetic
in nature, where the corresponding Condition Register Field can be
considered to be a "co-result". Such CR Field "co-result" arithmetic
operations are firmly out of scope for this section, being covered fully
by [[sv/normal]].

* Examples of v3.0B instructions to which this section does
  apply include
  - `mcrf` and `cmpi` (3 bit operands) and
  - `crnor` and `crand` (5 bit operands).
* Examples to which this section does **not** apply include
  `fadds.` and `subf.` which both produce arithmetic results
  (and a CR Field co-result).
* `mtcr` is considered [[openpower/sv/normal]] because it refers
  to the entire 32-bit Condition Register rather than to CR Fields.

The CR Mode Format still applies to `sv.cmpi` because despite
taking a GPR as input, the output from the Base Scalar v3.0B `cmpi`
instruction is purely to a Condition Register Field.

Other modes are still applicable and include:

* **Data-dependent fail-first**.
  Useful to truncate VL based on analysis of a Condition Register result bit.
* **Reduction**.
  Reduction is useful for analysing a Vector of Condition Register Fields
  and reducing it to one single Condition Register Field.

Predicate-result does not make any sense for CR operations. In
Arithmetic SVP64 with Rc=1 a co-result (a CR Field) is created, and
testing that co-result allows the decision to be made to store or not
store the main result. For CR Ops the CR Field result *is* the main
result, so there is nothing separate to test.

## Format

SVP64 RM `MODE` (includes `ELWIDTH_SRC` bits) for CR-based operations:

| 6  | 7   | 19-20 | 21  | 22-23  | description                                    |
|----|-----|-------|-----|--------|------------------------------------------------|
| /  | /   | 0 RG  | 0   | dz sz  | simple mode                                    |
| /  | /   | 0 RG  | 1   | dz sz  | scalar reduce mode (mapreduce)                 |
| zz | SNZ | 1 VLI | inv | CR-bit | Ffirst 3-bit mode                              |
| /  | SNZ | 1 VLI | inv | dz sz  | Ffirst 5-bit mode (implies CR-bit from result) |

Fields:

* **sz / dz**  if predication is enabled, will put zeros into the dest
  (or as src in the case of twin pred) when the predicate bit is zero.
  Otherwise the element is ignored or skipped, depending on context.
* **zz** sets both sz and dz equal to this flag
* **SNZ** In fail-first mode, on the bit being tested, when sz=1 and
  SNZ=1 a value "1" is put in place of "0".
* **inv CR-bit** just as in branches (BO) these bits allow testing of
  a CR bit and whether it is set (inv=0) or unset (inv=1)
* **RG** inverts the Vector Loop order (VL-1 downto 0) rather
  than the normal 0..VL-1
* **SVM** sets "subvector" reduce mode
* **VLi** VL inclusive: in fail-first mode, the truncation of
  VL *includes* the current element at the failure point rather
  than excludes it from the count.
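
One way to read the table is as a small decoder. The following Python sketch is not a normative decoder: the function name `decode_cr_mode` is hypothetical, each argument is simply the value of the correspondingly numbered RM bit, and it assumes that within bits 22-23 `dz` is bit 22, `sz` is bit 23, and bit 22 is the more significant bit of `CR-bit`.

```python
def decode_cr_mode(bit6, bit7, bit19, bit20, bit21, bit22, bit23,
                   three_bit_operand):
    """Decode the CR-ops MODE table into a dict of named fields.

    Each bitN argument is 0 or 1. three_bit_operand says whether the
    instruction has a 3-bit (CR Field) result operand rather than a
    5-bit (CR bit) one, which selects between the two Ffirst rows.
    """
    if bit19 == 0:
        # Rows 1 and 2: bit 21 selects simple or scalar reduce mode.
        mode = "reduce" if bit21 else "simple"
        return {"mode": mode, "RG": bit20, "dz": bit22, "sz": bit23}
    # Rows 3 and 4: data-dependent fail-first.
    fields = {"SNZ": bit7, "VLI": bit20, "inv": bit21}
    if three_bit_operand:
        # Assumption: bit 22 is the more significant bit of CR-bit.
        fields.update(mode="ffirst3", zz=bit6, CRbit=(bit22 << 1) | bit23)
    else:
        fields.update(mode="ffirst5", dz=bit22, sz=bit23)
    return fields
```

Note that the same RM bits decode differently depending on the operand width of the instruction, which is why the decoder needs `three_bit_operand` as an input in addition to the bits themselves.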

## Data-dependent fail-first on CR operations

The principle of data-dependent fail-first is that if, during the course
of sequentially evaluating each element's Condition Test, one such test
is encountered which fails, then VL (Vector Length) is truncated (set)
at that point. In the case of Arithmetic SVP64 Operations the Condition
Register Field generated from Rc=1 is used as the basis for the truncation
decision.  However with CR-based operations the CR Field result to be
tested is provided *by the operation itself*.

Data-dependent SVP64 Vectorised Operations involving the creation
or modification of a CR can require an extra two bits, which are not
available in the compact space of the SVP64 RM `MODE` Field. With the
concept of element width overrides being meaningless for CR Fields it
is possible to use the `ELWIDTH` field for alternative purposes.

Condition Register based operations such as `sv.mfcr` and `sv.crand`
can thus be made more flexible.  However the rules that apply in this
section also apply to future CR-based instructions.

There are two primary types of CR operations:

* Those which have a 3-bit operand field (referring to a CR Field)
* Those which have a 5-bit operand (referring to a bit within the
  whole 32-bit CR)

Examining these two types it is observed that the difference may
be considered to be that the 5-bit variant *already* provides the
prerequisite information about which CR Field bit (LT, GT, EQ, SO) is
to be operated on by the instruction.  Thus, logically, we may set the
following rules:

* When a 5-bit CR Result field is used in an instruction, the
  5-bit variant of Data-Dependent Fail-First
  must be used, i.e. the bit of the CR field to be tested is
  the one that has just been modified (created) by the operation.
* When a 3-bit CR Result field is used the 3-bit variant
  must be used, providing as it does the missing `CRbit` field
  in order to select which CR Field bit of the result shall
  be tested (LT, GT, EQ, SO)

The reason why the 3-bit CR variant needs the additional CR-bit field
is that the 3-bit CR Field operand from the base Power ISA v3.0B
operation does not contain the two CR Field Selector bits. Thus,
these two bits (to select LT, GT, EQ or SO) must be provided in
another way.
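
The relationship between the two operand widths can be sketched as follows. The sketch covers base scalar operand numbering only (it ignores the SVP64 EXTRA2/3 extension of register numbers), the function names are hypothetical, and the mapping of the two-bit `CRbit` value onto LT, GT, EQ, SO in that order is an assumption made by analogy with the low two bits of a 5-bit operand.

```python
CR_BIT_NAMES = ("LT", "GT", "EQ", "SO")


def split_5bit_operand(bt):
    """A 5-bit CR operand already names both the CR Field and the bit."""
    return bt >> 2, CR_BIT_NAMES[bt & 0b11]


def select_with_crbit(bf, crbit):
    """A 3-bit operand names only the CR Field, so the bit to test
    must come from elsewhere (here, a 2-bit CRbit value)."""
    return bf, CR_BIT_NAMES[crbit]
```

For instance `split_5bit_operand(10)` identifies CR Field 2 and its EQ bit with no further information, whereas `select_with_crbit(2, 2)` needs the extra two bits to reach the same conclusion.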

Examples of each type:

* crand, cror, crnor. These all are 5-bit (BA, BB, BT). The bit
  to be tested against `inv` is the one selected by `BT`
* mcrf. This has only 3-bit (BF, BFA). In order to select the
  bit to be tested, the alternative encoding must be used.
  With `CRbit` coming from the SVP64 RM bits 22-23 the bit
  of BF to be tested is identified.

Just as with SVP64 [[sv/branches]] there is the option to truncate
VL to include the element being tested (`VLi=1`) or to exclude it
(`VLi=0`).

Also exactly as with [[sv/normal]] fail-first, VL cannot, unlike
[[sv/ldst]], be set to an arbitrary value.  Deterministic behaviour
is *required*.
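
The truncation rule can be modelled with a short Python sketch. It computes only the new VL and deliberately does not model whether the failing element's result is written (see the bug report on write-on-failfirst linked above). The function name, the tuple representation of a CR Field, and the index order (LT, GT, EQ, SO) are assumptions for illustration.

```python
def ffirst_truncate_vl(fields, vl, crbit, inv, vli):
    """Return the new VL after data-dependent fail-first.

    fields: per-element CR Field results as (LT, GT, EQ, SO) tuples.
    crbit:  index 0..3 of the bit to test within each field.
    inv:    0 tests that the bit is set, 1 tests that it is unset.
    vli:    1 includes the failing element in the new VL, 0 excludes it.
    """
    for i in range(vl):
        bit = fields[i][crbit]
        test_passed = (bit == 1) if inv == 0 else (bit == 0)
        if not test_passed:
            # Truncate at the first failure, in element order.
            return i + 1 if vli else i
    return vl  # no test failed: VL is unchanged
```

Because elements are examined strictly in order and the first failure alone determines the result, the new VL is fully deterministic for a given set of results.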

## Reduction and Iteration

As described in the [[svp64/appendix]], SVP64 Horizontal
Reduction is a deterministic schedule on top of base Scalar v3.0
operations. The same rules apply to CR Operations, i.e. programmers
must follow certain conventions in order for an *end result* of a
reduction to be achieved.  Unlike other Vector ISAs *there are no explicit
reduction opcodes* in SVP64: Schedules however achieve the same effect.

Due to these conventions, reduction is only meaningful on operations
such as `crand` and `cror` because these have Condition Register Fields
as both input and output.  Meaningless operations are not prohibited
because the cost in hardware of doing so is prohibitive, but neither
are they `UNDEFINED`. Implementations are still required to execute them
but are at liberty to optimise out any operations that would ultimately
be overwritten, as long as Strict Program Order is still observable by
the programmer.

Also bear in mind that 'Reverse Gear' may be enabled, which can be
used in combination with overlapping CR operations to iteratively
accumulate results.  Issuing a `sv.crand` operation for example with
`BA` differing from `BB` by one Condition Register Field would result
in a cascade effect, where the first-encountered zero CR Field bit would
set the result to zero, and also all subsequent CR Field elements
thereafter:

```
    # sv.crand/mr/rg CR4.gt.v, CR5.gt.v, CR4.gt.v
    for i in VL-1 downto 0 # reverse gear
         CR.field[4+i].gt &= CR.field[5+i].gt
```
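
The cascade can be reproduced with a direct Python transcription of the loop above. The list `bits` holds the one selected bit of each CR Field, indexed by CR Field number; the function name and default operand numbers are taken from the example and are otherwise hypothetical.

```python
def crand_mr_rg(bits, vl, bt=4, ba=5, bb=4):
    """Model the reverse-gear loop: bits[bt+i] = bits[ba+i] & bits[bb+i]."""
    bits = list(bits)  # work on a copy
    for i in reversed(range(vl)):  # reverse gear: VL-1 downto 0
        bits[bt + i] = bits[ba + i] & bits[bb + i]
    return bits


# CR Fields 4..10 hold 1,1,1,0,1,1,1 (fields 0..3 are unused here).
before = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1]
after = crand_mr_rg(before, vl=6)
```

Walking downwards from the highest field, the zero in CR Field 7 is the first zero encountered, and every field processed after it (6, 5 and 4) is forced to zero, while fields 8 to 10 keep their value of 1.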

`sv.crxor` with reduction would be particularly useful for parity
calculation for example, although there are many other ways in which the
same calculation could be carried out, such as using `parityw`
after transferring a vector of CR Fields
to a GPR using crweird operations.

Implementations are free to optimise these reductions in any way
they see fit, as long as the end-result is compatible with Strict Program
Order being observed, and Interrupt latency is not adversely impacted.
Good examples include `sv.cror/mr` which is a cumulative ORing of
a Vector of CR Field bits, and consequently an easy target for
parallelising.
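
The reason a cumulative OR is easy to parallelise can be illustrated in Python. The sketch assumes the reduction folds a vector of CR bits into a single scalar accumulator bit; the function names are hypothetical, and the tree schedule is only one possible optimisation, not something this specification mandates.

```python
from itertools import product


def cror_mr_sequential(acc, bits):
    """Strict Program Order: fold each element into the accumulator in turn."""
    for bit in bits:
        acc |= bit
    return acc


def cror_mr_tree(acc, bits):
    """Pairwise (tree) OR, as a parallel implementation might compute it."""
    level = [acc] + list(bits)
    while len(level) > 1:
        pairs = [level[i] | level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            pairs.append(level[-1])  # odd element carries to the next level
        level = pairs
    return level[0]


def crxor_mr(acc, bits):
    """Cumulative XOR of a vector of bits, i.e. its parity."""
    for bit in bits:
        acc ^= bit
    return acc


# Exhaustively confirm both schedules agree for every 5-element bit vector.
agree = all(
    cror_mr_sequential(0, bits) == cror_mr_tree(0, bits)
    for bits in product((0, 1), repeat=5)
)
```

Since OR is associative and commutative, the tree schedule produces the same end result as Strict Program Order, which is what permits the optimisation. The same argument applies to the XOR used for parity.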

## Unusual and quirky CR operations

**cmp and other compare ops**

`cmp` and `cmpi` etc take GPRs as sources and create a CR Field as a result.

```
    cmpli BF,L,RA,UI
    cmpeqb BF,RA,RB
```

With `ELWIDTH` applying to the source GPR operands this is perfectly fine.
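
As an illustration of why the source element width matters, here is a rough Python model of a signed `cmpi` on one element. It assumes the source element is truncated to the element width and sign-extended before comparison, and that the SO bit is simply copied from a supplied value; the function names are hypothetical and this is not a substitute for the Power ISA pseudocode.

```python
def sign_extend(value, width):
    """Interpret the low `width` bits of value as a signed integer."""
    value &= (1 << width) - 1
    if value >> (width - 1):
        return value - (1 << width)
    return value


def cmpi_field(ra, si, elwidth=64, xer_so=0):
    """Return the CR Field (LT, GT, EQ, SO) of a signed compare of ra with si."""
    a = sign_extend(ra, elwidth)
    return (int(a < si), int(a > si), int(a == si), xer_so)
```

The same source bits can give different CR Field results at different widths: `cmpi_field(0xFF, 0, elwidth=8)` treats the byte as -1 and sets LT, whereas at the default 64-bit width the value is 255 and GT is set.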

**crweird operations**

There are 4 weird CR-GPR operations and one reasonable one in
the [[cr_int_predication]] set:

* crrweird
* mtcrweird
* crweirder
* crweird
* mcrfm - reasonably normal and referring to CR Fields for src and dest.

The "weird" operations have a non-standard behaviour, being able to
treat *individual bits* of a GPR effectively as elements.  They are
expected to be Micro-coded by most Hardware implementations.

## Effectively-separate Vector and Scalar Condition Register file

As mentioned in the introduction on [[sv/svp64]] some prohibitions
are made on instructions involving Condition Registers that allow
implementors to actually consider the Scalar CR (fields CR0-CR7)
as a completely separate register file from the Vector CRs
(fields CR8-CR127).

The complications arise for existing Hardware implementations
due to Power ISA not having had "Conditional Execution" added.
Adding entirely new pipelines and a new Vector CR Register file
is a much easier proposition to consider.

The prohibitions utilise the CR Field numbers implicitly to
split out Vectorised CR operations to be considered completely
separate and distinct from Scalar CR operations *even though
they both use the same binary encoding*.  This does in turn
mean that at the Decode Phase it becomes necessary to examine
not only the operation (`sv.crand`, `sv.cmp`) but also
the CR Field numbers as well as whether, in the EXTRA2/3 Mode
bits, the operands are Vectorised.

A future version of Power ISA, where SVP64Single is proposed,
would in fact introduce "Conditional Execution", including
for VSX, at which point this prohibition becomes moot as
Predication would be required to be added into the existing
Scalar (and PackedSIMD VSX) side of existing Power ISA
implementations.


--------

[[!tag standards]]

\newpage{}