openpower/sv/cr_int_predication.mdwn

[[!tag standards]]

# New instructions for CR/INT predication

**DRAFT STATUS**

See:

* main bugreport for crweirds
  <https://bugs.libre-soc.org/show_bug.cgi?id=533>
* <https://bugs.libre-soc.org/show_bug.cgi?id=527>
* <https://bugs.libre-soc.org/show_bug.cgi?id=569>
* <https://bugs.libre-soc.org/show_bug.cgi?id=558#c47>

Rationale:

Condition Registers are conceptually perfect for use as predicate masks. The only problem is that typical Vector ISAs have quite comprehensive mask-based instructions: set-before-first, popcount and much more.  In fact many Vector ISAs can use Vectors *as* masks, so the entire Vector ISA is available for creating masks.  This is not practical for SV, given the strategy of leveraging pre-existing Scalar instructions in a minimalist way.

With the scalar OpenPOWER v3.0B ISA already having popcnt, cntlz and other instructions normally seen in Vector Mask operations, it makes sense to allow *both* scalar integers *and* CR-Vectors to be predicate masks.  That in turn means that much more comprehensive interaction between CRs and scalar Integers is required: because the CR Predication Modes designate CR *Fields*
(not CR bits) as Predicate Elements, fast transfers between CR *Fields* and
the Integer Register File are needed.

The opportunity is therefore also taken to augment CR logical arithmetic, using a mask-based paradigm that takes into consideration multiple bits of each CR Field (lt/gt/eq/so).  By contrast,
v3.0B Scalar CR instructions (crand, crxor) only allow a single-bit calculation.

Also, strangely, there is no v3.0 instruction for a masked move of CR Fields
(`mcrf` copies all four bits unconditionally),
so that is corrected here with `mcrfm`. The opportunity is taken
to allow inversion of CR Field bits when copied.

Basic concept:

* CR-based instructions that perform simple AND/OR/XOR on any of the four bits
  of a CR field to create a single bit value (0/1) in an integer register
* Inverse of the same, taking a single bit value (0/1) from an integer
  register to selectively target any of the four bits of a given CR Field
* CR-to-CR version of the same, allowing multiple bits to be AND/OR/XORed
  in one hit.
* Optional Vectorisation of the same when SVP64 is implemented

Purpose:

* To provide a merged version of what is currently a multi-instruction
  sequence of CR operations (crand, cror, crxor) with mfcr and mtcrf,
  reducing instruction count.
* To provide a vectorised version of the same, suitable for advanced
  predication

Side-effects:

* mtcrweird when RA=0 is a means to set or clear arbitrary CR bits
  using immediates embedded within the instruction.

(Twin) Predication interactions:

* INT twin predication with zeroing is a way to copy an integer into CRs without necessarily needing the INT register (RA).  If RA is used, it is effectively ANDed (or negate-and-ANDed) with the INT Predicate.
* CR twin predication with zeroing is likewise a way to interact with the incoming integer.

This gets particularly powerful if data-dependent predication is also enabled.  Further explanation is below.

# Bit ordering.

Please see [[svp64/appendix]] regarding CR bit ordering and for
the definition of `CR{n}`.

# Instruction form and pseudocode

**DRAFT** Instruction format (use of MAJOR 19 not approved by
OPF ISA WG):

|0-5|6-10 |11|12-15|16-18|19-20|21-25  |26-30  |31|name      |
|---|---- |--|-----|-----|-----|-----  |-----  |--|----      |
|19 |RT   |  |mask |BFA  |     |XO[0:4]|XO[5:9]|/ |          |
|19 |RT   |M |mask |BFA  | 0 0 |XO[0:4]|0 mode |Rc|crrweird  |
|19 |RA   |M |mask |BF   | 0 1 |XO[0:4]|0 mode |/ |mtcrweird |
|19 |BT   |M |mask |BFA  | 1 0 |XO[0:4]|0 mode |/ |crweirder |
|19 |BF //|M |mask |BFA  | 1 1 |XO[0:4]|0 mode |0 |crweird   |
|19 |BF //|M |mask |BFA  | 1 1 |XO[0:4]|0 mode |1 |mcrfm     |

**crrweird**

`mode` is 4 bits and is encoded in XO (the low four bits of the 26-30 field).

bit 19=0, bit 20=0

    crrweird: RT, BFA, M, mask.mode

    creg = CR{BFA}
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    result = n0|n1|n2|n3 if M else n0&n1&n2&n3
    RT[63] = result # MSB0 numbering, 63 is LSB
    if Rc:
        CR0 = analyse(RT)

When used with SVP64 Prefixing this is a [[openpower/sv/normal]] SVP64 type operation and as
such can use Rc=1 and RC1 Data-dependent Mode capability.

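The crrweird pseudocode above can be modelled in a few lines of Python. This is only an illustrative sketch: the function name `crrweird_bit` and the convention of passing the CR Field, `mask` and `mode` as 4-bit integers are assumptions made here, not part of the specification. Because `mask`, `mode` and the CR Field are all indexed the same way, the four per-bit tests collapse into a single bitwise expression.

```python
def crrweird_bit(creg: int, M: int, mask: int, mode: int) -> int:
    """Hypothetical model: return the bit that crrweird writes to the LSB of RT.

    creg, mask and mode are 4-bit integers sharing one bit ordering.
    """
    # bit i of n is mask[i] & (mode[i] == creg[i])
    n = mask & ~(mode ^ creg) & 0xF
    if M:
        return int(n != 0)    # n0|n1|n2|n3
    return int(n == 0xF)      # n0&n1&n2&n3


# Test a single CR bit: mask selects it, mode gives the value to match.
print(crrweird_bit(0b1000, M=1, mask=0b1000, mode=0b1000))  # 1
print(crrweird_bit(0b0000, M=1, mask=0b1000, mode=0b1000))  # 0
```

Taken literally, the AND form (M=0) of the pseudocode can only return 1 when `mask` is 0b1111, because every masked-out bit contributes a zero term.
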
**mtcrweird**

bit 19=0, bit 20=1

    mtcrweird: BF, RA, M, mask.mode

    reg = (RA|0)
    lsb = reg[63] # MSB0 numbering
    n0 = mask[0] & (mode[0] == lsb)
    n1 = mask[1] & (mode[1] == lsb)
    n2 = mask[2] & (mode[2] == lsb)
    n3 = mask[3] & (mode[3] == lsb)
    result = n0 || n1 || n2 || n3 # concatenate into a 4-bit value
    if M:
        result |= CR{BF} & ~mask
    CR{BF} = result

Note that when M=1 this operation is a Read-Modify-Write on the CR Field
BF. Masked-out bits of the 4-bit CR Field BF will not be changed when
M=1. Correspondingly when M=0 this operation is an overwrite: no read
of BF is required because the masked-out bits of the BF CR Field are
set to zero.

When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64 type operation that has
3-bit Data-dependent and 3-bit Predicate-result capability
(BF is 3 bits)

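The difference between overwrite (M=0) and Read-Modify-Write (M=1) is easiest to see by running the pseudocode. The following is a minimal sketch in Python, assuming 4-bit integers for the CR Field, `mask` and `mode`; the function name and argument layout are illustrative only, and `ra` stands for the value of `(RA|0)`.

```python
def mtcrweird(cr_bf: int, ra: int, M: int, mask: int, mode: int) -> int:
    """Hypothetical model: return the new 4-bit value of CR Field BF."""
    lsb = ra & 1                          # reg[63] in MSB0 numbering
    lsb4 = 0xF if lsb else 0x0            # lsb replicated across 4 bits
    result = mask & ~(mode ^ lsb4) & 0xF  # n0 || n1 || n2 || n3
    if M:
        result |= cr_bf & ~mask & 0xF     # keep masked-out bits of BF
    return result


old = 0b1010
print(bin(mtcrweird(old, ra=0, M=0, mask=0b0011, mode=0b0000)))  # 0b11
print(bin(mtcrweird(old, ra=0, M=1, mask=0b0011, mode=0b0000)))  # 0b1011
```

With M=0 the two masked-out bits of the old value are cleared; with M=1 they survive.
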
**crweird**

bit 19=1, bit 20=1, Rc=0 (as per the encoding table above)

    crweird: BF, BFA, M, mask.mode

    creg = CR{BFA}
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    result = n0 || n1 || n2 || n3 # concatenate into a 4-bit value
    if M:
        result |= CR{BF} & ~mask
    CR{BF} = result

Note that when M=1 this operation is a Read-Modify-Write on the CR Field
BF. Masked-out bits of the 4-bit CR Field BF will not be changed when
M=1. Correspondingly when M=0 this operation is an overwrite: no read
of BF is required because the masked-out bits of the BF CR Field are
set to zero.

When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64 type operation that has
3-bit Data-dependent and 3-bit Predicate-result capability
(BF is 3 bits)

**mcrfm** - Move CR Field, masked.

bit 19=1, bit 20=1, Rc=1 (as per the encoding table above)

    mcrfm: BF, BFA, M, mask.mode

    result = mask & CR{BFA}
    if M:
        result |= CR{BF} & ~mask
    result ^= mode
    CR{BF} = result

Note that when M=1 this operation is a Read-Modify-Write on the CR Field
BF: masked-out bits of the 4-bit CR Field BF are carried through from BF.
Correspondingly when M=0 this operation is an overwrite: no read
of BF is required because the masked-out bits of the BF CR Field are
set to zero. In both cases `mode` is XORed in last, so any bit set in
`mode` inverts the corresponding bit of the result.

When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64 type operation that has
3-bit Data-dependent and 3-bit Predicate-result capability
(BF is 3 bits)

*Programmer's note: `mode` being XORed onto the result provides considerable
flexibility. Individual bits of BFA may be copied inverted to BF by
ensuring that `mask` and `mode` have the same bit set.  Also, when M=0, individual
bits in BF may be set to 1 by ensuring that the required bit of `mask`
is set to zero and the same bit in `mode` is set to 1.*

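The two tricks in the Programmer's note can be checked with a small Python model of the mcrfm pseudocode. This is a sketch under the assumption that CR Fields, `mask` and `mode` are passed as 4-bit integers; the function name and signature are illustrative, not part of the specification.

```python
def mcrfm(cr_bf: int, cr_bfa: int, M: int, mask: int, mode: int) -> int:
    """Hypothetical model: return the new 4-bit value of CR Field BF."""
    result = mask & cr_bfa
    if M:
        result |= cr_bf & ~mask & 0xF   # carry masked-out bits of BF through
    result ^= mode                      # mode is applied last
    return result & 0xF


# Inverted copy: mask and mode select the same bit; M=1 keeps the rest of BF.
print(bin(mcrfm(0b1011, 0b0000, M=1, mask=0b0100, mode=0b0100)))  # 0b1111
# Force a bit to 1: M=0, bit clear in mask, set in mode.
print(bin(mcrfm(0b0000, 0b0110, M=0, mask=0b0011, mode=0b1000)))  # 0b1010
```

In the first call the selected bit of BFA is 0, so it arrives in BF as 1 while the other three bits of BF are untouched. In the second call the top bit is masked out and therefore zero before `mode` flips it to 1.
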
**crweirder**

bit 19=1, bit 20=0 (as per the encoding table above)

    crweirder: BT, BFA, M, mask.mode

    creg = CR{BFA}
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    BF = BT[2:4] # select CR
    bit = BT[0:1] # select bit of CR
    result = n0|n1|n2|n3 if M else n0&n1&n2&n3
    CR{BF}[bit] = result

When used with SVP64 Prefixing this is a [[openpower/sv/cr_ops]] SVP64 type operation that has
5-bit Data-dependent and 5-bit Predicate-result capability
(BT is 5 bits)

**Example Pseudo-ops:**

    mtcri BF, mode    mtcrweird BF, r0, 0, 0b1111.~mode
    mtcrset BF, mask  mtcrweird BF, r0, 1, mask.0b0000
    mtcrclr BF, mask  mtcrweird BF, r0, 1, mask.0b1111

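The three pseudo-op expansions can be verified exhaustively against the mtcrweird pseudocode. The sketch below is an illustrative Python model, assuming 4-bit integers for the CR Field and immediates and a source value of zero for the RA=0 case; the helper names mirror the pseudo-ops but are otherwise hypothetical.

```python
def mtcrweird(cr_bf, ra, M, mask, mode):
    """Hypothetical 4-bit model of the mtcrweird pseudocode."""
    lsb4 = 0xF if (ra & 1) else 0x0
    result = mask & ~(mode ^ lsb4) & 0xF
    if M:
        result |= cr_bf & ~mask & 0xF
    return result


def mtcri(cr_bf, mode):
    return mtcrweird(cr_bf, 0, 0, 0b1111, ~mode & 0xF)


def mtcrset(cr_bf, mask):
    return mtcrweird(cr_bf, 0, 1, mask, 0b0000)


def mtcrclr(cr_bf, mask):
    return mtcrweird(cr_bf, 0, 1, mask, 0b1111)


# Exhaustive check over every 4-bit CR Field value and immediate.
for cr in range(16):
    for imm in range(16):
        assert mtcri(cr, imm) == imm
        assert mtcrset(cr, imm) == cr | imm
        assert mtcrclr(cr, imm) == cr & ~imm & 0xF
print("pseudo-ops behave as load-immediate, set-bits and clear-bits")
```

The inverted `mode` in `mtcri` is needed because, with a zero source, a result bit is set only where `mode` is 0.
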
# Vectorised versions

The name "weird" refers to a minor violation of SV rules when it comes to deriving the Vectorised versions of these instructions.

Normally the progression of the SV for-loop would move on to the next register.
Instead, however, in the scalar case these instructions **remain in the same register** and insert or transfer between **bits** of the scalar integer source or destination.

A further useful violation of the normal SV Elwidth override rules allows
for packing (or unpacking) of multiple CR test results into
(or out of) an Integer Element. Note
that the CR (source operand) elwidth field is utilised to determine the
bit-packing size (1/2/4/8, with remaining bits within the Integer element
set to zero) whilst the INT (dest operand) elwidth field still sets
the Integer element size as usual (8/16/32/default).

    crrweird: RT, BB, M, mask.mode

    # BB is the source CR Field operand (BFA in the scalar form)
    for i in range(VL):
        if BB.isvec:
            creg = CR{BB+i}
        else:
            creg = CR{BB}
        n0 = mask[0] & (mode[0] == creg[0])
        n1 = mask[1] & (mode[1] == creg[1])
        n2 = mask[2] & (mode[2] == creg[2])
        n3 = mask[3] & (mode[3] == creg[3])
        # OR or AND to a single bit
        result = n0|n1|n2|n3 if M else n0&n1&n2&n3
        if RT.isvec:
            # TODO: RT.elwidth override to be also added here
            # note, yes, really, the CR's elwidth field determines
            # the bit-packing into the INT!
            if BB.elwidth == 0b00:
                # pack 1 result into 64-bit registers
                iregs[RT+i][0..62] = 0
                iregs[RT+i][63] = result # sets LSB to result
            if BB.elwidth == 0b01:
                # pack 2 results sequentially into INT registers
                iregs[RT+i//2][0..61] = 0
                iregs[RT+i//2][63-(i%2)] = result
            if BB.elwidth == 0b10:
                # pack 4 results sequentially into INT registers
                iregs[RT+i//4][0..59] = 0
                iregs[RT+i//4][63-(i%4)] = result
            if BB.elwidth == 0b11:
                # pack 8 results sequentially into INT registers
                iregs[RT+i//8][0..55] = 0
                iregs[RT+i//8][63-(i%8)] = result
        else:
            iregs[RT][63-i] = result # results also in scalar INT

Note that:

* in the scalar case the CR-Vector assessment
  is stored bit-wise starting at the LSB of the
  destination scalar INT
* in the INT-vector case the results are packed into LSBs
  of the INT Elements, the packing arrangement depending on both
  elwidth override settings.

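The packing arrangement can be illustrated with a short Python model of the vectorised pseudocode. It is a sketch only: it assumes a vector source (`BB.isvec`), 64-bit INT registers as in the pseudocode, and a hypothetical `pack` argument (1/2/4/8) standing in for the decoded CR elwidth field. The INT elwidth override, marked TODO in the pseudocode, is not modelled.

```python
def crrweird_bit(creg, M, mask, mode):
    """Single-bit crrweird test on 4-bit integers (hypothetical helper)."""
    n = mask & ~(mode ^ creg) & 0xF
    return int(n != 0) if M else int(n == 0xF)


def sv_crrweird(crs, M, mask, mode, rt_isvec=True, pack=1):
    """Hypothetical model.  crs: CR Field values CR{BB}..CR{BB+VL-1}.

    Returns the INT registers written, starting at RT, as integers.
    """
    nregs = (len(crs) + pack - 1) // pack if rt_isvec else 1
    iregs = [0] * nregs
    for i, creg in enumerate(crs):
        result = crrweird_bit(creg, M, mask, mode)
        if rt_isvec:
            reg, bit = i // pack, i % pack
        else:
            reg, bit = 0, i
        # MSB0 bit (63 - bit) of a 64-bit register is LSB0 bit `bit`
        iregs[reg] |= result << bit
    return iregs


crs = [0b1000, 0b0000, 0b1000, 0b1000, 0b0000]
print(sv_crrweird(crs, 1, 0b1000, 0b1000, pack=1))          # [1, 0, 1, 1, 0]
print(sv_crrweird(crs, 1, 0b1000, 0b1000, pack=4))          # [13, 0]
print(sv_crrweird(crs, 1, 0b1000, 0b1000, rt_isvec=False))  # [13]
```

With `pack=4` the first four test results land in the low four bits of the first INT register, and the fifth starts a new register. The scalar-destination case puts all five results in one register.
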
# v3.1 setbc instructions

There are additional setb conditional instructions in v3.1 (p129):

    RT = (CR[BI] == 1) ? 1 : 0

Variants also negate the test, and also return -1 / 0.  These are similar to crweird but do not serve the same purpose.  Most notable is that crweird acts on CR Fields, whereas setbc tests a single bit selected from the entire 32-bit CR.

# Predication Examples

Take the following example:

    r10 = 0b00010
    sv.mtcrweird/dm=r10/dz cr8.v, 0, 0b0011.0000

Here, RA is zero, so the source input is zero. The destination
is CR Field 8, and the destination predicate mask is applied
to the first two elements.  Destination predicate zeroing is
enabled, and the destination predicate is only set in the 2nd bit.
mask is 0b0011, mode is all zeros.

Let us first consider what should go into element 0 (CR Field 8):

* The destination predicate bit is zero, and zeroing is enabled.
* Therefore, what is in the source is irrelevant: the result must
  be zero.
* All four bits of CR Field 8 are therefore set to zero.

Now the second element, CR Field 9 (CR9):

* The 2nd bit (bit 1 in LSB0 numbering) of the destination predicate,
  r10, is 1. Therefore the computation of the result is relevant.
* RA is zero, therefore its 2nd bit is zero.  mask is 0b0011 and mode is 0b0000.
* When calculating n0 thru n3 (LSB0 ordering) we get n0=1, n1=1, n2=0, n3=0.
* Therefore, CR9 is set (using LSB0 ordering) to 0b0011, i.e. to mask.

It should be clear that this instruction uses bits of the integer
predicate to decide whether to set CR Fields to `(mask & ~mode)`
or to zero.  Thus, in effect, it is the integer predicate that has
been copied into the CR Fields.

By using twin predication, zeroing, and inversion (sm=~r3, dm=r10) for example, it becomes possible to combine two Integers together in
order to set bits in CR Fields.
Likewise there are dozens of ways that CR Predicates can be used, on the
same sv.mtcrweird instruction.
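
The walk-through above can be reproduced with a small Python model. It is a sketch resting on several assumptions that the example does not state: VL is 2, M is 0 (overwrite), all values use LSB0 ordering, and element i of a scalar source reads bit i of that source. The function names are illustrative only.

```python
def mtcrweird(cr_bf, lsb, M, mask, mode):
    """Hypothetical 4-bit model of the scalar mtcrweird pseudocode."""
    lsb4 = 0xF if lsb else 0x0
    result = mask & ~(mode ^ lsb4) & 0xF
    if M:
        result |= cr_bf & ~mask & 0xF
    return result


def sv_mtcrweird_dz(crs, ra, M, mask, mode, dm):
    """Hypothetical model of sv.mtcrweird/dm=.../dz with a scalar source.

    crs holds the destination CR Fields (CR8, CR9 in the example);
    VL is taken to be len(crs).
    """
    out = []
    for i, cr in enumerate(crs):
        if (dm >> i) & 1:
            # assumption: element i reads bit i (LSB0) of the scalar source
            out.append(mtcrweird(cr, (ra >> i) & 1, M, mask, mode))
        else:
            out.append(0)  # destination zeroing
    return out


cr8_cr9 = sv_mtcrweird_dz([0b1111, 0b1111], ra=0, M=0,
                          mask=0b0011, mode=0b0000, dm=0b00010)
print([bin(c) for c in cr8_cr9])  # ['0b0', '0b11']
```

Under these assumptions CR8 is zeroed and CR9 receives `mask`, matching the element-by-element reasoning above. Sweeping `dm` over all values shows each CR Field becoming either `(mask & ~mode)` or zero.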