[[!tag standards]]

# New instructions for CR/INT predication

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=533>
* <https://bugs.libre-soc.org/show_bug.cgi?id=527>
* <https://bugs.libre-soc.org/show_bug.cgi?id=569>
* <https://bugs.libre-soc.org/show_bug.cgi?id=558#c47>

Basic concept:

* CR-based instructions that perform simple AND/OR/XOR from all four bits
  of a CR to create a single bit value (0/1) in an integer register
* Inverse of the same, taking a single bit value (0/1) from an integer
  register to selectively target all four bits of a given CR
* CR-to-CR version of the same, allowing multiple bits to be AND/OR/XORed
  in one hit
* Vectorisation of the same

Purpose:

* To provide a merged version of what is currently a multi-instruction
  sequence of CR operations (crand, cror, crxor) with mfcr and mtcrf,
  reducing instruction count
* To provide a vectorised version of the same, suitable for advanced
  predication

Side-effects:

* mtcrweird when RA=0 is a means to set or clear arbitrary CR bits from immediates

(Twin) Predication interactions:

* INT twin predication with zeroing is a way to copy an integer into CRs without necessarily needing the INT register (RA).  If RA is used, it is effectively ANDed (or negate-and-ANDed) with the INT Predicate
* CR twin predication with zeroing is likewise a way to interact with the incoming integer

This gets particularly powerful if data-dependent predication is also enabled.

# Bit ordering

IBM chose MSB0 for the OpenPOWER v3.0B specification.  This makes things slightly hair-raising.  Our initial desire is therefore to follow the logical progression from the defined behaviour of `mtcr` and `mfcr` etc.
In [[isa/sprset]] we see the pseudocode for `mtcrf` for example:

    mtcrf FXM,RS

    do n = 0 to 7
      if FXM[n] = 1 then
        CR[4*n+32:4*n+35] <- (RS)[4*n+32:4*n+35]

This copies (according to the FXM mask) MSB0-numbered bits 32-35 of the source Integer register `RS` into `CR0`, these bits of `RS` being arithmetic bits 31 down to 28.  Unfortunately, even when not Vectorised, this inserts CR numbering inversions on each batch of 8 CRs, massively complicating matters.  Predication when using CRs would have to be morphed to this (unacceptably complex) behaviour:

    for i in range(VL):
        n = (7-(i%8)) | (i & ~0x7) # total mess
        CRpredicate = CR{n}        # select CR0, CR1, ....
        predbit = CRpredicate[offs]  # select lt..so bit

This is nowhere close to matching the straightforward, obvious case:

    for i in range(VL):
        if INTpredmode:
            predbit = (r3)[63-i] # IBM MSB0 spec sigh
        else:
            CRpredicate = CR{i} # start at CR0, work up
            predbit = CRpredicate[offs]

In other words, unless we do something about this, when we transfer bits from an Integer Predicate into a Vector of CRs, our numbering of CRs, when enumerating them in a CR Vector, would be **CR7** CR6 CR5.... CR0 **CR15** CR14 CR13... CR8 **CR23** CR22 etc., **not** the more natural and obvious CR0 CR1 ... CR23.
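
The difference between the two element orderings can be seen by evaluating the index expression directly. The following is a minimal Python sketch, not part of the specification; the function names `mtcrf_style_cr_index` and `natural_cr_index` are hypothetical, chosen only for this illustration.

```python
def mtcrf_style_cr_index(i):
    """CR number selected for element i if predication followed mtcrf ordering."""
    return (7 - (i % 8)) | (i & ~0x7)


def natural_cr_index(i):
    """CR number selected for element i in the straightforward case."""
    return i


VL = 24  # example vector length covering three batches of 8 CRs
inverted = [mtcrf_style_cr_index(i) for i in range(VL)]
natural = [natural_cr_index(i) for i in range(VL)]

print("mtcrf-style:", inverted)  # each batch of 8 runs backwards
print("natural:    ", natural)
```

Both lists visit the same set of CRs. Only the order within each batch of 8 differs, which is exactly the inversion that the redefinition below avoids.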

Therefore the instructions below need to **redefine** the relationship so that CR numbers (CR0, CR1) sequentially match the arithmetically-ordered bits of Integer registers.  "Arithmetic" ordering is deduced from the fact that the instruction `addi r3, r0, 1` will result in the **LSB** (numbered 63 in IBM MSB0 order) of r3 being set to 1 and all other bits set to zero.  We therefore refer, below, to this LSB as "Arithmetic bit 0", and it is this bit which is defined as being the first bit used in Integer predication (on element 0).

Below is some pseudocode that, given a CR offset `offs` selecting one of the four bits of a CR field (`CR.lt` through to `CR.so`), will copy the INT predicate bits in the correct order into the first 8 CRs:

    do n = 0 to 7
        CR[4*n+32+offs] <- (RS)[63-n]

Assuming that `offs` is set to `CR.eq` this results in:

* Arithmetic bit 0 (the LSB, numbered 63 in IBM MSB0 terminology)
  of RS being inserted into CR0.eq
* Arithmetic bit 1 of RS being inserted into CR1.eq
* ...
* Arithmetic bit 7 of RS being inserted into CR7.eq
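
The copy loop above can be modelled in a few lines of Python. This is a sketch under stated assumptions: the CR is held as a 64-bit integer addressed with MSB0 bit numbers, the in-field offsets 0 to 3 are named `CR_LT` to `CR_SO` after the v3.0B field layout, and the helper names (`get_bit_msb0`, `set_bit_msb0`, `int_pred_to_crs`, `cr_field`) are hypothetical.

```python
CR_LT, CR_GT, CR_EQ, CR_SO = 0, 1, 2, 3  # bit offsets inside one CR field


def get_bit_msb0(value, bit, width=64):
    """Read one bit using MSB0 numbering (bit 0 is the most significant)."""
    return (value >> (width - 1 - bit)) & 1


def set_bit_msb0(value, bit, new, width=64):
    """Return value with the MSB0-numbered bit replaced by new."""
    shift = width - 1 - bit
    return (value & ~(1 << shift)) | ((new & 1) << shift)


def int_pred_to_crs(cr, rs, offs):
    """Model of: do n = 0 to 7: CR[4*n+32+offs] <- (RS)[63-n]."""
    for n in range(8):
        cr = set_bit_msb0(cr, 4 * n + 32 + offs, get_bit_msb0(rs, 63 - n))
    return cr


def cr_field(cr, n):
    """Extract the 4-bit field CRn, taken as MSB0 bits 32+4n to 35+4n."""
    return (cr >> (28 - 4 * n)) & 0xF


# Arithmetic bits 0 and 2 of the predicate are set,
# so CR0.eq and CR2.eq should end up set.
cr = int_pred_to_crs(0, 0b00000101, CR_EQ)
for n in range(8):
    print(f"CR{n} = {cr_field(cr, n):04b}")
```

Reading the printed fields left to right gives lt, gt, eq, so, which keeps the v3.0B ordering inside each field while the CR numbers follow the arithmetic bit order of the predicate.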

To clarify, then: the instructions below do **NOT** follow the IBM convention.  They follow the natural sequence CR0 CR1 instead.  However it is critically important to note that the offsets **in** a CR (`CR.eq` for example) continue to follow the v3.0B definition and convention.

# Instruction form and pseudocode

Note that `CR{n}` refers to `CR0` when `n=0` and consequently, for CR0-7, is defined, in v3.0B pseudocode, as:

    CR{n} = CR[32+n*4:35+n*4]

Instruction format:

    | 0-5 | 6-10  | 11 | 12-15 | 16-18 | 19-20 | 21-25   | 26-30   | 31 |
    | --- | ----  | -- | ----- | ----- | ----- | -----   | -----   | -- |
    | 19  | RT    |    | mask  | BB    |    /  | XO[0:4] | XO[5:9] | /  |
    | 19  | RT    | 0  | mask  | BB    |  0 /  | XO[0:4] | 0 mode  | /  |
    | 19  | RA    | 1  | mask  | BB    |  0 /  | XO[0:4] | 0 mode  | /  |
    | 19  | BT // | 0  | mask  | BB    |  1 /  | XO[0:4] | 0 mode  | /  |
    | 19  | BFT   | 1  | mask  | BB    |  1 /  | XO[0:4] | 0 mode  | /  |

mode is encoded in XO and is 4 bits

bit 11=0, bit 19=0

    crrweird: RT, BB, mask.mode

    creg = CR{BB}
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    RT[63] = n0|n1|n2|n3 # MSB0 numbering, 63 is LSB
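
As an illustration of how `mask` and `mode` combine, here is a small Python model of the `crrweird` pseudocode. It is a sketch only: the CR field, `mask` and `mode` are passed as 4-bit integers whose most significant bit is index 0 (MSB0), and the helper name `bits4` is hypothetical.

```python
def bits4(value):
    """Split a 4-bit value into a list, index 0 being the MSB (MSB0 order)."""
    return [(value >> (3 - i)) & 1 for i in range(4)]


def crrweird(creg, mask, mode):
    """Return the single bit that the pseudocode writes to RT[63]."""
    c, m, md = bits4(creg), bits4(mask), bits4(mode)
    # n_i is 1 only if bit i is selected by mask and creg[i] equals mode[i]
    n = [m[i] & int(md[i] == c[i]) for i in range(4)]
    return n[0] | n[1] | n[2] | n[3]


EQ_ONLY = 0b0010  # a CR field with only the eq bit set
LT_ONLY = 0b1000  # a CR field with only the lt bit set

# "is eq set?": select the eq bit, require it to be 1
print(crrweird(EQ_ONLY, mask=0b0010, mode=0b0010))
print(crrweird(LT_ONLY, mask=0b0010, mode=0b0010))
# "is eq clear?": select the eq bit, require it to be 0
print(crrweird(LT_ONLY, mask=0b0010, mode=0b0000))
# "is lt or gt set?": select both bits, require 1 in either
print(crrweird(LT_ONLY, mask=0b1100, mode=0b1100))
```

In this reading, `mask` picks which of the four bits take part and `mode` gives the value each selected bit is compared against, with the per-bit results ORed together.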

bit 11=1, bit 19=0

    mtcrweird: RA, BB, mask.mode

    reg = (RA|0)
    lsb = reg[63] # MSB0 numbering
    n0 = mask[0] & (mode[0] == lsb)
    n1 = mask[1] & (mode[1] == lsb)
    n2 = mask[2] & (mode[2] == lsb)
    n3 = mask[3] & (mode[3] == lsb)
    CR{BB} = n0 || n1 || n2 || n3
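
A Python model of the `mtcrweird` pseudocode may help when checking encodings by hand. This is a sketch, assuming `mask` and `mode` are 4-bit integers in MSB0 order and that the caller passes 0 as `ra_value` when RA=0, which stands in for `(RA|0)`. The function returns the new 4-bit value of `CR{BB}` rather than modelling the whole CR.

```python
def mtcrweird(ra_value, mask, mode):
    """Return the 4-bit value the pseudocode writes to CR{BB}."""
    lsb = ra_value & 1  # reg[63] in MSB0 numbering is the arithmetic LSB
    result = 0
    for i in range(4):
        mask_i = (mask >> (3 - i)) & 1
        mode_i = (mode >> (3 - i)) & 1
        n_i = mask_i & int(mode_i == lsb)
        result = (result << 1) | n_i  # n0 || n1 || n2 || n3 (concatenation)
    return result


# Copy the LSB of RA into all four bits of the target field
print(f"{mtcrweird(1, 0b1111, 0b1111):04b}")
print(f"{mtcrweird(0, 0b1111, 0b1111):04b}")
# A mixed mode gives a pattern when the LSB is 1 and its complement when 0
print(f"{mtcrweird(1, 0b1111, 0b1010):04b}")
print(f"{mtcrweird(0, 0b1111, 0b1010):04b}")
```

Note that, read literally, the pseudocode writes all four bits of `CR{BB}`: bits not selected by `mask` become 0 rather than being preserved.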

bit 11=0, bit 19=1

    crweird: BT, BB, mask.mode

    creg = CR{BB}
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    CR{BT} = n0 || n1 || n2 || n3

bit 11=1, bit 19=1

    crweirder: BFT, BB, mask.mode

    creg = CR{BB}
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    BF = BFT[2:4] # select CR
    bit = BFT[0:1] # select bit of CR
    CR{BF}[bit] = n0|n1|n2|n3
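
The two CR-to-CR forms can be compared side by side in a short Python sketch. Several things here are assumptions made for illustration: the eight CR fields are modelled as a list `crs` of 4-bit integers in MSB0 order, `BFT` is treated as a 5-bit MSB0 field so that `BFT[0:1]` is its top two bits and `BFT[2:4]` its bottom three, and the helper name `match_bits` is hypothetical.

```python
def match_bits(creg, mask, mode):
    """Per-bit results n0..n3: mask[i] & (mode[i] == creg[i])."""
    n = []
    for i in range(4):
        shift = 3 - i  # MSB0 index i within a 4-bit value
        mask_i = (mask >> shift) & 1
        mode_i = (mode >> shift) & 1
        creg_i = (creg >> shift) & 1
        n.append(mask_i & int(mode_i == creg_i))
    return n


def crweird(crs, bt, bb, mask, mode):
    """CR{BT} = n0 || n1 || n2 || n3 (all four bits of the target written)."""
    n = match_bits(crs[bb], mask, mode)
    out = list(crs)
    out[bt] = (n[0] << 3) | (n[1] << 2) | (n[2] << 1) | n[3]
    return out


def crweirder(crs, bft, bb, mask, mode):
    """CR{BF}[bit] = n0|n1|n2|n3 (one bit of the target written)."""
    n = match_bits(crs[bb], mask, mode)
    bf = bft & 0b111         # BFT[2:4]: which CR field
    bit = (bft >> 3) & 0b11  # BFT[0:1]: which bit inside that field
    result = n[0] | n[1] | n[2] | n[3]
    shift = 3 - bit
    out = list(crs)
    out[bf] = (out[bf] & ~(1 << shift) & 0xF) | (result << shift)
    return out


crs = [0b0010, 0, 0, 0, 0, 0, 0, 0]  # CR0 has only eq set
print([f"{f:04b}" for f in crweird(crs, 1, 0, 0b0011, 0b0000)])
print([f"{f:04b}" for f in crweirder(crs, (0b01 << 3) | 3, 0, 0b0010, 0b0010)])
```

The contrast is that `crweird` replaces a whole target field with the four per-bit results, while `crweirder` reduces them to one bit and leaves the other three bits of the target field untouched.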

Pseudo-op:

    mtcri BB, mode    mtcrweird r0, BB, 0b1111.~mode
    mtcrset BB, mask  mtcrweird r0, BB, mask.0b0000
    mtcrclr BB, mask  mtcrweird r0, BB, mask.0b1111
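
The pseudo-op expansions can be checked against the `mtcrweird` pseudocode with a short Python sketch. It assumes RA=0 supplies a zero operand, models only the 4-bit value written to `CR{BB}`, and uses hypothetical function names that mirror the pseudo-op mnemonics.

```python
def mtcrweird(ra_value, mask, mode):
    """Return the 4-bit value the mtcrweird pseudocode writes to CR{BB}."""
    lsb = ra_value & 1
    result = 0
    for i in range(4):
        mask_i = (mask >> (3 - i)) & 1
        mode_i = (mode >> (3 - i)) & 1
        result = (result << 1) | (mask_i & int(mode_i == lsb))
    return result


def mtcri(mode):
    """mtcri BB, mode  ->  mtcrweird r0, BB, 0b1111.~mode"""
    return mtcrweird(0, 0b1111, ~mode & 0xF)


def mtcrset(mask):
    """mtcrset BB, mask  ->  mtcrweird r0, BB, mask.0b0000"""
    return mtcrweird(0, mask, 0b0000)


def mtcrclr(mask):
    """mtcrclr BB, mask  ->  mtcrweird r0, BB, mask.0b1111"""
    return mtcrweird(0, mask, 0b1111)


for value in (0b0000, 0b0101, 0b1010, 0b1111):
    print(f"{value:04b}: mtcri={mtcri(value):04b} "
          f"mtcrset={mtcrset(value):04b} mtcrclr={mtcrclr(value):04b}")
```

Under this literal reading, `mtcri` loads the immediate into the field, `mtcrset` leaves the field equal to `mask`, and `mtcrclr` leaves the field all zero for any `mask`, because every bit of `CR{BB}` is written.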