# New instructions for CR/INT predication

See:

Basic concept:

* CR-based instructions that perform a simple AND/OR/XOR across all four bits of a CR to create a single bit value (0/1) in an integer register
* The inverse of the same: taking a single bit value (0/1) from an integer register to selectively target all four bits of a given CR
* A CR-to-CR version of the same, allowing multiple bits to be AND/OR/XORed in one hit
* Vectorisation of the same

Purpose:

* To provide a merged version of what is currently a multi-instruction sequence of CR operations (crand, cror, crxor) combined with mfcr and mtcrf, reducing instruction count.
* To provide a vectorised version of the same, suitable for advanced predication.

Side-effects:

* mtcrweird with RA=0 is a means to set or clear arbitrary CR bits from immediates.

(Twin) Predication interactions:

* INT twin predication with zeroing is a way to copy an integer into CRs without necessarily needing the INT register (RA). If RA is used, it is effectively ANDed (or negate-and-ANDed) with the INT Predicate.
* CR twin predication with zeroing is likewise a way to interact with the incoming integer.

This gets particularly powerful if data-dependent predication is also enabled.

# Bit ordering

IBM chose MSB0 for the OpenPOWER v3.0B specification. This makes things slightly hair-raising. Our model therefore initially follows the logical progression from the defined behaviour of `mtcr`, `mfcr`, etc. In [[isa/sprset]] we see, for example, the pseudocode for `mtcrf`:

    mtcrf FXM,RS
    do n = 0 to 7
      if FXM[n] = 1 then
        CR[4*n+32:4*n+35] <- (RS)[4*n+32:4*n+35]

This copies (according to the FXM mask) MSB0-numbered bits 32-35 of the source Integer register `RS` into `CR0` (and `mfcr` performs the reverse placement, putting `CR0` into bits 32-35 of its target register). In arithmetic (LSB0) order these bits are the 31st down to the 28th. Unfortunately, even when not Vectorised, this introduces a CR numbering inversion within each batch of 8 CRs, massively complicating matters.
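The `mtcrf` pseudocode can be modelled directly to make the inversion concrete. This is a minimal sketch, assuming a 64-entry list `cr` indexed in MSB0 order where only entries 32-63 (CR0-CR7) are used, as in the v3.0B pseudocode; the helper names `msb0_bit`, `mtcrf`, `cr_field` and `cr_field_of_arith_bit` are hypothetical.

```python
# Sketch: the v3.0B `mtcrf` mapping, modelled with MSB0 bit numbering.
# `cr` is a 64-entry list indexed MSB0; only entries 32..63 (CR0..CR7) are used.

def msb0_bit(value: int, n: int, width: int = 64) -> int:
    """Bit n of `value`, numbered MSB0 (bit 0 is the most significant)."""
    return (value >> (width - 1 - n)) & 1

def mtcrf(fxm: int, rs: int, cr: list[int]) -> list[int]:
    """do n = 0 to 7: if FXM[n] = 1 then CR[4*n+32:4*n+35] <- (RS)[4*n+32:4*n+35]"""
    out = list(cr)
    for n in range(8):
        if msb0_bit(fxm, n, width=8):                # FXM is an 8-bit MSB0 field
            for b in range(4 * n + 32, 4 * n + 36):  # inclusive range 4n+32..4n+35
                out[b] = msb0_bit(rs, b)
    return out

def cr_field(cr: list[int], n: int) -> list[int]:
    """The four bits [lt, gt, eq, so] of CRn."""
    return cr[4 * n + 32 : 4 * n + 36]

def cr_field_of_arith_bit(i: int) -> int:
    """Which CR field `mtcrf` fills from arithmetic (LSB0) bit i of RS, 0 <= i <= 31."""
    return ((63 - i) - 32) // 4

# Arithmetic bits 0, 4, 8, ... of RS land in CR7, CR6, CR5, ...
print([cr_field_of_arith_bit(4 * k) for k in range(8)])  # [7, 6, 5, 4, 3, 2, 1, 0]
```

Enumerating the arithmetic nibbles of `RS` in increasing order visits CR7 first and CR0 last, which is exactly the reversal that has to be undone.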
In other words, unless we do something about this, when we transfer bits from an Integer Predicate into a Vector of CRs, our numbering of CRs, when enumerating them in a CR Vector, would be CR7 CR6 CR5 ... CR0 **CR15** CR14 CR13 ... CR8 **CR23** CR22 etc., **not** CR0 CR1 ... CR23.

Therefore the instructions below need to **redefine** the relationship so that CR numbers (CR0, CR1, ...) sequentially match the arithmetically-ordered bits of Integer registers. The term "arithmetic" follows from the fact that the instruction `addi r3, r0, 1` sets the **LSB** (numbered 63 in IBM MSB0 order) of r3 to 1 and all other bits to zero. We therefore refer, below, to this LSB as "Arithmetic bit 0", and it is this bit that is defined as the first bit used in predication (on element 0).

Below is some pseudocode that, given a CR offset `offs` selecting one of the bits `CR.eq` through `CR.ov`, will copy the INT predicate bits in the correct order into the first 8 CRs:

    do n = 0 to 7
      CR[4*n+32+offs] <- (RS)[63-n]

Assuming that `offs` is set to `CR.eq`, this results in:

* Arithmetic bit 0 (the LSB) of RS being inserted into CR0.eq
* Arithmetic bit 1 of RS being inserted into CR1.eq
* ...
* Arithmetic bit 7 of RS being inserted into CR7.eq

To clarify, then: all instructions below do **NOT** follow the IBM convention; they follow the natural sequence CR0 CR1 ... instead. However, it is critically important to note that the offsets **in** a CR (`CR.eq` for example) continue to follow the v3.0B definition and convention.
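The redefined mapping can be checked with a short Python model. This is a minimal sketch, assuming a 64-entry CR list indexed in MSB0 order (entries 32-63 hold CR0-CR7, as in the v3.0B pseudocode) and in-field offsets in the v3.0B order LT, GT, EQ, SO. `int_pred_to_crs` transcribes the `do n = 0 to 7` loop; `crs_to_int_pred` is a hypothetical inverse included only to show the round trip. All names are illustrative.

```python
# v3.0B bit offsets within a 4-bit CR field (field bit 0 is LT).
CR_LT, CR_GT, CR_EQ, CR_SO = 0, 1, 2, 3

def msb0_bit(value: int, n: int, width: int = 64) -> int:
    """Bit n of `value`, numbered MSB0 (bit 0 is the most significant)."""
    return (value >> (width - 1 - n)) & 1

def int_pred_to_crs(rs: int, cr: list[int], offs: int, count: int = 8) -> list[int]:
    """do n = 0 to 7: CR[4*n+32+offs] <- (RS)[63-n]"""
    out = list(cr)
    for n in range(count):
        # (RS)[63-n] in MSB0 numbering is Arithmetic bit n.
        out[4 * n + 32 + offs] = msb0_bit(rs, 63 - n)
    return out

def crs_to_int_pred(cr: list[int], offs: int, count: int = 8) -> int:
    """Hypothetical inverse: gather CRn.<offs> back into Arithmetic bit n."""
    return sum(cr[4 * n + 32 + offs] << n for n in range(count))

cr = int_pred_to_crs(0b101, [0] * 64, CR_EQ)
print([cr[4 * n + 32 + CR_EQ] for n in range(3)])  # [1, 0, 1]: CR0.eq, CR1.eq, CR2.eq
```

Because bit n of the predicate now lands in CRn, a CR Vector enumerates as CR0 CR1 CR2 ... in the same order as the integer's arithmetic bits, while the position within each field (`CR_EQ` here) keeps its v3.0B meaning.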
# Instruction form and pseudocode

| 0-5 | 6-10  | 11 | 12-15 | 16-18 | 19-20 | 21-25   | 26-30   | 31 |
| --- | ----- | -- | ----- | ----- | ----- | ------- | ------- | -- |
| 19  | RT    |    | mask  | BB    | /     | XO[0:4] | XO[5:9] | /  |
| 19  | RT    | 0  | mask  | BB    | 0 /   | XO[0:4] | 0 mode  | /  |
| 19  | RA    | 1  | mask  | BB    | 0 /   | XO[0:4] | 0 mode  | /  |
| 19  | BT // | 0  | mask  | BB    | 1 /   | XO[0:4] | 0 mode  | /  |
| 19  | BFT   | 1  | mask  | BB    | 1 /   | XO[0:4] | 0 mode  | /  |

`mode` is 4 bits wide and is encoded in XO.

bit 11=0, bit 19=0

    crrweird: RT, BB, mask.mode

    creg = CRfile[32+BB*4:35+BB*4]
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    RT[0] = n0|n1|n2|n3

bit 11=1, bit 19=0

    mtcrweird: RA, BB, mask.mode

    reg = (RA|0)
    n0 = mask[0] & (mode[0] == reg[0])
    n1 = mask[1] & (mode[1] == reg[0])
    n2 = mask[2] & (mode[2] == reg[0])
    n3 = mask[3] & (mode[3] == reg[0])
    CRfile[32+BB*4:35+BB*4] = n0 || n1 || n2 || n3

bit 11=0, bit 19=1

    crweird: BT, BB, mask.mode

    creg = CRfile[32+BB*4:35+BB*4]
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    CRfile[32+BT*4:35+BT*4] = n0 || n1 || n2 || n3

bit 11=1, bit 19=1

    crweirder: BFT, BB, mask.mode

    creg = CRfile[32+BB*4:35+BB*4]
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    CRfile[32+BFT] = n0|n1|n2|n3

Pseudo-ops:

    mtcri   BB, mode     mtcrweird r0, BB, 0b1111.~mode
    mtcrset BB, mask     mtcrweird r0, BB, mask.0b0000
    mtcrclr BB, mask     mtcrweird r0, BB, mask.0b1111
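The four forms and the pseudo-ops can be exercised with a small Python model. This is a hypothetical sketch, not a reference implementation: each CR field is a 4-element list `[lt, gt, eq, so]` (index 0 is `CR.lt`, matching the v3.0B in-field order), `mask` and `mode` are 4-element lists in the same order, and `RT[0]` / `reg[0]` in the pseudocode are read as Arithmetic bit 0 (the LSB), following the bit-ordering convention; that last reading is an assumption. All function names are illustrative.

```python
# Hypothetical model of crrweird / mtcrweird / crweird / crweirder.
# A CR file is a list of fields; each field is [lt, gt, eq, so] (v3.0B order).

def _bits4(x: int) -> list[int]:
    """0b1010 -> [1, 0, 1, 0]: first element is CR.lt, matching mask[0]/mode[0]."""
    return [(x >> (3 - i)) & 1 for i in range(4)]

def _terms(mask, mode, bits):
    """n_i = mask[i] & (mode[i] == bits[i]) for i = 0..3."""
    return [m & int(md == b) for m, md, b in zip(mask, mode, bits)]

def crrweird(crf, bb, mask, mode):
    """RT[0] = n0|n1|n2|n3, returned as 0 or 1 (Arithmetic bit 0, by assumption)."""
    return int(any(_terms(mask, mode, crf[bb])))

def mtcrweird(crf, ra_value, bb, mask, mode):
    """ra_value is (RA|0), i.e. 0 when RA=0; reg[0] is read as its LSB (assumption)."""
    reg0 = ra_value & 1
    out = [list(f) for f in crf]
    out[bb] = _terms(mask, mode, [reg0] * 4)
    return out

def crweird(crf, bt, bb, mask, mode):
    """CR field BT = n0 || n1 || n2 || n3, computed from CR field BB."""
    out = [list(f) for f in crf]
    out[bt] = _terms(mask, mode, crf[bb])
    return out

def crweirder(crf, bft, bb, mask, mode):
    """Single CR bit BFT (0..31, i.e. CRfile[32+BFT]) = n0|n1|n2|n3."""
    out = [list(f) for f in crf]
    out[bft // 4][bft % 4] = int(any(_terms(mask, mode, crf[bb])))
    return out

# Pseudo-ops, expanded exactly as listed (RA = 0, so reg = 0).
def mtcri(crf, bb, mode):
    return mtcrweird(crf, 0, bb, _bits4(0b1111), _bits4(~mode & 0b1111))

def mtcrset(crf, bb, mask):
    return mtcrweird(crf, 0, bb, _bits4(mask), _bits4(0b0000))

def mtcrclr(crf, bb, mask):
    return mtcrweird(crf, 0, bb, _bits4(mask), _bits4(0b1111))

zero = [[0, 0, 0, 0] for _ in range(8)]
print(mtcri(zero, 2, 0b1010)[2])    # [1, 0, 1, 0]
print(mtcrset(zero, 1, 0b0110)[1])  # [0, 1, 1, 0]
```

Run literally, the `mtcrweird` pseudocode writes all four bits of the target field, so `mtcrset` also clears any bit outside `mask`, and `mtcrclr` clears the whole field whatever `mask` is; the model reproduces that behaviour as written rather than a masked read-modify-write.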