# New instructions for CR/INT predication

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=533>
* <https://bugs.libre-soc.org/show_bug.cgi?id=527>
* <https://bugs.libre-soc.org/show_bug.cgi?id=569>

Basic concept:

* CR-based instructions that perform simple AND/OR/XOR from all four bits
  of a CR to create a single bit value (0/1) in an integer register
* Inverse of the same, taking a single bit value (0/1) from an integer
  register to selectively target all four bits of a given CR
* CR-to-CR version of the same, allowing multiple bits to be AND/OR/XORed
  in one hit.
* Vectorisation of the same

Purpose:

* To provide a merged version of what is currently a multi-instruction
  sequence of CR operations (crand, cror, crxor) combined with mfcr and
  mtcrf, reducing instruction count.
* To provide a vectorised version of the same, suitable for advanced
  predication

Side-effects:

* mtcrweird when RA=0 is a means to set or clear arbitrary CR bits from immediates

(Twin) Predication interactions:

* INT twin predication with zeroing is a way to copy an integer into CRs without necessarily needing the INT register (RA).  If RA is used, it is effectively ANDed (or negate-and-ANDed) with the INT Predicate.
* CR twin predication with zeroing is likewise a way to interact with the incoming integer.

This gets particularly powerful if data-dependent predication is also enabled.

# Bit ordering

IBM chose MSB0 for the OpenPOWER v3.0B specification.  This makes things slightly hair-raising.  Our model was therefore initially to follow the logical progression from the defined behaviour of `mtcr` and `mfcr` etc.  
In [[isa/sprset]] we see the pseudocode for `mtcrf` for example:

    mtcrf FXM,RS

    do n = 0 to 7
      if FXM[n] = 1 then
        CR[4*n+32:4*n+35] <- (RS)[4*n+32:4*n+35]

This copies (according to a mask schedule) MSB0-numbered bits 32-35 of the source Integer register `RS` into `CR0`, these bits of `RS` being arithmetic bits 31 down to 28.  Unfortunately, even when not Vectorised, this inserts CR numbering inversions on each batch of 8 CRs, massively complicating matters.

In other words, unless we do something about this, when we transfer bits from an Integer Predicate into a Vector of CRs, our numbering of CRs, when enumerating them in a CR Vector, would be CR7 CR6 CR5 ... CR0 **CR15** CR14 CR13 ... CR8 **CR23** CR22 etc., **not** CR0 CR1 ... CR23.
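
The inversion within the first batch of 8 CRs can be checked with a small Python sketch. It assumes only the `mtcrf` pseudocode quoted above with every FXM bit set; the helper names `msb0_to_arith` and `mtcrf_field_source` are hypothetical and exist purely for illustration.

```python
# Sketch: which arithmetic (LSB0) bits of RS does mtcrf copy into each CR field?
# Assumption: the v3.0B mtcrf pseudocode quoted above, all FXM bits set.

def msb0_to_arith(bit, width=64):
    """Convert an MSB0 bit number into an arithmetic (LSB0) bit number."""
    return width - 1 - bit

def mtcrf_field_source(n):
    """Return (high, low) arithmetic bit numbers of RS that feed CR field n."""
    first, last = 4 * n + 32, 4 * n + 35  # MSB0 range taken from the pseudocode
    return msb0_to_arith(first), msb0_to_arith(last)

# Enumerate the CR fields in ascending arithmetic bit order of RS.
order = sorted(range(8), key=lambda n: mtcrf_field_source(n)[1])
for n in order:
    hi, lo = mtcrf_field_source(n)
    print(f"RS arithmetic bits {hi:2d}..{lo:2d} -> CR{n}")
```

Walking RS upwards from its LSB, the first field reached is CR7 and the last is CR0, which is the reversed ordering described above.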

Therefore the instructions below need to **redefine** the relationship so that CR numbers (CR0, CR1) sequentially match the arithmetically-ordered bits of Integer registers.  The term `arithmetic` is deduced from the fact that the instruction `addi r3, r0, 1` results in the **LSB** (numbered 63 in IBM MSB0 order) of r3 being set to 1 and all other bits set to zero.  We therefore refer, below, to this LSB as "Arithmetic bit 0", and it is this bit which is used - defined - as being the first bit used in predication (on element 0).

Below is some pseudocode that, given a CR offset `offs` to represent `CR.eq` thru to `CR.ov` respectively, will copy the INT predicate bits in the correct order into the first 8 CRs:

    do n = 0 to 7
        CR[4*n+32+offs] <- (RS)[63-n]

Assuming that `offs` is set to `CR.eq` this results in:

* Arithmetic bit 0 (the LSB) of RS being inserted into CR0.eq
* Arithmetic bit 1 of RS being inserted into CR1.eq
* ...
* Arithmetic bit 7 of RS being inserted into CR7.eq
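
The mapping listed above can be modelled as follows. This is a minimal sketch, not a reference implementation: it treats the CR as a 64-bit MSB0-numbered value in which only bits 32-63 are used, and it assumes the v3.0B field layout in which `eq` is at offset 2 within a CR field. The helper names are hypothetical.

```python
EQ = 2  # assumed offset of eq within a CR field (lt=0, gt=1, eq=2, so=3)

def get_msb0(value, bit, width=64):
    """Read one bit of `value` using MSB0 numbering."""
    return (value >> (width - 1 - bit)) & 1

def set_msb0(value, bit, new, width=64):
    """Return `value` with one MSB0-numbered bit replaced by `new`."""
    shift = width - 1 - bit
    return (value & ~(1 << shift)) | ((new & 1) << shift)

def int_pred_to_crs(cr, rs, offs):
    """CR[4*n+32+offs] <- (RS)[63-n] for n = 0 to 7."""
    for n in range(8):
        cr = set_msb0(cr, 4 * n + 32 + offs, get_msb0(rs, 63 - n))
    return cr

def cr_field_bit(cr, n, offs):
    """Read bit `offs` of CR field n."""
    return get_msb0(cr, 4 * n + 32 + offs)

# Predicate 0b101: arithmetic bits 0 and 2 are set.
cr = int_pred_to_crs(0, 0b00000101, EQ)
eq_bits = [cr_field_bit(cr, n, EQ) for n in range(8)]
print(eq_bits)  # [1, 0, 1, 0, 0, 0, 0, 0]
```

Arithmetic bit 0 of the predicate lands in CR0.eq and arithmetic bit 2 in CR2.eq, so the CR numbering follows the arithmetic bit order.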

To clarify, then: all instructions below do **NOT** follow the IBM convention, they follow the natural sequence CR0 CR1 instead.  However it is critically important to note that the offsets **in** a CR (`CR.eq` for example) continue to follow the v3.0B definition and convention.


# Instruction form and pseudocode

    | 0-5 | 6-10  | 11 | 12-15 | 16-18 | 19-20 | 21-25   | 26-30   | 31 |
    | --- | ----  | -- | ----- | ----- | ----- | -----   | -----   | -- |
    | 19  | RT    |    | mask  | BB    |    /  | XO[0:4] | XO[5:9] | /  |
    | 19  | RT    | 0  | mask  | BB    |  0 /  | XO[0:4] | 0 mode  | /  |
    | 19  | RA    | 1  | mask  | BB    |  0 /  | XO[0:4] | 0 mode  | /  |
    | 19  | BT // | 0  | mask  | BB    |  1 /  | XO[0:4] | 0 mode  | /  |
    | 19  | BFT   | 1  | mask  | BB    |  1 /  | XO[0:4] | 0 mode  | /  |

`mode` is 4 bits wide and is encoded in XO (bits 27-30 in the table above).

bit 11=0, bit 19=0

    crrweird: RT, BB, mask.mode

    creg = CRfile[32+BB*4:35+BB*4]
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    RT[0] = n0|n1|n2|n3
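
As an illustration, the `crrweird` pseudocode can be sketched in Python. The 4-bit values `creg`, `mask` and `mode` are represented here as lists of four bits indexed exactly as in the pseudocode, which is a modelling assumption; `weird_bits` is a hypothetical helper name.

```python
def weird_bits(creg, mask, mode):
    """Per-bit terms n0..n3: mask[i] & (mode[i] == creg[i])."""
    return [m & int(md == c) for c, m, md in zip(creg, mask, mode)]

def crrweird(creg, mask, mode):
    """Return the single bit written to RT[0]."""
    n = weird_bits(creg, mask, mode)
    return n[0] | n[1] | n[2] | n[3]

# Test a single bit: result is 1 when creg[2] is set.
print(crrweird([0, 0, 1, 0], mask=[0, 0, 1, 0], mode=[0, 0, 1, 0]))  # 1

# "Either of bits 0, 1 is clear": mode 0 on both selected bits.
print(crrweird([1, 0, 0, 0], mask=[1, 1, 0, 0], mode=[0, 0, 0, 0]))  # 1
print(crrweird([1, 1, 0, 0], mask=[1, 1, 0, 0], mode=[0, 0, 0, 0]))  # 0
```

`mask` selects which CR bits take part, and `mode` gives the value each selected bit must have in order to contribute a 1 to the OR.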

bit 11=1, bit 19=0

    mtcrweird: RA, BB, mask.mode

    reg = (RA|0)
    n0 = mask[0] & (mode[0] == reg[0])
    n1 = mask[1] & (mode[1] == reg[0])
    n2 = mask[2] & (mode[2] == reg[0])
    n3 = mask[3] & (mode[3] == reg[0])
    CRfile[32+BB*4:35+BB*4] = n0 || n1 || n2 || n3
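
A comparable sketch for `mtcrweird` is given below. It assumes that `||` denotes concatenation, as in the v3.0B pseudocode notation, so the four terms form the new 4-bit CR field; the field is returned as a list of four bits, and `reg_bit0` stands for `reg[0]`. Both the function signature and the list representation are illustrative assumptions.

```python
def mtcrweird(reg_bit0, mask, mode):
    """Return the new 4-bit CR field [n0, n1, n2, n3].

    reg_bit0 is the single bit reg[0] of (RA|0); it is 0 when RA=0.
    """
    return [m & int(md == reg_bit0) for m, md in zip(mask, mode)]

# The source bit is 1: every masked bit whose mode is 1 is set in the field.
print(mtcrweird(1, mask=[1, 1, 1, 1], mode=[1, 0, 1, 0]))  # [1, 0, 1, 0]

# The source bit is 0: the same mode now selects the complementary bits.
print(mtcrweird(0, mask=[1, 1, 1, 1], mode=[1, 0, 1, 0]))  # [0, 1, 0, 1]
```

One integer bit is compared against all four `mode` bits, so a single predicate bit can fan out into any pattern within the target CR field.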

bit 11=0, bit 19=1

    crweird: BT, BB, mask.mode

    creg = CRfile[32+BB*4:35+BB*4]
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    CRfile[32+BT*4:35+BT*4] = n0 || n1 || n2 || n3

bit 11=1, bit 19=1

    crweirder: BFT, BB, mask.mode

    creg = CRfile[32+BB*4:35+BB*4]
    n0 = mask[0] & (mode[0] == creg[0])
    n1 = mask[1] & (mode[1] == creg[1])
    n2 = mask[2] & (mode[2] == creg[2])
    n3 = mask[3] & (mode[3] == creg[3])
    CRfile[32+BFT] = n0|n1|n2|n3
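
The two CR-to-CR forms, `crweird` and `crweirder`, can be sketched together. The CR file is modelled here as a Python list of 32 bits in which index 0 corresponds to `CRfile[32]`, so CR field `n` occupies indices `4*n` to `4*n+3`. This layout, and the function signatures, are assumptions made for illustration.

```python
def weird_bits(creg, mask, mode):
    """Per-bit terms n0..n3: mask[i] & (mode[i] == creg[i])."""
    return [m & int(md == c) for c, m, md in zip(creg, mask, mode)]

def crweird(crfile, bt, bb, mask, mode):
    """Write the four terms n0..n3 into CR field BT (a 3-bit field number)."""
    out = list(crfile)
    out[4 * bt:4 * bt + 4] = weird_bits(crfile[4 * bb:4 * bb + 4], mask, mode)
    return out

def crweirder(crfile, bft, bb, mask, mode):
    """OR the four terms into the single CR bit BFT (a 5-bit bit number)."""
    n = weird_bits(crfile[4 * bb:4 * bb + 4], mask, mode)
    out = list(crfile)
    out[bft] = n[0] | n[1] | n[2] | n[3]
    return out

crfile = [0] * 32
crfile[4:8] = [1, 0, 1, 0]  # CR1

# crweird with mask and mode all ones copies CR1 into CR3.
copied = crweird(crfile, bt=3, bb=1, mask=[1, 1, 1, 1], mode=[1, 1, 1, 1])
print(copied[12:16])  # [1, 0, 1, 0]

# crweirder reduces CR1 to one bit and stores it in bit 2 of CR0.
reduced = crweirder(crfile, bft=2, bb=1, mask=[1, 0, 1, 0], mode=[1, 1, 1, 1])
print(reduced[0:4])  # [0, 0, 1, 0]
```

The only difference between the two forms is the destination: `crweird` keeps the four terms separate, while `crweirder` collapses them to one bit.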

Pseudo-op:

    mtcri BB, mode    mtcrweird r0, BB, 0b1111.~mode
    mtcrset BB, mask  mtcrweird r0, BB, mask.0b0000
    mtcrclr BB, mask  mtcrweird r0, BB, mask.0b1111
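
The pseudo-op expansions can be evaluated against the `mtcrweird` pseudocode as written. The sketch below assumes that RA=0 yields `reg[0] = 0` and represents 4-bit values as lists of four bits; the Python function names mirror the pseudo-ops but are otherwise hypothetical.

```python
def mtcrweird(reg_bit0, mask, mode):
    """New CR field [n0..n3] with n_i = mask[i] & (mode[i] == reg[0])."""
    return [m & int(md == reg_bit0) for m, md in zip(mask, mode)]

def invert(bits):
    """Bitwise NOT of a list of bits."""
    return [b ^ 1 for b in bits]

def mtcri(mode):
    """mtcrweird r0, BB, 0b1111.~mode"""
    return mtcrweird(0, [1, 1, 1, 1], invert(mode))

def mtcrset(mask):
    """mtcrweird r0, BB, mask.0b0000"""
    return mtcrweird(0, mask, [0, 0, 0, 0])

def mtcrclr(mask):
    """mtcrweird r0, BB, mask.0b1111"""
    return mtcrweird(0, mask, [1, 1, 1, 1])

print(mtcri([1, 0, 0, 1]))    # [1, 0, 0, 1]
print(mtcrset([0, 1, 1, 0]))  # [0, 1, 1, 0]
print(mtcrclr([0, 1, 1, 0]))  # [0, 0, 0, 0]
```

Under this reading `mtcri` loads the immediate directly into the field. Because the pseudocode writes all four bits of the target field, `mtcrset` leaves the unmasked bits at 0 and `mtcrclr` produces an all-zero field whatever the mask.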