+[[!tag standards]]
+
# New instructions for CR/INT predication
See:
* <https://bugs.libre-soc.org/show_bug.cgi?id=533>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=527>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=569>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=558#c47>
+
+Rationale:
+
+Condition Registers are conceptually perfect for use as predicate masks. The only problem is that typical Vector ISAs have quite comprehensive mask-based instructions: set-before-first, popcount and much more. In fact many Vector ISAs can use Vectors *as* masks, so the entire Vector ISA is available for creating masks. This is not practical for SV, given the premise of minimising the number of added instructions.
+
+With the scalar OpenPOWER v3.0B ISA already having popcnt, cntlz and others normally seen in Vector Mask operations, it makes sense to allow *both* scalar integers *and* CR-Vectors to be predicate masks. That in turn means that much more comprehensive interaction between CRs and scalar Integers is required.
+
+The opportunity is therefore taken to augment CR logical arithmetic as well, using a mask-based paradigm that takes into consideration multiple bits of each CR (eq/lt/gt/ov). v3.0B Scalar CR instructions (crand, crxor) only allow a single bit calculation.
Basic concept:
of a CR to create a single bit value (0/1) in an integer register
* Inverse of the same, taking a single bit value (0/1) from an integer
register to selectively target all four bits of a given CR
+* CR-to-CR version of the same, allowing multiple bits to be AND/OR/XORed
+ in one hit.
* Vectorisation of the same
Purpose:
* To provide a merged version of what is currently a multi-sequence of
- CR operations (crand, cror, crxor) with mfcr and mtcrf
+ CR operations (crand, cror, crxor) with mfcr and mtcrf, reducing
+ instruction count.
* To provide a vectorised version of the same, suitable for advanced
predication
+Side-effects:
+
+* mtcrweird when RA=0 is a means to set or clear arbitrary CR bits from immediates
+
+(Twin) Predication interactions:
+
+* INT twin predication with zeroing is a way to copy an integer into CRs without necessarily needing the INT register (RA). If RA is used, it is effectively ANDed (or negate-and-ANDed) with the INT Predicate
+* CR twin predication with zeroing is likewise a way to interact with the incoming integer
+
+This gets particularly powerful if data-dependent predication is also enabled.
+
+# Bit ordering
+
+IBM chose MSB0 for the OpenPOWER v3.0B specification. This makes things slightly hair-raising. Our desire initially is therefore to follow the logical progression from the defined behaviour of `mtcr` and `mfcr` etc.
+In [[isa/sprset]] we see the pseudocode for `mtcrf` for example:
+
+    mtcrf FXM,RS
+
+    do n = 0 to 7
+      if FXM[n] = 1 then
+        CR[4*n+32:4*n+35] <- (RS)[4*n+32:4*n+35]
+
+This copies (according to a mask schedule) MSB0-numbered bits 32-35 of the source Integer register `RS` into `CR0`, these bits of `RS` being arithmetic bits 31 down to 28. Unfortunately, even when not Vectorised, this inserts CR numbering inversions on each batch of 8 CRs, massively complicating matters. Predication when using CRs would have to be morphed to this (unacceptably complex) behaviour:
+
+    for i in range(VL):
+        if INTpredmode:
+            predbit = (r3)[63-i] # IBM MSB0 spec sigh
+        else:
+            # completely incomprehensible vertical numbering
+            n = (7-(i%8)) | (i & ~0x7) # total mess
+            CRpredicate = CR{n} # select CR0, CR1, ....
+            predbit = CRpredicate[offs] # select eq..ov bit
+
+This is nowhere close to matching the straightforward, obvious case:
+
+    for i in range(VL):
+        if INTpredmode:
+            predbit = (r3)[63-i] # IBM MSB0 spec sigh
+        else:
+            CRpredicate = CR{i} # start at CR0, work up
+            predbit = CRpredicate[offs]
+
+In other words, unless we do something about this, when we transfer bits from an Integer Predicate into a Vector of CRs, our numbering of CRs, when enumerating them in a CR Vector, would be **CR7** CR6 CR5 ... CR0 **CR15** CR14 CR13 ... CR8 **CR23** CR22 etc., **not** the more natural and obvious CR0 CR1 ... CR23.
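
The inversion described above can be reproduced with a small sketch. This is only an illustration, assuming the index expression from the "unacceptably complex" pseudocode; the function names `ibm_cr_index` and `natural_cr_index` are hypothetical and not part of any specification.

```python
def ibm_cr_index(i):
    """CR number that element i would select if mtcrf-style ordering were kept."""
    return (7 - (i % 8)) | (i & ~0x7)

def natural_cr_index(i):
    """CR number that element i selects in the straightforward case."""
    return i

VL = 24
ibm_order = [ibm_cr_index(i) for i in range(VL)]
natural_order = [natural_cr_index(i) for i in range(VL)]

print("mtcrf-style:", ibm_order)
print("natural:    ", natural_order)
```

Each batch of 8 elements comes out reversed in the first list, while the second simply counts upwards.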
+
+Therefore the instructions below need to **redefine** the relationship so that CR numbers (CR0, CR1) sequentially match the arithmetically-ordered bits of Integer registers. The term "arithmetic" is derived from the fact that the instruction `addi r3, r0, 1` will result in the **LSB** (numbered 63 in IBM MSB0 order) of r3 being set to 1 and all other bits set to zero. We therefore refer, below, to this LSB as "Arithmetic bit 0", and it is this bit which is defined as being the first bit used in Integer predication (on element 0).
+
+Below is some pseudocode that, given a CR offset `offs` to represent `CR.eq` thru to `CR.ov` respectively, will copy the INT predicate bits in the correct order into the first 8 CRs:
+
+    do n = 0 to 7
+        CR[4*n+32+offs] <- (RS)[63-n]
+
+Assuming that `offs` is set to `CR.eq` this results in:
+
+* Arithmetic bit 0 (the LSB, numbered 63 in IBM MSB0 terminology)
+ of RS being inserted into CR0.eq
+* Arithmetic bit 1 of RS being inserted into CR1.eq
+* ...
+* Arithmetic bit 7 of RS being inserted into CR7.eq
+
+To clarify, then: all instructions below do **NOT** follow the IBM convention; they follow the natural sequence CR0, CR1, ... instead. However, it is critically important to note that the offsets **in** a CR (`CR.eq` for example) continue to follow the v3.0B definition and convention.
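
As a rough illustration of the copy loop above, the sketch below models CR0-CR7 as a list of eight 4-bit integers. This representation, the function name `int_pred_to_crs` and the constant names are assumptions made for the example; the bit offsets inside a CR field follow the v3.0B order (LT=0, GT=1, EQ=2, SO=3, counted from the MSB of the field).

```python
# MSB0 bit offsets inside one 4-bit CR field, per v3.0B
CR_LT, CR_GT, CR_EQ, CR_SO = 0, 1, 2, 3

def int_pred_to_crs(rs, offs, crs):
    """Copy arithmetic bits 0..7 of rs into bit `offs` of CR0..CR7.

    `crs` is a list of eight 4-bit integers (a hypothetical model of the CR
    fields); a new list is returned and the input is left untouched.
    """
    out = list(crs)
    shift = 3 - offs                  # MSB0 offset -> shift inside the field
    for n in range(8):
        bit = (rs >> n) & 1           # (RS)[63-n] in MSB0 numbering
        out[n] = (out[n] & ~(1 << shift) & 0xF) | (bit << shift)
    return out

# Arithmetic bits 0 and 2 set: expect CR0.eq and CR2.eq to be set
crs = int_pred_to_crs(0b00000101, CR_EQ, [0] * 8)
print([format(c, "04b") for c in crs])
```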
+
+
# Instruction form and pseudocode
- | 0-5 | 6-10 | 11 | 12-15 | 16-18 | 19-20 | 21-30 | 31 |
- | 19 | RT | 0 | mask | BB | m2 | XO | / |
- | 19 | RT | 1 | mask | BB | m2 | XO | / |
+Note that `CR{n}` refers to `CR0` when `n=0` and consequently, for CR0-7, is defined, in v3.0B pseudocode, as:
+
+    CR{n} = CR[32+n*4:35+n*4]
+
+Instruction format:
+
+ | 0-5 | 6-10 | 11 | 12-15 | 16-18 | 19-20 | 21-25 | 26-30 | 31 |
+ | --- | ---- | -- | ----- | ----- | ----- | ----- | ----- | -- |
+ | 19 | RT | | mask | BB | | XO[0:4] | XO[5:9] | / |
+ | 19 | RT | 0 | mask | BB | 0 M | XO[0:4] | 0 mode | Rc |
+ | 19 | RA | 1 | mask | BB | 0 / | XO[0:4] | 0 mode | / |
+ | 19 | BT // | 0 | mask | BB | 1 / | XO[0:4] | 0 mode | / |
+ | 19 | BFT | 1 | mask | BB | 1 M | XO[0:4] | 0 mode | / |
+
+mode is encoded in XO and is 4 bits
+
+bit 11=0, bit 19=0
+
+    crrweird: RT, BB, mask.mode
+
+    creg = CR{BB}
+    n0 = mask[0] & (mode[0] == creg[0])
+    n1 = mask[1] & (mode[1] == creg[1])
+    n2 = mask[2] & (mode[2] == creg[2])
+    n3 = mask[3] & (mode[3] == creg[3])
+    result = n0|n1|n2|n3 if M else n0&n1&n2&n3
+    RT[63] = result # MSB0 numbering, 63 is LSB
+    if Rc:
+        CR0 = analyse(RT)
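
The result-bit computation of `crrweird` can be tried out with a minimal sketch. It assumes that `creg`, `mask` and `mode` are plain 4-bit integers whose bit 0 is the MSB (MSB0 order); the helper `msb0_bit` is hypothetical, and neither the write to RT nor the Rc behaviour is modelled.

```python
def msb0_bit(value, i, width=4):
    """Return bit i of a `width`-bit value, counting from the MSB (MSB0 order)."""
    return (value >> (width - 1 - i)) & 1

def crrweird(creg, mask, mode, M):
    """Result bit of scalar crrweird: OR of the terms if M is set, else AND."""
    n = [msb0_bit(mask, i) & int(msb0_bit(mode, i) == msb0_bit(creg, i))
         for i in range(4)]
    return int(any(n)) if M else int(all(n))

EQ_ONLY = 0b0010  # mask selecting only the eq bit (MSB0 offset 2)

eq_set = crrweird(0b0010, EQ_ONLY, 0b0010, M=1)    # eq is 1, mode asks for 1
eq_clear = crrweird(0b1000, EQ_ONLY, 0b0010, M=1)  # eq is 0, mode asks for 1
exact = crrweird(0b0010, 0b1111, 0b0010, M=0)      # all four bits must match mode
print(eq_set, eq_clear, exact)
```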
+
+bit 11=1, bit 19=0
+
+    mtcrweird: RA, BB, mask.mode
+
+    reg = (RA|0)
+    lsb = reg[63] # MSB0 numbering
+    n0 = mask[0] & (mode[0] == lsb)
+    n1 = mask[1] & (mode[1] == lsb)
+    n2 = mask[2] & (mode[2] == lsb)
+    n3 = mask[3] & (mode[3] == lsb)
+    CR{BB} = n0 || n1 || n2 || n3
+
+bit 11=0, bit 19=1
+
+    crweird: BT, BB, mask.mode
+
+    creg = CR{BB}
+    n0 = mask[0] & (mode[0] == creg[0])
+    n1 = mask[1] & (mode[1] == creg[1])
+    n2 = mask[2] & (mode[2] == creg[2])
+    n3 = mask[3] & (mode[3] == creg[3])
+    CR{BT} = n0 || n1 || n2 || n3
+
+bit 11=1, bit 19=1
+
+    crweirder: BFT, BB, mask.mode
+
+    creg = CR{BB}
+    n0 = mask[0] & (mode[0] == creg[0])
+    n1 = mask[1] & (mode[1] == creg[1])
+    n2 = mask[2] & (mode[2] == creg[2])
+    n3 = mask[3] & (mode[3] == creg[3])
+    BF = BFT[2:4] # select CR
+    bit = BFT[0:1] # select bit of CR
+    result = n0|n1|n2|n3 if M else n0&n1&n2&n3
+    CR{BF}[bit] = result
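
To make the `BFT` field split concrete, here is a small sketch of `crweirder`. It assumes CR0-CR7 are modelled as a list of eight 4-bit integers and that `BFT` is a 5-bit value in MSB0 order, so `BFT[0:1]` is its top two bits and `BFT[2:4]` its low three bits; the function and helper names are hypothetical.

```python
def msb0_bit(value, i, width):
    """Return bit i of a `width`-bit value, counting from the MSB (MSB0 order)."""
    return (value >> (width - 1 - i)) & 1

def crweirder(crs, BFT, BB, mask, mode, M):
    """Combine the bits of CR{BB} and write the one-bit result into CR{BF}[bit]."""
    creg = crs[BB]
    n = [msb0_bit(mask, i, 4) & int(msb0_bit(mode, i, 4) == msb0_bit(creg, i, 4))
         for i in range(4)]
    result = int(any(n)) if M else int(all(n))
    BF = BFT & 0b111           # BFT[2:4]: selects the target CR
    bit = (BFT >> 3) & 0b11    # BFT[0:1]: selects the bit within that CR
    shift = 3 - bit            # MSB0 offset -> shift inside the 4-bit field
    out = list(crs)
    out[BF] = (out[BF] & ~(1 << shift) & 0xF) | (result << shift)
    return out

crs = [0] * 8
crs[1] = 0b0010                                   # CR1.eq set
# Test CR1.eq and store the outcome in bit 2 (eq) of CR5
new_crs = crweirder(crs, BFT=(2 << 3) | 5, BB=1, mask=0b0010, mode=0b0010, M=1)
print([format(c, "04b") for c in new_crs])
```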
+
+Pseudo-op:
+
+    mtcri BB, mode      mtcrweird r0, BB, 0b1111.~mode
+    mtcrset BB, mask    mtcrweird r0, BB, mask.0b0000
+    mtcrclr BB, mask    mtcrweird r0, BB, mask.0b1111
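
The pseudo-op expansions can be checked against the `mtcrweird` pseudocode with a short sketch. It assumes 4-bit `mask` and `mode` values in MSB0 order and models only the 4-bit value written to `CR{BB}`; the Python function names mirror the mnemonics but are otherwise hypothetical.

```python
def mtcrweird(reg, mask, mode):
    """Return the 4-bit value written to CR{BB}, following the pseudocode above."""
    lsb = reg & 1                       # reg[63] in MSB0 numbering
    cr = 0
    for i in range(4):
        m = (mask >> (3 - i)) & 1       # mask[i], MSB0 order
        d = (mode >> (3 - i)) & 1       # mode[i], MSB0 order
        n = m & int(d == lsb)
        cr = (cr << 1) | n              # n0 || n1 || n2 || n3
    return cr

# RA=0 means (RA|0) reads as zero, so the LSB is always 0
def mtcri(mode):
    return mtcrweird(0, 0b1111, ~mode & 0xF)

def mtcrset(mask):
    return mtcrweird(0, mask, 0b0000)

def mtcrclr(mask):
    return mtcrweird(0, mask, 0b1111)

print(format(mtcri(0b1010), "04b"),
      format(mtcrset(0b0110), "04b"),
      format(mtcrclr(0b0110), "04b"))
```

Under this reading of the pseudocode the whole 4-bit field of `CR{BB}` is overwritten in every case.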
+
+
+# Vectorised versions
+
+The name "weird" refers to a minor violation of SV rules when it comes to deriving the Vectorised versions of these instructions.
+
+Normally the progression of the SV for-loop would move on to the next register.
+Instead however in the scalar case these instructions **remain in the same register** and insert or transfer between **bits** of the scalar integer source or destination.
-mode is encoded in XO and from m2 to produce 4 bits
+    crrweird: RT, BB, mask.mode
-bit 11=0:
+    for i in range(VL):
+        if BB.isvec:
+            creg = CR{BB+i}
+        else:
+            creg = CR{BB}
+        n0 = mask[0] & (mode[0] == creg[0])
+        n1 = mask[1] & (mode[1] == creg[1])
+        n2 = mask[2] & (mode[2] == creg[2])
+        n3 = mask[3] & (mode[3] == creg[3])
+        result = n0|n1|n2|n3 if M else n0&n1&n2&n3
+        if RT.isvec:
+            iregs[RT+i][63] = result
+        else:
+            iregs[RT][63-i] = result
- crweird: RT, BB, mask.mode
+Note that:
- creg = CRfile[32+BB*4:36+BB*4]
- n0 = mask[1] & (mode[0] == creg[0]
- n1 = mask[1] & (mode[1] == creg[1]
- n2 = mask[2] & (mode[2] == creg[2]
- n3 = mask[3] & (mode[3] == creg[3]
- RT[0] = n0 | n1 | n2 | b3
+* in the scalar case the CR-Vector assessment
+ is stored bit-wise starting at the LSB of the
+ destination scalar INT
+* in the INT-vector case the result is stored in the
+ LSB of each element in the result vector
+
+Note that element width overrides are respected on the INT source or destination register, but elwidth overrides on CRs are meaningless.
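
The difference between a scalar and a vector destination can be seen in a minimal sketch of the Vectorised `crrweird` loop. It assumes CRs are a list of 4-bit integers and integer registers a list of Python ints; `rt_isvec` and `bb_isvec` stand in for `RT.isvec` and `BB.isvec`, and element width overrides and predication are not modelled.

```python
def crrweird_vec(crs, iregs, RT, BB, mask, mode, M, VL, rt_isvec, bb_isvec):
    """Vectorised crrweird sketch: returns a new copy of the integer regfile."""
    iregs = list(iregs)
    for i in range(VL):
        creg = crs[BB + i] if bb_isvec else crs[BB]
        n = [((mask >> (3 - b)) & 1)
             & int(((mode >> (3 - b)) & 1) == ((creg >> (3 - b)) & 1))
             for b in range(4)]
        result = int(any(n)) if M else int(all(n))
        if rt_isvec:
            # iregs[RT+i][63]: LSB of each destination element
            iregs[RT + i] = (iregs[RT + i] & ~1) | result
        else:
            # iregs[RT][63-i]: arithmetic bit i of the single destination
            iregs[RT] = (iregs[RT] & ~(1 << i)) | (result << i)
    return iregs

crs = [0b0010, 0b0000, 0b0010, 0b0010, 0, 0, 0, 0]   # eq set in CR0, CR2, CR3
args = dict(BB=0, mask=0b0010, mode=0b0010, M=1, VL=4, bb_isvec=True)

scalar_dest = crrweird_vec(crs, [0] * 8, RT=3, rt_isvec=False, **args)
vector_dest = crrweird_vec(crs, [0] * 8, RT=3, rt_isvec=True, **args)
print(bin(scalar_dest[3]), vector_dest[3:7])
```

With a scalar destination the four results are packed into one register starting at the LSB; with a vector destination each result lands in the LSB of its own element.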