* <https://bugs.libre-soc.org/show_bug.cgi?id=533>
* <https://bugs.libre-soc.org/show_bug.cgi?id=527>
* <https://bugs.libre-soc.org/show_bug.cgi?id=569>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=558#c47>
+
+Rationale:
+
+Condition Registers are conceptually perfect for use as predicate masks. The only problem is that typical Vector ISAs have quite comprehensive mask-based instructions: set-before-first, popcount and much more. In fact many Vector ISAs can use Vectors *as* masks, so the entire Vector ISA is available for creating masks. This is not practical for SV, given its premise of minimising the addition of new instructions.
+
+With the scalar OpenPOWER v3.0B ISA already having popcnt, cntlz and other operations normally seen in Vector Mask instruction sets, it makes sense to allow *both* scalar integers *and* CR-Vectors to be predicate masks. That in turn means that much more comprehensive interaction between CRs and scalar Integers is required.
+
+The opportunity is therefore also taken to augment CR logical arithmetic, using a mask-based paradigm that takes into consideration multiple bits of each CR (eq/lt/gt/ov). The v3.0B Scalar CR instructions (crand, crxor) only allow a single-bit calculation.
Basic concept:
This places (according to a mask schedule) `CR0` into MSB0-numbered bits 32-35 of the target Integer register `RS`, these bits of `RS` being the 31st down to the 28th. Unfortunately, even when not Vectorised, this inserts CR numbering inversions on each batch of 8 CRs, massively complicating matters. Predication when using CRs would have to be morphed to this (unacceptably complex) behaviour:
for i in range(VL):
+ if INTpredmode:
+ predbit = (r3)[63-i] # IBM MSB0 spec sigh
+ else:
+ # completely incomprehensible vertical numbering
n = (7-(i%8)) | (i & ~0x7) # total mess
CRpredicate = CR{n} # select CR0, CR1, ....
predbit = CRpredicate[offs] # select eq..ov bit
Which is nowhere close to matching the straightforward obvious case:
for i in range(VL):
- if INTpredmode:
- predbit = (r3)[63-i] # IBM MSB0 spec sigh
- else:
- CRpredicate = CR{i} # start at CR0, work up
- predbit = CRpredicate[offs]
+ if INTpredmode:
+ predbit = (r3)[63-i] # IBM MSB0 spec sigh
+ else:
+ CRpredicate = CR{i} # start at CR0, work up
+ predbit = CRpredicate[offs]
In other words, unless we do something about this, when we transfer bits from an Integer Predicate into a Vector of CRs, our numbering of CRs, when enumerating them in a CR Vector, would be **CR7** CR6 CR5.... CR0 **CR15** CR14 CR13... CR8 **CR23** CR22 etc., **not** the more natural and obvious CR0 CR1 ... CR23.
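The inversion can be checked numerically. The following is a minimal sketch that evaluates the index expression from the pseudocode above for `VL = 24`; the function name `inverted_cr_index` is hypothetical and chosen only for illustration.

```python
def inverted_cr_index(i):
    """CR number selected for element i under the inverted scheme.

    Transcribes n = (7-(i%8)) | (i & ~0x7) from the pseudocode above.
    """
    return (7 - (i % 8)) | (i & ~0x7)


VL = 24
inverted = [inverted_cr_index(i) for i in range(VL)]
natural = list(range(VL))  # the straightforward CR{i} ordering

# Each batch of 8 CRs is visited in reverse order:
# 7..0, then 15..8, then 23..16.
print(inverted)
```

Both orderings cover the same set of CRs; only the order within each batch of 8 differs, which is exactly what makes the inverted scheme awkward to reason about.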
| 0-5 | 6-10 | 11 | 12-15 | 16-18 | 19-20 | 21-25 | 26-30 | 31 |
| --- | ---- | -- | ----- | ----- | ----- | ----- | ----- | -- |
- | 19 | RT | | mask | BB | / | XO[0:4] | XO[5:9] | / |
- | 19 | RT | 0 | mask | BB | 0 / | XO[0:4] | 0 mode | / |
+ | 19 | RT | | mask | BB | | XO[0:4] | XO[5:9] | / |
+ | 19 | RT | 0 | mask | BB | 0 M | XO[0:4] | 0 mode | Rc |
| 19 | RA | 1 | mask | BB | 0 / | XO[0:4] | 0 mode | / |
| 19 | BT // | 0 | mask | BB | 1 / | XO[0:4] | 0 mode | / |
- | 19 | BFT | 1 | mask | BB | 1 / | XO[0:4] | 0 mode | / |
+ | 19 | BFT | 1 | mask | BB | 1 M | XO[0:4] | 0 mode | / |
mode is encoded in XO and is 4 bits
n1 = mask[1] & (mode[1] == creg[1])
n2 = mask[2] & (mode[2] == creg[2])
n3 = mask[3] & (mode[3] == creg[3])
- RT[63] = n0|n1|n2|n3 # MSB0 numbering, 63 is LSB
+ result = n0|n1|n2|n3 if M else n0&n1&n2&n3
+ RT[63] = result # MSB0 numbering, 63 is LSB
+ if Rc:
+ CR1 = analyse(RT)
bit 11=1, bit 19=0
n3 = mask[3] & (mode[3] == creg[3])
BF = BFT[2:4] # select CR
bit = BFT[0:1] # select bit of CR
- CR{BF}[bit] = n0|n1|n2|n3
+ result = n0|n1|n2|n3 if M else n0&n1&n2&n3
+ CR{BF}[bit] = result
Pseudo-op:
mtcrclr BB, mask mtcrweird r0, BB, mask.0b1111
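
The per-CR test used by the instruction forms above (the `n0`..`n3` terms combined under `M`) can be modelled in a few lines. This is only a sketch: it assumes `creg`, `mask` and `mode` are given as 4-element sequences of 0/1 indexed the same way as in the pseudocode, and the name `cr_test` is hypothetical. It transcribes the pseudocode literally, so with `M=0` every mask bit must be set for the result to be 1.

```python
def cr_test(creg, mask, mode, M):
    """Model of: result = n0|n1|n2|n3 if M else n0&n1&n2&n3.

    creg, mask, mode: sequences of four 0/1 values (assumed layout).
    M: 1 selects the OR reduction, 0 selects the AND reduction.
    """
    n = [mask[k] & int(mode[k] == creg[k]) for k in range(4)]
    return int(any(n)) if M else int(all(n))


creg = [0, 0, 1, 0]  # only bit 2 of this CR is set

# OR reduction: a single selected bit matching its mode bit is enough.
or_result = cr_test(creg, mask=[0, 0, 1, 0], mode=[0, 0, 1, 0], M=1)

# AND reduction with the same mask: unselected bits contribute 0.
and_partial = cr_test(creg, mask=[0, 0, 1, 0], mode=[0, 0, 1, 0], M=0)

# AND reduction with a full mask: all four bits must equal mode.
and_full = cr_test(creg, mask=[1, 1, 1, 1], mode=[0, 0, 1, 0], M=0)
```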
+# Vectorised versions
+
+The name "weird" refers to a minor violation of SV rules when it comes to deriving the Vectorised versions of these instructions.
+
+Normally the progression of the SV for-loop would move on to the next register.
+Instead, in the scalar case, these instructions **remain in the same register** and insert into or transfer between **bits** of the scalar integer source or destination.
+
+ crrweird: RT, BB, mask.mode
+
+ for i in range(VL):
+ if BB.isvec:
+ creg = CR{BB+i}
+ else:
+ creg = CR{BB}
+ n0 = mask[0] & (mode[0] == creg[0])
+ n1 = mask[1] & (mode[1] == creg[1])
+ n2 = mask[2] & (mode[2] == creg[2])
+ n3 = mask[3] & (mode[3] == creg[3])
+ result = n0|n1|n2|n3 if M else n0&n1&n2&n3
+ if RT.isvec:
+ iregs[RT+i][63] = result
+ else:
+ iregs[RT][63-i] = result
+
+Note that:
+
+* in the scalar case the CR-Vector assessment
+ is stored bit-wise starting at the LSB of the
+ destination scalar INT
+* in the INT-vector case the result is stored in the
+ LSB of each element in the result vector
+
+Element width overrides are respected on the INT source or destination register, but elwidth overrides on CRs are meaningless.
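
The difference between the scalar and vector destination cases can be illustrated with a small model of the final store step of the `crrweird` loop. This is a sketch under stated assumptions: the per-element `result` bits are taken as already computed, registers are modelled as a Python list of 64-bit integers (default element width, no elwidth override), and the helper names `set_msb0_bit` and `store_results` are hypothetical.

```python
MASK64 = (1 << 64) - 1


def set_msb0_bit(value, msb0_index, bit):
    """Set one bit of a 64-bit value using MSB0 numbering (63 is the LSB)."""
    shift = 63 - msb0_index
    return ((value & ~(1 << shift)) | ((bit & 1) << shift)) & MASK64


def store_results(iregs, RT, results, rt_isvec):
    """Model of the destination write at the end of the crrweird loop."""
    for i, result in enumerate(results):
        if rt_isvec:
            # vector destination: LSB of each successive register
            iregs[RT + i] = set_msb0_bit(iregs[RT + i], 63, result)
        else:
            # scalar destination: successive bits of the same register
            iregs[RT] = set_msb0_bit(iregs[RT], 63 - i, result)
    return iregs


results = [1, 0, 1, 1]  # one test result per element, VL = 4

scalar_regs = store_results([0] * 8, 3, results, rt_isvec=False)
vector_regs = store_results([0] * 8, 3, results, rt_isvec=True)
# scalar_regs[3] == 0b1101; vector_regs[3:7] == [1, 0, 1, 1]
```

In the scalar case all four results are packed into one register starting at its LSB, whereas in the vector case each result occupies the LSB of its own register.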