(no commit message)

[libreriscv.git] / openpower / sv / cr_int_predication.mdwn
diff --git a/openpower/sv/cr_int_predication.mdwn b/openpower/sv/cr_int_predication.mdwn

index f6213096b64d3b23ddfd0c61583eb064daa27868..b7b8e4ee840bcf68543611163e114fcad599bf0f 100644
--- a/openpower/sv/cr_int_predication.mdwn
+++ b/openpower/sv/cr_int_predication.mdwn
@@ -7,6 +7,15 @@ See:
  * <https://bugs.libre-soc.org/show_bug.cgi?id=533>
  * <https://bugs.libre-soc.org/show_bug.cgi?id=527>
  * <https://bugs.libre-soc.org/show_bug.cgi?id=569>
+* <https://bugs.libre-soc.org/show_bug.cgi?id=558#c47>
+
+Rationale:
+
+Condition Registers are conceptually perfect for use as predicate masks.  The only problem is that typical Vector ISAs have quite comprehensive mask-based instructions: set-before-first, popcount and much more.  In fact many Vector ISAs can use Vectors *as* masks, so the entire Vector ISA is available for creating masks.  Replicating this is not practical for SV, given its premise of minimising the number of added instructions.
+
+Since the scalar OpenPOWER v3.0B ISA already has popcnt, cntlz and other instructions normally seen in Vector Mask operations, it makes sense to allow *both* scalar integers *and* CR-Vectors to be predicate masks.  That in turn requires much more comprehensive interaction between CRs and scalar Integers.
+
+The opportunity is therefore also taken to augment CR logical arithmetic, using a mask-based paradigm that takes multiple bits of each CR (eq/lt/gt/ov) into consideration.  The v3.0B scalar CR instructions (crand, crxor, etc.) only allow single-bit calculations.
  
  Basic concept:
  
@@ -51,6 +60,10 @@ In [[isa/sprset]] we see the pseudocode for `mtcrf` for example:
  This places (according to a mask schedule) `CR0` into MSB0-numbered bits 32-35 of the target Integer register `RS`, i.e. bits 31 down to 28 in LSB0 numbering.  Unfortunately, even when not Vectorised, this reverses the CR numbering within each batch of 8 CRs, massively complicating matters.  Predication using CRs would have to be morphed into this (unacceptably complex) behaviour:
  
      for i in range(VL):
+       if INTpredmode:
+         predbit = (r3)[63-i] # IBM MSB0 spec sigh
+       else:
+         # completely incomprehensible vertical numbering
           n = (7-(i%8)) | (i & ~0x7) # total mess
           CRpredicate = CR{n}        # select CR0, CR1, ....
           predbit = CRpredicate[offs]  # select eq..ov bit
@@ -58,11 +71,11 @@ This places (according to a mask schedule) `CR0` into MSB0-numbered bits 32-35 o
  Which is nowhere close to matching the straightforward obvious case:
  
      for i in range(VL):
-         if INTpredmode:
-             predbit = (r3)[63-i] # IBM MSB0 spec sigh
-         else:
-             CRpredicate = CR{i} # start at CR0, work up
-             predbit = CRpredicate[offs]
+       if INTpredmode:
+         predbit = (r3)[63-i] # IBM MSB0 spec sigh
+       else:
+         CRpredicate = CR{i} # start at CR0, work up
+         predbit = CRpredicate[offs]
  
  In other words, unless we do something about this, when we transfer bits from an Integer Predicate into a Vector of CRs, our enumeration of CRs within a CR Vector would be **CR7** CR6 CR5 ... CR0 **CR15** CR14 CR13 ... CR8 **CR23** CR22 etc., **not** the more natural and obvious CR0 CR1 ... CR23.
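
The two enumeration orders can be compared with a short Python sketch.  It reuses the `n = (7-(i%8)) | (i & ~0x7)` expression from the first pseudocode block above; the function names are hypothetical and chosen only for illustration.

```python
def mtcrf_order_cr(i: int) -> int:
    """CR number used for element i if the mtcrf bit layout is followed.

    The order is reversed within each batch of 8 CRs: CR7 first, CR0 last.
    """
    return (7 - (i % 8)) | (i & ~0x7)


def linear_order_cr(i: int) -> int:
    """CR number used for element i in the straightforward scheme."""
    return i


VL = 24
print([mtcrf_order_cr(i) for i in range(VL)])
# [7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8, 23, 22, 21, 20, 19, 18, 17, 16]
print([linear_order_cr(i) for i in range(VL)])
# [0, 1, 2, ..., 23]
```

Because `i & ~0x7` is always a multiple of 8 and `7-(i%8)` lies in 0..7, the `|` acts as an addition of the batch base, which is what produces the reversed runs of 8.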
  
@@ -94,11 +107,11 @@ Instruction format:
  
      | 0-5 | 6-10  | 11 | 12-15 | 16-18 | 19-20 | 21-25   | 26-30   | 31 |
      | --- | ----  | -- | ----- | ----- | ----- | -----   | -----   | -- |
-    | 19  | RT    |    | mask  | BB    |    /  | XO[0:4] | XO[5:9] | /  |
-    | 19  | RT    | 0  | mask  | BB    |  0 /  | XO[0:4] | 0 mode  | /  |
+    | 19  | RT    |    | mask  | BB    |       | XO[0:4] | XO[5:9] | /  |
+    | 19  | RT    | 0  | mask  | BB    |  0 M  | XO[0:4] | 0 mode  | Rc |
      | 19  | RA    | 1  | mask  | BB    |  0 /  | XO[0:4] | 0 mode  | /  |
      | 19  | BT // | 0  | mask  | BB    |  1 /  | XO[0:4] | 0 mode  | /  |
-    | 19  | BFT   | 1  | mask  | BB    |  1 /  | XO[0:4] | 0 mode  | /  |
+    | 19  | BFT   | 1  | mask  | BB    |  1 M  | XO[0:4] | 0 mode  | /  |
  
  mode is encoded in XO and is 4 bits
  
@@ -111,7 +124,10 @@ bit 11=0, bit 19=0
      n1 = mask[1] & (mode[1] == creg[1])
      n2 = mask[2] & (mode[2] == creg[2])
      n3 = mask[3] & (mode[3] == creg[3])
-    RT[63] = n0|n1|n2|n3 # MSB0 numbering, 63 is LSB
+    result = n0|n1|n2|n3 if M else n0&n1&n2&n3
+    RT[63] = result # MSB0 numbering, 63 is LSB
+    if Rc:
+        CR1 = analyse(RT)
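
The scalar evaluation can be modelled in Python as below.  This is a sketch under stated assumptions: a CR field, `mask` and `mode` are each represented as a list of four 0/1 values indexed 0..3 like `creg[0]`..`creg[3]` in the pseudocode, and the function name is hypothetical.  The `Rc`/`analyse` step is not modelled.

```python
def crweird_eval(creg, mask, mode, M):
    """Evaluate the mask/mode test against one CR field (4 bits).

    Each n[k] is 1 only when bit k is selected by mask and the CR bit
    equals the corresponding mode bit.  M=1 ORs the four terms, M=0 ANDs them.
    """
    n = [mask[k] & int(mode[k] == creg[k]) for k in range(4)]
    if M:
        return n[0] | n[1] | n[2] | n[3]
    return n[0] & n[1] & n[2] & n[3]


# Test only bit 2 of the CR for being set (OR form)
print(crweird_eval([0, 0, 1, 0], mask=[0, 0, 1, 0], mode=[0, 0, 1, 0], M=1))  # 1
# Require all four CR bits to equal mode exactly (AND form)
print(crweird_eval([0, 1, 1, 0], mask=[1, 1, 1, 1], mode=[0, 0, 1, 0], M=0))  # 0
```

Taken literally, the pseudocode ANDs the masked terms, so with `M=0` a result of 1 is only possible when all four `mask` bits are set.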
  
  bit 11=1, bit 19=0
  
@@ -147,7 +163,8 @@ bit 11=1, bit 19=1
      n3 = mask[3] & (mode[3] == creg[3])
      BF = BFT[2:4] # select CR
      bit = BFT[0:1] # select bit of CR
-    CR{BF}[bit] = n0|n1|n2|n3
+    result = n0|n1|n2|n3 if M else n0&n1&n2&n3
+    CR{BF}[bit] = result
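
The `BFT` split in the pseudocode above places the CR-bit selector in the top two bits of the field.  A minimal Python decode follows, assuming the usual Power ISA convention that bit ranges are MSB0 and inclusive (so `BFT[0:1]` is two bits and `BFT[2:4]` is three); the function names are hypothetical.

```python
def decode_bft(BFT: int):
    """Split a 5-bit BFT field into (BF, bit) as in the pseudocode.

    In MSB0 numbering BFT[0:1] are the two most significant bits and
    BFT[2:4] the three least significant bits of the field.
    """
    if not 0 <= BFT < 32:
        raise ValueError("BFT is a 5-bit field")
    bit = (BFT >> 3) & 0b11  # BFT[0:1]: which of the 4 bits within the CR
    BF = BFT & 0b111         # BFT[2:4]: which CR, CR0..CR7
    return BF, bit


def write_cr_bit(crs, BFT, result):
    """Model CR{BF}[bit] = result; crs is a list of 8 four-bit lists."""
    BF, bit = decode_bft(BFT)
    crs[BF][bit] = result
    return crs


crs = [[0, 0, 0, 0] for _ in range(8)]
write_cr_bit(crs, 0b10011, 1)  # BF = 0b011 (CR3), bit = 0b10 (bit 2)
print(crs[3])  # [0, 0, 1, 0]
```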
  
  Pseudo-op:
  
@@ -156,3 +173,36 @@ Pseudo-op:
      mtcrclr BB, mask  mtcrweird r0, BB, mask.0b1111
  
  
+# Vectorised versions
+
+The name "weird" refers to a minor violation of SV rules when it comes to deriving the Vectorised versions of these instructions.
+
+Normally each step of the SV for-loop moves on to the next register.
+Instead, when the integer operand is scalar, these instructions **remain in the same register** and insert into, or transfer between, **bits** of the scalar integer source or destination.
+
+    crrweird: RT, BB, mask.mode
+
+    for i in range(VL):
+        if BB.isvec:
+            creg = CR{BB+i}
+        else:
+            creg = CR{BB}
+        n0 = mask[0] & (mode[0] == creg[0])
+        n1 = mask[1] & (mode[1] == creg[1])
+        n2 = mask[2] & (mode[2] == creg[2])
+        n3 = mask[3] & (mode[3] == creg[3])
+        result = n0|n1|n2|n3 if M else n0&n1&n2&n3
+        if RT.isvec:
+            iregs[RT+i][63] = result
+        else:
+            iregs[RT][63-i] = result
+
+Note that:
+
+* in the scalar case, the CR-Vector assessment
+  is stored bit-wise starting at the LSB of the
+  destination scalar INT;
+* in the INT-vector case, the result is stored in the
+  LSB of each element in the result vector.
+
+Element width overrides are respected on the INT source or destination register, but elwidth overrides on CRs are meaningless.
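
The Vectorised `crrweird` loop can be sketched in Python as follows.  This is a simplified model, not a definitive implementation: the register files are plain Python lists (64-bit integers and 4-bit CR fields), the function names are hypothetical, writes are assumed to leave the other bits of the destination unchanged, and element-width overrides are not modelled.

```python
def crweird_eval(creg, mask, mode, M):
    # same 4-bit mask/mode test as the scalar pseudocode
    n = [mask[k] & int(mode[k] == creg[k]) for k in range(4)]
    return (n[0] | n[1] | n[2] | n[3]) if M else (n[0] & n[1] & n[2] & n[3])


def set_msb0_bit(value, bit, b):
    """Set MSB0-numbered bit `bit` (0 = MSB, 63 = LSB) of a 64-bit value to b."""
    shift = 63 - bit
    return (value & ~(1 << shift)) | (b << shift)


def crrweird_vec(iregs, crs, RT, BB, mask, mode, M, VL, RT_isvec, BB_isvec):
    for i in range(VL):
        creg = crs[BB + i] if BB_isvec else crs[BB]
        result = crweird_eval(creg, mask, mode, M)
        if RT_isvec:
            # INT-vector: one result in the LSB of each element
            iregs[RT + i] = set_msb0_bit(iregs[RT + i], 63, result)
        else:
            # scalar INT: results packed bit-wise, starting at the LSB
            iregs[RT] = set_msb0_bit(iregs[RT], 63 - i, result)
    return iregs


crs = [[0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
sel = dict(mask=[0, 0, 1, 0], mode=[0, 0, 1, 0], M=1)  # test bit 2 of each CR

scalar = crrweird_vec([0] * 8, crs, RT=3, BB=0, VL=4, RT_isvec=False, BB_isvec=True, **sel)
print(bin(scalar[3]))  # 0b1101

vector = crrweird_vec([0] * 8, crs, RT=4, BB=0, VL=4, RT_isvec=True, BB_isvec=True, **sel)
print(vector[4:8])  # [1, 0, 1, 1]
```

In the scalar-destination case element `i` lands in LSB0 bit `i` of `RT`, which is why CR0's result appears in the least significant bit of `0b1101`.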