From f637b0e2891c0a58a9ebf2110edf853e619e0f7b Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Wed, 18 May 2022 13:18:07 +0100
Subject: [PATCH] bit-level packing descriptions for crweird family of
 instructions

---
 openpower/sv/cr_int_predication.mdwn | 41 +++++++++++++++++++++++++++-
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/cr_int_predication.mdwn b/openpower/sv/cr_int_predication.mdwn
index e60f6a921..26958dd4c 100644
--- a/openpower/sv/cr_int_predication.mdwn
+++ b/openpower/sv/cr_int_predication.mdwn
@@ -115,11 +115,31 @@ bit 19=0, bit 20=0
     RT[63] = result # MSB0 numbering, 63 is LSB
     If Rc:
         CR0 = analyse(RT)
- 
+
 When used with SVP64 Prefixing this is a [[openpower/sv/normal]]
 SVP64 type operation and as such can use Rc=1 and RC1 Data-dependent
 Mode capability
 
+Also as noted below, element-width override bits normally used
+on the source is instead used to allow multiple results to be packed
+sequentially into the destination. *Destination elwidth overrides still apply*
+
+When the destination elwidth is default (0b00) the following packing occurs
+into destination elements:
+
+- SVRM bits 6:7 equal to 0b00 - one result element packed into one bit of each
+  destination element (in the LSB)
+- SVRM bits 6:7 equal to 0b01 - two result elements packed into two bits of
+  destination element (in the bottom two LSBs)
+- SVRM bits 6:7 equal to 0b10 - four result elements packed into four bits of
+  destination element (in the bottom four LSBs)
+- SVRM bits 6:7 equal to 0b11 - eight result elements packed into four bits of
+  destination element (in the bottom four LSBs)
+
+When for example the destination elwidth is 8-bit (0b11) then the destination
+element widths are 8-bit, and the result elements (grouped up to 8) still fit
+neatly into each 8-bit destination element.
+
 **mfcrrweird**
 
 mode is encoded in XO and is 4 bits
@@ -146,6 +166,25 @@ Also as noted below, element-width override bits normally used
 on the source is instead used to allow multiple results to be packed
 into the destination.  *Destination elwidth overrides still apply*
 
+Unlike `crrweird` however, the results are 4-bit wide, so the packing
+will begin to spill over to other destination elements.  8 results per
+destination at 4-bits each still fits into destination elwidth at 32-bit,
+but for 16-bit and 8-bit obviously this does not fit, and must split
+across to the next element
+
+When for example destination elwidth is 16-bit (0b10) the following packing
+occurs:
+
+- SVRM bits 6:7 equal to 0b00 - one 4-bit result element packed into the
+  first 4-bits of the 16-bit destination element (in the first 4 LSBs)
+- SVRM bits 6:7 equal to 0b01 - two 4-bit result elements packed into the
+  first 8-bits of the 16-bit destination element (in the first 8 LSBs)
+- SVRM bits 6:7 equal to 0b10 - four 4-bit result elements packed into each
+  16-bit destination element
+- SVRM bits 6:7 equal to 0b11 - eight 4-bit result elements, the first four
+  of which are packed into the first 16-bit destination element, the
+  second four of which are packed into the second 16-bit destination element.
+
 **mtcrrweird**
 
 mode is encoded in XO and is 4 bits
-- 
2.30.2