(no commit message)

diff --git a/openpower/sv/16_bit_compressed.mdwn b/openpower/sv/16_bit_compressed.mdwn

index b93939257c5068f8350254142d91a89ea5d479d3..ac393a9ffcce12a74d8f3000eb4e53997f8907c3 100644
--- a/openpower/sv/16_bit_compressed.mdwn
+++ b/openpower/sv/16_bit_compressed.mdwn
@@ -1,9 +1,24 @@
  # 16 bit Compressed
  
+Similar to VLE (but without immediate-prefixing), this encoding is designed
+to fit on top of OpenPOWER ISA v3.0B when a "Modeswitch" bit is set (PCR
+is recommended). Note that Compressed is *mutually incompatible* with
+OpenPOWER v3.1B "prefixing", because Compressed requires both EXT000
+and EXT001 and v3.1B prefixes are themselves encoded in EXT001.
+Hypothetically it could be made to use opcodes other than EXT001,
+with some inconvenience (extra gates).  The incompatibility is
+"fixed" by swapping out of "Compressed" Mode and back into "Normal"
+(v3.1B) Mode, at runtime, as needed.
+
+Although initially intended to be augmented by Simple-V Prefixing (to
+add Vector context, width overrides such as IEEE754 FP16, and
+predication) without putting pressure on I-Cache power or size, this
+Compressed Encoding is not critically dependent *on* SV Prefixing, and
+may be used stand-alone.
+
  See:
  
  * <https://bugs.libre-soc.org/show_bug.cgi?id=238>
  * <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding
+* <http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-November/000210.html>
  
  This one is a conundrum.  OpenPOWER ISA was never designed with 16
  bit in mind.  VLE was added 10 years ago but only by way of marking
@@ -12,16 +27,21 @@ fully compatible with current PowerISA.
  
  Here, in order to embed 16 bit into a predominantly 32 bit stream the
  overhead of using an entire 16 bits just to switch into Compressed mode
-is itself a significant overhead.  The situation is made worse by 5 bits
-being taken up by Major Opcode space, leaving only 11 bits to allocate
+is itself a significant overhead.  The situation is made worse by 6 bits
+being taken up by Major Opcode space, leaving only 10 bits to allocate
  to actual instructions.
  
+Contrast this with RVC, which uses 3 of the 4 combinations of the first
+2 bits (anything with 0b00 to 0b10 in the LSBs) to indicate 16-bit, and
+the 4th (0b11) as a Huffman-style escape sequence, easily allowing
+standard 32-bit and 16-bit instructions to intermingle cleanly.  To
+achieve the same thing on OpenPOWER would require a whopping 24 6-bit
+Major Opcodes, which is clearly impractical: other schemes need to be
+devised.
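
A minimal sketch of that contrast, counting how many of the 65536 possible first halfwords each scheme hands to 16-bit instructions. It assumes MSB0 bit numbering for OpenPOWER (Major Opcode in bits 0-5 of the first halfword) and treats EXT000/EXT001 as the only Compressed opcodes; the function names are hypothetical.

```python
def rvc_is_16bit(first_halfword: int) -> bool:
    # RVC: LSBs 0b00, 0b01 or 0b10 mean 16-bit; 0b11 escapes to 32-bit and longer.
    return (first_halfword & 0b11) != 0b11


def power_c_is_16bit(first_halfword: int) -> bool:
    # Proposed OpenPOWER C (assumption: MSB0 numbering): 16-bit only when the
    # 6-bit Major Opcode, i.e. the top 6 bits of the halfword, is EXT000 or EXT001.
    major = (first_halfword >> 10) & 0x3F
    return major in (0, 1)


def count_16bit(classifier) -> int:
    # Number of halfword patterns that the classifier treats as 16-bit.
    return sum(1 for hw in range(1 << 16) if classifier(hw))


if __name__ == "__main__":
    total = 1 << 16
    for name, fn in (("RVC", rvc_is_16bit), ("OpenPOWER C", power_c_is_16bit)):
        print(f"{name}: {count_16bit(fn)}/{total} halfwords decode as 16-bit")
```

RVC gives 49152 of the 65536 patterns (three quarters) to 16-bit instructions, against 2048 (2 of 64 Major Opcodes) for an EXT000/EXT001 pair.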
+
  In addition we would like to add SV-C32 which is a Vectorised version
  of 16 bit Compressed, and ideally have a variant that adds the 27-bit
  prefix format from SV-P64, as well.
  
  Potential ways to reduce pressure on the 16 bit space are:
  
+* To use more than one v3.0B Major Opcode, preferably an odd-even
+  contiguous pair
  * To provide "paging".  This involves bank-switching to alternative optimised encodings for specific workloads
  * To enter "16 bit mode" for durations specified at the start
  * To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained
@@ -51,12 +71,21 @@ for the 16 bit operations (bank selection of which scalar regs)
  Another is to use the 11 bits for only the utmost commonly used
  instructions.  That being the case then even one of those 11 bits would
  also need to be dedicated to saying if 16 bit mode is to be continued.
-10 bits remain for actual opcodes!
+10 bits remain for actual opcodes, which is ridiculously tight.
+
+The reason for picking 2 contiguous Major v3.0B opcodes is illustrated below:
+
+    |0 1 2 3 4 5 6 7 8 9 a b c d e f|
+    |major op..0| LO Half C space   |
+    |major op..1| HI Half C space   |
+    |N N N N N|<--11 bits C space-->|
+
+If NNNNN is the same value for both (an even/odd pair of contiguous Major v3.0B Opcodes), only 5 bits need to be matched to detect Compressed, which saves gates at a critical part of the decode phase.
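
The saving can be illustrated behaviourally: for the EXT000/EXT001 pair, matching two full 6-bit Major Opcodes gives exactly the same answer as one 5-bit comparison on NNNNN, with bit 5 becoming the top bit of the 11-bit C space. This is a sketch assuming MSB0 numbering on a 16-bit halfword; the function names are hypothetical, and it models the logic only, not an actual gate count.

```python
def is_c_two_opcodes(hw: int) -> bool:
    # Naive check: extract the full 6-bit Major Opcode (bits 0-5, MSB0)
    # and compare it against two separate values.
    major = (hw >> 10) & 0x3F
    return major == 0b000000 or major == 0b000001


def is_c_shared_prefix(hw: int) -> bool:
    # Even/odd pair: EXT000 and EXT001 share NNNNN = 00000 in bits 0-4,
    # so a single 5-bit comparison is enough.
    return ((hw >> 11) & 0x1F) == 0b00000


def c_half(hw: int) -> int:
    # Bit 5 (LSB of the Major Opcode): 0 = LO half (EXT000), 1 = HI half (EXT001).
    return (hw >> 10) & 1


def c_space(hw: int) -> int:
    # The 11 bits left for Compressed encodings: bit 5 followed by bits 6-15.
    return hw & 0x7FF


def decoders_agree() -> bool:
    # Exhaustively confirm that both checks classify every halfword identically.
    return all(is_c_two_opcodes(h) == is_c_shared_prefix(h) for h in range(1 << 16))


if __name__ == "__main__":
    print("decoders agree:", decoders_agree())
```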
  
  # Opcode Allocation Ideas
  
-* one bit from the 16-bit mode is used to indicate that 32-bit mode
-  is to be dropped into for only one single instruction
+* one bit from the 16-bit mode is used to indicate that standard
+  (v3.0B) mode is to be dropped into for only one single instruction
    <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>
  
  ## Opcodes exploration (Attempt 1)
@@ -81,71 +110,174 @@ The current "top" idea for 0b11 is to use it for a new encoding format
  of predominantly "immediates-based" 16-bit instructions (branch-conditional,
  addi, mulli etc.)
  
-The Compressed Major Opcode is in bits 5-7.
+* The Compressed Major Opcode is in bits 5-7.
+* Minor opcode in bit 8.
+* In some cases bit 9 is taken as an additional sub-opcode, followed
+  by bits 0-4 (for CR operations)
+* M+N mode-switching is not available for C-Major.minor 0b001.1
+* 10 bit mode may be expanded by 16 bit mode, adding capabilities
+  that do not fit in the extremely limited space.
  
-* M+N mode-switching is not available for C-Major 0b000 or 0b111
+The mode-switching FSM below shows the relationship between v3.0B,
+C 10bit and C 16bit.  Note that 16-bit immediate mode remains in 16-bit.
  
-### Immediate Opcodes
+    | 0 | 1234 | 567  8 | 9abcde | f | explanation
+    |EXT000/1  | Cmaj.m | fields | 0 | 10bit then v3.0B
+    |EXT000/1  | Cmaj.m | fields | 1 | 10bit then 16bit
+    | 0 | flds | Cmaj.m | fields | 0 | 16bit then v3.0B
+    | 0 | flds | Cmaj.m | fields | 1 | 16bit then 16bit
+    | 1 | flds | Cmaj.m | fields | 0 | 16b then 1x v3.0B
+    | 1 | flds | Cmaj.m | fields | 1 | 16b/imm then 16bit
  
-only available in 16-bit mode, and only available when M=1 and N=1
+Notes:
  
-    | 0 | 1  | 2 3 4 | | 567 | 89a | b c | d   | e | f |
-    | 1 | o2 |  RT   | | 010 | RB  | offs      | 1 | addi.
-    | 1 | o2 |  RT   | | 011 | RB  | offs      | 1 | addis.
-    | 1 | o2 |       | | 100 |     | offs      | 1 | 
-    | 1 | o2 |  RT   | | 101 | RA  | offs      | 1 | ldi
-    | 1 | o2 |  RT   | | 110 | RA  | offs      | 1 | sti
+* Cmaj.m is the C major/minor opcode: 3 bits for major, 1 for minor
+* EXT000 and EXT001 are v3.0B Major Opcodes.  The first 5 bits
+  are zero, therefore the 6th bit is actually part of Cmaj.
+* "10bit then 16bit" means "this instruction is encoded C 10bit
+  and the following one in C 16bit"
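
A behavioural sketch of the FSM table above. It assumes MSB0 bit numbering (N is bit 0, M is bit 15) and that the "1x v3.0B" state returns to C 16bit after exactly one 32-bit instruction; the state names and the `step` function are hypothetical, and how the mode is entered in the first place (for example via the PCR Modeswitch bit) is not modelled.

```python
from enum import Enum


class Mode(Enum):
    V30B = "v3.0B"          # standard 32-bit encoding
    C16 = "C 16bit"         # every halfword is a 16-bit C instruction
    V30B_ONCE = "1x v3.0B"  # assumption: one v3.0B instruction, then back to C 16bit


def bit(hw: int, n: int) -> int:
    # MSB0 numbering: bit 0 is the most significant bit of the halfword.
    return (hw >> (15 - n)) & 1


def step(mode: Mode, hw: int) -> tuple[str, Mode]:
    """Classify the instruction whose first halfword is hw; return (kind, next mode)."""
    if mode is Mode.V30B:
        if ((hw >> 10) & 0x3F) in (0, 1):      # EXT000/EXT001: a C 10bit instruction
            return "C10", (Mode.C16 if bit(hw, 15) else Mode.V30B)
        return "v3.0B", Mode.V30B              # first half of a 32-bit instruction
    if mode is Mode.V30B_ONCE:
        return "v3.0B", Mode.C16               # assumption: exactly one 32-bit insn
    n, m = bit(hw, 0), bit(hw, 15)             # mode is Mode.C16
    if n == 0:
        return "C16", (Mode.C16 if m else Mode.V30B)
    return "C16", (Mode.C16 if m else Mode.V30B_ONCE)


if __name__ == "__main__":
    # v3.0B insn, 10bit insn with M=1, 16bit insn with N=1 M=0, one v3.0B insn
    mode = Mode.V30B
    for hw in (0x7C00, 0x0001, 0x8000, 0x7C00):
        kind, mode = step(mode, hw)
        print(f"{hw:#06x}: {kind:6} -> {mode.value}")
```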
  
-* Note that bc is included (below)
-* immediate is constructed from offs (LSBs) and o2 (MSB)
+### C Instruction Encoding types
  
-### Branch
+10-bit Opcode formats (all start with v3.0B EXT000 or EXT001
+Major Opcodes)
  
-10 bit mode may be expanded by 16 bit mode later, adding capabilities
-that do not fit in the extreme limited space.
+    | 01234    | 567  8 | 9  | a b | c  | d e | f | enc
+    | E01      | Cmaj.m | fld1     | fld2     | M | 10b
+    | E01      | Cmaj.m | offset              | M | 10b b
+    | E01      | 001.1  | S1 | fd1 | S2 | fd2 | M | 10b sub
+    | E01      | 111.m  | fld1     | fld2     | M | 10b LDST
+
+16-bit Opcode formats (including 10/16/v3.0B Switching)
+
+    | 0 | 1234 | 567  8 | 9  | a b | c  | d e | f | enc
+    | N | immf | Cmaj.m | fld1     | fld2     | M | 16b
+    | 1 | immf | Cmaj.m | fld1     | imm      | 1 | 16b imm
+    | fd3      | 001.1  | S1 | fd1 | S2 | fd2 | M | 16b sub
+    | N | fd4  | 111.m  | fld1     | fld2     | M | 16b LDST
+
+Notes:
+
+* fld1 and fld2 can contain reg numbers, immediates, or opcode
+  fields (BO, BI, LK)
+* S1 and S2 are further sub-selectors of C 001.1
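
The field positions in the two format tables can be expressed as a small decoder. This is a sketch assuming MSB0 bit numbering within a 16-bit halfword; `bits` and `decode_c` are hypothetical names, and the generic field names (fld1, fld2, immf, fd3) are kept as-is rather than interpreted per instruction.

```python
def bits(hw: int, start: int, end: int) -> int:
    # Bits start..end inclusive of a 16-bit halfword, MSB0 numbering.
    width = end - start + 1
    return (hw >> (15 - end)) & ((1 << width) - 1)


def decode_c(hw: int, mode16: bool) -> dict:
    """Split a C halfword into the generic fields of the format tables."""
    if not mode16 and bits(hw, 0, 4) != 0:
        raise ValueError("10-bit encodings must start with EXT000/EXT001")
    f = {"cmaj": bits(hw, 5, 7), "minor": bits(hw, 8, 8), "M": bits(hw, 15, 15)}
    if (f["cmaj"], f["minor"]) == (0b001, 1):
        # "sub" format: S1 and S2 are further sub-selectors
        f.update(S1=bits(hw, 9, 9), fd1=bits(hw, 10, 11),
                 S2=bits(hw, 12, 12), fd2=bits(hw, 13, 14))
        if mode16:
            f["fd3"] = bits(hw, 0, 4)
    else:
        f.update(fld1=bits(hw, 9, 11), fld2=bits(hw, 12, 14))
        if mode16:
            # bit 0 is N; bits 1-4 are immf (fd4 in the 16b LDST rows)
            f.update(N=bits(hw, 0, 0), immf=bits(hw, 1, 4))
    return f
```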
+
+### Immediate Opcodes
  
-    | 16-bit mode | | 10-bit mode                |
-    | 0 | 1 | 234 | | 567 | 8 9 a | b | c d | e  | f |
-    | BO2   | BI3 | | 000 | 0  BI | 0   BO  | LK | M | bclr
-    | BO2   | BI3 | | 000 | 0  BI | 1   BO  | LK | M | bctr
-    | N | offs2   | | 001 |    offs         | LK | M | b
-    | 1 | offs2   | | 001 | BI    | BO1 oo  | LK | 1 | bc
+Only available in 16-bit mode, when M=1 and N=1, and when Cmaj.m
+is not 0b001.1.
+
+    | 0 | 1  | 2 | 3 4 | | 567.8 | 9ab  | cde | f |
+    | 1 | 0  | 0   0 0 | | 001.0 |      | 000 | 1 | TBD
+    | 1 | 0  |  sh2    | | 001.0 | RA   | sh  | 1 | sradi.
+    | 1 | 1  | 0   0 0 | | 001.0 |      | 000 | 1 | TBD
+    | 1 | 1  | 0 | sh2 | | 001.0 | RA   | sh  | 1 | srawi.
+    | 1 | 1  | 1 |     | | 001.0 |      |     | 1 | TBD
+    | 1 | i2 |  RT     | | 010.0 | RA|0 | imm | 1 | addi
+    | 1 | i2           | | 010.1 | RA   | imm | 1 | addis
+    | 1 | i2           | | 011.0 | RA   | imm | 1 | cmpdi
+    | 1 | i2           | | 011.1 | RA   | imm | 1 | cmpwi
+    | 1 | i2           | | 100.0 | RT   | imm | 1 | stwi
+    | 1 | i2           | | 100.1 | RT   | imm | 1 | stdi
+    | 1 | i2           | | 101.0 | RA   | imm | 1 | ldi
+    | 1 | i2           | | 101.1 | RA   | imm | 1 | lwi
+    | 1 | i2 | RA      | | 110.0 | RT   | imm | 1 | fsti
+    | 1 | i2 | RA      | | 110.1 | RT   | imm | 1 | fstdi
+    | 1 | i2 | RT      | | 111.0 | RA   | imm | 1 | flwi
+    | 1 | i2 | RT      | | 111.1 | RA   | imm | 1 | fldi
+
+Construction of immediate:
+
+* addi is EXTS(i2||imm) to give a 4-bit range -8 to +7
+* addis is EXTS(i2||imm||000) to give a 10-bit range -512 to +504 in
+  increments of 8
+* all others are EXTS(i2||imm) to give a 7-bit range -64 to +63
+  (further for LD/ST due to word/dword-alignment)
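
A sketch of EXTS and the immediate construction. It assumes the field widths as read from the table above: i2 is 1 bit wide in the addi row and 4 bits wide (bits 1-4) in the addis, cmpdi-style and integer LD/ST rows, and imm is always the 3-bit cde field. The helper names are hypothetical.

```python
def exts(value: int, width: int) -> int:
    # EXTS: interpret the low `width` bits of value as two's complement.
    value &= (1 << width) - 1
    return value - (1 << width) if value & (1 << (width - 1)) else value


def c_imm(i2: int, i2_width: int, imm: int, shift: int = 0) -> int:
    """EXTS(i2 || imm || `shift` zero bits), with imm the 3-bit cde field."""
    raw = ((i2 << 3) | (imm & 0b111)) << shift
    return exts(raw, i2_width + 3 + shift)


def addi_imm(i2: int, imm: int) -> int:
    return c_imm(i2, 1, imm)            # EXTS(i2||imm), i2 assumed 1 bit


def addis_imm(i2: int, imm: int) -> int:
    return c_imm(i2, 4, imm, shift=3)   # EXTS(i2||imm||000), i2 assumed 4 bits


def other_imm(i2: int, imm: int) -> int:
    return c_imm(i2, 4, imm)            # e.g. cmpdi, cmpwi


def ld_std_offset(i2: int, imm: int) -> int:
    return c_imm(i2, 4, imm, shift=3)   # 8-byte aligned: i2||imm||0b000


def lw_stw_offset(i2: int, imm: int) -> int:
    return c_imm(i2, 4, imm, shift=2)   # 4-byte aligned: i2||imm||0b00


def value_range(fn, i2_bits: int) -> tuple[int, int]:
    # Exhaustively evaluate fn over every i2 / imm combination.
    values = [fn(i2, imm) for i2 in range(1 << i2_bits) for imm in range(8)]
    return min(values), max(values)


if __name__ == "__main__":
    for name, fn, i2_bits in (("addi", addi_imm, 1), ("addis", addis_imm, 4),
                              ("others", other_imm, 4), ("ld/std", ld_std_offset, 4),
                              ("lw/stw", lw_stw_offset, 4)):
        print(name, value_range(fn, i2_bits))
```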
+
+Further Notes:
+
+* bc also has an immediate mode, listed separately below in Branch section
+* for LD/ST, the offset is aligned: 8-byte is i2||imm||0b000,
+  4-byte is i2||imm||0b00
+* SV Prefix over-rides help provide alternative bitwidths for LD/ST
+* RA|0: if RA is zero, addi becomes "li"
+  - this only works if RT takes part of opcode
+  - mv is also possible by specifying an immediate of zero
+
+### Illegal and nop
+
+Note that illeg is all zeros, including in 16-bit mode.
+Given that C is allocated to OpenPOWER ISA Major opcodes EXT000 and
+EXT001, this ensures that in both 10-bit *and* 16-bit mode a 16-bit
+run of all zeros is considered "illegal", whilst 0b0000.0000.1000.0000
+is "nop".
+
+    | 16-bit mode | | 10-bit mode                 |
+    | 0 | 1 | 234 | | 567.8  | 9  ab | c   de | f |
+    | 0 | 0   000 | | 000.0  | 0  00 | 0   00 | 0 | illeg
+    | 0 | 0   000 | | 000.0  | 0  00 | 0   00 | 1 | nop
+
+16 bit mode only:
+
+    | 1 | 0   000 | | 000.0  | 0  00 | 0   00 | 0 | nop
+    | 1 | nonzero | | 000.0  | 0  00 | 0   00 | 0 | TBD
+
+Notes:
+
+* The 10-bit nop (bit 15, M=1) is intended for circumstances
+  where alignment to 32-bit is required before returning to v3.0B,
+  M=1 being an indication to
+  "return to Standard v3.0B Encoding Mode"
+* The 16-bit nop (bit 0, N=1) is intended for circumstances where a
+  return to Standard v3.0B Encoding is required for one cycle only,
+  at a point where alignment to a 32-bit boundary is needed
+* If for any reason multiple 16 bit nops are needed in succession
+  the 10-bit variant can be used, because each one returns to
+  Standard v3.0B Encoding Mode, each time.
+
+### Branch
+
+    | 16-bit mode | | 10-bit mode                 |
+    | 0 | 1 | 234 | | 567.8  | 9  ab | c   de | f |
+    | N | offs2   | | 000.LK | offs!=0        | M | b, bl
+    | 1 | offs2   | | 000.LK | BI    | BO1 oo | 1 | bc, bcl
+    | N | BO3 BI3 | | 001.0  | LK BI | BO     | M | bclr, bclrl
  
  16 bit mode:
  
  * bc only available when N,M=0b11
  * offs2 extends offset in MSBs
  * BI3 extends BI in MSBs to allow selection of full CR
-* BO2 extends BO
+* BO3 extends BO
  * bc offset constructed from oo as LSBs and offs2 as MSBs
  * bc BI allows selection of all bits from CR0 or CR1
  * bc CR check is always active (as if BO0=1) therefore BO1 inverts
  
  10 bit mode:
  
-* bc **not available**
+* illegal (all zeros) covers part of branch (offs=0,M=0,LK=0)
+* nop also covers part of branch (offs=0,M=0,LK=1)
+* bc **not available** in 10-bit mode
  * BO[0] enables CR check, BO[1] inverts check
  * BI refers to CR0 only (4 bits of)
  * no Branch Conditional with immediate
  * no Absolute Address
-* no CTR mode (TBD?)
+* CTR mode allowed with BO[2] for b only.
  * offs is to 2 byte (signed) aligned
  * all branches to 2 byte aligned
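
A sketch of how the branch offsets could be assembled from the fields above, assuming MSB0 numbering, that offsets count 2-byte units, and that they are relative to the address of the branch itself (as for v3.0B relative branches); that last point is an assumption, not stated in the table. `b_offset`, `bc_fields` and `target` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def bits(hw: int, start: int, end: int) -> int:
    # Bits start..end inclusive of a 16-bit halfword, MSB0 numbering.
    return (hw >> (15 - end)) & ((1 << (end - start + 1)) - 1)


def exts(value: int, width: int) -> int:
    # Sign-extend a width-bit two's-complement field.
    value &= (1 << width) - 1
    return value - (1 << width) if value & (1 << (width - 1)) else value


def b_offset(hw: int, mode16: bool) -> int:
    """Byte offset of C b/bl: offs is bits 9-14; offs2 (bits 1-4) adds MSBs in 16-bit mode."""
    if bits(hw, 9, 14) == 0:
        raise ValueError("offs=0 is not b: the encoding is shared with illeg/nop")
    value, width = bits(hw, 9, 14), 6
    if mode16:
        value |= bits(hw, 1, 4) << 6
        width += 4
    return exts(value, width) * 2           # offset counts 2-byte units


def bc_fields(hw: int) -> dict:
    """16-bit bc/bcl (N=M=1): offset is offs2 (MSBs) || oo (bits 13-14)."""
    return {
        "LK": bits(hw, 8, 8),
        "BI": bits(hw, 9, 11),              # 3 bits: any bit of CR0 or CR1
        "BO1": bits(hw, 12, 12),            # inverts the (always active) CR check
        "offset": exts((bits(hw, 1, 4) << 2) | bits(hw, 13, 14), 6) * 2,
    }


def target(cia: int, offset: int) -> int:
    # Assumption: relative to the address of the branch instruction itself.
    return (cia + offset) & MASK64
```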
  
  ### LD/ST
  
-    | 16-bit mode       | | 10-bit mode             |
-    | 0   | 1   | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
-    | RB2 | RA2 |  RT   | | 001 | 1  RA | 1  RB | 0 | M | fld
-    | RA2 | RT2 |  RB   | | 001 | 1  RA | 1  RT | 1 | M | fst
-    |     |     |  RT   | | 111 |  RA   |  RB   | 0 | M | ld
-    |     |     |  RB   | | 111 |  RA   |  RT   | 1 | M | st
+    | 16-bit mode      | | 10-bit mode               |
+    | 0   | 1  | 2 3 4 | | 567.8 | 9 a b | c d e | f |
+    | RA2 | SZ |  RB   | | 001.1 | 1  RA | 0  RT | M | st
+    | RA2 | SZ |  RB   | | 001.1 | 1  RA | 1  RT | M | fst
+    | N   | SZ |  RT   | | 111.0 |  RA   |  RB   | M | ld
+    | N   | SZ |  RT   | | 111.1 |  RA   |  RB   | M | fld
  
  * elwidth overrides can set different widths
  
  16 bit mode:
  
-* F=1 is FLD, FST
+* SZ=1 is 64 bit, SZ=0 is 32 bit
  * RA2 extends RA to 3 bits (MSB)
  * RT2 extends RT to 3 bits (MSB)
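
A decoding sketch for the LD/ST table in 16-bit mode, assuming MSB0 numbering and covering only what the table and notes pin down: st/fst versus ld/fld selection, RA2 extending RA in the MSB, and SZ selecting 32 or 64 bit. Register-number mapping and SV elwidth overrides are deliberately left out; `decode_ldst16` is a hypothetical name.

```python
def bits(hw: int, start: int, end: int) -> int:
    # Bits start..end inclusive of a 16-bit halfword, MSB0 numbering.
    return (hw >> (15 - end)) & ((1 << (end - start + 1)) - 1)


def decode_ldst16(hw: int) -> dict:
    """Decode the C LD/ST rows in 16-bit mode (field positions from the LD/ST table)."""
    cmaj, minor = bits(hw, 5, 7), bits(hw, 8, 8)
    size = 8 if bits(hw, 1, 1) else 4                 # SZ=1: 64 bit, SZ=0: 32 bit
    if (cmaj, minor) == (0b001, 1) and bits(hw, 9, 9) == 1:
        op = "fst" if bits(hw, 12, 12) else "st"      # bit c selects st / fst
        ra = (bits(hw, 0, 0) << 2) | bits(hw, 10, 11)  # RA2 || RA
    elif cmaj == 0b111:
        op = "fld" if minor else "ld"
        ra = bits(hw, 9, 11)                          # RA is already 3 bits here
    else:
        raise ValueError("not a C LD/ST encoding")
    return {"op": op, "RA": ra, "size": size}
```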
  
@@ -157,44 +289,58 @@ that do not fit in the extreme limited space.
  
  ### Arithmetic
  
-    | 16-bit mode   | | 10-bit mode             |
-    | 0 | 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
-    | N |   |  RT   | | 010 | RB    | RA!=0 | 0 | M | add
-    | N |   |  RT   | | 011 | RB    | RA!=0 | 0 | M | sub.
-    | N |   |  RT   | | 010 | RB    | RA    | 1 | M | mul
-    | N |   |  RT   | | 011 | RB    | 0 0 0 | 0 | M | neg.
+    | 16-bit mode | | 10-bit mode             |
+    | 0 | 1 | 234 | | 567.8 | 9ab | c d e | f |
+    | N | 0 | RT  | | 010.0 | RB  | RA!=0 | M | add
+    | N | 0 | RT  | | 010.1 | RB  | RA|0  | M | sub.
+    | N | 0 | BF  | | 011.0 | RB  | RA|0  | M | cmpl
  
-10 bit mode:
+Notes:
  
-* sub. default CR target is CR0
+* sub. and cmpl: default CR target is CR0
  * for (RA|0) when RA=0 the input is a zero immediate,
-  meaning that sub. becomes neg.
+  meaning that sub. becomes neg. and cmp becomes cmpi against zero
  * RT is implicitly RB: "add RT(=RB), RA, RB"
+* Opcode 0b010.0 RA=0 is not missing from the above:
+  it is a system-wide instruction, "cbank" (section below)
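
A behavioural sketch of the add / sub. rows and the RA|0 rule, assuming 64-bit registers and assuming sub. computes (RA|0) - RB, which is the operand order under which RA=0 turns sub. into neg. as the notes describe. The CR0 update is not modelled, and `ra_or_zero` / `c_add_sub` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


def ra_or_zero(gpr: list[int], ra: int) -> int:
    # (RA|0): a register field of 0 supplies the constant zero, not GPR[0].
    return 0 if ra == 0 else gpr[ra]


def c_add_sub(gpr: list[int], ra: int, rb: int, sub: bool) -> int:
    """C add / sub.: RT is implicitly RB, i.e. "add RT(=RB), RA, RB"."""
    if not sub and ra == 0:
        raise ValueError("add with RA=0 is the cbank encoding")
    a, b = ra_or_zero(gpr, ra), gpr[rb]
    # Assumption: sub. computes (RA|0) - RB, so that RA=0 yields neg.
    result = (a - b) if sub else (a + b)
    gpr[rb] = result & MASK64      # wrap to 64 bits; CR0 update not modelled
    return gpr[rb]
```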
+
+16 bit mode only:
+
+    | 0 | 1 | 234 | | 567.8 | 9ab | cde   | f |
+    | N | 1 | RA  | | 010.0 | RB  | RS    | 0 | sld.
+    | N | 1 | RA  | | 010.1 | RB  | RS!=0 | 0 | srd.
+    | N | 1 | RA  | | 010.1 | RB  | 000   | 0 | srad.
+    | N | 1 | BF  | | 011.0 | RB  | RA|0  | 0 | cmpw
+
+Notes:
+
+* for srad, RS=RA: "srad. RA(=RS), RS, RB"
+
  
  ### Logical
  
      | 16-bit mode   | | 10-bit mode             |
-    | 0 | 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
-    | N | 0 |  RT   | | 100 | RB    | RA!=0 | 0 | M | and
-    | N | 0 |  RT   | | 100 | RB    | RA!=0 | 1 | M | nand
-    | N | 0 |  RT   | | 101 | RB    | RA!=0 | 0 | M | or
-    | N | 0 |  RT   | | 101 | RB    | RA!=0 | 1 | M | nor
-    | N | 0 |  RT   | | 100 | RB    | 0 0 0 | 0 | M | extsw
-    | N | 0 |  RT   | | 100 | RB    | 0 0 0 | 1 | M | cntlz
-    | N | 0 |  RT   | | 101 | RB    | 0 0 0 | 0 | M | popcnt
-    | N | 0 |  RT   | | 101 | RB    | 0 0 0 | 1 | M | not
+    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
+    | N | 0 |  RT   | | 100.0 | RB  | RA!=0 | M | and
+    | N | 0 |  RT   | | 100.1 | RB  | RA!=0 | M | nand
+    | N | 0 |  RT   | | 101.0 | RB  | RA!=0 | M | or
+    | N | 0 |  RT   | | 101.1 | RB  | RA!=0 | M | nor
+    | N | 0 |  RT   | | 100.0 | RB  | 0 0 0 | M | extsw
+    | N | 0 |  RT   | | 100.1 | RB  | 0 0 0 | M | cntlz
+    | N | 0 |  RT   | | 101.0 | RB  | 0 0 0 | M | popcnt
+    | N | 0 |  RT   | | 101.1 | RB  | 0 0 0 | M | not
  
  16-bit mode only:
  
-    | 0 | 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
-    | N | 1 |  RT   | | 100 | RB    | RA!=0 | 0 | 1 | 
-    | N | 1 |  RT   | | 100 | RB    | RA!=0 | 1 | 1 | 
-    | N | 1 |  RT   | | 101 | RB    | RA!=0 | 0 | 1 | 
-    | N | 1 |  RT   | | 101 | RB    | RA!=0 | 1 | 1 |
-    | N | 1 |  RT   | | 100 | RB    | 0 0 0 | 0 | 1 | extsb
-    | N | 1 |  RT   | | 100 | RB    | 0 0 0 | 1 | 1 | 
-    | N | 1 |  RT   | | 101 | RB    | 0 0 0 | 0 | 1 | 
-    | N | 1 |  RT   | | 101 | RB    | 0 0 0 | 1 | 1 |
+    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
+    | N | 1 |  RT   | | 100.0 | RB  | RA!=0 | 0 | TBD
+    | N | 1 |  RT   | | 100.1 | RB  | RA!=0 | 0 | TBD
+    | N | 1 |  RT   | | 101.0 | RB  | RA!=0 | 0 | xor
+    | N | 1 |  RT   | | 101.1 | RB  | RA!=0 | 0 | eqv (xnor)
+    | N | 1 |  RT   | | 100.0 | RB  | 0 0 0 | 0 | extsb
+    | N | 1 |  RT   | | 100.1 | RB  | 0 0 0 | 0 | cnttz
+    | N | 1 |  RT   | | 101.0 | RB  | 0 0 0 | 0 | TBD
+    | N | 1 |  RT   | | 101.1 | RB  | 0 0 0 | 0 | extsh
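
The Logical tables reduce to a lookup keyed on Cmaj.m and on whether the RA field (bits c-e) is zero, with bit 1 selecting the 16-bit-only rows. A sketch, assuming MSB0 numbering; the table and function names are hypothetical and TBD rows decode as "TBD".

```python
def bits(hw: int, start: int, end: int) -> int:
    # Bits start..end inclusive of a 16-bit halfword, MSB0 numbering.
    return (hw >> (15 - end)) & ((1 << (end - start + 1)) - 1)


# (Cmaj.m, RA field is zero) -> mnemonic, transcribed from the Logical tables
LOGICAL_BASE = {
    ("100.0", False): "and", ("100.1", False): "nand",
    ("101.0", False): "or", ("101.1", False): "nor",
    ("100.0", True): "extsw", ("100.1", True): "cntlz",
    ("101.0", True): "popcnt", ("101.1", True): "not",
}
LOGICAL_16BIT_ONLY = {
    ("101.0", False): "xor", ("101.1", False): "eqv",
    ("100.0", True): "extsb", ("100.1", True): "cnttz",
    ("101.1", True): "extsh",
}


def decode_logical(hw: int, mode16: bool) -> str:
    cmaj, minor = bits(hw, 5, 7), bits(hw, 8, 8)
    if cmaj not in (0b100, 0b101):
        raise ValueError("not a Logical encoding")
    key = (f"{cmaj:03b}.{minor}", bits(hw, 12, 14) == 0)   # RA is bits c-e
    if mode16 and bits(hw, 1, 1):
        return LOGICAL_16BIT_ONLY.get(key, "TBD")          # bit 1 = 1 rows
    return LOGICAL_BASE[key]
```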
  
  10 bit mode:
  
@@ -208,23 +354,23 @@ that do not fit in the extreme limited space.
  Note here that elwidth overrides (SV Prefix) can be used to select FP16/32/64
  
      | 16-bit mode   | | 10-bit mode             |
-    | 0 | 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
-    | N |   |  RT   | | 011 | RB    | RA!=0 | 1 | M | fsub.
-    | N | 0 |  RT   | | 110 | RB    | RA!=0 | 0 | M | fadd
-    | N | 0 |  RT   | | 110 | RB    | RA!=0 | 1 | M | fmul
-    | N | 0 |  RT   | | 011 | RB    | 0 0 0 | 1 | M | fneg.
-    | N |   |  RT   | | 110 | RB    | 0 0 0 | 0 | M | 
-    | N |   |  RT   | | 110 | RB    | 0 0 0 | 1 | M | 
+    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
+    | N |   |  RT   | | 011.1 | RB  | RA!=0 | M | fsub.
+    | N | 0 |  RT   | | 110.0 | RB  | RA!=0 | M | fadd
+    | N | 0 |  RT   | | 110.1 | RB  | RA!=0 | M | fmul
+    | N | 0 |  RT   | | 011.1 | RB  | 0 0 0 | M | fneg.
+    | N | 0 |  RT   | | 110.0 | RB  | 0 0 0 | M |
+    | N | 0 |  RT   | | 110.1 | RB  | 0 0 0 | M |
  
  16-bit mode only:
  
-    | 0 | 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
-    | N | 1 |  RT   | | 011 | RB    | RA!=0 | 1 | M |
-    | N | 1 |  RT   | | 110 | RB    | RA!=0 | 0 | M | 
-    | N | 1 |  RT   | | 110 | RB    | RA!=0 | 1 | M | fdiv
-    | N | 1 |  RT   | | 011 | RB    | 0 0 0 | 1 | M | fabs.
-    | N |   |  RT   | | 110 | RB    | 0 0 0 | 0 | M | fmr.
-    | N |   |  RT   | | 110 | RB    | 0 0 0 | 1 | M | 
+    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
+    | N | 1 |  RT   | | 011.1 | RB  | RA!=0 | 0 |
+    | N | 1 |  RT   | | 110.0 | RB  | RA!=0 | 0 |
+    | N | 1 |  RT   | | 110.1 | RB  | RA!=0 | 0 | fdiv
+    | N | 1 |  RT   | | 011.1 | RB  | 0 0 0 | 0 | fabs.
+    | N | 1 |  RT   | | 110.0 | RB  | 0 0 0 | 0 | fmr.
+    | N | 1 |  RT   | | 110.1 | RB  | 0 0 0 | 0 |
  
  10 bit mode:
  
@@ -238,22 +384,22 @@ Note here that elwidth overrides (SV Prefix) can be used to select FP16/32/64
  
  ### Condition Register
  
-    | 16-bit mode   | | 10-bit mode           |
-    | 0 1 2 3 | 4   | | 567 | 8 9 a | b c d e | f |
-    | 0 0 0 0 | BF2 | | 000 | 1  BF | 0  BFA  | M | mcrf
-    | 0 0 0 1 | BA2 | | 000 | 1  BA | 0  BB   | M | crnor
-    | 0 1 0 0 | BA2 | | 000 | 1  BA | 0  BB   | M | crandc
-    | 0 1 1 0 | BA2 | | 000 | 1  BA | 0  BB   | M | crxor
-    | 0 1 1 1 | BA2 | | 000 | 1  BA | 0  BB   | M | crnand
-    | 1 0 0 0 | BA2 | | 000 | 1  BA | 0  BB   | M | crand
-    | 1 0 0 1 | BA2 | | 000 | 1  BA | 0  BB   | M | creqv
-    | 1 1 0 1 | BA2 | | 000 | 1  BA | 0  BB   | M | crorc
-    | 1 1 1 0 | BA2 | | 000 | 1  BA | 0  BB   | M | cror
+    | 16-bit mode   | | 10-bit mode            |
+    | 0 1 2 3 | 4   | | 567.8 | 9 ab | cde | f |
+    | 0 0 0 0 | BF2 | | 001.1 | 0 BF | BFA | M | mcrf
+    | 0 0 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | crnor
+    | 0 1 0 0 | BA2 | | 001.1 | 0 BA | BB  | M | crandc
+    | 0 1 1 0 | BA2 | | 001.1 | 0 BA | BB  | M | crxor
+    | 0 1 1 1 | BA2 | | 001.1 | 0 BA | BB  | M | crnand
+    | 1 0 0 0 | BA2 | | 001.1 | 0 BA | BB  | M | crand
+    | 1 0 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | creqv
+    | 1 1 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | crorc
+    | 1 1 1 0 | BA2 | | 001.1 | 0 BA | BB  | M | cror
  
  10 bit mode:
  
  * mcrf BF is only 2 bits which means the destination is only CR0-CR3
-* CR operations: **not available** in 10-bit mode
+* CR operations: **not available** in 10-bit mode (but mcrf is)
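
Because 10-bit encodings start with EXT000 or EXT001, bits 0-4 are always zero there, so the 4-bit selector in bits 0-3 can only be 0000 (mcrf) and the BF2 extension bit is zero, which is why 10-bit mcrf reaches only CR0-CR3. A sketch of this decode, assuming MSB0 numbering; `CR_OPS` and `decode_cr` are hypothetical names.

```python
def bits(hw: int, start: int, end: int) -> int:
    # Bits start..end inclusive of a 16-bit halfword, MSB0 numbering.
    return (hw >> (15 - end)) & ((1 << (end - start + 1)) - 1)


# 4-bit selector in bits 0-3 -> CR operation, from the Condition Register table
CR_OPS = {
    0b0000: "mcrf", 0b0001: "crnor", 0b0100: "crandc", 0b0110: "crxor",
    0b0111: "crnand", 0b1000: "crand", 0b1001: "creqv", 0b1101: "crorc",
    0b1110: "cror",
}


def decode_cr(hw: int) -> tuple[str, int, int]:
    """Return (mnemonic, BF/BA, BFA/BB) for the C 001.1 CR rows."""
    if (bits(hw, 5, 7), bits(hw, 8, 8), bits(hw, 9, 9)) != (0b001, 1, 0):
        raise ValueError("not in the C 001.1 CR/System sub-space")
    name = CR_OPS.get(bits(hw, 0, 3))
    if name is None:
        raise ValueError("System or unallocated encoding")
    # BF2/BA2 (bit 4) extends the 2-bit field in bits a-b to 3 bits.
    # In 10-bit mode bits 0-4 are the zero MSBs of EXT000/EXT001, so only
    # mcrf decodes and its BF is limited to CR0-CR3.
    first = (bits(hw, 4, 4) << 2) | bits(hw, 10, 11)
    return name, first, bits(hw, 12, 14)
```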
  
  16 bit mode:
  
@@ -267,33 +413,32 @@ SV (Vector Mode):
  
  ### System
  
-* cbank: Selection of Compressed-encoding "Bank".  Different "banks" give different
-meanings to opcodes.  Example: CBank=0b001 is heavily optimised to A/Video
-Encode/Decode.
+cbank: Selection of Compressed-encoding "Bank".  Different "banks"
+give different meanings to opcodes.  Example: CBank=0b001 is heavily
+optimised for A/Video Encode/Decode.  cbank borrows from add's encoding
+space (when RA==0).
  
      | 16-bit mode | | 10-bit mode             |
-    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
-    |       Bank2 | | 010 | CBank | 0 0 0 | 0 | M | cbank
+    | 0 | 1 2 3 4 | | 567.8 | 9ab   | cde | f |
+    | N | 0 Bank2 | | 010.0 | CBank | 000 | M | cbank
  
  **not available** in 10-bit mode:
  
-    | 0 1 2 3 | 4  | | 567 | 8 9 a | b c d e  | f |
-    | 1 1 1 1 | 0  | | 000 | 1  00 | 0  RT    | M | mtlr
-    | 1 1 1 1 | 0  | | 000 | 1  01 | 0  RT    | M | mtctr
-    | 1 1 1 1 | 0  | | 000 | 1  11 | 0  RT    | M | mtcr
-    | 1 1 1 1 | 1  | | 000 | 1  00 | 0  RA    | M | mflr
-    | 1 1 1 1 | 1  | | 000 | 1  01 | 0  RA    | M | mfctr
-    | 1 1 1 1 | 1  | | 000 | 1  11 | 0  RA    | M | mfcr
+    | 0 1 2 3 | 4  | | 567.8 | 9 ab | cde  | f |
+    | 1 1 1 1 | 0  | | 001.1 | 0 00 |  RT  | M | mtlr
+    | 1 1 1 1 | 0  | | 001.1 | 0 01 |  RT  | M | mtctr
+    | 1 1 1 1 | 0  | | 001.1 | 0 11 |  RT  | M | mtcr
+    | 1 1 1 1 | 1  | | 001.1 | 0 00 |  RA  | M | mflr
+    | 1 1 1 1 | 1  | | 001.1 | 0 01 |  RA  | M | mfctr
+    | 1 1 1 1 | 1  | | 001.1 | 0 11 |  RA  | M | mfcr
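
A sketch decoding the two System forms above, assuming MSB0 numbering. It also assumes Bank2 supplies the most significant bits of the bank number, by analogy with the other "2"-suffixed extension fields; the tables do not state this. `decode_system` and `SPR_SEL` are hypothetical names.

```python
def bits(hw: int, start: int, end: int) -> int:
    # Bits start..end inclusive of a 16-bit halfword, MSB0 numbering.
    return (hw >> (15 - end)) & ((1 << (end - start + 1)) - 1)


SPR_SEL = {0b00: "lr", 0b01: "ctr", 0b11: "cr"}   # bits a-b; 0b10 is unallocated


def decode_system(hw: int, mode16: bool) -> tuple[str, int]:
    cmaj, minor = bits(hw, 5, 7), bits(hw, 8, 8)
    if ((cmaj, minor) == (0b010, 0) and bits(hw, 12, 14) == 0
            and not (mode16 and bits(hw, 1, 1))):
        # cbank borrows add's encoding space where RA (bits c-e) is zero
        bank = bits(hw, 9, 11)
        if mode16:
            bank |= bits(hw, 2, 4) << 3   # assumption: Bank2 supplies the MSBs
        return "cbank", bank
    if (mode16 and (cmaj, minor) == (0b001, 1) and bits(hw, 9, 9) == 0
            and bits(hw, 0, 3) == 0b1111):
        spr = SPR_SEL.get(bits(hw, 10, 11))
        if spr is None:
            raise ValueError("unallocated System encoding")
        prefix = "mf" if bits(hw, 4, 4) else "mt"   # bit 4: 0 = mt*, 1 = mf*
        return prefix + spr, bits(hw, 12, 14)        # RT (mt*) or RA (mf*)
    raise ValueError("not a System encoding")
```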
  
  ### Unallocated
  
-    | 0 1 2 3 | 4  | | 567 | 8 9 a | b c d e  | f |
-    | 0 0 1 0 |    | | 000 | 1     | 0        | M |
-    | 0 0 1 1 |    | | 000 | 1     | 0        | M |
-    | 0 1 0 1 |    | | 000 | 1     | 0        | M |
-    | 1 0 1 0 |    | | 000 | 1     | 0        | M |
-    | 1 0 1 1 |    | | 000 | 1     | 0        | M |
-    | 1 1 0 0 |    | | 000 | 1     | 0        | M |
-    | 1 1 1 1 | 0  | | 000 | 1  10 | 0        | M |
-    | 1 1 1 1 | 1  | | 000 | 1  10 | 0        | M |
-
+    | 0 1 2 3 | 4  | | 567.8 | 9 ab | cde  | f |
+    | 0 0 1 0 |    | | 001.1 | 0    |      | M |
+    | 0 0 1 1 |    | | 001.1 | 0    |      | M |
+    | 0 1 0 1 |    | | 001.1 | 0    |      | M |
+    | 1 0 1 0 |    | | 001.1 | 0    |      | M |
+    | 1 0 1 1 |    | | 001.1 | 0    |      | M |
+    | 1 1 0 0 |    | | 001.1 | 0    |      | M |
+    | 1 1 1 1 |    | | 001.1 | 0 10 |      | M |