# 16 bit Compressed

Similar to VLE (but without immediate-prefixing), this encoding is designed
to fit on top of OpenPOWER ISA v3.0B when a "Modeswitch" bit is set (the PCR
is recommended for this). Note that Compressed is *mutually exclusive*
with OpenPOWER v3.1B "prefixing", because it uses (requires) both EXT000
and EXT001. Hypothetically it could be made to use a major opcode other than
EXT001, at some inconvenience (extra gates).  The incompatibility is
"fixed" by swapping out of "Compressed" Mode and back into "Normal"
(v3.1B) Mode, at runtime, as needed.

Although initially intended to be augmented by Simple-V Prefixing (to
add Vector context, width overrides such as IEEE754 FP16, and predication)
without putting pressure on I-Cache power or size, this Compressed Encoding
is not critically dependent *on* SV Prefixing, and may be used stand-alone.

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding
* <http://lists.mailinglist.openpowerfoundation.org/pipermail/openpower-hdl-cores/2020-November/000210.html>

This one is a conundrum.  The OpenPOWER ISA was never designed with 16
bit instructions in mind.  VLE was added 10 years ago, but only by marking
an entire 64k page as "VLE".  Because VLE has not been maintained, it is
not fully compatible with the current PowerISA.

Here, when embedding 16 bit instructions into a predominantly 32 bit
stream, spending an entire 16 bits just to switch into Compressed mode
is itself a significant overhead.  The situation is made worse by 6 bits
being taken up by Major Opcode space, leaving only 10 bits to allocate
to actual instructions.

Contrast this with RVC, which uses 3 out of the 4 combinations of the
first 2 bits to indicate 16-bit (anything with 0b00 to 0b10 in the LSBs),
and uses the 4th as a Huffman-style escape sequence, easily allowing
standard 32 bit and 16 bit instructions to intermingle cleanly.  Achieving
the same thing on OpenPOWER would require a whopping 48 of the 64 6-bit
Major Opcodes (three quarters of them), which is clearly impractical:
other schemes need to be devised.

In addition, we would like to add SV-C32, a Vectorised version
of 16 bit Compressed, and ideally also a variant that adds the 27-bit
prefix format from SV-P64.

Potential ways to reduce pressure on the 16 bit space are:

* To use more than one v3.0B Major Opcode, preferably an even-odd
  contiguous pair
* To provide "paging".  This involves bank-switching to alternative
  optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate whether
  16 bit mode is to continue

In the Vector context this last option could be given an alternative
meaning: the bit would determine whether the instruction is 11-bit
prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits for which a
use must be found:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit       exit 16bit mode 0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (selecting which bank of scalar regs is used).
However, the downside is that short sequences of Compressed instructions
are penalised by the fixed overhead.  Even a single 16 bit instruction
requires 16 bits of overhead to "gain access" to 16 bit "mode", making
the exercise pointless.

An alternative is to use the first 11 bits only for the most commonly
used instructions.  In that case, one of those 11 bits could be
dedicated to indicating whether 16 bit mode is to continue, at which
point *all* 16 bits of subsequent instructions can be used for Compressed.
That leaves 10 bits for actual opcodes in the first instruction, which is
ridiculously tight; however, the opportunity to subsequently use all 16
bits is worth it.

The reason for picking 2 contiguous Major v3.0B opcodes is illustrated below:

    |0 1 2 3 4 5 6 7 8 9 a b c d e f|
    |major op..0| LO Half C space   |
    |major op..1| HI Half C space   |
    |N N N N N|<--11 bits C space-->|

If NNNNN is the same for both (two contiguous Major v3.0B Opcodes), the
decoder only has to compare 5 bits rather than 6, which saves gates at a
critical part of the decode phase, and the 6th bit becomes the top bit of
the 11-bit C space.
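
A minimal sketch of that decode saving, assuming bit 0 is the most significant bit of the 16-bit halfword (the Power ISA MSB0 convention). The helper names are hypothetical, not taken from any existing decoder: with EXT000 and EXT001 as the pair, recognising a Compressed halfword is a single 5-bit zero test, and bit 5 simply falls through into the C space.

```python
# Hypothetical helpers, not part of any existing decoder.
# Bit numbering is MSB0: bit 0 is the most significant bit of the halfword.

def is_compressed(halfword: int) -> bool:
    """True when bits 0-4 are all zero, i.e. major opcode EXT000 or EXT001."""
    return ((halfword >> 11) & 0b11111) == 0

def c_space(halfword: int) -> int:
    """The 11 bits (5-15) available to the Compressed encodings."""
    return halfword & 0x7FF

def c_half(halfword: int) -> str:
    """Bit 5 (the last major-opcode bit) selects the LO or HI half of C space."""
    return "HI" if (halfword >> 10) & 1 else "LO"
```

For example, `0x0400` (EXT001 with all other bits clear) passes `is_compressed` and lands in the HI half, whereas `0x0800` has bit 4 set and is therefore an ordinary v3.0B major opcode.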

# Opcode Allocation Ideas

* one bit in 16-bit mode is used to indicate that standard
  (v3.0B) mode is to be entered for one single instruction only
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

Switching between the different encoding modes is controlled by M alone
in 10-bit mode, and by M and N together in 16-bit mode.

* In 10-bit mode, M=0 indicates that the following instructions are
  standard OpenPOWER ISA 32-bit encoded (which, redundantly, may include
  further 10/16-bit instructions)
* In 10-bit mode, M=1 indicates that the following instructions are
  in 16-bit encoding mode

Once in 16-bit mode, the N and M bits select:

* 0b01 (N=0, M=1): stay in 16-bit mode
* 0b00: leave 16-bit mode permanently (return to standard OpenPOWER ISA)
* 0b10: leave 16-bit mode for one instruction only (return to standard
  OpenPOWER ISA)
* 0b11: free to be used for something completely different.

The current "top" idea for 0b11 is to use it for a new encoding format
of predominantly "immediates-based" 16-bit instructions (branch-conditional,
addi, mulli etc.)

* The Compressed Major Opcode is in bits 5-7.
* The minor opcode is in bit 8.
* In some cases bit 9 is taken as an additional sub-opcode, followed
  by bits 0-4 (for CR operations)
* M+N mode-switching is not available for C-Major.minor 0b001.1
* 10 bit mode may be expanded by 16 bit mode, adding capabilities
  that do not fit in the extremely limited 10-bit space.

The mode-switching FSM below shows the relationship between v3.0B,
C 10bit and C 16bit.  16-bit immediate mode remains in 16-bit.

    | 0 | 1234 | 567  8 | 9abcde | f | explanation
    | EXT000/1 | Cmaj.m | fields | 0 | 10bit then v3.0B
    | EXT000/1 | Cmaj.m | fields | 1 | 10bit then 16bit
    | 0 | flds | Cmaj.m | fields | 0 | 16bit then v3.0B
    | 0 | flds | Cmaj.m | fields | 1 | 16bit then 16bit
    | 1 | flds | Cmaj.m | fields | 0 | 16b then 1x v3.0B
    | 1 | flds | Cmaj.m | fields | 1 | 16b/imm then 16bit

Notes:

* Cmaj.m is the C major/minor opcode: 3 bits for major, 1 for minor
* EXT000 and EXT001 are v3.0B Major Opcodes.  The first 5 bits
  are zero, therefore the 6th bit is actually part of Cmaj.
* "10bit then 16bit" means "this instruction is encoded C 10bit
  and the following one in C 16bit"
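
The table can be read as a small state machine. The following is a minimal, hypothetical model of it (`Mode` and `next_mode` are illustrative names, not from any existing simulator), again assuming MSB0 bit numbering so that N is the top bit and M the bottom bit of the first halfword. It ignores the C 0b001.1 sub-encoding, where bit 0 is not N, and does not try to define what happens if the single v3.0B instruction is itself EXT000/EXT001.

```python
from enum import Enum

class Mode(Enum):
    V3_0B = "v3.0B"         # standard 32-bit instructions
    C16 = "C 16bit"         # every halfword is a 16-bit C instruction
    ONE_V3_0B = "1x v3.0B"  # exactly one v3.0B instruction, then back to C16

def next_mode(mode: Mode, halfword: int) -> Mode:
    """Mode of the *next* instruction, given the current mode and the
    first halfword of the current instruction (MSB0 bit numbering)."""
    n = (halfword >> 15) & 1  # bit 0
    m = halfword & 1          # bit 15
    if mode is Mode.ONE_V3_0B:
        return Mode.C16       # the single v3.0B instruction has been consumed
    if mode is Mode.V3_0B:
        if (halfword >> 11) != 0:
            return Mode.V3_0B  # not EXT000/EXT001: ordinary 32-bit instruction
        # EXT000/EXT001: a C 10bit instruction, only M is examined
        return Mode.C16 if m else Mode.V3_0B
    # Mode.C16: N and M are examined together
    if n == 0:
        return Mode.C16 if m else Mode.V3_0B
    # N=1: M=1 is the 16-bit immediate form, M=0 drops out for one instruction
    return Mode.C16 if m else Mode.ONE_V3_0B
```

For example, starting from `Mode.V3_0B`, the halfword `0x0001` (EXT000 with M=1) switches to `Mode.C16`; a following `0x8000` (N=1, M=0, the 16-bit nop) then permits exactly one v3.0B instruction before `next_mode` returns to `Mode.C16`.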

### C Instruction Encoding types

10-bit Opcode formats (all start with v3.0B EXT000 or EXT001
Major Opcodes):

    | 01234    | 567  8 | 9  | a b | c  | d e | f | enc
    | E01      | Cmaj.m | fld1     | fld2     | M | 10b
    | E01      | Cmaj.m | offset              | M | 10b b
    | E01      | 001.1  | S1 | fd1 | S2 | fd2 | M | 10b sub
    | E01      | 111.m  | fld1     | fld2     | M | 10b LDST

16-bit Opcode formats (including 10/16/v3.0B Switching):

    | 0 | 1234 | 567  8 | 9  | a b | c  | d e | f | enc
    | N | immf | Cmaj.m | fld1     | fld2     | M | 16b
    | 1 | immf | Cmaj.m | fld1     | imm      | 1 | 16b imm
    | fd3      | 001.1  | S1 | fd1 | S2 | fd2 | M | 16b sub
    | N | fd4  | 111.m  | fld1     | fld2     | M | 16b LDST

Notes:

* fld1 and fld2 can contain reg numbers, immediates, or opcode
  fields (BO, BI, LK)
* S1 and S2 are further sub-selectors of C 001.1
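
The generic 10-bit format (first row of the 10-bit table) can be unpacked with one helper that extracts an MSB0 bit range. This sketch is hypothetical (`bits`, `C10`, `decode_c10` and `encode_c10` are illustrative names) and only splits the halfword into fields; it does not interpret them.

```python
from typing import NamedTuple

def bits(hw: int, first: int, last: int) -> int:
    """Bits first..last (inclusive, MSB0: bit 0 is the MSB) of a halfword."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)

class C10(NamedTuple):
    e01: int    # bits 0-4, zero for EXT000/EXT001
    cmaj: int   # bits 5-7, C major opcode
    minor: int  # bit 8, C minor opcode
    fld1: int   # bits 9-11
    fld2: int   # bits 12-14
    m: int      # bit 15, mode-switch bit

def decode_c10(hw: int) -> C10:
    return C10(bits(hw, 0, 4), bits(hw, 5, 7), bits(hw, 8, 8),
               bits(hw, 9, 11), bits(hw, 12, 14), bits(hw, 15, 15))

def encode_c10(cmaj: int, minor: int, fld1: int, fld2: int, m: int) -> int:
    """Inverse of decode_c10 for the EXT000/EXT001 case (bits 0-4 zero)."""
    return (cmaj << 8) | (minor << 7) | (fld1 << 4) | (fld2 << 1) | m
```

For example, `decode_c10(encode_c10(0b010, 0, 3, 2, 1))` gives `C10(e01=0, cmaj=2, minor=0, fld1=3, fld2=2, m=1)`.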

### Immediate Opcodes

Only available in 16-bit mode, only when M=1 and N=1,
and only when Cmaj.m is not 0b001.1.

Instruction counts from objdump on /bin/bash:

      466 extsw r1,r1
      649 stw r1,1(r1)
      691 lwz r1,1(r1)
      705 cmpdi r1,1
      791 cmpwi r1,1
      794 addis r1,r1,1
     1474 std r1,1(r1)
     1846 li r1,1
     2031 mr r1,r1
     2473 addi r1,r1,1
     3012 nop
     3028 ld r1,1(r1)

    | 0 | 1  | 2 | 3 4 | | 567.8 | 9ab  | cde | f |
    | 1 | 0  | 0   0 0 | | 001.0 |      | 000 | 1 | TBD
    | 1 | 0  |  sh2    | | 001.0 | RA   | sh  | 1 | sradi.
    | 1 | 1  | 0   0 0 | | 001.0 |      | 000 | 1 | TBD
    | 1 | 1  | 0 | sh2 | | 001.0 | RA   | sh  | 1 | srawi.
    | 1 | 1  | 1 |     | | 001.0 |      |     | 1 | TBD
    | 1 | i2 |  RT     | | 010.0 | RA|0 | imm | 1 | addi
    | 1 | 0 | i2       | | 010.1 | RA   | imm | 1 | cmpdi
    | 1 | 1 | i2       | | 010.1 | RA   | imm | 1 | cmpwi
    | 1 | 0 | i2       | | 011.0 | RT!=1| imm | 1 | ldspi
    | 1 | 1 | i2       | | 011.0 | RT!=1| imm | 1 | lwspi
    | 1 | 0 | i2       | | 011.1 | RT!=1| imm | 1 | stwspi
    | 1 | 1 | i2       | | 011.1 | RT!=1| imm | 1 | stdspi
    | 1 |              | | 011.0 | 001  |     | 1 | TBD
    | 1 |              | | 011.1 | 001  |     | 1 | TBD
    | 1 | i2           | | 100.0 | RT   | imm | 1 | stwi
    | 1 | i2           | | 100.1 | RT   | imm | 1 | stdi
    | 1 | i2           | | 101.0 | RA   | imm | 1 | ldi
    | 1 | i2           | | 101.1 | RA   | imm | 1 | lwi
    | 1 | i2 | RA      | | 110.0 | RT   | imm | 1 | fsti
    | 1 | i2 | RA      | | 110.1 | RT   | imm | 1 | fstdi
    | 1 | i2 | RT      | | 111.0 | RA   | imm | 1 | flwi
    | 1 | i2 | RT      | | 111.1 | RA   | imm | 1 | fldi

Construction of immediate:

* LD/ST r1 (SP) variants should be offset by -256,
  see <https://bugs.libre-soc.org/show_bug.cgi?id=238#c43>
  - SP variants map to e.g. ld RT, imm(r1)
  - SV Prefixing can be used to map r1 to alternate regs
* [1] not the same as v3.0B addis: the shift amount is smaller and actually
  still maps to within the v3.0B addi immediate range.
* addi is EXTS(i2||imm) to give a 4-bit range -8 to +7
* addis is EXTS(i2||imm||000) to give an 11-bit range -1024 to +1016
  in increments of 8
* all others are EXTS(i2||imm) to give a 7-bit range -64 to +63
  (further extended for LD/ST by the word/dword alignment)

Further Notes:

* bc also has an immediate mode, listed separately in the Branch section below
* for LD/ST the offset is aligned: 8-byte uses i2||imm||0b000, 4-byte
  uses i2||imm||0b00
* SV Prefix overrides help provide alternative bitwidths for LD/ST
* RA|0: if RA is zero, addi becomes "li"
  - this only works if RT takes part of the opcode
  - mr is also possible by specifying an immediate of zero
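
A minimal sketch of the immediate construction, assuming the i2 widths read off the immediate table (1 bit for addi, 4 bits for stwi/stdi/ldi/lwi) and that LD/ST alignment is applied by appending zero bits. `exts` and `c_imm` are hypothetical names. The -256 SP bias is deliberately not modelled, because the text does not say whether it is applied before or after alignment.

```python
def exts(value: int, width: int) -> int:
    """EXTS(): sign-extend the low `width` bits of value."""
    value &= (1 << width) - 1
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def c_imm(i2: int, i2_width: int, imm: int, align_shift: int = 0) -> int:
    """EXTS(i2 || imm), imm being the 3-bit field in bits c-e, then scaled
    by appending `align_shift` zero bits (2 for word, 3 for dword)."""
    field = ((i2 & ((1 << i2_width) - 1)) << 3) | (imm & 0b111)
    return exts(field, i2_width + 3) << align_shift
```

With a 1-bit i2 (addi) the result spans -8 to +7; with a 4-bit i2 it spans -64 to +63 before LD/ST scaling, so a dword access reaches -512 to +504 bytes.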

### Illegal and nop

Note that illeg is all zeros, including in 16-bit mode.
Because C is allocated to the OpenPOWER ISA Major opcodes EXT000 and
EXT001, this ensures that in both 10-bit *and* 16-bit mode a 16-bit
run of all zeros is considered "illegal", whilst 0b0000.0000.1000.0000
is "nop".

    | 16-bit mode | | 10-bit mode                 |
    | 0 | 1 | 234 | | 567.8  | 9  ab | c   de | f |
    | 0 | 0   000 | | 000.0  | 0  00 | 0   00 | 0 | illeg
    | 0 | 0   000 | | 000.0  | 0  00 | 0   00 | 1 | nop

16 bit mode only:

    | 1 | 0   000 | | 000.0  | 0  00 | 0   00 | 0 | nop
    | 1 | nonzero | | 000.0  | 0  00 | 0   00 | 0 | TBD

Notes:

* All-zeros being an illegal instruction is normal for ISAs.  Ensuring that
  this remains true at all times, i.e. in both 10 bit and 16 bit mode, is
  common sense.
* The 10-bit nop (bit 15, M=1) is intended for circumstances
  where alignment to 32-bit before returning to v3.0B is required,
  M=1 being an indication "return to Standard v3.0B Encoding Mode".
* The 16-bit nop (bit 0, N=1) is intended for circumstances where a
  return to Standard v3.0B Encoding is required for one instruction,
  and that instruction must be aligned to a 32-bit boundary.
  An example would be returning to "strict" (non-C) mode,
  where the PC must be word-aligned.
* If for any reason multiple 16 bit nops are needed in succession,
  the M=1 variant can be used, because each one returns to
  Standard v3.0B Encoding Mode each time.

In essence, the 2 nops are needed because there are 2 different C forms:
10 bit and 16 bit.

### Branch

    | 16-bit mode | | 10-bit mode                 |
    | 0 | 1 | 234 | | 567.8  | 9  ab | c   de | f |
    | N | offs2   | | 000.LK | offs!=0        | M | b, bl
    | 1 | offs2   | | 000.LK | BI    | BO1 oo | 1 | bc, bcl
    | N | BO3 BI3 | | 001.0  | LK BI | BO     | M | bclr, bclrl

16 bit mode:

* bc is only available when N,M=0b11
* offs2 extends the offset in the MSBs
* BI3 extends BI in the MSBs to allow selection of the full CR
* BO3 extends BO
* the bc offset is constructed from oo as LSBs and offs2 as MSBs
* bc BI allows selection of all bits from CR0 or CR1
* the bc CR check is always active (as if BO0=1), therefore BO1 inverts it

10 bit mode:

* illegal (all zeros) covers part of branch (offs=0,M=0,LK=0)
* nop also covers part of branch (offs=0,M=0,LK=1)
* bc is **not available** in 10-bit mode
* BO[0] enables the CR check, BO[1] inverts the check
* BI refers to CR0 only (its 4 bits)
* no Branch Conditional with immediate
* no Absolute Address
* CTR mode is allowed with BO[2], for b only.
* offs is a signed offset in units of 2 bytes
* all branch targets are 2-byte aligned
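
A sketch of the branch offset arithmetic. It assumes, as for v3.0B relative branches, that the target is relative to the address of the branch itself (here called `cia`), and that field widths follow the branch table: a 6-bit offs in 10-bit mode, a 4-bit offs2 above it in 16-bit mode, and offs2 above the 2-bit oo for bc. All names are hypothetical.

```python
def exts(value: int, width: int) -> int:
    """Sign-extend the low `width` bits of value."""
    value &= (1 << width) - 1
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def join_offs(offs2: int, lo: int, lo_width: int) -> int:
    """offs2 supplies the MSBs, `lo` (offs or oo) supplies the LSBs."""
    return (offs2 << lo_width) | (lo & ((1 << lo_width) - 1))

def branch_target(cia: int, offs: int, width: int) -> int:
    """Offsets are signed and count 2-byte units, so targets stay 2-byte aligned."""
    return cia + (exts(offs, width) << 1)

TEN_BIT_OFFS_WIDTH = 6          # 10-bit b: offs in bits 9-14
SIXTEEN_BIT_OFFS_WIDTH = 4 + 6  # 16-bit b: offs2 (bits 1-4) above offs
BC_OFFS_WIDTH = 4 + 2           # 16-bit bc: offs2 above oo (bits 13-14)
```

Under these assumptions a 10-bit `b` reaches -64 to +62 bytes, and a 16-bit `b` reaches -1024 to +1022 bytes.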

### LD/ST

    | 16-bit mode      | | 10-bit mode               |
    | 0   | 1  | 2 3 4 | | 567.8 | 9 a b | c d e | f |
    | RA2 | SZ |  RB   | | 001.1 | 1  RA | 0  RT | M | st
    | RA2 | SZ |  RB   | | 001.1 | 1  RA | 1  RT | M | fst
    | N   | SZ |  RT   | | 111.0 |  RA   |  RB   | M | ld
    | N   | SZ |  RT   | | 111.1 |  RA   |  RB   | M | fld

* elwidth overrides can set different widths

16 bit mode:

* SZ=1 is 64 bit, SZ=0 is 32 bit
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bits (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"
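
Decoding the 16-bit `st`/`fst` rows shows how RA2 is attached as the MSB of RA. This is a hypothetical sketch using MSB0 bit numbering as before; it only extracts fields, and attempts no address arithmetic, since the role of RB in the 16-bit form is not spelled out in the LD/ST table.

```python
def bits(hw: int, first: int, last: int) -> int:
    """Bits first..last (inclusive, MSB0) of a 16-bit halfword."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)

def decode_c16_store(hw: int) -> dict:
    """Split a 16-bit-mode st/fst halfword (C 001.1 with bit 9 set) into fields."""
    if bits(hw, 5, 8) != 0b0011 or bits(hw, 9, 9) != 1:
        raise ValueError("not a C 001.1 store encoding")
    return {
        "op": "fst" if bits(hw, 12, 12) else "st",
        "size": 8 if bits(hw, 1, 1) else 4,              # SZ=1: 64 bit, SZ=0: 32 bit
        "ra": (bits(hw, 0, 0) << 2) | bits(hw, 10, 11),  # RA2 || RA
        "rb": bits(hw, 2, 4),
        "rt": bits(hw, 13, 14),
        "m": bits(hw, 15, 15),
    }
```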

### Arithmetic

    | 16-bit mode | | 10-bit mode             |
    | 0 | 1 | 234 | | 567.8 | 9ab | c d e | f |
    | N | 0 | RT  | | 010.0 | RB  | RA!=0 | M | add
    | N | 0 | RT  | | 010.1 | RB  | RA|0  | M | sub.
    | N | 0 | BF  | | 011.0 | RB  | RA|0  | M | cmpl

Notes:

* sub. and cmpl: the default CR target is CR0
* for (RA|0), when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg. and cmpl becomes a compare against zero
* in 10-bit mode RT is implicitly RB: "add RT(=RB), RA, RB"
* Opcode 0b010.0 with RA=0 is not missing from the above:
  it is a system-wide instruction, "cbank" (see the System section below)

16 bit mode only:

    | 0 | 1 | 234 | | 567.8 | 9ab | cde   | f |
    | N | 1 | RA  | | 010.0 | RB  | RS    | 0 | sld.
    | N | 1 | RA  | | 010.1 | RB  | RS!=0 | 0 | srd.
    | N | 1 | RA  | | 010.1 | RB  | 000   | 0 | srad.
    | N | 1 | BF  | | 011.0 | RB  | RA|0  | 0 | cmpw

Notes:

* for srad, RS=RA: "srad. RA(=RS), RS, RB"
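
A minimal interpreter for the two 10-bit arithmetic rows, showing the (RA|0) rule and the implicit RT=RB. It assumes `sub.` follows the v3.0B extended mnemonic `sub RT,RA,RB` (RT = RA - RB), which is consistent with sub. becoming neg. when RA=0. `exec_c10_arith` is a hypothetical name, and neither the CR0 update implied by the `.` nor cbank itself is modelled.

```python
MASK64 = (1 << 64) - 1

def bits(hw: int, first: int, last: int) -> int:
    """Bits first..last (inclusive, MSB0) of a 16-bit halfword."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)

def exec_c10_arith(hw: int, gpr: list) -> str:
    """Execute a 10-bit add or sub. in place on `gpr`; return the mnemonic."""
    op = bits(hw, 5, 8)                    # Cmaj.m
    rb, ra = bits(hw, 9, 11), bits(hw, 12, 14)
    a = gpr[ra] if ra != 0 else 0          # (RA|0)
    if op == 0b0100:                       # 010.0
        if ra == 0:
            return "cbank"                 # RA=0 is cbank, not add (not executed here)
        gpr[rb] = (a + gpr[rb]) & MASK64   # add RT(=RB), RA, RB
        return "add"
    if op == 0b0101:                       # 010.1
        gpr[rb] = (a - gpr[rb]) & MASK64   # sub. RT(=RB), RA, RB
        return "neg." if ra == 0 else "sub."
    raise ValueError("not a 10-bit add/sub. encoding")
```

For example, with r2=10 and r3=5, the halfword `(0b0100 << 7) | (3 << 4) | (2 << 1)` is `add r3, r2, r3` and leaves 15 in r3.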

### Logical

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 0 |  RT   | | 100.0 | RB  | RA!=0 | M | and
    | N | 0 |  RT   | | 100.1 | RB  | RA!=0 | M | nand
    | N | 0 |  RT   | | 101.0 | RB  | RA!=0 | M | or
    | N | 0 |  RT   | | 101.1 | RB  | RA!=0 | M | nor
    | N | 0 |  RT   | | 100.0 | RB  | 0 0 0 | M | extsw
    | N | 0 |  RT   | | 100.1 | RB  | 0 0 0 | M | cntlz
    | N | 0 |  RT   | | 101.0 | RB  | 0 0 0 | M | popcnt
    | N | 0 |  RT   | | 101.1 | RB  | 0 0 0 | M | not

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 1 |  RT   | | 100.0 | RB  | RA!=0 | 0 | TBD
    | N | 1 |  RT   | | 100.1 | RB  | RA!=0 | 0 | TBD
    | N | 1 |  RT   | | 101.0 | RB  | RA!=0 | 0 | xor
    | N | 1 |  RT   | | 101.1 | RB  | RA!=0 | 0 | eqv (xnor)
    | N | 1 |  RT   | | 100.0 | RB  | 0 0 0 | 0 | extsb
    | N | 1 |  RT   | | 100.1 | RB  | 0 0 0 | 0 | cnttz
    | N | 1 |  RT   | | 101.0 | RB  | 0 0 0 | 0 | TBD
    | N | 1 |  RT   | | 101.1 | RB  | 0 0 0 | 0 | extsh

10 bit mode:

* for (RA|0), when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt and exts are **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"
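
The logical rows reuse each Cmaj.m value twice: RA!=0 gives a two-operand operation and RA=0 a one-operand one, while bit 1 selects the 16-bit-only column. A hypothetical lookup sketch (MSB0 bit numbering; `None` marks TBD slots). It decodes the two tables only and does not enforce which rows are available in 10-bit mode.

```python
from typing import Optional

def bits(hw: int, first: int, last: int) -> int:
    """Bits first..last (inclusive, MSB0) of a 16-bit halfword."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)

# Keyed by Cmaj.m (bits 5-8): 100.0, 100.1, 101.0, 101.1
BINARY = {0b1000: "and", 0b1001: "nand", 0b1010: "or", 0b1011: "nor"}
UNARY = {0b1000: "extsw", 0b1001: "cntlz", 0b1010: "popcnt", 0b1011: "not"}
BINARY_16 = {0b1000: None, 0b1001: None, 0b1010: "xor", 0b1011: "eqv"}
UNARY_16 = {0b1000: "extsb", 0b1001: "cnttz", 0b1010: None, 0b1011: "extsh"}

def logical_mnemonic(hw: int) -> Optional[str]:
    op = bits(hw, 5, 8)
    if op not in BINARY:
        raise ValueError("not a logical encoding")
    alt = bits(hw, 1, 1) == 1       # bit 1 set: 16-bit-only column
    unary = bits(hw, 12, 14) == 0   # RA field is 000
    if alt:
        return (UNARY_16 if unary else BINARY_16)[op]
    return (UNARY if unary else BINARY)[op]
```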

### Floating Point

Note that elwidth overrides (SV Prefix) can be used to select FP16/32/64.

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N |   |  RT   | | 011.1 | RB  | RA!=0 | M | fsub.
    | N | 0 |  RT   | | 110.0 | RB  | RA!=0 | M | fadd
    | N | 0 |  RT   | | 110.1 | RB  | RA!=0 | M | fmul
    | N | 0 |  RT   | | 011.1 | RB  | 0 0 0 | M | fneg.
    | N | 0 |  RT   | | 110.0 | RB  | 0 0 0 | M |
    | N | 0 |  RT   | | 110.1 | RB  | 0 0 0 | M |

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 1 |  RT   | | 011.1 | RB  | RA!=0 | 0 |
    | N | 1 |  RT   | | 110.0 | RB  | RA!=0 | 0 |
    | N | 1 |  RT   | | 110.1 | RB  | RA!=0 | 0 | fdiv
    | N | 1 |  RT   | | 011.1 | RB  | 0 0 0 | 0 | fabs.
    | N | 1 |  RT   | | 110.0 | RB  | 0 0 0 | 0 | fmr.
    | N | 1 |  RT   | | 110.1 | RB  | 0 0 0 | 0 |

16 bit only, FP to INT convert (using the C 0b001.1 sub-encoding):

    | 0123 | 4 | | 567.8 | 9 ab | cde  | f |
    | 0010 | X | | 001.1 | 0 RA | Y RT | M | fp2int
    | 0011 | X | | 001.1 | 0 RA | Y RT | M | int2fp

* X: signed=1, unsigned=0
* Y: FP32=0, FP64=1
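
The convert encoding packs its selector, signedness (X) and FP width (Y) into otherwise spare positions of the C 0b001.1 sub-encoding. A hypothetical field decoder (MSB0 bit numbering; `decode_fp_convert` is an illustrative name):

```python
def bits(hw: int, first: int, last: int) -> int:
    """Bits first..last (inclusive, MSB0) of a 16-bit halfword."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)

def decode_fp_convert(hw: int) -> dict:
    """Decode fp2int/int2fp: bits 0-3 select, bit 4 is X, bit 12 is Y."""
    if bits(hw, 5, 8) != 0b0011 or bits(hw, 9, 9) != 0:
        raise ValueError("not a C 001.1 sub-encoding with bit 9 clear")
    sel = bits(hw, 0, 3)
    if sel not in (0b0010, 0b0011):
        raise ValueError("not fp2int/int2fp")
    return {
        "op": "fp2int" if sel == 0b0010 else "int2fp",
        "signed": bits(hw, 4, 4) == 1,              # X: signed=1, unsigned=0
        "fp_bits": 64 if bits(hw, 12, 12) else 32,  # Y: FP32=0, FP64=1
        "ra": bits(hw, 10, 11),
        "rt": bits(hw, 13, 14),
        "m": bits(hw, 15, 15),
    }
```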

10 bit mode:

* the default CR target of fsub., fneg. and fmr. is CR1
* fmr. is **not available** in 10-bit mode
* fdiv is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 16-bit mode   | | 10-bit mode            |
    | 0 1 2 3 | 4   | | 567.8 | 9 ab | cde | f |
    | 0 0 0 0 | BF2 | | 001.1 | 0 BF | BFA | M | mcrf
    | 0 0 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | crnor
    | 0 1 0 0 | BA2 | | 001.1 | 0 BA | BB  | M | crandc
    | 0 1 1 0 | BA2 | | 001.1 | 0 BA | BB  | M | crxor
    | 0 1 1 1 | BA2 | | 001.1 | 0 BA | BB  | M | crnand
    | 1 0 0 0 | BA2 | | 001.1 | 0 BA | BB  | M | crand
    | 1 0 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | creqv
    | 1 1 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | crorc
    | 1 1 1 0 | BA2 | | 001.1 | 0 BA | BB  | M | cror

10 bit mode:

* mcrf BF is only 2 bits, which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode (but mcrf is)

16 bit mode:

* mcrf BF2 extends BF (in the MSB) to 3 bits
* CR operations: the destination is the same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)
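
The bits 0-3 codes in the Condition Register table appear to follow the same pattern as the v3.0B CR-logical extended opcodes: each code is the 4-entry truth table of the operation, and the v3.0B XO value is `(code << 5) | 1` (for example crand is XO 257 and cror is XO 449). If that holds, evaluating a C CR operation needs only a shift and a mask. A sketch with hypothetical names:

```python
CR_OPS = {
    0b0001: "crnor", 0b0100: "crandc", 0b0110: "crxor", 0b0111: "crnand",
    0b1000: "crand", 0b1001: "creqv", 0b1101: "crorc", 0b1110: "cror",
}

def cr_op(code: int, a: int, b: int) -> int:
    """Evaluate CR bit `a` OP `b`, treating the 4-bit code (bit 0 first) as a
    truth table: bit 0 -> (1,1), bit 1 -> (1,0), bit 2 -> (0,1), bit 3 -> (0,0)."""
    return (code >> (2 * a + b)) & 1

def v3_0b_xo(code: int) -> int:
    """Extended opcode of the matching v3.0B XL-form CR instruction."""
    return (code << 5) | 1
```

Because the destination is the same as BA in 16-bit mode, `crand` in this encoding would compute `CR[BA] = cr_op(0b1000, CR[BA], CR[BB])`.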

### System

cbank: selection of the Compressed-encoding "Bank".  Different "banks"
give different meanings to opcodes.  For example, CBank=0b001 is heavily
optimised for Audio/Video Encode/Decode.  cbank borrows from add's encoding
space (when RA==0).

    | 16-bit mode | | 10-bit mode             |
    | 0 | 1 2 3 4 | | 567.8 | 9ab   | cde | f |
    | N | 0 Bank2 | | 010.0 | CBank | 000 | M | cbank

**Not available** in 10-bit mode:

    | 0 1 2 3 | 4  | | 567.8 | 9 ab | cde  | f |
    | 1 1 1 1 | 0  | | 001.1 | 0 00 |  RT  | M | mtlr
    | 1 1 1 1 | 0  | | 001.1 | 0 01 |  RT  | M | mtctr
    | 1 1 1 1 | 0  | | 001.1 | 0 11 |  RT  | M | mtcr
    | 1 1 1 1 | 1  | | 001.1 | 0 00 |  RA  | M | mflr
    | 1 1 1 1 | 1  | | 001.1 | 0 01 |  RA  | M | mfctr
    | 1 1 1 1 | 1  | | 001.1 | 0 11 |  RA  | M | mfcr

### Unallocated

    | 0 1 2 3 | 4  | | 567.8 | 9 ab | cde  | f |
    | 0 1 0 1 |    | | 001.1 | 0    |      | M |
    | 1 0 1 0 |    | | 001.1 | 0    |      | M |
    | 1 0 1 1 |    | | 001.1 | 0    |      | M |
    | 1 1 0 0 |    | | 001.1 | 0    |      | M |
    | 1 1 1 1 |    | | 001.1 | 0 10 |      | M |