# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum.  The OpenPOWER ISA was never designed with 16
bit in mind.  VLE was added 10 years ago, but only by way of marking
an entire 64k page as "VLE".  Because VLE has not been maintained, it is
not fully compatible with the current PowerISA.

Here, when embedding 16 bit into a predominantly 32 bit stream, using
an entire 16 bits just to switch into Compressed mode is itself a
significant overhead.  The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition, we would like to add SV-C32, which is a Vectorised version
of 16 bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64 as well.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging".  This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained

In the Vector context, the bit reserved by this last option could take
on an alternative meaning: it determines whether the instruction is
11-bit prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits for which
a use must be found:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit       exit 16bit mode 0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits only for the most commonly used
instructions.  In that case one of those 11 bits would still need to be
dedicated to indicating whether 16 bit mode is to be continued, so only
10 bits remain for actual opcodes!

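
The bit budget described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: bit 0 is taken to be the most significant bit (the usual Power ISA MSB0 numbering, matching the left-to-right diagrams), the major opcode is taken to be the 5 bits 0-4 as in the text, and the "stay in 16bit mode" bit is taken to be bit f.  The names `field` and `split_entry_word` are hypothetical.

```python
def field(word, first, last, width=16):
    """Extract bits first..last (inclusive) using MSB0 numbering.

    Assumption: bit 0 is the most significant bit of a `width`-bit word.
    """
    nbits = last - first + 1
    shift = width - 1 - last
    return (word >> shift) & ((1 << nbits) - 1)


def split_entry_word(word):
    """Split the 16-bit word that enters Compressed mode.

    Assumed layout (taken from the second diagram above):
    bits 0-4 major opcode, bits 5-e payload, bit f "stay in 16-bit mode".
    """
    return {
        "major": field(word, 0, 4),      # 5 bits of Major Opcode space
        "payload": field(word, 5, 14),   # the 10 bits left for opcodes
        "stay16": field(word, 15, 15),   # 1 bit: continue in 16-bit mode?
    }


word = (0b10101 << 11) | (0b1100110011 << 1) | 1
print(split_entry_word(word))
```

Of the 16 bits, 5 go to the major opcode and 1 to the mode bit, which is how the text arrives at 10 bits for actual opcodes.
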
# Opcode Allocation Ideas

* one bit from the 16-bit mode is used to indicate that 32-bit mode
  is to be dropped into for only one single instruction
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

Switching between different encoding modes is controlled by M (alone)
in 10-bit mode, and M and N in 16-bit mode.

* In 10-bit mode, M=0 indicates that the following instructions are
  standard OpenPOWER ISA 32-bit encoded (including, redundantly,
  further 10/16-bit instructions)
* In 10-bit mode, M=1 indicates that the following instructions are
  in 16-bit encoding mode

Once in 16-bit mode, N and M together (written as 0bNM) select:

* 0b01 (M=1, N=0): stay in 16-bit mode
* 0b00 (M=0, N=0): leave 16-bit mode permanently (return to standard OpenPOWER ISA)
* 0b10 (M=0, N=1): leave 16-bit mode for one instruction only (return to standard OpenPOWER ISA)
* 0b11 (M=1, N=1): free to be used for something completely different.

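
The mode-switching rules listed above can be read as a small state machine.  The following is a minimal sketch, not a definitive implementation: the state names are hypothetical, and the behaviour after a 0b11 instruction (assumed here to remain in 16-bit mode) is an assumption that the list does not state.

```python
# Hypothetical state names, used only for this illustration.
STANDARD = "standard"        # standard OpenPOWER ISA 32-bit stream
COMPRESSED = "16bit"         # 16-bit encoding mode
ONE_SHOT = "standard-once"   # one standard instruction, then back to 16-bit


def after_10bit(m):
    """Mode that follows a 10-bit instruction: only M is available."""
    return COMPRESSED if m == 1 else STANDARD


def after_16bit(m, n):
    """Mode that follows a 16-bit instruction, selected by N and M."""
    sel = (n << 1) | m           # written 0bNM in the list above
    if sel == 0b01:
        return COMPRESSED        # stay in 16-bit mode
    if sel == 0b00:
        return STANDARD          # leave 16-bit mode permanently
    if sel == 0b10:
        return ONE_SHOT          # leave for a single instruction
    # 0b11 is not a mode switch.  ASSUMPTION: decoding stays in 16-bit mode.
    return COMPRESSED


print(after_10bit(1), after_16bit(m=0, n=1), after_16bit(m=0, n=0))
```
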
The current "top" idea for 0b11 is to use it for a new encoding format
of predominantly "immediates-based" 16-bit instructions (branch-conditional,
addi, mulli etc.)

The Compressed Major Opcode is in bits 5-7.

* M+N mode-switching is not available for C-Major 0b001 or 0b111

### Immediate Opcodes

Only available in 16-bit mode, and only when M=1 and N=1.

    | 0 | 1  | 2 3 4 | | 567 | e | 89a  | b c | d   | e | f |
    | 1 | o2 |  RT   | | 010 | 1 | RB|0 | offs      | 1 | addi.
    | 1 | o2 |  RT   | | 011 | 1 | RB|0 | offs      | 1 | addis.
    | 1 | o2     | 0 | | 100 | 1 | RB   | offs      | 1 | cmpdi
    | 1 | o2     | 1 | | 100 | 1 | RB   | offs      | 1 | cmpwi
    | 1 | o2     | 0 | | 101 | 1 | RA   | offs      | 1 | ldi
    | 1 | o2     | 1 | | 101 | 1 | RA   | offs      | 1 | lwi
    | 1 | o2     | 0 | | 110 | 1 | RA   | offs      | 1 | flwi
    | 1 | o2     | 1 | | 110 | 1 | RA   | offs      | 1 | fldi

* Note that bc is included (below)
* the immediate is constructed from offs (LSBs) and o2 (MSBs)
* for loads, the offset is aligned.  8-byte: o2||offs||0b000, 4-byte: o2||offs||0b00
* RB|0: if RB is zero, addi. becomes "li"

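
The way a load offset is assembled from o2 and offs can be illustrated as follows.  This is a sketch under assumptions: the field widths are passed in as parameters because they differ between rows of the table, the result is treated as unsigned because the notes do not say whether it is sign-extended, and the function names are hypothetical.

```python
def concat(msb_field, lsb_field, lsb_width):
    """Concatenate two bit fields: msb_field || lsb_field."""
    return (msb_field << lsb_width) | lsb_field


def load_offset(o2, offs, offs_width, size_bytes):
    """Build an aligned load offset: o2 || offs || zero bits.

    8-byte loads append 0b000 and 4-byte loads append 0b00, as in the notes.
    ASSUMPTION: the offset is unsigned; `offs_width` is supplied by the caller.
    """
    if size_bytes not in (4, 8):
        raise ValueError("only 4-byte and 8-byte loads are described")
    align_bits = size_bytes.bit_length() - 1   # 4 -> 2 bits, 8 -> 3 bits
    return concat(o2, offs, offs_width) << align_bits


# 8-byte load with a 3-bit o2 and a 3-bit offs: 0b101 || 0b011 || 0b000
print(load_offset(0b101, 0b011, offs_width=3, size_bytes=8))
```
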
### Branch

10 bit mode may be expanded by 16 bit mode later, adding capabilities
that do not fit in the extremely limited space.

    | 16-bit mode | | 10-bit mode              |
    | 0 | 1 | 234 | | 567 | 8 9a | b | cd | e  | f |
    | 0 | 0   000 | | 000 | 0 00 | 0   00 | 0  | 0 | illeg
    | N | offs2   | | 000 | LK offs            | M | b, bl
    | 1 | offs2   | | 000 | LK | BI   | BO1 oo | 1 | bc, bcl
    | N | BO3 BI3 | | 001 | LK | 0 BI | BO     | M | bclr, bclrl

16 bit mode:

* bc only available when N,M=0b11
* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO3 extends BO
* bc offset constructed from oo as LSBs and offs2 as MSBs
* bc BI allows selection of all bits from CR0 or CR1
* bc CR check is always active (as if BO0=1) therefore BO1 inverts

10 bit mode:

* bc **not available** in 10-bit mode
* BO[0] enables CR check, BO[1] inverts check
* BI refers to CR0 only (4 bits of)
* no Branch Conditional with immediate
* no Absolute Address
* CTR mode allowed with BO[2] for b only.
* offs is signed and 2-byte aligned
* all branch targets are 2-byte aligned

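
As an illustration of the offset rules above, a branch target could be computed as sketched below.  Several points are assumptions rather than statements from this page: the field widths (passed as parameters), sign extension being applied to the combined offs2||offs value, and the displacement being relative to the address of the branch instruction itself, as with standard Power ISA relative branches.  The function names are hypothetical.

```python
def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's complement number."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_target(pc, offs, offs_width, offs2=0, offs2_width=0):
    """Compute a relative branch target with 2-byte alignment.

    offs2 (16-bit mode only) supplies the MSBs, offs the LSBs.
    ASSUMPTION: the target is relative to `pc`, the branch's own address.
    """
    raw = (offs2 << offs_width) | offs
    displacement = sign_extend(raw, offs2_width + offs_width) * 2
    return pc + displacement


# 10-bit mode: only offs exists (width of 6 is an assumed value)
print(hex(branch_target(0x1000, 0b000011, 6)))
# 16-bit mode: offs2 extends the offset in the MSBs
print(hex(branch_target(0x1000, 0b111110, 6, offs2=0b1111, offs2_width=4)))
```
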
### LD/ST

    | 16-bit mode       | | 10-bit mode             |
    | 0   | 1   | 2 3 4 | | 567 | e | 8 9 a | b c d | f |
    | RB2 | RA2 |  RT   | | 001 | 0 | 1  RA | 1  RB | M | fld
    | RA2 | RT2 |  RB   | | 001 | 1 | 1  RA | 1  RT | M | fst
    |     |     |  RT   | | 111 | 0 |  RA   |  RB   | M | ld
    |     |     |  RB   | | 111 | 1 |  RA   |  RT   | M | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"

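
The register-number extension used by fld can be sketched as below.  The notes only state that RA2 and RT2 are MSB extensions; treating RB2 the same way is an assumption, as are the function names.

```python
def extend_reg(ext_bit, base, base_width=2):
    """Prepend a 1-bit extension as the MSB of a 2-bit register number."""
    return (ext_bit << base_width) | base


def fld_operands(ra, rb, mode16=False, ra2=0, rb2=0, rt=None):
    """Resolve fld operands for 10-bit and 16-bit mode.

    10-bit mode: RA and RB are 2 bits each and RT is implicitly RB.
    16-bit mode: RT comes from bits 2-4; RA2 extends RA.
    ASSUMPTION: RB2 extends RB in the same way that RA2 extends RA.
    """
    if not mode16:
        return {"RT": rb, "RA": ra, "RB": rb}
    return {
        "RT": rt,
        "RA": extend_reg(ra2, ra),
        "RB": extend_reg(rb2, rb),
    }


print(fld_operands(ra=1, rb=2))
print(fld_operands(ra=1, rb=2, mode16=True, ra2=1, rb2=0, rt=5))
```
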
### Arithmetic

    | 16-bit mode   | | 10-bit mode           |
    | 0 | 1 | 2 3 4 | | 567 | e | 89a | b c d | f |
    | N |   |  RT   | | 010 | 0 | RB  | RA!=0 | M | add
    | N |   |  RT   | | 010 | 1 | RB  | RA    | M | mul
    | N |   | RT!=0 | | 011 | 0 | RB  | RA!=0 | M | sub.
    | N | 0 | 000   | | 011 | 0 | RB  | RA!=0 | M | cmpw
    | N | 1 | 000   | | 011 | 0 | RB  | RA!=0 | M | cmpl
    | N |   |  RT   | | 011 | 0 | RB  | 000   | M | neg.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
* RT is implicitly RB: "add RT(=RB), RA, RB"

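
The (RA|0) convention can be shown with a short sketch.  The operand order (RA|0) - RB follows from the note that sub. with RA=0 becomes neg.; the 64-bit wrap-around and the function names are assumptions made for this illustration.

```python
MASK64 = (1 << 64) - 1   # ASSUMPTION: 64-bit registers


def ra_or_zero(regs, ra):
    """(RA|0): register field 0 means a zero immediate, not register 0."""
    return 0 if ra == 0 else regs[ra]


def sub(regs, ra, rb):
    """Compute (RA|0) - RB, so that RA=0 yields the negation of RB."""
    return (ra_or_zero(regs, ra) - regs[rb]) & MASK64


regs = [7, 10, 3, 0, 0, 0, 0, 0]
print(sub(regs, ra=1, rb=2))        # ordinary subtract: 10 - 3
print(hex(sub(regs, ra=0, rb=2)))   # RA=0: behaves as neg., regs[0] is ignored
```
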
### Logical

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567 | e | 8 9 a | b c d | f |
    | N | 0 |  RT   | | 100 | 0 | RB    | RA!=0 | M | and
    | N | 0 |  RT   | | 100 | 1 | RB    | RA!=0 | M | nand
    | N | 0 |  RT   | | 101 | 0 | RB    | RA!=0 | M | or
    | N | 0 |  RT   | | 101 | 1 | RB    | RA!=0 | M | nor
    | N | 0 |  RT   | | 100 | 0 | RB    | 0 0 0 | M | extsw
    | N | 0 |  RT   | | 100 | 1 | RB    | 0 0 0 | M | cntlz
    | N | 0 |  RT   | | 101 | 0 | RB    | 0 0 0 | M | popcnt
    | N | 0 |  RT   | | 101 | 1 | RB    | 0 0 0 | M | not

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567 | e | 8 9 a | b c d | f |
    | N | 1 |  RT   | | 100 | 0 | RB    | RA!=0 | M |
    | N | 1 |  RT   | | 100 | 1 | RB    | RA!=0 | M |
    | N | 1 |  RT   | | 101 | 0 | RB    | RA!=0 | M | xor
    | N | 1 |  RT   | | 101 | 1 | RB    | RA!=0 | M | eqv (xnor)
    | N | 1 |  RT   | | 100 | 0 | RB    | 0 0 0 | M | extsb
    | N | 1 |  RT   | | 100 | 1 | RB    | 0 0 0 | M | cnttz
    | N | 1 |  RT   | | 101 | 0 | RB    | 0 0 0 | M |
    | N | 1 |  RT   | | 101 | 1 | RB    | 0 0 0 | M | extsh

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt, exts **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"

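
The two Logical tables can be transcribed into a lookup keyed on bit 1, the Compressed Major Opcode (bits 5-7) and bit e, with RA=0 selecting the unary variant.  The sketch below covers 16-bit mode only; `LOGICAL_16BIT` and `decode_logical` are hypothetical names, and `None` marks the rows left blank in the tables.

```python
# (bit 1, C-Major bits 5-7, bit e) -> (mnemonic if RA != 0, mnemonic if RA == 0)
LOGICAL_16BIT = {
    (0, 0b100, 0): ("and", "extsw"),
    (0, 0b100, 1): ("nand", "cntlz"),
    (0, 0b101, 0): ("or", "popcnt"),
    (0, 0b101, 1): ("nor", "not"),
    (1, 0b100, 0): (None, "extsb"),
    (1, 0b100, 1): (None, "cnttz"),
    (1, 0b101, 0): ("xor", None),
    (1, 0b101, 1): ("eqv", "extsh"),
}


def decode_logical(bit1, major, bit_e, ra):
    """Return the mnemonic for a 16-bit mode logical encoding, or None."""
    entry = LOGICAL_16BIT.get((bit1, major, bit_e))
    if entry is None:
        return None                  # not a Logical encoding at all
    binary_op, unary_op = entry
    return unary_op if ra == 0 else binary_op


print(decode_logical(0, 0b101, 1, ra=3))   # nor
print(decode_logical(0, 0b101, 1, ra=0))   # RA=0: nor becomes not
```
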
### Floating Point

Note here that elwidth overrides (SV Prefix) can be used to select FP16/32/64.

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567 | e | 8 9 a | b c d | f |
    | N |   |  RT   | | 011 | 1 | RB    | RA!=0 | M | fsub.
    | N | 0 |  RT   | | 110 | 0 | RB    | RA!=0 | M | fadd
    | N | 0 |  RT   | | 110 | 1 | RB    | RA!=0 | M | fmul
    | N | 0 |  RT   | | 011 | 1 | RB    | 0 0 0 | M | fneg.
    | N | 0 |  RT   | | 110 | 0 | RB    | 0 0 0 | M |
    | N | 0 |  RT   | | 110 | 1 | RB    | 0 0 0 | M |

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567 | e | 8 9 a | b c d | f |
    | N | 1 |  RT   | | 011 | 1 | RB    | RA!=0 | M |
    | N | 1 |  RT   | | 110 | 0 | RB    | RA!=0 | M |
    | N | 1 |  RT   | | 110 | 1 | RB    | RA!=0 | M | fdiv
    | N | 1 |  RT   | | 011 | 1 | RB    | 0 0 0 | M | fabs.
    | N | 1 |  RT   | | 110 | 0 | RB    | 0 0 0 | M | fmr.
    | N | 1 |  RT   | | 110 | 1 | RB    | 0 0 0 | M |

10 bit mode:

* fsub., fneg. and fmr. default CR target is CR1
* fmr. is **not available** in 10-bit mode
* fdiv is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 16-bit mode   | | 10-bit mode           |
    | 0 1 2 3 | 4   | | 567 | 8 9 a | b c d e | f |
    | 0 0 0 0 | BF2 | | 001 | 1  BF | 0  BFA  | M | mcrf
    | 0 0 0 1 | BA2 | | 001 | 1  BA | 0  BB   | M | crnor
    | 0 1 0 0 | BA2 | | 001 | 1  BA | 0  BB   | M | crandc
    | 0 1 1 0 | BA2 | | 001 | 1  BA | 0  BB   | M | crxor
    | 0 1 1 1 | BA2 | | 001 | 1  BA | 0  BB   | M | crnand
    | 1 0 0 0 | BA2 | | 001 | 1  BA | 0  BB   | M | crand
    | 1 0 0 1 | BA2 | | 001 | 1  BA | 0  BB   | M | creqv
    | 1 1 0 1 | BA2 | | 001 | 1  BA | 0  BB   | M | crorc
    | 1 1 1 0 | BA2 | | 001 | 1  BA | 0  BB   | M | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)

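
The CR operations in the table can be modelled as a lookup from the 4-bit selector in bits 0-3 to a single-bit boolean function, with the result written back to BA because the destination is the same as BA.  This is a sketch only: `CR_OPS` and `cr_op` are hypothetical names, the bit functions are the standard meanings of these Power ISA mnemonics, and mcrf is omitted since it moves a whole CR field rather than combining two bits.

```python
# 4-bit selector (bits 0-3) -> (mnemonic, function of the BA and BB bits)
CR_OPS = {
    0b0001: ("crnor",  lambda a, b: (a | b) ^ 1),
    0b0100: ("crandc", lambda a, b: a & (b ^ 1)),
    0b0110: ("crxor",  lambda a, b: a ^ b),
    0b0111: ("crnand", lambda a, b: (a & b) ^ 1),
    0b1000: ("crand",  lambda a, b: a & b),
    0b1001: ("creqv",  lambda a, b: (a ^ b) ^ 1),
    0b1101: ("crorc",  lambda a, b: a | (b ^ 1)),
    0b1110: ("cror",   lambda a, b: a | b),
}


def cr_op(selector, ba_bit, bb_bit):
    """Return (mnemonic, new value of the BA bit) for a 16-bit mode CR op.

    The destination is the same CR bit as BA, so the result replaces it.
    """
    mnemonic, fn = CR_OPS[selector]
    return mnemonic, fn(ba_bit, bb_bit)


print(cr_op(0b1000, 1, 1))   # crand
print(cr_op(0b0100, 1, 0))   # crandc
```
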
### System

cbank: Selection of Compressed-encoding "Bank".  Different "banks" give different
meanings to opcodes.
Example: CBank=0b001 is heavily optimised for Audio/Video
Encode/Decode.

    | 16-bit mode | | 10-bit mode             |
    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |       Bank2 | | 010 | CBank | 0 0 0 | 0 | M | cbank

**not available** in 10-bit mode:

    | 0 1 2 3 | 4  | | 567 | 8 9 a | b c d e  | f |
    | 1 1 1 1 | 0  | | 001 | 1  00 | 0  RT    | M | mtlr
    | 1 1 1 1 | 0  | | 001 | 1  01 | 0  RT    | M | mtctr
    | 1 1 1 1 | 0  | | 001 | 1  11 | 0  RT    | M | mtcr
    | 1 1 1 1 | 1  | | 001 | 1  00 | 0  RA    | M | mflr
    | 1 1 1 1 | 1  | | 001 | 1  01 | 0  RA    | M | mfctr
    | 1 1 1 1 | 1  | | 001 | 1  11 | 0  RA    | M | mfcr

### Unallocated

    | 0 1 2 3 | 4  | | 567 | 8 9 a | b c d e  | f |
    | 0 0 1 0 |    | | 001 | 1     | 0        | M |
    | 0 0 1 1 |    | | 001 | 1     | 0        | M |
    | 0 1 0 1 |    | | 001 | 1     | 0        | M |
    | 1 0 1 0 |    | | 001 | 1     | 0        | M |
    | 1 0 1 1 |    | | 001 | 1     | 0        | M |
    | 1 1 0 0 |    | | 001 | 1     | 0        | M |
    | 1 1 1 1 | 0  | | 001 | 1  10 | 0        | M |
    | 1 1 1 1 | 1  | | 001 | 1  10 | 0        | M |