# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum.  OpenPOWER ISA was never designed with 16
bit in mind.  VLE was added 10 years ago, but only by way of marking
an entire 64k page as "VLE".  Because VLE is not maintained, it is not
fully compatible with current PowerISA.

Here, in order to embed 16 bit into a predominantly 32 bit stream,
spending an entire 16 bits just to switch into Compressed mode
is itself a significant overhead.  The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32, which is a Vectorised version
of 16 bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64 as well.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging".  This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained

The last of these would be useful in the Vector context, where the bit
can take on an alternative meaning: it determines whether the
instruction is 11-bit prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits for which
a use still has to be found:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit       exit 16bit mode 0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions.  In that case, one of those 11 bits would
also need to be dedicated to saying whether 16 bit mode is to be continued,
so only 10 bits remain for actual opcodes!
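
A minimal Python sketch of the continuation-bit idea in the diagram above. It assumes that bit f (the least significant bit of each halfword) is the stay/exit flag, and that the halfword carrying a 0 is itself still executed as a 16 bit instruction before the mode is left. The function names are hypothetical, purely for illustration:

```python
def split_entry_payload(payload11):
    """Split the 11 bits left after the major opcode into (opcode10, stay)."""
    stay = payload11 & 1                 # bit f: 1 = remain in 16 bit mode
    opcode10 = (payload11 >> 1) & 0x3FF  # the 10 bits left for an actual opcode
    return opcode10, stay


def compressed_run(halfwords):
    """Split the halfwords following the entry instruction into (run, rest).

    `run` holds the 16 bit instructions executed in 16 bit mode, up to and
    including the first one whose bit f is 0. `rest` is whatever follows,
    which would again be decoded as standard 32 bit instructions.
    """
    run = []
    for index, hw in enumerate(halfwords):
        run.append(hw)
        if (hw & 1) == 0:                # bit f clear: exit 16 bit mode
            return run, halfwords[index + 1:]
    return run, []
```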

# Opcode Allocation Ideas

* one bit from the 16-bit mode is used to indicate that 32-bit mode
  is to be dropped into for only one single instruction
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

Switching between different encoding modes is controlled by M (alone)
in 10-bit mode, and M and N in 16-bit mode.

* M in 10-bit mode if zero indicates that following instructions are
  standard OpenPOWER ISA 32-bit encoded (including, redundantly,
  further 10/16-bit instructions)
* M in 10-bit mode if 1 indicates that following instructions are
  in 16-bit encoding mode

Once in 16-bit mode:

* 0b01 (M=1, N=0): stay in 16-bit mode
* 0b00 (M=0, N=0): leave 16-bit mode permanently (return to standard OpenPOWER ISA)
* 0b10 (M=0, N=1): leave 16-bit mode for one instruction (return to standard OpenPOWER ISA)
* 0b11 (M=1, N=1): free to be used for something completely different.

The current "top" idea for 0b11 is to use it for a new encoding format
of predominantly "immediates-based" 16-bit instructions (branch-conditional,
addi, mulli etc.)
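
The mode switching described above can be sketched as a small next-state function. This is only an illustration: the state names are made up, and the behaviour after a 0b11 (immediates format) instruction is not stated on this page, so staying in 16-bit mode is an assumption that is flagged in the code:

```python
STD32 = "standard 32-bit"
C16 = "16-bit"
STD32_ONCE = "32-bit for one instruction, then 16-bit again"


def after_10bit(m):
    """Mode that follows a 10-bit instruction, which only has the M bit."""
    return C16 if m == 1 else STD32


def after_16bit(n, m):
    """Mode that follows a 16-bit instruction, selected by N and M."""
    nm = (n << 1) | m              # written 0bNM, as in the list above
    if nm == 0b01:
        return C16                 # stay in 16-bit mode
    if nm == 0b00:
        return STD32               # leave 16-bit mode permanently
    if nm == 0b10:
        return STD32_ONCE          # leave for a single instruction
    # 0b11 selects the immediates format. ASSUMPTION: the mode is unchanged.
    return C16
```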

* The Compressed Major Opcode is in bits 5-7.
* Minor opcode in bit 8.
* In some cases bit 9 is taken as an additional sub-opcode, followed
  by bits 0-4 (for CR operations)
* M+N mode-switching is not available for C-Major 0b001 or 0b111
* 10 bit mode may be expanded by 16 bit mode, adding capabilities
  that do not fit in the extremely limited space.
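
As a rough illustration of the field positions listed above, the following Python sketch pulls the fields out of a 16-bit halfword. It assumes bit 0 is the most significant bit (the left-most column in the diagrams on this page), with N in bit 0 and M in bit f as the tables show. The helper and dictionary key names are hypothetical:

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, bit 0 = MSB."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def split_fields(hw):
    """Split a halfword into the generic columns used by the tables."""
    return {
        "N": field(hw, 0, 0),          # only meaningful in 16-bit mode
        "bits_1_4": field(hw, 1, 4),   # only meaningful in 16-bit mode
        "major": field(hw, 5, 7),      # Compressed Major Opcode
        "minor": field(hw, 8, 8),      # Minor opcode
        "bits_9_b": field(hw, 9, 11),
        "bits_c_e": field(hw, 12, 14),
        "M": field(hw, 15, 15),
    }


# 1 0101 100 1 011 010 1, written left to right from bit 0 to bit f
example = split_fields(0b1010110010110101)
```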

### Immediate Opcodes

Only available in 16-bit mode, and only when M=1 and N=1.

    | 0 | 1  | 2 3 4 | | 567.8  | 9ab  | c d e | f |
    | 1 | i2 |  RT   | | 010.i3 | RB|0 | imm   | 1 | addi.
    | 1 | i2         | | 011.0  | RB   | imm   | 1 | cmpdi
    | 1 | i2         | | 011.1  | RB   | imm   | 1 | cmpwi
    | 1 | i2         | | 100.0  | RT   | imm   | 1 | sti
    | 1 | i2         | | 100.1  | RT   | imm   | 1 | fstwi
    | 1 | i2         | | 101.0  | RA   | imm   | 1 | ldi
    | 1 | i2         | | 101.1  | RA   | imm   | 1 | lwi
    | 1 | i2         | | 110.0  | RA   | imm   | 1 | flwi
    | 1 | i2         | | 110.1  | RA   | imm   | 1 | fldi

Construction of immediate:

* addi is EXTS(i3||i2||imm) to give a 5-bit range -16 to +15
* all others are EXTS(i2||imm) to give a 7-bit range -64 to +63
  (further for LD/ST due to word/dword-alignment)
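
A short Python sketch of the immediate construction. The field widths are read off the table and are therefore an assumption: for addi. both i3 and i2 are taken as 1 bit and imm as 3 bits, and for the other opcodes i2 is taken as the 4 bits spanning columns 1 to 4. Sign-extending a w-bit value gives the range -2^(w-1) to 2^(w-1)-1:

```python
def exts(value, width):
    """Sign-extend the low `width` bits of value (two's complement)."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def addi_immediate(i3, i2, imm):
    """EXTS(i3 || i2 || imm): 1 + 1 + 3 = 5 bits."""
    return exts((i3 << 4) | (i2 << 3) | imm, 5)


def other_immediate(i2, imm):
    """EXTS(i2 || imm): 4 + 3 = 7 bits (assumed widths, see above)."""
    return exts((i2 << 3) | imm, 7)
```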

Further Notes:

* bc also has an immediate mode, listed below in Branch section
* for LD/ST, offset is aligned.  8-byte: i2||imm||0b000, 4-byte: i2||imm||0b00
* SV Prefix over-rides help provide alternative bitwidths for LD/ST
* RB|0: if RB is zero, addi. becomes "li"
  - this only works if RT takes part of opcode
  - mv is also possible by specifying an immediate of zero
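
The notes above can be sketched as follows. This is a hedged illustration only: it assumes the 7-bit i2||imm value is sign-extended before the alignment zeros are appended, that the 4-byte case appends 0b00 to the same i2||imm value, and the function names are invented for the example:

```python
def exts(value, width):
    """Sign-extend the low `width` bits of value (two's complement)."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def ldst_offset(i2, imm, size_bytes):
    """Byte offset for an immediate LD/ST: EXTS(i2||imm) followed by zeros."""
    shift = {8: 3, 4: 2}[size_bytes]   # 0b000 for 8-byte, 0b00 for 4-byte
    return exts((i2 << 3) | imm, 7) << shift


def addi_alias(rb, imm):
    """Name the behaviour of addi. under the (RB|0) rule."""
    if rb == 0:
        return "li"      # zero is used as input in place of register 0
    if imm == 0:
        return "mv"      # adding zero copies RB to RT
    return "addi."
```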

### Branch

Note that illeg is all zeros and nop is all zeros apart from the minor
opcode (bit 8), including in the 16-bit mode.
Given that C is allocated to OpenPOWER ISA Major opcodes EXT000 and
EXT001 this ensures that in both 10-bit *and* 16-bit mode, a 16-bit
run of all zeros is considered "illegal" whilst 0b0000.0000.1000.0000
is "nop".

    | 16-bit mode | | 10-bit mode                 |
    | 0 | 1 | 234 | | 567.8  | 9  ab | c   de | f |
    | 0 | 0   000 | | 000.0  | 0  00 | 0   00 | 0 | illeg
    | 0 | 0   000 | | 000.1  | 0  00 | 0   00 | 0 | nop
    | N | offs2   | | 000.LK | offs!=0        | M | b, bl
    | 1 | offs2   | | 000.LK | BI    | BO1 oo | 1 | bc, bcl
    | N | BO3 BI3 | | 001.0  | LK BI | BO     | M | bclr, bclrl

16 bit mode:

* bc only available when N,M=0b11
* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO3 extends BO
* bc offset constructed from oo as LSBs and offs2 as MSBs
* bc BI allows selection of all bits from CR0 or CR1
* bc CR check is always active (as if BO0=1) therefore BO1 inverts

10 bit mode:

* illegal (all zeros) covers part of branch (offs=0,M=0,LK=0)
* nop also covers part of branch (offs=0,M=0,LK=1)
* bc **not available** in 10-bit mode
* BO[0] enables CR check, BO[1] inverts check
* BI selects one of the 4 bits of CR0 only
* no Branch Conditional with immediate
* no Absolute Address
* CTR mode allowed with BO[2] for b only.
* offs is signed and 2-byte aligned
* all branch targets are 2-byte aligned
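
One possible reading of the branch offset construction, as a Python sketch. The widths are taken from the table (offs is the 6 bits 9 to e, offs2 the 4 bits 1 to 4, oo the 2 bits d and e). It is an assumption here that the concatenated value is sign-extended and then shifted left by one to give a 2-byte aligned displacement. The offs=0 cases that overlap illeg and nop are not handled:

```python
def exts(value, width):
    """Sign-extend the low `width` bits of value (two's complement)."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def b_displacement(offs, offs2=None):
    """Byte displacement of b/bl.

    10-bit mode: only the 6-bit offs exists (pass offs2=None).
    16-bit mode: offs2 supplies 4 extra most-significant bits.
    """
    if offs2 is None:
        value, width = offs, 6
    else:
        value, width = (offs2 << 6) | offs, 10
    return exts(value, width) << 1     # 2-byte aligned


def bc_displacement(offs2, oo):
    """Byte displacement of bc/bcl: offs2 as MSBs, oo as LSBs (6 bits)."""
    return exts((offs2 << 2) | oo, 6) << 1
```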

### LD/ST

    | 16-bit mode       | | 10-bit mode               |
    | 0   | 1   | 2 3 4 | | 567.8 | 9 a b | c d e | f |
    | RB2 | RA2 |  RT   | | 001.1 | 1  RA | 0  RB | M | fld
    | RA2 | RT2 |  RB   | | 001.1 | 1  RA | 1  RT | M | fst
    |     |     |  RT   | | 111.0 |  RA   |  RB   | M | ld
    |     |     |  RB   | | 111.1 |  RA   |  RT   | M | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"
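
To illustrate how the extension bits widen a 2-bit register field, here is a hedged Python sketch that decodes the register numbers of the fld row in 16-bit mode. The bit positions are read off the table (bit 0 = MSB), and the function names are hypothetical:

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, bit 0 = MSB."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def extend(msb, low2):
    """Place the extension bit above a 2-bit register field (3 bits total)."""
    return (msb << 2) | low2


def decode_fld_registers(hw):
    """Register numbers of fld in 16-bit mode, following the table row."""
    rb2 = field(hw, 0, 0)
    ra2 = field(hw, 1, 1)
    return {
        "RT": field(hw, 2, 4),
        "RA": extend(ra2, field(hw, 10, 11)),   # bits a-b, extended by RA2
        "RB": extend(rb2, field(hw, 13, 14)),   # bits d-e, extended by RB2
    }


# RB2=1 RA2=0 RT=101 001.1 1 RA=10 0 RB=01 M=1
regs = decode_fld_registers(0b1010100111100011)
```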

### Arithmetic

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N |   |  RT   | | 010.0 | RB  | RA!=0 | M | add
    | N |   |  RT   | | 010.1 | RB  | RA    | M | mul
    | N |   | RT!=0 | | 011.0 | RB  | RA!=0 | M | sub.
    | N | 0 | 000   | | 011.0 | RB  | RA!=0 | M | cmpw
    | N | 1 | 000   | | 011.0 | RB  | RA!=0 | M | cmpl
    | N |   |  RT   | | 011.0 | RB  | 000   | M | neg.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
* RT is implicitly RB: "add RT(=RB), RA, RB"
* Opcode 0b010.0 RA=0 is not missing from the above:
  it is a system-wide instruction, "cbank" (section below)
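
The arithmetic table and its notes can be read as the following decode sketch in Python. It is a simplified illustration, not a reference decoder: the tuple layout and function names are invented, and only the rows of the table above are covered:

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, bit 0 = MSB."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def decode_arith(hw, mode16):
    """Return a tuple naming the arithmetic instruction, or None."""
    op = field(hw, 5, 8)                     # C-major (3 bits) then minor (1 bit)
    rb = field(hw, 9, 11)
    ra = field(hw, 12, 14)
    rt = field(hw, 2, 4) if mode16 else rb   # 10-bit mode: RT is implicitly RB
    if op == 0b0100:
        # RA=0 is taken by the system-wide cbank instruction
        return ("add", rt, ra, rb) if ra != 0 else ("cbank",)
    if op == 0b0101:
        return ("mul", rt, ra, rb)
    if op == 0b0110:
        if ra == 0:
            return ("neg.", rt, rb)          # (RA|0): zero input turns sub. into neg.
        if mode16 and rt == 0:
            # RT=000 in 16-bit mode selects a compare; bit 1 picks cmpw or cmpl
            return ("cmpl" if field(hw, 1, 1) else "cmpw", ra, rb)
        return ("sub.", rt, ra, rb)
    return None
```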

### Logical

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 0 |  RT   | | 100.0 | RB  | RA!=0 | M | and
    | N | 0 |  RT   | | 100.1 | RB  | RA!=0 | M | nand
    | N | 0 |  RT   | | 101.0 | RB  | RA!=0 | M | or
    | N | 0 |  RT   | | 101.1 | RB  | RA!=0 | M | nor
    | N | 0 |  RT   | | 100.0 | RB  | 0 0 0 | M | extsw
    | N | 0 |  RT   | | 100.1 | RB  | 0 0 0 | M | cntlz
    | N | 0 |  RT   | | 101.0 | RB  | 0 0 0 | M | popcnt
    | N | 0 |  RT   | | 101.1 | RB  | 0 0 0 | M | not

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 1 |  RT   | | 100.0 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 100.1 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 101.0 | RB  | RA!=0 | M | xor
    | N | 1 |  RT   | | 101.1 | RB  | RA!=0 | M | eqv (xnor)
    | N | 1 |  RT   | | 100.0 | RB  | 0 0 0 | M | extsb
    | N | 1 |  RT   | | 100.1 | RB  | 0 0 0 | M | cnttz
    | N | 1 |  RT   | | 101.0 | RB  | 0 0 0 | M |
    | N | 1 |  RT   | | 101.1 | RB  | 0 0 0 | M | extsh

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt, exts **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"

### Floating Point

Note here that elwidth overrides (SV Prefix) can be used to select FP16/32/64

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N |   |  RT   | | 011.1 | RB  | RA!=0 | M | fsub.
    | N | 0 |  RT   | | 110.0 | RB  | RA!=0 | M | fadd
    | N | 0 |  RT   | | 110.1 | RB  | RA!=0 | M | fmul
    | N | 0 |  RT   | | 011.1 | RB  | 0 0 0 | M | fneg.
    | N | 0 |  RT   | | 110.0 | RB  | 0 0 0 | M |
    | N | 0 |  RT   | | 110.1 | RB  | 0 0 0 | M |

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 1 |  RT   | | 011.1 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 110.0 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 110.1 | RB  | RA!=0 | M | fdiv
    | N | 1 |  RT   | | 011.1 | RB  | 0 0 0 | M | fabs.
    | N | 1 |  RT   | | 110.0 | RB  | 0 0 0 | M | fmr.
    | N | 1 |  RT   | | 110.1 | RB  | 0 0 0 | M |

10 bit mode:

* fsub. fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode
* fdiv is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 16-bit mode   | | 10-bit mode            |
    | 0 1 2 3 | 4   | | 567.8 | 9 ab | cde | f |
    | 0 0 0 0 | BF2 | | 001.1 | 0 BF | BFA | M | mcrf
    | 0 0 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | crnor
    | 0 1 0 0 | BA2 | | 001.1 | 0 BA | BB  | M | crandc
    | 0 1 1 0 | BA2 | | 001.1 | 0 BA | BB  | M | crxor
    | 0 1 1 1 | BA2 | | 001.1 | 0 BA | BB  | M | crnand
    | 1 0 0 0 | BA2 | | 001.1 | 0 BA | BB  | M | crand
    | 1 0 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | creqv
    | 1 1 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | crorc
    | 1 1 1 0 | BA2 | | 001.1 | 0 BA | BB  | M | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode (but mcrf is)

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1
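
An observation rather than something this page states: taking the usual Power ISA meanings of the CR logical operations (for example crandc is BA AND NOT BB, crorc is BA OR NOT BB), the eight 4-bit codes in bits 0-3 coincide with truth tables indexed by (BA<<1)|BB. The Python sketch below checks that reading. mcrf (0b0000) is a move and is left out, and all names are hypothetical:

```python
CR_CODES = {
    "crnor": 0b0001, "crandc": 0b0100, "crxor": 0b0110, "crnand": 0b0111,
    "crand": 0b1000, "creqv": 0b1001, "crorc": 0b1101, "cror": 0b1110,
}

# Reference semantics on single bits (results are 0 or 1).
CR_REFERENCE = {
    "crnor": lambda a, b: 1 - (a | b),
    "crandc": lambda a, b: a & (1 - b),
    "crxor": lambda a, b: a ^ b,
    "crnand": lambda a, b: 1 - (a & b),
    "crand": lambda a, b: a & b,
    "creqv": lambda a, b: 1 - (a ^ b),
    "crorc": lambda a, b: a | (1 - b),
    "cror": lambda a, b: a | b,
}


def cr_result(code, ba, bb):
    """Look the result up in the 4-bit code, used as a truth table."""
    return (code >> ((ba << 1) | bb)) & 1


def codes_match_truth_tables():
    """True if every code reproduces its reference operation."""
    return all(
        cr_result(CR_CODES[name], a, b) == CR_REFERENCE[name](a, b)
        for name in CR_CODES
        for a in (0, 1)
        for b in (0, 1)
    )
```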

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)

### System

cbank: Selection of Compressed-encoding "Bank".  Different "banks" give different
meanings to opcodes.
Example: CBank=0b001 is heavily optimised for Audio/Video
Encode/Decode.

    | 16-bit mode | | 10-bit mode               |
    | 0 | 1 2 3 4 | | 567.8 | 9 a b | c d e | f |
    | N |   Bank2 | | 010.0 | CBank | 0 0 0 | M | cbank

**not available** in 10-bit mode:

    | 0 1 2 3 | 4  | | 567.8 | 9 ab | c d e  | f |
    | 1 1 1 1 | 0  | | 001.1 | 0 00 |  RT    | M | mtlr
    | 1 1 1 1 | 0  | | 001.1 | 0 01 |  RT    | M | mtctr
    | 1 1 1 1 | 0  | | 001.1 | 0 11 |  RT    | M | mtcr
    | 1 1 1 1 | 1  | | 001.1 | 0 00 |  RA    | M | mflr
    | 1 1 1 1 | 1  | | 001.1 | 0 01 |  RA    | M | mfctr
    | 1 1 1 1 | 1  | | 001.1 | 0 11 |  RA    | M | mfcr

### Unallocated

    | 0 1 2 3 | 4  | | 567.8 | 9 ab | c d e  | f |
    | 0 0 1 0 |    | | 001.1 | 0    |        | M |
    | 0 0 1 1 |    | | 001.1 | 0    |        | M |
    | 0 1 0 1 |    | | 001.1 | 0    |        | M |
    | 1 0 1 0 |    | | 001.1 | 0    |        | M |
    | 1 0 1 1 |    | | 001.1 | 0    |        | M |
    | 1 1 0 0 |    | | 001.1 | 0    |        | M |
    | 1 1 1 1 | 0  | | 001.1 | 0 10 |        | M |
    | 1 1 1 1 | 1  | | 001.1 | 0 10 |        | M |