# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum.  OpenPOWER ISA was never designed with 16
bit in mind.  VLE was added 10 years ago but only by way of marking
an entire 64k page as "VLE".  With VLE no longer maintained, it is not
fully compatible with current PowerISA.

Here, in order to embed 16 bit into a predominantly 32 bit stream, the
cost of using an entire 16 bits just to switch into Compressed mode
is itself a significant overhead.  The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32, which is a Vectorised version
of 16 bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64 as well.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging".  This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained

This last option would be useful in the Vector context, where the bit
can take an alternative meaning: it determines whether the instruction is
11-bit prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits to find
something to use them for:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit       exit 16bit mode 0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions.  In that case one of those 11 bits would
also need to be dedicated to saying if 16 bit mode is to be continued,
so only 10 bits remain for actual opcodes!

# Opcode Allocation Ideas

* one bit from the 16-bit mode is used to indicate that 32-bit mode
  is to be dropped into for only one single instruction
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

Switching between different encoding modes is controlled by M (alone)
in 10-bit mode, and M and N in 16-bit mode.

* M in 10-bit mode if zero indicates that following instructions are
  standard OpenPOWER ISA 32-bit encoded (including, redundantly,
  further 10/16-bit instructions)
* M in 10-bit mode if 1 indicates that following instructions are
  in 16-bit encoding mode

Once in 16-bit mode, N (bit 0) and M (bit f) together select:

* 0b01 (N=0, M=1): stay in 16-bit mode
* 0b00: leave 16-bit mode permanently (return to standard OpenPOWER ISA)
* 0b10: leave 16-bit mode for one instruction (return to standard OpenPOWER ISA)
* 0b11: free to be used for something completely different.

The current "top" idea for 0b11 is to use it for a new encoding format
of predominantly "immediates-based" 16-bit instructions (branch-conditional,
addi, mulli etc.)

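The mode-switching rules above can be sketched as a small state function. This is only an illustration: the `Mode` names and function names are made up for this sketch, and the choice to remain in 16-bit mode after a 0b11 (immediates format) instruction is an assumption, since the text does not say what follows it.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical names for the decoder states described above."""
    STD32 = "standard OpenPOWER 32-bit"
    C16 = "16-bit compressed"
    STD32_ONCE = "one 32-bit instruction, then back to 16-bit"


def next_mode_from_10bit(m: int) -> Mode:
    """In 10-bit mode, M alone decides what follows."""
    return Mode.C16 if m else Mode.STD32


def next_mode_from_16bit(n: int, m: int) -> Mode:
    """In 16-bit mode, the (N, M) pair decides what follows."""
    pair = (n << 1) | m
    if pair == 0b01:
        return Mode.C16          # stay in 16-bit mode
    if pair == 0b00:
        return Mode.STD32        # leave 16-bit mode permanently
    if pair == 0b10:
        return Mode.STD32_ONCE   # leave for a single instruction
    # 0b11 selects the immediates format.
    # ASSUMPTION: decoding stays in 16-bit mode afterwards.
    return Mode.C16
```

Note that the sketch ignores the C-Major groups for which M+N mode-switching is not available.
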
* The Compressed Major Opcode is in bits 5-7.
* Minor opcode in bit 8.
* In some cases bit 9 is taken as an additional sub-opcode, followed
  by bits 0-4 (for CR operations)
* M+N mode-switching is not available for C-Major 0b001 or 0b111
* 10 bit mode may be expanded by 16 bit mode, adding capabilities
  that do not fit in the extremely limited space.

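As a rough aid to reading the tables that follow, the field positions can be extracted as below. The sketch assumes MSB0 numbering (bit 0 is the most significant bit of the 16-bit word), which is consistent with the nop pattern 0b0000.0000.1000.0000 having only bit 8 set. The helper names `field` and `split_fields` are hypothetical.

```python
def field(insn: int, first: int, last: int, width: int = 16) -> int:
    """Extract bits first..last (inclusive), MSB0 numbering: bit 0 is the MSB."""
    nbits = last - first + 1
    shift = (width - 1) - last
    return (insn >> shift) & ((1 << nbits) - 1)


def split_fields(insn: int) -> dict:
    """Pull out the fields common to most of the encodings."""
    return {
        "N": field(insn, 0, 0),        # bit 0 (16-bit mode only)
        "major": field(insn, 5, 7),    # Compressed Major Opcode
        "minor": field(insn, 8, 8),    # Minor opcode
        "M": field(insn, 15, 15),      # bit f
    }
```

For example, `split_fields(0x0080)` gives major 0b000 with minor 1, and `split_fields(0x8201)` gives N=1, major 0b010, minor 0, M=1.
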
### Immediate Opcodes

Only available in 16-bit mode, and only when M=1 and N=1.

    | 0 | 1  | 2 3 4 | | 567.8 | 9ab   | cde | f |
    | 1 | i2 |  RT   | | 010.0 | RA|0  | imm | 1 | addi
    | 1 | i2         | | 010.1 | RA    | imm | 1 | addis
    | 1 | i2         | | 011.0 | RB    | imm | 1 | cmpdi
    | 1 | i2         | | 011.1 | RB    | imm | 1 | cmpwi
    | 1 | i2         | | 100.0 | RT    | imm | 1 | sti
    | 1 | i2         | | 100.1 | RT    | imm | 1 | fstwi
    | 1 | i2         | | 101.0 | RA    | imm | 1 | ldi
    | 1 | i2         | | 101.1 | RA    | imm | 1 | lwi
    | 1 | i2         | | 110.0 | RA    | imm | 1 | flwi
    | 1 | i2         | | 110.1 | RA    | imm | 1 | fldi

Construction of immediate:

* addi is EXTS(i2||imm) to give a 4-bit range -8 to +7
* addis is EXTS(i2||imm||000) to give a 10-bit range -512 to +504 in increments of 8
* all others are EXTS(i2||imm) to give a 7-bit range -64 to +63
  (further for LD/ST due to word/dword-alignment)

Further Notes:

* bc also has an immediate mode, listed separately below in Branch section
* for LD/ST, offset is aligned.  8-byte: i2||imm||0b000, 4-byte: i2||imm||0b00
* SV Prefix overrides help provide alternative bitwidths for LD/ST
* RA|0: if RA is zero, addi becomes "li"
  - this only works if RT takes part of opcode
  - mv is also possible by specifying an immediate of zero

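A minimal sketch of the immediate construction, assuming the field widths read off the table above: i2 is 1 bit for addi (bit 1) and 4 bits otherwise (bits 1-4), and imm is the 3-bit field in bits c-e. The function names `exts` and `build_imm` are hypothetical.

```python
def exts(value: int, nbits: int) -> int:
    """Sign-extend an nbits-wide two's complement value to a Python int."""
    sign = 1 << (nbits - 1)
    return (value & (sign - 1)) - (value & sign)


def build_imm(i2: int, imm: int, i2_bits: int, low_zeros: int = 0) -> int:
    """Compute EXTS(i2 || imm || zeros).

    i2 supplies the MSBs, imm is the 3-bit field, and low_zeros is the
    number of zero bits appended for alignment (ASSUMED widths, see above).
    """
    raw = ((i2 << 3) | imm) << low_zeros
    return exts(raw, i2_bits + 3 + low_zeros)
```

With these widths, `build_imm(1, 0, i2_bits=1)` is -8 (the most negative addi immediate) and `build_imm(0b0111, 0b111, i2_bits=4, low_zeros=3)` is 504.
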
### Branch

Note that illeg is all zeros, and nop is all zeros apart from bit 8,
including in the 16-bit mode.
Given that C is allocated to OpenPOWER ISA Major opcodes EXT000 and
EXT001, this ensures that in both 10-bit *and* 16-bit mode, a 16-bit
run of all zeros is considered "illegal" whilst 0b0000.0000.1000.0000
is "nop".

    | 16-bit mode | | 10-bit mode                 |
    | 0 | 1 | 234 | | 567.8  | 9  ab | c   de | f |
    | 0 | 0   000 | | 000.0  | 0  00 | 0   00 | 0 | illeg
    | 0 | 0   000 | | 000.1  | 0  00 | 0   00 | 0 | nop
    | N | offs2   | | 000.LK | offs!=0        | M | b, bl
    | 1 | offs2   | | 000.LK | BI    | BO1 oo | 1 | bc, bcl
    | N | BO3 BI3 | | 001.0  | LK BI | BO     | M | bclr, bclrl

16 bit mode:

* bc only available when N,M=0b11
* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO3 extends BO
* bc offset constructed from oo as LSBs and offs2 as MSBs
* bc BI allows selection of all bits from CR0 or CR1
* bc CR check is always active (as if BO0=1) therefore BO1 inverts

10 bit mode:

* illegal (all zeros) covers part of branch (offs=0,M=0,LK=0)
* nop also covers part of branch (offs=0,M=0,LK=1)
* bc **not available** in 10-bit mode
* BO[0] enables CR check, BO[1] inverts check
* BI refers to CR0 only (4 bits of)
* no Branch Conditional with immediate
* no Absolute Address
* CTR mode allowed with BO[2] for b only.
* offs is signed and 2-byte aligned
* all branch targets are 2-byte aligned

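The way illeg and nop carve out part of the 10-bit branch space can be sketched as follows. This looks only at bits 5-f and ignores the 16-bit-mode fields in bits 0-4; the function name is hypothetical, and the "unspecified" result marks the combination (offs=0, M=1) that the table above does not cover.

```python
def classify_branch_10bit(insn: int) -> str:
    """Classify a 16-bit word whose C-major field (bits 5-7) is 0b000."""
    major = (insn >> 8) & 0b111   # bits 5-7
    lk = (insn >> 7) & 1          # bit 8
    offs = (insn >> 1) & 0x3F     # bits 9-e
    m = insn & 1                  # bit f
    if major != 0b000:
        raise ValueError("not in the C-major 0b000 branch group")
    if offs == 0 and m == 0:
        return "nop" if lk else "illeg"
    if offs != 0:
        return "bl" if lk else "b"
    return "unspecified"  # offs == 0 with M == 1: not covered by the table
```
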
### LD/ST

    | 16-bit mode       | | 10-bit mode               |
    | 0   | 1   | 2 3 4 | | 567.8 | 9 a b | c d e | f |
    | RB2 | RA2 |  RT   | | 001.1 | 1  RA | 0  RB | M | fld
    | RA2 | RT2 |  RB   | | 001.1 | 1  RA | 1  RT | M | fst
    |     |     |  RT   | | 111.0 |  RA   |  RB   | M | ld
    |     |     |  RB   | | 111.1 |  RA   |  RT   | M | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"

### Arithmetic

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N |   |  RT   | | 010.0 | RB  | RA!=0 | M | add
    | N |   |  RT   | | 010.1 | RB  | RA    | M | mul
    | N |   | RT!=0 | | 011.0 | RB  | RA!=0 | M | sub.
    | N | 0 | 000   | | 011.0 | RB  | RA!=0 | M | cmpw
    | N | 1 | 000   | | 011.0 | RB  | RA!=0 | M | cmpl
    | N |   |  RT   | | 011.0 | RB  | 000   | M | neg.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
* RT is implicitly RB: "add RT(=RB), RA, RB"
* Opcode 0b010.0 RA=0 is not missing from the above:
  it is a system-wide instruction, "cbank" (section below)

### Logical

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 0 |  RT   | | 100.0 | RB  | RA!=0 | M | and
    | N | 0 |  RT   | | 100.1 | RB  | RA!=0 | M | nand
    | N | 0 |  RT   | | 101.0 | RB  | RA!=0 | M | or
    | N | 0 |  RT   | | 101.1 | RB  | RA!=0 | M | nor
    | N | 0 |  RT   | | 100.0 | RB  | 0 0 0 | M | extsw
    | N | 0 |  RT   | | 100.1 | RB  | 0 0 0 | M | cntlz
    | N | 0 |  RT   | | 101.0 | RB  | 0 0 0 | M | popcnt
    | N | 0 |  RT   | | 101.1 | RB  | 0 0 0 | M | not

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 1 |  RT   | | 100.0 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 100.1 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 101.0 | RB  | RA!=0 | M | xor
    | N | 1 |  RT   | | 101.1 | RB  | RA!=0 | M | eqv (xnor)
    | N | 1 |  RT   | | 100.0 | RB  | 0 0 0 | M | extsb
    | N | 1 |  RT   | | 100.1 | RB  | 0 0 0 | M | cnttz
    | N | 1 |  RT   | | 101.0 | RB  | 0 0 0 | M |
    | N | 1 |  RT   | | 101.1 | RB  | 0 0 0 | M | extsh

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt, exts **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"

### Floating Point

Note here that elwidth overrides (SV Prefix) can be used to select FP16/32/64.

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N |   |  RT   | | 011.1 | RB  | RA!=0 | M | fsub.
    | N | 0 |  RT   | | 110.0 | RB  | RA!=0 | M | fadd
    | N | 0 |  RT   | | 110.1 | RB  | RA!=0 | M | fmul
    | N | 0 |  RT   | | 011.1 | RB  | 0 0 0 | M | fneg.
    | N | 0 |  RT   | | 110.0 | RB  | 0 0 0 | M |
    | N | 0 |  RT   | | 110.1 | RB  | 0 0 0 | M |

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 1 |  RT   | | 011.1 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 110.0 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 110.1 | RB  | RA!=0 | M | fdiv
    | N | 1 |  RT   | | 011.1 | RB  | 0 0 0 | M | fabs.
    | N | 1 |  RT   | | 110.0 | RB  | 0 0 0 | M | fmr.
    | N | 1 |  RT   | | 110.1 | RB  | 0 0 0 | M |

10 bit mode:

* fsub., fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode
* fdiv is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 16-bit mode   | | 10-bit mode            |
    | 0 1 2 3 | 4   | | 567.8 | 9 ab | cde | f |
    | 0 0 0 0 | BF2 | | 001.1 | 0 BF | BFA | M | mcrf
    | 0 0 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | crnor
    | 0 1 0 0 | BA2 | | 001.1 | 0 BA | BB  | M | crandc
    | 0 1 1 0 | BA2 | | 001.1 | 0 BA | BB  | M | crxor
    | 0 1 1 1 | BA2 | | 001.1 | 0 BA | BB  | M | crnand
    | 1 0 0 0 | BA2 | | 001.1 | 0 BA | BB  | M | crand
    | 1 0 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | creqv
    | 1 1 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | crorc
    | 1 1 1 0 | BA2 | | 001.1 | 0 BA | BB  | M | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode (but mcrf is)

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)

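One observation on the CR operations table: the 4-bit codes in bits 0-3 can be read as truth tables, where bit position `(a << 1) | b` (counting from the least significant bit) holds the result for inputs a and b. The text does not state this interpretation; the sketch below simply checks it against the usual definitions of each operation. The names `CR_OPS` and `cr_op` are hypothetical.

```python
# 4-bit codes from bits 0-3 of the CR operations table
CR_OPS = {
    "crnor": 0b0001,
    "crandc": 0b0100,
    "crxor": 0b0110,
    "crnand": 0b0111,
    "crand": 0b1000,
    "creqv": 0b1001,
    "crorc": 0b1101,
    "cror": 0b1110,
}


def cr_op(code: int, a: int, b: int) -> int:
    """Evaluate a CR bit operation by using its code as a lookup table."""
    return (code >> ((a << 1) | b)) & 1
```

For example, `cr_op(CR_OPS["crandc"], 1, 0)` is 1, matching a AND (NOT b).
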
### System

cbank: Selection of Compressed-encoding "Bank".  Different "banks" give different
meanings to opcodes.
Example: CBank=0b001 is heavily optimised for Audio/Video
Encode/Decode.

    | 16-bit mode | | 10-bit mode               |
    | 0 | 1 2 3 4 | | 567.8 | 9 a b | c d e | f |
    | N |   Bank2 | | 010.0 | CBank | 0 0 0 | M | cbank

**not available** in 10-bit mode:

    | 0 1 2 3 | 4  | | 567.8 | 9 ab | c d e  | f |
    | 1 1 1 1 | 0  | | 001.1 | 0 00 |  RT    | M | mtlr
    | 1 1 1 1 | 0  | | 001.1 | 0 01 |  RT    | M | mtctr
    | 1 1 1 1 | 0  | | 001.1 | 0 11 |  RT    | M | mtcr
    | 1 1 1 1 | 1  | | 001.1 | 0 00 |  RA    | M | mflr
    | 1 1 1 1 | 1  | | 001.1 | 0 01 |  RA    | M | mfctr
    | 1 1 1 1 | 1  | | 001.1 | 0 11 |  RA    | M | mfcr

### Unallocated

    | 0 1 2 3 | 4  | | 567.8 | 9 ab | c d e  | f |
    | 0 0 1 0 |    | | 001.1 | 0    |        | M |
    | 0 0 1 1 |    | | 001.1 | 0    |        | M |
    | 0 1 0 1 |    | | 001.1 | 0    |        | M |
    | 1 0 1 0 |    | | 001.1 | 0    |        | M |
    | 1 0 1 1 |    | | 001.1 | 0    |        | M |
    | 1 1 0 0 |    | | 001.1 | 0    |        | M |
    | 1 1 1 1 | 0  | | 001.1 | 0 10 |        | M |
    | 1 1 1 1 | 1  | | 001.1 | 0 10 |        | M |