# 16 bit Compressed

Similar to VLE (but without immediate-prefixing) this encoding is designed
to fit on top of OpenPOWER ISA v3.0B when a "Modeswitch" bit is set (PCR
is recommended). Note that Compressed is *mutually exclusive* with
OpenPOWER v3.1B "prefixing" because it uses (requires) both EXT000
and EXT001. Hypothetically it could be made to use anything other than
EXT001, with some inconvenience (extra gates).  The incompatibility is
"fixed" by swapping out of "Compressed" Mode and back into "Normal"
(v3.1B) Mode, at runtime, as needed.

Although initially intended to be augmented by Simple-V Prefixing, to
add Vector context and predication without putting pressure on I-Cache
power or size, this Compressed Encoding is not critically dependent
*on* SV Prefixing, and may be used stand-alone.

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum.  OpenPOWER ISA was never designed with 16
bit in mind.  VLE was added 10 years ago but only by way of marking
an entire 64k page as "VLE".  Because VLE has not been maintained, it is
not fully compatible with current PowerISA.

Here, in order to embed 16 bit into a predominantly 32 bit stream,
spending an entire 16 bits just to switch into Compressed mode
is itself a significant overhead.  The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32 which is a Vectorised version
of 16 bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64, as well.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging".  This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate whether 16 bit mode is to continue

This latter would be useful in the Vector context to have an alternative
meaning: as the bit which determines whether the instruction is 11-bit
prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits that must be
put to some use:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit       exit 16bit mode 0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions.  In that case one of those 11 bits would
also need to be dedicated to saying whether 16 bit mode is to be continued,
so only 10 bits remain for actual opcodes!

# Opcode Allocation Ideas

* one bit from the 16-bit mode is used to indicate that 32-bit mode
  is to be dropped into for only one single instruction
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

Switching between different encoding modes is controlled by M (alone)
in 10-bit mode, and M and N in 16-bit mode.

* M in 10-bit mode if zero indicates that following instructions are
  standard OpenPOWER ISA 32-bit encoded (including, redundantly,
  further 10/16-bit instructions)
* M in 10-bit mode if 1 indicates that following instructions are
  in 16-bit encoding mode

Once in 16-bit mode (values below are N followed by M):

* 0b01 (N=0, M=1): stay in 16-bit mode
* 0b00: leave 16-bit mode permanently (return to standard OpenPOWER ISA)
* 0b10: leave 16-bit mode for a single instruction (return to standard OpenPOWER ISA)
* 0b11: free to be used for something completely different.

The current "top" idea for 0b11 is to use it for a new encoding format
of predominantly "immediates-based" 16-bit instructions (branch-conditional,
addi, mulli etc.)

* The Compressed Major Opcode is in bits 5-7.
* Minor opcode in bit 8.
* In some cases bit 9 is taken as an additional sub-opcode, followed
  by bits 0-4 (for CR operations)
* M+N mode-switching is not available for C-Major 0b001 or 0b111
* 10 bit mode may be expanded by 16 bit mode, adding capabilities
  that do not fit in the extremely limited space.

Mode-switching FSM showing relationship between v3.0B, C 10bit and C 16bit.
16-bit immediate mode remains in 16-bit.

    | 0 | 1234 | 567  8 | 9abcde | f | explanation
    |EXT000/1  | Cmaj.m | fields | 0 | 10bit then v3.0B
    |EXT000/1  | Cmaj.m | fields | 1 | 10bit then 16bit
    | 0 | flds | Cmaj.m | fields | 0 | 16bit then v3.0B
    | 0 | flds | Cmaj.m | fields | 1 | 16bit then 16bit
    | 1 | flds | Cmaj.m | fields | 1 | 16b/imm then 16bit
    | 1 | flds | Cmaj.m | fields | 0 | 16b then 1x v3.0B

Notes:

* Cmaj.m is the C major/minor opcode: 3 bits for major, 1 for minor
* EXT000 and EXT001 are v3.0B Major Opcodes.  The first 5 bits
  are zero, therefore the 6th bit is actually part of Cmaj.
* "10bit then 16bit" means "this instruction is encoded C 10bit
  and the following one in C 16bit"
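
The mode-switching table above can be read as a small state machine. The following Python sketch is one possible reading of it, not a definitive implementation: the names `V30B`, `C10`, `C16` and `following_modes` are hypothetical, the return to 16-bit mode after "1x v3.0B" is an assumption, and the C-Major 0b001 / 0b111 exception to M+N switching is not modelled.

```python
# Illustrative mode names; none of these identifiers come from the spec.
V30B, C10, C16 = "v3.0B", "C 10bit", "C 16bit"

def following_modes(mode, word):
    """Return the encodings of the instruction(s) that follow `word`.

    mode -- how `word` itself is encoded: C10 or C16
    word -- the 16-bit instruction, bit 0 being the MSB
    """
    n = (word >> 15) & 1  # bit 0: N, only meaningful in 16-bit mode
    m = word & 1          # bit f: M
    if mode == C10:
        # bits 0-4 belong to EXT000/EXT001, so M alone decides
        return [C16] if m else [V30B]
    if mode == C16:
        if m == 1:
            return [C16]    # N=0: plain 16-bit, N=1: 16-bit immediate
        if n == 0:
            return [V30B]   # leave 16-bit mode
        return [V30B, C16]  # assumed: one v3.0B instruction, then 16-bit again
    raise ValueError("word must be a C 10bit or C 16bit instruction")
```

For example, `following_modes(C16, 0x8000)` (N=1, M=0) yields one v3.0B instruction followed by a return to 16-bit mode, under the assumption stated above.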

### C Instruction Encoding types

10-bit Opcode formats (all start with v3.0B EXT000 or EXT001
Major Opcodes)

    | 01234    | 567  8 | 9  | a b | c  | d e | f | enc
    | E01      | Cmaj.m | fld1     | fld2     | M | 10b
    | E01      | Cmaj.m | offset              | M | 10b b
    | E01      | 001.1  | S1 | fd1 | S2 | fd2 | M | 10b sub
    | E01      | 111.m  | fld1     | fld2     | M | 10b LDST

16-bit Opcode formats (including 10/16/v3.0B Switching)

    | 0 | 1234 | 567  8 | 9  | a b | c  | d e | f | enc
    | N | immf | Cmaj.m | fld1     | fld2     | M | 16b
    | 1 | immf | Cmaj.m | fld1     | imm      | 1 | 16b imm
    | fd3      | 001.1  | S1 | fd1 | S2 | fd2 | M | 16b sub
    | fd4      | 111.m  | fld1     | fld2     | M | 16b LDST

Notes:

* fld1 and fld2 can contain reg numbers, immediates, or opcode
  fields (BO, BI, LK)
* S1 and S2 are further sub-selectors of C 001.1
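
As an illustration of the generic "16b" row above, here is a minimal Python sketch that splits a 16-bit word into its fields, using the same MSB-first bit numbering as the tables (bit 0 is the most significant bit). The helper names `field` and `split_16b` are hypothetical and only serve this example.

```python
def field(word, start, end):
    """Extract bits start..end (inclusive, bit 0 = MSB) of a 16-bit word."""
    width = end - start + 1
    return (word >> (15 - end)) & ((1 << width) - 1)

def split_16b(word):
    """Split a word according to the generic "16b" format row."""
    return {
        "N": field(word, 0, 0),
        "immf": field(word, 1, 4),
        "Cmaj": field(word, 5, 7),
        "m": field(word, 8, 8),
        "fld1": field(word, 9, 0xb),
        "fld2": field(word, 0xc, 0xe),
        "M": field(word, 0xf, 0xf),
    }
```

The "16b sub" and "16b LDST" rows reuse bits 0-4 and 9-e differently, so a real decoder would first dispatch on Cmaj.m before interpreting the remaining fields.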

### Immediate Opcodes

Only available in 16-bit mode, and only when M=1 and N=1:

    | 0 | 1  | 2 3 4 | | 567.8 | 9ab   | cde | f |
    | 1 | i2 |  RT   | | 010.0 | RA|0  | imm | 1 | addi
    | 1 | i2         | | 010.1 | RA    | imm | 1 | addis
    | 1 | i2         | | 011.0 | RB    | imm | 1 | cmpdi
    | 1 | i2         | | 011.1 | RB    | imm | 1 | cmpwi
    | 1 | i2         | | 100.0 | RT    | imm | 1 | sti
    | 1 | i2         | | 100.1 | RT    | imm | 1 | fstwi
    | 1 | i2         | | 101.0 | RA    | imm | 1 | ldi
    | 1 | i2         | | 101.1 | RA    | imm | 1 | lwi
    | 1 | i2         | | 110.0 | RA    | imm | 1 | flwi
    | 1 | i2         | | 110.1 | RA    | imm | 1 | fldi

Construction of immediate:

* addi is EXTS(i2||imm) to give a 4-bit range -8 to +7
* addis is EXTS(i2||imm||000): with a 4-bit i2 and 3-bit imm this is a 10-bit value, range -512 to +504 in increments of 8
* all others are EXTS(i2||imm) to give a 7-bit range -64 to +63
  (further for LD/ST due to word/dword-alignment)

Further Notes:

* bc also has an immediate mode, listed separately below in Branch section
* for LD/ST, offset is aligned.  8-byte: i2||imm||0b000; 4-byte: i2||imm||0b00
* SV Prefix over-rides help provide alternative bitwidths for LD/ST
* RA|0: if RA is zero, addi becomes "li"
 - this only works if RT takes part of opcode
 - mv is also possible by specifying an immediate of zero
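
The EXTS(i2||imm) construction can be sketched in Python as follows. This is only an illustration: `exts` and `build_immediate` are hypothetical names, the 3-bit width of imm is taken from the table (bits c-e), and the width of i2 is passed in by the caller because it differs between addi and the other rows.

```python
def exts(value, bits):
    """Sign-extend a `bits`-wide two's-complement value to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def build_immediate(i2, i2_bits, imm, zeros=0):
    """Compute EXTS(i2 || imm || 0...0) with a 3-bit imm.

    i2_bits -- width of the i2 field (1 for addi, per the table)
    zeros   -- number of zero bits appended, for aligned offsets
    """
    raw = ((i2 << 3) | imm) << zeros
    return exts(raw, i2_bits + 3 + zeros)
```

With `i2_bits=1` and `zeros=0` this reproduces the addi range of -8 to +7; passing `zeros=3` or `zeros=2` models the dword- and word-aligned LD/ST offsets.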

### Branch

Note that illeg is all zeros and nop differs from it only in bit 8,
including in the 16-bit mode.
Given that C is allocated to OpenPOWER ISA Major opcodes EXT000 and
EXT001 this ensures that in both 10-bit *and* 16-bit mode, a 16-bit
run of all zeros is considered "illegal" whilst 0b0000.0000.1000.0000
is "nop".

    | 16-bit mode | | 10-bit mode                 |
    | 0 | 1 | 234 | | 567.8  | 9  ab | c   de | f |
    | 0 | 0   000 | | 000.0  | 0  00 | 0   00 | 0 | illeg
    | 0 | 0   000 | | 000.1  | 0  00 | 0   00 | 0 | nop
    | N | offs2   | | 000.LK | offs!=0        | M | b, bl
    | 1 | offs2   | | 000.LK | BI    | BO1 oo | 1 | bc, bcl
    | N | BO3 BI3 | | 001.0  | LK BI | BO     | M | bclr, bclrl

16 bit mode:

* bc only available when N,M=0b11
* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO3 extends BO
* bc offset constructed from oo as LSBs and offs2 as MSBs
* bc BI allows selection of all bits from CR0 or CR1
* bc CR check is always active (as if BO0=1) therefore BO1 inverts

10 bit mode:

* illegal (all zeros) covers part of branch (offs=0,M=0,LK=0)
* nop also covers part of branch (offs=0,M=0,LK=1)
* bc **not available** in 10-bit mode
* BO[0] enables CR check, BO[1] inverts check
* BI refers to CR0 only (4 bits of)
* no Branch Conditional with immediate
* no Absolute Address
* CTR mode allowed with BO[2] for b only.
* offs is a signed offset, 2 byte aligned
* all branch targets are 2 byte aligned
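
To make the overlap between illeg, nop and b/bl concrete, here is a hedged Python sketch that classifies a 10-bit-mode word with C-Major 000. The function names are hypothetical, and the label "undefined" for offs=0 with M=1 is an assumption, since the table does not say what that combination means.

```python
def field(word, start, end):
    """Extract bits start..end (inclusive, bit 0 = MSB) of a 16-bit word."""
    width = end - start + 1
    return (word >> (15 - end)) & ((1 << width) - 1)

def classify_c000_10bit(word):
    """Classify a 10-bit-mode word whose C major opcode (bits 5-7) is 000."""
    if field(word, 0, 7) != 0:
        raise ValueError("expected EXT000 with Cmaj = 000")
    lk = field(word, 8, 8)       # minor opcode bit doubles as LK
    offs = field(word, 9, 0xe)   # 6-bit branch offset field
    m = field(word, 0xf, 0xf)
    if offs == 0 and m == 0:
        return "nop" if lk else "illeg"
    if offs == 0:
        return "undefined"       # assumption: not covered by the table
    return "bl" if lk else "b"
```

Under this reading, `classify_c000_10bit(0b0000_0000_1000_0000)` gives "nop" and an all-zero word gives "illeg", matching the two special rows of the table.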

### LD/ST

    | 16-bit mode       | | 10-bit mode               |
    | 0   | 1   | 2 3 4 | | 567.8 | 9 a b | c d e | f |
    | RB2 | RA2 |  RT   | | 001.1 | 1  RA | 0  RB | M | fld
    | RA2 | RT2 |  RB   | | 001.1 | 1  RA | 1  RT | M | fst
    |     |     |  RT   | | 111.0 |  RA   |  RB   | M | ld
    |     |     |  RB   | | 111.1 |  RA   |  RT   | M | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"
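
The way the 16-bit-mode extension bits widen the 2-bit register fields of fld and fst can be sketched as below. This is an illustrative reading of the table only: `decode_fp_ldst` is a hypothetical name, and treating RB2 as the MSB of RB (by analogy with RA2 and RT2) is an assumption, because the notes do not describe RB2.

```python
def field(word, start, end):
    """Extract bits start..end (inclusive, bit 0 = MSB) of a 16-bit word."""
    width = end - start + 1
    return (word >> (15 - end)) & ((1 << width) - 1)

def decode_fp_ldst(word):
    """Decode fld / fst (Cmaj.m = 001.1 with bit 9 set) in 16-bit mode."""
    if field(word, 5, 8) != 0b0011 or field(word, 9, 9) != 1:
        raise ValueError("not an fld/fst encoding")
    ext0 = field(word, 0, 0)       # RB2 for fld, RA2 for fst
    ext1 = field(word, 1, 1)       # RA2 for fld, RT2 for fst
    reg3 = field(word, 2, 4)       # full 3-bit RT (fld) or RB (fst)
    ra_lo = field(word, 0xa, 0xb)  # low two bits of RA
    store = field(word, 0xc, 0xc)  # 0 = fld, 1 = fst
    lo = field(word, 0xd, 0xe)     # low two bits of RB (fld) or RT (fst)
    if store == 0:
        return {"op": "fld", "RT": reg3,
                "RA": (ext1 << 2) | ra_lo, "RB": (ext0 << 2) | lo}
    return {"op": "fst", "RB": reg3,
            "RA": (ext0 << 2) | ra_lo, "RT": (ext1 << 2) | lo}
```

In 10-bit mode bits 0-4 are the zero major opcode, so the extension bits are zero and the registers fall back to the 0-3 range noted above.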

### Arithmetic

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 0 |  RT   | | 010.0 | RB  | RA!=0 | M | add
    | N | 0 |  RT   | | 010.1 | RB  | RA    | M | mul
    | N | 0 | RT!=0 | | 011.0 | RB  | RA!=0 | M | sub.
    | N | 0 | 000   | | 011.0 | RB  | RA!=0 | M | cmpw
    | N | 0 |  RT   | | 011.0 | RB  | 000   | M | neg.

16 bit mode only:

    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 1 |  RT   | | 010.0 |     |       | M |
    | N | 1 |  RT   | | 010.1 | RB  | RA    | M | div
    | N | 1 | RT!=0 | | 011.0 | RB  | RA!=0 | M |
    | N | 1 | 000   | | 011.0 | RB  | RA!=0 | M | cmpl
    | N | 1 |  RT   | | 011.0 | RB  | 000   | M |

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
* RT is implicitly RB: "add RT(=RB), RA, RB"
* Opcode 0b010.0 RA=0 is not missing from the above:
  it is a system-wide instruction, "cbank" (section below)
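
The 011.0 rows show how one Cmaj.m value is shared between several instructions by testing register fields for zero. A minimal Python sketch of that selection in 16-bit mode, with hypothetical function names and `None` standing in for the rows the tables leave blank:

```python
def field(word, start, end):
    """Extract bits start..end (inclusive, bit 0 = MSB) of a 16-bit word."""
    width = end - start + 1
    return (word >> (15 - end)) & ((1 << width) - 1)

def decode_011_0(word):
    """Name the 16-bit-mode instruction encoded with Cmaj.m = 011.0."""
    if field(word, 5, 8) != 0b0110:
        raise ValueError("not a Cmaj.m = 011.0 encoding")
    alt = field(word, 1, 1)   # bit 1 selects the "16 bit mode only" table
    rt = field(word, 2, 4)
    rb = field(word, 9, 0xb)
    ra = field(word, 0xc, 0xe)
    if ra == 0:
        name = "neg." if alt == 0 else None   # RA=0: zero immediate input
    elif rt == 0:
        name = "cmpw" if alt == 0 else "cmpl"
    else:
        name = "sub." if alt == 0 else None
    return name, rt, rb, ra
```

The RA=0 test comes first because the neg. row accepts any RT, whereas cmpw and sub. both require RA to be non-zero.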

### Logical

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 0 |  RT   | | 100.0 | RB  | RA!=0 | M | and
    | N | 0 |  RT   | | 100.1 | RB  | RA!=0 | M | nand
    | N | 0 |  RT   | | 101.0 | RB  | RA!=0 | M | or
    | N | 0 |  RT   | | 101.1 | RB  | RA!=0 | M | nor
    | N | 0 |  RT   | | 100.0 | RB  | 0 0 0 | M | extsw
    | N | 0 |  RT   | | 100.1 | RB  | 0 0 0 | M | cntlz
    | N | 0 |  RT   | | 101.0 | RB  | 0 0 0 | M | popcnt
    | N | 0 |  RT   | | 101.1 | RB  | 0 0 0 | M | not

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 1 |  RT   | | 100.0 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 100.1 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 101.0 | RB  | RA!=0 | M | xor
    | N | 1 |  RT   | | 101.1 | RB  | RA!=0 | M | eqv (xnor)
    | N | 1 |  RT   | | 100.0 | RB  | 0 0 0 | M | extsb
    | N | 1 |  RT   | | 100.1 | RB  | 0 0 0 | M | cnttz
    | N | 1 |  RT   | | 101.0 | RB  | 0 0 0 | M |
    | N | 1 |  RT   | | 101.1 | RB  | 0 0 0 | M | extsh

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt, exts **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"

### Floating Point

Note here that elwidth overrides (SV Prefix) can be used to select FP16/32/64

    | 16-bit mode   | | 10-bit mode             |
    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N |   |  RT   | | 011.1 | RB  | RA!=0 | M | fsub.
    | N | 0 |  RT   | | 110.0 | RB  | RA!=0 | M | fadd
    | N | 0 |  RT   | | 110.1 | RB  | RA!=0 | M | fmul
    | N | 0 |  RT   | | 011.1 | RB  | 0 0 0 | M | fneg.
    | N | 0 |  RT   | | 110.0 | RB  | 0 0 0 | M |
    | N | 0 |  RT   | | 110.1 | RB  | 0 0 0 | M |

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
    | N | 1 |  RT   | | 011.1 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 110.0 | RB  | RA!=0 | M |
    | N | 1 |  RT   | | 110.1 | RB  | RA!=0 | M | fdiv
    | N | 1 |  RT   | | 011.1 | RB  | 0 0 0 | M | fabs.
    | N | 1 |  RT   | | 110.0 | RB  | 0 0 0 | M | fmr.
    | N | 1 |  RT   | | 110.1 | RB  | 0 0 0 | M |

10 bit mode:

* fsub. fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode
* fdiv is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 16-bit mode   | | 10-bit mode            |
    | 0 1 2 3 | 4   | | 567.8 | 9 ab | cde | f |
    | 0 0 0 0 | BF2 | | 001.1 | 0 BF | BFA | M | mcrf
    | 0 0 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | crnor
    | 0 1 0 0 | BA2 | | 001.1 | 0 BA | BB  | M | crandc
    | 0 1 1 0 | BA2 | | 001.1 | 0 BA | BB  | M | crxor
    | 0 1 1 1 | BA2 | | 001.1 | 0 BA | BB  | M | crnand
    | 1 0 0 0 | BA2 | | 001.1 | 0 BA | BB  | M | crand
    | 1 0 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | creqv
    | 1 1 0 1 | BA2 | | 001.1 | 0 BA | BB  | M | crorc
    | 1 1 1 0 | BA2 | | 001.1 | 0 BA | BB  | M | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode (but mcrf is)

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)

### System

cbank: Selection of Compressed-encoding "Bank".  Different "banks"
give different meanings to opcodes.  Example: CBank=0b001 is heavily
optimised for Audio/Video Encode/Decode.  cbank borrows from add's encoding
space (when RA==0).

    | 16-bit mode | | 10-bit mode               |
    | 0 | 1 2 3 4 | | 567.8 | 9ab   | cde | f |
    | N | 0 Bank2 | | 010.0 | CBank | 000 | M | cbank

**not available** in 10-bit mode:

    | 0 1 2 3 | 4  | | 567.8 | 9 ab | cde  | f |
    | 1 1 1 1 | 0  | | 001.1 | 0 00 |  RT  | M | mtlr
    | 1 1 1 1 | 0  | | 001.1 | 0 01 |  RT  | M | mtctr
    | 1 1 1 1 | 0  | | 001.1 | 0 11 |  RT  | M | mtcr
    | 1 1 1 1 | 1  | | 001.1 | 0 00 |  RA  | M | mflr
    | 1 1 1 1 | 1  | | 001.1 | 0 01 |  RA  | M | mfctr
    | 1 1 1 1 | 1  | | 001.1 | 0 11 |  RA  | M | mfcr

### Unallocated

    | 0 1 2 3 | 4  | | 567.8 | 9 ab | cde  | f |
    | 0 0 1 0 |    | | 001.1 | 0    |      | M |
    | 0 0 1 1 |    | | 001.1 | 0    |      | M |
    | 0 1 0 1 |    | | 001.1 | 0    |      | M |
    | 1 0 1 0 |    | | 001.1 | 0    |      | M |
    | 1 0 1 1 |    | | 001.1 | 0    |      | M |
    | 1 1 0 0 |    | | 001.1 | 0    |      | M |
    | 1 1 1 1 | 0  | | 001.1 | 0 10 |      | M |
    | 1 1 1 1 | 1  | | 001.1 | 0 10 |      | M |