openpower/sv/16_bit_compressed.mdwn

# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum.  The OpenPOWER ISA was never designed with
16-bit instructions in mind.  VLE was added 10 years ago, but only by
way of marking an entire 64k page as "VLE".  Because VLE is not
maintained, it is not fully compatible with the current Power ISA.

Here, in order to embed 16-bit instructions into a predominantly 32-bit
stream, using an entire 16 bits just to switch into Compressed mode
is itself a significant overhead.  The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32, which is a Vectorised version
of 16 bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64 as well.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging".  This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained

In the Vector context, the bit reserved by this last option could take
on an alternative meaning: it would determine whether the instruction
is 11-bit prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16-bit mode leaves 11 bits that still
need a purpose:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit       exit 16bit mode 0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions.  In that case, one of those 11 bits would also need
to be dedicated to saying whether 16-bit mode is to be continued.
10 bits remain for actual opcodes!
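
The "stay in 16-bit mode" bit from the second diagram can be sketched as a scan over halfwords. This is only an illustration: the helper names `bit` and `compressed_run_length` are hypothetical, and it assumes that column 0 of the diagrams is the most significant bit of each halfword and that column f is the continue bit.

```python
def bit(hw, col):
    """Return the bit in diagram column `col` (0..15).

    Assumption: column 0 is the most significant bit of the halfword.
    """
    return (hw >> (15 - col)) & 1


def compressed_run_length(halfwords):
    """Count the halfwords consumed by one run of 16-bit mode.

    The run starts at the entry halfword (major opcode plus 11 bits) and
    ends with the first halfword whose column-f bit is 0 ("exit").
    """
    for count, hw in enumerate(halfwords, start=1):
        if bit(hw, 0xF) == 0:
            return count
    # The stream ended while still in 16-bit mode.
    return len(halfwords)


# Entry halfword, two "stay" instructions, one "exit" instruction,
# then something that is decoded as 32-bit again.
stream = [0x0001, 0x0003, 0x0005, 0x0004, 0xFFFF]
print(compressed_run_length(stream))
```

With this scheme the decoder never needs a length field: every halfword carries its own "more follows" flag, at the cost of one opcode bit per instruction.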

# Opcode Allocation Ideas

* one bit from the 16-bit mode is used to indicate that 32-bit mode
  is to be dropped into for only one single instruction
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

Switching between different encoding modes is controlled by M (alone)
in 10-bit mode, and by M and N together in 16-bit mode.

* In 10-bit mode, M=0 indicates that the following instructions are
  standard OpenPOWER ISA 32-bit encoded (including, redundantly,
  further 10/16-bit instructions)
* In 10-bit mode, M=1 indicates that the following instructions are
  in 16-bit encoding mode

Once in 16-bit mode:

* 0b01: stay in 16-bit mode
* 0b00: leave 16-bit mode permanently (return to standard OpenPOWER ISA)
* 0b10: leave 16-bit mode for one cycle (return to standard OpenPOWER ISA)
* 0b11: free to be used for something completely different.

The current "top" idea for 0b11 is to use it for a new encoding format
of predominantly "immediates-based" 16-bit instructions (branch-conditional,
addi, mulli etc.)
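
The mode switching described above amounts to a small state machine. The sketch below is a guess at its shape, not a settled design: the state names and function names are hypothetical, the two-bit code is taken as an already-extracted value because the page does not say which of M and N is the high bit, and "one cycle" is assumed to mean one standard 32-bit instruction followed by a return to 16-bit mode.

```python
STD32, C16, STD32_ONCE = "std32", "c16", "std32-once"


def after_10bit(m):
    """Mode that follows a 10-bit instruction, given its M bit."""
    return C16 if m else STD32


def after_16bit(code):
    """Mode that follows a 16-bit instruction, given its two-bit mode code."""
    table = {0b01: C16, 0b00: STD32, 0b10: STD32_ONCE}
    if code not in table:
        raise ValueError("0b11 has no mode-switching meaning yet")
    return table[code]


def after_std32(mode):
    """Mode that follows an ordinary 32-bit instruction.

    Assumption: after the single instruction of a "one cycle" excursion,
    decoding returns to 16-bit mode.
    """
    return C16 if mode == STD32_ONCE else STD32


mode = after_10bit(1)        # enter 16-bit mode
mode = after_16bit(0b10)     # drop out for one 32-bit instruction
mode = after_std32(mode)     # and come back
print(mode)
```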

### Branch

10 bit mode may be expanded by 16 bit mode later, adding capabilities
that do not fit in the extremely limited space.

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e  | f |
    |   offs2     | | 000 |    offs       | LK | M | b
    | BO2 | BI3   | | 001 | 0  BI | 0  BO | LK | M | bclr
    | BO2 | BI3   | | 001 | 0  BI | 1  BO | LK | M | bctar

16 bit mode:

* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO2 extends BO

10 bit mode:

* BO[0] enables CR check, BO[1] inverts check
* BI selects one of the 4 bits of CR0 only
* no Branch Conditional with immediate
* no Absolute Address
* no CTR mode (and no bctr)
* offs is a signed offset, aligned to 2 bytes
* all branch targets are 2-byte aligned
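
As a rough illustration of how offs2 extends offs, here is a decoder for the `b` row of the table. The names `field`, `sext` and `decode_b` are hypothetical. The sketch assumes that column 0 is the most significant bit of the halfword, and that the offset counts in units of 2 bytes (one reading of "2 byte aligned").

```python
def field(hw, lo, hi):
    """Extract table columns lo..hi (inclusive); column 0 is assumed to be the MSB."""
    width = hi - lo + 1
    return (hw >> (15 - hi)) & ((1 << width) - 1)


def sext(value, bits):
    """Sign-extend a `bits`-wide two's complement value."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_b(hw, mode16):
    """Decode the `b` encoding (columns 5..7 == 000)."""
    if field(hw, 5, 7) != 0b000:
        raise ValueError("not the b encoding")
    offs, bits = field(hw, 0x8, 0xD), 6
    if mode16:
        # offs2 (columns 0..4) supplies the upper bits of the offset.
        offs |= field(hw, 0, 4) << 6
        bits = 11
    return {
        "offset_bytes": sext(offs, bits) * 2,  # assumed: offset in 2-byte units
        "LK": field(hw, 0xE, 0xE),
        "M": field(hw, 0xF, 0xF),
    }


print(decode_b(0x00FE, mode16=False))
print(decode_b(0x00FE, mode16=True))
```

The same bit pattern gives a short backward branch in 10-bit mode and a longer forward branch in 16-bit mode, because the sign bit moves from the top of offs to the top of offs2.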

### LD/ST

    | 0   | 1   | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    | RB2 | RA2 |  RT   | | 001 | 1  RA | 1  RB | 0 | M | fld
    | RA2 | RT2 |  RB   | | 001 | 1  RA | 1  RT | 1 | M | fst
    |     |     |  RT   | | 111 |  RA   |  RB   | 0 | M | ld
    |     |     |  RB   | | 111 |  RA   |  RT   | 1 | M | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bits (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"
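
The register-field extension can be illustrated with the `fld` row. This is a sketch under several assumptions: column 0 is the most significant bit, RB2 extends RB in the same way the notes say RA2 extends RA, and the "RT is implicitly RB" rule for LD also applies to `fld` in 10-bit mode. `field` and `decode_fld` are hypothetical names.

```python
def field(hw, lo, hi):
    """Extract table columns lo..hi (inclusive); column 0 is assumed to be the MSB."""
    width = hi - lo + 1
    return (hw >> (15 - hi)) & ((1 << width) - 1)


def decode_fld(hw, mode16):
    """Decode the register numbers of the `fld` row."""
    fixed_ok = (field(hw, 5, 7) == 0b001 and field(hw, 0x8, 0x8) == 1
                and field(hw, 0xB, 0xB) == 1 and field(hw, 0xE, 0xE) == 0)
    if not fixed_ok:
        raise ValueError("not the fld encoding")
    ra, rb = field(hw, 0x9, 0xA), field(hw, 0xC, 0xD)
    if not mode16:
        # 10-bit mode: 2-bit RA and RB only, RT implicitly the same as RB.
        return {"RT": rb, "RA": ra, "RB": rb}
    return {
        "RT": field(hw, 2, 4),
        "RA": (field(hw, 1, 1) << 2) | ra,  # RA2 becomes the MSB of RA
        "RB": (field(hw, 0, 0) << 2) | rb,  # assumed: RB2 becomes the MSB of RB
    }


# Columns 0..f: RB2=1 RA2=0 RT=101 | 001 | 1 RA=10 | 1 RB=01 | 0 | M=1
hw = int("1010100111010101", 2)
print(decode_fld(hw, mode16=True))
print(decode_fld(hw, mode16=False))
```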

### Arithmetic

    | 0 | 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    | N |   |  RT   | | 010 | RB    | RA!=0 | 0 | M | add
    | N |   |  RT   | | 011 | RB    | RA!=0 | 0 | M | sub.
    | N |   |  RT   | | 010 | RB    | RA    | 1 | M | mul
    | N |   |  RT   | | 011 | RB    | 0 0 0 | 0 | M | neg.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
* RT is implicitly RB: "add RT(=RB), RA, RB"
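
The arithmetic table can be read as a decode function in which RA=0 selects the alternative instruction. The sketch below covers only the four rows above and returns `None` for patterns that other tables on this page claim (for example 010 with RA=0 and column e clear, or 011 with column e set). `field` and `decode_arith` are hypothetical names, and column 0 is assumed to be the most significant bit.

```python
def field(hw, lo, hi):
    """Extract table columns lo..hi (inclusive); column 0 is assumed to be the MSB."""
    width = hi - lo + 1
    return (hw >> (15 - hi)) & ((1 << width) - 1)


def decode_arith(hw):
    """Return the mnemonic for the arithmetic rows, or None if another table applies."""
    op = field(hw, 5, 7)
    ra = field(hw, 0xB, 0xD)
    e = field(hw, 0xE, 0xE)
    if op == 0b010:
        if e == 1:
            return "mul"
        return "add" if ra != 0 else None
    if op == 0b011 and e == 0:
        # RA=0 means a zero immediate, so sub. turns into neg.
        return "sub." if ra != 0 else "neg."
    return None


for hw in (0x0204, 0x0202, 0x0304, 0x0300):
    print(f"{hw:#06x}", decode_arith(hw))
```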

### Logical

    | 0 | 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    | N |   |  RT   | | 100 | RB    | RA!=0 | 0 | M | and
    | N |   |  RT   | | 100 | RB    | RA!=0 | 1 | M | nand
    | N |   |  RT   | | 101 | RB    | RA!=0 | 0 | M | or
    | N |   |  RT   | | 101 | RB    | RA!=0 | 1 | M | nor
    | N |   |  RT   | | 100 | RB    | 0 0 0 | 0 | M | exts
    | N |   |  RT   | | 100 | RB    | 0 0 0 | 1 | M | cntlz
    | N |   |  RT   | | 101 | RB    | 0 0 0 | 0 | M | popcnt
    | N |   |  RT   | | 101 | RB    | 0 0 0 | 1 | M | not

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt, exts **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"
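
The claim that nor with a zero immediate becomes not is easy to check numerically. A 64-bit register width is assumed here, and the function names are made up for the illustration.

```python
MASK64 = (1 << 64) - 1  # assumed register width


def nor(a, b):
    """Bitwise NOR of two 64-bit values."""
    return ~(a | b) & MASK64


def logical_not(b):
    """Bitwise NOT of a 64-bit value."""
    return ~b & MASK64


for rb in (0, 0x00FF, 0xDEADBEEF, MASK64):
    # With (RA|0) == 0, nor(0, RB) is identical to not(RB).
    assert nor(0, rb) == logical_not(rb)
print("nor with a zero immediate matches not")
```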

### Floating Point

    | 0 | 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    | N |   |  RT   | | 011 | RB    | RA!=0 | 1 | M | fsub.
    | N |   |  RT   | | 110 | RB    | RA!=0 | 0 | M | fadd
    | N |   |  RT   | | 110 | RB    | RA!=0 | 1 | M | fmul
    | N |   |  RT   | | 011 | RB    | 0 0 0 | 1 | M | fneg.
    | N |   |  RT   | | 110 | RB    | 0 0 0 | 0 | M | fabs
    | N |   |  RT   | | 110 | RB    | 0 0 0 | 1 | M | fmr.

10 bit mode:

* fsub. fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 0 1 2 3 | 4   | | 567 | 8 9 a | b c d e | f |
    | 0 0 0 0 | BF2 | | 001 | 1  BF | 0  BFA  | M | mcrf
    | 0 0 0 1 | BA2 | | 001 | 1  BA | 0  BB   | M | crnor
    | 0 1 0 0 | BA2 | | 001 | 1  BA | 0  BB   | M | crandc
    | 0 1 1 0 | BA2 | | 001 | 1  BA | 0  BB   | M | crxor
    | 0 1 1 1 | BA2 | | 001 | 1  BA | 0  BB   | M | crnand
    | 1 0 0 0 | BA2 | | 001 | 1  BA | 0  BB   | M | crand
    | 1 0 0 1 | BA2 | | 001 | 1  BA | 0  BB   | M | creqv
    | 1 1 0 1 | BA2 | | 001 | 1  BA | 0  BB   | M | crorc
    | 1 1 1 0 | BA2 | | 001 | 1  BA | 0  BB   | M | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)

### System

Selection of Compressed-encoding "Bank".  Different "banks" give different
meanings to opcodes.  Example: CBank=0b001 is heavily optimised for
Audio/Video Encode/Decode.

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |       Bank2 | | 010 | CBank | 0 0 0 | 0 | M | cbank

**not available** in 10-bit mode:

    | 0 1 2 3 | 4  | | 567 | 8 9 a | b c d e  | f |
    | 1 1 1 1 | 0  | | 001 | 1  00 | 0  RT    | M | mtlr
    | 1 1 1 1 | 0  | | 001 | 1  01 | 0  RT    | M | mtctr
    | 1 1 1 1 | 0  | | 001 | 1  10 | 0  RT    | M | mttar
    | 1 1 1 1 | 0  | | 001 | 1  11 | 0  RT    | M | mtcr
    | 1 1 1 1 | 1  | | 001 | 1  00 | 0  RA    | M | mflr
    | 1 1 1 1 | 1  | | 001 | 1  01 | 0  RA    | M | mfctr
    | 1 1 1 1 | 1  | | 001 | 1  10 | 0  RA    | M | mftar
    | 1 1 1 1 | 1  | | 001 | 1  11 | 0  RA    | M | mfcr
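
The move-to/move-from table has a regular structure: column 4 picks the direction and columns 9..a pick the special register. A decoder could look roughly like this, assuming column 0 is the most significant bit; `field`, `SPRS` and `decode_spr_move` are hypothetical names.

```python
def field(hw, lo, hi):
    """Extract table columns lo..hi (inclusive); column 0 is assumed to be the MSB."""
    width = hi - lo + 1
    return (hw >> (15 - hi)) & ((1 << width) - 1)


SPRS = ("lr", "ctr", "tar", "cr")  # selected by columns 9..a


def decode_spr_move(hw):
    """Return (mnemonic, register number), or None if the fixed bits do not match."""
    fixed_ok = (field(hw, 0, 3) == 0b1111 and field(hw, 5, 7) == 0b001
                and field(hw, 0x8, 0x8) == 1 and field(hw, 0xB, 0xB) == 0)
    if not fixed_ok:
        return None
    direction = "mf" if field(hw, 4, 4) else "mt"
    return direction + SPRS[field(hw, 0x9, 0xA)], field(hw, 0xC, 0xE)


# Columns 0..f: 1111 | 0 | 001 | 1 01 | 0 011 | M=0
print(decode_spr_move(int("1111000110100110", 2)))
```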

### Unallocated

    | 0 1 2 3 | 4  | | 567 | 8 9 a | b c d e  | f |
    | 0 0 1 0 |    | | 001 | 1     | 0        | M |
    | 0 0 1 1 |    | | 001 | 1     | 0        | M |
    | 0 1 0 1 |    | | 001 | 1     | 0        | M |
    | 1 0 1 0 |    | | 001 | 1     | 0        | M |
    | 1 0 1 1 |    | | 001 | 1     | 0        | M |
    | 1 1 0 0 |    | | 001 | 1     | 0        | M |
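
The unallocated list can be cross-checked against the Condition Register and System tables by enumerating the 4-bit code in columns 0..3. The dictionary below is transcribed from those two tables; the variable names are made up for the check.

```python
# Codes in columns 0..3 that the Condition Register and System tables use
# (within the 001 / column 8 = 1 / column b = 0 group).
ALLOCATED = {
    0b0000: "mcrf",
    0b0001: "crnor",
    0b0100: "crandc",
    0b0110: "crxor",
    0b0111: "crnand",
    0b1000: "crand",
    0b1001: "creqv",
    0b1101: "crorc",
    0b1110: "cror",
    0b1111: "mt*/mf* group",
}

unallocated = sorted(set(range(16)) - set(ALLOCATED))
print([format(code, "04b") for code in unallocated])
```

The six codes left over are the same six rows listed in the table above, so the three tables are consistent with each other.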