# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum.  The OpenPOWER ISA was never designed with
16 bit instructions in mind.  VLE was added 10 years ago, but only by
way of marking an entire 64k page as "VLE".  Because VLE has not been
maintained, it is not fully compatible with the current PowerISA.

The aim here is to embed 16 bit instructions into a predominantly 32 bit
stream, but using an entire 16 bits just to switch into Compressed mode
is itself a significant overhead.  The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32, which is a Vectorised version
of 16 bit Compressed, and ideally have a variant that also adds the
27-bit prefix format from SV-P64.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging".  This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained

In the Vector context this last bit could usefully take on an alternative
meaning: as the bit which determines whether the instruction is 11-bit
prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits for which a
use has to be found:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit       exit 16bit mode 0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions.  In that case, one of those 11 bits would also need to be
dedicated to saying whether 16 bit mode is to be continued, so only
10 bits remain for actual opcodes!
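
The stay/exit bit shown in the diagram above can be sketched in Python as follows.  This is only an illustration under stated assumptions: bit f is taken to be the least significant bit of each halfword (MSB0 numbering, as in the diagrams), the instruction carrying a 0 in bit f is assumed to still execute before the return to 32 bit mode, and the function names are hypothetical.

```python
def stay_bit(halfword):
    """Return bit f of a 16 bit halfword (the LSB in MSB0 numbering)."""
    return halfword & 1


def compressed_run_length(halfwords):
    """Count the 16 bit instructions executed before leaving 16 bit mode.

    `halfwords` is assumed to start at the first instruction *after* the
    halfword that entered 16 bit mode.  The instruction whose stay bit is 0
    is counted: it is assumed to be the last one run in 16 bit mode.
    """
    count = 0
    for hw in halfwords:
        count += 1
        if stay_bit(hw) == 0:
            break  # this instruction asked to exit 16 bit mode
    return count
```

With this reading, a run of three halfwords whose stay bits are 1, 1, 0 occupies three instruction slots, and whatever follows is decoded as 32 bit again.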

# Opcode Allocation Ideas

## Opcodes exploration (Attempt 1)

### Branch

10 bit mode may be expanded by 16 bit mode later, adding capabilities
that do not fit in the extremely limited space.

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e  | f |
    |   offs2     | | 000 |    offs       | LK | 1 | b
    | BO2 | BI3   | | 001 | 0  BI | 0  BO | LK | 1 | bclr
    | BO2 | BI3   | | 001 | 0  BI | 1  BO | LK | 1 | bctar

16 bit mode:

* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO2 extends BO

10 bit mode:

* BO[0] enables CR check, BO[1] inverts check
* BI selects one of the 4 bits of CR0 only
* no Branch Conditional with immediate
* no Absolute Address
* no CTR mode (and no bctr)
* offs is a signed offset, aligned to 2 bytes
* all branch targets are 2 byte aligned
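
A minimal sketch of how the 10 bit mode `bclr` condition might be evaluated, given the bullets above.  Several details are assumptions rather than anything the table fixes: BO[0] is taken to sit in bit c and BO[1] in bit d, BI counts CR0 bits MSB-first, and the branch is assumed taken when the selected CR0 bit is 1 (or 0 when inverted).  The helper names are hypothetical.

```python
def field(hw, first, last):
    """Extract bits first..last of a 16 bit halfword, MSB0 numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def bclr_taken(hw, cr0):
    """Decide whether a 10 bit mode bclr is taken.

    `cr0` is the 4 bit CR0 field, with its bit 0 as the most significant bit.
    """
    bi = field(hw, 0x9, 0xa)    # which of the 4 bits of CR0 to test
    bo0 = field(hw, 0xc, 0xc)   # assumed: 1 enables the CR check
    bo1 = field(hw, 0xd, 0xd)   # assumed: 1 inverts the check
    if not bo0:
        return True             # no CR check: branch unconditionally
    cr_bit = (cr0 >> (3 - bi)) & 1
    return bool(cr_bit ^ bo1)
```

For example, the halfword `0x149` (group 001, BI=2, BO=0b10, LK=0, bit f=1) tests bit 2 of CR0 without inversion.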

### LD/ST

    | 0   | 1   | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    | RB2 | RA2 |  RT   | | 001 | 1  RA | 1  RB | 0 | 1 | fld
    | RA2 | RT2 |  RB   | | 001 | 1  RA | 1  RT | 1 | 1 | fst
    |     |     |  RT   | | 111 |  RA   |  RB   | 0 | 1 | ld
    |     |     |  RB   | | 111 |  RA   |  RT   | 1 | 1 | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: ld RT=RB, RA(RB)
* for ST, there is no offset: st RT, RA(0)

### Arithmetic

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     |  RT   | | 010 | RB    | RA!=0 | 0 | 1 | add
    |     |  RT   | | 010 | RB    | RA    | 1 | 1 | mul
    |     |  RT   | | 011 | RB    | (RA|0)| 0 | 1 | sub.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.

### Logical

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     |  RT   | | 100 | RB    | RA!=0 | 0 | 1 | and
    |     |  RT   | | 100 | RB    | RA!=0 | 1 | 1 | nand
    |     |  RT   | | 101 | RB    | RA!=0 | 0 | 1 | or
    |     |  RT   | | 101 | RB    | RA!=0 | 1 | 1 | nor
    |     |  RT   | | 100 | RB    | 0 0 0 | 0 | 1 | exts
    |     |  RT   | | 100 | RB    | 0 0 0 | 1 | 1 | cntlz
    |     |  RT   | | 101 | RB    | 0 0 0 | 0 | 1 | popcnt
    |     |  RT   | | 101 | RB    | 0 0 0 | 1 | 1 | not

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
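
The way the RA field doubles as an opcode selector in the Logical table can be sketched as a small decoder.  This is an illustration only: the field positions are read directly off the table (bits 2-4 RT, 5-7 group, 8-a RB, b-d RA, e selector), and the function and table names are hypothetical.

```python
def field(hw, first, last):
    """Extract bits first..last of a 16 bit halfword, MSB0 numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


# (bits 5-7, bit e) -> (name when RA != 0, name when RA == 0)
LOGICAL = {
    (0b100, 0): ("and", "exts"),
    (0b100, 1): ("nand", "cntlz"),
    (0b101, 0): ("or", "popcnt"),
    (0b101, 1): ("nor", "not"),
}


def decode_logical(hw):
    """Return (mnemonic, RT, RB[, RA]) or None if hw is not in the Logical table."""
    key = (field(hw, 0x5, 0x7), field(hw, 0xe, 0xe))
    if key not in LOGICAL:
        return None
    two_operand, one_operand = LOGICAL[key]
    rt = field(hw, 0x2, 0x4)
    rb = field(hw, 0x8, 0xa)
    ra = field(hw, 0xb, 0xd)
    if ra == 0:
        # RA=0 selects the single-operand variant, e.g. nor becomes not
        return (one_operand, rt, rb)
    return (two_operand, rt, rb, ra)
```

Here `0x1D57` decodes as `nor` with RT=3, RB=2, RA=5, while the same halfword with the RA field cleared (`0x1D43`) decodes as `not`.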

### Floating Point

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     |  RT   | | 011 | RB    | RA!=0 | 1 | 1 | fsub.
    |     |  RT   | | 110 | RB    | RA!=0 | 0 | 1 | fadd
    |     |  RT   | | 110 | RB    | RA!=0 | 1 | 1 | fmul
    |     |  RT   | | 011 | RB    | 0 0 0 | 1 | 1 | fneg.
    |     |  RT   | | 110 | RB    | 0 0 0 | 0 | 1 | fabs
    |     |  RT   | | 110 | RB    | 0 0 0 | 1 | 1 | fmr.

10 bit mode:

* fsub. fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 0 1 2 3 | 4   | | 567 | 8 9 a | b c d e | f |
    | 0 0 0 0 | BF2 | | 001 | 1  BF | 0  BFA  | 1 | mcrf
    | 0 0 0 1 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crnor
    | 0 1 0 0 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crandc
    | 0 1 1 0 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crxor
    | 0 1 1 1 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crnand
    | 1 0 0 0 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crand
    | 1 0 0 1 | BA2 | | 001 | 1  BA | 0  BB   | 1 | creqv
    | 1 1 0 1 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crorc
    | 1 1 1 0 | BA2 | | 001 | 1  BA | 0  BB   | 1 | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)
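
One observation about the table above, which the page itself does not state: the 4 bit codes in bits 0-3 happen to read as truth tables for the corresponding operation, indexed by the two input bits.  The sketch below checks that numerically.  Whether a decoder would actually exploit this is an assumption, and all names here are hypothetical.

```python
# 4 bit codes copied from bits 0-3 of the CR table above.
CR_OPCODES = {
    "crnor":  0b0001,
    "crandc": 0b0100,
    "crxor":  0b0110,
    "crnand": 0b0111,
    "crand":  0b1000,
    "creqv":  0b1001,
    "crorc":  0b1101,
    "cror":   0b1110,
}

# Conventional meaning of each operation on single bits a (from BA) and b (from BB).
REFERENCE = {
    "crnor":  lambda a, b: 1 - (a | b),
    "crandc": lambda a, b: a & (1 - b),
    "crxor":  lambda a, b: a ^ b,
    "crnand": lambda a, b: 1 - (a & b),
    "crand":  lambda a, b: a & b,
    "creqv":  lambda a, b: 1 - (a ^ b),
    "crorc":  lambda a, b: a | (1 - b),
    "cror":   lambda a, b: a | b,
}


def cr_op_by_lookup(code, a, b):
    """Treat the 4 bit code as a truth table indexed by 2*a + b."""
    return (code >> (2 * a + b)) & 1


def codes_are_truth_tables():
    """Check every listed code against the conventional definition."""
    return all(
        cr_op_by_lookup(CR_OPCODES[name], a, b) == REFERENCE[name](a, b)
        for name in CR_OPCODES
        for a in (0, 1)
        for b in (0, 1)
    )
```

Under the same reading, the codes 0000 and 1111 are the ones the tables give to `mcrf` and to the mt/mf moves, rather than to a constant-0 or constant-1 CR operation.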

### System

10/16-bit mode:

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     |       | | 010 | 0 0 0 | 0 0 0 | 0 | 1 | sc
    |     |       | | 010 | 0 0 1 | 0 0 0 | 0 | 1 | rfid

**not available** in 10-bit mode:

    | 0 1 2 3 | 4  | | 567 | 8 9 a | b c d e  | f |
    | 1 1 1 1 | 0  | | 001 | 1  00 | 0  RT    | 1 | mtlr
    | 1 1 1 1 | 0  | | 001 | 1  01 | 0  RT    | 1 | mtctr
    | 1 1 1 1 | 0  | | 001 | 1  10 | 0  RT    | 1 | mttar
    | 1 1 1 1 | 0  | | 001 | 1  11 | 0  RT    | 1 | mtcr
    | 1 1 1 1 | 1  | | 001 | 1  00 | 0  RA    | 1 | mflr
    | 1 1 1 1 | 1  | | 001 | 1  01 | 0  RA    | 1 | mfctr
    | 1 1 1 1 | 1  | | 001 | 1  10 | 0  RA    | 1 | mftar
    | 1 1 1 1 | 1  | | 001 | 1  11 | 0  RA    | 1 | mfcr
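
The regular structure of the mt/mf rows above lends itself to a table-driven decode.  The following is a sketch only, reading the fields straight from the table: bit 4 as the direction, bits 9-a as the special register selector, and bits c-e as the 3 bit GPR number.  The names are hypothetical.

```python
def field(hw, first, last):
    """Extract bits first..last of a 16 bit halfword, MSB0 numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


SPECIAL = ["lr", "ctr", "tar", "cr"]  # selected by bits 9-a


def decode_move_special(hw):
    """Return (mnemonic, gpr) for the mt*/mf* rows, or None if hw does not match."""
    if field(hw, 0x0, 0x3) != 0b1111 or field(hw, 0x5, 0x7) != 0b001:
        return None
    if field(hw, 0x8, 0x8) != 1 or field(hw, 0xb, 0xb) != 0:
        return None
    direction = "mf" if field(hw, 0x4, 0x4) else "mt"
    mnemonic = direction + SPECIAL[field(hw, 0x9, 0xa)]
    gpr = field(hw, 0xc, 0xe)
    return (mnemonic, gpr)
```

For instance, `0xF9AB` matches the `mfctr` row with register field 5.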

### Unallocated

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     |       | | 010 | 0 1 0 | 0 0 0 | 0 | 1 |
    |     |       | | 010 | 0 1 1 | 0 0 0 | 0 | 1 |
    |     |       | | 010 | 1 0 0 | 0 0 0 | 0 | 1 |
    |     |       | | 010 | 1 0 1 | 0 0 0 | 0 | 1 |
    |     |       | | 010 | 1 1 0 | 0 0 0 | 0 | 1 |
    |     |       | | 010 | 1 1 1 | 0 0 0 | 0 | 1 |

    | 0 1 2 3 | 4  | | 567 | 8 9 a | b c d e  | f |
    | 0 0 1 0 |    | | 001 | 1     | 0        | 1 |
    | 0 0 1 1 |    | | 001 | 1     | 0        | 1 |
    | 0 1 0 1 |    | | 001 | 1     | 0        | 1 |
    | 1 0 1 0 |    | | 001 | 1     | 0        | 1 |
    | 1 0 1 1 |    | | 001 | 1     | 0        | 1 |
    | 1 1 0 0 |    | | 001 | 1     | 0        | 1 |