# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum.  OpenPOWER ISA was never designed with 16
bit in mind.  VLE was added 10 years ago but only by way of marking
an entire 64k page as "VLE".  As VLE has not been maintained, it is not
fully compatible with current PowerISA.

Here, in order to embed 16 bit into a predominantly 32 bit stream,
using an entire 16 bits just to switch into Compressed mode
is itself a significant overhead.  The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32 which is a Vectorised version
of 16 bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64, as well.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging".  This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained

In the Vector context, the reserved bit of this last option could be given
an alternative meaning: it determines whether the instruction is 11-bit
prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits for which
a use has to be found:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit       exit 16bit mode 0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions.  In that case, one of those 11 bits would
also need to be dedicated to saying whether 16 bit mode is to be continued,
so only 10 bits remain for actual opcodes!
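
As a rough illustration of the stay/exit scheme in the diagram above, the following Python sketch walks a stream of 16-bit halfwords and groups them into instructions. It is only a sketch under assumptions that the text does not make: `ENTER_C_MAJOR` is a placeholder for whichever major opcode would be reserved, the function name and the `"c10"`/`"c16"`/`"std32"` tags are made up for illustration, and the entry instruction is assumed to use bit f in the same way as the following 16 bit instructions. Bit 0 is taken to be the MSB, as in the diagrams.

```python
# Placeholder only: the text does not say which 5-bit major opcode
# would be reserved for entering Compressed mode.
ENTER_C_MAJOR = 0b00000


def split_stream(halfwords):
    """Group 16-bit halfwords into (kind, value) tuples.

    kind is "c10" for the entry instruction (major opcode plus 10 bit
    opcode), "c16" for an instruction seen while in 16 bit mode, and
    "std32" for an ordinary 32 bit instruction.
    """
    out = []
    in_16bit_mode = False
    i = 0
    while i < len(halfwords):
        hw = halfwords[i] & 0xFFFF
        if in_16bit_mode:
            out.append(("c16", hw))
            in_16bit_mode = bool(hw & 1)   # bit f: 1 = stay, 0 = exit
            i += 1
        elif (hw >> 11) == ENTER_C_MAJOR:  # bits 0-4 hold the major opcode
            out.append(("c10", hw))
            in_16bit_mode = bool(hw & 1)   # assumed: same meaning of bit f
            i += 1
        else:
            if i + 1 >= len(halfwords):
                raise ValueError("truncated 32 bit instruction")
            out.append(("std32", (hw << 16) | (halfwords[i + 1] & 0xFFFF)))
            i += 2
    return out


if __name__ == "__main__":
    for kind, value in split_stream([0x0001, 0x8001, 0x8000, 0x7C00, 0x0000]):
        print(kind, hex(value))
```

The point of the sketch is that the decoder needs only one bit of state (whether it is in 16 bit mode), which is updated from bit f of every compressed instruction.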

# Opcode Allocation Ideas

* one bit from the 16-bit mode is used to indicate that 32-bit mode
  is to be dropped into for one single instruction only
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

### Branch

10 bit mode may be expanded by 16 bit mode later, adding capabilities
that do not fit in the extremely limited space.

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e  | f |
    |   offs2     | | 000 |    offs       | LK | 1 | b
    | BO2 | BI3   | | 001 | 0  BI | 0  BO | LK | 1 | bclr
    | BO2 | BI3   | | 001 | 0  BI | 1  BO | LK | 1 | bctar

16 bit mode:

* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO2 extends BO

10 bit mode:

* BO[0] enables CR check, BO[1] inverts check
* BI refers to the 4 bits of CR0 only
* no Branch Conditional with immediate
* no Absolute Address
* no CTR mode (and no bctr)
* offs is signed, and 2 byte aligned
* all branch targets are 2 byte aligned
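
To make the field layout of the `b` row concrete, here is a small Python sketch that extracts the offset and converts it to a byte displacement. The helper names (`field`, `sign_extend`, `branch_offset_bytes`) are hypothetical. The sketch assumes that bit 0 is the MSB of the 16-bit word, that `offs` counts 2 byte units, and that in 16 bit mode the sign bit moves to the top of `offs2`; none of these details are spelled out in the list above.

```python
def field(word, start, end):
    """Bits start..end (inclusive) of a 16-bit word, with bit 0 as the MSB."""
    width = end - start + 1
    return (word >> (15 - end)) & ((1 << width) - 1)


def sign_extend(value, width):
    """Interpret the low `width` bits of value as a two's complement number."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def branch_offset_bytes(word, mode16):
    """Byte displacement encoded by the compressed 'b' form."""
    if field(word, 5, 7) != 0b000:
        raise ValueError("not the 'b' encoding")
    offs = field(word, 0x8, 0xD)          # 6-bit offs, bits 8..d
    width = 6
    if mode16:
        offs |= field(word, 0, 4) << 6    # offs2 supplies 5 more MSBs
        width = 11
    return sign_extend(offs, width) * 2   # assumed: offs counts 2 byte units
```

Under these assumptions the reach is -64 to +62 bytes in 10 bit mode and -2048 to +2046 bytes in 16 bit mode.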

### LD/ST

    | 0   | 1   | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    | RB2 | RA2 |  RT   | | 001 | 1  RA | 1  RB | 0 | 1 | fld
    | RA2 | RT2 |  RB   | | 001 | 1  RA | 1  RT | 1 | 1 | fst
    |     |     |  RT   | | 111 |  RA   |  RB   | 0 | 1 | ld
    |     |     |  RB   | | 111 |  RA   |  RT   | 1 | 1 | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"

### Arithmetic

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     |  RT   | | 010 | RB    | RA!=0 | 0 | 1 | add
    |     |  RT   | | 011 | RB    | RA!=0 | 0 | 1 | sub.
    |     |  RT   | | 010 | RB    | RA    | 1 | 1 | mul
    |     |  RT   | | 011 | RB    | 0 0 0 | 0 | 1 | neg.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
* RT is implicitly RB: "add RT(=RB), RA, RB"

### Logical

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     |  RT   | | 100 | RB    | RA!=0 | 0 | 1 | and
    |     |  RT   | | 100 | RB    | RA!=0 | 1 | 1 | nand
    |     |  RT   | | 101 | RB    | RA!=0 | 0 | 1 | or
    |     |  RT   | | 101 | RB    | RA!=0 | 1 | 1 | nor
    |     |  RT   | | 100 | RB    | 0 0 0 | 0 | 1 | exts
    |     |  RT   | | 100 | RB    | 0 0 0 | 1 | 1 | cntlz
    |     |  RT   | | 101 | RB    | 0 0 0 | 0 | 1 | popcnt
    |     |  RT   | | 101 | RB    | 0 0 0 | 1 | 1 | not

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt, exts **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"
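
The Arithmetic and Logical tables above can be read as a lookup keyed on bits 5-7, bit e, and whether the RA field is zero. The Python sketch below shows one way to express that; `ALU_TABLE`, `ONLY_16BIT` and `decode_alu` are hypothetical names, bit 0 is assumed to be the MSB, and reading RT from bits 2-4 in 16 bit mode is an interpretation of the table columns rather than something the bullet lists state. Encodings belonging to other groups (for example Floating Point or System) simply return `None` here.

```python
def field(word, start, end):
    """Bits start..end (inclusive) of a 16-bit word, with bit 0 as the MSB."""
    width = end - start + 1
    return (word >> (15 - end)) & ((1 << width) - 1)


# (bits 5-7, bit e, RA field == 0) -> mnemonic, transcribed from the tables
ALU_TABLE = {
    (0b010, 0, False): "add",
    (0b011, 0, False): "sub.",
    (0b010, 1, False): "mul",
    (0b010, 1, True): "mul",
    (0b011, 0, True): "neg.",
    (0b100, 0, False): "and",
    (0b100, 1, False): "nand",
    (0b101, 0, False): "or",
    (0b101, 1, False): "nor",
    (0b100, 0, True): "exts",
    (0b100, 1, True): "cntlz",
    (0b101, 0, True): "popcnt",
    (0b101, 1, True): "not",
}

# Listed as not available in 10-bit mode
ONLY_16BIT = {"exts", "cntlz", "popcnt"}


def decode_alu(word, mode16):
    """Return (mnemonic, RT, RA, RB), or None if not a known ALU encoding."""
    ra = field(word, 0xB, 0xD)
    rb = field(word, 0x8, 0xA)
    key = (field(word, 5, 7), field(word, 0xE, 0xE), ra == 0)
    name = ALU_TABLE.get(key)
    if name is None or (name in ONLY_16BIT and not mode16):
        return None
    # 10 bit mode: RT is implicitly RB; 16 bit mode: assumed to be bits 2-4
    rt = field(word, 2, 4) if mode16 else rb
    return name, rt, ra, rb
```

For example, `decode_alu(0x0341, False)` yields `("neg.", 2, 0, 2)`: the `sub.` pattern with an all-zero RA field.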

### Floating Point

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     |  RT   | | 011 | RB    | RA!=0 | 1 | 1 | fsub.
    |     |  RT   | | 110 | RB    | RA!=0 | 0 | 1 | fadd
    |     |  RT   | | 110 | RB    | RA!=0 | 1 | 1 | fmul
    |     |  RT   | | 011 | RB    | 0 0 0 | 1 | 1 | fneg.
    |     |  RT   | | 110 | RB    | 0 0 0 | 0 | 1 | fabs
    |     |  RT   | | 110 | RB    | 0 0 0 | 1 | 1 | fmr.

10 bit mode:

* fsub. fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 0 1 2 3 | 4   | | 567 | 8 9 a | b c d e | f |
    | 0 0 0 0 | BF2 | | 001 | 1  BF | 0  BFA  | 1 | mcrf
    | 0 0 0 1 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crnor
    | 0 1 0 0 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crandc
    | 0 1 1 0 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crxor
    | 0 1 1 1 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crnand
    | 1 0 0 0 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crand
    | 1 0 0 1 | BA2 | | 001 | 1  BA | 0  BB   | 1 | creqv
    | 1 1 0 1 | BA2 | | 001 | 1  BA | 0  BB   | 1 | crorc
    | 1 1 1 0 | BA2 | | 001 | 1  BA | 0  BB   | 1 | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)
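
One observation about the table above, which the text itself does not state: the 4-bit values in bits 0-3 of the CR operation rows coincide with the two-input truth table of each operation, indexed by the pair of input bits. The Python sketch below relies on that reading; `CR_OPS` and `cr_eval` are hypothetical names, and whether a decoder would actually exploit this is an assumption.

```python
# Selector (bits 0-3) -> mnemonic, transcribed from the table above
CR_OPS = {
    0b0001: "crnor",
    0b0100: "crandc",
    0b0110: "crxor",
    0b0111: "crnand",
    0b1000: "crand",
    0b1001: "creqv",
    0b1101: "crorc",
    0b1110: "cror",
}


def cr_eval(selector, a, b):
    """Evaluate a CR operation on two input bits (each 0 or 1).

    The selector is treated as a 4-entry lookup table: the result is the
    selector bit at index (a << 1) | b, counting from the LSB.
    """
    if selector not in CR_OPS:
        raise ValueError("selector is not an allocated CR operation")
    return (selector >> ((a << 1) | b)) & 1
```

With this reading, `cr_eval(0b1000, a, b)` is 1 only when both inputs are 1 (crand), and `cr_eval(0b0100, a, b)` is 1 only for a=1, b=0 (crandc).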

### System

Selection of Compressed-encoding "Bank".  Different "banks" give different
meanings to opcodes.  Example: CBank=0b001 is heavily optimised for Audio/Video
Encode/Decode.

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |       Bank2 | | 010 | CBank | 0 0 0 | 0 | 1 | cbank

**not available** in 10-bit mode:

    | 0 1 2 3 | 4  | | 567 | 8 9 a | b c d e  | f |
    | 1 1 1 1 | 0  | | 001 | 1  00 | 0  RT    | 1 | mtlr
    | 1 1 1 1 | 0  | | 001 | 1  01 | 0  RT    | 1 | mtctr
    | 1 1 1 1 | 0  | | 001 | 1  10 | 0  RT    | 1 | mttar
    | 1 1 1 1 | 0  | | 001 | 1  11 | 0  RT    | 1 | mtcr
    | 1 1 1 1 | 1  | | 001 | 1  00 | 0  RA    | 1 | mflr
    | 1 1 1 1 | 1  | | 001 | 1  01 | 0  RA    | 1 | mfctr
    | 1 1 1 1 | 1  | | 001 | 1  10 | 0  RA    | 1 | mftar
    | 1 1 1 1 | 1  | | 001 | 1  11 | 0  RA    | 1 | mfcr
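
The move-to/move-from rows above follow a regular pattern: bit 4 selects the direction, bits 9-a select the special register, and bits c-e carry the 3-bit GPR number. A Python sketch of that reading follows; `SPRS` and `decode_move` are hypothetical names, and bit 0 is assumed to be the MSB of the 16-bit word.

```python
def field(word, start, end):
    """Bits start..end (inclusive) of a 16-bit word, with bit 0 as the MSB."""
    width = end - start + 1
    return (word >> (15 - end)) & ((1 << width) - 1)


# Order taken from the 2-bit selector in bits 9-a of the table
SPRS = ("lr", "ctr", "tar", "cr")


def decode_move(word):
    """Return (mnemonic, gpr), or None if word is not one of the move rows."""
    if field(word, 0, 3) != 0b1111 or field(word, 5, 7) != 0b001:
        return None
    if field(word, 0x8, 0x8) != 1 or field(word, 0xB, 0xB) != 0:
        return None
    direction = "mf" if field(word, 4, 4) else "mt"   # bit 4: 0 = mt, 1 = mf
    spr = SPRS[field(word, 0x9, 0xA)]
    gpr = field(word, 0xC, 0xE)                       # RT for mt*, RA for mf*
    return direction + spr, gpr
```

Because the selector lives in bits 0-3, which hold the major opcode in 10-bit mode, this group can only be reached in 16 bit mode, matching the note above the table.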

### Unallocated

    | 0 1 2 3 | 4  | | 567 | 8 9 a | b c d e  | f |
    | 0 0 1 0 |    | | 001 | 1     | 0        | 1 |
    | 0 0 1 1 |    | | 001 | 1     | 0        | 1 |
    | 0 1 0 1 |    | | 001 | 1     | 0        | 1 |
    | 1 0 1 0 |    | | 001 | 1     | 0        | 1 |
    | 1 0 1 1 |    | | 001 | 1     | 0        | 1 |
    | 1 1 0 0 |    | | 001 | 1     | 0        | 1 |