# 16 bit Compressed

See <https://bugs.libre-soc.org/show_bug.cgi?id=238>

This one is a conundrum.  OpenPOWER ISA was never designed with 16
bit in mind.  VLE was added 10 years ago, but only by way of marking
an entire 64k page as "VLE".  With no means to mix 32 bit and 16 bit
instructions, jumping between the two would have been painful and
would have taken up space.

Here, to embed 16 bit instructions in a predominantly 32 bit stream,
spending an entire 16 bits just to switch into Compressed mode is
itself a significant overhead.  The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32, a Vectorised version of 16
bit Compressed, and ideally also a variant that adds the 27-bit prefix
format from SV-P64.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging".  This involves bank-switching to alternative encodings optimised for specific workloads.
* To enter "16 bit mode" for a duration specified at the start.
* To reserve one bit of every 16 bit instruction to indicate whether 16 bit mode is to continue.

In the Vector context this last option could take on an alternative
meaning: the bit would determine whether the instruction is 11-bit
prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|
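The following is a minimal Python sketch of how a decoder might collect the vector prefix from the layout above. It assumes MSB0 bit numbering, where bit 0 is the most significant bit of each halfword, as in the Power ISA. It also assumes that the extra 16 prefix bits are appended below the original 11 bits. The names `field` and `vector_prefix` and that concatenation order are hypothetical, chosen for illustration only.

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, MSB0 numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def vector_prefix(halfwords):
    """Return (prefix, prefix_width_in_bits, halfwords_consumed).

    halfwords[0]: major op (bits 0-4) and 11 bit vector prefix (bits 5-f)
    halfwords[1]: 16 bit opcode, whose bit f is the "alt vec. mode" bit
    halfwords[2]: extra 16 prefix bits, only present when alt is set
    """
    prefix = field(halfwords[0], 5, 0xF)  # the 11 bits after the 5 bit major op
    if field(halfwords[1], 0xF, 0xF) == 0:
        return prefix, 11, 2
    # Hypothetical ordering: the extra halfword supplies the low 16 bits.
    return (prefix << 16) | halfwords[2], 27, 3
```

The total 11 + 16 = 27 bits is what makes this the "27-bit prefixed" form.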

Using a major opcode to enter 16 bit mode leaves 11 bits that still
need a purpose.  In the scheme below, bit f of every halfword says
whether to stay in 16 bit mode:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit    stay in 16bit mode 1 |
    |16 bit       exit 16bit mode 0 |
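This is a minimal sketch of the "stay" bit in action. It counts how many halfwords, starting with the entry halfword that carries the major opcode, are decoded in 16 bit mode. As in the previous sketch, MSB0 numbering is assumed, so bit f is the least significant bit. The function `count_16bit_halfwords` is a hypothetical name.

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, MSB0 numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def count_16bit_halfwords(halfwords):
    """Count the halfwords decoded in 16 bit mode, starting at the entry halfword.

    Bit f set to 1 means "stay in 16 bit mode"; bit f clear means this is the
    last halfword before the stream returns to 32 bit instructions.
    """
    for index, hw in enumerate(halfwords):
        if field(hw, 0xF, 0xF) == 0:
            return index + 1
    raise ValueError("stream ended while still in 16 bit mode")
```

For the four halfwords drawn above, the function returns 4. Whatever follows is treated as 32 bit code again.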

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (i.e. selecting which bank of scalar
registers they access).

Another is to use the 11 bits for only the most commonly used
instructions.  In that case one of those 11 bits would still need to
be dedicated to saying whether 16 bit mode is to continue, leaving
only 10 bits for actual opcodes!
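The bit budget described above works out as follows. This short sketch uses the 5 bit major opcode figure stated in the text. The 3 bit group selector corresponds to bits 5-7 in the tables that follow. The constant names are illustrative only.

```python
HALFWORD_BITS = 16
MAJOR_OPCODE_BITS = 5  # as stated in the text above
STAY_BITS = 1          # the "continue in 16 bit mode" flag
GROUP_BITS = 3         # bits 5-7 select an instruction group in the tables

after_major = HALFWORD_BITS - MAJOR_OPCODE_BITS  # bits left after the major opcode
opcode_bits = after_major - STAY_BITS            # bits left for actual opcodes
encodings = 2 ** opcode_bits                     # distinct 10 bit encodings
per_group = 2 ** (opcode_bits - GROUP_BITS)      # encodings inside each group

print(after_major, opcode_bits, encodings)  # 11 10 1024
print(2 ** GROUP_BITS, per_group)           # 8 128
```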

# Opcode Allocation Ideas

## Opcodes exploration (Attempt 1)

### Branch

10 bit mode may be expanded by 16 bit mode later, adding capabilities
that do not fit in the extremely limited space.  In the tables below,
the fields in bits 0-4 apply only in 16 bit mode, while the 10 bit
mode encodings occupy bits 5-e.

    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 | a b | c d | e  | f |
    |   offs2     | | 0 0 0 |     offs        | LK | 1 | b
    | BO2 | BI3   | | 0 0 1 | 00  | BI  | BO  | LK | 1 | bclr
    | BO2 | BI3   | | 0 0 1 | 01  | BI  | BO  | LK | 1 | bctar

16 bit mode:

* offs2 extends offs in the MSBs
* BI3 extends BI in the MSBs to allow selection of any bit of the full CR
* BO2 extends BO

10 bit mode:

* BO[0] enables the CR check, BO[1] inverts the check
* BI refers to CR0 only (selecting one of its 4 bits)
* no Branch Conditional with immediate
* no Absolute Address
* no CTR mode (and no bctr)
* offs is a signed offset in units of 2 bytes
* all branch targets are 2 byte aligned
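The following is a minimal sketch of decoding the `b` offset. It assumes that offs2 and offs together form one two's-complement number, with offs2 in the MSBs, and that the offset counts 2 byte units. Under those assumptions the reach is -64..+62 bytes in 10 bit mode and -2048..+2046 bytes in 16 bit mode. The function names are hypothetical.

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, MSB0 numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_offset_bytes(hw, sixteen_bit_mode):
    """Byte offset encoded by the `b` instruction (bits 5-7 = 000)."""
    if field(hw, 5, 7) != 0b000:
        raise ValueError("not a b encoding")
    offs = field(hw, 8, 0xD)  # 6 bit offs
    if sixteen_bit_mode:
        raw = (field(hw, 0, 4) << 6) | offs  # offs2 supplies the 5 MSBs
        nbits = 11
    else:
        raw, nbits = offs, 6
    return sign_extend(raw, nbits) * 2  # offsets count 2 byte units
```

The same halfword can therefore mean -2 bytes in 10 bit mode and +126 bytes in 16 bit mode. In the second case offs2 is zero, so offs is no longer the sign-carrying top field.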

### LD/ST

    | 0 | 1   | 2 3 4 | | 5 6 7 | 8 9 | a b | c d | e | f |
    | F | RA2 |  RT   | | 0 0 1 | 11  | RA  | RB  | 0 | 1 | ld
    | F | RT2 |  RB   | | 0 0 1 | 11  | RA  | RT  | 1 | 1 | st

* elwidth overrides can select different load/store widths

16 bit mode:

* F=1 selects FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB (RT for st) are only 2 bits (registers 0-3)
* for LD, RT is implicitly RB: ld RT=RB, RA(RB)
* for ST, there is no offset: st RT, RA(0)
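This is a minimal sketch of reading the LD/ST fields in both modes, following the table and the notes above. It assumes MSB0 numbering. It also reads bits 2-4 of `st` as RB in 16 bit mode, which is how the table labels them. `decode_ldst` and its dictionary output are hypothetical.

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, MSB0 numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def decode_ldst(hw, sixteen_bit_mode):
    """Decode the LD/ST group (bits 5-7 = 001, bits 8-9 = 11) into register fields."""
    if field(hw, 5, 7) != 0b001 or field(hw, 8, 9) != 0b11:
        raise ValueError("not an LD/ST encoding")
    is_store = field(hw, 0xE, 0xE) == 1
    ra = field(hw, 0xA, 0xB)  # 2 bit RA
    cd = field(hw, 0xC, 0xD)  # RB for ld, RT for st
    if not sixteen_bit_mode:
        if is_store:
            return {"op": "st", "RT": cd, "RA": ra, "RB": None}  # st RT, RA(0)
        return {"op": "ld", "RT": cd, "RA": ra, "RB": cd}        # ld RT=RB, RA(RB)
    fp = field(hw, 0, 0) == 1   # F=1 selects FLD/FST
    ext = field(hw, 1, 1)       # RA2 for ld, RT2 for st
    bits_2_4 = field(hw, 2, 4)  # RT for ld, RB for st
    if is_store:
        return {"op": "fst" if fp else "st", "RT": (ext << 2) | cd, "RA": ra, "RB": bits_2_4}
    return {"op": "fld" if fp else "ld", "RT": bits_2_4, "RA": (ext << 2) | ra, "RB": cd}
```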

### Arithmetic

    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
    |     |       | | 0 1 0 | RB    | RA    | 0 | 1 | add
    |     |       | | 0 1 0 | RB    | RA    | 1 | 1 | mul
    |     |       | | 0 1 1 | RB    | (RA|0)| 0 | 1 | sub
    |     |       | | 0 1 1 | RB    | (RA|0)| 1 | 1 | cmp

10 bit mode:

* cmp targets CR0 by default
* for (RA|0), when RA=0 the input is a zero immediate,
  meaning that sub becomes neg, and cmp becomes cmp-against-zero
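The (RA|0) rule only changes which mnemonic a decoder reports. It does not need an extra opcode bit. The sketch below shows this for the two arithmetic groups, assuming MSB0 numbering. `decode_arith` and the mnemonic strings it returns are hypothetical.

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, MSB0 numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def decode_arith(hw):
    """Decode the 10 bit mode arithmetic groups into (mnemonic, RA, RB).

    RA is None when the (RA|0) rule substitutes a zero input.
    """
    group = field(hw, 5, 7)
    rb = field(hw, 8, 0xA)
    ra = field(hw, 0xB, 0xD)
    sel = field(hw, 0xE, 0xE)
    if group == 0b010:
        return ("mul" if sel else "add"), ra, rb
    if group == 0b011:
        if ra == 0:
            # zero input: sub becomes neg, cmp (to CR0) becomes cmp-against-zero
            return ("cmp-against-zero" if sel else "neg"), None, rb
        return ("cmp" if sel else "sub"), ra, rb
    raise ValueError("not an arithmetic encoding")
```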

### Logical

    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
    |     |       | | 1 0 0 | RB    | RA    | 0 | 1 | and
    |     |       | | 1 0 0 | RB    | RA    | 1 | 1 | nand
    |     |       | | 1 0 1 | RB    | RA    | 0 | 1 | or
    |     |       | | 1 0 1 | RB    | (RA|0)| 1 | 1 | nor

10 bit mode:

* for (RA|0), when RA=0 the input is a zero immediate,
  meaning that nor becomes not
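Why nor with a zero input is not: ~(0 | x) is simply ~x. The sketch below checks that identity, assuming 64-bit registers. `logical_op` and `nor_ra_or_zero` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit GPRs

LOGICAL_OPS = {
    "and":  lambda a, b: a & b,
    "nand": lambda a, b: ~(a & b),
    "or":   lambda a, b: a | b,
    "nor":  lambda a, b: ~(a | b),
}


def logical_op(mnemonic, a, b):
    """Evaluate one of the logical operations, truncated to 64 bits."""
    return LOGICAL_OPS[mnemonic](a, b) & MASK64


def nor_ra_or_zero(ra_field, ra_value, rb_value):
    """nor with an (RA|0) operand: a zero RA field means a zero input, not register 0."""
    a = 0 if ra_field == 0 else ra_value
    return logical_op("nor", a, rb_value)


# With RA=0 the register contents are ignored and the result is ~RB ("not").
print(hex(nor_ra_or_zero(0, 0x1234, 0xFF)))  # 0xffffffffffffff00
```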

### Floating Point

    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d | e | f |
    |     |  RT   | | 1 1 0 | RB    | RA!=0 | 0 | 1 | fadd
    |     |  RT   | | 1 1 0 | RB    | 0 0 0 | 0 | 1 | fabs
    |     |  RT   | | 1 1 0 | RB    | RA    | 1 | 1 | fmul
    |     |  RT   | | 1 1 1 | RB    | (RA|0)| 0 | 1 | fsub
    |     |  RT   | | 1 1 1 | RB    | (RA|0)| 1 | 1 | fcmp

10 bit mode:

* fcmp targets CR1 by default
* for (RA|0), when RA=0 the input is a zero immediate,
  meaning that fsub becomes fneg, and fcmp becomes fcmp-against-zero
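The floating point groups reuse RA=000 in two different ways. In the 110 group, RA=000 with e=0 is a separate instruction (fabs). In the 111 group, it is the (RA|0) zero input. The sketch below decodes both groups, assuming MSB0 numbering. It reads RT from bits 2-4 only in 16 bit mode, since those bits are not available otherwise. `decode_fp` is a hypothetical name.

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, MSB0 numbering (bit 0 = MSB)."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def decode_fp(hw, sixteen_bit_mode):
    """Decode the floating point groups into (mnemonic, RT, RA, RB).

    RT is None outside 16 bit mode; RA is None where RA=0 is not a register.
    """
    group = field(hw, 5, 7)
    rb = field(hw, 8, 0xA)
    ra = field(hw, 0xB, 0xD)
    sel = field(hw, 0xE, 0xE)
    rt = field(hw, 2, 4) if sixteen_bit_mode else None
    if group == 0b110:
        if sel:
            return "fmul", rt, ra, rb
        # e=0: RA=000 selects fabs of RB, any other RA selects fadd
        return ("fadd", rt, ra, rb) if ra else ("fabs", rt, None, rb)
    if group == 0b111:
        if ra == 0:  # (RA|0): zero input
            return ("fcmp-against-zero" if sel else "fneg"), rt, None, rb
        return ("fcmp" if sel else "fsub"), rt, ra, rb
    raise ValueError("not a floating point encoding")
```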

### Condition Register

    | 0 1 2 3 | 4   | | 5 6 7 | 8 9 | a b | c d e  | f |
    | 0 0 0 0 | BF2 | | 0 0 1 | 10  | BF  | BFA    | 1 | mcrf
    | 0 0 0 1 | BA2 | | 0 0 1 | 10  | BA  | BB     | 1 | crnor
    | 0 1 0 0 | BA2 | | 0 0 1 | 10  | BA  | BB     | 1 | crandc
    | 0 1 1 0 | BA2 | | 0 0 1 | 10  | BA  | BB     | 1 | crxor
    | 0 1 1 1 | BA2 | | 0 0 1 | 10  | BA  | BB     | 1 | crnand
    | 1 0 0 0 | BA2 | | 0 0 1 | 10  | BA  | BB     | 1 | crand
    | 1 0 0 1 | BA2 | | 0 0 1 | 10  | BA  | BB     | 1 | creqv
    | 1 1 0 1 | BA2 | | 0 0 1 | 10  | BA  | BB     | 1 | crorc
    | 1 1 1 0 | BA2 | | 0 0 1 | 10  | BA  | BB     | 1 | cror

10 bit mode:

* mcrf BF is only 2 bits, so the destination can only be CR0-CR3

16 bit mode:

* mcrf BF2 extends BF (in the MSB) to 3 bits
* CR operations: the destination CR bit is the same as BA.
* CR operations: only possible on CR0 and CR1, because BA (extended by BA2) and BB are 3 bit CR bit numbers, which reach only the 8 bits of CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)
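The 4 bit codes in bits 0-3 of the CR table appear to be truth tables. They also appear to match bits 1-4 of the extended opcodes of the corresponding Power ISA XL-form instructions; for example, crand is 257, which is binary 0100000001. Bit 0 gives the result for (BA, BB) = (1, 1), bit 1 for (1, 0), bit 2 for (0, 1) and bit 3 for (0, 0). Code 0000 is used for mcrf rather than a constant-zero operation. A decoder could then evaluate any of these operations with a single shift.

The sketch below shows that reading. It applies the 16 bit mode rule that the destination is BA, over the 8 bits of CR0 and CR1. `cr_logical` and `apply_cr_op` are hypothetical names, and the list-of-bits representation is an illustrative choice.

```python
# 4 bit function codes from bits 0-3 of the CR table above
CR_OPS = {
    0b0001: "crnor",
    0b0100: "crandc",
    0b0110: "crxor",
    0b0111: "crnand",
    0b1000: "crand",
    0b1001: "creqv",
    0b1101: "crorc",
    0b1110: "cror",
}


def cr_logical(code, a, b):
    """Use the 4 bit code as a truth table: the result is code bit (2*a + b), counted from the LSB."""
    return (code >> (2 * a + b)) & 1


def apply_cr_op(cr_bits, code, ba, bb):
    """cr_bits: the 8 bits of CR0 then CR1. BA is both a source and the destination."""
    if not (0 <= ba < 8 and 0 <= bb < 8):
        raise ValueError("16 bit mode CR operations only reach CR0 and CR1")
    result = list(cr_bits)
    result[ba] = cr_logical(code, cr_bits[ba], cr_bits[bb])
    return result
```

For example, `apply_cr_op(bits, 0b1110, 0, 5)` performs cror with CR0 bit 0 as both source and destination.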