# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum. OpenPOWER ISA was never designed with 16
bit in mind. VLE was added 10 years ago but only by way of marking
an entire 64k page as "VLE". As VLE has not been maintained, it is not
fully compatible with current PowerISA.

Here, in order to embed 16 bit into a predominantly 32 bit stream,
spending an entire 16 bits just to switch into Compressed mode
is itself a significant overhead. The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32, which is a Vectorised version
of 16 bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64 as well.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging". This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained

The last of these would be useful in the Vector context, where the bit
can take an alternative meaning: it determines whether the instruction
is 11-bit prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits in that entry
instruction which still need a purpose:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit   stay in 16bit mode  1 |
    |16 bit   stay in 16bit mode  1 |
    |16 bit   exit 16bit mode     0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions. In that case one of those 11 bits would also need to be
dedicated to saying whether 16 bit mode is to be continued, so only
10 bits remain for actual opcodes!

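As a rough illustration of the "one bit per 16 bit instruction sustains the mode" option, the Python sketch below splits a stream of halfwords into one compressed run and whatever follows it. It assumes that the flag is the last bit (bit f, the least significant bit when bit 0 is the MSB, as the diagram above suggests) and that the stream starts at the first 16 bit instruction after the mode-entry instruction. `split_compressed_run` is a hypothetical helper name used only for this example.

```python
def split_compressed_run(halfwords):
    """Return (run, rest) for a list of 16-bit integers.

    ASSUMPTION: halfwords[0] is the first instruction after the
    mode-entry instruction, and the LSB (bit 'f') is the
    'stay in 16-bit mode' flag: 1 = stay, 0 = exit after this one.
    """
    run = []
    for i, hw in enumerate(halfwords):
        run.append(hw)
        if hw & 1 == 0:
            # Flag clear: this is the last 16-bit instruction of the run.
            return run, halfwords[i + 1:]
    # Stream ended while still in 16-bit mode.
    return run, []


run, rest = split_compressed_run([0x0001, 0x0003, 0x0002, 0xFFFF])
print([hex(h) for h in run], [hex(h) for h in rest])
```

The instruction with the flag clear still belongs to the run, matching the "exit 16bit mode 0" row of the diagram.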
# Opcode Allocation Ideas

* one bit from the 16-bit mode is used to indicate that 32-bit mode
  is to be dropped into for a single instruction only
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

Switching between different encoding modes is controlled by M (alone)
in 10-bit mode, and by M and N in 16-bit mode. "10-bit mode" here is
the 16-bit form reached from the standard 32-bit stream: bits 0-4 hold
the major opcode, leaving 10 bits of opcode plus M.

* M in 10-bit mode if zero indicates that following instructions are
  standard OpenPOWER ISA 32-bit encoded (including, redundantly,
  further 10/16-bit instructions)
* M in 10-bit mode if 1 indicates that following instructions are
  in 16-bit encoding mode

Once in 16-bit mode:

* 0b01: stay in 16-bit mode
* 0b00: leave 16-bit mode permanently (return to standard OpenPOWER ISA)
* 0b10: leave 16-bit mode for one cycle (one standard OpenPOWER ISA
  32-bit instruction, then back to 16-bit mode)
* 0b11: free to be used for something completely different.

The current "top" idea for 0b11 is to use it for a new encoding format
of predominantly "immediates-based" 16-bit instructions (branch-conditional,
addi, mulli etc.)

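The mode transitions listed above can be sketched as two small lookup functions. This is only an illustration under stated assumptions: M is taken from bit f and N from bit 0 (as in the Arithmetic, Logical and Floating Point tables), bit 0 is the MSB of the halfword, and the two-bit values are read as N followed by M, which the text does not state explicitly. `Mode`, `after_10bit` and `after_16bit` are hypothetical names.

```python
from enum import Enum


class Mode(Enum):
    STD32 = "standard OpenPOWER 32-bit"
    C16 = "16-bit compressed"
    ONE32 = "one 32-bit instruction, then back to 16-bit"


def after_10bit(hw):
    """Mode for the instructions following a 10-bit-mode halfword."""
    m = hw & 1                      # M: bit f (LSB)
    return Mode.C16 if m else Mode.STD32


def after_16bit(hw):
    """Mode following a 16-bit-mode halfword; None for the spare 0b11."""
    n = (hw >> 15) & 1              # N: bit 0 (MSB)
    m = hw & 1                      # M: bit f (LSB)
    sel = (n << 1) | m              # ASSUMPTION: value is written N then M
    return {
        0b01: Mode.C16,
        0b00: Mode.STD32,
        0b10: Mode.ONE32,
        0b11: None,                 # unallocated: candidate immediates format
    }[sel]


print(after_10bit(0x0001), after_16bit(0x8000))
```

Reading the pair as N then M keeps M=1 meaning "16-bit instructions follow" in both modes, which is why that ordering was chosen for the sketch.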
### Branch

10 bit mode may be expanded by 16 bit mode later, adding capabilities
that do not fit in the extremely limited space.

    | 0 1 | 2 3 4 |  | 567 | 8 9 a | b c d | e  | f |
    | offs2       |  | 000 | offs          | LK | M | b
    | BO2 | BI3   |  | 001 | 0 BI  | 0 BO  | LK | M | bclr
    | BO2 | BI3   |  | 001 | 0 BI  | 1 BO  | LK | M | bctar

16 bit mode:

* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO2 extends BO

10 bit mode:

* BO[0] enables CR check, BO[1] inverts check
* BI selects one of the 4 bits of CR0 only
* no Branch Conditional with immediate
* no Absolute Address
* no CTR mode (and no bctr)
* offs is signed and 2-byte aligned
* all branches are to 2-byte aligned addresses

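To make the 10 bit branch layout concrete, here is a hedged Python sketch that pulls the fields out of a halfword. It assumes MSB0 numbering within the 16 bits (bit 0 is the most significant), that `offs` occupies bits 8-d as a 6-bit two's-complement count of halfwords, and it ignores bits 0-4 entirely. The helper names are illustrative only.

```python
def field(hw, first, last):
    """Extract bits first..last (inclusive, MSB0: bit 0 is the MSB
    of the 16-bit halfword)."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def sign_extend(value, bits):
    """Interpret a 'bits'-wide field as two's complement."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_branch_10bit(hw):
    """Return a dict of fields, or None if hw is not in the Branch table."""
    op = field(hw, 0x5, 0x7)
    lk, m = field(hw, 0xE, 0xE), field(hw, 0xF, 0xF)
    if op == 0b000:
        # ASSUMPTION: offs counts 2-byte units, hence the scaling by 2.
        offs = sign_extend(field(hw, 0x8, 0xD), 6)
        return {"insn": "b", "byte_offset": offs * 2, "LK": lk, "M": m}
    if op == 0b001 and field(hw, 0x8, 0x8) == 0:
        name = "bctar" if field(hw, 0xB, 0xB) else "bclr"
        return {"insn": name, "BI": field(hw, 0x9, 0xA),
                "BO": field(hw, 0xC, 0xD), "LK": lk, "M": m}
    return None


print(decode_branch_10bit(0x00FB))
print(decode_branch_10bit(0x0154))
```

With 6 bits of `offs` the reach in this reading is -64 to +62 bytes, which illustrates why `offs2` is needed in 16 bit mode.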
### LD/ST

    | 0   | 1   | 2 3 4 |  | 567 | 8 9 a | b c d | e | f |
    | RB2 | RA2 | RT    |  | 001 | 1 RA  | 1 RB  | 0 | M | fld
    | RA2 | RT2 | RB    |  | 001 | 1 RA  | 1 RT  | 1 | M | fst
    |     |     | RT    |  | 111 | RA    | RB    | 0 | M | ld
    |     |     | RB    |  | 111 | RA    | RT    | 1 | M | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"

### Arithmetic

    | 0 | 1 | 2 3 4 |  | 567 | 8 9 a | b c d | e | f |
    | N |   | RT    |  | 010 | RB    | RA!=0 | 0 | M | add
    | N |   | RT    |  | 011 | RB    | RA!=0 | 0 | M | sub.
    | N |   | RT    |  | 010 | RB    | RA    | 1 | M | mul
    | N |   | RT    |  | 011 | RB    | 0 0 0 | 0 | M | neg.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
* RT is implicitly RB: "add RT(=RB), RA, RB"

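The way RA=0 turns sub. into neg. can be shown with a small decoder for the four rows of the Arithmetic table. This is a sketch under assumptions: bit 0 is the MSB of the halfword, only the 10 bit mode fields (bits 5-e) are examined, and encodings that match no row of this table are returned as `None` because they belong to other tables. `decode_arith_10bit` is a hypothetical name.

```python
def decode_arith_10bit(hw):
    """Decode an Arithmetic-table halfword into (mnemonic, RT, RA, RB)."""
    op = (hw >> 8) & 0b111   # bits 5-7
    rb = (hw >> 5) & 0b111   # bits 8-a
    ra = (hw >> 2) & 0b111   # bits b-d
    e = (hw >> 1) & 1        # bit e
    rt = rb                  # 10-bit mode: RT is implicitly RB
    if op == 0b010 and e == 1:
        return ("mul", rt, ra, rb)
    if op == 0b010 and ra != 0:
        return ("add", rt, ra, rb)
    if op == 0b011 and e == 0:
        # (RA|0): RA=0 supplies a zero immediate, so sub. reads as neg.
        return ("neg." if ra == 0 else "sub.", rt, ra, rb)
    return None              # encoding belongs to another table


print(decode_arith_10bit(0x03A8))  # RA=2
print(decode_arith_10bit(0x03A0))  # same encoding with RA=0
```

No separate opcode is spent on neg.: it is simply the RA=0 corner of sub.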
### Logical

    | 0 | 1 | 2 3 4 |  | 567 | 8 9 a | b c d | e | f |
    | N |   | RT    |  | 100 | RB    | RA!=0 | 0 | M | and
    | N |   | RT    |  | 100 | RB    | RA!=0 | 1 | M | nand
    | N |   | RT    |  | 101 | RB    | RA!=0 | 0 | M | or
    | N |   | RT    |  | 101 | RB    | RA!=0 | 1 | M | nor
    | N |   | RT    |  | 100 | RB    | 0 0 0 | 0 | M | exts
    | N |   | RT    |  | 100 | RB    | 0 0 0 | 1 | M | cntlz
    | N |   | RT    |  | 101 | RB    | 0 0 0 | 0 | M | popcnt
    | N |   | RT    |  | 101 | RB    | 0 0 0 | 1 | M | not

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt, exts **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"

### Floating Point

    | 0 | 1 | 2 3 4 |  | 567 | 8 9 a | b c d | e | f |
    | N |   | RT    |  | 011 | RB    | RA!=0 | 1 | M | fsub.
    | N |   | RT    |  | 110 | RB    | RA!=0 | 0 | M | fadd
    | N |   | RT    |  | 110 | RB    | RA!=0 | 1 | M | fmul
    | N |   | RT    |  | 011 | RB    | 0 0 0 | 1 | M | fneg.
    | N |   | RT    |  | 110 | RB    | 0 0 0 | 0 | M | fabs
    | N |   | RT    |  | 110 | RB    | 0 0 0 | 1 | M | fmr.

10 bit mode:

* fsub. fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 0 1 2 3 | 4   |  | 567 | 8 9 a | b c d e | f |
    | 0 0 0 0 | BF2 |  | 001 | 1 BF  | 0 BFA   | M | mcrf
    | 0 0 0 1 | BA2 |  | 001 | 1 BA  | 0 BB    | M | crnor
    | 0 1 0 0 | BA2 |  | 001 | 1 BA  | 0 BB    | M | crandc
    | 0 1 1 0 | BA2 |  | 001 | 1 BA  | 0 BB    | M | crxor
    | 0 1 1 1 | BA2 |  | 001 | 1 BA  | 0 BB    | M | crnand
    | 1 0 0 0 | BA2 |  | 001 | 1 BA  | 0 BB    | M | crand
    | 1 0 0 1 | BA2 |  | 001 | 1 BA  | 0 BB    | M | creqv
    | 1 1 0 1 | BA2 |  | 001 | 1 BA  | 0 BB    | M | crorc
    | 1 1 1 0 | BA2 |  | 001 | 1 BA  | 0 BB    | M | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)

### System

Selection of Compressed-encoding "Bank". Different "banks" give different
meanings to opcodes. Example: CBank=0b001 is heavily optimised for
Audio/Video Encode/Decode.

    | 0 1 | 2 3 4 |  | 567 | 8 9 a | b c d | e | f |
    | Bank2       |  | 010 | CBank | 0 0 0 | 0 | M | cbank

**not available** in 10-bit mode:

    | 0 1 2 3 | 4 |  | 567 | 8 9 a | b c d e | f |
    | 1 1 1 1 | 0 |  | 001 | 1 00  | 0 RT    | M | mtlr
    | 1 1 1 1 | 0 |  | 001 | 1 01  | 0 RT    | M | mtctr
    | 1 1 1 1 | 0 |  | 001 | 1 10  | 0 RT    | M | mttar
    | 1 1 1 1 | 0 |  | 001 | 1 11  | 0 RT    | M | mtcr
    | 1 1 1 1 | 1 |  | 001 | 1 00  | 0 RA    | M | mflr
    | 1 1 1 1 | 1 |  | 001 | 1 01  | 0 RA    | M | mfctr
    | 1 1 1 1 | 1 |  | 001 | 1 10  | 0 RA    | M | mftar
    | 1 1 1 1 | 1 |  | 001 | 1 11  | 0 RA    | M | mfcr

### Unallocated

    | 0 1 2 3 | 4 |  | 567 | 8 9 a | b c d e | f |
    | 0 0 1 0 |   |  | 001 | 1     | 0       | M |
    | 0 0 1 1 |   |  | 001 | 1     | 0       | M |
    | 0 1 0 1 |   |  | 001 | 1     | 0       | M |
    | 1 0 1 0 |   |  | 001 | 1     | 0       | M |
    | 1 0 1 1 |   |  | 001 | 1     | 0       | M |
    | 1 1 0 0 |   |  | 001 | 1     | 0       | M |

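The six unallocated rows are the values of bits 0-3 that neither the Condition Register table nor the mtXX/mfXX group of the System table claims (all within 567=001, bit 8=1, bit b=0). A small Python cross-check, with the selector values copied from those tables and hypothetical variable names:

```python
# Bits 0-3 selector -> mnemonic(s), copied from the Condition Register
# and System tables (all with 567=001, bit 8=1, bit b=0).
ALLOCATED = {
    0b0000: "mcrf",
    0b0001: "crnor",
    0b0100: "crandc",
    0b0110: "crxor",
    0b0111: "crnand",
    0b1000: "crand",
    0b1001: "creqv",
    0b1101: "crorc",
    0b1110: "cror",
    0b1111: "mtlr/mtctr/mttar/mtcr/mflr/mfctr/mftar/mfcr",
}

# Everything not claimed above is free for future allocation.
unallocated = sorted(set(range(16)) - set(ALLOCATED))

for sel in unallocated:
    print(format(sel, "04b"))
```

The printed selectors can be compared row by row with the table above whenever an opcode is added or moved.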