# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum. The OpenPOWER ISA was never designed with 16-bit
instructions in mind. VLE was added 10 years ago, but only by way of marking
an entire 64k page as "VLE". Because VLE is not maintained, it is not
fully compatible with the current PowerISA.

Here, in order to embed 16-bit instructions into a predominantly 32-bit
stream, using an entire 16 bits just to switch into Compressed mode
is itself a significant overhead. The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32, which is a Vectorised version
of 16-bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64 as well.

Potential ways to reduce pressure on the 16-bit space are:

* To provide "paging". This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16-bit mode" for durations specified at the start
* To reserve one bit of every 16-bit instruction to indicate that 16-bit mode is to be sustained

The last of these would be useful in the Vector context, where the bit
could take on an alternative meaning: it determines whether the instruction
is 11-bit prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16-bit mode leaves 11 bits for which a use
must be found:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here 1 |
    |16 bit stay in 16bit mode 1 |
    |16 bit stay in 16bit mode 1 |
    |16 bit exit 16bit mode 0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16-bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions. In that case one of those 11 bits would still need
to be dedicated to saying whether 16-bit mode is to be continued, so only
10 bits remain for actual opcodes!

# Opcode Allocation Ideas

* one bit from the 16-bit mode is used to indicate that 32-bit mode
  is to be dropped into for only one single instruction
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

Switching between different encoding modes is controlled by M (alone)
in 10-bit mode, and by M and N in 16-bit mode.

* M in 10-bit mode, if 0, indicates that following instructions are
  standard OpenPOWER ISA 32-bit encoded (including, redundantly,
  further 10/16-bit instructions)
* M in 10-bit mode, if 1, indicates that following instructions are
  in 16-bit encoding mode

Once in 16-bit mode:

* 0b01 (M=1, N=0): stay in 16-bit mode
* 0b00 (M=0, N=0): leave 16-bit mode permanently (return to standard OpenPOWER ISA)
* 0b10 (M=0, N=1): leave 16-bit mode for one cycle (one standard OpenPOWER ISA instruction)
* 0b11 (M=1, N=1): free to be used for something completely different.
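The mode-switching rules above can be sketched as a small lookup. This is only an illustration of the list, not a definitive decoder: the function names and the action strings are hypothetical, and the exclusion of C-Major 0b001 and 0b111 from M+N switching is deliberately not modelled.

```python
# Hypothetical sketch of the M / N mode-switch rules listed above.
# The bit pair is written N||M, matching "0b01 (M=1, N=0)".
ACTIONS_16BIT = {
    (0, 1): "stay",       # 0b01: stay in 16-bit mode
    (0, 0): "exit",       # 0b00: return to standard 32-bit encoding
    (1, 0): "one32",      # 0b10: one standard 32-bit instruction, then back
    (1, 1): "immediate",  # 0b11: free; current idea is the immediate format
}


def mode_after_10bit(m):
    """A 10-bit instruction only has M: 1 enters 16-bit mode, 0 stays 32-bit."""
    return "16bit" if m == 1 else "32bit"


def action_after_16bit(n, m):
    """Look up what a 16-bit instruction's N and M bits request."""
    return ACTIONS_16BIT[(n, m)]


if __name__ == "__main__":
    print(mode_after_10bit(1))       # enter 16-bit mode
    print(action_after_16bit(0, 1))  # stay
    print(action_after_16bit(1, 0))  # one32
```

Keeping the four combinations in one table makes it easy to see that 0b11 is the only pair not already committed to a mode switch.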

The current "top" idea for 0b11 is to use it for a new encoding format
of predominantly "immediates-based" 16-bit instructions (branch-conditional,
addi, mulli etc.)

The Compressed Major Opcode is in bits 5-7.

* M+N mode-switching is not available for C-Major 0b001 or 0b111

### Immediate Opcodes

Only available in 16-bit mode, and only when M=1 and N=1.

    | 0 | 1 | 2 3 4 | | 567 | e | 89a | b c | d | e | f |
    | 1 | o2 | RT | | 010 | 1 | RB|0 | offs | 1 | addi.
    | 1 | o2 | RT | | 011 | 1 | RB|0 | offs | 1 | addis.
    | 1 | o2 | 0 | | 100 | 1 | RB | offs | 1 | cmpdi
    | 1 | o2 | 1 | | 100 | 1 | RB | offs | 1 | cmpwi
    | 1 | o2 | 0 | | 101 | 1 | RA | offs | 1 | ldi
    | 1 | o2 | 1 | | 101 | 1 | RA | offs | 1 | lwi
    | 1 | o2 | 0 | | 110 | 1 | RA | offs | 1 | flwi
    | 1 | o2 | 1 | | 110 | 1 | RA | offs | 1 | fldi

* Note that bc is included (below)
* the immediate is constructed from offs (LSBs) and o2 (MSB)
* for loads, the offset is aligned. 8byte: o2||offs||0b000 4byte: o2||offs||0b00
* RB|0: if RB is zero, addi. becomes "li"
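The immediate construction described in the notes above can be sketched as follows. The width of `offs` is left as a parameter because the table does not pin it down unambiguously, the immediate is treated as unsigned (an assumption; the notes do not say whether it is sign-extended), and the function names are hypothetical.

```python
# Hypothetical sketch: build o2 || offs, then align it for loads.
def build_immediate(o2, offs, offs_bits):
    """Concatenate o2 (MSB) with offs (LSBs). Unsigned: an assumption."""
    if o2 not in (0, 1):
        raise ValueError("o2 is a single bit")
    if not 0 <= offs < (1 << offs_bits):
        raise ValueError("offs does not fit in offs_bits")
    return (o2 << offs_bits) | offs


def load_offset(o2, offs, offs_bits, size_bytes):
    """Aligned load offset: append 0b000 for 8-byte, 0b00 for 4-byte loads."""
    shift = {8: 3, 4: 2}[size_bytes]
    return build_immediate(o2, offs, offs_bits) << shift


if __name__ == "__main__":
    # With a 2-bit offs field (an assumed width): o2=1, offs=0b01 gives 0b101
    print(build_immediate(1, 0b01, 2))  # 5
    print(load_offset(1, 0b01, 2, 8))   # 0b101000 = 40
    print(load_offset(1, 0b01, 2, 4))   # 0b10100 = 20
```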

### Branch

10-bit mode may be expanded by 16-bit mode later, adding capabilities
that do not fit in the extremely limited space.

    | 16-bit mode | | 10-bit mode |
    | 0 | 1 | 234 | | 567 | 8 9a | b | cd | e | f |
    | 0 | 0 000 | | 000 | 0 00 | 0 00 | 0 | 0 | illeg
    | N | offs2 | | 000 | LK offs | M | b, bl
    | 1 | offs2 | | 000 | LK | BI | BO1 oo | 1 | bc, bcl
    | N | BO3 BI3 | | 001 | LK | 0 BI | BO | M | bclr, bclrl

16 bit mode:

* bc only available when N,M=0b11
* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO3 extends BO
* bc offset constructed from oo as LSBs and offs2 as MSBs
* bc BI allows selection of all bits from CR0 or CR1
* bc CR check is always active (as if BO0=1) therefore BO1 inverts

10 bit mode:

* bc **not available** in 10-bit mode
* BO[0] enables CR check, BO[1] inverts check
* BI refers to CR0 only (4 bits of)
* no Branch Conditional with immediate
* no Absolute Address
* CTR mode allowed with BO[2] for b only.
* offs is signed and 2-byte aligned
* all branch targets are 2-byte aligned
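The BO rules and the 2-byte alignment above can be sketched like this. The polarity is an assumption: with BO[1]=0 the branch is taken when the selected CR bit is 1, and BO[1]=1 inverts that. CTR mode and LK are not modelled, and all names are hypothetical.

```python
# Hypothetical sketch of the branch-taken decision and target calculation.
def branch_taken_10bit(bo0, bo1, cr_bit):
    """BO[0] enables the CR check; BO[1] inverts it (polarity assumed)."""
    if not bo0:
        return True  # CR check disabled: CR does not affect the branch
    return bool(cr_bit) != bool(bo1)


def branch_taken_16bit_bc(bo1, cr_bit):
    """16-bit bc: the CR check is always active, as if BO0=1."""
    return branch_taken_10bit(1, bo1, cr_bit)


def branch_target(pc, offs):
    """offs is signed and counts 2-byte units relative to pc."""
    return pc + 2 * offs


if __name__ == "__main__":
    print(branch_taken_10bit(1, 0, 1))    # True: check enabled, bit set
    print(branch_taken_16bit_bc(1, 1))    # False: inverted check, bit set
    print(hex(branch_target(0x1000, -3))) # 0xffa
```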

### LD/ST

    | 16-bit mode | | 10-bit mode |
    | 0 | 1 | 2 3 4 | | 567 | e | 8 9 a | b c d | f |
    | RB2 | RA2 | RT | | 001 | 0 | 1 RA | 1 RB | M | fld
    | RA2 | RT2 | RB | | 001 | 1 | 1 RA | 1 RT | M | fst
    | | | RT | | 111 | 0 | RA | RB | M | ld
    | | | RB | | 111 | 1 | RA | RT | M | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"
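As a minimal sketch of the register-number extension described above: in 10-bit mode a register field is 2 bits (0-3), and in 16-bit mode the extra bit (RA2, RT2) is prepended as the MSB to reach 3 bits (0-7). The helper name is hypothetical.

```python
# Hypothetical sketch: RA2 || RA, with RA2 as the MSB of a 3-bit number.
def extend_reg(ext_bit, reg2):
    """Combine a 1-bit extension (MSB) with a 2-bit register field."""
    if ext_bit not in (0, 1):
        raise ValueError("extension is a single bit")
    if not 0 <= reg2 <= 3:
        raise ValueError("base register field is 2 bits")
    return (ext_bit << 2) | reg2


if __name__ == "__main__":
    print(extend_reg(0, 0b10))  # 2: same register as in 10-bit mode
    print(extend_reg(1, 0b11))  # 7: only reachable in 16-bit mode
```

With the extension bit at zero the 16-bit form selects the same registers as the 10-bit form, which is what allows the 10-bit encoding to be a subset.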

### Arithmetic

    | 16-bit mode | | 10-bit mode |
    | 0 | 1 | 2 3 4 | | 567 | e | 89a | b c d | f |
    | N | | RT | | 010 | 0 | RB | RA!=0 | M | add
    | N | | RT | | 010 | 1 | RB | RA | M | mul
    | N | | RT!=0 | | 011 | 0 | RB | RA!=0 | M | sub.
    | N | 0 | 000 | | 011 | 0 | RB | RA!=0 | M | cmpw
    | N | 1 | 000 | | 011 | 0 | RB | RA!=0 | M | cmpl
    | N | | RT | | 011 | 0 | RB | 000 | M | neg.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
* RT is implicitly RB: "add RT(=RB), RA, RB"
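The (RA|0) rule above can be illustrated with a tiny register-file model. The operand order (RA|0) - RB is inferred from the statement that sub. becomes neg. when RA=0; the 64-bit wrap-around and the function names are assumptions, and the CR0 update is not modelled.

```python
# Hypothetical sketch of 10-bit "sub." with the (RA|0) convention.
MASK64 = (1 << 64) - 1  # assumed register width


def ra_or_zero(regs, ra):
    """(RA|0): field value 0 means a zero immediate, not register 0."""
    return 0 if ra == 0 else regs[ra]


def sub_10bit(regs, rb, ra):
    """RT is implicitly RB: regs[rb] <- (RA|0) - regs[rb]."""
    regs[rb] = (ra_or_zero(regs, ra) - regs[rb]) & MASK64
    return regs[rb]


if __name__ == "__main__":
    regs = [0, 10, 3, 0]
    print(sub_10bit(regs, rb=2, ra=1))  # 10 - 3 = 7
    # RA=0: the same encoding now computes 0 - 7, i.e. neg.
    print(sub_10bit(regs, rb=2, ra=0) == (-7 & MASK64))  # True
```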

### Logical

    | 16-bit mode | | 10-bit mode |
    | 0 | 1 | 2 3 4 | | 567 | e | 8 9 a | b c d | f |
    | N | 0 | RT | | 100 | 0 | RB | RA!=0 | M | and
    | N | 0 | RT | | 100 | 1 | RB | RA!=0 | M | nand
    | N | 0 | RT | | 101 | 0 | RB | RA!=0 | M | or
    | N | 0 | RT | | 101 | 1 | RB | RA!=0 | M | nor
    | N | 0 | RT | | 100 | 0 | RB | 0 0 0 | M | extsw
    | N | 0 | RT | | 100 | 1 | RB | 0 0 0 | M | cntlz
    | N | 0 | RT | | 101 | 0 | RB | 0 0 0 | M | popcnt
    | N | 0 | RT | | 101 | 1 | RB | 0 0 0 | M | not

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567 | e | 8 9 a | b c d | f |
    | N | 1 | RT | | 100 | 0 | RB | RA!=0 | M |
    | N | 1 | RT | | 100 | 1 | RB | RA!=0 | M |
    | N | 1 | RT | | 101 | 0 | RB | RA!=0 | M | xor
    | N | 1 | RT | | 101 | 1 | RB | RA!=0 | M | eqv (xnor)
    | N | 1 | RT | | 100 | 0 | RB | 0 0 0 | M | extsb
    | N | 1 | RT | | 100 | 1 | RB | 0 0 0 | M | cnttz
    | N | 1 | RT | | 101 | 0 | RB | 0 0 0 | M |
    | N | 1 | RT | | 101 | 1 | RB | 0 0 0 | M | extsh

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt, exts **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"

### Floating Point

Note here that elwidth overrides (SV Prefix) can be used to select FP16/32/64

    | 16-bit mode | | 10-bit mode |
    | 0 | 1 | 2 3 4 | | 567 | e | 8 9 a | b c d | f |
    | N | | RT | | 011 | 1 | RB | RA!=0 | M | fsub.
    | N | 0 | RT | | 110 | 0 | RB | RA!=0 | M | fadd
    | N | 0 | RT | | 110 | 1 | RB | RA!=0 | M | fmul
    | N | 0 | RT | | 011 | 1 | RB | 0 0 0 | M | fneg.
    | N | 0 | RT | | 110 | 0 | RB | 0 0 0 | M |
    | N | 0 | RT | | 110 | 1 | RB | 0 0 0 | M |

16-bit mode only:

    | 0 | 1 | 2 3 4 | | 567 | e | 8 9 a | b c d | f |
    | N | 1 | RT | | 011 | 1 | RB | RA!=0 | M |
    | N | 1 | RT | | 110 | 0 | RB | RA!=0 | M |
    | N | 1 | RT | | 110 | 1 | RB | RA!=0 | M | fdiv
    | N | 1 | RT | | 011 | 1 | RB | 0 0 0 | M | fabs.
    | N | 1 | RT | | 110 | 0 | RB | 0 0 0 | M | fmr.
    | N | 1 | RT | | 110 | 1 | RB | 0 0 0 | M |

10 bit mode:

* fsub. fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode
* fdiv is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 16-bit mode | | 10-bit mode |
    | 0 1 2 3 | 4 | | 567 | 8 9 a | b c d e | f |
    | 0 0 0 0 | BF2 | | 001 | 1 BF | 0 BFA | M | mcrf
    | 0 0 0 1 | BA2 | | 001 | 1 BA | 0 BB | M | crnor
    | 0 1 0 0 | BA2 | | 001 | 1 BA | 0 BB | M | crandc
    | 0 1 1 0 | BA2 | | 001 | 1 BA | 0 BB | M | crxor
    | 0 1 1 1 | BA2 | | 001 | 1 BA | 0 BB | M | crnand
    | 1 0 0 0 | BA2 | | 001 | 1 BA | 0 BB | M | crand
    | 1 0 0 1 | BA2 | | 001 | 1 BA | 0 BB | M | creqv
    | 1 1 0 1 | BA2 | | 001 | 1 BA | 0 BB | M | crorc
    | 1 1 1 0 | BA2 | | 001 | 1 BA | 0 BB | M | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)

### System

cbank: Selection of Compressed-encoding "Bank". Different "banks" give different
meanings to opcodes.
Example: CBank=0b001 is heavily optimised for Audio/Video
Encode/Decode.

    | 16-bit mode | | 10-bit mode |
    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    | Bank2 | | 010 | CBank | 0 0 0 | 0 | M | cbank

**not available** in 10-bit mode:

    | 0 1 2 3 | 4 | | 567 | 8 9 a | b c d e | f |
    | 1 1 1 1 | 0 | | 001 | 1 00 | 0 RT | M | mtlr
    | 1 1 1 1 | 0 | | 001 | 1 01 | 0 RT | M | mtctr
    | 1 1 1 1 | 0 | | 001 | 1 11 | 0 RT | M | mtcr
    | 1 1 1 1 | 1 | | 001 | 1 00 | 0 RA | M | mflr
    | 1 1 1 1 | 1 | | 001 | 1 01 | 0 RA | M | mfctr
    | 1 1 1 1 | 1 | | 001 | 1 11 | 0 RA | M | mfcr

### Unallocated

    | 0 1 2 3 | 4 | | 567 | 8 9 a | b c d e | f |
    | 0 0 1 0 | | | 001 | 1 | 0 | M |
    | 0 0 1 1 | | | 001 | 1 | 0 | M |
    | 0 1 0 1 | | | 001 | 1 | 0 | M |
    | 1 0 1 0 | | | 001 | 1 | 0 | M |
    | 1 0 1 1 | | | 001 | 1 | 0 | M |
    | 1 1 0 0 | | | 001 | 1 | 0 | M |
    | 1 1 1 1 | 0 | | 001 | 1 10 | 0 | M |
    | 1 1 1 1 | 1 | | 001 | 1 10 | 0 | M |