# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum. The OpenPOWER ISA was never designed with 16
bit in mind. VLE was added 10 years ago, but only by way of marking
an entire 64k page as "VLE". Because VLE has not been maintained, it is
not fully compatible with the current PowerISA.

Here, when embedding 16 bit into a predominantly 32 bit stream, spending
an entire 16 bits just to switch into Compressed mode is itself a
significant overhead. The situation is made worse by 5 bits being taken
up by Major Opcode space, leaving only 11 bits to allocate to actual
instructions.

In addition we would like to add SV-C32, which is a Vectorised version
of 16 bit Compressed, and ideally also have a variant that adds the
27-bit prefix format from SV-P64.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging". This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for a duration specified at the start
* To reserve one bit of every 16 bit instruction to indicate whether 16 bit mode is to be sustained

This last option would be useful in the Vector context, where the bit
can take an alternative meaning: it determines whether the instruction
is 11-bit prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits for which a
use has to be found:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit   stay in 16bit mode  1 |
    |16 bit   stay in 16bit mode  1 |
    |16 bit   exit 16bit mode     0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions. In that case one of those 11 bits would still need to be
dedicated to saying whether 16 bit mode is to be continued, so only
10 bits remain for actual opcodes!
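The continuation-bit scheme in the second diagram can be sketched as a walk over a stream of 16 bit halfwords. This is only an illustration under stated assumptions: the name `walk_stream`, the all-zeros value of `C_MAJOR_PREFIX`, and letting bit f of the entry instruction decide whether 16 bit mode begins are choices made for the sketch, not something the notes above fix.

```python
from typing import Iterator, List, Tuple

# Assumption: the 5 major-opcode bits that mark the entry instruction.
C_MAJOR_PREFIX = 0b00000


def walk_stream(halfwords: List[int]) -> Iterator[Tuple[str, int]]:
    """Split a halfword stream into ('enter' | 'op16' | 'op32', value) pairs."""
    in_16bit = False
    i = 0
    while i < len(halfwords):
        hw = halfwords[i] & 0xFFFF
        if in_16bit or (hw >> 11) == C_MAJOR_PREFIX:
            # Bit f (the LSB) says whether the next instruction is 16 bit.
            yield ("op16" if in_16bit else "enter", hw)
            in_16bit = bool(hw & 1)
            i += 1
        else:
            if i + 1 >= len(halfwords):
                raise ValueError("truncated 32 bit instruction")
            # Two halfwords form one standard 32 bit instruction.
            yield ("op32", (hw << 16) | (halfwords[i + 1] & 0xFFFF))
            i += 2


# Entry, one 16 bit op that stays, one that exits, then a 32 bit op.
stream = [0x0001, 0x1235, 0x1234, 0x7C00, 0x0000]
kinds = [kind for kind, _ in walk_stream(stream)]
```

The sketch makes the cost visible: every 16 bit instruction gives up its last bit to mode control, which is where the "10 bits remain" figure comes from.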

# Opcode Allocation Ideas

* one bit of each 16-bit mode instruction is used to indicate that
  32-bit mode is to be dropped into for one single instruction only
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

Switching between different encoding modes is controlled by M (alone)
in 10-bit mode, and by M and N in 16-bit mode.

* M in 10-bit mode, if zero, indicates that following instructions are
  standard OpenPOWER ISA 32-bit encoded (including, redundantly,
  further 10/16-bit instructions)
* M in 10-bit mode, if 1, indicates that following instructions are
  in 16-bit encoding mode

Once in 16-bit mode, the pair N,M selects:

* 0b01 (M=1, N=0): stay in 16-bit mode
* 0b00: leave 16-bit mode permanently (return to standard OpenPOWER ISA)
* 0b10: leave 16-bit mode for one cycle (return to standard OpenPOWER ISA)
* 0b11: free to be used for something completely different.

The current "top" idea for 0b11 is to use it for a new encoding format
of predominantly "immediates-based" 16-bit instructions (branch-conditional,
addi, mulli etc.)

* The Compressed Major Opcode is in bits 5-7.
* The Minor opcode is in bit 8.
* In some cases bit 9 is taken as an additional sub-opcode, followed
  by bits 0-4 (for CR operations)
* M+N mode-switching is not available for C-Major 0b001 or 0b111
* 10 bit mode may be expanded by 16 bit mode, adding capabilities
  that do not fit in the extremely limited space.
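The N,M rules of this section can be written down as a small lookup. The sketch below assumes N is bit 0 (the MSB) and M is bit f (the LSB), which is how the tables further down place them; the function names and the returned strings are made up for illustration, and the exception for C-Major 0b001 and 0b111 is deliberately not modelled.

```python
def mode_after_16bit(insn: int) -> str:
    """Classify the N,M pair of an instruction seen while in 16-bit mode."""
    n = (insn >> 15) & 1  # bit 0 in MSB-first numbering
    m = insn & 1          # bit f
    return {
        0b01: "stay",              # remain in 16-bit mode
        0b00: "exit",              # back to standard 32-bit encoding
        0b10: "one_32bit",         # leave 16-bit mode for one cycle
        0b11: "immediate_format",  # proposed immediates-based encoding
    }[(n << 1) | m]


def enters_16bit_after_10bit(insn: int) -> bool:
    """In 10-bit mode only M matters: 1 means 16-bit mode follows."""
    return bool(insn & 1)
```

Keeping the classification separate from opcode decode mirrors the text: the mode bits sit at the two ends of the word, and everything in between is left to the C-Major and Minor opcode fields.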

### Immediate Opcodes

Only available in 16-bit mode, and only when M=1 and N=1.

    | 0 | 1  | 2 3 4 |  | 567.8 | 9ab  | cde | f |
    | 1 | i2 | RT    |  | 010.0 | RA|0 | imm | 1 | addi
    | 1 | i2         |  | 010.1 | RA   | imm | 1 | addis
    | 1 | i2         |  | 011.0 | RB   | imm | 1 | cmpdi
    | 1 | i2         |  | 011.1 | RB   | imm | 1 | cmpwi
    | 1 | i2         |  | 100.0 | RT   | imm | 1 | sti
    | 1 | i2         |  | 100.1 | RT   | imm | 1 | fstwi
    | 1 | i2         |  | 101.0 | RA   | imm | 1 | ldi
    | 1 | i2         |  | 101.1 | RA   | imm | 1 | lwi
    | 1 | i2         |  | 110.0 | RA   | imm | 1 | flwi
    | 1 | i2         |  | 110.1 | RA   | imm | 1 | fldi

Construction of immediate:

* addi is EXTS(i2||imm) to give a 4-bit range -8 to +7
* addis is EXTS(i2||imm||0b000), a 10-bit value giving a range of
  -512 to +504 in increments of 8
* all others are EXTS(i2||imm) to give a 7-bit range -64 to +63
  (the reach is further for LD/ST due to word/dword-alignment)

Further Notes:

* bc also has an immediate mode, listed separately below in the Branch section
* for LD/ST, the offset is aligned. 8-byte: i2||imm||0b000, 4-byte: i2||imm||0b00
* SV Prefix over-rides help provide alternative bitwidths for LD/ST
* RA|0: if RA is zero, addi becomes "li"
    - this only works if RT takes part of opcode
    - mv is also possible by specifying an immediate of zero
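The EXTS(...) constructions above are ordinary two's complement sign extension of a concatenated field. A minimal sketch follows, with field widths passed as parameters so that it does not depend on how wide i2 ends up in each format. The helper names `exts`, `addi_imm` and `scaled_offset` are invented for this example; `addi_imm` assumes i2 is 1 bit and imm is 3 bits, as in the addi row.

```python
def exts(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value` (two's complement)."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def addi_imm(i2: int, imm: int) -> int:
    """addi immediate: EXTS(i2 || imm), i2 being the MSB of a 4-bit value."""
    return exts((i2 << 3) | imm, 4)


def scaled_offset(field: int, field_bits: int, align_bits: int) -> int:
    """EXTS(field || zeros): an aligned offset, e.g. align_bits=3 for 8-byte."""
    return exts(field << align_bits, field_bits + align_bits)


print(addi_imm(0, 0b111))  # 7
print(addi_imm(1, 0b000))  # -8
```

Appending zero bits before sign-extending is what lets the aligned LD/ST forms reach further than the plain immediates with the same number of encoded bits.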

### Branch

Note that illeg and nop are all zeros apart from the minor opcode bit,
including in 16-bit mode. Given that C is allocated to OpenPOWER ISA
Major opcodes EXT000 and EXT001, this ensures that in both 10-bit *and*
16-bit mode a 16-bit run of all zeros is considered "illegal", whilst
0b0000.0000.1000.0000 is "nop".

    | 16-bit mode     |  | 10-bit mode                 |
    | 0 | 1 | 234     |  | 567.8  | 9 ab  | c de   | f |
    | 0 | 0 000       |  | 000.0  | 0 00  | 0 00   | 0 | illeg
    | 0 | 0 000       |  | 000.1  | 0 00  | 0 00   | 0 | nop
    | N | offs2       |  | 000.LK | offs!=0        | M | b, bl
    | 1 | offs2       |  | 000.LK | BI    | BO1 oo | 1 | bc, bcl
    | N | BO3 BI3     |  | 001.0  | LK BI | BO     | M | bclr, bclrl

16 bit mode:

* bc only available when N,M=0b11
* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO3 extends BO
* bc offset constructed from oo as LSBs and offs2 as MSBs
* bc BI allows selection of all bits from CR0 or CR1
* bc CR check is always active (as if BO0=1) therefore BO1 inverts

10 bit mode:

* illegal (all zeros) covers part of branch (offs=0,M=0,LK=0)
* nop also covers part of branch (offs=0,M=0,LK=1)
* bc **not available** in 10-bit mode
* BO[0] enables CR check, BO[1] inverts check
* BI refers to CR0 only (4 bits of)
* no Branch Conditional with immediate
* no Absolute Address
* CTR mode allowed with BO[2] for b only.
* offs is a signed offset, 2-byte aligned
* all branches are to 2-byte aligned targets
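To make the bit numbering concrete, here is a hedged sketch of decoding C-Major 0b000 in 10-bit mode. It assumes bit 0 is the MSB, that bits 0-4 are zero in 10-bit mode, that offs occupies bits 9 to e (6 bits), and that the byte displacement is EXTS(offs) times 2. These are one reading of the table, and the names `field`, `exts` and `decode_branch_10bit` are hypothetical.

```python
from typing import Optional, Tuple


def field(insn: int, first: int, last: int) -> int:
    """Bits first..last (inclusive) of a 16-bit word; bit 0 is the MSB."""
    width = last - first + 1
    return (insn >> (15 - last)) & ((1 << width) - 1)


def exts(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value`."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def decode_branch_10bit(insn: int) -> Tuple[str, Optional[int]]:
    """Return (mnemonic, byte displacement or None) for C-Major 0b000."""
    if field(insn, 0, 4) != 0 or field(insn, 5, 7) != 0b000:
        raise ValueError("not a 10-bit mode C-Major 0b000 encoding")
    lk = field(insn, 8, 8)
    offs = field(insn, 9, 14)
    m = field(insn, 15, 15)
    if offs == 0:
        if m == 0:
            return ("nop" if lk else "illeg", None)
        return ("unallocated", None)  # offs=0 with M=1 is not in the table
    return ("bl" if lk else "b", exts(offs, 6) * 2)


print(decode_branch_10bit(0x0000))  # ('illeg', None)
print(decode_branch_10bit(0x0080))  # ('nop', None)
```

Under this reading a 10-bit mode branch reaches -64 to +62 bytes, which is why 16-bit mode adds offs2 as extra MSBs.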

### LD/ST

    | 16-bit mode       |  | 10-bit mode               |
    | 0   | 1   | 2 3 4 |  | 567.8 | 9 a b | c d e | f |
    | RB2 | RA2 | RT    |  | 001.1 | 1 RA  | 0 RB  | M | fld
    | RA2 | RT2 | RB    |  | 001.1 | 1 RA  | 1 RT  | M | fst
    |     |     | RT    |  | 111.0 | RA    | RB    | M | ld
    |     |     | RB    |  | 111.1 | RA    | RT    | M | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"
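How the extension bits widen a register field can be shown on the fld row. The sketch below assumes bit 0 is the MSB and that each extension bit (RA2, RB2) becomes the MSB of a 3-bit register number, as the 16 bit mode notes state for RA2; the function names are hypothetical.

```python
from typing import Tuple


def field(insn: int, first: int, last: int) -> int:
    """Bits first..last (inclusive) of a 16-bit word; bit 0 is the MSB."""
    width = last - first + 1
    return (insn >> (15 - last)) & ((1 << width) - 1)


def fld_regs_16bit(insn: int) -> Tuple[int, int, int]:
    """Return (RT, RA, RB) for the fld row when decoded in 16-bit mode."""
    rb2 = field(insn, 0, 0)
    ra2 = field(insn, 1, 1)
    rt = field(insn, 2, 4)                   # already 3 bits wide
    ra = (ra2 << 2) | field(insn, 10, 11)    # RA2 || RA
    rb = (rb2 << 2) | field(insn, 13, 14)    # RB2 || RB
    return rt, ra, rb


# RB2=1 RA2=0 RT=101 | 001.1 | 1 RA=11 | 0 RB=10 | M=1
example = 0b1_0_101_0011_1_11_0_10_1
print(fld_regs_16bit(example))  # (5, 3, 6)
```

In 10-bit mode bits 0-4 are not available, so the same fields collapse to the 2-bit RA and RB noted above.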

### Arithmetic

    | 16-bit mode   |  | 10-bit mode             |
    | 0 | 1 | 2 3 4 |  | 567.8 | 9ab | c d e | f |
    | N |   | RT    |  | 010.0 | RB  | RA!=0 | M | add
    | N |   | RT    |  | 010.1 | RB  | RA    | M | mul
    | N |   | RT!=0 |  | 011.0 | RB  | RA!=0 | M | sub.
    | N | 0 | 000   |  | 011.0 | RB  | RA!=0 | M | cmpw
    | N | 1 | 000   |  | 011.0 | RB  | RA!=0 | M | cmpl
    | N |   | RT    |  | 011.0 | RB  | 000   | M | neg.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
* RT is implicitly RB: "add RT(=RB), RA, RB"
* Opcode 0b010.0 RA=0 is not missing from the above:
  it is a system-wide instruction, "cbank" (section below)

### Logical

    | 16-bit mode   |  | 10-bit mode             |
    | 0 | 1 | 2 3 4 |  | 567.8 | 9ab | c d e | f |
    | N | 0 | RT    |  | 100.0 | RB  | RA!=0 | M | and
    | N | 0 | RT    |  | 100.1 | RB  | RA!=0 | M | nand
    | N | 0 | RT    |  | 101.0 | RB  | RA!=0 | M | or
    | N | 0 | RT    |  | 101.1 | RB  | RA!=0 | M | nor
    | N | 0 | RT    |  | 100.0 | RB  | 0 0 0 | M | extsw
    | N | 0 | RT    |  | 100.1 | RB  | 0 0 0 | M | cntlz
    | N | 0 | RT    |  | 101.0 | RB  | 0 0 0 | M | popcnt
    | N | 0 | RT    |  | 101.1 | RB  | 0 0 0 | M | not

16-bit mode only:

    | 0 | 1 | 2 3 4 |  | 567.8 | 9ab | c d e | f |
    | N | 1 | RT    |  | 100.0 | RB  | RA!=0 | M |
    | N | 1 | RT    |  | 100.1 | RB  | RA!=0 | M |
    | N | 1 | RT    |  | 101.0 | RB  | RA!=0 | M | xor
    | N | 1 | RT    |  | 101.1 | RB  | RA!=0 | M | eqv (xnor)
    | N | 1 | RT    |  | 100.0 | RB  | 0 0 0 | M | extsb
    | N | 1 | RT    |  | 100.1 | RB  | 0 0 0 | M | cnttz
    | N | 1 | RT    |  | 101.0 | RB  | 0 0 0 | M |
    | N | 1 | RT    |  | 101.1 | RB  | 0 0 0 | M | extsh

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt, exts **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"

### Floating Point

Note here that elwidth overrides (SV Prefix) can be used to select FP16/32/64

    | 16-bit mode   |  | 10-bit mode             |
    | 0 | 1 | 2 3 4 |  | 567.8 | 9ab | c d e | f |
    | N |   | RT    |  | 011.1 | RB  | RA!=0 | M | fsub.
    | N | 0 | RT    |  | 110.0 | RB  | RA!=0 | M | fadd
    | N | 0 | RT    |  | 110.1 | RB  | RA!=0 | M | fmul
    | N | 0 | RT    |  | 011.1 | RB  | 0 0 0 | M | fneg.
    | N | 0 | RT    |  | 110.0 | RB  | 0 0 0 | M |
    | N | 0 | RT    |  | 110.1 | RB  | 0 0 0 | M |

16-bit mode only:

    | 0 | 1 | 2 3 4 |  | 567.8 | 9ab | c d e | f |
    | N | 1 | RT    |  | 011.1 | RB  | RA!=0 | M |
    | N | 1 | RT    |  | 110.0 | RB  | RA!=0 | M |
    | N | 1 | RT    |  | 110.1 | RB  | RA!=0 | M | fdiv
    | N | 1 | RT    |  | 011.1 | RB  | 0 0 0 | M | fabs.
    | N | 1 | RT    |  | 110.0 | RB  | 0 0 0 | M | fmr.
    | N | 1 | RT    |  | 110.1 | RB  | 0 0 0 | M |

10 bit mode:

* fsub. fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode
* fdiv is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 16-bit mode   |  | 10-bit mode            |
    | 0 1 2 3 | 4   |  | 567.8 | 9 ab | cde | f |
    | 0 0 0 0 | BF2 |  | 001.1 | 0 BF | BFA | M | mcrf
    | 0 0 0 1 | BA2 |  | 001.1 | 0 BA | BB  | M | crnor
    | 0 1 0 0 | BA2 |  | 001.1 | 0 BA | BB  | M | crandc
    | 0 1 1 0 | BA2 |  | 001.1 | 0 BA | BB  | M | crxor
    | 0 1 1 1 | BA2 |  | 001.1 | 0 BA | BB  | M | crnand
    | 1 0 0 0 | BA2 |  | 001.1 | 0 BA | BB  | M | crand
    | 1 0 0 1 | BA2 |  | 001.1 | 0 BA | BB  | M | creqv
    | 1 1 0 1 | BA2 |  | 001.1 | 0 BA | BB  | M | crorc
    | 1 1 1 0 | BA2 |  | 001.1 | 0 BA | BB  | M | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode (but mcrf is)

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)
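The CR table amounts to a lookup on bits 0-3 once C-Major 0b001, Minor 1 and bit 9 = 0 have been matched. A sketch of that lookup follows, assuming bit 0 is the MSB; `CR_OPS`, `field` and `cr_mnemonic` are names invented here, and combinations outside the table simply return `None`.

```python
from typing import Optional

# Sub-opcode in bits 0-3, copied from the table above.
CR_OPS = {
    0b0000: "mcrf",
    0b0001: "crnor",
    0b0100: "crandc",
    0b0110: "crxor",
    0b0111: "crnand",
    0b1000: "crand",
    0b1001: "creqv",
    0b1101: "crorc",
    0b1110: "cror",
}


def field(insn: int, first: int, last: int) -> int:
    """Bits first..last (inclusive) of a 16-bit word; bit 0 is the MSB."""
    width = last - first + 1
    return (insn >> (15 - last)) & ((1 << width) - 1)


def cr_mnemonic(insn: int) -> Optional[str]:
    """Name the CR operation, or None if bits 0-3 are not in the CR table."""
    if field(insn, 5, 8) != 0b0011 or field(insn, 9, 9) != 0:
        raise ValueError("not a C-Major 0b001.1 encoding with bit 9 clear")
    return CR_OPS.get(field(insn, 0, 3))


# 1000 | BA2=0 | 001.1 | 0 BA=01 | BB=010 | M=1
print(cr_mnemonic(0b1000_0_0011_0_01_010_1))  # crand
```

With bits 0-4 all zero, as in 10-bit mode, the lookup can only ever return mcrf, which matches the note that the other CR operations are not available there.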

### System

cbank: Selection of Compressed-encoding "Bank". Different "banks" give different
meanings to opcodes.
Example: CBank=0b001 is heavily optimised for Audio/Video
Encode/Decode.

    | 16-bit mode |  | 10-bit mode               |
    | 0 | 1 2 3 4 |  | 567.8 | 9 a b | c d e | f |
    | N | Bank2   |  | 010.0 | CBank | 0 0 0 | M | cbank

**not available** in 10-bit mode:

    | 0 1 2 3 | 4 |  | 567.8 | 9 ab | c d e | f |
    | 1 1 1 1 | 0 |  | 001.1 | 0 00 | RT    | M | mtlr
    | 1 1 1 1 | 0 |  | 001.1 | 0 01 | RT    | M | mtctr
    | 1 1 1 1 | 0 |  | 001.1 | 0 11 | RT    | M | mtcr
    | 1 1 1 1 | 1 |  | 001.1 | 0 00 | RA    | M | mflr
    | 1 1 1 1 | 1 |  | 001.1 | 0 01 | RA    | M | mfctr
    | 1 1 1 1 | 1 |  | 001.1 | 0 11 | RA    | M | mfcr

### Unallocated

    | 0 1 2 3 | 4 |  | 567.8 | 9 ab | c d e | f |
    | 0 0 1 0 |   |  | 001.1 | 0    |       | M |
    | 0 0 1 1 |   |  | 001.1 | 0    |       | M |
    | 0 1 0 1 |   |  | 001.1 | 0    |       | M |
    | 1 0 1 0 |   |  | 001.1 | 0    |       | M |
    | 1 0 1 1 |   |  | 001.1 | 0    |       | M |
    | 1 1 0 0 |   |  | 001.1 | 0    |       | M |
    | 1 1 1 1 | 0 |  | 001.1 | 0 10 |       | M |
    | 1 1 1 1 | 1 |  | 001.1 | 0 10 |       | M |