1 # 16 bit Compressed
2
3 Similar to VLE (but without immediate-prefixing) this encoding is designed
4 to fit on top of OpenPOWER ISA v3.0B when a "Modeswitch" bit is set (PCR
5 is recommended). Note that Compressed is *mutually incompatible*
6 with OpenPOWER v3.1B "prefixing" because it uses (requires) both EXT000
7 and EXT001. Hypothetically it could be made to use anything other than
8 EXT001, with some inconvenience (extra gates). The incompatibility is
9 "fixed" by swapping out of "Compressed" Mode and back into "Normal"
10 (v3.1B) Mode, at runtime, as needed.
11
12 Although initially intended to be augmented by Simple-V Prefixing, to
13 add Vector context and predication yet not put pressure on I-Cache power
14 or size, this Compressed Encoding is not critically dependent
15 *on* SV Prefixing, and may be used stand-alone.
16
17 See:
18
19 * <https://bugs.libre-soc.org/show_bug.cgi?id=238>
20 * <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding
21
22 This one is a conundrum. OpenPOWER ISA was never designed with 16
23 bit in mind. VLE was added 10 years ago but only by way of marking
24 an entire 64k page as "VLE". With VLE not maintained it is not
25 fully compatible with current PowerISA.
26
27 Here, in order to embed 16 bit into a predominantly 32 bit stream,
28 using an entire 16 bits just to switch into Compressed mode
29 is itself a significant overhead. The situation is made worse by 5 bits
30 being taken up by Major Opcode space, leaving only 11 bits to allocate
31 to actual instructions.
32
33 In addition we would like to add SV-C32 which is a Vectorised version
34 of 16 bit Compressed, and ideally have a variant that adds the 27-bit
35 prefix format from SV-P64, as well.
36
37 Potential ways to reduce pressure on the 16 bit space are:
38
39 * To provide "paging". This involves bank-switching to alternative optimised encodings for specific workloads
40 * To enter "16 bit mode" for durations specified at the start
41 * To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained
42
43 This latter would be useful in the Vector context to have an alternative
44 meaning: as the bit which determines whether the instruction is 11-bit
45 prefixed or 27-bit prefixed:
46
47 0 1 2 3 4 5 6 7 8 9 a b c d e f |
48 |major op | 11 bit vector prefix|
49 |16 bit opcode alt vec. mode ^ |
50 | extra vector prefix if alt set|
51
52 Using a major opcode to enter 16 bit mode, leaves 11 bits to find
53 something to use them for:
54
55 0 1 2 3 4 5 6 7 8 9 a b c d e f |
56 |major op | what to do here 1 |
57 |16 bit stay in 16bit mode 1 |
58 |16 bit stay in 16bit mode 1 |
59 |16 bit exit 16bit mode 0 |
60
61 One possibility is that the 11 bits are used for bank selection, with
62 some room for additional context such as altering the registers used
63 for the 16 bit operations (bank selection of which scalar regs).
64
65 Another is to use the 11 bits for only the most commonly used
66 instructions. In that case, one of those 11 bits would
67 also need to be dedicated to saying whether 16 bit mode is to be continued,
68 leaving only 10 bits for actual opcodes!
69
70 # Opcode Allocation Ideas
71
72 * one bit from the 16-bit mode is used to indicate that 32-bit mode
73 is to be dropped into for only one single instruction
74 <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>
75
76 ## Opcodes exploration (Attempt 1)
77
78 Switching between different encoding modes is controlled by M (alone)
79 in 10-bit mode, and M and N in 16-bit mode.
80
81 * M in 10-bit mode if zero indicates that following instructions are
82 standard OpenPOWER ISA 32-bit encoded (including, redundantly,
83 further 10/16-bit instructions)
84 * M in 10-bit mode if 1 indicates that following instructions are
85 in 16-bit encoding mode
86
87 Once in 16-bit mode:
88
89 * 0b01 (M=1, N=0): stay in 16-bit mode
90 * 0b00 (M=0, N=0): leave 16-bit mode permanently (return to standard OpenPOWER ISA)
91 * 0b10 (M=0, N=1): leave 16-bit mode for one instruction (return to standard OpenPOWER ISA)
92 * 0b11 (M=1, N=1): free to be used for something completely different.
93
94 The current "top" idea for 0b11 is to use it for a new encoding format
95 of predominantly "immediates-based" 16-bit instructions (branch-conditional,
96 addi, mulli etc.)
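The mode-switching rules above can be sketched as two small functions. This is only an illustration: the names `Following`, `after_10bit` and `after_16bit` are hypothetical, and the behaviour for 0b11 (remaining in 16-bit mode after an immediate-format instruction) is an assumption, since the text leaves that case open.

```python
from enum import Enum


class Following(Enum):
    """What encoding the instructions after the current one use."""
    STD32 = "standard OpenPOWER 32-bit"
    C16 = "16-bit Compressed"
    STD32_ONCE = "one 32-bit instruction, then 16-bit again"


def after_10bit(m: int) -> Following:
    """A 10-bit instruction only has M: 0 returns to 32-bit, 1 enters 16-bit mode."""
    return Following.C16 if m else Following.STD32


def after_16bit(n: int, m: int) -> Following:
    """A 16-bit instruction has N and M, written 0bNM in the list above."""
    selector = (n << 1) | m
    if selector == 0b01:
        return Following.C16
    if selector == 0b00:
        return Following.STD32
    if selector == 0b10:
        return Following.STD32_ONCE
    # 0b11: ASSUMPTION, not stated in the text. The instruction itself uses
    # the immediate format and 16-bit mode is taken to continue.
    return Following.C16
```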
97
98 * The Compressed Major Opcode is in bits 5-7.
99 * Minor opcode in bit 8.
100 * In some cases bit 9 is taken as an additional sub-opcode, followed
101 by bits 0-4 (for CR operations)
102 * M+N mode-switching is not available for C-Major 0b001 or 0b111
103 * 10 bit mode may be expanded by 16 bit mode, adding capabilities
104 that do not fit in the extreme limited space.
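As a rough illustration of the field positions listed above, the sketch below extracts fields from a 16-bit word using MSB0 numbering (bit 0 is the most significant bit, as in the tables on this page). The helper name `bits` and the example word are hypothetical; the example is an "add"-shaped word assembled by hand from the Arithmetic table.

```python
def bits(insn: int, first: int, last: int) -> int:
    """Return bits first..last (inclusive) of a 16-bit word, MSB0 numbering."""
    width = last - first + 1
    shift = 15 - last
    return (insn >> shift) & ((1 << width) - 1)


def major_minor(insn: int) -> tuple[int, int]:
    """Compressed Major Opcode is bits 5-7, Minor opcode is bit 8."""
    return bits(insn, 5, 7), bits(insn, 8, 8)


# Hand-assembled example: N=0, bit1=0, RT=3, major=0b010, minor=0,
# RB=2, RA=1, M=1  ->  0b0001_1010_0010_0011
example = 0b0001_1010_0010_0011
fields = {
    "N": bits(example, 0, 0),
    "RT": bits(example, 2, 4),
    "major": bits(example, 5, 7),
    "minor": bits(example, 8, 8),
    "RB": bits(example, 9, 11),
    "RA": bits(example, 12, 14),
    "M": bits(example, 15, 15),
}
```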
105
106 ### Immediate Opcodes
107
108 Only available in 16-bit mode, and only when M=1 and N=1.
109
110 | 0 | 1 | 2 3 4 | | 567.8 | 9ab | cde | f |
111 | 1 | i2 | RT | | 010.0 | RA|0 | imm | 1 | addi
112 | 1 | i2 | | 010.1 | RA | imm | 1 | addis
113 | 1 | i2 | | 011.0 | RB | imm | 1 | cmpdi
114 | 1 | i2 | | 011.1 | RB | imm | 1 | cmpwi
115 | 1 | i2 | | 100.0 | RT | imm | 1 | sti
116 | 1 | i2 | | 100.1 | RT | imm | 1 | fstwi
117 | 1 | i2 | | 101.0 | RA | imm | 1 | ldi
118 | 1 | i2 | | 101.1 | RA | imm | 1 | lwi
119 | 1 | i2 | | 110.0 | RA | imm | 1 | flwi
120 | 1 | i2 | | 110.1 | RA | imm | 1 | fldi
121
122 Construction of immediate:
123
124 * addi is EXTS(i2||imm) to give a 4-bit range -8 to +7
125 * addis is EXTS(i2||imm||000) to give an 11-bit range -1024 to +1023 in increments of 8
126 * all others are EXTS(i2||imm) to give a 7-bit range -128 to +127
127 (further for LD/ST due to word/dword-alignment)
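A minimal sketch of the addi case, assuming i2 is the single bit 1 and imm is the three bits c-e as shown in the addi row of the table. The helper names `exts` and `addi_immediate` are hypothetical. Only addi is modelled here, because the field widths for the other rows are not spelled out unambiguously.

```python
def exts(value: int, width: int) -> int:
    """Sign-extend a `width`-bit value to a Python int."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def addi_immediate(i2: int, imm: int) -> int:
    """EXTS(i2 || imm): i2 supplies the sign bit, imm the low 3 bits."""
    return exts((i2 << 3) | (imm & 0b111), 4)


# The sixteen encodings cover -8 to +7, as stated in the list above.
all_values = sorted(addi_immediate(i2, imm) for i2 in (0, 1) for imm in range(8))
```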
128
129 Further Notes:
130
131 * bc also has an immediate mode, listed separately below in Branch section
132 * for LD/ST, offset is aligned. 8-byte: i2||imm||0b000, 4-byte: i2||imm||0b00
133 * SV Prefix over-rides help provide alternative bitwidths for LD/ST
134 * RA|0: if RA is zero, addi becomes "li"
135 - this only works if RT takes part of opcode
136 - mv is also possible by specifying an immediate of zero
137
138
139 ### Branch
140
141 Note that illeg is all zeros, and nop is all zeros apart from bit 8, including in the 16-bit mode.
142 Given that C is allocated to OpenPOWER ISA Major opcodes EXT000 and
143 EXT001 this ensures that in both 10-bit *and* 16-bit mode, a 16-bit
144 run of all zeros is considered "illegal" whilst 0b0000.0000.1000.0000
145 is "nop"
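The two special patterns described above can be checked with a few lines. The constant and function names are hypothetical; the bit patterns themselves are taken from the paragraph above, using MSB0 numbering so that bit 8 (the Minor opcode) is the only bit that separates nop from illeg.

```python
ILLEG = 0b0000_0000_0000_0000  # 16-bit run of all zeros
NOP = 0b0000_0000_1000_0000    # only bit 8 (MSB0) set


def classify(insn: int) -> str:
    """Classify a 16-bit word as illeg, nop, or anything else."""
    insn &= 0xFFFF
    if insn == ILLEG:
        return "illeg"
    if insn == NOP:
        return "nop"
    return "other"


# The single differing bit is MSB0 bit 8, i.e. LSB0 bit 15 - 8 = 7.
differing_bit_mask = NOP ^ ILLEG
```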
146
147 | 16-bit mode | | 10-bit mode |
148 | 0 | 1 | 234 | | 567.8 | 9 ab | c de | f |
149 | 0 | 0 000 | | 000.0 | 0 00 | 0 00 | 0 | illeg
150 | 0 | 0 000 | | 000.1 | 0 00 | 0 00 | 0 | nop
151 | N | offs2 | | 000.LK | offs!=0 | M | b, bl
152 | 1 | offs2 | | 000.LK | BI | BO1 oo | 1 | bc, bcl
153 | N | BO3 BI3 | | 001.0 | LK BI | BO | M | bclr, bclrl
154
155 16 bit mode:
156
157 * bc only available when N,M=0b11
158 * offs2 extends offset in MSBs
159 * BI3 extends BI in MSBs to allow selection of full CR
160 * BO3 extends BO
161 * bc offset constructed from oo as LSBs and offs2 as MSBs
162 * bc BI allows selection of all bits from CR0 or CR1
163 * bc CR check is always active (as if BO0=1) therefore BO1 inverts
164
165 10 bit mode:
166
167 * illegal (all zeros) covers part of branch (offs=0,M=0,LK=0)
168 * nop also covers part of branch (offs=0,M=0,LK=1)
169 * bc **not available** in 10-bit mode
170 * BO[0] enables CR check, BO[1] inverts check
171 * BI refers to CR0 only (4 bits of)
172 * no Branch Conditional with immediate
173 * no Absolute Address
174 * CTR mode allowed with BO[2] for b only.
175 * offs is a signed offset, aligned to 2 bytes
176 * all branch targets are 2 byte aligned
177
178 ### LD/ST
179
180 | 16-bit mode | | 10-bit mode |
181 | 0 | 1 | 2 3 4 | | 567.8 | 9 a b | c d e | f |
182 | RB2 | RA2 | RT | | 001.1 | 1 RA | 0 RB | M | fld
183 | RA2 | RT2 | RB | | 001.1 | 1 RA | 1 RT | M | fst
184 | | | RT | | 111.0 | RA | RB | M | ld
185 | | | RB | | 111.1 | RA | RT | M | st
186
187 * elwidth overrides can set different widths
188
189 16 bit mode:
190
191 * F=1 is FLD, FST
192 * RA2 extends RA to 3 bits (MSB)
193 * RT2 extends RT to 3 bits (MSB)
194
195 10 bit mode:
196
197 * RA and RB are only 2 bit (0-3)
198 * for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
199 * for ST, there is no offset: "st RT, RA(0)"
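To illustrate how the extension bits widen a register number, here is a sketch that decodes the register fields of a 16-bit-mode fld word laid out as in the table above. The names `bits`, `extend` and `decode_fld_regs` are hypothetical, and treating RB2 as the MSB of RB (in the same way the text describes for RA2 and RT2) is an assumption.

```python
def bits(insn: int, first: int, last: int) -> int:
    """Return bits first..last (inclusive) of a 16-bit word, MSB0 numbering."""
    width = last - first + 1
    return (insn >> (15 - last)) & ((1 << width) - 1)


def extend(ext_bit: int, low2: int) -> int:
    """Prefix a 2-bit register number with an extension bit as its MSB."""
    return (ext_bit << 2) | low2


def decode_fld_regs(insn: int) -> dict[str, int]:
    """fld, 16-bit mode: RB2=bit 0, RA2=bit 1, RT=bits 2-4, RA=bits a-b, RB=bits d-e."""
    return {
        "RT": bits(insn, 2, 4),
        "RA": extend(bits(insn, 1, 1), bits(insn, 10, 11)),
        "RB": extend(bits(insn, 0, 0), bits(insn, 13, 14)),  # ASSUMPTION: RB2 is the MSB
    }


# Hand-assembled word: RB2=1, RA2=0, RT=5, 001.1, 1 RA=2, 0 RB=1, M=1
example = 0b1010_1001_1110_0011
```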
200
201 ### Arithmetic
202
203 | 16-bit mode | | 10-bit mode |
204 | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
205 | N | 0 | RT | | 010.0 | RB | RA!=0 | M | add
206 | N | 0 | RT | | 010.1 | RB | RA | M | mul
207 | N | 0 | RT!=0 | | 011.0 | RB | RA!=0 | M | sub.
208 | N | 0 | 000 | | 011.0 | RB | RA!=0 | M | cmpw
209 | N | 0 | RT | | 011.0 | RB | 000 | M | neg.
210
211 16 bit mode only:
212
213 | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
214 | N | 1 | RT | | 010.0 | | | M |
215 | N | 1 | RT | | 010.1 | RB | RA | M | div
216 | N | 1 | RT!=0 | | 011.0 | RB | RA!=0 | M |
217 | N | 1 | 000 | | 011.0 | RB | RA!=0 | M | cmpl
218 | N | 1 | RT | | 011.0 | RB | 000 | M |
219
220 10 bit mode:
221
222 * sub. default CR target is CR0
223 * for (RA|0) when RA=0 the input is a zero immediate,
224 meaning that sub. becomes neg.
225 * RT is implicitly RB: "add RT(=RB), RA, RB"
226 * Opcode 0b010.0 RA=0 is not missing from the above:
227 it is a system-wide instruction, "cbank" (section below)
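The way zero-valued register fields select between instructions in the first Arithmetic table can be sketched as follows. This covers only the rows where bit 1 is 0, the function name `decode_arith` is hypothetical, and checking RA=0 before RT=0 is an assumption about priority that the table implies but does not state.

```python
from typing import Optional


def decode_arith(op4: int, rt: int, ra: int) -> Optional[str]:
    """op4 is bits 5-8 (major || minor); rt is bits 2-4; ra is bits c-e."""
    if op4 == 0b0100:
        # RA=0 is not an add: that slot is taken by cbank.
        return "add" if ra != 0 else "cbank"
    if op4 == 0b0101:
        return "mul"
    if op4 == 0b0110:
        if ra == 0:
            return "neg."  # sub. with a zero immediate input
        return "sub." if rt != 0 else "cmpw"
    return None  # not part of this table
```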
228
229 ### Logical
230
231 | 16-bit mode | | 10-bit mode |
232 | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
233 | N | 0 | RT | | 100.0 | RB | RA!=0 | M | and
234 | N | 0 | RT | | 100.1 | RB | RA!=0 | M | nand
235 | N | 0 | RT | | 101.0 | RB | RA!=0 | M | or
236 | N | 0 | RT | | 101.1 | RB | RA!=0 | M | nor
237 | N | 0 | RT | | 100.0 | RB | 0 0 0 | M | extsw
238 | N | 0 | RT | | 100.1 | RB | 0 0 0 | M | cntlz
239 | N | 0 | RT | | 101.0 | RB | 0 0 0 | M | popcnt
240 | N | 0 | RT | | 101.1 | RB | 0 0 0 | M | not
241
242 16-bit mode only:
243
244 | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
245 | N | 1 | RT | | 100.0 | RB | RA!=0 | M |
246 | N | 1 | RT | | 100.1 | RB | RA!=0 | M |
247 | N | 1 | RT | | 101.0 | RB | RA!=0 | M | xor
248 | N | 1 | RT | | 101.1 | RB | RA!=0 | M | eqv (xnor)
249 | N | 1 | RT | | 100.0 | RB | 0 0 0 | M | extsb
250 | N | 1 | RT | | 100.1 | RB | 0 0 0 | M | cnttz
251 | N | 1 | RT | | 101.0 | RB | 0 0 0 | M |
252 | N | 1 | RT | | 101.1 | RB | 0 0 0 | M | extsh
253
254 10 bit mode:
255
256 * for (RA|0) when RA=0 the input is a zero immediate,
257 meaning that nor becomes not
258 * cntlz, popcnt, exts **not available** in 10-bit mode
259 * RT is implicitly RB: "and RT(=RB), RA, RB"
260
261 ### Floating Point
262
263 Note here that elwidth overrides (SV Prefix) can be used to select FP16/32/64
264
265 | 16-bit mode | | 10-bit mode |
266 | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
267 | N | | RT | | 011.1 | RB | RA!=0 | M | fsub.
268 | N | 0 | RT | | 110.0 | RB | RA!=0 | M | fadd
269 | N | 0 | RT | | 110.1 | RB | RA!=0 | M | fmul
270 | N | 0 | RT | | 011.1 | RB | 0 0 0 | M | fneg.
271 | N | 0 | RT | | 110.0 | RB | 0 0 0 | M |
272 | N | 0 | RT | | 110.1 | RB | 0 0 0 | M |
273
274 16-bit mode only:
275
276 | 0 | 1 | 2 3 4 | | 567.8 | 9ab | c d e | f |
277 | N | 1 | RT | | 011.1 | RB | RA!=0 | M |
278 | N | 1 | RT | | 110.0 | RB | RA!=0 | M |
279 | N | 1 | RT | | 110.1 | RB | RA!=0 | M | fdiv
280 | N | 1 | RT | | 011.1 | RB | 0 0 0 | M | fabs.
281 | N | 1 | RT | | 110.0 | RB | 0 0 0 | M | fmr.
282 | N | 1 | RT | | 110.1 | RB | 0 0 0 | M |
283
284 10 bit mode:
285
286 * fsub. fneg. and fmr. default target is CR1
287 * fmr. is **not available** in 10-bit mode
288 * fdiv is **not available** in 10-bit mode
289
290 16 bit mode:
291
292 * fmr. copies RB to RT (and sets CR1)
293
294 ### Condition Register
295
296 | 16-bit mode | | 10-bit mode |
297 | 0 1 2 3 | 4 | | 567.8 | 9 ab | cde | f |
298 | 0 0 0 0 | BF2 | | 001.1 | 0 BF | BFA | M | mcrf
299 | 0 0 0 1 | BA2 | | 001.1 | 0 BA | BB | M | crnor
300 | 0 1 0 0 | BA2 | | 001.1 | 0 BA | BB | M | crandc
301 | 0 1 1 0 | BA2 | | 001.1 | 0 BA | BB | M | crxor
302 | 0 1 1 1 | BA2 | | 001.1 | 0 BA | BB | M | crnand
303 | 1 0 0 0 | BA2 | | 001.1 | 0 BA | BB | M | crand
304 | 1 0 0 1 | BA2 | | 001.1 | 0 BA | BB | M | creqv
305 | 1 1 0 1 | BA2 | | 001.1 | 0 BA | BB | M | crorc
306 | 1 1 1 0 | BA2 | | 001.1 | 0 BA | BB | M | cror
307
308 10 bit mode:
309
310 * mcrf BF is only 2 bits which means the destination is only CR0-CR3
311 * CR operations: **not available** in 10-bit mode (but mcrf is)
312
313 16 bit mode:
314
315 * mcrf BF2 extends BF (in MSB) to 3 bits
316 * CR operations: destination register is same as BA.
317 * CR operations: only possible on CR0 and CR1
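One observation, offered as a reading of the listed values rather than a rule stated on this page: the 4-bit selectors in bits 0-3 of the CR table behave like truth tables indexed by the two input bits. The sketch below checks that reading against the usual definitions of the eight CR operations; the names `CR_SELECTORS`, `cr_op` and `REFERENCE` are hypothetical.

```python
# Selector values (bits 0-3) copied from the table above.
CR_SELECTORS = {
    "crnor": 0b0001, "crandc": 0b0100, "crxor": 0b0110, "crnand": 0b0111,
    "crand": 0b1000, "creqv": 0b1001, "crorc": 0b1101, "cror": 0b1110,
}


def cr_op(selector: int, a: int, b: int) -> int:
    """Treat the selector as a truth table: result is bit (2*a + b), LSB0."""
    return (selector >> ((a << 1) | b)) & 1


# Conventional definitions, with a = CR bit BA and b = CR bit BB.
REFERENCE = {
    "crnor": lambda a, b: 1 - (a | b),
    "crandc": lambda a, b: a & (1 - b),
    "crxor": lambda a, b: a ^ b,
    "crnand": lambda a, b: 1 - (a & b),
    "crand": lambda a, b: a & b,
    "creqv": lambda a, b: 1 - (a ^ b),
    "crorc": lambda a, b: a | (1 - b),
    "cror": lambda a, b: a | b,
}

mismatches = [
    (name, a, b)
    for name, sel in CR_SELECTORS.items()
    for a in (0, 1)
    for b in (0, 1)
    if cr_op(sel, a, b) != REFERENCE[name](a, b)
]
```

Selectors 0b0000 (mcrf) and 0b1111 (mtlr and friends) are used for other purposes in the tables, so the truth-table reading only applies to the eight values listed.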
318
319 SV (Vector Mode):
320
321 * CR operations: greatly extended reach/range (useful for predicates)
322
323 ### System
324
325 cbank: Selection of Compressed-encoding "Bank". Different "banks"
326 give different meanings to opcodes. Example: CBank=0b001 is heavily
327 optimised for Audio/Video Encode/Decode. cbank borrows from add's encoding
328 space (when RA==0).
329
330 | 16-bit mode | | 10-bit mode |
331 | 0 | 1 2 3 4 | | 567.8 | 9ab | cde | f |
332 | N | 0 Bank2 | | 010.0 | CBank | 000 | M | cbank
333
334 **not available** in 10-bit mode:
335
336 | 0 1 2 3 | 4 | | 567.8 | 9 ab | cde | f |
337 | 1 1 1 1 | 0 | | 001.1 | 0 00 | RT | M | mtlr
338 | 1 1 1 1 | 0 | | 001.1 | 0 01 | RT | M | mtctr
339 | 1 1 1 1 | 0 | | 001.1 | 0 11 | RT | M | mtcr
340 | 1 1 1 1 | 1 | | 001.1 | 0 00 | RA | M | mflr
341 | 1 1 1 1 | 1 | | 001.1 | 0 01 | RA | M | mfctr
342 | 1 1 1 1 | 1 | | 001.1 | 0 11 | RA | M | mfcr
343
344 ### Unallocated
345
346 | 0 1 2 3 | 4 | | 567.8 | 9 ab | cde | f |
347 | 0 0 1 0 | | | 001.1 | 0 | | M |
348 | 0 0 1 1 | | | 001.1 | 0 | | M |
349 | 0 1 0 1 | | | 001.1 | 0 | | M |
350 | 1 0 1 0 | | | 001.1 | 0 | | M |
351 | 1 0 1 1 | | | 001.1 | 0 | | M |
352 | 1 1 0 0 | | | 001.1 | 0 | | M |
353 | 1 1 1 1 | 0 | | 001.1 | 0 10 | | M |
354 | 1 1 1 1 | 1 | | 001.1 | 0 10 | | M |
355