# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum.  OpenPOWER ISA was never designed with 16
bit in mind.  VLE was added 10 years ago but only by way of marking
an entire 64k page as "VLE".  VLE has not been maintained, so it is not
fully compatible with current PowerISA.

Here, in order to embed 16 bit into a predominantly 32 bit stream,
using an entire 16 bits just to switch into Compressed mode
is itself a significant overhead.  The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32 which is a Vectorised version
of 16 bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64, as well.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging".  This involves bank-switching to alternative
  optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate that the
  16 bit mode is to continue to be sustained

This latter would be useful in the Vector context to have an alternative
meaning: as the bit which determines whether the instruction is 11-bit
prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits to find
something to use them for:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit   stay in 16bit mode  1 |
    |16 bit   stay in 16bit mode  1 |
    |16 bit   exit 16bit mode     0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions.  Even then, one of those 11 bits would
also need to be dedicated to saying if 16 bit mode is to be continued.
Only 10 bits remain for actual opcodes!

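The stay/exit bit in the layout above can be sketched as a small stream
walker.  This is only an illustration under assumptions that this page does
not settle: `ESCAPE_MAJOR` is a hypothetical 5-bit major opcode value, the
stream is given as a list of 16-bit halfwords, bit 0 is taken to be the most
significant bit (as in the diagrams), and bit f acts as the continuation bit
on both the entry halfword and every following 16 bit instruction.

```python
ESCAPE_MAJOR = 0b00001  # hypothetical value: no major opcode is assigned on this page


def bit(hw, n):
    """Bit n of a 16-bit halfword, MSB0 numbering (bit 0 is the MSB)."""
    return (hw >> (15 - n)) & 1


def split_stream(halfwords, escape_major=ESCAPE_MAJOR):
    """Tag each instruction as '32', '10' (entry form) or '16' (compressed)."""
    out = []
    compressed = False
    i = 0
    while i < len(halfwords):
        hw = halfwords[i]
        if compressed:
            out.append(("16", hw))
            compressed = bit(hw, 0xF) == 1   # 1: stay in 16 bit mode, 0: exit
            i += 1
        elif hw >> 11 == escape_major:
            out.append(("10", hw))           # major op plus 11 remaining bits
            compressed = bit(hw, 0xF) == 1
            i += 1
        else:
            # ordinary 32 bit instruction: consumes two halfwords
            out.append(("32", (hw << 16) | halfwords[i + 1]))
            i += 2
    return out
```

For example, `split_stream([0x0801, 0x1235, 0x1234, 0x7C00, 0x0214])` tags one
entry halfword, two 16 bit instructions (the second clears bit f and so
exits), then one 32 bit instruction.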
# Opcode Allocation Ideas

* one bit from the 16-bit mode is used to indicate that 32-bit mode
  is to be dropped into for only one single instruction
  <https://bugs.libre-soc.org/show_bug.cgi?id=238#c2>

## Opcodes exploration (Attempt 1)

### Branch

10 bit mode may be expanded by 16 bit mode later, adding capabilities
that do not fit in the extremely limited space.

    | 0 1 | 2 3 4 |  | 567 | 8 9 a | b c d | e  | f |
    | offs2       |  | 000 | offs          | LK | 1 | b
    | BO2 | BI3   |  | 001 | 0 BI  | 0 BO  | LK | 1 | bclr
    | BO2 | BI3   |  | 001 | 0 BI  | 1 BO  | LK | 1 | bctar

16 bit mode:

* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO2 extends BO

10 bit mode:

* BO[0] enables CR check, BO[1] inverts check
* BI refers to CR0 only (selects one of its 4 bits)
* no Branch Conditional with immediate
* no Absolute Address
* no CTR mode (and no bctr)
* offs is signed and 2 byte aligned
* all branches are to 2 byte aligned addresses

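As a rough illustration of the Branch rows, the fields can be pulled out as
follows.  This is a sketch only, with several assumptions: bit 0 is the MSB of
the halfword, the offset is counted in 2 byte units (one reading of "2 byte
aligned"), and since the page does not say how BO2 combines with BO it is
returned as a separate field.  The names `field`, `sign_extend` and
`decode_branch` are made up for this example.

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, MSB0 numbering."""
    return (hw >> (15 - last)) & ((1 << (last - first + 1)) - 1)


def sign_extend(value, width):
    """Interpret a width-bit value as two's complement."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_branch(hw, wide=False):
    """wide=False: 10 bit mode (bits 0-4 ignored); wide=True: 16 bit mode."""
    op = field(hw, 5, 7)
    lk = field(hw, 14, 14)
    if op == 0b000:
        offs, width = field(hw, 8, 13), 6
        if wide:
            offs |= field(hw, 0, 4) << 6      # offs2 extends offs in the MSBs
            width = 11
        # assumption: offs counts 2 byte units
        return {"op": "b", "LK": lk, "byte_offset": 2 * sign_extend(offs, width)}
    if op == 0b001 and field(hw, 8, 8) == 0:
        bi = field(hw, 9, 10)
        out = {"op": "bctar" if field(hw, 11, 11) else "bclr",
               "LK": lk, "BO": field(hw, 12, 13)}
        if wide:
            out["BO2"] = field(hw, 0, 1)      # combination with BO not specified
            bi |= field(hw, 2, 4) << 2        # BI3 extends BI in the MSBs
        out["BI"] = bi
        return out
    return None                               # not one of the Branch rows
```

With these assumptions `decode_branch(0x00FF)` is a `b` with LK=1 and a byte
offset of -2, while the same halfword decoded with `wide=True` (offs2=0) has
an offset of +126, because the sign bit moves up into offs2.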
### LD/ST

    | 0   | 1   | 2 3 4 |  | 567 | 8 9 a | b c d | e | f |
    | RB2 | RA2 | RT    |  | 001 | 1 RA  | 1 RB  | 0 | 1 | fld
    | RA2 | RT2 | RB    |  | 001 | 1 RA  | 1 RT  | 1 | 1 | fst
    |     |     | RT    |  | 111 | RA    | RB    | 0 | 1 | ld
    |     |     | RB    |  | 111 | RA    | RT    | 1 | 1 | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bits (0-3)
* for LD, RT is implicitly RB: "ld RT=RB, RA(RB)"
* for ST, there is no offset: "st RT, RA(0)"

### Arithmetic

    | 0 1 | 2 3 4 |  | 567 | 8 9 a | b c d | e | f |
    |     | RT    |  | 010 | RB    | RA!=0 | 0 | 1 | add
    |     | RT    |  | 011 | RB    | RA!=0 | 0 | 1 | sub.
    |     | RT    |  | 010 | RB    | RA    | 1 | 1 | mul
    |     | RT    |  | 011 | RB    | 0 0 0 | 0 | 1 | neg.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
* RT is implicitly RB: "add RT(=RB), RA, RB"

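The way the RA field doubles as an opcode selector in the Arithmetic rows can
be sketched like this.  It is an illustration, not a reference decoder: bit 0
is assumed to be the MSB, and `field` and `decode_arith` are names invented
for the example.

```python
def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, MSB0 numbering."""
    return (hw >> (15 - last)) & ((1 << (last - first + 1)) - 1)


def decode_arith(hw, wide=False):
    """Return (mnemonic, RT, RA, RB) for the Arithmetic rows, else None."""
    op = field(hw, 5, 7)
    rb = field(hw, 8, 10)
    ra = field(hw, 11, 13)
    e = field(hw, 14, 14)
    # 10 bit mode has no RT field: RT is implicitly RB
    rt = field(hw, 2, 4) if wide else rb
    if op == 0b010 and e == 1:
        return ("mul", rt, ra, rb)
    if op == 0b010 and e == 0 and ra != 0:
        return ("add", rt, ra, rb)
    if op == 0b011 and e == 0:
        # RA=0 means a zero immediate, so sub. turns into neg.
        return ("sub." if ra != 0 else "neg.", rt, ra, rb)
    return None
```

Note that `010` with RA=0 and e=0 is deliberately left undecoded here: that
pattern is the one used by `cbank` in the System section.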
### Logical

    | 0 1 | 2 3 4 |  | 567 | 8 9 a | b c d | e | f |
    |     | RT    |  | 100 | RB    | RA!=0 | 0 | 1 | and
    |     | RT    |  | 100 | RB    | RA!=0 | 1 | 1 | nand
    |     | RT    |  | 101 | RB    | RA!=0 | 0 | 1 | or
    |     | RT    |  | 101 | RB    | RA!=0 | 1 | 1 | nor
    |     | RT    |  | 100 | RB    | 0 0 0 | 0 | 1 | exts
    |     | RT    |  | 100 | RB    | 0 0 0 | 1 | 1 | cntlz
    |     | RT    |  | 101 | RB    | 0 0 0 | 0 | 1 | popcnt
    |     | RT    |  | 101 | RB    | 0 0 0 | 1 | 1 | not

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not
* cntlz, popcnt, exts **not available** in 10-bit mode
* RT is implicitly RB: "and RT(=RB), RA, RB"

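A table-driven sketch of the Logical rows, together with a check that nor
against a zero operand really is not.  Assumptions: bit 0 is the MSB,
registers are modelled as 64 bit values, and `LOGICAL`, `decode_logical` and
`nor` are illustrative names only.

```python
MASK64 = (1 << 64) - 1

# (bits 5-7, bit e, RA field is zero) -> mnemonic, copied from the table
LOGICAL = {
    (0b100, 0, False): "and",
    (0b100, 1, False): "nand",
    (0b101, 0, False): "or",
    (0b101, 1, False): "nor",
    (0b100, 0, True): "exts",
    (0b100, 1, True): "cntlz",
    (0b101, 0, True): "popcnt",
    (0b101, 1, True): "not",
}
ONLY_16BIT = {"exts", "cntlz", "popcnt"}


def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, MSB0 numbering."""
    return (hw >> (15 - last)) & ((1 << (last - first + 1)) - 1)


def decode_logical(hw, wide=False):
    """Return the mnemonic for a Logical row, or None if not available."""
    key = (field(hw, 5, 7), field(hw, 14, 14), field(hw, 11, 13) == 0)
    name = LOGICAL.get(key)
    if name in ONLY_16BIT and not wide:
        return None                      # not available in 10 bit mode
    return name


def nor(a, b):
    """64 bit nor, used to show why RA=0 gives 'not'."""
    return ~(a | b) & MASK64
```

Since `nor(0, x)` equals the 64 bit complement of `x`, no separate datapath
is needed for not: only the mnemonic changes.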
### Floating Point

    | 0 1 | 2 3 4 |  | 567 | 8 9 a | b c d | e | f |
    |     | RT    |  | 011 | RB    | RA!=0 | 1 | 1 | fsub.
    |     | RT    |  | 110 | RB    | RA!=0 | 0 | 1 | fadd
    |     | RT    |  | 110 | RB    | RA!=0 | 1 | 1 | fmul
    |     | RT    |  | 011 | RB    | 0 0 0 | 1 | 1 | fneg.
    |     | RT    |  | 110 | RB    | 0 0 0 | 0 | 1 | fabs
    |     | RT    |  | 110 | RB    | 0 0 0 | 1 | 1 | fmr.

10 bit mode:

* fsub., fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 0 1 2 3 | 4   |  | 567 | 8 9 a | b c d e | f |
    | 0 0 0 0 | BF2 |  | 001 | 1 BF  | 0 BFA   | 1 | mcrf
    | 0 0 0 1 | BA2 |  | 001 | 1 BA  | 0 BB    | 1 | crnor
    | 0 1 0 0 | BA2 |  | 001 | 1 BA  | 0 BB    | 1 | crandc
    | 0 1 1 0 | BA2 |  | 001 | 1 BA  | 0 BB    | 1 | crxor
    | 0 1 1 1 | BA2 |  | 001 | 1 BA  | 0 BB    | 1 | crnand
    | 1 0 0 0 | BA2 |  | 001 | 1 BA  | 0 BB    | 1 | crand
    | 1 0 0 1 | BA2 |  | 001 | 1 BA  | 0 BB    | 1 | creqv
    | 1 1 0 1 | BA2 |  | 001 | 1 BA  | 0 BB    | 1 | crorc
    | 1 1 1 0 | BA2 |  | 001 | 1 BA  | 0 BB    | 1 | cror

10 bit mode:

* mcrf BF is only 2 bits, which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)

### System

Selection of Compressed-encoding "Bank".  Different "banks" give different
meanings to opcodes.  Example: CBank=0b001 is heavily optimised for Audio/Video
Encode/Decode.

    | 0 1 | 2 3 4 |  | 567 | 8 9 a | b c d | e | f |
    | Bank2       |  | 010 | CBank | 0 0 0 | 0 | 1 | cbank

**not available** in 10-bit mode:

    | 0 1 2 3 | 4 |  | 567 | 8 9 a | b c d e | f |
    | 1 1 1 1 | 0 |  | 001 | 1 00  | 0 RT    | 1 | mtlr
    | 1 1 1 1 | 0 |  | 001 | 1 01  | 0 RT    | 1 | mtctr
    | 1 1 1 1 | 0 |  | 001 | 1 10  | 0 RT    | 1 | mttar
    | 1 1 1 1 | 0 |  | 001 | 1 11  | 0 RT    | 1 | mtcr
    | 1 1 1 1 | 1 |  | 001 | 1 00  | 0 RA    | 1 | mflr
    | 1 1 1 1 | 1 |  | 001 | 1 01  | 0 RA    | 1 | mfctr
    | 1 1 1 1 | 1 |  | 001 | 1 10  | 0 RA    | 1 | mftar
    | 1 1 1 1 | 1 |  | 001 | 1 11  | 0 RA    | 1 | mfcr

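The eight mt*/mf* rows are regular enough to decode from three fields: bit 4
picks the direction, bits 9-a pick the special register, and bits c-e hold
the GPR number.  A minimal sketch, assuming bit 0 is the MSB (`field`, `SPRS`
and `decode_move` are names made up for this example):

```python
SPRS = ("lr", "ctr", "tar", "cr")   # selected by bits 9-a: 00, 01, 10, 11


def field(hw, first, last):
    """Bits first..last (inclusive) of a 16-bit halfword, MSB0 numbering."""
    return (hw >> (15 - last)) & ((1 << (last - first + 1)) - 1)


def decode_move(hw):
    """Return (mnemonic, GPR number) for the mt*/mf* rows, else None."""
    fixed = (field(hw, 0, 3) == 0b1111      # bits 0-3 select this group
             and field(hw, 5, 7) == 0b001
             and field(hw, 8, 8) == 1
             and field(hw, 11, 11) == 0
             and field(hw, 15, 15) == 1)
    if not fixed:
        return None
    direction = "mf" if field(hw, 4, 4) else "mt"
    return (direction + SPRS[field(hw, 9, 10)], field(hw, 12, 14))
```

For instance `decode_move(0xF1AB)` gives `("mtctr", 5)` and
`decode_move(0xF9E5)` gives `("mfcr", 2)`.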
### Unallocated

    | 0 1 2 3 | 4 |  | 567 | 8 9 a | b c d e | f |
    | 0 0 1 0 |   |  | 001 | 1     | 0       | 1 |
    | 0 0 1 1 |   |  | 001 | 1     | 0       | 1 |
    | 0 1 0 1 |   |  | 001 | 1     | 0       | 1 |
    | 1 0 1 0 |   |  | 001 | 1     | 0       | 1 |
    | 1 0 1 1 |   |  | 001 | 1     | 0       | 1 |
    | 1 1 0 0 |   |  | 001 | 1     | 0       | 1 |

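As a sanity check on the allocation, the sixteen possible values of bits 0-3
within this group (bits 5-7 = 001, bit 8 = 1, bit b = 0) can be partitioned
into the Condition Register rows, the mt*/mf* rows and the unallocated rows.
The sketch below just copies the values from the tables on this page; the
dictionary and function names are illustrative.

```python
CR_OPS = {
    0b0000: "mcrf", 0b0001: "crnor", 0b0100: "crandc",
    0b0110: "crxor", 0b0111: "crnand", 0b1000: "crand",
    0b1001: "creqv", 0b1101: "crorc", 0b1110: "cror",
}
SYSTEM = {0b1111: "mt*/mf*"}
UNALLOCATED = {0b0010, 0b0011, 0b0101, 0b1010, 0b1011, 0b1100}


def classify(sel):
    """Map a value of bits 0-3 to its row, or 'unallocated'."""
    if sel in CR_OPS:
        return CR_OPS[sel]
    if sel in SYSTEM:
        return SYSTEM[sel]
    if sel in UNALLOCATED:
        return "unallocated"
    raise ValueError("selector not covered by any table")


def check_partition():
    """True when the three groups cover 0..15 exactly once each."""
    groups = [set(CR_OPS), set(SYSTEM), UNALLOCATED]
    total = sum(len(g) for g in groups)
    return set().union(*groups) == set(range(16)) and total == 16
```

`check_partition()` holds for the tables as written: nine CR rows, one
selector shared by the mt*/mf* rows, and six unallocated slots.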