# 16 bit Compressed

See:

* <https://bugs.libre-soc.org/show_bug.cgi?id=238>
* <https://ftp.libre-soc.org/VLE_314-68105.pdf> VLE Encoding

This one is a conundrum. The OpenPOWER ISA was never designed with 16
bit in mind. VLE was added 10 years ago, but only by way of marking
an entire 64k page as "VLE". Because VLE is not maintained, it is not
fully compatible with the current PowerISA.

Here, the aim is to embed 16 bit into a predominantly 32 bit stream, and
using an entire 16 bits just to switch into Compressed mode is itself
a significant overhead. The situation is made worse by 5 bits being
taken up by Major Opcode space, leaving only 11 bits to allocate to
actual instructions.

In addition we would like to add SV-C32, which is a Vectorised version
of 16 bit Compressed, and ideally also have a variant that adds the
27-bit prefix format from SV-P64.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging". This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate that 16 bit mode is to be sustained

In the Vector context this last option could be given an alternative
meaning, with the reserved bit determining whether the instruction is
11-bit prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits for which
a use has to be found:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit   stay in 16bit mode  1 |
    |16 bit   stay in 16bit mode  1 |
    |16 bit   exit 16bit mode     0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions. In that case one of those 11 bits would still need to
be dedicated to saying whether 16 bit mode is to be continued, so only
10 bits remain for actual opcodes!
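The continuation-bit idea can be sketched as a small stream splitter. This is an illustration only, under assumptions that are not stated in the text: the instruction stream is presented as a list of 16 bit half-words, bit f (the least significant bit in the MSB0 numbering of the diagrams) is the "stay in 16 bit mode" flag, and `ENTER_MAJOR` is a made-up 5 bit major opcode value chosen purely for the example.

```python
# Hypothetical 5-bit major opcode that switches into Compressed mode.
# No real value is chosen anywhere in this page.
ENTER_MAJOR = 0b11111


def major_op(hw):
    """Bits 0-4 (MSB0 numbering) of a 16 bit half-word."""
    return (hw >> 11) & 0x1F


def stay_bit(hw):
    """Bit f: 1 = stay in 16 bit mode, 0 = exit (assumed position)."""
    return hw & 1


def split_stream(halfwords):
    """Split half-words into ("c10" | "c16" | "std32", value) tuples."""
    out = []
    in16 = False
    i = 0
    while i < len(halfwords):
        hw = halfwords[i]
        if in16:
            # full 16 bit instruction: bits 0-e are all available
            out.append(("c16", hw))
            in16 = bool(stay_bit(hw))
            i += 1
        elif major_op(hw) == ENTER_MAJOR:
            # entry half-word: only 10 bits (5-e) hold an instruction
            out.append(("c10", hw))
            in16 = bool(stay_bit(hw))
            i += 1
        else:
            # ordinary 32 bit instruction, spanning two half-words
            if i + 1 >= len(halfwords):
                raise ValueError("truncated 32 bit instruction")
            out.append(("std32", (hw << 16) | halfwords[i + 1]))
            i += 2
    return out


stream = [0xF801, 0x1235, 0x1234, 0x7C00, 0x0214]
for kind, value in split_stream(stream):
    print(kind, hex(value))
```

In this example the first half-word enters Compressed mode with the stay bit set, the next two are 16 bit instructions (the second clears the bit and exits), and the last two half-words form one ordinary 32 bit instruction.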

# Opcode Allocation Ideas

## Opcodes exploration (Attempt 1)

### Branch

10 bit mode may be expanded by 16 bit mode later, adding capabilities
that do not fit in the extremely limited space.

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e  | f |
    | offs2       | | 000 | offs          | LK | 1 | b
    | BO2 | BI3   | | 001 | 0 BI  | 0 BO  | LK | 1 | bclr
    | BO2 | BI3   | | 001 | 0 BI  | 1 BO  | LK | 1 | bctar

16 bit mode:

* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO2 extends BO

10 bit mode:

* BO[0] enables CR check, BO[1] inverts check
* BI refers to CR0 only (one of its 4 bits)
* no Branch Conditional with immediate
* no Absolute Address
* no CTR mode (and no bctr)
* offs is signed and 2 byte aligned
* all branches are to 2 byte aligned addresses
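As a rough illustration of the `b` row, the sketch below extracts the offset field and turns it into a signed byte displacement. It assumes MSB0 bit numbering as in the table, that `offs` occupies bits 8-d, and that in 16 bit mode `offs2` (bits 0-4) is simply concatenated above `offs` before sign extension. The helper names `field`, `sign_extend` and `decode_b` are invented for this example.

```python
def field(hw, first, last):
    """Extract bits first..last (inclusive, MSB0) of a 16 bit half-word."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def sign_extend(value, width):
    """Interpret a width-bit value as two's complement."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_b(hw, mode16):
    """Decode the unconditional branch row; returns (byte_offset, lk)."""
    if field(hw, 5, 7) != 0b000:
        raise ValueError("not an unconditional branch")
    offs, width = field(hw, 8, 13), 6
    if mode16:
        # ASSUMPTION: offs2 (bits 0-4) supplies the upper offset bits
        offs |= field(hw, 0, 4) << 6
        width = 11
    # offsets count 2 byte units, so scale to bytes
    return sign_extend(offs, width) * 2, bool(field(hw, 14, 14))


print(decode_b(0x00FF, mode16=False))  # (-2, True)
print(decode_b(0x78FF, mode16=True))   # (2046, True)
print(decode_b(0x8001, mode16=True))   # (-2048, False)
```

Under these assumptions 10 bit mode reaches -64 to +62 bytes, and 16 bit mode reaches -2048 to +2046 bytes.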

### LD/ST

    | 0   | 1   | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    | RB2 | RA2 | RT    | | 001 | 1 RA  | 1 RB  | 0 | 1 | fld
    | RA2 | RT2 | RB    | | 001 | 1 RA  | 1 RT  | 1 | 1 | fst
    |     |     | RT    | | 111 | RA    | RB    | 0 | 1 | ld
    |     |     | RB    | | 111 | RA    | RT    | 1 | 1 | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: ld RT=RB, RA(RB)
* for ST, there is no offset: st RT, RA(0)

### Arithmetic

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d  | e | f |
    |     | RT    | | 010 | RB    | RA!=0  | 0 | 1 | add
    |     | RT    | | 010 | RB    | RA     | 1 | 1 | mul
    |     | RT    | | 011 | RB    | (RA|0) | 0 | 1 | sub.

10 bit mode:

* sub. default CR target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub. becomes neg.
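The arithmetic rows, including the special case where RA=0 turns sub. into neg., can be sketched as a tiny decoder. This is only an illustration: it assumes MSB0 bit numbering, decodes the 16 bit mode layout in which bits 2-4 carry RT, and the function names `field` and `decode_arith` are invented here.

```python
def field(hw, first, last):
    """Extract bits first..last (inclusive, MSB0) of a 16 bit half-word."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


def decode_arith(hw):
    """Return (mnemonic, RT, RB, RA), or None if not an arithmetic row."""
    op = field(hw, 5, 7)
    rt, rb, ra = field(hw, 2, 4), field(hw, 8, 10), field(hw, 11, 13)
    e = field(hw, 14, 14)
    if op == 0b010 and e == 1:
        return ("mul", rt, rb, ra)
    if op == 0b010 and e == 0 and ra != 0:
        # RA=0 with e=0 is left to other encodings, so it is excluded
        return ("add", rt, rb, ra)
    if op == 0b011 and e == 0:
        # RA=0 selects a zero immediate, so sub. degenerates to neg.
        return ("neg." if ra == 0 else "sub.", rt, rb, ra)
    return None


print(decode_arith(0x0A51))  # ('add', 1, 2, 4)
print(decode_arith(0x0A53))  # ('mul', 1, 2, 4)
print(decode_arith(0x1BA1))  # ('neg.', 3, 5, 0)
```

Returning `None` for the 010 pattern with RA=0 and bit e clear keeps that slot free for whatever else is allocated there.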

### Logical

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     | RT    | | 100 | RB    | RA!=0 | 0 | 1 | and
    |     | RT    | | 100 | RB    | RA!=0 | 1 | 1 | nand
    |     | RT    | | 101 | RB    | RA!=0 | 0 | 1 | or
    |     | RT    | | 101 | RB    | RA!=0 | 1 | 1 | nor
    |     | RT    | | 100 | RB    | 0 0 0 | 0 | 1 | exts
    |     | RT    | | 100 | RB    | 0 0 0 | 1 | 1 | cntlz
    |     | RT    | | 101 | RB    | 0 0 0 | 0 | 1 | popcnt
    |     | RT    | | 101 | RB    | 0 0 0 | 1 | 1 | not

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not

### Floating Point

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     | RT    | | 011 | RB    | RA!=0 | 1 | 1 | fsub.
    |     | RT    | | 110 | RB    | RA!=0 | 0 | 1 | fadd
    |     | RT    | | 110 | RB    | RA!=0 | 1 | 1 | fmul
    |     | RT    | | 011 | RB    | 0 0 0 | 1 | 1 | fneg.
    |     | RT    | | 110 | RB    | 0 0 0 | 0 | 1 | fabs
    |     | RT    | | 110 | RB    | 0 0 0 | 1 | 1 | fmr.

10 bit mode:

* fsub. fneg. and fmr. default target is CR1
* fmr. is **not available** in 10-bit mode

16 bit mode:

* fmr. copies RB to RT (and sets CR1)

### Condition Register

    | 0 1 2 3 | 4   | | 567 | 8 9 a | b c d e | f |
    | 0 0 0 0 | BF2 | | 001 | 1 BF  | 0 BFA   | 1 | mcrf
    | 0 0 0 1 | BA2 | | 001 | 1 BA  | 0 BB    | 1 | crnor
    | 0 1 0 0 | BA2 | | 001 | 1 BA  | 0 BB    | 1 | crandc
    | 0 1 1 0 | BA2 | | 001 | 1 BA  | 0 BB    | 1 | crxor
    | 0 1 1 1 | BA2 | | 001 | 1 BA  | 0 BB    | 1 | crnand
    | 1 0 0 0 | BA2 | | 001 | 1 BA  | 0 BB    | 1 | crand
    | 1 0 0 1 | BA2 | | 001 | 1 BA  | 0 BB    | 1 | creqv
    | 1 1 0 1 | BA2 | | 001 | 1 BA  | 0 BB    | 1 | crorc
    | 1 1 1 0 | BA2 | | 001 | 1 BA  | 0 BB    | 1 | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3
* CR operations: **not available** in 10-bit mode

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)
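The 16 bit mode CR operations can be sketched as a table lookup on bits 0-3. This is an illustration only, with two assumptions that the text does not state: bit numbering is MSB0, and BA2 is the most significant bit of BA, by analogy with BF2 extending BF. The names `field`, `CR_OPS` and `decode_crop` are invented for the example.

```python
def field(hw, first, last):
    """Extract bits first..last (inclusive, MSB0) of a 16 bit half-word."""
    width = last - first + 1
    return (hw >> (15 - last)) & ((1 << width) - 1)


# Selector in bits 0-3, copied from the table (mcrf is not included)
CR_OPS = {
    0b0001: "crnor", 0b0100: "crandc", 0b0110: "crxor", 0b0111: "crnand",
    0b1000: "crand", 0b1001: "creqv", 0b1101: "crorc", 0b1110: "cror",
}


def decode_crop(hw):
    """Return a dict describing a CR operation, or None if no match."""
    fixed = (field(hw, 5, 7), field(hw, 8, 8), field(hw, 11, 11))
    name = CR_OPS.get(field(hw, 0, 3))
    if fixed != (0b001, 1, 0) or name is None:
        return None
    # ASSUMPTION: BA2 (bit 4) is the MSB of the 3 bit BA
    ba = (field(hw, 4, 4) << 2) | field(hw, 9, 10)
    bb = field(hw, 12, 14)
    # the destination is the same CR bit as BA
    return {"op": name, "BT": ba, "BA": ba, "BB": bb}


print(decode_crop(0x69C7))
```

With 3 bits each, BA and BB can address 8 CR bits, which matches the note that these operations only reach CR0 and CR1.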

### System

10/16-bit mode:

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     |       | | 010 | 0 0 0 | 0 0 0 | 0 | 1 | sc
    |     |       | | 010 | 0 0 1 | 0 0 0 | 0 | 1 | rfid

**not available** in 10-bit mode:

    | 0 1 2 3 | 4 | | 567 | 8 9 a | b c d e | f |
    | 1 1 1 1 | 0 | | 001 | 1 00  | 0 RT    | 1 | mtlr
    | 1 1 1 1 | 0 | | 001 | 1 01  | 0 RT    | 1 | mtctr
    | 1 1 1 1 | 0 | | 001 | 1 10  | 0 RT    | 1 | mttar
    | 1 1 1 1 | 0 | | 001 | 1 11  | 0 RT    | 1 | mtcr
    | 1 1 1 1 | 1 | | 001 | 1 00  | 0 RA    | 1 | mflr
    | 1 1 1 1 | 1 | | 001 | 1 01  | 0 RA    | 1 | mfctr
    | 1 1 1 1 | 1 | | 001 | 1 10  | 0 RA    | 1 | mftar
    | 1 1 1 1 | 1 | | 001 | 1 11  | 0 RA    | 1 | mfcr

### Unallocated

    | 0 1 | 2 3 4 | | 567 | 8 9 a | b c d | e | f |
    |     |       | | 010 | 0 1 0 | 0 0 0 | 0 | 1 |
    |     |       | | 010 | 0 1 1 | 0 0 0 | 0 | 1 |
    |     |       | | 010 | 1 0 0 | 0 0 0 | 0 | 1 |
    |     |       | | 010 | 1 0 1 | 0 0 0 | 0 | 1 |
    |     |       | | 010 | 1 1 0 | 0 0 0 | 0 | 1 |
    |     |       | | 010 | 1 1 1 | 0 0 0 | 0 | 1 |

    | 0 1 2 3 | 4 | | 567 | 8 9 a | b c d e | f |
    | 0 0 1 0 |   | | 001 | 1     | 0       | 1 |
    | 0 0 1 1 |   | | 001 | 1     | 0       | 1 |
    | 0 1 0 1 |   | | 001 | 1     | 0       | 1 |
    | 1 0 1 0 |   | | 001 | 1     | 0       | 1 |
    | 1 0 1 1 |   | | 001 | 1     | 0       | 1 |
    | 1 1 0 0 |   | | 001 | 1     | 0       | 1 |