# 16 bit Compressed

This one is a conundrum. The OpenPOWER ISA was never designed with 16
bit in mind. VLE was added 10 years ago but only by way of marking
an entire 64k page as "VLE". With no means to mix 32 bit and 16 bit,
jumping between the two would have been painful and taken up space.

Here, in order to embed 16 bit into a predominantly 32 bit stream,
spending an entire 16 bits just to switch into Compressed mode
is itself a significant overhead. The situation is made worse by 5 bits
being taken up by Major Opcode space, leaving only 11 bits to allocate
to actual instructions.

In addition we would like to add SV-C32, which is a Vectorised version
of 16 bit Compressed, and ideally have a variant that adds the 27-bit
prefix format from SV-P64 as well.

Potential ways to reduce pressure on the 16 bit space are:

* To provide "paging". This involves bank-switching to alternative optimised encodings for specific workloads
* To enter "16 bit mode" for durations specified at the start
* To reserve one bit of every 16 bit instruction to indicate that the 16 bit mode is to continue to be sustained

The last option would be useful in the Vector context, where the reserved
bit could take on an alternative meaning: it determines whether the
instruction is 11-bit prefixed or 27-bit prefixed:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | 11 bit vector prefix|
    |16 bit opcode  alt vec. mode ^ |
    | extra vector prefix if alt set|

Using a major opcode to enter 16 bit mode leaves 11 bits for which
a use must be found:

    0 1 2 3 4 5 6 7 8 9 a b c d e f |
    |major op | what to do here   1 |
    |16 bit stay in 16bit mode    1 |
    |16 bit stay in 16bit mode    1 |
    |16 bit exit 16bit mode       0 |

One possibility is that the 11 bits are used for bank selection, with
some room for additional context such as altering the registers used
for the 16 bit operations (bank selection of which scalar regs).

Another is to use the 11 bits for only the most commonly used
instructions. In that case, one of those 11 bits would still need
to be dedicated to saying whether 16 bit mode is to be continued.
Only 10 bits remain for actual opcodes!

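As a rough illustration of the continuation-bit idea, the Python sketch below walks a stream of halfwords and classifies each one. It assumes that bit f (the least significant bit, taking the diagrams to use MSB0 numbering) is the "stay in 16 bit mode" flag, and that a clear flag on the entry halfword means a single 10 bit instruction followed by a return to 32 bit. `ENTER_MAJOR` and `split_stream` are made-up names, and the opcode value is a placeholder: nothing has actually been allocated.

```python
# Hypothetical 5-bit major opcode used to enter Compressed mode.
# The notes above do not assign a value; this one is a placeholder.
ENTER_MAJOR = 0b11111


def split_stream(halfwords):
    """Classify each 16-bit halfword of an instruction stream.

    Returns a list of (kind, value) tuples, where kind is "32bit",
    "10bit" (the entry halfword carrying the major opcode) or "16bit".
    """
    out = []
    i = 0
    in_16bit_mode = False
    while i < len(halfwords):
        hw = halfwords[i] & 0xFFFF
        stay = bool(hw & 1)  # bit f: 1 = stay in 16 bit mode, 0 = exit
        if in_16bit_mode:
            out.append(("16bit", hw))
            in_16bit_mode = stay
            i += 1
        elif (hw >> 11) == ENTER_MAJOR:  # bits 0-4 hold the major opcode
            out.append(("10bit", hw))
            in_16bit_mode = stay
            i += 1
        else:
            if i + 1 >= len(halfwords):
                raise ValueError("truncated 32-bit instruction")
            out.append(("32bit", (hw << 16) | (halfwords[i + 1] & 0xFFFF)))
            i += 2
    return out
```

With this scheme the only cost of a run of compressed instructions is the 5-bit major opcode in the first halfword plus one bit per halfword thereafter.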
# Opcode Allocation Ideas

## Opcodes exploration (Attempt 1)

### Branch

10 bit mode may be expanded by 16 bit mode later, adding capabilities
that do not fit in the extremely limited space.

    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 | a b | c d | e  | f |
    | offs2       | | 0 0 0 | offs            | LK | 1 | b
    | BO2 | BI3   | | 0 0 1 | 00  | BI  | BO  | LK | 1 | bclr
    | BO2 | BI3   | | 0 0 1 | 01  | BI  | BO  | LK | 1 | bctar

16 bit mode:

* offs2 extends offset in MSBs
* BI3 extends BI in MSBs to allow selection of full CR
* BO2 extends BO

10 bit mode:

* BO[0] enables CR check, BO[1] inverts check
* BI refers to CR0 only (4 bits of)
* no Branch Conditional with immediate
* no Absolute Address
* no CTR mode (and no bctr)
* offs is a signed offset, 2 byte aligned
* all branches are to 2 byte aligned addresses

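To make the offset rules concrete, here is a small Python sketch of how the byte displacement of `b` might be computed from the table above. It assumes MSB0 bit numbering (bit f is the least significant bit), that `offs` occupies bits 8-d and `offs2` bits 0-4, and that the combined field is a two's complement count of 2-byte units. The function names are illustrative only.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_displacement(hw, mode16=False):
    """Byte displacement encoded in a compressed `b` halfword.

    10 bit mode: 6-bit offs only (bits 0-4 are the major opcode).
    16 bit mode: offs2 (bits 0-4) supplies the MSBs, giving 11 bits.
    """
    offs = (hw >> 2) & 0x3F            # bits 8-d
    width = 6
    if mode16:
        offs2 = (hw >> 11) & 0x1F      # bits 0-4
        offs |= offs2 << 6
        width = 11
    return sign_extend(offs, width) << 1   # offsets count 2-byte units
```

Under these assumptions the reach is -64 to +62 bytes in 10 bit mode and -2048 to +2046 bytes in 16 bit mode.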
### LD/ST

    | 0 | 1   | 2 3 4 | | 5 6 7 | 8 9 | a b | c d | e | f |
    | F | RA2 | RT    | | 0 0 1 | 11  | RA  | RB  | 0 | 1 | ld
    | F | RT2 | RB    | | 0 0 1 | 11  | RA  | RT  | 1 | 1 | st

* elwidth overrides can set different widths

16 bit mode:

* F=1 is FLD, FST
* RA2 extends RA to 3 bits (MSB)
* RT2 extends RT to 3 bits (MSB)

10 bit mode:

* RA and RB are only 2 bit (0-3)
* for LD, RT is implicitly RB: ld RT=RB, RA(RB)
* for ST, there is no offset: st RT, RA(0)

### Arithmetic

    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d  | e | f |
    |     |       | | 0 1 0 | RB    | RA     | 0 | 1 | add
    |     |       | | 0 1 0 | RB    | RA     | 1 | 1 | mul
    |     |       | | 0 1 1 | RB    | (RA|0) | 0 | 1 | sub
    |     |       | | 0 1 1 | RB    | (RA|0) | 1 | 1 | cmp

10 bit mode:

* cmp default target is CR0
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that sub becomes neg, and cmp becomes cmp-against-zero

### Logical

    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d  | e | f |
    |     |       | | 1 0 0 | RB    | RA     | 0 | 1 | and
    |     |       | | 1 0 0 | RB    | RA     | 1 | 1 | nand
    |     |       | | 1 0 1 | RB    | RA     | 0 | 1 | or
    |     |       | | 1 0 1 | RB    | (RA|0) | 1 | 1 | nor

10 bit mode:

* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that nor becomes not

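The Arithmetic and Logical tables above can be turned into a small decoder. The Python sketch below is only an illustration: it assumes MSB0 bit numbering (bit f is the least significant bit), takes bits 5-7 as the minor opcode and bit e as the sub-opcode, and applies the (RA|0) rule by renaming the operation when RA is zero. `MNEMONICS`, `ZERO_ALIASES` and `decode_alu` are invented names, and the major opcode in bits 0-4 is ignored.

```python
# (minor opcode in bits 5-7, bit e) -> mnemonic, copied from the tables.
MNEMONICS = {
    (0b010, 0): "add", (0b010, 1): "mul",
    (0b011, 0): "sub", (0b011, 1): "cmp",
    (0b100, 0): "and", (0b100, 1): "nand",
    (0b101, 0): "or",  (0b101, 1): "nor",
}

# Rows marked (RA|0): RA=0 selects a zero immediate instead of a register.
ZERO_ALIASES = {"sub": "neg", "cmp": "cmp-against-zero", "nor": "not"}


def decode_alu(hw):
    """Return (mnemonic, RA, RB) for a 10 bit mode arithmetic/logical op."""
    if hw & 1 != 1:
        raise ValueError("bit f must be 1 in these encodings")
    minor = (hw >> 8) & 0b111   # bits 5-7
    rb = (hw >> 5) & 0b111      # bits 8-a
    ra = (hw >> 2) & 0b111      # bits b-d
    sub_op = (hw >> 1) & 1      # bit e
    name = MNEMONICS.get((minor, sub_op))
    if name is None:
        raise ValueError("not an arithmetic/logical encoding")
    if ra == 0 and name in ZERO_ALIASES:
        name = ZERO_ALIASES[name]
    return name, ra, rb
```

For example, `decode_alu(0x361)` has minor opcode 011, RB=3 and RA=0, so it decodes as `neg` rather than `sub`.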
### Floating Point

    | 0 1 | 2 3 4 | | 5 6 7 | 8 9 a | b c d  | e | f |
    |     | RT    | | 1 1 0 | RB    | RA!=0  | 0 | 1 | fadd
    |     | RT    | | 1 1 0 | RB    | 0 0 0  | 0 | 1 | fabs
    |     | RT    | | 1 1 0 | RB    | RA     | 1 | 1 | fmul
    |     | RT    | | 1 1 1 | RB    | (RA|0) | 0 | 1 | fsub
    |     | RT    | | 1 1 1 | RB    | (RA|0) | 1 | 1 | fcmp

10 bit mode:

* fcmp default target is CR1
* for (RA|0) when RA=0 the input is a zero immediate,
  meaning that fsub becomes fneg, and fcmp becomes fcmp-against-zero

### Condition Register

    | 0 1 2 3 | 4   | | 5 6 7 | 8 9 | a b | c d e | f |
    | 0 0 0 0 | BF2 | | 0 0 1 | 10  | BF  | BFA   | 1 | mcrf
    | 0 0 0 1 | BA2 | | 0 0 1 | 10  | BA  | BB    | 1 | crnor
    | 0 1 0 0 | BA2 | | 0 0 1 | 10  | BA  | BB    | 1 | crandc
    | 0 1 1 0 | BA2 | | 0 0 1 | 10  | BA  | BB    | 1 | crxor
    | 0 1 1 1 | BA2 | | 0 0 1 | 10  | BA  | BB    | 1 | crnand
    | 1 0 0 0 | BA2 | | 0 0 1 | 10  | BA  | BB    | 1 | crand
    | 1 0 0 1 | BA2 | | 0 0 1 | 10  | BA  | BB    | 1 | creqv
    | 1 1 0 1 | BA2 | | 0 0 1 | 10  | BA  | BB    | 1 | crorc
    | 1 1 1 0 | BA2 | | 0 0 1 | 10  | BA  | BB    | 1 | cror

10 bit mode:

* mcrf BF is only 2 bits which means the destination is only CR0-CR3

16 bit mode:

* mcrf BF2 extends BF (in MSB) to 3 bits
* CR operations: destination register is same as BA.
* CR operations: only possible on CR0 and CR1

SV (Vector Mode):

* CR operations: greatly extended reach/range (useful for predicates)

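The "destination is same as BA" rule for the 16 bit mode CR operations can be sketched as follows. This Python example is a guess at the intended behaviour, not a specification: it assumes MSB0 bit numbering, that BA2 is the MSB of a 3-bit BA, and that a 3-bit index selects one of the eight bits of CR0 followed by CR1. `CR_OPS` and `exec_cr_op` are invented names, and mcrf is left out.

```python
# Bits 0-3 -> (mnemonic, two-input boolean function), from the table above.
CR_OPS = {
    0b0001: ("crnor",  lambda a, b: (a | b) ^ 1),
    0b0100: ("crandc", lambda a, b: a & (b ^ 1)),
    0b0110: ("crxor",  lambda a, b: a ^ b),
    0b0111: ("crnand", lambda a, b: (a & b) ^ 1),
    0b1000: ("crand",  lambda a, b: a & b),
    0b1001: ("creqv",  lambda a, b: (a ^ b) ^ 1),
    0b1101: ("crorc",  lambda a, b: a | (b ^ 1)),
    0b1110: ("cror",   lambda a, b: a | b),
}


def exec_cr_op(hw, cr_bits):
    """Apply a 16 bit mode CR operation to cr_bits.

    cr_bits is a list of eight 0/1 values: CR0 bits 0-3, then CR1 bits 0-3
    (an assumed layout). Returns (mnemonic, BA, BB, new_cr_bits).
    """
    fixed_ok = ((hw >> 8) & 0b111) == 0b001 and ((hw >> 6) & 0b11) == 0b10
    if not fixed_ok or hw & 1 != 1:
        raise ValueError("not a CR-group encoding")
    sel = (hw >> 12) & 0xF                      # bits 0-3
    if sel not in CR_OPS:
        raise ValueError("unallocated or unsupported CR operation")
    ba = (((hw >> 11) & 1) << 2) | ((hw >> 4) & 0b11)   # BA2 (bit 4), BA (bits a-b)
    bb = (hw >> 1) & 0b111                      # bits c-e
    name, fn = CR_OPS[sel]
    result = list(cr_bits)
    result[ba] = fn(cr_bits[ba], cr_bits[bb])   # destination is the BA bit
    return name, ba, bb, result
```

Because the destination overwrites BA, every operation here is effectively a two-operand "accumulate into BA" form, which is what frees up the bits a separate BT field would otherwise need.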