(no commit message)
[libreriscv.git] / simple_v_extension / specification / bitmanip.mdwn
1
2 # Bitmanip opcodes
3
4 These are bit manipulation opcodes that, if provided, augment SimpleV for
5 the purposes of efficiently accelerating Vector Processing, 3D Graphics
6 and Video Processing.
7
8 The justification for their inclusion in BitManip is identical to the
9 significant justification that went into their inclusion in the
10 RISC-V Vector Extension (under the "Predicate Mask" opcodes section)
11
12 See
13 <https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-mask-instructions>
14 for details.
15
16 # Predicate Masks
17
18 SV uses standard integer scalar registers as a predicate bitmask. Therefore,
19 the majority of RISC-V RV32I / RV64I bit-level instructions are perfectly
20 adequate. Some exceptions however present themselves from RVV.
21
22 ## logical bit-wise instructions
23
24 These are the available bitwise instructions in RVV:
25
26 vmand.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB && vs1[i].LSB
27 vmnand.mm vd, vs2, vs1 # vd[i] = !(vs2[i].LSB && vs1[i].LSB)
28 vmandnot.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB && !vs1[i].LSB
29 vmxor.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB ^^ vs1[i].LSB
30 vmor.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB || vs1[i].LSB
31 vmnor.mm vd, vs2, vs1 # vd[i] = !(vs2[i[.LSB || vs1[i].LSB)
32 vmornot.mm vd, vs2, vs1 # vd[i] = vs2[i].LSB || !vs1[i].LSB
33 vmxnor.mm vd, vs2, vs1 # vd[i] = !(vs2[i].LSB ^^ vs1[i].LSB)
34
35 The ones that exist in scalar RISC-V are:
36
37 AND rd, rs1, rs2 # rd = rs1 & rs2
38 OR rd, rs1, rs2 # rd = rs1 | rs2
39 XOR rd, rs1, rs2 # rd = rs1 ^ rs2
40
41 The ones in Bitmanip are:
42
43 ANDN rd, rs1, rs2 # rd = rs1 & ~rs2
44 ORN rd, rs1, rs2 # rd = rs1 | ~rs2
45 XORN rd, rs1, rs2 # rd = rs1 ^ ~rs2
46
47 This leaves:
48
49 NOR
50 NAND
51
52 These are currently listed as "pseudo-ops" in BitManip-Draft (0.91)
53 They need to be actual opcodes.
54
55
56 TODO: there is an extensive table in RVV of bit-level operations:
57
58 output instruction pseudoinstruction
59
60 | 0 | 1 | 2 | 3 | instruction | pseudoinstruction |
61 | - | - | - | - | -------------------------- | ----------------- |
62 | 0 | 0 | 0 | 0 | vmxor.mm vd, vd, vd | vmclr.m vd |
63 | 1 | 0 | 0 | 0 | vmnor.mm vd, src1, src2 | |
64 | 0 | 1 | 0 | 0 | vmandnot.mm vd, src2, src1 | |
65 | 1 | 1 | 0 | 0 | vmnand.mm vd, src1, src1 | vmnot.m vd, src1 |
66 | 0 | 0 | 1 | 0 | vmandnot.mm vd, src1, src2 | |
67 | 1 | 0 | 1 | 0 | vmnand.mm vd, src2, src2 | vmnot.m vd, src2 |
68 | 0 | 1 | 1 | 0 | vmxor.mm vd, src1, src2 | |
69 | 1 | 1 | 1 | 0 | vmnand.mm vd, src1, src2 | |
70 | 0 | 0 | 0 | 1 | vmand.mm vd, src1, src2 | |
71 | 1 | 0 | 0 | 1 | vmxnor.mm vd, src1, src2 | |
72 | 0 | 1 | 0 | 1 | vmand.mm vd, src2, src2 | vmcpy.m vd, src2 |
73 | 1 | 1 | 0 | 1 | vmornot.mm vd, src2, src1 | |
74 | 0 | 0 | 1 | 1 | vmand.mm vd, src1, src1 | vmcpy.m vd, src1 |
75 | 1 | 0 | 1 | 1 | vmornot.mm vd, src1, src2 | |
76 | 1 | 1 | 1 | 1 | vmxnor.mm vd, vd, vd | vmset.m vd |
77
78 ## pcnt - population count
79
80 population-count.
81
82 Pseudocode:
83
84 unsigned int v; // count the number of bits set in v
85 unsigned int c; // c accumulates the total bits set in v
86 for (c = 0; v; c++)
87 {
88 v &= v - 1; // clear the least significant bit set
89 }
90
91 This instruction is present in BitManip.
92
93 ## ffirst - find first bit
94
95 finds the first bit set as an index.
96
97 Pseudocode:
98
99
100 uint_xlen_t clz(uint_xlen_t rs1)
101 {
102 for (int count = 0; count < XLEN; count++)
103 if ((rs1 << count) >> (XLEN - 1))
104 return count;
105 return XLEN; // -1
106 }
107
108 This is similar but not identical to BitManip "CLZ". CLZ returns XLEN when no bits are set, whereas RVV returns -1.
109
110 ## sbf - set before first bit
111
112 Sets all LSBs leading up to (excluding) where an LSB in the src is set,
113 and sets zeros including and following the src bit found.
114 If the second operand is non-zero, this process continues the search
115 (in the same LSB to MSB order) beginning each time (including the first time)
116 from where 1s are set in the second operand.
117
118 A side-effect of the search is that when src is zero, the output is all ones.
119 If the second operand is non-zero and the src is zero, the output is a
120 copy of the second operand.
121
122 # Example
123
124 7 6 5 4 3 2 1 0 Bit number
125
126 1 0 0 1 0 1 0 0 a3 contents
127 sbf a2, a3, x0
128 0 0 0 0 0 0 1 1 a2 contents
129
130 1 0 0 1 0 1 0 1 a3 contents
131 sbf a2, a3, x0
132 0 0 0 0 0 0 0 0 a2
133
134 0 0 0 0 0 0 0 0 a3 contents
135 sbf a2, a3, x0
136 1 1 1 1 1 1 1 1 a2
137
138 1 1 0 0 0 0 1 1 a0 vcontents
139 1 0 0 1 0 1 0 0 a3 contents
140 sbf a2, a3, a0
141 0 1 0 0 0 0 1 1 a2 contents
142
143 Pseudo-code:
144
145 def sof(rd, rs1, rs2):
146 rd = 0
147 setting_mode = rs2 == x0 or (regs[rs2] & 1)
148
149 while i < XLEN:
150 bit = 1<<i
151
152 # only reenable when predicate in use, and bit valid
153 if !setting_mode && rs2 != x0:
154 if (regs[rs2] & bit):
155 # back into "setting" mode
156 setting_mode = True
157
158 # skipping mode
159 if !setting_mode:
160 # skip any more 1s
161 if regs[rs1] & bit == 1:
162 i += 1
163 continue
164
165 # setting mode, search for 1
166 if regs[rs1] & bit: # found a bit in rs1:
167 setting_mode = False
168 # next loop starts skipping
169 else:
170 regs[rd] |= bit # always set except when search succeeds
171
172 i += 1
173
174 def sbf(rd, rs1, rs2):
175 rd = 0
176 # start setting if no predicate or if 1st predicate bit set
177 setting_mode = rs2 == x0 or (regs[rs2] & 1)
178 while i < XLEN:
179 bit = 1<<i
180 if rs2 != x0 and (regs[rs2] & bit):
181 # reset searching
182 setting_mode = False
183 if setting_mode:
184 if regs[rs1] & bit: # found a bit in rs1: stop setting rd
185 setting_mode = False
186 else:
187 regs[rd] |= bit
188 else if rs2 != x0: # searching mode
189 if (regs[rs2] & bit):
190 setting_mode = True # back into "setting" mode
191 i += 1
192
193 ## sif - set including first bit
194
195 Similar to sbf except including the bit which ends a run. i.e:
196 Sets all LSBs leading up to *and including* where an LSB in the src is set,
197 and sets zeros following the point where the src bit is found.
198
199 The side-effect of when the src is zero is also the same as for sbf:
200 output is all 1s if src2 is zero, and output is equal to src2 if src2
201 is non-zero.
202
203
204 # Example
205
206 7 6 5 4 3 2 1 0 Element number
207
208 1 0 0 1 0 1 0 0 a3 contents
209 sif a2, a3
210 0 0 0 0 0 1 1 1 a2 contents
211
212 1 0 0 1 0 1 0 1 a3 contents
213 sif a2, a3
214 0 0 0 0 0 0 0 1 a2
215
216 1 1 0 0 0 0 1 1 a0 vcontents
217 1 0 0 1 0 1 0 0 a3 contents
218 sif a2, a3, a0
219 1 1 x x x x 1 1 a2 contents
220
221 Pseudo-code:
222
223 def sif(rd, rs1, rs2):
224 rd = 0
225 setting_mode = rs2 == x0 or (regs[rs2] & 1)
226
227 while i < XLEN:
228 bit = 1<<i
229
230 # only reenable when predicate in use, and bit valid
231 if !setting_mode && rs2 != x0:
232 if (regs[rs2] & bit):
233 # back into "setting" mode
234 setting_mode = True
235
236 # skipping mode
237 if !setting_mode:
238 # skip any more 1s
239 if regs[rs1] & bit == 1:
240 i += 1
241 continue
242
243 # setting mode, search for 1
244 regs[rd] |= bit # always set during search
245 if regs[rs1] & bit: # found a bit in rs1:
246 setting_mode = False
247 # next loop starts skipping
248
249 i += 1
250
251 ## sof - set only first bit
252
253 Similar to sbf and sif except *only* set the bit which ends a run.
254
255 Unlike sbf and sif however, if the src is zero then the output is
256 also guaranteed to be zero, irrespective of src2's contents.
257
258 # Example
259
260 7 6 5 4 3 2 1 0 Element number
261
262 1 0 0 1 0 1 0 0 a3 contents
263 sof a2, a3
264 0 0 0 0 0 1 0 0 a2 contents
265
266 1 0 0 1 0 1 0 1 a3 contents
267 sof a2, a3
268 0 0 0 0 0 0 0 1 a2
269
270 1 1 0 0 0 0 1 1 a0 vcontents
271 1 1 0 1 0 1 0 0 a3 contents
272 sof a2, a3, a0
273 0 1 x x x x 0 0 a2 contents
274
275 Pseudo-code:
276
277 def sof(rd, rs1, rs2):
278 rd = 0
279 setting_mode = rs2 == x0 or (regs[rs2] & 1)
280
281 while i < XLEN:
282 bit = 1<<i
283
284 # only reenable when predicate in use, and bit valid
285 if !setting_mode && rs2 != x0:
286 if (regs[rs2] & bit):
287 # back into "setting" mode
288 setting_mode = True
289
290 # skipping mode
291 if !setting_mode:
292 # skip any more 1s
293 if regs[rs1] & bit == 1:
294 i += 1
295 continue
296
297 # setting mode, search for 1
298 if regs[rs1] & bit: # found a bit in rs1:
299 regs[rd] |= bit # only set when search succeeds
300 setting_mode = False
301 # next loop starts skipping
302
303 i += 1
304