# Bitmanip opcodes

These are bit manipulation opcodes that, if provided, augment SimpleV for
the purposes of efficiently accelerating Vector Processing, 3D Graphics
and Video Processing.

The justification for their inclusion in BitManip is identical to the
significant justification that went into their inclusion in the
RISC-V Vector Extension (under the "Predicate Mask" opcodes section).

See for details.

# Predicate Masks

SV uses standard integer scalar registers as a predicate bitmask. Therefore,
the majority of RISC-V RV32I / RV64I bit-level instructions are perfectly
adequate. Some exceptions, however, present themselves from RVV.

## logical bit-wise instructions

These are the available bitwise instructions in RVV:

    vmand.mm vd, vs2, vs1     # vd[i] =   vs2[i].LSB &&  vs1[i].LSB
    vmnand.mm vd, vs2, vs1    # vd[i] = !(vs2[i].LSB &&  vs1[i].LSB)
    vmandnot.mm vd, vs2, vs1  # vd[i] =   vs2[i].LSB && !vs1[i].LSB
    vmxor.mm  vd, vs2, vs1    # vd[i] =   vs2[i].LSB ^^  vs1[i].LSB
    vmor.mm  vd, vs2, vs1     # vd[i] =   vs2[i].LSB ||  vs1[i].LSB
    vmnor.mm  vd, vs2, vs1    # vd[i] = !(vs2[i].LSB ||  vs1[i].LSB)
    vmornot.mm  vd, vs2, vs1  # vd[i] =   vs2[i].LSB || !vs1[i].LSB
    vmxnor.mm vd, vs2, vs1    # vd[i] = !(vs2[i].LSB ^^  vs1[i].LSB)

The ones that exist in scalar RISC-V are:

    AND rd, rs1, rs2    # rd = rs1 & rs2
    OR  rd, rs1, rs2    # rd = rs1 | rs2
    XOR rd, rs1, rs2    # rd = rs1 ^ rs2

The ones in Bitmanip are:

    ANDN rd, rs1, rs2   # rd = rs1 & ~rs2
    ORN  rd, rs1, rs2   # rd = rs1 | ~rs2
    XORN rd, rs1, rs2   # rd = rs1 ^ ~rs2

This leaves:

    NOR
    NAND

These are currently listed as "pseudo-ops" in BitManip-Draft (0.91).
They need to be actual opcodes.

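
To make the gap concrete, here is a minimal Python sketch that models predicate masks as plain integers. The function names, the `XLEN = 64` width and the `nand_two_step` helper are assumptions made for illustration only and are not taken from any specification. It shows that NAND and NOR can be composed from an existing operation followed by a NOT (in RISC-V, `XORI rd, rs, -1`), which is why a pseudo-op costs two instructions where a dedicated opcode would cost one.

```python
XLEN = 64                    # assumed register width (RV64); use 32 for RV32
ALL_ONES = (1 << XLEN) - 1   # Python ints are unbounded, so results are masked


def andn(rs1, rs2):
    """rd = rs1 & ~rs2 (present in Bitmanip)."""
    return rs1 & ~rs2 & ALL_ONES


def orn(rs1, rs2):
    """rd = rs1 | ~rs2 (present in Bitmanip)."""
    return (rs1 | ~rs2) & ALL_ONES


def nand(rs1, rs2):
    """rd = ~(rs1 & rs2), modelled as a single hypothetical opcode."""
    return ~(rs1 & rs2) & ALL_ONES


def nor(rs1, rs2):
    """rd = ~(rs1 | rs2), modelled as a single hypothetical opcode."""
    return ~(rs1 | rs2) & ALL_ONES


def nand_two_step(rs1, rs2):
    """The same result built from two existing scalar instructions."""
    tmp = rs1 & rs2          # AND  tmp, rs1, rs2
    return tmp ^ ALL_ONES    # XORI rd, tmp, -1  (the NOT pseudo-op)


a, b = 0b1100, 0b1010
print(format(nand(a, b) & 0xF, "04b"))  # 0111
print(format(nor(a, b) & 0xF, "04b"))   # 0001
```

Only the low four bits are printed so that the result can be compared by eye against a two-input truth table; the upper bits of both results are all ones because the inputs are zero there.
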
TODO: there is an extensive table in RVV of bit-level operations.
Columns 0 to 3 give the output bit for the input pairs
(src1, src2) = (0,0), (0,1), (1,0) and (1,1) respectively:

| 0 | 1 | 2 | 3 | instruction                | pseudoinstruction |
| - | - | - | - | -------------------------- | ----------------- |
| 0 | 0 | 0 | 0 | vmxor.mm vd, vd, vd        | vmclr.m vd        |
| 1 | 0 | 0 | 0 | vmnor.mm vd, src1, src2    |                   |
| 0 | 1 | 0 | 0 | vmandnot.mm vd, src2, src1 |                   |
| 1 | 1 | 0 | 0 | vmnand.mm vd, src1, src1   | vmnot.m vd, src1  |
| 0 | 0 | 1 | 0 | vmandnot.mm vd, src1, src2 |                   |
| 1 | 0 | 1 | 0 | vmnand.mm vd, src2, src2   | vmnot.m vd, src2  |
| 0 | 1 | 1 | 0 | vmxor.mm vd, src1, src2    |                   |
| 1 | 1 | 1 | 0 | vmnand.mm vd, src1, src2   |                   |
| 0 | 0 | 0 | 1 | vmand.mm vd, src1, src2    |                   |
| 1 | 0 | 0 | 1 | vmxnor.mm vd, src1, src2   |                   |
| 0 | 1 | 0 | 1 | vmand.mm vd, src2, src2    | vmcpy.m vd, src2  |
| 1 | 1 | 0 | 1 | vmornot.mm vd, src2, src1  |                   |
| 0 | 0 | 1 | 1 | vmand.mm vd, src1, src1    | vmcpy.m vd, src1  |
| 1 | 0 | 1 | 1 | vmornot.mm vd, src1, src2  |                   |
| 1 | 1 | 1 | 1 | vmxnor.mm vd, vd, vd       | vmset.m vd        |

## pcnt - population count

Counts the number of bits that are set (population count).

Pseudocode:

    unsigned int v; // count the number of bits set in v
    unsigned int c; // c accumulates the total bits set in v
    for (c = 0; v; c++)
    {
      v &= v - 1; // clear the least significant bit set
    }

This instruction is present in BitManip.

## ffirst - find first bit

Finds the first bit set and returns its index.

Pseudocode:

    uint_xlen_t clz(uint_xlen_t rs1)
    {
        for (int count = 0; count < XLEN; count++)
            if ((rs1 << count) >> (XLEN - 1))
                return count;
        return XLEN; // -1
    }

This is similar but not identical to BitManip "CLZ". CLZ returns XLEN when
no bits are set, whereas RVV returns -1.

## sbf - set before first bit

Sets all LSBs leading up to (excluding) where an LSB in the src is set,
and sets zeros including and following the src bit found.
If the second operand is non-zero, this process continues the search
(in the same LSB to MSB order) beginning each time (including the first time)
from where 1s are set in the second operand.

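
The description above can be sketched in Python as follows. This is one possible reading, not a definitive implementation: the function signature, the `xlen=8` default, the use of `mask == 0` to stand for passing `x0`, and in particular the two rules marked "assumption" in the comments (re-arming at the start of each run of 1s in the second operand, and never setting bits outside it) are choices made here for illustration.

```python
def sbf(src, mask=0, xlen=8):
    """Set-before-first sketch.

    src  : first operand, searched from LSB to MSB for a set bit
    mask : second operand; 0 models passing x0 (no restriction)
    xlen : register width in bits (8 here only to keep examples short)
    """
    out = 0
    use_mask = mask != 0
    setting = not use_mask          # with no mask, start in setting mode
    prev_mask_bit = 0
    for i in range(xlen):
        bit = 1 << i
        if use_mask:
            mask_bit = (mask >> i) & 1
            if mask_bit and not prev_mask_bit:
                setting = True      # assumption: re-arm at the start of each run of 1s
            elif not mask_bit:
                setting = False     # assumption: bits outside the mask are never set
            prev_mask_bit = mask_bit
        if setting:
            if src & bit:
                setting = False     # first set bit found: stop setting
            else:
                out |= bit          # still before the first set bit
    return out


print(format(sbf(0b10010100), "08b"))              # 00000011
print(format(sbf(0b10010100, 0b11000011), "08b"))  # 01000011
```

In the second call the search runs twice: once over bits 0 to 1 and once over bits 6 to 7, because those are the two runs of 1s in the second operand.
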
A side-effect of the search is that when src is zero, the output is all ones.
If the second operand is non-zero and the src is zero, the output is a
copy of the second operand.

# Example

     7 6 5 4 3 2 1 0   Bit number

     1 0 0 1 0 1 0 0   a3 contents
                       sbf a2, a3, x0
     0 0 0 0 0 0 1 1   a2 contents

     1 0 0 1 0 1 0 1   a3 contents
                       sbf a2, a3, x0
     0 0 0 0 0 0 0 0   a2 contents

     0 0 0 0 0 0 0 0   a3 contents
                       sbf a2, a3, x0
     1 1 1 1 1 1 1 1   a2 contents

     1 1 0 0 0 0 1 1   a0 contents
     1 0 0 1 0 1 0 0   a3 contents
                       sbf a2, a3, a0
     0 1 0 0 0 0 1 1   a2 contents

Pseudo-code:

    def sof(rd, rs1, rs2):
        rd = 0
        setting_mode = rs2 == x0 or (regs[rs2] & 1)

        while i < XLEN:
            bit = 1<