# Positional popcount SVP64

* RISC-V Bitmanip Extension, Document Version 0.94-draft, Editor: Claire Wolf, Symbiotic GmbH

Optimised assembler implementations of positional popcount on SIMD ISAs
typically run to around 500 lines. The Power ISA, thanks to `bpermd`, can be
much more efficient, and with SVP64 even more so. The reference
implementation showing the concept is below.

```
// Copyright (c) 2020 Robert Clausecker
// count8 reference implementation for tests. Do not alter.
func count8safe(counts *[8]int, buf []uint8) {
    for i := range buf {
        for j := 0; j < 8; j++ {
            counts[j] += int(buf[i] >> j & 1)
        }
    }
}
```

A simple but still hardware-parallelisable SVP64 assembler routine for 8-bit
input values (`count8safe`) is as follows:

```
mtspr 9, 3               # move r3 to CTR
# VL = MIN(CTR,MAXVL=8), Rc=1 (CR0 set if CTR ends)
setvl 3,0,8,0,1,1        # set MVL=8, VL=MIN(MVL,CTR)
# load VL bytes (update r4 addr) but compressed (dw=8)
addi 6, 0, 0             # initialise all 64-bits of r6 to zero
sv.lbzu/pi/dw=8 *6, 1(4) # should be /lf here as well
# gather performs the transpose (which gets us to positional..)
gbbd 8,6
# now those bits have been turned around, popcount and sum them
setvl 0,0,8,0,1,1        # set MVL=VL=8
sv.popcntd/sw=8 *24,*8   # do the (now transposed) popcount
sv.add *16,*16,*24       # and accumulate in results
# branch back if CTR still non-zero. works even though VL=8
sv.bc/all 16, *0, -0x28  # reduce CTR by VL and stop if -ve
```

Array popcount is just the standard popcount function
([[!wikipedia Hamming weight]]) on an array of values, whereas positional
popcount adds up, for each bit position, the number of values in an array of
inputs that have that bit set to 1.

[[!tag svp64_cookbook ]]
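
The transpose-then-popcount idea used by the assembler listing above can be modelled in Go for checking results against `count8safe`. The following is only a minimal sketch, not the Libre-SOC implementation. It assumes, as the comments in the listing suggest, that `gbbd` behaves as an 8x8 bit-matrix transpose of a 64-bit register. The names `transpose8x8` and `count8transpose` are hypothetical, and the bit numbering (LSB0, byte 0 in the least significant position) is chosen for convenience and may not match the hardware's ordering.

```go
package main

import "math/bits"

// transpose8x8 treats x as an 8x8 bit matrix in which byte i is row i and
// bit j of that byte is column j. It returns the transpose, so that bit i
// of output byte j equals bit j of input byte i.
// Assumption: this is a software model of what gbbd does in the listing.
func transpose8x8(x uint64) uint64 {
	var t uint64
	for i := uint(0); i < 8; i++ {
		for j := uint(0); j < 8; j++ {
			bit := (x >> (8*i + j)) & 1
			t |= bit << (8*j + i)
		}
	}
	return t
}

// count8transpose (hypothetical name) accumulates positional popcounts by
// processing up to 8 input bytes per iteration, mirroring the loop above:
// pack the bytes, transpose, popcount each transposed byte, accumulate.
func count8transpose(counts *[8]int, buf []uint8) {
	for len(buf) > 0 {
		n := len(buf)
		if n > 8 {
			n = 8 // at most 8 bytes per pass, like MAXVL=8
		}
		// Pack n bytes into one 64-bit value. Unused high bytes stay
		// zero, so a short final group adds nothing to the counts.
		var packed uint64
		for i := 0; i < n; i++ {
			packed |= uint64(buf[i]) << (8 * uint(i))
		}
		t := transpose8x8(packed)
		// Byte j of t now holds bit j of every input byte in the
		// group, so its popcount is the count for bit position j.
		for j := 0; j < 8; j++ {
			counts[j] += bits.OnesCount8(uint8(t >> (8 * uint(j))))
		}
		buf = buf[n:]
	}
}
```

Calling `count8transpose(&counts, buf)` on the same input as `count8safe` should leave identical values in `counts`. The per-bit transpose loop is written for clarity rather than speed: the point is that a single transpose turns eight per-position counts into eight ordinary popcounts.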