a per-byte basis a SIMD-style count of each byte's 1s, it becomes possible
to simply count fewer bytes.
+Would it be more useful to redefine popcntb so that it always returns
+eight results? For example, `sv.popcntb/w=16` would return eight 2-bit
+counts, each giving the number of bits set in the corresponding 2-bit
+group in RS.
+
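To make the two options concrete, here is a minimal Python sketch of both behaviours. The function names, the `width` parameter, and the packing of each count into the same bit positions as its source group are assumptions made for illustration; they are not taken from the specification. `popcntb` models the "count fewer bytes" reading, while `popcnt_eight_groups` models the proposed "always eight results" reading.

```python
def popcntb(rs: int, width: int = 64) -> int:
    """Per-byte population count: each result byte holds the number of
    1 bits in the corresponding byte of rs. With a narrower width,
    fewer bytes are counted (assumed model of the element-width override)."""
    result = 0
    for i in range(width // 8):
        byte = (rs >> (8 * i)) & 0xFF
        result |= bin(byte).count("1") << (8 * i)
    return result


def popcnt_eight_groups(rs: int, width: int) -> int:
    """Hypothetical alternative: always split the operand into eight
    equal groups of width/8 bits and return eight counts, each stored
    in the same bit positions as the group it describes."""
    group = width // 8            # 8 bits at width=64, 2 bits at width=16
    mask = (1 << group) - 1
    result = 0
    for i in range(8):
        field = (rs >> (group * i)) & mask
        result |= bin(field).count("1") << (group * i)
    return result
```

At `width=64` the two functions agree, since eight groups of 8 bits are exactly the eight bytes. At `width=16` the first counts two bytes, whereas the second returns eight 2-bit counts (each 0, 1 or 2, which always fits in its 2-bit field).
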
## no modification needed, but function changes
For the `addi` instruction there is no apparent change: