+Based on RVV vmiota (written viota.m in the example below). vmiota may be viewed as a cumulative variant of popcount that generates multiple results: each successive element includes more of the bits of the bitstream being tested.
+
+When masked, only the bits not masked out are included in the count process.
+
+    viota RT/v, RA, RB
+
+Note that RA=0 indicates a test against all 1s, so the instruction generates the vector sequence [0, 1, 2, ... VL-1]. This is equivalent to RVV vid.v, which is therefore a pseudo-op here (viota with RA=0).
+
+Example
+
+    7 6 5 4 3 2 1 0   Element number
+
+    1 0 0 1 0 0 0 1   v2 contents
+                      viota.m v4, v2 # Unmasked
+    2 2 2 1 1 1 1 0   v4 result
+
+    1 1 1 0 1 0 1 1   v0 contents
+    1 0 0 1 0 0 0 1   v2 contents
+    2 3 4 5 6 7 8 9   v4 contents
+                      viota.m v4, v2, v0.t # Masked
+    1 1 1 5 1 7 1 0   v4 results
+
+    def iota(RT, RA, RB):
+        # RB=0 or RA=0 selects all 1s instead of reading a register
+        mask = iregs[RB] if RB else 0b111111...1
+        val = iregs[RA] if RA else 0b111111...1
+        for i in range(VL):
+            if RA.scalar:
+                testmask = (1<<i)-1 # only count below
+                to_test = val & testmask & mask
+                iregs[RT+i] = popcount(to_test)
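+
+As a rough cross-check, the pseudocode above can be modelled in plain Python. This is only a sketch under stated assumptions: `iota_model`, its `dest` argument and the use of Python integers as bit-vectors are hypothetical and not part of the proposal, and leaving masked-out elements unchanged is assumed from the RVV example rather than stated by the pseudocode.
+
```python
def popcount(x):
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def iota_model(val, VL, mask=None, dest=None):
    """Hypothetical software model of the iota pseudocode.

    val and mask are bit-vectors held in Python ints (bit i = element i).
    Element i receives the number of set bits of val below position i,
    counting only positions enabled in mask.
    Assumption: masked-out elements keep their previous dest value,
    mirroring the masked RVV example.
    """
    all_ones = (1 << VL) - 1
    mask = all_ones if mask is None else mask
    dest = [0] * VL if dest is None else list(dest)
    for i in range(VL):
        if not (mask >> i) & 1:
            continue  # masked-out element: left unchanged
        testmask = (1 << i) - 1  # only count bits below element i
        dest[i] = popcount(val & testmask & mask)
    return dest


# Lists are in element order 0..VL-1, i.e. reversed relative to the tables.
unmasked = iota_model(0b10010001, 8)
masked = iota_model(0b10010001, 8, mask=0b11101011,
                    dest=[9, 8, 7, 6, 5, 4, 3, 2])
sequence = iota_model(0b11111111, 8)  # all 1s, as when RA=0
```
+
+Read from element 7 down to element 0, `unmasked` and `masked` match the v4 rows of the two RVV examples, and `sequence` is the [0, 1, 2, ... VL-1] case described for RA=0.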
+
+There is also a Vector CR-based version of the same, because CRs are used for predication. This would use the same testing mechanism as branch: BO[0:2], where bit 2 is inv and bits 0:1 select the bit of the CR.
+
+    def test_CR_bit(CR, BO):
+        return CR[BO[0:1]] == BO[2]
+
+    def iotacr(RT, BA, BO):
+        mask = get_src_predicate()
+        count = 0
+        for i in range(VL):
+            if (mask & (1<<i)) == 0: continue
+            iregs[RT+i] = count
+            if test_CR_bit(CR[i+BA], BO):
+                count += 1
+
+A vidcr variant of iotacr is not provided: giving BA=0 a special meaning is not appropriate, and the variant would be pointless anyway, because the integer version (RA=0) already covers it without reading the int regfile at all.