sv.bc/all 16, *0, -0x28 # reduce CTR by VL and stop if -ve
```
+# Improvements
+
+The algorithm offers many opportunities for parallelism, which simpler
+hardware must be helped to exploit in order to maximise performance.
+On Out-of-Order hardware the extremely simple and very clear algorithm
+above will achieve extremely high levels of performance simply by
+exploiting standard Multi-Issue Register Hazard Management.
+
+However simpler in-order hardware will need a little assistance.
+That can come in the form of expanding to QTY4 or QTY8 64-bit blocks
+(so that `sv.popcntd` uses MVL=VL=32 or MVL=VL=64), with `gbbd`
+becoming an `sv.gbbd` whose VL is set to the block count (QTY4 or
+QTY8), and with the SV REMAP Parallel Reduction Schedule applied to
+each intermediary result rather than using an array of straight
+accumulators `r16-r23`. This starts to push the boundaries of the
+number of registers needed, however, so it is left as an exercise
+for another time.
+
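To illustrate why a parallel reduction helps in-order hardware, here is a minimal Python sketch contrasting a straight accumulator with a pairwise (tree) reduction. It is not the SV REMAP Parallel Reduction Schedule itself: the function names, the example block values, and the pairing order are all assumptions made for illustration. The point is only that the adds within one tree level do not depend on each other, whereas each add into a straight accumulator depends on the one before it.

```python
def popcount64(x):
    """Count the set bits of one 64-bit block (hypothetical helper)."""
    return bin(x & 0xFFFFFFFFFFFFFFFF).count("1")


def serial_sum(values):
    """Straight accumulator: each add depends on the previous one."""
    total = 0
    for v in values:
        total += v
    return total


def tree_sum(values):
    """Pairwise reduction; returns (sum, number of dependent levels)."""
    work = list(values)
    levels = 0
    while len(work) > 1:
        # Adds within this level are independent of one another.
        nxt = [work[i] + work[i + 1] for i in range(0, len(work) - 1, 2)]
        if len(work) % 2:
            # An odd leftover element is carried to the next level.
            nxt.append(work[-1])
        work = nxt
        levels += 1
    return (work[0] if work else 0), levels


# Eight made-up 64-bit blocks standing in for a QTY8 batch.
blocks = [0xFFFF, 0x1, 0xF0F0, 0x0, 0xFF, 0x3, 0x7, 0x8000000000000000]
counts = [popcount64(b) for b in blocks]
total, levels = tree_sum(counts)

# Both strategies give the same answer; only the dependency depth differs.
assert total == serial_sum(counts)
print(total, levels)
```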
[[!tag svp64_cookbook ]]