which would normally stop looping if the result register is scalar.
Thus, the result scalar register, if also used as a source scalar,
may be used to perform sequential accumulation. This *deliberately*
-sets up a chain of Register Hazard Dependencies, whereas Parallel Reduce
+sets up a chain of Register Hazard Dependencies
+(which advanced hardware may optimise out), whereas Parallel Reduce
[[sv/remap]] deliberately issues a Tree-Schedule of operations that may
be parallelised.
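
The contrast drawn above can be sketched in plain Python. This is only an illustration under stated assumptions, not the actual SVP64 or REMAP schedule: the function names `sequential_accumulate`, `tree_schedule` and `tree_reduce` are hypothetical, and the pairing order used for the tree is one simple choice among several. The first function models a scalar register used as both source and result, so every step depends on the previous one. The second produces levels of operations where the pairs within a level touch disjoint elements and could therefore be issued in parallel.

```python
def sequential_accumulate(acc, values):
    """Model a scalar used as both source and destination.

    Each iteration reads the value written by the previous one,
    so the operations form a single dependency chain.
    """
    for v in values:
        acc = acc + v  # depends on the previous iteration's result
    return acc


def tree_schedule(n):
    """Return a list of levels of (dst, src) index pairs for n elements.

    Hypothetical pairing order, chosen for illustration only.
    Pairs inside one level use disjoint indices, so they are
    independent of each other.
    """
    levels = []
    step = 1
    while step < n:
        levels.append([(i, i + step) for i in range(0, n - step, step * 2)])
        step *= 2
    return levels


def tree_reduce(values):
    """Apply the tree schedule to a non-empty sequence; result is in element 0."""
    work = list(values)
    for level in tree_schedule(len(work)):
        for dst, src in level:
            work[dst] = work[dst] + work[src]
    return work[0]
```

For integer addition both approaches give the same result. For non-associative operations, such as floating-point addition, the two orderings may differ, which is one reason the choice between them is made explicitly.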