RT = lowerhalf(product)
RC = upperhalf(product)
-Successive iterations thus effectively use RC as a 64-bit carry, and
+Horizontal-First Mode therefore may be applied to just this one
+instruction.
+Successive sequential iterations effectively use RC as a kind of
+64-bit carry, and
as noted by Intel in their notes on mulx,
`RA*RB+RC+RD` cannot overflow, so does not require
setting an additional CA flag. We first cover the chain of