## Multiply
+Long multiplication, assuming an O(N^2) algorithm, is performed by summing
+NxN separate smaller multiplications. Karatsuba's algorithm
+reduces the number of small multiplies at the expense of increasing
+the number of additions. Some algorithms follow the Vedic Multiply
+pattern, grouping together all multiplies of the same magnitude/power
+(same column), whilst others perform row-based multiplication: a single
+digit of B multiplies the entirety of A, and the resulting rows are
+summed one at a time. The row-based approach (Knuth's Algorithm M) is
+the basis of the analysis below.
+
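As a rough illustration of the row-based approach, the sketch below multiplies two big integers held as little-endian lists of 64-bit digits, one digit of B at a time. The names `MASK64`, `mul_rows` and `to_int` are hypothetical and chosen only for this example, and the 64-bit digit width is an assumption. Python's unbounded integers stand in for the 128-bit intermediate that real hardware would have to split across two registers.

```python
MASK64 = (1 << 64) - 1  # assumed digit (word) width: 64 bits


def mul_rows(a, b):
    """Row-based long multiply (in the style of Knuth's Algorithm M).

    a and b are little-endian lists of 64-bit digits. Returns a
    little-endian list of len(a) + len(b) digits holding a * b.
    """
    w = [0] * (len(a) + len(b))
    for j, bj in enumerate(b):          # one digit of B per row
        carry = 0
        for i, ai in enumerate(a):      # that digit times all of A
            # 64x64 product plus the digit already accumulated plus the
            # incoming carry: this sum never exceeds 2**128 - 1.
            t = ai * bj + w[i + j] + carry
            w[i + j] = t & MASK64       # low half stays in this column
            carry = t >> 64             # high half moves to the next column
        w[j + len(a)] = carry           # top digit of this row
    return w


def to_int(digits):
    """Convert a little-endian 64-bit digit list to a Python int (for checking)."""
    return sum(d << (64 * k) for k, d in enumerate(digits))
```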
Multiply is tricky: 64-bit operands actually produce a 128-bit result,
which clearly cannot fit into an orthogonal register file.
Most Scalar RISC ISAs have separate `mul-low-half` and `mul-hi-half`