Multiply is tricky: 64 bit operands actually produce a 128-bit result.
Most Scalar RISC ISAs have separate `mul-low-half` and `mul-hi-half`
instructions, whilst some (OpenRISC) have "Accumulators" from which
-the results of the multiply must be explicitly extracted. RISC advocates
+the results of the multiply must be explicitly extracted. High
+performance RISC advocates
recommend "macro-op fusion" which is in effect where the second instruction
-gains access to the cached copy of the HI result, which had already been
+gains access to the cached copy of the HI half of the
+multiply result, which had already been
computed by the first. This approach quickly complicates the internal
microarchitecture, especially at the decode phase.
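
As a rough illustration of the two-instruction split described above, the following C sketch computes the low and high 64-bit halves of a 64x64 unsigned multiply separately. The function names `mul_lo` and `mul_hi` are hypothetical stand-ins for an ISA's `mul-low-half` and `mul-hi-half` instructions, not taken from any particular ISA. Note that `mul_hi` ends up recomputing the partial products the low half also depends on, which is the redundancy that macro-op fusion tries to remove in hardware.

```c
#include <stdint.h>

/* Hypothetical stand-in for a "mul-low-half" instruction:
 * the low 64 bits of the 128-bit product (wraps modulo 2^64). */
static uint64_t mul_lo(uint64_t a, uint64_t b)
{
    return a * b;
}

/* Hypothetical stand-in for a "mul-hi-half" instruction (unsigned):
 * the high 64 bits of the 128-bit product, built from four
 * 32x32 -> 64 partial products so that no 128-bit type is needed. */
static uint64_t mul_hi(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Sum the pieces that land in bits 32..63; the carry out of
     * this sum propagates into the high half. Three 32-bit values
     * cannot overflow a 64-bit accumulator. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    return p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

A caller wanting the full 128-bit result invokes both functions on the same operands, mirroring the pair of scalar instructions: `mul_lo(a, b)` for bits 0..63 and `mul_hi(a, b)` for bits 64..127.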