Some operations in the Power ISA already target two 64-bit scalar
registers: `lq` for example. Some mathematical algorithms are more
efficient when there are two outputs rather than one. 64-bit multiply
-for example produces a 128 bit result
+for example internally produces a 128-bit result, which clearly
+cannot be stored in a single 64-bit register. Some ISAs recommend
+"macro-op fusion": a convention whereby, if two commonly used
+instructions (mullo, mulhi) perform an identical operation on the same
+ALU but one selects the low half of the result and the other the high
+half, an optimised micro-architecture may internally "fuse" the two
+instructions together using micro-coding techniques.
+
+Macro-op fusion would be perfect for Scalar Multiply Lo/Hi were it
+not for SVP64 Horizontal-First Loops.
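
The lo/hi split described above can be illustrated with a minimal Python sketch. It assumes unsigned operands in the range 0 to 2^64-1, and the function names `mul_lo`, `mul_hi` and `mul_lo_hi` are hypothetical: they are chosen for illustration only and are not Power ISA or SVP64 mnemonics.

```python
MASK64 = (1 << 64) - 1  # selects the low 64 bits of a value


def mul_lo(a, b):
    """Low 64 bits of the unsigned 64x64 product (illustrative 'mullo')."""
    return (a * b) & MASK64


def mul_hi(a, b):
    """High 64 bits of the same product (illustrative 'mulhi')."""
    return ((a * b) >> 64) & MASK64


def mul_lo_hi(a, b):
    """Compute the 128-bit product once and return both halves.

    This is the result a fused lo/hi pair, or a single instruction with
    two 64-bit outputs, would deliver. Operands are assumed to be
    unsigned 64-bit values.
    """
    product = a * b  # full 128-bit product, computed only once
    return product & MASK64, (product >> 64) & MASK64
```

Calling `mul_lo` and `mul_hi` separately performs the same multiplication twice and discards half of the result each time, whereas `mul_lo_hi` performs it once and keeps both halves.
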
* [[isa/svfixedarith]]
* [[isa/svfparith]]