to deploy macro-op fusion targetting back-end 256-bit or greater
Dynamic SIMD ALUs for maximum performance and effectiveness.
-# Add and Subtract
+# Analysis
+
+This section covers an analysis of big-integer operations.
+
+## Add and Subtract
Surprisingly, no new additional instructions are required to perform
a straightforward big-integer add or subtract. Vectorised `addeo`
implementors are entirely at liberty to recognise Horizontal-First Vector
adds and send the vector of registers to a much larger and wider back-end
ALU.
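
The carry chaining behind a big-integer add can be sketched in Python. This is a minimal illustration, not an ISA definition: it assumes a big integer is held as a list of 64-bit limbs, least-significant limb first, and the names `MASK64`, `bigint_add` and `bigint_sub` are hypothetical. Each loop iteration models one element-level add-with-carry that consumes the carry from the previous limb and produces one for the next.

```python
MASK64 = (1 << 64) - 1  # assumption: each limb is 64 bits wide


def bigint_add(a, b, carry_in=0):
    """Add two equal-length limb lists (least-significant limb first).

    Returns (result_limbs, carry_out).
    """
    assert len(a) == len(b)
    result = []
    carry = carry_in
    for x, y in zip(a, b):
        total = x + y + carry          # one add-with-carry step
        result.append(total & MASK64)  # low 64 bits stay in this limb
        carry = total >> 64            # carry out feeds the next limb
    return result, carry


def bigint_sub(a, b):
    """Compute a - b as a + ~b + 1; carry_out == 1 means no borrow occurred."""
    inverted = [~y & MASK64 for y in b]
    return bigint_add(a, inverted, carry_in=1)
```

A wider back-end ALU is free to compute the same result in fewer steps, provided the limbs and the final carry match what this sequential loop produces.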
+
+## Multiply
+
+Multiply is tricky: 64-bit operands actually produce a 128-bit result.
+Most Scalar RISC ISAs have separate `mul-low-half` and `mul-hi-half`
+instructions, whilst some (OpenRISC) have "Accumulators" from which
+the results of the multiply must be explicitly extracted. RISC advocates
+recommend "macro-op fusion", in which the second instruction of the pair
+reuses a cached copy of the HI result already computed by the first.
+This approach quickly complicates the internal microarchitecture,
+especially at the decode phase.
+
+Instead, Intel, in 2012, specifically added a `mulx` instruction, allowing
+both HI and LO halves of the multiply to reach registers. If done as a
+multiply-and-accumulate this becomes quite an expensive operation
+(three 64-bit registers in, two 64-bit registers out).
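
The register cost described above can be made concrete with a small Python sketch. It is an arithmetic model only, not the encoding or semantics of any particular instruction; the names `mul_hi_lo` and `mul_add_hi_lo` are hypothetical, and all operands are assumed to be unsigned 64-bit values.

```python
MASK64 = (1 << 64) - 1  # assumption: unsigned 64-bit operands


def mul_hi_lo(a, b):
    """64 x 64 -> 128-bit multiply: two inputs, returned as (hi, lo) halves."""
    product = a * b
    return product >> 64, product & MASK64


def mul_add_hi_lo(a, b, c):
    """Multiply-and-accumulate: three 64-bit inputs, two 64-bit outputs.

    a * b + c cannot overflow 128 bits, because
    (2**64 - 1)**2 + (2**64 - 1) == 2**128 - 2**64.
    """
    total = a * b + c
    return total >> 64, total & MASK64
```

Even the worst case, with all three inputs at their maximum, fits in the two output halves, which is why the accumulate form needs no third output register.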