From 9c2a5b486800a28dee6febda2ec21d75ee964d80 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Tue, 19 Apr 2022 14:09:17 +0100
Subject: [PATCH]

---
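Note for reviewers (commentary only, not part of the diff): the new
Multiply section describes a 64-bit by 64-bit multiply producing a
128-bit result, and a multiply-and-accumulate with three 64-bit inputs
and two 64-bit outputs. The following is a minimal C sketch of that
arithmetic, not a proposed implementation. The helper names `mul64x64`
and `muladd64` are hypothetical and do not appear in the page or in
any ISA.

```c
#include <stdint.h>

/* Hypothetical helper: full 64x64 -> 128-bit unsigned multiply, built
 * from four 32x32 -> 64-bit partial products so that no compiler
 * extension (such as a 128-bit integer type) is assumed. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Sum of the column at bits 32..63; three values below 2^32 each,
     * so this cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xffffffffu) + (p2 & 0xffffffffu);

    *lo = (mid << 32) | (p0 & 0xffffffffu);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}

/* Hypothetical helper: multiply-and-accumulate, a * b + c.
 * Three 64-bit inputs, two 64-bit outputs. The result always fits
 * in 128 bits, so the carry into hi never overflows. */
static void muladd64(uint64_t a, uint64_t b, uint64_t c,
                     uint64_t *hi, uint64_t *lo)
{
    mul64x64(a, b, hi, lo);
    *lo += c;
    if (*lo < c)      /* unsigned wrap-around means a carry out of lo */
        (*hi)++;
}
```

A scalar ISA with separate `mul-low-half` and `mul-hi-half` instructions
effectively computes `*lo` and `*hi` in two separate passes over the
same operands, which is the cost the Multiply section is pointing at.
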
 openpower/sv/biginteger.mdwn | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/biginteger.mdwn b/openpower/sv/biginteger.mdwn
index 638eecd92..8969ce6b4 100644
--- a/openpower/sv/biginteger.mdwn
+++ b/openpower/sv/biginteger.mdwn
@@ -16,7 +16,11 @@ A secondary focus is that if Vectorised, implementors may choose
 to deploy macro-op fusion targetting back-end 256-bit or greater
 Dynamic SIMD ALUs for maximum performance and effectiveness.
 
-# Add and Subtract
+# Analysis
+
+This section covers an analysis of big integer operations.
+
+## Add and Subtract
 
 Surprisingly, no new additional instructions are required to perform
 a straightforward big-integer add or subtract.  Vectorised `addeo`
@@ -30,3 +34,19 @@ a CA Flag, `sv.addeo` is in effect an alias for Vectorised add.  As such,
 implementors are entirely at liberty to recognise Horizontal-First Vector
 adds and send the vector of registers to a much larger and wider back-end
 ALU.
+
+## Multiply
+
+Multiply is tricky: 64-bit operands actually produce a 128-bit result.
+Most Scalar RISC ISAs have separate `mul-low-half` and `mul-hi-half`
+instructions, whilst some (OpenRISC) have "Accumulators" from which
+the results of the multiply must be explicitly extracted. RISC advocates
+recommend "macro-op fusion", in which the second instruction of the pair
+reuses a cached copy of the HI result already computed by the first.
+This approach quickly complicates the internal microarchitecture,
+especially at the decode phase.
+
+Instead, Intel, in 2012, specifically added a `mulx` instruction, allowing
+both HI and LO halves of the multiply to reach registers.  If done as a
+multiply-and-accumulate this becomes quite an expensive operation
+(three 64-bit registers in, two 64-bit registers out).
-- 
2.30.2