From 9c2a5b486800a28dee6febda2ec21d75ee964d80 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Tue, 19 Apr 2022 14:09:17 +0100
Subject: [PATCH]

---
 openpower/sv/biginteger.mdwn | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/openpower/sv/biginteger.mdwn b/openpower/sv/biginteger.mdwn
index 638eecd92..8969ce6b4 100644
--- a/openpower/sv/biginteger.mdwn
+++ b/openpower/sv/biginteger.mdwn
@@ -16,7 +16,11 @@ A secondary focus is that if Vectorised, implementors may choose
 to deploy macro-op fusion targetting back-end 256-bit or greater
 Dynamic SIMD ALUs for maximum performance and effectiveness.
 
-# Add and Subtract
+# Analysis
+
+This section covers an analysis of big integer operations.
+
+## Add and Subtract
 
 Surprisingly, no new additional instructions are required to perform
 a straightforward big-integer add or subtract.  Vectorised `addeo`
@@ -30,3 +34,19 @@ a CA Flag, `sv.addeo` is in effect an alias for Vectorised add.
 As such, implementors are entirely at liberty to recognise
 Horizontal-First Vector adds and send the vector of registers to a
 much larger and wider back-end ALU.
+
+## Multiply
+
+Multiply is tricky: 64-bit operands actually produce a 128-bit result.
+Most Scalar RISC ISAs have separate `mul-low-half` and `mul-hi-half`
+instructions, whilst some (OpenRISC) have "Accumulators" from which
+the results of the multiply must be explicitly extracted. RISC advocates
+recommend "macro-op fusion", in which the second instruction in effect
+gains access to the cached copy of the HI result, which had already been
+computed by the first. This approach quickly complicates the internal
+microarchitecture, especially at the decode phase.
+
+Instead, Intel, in 2012, specifically added a `mulx` instruction, allowing
+both HI and LO halves of the multiply to reach registers. If done as a
+multiply-and-accumulate this becomes quite an expensive operation:
+three 64-bit registers in, two 64-bit registers out.
-- 
2.30.2
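For readers of the patch, the arithmetic described in the two sections it touches can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the Power ISA or SVP64 semantics: the names `MASK64`, `bigint_add`, `mul_hi_lo` and `mul_add_hi_lo` are hypothetical, and big integers are assumed to be lists of 64-bit limbs stored least-significant first. The carry variable stands in for the CA flag that chains one limb's add into the next, and the last function shows why a multiply-and-accumulate needs three 64-bit inputs and two 64-bit outputs.

```python
MASK64 = (1 << 64) - 1  # all-ones value of one 64-bit limb (hypothetical name)


def bigint_add(a, b, carry_in=0):
    """Add two equal-length limb lists, least-significant limb first.

    Each loop step models one scalar add-with-carry: two limbs plus the
    incoming carry go in, one limb plus the outgoing carry come out.
    """
    result = []
    carry = carry_in
    for x, y in zip(a, b):
        total = x + y + carry
        result.append(total & MASK64)  # low 64 bits stay in this limb
        carry = total >> 64            # bit 64 becomes the next carry-in
    return result, carry


def mul_hi_lo(x, y):
    """64 x 64 -> 128-bit multiply, returned as (hi, lo) 64-bit halves."""
    product = x * y
    return product >> 64, product & MASK64


def mul_add_hi_lo(x, y, addend):
    """Multiply-and-accumulate: three 64-bit inputs, two 64-bit outputs.

    The sum x * y + addend always fits in 128 bits, so no carry is lost.
    """
    total = x * y + addend
    return total >> 64, total & MASK64
```

For instance, `bigint_add([MASK64, 0], [1, 0])` carries out of the low limb into the high one, and `mul_hi_lo(MASK64, MASK64)` shows that the HI half of a product is non-trivial even for single-limb operands.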