From bff6bf4de1b8c84d52bc082c8c7c5290c30f851a Mon Sep 17 00:00:00 2001
From: lkcl
Date: Wed, 27 Apr 2022 10:22:21 +0100
Subject: [PATCH]

---
 openpower/sv/biginteger/analysis.mdwn | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index 703f51235..b731f2bee 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -129,6 +129,12 @@ With Scalar shift and rotate operations in the Power ISA already being
 complex and very comprehensive, it is hard to justify creating complex
 3-in 2-out variants when a sequence of 3 simple instructions will suffice.
 
+For larger shift amounts beyond an element bitwidth standard register move
+operations may be used, or, if the shift amount is static,
+to reference an alternate starting point in
+the registers containing the Vector elements
+because SVP64 sits on top of a standard Scalar register file.
+
 # Vector Multiply
 
 Long-multiply, assuming an O(N^2) algorithm, is performed by summing
@@ -149,12 +155,13 @@ the results of the multiply must be explicitly extracted. High performance
 RISC advocates
 recommend "macro-op fusion" which is in effect where the second instruction
 gains access to the cached copy of the HI half of the
-multiply redult, which had already been
+multiply result, which had already been
 computed by the first. This approach quickly complicates the internal
 microarchitecture, especially at the decode phase.
 
 Instead, Intel, in 2012, specifically added a `mulx` instruction, allowing
-both HI and LO halves of the multiply to reach registers. If done as a
+both HI and LO halves of the multiply to reach registers with a single
+instruction. If however done as a
 multiply-and-accumulate this becomes quite an expensive operation:
 (3 64-Bit in, 2 64-bit registers out).
 
-- 
2.30.2
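
Reviewer's sketch (not part of the patch): the two ideas touched by the hunks above can be modelled in plain Python. This is only an arithmetic illustration under stated assumptions, not SVP64 or Power ISA code. The names `MASK64`, `mul_add_64` and `shift_left_elements` are hypothetical, elements are assumed to be 64 bits wide, and the Vector is assumed to be stored least-significant element first. A shift larger than one element is split into a whole-element offset (the "alternate starting point" in the register file) plus a residual bit shift, and the multiply-and-accumulate shows why the operation is 3-in, 2-out.

```python
MASK64 = (1 << 64) - 1  # assumed element width: 64 bits


def mul_add_64(a, b, c):
    """3-in, 2-out: return (hi, lo) halves of a*b + c for 64-bit inputs."""
    # (2^64-1)^2 + (2^64-1) < 2^128, so the sum always fits in two elements.
    total = a * b + c
    return total >> 64, total & MASK64


def shift_left_elements(elems, shift):
    """Shift a little-endian list of 64-bit elements left by `shift` bits.

    The element count is preserved; bits shifted past the top are dropped.
    """
    # Whole elements to skip (the alternate starting point) and leftover bits.
    skip, bits = divmod(shift, 64)
    n = len(elems)
    out = [0] * n
    for i in range(skip, n):
        cur = elems[i - skip]
        # Element below supplies the bits that cross the element boundary.
        prev = elems[i - skip - 1] if i - skip - 1 >= 0 else 0
        out[i] = ((cur << bits) | (prev >> (64 - bits))) & MASK64
    return out
```

For example, `shift_left_elements([1, 0, 0], 65)` moves the single set bit one whole element up and then one further bit, and `mul_add_64` with all three inputs at their maximum still fits in the two output elements.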