From 79de49b697be9d05a0bfe5daedb247f576a020fa Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sat, 30 Apr 2022 06:22:45 +0100
Subject: [PATCH]

---
 openpower/sv/biginteger/analysis.mdwn | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index 8ddce0c3a..754f86d01 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -14,7 +14,8 @@ This page covers an analysis of big integer
 operations, to work out
 optimal Scalar Instructions to propose be submitted to
 the OpenPOWER ISA WG, that when combined with Draft SVP64 give
-high performance compact Big Integer Vector Arithmetic.
+high performance compact Big Integer Vector Arithmetic. Leverage
+of existing Scalar Power ISA instructions is also explained.
 
 Use of smaller sub-operations is a given: worst-case in a Scalar
 context, addition is O(N) whilst multiply and divide are O(N^2),
@@ -71,11 +72,14 @@ ALU, and short-cut the intermediate storage of XER.CA on an element
 level in back-end hardware that need only:
 
 * read the first incoming XER.CA
-* implement a Vector-aware carry propagation algorithm
+* implement a large Vector-aware carry propagation algorithm
 * store the very last XER.CA in the batch
 
 The size and implementation of the underlying back-end SIMD ALU
-is entirely at the discretion of the implementer.
+is entirely at the discretion of the implementer, as is whether to
+deploy the above strategy. The only hard requirement for
+implementors of SVP64 is to comply with strict and precise Program Order
+even at the Element level.
 
 If there is pressure on the register file (or
 multi-million-digit big integers)
@@ -286,15 +290,15 @@ the available space in the prefix is extremely limited (9 bits).
 With effectively 5 operands (3 in, 2 out) some compromises are needed.
 A little thought gives a useful workaround: two modes,
 controlled by a single bit in `RM.EXTRA`, determine whether the 5th
-register is set to RC or whether to RT+VL. This then leaves only
+register is set to RC or whether to RT+MAXVL. This then leaves only
 4 registers to qualify as scalar/vector, which can use four EXTRA2
 designators and fits into the available 9-bit space.
 
-RS=RT+VL Mode:
+RS=RT+MAXVL Mode:
 
     product = RA*RB+RC
     RT = lowerhalf(product)
-    RS=RT+VL = upperhalf(product)
+    RS=RT+MAXVL = upperhalf(product)
 
 and RS=RC Mode:
 
@@ -304,7 +308,7 @@ and RS=RC Mode:
 
 Now there is much more potential, including setting RC to a Scalar,
 which would be useful as a 64 bit Carry. RC as a Vector would produce
-a Vector of the HI halves of a Vector of multiplies. RS=RT+VL Mode
+a Vector of the HI halves of a Vector of multiplies. RS=RT+MAXVL Mode
 would allow that same Vector of HI halves to not be an overwrite of RC.
 Also it is possible to specify that any of RA, RB or RC are scalar
 or vector. Overall it is extremely powerful.
@@ -487,7 +491,7 @@ overflow for clarity) can be written as:
 
 Again, in an SVP64 context, using EXTRA mode bit 8 allows for
 selecting whether `RS=RC` or
-`RS=RT+VL`. Similar flexibility in the scalar-vector settings
+`RS=RT+MAXVL`. Similar flexibility in the scalar-vector settings
 allows the instruction to perform full parallel vector div/mod,
 or act in loop-back mode for big-int division by a scalar,
 or for a single scalar 128/64 div/mod.
-- 
2.30.2
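The two multiply-add modes touched by this patch (RS=RT+MAXVL and RS=RC) can be modelled in a few lines. What follows is a minimal Python sketch, not the Draft SVP64 specification. The function `sv_madd`, the list-based register file, the string mode selector, and the rule that RS steps alongside RT (element i writes its upper half to RT+i+MAXVL when RT is a Vector) are all hypothetical. The body of RS=RC Mode is not visible in this patch, so it is assumed to mirror RS=RT+MAXVL, with the upper half written to RC instead. Elements are processed strictly in Program Order. A Scalar RC written by element i is therefore the RC read by element i+1, which is what lets it act as the 64-bit Carry mentioned in the text.

```python
MASK64 = (1 << 64) - 1


def sv_madd(regs, rt, ra, rb, rc, vl, maxvl, rs_mode,
            rt_vec=True, ra_vec=True, rb_vec=False, rc_vec=False):
    """Hypothetical model of a Vectorised 64x64+64 -> 128-bit multiply-add.

    regs    : list of ints modelling a register file of 64-bit values
    rs_mode : "RT+MAXVL" or "RC", selecting where the upper half is written
    *_vec   : True for a Vector operand (register steps by element index),
              False for a Scalar operand (same register for every element)
    """
    for i in range(vl):  # strict element (Program) order
        rt_i = rt + i if rt_vec else rt
        ra_i = ra + i if ra_vec else ra
        rb_i = rb + i if rb_vec else rb
        rc_i = rc + i if rc_vec else rc
        product = regs[ra_i] * regs[rb_i] + regs[rc_i]
        # Assumption: in RT+MAXVL mode, RS steps alongside RT.
        rs_i = rt_i + maxvl if rs_mode == "RT+MAXVL" else rc_i
        regs[rt_i] = product & MASK64  # RT = lowerhalf(product)
        regs[rs_i] = product >> 64     # RS = upperhalf(product)


# Usage: big integer A (little-endian 64-bit limbs in r8..r10) times the
# scalar 3 in r4, using RS=RC mode with Scalar RC (r5) as the 64-bit carry.
regs = [0] * 128
regs[8:11] = [MASK64, 5, 7]
regs[4] = 3
regs[5] = 0  # carry-in
sv_madd(regs, rt=16, ra=8, rb=4, rc=5, vl=3, maxvl=3, rs_mode="RC")
# r16..r18 now hold the low three limbs of 3*A and r5 the carry-out limb.
```

The upper half never needs more than 64 bits, because (2^64-1)^2 + (2^64-1) = 2^128 - 2^64 < 2^128. This is why a single Scalar RC is enough to chain the carry across elements. With RC set as a Vector in RS=RT+MAXVL mode, the same function instead produces a Vector of HI halves at RT+MAXVL onwards and leaves RC untouched.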