From 7daa25e5670dfe05ce0f7eb0466fd5690d225e01 Mon Sep 17 00:00:00 2001 From: lkcl Date: Thu, 21 Apr 2022 10:45:39 +0100 Subject: [PATCH] --- openpower/sv/biginteger/analysis.mdwn | 30 ++++++++++++++++++++------- 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn index 52174220f..cbb061ab8 100644 --- a/openpower/sv/biginteger/analysis.mdwn +++ b/openpower/sv/biginteger/analysis.mdwn @@ -1,8 +1,9 @@ # Analysis -This section covers an analysis of big integer operations. Use of -smaller sub-operations is a given: worst-case, addition is O(N) -whilst multiply and divide are O(N^2). +This page covers an analysis of big integer operations, to +work out an optimal Vector Instruction for addition to Draft +SVP64. Use of smaller sub-operations is a given: worst-case, +addition is O(N) whilst multiply and divide are O(N^2). Links @@ -125,15 +126,28 @@ purpose. Successive iterations effectively use RC as a 64-bit carry, and as noted by Intel in their notes on mulx, RA*RB+RC+RD cannot overflow, so does not require -setting an additional CA flag. - -Combined with a Vectorised big-int `sv.adde` the key inner loop of +setting an additional CA flag, we first cover the chain of +RA*RB+RC as follows: + + RT0, RC0 = RA0 * RB0 + RC + | + +----------------+ + | + RT1, RC1 = RA1 * RB1 + RC0 + | + +----------------+ + | + RT2, RC2 = RA2 * RB2 + RC1 + +Following up to add each partially-computed row to what will become +the final result is achieved with a Vectorised big-int +`sv.adde`. Thus, the key inner loop of Knuth's Algorithm M may be achieved in four instructions, two of which are scalar initialisation: - li r16, 0 # zero accululator - addic r16, r16, 0 # CA to zero as well + li r16, 0 # zero accumulator sv.madde r0.v, r8.v, r17, r16 # mul vector + addic r16, r16, 0 # CA to zero as well sv.adde r24.v, r24.v, r0.v # big-add row to result Normally, in a Scalar ISA, the use of a register as both a source -- 2.30.2