From fa60f5ec79b90dc07c6f0857f322390e5f2a4b09 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Wed, 27 Apr 2022 00:17:37 +0100
Subject: [PATCH]

---
 openpower/sv/biginteger/analysis.mdwn | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index f745748c7..52833d4bf 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -100,6 +100,31 @@ to people unfamiliar with Cray-style Vectors: if VL is not permitted to
 exceed 1 (because MAXVL is set to 1) then the above actually becomes
 a Scalar Big-Int add algorithm.
 
+# Vector Shift
+
+Like add and subtract, strictly speaking Vector shift needs no new instructions.
+Provided the shift amount is kept within the range of the element (64 bit),
+a Vector bit-shift may be synthesised from a pair of shift operations
+and an OR, all of which are standard Scalar Power ISA instructions
+that, when Vectorised, are exactly what is needed.
+
+```
+void biglsh(unsigned s, unsigned vn[], unsigned const v[], int n)
+{
+    for (int i = n - 1; i > 0; i--)
+        vn[i] = (v[i] << s) | ((unsigned long long)v[i - 1] >> (32 - s));
+    vn[0] = v[0] << s;
+}
+```
+
+Three instructions are needed here, where big-add needs only one,
+because multiple bits chain through to the
+next element, whereas for add it is a single bit (carry-in, carry-out).
+For multiply and divide, as shown later, it is worthwhile to use
+one scalar register effectively as a full 64-bit carry/chain,
+but in the case of shift an OR may glue things together easily
+and in parallel.
+
 # Vector Multiply
 
 Long-multiply, assuming an O(N^2) algorithm, is performed by summing
-- 
2.30.2
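
Note on the patch: the `biglsh` example works on 32-bit `unsigned` elements, while the surrounding text talks about 64-bit elements. The following is a minimal sketch of what a 64-bit-element version might look like. The name `biglsh64`, the `uint64_t` and `size_t` types, and the explicit handling of `s == 0` are assumptions made for illustration and are not part of the patch. Elements are taken to be least-significant first, as in `biglsh`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit-element variant of biglsh() (not from the patch).
 * Shifts the n-element big integer v left by s bits, 0 <= s < 64,
 * storing the result in vn. v[0] is the least significant element. */
void biglsh64(unsigned s, uint64_t vn[], const uint64_t v[], size_t n)
{
    assert(s < 64);  /* shift amount must stay within one element */
    if (n == 0)
        return;
    for (size_t i = n - 1; i > 0; i--) {
        /* shift 1: this element's own bits move up by s */
        uint64_t hi = v[i] << s;
        /* shift 2: the top s bits of the element below move in;
         * a 64-bit shift by 64 is undefined in C, so s == 0 is special-cased */
        uint64_t lo = s ? v[i - 1] >> (64 - s) : 0;
        /* OR: glue the two halves together, independently per element */
        vn[i] = hi | lo;
    }
    vn[0] = v[0] << s;  /* nothing shifts into the lowest element */
}
```

Each loop iteration reads only `v[i]` and `v[i - 1]`, so no state is carried from one element to the next, which is the property the text relies on for parallelism. In this sequential sketch, running the loop from the most significant element downwards also means `vn` and `v` may be the same array.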