From a9fe9a1cee2b7429f5391961efdb45ce59af50cb Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Wed, 27 Apr 2022 15:38:25 +0100
Subject: [PATCH] add SVP64 assembler version of big-shift

---
 openpower/sv/biginteger/analysis.mdwn | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index 8b24b3753..00ca404ef 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -109,17 +109,26 @@ and an OR, all of which are standard Scalar Power ISA instructions that
 when Vectorised are exactly what is needed.
 
 ```
-void biglsh(unsigned s, uint64_t vn[], uint64_t const v[], int n)
-{
-    for (int i = n - 1; i > 0; i--)
-        vn[i] = ((unsigned long long)v[i] << s) | (v[i - 1] >> (32 - s));
-    vn[0] = v[0] << s;
+void bigrsh(unsigned s, uint64_t r[], uint64_t un[], int n) {
+    for (int i = 0; i < n - 1; i++)
+        r[i] = (un[i] >> s) | (un[i + 1] << (64 - s));
+    r[n - 1] = un[n - 1] >> s;
 }
 ```
 
 With SVP64 being on top of the standard scalar regfile the offset by
 one of the elements may be achieved simply by referencing the same
-vector data offset by one.
+vector data offset by one. Given that all three instructions
+(`srd`, `sld`, `or`) are an SVP64 type `RM-1P-2S1D` and are `EXTRA3`,
+it is possible to reference the full 128 64-bit registers (r0-r127):
+
+    subfic t1, t0, 64       # compute 64-s (s in t0)
+    sv.srd r8.v, r24.v, t0  # shift all of r24.v up by s, store in r8
+    sv.sld r16.v, r25.v, t1 # offset start of vector by one (r25)
+    sv.or  r8.v, r8.v, r16.v # OR two parts together
+
+Predication with zeroing may be utilised on sld to ensure that the
+last element is zero, avoiding over-run.
 
 The reason why three instructions are needed instead of one in the
 case of big-add is because multiple bits chain through to the
-- 
2.30.2
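
Note on the patch above: the following is a minimal C sketch, not part of the commit, of how the single `bigrsh` loop might decompose into the three element-wise passes that the `sv.srd` / `sv.sld` / `sv.or` sequence performs. The name `bigrsh_3pass` and the scratch array `tmp` (standing in for `r16.v`) are hypothetical, `const` has been added to the source operand, and the sketch assumes `0 < s < 64` because a 64-bit shift by 64 is undefined in C. The zeroed last element of the second pass models the "predication with zeroing" remark.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the single-loop form added by the patch. */
void bigrsh(unsigned s, uint64_t r[], const uint64_t un[], int n)
{
    assert(s > 0 && s < 64 && n > 0);
    for (int i = 0; i < n - 1; i++)
        r[i] = (un[i] >> s) | (un[i + 1] << (64 - s));
    r[n - 1] = un[n - 1] >> s;
}

/* Hypothetical three-pass form, one loop per vector instruction.
 * tmp[] is assumed scratch space of n elements; r, un and tmp are
 * assumed not to overlap. */
void bigrsh_3pass(unsigned s, uint64_t r[], const uint64_t un[],
                  uint64_t tmp[], int n)
{
    assert(s > 0 && s < 64 && n > 0);
    unsigned t1 = 64 - s;                   /* subfic t1, t0, 64 */

    for (int i = 0; i < n; i++)             /* sv.srd: low part of each word */
        r[i] = un[i] >> s;

    for (int i = 0; i < n; i++)             /* sv.sld: source offset by one */
        tmp[i] = (i + 1 < n) ? (un[i + 1] << t1)
                             : 0;           /* last element zeroed, no over-run */

    for (int i = 0; i < n; i++)             /* sv.or: combine the two parts */
        r[i] |= tmp[i];
}
```

Under these assumptions both functions should produce identical results for any input. The three-pass form makes each pass independent per element, which is the property that lets the scalar instructions be Vectorised directly.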