From a9fe9a1cee2b7429f5391961efdb45ce59af50cb Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Wed, 27 Apr 2022 15:38:25 +0100
Subject: [PATCH] add SVP64 assembler version of big-shift

---
 openpower/sv/biginteger/analysis.mdwn | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)
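
Reviewer note: the three-instruction sequence added below can be
sanity-checked against the scalar loop with a plain C model. This is
only a sketch under stated assumptions: `bigrsh_3pass` and its `tmp`
scratch buffer are hypothetical names, each loop stands in for one
vectorised instruction, and the zeroed last element of the second pass
models the predication-with-zeroing suggestion. It does not simulate
SVP64 itself, and it assumes 0 < s < 64.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the scalar loop from the patch (const added to the input). */
static void bigrsh(unsigned s, uint64_t r[], const uint64_t un[], int n) {
    assert(s > 0 && s < 64);  /* a shift by 64 is undefined in C */
    for (int i = 0; i < n - 1; i++)
        r[i] = (un[i] >> s) | (un[i + 1] << (64 - s));
    r[n - 1] = un[n - 1] >> s;
}

/* Hypothetical model: one loop per vector instruction. */
static void bigrsh_3pass(unsigned s, uint64_t r[], uint64_t tmp[],
                         const uint64_t un[], int n) {
    assert(s > 0 && s < 64);
    unsigned t1 = 64 - s;                    /* subfic t1, t0, 64 */
    for (int i = 0; i < n; i++)              /* sv.srd: low parts */
        r[i] = un[i] >> s;
    for (int i = 0; i < n; i++)              /* sv.sld: source offset by one */
        tmp[i] = (i + 1 < n) ? (un[i + 1] << t1) : 0;  /* last element zeroed */
    for (int i = 0; i < n; i++)              /* sv.or: combine both parts */
        r[i] |= tmp[i];
}
```

Calling both functions on the same input with the same `s` should
produce identical result arrays.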

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index 8b24b3753..00ca404ef 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -109,17 +109,26 @@ and an OR, all of which are standard Scalar Power ISA instructions
 that when Vectorised are exactly what is needed.
 
 ```
-void biglsh(unsigned s, uint64_t vn[], uint64_t const v[], int n)
-{
-    for (int i = n - 1; i > 0; i--)
-        vn[i] = ((unsigned long long)v[i] << s) | (v[i - 1] >> (32 - s));
-    vn[0] = v[0] << s;
+void bigrsh(unsigned s, uint64_t r[], uint64_t un[], int n) { /* 0 < s < 64 */
+    for (int i = 0; i < n - 1; i++)
+        r[i] = (un[i] >> s) | (un[i + 1] << (64 - s));
+    r[n - 1] = un[n - 1] >> s;
 }
 ```
 
 With SVP64 being on top of the standard scalar regfile the offset by
 one of the elements may be achieved simply by referencing the same
-vector data offset by one.
+vector data offset by one.  Given that all three instructions
+(`srd`, `sld`, `or`) are of SVP64 type `RM-1P-2S1D` and use `EXTRA3`,
+it is possible to reference the full 128 64-bit registers (r0-r127):
+
+    subfic t1, t0, 64        # compute 64-s (s in t0)
+    sv.srd r8.v, r24.v, t0   # shift all of r24.v right by s, store in r8.v
+    sv.sld r16.v, r25.v, t1  # shift left by 64-s, source offset by one (r25)
+    sv.or  r8.v, r8.v, r16.v # OR two parts together
+
+Predication with zeroing may be utilised on `sv.sld` to ensure that the
+last element of its result is zero, avoiding over-run.
 
 The reason why three instructions are needed instead of one in the
 case of big-add is because multiple bits chain through to the
-- 
2.30.2