From: lkcl <lkcl@web>
Date: Tue, 26 Apr 2022 23:17:37 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2573
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=fa60f5ec79b90dc07c6f0857f322390e5f2a4b09;p=libreriscv.git

---

diff --git a/openpower/sv/biginteger/analysis.mdwn b/openpower/sv/biginteger/analysis.mdwn
index f745748c7..52833d4bf 100644
--- a/openpower/sv/biginteger/analysis.mdwn
+++ b/openpower/sv/biginteger/analysis.mdwn
@@ -100,6 +100,31 @@ to people unfamiliar with Cray-style Vectors: if VL is not
 permitted to exceed 1 (because MAXVL is set to 1) then the above
 actually becomes a Scalar Big-Int add algorithm.
 
+# Vector Shift
+
+Like add and subtract, Vector shift strictly speaking needs no new
+instructions. Provided the shift amount is kept within the range of
+the element (64 bit), a Vector bit-shift may be synthesised from a pair
+of shift operations and an OR, all of which are standard Scalar Power ISA
+instructions that, when Vectorised, are exactly what is needed.
+
+```
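+void biglsh(unsigned s, unsigned vn[], unsigned const v[], int n)
+{   /* illustration with 32-bit elements: assumes 0 <= s <= 31 */
+    for (int i = n - 1; i > 0; i--) /* top down, so vn may alias v */
+        vn[i] = (v[i] << s) | ((unsigned long long)v[i - 1] >> (32 - s));
+    vn[0] = v[0] << s; /* lowest element: zeros are shifted in */
+}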
+```
+
+Three instructions are needed here, where big-add needs only one,
+because multiple bits chain through to the next element, whereas
+for add it is a single bit (carry-in, carry-out).
+For multiply and divide, as shown later, it is worthwhile to use
+one scalar register effectively as a full 64-bit carry/chain,
+but in the case of shift an OR can glue the elements together
+easily, and in parallel.
+
 # Vector Multiply
 
 Long-multiply, assuming an O(N^2) algorithm, is performed by summing