that when Vectorised are exactly what is needed.
```
-void biglsh(unsigned s, uint64_t vn[], uint64_t const v[], int n)
-{
- for (int i = n - 1; i > 0; i--)
- vn[i] = ((unsigned long long)v[i] << s) | (v[i - 1] >> (32 - s));
- vn[0] = v[0] << s;
+void bigrsh(unsigned s, uint64_t r[], uint64_t const un[], int n) {
+    // requires 0 < s < 64: s == 0 would make (64 - s) an undefined shift
+    for (int i = 0; i < n - 1; i++)
+        r[i] = (un[i] >> s) | (un[i + 1] << (64 - s));
+    r[n - 1] = un[n - 1] >> s;
}
```
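As a worked check of the limb order and carry direction, here is a self-contained copy of `bigrsh`. It turns the implicit 0 < s < 64 precondition into an `assert`, which is an addition of this sketch and not part of the original. It also assumes limb 0 is the least significant, which is what the loop implies.

```c
#include <assert.h>
#include <stdint.h>

/* Shift the n-limb big integer un[] right by s bits into r[].
   Limbs are little-endian: un[0] holds the least significant 64 bits. */
void bigrsh(unsigned s, uint64_t r[], uint64_t const un[], int n) {
    /* s == 0 would turn (64 - s) into a 64-bit shift, undefined in C */
    assert(s > 0 && s < 64);
    for (int i = 0; i < n - 1; i++)
        /* low bits come from this limb, high bits from the next limb up */
        r[i] = (un[i] >> s) | (un[i + 1] << (64 - s));
    /* nothing above the top limb: zeros shift in */
    r[n - 1] = un[n - 1] >> s;
}
```

For example, shifting the limbs {0, 0xFF, 0x8000000000000001} right by 4 gives {0xF000000000000000, 0x100000000000000F, 0x0800000000000000}. The bottom four bits of each limb reappear as the top four bits of the limb below it.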
With SVP64 being on top of the standard scalar regfile the offset by
one of the elements may be achieved simply by referencing the same
-vector data offset by one.
+vector data offset by one. Given that all three instructions
+(`srd`, `sld`, `or`) are of SVP64 type `RM-1P-2S1D` and use `EXTRA3`,
+all 128 64-bit registers (r0-r127) may be referenced:
+
+    subfic t1, t0, 64         # t1 = 64-s (s in t0)
+    sv.srd r8.v, r24.v, t0    # shift all of r24.v right by s, store in r8.v
+    sv.sld r16.v, r25.v, t1   # r25.v is r24.v offset by one: shift left by 64-s
+    sv.or  r8.v, r8.v, r16.v  # OR the two parts together
+
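+Predication with zeroing may be applied to the `sld` so that its last
+element is written as zero instead of being computed from the register
+just past the end of the input vector (the equivalent of reading `un[n]`
+in the C above), avoiding over-run.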
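The three-instruction sequence and its zeroing predicate can be checked against a small scalar C model. This is a hypothetical sketch, not the libre-soc simulator. The `gpr` array and the `sv_shift`, `sv_or` and `sv_bigrsh` names are illustrative. The element loop ignores hazards and element-width overrides, and t0 and t1 are arbitrarily mapped to r3 and r4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a 128-entry 64-bit register file and a plain
   element loop (no hazard checks, no elwidth overrides). */
static uint64_t gpr[128];

/* Power ISA scalar shift semantics: the shift amount is the low 7 bits
   of RB, and amounts of 64 to 127 give a zero result. */
static uint64_t srd(uint64_t rs, uint64_t rb) {
    unsigned n = (unsigned)(rb & 127);
    return n > 63 ? 0 : rs >> n;
}

static uint64_t sld(uint64_t rs, uint64_t rb) {
    unsigned n = (unsigned)(rb & 127);
    return n > 63 ? 0 : rs << n;
}

/* sv.srd / sv.sld with vector RT and RS and scalar RB. Elements whose
   mask bit is clear are not computed and are written as zero. */
static void sv_shift(uint64_t (*op)(uint64_t, uint64_t),
                     int rt, int rs, int rb, int vl, uint64_t mask) {
    for (int i = 0; i < vl; i++)
        gpr[rt + i] = ((mask >> i) & 1) ? op(gpr[rs + i], gpr[rb]) : 0;
}

/* sv.or with all operands vector, unpredicated. */
static void sv_or(int rt, int ra, int rb, int vl) {
    for (int i = 0; i < vl; i++)
        gpr[rt + i] = gpr[ra + i] | gpr[rb + i];
}

/* Assumed register choices for the t0 and t1 placeholders. */
enum { T0 = 3, T1 = 4 };

/* Right-shift the vl limbs in r24.v by gpr[T0] bits into r8.v. */
void sv_bigrsh(int vl) {
    assert(vl >= 1 && vl <= 8);              /* keeps r8.v, r16.v, r24.v apart */
    uint64_t all = (1ULL << vl) - 1;         /* every element active */
    uint64_t notlast = all >> 1;             /* every element except vl-1 */
    gpr[T1] = 64 - gpr[T0];                  /* subfic t1, t0, 64 */
    sv_shift(srd, 8, 24, T0, vl, all);       /* sv.srd r8.v, r24.v, t0 */
    sv_shift(sld, 16, 25, T1, vl, notlast);  /* sv.sld r16.v, r25.v, t1, zeroing */
    sv_or(8, 8, 16, vl);                     /* sv.or r8.v, r8.v, r16.v */
}
```

The model makes one design point visible. Power ISA `sld` returns zero for shift amounts of 64 to 127, so with s = 0 the `subfic` result of 64 makes the `sld` contribute nothing and the sequence reduces to a copy. By contrast, the equivalent C shift by 64 is undefined.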
The reason why three instructions are needed instead of one in the
case of big-add is because multiple bits chain through to the