-# Vector Shift
-
-Like add and subtract, Vector shift strictly speaking needs no new instructions.
-Provided the shift amount stays within the range of one element (64 bit),
-a Vector bit-shift may be synthesised from a pair of shift operations
-and an OR, all of which are standard Scalar Power ISA instructions
-that, when Vectorized, are exactly what is needed.
-
-```
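-#include <stdint.h>
-
-/* Right-shift the n-element biginteger un[] by s bits into r[].
-   Element 0 is least significant. Requires 0 < s < 64: s == 0 would
-   shift by 64, which is undefined in C. */
-void bigrsh(unsigned s, uint64_t r[], uint64_t un[], int n) {
-    for (int i = 0; i < n - 1; i++)
-        r[i] = (un[i] >> s) | (un[i + 1] << (64 - s));
-    r[n - 1] = un[n - 1] >> s; /* top element: nothing shifts in from above */
-}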
-```
-
-Because SVP64 sits on top of the standard scalar regfile, the offset by
-one element may be achieved simply by referencing the same
-vector data starting one register higher. Given that all three instructions
-(`srd`, `sld`, `or`) are SVP64 type `RM-1P-2S1D` and are `EXTRA3`,
-it is possible to reference the full 128 64-bit registers (r0-r127):
-
-    subfic t1, t0, 64         # compute 64-s (s in t0)
-    sv.srd r8.v, r24.v, t0    # shift each element of r24.v right by s
-    sv.sld r16.v, r25.v, t1   # same vector offset by one (r25), shifted left by 64-s
-    sv.or  r8.v, r8.v, r16.v  # OR two parts together
-
-Predication with zeroing may be utilised on `sv.sld` to ensure that the
-last element is zero, avoiding over-run past the end of the source vector.
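
The three-instruction sequence can be modelled in C as a rough sketch. The helper names `srd_model`, `sld_model` and `bigrsh_3op` are hypothetical and exist only for illustration. The model assumes that the scalar shifts take a 7-bit shift amount and return zero for amounts of 64 and above, and that the zeroing predicate simply forces the last `sld` element to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed scalar behaviour: 7-bit shift amount, 64..127 gives zero. */
static uint64_t srd_model(uint64_t rs, uint64_t rb) {
    unsigned sh = (unsigned)(rb & 0x7f);
    return sh < 64 ? rs >> sh : 0;
}

static uint64_t sld_model(uint64_t rs, uint64_t rb) {
    unsigned sh = (unsigned)(rb & 0x7f);
    return sh < 64 ? rs << sh : 0;
}

/* Hypothetical model of the subfic / sv.srd / sv.sld / sv.or sequence.
   Element 0 is the least-significant digit. */
void bigrsh_3op(unsigned s, uint64_t r[], const uint64_t un[], size_t n) {
    assert(s < 64);
    uint64_t t1 = 64 - s;                              /* subfic t1, t0, 64 */
    for (size_t i = 0; i < n; i++) {
        uint64_t lo = srd_model(un[i], s);             /* sv.srd */
        /* sv.sld on the vector offset by one; the last element is
           zeroed rather than read from beyond the source vector */
        uint64_t hi = (i + 1 < n) ? sld_model(un[i + 1], t1) : 0;
        r[i] = lo | hi;                                /* sv.or */
    }
}
```

Each output element depends only on two input elements, so the loop iterations are independent of one another. Under the assumed shift semantics a shift amount of zero also works, because the left shift by 64 contributes nothing.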
-
-Three instructions are needed for shift, where big-add needs only one,
-because multiple bits chain through to the next element. For add the
-chain is a single bit (carry-in, carry-out),
-and this is precisely what `adde` already does.
-For multiply and divide, as shown later, it is worthwhile to use
-one scalar register effectively as a full 64-bit carry/chain,
-but in the case of shift an OR may glue things together easily
-and in parallel because, unlike `sv.adde`, down-chain
-carry-propagation through multiple elements does not occur.
-
-With Scalar shift and rotate operations in the Power ISA already being
-complex and very comprehensive, it is hard to justify creating complex
-3-in 2-out variants when a sequence of 3 simple instructions will suffice.
-However it is reasonably justifiable to have a 3-in 1-out instruction
-with an implicit source, based around the inner operation:
-
-```
- # r[i] = (un[i] >> s) | (un[i + 1] << (64 - s));
- t <- ROT128(RA || RA1, RB[58:63])
- RT <- t[64:127]
-```
-
-RA1 is implicitly (or explicitly, as RC) one scalar register number
-greater than RA and, like the other operations below,
-a 128/64 shift is performed, truncating to take the lower
-64 bits. By taking a Vector source RA, and assuming lower-numbered
-registers hold the less-significant digits of the biginteger,
-the entire biginteger source may be shifted by a scalar.
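
As a minimal sketch of the inner operation given in the comment above (`r[i] = (un[i] >> s) | (un[i + 1] << (64 - s))`), the following C models one element of such a 3-in 1-out operation and then applies it along a vector. The names `shr128_lo64` and `bigrsh_fused` are hypothetical, and the model is an interpretation of the comment rather than a definition of the instruction: it treats the pair as a 128-bit value, shifts right by the low 6 bits of the shift register, and keeps the lower 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: shift the 128-bit value (hi:lo) right by the
   low 6 bits of rb and keep only the lower 64 bits of the result. */
static uint64_t shr128_lo64(uint64_t lo, uint64_t hi, uint64_t rb) {
    unsigned s = (unsigned)(rb & 63);   /* assumed to mirror RB[58:63] */
    if (s == 0)
        return lo;                      /* avoid an undefined 64-bit shift */
    return (lo >> s) | (hi << (64 - s));
}

/* Apply the modelled operation to every element of the biginteger.
   Element 0 is least significant; zero is fed in above the top element. */
void bigrsh_fused(uint64_t rb, uint64_t r[], const uint64_t un[], size_t n) {
    assert(r != NULL && un != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t hi = (i + 1 < n) ? un[i + 1] : 0;
        r[i] = shr128_lo64(un[i], hi, rb);
    }
}
```

One modelled operation per element replaces the shift, shift and OR of the three-instruction form, with the second source taken from the next element up.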
-
-For shift amounts larger than an element bitwidth, standard register move
-operations may be used or, if the shift amount is static,
-an alternate starting point may be referenced in
-the registers containing the Vector elements,
-because SVP64 sits on top of a standard Scalar register file.
-`sv.sld r16.v, r26.v, t1` for example is equivalent to shifting
-by an extra 64 bits, compared to `sv.sld r16.v, r25.v, t1`.
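
The split of a large shift into whole elements plus a sub-element bit-shift may be sketched in C as follows. The function name `bigrsh_any` is hypothetical. The whole-element part corresponds to choosing a different starting register, and elements read from beyond the end of the source are assumed to be zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: right-shift an n-element biginteger by any
   number of bits. Element 0 is the least-significant digit. */
void bigrsh_any(size_t s, uint64_t r[], const uint64_t un[], size_t n) {
    assert(r != NULL && un != NULL);
    size_t skip = s / 64;                 /* whole elements: move the start point */
    unsigned bits = (unsigned)(s % 64);   /* remainder: the in-element shift */
    for (size_t i = 0; i < n; i++) {
        size_t j = i + skip;
        uint64_t lo = (j < n) ? un[j] : 0;
        uint64_t hi = (j + 1 < n) ? un[j + 1] : 0;
        /* a zero bit-shift is special-cased to avoid shifting by 64 */
        r[i] = bits ? (lo >> bits) | (hi << (64 - bits)) : lo;
    }
}
```

With a shift of exactly 64 the result is the source moved down by one element, which matches the effect of referencing the next register as the start of the vector.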
-