(`srd`, `sld`, `or`) are an SVP64 type `RM-1P-2S1D` and are `EXTRA3`,
it is possible to reference the full 128 64-bit registers (r0-r127):
- subfic t1, t0, 64 # compute 64-s (s in t0)
- sv.srd r8.v, r24.v, t0 # shift each element of r24.v up by s
- sv.sld r16.v, r25.v, t1 # offset start of vector by one (r25)
- sv.or r8.v, r8.v, r16.v # OR two parts together
+ subfic t1, t0, 64 # compute 64-s (s in t0)
+ sv.srd *r8, *r24, t0 # shift each element of r24 vector right by s
+ sv.sld *r16, *r25, t1 # offset start of vector by one (r25)
+ sv.or *r8, *r8, *r16 # OR two parts together
Predication with zeroing may be utilised on sld to ensure that the
last element is zero, avoiding over-run.
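
The three-instruction sequence above can be modelled in a few lines of Python. This is only a sketch under stated assumptions: the big integer is held as a little-endian list of 64-bit limbs, the shift amount satisfies 0 < s < 64, and the out-of-range last element of the `sld` step is forced to zero (standing in for predication with zeroing). The names `bigint_shift_right`, `to_limbs` and `from_limbs` are hypothetical helpers, not part of any SVP64 tooling.

```python
MASK64 = (1 << 64) - 1  # one 64-bit element


def to_limbs(value, n):
    """Split a non-negative integer into n little-endian 64-bit limbs."""
    return [(value >> (64 * i)) & MASK64 for i in range(n)]


def from_limbs(limbs):
    """Recombine little-endian 64-bit limbs into one integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


def bigint_shift_right(limbs, s):
    """Model of the subfic / sv.srd / sv.sld / sv.or sequence (0 < s < 64)."""
    if not 0 < s < 64:
        raise ValueError("this sketch only handles 0 < s < 64")
    n = len(limbs)
    t1 = 64 - s  # subfic t1, t0, 64
    # sv.srd: shift every element right by s
    low = [(x >> s) & MASK64 for x in limbs]
    # sv.sld: read the vector starting one element later, shift left by 64-s;
    # the last element has no successor, so it is zeroed (assumed predication)
    high = [((limbs[i + 1] << t1) & MASK64) if i + 1 < n else 0
            for i in range(n)]
    # sv.or: combine the two parts element by element
    return [a | b for a, b in zip(low, high)]


if __name__ == "__main__":
    x = 0x0123456789ABCDEF_FEDCBA9876543210_0F1E2D3C4B5A6978
    shifted = bigint_shift_right(to_limbs(x, 3), 12)
    print(hex(from_limbs(shifted)))
```

Comparing the result against Python's own `x >> s` is a quick way to confirm that the per-element decomposition matches a whole-number shift.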
```
The trick here is that the *entirety* of `RA` is rotated,
-the
+then parts of it are masked into the destinations.
+RC, if also properly masked, can be ORed into RT, as
+long as the bits of RC are in the right place.
+The really interesting bit is that when Vectorised,
+the upper bits (now in RS) *are* in the right bit-positions
+to be ORed into the second `dsrd` operation. This allows
+us to create a chained `sv.dsrd`.
+
For larger shift amounts beyond an element bitwidth standard register move
operations may be used, or, if the shift amount is static,
to reference an alternate starting point in