different encoding if SVP64-prefixed. It did not go well.
The complexity that resulted
in the decode phase was too great. The lesson was learned, the
-hard way: it is infinitely preferable to add a 32-bit Scalar Load-with-Shift
+hard way: it would be infinitely preferable
+to add a 32-bit Scalar Load-with-Shift
instruction *first*, which then inherently becomes Vectorised.
Perhaps a future Power ISA spec will have this Load-with-Shift instruction:
both ARM and x86 have it, because it saves greatly on instruction count in