`llvm.masked.compressstore.*`
followed by
`llvm.masked.expandload.*`
-with a single instruction.
+with a single instruction, but abstracted out from Load/Store and applicable
+in general to any 2P instruction.
This extreme power and flexibility comes down to the fact that SVP64
is not actually a Vector ISA: it is a loop-abstraction-concept that
enable either "packing" or "unpacking"
on the subvectors vec2/3/4.
-First, llustrating a
+First, illustrating a
"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides),
note that the VL loop is outer and the SUBVL loop inner:
For pack/unpack (again, no elwidth overrides), note that now there is the
option to swap the SUBVL and VL loop orders.
-In effect the Pack/Unpack performs a Transpose of the subvector elements:
+In effect the Pack/Unpack performs a Transpose of the subvector elements.
+Illustrated this time with a GPR mv operation:
# yield an outer-SUBVL or inner VL loop with SUBVL
def index_p(outer):