is applied *in general* to Scalar operations, just like the x86
`REP` instruction (if put on steroids).
-# EXTRA Pack/Unpack Modes
+# EXTRA Pack/Unpack bits
The pack/unpack concept of VSX `vpack` is abstracted out as a Sub-Vector
-reordering Schedule, named `RM-2P-1S1D-PU`.
-The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making
-room for 2 extra bits that enable either "packing" or "unpacking"
+reordering Schedule.
+Two bits in the `RM` field
+enable either "packing" or "unpacking"
on the subvectors vec2/3/4.
-Illustrating a
-"normal" SVP64 operation with `SUBVL!=1:` (assuming no elwidth overrides):
+First, illustrating a
+"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides),
+note that the VL loop is outer and the SUBVL loop inner:
def index():
for i in range(VL):
for idx in index():
operation_on(RA+idx)
-For pack/unpack (again, no elwidth overrides):
+For pack/unpack (again, no elwidth overrides), note that now there is the
+option to swap the SUBVL and VL loop orders.
+In effect the Pack/Unpack performs a Transpose of the subvector elements:
# yield an outer-SUBVL or inner VL loop with SUBVL
def index_p(outer):
Setting of both `PACK_en` and `UNPACK_en` is neither prohibited nor
`UNDEFINED` because the reordering is fully deterministic, and
-additional REMAP reordering may be applied. For Matrix this would
+additional REMAP reordering may be applied. Combined with
+Matrix REMAP this would
give potentially up to 4 Dimensions of reordering.
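
The loop-order swap behind Pack/Unpack can be illustrated with a minimal Python sketch. It is not the specification's pseudocode: the function name `index_order` and its `swap` parameter are hypothetical, elwidth overrides are ignored, and the mapping of `PACK_en` and `UNPACK_en` onto source and destination is deliberately not modelled. It only shows that iterating with SUBVL as the outer loop visits a flat array of subvectors in transposed order.

```python
def index_order(VL, SUBVL, swap):
    """Yield flat element offsets for VL subvectors of SUBVL elements each.

    swap=False: VL outer, SUBVL inner (the "normal" order).
    swap=True:  SUBVL outer, VL inner (the transposed order).
    Hypothetical helper for illustration only.
    """
    if swap:
        for j in range(SUBVL):
            for i in range(VL):
                yield i * SUBVL + j
    else:
        for i in range(VL):
            for j in range(SUBVL):
                yield i * SUBVL + j


# Two vec3 subvectors stored contiguously as XYZ XYZ.
src = ["x0", "y0", "z0", "x1", "y1", "z1"]

normal = [src[k] for k in index_order(2, 3, swap=False)]
transposed = [src[k] for k in index_order(2, 3, swap=True)]

print(normal)      # ['x0', 'y0', 'z0', 'x1', 'y1', 'z1']
print(transposed)  # ['x0', 'x1', 'y0', 'y1', 'z0', 'z1']
```

Reading with the swapped order and writing with the normal order gathers all X elements together, then all Y, then all Z, which is the Transpose effect described for the subvector elements.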
-Pack/Unpack applies to mv operations, mv.swizzle,
+Pack/Unpack applies primarily to mv operations, mv.swizzle,
and some other single-source
single-destination operations such as Indexed LD/ST and extsw.
[[sv/mv.swizzle]] has a slightly different pseudocode algorithm