it is the destination that swaps its loop-order. Setting both
`PACK_en` and `UNPACK_en` is neither prohibited nor `UNDEFINED`
because the behaviour is fully deterministic.
+
+*However*, in Vertical-First Mode, when both are enabled, both
+source and destination are outer loops, so only a **single**
+step of srcstep and dststep is performed. Contrast this with the
+case where only `PACK_en` is set: the *destination* is then an inner
+subvector loop, and Vertical-First therefore runs through the
+entire `dst_subvl` group. Likewise, when only `UNPACK_en` is set, it
+is the source subvector that is run through as a group.
+
+```
+if VERTICAL_FIRST:
+    # must run through SUBVL or dst_subvl elements, to keep
+    # the subvector "together". weirdness occurs due to
+    # PACK_en/UNPACK_en
+    num_runs = SUBVL # 1-4 (also covers UNPACK_en alone: source is the inner loop)
+    if PACK_en:
+        num_runs = dst_subvl # destination still an inner loop
+    if PACK_en and UNPACK_en:
+        num_runs = 1 # both are outer loops
+    for substep in range(num_runs):
+        (src_idx, offs) = yield from index_src()
+        dst_idx = yield from index_dst()
+        move_operation(RT+dst_idx, RA+src_idx+offs)
+```