accessing elements in alternative sequences (0 3 6 1 4 7 ...)
* normal Vector ISAs use either Indexed-MV or Indexed-LD/ST to "cope"
with this. both are expensive (copy large vectors, spill through memory)
- and very few Packed SIMD ISAs cope with non-Power-2.
+ and very few Packed SIMD ISAs cope with non-power-of-two sizes
+ (duplicating data with inline loop-unrolling is the costly workaround)
* REMAP **redefines** the order of access according to set
(Deterministic) "Schedules".
* Matrix Schedules are not at all restricted to power-of-two boundaries
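
As a rough illustration of the idea above, the following Python sketch generates the access order `0 3 6 1 4 7 ...` for a 3x3 matrix stored row-major, and shows that the same loop works for non-power-of-two shapes. This is only a software model under stated assumptions: the names `transpose_schedule` and `apply_schedule` are hypothetical, and nothing here claims to match the actual REMAP hardware algorithm or its register-level configuration.

```python
def transpose_schedule(rows, cols):
    """Yield flat indices of a row-major rows x cols matrix,
    visiting it column by column (hypothetical helper, not REMAP itself)."""
    for c in range(cols):
        for r in range(rows):
            yield r * cols + c


def apply_schedule(vec, schedule):
    """Read elements of vec in the order given by schedule.
    No data is copied or permuted beforehand: only the access order changes."""
    return [vec[i] for i in schedule]


if __name__ == "__main__":
    # 3x3 case: the sequence from the text
    print(list(transpose_schedule(3, 3)))

    # 5x3 case: dimensions need not be powers of two
    data = list(range(100, 115))
    print(apply_schedule(data, transpose_schedule(5, 3)))
```

The point of the sketch is that the schedule is a deterministic function of the dimensions alone, so the element order can be produced on the fly instead of materialising an index vector or a permuted copy of the data.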