Commonly-used patterns such as Matrix Multiply, DCT and FFT have
helper instruction options which make REMAP easier to use.
+There are three types of REMAP:
+
+* **Matrix**, also known as 2D and 3D reshaping
+* **FFT/DCT**, with full triple-loop in-place support, limited to
+ power-of-two radix
+* **Indexing**, for any general-purpose reordering. Currently
+ under development.
+
# Principle
-* normal vector element read/write as operands would be sequential
+* normal vector element read/write of operands would be sequential
(0 1 2 3 ....)
* this is not appropriate for (e.g.) Matrix multiply which requires
accessing elements in alternative sequences (0 3 6 1 4 7 ...)
* normal Vector ISAs use either Indexed-MV or Indexed-LD/ST to "cope"
with this. Both are expensive (copying large vectors, spilling through memory),
+ and very few Packed SIMD ISAs cope with non-power-of-two sizes.
* REMAP **redefines** the order of access according to set "Schedules".
* The Schedules are not necessarily restricted to power-of-two boundaries,
making it unnecessary to have, for example, specialised 3x4 transpose