then use Permutation instructions), instruction count (Matrix Multiply up to 127 FMACs
is 3 instructions), and programmer sanity.
+**Basic principle**
+
+The following illustrates why REMAP was added.
+
+* normal vector element read/write of operands would be sequential
+ (0 1 2 3 ....)
+* this is not appropriate for (e.g.) Matrix multiply which requires
+ accessing elements in alternative sequences (0 3 6 1 4 7 ...)
+* normal Vector ISAs use either Indexed-MV or Indexed-LD/ST to "cope"
+ with this. Both are expensive (copying large vectors, spilling through
+ memory), and very few Packed SIMD ISAs cope with non-power-of-two sizes
+ (duplicate-data inline loop-unrolling is the costly workaround)
+* REMAP **redefines** the order of access according to set
+ (Deterministic) "Schedules".
+* Matrix Schedules are not at all restricted to power-of-two boundaries,
+ making specialised instructions, such as the 3x4 transpose
+ instructions of other Vector ISAs, unnecessary.
+* DCT and FFT REMAP are limited to RADIX-2, but this is also the case in
+ existing Packed/Predicated SIMD ISAs (and Bluestein Convolution is
+ typically deployed to solve that).
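
The difference between sequential access and a remapped access order can be sketched in a few lines of Python. This is a conceptual model only, not the SVP64 pseudocode: the function names `sequential_schedule`, `transposed_schedule` and `remapped_copy` are hypothetical, and the matrix is assumed to be stored row-major in a flat list.

```python
def sequential_schedule(vl):
    """Element offsets for an ordinary vector loop: 0, 1, 2, ..."""
    return list(range(vl))


def transposed_schedule(rows, cols):
    """Offsets that walk a row-major rows x cols matrix column by column.

    Illustrative stand-in for a Deterministic Schedule; the name and
    signature are assumptions, not part of any specification.
    """
    return [r * cols + c for c in range(cols) for r in range(rows)]


def remapped_copy(src, schedule):
    """Read src in the order given by schedule, without moving src itself."""
    return [src[i] for i in schedule]


matrix = [10, 11, 12, 20, 21, 22, 30, 31, 32]  # 3x3, row-major
schedule = transposed_schedule(3, 3)
print(sequential_schedule(9))           # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(schedule)                         # [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(remapped_copy(matrix, schedule))  # [10, 20, 30, 11, 21, 31, 12, 22, 32]
```

Nothing in the sketch depends on the dimensions being powers of two: `transposed_schedule(3, 4)` is just as valid as `transposed_schedule(4, 4)`, and the source data is never copied or permuted, only read in a different order.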
+
+Only the most commonly-used algorithms in computer science have REMAP
+support, due to the high cost in both the ISA and in hardware. For
+arbitrary remapping the `Indexed` REMAP may be used.
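
As a rough illustration of the arbitrary case, the access order can come from a table of offsets rather than from a computed Schedule. The sketch below is an assumption-laden model, not the actual `Indexed` REMAP semantics: `indexed_remap_add` and its parameters are hypothetical names chosen for this example.

```python
def indexed_remap_add(a, b, index):
    """Compute out[i] = a[index[i]] + b[i].

    Only operand a is read through the offset table; b and the result
    are accessed sequentially. Hypothetical helper for illustration.
    """
    return [a[index[i]] + b[i] for i in range(len(index))]


a = [1, 2, 3]
b = [10, 20, 30]
index = [2, 0, 1]  # arbitrary read order for operand a
print(indexed_remap_add(a, b, index))  # [13, 21, 32]
```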
+
# REMAP types
This section summarises the motivation for each REMAP Schedule