From 22a63f683c85612f07f38f339666c0428274e106 Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Fri, 19 May 2023 17:53:08 +0100
Subject: [PATCH]

---
 openpower/sv/rfc/ls009.mdwn | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/openpower/sv/rfc/ls009.mdwn b/openpower/sv/rfc/ls009.mdwn
index 8559f2b11..60dd70c9f 100644
--- a/openpower/sv/rfc/ls009.mdwn
+++ b/openpower/sv/rfc/ls009.mdwn
@@ -123,6 +123,31 @@ The result is a huge saving on register file accesses (no need to calculate Indi
 then use Permutation instructions), instruction count (Matrix Multiply up to 127 FMACs
 is 3 instructions), and programmer sanity.
 
+**Basic principle**
+
+The following illustrates why REMAP was added.
+
+* normal vector element read/write of operands would be sequential
+  (0 1 2 3 ...)
+* this is not appropriate for (e.g.) Matrix Multiply, which requires
+  accessing elements in alternative sequences (0 3 6 1 4 7 ...)
+* normal Vector ISAs use either Indexed-MV or Indexed-LD/ST to "cope"
+  with this.  Both are expensive (copying large vectors, or spilling through
+  memory), and very few Packed SIMD ISAs cope with non-power-of-two sizes
+  (inline loop-unrolling with duplicated data is the costly solution)
+* REMAP instead **redefines** the order of access according to predefined
+  (deterministic) "Schedules".
+* Matrix Schedules are not at all restricted to power-of-two boundaries,
+  making it unnecessary to have, for example, the specialised 3x4
+  transpose instructions of other Vector ISAs.
+* DCT and FFT REMAP are limited to RADIX-2, but existing Packed/Predicated
+  SIMD ISAs share that limitation anyway (and Bluestein Convolution is
+  typically deployed to overcome it).
+
+Only the most commonly-used algorithms in computer science have REMAP
+support, due to the high cost both in the ISA and in hardware.  For
+arbitrary remapping, the `Indexed` REMAP may be used.
+
 # REMAP types
 
 This section summarises the motivation for each REMAP Schedule
-- 
2.30.2