From 22a63f683c85612f07f38f339666c0428274e106 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 19 May 2023 17:53:08 +0100
Subject: [PATCH]

---
 openpower/sv/rfc/ls009.mdwn | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/openpower/sv/rfc/ls009.mdwn b/openpower/sv/rfc/ls009.mdwn
index 8559f2b11..60dd70c9f 100644
--- a/openpower/sv/rfc/ls009.mdwn
+++ b/openpower/sv/rfc/ls009.mdwn
@@ -123,6 +123,31 @@ The result is a huge saving on register file accesses (no need to calculate Indi
 then use Permutation instructions), instruction count (Matrix Multiply up to 127 FMACs
 is 3 instructions), and programmer sanity.
 
+**Basic principle**
+
+The following illustrates why REMAP was added.
+
+* normal vector element read/write of operands would be sequential
+  (0 1 2 3 ....)
+* this is not appropriate for (e.g.) Matrix multiply which requires
+  accessing elements in alternative sequences (0 3 6 1 4 7 ...)
+* normal Vector ISAs use either Indexed-MV or Indexed-LD/ST to "cope"
+  with this. both are expensive (copy large vectors, spill through memory)
+  and very few Packed SIMD ISAs cope with non-Power-2
+  (Duplicate-data inline-loop-unrolling is the costly solution)
+* REMAP **redefines** the order of access according to set
+  (Deterministic) "Schedules".
+* Matrix Schedules are not at all restricted to power-of-two boundaries
+  making it unnecessary to have for example specialised 3x4 transpose
+  instructions of other Vector ISAs.
+* DCT and FFT REMAP are RADIX-2 limited but this is the case in existing Packed/Predicated
+  SIMD ISAs anyway (and Bluestein Convolution is typically deployed to
+  solve that).
+
+Only the most commonly-used algorithms in computer science have REMAP
+support, due to the high cost in both the ISA and in hardware. For
+arbitrary remapping the `Indexed` REMAP may be used.
+
 # REMAP types
 
 This section summarises the motivation for each REMAP Schedule
-- 
2.30.2
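
Note on the "Basic principle" text added by this patch: the two access orders it contrasts, sequential (0 1 2 3 ....) and remapped (0 3 6 1 4 7 ...), can be illustrated with a small sketch. This is only a software model under stated assumptions, not the REMAP hardware algorithm or any Libre-SOC simulator code. The names `sequential_schedule`, `transposed_schedule` and `apply_schedule` are hypothetical, and the matrix is assumed to be stored row-major.

```python
# Hypothetical illustration only: models the *order* of element access,
# not the actual REMAP Schedules defined by the specification.

def sequential_schedule(vl):
    """Default vector element order: 0, 1, 2, ..., vl-1."""
    return list(range(vl))


def transposed_schedule(rows, cols):
    """Offsets that walk a row-major rows x cols matrix column by column.

    No power-of-two restriction is assumed on rows or cols.
    """
    return [r * cols + c for c in range(cols) for r in range(rows)]


def apply_schedule(elements, schedule):
    """Read elements in the order given by schedule, without moving data."""
    return [elements[i] for i in schedule]


if __name__ == "__main__":
    print(sequential_schedule(9))       # normal element order
    print(transposed_schedule(3, 3))    # begins 0 3 6 1 4 7
    print(transposed_schedule(3, 4))    # non-power-of-two shape
    print(apply_schedule(list("abcdef"), transposed_schedule(2, 3)))
```

The point of the sketch is that only the index sequence changes: the data itself is never copied or permuted, which is the saving the patch text attributes to REMAP compared with Indexed-MV or Indexed-LD/ST.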