From: lkcl <lkcl@web>
Date: Thu, 7 Jul 2022 09:11:04 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~1305
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=56eb674b93719bc5aa435f3311eefe7489b01ae1;p=libreriscv.git

---

diff --git a/openpower/sv/remap.mdwn b/openpower/sv/remap.mdwn
index bc67a847c..57f13f556 100644
--- a/openpower/sv/remap.mdwn
+++ b/openpower/sv/remap.mdwn
@@ -112,7 +112,23 @@ and briefly goes over their characteristics and limitations.
 
 ## Matrix (1D/2D/3D shaping)
 
-TODO
+Matrix Multiplication is a huge part of High-Performance Compute
+and of 3D graphics.
+In many PackedSIMD as well as Scalable Vector ISAs, non-power-of-two
+Matrix sizes are a serious challenge. PackedSIMD ISAs, in order to
+cope with, for example, 3x4 Matrices, recommend rolling data-repetition and loop-unrolling.
+Aside from the load this places on the L1 I-Cache, the trick only
+works if one of the dimensions X or Y is a power of two. Prime-numbered
+dimensions (5x7, 3x5) become deeply problematic to unroll.
+
+Even traditional Scalable Vector ISAs have issues with Matrices, often
+having to perform data Transpose by pushing out through Memory and back,
+or computing Transposition Indices (costly) then copying to another
+Vector (costly).
+
+Matrix REMAP was thus designed to solve these issues by providing
+"Schedules" that allow what would otherwise be a strictly linear
+Vector to be viewed in-place as 2D (even 3D), with reordered element access.
 
 ## FFT/DCT Triple Loop