From f3ba1a1ebfce7768e4d9de8a465b306a346d19d6 Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Tue, 14 Jun 2022 13:10:58 +0100
Subject: [PATCH] add matrix multiply image into whitepaper

---
 openpower/sv/SimpleV_rationale.mdwn | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)
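
Note: the Schedule described in the text added by this patch can be illustrated with a short Python sketch. It is NOT the SVP64 REMAP algorithm. The loop order, the flat row-major "register file" layout and every function name (`matrix_schedule`, `run_schedule`, `naive_matmul`, `min_reuse_distance`) are hypothetical. The sketch turns C = A x B into a single stream of Multiply-and-Accumulates, with no loop-unrolling and no padding of non-power-of-two dimensions. It also places the inner (dot-product) dimension outermost, so consecutive MACs write different accumulators. That is one way, among others, to make such a stream pipeline-friendly.

```python
# Hypothetical sketch, not the SVP64 REMAP specification: flatten
# C = A x B (A is x-by-y, B is y-by-z, all row-major in flat lists)
# into one stream of Multiply-and-Accumulate operations.

def matrix_schedule(x, y, z):
    """Return (c_idx, a_idx, b_idx) offsets, one tuple per MAC.

    The dot-product index k is the OUTER loop (an assumption), so
    back-to-back MACs target different accumulators in C.
    """
    schedule = []
    for k in range(y):              # inner (dot-product) dimension
        for i in range(x):          # row of A and of C
            for j in range(z):      # column of B and of C
                schedule.append((i * z + j, i * y + k, k * z + j))
    return schedule


def run_schedule(schedule, a, b, size_c):
    """Execute the stream: c[ci] += a[ai] * b[bi] for each entry."""
    c = [0] * size_c
    for ci, ai, bi in schedule:
        c[ci] += a[ai] * b[bi]
    return c


def naive_matmul(a, b, x, y, z):
    """Reference triple loop, used only to check the schedule."""
    return [sum(a[i * y + k] * b[k * z + j] for k in range(y))
            for i in range(x) for j in range(z)]


def min_reuse_distance(schedule):
    """Smallest gap (in MACs) between two writes to the same c_idx."""
    last, best = {}, None
    for step, (ci, _, _) in enumerate(schedule):
        if ci in last:
            gap = step - last[ci]
            best = gap if best is None else min(best, gap)
        last[ci] = step
    return best


if __name__ == "__main__":
    # 3x2 times 2x5: no dimension is a power of two, and no padding is needed.
    x, y, z = 3, 2, 5
    a = list(range(1, x * y + 1))
    b = list(range(1, y * z + 1))
    sched = matrix_schedule(x, y, z)
    print(len(sched))                 # 30 MACs (3 * 5 * 2)
    print(min_reuse_distance(sched))  # 15: each accumulator reused every x*z MACs
    assert run_schedule(sched, a, b, x * z) == naive_matmul(a, b, x, y, z)
```

With k innermost, consecutive MACs would instead form a dependency chain on one accumulator. That is the situation loop-unrolling is typically used to avoid on conventional processors.
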
diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index d16c79a2b..ff0bb23f4 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -915,6 +915,23 @@ Vectorisation Modes in SVP64:
   moving to next **element**.  Currently managed by `svstep`,
   ZOLC may be deployed to manage the stepping, in a Deterministic manner.
 
+Second:
+SVP64 Draft Matrix Multiply is currently set up to arrange a Schedule
+of Multiply-and-Accumulates, suitable for pipelining, that will,
+ultimately, result in a Matrix Multiply. Normal processors are forced
+to perform "loop-unrolling" in order to achieve this same Schedule.
+SIMD processors are additionally forced to pre-arrange rotated copies
+of data if the Matrices are not exactly on a power-of-two boundary.
+
+The current limitation of SVP64, however (at least when Horizontal-First
+is deployed, which requires the fewest instructions), is that both the
+source and destination Matrices have to be held in registers, in
+full.  Vertical-First may be used to perform LD/ST within the loop,
+with the stepping managed by `svstep`, but this is still not ideal.
+This is where the Snitch and EXTRA-V concepts kick in.
+
+<img src="https://ftp.libre-soc.org/matrix_svremap.jpg" alt="SVP64 REMAP Matrix Multiply schedule" width="600" />
+
 Imagine a large Matrix scenario, with several values close to zero that
 could be skipped: no need to include zero-multiplications, but a
 traditional CPU in no way can help: only by loading the data through
-- 
2.30.2