From: lkcl <lkcl@web>
Date: Fri, 6 May 2022 08:40:39 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2403
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=a1a1590c7bb682b7a3619d5e4e539e138feca36a;p=libreriscv.git

---

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index 8845d0505..551b719db 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -389,4 +389,17 @@ execution even further.
 Additional savings come in the form of `SVREMAP`. This is a hardware
 index transformation system where the normally sequentially-linear
 element access may be "Re-Mapped" to a limited but algorithmic-tailored
-deterministic schedule, for example Matrix Multiply, DCT, or FFT.
+and commonly-used deterministic schedule, for example Matrix Multiply,
+DCT, or FFT.  A full in-register-file 5x7, 3x4 or 2x6 Matrix Multiply
+may be performed in as few as 4 instructions, one of which
+is to zero-initialise the accumulator Vector used to store the result.
+If addition to another Matrix is also required then it is only three
+instructions.  Not only that, but because the "Schedule" is an abstract
+concept separated from the mathematical operation, there is no reason
+why Matrix Multiplication Schedules may not be applied to Integer
+Mul-and-Accumulate, Galois Field Mul-and-Accumulate, or Logical
+AND-and-OR.  The flexibility is enormous, and the compactness
+unprecedented.  RADIX2 in-place DCT Triple-loop Schedules may be created in
+around 11 instructions. The only other processors well-known to have
+this type of compact capability are both VLIW DSPs: TI's TMS320 Series
+and Qualcomm's Hexagon.