From: lkcl <lkcl@web>
Date: Fri, 6 May 2022 09:13:30 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2395
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=381f4f3972da8aafabef00c7b52eefc1c9e8a92f;p=libreriscv.git

---

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index ec20d02f9..2b47344fc 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -223,7 +223,7 @@ Packed SIMD explicitly smashes that width right in the face of the
 programmer and expects them to like it.  As the article immediately
 demonstrates, an arbitrary-sized data set has to contend with
 an insane power-of-two Packed SIMD cascade at both setup and teardown
-that can add literally an order
+that routinely adds literally an order
 of magnitude increase in the number of hand-written lines of assembler
 compared to a well-designed Cray-style Vector ISA with a `setvl`
 instruction.
@@ -393,7 +393,9 @@ astounding 240 hand-coded assembler instructions where it is around
 for Massive FFTs) the L1 I-Cache becomes completely ineffective, and in
 the case of the IBM POWER9 a little-known design flaw this results in
 contention between the L1 D and I Caches at the L2 Bus, slowing down
-execution even further.
+execution even further.  Power ISA 3.1 MMA (Matrix-Multiply-Assist)
+requires loop-unrolling to contend with non-power-of-two Matrix
+sizes: SVP64 does not, as hinted at below.
 
 Additional savings come in the form of `SVREMAP`. This is a hardware
 index transformation system where the normally sequentially-linear
@@ -403,7 +405,7 @@ DCT, or FFT.  A full in-register-file 5x7 Matrix Multiply or a 3x4 or
 2x6 may be performed in as little as 4 instructions, one of which
 is to zero-initialise the accumulator Vector used to store the result.
 If addition to another Matrix is also required then it is only three
-instructions.  Not only that, but because the "Schedule" is an abstract
+instructions. Not only that, but because the "Schedule" is an abstract
 concept separated from the mathematical operation, there is no reason
 why Matrix Multiplication Schedules may not be applied to Integer
 Mul-and-Accumulate, Galois Field Mul-and-Accumulate, or Logical