From: lkcl <lkcl@web>
Date: Fri, 6 May 2022 10:29:07 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2391
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=03f31857e0c45dde7307106697477094d6bc66c5;p=libreriscv.git

---

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index cdc340dfc..f6ab38263 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -373,7 +373,7 @@ Which brings us to the next important question: how is any of these
 CPU-centric Vector-centric improvements relevant to power efficiency
 and making more effective use of resources?
 
-# Simpler more compact programs
+# Simpler, more compact programs save power
 
 The first and most obvious saving is that, just as with any Vector
 ISA, the amount of data processing requested
@@ -393,7 +393,8 @@ to have the Packed SIMD setup and teardown. `strncpy` for VSX is an
 astounding 240 hand-coded assembler instructions where it is around
 12 to 14 for both RVV and SVP64. Worst case (full algorithm unrolling
 for Massive FFTs) the L1 I-Cache becomes completely ineffective, and in
-the case of the IBM POWER9 a little-known design flaw this results in
+the case of the IBM POWER9, which has a little-known design flaw not
+normally encountered, this results in
 contention between the L1 D and I Caches at the L2 Bus, slowing down
 execution even further.  Power ISA 3.1 MMA (Matrix-Multiply-Assist)
 requires loop-unrolling to contend with non-power-of-two Matrix
@@ -426,3 +427,15 @@ for example). Bear in mind that
 the submission process will be
 entirely at the discretion of the OpenPOWER Foundation ISA WG,
 something that is both encouraged and welcomed by the OPF.
+
+One of SVP64's current limitations is that it was initially designed
+for 3D and Video workloads as a hybrid GPU-VPU-CPU. This resulted in
+a heavy focus on adding hardware-for-loops onto the *Registers*.
+After more than three years of development the realisation hit that
+the SVP64 concept could be expanded to Coherent Distributed Memory.
+This astoundingly powerful concept is explored in the next section.
+
+# Coherent Deterministic Hybrid Distributed Memory-Processing
+
+It is not often that a heading for an article can legitimately
+contain quite so many buzzwords, but in this section it is justified.