From: lkcl <lkcl@web>
Date: Fri, 6 May 2022 17:00:31 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2365
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=ddd5080ed9dd6c9d7210ed8d41cf081aeeb535e6;p=libreriscv.git

---

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index b8e74d83e..a387fdf55 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -628,7 +628,8 @@ the concept introduced by Extra-V: where Extra-V brought information
 about Sparse-Distributed Data to the attention of the main CPU in
 a coherent fashion *without the CPU having to ask for it*, Snitch
 demonstrates a classic LOAD-COMPUTE-STORE cycle in the same
-distributed coherent manner.
+distributed coherent manner, and does so with dramatically reduced
+power consumption.
 
 **Bringing it all together**
 
@@ -636,4 +637,36 @@ At this point we are well into a future revision of SVP64, one that
 clearly has some startlingly powerful potential: Supercomputing-class
 Multi-Issue Vector Engines kept 100% occupied in a 100% long-term
 sustained fashion with reduced complexity, reduced power consumption
-and reduced completion time, thanks to Deterministic Coherent
+and reduced completion time, thanks to Deterministic Coherent Scheduling
+of the data fed in and out, or even moved down next to Memory.
+
+This last part is where it normally gets hair-raising, but as ZOLC shows
+there is no reason at all why even complex algorithms such as MPEG cannot
+be run in a partially-deterministic manner, and anything that is
+deterministic can be Scheduled, coherently.  Combine that with OpenCAPI,
+which solves the many issues associated with SMP Virtual Memory and so on
+yet still allows Cache-Coherent Distributed Memory Access, and what has
+for decades been an intractable Computer Science problem begins to
+look like it has a potential solution.
+
+It should even be possible to identify whether the Deterministic Schedules
+created by ZOLC are suitable for full off-CPU distributed processing, as long
+as OpenCAPI is integrated into the mix.  What a compiler - or even the
+hardware - will be looking out for is a Basic Block of instructions that:
+
+* begins with a LOAD (to be handled by OpenCAPI)
+* contains some instructions that a given PE is capable of executing
+* ends with a STORE (again: OpenCAPI)
+
+For best results that would be wrapped with a Zero-Overhead Loop, where
+the Compiler (or hardware at runtime) could easily identify, in advance,
+the full range of Memory Addresses that the Loop is to encounter.  Copies
+of loop-invariant data would need to be passed down to the remote PE:
+again, for simple-enough Basic Blocks, with assistance from the Compiler,
+loop-invariant inputs are easily identified.
+
+The importance of OpenCAPI in this mix cannot be overstated, because
+it will be the means by which the main CPU coordinates its activities
+with the remote PEs, ensuring that LOAD/STORE Memory Hazards are
+respected.
+