From ddd5080ed9dd6c9d7210ed8d41cf081aeeb535e6 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 6 May 2022 18:00:31 +0100
Subject: [PATCH]

---
 openpower/sv/SimpleV_rationale.mdwn | 37 +++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index b8e74d83e..a387fdf55 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -628,7 +628,8 @@ the concept introduced by Extra-V: where Extra-V brought information
 about Sparse-Distributed Data to the attention of the main CPU in a
 coherent fashion *without the CPU having to ask for it*, Snitch
 demonstrates a classic LOAD-COMPUTE-STORE cycle in the same
-distributed coherent manner.
+distributed coherent manner, and does so with dramatically-reduced
+power consumption.
 
 **Bringing it all together**
 
@@ -636,4 +637,36 @@ At this point we are well into a future revision of SVP64, one that
 clearly has some startlingly powerful potential: Supercomputing-class
 Multi-Issue Vector Engines kept 100% occupied in a 100% long-term
 sustained fashion with reduced complexity, reduced power consumption
-and reduced completion time, thanks to Deterministic Coherent
+and reduced completion time, thanks to Deterministic Coherent Scheduling
+of the data fed in and out, or even moved down next to Memory.
+
+This last part is where it normally gets hair-raising, but as ZOLC shows
+there is no reason at all why even complex algorithms such as MPEG cannot
+be run in a partially-deterministic manner, and anything that is
+deterministic can be Scheduled, coherently. Combine that with OpenCAPI
+which solves the many issues associated with SMP Virtual Memory and so on
+yet still allows Cache-Coherent Distributed Memory Access, and what was
+previously an intractable Computer Science problem for decades begins to
+look like there is a potential solution.
+
+It should even be possible to identify which Deterministic Schedules
+created by ZOLC are suitable for full off-CPU distributed processing, as
+long as OpenCAPI is integrated into the mix. What a compiler - or even the
+hardware - will be looking out for is a Basic Block of instructions that:
+
+* begins with a LOAD (to be handled by OpenCAPI)
+* contains some instructions that a given PE is capable of executing
+* ends with a STORE (again: OpenCAPI)
+
+For best results the Basic Block would be wrapped in a Zero-Overhead Loop,
+where the Compiler (or hardware at runtime) could easily identify, in
+advance, the full range of Memory Addresses that the Loop is to encounter.
+Copies of loop-invariant data would need to be passed down to the remote PE:
+again, for simple-enough Basic Blocks, with assistance from the Compiler,
+loop-invariant inputs are easily identified.
+
+The importance of OpenCAPI in this mix cannot be overstated, because
+it will be the means by which the main CPU coordinates its activities
+with the remote PEs, ensuring that LOAD/STORE Memory Hazards are not
+violated.
+
-- 
2.30.2
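The Basic Block criteria listed in the added text can be sketched as a simple shape check. This is a minimal illustration only, not part of the patch: the function name `is_offload_candidate`, the string mnemonics, and the `pe_ops` set are all hypothetical, and the sketch assumes that "contains some instructions that a given PE is capable of executing" means every instruction between the LOAD and the STORE must be executable by the PE.

```python
from typing import Sequence, Set


def is_offload_candidate(block: Sequence[str], pe_ops: Set[str]) -> bool:
    """Return True if `block` has the LOAD / compute / STORE shape.

    `block` is a list of instruction mnemonics (hypothetical encoding).
    `pe_ops` is the set of mnemonics the remote PE can execute (assumed).
    """
    # Need at least LOAD, one compute instruction, and STORE.
    if len(block) < 3:
        return False
    first, *middle, last = block
    return (
        first == "LOAD"                          # begins with a LOAD
        and last == "STORE"                      # ends with a STORE
        and all(op in pe_ops for op in middle)   # PE can run the body
    )


if __name__ == "__main__":
    pe = {"FMUL", "FADD"}
    print(is_offload_candidate(["LOAD", "FMUL", "FADD", "STORE"], pe))
    print(is_offload_candidate(["LOAD", "FDIV", "STORE"], pe))
```

A real compiler pass or hardware scheduler would also have to track the Memory Address range of the enclosing Zero-Overhead Loop and the loop-invariant inputs, which this sketch deliberately leaves out.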