From 56e9f5ca0e045097a93de5c36eee42cef7bb0d1d Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Fri, 6 May 2022 14:56:16 +0100
Subject: [PATCH]

---
 openpower/sv/SimpleV_rationale.mdwn | 37 ++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index ef3fac685..537543e5b 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -563,8 +563,8 @@ has nested conditional for-loops Extra-V appears to have just the
 one conditional for-loop, but the key strategically-crucial
 part of this multi-faceted puzzle is that due to the deterministic and
 coherent nature of Extra-V, the processing of the loops, which
-requires a tiny processor, is not
-done close to the CPU at all: it is
+requires a tiny non-Turing-Complete processor, is not
+done close to or by the main CPU at all: it is
 *embedded right next to the memory*.
 
 The similarity to the D-Matrix Systolic Array Processing, Aspex Microelectronics
@@ -572,7 +572,7 @@ Array-String Processing, and Elixent 2D Array Processing, should
 also not have gone unnoticed.  All of these solutions utilised
 or utilise
 a more comprehensive Turing-complete von-Neumann "Management Core"
-to coordinate data passed in and out of PEs: none of them had or
+to coordinate data passed in and out of PEs: none of them have or
 had something
 as powerful as OpenCAPI as part of that picture.
 
@@ -580,30 +580,39 @@ as powerful as OpenCAPI as part of that picture.
 
 Snitch is an elegant Memory-Coherent Barrel-Processor where registers
 become "tagged" with a Memory-access Mode that went out of fashion
-over forty years ago: Load-and-Increment. Expressed in c as
-`src = *x++`, and requiring special Address Registers (PDP-11, 68000)
-the efficiency of these Load-Store-with-Increment instructions has been
+over forty years ago: Load-then-Auto-Increment. Expressed in C as
+`src = *x++`, and requiring special Address Registers (PDP-11, 68000),
+thanks to the RISC paradigm having gone too far,
+the efficiency and effectiveness
+of these Load-Store-with-Increment instructions have been
 forgotten until Snitch.
 
 What the designers did however was not to add new Load-Store
 or Arithmetic instructions to RISC-V, but instead to "mark"
-registers with a tag.  These tags tell the CPU: when you perform
-an add on r6 and r7, please perform a Cache-coherent Load-with-Increment
-on each, using special Address Registers for each.  Each reference
-to r6 therefore brings in an entirely new value *directly from
+registers with a tag.  These tags tell the CPU: when you are asked to
+carry out
+an add instruction on r6 and r7, do not take r6 or r7 from the register
+file: instead, please perform a Cache-coherent Load-with-Increment
+on each, using special Address Registers for each.  Each new use
+of r6 therefore brings in an entirely new value *directly from
 memory*. Likewise on the second operand, r7, and likewise on
-the destination which can be automatic Store-and-increment.
+the destination result, which can be an automatic Coherent
+Store-and-increment
+directly into Memory.
 
 On top of a barrel-architecture the slowness of Memory access
 was not a problem because the Deterministic nature of classic
 Load-Store-Increment can be compensated for by having 8 Memory
 accesses scheduled underway and interleaved in a time-sliced
 fashion with an FPU that is correspondingly 8 times faster than
-Memory accesses.
+the Coherent Memory accesses.
 
 This design is almost identical to the early Vector Processors
-of the late 1950s and early 1960s. The barrel-archutecture neatly
+of the late 1950s and early 1960s, which also critically relied
+on implicit auto-increment addressing. The barrel-architecture neatly
 solves one of the inherent problems with those designs (memory
 speed) and the presence of a full register file caters for a
 second limitation of pure Memory-based Vector Processors: temporary
-variables needed in the computation of 
+variables needed in the computation of intermediate results put
+an awfully high artificial load on Memory bandwidth.
+
-- 
2.30.2