(no commit message)

author lkcl <lkcl@web>

Fri, 6 May 2022 13:56:16 +0000 (14:56 +0100)

committer IkiWiki <ikiwiki.info>

Fri, 6 May 2022 13:56:16 +0000 (14:56 +0100)
diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn

index ef3fac685ea99ee67fd44e76acb99e4cf40d3c8e..537543e5b741c9504d9b3d4cf1bae3a2bef9fe60 100644 (file)
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -563,8 +563,8 @@ has nested conditional for-loops Extra-V appears to have just the
  one conditional for-loop, but the key strategically-crucial
  part of this multi-faceted puzzle is that due to the deterministic and
  coherent nature of Extra-V, the processing of the loops, which
-requires a tiny processor, is not
-done close to the CPU at all: it is
+requires a tiny non-Turing-Complete processor, is not
+done close to or by the main CPU at all: it is
  *embedded right next to the memory*.
  
  The similarity to the D-Matrix Systolic Array Processing, Aspex Microelectronics
@@ -572,7 +572,7 @@ Array-String Processing, and Elixent 2D Array Processing, should
  also not have gone unnoticed.  All of these solutions utilised
  or utilise
  a more comprehensive Turing-complete von-Neumann "Management Core"
-to coordinate data passed in and out of PEs: none of them had or
+to coordinate data passed in and out of PEs: none of them have or
  had something
  as powerful as OpenCAPI as part of that picture.
  
@@ -580,30 +580,39 @@ as powerful as OpenCAPI as part of that picture.
  
  Snitch is an elegant Memory-Coherent Barrel-Processor where registers
  become "tagged" with a Memory-access Mode that went out of fashion
-over forty years ago: Load-and-Increment. Expressed in c as
-`src = *x++`, and requiring special Address Registers (PDP-11, 68000)
-the efficiency of these Load-Store-with-Increment instructions has been
+over forty years ago: Load-then-Auto-Increment. Expressed in c as
+`src = *x++`, and requiring special Address Registers (PDP-11, 68000),
+thanks to the RISC paradigm having gone too far,
+the efficiency and effectiveness
+of these Load-Store-with-Increment instructions has been
  forgotten until Snitch.
  
  What the designers did however was not to add new Load-Store
  or Arithmetic instructions to RISC-V, but instead to "mark"
-registers with a tag.  These tags tell the CPU: when you perform
-an add on r6 and r7, please perform a Cache-coherent Load-with-Increment
-on each, using special Address Registers for each.  Each reference
-to r6 therefore brings in an entirely new value *directly from
+registers with a tag.  These tags tell the CPU: when you are asked to
+carry out
+an add instruction on r6 and r7, do not take r6 or r7 from the register
+file, instead please perform a Cache-coherent Load-with-Increment
+on each, using special Address Registers for each.  Each new use
+of r6 therefore brings in an entirely new value *directly from
  memory*. Likewise on the second operand, r7, and likewise on
-the destination which can be automatic Store-and-increment.
+the destination result which can be an automatic Coherent
+Store-and-increment
+directly into Memory.
  
  On top of a barrel-architecture the slowness of Memory access
  was not a problem because the Deterministic nature of classic
  Load-Store-Increment can be compensated for by having 8 Memory
  accesses scheduled underway and interleaved in a time-sliced
  fashion with an FPU that is correspondingly 8 times faster than
-Memory accesses.
+the Coherent Memory accesses.
  
  This design is almost identical to the early Vector Processors
-of the late 1950s and early 1960s. The barrel-archutecture neatly
+of the late 1950s and early 1960s, which also critically relied
+on implicit auto-increment addressing. The barrel-architecture neatly
  solves one of the inherent problems with those designs (memory
  speed) and the presence of a full register file caters for a
  second limitation of pure Memory-based Vector Processors: temporary
-variables needed in the computation of 
+variables needed in the computation of intermediate results put
+an awfully high artificial load on Memory bandwidth.
+
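
The register-tagging behaviour described in the added text can be sketched in C. This is a minimal software model under stated assumptions, not Snitch's actual hardware interface or programming model: the `stream_reg` type, its field names, the helper functions, and the choice of `r5` as the destination register are all hypothetical and exist only for illustration. It shows how a single unchanged scalar "add" turns into a memory-to-memory vector operation once its operands are tagged for Load-then-Auto-Increment (`src = *x++`) and its destination for Store-and-increment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one taggable register (names are assumptions). */
typedef struct {
    double *addr;   /* the special Address Register paired with this register */
    int     tagged; /* nonzero: accesses are redirected to memory             */
    double  value;  /* ordinary register-file contents, used when untagged    */
} stream_reg;

/* Read an operand. A tagged register performs Load-then-Auto-Increment,
   the C idiom `src = *x++`; an untagged one reads the register file. */
static double reg_read(stream_reg *r)
{
    return r->tagged ? *r->addr++ : r->value;
}

/* Write a result. A tagged register performs Store-and-increment
   directly into memory; an untagged one updates the register file. */
static void reg_write(stream_reg *r, double v)
{
    if (r->tagged)
        *r->addr++ = v;
    else
        r->value = v;
}

/* One scalar "add rd, ra, rb". The instruction itself is unchanged:
   only the tags decide where the operands come from and go to. */
static void add_insn(stream_reg *rd, stream_reg *ra, stream_reg *rb)
{
    double a = reg_read(ra);
    double b = reg_read(rb);
    reg_write(rd, a + b);
}

/* Issuing the same add n times walks all three arrays, because every
   new use of a tagged register brings in (or writes out) a fresh element. */
void stream_add(double *dst, double *a, double *b, size_t n)
{
    assert(n == 0 || (dst && a && b));
    stream_reg r5 = { dst, 1, 0.0 };  /* destination: Store-and-increment */
    stream_reg r6 = { a,   1, 0.0 };  /* operand 1: Load-with-Increment   */
    stream_reg r7 = { b,   1, 0.0 };  /* operand 2: Load-with-Increment   */
    for (size_t i = 0; i < n; i++)
        add_insn(&r5, &r6, &r7);
}
```

Calling `stream_add(d, a, b, 3)` on three-element arrays leaves the element-wise sums in `d`. The sketch models only the addressing semantics: the barrel-style interleaving of eight in-flight memory accesses and the cache coherence discussed in the text are not represented.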