From 56e9f5ca0e045097a93de5c36eee42cef7bb0d1d Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 6 May 2022 14:56:16 +0100
Subject: [PATCH]

---
 openpower/sv/SimpleV_rationale.mdwn | 37 ++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index ef3fac685..537543e5b 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -563,8 +563,8 @@ has nested conditional for-loops
 Extra-V appears to have just the one conditional for-loop, but the key
 strategically-crucial part of this multi-faceted puzzle is that due to the
 deterministic and coherent nature of Extra-V, the processing of the loops, which
-requires a tiny processor, is not
-done close to the CPU at all: it is
+requires a tiny non-Turing-Complete processor, is not
+done close to or by the main CPU at all: it is
 *embedded right next to the memory*.
 
 The similarity to the D-Matrix Systolic Array Processing, Aspex Microelectronics
@@ -572,7 +572,7 @@ Array-String Processing, and Elixent 2D Array Processing, should
 also not have gone unnoticed. All of these solutions utilised
 or utilise a more comprehensive Turing-complete von-Neumann
 "Management Core"
-to coordinate data passed in and out of PEs: none of them had or
+to coordinate data passed in and out of PEs: none of them have or
 had something
 as powerful as OpenCAPI as part of that picture.
 
@@ -580,30 +580,39 @@ as powerful as OpenCAPI as part of that picture.
 
 Snitch is an elegant Memory-Coherent Barrel-Processor where registers
 become "tagged" with a Memory-access Mode that went out of fashion
-over forty years ago: Load-and-Increment. Expressed in c as
-`src = *x++`, and requiring special Address Registers (PDP-11, 68000)
-the efficiency of these Load-Store-with-Increment instructions has been
+over forty years ago: Load-then-Auto-Increment. Expressed in c as
+`src = *x++`, and requiring special Address Registers (PDP-11, 68000),
+thanks to the RISC paradigm having gone too far,
+the efficiency and effectiveness
+of these Load-Store-with-Increment instructions has been
 forgotten until Snitch. What the designers did however was not
 to add new Load-Store or Arithmetic instructions to RISC-V, but
 instead to "mark"
-registers with a tag. These tags tell the CPU: when you perform
-an add on r6 and r7, please perform a Cache-coherent Load-with-Increment
-on each, using special Address Registers for each. Each reference
-to r6 therefore brings in an entirely new value *directly from
+registers with a tag. These tags tell the CPU: when you are asked to
+carry out
+an add instruction on r6 and r7, do not take r6 or r7 from the reguster
+file, instead please perform a Cache-coherent Load-with-Increment
+on each, using special Address Registers for each. Each new use
+of r6 therefore brings in an entirely new value *directly from
 memory*. Likewise on the second operand, r7, and likewise on
-the destination which can be automatic Store-and-increment.
+the destination result which can be an automatic Coherent
+Store-and-increment
+directly into Memory.
 
 On top of a barrel-architecture the slowness of
 Memory access was not a problem because the Deterministic nature
 of classic Load-Store-Increment can be compensated for by
 having 8 Memory accesses scheduled underway and interleaved
 in a time-sliced fashion with an FPU that is correspondingly
 8 times faster than
-Memory accesses.
+the Coherent Memory accesses.
 
 This design is almost identical to the early Vector Processors
-of the late 1950s and early 1960s. The barrel-archutecture neatly
+of the late 1950s and early 1960s, which also critically relied
+on implicit auto-increment addressing. The barrel-architecture neatly
 solves one of the inherent problems with those designs (memory speed)
 and the presence of a full register file caters for a second limitation
 of pure Memory-based Vector Processors: temporary
-variables needed in the computation of
+variables needed in the computation of intermediate results put
+an awfully high artificial load on Memory bandwidth.
+
-- 
2.30.2
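
The tagged-register behaviour that the patch describes for Snitch can be pictured with a small C sketch. It is only an illustrative software model, not Snitch's actual hardware or ISA: the names `stream_reg`, `reg_read`, `reg_write`, `add_insn` and `vector_add`, and the choice of r5 as the destination register, are assumptions made for this example. It shows an ordinary `add` turning into `*dst++ = *x++ + *y++` once its operands are tagged.

```c
#include <stddef.h>

/* Hypothetical model of one register plus its tag (all names are
 * assumptions for illustration; they do not come from Snitch). */
typedef struct {
    int     tagged; /* nonzero: bypass the register file, stream from memory */
    double  value;  /* ordinary register-file contents, used when untagged   */
    double *addr;   /* the special Address Register paired with this register */
} stream_reg;

/* Reading a tagged register performs Load-then-Auto-Increment,
 * i.e. the C idiom `src = *x++`. An untagged register reads normally. */
static double reg_read(stream_reg *r)
{
    return r->tagged ? *r->addr++ : r->value;
}

/* Writing a tagged register performs Store-and-increment into memory. */
static void reg_write(stream_reg *r, double v)
{
    if (r->tagged)
        *r->addr++ = v;
    else
        r->value = v;
}

/* One unmodified "add rd, ra, rb" instruction: the opcode does not
 * change, only the meaning of the register operands does. */
static void add_insn(stream_reg *rd, stream_reg *ra, stream_reg *rb)
{
    double a = reg_read(ra);
    double b = reg_read(rb);
    reg_write(rd, a + b);
}

/* Element-wise add of two arrays using nothing but repeated adds
 * on tagged registers (r6, r7 as sources; r5 is an assumed destination). */
static void vector_add(double *dst, double *x, double *y, size_t n)
{
    stream_reg r5 = { 1, 0.0, dst };
    stream_reg r6 = { 1, 0.0, x };
    stream_reg r7 = { 1, 0.0, y };

    for (size_t i = 0; i < n; i++)
        add_insn(&r5, &r6, &r7);
}
```

Calling `vector_add(d, x, y, 3)` on three-element arrays leaves each `d[i]` equal to `x[i] + y[i]`, with no explicit load or store in the loop body. The model ignores cache coherence and the 8-way barrel interleaving entirely; those are properties of the hardware that a sequential C sketch cannot capture.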