(no commit message)

author lkcl <lkcl@web>

Fri, 6 May 2022 20:20:18 +0000 (21:20 +0100)

committer IkiWiki <ikiwiki.info>

Fri, 6 May 2022 20:20:18 +0000 (21:20 +0100)
diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn

index 3fa0eecc851005426368db23314f41237ce470dd..b69170ea665d3e06b50afc632e1d5ebd135e2404 100644 (file)
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -454,18 +454,21 @@ It should therefore come as no surprise that attempts are being made
  to move (distribute) processing closer to the DRAM Memory, firmly
  on the *opposite* side of the main CPU's L1/2/3/4 Caches.  However
  the alarm bells ring here at the keyword "distributed", because by
-moving the processing down next to the Memory, the speed of any
-of the parallel Processing Elements (PEs) has dropped
-by almost two orders of magnitude (5 ghz down to 100 mhz),
+moving the processing down next to the Memory, even onto
+the same die as the DRAM, the speed of any
+of the parallel Processing Elements (PEs) would likely drop
+by almost two orders of magnitude (5 ghz down to 150 mhz),
  the simplicity of each PE has, for pure pragmatic reasons,
  to drop by several
  orders of magnitude as well.
  Things that the average "sequential algorithm"
  programmer
  takes for granted such as SMP, Cache Coherency, Virtual Memory,
-spinlocks (atomic locking), all of these are either outright gone
+spinlocks (atomic locking, mutexes), all of these are either outright gone
  or expected that the programmer shall explicitly contend with
-(even if that programmer is the Compiler Developer).
+(even if that programmer is the Compiler Developer). There's definitely
+not going to be a standard OS: the PEs will be too basic, too
+resource-constrained, and definitely too busy.
  
  To give an extreme example: Aspex's Array-String Processor, which
  was 4096 2-bit SIMD PEs each with 256 bytes of Content Addressable
@@ -474,9 +477,9 @@ performance over Scalar CPUs such as the Pentium III of its era,
  all on a 3 watt budget at only 250 mhz in 130 nm.  Yet to take
  proper advantage of its capability required an astounding 5-10
  *days* per line of assembly code because multiple versions of
-an algorithm had to be hand-crafted then compared, and
-the best one selected, all others discarded. 20 lines of optimised
-Assembler taking six months to write can in no way be termed
+an algorithm had to be hand-crafted then compared, and only
+the best one selected: all others discarded. 20 lines of optimised
+Assembler taking three to six months to write can in no way be termed
  "productive", yet this extreme level of unproductivity is an inherent
  side-effect of going down the parallel-processing rabbithole.