parallelising of algorithms is simply too difficult to master and always
has been. Thus whilst DRAM has to go parallel (like RAID Striping) to
keep up, CPUs are now at 8-way Multi-Issue 5 ghz clock rates and
-are at an astonishing four levels of cache (L1 to L4). The amount
-of wiring inside such CPUs is now measured in miles.
+are at an astonishing four levels of cache (L1 to L4).
+
+It should therefore come as no surprise that attempts are being made
+to move (distribute) processing closer to the DRAM Memory, firmly
+on the *opposite* side of the main CPU's L1/2/3/4 Caches. However,
+the keyword "distributed" should ring alarm bells here. By moving
+the processing down next to the Memory, not only has the speed of
+each parallel Processing Element dropped by almost two orders of
+magnitude, but the simplicity of programming them also has, for
+purely pragmatic reasons, to drop by several orders of magnitude.
+Things that the average "sequential algorithm" programmer takes for
+granted, such as SMP, Cache Coherency, Virtual Memory and spinlocks
+(atomic locking), are either gone outright or must be explicitly
+contended with by the programmer (even if that programmer is the
+Compiler Developer).
+