(no commit message)

author lkcl <lkcl@web>

Fri, 6 May 2022 10:29:07 +0000 (11:29 +0100)

committer IkiWiki <ikiwiki.info>

Fri, 6 May 2022 10:29:07 +0000 (11:29 +0100)
diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index cdc340dfcb31731abf2323ed1f87c6c2be7c564f..f6ab3826399d81a381db05b632a598f2de213d3a 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -373,7 +373,7 @@ Which brings us to the next important question: how is any of these
  CPU-centric Vector-centric improvements relevant to power efficiency
  and making more effective use of resources?
  
-# Simpler more compact programs
+# Simpler, more compact programs save power
  
  The first and most obvious saving is that, just as with any Vector
  ISA, the amount of data processing requested
@@ -393,7 +393,8 @@ to have the Packed SIMD setup and teardown. `strncpy` for VSX is an
  astounding 240 hand-coded assembler instructions where it is around
  12 to 14 for both RVV and SVP64. Worst case (full algorithm unrolling
  for Massive FFTs) the L1 I-Cache becomes completely ineffective, and in
-the case of the IBM POWER9 a little-known design flaw this results in
+the case of the IBM POWER9, which has a little-known design flaw not
+normally otherwise encountered, this results in
  contention between the L1 D and I Caches at the L2 Bus, slowing down
  execution even further.  Power ISA 3.1 MMA (Matrix-Multiply-Assist)
  requires loop-unrolling to contend with non-power-of-two Matrix
@@ -426,3 +427,15 @@ for example). Bear in mind that
  the submission process will be
  entirely at the discretion of the OpenPOWER Foundation ISA WG,
  something that is both encouraged and welcomed by the OPF.
+
+One of SVP64's current limitations is that it was initially designed
+for 3D and Video workloads as a hybrid GPU-VPU-CPU. This resulted in
+a heavy focus on adding hardware-for-loops onto the *Registers*.
+After more than three years of development the realisation hit that
+the SVP64 concept could be expanded to Coherent Distributed Memory.
+This astoundingly powerful concept is explored in the next section.
+
+# Coherent Deterministic Hybrid Distributed Memory-Processing
+
+It is not often that a heading for an article can legitimately
+contain quite so many buzzwords, but in this section it is justified.