(no commit message)

author lkcl <lkcl@web>

Fri, 6 May 2022 12:08:03 +0000 (13:08 +0100)

committer IkiWiki <ikiwiki.info>

Fri, 6 May 2022 12:08:03 +0000 (13:08 +0100)
diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn

index 32a3504fc9218b017b26a39da13e6f1955dc8e0a..c8101fdef42abdf32fa62354d8755d4c561322b1 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -500,6 +500,9 @@ concept needs no branches, no complex Register Hazard
  Management because it is down to the programmer (or, the compiler),
  to ensure data overlaps do not occur.
  
+The key aspect of these
+very simplistic countdown loops is: *they are deterministic*.
+
  Zero-Overhead Loop Control takes this basic "single loop" concept
  way further: not only are nested loops and conditional exit included,
  but also arbitrary control-jumping from the current inner loop
@@ -508,14 +511,26 @@ dynamically at runtime.
  
  Even when deployed on as basic a CPU as a single-issue in-order RISC
  core, the performance and power-savings were astonishing: between 20
-and **80** reduction in algorithm completion times were achieved compared
+and **80%** reductions in algorithm completion times were achieved compared
  to a more traditional branch-speculative in-order RISC CPU.  MPEG
  Decode, the target algorithm specifically picked by the researcher
  due to its high complexity with 6-deep nested loops and conditional
  execution that frequently jumped in and out of at least 2 loops,
  came out with an astonishing 43% improvement in completion time. 43%
  fewer instructions executed is an almost unheard-of level of optimisation:
-most ISA designers are elated if they can achieve 5 to 10%.
+most ISA designers are elated if they can achieve 5 to 10%. The reduction
+was so compelling that STMicroelectronics put it into commercial
+production in one of their embedded CPUs.
+
+The kicker: when implementing SVP64's Matrix REMAP Schedule, the VLSI
+design of its triple-nested for-loop system
+turned out to be remarkably similar to the
+core nested for-loop engine of ZOLC. In hindsight this should not
+have come as a surprise, because both are basically nested for-loops.
+
+The important insight is, however, that if ZOLC can be general-purpose
+and apply deterministic nested loop schedules to more than just registers
+(unlike SVP64 in its current incarnation), then so can SVP64.
  
  **OpenCAPI and Extra-V**
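The argument above rests on ZOLC and the Matrix REMAP Schedule both being nested for-loops whose bounds are fixed before the loop starts. A small software model may make that concrete. The Python sketch below is a hypothetical illustration, not the SVP64 REMAP specification or ZOLC's hardware design. `matrix_schedule`, `offsets` and `matmul_via_schedule` are names invented here, and a row-major matrix multiply (C = A × B) is assumed as the workload. Three counters step like an odometer, so the complete sequence of (i, j, k) indices is known before the first step.

```python
from typing import Iterable, Iterator, List, Tuple

Step = Tuple[int, int, int]


def matrix_schedule(rows: int, cols: int, inner: int) -> Iterator[Step]:
    """Yield (i, j, k) for a triple-nested loop, with k stepping fastest.

    Hypothetical model: three counters wrap and carry like an odometer.
    All bounds are fixed before the first step, so the sequence (and its
    length, rows * cols * inner) is fully deterministic.
    """
    i = j = k = 0
    for _ in range(rows * cols * inner):
        yield (i, j, k)
        k += 1
        if k == inner:      # innermost counter wraps, carry into j
            k = 0
            j += 1
            if j == cols:   # middle counter wraps, carry into i
                j = 0
                i += 1


def offsets(schedule: Iterable[Step], a_base: int, b_base: int, c_base: int,
            cols: int, inner: int, scale: int = 1) -> Iterator[Step]:
    """Map each (i, j, k) step to element positions in A, B and C for
    C[i][j] += A[i][k] * B[k][j], all stored row-major.

    scale=1 treats the results as register numbers; scale=8 (for example)
    treats them as byte addresses of 8-byte elements. The schedule is the same.
    """
    for i, j, k in schedule:
        yield (a_base + (i * inner + k) * scale,   # A is rows x inner
               b_base + (k * cols + j) * scale,    # B is inner x cols
               c_base + (i * cols + j) * scale)    # C is rows x cols


def matmul_via_schedule(a: List[int], b: List[int],
                        rows: int, cols: int, inner: int) -> List[int]:
    """Row-major matrix multiply driven purely by the precomputed schedule."""
    c = [0] * (rows * cols)
    sched = matrix_schedule(rows, cols, inner)
    for ai, bi, ci in offsets(sched, 0, 0, 0, cols, inner):
        c[ci] += a[ai] * b[bi]
    return c


if __name__ == "__main__":
    # A 2x3 matrix times a 3x2 matrix gives a 2x2 result
    print(matmul_via_schedule([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], 2, 2, 3))
    # prints [58, 64, 139, 154]
```

Changing `scale=1` to a byte-sized `scale` retargets the identical schedule from register numbers to memory addresses. In this model, nothing in the loop engine itself is specific to registers, which is one way to read the closing point about ZOLC and SVP64.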