Management, because it is down to the programmer (or the compiler)
to ensure that data overlaps do not occur.
The key aspect of these
very simple countdown loops is: *they are deterministic*.

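
Here is a minimal Python sketch of what "deterministic" means for such a loop. It assumes a simple counter-driven loop, and the function names are hypothetical rather than taken from any ISA or library. The countdown's entire counter sequence follows from its starting count, so it is known before the first iteration. A data-dependent scan only discovers its trip count by executing.

```python
from typing import Iterator, List


def countdown_schedule(count: int) -> Iterator[int]:
    """Counter-style countdown: the whole sequence follows from `count` alone."""
    ctr = count
    while ctr > 0:
        yield ctr
        ctr -= 1


def scan_until_zero(data: List[int]) -> int:
    """Contrast: a data-dependent loop whose trip count is only known
    after reading the data it walks over."""
    trips = 0
    for value in data:
        if value == 0:
            break
        trips += 1
    return trips


# Known before the loop starts: 4 iterations, counter values 4, 3, 2, 1.
print(list(countdown_schedule(4)))       # [4, 3, 2, 1]
# Only known after executing: depends on where the first zero is.
print(scan_until_zero([7, 5, 0, 9]))     # 2
```

Because the countdown's schedule needs no run-time decisions, hardware can generate it without branch prediction.
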
Zero-Overhead Loop Control takes this basic "single loop" concept
much further: not only are nested loops and conditional exit included,
but also arbitrary control-jumping from the current inner loop
Even when deployed on as basic a CPU as a single-issue in-order RISC
core, the performance and power savings were astonishing: reductions of
between 20% and **80%** in algorithm completion time were achieved compared
to a more traditional branch-speculative in-order RISC CPU. MPEG
Decode, the target algorithm specifically picked by the researcher
due to its high complexity (6-deep nested loops, and conditional
execution that frequently jumped in and out of at least 2 loops),
came out with a 43% improvement in completion time. 43%
fewer instructions executed is an almost unheard-of level of optimisation:
most ISA designers are elated if they can achieve 5 to 10%. The reduction
was so compelling that STMicroelectronics put it into commercial
production in one of their embedded CPUs.
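
As a rough illustration of nested loops with conditional exit, the Python sketch below steps a stack of loops the way a hardware loop controller might, evaluating a condition before each innermost iteration. It is a hypothetical model: `Loop`, `nested_loop_engine` and `exit_inner` are invented names. ZOLC's arbitrary jumps between loops are not modelled, only an early exit from the innermost loop into the next iteration of the loop enclosing it.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple


@dataclass
class Loop:
    start: int
    end: int        # exclusive bound
    step: int = 1   # assumed positive


def nested_loop_engine(loops: List[Loop],
                       exit_inner: Callable[[Tuple[int, ...]], bool]
                       ) -> Iterator[Tuple[int, ...]]:
    """Step a stack of nested loops (outermost first) like a loop controller.

    Before each innermost iteration, `exit_inner(indices)` is evaluated; if it
    is true, the innermost loop is abandoned and control moves on to the next
    iteration of the loop enclosing it. A single-level stack just terminates.
    """
    if not loops or any(lp.start >= lp.end for lp in loops):
        return
    idx = [lp.start for lp in loops]
    while True:
        if exit_inner(tuple(idx)):
            level = len(loops) - 2          # conditional exit: skip to outer loop
        else:
            yield tuple(idx)                # the loop body would run here
            level = len(loops) - 1          # normal case: advance innermost loop
        # Advance the loop at `level`; completed loops reset and carry outward.
        while level >= 0:
            idx[level] += loops[level].step
            if idx[level] < loops[level].end:
                break
            idx[level] = loops[level].start
            level -= 1
        if level < 0:
            return                          # outermost loop has finished
        for inner in range(level + 1, len(loops)):
            idx[inner] = loops[inner].start  # restart every loop inside `level`


# Triangular iteration: the inner loop exits as soon as j exceeds i.
steps = list(nested_loop_engine([Loop(0, 3), Loop(0, 4)],
                                exit_inner=lambda ix: ix[1] > ix[0]))
print(steps)   # [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
```

The condition is the only dynamic input. The controller decides where execution goes next, not branch instructions in the loop body. In this example, that turns a rectangular 3 x 4 loop nest into a triangular one.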

The kicker: when implementing SVP64's Matrix REMAP Schedule, the VLSI
design of its triple-nested for-loop system turned out to be
remarkably similar to the core nested for-loop engine of ZOLC.
In hindsight this should not have come as a surprise, because both
are basically nested for-loops.
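
The shared structure can be sketched in Python as a triple-nested for-loop schedule driving a matrix multiply. This is an assumption-level illustration, not the SVP64 Matrix REMAP specification. The loop order (row, column, inner product) and the names `triple_loop_schedule` and `matmul_via_schedule` are hypothetical.

```python
from typing import Iterator, List, Tuple


def triple_loop_schedule(outer: int, middle: int, inner: int
                         ) -> Iterator[Tuple[int, int, int]]:
    """Indices of a triple-nested for-loop, outermost first."""
    for i in range(outer):
        for j in range(middle):
            for k in range(inner):
                yield i, j, k


def matmul_via_schedule(a: List[int], b: List[int],
                        rows: int, inner: int, cols: int) -> List[int]:
    """C = A x B over flat row-major lists: A is rows x inner, B is inner x cols.

    The schedule alone decides which elements each multiply-accumulate
    touches; the loop body never makes a control-flow decision.
    """
    c = [0] * (rows * cols)
    for i, j, k in triple_loop_schedule(rows, cols, inner):
        c[i * cols + j] += a[i * inner + k] * b[k * cols + j]
    return c


print(matmul_via_schedule([1, 2, 3, 4], [5, 6, 7, 8], rows=2, inner=2, cols=2))
# [19, 22, 43, 50]
```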

The important insight, however, is that if ZOLC can be general-purpose
and apply deterministic nested loop schedules to more than just registers
(unlike SVP64 in its current incarnation), then so can SVP64.

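
Here is a minimal Python sketch of that generalisation, under stated assumptions. One deterministic 2-deep schedule is mapped once onto register numbers and once onto memory addresses. The register base (r32), element width (8 bytes), row stride (64 bytes) and base address (0x1000) are arbitrary illustrative values. `nested_schedule` and `apply_schedule` are hypothetical names, not SVP64 or ZOLC interfaces.

```python
from typing import Callable, Iterator, List, Tuple


def nested_schedule(rows: int, cols: int) -> Iterator[Tuple[int, int]]:
    """Deterministic 2-deep loop schedule: depends only on the loop bounds."""
    for y in range(rows):
        for x in range(cols):
            yield y, x


def apply_schedule(rows: int, cols: int,
                   to_target: Callable[[int, int], int]) -> List[int]:
    """Map every (y, x) step of the schedule onto some target index space."""
    return [to_target(y, x) for y, x in nested_schedule(rows, cols)]


ROWS, COLS = 2, 3

# Target 1 (hypothetical): consecutive register numbers starting at r32.
regs = apply_schedule(ROWS, COLS, lambda y, x: 32 + y * COLS + x)

# Target 2 (hypothetical): byte addresses of 8-byte elements in a 2-D array
# whose rows are 64 bytes apart, starting at address 0x1000.
addrs = apply_schedule(ROWS, COLS, lambda y, x: 0x1000 + y * 64 + x * 8)

print(regs)                      # [32, 33, 34, 35, 36, 37]
print([hex(a) for a in addrs])   # ['0x1000', '0x1008', '0x1010', '0x1040', '0x1048', '0x1050']
```

The schedule itself does not change between the two targets. Only the final mapping function does, which is why a nested-loop engine built for registers need not be limited to them.
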
**OpenCAPI and Extra-V**