From 1fd0eccf5d12e3d4c8429fe7304b7d14edaf973f Mon Sep 17 00:00:00 2001
From: lkcl
Date: Fri, 6 May 2022 13:08:03 +0100
Subject: [PATCH]

---
 openpower/sv/SimpleV_rationale.mdwn | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index 32a3504fc..c8101fdef 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -500,6 +500,9 @@ concept needs no branches, no complex Register Hazard Management
 because it is down to the programmer (or, the compiler), to ensure data
 overlaps do not occur.
 
+The key aspect of these
+very simplistic countdown loops is: *they are deterministic*.
+
 Zero-Overhead Loop Control takes this basic "single loop" concept way
 further: both nested loops and conditional exit are included, but also
 arbitrary control-jumping from the current inner loop
@@ -508,14 +511,26 @@ dynamically at runtime.
 
 Even when deployed on as basic a CPU as a single-issue in-order RISC
 core, the performance and power-savings were astonishing: between 20
-and **80** reduction in algorithm completion times were achieved compared
+and **80%** reduction in algorithm completion times were achieved compared
 to a more traditional branch-speculative in-order RISC CPU. MPEG Decode,
 the target algorithm specifically picked by the researcher due to its
 high complexity with 6-deep nested loops and conditional execution that
 frequently jumped in and out of at least 2 loops, came out with an
 astonishing 43% improvement in completion time. 43% less instructions
 executed is an almost unheard-of level of optimisation:
-most ISA designers are elated if they can achieve 5 to 10%.
+most ISA designers are elated if they can achieve 5 to 10%. The reduction
+was so compelling that ST Microelectronics put it into commercial
+production in one of their embedded CPUs.
+
+The kicker: when implementing SVP64's Matrix REMAP Schedule, the VLSI
+design of its triple-nested for-loop system
+turned out to be remarkably similar to the
+core nested for-loop engine of ZOLC. In hindsight this should not
+have come as a surprise, because both are basically nested for-loops.
+
+The important insight is, however, that if ZOLC can be general-purpose
+and apply deterministic nested loop schedules to more than just registers
+(unlike SVP64 in its current incarnation) then so can SVP64.
 
 **OpenCAPI and Extra-V**
 
-- 
2.30.2
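
The patch text above describes a deterministic triple-nested for-loop schedule. The following is a minimal illustrative sketch of that general idea in Python. It is not the SVP64 Matrix REMAP algorithm and not the ZOLC hardware engine; the names `nested_schedule` and `matmul_via_schedule` are hypothetical and chosen only for this example. It shows that the index sequence of a fixed triple-nested loop can be derived from a single step counter, with no data-dependent branching, and then used to drive a flat-array matrix multiply.

```python
from typing import Iterator, List, Tuple


def nested_schedule(xdim: int, ydim: int, zdim: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (i, j, k) for a fixed triple-nested loop, innermost index fastest.

    The whole sequence is a pure function of the three dimensions, so it is
    known in advance (deterministic). Hypothetical helper, for illustration only.
    """
    total = xdim * ydim * zdim
    for step in range(total):
        k = step % zdim                 # innermost loop counter
        j = (step // zdim) % ydim       # middle loop counter
        i = step // (zdim * ydim)       # outermost loop counter
        yield i, j, k


def matmul_via_schedule(a: List[int], b: List[int],
                        rows: int, inner: int, cols: int) -> List[int]:
    """Multiply a (rows x inner) by b (inner x cols), both stored as flat lists.

    The loop body contains no index arithmetic of its own beyond addressing:
    the schedule supplies every (i, j, k) triple.
    """
    out = [0] * (rows * cols)
    for i, j, k in nested_schedule(rows, cols, inner):
        out[i * cols + j] += a[i * inner + k] * b[k * cols + j]
    return out
```

For example, `matmul_via_schedule([1, 2, 3, 4], [5, 6, 7, 8], 2, 2, 2)` multiplies two 2x2 matrices held as flat lists. Because the schedule depends only on the dimensions, calling `nested_schedule` twice with the same arguments yields the same sequence.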