fewer instructions executed is an almost unheard-of level of optimisation:
most ISA designers are elated if they can achieve 5 to 10%. The reduction
was so compelling that ST Microelectronics put it into commercial
production in one of their embedded CPUs, the ST120 DSP-MCU.
The kicker: when implementing SVP64's Matrix REMAP Schedule, the VLSI
design of its triple-nested for-loop system