bare-bones microkernel
would be viable, or a Management Core closer to the PEs (on the same
die or Multi-Chip-Module as the PEs) would allow better bandwidth and
-reduce Management Overhead on the main CPUs. However once established,
-and running the same level of power saving as Snitch (1/6th) and
-the same sort of reduction in algorithm runtime (20 to 80%) is not
-unreasonable, and compelling enough to warrant in-depth investigation.
+reduce Management Overhead on the main CPUs. However, if
+the same level of power saving as Snitch (1/6th) and
+the same sort of reduction in algorithm runtime as ZOLC (20 to 80%) are not
+unreasonable to expect, this is
+definitely compelling enough to warrant in-depth investigation.
**Use-case: Matrix and Convolutions**
+First, some important definitions, because there are two different
+Vectorisation Modes in SVP64:
+
* **Horizontal-First**: (aka standard Cray Vectors) walk
through **elements** first before moving to next **instruction**
* **Vertical-First**: walk through **instructions** before