**Use-case: Matrix and Convolutions**
Imagine a large Matrix scenario with several values close to zero that
could be skipped: there is no need to perform zero-multiplications.
A traditional CPU, however, cannot help at all here: only by loading the
data through the L1-L4 Cache and Virtual Memory Barriers is it possible
to ascertain, retrospectively, that time and power have just been wasted.

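
A minimal Python sketch of the waste being described, assuming a naive triple-loop multiply. `dense_matmul` performs every product, while `skipping_matmul` (a hypothetical helper) tests each operand and skips the (near-)zero ones. The skipping version still has to *load* every operand just to test it, which is exactly the cost the paragraph says a traditional CPU cannot avoid. The `eps` threshold standing in for "close to zero" is an illustrative assumption.

```python
# Sketch: dense vs zero-skipping matrix multiply, counting multiplications.
# dense_matmul / skipping_matmul and the eps threshold are illustrative
# assumptions, not part of SVP64.
from typing import List, Tuple

Matrix = List[List[float]]


def dense_matmul(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Naive C = A x B: every product is computed, including zero ones."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    mults = 0
    for i in range(n):
        for k in range(m):
            for j in range(p):
                c[i][j] += a[i][k] * b[k][j]
                mults += 1
    return c, mults


def skipping_matmul(a: Matrix, b: Matrix, eps: float = 1e-12) -> Tuple[Matrix, int]:
    """Same walk, but products with a (near-)zero operand are skipped.
    Every operand is still loaded in order to be tested; skipping
    non-zero values below eps is an approximation."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    mults = 0
    for i in range(n):
        for k in range(m):
            x = a[i][k]
            if abs(x) <= eps:
                continue  # skips all products with row k of B
            for j in range(p):
                y = b[k][j]
                if abs(y) <= eps:
                    continue
                c[i][j] += x * y
                mults += 1
    return c, mults


a = [[1.0, 0.0, 0.0],
     [0.0, 0.0, 2.0],
     [0.0, 0.0, 0.0]]
b = [[0.0, 3.0, 0.0],
     [4.0, 0.0, 0.0],
     [0.0, 0.0, 5.0]]

c_dense, n_dense = dense_matmul(a, b)
c_skip, n_skip = skipping_matmul(a, b)
print(n_dense, n_skip)  # prints: 27 2
```

Both versions produce the same result here, but the dense walk issues 27 multiplies where only 2 contribute anything.
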
SVP64 is able to do what is termed "Vertical-First" Vectorisation,
combined with SVREMAP Matrix Schedules. Imagine that SVREMAP has been
extended, Snitch-style, to perform a deterministic memory-array walk of
a large Matrix.

Let us also imagine that the Matrices are stored in Memory with
Processing Elements (PEs) attached, and that the PEs are fully
functioning Power ISA cores with Draft SVP64, but that their Multiply
capability is not as good as that of the main CPU. Therefore we want
the PEs to feed the sparse data to the main CPU, à la "Extra-V".

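
This division of labour can be roughly modelled in Python. It is a sketch under stated assumptions, not the SVREMAP or Extra-V implementation. `matrix_schedule` is a hypothetical stand-in for a deterministic Matrix Memory-Load Schedule. `pe_filter` plays the role of the PEs: it loads operands near memory and forwards only non-zero pairs. `main_cpu_accumulate` is the main CPU, performing only the multiplies that matter. Consuming one scheduled element at a time is only a loose analogy to Vertical-First execution.

```python
# Rough model of PEs feeding sparse data to the main CPU.
# All names here are hypothetical; this is not the real SVREMAP encoding.
from typing import Iterator, List, Tuple

Matrix = List[List[float]]


def matrix_schedule(n: int, m: int, p: int) -> Iterator[Tuple[int, int, int]]:
    """Deterministic walk over every (i, j, k) needed for
    C[i][j] += A[i][k] * B[k][j]."""
    for i in range(n):
        for j in range(p):
            for k in range(m):
                yield i, j, k


def pe_filter(schedule: Iterator[Tuple[int, int, int]],
              a: Matrix, b: Matrix) -> Iterator[Tuple[int, int, float, float]]:
    """Runs on the PEs: follows the schedule, loads both operands,
    and forwards only pairs where neither operand is zero."""
    for i, j, k in schedule:
        x, y = a[i][k], b[k][j]
        if x != 0.0 and y != 0.0:
            yield i, j, x, y


def main_cpu_accumulate(stream: Iterator[Tuple[int, int, float, float]],
                        n: int, p: int) -> Tuple[Matrix, int]:
    """Main CPU: multiply-accumulates only what the PEs forward,
    one element at a time."""
    c = [[0.0] * p for _ in range(n)]
    mults = 0
    for i, j, x, y in stream:
        c[i][j] += x * y
        mults += 1
    return c, mults


a = [[0.0, 2.0],
     [0.0, 0.0]]
b = [[1.0, 0.0],
     [0.0, 4.0]]

stream = pe_filter(matrix_schedule(2, 2, 2), a, b)
c, mults = main_cpu_accumulate(stream, 2, 2)
print(c, mults)  # prints: [[0.0, 8.0], [0.0, 0.0]] 1
```

The PEs still walk all 8 scheduled elements, but the main CPU sees, and multiplies, only the single non-zero pair.
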
* The ZOLC SVREMAP System running on the main CPU generates a Matrix
Memory-Load Schedule.