around 5x7 to 6x6 Matrices and constrained by the size of
the register files (128 64-bit entries), to arbitrary (massive) sizes.
+**Use-case: Matrix and Convolutions**
+
+Imagine a large Matrix scenario with many values at or close to zero
+that could be skipped: there is no need to perform zero-multiplications.
+SVP64 is able to do what is termed "Vertical-First" Vectorisation,
+combined with SVREMAP Matrix Schedules. Imagine that SVREMAP has been
+extended, Snitch-style, to perform a deterministic memory-array walk of
+a large Matrix.
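+
+The deterministic memory-array walk can be pictured with a short
+hypothetical Python sketch (none of these names are real SVREMAP
+constructs): the schedule simply enumerates the element addresses of a
+row-major Matrix in a fixed, repeatable order, which is the property an
+extended SVREMAP would need to guarantee.

```python
# Hypothetical sketch: a deterministic memory-array walk, of the kind an
# extended, Snitch-style SVREMAP might generate for a row-major matrix.
def matrix_walk(base_addr, rows, cols, elem_size=8):
    """Yield the address of every element in a fixed, repeatable order."""
    for r in range(rows):
        for c in range(cols):
            yield base_addr + (r * cols + c) * elem_size

# A 2x3 matrix of 64-bit elements at 0x1000: six strictly ordered addresses
addrs = list(matrix_walk(0x1000, 2, 3))
```

+Because the walk is fully deterministic, both the PEs and the main CPU
+can independently reproduce the same schedule without further
+coordination.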
+
+Let us also imagine that the Matrices are stored in Memory with PEs
+attached, and that the PEs are fully functioning Power ISA cores with
+Draft SVP64, but that their Multiply capability is not as good as the
+main CPU's. Therefore:
+we want the PEs to feed the sparse data to the main CPU.
+
+* The ZOLC SVREMAP System running on the main CPU generates a Matrix
+ Memory-Load Schedule.
+* The Schedule is sent to the PEs, next to the Memory, via OpenCAPI
+* The PEs are also sent the Basic Block to be executed on each
+ Memory Load (each element of the Matrices to be multiplied)
+* The PEs execute the Basic Block and **exclude**, in a deterministic
+ fashion, any elements containing Zero values
+* Non-zero elements are sent, via OpenCAPI, to the main CPU, which
+ queues sequences of Multiply-and-Accumulate, and feeds the results
+ back to Memory, again via OpenCAPI, to the PEs.
+* The PEs, which are tracking the Sparse Conditions, know where
+ to store the results received
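+
+The division of labour in the steps above can be sketched in Python.
+This is purely illustrative (the function names and the in-process
+"transfer" stand in for the Basic Block and the OpenCAPI link): the PE
+walks the Schedule and excludes zeros deterministically, and the main
+CPU performs Multiply-and-Accumulate only on the elements it receives.

```python
# Hypothetical sketch of the PE/CPU split described above.
# pe_filter stands in for the Basic Block run on each Memory Load;
# in the real design the (index, value) pairs would cross OpenCAPI.

def pe_filter(matrix_flat):
    """PE side: deterministic walk, excluding Zero-valued elements."""
    for idx, val in enumerate(matrix_flat):
        if val != 0:
            yield idx, val

def cpu_sparse_dot(a_flat, b_flat):
    """CPU side: Multiply-and-Accumulate only the non-zero work sent by the PEs."""
    b_nonzero = dict(pe_filter(b_flat))
    acc = 0
    for idx, a_val in pe_filter(a_flat):
        if idx in b_nonzero:
            acc += a_val * b_nonzero[idx]
    return acc
```

+Because both sides follow the same deterministic Schedule, the indices
+of the surviving elements are enough for the PEs to know where results
+must be stored on the way back.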
+
+In essence this is near-identical to the original Snitch concept,
+except that, as in Extra-V, the PEs are able to perform
+conditional testing of the data as it travels both to and from the
+main CPU. In this way a large Sparse Matrix Multiply or Convolution
+may be achieved without having to pass unnecessary data through
+L1/L2/L3 Caches only to find, at the CPU, that it is zero.
+
+**Summary**
+
The bottom line is that there is a clear roadmap towards solving a
long-standing problem facing Computer Science, and doing so in a way
that reduces power consumption, reduces algorithm completion time, and reduces