for several decades, and advanced programmers are comfortable with the
practice.
Additional questions remain as to whether OpenCAPI, or its use for this
particular scenario, requires that the PEs, even quite basic ones,
implement a full RADIX MMU and associated TLB lookup. In order to ensure
that programs may be cleanly and seamlessly transferred between the PEs
and the CPU, the answer is quite likely to be "yes", which is interesting
in and of itself. Fortunately, the associated L1 Cache with TLB
Translation does not have to be large, and the actual RADIX Tree Walk
need not explicitly be done by the PEs: it can be handled by the main
CPU as a software extension.
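
As a purely illustrative sketch of that delegation (the names, sizes and
the trivial "translation" below are hypothetical, not an actual OpenCAPI
or Power ISA interface), a PE can get by with a tiny TLB and simply hand
any miss to the main CPU, which performs the RADIX Tree Walk in software
and returns a translation for the PE to cache:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical 8-entry, direct-mapped PE TLB: small enough to be
 * affordable in even a very basic Processing Element.             */
#define PE_TLB_ENTRIES 8
#define PAGE_SHIFT     12          /* 4 KiB pages, for illustration */

typedef struct {
    bool     valid;
    uint64_t vpn;                  /* virtual page number         */
    uint64_t rpn;                  /* real (physical) page number */
} pe_tlb_entry_t;

static pe_tlb_entry_t pe_tlb[PE_TLB_ENTRIES];

/* Stand-in for the main CPU performing the RADIX tree walk in
 * software on the PE's behalf (here: a trivial identity mapping). */
static uint64_t cpu_software_radix_walk(uint64_t vpn)
{
    printf("PE TLB miss: CPU walks RADIX tree for VPN 0x%llx\n",
           (unsigned long long)vpn);
    return vpn;                    /* placeholder translation */
}

/* PE-side translation: hit in the tiny TLB, or delegate the walk. */
static uint64_t pe_translate(uint64_t vaddr)
{
    uint64_t vpn = vaddr >> PAGE_SHIFT;
    pe_tlb_entry_t *e = &pe_tlb[vpn % PE_TLB_ENTRIES];

    if (!(e->valid && e->vpn == vpn)) {
        /* Miss: the PE does not walk the tree itself, the CPU does. */
        e->rpn   = cpu_software_radix_walk(vpn);
        e->vpn   = vpn;
        e->valid = true;
    }
    return (e->rpn << PAGE_SHIFT) | (vaddr & ((1ULL << PAGE_SHIFT) - 1));
}

int main(void)
{
    printf("0x%llx\n", (unsigned long long)pe_translate(0x12345678ULL));
    /* second access to the same page hits the PE's own TLB */
    printf("0x%llx\n", (unsigned long long)pe_translate(0x12345abcULL));
    return 0;
}
```
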
**Use-case: Matrix and Convolutions**

Imagine a large Matrix scenario, with several values close to zero: a
test against zero may be carried out by the PEs, in-memory, such that
skipping of those elements may be achieved without having to pass
unnecessary data through L1/L2/L3 Caches only to find, at the CPU, that
it is zero.
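
As a software-level sketch only (an actual PE would perform this test
in hardware at the memory interface; the function name and the near-zero
threshold here are made up for illustration), the effect is that only
the elements worth computing with ever get sent onwards to the CPU:

```c
#include <stddef.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical threshold below which an element is treated as zero. */
#define NEAR_ZERO 1e-9

/* PE-side scan of a row held in local memory: only elements worth
 * multiplying are reported, so near-zero data never has to travel
 * through the CPU's L1/L2/L3 Caches at all.                        */
static size_t pe_scan_nonzero(const double *row, size_t n,
                              size_t *idx_out, double *val_out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (fabs(row[i]) > NEAR_ZERO) {
            idx_out[count] = i;       /* element is worth sending */
            val_out[count] = row[i];
            count++;
        }
    }
    return count;                     /* how many elements the CPU sees */
}

int main(void)
{
    double row[8] = { 0.0, 3.5, 0.0, 0.0, -1.25, 0.0, 0.0, 2.0 };
    size_t idx[8];
    double val[8];

    size_t n = pe_scan_nonzero(row, 8, idx, val);
    printf("CPU receives %zu of 8 elements\n", n);   /* 3 of 8 */
    return 0;
}
```
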
**Use-case: More powerful PEs in-memory**

An obvious variant of the above is that, if there is inherently
more parallelism in the data set, then the PEs get their own
Multiply-and-Accumulate instruction and, rather than send the
data to the CPU over OpenCAPI, perform the Matrix-Multiply
directly themselves.

However, the source code and binary would be near-identical, if
not identical in every respect, with the PEs implementing the full
ZOLC capability in order to compact the binary size to the bare minimum.
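
A minimal sketch of that point follows: nothing in it is OpenCAPI- or
SVP64-specific, which is precisely the point. The same
Multiply-and-Accumulate loop nest may be compiled for the CPU or for a
more capable PE, with a ZOLC-capable PE turning the loop control into
zero-overhead hardware loops (the dimensions below are illustrative):

```c
#include <stdio.h>

/* The same source, whether compiled for the CPU or for a PE: the
 * inner statement is a single Multiply-and-Accumulate, and the
 * surrounding loop nest is exactly what ZOLC hardware loops would
 * carry on a PE.                                                  */
#define N 4

static void matmul_mac(const float a[N][N], const float b[N][N],
                       float c[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            float acc = 0.0f;
            for (int k = 0; k < N; k++)
                acc += a[i][k] * b[k][j];   /* the MAC operation */
            c[i][j] = acc;
        }
}

int main(void)
{
    float a[N][N], b[N][N], c[N][N];

    /* identity times a ramp: c should come out equal to b */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (i == j) ? 1.0f : 0.0f;
            b[i][j] = (float)(i * N + j);
        }

    matmul_mac(a, b, c);
    printf("c[2][3] = %g\n", c[2][3]);      /* prints 11 */
    return 0;
}
```
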
One key strategic question does remain: do the PEs need to have
a RADIX MMU and an associated TLB-aware minimal L1 Cache in order
to support OpenCAPI properly?

**Roadmap summary of Advanced SVP64**
The future direction for SVP64, then, is: