on a GB-OoO Micro-architecture)
</blockquote>*
-Draft Image:
+Draft Image (placeholder):
<img src="/openpower/sv/zolc_svp64_extrav.jpg" width=800 />
+The program being executed is a simple loop with a conditional
+test that skips the multiply if the input is zero.
+
+* In the CPU-only case (top) the data goes through L1/L2
+ Cache before reaching the CPU.
+* The PE version, however, does not send zero-data to the CPU at all,
+ and the data it does send goes into a Coherent FIFO: there is no
+ compelling need for it to enter L1/L2 Cache or even the CPU Register
+ File (one of the key reasons why Snitch saves so much power).
+* In the PE-only version (see next use-case) the CPU is mostly
+ idle, serving RADIX MMU TLB requests for PEs, and OpenCAPI
+ requests.
+
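As a rough illustration of the program described for the draft image, the loop with its zero test might look like the C sketch below. The function name `multiply_nonzero`, its parameters, and the choice to write zero to the output for skipped elements are assumptions made for this example only; they are not taken from the SVP64 or ZOLC material.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch (names and types are assumptions):
 * scale each input by `factor`, but skip the multiply entirely
 * when the input is zero. Returns how many multiplies were done.
 */
size_t multiply_nonzero(const int32_t *in, int64_t *out,
                        size_t n, int32_t factor)
{
    size_t multiplies = 0;

    for (size_t i = 0; i < n; i++) {
        if (in[i] == 0) {
            /* Conditional test: zero input, no multiply issued. */
            out[i] = 0;
            continue;
        }
        /* Widen before multiplying so the product cannot overflow. */
        out[i] = (int64_t)in[i] * (int64_t)factor;
        multiplies++;
    }
    return multiplies;
}
```

In the CPU-only case every element, zero or not, still travels through L1/L2 Cache before the test at the top of the loop body can reject it; in the PE case the equivalent of that test would run next to the memory, so only the non-zero elements would need to reach the CPU at all.
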
**Use-case variant: More powerful in-memory PEs**
An obvious variant of the above is that, if there is inherently