From bdef73c0c3b8abac753758da5a50276ba36b0e98 Mon Sep 17 00:00:00 2001
From: lkcl
Date: Sun, 8 May 2022 07:02:21 +0100
Subject: [PATCH]

---
 openpower/sv/SimpleV_rationale.mdwn | 36 +++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index aecdc58ef..97954ed1f 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -836,6 +836,42 @@ expand SVP64's capability for Matrices, currently limited to around 5x7
 to 6x6 Matrices and constrained by the size of the register files (128
 64-bit entries), to arbitrary (massive) sizes.
 
+**Use-case: Matrix and Convolutions**
+
+Imagine a large Matrix scenario, with several values close to zero that
+could be skipped: no need to include zero-multiplications.
+SVP64 is able to do what is termed "Vertical-First" Vectorisation,
+combined with SVREMAP Matrix Schedules. Imagine that SVREMAP has been
+extended, Snitch-style, to perform a deterministic memory-array walk of
+a large Matrix.
+
+Let us also imagine that the Matrices are stored in Memory with PEs
+attached, and that the PEs are fully functioning Power ISA with Draft
+SVP64, but their Multiply capability is not as good as the main CPU's. Therefore:
+we want the PEs to feed the sparse data to the main CPU.
+
+* The ZOLC SVREMAP System running on the main CPU generates a Matrix
+  Memory-Load Schedule.
+* The Schedule is sent to the PEs, next to the Memory, via OpenCAPI
+* The PEs are also sent the Basic Block to be executed on each
+  Memory Load (each element of the Matrices to be multiplied)
+* The PEs execute the Basic Block and **exclude**, in a deterministic
+  fashion, any elements containing Zero values
+* Non-zero elements are sent, via OpenCAPI, to the main CPU, which
+  queues sequences of Multiply-and-Accumulate, and feeds the results
+  back to Memory, again via OpenCAPI, to the PEs.
+* The PEs, which are tracking the Sparse Conditions, know where
+  to store the results received
+
+In essence this is near-identical to the original Snitch concept
+except that there are, like Extra-V, PEs able to perform
+conditional testing of the data as it goes both to and from the
+main CPU. In this way a large Sparse Matrix Multiply or Convolution
+may be achieved without having to pass unnecessary data through
+L1/L2/L3 Caches only to find, at the CPU, that it is zero.
+
+**Summary**
+
 Bottom line is that there is a clear roadmap towards solving a long
 standing problem facing Computer Science and doing so in a way that
 reduces power consumption reduces algorithm completion time and reduces
-- 
2.30.2
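
The data flow described in the Use-case added by this patch can be illustrated with a small software model. This is only a sketch under stated assumptions, not SVP64 or Power ISA code: the function names (`matrix_schedule`, `pe_filter`, `cpu_mac`, `pe_store`, `sparse_matmul`) are hypothetical, the OpenCAPI links are modelled as plain Python generators, the "Basic Block" is reduced to an exact-zero test on each operand pair, and the Schedule is assumed to be a simple row-by-column walk.

```python
from typing import Dict, Iterator, List, Tuple

Matrix = List[List[float]]
Index = Tuple[int, int, int]


def matrix_schedule(n: int, m: int, p: int) -> Iterator[Index]:
    """Hypothetical stand-in for the Matrix Memory-Load Schedule:
    a deterministic (i, j, k) walk for an (n x m) by (m x p) multiply."""
    for i in range(n):
        for j in range(p):
            for k in range(m):
                yield i, j, k


def pe_filter(a: Matrix, b: Matrix, schedule: Iterator[Index]):
    """PE side (assumed behaviour): load each operand pair named by the
    schedule and exclude any pair that contains a zero."""
    for i, j, k in schedule:
        x, y = a[i][k], b[k][j]
        if x == 0 or y == 0:
            continue  # never sent to the main CPU
        yield i, j, x, y


def cpu_mac(stream) -> Dict[Tuple[int, int], float]:
    """Main-CPU side: Multiply-and-Accumulate only what it receives."""
    acc: Dict[Tuple[int, int], float] = {}
    for i, j, x, y in stream:
        acc[(i, j)] = acc.get((i, j), 0) + x * y
    return acc


def pe_store(acc: Dict[Tuple[int, int], float], n: int, p: int) -> Matrix:
    """PE side: results come back tagged with their destination, so the
    PE knows where to store them; untouched entries stay zero."""
    c = [[0] * p for _ in range(n)]
    for (i, j), value in acc.items():
        c[i][j] = value
    return c


def sparse_matmul(a: Matrix, b: Matrix) -> Matrix:
    n, m, p = len(a), len(b), len(b[0])
    stream = pe_filter(a, b, matrix_schedule(n, m, p))
    return pe_store(cpu_mac(stream), n, p)
```

For example, with `a = [[1, 0], [0, 2]]` and `b = [[0, 3], [4, 0]]`, only two of the eight operand pairs in the schedule survive the filter and reach `cpu_mac`, which is the saving the Use-case is after: the zero-containing pairs never cross the link to the main CPU.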