From: Luke Kenneth Casson Leighton
Date: Tue, 14 Jun 2022 12:10:58 +0000 (+0100)
Subject: add matrix multiply image into whitepaper
X-Git-Tag: opf_rfc_ls005_v1~1792
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=f3ba1a1ebfce7768e4d9de8a465b306a346d19d6;p=libreriscv.git

add matrix multiply image into whitepaper
---

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index d16c79a2b..ff0bb23f4 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -915,6 +915,23 @@ Vectorisation Modes in SVP64:
 moving to next **element**. Currently managed by `svstep`,
 ZOLC may be deployed to manage the stepping, in a Deterministic manner.
 
+Second:
+SVP64 Draft Matrix Multiply is currently set up to arrange a Schedule
+of Multiply-and-Accumulates, suitable for pipelining, that will,
+ultimately, result in a Matrix Multiply. Normal processors are forced
+to perform "loop-unrolling" in order to achieve this same Schedule.
+SIMD processors are further forced into a situation of pre-arranging rotated
+copies of data if the Matrices are not exactly on a power-of-two boundary.
+
+The current limitation of SVP64 however is (when Horizontal-First
+is deployed, at least, which is the least number of instructions)
+that both source and destination Matrices have to be in-registers,
+in full. Vertical-First may be used to perform a LD/ST within
+the loop, covered by `svstep`, but it is still not ideal. This
+is where the Snitch and EXTRA-V concepts kick in.
+
+
+
 Imagine a large Matrix scenario, with several values close to zero
 that could be skipped: no need to include zero-multiplications, but a
 traditional CPU in no way can help: only by loading the data through