From: lkcl
Date: Fri, 6 May 2022 10:29:07 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2391
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=03f31857e0c45dde7307106697477094d6bc66c5;p=libreriscv.git
---
diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index cdc340dfc..f6ab38263 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -373,7 +373,7 @@ Which brings us to the next important question: how is any of these
 CPU-centric Vector-centric improvements relevant to power efficiency
 and making more effective use of resources?
 
-# Simpler more compact programs
+# Simpler more compact programs saves power
 
 The first and most obvious saving is that, just as with any Vector
 ISA, the amount of data processing requested
@@ -393,7 +393,8 @@ to have the Packed SIMD setup and teardown.
 `strncpy` for VSX is an astounding 240 hand-coded assembler instructions
 where it is around 12 to 14 for both RVV and SVP64. Worst case (full
 algorithm unrolling for Massive FFTs) the L1 I-Cache becomes completely ineffective, and in
-the case of the IBM POWER9 a little-known design flaw this results in
+the case of the IBM POWER9 with a little-known design flaw not
+normally otherwise encountered this results in
 contention between the L1 D and I Caches at the L2 Bus, slowing down
 execution even further. Power ISA 3.1 MMA (Matrix-Multiply-Assist)
 requires loop-unrolling to contend with non-power-of-two Matrix
@@ -426,3 +427,15 @@
 for example). Bear in mind that the submission process will be
 entirely at the discretion of the OpenPOWER Foundation ISA WG,
 something that is both encouraged and welcomed by the OPF.
+
+One of SVP64's current limitations is that it was initially designed
+for 3D and Video workloads as a hybrid GPU-VPU-CPU. This resulted in
+a heavy focus on adding hardware-for-loops onto the *Registers*.
+After more than three years of development the realisation hit that
+the SVP64 concept could be expanded to Coherent Distributed Memory,
+This astoundingly powerful concept is explored in the next section.
+
+# Coherent Deterministic Hybrid Distributed Memory-Processing
+
+It is not often that a heading for an article can legitimately
+contain quite so many buzzwords, but in this section it is justified.