From: lkcl
Date: Fri, 6 May 2022 08:40:39 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2403
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=a1a1590c7bb682b7a3619d5e4e539e138feca36a;p=libreriscv.git

---

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index 8845d0505..551b719db 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -389,4 +389,17 @@ execution even further.
 Additional savings come in the form of `SVREMAP`. This is a hardware
 index transformation system where the normally sequentially-linear
 element access may be "Re-Mapped" to a limited but algorithmic-tailored
-deterministic schedule, for example Matrix Multiply, DCT, or FFT.
+and commonly-used deterministic schedule, for example Matrix Multiply,
+DCT, or FFT. A full in-register-file 5x7 Matrix Multiply or a 3x4 or
+2x6 may be performed in as little as 4 instructions, one of which
+is to zero-initialise the accumulator Vector used to store the result.
+If addition to another Matrix is also required then it is only three
+instructions. Not only that, but because the "Schedule" is an abstract
+concept separated from the mathematical operation, there is no reason
+why Matrix Multiplication Schedules may not be applied to Integer
+Mul-and-Accumulate, Galois Field Mul-and-Accumulate, or Logical
+AND-and-OR. The flexibility is not only enormous, but the compactness
+unprecedented. RADIX2 in-place DCT Triple-loop Schedules may be created in
+around 11 instructions. The only other processors well-known to have
+this type of compact capability are both VLIW DSPs: TI's TMS320 Series
+and Qualcomm's Hexagon.
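
The added paragraph's point that a "Schedule" is separate from the mathematical operation can be illustrated with a minimal software sketch. This is a model of the idea only, not the SVREMAP hardware or its actual element ordering. The names `matmul_schedule` and `apply_schedule` are hypothetical, and the plain row-major triple-loop order is an assumption. The same index schedule is reused unchanged for Integer Mul-and-Accumulate and for Logical AND-and-OR.

```python
from itertools import product
import operator


def matmul_schedule(rows, inner, cols):
    """Yield (result, a, b) flat indices for A[rows x inner] times B[inner x cols].

    Assumption: simple row-major ordering, chosen for readability;
    the real hardware schedule may order elements differently.
    """
    for i, j, k in product(range(rows), range(cols), range(inner)):
        yield i * cols + j, i * inner + k, k * cols + j


def apply_schedule(schedule, a, b, acc, mul, add):
    """Run acc[r] = add(acc[r], mul(a[x], b[y])) for every scheduled triple."""
    for r, x, y in schedule:
        acc[r] = add(acc[r], mul(a[x], b[y]))
    return acc


# Integer Mul-and-Accumulate: 2x3 times 3x2, accumulator zero-initialised.
a_int = [1, 2, 3,
         4, 5, 6]
b_int = [7, 8,
         9, 10,
         11, 12]
int_result = apply_schedule(matmul_schedule(2, 3, 2), a_int, b_int,
                            [0] * 4, operator.mul, operator.add)

# Logical AND-and-OR: the same schedule, only the two operations are swapped.
a_bool = [1, 0, 0,
          0, 1, 1]
b_bool = [0, 1,
          1, 0,
          0, 0]
bool_result = apply_schedule(matmul_schedule(2, 3, 2), a_bool, b_bool,
                             [0] * 4, operator.and_, operator.or_)

print(int_result)   # [58, 64, 139, 154]
print(bool_result)  # [0, 1, 1, 0]
```

Passing an existing matrix as `acc` instead of zeros models the "addition to another Matrix" case, in which the zero-initialisation step is not needed.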