From: lkcl
Date: Fri, 6 May 2022 09:13:30 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2395
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=381f4f3972da8aafabef00c7b52eefc1c9e8a92f;p=libreriscv.git

---

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index ec20d02f9..2b47344fc 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -223,7 +223,7 @@ Packed SIMD explicitly smashes that width right in the face of
 the programmer and expects them to like it. As the article immediately
 demonstrates, an arbitrary-sized data set has to contend with
 an insane power-of-two Packed SIMD cascade at both setup and teardown
-that can add literally an order
+that routinely adds literally an order
 of magnitude increase in the number of hand-written lines of assembler
 compared to a well-designed Cray-style Vector ISA with a `setvl`
 instruction.
@@ -393,7 +393,9 @@ astounding 240 hand-coded assembler instructions where it is around
 for Massive FFTs) the L1 I-Cache becomes completely ineffective, and in
 the case of the IBM POWER9 a little-known design flaw this results in
 contention between the L1 D and I Caches at the L2 Bus, slowing down
-execution even further.
+execution even further. Power ISA 3.1 MMA (Matrix-Multiply-Assist)
+requires loop-unrolling to contend with non-power-of-two Matrix
+sizes: SVP64 does not, as hinted at below.
 
 Additional savings come in the form of `SVREMAP`. This is a hardware
 index transformation system where the normally sequentially-linear
@@ -403,7 +405,7 @@ DCT, or FFT. A full in-register-file 5x7 Matrix Multiply or a 3x4 or
 2x6 may be performed in as little as 4 instructions, one of which is to
 zero-initialise the accumulator Vector used to store the result. If
 addition to another Matrix is also required then it is only three
-instructions. Not only that, but because the "Schedule" is an abstract
+instructions. Not only that, but because the "Schedule" is an abstract
 concept separated from the mathematical operation, there is no reason
 why Matrix Multiplication Schedules may not be applied to Integer
 Mul-and-Accumulate, Galois Field Mul-and-Accumulate, or Logical