From: lkcl
Date: Thu, 5 May 2022 22:07:16 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2418
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=e7044d83f2bf4f2b3133470ae8a4d7b949ee9cf5;p=libreriscv.git

---

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index d8424ae51..ecce194ee 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -228,9 +228,9 @@ it feels like there should be a better way, particularly on close
 inspection of RVV as an example, the basic arithmetic operations are
 massively duplicated: scalar-scalar from the base is joined by both
 scalar-vector and vector-vector *and* predicate
-mask management, and transfer instructions between all the sane,
+mask management, and transfer instructions between all the same,
 which goes a long way towards explaining why there are twice as many
-Vector instructions in RISC-V as there are in the RV64GC base.
+Vector instructions in RISC-V as there are in the RV64GC Scalar base.
 
 The question then becomes: with all the duplication of arithmetic
 operations just to make the registers scalar or vector, why not
@@ -239,8 +239,8 @@ or prefix that augments its behaviour?
 
 Remarkably this is not a new idea. Intel's x86 `REP` instruction
 gives the base concept, but in 1994 it was Peter Hsu, the designer
-of the MIPS R8000, who first came up with the idea of Vector
-prefixing. Relying on a multi-issue Out-of-Order Execution Engine,
+of the MIPS R8000, who first came up with the idea of Vector-augmented
+prefixing of an existing Scalar ISA. Relying on a multi-issue Out-of-Order Execution Engine,
 the prefix would mark which of the registers were to be treated as
 Scalar and which as Vector, then perform a `REP`-like loop that
 jammed multiple scalar operations into the Multi-Issue Execution
@@ -270,15 +270,22 @@ of the problem-space: with
   primarily Load and Store being able to handle 8/16/32/64
   and sometimes 128-bit (quad-word), where Vector ISAs need to
   go as low as 8-bit arithmetic, even 8-bit Floating-Point for
-  high-performance AI.
+  high-performance AI. Rather than waste opcode space adding all
+  such operations at different bitwidths, let the prefix
+  *redefine* the element width.
 * "Reordering" of the assumption of linear sequential element
   access, for Matrices, rotations, transposition, Convolutions, DCT,
   FFT, Parallel Prefix-Sum and other common transformations
   that require significant programming effort in other ISAs.
 
+From there, several more "Modes" can be added, including saturation,
+which is needed for Audio and Video applications, "Reverse Gear"
+which runs the Element Loop in reverse order (needed for Prefix
+Sum), and more.
+
 **What is missing from Power Scalar ISA that a Vector ISA needs?**
 
-Remarkably, very little.
+Remarkably, very little: the devil is in the details though.
 
 * The traditional `iota` instruction may be synthesised with an
   overlapping add, that stacks up incrementally
@@ -294,3 +301,8 @@ Remarkably, very little.
 * The Condition Register Fields of the Power ISA make a great candidate
   for use as Predicate Masks, particularly when combined with
   Vectorised `cmp` and Vectorised `crand`, `crxor` etc.
+
+It is only when looking slightly deeper into the Power ISA that
+certain things turn out to be missing, and this is down in part to IBM's
+primary focus on the 750 Packed SIMD opcodes at the expense of the 250 or
+so Scalar ones.