From: lkcl <lkcl@web>
Date: Thu, 5 May 2022 22:07:16 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls005_v1~2418
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=e7044d83f2bf4f2b3133470ae8a4d7b949ee9cf5;p=libreriscv.git

---

diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn
index d8424ae51..ecce194ee 100644
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -228,9 +228,9 @@ it feels like there should be a better way, particularly on
 close inspection of RVV as an example, the basic arithmetic
 operations are massively duplicated: scalar-scalar from the base
 is joined by both scalar-vector and vector-vector *and* predicate
-mask management, and transfer instructions between all the sane,
+mask management, and transfer instructions between all the same,
 which goes a long way towards explaining why there are twice as many
-Vector instructions in RISC-V as there are in the RV64GC base.
+Vector instructions in RISC-V as there are in the RV64GC Scalar base.
 
 The question then becomes: with all the duplication of arithmetic
 operations just to make the registers scalar or vector, why not
@@ -239,8 +239,8 @@ or prefix that augments its behaviour?
 
 Remarkably this is not a new idea.  Intel's x86 `REP` instruction
 gives the base concept, but in 1994 it was Peter Hsu, the designer
-of the MIPS R8000, who first came up with the idea of Vector
-prefixing.  Relying on a multi-issue Out-of-Order Execution Engine,
+of the MIPS R8000, who first came up with the idea of Vector-augmented
+prefixing of an existing Scalar ISA.  Relying on a multi-issue Out-of-Order Execution Engine,
 the prefix would mark which of the registers were to be treated as
 Scalar and which as Vector, then perform a `REP`-like loop that
 jammed multiple scalar operations into the Multi-Issue Execution
@@ -270,15 +270,22 @@ of the problem-space:
   with primarily Load and Store being able to handle 8/16/32/64
   and sometimes 128-bit (quad-word), where Vector ISAs need to
   go as low as 8-bit arithmetic, even 8-bit Floating-Point for
-  high-performance AI.
+  high-performance AI. Rather than waste opcode space adding all
+  such operations at different bitwidths, let the prefix
+  *redefine* the element width.
 * "Reordering" of the assumption of linear sequential element
   access, for Matrices, rotations, transposition, Convolutions,
   DCT, FFT, Parallel Prefix-Sum and other common transformations
   that require significant programming effort in other ISAs.
 
+From there, several more "Modes" can be added, including saturation,
+which is needed for Audio and Video applications, "Reverse Gear",
+which runs the Element Loop in reverse order (needed for Prefix
+Sum), and more.
+
 **What is missing from Power Scalar ISA that a Vector ISA needs?**
 
-Remarkably, very little.
+Remarkably, very little, though the devil is in the details.
 
 * The traditional `iota` instruction may be
   synthesised with an overlapping add, that stacks up incrementally
@@ -294,3 +301,8 @@ Remarkably, very little.
 * The Condition Register Fields of the Power ISA make a great candidate
   for use as Predicate Masks, particularly when combined with
   Vectorised `cmp` and Vectorised `crand`, `crxor` etc.
+
+It is only when looking slightly deeper into the Power ISA that
+certain things turn out to be missing, and this is down in part to IBM's
+primary focus on the 750 Packed SIMD opcodes at the expense of the 250 or
+so Scalar ones.