(no commit message)

author lkcl <lkcl@web>

Thu, 5 May 2022 22:07:16 +0000 (23:07 +0100)

committer IkiWiki <ikiwiki.info>

Thu, 5 May 2022 22:07:16 +0000 (23:07 +0100)
diff --git a/openpower/sv/SimpleV_rationale.mdwn b/openpower/sv/SimpleV_rationale.mdwn

index d8424ae511181fc0794567aea9a936f5f822bacd..ecce194eeda43e9097801dd6daa532257c6fcbb4 100644 (file)
--- a/openpower/sv/SimpleV_rationale.mdwn
+++ b/openpower/sv/SimpleV_rationale.mdwn
@@ -228,9 +228,9 @@ it feels like there should be a better way, particularly on
  close inspection of RVV as an example, the basic arithmetic
  operations are massively duplicated: scalar-scalar from the base
  is joined by both scalar-vector and vector-vector *and* predicate
-mask management, and transfer instructions between all the sane,
+mask management, and transfer instructions between all the same,
  which goes a long way towards explaining why there are twice as many
-Vector instructions in RISC-V as there are in the RV64GC base.
+Vector instructions in RISC-V as there are in the RV64GC Scalar base.
  
  The question then becomes: with all the duplication of arithmetic
  operations just to make the registers scalar or vector, why not
@@ -239,8 +239,8 @@ or prefix that augments its behaviour?
  
  Remarkably this is not a new idea.  Intel's x86 `REP` instruction
  gives the base concept, but in 1994 it was Peter Hsu, the designer
-of the MIPS R8000, who first came up with the idea of Vector
-prefixing.  Relying on a multi-issue Out-of-Order Execution Engine,
+of the MIPS R8000, who first came up with the idea of Vector-augmented
+prefixing of an existing Scalar ISA.  Relying on a multi-issue Out-of-Order Execution Engine,
  the prefix would mark which of the registers were to be treated as
  Scalar and which as Vector, then perform a `REP`-like loop that
  jammed multiple scalar operations into the Multi-Issue Execution
@@ -270,15 +270,22 @@ of the problem-space:
    with primarily Load and Store being able to handle 8/16/32/64
    and sometimes 128-bit (quad-word), where Vector ISAs need to
    go as low as 8-bit arithmetic, even 8-bit Floating-Point for
-  high-performance AI.
+  high-performance AI. Rather than waste opcode space adding all
+  such operations at different bitwidths, let the prefix
+  *redefine* the element width.
  * "Reordering" of the assumption of linear sequential element
    access, for Matrices, rotations, transposition, Convolutions,
    DCT, FFT, Parallel Prefix-Sum and other common transformations
    that require significant programming effort in other ISAs.
  
+From there, several more "Modes" can be added, including saturation,
+which is needed for Audio and Video applications, "Reverse Gear"
+which runs the Element Loop in reverse order (needed for Prefix
+Sum), and more.
+
  **What is missing from Power Scalar ISA that a Vector ISA needs?**
  
-Remarkably, very little.
+Remarkably, very little: the devil is in the details though.
  
  * The traditional `iota` instruction may be
    synthesised with an overlapping add, that stacks up incrementally
@@ -294,3 +301,8 @@ Remarkably, very little.
  * The Condition Register Fields of the Power ISA make a great candidate
    for use as Predicate Masks, particularly when combined with
    Vectorised `cmp` and Vectorised `crand`, `crxor` etc.
+
+It is only when looking slightly deeper into the Power ISA that
+certain things turn out to be missing, and this is down in part to IBM's
+primary focus on the 750 Packed SIMD opcodes at the expense of the 250 or
+so Scalar ones.