# SIMD / Simple-V Extension Proposal

This proposal exists to satisfy two disparate requirements:
area-conscious designs and performance-conscious designs.
Additionally, the existing P (SIMD) and V (Vector) proposals,
whilst each extremely powerful in their own right and clearly desirable,
are:

* Clearly independent in their origins (AndeStar V3 and Cray respectively)
* Both contain duplication of pre-existing RISC-V instructions
* Both have independent and disparate methods for introducing parallelism
  at the instruction level.
* Both require that their respective parallelism paradigm be implemented
  alongside their respective functionality *or not at all*.
* Both independently have methods for introducing parallelism that could,
  if separated, benefit *other areas of RISC-V, not just DSP and floating-point*.

Therefore it makes a huge amount of sense to have a means and method
of introducing instruction parallelism in a flexible way that provides
implementors with the option to choose exactly where they wish to offer
performance improvements and where they wish to optimise for power
and area. If that can be offered even on a per-operation basis, that
would provide even more flexibility.

# Analysis and discussion of Vector vs SIMD

There are four combined areas between the two proposals that help with
parallelism without over-burdening the ISA with a huge proliferation of
instructions:

* Fixed vs variable parallelism (fixed or variable "M" in SIMD)
* Implicit vs fixed instruction bit-width (integral to instruction or not)
* Implicit vs explicit type-conversion (compounded on bit-width)
* Implicit vs explicit inner loops.

The pros and cons of each are discussed and analysed below.

## Fixed vs variable parallelism length

In David Patterson and Andrew Waterman's analysis of SIMD and Vector
ISAs, the conclusion comes out clearly in favour of (effectively)
variable-length SIMD. As SIMD is a fixed width, typically 4, 8 or in
extreme cases 16 or 32 simultaneous operations, the setup, teardown and
corner-cases of SIMD are extremely burdensome except for applications
whose requirements *specifically* match the *precise and exact* width
of the SIMD engine.

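
As an illustration of that burden, here is a minimal C sketch of a
fixed-width SIMD add, assuming a hypothetical 4-wide intrinsic
(`simd_add4` is purely illustrative, not a real ISA intrinsic): the main
loop has to be shadowed by a scalar tail to mop up the remainder, and a
call with fewer than 4 elements never uses the SIMD hardware at all.

    #include <stddef.h>

    /* Stand-in for a hypothetical 4-wide SIMD add intrinsic: the name
       and behaviour are illustrative only, not part of any real ISA. */
    static void simd_add4(int *d, const int *a, const int *b)
    {
        for (int lane = 0; lane < 4; lane++)
            d[lane] = a[lane] + b[lane];
    }

    void vec_add(int *d, const int *a, const int *b, size_t n)
    {
        size_t i = 0;

        /* Main body: only runs while at least 4 elements remain. */
        for (; i + 4 <= n; i += 4)
            simd_add4(&d[i], &a[i], &b[i]);

        /* Tail: up to 3 leftover elements need duplicated scalar code;
           a call with n < 4 never enters the SIMD loop at all. */
        for (; i < n; i++)
            d[i] = a[i] + b[i];
    }
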
Thus, SIMD, no matter what width is chosen, is never going to be acceptable
for general-purpose computation.

That basically leaves "variable-length vector" as the clear *general-purpose*
winner, at least in terms of greatly simplifying the instruction set,
reducing the number of instructions required for any given task, and thus
reducing power consumption for the same.

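
For contrast, here is a minimal sketch of the same operation in the
variable-length style, assuming a hypothetical `setvl()` that behaves the
way a vector ISA's vector-length-setting instruction would (the function
name and `MAXVL` value are illustrative assumptions): one loop, no scalar
tail, because the final pass simply runs with a shorter vector length.

    #include <stddef.h>

    #define MAXVL 8   /* illustrative hardware maximum, not a real limit */

    /* Stand-in for a vector-length-setting instruction: the hardware
       reports how many elements it will process on this pass. */
    static size_t setvl(size_t remaining)
    {
        return remaining < MAXVL ? remaining : MAXVL;
    }

    void vec_add(int *d, const int *a, const int *b, size_t n)
    {
        for (size_t i = 0; i < n; ) {
            size_t vl = setvl(n - i);         /* 1..MAXVL elements this pass */
            for (size_t j = 0; j < vl; j++)   /* stands in for one vector op */
                d[i + j] = a[i + j] + b[i + j];
            i += vl;
        }
    }
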
## Implicit vs fixed instruction bit-width

SIMD again has a severe disadvantage here, over Vector: a huge proliferation
of specialist instructions that target 8-bit, 16-bit, 32-bit and 64-bit
data, and which then need operations *for each and between each*. It gets
very messy, very quickly.

The V-Extension on the other hand proposes to set the bit-width of
future instructions on a per-register basis, such that subsequent instructions
involving that register are *implicitly* of that particular bit-width until
otherwise changed or reset.

This has some extremely useful properties, without being particularly
burdensome to implementations, given that instruction decode already has
to direct the operation to a correctly-sized width ALU engine, anyway.

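
A minimal conceptual sketch of that per-register tagging, in C: the table
layout, enum values and function names below are illustrative assumptions,
not the V-extension's actual CSR encoding. A width-setting instruction tags
a register once; decode then consults the tag to steer an otherwise
width-agnostic opcode to the appropriately sized ALU.

    /* Illustrative element-width tags; not the actual V-extension encoding. */
    typedef enum { W_DEFAULT = 0, W_8, W_16, W_32, W_64 } elwidth_t;

    /* One tag per architectural register, set by a width-setting
       instruction and consulted implicitly thereafter. */
    static elwidth_t reg_width[32];

    void set_register_width(int regno, elwidth_t w)
    {
        reg_width[regno] = w;
    }

    /* Decode stage: a single ADD opcode is steered to the correctly
       sized ALU by looking up the destination register's tag. */
    elwidth_t decode_operand_width(int rd)
    {
        return reg_width[rd];
    }
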
## Implicit and explicit type-conversion

The Draft 2.3 V-extension proposal has (deprecated) polymorphism to help
deal with over-population of instructions, such that type-casting between
integers (and floating point) of various sizes is automatically inferred
due to "type tagging" that is set with a special instruction. A register
will be *specifically* marked as "16-bit Floating-Point" and, if added
to an operand that is specifically tagged as "32-bit Integer", an implicit
type-conversion will take place *without* requiring that type-conversion
to be explicitly done with its own separate instruction.

However, implicit type-conversion is not only quite burdensome to
implement (explosion of inferred type-to-type conversions) but is also
never really going to be complete. It gets even worse when bit-widths
also have to be taken into consideration.

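
To put a rough number on that explosion (a back-of-the-envelope
illustration, not a figure from the V-extension specification): with just
two scalar types and four element widths, the hardware would have to infer
a conversion rule for every ordered pair of distinct formats.

    #include <stdio.h>

    int main(void)
    {
        const int types   = 2;             /* integer, floating-point */
        const int widths  = 4;             /* 8, 16, 32, 64-bit       */
        const int formats = types * widths;

        /* Every ordered (source, destination) pair of distinct formats
           is a conversion case the hardware must be able to infer. */
        printf("implicit conversion cases: %d\n", formats * (formats - 1));
        return 0;
    }
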
Overall, type-conversion is generally best left to explicit
type-conversion instructions, or, in definite specific use-cases, made
part of an actual instruction (DSP or FP).

## Zero-overhead loops vs explicit loops

The initial Draft P-SIMD Proposal by Chuanhua Chang of Andes Technology
contains an extremely interesting feature: zero-overhead loops. This
proposal would basically allow an inner loop of instructions to be
repeated either indefinitely or a fixed number of times.

Its specific advantage over explicit loops is that the pipeline in a
DSP can potentially be kept completely full *even in an in-order
implementation*. Normally, it requires a superscalar architecture and
out-of-order execution capabilities to "pre-process" instructions in order
to keep ALU pipelines 100% occupied.

This very simple proposal offers a way to increase pipeline activity in the
one key area which really matters: the inner loop.

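
As a point of comparison, here is an ordinary explicit loop in C, with
comments marking the per-iteration overhead that a zero-overhead loop
mechanism is intended to eliminate (the C itself is only an illustration;
the zero-overhead behaviour would come from hardware, not from any change
to the source).

    /* An explicit inner loop: each iteration pays for an index update,
       a comparison and a branch alongside the one multiply-add we want.
       With a zero-overhead loop, the trip count would be set up once and
       the body repeated by hardware, leaving the ALU pipeline to see
       nothing but the useful work. */
    void scale_accumulate(int *acc, const int *a, int k, int n)
    {
        for (int i = 0; i < n; i++)   /* compare + branch every iteration  */
            acc[i] += a[i] * k;       /* the only work that actually matters */
    }
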
# References

* SIMD considered harmful <https://www.sigarch.org/simd-instructions-considered-harmful/>
* Link to first proposal <https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/GuukrSjgBH8>
* Recommendation by Jacob Bachmeyer to make zero-overhead loop an
  "implicit program-counter" <https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/vYVi95gF2Mo/SHz6a4_lAgAJ>
* Re-continuing P-Extension proposal <https://groups.google.com/a/groups.riscv.org/forum/#!msg/isa-dev/IkLkQn3HvXQ/SEMyC9IlAgAJ>
* First Draft P-SIMD (DSP) proposal <https://groups.google.com/a/groups.riscv.org/forum/#!topic/isa-dev/vYVi95gF2Mo>