[[!tag standards]]

# Simple-V Vectorisation for the OpenPOWER ISA

**SV is in DRAFT STATUS**. SV has not yet been submitted to the OpenPOWER Foundation ISA WG for review.

<https://bugs.libre-soc.org/show_bug.cgi?id=213>

Fundamental design principles:

* Simplicity of introduction and implementation on the existing OpenPOWER ISA
* Effectively a hardware for-loop: the PC is paused while multiple scalar operations are issued
* Preserving the underlying scalar execution dependencies as if the for-loop had been expanded as actual scalar instructions
  (termed "preserving Program Order")
* Augments ("tags") existing instructions, providing Vectorisation "context" rather than adding new ones.
* Does not modify or deviate from the underlying scalar OpenPOWER ISA unless doing so provides a significant performance or other advantage in the Vector space (dropping XER.SO and OE=1 for example)
* Designed for Supercomputing: avoids creating significant sequential
  dependency hazards, allowing high performance superscalar microarchitectures to be deployed.
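
The "hardware for-loop" principle above can be sketched in a few lines of Python. This is an illustrative model only, not the SVP64 specification: the function name `sv_add`, the flat 128-entry register list, and the 64-bit wrap-around are all assumptions made for the example.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF  # assumed 64-bit register width


def sv_add(regs, rt, ra, rb, vl):
    """Hypothetical model of one Vectorised add.

    The instruction is issued as VL scalar adds in element order,
    exactly as if the loop had been written out by hand.
    """
    for i in range(vl):
        regs[rt + i] = (regs[ra + i] + regs[rb + i]) & MASK64


regs = [0] * 128                 # assumed register file size for the sketch
regs[8:12] = [1, 2, 3, 4]
regs[16:20] = [10, 20, 30, 40]

# Non-overlapping registers: a plain element-wise add.
sv_add(regs, rt=24, ra=8, rb=16, vl=4)
first = regs[24:28]
print(first)                     # [11, 22, 33, 44]

# Overlapping registers: element i+1 reads the result of element i,
# because each element behaves like a scalar instruction in Program Order.
sv_add(regs, rt=9, ra=8, rb=16, vl=3)
chained = regs[8:12]
print(chained)                   # [1, 11, 31, 61]
```

The second call is the interesting one: under this model the overlap produces a running sum, which is what "preserving Program Order" would imply if the loop had been expanded into scalar instructions.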

Advantages of these design principles:

* It is therefore easy to create a first (and sometimes only) implementation as literally a for-loop in hardware, simulators, and compilers.
* More complex HDL can be done by repeating existing scalar ALUs and pipelines as blocks.
* As (mostly) a high-level "context" that does not (significantly) deviate from the scalar OpenPOWER ISA and that is, in its purest form, "a for loop around scalar instructions", it is minimally disruptive and consequently stands a reasonable chance of broad community adoption and acceptance.
* Completely wipes opcode proliferation off the map, not just for SIMD
  (SIMD is O(N^6) opcode proliferation)
  but for Vectorisation ISAs as well.  No more separate Vector
  instructions.

Pages being developed and examples:

* [[sv/overview]] explaining the basics.
* [[sv/implementation]] implementation planning and coordination
* [[sv/svp64]] contains the packet-format *only*
* [[sv/setvl]] the Cray-style "Vector Length" instruction
* [[sv/predication]] discussion on predication concepts
* [[sv/cr_int_predication]] instructions needed for effective predication
* [[sv/masked_vector_chaining]]
* [[sv/discussion]]
* [[sv/example_dep_matrices]]
* [[sv/major_opcode_allocation]]
* [[opcode_regs_deduped]]
* [[sv/vector_swizzle]]
* [[sv/register_type_tags]]
* [[sv/mv.swizzle]]
* [[sv/mv.x]]
* [[sv/branches]] - SVP64 Conditional Branch behaviour: All/Some Vector CRs
* [[sv/cr_ops]] - SVP64 Condition Register ops: Guidelines
  on Vectorisation of any v3.0B base operations which return
  or modify a Condition Register bit or field.
* [[sv/fcvt]] FP Conversion (due to OpenPOWER Scalar FP32)
* [[sv/fclass]] detect class of FP numbers
* [[sv/int_fp_mv]] Move and convert GPR <-> FPR, needed for !VSX
* [[sv/mv.vec]] move to and from vec2/3/4
* [[sv/16_bit_compressed]] experimental
* [[sv/toc_data_pointer]] experimental
* [[sv/ldst]] Load and Store
* [[sv/sprs]] SPRs
* [[sv/bitmanip]]
* [[sv/biginteger]] Operations that help with big arithmetic
* [[sv/remap]] "Remapping" for Matrix Multiply and RGB "Structure Packing"
* [[sv/propagation]] Context propagation including svp64, swizzle and remap
* [[sv/vector_ops]] Vector ops needed to make a "complete" Vector ISA
* [[sv/av_opcodes]] scalar opcodes for Audio/Video
* [[sv/byteswap]]
* Twin-targeted instructions (two registers out, one implicit).
  The rules for twin register targets
  (implicit RS, FRS) are explained in the SVP64 [[sv/svp64/appendix]]
  - [[isa/svfixedarith]]
  - [[isa/svfparith]]
* TODO: OpenPOWER [[openpower/transcendentals]]

Additional links:

* <https://www.sigarch.org/simd-instructions-considered-harmful/>
* [[simple_v_extension]] old (deprecated) version
* [[openpower/sv/llvm]]
* [[openpower/sv/effect-of-more-decode-stages-on-reg-renaming]]

===

Required Background Reading:
============================

These are all, deep breath, basically... required reading, *in addition to* a full, deep technical understanding of the Power ISA, in order to understand the depth and background of SVP64 as a 3D GPU and VPU Extension.

I am keenly aware that each of them is 300 to 1,000 pages (just like the Power ISA itself).

This is just how it is.

Given the sheer overwhelming size and scope of SVP64 we have gone to CONSIDERABLE LENGTHS to provide justification and rationalisation for adding the various sub-extensions to the Base Scalar Power ISA.

* Scalar bitmanipulation is justifiable for the exact same reasons the extensions are justifiable for other ISAs.  The additional justification for their inclusion, where some instructions are already (sort-of) present in VSX, is that VSX is not mandatory, and the complexity of implementing VSX is too high a price to pay at the Embedded SFFS Compliancy Level.

* Scalar FP-to-INT conversions, likewise.  ARM has a JavaScript conversion instruction (FJCVTZS); Power ISA does not (and it costs a ridiculous 45 instructions to implement, including 6 branches!)

* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable for High-Performance Compute workloads.
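
To give a feel for what the JavaScript FP-to-INT conversion mentioned above has to do, here is a minimal Python sketch. It assumes the conversion in question is the ECMAScript `ToInt32` operation (truncate toward zero, wrap modulo 2^32, reinterpret as signed). The function name `js_to_int32` is made up for the example, and the sketch says nothing about how the Power ISA instruction sequence is actually written.

```python
import math


def js_to_int32(x: float) -> int:
    """Sketch of ECMAScript ToInt32 semantics (assumed to be the
    conversion meant in the text)."""
    # NaN and infinities convert to zero rather than saturating.
    if math.isnan(x) or math.isinf(x):
        return 0
    n = math.trunc(x)        # round toward zero
    n &= 0xFFFF_FFFF         # wrap modulo 2**32
    # Reinterpret the low 32 bits as a signed two's-complement value.
    return n - 0x1_0000_0000 if n >= 0x8000_0000 else n


print(js_to_int32(3.7))            # 3
print(js_to_int32(-3.7))           # -3
print(js_to_int32(2147483648.0))   # -2147483648 (wraps, does not saturate)
print(js_to_int32(float("nan")))   # 0
```

The special cases (NaN, infinity, wrap-around instead of saturation) are the kind of thing that tends to turn into branches when no dedicated instruction exists.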

It also has to be pointed out that normally this work would be covered by multiple separate full-time Workgroups with multiple Members contributing their time and resources!

Overall, the contributions that we are developing take the Power ISA out of the specialist highly-focussed market it is presently best known for, and expand it into areas with much wider general adoption and broader uses.

---

OpenCL specifications are linked here; these are relevant when we get to a 3D GPU / High Performance Compute ISA WG RFC:
[[openpower/transcendentals]]

(Failure to add Transcendentals to a 3D GPU is directly equivalent to *willfully* designing a product that is 100% destined for commercial failure.)

I mention these because they will be encountered in every single commercial GPU ISA, but they're not part of the "Base" (core design) of a Vector Processor. Transcendentals can be added as a sub-RFC.

---

Actual 3D GPU Architectures and ISAs:
-------------------------------------

* Broadcom Videocore
  <https://github.com/hermanhermitage/videocoreiv>

* Etnaviv
  <https://github.com/etnaviv/etna_viv/tree/master/doc>

* Nyuzi
  <http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf>

* MALI
  <https://github.com/cwabbott0/mali-isa-docs>

* AMD
  <https://developer.amd.com/wp-content/resources/RDNA_Shader_ISA.pdf>
  <https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf>

* MIAOW, which is *NOT* a 3D GPU: it is a processor that happens to implement a subset of the AMDGPU ISA (Southern Islands), aka a "GPGPU"
  <https://miaowgpu.org/>


Actual Vector Processor Architectures and ISAs:
-----------------------------------------------

* NEC SX Aurora
  <https://www.hpc.nec/documents/guide/pdfs/Aurora_ISA_guide.pdf>

* Cray ISA
  <http://www.bitsavers.org/pdf/cray/CRAY_Y-MP/HR-04001-0C_Cray_Y-MP_Computer_Systems_Functional_Description_Jun90.pdf>

* RISC-V RVV
  <https://github.com/riscv/riscv-v-spec>

* MRISC32 ISA Manual (under active development)
  <https://github.com/mrisc32/mrisc32/tree/master/isa-manual>

* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from Mitch on direct contact with him.  It takes a different approach from the others, which may be termed "Cray-Style Horizontal-First" Vectorisation: 66000 is a *Vertical-First* Vector ISA.

The terms Horizontal and Vertical allude to the Matrix "Row-First" or "Column-First" technique, where:

* Horizontal-First processes all elements in a Vector before moving on to the next instruction
* Vertical-First processes *ONE* element per instruction, and requires loop constructs to explicitly step to the next element.
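
The difference between the two orderings can be sketched as two nested loops with the nesting swapped. This is a toy model under stated assumptions: the function names, the list-of-strings "program", and the trace format are all invented for illustration and do not correspond to any real ISA encoding.

```python
def horizontal_first(program, vl):
    """Each instruction processes all VL elements before the next one starts."""
    trace = []
    for insn in program:
        for element in range(vl):
            trace.append((insn, element))
    return trace


def vertical_first(program, vl):
    """Each instruction processes one element; an explicit outer loop
    steps the whole instruction sequence on to the next element."""
    trace = []
    for element in range(vl):
        for insn in program:
            trace.append((insn, element))
    return trace


program = ["mul", "add"]   # hypothetical two-instruction loop body
print(horizontal_first(program, 3))
# [('mul', 0), ('mul', 1), ('mul', 2), ('add', 0), ('add', 1), ('add', 2)]
print(vertical_first(program, 3))
# [('mul', 0), ('add', 0), ('mul', 1), ('add', 1), ('mul', 2), ('add', 2)]
```

Both orderings visit the same (instruction, element) pairs; only the traversal order differs, which is the "Row-First" versus "Column-First" analogy.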

Vector-type Support by Architecture:

[[!table  data="""
Architecture | Horizontal | Vertical
MyISA 66000  |            | X
Cray         | X          |
SX Aurora    | X          |
RVV          | X          |
SVP64        | X          | X
"""]]

===

Obligatory Dilbert:

<img src="https://assets.amuniversal.com/7fada35026ca01393d3d005056a9545d" width="600" />