[[!tag standards]]

# Comparative analysis

These are all (deep breath) required reading, *in addition* to a full
and comprehensive technical understanding of the Power ISA, in order to
understand the depth and background of SVP64 as a 3D GPU and VPU
Extension.

I am keenly aware that each of them is 300 to 1,000 pages (just like
the Power ISA itself).

This is just how it is.

Given the sheer size and scope of SVP64, we have gone to
**considerable lengths** to provide justification and rationalisation for
adding the various sub-extensions to the Base Scalar Power ISA.

* Scalar bit-manipulation is justifiable for exactly the same reasons
  that such extensions are justifiable in other ISAs. The additional
  justification for including them, even though some of the instructions
  are already (sort-of) present in VSX, is that VSX is not mandatory, and
  the implementation complexity of VSX is too high a price to pay at the
  Embedded SFFS Compliancy Level.
* Scalar FP-to-INT conversions, likewise. ARM has a JavaScript conversion
  instruction; the Power ISA does not (and it costs a ridiculous 45
  instructions to implement, including 6 branches!).
* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable
  for High-Performance Compute workloads.
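
To give a sense of why the JavaScript conversion is awkward without a
dedicated instruction, the ECMAScript `ToInt32` rule that it has to follow
can be sketched in Python. This is only an illustrative model of the
semantics (the name `js_to_int32` is made up here), not the Power ISA
instruction sequence referred to above: NaN and infinities become zero,
the value is truncated toward zero, and the result wraps modulo 2^32
instead of saturating.

```python
import math


def js_to_int32(x: float) -> int:
    """Model of ECMAScript ToInt32 (hypothetical helper, for illustration only)."""
    # NaN and +/-Infinity convert to 0 rather than saturating or trapping.
    if math.isnan(x) or math.isinf(x):
        return 0
    # Truncate toward zero, then wrap modulo 2**32 (Python's % is non-negative here).
    n = math.trunc(x) % (1 << 32)
    # Reinterpret the low 32 bits as a signed two's-complement value.
    return n - (1 << 32) if n >= (1 << 31) else n
```

Each special case here (NaN, infinity, out-of-range wrap, sign
reinterpretation) is a step that a software sequence has to handle
explicitly when no single instruction does it.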

It also has to be pointed out that this work would normally be covered by
multiple separate full-time Workgroups, with multiple Members contributing
their time and resources. In RISC-V there are over sixty Technical Working
Groups: <https://riscv.org/community/directory-of-working-groups/>

Overall, the contributions that we are developing take the Power ISA out of
the specialist, highly-focussed market it is presently best known for, and
expand it into areas with much wider general adoption and broader uses.

---

The OpenCL specifications are linked here. They become relevant when we
get to a 3D GPU / High Performance Compute ISA WG RFC:
[[openpower/transcendentals]]

(Failure to add Transcendentals to a 3D GPU is directly equivalent to
*willfully* designing a product that is 100% destined for commercial
rejection, due to the extremely high competitive performance/watt achieved
by today's mass-volume GPUs.)

I mention these because they will be encountered in every single
commercial GPU ISA, but they're not part of the "Base" (core design)
of a Vector Processor. Transcendentals can be added as a sub-RFC.

# SIMD ISAs commonly mistaken for Vector

There is considerable confusion surrounding Vector ISAs
because of misuse of the word "Vector" in the marketing
material of most well-known Packed SIMD ISAs of the past three
decades. These Packed SIMD ISAs used features "inspired" by
Scalable Vector ISAs.

* PackedSIMD VSX. VSX, which has the word "Vector" in its name,
  is "inspired" by Vector Processing
  but has no "Scaling" capability and no Predicate masking.
  Both of these omissions put pressure on developers to use
  "inline assembler unrolling" and data repetition, which in turn
  is detrimental to both the L1 Data and Instruction Caches.
  Adding Predicate Masks to the PackedSIMD VSX ISA
  would effectively double the number of PackedSIMD
  instructions (750 becomes 1,500), even if it were practical
  to do so (there is no available 32-bit encoding space).
* [AVX / AVX2 / AVX128 / AVX256 / AVX512](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)
  again has the word "Vector" in its name, but this in no
  way makes it a Vector ISA. None of the AVX-\* family
  is "Scalable"; however, there is at least Predicate Masking
  in AVX-512.
* ARM NEON - accurately described as a Packed SIMD ISA in
  all literature.
* ARM SVE / SVE2 - partially accurately described as a Scalable Vector
  ISA, but the "Scaling" is, rather unfortunately, a parameter
  that is chosen by the *Hardware Architect* rather than
  the programmer. The actual "Scalable" part, as far as the programmer
  is concerned, is supposed to be the Predicate Masks. However, in
  practice ARM NEON programmers have found it too hard to adapt and
  have instead attempted to fit the NEON SIMD paradigm on top of SVE.
  This has resulted in programmers writing
  **multiple variants** of near-identical hand-coded assembler in order
  to target different machines with different hardware widths,
  going directly against the advice given in ARM's developer
  documentation.
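
The practical difference that "Scaling" makes to the programmer can be
sketched with a small Python model. Nothing here is real ISA code: the
function names, the `width` parameter and the `max_vl` parameter are all
assumptions made for illustration. The first function mimics a fixed-width
Packed SIMD loop, which needs a separate scalar tail for leftover elements.
The second mimics a Scalable Vector loop, where each pass asks for "as many
elements as remain, up to the hardware maximum", so one loop body covers
every length and every hardware width.

```python
def packed_simd_add(a, b, width=4):
    """Fixed-width Packed SIMD model: main loop plus a separate scalar tail."""
    out = [0] * len(a)
    main = len(a) - len(a) % width
    for i in range(0, main, width):
        # One fixed-width "SIMD instruction" per chunk of exactly `width` elements.
        out[i:i + width] = [x + y for x, y in zip(a[i:i + width], b[i:i + width])]
    for i in range(main, len(a)):
        # Tail cleanup: a second, duplicated code path for the leftovers.
        out[i] = a[i] + b[i]
    return out


def scalable_vector_add(a, b, max_vl=4):
    """Scalable Vector model: the vector length is set afresh on each pass."""
    out = [0] * len(a)
    i = 0
    while i < len(a):
        # Analogous to a "set vector length" request: min(remaining, hardware max).
        vl = min(len(a) - i, max_vl)
        for j in range(vl):
            out[i + j] = a[i + j] + b[i + j]
        i += vl
    return out
```

Changing `max_vl` in the second function changes only how many passes are
made, whereas changing `width` in the first changes where the main loop
ends and the tail begins, which is the kind of per-width variation
described above.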

The saving grace of PackedSIMD VSX is that it did not fall for the
seduction outlined in the "SIMD Considered Harmful" article
<https://www.sigarch.org/simd-instructions-considered-harmful/>.
It is clear that VSX implementations are expected to deploy Multi-Issue
to achieve high performance, which is a much cleaner approach that has
not resulted in ISA poisoning such as that suffered by x86 (AVX).

# Actual 3D GPU Architectures and ISAs (all SIMD)

None of these are Vector ISAs: they are all SIMD ISAs.

* Broadcom Videocore
  <https://github.com/hermanhermitage/videocoreiv>
* Etnaviv
  <https://github.com/etnaviv/etna_viv/tree/master/doc>
* Nyuzi
  <http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf>
* MALI
  <https://github.com/cwabbott0/mali-isa-docs>
* AMD
  <https://developer.amd.com/wp-content/resources/RDNA_Shader_ISA.pdf>
  <https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf>
* MIAOW which is *NOT* a 3D GPU, it is a processor which happens to
  implement a subset of the AMDGPU ISA (Southern Islands), aka a "GPGPU"
  <https://miaowgpu.org/>

# Actual Scalable Vector Processor Architectures and ISAs

* NEC SX Aurora
  <https://www.hpc.nec/documents/guide/pdfs/Aurora_ISA_guide.pdf>
* Cray ISA
  <http://www.bitsavers.org/pdf/cray/CRAY_Y-MP/HR-04001-0C_Cray_Y-MP_Computer_Systems_Functional_Description_Jun90.pdf>
* RISC-V RVV
  <https://github.com/riscv/riscv-v-spec>
* MRISC32 ISA Manual (under active development)
  <https://github.com/mrisc32/mrisc32/tree/master/isa-manual>
* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from
  Mitch on direct contact with him. It takes a different approach from the
  others: they may be termed "Cray-Style Horizontal-First" Vectorisation,
  whereas 66000 is a *Vertical-First* Vector ISA.

The terms Horizontal and Vertical allude to the Matrix "Row-First" or
"Column-First" technique, where:

* Horizontal-First processes all elements in a Vector before moving on
  to the next instruction
* Vertical-First processes *ONE* element per instruction, and requires
  loop constructs to explicitly step to the next element.
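
The two orderings can be illustrated with a toy Python model. It is a
sketch only: the "program" is a hypothetical list of
`(name, operation, destination, source)` tuples acting on Python lists that
stand in for vector registers, and none of it corresponds to real
instruction encodings. Each function returns a trace of
`(instruction, element)` pairs so that the order of execution is visible.

```python
def horizontal_first(program, regs, vl):
    """Each instruction runs over all vl elements before the next one starts."""
    trace = []
    for name, op, dst, src in program:
        for i in range(vl):
            regs[dst][i] = op(regs[src][i])
            trace.append((name, i))
    return trace


def vertical_first(program, regs, vl):
    """Each instruction handles ONE element; an explicit loop steps to the next."""
    trace = []
    for i in range(vl):  # the explicit loop construct
        for name, op, dst, src in program:
            regs[dst][i] = op(regs[src][i])
            trace.append((name, i))
    return trace


# Hypothetical two-instruction program: t = 2 * a, then r = t + 1.
PROGRAM = [
    ("double", lambda x: 2 * x, "t", "a"),
    ("inc", lambda x: x + 1, "r", "t"),
]
```

For purely element-wise operations such as these, both orderings produce
the same final register contents; only the trace differs. Horizontal-First
yields `double` on elements 0, 1, 2 followed by `inc` on elements 0, 1, 2,
whereas Vertical-First alternates `double` and `inc` on element 0, then
element 1, and so on.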

Vector-type Support by Architecture

| Architecture | Horizontal | Vertical |
| ------------ | ---------- | -------- |
| MyISA 66000  |            | X        |
| Cray         | X          |          |
| SX Aurora    | X          |          |
| RVV          | X          |          |
| SVP64        | X          | X        |

![Horizontal vs Vertical](/openpower/sv/sv_horizontal_vs_vertical.svg)