[[!tag standards]]

# Comparative analysis

The ISA documents linked on this page are all, deep breath, basically...
required reading, *as well as and in addition* to a full and comprehensive
deep technical understanding of the Power ISA, in order to understand the
depth and background of SVP64 as a 3D GPU and VPU Extension.

I am keenly aware that each of them is 300 to 1,000 pages (just like
the Power ISA itself).

This is just how it is.

Given the sheer overwhelming size and scope of SVP64 we have gone to
**considerable lengths** to provide justification and rationalisation for
adding the various sub-extensions to the Base Scalar Power ISA.

* Scalar bitmanipulation is justifiable for the exact same reasons the
  extensions are justifiable for other ISAs. The additional justification
  for including them, even though some instructions are already (sort-of)
  present in VSX, is that VSX is not mandatory, and the complexity of
  implementing VSX is too high a price to pay at the Embedded SFFS
  Compliancy Level.
* Scalar FP-to-INT conversions, likewise. ARM has a JavaScript conversion
  instruction; the Power ISA does not (and the conversion costs a
  ridiculous 45 instructions to implement, including 6 branches!)
* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable
  for High-Performance Compute workloads.
28 for High-Performance Compute workloads.
29
30 It also has to be pointed out that normally this work would be covered by
31 multiple separate full-time Workgroups with multiple Members contributing
32 their time and resources.
33
34 Overall the contributions that we are developing take the Power ISA out of
35 the specialist highly-focussed market it is presently best known for, and
36 expands it into areas with much wider general adoption and broader uses.
37
38 ---
39
40 OpenCL specifications are linked here, these are relevant when we get
41 to a 3D GPU / High Performance Compute ISA WG RFC:
42 [[openpower/transcendentals]]
43
44 (Failure to add Transcendentals to a 3D GPU is directly equivalent to
45 *willfully* designing a product that is 100% destined for commercial
46 rejection, due to the extremely high competitive performance/watt achieved
47 by today's mass-volume GPUs.)
48
49 I mention these because they will be encountered in every single
50 commercial GPU ISA, but they're not part of the "Base" (core design)
51 of a Vector Processor. Transcendentals can be added as a sub-RFC.
52
# SIMD ISAs commonly mistaken for Vector

There is considerable confusion surrounding Vector ISAs
because of a misuse of the word "Vector" in most
well-known Packed SIMD ISAs.

* PackedSIMD VSX. VSX, which has the word "Vector" in its name,
  is "inspired" by Vector Processing
  but has no "Scaling" capability, and no Predicate masking.
  Adding Predicate Masks to the PackedSIMD VSX ISA
  would effectively double the number of PackedSIMD
  instructions (750 becomes 1,500).
* [AVX / AVX2 / AVX128 / AVX256 / AVX512](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)
  again has the word "Vector" in its name but this in no
  way makes it a Vector ISA. None of the AVX-\* family
  are "Scalable"; however, there is at least Predicate Masking
  in AVX-512.
* ARM NEON - accurately described as a Packed SIMD ISA in
  all literature.
* ARM SVE / SVE2 - partially accurately described as a Scalable Vector
  ISA, but the "Scaling" is, rather unfortunately, a parameter
  that is chosen by the *Hardware Architect*, rather than
  the programmer. The actual "Scalable" part as far as the programmer
  is concerned is supposed to be the Predicate Masks. However in
  practice, ARM NEON programmers have found it too hard to adapt and
  have instead attempted to fit the NEON SIMD paradigm on top of SVE.
  This has resulted in programmers writing
  **multiple variants** of near-identical hand-coded assembler in order
  to target different machines with different hardware widths,
  going directly against the advice given in ARM's developer
  documentation.
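
The point about Predicate Masks in the SVE entry above can be illustrated with a minimal Python model. This is not real SVE code or intrinsics: `hw_width` is a hypothetical parameter standing in for the hardware vector width chosen by the Hardware Architect, and `predicated_add` is a hypothetical helper. The sketch shows one loop that works for any width, because a per-lane mask switches off the lanes that fall past the end of the data.

```python
def predicated_add(a, b, hw_width):
    """Add two equal-length lists, hw_width lanes at a time (illustrative model)."""
    n = len(a)
    out = [0] * n
    for base in range(0, n, hw_width):
        # Predicate mask: a lane is active only if its element index is in range.
        mask = [base + lane < n for lane in range(hw_width)]
        for lane in range(hw_width):
            if mask[lane]:
                out[base + lane] = a[base + lane] + b[base + lane]
    return out


# The same code gives the same answer whatever width the "hardware" has.
data_a = [1, 2, 3, 4, 5]
data_b = [10, 20, 30, 40, 50]
results = [predicated_add(data_a, data_b, w) for w in (2, 4, 8, 16)]
```

Writing a separate hand-coded variant for each value of `hw_width`, as described above, gives up exactly this property.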


# Actual 3D GPU Architectures and ISAs (all SIMD)

None of these are Vector ISAs: they are all SIMD ISAs.

* Broadcom Videocore
  <https://github.com/hermanhermitage/videocoreiv>
* Etnaviv
  <https://github.com/etnaviv/etna_viv/tree/master/doc>
* Nyuzi
  <http://www.cs.binghamton.edu/~millerti/nyuziraster.pdf>
* MALI
  <https://github.com/cwabbott0/mali-isa-docs>
* AMD
  <https://developer.amd.com/wp-content/resources/RDNA_Shader_ISA.pdf>
  <https://developer.amd.com/wp-content/resources/Vega_Shader_ISA_28July2017.pdf>
* MIAOW, which is *NOT* a 3D GPU: it is a processor which happens to
  implement a subset of the AMDGPU ISA (Southern Islands), aka a "GPGPU"
  <https://miaowgpu.org/>


# Actual Scalable Vector Processor Architectures and ISAs

* NEC SX Aurora
  <https://www.hpc.nec/documents/guide/pdfs/Aurora_ISA_guide.pdf>
* Cray ISA
  <http://www.bitsavers.org/pdf/cray/CRAY_Y-MP/HR-04001-0C_Cray_Y-MP_Computer_Systems_Functional_Description_Jun90.pdf>
* RISC-V RVV
  <https://github.com/riscv/riscv-v-spec>
* MRISC32 ISA Manual (under active development)
  <https://github.com/mrisc32/mrisc32/tree/master/isa-manual>
* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from
  Mitch on direct contact with him. It takes a different approach from the
  others, which may be termed "Cray-Style Horizontal-First" Vectorisation.
  66000 is a *Vertical-First* Vector ISA.

The terms Horizontal and Vertical allude to the Matrix "Row-First" and
"Column-First" techniques, where:

* Horizontal-First processes all elements in a Vector before moving on
  to the next instruction
* Vertical-First processes *ONE* element per instruction, and requires
  loop constructs to explicitly step to the next element.

Vector-type Support by Architecture

| Architecture | Horizontal | Vertical |
| - | - | - |
| MyISA 66000 | | X |
| Cray | X | |
| SX Aurora | X | |
| RVV | X | |
| SVP64 | X | X |

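
The difference between the two orderings described above can be sketched with a small Python model that only records the order in which (instruction, element) pairs are executed. The function names and the use of instruction-name strings are hypothetical, chosen purely for illustration, and do not model any of the listed ISAs in detail.

```python
def run_horizontal_first(program, vl):
    """Each instruction runs over all vl elements before the next one starts."""
    trace = []
    for instruction in program:
        for element in range(vl):
            trace.append((instruction, element))
    return trace


def run_vertical_first(program, vl):
    """Each instruction handles one element; an explicit loop steps onward."""
    trace = []
    for element in range(vl):  # the explicit loop construct
        for instruction in program:
            trace.append((instruction, element))
    return trace


program = ["mul", "add"]
horizontal = run_horizontal_first(program, 3)
vertical = run_vertical_first(program, 3)
```

With a two-instruction program and a vector length of 3, the horizontal trace runs `mul` on elements 0, 1, 2 and then `add` on elements 0, 1, 2, whereas the vertical trace alternates `mul` and `add` on element 0, then element 1, then element 2. Both visit the same six pairs; only the loop nesting is swapped.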