From e20b4aa1e1f9add001782e3d9dbc90a4ab80764e Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Fri, 17 Jun 2022 11:35:37 +0100
Subject: [PATCH] split vector comparative analysis into separate page

---
 openpower/sv.mdwn                       | 140 +-----------------------
 openpower/sv/vector_isa_comparison.mdwn | 134 +++++++++++++++++++++++
 2 files changed, 136 insertions(+), 138 deletions(-)
 create mode 100644 openpower/sv/vector_isa_comparison.mdwn

diff --git a/openpower/sv.mdwn b/openpower/sv.mdwn
index f91319e55..c8156cbe1 100644
--- a/openpower/sv.mdwn
+++ b/openpower/sv.mdwn
@@ -132,6 +132,8 @@ Pages being developed and examples
   contains explanations and further details
 * [[sv/svp64_quirks]] things in SVP64 that slightly break the rules
 * [[opcode_regs_deduped]] autogenerated table of SVP64 instructions
+* [[sv/vector_comparative_analysis] - a list of Packed SIMD, GPU,
+  and other Scalable Vector ISAs
 * [[sv/sprs]] SPRs
 * SVP64 "Modes":
   - For condition register operations see [[sv/cr_ops]] - SVP64 Condition
@@ -194,141 +196,3 @@ Additional links:
 * [[openpower/sv/llvm]]
 * [[openpower/sv/effect-of-more-decode-stages-on-reg-renaming]]
 
-===
-
-Required Background Reading:
-============================
-
-These are all, deep breath, basically... required reading, *as well as
-and in addition* to a full and comprehensive deep technical understanding
-of the Power ISA, in order to understand the depth and background on
-SVP64 as a 3D GPU and VPU Extension.
-
-I am keenly aware that each of them is 300 to 1,000 pages (just like
-the Power ISA itself).
-
-This is just how it is.
-
-Given the sheer overwhelming size and scope of SVP64 we have gone to
-**considerable lengths** to provide justification and rationalisation for
-adding the various sub-extensions to the Base Scalar Power ISA.
-
-* Scalar bitmanipulation is justifiable for the exact same reasons the
-  extensions are justifiable for other ISAs. The additional justification
-  for their inclusion where some instructions are already (sort-of) present
-  in VSX is that VSX is not mandatory, and the complexity of implementation
-  of VSX is too high a price to pay at the Embedded SFFS Compliancy Level.
-* Scalar FP-to-INT conversions, likewise. ARM has a javascript conversion
-  instruction, Power ISA does not (and it costs a ridiculous 45 instructions
-  to implement, including 6 branches!)
-* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable
-  for High-Performance Compute workloads.
-
-It also has to be pointed out that normally this work would be covered by
-multiple separate full-time Workgroups with multiple Members contributing
-their time and resources.
-
-Overall the contributions that we are developing take the Power ISA out of
-the specialist highly-focussed market it is presently best known for, and
-expands it into areas with much wider general adoption and broader uses.
-
-
----
-
-OpenCL specifications are linked here, these are relevant when we get
-to a 3D GPU / High Performance Compute ISA WG RFC:
-[[openpower/transcendentals]]
-
-(Failure to add Transcendentals to a 3D GPU is directly equivalent to
-*willfully* designing a product that is 100% destined for commercial
-rejection, due to the extremely high competitive performance/watt achieved
-by today's mass-volume GPUs.)
-
-I mention these because they will be encountered in every single
-commercial GPU ISA, but they're not part of the "Base" (core design)
-of a Vector Processor. Transcendentals can be added as a sub-RFC.
-
----
-
-SIMD ISAs commonly mistaken for Vector:
----------------------------------------
-
-There is considerable confusion surrounding Vector ISAs
-because of a mis-use of the word "Vector" in most
-well-known Packed SIMD ISAs.
-
-* PackedSIMD VSX. VSX, which has the word "Vector" in its name,
-  is "inspired" by Vector Processing
-  but has no "Scaling" capability, and no Predicate masking.
-  Adding Predicate Masks to the PackedSIMD VSX ISA
-  would effectively double the number of PackedSIMD
-  instructions (750 becomes 1,500)
-* [AVX / AVX2 / AVX128 / AVX256 / AVX512](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)
-  again has the word "Vector" in its name but this in no
-  way makes it a Vector ISA. None of the AVX-\* family
-  are "Scalable" however there is at least Predicate Masking
-  in AVX-512.
-* ARM NEON - accurately described as a Packed SIMD ISA in
-  all literature.
-* ARM SVE / SVE2 - accurately described as a Scalable Vector
-  ISA, but the "Scaling" is, rather unfortunately, a parameter
-  that is chosen by the *Hardware Architect*, rather than
-  the programmer. This has resulted in programmers writing
-  multiple variants of hand-coded assembler in order
-  to target different machines with different hardware widths,
-  going directly against the advice given on ARM's developer
-  documentation.
-
-
-Actual 3D GPU Architectures and ISAs:
--------------------------------------
-
-* Broadcom Videocore
-
-* Etnaviv
-
-* Nyuzi
-
-* MALI
-
-* AMD
-
-
-* MIAOW which is *NOT* a 3D GPU, it is a processor which happens to implement a subset of the AMDGPU ISA (Southern Islands), aka a "GPGPU"
-
-
-
-Actual Scalar Vector Processor Architectures and ISAs:
-------------------------------------------------------
-
-* NEC SX Aurora
-
-* Cray ISA
-
-* RISC-V RVV
-
-* MRISC32 ISA Manual (under active development)
-
-* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from
-  Mitch on direct contact with him. It is a different approach from the
-  others, which may be termed "Cray-Style Horizontal-First" Vectorisation.
-  66000 is a *Vertical-First* Vector ISA.
-
-The term Horizontal or Vertical alludes to the Matrix "Row-First" or
-"Column-First" technique, where:
-
-* Horizontal-First processes all elements in a Vector before moving on
-  to the next instruction
-* Vertical-First processes *ONE* element per instruction, and requires
-  loop constructs to explicitly step to the next element.
-
-Vector-type Support by Architecture
-[[!table data="""
-Architecture | Horizontal | Vertical
-MyISA 66000 | | X
-Cray | X |
-SX Aurora | X |
-RVV | X |
-SVP64 | X | X
-"""]]
-
diff --git a/openpower/sv/vector_isa_comparison.mdwn b/openpower/sv/vector_isa_comparison.mdwn
new file mode 100644
index 000000000..67d1a675b
--- /dev/null
+++ b/openpower/sv/vector_isa_comparison.mdwn
@@ -0,0 +1,134 @@
+[[!tag standards]]
+
+# Comparative analysis
+
+These are all, deep breath, basically... required reading, *as well as
+and in addition* to a full and comprehensive deep technical understanding
+of the Power ISA, in order to understand the depth and background on
+SVP64 as a 3D GPU and VPU Extension.
+
+I am keenly aware that each of them is 300 to 1,000 pages (just like
+the Power ISA itself).
+
+This is just how it is.
+
+Given the sheer overwhelming size and scope of SVP64 we have gone to
+**considerable lengths** to provide justification and rationalisation for
+adding the various sub-extensions to the Base Scalar Power ISA.
+
+* Scalar bitmanipulation is justifiable for the exact same reasons the
+  extensions are justifiable for other ISAs. The additional justification
+  for their inclusion where some instructions are already (sort-of) present
+  in VSX is that VSX is not mandatory, and the complexity of implementation
+  of VSX is too high a price to pay at the Embedded SFFS Compliancy Level.
+* Scalar FP-to-INT conversions, likewise. ARM has a javascript conversion
+  instruction, Power ISA does not (and it costs a ridiculous 45 instructions
+  to implement, including 6 branches!)
+* Scalar Transcendentals (SIN, COS, ATAN2, LOG) are easily justifiable
+  for High-Performance Compute workloads.
+
+It also has to be pointed out that normally this work would be covered by
+multiple separate full-time Workgroups with multiple Members contributing
+their time and resources.
+
+Overall the contributions that we are developing take the Power ISA out of
+the specialist highly-focussed market it is presently best known for, and
+expands it into areas with much wider general adoption and broader uses.
+
+---
+
+OpenCL specifications are linked here, these are relevant when we get
+to a 3D GPU / High Performance Compute ISA WG RFC:
+[[openpower/transcendentals]]
+
+(Failure to add Transcendentals to a 3D GPU is directly equivalent to
+*willfully* designing a product that is 100% destined for commercial
+rejection, due to the extremely high competitive performance/watt achieved
+by today's mass-volume GPUs.)
+
+I mention these because they will be encountered in every single
+commercial GPU ISA, but they're not part of the "Base" (core design)
+of a Vector Processor. Transcendentals can be added as a sub-RFC.
+
+# SIMD ISAs commonly mistaken for Vector
+
+There is considerable confusion surrounding Vector ISAs
+because of a mis-use of the word "Vector" in most
+well-known Packed SIMD ISAs.
+
+* PackedSIMD VSX. VSX, which has the word "Vector" in its name,
+  is "inspired" by Vector Processing
+  but has no "Scaling" capability, and no Predicate masking.
+  Adding Predicate Masks to the PackedSIMD VSX ISA
+  would effectively double the number of PackedSIMD
+  instructions (750 becomes 1,500)
+* [AVX / AVX2 / AVX128 / AVX256 / AVX512](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)
+  again has the word "Vector" in its name but this in no
+  way makes it a Vector ISA. None of the AVX-\* family
+  are "Scalable" however there is at least Predicate Masking
+  in AVX-512.
+* ARM NEON - accurately described as a Packed SIMD ISA in
+  all literature.
+* ARM SVE / SVE2 - accurately described as a Scalable Vector
+  ISA, but the "Scaling" is, rather unfortunately, a parameter
+  that is chosen by the *Hardware Architect*, rather than
+  the programmer. This has resulted in programmers writing
+  multiple variants of hand-coded assembler in order
+  to target different machines with different hardware widths,
+  going directly against the advice given on ARM's developer
+  documentation.
+
+
+# Actual 3D GPU Architectures and ISAs (all SIMD)
+
+All of these are not Vector ISAs, they are SIMD ISAs.
+
+* Broadcom Videocore
+
+* Etnaviv
+
+* Nyuzi
+
+* MALI
+
+* AMD
+
+
+* MIAOW which is *NOT* a 3D GPU, it is a processor which happens to
+  implement a subset of the AMDGPU ISA (Southern Islands), aka a "GPGPU"
+
+
+
+# Actual Scalar Vector Processor Architectures and ISAs
+
+* NEC SX Aurora
+
+* Cray ISA
+
+* RISC-V RVV
+
+* MRISC32 ISA Manual (under active development)
+
+* Mitch Alsup's MyISA 66000 Vector Processor ISA Manual is available from
+  Mitch on direct contact with him. It is a different approach from the
+  others, which may be termed "Cray-Style Horizontal-First" Vectorisation.
+  66000 is a *Vertical-First* Vector ISA.
+
+The term Horizontal or Vertical alludes to the Matrix "Row-First" or
+"Column-First" technique, where:
+
+* Horizontal-First processes all elements in a Vector before moving on
+  to the next instruction
+* Vertical-First processes *ONE* element per instruction, and requires
+  loop constructs to explicitly step to the next element.
+
+Vector-type Support by Architecture
+[[!table data="""
+Architecture | Horizontal | Vertical
+MyISA 66000 | | X
+Cray | X |
+SX Aurora | X |
+RVV | X |
+SVP64 | X | X
+"""]]
+
-- 
2.30.2
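
Note on the Horizontal-First / Vertical-First distinction described in the moved text: the difference is only in the order in which (instruction, element) pairs are visited. The Python sketch below is a toy model, not SVP64, Cray, or MyISA 66000 semantics. The names `horizontal_first`, `vertical_first`, `PROGRAM`, and `make_regs`, and the representation of registers as Python lists, are all assumptions made purely for illustration.

```python
import operator

# Toy model (hypothetical): a "program" is a list of
# (dest, op, src_a, src_b) tuples; "regs" maps register names to lists.

def horizontal_first(program, regs, vl):
    """Each instruction processes all vl elements before the next one starts."""
    trace = []
    for dest, op, src_a, src_b in program:
        for i in range(vl):
            regs[dest][i] = op(regs[src_a][i], regs[src_b][i])
            trace.append((dest, i))
    return trace

def vertical_first(program, regs, vl):
    """Each instruction processes ONE element; an explicit loop steps the index."""
    trace = []
    for i in range(vl):  # explicit loop construct stepping to the next element
        for dest, op, src_a, src_b in program:
            regs[dest][i] = op(regs[src_a][i], regs[src_b][i])
            trace.append((dest, i))
    return trace

def make_regs():
    """Fresh register file with two input vectors and two result vectors."""
    return {"a": [1, 2, 3], "b": [10, 20, 30], "t": [0, 0, 0], "r": [0, 0, 0]}

PROGRAM = [
    ("t", operator.add, "a", "b"),  # t = a + b
    ("r", operator.mul, "t", "a"),  # r = t * a
]

if __name__ == "__main__":
    for run in (horizontal_first, vertical_first):
        regs = make_regs()
        print(run.__name__, run(PROGRAM, regs, 3), regs["r"])
```

With no cross-element dependencies, as in this toy program, both orderings leave identical register contents; only the visiting order recorded in the trace differs.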