From: Jacob Lifshay
Date: Fri, 29 Jul 2022 09:18:06 +0000 (-0700)
Subject: name footnotes
X-Git-Tag: opf_rfc_ls005_v1~954
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=2c43f09c02cc52fed32d2b87d6780c5adc0194b4;p=libreriscv.git

name footnotes
---

diff --git a/openpower/sv/comparison_table.mdwn b/openpower/sv/comparison_table.mdwn
index 2feef0524..961a2b63f 100644
--- a/openpower/sv/comparison_table.mdwn
+++ b/openpower/sv/comparison_table.mdwn
@@ -2,52 +2,52 @@
 |ISA name |Num opcodes|Num intrinsics|Taxonomy / Class|setvl scalable|Predicate Masks|Twin Predication|Explicit Vector regs|128-bit ops|Bigint |LDST Fault-First|Data-dep Fail-first|Pred- Result|Matrix HW support|DCT/FFT HW support |
 |--------------|---------------|------------------|---------------------|-------------------|--------------------|---------------------|-------------------------|----------------|--------|---------------------|-----------------------|----------------|---------------------|-----------------------|
-|Draft SVP64 |5 [^1] |see [^26] |Scalable [^2] |yes |yes |yes [^3] |no [^4] |see [^5] |yes [^6] |yes [^7] |yes [^8] |yes [^9] |yes [^10] | yes[^11] |
-|VSX |700+ |700+? [^27] |Packed SIMD |no |no |no |yes [^12] |yes |no |no |no |no |yes [^13] | no |
-|NEON |~250 [^14] |7088 [^28] |Packed SIMD |no |no |no |yes |yes |no |no |no |no |no | no |
-|SVE2 |~1000 [^15] |6040 [^29] |Predicated SIMD[^16] |no [^16] |yes |no |yes |yes |no |yes [^7] |no |no |yes [^33] | no |
-|AVX512 [^17] |~1000s [^18] |7256 [^30] |Predicated SIMD |no |yes |no |yes |yes |no |no |no |no |yes [^34] | no |
-|RVV [^19] |~190 [^20] |~25000 [^31] |Scalable [^21] |yes |yes |no |yes |yes [^22] |no |yes |no |no |no | no |
-|Aurora SX[^23] |~200 [^24] |unknown [^32] |Scalable [^25] |yes |yes |no |yes |no |no |no |no |no |? | no |
+|Draft SVP64 |5 [^svp64_prefix] |see [^svp64_intrin_cnt] |Scalable [^svp64_scalable] |yes |yes |yes [^twin_pred] |no [^svp64_no_vec_regs] |see [^svp64_128] |yes [^bigint] |yes [^fail_first] |yes [^data_fail_first] |yes [^pred_result] |yes [^svp64_mat] | yes[^svp64_fft] |
+|VSX |700+ |700+? [^vsx_intrin] |Packed SIMD |no |no |no |yes [^vsx_vec_regs] |yes |no |no |no |no |yes [^ppc_mma] | no |
+|NEON |~250 [^neon_opcodes] |7088 [^neon_intrin] |Packed SIMD |no |no |no |yes |yes |no |no |no |no |no | no |
+|SVE2 |~1000 [^sve2_opcodes] |6040 [^sve2_intrin] |Predicated SIMD[^sve2_no_setvl] |no [^sve2_no_setvl] |yes |no |yes |yes |no |yes [^fail_first] |no |no |yes [^sve2_mat] | no |
+|AVX512 [^avx512_wikipedia] |~1000s [^avx512_opcodes] |7256 [^avx512_intrin] |Predicated SIMD |no |yes |no |yes |yes |no |no |no |no |yes [^x86_amx] | no |
+|RVV [^rvv_spec] |~190 [^rvv_opcodes] |~25000 [^rvv_intrin] |Scalable [^rvv_scalable] |yes |yes |no |yes |yes [^rvv_128] |no |yes |no |no |no | no |
+|Aurora SX[^sx_aurora] |~200 [^aurora_isa] |unknown [^aurora_intrin] |Scalable [^aurora_scalable] |yes |yes |no |yes |no |no |no |no |no |? | no |

-[^1]: plus EXT001 24-bit prefixing using 25% of EXT001 space. See [[sv/svp64]]
-[^2]: A 2-Dimensional Scalable Vector ISA **specifically designed for the Power ISA** with both Horizontal-First and Vertical-First Modes. See [[sv/vector_isa_comparison]]
-[^3]: on specific operations. See [[opcode_regs_deduped]] for full list. Key: 2P - Twin Predication, 1P - Single-Predicate
-[^4]: SVP64 provides a Vector concept on top of the **Scalar** GPR, FPR and CR Fields, extended to 128 entries.
-[^5]: SVP64 Vectorises Scalar ops. It is up to the **implementor** to choose (**optionally**) whether to apply SVP64 to e.g. VSX Quad-Precision (128-bit) instructions, to create 128-bit Vector ops.
-[^6]: big-integer add is just `sv.adde`. For optimal performance Bigint Mul and divide first require addition of two scalar operations (in turn, naturally Vectorised by SVP64). See [[sv/biginteger/analysis]]
-[^7]: See [[sv/svp64/appendix]] and [ARM SVE Fault-First](https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf)
-[^8]: Based on LD/ST Fail-first, extended to data. See [[sv/svp64/appendix]]
-[^9]: Predicate-result effectively turns any standard op into a type of "cmp". See [[sv/svp64/appendix]]
-[^10]: Any non-power-of-two Matrices up to 127 FMACs (or other FMA-style op), full triple-loop Schedule. See [[sv/remap]]
-[^11]: DCT (Lee) and FFT Full Triple-loops supported, RADIX2-only. Normally only found in VLIW DSPs (TI MSP320, Qualcom Hexagon). See [[sv/remap]]
-[^12]: VSX's Vector Registers are mis-named: they are 100% PackedSIMD. AVX-512 is not a Vector ISA either. See [Flynn's Taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy)
-[^13]: Power ISA v3.1 contains "Matrix Multiply Assist" (MMA) which due to PackedSIMD is restricted to RADIX2 and requires inline assembler loop-unrolling for non-power-of-two Matrix dimensions
-[^14]: difficult to ascertain, see [NEON/VFP](https://developer.arm.com/documentation/den0018/a/NEON-and-VFP-Instruction-Summary/List-of-all-NEON-and-VFP-instructions).
+[^svp64_prefix]: plus EXT001 24-bit prefixing using 25% of EXT001 space. See [[sv/svp64]]
+[^svp64_scalable]: A 2-Dimensional Scalable Vector ISA **specifically designed for the Power ISA** with both Horizontal-First and Vertical-First Modes. See [[sv/vector_isa_comparison]]
+[^twin_pred]: on specific operations. See [[opcode_regs_deduped]] for full list. Key: 2P - Twin Predication, 1P - Single-Predicate
+[^svp64_no_vec_regs]: SVP64 provides a Vector concept on top of the **Scalar** GPR, FPR and CR Fields, extended to 128 entries.
+[^svp64_128]: SVP64 Vectorises Scalar ops. It is up to the **implementor** to choose (**optionally**) whether to apply SVP64 to e.g. VSX Quad-Precision (128-bit) instructions, to create 128-bit Vector ops.
+[^bigint]: big-integer add is just `sv.adde`. For optimal performance Bigint Mul and divide first require addition of two scalar operations (in turn, naturally Vectorised by SVP64). See [[sv/biginteger/analysis]]
+[^fail_first]: See [[sv/svp64/appendix]] and [ARM SVE Fault-First](https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf)
+[^data_fail_first]: Based on LD/ST Fail-first, extended to data. See [[sv/svp64/appendix]]
+[^pred_result]: Predicate-result effectively turns any standard op into a type of "cmp". See [[sv/svp64/appendix]]
+[^svp64_mat]: Any non-power-of-two Matrices up to 127 FMACs (or other FMA-style op), full triple-loop Schedule. See [[sv/remap]]
+[^svp64_fft]: DCT (Lee) and FFT Full Triple-loops supported, RADIX2-only. Normally only found in VLIW DSPs (TI MSP320, Qualcom Hexagon). See [[sv/remap]]
+[^vsx_vec_regs]: VSX's Vector Registers are mis-named: they are 100% PackedSIMD. AVX-512 is not a Vector ISA either. See [Flynn's Taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy)
+[^ppc_mma]: Power ISA v3.1 contains "Matrix Multiply Assist" (MMA) which due to PackedSIMD is restricted to RADIX2 and requires inline assembler loop-unrolling for non-power-of-two Matrix dimensions
+[^neon_opcodes]: difficult to ascertain, see [NEON/VFP](https://developer.arm.com/documentation/den0018/a/NEON-and-VFP-Instruction-Summary/List-of-all-NEON-and-VFP-instructions).
 Critically depends on ARM Scalar instructions
-[^15]: difficult to exactly ascertain, see ARM Architecture Reference Manual Supplement, DDI 0584. Critically depends on ARM Scalar instructions.
-[^16]: ARM states that the Scalability is a [Silicon-partner choice](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/102340_0001_00_en_introduction-to-sve2.pdf?revision=aae96dd2-5334-4ad3-9a47-393086a20fea).
+[^sve2_opcodes]: difficult to exactly ascertain, see ARM Architecture Reference Manual Supplement, DDI 0584. Critically depends on ARM Scalar instructions.
+[^sve2_no_setvl]: ARM states that the Scalability is a [Silicon-partner choice](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/102340_0001_00_en_introduction-to-sve2.pdf?revision=aae96dd2-5334-4ad3-9a47-393086a20fea).
 Scalability in the ISA is **not available to the programmer**: there is no `setvl` instruction in SVE2, which is already causing assembler programmer difficulties. [quote](https://gist.github.com/zingaburga/805669eb891c820bd220418ee3f0d6bd#file-sve2-md) **"you may be stuck with only using the bottom 128 bits of the vector, or need to code specifically for each width"**
-[^17]: [AVX512 Wikipedia](https://en.wikipedia.org/wiki/AVX-512), [Lifecycle of an instruction set](https://media.handmade-seattle.com/tom-forsyth/) including full slides
-[^18]: difficult to exactly ascertain, contains subsets. Critically depends on ISA support from earlier x86 ISA subsets (several more thousand instructions). See [SIMD ISA listing](https://www.officedaytime.com/simd512e/)
-[^19]: [RVV Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
-[^20]: RISC-V Vectors are not stand-alone, i.e. like SVE2 and AVX-512 are critically dependent on the Scalar ISA (an additional ~96 instructions for the Scalar RV64GC set, needed for Linux).
-[^21]: Like the original Cray RVV is a truly scalable Vector ISA (Cray setvl instruction). However, like SVE2, the Maximum Vector length is a Silicon-partner choice, which creates similar limitations that SVP64 does not have.
+[^avx512_wikipedia]: [AVX512 Wikipedia](https://en.wikipedia.org/wiki/AVX-512), [Lifecycle of an instruction set](https://media.handmade-seattle.com/tom-forsyth/) including full slides
+[^avx512_opcodes]: difficult to exactly ascertain, contains subsets. Critically depends on ISA support from earlier x86 ISA subsets (several more thousand instructions). See [SIMD ISA listing](https://www.officedaytime.com/simd512e/)
+[^rvv_spec]: [RVV Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
+[^rvv_opcodes]: RISC-V Vectors are not stand-alone, i.e. like SVE2 and AVX-512 are critically dependent on the Scalar ISA (an additional ~96 instructions for the Scalar RV64GC set, needed for Linux).
+[^rvv_scalable]: Like the original Cray RVV is a truly scalable Vector ISA (Cray setvl instruction). However, like SVE2, the Maximum Vector length is a Silicon-partner choice, which creates similar limitations that SVP64 does not have.
 The RISC-V Founders strongly discourage efforts by programmers to find out the Silicon's Maximum Vector Length, as an effort to steer programmers towards Silicon-independent assembler. This requires **all** algorithms to contain a loop construct. MAXVL in SVP64 is a Spec-hard-fixed quantity therefore loop constructs are not necessary 100% of the time.
-[^22]: like SVP64 it is up to the hardware implementor (Silicon partner) to choose whether to support 128-bit elements.
-[^23]: [NEC SX Aurora](https://ftp.libre-soc.org/NEC_SX_Aurora_TSUBASA_VectorEngine-as-manual-v1.2.pdf) is based on the original Cray Vectors
-[^24]: [Aurora ISA guide](https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/Aurora_ISA_guide.pdf) Appendix-3 11.1 p508
-[^25]: Like the original Cray Vectors, the ISA Vector Length is independent of the underlying hardware, however Generation 1 has 256 elements per Vector register (3.2.4 p24, Aurora ISA guide)
-[^26]: If treated as a 1-Dimensional ISA, and designed badly, the 24-bit Prefix expands 200+ scalar instructions to well over a million intrinsics (N~=10^4 **times** M~=10^2).
+[^rvv_128]: like SVP64 it is up to the hardware implementor (Silicon partner) to choose whether to support 128-bit elements.
+[^sx_aurora]: [NEC SX Aurora](https://ftp.libre-soc.org/NEC_SX_Aurora_TSUBASA_VectorEngine-as-manual-v1.2.pdf) is based on the original Cray Vectors
+[^aurora_isa]: [Aurora ISA guide](https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/Aurora_ISA_guide.pdf) Appendix-3 11.1 p508
+[^aurora_scalable]: Like the original Cray Vectors, the ISA Vector Length is independent of the underlying hardware, however Generation 1 has 256 elements per Vector register (3.2.4 p24, Aurora ISA guide)
+[^svp64_intrin_cnt]: If treated as a 1-Dimensional ISA, and designed badly, the 24-bit Prefix expands 200+ scalar instructions to well over a million intrinsics (N~=10^4 **times** M~=10^2).
 If treated as a 2-Dimensional ISA and designed well, there are far less. N prefix intrinsics **plus** M scalar instruction intrinsics, where N is likely to be of the order of 10^2 and M of the order of 10^2.
-[^27]: [Altivec gcc intrinsics](https://gcc.gnu.org/onlinedocs/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html), contains links to additional VSX intrinsics for ISA 2.05/6/7, 3.0 and 3.1
-[^28]: NEON 32-bit 2754 intrinsics, NEON 64-bit 4334 intrinsics.
-[^29]: SVE: 4140 intrinsics, SVE2 1900 intrinsics
-[^30]: Count includes SSE, SSE2, AVX, AVX2 and all AVX512 variants
-[^31]: [RVV intrinsics listing](https://raw.githubusercontent.com/riscv-non-isa/rvv-intrinsic-doc/master/intrinsic_funcs.md) page is 25,000 lines long.
-[^32]: Unknown. estimated to be of the order of length of RVV due to also being a Cray-style Scalable ISA, NEC maintains an [LLVM hard fork](https://github.com/sx-aurora-dev)
-[^33]: [Scalable Matrix Optional Extension](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/scalable-matrix-extension-armv9-a-architecture)
+[^vsx_intrin]: [Altivec gcc intrinsics](https://gcc.gnu.org/onlinedocs/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html), contains links to additional VSX intrinsics for ISA 2.05/6/7, 3.0 and 3.1
+[^neon_intrin]: NEON 32-bit 2754 intrinsics, NEON 64-bit 4334 intrinsics.
+[^sve2_intrin]: SVE: 4140 intrinsics, SVE2 1900 intrinsics
+[^avx512_intrin]: Count includes SSE, SSE2, AVX, AVX2 and all AVX512 variants
+[^rvv_intrin]: [RVV intrinsics listing](https://raw.githubusercontent.com/riscv-non-isa/rvv-intrinsic-doc/master/intrinsic_funcs.md) page is 25,000 lines long.
+[^aurora_intrin]: Unknown. estimated to be of the order of length of RVV due to also being a Cray-style Scalable ISA, NEC maintains an [LLVM hard fork](https://github.com/sx-aurora-dev)
+[^sve2_mat]: [Scalable Matrix Optional Extension](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/scalable-matrix-extension-armv9-a-architecture)
 the key is an outer-product instruction [SMOPA](https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/SMOPA--Signed-integer-sum-of-outer-products-and-accumulate-?lang=en) which is very hard to tell at a glance if it is power-2 or non-power-2
-[^34]: [Advanced matrix Extensions](https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions) supports BF16 and INT8 only. Separate regfile, power-of-two "tiles". Not general-purpose at all.
+[^x86_amx]: [Advanced matrix Extensions](https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions) supports BF16 and INT8 only. Separate regfile, power-of-two "tiles". Not general-purpose at all.