From: Luke Kenneth Casson Leighton
Date: Fri, 29 Jul 2022 01:20:22 +0000 (+0100)
Subject: increase ref numbers
X-Git-Tag: opf_rfc_ls005_v1~967
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=5280bed9b2922808c4fafbab312aa0fd38ad00b0;p=libreriscv.git

increase ref numbers
---

diff --git a/openpower/sv/comparison_table.mdwn b/openpower/sv/comparison_table.mdwn
index 4392ba53a..9ec4a8330 100644
--- a/openpower/sv/comparison_table.mdwn
+++ b/openpower/sv/comparison_table.mdwn
@@ -3,12 +3,12 @@
 |ISA name |Num opcodes|Num intrinsics|Taxonomy / Class|setvl scalable|Predicate Masks|Twin Predication|Explicit Vector regs|128-bit ops|Bigint |LDST Fault-First|Data-dep Fail-first|Pred-Result|Matrix HW support|
 |--------------|---------------|------------------|---------------------|-------------------|--------------------|---------------------|-------------------------|----------------|--------|---------------------|-----------------------|----------------|---------------------|
 |Draft SVP64 |5 (1) |see (25) |Scalable (2) |yes |yes |yes (3) |no (4) |see (5) |yes (6) |yes (7) |yes (8) |yes (9) |yes (10) |
-|VSX |700+ |700+? (26) |Packed SIMD |no |no |no |yes (11) |yes |no |no |no |no |yes (12) |
-|NEON |~250 (13) |7088 (27) |Packed SIMD |no |no |no |yes |yes |no |no |no |no |no |
-|SVE2 |~1000 (14) |6040 (28) |Predicated SIMD(15) |no (15) |yes |no |yes |yes |no |yes (7) |no |no |yes (32) |
-|AVX512 (16) |~1000s (17) |7256 (29) |Predicated SIMD |no |yes |no |yes |yes |no |no |no |no |no |
-|RVV (18) |~190 (19) |~25000 (30) |Scalable (20) |yes |yes |no |yes |yes (21) |no |yes |no |no |no |
-|Aurora SX(22) |~200 (23) |unknown (31) |Scalable (24) |yes |yes |no |yes |no |no |no |no |no |? |
+|VSX |700+ |700+? (26) |Packed SIMD |no |no |no |yes (12) |yes |no |no |no |no |yes (13) |
+|NEON |~250 (14) |7088 (28) |Packed SIMD |no |no |no |yes |yes |no |no |no |no |no |
+|SVE2 |~1000 (15) |6040 (29) |Predicated SIMD(16) |no (16) |yes |no |yes |yes |no |yes (7) |no |no |yes (33) |
+|AVX512 (17) |~1000s (18) |7256 (30) |Predicated SIMD |no |yes |no |yes |yes |no |no |no |no |no |
+|RVV (19) |~190 (20) |~25000 (31) |Scalable (21) |yes |yes |no |yes |yes (22) |no |yes |no |no |no |
+|Aurora SX(23) |~200 (24) |unknown (32) |Scalable (25) |yes |yes |no |yes |no |no |no |no |no |? |
 
 * (1): plus EXT001 24-bit prefixing using 25% of EXT001 space. See [[sv/svp64]]
 * (2): A 2-Dimensional Scalable Vector ISA **specifically designed for the Power ISA** with both Horizontal-First and Vertical-First Modes. See [[sv/vector_isa_comparison]]
@@ -20,32 +20,32 @@
 * (8): Based on LD/ST Fail-first, extended to data. See [[sv/svp64/appendix]]
 * (9): Turns standard ops into a type of "cmp". See [[sv/svp64/appendix]]
 * (10): Any non-power-of-two Matrices up to 127 FMACs (or other FMA-style op), full triple-loop Schedule. Also DCT (Lee) and FFT Full (RADIX2) Triple-loops supported. See [[sv/remap]]
-* (11): VSX's Vector Registers are mis-named: they are 100% PackedSIMD. AVX-512 is not a Vector ISA either. See [Flynn's Taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy)
-* (12): Power ISA v3.1 contains "Matrix Multiply Assist" (MMA) which due to PackedSIMD is restricted to RADIX2 and requires inline assembler loop-unrolling for non-power-of-two Matrix dimensions
-* (13): difficult to ascertain, see [NEON/VFP](https://developer.arm.com/documentation/den0018/a/NEON-and-VFP-Instruction-Summary/List-of-all-NEON-and-VFP-instructions).
+* (12): VSX's Vector Registers are mis-named: they are 100% PackedSIMD. AVX-512 is not a Vector ISA either. See [Flynn's Taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy)
+* (13): Power ISA v3.1 contains "Matrix Multiply Assist" (MMA) which due to PackedSIMD is restricted to RADIX2 and requires inline assembler loop-unrolling for non-power-of-two Matrix dimensions
+* (14): difficult to ascertain, see [NEON/VFP](https://developer.arm.com/documentation/den0018/a/NEON-and-VFP-Instruction-Summary/List-of-all-NEON-and-VFP-instructions).
 Critically depends on ARM Scalar instructions
-* (14): difficult to exactly ascertain, see ARM Architecture Reference Manual Supplement, DDI 0584. Critically depends on ARM Scalar instructions.
-* (15): ARM states that the Scalability is a [Silicon-partner choice](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/102340_0001_00_en_introduction-to-sve2.pdf?revision=aae96dd2-5334-4ad3-9a47-393086a20fea).
+* (15): difficult to exactly ascertain, see ARM Architecture Reference Manual Supplement, DDI 0584. Critically depends on ARM Scalar instructions.
+* (16): ARM states that the Scalability is a [Silicon-partner choice](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/102340_0001_00_en_introduction-to-sve2.pdf?revision=aae96dd2-5334-4ad3-9a47-393086a20fea).
 Scalability in the ISA is **not available to the programmer**: there is no `setvl` instruction in SVE2, which is already causing assembler programmer difficulties.
 [quote](https://gist.github.com/zingaburga/805669eb891c820bd220418ee3f0d6bd#file-sve2-md) **"you may be stuck with only using the bottom 128 bits of the vector, or need to code specifically for each width"**
-* (16): [AVX512 Wikipedia](https://en.wikipedia.org/wiki/AVX-512), [Lifecycle of an instruction set](https://media.handmade-seattle.com/tom-forsyth/) including full slides
-* (17): difficult to exactly ascertain, contains subsets. Critically depends on ISA support from earlier x86 ISA subsets (several more thousand instructions). See [SIMD ISA listing](https://www.officedaytime.com/simd512e/)
-* (18): [RVV Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
-* (19): RISC-V Vectors are not stand-alone, i.e. like SVE2 and AVX-512 are critically dependent on the Scalar ISA (an additional ~96 instructions for the Scalar RV64GC set, needed for Linux).
-* (20): Like the original Cray RVV is a truly scalable Vector ISA (Cray setvl instruction). However, like SVE2, the Maximum Vector length is a Silicon-partner choice, which creates similar limitations that SVP64 does not have.
+* (17): [AVX512 Wikipedia](https://en.wikipedia.org/wiki/AVX-512), [Lifecycle of an instruction set](https://media.handmade-seattle.com/tom-forsyth/) including full slides
+* (18): difficult to exactly ascertain, contains subsets. Critically depends on ISA support from earlier x86 ISA subsets (several more thousand instructions). See [SIMD ISA listing](https://www.officedaytime.com/simd512e/)
+* (19): [RVV Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
+* (20): RISC-V Vectors are not stand-alone, i.e. like SVE2 and AVX-512 are critically dependent on the Scalar ISA (an additional ~96 instructions for the Scalar RV64GC set, needed for Linux).
+* (21): Like the original Cray RVV is a truly scalable Vector ISA (Cray setvl instruction). However, like SVE2, the Maximum Vector length is a Silicon-partner choice, which creates similar limitations that SVP64 does not have.
 The RISC-V Founders strongly discourage efforts by programmers to find out the Silicon's Maximum Vector Length, as an effort to steer programmers towards Silicon-independent assembler.
 This requires **all** algorithms to contain a loop construct. MAXVL in SVP64 is a Spec-hard-fixed quantity therefore loop constructs are not necessary 100% of the time.
-* (21): like SVP64 it is up to the hardware implementor (Silicon partner) to choose whether to support 128-bit elements.
-* (22): [NEC SX Aurora](https://ftp.libre-soc.org/NEC_SX_Aurora_TSUBASA_VectorEngine-as-manual-v1.2.pdf) is based on the original Cray Vectors
-* (23): [Aurora ISA guide](https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/Aurora_ISA_guide.pdf) Appendix-3 11.1 p508
-* (24): Like the original Cray Vectors, the ISA Vector Length is independent of the underlying hardware, however Generation 1 has 256 elements per Vector register (3.2.4 p24, Aurora ISA guide)
-* (25): If treated as a 1-Dimensional ISA, and designed badly, the 24-bit Prefix expands 200+ scalar instructions to well over a million intrinsics (N~=10^4 **times** M~=10^2).
+* (22): like SVP64 it is up to the hardware implementor (Silicon partner) to choose whether to support 128-bit elements.
+* (23): [NEC SX Aurora](https://ftp.libre-soc.org/NEC_SX_Aurora_TSUBASA_VectorEngine-as-manual-v1.2.pdf) is based on the original Cray Vectors
+* (24): [Aurora ISA guide](https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/Aurora_ISA_guide.pdf) Appendix-3 11.1 p508
+* (25): Like the original Cray Vectors, the ISA Vector Length is independent of the underlying hardware, however Generation 1 has 256 elements per Vector register (3.2.4 p24, Aurora ISA guide)
+* (26): If treated as a 1-Dimensional ISA, and designed badly, the 24-bit Prefix expands 200+ scalar instructions to well over a million intrinsics (N~=10^4 **times** M~=10^2).
 If treated as a 2-Dimensional ISA and designed well, there are far less. N prefix intrinsics **plus** M scalar instruction intrinsics, where N is likely to be of the order of 10^2 and M of the order of 10^2.
-* (26): [Altivec gcc intrinsics](https://gcc.gnu.org/onlinedocs/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html), contains links to additional VSX intrinsics for ISA 2.05/6/7, 3.0 and 3.1
-* (27): NEON 32-bit 2754 intrinsics, NEON 64-bit 4334 intrinsics.
-* (28): SVE: 4140 intrinsics, SVE2 1900 intrinsics
-* (29): Count includes SSE, SSE2, AVX, AVX2 and all AVX512 variants
-* (30): [RVV intrinsics listing](https://raw.githubusercontent.com/riscv-non-isa/rvv-intrinsic-doc/master/intrinsic_funcs.md) page is 25,000 lines long.
-* (31): Unknown. estimated to be of the order of length of RVV due to also being a Cray-style Scalable ISA.
-* (32): [Scalable Matrix Optional Extension](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/scalable-matrix-extension-armv9-a-architecture)
+* (27): [Altivec gcc intrinsics](https://gcc.gnu.org/onlinedocs/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html), contains links to additional VSX intrinsics for ISA 2.05/6/7, 3.0 and 3.1
+* (28): NEON 32-bit 2754 intrinsics, NEON 64-bit 4334 intrinsics.
+* (29): SVE: 4140 intrinsics, SVE2 1900 intrinsics
+* (30): Count includes SSE, SSE2, AVX, AVX2 and all AVX512 variants
+* (31): [RVV intrinsics listing](https://raw.githubusercontent.com/riscv-non-isa/rvv-intrinsic-doc/master/intrinsic_funcs.md) page is 25,000 lines long.
+* (32): Unknown. estimated to be of the order of length of RVV due to also being a Cray-style Scalable ISA.
+* (33): [Scalable Matrix Optional Extension](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/scalable-matrix-extension-armv9-a-architecture)
 the key is an outer-product instruction [SMOPA](https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/SMOPA--Signed-integer-sum-of-outer-products-and-accumulate-?lang=en) which is very hard to tell at a glance if it is power-2 or non-power-2
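
Note on the intrinsics footnote renumbered to (26) in this patch: it contrasts a multiplicative count (every prefix variant paired with every scalar instruction) with an additive count (prefix and scalar instruction described independently). A minimal sketch of that arithmetic follows, using only the order-of-magnitude figures quoted in the footnote. The function name `intrinsic_counts` and its parameters are hypothetical, chosen for illustration, and are not part of any SVP64 tooling.

```python
def intrinsic_counts(n_prefix, m_scalar):
    """Return (multiplicative, additive) intrinsic counts.

    n_prefix: number of distinct prefix variants (assumed figure)
    m_scalar: number of scalar instructions (assumed figure)
    """
    multiplicative = n_prefix * m_scalar  # 1-Dimensional view: one intrinsic per (prefix, op) pair
    additive = n_prefix + m_scalar        # 2-Dimensional view: prefixes and ops listed separately
    return multiplicative, additive


# 1-Dimensional case from the footnote: N ~= 10^4 times M ~= 10^2
one_dimensional, _ = intrinsic_counts(10**4, 10**2)

# 2-Dimensional case from the footnote: N ~= 10^2 plus M ~= 10^2
_, two_dimensional = intrinsic_counts(10**2, 10**2)

print(one_dimensional, two_dimensional)
```

With these figures the multiplicative count is on the order of a million and the additive count on the order of a few hundred, which is the gap the footnote describes.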