**ISA Comparison Table**

|ISA <br>name |Num <br>opcodes|Num <br>intrinsics|Taxonomy / <br> Class|setvl <br> scalable|Predicate <br> Masks|Twin <br> Predication|Explicit <br> Vector regs|128-bit <br> ops|Bigint |LDST <br> Fault-First|Data-dependent <br> Fail-first|Predicate-<br> Result|Matrix HW<br> support|
|--------------|---------------|------------------|---------------------|-------------------|--------------------|---------------------|-------------------------|----------------|--------|---------------------|------------------------------|---------------------|---------------------|
|Draft SVP64 |5 (1) |see (25) |Scalable (2) |yes |yes |yes (3) |no (4) |see (5) |yes (6) |yes (7) |yes (8) |yes (9) |yes (10) |
|VSX |700+ |700+? (26) |Packed SIMD |no |no |no |yes (11) |yes |no |no |no |no |yes (12) |
|NEON |~250 (13) |7088 (27) |Packed SIMD |no |no |no |yes |yes |no |no |no |no |no |
|SVE2 |~1000 (14) |6040 (28) |Predicated SIMD (15) |no (15) |yes |no |yes |yes |no |yes (7) |no |no |no |
|AVX512 (16) |~1000s (17) |7256 (29) |Predicated SIMD |no |yes |no |yes |yes |no |no |no |no |no |
|RVV (18) |~190 (19) |~25000 (30) |Scalable (20) |yes |yes |no |yes |yes (21) |no |yes |no |no |no |
|Aurora SX (22) |~200 (23) |unknown (31) |Scalable (24) |yes |yes |no |yes |no |no |no |no |no |no |

* (1): plus EXT001 24-bit prefixing. See [[sv/svp64]]
* (2): A 2-Dimensional Scalable Vector ISA **specifically designed for the Power ISA** with both Horizontal-First and Vertical-First Modes. See [[sv/vector_isa_comparison]]
* (3): on specific operations. See [[opcode_regs_deduped]] for the full list
* (4): SVP64 provides a Vector concept on top of the **Scalar** GPR, FPR and CR Fields, extended to 128 entries.
* (5): SVP64 Vectorises Scalar instructions. It is up to the **implementor** to choose (**optionally**) whether to apply SVP64 to e.g. VSX Quad-Precision (128-bit) instructions, to create 128-bit Vector operations.
* (6): Big-integer add is just `sv.adde`. Big-integer multiply and divide require the addition of two scalar operations. See [[sv/biginteger/analysis]]
* (7): See [[sv/svp64/appendix]] and [ARM SVE Fault-First](https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf)
* (8): Based on LD/ST Fail-first, extended to data. See [[sv/svp64/appendix]]
* (9): Turns standard ops into a type of "cmp". See [[sv/svp64/appendix]]
* (10): Any non-power-of-two Matrix up to 127 FMACs. DCT (Lee) and FFT Full (RADIX2) Triple-loops are also supported. See [[sv/remap]]
* (11): VSX's Vector Registers are mis-named: they are 100% PackedSIMD. AVX-512 is not a Vector ISA either. See [Flynn's Taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy)
* (12): Power ISA v3.1 contains "Matrix Multiply Assist" (MMA) which, due to being PackedSIMD, is restricted to RADIX2 and requires inline assembler loop-unrolling for non-power-of-two Matrix dimensions.
* (13): difficult to ascertain, see [NEON/VFP](https://developer.arm.com/documentation/den0018/a/NEON-and-VFP-Instruction-Summary/List-of-all-NEON-and-VFP-instructions).
  Critically depends on ARM Scalar instructions.
* (14): difficult to exactly ascertain, see ARM Architecture Reference Manual Supplement, DDI 0584. Critically depends on ARM Scalar instructions.
* (15): ARM states that the Scalability is a [Silicon-partner choice](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/102340_0001_00_en_introduction-to-sve2.pdf?revision=aae96dd2-5334-4ad3-9a47-393086a20fea).
  Scalability in the ISA is **not available to the programmer**: there is no `setvl` instruction in SVE2, which is already causing difficulties for assembler programmers. Effectively this makes SVE2 Predicated SIMD where the SIMD width is chosen by the "Silicon partner".
* (16): [AVX512 Wikipedia](https://en.wikipedia.org/wiki/AVX-512), [Lifecycle of an instruction set](https://media.handmade-seattle.com/tom-forsyth/) including full slides
* (17): difficult to exactly ascertain, contains subsets. Critically depends on ISA support from earlier x86 ISA subsets (several thousand more instructions). See [SIMD ISA listing](https://www.officedaytime.com/simd512e/)
* (18): [RVV Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
* (19): RISC-V Vectors are not stand-alone, i.e. like SVE2 and AVX-512 they are critically dependent on the Scalar ISA (an additional ~96 instructions for the Scalar RV64GC set; RV64GC is equivalent to the Linux Compliancy Level).
* (20): Like the original Cray, RVV is a truly scalable Vector ISA (Cray `setvl` instruction). However, like SVE2, the Maximum Vector length is a Silicon-partner choice, which creates similar limitations that SVP64 does not have.
* (21): Like SVP64, it is up to the hardware implementor to choose whether to support 128-bit elements.
* (22): [NEC SX Aurora](https://ftp.libre-soc.org/NEC_SX_Aurora_TSUBASA_VectorEngine-as-manual-v1.2.pdf) is based on the original Cray Vectors
* (23): [Aurora ISA guide](https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/Aurora_ISA_guide.pdf) Appendix-3 11.1 p508
* (24): Like the original Cray Vectors, the ISA Vector Length is independent of the underlying hardware; however, Generation 1 has 256 elements per Vector register (3.2.4 p24, Aurora ISA guide)
* (25): If treated as a 1-Dimensional ISA, the 24-bit Prefix expands 200+ scalar instructions to well over a million intrinsics (N **times** M).
  If treated as a 2-Dimensional ISA there are far fewer: N prefix intrinsics **plus** M scalar instruction intrinsics, where N is of the order of 10^4 and M is of the order of 10^2.
* (26): [Altivec gcc intrinsics](https://gcc.gnu.org/onlinedocs/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html), contains links to additional VSX intrinsics for ISA 2.05/6/7, 3.0 and 3.1
* (27): NEON 32-bit 2754 intrinsics, NEON 64-bit 4334 intrinsics.
* (28): SVE: 4140 intrinsics, SVE2 1900 intrinsics.
* (29): Count includes SSE, SSE2, AVX, AVX2 and all AVX512 variants.
* (30): [RVV intrinsics listing](https://raw.githubusercontent.com/riscv-non-isa/rvv-intrinsic-doc/master/intrinsic_funcs.md) page is 25,000 lines long.
* (31): Unknown. Estimated to be of the same order as RVV, due to also being a Cray-style Scalable ISA.
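
The arithmetic behind note (25), and the per-ISA totals in notes (27) and (28), can be checked with a short Python sketch. The figures are the order-of-magnitude estimates quoted in the notes, not exact counts: `N_PREFIX` and `M_SCALAR` are illustrative names, and `M_SCALAR = 200` is an assumption taken from the "200+" lower bound.

```python
# Order-of-magnitude estimates quoted in note (25); NOT exact counts.
N_PREFIX = 10**4  # prefix variants, "of the order of 10^4"
M_SCALAR = 200    # scalar instructions, assumed from the "200+" lower bound


def intrinsics_1d(n_prefix, m_scalar):
    """1-Dimensional view: every (prefix, scalar op) pair is its own intrinsic."""
    return n_prefix * m_scalar


def intrinsics_2d(n_prefix, m_scalar):
    """2-Dimensional view: prefix and scalar op are counted independently."""
    return n_prefix + m_scalar


# Totals in the "Num intrinsics" column, rebuilt from notes (27) and (28).
NEON_TOTAL = 2754 + 4334  # NEON 32-bit + NEON 64-bit
SVE2_TOTAL = 4140 + 1900  # SVE + SVE2

if __name__ == "__main__":
    print("1-D (N times M):", intrinsics_1d(N_PREFIX, M_SCALAR))
    print("2-D (N plus M): ", intrinsics_2d(N_PREFIX, M_SCALAR))
    print("NEON total:", NEON_TOTAL)
    print("SVE2 total:", SVE2_TOTAL)
```

With these assumed inputs the 1-Dimensional count is two million against roughly ten thousand for the 2-Dimensional count, which is the gap note (25) describes.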