restored numbers for footnotes, got width of ascii table back down to under 250 chara...

author Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Fri, 29 Jul 2022 11:31:29 +0000 (12:31 +0100)

committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Fri, 29 Jul 2022 11:31:29 +0000 (12:31 +0100)
diff --git a/openpower/sv/comparison_table.mdwn b/openpower/sv/comparison_table.mdwn
index 67b066be62232f64cfc0aaffa7e458659b11472a..36c45f367ab7e8d640f1e48fcd913a1af2f34aae 100644
--- a/openpower/sv/comparison_table.mdwn
+++ b/openpower/sv/comparison_table.mdwn
@@ -1,53 +1,53 @@
  **ISA Comparison Table** - discussion and research at <https://bugs.libre-soc.org/show_bug.cgi?id=893>
  
-|ISA <br>name              |Num <br>opcodes         |Num <br>intrinsics      |Taxonomy / <br> Class          |setvl <br> scalable|Predicate <br> Masks|Twin <br> Predication|Explicit <br> Vector regs|128-bit <br> ops|Bigint       |LDST <br> Fault-First|Data-dep<br> Fail-first|Pred-<br> Result  |Matrix HW<br> support|DCT/FFT HW<br> support |
-|--------------------------|------------------------|------------------------|-------------------------------|-------------------|--------------------|---------------------|-------------------------|----------------|-------------|---------------------|-----------------------|------------------|---------------------|-----------------------|
-|Draft SVP64               |5 [^svp64_prefix]       |see [^svp64_intrin_cnt] |Scalable [^svp64_scalable]     |yes                |yes                 |yes [^twin_pred]     |no [^svp64_no_vec_regs]  |see [^svp64_128]|yes [^bigint]|yes [^fail_first]    |yes [^data_fail_first] |yes [^pred_result]|yes [^svp64_mat]     | yes[^svp64_fft]       |
-|VSX                       |700+                    |700+? [^vsx_intrin]     |Packed SIMD                    |no                 |no                  |no                   |yes [^vsx_vec_regs]      |yes             |no           |no                   |no                     |no                |yes [^ppc_mma]       | no                    |
-|NEON                      |~250 [^neon_opcodes]    |7088 [^neon_intrin]     |Packed SIMD                    |no                 |no                  |no                   |yes                      |yes             |no           |no                   |no                     |no                |no                   | no                    |
-|SVE2                      |~1000 [^sve2_opcodes]   |6040 [^sve2_intrin]     |Predicated SIMD[^sve2_no_setvl]|no [^sve2_no_setvl]|yes                 |no                   |yes                      |yes             |no           |yes [^fail_first]    |no                     |no                |yes [^sve2_mat]      | no                    |
-|AVX512 [^avx512_wikipedia]|~1000s [^avx512_opcodes]|7256 [^avx512_intrin]   |Predicated SIMD                |no                 |yes                 |no                   |yes                      |yes             |no           |no                   |no                     |no                |yes [^x86_amx]       | no                    |
-|RVV [^rvv_spec]           |~190 [^rvv_opcodes]     |~25000 [^rvv_intrin]    |Scalable [^rvv_scalable]       |yes                |yes                 |no                   |yes                      |yes [^rvv_128]  |no           |yes                  |no                     |no                |no                   | no                    |
-|Aurora SX[^sx_aurora]     |~200 [^aurora_isa]      |unknown [^aurora_intrin]|Scalable [^aurora_scalable]    |yes                |yes                 |no                   |yes                      |no              |no           |no                   |no                     |no                |?                    | no                    |
+|ISA <br>name   |No <br>opcodes|No <br>intrinsics|Taxonomy / <br>Class|setvl <br> scalable|Predicate <br> Masks|Twin <br>Pred|Vector <br>regs |128-bit <br> ops |Bigint |LDST <br>F/First|Data-dep<br> Fail-first|Pred-<br> Result|HW<br> Matrix|DCT/FFT <br>HW|
+|---------------|--------------|-----------------|--------------------|-------------------|--------------------|-------------|----------------|-----------------|-------|----------------|-----------------------|----------------|-------------|--------------|
+|SVP64          |5 [^1]        |see [^2]         |Scalable [^3]       |yes                |yes                 |yes [^4]     |no [^5]         |see [^6]         |yes[^7]|yes [^8]        |yes [^9]               |yes [^10]       |yes [^11]    | yes[^12]     |
+|VSX            |700+          |700+? [^27]      |Packed SIMD         |no                 |no                  |no           |yes [^13]       |yes              |no     |no              |no                     |no              |yes [^14]    | no           |
+|NEON           |~250 [^15]    |7088 [^28]       |Packed SIMD         |no                 |no                  |no           |yes             |yes              |no     |no              |no                     |no              |no           | no           |
+|SVE2           |~1000 [^16]   |6040 [^29]       |Predicated SIMD[^17]|no [^17]           |yes                 |no           |yes             |yes              |no     |yes [^8]        |no                     |no              |yes [^33]    | no           |
+|AVX512 [^18]   |~1000s [^19]  |7256 [^30]       |Predicated SIMD     |no                 |yes                 |no           |yes             |yes              |no     |no              |no                     |no              |yes [^34]    | no           |
+|RVV [^20]      |~190 [^21]    |~25000 [^31]     |Scalable [^22]      |yes                |yes                 |no           |yes             |yes [^23]        |no     |yes             |no                     |no              |no           | no           |
+|Aurora SX[^24] |~200 [^25]    |unknown [^32]    |Scalable [^26]      |yes                |yes                 |no           |yes             |no               |no     |no              |no                     |no              |?            | no           |
  
-[^svp64_prefix]: plus EXT001 24-bit prefixing using 25% of EXT001 space. See [[sv/svp64]]
-[^svp64_scalable]: A 2-Dimensional Scalable Vector ISA **specifically designed for the Power ISA** with both Horizontal-First and Vertical-First Modes. See [[sv/vector_isa_comparison]]
-[^twin_pred]: on specific operations.  See [[opcode_regs_deduped]] for full list. Key: 2P - Twin Predication, 1P - Single-Predicate
-[^svp64_no_vec_regs]: SVP64 provides a Vector concept on top of the **Scalar** GPR, FPR and CR Fields, extended to 128 entries.
-[^svp64_128]: SVP64 Vectorises Scalar ops. It is up to the **implementor** to choose (**optionally**) whether to apply SVP64 to e.g. VSX Quad-Precision (128-bit) instructions, to create 128-bit Vector ops.
-[^bigint]: big-integer add is just `sv.adde`. For optimal performance Bigint Mul and divide first require addition of two scalar operations (in turn, naturally Vectorised by SVP64). See [[sv/biginteger/analysis]]
-[^fail_first]: See [[sv/svp64/appendix]] and [ARM SVE Fault-First](https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf)
-[^data_fail_first]: Based on LD/ST Fail-first, extended to data. See [[sv/svp64/appendix]]
-[^pred_result]: Predicate-result effectively turns any standard op into a type of "cmp". See [[sv/svp64/appendix]]
-[^svp64_mat]: Any non-power-of-two Matrices up to 127 FMACs (or other FMA-style op), full triple-loop Schedule. See [[sv/remap]]
-[^svp64_fft]: DCT (Lee) and FFT Full Triple-loops supported, RADIX2-only. Normally only found in VLIW DSPs (TI MSP320, Qualcom Hexagon). See [[sv/remap]]
-[^vsx_vec_regs]: VSX's Vector Registers are mis-named: they are 100% PackedSIMD. AVX-512 is not a Vector ISA either.  See [Flynn's Taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy)
-[^ppc_mma]: Power ISA v3.1 contains "Matrix Multiply Assist" (MMA) which due to PackedSIMD is restricted to RADIX2 and requires inline assembler loop-unrolling for non-power-of-two Matrix dimensions
-[^neon_opcodes]: difficult to ascertain, see [NEON/VFP](https://developer.arm.com/documentation/den0018/a/NEON-and-VFP-Instruction-Summary/List-of-all-NEON-and-VFP-instructions).
+[^1]: plus EXT001 24-bit prefixing using 25% of EXT001 space. See [[sv/svp64]]
+[^3]: A 2-Dimensional Scalable Vector ISA **specifically designed for the Power ISA** with both Horizontal-First and Vertical-First Modes. See [[sv/vector_isa_comparison]]
+[^4]: on specific operations.  See [[opcode_regs_deduped]] for full list. Key: 2P - Twin Predication, 1P - Single-Predicate
+[^5]: SVP64 provides a Vector concept on top of the **Scalar** GPR, FPR and CR Fields, extended to 128 entries.
+[^6]: SVP64 Vectorises Scalar ops. It is up to the **implementor** to choose (**optionally**) whether to apply SVP64 to e.g. VSX Quad-Precision (128-bit) instructions, to create 128-bit Vector ops.
+[^7]: big-integer add is just `sv.adde`. For optimal performance, Bigint Mul and Divide first require the addition of two scalar operations (in turn, naturally Vectorised by SVP64). See [[sv/biginteger/analysis]]
+[^8]: LD/ST Fault-First: see [[sv/svp64/appendix]] and [ARM SVE Fault-First](https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf)
+[^9]: Based on LD/ST Fail-first, extended to data. See [[sv/svp64/appendix]]
+[^10]: Predicate-result effectively turns any standard op into a type of "cmp". See [[sv/svp64/appendix]]
+[^11]: Any non-power-of-two Matrices up to 127 FMACs (or other FMA-style op), full triple-loop Schedule. See [[sv/remap]]
+[^12]: DCT (Lee) and FFT Full Triple-loops supported, RADIX2-only. Normally only found in VLIW DSPs (TI TMS320, Qualcomm Hexagon). See [[sv/remap]]
+[^13]: VSX's Vector Registers are mis-named: they are 100% PackedSIMD. AVX-512 is not a Vector ISA either.  See [Flynn's Taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy)
+[^14]: Power ISA v3.1 contains "Matrix Multiply Assist" (MMA) which, being PackedSIMD, is restricted to RADIX2 and requires inline assembler loop-unrolling for non-power-of-two Matrix dimensions
+[^15]: difficult to ascertain, see [NEON/VFP](https://developer.arm.com/documentation/den0018/a/NEON-and-VFP-Instruction-Summary/List-of-all-NEON-and-VFP-instructions).
      Critically depends on ARM Scalar instructions.
-[^sve2_opcodes]: difficult to exactly ascertain, see ARM Architecture Reference Manual Supplement, DDI 0584.  Critically depends on ARM Scalar instructions.
-[^sve2_no_setvl]: ARM states that the Scalability is a [Silicon-partner choice](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/102340_0001_00_en_introduction-to-sve2.pdf?revision=aae96dd2-5334-4ad3-9a47-393086a20fea).
+[^16]: difficult to exactly ascertain, see ARM Architecture Reference Manual Supplement, DDI 0584.  Critically depends on ARM Scalar instructions.
+[^17]: ARM states that the Scalability is a [Silicon-partner choice](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/102340_0001_00_en_introduction-to-sve2.pdf?revision=aae96dd2-5334-4ad3-9a47-393086a20fea).
      Scalability in the ISA is **not available to the programmer**: there is no `setvl` instruction in SVE2, which is already causing difficulties for assembler programmers.
      [quote](https://gist.github.com/zingaburga/805669eb891c820bd220418ee3f0d6bd#file-sve2-md) **"you may be stuck with only using the bottom 128 bits of the vector, or need to code specifically for each width"**
-[^avx512_wikipedia]: [AVX512 Wikipedia](https://en.wikipedia.org/wiki/AVX-512), [Lifecycle of an instruction set](https://media.handmade-seattle.com/tom-forsyth/) including full slides
-[^avx512_opcodes]: difficult to exactly ascertain, contains subsets. Critically depends on ISA support from earlier x86 ISA subsets (several more thousand instructions). See [SIMD ISA listing](https://www.officedaytime.com/simd512e/)
-[^rvv_spec]: [RVV Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
-[^rvv_opcodes]: RISC-V Vectors are not stand-alone, i.e. like SVE2 and AVX-512 are critically dependent on the Scalar ISA (an additional ~96 instructions for the Scalar RV64GC set, needed for Linux).
-[^rvv_scalable]: Like the original Cray RVV is a truly scalable Vector ISA (Cray setvl instruction).  However, like SVE2, the Maximum Vector length is a Silicon-partner choice, which creates similar limitations that SVP64 does not have.
+[^18]: [AVX512 Wikipedia](https://en.wikipedia.org/wiki/AVX-512), [Lifecycle of an instruction set](https://media.handmade-seattle.com/tom-forsyth/) including full slides
+[^19]: difficult to exactly ascertain, contains subsets. Critically depends on ISA support from earlier x86 ISA subsets (several thousand more instructions). See [SIMD ISA listing](https://www.officedaytime.com/simd512e/)
+[^20]: [RVV Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
+[^21]: RISC-V Vectors are not stand-alone: like SVE2 and AVX-512, they are critically dependent on the Scalar ISA (an additional ~96 instructions for the Scalar RV64GC set, needed for Linux).
+[^22]: Like the original Cray Vectors, RVV is a truly scalable Vector ISA (Cray setvl instruction).  However, like SVE2, the Maximum Vector length is a Silicon-partner choice, which creates similar limitations that SVP64 does not have.
      The RISC-V Founders strongly discourage programmers from trying to find out the Silicon's Maximum Vector Length, in order to steer them towards Silicon-independent assembler. This requires **all** algorithms to contain a loop construct.
      MAXVL in SVP64 is a Spec-hard-fixed quantity, therefore loop constructs are not necessary 100% of the time.
-[^rvv_128]: like SVP64 it is up to the hardware implementor (Silicon partner) to choose whether to support 128-bit elements.
-[^sx_aurora]: [NEC SX Aurora](https://ftp.libre-soc.org/NEC_SX_Aurora_TSUBASA_VectorEngine-as-manual-v1.2.pdf) is based on the original Cray Vectors
-[^aurora_isa]: [Aurora ISA guide](https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/Aurora_ISA_guide.pdf) Appendix-3 11.1 p508
-[^aurora_scalable]: Like the original Cray Vectors, the ISA Vector Length is independent of the underlying hardware, however Generation 1 has 256 elements per Vector register (3.2.4 p24, Aurora ISA guide)
-[^svp64_intrin_cnt]: If treated as a 1-Dimensional ISA, and designed badly, the 24-bit Prefix expands 200+ scalar instructions to well over a million intrinsics (N~=10^4 **times** M~=10^2).
+[^23]: Like SVP64, it is up to the hardware implementor (Silicon partner) to choose whether to support 128-bit elements.
+[^24]: [NEC SX Aurora](https://ftp.libre-soc.org/NEC_SX_Aurora_TSUBASA_VectorEngine-as-manual-v1.2.pdf) is based on the original Cray Vectors
+[^25]: [Aurora ISA guide](https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/Aurora_ISA_guide.pdf) Appendix-3 11.1 p508
+[^26]: Like the original Cray Vectors, the ISA Vector Length is independent of the underlying hardware, however Generation 1 has 256 elements per Vector register (3.2.4 p24, Aurora ISA guide)
+[^2]: If treated as a 1-Dimensional ISA, and designed badly, the 24-bit Prefix expands 200+ scalar instructions to well over a million intrinsics (N~=10^4 **times** M~=10^2).
      If treated as a 2-Dimensional ISA and designed well, there are far fewer: N prefix intrinsics **plus** M scalar instruction intrinsics, where N and M are each likely to be of the order of 10^2.
-[^vsx_intrin]: [Altivec gcc intrinsics](https://gcc.gnu.org/onlinedocs/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html), contains links to additional VSX intrinsics for ISA 2.05/6/7, 3.0 and 3.1
-[^neon_intrin]: NEON 32-bit 2754 intrinsics, NEON 64-bit 4334 intrinsics.
-[^sve2_intrin]: SVE: 4140 intrinsics, SVE2 1900 intrinsics
-[^avx512_intrin]: Count includes SSE, SSE2, AVX, AVX2 and all AVX512 variants
-[^rvv_intrin]: [RVV intrinsics listing](https://raw.githubusercontent.com/riscv-non-isa/rvv-intrinsic-doc/master/intrinsic_funcs.md) page is 25,000 lines long.
-[^aurora_intrin]: Unknown. estimated to be of the order of length of RVV due to also being a Cray-style Scalable ISA, NEC maintains an [LLVM hard fork](https://github.com/sx-aurora-dev)
-[^sve2_mat]: [Scalable Matrix Optional Extension](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/scalable-matrix-extension-armv9-a-architecture)
+[^27]: [Altivec gcc intrinsics](https://gcc.gnu.org/onlinedocs/gcc/PowerPC-AltiVec_002fVSX-Built-in-Functions.html), contains links to additional VSX intrinsics for ISA 2.05/6/7, 3.0 and 3.1
+[^28]: NEON 32-bit 2754 intrinsics, NEON 64-bit 4334 intrinsics.
+[^29]: SVE: 4140 intrinsics, SVE2 1900 intrinsics
+[^30]: Count includes SSE, SSE2, AVX, AVX2 and all AVX512 variants
+[^31]: [RVV intrinsics listing](https://raw.githubusercontent.com/riscv-non-isa/rvv-intrinsic-doc/master/intrinsic_funcs.md) page is 25,000 lines long.
+[^32]: Unknown. Estimated to be of the same order as RVV, due to also being a Cray-style Scalable ISA. NEC maintains an [LLVM hard fork](https://github.com/sx-aurora-dev)
+[^33]: [Scalable Matrix Optional Extension](https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/scalable-matrix-extension-armv9-a-architecture)
      The key is an outer-product instruction, [SMOPA](https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/SMOPA--Signed-integer-sum-of-outer-products-and-accumulate-?lang=en), for which it is very hard to tell at a glance whether it is power-of-two only or also non-power-of-two.
-[^x86_amx]: [Advanced matrix Extensions](https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions) supports BF16 and INT8 only. Separate regfile, power-of-two "tiles". Not general-purpose at all.
+[^34]: [Advanced Matrix Extensions](https://en.wikipedia.org/wiki/Advanced_Matrix_Extensions) supports BF16 and INT8 only. Separate regfile, power-of-two "tiles". Not general-purpose at all.
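Footnote [^2] contrasts two ways of counting SVP64 intrinsics. The figures below use only the footnote's own orders of magnitude. Note that the footnote uses a different N in each case: prefix *combinations* in the 1-Dimensional view, and prefix *intrinsics* in the 2-Dimensional view.

```latex
\begin{aligned}
\text{1-Dimensional (every prefix combined with every op):}\quad & N \times M \approx 10^{4} \times 10^{2} = 10^{6} \\
\text{2-Dimensional (prefix and op kept separate):}\quad & N + M \approx 10^{2} + 10^{2} = 2 \times 10^{2}
\end{aligned}
```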
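Footnote [^5] says SVP64 builds its Vector concept on the **Scalar** register files, extended to 128 entries. The sketch below is a simplified model under several assumptions: the default 64-bit element width, and no predication, REMAP or element-width overrides. Under those assumptions, element i of a vector operand whose base register is rN lives in scalar register r(N+i). `sv_add` is a hypothetical helper, not an SVP64 API.

```python
NUM_GPRS = 128           # SVP64 extends the scalar GPR file to 128 entries
MASK64 = (1 << 64) - 1
gpr = [0] * NUM_GPRS

def sv_add(rt, ra, rb, vl):
    """Simplified model of 'sv.add rt.v, ra.v, rb.v' with VL=vl:
    a loop of ordinary scalar adds over consecutive registers."""
    if max(rt, ra, rb) + vl > NUM_GPRS:
        raise ValueError("vector runs off the end of the register file")
    for i in range(vl):
        gpr[rt + i] = (gpr[ra + i] + gpr[rb + i]) & MASK64

for i in range(4):
    gpr[16 + i] = i      # r16..r19 = 0, 1, 2, 3
    gpr[24 + i] = 100    # r24..r27 = 100
sv_add(8, 16, 24, vl=4)  # writes r8..r11
print(gpr[8:12])         # [100, 101, 102, 103]
```

In this model, no separate vector register file is needed. A "vector" is simply a run of scalar registers, which is the distinction footnote [^5] draws against the other ISAs in the table.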
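Footnote [^7] says big-integer add is just `sv.adde`. The sketch below makes the following assumptions:
- numbers are stored as 64-bit limbs, least-significant first;
- each element of the vectorised add-extended is modelled as a scalar `adde` that consumes the carry (CA) from the previous element and produces a new one.

Under those assumptions, the element loop is exactly a ripple-carry bigint add. `adde_vector`, `to_limbs` and `from_limbs` are hypothetical helpers for illustration.

```python
MASK64 = (1 << 64) - 1

def adde_vector(a, b, carry_in=0):
    """Model of a vectorised add-extended: element i computes a[i] + b[i] + CA,
    keeps the low 64 bits, and passes the carry-out on to element i+1."""
    if len(a) != len(b):
        raise ValueError("operands must have the same number of limbs")
    result = []
    ca = carry_in
    for x, y in zip(a, b):
        total = x + y + ca
        result.append(total & MASK64)
        ca = total >> 64  # carry-out becomes the carry-in of the next element
    return result, ca

def to_limbs(n, count):
    """Split a non-negative int into `count` 64-bit limbs, least-significant first."""
    return [(n >> (64 * i)) & MASK64 for i in range(count)]

def from_limbs(limbs):
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

x, y = 2**200 - 1, 12345
s, ca = adde_vector(to_limbs(x, 4), to_limbs(y, 4))
print(from_limbs(s) + (ca << 256) == x + y)  # True
```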
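Footnote [^9] describes data-dependent fail-first as LD/ST Fail-first extended to data. As a rough model, assume the test is applied to each element's result in order, and that VL is truncated at the first element that fails. The failing element is excluded here; the exact options are defined in [[sv/svp64/appendix]], not here. Under those assumptions, a strlen-style scan stops at the terminating zero without a separate compare loop. `fail_first_map` is a hypothetical helper.

```python
def fail_first_map(op, test, src, vl):
    """Apply op to elements 0..vl-1 in order; stop at the first result that
    fails `test`. Returns (results kept, truncated VL)."""
    out = []
    for i in range(vl):
        r = op(src[i])
        if not test(r):
            return out, i  # VL truncated to the number of passing elements
        out.append(r)
    return out, vl

data = [ord(c) for c in "hello"] + [0, 120, 121]
chars, new_vl = fail_first_map(lambda x: x, lambda r: r != 0, data, len(data))
print(new_vl)  # 5
```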
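Footnote [^11] mentions a full triple-loop Schedule for non-power-of-two Matrices of up to 127 FMACs. The sketch below is not the REMAP encoding described in [[sv/remap]]; it only illustrates the idea. It assumes a schedule is a list of (i, j, k) index triples, each driving one FMAC, and it takes the 127 limit from the footnote. `matmul_schedule` and `run_schedule` are hypothetical names.

```python
def matmul_schedule(rows, inner, cols, max_ops=127):
    """Yield (i, j, k) triples for C[i][j] += A[i][k] * B[k][j].
    Dimensions need not be powers of two; the total FMAC count is limited."""
    total = rows * inner * cols
    if total > max_ops:
        raise ValueError(f"{total} FMACs exceeds limit of {max_ops}")
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                yield i, j, k

def run_schedule(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0] * cols for _ in range(rows)]
    for i, j, k in matmul_schedule(rows, inner, cols):
        c[i][j] += a[i][k] * b[k][j]  # one FMAC per schedule element
    return c

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]                    # 3x3
b = [[1, 0, 2, 1, 0], [0, 1, 0, 2, 1], [1, 1, 1, 1, 1]]  # 3x5, 45 FMACs in total
print(run_schedule(a, b))
```

Because the schedule is generated from the dimensions rather than fixed by a register width, 3x3 by 3x5 needs no padding or unrolling. Footnote [^14] says that padding or unrolling is required for MMA on non-power-of-two sizes.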
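Footnote [^12] says FFT Full Triple-loops are supported, RADIX2-only. For reference, below is the textbook iterative radix-2 decimation-in-time Cooley-Tukey FFT, written so that the three nested loops (stage, group, butterfly) are explicit. It is plain Python showing the loop structure only, not SVP64 assembler.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_radix2(x):
    """Iterative radix-2 DIT FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    a = [x[bit_reverse(i, bits)] for i in range(n)]  # bit-reversed input order
    size = 2
    while size <= n:                        # loop 1: stages
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):     # loop 2: groups
            w = 1
            for k in range(half):           # loop 3: butterflies
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a

print(fft_radix2([1, 1, 1, 1]))  # DC bin 4, all other bins zero
```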
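Footnote [^22] contrasts two cases:
- RVV, where the Maximum Vector Length is a Silicon-partner choice, so every algorithm needs a loop;
- SVP64, where MAXVL is fixed by the Spec.

The model below is minimal and assumes `setvl` simply grants min(remaining, MAXVL) elements. The real RVV `vsetvli` has additional rules that are not modelled. Under that assumption, it shows the strip-mining loop this implies. `setvl` and `vector_add` are hypothetical helpers, not real instructions or APIs.

```python
def setvl(remaining, maxvl):
    """Hypothetical model of setvl: grant at most maxvl elements."""
    return min(remaining, maxvl)

def vector_add(dst, a, b, n, maxvl):
    """Strip-mined element-wise add that works for any maxvl.
    Returns the number of loop iterations taken."""
    i = 0
    loops = 0
    while i < n:
        vl = setvl(n - i, maxvl)
        for e in range(vl):  # stands in for one vector instruction of length vl
            dst[i + e] = a[i + e] + b[i + e]
        i += vl
        loops += 1
    return loops

n = 10
a, b, dst = list(range(n)), [10] * n, [0] * n
print(vector_add(dst, a, b, n, maxvl=4))   # 3 iterations: 4 + 4 + 2
print(vector_add(dst, a, b, n, maxvl=16))  # 1 iteration: n fits within MAXVL
```

When MAXVL is unknown, the loop must always be written. When MAXVL is a known constant and n is known not to exceed it, the loop collapses to a single `setvl` and one pass. That is the point footnote [^22] makes about SVP64.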
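Footnote [^33] identifies an outer-product-and-accumulate instruction (SMOPA) as the key to ARM's Scalable Matrix Extension. The underlying idea is that a matrix product is a sum of outer products: for each k, the column A[:,k] times the row B[k,:] is accumulated into the whole result tile. The sketch below is plain Python and omits SME's actual tile sizes, widening integer types and predication. `outer_product_accumulate` is a hypothetical name.

```python
def outer_product_accumulate(tile, col, row):
    """tile[i][j] += col[i] * row[j] for every i, j (one outer product)."""
    for i, ci in enumerate(col):
        for j, rj in enumerate(row):
            tile[i][j] += ci * rj

def matmul_by_outer_products(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    tile = [[0] * cols for _ in range(rows)]
    for k in range(inner):
        col = [a[i][k] for i in range(rows)]  # column k of A
        outer_product_accumulate(tile, col, b[k])  # row k of B
    return tile

print(matmul_by_outer_products([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```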