From a0f46cc9f46dcd168bed29445f57807a779170ad Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Fri, 22 Jul 2022 16:06:49 +0100
Subject: [PATCH] add matrix hardware column

---
 openpower/sv/comparison_table.mdwn | 44 ++++++++++++++++--------------
 1 file changed, 23 insertions(+), 21 deletions(-)
diff --git a/openpower/sv/comparison_table.mdwn b/openpower/sv/comparison_table.mdwn
index d7ac3215f..c0bb36363 100644
--- a/openpower/sv/comparison_table.mdwn
+++ b/openpower/sv/comparison_table.mdwn
@@ -1,14 +1,14 @@
 # ISA Comparison Table
 
-| ISA <br>name   | Num <br>opcodes   | Taxonomy / <br> Class | Predicate <br> Masks   | Twin <br> Predication   |  Explicit <br> Vector regs   | 128-bit | Bigint <br> capability   | LDST <br> Fault-First   | Data-dependent <br> Fail-first   | Predicate-<br> Result   |
-|----------------|-------------------|-----------------------|------------------------|-------------------------|------------------------------|---------|--------------------------|-------------------------|----------------------------------|-------------------------|
-| SVP64          | 5 {1}             | Scalable {2}          | yes                    | yes {3}                 | no {4}                       | see {5} | yes {6}                  | yes {7}                 | yes {8}                          | yes {9}                 |
-| VSX            | 700+              | Packed SIMD           | no                     | no                      | yes {10}                     | yes     | no                       | no                      | no                               | no                      |
-| NEON           | ~250 {11}         | Predicated SIMD       | yes                    | no                      | yes                          | yes     | no                       | no                      | no                               | no                      |
-| SVE2           | ~1000 {12}        | Scalable HW {13}      | yes                    | no                      | yes                          | yes     | no                       | yes {7}                 | no                               | no                      |
-| AVX-512 {14}   | ~1000s {15}       | Predicated SIMD       | yes                    | no                      | yes                          | yes     | no                       | no                      | no                               | no                      |
-| RVV {16}       | ~190              | Scalable {17}         | yes                    | no                      | yes                          | yes {18}| no                       | yes                     | no                               | no                      |
-| Aurora SX {19} | ~200 {20}         | Scalable {21}         | yes                    | no                      | yes                          | no      | no                       | no                      | no                               | no                      |
+| ISA <br>name   | Num <br>opcodes | Taxonomy / <br> Class | Predicate <br> Masks | Twin <br> Predication |  Explicit <br> Vector regs | 128-bit | Bigint <br> capability | LDST <br> Fault-First | Data-dependent <br> Fail-first | Predicate-<br> Result | Matrix HW<br> support |
+|----------------|-----------------|-----------------------|----------------------|-----------------------|----------------------------|---------|------------------------|-----------------------|--------------------------------|-----------------------|-----------------------|
+| SVP64          | 5 {1}           | Scalable {2}          | yes                  | yes {3}               | no {4}                     | see {5} | yes {6}                | yes {7}               | yes {8}                        | yes {9}               | yes {10}              |
+| VSX            | 700+            | Packed SIMD           | no                   | no                    | yes {11}                   | yes     | no                     | no                    | no                             | no                    | yes {12}              |
+| NEON           | ~250 {13}       | Predicated SIMD       | yes                  | no                    | yes                        | yes     | no                     | no                    | no                             | no                    | no                    |
+| SVE2           | ~1000 {14}      | Scalable HW {15}      | yes                  | no                    | yes                        | yes     | no                     | yes {7}               | no                             | no                    | no                    |
+| AVX-512 {16}   | ~1000s {17}     | Predicated SIMD       | yes                  | no                    | yes                        | yes     | no                     | no                    | no                             | no                    | no                    |
+| RVV {18}       | ~190            | Scalable {19}         | yes                  | no                    | yes                        | yes {20}| no                     | yes                   | no                             | no                    | no                    |
+| Aurora SX {21} | ~200 {22}       | Scalable {23}         | yes                  | no                    | yes                        | no      | no                     | no                    | no                             | no                    | no                    |
 
 * {1}: plus EXT001 24-bit prefixing. See [[sv/svp64]]
 * {2}: A 2-Dimensional Scalable Vector ISA with both Horizontal-First and Vertical-First Modes. See [[sv/vector_isa_comparison]]
@@ -19,17 +19,19 @@
 * {7} See [[sv/svp64/appendix]] and [ARM SVE Fault-First](https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf)
 * {8} Based on LD/ST Fail-first, extended to data. See [[sv/svp64/appendix]]
 * {9} Turns standard ops into a type of "cmp". See [[sv/svp64/appendix]]
-* {10} VSX's Vector Registers are mis-named: they are 100% PackedSIMD. AVX-512 is not a Vector ISA either.  See [Flynn's Taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy)
-* {11} difficult to ascertain, see [NEON/VFP](https://developer.arm.com/documentation/den0018/a/NEON-and-VFP-Instruction-Summary/List-of-all-NEON-and-VFP-instructions).
+* {10} Any non-power-of-two Matrix up to 127 FMACs.  Also DCT (Lee) and FFT Full (RADIX2) Triple-loops supported. See [[sv/svp64/remap]]
+* {11} VSX's Vector Registers are mis-named: they are 100% PackedSIMD. AVX-512 is not a Vector ISA either.  See [Flynn's Taxonomy](https://en.wikipedia.org/wiki/Flynn%27s_taxonomy)
+* {12} Power ISA v3.1 contains "Matrix Multiply Assist" (MMA) which, due to PackedSIMD, is restricted to RADIX2 and requires inline assembler loop-unrolling for non-power-of-two Matrix dimensions.
+* {13} difficult to ascertain, see [NEON/VFP](https://developer.arm.com/documentation/den0018/a/NEON-and-VFP-Instruction-Summary/List-of-all-NEON-and-VFP-instructions).
   Critically depends on ARM Scalar instructions
-* {12} difficult to exactly ascertain, see ARM Architecture Reference Manual Supplement, DDI 0584.  Critically depends on ARM Scalar instructions.
-* {13}: ARM states that the Scalability is a [Silicon-partner choice](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/102340_0001_00_en_introduction-to-sve2.pdf?revision=aae96dd2-5334-4ad3-9a47-393086a20fea).
+* {14} difficult to exactly ascertain, see ARM Architecture Reference Manual Supplement, DDI 0584.  Critically depends on ARM Scalar instructions.
+* {15}: ARM states that the Scalability is a [Silicon-partner choice](https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/102340_0001_00_en_introduction-to-sve2.pdf?revision=aae96dd2-5334-4ad3-9a47-393086a20fea).
   this "Scalability independence" is not entirely extended in full to the programmer although ARM requests developers to consider it so, in practice this does not happen.
-* {14}: [Wikipedia](https://en.wikipedia.org/wiki/AVX-512), [Lifecycle of an instruction set](https://media.handmade-seattle.com/tom-forsyth/) including full slides
-* {15}: difficult to exactly ascertain, contains subsets. Critically depends on ISA support from earlier x86 ISA subsets (several more thousand instructions). See [SIMD ISA listing](https://www.officedaytime.com/simd512e/)
-* {16}: [RVV Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
-* {17}: Like the original Cray RVV is a truly scalable Vector ISA (Cray setvl instruction).
-* {18}: like SVP64 it is up to the hardware implementor to choose whether to support 128-bit elements.
-* {19}: [NEC SX Aurora](https://ftp.libre-soc.org/NEC_SX_Aurora_TSUBASA_VectorEngine-as-manual-v1.2.pdf) is based on the original Cray Vectors
-* {20}: [Aurora ISA guide)(https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/Aurora_ISA_guide.pdf) Appendix-3 11.1 p508
-* {21}: Like the original Cray Vectors, the ISA Vector Length is independent of the underlying hardware, however Generation 1 has 256 elements per Vector register (3.2.4 p24, Aurora ISA guide)
+* {16}: [Wikipedia](https://en.wikipedia.org/wiki/AVX-512), [Lifecycle of an instruction set](https://media.handmade-seattle.com/tom-forsyth/) including full slides
+* {17}: difficult to exactly ascertain, contains subsets. Critically depends on ISA support from earlier x86 ISA subsets (several more thousand instructions). See [SIMD ISA listing](https://www.officedaytime.com/simd512e/)
+* {18}: [RVV Spec](https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc)
+* {19}: Like the original Cray, RVV is a truly scalable Vector ISA (Cray setvl instruction).
+* {20}: like SVP64 it is up to the hardware implementor to choose whether to support 128-bit elements.
+* {21}: [NEC SX Aurora](https://ftp.libre-soc.org/NEC_SX_Aurora_TSUBASA_VectorEngine-as-manual-v1.2.pdf) is based on the original Cray Vectors
+* {22}: [Aurora ISA guide](https://sxauroratsubasa.sakura.ne.jp/documents/guide/pdfs/Aurora_ISA_guide.pdf) Appendix-3 11.1 p508
+* {23}: Like the original Cray Vectors, the ISA Vector Length is independent of the underlying hardware, however Generation 1 has 256 elements per Vector register (3.2.4 p24, Aurora ISA guide)
-- 
2.30.2