shorten table, clarify data-dependent fail-first
author Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Sun, 31 Jul 2022 21:40:20 +0000 (22:40 +0100)
committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Sun, 31 Jul 2022 21:40:23 +0000 (22:40 +0100)
openpower/sv/comparison_table.mdwn

index dcc9f52d2ac42c4abda78d8f14ab6dc4f16f1148..428b5e2aa541568885d0e4ce1969680918235bb7 100644 (file)
@@ -1,15 +1,15 @@
 **ISA Comparison Table to DRAFT SVP64** - discussion and research at <https://bugs.libre-soc.org/show_bug.cgi?id=893>
 
-|ISA <br>name   |No <br>opcodes|No <br>intrinsics|Taxonomy / <br>Class|setvl <br> scalable|Predicate <br> Masks|Twin <br>Pred|Vector <br>regs |128-bit <br> ops |Bigint  |LDST <br>F/First|Data-dep<br> Fail-first|Pred-<br> Result|HW<br> Matrix|DCT/FFT <br>HW|
-|---------------|--------------|-----------------|--------------------|-------------------|--------------------|-------------|----------------|-----------------|--------|----------------|-----------------------|----------------|-------------|--------------|
-|SVP64          |5 [^1]        |see [^2]         |Scalable [^3]       |yes                |yes                 |yes [^4]     |no [^5]         |see [^6]         |yes[^7] |yes [^8]        |yes [^9]               |yes [^10]       |yes [^11]    | yes[^12]     |
-|VSX            |700+          |700?[^v1]        |PackedSIMD          |no                 |no                  |no           |yes [^v2]       |yes              |no      |no              |no                     |no              |yes [^v3]    | no           |
-|NEON           |~250 [^n1]    |7088 [^n2]       |PackedSIMD          |no                 |no                  |no           |yes             |see [^b1]        |no      |no              |no                     |no              |no           | no           |
-|SVE2           |~1000 [^e1]   |6040 [^e2]       |Predicated SIMD[^e3]|no [^e3]           |yes                 |no           |yes             |see [^b1]        |no      |yes [^8]        |no                     |no              |yes [^e4]    | no           |
-|AVX512 [^x1]   |~1000s [^x2]  |7256 [^x3]       |Predicated SIMD     |no                 |yes                 |no           |yes             |see [^b1]        |no      |no              |no                     |no              |yes [^x4]    | no           |
-|RVV [^r1]      |~190 [^r2]    |~25000[^r3]      |Scalable[^r4]       |yes                |yes                 |no           |yes             |yes [^r5]        |no      |yes             |no                     |no              |no           | no           |
-|Aurora SX[^s1] |~200 [^s2]    |unknown [^s3]    |Scalable [^s4]      |yes                |yes                 |no           |yes             |no               |no      |no              |no                     |no              |?            | no           |
-|66000[^m1]     |~200          |unknown          |AutoVec[^m1]        |see [^m1]          |see[^m1]            |no           |see [^m1]       |no               |yes[^m2]|see [^m1]       |no                     |no              |no           | no           |
+|ISA <br>name   |No <br>opcodes|No <br>intrinsics|Taxonomy / <br>Class|setvl <br> scalable|Pred. <br> Masks|Twin <br>Pred|Vector <br>regs |128-bit <br> ops |Bigint  |LDST <br>F/First|Data-dep <br>F-first|Pred-<br> Result|HW<br> Matrix|DCT/FFT <br>HW|
+|---------------|--------------|-----------------|--------------------|-------------------|----------------|-------------|----------------|-----------------|--------|----------------|--------------------|----------------|-------------|--------------|
+|SVP64          |5 [^1]        |see [^2]         |Scalable [^3]       |yes                |yes             |yes [^4]     |no [^5]         |see [^6]         |yes[^7] |yes [^8]        |yes [^9]            |yes [^10]       |yes [^11]    | yes[^12]     |
+|VSX            |700+          |700?[^v1]        |PackedSIMD          |no                 |no              |no           |yes [^v2]       |yes              |no      |no              |no                  |no              |yes [^v3]    | no           |
+|NEON           |~250 [^n1]    |7088 [^n2]       |PackedSIMD          |no                 |no              |no           |yes             |see [^b1]        |no      |no              |no                  |no              |no           | no           |
+|SVE2           |~1000 [^e1]   |6040 [^e2]       |Predicated SIMD[^e3]|no [^e3]           |yes             |no           |yes             |see [^b1]        |no      |yes [^8]        |no                  |no              |yes [^e4]    | no           |
+|AVX512 [^x1]   |~1000s [^x2]  |7256 [^x3]       |Predicated SIMD     |no                 |yes             |no           |yes             |see [^b1]        |no      |no              |no                  |no              |yes [^x4]    | no           |
+|RVV [^r1]      |~190 [^r2]    |~25000[^r3]      |Scalable[^r4]       |yes                |yes             |no           |yes             |yes [^r5]        |no      |yes             |no                  |no              |no           | no           |
+|Aurora SX[^s1] |~200 [^s2]    |unknown [^s3]    |Scalable [^s4]      |yes                |yes             |no           |yes             |no               |no      |no              |no                  |no              |?            | no           |
+|66000[^m1]     |~200          |unknown          |AutoVec[^m1]        |see [^m1]          |see[^m1]        |no           |see [^m1]       |no               |yes[^m2]|see [^m1]       |no                  |no              |no           | no           |
 
 [^1]: plus EXT001 24-bit prefixing using 25% of EXT001 space. See [[sv/svp64]]
 [^2]: If treated as a 1-Dimensional ISA, and designed badly, the 24-bit Prefix expands 200+ scalar instructions to well over a million intrinsics (N~=10^4 **times** M~=10^2).
@@ -20,7 +20,7 @@
 [^6]: SVP64 Vectorises Scalar ops. It is up to the **implementor** to choose (**optionally**) whether to apply SVP64 to e.g. VSX Quad-Precision (128-bit) instructions, to create 128-bit Vector ops.
 [^7]: big-integer add is just `sv.adde`. For optimal performance Bigint Mul and divide first require addition of two scalar operations (in turn, naturally Vectorised by SVP64). See [[sv/biginteger/analysis]]
 [^8]: LD/ST Fault-First: see [[sv/svp64/appendix]] and [ARM SVE Fault-First](https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf)
-[^9]: Based on LD/ST Fail-first, extended to data. See [[sv/svp64/appendix]]
+[^9]: Data-dependent Fail-First: Based on LD/ST Fail-first, extended to data. Truncates VL based on failing an Rc=1 test. See [[sv/svp64/appendix]]
 [^10]: Predicate-result effectively turns any standard op into a type of "cmp". See [[sv/svp64/appendix]]
 [^11]: Any non-power-of-two Matrices up to 127 FMACs (or other FMA-style op), full triple-loop Schedule. See [[sv/remap]]
 [^12]: DCT (Lee) and FFT Full Triple-loops supported, RADIX2-only. Normally only found in VLIW DSPs (TI TMS320, Qualcomm Hexagon). See [[sv/remap]]
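
Footnote [^9] describes Data-dependent Fail-First as truncating VL when an element fails an Rc=1 test. The behaviour can be illustrated with a minimal Python sketch. This is a simplified model, not the SVP64 specification: the function name `ddff_sketch`, the `op` and `test` callbacks, and the choice to exclude the failing element from the truncated VL are all assumptions made for illustration. See [[sv/svp64/appendix]] for the actual rules.

```python
from typing import Callable, List, Tuple


def ddff_sketch(
    op: Callable[[int, int], int],
    test: Callable[[int], bool],
    src_a: List[int],
    src_b: List[int],
    vl: int,
) -> Tuple[List[int], int]:
    """Hypothetical model of data-dependent fail-first.

    Applies `op` element by element for up to `vl` elements. After each
    element, `test` stands in for the Rc=1 condition check. On the first
    failing test the loop stops and VL is truncated to the number of
    elements that passed (assumption: the failing element is excluded).
    """
    results: List[int] = []
    for i in range(vl):
        r = op(src_a[i], src_b[i])
        if not test(r):
            # First failure: truncate VL to the elements completed so far.
            return results, i
        results.append(r)
    # No element failed: VL is unchanged.
    return results, vl


# Example: subtract, and stop at the first zero result.
results, new_vl = ddff_sketch(
    op=lambda a, b: a - b,
    test=lambda r: r != 0,
    src_a=[5, 7, 3, 9],
    src_b=[1, 2, 3, 4],
    vl=4,
)
print(results, new_vl)
```

In this example the third element (3 - 3) fails the non-zero test, so only the first two results are kept and the sketch reports a truncated VL of 2. Later instructions would then operate on the shortened vector length.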