**ISA Comparison Table to DRAFT SVP64** - discussion and research at <https://bugs.libre-soc.org/show_bug.cgi?id=893>
-|ISA <br>name |No <br>opcodes|No <br>intrinsics|Taxonomy / <br>Class|setvl <br> scalable|Predicate <br> Masks|Twin <br>Pred|Vector <br>regs |128-bit <br> ops |Bigint |LDST <br>F/First|Data-dep<br> Fail-first|Pred-<br> Result|HW<br> Matrix|DCT/FFT <br>HW|
-|---------------|--------------|-----------------|--------------------|-------------------|--------------------|-------------|----------------|-----------------|--------|----------------|-----------------------|----------------|-------------|--------------|
-|SVP64 |5 [^1] |see [^2] |Scalable [^3] |yes |yes |yes [^4] |no [^5] |see [^6] |yes[^7] |yes [^8] |yes [^9] |yes [^10] |yes [^11] | yes[^12] |
-|VSX |700+ |700?[^v1] |PackedSIMD |no |no |no |yes [^v2] |yes |no |no |no |no |yes [^v3] | no |
-|NEON |~250 [^n1] |7088 [^n2] |PackedSIMD |no |no |no |yes |see [^b1] |no |no |no |no |no | no |
-|SVE2 |~1000 [^e1] |6040 [^e2] |Predicated SIMD[^e3]|no [^e3] |yes |no |yes |see [^b1] |no |yes [^8] |no |no |yes [^e4] | no |
-|AVX512 [^x1] |~1000s [^x2] |7256 [^x3] |Predicated SIMD |no |yes |no |yes |see [^b1] |no |no |no |no |yes [^x4] | no |
-|RVV [^r1] |~190 [^r2] |~25000[^r3] |Scalable[^r4] |yes |yes |no |yes |yes [^r5] |no |yes |no |no |no | no |
-|Aurora SX[^s1] |~200 [^s2] |unknown [^s3] |Scalable [^s4] |yes |yes |no |yes |no |no |no |no |no |? | no |
-|66000[^m1] |~200 |unknown |AutoVec[^m1] |see [^m1] |see[^m1] |no |see [^m1] |no |yes[^m2]|see [^m1] |no |no |no | no |
+|ISA <br>name |No <br>opcodes|No <br>intrinsics|Taxonomy / <br>Class|setvl <br> scalable|Pred. <br> Masks|Twin <br>Pred|Vector <br>regs |128-bit <br> ops |Bigint |LDST <br>F/First|Data-dep <br>F-first|Pred-<br> Result|HW<br> Matrix|DCT/FFT <br>HW|
+|---------------|--------------|-----------------|--------------------|-------------------|----------------|-------------|----------------|-----------------|--------|----------------|--------------------|----------------|-------------|--------------|
+|SVP64 |5 [^1] |see [^2] |Scalable [^3] |yes |yes |yes [^4] |no [^5] |see [^6] |yes[^7] |yes [^8] |yes [^9] |yes [^10] |yes [^11] | yes[^12] |
+|VSX |700+ |700?[^v1] |PackedSIMD |no |no |no |yes [^v2] |yes |no |no |no |no |yes [^v3] | no |
+|NEON |~250 [^n1] |7088 [^n2] |PackedSIMD |no |no |no |yes |see [^b1] |no |no |no |no |no | no |
+|SVE2 |~1000 [^e1] |6040 [^e2] |Predicated SIMD[^e3]|no [^e3] |yes |no |yes |see [^b1] |no |yes [^8] |no |no |yes [^e4] | no |
+|AVX512 [^x1] |~1000s [^x2] |7256 [^x3] |Predicated SIMD |no |yes |no |yes |see [^b1] |no |no |no |no |yes [^x4] | no |
+|RVV [^r1] |~190 [^r2] |~25000[^r3] |Scalable[^r4] |yes |yes |no |yes |yes [^r5] |no |yes |no |no |no | no |
+|Aurora SX[^s1] |~200 [^s2] |unknown [^s3] |Scalable [^s4] |yes |yes |no |yes |no |no |no |no |no |? | no |
+|66000[^m1] |~200 |unknown |AutoVec[^m1] |see [^m1] |see[^m1] |no |see [^m1] |no |yes[^m2]|see [^m1] |no |no |no | no |
[^1]: plus EXT001 24-bit prefixing using 25% of EXT001 space. See [[sv/svp64]]
[^2]: If treated as a 1-Dimensional ISA, and designed badly, the 24-bit Prefix expands 200+ scalar instructions to well over a million intrinsics (N~=10^4 **times** M~=10^2).
[^6]: SVP64 Vectorises Scalar ops. It is up to the **implementor** to choose (**optionally**) whether to apply SVP64 to e.g. VSX Quad-Precision (128-bit) instructions, to create 128-bit Vector ops.
[^7]: big-integer add is just `sv.adde`. For optimal performance Bigint Mul and divide first require addition of two scalar operations (in turn, naturally Vectorised by SVP64). See [[sv/biginteger/analysis]]
[^8]: LD/ST Fault-First: see [[sv/svp64/appendix]] and [ARM SVE Fault-First](https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf)
-[^9]: Based on LD/ST Fail-first, extended to data. See [[sv/svp64/appendix]]
+[^9]: Data-dependent Fail-First: Based on LD/ST Fail-first, extended to data. Truncates VL based on failing an Rc=1 test. See [[sv/svp64/appendix]]
[^10]: Predicate-result effectively turns any standard op into a type of "cmp". See [[sv/svp64/appendix]]
[^11]: Any non-power-of-two Matrices up to 127 FMACs (or other FMA-style op), full triple-loop Schedule. See [[sv/remap]]
[^12]: DCT (Lee) and FFT Full Triple-loops supported, RADIX2-only. Normally only found in VLIW DSPs (TI MSP320, Qualcom Hexagon). See [[sv/remap]]