From: Luke Kenneth Casson Leighton Date: Sun, 31 Jul 2022 21:40:20 +0000 (+0100) Subject: shorten table, clarify data-dependent fail-first X-Git-Tag: opf_rfc_ls005_v1~918 X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=c3259d4c97efb31ec997c3e4ca301ca27ef17d41;p=libreriscv.git shorten table, clarify data-dependent fail-first --- diff --git a/openpower/sv/comparison_table.mdwn b/openpower/sv/comparison_table.mdwn index dcc9f52d2..428b5e2aa 100644 --- a/openpower/sv/comparison_table.mdwn +++ b/openpower/sv/comparison_table.mdwn @@ -1,15 +1,15 @@ **ISA Comparison Table to DRAFT SVP64** - discussion and research at -|ISA
name |No
opcodes|No
intrinsics|Taxonomy /
Class|setvl
scalable|Predicate
Masks|Twin
Pred|Vector
regs |128-bit
ops |Bigint |LDST
F/First|Data-dep
Fail-first|Pred-
Result|HW
Matrix|DCT/FFT
HW| -|---------------|--------------|-----------------|--------------------|-------------------|--------------------|-------------|----------------|-----------------|--------|----------------|-----------------------|----------------|-------------|--------------| -|SVP64 |5 [^1] |see [^2] |Scalable [^3] |yes |yes |yes [^4] |no [^5] |see [^6] |yes[^7] |yes [^8] |yes [^9] |yes [^10] |yes [^11] | yes[^12] | -|VSX |700+ |700?[^v1] |PackedSIMD |no |no |no |yes [^v2] |yes |no |no |no |no |yes [^v3] | no | -|NEON |~250 [^n1] |7088 [^n2] |PackedSIMD |no |no |no |yes |see [^b1] |no |no |no |no |no | no | -|SVE2 |~1000 [^e1] |6040 [^e2] |Predicated SIMD[^e3]|no [^e3] |yes |no |yes |see [^b1] |no |yes [^8] |no |no |yes [^e4] | no | -|AVX512 [^x1] |~1000s [^x2] |7256 [^x3] |Predicated SIMD |no |yes |no |yes |see [^b1] |no |no |no |no |yes [^x4] | no | -|RVV [^r1] |~190 [^r2] |~25000[^r3] |Scalable[^r4] |yes |yes |no |yes |yes [^r5] |no |yes |no |no |no | no | -|Aurora SX[^s1] |~200 [^s2] |unknown [^s3] |Scalable [^s4] |yes |yes |no |yes |no |no |no |no |no |? | no | -|66000[^m1] |~200 |unknown |AutoVec[^m1] |see [^m1] |see[^m1] |no |see [^m1] |no |yes[^m2]|see [^m1] |no |no |no | no | +|ISA
name |No
opcodes|No
intrinsics|Taxonomy /
Class|setvl
scalable|Pred.
Masks|Twin
Pred|Vector
regs |128-bit
ops |Bigint |LDST
F/First|Data-dep
F-first|Pred-
Result|HW
Matrix|DCT/FFT
HW| +|---------------|--------------|-----------------|--------------------|-------------------|----------------|-------------|----------------|-----------------|--------|----------------|--------------------|----------------|-------------|--------------| +|SVP64 |5 [^1] |see [^2] |Scalable [^3] |yes |yes |yes [^4] |no [^5] |see [^6] |yes[^7] |yes [^8] |yes [^9] |yes [^10] |yes [^11] | yes[^12] | +|VSX |700+ |700?[^v1] |PackedSIMD |no |no |no |yes [^v2] |yes |no |no |no |no |yes [^v3] | no | +|NEON |~250 [^n1] |7088 [^n2] |PackedSIMD |no |no |no |yes |see [^b1] |no |no |no |no |no | no | +|SVE2 |~1000 [^e1] |6040 [^e2] |Predicated SIMD[^e3]|no [^e3] |yes |no |yes |see [^b1] |no |yes [^8] |no |no |yes [^e4] | no | +|AVX512 [^x1] |~1000s [^x2] |7256 [^x3] |Predicated SIMD |no |yes |no |yes |see [^b1] |no |no |no |no |yes [^x4] | no | +|RVV [^r1] |~190 [^r2] |~25000[^r3] |Scalable[^r4] |yes |yes |no |yes |yes [^r5] |no |yes |no |no |no | no | +|Aurora SX[^s1] |~200 [^s2] |unknown [^s3] |Scalable [^s4] |yes |yes |no |yes |no |no |no |no |no |? | no | +|66000[^m1] |~200 |unknown |AutoVec[^m1] |see [^m1] |see[^m1] |no |see [^m1] |no |yes[^m2]|see [^m1] |no |no |no | no | [^1]: plus EXT001 24-bit prefixing using 25% of EXT001 space. See [[sv/svp64]] [^2]: If treated as a 1-Dimensional ISA, and designed badly, the 24-bit Prefix expands 200+ scalar instructions to well over a million intrinsics (N~=10^4 **times** M~=10^2). @@ -20,7 +20,7 @@ [^6]: SVP64 Vectorises Scalar ops. It is up to the **implementor** to choose (**optionally**) whether to apply SVP64 to e.g. VSX Quad-Precision (128-bit) instructions, to create 128-bit Vector ops. [^7]: big-integer add is just `sv.adde`. For optimal performance Bigint Mul and divide first require addition of two scalar operations (in turn, naturally Vectorised by SVP64). See [[sv/biginteger/analysis]] [^8]: LD/ST Fault-First: see [[sv/svp64/appendix]] and [ARM SVE Fault-First](https://alastairreid.github.io/papers/sve-ieee-micro-2017.pdf) -[^9]: Based on LD/ST Fail-first, extended to data. See [[sv/svp64/appendix]] +[^9]: Data-dependent Fail-First: Based on LD/ST Fail-first, extended to data. Truncates VL based on failing an Rc=1 test. See [[sv/svp64/appendix]] [^10]: Predicate-result effectively turns any standard op into a type of "cmp". See [[sv/svp64/appendix]] [^11]: Any non-power-of-two Matrices up to 127 FMACs (or other FMA-style op), full triple-loop Schedule. See [[sv/remap]] [^12]: DCT (Lee) and FFT Full Triple-loops supported, RADIX2-only. Normally only found in VLIW DSPs (TI MSP320, Qualcom Hexagon). See [[sv/remap]]