i386: Optimize psubusw compared to 0 into pminuw compared to op0 [PR96906]
author Jakub Jelinek <jakub@redhat.com>
Thu, 26 Nov 2020 07:44:15 +0000 (08:44 +0100)
committer Jakub Jelinek <jakub@redhat.com>
Thu, 26 Nov 2020 07:46:14 +0000 (08:46 +0100)
commit 32b0abb24b8702ec9954448739682ace6fa5ccf5
tree 4e36a72b8c81c870020720f0a26d9f166873b462
parent 768ce4f0ceb030e38427e85e483ed44330cd5da7

The following patch renames the VI12_AVX2 iterator to VI12_AVX2_AVX512BW
for consistency with some other iterators, as I need a VI12_AVX2 without
AVX512BW for this change.
The real meat is a combiner split which combine
can use to optimize psubusw compared to 0 into pminuw compared to op0
(and similarly psubusb compared to 0 into pminub compared to op0).
Both forms test the same condition: an unsigned saturating subtraction
op0 - op1 yields 0 exactly when op0 <= op1, which is also exactly when
the unsigned minimum of op0 and op1 equals op0.
According to Agner Fog's tables, psubus[bw] and pminu[bw] timings
are the same, but the advantage of pminu[bw] is that the comparison
doesn't need a zero operand, so e.g. with -msse4.1 the generated code
changes like this:
-       psubusw %xmm1, %xmm0
-       pxor    %xmm1, %xmm1
+       pminuw  %xmm0, %xmm1
        pcmpeqw %xmm1, %xmm0
and similarly for avx2:
-       vpsubusb        %ymm1, %ymm0, %ymm0
-       vpxor   %xmm1, %xmm1, %xmm1
-       vpcmpeqb        %ymm1, %ymm0, %ymm0
+       vpminub %ymm1, %ymm0, %ymm1
+       vpcmpeqb        %ymm0, %ymm1, %ymm0
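
The equivalence behind this transformation can be checked with a scalar
model of a single byte lane. The following is only an illustrative sketch,
not code from the patch or from its test case: the helper names
(sat_sub_u8, min_u8, old_form, new_form, count_mismatches_u8) are
hypothetical, and the model assumes psubusb behaves as an unsigned
saturating subtract and pminub as an unsigned minimum per lane. The same
argument applies to the 16-bit psubusw/pminuw lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one psubusb lane:
   unsigned subtraction that saturates at 0 instead of wrapping.  */
static uint8_t sat_sub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Hypothetical scalar model of one pminub lane: unsigned minimum.  */
static uint8_t min_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* Old form: saturating subtract, then compare the result with 0.  */
static int old_form(uint8_t a, uint8_t b)
{
    return sat_sub_u8(a, b) == 0;
}

/* New form: unsigned minimum, then compare the result with op0.  */
static int new_form(uint8_t a, uint8_t b)
{
    return min_u8(a, b) == a;
}

/* Compare both forms over every pair of byte values and return the
   number of pairs on which they disagree.  */
static unsigned count_mismatches_u8(void)
{
    unsigned mismatches = 0;
    for (unsigned a = 0; a <= 0xFF; a++)
        for (unsigned b = 0; b <= 0xFF; b++)
            if (old_form((uint8_t)a, (uint8_t)b)
                != new_form((uint8_t)a, (uint8_t)b))
                mismatches++;
    return mismatches;
}
```

Both forms reduce to the unsigned test op0 <= op1, so the exhaustive
check over all 65536 byte pairs should report no mismatches.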

I haven't done the AVX512{BW,VL} define_split(s); they would need
to match the UNSPEC_PCMP patterns which are used for AVX512 comparisons.

2020-11-26  Jakub Jelinek  <jakub@redhat.com>

	PR target/96906
	* config/i386/sse.md (VI12_AVX2): Remove V64QI/V32HI modes.
	(VI12_AVX2_AVX512BW): New mode iterator.
	(<sse2_avx2>_<plusminus_insn><mode>3<mask_name>,
	uavg<mode>3_ceil, <sse2_avx2>_uavg<mode>3<mask_name>): Use
	VI12_AVX2_AVX512BW iterator instead of VI12_AVX2.
	(*<sse2_avx2>_<plusminus_insn><mode>3<mask_name>): Likewise.
	(*<sse2_avx2>_uavg<mode>3<mask_name>): Likewise.
	(*<sse2_avx2>_<plusminus_insn><mode>3<mask_name>): Add a new
	define_split after this insn.

	* gcc.target/i386/pr96906-1.c: New test.
gcc/config/i386/sse.md
gcc/testsuite/gcc.target/i386/pr96906-1.c [new file with mode: 0644]