i386: Optimize psubusw compared to 0 into pminuw compared to op0 [PR96906]
The following patch renames VI12_AVX2 iterator to VI12_AVX2_AVX512BW
for consistency with some other iterators, as I need VI12_AVX2 without
AVX512BW for this change.
The real meat is a combiner split which combine
can use to optimize psubusw compared to 0 into pminuw compared to op0
(and similarly for psubusb compared to 0 into pminub compared to op0).
According to Agner Fog's tables, psubus[bw] and pminu[bw] timings
are the same, but the advantage of pminu[bw] is that the comparison
doesn't need a zero operand, so e.g. for -msse4.1 it causes changes like
- psubusw %xmm1, %xmm0
- pxor %xmm1, %xmm1
+ pminuw %xmm0, %xmm1
pcmpeqw %xmm1, %xmm0
and similarly for avx2:
- vpsubusb %ymm1, %ymm0, %ymm0
- vpxor %xmm1, %xmm1, %xmm1
- vpcmpeqb %ymm1, %ymm0, %ymm0
+ vpminub %ymm1, %ymm0, %ymm1
+ vpcmpeqb %ymm0, %ymm1, %ymm0
I haven't done the AVX512{BW,VL} define_split(s), they'll need
to match the UNSPEC_PCMP which are used for avx512 comparisons.
2020-11-26 Jakub Jelinek <jakub@redhat.com>
PR target/96906
* config/i386/sse.md (VI12_AVX2): Remove V64QI/V32HI modes.
(VI12_AVX2_AVX512BW): New mode iterator.
(<sse2_avx2>_<plusminus_insn><mode>3<mask_name>,
uavg<mode>3_ceil, <sse2_avx2>_uavg<mode>3<mask_name>): Use
VI12_AVX2_AVX512BW iterator instead of VI12_AVX2.
(*<sse2_avx2>_<plusminus_insn><mode>3<mask_name>): Likewise.
(*<sse2_avx2>_uavg<mode>3<mask_name>): Likewise.
(*<sse2_avx2>_<plusminus_insn><mode>3<mask_name>): Add a new
define_split after this insn.
* gcc.target/i386/pr96906-1.c: New test.