i386: prefer vpermilpd over vpermpd [PR93395]
author	Jakub Jelinek <jakub@redhat.com>
Fri, 24 Jan 2020 21:49:51 +0000 (22:49 +0100)
committer	Jakub Jelinek <jakub@redhat.com>
Fri, 24 Jan 2020 21:49:51 +0000 (22:49 +0100)
commit	5d782a8d909c5cc472c911c0ab4de0b890aad868
tree	d6f339ea0122beaeebf6c00eb33965c150142c3f
parent	14e5881e37771f1f58123e77c558adb3b90c8764

In Agner Fog's tables, vpermilp[sd] with an immediate appear to be
much faster than vpermpd with an immediate, and for a good reason:
the former only permute elements within each 128-bit lane and never move
anything across lanes, while vpermpd can.  So, functionality-wise,
vpermilpd is a more efficient subset of vpermpd.  We use the same RTL for
both, though (and also for certain broadcasts).
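The subset relationship can be checked with a small portable C model of
the two immediate forms on four doubles.  This is a sketch written from
the documented instruction semantics, not code from GCC; the names
model_vpermpd, model_vpermilpd, vpermilpd_to_vpermpd_imm and
vpermilpd_is_subset_of_vpermpd are hypothetical.  vpermpd spends two
immediate bits per result element and may pick from either 128-bit lane,
while vpermilpd spends one bit per element and only picks within its own
lane, so each of the 16 distinct vpermilpd immediates has a vpermpd twin.

```c
#include <assert.h>

/* Hypothetical reference model of VPERMPD ymm, ymm, imm8 on four doubles:
   result[i] = src[(imm8 >> 2*i) & 3], so any element may come from any lane. */
static void model_vpermpd(double dst[4], const double src[4], unsigned imm8)
{
    assert(imm8 <= 0xff);                 /* the immediate is 8 bits wide */
    for (int i = 0; i < 4; i++)
        dst[i] = src[(imm8 >> (2 * i)) & 3];
}

/* Hypothetical reference model of VPERMILPD ymm, ymm, imm8 (bits 3:0 used):
   result[i] takes element 0 or 1 of its own 128-bit lane, never crossing. */
static void model_vpermilpd(double dst[4], const double src[4], unsigned imm8)
{
    assert(imm8 <= 0xff);
    for (int i = 0; i < 4; i++) {
        int lane_base = (i / 2) * 2;      /* 0 for the low lane, 2 for the high lane */
        dst[i] = src[lane_base + ((imm8 >> i) & 1)];
    }
}

/* Translate a vpermilpd immediate into a vpermpd immediate with the same effect. */
static unsigned vpermilpd_to_vpermpd_imm(unsigned imm8)
{
    unsigned out = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (unsigned)((i / 2) * 2) + ((imm8 >> i) & 1);
        out |= sel << (2 * i);            /* two selector bits per result element */
    }
    return out;
}

/* Returns 1 if every one of the 16 distinct vpermilpd immediates (bits 3:0)
   is reproduced exactly by the translated vpermpd immediate. */
static int vpermilpd_is_subset_of_vpermpd(void)
{
    const double src[4] = {10.0, 11.0, 12.0, 13.0};
    for (unsigned imm = 0; imm < 16; imm++) {
        double a[4], b[4];
        model_vpermilpd(a, src, imm);
        model_vpermpd(b, src, vpermilpd_to_vpermpd_imm(imm));
        for (int i = 0; i < 4; i++)
            if (a[i] != b[i])
                return 0;
    }
    return 1;
}
```

For example, vpermilpd_to_vpermpd_imm(0x5) yields 0xB1, the vpermpd
encoding of swapping the two doubles inside each lane.  The converse does
not hold: a full reversal such as vpermpd 0x1B moves elements across lanes
and has no vpermilpd equivalent.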

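Now, the problem was that the vpermpd pattern appeared first in sse.md,
followed by the broadcast patterns, followed by the vpermilp[sd] patterns.
Since the insn recognizer picks the first matching pattern in sse.md order,
this meant that, unless compiling with -mavx -mno-avx2 (vpermpd requires
AVX2), we'd emit vpermpd instead of the more efficient alternatives.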

The following patch reorders them so that vpermpd comes last: if we can
match a broadcast, we do; otherwise, if we can match a vpermilp[sd] that is
not a broadcast, we use that; only then do we fall back to vpermpd (of
course only with -mavx2).
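The preference order can be illustrated with a hypothetical classifier
for V4DF permutations given as vpermpd-style immediates.  It is not GCC's
recog logic: classify_v4df_perm and the perm_form names are made up here,
it assumes a "broadcast" means all four selectors are equal, and it
ignores any other shapes the real broadcast patterns may accept.

```c
#include <assert.h>

/* Hypothetical result of matching, in the preference order of the reordered
   patterns; PERM_NO_MATCH means none of these three single patterns applies. */
enum perm_form { PERM_BROADCAST, PERM_VPERMILPD, PERM_VPERMPD, PERM_NO_MATCH };

/* Selector (source element index) for result element i of a vpermpd imm8. */
static unsigned selector(unsigned imm8, int i)
{
    return (imm8 >> (2 * i)) & 3;
}

/* Hypothetical classifier: broadcast first, then an in-lane vpermilpd
   (elements 0-1 from the low lane, elements 2-3 from the high lane),
   and only with AVX2 the general cross-lane vpermpd. */
static enum perm_form classify_v4df_perm(unsigned imm8, int have_avx2)
{
    assert(imm8 <= 0xff);
    unsigned s0 = selector(imm8, 0), s1 = selector(imm8, 1);
    unsigned s2 = selector(imm8, 2), s3 = selector(imm8, 3);

    if (s0 == s1 && s1 == s2 && s2 == s3)
        return PERM_BROADCAST;
    if (s0 < 2 && s1 < 2 && s2 >= 2 && s3 >= 2)
        return PERM_VPERMILPD;
    return have_avx2 ? PERM_VPERMPD : PERM_NO_MATCH;
}
```

Testing the broadcast case first cannot steal a V4DF vpermilpd match:
a broadcast of one double uses the same selector in both lanes, so it can
never satisfy the in-lane test that vpermilpd requires.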

2020-01-24  Jakub Jelinek  <jakub@redhat.com>

	PR target/93395
	* config/i386/sse.md (*avx_vperm_broadcast_v4sf,
	*avx_vperm_broadcast_<mode>,
	<sse2_avx_avx512f>_vpermil<mode><mask_name>,
	*<sse2_avx_avx512f>_vpermilp<mode><mask_name>):
	Move before avx2_perm<mode>/avx512f_perm<mode>.

	* gcc.target/i386/pr93395.c: New test.
	* gcc.target/i386/avx512vl-vpermilpdi-1.c: Remove xfail.
gcc/ChangeLog
gcc/config/i386/sse.md
gcc/testsuite/ChangeLog
gcc/testsuite/gcc.target/i386/avx512vl-vpermilpdi-1.c
gcc/testsuite/gcc.target/i386/pr93395.c [new file with mode: 0644]