vector creation from two parts of two vectors produces TBL rather than ins (PR 93720)
The following patch enables vector permutations optimization by trying
to use ins instruction instead of slow and generic tbl.
example:
vector float f0(vector float a, vector float b)
{
return __builtin_shuffle (a, a, (vector int){3, 1, 2, 3});
}
was compiled into:
...
adrp x0, .LC0
ldr q1, [x0, #:lo12:.LC0]
tbl v0.16b, {v0.16b}, v1.16b
...
and after patch:
...
ins v0.s[0], v0.s[3]
...
bootstrapped and tested on aarch64-linux-gnu with no regressions
gcc/ChangeLog:
2020-07-17 Andrew Pinski <apinksi@marvell.com>
PR target/93720
* config/aarch64/aarch64.c (aarch64_evpc_ins): New function.
(aarch64_expand_vec_perm_const_1): Call it.
* config/aarch64/aarch64-simd.md (aarch64_simd_vec_copy_lane): Make
public, and add a "@" prefix.
gcc/testsuite/ChangeLog:
2020-07-17 Andrew Pinski <apinksi@marvell.com>
PR target/93720
* gcc.target/aarch64/vins-1.c: New test.
* gcc.target/aarch64/vins-2.c: New test.
* gcc.target/aarch64/vins-3.c: New test.
Co-Authored-By: Dmitrij Pochepko <dmitrij.pochepko@bell-sw.com>