Improve dup pattern

Improve the dup pattern to prefer vector registers.  When a dup follows
a load, the register allocator sees both alternatives as equally costly
and chooses an integer load.  However, a dup from an integer register
includes an int->fp transfer, which is not modelled.  Adding a '?' to
the integer alternative increases its cost slightly, so the allocator
prefers the vector register.  This improves the following example:

    #include <arm_neon.h>
    void f(unsigned *a, uint32x4_t *b)
    {
      b[0] = vdupq_n_u32(a[1]);
      b[1] = vdupq_n_u32(a[2]);
    }

to:

    ldr s0, [x0, 4]
    dup v0.4s, v0.s[0]
    str q0, [x1]
    ldr s0, [x0, 8]
    dup v0.4s, v0.s[0]
    str q0, [x1, 16]
    ret

    gcc/
	* config/aarch64/aarch64-simd.md (aarch64_simd_dup):
	Swap alternatives, make integer dup more expensive.

From-SVN: r249443