AArch64: Adjust costing of by element MUL to be the same as SAME3 MUL.
The cost model is currently treating multiplication by element as being more
expensive than 3 same multiplication. This means that if the value is on the
SIMD side we add an unneeded DUP. If the value is on the genreg side we use the
more expensive DUP instead of fmov.
This patch corrects the costs such that the two multiplies are costed the same
which allows us to generate
fmul v3.4s, v3.4s, v0.s[0]
instead of
dup v0.4s, v0.s[0]
fmul v3.4s, v3.4s, v0.4s
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_rtx_mult_cost): Adjust costs for mul.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/asimd-mull-elem.c: New test.