[AArch64] Improve SVE constant moves
authorRichard Sandiford <richard.sandiford@arm.com>
Tue, 13 Aug 2019 10:40:02 +0000 (10:40 +0000)
committerRichard Sandiford <rsandifo@gcc.gnu.org>
Tue, 13 Aug 2019 10:40:02 +0000 (10:40 +0000)
commit4aeb1ba7f62c1d680c819ae3e137c3bad6f520ca
treed1fbc5144505dbb6ba45c2f71bdd2d5358762151
parent4e55aefa3ee19167a41892e4920a3e8c520aee42
[AArch64] Improve SVE constant moves

If there's no SVE instruction to load a given constant directly, this
patch instead tries to use an Advanced SIMD constant move and then
duplicates the constant to fill an SVE vector.  The main use of this
is to support constants in which each byte is in { 0, 0xff }.

Also, the patch prefers a simple integer move followed by a duplicate
over a load from memory, like we already do for Advanced SIMD.  This is
a useful option to have and would be easy to turn off via a tuning
parameter if necessary.

The patch also extends the handling of wide LD1Rs to big endian,
whereas previously we punted to a full LD1RQ.

2019-08-13  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
* machmode.h (opt_mode::else_mode): New function.
(opt_mode::else_blk): Use it.
* config/aarch64/aarch64-protos.h (aarch64_vq_mode): Declare.
(aarch64_full_sve_mode, aarch64_sve_ld1rq_operand_p): Likewise.
(aarch64_gen_stepped_int_parallel): Likewise.
(aarch64_stepped_int_parallel_p): Likewise.
(aarch64_expand_mov_immediate): Remove the optional gen_vec_duplicate
argument.
* config/aarch64/aarch64.c
(aarch64_expand_sve_widened_duplicate): Delete.
(aarch64_expand_sve_dupq, aarch64_expand_sve_ld1rq): New functions.
(aarch64_expand_sve_const_vector): Rewrite to handle more cases.
(aarch64_expand_mov_immediate): Remove the optional gen_vec_duplicate
argument.  Use early returns in the !CONST_INT_P handling.
Pass all SVE data vectors to aarch64_expand_sve_const_vector rather
than handling some inline.
(aarch64_full_sve_mode, aarch64_vq_mode): New functions, split out
from...
(aarch64_simd_container_mode): ...here.
(aarch64_gen_stepped_int_parallel, aarch64_stepped_int_parallel_p)
(aarch64_sve_ld1rq_operand_p): New functions.
* config/aarch64/predicates.md (descending_int_parallel)
(aarch64_sve_ld1rq_operand): New predicates.
* config/aarch64/constraints.md (UtQ): New constraint.
* config/aarch64/aarch64.md (UNSPEC_REINTERPRET): New unspec.
* config/aarch64/aarch64-sve.md (mov<SVE_ALL:mode>): Remove the
gen_vec_duplicate from call to aarch64_expand_mov_immediate.
(@aarch64_sve_reinterpret<mode>): New expander.
(*aarch64_sve_reinterpret<mode>): New pattern.
(@aarch64_vec_duplicate_vq<mode>_le): New pattern.
(@aarch64_vec_duplicate_vq<mode>_be): Likewise.
(*sve_ld1rq<Vesize>): Replace with...
(@aarch64_sve_ld1rq<mode>): ...this new pattern.

gcc/testsuite/
* gcc.target/aarch64/sve/init_2.c: Expect ld1rd to be used
instead of a full vector load.
* gcc.target/aarch64/sve/init_4.c: Likewise.
* gcc.target/aarch64/sve/ld1r_2.c: Remove constants that no longer
need to be loaded from memory.
* gcc.target/aarch64/sve/slp_2.c: Expect the same output for
big and little endian.
* gcc.target/aarch64/sve/slp_3.c: Likewise.  Expect 3 of the
doubles to be moved via integer registers rather than loaded
from memory.
* gcc.target/aarch64/sve/slp_4.c: Likewise but for 4 doubles.
* gcc.target/aarch64/sve/spill_4.c: Expect 16-bit constants to be
loaded via an integer register rather than from memory.
* gcc.target/aarch64/sve/const_1.c: New test.
* gcc.target/aarch64/sve/const_2.c: Likewise.
* gcc.target/aarch64/sve/const_3.c: Likewise.

From-SVN: r274375
19 files changed:
gcc/ChangeLog
gcc/config/aarch64/aarch64-protos.h
gcc/config/aarch64/aarch64-sve.md
gcc/config/aarch64/aarch64.c
gcc/config/aarch64/aarch64.md
gcc/config/aarch64/constraints.md
gcc/config/aarch64/predicates.md
gcc/machmode.h
gcc/testsuite/ChangeLog
gcc/testsuite/gcc.target/aarch64/sve/const_1.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/const_2.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/const_3.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/sve/init_2.c
gcc/testsuite/gcc.target/aarch64/sve/init_4.c
gcc/testsuite/gcc.target/aarch64/sve/ld1r_2.c
gcc/testsuite/gcc.target/aarch64/sve/slp_2.c
gcc/testsuite/gcc.target/aarch64/sve/slp_3.c
gcc/testsuite/gcc.target/aarch64/sve/slp_4.c
gcc/testsuite/gcc.target/aarch64/sve/spill_4.c