[AArch64] Support for LDP/STP of Q-registers
This patch adds support for generating LDPs and STPs of Q-registers.
This allows for more compact code generation and makes better use of the ISA.
It's implemented in a straightforward way by allowing 16-byte modes in the
sched-fusion machinery and adding appropriate peepholes in aarch64-ldpstp.md
as well as the patterns themselves in aarch64-simd.md.
It adds a new no_ldp_stp_qregs tuning flag.
I use it to restrict the peepholes in aarch64-ldpstp.md from merging the
operations together into PARALLELs. I also use it to restrict the sched fusion
check that brings such loads and stores together. This is enough to avoid
forming the pairs when the tuning flag is set.
I didn't see any non-noise performance effect on SPEC2017 on Cortex-A72 and Cortex-A53.
* config/aarch64/aarch64-tuning-flags.def (no_ldp_stp_qregs): New.
* config/aarch64/aarch64.c (xgene1_tunings): Add
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS to tune_flags.
(aarch64_mode_valid_for_sched_fusion_p):
Allow 16-byte modes.
(aarch64_classify_address): Allow 16-byte modes for load_store_pair_p.
* config/aarch64/aarch64-ldpstp.md: Add peepholes for LDP STP of
128-bit modes.
* config/aarch64/aarch64-simd.md (load_pair<VQ:mode><VQ2:mode>):
New pattern.
(vec_store_pair<VQ:mode><VQ2:mode>): Likewise.
* config/aarch64/iterators.md (VQ2): New mode iterator.
* gcc.target/aarch64/ldp_stp_q.c: New test.
* gcc.target/aarch64/stp_vec_128_1.c: Likewise.
* gcc.target/aarch64/ldp_stp_q_disable.c: Likewise.
From-SVN: r261796