aarch64: Use Q-reg loads/stores in movmem expansion
This is my attempt at reviving the old patch at
https://gcc.gnu.org/pipermail/gcc-patches/2019-January/514632.html
I have followed up on Kyrill's upstream comment in the thread linked
above and implemented option iii, which he recommended:
"1) Adjust the copy_limit to 256 bits after checking
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS in the tuning.
2) Adjust aarch64_copy_one_block_and_progress_pointers to handle
256-bit moves, by option iii:
iii) Emit explicit V4SI (or any other 128-bit vector mode) pairs
ldp/stps. This wouldn't need any adjustments to MD patterns,
but would make aarch64_copy_one_block_and_progress_pointers
more complex as it would now have two paths, where one
handles two adjacent memory addresses in one call."
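As a rough illustration of the two steps quoted above, here is a
standalone C model of the strategy. It is only a sketch and not the code
in aarch64.c: the names copy_limit_bytes, copy_one_block and model_copy
are hypothetical, the tuning check is reduced to a plain flag, and the
128-bit fallback limit is an assumption. A 32-byte block is modelled as
two adjacent 16-byte moves, standing in for one Q-register ldp/stp pair.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for testing AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS.
   Assumption: the limit stays at 16 bytes (128 bits) when Q-register
   pairs are disabled, and is 32 bytes (256 bits) otherwise.  */
static size_t
copy_limit_bytes (int no_ldp_stp_qregs)
{
  return no_ldp_stp_qregs ? 16 : 32;
}

/* Copy one block of BLOCK bytes and advance both pointers.  The 32-byte
   case is the extra path: two adjacent 16-byte moves handled in a
   single call, modelling one load pair and one store pair.  */
static void
copy_one_block (unsigned char **dst, const unsigned char **src, size_t block)
{
  if (block == 32)
    {
      memcpy (*dst, *src, 16);
      memcpy (*dst + 16, *src + 16, 16);
    }
  else
    memcpy (*dst, *src, block);

  *dst += block;
  *src += block;
}

/* Copy N bytes in power-of-two blocks no larger than the copy limit.
   Returns the number of block moves, so the effect of the limit can be
   observed.  This greedy loop is a simplification of a real expander.  */
static unsigned
model_copy (unsigned char *dst, const unsigned char *src, size_t n,
            int no_ldp_stp_qregs)
{
  size_t limit = copy_limit_bytes (no_ldp_stp_qregs);
  unsigned moves = 0;

  while (n > 0)
    {
      size_t block = limit;
      /* Shrink the block until it fits in the remaining bytes.  */
      while (block > n)
        block /= 2;
      copy_one_block (&dst, &src, block);
      n -= block;
      moves++;
    }
  return moves;
}
```

In this model a 64-byte copy takes two block moves when Q-register
pairs are allowed and four when they are not, which is the kind of
reduction the larger copy_limit is meant to achieve.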
gcc/ChangeLog:
* config/aarch64/aarch64.c (aarch64_gen_store_pair): Add case
for E_V4SImode.
(aarch64_gen_load_pair): Likewise.
(aarch64_copy_one_block_and_progress_pointers): Handle 256-bit copy.
(aarch64_expand_cpymem): Expand copy_limit to 256 bits where
appropriate.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/cpymem-q-reg_1.c: New test.
* gcc.target/aarch64/large_struct_copy_2.c: Update for ldp q regs.