AArch64: Improve inline memcpy expansion
Improve the inline memcpy expansion. Use integer load/store for copies <= 24
bytes instead of SIMD. Set the maximum copy size to expand inline to 256 bytes
by default, except that with -Os or without Neon copies are expanded only up to
128 bytes. When using LDP/STP of Q-registers, also use Q-register accesses for
the unaligned tail, saving 2 instructions (e.g. all sizes up to 48 bytes emit
exactly 4 instructions).
Clean up code and comments.
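The size limits and the overlapping Q-register tail can be modelled in plain C. This is a sketch under stated assumptions, not the code in aarch64_expand_cpymem: `max_inline_copy`, `use_integer_regs` and `copy_16_to_48` are hypothetical names, and each fixed-size 16-byte `memcpy` stands in for one Q-register load plus one store. Because the tail is a single 16-byte access ending exactly at byte n, any size from 16 to 48 bytes is covered by a head of one or two 16-byte blocks plus that one tail block, with no byte/halfword/word clean-up sequence.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the limits described in the commit message:
   expand up to 256 bytes, or up to 128 bytes with -Os or without Neon. */
static size_t max_inline_copy(bool optimize_size, bool have_neon)
{
    return (optimize_size || !have_neon) ? 128 : 256;
}

/* Hypothetical model: copies of at most 24 bytes use integer registers,
   as does every copy when Neon is unavailable. */
static bool use_integer_regs(size_t size, bool have_neon)
{
    return size <= 24 || !have_neon;
}

/* Copy n bytes, 16 <= n <= 48, using only 16-byte block moves.
   Each 16-byte memcpy models one Q-register load plus one store.
   dst and src must not overlap (memcpy semantics). */
static void copy_16_to_48(unsigned char *dst, const unsigned char *src,
                          size_t n)
{
    /* Head: the first 16 bytes. */
    memcpy(dst, src, 16);
    if (n >= 32) {
        /* Next 16 bytes; together with the head this models an
           LDP/STP of two Q-registers. */
        memcpy(dst + 16, src + 16, 16);
    }
    /* Tail: one unaligned 16-byte access ending exactly at byte n.
       It may overlap bytes already written; they simply receive the
       same values again. */
    memcpy(dst + n - 16, src + n - 16, 16);
}
```

For n = 40, the head copies bytes 0..31 (one LDP/STP pair in the real expansion) and the tail copies bytes 24..39, rewriting bytes 24..31 with identical values. Nothing past byte n - 1 is ever written.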
The code size gain versus the GCC 10 expansion is 0.05% on SPECINT2017.
2020-11-03 Wilco Dijkstra <wdijkstr@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_expand_cpymem): Clean up code and
comments, tweak expansion decisions and improve tail expansion.