[AArch64] Use more LDP/STP in shrinkwrapping
The shrinkwrap optimization added in GCC 7 allows each callee-save to
be delayed and performed only in the blocks that actually need it.
Although this reduces unnecessary memory traffic on code paths that need
few callee-saves, it typically uses LDR/STR rather than LDP/STP. This
means more memory accesses and a codesize increase of ~1.0% on average.
To improve this, if a particular callee-save must be saved/restored, also
add the adjacent callee-save to allow use of LDP/STP. This significantly
reduces codesize (for example gcc_r, povray_r, parest_r, xalancbmk_r are
1% smaller). This is a simple fix which can be backported. A more advanced
approach would scan blocks for pairs of callee-saves, but that requires a
full rewrite of all the callee-save code, which is too late at this stage.
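The pairing rule described above can be sketched in standalone C. This is
only an illustrative model under stated assumptions, not the code in
aarch64_components_for_bb: the names add_adjacent_saves, slot_offset,
NUM_REGS and NO_SLOT are hypothetical, registers are assumed to be numbered
x0..x30 as 0..30 and d0..d31 as 32..63, and every save slot is assumed to
be 8 bytes wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical frame model: slot_offset[r] is the byte offset of the save
   slot of register r, or NO_SLOT if r is not a callee-save in this frame.  */
#define NUM_REGS 64
#define NO_SLOT (-1)

/* NEEDED has bit r set if the block must save/restore register r.  Return
   NEEDED plus, for each such register, the neighbouring register whose slot
   lies in the same 16-byte aligned pair, so one LDP/STP can cover both.  */
uint64_t
add_adjacent_saves (uint64_t needed, const int slot_offset[NUM_REGS])
{
  uint64_t result = needed;

  for (int regno = 0; regno < NUM_REGS; regno++)
    {
      if (!(needed & (UINT64_C (1) << regno)))
        continue;

      int offset = slot_offset[regno];
      assert (offset != NO_SLOT && offset % 8 == 0);

      /* The candidate partner is the next register if this slot is the low
         half of a 16-byte pair, and the previous register otherwise.  */
      int regno2 = (offset & 8) == 0 ? regno + 1 : regno - 1;
      if (regno2 < 0 || regno2 >= NUM_REGS)
        continue;

      /* Only add the partner if it really has the adjacent slot.  */
      int offset2 = slot_offset[regno2];
      if (offset2 != NO_SLOT && (offset & ~8) == (offset2 & ~8))
        result |= UINT64_C (1) << regno2;
    }

  return result;
}
```

With the frame layout of the epilog shown in this message (x23 at offset 32,
x24 at offset 40), asking for x23 alone also yields x24, whereas x30 at
offset 80 has no neighbouring register in the other half of its pair and
stays a single LDR/STR.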
An example epilog in a shrinkwrapped function before this patch:
    ldp x21, x22, [sp,#16]
    ldr x23, [sp,#32]
    ldr x24, [sp,#40]
    ldp x25, x26, [sp,#48]
    ldr x27, [sp,#64]
    ldr x28, [sp,#72]
    ldr x30, [sp,#80]
    ldr d8, [sp,#88]
    ldp x19, x20, [sp],#96
    ret
And after this patch:
    ldr d8, [sp,#88]
    ldp x21, x22, [sp,#16]
    ldp x23, x24, [sp,#32]
    ldp x25, x26, [sp,#48]
    ldp x27, x28, [sp,#64]
    ldr x30, [sp,#80]
    ldp x19, x20, [sp],#96
    ret
gcc/
* config/aarch64/aarch64.c (aarch64_components_for_bb):
Increase LDP/STP opportunities by adding adjacent callee-saves.
From-SVN: r257482