Code scheduling for Cortex-A53 isn't as good as it could be.
author Wilco Dijkstra <wdijkstr@arm.com>
Fri, 5 May 2017 09:40:01 +0000 (09:40 +0000)
committer Wilco Dijkstra <wilco@gcc.gnu.org>
Fri, 5 May 2017 09:40:01 +0000 (09:40 +0000)
commit 9d29ae83ed27a06cdd729cd7f0cceb12c28b91dc
tree 8e1a45a11552c8773d1770494a16e028b354522b
parent dfae9048a0ce06a8f240dd17c282cb1e1eaf2097

Code scheduling for Cortex-A53 isn't as good as it could be.  It turns out
that code runs faster overall if dependent loads and stores are placed
closer together.  To achieve this, this patch adds a bypass from
cortex_a53_load1 to cortex_a53_load*/cortex_a53_store* that applies when the
result of an earlier load is used in an address calculation.  This
significantly improved scores in a proprietary benchmark suite.
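
As a rough illustration of the condition the new bypasses test, here is a
minimal sketch in plain C.  It is not the code added to aarch-common.c, which
operates on GCC's RTL; the toy_insn model, every toy_* name and the latency
parameters are assumptions made up for this example.  It only shows the idea:
the bypass applies when the register written by an earlier load feeds the
address (not the stored data) of a later load or store.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy instruction model (an assumption for this sketch).
   A register number of -1 means "not used".  */
enum toy_kind { TOY_LOAD, TOY_STORE, TOY_ALU };

struct toy_insn
{
  enum toy_kind kind;
  int dest_reg;        /* Register written by a load or ALU op.  */
  int addr_base_reg;   /* Base register of the memory address.  */
  int addr_index_reg;  /* Index register of the memory address.  */
  int data_reg;        /* Register whose value a store writes to memory.  */
};

/* True if REG takes part in INSN's address calculation.  */
bool
toy_addr_uses_reg (const struct toy_insn *insn, int reg)
{
  return reg >= 0
	 && (insn->addr_base_reg == reg || insn->addr_index_reg == reg);
}

/* True if CONSUMER is a load whose address depends on the result of the
   earlier load PRODUCER.  */
bool
toy_early_load_addr_dep (const struct toy_insn *producer,
			 const struct toy_insn *consumer)
{
  return producer->kind == TOY_LOAD
	 && consumer->kind == TOY_LOAD
	 && toy_addr_uses_reg (consumer, producer->dest_reg);
}

/* True if CONSUMER is a store whose address depends on the result of the
   earlier load PRODUCER.  A dependency on the stored data does not count.  */
bool
toy_early_store_addr_dep (const struct toy_insn *producer,
			  const struct toy_insn *consumer)
{
  return producer->kind == TOY_LOAD
	 && consumer->kind == TOY_STORE
	 && toy_addr_uses_reg (consumer, producer->dest_reg);
}

/* Pick the latency a scheduler would assume between PRODUCER and CONSUMER.
   Both latency values are caller-supplied placeholders, not the numbers
   used in cortex-a53.md.  */
int
toy_latency (const struct toy_insn *producer, const struct toy_insn *consumer,
	     int default_latency, int bypass_latency)
{
  if (toy_early_load_addr_dep (producer, consumer)
      || toy_early_store_addr_dep (producer, consumer))
    return bypass_latency;
  return default_latency;
}

/* Small self-check: r1 = [r0] followed by r2 = [r1] is an address
   dependency; r1 = [r0] followed by [r3] = r1 is only a data dependency.  */
void
toy_self_check (void)
{
  struct toy_insn ld = { TOY_LOAD, 1, 0, -1, -1 };
  struct toy_insn dep_ld = { TOY_LOAD, 2, 1, -1, -1 };
  struct toy_insn data_st = { TOY_STORE, -1, 3, -1, 1 };

  assert (toy_early_load_addr_dep (&ld, &dep_ld));
  assert (!toy_early_store_addr_dep (&ld, &data_st));
}
```

In this sketch the predicate decides which latency the scheduler sees for a
producer/consumer pair, which is roughly the role a guard function plays in
a bypass; the real guards would also have to cope with everything RTL can
express, which the toy model ignores.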

    gcc/
	* config/arm/aarch-common.c (arm_early_load_addr_dep_ptr):
	New function.
	(arm_early_store_addr_dep_ptr): Likewise.
	* config/arm/aarch-common-protos.h
	(arm_early_load_addr_dep_ptr): Add prototype.
	(arm_early_store_addr_dep_ptr): Likewise.
	* config/arm/cortex-a53.md: Add new bypasses.

From-SVN: r247631
gcc/ChangeLog
gcc/config/arm/aarch-common-protos.h
gcc/config/arm/aarch-common.c
gcc/config/arm/cortex-a53.md