From db4a1c18ceb5aede224c92ec4c86723f6fb93514 Mon Sep 17 00:00:00 2001
From: Wilco Dijkstra <wdijkstr@arm.com>
Date: Mon, 14 Nov 2016 11:51:33 +0000
Subject: [PATCH] The existing vector costs stop some beneficial vectorization.

The existing vector costs stop some beneficial vectorization.  This is mostly
due to vector statement cost being set to 3 as well as vector loads having a
higher cost than scalar loads.  This means that even when we vectorize 4x, it
is possible that the cost of a vectorized loop is similar to the scalar
version, and we fail to vectorize.

Using a cost of 3 for a vector operation suggests that vector operations
are 3 times as expensive as scalar operations (scalar_stmt_cost is 1).
Since most vector operations have throughput similar to that of scalar
operations, this is not correct.
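
As a rough illustration of how these entries feed a per-iteration estimate,
the sketch below sums statement costs for a hypothetical loop body.  The
struct names, the helper functions and the statement mix (3 arithmetic
statements, 2 loads, 1 store) are assumptions made for this example; they
are not GCC code and not the loop measured on the benchmark, and the real
vectorizer cost model also accounts for prologue/epilogue, permutes and
other overheads that are ignored here.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of the cortexa57_vector_cost entries (hypothetical struct). */
struct stmt_costs
{
  int scalar_stmt, scalar_load, scalar_store;
  int vec_stmt, vec_align_load, vec_store;
};

/* Statement counts for one iteration of a hypothetical loop body. */
struct loop_body
{
  int stmts, loads, stores;
};

/* Values before and after this patch, taken from the diff.  */
static const struct stmt_costs old_costs = { 1, 4, 1, 3, 5, 1 };
static const struct stmt_costs new_costs = { 1, 4, 1, 2, 4, 1 };

/* Assumed statement mix: 3 arithmetic statements, 2 loads, 1 store.  */
static const struct loop_body example_body = { 3, 2, 1 };

/* Estimated cost of one scalar iteration.  */
static int
scalar_iter_cost (const struct stmt_costs *c, const struct loop_body *b)
{
  assert (c != NULL && b != NULL);
  return b->stmts * c->scalar_stmt
	 + b->loads * c->scalar_load
	 + b->stores * c->scalar_store;
}

/* Estimated cost of one vector iteration, assuming every scalar statement
   maps to exactly one vector statement and all loads are aligned.  */
static int
vector_iter_cost (const struct stmt_costs *c, const struct loop_body *b)
{
  assert (c != NULL && b != NULL);
  return b->stmts * c->vec_stmt
	 + b->loads * c->vec_align_load
	 + b->stores * c->vec_store;
}
```

For this body the scalar estimate is 12 per iteration, and the estimate for
one vector iteration drops from 20 with the old values to 15 with the new
ones.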

Lowering vec_stmt_cost from 3 to 2 and both vector load costs from 5 to 4
allows more loops to be vectorized.  On a proprietary benchmark the gain
from vectorizing one such loop is around 15-30%, which shows that
vectorizing it is indeed beneficial.

	* config/aarch64/aarch64.c (cortexa57_vector_cost):
	Change vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost.

From-SVN: r242383
---
 gcc/ChangeLog                | 5 +++++
 gcc/config/aarch64/aarch64.c | 6 +++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a4f6a34f8f1..b3967a245de 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-11-14  Wilco Dijkstra  <wdijkstr@arm.com>
+
+	* config/aarch64/aarch64.c (cortexa57_vector_cost):
+	Change vec_stmt_cost, vec_align_load_cost and vec_unalign_load_cost.
+
 2016-11-14  Richard Biener  <rguenther@suse.de>
 
 	PR tree-optimization/78312
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b7d4640826a..bd97c5b701c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -398,12 +398,12 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
   1, /* scalar_stmt_cost  */
   4, /* scalar_load_cost  */
   1, /* scalar_store_cost  */
-  3, /* vec_stmt_cost  */
+  2, /* vec_stmt_cost  */
   3, /* vec_permute_cost  */
   8, /* vec_to_scalar_cost  */
   8, /* scalar_to_vec_cost  */
-  5, /* vec_align_load_cost  */
-  5, /* vec_unalign_load_cost  */
+  4, /* vec_align_load_cost  */
+  4, /* vec_unalign_load_cost  */
   1, /* vec_unalign_store_cost  */
   1, /* vec_store_cost  */
   1, /* cond_taken_branch_cost  */
-- 
2.30.2