Allow inner-loop reductions with variable-length vectors
author Richard Sandiford <richard.sandiford@arm.com>
Thu, 9 Aug 2018 16:03:25 +0000 (16:03 +0000)
committer Richard Sandiford <rsandifo@gcc.gnu.org>
Thu, 9 Aug 2018 16:03:25 +0000 (16:03 +0000)
While working on PR 86871, I noticed we were being overly restrictive
when handling variable-length vectors.  For:

  for (i : ...)
    {
      res = ...;
      for (j : ...)
        res op= ...;
      a[i] = res;
    }

we don't need a final reduction operation, since each outer iteration
keeps its own res (although we do for double reductions like:

  res = ...;
  for (i : ...)
    for (j : ...)
      res op= ...;
  a[i] = res;

which must still be rejected).
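
As a concrete illustration of the two shapes above, here is a minimal sketch in C. The function names, the trip counts N and M, and the use of += as the reduction operator are assumptions made for the example; they do not come from the patch or its test.

```c
/* Hypothetical trip counts, chosen only to keep the example small.  */
enum { N = 4, M = 3 };

/* Inner-loop reduction: res is reset on every outer iteration and
   stored to a[i], so each i ends up with its own independent result.
   B must have at least N + M - 1 elements.  */
void
inner_loop_reduction (int *restrict a, const int *restrict b)
{
  for (int i = 0; i < N; ++i)
    {
      int res = 0;
      for (int j = 0; j < M; ++j)
        res += b[i + j];
      a[i] = res;
    }
}

/* Double reduction: res is carried across both loops, so all partial
   results have to be combined into one scalar at the end.
   B must have at least N + M - 1 elements.  */
int
double_reduction (const int *restrict b)
{
  int res = 0;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < M; ++j)
      res += b[i + j];
  return res;
}
```

In the first function the results for different i never need to be merged, which is presumably why no reduction function is required there; the second function has a single accumulator and so still needs one.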

2018-08-08  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
* tree-vect-loop.c (vectorizable_reduction): Allow inner-loop
reductions for variable-length vectors.

gcc/testsuite/
* gcc.target/aarch64/sve/reduc_8.c: New test.

From-SVN: r263451

gcc/ChangeLog
gcc/testsuite/ChangeLog
gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c [new file with mode: 0644]
gcc/tree-vect-loop.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index dd088bb46a8a09d3425b2a9ee6e51e6506a68aa2..839dfc3c2c2dd349cd450758f4372c892a80e09a 100644
@@ -1,3 +1,8 @@
+2018-08-09  Richard Sandiford  <richard.sandiford@arm.com>
+
+       * tree-vect-loop.c (vectorizable_reduction): Allow inner-loop
+       reductions for variable-length vectors.
+
 2018-08-09  David Malcolm  <dmalcolm@redhat.com>
 
        PR other/84889
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 3da9f3baa36537582953a61e1e83ac581dc9b761..8ea8530243dc71334d5ed6078dc669ff423e8476 100644
@@ -1,3 +1,7 @@
+2018-08-09  Richard Sandiford  <richard.sandiford@arm.com>
+
+       * gcc.target/aarch64/sve/reduc_8.c: New test.
+
 2018-08-09  David Malcolm  <dmalcolm@redhat.com>
 
        PR other/84889
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c b/gcc/testsuite/gcc.target/aarch64/sve/reduc_8.c
new file mode 100644
index 0000000..3913b88
--- /dev/null
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+int
+reduc (int *restrict a, int *restrict b, int *restrict c)
+{
+  for (int i = 0; i < 100; ++i)
+    {
+      int res = 0;
+      for (int j = 0; j < 100; ++j)
+       if (b[i + j] != 0)
+         res = c[i + j];
+      a[i] = res;
+    }
+}
+
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-9]+\.s, } 1 } } */
+/* We ought to use the CMPNE result for the SEL too.  */
+/* { dg-final { scan-assembler-not {\tcmpeq\tp[0-9]+\.s, } { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]+\.s, } 1 } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 0669f62c960054e67ad3b057729bdbd9fa79eb87..c167aec326e827e46fdd80b37678ed76b35605f7 100644
@@ -6714,6 +6714,7 @@ vectorizable_reduction (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
     }
 
   if (reduction_type != EXTRACT_LAST_REDUCTION
+      && (!nested_cycle || double_reduc)
       && reduc_fn == IFN_LAST
       && !nunits_out.is_constant ())
     {