match.pd: Optimize ffs of known non-zero arg into ctz + 1 [PR94956]
Author: Jakub Jelinek <jakub@redhat.com>
Fri, 8 May 2020 07:33:55 +0000 (09:33 +0200)
Committer: Jakub Jelinek <jakub@redhat.com>
Fri, 8 May 2020 07:33:55 +0000 (09:33 +0200)
The ffs expanders on several targets (x86, ia64, aarch64 at least)
emit a conditional move or similar code to handle the case when the
argument is 0, which makes the code longer.
If we know from VRP that the argument will not be zero, we can (if the
target also has a ctz expander) just use ctz, which is undefined at zero,
so the expander doesn't need to deal with that case.
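As an illustration only (not part of the patch), the identity behind the
transformation can be checked in plain C. The sketch assumes a GCC-compatible
compiler providing __builtin_ffs and __builtin_ctz; ref_ffs, ffs_general,
ffs_nonzero and check_ffs_identity are hypothetical helpers. ffs_general
mirrors the zero check that ffs expanders have to emit, while ffs_nonzero is
the ctz + 1 form, valid only when the argument is known to be non-zero.

```c
#include <limits.h>

/* Hypothetical reference ffs: 1-based index of the least significant
   set bit, or 0 when x == 0.  */
static int
ref_ffs (unsigned x)
{
  if (x == 0)
    return 0;
  int i = 1;
  while ((x & 1u) == 0)
    {
      x >>= 1;
      i++;
    }
  return i;
}

/* General form: the x == 0 test is the extra work that ffs expanders
   typically turn into a conditional move or setcc sequence.  */
static int
ffs_general (unsigned x)
{
  return x ? __builtin_ctz (x) + 1 : 0;
}

/* The ctz + 1 form; only valid for x != 0 because __builtin_ctz (0)
   is undefined.  */
static int
ffs_nonzero (unsigned x)
{
  return __builtin_ctz (x) + 1;
}

/* Returns 1 if all forms agree with __builtin_ffs on single-bit and
   all-ones-shifted inputs, and the zero-safe forms return 0 for 0.  */
static int
check_ffs_identity (void)
{
  const unsigned width = sizeof (unsigned) * CHAR_BIT;
  if (ref_ffs (0) != 0 || ffs_general (0) != 0 || __builtin_ffs (0) != 0)
    return 0;
  for (unsigned bit = 0; bit < width; bit++)
    {
      unsigned vals[2] = { 1u << bit, ~0u << bit };
      for (int k = 0; k < 2; k++)
        {
          unsigned x = vals[k];
          int expected = ref_ffs (x);
          if (expected != (int) bit + 1
              || __builtin_ffs ((int) x) != expected
              || ffs_general (x) != expected
              || ffs_nonzero (x) != expected)
            return 0;
        }
    }
  return 1;
}
```

Calling check_ffs_identity () from a test driver should return 1.
ffs_nonzero (0) must never be called; that is the precondition
tree_expr_nonzero_p has to establish before the match.pd pattern fires.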

2020-05-08  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/94956
* match.pd (FFS): Optimize __builtin_ffs* of non-zero argument into
__builtin_ctz* + 1 if direct IFN_CTZ is supported.

* gcc.target/i386/pr94956.c: New test.

gcc/ChangeLog
gcc/match.pd
gcc/testsuite/ChangeLog
gcc/testsuite/gcc.target/i386/pr94956.c [new file with mode: 0644]

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index eb4924a3b4ba13e20f8b11481eada9d3fc572547..5bad3ff924b27004db2679ba3d866c53df7207f5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,9 @@
 2020-05-08  Jakub Jelinek  <jakub@redhat.com>
 
+       PR tree-optimization/94956
+       * match.pd (FFS): Optimize __builtin_ffs* of non-zero argument into
+       __builtin_ctz* + 1 if direct IFN_CTZ is supported.
+
        PR tree-optimization/94913
        * match.pd (A - B + -1 >= A to B >= A): New simplification.
        (A - B > A to A < B): Don't test TYPE_OVERFLOW_WRAPS which is always
diff --git a/gcc/match.pd b/gcc/match.pd
index cfe96975d8054196df302feaba041205ecb21b2a..892df1ec3d39ee715d088ae9975c822460d48621 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -5986,6 +5986,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && direct_internal_fn_supported_p (IFN_POPCOUNT, type,
                                           OPTIMIZE_FOR_BOTH))
     (convert (IFN_POPCOUNT:type @0)))))
+
+/* __builtin_ffs needs to deal on many targets with the possible zero
+   argument.  If we know the argument is always non-zero, __builtin_ctz + 1
+   should lead to better code.  */
+(simplify
+ (FFS tree_expr_nonzero_p@0)
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+      && direct_internal_fn_supported_p (IFN_CTZ, TREE_TYPE (@0),
+                                        OPTIMIZE_FOR_SPEED))
+  (plus (CTZ:type @0) { build_one_cst (type); })))
 #endif
 
 /* Simplify:
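diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 43e226e7e1889dd8ef25269f4d294afa7982e4ac..e8c54c7cd67764d88d6cb362b59ab8f8c624ab28 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog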
@@ -1,5 +1,8 @@
 2020-05-08  Jakub Jelinek  <jakub@redhat.com>
 
+       PR tree-optimization/94956
+       * gcc.target/i386/pr94956.c: New test.
+
        PR tree-optimization/94913
        * gcc.dg/tree-ssa/pr94913.c: New test.
 
diff --git a/gcc/testsuite/gcc.target/i386/pr94956.c b/gcc/testsuite/gcc.target/i386/pr94956.c
new file mode 100644
index 0000000..cc27b45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr94956.c
@@ -0,0 +1,28 @@
+/* PR tree-optimization/94956 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "\tcmovne\t" } } */
+/* { dg-final { scan-assembler-not "\tsete\t" } } */
+
+int
+foo (unsigned x)
+{
+  if (x == 0) __builtin_unreachable ();
+  return __builtin_ffs (x) - 1;
+}
+
+int
+bar (unsigned long x)
+{
+  if (x == 0) __builtin_unreachable ();
+  return __builtin_ffsl (x) - 1;
+}
+
+#ifdef __x86_64__
+int
+baz (unsigned long long x)
+{
+  if (x == 0) __builtin_unreachable ();
+  return __builtin_ffsll (x) - 1;
+}
+#endif
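The committed test is compile-only: it only checks that no cmovne or sete
instruction is emitted. A possible run-time companion (hypothetical, not part
of the commit) that checks the same foo/bar/baz pattern still returns the
index of the lowest set bit might look like the sketch below. It assumes
GCC's __builtin_ffsll and __builtin_ctzll; ffs_minus_one_ull and
check_all_bits are illustrative names.

```c
#include <limits.h>

/* Same shape as baz in pr94956.c: __builtin_unreachable () tells the
   compiler x != 0, which lets the new match.pd pattern rewrite the
   ffs into ctz + 1.  */
static int
ffs_minus_one_ull (unsigned long long x)
{
  if (x == 0)
    __builtin_unreachable ();
  return __builtin_ffsll (x) - 1;
}

/* Returns 1 if ffs_minus_one_ull agrees with the bit position (and with
   __builtin_ctzll) for every single-bit and all-ones-shifted value.  */
static int
check_all_bits (void)
{
  const unsigned width = sizeof (unsigned long long) * CHAR_BIT;
  for (unsigned bit = 0; bit < width; bit++)
    {
      unsigned long long one = 1ULL << bit;
      unsigned long long ones = ~0ULL << bit;
      if (ffs_minus_one_ull (one) != (int) bit
          || ffs_minus_one_ull (ones) != (int) bit
          || __builtin_ctzll (ones) != (int) bit)
        return 0;
    }
  return 1;
}
```

Only non-zero values are ever passed to ffs_minus_one_ull; passing 0
would reach __builtin_unreachable () and make the behavior undefined.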