openmp: cgraph support for late declare variant resolution
author Jakub Jelinek <jakub@redhat.com>
Thu, 14 May 2020 07:58:53 +0000 (09:58 +0200)
committer Jakub Jelinek <jakub@redhat.com>
Thu, 14 May 2020 07:58:53 +0000 (09:58 +0200)
This is a new version of the
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01493.html
patch.  Unlike the previous version, this one works properly except for
LTO.  It has been bootstrapped and regtested on x86_64-linux and
i686-linux.

In short, #pragma omp declare variant is a directive that allows direct
calls to a given function to be redirected to one of its variants, chosen
by a scoring system, and some of those decisions need to be deferred until
after IPA.  The patch represents such deferred calls as calls to an
artificial FUNCTION_DECL whose cgraph_node has the declare_variant_alt bit
set.
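
For readers unfamiliar with the directive, here is a minimal sketch of its
use in C.  The names scale_generic, scale_gnu and twice are hypothetical and
not part of the patch; the selector follows the implementation={vendor(...)}
form used in the new test.  Both functions compute the same value, so the
result does not depend on whether the call is redirected (for example, the
pragma is ignored when OpenMP is not enabled).

```c
/* Hypothetical variant: a specialized implementation with the same
   signature and semantics as the base function.  */
int scale_gnu (int x);

/* Direct calls to scale_generic may be redirected to scale_gnu when the
   context selector matches; otherwise the base function is called.  */
#pragma omp declare variant (scale_gnu) match (implementation={vendor(gnu)})
int scale_generic (int x);

int
scale_gnu (int x)
{
  return x + x;
}

int
scale_generic (int x)
{
  return x * 2;
}

/* A direct call: the kind of call the directive applies to.  */
int
twice (int x)
{
  return scale_generic (x);
}
```

In this sketch the selector can be decided early.  The cases the patch
cares about are those where the decision depends on information that is only
known later, such as which declare simd clone the call ends up in.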

For LTO, the patch only saves/restores the two cgraph_node bits it adds,
but doesn't yet stream out and back in the on-the-side info for the
declare_variant_alt nodes.  For LTO partitioning, I believe those artificial
FUNCTION_DECLs with declare_variant_alt need to go into the same partition
as anything that calls them (possibly duplicated); is there a way to
achieve that?  Say the declare variant artificial fn foobar is directly
called from all of foo, bar and baz but not from qux, and we want 4
partitions, one for each of foo, bar, baz and qux.  Then foobar is needed
in the first 3 partitions.  The functions referenced by the IPA_REF_ADDRs
recorded for foobar (right after IPA the foobar call will be replaced with
a call to foobar1, foobar2, foobar3 or the non-artificial foobar) can of
course stay in different partitions if needed.
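
The foo/bar/baz/qux scenario can be modelled with a small standalone sketch.
This is not GCC's partitioner and uses none of its APIs; struct call_edge,
partitions_needing_alt and example_mask are hypothetical names, and the
partition numbering (foo = 0, bar = 1, baz = 2, qux = 3) is an assumption
made for illustration only.

```c
/* Hypothetical model of a direct call edge: the partition holding the
   caller, and whether the callee is the artificial declare_variant_alt
   node.  */
struct call_edge
{
  int caller_partition;
  int callee_is_alt;
};

/* Return a bitmask of the partitions that need a copy of the artificial
   node, i.e. every partition holding at least one direct caller.  */
unsigned
partitions_needing_alt (const struct call_edge *edges, int n)
{
  unsigned mask = 0;
  for (int i = 0; i < n; i++)
    if (edges[i].callee_is_alt)
      mask |= 1u << edges[i].caller_partition;
  return mask;
}

/* The scenario from the text: foo, bar and baz call foobar, qux does
   not.  */
unsigned
example_mask (void)
{
  const struct call_edge edges[] = {
    { 0, 1 },  /* foo -> foobar */
    { 1, 1 },  /* bar -> foobar */
    { 2, 1 },  /* baz -> foobar */
    { 3, 0 },  /* qux calls something else */
  };
  return partitions_needing_alt (edges, 4);
}
```

Under these assumptions the mask has bits 0 to 2 set, matching the statement
that foobar is needed in the first 3 partitions.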

2020-05-14  Jakub Jelinek  <jakub@redhat.com>

* Makefile.in (GTFILES): Add omp-general.c.
* cgraph.h (struct cgraph_node): Add declare_variant_alt and
calls_declare_variant_alt members and initialize them in the
ctor.
* ipa.c (symbol_table::remove_unreachable_nodes): Handle direct
calls to declare_variant_alt nodes.
* lto-cgraph.c (lto_output_node): Write declare_variant_alt
and calls_declare_variant_alt.
(input_overwrite_node): Read them back.
* omp-simd-clone.c (simd_clone_create): Copy calls_declare_variant_alt
bit.
* tree-inline.c (expand_call_inline): Or in calls_declare_variant_alt
bit.
(tree_function_versioning): Copy calls_declare_variant_alt bit.
* omp-offload.c (execute_omp_device_lower): Call
omp_resolve_declare_variant on direct function calls.
(pass_omp_device_lower::gate): Also enable for
calls_declare_variant_alt functions.
* omp-general.c (omp_maybe_offloaded): Return false after inlining.
(omp_context_selector_matches): Handle the case when
cfun->curr_properties has PROP_gimple_any bit set.
(struct omp_declare_variant_entry): New type.
(struct omp_declare_variant_base_entry): New type.
(struct omp_declare_variant_hasher): New type.
(omp_declare_variant_hasher::hash, omp_declare_variant_hasher::equal):
New methods.
(omp_declare_variants): New variable.
(struct omp_declare_variant_alt_hasher): New type.
(omp_declare_variant_alt_hasher::hash,
omp_declare_variant_alt_hasher::equal): New methods.
(omp_declare_variant_alt): New variable.
(omp_resolve_late_declare_variant): New function.
(omp_resolve_declare_variant): Call omp_resolve_late_declare_variant
when called late.  Create a magic declare_variant_alt fndecl and
cgraph node and return that if decision needs to be deferred until
after gimplification.
* cgraph.c (symbol_table::create_edge): Or in calls_declare_variant_alt
bit.

* c-c++-common/gomp/declare-variant-14.c: New test.
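
The final step of the late resolution, picking the highest-scoring matching
variant, can be sketched in standalone C as below.  This is a simplified
model, not the code in omp-general.c: struct variant, pick_variant and
pick_for_test1 are hypothetical names, plain long replaces widest_int, and
the pruning of selectors that are strict subsets of another is omitted.  The
scores are taken from the comments in declare-variant-14.c; treating f02 and
f03 as always matching is an assumption.

```c
#include <stddef.h>

/* Simplified stand-in for omp_declare_variant_entry.  */
struct variant
{
  const char *name;
  long score;                /* Score outside a declare simd clone.  */
  long score_in_simd_clone;  /* Score inside a declare simd clone.  */
  int matches;               /* Nonzero if the selector matches here.  */
};

/* Return the matching variant with the highest score, or NULL if none
   matches (the caller then keeps the base function).  */
const struct variant *
pick_variant (const struct variant *v, size_t n, int in_simd_clone)
{
  const struct variant *best = NULL;
  long max_score = -1;
  for (size_t i = 0; i < n; i++)
    if (v[i].matches)
      {
        long score = in_simd_clone ? v[i].score_in_simd_clone : v[i].score;
        if (score > max_score)
          {
            max_score = score;
            best = &v[i];
          }
      }
  return best;
}

/* Candidates from declare-variant-14.c.  AVX512F says whether the
   isa("avx512f") selector of f01 matches in the current function.  */
const char *
pick_for_test1 (int in_simd_clone, int avx512f)
{
  const struct variant v[] = {
    { "f01", 4, 8, avx512f },
    { "f02", 1 + 3, 2 + 3, 1 },
    { "f03", 1 + 5, 2 + 5, 1 },
  };
  const struct variant *best = pick_variant (v, 3, in_simd_clone);
  return best ? best->name : "f04";
}
```

With these inputs the non-clone function selects f03 (score 6), a clone
without avx512f also selects f03 (score 7), and an avx512f clone selects f01
(score 8), which is what the test's scan-tree-dump directives expect.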

12 files changed:
gcc/ChangeLog
gcc/Makefile.in
gcc/cgraph.c
gcc/cgraph.h
gcc/ipa.c
gcc/lto-cgraph.c
gcc/omp-general.c
gcc/omp-offload.c
gcc/omp-simd-clone.c
gcc/testsuite/ChangeLog
gcc/testsuite/c-c++-common/gomp/declare-variant-14.c [new file with mode: 0644]
gcc/tree-inline.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 0f0dbd051c9cfff9b042ed83da4896e2f829d2ac..360ad7a5b583d86ee40631997e6778cd65014ea5 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,44 @@
 2020-05-14  Jakub Jelinek  <jakub@redhat.com>
 
+       * Makefile.in (GTFILES): Add omp-general.c.
+       * cgraph.h (struct cgraph_node): Add declare_variant_alt and
+       calls_declare_variant_alt members and initialize them in the
+       ctor.
+       * ipa.c (symbol_table::remove_unreachable_nodes): Handle direct
+       calls to declare_variant_alt nodes.
+       * lto-cgraph.c (lto_output_node): Write declare_variant_alt
+       and calls_declare_variant_alt.
+       (input_overwrite_node): Read them back.
+       * omp-simd-clone.c (simd_clone_create): Copy calls_declare_variant_alt
+       bit.
+       * tree-inline.c (expand_call_inline): Or in calls_declare_variant_alt
+       bit.
+       (tree_function_versioning): Copy calls_declare_variant_alt bit.
+       * omp-offload.c (execute_omp_device_lower): Call
+       omp_resolve_declare_variant on direct function calls.
+       (pass_omp_device_lower::gate): Also enable for
+       calls_declare_variant_alt functions.
+       * omp-general.c (omp_maybe_offloaded): Return false after inlining.
+       (omp_context_selector_matches): Handle the case when
+       cfun->curr_properties has PROP_gimple_any bit set.
+       (struct omp_declare_variant_entry): New type.
+       (struct omp_declare_variant_base_entry): New type.
+       (struct omp_declare_variant_hasher): New type. 
+       (omp_declare_variant_hasher::hash, omp_declare_variant_hasher::equal):
+       New methods.
+       (omp_declare_variants): New variable.
+       (struct omp_declare_variant_alt_hasher): New type.
+       (omp_declare_variant_alt_hasher::hash,
+       omp_declare_variant_alt_hasher::equal): New methods.
+       (omp_declare_variant_alt): New variable.
+       (omp_resolve_late_declare_variant): New function.
+       (omp_resolve_declare_variant): Call omp_resolve_late_declare_variant
+       when called late.  Create a magic declare_variant_alt fndecl and
+       cgraph node and return that if decision needs to be deferred until
+       after gimplification.
+       * cgraph.c (symbol_table::create_edge): Or in calls_declare_variant_alt
+       bit.
+
        PR middle-end/95108
        * omp-simd-clone.c (struct modify_stmt_info): Add after_stmt member.
        (ipa_simd_modify_stmt_ops): For PHIs, only add before first stmt in
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index b49d6b0d31ffcfdce94abe3e78d575b26bb9dffd..9ba21f735f6dfe632f57a18a0a9216ce6dac50a4 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -2616,6 +2616,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/omp-offload.h \
   $(srcdir)/omp-offload.c \
   $(srcdir)/omp-expand.c \
+  $(srcdir)/omp-general.c \
   $(srcdir)/omp-low.c \
   $(srcdir)/targhooks.c $(out_file) $(srcdir)/passes.c $(srcdir)/cgraphunit.c \
   $(srcdir)/cgraphclones.c \
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 2a9813df2d91c6f6d617539a2f40aecf0f64ea95..c0b457950595e800c1a40a60ff3f26879b4953f9 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -915,6 +915,8 @@ symbol_table::create_edge (cgraph_node *caller, cgraph_node *callee,
                                      caller->decl);
   else
     edge->in_polymorphic_cdtor = caller->thunk.thunk_p;
+  if (callee)
+    caller->calls_declare_variant_alt |= callee->declare_variant_alt;
 
   if (callee && symtab->state != LTO_STREAMING
       && edge->callee->comdat_local_p ())
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 5ddeb65269bd69b75a52ab5236ed9d83a102a31f..cfae6e91da92bd50474e2925b12311c3dd69c410 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -937,7 +937,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
       split_part (false), indirect_call_target (false), local (false),
       versionable (false), can_change_signature (false),
       redefined_extern_inline (false), tm_may_enter_irr (false),
-      ipcp_clone (false), m_uid (uid), m_summary_id (-1)
+      ipcp_clone (false), declare_variant_alt (false),
+      calls_declare_variant_alt (false), m_uid (uid), m_summary_id (-1)
   {}
 
   /* Remove the node from cgraph and all inline clones inlined into it.
@@ -1539,6 +1540,11 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   unsigned tm_may_enter_irr : 1;
   /* True if this was a clone created by ipa-cp.  */
   unsigned ipcp_clone : 1;
+  /* True if this is the deferred declare variant resolution artificial
+     function.  */
+  unsigned declare_variant_alt : 1;
+  /* True if the function calls declare_variant_alt functions.  */
+  unsigned calls_declare_variant_alt : 1;
 
 private:
   /* Unique id of the node.  */
diff --git a/gcc/ipa.c b/gcc/ipa.c
index 554819316682a3f2cd5950baba84b858edc03b16..288b58cf73d01d6b7560eb7e22c9413e91dcabdb 100644
--- a/gcc/ipa.c
+++ b/gcc/ipa.c
@@ -450,6 +450,9 @@ symbol_table::remove_unreachable_nodes (FILE *file)
                        reachable.add (body);
                      reachable.add (e->callee);
                    }
+                 else if (e->callee->declare_variant_alt
+                          && !e->callee->in_other_partition)
+                   reachable.add (e->callee);
                  enqueue_node (e->callee, &first, &reachable);
                }
 
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index b0c7ebf775bd81f7a748da2e918847bd521829e5..17b6cfd83a78de48aff616687a6160b74cd8d900 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -535,6 +535,8 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
   bp_pack_value (&bp, node->merged_extern_inline, 1);
   bp_pack_value (&bp, node->thunk.thunk_p, 1);
   bp_pack_value (&bp, node->parallelized_function, 1);
+  bp_pack_value (&bp, node->declare_variant_alt, 1);
+  bp_pack_value (&bp, node->calls_declare_variant_alt, 1);
   bp_pack_enum (&bp, ld_plugin_symbol_resolution,
                LDPR_NUM_KNOWN,
                /* When doing incremental link, we will get new resolution
@@ -1186,6 +1188,8 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
   node->merged_extern_inline = bp_unpack_value (bp, 1);
   node->thunk.thunk_p = bp_unpack_value (bp, 1);
   node->parallelized_function = bp_unpack_value (bp, 1);
+  node->declare_variant_alt = bp_unpack_value (bp, 1);
+  node->calls_declare_variant_alt = bp_unpack_value (bp, 1);
   node->resolution = bp_unpack_enum (bp, ld_plugin_symbol_resolution,
                                     LDPR_NUM_KNOWN);
   node->split_part = bp_unpack_value (bp, 1);
diff --git a/gcc/omp-general.c b/gcc/omp-general.c
index 49023f42c473ae52ef0827d856f98537d99ed101..315f24aeddf3454e8816829d8d3b06aacd1915ec 100644
--- a/gcc/omp-general.c
+++ b/gcc/omp-general.c
@@ -642,6 +642,8 @@ omp_maybe_offloaded (void)
   if (symtab->state == PARSING)
     /* Maybe.  */
     return true;
+  if (cfun && cfun->after_inlining)
+    return false;
   if (current_function_decl
       && lookup_attribute ("omp declare target",
                           DECL_ATTRIBUTES (current_function_decl)))
@@ -694,8 +696,7 @@ omp_context_selector_matches (tree ctx)
             (so in most of the cases), and we'd need to maintain set of
             surrounding OpenMP constructs, which is better handled during
             gimplification.  */
-         if (symtab->state == PARSING
-             || (cfun->curr_properties & PROP_gimple_any) != 0)
+         if (symtab->state == PARSING)
            {
              ret = -1;
              continue;
@@ -704,6 +705,28 @@ omp_context_selector_matches (tree ctx)
          enum tree_code constructs[5];
          int nconstructs
            = omp_constructor_traits_to_codes (TREE_VALUE (t1), constructs);
+
+         if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+           {
+             if (!cfun->after_inlining)
+               {
+                 ret = -1;
+                 continue;
+               }
+             int i;
+             for (i = 0; i < nconstructs; ++i)
+               if (constructs[i] == OMP_SIMD)
+                 break;
+             if (i < nconstructs)
+               {
+                 ret = -1;
+                 continue;
+               }
+             /* If there is no simd, assume it is ok after IPA,
+                constructs should have been checked before.  */
+             continue;
+           }
+
          int r = omp_construct_selector_matches (constructs, nconstructs,
                                                  NULL);
          if (r == 0)
@@ -738,6 +761,9 @@ omp_context_selector_matches (tree ctx)
            case 'a':
              if (set == 'i' && !strcmp (sel, "atomic_default_mem_order"))
                {
+                 if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+                   break;
+
                  enum omp_memory_order omo
                    = ((enum omp_memory_order)
                       (omp_requires_mask
@@ -816,6 +842,9 @@ omp_context_selector_matches (tree ctx)
            case 'u':
              if (set == 'i' && !strcmp (sel, "unified_address"))
                {
+                 if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+                   break;
+
                  if ((omp_requires_mask & OMP_REQUIRES_UNIFIED_ADDRESS) == 0)
                    {
                      if (symtab->state == PARSING)
@@ -827,6 +856,9 @@ omp_context_selector_matches (tree ctx)
                }
              if (set == 'i' && !strcmp (sel, "unified_shared_memory"))
                {
+                 if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+                   break;
+
                  if ((omp_requires_mask
                       & OMP_REQUIRES_UNIFIED_SHARED_MEMORY) == 0)
                    {
@@ -841,6 +873,9 @@ omp_context_selector_matches (tree ctx)
            case 'd':
              if (set == 'i' && !strcmp (sel, "dynamic_allocators"))
                {
+                 if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+                   break;
+
                  if ((omp_requires_mask
                       & OMP_REQUIRES_DYNAMIC_ALLOCATORS) == 0)
                    {
@@ -855,6 +890,9 @@ omp_context_selector_matches (tree ctx)
            case 'r':
              if (set == 'i' && !strcmp (sel, "reverse_offload"))
                {
+                 if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+                   break;
+
                  if ((omp_requires_mask & OMP_REQUIRES_REVERSE_OFFLOAD) == 0)
                    {
                      if (symtab->state == PARSING)
@@ -944,7 +982,8 @@ omp_context_selector_matches (tree ctx)
                           #pragma omp declare simd on it, some simd clones
                           might have the isa added later on.  */
                        if (r == -1
-                           && targetm.simd_clone.compute_vecsize_and_simdlen)
+                           && targetm.simd_clone.compute_vecsize_and_simdlen
+                           && (cfun == NULL || !cfun->after_inlining))
                          {
                            tree attrs
                              = DECL_ATTRIBUTES (current_function_decl);
@@ -1415,6 +1454,191 @@ omp_context_compute_score (tree ctx, widest_int *score, bool declare_simd)
   return ret;
 }
 
+/* Class describing a single variant.  */
+struct GTY(()) omp_declare_variant_entry {
+  /* NODE of the variant.  */
+  cgraph_node *variant;
+  /* Score if not in declare simd clone.  */
+  widest_int score;
+  /* Score if in declare simd clone.  */
+  widest_int score_in_declare_simd_clone;
+  /* Context selector for the variant.  */
+  tree ctx;
+  /* True if the context selector is known to match already.  */
+  bool matches;
+};
+
+/* Class describing a function with variants.  */
+struct GTY((for_user)) omp_declare_variant_base_entry {
+  /* NODE of the base function.  */
+  cgraph_node *base;
+  /* NODE of the artificial function created for the deferred variant
+     resolution.  */
+  cgraph_node *node;
+  /* Vector of the variants.  */
+  vec<omp_declare_variant_entry, va_gc> *variants;
+};
+
+struct omp_declare_variant_hasher
+  : ggc_ptr_hash<omp_declare_variant_base_entry> {
+  static hashval_t hash (omp_declare_variant_base_entry *);
+  static bool equal (omp_declare_variant_base_entry *,
+                    omp_declare_variant_base_entry *);
+};
+
+hashval_t
+omp_declare_variant_hasher::hash (omp_declare_variant_base_entry *x)
+{
+  inchash::hash hstate;
+  hstate.add_int (DECL_UID (x->base->decl));
+  hstate.add_int (x->variants->length ());
+  omp_declare_variant_entry *variant;
+  unsigned int i;
+  FOR_EACH_VEC_SAFE_ELT (x->variants, i, variant)
+    {
+      hstate.add_int (DECL_UID (variant->variant->decl));
+      hstate.add_wide_int (variant->score);
+      hstate.add_wide_int (variant->score_in_declare_simd_clone);
+      hstate.add_ptr (variant->ctx);
+      hstate.add_int (variant->matches);
+    }
+  return hstate.end ();
+}
+
+bool
+omp_declare_variant_hasher::equal (omp_declare_variant_base_entry *x,
+                                  omp_declare_variant_base_entry *y)
+{
+  if (x->base != y->base
+      || x->variants->length () != y->variants->length ())
+    return false;
+  omp_declare_variant_entry *variant;
+  unsigned int i;
+  FOR_EACH_VEC_SAFE_ELT (x->variants, i, variant)
+    if (variant->variant != (*y->variants)[i].variant
+       || variant->score != (*y->variants)[i].score
+       || (variant->score_in_declare_simd_clone
+           != (*y->variants)[i].score_in_declare_simd_clone)
+       || variant->ctx != (*y->variants)[i].ctx
+       || variant->matches != (*y->variants)[i].matches)
+      return false;
+  return true;
+}
+
+static GTY(()) hash_table<omp_declare_variant_hasher> *omp_declare_variants;
+
+struct omp_declare_variant_alt_hasher
+  : ggc_ptr_hash<omp_declare_variant_base_entry> {
+  static hashval_t hash (omp_declare_variant_base_entry *);
+  static bool equal (omp_declare_variant_base_entry *,
+                    omp_declare_variant_base_entry *);
+};
+
+hashval_t
+omp_declare_variant_alt_hasher::hash (omp_declare_variant_base_entry *x)
+{
+  return DECL_UID (x->node->decl);
+}
+
+bool
+omp_declare_variant_alt_hasher::equal (omp_declare_variant_base_entry *x,
+                                      omp_declare_variant_base_entry *y)
+{
+  return x->node == y->node;
+}
+
+static GTY(()) hash_table<omp_declare_variant_alt_hasher>
+  *omp_declare_variant_alt;
+
+/* Try to resolve declare variant after gimplification.  */
+
+static tree
+omp_resolve_late_declare_variant (tree alt)
+{
+  cgraph_node *node = cgraph_node::get (alt);
+  cgraph_node *cur_node = cgraph_node::get (cfun->decl);
+  if (node == NULL
+      || !node->declare_variant_alt
+      || !cfun->after_inlining)
+    return alt;
+
+  omp_declare_variant_base_entry entry;
+  entry.base = NULL;
+  entry.node = node;
+  entry.variants = NULL;
+  omp_declare_variant_base_entry *entryp
+    = omp_declare_variant_alt->find_with_hash (&entry, DECL_UID (alt));
+
+  unsigned int i, j;
+  omp_declare_variant_entry *varentry1, *varentry2;
+  auto_vec <bool, 16> matches;
+  unsigned int nmatches = 0;
+  FOR_EACH_VEC_SAFE_ELT (entryp->variants, i, varentry1)
+    {
+      if (varentry1->matches)
+       {
+         /* This has been checked to be ok already.  */
+         matches.safe_push (true);
+         nmatches++;
+         continue;
+       }
+      switch (omp_context_selector_matches (varentry1->ctx))
+       {
+       case 0:
+          matches.safe_push (false);
+         break;
+       case -1:
+         return alt;
+       default:
+         matches.safe_push (true);
+         nmatches++;
+         break;
+       }
+    }
+
+  if (nmatches == 0)
+    return entryp->base->decl;
+
+  /* A context selector that is a strict subset of another context selector
+     has a score of zero.  */
+  FOR_EACH_VEC_SAFE_ELT (entryp->variants, i, varentry1)
+    if (matches[i])
+      {
+        for (j = i + 1;
+            vec_safe_iterate (entryp->variants, j, &varentry2); ++j)
+         if (matches[j])
+           {
+             int r = omp_context_selector_compare (varentry1->ctx,
+                                                   varentry2->ctx);
+             if (r == -1)
+               {
+                 /* ctx1 is a strict subset of ctx2, ignore ctx1.  */
+                 matches[i] = false;
+                 break;
+               }
+             else if (r == 1)
+               /* ctx2 is a strict subset of ctx1, remove ctx2.  */
+               matches[j] = false;
+           }
+      }
+
+  widest_int max_score = -1;
+  varentry2 = NULL;
+  FOR_EACH_VEC_SAFE_ELT (entryp->variants, i, varentry1)
+    if (matches[i])
+      {
+       widest_int score
+         = (cur_node->simdclone ? varentry1->score_in_declare_simd_clone
+            : varentry1->score);
+       if (score > max_score)
+         {
+           max_score = score;
+           varentry2 = varentry1;
+         }
+      }
+  return varentry2->variant->decl;
+}
+
 /* Try to resolve declare variant, return the variant decl if it should
    be used instead of base, or base otherwise.  */
 
@@ -1422,6 +1646,9 @@ tree
 omp_resolve_declare_variant (tree base)
 {
   tree variant1 = NULL_TREE, variant2 = NULL_TREE;
+  if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+    return omp_resolve_late_declare_variant (base);
+
   auto_vec <tree, 16> variants;
   auto_vec <bool, 16> defer;
   bool any_deferred = false;
@@ -1459,6 +1686,10 @@ omp_resolve_declare_variant (tree base)
       bool first = true;
       unsigned int i;
       tree attr1, attr2;
+      omp_declare_variant_base_entry entry;
+      entry.base = cgraph_node::get_create (base);
+      entry.node = NULL;
+      vec_alloc (entry.variants, variants.length ());
       FOR_EACH_VEC_ELT (variants, i, attr1)
        {
          widest_int score1;
@@ -1498,6 +1729,14 @@ omp_resolve_declare_variant (tree base)
                  variant2 = defer[i] ? NULL_TREE : attr1;
                }
            }
+         omp_declare_variant_entry varentry;
+         varentry.variant
+           = cgraph_node::get_create (TREE_PURPOSE (TREE_VALUE (attr1)));
+         varentry.score = score1;
+         varentry.score_in_declare_simd_clone = score2;
+         varentry.ctx = ctx;
+         varentry.matches = !defer[i];
+         entry.variants->quick_push (varentry);
        }
 
       /* If there is a clear winner variant with the score which is not
@@ -1522,17 +1761,67 @@ omp_resolve_declare_variant (tree base)
                }
            }
          if (variant1)
-           return TREE_PURPOSE (TREE_VALUE (variant1));
+           {
+             vec_free (entry.variants);
+             return TREE_PURPOSE (TREE_VALUE (variant1));
+           }
+       }
+
+      if (omp_declare_variants == NULL)
+       omp_declare_variants
+         = hash_table<omp_declare_variant_hasher>::create_ggc (64);
+      omp_declare_variant_base_entry **slot
+       = omp_declare_variants->find_slot (&entry, INSERT);
+      if (*slot != NULL)
+       {
+         vec_free (entry.variants);
+         return (*slot)->node->decl;
        }
 
-      return base;
+      *slot = ggc_cleared_alloc<omp_declare_variant_base_entry> ();
+      (*slot)->base = entry.base;
+      (*slot)->node = entry.base;
+      (*slot)->variants = entry.variants;
+      tree alt = build_decl (DECL_SOURCE_LOCATION (base), FUNCTION_DECL,
+                            DECL_NAME (base), TREE_TYPE (base));
+      DECL_ARTIFICIAL (alt) = 1;
+      DECL_IGNORED_P (alt) = 1;
+      TREE_STATIC (alt) = 1;
+      tree attributes = DECL_ATTRIBUTES (base);
+      if (lookup_attribute ("noipa", attributes) == NULL)
+       {
+         attributes = tree_cons (get_identifier ("noipa"), NULL, attributes);
+         if (lookup_attribute ("noinline", attributes) == NULL)
+           attributes = tree_cons (get_identifier ("noinline"), NULL,
+                                   attributes);
+         if (lookup_attribute ("noclone", attributes) == NULL)
+           attributes = tree_cons (get_identifier ("noclone"), NULL,
+                                   attributes);
+         if (lookup_attribute ("no_icf", attributes) == NULL)
+           attributes = tree_cons (get_identifier ("no_icf"), NULL,
+                                   attributes);
+       }
+      DECL_ATTRIBUTES (alt) = attributes;
+      DECL_INITIAL (alt) = error_mark_node;
+      (*slot)->node = cgraph_node::create (alt);
+      (*slot)->node->declare_variant_alt = 1;
+      (*slot)->node->create_reference (entry.base, IPA_REF_ADDR);
+      omp_declare_variant_entry *varentry;
+      FOR_EACH_VEC_SAFE_ELT (entry.variants, i, varentry)
+       (*slot)->node->create_reference (varentry->variant, IPA_REF_ADDR);
+      if (omp_declare_variant_alt == NULL)
+       omp_declare_variant_alt
+         = hash_table<omp_declare_variant_alt_hasher>::create_ggc (64);
+      *omp_declare_variant_alt->find_slot_with_hash (*slot, DECL_UID (alt),
+                                                    INSERT) = *slot;
+      return alt;
     }
 
   if (variants.length () == 1)
     return TREE_PURPOSE (TREE_VALUE (variants[0]));
 
-  /* A context selector that is a strict subset of another context selector has a score
-     of zero.  */
+  /* A context selector that is a strict subset of another context selector
+     has a score of zero.  */
   tree attr1, attr2;
   unsigned int i, j;
   FOR_EACH_VEC_ELT (variants, i, attr1)
@@ -1948,3 +2237,5 @@ oacc_get_ifn_dim_arg (const gimple *stmt)
   gcc_checking_assert (axis >= 0 && axis < GOMP_DIM_MAX);
   return (int) axis;
 }
+
+#include "gt-omp-general.h"
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 3e7012d649f7a6f17c2b3bfa4c95df387b7941d7..b2df91a5724b73949a1d85fa0f1b6651b0dd5e57 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -2066,12 +2066,28 @@ execute_omp_device_lower ()
   bool regimplify = false;
   basic_block bb;
   gimple_stmt_iterator gsi;
+  bool calls_declare_variant_alt
+    = cgraph_node::get (cfun->decl)->calls_declare_variant_alt;
   FOR_EACH_BB_FN (bb, cfun)
     for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
       {
        gimple *stmt = gsi_stmt (gsi);
-       if (!is_gimple_call (stmt) || !gimple_call_internal_p (stmt))
+       if (!is_gimple_call (stmt))
          continue;
+       if (!gimple_call_internal_p (stmt))
+         {
+           if (calls_declare_variant_alt)
+             if (tree fndecl = gimple_call_fndecl (stmt))
+               {
+                 tree new_fndecl = omp_resolve_declare_variant (fndecl);
+                 if (new_fndecl != fndecl)
+                   {
+                     gimple_call_set_fndecl (stmt, new_fndecl);
+                     update_stmt (stmt);
+                   }
+               }
+           continue;
+         }
        tree lhs = gimple_call_lhs (stmt), rhs = NULL_TREE;
        tree type = lhs ? TREE_TYPE (lhs) : integer_type_node;
        switch (gimple_call_internal_fn (stmt))
@@ -2165,7 +2181,9 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *fun)
     {
-      return !(fun->curr_properties & PROP_gimple_lomp_dev);
+      return (!(fun->curr_properties & PROP_gimple_lomp_dev)
+             || (flag_openmp
+                 && cgraph_node::get (fun->decl)->calls_declare_variant_alt));
     }
   virtual unsigned int execute (function *)
     {
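diff --git a/gcc/omp-simd-clone.c b/gcc/omp-simd-clone.c
index 09fdd6074321b219cc3679e423bc69a6ff9f4b4f..942fb971cb786e47ad444662d9654d3580b4c468 100644
--- a/gcc/omp-simd-clone.c
+++ b/gcc/omp-simd-clone.c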
@@ -477,6 +477,7 @@ simd_clone_create (struct cgraph_node *old_node)
      the old node.  */
   new_node->local = old_node->local;
   new_node->externally_visible = old_node->externally_visible;
+  new_node->calls_declare_variant_alt = old_node->calls_declare_variant_alt;
 
   return new_node;
 }
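diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 13d55484c6a204fa4c23c7d3e6ea519f3b3c1e8b..2b6d4becf4eca2c1c3ea8045ad854d2bf2fe7122 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog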
@@ -1,5 +1,7 @@
 2020-05-14  Jakub Jelinek  <jakub@redhat.com>
 
+       * c-c++-common/gomp/declare-variant-14.c: New test.
+
        PR middle-end/95108
        * gcc.dg/gomp/pr95108.c: New test.
 
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
new file mode 100644
index 0000000..cdb0bb3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
@@ -0,0 +1,28 @@
+/* { dg-do compile { target vect_simd_clones } } */
+/* { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" } */
+/* { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } } */
+
+int f01 (int);
+int f02 (int);
+int f03 (int);
+#pragma omp declare variant (f01) match (device={isa("avx512f")}) /* 4 or 8 */
+#pragma omp declare variant (f02) match (implementation={vendor(score(3):gnu)},device={kind(cpu)}) /* (1 or 2) + 3 */
+#pragma omp declare variant (f03) match (implementation={vendor(score(5):gnu)},device={kind(host)}) /* (1 or 2) + 5 */
+int f04 (int);
+
+#pragma omp declare simd
+int
+test1 (int x)
+{
+  /* At gimplification time, we can't decide yet which function to call.  */
+  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
+  /* After simd clones are created, the original non-clone test1 shall
+     call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
+     shall call f01 with score 8.  */
+  /* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */
+  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } } */
+  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } } */
+  int a = f04 (x);
+  int b = f04 (x);
+  return a + b;
+}
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 8c5d5da0567703feacee5fffbabd22eb74a3491a..ee96c9cfff08a12324a1e77fd411b0996d184950 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -4900,6 +4900,8 @@ expand_call_inline (basic_block bb, gimple *stmt, copy_body_data *id,
   if (src_properties != prop_mask)
     dst_cfun->curr_properties &= src_properties | ~prop_mask;
   dst_cfun->calls_eh_return |= id->src_cfun->calls_eh_return;
+  id->dst_node->calls_declare_variant_alt
+    |= id->src_node->calls_declare_variant_alt;
 
   gcc_assert (!id->src_cfun->after_inlining);
 
@@ -6231,6 +6233,8 @@ tree_function_versioning (tree old_decl, tree new_decl,
   DECL_ARGUMENTS (new_decl) = DECL_ARGUMENTS (old_decl);
   initialize_cfun (new_decl, old_decl,
                   new_entry ? new_entry->count : old_entry_block->count);
+  new_version_node->calls_declare_variant_alt
+    = old_version_node->calls_declare_variant_alt;
   if (DECL_STRUCT_FUNCTION (new_decl)->gimple_df)
     DECL_STRUCT_FUNCTION (new_decl)->gimple_df->ipa_pta
       = id.src_cfun->gimple_df->ipa_pta;