git.libre-soc.org Git

options: Save and restore opts_set for Optimization and Target options fallout

> This breaks ia64:
>
> In file included from ./tm.h:23,
>                  from ../../gcc/gencheck.c:23:
> ./options.h:7816:40: error: ISO C++ forbids zero-size array 'explicit_mask' [-Werror=pedantic]
>  7816 |   unsigned HOST_WIDE_INT explicit_mask[0];
>       |                                        ^
> ./options.h:7816:26: error: zero-size array member 'cl_target_option::explicit_mask' not at end of 'struct cl_target_option' [-Werror=pedantic]
>  7816 |   unsigned HOST_WIDE_INT explicit_mask[0];
>       |                          ^~~~~~~~~~~~~
> ./options.h:7812:16: note: in the definition of 'struct cl_target_option'
>  7812 | struct GTY(()) cl_target_option
>       |                ^~~~~~~~~~~~~~~~

Oops, sorry.

The following patch should fix that and should also fix streaming of the
new explicit_mask_* members.

2020-10-05  Jakub Jelinek  <jakub@redhat.com>

* opth-gen.awk: Don't emit explicit_mask array if n_target_explicit
is equal to n_target_explicit_mask.
* optc-save-gen.awk: Compute has_target_explicit_mask and if false,
don't emit code iterating over explicit_mask array elements.  Stream
also explicit_mask_* target members.

store-merging: Fix up -Wnarrowing warning

I've noticed a -Wnarrowing warning on gimple-ssa-store-merging.c, this
change fixes that up.

2020-10-05 Jakub Jelinek <jakub@redhat.com>

* gimple-ssa-store-merging.c
(imm_store_chain_info::output_merged_store): Use ~0U instead of ~0 in
unsigned int array initializer.

[omp, ftracer] Don't duplicate blocks in SIMT region

When running the libgomp testsuite on x86_64-linux with nvptx accelerator on
the test-case included in this patch, we run into:
...
FAIL: libgomp.fortran/pr95654.f90 -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...

The test-case is a minimal version of this FAIL:
...
FAIL: libgomp.fortran/pr66199-5.f90 -O3 -fomit-frame-pointer -funroll-loops \
  -fpeel-loops -ftracer -finline-functions  execution test
...
but that one has stopped failing at commit c2ebf4f10de "openmp: Add support
for non-rect simd and improve collapsed simd support".

The problem is that ftracer duplicates a block containing GOMP_SIMT_VOTE_ANY.

That is, before ftracer we have (dropping the GOMP_SIMT_ prefix):
...
bb4(ENTER_ALLOC)
*----------+
|           \
|            \
|             v
|             *
v             bb8
*<------------*
bb5(VOTE_ANY)
*-------------+
|             |
|             |
|             |
|             |
|             v
|             *
v             bb7(XCHG_IDX)
*<------------*
bb6(EXIT)
...

The XCHG_IDX internal-fn does inter-SIMT-lane communication, which for nvptx
maps onto shfl, an operator which has the requirement that the warp executing
the operator is convergent.  The warp diverges at bb4, and
reconverges at bb5, and does not diverge by going to bb7, so the shfl is
indeed executed by a convergent warp.

After ftracer, we have:
...
bb4(ENTER_ALLOC)
*----------+
|           \
|            \
|             \
|              \
v               v
*               *
bb5(VOTE_ANY)   bb8(VOTE_ANY)
*               *
|\             /|
| \  +--------+ |
|  \/           |
|  /\           |
| /  +----------v
|/              *
v               bb7(XCHG_IDX)
*<--------------*
bb6(EXIT)
...

The warp diverges again at bb5, but does not reconverge again before bb6, so
the shfl is executed by a divergent warp, which causes the FAIL.

Fix this by making ftracer ignore blocks containing ENTER_ALLOC, VOTE_ANY and
EXIT, effectively treating the SIMT region conservatively.

An argument can be made that the test needs to be added in a more
generic place, like gimple_can_duplicate_bb_p or some such, and that ftracer
then needs to use the generic test.  But that's a discussion with a much
broader scope, so I'm leaving that for another patch.

Bootstrapped and reg-tested on x86_64-linux.

Build on x86_64-linux with nvptx accelerator, tested with libgomp.

gcc/ChangeLog:

PR fortran/95654
* tracer.c (ignore_bb_p): Ignore GOMP_SIMT_ENTER_ALLOC,
GOMP_SIMT_VOTE_ANY and GOMP_SIMT_EXIT.

libgomp/ChangeLog:

2020-10-05  Tom de Vries  <tdevries@suse.de>

PR fortran/95654
* testsuite/libgomp.fortran/pr95654.f90: New test.

Daily bump.

PR fortran/97272 - Wrong answer from MAXLOC with character arg

The optional KIND argument to the MINLOC/MAXLOC intrinsic must not be
passed to the library function, as the kind conversion of the result
is treated explicitly elsewhere.

gcc/fortran/ChangeLog:

PR fortran/97272
* trans-intrinsic.c (strip_kind_from_actual): Helper function for
removal of KIND argument.
(gfc_conv_intrinsic_minmaxloc): Ignore KIND argument here, as it
is treated elsewhere.

gcc/testsuite/ChangeLog:

PR fortran/97272
* gfortran.dg/pr97272.f90: New test.

Daily bump.

aix: apply aix_malloc more narrowly.

In recent Technology Levels of AIX 7.2, new "#ifdef __cplusplus" have been
added. Thus, the aix_malloc fix was applied in wrong locations. This patch
increases the context to avoid this.

fixincludes/ChangeLog:

2020-10-03 Clément Chigot <clement.chigot@atos.net>

* inclhack.def (aix_malloc): Add more context to select.
* fixincl.x: Regenerate.
* tests/base/malloc.h: Update expected results.

options: Fix up opts_set saving/restoring for underlying vars of Mask/InverseMask options

Seems I've missed that set_option has special treatment for
CLVC_BIT_CLEAR/CLVC_BIT_SET.
Which means I'll need to change the generic handling, so that for
global_options_set elements mentioned in CLVC_BIT_* options are treated
differently, instead of using the accumulated bitmasks they'll need to use
their specific bitmask variables during the option saving/restoring.

Here is a patch that implements that.

2020-10-03 Jakub Jelinek <jakub@redhat.com>

* opth-gen.awk: For variables referenced in Mask and InverseMask,
don't use the explicit_mask bitmask array, but add separate
explicit_mask_* members with the same types as the variables.
* optc-save-gen.awk: Save, restore, compare and hash the separate
explicit_mask_* members.

Add gcc.dg/tree-ssa/modref-3.c testcase

* gcc.dg/tree-ssa/modref-3.c: New test.

Track access ranges in ipa-modref

this patch implements tracking of access ranges.  This is only applied when
base pointer is an arugment. Incrementally i will extend it to also track
TBAA basetype so we can disambiguate ranges for accesses to same basetype
(which makes is quite bit more effective). For this reason i track the access
offset separately from parameter offset (the second track combined adjustments
to the parameter). This is I think last feature I would like to add to the
memory access summary this stage1.

Further work will be needed to opitmize the summary and merge adjacent
range/make collapsing more intelingent (so we do not lose track that often),
but I wanted to keep basic patch simple.

According to the cc1plus stats:

Alias oracle query stats:
  refs_may_alias_p: 64108082 disambiguations, 74386675 queries
  ref_maybe_used_by_call_p: 142319 disambiguations, 65004781 queries
  call_may_clobber_ref_p: 23587 disambiguations, 29420 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 38117 queries
  nonoverlapping_refs_since_match_p: 19489 disambiguations, 55748 must overlaps, 76044 queries
  aliasing_component_refs_p: 54763 disambiguations, 755876 queries
  TBAA oracle: 24184658 disambiguations 56823187 queries
               16260329 are in alias set 0
               10617146 queries asked about the same object
               125 queries asked about the same alias set
               0 access volatile
               3960555 are dependent in the DAG
               1800374 are aritificially in conflict with void *

Modref stats:
  modref use: 10656 disambiguations, 47037 queries
  modref clobber: 1473322 disambiguations, 1961464 queries
  5027242 tbaa queries (2.563005 per modref query)
  649087 base compares (0.330920 per modref query)

PTA query stats:
  pt_solution_includes: 977385 disambiguations, 13609749 queries
  pt_solutions_intersect: 1032703 disambiguations, 13187507 queries

Which should still compare with
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554930.html
there is about 2% more load disambiguations and 3.6% more store that is not
great, but the TBAA part helps noticeably more and also this should help
with -fno-strict-aliasing.

I plan to work on improving param tracking too.

Bootstrapped/regtested x86_64-linux with the other changes, OK?

2020-10-02  Jan Hubicka  <hubicka@ucw.cz>

* ipa-modref-tree.c (test_insert_search_collapse): Update andling
of accesses.
(test_merge): Likewise.
* ipa-modref-tree.h (struct modref_access_node): Add offset, size,
max_size, parm_offset and parm_offset_known.
(modref_access_node::useful_p): Constify.
(modref_access_node::range_info_useful_p): New predicate.
(modref_access_node::operator==): New.
(struct modref_parm_map): New structure.
(modref_tree::merge): Update for racking parameters)
* ipa-modref.c (dump_access): Dump new fields.
(get_access): Fill in new fields.
(merge_call_side_effects): Update handling of parm map.
(write_modref_records): Stream new fields.
(read_modref_records): Stream new fields.
(compute_parm_map): Update for new parm map.
(ipa_merge_modref_summary_after_inlining): Update.
(modref_propagate_in_scc): Update.
* tree-ssa-alias.c (modref_may_conflict): Handle known ranges.

doc: Replace roudnevenl with roundevenl

PR other/97280
* doc/extend.texi: Replace roudnevenl with roundevenl

Daily bump.

rs6000: clean up headers in rs6000.c and rs6000-call.c

When Andrew Macleod investigated the recent rs6000 bootstrap failure,
he suggested a clean up of the headers in rs6000.c and rs6000-call.c.
It now is recommended to include ssa.h instead of the individual headers.
This also ensures that value-range.h is included and in the correct order
so that the tree-ssa-propagate.h inclusion of value-query.h and its
dependencies are satisfied.

Bootstrapped on powerpc-ibm-aix7.2.0.0 and powerpc64le-linux.

gcc/ChangeLog:

2020-10-02 David Edelsohn <dje.gcc@gmail.com>
Andrew MacLeod <amacleod@redhat.com>

* config/rs6000/rs6000.c: Include ssa.h. Reorder some headers.
* config/rs6000/rs6000-call.c: Same.

c++: Fix printing of C++20 template parameter object [PR97014]

No one is interested in the mangled name of the C++20 template parameter
object for a class NTTP.  So instead of printing

  required for the satisfaction of ‘positive<T::ratio>’ [with T = X<::_ZTAXtl5ratioLin1ELi2EEE>]

let's print

  required for the satisfaction of ‘positive<T::ratio>’ [with T = X<{-1, 2}>]

I don't think adding a test is necessary for this.

gcc/cp/ChangeLog:

PR c++/97014
* cxx-pretty-print.c (pp_cxx_template_argument_list): If the
argument is template_parm_object_p, print its DECL_INITIAL.

libstdc++: Change test to work without 64-bit atomics

This fixes a linker error for older ARM cores without 64-bit atomics.

I think the { dg-add-options libatomic } is no longer needed, but it's
harmless to keep it there.

libstdc++-v3/ChangeLog:

* testsuite/29_atomics/atomic_float/value_init.cc: Use float
instead of double so that __atomic_load_8 isn't needed.

libstdc++: Fix testcase by using terminate handler

This test was supposed to verify that when __libc_single_threaded is
available we successfully detect recursive static initialization even
when linked to libpthread. But I forgot to that when recursive init is
detected, we terminate, and so the test fails.

This adds a terminate handler that exits cleanly, so the test passes
when recursive init is detected.

libstdc++-v3/ChangeLog:

* testsuite/18_support/96817.cc: Use terminate handler that
calls _Exit(0).

c++: Kill DECL_ANTICIPATED

Here's the patch to remove DECL_ANTICIPATED, and with it hiddenness is
managed entirely in the symbol table.  Sadly I couldn't get rid of the
actual field without more investigation -- it's repurposed for
OMP_PRIVATIZED_MEMBER.  It looks like a the VAR-related flags in
lang_decl_base are not completely orthogonal, so perhaps some can be
turned into an enumeration or something.  But that's more than I want
to do right now.

DECL_FRIEND_P Is still slightly suspect as it appears to mean more
than just in-class definition.  However, I'm leaving that for now.

gcc/cp/
* cp-tree.h (lang_decl_base): anticipated_p is not used for
anticipatedness.
(DECL_ANTICIPATED): Delete.
* decl.c (duplicate_decls): Delete DECL_ANTICIPATED_management,
use was_hidden.
(cxx_builtin_function): Drop DECL_ANTICIPATED setting.
(xref_tag_1): Drop DECL_ANTICIPATED assert.
* name-lookup.c (name_lookup::adl_class_only): Drop
DECL_ANTICIPATED check.
(name_lookup::search_adl): Always dedup.
(anticipated_builtin_p): Reimplement.
(do_pushdecl): Drop DECL_ANTICIPATED asserts & update.
(lookup_elaborated_type_1): Drop DECL_ANTICIPATED update.
(do_pushtag): Drop DECL_ANTICIPATED setting.
* pt.c (push_template_decl): Likewise.
(tsubst_friend_class): Likewise.
libcc1/
* libcp1plugin.cc (libcp1plugin.cc): Drop DECL_ANTICIPATED test.

c++: Hash table iteration for namespace-member spelling suggestions

For 'no such binding' errors, we iterate over binding levels to find a
close match.  At the namespace level we were using DECL_ANTICIPATED to
skip undeclared builtins.  But (a) there are other unnameable things
there and (b) decl-anticipated is about to go away.  This changes the
namespace scanning to iterate over the hash table, and look at
non-hidden bindings.  This does mean we look at fewer strings
(hurrarh), but the order we meet them is somewhat 'random'.  Our
distance measure is not very fine grained, and a couple of testcases
change their suggestion.  I notice for the c/c++ common one, we now
match the output of the C compiler.  For the other one we think 'int'
and 'int64_t' have the same distance from 'int64', and now meet the
former first.  That's a little unfortunate.  If it's too problematic I
suppose we could sort the strings via an intermediate array before
measuring distance.

gcc/cp/
* name-lookup.c (consider_decl): New, broken out of ...
(consider_binding_level): ... here.  Iterate the hash table for
namespace bindings.
gcc/testsuite/
* c-c++-common/spellcheck-reserved.c: Adjust diagnostic.
* g++.dg/spellcheck-typenames.C: Adjust diagnostic.

c++: cleanup ctor_omit_inherited_parms [PR97268]

ctor_omit_inherited_parms was being somewhat abused.  What I'd missed
is that it checks for a base-dtor name, before proceeding with the
check.  But we ended up passing it that during cloning before we'd
completed the cloning.  It was also using DECL_ORIGIN to get to the
in-charge ctor, but we sometimes zap DECL_ABSTRACT_ORIGIN, and it ends
up processing the incoming function -- which happens to work.  so,
this breaks out a predicate that expects to get the incharge ctor, and
will tell you whether its base ctor will need to omit the parms.  We
call that directly during cloning.

Then the original fn is essentially just a wrapper, but uses
DECL_CLONED_FUNCTION to get to the in-charge ctor.  That uncovered
abuse in add_method, which was happily passing TEMPLATE_DECLs to it.
Let's not do that.  add_method itself contained a loop mostly
containing an 'if (nomatch) continue' idiom, except for a final 'if
(match) {...}'  check, which itself contained instances of the former
idiom.  I refactored that to use the former idiom throughout.  In
doing that I found a place where we'd issue an error, but then not
actually reject the new member.

gcc/cp/
* cp-tree.h (base_ctor_omit_inherited_parms): Declare.
* class.c (add_method): Refactor main loop, only pass fns to
ctor_omit_inherited_parms.
(build_cdtor_clones): Rename bool parms.
(clone_cdtor): Call base_ctor_omit_inherited_parms.
* method.c (base_ctor_omit_inherited_parms): New, broken out of
...
(ctor_omit_inherited_parms): ... here, call it with
DECL_CLONED_FUNCTION.
gcc/testsuite/
* g++.dg/inherit/pr97268.C: New.

ipa-cp: Separate and increase the large-unit parameter

A previous patch in the series has taught IPA-CP to identify the
important cloning opportunities in 548.exchange2_r as worthwhile on
their own, but the optimization is still prevented from taking place
because of the overall unit-growh limit.  This patches raises that
limit so that it takes place and the benchmark runs 30% faster (on AMD
Zen2 CPU at least).

Before this patch, IPA-CP uses the following formulae to arrive at the
overall_size limit:

base = MAX(orig_size, param_large_unit_insns)
unit_growth_limit = base + base * param_ipa_cp_unit_growth / 100

since param_ipa_cp_unit_growth has default 10, param_large_unit_insns
has default value 10000.

The problem with exchange2 (at least on zen2 but I have had a quick
look on aarch64 too) is that the original estimated unit size is 10513
and so param_large_unit_insns does not apply and the default limit is
therefore 11564 which is good enough only for one of the ideal 8
clonings, we need the limit to be at least 16291.

I would like to raise param_ipa_cp_unit_growth a little bit more soon
too, but most certainly not to 55.  Therefore, the large_unit must be
increased.  In this patch, I decided to decouple the inlining and
ipa-cp large-unit parameters.  It also makes sense because IPA-CP uses
it only at -O3 while inlining also at -O2 (IIUC).  But if we agree we
can try raising param_large_unit_insns to 13-14 thousand
"instructions," perhaps it is not necessary.  But then again, it may
make sense to actually increase the IPA-CP limit further.

I plan to experiment with IPA-CP tuning on a larger set of programs.
Meanwhile, mainly to address the 548.exchange2_r regression, I'm
suggesting this simple change.

gcc/ChangeLog:

2020-09-07  Martin Jambor  <mjambor@suse.cz>

* params.opt (ipa-cp-large-unit-insns): New parameter.
* ipa-cp.c (get_max_overall_size): Use the new parameter.

ipa-cp: Add dumping of overall_size after cloning

When experimenting with IPA-CP parameters, especially when looking
into exchange2_r, it has been very useful to know what the value of
overall_size is at different stages of the decision process. This
patch therefore adds it to the generated dumps.

gcc/ChangeLog:

2020-09-07 Martin Jambor <mjambor@suse.cz>

* ipa-cp.c (estimate_local_effects): Add overeall_size to dumped
string.
(decide_about_value): Add dumping new overall_size.

ipa: Multiple predicates for loop properties, with frequencies

This patch enhances the ability of IPA to reason under what conditions
loops in a function have known iteration counts or strides because it
replaces single predicates which currently hold conjunction of
predicates for all loops with vectors capable of holding multiple
predicates, each with a cumulative frequency of loops with the
property.

This second property is then used by IPA-CP to much more aggressively
boost its heuristic score for cloning opportunities which make
iteration counts or strides of frequent loops compile time constant.

gcc/ChangeLog:

2020-09-03  Martin Jambor  <mjambor@suse.cz>

* ipa-fnsummary.h (ipa_freqcounting_predicate): New type.
(ipa_fn_summary): Change the type of loop_iterations and loop_strides
to vectors of ipa_freqcounting_predicate.
(ipa_fn_summary::ipa_fn_summary): Construct the new vectors.
(ipa_call_estimates): New fields loops_with_known_iterations and
loops_with_known_strides.
* ipa-cp.c (hint_time_bonus): Multiply param_ipa_cp_loop_hint_bonus
with the expected frequencies of loops with known iteration count or
stride.
* ipa-fnsummary.c (add_freqcounting_predicate): New function.
(ipa_fn_summary::~ipa_fn_summary): Release the new vectors instead of
just two predicates.
(remap_hint_predicate_after_duplication): Replace with function
remap_freqcounting_preds_after_dup.
(ipa_fn_summary_t::duplicate): Use it or duplicate new vectors.
(ipa_dump_fn_summary): Dump the new vectors.
(analyze_function_body): Compute the loop property vectors.
(ipa_call_context::estimate_size_and_time): Calculate also
loops_with_known_iterations and loops_with_known_strides.  Adjusted
dumping accordinly.
(remap_hint_predicate): Replace with function
remap_freqcounting_predicate.
(ipa_merge_fn_summary_after_inlining): Use it.
(inline_read_section): Stream loopcounting vectors instead of two
simple predicates.
(ipa_fn_summary_write): Likewise.
* params.opt (ipa-max-loop-predicates): New parameter.
* doc/invoke.texi (ipa-max-loop-predicates): Document new param.

gcc/testsuite/ChangeLog:

2020-09-03  Martin Jambor  <mjambor@suse.cz>

* gcc.dg/ipa/ipcp-loophint-1.c: New test.

ipa: Bundle estimates of ipa_call_context::estimate_size_and_time

A subsequent patch adds another two estimates that the code in
ipa_call_context::estimate_size_and_time computes, and the fact that
the function has a special output parameter for each thing it computes
would make it have just too many. Therefore, this patch collapses all
those ouptut parameters into one output structure.

gcc/ChangeLog:

2020-09-02 Martin Jambor <mjambor@suse.cz>

* ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to use
ipa_call_estimates.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
* ipa-fnsummary.h (struct ipa_call_estimates): New type.
(ipa_call_context::estimate_size_and_time): Adjusted declaration.
(estimate_ipcp_clone_size_and_time): Likewise.
* ipa-cp.c (hint_time_bonus): Changed the type of the second argument
to ipa_call_estimates.
(perform_estimation_of_a_value): Adjusted to use ipa_call_estimates.
(estimate_local_effects): Likewise.
* ipa-fnsummary.c (ipa_call_context::estimate_size_and_time): Adjusted
to return estimates in a single ipa_call_estimates parameter.
(estimate_ipcp_clone_size_and_time): Likewise.

ipa: Introduce ipa_cached_call_context

Hi,

as we discussed with Honza on the mailin glist last week, making
cached call context structure distinct from the normal one may make it
clearer that the cached data need to be explicitely deallocated.

This patch does that division.  It is not mandatory for the overall
main goals of the patch set and can be dropped if deemed superfluous.

gcc/ChangeLog:

2020-09-02  Martin Jambor  <mjambor@suse.cz>

* ipa-fnsummary.h (ipa_cached_call_context): New forward declaration
and class.
(class ipa_call_context): Make friend ipa_cached_call_context.  Moved
methods duplicate_from and release to it too.
* ipa-fnsummary.c (ipa_call_context::duplicate_from): Moved to class
ipa_cached_call_context.
(ipa_call_context::release): Likewise, removed the parameter.
* ipa-inline-analysis.c (node_context_cache_entry): Change the type of
ctx to ipa_cached_call_context.
(do_estimate_edge_time): Remove parameter from the call to
ipa_cached_call_context::release.

ipa: Bundle vectors describing argument values

Hi,

this large patch is mostly mechanical change which aims to replace
uses of separate vectors about known scalar values (usually called
known_vals or known_csts), known aggregate values (known_aggs), known
virtual call contexts (known_contexts) and known value
ranges (known_value_ranges) with uses of either new type
ipa_call_arg_values or ipa_auto_call_arg_values, both of which simply
contain these vectors inside them.

The need for two distinct comes from the fact that when the vectors
are constructed from jump functions or lattices, we really should use
auto_vecs with embedded storage allocated on stack.  On the other hand,
the bundle in ipa_call_context can be allocated on heap when in cache,
one time for each call_graph node.

ipa_call_context is constructible from ipa_auto_call_arg_values but
then its vectors must not be resized, otherwise the vectors will stop
pointing to the stack ones.  Unfortunately, I don't think the
structure embedded in ipa_call_context can be made constant because we
need to manipulate and deallocate it when in cache.

gcc/ChangeLog:

2020-09-01  Martin Jambor  <mjambor@suse.cz>

* ipa-prop.h (ipa_auto_call_arg_values): New type.
(class ipa_call_arg_values): Likewise.
(ipa_get_indirect_edge_target): Replaced vector arguments with
ipa_call_arg_values in declaration.  Added an overload for
ipa_auto_call_arg_values.
* ipa-fnsummary.h (ipa_call_context): Removed members m_known_vals,
m_known_contexts, m_known_aggs, duplicate_from, release and equal_to,
new members m_avals, store_to_cache and equivalent_to_p.  Adjusted
construcotr arguments.
(estimate_ipcp_clone_size_and_time): Replaced vector arguments
with ipa_auto_call_arg_values in declaration.
(evaluate_properties_for_edge): Likewise.
* ipa-cp.c (ipa_get_indirect_edge_target): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.  Added an
overload for ipa_auto_call_arg_values.
(devirtualization_time_bonus): Adjusted to work on
ipa_auto_call_arg_values rather than on separate vectors.
(gather_context_independent_values): Adjusted to work on
ipa_auto_call_arg_values rather than on separate vectors.
(perform_estimation_of_a_value): Likewise.
(estimate_local_effects): Likewise.
(modify_known_vectors_with_val): Adjusted both variants to work on
ipa_auto_call_arg_values and rename them to
copy_known_vectors_add_val.
(decide_about_value): Adjusted to work on ipa_call_arg_values rather
than on separate vectors.
(decide_whether_version_node): Likewise.
* ipa-fnsummary.c (evaluate_conditions_for_known_args): Likewise.
(evaluate_properties_for_edge): Likewise.
(ipa_fn_summary_t::duplicate): Likewise.
(estimate_edge_devirt_benefit): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(estimate_edge_size_and_time): Likewise.
(estimate_calls_size_and_time_1): Likewise.
(summarize_calls_size_and_time): Adjusted calls to
estimate_edge_size_and_time.
(estimate_calls_size_and_time): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(ipa_call_context::ipa_call_context): Construct from a pointer to
ipa_auto_call_arg_values instead of inividual vectors.
(ipa_call_context::duplicate_from): Adjusted to access vectors within
m_avals.
(ipa_call_context::release): Likewise.
(ipa_call_context::equal_to): Likewise.
(ipa_call_context::estimate_size_and_time): Adjusted to work on
ipa_call_arg_values rather than on separate vectors.
(estimate_ipcp_clone_size_and_time): Adjusted to work with
ipa_auto_call_arg_values rather than on separate vectors.
(ipa_merge_fn_summary_after_inlining): Likewise.  Adjusted call to
estimate_edge_size_and_time.
(ipa_update_overall_fn_summary): Adjusted call to
estimate_edge_size_and_time.
* ipa-inline-analysis.c (do_estimate_edge_time): Adjusted to work with
ipa_auto_call_arg_values rather than with separate vectors.
(do_estimate_edge_size): Likewise.
(do_estimate_edge_hints): Likewise.
* ipa-prop.c (ipa_auto_call_arg_values::~ipa_auto_call_arg_values):
New destructor.

libstdc++: Add missing P0896 changes to <iterator>

I noticed that the following changes from this paper were not yet
implemented.

libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (reverse_iterator::iter_move):
Define for C++20 as per P0896.
(reverse_iterator::iter_swap): Likewise.
(move_iterator::operator*): Apply P0896 changes for C++20.
(move_iterator::operator[]): Likewise.
* testsuite/24_iterators/reverse_iterator/cust.cc: New test.

arm: Remove coercion from scalar argument to vmin & vmax intrinsics

This patch fixes an issue with vmin* and vmax* intrinsics which accept
a scalar argument. Previously when the scalar was of different width
to the vector elements this would generate __ARM_undef. This change
allows the scalar argument to be implicitly converted to the correct
width. Also tidied up the relevant unit tests, some of which would
have passed even if only one of two or three intrinsic calls had
compiled correctly.

Bootstrapped and tested on arm-none-eabi, gcc and CMSIS_DSP
testsuites are clean. OK for trunk?

Thanks,
Joe

gcc/ChangeLog:

2020-08-10 Joe Ramsay <joe.ramsay@arm.com>

* config/arm/arm_mve.h (__arm_vmaxnmavq): Remove coercion of scalar
argument.
(__arm_vmaxnmvq): Likewise.
(__arm_vminnmavq): Likewise.
(__arm_vminnmvq): Likewise.
(__arm_vmaxnmavq_p): Likewise.
(__arm_vmaxnmvq_p): Likewise (and delete duplicate definition).
(__arm_vminnmavq_p): Likewise.
(__arm_vminnmvq_p): Likewise.
(__arm_vmaxavq): Likewise.
(__arm_vmaxavq_p): Likewise.
(__arm_vmaxvq): Likewise.
(__arm_vmaxvq_p): Likewise.
(__arm_vminavq): Likewise.
(__arm_vminavq_p): Likewise.
(__arm_vminvq): Likewise.
(__arm_vminvq_p): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vmaxavq_p_s16.c: Add test for mismatched
width of scalar argument.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxvq_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminavq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_p_u8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminvq_u8.c: Likewise.

AArch64: Add neoversev1_tunings struct

This patch adds a Neoverse V1-specific tuning struct that currently is
just a deduplication of the N1 struct it was using before and specifying
the SVE width.
This will allow us to tweak Neoverse V1 things in the future as needed.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
* config/aarch64/aarch64.c (neoversev1_tunings): Define.
* config/aarch64/aarch64-cores.def (zeus): Use it.
(neoverse-v1): Likewise.

Perforate fnspec strings

gcc/ChangeLog:

2020-10-02 Jan Hubicka <hubicka@ucw.cz>

* attr-fnspec.h: Update documentation.
(attr_fnsec::return_desc_size): Set to 2
(attr_fnsec::arg_desc_size): Set to 2
* builtin-attrs.def (STR1): Update fnspec.
* internal-fn.def (UBSAN_NULL): Update fnspec.
(UBSAN_VPTR): Update fnspec.
(UBSAN_PTR): Update fnspec.
(ASAN_CHECK): Update fnspec.
(GOACC_DIM_SIZE): Remove fnspec.
(GOACC_DIM_POS): Remove fnspec.
* tree-ssa-alias.c (attr_fnspec::verify): Update verification.

gcc/fortran/ChangeLog:

2020-10-02 Jan Hubicka <hubicka@ucw.cz>

* trans-decl.c (gfc_build_library_function_decl_with_spec): Verify
fnspec.
(gfc_build_intrinsic_function_decls): Update fnspecs.
(gfc_build_builtin_function_decls): Update fnspecs.
* trans-io.c (gfc_build_io_library_fndecls): Update fnspecs.
* trans-types.c (create_fn_spec): Update fnspecs.

c++: Simplify __FUNCTION__ creation

I had reason to wander into cp_make_fname, and noticed it's the only
caller of cp_fname_init. Folding it in makes the code simpler.

gcc/cp/
* cp-tree.h (cp_fname_init): Delete declaration.
* decl.c (cp_fname_init): Merge into only caller ...
(cp_make_fname): ... here & refactor.

Commonize handling of attr-fnspec

* attr-fnspec.h: New file.
* calls.c (decl_return_flags): Use attr_fnspec.
* gimple.c (gimple_call_arg_flags): Use attr_fnspec.
(gimple_call_return_flags): Use attr_fnspec.
* tree-into-ssa.c (pass_build_ssa::execute): Use attr_fnspec.
* tree-ssa-alias.c (attr_fnspec::verify): New member fuction.

Break out ao_ref_init_from_ptr_and_range from ao_ref_init_from_ptr_and_size

* tree-ssa-alias.c (ao_ref_init_from_ptr_and_range): Break out from ...
(ao_ref_init_from_ptr_and_size): ... here.

Add poly_int64 streaming support

2020-10-02 Jan Hubicka <hubicka@ucw.cz>

* data-streamer-in.c (streamer_read_poly_int64): New function.
* data-streamer-out.c (streamer_write_poly_int64): New function.
* data-streamer.h (streamer_write_poly_int64): Declare.
(streamer_read_poly_int64): Declare.

aarch64: Remove aarch64_sve_pred_dominates_p

In r11-2922, Przemek fixed a post-RA instruction match failure
caused by the SVE FP subtraction patterns..  This patch applies
the same fix to the other patterns.

To recap, the issue is around the handling of predication.
We want to do two things:

- Optimise cases in which a predicate is known to be all-true.

- Differentiate cases in which the predicate on an _x ACLE function has
  to be kept as-is from cases in which we can make more lanes active.
  The former is true by default, the latter is true for certain
  combinations of flags in the -ffast-math group.

This is handled by a boolean flag in the unspecs to say whether the
predicate is “strict” or “relaxed”.  When combining multiple strict
operations, the predicates used in the operations generally need to
match.  When combining multiple relaxed operations, we can ignore the
predicates on nested operations and just use the predicate on the
“outermost” operation.

Originally I'd tried to reduce combinatorial explosion by using
aarch64_sve_pred_dominates_p.  This required matching predicates
for strict operations but allowed more combinations for relaxed
operations.

The problem (as I should have remembered) is that C conditions on
insn patterns can't reliably enforce matching operands.  If the
same register is used in two different input operands, the RA is
allowed to use different hard registers for those input operands
(and sometimes it has to).  So operands that match before RA
might not match afterwards.  The only sure way to force a match
is via match_dup.

This patch splits the cases into two.  I cry bitter tears at having
to do this, but I think it's the only backportable fix.  There might
be some way of using define_subst to generate the cond_* patterns from
the pred_* patterns, with some alternatives strategically disabled in
each case, but that's future work and might not be an improvement.

Since so many patterns now do this, I moved the comments from the
subtraction pattern to a new banner comment at the head of the file.

gcc/
* config/aarch64/aarch64-protos.h (aarch64_sve_pred_dominates_p):
Delete.
* config/aarch64/aarch64.c (aarch64_sve_pred_dominates_p): Likewise.
* config/aarch64/aarch64-sve.md: Add banner comment describing
how merging predicated FP operations are represented.
(*cond_<SVE_COND_FP_UNARY:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_UNARY:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_any_relaxed): ...this
and...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_BINARY:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_2_const): Split into...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_2_const_relaxed): ...this
and...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_2_const_strict): ...this.
(*cond_<SVE_COND_FP_BINARY:optab><mode>_3): Split into...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_3_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_3_strict): ...this.
(*cond_<SVE_COND_FP_BINARY:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_any_const): Split into...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_any_const_relaxed): ...this
and...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_any_const_strict): ...this.
(*cond_add<mode>_2_const): Split into...
(*cond_add<mode>_2_const_relaxed): ...this and...
(*cond_add<mode>_2_const_strict): ...this.
(*cond_add<mode>_any_const): Split into...
(*cond_add<mode>_any_const_relaxed): ...this and...
(*cond_add<mode>_any_const_strict): ...this.
(*cond_<SVE_COND_FCADD:optab><mode>_2): Split into...
(*cond_<SVE_COND_FCADD:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FCADD:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FCADD:optab><mode>_any): Split into...
(*cond_<SVE_COND_FCADD:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FCADD:optab><mode>_any_strict): ...this.
(*cond_sub<mode>_3_const): Split into...
(*cond_sub<mode>_3_const_relaxed): ...this and...
(*cond_sub<mode>_3_const_strict): ...this.
(*aarch64_pred_abd<mode>): Split into...
(*aarch64_pred_abd<mode>_relaxed): ...this and...
(*aarch64_pred_abd<mode>_strict): ...this.
(*aarch64_cond_abd<mode>_2): Split into...
(*aarch64_cond_abd<mode>_2_relaxed): ...this and...
(*aarch64_cond_abd<mode>_2_strict): ...this.
(*aarch64_cond_abd<mode>_3): Split into...
(*aarch64_cond_abd<mode>_3_relaxed): ...this and...
(*aarch64_cond_abd<mode>_3_strict): ...this.
(*aarch64_cond_abd<mode>_any): Split into...
(*aarch64_cond_abd<mode>_any_relaxed): ...this and...
(*aarch64_cond_abd<mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_4): Split into...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_4_relaxed): ...this and...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_4_strict): ...this.
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FCMLA:optab><mode>_4): Split into...
(*cond_<SVE_COND_FCMLA:optab><mode>_4_relaxed): ...this and...
(*cond_<SVE_COND_FCMLA:optab><mode>_4_strict): ...this.
(*cond_<SVE_COND_FCMLA:optab><mode>_any): Split into...
(*cond_<SVE_COND_FCMLA:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FCMLA:optab><mode>_any_strict): ...this.
(*aarch64_pred_fac<cmp_op><mode>): Split into...
(*aarch64_pred_fac<cmp_op><mode>_relaxed): ...this and...
(*aarch64_pred_fac<cmp_op><mode>_strict): ...this.
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>): Split
into...
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_relaxed):
...this and...
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_strict):
...this.
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>): Split
into...
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_relaxed):
...this and...
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_strict):
...this.
* config/aarch64/aarch64-sve2.md
(*cond_<SVE2_COND_FP_UNARY_LONG:optab><mode>): Split into...
(*cond_<SVE2_COND_FP_UNARY_LONG:optab><mode>_relaxed): ...this and...
(*cond_<SVE2_COND_FP_UNARY_LONG:optab><mode>_strict): ...this.
(*cond_<SVE2_COND_FP_UNARY_NARROWB:optab><mode>_any): Split into...
(*cond_<SVE2_COND_FP_UNARY_NARROWB:optab><mode>_any_relaxed): ...this
and...
(*cond_<SVE2_COND_FP_UNARY_NARROWB:optab><mode>_any_strict): ...this.
(*cond_<SVE2_COND_INT_UNARY_FP:optab><mode>): Split into...
(*cond_<SVE2_COND_INT_UNARY_FP:optab><mode>_relaxed): ...this and...
(*cond_<SVE2_COND_INT_UNARY_FP:optab><mode>_strict): ...this.

arm: Make more use of the new mode macros

As Christophe pointed out, my r11-3522 patch didn't in fact fix
all of the armv8_2-fp16-arith-2.c failures introduced by allowing
FP16 vectorisation without -funsafe-math-optimizations. I must have
only tested the final patch on my usual arm-linux-gnueabihf bootstrap,
which it turns out treats the test as unsupported.

The focus of the original patch was to use mode macros for
patterns that are shared between Advanced SIMD, iwMMXt and MVE.
This patch uses the mode macros for general neon.md patterns too.

gcc/
* config/arm/neon.md (*sub<VDQ:mode>3_neon): Use the new mode macros
for the insn condition.
(sub<VH:mode>3, *mul<VDQW:mode>3_neon): Likewise.
(mul<VDQW:mode>3add<VDQW:mode>_neon): Likewise.
(mul<VH:mode>3add<VH:mode>_neon): Likewise.
(mul<VDQW:mode>3neg<VDQW:mode>add<VDQW:mode>_neon): Likewise.
(fma<VCVTF:mode>4, fma<VH:mode>4, *fmsub<VCVTF:mode>4): Likewise.
(quad_halves_<code>v4sf, reduc_plus_scal_<VD:mode>): Likewise.
(reduc_plus_scal_<VQ:mode>, reduc_smin_scal_<VD:mode>): Likewise.
(reduc_smin_scal_<VQ:mode>, reduc_smax_scal_<VD:mode>): Likewise.
(reduc_smax_scal_<VQ:mode>, mul<VH:mode>3): Likewise.
(neon_vabd<VF:mode>_2, neon_vabd<VF:mode>_3): Likewise.
(fma<VH:mode>4_intrinsic): Delete.
(neon_vadd<VCVTF:mode>): Use the new mode macros to decide which
form of instruction to generate.
(neon_vmla<VDQW:mode>, neon_vmls<VDQW:mode>): Likewise.
(neon_vsub<VCVTF:mode>): Likewise.
(neon_vfma<VH:mode>): Generate the main fma<mode>4 form instead
of using fma<mode>4_intrinsic.

gcc/testsuite/
* gcc.target/arm/armv8_2-fp16-arith-2.c (float16_t): Use _Float16_t
rather than __fp16.
(float16x4_t, float16x4_t): Likewise.
(fp16_abs): Use __builtin_fabsf16.

aarch64: ilp32 testsuite fixes

This fixes test failures on ilp32 introduced in
r11-3032-gd4febc75e8dfab23bd3132d5747eded918f85107.

The assembler checks in extend-syntax.c simply needed adjusting for
32-bit pointers.

It appears the subsp.c test has never passed on ILP32 due to a missed
optimisation there. Since this isn't a code quality regression, disable
that check on ILP32.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/extend-syntax.c: Fix assembler checks for
ilp32, disable check-function-bodies on ilp32.
* gcc.target/aarch64/subsp.c: Only check second scan-assembler
on lp64 since the code on ilp32 is missing the optimization
needed for this test to pass.

GCOV: do not mangle .gcno files.

gcc/ChangeLog:

PR gcov-profile/97193
* coverage.c (coverage_init): GCDA note files should not be
mangled and should end in output directory.

libgomp: Regenerate configure files with automake 1.15.1

libgomp/ChangeLog:
* Makefile.in: Regenerate with automake 1.15.1.
* aclocal.m4: Likewise.
* configure: Likewise.
* testsuite/Makefile.in: Likewise.

c++: Set CALL_FROM_NEW_OR_DELETE_P on more calls.

We were failing to set the flag on a delete call in a new expression, in a
deleting destructor, and in a coroutine. Fixed by setting it in the
function that builds the call.

2020-10-02 Jason Merril <jason@redhat.com>

gcc/cp/ChangeLog:
* call.c (build_operator_new_call): Set CALL_FROM_NEW_OR_DELETE_P.
(build_op_delete_call): Likewise.
* init.c (build_new_1, build_vec_delete_1, build_delete): Not here.
(build_delete):

gcc/ChangeLog:
* gimple.h (gimple_call_operator_delete_p): Rename from
gimple_call_replaceable_operator_delete_p.
* gimple.c (gimple_call_operator_delete_p): Likewise.
* tree.h (DECL_IS_REPLACEABLE_OPERATOR_DELETE_P): Remove.
* tree-ssa-dce.c (mark_all_reaching_defs_necessary_1): Adjust.
(propagate_necessity): Likewise.
(eliminate_unnecessary_stmts): Likewise.
* tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.

gcc/testsuite/ChangeLog:
* g++.dg/pr94314.C: new/delete no longer omitted.

make use of CALL_FROM_NEW_OR_DELETE_P

This fixes points-to analysis and DCE to only consider new/delete
operator calls from new or delete expressions and not direct calls.

2020-10-01 Richard Biener <rguenther@suse.de>

* gimple.h (GF_CALL_FROM_NEW_OR_DELETE): New call flag.
(gimple_call_set_from_new_or_delete): New.
(gimple_call_from_new_or_delete): Likewise.
* gimple.c (gimple_build_call_from_tree): Set
GF_CALL_FROM_NEW_OR_DELETE appropriately.
* ipa-icf-gimple.c (func_checker::compare_gimple_call):
Compare gimple_call_from_new_or_delete.
* tree-ssa-dce.c (mark_all_reaching_defs_necessary_1): Make
sure to only consider new/delete calls from new or delete
expressions.
(propagate_necessity): Likewise.
(eliminate_unnecessary_stmts): Likewise.
* tree-ssa-structalias.c (find_func_aliases_for_call):
Likewise.

* g++.dg/tree-ssa/pta-delete-1.C: New testcase.

c++: Move CALL_FROM_NEW_OR_DELETE_P to tree.h

As discussed with richi, we should be able to use TREE_PROTECTED for this
flag, since CALL_FROM_THUNK_P will never be set on a call to an operator new
or delete.

2020-10-01 Jason Merril <jason@redhat.com>

gcc/cp/ChangeLog:
* lambda.c (call_from_lambda_thunk_p): New.
* cp-gimplify.c (cp_genericize_r): Use it.
* pt.c (tsubst_copy_and_build): Use it.
* typeck.c (check_return_expr): Use it.
* cp-tree.h: Declare it.
(CALL_FROM_NEW_OR_DELETE_P): Move to gcc/tree.h.

gcc/ChangeLog:
* tree.h (CALL_FROM_NEW_OR_DELETE_P): Move from cp-tree.h.
* tree-core.h: Document new usage of protected_flag.

Implement irange::fits_p.

This should have been included in the irange_allocator patch, as
a method to see if the current object can hold a passed range
without truncation.

gcc/ChangeLog:

* value-range.h (irange::fits_p): New.

Daily bump.

compiler: set varargs correctly for type of method expression

Fixes golang/go#41737

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/258977

[RS6000] ICE in decompose, at rtl.h:2282

during RTL pass: fwprop1
gcc.dg/pr82596.c: In function 'test_cststring':
gcc.dg/pr82596.c:27:1: internal compiler error: in decompose, at rtl.h:2282

-m32 gcc/testsuite/gcc.dg/pr82596.c fails along with other tests after
applying rtx_cost patches, which exposed a backend bug.
legitimize_address when presented with the following address
(plus (reg) (const_int 0x7ffffffff))
attempts to rewrite it as a high/low sum. The low part is 0xffff, or
-1, making the high part 0x80000000. But this is no longer canonical
for SImode.

* config/rs6000/rs6000.c (rs6000_legitimize_address): Use
gen_int_mode for high part of address constant.

[RS6000] rs6000_linux64_override_options fix

Commit c6be439b37 wrongly left a block of code inside an "else" block,
which changed the default for power10 TARGET_NO_FP_IN_TOC
accidentally. We don't want FP constants in the TOC when
-mcmodel=medium can address them just as efficiently outside the TOC.

* config/rs6000/rs6000.c (rs6000_linux64_override_options):
Formatting. Correct setting of TARGET_NO_FP_IN_TOC and
TARGET_NO_SUM_IN_TOC.

[RS6000] function for linux64 SUBSUBTARGET_OVERRIDE_OPTIONS

* config/rs6000/freebsd64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Use
rs6000_linux64_override_options.
* config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Break
out to..
* config/rs6000/rs6000.c (rs6000_linux64_override_options): ..this,
new function. Tweak non-biarch test and clearing of
profile_kernel to work with freebsd64.h.

c++: Kill DECL_HIDDEN_P

There are only a couple of asserts remaining using this macro, and
nothing using TYPE_HIDDEN_P. Killed thusly.

gcc/cp/
* cp-tree.h (DECL_ANTICIPATED): Adjust comment.
(DECL_HIDDEN_P, TYPE_HIDDEN_P): Delete.
* tree.c (ovl_insert): Delete DECL_HIDDEN_P assert.
(ovl_skip_hidden): Likewise.

Fix build of ppc64 target.

Since a889e06ac68 the following fails.

In file included from ../../gcc/tree-ssa-propagate.h:25:0,
                 from ../../gcc/config/rs6000/rs6000.c:78:
../../gcc/value-query.h:90:31: error: ‘irange’ has not been declared
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;
                               ^~~~~~
../../gcc/value-query.h:91:31: error: ‘irange’ has not been declared
   virtual bool range_on_edge (irange &r, edge, tree name);
                               ^~~~~~
../../gcc/value-query.h:92:31: error: ‘irange’ has not been declared
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);
                               ^~~~~~
In file included from ../../gcc/tree-ssa-propagate.h:25:0,
                 from ../../gcc/config/rs6000/rs6000-call.c:67:
../../gcc/value-query.h:90:31: error: ‘irange’ has not been declared
   virtual bool range_of_expr (irange &r, tree name, gimple * = NULL) = 0;
                               ^~~~~~
../../gcc/value-query.h:91:31: error: ‘irange’ has not been declared
   virtual bool range_on_edge (irange &r, edge, tree name);
                               ^~~~~~
../../gcc/value-query.h:92:31: error: ‘irange’ has not been declared
   virtual bool range_of_stmt (irange &r, gimple *, tree name = NULL);

gcc/ChangeLog:

* config/rs6000/rs6000-call.c: Include value-range.h.
* config/rs6000/rs6000.c: Likewise.

[nvptx] Emit mov.u32 instead of cvt.u32.u32 for truncsiqi2

When running:
...
$ gcc.sh src/gcc/testsuite/gcc.target/nvptx/abi-complex-arg.c -S -dP
...
we have in abi-complex-arg.s:
...
//(insn 3 5 4 2
//  (set
//    (reg:QI 23)
//    (truncate:QI (reg:SI 22))) "abi-complex-arg.c":38:1 29 {truncsiqi2}
//  (nil))
                cvt.u32.u32     %r23, %r22;     // 3    [c=4]  truncsiqi2/0
...

The cvt.u32.u32 can be written shorter and clearer as mov.u32.

Fix this in define_insn "truncsi<QHIM>2".

Tested on nvptx.

gcc/ChangeLog:

2020-10-01  Tom de Vries  <tdevries@suse.de>

PR target/80845
* config/nvptx/nvptx.md (define_insn "truncsi<QHIM>2"): Emit mov.u32
instead of cvt.u32.u32.

arm: Add missing vec_cmp and vcond patterns

This patch does several things at once:

(1) Add vector compare patterns (vec_cmp and vec_cmpu).

(2) Add vector selects between floating-point modes when the
    values being compared are integers (affects vcond and vcondu).

(3) Add vector selects between integer modes when the values being
    compared are floating-point (affects vcond).

(4) Add standalone vector select patterns (vcond_mask).

(5) Tweak the handling of compound comparisons with zeros.

Unfortunately it proved too difficult (for me) to separate this
out into a series of smaller patches, since everything is so
inter-related.  Defining only some of the new patterns does
not leave things in a happy state.

The handling of comparisons is mostly taken from the vcond patterns.
This means that it remains non-compliant with IEEE: “quiet” comparisons
use signalling instructions.  But that shouldn't matter for floats,
since we require -funsafe-math-optimizations to vectorize for them
anyway.

It remains the case that comparisons and selects aren't implemented
at all for HF vectors.  Implementing those feels like separate work.

gcc/
PR target/96528
PR target/97288
* config/arm/arm-protos.h (arm_expand_vector_compare): Declare.
(arm_expand_vcond): Likewise.
* config/arm/arm.c (arm_expand_vector_compare): New function.
(arm_expand_vcond): Likewise.
* config/arm/neon.md (vec_cmp<VDQW:mode><v_cmp_result>): New pattern.
(vec_cmpu<VDQW:mode><VDQW:mode>): Likewise.
(vcond<VDQW:mode><VDQW:mode>): Require operand 5 to be a register
or zero.  Use arm_expand_vcond.
(vcond<V_cvtto><V32:mode>): New pattern.
(vcondu<VDQIW:mode><VDQIW:mode>): Generalize to...
(vcondu<VDQW:mode><v_cmp_result): ...this.  Require operand 5
to be a register or zero.  Use arm_expand_vcond.
(vcond_mask_<VDQW:mode><v_cmp_result>): New pattern.
(neon_vc<cmp_op><mode>, neon_vc<cmp_op><mode>_insn): Add "@" marker.
(neon_vbsl<mode>): Likewise.
(neon_vc<cmp_op>u<mode>): Reexpress as...
(@neon_vc<code><mode>): ...this.

gcc/testsuite/
* lib/target-supports.exp (check_effective_target_vect_cond_mixed): Add
arm neon targets.
* gcc.target/arm/neon-compare-1.c: New test.
* gcc.target/arm/neon-compare-2.c: Likewise.
* gcc.target/arm/neon-compare-3.c: Likewise.
* gcc.target/arm/neon-compare-4.c: Likewise.
* gcc.target/arm/neon-compare-5.c: Likewise.
* gcc.target/arm/neon-vcond-gt.c: Expect comparisons with zero.
* gcc.target/arm/neon-vcond-ltgt.c: Likewise.
* gcc.target/arm/neon-vcond-unordered.c: Likewise.

aarch64: Restrict asm-matching tests to lp64

gcc/testsuite/
* gcc.target/aarch64/movtf_1.c: Restrict the asm matching to lp64.
* gcc.target/aarch64/movti_1.c: Likewise.

arm: Fix testcase selection for Low Overhead Loop tests [PR96375]

gcc/testsuite/

PR target/96375
* gcc.target/arm/lob1.c: Fix missing flag.
* gcc.target/arm/lob2.c: Likewise.
* gcc.target/arm/lob3.c: Likewise.
* gcc.target/arm/lob4.c: Likewise.
* gcc.target/arm/lob5.c: Likewise.
* gcc.target/arm/lob6.c: Likewise.
* lib/target-supports.exp
(check_effective_target_arm_v8_1_lob_ok): Return 1 only for
cortex-m targets, add '-mthumb' flag.

config/i386/t-rtems: Change from mtune to march for multilibs

* config/i386/t-rtems: Change from mtune to march when building
multilibs. The mtune argument tunes or optimizes for a specific
CPU model but does not ensure the generated code is appropriate
for the CPU model. Prior to this patch, i386 compatible code
was always generated but tuned for later models.

Convert sprintf/strlen passes to value query class.

gcc/ChangeLog:

* builtins.c (compute_objsize): Replace vr_values with range_query.
(get_range): Same.
(gimple_call_alloc_size): Same.
* builtins.h (class vr_values): Remove.
(gimple_call_alloc_size): Replace vr_values with range_query.
* gimple-ssa-sprintf.c (get_int_range): Same.
(struct directive): Pass gimple context to fmtfunc callback.
(directive::set_width): Replace inline with out-of-line version.
(directive::set_precision): Same.
(format_none): New gimple argument.
(format_percent): New gimple argument.
(format_integer): New gimple argument.
(format_floating): New gimple argument.
(get_string_length): Use range_query API.
(format_character): New gimple argument.
(format_string): New gimple argument.
(format_plain): New gimple argument.
(format_directive): New gimple argument.
(parse_directive): Replace vr_values with range_query.
(compute_format_length): Same.
(handle_printf_call): Same. Adjust for range_query API.
* tree-ssa-strlen.c (get_range): Same.
(compare_nonzero_chars): Same.
(get_addr_stridx) Replace vr_values with range_query.
(get_stridx): Same.
(dump_strlen_info): Same.
(get_range_strlen_dynamic): Adjust for range_query API.
(set_strlen_range): Same
(maybe_warn_overflow): Replace vr_values with range_query.
(handle_builtin_strcpy): Same.
(maybe_diag_stxncpy_trunc): Add FIXME comment.
(handle_builtin_memcpy): Replace vr_values with range_query.
(handle_builtin_memset): Same.
(get_len_or_size): Same.
(strxcmp_eqz_result): Same.
(handle_builtin_string_cmp): Same.
(count_nonzero_bytes_addr): Same, plus adjust for range_query API.
(count_nonzero_bytes): Replace vr_values with range_query.
(handle_store): Same.
(strlen_check_and_optimize_call): Same.
(handle_integral_assign): Same.
(check_and_optimize_stmt): Same.
* tree-ssa-strlen.h (class vr_values): Remove.
(get_range): Replace vr_values with range_query.
(get_range_strlen_dynamic): Same.
(handle_printf_call): Same.

Convert vr-values to value query class.

gcc/ChangeLog:

* gimple-loop-versioning.cc (lv_dom_walker::before_dom_children):
Pass m_range_analyzer instead of get_vr_values.
(loop_versioning::name_prop::get_value): Rename to...
(loop_versioning::name_prop::value_of_expr): ...this.
* gimple-ssa-evrp-analyze.c (evrp_range_analyzer::evrp_range_analyzer):
Adjust for evrp_range_analyzer
inheriting from vr_values.
(evrp_range_analyzer::try_find_new_range): Same.
(evrp_range_analyzer::record_ranges_from_incoming_edge): Same.
(evrp_range_analyzer::record_ranges_from_phis): Same.
(evrp_range_analyzer::record_ranges_from_stmt): Same.
(evrp_range_analyzer::push_value_range): Same.
(evrp_range_analyzer::pop_value_range): Same.
* gimple-ssa-evrp-analyze.h (class evrp_range_analyzer): Inherit from
vr_values. Adjust accordingly.
* gimple-ssa-evrp.c: Adjust for evrp_range_analyzer inheriting from
vr_values.
(evrp_folder::value_of_evrp): Rename from get_value.
* tree-ssa-ccp.c (class ccp_folder): Rename get_value to
value_of_expr.
(ccp_folder::get_value): Rename to...
(ccp_folder::value_of_expr): ...this.
* tree-ssa-copy.c (class copy_folder): Rename get_value to
value_of_expr.
(copy_folder::get_value): Rename to...
(copy_folder::value_of_expr): ...this.
* tree-ssa-dom.c (dom_opt_dom_walker::after_dom_children): Adjust
for evrp_range_analyzer inheriting from vr_values.
(dom_opt_dom_walker::optimize_stmt): Same.
* tree-ssa-propagate.c (substitute_and_fold_engine::replace_uses_in):
Call value_of_* instead of get_value.
(substitute_and_fold_engine::replace_phi_args_in): Same.
(substitute_and_fold_engine::propagate_into_phi_args): Same.
(substitute_and_fold_dom_walker::before_dom_children): Same.
* tree-ssa-propagate.h: Include value-query.h.
(class substitute_and_fold_engine): Inherit from value_query.
* tree-ssa-strlen.c (strlen_dom_walker::before_dom_children):
Adjust for evrp_range_analyzer inheriting from vr_values.
* tree-ssa-threadedge.c (record_temporary_equivalences_from_phis):
Same.
* tree-vrp.c (class vrp_folder): Same.
(vrp_folder::get_value): Rename to value_of_expr.
* vr-values.c (vr_values::get_lattice_entry): Adjust for
vr_values inheriting from range_query.
(vr_values::range_of_expr): New.
(vr_values::value_of_expr): New.
(vr_values::value_on_edge): New.
(vr_values::value_of_stmt): New.
(simplify_using_ranges::op_with_boolean_value_range_p): Call
get_value_range through query.
(check_for_binary_op_overflow): Rename store to query.
(vr_values::vr_values): Remove vrp_value_range_pool.
(vr_values::~vr_values): Same.
(simplify_using_ranges::get_vr_for_comparison): Call get_value_range
through query.
(simplify_using_ranges::compare_names): Same.
(simplify_using_ranges::vrp_evaluate_conditional): Same.
(simplify_using_ranges::vrp_visit_cond_stmt): Same.
(simplify_using_ranges::simplify_abs_using_ranges): Same.
(simplify_using_ranges::simplify_cond_using_ranges_1): Same.
(simplify_cond_using_ranges_2): Same.
(simplify_using_ranges::simplify_switch_using_ranges): Same.
(simplify_using_ranges::two_valued_val_range_p): Same.
(simplify_using_ranges::simplify_using_ranges): Rename store to query.
(simplify_using_ranges::simplify): Assert that we have a query.
* vr-values.h (class range_query): Remove.
(class simplify_using_ranges): Remove inheritance of range_query.
(class vr_values): Add virtuals for range_of_expr, value_of_expr,
value_on_edge, value_of_stmt, and get_value_range.
Call range_query allocator instead of using vrp_value_range_pool.
Remove vrp_value_range_pool.
(simplify_using_ranges::get_value_range): Remove.

tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS

This avoids using VMAT_CONTIGUOUS with single-element interleaving
when using V1mode vectors. Instead keep VMAT_ELEMENTWISE but
continue to avoid load-lanes and gathers.

2020-10-01 Richard Biener <rguenther@suse.de>

PR tree-optimization/97236
* tree-vect-stmts.c (get_group_load_store_type): Keep
VMAT_ELEMENTWISE for single-element vectors.

* gcc.dg/vect/pr97236.c: New testcase.

c++: pushdecl_top_level must set context

I discovered pushdecl_top_level was not setting the decl's context,
and we ended up with namespace-scope decls with NULL context.  That
broke modules.  Then I discovered a couple of places where we set the
context to a FUNCTION_DECL, which is also wrong.  AFAICT the literals
in question belong in global scope, as they're comdatable entities.
But create_temporary would use current_scope for the context before we
pushed it into namespace scope.

This patch asserts the context is NULL and then sets it to the frobbed
global_namespace.

gcc/cp/
* name-lookup.c (pushdecl_top_level): Assert incoming context is
null, add global_namespace context.
(pushdecl_top_level_and_finish): Likewise.
* pt.c (get_template_parm_object): Clear decl context before
pushing.
* semantics.c (finish_compound_literal): Likewise.

Add gcc.c-torture/compile/pr97243.c testcase.

PR ipa/97243
* gcc.c-torture/compile/pr97243.c: New test.

Fix ICE in compute_parm_map

gcc/ChangeLog:

* ipa-modref.c (compute_parm_map): Be ready for callee_pi to be NULL.

Add -fno-ipa-modref to gcc.dg/ipa/remref-2a.c

PR ipa/97244
* gcc.dg/ipa/remref-2a.c: Add -fno-ipa-modref

Fix ICE in ipa_edge_args_sum_t::duplicate

PR ipa/97244
* ipa-fnsummary.c (pass_free_fnsummary::execute): Free
also indirect inlining datastructure.
* ipa-modref.c (pass_ipa_modref::execute): Do not free them here.
* ipa-prop.c (ipa_free_all_node_params): Do not crash when info does
not exist.
(ipa_unregister_cgraph_hooks): Likewise.

Fix handling of fnspec for internal functions.

* internal-fn.c (DEF_INTERNAL_FN): Fix handling of fnspec

Initial implementation of value query class.

gcc/ChangeLog:

* Makefile.in: Add value-query.o.
* value-query.cc: New file.
* value-query.h: New file.

c++: Refactor lookup_and_check_tag

It turns out I'd already found lookup_and_check_tag's control flow
confusing, and had refactored it on the modules branch. For instance,
it continually checks 'if (decl &&$ condition)' before finally getting
to 'else if (!decl)'. why not just check !decl first and be done?
Well, it is done thusly.

gcc/cp/
* decl.c (lookup_and_check_tag): Refactor.

libstdc++: Fix test_and_acquire for EABI

libstdc++-v3/ChangeLog:

* config/cpu/arm/cxxabi_tweaks.h (_GLIBCXX_GUARD_TEST_AND_ACQUIRE):
Do not try to dereference return value of __atomic_load_n.

arm: Fix ordering in arm-cpus.in

This moves the recent entry for Neoverse N2 down and adds a comment in
order to preserve the existing order/structure in arm-cpus.in.

gcc/ChangeLog:

* config/arm/arm-cpus.in: Fix ordering, move Neoverse N2 down.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.

[testsuite] Enable pr94600-{1,3}.c tests for nvptx

When compiling test-case pr94600-1.c for nvptx, this gimple mem move:
...
  MEM[(volatile struct t0 *)655404B] ={v} a0[0];
...
is expanded into a memcpy, but when compiling pr94600-2.c instead, this similar
gimple mem move:
...
  MEM[(volatile struct t0 *)655404B] ={v} a00;
...
is expanded into a 32-bit load/store pair.

In both cases, emit_block_move is called.

In the latter case, can_move_by_pieces (4 /* byte-size */, 32 /* bit-align */)
is called, which returns true (because by_pieces_ninsns returns 1, which is
smaller than the MOVE_RATIO of 4).

In the former case, can_move_by_pieces (4 /* byte-size */, 8 /* bit-align */)
is called, which returns false (because by_pieces_ninsns returns 4, which is
not smaller than the MOVE_RATIO of 4).

So the difference in code generation is explained by the alignment.  The
difference in alignment comes from the move sources: a0[0] vs. a00.  Both
have the same type with 8-bit alignment, but a00 is on stack, which based on
the base stack align and stack variable placement happens to result in a
32-bit alignment.

Enable test-cases pr94600-{1,3}.c for nvptx by forcing the currently 8-byte
aligned variables to have a 32-bit alignment for STRICT_ALIGNMENT targets.

Tested on nvptx.

gcc/testsuite/ChangeLog:

2020-10-01  Tom de Vries  <tdevries@suse.de>

* gcc.dg/pr94600-1.c: Force 32-bit alignment for a0 for !non_strict_align
targets.  Remove target clauses from scan tests.
* gcc.dg/pr94600-3.c: Same.

c++: Fix up default initialization with consteval default ctor [PR96994]

> > The following testcase is miscompiled (in particular the a and i
> > initialization).  The problem is that build_special_member_call due to
> > the immediate constructors (but not evaluated in constant expression mode)
> > doesn't create a CALL_EXPR, but returns a TARGET_EXPR with CONSTRUCTOR
> > as the initializer for it,
>
> That seems like the bug; at the end of build_over_call, after you
>
> >        call = cxx_constant_value (call, obj_arg);
>
> You need to build an INIT_EXPR if obj_arg isn't a dummy.

That works.  obj_arg is NULL if it is a dummy from the earlier code.

2020-10-01  Jakub Jelinek  <jakub@redhat.com>

PR c++/96994
* call.c (build_over_call): If obj_arg is non-NULL, return INIT_EXPR
setting obj_arg to call.

* g++.dg/cpp2a/consteval18.C: New test.

c++: Handle std::construct_at on automatic vars during constant evaluation [PR97195]

As mentioned in the PR, we only support due to a bug in constant expressions
std::construct_at on non-automatic variables, because we VERIFY_CONSTANT the
second argument of placement new, which fails verification if it is an
address of an automatic variable.
The following patch fixes it by not performing that verification, the
placement new evaluation later on will verify it after it is dereferenced.

2020-10-01 Jakub Jelinek <jakub@redhat.com>

PR c++/97195
* constexpr.c (cxx_eval_call_expression): Don't VERIFY_CONSTANT the
second argument.

* g++.dg/cpp2a/constexpr-new14.C: New test.

s390: Fix up s390_atomic_assign_expand_fenv

The following patch fixes
-FAIL: gcc.dg/pr94780.c (internal compiler error)
-FAIL: gcc.dg/pr94780.c (test for excess errors)
-FAIL: gcc.dg/pr94842.c (internal compiler error)
-FAIL: gcc.dg/pr94842.c (test for excess errors)
on s390x-linux.  The fix is essentially the same as has been applied to many
other targets (i386, aarch64, arm, rs6000, alpha, riscv).

2020-10-01  Jakub Jelinek  <jakub@redhat.com>

* config/s390/s390.c (s390_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for the first assignments to
fenv_var and old_fpc.  Formatting fixes.

tree-optimization/97255 - missing vector bool pattern of SRAed bool

SRA tends to use VIEW_CONVERT_EXPR when replacing bool fields with
unsigned char fields.  Those are not handled in vector bool pattern
detection causing vector true values to leak.  The following fixes
this by turning those into b ? 1 : 0 as well.

2020-10-01  Richard Biener  <rguenther@suse.de>

* tree-vect-patterns.c (vect_recog_bool_pattern): Also handle
VIEW_CONVERT_EXPR.

* g++.dg/vect/pr97255.cc: New testcase.

PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

These micro-architecture levels are defined in the x86-64 psABI:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9

PTA_NO_TUNE is introduced so that the new processor alias table entries
do not affect the CPU tuning setting in ix86_tune.

The tests depend on the macros added in commit 92e652d8c21bd7e66cbb0f900
("i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags").

gcc/:
PR target/97250
* config/i386/i386.h (PTA_NO_TUNE, PTA_X86_64_BASELINE)
(PTA_X86_64_V2, PTA_X86_64_V3, PTA_X86_64_V4): New.
* common/config/i386/i386-common.c (processor_alias_table):
Add "x86-64-v2", "x86-64-v3", "x86-64-v4".
* config/i386/i386-options.c (ix86_option_override_internal):
Handle new PTA_NO_TUNE processor table entries.
* doc/invoke.texi (x86 Options): Document new -march values.

gcc/testsuite/:
PR target/97250
* gcc.target/i386/x86-64-v2.c: New test.
* gcc.target/i386/x86-64-v3.c: New test.
* gcc.target/i386/x86-64-v3-haswell.c: New test.
* gcc.target/i386/x86-64-v3-skylake.c: New test.
* gcc.target/i386/x86-64-v4.c: New test.

libgo: add 32-bit RISC-V (RV32) support

Add support for the 32-bit RISC-V (RV32) ISA matching the 64-bit RISC-V
(RV64) port except for async preemption added as a stub only.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/251179

[RS6000] Adjust gcc asm for power10

Generate assembly with .localentry,1 functions using @notoc calls.
This patch makes libgcc.a asm look the same as power10 pcrel as far as
toc/notoc is concerned.

Otherwise calling between functions that advertise as using the TOC
and those that don't, will require linker call stubs in statically
linked code.

gcc/
* config/rs6000/ppc-asm.h: Support __PCREL__ code.
libgcc/
* config/rs6000/morestack.S,
* config/rs6000/tramp.S: Support __PCREL__ code.
libitm/
* config/powerpc/sjlj.S: Support __PCREL__ code.

[RS6000] -mno-minimal-toc vs. power10 pcrelative

We've had this hack in the libgcc config to build libgcc with
-mcmodel=small for powerpc64 for a long time.  It wouldn't be a bad
thing if someone who knows the multilib machinery well could arrange
for -mcmodel=small to be passed just for ppc64 when building for
earlier than power10.  But for now, make -mno-minimal-toc do nothing
when pcrel.  Which will do the right thing for any project that has
copied libgcc's trick.

We want this if configuring using --with-cpu=power10 to build a
power10 pcrel libgcc.  --mcmodel=small turns off pcrel.

gcc/
* config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Don't
set -mcmodel=small for -mno-minimal-toc when pcrel.
libgcc/
* config/rs6000/t-linux: Document purpose of -mno-minimal-toc.

c++: CTAD and explicit deduction guides for copy-list-init [PR90210]

This PR points out that we accept

  template<typename T> struct tuple { tuple(T); }; // #1
  template<typename T> explicit tuple(T t) -> tuple<T>; // #2
  tuple t = { 1 };

despite the 'explicit' deduction guide in a copy-list-initialization
context.  That's because in deduction_guides_for we first find the
user-defined deduction guide (#2), and then ctor_deduction_guides_for
creates artificial deduction guides: one from the tuple(T) constructor and
a copy guide.  So we end up with these three guides:

  (1) template<class T> tuple(T) -> tuple<T> [DECL_NONCONVERTING_P]
  (2) template<class T> tuple(tuple<T>) -> tuple<T>
  (3) template<class T> tuple(T) -> tuple<T>

Then, in do_class_deduction, we prune this set, and get rid of (1).
Then overload resolution selects (3) and we succeed.

But [over.match.list]p1 says "In copy-list-initialization, if an explicit
constructor is chosen, the initialization is ill-formed."  It also goes
on to say that this differs from other situations where only converting
constructors are considered for copy-initialization.  Therefore for
list-initialization we consider explicit constructors and complain if one
is chosen.  E.g. convert_like_internal/ck_user can give an error.

So my logic runs that we should not prune the deduction_guides_for guides
in a copy-list-initialization context, and only complain if we actually
choose an explicit deduction guide.  This matches clang++/EDG/msvc++.

gcc/cp/ChangeLog:

PR c++/90210
* pt.c (do_class_deduction): Don't prune explicit deduction guides
in copy-list-initialization.  In copy-list-initialization, if an
explicit deduction guide was selected, give an error.

gcc/testsuite/ChangeLog:

PR c++/90210
* g++.dg/cpp1z/class-deduction73.C: New test.

Daily bump.

libstdc++: Fix test_and_acquire / set_and_release for EABI guard variables

The default definitions of _GLIBCXX_GUARD_TEST_AND_ACQUIRE and
_GLIBCXX_GUARD_SET_AND_RELEASE in libsupc++/guard.cc only work for the
generic (IA64) ABI, because they test/set the first byte of the guard
variable. For EABI we need to use the least significant bit, which means
using the first byte is wrong for big endian targets.

This has been wrong since r224411, but previously it only caused poor
performance. The _GLIBCXX_GUARD_TEST_AND_ACQUIRE at the very start of
__cxa_guard_acquire would always return false even if the initialization
was actually complete. Before my r11-3484 change the atomic compare
exchange would have loaded the correct value, and then returned 0 as
expected when the initialization is complete. After my change, in the
single-threaded case there is no redundant check for init being
complete, because I foolishly assumed that the check at the start of the
function actually worked.

The default definition of _GLIBCXX_GUARD_SET_AND_RELEASE is also wrong
for big endian EABI, but appears to work because it sets the wrong bit
but then the buggy TEST_AND_ACQUIRE tests that wrong bit as well. Also,
the buggy SET_AND_RELEASE macro is only used for targets with threads
enabled but no futex syscalls.

This should fix the regressions introduced by my patch, by defining
custom versions of the TEST_AND_ACQUIRE and SET_AND_RELEASE macros that
are correct for EABI.

libstdc++-v3/ChangeLog:

* config/cpu/arm/cxxabi_tweaks.h (_GLIBCXX_GUARD_TEST_AND_ACQUIRE):
(_GLIBCXX_GUARD_SET_AND_RELEASE): Define for EABI.

Avoid assuming a VLA access specification string contains a closing bracket (PR middle-end/97189).

Resolves:
PR middle-end/97189 - ICE on redeclaration of a function with VLA argument and attribute access

gcc/ChangeLog:

PR middle-end/97189
* attribs.c (attr_access::array_as_string): Avoid assuming a VLA
access specification string contains a closing bracket.

gcc/c-family/ChangeLog:

PR middle-end/97189
* c-attribs.c (append_access_attr): Use the function declaration
location for a warning about an attribute access argument.

gcc/testsuite/ChangeLog:

PR middle-end/97189
* gcc.dg/attr-access-2.c: Adjust caret location.
* gcc.dg/Wvla-parameter-6.c: New test.
* gcc.dg/Wvla-parameter-7.c: New test.

PR c/97206 - ICE in composite_type on declarations of a similar array types

gcc/ChangeLog:

PR c/97206
* attribs.c (attr_access::array_as_string): Avoid modifying a shared
type in place and use build_type_attribute_qual_variant instead.

gcc/testsuite/ChangeLog:

PR c/97206
* gcc.dg/Warray-parameter-7.c: New test.
* gcc.dg/Warray-parameter-8.c: New test.
* gcc.dg/Wvla-parameter-5.c: New test.

libstdc++: Use __is_same instead of __is_same_as

PR 92271 added __is_same as another spelling of __is_same_as. Since
Clang also spells it __is_same, let's just use that consistently.

It appears that Intel icc sets __GNUC__ to 10, but only supports
__is_same_as. If we only use __is_same for __GNUC__ >= 11 then we won't
break icc again (it looks like we broke previous versions of icc when we
started using __is_same_as).

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_HAVE_BUILTIN_IS_SAME):
Define for GCC 11 or when !__is_identifier(__is_same).
(_GLIBCXX_BUILTIN_IS_SAME_AS): Remove.
* include/std/type_traits (is_same, is_same_v): Replace uses
of _GLIBCXX_BUILTIN_IS_SAME_AS.

libgomp: Enforce 1-thread limit in subteams

Accelerators with fixed thread-counts will break if nested teams are expected
to have multiple threads each.

libgomp/ChangeLog:

2020-09-29 Andrew Stubbs <ams@codesourcery.com>

* parallel.c (gomp_resolve_num_threads): Ignore nest_var on nvptx
and amdgcn targets.

Fix some fnspec strings in trans-decl.c

* trans-decl.c (gfc_build_intrinsic_function_decls): Add traling dots
to spec strings so they match the number of parameters; do not use
R and W for non-pointer parameters. Drop pointless specifier on
caf_stop_numeric and caf_get_team.

Add trailing dots so length of spec string matches number of arguments.

2020-09-30 Jan Hubicka <hubicka@ucw.cz>

* trans-io.c (gfc_build_io_library_fndecls): Add trailing dots so
length of spec string matches number of arguments.

Add a testcase for PR target/96827

Add a testcase for PR target/96827 which was fixed by r11-3559:

commit 97b798d80baf945ea28236eef3fa69f36626b579
Author: Joel Hutton <joel.hutton@arm.com>
Date: Wed Sep 30 15:08:13 2020 +0100

[SLP][VECT] Add check to fix 96837

PR target/96827
* gcc.target/i386/pr96827.c: New test.

arm: [testsuite] Skip thumb2-cond-cmp tests on Cortex-M [PR94595]

Since r204778 (g571880a0a4c512195aa7d41929ba6795190887b2), we favor
branches over IT blocks on Cortex-M. As a result, instead of
generating two nested IT blocks in thumb2-cond-cmp-[1234].c, we
generate either a single IT block, or use branches depending on
conditions tested by the program.

Since this was a deliberate change and the tests still pass as
expected on Cortex-A, this patch skips them when targetting
Cortex-M. The avoids the failures on Cortex M3, M4, and M33. This
patch makes the testcases unsupported on Cortex-M7 although they pass
in this case because this CPU has different branch costs.

I tried to relax the scan-assembler directives using eg. cmpne|subne
or cmpgt|ble but that seemed fragile.

2020-09-07 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
PR target/94595
* gcc.target/arm/thumb2-cond-cmp-1.c: Skip if arm_cortex_m.
* gcc.target/arm/thumb2-cond-cmp-2.c: Skip if arm_cortex_m.
* gcc.target/arm/thumb2-cond-cmp-3.c: Skip if arm_cortex_m.
* gcc.target/arm/thumb2-cond-cmp-4.c: Skip if arm_cortex_m.

amend SLP reduction testcases

This amends SLP reduction testcases that currently trigger
vect_attempt_slp_rearrange_stmts eliding load permutations to
verify this is actually happening.

2020-09-30 Richard Biener <rguenther@suse.de>

* gcc.dg/vect/pr37027.c: Amend.
* gcc.dg/vect/pr67790.c: Likewise.
* gcc.dg/vect/pr92324-4.c: Likewise.
* gcc.dg/vect/pr92558.c: Likewise.
* gcc.dg/vect/pr95495.c: Likewise.
* gcc.dg/vect/slp-reduc-1.c: Likewise.
* gcc.dg/vect/slp-reduc-2.c: Likewise.
* gcc.dg/vect/slp-reduc-3.c: Likewise.
* gcc.dg/vect/slp-reduc-4.c: Likewise.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.dg/vect/slp-reduc-7.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.

arm: add support for Cortex-A78 and Cortex-A78AE

This patch introduces support for Cortex-A78 [0] and Cortex-A78AE [1]
cpus.

[0]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
[1]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78ae

OK for master branch ?

kind regards
Przemyslaw Wirkus

gcc/ChangeLog:

* config/arm/arm-cpus.in: Add Cortex-A78 and Cortex-A78AE cores.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Update docs.

aarch64: add support for Cortex-A78 and Cortex-A78AE

This patch introduces support for Cortex-A78 [0] and Cortex-A78AE [1]
cpus.

[0]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
[1]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78ae

OK for master branch ?

kind regards
Przemyslaw Wirkus

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Cortex-A78 and Cortex-A78AE cores.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Add -mtune=cortex-a78 and -mtune=cortex-a78ae.

[GCC][PATCH] arm: Fix MVE intrinsics polymorphic variants wrongly generating __ARM_undef type (pr96795).

Hello,

This patch fixes (PR96795) MVE intrinsic polymorphic variants vaddq, vaddq_m, vaddq_x, vcmpeqq_m,
vcmpeqq, vcmpgeq_m, vcmpgeq, vcmpgtq_m, vcmpgtq, vcmpleq_m, vcmpleq, vcmpltq_m, vcmpltq,
vcmpneq_m, vcmpneq, vfmaq_m, vfmaq, vfmasq_m, vfmasq, vmaxnmavq, vmaxnmavq_p, vmaxnmvq,
vmaxnmvq_p, vminnmavq, vminnmavq_p, vminnmvq, vminnmvq_p, vmulq_m, vmulq, vmulq_x, vsetq_lane,
vsubq_m, vsubq and vsubq_x which are incorrectly generating __ARM_undef and mismatching the passed
floating point scalar arguments.

Bootstrapped on arm-none-linux-gnueabihf and regression tested on arm-none-eabi and found no regressions.

Ok for master? Ok for GCC-10 branch?

Regards,
Srinath.

gcc/ChangeLog:

2020-09-30 Srinath Parvathaneni <srinath.parvathaneni@arm.com>

PR target/96795
* config/arm/arm_mve.h (__ARM_mve_coerce2): Define.
(__arm_vaddq): Correct the scalar argument.
(__arm_vaddq_m): Likewise.
(__arm_vaddq_x): Likewise.
(__arm_vcmpeqq_m): Likewise.
(__arm_vcmpeqq): Likewise.
(__arm_vcmpgeq_m): Likewise.
(__arm_vcmpgeq): Likewise.
(__arm_vcmpgtq_m): Likewise.
(__arm_vcmpgtq): Likewise.
(__arm_vcmpleq_m): Likewise.
(__arm_vcmpleq): Likewise.
(__arm_vcmpltq_m): Likewise.
(__arm_vcmpltq): Likewise.
(__arm_vcmpneq_m): Likewise.
(__arm_vcmpneq): Likewise.
(__arm_vfmaq_m): Likewise.
(__arm_vfmaq): Likewise.
(__arm_vfmasq_m): Likewise.
(__arm_vfmasq): Likewise.
(__arm_vmaxnmavq): Likewise.
(__arm_vmaxnmavq_p): Likewise.
(__arm_vmaxnmvq): Likewise.
(__arm_vmaxnmvq_p): Likewise.
(__arm_vminnmavq): Likewise.
(__arm_vminnmavq_p): Likewise.
(__arm_vminnmvq): Likewise.
(__arm_vminnmvq_p): Likewise.
(__arm_vmulq_m): Likewise.
(__arm_vmulq): Likewise.
(__arm_vmulq_x): Likewise.
(__arm_vsetq_lane): Likewise.
(__arm_vsubq_m): Likewise.
(__arm_vsubq): Likewise.
(__arm_vsubq_x): Likewise.

gcc/testsuite/ChangeLog:

PR target/96795
* gcc.target/arm/mve/intrinsics/mve_fp_vaddq_n.c: New Test.
* gcc.target/arm/mve/intrinsics/mve_vaddq_n.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_x_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_x_n_f32-1.c: Likewise.

[SLP][VECT] Add check to fix 96837

The following patch adds a simple check to prevent slp stmts from
vector constructors being rearranged. vect_attempt_slp_rearrange_stmts
tries to rearrange to avoid a load permutation.

This fixes PR target/96837
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96827

gcc/ChangeLog:

2020-09-29 Joel Hutton <joel.hutton@arm.com>

PR target/96837
* tree-vect-slp.c (vect_analyze_slp): Do not call
vect_attempt_slp_rearrange_stmts for vector constructors.

gcc/testsuite/ChangeLog:

2020-09-29 Joel Hutton <joel.hutton@arm.com>

PR target/96837
* gcc.dg/vect/bb-slp-49.c: New test.

middle-end: Refactor refcnt to use SLP_TREE_REF_COUNT for consistency

This is a small refactoring which introduces SLP_TREE_REF_COUNT and replaces
the uses of refcnt with it. This for consistency between the other properties.

A similar patch was pre-approved last year but since there are more use now I am
sending it for review anyway.

gcc/ChangeLog:

* tree-vectorizer.h (SLP_TREE_REF_COUNT): New.
* tree-vect-slp.c (_slp_tree::_slp_tree, _slp_tree::~_slp_tree,
vect_free_slp_tree, vect_build_slp_tree, vect_print_slp_tree,
slp_copy_subtree, vect_attempt_slp_rearrange_stmts): Use it.

c++: Kill DECL_HIDDEN_FRIEND_P

Now hiddenness is managed by name-lookup, we no longer need DECL_HIDDEN_FRIEND_P.
This removes it.  Mainly by deleting its bookkeeping, but there are a couple of uses

1) two name lookups look at it to see if they found a hidden thing.
In one we have the OVERLOAD, so can record OVL_HIDDEN_P.  In the other
we're repeating a lookup that failed, but asking for hidden things --
so if that succeeds we know the thing was hidden.  (FWIW CWG recently
discussed whether template specializations and instantiations should
see such hidden templates anyway, there is compiler divergence.)

2) We had a confusing setting of KOENIG_P when building a
non-dependent call.  We don't repeat that lookup at instantiation time
anyway.

gcc/cp/
* cp-tree.h (struct lang_decl_fn): Remove hidden_friend_p.
(DECL_HIDDEN_FRIEND_P): Delete.
* call.c (add_function_candidate): Drop assert about anticipated
decl.
(build_new_op_1): Drop koenig lookup flagging for hidden friend.
* decl.c (duplicate_decls): Drop HIDDEN_FRIEND_P updating.
* name-lookup.c (do_pushdecl): Likewise.
(set_decl_namespace): Discover hiddenness from OVL_HIDDEN_P.
* pt.c (check_explicit_specialization): Record found_hidden
explicitly.

Fortran: add contiguous check for ptr assignment, fix non-contig check (PR97242)

gcc/fortran/ChangeLog:

PR fortran/97242
* expr.c (gfc_is_not_contiguous): Fix check.
(gfc_check_pointer_assign): Use it.

gcc/testsuite/ChangeLog:

PR fortran/97242
* gfortran.dg/contiguous_11.f90: New test.
* gfortran.dg/contiguous_4.f90: Update.
* gfortran.dg/contiguous_7.f90: Update.

OpenMP: Add implicit declare target for nested procedures

gcc/ChangeLog:

* omp-offload.c (omp_discover_implicit_declare_target): Also
handled nested functions.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/declare-target-3.f90: New test.

This patch fixes PR97045 - unlimited polymorphic array element selectors.

2020-30-09 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/97045
* trans-array.c (gfc_conv_array_ref): Make sure that the class
decl is passed to build_array_ref in the case of unlimited
polymorphic entities.
* trans-expr.c (gfc_conv_derived_to_class): Ensure that array
refs do not preceed the _len component. Free the _len expr.
* trans-stmt.c (trans_associate_var): Reset 'need_len_assign'
for polymorphic scalars.
* trans.c (gfc_build_array_ref): When the vptr size is used for
span, multiply by the _len field of unlimited polymorphic
entities, when non-zero.

gcc/testsuite/
PR fortran/97045
* gfortran.dg/select_type_50.f90 : New test.

[nvptx] Add type arg to TARGET_LIBC_HAS_FUNCTION

GCC has a target hook TARGET_LIBC_HAS_FUNCTION, which tells the compiler
which functions it can expect to be present in libc.

The default target hook does not include the sincos functions.

The nvptx port of newlib does include sincos and sincosf, but not sincosl.

The target hook TARGET_LIBC_HAS_FUNCTION does not distinguish between sincos,
sincosf and sincosl, so if we enable it for the sincos functions, then for
test.c:
...
long double x, a, b;
int main (void) {
  x = 0.5;
  a = sinl (x);
  b = cosl (x);
  printf ("a: %f\n", (double)a);
  printf ("b: %f\n", (double)b);
  return 0;
}
...
we introduce a regression:
...
$ gcc test.c -lm -O2
unresolved symbol sincosl
collect2: error: ld returned 1 exit status
...

Add a type argument to target hook TARGET_LIBC_HAS_FUNCTION_TYPE, and use it
in nvptx_libc_has_function_type to enable sincos and sincosf, but not sincosl.

Build and reg-tested on x86_64-linux.

Build and tested on nvptx.

gcc/ChangeLog:

2020-09-28  Tobias Burnus  <tobias@codesourcery.com>
    Tom de Vries  <tdevries@suse.de>

* builtins.c (expand_builtin_cexpi, fold_builtin_sincos): Update
targetm.libc_has_function call.
* builtins.def (DEF_C94_BUILTIN, DEF_C99_BUILTIN, DEF_C11_BUILTIN):
(DEF_C2X_BUILTIN, DEF_C99_COMPL_BUILTIN, DEF_C99_C90RES_BUILTIN):
Same.
* config/darwin-protos.h (darwin_libc_has_function): Update prototype.
* config/darwin.c (darwin_libc_has_function): Add arg.
* config/linux-protos.h (linux_libc_has_function): Update prototype.
* config/linux.c (linux_libc_has_function): Add arg.
* config/i386/i386.c (ix86_libc_has_function): Update
targetm.libc_has_function call.
* config/nvptx/nvptx.c (nvptx_libc_has_function): New function.
(TARGET_LIBC_HAS_FUNCTION): Redefine to nvptx_libc_has_function.
* convert.c (convert_to_integer_1): Update targetm.libc_has_function
call.
* match.pd: Same.
* target.def (libc_has_function): Add arg.
* doc/tm.texi: Regenerate.
* targhooks.c (default_libc_has_function, gnu_libc_has_function)
(no_c99_libc_has_function): Add arg.
* targhooks.h (default_libc_has_function, no_c99_libc_has_function)
(gnu_libc_has_function): Update prototype.
* tree-ssa-math-opts.c (pass_cse_sincos::execute): Update
targetm.libc_has_function call.

gcc/fortran/ChangeLog:

2020-09-30  Tom de Vries  <tdevries@suse.de>

* f95-lang.c (gfc_init_builtin_functions):  Update
targetm.libc_has_function call.

x86: Use SET operation in MOVDIRI and MOVDIR64B

Since MOVDIRI and MOVDIR64B write to memory, similar to UNSPEC_MOVNT,
use SET operation in MOVDIRI and MOVDIR64B patterns with UNSPEC instead
of UNSPECV.

gcc/

PR target/97184
* config/i386/i386.md (UNSPECV_MOVDIRI): Renamed to ...
(UNSPEC_MOVDIRI): This.
(UNSPECV_MOVDIR64B): Renamed to ...
(UNSPEC_MOVDIR64B): This.
(movdiri<mode>): Use SET operation.
(@movdir64b_<mode>): Likewise.

gcc/testsuite/

PR target/97184
* gcc.target/i386/movdir64b.c: New test.
* gcc.target/i386/movdiri32.c: Likewise.
* gcc.target/i386/movdiri64.c: Likewise.
* lib/target-supports.exp (check_effective_target_movdir): New.

[testsuite] Re-enable pr94600-{1,3}.c tests for arm

Before commit 7e437162001 "[testsuite] Require non_strict_align in
pr94600-{1,3}.c", some tests were failing for nvptx, because volatile stores
were expected, but memcpy's were found instead.

This was traced back to this bit in compute_record_mode:
...
  /* If structure's known alignment is less than what the scalar
     mode would need, and it matters, then stick with BLKmode.  */
  if (mode != BLKmode
      && STRICT_ALIGNMENT
      && ! (TYPE_ALIGN (type) >= BIGGEST_ALIGNMENT
            || TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (mode)))
    {
      /* If this is the only reason this type is BLKmode, then
         don't force containing types to be BLKmode.  */
      TYPE_NO_FORCE_BLK (type) = 1;
      mode = BLKmode;
    }
...
which got triggered for nvptx, but not for x86_64.

The commit disabled the tests for non_strict_align effective target, but
that had the effect for the arm target that those tests were disabled, even
though they were passing before.

Further investigation in compute_record_mode shows that the if-condition
evaluates to false for arm because, because TYPE_ALIGN (type) == 32, while
it's 8 for nvptx.  This again can be explained by the
PCC_BITFIELD_TYPE_MATTERS setting, which is 1 for arm, but 0 for nvptx.

Re-enable the test for arm by using effective target
(non_strict_align || pcc_bitfield_type_matters).

Tested on arm-eabi and nvptx.

gcc/testsuite/ChangeLog:

2020-09-30  Tom de Vries  <tdevries@suse.de>

* gcc.dg/pr94600-1.c: Use effective target
(non_strict_align || pcc_bitfield_type_matters).
* gcc.dg/pr94600-3.c: Same.