Convert sprintf/strlen passes to value query class.

gcc/ChangeLog:

* builtins.c (compute_objsize): Replace vr_values with range_query.
(get_range): Same.
(gimple_call_alloc_size): Same.
* builtins.h (class vr_values): Remove.
(gimple_call_alloc_size): Replace vr_values with range_query.
* gimple-ssa-sprintf.c (get_int_range): Same.
(struct directive): Pass gimple context to fmtfunc callback.
(directive::set_width): Replace inline with out-of-line version.
(directive::set_precision): Same.
(format_none): New gimple argument.
(format_percent): New gimple argument.
(format_integer): New gimple argument.
(format_floating): New gimple argument.
(get_string_length): Use range_query API.
(format_character): New gimple argument.
(format_string): New gimple argument.
(format_plain): New gimple argument.
(format_directive): New gimple argument.
(parse_directive): Replace vr_values with range_query.
(compute_format_length): Same.
(handle_printf_call): Same. Adjust for range_query API.
* tree-ssa-strlen.c (get_range): Same.
(compare_nonzero_chars): Same.
(get_addr_stridx): Replace vr_values with range_query.
(get_stridx): Same.
(dump_strlen_info): Same.
(get_range_strlen_dynamic): Adjust for range_query API.
(set_strlen_range): Same.
(maybe_warn_overflow): Replace vr_values with range_query.
(handle_builtin_strcpy): Same.
(maybe_diag_stxncpy_trunc): Add FIXME comment.
(handle_builtin_memcpy): Replace vr_values with range_query.
(handle_builtin_memset): Same.
(get_len_or_size): Same.
(strxcmp_eqz_result): Same.
(handle_builtin_string_cmp): Same.
(count_nonzero_bytes_addr): Same, plus adjust for range_query API.
(count_nonzero_bytes): Replace vr_values with range_query.
(handle_store): Same.
(strlen_check_and_optimize_call): Same.
(handle_integral_assign): Same.
(check_and_optimize_stmt): Same.
* tree-ssa-strlen.h (class vr_values): Remove.
(get_range): Replace vr_values with range_query.
(get_range_strlen_dynamic): Same.
(handle_printf_call): Same.

Convert vr-values to value query class.

gcc/ChangeLog:

* gimple-loop-versioning.cc (lv_dom_walker::before_dom_children):
Pass m_range_analyzer instead of get_vr_values.
(loop_versioning::name_prop::get_value): Rename to...
(loop_versioning::name_prop::value_of_expr): ...this.
* gimple-ssa-evrp-analyze.c (evrp_range_analyzer::evrp_range_analyzer):
Adjust for evrp_range_analyzer
inheriting from vr_values.
(evrp_range_analyzer::try_find_new_range): Same.
(evrp_range_analyzer::record_ranges_from_incoming_edge): Same.
(evrp_range_analyzer::record_ranges_from_phis): Same.
(evrp_range_analyzer::record_ranges_from_stmt): Same.
(evrp_range_analyzer::push_value_range): Same.
(evrp_range_analyzer::pop_value_range): Same.
* gimple-ssa-evrp-analyze.h (class evrp_range_analyzer): Inherit from
vr_values. Adjust accordingly.
* gimple-ssa-evrp.c: Adjust for evrp_range_analyzer inheriting from
vr_values.
(evrp_folder::value_of_evrp): Rename from get_value.
* tree-ssa-ccp.c (class ccp_folder): Rename get_value to
value_of_expr.
(ccp_folder::get_value): Rename to...
(ccp_folder::value_of_expr): ...this.
* tree-ssa-copy.c (class copy_folder): Rename get_value to
value_of_expr.
(copy_folder::get_value): Rename to...
(copy_folder::value_of_expr): ...this.
* tree-ssa-dom.c (dom_opt_dom_walker::after_dom_children): Adjust
for evrp_range_analyzer inheriting from vr_values.
(dom_opt_dom_walker::optimize_stmt): Same.
* tree-ssa-propagate.c (substitute_and_fold_engine::replace_uses_in):
Call value_of_* instead of get_value.
(substitute_and_fold_engine::replace_phi_args_in): Same.
(substitute_and_fold_engine::propagate_into_phi_args): Same.
(substitute_and_fold_dom_walker::before_dom_children): Same.
* tree-ssa-propagate.h: Include value-query.h.
(class substitute_and_fold_engine): Inherit from value_query.
* tree-ssa-strlen.c (strlen_dom_walker::before_dom_children):
Adjust for evrp_range_analyzer inheriting from vr_values.
* tree-ssa-threadedge.c (record_temporary_equivalences_from_phis):
Same.
* tree-vrp.c (class vrp_folder): Same.
(vrp_folder::get_value): Rename to value_of_expr.
* vr-values.c (vr_values::get_lattice_entry): Adjust for
vr_values inheriting from range_query.
(vr_values::range_of_expr): New.
(vr_values::value_of_expr): New.
(vr_values::value_on_edge): New.
(vr_values::value_of_stmt): New.
(simplify_using_ranges::op_with_boolean_value_range_p): Call
get_value_range through query.
(check_for_binary_op_overflow): Rename store to query.
(vr_values::vr_values): Remove vrp_value_range_pool.
(vr_values::~vr_values): Same.
(simplify_using_ranges::get_vr_for_comparison): Call get_value_range
through query.
(simplify_using_ranges::compare_names): Same.
(simplify_using_ranges::vrp_evaluate_conditional): Same.
(simplify_using_ranges::vrp_visit_cond_stmt): Same.
(simplify_using_ranges::simplify_abs_using_ranges): Same.
(simplify_using_ranges::simplify_cond_using_ranges_1): Same.
(simplify_cond_using_ranges_2): Same.
(simplify_using_ranges::simplify_switch_using_ranges): Same.
(simplify_using_ranges::two_valued_val_range_p): Same.
(simplify_using_ranges::simplify_using_ranges): Rename store to query.
(simplify_using_ranges::simplify): Assert that we have a query.
* vr-values.h (class range_query): Remove.
(class simplify_using_ranges): Remove inheritance of range_query.
(class vr_values): Add virtuals for range_of_expr, value_of_expr,
value_on_edge, value_of_stmt, and get_value_range.
Call range_query allocator instead of using vrp_value_range_pool.
Remove vrp_value_range_pool.
(simplify_using_ranges::get_value_range): Remove.

tree-optimization/97236 - fix bad use of VMAT_CONTIGUOUS

This avoids using VMAT_CONTIGUOUS with single-element interleaving
when using V1mode vectors. Instead keep VMAT_ELEMENTWISE but
continue to avoid load-lanes and gathers.

2020-10-01 Richard Biener <rguenther@suse.de>

PR tree-optimization/97236
* tree-vect-stmts.c (get_group_load_store_type): Keep
VMAT_ELEMENTWISE for single-element vectors.

* gcc.dg/vect/pr97236.c: New testcase.

c++: pushdecl_top_level must set context

I discovered pushdecl_top_level was not setting the decl's context,
and we ended up with namespace-scope decls with NULL context.  That
broke modules.  Then I discovered a couple of places where we set the
context to a FUNCTION_DECL, which is also wrong.  AFAICT the literals
in question belong in global scope, as they're comdatable entities.
But create_temporary would use current_scope for the context before we
pushed it into namespace scope.

This patch asserts the context is NULL and then sets it to the frobbed
global_namespace.

gcc/cp/
* name-lookup.c (pushdecl_top_level): Assert incoming context is
null, add global_namespace context.
(pushdecl_top_level_and_finish): Likewise.
* pt.c (get_template_parm_object): Clear decl context before
pushing.
* semantics.c (finish_compound_literal): Likewise.

Add gcc.c-torture/compile/pr97243.c testcase.

PR ipa/97243
* gcc.c-torture/compile/pr97243.c: New test.

Fix ICE in compute_parm_map

gcc/ChangeLog:

* ipa-modref.c (compute_parm_map): Be ready for callee_pi to be NULL.

Add -fno-ipa-modref to gcc.dg/ipa/remref-2a.c

PR ipa/97244
* gcc.dg/ipa/remref-2a.c: Add -fno-ipa-modref.

Fix ICE in ipa_edge_args_sum_t::duplicate

PR ipa/97244
* ipa-fnsummary.c (pass_free_fnsummary::execute): Free
also indirect inlining datastructure.
* ipa-modref.c (pass_ipa_modref::execute): Do not free them here.
* ipa-prop.c (ipa_free_all_node_params): Do not crash when info does
not exist.
(ipa_unregister_cgraph_hooks): Likewise.

Fix handling of fnspec for internal functions.

* internal-fn.c (DEF_INTERNAL_FN): Fix handling of fnspec.

Initial implementation of value query class.

gcc/ChangeLog:

* Makefile.in: Add value-query.o.
* value-query.cc: New file.
* value-query.h: New file.

c++: Refactor lookup_and_check_tag

It turns out I'd already found lookup_and_check_tag's control flow
confusing, and had refactored it on the modules branch. For instance,
it continually checks 'if (decl && condition)' before finally getting
to 'else if (!decl)'.  Why not just check !decl first and be done?
Well, it is done thusly.

gcc/cp/
* decl.c (lookup_and_check_tag): Refactor.

libstdc++: Fix test_and_acquire for EABI

libstdc++-v3/ChangeLog:

* config/cpu/arm/cxxabi_tweaks.h (_GLIBCXX_GUARD_TEST_AND_ACQUIRE):
Do not try to dereference return value of __atomic_load_n.

arm: Fix ordering in arm-cpus.in

This moves the recent entry for Neoverse N2 down and adds a comment in
order to preserve the existing order/structure in arm-cpus.in.

gcc/ChangeLog:

* config/arm/arm-cpus.in: Fix ordering, move Neoverse N2 down.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.

[testsuite] Enable pr94600-{1,3}.c tests for nvptx

When compiling test-case pr94600-1.c for nvptx, this gimple mem move:
...
  MEM[(volatile struct t0 *)655404B] ={v} a0[0];
...
is expanded into a memcpy, but when compiling pr94600-2.c instead, this similar
gimple mem move:
...
  MEM[(volatile struct t0 *)655404B] ={v} a00;
...
is expanded into a 32-bit load/store pair.

In both cases, emit_block_move is called.

In the latter case, can_move_by_pieces (4 /* byte-size */, 32 /* bit-align */)
is called, which returns true (because by_pieces_ninsns returns 1, which is
smaller than the MOVE_RATIO of 4).

In the former case, can_move_by_pieces (4 /* byte-size */, 8 /* bit-align */)
is called, which returns false (because by_pieces_ninsns returns 4, which is
not smaller than the MOVE_RATIO of 4).

So the difference in code generation is explained by the alignment.  The
difference in alignment comes from the move sources: a0[0] vs. a00.  Both
have the same type with 8-bit alignment, but a00 is on stack, which based on
the base stack align and stack variable placement happens to result in a
32-bit alignment.

Enable test-cases pr94600-{1,3}.c for nvptx by forcing the currently 8-bit
aligned variables to have a 32-bit alignment for STRICT_ALIGNMENT targets.
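
The alignment argument above can be modelled with a short sketch.  This is
a simplified stand-in, not GCC's by_pieces_ninsns or can_move_by_pieces:
the names by_pieces_ninsns_model, can_move_by_pieces_model and kMoveRatio
are hypothetical, the widest move is assumed to be 4 bytes, and only the
MOVE_RATIO of 4 is taken from the text.

```cpp
// Simplified model: count the moves needed to copy size_bytes when each
// move may be no wider than the known alignment (assumed power of two).
constexpr unsigned by_pieces_ninsns_model(unsigned size_bytes,
                                          unsigned align_bits) {
    unsigned piece = align_bits / 8;   // widest piece allowed by alignment
    if (piece > 4) piece = 4;          // assumption: 32-bit is the widest move
    if (piece == 0) piece = 1;
    unsigned count = 0;
    while (size_bytes > 0) {
        while (piece > size_bytes) piece /= 2;  // shrink for the tail
        count += size_bytes / piece;
        size_bytes %= piece;
    }
    return count;
}

constexpr unsigned kMoveRatio = 4;     // value quoted in the commit message

// A block move is done by pieces only if it needs fewer than kMoveRatio moves.
constexpr bool can_move_by_pieces_model(unsigned size_bytes,
                                        unsigned align_bits) {
    return by_pieces_ninsns_model(size_bytes, align_bits) < kMoveRatio;
}
```

With this model a 4-byte move at 32-bit alignment needs one move and is
done by pieces, while the same move at 8-bit alignment needs four moves
and falls back to memcpy, matching the two cases described above.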

Tested on nvptx.

gcc/testsuite/ChangeLog:

2020-10-01  Tom de Vries  <tdevries@suse.de>

* gcc.dg/pr94600-1.c: Force 32-bit alignment for a0 for !non_strict_align
targets.  Remove target clauses from scan tests.
* gcc.dg/pr94600-3.c: Same.

c++: Fix up default initialization with consteval default ctor [PR96994]

> > The following testcase is miscompiled (in particular the a and i
> > initialization).  The problem is that build_special_member_call due to
> > the immediate constructors (but not evaluated in constant expression mode)
> > doesn't create a CALL_EXPR, but returns a TARGET_EXPR with CONSTRUCTOR
> > as the initializer for it,
>
> That seems like the bug; at the end of build_over_call, after you
>
> >        call = cxx_constant_value (call, obj_arg);
>
> You need to build an INIT_EXPR if obj_arg isn't a dummy.

That works.  obj_arg is NULL if it is a dummy from the earlier code.

2020-10-01  Jakub Jelinek  <jakub@redhat.com>

PR c++/96994
* call.c (build_over_call): If obj_arg is non-NULL, return INIT_EXPR
setting obj_arg to call.

* g++.dg/cpp2a/consteval18.C: New test.

c++: Handle std::construct_at on automatic vars during constant evaluation [PR97195]

As mentioned in the PR, due to a bug we only support std::construct_at in
constant expressions on non-automatic variables, because we VERIFY_CONSTANT
the second argument of placement new, which fails verification if it is the
address of an automatic variable.
The following patch fixes it by not performing that verification; the
placement new evaluation later on will verify it after it is dereferenced.

2020-10-01 Jakub Jelinek <jakub@redhat.com>

PR c++/97195
* constexpr.c (cxx_eval_call_expression): Don't VERIFY_CONSTANT the
second argument.

* g++.dg/cpp2a/constexpr-new14.C: New test.

s390: Fix up s390_atomic_assign_expand_fenv

The following patch fixes
-FAIL: gcc.dg/pr94780.c (internal compiler error)
-FAIL: gcc.dg/pr94780.c (test for excess errors)
-FAIL: gcc.dg/pr94842.c (internal compiler error)
-FAIL: gcc.dg/pr94842.c (test for excess errors)
on s390x-linux.  The fix is essentially the same as has been applied to many
other targets (i386, aarch64, arm, rs6000, alpha, riscv).

2020-10-01  Jakub Jelinek  <jakub@redhat.com>

* config/s390/s390.c (s390_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for the first assignments to
fenv_var and old_fpc.  Formatting fixes.

tree-optimization/97255 - missing vector bool pattern of SRAed bool

SRA tends to use VIEW_CONVERT_EXPR when replacing bool fields with
unsigned char fields.  Those are not handled in vector bool pattern
detection causing vector true values to leak.  The following fixes
this by turning those into b ? 1 : 0 as well.

2020-10-01  Richard Biener  <rguenther@suse.de>

* tree-vect-patterns.c (vect_recog_bool_pattern): Also handle
VIEW_CONVERT_EXPR.

* g++.dg/vect/pr97255.cc: New testcase.

PR target/97250: i386: Add support for x86-64-v2, x86-64-v3, x86-64-v4 levels for x86-64

These micro-architecture levels are defined in the x86-64 psABI:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9

PTA_NO_TUNE is introduced so that the new processor alias table entries
do not affect the CPU tuning setting in ix86_tune.

The tests depend on the macros added in commit 92e652d8c21bd7e66cbb0f900
("i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags").
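
As a rough illustration of how such macros can be used, the sketch below
classifies the current compilation by micro-architecture level.  It is not
one of the new tests: the function name x86_64_level_from_macros is
hypothetical, the feature lists reflect my reading of the psABI and cover
only a representative subset (CMPXCHG16B and OSXSAVE are not checked), and
compilers that lack __LAHF_SAHF__ or __MOVBE__ will report level 1.

```cpp
// Returns 0 when not compiling for x86-64, otherwise the highest level
// (1..4) whose checked feature macros are all defined.
constexpr int x86_64_level_from_macros() {
#if !defined(__x86_64__)
    return 0;
#else
    int level = 1;  // baseline x86-64
# if defined(__SSE3__) && defined(__SSSE3__) && defined(__SSE4_1__) \
     && defined(__SSE4_2__) && defined(__POPCNT__) && defined(__LAHF_SAHF__)
    level = 2;
#  if defined(__AVX__) && defined(__AVX2__) && defined(__BMI__) \
      && defined(__BMI2__) && defined(__F16C__) && defined(__FMA__) \
      && defined(__LZCNT__) && defined(__MOVBE__)
    level = 3;
#   if defined(__AVX512F__) && defined(__AVX512BW__) && defined(__AVX512CD__) \
       && defined(__AVX512DQ__) && defined(__AVX512VL__)
    level = 4;
#   endif
#  endif
# endif
    return level;
#endif
}
```

Building with -march=x86-64-v3, for example, would be expected to yield 3
on a compiler that defines all of the macros checked here.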

gcc/:
PR target/97250
* config/i386/i386.h (PTA_NO_TUNE, PTA_X86_64_BASELINE)
(PTA_X86_64_V2, PTA_X86_64_V3, PTA_X86_64_V4): New.
* common/config/i386/i386-common.c (processor_alias_table):
Add "x86-64-v2", "x86-64-v3", "x86-64-v4".
* config/i386/i386-options.c (ix86_option_override_internal):
Handle new PTA_NO_TUNE processor table entries.
* doc/invoke.texi (x86 Options): Document new -march values.

gcc/testsuite/:
PR target/97250
* gcc.target/i386/x86-64-v2.c: New test.
* gcc.target/i386/x86-64-v3.c: New test.
* gcc.target/i386/x86-64-v3-haswell.c: New test.
* gcc.target/i386/x86-64-v3-skylake.c: New test.
* gcc.target/i386/x86-64-v4.c: New test.

libgo: add 32-bit RISC-V (RV32) support

Add support for the 32-bit RISC-V (RV32) ISA, matching the 64-bit RISC-V
(RV64) port except that async preemption is added as a stub only.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/251179

[RS6000] Adjust gcc asm for power10

Generate assembly with .localentry,1 functions using @notoc calls.
This patch makes libgcc.a asm look the same as power10 pcrel as far as
toc/notoc is concerned.

Otherwise calling between functions that advertise as using the TOC
and those that don't, will require linker call stubs in statically
linked code.

gcc/
* config/rs6000/ppc-asm.h: Support __PCREL__ code.
libgcc/
* config/rs6000/morestack.S,
* config/rs6000/tramp.S: Support __PCREL__ code.
libitm/
* config/powerpc/sjlj.S: Support __PCREL__ code.

[RS6000] -mno-minimal-toc vs. power10 pcrelative

We've had this hack in the libgcc config to build libgcc with
-mcmodel=small for powerpc64 for a long time.  It wouldn't be a bad
thing if someone who knows the multilib machinery well could arrange
for -mcmodel=small to be passed just for ppc64 when building for
earlier than power10.  But for now, make -mno-minimal-toc do nothing
when pcrel, which will do the right thing for any project that has
copied libgcc's trick.

We want this if configuring using --with-cpu=power10 to build a
power10 pcrel libgcc.  -mcmodel=small turns off pcrel.

gcc/
* config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Don't
set -mcmodel=small for -mno-minimal-toc when pcrel.
libgcc/
* config/rs6000/t-linux: Document purpose of -mno-minimal-toc.

c++: CTAD and explicit deduction guides for copy-list-init [PR90210]

This PR points out that we accept

  template<typename T> struct tuple { tuple(T); }; // #1
  template<typename T> explicit tuple(T t) -> tuple<T>; // #2
  tuple t = { 1 };

despite the 'explicit' deduction guide in a copy-list-initialization
context.  That's because in deduction_guides_for we first find the
user-defined deduction guide (#2), and then ctor_deduction_guides_for
creates artificial deduction guides: one from the tuple(T) constructor and
a copy guide.  So we end up with these three guides:

  (1) template<class T> tuple(T) -> tuple<T> [DECL_NONCONVERTING_P]
  (2) template<class T> tuple(tuple<T>) -> tuple<T>
  (3) template<class T> tuple(T) -> tuple<T>

Then, in do_class_deduction, we prune this set, and get rid of (1).
Then overload resolution selects (3) and we succeed.

But [over.match.list]p1 says "In copy-list-initialization, if an explicit
constructor is chosen, the initialization is ill-formed."  It also goes
on to say that this differs from other situations where only converting
constructors are considered for copy-initialization.  Therefore for
list-initialization we consider explicit constructors and complain if one
is chosen.  E.g. convert_like_internal/ck_user can give an error.

So my logic runs that we should not prune the deduction_guides_for guides
in a copy-list-initialization context, and only complain if we actually
choose an explicit deduction guide.  This matches clang++/EDG/msvc++.
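
For contrast, here is a sketch of the initialization forms that remain
valid with an explicit deduction guide.  It reuses the tuple example from
above; the constexpr constructor, the value member and the demo function
are additions for illustration only, and it needs C++17.

```cpp
#include <type_traits>

template <typename T>
struct tuple {
    constexpr tuple(T v) : value(v) {}   // #1
    T value;
};

template <typename T>
explicit tuple(T t) -> tuple<T>;         // #2

constexpr int demo() {
    tuple a(1);    // direct-initialization: the explicit guide is usable
    tuple b{2};    // direct-list-initialization: the explicit guide is usable
    tuple c = 3;   // copy-initialization: only non-explicit guides are candidates
    static_assert(std::is_same_v<decltype(a), tuple<int>>);
    static_assert(std::is_same_v<decltype(b), tuple<int>>);
    static_assert(std::is_same_v<decltype(c), tuple<int>>);
    // tuple d = {4};  // copy-list-initialization: the explicit guide is
    //                 // chosen, so this is the case that is now an error
    return a.value + b.value + c.value;
}
```

Only the commented-out copy-list-initialization is expected to be
diagnosed; the other three forms deduce tuple<int>.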

gcc/cp/ChangeLog:

PR c++/90210
* pt.c (do_class_deduction): Don't prune explicit deduction guides
in copy-list-initialization.  In copy-list-initialization, if an
explicit deduction guide was selected, give an error.

gcc/testsuite/ChangeLog:

PR c++/90210
* g++.dg/cpp1z/class-deduction73.C: New test.

Daily bump.

libstdc++: Fix test_and_acquire / set_and_release for EABI guard variables

The default definitions of _GLIBCXX_GUARD_TEST_AND_ACQUIRE and
_GLIBCXX_GUARD_SET_AND_RELEASE in libsupc++/guard.cc only work for the
generic (IA64) ABI, because they test/set the first byte of the guard
variable. For EABI we need to use the least significant bit, which means
using the first byte is wrong for big endian targets.

This has been wrong since r224411, but previously it only caused poor
performance. The _GLIBCXX_GUARD_TEST_AND_ACQUIRE at the very start of
__cxa_guard_acquire would always return false even if the initialization
was actually complete. Before my r11-3484 change the atomic compare
exchange would have loaded the correct value, and then returned 0 as
expected when the initialization is complete. After my change, in the
single-threaded case there is no redundant check for init being
complete, because I foolishly assumed that the check at the start of the
function actually worked.

The default definition of _GLIBCXX_GUARD_SET_AND_RELEASE is also wrong
for big endian EABI, but appears to work because it sets the wrong bit
but then the buggy TEST_AND_ACQUIRE tests that wrong bit as well. Also,
the buggy SET_AND_RELEASE macro is only used for targets with threads
enabled but no futex syscalls.

This should fix the regressions introduced by my patch, by defining
custom versions of the TEST_AND_ACQUIRE and SET_AND_RELEASE macros that
are correct for EABI.
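
The endianness problem can be shown with a host-independent sketch.  This
is a model of the memory layout only, not the libstdc++ macros: GuardBytes,
store_guard, first_byte_test and low_bit_test are hypothetical names, and a
32-bit guard word is assumed.

```cpp
#include <cstdint>

// The four bytes of a 32-bit guard word, in memory order.
struct GuardBytes { std::uint8_t byte[4]; };

// Lay out value in memory using the requested byte order.
constexpr GuardBytes store_guard(std::uint32_t value, bool big_endian) {
    GuardBytes g{};
    for (int i = 0; i < 4; ++i) {
        // i counts bytes from the least significant end of the word.
        g.byte[big_endian ? 3 - i : i] =
            static_cast<std::uint8_t>(value >> (8 * i));
    }
    return g;
}

// Generic-ABI style check: look only at the first byte in memory.
constexpr bool first_byte_test(const GuardBytes& g) {
    return g.byte[0] != 0;
}

// EABI style check: test the least significant bit of the whole word.
constexpr bool low_bit_test(const GuardBytes& g, bool big_endian) {
    return (g.byte[big_endian ? 3 : 0] & 1u) != 0;
}
```

For a completed guard holding the value 1, both checks agree on a
little-endian layout, but on a big-endian layout the first-byte check
reports "not initialized" while the low-bit check reports "initialized".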

libstdc++-v3/ChangeLog:

* config/cpu/arm/cxxabi_tweaks.h (_GLIBCXX_GUARD_TEST_AND_ACQUIRE)
(_GLIBCXX_GUARD_SET_AND_RELEASE): Define for EABI.

Avoid assuming a VLA access specification string contains a closing bracket (PR middle-end/97189).

Resolves:
PR middle-end/97189 - ICE on redeclaration of a function with VLA argument and attribute access

gcc/ChangeLog:

PR middle-end/97189
* attribs.c (attr_access::array_as_string): Avoid assuming a VLA
access specification string contains a closing bracket.

gcc/c-family/ChangeLog:

PR middle-end/97189
* c-attribs.c (append_access_attr): Use the function declaration
location for a warning about an attribute access argument.

gcc/testsuite/ChangeLog:

PR middle-end/97189
* gcc.dg/attr-access-2.c: Adjust caret location.
* gcc.dg/Wvla-parameter-6.c: New test.
* gcc.dg/Wvla-parameter-7.c: New test.

PR c/97206 - ICE in composite_type on declarations of similar array types

gcc/ChangeLog:

PR c/97206
* attribs.c (attr_access::array_as_string): Avoid modifying a shared
type in place and use build_type_attribute_qual_variant instead.

gcc/testsuite/ChangeLog:

PR c/97206
* gcc.dg/Warray-parameter-7.c: New test.
* gcc.dg/Warray-parameter-8.c: New test.
* gcc.dg/Wvla-parameter-5.c: New test.

libstdc++: Use __is_same instead of __is_same_as

PR 92271 added __is_same as another spelling of __is_same_as. Since
Clang also spells it __is_same, let's just use that consistently.

It appears that Intel icc sets __GNUC__ to 10, but only supports
__is_same_as. If we only use __is_same for __GNUC__ >= 11 then we won't
break icc again (it looks like we broke previous versions of icc when we
started using __is_same_as).
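
A minimal sketch of this kind of detection, under assumptions: the macro
DEMO_HAVE_BUILTIN_IS_SAME and the trait demo_is_same are hypothetical
stand-ins rather than the libstdc++ definitions, and __is_identifier is
used only where the compiler provides it.

```cpp
#if defined(__is_identifier)
// Clang-style detection: __is_identifier(x) is 0 when x is a reserved builtin.
# if !__is_identifier(__is_same)
#  define DEMO_HAVE_BUILTIN_IS_SAME 1
# endif
#elif defined(__GNUC__) && __GNUC__ >= 11
// Assume the builtin only for GCC 11 and later, so a compiler that claims
// __GNUC__ == 10 without supporting __is_same takes the fallback below.
# define DEMO_HAVE_BUILTIN_IS_SAME 1
#endif

#ifdef DEMO_HAVE_BUILTIN_IS_SAME
template <typename T, typename U>
struct demo_is_same { static constexpr bool value = __is_same(T, U); };
#else
// Portable fallback using a partial specialization.
template <typename T, typename U>
struct demo_is_same { static constexpr bool value = false; };
template <typename T>
struct demo_is_same<T, T> { static constexpr bool value = true; };
#endif
```

Either branch gives the same answers; the builtin only avoids
instantiating the partial specialization.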

libstdc++-v3/ChangeLog:

* include/bits/c++config (_GLIBCXX_HAVE_BUILTIN_IS_SAME):
Define for GCC 11 or when !__is_identifier(__is_same).
(_GLIBCXX_BUILTIN_IS_SAME_AS): Remove.
* include/std/type_traits (is_same, is_same_v): Replace uses
of _GLIBCXX_BUILTIN_IS_SAME_AS.

libgomp: Enforce 1-thread limit in subteams

Accelerators with fixed thread-counts will break if nested teams are expected
to have multiple threads each.

libgomp/ChangeLog:

2020-09-29 Andrew Stubbs <ams@codesourcery.com>

* parallel.c (gomp_resolve_num_threads): Ignore nest_var on nvptx
and amdgcn targets.

Fix some fnspec strings in trans-decl.c

* trans-decl.c (gfc_build_intrinsic_function_decls): Add trailing dots
to spec strings so they match the number of parameters; do not use
R and W for non-pointer parameters. Drop pointless specifier on
caf_stop_numeric and caf_get_team.

Add trailing dots so length of spec string matches number of arguments.

2020-09-30 Jan Hubicka <hubicka@ucw.cz>

* trans-io.c (gfc_build_io_library_fndecls): Add trailing dots so
length of spec string matches number of arguments.

Add a testcase for PR target/96827

Add a testcase for PR target/96827 which was fixed by r11-3559:

commit 97b798d80baf945ea28236eef3fa69f36626b579
Author: Joel Hutton <joel.hutton@arm.com>
Date: Wed Sep 30 15:08:13 2020 +0100

[SLP][VECT] Add check to fix 96837

PR target/96827
* gcc.target/i386/pr96827.c: New test.

arm: [testsuite] Skip thumb2-cond-cmp tests on Cortex-M [PR94595]

Since r204778 (g571880a0a4c512195aa7d41929ba6795190887b2), we favor
branches over IT blocks on Cortex-M. As a result, instead of
generating two nested IT blocks in thumb2-cond-cmp-[1234].c, we
generate either a single IT block, or use branches depending on
conditions tested by the program.

Since this was a deliberate change and the tests still pass as
expected on Cortex-A, this patch skips them when targeting
Cortex-M. This avoids the failures on Cortex M3, M4, and M33. This
patch makes the testcases unsupported on Cortex-M7 although they pass
in this case because this CPU has different branch costs.

I tried to relax the scan-assembler directives using e.g. cmpne|subne
or cmpgt|ble but that seemed fragile.

2020-09-07 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
PR target/94595
* gcc.target/arm/thumb2-cond-cmp-1.c: Skip if arm_cortex_m.
* gcc.target/arm/thumb2-cond-cmp-2.c: Skip if arm_cortex_m.
* gcc.target/arm/thumb2-cond-cmp-3.c: Skip if arm_cortex_m.
* gcc.target/arm/thumb2-cond-cmp-4.c: Skip if arm_cortex_m.

amend SLP reduction testcases

This amends SLP reduction testcases that currently trigger
vect_attempt_slp_rearrange_stmts eliding load permutations to
verify this is actually happening.

2020-09-30 Richard Biener <rguenther@suse.de>

* gcc.dg/vect/pr37027.c: Amend.
* gcc.dg/vect/pr67790.c: Likewise.
* gcc.dg/vect/pr92324-4.c: Likewise.
* gcc.dg/vect/pr92558.c: Likewise.
* gcc.dg/vect/pr95495.c: Likewise.
* gcc.dg/vect/slp-reduc-1.c: Likewise.
* gcc.dg/vect/slp-reduc-2.c: Likewise.
* gcc.dg/vect/slp-reduc-3.c: Likewise.
* gcc.dg/vect/slp-reduc-4.c: Likewise.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.dg/vect/slp-reduc-7.c: Likewise.
* gcc.dg/vect/vect-reduc-in-order-4.c: Likewise.

arm: add support for Cortex-A78 and Cortex-A78AE

This patch introduces support for Cortex-A78 [0] and Cortex-A78AE [1]
cpus.

[0]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
[1]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78ae

OK for master branch ?

kind regards
Przemyslaw Wirkus

gcc/ChangeLog:

* config/arm/arm-cpus.in: Add Cortex-A78 and Cortex-A78AE cores.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Update docs.

aarch64: add support for Cortex-A78 and Cortex-A78AE

This patch introduces support for Cortex-A78 [0] and Cortex-A78AE [1]
cpus.

[0]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78
[1]: https://www.arm.com/products/silicon-ip-cpu/cortex-a/cortex-a78ae

OK for master branch ?

kind regards
Przemyslaw Wirkus

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Cortex-A78 and Cortex-A78AE cores.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Add -mtune=cortex-a78 and -mtune=cortex-a78ae.

[GCC][PATCH] arm: Fix MVE intrinsics polymorphic variants wrongly generating __ARM_undef type (pr96795).

Hello,

This patch fixes (PR96795) MVE intrinsic polymorphic variants vaddq, vaddq_m, vaddq_x, vcmpeqq_m,
vcmpeqq, vcmpgeq_m, vcmpgeq, vcmpgtq_m, vcmpgtq, vcmpleq_m, vcmpleq, vcmpltq_m, vcmpltq,
vcmpneq_m, vcmpneq, vfmaq_m, vfmaq, vfmasq_m, vfmasq, vmaxnmavq, vmaxnmavq_p, vmaxnmvq,
vmaxnmvq_p, vminnmavq, vminnmavq_p, vminnmvq, vminnmvq_p, vmulq_m, vmulq, vmulq_x, vsetq_lane,
vsubq_m, vsubq and vsubq_x which are incorrectly generating __ARM_undef and mismatching the passed
floating point scalar arguments.

Bootstrapped on arm-none-linux-gnueabihf and regression tested on arm-none-eabi and found no regressions.

Ok for master? Ok for GCC-10 branch?

Regards,
Srinath.

gcc/ChangeLog:

2020-09-30 Srinath Parvathaneni <srinath.parvathaneni@arm.com>

PR target/96795
* config/arm/arm_mve.h (__ARM_mve_coerce2): Define.
(__arm_vaddq): Correct the scalar argument.
(__arm_vaddq_m): Likewise.
(__arm_vaddq_x): Likewise.
(__arm_vcmpeqq_m): Likewise.
(__arm_vcmpeqq): Likewise.
(__arm_vcmpgeq_m): Likewise.
(__arm_vcmpgeq): Likewise.
(__arm_vcmpgtq_m): Likewise.
(__arm_vcmpgtq): Likewise.
(__arm_vcmpleq_m): Likewise.
(__arm_vcmpleq): Likewise.
(__arm_vcmpltq_m): Likewise.
(__arm_vcmpltq): Likewise.
(__arm_vcmpneq_m): Likewise.
(__arm_vcmpneq): Likewise.
(__arm_vfmaq_m): Likewise.
(__arm_vfmaq): Likewise.
(__arm_vfmasq_m): Likewise.
(__arm_vfmasq): Likewise.
(__arm_vmaxnmavq): Likewise.
(__arm_vmaxnmavq_p): Likewise.
(__arm_vmaxnmvq): Likewise.
(__arm_vmaxnmvq_p): Likewise.
(__arm_vminnmavq): Likewise.
(__arm_vminnmavq_p): Likewise.
(__arm_vminnmvq): Likewise.
(__arm_vminnmvq_p): Likewise.
(__arm_vmulq_m): Likewise.
(__arm_vmulq): Likewise.
(__arm_vmulq_x): Likewise.
(__arm_vsetq_lane): Likewise.
(__arm_vsubq_m): Likewise.
(__arm_vsubq): Likewise.
(__arm_vsubq_x): Likewise.

gcc/testsuite/ChangeLog:

PR target/96795
* gcc.target/arm/mve/intrinsics/mve_fp_vaddq_n.c: New Test.
* gcc.target/arm/mve/intrinsics/mve_vaddq_n.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vaddq_x_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmaq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vfmasq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmavq_p_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmaxnmvq_p_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmavq_p_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vminnmvq_p_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_x_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_m_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_m_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_n_f32-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_x_n_f16-1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsubq_x_n_f32-1.c: Likewise.

[SLP][VECT] Add check to fix 96837

The following patch adds a simple check to prevent slp stmts from
vector constructors being rearranged. vect_attempt_slp_rearrange_stmts
tries to rearrange to avoid a load permutation.

This fixes PR target/96827
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96827

gcc/ChangeLog:

2020-09-29 Joel Hutton <joel.hutton@arm.com>

PR target/96837
* tree-vect-slp.c (vect_analyze_slp): Do not call
vect_attempt_slp_rearrange_stmts for vector constructors.

gcc/testsuite/ChangeLog:

2020-09-29 Joel Hutton <joel.hutton@arm.com>

PR target/96837
* gcc.dg/vect/bb-slp-49.c: New test.

middle-end: Refactor refcnt to use SLP_TREE_REF_COUNT for consistency

This is a small refactoring which introduces SLP_TREE_REF_COUNT and replaces
the uses of refcnt with it. This is for consistency with the other properties.

A similar patch was pre-approved last year but since there are more uses now I am
sending it for review anyway.

gcc/ChangeLog:

* tree-vectorizer.h (SLP_TREE_REF_COUNT): New.
* tree-vect-slp.c (_slp_tree::_slp_tree, _slp_tree::~_slp_tree,
vect_free_slp_tree, vect_build_slp_tree, vect_print_slp_tree,
slp_copy_subtree, vect_attempt_slp_rearrange_stmts): Use it.

c++: Kill DECL_HIDDEN_FRIEND_P

Now hiddenness is managed by name-lookup, we no longer need DECL_HIDDEN_FRIEND_P.
This removes it, mainly by deleting its bookkeeping, but there are a couple of uses:

1) two name lookups look at it to see if they found a hidden thing.
In one we have the OVERLOAD, so can record OVL_HIDDEN_P.  In the other
we're repeating a lookup that failed, but asking for hidden things --
so if that succeeds we know the thing was hidden.  (FWIW CWG recently
discussed whether template specializations and instantiations should
see such hidden templates anyway, there is compiler divergence.)

2) We had a confusing setting of KOENIG_P when building a
non-dependent call.  We don't repeat that lookup at instantiation time
anyway.

gcc/cp/
* cp-tree.h (struct lang_decl_fn): Remove hidden_friend_p.
(DECL_HIDDEN_FRIEND_P): Delete.
* call.c (add_function_candidate): Drop assert about anticipated
decl.
(build_new_op_1): Drop koenig lookup flagging for hidden friend.
* decl.c (duplicate_decls): Drop HIDDEN_FRIEND_P updating.
* name-lookup.c (do_pushdecl): Likewise.
(set_decl_namespace): Discover hiddenness from OVL_HIDDEN_P.
* pt.c (check_explicit_specialization): Record found_hidden
explicitly.

Fortran: add contiguous check for ptr assignment, fix non-contig check (PR97242)

gcc/fortran/ChangeLog:

PR fortran/97242
* expr.c (gfc_is_not_contiguous): Fix check.
(gfc_check_pointer_assign): Use it.

gcc/testsuite/ChangeLog:

PR fortran/97242
* gfortran.dg/contiguous_11.f90: New test.
* gfortran.dg/contiguous_4.f90: Update.
* gfortran.dg/contiguous_7.f90: Update.

OpenMP: Add implicit declare target for nested procedures

gcc/ChangeLog:

* omp-offload.c (omp_discover_implicit_declare_target): Also
handle nested functions.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/declare-target-3.f90: New test.

This patch fixes PR97045 - unlimited polymorphic array element selectors.

2020-09-30 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/97045
* trans-array.c (gfc_conv_array_ref): Make sure that the class
decl is passed to build_array_ref in the case of unlimited
polymorphic entities.
* trans-expr.c (gfc_conv_derived_to_class): Ensure that array
refs do not precede the _len component. Free the _len expr.
* trans-stmt.c (trans_associate_var): Reset 'need_len_assign'
for polymorphic scalars.
* trans.c (gfc_build_array_ref): When the vptr size is used for
span, multiply by the _len field of unlimited polymorphic
entities, when non-zero.

gcc/testsuite/
PR fortran/97045
* gfortran.dg/select_type_50.f90 : New test.

[nvptx] Add type arg to TARGET_LIBC_HAS_FUNCTION

GCC has a target hook TARGET_LIBC_HAS_FUNCTION, which tells the compiler
which functions it can expect to be present in libc.

The default target hook does not include the sincos functions.

The nvptx port of newlib does include sincos and sincosf, but not sincosl.

The target hook TARGET_LIBC_HAS_FUNCTION does not distinguish between sincos,
sincosf and sincosl, so if we enable it for the sincos functions, then for
test.c:
...
long double x, a, b;
int main (void) {
  x = 0.5;
  a = sinl (x);
  b = cosl (x);
  printf ("a: %f\n", (double)a);
  printf ("b: %f\n", (double)b);
  return 0;
}
...
we introduce a regression:
...
$ gcc test.c -lm -O2
unresolved symbol sincosl
collect2: error: ld returned 1 exit status
...

Add a type argument to target hook TARGET_LIBC_HAS_FUNCTION, and use it
in nvptx_libc_has_function to enable sincos and sincosf, but not sincosl.
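
The effect of the extra argument can be sketched outside GCC roughly as
follows.  This is only a model: fn_class, fp_type and
sketch_libc_has_function are hypothetical stand-ins for GCC's function
classes, type nodes and the nvptx hook, and the assumption that every
other function class is available is made purely for brevity.

```cpp
#include <cassert>

// Hypothetical stand-ins for GCC's function classes and type nodes.
enum class fn_class { c99_misc, sincos };
enum class fp_type { flt, dbl, long_dbl };

// Model of a type-aware hook for a libc that provides sincos and
// sincosf, but not sincosl.
bool sketch_libc_has_function (fn_class fn, fp_type type)
{
  if (fn == fn_class::sincos)
    return type == fp_type::flt || type == fp_type::dbl;
  // Assumption for this sketch only: all other classes are available.
  return true;
}
```

With a hook of this shape, a caller asking about the long double variant
gets "no" and keeps the separate sinl and cosl calls from test.c.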

Built and reg-tested on x86_64-linux.

Built and tested on nvptx.

gcc/ChangeLog:

2020-09-28  Tobias Burnus  <tobias@codesourcery.com>
    Tom de Vries  <tdevries@suse.de>

* builtins.c (expand_builtin_cexpi, fold_builtin_sincos): Update
targetm.libc_has_function call.
* builtins.def (DEF_C94_BUILTIN, DEF_C99_BUILTIN, DEF_C11_BUILTIN):
(DEF_C2X_BUILTIN, DEF_C99_COMPL_BUILTIN, DEF_C99_C90RES_BUILTIN):
Same.
* config/darwin-protos.h (darwin_libc_has_function): Update prototype.
* config/darwin.c (darwin_libc_has_function): Add arg.
* config/linux-protos.h (linux_libc_has_function): Update prototype.
* config/linux.c (linux_libc_has_function): Add arg.
* config/i386/i386.c (ix86_libc_has_function): Update
targetm.libc_has_function call.
* config/nvptx/nvptx.c (nvptx_libc_has_function): New function.
(TARGET_LIBC_HAS_FUNCTION): Redefine to nvptx_libc_has_function.
* convert.c (convert_to_integer_1): Update targetm.libc_has_function
call.
* match.pd: Same.
* target.def (libc_has_function): Add arg.
* doc/tm.texi: Regenerate.
* targhooks.c (default_libc_has_function, gnu_libc_has_function)
(no_c99_libc_has_function): Add arg.
* targhooks.h (default_libc_has_function, no_c99_libc_has_function)
(gnu_libc_has_function): Update prototype.
* tree-ssa-math-opts.c (pass_cse_sincos::execute): Update
targetm.libc_has_function call.

gcc/fortran/ChangeLog:

2020-09-30  Tom de Vries  <tdevries@suse.de>

* f95-lang.c (gfc_init_builtin_functions):  Update
targetm.libc_has_function call.

x86: Use SET operation in MOVDIRI and MOVDIR64B

Since MOVDIRI and MOVDIR64B write to memory, similar to UNSPEC_MOVNT,
use SET operation in MOVDIRI and MOVDIR64B patterns with UNSPEC instead
of UNSPECV.

gcc/

PR target/97184
* config/i386/i386.md (UNSPECV_MOVDIRI): Renamed to ...
(UNSPEC_MOVDIRI): This.
(UNSPECV_MOVDIR64B): Renamed to ...
(UNSPEC_MOVDIR64B): This.
(movdiri<mode>): Use SET operation.
(@movdir64b_<mode>): Likewise.

gcc/testsuite/

PR target/97184
* gcc.target/i386/movdir64b.c: New test.
* gcc.target/i386/movdiri32.c: Likewise.
* gcc.target/i386/movdiri64.c: Likewise.
* lib/target-supports.exp (check_effective_target_movdir): New.

[testsuite] Re-enable pr94600-{1,3}.c tests for arm

Before commit 7e437162001 "[testsuite] Require non_strict_align in
pr94600-{1,3}.c", some tests were failing for nvptx, because volatile stores
were expected, but memcpy's were found instead.

This was traced back to this bit in compute_record_mode:
...
  /* If structure's known alignment is less than what the scalar
     mode would need, and it matters, then stick with BLKmode.  */
  if (mode != BLKmode
      && STRICT_ALIGNMENT
      && ! (TYPE_ALIGN (type) >= BIGGEST_ALIGNMENT
            || TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (mode)))
    {
      /* If this is the only reason this type is BLKmode, then
         don't force containing types to be BLKmode.  */
      TYPE_NO_FORCE_BLK (type) = 1;
      mode = BLKmode;
    }
...
which got triggered for nvptx, but not for x86_64.

The commit restricted the tests to the non_strict_align effective target, but
that had the effect of also disabling them for the arm target, even
though they were passing there before.

Further investigation in compute_record_mode shows that the if-condition
evaluates to false for arm because TYPE_ALIGN (type) == 32, while
it's 8 for nvptx.  This in turn can be explained by the
PCC_BITFIELD_TYPE_MATTERS setting, which is 1 for arm, but 0 for nvptx.

Re-enable the test for arm by using effective target
(non_strict_align || pcc_bitfield_type_matters).
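
The difference between the two targets can be replayed with a standalone
model of the quoted condition.  This is a sketch, not GCC code: the
function name and parameters are hypothetical, and the mode alignment of
32 and BIGGEST_ALIGNMENT of 64 used in the example call are assumed
values chosen only for illustration.  Only the TYPE_ALIGN values (32 for
arm, 8 for nvptx) come from the analysis above.

```cpp
#include <cassert>

// Standalone model of the compute_record_mode condition quoted above.
// All parameters are plain values standing in for the GCC macros.
bool sketch_forces_blkmode (bool mode_is_blk, bool strict_alignment,
                            unsigned type_align,
                            unsigned biggest_alignment,
                            unsigned mode_alignment)
{
  return !mode_is_blk
         && strict_alignment
         && !(type_align >= biggest_alignment
              || type_align >= mode_alignment);
}
```

Under those assumptions, sketch_forces_blkmode (false, true, 32, 64, 32)
is false (the arm case), while the same call with a type alignment of 8
is true (the nvptx case).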

Tested on arm-eabi and nvptx.

gcc/testsuite/ChangeLog:

2020-09-30  Tom de Vries  <tdevries@suse.de>

* gcc.dg/pr94600-1.c: Use effective target
(non_strict_align || pcc_bitfield_type_matters).
* gcc.dg/pr94600-3.c: Same.

i386: Define __LAHF_SAHF__ and __MOVBE__ macros, based on ISA flags

gcc/
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__LAHF_SAHF__ and __MOVBE__ based on ISA flags.

testsuite: Fix up amx* dg-do run tests with older binutils

These tests were missing dg-requires-effective-targets to ensure they
are UNSUPPORTED if the assembler doesn't have AMX support.

2020-09-30 Jakub Jelinek <jakub@redhat.com>

* gcc.target/i386/amxint8-dpbssd-2.c: Require effective targets
amx_tile and amx_int8.
* gcc.target/i386/amxint8-dpbsud-2.c: Likewise.
* gcc.target/i386/amxint8-dpbusd-2.c: Likewise.
* gcc.target/i386/amxint8-dpbuud-2.c: Likewise.
* gcc.target/i386/amxbf16-dpbf16ps-2.c: Require effective targets
amx_tile and amx_bf16.
* gcc.target/i386/amxtile-2.c: Require effective target amx_tile.

PR target/97150 AArch64: 2nd parameter of unsigned Neon scalar shift intrinsics should be signed

In this PR the second argument to the intrinsics should be signed but we
use an unsigned one erroneously.
The corresponding builtins are already using the correct types so it's
just a matter of correcting the signatures in arm_neon.h
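
A portable model shows why the shift amount wants to be signed: a
negative amount selects a right shift.  This is a sketch only, not the
arm_neon.h implementation.  The name sketch_shl_u64 is hypothetical, and
the model uses the whole 64-bit amount, whereas the hardware instruction,
as I understand it, only looks at its low byte.

```cpp
#include <cassert>
#include <cstdint>

// Model of an unsigned shift whose amount is signed: positive amounts
// shift left, negative amounts shift right.
uint64_t sketch_shl_u64 (uint64_t value, int64_t shift)
{
  if (shift >= 64 || shift <= -64)
    return 0;                 // every bit is shifted out
  if (shift >= 0)
    return value << shift;
  return value >> -shift;     // negative amount: shift right
}
```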

gcc/
PR target/97150
* config/aarch64/arm_neon.h (vqrshlb_u8): Make second argument
signed.
(vqrshlh_u16): Likewise.
(vqrshls_u32): Likewise.
(vqrshld_u64): Likewise.
(vqshlb_u8): Likewise.
(vqshlh_u16): Likewise.
(vqshls_u32): Likewise.
(vqshld_u64): Likewise.
(vshld_u64): Likewise.

gcc/testsuite/
PR target/97150
* gcc.target/aarch64/pr97150.c: New test.

PR target/96313 AArch64: vqmovun* return types should be unsigned

In this PR we have the wrong return type for some intrinsics. It should
be unsigned, but we implement it as signed.
Fix this by adjusting the type qualifiers used when creating the
builtins and fixing the type in the arm_neon.h intrinsic.
With the adjustment in qualifiers we now don't need to cast the result
when returning.
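
For reference, the operation narrows a signed value to an unsigned one
with saturation, which is why an unsigned return type is the natural fit.
A scalar model for the 16-bit to 8-bit case might look like the
following.  The name sketch_sqmovun_h is hypothetical; this is not the
builtin itself.

```cpp
#include <cassert>
#include <cstdint>

// Model of a signed-to-unsigned saturating narrow from 16 to 8 bits:
// negative inputs clamp to 0, inputs above 255 clamp to 255.
uint8_t sketch_sqmovun_h (int16_t x)
{
  if (x < 0)
    return 0;
  if (x > 255)
    return 255;
  return static_cast<uint8_t> (x);
}
```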

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/
PR target/96313
* config/aarch64/aarch64-simd-builtins.def (sqmovun): Use UNOPUS
qualifiers.
* config/aarch64/arm_neon.h (vqmovun_s16): Adjust builtin call.
Remove unnecessary result cast.
(vqmovun_s32): Likewise.
(vqmovun_s64): Likewise.
(vqmovunh_s16): Likewise. Fix return type.
(vqmovuns_s32): Likewise.
(vqmovund_s64): Likewise.

gcc/testsuite/
PR target/96313
* gcc.target/aarch64/pr96313.c: New test.
* gcc.target/aarch64/scalar_intrinsics.c (test_vqmovunh_s16):
Adjust return type.
(test_vqmovuns_s32): Likewise.
(test_vqmovund_s64): Likewise.

aarch64: Tweak movti and movtf patterns

movti lacked a way of zeroing an FPR, meaning that we'd do:

        mov     x0, 0
        mov     x1, 0
        fmov    d0, x0
        fmov    v0.d[1], x1

instead of just:

        movi    v0.2d, #0

movtf had the opposite problem for GPRs: we'd generate:

        movi    v0.2d, #0
        fmov    x0, d0
        fmov    x1, v0.d[1]

instead of just:

        mov     x0, 0
        mov     x1, 0

Also, there was an unnecessary earlyclobber on the GPR<-GPR movtf
alternative (but not the movti one).  The splitter handles overlap
correctly.

The TF splitter used aarch64_reg_or_imm, but the _imm part only
accepts integer constants, not floating-point ones.  The patch
changes it to nonmemory_operand instead.

gcc/
* config/aarch64/aarch64.c (aarch64_split_128bit_move_p): Add a
function comment.  Tighten check for FP moves.
* config/aarch64/aarch64.md (*movti_aarch64): Add a w<-Z alternative.
(*movtf_aarch64): Handle r<-Y like r<-r.  Remove unnecessary
earlyclobber.  Change splitter predicate from aarch64_reg_or_imm
to nonmemory_operand.

gcc/testsuite/
* gcc.target/aarch64/movtf_1.c: New test.
* gcc.target/aarch64/movti_1.c: Likewise.

arm: Fix ICEs in no-literal-pool.c on MVE [PR97251]

This patch fixes ICEs when compiling
gcc/testsuite/gcc.target/arm/pure-code/no-literal-pool.c with
-mfp16-format=ieee -mfloat-abi=hard -march=armv8.1-m.main+mve
-mpure-code.

The existing conditions in the movsf/movdf expanders (as well as the
no_literal_pool patterns) were too restrictive, requiring
TARGET_HARD_FLOAT instead of TARGET_VFP_BASE, which caused unrecognised
insns when compiling this testcase with integer MVE and -mpure-code.

gcc/ChangeLog:

PR target/97251
* config/arm/arm.md (movsf): Relax TARGET_HARD_FLOAT to
TARGET_VFP_BASE.
(movdf): Likewise.
* config/arm/vfp.md (no_literal_pool_df_immediate): Likewise.
(no_literal_pool_sf_immediate): Likewise.

gcc/configure typo fix

* configure.ac (--with-long-double-format): Typo fix.
* configure: Regenerate.

Re: rs6000: Use parameterized names for tablejump

* config/rs6000/rs6000.md (@tablejump<mode>_normal): Don't use
non-existent operands[].
(@tablejump<mode>_nospec): Likewise.

Daily bump.

rs6000: Use parameterized names for tablejump

We have too many tablejump patterns. Using parameterized names
simplifies the code a bit.

2020-09-29 Segher Boessenkool <segher@kernel.crashing.org>

* config/rs6000/rs6000.md (tablejump): Simplify.
(tablejumpsi): Merge this ...
(tablejumpdi): ... and this ...
(@tablejump<mode>_normal): ... into this.
(tablejumpsi_nospec): Merge this ...
(tablejumpdi_nospec): ... and this ...
(@tablejump<mode>_nospec): ... into this.
(*tablejump<mode>_internal1): Delete, rename to ...
(@tablejump<mode>_insn_normal): ... this.
(*tablejump<mode>_internal1_nospec): Delete, rename to ...
(@tablejump<mode>_insn_nospec): ... this.

Correct and improve -Wnonnull for calls to functions with VLA arguments (PR middle-end/97188).

Resolves:
PR middle-end/97188 - ICE passing a null VLA to a function expecting at least one element

gcc/ChangeLog:

PR middle-end/97188
* calls.c (maybe_warn_rdwr_sizes): Simplify warning messages.
Correct handling of VLA arguments.

gcc/testsuite/ChangeLog:

PR middle-end/97188
* gcc.dg/Wstringop-overflow-23.c: Adjust text of expected warnings.
* gcc.dg/Wnonnull-4.c: New test.

c++: Implement -Wrange-loop-construct [PR94695]

This new warning can be used to prevent expensive copies inside range-based
for-loops, for instance:

  struct S { char arr[128]; };
  void fn () {
    S arr[5];
    for (const auto x : arr) {  }
  }

where auto deduces to S and then we copy the big S in every iteration.
Using "const auto &x" would not incur such a copy.  With this patch the
compiler will warn:

q.C:4:19: warning: loop variable 'x' creates a copy from type 'const S' [-Wrange-loop-construct]
    4 |   for (const auto x : arr) {  }
      |                   ^
q.C:4:19: note: use reference type 'const S&' to prevent copying
    4 |   for (const auto x : arr) {  }
      |                   ^
      |                   &

As per Clang, this warning is suppressed for trivially copyable types
whose size does not exceed 64B.  The tricky part of the patch was how
to figure out if using a reference would have prevented a copy.  To
that end, I'm using the new function called ref_conv_binds_directly_p.

This warning is enabled by -Wall.  Further warnings of similar nature
should follow soon.
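
The cost the warning points at can be made visible with a
copy-counting type.  This is an illustrative sketch: g_copies and the
two helper functions are hypothetical names, and S is given a
user-provided copy constructor purely so that copies can be counted.

```cpp
#include <cassert>

static int g_copies = 0;  // hypothetical counter, for illustration only

struct S
{
  char arr[128];
  S () : arr () {}
  S (const S &) : arr () { ++g_copies; }  // count every copy
};

// Loop variable declared by value: one copy of S per iteration.
int copies_by_value ()
{
  S arr[5];
  g_copies = 0;
  for (const auto x : arr) { (void) x; }
  return g_copies;
}

// Loop variable declared as a reference: binds directly, no copies.
int copies_by_reference ()
{
  S arr[5];
  g_copies = 0;
  for (const auto &x : arr) { (void) x; }
  return g_copies;
}
```

Here copies_by_value () performs five copies and copies_by_reference ()
performs none.  Compiling the first loop with the new option enabled is
expected to produce the warning.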

gcc/c-family/ChangeLog:

PR c++/94695
* c.opt (Wrange-loop-construct): New option.

gcc/cp/ChangeLog:

PR c++/94695
* call.c (ref_conv_binds_directly_p): New function.
* cp-tree.h (ref_conv_binds_directly_p): Declare.
* parser.c (warn_for_range_copy): New function.
(cp_convert_range_for): Call it.

gcc/ChangeLog:

PR c++/94695
* doc/invoke.texi: Document -Wrange-loop-construct.

gcc/testsuite/ChangeLog:

PR c++/94695
* g++.dg/warn/Wrange-loop-construct.C: New test.

testsuite: Remove unnecessary DWARF2 xfails on AIX

A number of DWARF2 testsuite xfails no longer trigger on AIX. This patch
removes the unnecessary XFAIL decorations that cause extraneous notices
that clutter the testsuite output.

gcc/testsuite/ChangeLog:

2020-09-29 David Edelsohn <dje.gcc@gmail.com>

* g++.dg/debug/dwarf2/align-1.C: Remove AIX XFAIL.
* g++.dg/debug/dwarf2/align-2.C: Same.
* g++.dg/debug/dwarf2/align-3.C: Same.
* g++.dg/debug/dwarf2/align-4.C: Same.
* g++.dg/debug/dwarf2/align-5.C: Same.
* g++.dg/debug/dwarf2/align-6.C: Same.
* g++.dg/debug/dwarf2/defaulted-member-function-1.C: Same.
* g++.dg/debug/dwarf2/defaulted-member-function-2.C: Same.
* g++.dg/debug/dwarf2/defaulted-member-function-3.C: Same.
* g++.dg/debug/dwarf2/inline-var-1.C: Same.
* g++.dg/debug/dwarf2/inline-var-2.C: Same.
* g++.dg/debug/dwarf2/inline-var-3.C: Same.
* g++.dg/debug/dwarf2/noreturn-function.C: Same.
* g++.dg/debug/dwarf2/ptrdmem-1.C: Same.
* g++.dg/debug/dwarf2/ref-2.C: Same.
* g++.dg/debug/dwarf2/ref-3.C: Same.
* g++.dg/debug/dwarf2/ref-4.C: Same.
* g++.dg/debug/dwarf2/refqual-1.C: Same.
* g++.dg/debug/dwarf2/refqual-2.C: Same.
* gcc.dg/debug/dwarf2/align-1.c: Same.
* gcc.dg/debug/dwarf2/align-2.c: Same.
* gcc.dg/debug/dwarf2/align-3.c: Same.
* gcc.dg/debug/dwarf2/align-4.c: Same.
* gcc.dg/debug/dwarf2/align-5.c: Same.
* gcc.dg/debug/dwarf2/align-6.c: Same.
* gcc.dg/debug/dwarf2/align-as-1.c: Same.
* gcc.dg/debug/dwarf2/dwarf2-macro.c: Same.
* gcc.dg/debug/dwarf2/dwarf2-macro2.c: Same.
* gcc.dg/debug/dwarf2/lang-c89.c: Same.
* gcc.dg/debug/dwarf2/noreturn-function-attribute.c: Same.
* gcc.dg/debug/dwarf2/noreturn-function-keyword.c: Same.
* gcc.dg/debug/dwarf2/pr71855.c: Same.
* gcc.dg/debug/dwarf2/inline5.c: Add XFAIL on AIX.

analyzer: fix signal-handler registration location [PR95188]

PR analyzer/95188 reports that diagnostics from
-Wanalyzer-unsafe-call-within-signal-handler use the wrong
source location when reporting the signal-handler registration
event in the diagnostic_path. The diagnostics erroneously use the
location of the first stmt in the basic block containing the call
to "signal", rather than that of the call itself.

Fixed thusly.

gcc/analyzer/ChangeLog:
PR analyzer/95188
* engine.cc (stmt_requires_new_enode_p): Split enodes before
"signal" calls.

gcc/testsuite/ChangeLog:
PR analyzer/95188
* gcc.dg/analyzer/signal-registration-loc.c: New test.

Fix GCC 10+ build failure with zstd version 1.2.0 or older.

Extends the configure check for zstd.h to also verify the zstd version,
since gcc requires features that only exist in 1.3.0 and newer. Without
this patch we get a build error for lto-compress.c when using an old zstd
version.
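
As a rough sketch of what such a version test compares: zstd.h encodes
its version as major * 10000 + minor * 100 + release, so 1.3.0
corresponds to 10300.  The helper names below are hypothetical, and the
threshold is derived from the "1.3.0 and newer" requirement stated
above, not copied from configure.ac.

```cpp
#include <cassert>

// Same encoding zstd.h uses for its version number macro.
constexpr unsigned sketch_zstd_version_number (unsigned major,
                                               unsigned minor,
                                               unsigned release)
{
  return major * 100 * 100 + minor * 100 + release;
}

// Assumed threshold: 1.3.0, as stated in the commit message.
constexpr bool sketch_zstd_new_enough (unsigned version_number)
{
  return version_number >= sketch_zstd_version_number (1, 3, 0);
}
```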

gcc/
PR bootstrap/97183
* configure.ac (gcc_cv_header_zstd_h): Check ZSTD_VERSION_NUMBER.
* configure: Regenerated.

arm: add support for Cortex-X1

This adds support for the Arm Cortex-X1 CPU. For more information about this
processor, see [0].

[0] : https://www.arm.com/products/cortex-x

gcc/ChangeLog:

* config/arm/arm-cpus.in: Add Cortex-X1 core.
* config/arm/arm-tables.opt: Regenerate.
* config/arm/arm-tune.md: Regenerate.
* doc/invoke.texi: Update docs.

aarch64: add support for Cortex-X1

This adds support for the Arm Cortex-X1 CPU in AArch64 GCC. For more
information about this processor, see [0].

[0] : https://www.arm.com/products/cortex-x

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Add Cortex-X1 Arm core.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Add -mtune=cortex-x1 docs.

analyzer: silence -Wsign-compare warnings

gcc/analyzer/ChangeLog:
* constraint-manager.cc
(constraint_manager::add_constraint_internal): Whitespace fixes.
Silence -Wsign-compare warning.
* engine.cc (maybe_process_run_of_before_supernode_enodes):
Silence -Wsign-compare warning.

c++: Hiddenness is a property of the symbol table

This patch moves the handling of decl-hiddenness entirely into the
name lookup machinery, where it belongs.  We need a few new flags,
because pressing the existing OVL_HIDDEN_P into play for non-function
decls doesn't work well.  For a local binding we only need one marker,
as there cannot be both a hidden implicit typedef and a hidden
function.  That's not true for namespace-scope, where they could both
be hidden.

The name-lookup machinery maintains the existing decl_hidden and co
flags, and asserts have been sprinkled around to make sure they are
consistent.  The next series of patches will remove those old markers.
(we'll need to keep one, as there are some special restrictions on
redeclaring friend functions with in-class definitions or default args.)

gcc/cp/
* cp-tree.h (ovl_insert): Change final parm to hidden-or-using
indicator.
* name-lookup.h (HIDDEN_TYPE_BINDING_P): New.
(struct cxx_binding): Add type_is_hidden flag.
* tree.c (ovl_insert): Change using_p parm to using_or_hidden,
adjust.
(ovl_skip_hidden): Assert we never see a naked hidden decl.
* decl.c (xref_tag_1): Delete unhiding friend from here (moved to
lookup_elaborated_type_1).
* name-lookup.c (STAT_TYPE_HIDDEN_P, STAT_DECL_HIDDEN_P): New.
(name_lookup::search_namespace_only): Check new hidden markers.
(cxx_binding_make): Clear HIDDEN_TYPE_BINDING_P.
(update_binding): Update new hidden markers.
(lookup_name_1): Check HIDDEN_TYPE_BINDING_P and simplify friend
ignoring.
(lookup_elaborated_type_1): Use new hidden markers.  Reveal the
decl here.

x86: Replace <enqcmdntrin.h> with <enqcmdintrin.h>

Fix 2 typos in config/i386/enqcmdintrin.h by replacing <enqcmdntrin.h>
with <enqcmdintrin.h>:

[hjl@gnu-cfl-2 x86-gcc]$ echo "#include <enqcmdintrin.h>" | gcc -S -o /dev/null -x c -
In file included from <stdin>:1:
/usr/lib/gcc/x86_64-redhat-linux/10/include/enqcmdintrin.h:25:3: error: #error "Never use <enqcmdntrin.h> directly; include <x86intrin.h> instead."
   25 | # error "Never use <enqcmdntrin.h> directly; include <x86intrin.h> instead."
      |   ^~~~~
[hjl@gnu-cfl-2 x86-gcc]$

and _ENQCMDNTRIN_H_INCLUDED with _ENQCMDINTRIN_H_INCLUDED.

gcc/

PR target/97247
* config/i386/enqcmdintrin.h: Replace <enqcmdntrin.h> with
<enqcmdintrin.h>.  Replace _ENQCMDNTRIN_H_INCLUDED with
_ENQCMDINTRIN_H_INCLUDED.

c++: Name lookup simplifications

Here are a few cleanups, prior to landing the hidden decl changes.

1) Clear cxx_binding flags in the allocator, not at each user of the allocator.

2) Refactor update_binding. The logic was getting too convoluted.

3) Set friendliness and anticipatedness before pushing a template decl (not after).

gcc/cp/
* name-lookup.c (create_local_binding): Do not clear
INHERITED_VALUE_BINDING_P here.
(name_lookup::process_binding): Move done hidden-decl triage to ...
(name_lookup::search_namespace_only): ... here, its only caller.
(cxx_binding_make): Clear flags here.
(push_binding): Not here.
(pop_local_binding): RAII.
(update_binding): Refactor.
(do_pushdecl): Assert we're never revealing a local binding.
(do_pushdecl_with_scope): Directly call do_pushdecl.
(get_class_binding): Do not clear LOCAL_BINDING_P here.
* pt.c (push_template_decl): Set friend & anticipated before
pushing.

testsuite: Prevent spellcheck-inttypes failures on AIX.

AIX stdio.h implicitly includes sys/types.h, which implicitly includes
inttypes.h. With a recent AIX header fixincludes change to unilaterally
define STDC Macros, the uses of inttypes in the GCC testsuite now fail.

This patch explicitly defines the _STD_TYPES_T macro when the test is
run on AIX so that the inttypes.h header behaves as the testcase requires.

gcc/testsuite/ChangeLog:

2020-09-29 David Edelsohn <dje.gcc@gmail.com>

* g++.dg/spellcheck-inttypes.C: Define _STD_TYPES_T on AIX.
* gcc.dg/spellcheck-inttypes.c: Same.

c++: Identifier type value should not update binding

This simplification removes some unneeded behaviour in
set_identifier_type_value_with_scope, which was updating the namespace
binding and causing update_binding to have to deal with meeting two
implicit typedefs.  But the typedef is already there, and there's no
other way to have two such typedef's collide (we'll already have dealt
with that in lookup_elaborated_type).

So, let's kill this crufty code.

gcc/cp/
* name-lookup.c (update_binding): We never meet two implicit
typedefs.
(do_pushdecl): Adjust set_identifier_type_value_with_scope calls.
(set_identifier_type_value_with_scope): Do not update binding in
the namespace-case.  Assert it is already there.

tree-optimization/97241 - fix ICE in reduction vectorization

The following moves an ad-hoc attempt at discovering the SLP node
for a stmt to the place where we can find it in lock-step when
we find the stmt itself.

2020-09-29 Richard Biener <rguenther@suse.de>

PR tree-optimization/97241
* tree-vect-loop.c (vectorizable_reduction): Move finding
the SLP node for the reduction stmt to a better place.

* gcc.dg/vect/pr97241.c: New testcase.

move permute optimization to optimize-slp

This moves optimizing permutes of SLP reductions to vect_optimize_slp,
eliding the global slp_loads array.

2020-09-29 Richard Biener <rguenther@suse.de>

* tree-vect-slp.c (vect_analyze_slp): Move SLP reduction
re-arrangement and SLP graph load gathering...
(vect_optimize_slp): ... here.
* tree-vectorizer.h (vec_info::slp_loads): Remove.

Add missing FSF copyright notes for x86 intrinsic headers.

gcc/ChangeLog:

PR target/97231
* config/i386/amxbf16intrin.h: Add FSF copyright notes.
* config/i386/amxint8intrin.h: Ditto.
* config/i386/amxtileintrin.h: Ditto.
* config/i386/avx512vp2intersectintrin.h: Ditto.
* config/i386/avx512vp2intersectvlintrin.h: Ditto.
* config/i386/pconfigintrin.h: Ditto.
* config/i386/tsxldtrkintrin.h: Ditto.
* config/i386/wbnoinvdintrin.h: Ditto.

tree-optimization/97238 - fix typo causing ICE

This fixes a typo causing a NULL dereference.

2020-09-29 Richard Biener <rguenther@suse.de>

PR tree-optimization/97238
* tree-ssa-reassoc.c (ovce_extract_ops): Fix typo.

* gcc.dg/pr97238.c: New testcase.

libgomp: disable barriers in nested teams

Both GCN and NVPTX allow nested parallel regions, but the barrier
implementation did not allow the nested teams to run independently of each
other (due to hardware limitations). This patch fixes that, under the
assumption that each thread will create a new subteam of one thread, by
simply not using barriers when there's no other thread to synchronise.
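
The idea can be modelled in a few lines.  This is a sketch under
assumptions, not the libgomp code: sketch_count_hw_barriers is a
hypothetical helper that counts how often a hardware barrier would be
executed, and the barrier itself is reduced to a counter increment.

```cpp
#include <cassert>

// Simulate WAITS barrier waits for a team of TOTAL_THREADS threads and
// return how many hardware barriers would actually be executed.
unsigned sketch_count_hw_barriers (unsigned total_threads, unsigned waits)
{
  unsigned hw_barriers = 0;
  for (unsigned i = 0; i < waits; ++i)
    {
      // With a single thread there is nobody to synchronise with,
      // so the hardware barrier is skipped.
      if (total_threads > 1)
        ++hw_barriers;  // stands in for the real barrier instruction
    }
  return hw_barriers;
}
```

A nested team of one thread therefore never touches the hardware
barrier, which is what lets such teams run independently of each other.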

libgomp/ChangeLog:

* config/gcn/bar.c (gomp_barrier_wait_end): Skip the barrier if the
total number of threads is one.
(gomp_team_barrier_wake): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
* config/nvptx/bar.c (gomp_barrier_wait_end): Likewise.
(gomp_team_barrier_wake): Likewise.
(gomp_team_barrier_wait_end): Likewise.
(gomp_team_barrier_wait_cancel_end): Likewise.
* testsuite/libgomp.c-c++-common/nested-parallel-unbalanced.c: New test.

arm: Add new vector mode macros

The AArch32 port now has three vector extensions: iwMMXt, Neon
and MVE.  We already have some named expanders that are shared
by all three, and soon we'll need more.

One way of handling this would be to use define_mode_iterators
that specify the condition for each mode.  For example,

  (V16QI "TARGET_NEON || TARGET_HAVE_MVE")
  (V8QI "TARGET_NEON || TARGET_REALLY_IWMMXT")
  ...
  (V2SF "TARGET_NEON && flag_unsafe_math_optimizations")

etc.  However, we'll need several mode iterators, and it would
be repetitive to specify the mode condition every time.

This patch therefore introduces per-mode macros that say whether
we can perform general arithmetic on the mode.  Initially there are
two sets of macros:

ARM_HAVE_NEON_<MODE>_ARITH
  true if Neon can handle general arithmetic on <MODE>

ARM_HAVE_<MODE>_ARITH
  true if any vector extension can handle general arithmetic on <MODE>

The macro definitions themselves are undeniably ugly, but hopefully
they're justified by the simplifications they allow.
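
The layering of the two sets can be sketched with ordinary functions.
This is a model only, not the arm.h definitions: sketch_target is a
hypothetical snapshot of the target flags, and the conditions are the
ones from the mode-iterator example above.

```cpp
#include <cassert>

// Hypothetical snapshot of the target flags named in the text.
struct sketch_target
{
  bool neon;         // TARGET_NEON
  bool mve;          // TARGET_HAVE_MVE
  bool iwmmxt;       // TARGET_REALLY_IWMMXT
  bool unsafe_math;  // flag_unsafe_math_optimizations
};

// First set: Neon can handle general arithmetic on the mode.
bool have_neon_v16qi_arith (const sketch_target &t) { return t.neon; }
bool have_neon_v8qi_arith (const sketch_target &t) { return t.neon; }
bool have_neon_v2sf_arith (const sketch_target &t)
{
  return t.neon && t.unsafe_math;
}

// Second set: any vector extension can handle it.
bool have_v16qi_arith (const sketch_target &t)
{
  return have_neon_v16qi_arith (t) || t.mve;
}
bool have_v8qi_arith (const sketch_target &t)
{
  return have_neon_v8qi_arith (t) || t.iwmmxt;
}
bool have_v2sf_arith (const sketch_target &t)
{
  return have_neon_v2sf_arith (t);
}
```

A shared expander would test the second set, while a Neon-only insn
pattern would test the first.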

The patch converts the addition patterns to use this scheme.

Previously there were three copies of the V8HF and V4HF addition
patterns for Neon:

(1) *add<VDQ:mode>3_neon, which provided plus:VnHF even without
    TARGET_NEON_FP16INST.  This was probably harmless since all the
    named patterns had an appropriate guard, but it is possible that
    something could have tried to generate the plus directly, such as
    by using a REG_EQUAL note to generate a new pattern.

(2) addv8hf3_neon and addv4hf3, which had the correct
    TARGET_NEON_FP16INST target condition, but unnecessarily required
    flag_unsafe_math_optimizations.  Unlike VnSF operations, VnHF
    operations do not force flush to zero.

(3) add<VH:mode>3_fp16, which provided plus:VnHF with the
    correct conditions (TARGET_NEON_FP16INST, with no
    flag_unsafe_math_optimizations test).

The patch in essence renames add<VH:mode>3_fp16 to *add<VH:mode>3_neon
(part of *add<VDQ:mode>3_neon) and removes the other two patterns.

gcc/
* config/arm/arm.h (ARM_HAVE_NEON_V8QI_ARITH, ARM_HAVE_NEON_V4HI_ARITH)
(ARM_HAVE_NEON_V2SI_ARITH, ARM_HAVE_NEON_V16QI_ARITH): New macros.
(ARM_HAVE_NEON_V8HI_ARITH, ARM_HAVE_NEON_V4SI_ARITH): Likewise.
(ARM_HAVE_NEON_V2DI_ARITH, ARM_HAVE_NEON_V4HF_ARITH): Likewise.
(ARM_HAVE_NEON_V8HF_ARITH, ARM_HAVE_NEON_V2SF_ARITH): Likewise.
(ARM_HAVE_NEON_V4SF_ARITH, ARM_HAVE_V8QI_ARITH, ARM_HAVE_V4HI_ARITH)
(ARM_HAVE_V2SI_ARITH, ARM_HAVE_V16QI_ARITH, ARM_HAVE_V8HI_ARITH)
(ARM_HAVE_V4SI_ARITH, ARM_HAVE_V2DI_ARITH, ARM_HAVE_V4HF_ARITH)
(ARM_HAVE_V2SF_ARITH, ARM_HAVE_V8HF_ARITH, ARM_HAVE_V4SF_ARITH):
Likewise.
* config/arm/iterators.md (VNIM, VNINOTM): Delete.
* config/arm/vec-common.md (add<VNIM:mode>3, addv8hf3)
(add<VNINOTM:mode>3): Replace with...
(add<VDQ:mode>3): ...this new expander.
* config/arm/neon.md (*add<VDQ:mode>3_neon): Use the new
ARM_HAVE_NEON_<MODE>_ARITH macros as the C condition.
(addv8hf3_neon, addv4hf3, add<VFH:mode>3_fp16): Delete in
favor of the above.
(neon_vadd<VH:mode>): Use gen_add<mode>3 instead of
gen_add<mode>3_fp16.

gcc/testsuite/
* gcc.target/arm/armv8_2-fp16-arith-2.c: Expect FP16 vectorization
even without -ffast-math.

RISC-V: Define __riscv_cmodel_medany for PIC mode.

- According to the conclusion in the RISC-V C API document, we decided to
deprecate the __riscv_cmodel_pic macro.

- __riscv_cmodel_pic is deprecated and will be removed in the next GCC
release.

[1] https://github.com/riscv/riscv-c-api-doc/pull/11

gcc/ChangeLog:

* config/riscv/riscv-c.c (riscv_cpu_cpp_builtins): Define
__riscv_cmodel_medany in PIC mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-3.c: Update testcase.
* gcc.target/riscv/predef-6.c: Ditto.

aarch64: Fix ordering of aarch64-cores.def

This patch moves the entry for Neoverse N2 (an Armv8.5-A CPU) after
Saphira (an Armv8.4-A CPU) to preserve the overall ordering in the file.

Committing as obvious.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def: Move neoverse-n2 after saphira.
* config/aarch64/aarch64-tune.md: Regenerate.

switch conversion: make a rapid speed up

gcc/ChangeLog:

PR tree-optimization/96979
* tree-switch-conversion.c (jump_table_cluster::can_be_handled):
Make a fast bail out.
(bit_test_cluster::can_be_handled): Likewise here.
* tree-switch-conversion.h (get_range): Use wi::to_wide instead
of a folding.

gcc/testsuite/ChangeLog:

PR tree-optimization/96979
* g++.dg/tree-ssa/pr96979.C: New test.

Revert "switch lowering: limit number of cluster attemps"

This reverts commit c6df6039e9180c580945266302ec14047d358364.

testsuite: Skip symver1 on AIX.

symver1.c is only valid on ELF targets. Add AIX to the skip list.

gcc/testsuite/ChangeLog

2020-09-28 David Edelsohn <dje.gcc@gmail.com>

* gcc.dg/ipa/symver1.c: Skip on AIX.

RISC-V/libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

Use `-fasynchronous-unwind-tables' rather than `-fexceptions
-fnon-call-exceptions' in LIB2_DIVMOD_FUNCS compilation flags so as to
provide unwind tables for the affected functions while not pulling the
unwinder proper, which is not required here.

Beyond saving program space it fixes a RISC-V glibc build error due to
unsatisfied `malloc' and `free' references from the unwinder causing
link errors with `ld.so' where libgcc has been built at -O0.

libgcc/
* config/riscv/t-elf (LIB2_DIVMOD_EXCEPTION_FLAGS): New
variable.

Daily bump.

analyzer: add some missing FINAL OVERRIDEs

Spotted by cppcheck.

gcc/analyzer/ChangeLog:
* region-model.h (binop_svalue::dyn_cast_binop_svalue): Remove
redundant "virtual". Add FINAL OVERRIDE.
(widening_svalue::dyn_cast_widening_svalue): Add FINAL OVERRIDE.
(compound_svalue::dyn_cast_compound_svalue): Likewise.
(conjured_svalue::dyn_cast_conjured_svalue): Likewise.

analyzer: remove unused field

I added this field (and the struct itself) in the rewrite of region and
value-handling (808f4dfeb3a95f50f15e71148e5c1067f90a126d), but the field
was never used.

Found by cppcheck.

gcc/analyzer/ChangeLog:
* diagnostic-manager.cc (null_assignment_sm_context::m_visitor):
Remove unused field.

analyzer: fix ICE on non-pointer longjmp [PR97233]

gcc/analyzer/ChangeLog:
PR analyzer/97233
* analyzer.cc (is_longjmp_call_p): Require the initial argument
to be a pointer.
* engine.cc (exploded_node::on_longjmp): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/97233
* gcc.dg/analyzer/pr97233.c: New test.

analyzer: fix sm_state_map::print

In 10fc42a8396072912e9d9d940fba25950b3fdfc5 I converted state_t from
unsigned to const state *, but missed this comparison against 0.

gcc/analyzer/ChangeLog:
* program-state.cc (sm_state_map::print): Update check
for m_global_state being the start state.

net: add hurd build tag

Patch from Svante Signell.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/257857

irange_allocator class

This is the irange storage class. It is used to allocate the
minimum amount of storage needed for a given irange. Storage is
automatically freed at destruction of the storage class.

It is meant for long-term storage, as opposed to int_range_max,
which is meant for intermediate temporary results on the stack.

The general gist is:

irange_allocator alloc;

// Allocate an irange of 5 sub-ranges.
irange *p = alloc.allocate (5);

// Allocate an irange of 3 sub-ranges.
irange *q = alloc.allocate (3);

// Allocate an irange with as many sub-ranges as are currently
// used in "some_other_range".
irange *r = alloc.allocate (some_other_range);
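
The ownership rule described above (storage sized to the request and
released when the allocator is destroyed) can be modelled outside GCC.
What follows is only a standalone sketch, not GCC's implementation:
toy_range, toy_range_allocator and all of their members are hypothetical
names invented for the illustration, and the real class may manage its
storage quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an irange: a fixed-capacity list of
// [lower, upper] pairs.  This is not GCC's class.
struct toy_range
{
  std::size_t capacity;            // Number of sub-ranges reserved.
  std::pair<long, long> *pairs;    // Points into allocator-owned storage.
};

// Hypothetical allocator: each range gets exactly the capacity asked
// for, and everything is released when the allocator is destroyed.
class toy_range_allocator
{
public:
  // Allocate a range with room for NUM_PAIRS sub-ranges.
  toy_range *allocate (std::size_t num_pairs)
  {
    m_storage.emplace_back (new std::pair<long, long>[num_pairs] ());
    m_ranges.emplace_back (new toy_range{num_pairs, m_storage.back ().get ()});
    return m_ranges.back ().get ();
  }

  // Allocate a range with the same capacity as OTHER.
  toy_range *allocate (const toy_range &other)
  {
    return allocate (other.capacity);
  }

  // Number of ranges currently owned by this allocator.
  std::size_t live_ranges () const { return m_ranges.size (); }

private:
  std::vector<std::unique_ptr<std::pair<long, long>[]>> m_storage;
  std::vector<std::unique_ptr<toy_range>> m_ranges;
};

// Mirrors the three calls in the gist; all storage is freed when
// ALLOC goes out of scope at the end of the function.
inline void toy_demo ()
{
  toy_range_allocator alloc;
  toy_range *p = alloc.allocate (5);
  toy_range *r = alloc.allocate (*p);
  assert (r->capacity == p->capacity);
}
```

The pointers handed out stay valid only for the lifetime of the
allocator object, which is what makes this pattern suitable for
long-lived results rather than stack temporaries.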

gcc/ChangeLog:

* value-range.h (class irange): Add irange_allocator friend.
(class irange_allocator): New.

libgfortran/m4/unpack.m4: Silence -Wmaybe-uninitialized

libgfortran/ChangeLog:

* m4/unpack.m4 (unpack0_'rtype_code`,
unpack1_'rtype_code`): Move 'rstride[0]' initialization outside
conditional branch to silence -Wmaybe-uninitialized.
* generated/unpack_c10.c: Regenerate.
* generated/unpack_c16.c: Regenerate.
* generated/unpack_c4.c: Regenerate.
* generated/unpack_c8.c: Regenerate.
* generated/unpack_i1.c: Regenerate.
* generated/unpack_i16.c: Regenerate.
* generated/unpack_i2.c: Regenerate.
* generated/unpack_i4.c: Regenerate.
* generated/unpack_i8.c: Regenerate.
* generated/unpack_r10.c: Regenerate.
* generated/unpack_r16.c: Regenerate.
* generated/unpack_r4.c: Regenerate.
* generated/unpack_r8.c: Regenerate.

libbacktrace: build mtest.dSYM if using dsymutil

libbacktrace/ChangeLog:
PR libbacktrace/97082
* Makefile.am (check_DATA): Add mtest.dSYM if USE_DSYMUTIL.
* Makefile.in: Regenerate.

libbacktrace: only run dsymutil with Mach-O

libbacktrace/ChangeLog:
PR libbacktrace/97227
* configure.ac (USE_DSYMUTIL): Define instead of HAVE_DSYMUTIL.
* Makefile.am: Change all uses of HAVE_DSYMUTIL to USE_DSYMUTIL.
* configure: Regenerate.
* Makefile.in: Regenerate.

OpenMP: Handle cpp_implicit_alias in declare-target discovery (PR96390)

gcc/ChangeLog:

PR middle-end/96390
* omp-offload.c (omp_discover_declare_target_tgt_fn_r): Handle
alias nodes.

libgomp/ChangeLog:

PR middle-end/96390
* testsuite/libgomp.c++/pr96390.C: New test.
* testsuite/libgomp.c-c++-common/pr96390.c: New test.

libstdc++: Rearrange some range adaptors' data members

Since the standard range adaptors are specified to derive from the empty
class view_base, having their first data member store the underlying
view is suboptimal, for if the underlying view also derives from
view_base then the two view_base subobjects will be adjacent; this
prevents the compiler from applying the empty base optimization to elide
away the storage for these two empty bases.

This patch improves the situation by declaring the _M_base data member
last instead of first in each range adaptor that has more than one data
member, so that the empty base optimization can apply in more cases.
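
The layout effect can be reproduced with a few toy types. This is a
sketch under stated assumptions, not libstdc++ code: empty_tag stands in
for view_base, and inner_view, adaptor_base_first and adaptor_base_last
are hypothetical names. The exact sizes depend on the ABI; with the
Itanium C++ ABI and a 4-byte int the first adaptor typically occupies 12
bytes and the second 8.

```cpp
#include <cassert>

// Hypothetical stand-in for view_base.
struct empty_tag {};

// A "view" that derives from the empty tag and holds one int.
struct inner_view : empty_tag
{
  int value;
};

// Underlying view stored first: its empty_tag subobject would land at
// offset 0 together with the adaptor's own empty_tag base.  Two
// subobjects of the same type may not share an address, so the member
// has to be moved and padding typically appears.
struct adaptor_base_first : empty_tag
{
  inner_view base;
  int extra;
};

// Underlying view stored last: EXTRA can occupy offset 0 and the two
// empty_tag subobjects no longer collide.
struct adaptor_base_last : empty_tag
{
  int extra;
  inner_view base;
};

// Bytes saved by the reordering on the current ABI (may be 0).
constexpr long bytes_saved ()
{
  return static_cast<long> (sizeof (adaptor_base_first))
	 - static_cast<long> (sizeof (adaptor_base_last));
}

// Reordering should never make the adaptor larger.
inline void check_layout ()
{
  assert (bytes_saved () >= 0);
}
```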

libstdc++-v3/ChangeLog:

* include/std/ranges (filter_view): Declare the data member
_M_base last instead of first, and adjust constructors' member
initializer lists accordingly.
(transform_view): Likewise.
(take_view): Likewise.
(take_while_view): Likewise.
(drop_view): Likewise.
(drop_while_view): Likewise.
(join_view): Likewise.
(split_view): Likewise (and tweak nearby formatting).
(reverse_view): Likewise.
* testsuite/std/ranges/adaptors/sizeof.cc: Update expected
sizes.

libstdc++: Add test that tracks range adaptors' sizes

libstdc++-v3/ChangeLog:

* testsuite/std/ranges/adaptors/sizeof.cc: New test.

libstdc++: Reduce the size of a subrange with empty sentinel type

libstdc++-v3/ChangeLog:

* include/bits/ranges_util.h (subrange::_M_end): Give it
[[no_unique_address]].
* testsuite/std/ranges/subrange/sizeof.cc: New test.

libstdc++: Reduce the size of an unbounded iota_view

libstdc++-v3/ChangeLog:

* include/std/ranges (iota_view::_M_bound): Give it
[[no_unique_address]].
* testsuite/std/ranges/iota/iota_view.cc: Check that an
unbounded iota_view has minimal size.
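
The effect of [[no_unique_address]] on an empty member, as used in the
two commits above, can be seen in isolation. The types below are
hypothetical stand-ins rather than the libstdc++ definitions:
empty_sentinel plays the role of an empty sentinel or bound type. A
compiler that ignores the attribute will simply report equal sizes.

```cpp
#include <cassert>

// Hypothetical empty sentinel type.
struct empty_sentinel {};

// Bound stored as an ordinary member: it needs its own address, so the
// struct typically grows by the alignment of int.
struct plain_iota
{
  int value;
  empty_sentinel bound;
};

// Bound marked [[no_unique_address]]: the empty member may overlap
// other storage, so the struct can be as small as a bare int.
struct compact_iota
{
  int value;
  [[no_unique_address]] empty_sentinel bound;
};

// The attribute must never make the type larger.
inline void check_sizes ()
{
  assert (sizeof (compact_iota) <= sizeof (plain_iota));
}
```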

rs6000: Add tests for _mm_insert_epi{8,32,64}

Copied from gcc.target/i386.

2020-09-23 Paul A. Clarke <pc@us.ibm.com>

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/sse4_1-pinsrb.c: New test.
* gcc.target/powerpc/sse4_1-pinsrd.c: New test.
* gcc.target/powerpc/sse4_1-pinsrq.c: New test.

rs6000: Support _mm_insert_epi{8,32,64}

Add compatibility implementations for SSE4.1 intrinsics
_mm_insert_epi8, _mm_insert_epi32, _mm_insert_epi64.
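
For readers unfamiliar with these intrinsics, the following portable
sketch models their documented x86 behaviour: return a copy of the
128-bit vector with one lane replaced, where only the low bits of the
index select the lane. It is not the smmintrin.h implementation; lanes
and model_insert are hypothetical names, and the real intrinsics take a
compile-time immediate and operate on __m128i.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 128-bit vector viewed as 16 / sizeof (Lane) lanes (hypothetical).
template <typename Lane>
using lanes = std::array<Lane, 16 / sizeof (Lane)>;

// Return a copy of A with the selected lane replaced by VALUE.  The
// lane count is a power of two, so the modulo keeps only the low index
// bits: 4 bits for 8-bit lanes, 2 for 32-bit, 1 for 64-bit.
template <typename Lane>
lanes<Lane> model_insert (lanes<Lane> a, Lane value, unsigned index)
{
  a[index % a.size ()] = value;
  return a;
}

// Example: replace lane 1 of a four-lane vector.
inline void insert_demo ()
{
  lanes<std::int32_t> v = {{1, 2, 3, 4}};
  lanes<std::int32_t> r = model_insert<std::int32_t> (v, 42, 1);
  assert (r[1] == 42 && r[0] == 1);
}
```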

2020-09-23 Paul A. Clarke <pc@us.ibm.com>

gcc/
* config/rs6000/smmintrin.h (_mm_insert_epi8): New.
(_mm_insert_epi32): New.
(_mm_insert_epi64): New.

Enable GCC support for AMX-TILE,AMX-INT8,AMX-BF16.

AMX-TILE:ldtilecfg/sttilecfg/tileloadd/tileloaddt1/tilezero/tilerelease
AMX-INT8:tdpbssd/tdpbsud/tdpbusd/tdpbuud
AMX-BF16:tdpbf16ps

gcc/ChangeLog

* common/config/i386/i386-common.c (OPTION_MASK_ISA2_AMX_TILE_SET,
OPTION_MASK_ISA2_AMX_INT8_SET, OPTION_MASK_ISA2_AMX_BF16_SET,
OPTION_MASK_ISA2_AMX_TILE_UNSET, OPTION_MASK_ISA2_AMX_INT8_UNSET,
OPTION_MASK_ISA2_AMX_BF16_UNSET, OPTION_MASK_ISA2_XSAVE_UNSET):
New macros.
(ix86_handle_option): Handle -mamx-tile, -mamx-int8, -mamx-bf16.
* common/config/i386/i386-cpuinfo.h (processor_types): Add
FEATURE_AMX_TILE, FEATURE_AMX_INT8, FEATURE_AMX_BF16.
* common/config/i386/cpuinfo.h (XSTATE_TILECFG,
XSTATE_TILEDATA, XCR_AMX_ENABLED_MASK): New macros.
(get_available_features): Enable AMX features only if
their states are supported by OSXSAVE.
* common/config/i386/i386-isas.h: Add ISA_NAME_TABLE_ENTRY
for amx-tile, amx-int8, amx-bf16.
* config.gcc: Add amxtileintrin.h, amxint8intrin.h,
amxbf16intrin.h to extra headers.
* config/i386/amxbf16intrin.h: New file.
* config/i386/amxint8intrin.h: Ditto.
* config/i386/amxtileintrin.h: Ditto.
* config/i386/cpuid.h (bit_AMX_BF16, bit_AMX_TILE, bit_AMX_INT8):
New macros.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AMX_TILE__, __AMX_INT8__, __AMX_BF16__.
* config/i386/i386-options.c (ix86_target_string): Add
-mamx-tile, -mamx-int8, -mamx-bf16.
(ix86_option_override_internal): Handle AMX-TILE,
AMX-INT8, AMX-BF16.
* config/i386/i386.h (TARGET_AMX_TILE, TARGET_AMX_TILE_P,
TARGET_AMX_INT8, TARGET_AMX_INT8_P, TARGET_AMX_BF16_P,
PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16): New macros.
* config/i386/i386.opt: Add -mamx-tile, -mamx-int8, -mamx-bf16.
* config/i386/immintrin.h: Include amxtileintrin.h,
amxint8intrin.h, amxbf16intrin.h.
* doc/invoke.texi: Document -mamx-tile, -mamx-int8, -mamx-bf16.
* doc/extend.texi: Document amx-tile, amx-int8, amx-bf16.
* doc/sourcebuild.texi (Effective-Target Keywords, Other
hardware attributes): Document amx_int8, amx_tile, amx_bf16.

gcc/testsuite/ChangeLog

* lib/target-supports.exp (check_effective_target_amx_tile,
check_effective_target_amx_int8,
check_effective_target_amx_bf16): New proc.
* g++.dg/other/i386-2.C: Add -mamx-tile, -mamx-int8, -mamx-bf16.
* g++.dg/other/i386-3.C: Ditto.
* gcc.target/i386/sse-12.c: Ditto.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/amx-check.h: New header file.
* gcc.target/i386/amxbf16-asmatt-1.c: New test.
* gcc.target/i386/amxint8-asmatt-1.c: New test.
* gcc.target/i386/amxtile-asmatt-1.c: Ditto.
* gcc.target/i386/amxbf16-asmintel-1.c: Ditto.
* gcc.target/i386/amxint8-asmintel-1.c: Ditto.
* gcc.target/i386/amxtile-asmintel-1.c: Ditto.
* gcc.target/i386/amxbf16-dpbf16ps-2.c: Ditto.
* gcc.target/i386/amxint8-dpbssd-2.c: Ditto.
* gcc.target/i386/amxint8-dpbsud-2.c: Ditto.
* gcc.target/i386/amxint8-dpbusd-2.c: Ditto.
* gcc.target/i386/amxint8-dpbuud-2.c: Ditto.
* gcc.target/i386/amxtile-2.c: Ditto.
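
User code can test for the feature macros named in the ChangeLog above
(__AMX_TILE__, __AMX_INT8__, __AMX_BF16__), which are expected to be
predefined when the corresponding -mamx-* option is in effect. The
sketch below only reports which of them are visible to the current
translation unit; have_amx_tile, have_amx_int8, have_amx_bf16 and
amx_feature_count are hypothetical names, and no AMX instruction is
executed.

```cpp
#include <cassert>

// Record at compile time whether each AMX feature macro is predefined.
#ifdef __AMX_TILE__
constexpr bool have_amx_tile = true;
#else
constexpr bool have_amx_tile = false;
#endif

#ifdef __AMX_INT8__
constexpr bool have_amx_int8 = true;
#else
constexpr bool have_amx_int8 = false;
#endif

#ifdef __AMX_BF16__
constexpr bool have_amx_bf16 = true;
#else
constexpr bool have_amx_bf16 = false;
#endif

// Number of AMX feature sets enabled for this translation unit (0 to 3).
constexpr int amx_feature_count ()
{
  return (have_amx_tile ? 1 : 0)
	 + (have_amx_int8 ? 1 : 0)
	 + (have_amx_bf16 ? 1 : 0);
}

// Sanity check on the count.
inline void check_amx_count ()
{
  assert (amx_feature_count () >= 0 && amx_feature_count () <= 3);
}
```

A macro being defined only says the compiler was allowed to emit the
instructions; whether the running CPU and operating system support them
still has to be checked at run time.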

aarch64: Do not alter force_reg returned rtx expanding pauth builtins

2020-09-21 Andrea Corallo <andrea.corallo@arm.com>

* config/aarch64/aarch64-builtins.c
(aarch64_general_expand_builtin): Do not alter value on a
force_reg returned rtx.