git.libre-soc.org Git

jit: add support for inline asm [PR87291]

This patch adds various entrypoints to libgccjit for directly embedding
asm statements into a compile, analogous to inline asm in the C frontend:
  gcc_jit_block_add_extended_asm
  gcc_jit_block_end_with_extended_asm_goto
  gcc_jit_extended_asm_as_object
  gcc_jit_extended_asm_set_volatile_flag
  gcc_jit_extended_asm_set_inline_flag
  gcc_jit_extended_asm_add_output_operand
  gcc_jit_extended_asm_add_input_operand
  gcc_jit_extended_asm_add_clobber
  gcc_jit_context_add_top_level_asm

gcc/jit/ChangeLog:
PR jit/87291
* docs/cp/topics/asm.rst: New file.
* docs/cp/topics/index.rst (Topic Reference): Add it.
* docs/topics/asm.rst: New file.
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_15): New.
* docs/topics/functions.rst (Statements): Add link to extended
asm.
* docs/topics/index.rst (Topic Reference): Add asm.rst.
* docs/topics/objects.rst: Add gcc_jit_extended_asm to ASCII art.
* docs/_build/texinfo/Makefile: Regenerate.
* docs/_build/texinfo/libgccjit.texi: Regenerate.
* jit-common.h (gcc::jit::recording::extended_asm): New forward
decl.
(gcc::jit::recording::top_level_asm): Likewise.
* jit-playback.c: Include "stmt.h".
(build_string): New.
(gcc::jit::playback::context::new_string_literal): Disambiguate
build_string call.
(gcc::jit::playback::context::add_top_level_asm): New.
(build_operand_chain): New.
(build_clobbers): New.
(build_goto_operands): New.
(gcc::jit::playback::block::add_extended_asm): New.
* jit-playback.h (gcc::jit::playback::context::add_top_level_asm):
New decl.
(struct gcc::jit::playback::asm_operand): New struct.
(gcc::jit::playback::block::add_extended_asm): New decl.
* jit-recording.c (gcc::jit::recording::context::dump_to_file):
Dump top-level asms.
(gcc::jit::recording::context::add_top_level_asm): New.
(gcc::jit::recording::block::add_extended_asm): New.
(gcc::jit::recording::block::end_with_extended_asm_goto): New.
(gcc::jit::recording::asm_operand::asm_operand): New.
(gcc::jit::recording::asm_operand::print): New.
(gcc::jit::recording::asm_operand::make_debug_string): New.
(gcc::jit::recording::output_asm_operand::write_reproducer): New.
(gcc::jit::recording::output_asm_operand::print): New.
(gcc::jit::recording::input_asm_operand::write_reproducer): New.
(gcc::jit::recording::input_asm_operand::print): New.
(gcc::jit::recording::extended_asm::add_output_operand): New.
(gcc::jit::recording::extended_asm::add_input_operand): New.
(gcc::jit::recording::extended_asm::add_clobber): New.
(gcc::jit::recording::extended_asm::replay_into): New.
(gcc::jit::recording::extended_asm::make_debug_string): New.
(gcc::jit::recording::extended_asm::write_flags): New.
(gcc::jit::recording::extended_asm::write_clobbers): New.
(gcc::jit::recording::extended_asm_simple::write_reproducer): New.
(gcc::jit::recording::extended_asm::maybe_populate_playback_blocks):
New.
(gcc::jit::recording::extended_asm_goto::extended_asm_goto): New.
(gcc::jit::recording::extended_asm_goto::replay_into): New.
(gcc::jit::recording::extended_asm_goto::write_reproducer): New.
(gcc::jit::recording::extended_asm_goto::get_successor_blocks):
New.
(gcc::jit::recording::extended_asm_goto::maybe_print_gotos): New.
(gcc::jit::recording::extended_asm_goto::maybe_populate_playback_blocks):
New.
(gcc::jit::recording::top_level_asm::top_level_asm): New.
(gcc::jit::recording::top_level_asm::replay_into): New.
(gcc::jit::recording::top_level_asm::make_debug_string): New.
(gcc::jit::recording::top_level_asm::write_to_dump): New.
(gcc::jit::recording::top_level_asm::write_reproducer): New.
* jit-recording.h
(gcc::jit::recording::context::add_top_level_asm): New decl.
(gcc::jit::recording::context::m_top_level_asms): New field.
(gcc::jit::recording::block::add_extended_asm): New decl.
(gcc::jit::recording::block::end_with_extended_asm_goto): New
decl.
(gcc::jit::recording::asm_operand): New class.
(gcc::jit::recording::output_asm_operand): New class.
(gcc::jit::recording::input_asm_operand): New class.
(gcc::jit::recording::extended_asm): New class.
(gcc::jit::recording::extended_asm_simple): New class.
(gcc::jit::recording::extended_asm_goto): New class.
(gcc::jit::recording::top_level_asm): New class.
* libgccjit++.h (gccjit::extended_asm): New forward decl.
(gccjit::context::add_top_level_asm): New.
(gccjit::block::add_extended_asm): New.
(gccjit::block::end_with_extended_asm_goto): New.
(gccjit::extended_asm): New class.
(gccjit::extended_asm::extended_asm): New ctors.
(gccjit::extended_asm::set_volatile_flag): New.
(gccjit::extended_asm::set_inline_flag): New.
(gccjit::extended_asm::add_output_operand): New.
(gccjit::extended_asm::add_input_operand): New.
(gccjit::extended_asm::add_clobber): New.
(gccjit::extended_asm::get_inner_extended_asm): New.
* libgccjit.c (struct gcc_jit_extended_asm): New.
(jit_error): Make "loc" param take a gcc::jit::recording::location *
rather than a gcc_jit_location *.
(gcc_jit_block_add_extended_asm): New entrypoint.
(gcc_jit_block_end_with_extended_asm_goto): New entrypoint.
(gcc_jit_extended_asm_as_object): New entrypoint.
(gcc_jit_extended_asm_set_volatile_flag): New entrypoint.
(gcc_jit_extended_asm_set_inline_flag): New entrypoint.
(gcc_jit_extended_asm_add_output_operand): New entrypoint.
(gcc_jit_extended_asm_add_clobber): New entrypoint.
(gcc_jit_context_add_top_level_asm): New entrypoint.
* libgccjit.h: Add gcc_jit_extended_asm to ASCII art.
(gcc_jit_extended_asm): New typedef.
(LIBGCCJIT_HAVE_ASM_STATEMENTS): New define.
(gcc_jit_block_add_extended_asm): New entrypoint.
(gcc_jit_block_end_with_extended_asm_goto): New entrypoint.
(gcc_jit_extended_asm_as_object): New entrypoint.
(gcc_jit_extended_asm_set_volatile_flag): New entrypoint.
(gcc_jit_extended_asm_set_inline_flag): New entrypoint.
(gcc_jit_extended_asm_add_output_operand): New entrypoint.
(gcc_jit_extended_asm_add_input_operand): New entrypoint.
(gcc_jit_extended_asm_add_clobber): New entrypoint.
(gcc_jit_context_add_top_level_asm): New entrypoint.
* libgccjit.map (LIBGCCJIT_ABI_15): New.

gcc/testsuite/ChangeLog:
PR jit/87291
* jit.dg/jit.exp: Load target-supports-dg.exp.
Set dg-do-what-default.
(jit-dg-test): Set dg-do-what and call dg-get-options, skipping
the test if it's not supported on the given target.
* jit.dg/test-asm.c: New test.
* jit.dg/test-asm.cc: New test.

jit: fix string escaping

This patch fixes a bug in recording::string::make_debug_string in which
'\t' and '\n' were "escaped" by simply prepending a '\', thus emitting
'\' then '\n', rather than '\' then 'n'.  It also removes a hack that
determined if a string is to be escaped by checking for a leading '"',
by instead adding a flag.

gcc/jit/ChangeLog:
* jit-recording.c (recording::context::new_string): Add "escaped"
param and use it when creating the new recording::string instance.
(recording::string::string): Add "escaped" param and use it to
initialize m_escaped.
(recording::string::make_debug_string): Replace check that first
char is double-quote with use of m_escaped.  Fix escaping of
'\t' and '\n'.  Set "escaped" on the result.
* jit-recording.h (recording::context::new_string): Add "escaped"
param.
(recording::string::string): Add "escaped" param.
(recording::string::m_escaped): New field.

gcc/testsuite/ChangeLog:
* jit.dg/test-debug-strings.c (create_code): Add tests of
string literal escaping.

libgccjit.h: fix typo in comment

gcc/jit/ChangeLog:
* libgccjit.h: Fix typo in comment.

RISC-V: Enable ifunc if it was supported in the binutils for linux toolchain.

gcc/
* configure: Regenerated.
* configure.ac: If ifunc was supported in the binutils for
linux toolchain, then set enable_gnu_indirect_function to yes.

c: C2x __has_c_attribute

C2x adds the __has_c_attribute preprocessor operator, similar to C++
__has_cpp_attribute.

GCC implements __has_cpp_attribute as exactly equivalent to
__has_attribute.  (The documentation says they differ regarding the
values returned for standard attributes, but that's actually only a
matter of the particular nonzero value returned not being specified in
the documentation for __has_attribute; the implementation makes no
distinction between the two.)

I don't think having them exactly equivalent is actually correct,
either for __has_cpp_attribute or for __has_c_attribute.
Specifically, I think it is only correct for __has_cpp_attribute or
__has_c_attribute to return nonzero if the given attribute is
supported, with the particular pp-tokens passed to __has_cpp_attribute
or __has_c_attribute, with [[]] syntax, not if it's only accepted in
__attribute__ or with gnu:: added in [[]].  For example, they should
return nonzero for gnu::packed, but zero for plain packed, because
[[gnu::packed]] is accepted but [[packed]] is ignored as not a
standard attribute.

This patch implements that for __has_c_attribute, leaving any changes
to __has_cpp_attribute for the C++ maintainers.  A new
BT_HAS_STD_ATTRIBUTE is added for __has_c_attribute (which I think,
based on the above, would actually be correct to use for
__has_cpp_attribute as well).  The code in c_common_has_attribute that
deals with scopes has its C++ conditional removed; instead, whether
the language is C or C++ is used only to determine the numeric values
returned for standard attributes (and which standard attributes are
handled there at all).  A new argument is passed to
c_common_has_attribute to distinguish BT_HAS_STD_ATTRIBUTE from
BT_HAS_ATTRIBUTE, and that argument is used to stop attributes with no
scope specified from being accepted with __has_c_attribute unless they
are one of the known standard attributes and so handled specially.

Although the standard specify constants ending with 'L' as the values
for the standard attributes, there is no correctness issue with the
lack of code in GCC to add that 'L' to the expansion:
__has_c_attribute and __has_cpp_attribute are expanded in #if after
other macro expansion has occurred, with no semantics being specified
if they occur outside #if, so there is no way for a conforming program
to inspect the exact text of the expansion of those macros, only to
use the resulting pp-number in a #if expression, where long and int
have the same set of values.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/
2020-11-12  Joseph Myers  <joseph@codesourcery.com>

* doc/cpp.texi (__has_attribute): Document when scopes are allowed
for C.
(__has_c_attribute): New.

gcc/c-family/
2020-11-12  Joseph Myers  <joseph@codesourcery.com>

* c-lex.c (c_common_has_attribute): Take argument std_syntax.
Allow scope for C.  Handle standard attributes for C.  Do not
accept unscoped attributes if std_syntax and not handled as
standard attributes.
* c-common.h (c_common_has_attribute): Update prototype.

gcc/testsuite/
2020-11-12  Joseph Myers  <joseph@codesourcery.com>

* gcc.dg/c2x-has-c-attribute-1.c, gcc.dg/c2x-has-c-attribute-2.c,
gcc.dg/c2x-has-c-attribute-3.c, gcc.dg/c2x-has-c-attribute-4.c:
New tests.

libcpp/
2020-11-12  Joseph Myers  <joseph@codesourcery.com>

* include/cpplib.h (struct cpp_callbacks): Add bool argument to
has_attribute.
(enum cpp_builtin_type): Add BT_HAS_STD_ATTRIBUTE.
* init.c (builtin_array): Add __has_c_attribute.
(cpp_init_special_builtins): Handle BT_HAS_STD_ATTRIBUTE.
* macro.c (_cpp_builtin_macro_text): Handle BT_HAS_STD_ATTRIBUTE.
Update call to has_attribute for BT_HAS_ATTRIBUTE.
* traditional.c (fun_like_macro): Handle BT_HAS_STD_ATTRIBUTE.

openmp: Implement allocate clause in omp lowering.

For now, task/taskloop constructs aren't handled and C/C++ array reductions
and reductions with task or inscan modifiers need further work.
Instead of calling omp_alloc/omp_free (where the former doesn't have
alignment argument and omp_aligned_alloc is 5.1 only feature), this calls
GOMP_alloc/GOMP_free, so that the library can fail if it would fall back
into NULL (exception is zero length allocations).

2020-11-12  Jakub Jelinek  <jakub@redhat.com>

gcc/
* builtin-types.def (BT_FN_PTR_SIZE_SIZE_PTRMODE): New function type.
* omp-builtins.def (BUILT_IN_GOACC_DECLARE): Move earlier.
(BUILT_IN_GOMP_ALLOC, BUILT_IN_GOMP_FREE): New builtins.
* gimplify.c (gimplify_scan_omp_clauses): Force allocator into a
decl if it is not NULL, INTEGER_CST or decl.
(gimplify_adjust_omp_clauses): Clear GOVD_EXPLICIT on explicit clauses
which are being removed.  Remove allocate clauses for variables not seen
if they are private, firstprivate or linear too.  Call
omp_notice_variable on the allocator otherwise.
(gimplify_omp_for): Handle iterator vars mentioned in allocate clauses
similarly to non-is_gimple_reg iterators.
* omp-low.c (struct omp_context): Add allocate_map field.
(delete_omp_context): Delete it.
(scan_sharing_clauses): Fill it from allocate clauses.  Remove it
if mentioned also in shared clause.
(lower_private_allocate): New function.
(lower_rec_input_clauses): Handle allocate clause for privatized
variables, except for task/taskloop, C/C++ array reductions for now
and task/inscan variables.
(lower_send_shared_vars): Don't consider variables in allocate_map
as shared.
* omp-expand.c (expand_omp_for_generic, expand_omp_for_static_nochunk,
expand_omp_for_static_chunk): Use expand_omp_build_assign instead of
gimple_build_assign + gsi_insert_after.
* builtins.c (builtin_fnspec): Handle BUILTIN_GOMP_ALLOC and
BUILTIN_GOMP_FREE.
* tree-ssa-ccp.c (evaluate_stmt): Handle BUILTIN_GOMP_ALLOC.
* tree-ssa-dce.c (mark_stmt_if_obviously_necessary): Handle
BUILTIN_GOMP_ALLOC.
(mark_all_reaching_defs_necessary_1): Handle BUILTIN_GOMP_ALLOC
and BUILTIN_GOMP_FREE.
(propagate_necessity): Likewise.
gcc/fortran/
* f95-lang.c (ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LIST):
Define.
(gfc_init_builtin_functions): Add alloc_size and warn_unused_result
attributes to __builtin_GOMP_alloc.
* types.def (BT_PTRMODE): New primitive type.
(BT_FN_VOID_PTR_PTRMODE, BT_FN_PTR_SIZE_SIZE_PTRMODE): New function
types.
libgomp/
* libgomp.map (GOMP_alloc, GOMP_free): Export at GOMP_5.0.1.
* omp.h.in (omp_alloc): Add malloc and alloc_size attributes.
* libgomp_g.h (GOMP_alloc, GOMP_free): Declare.
* allocator.c (omp_aligned_alloc): New for now static function,
add alignment argument and handle it.
(omp_alloc): Reimplement using omp_aligned_alloc.
(GOMP_alloc, GOMP_free): New functions.
(omp_free): Add ialias.
* testsuite/libgomp.c-c++-common/allocate-1.c: New test.
* testsuite/libgomp.c++/allocate-1.C: New test.

Adjust 'libgomp.oacc-fortran/attach-descriptor-1.f90' for improved location information

Fix-up for commit b71ff8c15f5a7d6b1cc1524b4d27843f0d88dbda "Fortran: improve
location data for OpenACC/OpenMP directives [PR97782]".

libgomp/
PR fortran/97782
* testsuite/libgomp.oacc-fortran/attach-descriptor-1.f90: Adjust.

cgraph: Avoid segfault when attempting to dump NULL clone_info

cgraph_node::materialize_clone segfaulted when I tried compiling
Tramp3D with -fdump-ipa-all because there was no clone_info - IPA-CP
created a clone only for an aggregate constant, adding a note to its
transformation summary but not creating any tree_map nor
param_adjustements.

Fixed with the following obvious extra checks which has passed
bootstrap and testing on x86_64-linux.

gcc/ChangeLog:

2020-11-12 Martin Jambor <mjambor@suse.cz>

* cgraphclones.c (cgraph_node::materialize_clone): Check that clone
info is not NULL before attempting to dump it.

ipa-cp: Work with time benefits and frequencies in sreals

This patch converts the variables that hold time benefits and
frequencies in IPA-CP from plain integers to sreals, avoiding the need
to cap them to avoid overflows and also fixing a potential underflows.

Size costs corresponding to individual constants are left as ints so
that they do not take up too much space.  Care must be taken that
adding it up does not overflow, especially in the case of
prop_size_cost, because in cases of extremely long chains of lattice
dependencies it can overflow (e.g. in testsuite/gcc.dg/ipa/pr50744.c).
The overall size is already tracked in long ints.

gcc/ChangeLog:

2020-11-11  Martin Jambor  <mjambor@suse.cz>

* ipa-cp.c (class ipcp_value_base): Change the type of
local_time_benefit and prop_time_benefit to sreal.  Adjust the
constructor initializer.
(ipcp_lattice::print): Dump sreals.
(struct caller_statistics): Change the type of freq_sum to sreal.
(gather_caller_stats): Work with sreal freq_sum.
(incorporate_penalties): Work with sreal evaluation.
(good_cloning_opportunity_p): Adjusted for sreal sreal time_benefit
and freq_sum.  Bail out if size_cost is INT_MAX.
(perform_estimation_of_a_value): Work with sreal time_benefit.  Avoid
unnecessary capping.
(estimate_local_effects): Pass sreal time benefit to
good_cloning_opportunity_p without capping it.  Adjust dumping.
(safe_add): If there can be overflow, return INT_MAX.
(propagate_effects): Work with sreal times.
(get_info_about_necessary_edges): Work with sreal frequencies.
(decide_about_value): Likewise and with sreal time benefits.

system: Add WARN_UNUSED_RESULT

I'd like to have the option of marking functions with
__attribute__ ((__warn_unused_result__)), so this patch adds a macro.
And use it for maybe_wrap_with_location, it's always a bug if the
return value is not used, which happened to me and got me confused.

gcc/ChangeLog:

* system.h (WARN_UNUSED_RESULT): Define for GCC >= 3.4.
* tree.h (maybe_wrap_with_location): Add WARN_UNUSED_RESULT.

Compare field offsets in operand_equal_p and OEP_ADDRESS_OF

* fold-const.c (operand_compare::operand_equal_p): Compare field
offsets in operand_equal_p and OEP_ADDRESS_OF.
(operand_compare::hash_operand): Update.

libstdc++: Simplify __numeric_traits definition

This changes the __numeric_traits primary template to assume its
argument is an integer type. For the three floating point types that are
supported by __numeric_traits_floating an explicit specialization of
__numeric_traits chooses the right base class.

This improves the failure mode for using __numeric_traits with an
unsupported type. Previously it would use __numeric_traits_floating as
the base class, and give somewhat obscure errors for trying to access
the static data members. Now it will use __numeric_traits_integer which
has a static_assert to check for supported types.

As a side effect of this change there is no need to instantiate
__conditional_type to decide which base class to use.

libstdc++-v3/ChangeLog:

* include/ext/numeric_traits.h (__numeric_traits): Change
primary template to always derive from __numeric_traits_integer.
(__numeric_traits<float>, __numeric_traits<double>)
(__numeric_traits<long double>): Add explicit specializations.

More PRE compile-time optimizations

This fixes a bug in bitmap_list_view which could end up with
a NULL head->current which makes followup searches fail. Oops.

It also further optimizes the PRE DFS walk by removing useless
stuff and special-casing bitmaps with just one element for
EXECUTE_IF_AND_IN_BITMAP which makes a quite big difference.

2020-11-12 Richard Biener <rguenther@suse.de>

* bitmap.c (bitmap_list_view): Restore head->current.
* tree-ssa-pre.c (pre_expr_DFS): Elide expr_visited bitmap.
Special-case value expression bitmaps with one element.
(bitmap_find_leader): Likewise.
(sorted_array_from_bitmap_set): Elide expr_visited bitmap.

Specify reason of -Winvalid-pch warning

gcc/c-family
PR pch/86674
* c-pch.c (c_common_valid_pch): Use cpp_warning with CPP_W_INVALID_PCH
reason to fix -Werror=invalid-pch and -Wno-error=invalid-pch switches.

libcpp
PR pch/86674
* files.c (_cpp_find_file): Use CPP_DL_NOTE not CPP_DL_ERROR in call to
cpp_error.

Add support for copy specifiers in fnspec

* attr-fnspec.h: Update topleve comment.
(attr_fnspec::arg_direct_p): Accept 1...9.
(attr_fnspec::arg_maybe_written_p): Reject 1...9.
(attr_fnspec::arg_copied_to_arg_p): New member function.
* builtins.c (builtin_fnspec): Update fnspec of block copy.
* tree-ssa-alias.c (attr_fnspec::verify): Update.

Fortran: improve location data for OpenACC/OpenMP directives [PR97782]

gcc/fortran/ChangeLog:

PR fortran/97782
* trans-openmp.c (gfc_trans_oacc_construct, gfc_trans_omp_parallel_do,
gfc_trans_omp_parallel_do_simd, gfc_trans_omp_parallel_sections,
gfc_trans_omp_parallel_workshare, gfc_trans_omp_sections
gfc_trans_omp_single, gfc_trans_omp_task, gfc_trans_omp_teams
gfc_trans_omp_target, gfc_trans_omp_target_data,
gfc_trans_omp_workshare): Use code->loc instead of input_location
when building the OMP_/OACC_ construct.

gcc/testsuite/ChangeLog:

PR fortran/97782
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Move dg-message
one line up.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.

libstdc++: Fix __numeric_traits_integer<__int20> [PR 97798]

The expression used to calculate the maximum value for an integer type
assumes that the number of bits in the value representation is always
sizeof(T) * CHAR_BIT. This is not true for the __int20 type on msp430,
which has only 20 bits in the value representation but 32 bits in the
object representation. This causes an integer overflow in a constant
expression, which is ill-formed.

This problem was already solved by DJ for std::numeric_limits<__int20>
by generalizing the helper macros to use a specified number of bits
instead of assuming sizeof(T) * CHAR_BIT. Then the INT_N_n types can
specify the number of bits using the __GLIBCXX_BITSIZE_INT_N_n macros
that the compiler defines.

I'm using a slightly different approach here. I've replaced the helper
macros entirely, and just expanded the calculations in the initializers
for the static data members. By reordering the data members we can reuse
__is_signed and __digits in the other initializers. This removes the
repetition of expanding __glibcxx_signed(T) and __glibcxx_digits(T)
multiple times in each initializer.

The __is_integer_nonstrict trait now defines a new constant, __width,
which is sizeof(T) * CHAR_BIT by default (defined as an enumerator so
that no storage is needed for a static data member). By specializing
__is_integer_nonstrict for the INT_N types that have padding bits, we
can provide the correct width via the __GLIBCXX_BITSIZE_INT_N_n macros.

libstdc++-v3/ChangeLog:

PR libstdc++/97798
* include/ext/numeric_traits.h (__glibcxx_signed)
(__glibcxx_digits, __glibcxx_min, __glibcxx_max): Remove
macros.
(__is_integer_nonstrict::__width): Define new constant.
(__numeric_traits_integer): Define constants in terms of each
other and __is_integer_nonstrict::__width, rather than the
removed macros.
(_GLIBCXX_INT_N_TRAITS): Macro to define explicit
specializations for non-standard integer types.

Add test case for PR 97799.

gcc/testsuite/ChangeLog:

* gfortran.dg/entry_23.f: New test.

Avoid PRE insert iteration when possible

The following make sure to only iterate PRE insertion when
necessary - which is when AVAIL_OUT of a predecessor of a
block we already visited changed (that's backedge destinations).

To not regress this also makes sure to locally iterate insertion
since even topological sort of expressions isn't enough to
guarantee we get all opportunities of a block in one iteration.
This avoids costly re-compute of the topologically sorted expression
array (more micro-optimization is possible here).

2020-11-12 Richard Biener <rguenther@suse.de>

* tree-ssa-pre.c (bitmap_value_replace_in_set): Return
whether we have changed anything.
(do_pre_regular_insertion): Get topologically sorted array
of expressions from caller.
(do_pre_partial_partial_insertion): Likewise.
(insert): Compute topologically sorted arrays of expressions
here and locally iterate actual insertion. Iterate only
when AVAIL_OUT of an already visited block source changed.

aarch64: Fix SVE2 BCAX pattern [PR97730]

This patch adds a missing not to the SVE2 BCAX (Bitwise clear and
exclusive or) pattern, fixing the PR. Since SVE doesn't have an
unpredicated not instruction, we need to use a (vacuously) predicated
not here.

To ensure that the predicate is instantiated correctly (to all 1s) for
the intrinsics, we pull out a separate expander from the define_insn.

From the ISA reference [1]:
> Bitwise AND elements of the second source vector with the
> corresponding inverted elements of the third source vector, then
> exclusive OR the results with corresponding elements of the first
> source vector.

[1] : https://developer.arm.com/docs/ddi0602/g/a64-sve-instructions-alphabetic-order/bcax-bitwise-clear-and-exclusive-or

gcc/ChangeLog:

PR target/97730
* config/aarch64/aarch64-sve2.md (@aarch64_sve2_bcax<mode>):
Change to define_expand, add missing (trivially-predicated) not
rtx to fix wrong code bug.
(*aarch64_sve2_bcax<mode>): New.

gcc/testsuite/ChangeLog:

PR target/97730
* gcc.target/aarch64/sve2/bcax_1.c (OP): Add missing bitwise not
to match correct bcax semantics.
* gcc.dg/vect/pr97730.c: New test.

tree-optimization/97806 - fix PRE expression post order

This fixes the postorder compute for the case of multiple
expression leaders for a value.

2020-11-12 Richard Biener <rguenther@suse.de>

PR tree-optimization/97806
* tree-ssa-pre.c (pre_expr_DFS): New overload for visiting
values, visiting all leaders for a value. Use a bitmap
for visited values.
(sorted_array_from_bitmap_set): Walk over values and adjust.

* gcc.dg/pr97806.c: New testcase.

c++: Fix up constexpr CLEANUP_POINT_EXPR and TRY_FINALLY_EXPR handling [PR97790]

As the testcase shows, CLEANUP_POINT_EXPR (and I think TRY_FINALLY_EXPR too)
suffer from the same problem that I was trying to fix in
r10-3597-g1006c9d4395a939820df76f37c7b085a4a1a003f
for CLEANUP_STMT, namely that if in the middle of the body expression of
those stmts is e.g. return stmt, goto, break or continue (something that
changes *jump_target and makes it start skipping stmts), we then skip the
cleanups too, which is not appropriate - the cleanups were either queued up
during the non-skipping execution of the body (for CLEANUP_POINT_EXPR), or
for TRY_FINALLY_EXPR are relevant already after entering the body block.

> Would it make sense to always use a NULL jump_target when evaluating
> cleanups?

I was afraid of that, especially for TRY_FINALLY_EXPR, but it seems that
during constexpr evaluation the cleanups will most often be just very simple
destructor calls (or calls to cleanup attribute functions).
Furthermore, for neither of these 3 tree codes we'll reach that code if
jump_target && *jump_target initially (there is a return NULL_TREE much
earlier for those except for trees that could embed labels etc. in it and
clearly these 3 don't count in that).

2020-11-12 Jakub Jelinek <jakub@redhat.com>

PR c++/97790
* constexpr.c (cxx_eval_constant_expression) <case CLEANUP_POINT_EXPR,
case TRY_FINALLY_EXPR, case CLEANUP_STMT>: Don't pass jump_target to
cxx_eval_constant_expression when evaluating the cleanups.

* g++.dg/cpp2a/constexpr-dtor9.C: New test.

IBM Z: Fix PR97326: Enable fp compares in vec_cmp

gcc/ChangeLog:

PR target/97326
* config/s390/vector.md: Support vector floating point modes in
vec_cmp.

IBM Z: Rename mode attr tointvec to TOINTVEC

Just a preparation to add a lower-case tointvec.

gcc/ChangeLog:

* config/s390/vector.md: Rename tointvec to TOINTVEC.
* config/s390/vx-builtins.md: Likewise.

dwarf2: Set DW_AT_declaration for undefined fns [PR97060]

If DECL_INITIAL isn't set, we can't emit anything about the body of the
function, so add the declaration attribute.

gcc/ChangeLog:

PR debug/97060
* dwarf2out.c (gen_subprogram_die): It's a declaration
if DECL_INITIAL isn't set.

gcc/testsuite/ChangeLog:

PR debug/97060
* gcc.dg/debug/dwarf2/pr97060.c: New test.

testsuite: Adjust pr96789.c by disabling loop vect

New test gcc.dg/tree-ssa/pr96789.c fails on
arm-none-linux-gnueabihf since loop vectorizer is able to optimize
the two loops which operate on array tmp with load_lanes feature
support, it make dse3 fail to find expected inputs.

As Richard suggested, this patch is to replace option
-ftree-vectorize to -ftree-slp-vectorize -fno-tree-loop-vectorize.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr96789.c: Adjusted by disabling loop
vectorization.

analyzer: precision-of-wording for -Wanalyzer-stale-setjmp-buffer

This patch adds a custom event to paths emitted by
-Wanalyzer-stale-setjmp-buffer highlighting the place where the
pertinent stack frame is popped, and updates the final event in
the path to reference this.

gcc/analyzer/ChangeLog:
* checker-path.h (checker_event::get_id_ptr): New.
* diagnostic-manager.cc (path_builder::path_builder): Add "sd"
param and use it to initialize new field "m_sd".
(path_builder::get_pending_diagnostic): New.
(path_builder::m_sd): New field.
(diagnostic_manager::emit_saved_diagnostic): Pass sd to
path_builder ctor.
(diagnostic_manager::add_events_for_superedge): Call new
maybe_add_custom_events_for_superedge vfunc.
* engine.cc (stale_jmp_buf::stale_jmp_buf): Add "setjmp_point"
param and use it to initialize new field "m_setjmp_point".
Initialize new field "m_stack_pop_event".
(stale_jmp_buf::maybe_add_custom_events_for_superedge): New vfunc
implementation.
(stale_jmp_buf::describe_final_event): New vfunc implementation.
(stale_jmp_buf::m_setjmp_point): New field.
(stale_jmp_buf::m_stack_pop_event): New field.
(exploded_node::on_longjmp): Pass setjmp_point to stale_jmp_buf
ctor.
* pending-diagnostic.h
(pending_diagnostic::maybe_add_custom_events_for_superedge): New
vfunc.

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/setjmp-5.c: Update expected path output to show
an event where the pertinent stack frame is popped. Update
expected message from final event to reference this event.

analyzer: warn on invalid shift counts [PR97424]

This patch implements -Wanalyzer-shift-count-negative
and -Wanalyzer-shift-count-overflow, analogous to the C/C++
warnings -Wshift-count-negative and -Wshift-count-overflow, but
implemented via interprocedural path analysis rather than via parsing
in a front end, and thus capable of detecting interprocedural cases that the
warnings implemented in the front ends can miss.

gcc/analyzer/ChangeLog:
PR tree-optimization/97424
* analyzer.opt (Wanalyzer-shift-count-negative): New.
(Wanalyzer-shift-count-overflow): New.
* region-model.cc (class shift_count_negative_diagnostic): New.
(class shift_count_overflow_diagnostic): New.
(region_model::get_gassign_result): Complain about shift counts that
are negative or are >= the operand's type's width.

gcc/ChangeLog:
PR tree-optimization/97424
* doc/invoke.texi (Static Analyzer Options): Add
-Wno-analyzer-shift-count-negative and
-Wno-analyzer-shift-count-overflow.
(-Wno-analyzer-shift-count-negative): New.
(-Wno-analyzer-shift-count-overflow): New.

gcc/testsuite/ChangeLog:
PR tree-optimization/97424
* gcc.dg/analyzer/invalid-shift-1.c: New test.

Daily bump.

CFI-handling : Add a hook to allow target-specific Personality and LSDA indirections.

At present, the output of .cfi_personality and .cfi_lsda assumes
ELF semantics for indirections. This isn't suitable for all targets
and is one blocker to moving Darwin to use .cfi_xxxx.

The patch adds a target hook that allows non-ELF targets to use
indirections appropriate to their needs.

gcc/ChangeLog:

* config/darwin-protos.h (darwin_make_eh_symbol_indirect): New.
* config/darwin.c (darwin_make_eh_symbol_indirect): New. Use
Mach-O semantics for personality and ldsa indirections.
* config/darwin.h (TARGET_ASM_MAKE_EH_SYMBOL_INDIRECT): New.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in: Add TARGET_ASM_MAKE_EH_SYMBOL_INDIRECT hook.
* dwarf2out.c (dwarf2out_do_cfi_startproc): If the target defines
a hook for indirecting personality and ldsa references, use that
otherwise default to ELF semantics.
* target.def (make_eh_symbol_indirect): New target hook.

Objective-C++ : Allow prefix attrs on linkage specs.

For Objective-C++, this combines prefix attributes from before and
after top level linkage specs. The "reference implementation" for
Objective-C++ allows this, and system headers depend on it.

e.g.

__attribute__((__deprecated__))
extern "C" __attribute__((__visibility__("default")))
@interface MyClass
...
@end

Would consider the list of prefix attributes to the interface for
MyClass to include both the visibility and deprecated ones.

When we are compiling regular C++, this emits a warning and discards
any prefix attributes before a linkage spec.

gcc/cp/ChangeLog:

* parser.c (cp_parser_declaration): Unless we are compiling for
Ojective-C++, warn about and discard any attributes that prefix
a linkage specification.

c++: Change the mangling of __alignof__ [PR88115]

This patch changes the mangling of __alignof__ to v111__alignof__,
making its mangling distinct from that of alignof(type) and
alignof(expr).

How we mangle ALIGNOF_EXPR now depends on its ALIGNOF_EXPR_STD_P flag,
which after the previous patch gets consistently set for alignof(type)
as well as alignof(expr).

gcc/c-family/ChangeLog:

PR c++/88115
* c-opts.c (c_common_post_options): Update latest_abi_version.

gcc/ChangeLog:

PR c++/88115
* common.opt (-fabi-version): Document =15.
* doc/invoke.texi (C++ Dialect Options): Likewise.

gcc/cp/ChangeLog:

PR c++/88115
* mangle.c (write_expression): Mangle __alignof_ differently
from alignof when the ABI version is at least 15.

libiberty/ChangeLog:

PR c++/88115
* cp-demangle.c (d_print_comp_inner)
<case DEMANGLE_COMPONENT_EXTENDED_OPERATOR>: Don't print the
"operator " prefix for __alignof__.
<case DEMANGLE_COMPONENT_UNARY>: Always print parens around the
operand of __alignof__.
* testsuite/demangle-expected: Test demangling for __alignof__.

gcc/testsuite/ChangeLog:

PR c++/88115
* g++.dg/abi/macro0.C: Adjust.
* g++.dg/cpp0x/alignof7.C: New test.
* g++.dg/cpp0x/alignof8.C: New test.

c++: Correct the handling of alignof(expr) [PR88115]

We're currently neglecting to set the ALIGNOF_EXPR_STD_P flag on an
ALIGNOF_EXPR when its operand is an expression.  This leads to us
handling alignof(expr) as if it were written __alignof__(expr), and
returning the preferred alignment instead of the ABI alignment.  In the
testcase below, this causes the first and third static_assert to fail on
x86.

gcc/cp/ChangeLog:

PR c++/88115
* cp-tree.h (cxx_sizeof_or_alignof_expr): Add bool parameter.
* decl.c (fold_sizeof_expr): Pass false to
cxx_sizeof_or_alignof_expr.
* parser.c (cp_parser_unary_expression): Pass std_alignof to
cxx_sizeof_or_alignof_expr.
* pt.c (tsubst_copy): Pass false to cxx_sizeof_or_alignof_expr.
(tsubst_copy_and_build): Pass std_alignof to
cxx_sizeof_or_alignof_expr.
* typeck.c (cxx_alignof_expr): Add std_alignof bool parameter
and pass it to cxx_sizeof_or_alignof_type.  Set ALIGNOF_EXPR_STD_P
appropriately.
(cxx_sizeof_or_alignof_expr): Add std_alignof bool parameter
and pass it to cxx_alignof_expr.  Assert op is either
SIZEOF_EXPR or ALIGNOF_EXPR.

libcc1/ChangeLog:

PR c++/88115
* libcp1plugin.cc (plugin_build_unary_expr): Pass true to
cxx_sizeof_or_alignof_expr.

gcc/testsuite/ChangeLog:

PR c++/88115
* g++.dg/cpp0x/alignof6.C: New test.

c++: Tweak tsubst_qualified_id location.

Retain the location when tsubstituting a qualified-id so that our
static_assert diagnostic can benefit. Don't create useless location
wrappers for temporary variables.

gcc/ChangeLog:

PR c++/97518
* tree.c (maybe_wrap_with_location): Don't add a location
wrapper around an artificial and ignored decl.

gcc/cp/ChangeLog:

PR c++/97518
* pt.c (tsubst_qualified_id): Use EXPR_LOCATION of the qualified-id.
Use it to maybe_wrap_with_location the final expression.

gcc/testsuite/ChangeLog:

PR c++/97518
* g++.dg/diagnostic/static_assert3.C: New test.

Fix PRE NEW_SETS guarding

Accesses to NEW_SETS should be properly guarded. Committed
as obvious.

2020-11-11 Richard Biener <rguenther@suse.de>

PR tree-optimization/97623
* tree-ssa-pre.c (create_expression_by_pieces): Guard
NEW_SETS access.
(insert_into_preds_of_block): Likewise.

libstdc++: Exclude cygwin and mingw from linker relro support

PE format does not have ELF style relro linker support, exclude
from checking. If the host linker supports ELF format, configure
may get confused.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_CHECK_LINKER_FEATURES): Exclude
cygwin and mingw from relro linker test.
* configure: Regenerate.

Fix PRE topological expression set sorting

This fixes sorted_array_from_bitmap_set to do a topological sort
as required by re-using what PHI-translation does, namely a DFS
walk with the help of bitmap_find_leader. The proper result
is verified by extra checking in clean () (which would have tripped
before) and for the testcase I'm working at during the last
patches (PR97623) it is neutral in compile-time cost.

2020-11-11 Richard Biener <rguenther@suse.de>

* tree-ssa-pre.c (pre_expr_DFS): New function.
(sorted_array_from_bitmap_set): Use it to properly
topologically sort the expression set.
(clean): Verify we've cleaned everything we should.

testsuite: Fix up scan-tree-dump-times regexps for 64-bit targets

The added (?:_ull) match on 32-bit targets, but are equivalent to just
adding _ull into the strings, i.e. require the _ull substrings, while
the intent is that they are optional, so we should use (?:_ull)? instead.

2020-11-11 Jakub Jelinek <jakub@redhat.com>

* gfortran.dg/gomp/workshare-reduction-3.f90: Use (?:_ull)? instead
of (?:_ull) in the scan-tree-dump-times directives.
* gfortran.dg/gomp/workshare-reduction-26.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-27.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-28.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-36.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-37.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-38.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-39.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-40.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-41.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-42.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-43.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-44.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-45.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-46.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-47.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-56.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-57.f90: Likewise.

Remove superfluous call to Base_Type

gcc/ada/ChangeLog:
* gcc-interface/gigi.h: Remove ^L characters throughout.
* gcc-interface/decl.c: Likewise.
* gcc-interface/utils.c: Likewise.
* gcc-interface/utils2.c: Likewise.
* gcc-interface/trans.c (gnat_to_gnu) <N_Allocator>: Do not explicitly
go to the base type for the Has_Constrained_Partial_View flag.

Fix biased integer arithmetic

The Ada compiler uses a biased representation when a size clause reserves
fewer bits than normal either for the lower or for the upper bound.

gcc/ada/ChangeLog:
* gcc-interface/trans.c (build_binary_op_trapv): Convert operands
to the result type before doing generic overflow checking.

gcc/testsuite/ChangeLog:
* gnat.dg/bias2.adb: New test.

Fix segfault on elaboration of empty 1-element array at -O

This is a rather obscure case where the elaboration of an empty array
whose base type is an array type of length at most 1 goes awry when
the code is compiled with optimization.

gcc/ada/ChangeLog:
* gcc-interface/trans.c (can_be_lower_p): Remove.
(Regular_Loop_to_gnu): Add ENTRY_COND unconditionally if
BOTTOM_COND is non-zero.

gcc/testsuite/ChangeLog:
* gnat.dg/opt89.adb: New test.

Fix internal error on chain of constants with -gnatc

gcc/ada/ChangeLog:
* gcc-interface/decl.c (gnat_to_gnu_entity) <E_Constant>: In case
the constant is not being defined, get the expression in type
annotation mode only if its type is elementary.

Fix internal error with Shift_Right operator on signed type

This is a regression present on the mainline and 10 branch in the form
of an ICE with a shift operator applied to a variable of a signed type,
and which is caused by a type mismatch.

gcc/ada/ChangeLog:
* gcc-interface/trans.c (gnat_to_gnu) <N_Op_Shift>: Also convert
GNU_MAX_SHIFT if the type of the operation has been changed.
* gcc-interface/utils.c (can_materialize_object_renaming_p): Add
pair of missing parentheses.

gcc/testsuite/ChangeLog:
* gnat.dg/shift1.adb: New test.

testsuite/97797 - adjust GIMPLE tests for sizetype

Tested on x86_64-unknown-linux-gnu, pushed.

2020-11-11 Richard Biener <rguenther@suse.de>

PR testsuite/97797
* gcc.dg/torture/ssa-fre-5.c: Use __SIZETYPE__ where
appropriate.
* gcc.dg/torture/ssa-fre-6.c: Likewise.

tree-optimization/97623 - Avoid PRE hoist insertion iteration

The recent previous change in this area limited hoist insertion
iteration via a param but the following is IMHO better since
we are not really interested in PRE opportunities exposed by
hoisting but only the other way around. So this moves hoist
insertion after PRE iteration finished and removes hoist
insertion iteration alltogether.

2020-11-11 Richard Biener <rguenther@suse.de>

PR tree-optimization/97623
* params.opt (-param=max-pre-hoist-insert-iterations): Remove
again.
* doc/invoke.texi (max-pre-hoist-insert-iterations): Likewise.
* tree-ssa-pre.c (insert): Move hoist insertion after PRE
insertion iteration and do not iterate it.

* gcc.dg/tree-ssa/ssa-hoist-3.c: Adjust.
* gcc.dg/tree-ssa/ssa-hoist-7.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-30.c: Likewise.

aarch64: Support SVE comparisons for unpacked integers

This patch adds support for comparing unpacked SVE integer vectors,
such as byte elements stored in the bottom bytes of halfword
containers. It also adds support for selects between unpacked
SVE vectors (both integer and floating-point), since selects and
compares are closely tied via the vcond optab interface.

gcc/
* config/aarch64/aarch64-sve.md (@vcond_mask_<mode><vpred>): Extend
from SVE_FULL to SVE_ALL.
(*vcond_mask_<mode><vpred>): Likewise.
(@aarch64_sel_dup<mode>): Likewise.
(vcond<SVE_FULL:mode><v_int_equiv>): Extend to...
(vcond<SVE_ALL:mode><SVE_I:mode>): ...this, but requiring the
sizes of the container modes to match.
(vcondu<SVE_FULL:mode><v_int_equiv>): Extend to...
(vcondu<SVE_ALL:mode><SVE_I:mode>): ...this.
(vec_cmp<SVE_FULL_I:mode><vpred>): Extend to...
(vec_cmp<SVE_I:mode><vpred>): ...this.
(vec_cmpu<SVE_FULL_I:mode><vpred>): Extend to...
(vec_cmpu<SVE_I:mode><vpred>): ...this.
(@aarch64_pred_cmp<cmp_op><SVE_FULL_I:mode>): Extend to...
(@aarch64_pred_cmp<cmp_op><SVE_I:mode>): ...this.
(*cmp<cmp_op><SVE_FULL_I:mode>_cc): Extend to...
(*cmp<cmp_op><SVE_I:mode>_cc): ...this.
(*cmp<cmp_op><SVE_FULL_I:mode>_ptest): Extend to...
(*cmp<cmp_op><SVE_I:mode>_ptest): ...this.
(*cmp<cmp_op><SVE_FULL_I:mode>_and): Extend to...
(*cmp<cmp_op><SVE_I:mode>_and): ...this.

gcc/testsuite/
* gcc.target/aarch64/sve/cmp_1.c: New test.
* gcc.target/aarch64/sve/cmp_2.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_1.c: Add --param
aarch64-sve-compare-costs=0
* gcc.target/aarch64/sve/cond_arith_1_run.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_3.c: Likewise.
* gcc.target/aarch64/sve/cond_arith_3_run.c: Likewise.
* gcc.target/aarch64/sve/mask_gather_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_load_slp_1.c: Likewise.
* gcc.target/aarch64/sve/vcond_11.c: Likewise.
* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

vect: Allow vconds between different vector sizes

The vcond code requires the compared vectors and the selected
vectors to have both the same size and the same number of elements
as each other.  But the operation makes logical sense even for
different vector sizes.  E.g. you could compare two V4SIs and
use the result to select between two V4DIs.

The underlying optab already allows the compared mode and the selected
mode to be specified separately.  Since the vectoriser now also
supports mixed vector sizes, I think we can simply remove the
equal-size check and just keep the equal-lanes check.  It's then
up to the target to decide which (if any) mixtures of sizes it
supports.

gcc/
* optabs-tree.c (expand_vec_cond_expr_p): Allow the compared values
and the selected values to have different mode sizes.
* gimple-isel.cc (gimple_expand_vec_cond_expr): Likewise.

libstdc++: Assigning to a joinable std::jthread calls std::terminate

Move assigning to a std::jthread that represents a thread of execution
needs to send a stop request and join that running thread. Otherwise the
std::thread data member will terminate in its assignment operator.

Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* include/std/thread (jthread::operator=(jthread&&)): Transfer
any existing state to a temporary that will request a stop and
then join.
* testsuite/30_threads/jthread/jthread.cc: Test move assignment.

libstdc++: Use helper type for checking thread ID

This encapsulates the storing and checking of the thread ID into a class
type, so that the macro _GLIBCXX_HAS_GTHREADS is only checked in one
place. The code doing the checks just calls member functions of the new
type, without caring whether that really does any work or not.

libstdc++-v3/ChangeLog:

* include/std/stop_token (_Stop_state_t::_M_requester): Define
new struct with members to store and check the thread ID.
(_Stop_state_t::_M_request_stop()): Use _M_requester._M_set().
(_Stop_state_t::_M_remove_callback(_Stop_cb*)): Use
_M_requester._M_is_current_thread().

Support Intel AVX VNNI

2020-10-13 Hongtao Liu <hongtao.liu@intel.com>
Hongyu Wang <hongyu.wang@intel.com>

gcc/
* common/config/i386/cpuinfo.h (get_available_features):
Detect AVXVNNI.
* common/config/i386/i386-common.c
(OPTION_MASK_ISA2_AVXVNNI_SET,
OPTION_MASK_ISA2_AVXVNNI_UNSET): New.
(OPTION_MASK_ISA2_AVX2_UNSET): Add AVXVNNI.
(ix86_hanlde_option): Handle -mavxvnni, unset avxvnni when
avx2 is disabled.
* common/config/i386/i386-cpuinfo.h (enum processor_features):
Add FEATURE_AVXVNNI.
* common/config/i386/i386-isas.h: Add ISA_NAMES_TABLE_ENTRY
for avxvnni.
* config.gcc: Add avxvnniintrin.h.
* config/i386/avx512vnnivlintrin.h: Reimplement 128/256 bit non-mask
intrinsics with macros to support unified interface.
* config/i386/avxvnniintrin.h: New header file.
* config/i386/cpuid.h (bit_AVXVNNI): New.
* config/i386/i386-builtins.c (def_builtin): Handle AVXVNNI mask
for unified builtin.
* config/i386/i386-builtin.def (BDESC): Adjust AVX512VNNI
builtins for AVXVNNI.
* config/i386/i386-c.c (ix86_target_macros_internal): Define
__AVXVNNI__.
* config/i386/i386-expand.c (ix86_expand_builtin): Handle bisa
for AVXVNNI to support unified intrinsic name, since there is no
dependency between AVX512VNNI and AVXVNNI.
* config/i386/i386-options.c (isa2_opts): Add -mavxvnni.
(ix86_valid_target_attribute_inner_p): Handle avxnnni.
(ix86_option_override_internal): Ditto.
* config/i386/i386.h (TARGET_AVXVNNI, TARGET_AVXVNNI_P,
TARGET_AVXVNNI_P, PTA_AVXVNNI): New.
(PTA_SAPPHIRERAPIDS): Add AVX_VNNI.
(PTA_ALDERLAKE): Likewise.
* config/i386/i386.md ("isa"): Add avxvnni, avx512vnnivl.
("enabled"): Adjust for avxvnni and avx512vnnivl.
* config/i386/i386.opt: Add option -mavxvnni.
* config/i386/immintrin.h: Include avxvnniintrin.h.
* config/i386/sse.md (vpdpbusd_<mode>): Adjust for AVXVNNI.
(vpdpbusds_<mode>): Likewise.
(vpdpwssd_<mode>): Likewise.
(vpdpwssds_<mode>): Likewise.
(vpdpbusd_v16si): New.
(vpdpbusds_v16si): Likewise.
(vpdpwssd_v16si): Likewise.
(vpdpwssds_v16si): Likewise.
* doc/invoke.texi: Document -mavxvnni.
* doc/extend.texi: Document avxvnni.
* doc/sourcebuild.texi: Document target avxvnni.

gcc/testsuite/

* gcc.target/i386/avx512vl-vnni-1.c: Rename..
* gcc.target/i386/avx512vl-vnni-1a.c: To This.
* gcc.target/i386/avx512vl-vnni-1b.c: New test.
* gcc.target/i386/avx512vl-vnni-2.c: Ditto.
* gcc.target/i386/avx512vl-vnni-3.c: Ditto.
* gcc.target/i386/avx-vnni-1.c: Ditto.
* gcc.target/i386/avx-vnni-2.c: Ditto.
* gcc.target/i386/avx-vnni-3.c: Ditto.
* gcc.target/i386/avx-vnni-4.c: Ditto.
* gcc.target/i386/avx-vnni-5.c: Ditto.
* gcc.target/i386/avx-vnni-6.c: Ditto.
* gcc.target/i386/avx-vpdpbusd-2.c: Ditto.
* gcc.target/i386/avx-vpdpbusds-2.c: Ditto.
* gcc.target/i386/avx-vpdpwssd-2.c: Ditto.
* gcc.target/i386/avx-vpdpwssds-2.c: Ditto.
* gcc.target/i386/vnni_inline_error.c: Ditto.
* gcc.target/i386/avx512vnnivl-builtin.c: Ditto.
* gcc.target/i386/avxvnni-builtin.c: Ditto.
* gcc.target/i386/funcspec-56.inc: Add new target attribute.
* gcc.target/i386/sse-12.c: Add -mavxvnni.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-14.c: Ditto.
* gcc.target/i386/sse-22.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* g++.dg/other/i386-2.C: Ditto.
* g++.dg/other/i386-3.C: Ditto.
* lib/target-supports.exp (check_effective_target_avxvnni):
New proc.

Fix spelling.

gcc/ChangeLog:

* tree.c (copy_node): Fix spelling.

Drop topological sort for PRE phi-translation

The topological sort sorted_array_from_bitmap_set is supposed to
provide isn't one since quite some time since value_ids are
assigned first to SSA names in the order of SSA_NAME_VERSION
and then to hashtable entries in the order they appear in the
table. One can even argue that expression-ids provide a closer
approximation of a topological sort since those are assigned
during AVAIL_OUT computation which is done in a dominator walk.

Now - phi-translation is not even depending on topological sorting
but it essentially does a DFS walk, phi-translating expressions
it depends on and relying on phi-translation caching to avoid
doing redundant work.

So this patch drops the use of sorted_array_from_bitmap_set from
phi_translate_set because this function is quite expensive.

2020-11-11 Richard Biener <rguenther@suse.de>

* tree-ssa-pre.c (phi_translate_set): Do not sort the
expression set topologically.

Early exit on VR_VARYING from irange::set.

gcc/ChangeLog:

* value-range.cc (irange::set): Early exit on VR_VARYING.

AArch64: Add FLAG for arithmetic operation intrinsics [PR94442]

2020-11-11 Zhiheng Xie <xiezhiheng@huawei.com>
Nannan Zheng <zhengnannan@huawei.com>

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def: Add proper FLAG
for arithmetic operation intrinsics.

gfortran.dg/gomp/workshare-reduction-*.f90: Fix dumps for -m32

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/workshare-reduction-26.f90: Add (?:_ull) to
scan-tree-dump-times regex for -m32.
* gfortran.dg/gomp/workshare-reduction-27.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-28.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-3.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-36.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-37.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-38.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-39.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-40.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-41.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-42.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-43.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-44.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-45.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-46.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-47.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-56.f90: Likewise.
* gfortran.dg/gomp/workshare-reduction-57.f90: Likewise.

fortran: Fix up gfc_typename CHARACTER length handling [PR97768]

The first testcase below ICEs when f951 is 32-bit (or 64-bit big-endian).
The problem is that ex->ts.u.cl && ex->ts.u.cl->length are both non-NULL,
but ex->ts.u.cl->length->expr_type is not EXPR_CONSTANT, but EXPR_FUNCTION.
value.function.actual and value.function.name are in that case pointers,
but value._mp_alloc and value._mp_size are 4 byte integers no matter what.
So, in 64-bit little-endian the function returns most of the time incorrect
CHARACTER(0) because the most significant 32 bits of the
value.function.actual pointer are likely 0.
Anyway, the following patch is an attempt to get all the cases right.
Uses ex->value.character.length only for ex->expr_type == EXPR_CONSTANT
(i.e. CHARACTER literals), handles the deferred lengths, assumed lengths,
known constant lengths and finally if the length is something other,
just doesn't print it, i.e. prints just CHARACTER (for default kind)
or CHARACTER(KIND=4) (for e.g. kind 4).

2020-11-11  Jakub Jelinek  <jakub@redhat.com>

PR fortran/97768
gcc/fortran/
* misc.c (gfc_typename): Use ex->value.character.length only if
ex->expr_type == EXPR_CONSTANT.  If ex->ts.deferred, print : instead
of length.  If ex->ts.u.cl && ex->ts.u.cl->length == NULL, print *
instead of length.  Otherwise if character length is non-constant,
print just CHARACTER or CHARACTER(KIND=N).
gcc/testsuite/
* gfortran.dg/pr97768_1.f90: New test.
* gfortran.dg/pr97768_2.f90: New test.

Re: Refactor copying decl section names

* go-gcc.cc (Gcc_backend::global_variable_set_init): Cast NULL to
avoid ambiguous overloaded call.

Improve efficiency of copying section from another tree

gcc/
* cgraph.h (symtab_node::set_section_for_node): Declare new
overload.
(symtab_node::set_section_from_string): Rename from set_section.
(symtab_node::set_section_from_node): Declare.
* symtab.c (symtab_node::set_section_for_node): Define new
overload.
(symtab_node::set_section_from_string): Rename from set_section.
(symtab_node::set_section_from_node): Define.
(symtab_node::set_section): Call renamed set_section_from_string.
(symtab_node::set_section): Call new set_section_from_node.

Refactor section name ref counting

gcc/

* symtab.c (symtab_node::set_section_for_node): Extract reference
counting logic into ...
(retain_section_hash_entry): ... here (new function) and ...
(release_section_hash_entry): ... here (new function).

Formatting, there should be a space between PTA_* and (.

gcc/ChangeLog
* config/i386/i386.h (PTA_MOVDIRI, PTA_MOVDIR64B,
PTA_AMX_TILE, PTA_AMX_INT8, PTA_AMX_BF16, PTA_HRESET):
Formatting.

Update MicroBlaze strings test

gcc/testsuite

* gcc.target/microblaze/others/strings1.c: Update
to include $LC label.

testsuite: skip zero-scratch-regs on powerpc.

These tests are unsupported on PowerPC.

gcc/testsuite/ChangeLog:

* c-c++-common/zero-scratch-regs-10.c: Skip on powerpc*-*-*.
* c-c++-common/zero-scratch-regs-11.c: Skip on powerpc*-*-*.
* c-c++-common/zero-scratch-regs-5.c: Skip on powerpc*-*-aix*.
* c-c++-common/zero-scratch-regs-8.c: Skip on powerpc*-*-*.
* c-c++-common/zero-scratch-regs-9.c: Skip on powerpc*-*-*.

libstdc++: Implement std::emit_on_flush etc.

This adds the manipulators for use with basic_osyncstream. In order to
detect when an arbitrary basic_ostream<C,T> is the base class of a
basic_syncbuf<C,T,A> object, introduce a new intermediate base class
that stores the data members. The new base class stores a pointer and
two bools, which wastes (sizeof(void*) - 2) bytes of padding. It would
be possible to use the two least significant bits of the pointer for the
two bools, at least for targets where alignof(basic_streambuf) > 2, but
that's left as a possible change for a future date.

Also define basic_syncbuf::overflow to override the virtual function in
the base class, so that single characters can be inserted into the
stream buffer. Previously the default basic_streambuf::overflow
implementation was used, which drops the character on the floor.

libstdc++-v3/ChangeLog:

* include/std/ostream (__syncbuf_base): New class template.
(emit_on_flush, noemit_on_flush, flush_emit): New manipulators.
* include/std/syncstream (basic_syncbuf): Derive from
__syncbuf_base instead of basic_streambuf.
(basic_syncbuf::operator=): Remove self-assignment check.
(basic_syncbuf::swap): Remove self-swap check.
(basic_syncbuf::emit): Do not skip pubsync() call if sequence
is empty.
(basic_syncbuf::sync): Remove no-op pubsync on stringbuf.
(basic_syncbuf::overflow): Define override.
* testsuite/27_io/basic_syncstream/basic_ops/1.cc: Test
basic_osyncstream::put(char_type).
* testsuite/27_io/basic_ostream/emit/1.cc: New test.

Daily bump.

IBM Z: Fix bootstrap breakage due to HAVE_TF macro

Commit e627cda56865 ("IBM Z: Store long doubles in vector registers
when possible") introduced HAVE_TF macro which expands to a logical
"or" of HAVE_ constants. Not all of these constants are available in
GENERATOR_FILE context, so a hack was used: simply expand to true in
this case, because the actual value matters only during compiler
runtime and not during generation.

However, one aspect of this value matters during generation after all:
whether or not it's a constant, which in this case it appears to be.
This results in incorrect values in insn-flags.h and broken bootstrap
for some configurations.

Fix by using a dummy value that is not a constant.

gcc/ChangeLog:

2020-11-10 Ilya Leoshkevich <iii@linux.ibm.com>

* config/s390/s390.h (HAVE_TF): Use opaque value when
GENERATOR_FILE is defined.

libstdc++: Avoid bad_alloc exceptions when changing locales

For the --enable-clocale=generic configuration, the current code can
fail with a bad_alloc exception. This patch uses the nothrow version of
operator new and reports allocation failures by setting failbit in the
iostate variable.

* config/locale/generic/c_locale.cc (__set_C_locale()): New function
to set the "C" locale and return the name of the previous locale.
(__convert_to_v<float>, __convert_to_v<double>)
(__convert_to_v<long double>): Use __set_C_locale and set failbit on
error.

c++: Improve static_assert diagnostic [PR97518]

Currently, when a static_assert fails, we only say "static assertion failed".
It would be more useful if we could also print the expression that
evaluated to false; this is especially useful when the condition uses
template parameters.  Consider the motivating example, in which we have
this line:

  static_assert(is_same<X, Y>::value);

if this fails, the user has to play dirty games to get the compiler to
print the template arguments.  With this patch, we say:

  error: static assertion failed
  note: 'is_same<int*, int>::value' evaluates to false

which I think is much better.  However, always printing the condition that
evaluated to 'false' wouldn't be very useful: e.g. noexcept(fn) is
always parsed to true/false, so we would say "'false' evaluates to false"
which doesn't help.  So I wound up only printing the condition when it was
instantiation-dependent, that is, we called finish_static_assert from
tsubst_expr.

Moreover, this patch also improves the diagnostic when the condition
consists of a logical AND.  Say you have something like this:

  static_assert(fn1() && fn2() && fn3() && fn4() && fn5());

where fn4() evaluates to false and the other ones to true.  Highlighting
the whole thing is not that helpful because it won't say which clause
evaluated to false.  With the find_failing_clause tweak in this patch
we emit:

  error: static assertion failed
    6 | static_assert(fn1() && fn2() && fn3() && fn4() && fn5());
      |                                          ~~~^~

so you know right away what's going on.  Unfortunately, when you combine
both things, that is, have an instantiation-dependent expr and && in
a static_assert, we can't yet quite point to the clause that failed.  It
is because when we tsubstitute something like is_same<X, Y>::value, we
generate a VAR_DECL that doesn't have any location.  It would be awesome
if we could wrap it with a location wrapper, but I didn't see anything
obvious.

In passing, I've cleaned up some things:
* use iloc_sentinel when appropriate,
* it's nicer to call contextual_conv_bool instead of the rather verbose
  perform_implicit_conversion_flags,
* no need to check for INTEGER_CST before calling integer_zerop.

gcc/cp/ChangeLog:

PR c++/97518
* cp-tree.h (finish_static_assert): Adjust declaration.
* parser.c (cp_parser_static_assert): Pass false to
finish_static_assert.
* pt.c (tsubst_expr): Pass true to finish_static_assert.
* semantics.c (find_failing_clause_r): New function.
(find_failing_clause): New function.
(finish_static_assert): Add a bool parameter.  Use
iloc_sentinel.  Call contextual_conv_bool instead of
perform_implicit_conversion_flags.  Don't check for INTEGER_CST before
calling integer_zerop.  Call find_failing_clause and maybe use its
location.  Print the original condition or the failing clause if
SHOW_EXPR_P.

gcc/testsuite/ChangeLog:

PR c++/97518
* g++.dg/diagnostic/pr87386.C: Adjust expected output.
* g++.dg/diagnostic/static_assert1.C: New test.
* g++.dg/diagnostic/static_assert2.C: New test.

libcc1/ChangeLog:

PR c++/97518
* libcp1plugin.cc (plugin_add_static_assert): Pass false to
finish_static_assert.

c++: Add 5 unfixed tests.

A couple of dg-ice tests.

gcc/testsuite/ChangeLog:

PR c++/52830
PR c++/88982
PR c++/90799
PR c++/87765
PR c++/89565
* g++.dg/cpp0x/constexpr-52830.C: New test.
* g++.dg/cpp0x/vt-88982.C: New test.
* g++.dg/cpp1z/class-deduction76.C: New test.
* g++.dg/cpp1z/constexpr-lambda26.C: New test.
* g++.dg/cpp2a/nontype-class39.C: New test.

libstdc++: Reorder constructors in <sstream>

This groups all the constructors together, consistent with the synopses
in the C++20 standard.

libstdc++-v3/ChangeLog:

* include/std/sstream (basic_stringbug, basic_istringstream)
(basic_ostringstream, basic_stringstream): Reorder C++20
constructors to be declared next to other constructors.

libstdc++: Add remaining C++20 additions to <sstream> [P0408R7]

This adds the new overloads of basic_stringbuf::str, and the
corresponding overloads to basic_istringstream, basic_ostringstream and
basic_stringstream.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver (GLIBCXX_3.4.21): Tighten patterns.
(GLIBCXX_3.4.29): Export new symbols.
* include/bits/alloc_traits.h (__allocator_like): New concept.
* include/std/sstream (basic_stringbuf::swap): Add exception
specification.
(basic_stringbuf::str() const): Add ref-qualifier. Use new
_M_high_mark function.
(basic_stringbuf::str(const SAlloc&) const): Define new function.
(basic_stringbuf::str() &&): Likewise.
(basic_stringbuf::str(const basic_string<C,T,SAlloc>&)):
Likewise.
(basic_stringbuf::str(basic_string<C,T,Alloc>&&)): Likewise.
(basic_stringbuf::view() const): Use _M_high_mark.
(basic_istringstream::str, basic_ostringstream::str)
(basic_stringstream::str): Define new overloads.
* src/c++20/sstream-inst.cc (basic_stringbuf::str)
(basic_istringstream::str, basic_ostringstream::str)
(basic_stringstream::str): Explicit instantiation definitions
for new overloads.
* testsuite/27_io/basic_istringstream/view/char/1.cc: Add more
checks.
* testsuite/27_io/basic_istringstream/view/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_ostringstream/view/char/1.cc:
Likewise.
* testsuite/27_io/basic_ostringstream/view/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_stringstream/view/char/1.cc:
Likewise.
* testsuite/27_io/basic_stringstream/view/wchar_t/1.cc:
Likewise.
* testsuite/27_io/basic_istringstream/str/char/2.cc: New test.
* testsuite/27_io/basic_istringstream/str/wchar_t/2.cc: New test.
* testsuite/27_io/basic_ostringstream/str/char/3.cc: New test.
* testsuite/27_io/basic_ostringstream/str/wchar_t/3.cc: New test.
* testsuite/27_io/basic_stringbuf/str/char/4.cc: New test.
* testsuite/27_io/basic_stringbuf/str/wchar_t/4.cc: New test.
* testsuite/27_io/basic_stringstream/str/char/5.cc: New test.
* testsuite/27_io/basic_stringstream/str/wchar_t/5.cc.cc: New test.

libstdc++: Fix more unspecified comparisons to null pointer [PR 97415]

This adds some more null checks to avoid a relational comparison with a
null pointer, similar to 78198b6021a9695054dab039340202170b88423c.

libstdc++-v3/ChangeLog:

PR libstdc++/97415
* include/std/sstream (basic_stringbuf::_M_update_egptr)
(basic_stringbuf::__xfer_bufptrs::__xfer_bufptrs): Check for
null before comparing pointers.

Refactor copying decl section names

gcc/

* cgraph.h (symtab_node::get_section): Constify.
(symtab_node::set_section): Declare new overload.
* symtab.c (symtab_node::set_section): Define new overload.
(symtab_node::copy_visibility_from): Use new overload of
symtab_node::set_section.
(symtab_node::resolve_alias): Same.
* tree.h (set_decl_section_name): Declare new overload.
* tree.c (set_decl_section_name): Define new overload.
* tree-emutls.c (get_emutls_init_templ_addr): Same.
* cgraphclones.c (cgraph_node::create_virtual_clone): Use new
overload of symtab_node::set_section.
(cgraph_node::create_version_clone_with_body): Same.
* trans-mem.c (ipa_tm_create_version): Same.

gcc/c
* c-decl.c (merge_decls): Use new overload of
set_decl_section_name.

gcc/cp
* decl.c (duplicate_decls): Use new overload of
set_decl_section_name.
* method.c (use_thunk): Same.
* optimize.c (maybe_clone_body): Same.
* coroutines.cc (act_des_fn): Same.

gcc/d
* decl.cc (finish_thunk): Use new overload of
set_decl_section_name

Early exit from irange::set for poly ints.

My previous cleanups to irange::set moved the early exit when
VARYING. This caused poly int varyings to be created with
incorrect min/max.

We can just set varying and exit for all poly ints.

gcc/ChangeLog:

* value-range.cc (irange::set): Early exit for poly ints.

analyzer: remove dead code

gcc/analyzer/ChangeLog:

* constraint-manager.cc (constraint_manager::merge): Remove
unused code.
* constraint-manager.h: Likewise.
* program-state.cc (sm_state_map::sm_state_map): Likewise.
(program_state::program_state): Likewise.
(test_sm_state_map): Likewise.
* program-state.h: Likewise.
* region-model-reachability.cc (reachable_regions::reachable_regions): Likewise.
* region-model-reachability.h: Likewise.
* region-model.cc (region_model::handle_unrecognized_call): Likewise.
(region_model::get_reachable_svalues): Likewise.
(region_model::can_merge_with_p): Likewise.

Fortran: OpenMP 5.0 (in_,task_)reduction clause extensions

gcc/fortran/ChangeLog:

* dump-parse-tree.c (show_omp_clauses): Handle new reduction enums.
* gfortran.h (OMP_LIST_REDUCTION_INSCAN, OMP_LIST_REDUCTION_TASK,
OMP_LIST_IN_REDUCTION, OMP_LIST_TASK_REDUCTION): Add enums.
* openmp.c (enum omp_mask1): Add OMP_CLAUSE_IN_REDUCTION
and OMP_CLAUSE_TASK_REDUCTION.
(gfc_match_omp_clause_reduction): Extend reduction handling;
moved from ...
(gfc_match_omp_clauses): ... here. Add calls to it.
(OMP_TASK_CLAUSES, OMP_TARGET_CLAUSES, OMP_TASKLOOP_CLAUSES):
Add OMP_CLAUSE_IN_REDUCTION.
(gfc_match_omp_taskgroup): Add task_reduction matching.
(resolve_omp_clauses): Update for new reduction clause changes;
remove removed nonmonotonic-schedule restrictions.
(gfc_resolve_omp_parallel_blocks): Add new enums to switch.
* trans-openmp.c (gfc_omp_clause_default_ctor,
gfc_trans_omp_reduction_list, gfc_trans_omp_clauses,
gfc_split_omp_clauses): Handle updated reduction clause.

gcc/ChangeLog:

* gimplify.c (gimplify_scan_omp_clauses, gimplify_omp_loop): Use 'do'
instead of 'for' in error messages for Fortran.
* omp-low.c (check_omp_nesting_restrictions): Likewise

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/schedule-modifiers-2.f90: Remove some dg-error.
* gfortran.dg/gomp/reduction4.f90: New test.
* gfortran.dg/gomp/reduction5.f90: New test.
* gfortran.dg/gomp/workshare-reduction-1.f90: New test.
* gfortran.dg/gomp/workshare-reduction-2.f90: New test.
* gfortran.dg/gomp/workshare-reduction-3.f90: New test.
* gfortran.dg/gomp/workshare-reduction-4.f90: New test.
* gfortran.dg/gomp/workshare-reduction-5.f90: New test.
* gfortran.dg/gomp/workshare-reduction-6.f90: New test.
* gfortran.dg/gomp/workshare-reduction-7.f90: New test.
* gfortran.dg/gomp/workshare-reduction-8.f90: New test.
* gfortran.dg/gomp/workshare-reduction-9.f90: New test.
* gfortran.dg/gomp/workshare-reduction-10.f90: New test.
* gfortran.dg/gomp/workshare-reduction-11.f90: New test.
* gfortran.dg/gomp/workshare-reduction-12.f90: New test.
* gfortran.dg/gomp/workshare-reduction-13.f90: New test.
* gfortran.dg/gomp/workshare-reduction-14.f90: New test.
* gfortran.dg/gomp/workshare-reduction-15.f90: New test.
* gfortran.dg/gomp/workshare-reduction-16.f90: New test.
* gfortran.dg/gomp/workshare-reduction-17.f90: New test.
* gfortran.dg/gomp/workshare-reduction-18.f90: New test.
* gfortran.dg/gomp/workshare-reduction-19.f90: New test.
* gfortran.dg/gomp/workshare-reduction-20.f90: New test.
* gfortran.dg/gomp/workshare-reduction-21.f90: New test.
* gfortran.dg/gomp/workshare-reduction-22.f90: New test.
* gfortran.dg/gomp/workshare-reduction-23.f90: New test.
* gfortran.dg/gomp/workshare-reduction-24.f90: New test.
* gfortran.dg/gomp/workshare-reduction-25.f90: New test.
* gfortran.dg/gomp/workshare-reduction-26.f90: New test.
* gfortran.dg/gomp/workshare-reduction-27.f90: New test.
* gfortran.dg/gomp/workshare-reduction-28.f90: New test.
* gfortran.dg/gomp/workshare-reduction-29.f90: New test.
* gfortran.dg/gomp/workshare-reduction-30.f90: New test.
* gfortran.dg/gomp/workshare-reduction-31.f90: New test.
* gfortran.dg/gomp/workshare-reduction-32.f90: New test.
* gfortran.dg/gomp/workshare-reduction-33.f90: New test.
* gfortran.dg/gomp/workshare-reduction-34.f90: New test.
* gfortran.dg/gomp/workshare-reduction-35.f90: New test.
* gfortran.dg/gomp/workshare-reduction-36.f90: New test.
* gfortran.dg/gomp/workshare-reduction-37.f90: New test.
* gfortran.dg/gomp/workshare-reduction-38.f90: New test.
* gfortran.dg/gomp/workshare-reduction-39.f90: New test.
* gfortran.dg/gomp/workshare-reduction-40.f90: New test.
* gfortran.dg/gomp/workshare-reduction-41.f90: New test.
* gfortran.dg/gomp/workshare-reduction-42.f90: New test.
* gfortran.dg/gomp/workshare-reduction-43.f90: New test.
* gfortran.dg/gomp/workshare-reduction-44.f90: New test.
* gfortran.dg/gomp/workshare-reduction-45.f90: New test.
* gfortran.dg/gomp/workshare-reduction-46.f90: New test.
* gfortran.dg/gomp/workshare-reduction-47.f90: New test.
* gfortran.dg/gomp/workshare-reduction-48.f90: New test.
* gfortran.dg/gomp/workshare-reduction-49.f90: New test.
* gfortran.dg/gomp/workshare-reduction-50.f90: New test.
* gfortran.dg/gomp/workshare-reduction-51.f90: New test.
* gfortran.dg/gomp/workshare-reduction-52.f90: New test.
* gfortran.dg/gomp/workshare-reduction-53.f90: New test.
* gfortran.dg/gomp/workshare-reduction-54.f90: New test.
* gfortran.dg/gomp/workshare-reduction-55.f90: New test.
* gfortran.dg/gomp/workshare-reduction-56.f90: New test.
* gfortran.dg/gomp/workshare-reduction-57.f90: New test.
* gfortran.dg/gomp/workshare-reduction-58.f90: New test.

opts: Change `is incompatible with` messages to have standard parametrised form

Hello,

In a recent review for one of the hwasan patches Richard S. noticed there are
quite a few errors of the form "%<someflag%> is incompatible with
<otherflag%>".
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/556137.html

In order to avoid this creating extra work for translators we would like to
change these error messages to use the form "%qs is incompatible with %qs" and
pass the flag as format arguments.

This patch implements that change.
There is only one change in the output the compiler produces from this patch,
an error message of "-fsanitize=address and -fsanitize=kernel-address are
incompatible with -fsanitize=thread" has been changed to "-fsanitize=thread is
incompatible with -fsanitize=address|kernel-address".
This matches the similar error messages for live patching which use the
messages "-f<something> is incompatible with
-flive-patching=inline-only-static|inline-clone".

Ok for trunk?

gcc/ChangeLog:

* opts.c (control_options_for_live_patching): Reform 'is incompatible
with' error messages to use a standard message with differing format
arguments.
(finish_options): Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/ubsan/sanitize-recover-7.c: Update testcase.

Fix minor whitespace issues

libgcc/

* libgcc2.c: Fix whitespace issues in most recent change.

Improve generated code for various libgcc2.c routines

libgcc/

* libgcc2.c (__addvSI3): Use overflow builtins.
(__addvsi3, __addvDI3 ,__subvSI3, __subvsi3): Likewise.
(__subvDI3 __mulvSI3, __mulvsi3, __negvSI2): Likewise.
(__negvsi2, __negvDI2): Likewise.
(__cmpdi2, __ucmpdi2): Adjust implementation to improve
generated code.
* libgcc2.h (__ucmpdi2): Adjust prototype.

libgo: update to Go 1.15.4 release

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/268177

c, c++: Fix up -Wunused-value on COMPLEX_EXPRs [PR97748]

The -Wunused-value warning in both C and C++ FEs (implemented
significantly differently between the two) sees the COMPLEX_EXPRs created
e.g. for complex pre/post increment and many other expressions as useless
and warns about it.

For the C warning implementation, on e.g.
COMPLEX_EXPR < ++REALPART_EXPR <x>, IMAGPART_EXPR <x>>;
would warn even on the IMAGPART_EXPR <x> there alone etc., so what works
is check if we'd warn about both operands of COMPLEX_EXPR and if yes,
warn on the whole COMPLEX_EXPR, otherwise don't warn.

The C++ warning implementation is significantly different and for that one
the only warn if both would be warned about doesn't really work,
we then miss warnings e.g. about
COMPLEX_EXPR <REALPART_EXPR <SAVE_EXPR <x>> + 1.0e+0, IMAGPART_EXPR <SAVE_EXPR <x>>> >>>>>
The patch replaces the warning_at call with call to the c-family
warn_if_unused_value function.

On the testcase which after the initial new tests contains pretty much
everything from gcc.dg/Wunused-value-1.c both approaches seem to work
nicely.

2020-11-10  Jakub Jelinek  <jakub@redhat.com>

PR c/97748
gcc/c-family/
* c-common.h (warn_if_unused_value): Add quiet argument defaulted
to false.
* c-warn.c (warn_if_unused_value): Likewise.  Pass it down
recursively and just return true instead of warning if it is true.
Handle COMPLEX_EXPR.
gcc/cp/
* cvt.c (convert_to_void): Check (complain & tf_warning) in the outer
if rather than twice times in the inner one.  Use warn_if_unused_value.
Formatting fix.
gcc/testsuite/
* c-c++-common/Wunused-value-1.c: New test.

tree-optimization/97769 - fix assert in peeling for alignment

The following removes an assert that can not easily be adjusted to
cover the additional cases we now handle after the removal of
the same-align DRs vector.

2020-11-10 Richard Biener <rguenther@suse.de>

PR tree-optimization/97769
* tree-vect-data-refs.c (vect_update_misalignment_for_peel):
Remove assert.

* gcc.dg/vect/pr97769.c: New testcase.

tree-optimization/97780 - fix ICE in fini_pre

This deals with blocks elimination added.

2020-11-10 Richard Biener <rguenther@suse.de>

PR tree-optimization/97780
* tree-ssa-pre.c (fini_pre): Deal with added basic blocks
when freeing PHI_TRANS_TABLE.

AArch64: Add FLAG for tbl/tbx intrinsics [PR94442]

2020-11-10 Zhiheng Xie <xiezhiheng@huawei.com>
Nannan Zheng <zhengnannan@huawei.com>

gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def: Add proper FLAG
for tbl/tbx intrinsics.

openmp: Implement OpenMP 5.0 base-pointer attachement and clause ordering

This patch implements some parts of the target variable mapping changes
specified in OpenMP 5.0, including base-pointer attachment/detachment
behavior for array section list-items in map clauses, and ordering of
map clauses according to map kind.

2020-11-10 Chung-Lin Tang <cltang@codesourcery.com>

gcc/c-family/ChangeLog:

* c-common.h (c_omp_adjust_map_clauses): New declaration.
* c-omp.c (struct map_clause): Helper type for c_omp_adjust_map_clauses.
(c_omp_adjust_map_clauses): New function.

gcc/c/ChangeLog:

* c-parser.c (c_parser_omp_target_data): Add use of
new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
handled map clause kind.
(c_parser_omp_target_enter_data): Likewise.
(c_parser_omp_target_exit_data): Likewise.
(c_parser_omp_target): Likewise.
* c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type.
(c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.

gcc/cp/ChangeLog:

* parser.c (cp_parser_omp_target_data): Add use of
new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
handled map clause kind.
(cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
(cp_parser_omp_target): Likewise.
* semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix
interaction between reference case and attach/detach.
(finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.

gcc/ChangeLog:

* gimplify.c (is_or_contains_p): New static helper function.
(omp_target_reorder_clauses): New function.
(gimplify_scan_omp_clauses): Add use of omp_target_reorder_clauses to
reorder clause list according to OpenMP 5.0 rules. Add handling of
GOMP_MAP_ATTACH_DETACH for OpenMP cases.
* omp-low.c (is_omp_target): New static helper function.
(scan_sharing_clauses): Add scan phase handling of GOMP_MAP_ATTACH/DETACH
for OpenMP cases.
(lower_omp_target): Add lowering handling of GOMP_MAP_ATTACH/DETACH for
OpenMP cases.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/clauses-2.c: Remove dg-error cases now valid.
* gfortran.dg/gomp/map-2.f90: Likewise.
* c-c++-common/gomp/map-5.c: New testcase.

libgomp/ChangeLog:

* libgomp.h (enum gomp_map_vars_kind): Adjust enum values to be bit-flag
usable.
* oacc-mem.c (acc_map_data): Adjust gomp_map_vars argument flags to
'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA'.
(goacc_enter_datum): Likewise for call to gomp_map_vars_async.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_internal):
Change checks of GOMP_MAP_VARS_ENTER_DATA to use bit-and (&). Adjust use
of gomp_attach_pointer for OpenMP cases.
(gomp_exit_data): Add handling of GOMP_MAP_DETACH.
(GOMP_target_enter_exit_data): Add handling of GOMP_MAP_ATTACH.
* testsuite/libgomp.c-c++-common/ptr-attach-1.c: New testcase.

IBM Z: Test long doubles in vector registers

gcc/testsuite/ChangeLog:

2020-11-05 Ilya Leoshkevich <iii@linux.ibm.com>

* gcc.target/s390/vector/long-double-callee-abi-scan.c: New test.
* gcc.target/s390/vector/long-double-caller-abi-run.c: New test.
* gcc.target/s390/vector/long-double-caller-abi-scan.c: New test.
* gcc.target/s390/vector/long-double-copysign.c: New test.
* gcc.target/s390/vector/long-double-fprx2-constant.c: New test.
* gcc.target/s390/vector/long-double-from-double.c: New test.
* gcc.target/s390/vector/long-double-from-float.c: New test.
* gcc.target/s390/vector/long-double-from-i16.c: New test.
* gcc.target/s390/vector/long-double-from-i32.c: New test.
* gcc.target/s390/vector/long-double-from-i64.c: New test.
* gcc.target/s390/vector/long-double-from-i8.c: New test.
* gcc.target/s390/vector/long-double-from-u16.c: New test.
* gcc.target/s390/vector/long-double-from-u32.c: New test.
* gcc.target/s390/vector/long-double-from-u64.c: New test.
* gcc.target/s390/vector/long-double-from-u8.c: New test.
* gcc.target/s390/vector/long-double-to-double.c: New test.
* gcc.target/s390/vector/long-double-to-float.c: New test.
* gcc.target/s390/vector/long-double-to-i16.c: New test.
* gcc.target/s390/vector/long-double-to-i32.c: New test.
* gcc.target/s390/vector/long-double-to-i64.c: New test.
* gcc.target/s390/vector/long-double-to-i8.c: New test.
* gcc.target/s390/vector/long-double-to-u16.c: New test.
* gcc.target/s390/vector/long-double-to-u32.c: New test.
* gcc.target/s390/vector/long-double-to-u64.c: New test.
* gcc.target/s390/vector/long-double-to-u8.c: New test.
* gcc.target/s390/vector/long-double-vec-duplicate.c: New test.
* gcc.target/s390/vector/long-double-wf.h: New test.
* gcc.target/s390/vector/long-double-wfaxb.c: New test.
* gcc.target/s390/vector/long-double-wfcxb-0001.c: New test.
* gcc.target/s390/vector/long-double-wfcxb-0111.c: New test.
* gcc.target/s390/vector/long-double-wfcxb-1011.c: New test.
* gcc.target/s390/vector/long-double-wfcxb-1101.c: New test.
* gcc.target/s390/vector/long-double-wfdxb.c: New test.
* gcc.target/s390/vector/long-double-wfixb.c: New test.
* gcc.target/s390/vector/long-double-wfkxb-0111.c: New test.
* gcc.target/s390/vector/long-double-wfkxb-1011.c: New test.
* gcc.target/s390/vector/long-double-wfkxb-1101.c: New test.
* gcc.target/s390/vector/long-double-wflcxb.c: New test.
* gcc.target/s390/vector/long-double-wflpxb.c: New test.
* gcc.target/s390/vector/long-double-wfmaxb-2.c: New test.
* gcc.target/s390/vector/long-double-wfmaxb-3.c: New test.
* gcc.target/s390/vector/long-double-wfmaxb-disabled.c: New test.
* gcc.target/s390/vector/long-double-wfmaxb.c: New test.
* gcc.target/s390/vector/long-double-wfmsxb-disabled.c: New test.
* gcc.target/s390/vector/long-double-wfmsxb.c: New test.
* gcc.target/s390/vector/long-double-wfmxb.c: New test.
* gcc.target/s390/vector/long-double-wfnmaxb-disabled.c: New test.
* gcc.target/s390/vector/long-double-wfnmaxb.c: New test.
* gcc.target/s390/vector/long-double-wfnmsxb-disabled.c: New test.
* gcc.target/s390/vector/long-double-wfnmsxb.c: New test.
* gcc.target/s390/vector/long-double-wfsqxb.c: New test.
* gcc.target/s390/vector/long-double-wfsxb-1.c: New test.
* gcc.target/s390/vector/long-double-wfsxb.c: New test.
* gcc.target/s390/vector/long-double-wftcixb-1.c: New test.
* gcc.target/s390/vector/long-double-wftcixb.c: New test.

IBM Z: Store long doubles in vector registers when possible

On z14+, there are instructions for working with 128-bit floats (long
doubles) in vector registers.  It's beneficial to use them instead of
instructions that operate on floating point register pairs, because it
allows to store 4 times more data in registers at a time, relieving
register pressure.  The raw performance of the new instructions is
almost the same as that of the new ones.

Implement by storing TFmode values in vector registers on z14+.  Since
not all operations are available with the new instructions, keep the
old ones available using the new FPRX2 mode, and convert between it and
TFmode when necessary (this is called "forwarder" expanders below).
Change the existing TFmode expanders to call either new- or old-style
ones depending on whether we are on z14+ or older machines
("dispatcher" expanders).

gcc/ChangeLog:

2020-11-03  Ilya Leoshkevich  <iii@linux.ibm.com>

* config/s390/s390-modes.def (FPRX2): New mode.
* config/s390/s390-protos.h (s390_fma_allowed_p): New function.
* config/s390/s390.c (s390_fma_allowed_p): Likewise.
(s390_build_signbit_mask): Support 128-bit masks.
(print_operand): Support printing the second word of a TFmode
operand as vector register.
(constant_modes): Add FPRX2mode.
(s390_class_max_nregs): Return 1 for TFmode on z14+.
(s390_is_fpr128): New function.
(s390_is_vr128): Likewise.
(s390_can_change_mode_class): Use s390_is_fpr128 and
s390_is_vr128 in order to determine whether mode refers to a FPR
pair or to a VR.
(s390_emit_compare): Force TFmode operands into registers on
z14+.
* config/s390/s390.h (HAVE_TF): New macro.
(EXPAND_MOVTF): New macro.
(EXPAND_TF): Likewise.
* config/s390/s390.md (PFPO_OP_TYPE_FPRX2): PFPO_OP_TYPE_TF
alias.
(ALL): Add FPRX2.
(FP_ALL): Add FPRX2 for z14+, restrict TFmode to z13-.
(FP): Likewise.
(FP_ANYTF): New mode iterator.
(BFP): Add FPRX2 for z14+, restrict TFmode to z13-.
(TD_TF): Likewise.
(xde): Add FPRX2.
(nBFP): Likewise.
(nDFP): Likewise.
(DSF): Likewise.
(DFDI): Likewise.
(SFSI): Likewise.
(DF): Likewise.
(SF): Likewise.
(fT0): Likewise.
(bt): Likewise.
(_d): Likewise.
(HALF_TMODE): Likewise.
(tf_fpr): New mode_attr.
(type): New mode_attr.
(*cmp<mode>_ccz_0): Use type instead of mode with fsimp.
(*cmp<mode>_ccs_0_fastmath): Likewise.
(*cmptf_ccs): New pattern for wfcxb.
(*cmptf_ccsfps): New pattern for wfkxb.
(mov<mode>): Rename to mov<mode><tf_fpr>.
(signbit<mode>2): Rename to signbit<mode>2<tf_fpr>.
(isinf<mode>2): Renamed to isinf<mode>2<tf_fpr>.
(*TDC_insn_<mode>): Use type instead of mode with fsimp.
(fixuns_trunc<FP:mode><GPR:mode>2): Rename to
fixuns_trunc<FP:mode><GPR:mode>2<FP:tf_fpr>.
(fix_trunctf<mode>2): Rename to fix_trunctf<mode>2_fpr.
(floatdi<mode>2): Rename to floatdi<mode>2<tf_fpr>, use type
instead of mode with itof.
(floatsi<mode>2): Rename to floatsi<mode>2<tf_fpr>, use type
instead of mode with itof.
(*floatuns<GPR:mode><FP:mode>2): Use type instead of mode for
itof.
(floatuns<GPR:mode><FP:mode>2): Rename to
floatuns<GPR:mode><FP:mode>2<tf_fpr>.
(trunctf<mode>2): Rename to trunctf<mode>2_fpr, use type instead
of mode with fsimp.
(extend<DSF:mode><BFP:mode>2): Rename to
extend<DSF:mode><BFP:mode>2<BFP:tf_fpr>.
(<FPINT:fpint_name><BFP:mode>2): Rename to
<FPINT:fpint_name><BFP:mode>2<BFP:tf_fpr>, use type instead of
mode with fsimp.
(rint<BFP:mode>2): Rename to rint<BFP:mode>2<BFP:tf_fpr>, use
type instead of mode with fsimp.
(<FPINT:fpint_name><DFP:mode>2): Use type instead of mode for
fsimp.
(rint<DFP:mode>2): Likewise.
(trunc<BFP:mode><DFP_ALL:mode>2): Rename to
trunc<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
(trunc<DFP_ALL:mode><BFP:mode>2): Rename to
trunc<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
(extend<BFP:mode><DFP_ALL:mode>2): Rename to
extend<BFP:mode><DFP_ALL:mode>2<BFP:tf_fpr>.
(extend<DFP_ALL:mode><BFP:mode>2): Rename to
extend<DFP_ALL:mode><BFP:mode>2<BFP:tf_fpr>.
(add<mode>3): Rename to add<mode>3<tf_fpr>, use type instead of
mode with fsimp.
(*add<mode>3_cc): Use type instead of mode with fsimp.
(*add<mode>3_cconly): Likewise.
(sub<mode>3): Rename to sub<mode>3<tf_fpr>, use type instead of
mode with fsimp.
(*sub<mode>3_cc): Use type instead of mode with fsimp.
(*sub<mode>3_cconly): Likewise.
(mul<mode>3): Rename to mul<mode>3<tf_fpr>, use type instead of
mode with fsimp.
(fma<mode>4): Restrict using s390_fma_allowed_p.
(fms<mode>4): Restrict using s390_fma_allowed_p.
(div<mode>3): Rename to div<mode>3<tf_fpr>, use type instead of
mode with fdiv.
(neg<mode>2): Rename to neg<mode>2<tf_fpr>.
(*neg<mode>2_cc): Use type instead of mode with fsimp.
(*neg<mode>2_cconly): Likewise.
(*neg<mode>2_nocc): Likewise.
(*neg<mode>2): Likeiwse.
(abs<mode>2): Rename to abs<mode>2<tf_fpr>, use type instead of
mode with fdiv.
(*abs<mode>2_cc): Use type instead of mode with fsimp.
(*abs<mode>2_cconly): Likewise.
(*abs<mode>2_nocc): Likewise.
(*abs<mode>2): Likewise.
(*negabs<mode>2_cc): Likewise.
(*negabs<mode>2_cconly): Likewise.
(*negabs<mode>2_nocc): Likewise.
(*negabs<mode>2): Likewise.
(sqrt<mode>2): Rename to sqrt<mode>2<tf_fpr>, use type instead
of mode with fsqrt.
(cbranch<mode>4): Use FP_ANYTF instead of FP.
(copysign<mode>3): Rename to copysign<mode>3<tf_fpr>, use type
instead of mode with fsimp.
* config/s390/s390.opt (flag_vx_long_double_fma): New
undocumented option.
* config/s390/vector.md (V_HW): Add TF for z14+.
(V_HW2): Likewise.
(VFT): Likewise.
(VF_HW): Likewise.
(V_128): Likewise.
(tf_vr): New mode_attr.
(tointvec): Add TF.
(mov<mode>): Rename to mov<mode><tf_vr>.
(movetf): New dispatcher.
(*vec_tf_to_v1tf): Rename to *vec_tf_to_v1tf_fpr, restrict to
z13-.
(*vec_tf_to_v1tf_vr): New pattern for z14+.
(*fprx2_to_tf): Likewise.
(*mov_tf_to_fprx2_0): Likewise.
(*mov_tf_to_fprx2_1): Likewise.
(add<mode>3): Rename to add<mode>3<tf_vr>.
(addtf3): New dispatcher.
(sub<mode>3): Rename to sub<mode>3<tf_vr>.
(subtf3): New dispatcher.
(mul<mode>3): Rename to mul<mode>3<tf_vr>.
(multf3): New dispatcher.
(div<mode>3): Rename to div<mode>3<tf_vr>.
(divtf3): New dispatcher.
(sqrt<mode>2): Rename to sqrt<mode>2<tf_vr>.
(sqrttf2): New dispatcher.
(fma<mode>4): Restrict using s390_fma_allowed_p.
(fms<mode>4): Likewise.
(neg_fma<mode>4): Likewise.
(neg_fms<mode>4): Likewise.
(neg<mode>2): Rename to neg<mode>2<tf_vr>.
(negtf2): New dispatcher.
(abs<mode>2): Rename to abs<mode>2<tf_vr>.
(abstf2): New dispatcher.
(float<mode>tf2_vr): New forwarder.
(float<mode>tf2): New dispatcher.
(floatuns<mode>tf2_vr): New forwarder.
(floatuns<mode>tf2): New dispatcher.
(fix_trunctf<mode>2_vr): New forwarder.
(fix_trunctf<mode>2): New dispatcher.
(fixuns_trunctf<mode>2_vr): New forwarder.
(fixuns_trunctf<mode>2): New dispatcher.
(<FPINT:fpint_name><VF_HW:mode>2<VF_HW:tf_vr>): New pattern.
(<FPINT:fpint_name>tf2): New forwarder.
(rint<mode>2<tf_vr>): New pattern.
(rinttf2): New forwarder.
(*trunctfdf2_vr): New pattern.
(trunctfdf2_vr): New forwarder.
(trunctfdf2): New dispatcher.
(trunctfsf2_vr): New forwarder.
(trunctfsf2): New dispatcher.
(extenddftf2_vr): New pattern.
(extenddftf2): New dispatcher.
(extendsftf2_vr): New forwarder.
(extendsftf2): New dispatcher.
(signbittf2_vr): New forwarder.
(signbittf2): New dispatchers.
(isinftf2_vr): New forwarder.
(isinftf2): New dispatcher.
* config/s390/vx-builtins.md (*vftci<mode>_cconly): Use VF_HW
instead of VECF_HW, add missing constraint, add vw support.
(vftci<mode>_intcconly): Use VF_HW instead of VECF_HW.
(*vftci<mode>): Rename to vftci<mode>, use VF_HW instead of
VECF_HW, and vw support.
(vftci<mode>_intcc): Use VF_HW instead of VECF_HW.

Fix wrong code for boolean negation in condition at -O2

The problem is the bitwise/logical dichotomy for operators and the
transition from the former to the latter for boolean types: if they
are 1-bit, that's straightforward but, if they are larger, then you
need to be careful because you cannot, on the one hand, turn a bitwise
AND into a logical AND and, on the other hand, *not* turn e.g. a
bitwise NOT into a logical NOT if they occur in the same computation,
as the first change will drop the masking that may need to be applied
after the bitwise NOT if it is not also changed.

Given that the ranger turns bitwise AND/OR into logical AND/OR for
booleans, the patch does the same for bitwise NOT.

gcc/ChangeLog:
* range-op.cc (operator_logical_not::fold_range): Tidy up.
(operator_logical_not::op1_range): Call above method.
(operator_bitwise_not::fold_range): If the type is compatible
with boolean, call op_logical_not.fold_range.
(operator_bitwise_not::op1_range): If the type is compatible
with boolean, call op_logical_not.op1_range.

gcc/testsuite/ChangeLog:
* gnat.dg/opt88.adb: New test.

More PRE TLC

This makes get_expr_value_id cheap and completes the
constant value-id simplification by turning the constant_value_expressions
into a direct map instead of a set of pre_exprs for the value.

2020-11-10 Richard Biener <rguenther@suse.de>

* tree-ssa-pre.c (pre_expr_d::value_id): Add.
(constant_value_expressions): Turn into an array of pre_expr.
(get_or_alloc_expr_for_nary): New function.
(get_or_alloc_expr_for_reference): Likewise.
(add_to_value): For constant values only ever add a single
CONSTANT.
(get_expr_value_id): Return the new value_id member.
(vn_valnum_from_value_id): Split out and simplify constant
value id handling.
(get_or_alloc_expr_for_constant): Set the value_id member.
(phi_translate_1): Use get_or_alloc_expr_for_*.
(compute_avail): Likewise.
(bitmap_find_leader): Simplify constant value id handling.

aarch64: Skip arm targets in vq*shr*n_high_n intrinsic tests

These tests should be skipped for arm targets as the instrinsics
are only supported on aarch64.

gcc/testsuite/ChangeLog

2020-11-10 David Candler <david.candler@arm.com>

* gcc.target/aarch64/advsimd-intrinsics/vqrshrn_high_n.c: Added skip
directive.
* gcc.target/aarch64/advsimd-intrinsics/vqrshrun_high_n.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vqshrn_high_n.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vqshrun_high_n.c: Likewise.

doc: Fix grammar in description of earlyclobber

gcc/ChangeLog:

* doc/md.texi (Modifiers): Fix grammar in description of
earlyclobber constraint modifier.

sccvn: Fix up push_partial_def little-endian bitfield handling [PR97764]

This patch fixes a thinko in the left-endian push_partial_def path.
As the testcase shows, we have 3 bitfields in the struct,
bitoff  bitsize
0       3
3       28
31      1
the corresponding read is the byte at offset 3 (i.e. 24 bits)
and push_partial_def first handles the full store ({}) to all bits
and then is processing the store to the middle bitfield with value of -1.
Here are the interesting spots:
  pd.offset -= offseti;
this adjusts the pd to { -21, 28 }, the (for little-endian lowest) 21
bits aren't interesting to us, we only care about the upper 7.
          len = native_encode_expr (pd.rhs, this_buffer, bufsize,
                                    MAX (0, -pd.offset) / BITS_PER_UNIT);
native_encode_expr has the offset parameter in bytes and we tell it
that we aren't interested in the first (lowest) two bytes of the number.
It encodes 0xff, 0xff with len == 2 then.
      HOST_WIDE_INT size = pd.size;
      if (pd.offset < 0)
        size -= ROUND_DOWN (-pd.offset, BITS_PER_UNIT);
we get 28 - 16, i.e. 12 - the 16 is subtracting those 2 bytes that we
omitted in native_encode_expr.
          size = MIN (size, (HOST_WIDE_INT) needed_len * BITS_PER_UNIT);
needed_len is how many bytes the read at most needs, and that is 1,
so we get size 8 and copy all 8 bits (i.e. a single byte plus nothing)
from the native_encode_expr filled this_buffer; this incorrectly sets
the byte to 0xff when we want 0x7f.  The above line is correct for the
pd.offset >= 0 case when we don't skip anything, but for the pd.offset < 0
case we need to subtract also the remainder of the bits we aren't interested
in (the code shifts the bytes by that number of bits).
If it weren't for the big-endian path, we could as well do
      if (pd.offset < 0)
        size += pd.offset;
but the big-endian path needs it differently.
With the following patch, amnt is 3 and we subtract from 12 the (8 - 3)
bits and thus get the 7 which is the value we want.

2020-11-10  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/97764
* tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): For
little-endian stores with negative pd.offset, subtract
BITS_PER_UNIT - amnt from size if amnt is non-zero.

* gcc.c-torture/execute/pr97764.c: New test.

Fortran: Fix function decl's location [PR95847]

gcc/fortran/ChangeLog:

PR fortran/95847
* trans-decl.c (gfc_get_symbol_decl): Do not (re)set the location
of an external procedure.
(build_entry_thunks, generate_coarray_init, create_main_function,
gfc_generate_function_code): Use fndecl's location in BIND_EXPR.

gcc/testsuite/ChangeLog:

PR fortran/95847
* gfortran.dg/coverage.f90: New test.

tree-optimization/97760 - reduction paths with unhandled live stmt

This makes sure we reject reduction paths with a live stmt that
is not the last one altering the value. This is because we do not
handle this in the epilogue unless there's a scalar epilogue loop.

2020-11-09 Richard Biener <rguenther@suse.de>

PR tree-optimization/97760
* tree-vect-loop.c (check_reduction_path): Reject
reduction paths we do not handle in epilogue generation.

* gcc.dg/vect/pr97760.c: New testcase.

Normalize VARYING for -fstrict-enums.

The problem here is that the representation for VARYING in
-fstrict-enums is different between value_range and irange.

The helper function irange::normalize_min_max() will normalize to
VARYING only if setting the range to the entire domain of the
underlying type.  That is, [0, 0xff..ff], not the domain as defined by
-fstrict-enums.  This causes problems because the multi-range version
of varying_p() will return true if the range is the domain as defined
by -fstrict-enums.  Thus, normalize_min_max and varying_p have
different concepts of varying for multi-ranges.

(BTW, legacy ranges are different because they never look at the
extremes of a range to determine varying-ness.  They only look at the
kind field.)

One approach is to change all the code to limit ranges to the domain
in the -fstrict-enums world, but this won't work because there are
various instances of gimple where the values assigned or compared are
beyond the limits of TYPE_{MIN,MAX}_VALUE.  One example is the
addition of 0xffffffff to represent subtraction.

This patch fixes multi-range varying_p() and set_varying() to agree
with the normalization code, using the extremes of the underlying type,
to represent varying.

gcc/ChangeLog:

PR tree-optimization/97767
* value-range.cc (dump_bound_with_infinite_markers): Use
wi::min_value and wi::max_value.
(range_tests_strict_enum): New.
(range_tests): Call range_tests_strict_enum.
* value-range.h (irange::varying_p): Use wi::min_value
and wi::max_value.
(irange::set_varying): Same.
(irange::normalize_min_max): Remove comment.

gcc/testsuite/ChangeLog:

* g++.dg/opt/pr97767.C: New test.

Adjust Keylocker regex pattern for darwin, and add missing aesenc256kl test.

gcc/testsuite/ChangeLog

* gcc.target/i386/keylocker-aesdec128kl.c: Adjust regex patterns.
* gcc.target/i386/keylocker-aesdec256kl.c: Likewise.
* gcc.target/i386/keylocker-aesdecwide128kl.c: Likewise.
* gcc.target/i386/keylocker-aesdecwide256kl.c: Likewise.
* gcc.target/i386/keylocker-aesenc128kl.c: Likewise.
* gcc.target/i386/keylocker-aesencwide128kl.c: Likewise.
* gcc.target/i386/keylocker-aesencwide256kl.c: Likewise.
* gcc.target/i386/keylocker-encodekey128.c: Likewise.
* gcc.target/i386/keylocker-encodekey256.c: Likewise.
* gcc.target/i386/keylocker-aesenc256kl.c: New test.

Fix logical_combine OR operation. Again.

The original fix was incorrect and results in loss of opportunities.
Revert the original fix. When processing logical chains, do not
follow chains outside of the current basic block. Use the import
value instead.

gcc/
PR tree-optimization/97567
* gimple-range-gori.cc: (gori_compute::logical_combine): False
OR operations should intersect the 2 results.
(gori_compute::compute_logical_operands_in_chain): If def chains
are outside the current basic block, don't follow them.
gcc/testsuite/
* gcc.dg/pr97567-2.c: New.

Daily bump.

c++: DR 1914 - Allow duplicate standard attributes.

Following Joseph's change for C to allow duplicate C2x standard attributes
<https://gcc.gnu.org/pipermail/gcc-patches/2020-October/557272.html>,
this patch does a similar thing for C++.  This is DR 1914, to be resolved by
<wg21.link/p2156>, which is not part of the standard yet, but has wide
support so looks like a shoo-in.  The duplications now produce warnings
instead, but only if the attribute wasn't specified via a macro.

gcc/c-family/ChangeLog:

DR 1914
* c-common.c (attribute_fallthrough_p): Tweak the warning
message.

gcc/cp/ChangeLog:

DR 1914
* parser.c (cp_parser_check_std_attribute): Return bool.  Add a
location_t parameter.  Return true if the attribute wasn't duplicated.
Give a warning instead of an error.  Check more attributes.
(cp_parser_std_attribute_list): Don't add duplicated attributes to
the list.  Pass location to cp_parser_check_std_attribute.

gcc/testsuite/ChangeLog:

DR 1914
* c-c++-common/attr-fallthrough-2.c: Adjust dg-warning.
* g++.dg/cpp0x/fallthrough2.C: Likewise.
* g++.dg/cpp0x/gen-attrs-60.C: Turn dg-error into dg-warning.
* g++.dg/cpp1y/attr-deprecated-2.C: Likewise.
* g++.dg/cpp2a/attr-likely2.C: Adjust dg-warning.
* g++.dg/cpp2a/nodiscard-once.C: Turn dg-error into dg-warning.
* g++.dg/cpp0x/gen-attrs-72.C: New test.

c++: Consider only relevant template arguments in sat_hasher

A large source of cache misses in satisfy_atom is caused by the identity
of an (atom,args) pair within the satisfaction cache being determined by
the entire set of supplied template arguments rather than by the subset
of template arguments that the atom actually depends on.  For instance,
consider

  template <class T> concept range = range_v<T>;
  template <class U> void foo () requires range<U>;
  template <class U, class V> void bar () requires range<U>;

The associated constraints of foo and bar are equivalent: they both
consist of the atom range_v<T> (with mapping T -> U).  But the sat_cache
currently will never reuse a satisfaction value between the two atoms
because foo has one template parameter and bar has two, and the
satisfaction cache conservatively assumes that all template parameters
of the constrained decl are relevant to a satisfaction value of one of
its atoms.

This patch eliminates this assumption and makes the sat_cache instead
care about just the subset of args of an (atom,args) pair that is
relevant to satisfaction.

This patch additionally fixes a seemingly latent bug that was found when
testing against range-v3.  In the testcase concepts-decltype2.C below,
during normalization of f's constraints we end up forming a TARGET_EXPR
whose _SLOT has a DECL_CONTEXT that points to g instead of f because
current_function_decl is not updated before we start normalizing.
This patch fixes this accordingly, and also adds a sanity check to
keep_template_parm to verify each found parameter has a valid index.

With this patch, compile time and memory usage for the cmcstl2 test
test/algorithm/set_symmetric_difference4.cpp drops from 8.5s/1.2GB to
3.5s/0.4GB.

gcc/cp/ChangeLog:

* constraint.cc (norm_info::norm_info): Initialize orig_decl.
(norm_info::orig_decl): New data member.
(normalize_atom): When caching an atom for the first time,
compute a list of template parameters used in the targets of the
parameter mapping and store it in the TREE_TYPE of the mapping.
(get_normalized_constraints_from_decl): Set current_function_decl
appropriately when normalizing.  As an optimization, don't
set up a push_nested_class_guard when decl has no constraints.
(sat_hasher::hash): Use this list to hash only the template
arguments that are relevant to the atom.
(satisfy_atom): Use this list to compare only the template
arguments that are relevant to the atom.
* pt.c (keep_template_parm): Do a sanity check on the parameter's
index when flag_checking.

c++: Use two levels of caching in satisfy_atom

This improves the effectiveness of caching in satisfy_atom by querying
the cache again after we've instantiated the atom's parameter mapping.

Before instantiating its mapping, the identity of an (atom,args) pair
within the satisfaction cache is determined by idiosyncratic things like
the level and index of each template parameter used in targets of the
parameter mapping.  For example, the associated constraints of foo in

  template <class T> concept range = range_v<T>;
  template <class U, class V> void foo () requires range<U> && range<V>;

are range_v<T> (with mapping T -> U) /\ range_v<T> (with mapping T -> V).
If during satisfaction the template arguments supplied for U and V are
the same, then the satisfaction value of these two atoms will be the
same (despite their uninstantiated parameter mappings being different).

But sat_cache doesn't see this because it compares the uninstantiated
parameter mapping and the supplied template arguments of sat_entry's
independently.  So satisy_atom on this latter atom will end up fully
evaluating it instead of reusing the satisfaction value of the former.

But there is a point when the two atoms do look the same to sat_cache,
and that's after instantiating their parameter mappings.  By querying
the cache again at this point, we can avoid substituting the same
instantiated parameter mapping into the same expression a second time
around.

With this patch, compile time and memory usage for the cmcstl2 test
test/algorithm/set_symmetric_diference4.cpp drops from 11s/1.4GB to
8.5s/1.2GB with an --enable-checking=release compiler.

gcc/cp/ChangeLog:

* cp-tree.h (ATOMIC_CONSTR_MAP_INSTANTIATED_P): Define this flag
for ATOMIC_CONSTRs.
* constraint.cc (sat_hasher::hash): Use hash_atomic_constraint
if the flag is set, otherwise keep using a pointer hash.
(sat_hasher::equal): Return false if the flag's setting differs
on two atoms.  Call atomic_constraints_identical_p if the flag
is set, otherwise keep using a pointer equality test.
(satisfy_atom): After instantiating the parameter mapping, form
another ATOMIC_CONSTR using the instantiated mapping and query
the cache again.  Cache the satisfaction value of both atoms.
(diagnose_atomic_constraint): Simplify now that the supplied
atom has an instantiated mapping.