Use OEP_MATCH_SIDE_EFFECTS in compare_ao_refs

* tree-ssa-alias.c (ao_compare::compare_ao_refs,
ao_compare::hash_ao_ref): Use OEP_MATCH_SIDE_EFFECTS.

c++: Fix wrong error with constexpr destructor [PR97427]

When I implemented the code to detect modifying const objects in
constexpr contexts, we couldn't have constexpr destructors, so I didn't
consider them.  But now we can and that caused a bogus error in this
testcase: [class.dtor]p5 says that "const and volatile semantics are not
applied on an object under destruction.  They stop being in effect when
the destructor for the most derived object starts." so we have to clear
the TREE_READONLY flag we set on the object after the constructors have
been called to mark it as no-longer-under-construction.  In the ~Foo
call it's now an object under destruction, so don't report those errors.
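
A minimal sketch of the kind of code affected.  This is not the
constexpr-dtor10.C testcase, which is not shown here.  The names Counter,
destroy_const_counter and CXX20_CONSTEXPR are made up for illustration; the
macro only exists so that the snippet also compiles before C++20, where
destructors cannot be constexpr.

```cpp
#include <cassert>

// Hypothetical helper macro: constexpr destructors need C++20, so fall
// back to ordinary run-time functions on older standards.
#if defined(__cpp_constexpr_dynamic_alloc)
#  define CXX20_CONSTEXPR constexpr
#else
#  define CXX20_CONSTEXPR
#endif

struct Counter {
    int n = 5;
    // Writes to a member while the object is under destruction.
    CXX20_CONSTEXPR ~Counter() { n = 3; }
};

CXX20_CONSTEXPR bool destroy_const_counter() {
    const Counter c;   // const during its lifetime, not once ~Counter starts
    return c.n == 5;   // c is destroyed after the result is computed
}

#if defined(__cpp_constexpr_dynamic_alloc)
// With the fix described above, destroying a const object during constant
// evaluation should be accepted rather than diagnosed.
static_assert(destroy_const_counter(), "const object destroyed in constexpr");
#endif
```

On a compiler without the fix, the static_assert is where the bogus
"modifying a const object" error would be expected to show up.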

gcc/cp/ChangeLog:

PR c++/97427
* constexpr.c (cxx_set_object_constness): New function.
(cxx_eval_call_expression): Set new_obj for destructors too.
Call cxx_set_object_constness to set/unset TREE_READONLY of
the object under construction/destruction.

gcc/testsuite/ChangeLog:

PR c++/97427
* g++.dg/cpp2a/constexpr-dtor10.C: New test.

libstdc++: Fix atomic waiting for non-linux targets

This fixes some UNRESOLVED tests on (at least) Solaris and Darwin, and
disables some tests that hang forever on Solaris. A proper fix is still
needed.

libstdc++-v3/ChangeLog:

* include/bits/atomic_base.h (atomic_flag::wait): Use correct
type for __atomic_wait call.
* include/bits/atomic_timed_wait.h (__atomic_wait_until): Check
_GLIBCXX_HAVE_LINUX_FUTEX.
* include/bits/atomic_wait.h (__atomic_notify): Likewise.
* include/bits/semaphore_base.h (_GLIBCXX_HAVE_POSIX_SEMAPHORE):
Only define if SEM_VALUE_MAX or _POSIX_SEM_VALUE_MAX is defined.
* testsuite/29_atomics/atomic/wait_notify/bool.cc: Disable on
non-linux targets.
* testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
* testsuite/29_atomics/atomic_flag/wait_notify/1.cc: Likewise.
* testsuite/29_atomics/atomic_float/wait_notify.cc: Likewise.

Update vect-35.c and vect-35-big-array.c

We now determine dependencies across union fields correctly.

* gcc.dg/vect/vect-35-big-array.c: Expect 2 loops to be vectorized.
* gcc.dg/vect/vect-35.c: Expect 2 loops to be vectorized.

Improve hashing of anonymous namespace types

* ipa-icf.c (sem_function::equals_wpa): Do not compare ODR type with
-fno-devirtualize.
(sem_item_optimizer::update_hash_by_addr_refs): Hash anonymous ODR
types by TYPE_UID of their main variant.

Re-enable vector pair memcpy/memmove expansion

After the MMA opaque mode patch goes in, we can re-enable
use of vector pair in the inline expansion of memcpy/memmove.

gcc/
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Enable vector pair memcpy/memmove expansion.

Make MMA builtins use opaque modes

This patch changes powerpc MMA builtins to use the new opaque
mode class and use modes OO (32 bytes) and XO (64 bytes)
instead of POI/PXI. Using the opaque modes prevents
optimization from trying to do anything with vector
pair/quad, which was the problem we were seeing with the
partial integer modes.

gcc/
* config/rs6000/mma.md (unspec): Add assemble/extract UNSPECs.
(movoi): Change to movoo.
(*movpoi): Change to *movoo.
(movxi): Change to movxo.
(*movpxi): Change to *movxo.
(mma_assemble_pair): Change to OO mode.
(*mma_assemble_pair): New define_insn_and_split.
(mma_disassemble_pair): New define_expand.
(*mma_disassemble_pair): New define_insn_and_split.
(mma_assemble_acc): Change to XO mode.
(*mma_assemble_acc): Change to XO mode.
(mma_disassemble_acc): New define_expand.
(*mma_disassemble_acc): New define_insn_and_split.
(mma_<acc>): Change to XO mode.
(mma_<vv>): Change to XO mode.
(mma_<avv>): Change to XO mode.
(mma_<pv>): Change to OO mode.
(mma_<apv>): Change to XO/OO mode.
(mma_<vvi4i4i8>): Change to XO mode.
(mma_<avvi4i4i8>): Change to XO mode.
(mma_<vvi4i4i2>): Change to XO mode.
(mma_<avvi4i4i2>): Change to XO mode.
(mma_<vvi4i4>): Change to XO mode.
(mma_<avvi4i4>): Change to XO mode.
(mma_<pvi4i2>): Change to XO/OO mode.
(mma_<apvi4i2>): Change to XO/OO mode.
(mma_<vvi4i4i4>): Change to XO mode.
(mma_<avvi4i4i4>): Change to XO mode.
* config/rs6000/predicates.md (input_operand): Allow opaque.
(mma_disassemble_output_operand): New predicate.
* config/rs6000/rs6000-builtin.def:
Changes to disassemble builtins.
* config/rs6000/rs6000-call.c (rs6000_return_in_memory):
Disallow __vector_pair/__vector_quad as return types.
(rs6000_promote_function_mode): Remove function return type
check because we can't test it here any more.
(rs6000_function_arg): Do not allow __vector_pair/__vector_quad
as function arguments.
(rs6000_gimple_fold_mma_builtin):
Handle mma_disassemble_* builtins.
(rs6000_init_builtins): Create types for XO/OO modes.
* config/rs6000/rs6000-modes.def: Delete OI, XI,
POI, and PXI modes, and create XO and OO modes.
* config/rs6000/rs6000-string.c (expand_block_move):
Update to OO mode.
* config/rs6000/rs6000.c (rs6000_hard_regno_mode_ok_uncached):
Update for XO/OO modes.
(rs6000_rtx_costs): Make UNSPEC_MMA_XXSETACCZ cost 0.
(rs6000_modes_tieable_p): Update for XO/OO modes.
(rs6000_debug_reg_global): Update for XO/OO modes.
(rs6000_setup_reg_addr_masks): Update for XO/OO modes.
(rs6000_init_hard_regno_mode_ok): Update for XO/OO modes.
(reg_offset_addressing_ok_p): Update for XO/OO modes.
(rs6000_emit_move): Update for XO/OO modes.
(rs6000_preferred_reload_class): Update for XO/OO modes.
(rs6000_split_multireg_move): Update for XO/OO modes.
(rs6000_mangle_type): Update for opaque types.
(rs6000_invalid_conversion): Update for XO/OO modes.
* config/rs6000/rs6000.h (VECTOR_ALIGNMENT_P):
Update for XO/OO modes.
* config/rs6000/rs6000.md (RELOAD): Update for XO/OO modes.
gcc/testsuite/
* gcc.target/powerpc/mma-double-test.c (main): Call abort for failure.
* gcc.target/powerpc/mma-single-test.c (main): Call abort for failure.
* gcc.target/powerpc/pr96506.c: Rename to pr96506-1.c.
* gcc.target/powerpc/pr96506-2.c: New test.

Additional small changes to support opaque modes

After building some larger code bases that use opaque types, including
some C++ code, it became clear that I needed to go through and look for
places where opaque types and modes still had to be handled.  The result
is a whole pile of one-liners.

gcc/
* typeclass.h: Add opaque_type_class.
* builtins.c (type_to_class): Identify opaque type class.
* dwarf2out.c (is_base_type): Handle opaque types.
(gen_type_die_with_usage): Handle opaque types.
* expr.c (count_type_elements): Opaque types should
never have initializers.
* ipa-devirt.c (odr_types_equivalent_p): No type-specific handling
for opaque types is needed as it eventually checks the underlying
mode which is what is important.
* tree-streamer.c (record_common_node): Handle opaque types.
* tree.c (type_contains_placeholder_1): Handle opaque types.
(type_cache_hasher::equal): No additional comparison needed for
opaque types.
gcc/c-family
* c-pretty-print.c (c_pretty_printer::simple_type_specifier):
Treat opaque types like other types.
(c_pretty_printer::direct_abstract_declarator): Opaque types are
supported types.
gcc/c
* c-aux-info.c (gen_type): Support opaque types.
gcc/cp
* error.c (dump_type): Handle opaque types.
(dump_type_prefix): Handle opaque types.
(dump_type_suffix): Handle opaque types.
(dump_expr): Handle opaque types.
* pt.c (tsubst): Allow opaque types in templates.
(unify): Allow opaque types in templates.
* typeck.c (structural_comptypes): Handle comparison
of opaque types.

Darwin, libgfortran : Do not use environ directly from the library.

On macOS / Darwin, the environ variable can be used directly in the
code of an executable, but it cannot be used in the code of a shared
library (in this case, libgfortran.dylib).

In such cases, the function _NSGetEnviron should be called to get
the address of 'environ'.
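
A minimal sketch of the pattern, not the actual libgfortran change.  The
helper names get_environ and count_env_entries are hypothetical;
_NSGetEnviron is declared in <crt_externs.h> on Darwin and returns a
pointer to the environ pointer.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__APPLE__)
#  include <crt_externs.h>    // declares _NSGetEnviron()
// Safe to call from a shared library on Darwin.
static char **get_environ() { return *_NSGetEnviron(); }
#else
extern "C" char **environ;    // POSIX: the process environment
static char **get_environ() { return environ; }
#endif

// Count the entries of the NULL-terminated environment array.
std::size_t count_env_entries() {
    std::size_t n = 0;
    for (char **e = get_environ(); e != nullptr && *e != nullptr; ++e)
        ++n;
    return n;
}
```

Routing every access through one accessor keeps the Darwin special case
in a single place.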

libgfortran/ChangeLog:

* intrinsics/execute_command_line.c (environ): Use
_NSGetEnviron to get the environment pointer on Darwin.

Darwin, libsanitizer : Support libsanitizer for x86_64-darwin20.

The sanitizer is supported for at least x86_64 and Darwin20.

libsanitizer/ChangeLog:

* configure.tgt: Allow x86_64 Darwin2x.

libgo: update to Go 1.15.5 release

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/272146

Daily bump.

Include math.h in nextafter-2.c test.

Since the test is compiled with -fno-builtin, include math.h to allow for
implementations (like the PowerPC) that have multiple versions of long double
that are selectable by switch.  Including math.h could switch which
function nextafterl points to.

gcc/testsuite/
2020-11-17 Michael Meissner <meissner@linux.ibm.com>

* gcc.dg/nextafter-2.c: Include math.h.

Power10: Add missing IEEE 128-bit XSCMP* built-in mappings.

This patch adds support for mapping the scalar_cmp_exp_qp_* built-in functions
to handle arguments that are either TFmode or KFmode, depending on whether long
double uses the IEEE 128-bit representation (TFmode) or the IBM 128-bit
representation (KFmode). This shows up in the float128-cmp2-runnable.c test
when long double uses the IEEE 128-bit representation.

gcc/
2020-11-20 Michael Meissner <meissner@linux.ibm.com>

* config/rs6000/rs6000-call.c (rs6000_expand_builtin): Add missing
XSCMP* cases for IEEE 128-bit long double.

libstdc++: Add C++2a synchronization support

Add support for -
  * atomic_flag::wait/notify_one/notify_all
  * atomic::wait/notify_one/notify_all
  * counting_semaphore
  * binary_semaphore
  * latch

libstdc++-v3/ChangeLog:

* include/Makefile.am (bits_headers): Add new header.
* include/Makefile.in: Regenerate.
* include/bits/atomic_base.h (__atomic_flag::wait): Define.
(__atomic_flag::notify_one): Likewise.
(__atomic_flag::notify_all): Likewise.
(__atomic_base<_Itp>::wait): Likewise.
(__atomic_base<_Itp>::notify_one): Likewise.
(__atomic_base<_Itp>::notify_all): Likewise.
(__atomic_base<_Ptp*>::wait): Likewise.
(__atomic_base<_Ptp*>::notify_one): Likewise.
(__atomic_base<_Ptp*>::notify_all): Likewise.
(__atomic_impl::wait): Likewise.
(__atomic_impl::notify_one): Likewise.
(__atomic_impl::notify_all): Likewise.
(__atomic_float<_Fp>::wait): Likewise.
(__atomic_float<_Fp>::notify_one): Likewise.
(__atomic_float<_Fp>::notify_all): Likewise.
(__atomic_ref<_Tp>::wait): Likewise.
(__atomic_ref<_Tp>::notify_one): Likewise.
(__atomic_ref<_Tp>::notify_all): Likewise.
(atomic_wait<_Tp>): Likewise.
(atomic_wait_explicit<_Tp>): Likewise.
(atomic_notify_one<_Tp>): Likewise.
(atomic_notify_all<_Tp>): Likewise.
* include/bits/atomic_wait.h: New file.
* include/bits/atomic_timed_wait.h: New file.
* include/bits/semaphore_base.h: New file.
* include/std/atomic (atomic<bool>::wait): Define.
(atomic<bool>::notify_one): Likewise.
(atomic<bool>::notify_all): Likewise.
(atomic<_Tp>::wait): Likewise.
(atomic<_Tp>::notify_one): Likewise.
(atomic<_Tp>::notify_all): Likewise.
(atomic<_Tp*>::wait): Likewise.
(atomic<_Tp*>::notify_one): Likewise.
(atomic<_Tp*>::notify_all): Likewise.
* include/std/latch: New file.
* include/std/semaphore: New file.
* include/std/version: Add __cpp_lib_semaphore and
__cpp_lib_latch defines.
* testsuite/29_atomics/atomic/wait_notify/bool.cc: New test.
* testsuite/29_atomics/atomic/wait_notify/pointers.cc: Likewise.
* testsuite/29_atomics/atomic/wait_notify/generic.cc: Likewise.
* testsuite/29_atomics/atomic_flag/wait_notify/1.cc: Likewise.
* testsuite/29_atomics/atomic_float/wait_notify.cc: Likewise.
* testsuite/29_atomics/atomic_integral/wait_notify.cc: Likewise.
* testsuite/29_atomics/atomic_ref/wait_notify.cc: Likewise.
* testsuite/30_threads/semaphore/1.cc: New test.
* testsuite/30_threads/semaphore/2.cc: Likewise.
* testsuite/30_threads/semaphore/least_max_value_neg.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire_for.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire_posix.cc: Likewise.
* testsuite/30_threads/semaphore/try_acquire_until.cc: Likewise.
* testsuite/30_threads/latch/1.cc: New test.
* testsuite/30_threads/latch/2.cc: New test.
* testsuite/30_threads/latch/3.cc: New test.
* testsuite/util/atomic/wait_notify_util.h: New file.

dwarf2: ICE with local class in unused function [PR97918]

Here, since we only mention bar<B>, we never emit debug information for it.
But we do emit debug information for H<J>::h, so we need to refer to the
debug info for bar<B>::J even though there is no bar<B>.  We deal with this
sort of thing in dwarf2out with the limbo_die_list; parentless dies like J
get attached to the CU at EOF.  But here, we were flushing the limbo list,
then generating the template argument DIE for H<J> that refers to J, which
adds J to the limbo list, too late to be flushed.  So let's flush a little
later.

gcc/ChangeLog:

PR c++/97918
* dwarf2out.c (dwarf2out_early_finish): flush_limbo_die_list
after gen_scheduled_generic_parms_dies.

gcc/testsuite/ChangeLog:

PR c++/97918
* g++.dg/debug/localclass2.C: New test.

PR middle-end/97861 - ICE on an invalid redeclaration of a function with attribute access

gcc/c-family/ChangeLog:
* c-warn.c (warn_parm_array_mismatch): Bail on invalid redeclarations
with fewer arguments.

gcc/testsuite/ChangeLog:
* gcc.dg/attr-access-4.c: New test.

libstdc++: Limit memory allocation in stable_sort/inplace_merge (PR 83938)

Reduce memory allocation in stable_sort/inplace_merge algorithms to what is needed
by the implementation.
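
As a rough illustration of why the buffer only needs to be as long as the
shorter of the two ranges, here is a simplified merge over a vector of
int.  It is a sketch under that assumption, not the libstdc++
__inplace_merge code; merge_small_buffer is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge the sorted runs [0, mid) and [mid, size) of v in place, copying
// only the shorter run into a temporary buffer.
// Returns the number of elements that were buffered.
std::size_t merge_small_buffer(std::vector<int>& v, std::size_t mid) {
    const std::size_t n = v.size();
    if (mid <= n - mid) {
        // Left run is shorter: buffer it and merge forwards.
        std::vector<int> buf(v.begin(), v.begin() + mid);
        std::size_t i = 0, j = mid, out = 0;
        while (i < buf.size() && j < n)
            v[out++] = (v[j] < buf[i]) ? v[j++] : buf[i++];
        while (i < buf.size())
            v[out++] = buf[i++];
        return buf.size();
    }
    // Right run is shorter: buffer it and merge backwards.
    std::vector<int> buf(v.begin() + mid, v.end());
    std::size_t i = mid, j = buf.size(), out = n;
    while (i > 0 && j > 0)
        v[--out] = (buf[j - 1] < v[i - 1]) ? v[--i] : buf[--j];
    while (j > 0)
        v[--out] = buf[--j];
    return buf.size();
}
```

Merging a run of four elements with a run of two buffers only two
elements, whichever side the shorter run is on.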

Co-authored-by: John Chang <john.chang@samba.tv>
libstdc++-v3/ChangeLog:

PR libstdc++/83938
* include/bits/stl_tempbuf.h (get_temporary_buffer): Change __len
computation in the loop to avoid truncation.
* include/bits/stl_algo.h:
(__inplace_merge): Take temporary buffer length from smallest range.
(__stable_sort): Limit temporary buffer length.
* testsuite/25_algorithms/inplace_merge/1.cc (test4): New.
* testsuite/performance/25_algorithms/stable_sort.cc: Test stable_sort
under different heap memory conditions.
* testsuite/performance/25_algorithms/inplace_merge.cc: New test.

libada: Check for the presence of _SC_NPROCESSORS_ONLN

Check for the presence of _SC_NPROCESSORS_ONLN, as is done elsewhere
across GCC sources, rather than using a list of OS-specific macros to
decide whether to use `sysconf'.  This fixes a compilation error:

adaint.c: In function '__gnat_number_of_cpus':
adaint.c:2398:26: error: '_SC_NPROCESSORS_ONLN' undeclared (first use in this function)
2398 | cores = (int) sysconf (_SC_NPROCESSORS_ONLN);
| ^~~~~~~~~~~~~~~~~~~~
adaint.c:2398:26: note: each undeclared identifier is reported only once for each function it appears in

at least with VAX/NetBSD 1.6.2.
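
The feature test can be sketched as follows.  This is an illustration of
the approach rather than the adaint.c code; number_of_cpus is a
hypothetical name, and the fallback value of 1 is an assumption.

```cpp
#include <cassert>
#include <unistd.h>

// Return the number of online processors, or 1 if it cannot be queried.
int number_of_cpus() {
    int cores = 1;
#ifdef _SC_NPROCESSORS_ONLN
    // Only call sysconf when the system headers provide the name,
    // instead of guessing from OS-specific macros.
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n > 0)
        cores = static_cast<int>(n);
#endif
    return cores;
}
```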

gcc/ada/
* adaint.c (__gnat_number_of_cpus): Check for the presence of
_SC_NPROCESSORS_ONLN rather than a list of OS-specific macros
to decide whether to use `sysconf'.

NetBSD/libgcc: Check for TARGET_DL_ITERATE_PHDR in the unwinder

Disable USE_PT_GNU_EH_FRAME frame unwinder support for old OS versions,
fixing compilation errors:

.../libgcc/unwind-dw2-fde-dip.c:75:21: error: unknown type name 'Elf_Phdr'
   75 | # define ElfW(type) Elf_##type
      |                     ^~~~
.../libgcc/unwind-dw2-fde-dip.c:132:9: note: in expansion of macro 'ElfW'
  132 |   const ElfW(Phdr) *p_eh_frame_hdr;
      |         ^~~~
.../libgcc/unwind-dw2-fde-dip.c:75:21: error: unknown type name 'Elf_Phdr'
   75 | # define ElfW(type) Elf_##type
      |                     ^~~~
.../libgcc/unwind-dw2-fde-dip.c:133:9: note: in expansion of macro 'ElfW'
  133 |   const ElfW(Phdr) *p_dynamic;
      |         ^~~~
.../libgcc/unwind-dw2-fde-dip.c:165:37: warning: 'struct dl_phdr_info' declared inside parameter list will not be visible outside of this definition or declaration
  165 | _Unwind_IteratePhdrCallback (struct dl_phdr_info *info, size_t size, void *ptr)
      |                                     ^~~~~~~~~~~~
[...]

and producing a working cross-compiler at least with VAX/NetBSD 1.6.2.

libgcc/
* unwind-dw2-fde-dip.c [__OpenBSD__ || __NetBSD__]
(USE_PT_GNU_EH_FRAME): Do not define if !TARGET_DL_ITERATE_PHDR.

PR middle-end/97879 - ICE on invalid mode in attribute access

gcc/c-family/ChangeLog:

PR middle-end/97879
* c-attribs.c (handle_access_attribute): Handle ATTR_FLAG_INTERNAL.
Error out on invalid modes.

gcc/c/ChangeLog:
PR middle-end/97879
* c-decl.c (start_function): Set ATTR_FLAG_INTERNAL in flags.

gcc/ChangeLog:

PR middle-end/97879
* tree-core.h (enum attribute_flags): Add ATTR_FLAG_INTERNAL.

gcc/testsuite/ChangeLog:

PR middle-end/97879
* gcc.dg/attr-access-3.c: New test.

compiler, libgo: change mangling scheme

Overhaul the mangling scheme to avoid ambiguities if the package path
contains a dot. Instead of using dot both to separate components and
to mangle characters, use dot only to separate components and use
underscore to mangle characters.
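
To illustrate the idea only: the sketch below uses a made-up encoding in
which '_' becomes "__" and '.' becomes "_0".  The exact encoding used by
the compiler is not described in this message, so treat escape_component
and mangle as hypothetical.

```cpp
#include <cassert>
#include <string>

// Escape one component so that the result never contains '.'.
// Illustrative encoding (an assumption): '_' -> "__", '.' -> "_0".
std::string escape_component(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '_')
            out += "__";
        else if (c == '.')
            out += "_0";
        else
            out += c;
    }
    return out;
}

// Join the escaped package path and symbol name with a single '.',
// which is now used only as a separator.
std::string mangle(const std::string& pkgpath, const std::string& name) {
    return escape_component(pkgpath) + "." + escape_component(name);
}
```

With dot used for both purposes, package "a.b" with symbol "c" and
package "a" with symbol "b.c" would both come out as "a.b.c"; here they
stay distinct.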

For golang/go#41862

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/271726

libstdc++: _Rb_tree code cleanup, remove lambdas

Use new template parameters to replace usage of lambdas to move or not
tree values on copy.

libstdc++-v3/ChangeLog:

* include/bits/move.h (_GLIBCXX_FWDREF): New.
* include/bits/stl_tree.h: Adapt to use latter.
(_Rb_tree<>::_M_clone_node): Add _MoveValue template parameter.
(_Rb_tree<>::_M_mbegin): New.
(_Rb_tree<>::_M_begin): Use latter.
(_Rb_tree<>::_M_copy): Add _MoveValues template parameter.
* testsuite/23_containers/map/allocator/move_cons.cc: New test.
* testsuite/23_containers/multimap/allocator/move_cons.cc: New test.
* testsuite/23_containers/multiset/allocator/move_cons.cc: New test.
* testsuite/23_containers/set/allocator/move_cons.cc: New test.

Improve hashing of decls in ipa-icf-gimple

Another remaining case is that we end up comparing calls with mismatching
number of parameters or with different permutations of them.

This is because we hash decls to nothing. This patch improves that by
hashing decls by their code and parm decls by indexes that are stable.
Also for default defs in SSA_NAMEs we can add the corresponding decl (that
is usually a parm decl).

Still we could improve on this by hashing SSA names by their defining
parameters and possibly making maps of other decls and assigning them
stable function-local IDs.

* ipa-icf-gimple.c (func_checker::hash_operand): Improve hashing of
decls.

Only compare sizes of automatic variables

One of the common remaining reasons for ICF to fail after loading in the
function body is a mismatched type of an automatic variable.  This is
because compatible_types_p resorts to checking TYPE_MAIN_VARIANTS for
equivalence, which prevents merging many TBAA-compatible cases.  (And thus
is also not reflected by the hash extended by alias sets of accesses.)

Since in gimple automatic variables are just blocks of memory, I think we
should check only their size.  All accesses are matched when comparing the
actual loads/stores.

I am not sure if we need to match types of other DECLs, but I decided to
be safe here: for PARM_DECL/RESULT_DECL we match them anyway to be sure
that functions are ABI compatible.  For CONST_DECL and readonly global
VAR_DECLs they are matched when comparing their constructors.

* ipa-icf-gimple.c (func_checker::compare_decl): Do not compare types
of local variables.

re: FAIL: gcc.dg/pr97515.c

Adjust testcase to check in CCP not EVRP.

gcc/testsuite/
* gcc.dg/pr97515.c: Check in ccp2, not evrp.

PR target/97727 aarch64: [testcase] fix bf16_vstN_lane_2.c for big endian targets

gcc/testsuite/ChangeLog

2020-11-09 Andrea Corallo <andrea.corallo@arm.com>

PR target/97727
* gcc.target/aarch64/advsimd-intrinsics/bf16_vstN_lane_2.c: Relax
regexps.

doc: Fixup a couple of formatting nits

I noticed a couple of places we used @code{program} instead of
@command{program}.

gcc/
* doc/invoke.texi: Replace a couple of @code with @command.

[PR target/97726] arm: [testsuite] fix some simd tests on armbe

2020-11-10 Andrea Corallo <andrea.corallo@arm.com>

PR target/97726
* gcc.target/arm/simd/bf16_vldn_1.c: Relax regexps not to fail on
big endian.
* gcc.target/arm/simd/vldn_lane_bf16_1.c: Likewise.
* gcc.target/arm/simd/vmmla_1.c: Add -mfloat-abi=hard flag.

SLP: Have vectorizable_slp_permutation set type on invariants

This modifies vectorizable_slp_permutation to update the type of the children
of a perm node before trying to permute them. This allows us to be able to
permute invariant nodes.

This will be covered by test from the SLP pattern matcher.

gcc/ChangeLog:

* tree-vect-slp.c (vectorizable_slp_permutation): Update types on nodes
when needed.

libstdc++: Remove <memory_resource> dependency from <regex> [PR 92546]

Unlike the other headers that declare alias templates in namespace pmr,
<regex> includes <memory_resource>. That was done because the
pmr::string::const_iterator typedef requires pmr::string to be complete,
which requires pmr::polymorphic_allocator<char> to be complete.

By using __normal_iterator<const char*, pmr::string> instead of the
const_iterator typedef we can avoid the completeness requirement.

This makes <regex> smaller, by not requiring <memory_resource> and its
<shared_mutex> dependency, which depends on <chrono>. Backporting this
will also help with PR 97876, where <stop_token> ends up being needed by
<regex> via <memory_resource>.

libstdc++-v3/ChangeLog:

PR libstdc++/92546
* include/std/regex (pmr::smatch, pmr::wsmatch): Declare using
underlying __normal_iterator type, not nested typedef
basic_string::const_iterator.

Deal with (pattern) SLP consumed stmts in hybrid discovery

This makes hybrid SLP discovery deal with stmts indirectly consumed
by SLP, for example via patterns.  This means that all uses of a
stmt end up in SLP vectorized stmts.

This helps my prototype patches for PR97832 where I make SLP discovery
re-associate chains to make operands match.  This ends up building
SLP computation nodes without 1:1 representatives in the scalar IL
and thus no scalar lane defs in SLP_TREE_SCALAR_STMTS.  Nevertheless
all of the original scalar stmts are consumed so this represents
another kind of SLP pattern for the computation chain result.

2020-11-20  Richard Biener  <rguenther@suse.de>

* tree-vect-slp.c (maybe_push_to_hybrid_worklist): New function.
(vect_detect_hybrid_slp): Use it.  Perform a backward walk
over the IL.

dump SLP_TREE_REPRESENTATIVE

It always annoyed me to see those empty SLP nodes in dumpfiles:

t.c:16:3: note:   node 0x3a2a280 (max_nunits=1, refcnt=1)
t.c:16:3: note:         { }
t.c:16:3: note:         children 0x3a29db0 0x3a29e90

resulting from two-operator handling.  The following makes
sure to also dump the operation template or VEC_PERM_EXPR.

2020-11-20  Richard Biener  <rguenther@suse.de>

* tree-vect-slp.c (vect_print_slp_tree): Also dump
SLP_TREE_REPRESENTATIVE.

c++: Add __builtin_clear_padding builtin - C++20 P0528R3 compiler side [PR88101]

The following patch implements __builtin_clear_padding builtin that clears
the padding bits in object representation (but preserves value
representation).  Inside of unions it clears only those padding bits that
are padding for all the union members (so that it never alters value
representation).

It handles trailing padding, padding in the middle of structs including
bitfields (PDP11 unhandled, I've never figured out how those bitfields
work), VLAs (doesn't handle variable length structures, but I think almost
nobody uses them and it isn't worth the extra complexity).  For VLAs and
sufficiently large arrays it uses runtime clearing loop instead of emitting
straight-line code (unless arrays are inside of a union).

The way I think this can be used for atomics is e.g. if the structures
are power of two sized and small enough that we use the hw atomics
for say compare_exchange __builtin_clear_padding could be called first on
the address of expected and desired arguments (for desired only if we want
to ensure that most of the time the atomic memory will have padding bits
cleared), then perform the weak cmpxchg and if that fails, we got the
value from the atomic memory; we can call __builtin_clear_padding on a copy
of that and then compare it with expected, and if it is the same with the
padding bits masked off, we can use the original with whatever random
padding bits in it as the new expected for next cmpxchg.
__builtin_clear_padding itself is not atomic and therefore it shouldn't
be called on the atomic memory itself, but compare_exchange*'s expected
argument is a reference and normally the implementation may store there
the current value from memory, so padding bits can be cleared in that,
and desired is passed by value rather than reference, so clearing is fine
too.
When using libatomic, we can use it either that way, or add new libatomic
APIs that accept another argument, pointer to the padding bit bitmask,
and construct that in the template as
  alignas (_T) unsigned char _mask[sizeof (_T)];
  std::memset (_mask, ~0, sizeof (_mask));
  __builtin_clear_padding ((_T *) _mask);
which will have bits cleared for padding bits and set for bits taking part
in the value representation.  Then libatomic could internally instead
of using memcmp compare
for (i = 0; i < N; i++) if ((val1[i] & mask[i]) != (val2[i] & mask[i]))
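
A sketch of the masked comparison described above.  The function names
equal_under_mask and make_value_mask are hypothetical, and the mask
builder assumes __builtin_clear_padding accepts a pointer cast from the
byte buffer, as in the snippet above.  Where the builtin is missing, the
mask is left as all ones, which degrades to a plain bytewise comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Compare two object representations, ignoring every bit that is zero
// in mask (i.e. the padding bits).
bool equal_under_mask(const unsigned char* a, const unsigned char* b,
                      const unsigned char* mask, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        if ((a[i] & mask[i]) != (b[i] & mask[i]))
            return false;
    return true;
}

// Fill mask (sizeof(T) bytes, aligned as T) with ones for value bits and
// zeros for padding bits.
template <typename T>
void make_value_mask(unsigned char* mask) {
    std::memset(mask, 0xff, sizeof(T));
#if defined(__has_builtin)
#  if __has_builtin(__builtin_clear_padding)
    __builtin_clear_padding(reinterpret_cast<T*>(mask));
#  endif
#endif
}
```

A caller would declare the buffer as
alignas (T) unsigned char mask[sizeof (T)] before passing it to
make_value_mask<T>.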

2020-11-20  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/88101
gcc/
* builtins.def (BUILT_IN_CLEAR_PADDING): New built-in function.
* gimplify.c (gimplify_call_expr): Rewrite single argument
BUILT_IN_CLEAR_PADDING into two-argument variant.
* gimple-fold.c (clear_padding_unit, clear_padding_buf_size): New
const variables.
(struct clear_padding_struct): New type.
(clear_padding_flush, clear_padding_add_padding,
clear_padding_emit_loop, clear_padding_type,
clear_padding_union, clear_padding_real_needs_padding_p,
clear_padding_type_may_have_padding_p,
gimple_fold_builtin_clear_padding): New functions.
(gimple_fold_builtin): Handle BUILT_IN_CLEAR_PADDING.
* doc/extend.texi (__builtin_clear_padding): Document.
gcc/c-family/
* c-common.c (check_builtin_function_arguments): Handle
BUILT_IN_CLEAR_PADDING.
gcc/testsuite/
* c-c++-common/builtin-clear-padding-1.c: New test.
* c-c++-common/torture/builtin-clear-padding-1.c: New test.
* c-c++-common/torture/builtin-clear-padding-2.c: New test.
* c-c++-common/torture/builtin-clear-padding-3.c: New test.
* c-c++-common/torture/builtin-clear-padding-4.c: New test.
* c-c++-common/torture/builtin-clear-padding-5.c: New test.
* g++.dg/torture/builtin-clear-padding-1.C: New test.
* g++.dg/torture/builtin-clear-padding-2.C: New test.
* gcc.dg/builtin-clear-padding-1.c: New test.

arm: Fix up neon_vector_mem_operand [PR97528]

The documentation for POST_MODIFY says:
   Currently, the compiler can only handle second operands of the
   form (plus (reg) (reg)) and (plus (reg) (const_int)), where
   the first operand of the PLUS has to be the same register as
   the first operand of the *_MODIFY.
The following testcase ICEs, because combine just attempts to simplify
things and ends up with
(post_modify (reg1) (plus (mult (reg2) (const_int 4)) (reg1))
but the target predicates accept it, because they only verify
that POST_MODIFY's second operand is PLUS and the second operand
of the PLUS is a REG.

The following patch fixes this by performing further verification that
the POST_MODIFY is in the form it should be.

2020-11-20  Jakub Jelinek  <jakub@redhat.com>

PR target/97528
* config/arm/arm.c (neon_vector_mem_operand): For POST_MODIFY, require
first POST_MODIFY operand is a REG and is equal to the first operand
of PLUS.

* gcc.target/arm/pr97528.c: New test.

Plug loophole in string store merging

There is a loophole in new string store merging support added recently:
it does not check that the stores are consecutive, which is obviously
required if you want to concatenate them... Simple fix attached, the
nice thing being that it can fall back to the regular processing if
any hole is detected in the series of stores, thanks to the handling
of STRING_CST by native_encode_expr.
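
The missing condition amounts to something like the following sketch.
The Store type and stores_consecutive are hypothetical and assume the
stores are already sorted by offset; the real pass tracks this
incrementally in do_merge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one store: byte offset and size.
struct Store {
    std::size_t offset;
    std::size_t size;
};

// True if every store begins exactly where the previous one ends, so the
// stored strings can be concatenated without leaving a hole.
bool stores_consecutive(const std::vector<Store>& stores) {
    for (std::size_t i = 1; i < stores.size(); ++i)
        if (stores[i].offset != stores[i - 1].offset + stores[i - 1].size)
            return false;
    return true;
}
```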

gcc/ChangeLog:
* gimple-ssa-store-merging.c (struct merged_store_group): Add
new 'consecutive' field.
(merged_store_group): Set it to true.
(do_merge): Set it to false if the store is not consecutive and
set string_concatenation to false in this case.
(merge_into): Call do_merge on entry.
(merge_overlapping): Likewise.

gcc/testsuite/ChangeLog:
* gnat.dg/opt90a.adb: New test.
* gnat.dg/opt90b.adb: Likewise.
* gnat.dg/opt90c.adb: Likewise.
* gnat.dg/opt90d.adb: Likewise.
* gnat.dg/opt90e.adb: Likewise.
* gnat.dg/opt90a_pkg.ads: New helper.
* gnat.dg/opt90b_pkg.ads: Likewise.
* gnat.dg/opt90c_pkg.ads: Likewise.
* gnat.dg/opt90d_pkg.ads: Likewise.
* gnat.dg/opt90e_pkg.ads: Likewise.

Fix comment in ipa-icf-gimple.c

* ipa-icf-gimple.c (func_checker::operand_equal_p): Fix comment.

Fix comparison of {CLOBBER} in icf

After fixing a few issues I got to a stage where 1.4M icf mismatches are
due to comparing two gimple clobbers.  The problem is that operand_equal_p
matches clobbers in

case CONSTRUCTOR:
/* In GIMPLE empty constructors are allowed in initializers of
    aggregates.  */
return !CONSTRUCTOR_NELTS (arg0) && !CONSTRUCTOR_NELTS (arg1);

But this happens too late after comparing its types (that are not very relevant
for memory store).

In the context of ipa-icf we do not really need to match RHS of gimple clobbers:
it is enough to know that the LHS stores can be considered equivalent.

I thus added logic to hash them all the same way and compare using the
TREE_CLOBBER_P flag.  Other options I see are extending operand_equal_p
in fold-const to handle them more generously, or making stmt hash and
compare skip comparing/hashing the RHS of gimple_clobber_p.

* ipa-icf-gimple.c (func_checker::hash_operand): Hash gimple clobber.
(func_checker::operand_equal_p): Special case gimple clobber.

i386: Optimize abs expansion [PR97873]

The patch introduces absM named pattern to generate optimal insn sequence
for CMOVE_TARGET targets.  Currently, the expansion goes through neg+max
optabs, and the following code is generated:

movl    %edi, %eax
negl    %eax
cmpl    %edi, %eax
cmovl   %edi, %eax

This sequence is suboptimal in two ways.  a) The compare instruction is
not needed, since NEG insn sets the sign flag based on the result.
The CMOV can use sign flag to select between negated and original value:

movl    %edi, %eax
negl    %eax
cmovs   %edi, %eax

b) On some targets, CMOV is undesirable due to its performance issues.
In addition to TARGET_EXPAND_ABS bypass, the patch introduces STV
conversion of abs RTX to use PABS SSE insn:

vmovd   %edi, %xmm0
vpabsd  %xmm0, %xmm0
vmovd   %xmm0, %eax

The patch changes compare mode of NEG instruction to CCGOCmode,
which is the same mode as the mode of SUB instruction. IOW, sign bit
becomes usable.
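
In C++ terms, the two scalar sequences above correspond roughly to the
following.  This is only a source-level sketch with hypothetical function
names, and both functions assume x != INT_MIN so that the negation does
not overflow.

```cpp
#include <cassert>

// neg + max form: negate, compare against the original, pick the larger.
// Precondition: x != INT_MIN.
int abs_via_max(int x) {
    int neg = -x;
    return neg < x ? x : neg;
}

// Sign-based form: negate, then keep the original value if the negated
// one came out negative.  No separate comparison with x is needed.
// Precondition: x != INT_MIN.
int abs_via_sign(int x) {
    int neg = -x;
    return neg < 0 ? x : neg;
}
```

Both give the same result for every valid input; the second depends
only on the sign of the negated value, which is what lets the CMOV reuse
the flags set by NEG.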

Also, the mode iterator of <maxmin:code><mode>3 pattern is changed
to SWI48x instead of SWI248. The purpose of maxmin expander is to
prepare max/min RTX for STV to eventually convert them to SSE PMAX/PMIN
instructions, in order to *avoid* CMOV insns with general registers.

2020-11-20  Uroš Bizjak  <ubizjak@gmail.com>

gcc/
PR target/97873
* config/i386/i386.md (*neg<mode>2_2): Rename from
"*neg<mode>2_cmpz".  Use CCGOCmode instead of CCZmode.
(*negsi2_zext): Rename from *negsi2_cmpz_zext.
Use CCGOCmode instead of CCZmode.
(*neg<mode>_ccc_1): New insn pattern.
(*neg<dwi>2_doubleword): Use *neg<mode>_ccc_1.

(abs<mode>2): Add FLAGS_REG clobber.
Use TARGET_CMOVE insn predicate.
(*abs<mode>2_1): New insn_and_split pattern.
(*absdi2_doubleword): Ditto.

(<maxmin:code><mode>3): Use SWI48x mode iterator.
(*<maxmin:code><mode>3): Use SWI48 mode iterator.

* config/i386/i386-features.c
(general_scalar_chain::compute_convert_gain): Handle ABS code.
(general_scalar_chain::convert_insn): Ditto.
(general_scalar_to_vector_candidate_p): Ditto.

gcc/testsuite/
PR target/97873
* gcc.target/i386/pr97873.c: New test.
* gcc.target/i386/pr97873-1.c: New test.

configury: Fix up --enable-link-serialization support

Eric reported that the --enable-link-serialization changes seemed to
cause the binaries to be always relinked, for example from the
gcc/ directory of the build tree:
make
[relink of gnat1, brig1, cc1plus, d21, f951, go1, lto1, ...]
make
[relink of gnat1, brig1, cc1plus, d21, f951, go1, lto1, ...]
Furthermore as reported in PR, it can cause problems during make install
where make install rebuilds the binaries again.

The problem is that for make .PHONY targets are just
"rebuilt" always, so it is very much undesirable for the cc1plus$(exeext)
etc. dependencies to include .PHONY targets, but I was using
them - cc1plus.prev which would depend on some *.serial and
e.g. cc1.serial depending on c and c depending on cc1$(exeext).

The following patch rewrites this so that *.serial and *.prev aren't
.PHONY targets, but instead just make variables.

I was worried that the order in which the language makefile fragments are
included (which is quite random, what order we get from the filesystem
matching */config-lang.in) would be a problem but it seems to work fine
- as it uses make = rather than := variables, later definitions are just
fine for earlier uses as long as the uses aren't needed during the
makefile parsing, but only in the dependencies of make targets and in
their commands.

2020-11-20 Jakub Jelinek <jakub@redhat.com>

PR other/97911
gcc/
* configure.ac: In SERIAL_LIST use lang words without .serial
suffix. Change $lang.prev from a target to variable and instead
of depending on *.serial expand to the *.serial variable if
the word is in the SERIAL_LIST at all, otherwise to nothing.
* configure: Regenerated.
gcc/c/
* Make-lang.in (c.serial): Change from goal to a variable.
(.PHONY): Drop c.serial.
gcc/ada/
* gcc-interface/Make-lang.in (ada.serial): Change from goal to a
variable.
(.PHONY): Drop ada.serial and ada.prev.
(gnat1$(exeext)): Depend on $(ada.serial) rather than ada.serial.
gcc/brig/
* Make-lang.in (brig.serial): Change from goal to a variable.
(.PHONY): Drop brig.serial and brig.prev.
(brig1$(exeext)): Depend on $(brig.serial) rather than brig.serial.
gcc/cp/
* Make-lang.in (c++.serial): Change from goal to a variable.
(.PHONY): Drop c++.serial and c++.prev.
(cc1plus$(exeext)): Depend on $(c++.serial) rather than c++.serial.
gcc/d/
* Make-lang.in (d.serial): Change from goal to a variable.
(.PHONY): Drop d.serial and d.prev.
(d21$(exeext)): Depend on $(d.serial) rather than d.serial.
gcc/fortran/
* Make-lang.in (fortran.serial): Change from goal to a variable.
(.PHONY): Drop fortran.serial and fortran.prev.
(f951$(exeext)): Depend on $(fortran.serial) rather than
fortran.serial.
gcc/go/
* Make-lang.in (go.serial): Change from goal to a variable.
(.PHONY): Drop go.serial and go.prev.
(go1$(exeext)): Depend on $(go.serial) rather than go.serial.
gcc/jit/
* Make-lang.in (jit.serial): Change from goal to a
variable.
(.PHONY): Drop jit.serial and jit.prev.
($(LIBGCCJIT_FILENAME)): Depend on $(jit.serial) rather than
jit.serial.
gcc/lto/
* Make-lang.in (lto1.serial, lto2.serial): Change from goals to
variables.
(.PHONY): Drop lto1.serial, lto2.serial, lto1.prev and lto2.prev.
($(LTO_EXE)): Depend on $(lto1.serial) rather than lto1.serial.
($(LTO_DUMP_EXE)): Depend on $(lto2.serial) rather than lto2.serial.
gcc/objc/
* Make-lang.in (objc.serial): Change from goal to a variable.
(.PHONY): Drop objc.serial and objc.prev.
(cc1obj$(exeext)): Depend on $(objc.serial) rather than objc.serial.
gcc/objcp/
* Make-lang.in (obj-c++.serial): Change from goal to a variable.
(.PHONY): Drop obj-c++.serial and obj-c++.prev.
(cc1objplus$(exeext)): Depend on $(obj-c++.serial) rather than
obj-c++.serial.

rs6000: Fix p8_mtvsrd_df's insn type

This patch is to fix insn type of p8_mtvsrd_df from mfvsr to mtvsr,
in order to align with the other places using mtvsrd.

gcc/ChangeLog:

* config/rs6000/rs6000.md (p8_mtvsrd_df): Fix insn type.

C: Drop qualifiers during lvalue conversion [PR97702]

2020-11-20 Martin Uecker <muecker@gwdg.de>

gcc/
* gimplify.c (gimplify_modify_expr_rhs): Optimize
NOP_EXPRs that contain compound literals.

gcc/c/
* c-typeck.c (convert_lvalue_to_rvalue): Drop qualifiers.

gcc/testsuite/
* gcc.dg/cond-constqual-1.c: Adapt test.
* gcc.dg/lvalue-11.c: New test.
* gcc.dg/pr60195.c: Add warning.

Daily bump.

ranger: Improve a % b operand ranges [PR91029]

As mentioned in the PR, the previous PR91029 patch was testing
op2 >= 0, which is unnecessary: even negative op2 values will work the same.
Furthermore, from a % b > 0 we can deduce a > 0 rather than just a >= 0
(0 % b would be 0), and it is actually valid even for constants other than 0:
a % b > 5 means a > 5 (a % b has the same sign as a, and a in [0, 5] would
result in a % b in [0, 5]).  Also, we can deduce a range for the other
operand, if we know
a % b >= 20, then b must be (in absolute value for signed modulo) > 20,
for a % [0, 20] the result would be [0, 19].
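
The implications above can be checked by brute force over a small range.  The sketch below is not the range-op.cc code; mod_bounds_hold and abs_int are hypothetical helpers that only test the stated facts for C++ truncating modulo.

```cpp
// Absolute value helper usable in constant expressions.
constexpr int abs_int(int v) { return v < 0 ? -v : v; }

// Returns true if, for every a and b in [lo, hi] with b != 0, the
// implications described in the commit message hold.
constexpr bool mod_bounds_hold(int lo, int hi) {
    for (int a = lo; a <= hi; ++a) {
        for (int b = lo; b <= hi; ++b) {
            if (b == 0) continue;  // a % 0 is undefined
            const int r = a % b;
            if (r > 0 && !(a > 0)) return false;              // a % b > 0 implies a > 0
            if (r > 5 && !(a > 5)) return false;              // a % b > 5 implies a > 5
            if (r >= 20 && !(abs_int(b) > 20)) return false;  // a % b >= 20 implies |b| > 20
        }
    }
    return true;
}
```

Note that negative values of b are included in the sweep, matching the observation that op2 >= 0 is not required.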

2020-11-19  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/91029
* range-op.cc (operator_trunc_mod::op1_range): Don't require signed
types, nor require that op2 >= 0.  Implement (a % b) >= x && x > 0
implies a >= x and (a % b) <= x && x < 0 implies a <= x.
(operator_trunc_mod::op2_range): New method.

* gcc.dg/tree-ssa/pr91029-1.c: New test.
* gcc.dg/tree-ssa/pr91029-2.c: New test.

Process only valid shift ranges.

When shifting outside the valid range of [0, precision-1], we can
choose to process just the valid ones since the rest is undefined.
This allows us to produce results for x << [0,2][+INF, +INF] by discarding
the invalid ranges and processing just [0,2].
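
A minimal sketch of the idea, assuming a shift-amount range is a list of closed sub-ranges.  ShiftRange and valid_shift_ranges are hypothetical names and do not reflect the actual get_shift_range interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One closed sub-range [lo, hi] of possible shift amounts.
struct ShiftRange {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Keeps only the part of each sub-range that lies in [0, precision - 1].
std::vector<ShiftRange> valid_shift_ranges(const std::vector<ShiftRange>& amounts,
                                           unsigned precision) {
    assert(precision >= 1);
    const std::uint64_t max_valid = precision - 1;
    std::vector<ShiftRange> kept;
    for (const ShiftRange& r : amounts) {
        if (r.lo > max_valid) continue;  // entirely out of range: discard
        kept.push_back({r.lo, std::min(r.hi, max_valid)});
    }
    return kept;
}
```

For the example in the message, the sub-range [+INF, +INF] is dropped and only [0,2] remains to be folded.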

gcc/
PR tree-optimization/93781
* range-op.cc (get_shift_range): Rename from
undefined_shift_range_check and now return valid shift ranges.
(operator_lshift::fold_range): Use result from get_shift_range.
(operator_rshift::fold_range): Ditto.
gcc/testsuite/
* gcc.dg/tree-ssa/pr93781-1.c: New.
* gcc.dg/tree-ssa/pr93781-2.c: New.
* gcc.dg/tree-ssa/pr93781-3.c: New.

c++: Template hash access

This exposes the template specialization table, so the modules
machinery may access it. The hashed entity (tmpl, args & spec) is
available, along with a hash table walker. We also need a way of
finding or inserting entries, along with some bookkeeping fns to deal
with the instantiation and (partial) specialization lists.

gcc/cp/
* cp-tree.h (struct spec_entry): Moved from pt.c.
(walk_specializations, match_mergeable_specialization)
(get_mergeable_specialization_flags)
(add_mergeable_specialization): Declare.
* pt.c (struct spec_entry): Moved to cp-tree.h.
(walk_specializations, match_mergeable_specialization)
(get_mergeable_specialization_flags)
(add_mergeable_specialization): New.

libstdc++: Avoid calling undefined __gthread_self weak symbol [PR 95989]

Since glibc 2.27 the pthread_self symbol has been defined in libc rather
than libpthread. Because we only call pthread_self through a weak alias
it's possible for statically linked executables to end up without a
definition of pthread_self. This crashes when trying to call an
undefined weak symbol.

We can use the __GLIBC_PREREQ version check to detect the version of
glibc where pthread_self is no longer in libpthread, and call it
directly rather than through the weak reference.

It would be better to check for pthread_self in libc during configure
instead of hardcoding the __GLIBC_PREREQ check. That would be
complicated by the fact that prior to glibc 2.27 libc.a didn't have the
pthread_self symbol, but libc.so.6 did. The configure checks would need
to try to link both statically and dynamically, and the result would
depend on whether the static libc.a happens to be installed during
configure (which could vary between different systems using the same
version of glibc). Doing it properly is left for a future date, as that
will be needed anyway after glibc moves all pthread symbols from
libpthread to libc. When that happens we should revisit the whole
approach of using weak symbols for pthread symbols.

For the purposes of std::this_thread::get_id() we call
pthread_self() directly when using glibc 2.27 or later. Otherwise, if
__gthread_active_p() is true then we know the libpthread symbol is
available so we call that. Otherwise, we are single-threaded and just
use ((__gthread_t)1) as the thread ID.
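
The three-way choice described above can be modelled as a small pure function.  This is only an illustration of the decision order; ThreadIdSource, glibc_at_least and pick_thread_id_source are hypothetical names, and the real code makes the first decision at preprocessing time via __GLIBC_PREREQ.

```cpp
// Where the thread ID comes from in each case.
enum class ThreadIdSource { DirectPthreadSelf, WeakPthreadSelf, ConstantOne };

// Mirrors the meaning of a version check such as __GLIBC_PREREQ(2, 27).
constexpr bool glibc_at_least(int major, int minor, int want_major, int want_minor) {
    return major > want_major || (major == want_major && minor >= want_minor);
}

constexpr ThreadIdSource pick_thread_id_source(int glibc_major, int glibc_minor,
                                               bool gthread_active) {
    if (glibc_at_least(glibc_major, glibc_minor, 2, 27))
        return ThreadIdSource::DirectPthreadSelf;  // pthread_self lives in libc
    if (gthread_active)
        return ThreadIdSource::WeakPthreadSelf;    // libpthread symbol is available
    return ThreadIdSource::ConstantOne;            // single-threaded fallback
}
```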

An undesirable consequence of this change is that code compiled prior to
the change might inline the old definition of this_thread::get_id()
which always returns (__gthread_t)1 in a program that isn't linked to
libpthread. Code compiled after the change will use pthread_self() and
so get a real TID. That could result in the main thread having different
thread::id values in different translation units. This seems acceptable,
as there are not expected to be many uses of thread::id in programs
that aren't linked to libpthread.

An earlier version of this patch also changed __gthread_self() to use
__GLIBC_PREREQ(2, 27) and only use the weak symbol for older glibc. That
might still make sense to do, but isn't needed by libstdc++ now.

libstdc++-v3/ChangeLog:

PR libstdc++/95989
* config/os/gnu-linux/os_defines.h (_GLIBCXX_NATIVE_THREAD_ID):
Define new macro to get reliable thread ID.
* include/bits/std_thread.h (this_thread::get_id): Use new
macro if it's defined.
* testsuite/30_threads/jthread/95989.cc: New test.
* testsuite/30_threads/this_thread/95989.cc: New test.

c++: Expose constexpr hash table

This patch exposes the constexpr hash table so that the modules
machinery can save and load constexpr bodies.  While there I noticed
that we could do a little constification of the hasher and comparator
functions.  Also combine the saving machinery to a single function
returning void -- nothing ever looked at its return value.

gcc/cp/
* cp-tree.h (struct constexpr_fundef): Moved from constexpr.c.
(maybe_save_constexpr_fundef): Declare.
(register_constexpr_fundef): Take constexpr_fundef object, return
void.
* decl.c (maybe_save_function_definition): Delete, functionality
moved to maybe_save_constexpr_fundef.
(emit_coro_helper, finish_function): Adjust.
* constexpr.c (struct constexpr_fundef): Moved to cp-tree.h.
(constexpr_fundef_hasher::equal): Constify.
(constexpr_fundef_hasher::hash): Constify.
(retrieve_constexpr_fundef): Make non-static.
(maybe_save_constexpr_fundef): Break out checking and duplication
from ...
(register_constexpr_fundef): ... here.  Just register the constexpr.

Fix two bugs in operand_equal_p

* fold-const.c (operand_compare::operand_equal_p): Fix thinko in
COMPONENT_REF handling and guard types_same_for_odr by
virtual_method_call_p.
(operand_compare::hash_operand): Likewise.

c, tree: Fix ICE from get_parm_array_spec [PR97860]

The C and C++ FEs handle zero sized arrays differently, C uses
NULL TYPE_MAX_VALUE on non-NULL TYPE_DOMAIN on complete ARRAY_TYPEs
with bitsize_zero_node TYPE_SIZE, while C++ FE likes to set
TYPE_MAX_VALUE to the largest value (and min to the lowest).

Martin has used array_type_nelts in get_parm_array_spec where the
function on the C form of [0] arrays returns error_mark_node and the code
crashes soon afterwards.  The following patch teaches array_type_nelts about
this (e.g. dwarf2out already handles that as [0]).  While it will change
what is_empty_type returns for certain types (e.g. struct S { int a[0]; };),
as those types occupy zero bits in C, it shouldn't make an ABI difference.

So, the tree.c change makes the c-decl.c code handle the [0] arrays
like any other constant extents, and the c-decl.c change just makes sure
that if we'd run into error_mark_node e.g. from the VLA expressions, we
don't crash on those.

2020-11-19  Jakub Jelinek  <jakub@redhat.com>

PR c/97860
* tree.c (array_type_nelts): For complete arrays with zero min
and NULL max and zero size return -1.

* c-decl.c (get_parm_array_spec): Bail out if nelts is
error_operand_p.

* gcc.dg/pr97860.c: New test.

c++: Fix array new with value-initialization [PR97523]

Since my r11-3092 the following is rejected with -std=c++20:

  struct T { explicit T(); };
  void fn(int n) {
    new T[1]();
  }

with "would use explicit constructor 'T::T()'".  It is because since
that change we go into the P1009 block in build_new (array_p is false,
but nelts is non-null and we're in C++20).  Since we only have (), we
build a {} and continue to build_new_1, which then calls build_vec_init
and then we error because the {} isn't CONSTRUCTOR_IS_DIRECT_INIT.

For (), which is value-initializing, we want to do what we were doing
before: pass empty init and let build_value_init take care of it.
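
A runnable variant of the testcase, assuming a compiler that includes this fix.  Unlike the snippet above, the constructor here has a body so the effect of value-initialization can be observed; the function name first_value_after_array_new is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct T {
    explicit T() : value(42) {}  // explicit default constructor, as in the PR
    int value;
};

// Allocates n elements with "()" and reports what the first one holds.
int first_value_after_array_new(std::size_t n) {
    assert(n >= 1);
    T* p = new T[n]();  // value-initialization; must not be rejected as {}-init
    const int v = p[0].value;
    delete[] p;
    return v;
}
```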

For various reasons I wanted to dig a little bit deeper into this,
and as a result, I'm adding a test for [expr.new]/24 (and checked that
our current behavior matches clang++).

gcc/cp/ChangeLog:

PR c++/97523
* init.c (build_new): When value-initializing an array new,
leave the INIT as an empty vector.

gcc/testsuite/ChangeLog:

PR c++/97523
* g++.dg/expr/anew5.C: New test.
* g++.dg/expr/anew6.C: New test.

c++: Fix crash with broken deduction from {} [PR97895]

Unfortunately, the otherwise beautiful

for (constructor_elt &elt : *CONSTRUCTOR_ELTS (init))

is not immune to an empty constructor, so we have to check
CONSTRUCTOR_ELTS first.
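
The same hazard can be shown outside GCC.  In the sketch below a possibly-null pointer to std::vector stands in for CONSTRUCTOR_ELTS, which is an assumption made for illustration; sum_elements is a hypothetical function.

```cpp
#include <cassert>
#include <vector>

// The container pointer may be null when there are no elements, so it has to
// be checked before the range-for dereferences it.
int sum_elements(const std::vector<int>* elts) {
    if (elts == nullptr)
        return 0;  // empty constructor: nothing to walk
    int total = 0;
    for (int e : *elts)
        total += e;
    return total;
}
```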

gcc/cp/ChangeLog:

PR c++/97895
* pt.c (do_auto_deduction): Don't crash when the constructor has
zero elements.

gcc/testsuite/ChangeLog:

PR c++/97895
* g++.dg/cpp0x/auto54.C: New test.

config: Add tests for modules-desired features

This adds configure tests for features that modules can take advantage
of; if they are not present, modules have reduced or fallback functionality.

gcc/
* configure.ac: Add tests for fstatat, sighandler_t, O_CLOEXEC,
unix-domain and ipv6 sockets.
* config.in: Rebuilt.
* configure: Rebuilt.

c++: Relax new assert [PR 97905]

It turns out there are legitimate cases for the new decl to not have
lang-specific.

PR c++/97905
gcc/cp/
* decl.c (duplicate_decls): Relax new assert.
gcc/testsuite/
* g++.dg/lookup/pr97905.C: New.

pru: Add builtins for HALT and LMBD

Add builtins for HALT and LMBD, per Texas Instruments document
SPRUHV7C. Use the new LMBD pattern to define an expand for clz.

Binutils [1] and sim [2] support for LMBD instruction are merged now.

[1] https://sourceware.org/pipermail/binutils/2020-October/113901.html
[2] https://sourceware.org/pipermail/gdb-patches/2020-November/173141.html

gcc/ChangeLog:

* config/pru/alu-zext.md: Add lmbd patterns for zero_extend
variants.
* config/pru/pru.c (enum pru_builtin): Add HALT and LMBD.
(pru_init_builtins): Ditto.
(pru_builtin_decl): Ditto.
(pru_expand_builtin): Ditto.
* config/pru/pru.h (CLZ_DEFINED_VALUE_AT_ZERO): Define PRU
value for CLZ with zero value parameter.
* config/pru/pru.md: Add halt, lmbd and clz patterns.
* doc/extend.texi: Document PRU builtins.

gcc/testsuite/ChangeLog:

* gcc.target/pru/halt.c: New test.
* gcc.target/pru/lmbd.c: New test.

vect: Add a “very cheap” cost model

Currently we have three vector cost models: cheap, dynamic and
unlimited.  -O2 -ftree-vectorize uses “cheap” by default, but that's
still relatively aggressive about peeling and aliasing checks,
and can lead to significant code size growth.

This patch adds an even more conservative choice, which for lack of
imagination I've called “very cheap”.  It only allows vectorisation
if the vector code entirely replaces the scalar code.  It also
requires one iteration of the vector loop to pay for itself,
regardless of how often the loop iterates.  (If the vector loop
needs multiple iterations to be beneficial then things are
probably too close to call, and the conservative thing would
be to stick with the scalar code.)

The idea is that this should be suitable for -O2, although the patch
doesn't change any defaults itself.

I tested this by building and running a bunch of workloads for SVE,
with three options:

  (1) -O2
  (2) -O2 -ftree-vectorize -fvect-cost-model=very-cheap
  (3) -O2 -ftree-vectorize [-fvect-cost-model=cheap]

All three builds used the default -msve-vector-bits=scalable and
ran with the minimum vector length of 128 bits, which should give
a worst-case bound for the performance impact.

The workloads included a mixture of microbenchmarks and full
applications.  Because it's quite an eclectic mix, there's not
much point giving exact figures.  The aim was more to get a general
impression.

Code size growth with (2) was much lower than with (3).  Only a
handful of tests increased by more than 5%, and all of them were
microbenchmarks.

In terms of performance, (2) was significantly faster than (1)
on microbenchmarks (as expected) but also on some full apps.
Again, performance only regressed on a handful of tests.

As expected, the performance of (3) vs. (1) and (3) vs. (2) is more
of a mixed bag.  There are several significant improvements with (3)
over (2), but also some (smaller) regressions.  That seems to be in
line with -O2 -ftree-vectorize being a kind of -O2.5.

The patch reorders vect_cost_model so that values are in order
of increasing aggressiveness, which makes it possible to use
range checks.  The value 0 still represents “unlimited”,
so “if (flag_vect_cost_model)” is still a meaningful check.
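
The benefit of the ordering can be sketched with a small enum.  The enumerator names and the negative values below are illustrative assumptions, not a copy of flag-types.h; the only properties taken from the text are that values increase with aggressiveness and that 0 means unlimited.

```cpp
// Hypothetical cost-model enum, ordered from least to most aggressive.
enum class CostModel : int {
    VeryCheap = -3,
    Cheap = -2,
    Dynamic = -1,
    Unlimited = 0  // 0 keeps a plain truth-value test meaningful
};

// Equivalent of testing the flag for "some limit applies".
constexpr bool is_limited(CostModel m) { return m != CostModel::Unlimited; }

// A range check: true for Cheap and anything more conservative.
constexpr bool at_most_cheap(CostModel m) { return m <= CostModel::Cheap; }

// The most conservative model allows neither peeling nor runtime alias checks.
constexpr bool allows_peeling(CostModel m) { return m > CostModel::VeryCheap; }
```

With this ordering, adding a more conservative model does not require touching the existing "cheap or stricter" checks.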

gcc/
* doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
* common.opt (fvect-cost-model=): Add very-cheap as a possible option.
(fsimd-cost-model=): Likewise.
(vect_cost_model): Add very-cheap.
* flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP.
Put the values in order of increasing aggressiveness.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use
range checks when comparing against VECT_COST_MODEL_CHEAP.
(vect_prune_runtime_alias_test_list): Do not allow any alias
checks for the very-cheap cost model.
* tree-vect-loop.c (vect_analyze_loop_costing): Do not allow
any peeling for the very-cheap cost model.  Also require one
iteration of the vector loop to pay for itself.

gcc/testsuite/
* gcc.dg/vect/vect-cost-model-1.c: New test.
* gcc.dg/vect/vect-cost-model-2.c: Likewise.
* gcc.dg/vect/vect-cost-model-3.c: Likewise.
* gcc.dg/vect/vect-cost-model-4.c: Likewise.
* gcc.dg/vect/vect-cost-model-5.c: Likewise.
* gcc.dg/vect/vect-cost-model-6.c: Likewise.

libstdc++: Add missing header to some tests

These tests use std::this_thread::sleep_for without including <thread>.

libstdc++-v3/ChangeLog:

* testsuite/30_threads/async/async.cc: Include <thread>.
* testsuite/30_threads/future/members/93456.cc: Likewise.

AArch64: Add cost table for Cortex-A76

Add an initial cost table for Cortex-A76 - this is copied from
cortexa57_extra_costs but updated based on the Optimization Guide.
Use the new cost table on all Neoverse tunings and ensure the tunings
are consistent for all.  As a result more compact code is generated
with more combined shift+alu operations. Eg. -mcpu=cortex-a76 will now
merge the shifts in:

int f(int x, int y) { return (x & y << 3) * (x | y << 3); }

and  w2, w0, w1, lsl 3
orr  w0, w0, w1, lsl 3
mul  w0, w2, w0
ret

SPEC2017 codesize improves by 0.02% and SPECINT2017 shows 0.24% gain.

2020-11-18  Wilco Dijkstra  <wdijkstr@arm.com>

gcc/
* config/aarch64/aarch64.c (neoversen1_tunings): Use new
cortexa76_extra_costs.
(neoversev1_tunings): Likewise.
(neoversen2_tunings): Likewise.
* config/arm/aarch-cost-tables.h (cortexa76_extra_costs):
add new costs.

AArch64: Improve inline memcpy expansion

Improve the inline memcpy expansion.  Use integer load/store for copies <= 24
bytes instead of SIMD.  Set the maximum copy to expand to 256 by default,
except that -Os or no Neon expands up to 128 bytes.  When using LDP/STP of
Q-registers, also use Q-register accesses for the unaligned tail, saving 2
instructions (eg. all sizes up to 48 bytes emit exactly 4 instructions).
Cleanup code and comments.

The codesize gain vs the GCC10 expansion is 0.05% on SPECINT2017.

2020-11-03  Wilco Dijkstra  <wdijkstr@arm.com>

gcc/
* config/aarch64/aarch64.c (aarch64_expand_cpymem): Cleanup code and
comments, tweak expansion decisions and improve tail expansion.

Fix PR ada/97805

We need to include limits.h (or <climits>) in adaint.c because of LLONG_MIN.

gcc/ada/ChangeLog:
PR ada/97805
* adaint.c: Include climits in C++ and limits.h otherwise.

preprocessor: main file searching

This adds the capability to locate the main file on the user or system
include paths.  That's extremely useful to users building header
units.  Searching has to be requested (plain header-unit compilation
will not search).  Also, to make include_next work as expected when
building a header unit, we add a mechanism to retrofit a non-searched
source file as one on the include path.

libcpp/
* include/cpplib.h (enum cpp_main_search): New.
(struct cpp_options): Add main_search field.
(cpp_main_loc): Declare.
(cpp_retrofit_as_include): Declare.
* internal.h (struct cpp_reader): Add main_loc field.
(_cpp_in_main_source_file): Not main if main is a header.
* init.c (cpp_read_main_file): Use main_search option to locate
main file.  Set main_loc.
* files.c (cpp_retrofit_as_include): New.

libstdc++: Move std::thread to a new header

This makes it possible to use std::thread without including the whole of
<thread>. It also makes this_thread::get_id() and this_thread::yield()
available even when there is no gthreads support (e.g. when GCC is built
with --disable-threads or --enable-threads=single).

In order for the std::thread::id return type of this_thread::get_id() to
be defined, std::thread itself is defined unconditionally. However the
constructor that creates new threads is not defined for single-threaded
builds. The thread::join() and thread::detach() member functions are
defined inline for single-threaded builds and just throw an exception
(because we know the thread cannot be joinable if the constructor that
creates joinable threads doesn't exist).

The thread::hardware_concurrency() member function is also defined
inline and returns 0 (as suggested by the standard when the value "is
not computable or well-defined").
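
Because 0 is a possible return value, callers typically need a fallback.  A minimal sketch, where worker_count and default_worker_count are hypothetical helpers rather than part of libstdc++:

```cpp
#include <thread>

// Treats a reported value of 0 ("not computable or well-defined") as unknown
// and substitutes a caller-chosen fallback.
constexpr unsigned worker_count(unsigned reported, unsigned fallback = 1) {
    return reported != 0 ? reported : fallback;
}

// Convenience wrapper around the standard query.
inline unsigned default_worker_count() {
    return worker_count(std::thread::hardware_concurrency());
}
```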

The main benefit for most targets is that other headers such as <future>
do not need to include the whole of <thread> just to be able to create a
std::thread. That avoids including <stop_token> and std::jthread where
not required. This is another partial fix for PR 92546.

This also means we can use this_thread::get_id() and this_thread::yield()
in <stop_token> instead of using the gthread functions directly. This
removes some preprocessor conditionals, simplifying the code.

libstdc++-v3/ChangeLog:

PR libstdc++/92546
* include/Makefile.am: Add new <bits/std_thread.h> header.
* include/Makefile.in: Regenerate.
* include/std/future: Include new header instead of <thread>.
* include/std/stop_token: Include new header instead of
<bits/gthr.h>.
(stop_token::_S_yield()): Use this_thread::yield().
(_Stop_state_t::_M_requester): Change type to std::thread::id.
(_Stop_state_t::_M_request_stop()): Use this_thread::get_id().
(_Stop_state_t::_M_remove_callback(_Stop_cb*)): Likewise.
Use __is_single_threaded() to decide whether to synchronize.
* include/std/thread (thread, operator==, this_thread::get_id)
(this_thread::yield): Move to new header.
(operator<=>, operator!=, operator<, operator<=, operator>)
(operator>=, hash<thread::id>, operator<<): Define even when
gthreads not available.
* src/c++11/thread.cc: Include <memory>.
* include/bits/std_thread.h: New file.
(thread, operator==, this_thread::get_id, this_thread::yield):
Define even when gthreads not available.
[!_GLIBCXX_HAS_GTHREADS] (thread::join, thread::detach)
(thread::hardware_concurrency): Define inline.

libstdc++: Fix overflow checks to use the correct "time_t" [PR 93456]

I recently added overflow checks to src/c++11/futex.cc for PR 93456, but
then changed the type of the timespec for PR 93421. This meant the
overflow checks were no longer using the right range, because the
variable being written to might be smaller than time_t.

This introduces new typedef that corresponds to the tv_sec member of the
struct being passed to the syscall, and uses that typedef in the range
checks.
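
The kind of range check involved can be sketched as a template over the destination type.  This is not the futex.cc code; seconds_fit is a hypothetical helper, and it assumes the destination is a signed integer type no wider than 64 bits.

```cpp
#include <cstdint>
#include <limits>

// Returns true if a 64-bit seconds count can be stored in SyscallTime, the
// type of the tv_sec member actually passed to the syscall.
template <typename SyscallTime>
constexpr bool seconds_fit(std::int64_t seconds) {
    return seconds >= std::numeric_limits<SyscallTime>::min()
        && seconds <= std::numeric_limits<SyscallTime>::max();
}
```

Checking against a wider type than the one written to would let a value pass the check and then be truncated on assignment, which is the problem described above.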

libstdc++-v3/ChangeLog:

PR libstdc++/93421
PR libstdc++/93456
* src/c++11/futex.cc (syscall_time_t): New typedef for
the type of the syscall_timespec::tv_sec member.
(relative_timespec, _M_futex_wait_until)
(_M_futex_wait_until_steady): Use syscall_time_t in overflow
checks, not time_t.

preprocessor: main-file cleanup

In preparing module patch 7 I realized there was a cleanup I could
make to simplify it.  This is that cleanup.  Also, when doing the
cleanup I noticed some macros had been turned into inline functions,
but not renamed to the preprocessor's internal namespace
(_cpp_$INTERNAL rather than cpp_$USER).  Thus, this renames those
functions, deletes an internal field of the file structure, and
determines whether we're in the main file by comparing to
pfile->main_file, the _cpp_file of the main file.

libcpp/
* internal.h (cpp_in_system_header): Rename to ...
(_cpp_in_system_header): ... here.
(cpp_in_primary_file): Rename to ...
(_cpp_in_main_source_file): ... here.  Compare main_file equality
and check main_search value.
* lex.c (maybe_va_opt_error, _cpp_lex_direct): Adjust for rename.
* macro.c (_cpp_builtin_macro_text): Likewise.
(replace_args): Likewise.
* directives.c (do_include_next): Likewise.
(do_pragma_once, do_pragma_system_header): Likewise.
* files.c (struct _cpp_file): Delete main_file field.
(pch_open): Check pfile->main_file equality.
(make_cpp_file): Drop cpp_reader parm, don't set main_file.
(_cpp_find_file): Adjust.
(_cpp_stack_file): Check pfile->main_file equality.
(struct report_missing_guard_data): Add cpp_reader field.
(report_missing_guard): Check pfile->main_file equality.
(_cpp_report_missing_guards): Adjust.

Fix bootstrap

This fixes a typo in the TREE_CODE compare which should
compare against TYPE_DECL, not TYPE_NAME.

2020-11-19 Richard Biener <rguenther@suse.de>

* fold-const.c (operand_compare::hash_operand): Fix typo.

Fix gcc.dg/pr97897.c

This adds dg-options "" to avoid the pedantic error on _Complex int.

2020-11-19 Richard Biener <rguenther@suse.de>

* gcc.dg/pr97897.c: Add dg-options.

refactor reassocs get_rank

This refactors things so assigned ranks are dumped and the cache
is consistently used also for PHIs.

2020-11-19 Richard Biener <rguenther@suse.de>

* tree-ssa-reassoc.c (get_rank): Refactor to consistently
use the cache and dump ranks assigned.

Fix operand_equal_p hash and compare of OBJ_TYPE_REF

* fold-const.c (operand_compare::operand_equal_p): Move OBJ_TYPE_REF
matching to correct place; drop OEP_ADDRESS_OF for TOKEN, OBJECT and
class.
(operand_compare::hash_operand): Hash ODR type for OBJ_TYPE_REF.

[3/3] [AArch64][vect] vec_widen_lshift pattern

Add aarch64 vec_widen_lshift_lo/hi patterns and fix bug it triggers in
mid-end. This pattern takes one vector with N elements of size S, shifts
each element left by the element width and stores the results as N
elements of size 2*S (in 2 result vectors). The aarch64 backend
implements this with the shll,shll2 instruction pair.
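
A scalar model of the operation, assuming S = 8 bits for concreteness.  The function widen_lshift is hypothetical, and it returns one array instead of separate lo/hi result vectors.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Each 8-bit element is widened to 16 bits and shifted left by the original
// element width (8), so no bits are lost.
template <std::size_t N>
constexpr std::array<std::uint16_t, N>
widen_lshift(const std::array<std::uint8_t, N>& in) {
    std::array<std::uint16_t, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        out[i] = static_cast<std::uint16_t>(static_cast<std::uint16_t>(in[i]) << 8);
    }
    return out;
}
```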

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md: Add vec_widen_lshift_hi/lo<mode>
patterns.
* tree-vect-stmts.c (vectorizable_conversion): Fix for widen_lshift
case.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-widen-lshift.c: New test.

[2/3] [vect] Add widening add, subtract patterns

Add widening add, subtract patterns to tree-vect-patterns. Update the
widened code of patterns that detect PLUS_EXPR to also detect
WIDEN_PLUS_EXPR. These patterns take 2 vectors with N elements of size
S and perform an add/subtract on the elements, storing the results as N
elements of size 2*S (in 2 result vectors). This is implemented in the
aarch64 backend as addl,addl2 and subl,subl2 respectively. Add aarch64
tests for patterns.
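
A scalar model of the widening add, again assuming S = 8 bits.  The function widen_add is hypothetical, and it returns a single array rather than the two result vectors produced by the lo/hi halves.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Adds corresponding 8-bit elements and stores each sum in 16 bits, so the
// carry out of the narrow type is preserved.
template <std::size_t N>
constexpr std::array<std::uint16_t, N>
widen_add(const std::array<std::uint8_t, N>& a, const std::array<std::uint8_t, N>& b) {
    std::array<std::uint16_t, N> out{};
    for (std::size_t i = 0; i < N; ++i) {
        // Operands are promoted to int before the addition, so 255 + 255 is exact.
        out[i] = static_cast<std::uint16_t>(a[i] + b[i]);
    }
    return out;
}
```

The widening subtract follows the same shape with '-' in place of '+', using a signed result type when negative differences need to be represented.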

gcc/ChangeLog:
* doc/generic.texi: Document new widen_plus/minus_lo/hi tree codes.
* doc/md.texi: Document new widening add/subtract hi/lo optabs.
* expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases.
* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
subtracts.
* tree-inline.c (estimate_operator_cost): Add case for widening adds,
subtracts.
* tree-vect-generic.c (expand_vector_operations_1): Add case for
widening adds, subtracts.
* tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog
pattern.
(vect_recog_widen_sub_pattern): New recog pattern.
(vect_recog_average_pattern): Update widened add code.
* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add,
subtract.
(supportable_widening_operation): Add case for widened add, subtract.
* tree.def
(WIDEN_PLUS_EXPR): New tree code.
(WIDEN_MINUS_EXPR): New tree code.
(VEC_WIDEN_PLUS_HI_EXPR): New tree code.
(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
(VEC_WIDEN_MINUS_LO_EXPR): New tree code.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-widen-add.c: New test.
* gcc.target/aarch64/vect-widen-sub.c: New test.

[1/3][aarch64] Add vec_widen patterns to aarch64

Add widening add and subtract patterns to the aarch64
backend. These allow taking vectors of N elements of size S
and performing an add/subtract on the high or low half
widening the resulting elements and storing N/2 elements of size 2*S.
These correspond to the addl,addl2,subl,subl2 instructions.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md: New patterns
vec_widen_saddl_lo/hi_<mode>.

tree-optimization/97901 - ICE propagating out LC PHIs

We need to fold the stmt to canonicalize MEM_REFs which means
we're back to using replace_uses_by. Which means we need dominators
to not require a CFG cleanup upthread.

2020-11-19 Richard Biener <rguenther@suse.de>

PR tree-optimization/97901
* tree-ssa-propagate.c (clean_up_loop_closed_phi): Compute
dominators and use replace_uses_by.

* gcc.dg/torture/pr97901.c: New testcase.

Enhance debug info for fixed-point types

The Ada language supports fixed-point types as first-class citizens so
they need to be described as-is in the debug info. We devised the
langhook get_fixed_point_type_info for this purpose a few years ago,
but it comes with a limitation for the representation of the scale
factor that we would need to lift in order to be able to represent
more fixed-point types.

gcc/ChangeLog:
* dwarf2out.h (struct fixed_point_type_info) <scale_factor>: Turn
numerator and denominator into a tree.
* dwarf2out.c (base_type_die): In the case of a fixed-point type
with arbitrary scale factor, call add_scalar_info on numerator and
denominator to emit the appropriate attributes.

gcc/ada/ChangeLog:
* exp_dbug.adb (Is_Handled_Scale_Factor): Delete.
(Get_Encoded_Name): Do not call it.
* gcc-interface/decl.c (gnat_to_gnu_entity) <Fixed_Point_Type>:
Tidy up and always use a meaningful description for arbitrary
scale factors.
* gcc-interface/misc.c (gnat_get_fixed_point_type_info): Remove
obsolete block and adjust the description of the scale factor.

tree-optimization/97897 - complex lowering on abnormal edges

This fixes complex lowering to not put constants into abnormal
edge PHI values by making sure abnormally used SSA names are
VARYING in its propagation lattice.

2020-11-19 Richard Biener <rguenther@suse.de>

PR tree-optimization/97897
* tree-complex.c (complex_propagate::visit_stmt): Make sure
abnormally used SSA names are VARYING.
(complex_propagate::visit_phi): Likewise.
* tree-ssa.c (verify_phi_args): Verify PHI arguments on abnormal
edges are SSA names.

* gcc.dg/pr97897.c: New testcase.

i386: Disable *<absneg:code><mode>2_i387_1 for TARGET_SSE_MATH modes

This pattern interferes with *<absneg:code><mode>2_1 when TARGET_SSE_MATH
modes are active. Combine pass is able to remove (use) RTXes and transforms
*<absneg:code><mode>2_1 to *<absneg:code><mode>2_i387_1 where SSE
alternatives are not available.

2020-11-19 Uroš Bizjak <ubizjak@gmail.com>

gcc/
* config/i386/i386.md (*<absneg:code><mode>2_i387_1):
Disable for TARGET_SSE_MATH modes.

gcc/testsuite/
* gcc.target/i386/pr97887.c: New test.

Minor H8 shift code generation change in preparation for cc0 removal

So I didn't stay up late to work from Pago Pago this year and beat the stage1
close, but I do want to flush out the removal of cc0 from the H8 port this
cycle.  Given these patches only affect the H8 and the H8 would be killed this
cycle without the conversion, I think this is suitable even though we're past
stage1 close.

This patch addresses an initial codegen issue that would have resulted in
regressions after removal of cc0.  The compare/test eliminate pass is unable to
handle multiple clobbers.  So patterns that clobber a scratch and also clobber
a condition code are never used to eliminate a compare/test.

The H8 can shift 1 or 2 bits at a time depending on the precise model.  Not
surprisingly we have multiple strategies to implement shifts, some of which
clobber scratch registers -- but we have a clobber on every shift insn and as
a result they can not participate in compare/test removal once cc0 is removed
from the port.

This patch removes the clobber in the initial code generation in cases where
it's obviously not needed, allowing those shifts to participate in compare/test
removal in a future patch.  It has the advantage that it also generates
slightly better code.  By installing this now the removal of cc0 is a smaller
patch, but more importantly, it allows for a more direct comparison of the
generated code before/after cc0 removal.

I've had my tester test before/after this patch with no regressions on the
major H8 multilibs.  I've also spot checked the generated code and as expected
it's ever-so-slightly better after this patch.

I'll be installing this on the trunk momentarily.  More patches will follow,
though probably not in rapid succession as my time to push this stuff is very
limited.

gcc/

* config/h8300/constraints.md (R constraint): Add argument to call
to h8300_shift_needs_scratch_p.
(S and T constraints): Similarly.
* config/h8300/h8300-protos.h: Update h8300_shift_needs_scratch_p
prototype.
* config/h8300/h8300.c (expand_a_shift): Emit a different pattern
if the shift does not require a scratch register.
(h8300_shift_needs_scratch_p): Refine to be more accurate.
* config/h8300/shiftrotate.md (shiftqi_noscratch): New pattern.
(shifthi_noscratch, shiftsi_noscratch): Similarly.

Daily bump.

Fix middle-end/85811: Introduce tree_expr_maybe_non_p et al.

The motivation for this patch is PR middle-end/85811, a wrong-code
regression entitled "Invalid optimization with fmax, fabs and nan".
The optimization involves assuming max(x,y) is non-negative if (say)
y is non-negative, i.e. max(x,2.0).  Unfortunately, this is an invalid
assumption in the presence of NaNs.  Hence max(x,+qNaN), with IEEE fmax
semantics will always return x even though the qNaN is non-negative.
Worse, max(x,2.0) may return a negative value if x is -sNaN.
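
To make the quiet-NaN case concrete, here is a minimal sketch in C.  The
helper model_fmax is a hypothetical hand-written model of the rule that
fmax returns the other operand when exactly one operand is a qNaN; it is
not the libm implementation and it deliberately ignores sNaN handling.  It
only shows that a non-negative qNaN operand does not make the result
non-negative.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical model of IEEE fmax for quiet NaNs: when exactly one
   operand is a NaN, the other operand is returned unchanged.  */
static double
model_fmax (double x, double y)
{
  if (isnan (x))
    return y;
  if (isnan (y))
    return x;
  return x > y ? x : y;
}

/* Return nonzero if assuming "max (x, y) is non-negative because y is
   non-negative" would be wrong for these operands.  */
static int
nonneg_assumption_broken (double x, double y)
{
  assert (!signbit (y));	/* Y is "non-negative", possibly +qNaN.  */
  return signbit (model_fmax (x, y)) != 0;
}
```

With x = -1.0 and y = NAN the model returns -1.0, so the assumption is
broken; with y = 2.0 it holds.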

I'll quote Joseph Myers (many thanks) who describes things clearly as:
> (a) When both arguments are NaNs, the return value should be a qNaN,
> but sometimes it is an sNaN if at least one argument is an sNaN.
> (b) Under TS 18661-1 semantics, if either argument is an sNaN then the
> result should be a qNaN (whereas if one argument is a qNaN and the
> other is not a NaN, the result should be the non-NaN argument).
> Various implementations treat sNaNs like qNaNs here.

Under this logic, the tree_expr_nonnegative_p for IEEE fmax should be:

    CASE_CFN_FMAX:
    CASE_CFN_FMAX_FN:
      /* Usually RECURSE (arg0) || RECURSE (arg1) but NaNs complicate
         things.  In the presence of sNaNs, we're only guaranteed to be
         non-negative if both operands are non-negative.  In the presence
         of qNaNs, we're non-negative if either operand is non-negative
         and can't be a qNaN, or if both operands are non-negative.  */
      if (tree_expr_maybe_signaling_nan_p (arg0) ||
          tree_expr_maybe_signaling_nan_p (arg1))
        return RECURSE (arg0) && RECURSE (arg1);
      return RECURSE (arg0) ? (!tree_expr_maybe_nan_p (arg0)
                              || RECURSE (arg1))
                            : (RECURSE (arg1)
                              && !tree_expr_maybe_nan_p (arg1));

Which indeed resolves the wrong code in the PR.  The infrastructure that
makes this possible consists of the two new functions tree_expr_maybe_nan_p and
tree_expr_maybe_signaling_nan_p which test whether a value may potentially
be a NaN or a signaling NaN respectively.  In fact, this patch adds seven
new predicates to the middle-end:

bool tree_expr_finite_p (const_tree);
bool tree_expr_infinite_p (const_tree);
bool tree_expr_maybe_infinite_p (const_tree);
bool tree_expr_signaling_nan_p (const_tree);
bool tree_expr_maybe_signaling_nan_p (const_tree);
bool tree_expr_nan_p (const_tree);
bool tree_expr_maybe_nan_p (const_tree);

These functions correspond to the "must" and "may" operators in modal logic,
and allow us to triage expressions in the middle-end: definitely a NaN,
definitely not a NaN, unknown at compile-time, etc.  A prime example of
the utility of these functions is that an IEEE floating point value promoted
from an integer type can't be a NaN or infinite.  Hence (double)i+0.0 where
i is an integer can be simplified to (double)i even with -fsignaling-nans.
Currently in GCC optimizations are enabled/disabled based on whether the
expression's type supports NaNs or sNaNs; with these new predicates they
can be controlled by whether the actual operands may or may not be NaNs.
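
The must/may split can be illustrated with a toy model in C.  Everything
below (the enum, demo_nan_p, demo_maybe_nan_p, demo_classify) is
hypothetical and only mirrors the naming scheme of the new predicates; the
real functions inspect trees, not an enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical three-way knowledge about "is this value a NaN?".  */
enum nan_knowledge { NAN_NO, NAN_UNKNOWN, NAN_YES };

/* "Must" predicate: true only when the value is definitely a NaN.  */
static bool
demo_nan_p (enum nan_knowledge k)
{
  return k == NAN_YES;
}

/* "May" predicate: true unless the value is definitely not a NaN.  */
static bool
demo_maybe_nan_p (enum nan_knowledge k)
{
  bool may = k != NAN_NO;
  assert (!demo_nan_p (k) || may);	/* "Must" implies "may".  */
  return may;
}

/* A value converted from an integer can never be a NaN; in this toy
   model everything else is unknown at compile time.  */
static enum nan_knowledge
demo_classify (bool converted_from_integer)
{
  return converted_from_integer ? NAN_NO : NAN_UNKNOWN;
}
```

An optimization that is only valid without NaNs would then be guarded by
!demo_maybe_nan_p (...) on the actual operand rather than by a property of
the type.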

Having added these extremely useful helper functions to the middle-end,
I couldn't help but use them in a few places in fold-const.c, builtins.c
and match.pd.  In the near term, these can/should be used in places
where the tree optimizers test for HONOR_NANS, HONOR_INFINITIES or
HONOR_SNANS, or explicitly test whether a REAL_CST is a NaN or Inf.
In the longer term (I'm not volunteering) these predicates could perhaps
be hooked into the middle-end's SSA chaining and/or VRP machinery,
allowing finiteness to be propagated around the CFG, much like we
currently propagate value ranges.

This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
and "make -k check".
Ok for mainline?

2020-08-15  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR middle-end/85811
* fold-const.c (tree_expr_finite_p): New function to test whether
a tree expression must be finite, i.e. not a FP NaN or infinity.
(tree_expr_infinite_p):  New function to test whether a tree
expression must be infinite, i.e. a FP infinity.
(tree_expr_maybe_infinite_p): New function to test whether a tree
expression may be infinite, i.e. a FP infinity.
(tree_expr_signaling_nan_p): New function to test whether a tree
expression must evaluate to a signaling NaN (sNaN).
(tree_expr_maybe_signaling_nan_p): New function to test whether a
tree expression may be a signaling NaN (sNaN).
(tree_expr_nan_p): New function to test whether a tree expression
must evaluate to a (quiet or signaling) NaN.
(tree_expr_maybe_nan_p): New function to test whether a tree
expression may be a (quiet or signaling) NaN.

(tree_binary_nonnegative_warnv_p) [MAX_EXPR]: In the presence
of NaNs, MAX_EXPR is only guaranteed to be non-negative, if both
operands are non-negative.
(tree_call_nonnegative_warnv_p) [CASE_CFN_FMAX,CASE_CFN_FMAX_FN]:
In the presence of signaling NaNs, fmax is only guaranteed to be
non-negative if both operands are non-negative.  In the presence of
quiet NaNs, fmax is non-negative if either operand is non-negative
and not a qNaN, or both operands are non-negative.

* fold-const.h (tree_expr_finite_p, tree_expr_infinite_p,
tree_expr_maybe_infinite_p, tree_expr_signaling_nan_p,
tree_expr_maybe_signaling_nan_p, tree_expr_nan_p,
tree_expr_maybe_nan_p): Prototype new functions here.

* builtins.c (fold_builtin_classify) [BUILT_IN_ISINF]: Fold to
a constant if argument is known to be (or not to be) an Infinity.
[BUILT_IN_ISFINITE]: Fold to a constant if argument is known to
be (or not to be) finite.
[BUILT_IN_ISNAN]: Fold to a constant if argument is known to be
(or not to be) a NaN.
(fold_builtin_fpclassify): Check tree_expr_maybe_infinite_p and
tree_expr_maybe_nan_p instead of HONOR_INFINITIES and HONOR_NANS
respectively.
(fold_builtin_unordered_cmp): Fold UNORDERED_EXPR to a constant
when its arguments are known to be (or not be) NaNs.  Check
tree_expr_maybe_nan_p instead of HONOR_NANS when choosing between
unordered and regular forms of comparison operators.

* match.pd (ordered(x,y)->true/false): Constant fold ORDERED_EXPR
if its operands are known to be (or not to be) NaNs.
(unordered(x,y)->true/false): Constant fold UNORDERED_EXPR if its
operands are known to be (or not to be) NaNs.
(sqrt(x)*sqrt(x)->x): Check tree_expr_maybe_signaling_nan_p instead
of HONOR_SNANS.

gcc/testsuite/ChangeLog
PR middle-end/85811
* gcc.dg/pr85811.c: New test.
* gcc.dg/fold-isfinite-1.c: New test.
* gcc.dg/fold-isfinite-2.c: New test.
* gcc.dg/fold-isinf-1.c: New test.
* gcc.dg/fold-isinf-2.c: New test.
* gcc.dg/fold-isnan-1.c: New test.
* gcc.dg/fold-isnan-2.c: New test.

lto: Fix typo in comment of gcc/lto/lto-symtab.c

* lto-symtab.c (lto_symtab_merge_symbols): Fix typos in comment.

vrp: Fix operator_trunc_mod::op1_range [PR97888]

As mentioned in the PR, in (x % y) >= 0 && y >= 0, we can't deduce
x's range to be x >= 0, as e.g. -7 % 7 is 0.  But we can deduce it
from (x % y) > 0.  The patch also fixes up the comments.
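
The counterexample is easy to check in plain C, where % truncates toward
zero so a nonzero remainder takes the sign of the dividend.  The helper
names below are made up for this sketch and have nothing to do with the
range-op.cc code itself.

```c
#include <assert.h>

/* Truncating remainder, as computed by C's % operator.  */
static int
demo_rem (int x, int y)
{
  assert (y != 0);
  return x % y;
}

/* Return 1 if (X, Y) disproves
   "(x % y) >= 0 && y >= 0 implies x >= 0".  */
static int
demo_is_counterexample (int x, int y)
{
  return y > 0 && demo_rem (x, y) >= 0 && x < 0;
}

/* Return 1 if "(x % y) > 0 && y > 0 implies x > 0" holds for all
   -LIMIT <= x <= LIMIT and 1 <= y <= LIMIT.  */
static int
demo_strict_rule_holds (int limit)
{
  for (int x = -limit; x <= limit; x++)
    for (int y = 1; y <= limit; y++)
      if (demo_rem (x, y) > 0 && x <= 0)
	return 0;
  return 1;
}
```

Here demo_is_counterexample (-7, 7) is true because -7 % 7 is 0, while the
strict form holds over the whole tested range.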

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/91029
PR tree-optimization/97888
* range-op.cc (operator_trunc_mod::op1_range): Only set op1
range to >= 0 if lhs is > 0, rather than >= 0.  Fix up comments.

* gcc.dg/pr91029.c: Add comment with PR number.
(f2): Use > 0 rather than >= 0.
* gcc.c-torture/execute/pr97888-1.c: New test.
* gcc.c-torture/execute/pr97888-2.c: New test.

plugins: Allow plugins to handle global_options changes

Any time somebody adds or removes an option in some *.opt file (which e.g.
on the 10 branch after branching off 11 happened 7 times already), many
offsets in global_options variable change and so plugins that ever access
GCC options or other global_options values are ABI dependent on it.  It is
true we don't guarantee ABI stability for plugins, but we change the most
often used data structures on the release branches only very rarely and so
the options changes are the most problematic for ABI stability of plugins.

Annobin uses a way to remap accesses to some of the global_options.x_* by
looking them up in the cl_options array where we have
offsetof (struct gcc_options, x_flag_lto)
etc. remembered, but sadly doesn't do it for all options (e.g. some flag_*
etc. option accesses may be hidden in various macros like POINTER_SIZE),
and more importantly some struct gcc_options offsets are not covered at all.
E.g. there is no offsetof (struct gcc_options, x_optimize),
offsetof (struct gcc_options, x_flag_sanitize) etc.  Those are usually:
Variable
int optimize
in the *.opt files.

The following patch allows the plugins to deal with reshuffling of even
the global_options fields that aren't tracked in cl_options by adding
another array that describes those, which adds an 816 bytes long array
and 1039 bytes in string literals, so 1855 .rodata bytes in total ATM.
The array is added only with --enable-plugin (the default); with
--disable-plugin it is not compiled in.
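
As a rough sketch of how a plugin could use such a table, consider the C
below.  All names here (demo_options, demo_var, demo_vars, demo_lookup) are
hypothetical stand-ins; the real struct cl_var layout and the generated
cl_vars contents should be taken from opts.h and optc-gen.awk, not from
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct gcc_options.  */
struct demo_options
{
  int x_optimize;
  int x_flag_sanitize;
};

/* Hypothetical descriptor: a variable name plus its offset.  */
struct demo_var
{
  const char *var_name;
  unsigned short var_offset;
};

static const struct demo_var demo_vars[] =
{
  { "optimize", offsetof (struct demo_options, x_optimize) },
  { "flag_sanitize", offsetof (struct demo_options, x_flag_sanitize) },
  { NULL, 0 }
};

static struct demo_options demo_global_options = { 2, 0 };

/* Look NAME up at run time and return a pointer to the field inside
   OPTS, or NULL if the table does not describe it.  */
static int *
demo_lookup (struct demo_options *opts, const char *name)
{
  assert (opts != NULL && name != NULL);
  for (const struct demo_var *v = demo_vars; v->var_name; v++)
    if (strcmp (v->var_name, name) == 0)
      return (int *) ((char *) opts + v->var_offset);
  return NULL;
}
```

Because the field is located by name at run time, the plugin no longer
depends on the offset that was in effect when it was compiled.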

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

* opts.h (struct cl_var): New type.
(cl_vars): Declare.
* optc-gen.awk: Generate cl_vars array.

analyzer: only use CWE-690 for unchecked return value [PR97893]

CWE-690 is only for dereferencing an unchecked return value; for
other kinds of NULL dereference, use the parent classification, CWE-476.

gcc/analyzer/ChangeLog:
PR analyzer/97893
* sm-malloc.cc (null_deref::emit): Use CWE-476 rather than
CWE-690, as this isn't due to an unchecked return value.
(null_arg::emit): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/97893
* gcc.dg/analyzer/malloc-1.c: Add CWE-690 and CWE-476 codes to
expected output.

Objective-C++ : Avoid ICE on invalid with empty attributes.

Empty prefix attributes like:

__attribute__ (())
@interface MyClass
@end

cause an ICE at present, check for that case and skip them.

gcc/cp/ChangeLog:

* parser.c (cp_parser_objc_valid_prefix_attributes): Check
for empty attributes.

Optimize two patterns with three xors

gcc/
PR tree-optimization/96671
* match.pd (three xor patterns): New patterns.

openmp: Retire nest-var ICV for OpenMP 5.1

This removes the nest-var ICV, expressing nesting in terms of the
max-active-levels-var ICV instead.  The max-active-levels-var ICV
is now per data environment rather than per device.
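
One plausible way to express nesting purely through max-active-levels-var
is sketched below in C.  The demo_* names are hypothetical, and whether
libgomp's real omp_set_nested/omp_get_nested do exactly this (for instance,
whether they also consult the current active level) is not confirmed by
this message; the sketch only assumes "nested" means "more than one active
level allowed".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ICV and the supported limit (the
   ChangeLog below sets the latter to UCHAR_MAX).  */
#define DEMO_SUPPORTED_ACTIVE_LEVELS UCHAR_MAX
static unsigned char demo_max_active_levels_var = 1;

/* Enabling nesting raises the limit to the maximum; disabling it
   clamps the limit to a single active level.  */
static void
demo_set_nested (bool val)
{
  if (val)
    demo_max_active_levels_var = DEMO_SUPPORTED_ACTIVE_LEVELS;
  else if (demo_max_active_levels_var > 1)
    demo_max_active_levels_var = 1;
}

/* Nesting counts as enabled when more than one level may be active.  */
static bool
demo_get_nested (void)
{
  return demo_max_active_levels_var > 1;
}

static void
demo_set_max_active_levels (int levels)
{
  assert (levels >= 0);
  if (levels > DEMO_SUPPORTED_ACTIVE_LEVELS)
    levels = DEMO_SUPPORTED_ACTIVE_LEVELS;
  demo_max_active_levels_var = (unsigned char) levels;
}
```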

2020-11-18  Kwok Cheung Yeung  <kcy@codesourcery.com>

libgomp/
* env.c (gomp_global_icv): Remove nest_var field.  Add
max_active_levels_var field.
(gomp_max_active_levels_var): Remove.
(parse_boolean): Return true on success.
(handle_omp_display_env): Express OMP_NESTED in terms of
max_active_levels_var.  Change format specifier for
max_active_levels_var.
(initialize_env): Set max_active_levels_var from
OMP_MAX_ACTIVE_LEVELS, OMP_NESTED, OMP_NUM_THREADS and
OMP_PROC_BIND.
* icv.c (omp_set_nested): Express in terms of
max_active_levels_var.
(omp_get_nested): Likewise.
(omp_set_max_active_levels): Use max_active_levels_var field instead
of gomp_max_active_levels_var.
(omp_get_max_active_levels): Likewise.
* libgomp.h (struct gomp_task_icv): Remove nest_var field.  Add
max_active_levels_var field.
(gomp_supported_active_levels): Set to UCHAR_MAX.
(gomp_max_active_levels_var): Delete.
* libgomp.texi (omp_get_nested): Update documentation.
(omp_set_nested): Likewise.
(OMP_MAX_ACTIVE_LEVELS): Likewise.
(OMP_NESTED): Likewise.
(OMP_NUM_THREADS): Likewise.
(OMP_PROC_BIND): Likewise.
* parallel.c (gomp_resolve_num_threads): Replace reference
to nest_var with max_active_levels_var.  Use max_active_levels_var
field instead of gomp_max_active_levels_var.

Update gcc zh_TW.po.

* zh_TW.po: Update.

options, lto: Optimize streaming of optimization nodes

Honza mentioned that especially for the new param machinery, most of
streamed values are probably going to be the default values.  Perhaps
somehow we could stream them more effectively.

This patch implements it and brings further savings: the size
goes down from 574 bytes to 273 bytes, i.e. less than half.
It does not try to handle enums, because the code doesn't know if
(enum ...) 10 is even valid; non-parameters, because those generally
don't have large initializers; or params without Init (those are 0
initialized and thus don't need to be handled).
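
The idea can be sketched in C as follows.  The xor with the default for
defaults larger than 10 comes from the ChangeLog entry below; the
ULEB128-style size function is an assumption added here only to show why a
streamed zero is cheaper than a large default, and all demo_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed by a ULEB128-style variable-length encoding (assumed
   here purely to show why small numbers are cheap to stream).  */
static unsigned
demo_uleb128_size (uint64_t v)
{
  unsigned n = 1;
  while (v >>= 7)
    n++;
  return n;
}

/* Xor VALUE with DEFAULT_VALUE when the default is larger than 10.
   The same operation is used when writing and when reading.  */
static uint64_t
demo_stream_xor (uint64_t value, uint64_t default_value)
{
  return default_value > 10 ? value ^ default_value : value;
}

/* Return what would be streamed for VALUE, checking that repeating the
   xor on the reader side restores the original.  */
static uint64_t
demo_round_trip (uint64_t value, uint64_t default_value)
{
  uint64_t streamed = demo_stream_xor (value, default_value);
  assert (demo_stream_xor (streamed, default_value) == value);
  return streamed;
}
```

A parameter left at a default of 1000 is then streamed as 0, which takes
one byte instead of two under the assumed encoding.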

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

* optc-save-gen.awk: Initialize var_opt_init.  In
cl_optimization_stream_out for params with default values larger than
10, xor the default value with the actual parameter value.  In
cl_optimization_stream_in repeat the above xor.

configury: --enable-link-serialization support

When performing LTO bootstraps, especially when using tmpfs for /tmp,
one can bring a machine to a halt when using higher levels of parallelism
and a large number of FEs, because there are too many concurrent LTO
link commands running at the same time and each one of them puts most of the
middle-end/backend objects into /tmp.

We have --enable-link-mutex configure option, but --enable-link-mutex has
a big problem that it decreases number of available jobs by the number of
link commands waiting for the lock, so e.g. when doing make -j32 build with
11 different big programs linked with $(LLINKER) we end up with just 22
effective jobs, and with e.g. make -j8 with those 11 different big programs
we actually most likely serialize everything during linking onto a single job.

The following patch implements a new configure option,
--enable-link-serialization, which implements a different serialization
scheme.  As it doesn't use the mutex, just modifying the old option to be
implemented differently would be strange, hence the new option.  We can
deprecate and later remove the old option.  The new option doesn't use any
shell mutexes, but uses make dependencies.

The option is implemented inside of gcc/ configure and Makefiles,
which means that even inside of gcc/ make all (as well as e.g. make lto-dump)
will serialize and build all previous large binaries when configured this
way.
One can always make -j32 cc1 DO_LINK_SERIALIZATION=
to avoid that.
Furthermore, I've implemented the idea I wrote about, so that
--enable-link-serialization
is the same as
--enable-link-serialization=1
and means the large link commands are serialized; one can use (the default)
--disable-link-serialization
which will cause all links to be parallelizable, but one can also
--enable-link-serialization=3
etc. which says that at most 3 of the large link commands can run
concurrently.
And finally I've implemented (only if the serialization is enabled) simple
progress bars for the linking.
With --enable-link-serialization and e.g. the 5 large links I have in my
current tree (cc1, cc1plus, f951, lto1 and lto-dump), before the linking it
prints
Linking |==--      | 20%
and after it
Linking |====      | 40%
(each pair of == characters stands for an already finished link, and the --
characters stand for the link being started).
With --enable-link-serialization=3 it will change the way the start is
printed, one will get:
Linking |--        | 0%
at the start of cc1 link,
Linking |>>--      | 0%
at the start of the second large link and
Linking |>>>>--    | 0%
at the start of the third large link, where the >> characters stand for
already pending links.  The printing at the end of link command is
the same as with the full serialization, i.e. for the above 3:
Linking |==        | 20%
Linking |====      | 40%
Linking |======    | 60%
but one could actually get them in any order depending on which of those 3
finishes first - to get it 100% accurate I'd need to add some directory with
files representing finished links or similar, doesn't seem worth it.
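
The bar layout described above can be reproduced with a few lines of C.
This is only an illustration of the output format, assuming two characters
per large link as in the 5-link example; it is not how LINK_PROGRESS is
actually implemented in the Makefiles, and demo_link_progress is a made-up
name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_LINKS 32

/* Format a progress line for TOTAL large links: FINISHED links are
   drawn as "==", PENDING (already started) links as ">>", and the link
   being started as "--" when STARTING is nonzero.  Returns a pointer
   to a static buffer.  */
static const char *
demo_link_progress (int total, int finished, int pending, int starting)
{
  static char line[2 * DEMO_MAX_LINKS + 32];
  char bar[2 * DEMO_MAX_LINKS + 1];
  int pos = 0;

  starting = starting != 0;
  assert (total > 0 && total <= DEMO_MAX_LINKS);
  assert (finished >= 0 && pending >= 0
	  && finished + pending + starting <= total);

  for (int i = 0; i < finished; i++, pos += 2)
    memcpy (bar + pos, "==", 2);
  for (int i = 0; i < pending; i++, pos += 2)
    memcpy (bar + pos, ">>", 2);
  if (starting)
    {
      memcpy (bar + pos, "--", 2);
      pos += 2;
    }
  memset (bar + pos, ' ', 2 * total - pos);
  bar[2 * total] = '\0';

  snprintf (line, sizeof line, "Linking |%s| %d%%", bar,
	    finished * 100 / total);
  return line;
}
```

For example, demo_link_progress (5, 1, 0, 1) gives the 20% line printed
before the second link with full serialization, and
demo_link_progress (5, 0, 2, 1) gives the line printed at the start of the
third link with --enable-link-serialization=3.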

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

gcc/
* configure.ac: Add $lang.prev rules, INDEX.$lang and SERIAL_LIST and
SERIAL_COUNT variables to Make-hooks.
(--enable-link-serialization): New configure option.
* Makefile.in (DO_LINK_SERIALIZATION, LINK_PROGRESS): New variables.
* doc/install.texi (--enable-link-serialization): Document.
* configure: Regenerated.
gcc/c/
* Make-lang.in (c.serial): New goal.
(.PHONY): Add c.serial c.prev.
(cc1$(exeext)): Call LINK_PROGRESS.
gcc/cp/
* Make-lang.in (c++.serial): New goal.
(.PHONY): Add c++.serial c++.prev.
(cc1plus$(exeext)): Depend on c++.prev.  Call LINK_PROGRESS.
gcc/fortran/
* Make-lang.in (fortran.serial): New goal.
(.PHONY): Add fortran.serial fortran.prev.
(f951$(exeext)): Depend on fortran.prev.  Call LINK_PROGRESS.
gcc/lto/
* Make-lang.in (lto, lto1.serial, lto2.serial): New goals.
(.PHONY): Add lto lto1.serial lto1.prev lto2.serial lto2.prev.
(lto.all.cross, lto.start.encap): Remove dependencies.
($(LTO_EXE)): Depend on lto1.prev.  Call LINK_PROGRESS.
($(LTO_DUMP_EXE)): Depend on lto2.prev.  Call LINK_PROGRESS.
gcc/objc/
* Make-lang.in (objc.serial): New goal.
(.PHONY): Add objc.serial objc.prev.
(cc1obj$(exeext)): Depend on objc.prev.  Call LINK_PROGRESS.
gcc/objcp/
* Make-lang.in (obj-c++.serial): New goal.
(.PHONY): Add obj-c++.serial obj-c++.prev.
(cc1objplus$(exeext)): Depend on obj-c++.prev.  Call LINK_PROGRESS.
gcc/ada/
* gcc-interface/Make-lang.in (ada.serial): New goal.
(.PHONY): Add ada.serial ada.prev.
(gnat1$(exeext)): Depend on ada.prev.  Call LINK_PROGRESS.
gcc/brig/
* Make-lang.in (brig.serial): New goal.
(.PHONY): Add brig.serial brig.prev.
(brig1$(exeext)): Depend on brig.prev.  Call LINK_PROGRESS.
gcc/go/
* Make-lang.in (go.serial): New goal.
(.PHONY): Add go.serial go.prev.
(go1$(exeext)): Depend on go.prev.  Call LINK_PROGRESS.
gcc/jit/
* Make-lang.in (jit.serial): New goal.
(.PHONY): Add jit.serial jit.prev.
($(LIBGCCJIT_FILENAME)): Depend on jit.prev.  Call LINK_PROGRESS.
gcc/d/
* Make-lang.in (d.serial): New goal.
(.PHONY): Add d.serial d.prev.
(d21$(exeext)): Depend on d.prev.  Call LINK_PROGRESS.

testsuite: Adjust bb-slp-pr68892.c for AArch64

AArch64 passes the "not profitable" test because it treats vec_construct
as having a high-enough cost.  This means that we can try other vector
modes, which in turn causes "BB vectorization with gaps at the end of
a load is not supported" to be printed more than once.  The number of
times that we print the message doesn't seem important, so the patch
converts it to a plain scan-tree-dump.

gcc/testsuite/
* gcc.dg/vect/bb-slp-pr68892.c: Don't XFAIL the profitability
test for aarch64*-*-*.  Allow the "BB vectorization with gaps"
message to be printed more than once.

testsuite: Adjust gcc.dg/vect/slp-21.c for Arm targets

On arm* and aarch64* targets, we can vectorise the second of the main
loops using SLP, not just the third. As the comments say, whether this
is supported depends on a very specific permutation, so it seemed better
to use direct target selectors.

gcc/testsuite/
* gcc.dg/vect/slp-21.c: Expect 4 SLP instances to be vectorized
on arm* and aarch64* targets.

testsuite: Add vect_perm3_int guards

SLP vectorisation of gcc.dg/vect/fast-math-vect-call-1.c involves
a group of 3 floats, which requires the same permutation as
vect_perm3_int.

The load/store_lanes XFAILs in gcc.dg/vect/slp-perm-6.c implicitly
assumed vect_perm3_int, which is true for Advanced SIMD but not for
VLA SVE. Whether it's true for fixed-length SVE depends on the
vector length.

The xfail selector applies on top of the target selector, so it's
not necessary to make the xfail selector a strict subset of the
target selector.

gcc/testsuite/
* gcc.dg/vect/fast-math-vect-call-1.c: Only expect SLP to be used
on vect_perm3_int targets.
* gcc.dg/vect/slp-perm-6.c: Likewise. Only XFAIL the LOAD/STORE_LANES
tests on vect_perm3_int targets.

testsuite: Add a vect_partial_vectors_usage_2 guard

We don't need an epilogue loop if the main loop can operate on
partial vectors, so this patch disables an associated test.
The alternative would be to force partial-vectors-usage=1
on the command line.

gcc/testsuite/
* gcc.dg/vect/vect-epilogues.c: XFAIL test for epilogue loop
vectorization if vect_partial_vectors_usage_2.

testsuite: Fix vect/vect-sdiv-pow2-1.c

We're now able to vectorise the set-up loop:

      int p = power2 (fns[i].po2);
      for (int j = 0; j < N; j++)
        a[j] = ((p << 4) * j) / (N - 1) - (p << 5);

This patch adds an asm to stop the loop being vectorised.
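
A sketch of the technique, assuming an empty asm with a "memory" clobber
(a GNU C extension) is what gets added; the exact asm used in the test is
not quoted here.  DEMO_N, demo_a and demo_setup are hypothetical, and the
parameter p stands in for power2 (fns[i].po2).

```c
#include <assert.h>

#define DEMO_N 64
static int demo_a[DEMO_N];

/* Fill demo_a the way the set-up loop above does.  The compiler cannot
   see through the empty asm, which is a common way to keep a loop from
   being vectorised.  */
static void
demo_setup (int p)
{
  assert (p > 0 && p < (1 << 16));	/* Keep the arithmetic in range.  */
  for (int j = 0; j < DEMO_N; j++)
    {
      demo_a[j] = ((p << 4) * j) / (DEMO_N - 1) - (p << 5);
      __asm__ volatile ("" ::: "memory");
    }
}
```

The asm emits no instructions, so the values stored are unchanged; only
the optimizer's view of the loop body differs.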

gcc/testsuite/
* gcc.dg/vect/vect-sdiv-pow2-1.c (main): Add an asm to the
set-up loop.

aix: Fixinclude

This fixes an ODR violation in the AIX headers that is detected by C++
modules. While unnamed structs with typedef names for linkage
purposes are accepted, this case is an anonymous struct without such a
typedef name -- the typedef is attached to the pointer-to-struct type.
Fixed by naming the struct.

fixincludes/
* inclhack.def (aix_physaddr_t): New.
* fixincl.x: Regenerated.

preprocessor: C++ module-directives

C++20 modules introduce a new kind of preprocessor directive -- a
module directive.  These are directives but without the leading '#'.
We have to detect them by sniffing the start of a logical line.  When
detected we replace the initial identifiers with unspellable tokens
and pass them through to the language parser the same way deferred
pragmas are.  There's a PRAGMA_EOL at the logical end of line too.
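
To give a feel for the sniffing step, here is a heavily simplified sketch
in C that works on a plain string.  It only checks for an optional "export"
followed by "module" or "import" at the start of a line; the real detection
operates on preprocessor tokens and also looks at what follows the
keyword, which this sketch does not attempt.  All demo_* names are
hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* If P starts with keyword KW followed by a non-identifier character,
   return a pointer just past KW, else NULL.  */
static const char *
demo_skip_keyword (const char *p, const char *kw)
{
  size_t n = strlen (kw);
  if (strncmp (p, kw, n) != 0)
    return NULL;
  unsigned char next = (unsigned char) p[n];
  if (isalnum (next) || next == '_')
    return NULL;
  return p + n;
}

/* Skip horizontal whitespace.  */
static const char *
demo_skip_space (const char *p)
{
  while (*p == ' ' || *p == '\t')
    p++;
  return p;
}

/* Simplified sniffer: [export] (module | import) at line start.  */
static bool
demo_maybe_module_directive (const char *line)
{
  assert (line != NULL);
  const char *p = demo_skip_space (line);
  const char *q = demo_skip_keyword (p, "export");
  if (q)
    p = demo_skip_space (q);
  return demo_skip_keyword (p, "module") != NULL
	 || demo_skip_keyword (p, "import") != NULL;
}
```

Note that this sketch would also accept a line such as "import = 3;",
which is one reason the real code has to inspect the following token.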

One additional complication is that we have to do header-name lexing
after the initial tokens, and that requires changes in the macro-aware
piece of the preprocessor.  The above sniffer sets a counter in the
lexer state, and that triggers at the appropriate point.  We then do
the same header-name lexing that occurs on a #include directive or
has_include pseudo-macro.  Except that the header name ends up in the
token stream.

A couple of token emitters need to deal with the new token possibility.

gcc/c-family/
* c-lex.c (c_lex_with_flags): CPP_HEADER_NAMEs can now be seen.
libcpp/
* include/cpplib.h (struct cpp_options): Add module_directives
option.
(NODE_MODULE): New node flag.
(struct cpp_hashnode): Make rid-code a bitfield, increase bits in
flags and swap with type field.
* init.c (post_options): Create module-directive identifier nodes.
* internal.h (struct lexer_state): Add directive_file_token &
n_modules fields.  Add module node enumerator.
* lex.c (cpp_maybe_module_directive): New.
(_cpp_lex_token): Call it.
(cpp_output_token): Add '"' around CPP_HEADER_NAME token.
(do_peek_ident, do_peek_module): New.
(cpp_directives_only): Detect module-directive lines.
* macro.c (cpp_get_token_1): Deal with directive_file_token
triggering.

preprocessor: Add support for header unit translation

libcpp/
* files.c (struct _cpp_file): Add header_unit field.
(_cpp_stack_file): Add header unit support.
(cpp_find_header_unit): New.
* include/cpplib.h (cpp_find_header_unit): Declare.

preprocessor: Update mkdeps for modules

This is slightly different to the original patch I posted. This adds
separate module target and dependency functions (rather than a single
bi-modal function).

libcpp/
* include/cpplib.h (struct cpp_options): Add modules to
dep-options.
* include/mkdeps.h (deps_add_module_target): Declare.
(deps_add_module_dep): Declare.
* mkdeps.c (class mkdeps): Add modules, module_name, cmi_name,
is_header_unit fields. Adjust cdtors.
(deps_add_module_target, deps_add_module_dep): New.
(make_write): Write module dependencies, if enabled.

libstdc++: Fix ranges::join_view::_Iterator::operator-> [LWG 3500]

This applies the proposed resolution of LWG 3500, which corrects the
return type and constraints of this member function to use the right
iterator type. Additionally, a nearby local variable is uglified.

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view::_Iterator::_M_satisfy): Uglify
local variable inner.
(join_view::_Iterator::operator->): Use _Inner_iter instead of
_Outer_iter in the function signature as per LWG 3500.
* testsuite/std/ranges/adaptors/join.cc (test08): Test it.

[PR97870] LRA: don't remove asm goto, just nullify it.

gcc/

2020-11-18 Vladimir Makarov <vmakarov@redhat.com>

PR target/97870
* lra-constraints.c (curr_insn_transform): Do not delete asm goto
with wrong constraints. Nullify it saving CFG.

testsuite/libgomp.c/usleep.h: Use sleep-loop also for GCN

As typically configured, newlib's libc.a does not build 'posix' and,
hence, usleep is not available. Thus, use the same fallback as for nvptx.
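
A sleep-loop fallback of this kind might look like the C below.  This is a
sketch, not the contents of usleep.h: demo_fallback_usleep and the
iteration factor of 2000 are assumptions, and the loop makes no attempt to
wait for an accurate amount of time.

```c
#include <assert.h>

/* Hypothetical busy-wait replacement for usleep.  It only provides
   some delay proportional to USEC; the empty asm (a GNU C extension)
   keeps the compiler from deleting the loop.  */
static int
demo_fallback_usleep (unsigned int usec)
{
  assert (usec <= 1000000u);	/* Keep the iteration count in range.  */
  unsigned long n = (unsigned long) usec * 2000ul;
  for (unsigned long i = 0; i < n; i++)
    __asm__ volatile ("" ::: "memory");
  return 0;
}
```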

libgomp/
* testsuite/libgomp.c/usleep.h (fallback_usleep): Renamed from
nvptx_usleep; use also for device={arch(gcn)}.