[Arm] Implement CDE predicated intrinsics for MVE registers

These intrinsics are the predicated versions of the intrinsics introduced
in https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542725.html.

These are not yet public on developer.arm.com but we have reached
internal consensus on them.

The approach follows the same method as for the CDE intrinsics for MVE
registers, most notably using the same arm_resolve_overloaded_builtin
function with minor modifications.

The resolver hook has been moved from arm-builtins.c to arm-c.c so it
can access the c-common function build_function_call_vec.  This function
is needed to perform the same checks on arguments as a normal C or C++
function would perform.
It is fine to put this resolver in arm-c.c since its only use is for
the ACLE functions, and these are only available in C/C++.
So that the resolver function has access to information it needs from
the builtins, we put two query functions into arm-builtins.c and use
them from arm-c.c.

We rely on the order that the builtins are defined in
gcc/config/arm/arm_cde_builtins.def, knowing that the predicated
versions come after the non-predicated versions.

The machine description patterns for these builtins are simpler than
those for the non-predicated versions, since the accumulator versions
*and* non-accumulator versions both need an input vector now.
The input vector is needed for the non-accumulator version to describe
the original values for those lanes that are not updated during the
merge operation.
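
The merge described above can be modelled in portable C.  This is only a
sketch of the lane semantics, not the machine description pattern: the
function name is hypothetical, and it assumes a 16-lane byte vector with
one predicate bit per byte lane.

```c
#include <stdint.h>

#define MODEL_LANES 16  /* assumed: 16 byte lanes per vector */

/* Hypothetical model of a predicated merge.  Lane i takes the newly
   computed value when bit i of PRED is set; otherwise it keeps the value
   from INACTIVE, which is why an input vector is always required.  */
static void
model_predicated_merge (uint8_t result[MODEL_LANES],
                        const uint8_t inactive[MODEL_LANES],
                        const uint8_t computed[MODEL_LANES],
                        uint16_t pred)
{
  for (int i = 0; i < MODEL_LANES; i++)
    result[i] = ((pred >> i) & 1) ? computed[i] : inactive[i];
}
```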

We additionally need to introduce qualifiers for these new builtins,
which follow the same pattern as the non-predicated versions but with an
extra argument to describe the predicate.

Error message changes:
- We directly mention the builtin argument when complaining that an
  argument is not in the correct range.
  This more closely matches the C error messages.
- We ensure the resolver complains about *all* invalid arguments to a
  function instead of just the first one.
- The resolver error messages index arguments from 1 instead of 0 to
  match the arguments coming from the C/C++ frontend.
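
A minimal sketch of the last two points, in plain C.  The struct, the
function name and the message wording are illustrative assumptions and
are not taken from the resolver itself; the sketch only shows "report
every bad argument" and "number arguments from 1".

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical description of one immediate argument and its range.  */
struct model_arg
{
  long value;
  long min;
  long max;
};

/* Check every argument rather than stopping at the first failure, and
   report each bad one using a 1-based index.  Returns the number of
   invalid arguments found.  */
static int
report_invalid_args (const struct model_arg *args, int nargs, FILE *out)
{
  int errors = 0;

  assert (args != NULL || nargs == 0);
  for (int i = 0; i < nargs; i++)
    if (args[i].value < args[i].min || args[i].value > args[i].max)
      {
        fprintf (out, "argument %d must be in range [%ld-%ld]\n",
                 i + 1, args[i].min, args[i].max);
        errors++;
      }
  return errors;
}
```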

In order to allow the user to give an argument for the merging predicate
when they don't care what data is stored in the 'false' lanes, we also
move the __arm_vuninitializedq* intrinsics from arm_mve.h to
arm_mve_types.h which is shared with arm_cde.h.

We only move the fully type-specified `__arm_vuninitializedq*`
intrinsics and not the polymorphic versions, since moving the
polymorphic versions requires moving the _Generic framework as well as
just the intrinsics we're interested in.  This matches the approach taken
for the `__arm_vreinterpret*` functions in this include file.

This patch also contains a slight change in spacing of an existing
assembly instruction to be emitted.
This is just to help writing tests -- vmsr usually has a tab and a space
between the mnemonic and the first argument, but in one case it just has
a tab -- making all the same helps make test regexps simpler.

Testing Done:
    Bootstrap and full regtest on arm-none-linux-gnueabihf
    Full regtest on arm-none-eabi

    All testing done with a local fix for the bugzilla PR below.
    That bug currently causes multiple ICEs on the tests added in
    this patch.
    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94341

gcc/ChangeLog:

2020-04-02  Matthew Malcomson  <matthew.malcomson@arm.com>

* config/arm/arm-builtins.c (CX_UNARY_UNONE_QUALIFIERS): New.
(CX_BINARY_UNONE_QUALIFIERS): New.
(CX_TERNARY_UNONE_QUALIFIERS): New.
(arm_resolve_overloaded_builtin): Move to arm-c.c.
(arm_expand_builtin_args): Update error message.
(enum resolver_ident): New.
(arm_describe_resolver): New.
(arm_cde_end_args): New.
* config/arm/arm-builtins.h: New file.
* config/arm/arm-c.c (arm_resolve_overloaded_builtin): New.
(arm_resolve_cde_builtin): Moved from arm-builtins.c.
* config/arm/arm_cde.h (__arm_vcx1q_m, __arm_vcx1qa_m,
__arm_vcx2q_m, __arm_vcx2qa_m, __arm_vcx3q_m, __arm_vcx3qa_m):
New.
* config/arm/arm_cde_builtins.def (vcx1q_p_, vcx1qa_p_,
vcx2q_p_, vcx2qa_p_, vcx3q_p_, vcx3qa_p_): New builtin defs.
* config/arm/iterators.md (CDE_VCX): New int iterator.
(a): New int attribute.
* config/arm/mve.md (arm_vcx1q<a>_p_v16qi, arm_vcx2q<a>_p_v16qi,
arm_vcx3q<a>_p_v16qi): New patterns.
* config/arm/vfp.md (thumb2_movhi_fp16): Extra space in assembly.

gcc/testsuite/ChangeLog:

2020-04-02  Matthew Malcomson  <matthew.malcomson@arm.com>

* gcc.target/arm/acle/cde-errors.c: Add predicated forms.
* gcc.target/arm/acle/cde-mve-error-1.c: Add predicated forms.
* gcc.target/arm/acle/cde-mve-error-2.c: Add predicated forms.
* gcc.target/arm/acle/cde-mve-error-3.c: Add predicated forms.
* gcc.target/arm/acle/cde-mve-full-assembly.c: Add predicated
forms.
* gcc.target/arm/acle/cde-mve-tests.c: Add predicated forms.
* gcc.target/arm/acle/cde_v_1_err.c (test_imm_range): Update for
error message format change.
* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c:
Update scan-assembler regexp.

[Arm] Implement CDE intrinsics for MVE registers.

Implement CDE intrinsics on MVE registers.

Other than the basics required for adding intrinsics this patch consists
of three changes.

** We separate out the MVE types and casts from the arm_mve.h header.

This is so that the types can be used in arm_cde.h without the need to include
the entire arm_mve.h header.
The only type that arm_cde.h needs is `uint8x16_t`, so this separation could be
avoided by using a `typedef` in this file.
Since the introduced intrinsics are all defined to act on the full range of MVE
types, declaring all such types seems intuitive since it will provide their
declaration to the user too.

This arm_mve_types.h header not only includes the MVE types, but also
the conversion intrinsics between them.
Some of the conversion intrinsics are needed for arm_cde.h, but most are
not.  We include all conversion intrinsics to keep the definition of
such conversion functions all in one place, on the understanding that
extra conversion functions being defined when including `arm_cde.h` is
not a problem.

** We define the TARGET_RESOLVE_OVERLOADED_BUILTIN hook for the Arm backend.

This is needed to implement the polymorphism for the required intrinsics.
The intrinsics have no specialised version, and the resulting assembly
instruction for all different types should be exactly the same.
Due to this we have implemented these intrinsics via one builtin on one type.
All other calls to the intrinsic with different types are implicitly cast to
the one type that is defined, and hence are all expanded to the same RTL
pattern that is only defined for one machine mode.
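
The "one builtin on one type" idea can be sketched in portable C as
below.  The struct types stand in for the MVE vector types, and the
operation performed by the single implementation is made up for
illustration; none of these names exist in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint8_t b[16]; } model_u8x16;   /* stand-in for uint8x16_t */
typedef struct { uint32_t w[4]; } model_u32x4;   /* stand-in for uint32x4_t */

static_assert (sizeof (model_u8x16) == sizeof (model_u32x4),
               "both views must cover the same 128 bits");

/* The single implementation, defined for the byte vector only.  The
   operation (inverting every byte) is a placeholder.  */
static model_u8x16
model_builtin_u8 (model_u8x16 v)
{
  for (int i = 0; i < 16; i++)
    v.b[i] ^= 0xff;
  return v;
}

/* Any other element type reinterprets its bits as the byte vector, calls
   the one implementation, and reinterprets the result back.  */
static model_u32x4
model_intrinsic_u32 (model_u32x4 v)
{
  model_u8x16 tmp;

  memcpy (&tmp, &v, sizeof tmp);
  tmp = model_builtin_u8 (tmp);
  memcpy (&v, &tmp, sizeof v);
  return v;
}
```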

** We separate the initialisation of the CDE intrinsics from others.

This allows us to ensure that the CDE intrinsics acting on MVE registers
are only created when both CDE and MVE are available.
Only initialising these builtins when both features are available is
especially important since they require a type that is only initialised
when the target supports hard float.  Hence trying to initialise these
builtins on a soft float target would cause an ICE.

Testing done:
  Full bootstrap and regtest on arm-none-linux-gnueabihf
  Regression test on arm-none-eabi

Ok for trunk?

gcc/ChangeLog:

2020-03-10  Matthew Malcomson  <matthew.malcomson@arm.com>

* config.gcc (arm_mve_types.h): New extra_header for arm.
* config/arm/arm-builtins.c (arm_resolve_overloaded_builtin): New.
(arm_init_cde_builtins): New.
(arm_init_acle_builtins): Remove initialisation of CDE builtins.
(arm_init_builtins): Call arm_init_cde_builtins when target
supports CDE.
* config/arm/arm-c.c (arm_resolve_overloaded_builtin): New declaration.
(arm_register_target_pragmas): Initialise resolve_overloaded_builtin
hook to the implementation for the arm backend.
* config/arm/arm.h (ARM_MVE_CDE_CONST_1): New.
(ARM_MVE_CDE_CONST_2): New.
(ARM_MVE_CDE_CONST_3): New.
* config/arm/arm_cde.h (__arm_vcx1q_u8): New.
(__arm_vcx1qa): New.
(__arm_vcx2q): New.
(__arm_vcx2q_u8): New.
(__arm_vcx2qa): New.
(__arm_vcx3q): New.
(__arm_vcx3q_u8): New.
(__arm_vcx3qa): New.
* config/arm/arm_cde_builtins.def (vcx1q, vcx1qa, vcx2q, vcx2qa, vcx3q,
vcx3qa): New builtins defined.
* config/arm/arm_mve.h: Move typedefs and conversion intrinsics
to arm_mve_types.h header.
* config/arm/arm_mve_types.h: New file.
* config/arm/mve.md (arm_vcx1qv16qi, arm_vcx1qav16qi, arm_vcx2qv16qi,
arm_vcx2qav16qi, arm_vcx3qv16qi, arm_vcx3qav16qi): New patterns.
* config/arm/predicates.md (const_int_mve_cde1_operand,
const_int_mve_cde2_operand, const_int_mve_cde3_operand): New.

gcc/testsuite/ChangeLog:

2020-03-23  Matthew Malcomson  <matthew.malcomson@arm.com>
    Dennis Zhang  <dennis.zhang@arm.com>

* gcc.target/arm/acle/cde-mve-error-1.c: New test.
* gcc.target/arm/acle/cde-mve-error-2.c: New test.
* gcc.target/arm/acle/cde-mve-error-3.c: New test.
* gcc.target/arm/acle/cde-mve-full-assembly.c: New test.
* gcc.target/arm/acle/cde-mve-tests.c: New test.
* lib/target-supports.exp (arm_v8_1m_main_cde_mve_fp): New check
effective.
(arm_v8_1m_main_cde_mve, arm_v8m_main_cde_fp): Use -mfpu=auto
so we only check configurations that make sense.

[Arm] Implement scalar Custom Datapath Extension intrinsics

This patch introduces the scalar CDE (Custom Datapath Extension)
intrinsics for the arm backend.

There is nothing beyond the standard in this patch.  We simply build upon what
has been done by Dennis for the vector intrinsics.

We do add `+cdecp6` to the default arguments in `target-supports.exp`, which
allows tests to use coprocessor 6.  This patch uses an alternate coprocessor
so that assembler scans can simply look for a use of coprocessor 6.

We also ensure that any DImode registers are put in an even-odd register pair
when compiling for a target with CDE -- this avoids faulty code generation for
-Os when producing the cx*d instructions.
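
As a rough model of the register-pair restriction (not the actual
arm_hard_regno_mode_ok code, which has many other conditions), the check
amounts to rejecting odd starting registers for DImode when CDE is
enabled.  All names below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the machine modes of interest.  */
enum model_mode { MODEL_SImode, MODEL_DImode };

/* With CDE enabled, a DImode value must start in an even-numbered core
   register so that it occupies an even-odd pair.  Everything else is
   accepted by this simplified model.  */
static bool
model_regno_mode_ok (unsigned regno, enum model_mode mode, bool target_cde)
{
  if (target_cde && mode == MODEL_DImode)
    return (regno % 2) == 0;
  return true;
}
```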

Testing done:
Bootstrapped and regtested for arm-none-linux-gnueabihf.

gcc/ChangeLog:

2020-03-03  Matthew Malcomson  <matthew.malcomson@arm.com>

* config/arm/arm.c (arm_hard_regno_mode_ok): DImode registers forced
into even-odd register pairs for TARGET_CDE.
* config/arm/arm.h (ARM_CCDE_CONST_1): New.
(ARM_CCDE_CONST_2): New.
(ARM_CCDE_CONST_3): New.
* config/arm/arm.md (arm_cx1si, arm_cx1di, arm_cx1asi, arm_cx1adi,
arm_cx2si, arm_cx2di, arm_cx2asi, arm_cx2adi, arm_cx3si, arm_cx3di,
arm_cx3asi, arm_cx3adi): New patterns.
* config/arm/arm_cde.h (__arm_cx1, __arm_cx1a, __arm_cx2, __arm_cx2a,
__arm_cx3, __arm_cx3a, __arm_cx1d, __arm_cx1da, __arm_cx2d, __arm_cx2da,
__arm_cx3d, __arm_cx3da): New ACLE function macros.
* config/arm/arm_cde_builtins.def (cx1, cx1a, cx2, cx2a, cx3, cx3a):
Define intrinsics.
* config/arm/iterators.md (cde_suffix, cde_dest): New mode attributes.
* config/arm/predicates.md (const_int_ccde1_operand,
const_int_ccde2_operand, const_int_ccde3_operand): New.
* config/arm/unspecs.md (UNSPEC_CDE, UNSPEC_CDEA): New.

gcc/testsuite/ChangeLog:

2020-03-03  Matthew Malcomson  <matthew.malcomson@arm.com>

* gcc.target/arm/acle/cde-errors.c: New test.
* gcc.target/arm/acle/cde.c: New test.
* lib/target-supports.exp: Update CDE flags to enable coprocessor 6.

arm: CDE intrinsics using FPU/MVE S/D registers

This patch enables the ACLE intrinsics calling VCX1<A>,
VCX2<A>, and VCX3<A> instructions which work with FPU/MVE
32-bit/64-bit registers. This patch also enables DImode for VFP
to support CDE with FPU.

gcc/ChangeLog:
2020-04-08  Dennis Zhang  <dennis.zhang@arm.com>
    Matthew Malcomson <matthew.malcomson@arm.com>

* config/arm/arm-builtins.c (CX_IMM_QUALIFIERS): New macro.
(CX_UNARY_QUALIFIERS, CX_BINARY_QUALIFIERS): Likewise.
(CX_TERNARY_QUALIFIERS): Likewise.
(ARM_BUILTIN_CDE_PATTERN_START): Likewise.
(ARM_BUILTIN_CDE_PATTERN_END): Likewise.
(arm_init_acle_builtins): Initialize CDE builtins.
(arm_expand_acle_builtin): Check CDE constant operands.
* config/arm/arm.h (ARM_CDE_CONST_COPROC): New macro to set the range
of CDE constant operand.
* config/arm/arm.c (arm_hard_regno_mode_ok): Support DImode for
TARGET_VFP_BASE.
(ARM_VCDE_CONST_1, ARM_VCDE_CONST_2, ARM_VCDE_CONST_3): Likewise.
* config/arm/arm_cde.h (__arm_vcx1_u32): New macro of ACLE interface.
(__arm_vcx1a_u32, __arm_vcx2_u32, __arm_vcx2a_u32): Likewise.
(__arm_vcx3_u32, __arm_vcx3a_u32, __arm_vcx1d_u64): Likewise.
(__arm_vcx1da_u64, __arm_vcx2d_u64, __arm_vcx2da_u64): Likewise.
(__arm_vcx3d_u64, __arm_vcx3da_u64): Likewise.
* config/arm/arm_cde_builtins.def: New file.
* config/arm/iterators.md (V_reg): New attribute of SI.
* config/arm/predicates.md (const_int_coproc_operand): New.
(const_int_vcde1_operand, const_int_vcde2_operand): New.
(const_int_vcde3_operand): New.
* config/arm/unspecs.md (UNSPEC_VCDE, UNSPEC_VCDEA): New.
* config/arm/vfp.md (arm_vcx1<mode>): New entry.
(arm_vcx1a<mode>, arm_vcx2<mode>, arm_vcx2a<mode>): Likewise.
(arm_vcx3<mode>, arm_vcx3a<mode>): Likewise.

gcc/testsuite/ChangeLog:
2020-04-08  Dennis Zhang  <dennis.zhang@arm.com>

* gcc.target/arm/acle/cde_v_1.c: New test.
* gcc.target/arm/acle/cde_v_1_err.c: New test.
* gcc.target/arm/acle/cde_v_1_mve.c: New test.

c++: Function type and parameter type disagreements [PR92010]

This resolves parts of Core issues 1001/1322 by rebuilding the function type
of an instantiated function template in terms of its formal parameter types
whenever the original function type and formal parameter types disagree about
the type of a parameter after substitution.

gcc/cp/ChangeLog:

Core issues 1001 and 1322
PR c++/92010
* pt.c (rebuild_function_or_method_type): Split function out from ...
(tsubst_function_type): ... here.
(maybe_rebuild_function_decl_type): New function.
(tsubst_function_decl): Use it.

gcc/testsuite/ChangeLog:

Core issues 1001 and 1322
PR c++/92010
* g++.dg/cpp2a/lambda-uneval11.c: New test.
* g++.dg/template/array33.C: New test.
* g++.dg/template/array34.C: New test.
* g++.dg/template/defarg22.C: New test.

arm: CLI for Custom Datapath Extension (CDE)

This patch is part of a series that adds support for the Arm Custom
Datapath Extension. It defines the options cdecp0-cdecp7 for CLI to
enable the CDE on corresponding coprocessor 0-7.
It also adds new target-supports checks for the CDE testsuite.
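
A sketch of how user code might test for an enabled coprocessor.  It
assumes that __ARM_FEATURE_CDE_COPROC is a bitmap with bit N set when
coprocessor N is enabled through +cdecpN; the helper and the fallback
macro are hypothetical and only there so the example builds on any target.

```c
#include <assert.h>
#include <stdbool.h>

/* Fall back to an empty mask when the feature macro is not defined.  */
#ifdef __ARM_FEATURE_CDE_COPROC
# define MODEL_CDE_COPROC_MASK __ARM_FEATURE_CDE_COPROC
#else
# define MODEL_CDE_COPROC_MASK 0u
#endif

/* Return true if bit COPROC is set in MASK.  Only coprocessors 0-7 can
   be selected by the cdecp0-cdecp7 options.  */
static bool
cde_coproc_enabled (unsigned mask, unsigned coproc)
{
  assert (coproc <= 7);
  return ((mask >> coproc) & 1u) != 0;
}
```

For example, cde_coproc_enabled (MODEL_CDE_COPROC_MASK, 6) would be the
check corresponding to +cdecp6 under the assumption above.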

gcc/ChangeLog:
2020-04-08 Dennis Zhang <dennis.zhang@arm.com>

* config.gcc: Add arm_cde.h.
* config/arm/arm-c.c (arm_cpu_builtins): Define or undefine
__ARM_FEATURE_CDE and __ARM_FEATURE_CDE_COPROC.
* config/arm/arm-cpus.in (cdecp0, cdecp1, ..., cdecp7): New options.
* config/arm/arm.c (arm_option_reconfigure_globals): Configure
arm_arch_cde and arm_arch_cde_coproc to store the feature bits.
* config/arm/arm.h (TARGET_CDE): New macro.
* config/arm/arm_cde.h: New file.
* doc/invoke.texi: Document CDE options +cdecp[0-7].
* doc/sourcebuild.texi (arm_v8m_main_cde_ok): Document new target
supports option.
(arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.

gcc/testsuite/ChangeLog:
2020-04-08 Dennis Zhang <dennis.zhang@arm.com>

* gcc.target/arm/pragma_cde.c: New test.
* lib/target-supports.exp (arm_v8m_main_cde_ok): New target support
option.
(arm_v8m_main_cde_fp, arm_v8_1m_main_cde_mve): Likewise.

c++: Further fix for -fsanitize=vptr [PR94325]

For -fsanitize=vptr, we insert a NULL store into the vptr instead of just
adding a CLOBBER of this. build_clobber_this makes the CLOBBER conditional
on in_charge (implicit) parameter whenever CLASSTYPE_VBASECLASSES, but when
adding this conditionalization to the -fsanitize=vptr code in PR87095,
I wanted it to catch some more cases when the class has CLASSTYPE_VBASECLASSES,
but the vptr is still not shared with something else, otherwise the
sanitization would be less effective.
The following testcase shows that the chosen test that CLASSTYPE_PRIMARY_BINFO
is non-NULL and has BINFO_VIRTUAL_P set wasn't sufficient,
the D class has still sizeof(D) == sizeof(void*) and thus contains just
a single vptr, but while in B::~B() this results in the vptr not being
cleared, in C::~C() this condition isn't true, as CLASSTYPE_PRIMARY_BINFO
in that case is B and is not BINFO_VIRTUAL_P, so it clears the vptr, but the
D::~D() dtor after invoking C::~C() invokes A::~A() with an already cleared
vptr, which is then reported.
The following patch is just a shot in the dark, keep looking through
CLASSTYPE_PRIMARY_BINFO until we find BINFO_VIRTUAL_P, but it works on the
existing testcase as well as this new one.
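
The walk described above can be sketched as follows.  This is a
simplified model with hypothetical names, not the decl.c code: each class
records its primary base and whether that primary base is virtual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a class: PRIMARY plays the role of
   CLASSTYPE_PRIMARY_BINFO and PRIMARY_IS_VIRTUAL of BINFO_VIRTUAL_P.  */
struct model_class
{
  const struct model_class *primary;
  bool primary_is_virtual;
};

/* Follow the chain of primary bases until one is virtual or the chain
   ends.  */
static bool
primary_chain_has_virtual (const struct model_class *cls)
{
  assert (cls != NULL);
  while (cls->primary != NULL)
    {
      if (cls->primary_is_virtual)
        return true;
      cls = cls->primary;
    }
  return false;
}
```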

2020-04-08 Jakub Jelinek <jakub@redhat.com>

PR c++/94325
* decl.c (begin_destructor_body): For CLASSTYPE_VBASECLASSES class
dtors, if CLASSTYPE_PRIMARY_BINFO is non-NULL, but not BINFO_VIRTUAL_P,
look at CLASSTYPE_PRIMARY_BINFO of its BINFO_TYPE if it is not
BINFO_VIRTUAL_P, and so on.

* g++.dg/ubsan/vptr-15.C: New test.

c++: ICE with defaulted comparison operator [PR94478]

Here we ICE because early_check_defaulted_comparison passed a null
ctx to same_type_p. The attached test is ill-formed according to
[class.compare.default]/1, so fixed by detecting this case early.

PR c++/94478 - ICE with defaulted comparison operator
* method.c (early_check_defaulted_comparison): Give an error when the
context is null.

* g++.dg/cpp2a/spaceship-err4.C: New test.

update polytypes.c -flax-vector-conversions msg

Since commit 2f6d557ff82876432be76b1892c6c3783c0095f4 AKA SVN-r269586,
the inform() message suggesting the use of -flax-vector-conversions
has had quotes around the option name, but the testcase still expected
the message without the quotes. This patch adds to the expected
compiler output the quotes that are now issued.

for gcc/testsuite/ChangeLog

* gcc.target/arm/polytypes.c: Add quotes around
-flax-vector-conversions.

postreload: Fix autoinc handling in reload_cse_move2add [PR94516]

The following testcase shows two separate issues caused by the cselib
changes.
One is that through the cselib sp tracking improvements on
... r12 = rsp; rsp -= 8; push cst1; push cst2; push cst3; call
rsp += 32; rsp -= 8; push cst4; push cst5; push cst6; call
rsp += 32; rsp -= 8; push cst7; push cst8; push cst9; call
rsp += 32
reload_cse_simplify_set decides to optimize the rsp += 32 insns
into rsp = r12 because cselib figures that the r12 register holds the right
value.  From the pure cost perspective that seems like a win and on its own
at least for -Os that would be beneficial, except that there are those
rsp -= 8 stack adjustments after it, where rsp += 32; rsp -= 8; is optimized
into rsp += 24; by the csa pass, but rsp = r12; rsp -= 8 can't.  Dunno
what to do about this part, the PR has a hack in a comment.

Anyway, the following patch fixes the other part, which isn't a missed
optimization, but a wrong-code issue.  The problem is that the pushes of
constant are on x86 represented through PRE_MODIFY and while
move2add_note_store has some code to handle {PRE,POST}_{INC,DEC} without
REG_INC note, it doesn't handle {PRE,POST}_MODIFY (that would be enough
to fix this testcase).  But additionally it looks misplaced, because
move2add_note_store is only called on the rtxes that are stored into,
while RTX_AUTOINC can happen not just in those, but anywhere else in the
instruction (e.g. pop insn can have autoinc in the SET_SRC MEM).
REG_INC note seems to be required for any autoinc except for stack pointer
autoinc which doesn't have those notes, so this patch just handles
the sp autoinc after the REG_INC note handling loop.

2020-04-08  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/94516
* postreload.c: Include rtl-iter.h.
(reload_cse_move2add): Handle SP autoinc here by FOR_EACH_SUBRTX_VAR
looking for all MEMs with RTX_AUTOINC operand.
(move2add_note_store): Remove {PRE,POST}_{INC,DEC} handling.

* gcc.dg/torture/pr94516.c: New test.

HSA: omp-grid.c – access proper clause code

* omp-grid.c (grid_eliminate_combined_simd_part): Use
OMP_CLAUSE_CODE to access the omp clause code.

Undo accidental commit to omp-grid.c

The following change accidentally got committed in the previous
commit, r10-7614-g13e41d8b9d3d7598c72c38acc86a3d97046c8373,
among the intended changes. Hence:

Revert:
gcc/
* omp-grid.c (grid_eliminate_combined_simd_part): Use
OMP_CLAUSE_CODE to access the omp clause code.

[C/C++, OpenACC] Reject vars of different scope in acc declare (PR94120)

gcc/c/
PR middle-end/94120
* c-decl.c (c_check_in_current_scope): New function.
* c-tree.h (c_check_in_current_scope): Declare it.
* c-parser.c (c_parser_oacc_declare): Add check that variables
are declared in the same scope as the directive. Fix handling
of namespace vars.

gcc/cp/
PR middle-end/94120
* parser.c (cp_parser_oacc_declare): Add check that variables
are declared in the same scope as the directive.

gcc/testsuite/
PR middle-end/94120
* c-c++-common/goacc/declare-pr94120.c: New.
* g++.dg/declare-pr94120.C: New.

libgomp/testsuite/
PR middle-end/94120
* libgomp.oacc-c++/declare-pr94120.C: New.

libphobos: Always build with warning flags enabled

This moves WARN_DFLAGS from GDCFLAGS to AM_DFLAGS so it is always
included in the build and testsuite of libphobos. Currently, this
doesn't happen as GDCFLAGS is overridden by it being set at the
top-level.

libphobos/ChangeLog:

* Makefile.in: Regenerate.
* configure: Regenerate.
* configure.ac: Substitute WARN_DFLAGS independently of GDCFLAGS.
* libdruntime/Makefile.am: Add WARN_DFLAGS to AM_DFLAGS.
* libdruntime/Makefile.in: Regenerate.
* src/Makefile.am: Add WARN_DFLAGS to AM_DFLAGS.
* src/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.
* testsuite/testsuite_flags.in: Add WARN_DFLAGS to --gdcflags.

c++: requires-expression and tentative parse [PR94480]

The problem here was that cp_parser_requires_expression committing to a
tentative parse confused cp_parser_decltype_expr, which needs to still be
tentative. The only reason to commit here is to get syntax errors within
the requires-expression, which we can still do when the commit is firewalled
from the enclosing context.

gcc/cp/ChangeLog
2020-04-07 Jason Merrill <jason@redhat.com>

PR c++/94480
* parser.c (cp_parser_requires_expression): Use tentative_firewall.

libphobos: Merge upstream phobos fb4f6a713

Improves the versioning of IeeeFlags and FloatingPointControl code and
unit-tests, making it clearer which targets can and cannot support it.

Reviewed-on: https://github.com/dlang/phobos/pull/7435

Daily bump.

Fix a variety of testsuite failures on the H8 after recent cselib changes

PR rtl-optimization/92264
* config/h8300/h8300.md (mov;add peephole2): Avoid applying when
the destination is the stack pointer.

c++: ICE on invalid concept placeholder [PR94481].

Here the 'decltype' is missing '(auto)', so open_paren was NULL, and trying
to get its location is a SEGV. Using matching_parens avoids that problem.

gcc/cp/ChangeLog
2020-04-07 Jason Merrill <jason@redhat.com>

PR c++/94481
* parser.c (cp_parser_placeholder_type_specifier): Use
matching_parens.

combine: Fix split_i2i3 ICE [PR94291]

The following testcase ICEs on armv7hl-linux-gnueabi.
try_combine is called on:
(gdb) p debug_rtx (i3)
(insn 20 12 22 2 (set (mem/c:SI (plus:SI (reg/f:SI 102 sfp)
                (const_int -4 [0xfffffffffffffffc])) [1 x+0 S4 A32])
        (reg:SI 125)) "pr94291.c":7:8 241 {*arm_movsi_insn}
     (expr_list:REG_DEAD (reg:SI 125)
        (nil)))
(gdb) p debug_rtx (i2)
(insn 12 7 20 2 (parallel [
            (set (reg:CC 100 cc)
                (compare:CC (reg:SI 121 [ <retval> ])
                    (const_int 0 [0])))
            (set (reg:SI 125)
                (reg:SI 121 [ <retval> ]))
        ]) "pr94291.c":7:8 248 {*movsi_compare0}
     (expr_list:REG_UNUSED (reg:CC 100 cc)
        (nil)))
and tries to recognize cc = r121 cmp 0; [sfp-4] = r121 parallel,
but that isn't recognized, so it splits it into two: split_i2i3
[sfp-4] = r121 followed by cc = r121 cmp 0 which is recognized, but
ICEs because the code below insists that the SET_DEST of newi2pat
(or first set in PARALLEL thereof) must be a REG or SUBREG of REG,
but it is a MEM in this case.  I don't see any condition that would
guarantee that, perhaps for the swap_i2i3 case it was somehow guaranteed.

As the code just wants to update LOG_LINKS and LOG_LINKS are only for
registers, not for MEM or anything else, the patch just doesn't update those
if it isn't a REG or SUBREG of REG.
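
The guard amounts to the following check, shown here on a toy RTL-like
structure.  The types and names are hypothetical and stand in for the
real rtx accessors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the rtx codes involved.  */
enum model_rtx_code { MODEL_REG, MODEL_SUBREG, MODEL_MEM };

struct model_rtx
{
  enum model_rtx_code code;
  const struct model_rtx *inner;  /* operand of a SUBREG, else NULL */
};

/* LOG_LINKS exist only for registers, so they are updated only when the
   destination is a REG or a SUBREG of a REG.  */
static bool
dest_wants_log_links_update (const struct model_rtx *dest)
{
  assert (dest != NULL);
  if (dest->code == MODEL_SUBREG)
    dest = dest->inner;
  return dest != NULL && dest->code == MODEL_REG;
}
```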

2020-04-07  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/94291
PR rtl-optimization/84169
* combine.c (try_combine): For split_i2i3, don't assume SET_DEST
must be a REG or SUBREG of REG; if it is not one of these, don't
update LOG_LINKs.

* gcc.dg/pr94291.c: New test.

S/390: Fix PR91628

With this patch we get rid of the usage of the glibc-internal symbol
__tls_get_addr_internal.

If built with multilib, the file
gcc/libphobos/libdruntime/config/systemz/get_tls_offset.S is used
for both configurations: systemz and s390.
Therefore both implementations are now in the systemz file which
uses an "#ifdef __s390x__" in order to distinguish both cases.
The s390 file is just including the systemz one.

libphobos/ChangeLog:

2020-04-07 Robin Dapp <rdapp@linux.ibm.com>
Stefan Liebler <stli@linux.ibm.com>

* configure: Regenerate.
* libdruntime/Makefile.am: Add s390x and s390.
* libdruntime/Makefile.in: Regenerate.
* libdruntime/config/s390/get_tls_offset.S: New file.
* libdruntime/config/systemz/get_tls_offset.S: New file.
* libdruntime/gcc/sections/elf_shared.d: Use ibmz_get_tls_offset.
* m4/druntime/cpu.m4: Add s390x and s390.

libgcc: use syscall rather than __mmap/__munmap

PR libgcc/94513
* generic-morestack.c: Give up trying to use __mmap/__munmap, use
syscall instead.

middle-end/94479 - fix gimplification of address

When gimplifying an address operand we may expose an indirect
ref via DECL_VALUE_EXPR for example. This is dealt with in the
code already but it fails to consider that INDIRECT_REFs get
gimplified to MEM_REFs.

Fixing this makes the ICE observed on x86_64-netbsd go away.

2020-04-07 Richard Biener <rguenther@suse.de>

PR middle-end/94479
* gimplify.c (gimplify_addr_expr): Also consider generated
MEM_REFs.

* gcc.dg/torture/pr94479.c: New testcase.

Fix PR fortran/93871 and re-implement degree-valued trigonometric intrinsics.

2020-04-01 Fritz Reese <foreese@gcc.gnu.org>
Steven G. Kargl <kargl@gcc.gnu.org>

gcc/fortran/ChangeLog

PR fortran/93871
* gfortran.h (GFC_ISYM_ACOSD, GFC_ISYM_ASIND, GFC_ISYM_ATAN2D,
GFC_ISYM_ATAND, GFC_ISYM_COSD, GFC_ISYM_COTAND, GFC_ISYM_SIND,
GFC_ISYM_TAND): New.
* intrinsic.c (add_functions): Remove check for flag_dec_math.
Give degree trig functions simplification and name resolution
functions (e.g, gfc_simplify_atrigd () and gfc_resolve_atrigd ()).
(do_simplify): Remove special casing of degree trig functions.
* intrinsic.h (gfc_simplify_acosd, gfc_simplify_asind,
gfc_simplify_atand, gfc_simplify_cosd, gfc_simplify_cotand,
gfc_simplify_sind, gfc_simplify_tand, gfc_resolve_trigd2): Add new
prototypes.
(gfc_simplify_atrigd, gfc_simplify_trigd, gfc_resolve_cotan,
resolve_atrigd): Remove prototypes of deleted functions.
* iresolve.c (is_trig_resolved, copy_replace_function_shallow,
gfc_resolve_cotan, get_radians, get_degrees, resolve_trig_call,
gfc_resolve_atrigd, gfc_resolve_atan2d): Delete functions.
(gfc_resolve_trigd, gfc_resolve_trigd2): Resolve to library functions.
* simplify.c (rad2deg, deg2rad, gfc_simplify_acosd, gfc_simplify_asind,
gfc_simplify_atand, gfc_simplify_atan2d, gfc_simplify_cosd,
gfc_simplify_sind, gfc_simplify_tand, gfc_simplify_cotand): New
functions.
(gfc_simplify_atan2): Fix error message.
(simplify_trig_call, gfc_simplify_trigd, gfc_simplify_atrigd,
radians_f): Delete functions.
* trans-intrinsic.c: Add LIB_FUNCTION decls for sind, cosd, tand.
(rad2deg, gfc_conv_intrinsic_atrigd, gfc_conv_intrinsic_cotan,
gfc_conv_intrinsic_cotand, gfc_conv_intrinsic_atan2d): New functions.
(gfc_conv_intrinsic_function): Handle ACOSD, ASIND, ATAND, COTAN,
COTAND, ATAN2D.
* trigd_fe.inc: New file. Included by simplify.c to implement
simplify_sind, simplify_cosd, simplify_tand with code common to the
libgfortran implementation.

gcc/testsuite/ChangeLog

PR fortran/93871
* gfortran.dg/dec_math.f90: Extend coverage to real(10) and real(16).
* gfortran.dg/dec_math_2.f90: New test.
* gfortran.dg/dec_math_3.f90: Likewise.
* gfortran.dg/dec_math_4.f90: Likewise.
* gfortran.dg/dec_math_5.f90: Likewise.

libgfortran/ChangeLog

PR fortran/93871
* Makefile.am, Makefile.in: New make rule for intrinsics/trigd.c.
* gfortran.map: New routines for {sind, cosd, tand}X{r4, r8, r10, r16}.
* intrinsics/trigd.c, intrinsics/trigd_lib.inc, intrinsics/trigd.inc:
New files. Defines native degree-valued trig functions.

aarch64: Fix {ash[lr],lshr}<mode>3 expanders [PR94488]

The following testcase ICEs on aarch64 apparently since the introduction of
the aarch64 port.  The reason is that the {ashl,ashr,lshr}<mode>3 expanders
completely unnecessarily FAIL; if operands[2] is something other than
a CONST_INT or REG or MEM and the middle-end code can't cope with the
pattern giving up in these cases.  All the expanders use general_operand
predicate for the shift amount operand, but then have just a special case
for CONST_INT (if in-bound, emit an immediate shift, otherwise force into
REG), or MEM (force into REG), or REG (that is the case it handles).
In the testcase, operands[2] is a lowpart SUBREG of a REG, which is valid
general_operand.
I don't see what is special about MEMs that they should be forced into a
REG when others like SUBREGs are not; there isn't even a reason to check
for !REG_P, because force_reg does nothing if the operand is already a
REG and otherwise handles general_operand just fine.
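
The resulting decision in the expanders can be modelled like this.  The
enum and function names are hypothetical, and the in-bound test for the
constant is passed in as a flag rather than reproduced.

```c
#include <stdbool.h>

/* Hypothetical classification of the shift-amount operand.  */
enum shift_operand_kind
{
  OPERAND_CONST_INT,
  OPERAND_REG,
  OPERAND_MEM,
  OPERAND_SUBREG
};

enum shift_action
{
  EMIT_IMMEDIATE_SHIFT,   /* constant amount that is in bounds */
  FORCE_REG_THEN_SHIFT    /* everything else; never FAIL */
};

/* Only an in-bound CONST_INT gets the immediate form.  Any other operand
   is forced into a register, which is a no-op for an existing REG.  */
static enum shift_action
model_expand_shift (enum shift_operand_kind kind, bool const_in_bound)
{
  if (kind == OPERAND_CONST_INT && const_in_bound)
    return EMIT_IMMEDIATE_SHIFT;
  return FORCE_REG_THEN_SHIFT;
}
```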

2020-04-07  Jakub Jelinek  <jakub@redhat.com>

PR target/94488
* config/aarch64/aarch64-simd.md (ashl<mode>3, lshr<mode>3,
ashr<mode>3): Force operands[2] into reg whenever it is not CONST_INT.
Assume it is a REG after that instead of testing it and doing FAIL
otherwise.  Formatting fix.

* gcc.c-torture/compile/pr94488.c: New test.

libstdc++: Restore ability to use <charconv> in C++14 (PR 94520)

This C++17 header is supported in C++14 as a GNU extension, but stopped
working last year because I made it depend on an internal helper which
is only defined for C++17 and up.

PR libstdc++/94520
* include/std/charconv (__integer_to_chars_result_type)
(__integer_from_chars_result_type): Use __or_ instead of __or_v_ to
allow use in C++14.
* testsuite/20_util/from_chars/1.cc: Run test as C++14 and replace
use of std::string_view with std::string.
* testsuite/20_util/from_chars/2.cc: Likewise.
* testsuite/20_util/to_chars/1.cc: Likewise.
* testsuite/20_util/to_chars/2.cc: Likewise.

coroutines, ensure placeholder var is properly declared.

In cases where we need to extend the lifetime of a temporary captured
by reference, we make a replacement var for the temporary.  This will
be then used to define a coroutine frame entry (so that the var created
is elided by a later phase).  However, we should ensure that the var
is correctly declared anyway.

gcc/cp/ChangeLog:

2020-04-07  Iain Sandoe  <iain@sandoe.co.uk>

* coroutines.cc (maybe_promote_captured_temps): Ensure that
reference capture placeholder vars are properly declared.

arm: MVE: Add C++ polymorphism and fix some more issues

This patch adds C++ polymorphism for the MVE intrinsics, by using the native C++
polymorphic functions when C++ is used.

It also moves the PRESERVE name macro definitions to the right place so that the
variants without the '__arm_' prefix are not available if we define the PRESERVE
NAMESPACE macro.

This patch further fixes two testisms that were brought to light by C++ testing
added in this patch.

gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/arm/arm_mve.h: Add C++ polymorphism and fix preserve MACROs.

gcc/testsuite/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* g++.target/arm/mve.exp: New.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16: Fix testism.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32: Likewise.

arm: MVE: Fixes for pointers used in intrinsics for c++

This patch fixes the passing of some pointers to builtins that expect slightly
different types of pointers. In C this didn't prove an issue, but when
compiling for C++ gcc complains.

gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/arm/arm_mve.h: Cast some pointers to expected types.

arm: MVE: Fix -Wall testisms

This patch fixes some testisms I found when testing using -Wall/-Werror.

gcc/testsuite/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* gcc.target/arm/mve/intrinsics/vuninitializedq_float.c: Fix testism.
* gcc.target/arm/mve/intrinsics/vuninitializedq_float1.c: Likewise.
* gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise.
* gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c: Likewise.

arm: MVE: make sure we only use the Arm namespace variant of vuninitializedq

This patch replaces all uses of 'vuninitializedq_*' by the same function but
under the __arm_ namespace. In case we define the PRESERVE MACRO the variant
without the '__arm_' prefix will not be available.

gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/arm/arm_mve.h: Replace all uses of vuninitializedq_* with the
same with '__arm_' prefix.

arm: MVE: Fix vec extracts to memory

This patch fixes vec extracts to memory that can arise from code as seen in the
testcase added. The patch fixes this by allowing mem operands in the set of
mve_vec_extract patterns, which given the only '=r' constraint will lead to the
scalar value being written to a register and then stored in memory using scalar
store pattern.

gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/arm/mve.md (mve_vec_extract*): Allow memory operands in set.

gcc/testsuite/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* gcc.target/arm/mve/intrinsics/mve_vec_extracts_from_memory.c: New
test.

arm: MVE Fix immediate constraints on some vector instructions

Hi,

This patch fixes the immediate checks on vcvt and vqshr(u)n[bt] instructions.
It also removes the 'arm_mve_immediate_check' as the check was wrong and the
error message is not much better than the constraint one, which admittedly isn't
great either.

gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/arm/arm.c (arm_mve_immediate_check): Removed.
* config/arm/mve.md (MVE_pred2, MVE_constraint2): Added FP types.
(mve_vcvtq_n_to_f_*, mve_vcvtq_n_from_f_*, mve_vqshrnbq_n_*,
mve_vqshrntq_n_*, mve_vqshrunbq_n_s*, mve_vqshruntq_n_s*,
mve_vcvtq_m_n_from_f_*, mve_vcvtq_m_n_to_f_*, mve_vqshrnbq_m_n_*,
mve_vqrshruntq_m_n_s*, mve_vqshrunbq_m_n_s*,
mve_vqshruntq_m_n_s*): Fixed immediate constraints.

gcc/testsuite/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* gcc.target/arm/mve/intrinsics/mve_immediates_1_n.c: New test.

arm: MVE Don't use lsll for 32-bit shifts scalar

After fixing the v[id]wdups by moving the wrap parameter into the
top end of a DImode operand using a shift, I noticed we were using lsll for
32-bit shifts in scalars where we don't need to: we can simply do a move,
which is much better if we don't need to use the bottom part.

We can solve this in a better way, but for now this will do.
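
The equivalence this relies on can be sketched as follows.  RegPair is a
hypothetical model of a 64-bit value held in two 32-bit registers; it is
not how the backend represents DImode operands.

```cpp
#include <cstdint>

// Hypothetical model of a 64-bit value split across two 32-bit registers.
struct RegPair {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Shift left by exactly 32 using only word moves: the low word becomes
// the high word and the low word is cleared.
constexpr RegPair shl32_by_moves(RegPair v) {
  return RegPair{0u, v.lo};
}

// Reassemble the pair into a single 64-bit integer for comparison.
constexpr std::uint64_t to_u64(RegPair v) {
  return (static_cast<std::uint64_t>(v.hi) << 32) | v.lo;
}
```

For any input, to_u64(shl32_by_moves(v)) equals to_u64(v) << 32, so no
64-bit shift instruction is needed for this particular shift amount.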

gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/arm/arm.md (ashldi3): Don't use lsll for constant 32-bit shifts.

arm: MVE: Fix v[id]wdup's

This patch fixes v[id]wdup intrinsics. They had two issues:
1) the predicated versions did not link the incoming inactive vector parameter
to the output
2) The backend didn't enforce the wrap limit operand be in an odd register.

1) was fixed like we did for all other predicated intrinsics
2) requires a temporary hack where we pass the value in the top end of DImode
operand. The proper fix would be to add a register CLASS but this interacted
badly with other existing targets' codegen. We will look to fix this properly in GCC 11.

gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/arm/arm_mve.h: Fix v[id]wdup intrinsics.
* config/arm/mve.md: Fix v[id]wdup patterns.

arm: MVE: Fix constant load pattern

This patch fixes the constant load pattern for MVE, which was not accounting
correctly for label + offset cases.

Added test that ICE'd before and removed the scan assemblers for the mve_vector*
tests as they were too fragile.

gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/arm/arm.c (output_move_neon): Deal with label + offset cases.
* config/arm/mve.md (*mve_mov<mode>): Handle const vectors.

gcc/testsuite/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* gcc.target/arm/mve/intrinsics/mve_load_from_array.c: New test.
* gcc.target/arm/mve/intrinsics/mve_vector_float.c: Remove
scan-assembler.
* gcc.target/arm/mve/intrinsics/mve_vector_float1.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_int1.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_int2.c: Likewise.

arm: MVE: Do not use typeof for pointer parameters

To make sure our inlining of _Generic doesn't go crazy we added an in between
declaration of the parameters used for _Generic selection. However, this will
not work if the parameter being passed in is an array. Since none of our
intrinsics return pointers we do not need to use typeof here as we will never be
able to nest intrinsics through this parameter. I also removed the unnecessary
const pointers in mve_typeid.

gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/arm/arm_mve.h: Remove use of typeof for addr pointer parameters
and remove const_ptr enums.

arm: MVE: Fix polymorphism for scalars and constants

This patch merges some polymorphic functions that were incorrectly separating
scalar variants. It also simplifies the way we detect scalars and constants in
mve_typeid.

I also fixed some polymorphic intrinsics that were splitting off scalar cases.

gcc/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* config/arm/arm_mve.h (vsubq_n): Merge with...
(vsubq): ... this.
(vmulq_n): Merge with...
(vmulq): ... this.
(__ARM_mve_typeid): Simplify scalar and constant detection.

gcc/testsuite/ChangeLog:
2020-04-07 Andre Vieira <andre.simoesdiasvieira@arm.com>

* gcc.target/arm/mve/intrinsics/vmulq_n_f16.c: Fix test.
* gcc.target/arm/mve/intrinsics/vmulq_n_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vmulq_n_u8.c: Likewise.

S/390: Fix layout of struct sigaction_t

The ordering of some fields in struct sigaction on s390x (64bit)
differs compared to s390 and other architectures.
This patch adjusts this order according to the definition of
<glibc-src>/sysdeps/unix/sysv/linux/s390/bits/sigaction.h

Without this fix e.g. the call
sigaction( suspendSignalNumber, &sigusr1, null ) in thread.d
leads to setting the sa_restorer field to 0xffffffffffffffff.
In case of a signal, the signal handler returns to this address
and the process stops with a SIGILL.
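
The effect of the field order can be sketched with two model structs.
All type names and the 128-byte mask size are assumptions made for this
illustration, and the s390x order shown is my reading of the glibc header
named above rather than a copy of it.

```cpp
#include <cstddef>

// Illustrative models only; none of these are the real system types.
struct sigset_model {
  unsigned long bits[128 / sizeof(unsigned long)];
};
using handler_t = void (*)(int);
using restorer_t = void (*)();

// Field order assumed for s390 (31-bit) and most other Linux targets.
struct sigaction_generic_model {
  handler_t sa_handler;
  sigset_model sa_mask;
  int sa_flags;
  restorer_t sa_restorer;
};

// Field order assumed for s390x: flags and restorer precede the mask.
struct sigaction_s390x_model {
  handler_t sa_handler;
  int reserved0;
  int sa_flags;
  restorer_t sa_restorer;
  sigset_model sa_mask;
};
```

If a caller fills in the generic layout while the callee reads the s390x
one, the bytes read as sa_restorer lie inside the caller's sa_mask.  That
would be consistent with the all-ones value described above when the mask
has every bit set.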

This was observable in several execution testcases on s390x:
libphobos.druntime/core/thread.d
libphobos.druntime_shared/core/thread.d
libphobos.thread/tlsgc_sections.d
libphobos.allocations/tls_gc_integration.d
libphobos.phobos/std/parallelism.d
libphobos.phobos_shared/std/parallelism.d
libphobos.shared/host.c
libphobos.shared/linkD.c
libphobos.shared/linkDR.c
libphobos.shared/link_linkdep.d
libphobos.shared/load.d
libphobos.shared/loadDR.c
libphobos.shared/load_linkdep.d
libphobos.shared/load_loaddep.d

libphobos/ChangeLog:

2020-04-07 Stefan Liebler <stli@linux.ibm.com>

* libdruntime/core/sys/posix/signal.d:
Add struct sigaction_t for SystemZ.

c++: Fix usage of CONSTRUCTOR_PLACEHOLDER_BOUNDARY inside array initializers [PR90996]

This PR reports that ever since the introduction of the
CONSTRUCTOR_PLACEHOLDER_BOUNDARY flag, we are sometimes failing to resolve
PLACEHOLDER_EXPRs inside array initializers that refer to some inner
constructor.  In the testcase in the PR, we have as the initializer for "S c[];"
the following

  {{.a=(int &) &_ZGR1c_, .b={*(&<PLACEHOLDER_EXPR struct S>)->a}}}

where CONSTRUCTOR_PLACEHOLDER_BOUNDARY is set on the middle constructor.  When
calling replace_placeholders from store_init_value, we pass the entire
initializer to it, and as a result we fail to resolve the PLACEHOLDER_EXPR
within due to the CONSTRUCTOR_PLACEHOLDER_BOUNDARY flag on the middle
constructor blocking replace_placeholders_r from reaching it.

To fix this, we could perhaps either call replace_placeholders in more places,
or we could change where we set CONSTRUCTOR_PLACEHOLDER_BOUNDARY.  This patch
takes this latter approach -- when building up an array initializer, we now
bubble any CONSTRUCTOR_PLACEHOLDER_BOUNDARY flag from the element initializers
up to the array initializer so that the boundary doesn't later impede us when we
call replace_placeholders from store_init_value.

Besides fixing the kind of code like in the testcase, this shouldn't cause any
other differences in PLACEHOLDER_EXPR resolution because we don't create or use
PLACEHOLDER_EXPRs of array type in the frontend, as far as I can tell.

gcc/cp/ChangeLog:

PR c++/90996
* tree.c (replace_placeholders): Look through all handled components,
not just COMPONENT_REFs.
* typeck2.c (process_init_constructor_array): Propagate
CONSTRUCTOR_PLACEHOLDER_BOUNDARY up from each element initializer to
the array initializer.

gcc/testsuite/ChangeLog:

PR c++/90996
* g++.dg/cpp1y/pr90996.C: New test.

i386: Fix V{64QI,32HI}mode constant permutations [PR94509]

The following testcases are miscompiled, because expand_vec_perm_pshufb
incorrectly thinks it can use vpshufb instruction for the permutations
when it can't.
The
          if (vmode == V32QImode)
            {
              /* vpshufb only works intra lanes, it is not
                 possible to shuffle bytes in between the lanes.  */
              for (i = 0; i < nelt; ++i)
                if ((d->perm[i] ^ i) & (nelt / 2))
                  return false;
            }
intra-lane check which is correct has been copied and adjusted for 64-byte
modes into:
          if (vmode == V64QImode)
            {
              /* vpshufb only works intra lanes, it is not
                 possible to shuffle bytes in between the lanes.  */
              for (i = 0; i < nelt; ++i)
                if ((d->perm[i] ^ i) & (nelt / 4))
                  return false;
            }
which is not correct, because 64-byte modes have 4 lanes rather than just
two and the above is only testing that the permutation grabs even lane elts
from even lanes and odd lane elts from odd lanes, but not that they are
from the same 256-bit half.

The following patch fixes it by using 3 * nelt / 4 instead of nelt / 4,
so we actually check the most significant 2 bits rather than just one.
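
The difference between the two masks can be checked on a small model.
Perm64, stays_in_lane and swap_halves are hypothetical names for this
sketch; only the mask arithmetic mirrors the check quoted above.

```cpp
// Hypothetical model of a 64-byte permutation selector.
struct Perm64 {
  unsigned idx[64];
};

// True if no byte crosses a lane boundary, where lane_mask picks the
// index bits that identify the lane.
constexpr bool stays_in_lane(const Perm64& p, unsigned lane_mask) {
  for (unsigned i = 0; i < 64; ++i)
    if ((p.idx[i] ^ i) & lane_mask)
      return false;
  return true;
}

// Swap the two 256-bit halves: byte i is taken from byte i ^ 32.
constexpr Perm64 swap_halves() {
  Perm64 p{};
  for (unsigned i = 0; i < 64; ++i)
    p.idx[i] = i ^ 32u;
  return p;
}
```

With nelt = 64, the mask nelt / 4 is 16 and tests a single bit, so the
half-swapping permutation passes even though it moves bytes between lanes.
The mask 3 * nelt / 4 is 48 and tests both lane bits, so the same
permutation is rejected.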

2020-04-07  Jakub Jelinek  <jakub@redhat.com>

PR target/94509
* config/i386/i386-expand.c (expand_vec_perm_pshufb): Fix the check
for inter-lane permutation for 64-byte modes.

* gcc.target/i386/avx512bw-pr94509-1.c: New test.
* gcc.target/i386/avx512bw-pr94509-2.c: New test.

openmp: Fix parallel master error recovery [PR94512]

We need to set OMP_PARALLEL_COMBINED only if the parsing of omp_master
succeeded, because otherwise there is no nested master construct in the
parallel.

2020-04-07 Jakub Jelinek <jakub@redhat.com>

PR c++/94512
* c-parser.c (c_parser_omp_parallel): Set OMP_PARALLEL_COMBINED
if c_parser_omp_master succeeded.

* parser.c (cp_parser_omp_parallel): Set OMP_PARALLEL_COMBINED
if cp_parser_omp_master succeeded.

* g++.dg/gomp/pr94512.C: New test.

aarch64: Fix {ash[lr],lshr}<mode>3 expanders [PR94488]

The following testcase ICEs on aarch64 apparently since the introduction of
the aarch64 port.  The reason is that the {ashl,ashr,lshr}<mode>3 expanders
completely unnecessarily FAIL; if operands[2] is something other than
a CONST_INT or REG or MEM and the middle-end code can't cope with the
pattern giving up in these cases.  All the expanders use general_operand
predicate for the shift amount operand, but then have just a special case
for CONST_INT (if in-bound, emit an immediate shift, otherwise force into
REG), or MEM (force into REG), or REG (that is the case it handles).
In the testcase, operands[2] is a lowpart SUBREG of a REG, which is valid
general_operand.
I don't see what is special about MEMs that they should be forced into a
REG while others such as SUBREGs should not.  There isn't even a
reason to check for !REG_P, because force_reg will do nothing if the operand
is already a REG, and otherwise can handle general_operand just fine.
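
The change in control flow can be modelled with a toy classifier.  The
enums and function names below are hypothetical and do not correspond to
GCC's RTL API; they only mirror the case analysis described above.

```cpp
// Hypothetical operand kinds and expansion outcomes.
enum class Kind { ConstInt, Reg, Mem, Subreg };
enum class Result { ImmediateShift, RegisterShift, Fail };

// Old behaviour: only CONST_INT, MEM and REG are handled; anything
// else makes the expander give up.
constexpr Result expand_old(Kind amount, bool in_bounds) {
  if (amount == Kind::ConstInt) {
    if (in_bounds)
      return Result::ImmediateShift;
    amount = Kind::Reg;  // out-of-range constant forced into a register
  } else if (amount == Kind::Mem) {
    amount = Kind::Reg;  // memory operand forced into a register
  }
  if (amount == Kind::Reg)
    return Result::RegisterShift;
  return Result::Fail;
}

// New behaviour: every operand that is not an in-range constant is
// forced into a register, so the expander never fails.
constexpr Result expand_new(Kind amount, bool in_bounds) {
  if (amount == Kind::ConstInt && in_bounds)
    return Result::ImmediateShift;
  return Result::RegisterShift;
}
```

In this model a Subreg shift amount yields Fail with expand_old and
RegisterShift with expand_new, while the other kinds behave the same.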

2020-04-07  Jakub Jelinek  <jakub@redhat.com>

PR target/94488
* config/aarch64/aarch64-simd.md (ashl<mode>3, lshr<mode>3,
ashr<mode>3): Force operands[2] into reg whenever it is not CONST_INT.
Assume it is a REG after that instead of testing it and doing FAIL
otherwise.  Formatting fix.

* gcc.c-torture/compile/pr94488.c: New test.

d: Always set ASM_VOLATILE_P on asm statements (PR94425)

gcc/d/ChangeLog:

PR d/94425
* toir.cc (IRVisitor::visit (GccAsmStatement *)): Set ASM_VOLATILE_P
on all asm statements.

RTEMS: Delete useless mcpu=8540 multilib

The support for the 32-bit float GPRs was removed in GCC 8.

gcc/

* config/rs6000/t-rtems: Delete mcpu=8540 multilib.

i386: Fix emit_reduc_half on V{64Q,32H}Imode [PR94500]

The following testcase is miscompiled in 8.x, because emit_reduc_half is
prepared to handle for 512-bit modes only i equal to 512, 256, 128 and 64.
V32HImode also needs i equal to 32 and V64QImode i equal to 32 and 16,
but emit_reduc_half in that case performs a redundant permutation exactly
like i == 32.  In 9+ the testcase works because Richard in r9-3393
changed the reduc_* expanders so that they actually don't call
ix86_expand_reduc on 512-bit modes, but only 128-bit ones.
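
To see where the extra values of i come from, here is a sketch of the
halving loop.  It assumes the reduction starts at the full vector width
and halves the chunk width until it reaches the element width; the
function name is hypothetical.

```cpp
// True if a halving reduction over vector_bits with elem_bits-wide
// elements performs a step at chunk width `width` (all in bits).
constexpr bool reduction_uses_width(unsigned vector_bits, unsigned elem_bits,
                                    unsigned width) {
  for (unsigned i = vector_bits; i > elem_bits; i >>= 1)
    if (i == width)
      return true;
  return false;
}
```

Under that assumption a 512-bit vector of 16-bit elements needs a step at
width 32, and a 512-bit vector of 8-bit elements needs steps at both 32
and 16, matching the widths listed above.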

The patch fixes emit_reduc_half to handle also i of 32 and 16 similarly to
how V32QImode/V16HImode are handled for AVX2.  I think it shouldn't hurt
to fix the function even on the trunk and 9 branch even when nothing uses
it ATM.

2020-04-07  Jakub Jelinek  <jakub@redhat.com>

PR target/94500
* config/i386/i386-expand.c (emit_reduc_half): For V{64QI,32HI}mode
handle i < 64 using avx512bw_lshrv4ti3.  Formatting fixes.

* gcc.target/i386/avx512bw-pr94500.c: New test.

c++: Fix ICE with implicit operator== [PR94462]

duplicate_decls assumed that any TREE_ARTIFICIAL function at namespace scope
was a built-in function, but now in C++20 it's possible to have an
implicitly declared hidden friend operator==. We just need to move the
assert into the if condition.

gcc/cp/ChangeLog
2020-04-06 Jason Merrill <jason@redhat.com>

PR c++/94462
* decl.c (duplicate_decls): Fix handling of DECL_HIDDEN_FRIEND_P.

Daily bump.

libgo: update to almost the 1.14.2 release

Update to edea4a79e8d7dea2456b688f492c8af33d381dc2 which is likely to
be approximately the 1.14.2 release.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/227377

libgomp/test: Remove a build sysroot fix regression

Fix a problem with commit c8e759b4215b ("libgomp/test: Fix compilation
for build sysroot") that caused a regression in some standalone test
environments where testsuite/libgomp-test-support.exp is used, but the
compiler is expected to be determined by `[find_gcc]', and set the
GCC_UNDER_TEST TCL variable in testsuite/libgomp-site-extra.exp instead.

libgomp/
* configure.ac: Add testsuite/libgomp-site-extra.exp to output
files.
* configure: Regenerate.
* testsuite/libgomp-site-extra.exp.in: New file.
* testsuite/libgomp-test-support.exp.in (GCC_UNDER_TEST): Remove
variable.
* testsuite/Makefile.am (EXTRA_DEJAGNU_SITE_CONFIG): New
variable.
* testsuite/Makefile.in: Regenerate.

libatomic/test: Fix compilation for build sysroot

Fix a problem with the libatomic testsuite using a method to determine
the compiler to use resulting in the tool being different from one the
library has been built with, and causing a catastrophic failure from the
lack of a suitable `--sysroot=' option where the `--with-build-sysroot='
configuration option has been used to build the compiler resulting in
the inability to link executables.

Address this problem by providing a DejaGNU configuration file defining
the compiler to use, via the GCC_UNDER_TEST TCL variable, set from $CC
by autoconf, which will have all the required options set for the target
compiler to build executables in the environment configured, removing
failures like:

.../bin/riscv64-linux-gnu-ld: cannot find crt1.o: No such file or directory
.../bin/riscv64-linux-gnu-ld: cannot find -lm
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: libatomic.c/atomic-compare-exchange-1.c (test for excess errors)
Excess errors:
.../bin/riscv64-linux-gnu-ld: cannot find crt1.o: No such file or directory
.../bin/riscv64-linux-gnu-ld: cannot find -lm

UNRESOLVED: libatomic.c/atomic-compare-exchange-1.c compilation failed to produce executable

and bringing overall test results for the `riscv64-linux-gnu' target
(here with the `x86_64-linux-gnu' host and RISC-V QEMU in the Linux user
emulation mode as the target board) from:

=== libatomic Summary ===

# of unexpected failures 27
# of unresolved testcases 27

to:

=== libatomic Summary ===

# of expected passes 54

libatomic/
* configure.ac: Add testsuite/libatomic-site-extra.exp to output
files.
* configure: Regenerate.
* libatomic/testsuite/libatomic-site-extra.exp.in: New file.
* testsuite/Makefile.am (EXTRA_DEJAGNU_SITE_CONFIG): New
variable.
* testsuite/Makefile.in: Regenerate.

cselib: Fix endless cselib loop on (plus:P (reg) (const_int 0))

getopt.c hangs the compiler on h8300-elf with -O2 -g, because the
IL contains addition of constant 0, the first PLUS operand is determined
to have the SP_DERIVED_VALUE_P and the new code in cselib recurses
indefinitely on seeing SP_DERIVED_VALUE_P with locs of
(plus:P SP_DERIVED_VALUE_P (const_int 0)).

Fixed by making sure cselib_subst_to_values canonicalizes it; hashing
already hashes it the same too.

2020-04-06 Jakub Jelinek <jakub@redhat.com>

* cselib.c (cselib_subst_to_values): For SP_DERIVED_VALUE_P
+ const0_rtx return the SP_DERIVED_VALUE_P.

Update gcc sv.po.

* sv.po: Update.

Update cpplib eo.po.

* eo.po: Update.

Fix fortran/93686 -- ICE matching data statements with derived-type pointers.

gcc/fortran/ChangeLog:

2020-04-06 Steven G. Kargl <kargl@gcc.gnu.org>

PR fortran/93686
* decl.c (gfc_match_data): Handle data matching for derived type
pointers.

gcc/testsuite/ChangeLog:

2020-04-06 Steven G. Kargl <kargl@gcc.gnu.org>

PR fortran/93686
* gfortran.dg/pr93686_1.f90: New test.
* gfortran.dg/pr93686_2.f90: Likewise.
* gfortran.dg/pr93686_3.f90: Likewise.
* gfortran.dg/pr93686_4.f90: Likewise.

skip gcc.target/arm/div64-unwinding.c on vxworks_kernel targets

This test verifies, by using a weak reference to _Unwind_RaiseException,
that performing division by zero does not cause that symbol to get
indirectly pulled into our closure.

The testing methodology unfortunately does not work on VxWorks targets
when building in kernel mode. This is inherent to how kernel mode
on VxWorks works: The link is only partial and the remaining symbols
which have not been resolved already get automatically resolved by
the VxWorks loader at the moment the module is loaded onto the target,
prior to execution. The resolution includes weak symbols too, which
defeats the purpose of this test.

gcc/testsuite/

* gcc.target/arm/div64-unwinding.c: Skip on vxworks_kernel targets.

lra: Stop eh_return data regs being incorrectly marked live [PR92989]

lra_assign has an assert to make sure that no pseudo is allocated
to a conflicting hard register.  It used to be restricted to
!flag_ipa_ra, but in g:a1e6ee38e708ef2bdef4 I'd enabled it for
flag_ipa_ra too.  It then tripped a few times while building
libstdc++ for mips-mti-linux.

Previous patches fixed one of the problems: registers clobbered
by the taking of an exception were being treated as live at the
beginning of the EH receiver, and this got propagated to predecessor
blocks.  But it turns out that there was a second problem: eh_return
data registers were also being marked live in the same way.

These registers are defined by the unwinder and so in reality they
are live on entry to the EH receiver.  But definitions can only happen
in blocks, not on edges, so for liveness purposes we use artificial
definitions at the start of the EH receiver.  process_bb_lives should
therefore model the effect of a definition, not a plain use.
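
The difference between modelling a definition and a plain use can be
sketched with the usual backward liveness equation.  RegSet, reg and
live_in are hypothetical names, and register number 4 is picked
arbitrarily for the example.

```cpp
#include <cstdint>

// Hypothetical register set: bit n is set if register n is in the set.
using RegSet = std::uint32_t;

constexpr RegSet reg(unsigned n) { return RegSet{1} << n; }

// Backward liveness transfer: a register is live before a point if it
// is used there, or if it is live afterwards and not defined there.
constexpr RegSet live_in(RegSet live_out, RegSet def, RegSet use) {
  return use | (live_out & ~def);
}
```

If register 4 is live after the start of the EH receiver, treating it as
a use keeps it live on entry, so it propagates to predecessor blocks.
Treating it as a definition removes it from the live-in set.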

2020-04-06  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
PR rtl-optimization/92989
* lra-lives.c (process_bb_lives): Do not treat eh_return data
registers as being live at the beginning of the EH receiver.

libstdc++: Make string_view::copy usable in constant expressions (PR 94498)

PR libstdc++/94498
* include/bits/char_traits.h (__gnu_cxx::char_traits::move): Make it
usable in constant expressions for C++20.
(__gnu_cxx::char_traits::copy, __gnu_cxx::char_traits::assign): Add
_GLIBCXX20_CONSTEXPR.
(std::char_traits<char>, std::char_traits<wchar_t>)
(std::char_traits<char8_t>): Make move, copy and assign usable in
constant expressions for C++20.
(std::char_traits<char16_t>, std::char_traits<char32_t>): Make move
and copy usable in constant expressions for C++20.
* include/std/string_view (basic_string_view::copy): Add
_GLIBCXX20_CONSTEXPR.
* testsuite/21_strings/basic_string_view/operations/copy/char/
constexpr.cc: New test.
* testsuite/21_strings/basic_string_view/operations/copy/wchar_t/
constexpr.cc: New test.

c++: Fix crash in gimplifier with paren init of aggregates [PR94155]

Here we crash in the gimplifier because gimplify_init_ctor_eval doesn't
expect null indexes for a constructor:

      /* ??? Here's to hoping the front end fills in all of the indices,
         so we don't have to figure out what's missing ourselves.  */
      gcc_assert (purpose);

The indexes weren't filled because we never called reshape_init: for
a constructor that represents parenthesized initialization of an
aggregate we don't allow brace elision or designated initializers.

PR c++/94155 - crash in gimplifier with paren init of aggregates.
* init.c (build_vec_init): Fill in indexes.

* g++.dg/cpp2a/paren-init22.C: New test.

Daily bump.

libstdc++: Refer to Git documentation

* doc/xml/manual/appendix_contributing.xml: Refer to Git
documentation instead of Subversion. Switch to https.
* doc/html/manual/appendix_contributing.html: Regenerate.

coroutines, testsuite: Renumber two tests (NFC).

Try to keep tests ordered by distinct number (and with a short
descriptive name appended).

2020-04-05 Iain Sandoe <iain@sandoe.co.uk>

* g++.dg/coroutines/torture/co-await-14-template-traits.C: Rename...
* g++.dg/coroutines/torture/co-await-16-template-traits.C: to this.
* g++.dg/coroutines/torture/co-await-15-capture-comp-ref.C: Rename..
* g++.dg/coroutines/torture/co-await-17-capture-comp-ref.C: to this.

Minor doc fix for ISO C90

* extend.texi: Add free to list of ISO C90 functions that
are recognized by the compiler.

Microblaze: Fixed missing save of r18 in fast_interrupt.

Register 18 is used as a clobber register, and must be stored when entering
a fast_interrupt. Before this fix, register 18 was only saved if it was used
directly in the interrupt function.

However, if the fast_interrupt function called a function that used
r18, the register would not be saved, and thus be mangled
upon returning from the interrupt.

* config/microblaze/microblaze.c (microblaze_must_save_register): Check
for fast_interrupt.

Microblaze: Modified trap instruction

There is a bug in trap instruction generation: instead of the "bri 0"
instruction, "brki r0, -1" was used.  This is now corrected.

* gcc/config/microblaze/microblaze.md (trap): Update output pattern.

* gcc.target/microblaze/others/builtin-trap.c: Update expected output.

Daily bump.

debug: Improve debug info of c++14 deduced return type [PR94459]

On the following testcase, in gdb ptype S<long>::m1 prints long as return
type, but all the other methods show void instead.
PR53756 added code to add_type_attribute if the return type is
auto/decltype(auto), but we actually should look through references,
pointers and qualifiers.
I haven't included DW_TAG_atomic_type there, because I think at least ATM
one can't use that in C++.  Not sure about DW_TAG_array_type or what else
could be deduced.
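
The kind of declarations involved can be sketched as follows.  This is a
hypothetical example rather than the testcase from the PR: each member is
declared in the class with a placeholder return type and defined later,
and the placeholder sits behind a reference, pointer or qualifier.

```cpp
#include <type_traits>
#include <utility>

template <typename T>
struct S {
  T v = 0;
  auto m1();         // plain placeholder
  auto& m2();        // placeholder behind a reference
  auto* m3();        // placeholder behind a pointer
  const auto& m4();  // placeholder behind a qualifier and a reference
};

// Out-of-class definitions; the return types are deduced here.
template <typename T> auto S<T>::m1() { return v; }
template <typename T> auto& S<T>::m2() { return v; }
template <typename T> auto* S<T>::m3() { return &v; }
template <typename T> const auto& S<T>::m4() { return v; }
```

For S<long> the deduced return types are long, long&, long* and
const long& respectively, which is what the debug info should describe
once the definition has been seen.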

> http://eel.is/c++draft/dcl.spec.auto#3 says it has to appear as a
> decl-specifier.
>
> http://eel.is/c++draft/temp.deduct.type#8 lists the forms where a template
> argument can be deduced.
>
> Looks like you are missing arrays, pointers to members, and function return
> types.

2020-04-04  Hannes Domani  <ssbssa@yahoo.de>
    Jakub Jelinek  <jakub@redhat.com>

PR debug/94459
* dwarf2out.c (gen_subprogram_die): Look through references, pointers,
arrays, pointer-to-members, function types and qualifiers when
checking if in-class DIE had an 'auto' or 'decltype(auto)' return type
to emit type again on definition.

* g++.dg/debug/pr94459.C: New test.

Co-Authored-By: Hannes Domani <ssbssa@yahoo.de>

libgcc: only use __mmap if glibc >= 2.26

* generic-morestack.c: Only use __mmap on glibc >= 2.26.

c++: Mangling of dependent conversions [PR91377]

We skip over other conversion codes when mangling expressions; we should do
the same with IMPLICIT_CONV_EXPR.

gcc/cp/ChangeLog
2020-04-04 Jason Merrill <jason@redhat.com>

PR c++/91377
* mangle.c (write_expression): Skip IMPLICIT_CONV_EXPR.

c++: Refrain from using replace_placeholders in constexpr evaluation [PR94205]

This removes the use of replace_placeholders in cxx_eval_constant_expression
(which is causing the new test lambda-this6.C to ICE due to replace_placeholders
mutating the shared TARGET_EXPR_INITIAL tree which then trips up the
gimplifier).

In its place, this patch adds a 'parent' field to constexpr_ctx which is used to
store a pointer to an outer constexpr_ctx that refers to another object under
construction.  With this new field, we can beef up lookup_placeholder to resolve
PLACEHOLDER_EXPRs which refer to former objects under construction, which fixes
PR94205 without needing to do replace_placeholders.  Also we can now respect the
CONSTRUCTOR_PLACEHOLDER_BOUNDARY flag when resolving PLACEHOLDER_EXPRs, and
doing so fixes the constexpr analogue of PR79937.

gcc/cp/ChangeLog:

PR c++/94205
PR c++/79937
* constexpr.c (struct constexpr_ctx): New field 'parent'.
(cxx_eval_bare_aggregate): Propagate CONSTRUCTOR_PLACEHOLDER_BOUNDARY
flag from the original constructor to the reduced constructor.
(lookup_placeholder): Prefer to return the outermost matching object
by recursively calling lookup_placeholder on the 'parent' context,
but don't cross CONSTRUCTOR_PLACEHOLDER_BOUNDARY constructors.
(cxx_eval_constant_expression): Link the 'ctx' context to the 'new_ctx'
context via 'new_ctx.parent' when being expanded without an explicit
target.  Don't call replace_placeholders.
(cxx_eval_outermost_constant_expr): Initialize 'ctx.parent' to NULL.

gcc/testsuite/ChangeLog:

PR c++/94205
PR c++/79937
* g++.dg/cpp1y/pr79937-5.C: New test.
* g++.dg/cpp1z/lambda-this6.C: New test.

c++: Fix constexpr evaluation of self-modifying CONSTRUCTORs [PR94219]

This PR reveals that cxx_eval_bare_aggregate and cxx_eval_store_expression do
not anticipate that a constructor element's initializer could mutate the
underlying CONSTRUCTOR.  Evaluation of the initializer could add new elements to
the underlying CONSTRUCTOR, thereby potentially invalidating any pointers to
or assumptions about the CONSTRUCTOR's elements, and so these routines should be
prepared for that.

To fix this problem, this patch makes cxx_eval_bare_aggregate and
cxx_eval_store_expression recompute the constructor_elt pointers through which
we're assigning, after it evaluates the initializer.  Care is taken to not
slow down the common case where the initializer does not modify the underlying
CONSTRUCTOR.
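
The hazard is analogous to holding a pointer into a std::vector across a
call that may grow it.  The sketch below is only an analogy with
hypothetical names (Elt, store_after_eval); it is not the constexpr.c
code, and it omits the fast path for the common case.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical element: an index and the value stored for it.
struct Elt {
  int index;
  int value;
};

// Evaluate `init`, which may append elements, then store its result
// into the element at position `pos`.
inline void store_after_eval(
    std::vector<Elt>& elts, std::size_t pos,
    const std::function<int(std::vector<Elt>&)>& init) {
  // No Elt* is held across init(): growth may reallocate the storage.
  int result = init(elts);
  assert(pos < elts.size());
  Elt& slot = elts[pos];  // looked up again after the evaluation
  slot.value = result;
}
```

A caller whose initializer appends many elements still sees its result
stored in the intended slot, because the slot is found by position after
the evaluation instead of through a pointer taken before it.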

gcc/cp/ChangeLog:

PR c++/94219
PR c++/94205
* constexpr.c (get_or_insert_ctor_field): Split out (while adding
support for VECTOR_TYPEs, and optimizations for the common case)
from ...
(cxx_eval_store_expression): ... here.  Rename local variable
'changed_active_union_member_p' to 'activated_union_member_p'.  Record
the sequence of indexes into 'indexes' that yields the subobject we're
assigning to.  Record the integer offsets of the constructor indexes
we're assigning through into 'index_pos_hints'.  After evaluating the
initializer of the store expression, recompute 'valp' using 'indexes'
and using 'index_pos_hints' as hints.
(cxx_eval_bare_aggregate): Tweak comments.  Use get_or_insert_ctor_field
to recompute the constructor_elt pointer we're assigning through after
evaluating each initializer.

gcc/testsuite/ChangeLog:

PR c++/94219
PR c++/94205
* g++.dg/cpp1y/constexpr-nsdmi3.C: New test.
* g++.dg/cpp1y/constexpr-nsdmi4.C: New test.
* g++.dg/cpp1y/constexpr-nsdmi5.C: New test.
* g++.dg/cpp1z/lambda-this5.C: New test.

Fix previous commit.

gcc/ChangeLog:

2020-04-04 Jan Hubicka <hubicka@ucw.cz>

PR ipa/93940
* ipa-fnsummary.c (vrp_will_run_p): New function.
(fre_will_run_p): New function.
(evaluate_properties_for_edge): Use it.
* ipa-inline.c (can_inline_edge_by_limits_p): Do not inline
!optimize_debug to optimize_debug.

gcc/testsuite/ChangeLog:

2020-04-04 Jan Hubicka <hubicka@ucw.cz>

* g++.dg/tree-ssa/pr93940.C: New test.

c++: Fix invalid pointer-to-member in requires [PR67825]

A recent change to cmcstl2 led to two tests failing due to this bug: our
valid expression checking in the context of a requires-expression wasn't
catching that an expression of member function type can only appear as the
function operand of a call expression. Fixed by using convert_to_void to do
the same checking as a discarded-value expression.

This patch also fixes 67825, which already had a testcase, but the testcase
was testing for the wrong behavior.

gcc/cp/ChangeLog
2020-04-04 Jason Merrill <jason@redhat.com>

PR c++/67825
* constraint.cc (tsubst_valid_expression_requirement): Call
convert_to_void.

c++: Fix reuse of class constants [PR94453]

The testcase hit an ICE trying to expand a TARGET_EXPR temporary cached from
the other lambda-expression. This patch fixes this in two ways:

1) Avoid reusing a TARGET_EXPR from another function.
2) Avoid ending up with a TARGET_EXPR at all; the use of 'p' had become
<TARGET_EXPR<NON_LVALUE_EXPR<TARGET_EXPR ...>>>, which doesn't make any
sense.

gcc/cp/ChangeLog
2020-04-04 Jason Merrill <jason@redhat.com>

PR c++/94453
* constexpr.c (maybe_constant_value): Use break_out_target_exprs.
* expr.c (mark_use) [VIEW_CONVERT_EXPR]: Don't wrap a TARGET_EXPR in
NON_LVALUE_EXPR.

ipa: Fix wrong code with failed propagation to builtin_constant_p [PR93940]

This patch fixes wrong code on a testcase where inline predicts
builtin_constant_p to be true but we fail to optimize its parameter to constant
because FRE is not run and the value is passed by an aggregate.

This patch makes the inline predicates disable aggregate tracking
when FRE is not going to be run and similarly value range when VRP is not
going to be run.

This is just a partial fix.  Even with it we can arrange FRE/VRP to fail and
produce wrong code, unfortunately.

I think for GCC11 I will need to implement transformation in ipa-inline
but this is a bit hard to do: predicates only track that a value will be
constant and do not track which constant it will be.

Optimizing builtin_constant_p in a conditional is not going to do a good job
when the value is used later in a place that expects it to be constant.
This is a pre-existing problem that is not limited to inline tracking. For example,
FRE may do the transform at one place but not in another due to alias oracle
walking limits.

So I am not sure what a full fix would be :(

gcc/ChangeLog:

2020-04-04  Jan Hubicka  <hubicka@ucw.cz>

PR ipa/93940
* ipa-fnsummary.c (vrp_will_run_p): New function.
(fre_will_run_p): New function.
(evaluate_properties_for_edge): Use it.
* ipa-inline.c (can_inline_edge_by_limits_p): Do not inline
!optimize_debug to optimize_debug.

gcc/testsuite/ChangeLog:

2020-04-04  Jan Hubicka  <hubicka@ucw.cz>

* g++.dg/tree-ssa/pr93940.C: New test.

cselib: Don't consider SP_DERIVED_VALUE_P values as useless [PR94468]

The following testcase ICEs, because at one point we see the
SP_DERIVED_VALUE_P VALUE as useless (not PRESERVED_VALUE_P and no locs)
and so expect it to be discarded as useless.  But, later on we
are adding some new VALUE that is equivalent to it, and when adding
the equivalency that that new VALUE is equal to this SP_DERIVED_VALUE_P,
new_elt_loc_list has code for VALUE canonicalization and reverses addition
if uid is smaller, and at that point a new loc is added to the
SP_DERIVED_VALUE_P VALUE and it isn't discarded as useless anymore.
Now, I think we don't want to discard the SP_DERIVED_VALUE_P values
even if they have no locs, because they still have the special behaviour
that they then force other new VALUEs to be canonicalized against them,
which is what this patch implements.  I've not set PRESERVED_VALUE_P
on the SP_DERIVED_VALUE_P at the creation time, because whether a VALUE
is preserved or not is something that affects var-tracking decisions quite a
lot and we shouldn't set it blindly on other VALUEs.

Or, to avoid the repetitive code, should I introduce
static bool
cselib_useless_value_p (cselib_val *v)
{
  return (v->locs == 0
          && !PRESERVED_VALUE_P (v->val_rtx)
          && !SP_DERIVED_VALUE_P (v->val_rtx));
}
predicate and use it in those 6 spots?

2020-04-04  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/94468
* cselib.c (references_value_p): Formatting fix.
(cselib_useless_value_p): New function.
(discard_useless_locs, discard_useless_values,
cselib_invalidate_regno_val, cselib_invalidate_mem,
cselib_record_set): Use it instead of
v->locs == 0 && !PRESERVED_VALUE_P (v->val_rtx).

* g++.dg/opt/pr94468.C: New test.

c++: Fix further protected_set_expr_location related -fcompare-debug issues [PR94441]

My recent protected_set_expr_location changes work well when
that function is called unconditionally, but as the testcase shows, the C++
FE has a few spots that do:
  if (!EXPR_HAS_LOCATION (stmt))
    protected_set_expr_location (stmt, locus);
or similar.  Now, if we have for -g0 stmt of some expression that can
have location and has != UNKNOWN_LOCATION, while -g instead has
a STATEMENT_LIST containing some DEBUG_BEGIN_STMTs + that expression with
that location, we don't call protected_set_expr_location in the -g0 case,
but do call it in the -g case, because on the STATEMENT_LIST
!EXPR_HAS_LOCATION.
The following patch introduces a helper function which digs up the single
expression of a STATEMENT_LIST and uses that expression in the
EXPR_HAS_LOCATION check (plus changes protected_set_expr_location to
also use that helper).

Or do we want a further wrapper, perhaps C++ FE only, that would do this
protected_set_expr_location_if_unset (stmt, locus)?
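
To make the -g0 versus -g difference concrete, here is a minimal toy model of the helper and the wrapper discussed above. It is a sketch, not the GCC implementation: the Node type, its fields and set_location_if_unset are hypothetical stand-ins for tree nodes, EXPR_HAS_LOCATION and protected_set_expr_location_if_unset.

```cpp
#include <vector>

// Toy stand-in for UNKNOWN_LOCATION (assumption: 0 means "no location").
constexpr int UNKNOWN_LOCATION = 0;

// Hypothetical, heavily simplified tree node.
struct Node {
  enum Kind { Expr, DebugBeginStmt, StatementList };
  Kind kind;
  int location;               // UNKNOWN_LOCATION when unset
  std::vector<Node *> stmts;  // only used by StatementList
};

// Return the single non-debug expression wrapped by EXPR, or nullptr if
// EXPR is a statement list holding zero or several real statements.
Node *expr_single(Node *expr)
{
  if (!expr || expr->kind != Node::StatementList)
    return expr;
  Node *found = nullptr;
  for (Node *s : expr->stmts) {
    if (s->kind == Node::DebugBeginStmt)
      continue;                // debug markers do not count
    if (found)
      return nullptr;          // more than one real statement
    found = expr_single(s);
    if (!found)
      return nullptr;
  }
  return found;
}

// Set the location only if the underlying expression has none yet, so the
// decision no longer depends on whether debug markers wrap the expression.
void set_location_if_unset(Node *stmt, int locus)
{
  Node *e = expr_single(stmt);
  if (e && e->location == UNKNOWN_LOCATION)
    e->location = locus;
}
```

With this shape, an expression that already carries a location keeps it whether it is passed directly (the -g0 case) or inside a list that also holds debug markers (the -g case).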

2020-04-04  Jakub Jelinek  <jakub@redhat.com>

PR debug/94441
* tree-iterator.h (expr_single): Declare.
* tree-iterator.c (expr_single): New function.
* tree.h (protected_set_expr_location_if_unset): Declare.
* tree.c (protected_set_expr_location): Use expr_single.
(protected_set_expr_location_if_unset): New function.

* parser.c (cp_parser_omp_for_loop): Use
protected_set_expr_location_if_unset.
* cp-gimplify.c (genericize_if_stmt, genericize_cp_loop): Likewise.

* g++.dg/opt/pr94441.C: New test.

Daily bump.

Fix stdarg-3 regression on xstormy16 port

PR rtl-optimization/92264
* config/stormy16/stormy16.c (xstormy16_preferred_reload_class): Handle
reloading of auto-increment addressing modes.

openmp: Fix ICE on #pragma omp parallel master in template [PR94477]

The following testcase ICEs, because for parallel combined with some
other construct we initialize the omp_parallel_combined_clauses pointer
and expect the construct combined with it to clear it after it no longer
needs it, but OMP_MASTER didn't do that.

2020-04-04 Jakub Jelinek <jakub@redhat.com>

PR c++/94477
* pt.c (tsubst_expr) <case OMP_MASTER>: Clear
omp_parallel_combined_clauses.

* g++.dg/gomp/pr94477.C: New test.

libgcc: avoid mmap/munmap hooks in split-stack code on GNU/Linux

* generic-morestack.c: On GNU/Linux use __mmap/__munmap rather
than mmap/munmap, to avoid hooks.

x86: Mark scratch operand in ssse3_pshufbv8qi3 as earlyclobber

commit 16ed2601ad0a4aa82f11e9df86ea92183f94f979
Author: H.J. Lu <hongjiu.lu@intel.com>
Date:   Wed May 15 15:26:19 2019 +0000

    i386: Emulate MMX pshufb with SSE version

has

+(define_insn_and_split "ssse3_pshufbv8qi3"
+  [(set (match_operand:V8QI 0 "register_operand" "=y,x,Yv")
+  (unspec:V8QI [(match_operand:V8QI 1 "register_operand" "0,0,Yv")
+           (match_operand:V8QI 2 "register_mmxmem_operand" "ym,x,Yv")]
+          UNSPEC_PSHUFB))
+   (clobber (match_scratch:V4SI 3 "=X,x,Yv"))]
                                       ^^^  These are not earlyclobber.
+  "(TARGET_MMX || TARGET_MMX_WITH_SSE) && TARGET_SSSE3"
+  "@
+   pshufb\t{%2, %0|%0, %2}
+   #
+   #"
+  "TARGET_MMX_WITH_SSE && reload_completed"
+  [(set (match_dup 3) (match_dup 5))
+   (set (match_dup 3)
+  (and:V4SI (match_dup 3) (match_dup 2)))
+   (set (match_dup 0)
+  (unspec:V16QI [(match_dup 1) (match_dup 4)] UNSPEC_PSHUFB))]

If input register operand 2 is dead after this insn, RA may choose it
as the scratch operand.  Since the scratch isn't marked as earlyclobber,
operand 2 becomes unused after the split and then gets optimized out.
Marking the scratch operand as earlyclobber fixes the issue.

gcc/

PR target/94467
* config/i386/sse.md (ssse3_pshufbv8qi3): Mark scratch operand
as earlyclobber.

gcc/testsuite/

PR target/94467
* gcc.target/i386/pr94467-1.c: New test.
* gcc.target/i386/pr94467-2.c: Likewise.

Fix va-arg-22.c at -O1 on m32r.

PR rtl-optimization/92264
* config/m32r/m32r.c (m32r_output_block_move): Properly account for
post-increment addressing of source operands as well as residuals
when computing any adjustments to the input pointer.

i386: Fix up handling of OPTION_MASK_ISA_MMX builtins [PR94461]

In https://gcc.gnu.org/ml/gcc-patches/2017-10/msg00576.html the builtin
handling was changed so that OPTION_MASK_ISA_MMX | OPTION_MASK_ISA_SSE
etc. in i386-builtin.def means we require both mmx and sse, not just one of
those, and later on for other option combinations a very similar rule has
been clarified, with a few exceptions that ix86_expand_builtin lists
(SSE | 3DNOW_A, SSE4_2 | CRC32 and FMA | FMA4 are one or the other).
The above mentioned patch also added OPTION_MASK_ISA_MMX to a few insns
that in the ISA documents are documented e.g. only requiring SSE2 or SSSE3
etc. CPUID, but because those builtins take or return V2SI or similar
MMX-ish arguments, we can't really support those builtins in functions that
have MMX disabled.
Now, during the TARGET_MMX_WITH_SSE changes,
https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01479.html
and
https://gcc.gnu.org/ml/gcc-patches/2019-05/msg01084.html
actually changed this; it added | OPTION_MASK_ISA_SSE2 to builtins
that were formerly OPTION_MASK_ISA_MMX only, but didn't touch the builtins
that were already using OPTION_MASK_ISA_SSE2 | OPTION_MASK_ISA_MMX
for something different (both options must be enabled).
This causes e.g. ICE on the following testcase, because the builtins are
now enabled even with just -mmmx -mno-sse2, even when they (those changed in
2017) require SSE2.
The following patch instead reverts the above two 2019-ish changes (except
for header/testsuite changes), and instead treats OPTION_MASK_ISA_MMX
requirement in bdesc/.isa specially, as being satisfied by either
TARGET_MMX (no changes really needed for that), or by TARGET_MMX_WITH_SSE.
This achieves what the two 2019-ish patches want to do, that the
OPTION_MASK_ISA_MMX only builtins are enabled not just with -mmmx, but also
with -m64 -msse2, and for the other builtins that require MMX and something
else will either require -mmmx and that some other ISA, or -m64 -msse2 and
that other ISA, but -mmmx will not enable builtins that need something more
than OPTION_MASK_ISA_MMX only.
The i386-builtins.c changes that aren't reversion of the two patches try to
make sure that in .isa we still record OPTION_MASK_ISA_MMX for builtins that
have that requirement, so that it is in the end only ix86_expand_builtin
that decides if the builtin is ok or not and the rest of code just decides
if it is the right time to declare the builtin already or if it should be
deferred.
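
The rule described above can be sketched as a small bitmask check. This is an illustration only, assuming made-up bit values and a hypothetical Target struct and builtin_usable function. It models TARGET_MMX_WITH_SSE as "64-bit and SSE2 enabled", and it ignores the one-or-the-other exceptions (SSE | 3DNOW_A, SSE4_2 | CRC32, FMA | FMA4) that ix86_expand_builtin handles.

```cpp
#include <cstdint>

// Hypothetical stand-ins for OPTION_MASK_ISA_*; the real values differ.
constexpr std::uint64_t ISA_MMX   = 1u << 0;
constexpr std::uint64_t ISA_SSE2  = 1u << 1;
constexpr std::uint64_t ISA_SSSE3 = 1u << 2;

// Hypothetical description of the options in effect for a function.
struct Target {
  std::uint64_t enabled_isa;
  bool is_64bit;
  bool mmx() const { return (enabled_isa & ISA_MMX) != 0; }
  // Assumption: models TARGET_MMX_WITH_SSE as -m64 together with -msse2.
  bool mmx_with_sse() const {
    return is_64bit && (enabled_isa & ISA_SSE2) != 0;
  }
};

// BISA is the set of ISAs a builtin requires; all of them must be enabled,
// except that an MMX requirement may be satisfied by MMX-with-SSE instead.
bool builtin_usable(std::uint64_t bisa, const Target &t)
{
  if (t.mmx_with_sse() && !t.mmx() && (bisa & ISA_MMX) != 0) {
    bisa &= ~ISA_MMX;   // MMX requirement is met through SSE2 ...
    bisa |= ISA_SSE2;   // ... so require SSE2 in its place
  }
  return (bisa & t.enabled_isa) == bisa;
}
```

In this model a builtin that needs MMX and SSE2 is rejected with only -mmmx enabled, while an MMX-only builtin is accepted either with -mmmx or with -m64 -msse2.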

2020-04-03 Jakub Jelinek <jakub@redhat.com>

PR target/94461
* config/i386/i386-expand.c (ix86_expand_builtin): If
TARGET_MMX_WITH_SSE without TARGET_MMX and bisa contains
OPTION_MASK_ISA_MMX, clear OPTION_MASK_ISA_MMX and set
OPTION_MASK_ISA_SSE2 in bisa. Revert 2019-05-17 and 2019-05-15
changes.
* config/i386/i386-builtins.c (def_builtin): If mask includes
OPTION_MASK_ISA_MMX and TARGET_MMX_WITH_SSE, consider it satisfied.
(ix86_add_new_builtins): For TARGET_64BIT, consider
OPTION_MASK_ISA_SSE2 enabled in isa as satisfying OPTION_MASK_ISA_MMX
requirement.
(ix86_init_tm_builtins): If TARGET_MMX_WITH_SSE consider
OPTION_MASK_ISA_MMX as satisfied.
(bdesc_tm): Revert 2019-05-15 changes.
(ix86_init_mmx_sse_builtins): Likewise.
* config/i386/i386-builtin.def: Likewise.

* gcc.target/i386/pr94461.c: New test.

c++: alias template and parameter packs (PR91966).

In this testcase, when we do a pack expansion of count_better_mins<nums>,
nums appears both in the definition of count_better_mins and as its template
argument.  The intent is that we get an expansion over pairs of elements of
the pack, i.e. less<2,2>, less<2,7>, less<7,2>, ....  But if we substitute
into the definition of count_better_mins when parsing the template, we end
up with sum<less<nums,nums>...>, which never gives us less<2,7>.  We could
deal with this by somehow marking up the use of 'nums' as an argument for
'num', but it's simpler to mark the alias as complex, so we need to
instantiate it later with all its arguments rather than replace it early
with its expansion.

gcc/cp/ChangeLog
2020-04-03  Jason Merrill  <jason@redhat.com>

PR c++/91966
* pt.c (complex_pack_expansion_r): New.
(complex_alias_template_p): Use it.

i386: Fix vph{add,subs?}[wd] 256-bit AVX2 RTL patterns [PR94460]

The following testcase is miscompiled, because the AVX2 patterns don't
describe correctly what the insn does.  E.g. the vphaddd instruction with
%ymm* operands (the second pattern), as per:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_hadd_epi32&expand=2941
does { a0+a1, a2+a3, b0+b1, b2+b3, a4+a5, a6+a7, b4+b5, b6+b7 }
but our RTL pattern did
     { a0+a1, a2+a3, a4+a5, a6+a7, b0+b1, b2+b3, b4+b5, b6+b7 }
where the first and last 64 bits are the same and two middle 64 bits
swapped.
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm256_hadd_epi16&expand=2939
similarly, insn does:
     { a0+a1, a2+a3, a4+a5, a6+a7, b0+b1, b2+b3, b4+b5, b6+b7,
       a8+a9, a10+a11, a12+a13, a14+a15, b8+b9, b10+b11, b12+b13, b14+b15 }
but RTL pattern did
     { a0+a1, a2+a3, a4+a5, a6+a7, a8+a9, a10+a11, a12+a13, a14+a15,
       b0+b1, b2+b3, b4+b5, b6+b7, b8+b9, b10+b11, b12+b13, b14+b15 }
again, first and last 64 bits are the same and the two middle 64 bits
swapped.
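
As a rough illustration of the 32-bit case, the scalar model below computes both element orderings listed above: the per-128-bit-lane result of the instruction and the ordering the old RTL pattern described. It is a plain C++ sketch with no intrinsics; the names V8SI, hadd_epi32_model and hadd_epi32_old_pattern are made up for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using V8SI = std::array<std::int32_t, 8>;

// What the instruction does: each 128-bit lane (4 elements) is handled
// on its own, taking two sums from A and then two sums from B.
V8SI hadd_epi32_model(const V8SI &a, const V8SI &b)
{
  V8SI r{};
  for (std::size_t lane = 0; lane < 2; ++lane) {
    const std::size_t base = lane * 4;
    r[base + 0] = a[base + 0] + a[base + 1];
    r[base + 1] = a[base + 2] + a[base + 3];
    r[base + 2] = b[base + 0] + b[base + 1];
    r[base + 3] = b[base + 2] + b[base + 3];
  }
  return r;
}

// What the old RTL pattern described: all four sums from A, then all
// four sums from B, ignoring the lane split.
V8SI hadd_epi32_old_pattern(const V8SI &a, const V8SI &b)
{
  V8SI r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i]     = a[2 * i] + a[2 * i + 1];
    r[4 + i] = b[2 * i] + b[2 * i + 1];
  }
  return r;
}
```

Comparing the two results element by element shows elements 0-1 and 6-7 agreeing, with elements 2-3 and 4-5 exchanged, which is the swap of the two middle 64-bit chunks.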

2020-04-03  Jakub Jelinek  <jakub@redhat.com>

PR target/94460
* config/i386/sse.md (avx2_ph<plusminus_mnemonic>wv16hi3,
avx2_ph<plusminus_mnemonic>dv8si3): Fix up RTL pattern to do
second half of first lane from first lane of second operand and
first half of second lane from second lane of first operand.

* gcc.target/i386/avx2-pr94460.c: New test.

c++: Add test for PR c++/93211

The fix for PR c++/90711 also fixed this PR.

gcc/testsuite/ChangeLog:

PR c++/93211
PR c++/90711
* g++.dg/template/koenig11.C: New test.

arm: MVE: Fix unintended change to tests

When committing my last patch I accidentally removed -mfpu=auto from the following tests. This puts it back.

testsuite/ChangeLog:
2020-04-03 Andre Vieira <andre.simoesdiasvieira@arm.com>

* gcc.target/arm/mve/intrinsics/mve_vector_float.c: Put -mfpu=auto back.
* gcc.target/arm/mve/intrinsics/mve_vector_float1.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_float2.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_int.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_int1.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_int2.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_uint.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_uint1.c: Likewise.
* gcc.target/arm/mve/intrinsics/mve_vector_uint2.c: Likewise.

Testing Done:
@IP: I assert this is almost no risk.

Reviewed at http://pdtlreviewboard.cambridge.arm.com/r/12880/

arm: Do not process rest of MVE header file after unsupported error

This patch makes sure the rest of the header file is not parsed if MVE is not
supported.  The user should not be including this file if MVE is not supported,
nevertheless making sure it doesn't parse the rest of the header file will
save the user from a huge error output that would be rather useless.

gcc/ChangeLog:
2020-04-03  Andre Vieira  <andre.simoesdiasvieira@arm.com>

        * config/arm/arm_mve.h: Condition the header file on __ARM_FEATURE_MVE.

AArch64: Fix options canonicalization for assembler

It is currently impossible to use fp16 on any architecture higher than Armv8.3-a
due to a bug in options canonicalization.  This bug results in the fp16 flag not
being emitted in the assembly when it should have been.

This is caused by a complicated architectural requirement at Armv8.4-a.  On
Armv8.2-a and Armv8.3-a fp16fml is an optional extension and turning it on turns
on both fp and fp16.  However starting with Armv8.4-a fp16fml is mandatory if
fp16 is available, otherwise it's optional.

In short this means that to enable fp16fml the smallest option that needs to
be passed to the assembler is Armv8.4-a+fp16.

The fix in this patch takes into account that an option may be on by default in
an architecture, but that not all the bits required to use it are on by default
in that architecture.  In such cases the difference between the two is still
emitted to the assembler.
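
The idea can be sketched with a three-entry extension table. This is a simplified model under stated assumptions, not the code in aarch64_get_extension_string_for_isa_flags: the FL_* bit values, the Extension struct, the table contents and extension_string are all hypothetical, and the table covers only fp, fp16 and fp16fml.

```cpp
#include <cstdint>
#include <string>

// Hypothetical feature bits; the real AARCH64_FL_* values differ.
constexpr std::uint64_t FL_FP     = 1u << 0;
constexpr std::uint64_t FL_F16    = 1u << 1;
constexpr std::uint64_t FL_F16FML = 1u << 2;

struct Extension {
  const char *name;
  std::uint64_t canonical;  // bit that names the extension
  std::uint64_t flags_on;   // further bits that enabling it turns on
};

// Ordered so that extensions implying the most bits come first.
const Extension extensions[] = {
  {"fp16fml", FL_F16FML, FL_FP | FL_F16},
  {"fp16",    FL_F16,    FL_FP},
  {"fp",      FL_FP,     0},
};

std::string extension_string(std::uint64_t isa_flags,
                             std::uint64_t default_flags)
{
  std::uint64_t bits = isa_flags;
  for (const Extension &e : extensions) {
    const std::uint64_t needed = e.canonical | e.flags_on;
    if (((bits | default_flags) & needed) != needed)
      continue;
    // If the canonical bit is on by default, nothing is printed for it,
    // so only fold away implied bits that are on by default as well.
    // Otherwise printing "+name" turns on every implied bit.
    const std::uint64_t fold = (e.canonical & default_flags) != 0
                                 ? (e.flags_on & default_flags)
                                 : e.flags_on;
    bits = (bits & ~fold) | e.canonical;
  }
  // Print only what the architecture does not already enable.
  std::string out;
  for (const Extension &e : extensions)
    if ((bits & e.canonical) != 0 && (default_flags & e.canonical) == 0)
      out += std::string("+") + e.name;
  return out;
}
```

With defaults modelling Armv8.4-a (fp and fp16fml on, fp16 off) and fp16 requested, the sketch yields "+fp16". With defaults modelling Armv8.2-a (only fp on) and fp16fml requested, it yields "+fp16fml".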

gcc/ChangeLog:

PR target/94396
* common/config/aarch64/aarch64-common.c
(aarch64_get_extension_string_for_isa_flags): Handle default flags.

gcc/testsuite/ChangeLog:

PR target/94396
* gcc.target/aarch64/options_set_11.c: New test.
* gcc.target/aarch64/options_set_12.c: New test.
* gcc.target/aarch64/options_set_13.c: New test.
* gcc.target/aarch64/options_set_14.c: New test.
* gcc.target/aarch64/options_set_15.c: New test.
* gcc.target/aarch64/options_set_16.c: New test.
* gcc.target/aarch64/options_set_17.c: New test.
* gcc.target/aarch64/options_set_18.c: New test.
* gcc.target/aarch64/options_set_19.c: New test.
* gcc.target/aarch64/options_set_20.c: New test.
* gcc.target/aarch64/options_set_21.c: New test.
* gcc.target/aarch64/options_set_22.c: New test.
* gcc.target/aarch64/options_set_23.c: New test.
* gcc.target/aarch64/options_set_24.c: New test.
* gcc.target/aarch64/options_set_25.c: New test.
* gcc.target/aarch64/options_set_26.c: New test.

middle-end/94465 - handle released SSA names in array_ref_low_bound

array_ref_low_bound is used in dumping ARRAY_REFs which in turn
is called when basic blocks are deleted. cleanup_control_flow_pre
consciously decides to remove unreachable basic-blocks in arbitrary
order so the following makes array_ref_low_bound forgiving in the
case the SSA name with the index definition has been released
already.

2020-04-03 Richard Biener <rguenther@suse.de>

PR middle-end/94465
* tree.c (array_ref_low_bound): Deal with released SSA names
in index position.

Improve svn-rev to search for pattern at line beginning.

* gcc-git-customization.sh: Search for the pattern
at line beginning only.

amdgcn: Support unordered floating-point comparison operators

2020-04-03 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/
* config/gcn/gcn.c (print_operand): Handle unordered comparison
operators.
* config/gcn/predicates.md (gcn_fp_compare_operator): Add unordered
comparison operators.

libstdc++: Fix std::to_address for debug iterators (PR 93960)

It should be valid to use std::to_address on a past-the-end iterator,
but the debug mode iterators do a check for dereferenceable in their
operator->(). That check is generally useful, so rather than remove it
this changes std::__to_address to identify a debug mode iterator and
use base().operator->() to skip the check.
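
A minimal sketch of the dispatch, using toy types rather than the libstdc++ ones: DebugIterBase, PlainIntIter, CheckedIntIter and to_address_sketch are all hypothetical names, and the checked iterator throws where the real debug mode would report an error.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical tag standing in for the debug-mode iterator base class.
struct DebugIterBase {};

// Unchecked iterator over int.
class PlainIntIter {
  int *p_;
public:
  explicit PlainIntIter(int *p) : p_(p) {}
  int *operator->() const { return p_; }
  bool operator==(const PlainIntIter &o) const { return p_ == o.p_; }
};

// Checked wrapper: operator-> refuses a past-the-end position.
class CheckedIntIter : public DebugIterBase {
  PlainIntIter cur_, end_;
public:
  CheckedIntIter(PlainIntIter cur, PlainIntIter end) : cur_(cur), end_(end) {}
  PlainIntIter base() const { return cur_; }
  int *operator->() const {
    if (cur_ == end_)
      throw std::logic_error("iterator not dereferenceable");
    return cur_.operator->();
  }
};

// Raw pointers are already addresses.
template <typename T>
T *to_address_sketch(T *p) { return p; }

// Debug iterators: go through base() so the check is skipped.
template <typename It>
auto to_address_sketch(const It &it)
  -> typename std::enable_if<std::is_base_of<DebugIterBase, It>::value,
                             decltype(it.base().operator->())>::type
{ return to_address_sketch(it.base().operator->()); }

// Everything else: use the iterator's own operator->.
template <typename It>
auto to_address_sketch(const It &it)
  -> typename std::enable_if<!std::is_base_of<DebugIterBase, It>::value,
                             decltype(it.operator->())>::type
{ return to_address_sketch(it.operator->()); }
```

Calling to_address_sketch on a past-the-end CheckedIntIter returns the one-past-the-end pointer, whereas calling its operator-> directly would throw.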

PR libstdc++/93960
* include/bits/ptr_traits.h (__to_address): Add special case for debug
iterators, to avoid dereferenceable check.
* testsuite/20_util/to_address/1_neg.cc: Adjust dg-error line number.
* testsuite/20_util/to_address/debug.cc: New test.

Revert "[nvptx, libgomp] Update pr85381-{2,4}.c test-cases" [PR89713, PR94392]

In response to PR94392 commit 75efe9cb1f8938a713ce540dc3b27bc2afcd3fae
"c/94392 - only enable -ffinite-loops for C++", this reverts PR89713
commit 00908992f2a78f213d227aea8dbab014a1361df0, as apparently now again
"empty oacc loops are" no longer "removed before expand".

libgomp/
PR tree-optimization/89713
PR c/94392
* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Again expect
'bar.sync'.
* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise.

Fix PR94443 with gsi_insert_seq_before [PR94443]

This patch is to fix the stupid mistake by using
gsi_insert_seq_before instead of gsi_insert_before.

BTW, the regression testing on one x86_64 machine from CFarm is
unable to reveal it (I guess due to native arch sandybridge?), so I
specified additional option -march=znver2 and verified the coverage.

Bootstrapped/regtested on powerpc64le-linux-gnu (P9) and
x86_64-pc-linux-gnu, also verified the fail cases in related PRs.

2020-04-03  Kewen Lin  <linkw@gcc.gnu.org>

gcc/
    PR tree-optimization/94443
    * tree-vect-loop.c (vectorizable_live_operation): Use
    gsi_insert_seq_before to replace gsi_insert_before.

gcc/testsuite/
    PR tree-optimization/94443
    * gcc.dg/vect/pr94443.c: New test.

ICF: compare type attributes for gimple_call_fntypes.

PR ipa/94445
* ipa-icf-gimple.c (func_checker::compare_gimple_call):
Compare type attributes for gimple_call_fntypes.

S/390 zTPF: Handle skip trace addresses when unwinding

Check for and handle new skip trace addresses when unwinding on zTPF.

libgcc/ChangeLog:

2020-04-03 Jim Johnston <jjohnst@us.ibm.com>

* config/s390/tpf-unwind.h (MIN_PATRANGE, MAX_PATRANGE)
(TPFRA_OFFSET): Macros removed.
(CP_CNF, cinfc_fast, CINFC_CMRESET, CINTFC_CMCENBKST)
(CINTFC_CMCENBKED, ICST_CRET, ICST_SRET, LOWCORE_PAGE3_ADDR)
(PG3_SKIPPING_OFFSET): New macros.
(__isPATrange): Use cinfc_fast for the check.
(__isSkipResetAddr): New function.
(s390_fallback_frame_state): Check for skip trace addresses. Use
either ICST_CRET or ICST_SRET to calculate return address
location.
(__tpf_eh_return): Handle skip trace addresses.

Daily bump.

Fix some comment typos in alias.c.

2020-04-02 Sandra Loosemore <sandra@codesourcery.com>

* alias.c (get_alias_set): Fix comment typos.