
VAX: Use a mode with `const_double_zero' expressions

For predictable semantics propagate the mode from operands referred to by
the FP substitution to the `const_double_zero' expressions used with the
associated condition code calculation. Use an iterator to make copies
of the FP substitution across the FP modes supported as the substitution
now has to match the mode of the operands.

gcc/
* config/vax/vax.md (subst_f<cc>): Add mode to operands and
`const_double_zero'.

PDP11: Use a mode with `const_double_zero' expressions

For predictable semantics propagate the mode from operands referred to by
FP substitutions to the `const_double_zero' expressions used with the
associated condition code calculation, resulting in the following update
to insn-emit.c code produced for the `pdp11-aout' target (with machine
description line numbering change noise removed):

@@ -1514,7 +1514,7 @@
gen_rtx_COMPARE (CCmode,
gen_rtx_ABS (DFmode,
operand1),
- CONST_DOUBLE_ATOF ("0", VOIDmode))),
+ CONST_DOUBLE_ATOF ("0", DFmode))),
gen_rtx_SET (operand0,
gen_rtx_ABS (DFmode,
copy_rtx (operand1)))));
@@ -1555,7 +1555,7 @@
gen_rtx_COMPARE (CCmode,
gen_rtx_NEG (DFmode,
operand1),
- CONST_DOUBLE_ATOF ("0", VOIDmode))),
+ CONST_DOUBLE_ATOF ("0", DFmode))),
gen_rtx_SET (operand0,
gen_rtx_NEG (DFmode,
copy_rtx (operand1)))));
@@ -1790,7 +1790,7 @@
gen_rtx_MULT (DFmode,
operand1,
operand2),
- CONST_DOUBLE_ATOF ("0", VOIDmode))),
+ CONST_DOUBLE_ATOF ("0", DFmode))),
gen_rtx_SET (operand0,
gen_rtx_MULT (DFmode,
copy_rtx (operand1),
@@ -1942,7 +1942,7 @@
gen_rtx_DIV (DFmode,
operand1,
operand2),
- CONST_DOUBLE_ATOF ("0", VOIDmode))),
+ CONST_DOUBLE_ATOF ("0", DFmode))),
gen_rtx_SET (operand0,
gen_rtx_DIV (DFmode,
copy_rtx (operand1),

Provide a new iterator to provide copies of FP substitutions across the
FP modes supported as the substitutions now need to match the mode of
the operands.

gcc/
* config/pdp11/pdp11.md (PDPfp): New mode iterator.
(fcc_cc, fcc_ccnz): Use it. Add mode to `const_double_zero' and
operands.

RTL: Update `const_double_zero' handling for mode and callable insns

Handle machine mode specification with `const_double_zero' and handle
the rtx with callable code produced from named insns.  Complementing
commit 20ab43b5cad6 ("RTL: Add `const_double_zero' syntactic rtx") and
removing a commit c60d0736dff7 ("PDP11: Use `const_double_zero' to
express double zero constant") build regression observed with the
`pdp11-aout' target:

genemit: Internal error: abort in gen_exp, at genemit.c:202
make[2]: *** [Makefile:2427: s-emit] Error 1

where a:

(const_double 0 [0] 0 [0] 0 [0] 0 [0])

rtx coming from:

(parallel [
        (set (reg:CC 16)
            (compare:CC (abs:DF (match_operand:DF 1 ("general_operand") ("0,0")))
                (const_double 0 [0] 0 [0] 0 [0] 0 [0])))
        (set (match_operand:DF 0 ("nonimmediate_operand") ("=fR,Q"))
            (abs:DF (match_dup 1)))
    ])

and ultimately `(const_double_zero)' referred to in a named RTL insn cannot
be interpreted.  Handle the rtx then by supplying the constant 0 double
operand requested, resulting in the following update to insn-emit.c code
produced for the `pdp11-aout' target, relative to before the triggering
commit:

@@ -1514,7 +1514,7 @@ gen_absdf2_cc (rtx operand0 ATTRIBUTE_UN
gen_rtx_COMPARE (CCmode,
gen_rtx_ABS (DFmode,
operand1),
- const0_rtx)),
+ CONST_DOUBLE_ATOF ("0", VOIDmode))),
gen_rtx_SET (operand0,
gen_rtx_ABS (DFmode,
copy_rtx (operand1)))));
@@ -1555,7 +1555,7 @@ gen_negdf2_cc (rtx operand0 ATTRIBUTE_UN
gen_rtx_COMPARE (CCmode,
gen_rtx_NEG (DFmode,
operand1),
- const0_rtx)),
+ CONST_DOUBLE_ATOF ("0", VOIDmode))),
gen_rtx_SET (operand0,
gen_rtx_NEG (DFmode,
copy_rtx (operand1)))));
@@ -1790,7 +1790,7 @@ gen_muldf3_cc (rtx operand0 ATTRIBUTE_UN
gen_rtx_MULT (DFmode,
operand1,
operand2),
- const0_rtx)),
+ CONST_DOUBLE_ATOF ("0", VOIDmode))),
gen_rtx_SET (operand0,
gen_rtx_MULT (DFmode,
copy_rtx (operand1),
@@ -1942,7 +1942,7 @@ gen_divdf3_cc (rtx operand0 ATTRIBUTE_UN
gen_rtx_DIV (DFmode,
operand1,
operand2),
- const0_rtx)),
+ CONST_DOUBLE_ATOF ("0", VOIDmode))),
gen_rtx_SET (operand0,
gen_rtx_DIV (DFmode,
copy_rtx (operand1),

This does not (yet) remove VOIDmode CONST_DOUBLE use, as it is up to
individual machine descriptions to choose.

gcc/
* genemit.c (gen_exp) <CONST_DOUBLE>: Handle `const_double_zero'
rtx.
* read-rtl.c (rtx_reader::read_rtx_code): Handle machine mode
with `const_double_zero'.
* doc/rtl.texi (Constant Expression Types): Document it.

tree-cfg: Allow enum types as result of POINTER_DIFF_EXPR [PR98556]

As conversions between signed integers and signed enums with the same
precision are useless in GIMPLE, it seems strange that we require that
POINTER_DIFF_EXPR result must be INTEGER_TYPE.

If we really wanted to require that, we'd need to change the gimplifier
to ensure that, which isn't the case for the following testcase.
What is going on during the gimplification is that when we have the
(enum T) (p - q) cast, it is stripped through
      /* Strip away as many useless type conversions as possible
         at the toplevel.  */
      STRIP_USELESS_TYPE_CONVERSION (*expr_p);
and when the MODIFY_EXPR is gimplified, the *to_p has enum T type,
while *from_p has intptr_t type and as there is no conversion in between,
we just create GIMPLE_ASSIGN from that.

2021-01-09  Jakub Jelinek  <jakub@redhat.com>

PR c++/98556
* tree-cfg.c (verify_gimple_assign_binary): Allow lhs of
POINTER_DIFF_EXPR to be any integral type.

* c-c++-common/pr98556.c: New test.
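
A minimal sketch of the kind of source that reaches this path, assuming an
LP64 target where `long' and `ptrdiff_t' have the same precision.  The enum
`T', its enumerator and the function name `pointer_distance' are
illustrative and not copied from the new test; an enumerator outside the
range of `int' relies on a GNU extension before C23.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical enum: the out-of-int-range enumerator makes the compiler
   pick a wide signed underlying type (GNU extension before C23).  */
enum T { T_LOWEST = LONG_MIN };

/* When the precisions match, the cast to enum T is a useless conversion
   in GIMPLE, so the pointer subtraction can end up with an enum-typed
   result.  */
enum T
pointer_distance (char *p, char *q)
{
  return (enum T) (p - q);
}
```

Compiling a function of this shape is what used to trip the GIMPLE
verifier; at run time it simply returns the element distance.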

vregs: Fix up instantiate_virtual_regs_in_insn for asm goto with outputs [PR98603]

If an asm insn fails constraint checking during vregs, it is just deleted.
We don't delete asm goto though because of the edges to the labels, so
instantiate_virtual_regs_in_insn would just remove the inputs and their
constraints, the pattern etc.
This worked fine when asm goto couldn't have output operands, but causes
ICEs later on when it has more than one output (and furthermore doesn't
really remove the problematic outputs). The problem is that
for multiple outputs we have a PARALLEL with multiple ASM_OPERANDS, but
those must use the same ASM_OPERANDS_INPUT_VEC etc., but the code was
adjusting just one.

The following patch turns invalid asm goto into a bare
asm goto ("" : : : : lab, lab2, lab3);
i.e. no inputs/outputs/clobbers, just the labels.

2021-01-09 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/98603
* function.c (instantiate_virtual_regs_in_insn): For asm goto
with impossible constraints, drop all SETs, CLOBBERs, drop PARALLEL
if any, set ASM_OPERANDS mode to VOIDmode and change
ASM_OPERANDS_OUTPUT_CONSTRAINT and ASM_OPERANDS_OUTPUT_IDX.

* gcc.target/i386/pr98603.c: New test.
* gcc.target/aarch64/pr98603.c: New test.
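
For illustration only, the source-level counterpart of the bare form the
invalid asm goto is reduced to might look like the sketch below.  The
function and label names are hypothetical, and the patch itself works on
RTL rather than on source like this.

```c
/* A bare asm goto: empty template, no outputs, inputs or clobbers, only
   a label.  The empty template never jumps, so control falls through.  */
int
bare_asm_goto (void)
{
  __asm__ goto ("" : : : : taken);
  return 0;
taken:
  return 1;
}
```

The label list keeps the CFG edges to the labels intact, which is why the
insn is reduced to this form rather than deleted.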

final: accept markers at line 0

Back when I introduced debug markers, I seem to have been under the
impression that location line 0 would only ever occur for unknown and
builtin locations.

Though line 0 never comes up in normal processing of source files, and
debug info formats often cannot represent them, I suppose there's no
need to preemptively discard them during final.

for gcc/ChangeLog

PR debug/97714
* final.c (notice_source_line): Narrow down the condition to
skip a line-0 marker.

for gcc/testsuite/ChangeLog

PR debug/97714
* gcc.dg/debug/pr97714.c: New.

Daily bump.

ipa-modref: avoid linebreak split in debug print

* ipa-modref.c (merge_call_side_effects): Fix
linebreak split by reordering two print calls.

IBM Z: Fix constraints in vpdi patterns

The destination register is only partially overwritten, so + should be
used instead of =.

gcc/ChangeLog:

2021-01-08 Ilya Leoshkevich <iii@linux.ibm.com>

* config/s390/vector.md (*tf_to_fprx2_0): Rename from
"*mov_tf_to_fprx2_0" for consistency, fix constraint.
(*tf_to_fprx2_1): Rename from "*mov_tf_to_fprx2_1" for
consistency, fix constraint.

x86-64: Require lp64 for PR target/98482 tests

Require lp64 for PR target/98482 tests since -mcmodel=large isn't
supported for x32.

PR target/98482
* gcc.target/i386/pr98482-1.c: Require lp64.
* gcc.target/i386/pr98482-2.c: Likewise.

IBM Z: Introduce __LONG_DOUBLE_VX__ macro

Give end users the opportunity to find out whether long doubles are
stored in floating-point register pairs or in vector registers, so that
they could fine-tune their asm statements.

gcc/ChangeLog:

2020-12-14 Ilya Leoshkevich <iii@linux.ibm.com>

* config/s390/s390-c.c (s390_def_or_undef_macro): Accept
callables instead of mask values.
(struct target_flag_set_p): New predicate.
(s390_cpu_cpp_builtins_internal): Define or undefine
__LONG_DOUBLE_VX__ macro.

2020-12-14 Ilya Leoshkevich <iii@linux.ibm.com>

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/long-double-vx-macro-off-on.c: New test.
* gcc.target/s390/vector/long-double-vx-macro-on-off.c: New test.
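
A hedged sketch of how user code might test the new macro.  The function
name and the returned strings are illustrative assumptions; only
`__LONG_DOUBLE_VX__' comes from this patch, and `__s390__' is the usual
target macro.

```c
/* Report where long doubles are expected to live, based on predefined
   macros only.  On targets other than IBM Z the macro is never defined.  */
const char *
long_double_storage (void)
{
#if !defined (__s390__)
  return "not an IBM Z target";
#elif defined (__LONG_DOUBLE_VX__)
  return "vector registers";
#else
  return "floating-point register pairs";
#endif
}
```

The same preprocessor test could guard alternative asm statements with
different register constraints.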

Tweak dg-prune-output regex for out-of-build-tree contexts

libstdc++-v3/

* testsuite/20_util/bind/ref_neg.cc: Tweak the
dg-prune-output regex for out-of-build-tree contexts.

c++: ICE with constexpr call that returns a PMF [PR98551]

We shouldn't do replace_result_decl after evaluating a call that returns
a PMF because PMF temporaries aren't wrapped in a TARGET_EXPR (and so we
can't trust ctx->object), and PMF initializers can't be self-referential
anyway, so replace_result_decl would always be a no-op.

To that end, this patch changes the relevant AGGREGATE_TYPE_P test to
CLASS_TYPE_P, which should rule out PMFs (as well as arrays, which we
can't return and therefore won't see here). This fixes an ICE from the
sanity check in replace_result_decl in the below testcase during
constexpr evaluation of the call f() in the initializer g(f()).

gcc/cp/ChangeLog:

PR c++/98551
* constexpr.c (cxx_eval_call_expression): Check CLASS_TYPE_P
instead of AGGREGATE_TYPE_P before calling replace_result_decl.

gcc/testsuite/ChangeLog:

PR c++/98551
* g++.dg/cpp0x/constexpr-pmf2.C: New test.

c++: Fix access checking of scoped non-static member [PR98515]

In the first testcase below, we incorrectly reject the use of the
protected non-static member A::var0 from C<int>::g() because
check_accessibility_of_qualified_id, at template parse time, determines
that the access doesn't go through 'this'.  (This happens because the
dependent base B<T> of C<T> doesn't have a binfo object, so it appears
to DERIVED_FROM_P that A is not an indirect base of C<T>.)  From there
we create the corresponding deferred access check, which we then
perform at instantiation time and which (expectedly) fails.

The problem ultimately seems to be that we can't in general determine
whether a use of a scoped non-static member goes through 'this' until
instantiation time, as the second testcase below illustrates.  So this
patch makes check_accessibility_of_qualified_id punt in such situations
to avoid creating a bogus deferred access check.

gcc/cp/ChangeLog:

PR c++/98515
* semantics.c (check_accessibility_of_qualified_id): Punt if
we're checking access of a scoped non-static member inside a
class template.

gcc/testsuite/ChangeLog:

PR c++/98515
* g++.dg/template/access32.C: New test.
* g++.dg/template/access33.C: New test.

x86-64: Use R10 and R11 for profiling large model with PIC

For NO_PROFILE_COUNTERS targets, R11 is a scratch register. We can use
R10 and R11 to call mcount in large model with PIC.

gcc/

PR target/98482
* config/i386/i386.c (x86_function_profiler): Use R10 and R11
to call mcount in large model with PIC for NO_PROFILE_COUNTERS
targets.

gcc/testsuite/

PR target/98482
* gcc.target/i386/pr98482-2.c: Updated.

reset the SCEV htab after FRE in loop pipeline

When running FRE in the loop pipeline (as part of the conditionally
scheduled scalar cleanups) we have to reset the SCEV hashtable as
otherwise we can end up with stale entries and all sorts of problems.

Caught by my out-of-tree verifier for this problem.

2021-01-08 Richard Biener <rguenther@suse.de>

* tree-ssa-sccvn.c (pass_fre::execute): Reset the SCEV hash table.

fix vectorizer memleaks

This plugs two memleaks in the vectorizer.

2021-01-08 Richard Biener <rguenther@suse.de>

* tree-vect-slp.c (scalar_stmts_to_slp_tree_map_t): Fix.
(vect_build_slp_tree): On cache hit release the matched
scalar stmts vector.
* tree-vect-stmts.c (vectorizable_store): Properly free
vec_oprnds before possibly gathering them again.

tree-optimization/98544 - more permute optimization fixes

Permute nodes are not transparent to the permute of their children.
Instead we have to materialize child permutes always and in future
may treat permute nodes as the source of arbitrary permutes as
we can permute the lane permutation vector at will (as the target
supports in the end).

2021-01-08 Richard Biener <rguenther@suse.de>

PR tree-optimization/98544
* tree-vect-slp.c (vect_optimize_slp): Always materialize
permutes at a permute node.

* gcc.dg/vect/bb-slp-pr98544.c: New testcase.

x86-64: Use R10 for profiling large model

R10 is caller-saved.  Although it can be used as a static chain register,
it is preserved when calling mcount for nested functions.  Use R10 as a
scratch register to call mcount in large model.

gcc/

PR target/98482
* config/i386/i386.c (x86_function_profiler): Use R10 to call
mcount in large model.  Sorry for large model with PIC.

gcc/testsuite/

PR target/98482
* gcc.target/i386/pr98482-1.c: New test.
* gcc.target/i386/pr98482-2.c: Likewise.

i386: Fix -mcmodel= vs. target attribute [PR98585]

My patch to save/restore opts_set rather than essentially treating
global_options_set as a logical OR of whether some option has ever been
explicitly set somewhere apparently broke -mcmodel= vs. target attribute
(and, as the patch shows, some other options too).
The thing is, at least for options for which we ever test opts_set->x_*
or global_options_set.x_*, we need to save/restore them next to the
saving/restoring of the actual option values.
If an option has Save keyword or in case of TargetVariable, it is the
generic code that handles the saving and restoring of both the option
and corresponding opts_set flag automatically, for other variables
(TargetSave, or Target without Save) the backend needs to do that in the
target hook manually and in that case should save/restore both the option
values (the hooks mostly did that) and opts_set (they didn't).

As it seems much easier to let the automatic saving/restoring do the work
for us unless the saving/restoring of the option needs some specific magic,
the following patch is a result of grepping through the backend for
opts_set->x_ and global_options_set.x_ and for all such referenced
variables, grepping whether it is saved/restored including opts_set properly
in the generated options-save.c or not.

2021-01-08 Jakub Jelinek <jakub@redhat.com>

PR target/98585
* config/i386/i386.opt (ix86_cmodel, ix86_incoming_stack_boundary_arg,
ix86_pmode, ix86_preferred_stack_boundary_arg, ix86_regparm,
ix86_veclibabi_type): Remove x_ prefix, use TargetVariable instead of
TargetSave and initialize for variables with enum types.
(mfentry, mstack-protector-guard-reg=, mstack-protector-guard-offset=,
mstack-protector-guard-symbol=): Add Save.
* config/i386/i386-options.c (ix86_function_specific_save,
ix86_function_specific_restore): Don't save or restore x_ix86_cmodel,
x_ix86_incoming_stack_boundary_arg, x_ix86_pmode,
x_ix86_preferred_stack_boundary_arg, x_ix86_regparm,
x_ix86_veclibabi_type.

* gcc.target/i386/pr98585.c: New test.

aarch64: Support unpacked CNOT on SVE

This patch adds unpacked support for unconditional and
conditional CNOT. The type suffix has to be taken from
the element size rather than the container size.

gcc/
* config/aarch64/aarch64-sve.md (*cnot<mode>): Extend from
SVE_FULL_I to SVE_I.
(*cond_cnot<mode>_2, *cond_cnot<mode>_any): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/cnot_2.c: New test.
* gcc.target/aarch64/sve/cond_cnot_4.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_4_run.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_5.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_6.c: Likewise.
* gcc.target/aarch64/sve/cond_cnot_6_run.c: Likewise.

aarch64: Support conditional unpacked UXT on SVE

This patch extends the conditional UXT patterns from SVE_FULL_I
to SVE_I. It doesn't matter in this case whether the type suffix
is taken from the element size or the container size.

gcc/
* config/aarch64/aarch64-sve.md (*cond_uxt<mode>_2): Extend from
SVE_FULL_I to SVE_I.
(*cond_uxt<mode>_any): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_uxt_5.c: New test.
* gcc.target/aarch64/sve/cond_uxt_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_6.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_7.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_8.c: Likewise.
* gcc.target/aarch64/sve/cond_uxt_8_run.c: Likewise.

SVE2: Fix aarch64-sve2-acle-asm tests.

This fixes a logical inconsistency with the SVE2 ACLE tests where the SVE2 tests
are checking for SVE support in the assembler instead of SVE2.

This makes all these tests fail when the user has an SVE enabled assembler but
not an SVE2 one.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_aarch64_asm_sve2_ok): New.
* g++.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Use it.
* gcc.target/aarch64/sve2/acle/aarch64-sve2-acle-asm.exp: Likewise.

aarch64: Reimplement most vpadal intrinsics using builtins

This patch reimplements most of the vpadal intrinsics to use RTL
builtins in the normal way.
The ones that aren't converted are the int32x2_t -> int64x1_t ones as
the RTL pattern doesn't currently handle
these modes. We don't have a V1DI mode so it would need to return a
DImode value or a V2DI one with the first lane
being the result. It's not hard to do, but it would require a bit more
refactoring so we can do it separately later.

This patch hopefully improves the status quo.

The new Vwhalf mode attribute is created because the existing Vwtype
attribute maps V8QI wrongly (for this pattern) to "8h" as the
suffix rather than "4h" as needed.

gcc/
* config/aarch64/iterators.md (Vwhalf): New iterator.
* config/aarch64/aarch64-simd.md (aarch64_<sur>adalp<mode>_3):
Rename to...
(aarch64_<sur>adalp<mode>): ... This. Make more
builtin-friendly.
(<sur>sadv16qi): Adjust callsite of the above.
* config/aarch64/aarch64-simd-builtins.def (sadalp, uadalp): New
builtins.
* config/aarch64/arm_neon.h (vpadal_s8): Reimplement using
builtins.
(vpadal_s16): Likewise.
(vpadal_u8): Likewise.
(vpadal_u16): Likewise.
(vpadalq_s8): Likewise.
(vpadalq_s16): Likewise.
(vpadalq_s32): Likewise.
(vpadalq_u8): Likewise.
(vpadalq_u16): Likewise.
(vpadalq_u32): Likewise.

aarch64: Reimplement vabd* intrinsics using builtins

This patch reimplements the vabd* intrinsics using RTL builtins.
It's fairly straightforward with new builtins + arm_neon.h changes.

gcc/
* config/aarch64/aarch64-simd.md (aarch64_<su>abd<mode>_3):
Rename to...
(aarch64_<su>abd<mode>): ... This.
(<sur>sadv16qi): Adjust callsite of the above.
* config/aarch64/aarch64-simd-builtins.def (sabd, uabd): Define
builtins.
* config/aarch64/arm_neon.h (vabd_s8): Reimplement using
builtin.
(vabd_s16): Likewise.
(vabd_s32): Likewise.
(vabd_u8): Likewise.
(vabd_u16): Likewise.
(vabd_u32): Likewise.
(vabdq_s8): Likewise.
(vabdq_s16): Likewise.
(vabdq_s32): Likewise.
(vabdq_u8): Likewise.
(vabdq_u16): Likewise.
(vabdq_u32): Likewise.

aarch64: Reimplement vaba* intrinsics using builtins

This patch reimplements the vaba* arm_neon.h intrinsics using RTL
builtins that expand to proper RTL patterns
rather than using inline asm.
The implementation is fairly straightforward by defining new builtins
and using them in the header.

gcc/
* config/aarch64/aarch64-simd-builtins.def (saba, uaba): Define
builtins.
* config/aarch64/arm_neon.h (vaba_s8): Implement using builtin.
(vaba_s16): Likewise.
(vaba_s32): Likewise.
(vaba_u8): Likewise.
(vaba_u16): Likewise.
(vaba_u32): Likewise.
(vabaq_s8): Likewise.
(vabaq_s16): Likewise.
(vabaq_s32): Likewise.
(vabaq_u8): Likewise.
(vabaq_u16): Likewise.
(vabaq_u32): Likewise.

aarch64: Fix RTL patterns for UABA/SABA

Some time ago we changed the RTL representation of the (SU)ABD
instructions to a (MINUS (MAX) (MIN)) rather than a (MINUS (ABS) (ABS))
as it more correctly models the semantics.
We should do the same for the accumulation forms of these instructions:
UABA/SABA.

This patch does that and allows the new pattern to generate the unsigned
UABA form as well.
The new form also allows it to more easily be re-used to implement the
relevant arm_neon.h intrinsics in the future.

The testcase takes an -fno-tree-reassoc to work around a side-effect of
PR98581.

gcc/
* config/aarch64/aarch64-simd.md (aba<mode>_3): Rename to...
(aarch64_<su>aba<mode>): ... This. Handle uaba as well.
Change RTL pattern to match.

gcc/testsuite/
* gcc.target/aarch64/usaba_1.c: New test.
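
As a scalar model of the (MINUS (MAX) (MIN)) form and its accumulating
variant, a per-lane sketch for unsigned 8-bit elements could look like the
following.  The function names are hypothetical and the wrap-around on
accumulation is an assumption of this model, not text from the patch.

```c
#include <stdint.h>

/* Absolute difference of one lane, written as max minus min so that no
   intermediate value needs more than 8 bits.  */
uint8_t
uabd_lane (uint8_t a, uint8_t b)
{
  uint8_t hi = a > b ? a : b;
  uint8_t lo = a > b ? b : a;
  return (uint8_t) (hi - lo);
}

/* Accumulating form: add the absolute difference to the accumulator,
   wrapping modulo 256 in this model.  */
uint8_t
uaba_lane (uint8_t acc, uint8_t a, uint8_t b)
{
  return (uint8_t) (acc + uabd_lane (a, b));
}
```

Writing the difference as max minus min avoids having to reason about a
subtraction that overflows the element width before the absolute value is
taken.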

Fortran: Allow pointer deferred length associate selectors. [PR93794]

2021-01-05 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/93794
* trans-expr.c (gfc_conv_component_ref): Remove the condition
that deferred character length components only be allocatable.

gcc/testsuite/
PR fortran/93794
* gfortran.dg/deferred_character_35.f90: New test.

Fortran: Fix simplification of constructors with implied-do [PR98458]

2021-01-08 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/98458
* simplify.c (is_constant_array_expr): If an array constructor
expression has elements other than constants or structures, try
fixing the expression with gfc_reduce_init_expr. Also, if shape
is NULL, obtain the array size and set it.

gcc/testsuite/
PR fortran/98458
* gfortran.dg/implied_do_3.f90: New test.

Fix array-quals-1.c for RISC-V

RISC-V will put those variables in srodata rather than rodata.

gcc/testsuite/ChangeLog:

* gcc.dg/array-quals-1.c: Allow srodata.

RISC-V: Implement new style of architecture extension test macros.

- This patch introduces a new set of architecture extension test macros,
  which was recently accepted into riscv-c-api-doc:
  - https://github.com/riscv/riscv-c-api-doc/blob/master/riscv-c-api.md#architecture-extension-test-macro

- The legacy architecture extension test macros will be marked as
  deprecated in GCC 11, but will remain supported for 1 or 2 release cycles.

gcc/ChangeLog:

* common/config/riscv/riscv-common.c (riscv_current_subset_list): New.
* config/riscv/riscv-c.c (riscv-subset.h): New.
(INCLUDE_STRING): Define.
(riscv_cpu_cpp_builtins): Add new style architecture extension
test macros.
* config/riscv/riscv-subset.h (riscv_subset_list::begin): New.
(riscv_subset_list::end): New.
(riscv_current_subset_list): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-10.c: New.
* gcc.target/riscv/predef-11.c: New.
* gcc.target/riscv/predef-12.c: New.
* gcc.target/riscv/predef-13.c: New.
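
A small sketch of how code might probe for the M extension using either
macro style.  The function name is hypothetical; `__riscv_m' is the
new-style macro and `__riscv_mul' one of the legacy macros, and on
non-RISC-V targets neither is defined.

```c
/* Return 1 if the multiply extension is advertised by the compiler
   through either the new-style or the legacy test macro, else 0.  */
int
riscv_has_m_extension (void)
{
#if defined (__riscv_m)      /* new-style architecture extension macro */
  return 1;
#elif defined (__riscv_mul)  /* legacy macro, to be deprecated */
  return 1;
#else
  return 0;
#endif
}
```

Checking the new-style macro first keeps the code working once the legacy
macros are eventually removed.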

RISC-V: Move class riscv_subset_list and riscv_subset_t to riscv-protos.h

Preparatory work for the new style of architecture extension test macros:
we need the list in `config/riscv/riscv-c.c`, so these struct/class
declarations must move to a header file rather than stay in a local C file.

gcc/ChangeLog

* common/config/riscv/riscv-common.c (RISCV_DONT_CARE_VERSION):
Move to riscv-subset.h.
(struct riscv_subset_t): Ditto.
(class riscv_subset_list): Ditto.
* config/riscv/riscv-subset.h (RISCV_DONT_CARE_VERSION): Move
from riscv-common.c.
(struct riscv_subset_t): Ditto.
(class riscv_subset_list): Ditto.
* config/riscv/t-riscv ($(common_out_file)): Add file
dependency.

Daily bump.

c++: Fix up tsubst of BIT_CAST_EXPR [PR98329]

As the testcase shows, calling cp_build_bit_cast in tsubst_copy doesn't seem
to be a good idea, because tsubst_copy might not really make the operand
non-dependent, but as processing_template_decl can be 0,
type_dependent_expression_p will return false and then cp_build_bit_cast
assumes the type is non-NULL and non-dependent.
So, this patch just follows what is done e.g. for NOP_EXPR etc. and just
builds some tree in tsubst_copy, and only calls the semantics.c function
from tsubst_copy_and_build.

2021-01-07 Jakub Jelinek <jakub@redhat.com>

PR c++/98329
* pt.c (tsubst_copy) <case BIT_CAST_EXPR>: Don't call
cp_build_bit_cast here, instead just build_min a BIT_CAST_EXPR and set
its location.
(tsubst_copy_and_build): Handle BIT_CAST_EXPR.

* g++.dg/cpp2a/bit-cast10.C: New test.

PR middle-end/98578 - ICE warning on uninitialized VLA access

gcc/c-family/ChangeLog:

PR middle-end/98578
* c-pretty-print.c (print_mem_ref): Strip array from access type.
Avoid assuming access type's size is constant. Correct condition
guarding the printing of a parenthesis.

gcc/testsuite/ChangeLog:

PR middle-end/98578
* gcc.dg/plugin/gil-1.c: Adjust expected output.
* gcc.dg/uninit-pr98578.c: New test.

c++: Fix thinko in auto return type checking [PR98441]

This fixes a thinko in my r11-2085 patch: when I said "But only give the
!late_return_type errors when funcdecl_p, to accept e.g. auto (*fp)() = f;
in C++11" I should've done this, otherwise we give bogus errors mentioning
"function with trailing return type" when there is none.

gcc/cp/ChangeLog:

PR c++/98441
* decl.c (grokdeclarator): Move the !funcdecl_p check inside the
!late_return_type block.

gcc/testsuite/ChangeLog:

PR c++/98441
* g++.dg/cpp0x/auto55.C: New test.

c++: Add TARGET_EXPR comments

Discussing the 98469 patch and class prvalues with Jakub led me to
double-check our handling of TARGET_EXPR in constexpr.c, and add a note
about why we don't strip them in parameter initialization. And another to
clarify that we're handling an INIT_EXPR in a place we do strip them.

gcc/cp/ChangeLog:

* constexpr.c (cxx_bind_parameters_in_call): Add comment.
(cxx_eval_store_expression): Add comment.

c++: Add some conversion sanity checking.

Another change I was working on revealed that for complex numbers we were
building a ck_identity with build_conv, leading to the wrong active member
in the union being set. Rather than add another enumeration of the
appropriate conversion codes, I factored that out.

gcc/cp/ChangeLog:

* call.c (has_next): Factor out from...
(next_conversion): ...here.
(strip_standard_conversion): And here.
(is_subseq): And here.
(build_conv): Check it.
(standard_conversion): Don't call build_conv
for ck_identity.

libstdc++: Add support for C++20 barriers

Adds <barrier>

libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in: Add new header.
* include/Makefile.am (std_headers): likewise.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Add new header.
* include/std/barrier: New file.
* include/std/version: Add __cpp_lib_barrier feature test macro.
* testsuite/30_threads/barrier/1.cc: New test.
* testsuite/30_threads/barrier/2.cc: Likewise.
* testsuite/30_threads/barrier/arrive_and_drop.cc: Likewise.
* testsuite/30_threads/barrier/arrive_and_wait.cc: Likewise.
* testsuite/30_threads/barrier/arrive.cc: Likewise.
* testsuite/30_threads/barrier/completion.cc: Likewise.

analyzer: fix ICE when DECL_INITIAL is error_mark_node [PR98580]

lto-streamer-out.c's get_symbol_initial_value can return error_mark_node
rather than DECL_INITIAL as an optimization to avoid extra sections for
simple scalar values.

Add a check to the analyzer to handle such cases gracefully.

gcc/analyzer/ChangeLog:
PR analyzer/98580
* region.cc (decl_region::get_svalue_for_initializer): Gracefully
handle when LTO writes out DECL_INITIAL as error_mark_node.

gcc/testsuite/ChangeLog:
PR analyzer/98580
* gcc.dg/analyzer/pr98580-a.c: New test.
* gcc.dg/analyzer/pr98580-b.c: New test.

test: add new Go tests from source repo

Update cpplib es.po.

* es.po: Update.

libstdc++: Fix long double to_chars testcase [PR98384]

The testcase was failing to compile on some targets due to its use of
the non-standard functions nextupl and nextdownl. This patch makes the
testcase instead use the C99 function nexttowardl in an equivalent way.

libstdc++-v3/ChangeLog:

PR libstdc++/98384
* testsuite/20_util/to_chars/long_double.cc: Use nexttowardl
instead of the non-standard nextupl and nextdownl.

Fortran: Improve resolution of associate variables. [PR93701].

2021-01-07 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/93701
* resolve.c (find_array_spec): Put static prototype for
resolve_assoc_var before this function and call for associate
variables.

gcc/testsuite/
PR fortran/93701
* gfortran.dg/associate_54.f90: New test.
* gfortran.dg/associate_55.f90: New test.
* gfortran.dg/associate_56.f90: New test.

d: Merge upstream dmd 9038e64c5.

Adds support for using user-defined attributes on function arguments and
single-parameter alias declarations. These attributes behave analogously
to existing UDAs.

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 9038e64c5.
* d-builtins.cc (build_frontend_type): Update call to
Parameter::create.

fix GIMPLE parser for loops

We do not tolerate "growing" a vector to a lower size.

2021-01-07 Richard Biener <rguenther@suse.de>

gcc/c/
* gimple-parser.c (c_parser_gimple_compound_statement): Only
reallocate loop array if it is too small.
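
The guard can be illustrated in plain C with a hypothetical growable
array.  This is only a sketch of the "grow only when too small" idea; the
struct and function names are assumptions and it does not use GCC's own
vec API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a growable array of pointers.  */
struct ptr_array
{
  void **data;
  size_t length;
};

/* Make sure ARR has at least NEEDED elements, clearing any new ones.
   Returns 0 on success and -1 on allocation failure.  A request that is
   not larger than the current length is a no-op, so the array is never
   "grown" to a lower size.  */
int
ensure_length (struct ptr_array *arr, size_t needed)
{
  if (needed <= arr->length)
    return 0;

  void **p = realloc (arr->data, needed * sizeof *p);
  if (p == NULL)
    return -1;

  for (size_t i = arr->length; i < needed; i++)
    p[i] = NULL;

  arr->data = p;
  arr->length = needed;
  return 0;
}
```

Calling `ensure_length' with 4 and then with 2 leaves the length at 4.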

i386: Optimize blsi followed by comparison [PR98567]

The BLSI instruction sets SF and ZF based on the result and clears OF.
CF is set to something unrelated.

The following patch optimizes BLSI followed by comparison, so we don't need
to emit a TEST insn in between.

2021-01-07 Jakub Jelinek <jakub@redhat.com>

PR target/98567
* config/i386/i386.md (*bmi_blsi_<mode>_cmp, *bmi_blsi_<mode>_ccno):
New define_insn patterns.

* gcc.target/i386/pr98567-1.c: New test.
* gcc.target/i386/pr98567-2.c: New test.
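
A minimal sketch of the source shape this targets, assuming BMI is enabled
(e.g. with -mbmi) so that `x & -x' can be implemented with BLSI.  The
function names are illustrative and not taken from the new tests.

```c
/* Isolate the lowest set bit of X; yields 0 when X is 0.  */
unsigned long
isolate_lowest_set_bit (unsigned long x)
{
  return x & -x;
}

/* Compare the isolated bit against zero.  The comparison only needs ZF,
   which BLSI already sets from its result.  */
int
has_no_set_bit (unsigned long x)
{
  return (x & -x) == 0;
}
```

Only comparisons that rely on the flags BLSI sets from its result can
benefit, since CF is set to something unrelated.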

aarch64: Support conditional unpacked integer unary arithmetic on SVE

This patch extends the conditional unary integer operations
from SVE_FULL_I to SVE_I. In each case the type suffix is
taken from the element size rather than the container size:
this matters for ABS and NEG, but doesn't matter for NOT.

gcc/
* config/aarch64/aarch64-sve.md (@cond_<SVE_INT_UNARY:optab><mode>)
(*cond_<SVE_INT_UNARY:optab><mode>_2): Extend from SVE_FULL_I to SVE_I.
(*cond_<SVE_INT_UNARY:optab><mode>_any): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_unary_5.c: New test.
* gcc.target/aarch64/sve/cond_unary_5_run.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_6.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_6_run.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_7.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_7_run.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_8.c: Likewise.
* gcc.target/aarch64/sve/cond_unary_8_run.c: Likewise.

gimple-isel: Check whether IFN_VCONDEQ is supported [PR98560]

This patch follows on from the previous one for the PR and
makes sure that we can handle == as well as <. Previously
we assumed without checking that IFN_VCONDEQ was available
if IFN_VCOND or IFN_VCONDU wasn't.

The patch also fixes the definition of the IFN_VCOND* functions.
The optabs are convert optabs in which the first mode is the
data mode and the second mode is the comparison or mask mode.

gcc/
PR tree-optimization/98560
* internal-fn.def (IFN_VCONDU, IFN_VCONDEQ): Use type vec_cond.
* internal-fn.c (vec_cond_mask_direct): Get the data mode from
argument 1.
(vec_cond_direct): Likewise argument 2.
(vec_condu_direct, vec_condeq_direct): Delete.
(expand_vect_cond_optab_fn): Rename to...
(expand_vec_cond_optab_fn): ...this, replacing old macro.
(expand_vec_condu_optab_fn, expand_vec_condeq_optab_fn): Delete.
(expand_vect_cond_mask_optab_fn): Rename to...
(expand_vec_cond_mask_optab_fn): ...this, replacing old macro.
(direct_vec_cond_mask_optab_supported_p): Treat the optab as a
convert optab.
(direct_vec_cond_optab_supported_p): Likewise.
(direct_vec_condu_optab_supported_p): Delete.
(direct_vec_condeq_optab_supported_p): Delete.
* gimple-isel.cc: Include internal-fn.h.
(gimple_expand_vec_cond_expr): Check that IFN_VCONDEQ is supported
before using it.

gcc/testsuite/
PR tree-optimization/98560
* gcc.dg/vect/pr98560-2.c: New test.

gimple-isel: Fall back to using vcond_mask [PR98560]

PR98560 is about a case in which the vectoriser initially generates:

  mask_1 = a < 0;
  mask_2 = mask_1 & ...;
  res = VEC_COND_EXPR <mask_2, b, c>;

The vectoriser thus expects res to be calculated using vcond_mask.
However, we later manage to fold mask_2 to mask_1, leaving:

  mask_1 = a < 0;
  res = VEC_COND_EXPR <mask_1, b, c>;

gimple-isel then required a combined vcond to exist.

On most targets, it's not too onerous to provide all possible
(compare x select) combinations.  For each data mode, you just
need to provide unsigned comparisons, signed comparisons, and
floating-point comparisons, with the data mode and type of
comparison uniquely determining the mode of the compared values.
But for targets like SVE that support “unpacked” vectors,
it's not that simple: the level of unpacking adds another
degree of freedom.

Rather than insist that the combined versions exist, I think
we should be prepared to fall back to using separate comparisons
and vcond_masks.  I think that makes more sense on targets like
AArch64 and AArch32 in which compares and selects are fundamentally
separate operations anyway.
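
As a rough illustration of the source-level pattern involved (a hypothetical scalar loop, not the testcase added for the PR), the C function below performs a comparison and then a select on each element. The name select_on_negative is made up for this sketch. Once vectorised, the comparison yields a mask and the select can be carried out by a separate vcond_mask instead of a combined vcond:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: res[i] = a[i] < 0 ? b[i] : c[i].
   The comparison and the select are written as two separate steps to
   mirror the "compare, then vcond_mask" fallback described above.  */
void
select_on_negative (const int *a, const int *b, const int *c,
                    int *res, size_t n)
{
  assert (a && b && c && res);
  for (size_t i = 0; i < n; i++)
    {
      int mask = a[i] < 0;          /* mask_1 = a < 0 */
      res[i] = mask ? b[i] : c[i];  /* res = VEC_COND_EXPR <mask_1, b, c> */
    }
}
```

Whether a given target vectorises this loop, and with which optabs, depends on the target and options; the sketch only shows the shape of the computation.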

gcc/
PR tree-optimization/98560
* gimple-isel.cc (gimple_expand_vec_cond_expr): If we fail to use
IFN_VCOND{,U,EQ}, fall back on IFN_VCOND_MASK.

gcc/testsuite/
PR tree-optimization/98560
* gcc.dg/vect/pr98560-1.c: New test.

i386: Merge various insn name mapping code attributes

2021-01-07 Uroš Bizjak <ubizjak@gmail.com>

gcc/
* config/i386/i386.md (insn): Merge from plusminus_insn, shift_insn,
rotate_insn and optab code attributes.
Update all uses to merged code attribute.
* config/i386/sse.md: Update all uses to merged code attribute.
* config/i386/mmx.md: Update all uses to merged code attribute.

bswap: Fix up recent vector CONSTRUCTOR optimization [PR98568]

As the testcase shows, bswap can match even byte-swapping or identity
from low part of some wider SSA_NAME.
For bswap replacement other than for vector CONSTRUCTOR the code has been
using NOP_EXPR casts if the types weren't compatible, but for vectors
we need to use VIEW_CONVERT_EXPR. The problem with the latter is that
we require that it has the same size, which isn't guaranteed, so this patch
in those cases first adds a narrowing NOP_EXPR cast and only afterwards
does a VIEW_CONVERT_EXPR.

2021-01-07 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/98568
* gimple-ssa-store-merging.c (bswap_view_convert): New function.
(bswap_replace): Use it.

* g++.dg/torture/pr98568.C: New test.

Adjust testcase for PR 92658

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr92658-avx512bw.c: Add
-mprefer-vector-width=512 to avoid impact of different default
mtune which gcc is built with.
* gcc.target/i386/pr92658-avx512bw-2.c: Ditto.

analyzer: fix false leak reports when merging states [PR97074]

gcc/analyzer/ChangeLog:
PR analyzer/97074
* store.cc (binding_cluster::can_merge_p): Add "out_store" param
and pass to calls to binding_cluster::make_unknown_relative_to.
(binding_cluster::make_unknown_relative_to): Add "out_store"
param. Use it to mark base regions that are pointed to by
pointers that become unknown as having escaped.
(store::can_merge_p): Pass out_store to
binding_cluster::can_merge_p.
* store.h (binding_cluster::can_merge_p): Add "out_store" param.
(binding_cluster::make_unknown_relative_to): Likewise.
* svalue.cc (region_svalue::implicitly_live_p): New vfunc.
* svalue.h (region_svalue::implicitly_live_p): New vfunc decl.

gcc/testsuite/ChangeLog:
PR analyzer/97074
* gcc.dg/analyzer/pr97074.c: New test.

analyzer: fix missing bitmap_clear [PR98564]

gcc/analyzer/ChangeLog:
PR analyzer/98564
* engine.cc (exploded_path::feasible_p): Add missing call to
bitmap_clear.

gcc/testsuite/ChangeLog:
PR analyzer/98564
* gcc.dg/analyzer/pr98564.c: New test.

Daily bump.

sync libctf toplevel from binutils-gdb

This pulls in the toplevel portions of these binutils-gdb commits:

   1ff6de031241c59d0ff bfd, ld: add CTF section linking
   87279e3cef5b2c54f4a libctf: installable libctf as a shared library
   c59e30ed1727135f8ef libctf: new testsuite

* Makefile.def: Sync with binutils-gdb:
(dependencies): all-ld depends on all-libctf.
(host_modules): libctf is no longer no_install.
No longer no_check.  Checking depends on all-ld.
* Makefile.in: Regenerated.

[PR97978] LRA: Permit temporary allocation incorrectness after hard reg split.

LRA can crash when a hard register was split and the same hard register
was assigned on the previous assignment sub-pass. The following
patch fixes this problem.

gcc/ChangeLog:

PR rtl-optimization/97978
* lra-int.h (lra_hard_reg_split_p): New external.
* lra.c (lra_hard_reg_split_p): New global.
(lra): Set up lra_hard_reg_split_p after splitting a hard reg.
* lra-assigns.c (lra_assign): Don't check allocation correctness
after hard reg splitting.

gcc/testsuite/ChangeLog:

PR rtl-optimization/97978
* gcc.target/i386/pr97978.c: New.

PR c++/95768 - pretty-printer ICE on -Wuninitialized with allocated storage

gcc/c-family/ChangeLog:

PR c++/95768
* c-pretty-print.c (c_pretty_printer::primary_expression): For
SSA_NAMEs print VLA names and GIMPLE defining statements.
(print_mem_ref): New function.
(c_pretty_printer::unary_expression): Call it.

gcc/cp/ChangeLog:

PR c++/95768
* error.c (dump_expr): Call c_pretty_printer::unary_expression.

gcc/testsuite/ChangeLog:

PR c++/95768
* g++.dg/pr95768.C: New test.
* g++.dg/warn/Wuninitialized-12.C: New test.
* gcc.dg/uninit-38.c: New test.

PR c++/98305 spurious -Wmismatched-new-delete on template instance

gcc/ChangeLog:

PR c++/98305
* builtins.c (new_delete_mismatch_p): New overload.
(new_delete_mismatch_p (tree, tree)): Call it.

gcc/testsuite/ChangeLog:

PR c++/98305
* g++.dg/warn/Wmismatched-new-delete-3.C: New test.

testsuite, coroutines : Fix a bad testcase [PR96504].

Where possible (i.e. where that doesn't alter the intent of a test) we
use a suspend_always as the final suspend and a test that the coroutine
was 'done' to check that the state machine had terminated correctly.

Sometimes, filed PRs have 'suspend_never' as the final suspend expression
and that needs to be changed to match the testsuite style. This is one
I missed and means that the call to 'done()' on the handle is made to an
already-destructed coroutine. Surprisingly, that didn't actually trigger
a failure until glibc 2.32.

Fixed by changing the final suspend to be 'suspend_always'.

gcc/testsuite/ChangeLog:

PR c++/96504
* g++.dg/coroutines/torture/pr95519-05-gro.C: Use suspend_always
as the final suspend point so that we can check that the state
machine has reached the expected point.

PR fortran/78746 - invalid access after error recovery

The error recovery after an invalid reference to an undefined CLASS
during a TYPE declaration led to an invalid access. Add a check.

gcc/fortran/ChangeLog:

* resolve.c (resolve_component): Add check for valid CLASS
reference before trying to access CLASS data.

c++: Fix g++.dg/warn/Wmismatched-dealloc.C for C++11 [PR98566]

C++ sized deallocation only came in C++14, so this test wasn't
working properly in C++11, which isn't tested by default. Fixed
thus by constraining the dg-errors to C++14 only.

gcc/testsuite/ChangeLog:

PR testsuite/98566
* g++.dg/warn/Wmismatched-dealloc.C: Use target c++14 in
dg-error.

Fix libcody build on hppa*-*-hpux11.11.

2021-01-06 John David Anglin <danglin@gcc.gnu.org>

libcody/ChangeLog:

PR bootstrap/98506
* resolver.cc: Only use fstatat when _POSIX_C_SOURCE >= 200809L.

add alignment to enable store merging in strict-alignment targets

In g++.dg/opt/store-merging-2.C, the natural alignment of types T and
S is a single byte, so we shouldn't expect store merging on
strict-alignment platforms. Indeed, without something like the
adjust-alignment pass to bump up the alignment of the automatic
variable, as in GCC 10, the optimization does not occur.

This patch adjusts the test so that the required alignment is
expressly stated, and so we don't rely on its accidentally being there
to get the desired optimization.
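
A minimal C sketch of the idea follows; the actual test is C++ and the exact alignment it adds is not shown here, so the type names and the value 4 are assumptions for illustration only. The first struct has a natural alignment of one byte; the second states a larger alignment explicitly, so a compiler may treat the four byte stores as one aligned wider store:

```c
#include <assert.h>
#include <stddef.h>

/* Natural alignment is a single byte: on a strict-alignment target the
   compiler cannot assume that one wider store to this object is safe.  */
struct bytes_plain
{
  unsigned char a, b, c, d;
};

/* Same members, but with the required alignment stated explicitly
   (C11 _Alignas on the first member raises the struct's alignment).  */
struct bytes_aligned
{
  _Alignas (4) unsigned char a;
  unsigned char b, c, d;
};

/* Four adjacent byte stores: candidates for store merging.  */
void
fill_plain (struct bytes_plain *p)
{
  assert (p != NULL);
  p->a = 1; p->b = 2; p->c = 3; p->d = 4;
}

void
fill_aligned (struct bytes_aligned *p)
{
  assert (p != NULL);
  p->a = 1; p->b = 2; p->c = 3; p->d = 4;
}
```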

for gcc/testsuite/ChangeLog

* g++.dg/opt/store-merging-2.C: Add the required alignment.

robustify vxworks glimits.h overriding

The glimits.h overriding used in gcc/config/t-vxworks was fragile: the
intermediate file would already be there in a rebuild, and so the
adjustments would not be made, so the generated limits.h would miss
them, causing limits-width-[12] tests to fail on that target.

While changing it, I also replaced the modern $(cmd) shell syntax with
the more portable `cmd` construct.

for gcc/ChangeLog

* Makefile.in (T_GLIMITS_H): New.
(stmp-int-hdrs): Depend on it, use it.
* config/t-vxworks (T_GLIMITS_H): Override it.
(vxw-glimits.h): New.

add signed_bool_precision attribute for GIMPLE FE use

This adds __attribute__((signed_bool_precision(precision))) to be able
to construct nonstandard boolean types which for the included testcase
is needed to simulate Ada and LTO interaction (Ada uses an 8-bit
precision boolean_type_node). This will also be useful for vector
unit testcases where we need to produce vector types with
non-standard precision signed boolean type components.

2021-01-06 Richard Biener <rguenther@suse.de>

PR tree-optimization/95582
gcc/c-family/
* c-attribs.c (c_common_attribute_table): Add entry for
signed_bool_precision.
(handle_signed_bool_precision_attribute): New.

gcc/testsuite/
* gcc.dg/pr95582.c: New testcase.

tree-optimization/98513 - fix bug in range intersection code

This fixes a premature optimization in the range intersection code
which assumes earlier branches have to be taken, not taking into
account that for symbolic ranges we cannot always compare endpoints.
The fix is to instantiate the compare deemed redundant (which then
fails as undecidable for the testcase).

2021-01-06 Richard Biener <rguenther@suse.de>

PR tree-optimization/98513
* value-range.cc (intersect_ranges): Compare the upper bounds
for the expected relation.

* gcc.dg/tree-ssa/pr98513.c: New testcase.

gcc-changelog: workaround for utf8 filenames

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Add decode_path function.
* gcc-changelog/git_email.py: Use it in order to solve
utf8 encoding filename issues.
* gcc-changelog/git_repository.py: Likewise.
* gcc-changelog/test_email.py: Test it.

analyzer: fix false leaks when writing through unknown ptrs [PR97072]

gcc/analyzer/ChangeLog:
PR analyzer/97072
* region-model-reachability.cc (reachable_regions::init_cluster):
Convert symbolic region handling to a switch statement. Add cases
to handle SK_UNKNOWN and SK_CONJURED.

gcc/testsuite/ChangeLog:
PR analyzer/97072
* gcc.dg/analyzer/pr97072.c: New test.

analyzer: add regression test for PR 98073

This ICE was fixed by r11-2694-g808f4dfeb3a95f50 (aka the big state
rewrite for GCC 11).

gcc/testsuite/ChangeLog:
PR analyzer/98073
* gcc.dg/analyzer/pr98073.c: New test.

analyzer: remove xfail [PR98223]

The bogus leak message went away after
fcae5121154d1c3382b056bcc2c563cedac28e74 (aka "Hybrid EVRP and
testcases") due to that patch improving a phi node in the gimple input
to the analyzer.

gcc/testsuite/ChangeLog:
PR analyzer/98223
* gcc.dg/analyzer/pr94851-1.c: Remove xfail.

Daily bump.

doc: Re-add HSAIL to Language Standards

The HSAIL web server has reappeared after weeks, so restore the standard
reference for now while we consider further deprecation.

This reverts commit 7e999bd84f47205dc44b0f2dc90b53b3c888ca48.

gcc/
2021-01-06 Gerald Pfeifer <gerald@pfeifer.com>

Revert:
2020-12-28 Gerald Pfeifer <gerald@pfeifer.com>

* doc/standards.texi (HSAIL): Remove section.

Update GNU/Hurd configure support

ChangeLog:

* libtool.m4: Match gnu* along other GNU systems.
* libgo/config/libtool.m4: Match gnu* along other GNU systems.
* libgo/configure: Re-generate.

libffi/
* configure: Re-generate.

libgomp/
* configure: Re-generate.

gcc/

* configure: Re-generate.

libatomic/

* configure: Re-generate.

libbacktrace/

* configure: Re-generate.

libcc1/

* configure: Re-generate.

libgfortran/

* configure: Re-generate.

libgomp/

* configure: Re-generate.

libhsail-rt/

* configure: Re-generate.

libitm/

* configure: Re-generate.

libobjc/

* configure: Re-generate.

liboffloadmic/

* configure: Re-generate.
* plugin/configure: Re-generate.

libphobos/

* configure: Re-generate.

libquadmath/

* configure: Re-generate.

libsanitizer/

* configure: Re-generate.

libssp/

* configure: Re-generate.

libstdc++-v3/

* configure: Re-generate.

libvtv/

* configure: Re-generate.

lto-plugin/

* configure: Re-generate.

zlib/

* configure: Re-generate.

IBM Z: Fix check_effective_target_s390_z14_hw

Commit 2f473f4b065d ("IBM Z: Do not run long double tests on old
machines") introduced a predicate for tests that must run only on z14+.
However, due to a syntax error, the predicate always returns false.

gcc/testsuite/ChangeLog:

2020-12-10 Ilya Leoshkevich <iii@linux.ibm.com>

* gcc.target/s390/s390.exp: Replace %% with %.

xfail test that will never pass on i?86 FreeBSD

gcc/testsuite
* gfortran.dg/dec_math.f90: xfail on i?86-*-freebsd*

syscall: don't define sys_SETREUID and friends

We don't use them, since we always call the C library functions which do
the right thing anyhow. And they aren't defined on all GNU/Linux variants.

Fixes PR go/98510

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/281473

internal/cpu: more build fixes for Go1.16beta1 release

Some files were missing from the libgo copy of internal/cpu, because they
used to only declare CacheLinePadSize which libgo gets from goarch.sh.
Now they also declare doinit, so copy them over. Adjust cpu_other.go.

Fix the amd64p32 build by adding a build constraint to cpu_no_name.go.

Fixes PR go/98493

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/281472

doc: reflect the publication of C++20 in invoke.texi and standards.texi

Jonathan mentioned on IRC that ISO/IEC 14882:2020 was published
yesterday (and indeed it appears on www.iso.org for sale).
I think we should reflect that in our documentation and in cxx-status.html,
patches attached.
I understand we want to keep C++20 support experimental even in GCC 11,
though not sure if we should still talk about "almost certainly change in
incompatible ways" rather than that it might change in incompatible ways.

2021-01-05 Jakub Jelinek <jakub@redhat.com>

* doc/invoke.texi (-std=c++20): Adjust for the publication of
ISO 14882:2020 standard.
* doc/standards.texi: Likewise.

d: Merge upstream dmd a5c86f5b9

Adds the following new `__traits' to the D language.

- isDeprecated: used to detect if a function is deprecated.

- isDisabled: used to detect if a function is marked with @disable.

- isFuture: used to detect if a function is marked with @__future.

- isModule: used to detect if a given symbol represents a module, this
   enhancement also adds support using `is(sym == module)'.

- isPackage: used to detect if a given symbol represents a package,
   this enhancement also adds support using `is(sym == package)'.

- child: takes two arguments.  The first must be a symbol or expression
   and the second must be a symbol, such as an alias to a member of the
   first 'parent' argument.  The result is the second 'member' argument
   interpreted with its 'this' context set to 'parent'.  This is the
   inverse of `__traits(parent, member)'.

- isReturnOnStack: determines if a function's return value is placed on
   the stack, or is returned via registers.

- isZeroInit: used to detect if a type's default initializer has no
   non-zero bits.

- getTargetInfo: used to query features of the target being compiled
   for, the back-end can expand this to register any key to handle the
   given argument, however a reliable subset exists which includes
   "cppRuntimeLibrary", "cppStd", "floatAbi", and "objectFormat".

- getLocation: returns a tuple whose entries correspond to the
   filename, line number, and column number of where the argument was
   declared.

- hasPostblit: used to detect if a type is a struct with a postblit.

- isCopyable: used to detect if a type allows copying its value.

- getVisibility: an alias for the getProtection trait.

Reviewed-on: https://github.com/dlang/dmd/pull/12093

gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd a5c86f5b9.
* d-builtins.cc (d_eval_constant_expression): Handle ADDR_EXPR trees
created by build_string_literal.
* d-frontend.cc (retStyle): Remove function.
* d-target.cc (d_language_target_info): New variable.
(d_target_info_table): Likewise.
(Target::_init): Initialize d_target_info_table.
(Target::isReturnOnStack): New function.
(d_add_target_info_handlers): Likewise.
(d_handle_target_cpp_std): Likewise.
(d_handle_target_cpp_runtime_library): Likewise.
(Target::getTargetInfo): Likewise.
* d-target.h (struct d_target_info_spec): New type.
(d_add_target_info_handlers): Declare.

Add <source_location> to the precompiled header.

2021-01-05 Ed Smith-Rowland <3dw4rd@verizon.net>

* include/precompiled/stdc++.h: Add <source_location> to C++20 section.

x86: Use unsigned short to compute pextrw result

Use unsigned short to compute the zero-extended pextrw result.
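
The difference can be sketched with a small reference model in C. The function names below are hypothetical and are not taken from the test; the model treats a 64-bit value as four 16-bit lanes. Going through unsigned short gives the zero-extended result that pextrw produces, while going through short sign-extends lanes whose top bit is set:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model: return 16-bit lane SEL (0..3) of V,
   zero-extended to int, as pextrw does.  */
int
extract_word_zero_extended (uint64_t v, unsigned sel)
{
  assert (sel < 4);
  unsigned short w = (unsigned short) (v >> (16 * sel));
  return w;  /* unsigned short promotes to int without changing value */
}

/* The same model using a signed short: lanes with bit 15 set come back
   negative (the narrowing conversion wraps on GCC), which does not
   match the instruction.  */
int
extract_word_sign_extended (uint64_t v, unsigned sel)
{
  assert (sel < 4);
  short w = (short) (v >> (16 * sel));
  return w;
}
```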

PR target/98495
* gcc.target/i386/sse2-mmx-pextrw.c (compute_correct_result): Use
unsigned short to compute pextrw result.

c++: Fix deduction from the type of an NTTP

In the testcase nontype-auto17.C below, the calls to f and g are invalid
because neither deduction nor defaulting of the template parameter T
yields a valid specialization.  Deducing T doesn't work because T is
used only in a non-deduced context, and defaulting T doesn't work
because its default argument makes the type of M invalid.

But with -std=c++17 or later, we incorrectly accept both calls.
Starting with C++17 (specifically P0127R2), during deduction we're
allowed to try to deduce T from the argument '42' that's been
tentatively deduced for M.  The problem is that when unify walks into
the type of M (a TYPENAME_TYPE), it immediately gives up without
performing any new unifications (so the type of M is still unknown) --
and then we go on to unify M with '42' anyway.  Later in
type_unification_real, we complete the template argument vector using
T's default template argument, and end up forming the bogus
specializations f<void, 42> and g<S, 42>.

This patch fixes this issue by checking whether the type of an NTTP is
still dependent after walking into its type during unification.  If it
is, it means we couldn't deduce all the template parameters used in its
type, and so we shouldn't yet unify the NTTP.

(The new testcase ttp33.C demonstrates the need for the TEMPLATE_PARM_LEVEL
check; without it, we would ICE on this testcase from the call to tsubst.)

gcc/cp/ChangeLog:

* pt.c (unify) <case TEMPLATE_PARM_INDEX>: After walking into
the type of the NTTP, substitute into the type again.  If the
type is still dependent, don't unify the NTTP.

gcc/testsuite/ChangeLog:

* g++.dg/template/partial5.C: Adjust directives to expect the
same errors across all dialects.
* g++.dg/cpp1z/nontype-auto17.C: New test.
* g++.dg/cpp1z/nontype-auto18.C: New test.
* g++.dg/template/ttp33.C: New test.

expand: Fold x - y < 0 to x < y during expansion [PR94802]

My earlier patch to simplify x - y < 0 etc. for signed subtraction
with undefined overflow into x < y in match.pd regressed some tests,
even when it was guarded to be post-IPA.  The following patch thus
attempts to optimize that during expansion instead, which is the last
time we can do it: afterwards we lose the information whether it was
x - y < 0 or (int) ((unsigned) x - y) < 0 for which we couldn't
optimize it.
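
To see why the two forms must be told apart, consider the following C sketch (the function names are made up for illustration). With signed subtraction, overflow is undefined, so whenever x - y is representable the test x - y < 0 agrees with x < y. The wrapping unsigned form is well defined for all inputs and can disagree with x < y, for example for x = INT_MIN and y = 1:

```c
#include <assert.h>
#include <limits.h>

/* The target form of the fold.  */
int
less_than (int x, int y)
{
  return x < y;
}

/* Signed form: only meaningful when x - y does not overflow, which is
   what lets the compiler rewrite it as x < y.  */
int
sub_is_negative (int x, int y)
{
  /* Precondition: x - y is representable in int.  */
  assert (y >= 0 ? x >= INT_MIN + y : x <= INT_MAX + y);
  return x - y < 0;
}

/* Wrapping form: defined for all inputs (the conversion back to int is
   implementation-defined; GCC wraps), but not equivalent to x < y.  */
int
wrapped_sub_is_negative (int x, int y)
{
  return (int) ((unsigned) x - (unsigned) y) < 0;
}
```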

2021-01-05 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/94802
* expr.h (maybe_optimize_sub_cmp_0): Declare.
* expr.c: Include tree-pretty-print.h and flags.h.
(maybe_optimize_sub_cmp_0): New function.
(do_store_flag): Use it.
* cfgexpand.c (expand_gimple_cond): Likewise.

* gcc.target/i386/pr94802.c: New test.
* gcc.dg/Wstrict-overflow-25.c: Remove xfail.

nvptx: Cache stacks block for OpenMP kernel launch

2021-01-05 Julian Brown <julian@codesourcery.com>

libgomp/
* plugin/plugin-nvptx.c (SOFTSTACK_CACHE_LIMIT): New define.
(struct ptx_device): Add omp_stacks struct.
(nvptx_open_device): Initialise cached-stacks housekeeping info.
(nvptx_close_device): Free cached stacks block and mutex.
(nvptx_stacks_free): New function.
(nvptx_alloc): Add SUPPRESS_ERRORS parameter.
(GOMP_OFFLOAD_alloc): Add strategies for freeing soft-stacks block.
(nvptx_stacks_alloc): Rename to...
(nvptx_stacks_acquire): This. Cache stacks block between runs if same
size or smaller is required.
(nvptx_stacks_free): Remove.
(GOMP_OFFLOAD_run): Call nvptx_stacks_acquire and lock stacks block
during kernel execution.

A couple of comment tweaks

Tweak a couple of comments added in the RTL-SSA series in response
to reviewer feedback.

gcc/
* mux-utils.h (pointer_mux::m_ptr): Tweak description of contents.
* rtlanal.c (simple_regno_set): Tweak description to clarify the
RMW condition.

Don't link cc1 etc. against libcody.a

Richi complained on IRC that cc1 is linked against libcody.a.
From my understanding, it is just the cc1plus and cc1objplus binaries
that need it, so this patch links only those against it.

> this is already part of my Solaris libcody patch

The following updated patch contains the incremental changes between what Rainer
has committed and what I've posted.

2021-01-05 Jakub Jelinek <jakub@redhat.com>

gcc/cp/
* Make-lang.in (cc1plus-checksum, cc1plus$(exeext)): Add
$(CODYLIB) after $(BACKEND).
gcc/objcp/
* Make-lang.in (cc1objplus-checksum, cc1objplus$(exeext)): Add
$(CODYLIB) after $(BACKEND).

tree-optimization/98516 - fix SLP permute opt materialization

When materializing on a VEC_PERM node we have to permute the
incoming vectors, not the outgoing one.

2021-01-05 Richard Biener <rguenther@suse.de>

PR tree-optimization/98516
* tree-vect-slp.c (vect_optimize_slp): Permute the incoming
lanes when materializing on a VEC_PERM node.
(vectorizable_slp_permutation): Dump the permute properly.

* gcc.dg/vect/bb-slp-pr98516-1.c: New testcase.
* gcc.dg/vect/bb-slp-pr98516-2.c: Likewise.

c++: Fix ICE with __builtin_bit_cast [PR98469]

On the following testcase we ICE during constexpr evaluation (for warnings),
because the IL has ADDR_EXPR of BIT_CAST_EXPR and ADDR_EXPR case asserts
the result is not a CONSTRUCTOR.
The patch punts on lval BIT_CAST_EXPR folding.

> This change is OK, but part of the problem is that we're trying to do
> overload resolution for an S copy/move constructor, which we shouldn't be
> because bit_cast is a prvalue, so in C++17 and up we should use it to
> directly initialize the target without any implied constructor call.

This version therefore wraps it into a TARGET_EXPR.  That alone fixes
the bug, but I've kept the constexpr.c change too.

2021-01-05 Jakub Jelinek <jakub@redhat.com>

PR c++/98469
* constexpr.c (cxx_eval_constant_expression) <case BIT_CAST_EXPR>:
Punt if lval is true.
* semantics.c (cp_build_bit_cast): Call get_target_expr_sfinae on
the result if it has a class type.

* g++.dg/cpp2a/bit-cast8.C: New test.
* g++.dg/cpp2a/bit-cast9.C: New test.

c++: ICE with deferred noexcept when deducing targs [PR82099]

In this test we ICE in type_throw_all_p because it got a deferred
noexcept which it shouldn't.  Here's the story:

In noexcept61.C, we call bar, so we perform overload resolution.  When
adding the (only) candidate, we need to deduce template arguments, so
call fn_type_unification as usual.  That deduces U to

  void (*) (int &, int &)

which is correct, but its noexcept-spec is deferred_noexcept.  Then
we call add_function_candidate (bar), wherein we try to create an
implicit conversion sequence for every argument.  Since baz<int> is
of unknown type, we instantiate_type it; it is a TEMPLATE_ID_EXPR
so that calls resolve_address_of_overloaded_function.  But we crash
there, because target_type contains the deferred_noexcept.

So we need to maybe_instantiate_noexcept before we can compare types.
resolve_overloaded_unification seemed like the appropriate spot; now
fn_type_unification produces the function type with its noexcept-spec
instantiated.  This shouldn't go against CWG 1330 because here we
really need to instantiate the noexcept-spec.

This also fixes class-deduction76.C, a dg-ice test I recently added,
therefore this fix also fixes c++/90799, yay.

gcc/cp/ChangeLog:

PR c++/82099
* pt.c (resolve_overloaded_unification): Call
maybe_instantiate_noexcept after instantiating the function
decl.

gcc/testsuite/ChangeLog:

PR c++/82099
* g++.dg/cpp1z/class-deduction76.C: Remove dg-ice.
* g++.dg/cpp0x/noexcept61.C: New test.

move SLP debug counter

This moves it to catch individual SLP subgraphs

2021-01-05 Richard Biener <rguenther@suse.de>

* tree-vect-slp.c (vect_slp_region): Move debug counter
to cover individual subgraphs.

tree-optimization/98428 - avoid pre-existing vectors for loop SLP

It wasn't supposed to be enabled, and apparently copying around the
checking messed up the condition.

2021-01-05 Richard Biener <rguenther@suse.de>

PR tree-optimization/98428
* tree-vect-slp.c (vect_build_slp_tree_1): Properly reject
vector lane extracts for loop vectorization.

reassoc: Fix reassociation on 32-bit hosts with > 32767 bbs [PR98514]

Apparently reassoc ICEs on large functions (more than 32767 basic blocks
with something to reassociate in those).
The problem is that the pass uses long type to store the ranks, and
the bb ranks are (number of SSA_NAMEs with default defs + 2 + bb->index) << 16,
so with many basic blocks we overflow the ranks and then trip the assertions
that the rank is not negative.

The following patch just uses int64_t instead of long in the pass.
Yes, it means slightly higher memory consumption (one array indexed by
bb->index is twice as large, and one hash_map from trees to the ranks
will grow by 50%), but I think it is better than punting on reassociation
of large functions on 32-bit hosts and making it inconsistent e.g. when
cross-compiling.  Given vec.h uses unsigned for vect element counts,
we don't really support more than 4G of SSA_NAMEs or more than 2G of basic
blocks in a function, so even with the << 16 we can't really overflow the
int64_t rank counters.
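
The arithmetic can be checked with a small C sketch. The helper names are hypothetical and only model the rank formula quoted above; they are not code from tree-ssa-reassoc.c. With no default defs, bb->index 32765 still gives a rank that fits a 32-bit long, while index 32766 gives (32768 << 16) = 2^31, which does not:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the bb rank formula from the description above, computed in
   64 bits so that the result itself never overflows here.  */
int64_t
bb_rank (int64_t num_default_defs, int64_t bb_index)
{
  assert (num_default_defs >= 0 && bb_index >= 0);
  return (num_default_defs + 2 + bb_index) << 16;
}

/* Would this rank be representable in a 32-bit long?  */
int
fits_in_32bit_long (int64_t rank)
{
  return rank >= INT32_MIN && rank <= INT32_MAX;
}
```

Even at the stated limits of 4G SSA_NAMEs and 2G basic blocks the shifted value stays below 2^49, far inside the int64_t range.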

2021-01-05  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/98514
* tree-ssa-reassoc.c (bb_rank): Change type from long * to
int64_t *.
(operand_rank): Change type from hash_map<tree, long> to
hash_map<tree, int64_t>.
(phi_rank): Change return type from long to int64_t.
(loop_carried_phi): Change block_rank variable type from long to
int64_t.
(propagate_rank): Change return type, rank parameter type and
op_rank variable type from long to int64_t.
(find_operand_rank): Change return type from long to int64_t
and change slot variable type from long * to int64_t *.
(insert_operand_rank): Change rank parameter type from long to
int64_t.
(get_rank): Change return type and rank variable type from long to
int64_t.  Use PRId64 instead of ld to print the rank.
(init_reassoc): Change rank variable type from long to int64_t
and adjust correspondingly bb_rank and operand_rank initialization.

phiopt: Optimize x < 0 ? ~y : y to (x >> 31) ^ y [PR96928]

As requested in the PR, the one's complement abs can be done more
efficiently without cmov or branching.

Had to change the ifcvt-onecmpl-abs-1.c testcase, as we no longer optimize
it in ifcvt.  On x86_64 with -m32 we generate in the end the exact same
code, but with -m64:
        movl    %edi, %eax
-       notl    %eax
-       cmpl    %edi, %eax
-       cmovl   %edi, %eax
+       sarl    $31, %eax
+       xorl    %edi, %eax
        ret
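
The equivalence behind the transformation can be sketched in C as follows. The function names are made up for illustration, and the branchless form assumes a 32-bit type with an arithmetic right shift of negative values, which is what GCC implements. For negative x the shift yields all ones, so the XOR computes ~y; otherwise it yields zero and y passes through unchanged:

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form, as written in the source.  */
int32_t
select_form (int32_t x, int32_t y)
{
  return x < 0 ? ~y : y;
}

/* Branchless form: x >> 31 is -1 for negative x and 0 otherwise
   (assumes arithmetic right shift of negative values, as on GCC).  */
int32_t
xor_form (int32_t x, int32_t y)
{
  return (x >> 31) ^ y;
}

/* Spot-check that both forms agree on a few inputs.  */
void
check_forms_agree (void)
{
  assert (select_form (-5, 7) == xor_form (-5, 7));
  assert (select_form (5, 7) == xor_form (5, 7));
  assert (select_form (INT32_MIN, 0) == xor_form (INT32_MIN, 0));
}
```

Passing the same value for x and y gives the one's complement abs mentioned above.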

2021-01-05  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/96928
* tree-ssa-phiopt.c (xor_replacement): New function.
(tree_ssa_phiopt_worker): Call it.

* gcc.dg/tree-ssa/pr96928.c: New test.
* gcc.target/i386/ifcvt-onecmpl-abs-1.c: Remove -fdump-rtl-ce1,
instead of scanning rtl dump for ifcvt message check assembly
for xor instruction.

match.pd: Improve (A / (1 << B)) -> (A >> B) optimization [PR96930]

The following patch improves the A / (1 << B) -> A >> B simplification.
As seen in the testcase, if there is unnecessary widening for the division,
we just optimize it into a shift on the widened type, but if the lshift
is widened too, there is no reason to do that: we can just shift in the
original type and convert after.  The tree_nonzero_bits & wi::mask check
already ensures it is fine even for signed values.

I've split the vr-values optimization into a separate patch as it causes
a small regression on two testcases, but this patch fixes what has been
reported in the PR alone.
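
As a hedged illustration (this is not the testcase from the PR, and the function names are invented), the C sketch below shows a division whose operands are both widened to 64 bits, next to the shift performed directly in the original 32-bit type. For shift counts below 32 the two give the same result:

```c
#include <assert.h>
#include <stdint.h>

/* Division by a power of two where both A and the shifted 1 are
   widened to 64 bits before dividing.  */
uint32_t
div_widened (uint32_t a, uint32_t b)
{
  assert (b < 32);
  return (uint32_t) ((uint64_t) a / (uint64_t) (UINT32_C (1) << b));
}

/* The same value computed as a shift in the original narrow type.  */
uint32_t
shift_narrow (uint32_t a, uint32_t b)
{
  assert (b < 32);
  return a >> b;
}
```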

2021-01-05 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/96930
* match.pd ((A / (1 << B)) -> (A >> B)): If A is extended
from narrower value which has the same type as 1 << B, perform
the right shift on the narrower value followed by extension.

* g++.dg/tree-ssa/pr96930.C: New test.

store-merging: Handle vector CONSTRUCTORs using bswap [PR96239]

I've tried to add such a helper, but handing over just analysis and letting
each pass handle it differently seems complicated given the limitations of
the bswap infrastructure.

So, this patch just hooks the optimization also into store-merging so that
the original testcase from the PR can be fixed.

2021-01-05 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/96239
* gimple-ssa-store-merging.c (maybe_optimize_vector_constructor): New
function.
(get_status_for_store_merging): Don't return BB_INVALID for blocks
with potential bswap optimizable CONSTRUCTORs.
(pass_store_merging::execute): Optimize vector CONSTRUCTORs with bswap
if possible.

* gcc.dg/tree-ssa/pr96239.c: New test.

go: Fix -fgo-embedcfg= option description.

Description of options should be . terminated, the:
FAIL: compiler driver --help=go option(s): "^ +-.*[^:.]$" absent from output: " -fgo-embedcfg=<file> List embedded files via go:embed"
test even reports that.

2021-01-05 Jakub Jelinek <jakub@redhat.com>

* lang.opt (fgo-embedcfg=): Add full stop at the end of description.

tree-optimization/98381 - fix live bool vector extract

This fixes extraction of live bool vector results for the case of
integer mode vectors.

2021-01-05 Richard Biener <rguenther@suse.de>

PR tree-optimization/98381
* tree.c (vector_element_bits): Properly compute bool vector
element size.
* tree-vect-loop.c (vectorizable_live_operation): Properly
compute the last lane bit offset.

i386: Prevent spurious FP exceptions with _mm_cvt{,t}ps_pi32 [PR98522]

Prevent spurious FP exceptions with _mm_cvt{,t}ps_pi32 for TARGET_MMX_WITH_SSE
by clearing the top 64 bits of the input XMM register.

2021-01-05 Uroš Bizjak <ubizjak@gmail.com>

gcc/
PR target/98522
* config/i386/sse.md (sse_cvtps2pi): Redefine as define_insn_and_split.
Clear the top 64 bits of the input XMM register.
(sse_cvttps2pi): Ditto.

gcc/testsuite

PR target/98522
* gcc.target/i386/pr98522.c: New test.