arm: Auto-vectorization for MVE: vneg

This patch enables MVE vneg instructions for auto-vectorization.  MVE
vnegq insns in mve.md are modified to use 'neg' instead of unspec
expression.  The neg<mode>2 expander is added to vec-common.md.

Existing patterns in neon.md are prefixed with neon_.
It's not clear why we have different patterns for VDQW
and VH in neon.md, when VDQWH handles both, and patterns
with VDQ have provision for attributes for FP modes.

Another question is why <absneg_str><mode>2 always sets
neon_abs<q> type when it also handles neon_neg<q> cases.

2020-12-11  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/
* config/arm/mve.md (mve_vnegq_f): Use 'neg' instead of unspec.
(mve_vnegq_s): Likewise.
* config/arm/neon.md (neg<mode>2): Rename into neon_neg<mode>2.
(<absneg_str><mode>2): Rename into neon_<absneg_str><mode>2.
(neon_v<absneg_str><mode>): Call gen_neon_<absneg_str><mode>2.
(vashr<mode>3): Call gen_neon_neg<mode>2.
(vlshr<mode>3): Call gen_neon_neg<mode>2.
(neon_vneg<mode>): Call gen_neon_neg<mode>2.
* config/arm/unspecs.md (VNEGQ_F, VNEGQ_S): Remove.
* config/arm/vec-common.md (neg<mode>2): New expander.

gcc/testsuite/
* gcc.target/arm/simd/mve-vneg.c: Add tests for vneg.
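
For illustration, a loop of the following shape is the kind of code the
neg<mode>2 expander is meant to cover.  This is a minimal sketch, not the
contents of mve-vneg.c; the names negate_into and negate_all are
hypothetical, and whether a given compiler actually emits vneg depends on
the target flags and optimization level in use.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Element-wise negation.  Each iteration is a plain 'neg' of one lane,
// which is the operation a vectorizer can widen to a whole vector.
void negate_into(std::int32_t *dst, const std::int32_t *src, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    dst[i] = -src[i];
}

// Hypothetical convenience wrapper so the kernel is easy to call.
std::vector<std::int32_t> negate_all(const std::vector<std::int32_t> &src)
{
  std::vector<std::int32_t> dst(src.size());
  negate_into(dst.data(), src.data(), src.size());
  return dst;
}
```

Calling negate_all on {1, -2, 3, 0} yields {-1, 2, -3, 0}.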

arm: Auto-vectorization for MVE: vmvn

This patch enables MVE vmvnq instructions for auto-vectorization.  MVE
vmvnq insns in mve.md are modified to use 'not' instead of unspec
expression to support one_cmpl<mode>2.  The one_cmpl<mode>2 expander
is added to vec-common.md.

2020-12-11  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (VDQNOTM2): New mode iterator.
(supf): Remove VMVNQ_S and VMVNQ_U.
(VMVNQ): Remove.
* config/arm/mve.md (mve_vmvnq_u<mode>): New entry for vmvn
instruction using expression not.
(mve_vmvnq_s<mode>): New expander.
* config/arm/neon.md (one_cmpl<mode>2): Renamed into
one_cmpl<mode>2_neon.
* config/arm/unspecs.md (VMVNQ_S, VMVNQ_U): Remove.
* config/arm/vec-common.md (one_cmpl<mode>2): New expander.

gcc/testsuite/
* gcc.target/arm/simd/mve-vmvn.c: Add tests for vmvn.

arm: Auto-vectorization for MVE: vbic

This patch enables MVE vbic instructions for auto-vectorization. MVE
vbicq insns in mve.md are modified to use 'and not' instead of unspec
expression.

2020-12-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (supf): Remove VBICQ_S and VBICQ_U.
(VBICQ): Remove.
* config/arm/mve.md (mve_vbicq_u<mode>): New entry for vbic
instruction using expression and not.
(mve_vbicq_s<mode>): New expander.
(mve_vbicq_f<mode>): Replace use of unspec by 'and not'.
* config/arm/unspecs.md (VBICQ_S, VBICQ_U, VBICQ_F): Remove.

gcc/testsuite/
* gcc.target/arm/simd/mve-vbic.c: Add tests for vbic.
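
For illustration, the 'and not' form mentioned above corresponds to a
source loop like the one below.  This is a minimal sketch, not the
contents of mve-vbic.c; the function names are hypothetical and the
wrapper simply processes as many elements as both inputs provide.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit clear: keep the bits of a that are not set in b (a AND NOT b).
void bic_into(std::uint32_t *dst, const std::uint32_t *a,
              const std::uint32_t *b, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    dst[i] = a[i] & ~b[i];
}

// Hypothetical convenience wrapper so the kernel is easy to call.
std::vector<std::uint32_t> bic_all(const std::vector<std::uint32_t> &a,
                                   const std::vector<std::uint32_t> &b)
{
  std::size_t n = std::min(a.size(), b.size());
  std::vector<std::uint32_t> dst(n);
  bic_into(dst.data(), a.data(), b.data(), n);
  return dst;
}
```

For example, bic_all({0xFF, 0xF0}, {0x0F, 0xFF}) yields {0xF0, 0x00}.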

arm: Auto-vectorization for MVE: veor

This patch enables MVE veorq instructions for auto-vectorization.  MVE
veorq insns in mve.md are modified to use xor instead of unspec
expression to support xor<mode>3.  The xor<mode>3 expander is added to
vec-common.md.

2020-12-11  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (supf): Remove VEORQ_S and VEORQ_U.
(VEORQ): Remove.
* config/arm/mve.md (mve_veorq_u<mode>): New entry for veor
instruction using expression xor.
(mve_veorq_s<mode>): New expander.
(mve_veorq_f<mode>): Use 'xor' code instead of unspec.
* config/arm/neon.md (xor<mode>3): Renamed into xor<mode>3_neon.
* config/arm/unspecs.md (VEORQ_S, VEORQ_U, VEORQ_F): Remove.
* config/arm/vec-common.md (xor<mode>3): New expander.

gcc/testsuite/
* gcc.target/arm/simd/mve-veor.c: Add tests for veor.

arm,testsuite: Fix vect-half-floats.c test

This patch fixes typos in effective targets which otherwise lead to
DejaGnu errors.

It also replaces dg-additional-options with dg-options to avoid
compiling with -ansi -pedantic-errors, resulting in
error: ISO C does not support the '_Float16' type [-Wpedantic]

2020-12-14 Christophe Lyon <christophe.lyon@linaro.org>

gcc/testsuite/
* gcc.target/arm/vect-half-floats.c: Fix typos.

sanitizer: do not ICE for pointer cmp/sub

gcc/c/ChangeLog:

PR sanitizer/98204
* c-typeck.c (pointer_diff): Do not emit a top-level
sanitization.
(build_binary_op): Likewise.

gcc/testsuite/ChangeLog:

PR sanitizer/98204
* c-c++-common/asan/pr98204.c: New test.

aarch64: Add support for Cortex-A78C

This patch adds support for -mcpu=cortex-a78c command line option.
For more information about this processor, see [0]:

[0] https://developer.arm.com/ip-products/processors/cortex-a/cortex-a78c

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (AARCH64_CORE): Add Cortex-A78C core.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Update docs.

-fgo-dump-spec: skip typedefs that match struct tag

gcc/:
* godump.c (go_output_typedef): Suppress typedefs whose name
matches the tag of the underlying struct, union, or enum.
Output declarations for enums that do not appear in typedefs.
gcc/testsuite:
* gcc.misc-tests/godump-1.c: Add test cases.

libstdc++: Fix several _GLIBCXX_DEBUG tests

libstdc++-v3/ChangeLog:

* testsuite/23_containers/array/debug/back2_neg.cc: target c++14 because assertion
for constexpr is disabled in C++11.
* testsuite/23_containers/array/debug/front2_neg.cc: Likewise.
* testsuite/23_containers/array/debug/square_brackets_operator2_neg.cc: Likewise.
* testsuite/23_containers/vector/debug/multithreaded_swap.cc: Include <memory>
for shared_ptr.

Daily bump.

VAX: Unify push operation selection

Avoid the possibility of code discrepancies like the one fixed with the
previous change and improve the structure of code by selecting between
push and non-push operations in a single place in `vax_output_int_move'.

The PUSHAB/MOVAB address moves are never actually produced from this
code as the SImode invocation of this function is guarded with the
`nonsymbolic_operand' predicate, but let's not mess with this code
too much on this occasion and keep the piece in place.

* config/vax/vax.c (vax_output_int_move): Unify push operation
selection.

VAX: Check the correct operand for constant 0 push operation

Check the output operand for representing pushing a value onto the stack
rather than the constant 0 input in determining whether to use the PUSHL
or the CLRL instruction for a SImode move. The latter actually works by
means of using the predecrement addressing mode with the SP register and
the machine code produced even takes the same number of bytes, however
at least with some VAX implementations it incurs a performance penalty.
Besides, we don't want to check the wrong operand anyway and have code
that works by chance only.

Add a test case covering push operations; for operands different from
constant zero there is actually a code size advantage for using PUSHL
rather than the equivalent MOVL instruction.

gcc/
* config/vax/vax.c (vax_output_int_move): Check the correct
operand for constant 0 push operation.

gcc/testsuite/
* gcc.target/vax/push.c: New test.

VAX: Handle subtracting from self with QMATH DImode add/sub

Remove an assertion the failure of which has not been actually observed,
but which appears clearly dangerous, for when the QMATH DImode add/sub
handler is invoked with the subtrahend and the minuend both the same.
Instead handle the operation by emitting a move of constant 0 to the
output operand. Adjust the relevant inline comment accordingly.

gcc/
* config/vax/vax.c (vax_expand_addsub_di_operands): Handle equal
input operands with subtraction.

VAX: Handle constant 0 with QMATH DImode add/sub

Handle constant 0 passed to the QMATH DImode add/sub handler such as
with:

#2  0x0000000011d409b0 in gen_adddi3 (operand0=0x7ffff5c0a128,
    operand1=0x7ffff5c60480, operand2=0x7ffff5c60470)
    at .../gcc/config/vax/vax.md:755
755   "vax_expand_addsub_di_operands (operands, PLUS); DONE;")
(gdb) pr operand0
(reg:DI 31)
(gdb) pr operand1
(const_int 0 [0])
(gdb) pr operand2
(const_int -1 [0xffffffffffffffff])
(gdb)

causing an assertion in `vax_expand_addsub_di_operands':

      gcc_assert (operands[1] != const0_rtx || code == MINUS);

to trigger:

during RTL pass: expand
.../gcc/testsuite/gcc.c-torture/compile/sync-1.c: In function 'test_op_ignore':
.../gcc/testsuite/gcc.c-torture/compile/sync-1.c:33:10: internal compiler error: in vax_expand_addsub_di_operands, at config/vax/vax.c:2080
0x11815003 vax_expand_addsub_di_operands(rtx_def**, rtx_code)
.../gcc/config/vax/vax.c:2080
0x11d409af gen_adddi3(rtx_def*, rtx_def*, rtx_def*)
.../gcc/config/vax/vax.md:755
0x10ea2763 rtx_insn* insn_gen_fn::operator()<rtx_def*, rtx_def*, rtx_def*>(rtx_def*, rtx_def*, rtx_def*) const
.../gcc/recog.h:304
0x10f7fc8f maybe_gen_insn(insn_code, unsigned int, expand_operand*)
.../gcc/optabs.c:7402
0x10f67f8b expand_binop_directly
.../gcc/optabs.c:1122
0x10f684cf expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, rtx_def*, int, optab_methods)
.../gcc/optabs.c:1209
0x10f6fb4f expand_unop(machine_mode, optab_tag, rtx_def*, rtx_def*, int)
.../gcc/optabs.c:3013
0x10f6c493 expand_simple_unop(machine_mode, rtx_code, rtx_def*, rtx_def*, int)
.../gcc/optabs.c:2200
0x10f7e2f3 expand_atomic_fetch_op(rtx_def*, rtx_def*, rtx_def*, rtx_code, memmodel, bool)
.../gcc/optabs.c:7021
0x107f7523 expand_builtin_sync_operation
.../gcc/builtins.c:7605
0x107ff547 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
.../gcc/builtins.c:9430
0x10acda63 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
.../gcc/expr.c:11249
0x10abeb9f expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
.../gcc/expr.c:8486
0x1085606b expand_expr
.../gcc/expr.h:282
0x1086157f expand_call_stmt
.../gcc/cfgexpand.c:2709
0x10865ab7 expand_gimple_stmt_1
.../gcc/cfgexpand.c:3713
0x108662fb expand_gimple_stmt
.../gcc/cfgexpand.c:3877
0x10870387 expand_gimple_basic_block
.../gcc/cfgexpand.c:5918
0x10872b6b execute
.../gcc/cfgexpand.c:6602
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
compiler exited with status 1
FAIL: gcc.c-torture/compile/sync-1.c   -O0  (internal compiler error)

causing numerous failures in regression testing.

While requesting an addition operation to be produced for the constant
operands of 0 and -1 may seem silly, technically there is nothing wrong
with it, and non-QMATH code (as with the `-mno-qmath' option) has no
issues with that, so neither should QMATH code.  This operation will
normally be folded in later passes anyway.

Observe then, that adding or subtracting constant 0 amounts to a move
(and we even have a machine instruction available to do that with a
single operation) so handle the case explicitly, swapping the addends if
so required, removing the assertion failure and along with that 70 test
suite failures like:

FAIL: gcc.c-torture/compile/sync-1.c   -O0  (internal compiler error)
FAIL: gcc.c-torture/compile/sync-1.c   -O0  fetch_and_nand (test for warnings, line )
FAIL: gcc.c-torture/compile/sync-1.c   -O0  nand_and_fetch (test for warnings, line )
FAIL: gcc.c-torture/compile/sync-1.c   -O0  (test for excess errors)
FAIL: gcc.c-torture/compile/sync-2.c   -O0  (internal compiler error)
FAIL: gcc.c-torture/compile/sync-2.c   -O0   (test for warnings, line )
FAIL: gcc.c-torture/compile/sync-2.c   -O0  (test for excess errors)
FAIL: gcc.c-torture/compile/sync-3.c   -O0  (internal compiler error)
FAIL: gcc.c-torture/compile/sync-3.c   -O0   (test for warnings, line )
FAIL: gcc.c-torture/compile/sync-3.c   -O0  (test for excess errors)

and similarly across all the other optimization levels and compilation
options covered.

gcc/
* config/vax/vax.c (vax_expand_addsub_di_operands): Handle the
addition or subtraction of 0.

VAX: Remove unused register allocation from QMATH DImode add/sub handler

An allocation is made for a temporary register, however it is unneeded,
as actually explained in the comment preceding the conditional block in
question, and consequently never used, so remove it. The `temp' rtx is
already used elsewhere in the function, which is possibly why this dead
assignment has not been warned about.

gcc/
* config/vax/vax.c (vax_expand_addsub_di_operands): Remove
unused register allocation.

VAX: Fix lower bound adjustment with `casesi'

Fix an issue with the `casesi' expander using `GEN_INT' to produce the
constant rtx for lower bound adjustment.  This generates a VOIDmode
value which may overflow the SImode range required for the operand to
stay within to satisfy `general_operand', resulting in an ICE like:

.../gcc/testsuite/gcc.c-torture/compile/pr46934.c: In function 'caller':
.../gcc/testsuite/gcc.c-torture/compile/pr46934.c:17:1: error: unrecognizable insn:
(insn 5 2 6 2 (set (reg:SI 25)
        (plus:SI (mem/c:SI (reg/f:SI 17 virtual-incoming-args) [1 reg_type+0 S4 A32])
            (const_int 2147483648 [0x80000000]))) -1
     (nil))
during RTL pass: vregs
.../gcc/testsuite/gcc.c-torture/compile/pr46934.c:17:1: internal compiler error: in extract_insn, at recog.c:2315
0x110d4673 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
.../gcc/rtl-error.c:108
0x110d46eb _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
.../gcc/rtl-error.c:116
0x1106578b extract_insn(rtx_insn*)
.../gcc/recog.c:2315
0x10b63f73 instantiate_virtual_regs_in_insn
.../gcc/function.c:1609
0x10b65b2f instantiate_virtual_regs
.../gcc/function.c:1979
0x10b65ca7 execute
.../gcc/function.c:2028
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
compiler exited with status 1
FAIL: gcc.c-torture/compile/pr46934.c   -O0  (internal compiler error)

Use `gen_int_mode' to produce the rtx instead, requesting a SImode value
so that the constant gets correctly truncated:

@@ -199,7 +199,7 @@ caller (unsigned int reg_type)

(insn 5 4 6 (set (reg:SI 25)
         (plus:SI (mem/c:SI (reg/f:SI 17 virtual-incoming-args) [1 reg_type+0 S4 A32])
-            (const_int 2147483648 [0x80000000]))) -1
+            (const_int -2147483648 [0xffffffff80000000]))) -1
      (nil))

(jump_insn 6 5 7 (set (pc)

removing these test suite failures:

FAIL: gcc.c-torture/compile/pr46934.c   -O0  (internal compiler error)
FAIL: gcc.c-torture/compile/pr46934.c   -O0  (test for excess errors)

with the `vax-netbsdelf' target.

gcc/
* config/vax/vax.md (casesi): Use `gen_int_mode' rather than
`GEN_INT' for the immediate used for lower bound adjustment.
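
For illustration, the effect of requesting an SImode constant can be
modelled as keeping the low 32 bits and sign-extending them.  This is a
standalone sketch of that arithmetic only, not GCC code; the helper name
sign_extend_si is hypothetical.

```cpp
#include <cstdint>

// Model of truncating a wide constant to a 32-bit signed (SImode-like)
// value: keep the low 32 bits, then sign-extend from bit 31.
std::int64_t sign_extend_si(std::int64_t value)
{
  std::uint32_t low = static_cast<std::uint32_t>(value);  // low 32 bits
  // Flip the sign bit and subtract 2^31: a portable way to sign-extend.
  return static_cast<std::int64_t>(low ^ 0x80000000u) - 0x80000000LL;
}
```

With this model, 2147483648 (0x80000000) becomes -2147483648, matching
the change in the RTL dump above, while values already in range such as
5 or -1 are returned unchanged.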

widening_mul: Fix a > ~b to .ADD_OVERFLOW optimization [PR98256]

Unfortunately, my latest tree-ssa-math-opts.c patch broke the following
testcase.  The problem is that the code is adding .ADD_OVERFLOW or
.SUB_OVERFLOW before or after the stmt on which the function has been
called, which is normally an addition or subtraction that has all the
operands.
But in the a > ~b optimization that stmt is the ~b stmt and the other
comparison operand might be defined only after that ~b stmt, so we can't
insert the .ADD_OVERFLOW next to ~b that we want to delete, but need to
insert it before the a > temp comparison that uses it; and in that case
when removing the BIT_NOT_EXPR stmt we need to ensure the caller doesn't do
gsi_next because gsi_remove already points the iterator to the next stmt.

2020-12-13  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/98256
* tree-ssa-math-opts.c (match_uaddsub_overflow): For BIT_NOT_EXPR,
only handle a single use, and insert .ADD_OVERFLOW before the
comparison rather than after the BIT_NOT_EXPR.  Return true iff
it is BIT_NOT_EXPR and it has been removed.
(math_opts_dom_walker::after_dom_children) <case BIT_NOT_EXPR>:
If match_uaddsub_overflow returned true, continue instead of break.

* gcc.c-torture/compile/pr98256.c: New test.
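
For illustration, the source-level idiom behind this optimization is the
unsigned comparison a > ~b, which is true exactly when a + b wraps
around.  The sketch below compares it with __builtin_add_overflow; the
function names are hypothetical and the code does not reproduce the
PR98256 testcase.

```cpp
// Overflow check written as a comparison against the complement.
// For unsigned b, ~b equals UINT_MAX - b, so a > ~b means a + b > UINT_MAX.
bool add_overflows_not_form(unsigned a, unsigned b)
{
  return a > ~b;
}

// The same check expressed with the GCC/Clang overflow builtin.
bool add_overflows_builtin(unsigned a, unsigned b)
{
  unsigned sum;
  return __builtin_add_overflow(a, b, &sum);
}

// True when both formulations give the same answer for a and b.
bool forms_agree(unsigned a, unsigned b)
{
  return add_overflows_not_form(a, b) == add_overflows_builtin(a, b);
}
```

For instance, 1 + ~0u overflows under both forms, while 0 + ~0u does not.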

Revert "Arm: Add NEON and MVE RTL patterns for Complex Addition, Multiply and FMA."

This reverts commit 3b8a82f97dd48e153ce93b317c44254839e11461.

Has a dependency on the AArch64 patch which hasn't been approved yet.

varasm: Reject soft frame or arg pointer registers for register vars [PR92469]

The following patch rejects frame, argp and retarg registers (unless they are equal
to hard frame pointer registers or if they aren't eliminable) from local or global
register vars.
These are just internal implementation details eliminated later into hard
frame pointer or stack pointer and using them as register variable leads
to numerous ICEs.

2020-12-13 Jakub Jelinek <jakub@redhat.com>

PR target/92469
* varasm.c (eliminable_regno_p): New function.
(make_decl_rtl): Reject asm vars for frame and argp
if they are different from hard frame pointer.

* gcc.target/i386/pr92469.c: New test.
* gcc.target/i386/pr79804.c: Adjust expected diagnostics.
* gcc.target/i386/pr88178.c: Expect an error.

Arm: Add NEON and MVE RTL patterns for Complex Addition, Multiply and FMA.

This adds implementation for the optabs for complex additions.  With this the
following C code:

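  #include <complex.h>

  /* 200 elements of 8 bytes each, inferred from the #1600 bound below.  */
  #define N 200

  void f90 (float complex a[restrict N], float complex b[restrict N],
    float complex c[restrict N])
  {
    for (int i=0; i < N; i++)
      c[i] = a[i] + (b[i] * I);
  }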

generates

  f90:
  add     r3, r2, #1600
  .L2:
  vld1.32 {q8}, [r0]!
  vld1.32 {q9}, [r1]!
  vcadd.f32       q8, q8, q9, #90
  vst1.32 {q8}, [r2]!
  cmp     r3, r2
  bne     .L2
  bx      lr

instead of

  f90:
  add     r3, r2, #1600
  .L2:
  vld2.32 {d24-d27}, [r0]!
  vld2.32 {d20-d23}, [r1]!
  vsub.f32 q8, q12, q11
  vadd.f32 q9, q13, q10
  vst2.32 {d16-d19}, [r2]!
  cmp     r3, r2
  bne     .L2
  bx      lr

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vcaddq_rot90_u8, __arm_vcaddq_rot270_u8,
__arm_vcaddq_rot90_s8, __arm_vcaddq_rot270_s8,
__arm_vcaddq_rot90_u16, __arm_vcaddq_rot270_u16, __arm_vcaddq_rot90_s16,
__arm_vcaddq_rot270_s16, __arm_vcaddq_rot90_u32,
__arm_vcaddq_rot270_u32, __arm_vcaddq_rot90_s32,
__arm_vcaddq_rot270_s32, __arm_vcmulq_rot90_f16,
__arm_vcmulq_rot270_f16, __arm_vcmulq_rot180_f16,
__arm_vcmulq_f16, __arm_vcaddq_rot90_f16, __arm_vcaddq_rot270_f16,
__arm_vcmulq_rot90_f32, __arm_vcmulq_rot270_f32,
__arm_vcmulq_rot180_f32, __arm_vcmulq_f32, __arm_vcaddq_rot90_f32,
__arm_vcaddq_rot270_f32, __arm_vcmlaq_f16, __arm_vcmlaq_rot180_f16,
__arm_vcmlaq_rot270_f16, __arm_vcmlaq_rot90_f16, __arm_vcmlaq_f32,
__arm_vcmlaq_rot180_f32, __arm_vcmlaq_rot270_f32,
__arm_vcmlaq_rot90_f32): Update builtin calls.
* config/arm/arm_mve_builtins.def (vcaddq_rot90_u, vcaddq_rot270_u,
vcaddq_rot90_s, vcaddq_rot270_s, vcaddq_rot90_f, vcaddq_rot270_f,
vcmulq_f, vcmulq_rot90_f, vcmulq_rot180_f, vcmulq_rot270_f,
vcmlaq_f, vcmlaq_rot90_f, vcmlaq_rot180_f, vcmlaq_rot270_f): Removed.
(vcaddq_rot90, vcaddq_rot270, vcmulq, vcmulq_rot90, vcmulq_rot180,
vcmulq_rot270, vcmlaq, vcmlaq_rot90, vcmlaq_rot180, vcmlaq_rot270):
New.
* config/arm/constraints.md (Dz): Include MVE.
* config/arm/iterators.md (mve_rotsplit1, mve_rotsplit2): New.
(rot): Add UNSPEC_VCMLS, UNSPEC_VCMUL and UNSPEC_VCMUL180.
(rot_op, rotsplit1, rotsplit2, fcmac1, VCMLA_OP, VCMUL_OP): New.
* config/arm/mve.md (VCADDQ_ROT270_S, VCADDQ_ROT90_S, VCADDQ_ROT270_U,
VCADDQ_ROT90_U, VCADDQ_ROT270_F, VCADDQ_ROT90_F, VCMULQ_F,
VCMULQ_ROT180_F, VCMULQ_ROT270_F, VCMULQ_ROT90_F, VCMLAQ_F,
VCMLAQ_ROT180_F, VCMLAQ_ROT90_F, VCMLAQ_ROT270_F, VCADDQ_ROT270_S,
VCADDQ_ROT270, VCADDQ_ROT90): Removed.
(mve_rot, VCMUL): New.
(mve_vcaddq_rot270_<supf><mode>, mve_vcaddq_rot90_<supf><mode>,
mve_vcaddq_rot270_f<mode>, mve_vcaddq_rot90_f<mode>, mve_vcmulq_f<mode>,
mve_vcmulq_rot180_f<mode>, mve_vcmulq_rot270_f<mode>,
mve_vcmulq_rot90_f<mode>, mve_vcmlaq_f<mode>, mve_vcmlaq_rot180_f<mode>,
mve_vcmlaq_rot270_f<mode>, mve_vcmlaq_rot90_f<mode>): Removed.
(mve_vcmlaq<mve_rot><mode>, mve_vcmulq<mve_rot><mode>,
mve_vcaddq<mve_rot><mode>, cadd<rot><mode>3, mve_vcaddq<mve_rot><mode>):
New.
(cmul<rot_op><mode>3): Exclude MVE types.
* config/arm/unspecs.md (UNSPEC_VCMUL90, UNSPEC_VCMUL270): New.
* config/arm/vec-common.md (cadd<rot><mode>3, cmul<rot_op><mode>3,
arm_vcmla<rot><mode>, cml<fcmac1><rot_op><mode>4): New.
* config/arm/unspecs.md (UNSPEC_VCMUL, UNSPEC_VCMUL180, UNSPEC_VCMLS,
UNSPEC_VCMLS180): New.
* config/arm/neon.md (cmul<rot_op><mode>3): New.

Arm: Add support for auto-vectorization using HF mode.

This adds HFmode vectorization support to the auto-vectorizer for
AArch32. This is supported when +fp16 is used. I wonder if I should disable
the returning of the type if the option isn't enabled.

At the moment it will be returned but the vectorizer will try and fail to use
it. It wastes a few compile cycles but doesn't result in bad code.

gcc/ChangeLog:

* config/arm/arm.c (arm_preferred_simd_mode): Add E_HFmode.

gcc/testsuite/ChangeLog:

* gcc.target/arm/vect-half-floats.c: New test.

middle-end: Support complex Addition

This patch adds support for

  * Complex Addition with rotation of 90 and 270.

  Addition with rotation of the second argument around the Argand plane.
    Supported rotations are 90 and 270.

    c = a + (b * I) and c = a + (b * I * I * I)
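
For illustration, the two rotations expand to the following real and
imaginary parts.  This is a scalar sketch of the arithmetic only, using
std::complex; the function names add_rot90 and add_rot270 are
hypothetical and do not correspond to anything in the patch.

```cpp
#include <complex>

using cfloat = std::complex<float>;

// c = a + (b * I): multiplying b by I rotates it by 90 degrees,
// turning (re, im) into (-im, re).
cfloat add_rot90(cfloat a, cfloat b)
{
  return cfloat(a.real() - b.imag(), a.imag() + b.real());
}

// c = a + (b * I * I * I): three multiplications by I rotate b by
// 270 degrees, turning (re, im) into (im, -re).
cfloat add_rot270(cfloat a, cfloat b)
{
  return cfloat(a.real() + b.imag(), a.imag() - b.real());
}
```

With a = 1 + 2i and b = 3 + 4i, add_rot90 gives -3 + 5i and add_rot270
gives 5 - 1i.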

gcc/ChangeLog:

* tree-vect-slp-patterns.c: New file.
* Makefile.in: Add it.
* doc/passes.texi: Document it.
* internal-fn.def (COMPLEX_ADD_ROT90, COMPLEX_ADD_ROT270): New.
* optabs.def (cadd90_optab, cadd270_optab): New.
* doc/md.texi: Document them.
* tree-vect-loop.c (vect_analyze_loop_2): Add dissolve code.
* tree-vect-slp.c:
(vect_free_slp_instance, vect_create_new_slp_node): Export.
(vect_match_slp_patterns_2, vect_match_slp_patterns): New.
(vect_analyze_slp): Use it.
* tree-vectorizer.h (vect_free_slp_tree): Export.
(enum _complex_operation): Forward declare.
(class vect_pattern): New.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_v8_3a_complex_neon_ok_nocache): Fix it.
(check_effective_target_vect_complex_add_byte
,check_effective_target_vect_complex_add_int
,check_effective_target_vect_complex_add_short
,check_effective_target_vect_complex_add_long
,check_effective_target_vect_complex_add_half
,check_effective_target_vect_complex_add_float
,check_effective_target_vect_complex_add_double): New.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/bb-slp-complex-add-pattern-unsigned-short.c: New test.
* gcc.dg/vect/complex/complex-add-pattern-template.c: New test.
* gcc.dg/vect/complex/complex-add-template.c: New test.
* gcc.dg/vect/complex/complex-operations-run.c: New test.
* gcc.dg/vect/complex/complex-operations.c: New test.
* gcc.dg/vect/complex/complex.exp: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-bb-slp-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-half-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-double.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-float.c: New test.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-short.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-byte.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-int.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-long.c: New test.
* gcc.dg/vect/complex/vect-complex-add-pattern-unsigned-short.c: New test.

middle-end: Refactor and expose some vectorizer helper functions.

This is a small refactoring which exposes some helper functions in the
vectorizer so they can be used in other places.

gcc/ChangeLog:

* tree-vect-patterns.c (vect_mark_pattern_stmts): Remove static inline.
* tree-vect-slp.c (vect_create_new_slp_node): Remove static and only
set stmts if valid.
* tree-vectorizer.c (vec_info::add_pattern_stmt): New.
(vec_info::set_vinfo_for_stmt): Optionally enforce read-only.
* tree-vectorizer.h (struct _slp_tree): Use new types.
(lane_permutation_t, load_permutation_t): New.
(vect_create_new_slp_node, vect_mark_pattern_stmts): New.

Show coarrays on parse tree dump, implement debug for array references.

gcc/fortran/ChangeLog:

* dump-parse-tree.c (show_array_ref): Also show coarrays.
(debug): Implement for array reference.

testsuite: Fix various scan-assembler-symbol-section issues

This patch addresses some of the issues that I found when looking into the
failures of the scan-assembler-symbol-section tests on Solaris/SPARC.

* The first issue was that on Solaris/SPARC, section names are
  double-quoted, both with as and gas:

        .section        ".text"

  When using as, the section flag and type syntax is completely
  different from other ELF targets:

        .section        "my_named_section",#alloc,#execinstr,#progbits

  This patch fixes this by stripping double quotes from section names.

* However, this didn't work initially (only the leading quote was
  stripped), which is due to David's recent AIX patch: with the
  introduction of the new capturing group to handle both .section (ELF)
  and .csect (XCOFF), $full_section_directive would never be empty on
  ELF and Mach-O targets, so the extraction of the section name didn't
  work any longer.  This had also broken the Darwin tests completely.

* With working double quote stripping, all but one of the tests PASSed
  on Solaris/SPARC, the exception being:

FAIL: gcc.dg/20021029-1.c scan-assembler-symbol-section symbol ar (found __sparc_get_pc_thunk.l7) has section ^\\\\.(const|rodata)|\\\\[RO\\\\] (found .text.__sparc_get_pc_thunk.l7%__sparc_get_pc_thunk.l7)

  This is due to the symbol name (ar) not being anchored in the test and
  unexpectedly matching __sparc_get_pc_thunk.l7.

* Next, I ran the tests on Darwin 11 and found two failing tests:

FAIL: gcc.dg/darwin-sections.c scan-assembler-symbol-section symbol ^_a\$ (symbol not found) has section \\\\.data
FAIL: gcc.dg/darwin-sections.c scan-assembler-symbol-section symbol ^_b\$ (symbol not found) has section \\\\.data

  This is due to Iain's recent "Darwin : Begin rework of zero-fill sections."
  patch which emits

        .globl _a
        .zerofill __DATA,__common,_a,1,0

  This is already scanned for, so the two scans above can just go.

  The other failing test is

FAIL: g++.dg/gomp/tls-5.C  -std=c++14  scan-assembler-symbol-section symbol ^_?_ZGR2ir_\$ (symbol not found) has section ^\\\\.tdata|\\\\[TL\\\\]
FAIL: g++.dg/gomp/tls-5.C  -std=c++14  scan-assembler-symbol-section symbol ^_?ir\$ (symbol not found) has section ^\\\\.tbss|\\\\[TL\\\\]

  Other scans are guarded by target tls_native, and indeed the assembler
  output has

___emutls_v._ZGR2ir_:
___emutls_t._ZGR2ir_:

___emutls_v.ir:

  Unfortunately scan-assembler-symbol-section doesn't support selectors
  yet, which this patch implements both for the benefit of this test and
  for symmetry.

With those changes, test results are clean now on sparc-sun-solaris2.11,
i386-pc-solaris2.11, i386-apple-darwin11.4.2, and
powerpc-ibm-aix7.2.4.0.

2020-12-03  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc:
* doc/sourcebuild.texi (Commands for use in dg-final, Scan the
assembly output, scan-assembler-symbol-section): Document.
(scan-symbol-section): Document.

gcc/testsuite:
* lib/scanasm.exp (scan-symbol-section): Pass args to
dg-scan-symbol-section.
(scan-assembler-symbol-section): Likewise.
(dg-scan-symbol-section): Handle selector from orig_args.
Get patterns from orig_args.
(parse_section_of_symbols): Fix section_pattern.
Strip double quotes from section name.

* g++.dg/gomp/tls-5.C: Restrict ir, _ZGR2ir_ scans to tls_native.
* gcc.dg/20021029-1.c: Anchor ar symbol.
* gcc.dg/darwin-sections.c: Remove obsolete scans for _a, _b in
.data.

Tweak the way that is_a is implemented

At the moment, class hierarchies that use is_a are expected
to define specialisations like:

  template <>
  template <>
  inline bool
  is_a_helper <cgraph_node *>::test (symtab_node *p)
  {
    return p->type == SYMTAB_FUNCTION;
  }

But this doesn't scale well to larger hierarchies, because it only
defines ::test for an argument that is exactly “symtab_node *”
(and not for example “const symtab_node *” or something that
comes between cgraph_node and symtab_node in the hierarchy).

For example:

  struct A { int x; };
  struct B : A {};
  struct C : B {};

  template <>
  template <>
  inline bool
  is_a_helper <C *>::test (A *a)
  {
    return a->x == 1;
  }

  bool f(B *b) { return is_a<C *> (b); }

gives:

  warning: inline function ‘static bool is_a_helper<T>::test(U*) [with U = B; T = C*]’ used but never defined

and:

  bool f(const A *a) { return is_a<const C *> (a); }

gives:

  warning: inline function ‘static bool is_a_helper<T>::test(U*) [with U = const A; T = const C*]’ used but never defined

This patch instead allows is_a to be implemented by specialising
is_a_helper as a whole, for example:

  template<>
  struct is_a_helper<C *> : static_is_a_helper<C *>
  {
    static inline bool test (const A *a) { return a->x == 1; }
  };

It also adds a general specialisation of is_a_helper for const
pointers.  Together, this makes both of the above examples work.
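
For illustration, the whole-class approach can be modelled in a few
lines.  This is a simplified, self-contained sketch and not the code in
is-a.h: it omits static_is_a_helper and the cast support entirely, and
the is_a wrapper and the check_* functions are hypothetical stand-ins.

```cpp
// Primary template, left undefined: each target type supplies its own
// specialisation of the whole class.
template <typename T> struct is_a_helper;

struct A { int x; };
struct B : A {};
struct C : B {};

// Specialise the class as a whole.  Because test takes 'const A *',
// any pointer convertible to it (A *, B *, const A *, ...) is accepted.
template <>
struct is_a_helper<C *>
{
  static bool test (const A *a) { return a->x == 1; }
};

// General specialisation for const pointers: reuse the non-const helper.
template <typename T>
struct is_a_helper<const T *> : is_a_helper<T *> {};

// Simplified stand-in for is_a.
template <typename T, typename U>
bool is_a (U *p) { return is_a_helper<T>::test (p); }

// The two cases that previously produced "used but never defined".
bool check_from_b (int x) { B b{}; b.x = x; return is_a<C *> (&b); }
bool check_from_const_a (int x) { const A a{x}; return is_a<const C *> (&a); }
```

Both check_from_b (1) and check_from_const_a (1) compile and return
true in this model, because overload resolution on the single test
function handles the intermediate and const-qualified pointer types.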

gcc/
* is-a.h (reinterpret_is_a_helper): New class.
(static_is_a_helper): Likewise.
(is_a_helper): Inherit from reinterpret_is_a_helper.
(is_a_helper<const T *>): New specialization.

Move iterator_range to a new iterator-utils.h file

A later patch will add more iterator-related utilities. Rather than
putting them all directly in coretypes.h, it seemed better to add a
new header file, here called "iterator-utils.h". This preliminary
patch moves the existing iterator_range class there too.

I used the same copyright date range as coretypes.h “just to be sure”.

gcc/
* coretypes.h (iterator_range): Move to...
* iterator-utils.h: ...this new file.

rtlanal: Remove noop_move_p REG_EQUAL condition

noop_move_p currently keeps any instruction that has a REG_EQUAL
note, on the basis that the equality might be useful in future.
But this creates a perverse incentive not to add potentially-useful
REG_EQUAL notes, in case they prevent an instruction from later being
removed as dead.

The condition originates from flow.c:life_analysis_1 and predates
the changes tracked by the current repository (1992). It probably
made sense when most optimisations were done on RTL rather than FE
trees, but it seems counterproductive now.

gcc/
* rtlanal.c (noop_move_p): Don't check for REG_EQUAL notes.

vec: Silence clang warning

I noticed during compatibility testing that clang warns that this
operator won't be implicitly const in C++14 onwards.

gcc/
* vec.h (vnull::operator vec<T, A, L>): Make const.

Daily bump.

libstdc++: Fix _GLIBCXX_DEBUG mode constexpr compatibility

The __glibcxx_check_can_[increment|decrement]_range macros are using the
_GLIBCXX_DEBUG_VERIFY_COND_AT macro which is not constexpr compliant and will produce nasty
diagnostics rather than the std::__failed_assertion dedicated to constexpr. Replace it with
correct _GLIBCXX_DEBUG_VERIFY_AT_F.

libstdc++-v3/ChangeLog:

* include/debug/macros.h (__glibcxx_check_can_increment_range): Replace
_GLIBCXX_DEBUG_VERIFY_COND_AT usage with _GLIBCXX_DEBUG_VERIFY_AT_F.
(__glibcxx_check_can_decrement_range): Likewise.
* testsuite/25_algorithms/copy_backward/constexpr.cc (test03): New.
* testsuite/25_algorithms/copy/debug/constexpr_neg.cc: New test.
* testsuite/25_algorithms/copy_backward/debug/constexpr_neg.cc: New test.
* testsuite/25_algorithms/equal/constexpr_neg.cc: New test.
* testsuite/25_algorithms/equal/debug/constexpr_neg.cc: New test.

Fortran: Enable inquiry references in data statements [PR98022].

2020-12-12 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/98022
* data.c (gfc_assign_data_value): Handle inquiry references in
the data statement object list.

gcc/testsuite/
PR fortran/98022
* gfortran.dg/data_inquiry_ref.f90: New test.

match.pd: Add ~(X - Y) -> ~X + Y simplification [PR96685]

This patch adds the ~(X - Y) -> ~X + Y simplification requested
in the PR (plus also ~(X + C) -> ~X + (-C) for constants C that can
be safely negated).

The first two simplify blocks are what was requested in the PR,
and they make the first testcase pass.
Unfortunately, that change also breaks the second testcase. Previously,
the same expressions were folded (not really) the same way whether they
appeared in a single stmt or were split across multiple stmts, but with
this optimization fold-const.c optimizes ~X + Y further into
(Y - X) - 1 in the fold_binary_loc associate: code, whereas we have nothing
like that in GIMPLE and so end up with different expressions.

The last simplify is an attempt to deal with just this case.
It had to rule out the Y == -1U case, because then we
reached infinite recursion: ~X + -1U was canonicalized by
the pattern into (-1U - X) + -1U, but there is a canonicalization
-1 - A -> ~A that turns it back. Furthermore, it had to be made #if
GIMPLE only, because it otherwise resulted in infinite recursion
when interacting with the associate: optimization.
The end result is that we pass all 3 testcases and thus canonicalize
the 3 possible forms of writing the same thing.
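
As an illustration only (this is plain C++, not match.pd syntax), the three
forms discussed above can be written as ordinary functions on a wrapping
unsigned type. The function names are hypothetical and exist only for this
sketch; the point is that all three spellings compute the same value, which is
why canonicalizing them to one form is safe.

```cpp
#include <cstdint>

// Three spellings of the same value for unsigned (wrapping) operands.
// The names are hypothetical and used only for this illustration.
inline std::uint32_t not_of_difference(std::uint32_t x, std::uint32_t y) {
  return ~(x - y);            // ~(X - Y)
}

inline std::uint32_t not_x_plus_y(std::uint32_t x, std::uint32_t y) {
  return ~x + y;              // ~X + Y
}

inline std::uint32_t difference_minus_one(std::uint32_t x, std::uint32_t y) {
  return (y - x) - 1u;        // (Y - X) - 1
}

// The constant variant: ~(X + C) and ~X + (-C).
inline std::uint32_t not_of_sum(std::uint32_t x, std::uint32_t c) {
  return ~(x + c);
}

inline std::uint32_t not_x_plus_negated(std::uint32_t x, std::uint32_t c) {
  return ~x + (0u - c);       // unsigned negation wraps, so this is well defined
}
```

The equalities follow from ~A == -A - 1 in modular arithmetic, so they hold
for every pair of operands, including the Y == -1U case that the last pattern
has to exclude for termination reasons rather than for correctness.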

2020-12-12 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/96685
* match.pd (~(X - Y) -> ~X + Y): New optimization.
(~X + Y -> (Y - X) - 1): Likewise.

* gcc.dg/tree-ssa/pr96685-1.c: New test.
* gcc.dg/tree-ssa/pr96685-2.c: New test.
* gcc.dg/tree-ssa/pr96685-3.c: New test.

widening_mul: Recognize another form of ADD_OVERFLOW [PR96272]

The following patch recognizes another form of hand written
__builtin_add_overflow (this time _p), in particular when
the code does unsigned
if (x > ~0U - y)
or
if (x <= ~0U - y)
it can be optimized (if the subtraction turned into ~y is single use)
into
if (__builtin_add_overflow_p (x, y, 0U))
or
if (!__builtin_add_overflow_p (x, y, 0U))
and generate better code, e.g. for the first function in the testcase:
-       movl    %esi, %eax
        addl    %edi, %esi
-       notl    %eax
-       cmpl    %edi, %eax
-       movl    $-1, %eax
-       cmovnb  %esi, %eax
+       jc      .L3
+       movl    %esi, %eax
+       ret
+.L3:
+       orl     $-1, %eax
        ret
on x86_64.  As for the jumps vs. conditional move case, that is some CE
issue with complex branch patterns we should fix up no matter what, but
in this case I'm actually not sure if branchy code isn't better; overflow
is something that isn't that common.
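
For reference, here is a small sketch of the two source-level forms involved,
written as a saturating add. The function names are hypothetical. The second
function uses __builtin_add_overflow (the three-argument builtin that also
returns the sum) rather than the __builtin_add_overflow_p form quoted above,
purely so that the sketch also builds with compilers that lack the _p variant.

```cpp
// Hand-written overflow check in the form the patch recognizes:
// x > ~0U - y is the same test as x > ~y for unsigned operands.
inline unsigned sat_add_manual(unsigned x, unsigned y) {
  if (x > ~0U - y)   // x + y would wrap around
    return ~0U;      // saturate
  return x + y;
}

// The same function written with the overflow builtin.
inline unsigned sat_add_builtin(unsigned x, unsigned y) {
  unsigned sum;
  if (__builtin_add_overflow(x, y, &sum))  // true when the add wrapped
    return ~0U;
  return sum;
}
```

Both functions return the same result for all inputs; the difference the
patch cares about is only in the code the compiler generates for the first.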

2020-12-12  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/96272
* tree-ssa-math-opts.c (uaddsub_overflow_check_p): Add OTHER argument.
Handle BIT_NOT_EXPR.
(match_uaddsub_overflow): Optimize unsigned a > ~b into
__imag__ .ADD_OVERFLOW (a, b).
(math_opts_dom_walker::after_dom_children): Call match_uaddsub_overflow
even for BIT_NOT_EXPR.

* gcc.dg/tree-ssa/pr96272.c: New test.

openmp, openacc: Fix up handling of data regions [PR98183]

While the data regions (target data and OpenACC counterparts) aren't
standalone directives, unlike most other OpenMP/OpenACC constructs
we allow (apparently as an extension) exceptions and goto out of
the block. During gimplification we place an *end* call into a finally
block so that it is reached even on exceptions, goto out, etc.
During the omplower pass we then add paired #pragma omp return for them,
but because exceptions mean the region is not SESE, we can end up
with #pragma omp return appearing only conditionally in the CFG etc.,
which the ompexp pass can't handle.
For the ompexp pass, we actually don't care about the end part or about
target data nesting, so we can treat it as a standalone directive.

2020-12-12 Jakub Jelinek <jakub@redhat.com>

PR middle-end/98183
* omp-low.c (lower_omp_target): Don't add OMP_RETURN for
data regions.
* omp-expand.c (expand_omp_target): Don't try to remove
OMP_RETURN for data regions.
(build_omp_regions_1, omp_make_gimple_edges): Don't expect
OMP_RETURN for data regions.

* gcc.dg/gomp/pr98183.c: New test.
* gcc.dg/goacc/pr98183.c: New test.

Daily bump.

c++: Avoid considering some conversion ops [PR97600]

Patrick's earlier patch to check convertibility before constraints for
conversion ops wasn't suitable because checking convertibility can also lead
to unwanted instantiations, but it occurs to me that there's a smaller check
we can do to avoid doing normal consideration of the conversion ops in this
case: since we're in the middle of a user-defined conversion, we can exclude
from consideration any conversion ops that return a type that would need an
additional user-defined conversion to reach the desired type: namely, a type
that differs in class-ness from the desired type.

[temp.inst]/9 allows optimizations like this: "If the function selected by
overload resolution can be determined without instantiating a class template
definition, it is unspecified whether that instantiation actually takes
place."

gcc/cp/ChangeLog:

PR libstdc++/97600
* call.c (build_user_type_conversion_1): Avoid considering
conversion functions that return a clearly unsuitable type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-conv3.C: New test.

c++: Fix build with --enable-gather-detailed-mem-stats.

Nathan's recent patch added make_binding_vec defined with MEM_STAT_DECL, but
didn't add the parallel decoration to the forward declaration.

gcc/cp/ChangeLog:

* cp-tree.h (make_binding_vec): Add CXX_MEM_STAT_INFO.

c++: Final module preparations

This adds the final few preparations to drop modules in.  I'd missed a
couple of changes to core compiler -- a new pair of preprocessor
options, and marking the boundary of fixed and lazy global trees.

For C++, we need to add module.cc to the GTY scanner.  Parsing final
cleanups needs a few tweaks for modules.  Lambdas used to initialize a
global (for instance) get an extra scope, but we now need to point
that object to the lambda too.  Finally template instantiation needs
to do lazy loading before looking at the available instantiations and
specializations.

gcc/
* gcc.c (cpp_unique_options): Add Mmodules, Mno-modules.
* tree-core.h (enum tree_index): Add TI_MODULE_HWM.
gcc/cp/
* config-lang.in (gtfiles): Add cp/module.cc.
* decl2.c (c_parse_final_cleanups): Add module support.
* lambda.c (record_lambda_scope): Call maybe_attach_decl.
* module.cc (maybe_attach_decl, lazy_load_specializations): Stubs.
(finish_module_procesing): Stub.
* pt.c (lookup_template_class_1): Lazy load specializations.
(instantiate_template_1): Likewise.

c++: Refactor final cleanup

This is a small refactor of the end of decl processing, into which
dropping module support will be simpler.

gcc/cp/
* decl2.c (c_parse_final_cleanups): Refactor loop.

Add missing varasm DECL_P check.

This fixes a riscv64-linux bootstrap failure.

get_constant_section calls the select_section target hook, and select_section
calls get_named_section which calls get_section.  So it is possible to have
a constant that is not a decl in both of these functions.  They already
perform DECL_P checks everywhere except in the new code HJ recently added.
This adds the missing DECL_P check.

gcc/
* varasm.c (get_section): Add DECL_P check before DECL_PRESERVE_P.

Daily bump.

compiler: encode user visible names if necessary

Avoid putting weird characters into the user visible name.
It breaks stabs in particular, and may also cause debugger problems.
Instead, encode those names, and use a "g." prefix to tell the debugger.

Also dereference the type for the name of a recover thunk, to avoid a
pointless '*' that gets encoded.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/277232

arm: Auto-vectorization for MVE clean condition for vand and vorr expanders

The patch restores the unconditional definition of the VDQ iterator,
and changes the conditions of the vand and vorr expanders to use
ARM_HAVE_<MODE>_ARITH.

2020-12-11 Christophe Lyon <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (VDQ): Remove TARGET_HAVE_MVE
conditions.
* config/arm/vec-common.md (and<mode>3): Use
ARM_HAVE_<MODE>_ARITH.
(ior<mode>3): Likewise.

arc: Update ARC700 cache hazard detection.

Replace/update ARC700 cache hazard detection. The following situations
are handled:

- There are 2 back-to-back stores, then 3 loads in the next 3 or 4 instructions.

    If there are 3 loads in 3 instructions, we insert 2 nops after the stores.
    If there are 3 loads in 4 instructions, we insert 1 nop after the stores.

- 2 back to back stores, followed by at least 3 loads in next 4 instructions.
        st st ld ld ld ##
        st st ## ld ld ld
        st st ld ## ld ld
        st st ld ld ## ld
        ## - any instruction

- store between non-store instructions, followed by 3 loads
        $$ st SS ld ld ld
        $$ - non-store instruction, even load.
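
The nop counts for the first two situations can be sketched as a small
decision function. This is a hypothetical model for illustration, not the code
in arc.c: the function name and the one-character-per-instruction encoding are
assumptions, and the third situation (a store between non-store instructions)
is not modelled.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model: `window` describes the instructions that follow two
// back-to-back stores, one character per instruction: 'l' is a load, any
// other character is some other instruction.  Returns how many nops the
// rules above would place after the stores.
inline int nops_after_store_pair(const std::string &window) {
  int loads_in_3 = 0;  // loads among the first 3 following instructions
  int loads_in_4 = 0;  // loads among the first 4 following instructions
  for (std::size_t i = 0; i < window.size() && i < 4; ++i) {
    if (window[i] != 'l')
      continue;
    if (i < 3)
      ++loads_in_3;
    ++loads_in_4;
  }
  if (loads_in_3 == 3)
    return 2;          // st st ld ld ld
  if (loads_in_4 >= 3)
    return 1;          // 3 loads spread over 4 instructions
  return 0;            // no hazard under these two rules
}
```

For example, the window "lll." yields 2, while ".lll", "l.ll" and "ll.l"
each yield 1.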

gcc/
2020-12-11  Claudiu Zissulescu  <claziss@synopsys.com>

* config/arc/arc.c (arc_active_insn): Ignore all non essential
instructions when getting the next active instruction.
(check_store_cacheline_hazard): Update.
(workaround_arc_anomaly): Remove obsolete cache hazard code.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

arc: Avoid generating brcc instructions with limm

BRcc instructions are generated quite late in the compilation
process. These instructions combine a compare with a regular
conditional branch if the result of the compare is not used
any longer. However, when compiling for size, it is better to avoid
BRcc instructions that introduce a 32-bit long immediate.

gcc/
2020-12-11 Claudiu Zissulescu <claziss@synopsys.com>

* config/arc/arc.c (arc_reorg): Avoid limm in BRcc.

arc: Refurbish adc/sbc patterns

The adc/sbc patterns were splitting unnecessarily; remove that and
the associated functions.

gcc/
2020-12-11 Claudiu Zissulescu <claziss@synopsys.com>

* config/arc/arc-protos.h (arc_scheduling_not_expected): Remove
it.
(arc_sets_cc_p): Likewise.
(arc_need_delay): Likewise.
* config/arc/arc.c (arc_sets_cc_p): Likewise.
(arc_need_delay): Likewise.
(arc_scheduling_not_expected): Likewise.
* config/arc/arc.md: Convert adc/sbc patterns to simple
instruction definitions.

Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>

c++: module test harness

Here is the module test harness -- but no tests.

gcc/testsuite/
* g++.dg/modules/modules.exp: New.

c++: cp_tree_equal tweaks

When comparing streamed trees we can encounter NON_LVALUE_EXPR and
VIEW_CONVERT_EXPRs with null types. Also, when checking a potential
duplicate we don't want to reject PARM_DECLs with different contexts,
if those two contexts are the two decls of interest.

gcc/cp/
* cp-tree.h (map_context_from, map_context_to): Declare.
* module.cc (map_context_from, map_context_to): Define.
* tree.c (cp_tree_equal): Check map_context_{from,to} for parm
context difference. Allow NON_LVALUE_EXPR and VIEW_CONVERT_EXPR
with null types.

arm: Auto-vectorization for MVE: vorr

This patch enables MVE vorrq instructions for auto-vectorization.  MVE
vorrq insns in mve.md are modified to use ior instead of unspec
expression to support ior<mode>3.  The ior<mode>3 expander is added to
vec-common.md

2020-12-03  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (supf): Remove VORRQ_S and VORRQ_U.
(VORRQ): Remove.
* config/arm/mve.md (mve_vorrq_s<mode>): New entry for vorr
instruction using expression ior.
(mve_vorrq_u<mode>): New expander.
(mve_vorrq_f<mode>): Use ior code instead of unspec.
* config/arm/neon.md (ior<mode>3): Renamed into ior<mode>3_neon.
* config/arm/predicates.md (imm_for_neon_logic_operand): Enable
for MVE.
* config/arm/unspecs.md (VORRQ_S, VORRQ_U, VORRQ_F): Remove.
* config/arm/vec-common.md (ior<mode>3): New expander.

gcc/testsuite/
* gcc.target/arm/simd/mve-vorr.c: Add vorr tests.

arc: Use separate predicated patterns for mpyd(u)

The compiler can match mpyd.eq r0,r1,r0 as a predicated instruction,
which is incorrect. The mpyd(u) instruction takes as input two 32-bit
registers, returning into a double 64-bit even-odd register pair.  For
the predicated case, the ARC instruction decoder expects the
destination register to be the same as the first input register. In
the big-endian case the result is swapped in the destination register
pair, however, the instruction encoding remains the same.  Refurbish
the mpyd(u) patterns to take into account the above observation.

gcc/
2020-12-11  Claudiu Zissulescu  <claziss@synopsys.com>

* config/arc/arc.md (mpyd<su_optab>_arcv2hs): New template
pattern.
(*pmpyd<su_optab>_arcv2hs): Likewise.
(*pmpyd<su_optab>_imm_arcv2hs): Likewise.
(mpyd_arcv2hs): Moved into above template.
(mpyd_imm_arcv2hs): Moved into above template.
(mpydu_arcv2hs): Likewise.
(mpydu_imm_arcv2hs): Likewise.
(su_optab): New optab prefix for sign/zero-extending operations.

gcc/testsuite/
2020-12-11  Claudiu Zissulescu  <claziss@synopsys.com>

* gcc.target/arc/pmpyd.c: New test.
* gcc.target/arc/tmac-1.c: Update.

Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>

x86: Update user interrupt handler stack frame

The user interrupt handler stack frame is similar to the exception
interrupt handler stack frame. Instead of an error code, the second
argument is the user interrupt request register vector.

gcc/

PR target/98219
* config/i386/uintrintrin.h (__uintr_frame): Remove uirrv.

gcc/testsuite/

PR target/98219
* gcc.dg/guality/pr98219-1.c: New test.
* gcc.dg/guality/pr98219-2.c: Likewise.
* gcc.dg/torture/pr98219-1.c: Likewise.
* gcc.dg/torture/pr98219-2.c: Likewise.
* gcc.target/i386/uintr-2.c: Scan "add[lq] $8, %[er]sp".
(uword_t): New.
(foo): Add a uword_t argument.
(UINTR_hanlder): Likewise.
* gcc.target/i386/uintr-3.c: Scan "add[lq] $8, %[er]sp".
(uword_t): New.
(UINTR_hanlder): Add a uword_t argument.
* gcc.target/i386/uintr-4.c (uword_t): New.
(UINTR_hanlder): Add a uword_t argument.
* gcc.target/i386/uintr-5.c (uword_t): New.
(UINTR_hanlder): Add a uword_t argument.

c++: Module lang hook overriding

This installs stub lang hooks for modules and creates the module dump file.

gcc/cp/
* cp-lang.c (LANG_HOOKS_PREPROCESS_MAIN_FILE): Override.
(LANG_HOOKS_PREPROCESS_OPTIONS): Override.
(LANG_HOOKS_PREPROCESS_TOKEN): Override.
* cp-objcp-common.c (cp_register_dumps): Add module dump.
(cp_handle_option): New.
* cp-objcp-common.h (cp_handle_option): Declare.
(LANG_HOOKS_HANDLE_OPTION): Override.
* cp-tree.h (module_dump_id): Declare.
* module.cc (module_dump_id): Define.
(module_begin_main_file, handle_module_option)
(module_preproces_options): Stubs.

c++: name lookup API for modules

This adds a set of calls to name lookup that are needed by modules.
Generally installing imported bindings, or walking the current TU's
bindings.  One note about template instantiations though.  When we're
about to instantiate a template we have to know about all the
maybe-partial specializations that exist.  These can be in any
imported module -- not necessarily the module defining the template.
Thus we key such foreign templates to the innermost namespace and
identifier of the containing entity -- that's the only thing we have
a handle on.  That's why we note and load pending specializations here.

gcc/cp/
* module.cc (lazy_specializations_p): Stub.
* name-lookup.h (append_imported_binding_slot)
(mergeable_namespacE_slots, lookup_class_binding)
(walk_module_binding, import_module_binding, set_module_binding)
(note_pending_specializations, load_pending_specializations)
(add_module_decl, add_imported_namespace): Declare.
(get_cxx_dialect_name): Declare.
(enum WMB_flags): New.
* name-lookup.c (append_imported_binding_slot)
(mergeable_namespacE_slots, lookup_class_binding)
(walk_module_binding, import_module_binding, set_module_binding)
(note_pending_specializations, load_pending_specializations)
(add_module_decl, add_imported_namespace): New.
(get_cxx_dialect_name): Make extern.

c++: missing SFINAE with pointer subtraction [PR78173]

This fixes a missed SFINAE when subtracting pointers to an incomplete
type.

gcc/cp/ChangeLog:

PR c++/78173
* typeck.c (pointer_diff): Use complete_type_or_maybe_complain
instead of complete_type_or_else.

gcc/testsuite/ChangeLog:

PR c++/78173
* g++.dg/cpp2a/concepts-pr78173.C: New test.

arm: Improve documentation for effective target 'arm_softfloat'

gcc/ChangeLog

2020-12-01 Andrea Corallo <andrea.corallo@arm.com>

* doc/sourcebuild.texi (arm_softfloat): Improve documentation.

gcc/testsuite/ChangeLog

2020-12-01 Andrea Corallo <andrea.corallo@arm.com>

* lib/target-supports.exp (check_effective_target_arm_softfloat):
Improve documentation.

arm: [testsuite] fix lob tests for -mfloat-abi=hard

2020-11-26 Andrea Corallo <andrea.corallo@arm.com>

* gcc.target/arm/lob2.c: Use '-march=armv8.1-m.main+fp'.
* gcc.target/arm/lob3.c: Skip with '-mfloat-abi=hard'.
* gcc.target/arm/lob4.c: Likewise.
* gcc.target/arm/lob5.c: Use '-march=armv8.1-m.main+fp'.

testsuite/98244 - amend gcc.dg/vect/vect-live-6.c

Committed.

2020-12-11 Richard Biener <rguenther@suse.de>

PR testsuite/98244
* gcc.dg/vect/vect-live-6.c: Require vect_condition.

testsuite/98242 - amend gcc.dg/vect/bb-slp-subgroups-3.c

Committed.

2020-12-11 Richard Biener <rguenther@suse.de>

PR testsuite/98242
* gcc.dg/vect/bb-slp-subgroups-3.c: Require vect_int_mult.

testsuite/98240 - amend gcc.dg/vect/pr97678.c

Committed.

2020-12-11 Richard Biener <rguenther@suse.de>

PR testsuite/98240
* gcc.dg/vect/pr97678.c: Require vect_int_mult and
vect_pack_trunc.

testsuite/98239 - require vect_condition for gcc.dg/vect/bb-slp-69.c

Committed.

2020-12-11 Richard Biener <rguenther@suse.de>

PR testsuite/98239
* gcc.dg/vect/bb-slp-69.c: Require vect_condition.

expand: Fix up expand_doubleword_mod on 32-bit targets [PR98229]

As the testcase shows, for 32-bit word size we can end up with op1
up to 0xffffffff (0x100000000 % 0xffffffff == 1 and so we use bit == 32
for that), but the CONST_INT we got from caller is for DImode in that case
and not valid for SImode operations.

The following patch canonicalizes the two spots where the constant needs
canonicalization.
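
To make the numbers concrete, the sketch below shows what canonicalizing a
constant for a narrower width amounts to: keep the low bits and sign-extend
them. The helper name is hypothetical and this is not the code in optabs.c;
it only models the convention that an integer constant is stored sign-extended
from the width of the mode it is used in.

```cpp
#include <cstdint>

// Hypothetical helper: reduce `value` to its low `bits` bits (1..64) and
// sign-extend the result, which is the canonical form for a constant of
// that width.  The final unsigned-to-signed conversion is modular on the
// usual two's-complement targets (and guaranteed so from C++20 on).
inline std::int64_t canonicalize_for_width(std::int64_t value, unsigned bits) {
  const std::uint64_t sign = std::uint64_t{1} << (bits - 1);
  const std::uint64_t mask = sign | (sign - 1);
  const std::uint64_t low = static_cast<std::uint64_t>(value) & mask;
  return static_cast<std::int64_t>((low ^ sign) - sign);
}
```

With a 32-bit word, 0xffffffff is a valid 64-bit constant as-is, but its
32-bit canonical form is -1; likewise 1 - 0xffffffff becomes 2 once reduced
to 32 bits.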

2020-12-10 Jakub Jelinek <jakub@redhat.com>

PR rtl-optimization/98229
* optabs.c (expand_doubleword_mod): Canonicalize op1 and
1 - INTVAL (op1) as word_mode constants when used in
word_mode arithmetics.

* gcc.c-torture/compile/pr98229.c: New test.

tree-optimization/98235 - limit SLP discovery

With following backedges and the SLP discovery cache not being
permute aware we have to put some discovery limits in place again.
That's also the opportunity to ditch the separate limit on the
number of permutes we try, so the patch limits the overall work
done (as in vect_build_slp_tree cache misses) to what we compute
as max_tree_size which is based on the number of scalar stmts in
the vectorized region.

Note the limit is global and there's no attempt to divide the
allowed work evenly amongst opportunities, so one degenerate
can eat it all up.  That's probably only relevant for BB
vectorization where the limit is based on up to the size of the
whole function.

2020-12-11  Richard Biener  <rguenther@suse.de>

PR tree-optimization/98235
* tree-vect-slp.c (vect_build_slp_tree): Exchange npermutes
for limit.  Decrement that for each cache miss and fail
discovery when it reaches zero.
(vect_build_slp_tree_2): Remove npermutes handling and
simply pass down limit.
(vect_build_slp_instance): Use pass down limit.
(vect_analyze_slp_instance): Likewise.
(vect_analyze_slp): Base the SLP discovery limit on
max_tree_size and pass it down.

* gcc.dg/torture/pr98235.c: New testcase.

expansion: Sign or zero extend on MEM_REF stores into SUBREG with SUBREG_PROMOTED_VAR_P [PR98190]

Some targets decide to promote certain scalar variables to wider mode,
so their DECL_RTL is a SUBREG with SUBREG_PROMOTED_VAR_P.
When storing to such vars, store_expr takes care of sign or zero extending,
but if we store e.g. through MEM_REF into them, no sign or zero extension
happens and that leads to wrong-code e.g. on the following testcase on
aarch64-linux.

The following patch uses store_expr if we overwrite all the bits and it is
not reversed storage order, i.e. something that store_expr handles normally.
Otherwise, if the most significant bit is being modified (or, for pdp11,
might be, but pdp11 doesn't promote), the code extends manually.
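
The problem can be pictured with a toy model, which is only an illustration
and not how expr.c represents anything: a signed 16-bit variable lives in a
64-bit register that must always hold the sign-extension of its low 16 bits.
All names below are hypothetical.

```cpp
#include <cstdint>

// Sign-extend the low 16 bits of `reg` to the full 64-bit register.
inline std::uint64_t sign_extend16(std::uint64_t reg) {
  const std::uint64_t low = reg & 0xffffu;
  return (low ^ 0x8000u) - 0x8000u;
}

// Invariant of the promoted variable: the register equals the
// sign-extension of its own low 16 bits.
inline bool is_properly_extended(std::uint64_t reg) {
  return reg == sign_extend16(reg);
}

// A partial store that rewrites bits 8..15 (including the sign bit of the
// 16-bit variable) without touching the upper register bits.
inline std::uint64_t store_high_byte_raw(std::uint64_t reg, std::uint8_t b) {
  return (reg & ~std::uint64_t{0xff00}) | (static_cast<std::uint64_t>(b) << 8);
}

// The same store followed by a manual re-extension, restoring the invariant.
inline std::uint64_t store_high_byte_fixed(std::uint64_t reg, std::uint8_t b) {
  return sign_extend16(store_high_byte_raw(reg, b));
}
```

Starting from the value 1, storing 0x80 into the high byte with the raw
version leaves 0x8001 in the register, which breaks the invariant; the fixed
version yields 0xffffffffffff8001.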

2020-12-11 Jakub Jelinek <jakub@redhat.com>

PR middle-end/98190
* expr.c (expand_assignment): If to_rtx is a promoted SUBREG,
ensure sign or zero extension either through use of store_expr
or by extending manually.

* gcc.dg/pr98190.c: New test.

ira.c: Fix ICE in ira-color [PR97092]

gcc/ChangeLog

2020-12-10 Andrea Corallo <andrea.corallo@arm.com>

PR rtl-optimization/97092
* ira-color.c (update_costs_from_allocno): Do not carry over mode
between subsequent iterations.

gcc/testsuite/ChangeLog

2020-12-10 Andrea Corallo <andrea.corallo@arm.com>

* gcc.target/aarch64/sve/pr97092.c: New test.

tree-optimization/95582 - fix vector pattern with bool conversions

The pattern recognizer fends off against recognizing conversions
from VECT_SCALAR_BOOLEAN_TYPE_P to precision one types but what
it really needs to fend off is conversions between
VECT_SCALAR_BOOLEAN_TYPE_P types - the Ada FE uses an 8 bit
boolean type that satisfies this predicate.

2020-12-11 Richard Biener <rguenther@suse.de>

PR tree-optimization/95582
* tree-vect-patterns.c (vect_recog_bool_pattern): Check
for VECT_SCALAR_BOOLEAN_TYPE_P, not just precision one.

Fix feature check for HRESET/AVX_VNNI/UINTR

gcc/ChangeLog:
* common/config/i386/cpuinfo.h (get_available_features):
Move check for HRESET/AVX_VNNI/UINTR out of avx512_usable.

dojump: Fix up probabilities splitting in dojump.c comparison splitting [PR98212]

When compiling:
void foo (void);
void bar (float a, float b) { if (__builtin_expect (a != b, 1)) foo (); }
void baz (float a, float b) { if (__builtin_expect (a == b, 1)) foo (); }
void qux (float a, float b) { if (__builtin_expect (a != b, 0)) foo (); }
void corge (float a, float b) { if (__builtin_expect (a == b, 0)) foo (); }
on x86_64, we get (unimportant cruft removed):
bar:    ucomiss %xmm1, %xmm0
        jp      .L4
        je      .L1
.L4:    jmp     foo
.L1:    ret
baz:    ucomiss %xmm1, %xmm0
        jp      .L6
        jne     .L6
        jmp     foo
.L6:    ret
qux:    ucomiss %xmm1, %xmm0
        jp      .L13
        jne     .L13
        ret
.L13:   jmp     foo
corge:  ucomiss %xmm1, %xmm0
        jnp     .L18
.L14:   ret
.L18:   jne     .L14
        jmp     foo
(note for bar and qux that changed with a patch I've posted earlier today).
This is all reasonable, except for the last function: the overall jump to
the tail call is predicted unlikely (10%), so it is good that jmp foo isn't
on the straight line path, but NaNs are (or should be) considered very
unlikely in programs, so IMHO the right code (and the one emitted with the
following patch) is:
corge:  ucomiss %xmm1, %xmm0
        jp      .L14
        je      .L18
.L14:   ret
.L18:   jmp     foo

Let's discuss the probabilities in the above testcase:
for !and_them it looks all correct, so for
bar we split
if (a != b) goto t; // prob 90%
goto f;
into:
if (a unord b) goto t; // first_prob = prob * cprob = 90% * 1% = 0.9%
if (a ltgt b) goto t; // adjusted prob = (prob - first_prob) / (1 - first_prob) = (90% - 0.9%) / (1 - 0.9%) = 89.909%
and for qux we split
if (a != b) goto t; // prob 10%
goto f;
into:
if (a unord b) goto t; // first_prob = prob * cprob = 10% * 1% = 0.1%
if (a ltgt b) goto t; // adjusted prob = (prob - first_prob) / (1 - first_prob) = (10% - 0.1%) / (1 - 0.1%) = 9.910%
Now, the and_them cases should be probability wise exactly the same
if we swap the f and t labels, because baz
if (a == b) goto t; // prob 90%
goto f;
is equivalent to:
if (a != b) goto f; // prob 10%
goto t;
which is in qux.  This means we could expand baz as:
if (a unord b) goto f; // 0.1%
if (a ltgt b) goto f; // 9.910%
goto t;
However, we don't expand it exactly that way; instead (as the comment says)
we expand it as:
if (a ord b) ; else goto f; // first_prob as probability of ;
if (a uneq b) goto t; // adjusted prob
goto f;
So, first_prob.invert () should be 0.1% and adjusted prob should be
1 - 9.910%.
Thus, the right thing is 4 inverts:
prob = prob.invert (); // baz is equivalent to qux with swap(t, f) and thus inverted original prob
first_prob = prob.split (cprob.invert ()).invert ();
// cprob.invert because by doing if (cond) ; else goto f; we effectively invert the condition
// the second invert because first_prob is probability of ; rather than goto f
prob = prob.invert (); // lastly because adjusted prob we want is
// probability of goto t;, while the one from corresponding !and_them case
// would be if (...) goto f; goto t;
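
The arithmetic above can be checked with a small model on plain doubles.
This is a sketch, not the dojump.c code: the function names are hypothetical,
split_probability only mirrors the formulas quoted above (first = prob * cprob,
adjusted = (prob - first) / (1 - first)), and the 99% relative probability
used for the ordered test in the and_them case is an assumption, taken as the
complement of the 1% used for unord.

```cpp
struct split_probs {
  double first;     // probability attached to the first branch
  double adjusted;  // probability attached to the second branch
};

inline double invert(double p) { return 1.0 - p; }

// Returns first = prob * cprob and rewrites prob to the adjusted value
// (prob - first) / (1 - first).
inline double split_probability(double &prob, double cprob) {
  const double first = prob * cprob;
  prob = (prob - first) / invert(first);
  return first;
}

// !and_them: if (c1) goto t; if (c2) goto t; goto f;
inline split_probs split_or(double prob, double cprob) {
  split_probs r;
  r.first = split_probability(prob, cprob);
  r.adjusted = prob;
  return r;
}

// and_them: if (c1) ; else goto f; if (c2) goto t; goto f;
// r.first is the probability of falling through the first test.
inline split_probs split_and(double prob, double cprob) {
  split_probs r;
  prob = invert(prob);                                         // invert 1
  r.first = invert(split_probability(prob, invert(cprob)));    // inverts 2 and 3
  r.adjusted = invert(prob);                                   // invert 4
  return r;
}
```

With these definitions split_or(0.90, 0.01) gives roughly 0.9% and 89.909%,
split_or(0.10, 0.01) gives roughly 0.1% and 9.910%, and split_and(0.90, 0.99)
gives a fall-through probability of about 99.9% and an adjusted probability of
about 1 - 9.910%. The product of the two and_them values is 90% again, i.e.
the overall probability of reaching t is preserved.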

2020-12-11  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/98212
* dojump.c (do_compare_rtx_and_jump): Change computation of
first_prob for and_them.  Add comment explaining and_them case.

* gcc.dg/predict-8.c: Adjust expected probability.

libstdc++: Remove redundant branches in countl_one and countr_one [PR 98226]

There's no need to explicitly check for the maximum value, because the
function we call handles it correctly anyway.
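
The idea can be sketched as follows, assuming that counting leading or
trailing ones is done by counting zeros in the complemented value. This is an
illustration with hypothetical 32-bit helpers, not the libstdc++
implementation: when every bit of x is set, ~x is zero, and a zero-counting
routine that returns the full width for zero already gives the right answer
without a separate branch.

```cpp
#include <cstdint>

// Number of leading zero bits; returns 32 when x == 0.
inline int countl_zero32(std::uint32_t x) {
  int n = 0;
  for (std::uint32_t bit = 0x80000000u; bit != 0 && (x & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Number of trailing zero bits; returns 32 when x == 0.
inline int countr_zero32(std::uint32_t x) {
  int n = 0;
  for (std::uint32_t bit = 1u; bit != 0 && (x & bit) == 0; bit <<= 1)
    ++n;
  return n;
}

// No special case for the all-ones value: ~x is then 0 and the
// zero-counting helpers return the full width.
inline int countl_one32(std::uint32_t x) {
  return countl_zero32(static_cast<std::uint32_t>(~x));
}

inline int countr_one32(std::uint32_t x) {
  return countr_zero32(static_cast<std::uint32_t>(~x));
}
```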

libstdc++-v3/ChangeLog:

PR libstdc++/98226
* include/std/bit (__countl_one, __countr_one): Remove redundant
branches.

Reduce memory requirements for ranger

Calculate block exit info upfront, and then any SSA_NAME which is never
used in an outgoing range calculation is a pure global and can bypass the
on-entry cache.

PR tree-optimization/98174
* gimple-range-cache.cc (ranger_cache::ssa_range_in_bb): Only push
poor values to be examined if it isn't a pure global.
(ranger_cache::block_range): Don't process pure globals.
(ranger_cache::fill_block_cache): Adjust has_edge_range call.
* gimple-range-gori.cc (gori_map::all_outgoing): New bitmap.
(gori_map::gori_map): Allocate all_outgoing.
(gori_map::is_export_p): No specified BB returns global context.
(gori_map::calculate_gori): Accumulate each block into global.
(gori_compute::gori_compute): Preprocess each block for exports.
(gori_compute::has_edge_range_p): No edge returns global context.
* gimple-range-gori.h (has_edge_range_p): Provide default parameter.

Fix PR ada/98230

It's a rather curious malfunction of the 'Mod attribute applied to the
variable of a loop whose upper bound is dynamic.

gcc/ada/ChangeLog:
PR ada/98230
* exp_attr.adb (Expand_N_Attribute_Reference, case Mod): Use base
type of argument to obtain static bound and required size.

gcc/testsuite/ChangeLog:
* gnat.dg/modular6.adb: New test.

c++: Add make_temp_override generator functions

A common pattern before C++17 is the generator function, used to avoid
having to specify the type of a container element by using a function call
to get type deduction; for example, std::make_pair. C++17 added class template
argument deduction, making generator functions unnecessary for many uses,
but GCC won't be written in C++17 for years yet.
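
A minimal sketch of the generator-function idiom in C++11 follows. The names
scoped_override, make_scoped_override and identity are hypothetical stand-ins
chosen for this example and are not the declarations added to cp-tree.h. The
identity wrapper keeps the second parameter out of template argument
deduction, so T is deduced from the variable alone and the new value is simply
converted to it.

```cpp
// Keeps a parameter out of template argument deduction.
template <typename T> struct identity { typedef T type; };

// RAII helper: sets a variable to a new value and restores the old one
// when the guard goes out of scope.
template <typename T> class scoped_override {
  T *var_;
  T saved_;

public:
  scoped_override(T &var, T value) : var_(&var), saved_(var) { var = value; }
  // Moving transfers the duty to restore; the moved-from guard does nothing.
  scoped_override(scoped_override &&other)
      : var_(other.var_), saved_(other.saved_) {
    other.var_ = nullptr;
  }
  scoped_override(const scoped_override &) = delete;
  scoped_override &operator=(const scoped_override &) = delete;
  ~scoped_override() {
    if (var_)
      *var_ = saved_;
  }
};

// Generator function: the caller never has to spell out T.
template <typename T>
scoped_override<T> make_scoped_override(T &var,
                                        typename identity<T>::type value) {
  return scoped_override<T>(var, value);
}
```

A use such as auto guard = make_scoped_override (depth, 5); then replaces
the longer scoped_override<int> guard (depth, 5);, and also works when the
variable is a long and the new value is written as an int literal.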

gcc/cp/ChangeLog:

* cp-tree.h (struct type_identity): New.
(make_temp_override): New.
* decl.c (grokdeclarator): Use it.
* except.c (maybe_noexcept_warning): Use it.
* parser.c (cp_parser_enum_specifier): Use it.
(cp_parser_parameter_declaration_clause): Use it.
(cp_parser_gnu_attributes_opt): Use it.
(cp_parser_std_attribute): Use it.

c++: Update value of __cplusplus for C++20.

It's past time to update this macro to the specified value for C++20.
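
User code typically consumes this macro through a preprocessor check. The
sketch below uses the standard values 202002L for C++20 and 201703L for C++17;
the function name is hypothetical.

```cpp
// Reports which language level the translation unit was compiled with,
// based solely on the value of __cplusplus.
inline const char *language_level() {
#if __cplusplus >= 202002L
  return "C++20 or later";
#elif __cplusplus >= 201703L
  return "C++17";
#else
  return "pre-C++17";
#endif
}
```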

libcpp/ChangeLog:

* init.c (cpp_init_builtins): Update __cplusplus for C++20.

c++: Add fixed test [PR91506]

Pre-r11-557 we issued a bogus

error: parameter may not have variably modified type 'double [x]'

but now we compile this, as we should.

gcc/testsuite/ChangeLog:

PR c++/91506
* g++.dg/init/array60.C: New test.

c++: modules & using-decls

This extends using-decls to modules. In modules you can export a
using decl, but the exported decl must have external linkage already.
One thing you can do is export something from the GMF.

The novel thing is that now 'export using foo::bar;' *in namespace
bar* can mean something significant (rather than be an obscure nop).

gcc/cp/
* name-lookup.c (do_nonmember_using_decl): Add INSERT_P parm.
Deal with exporting using decls.
(finish_nonmember_using_decl): Examine BINDING_VECTOR.

c++: Name lookup for modules

This augments the name lookup with knowledge about the BINDING_VECTOR.
That holds per-module namespace bindings, and we need to collect the
bindings in visible imports when we do lookup. We also need to do
some checking when we're pushing a new decl to check we're not
overriding an existing visible binding in some way.

To deal with the Global Module and Module Partitions, we reserve 1 or
2 slots in the BINDING_VECTOR to record those entities that may
legitimately appear in more than one module.

As mentioned before, the BINDING_VECTOR is created lazily, when
imported bindings appear. The current TU's decls then appear on slot
zero.

gcc/cp/
* cp-tree.h (visible_instantiation_path): Renamed.
* module.cc (get_originating_module_decl, lazy_load_binding)
(lazy_load_members, visible_instantiation_path): Stubs.
* name-lookup.c (STAT_TYPE_VISIBLE_P, STAT_VISIBLE): New.
(search_imported_binding_slot, init_global_partition)
(get_fixed_binding_slot): New.
(name_lookup::process_module_binding): New.
(name_lookup::search_namespace_only): Search BINDING_VECTOR.
(name_lookup::adl_namespace_fns): Likewise.
(name_lookip::search_adl): Search visible instantiation path.
(maybe_lazily_declare): Maybe lazy load members.
(implicitly_exporT_namespace): New.
(maybe_record_mergeable_decl): New.
(check_module_override): New.
(do_pushdecl): Deal with BINDING_VECTOR, check override.
(add_mergeable_namespace_entity): New.
(get_namespace_binding): Deal with BINDING_VECTOR.
(do_namespace_alias): Call set_originating_module.
(lookup_elaborated_type_1): Deal with BINDING_VECTOR.
(do_pushtag): Call set_originating_module.
(reuse_namespace): New.
(make_namespace_finish): Add FROM_IMPORT parm.
(push_namespace): Deal with BINDING_VECTOR & namespace reuse.
(maybe_save_operator_binding): Save when module CMI in play.
* name-lookup.h (add_mergeable_namespace_entity): Declare.

c++: modularize spelling suggestions

This augments the spelling suggestion code to understand about visible
imported modules. Simply consider each visible binding in the
binding_vector, until we find one that has something of interest.

gcc/cp/
* name-lookup.c: Include bitmap.h.
(enum binding_slots): New.
(maybe_add_fuzzy_binding): Return bool true if found.
(consider_binding_level): Add module support.
* module.cc (get_import_bitmap): Stub.

arm: Fix typo in testcase mve-vsub_1.c

gcc/testsuite/
* gcc.target/arm/simd/mve-vsub_1.c: Fix typo.
Remove needless dg-additional-options.

c++: Add fixed test [PR68451]

I was about to add this test with dg-ice but it turned out it had
already been fixed by the recent r11-3361!

gcc/testsuite/ChangeLog:

PR c++/68451
* g++.dg/cpp0x/friend6.C: New test.

c++: name-lookup refactoring

Here are some refactorings to the name-lookup machinery.  Primarily
breaking out worker functions that the modules patch will also use.
Fixing a couple of comments on the way.

gcc/cp/
* name-lookup.c (pop_local_binding): Check for IDENTIFIER_ANON_P.
(update_binding): Level may be null, don't add namespaces to
level.
(newbinding_bookkeeping): New, broken out of ...
(do_pushdecl): ... here, call it.  Don't push anonymous decls.
(pushdecl, add_using_namespace): Correct comments.
(do_push_nested_namespace): Remove assert.
(make_namespace, make_namespace_finish): New, broken out of ...
(push_namespace): ... here.  Call them.  Add namespace to level
here.

Small fix to PLACEHOLDER_EXPR handling in loc_list_from_tree_1

This handles the discriminated record types of Ada: the PLACEHOLDER_EXPR is
the "template" expression for the discriminant in the type definition. Now
for some components, typically arrays whose upper bound is the discriminant,
the compiler creates a local subtype for the component, so the code needs to
be able to deal with this nested type.

gcc/ChangeLog:
* dwarf2out.c (loc_list_from_tree_1) <PLACEHOLDER_EXPR>: Deal with
a nested context type

c++: Module-specific error and tree dumping

With modules, we need the ability to name 'foos' in different modules.
The idiom for that is a trailing '@modulename' suffix. This adds that
to the error printing routines. I also augment the tree dumping
machinery to show module-specific metadata.

gcc/cp/
* error.c (dump_module_suffix): New.
(dump_aggr_type, dump_simple_decl, dump_function_name): Call it.
* ptree.c (cxx_print_decl): Print module information.
* module.cc (module_name, get_importing_module): Stubs.

c++: name-lookup cleanups

Name-lookup is the most changed piece of the front end for modules.
Here are some preparatory cleanups and API extensions.

gcc/cp/
* name-lookup.h (set_class_bindings): Return vector, take signed
'extra' parm.
* name-lookup.c (maybe_lazily_declare): Break out ...
(get_class_binding): ... of here, call it.
(find_member_slot): Adjust get_class_bindings call.
(set_class_bindings): Allow -ve extra. Return the vector.
(set_identifier_type_value_with_scope): Remove checking assert.
(lookup_using_decl): Set decl's context.
(do_pushtag): Adjust set_identifier_type_value_with_scope handling.

Remove misleading debug line entries

This removes gimple_debug_begin_stmts without block info which remain
after a gimple block originating from an inline function is unused.

The line numbers from these stmts are from the inline function,
but since the inline function is completely optimized away,
there will be no DW_TAG_inlined_subroutine so the debugger has
no callstack available at this point, and therefore those
line table entries are not helpful to the user.

2020-12-10 Bernd Edlinger <bernd.edlinger@hotmail.de>

* cfgexpand.c (expand_gimple_basic_block): Remove special handling
of debug_inline_entries without block info.
* tree-inline.c (remap_gimple_stmt): Drop debug_nonbind_markers when
the call statement has no block info.
(copy_debug_stmt): Remove debug_nonbind_markers when inlining
and the block info is mapped to NULL.
* tree-ssa-live.c (clear_unused_block_pointer): Remove
debug_nonbind_markers originating from removed inline functions.

remove obsolete conversion handling from vectorizable_assignment

This removes an odd special-case of VECTOR_BOOLEAN_TYPE_P typed
conversions from vectorizable_assignment that was obsoleted by
making all integer mode VECTOR_BOOLEAN_TYPE_P types have 1-bit
precision bool components with 605c2a393d3a2db8.

2020-12-10 Richard Biener <rguenther@suse.de>

* tree-vect-stmts.c (vectorizable_assignment): Remove special
allowance of VECTOR_BOOLEAN_TYPE_P conversions.

arm: Auto-vectorization for MVE: vand

This patch enables MVE vandq instructions for auto-vectorization.  MVE
vandq insns in mve.md are modified to use 'and' instead of unspec
expression to support and<mode>3.  The and<mode>3 expander is added to
vec-common.md.
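
As an illustration only, a loop of the following shape is the kind of
code the and<mode>3 expander is meant to let the vectorizer handle.
This is a minimal sketch, not the contents of mve-vand.c; the function
name and element type are assumptions made for the example.

```c
#include <stdint.h>

/* Element-wise AND of two arrays.  'restrict' states that the arrays
   do not overlap, which removes one obstacle to vectorization.  The
   name and_s32 is hypothetical.  */
void
and_s32 (int32_t *restrict dest, const int32_t *restrict a,
         const int32_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dest[i] = a[i] & b[i];
}
```

Whether a vand instruction is actually emitted depends on the target
options and optimization level in use.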

2020-12-03  Christophe Lyon  <christophe.lyon@linaro.org>

gcc/
* config/arm/iterators.md (supf): Remove VANDQ_S and VANDQ_U.
(VANDQ): Remove.
(VDQ): Add TARGET_HAVE_MVE condition where relevant.
* config/arm/mve.md (mve_vandq_u<mode>): New entry for vand
instruction using expression 'and'.
(mve_vandq_s<mode>): New expander.
(mve_vandq_f<mode>): Use 'and' code instead of unspec.
* config/arm/neon.md (and<mode>3): Rename into and<mode>3_neon.
* config/arm/predicates.md (imm_for_neon_inv_logic_operand):
Enable for MVE.
* config/arm/unspecs.md (VANDQ_S, VANDQ_U, VANDQ_F): Remove.
* config/arm/vec-common.md (and<mode>3): New expander.

gcc/testsuite/
* gcc.target/arm/simd/mve-vand.c: New test.

data-ref: Rework integer handling in split_constant_offset [PR98069]

PR98069 is about a case in which split_constant_offset miscategorises
an expression of the form:

  int foo;
  …
  POINTER_PLUS_EXPR<base, (sizetype)(INT_MIN - foo) * size>

as:

  base: base
  offset: (sizetype) (-foo) * size
  init: INT_MIN * size

“-foo” overflows when “foo” is INT_MIN, whereas the original expression
didn't overflow in that case.

As discussed in the PR trail, we could simply ignore the fact that
int overflow is undefined and treat it as a wrapping type, but that
is likely to pessimise quite a few cases.

This patch instead reworks split_constant_offset so that:

- it treats integer operations as having an implicit cast to sizetype
- for integer operations, the returned VAR has type sizetype

In other words, the problem becomes to express:

  (sizetype) (OP0 CODE OP1)

as:

  VAR:sizetype + (sizetype) OFF:ssizetype

The top-level integer split_constant_offset will (usually) be a sizetype
POINTER_PLUS operand, so the extra cast to sizetype disappears.  But adding
the cast allows the conversion handling to defer a lot of the difficult
cases to the recursive split_constant_offset call, which can detect
overflow on individual operations.

The net effect is to analyse the access above as:

  base: base
  offset: -(sizetype) foo * size
  init: INT_MIN * size

See the comments in the patch for more details.
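
The arithmetic behind the new split can be checked with a small
sketch.  It is not code from the patch: the function names are
hypothetical, size_t stands in for sizetype, and foo is assumed to be
non-positive so that the original "INT_MIN - foo" does not overflow.

```c
#include <limits.h>
#include <stddef.h>

/* The original expression: the subtraction is done in int and then
   converted.  Only valid for foo <= 0, otherwise INT_MIN - foo
   overflows.  */
size_t
original_offset (int foo, size_t size)
{
  return (size_t) (INT_MIN - foo) * size;
}

/* The split form: foo is converted to the unsigned type first, so the
   negation wraps instead of overflowing, even for foo == INT_MIN.  */
size_t
split_offset (int foo, size_t size)
{
  size_t var = (0 - (size_t) foo) * size;   /* offset: -(sizetype) foo * size */
  size_t init = (size_t) INT_MIN * size;    /* init: INT_MIN * size */
  return var + init;
}
```

Both functions agree for every foo <= 0, including foo == INT_MIN,
which is the case where negating foo as an int would be undefined.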

gcc/
PR tree-optimization/98069
* tree-data-ref.c (compute_distributive_range): New function.
(nop_conversion_for_offset_p): Likewise.
(split_constant_offset): In the internal overload, treat integer
expressions as having an implicit cast to sizetype and express
them accordingly.  Pass back the range of the original (uncast)
expression in a new range parameter.
(split_constant_offset_1): Likewise.  Rework the handling of
conversions to account for the implicit sizetype casts.

[VECT] pr97929 fix

This addresses pr97929. The cases for WIDEN_PLUS and WIDEN_MINUS were
missing in vect_get_smallest_scalar_type.
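
For context, a widening add takes narrow operands and produces a wider
result.  The loop below is a hedged sketch of source code that could
give rise to such an operation on targets that support it; it is not
the pr97929.c testcase, and the function name is hypothetical.

```c
#include <stdint.h>

/* Add two arrays of 16-bit values into an array of 32-bit results.
   The smallest scalar type involved is int16_t, not the int32_t of
   the result.  */
void
widen_add (int32_t *restrict out, const int16_t *restrict a,
           const int16_t *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (int32_t) a[i] + (int32_t) b[i];
}
```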

gcc/ChangeLog:

PR tree-optimization/97929
* tree-vect-data-refs.c (vect_get_smallest_scalar_type): Add
WIDEN_PLUS/WIDEN_MINUS case.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/pr97929.c: New test.

Add WIDEN_PLUS, WIDEN_MINUS pretty print

Add 'w+'/'w-' as WIDEN_PLUS/WIDEN_MINUS respectively.
Add VEC_WIDEN_PLUS/MINUS_HI/LO<...> for
VEC_WIDEN_PLUS/MINUS_HI/LO

gcc/ChangeLog:

* tree-pretty-print.c (dump_generic_node): Add case for
VEC_WIDEN_(PLUS/MINUS)_(HI/LO)_EXPR and WIDEN_(PLUS/MINUS)_EXPR.

tree-optimization/98211 - fix bogus vectorization of conversion

Pattern recog incompletely handles some bool cases, but we shouldn't
miscompile as a result; we should simply not vectorize.  Unfortunately
vectorizable_assignment lets invalid conversions (that
vectorizable_conversion rejects) slip through.  The following
rectifies that.

2020-12-10  Richard Biener  <rguenther@suse.de>

PR tree-optimization/98211
* tree-vect-stmts.c (vectorizable_assignment): Disallow
invalid conversions to bool vector types.

* gcc.dg/pr98211.c: New testcase.

drop __builtin_ from __clear_cache libname

I made a cut&pasto in my previous patch for tree.c, causing platforms
that have CLEAR_INSN_CACHE defined, and none of the internal
__clear_cache expansion overriders, to issue calls to symbols named
__builtin___clear_cache rather than __clear_cache, on languages other
than those in the C family. Oops.

This patch removes __builtin_ from the string used as the libname for
__builtin___clear_cache.

for gcc/ChangeLog

* tree.c (build_common_builtin_nodes): Drop __builtin_ from
__clear_cache libname.

dojump: Improve float != comparisons on x86 [PR98212]

The x86 backend doesn't have EQ or NE floating point comparisons,
so splits x != y into x unord y || x <> y.  The problem with that is
that unord comparison doesn't trap on qNaN operands but LTGT does.
The end effect is that it doesn't trap on qNaN operands, because x unord y
will be true for those and so LTGT will not be performed, but as the backend
is currently unable to merge signalling and non-signalling comparisons (and
after all, with this exact exception it shouldn't unless the first one is
signalling and the second one is non-signalling) it means we end up with:
        ucomiss %xmm1, %xmm0
        jp      .L4
        comiss  %xmm1, %xmm0
        jne     .L4
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        xorl    %eax, %eax
        jmp     foo
where the comiss is the signalling comparison, but we already know that
the right flags bits are already computed by the ucomiss insn.

The following patch, if target supports UNEQ comparisons, splits NE
as x unord y || !(x uneq y) instead, which in the end means we end up with
just:
        ucomiss %xmm1, %xmm0
        jp      .L4
        jne     .L4
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        jmp     foo
because UNEQ is like UNORDERED non-signalling.
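
The two ways of splitting NE give the same truth value, which can be
sketched in C as below.  The function names are hypothetical.  Note
that the C macros isunordered and islessgreater are quiet, so this
models only the result of the comparisons, not the trapping behaviour
of the RTL codes discussed above.

```c
#include <math.h>

/* x != y as "x unord y || x ltgt y" (the old split).  */
int
ne_via_ltgt (float x, float y)
{
  return isunordered (x, y) || islessgreater (x, y);
}

/* UNEQ: true if the operands are unordered or equal.  */
int
uneq (float x, float y)
{
  return isunordered (x, y) || x == y;
}

/* x != y as "x unord y || !(x uneq y)" (the new split).  */
int
ne_via_uneq (float x, float y)
{
  return isunordered (x, y) || !uneq (x, y);
}
```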

2020-12-10  Jakub Jelinek  <jakub@redhat.com>

PR rtl-optimization/98212
* dojump.c (do_compare_rtx_and_jump): When splitting NE and backend
can do UNEQ, prefer splitting x != y into x unord y || !(x uneq y)
instead of into x unord y || x ltgt y.

* gcc.target/i386/pr98212.c: New test.

dojump: Optimize a == a or a != a [PR98169]

If the backend doesn't have floating point EQ or NE comparison, dojump.c
splits it into ORDERED && UNEQ or UNORDERED || LTGT. If both comparison
operands are the same, we know the result of the second comparison though,
a == a is equivalent to a ord a and a != a is equivalent to a unord a,
and thus can just use ORDERED or UNORDERED.
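
A small sketch of the equivalence, assuming IEEE semantics (no
-ffast-math).  The function names are hypothetical and are not the
f1..f4 of the testcase.

```c
#include <math.h>

/* Self-comparison: false only when a is a NaN.  */
int
self_eq (float a)
{
  return a == a;
}

/* ORDERED applied to the same operand twice: also false only for NaN.  */
int
self_ordered (float a)
{
  return !isunordered (a, a);
}

/* Self-inequality: true only when a is a NaN.  */
int
self_ne (float a)
{
  return a != a;
}

/* UNORDERED applied to the same operand twice.  */
int
self_unordered (float a)
{
  return isunordered (a, a);
}
```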

On the testcase, this changes f1:
- ucomiss %xmm0, %xmm0
- movl $1, %eax
- jp .L3
- jne .L3
- ret
- .p2align 4,,10
- .p2align 3
-.L3:
xorl %eax, %eax
+ ucomiss %xmm0, %xmm0
+ setnp %al
and f3:
- ucomisd %xmm0, %xmm0
- movl $1, %eax
- jp .L8
- jne .L8
- ret
- .p2align 4,,10
- .p2align 3
-.L8:
xorl %eax, %eax
+ ucomisd %xmm0, %xmm0
+ setnp %al
while keeping the same code for f2 and f4.

2020-12-10 Jakub Jelinek <jakub@redhat.com>

PR tree-optimization/98169
* dojump.c (do_compare_rtx_and_jump): Don't split self-EQ/NE
comparisons, just use ORDERED or UNORDERED.

* gcc.target/i386/pr98169.c: New test.

openmp: Fix ICE with broken doacross loop [PR98205]

If the loop body doesn't ever continue, we don't have a bb to insert the
updates. Fixed by not adding them at all in that case.

2020-12-10 Jakub Jelinek <jakub@redhat.com>

PR middle-end/98205
* omp-expand.c (expand_omp_for_generic): Fix up broken_loop handling.

* c-c++-common/gomp/doacross-4.c: New test.

Allow scalar fallback for pattern root stmt

This adjusts the SLP build to allow a pattern root stmt to be
built from scalars.  I've noticed this in PR98211 where we fail
to promote a SLP subtree to a simple splat operation and instead
emit a series of uniform vector operations.  The bb-slp-div-1.c
testcase is now vectorized on x86_64, but only the store, so I
adjusted it to expect the load to be vectorized.

2020-12-10  Richard Biener  <rguenther@suse.de>

* tree-vect-slp.c (vect_get_and_check_slp_defs): Do
not mark the defs to occur in a pattern if it is the
pattern root and record the original stmt defs in that
case.

* gcc.dg/vect/bb-slp-div-1.c: Expect the load to be
vectorized.

RISC-V: Explicitly call python when using multilib generator

When building GCC for RISC-V with the --with-multilib-generator option,
it may not be possible to call arch-canonicalize as an executable when
building on Windows. Instead directly invoke the expected python
interpreter for this step.

gcc/ChangeLog:

* config/riscv/multilib-generator (arch_canonicalize): Invoke
python interpreter when calling arch-canonicalize script.

-fdump-go-spec: ignore type ordering of incomplete types

gcc/:
* godump.c (go_format_type): Don't consider whether a type has
been seen when determining whether to output a type by name.
Consider only the use_type_name parameter.
(go_output_typedef): When outputting a typedef, format the
declaration's original type, which contains the name of the
underlying type rather than the name of the typedef.
gcc/testsuite:
* gcc.misc-tests/godump-1.c: Add test case.

go-test.exp: recognize errorcheckdir -n

* go.test/go-test.exp (go-gc-tests): Recognize errorcheckdir -n,
for bug345.go.

Daily bump.

go-test.exp: rewrite errchk regexp quoting

* go.test/go-test.exp (errchk): Rewrite regexp quoting to use
curly braces, making it much simpler.