git.libre-soc.org Git

VAX: Fix predicates for widening multiply and multiply-add insns

It makes no sense for insn operand predicates, as long as they accept a
register operand, to be more restrictive than the set of the associated
constraints, because expand will choose the insn based on the relevant
operand being a pseudo register then and reload will keep it happily as
an immediate if a constraint permits it.  So the restriction posed by
such a predicate will be happily ignored, and moreover if a splitter is
added, such as required for MODE_CC support, the new instructions will
reject the original operands supplied, causing an ICE like below:

.../gcc/testsuite/gfortran.dg/graphite/PR67518.f90:44:0: Error: could not split insn
(insn 90 662 663 (set (reg:DI 10 %r10 [orig:97 _235 ] [97])
        (mult:DI (sign_extend:DI (mem/c:SI (plus:SI (reg/f:SI 13 %fp)
                        (const_int -800 [0xfffffffffffffce0])) [14 %sfp+-800 S4 A32]))
            (sign_extend:DI (const_int -51 [0xffffffffffffffcd])))) 299 {mulsidi3}
     (expr_list:REG_EQUAL (mult:DI (sign_extend:DI (subreg:SI (mem/c:DI (plus:SI (reg/f:SI 13 %fp)
                            (const_int -800 [0xfffffffffffffce0])) [14 %sfp+-800 S8 A32]) 0))
            (const_int -51 [0xffffffffffffffcd]))
        (nil)))
during RTL pass: final
.../gcc/testsuite/gfortran.dg/graphite/PR67518.f90:44:0: internal compiler error: in final_scan_insn_1, at final.c:3073
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.

Change the predicates used with the widening multiply and multiply-add
insns to allow immediates then, just as the constraints and the machine
instructions produced permit.
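
For illustration only, a C function of roughly the following shape is
the kind of source that leads to a widening multiply by an immediate,
as in the `(mult:DI (sign_extend:DI ...) (sign_extend:DI (const_int
-51)))' expression quoted above.  The function name is hypothetical,
and it is not claimed that this fragment reproduces the ICE, which was
observed with the Fortran test case shown:

```c
/* Hypothetical example: multiply a 32-bit signed value by a small
   constant and keep the full 64-bit product.  With a widening
   multiply pattern available, the constant may end up as an immediate
   operand of that insn once reload has run.  */
long long
widening_mul_const (int x)
{
  return (long long) x * -51;
}
```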

Also give the insns names, for easier reference here and elsewhere.

gcc/
* config/vax/vax.md (mulsidi3): Fix the multiplicand predicates.
(*maddsidi4, *maddsidi4_const): Likewise.  Name insns.

VAX: Fix predicates and constraints for bit-field comparison insns

It makes no sense for insn operand predicates, as long as they accept a
register operand, to be more restrictive than the set of the associated
constraints, because expand will choose the insn based on the relevant
operand being a pseudo register then and reload will keep it happily as a
memory reference if a constraint permits it.  So the restriction posed
by such a predicate will be happily ignored, and moreover if a splitter
is added, such as required for MODE_CC support, the new instructions
will reject the original operands supplied, causing an ICE.  An actual
example will be given with a subsequent change.

Therefore, similarly to EXTV/EXTZV/INSV insns, remove inconsistencies
with predicates and constraints of bit-field comparison insns, observing
that a bit-field located in memory is byte-addressed by the respective
machine instructions and therefore SImode may only be used with a
register or an offsettable memory operand (i.e. not an indexed,
pre-decremented, or post-incremented one).

Also give the insns names, for easier reference here and elsewhere.

gcc/
* config/vax/vax.md (*cmpv_2): Name insn.
(*cmpv, *cmpzv, *cmpzv_2): Likewise.  Fix location predicate and
constraint.

VAX: Make `extv' an expander matching the remaining bit-field operations

We have matching insns defined for `sign_extract' and `zero_extract'
expressions, so make the three named patterns for bit-field operations
consistent and make `extv' an expander rather than an insn taking a
SImode, a QImode, and a SImode general operand for the LOC, SIZE, and
POS operands respectively, like with the `extzv' and `insv' patterns,
matching the machine instructions and giving the middle end more choice
as to which actual insn to choose in a given situation.

Given this program:

typedef struct
{
  int f0:1;
  int f1:7;
  int f8:8;
  int f16:16;
} bit_t;

typedef struct
{
  unsigned int f0:1;
  unsigned int f1:7;
  unsigned int f8:8;
  unsigned int f16:16;
} ubit_t;

typedef union
{
  bit_t b;
  int i;
} bit_u;

typedef union
{
  ubit_t b;
  unsigned int i;
} ubit_u;

int
ins1 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f1 = y;
  return x.i;
}

int
ext1 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f1;
}

unsigned int
extz1 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f1;
}

int
ins8 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f8 = y;
  return x.i;
}

int
ext8 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f8;
}

unsigned int
extz8 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f8;
}

int
ins16 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f16 = y;
  return x.i;
}

int
ext16 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f16;
}

unsigned int
extz16 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f16;
}

this results in the following code change:

@@ -16,12 +16,12 @@ ins1:
.globl ext1
.type ext1, @function
ext1:
- .word 0 # 19 [c=0]  procedure_entry_mask
- subl2 $4,%sp # 20 [c=32]  addsi3
+ .word 0 # 18 [c=0]  procedure_entry_mask
+ subl2 $4,%sp # 19 [c=32]  addsi3
movl 4(%ap),%r0 # 2 [c=16]  movsi_2
- cvtbl %r0,%r0 # 7 [c=4]  extendqisi2
- ashl $-1,%r0,%r0 # 14 [c=40]  *vax.md:624
- ret # 24 [c=0]  return
+ extv $1,$7,%r0,%r0 # 7 [c=60]  *extv_non_const
+ cvtbl %r0,%r0 # 13 [c=4]  extendqisi2
+ ret # 23 [c=0]  return
.size ext1, .-ext1
.align 1
.globl extz1
@@ -49,12 +49,12 @@ ins8:
.globl ext8
.type ext8, @function
ext8:
- .word 0 # 20 [c=0]  procedure_entry_mask
- subl2 $4,%sp # 21 [c=32]  addsi3
+ .word 0 # 18 [c=0]  procedure_entry_mask
+ subl2 $4,%sp # 19 [c=32]  addsi3
movl 4(%ap),%r0 # 2 [c=16]  movsi_2
- cvtwl %r0,%r0 # 7 [c=4]  extendhisi2
- ashl $-8,%r0,%r0 # 15 [c=40]  *vax.md:624
- ret # 25 [c=0]  return
+ rotl $24,%r0,%r0 # 13 [c=60]  *extv_non_const
+ cvtbl %r0,%r0
+ ret # 23 [c=0]  return
.size ext8, .-ext8
.align 1
.globl extz8

If there is a performance degradation with the replacement sequences,
then it can and should be sorted within `extv_non_const'.
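
The replacement sequence for `ext8' relies on a rotation left by 24
bringing bits 8 to 15 into the low byte, which CVTBL then sign-extends.
A minimal C sketch of that equivalence follows.  All helper names are
hypothetical and merely model the operations; this is not backend code,
and it assumes 1 <= SIZE, POS + SIZE <= 32:

```c
#include <stdint.h>

/* Model of ROTL: rotate a 32-bit value left by N bits.  */
static uint32_t
rotl32 (uint32_t x, unsigned n)
{
  n &= 31;
  return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Model of `zero_extract': SIZE bits of LOC starting at bit POS.  */
static uint32_t
zero_extract (uint32_t loc, unsigned size, unsigned pos)
{
  uint32_t mask = size == 32 ? UINT32_MAX : (UINT32_C (1) << size) - 1;
  return (loc >> pos) & mask;
}

/* Model of `sign_extract': as above, but the top bit of the field is
   treated as its sign.  Written so as to avoid any conversion of an
   out-of-range value to a signed type.  */
static int32_t
sign_extract (uint32_t loc, unsigned size, unsigned pos)
{
  uint32_t sign = UINT32_C (1) << (size - 1);
  uint32_t field = loc >> pos;
  int32_t value = (int32_t) (field & (sign - 1));

  if (field & sign)
    value = value - (int32_t) (sign - 1) - 1;
  return value;
}

/* Model of the `rotl $24' + `cvtbl' pair emitted for `ext8'.  */
static int32_t
ext8_via_rotate (uint32_t x)
{
  return sign_extract (rotl32 (x, 24), 8, 0);
}
```

For any 32-bit X, `ext8_via_rotate (x)' and `sign_extract (x, 8, 8)'
yield the same value, which is what makes the two code sequences
interchangeable as far as the result is concerned.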

gcc/
* config/vax/vax.md (extv): Rename insn to...
(*extv): ... this.
(extv): New expander.

VAX: Ensure PIC mode address is adjustable with aligned bit-field insns

With the `*insv_aligned', `*extzv_aligned' and `*extv_aligned' insns we
are going to adjust the bit-field location if it is in memory, so only
allow such location addresses that can be offset, excluding external
symbol references in the PIC mode in particular.

This fixes an ICE like:

during RTL pass: final
In file included from .../gcc/testsuite/gcc.dg/torture/vshuf-v16qi.c:11:
.../gcc/testsuite/gcc.dg/torture/vshuf-main.inc: In function 'test_13':
.../gcc/testsuite/gcc.dg/torture/vshuf-main.inc:27:1: internal compiler error: in change_address_1, at emit-rtl.c:2275
.../gcc/testsuite/gcc.dg/torture/vshuf-16.inc:16:1: note: in expansion of macro 'T'
.../gcc/testsuite/gcc.dg/torture/vshuf-main.inc:28:1: note: in expansion of macro 'TESTS'
0x10a34b33 change_address_1
.../gcc/emit-rtl.c:2275
0x10a358af adjust_address_1(rtx_def*, machine_mode, poly_int<1u, long>, int, int, int, poly_int<1u, long>)
.../gcc/emit-rtl.c:2409
0x11d2505f output_97
.../gcc/config/vax/vax.md:806
0x10adec4b get_insn_template(int, rtx_insn*)
.../gcc/final.c:2070
0x10ae1c5b final_scan_insn_1
.../gcc/final.c:3039
0x10ae2257 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
.../gcc/final.c:3152
0x10ade9a3 final_1
.../gcc/final.c:2020
0x10ae6157 rest_of_handle_final
.../gcc/final.c:4658
0x10ae6697 execute
.../gcc/final.c:4736
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
compiler exited with status 1
FAIL: gcc.dg/torture/vshuf-v16qi.c   -O2  (internal compiler error)

triggered by an RTL instruction like:

(insn 97 96 98 (set (reg:SI 5 %r5 [88])
        (zero_extract:SI (mem/c:SI (symbol_ref:SI ("b") <var_decl 0x7ffff7f801b0 b>) [0 b+0 S4 A128])
            (const_int 8 [0x8])
            (const_int 24 [0x18]))) ".../gcc/testsuite/gcc.dg/torture/vshuf-main.inc":28:1 97 {*extzv_aligned}
     (nil))

and removes these regressions:

FAIL: gcc.dg/torture/vshuf-v16qi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v16qi.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/vshuf-v8hi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v8hi.c   -O2  (test for excess errors)
FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (test for excess errors)

However expand typically presents pseudo-registers rather than memory
references to these insns, so a further rework is required to make
better use of the code variant they are supposed to produce.  This at
least fixes the problem at hand.

gcc/
* config/vax/vax.md (*insv_aligned, *extzv_aligned)
(*extv_aligned): Also make sure the memory address of a bit-field
location can be adjusted in the PIC mode.

VAX: Remove EXTV/EXTZV/INSV instruction use from aligned case insns

The INSV machine instruction is the only computational operation in the
VAX ISA that keeps condition codes intact.  In preparation for the
MODE_CC transition, keep the patterns that make use of said instruction
apart from those that do not.  For consistency update EXTV and EXTZV
instruction uses accordingly.  In expand SUBREGs will be presented as
operands, so handle that possibility in the insn condition.

This actually yields better code by avoiding EXTV/EXTZV instructions in
pseudo-aligned register cases previously resorting to those instructions:

@@ -42,7 +42,7 @@ ins8:
subl2 $4,%sp # 21 [c=32]  addsi3
movl 4(%ap),%r0 # 2 [c=16]  movsi_2
movl 8(%ap),%r1 # 17 [c=16]  movsi_2
- insv %r1,$8,$8,%r0 # 9 [c=4]  *insv_aligned
+ insv %r1,$8,$8,%r0 # 9 [c=4]  *insv_2
ret # 25 [c=0]  return
.size ins8, .-ins8
.align 1
@@ -60,12 +60,12 @@ ext8:
.globl extz8
.type extz8, @function
extz8:
- .word 0 # 19 [c=0]  procedure_entry_mask
- subl2 $4,%sp # 20 [c=32]  addsi3
+ .word 0 # 18 [c=0]  procedure_entry_mask
+ subl2 $4,%sp # 19 [c=32]  addsi3
movl 4(%ap),%r0 # 2 [c=16]  movsi_2
- extzv $8,$8,%r0,%r1 # 13 [c=60]  *extzv_aligned
- movl %r1,%r0 # 18 [c=4]  movsi_2
- ret # 24 [c=0]  return
+ rotl $24,%r0,%r0 # 13 [c=60]  *extzv_non_const
+ movzbl %r0,%r0
+ ret # 23 [c=0]  return
.size extz8, .-extz8
.align 1
.globl ins16
@@ -75,7 +75,7 @@ ins16:
subl2 $4,%sp # 21 [c=32]  addsi3
movl 4(%ap),%r0 # 2 [c=16]  movsi_2
movl 8(%ap),%r1 # 17 [c=16]  movsi_2
- insv %r1,$16,$16,%r0 # 9 [c=4]  *insv_aligned
+ insv %r1,$16,$16,%r0 # 9 [c=4]  *insv_2
ret # 25 [c=0]  return
.size ins16, .-ins16
.align 1
@@ -94,8 +94,9 @@ ext16:
extz16:
.word 0 # 18 [c=0]  procedure_entry_mask
subl2 $4,%sp # 19 [c=32]  addsi3
- movl 4(%ap),%r1 # 2 [c=16]  movsi_2
- extzv $16,$16,%r1,%r0 # 7 [c=60]  *extzv_aligned
+ movl 4(%ap),%r0 # 2 [c=16]  movsi_2
+ rotl $16,%r0,%r0 # 7 [c=60]  *extzv_non_const
+ movzwl %r0,%r0
movzwl %r0,%r0 # 13 [c=4]  zero_extendhisi2
ret # 23 [c=0]  return
.size extz16, .-extz16

demonstrated with this program:

typedef struct
{
  int f0:1;
  int f1:7;
  int f8:8;
  int f16:16;
} bit_t;

typedef struct
{
  unsigned int f0:1;
  unsigned int f1:7;
  unsigned int f8:8;
  unsigned int f16:16;
} ubit_t;

typedef union
{
  bit_t b;
  int i;
} bit_u;

typedef union
{
  ubit_t b;
  unsigned int i;
} ubit_u;

int
ins1 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f1 = y;
  return x.i;
}

int
ext1 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f1;
}

unsigned int
extz1 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f1;
}

int
ins8 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f8 = y;
  return x.i;
}

int
ext8 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f8;
}

unsigned int
extz8 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f8;
}

int
ins16 (bit_u x, int y)
{
  asm volatile ("" : "+r" (x), "+r" (y));
  x.b.f16 = y;
  return x.i;
}

int
ext16 (bit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f16;
}

unsigned int
extz16 (ubit_u x)
{
  asm volatile ("" : "+r" (x));
  return x.b.f16;
}

It also papers over a regression:

FAIL: gcc.dg/pr83623.c (internal compiler error)
FAIL: gcc.dg/pr83623.c (test for excess errors)

from an ICE like:

during RTL pass: final
.../gcc/testsuite/gcc.dg/pr83623.c: In function 'foo':
.../gcc/testsuite/gcc.dg/pr83623.c:13:1: internal compiler error: in change_address_1, at emit-rtl.c:2275
0x10a056e3 change_address_1
.../gcc/emit-rtl.c:2275
0x10a0645f adjust_address_1(rtx_def*, machine_mode, poly_int<1u, long>, int, int, int, poly_int<1u, long>)
.../gcc/emit-rtl.c:2409
0x11cb588f output_97
.../gcc/config/vax/vax.md:808
0x10aafb2f get_insn_template(int, rtx_insn*)
.../gcc/final.c:2070
0x10ab2b3f final_scan_insn_1
.../gcc/final.c:3039
0x10ab313b final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
.../gcc/final.c:3152
0x10aaf887 final_1
.../gcc/final.c:2020
0x10ab703b rest_of_handle_final
.../gcc/final.c:4658
0x10ab757b execute
.../gcc/final.c:4736
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
compiler exited with status 1
FAIL: gcc.dg/pr83623.c (internal compiler error)

triggered by an RTL instruction like:

(insn 17 14 145 (set (reg:SI 1 %r1)
        (zero_extract:SI (mem/c:SI (symbol_ref:SI ("x") <var_decl 0x7ffff7f80120 x>) [1 x+0 S4 A128])
            (const_int 16 [0x10])
            (const_int 16 [0x10]))) ".../gcc/testsuite/gcc.dg/pr83623.c":12:9 97 {*extzv_aligned}
     (nil))

(where the address cannot be adjusted by 2 for PIC code as requested
here as it would create an offset external symbol reference) otherwise
caused by the patterns modified here, addressed next.  This indicates
a further rework is warranted here, but at least problems at hand have
been fixed.

gcc/
* config/vax/vax.md (*insv_aligned, *extzv_aligned)
(*extv_aligned): Reject register bit-field locations that are not
aligned to the least significant bit; update output statement
accordingly.

VAX: Fix predicates and constraints for EXTV/EXTZV/INSV insns

It makes no sense for insn operand predicates, as long as they accept a
register operand, to be more restrictive than the set of the associated
constraints, because expand will choose the insn based on the relevant
operand being a pseudo register then and reload will keep it happily as a
memory reference if a constraint permits it.  So the restriction posed
by such a predicate will be happily ignored, and moreover if a splitter
is added, such as required for MODE_CC support, the new instructions
will reject the original operands supplied, causing an ICE.  An actual
example will be given with a subsequent change.

Remove such inconsistencies we have with the EXTV/EXTZV/INSV insns then,
observing that a bit-field located in memory is byte-addressed by the
respective machine instructions and therefore SImode may only be used
with a register or an offsettable memory operand (i.e. not an indexed,
pre-decremented, or post-incremented one), which has already been taken
into account with the constraints currently used, except for `*insv_2'.
The QI machine mode may be used for the bit-field location with any kind
of memory operand, but we got the constraint wrong, although harmlessly
in reality, with `*insv'.  Fix that for consistency though.

Also give the insns names, for easier reference here and elsewhere.

gcc/
* config/vax/vax.md (*insv_aligned, *extzv_aligned)
(*extv_aligned, *extv_non_const, *extzv_non_const): Name insns.
Fix location predicate.
(*extzv): Name insn.
(*insv): Likewise.  Fix location constraint.
(*insv_2): Likewise, and the predicate.

VAX: Add the `movmemhi' instruction

The MOVC3 machine instruction has `memmove' semantics[1]:

"The operation of the instruction is such that overlap of the source and
destination strings does not affect the result."

so use it to provide the `movmemhi' instruction as well.
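
As a reminder of what `memmove' semantics means at the source level,
here is a minimal sketch of an overlapping copy.  The helper name is
hypothetical; whether a given call is actually expanded via `movmemhi'
depends on the middle end and is not claimed here:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: open a gap of GAP bytes at the start of BUF by
   moving its first LEN bytes up.  Source and destination overlap
   whenever GAP < LEN, so `memmove' rather than `memcpy' is required.
   The caller must provide at least LEN + GAP bytes of storage.  */
void
open_gap (char *buf, size_t len, size_t gap)
{
  memmove (buf + gap, buf, len);
}
```

With BUF holding "abcdef", `open_gap (buf, 4, 2)' leaves "ababcd" in
place, i.e. the source bytes are copied as they were before the call,
regardless of the overlap.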

References:

[1] DEC STD 032-0 "VAX Architecture Standard", Digital Equipment
    Corporation, A-DS-EL-00032-00-0 Rev J, December 15, 1989, Section
    3.10 "Character-String Instructions", p. 3-162

gcc/
* config/vax/vax.md (cpymemhi1): Rename insn to...
(movmemhi1): ... this.
(cpymemhi): Update accordingly.  Remove constraints.
(movmemhi): New expander.

gcc/testsuite/
* gcc.target/vax/movmem.c: New test.

VAX: Add a test for the `cpymemhi' instruction

gcc/testsuite/
* gcc.target/vax/cpymem.c: New test.

VAX: Actually produce QImode and HImode `ctz' operations

The middle end does not refer to `ctzqi2'/`ctzhi2' or `ffsqi2'/`ffshi2'
patterns by name where `__builtin_ctz' or `__builtin_ffs' respectively
is invoked for an argument of the QImode or HImode type, and instead it
extends the data type before passing it to `ctzsi2' or `ffssi2'.

Avoid the redundant operation and use a peephole2 to convert it to the
right RTL expression that will collapse the two operations into a single
machine instruction instead unless we need the extended intermediate
result for another purpose.

gcc/
* config/vax/builtins.md: Add a peephole2 for QImode and HImode
`ctz' operations.
(any_extend): New code iterator.

gcc/testsuite/
* gcc.target/vax/ctzhi.c: New test.
* gcc.target/vax/ctzqi.c: New test.
* gcc.target/vax/ffshi.c: New test.
* gcc.target/vax/ffsqi.c: New test.

VAX: Also provide QImode and HImode `ctz' and `ffs' operations

The FFS machine instruction provides for arbitrary input bit-field widths
so take advantage of this and convert `ffssi2' and `ctzsi2' to templates
for all the three of QI, HI, SI machine modes.

Test cases will be added separately.

gcc/
* config/vax/builtins.md (width): New mode attribute.
(ffssi2): Rework expander into...
(ffs<mode>2): ... this.
(ctzsi2): Rework insn into...
(ctz<mode>2): ... this.

VAX: Provide the `ctz' operation

Our `ffssi2_internal' pattern and the machine FFS instruction, which
technically is a bit-field operation, match the `ctz' operation exactly,
with the result produced for the bit-field source operand of zero equal
to its width as specified with another machine instruction operand, not
directly expressed in RTL and currently hardcoded in the assembly code
produced. In our terms this is the bit size of the machine mode used,
and although it's SImode now let's be flexible for an upcoming change.

The operation also sets the Z condition code according to the value of
the source operand.
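
The relationship between the machine instruction, `ctz' and `ffs' as
described above can be sketched in C as follows.  This is a simplified
model with hypothetical names: the start position operand of FFS is
assumed to be zero, WIDTH is assumed to be in the range 1 to 32, and
only the Z condition code is modelled:

```c
#include <stdint.h>

/* Result of the modelled FFS operation: the bit position found and
   the Z condition code.  */
struct ffs_result
{
  unsigned pos;
  int z;
};

/* Model of FFS over the low WIDTH bits of SRC: POS is the index of
   the least significant set bit, or WIDTH if the field is zero; Z is
   set according to the source field, not the result.  */
static struct ffs_result
vax_ffs_model (uint32_t src, unsigned width)
{
  struct ffs_result r = { width, 1 };
  unsigned i;

  for (i = 0; i < width; i++)
    if (src & (UINT32_C (1) << i))
      {
        r.pos = i;
        r.z = 0;
        break;
      }
  return r;
}

/* `ctz' is the position as is, hence a defined value at zero equal to
   the width of the field.  */
static unsigned
ctz_model (uint32_t src, unsigned width)
{
  return vax_ffs_model (src, width).pos;
}

/* `ffs' is one-based and returns zero for a zero input, so it needs
   the Z condition code to tell the two cases apart.  */
static unsigned
ffs_model (uint32_t src, unsigned width)
{
  struct ffs_result r = vax_ffs_model (src, width);

  return r.z ? 0 : r.pos + 1;
}
```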

gcc/
* config/vax/builtins.md (ffssi2_internal): Rename insn to...
(ctzsi2): ... this. Update the RTL operation.
(ffssi2): Update accordingly.
* config/vax/vax.c (vax_notice_update_cc): Handle CTZ.
* config/vax/vax.h (CTZ_DEFINED_VALUE_AT_ZERO): New macro.

gcc/testsuite/
* gcc.target/vax/ctzsi.c: New test.

VAX: Add tests for `sync_lock_test_and_set' and `sync_lock_release'

Based on gcc.dg/pr61756.c.

gcc/testsuite/
* gcc.target/vax/bbcci.c: New test.
* gcc.target/vax/bbssi.c: New test.

VAX: Add a test for the SImode `ffs' operation

gcc/testsuite/
* gcc.target/vax/ffssi.c: New test.

VAX: Actually enable `builtins.md' now that it is fully functional

Test cases will follow.

gcc/
* config/vax/vax.md: Include `builtins.md'.

VAX: Correct `sync_lock_test_and_set' and `sync_lock_release' builtins

Remove an ICE like:

during RTL pass: expand
.../libatomic/tas_n.c: In function 'libat_test_and_set_1':
.../libatomic/tas_n.c:39:1: internal compiler error: in patch_jump_insn, at cfgrtl.c:1298
   39 | }
      | ^
0x108a09ff patch_jump_insn
.../gcc/cfgrtl.c:1298
0x108a0b07 redirect_branch_edge
.../gcc/cfgrtl.c:1325
0x108a124b rtl_redirect_edge_and_branch
.../gcc/cfgrtl.c:1458
0x1087f6d3 redirect_edge_and_branch(edge_def*, basic_block_def*)
.../gcc/cfghooks.c:373
0x11d6264b try_forward_edges
.../gcc/cfgcleanup.c:562
0x11d6b0eb try_optimize_cfg
.../gcc/cfgcleanup.c:2960
0x11d6ba4f cleanup_cfg(int)
.../gcc/cfgcleanup.c:3174
0x10870b3f execute
.../gcc/cfgexpand.c:6763

triggered with an RTL pattern like:

(jump_insn 8 7 20 2 (parallel [
            (set (pc)
                (if_then_else (ne (zero_extract:SI (mem/v:QI (mem/f/c:SI (reg/f:SI 16 virtual-incoming-args) [1 mptr+0 S4 A32]) [-1  S1 A8])
                            (const_int 1 [0x1])
                            (const_int 0 [0]))
                        (const_int 0 [0]))
                    (label_ref 10)
                    (pc)))
            (set (zero_extract:SI (mem/v:QI (mem/f/c:SI (reg/f:SI 16 virtual-incoming-args) [1 mptr+0 S4 A32]) [-1  S1 A8])
                    (const_int 1 [0x1])
                    (const_int 0 [0]))
                (const_int 1 [0x1]))
        ]) ".../libatomic/tas_n.c":38:12 -1
     (nil)
-> 10)

caused by a volatile memory reference used that is not accepted by the
`memory_operand' predicate of the `jbbssiqi' insn explicitly referred
from the `sync_lock_test_and_setqi' expander.  Also seen with:

FAIL: gcc.dg/pr61756.c (internal compiler error)

Define a new `any_memory_operand' predicate accepting both ordinary and
volatile memory references and use it with the `jbb<ccss>i<mode>' insn,
so as to address the ICE.

Also remove useless operations from the `sync_lock_test_and_set<mode>'
and `sync_lock_release<mode>' expanders as those always either complete
or fail and therefore never fall through to using their template other
than to match operands.  Wrap `jbb<ccss>i<mode>' into `unspec_volatile'
instead so that the jump does not get removed or reordered.  Share one
index to avoid a complication around the iterators, since the index is
not referred to anywhere anyway and the required pattern is pulled by
its name.

Test cases will be added separately.
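
For reference, a minimal sketch of source code that exercises the two
builtins on a volatile byte, the case that used to trigger the ICE
above.  The lock variable and function names are hypothetical; the
builtins themselves are the documented GCC `__sync' ones:

```c
/* Hypothetical one-byte lock.  The `volatile' qualifier is what makes
   the memory reference fail an ordinary `memory_operand' check.  */
static volatile unsigned char lock_byte;

/* Try to take the lock; return nonzero on success, i.e. if the
   previous value was zero.  */
int
try_lock (void)
{
  return __sync_lock_test_and_set (&lock_byte, 1) == 0;
}

/* Release the lock by storing zero with release semantics.  */
void
unlock (void)
{
  __sync_lock_release (&lock_byte);
}
```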

gcc/
* config/vax/predicates.md (volatile_mem_operand)
(any_memory_operand): New predicates.
* config/vax/builtins.md (VUNSPEC_UNLOCK): Remove constant.
(sync_lock_test_and_set<mode>): Remove `set' and `unspec'
operations, match operands only.  Reformat.
(sync_lock_release<mode>): Likewise.  Remove cruft.
(jbb<ccss>i<mode>): Wrap into `unspec_volatile', use
`any_memory_operand' predicate.

VAX: Use an int iterator to produce individual interlocked branches

With mode-specific interlocked branch insns already folded into iterated
templates now fold the two templates into one too, observing that the
only difference between them is the value of the bit branched on, which
is of course reflected both in the RTL expression and the instruction
produced. Use an int iterator to iterate over the bit value, making use
of the newly-added wide integer support, and substituting patterns as
necessary to produce equivalent individual insns. No functional change.

gcc/
* config/vax/builtins.md (bit): New int iterator.
(ccss): New int attribute.
(jbbssi<mode>, jbbcci<mode>): Fold insns into...
(jbb<ccss>i<mode>): ... this.

VAX: Use a mode iterator to produce individual interlocked branches

Regardless of the machine mode all the interlocked branches of the same
kind, one of the two provided by the ISA, use the same RTL patterns and
machine instructions, except for the memory operand's constraint.

Remove code duplication then and make use of a mode iterator combined
with an attribute to expand the same insn patterns with the constraint
suitably substituted from a single template. No functional change.

gcc/
* config/vax/builtins.md (bb_mem): New mode attribute.
(jbbssiqi, jbbssihi, jbbssisi): Fold insns into...
(jbbssi<mode>): ... this.
(jbbcciqi, jbbccihi, jbbccisi): Likewise...
(jbbcci<mode>): ... this.

jump: Also handle jumps wrapped in UNSPEC or UNSPEC_VOLATILE

VAX has interlocked branch instructions used for atomic operations and
we want to have them wrapped in UNSPEC_VOLATILE so as not to have code
carried across.  This however breaks with jump optimization and leads
to an ICE in the build of libbacktrace like:

.../libbacktrace/mmap.c:190:1: internal compiler error: in fixup_reorder_chain, at cfgrtl.c:3934
  190 | }
      | ^
0x1087d46b fixup_reorder_chain
.../gcc/cfgrtl.c:3934
0x1087f29f cfg_layout_finalize()
.../gcc/cfgrtl.c:4447
0x1087c74f execute
.../gcc/cfgrtl.c:3662

on RTL like:

(jump_insn 18 17 150 4 (unspec_volatile [
            (set (pc)
                (if_then_else (eq (zero_extract:SI (mem/v:SI (reg/f:SI 23 [ _2 ]) [-1  S4 A32])
                            (const_int 1 [0x1])
                            (const_int 0 [0]))
                        (const_int 1 [0x1]))
                    (label_ref 20)
                    (pc)))
            (set (zero_extract:SI (mem/v:SI (reg/f:SI 23 [ _2 ]) [-1  S4 A32])
                    (const_int 1 [0x1])
                    (const_int 0 [0]))
                (const_int 1 [0x1]))
        ] 101) ".../libbacktrace/mmap.c":135:14 158 {jbbssisi}
     (nil)
-> 20)

when those branches are enabled with a follow-up change.  Also showing
with:

FAIL: gcc.dg/pr61756.c (internal compiler error)

Handle branches wrapped in UNSPEC_VOLATILE then and, for consistency,
also in UNSPEC.  The presence of UNSPEC_VOLATILE will prevent such
branches from being removed as they won't be accepted by `onlyjump_p',
we just need to let them through.

gcc/
* jump.c (pc_set): Also accept a jump wrapped in UNSPEC or
UNSPEC_VOLATILE.
(any_uncondjump_p, any_condjump_p): Update comment accordingly.

loop-doloop: Add missing call to `onlyjump_p'

Keep any jump that has side effects as those must not be removed.

gcc/
* loop-doloop.c (add_test): Only remove the jump if `onlyjump_p'.

cfgrtl: Add missing call to `onlyjump_p'

If any unconditional jumps within a block have side effects then the
block cannot be considered empty.

gcc/
* cfgrtl.c (rtl_block_empty_p): Return false if `!onlyjump_p'
too.

sel-sched-ir: Add missing call to `onlyjump_p'

Do not try to remove a conditional jump if it has side effects.

gcc/
* sel-sched-ir.c (maybe_tidy_empty_bb): Only try to remove a
conditional jump if `onlyjump_p'.

loop-iv: Add missing calls to `onlyjump_p'

Ignore jumps that have side effects in loop processing as pasting the
body of a loop multiple times within is semantically equivalent to jump
deletion (between the iterations unrolled) even if we do not physically
delete the jump RTL insn.

gcc/
* loop-iv.c (simplify_using_initial_values): Only process jumps
that match `onlyjump_p'.
(check_simple_exit): Likewise.

ifcvt: Add missing call to `onlyjump_p'

Do not convert a conditional jump into conditional execution (and remove
the jump as a consequence) if the jump has side effects.

gcc/
* ifcvt.c (dead_or_predicable) [!IFCVT_MODIFY_TESTS]: Bail out
if `!onlyjump_p'.

RTL: Also support HOST_WIDE_INT with int iterators

Add wide integer aka 'w' rtx format support to int iterators so that
machine description can iterate over `const_int' expressions.

This is done by expanding standard integer aka 'i' format support,
observing that any standard integer already present in any of our
existing RTL code will also fit into HOST_WIDE_INT, so there is no need
for a separate handler.  Any truncation of the number parsed is done by
the caller.  An assumption is made however that no place relies on
capping out of range values to INT_MAX.

Now the 'p' format is handled explicitly rather than being implied by
rtx being a SUBREG, so actually assert that it is, just to play safe.

gcc/
* read-rtl.c: Add a page-feed separator at the start of iterator
code.
(struct iterator_group): Change the return type to HOST_WIDE_INT
for the `find_builtin' member.  Likewise the second parameter
type for the `apply_iterator' member.
(atoll) [!HAVE_ATOQ]: Reorder.
(find_mode, find_code): Change the return type to HOST_WIDE_INT.
(apply_mode_iterator, apply_code_iterator)
(apply_subst_iterator): Change the second parameter type to
HOST_WIDE_INT.
(find_int): Handle input suitable for HOST_WIDE_INT output.
(apply_int_iterator): Rewrite in terms of explicit format
interpretation.
(rtx_reader::read_rtx_operand) <'w'>: Fold into...
<'i', 'n', 'p'>: ... this.
* doc/md.texi (Int Iterators): Document 'w' rtx format support.

VAX: Correct fatal issues with the `ffs' builtin

The `builtins.md' machine description fragment is not included anywhere
and is therefore dead code, which has become bitrotten due to non-use.

If actually enabled, it does not build due to the use of an unknown `t'
constraint:

.../gcc/config/vax/builtins.md:42:1: error: undefined machine-specific constraint at this point: "t"
.../gcc/config/vax/builtins.md:42:1: note:  in operand 1

which came from commit becb93d02cc1 ("builtins.md (ffssi2_internal):
Correct constraint."), which was not applied as posted and reviewed; `T'
was meant to be used instead.

Once this has been fixed this code still fails building:

.../gcc/config/vax/builtins.md: In function 'rtx_def* gen_ffssi2(rtx, rtx)':
.../gcc/config/vax/builtins.md:35:19: error: 'gen_bne' was not declared in this
scope; did you mean 'gen_use'?
   35 |   emit_jump_insn (gen_bne (label));
      |                   ^~~~~~~
      |                   gen_use
make[2]: *** [Makefile:1122: insn-emit.o] Error 1

Finally the FFS machine instruction sets the Z condition code according
to the comparison of the value held in the source operand against zero
rather than the value held in the target operand.  If the source operand
is found to hold zero, then the target operand is set to the width of the
source operand, 32 for SImode (FFS supports arbitrary widths).

Correct the build issues then and update RTL to match the operation of
the machine instruction.  A test case will be added separately.

gcc/
* config/vax/builtins.md (ffssi2): Make preparation statements
actually buildable.
(ffssi2_internal): Fix input constraints; make the RTL pattern
match reality for `cc0'.

VAX: Rationalize expression and address costs

Expression costs are required to be given in terms of COSTS_N_INSNS (n),
which is defined to stand for the count of single fast instructions, and
actually returns `n * 4'.  The VAX backend however instead operates on
naked numbers, causing an anomaly for the integer const zero rtx, where
the cost given is 4 as opposed to 1 for integers in the [1:63] range, as
well as -1 for comparisons.  This is because the value of 0 returned by
`vax_rtx_costs' is converted to COSTS_N_INSNS (1) in `pattern_cost':

  return cost > 0 ? cost : COSTS_N_INSNS (1);

Consequently, where feasible, 1 or -1 are preferred over 0 by the middle
end causing code pessimization, e.g. rather than producing this:

subl2 $4,%sp
movl 4(%ap),%r0
jgtr .L2
addl2 $2,%r0
.L2:
ret

or this:

subl2 $4,%sp
addl3 4(%ap),8(%ap),%r0
jlss .L6
addl2 $2,%r0
.L6:
ret

code is produced like this:

subl2 $4,%sp
movl 4(%ap),%r0
cmpl %r0,$1
jgeq .L2
addl2 $2,%r0
.L2:
ret

or this:

subl2 $4,%sp
addl3 4(%ap),8(%ap),%r0
cmpl %r0,$-1
jleq .L6
addl2 $2,%r0
.L6:
ret

from this:

int
compare_mov (int x)
{
  if (x > 0)
    return x;
  else
    return x + 2;
}

and this:

int
compare_add (int x, int y)
{
  int z;

  z = x + y;
  if (z < 0)
    return z;
  else
    return z + 2;
}

respectively, which is both slower and larger.

Furthermore once the backend is converted to MODE_CC this anomaly makes
it usually impossible to remove redundant comparisons in the comparison
elimination pass, because most VAX instructions set the condition codes
as per the relation of the instruction's result to 0 and not -1.

The middle end has some other assumptions as to rtx costs being given in
terms of COSTS_N_INSNS, so wrap all the VAX rtx costs then as they stand
into COSTS_N_INSNS invocations, effectively scaling the costs by 4 while
preserving their relative values, except for the integer const zero rtx
given the value of `COSTS_N_INSNS (1) / 2', half of a fast instruction
(this can be further halved if needed in the future).

Adjust address costs likewise so that they remain proportional to the
new absolute values of rtx costs.
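
The anomaly and its fix can be sketched with a few lines of C.  This is
a simplified model with hypothetical function names: only constants in
the [0:63] range are considered, and the cost of a whole pattern is
reduced to the cost of the constant alone:

```c
/* As described above, COSTS_N_INSNS (n) returns `n * 4'.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Model of the final conversion made in `pattern_cost'.  */
static int
pattern_cost_model (int cost)
{
  return cost > 0 ? cost : COSTS_N_INSNS (1);
}

/* Old scheme, naked numbers: 0 for zero, 1 for [1:63].  */
static int
old_const_cost (int value)
{
  return value == 0 ? 0 : 1;
}

/* New scheme: half of a fast instruction for zero, a whole one for
   [1:63].  */
static int
new_const_cost (int value)
{
  return value == 0 ? COSTS_N_INSNS (1) / 2 : COSTS_N_INSNS (1);
}
```

Under this model the old scheme yields 4 for zero and 1 for one after
the conversion, so one appears cheaper than zero, whereas the new scheme
yields 2 and 4 respectively, restoring the intended ordering.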

Code size stats are as follows, collected from 17639 executables built
in `check-c' GCC testing:

              samples average  median
--------------------------------------
regressions      1420  0.400%  0.195%
unchanged       13811  0.000%  0.000%
progressions     2408 -0.504% -0.201%
--------------------------------------
total           17639 -0.037%  0.000%

with a small number of outliers only (over 5% size change):

old     new     change  %change filename
----------------------------------------------------
4991    5249     258     5.1693 981001-1.exe
2637    2777     140     5.3090 interchange-6.exe
2187    2307     120     5.4869 sprintf.x7
3969    4197     228     5.7445 pr28982a.exe
8264    8816     552     6.6795 vector-compare-1.exe
5199    5575     376     7.2321 pr28982b.exe
2113    2411     298    14.1031 20030323-1.exe
2113    2411     298    14.1031 20030323-1.exe
2113    2411     298    14.1031 20030323-1.exe

so it seems we are looking good, and we have complementing reductions
to compensate:

old     new     change  %change filename
----------------------------------------------------
2919    2631    -288    -9.8663 pr57521.exe
3427    3167    -260    -7.5868 sabd_1.exe
2985    2765    -220    -7.3701 ssad-run.exe
2985    2765    -220    -7.3701 ssad-run.exe
2985    2765    -220    -7.3701 usad-run.exe
2985    2765    -220    -7.3701 usad-run.exe
4509    4253    -256    -5.6775 vshuf-v2sf.exe
4541    4285    -256    -5.6375 vshuf-v2si.exe
4673    4417    -256    -5.4782 vshuf-v2df.exe
2993    2841    -152    -5.0785 abs-2.x4
2993    2841    -152    -5.0785 abs-3.x4

This actually causes `loop-8.c' to regress:

FAIL: gcc.dg/loop-8.c scan-rtl-dump-times loop2_invariant "Decided" 1
FAIL: gcc.dg/loop-8.c scan-rtl-dump-not loop2_invariant "without introducing a new temporary register"

but upon a closer inspection this is a red herring.  Old code looks as
follows:

.file "loop-8.c"
.text
.align 1
.globl f
.type f, @function
f:
.word 0
subl2 $4,%sp
movl 4(%ap),%r2
movl 8(%ap),%r3
movl $42,(%r2)
clrl %r0
movl $42,%r1
movl %r1,%r4
jbr .L2
.L5:
movl %r4,%r1
.L2:
movl %r1,(%r3)[%r0]
incl %r0
cmpl %r0,$100
jeql .L6
movl $42,(%r2)[%r0]
bicl3 $-2,%r0,%r1
jeql .L5
movl %r0,%r1
jbr .L2
.L6:
ret
.size f, .-f

while new one is like below:

.file "loop-8.c"
.text
.align 1
.globl f
.type f, @function
f:
.word 0
subl2 $4,%sp
movl 4(%ap),%r2
movl $42,(%r2)+
movl 8(%ap),%r1
clrl %r0
movl $42,%r3
movzbl $100,%r4
movl %r3,%r5
jbr .L2
.L5:
movl %r5,%r3
.L2:
movl %r3,(%r1)+
incl %r0
cmpl %r0,%r4
jeql .L6
movl $42,(%r2)+
bicl3 $-2,%r0,%r3
jeql .L5
movl %r0,%r3
jbr .L2
.L6:
ret
.size f, .-f

and is clearly better: not only is it smaller, but it also uses the
post-increment rather than indexed addressing mode in the loop, of
which the former comes for free in terms of both performance and code
size while the latter causes an extra byte per operand to be produced
for the index register and also incurs an execution penalty for the
extra address calculation.

Exclude the case from VAX testing then, as already done for some other
targets and discussed with commit d242fdaec186 ("gcc.dg/loop-8.c: Skip
for mmix.").

gcc/
* config/vax/vax.c (vax_address_cost): Express the cost in terms
of COSTS_N_INSNS.
(vax_rtx_costs): Likewise.

gcc/testsuite/
* gcc.dg/loop-8.c: Exclude for `vax-*-*'.
* gcc.target/vax/compare-add-zero.c: New test.
* gcc.target/vax/compare-mov-zero.c: New test.

VAX/testsuite: Run target testing over all the usual optimization levels

It makes sense to use what other targets do and run all the VAX test
cases over all the usual optimization levels, so make `vax.exp' use our
`gcc-dg-runtest' rather than the generic `dg-runtest' test driver.

This breaks `pr56875.c' however, whose code is optimized away at levels
above `-O0', because it has been written such that its calculations have
no effect:

FAIL: gcc.target/vax/pr56875.c   -O1   scan-assembler ashq .*,\\$0xffffffffffffffff,
FAIL: gcc.target/vax/pr56875.c   -O2   scan-assembler ashq .*,\\$0xffffffffffffffff,
FAIL: gcc.target/vax/pr56875.c   -O3 -g   scan-assembler ashq .*,\\$0xffffffffffffffff,
FAIL: gcc.target/vax/pr56875.c   -Os   scan-assembler ashq .*,\\$0xffffffffffffffff,
FAIL: gcc.target/vax/pr56875.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none   scan-assembler ashq .*,\\$0xffffffffffffffff,
FAIL: gcc.target/vax/pr56875.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects   scan-assembler ashq .*,\\$0xffffffffffffffff,

Rather than keeping it at `-O0' update the test case so that its code
does take effect while retaining its sense.  Also reformat it according
to our requirements.
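
The difference between calculations that have no effect and ones that do
can be sketched as follows.  This is a hypothetical example and not the
actual contents of `pr56875.c'; the function names are made up for
illustration:

```c
/* The shifted value is never used, so the compiler is free to remove
   the whole calculation once optimization is enabled.  */
void
shift_discarded (int count)
{
  unsigned long long value = ~0ULL;

  value <<= count;
  (void) value;
}

/* The shifted value is returned to the caller, so it has to be
   computed.  COUNT must be in the [0:63] range.  */
unsigned long long
shift_returned (int count)
{
  unsigned long long value = ~0ULL;

  return value << count;
}
```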

gcc/testsuite/
* gcc.target/vax/vax.exp: Use `gcc-dg-runtest' rather than
`dg-runtest'.
* gcc.target/vax/pr56875.c (dg-options): Make empty.
(a): Rewrite for calculations to make effect.  Reformat.

VAX: Define LEGITIMATE_PIC_OPERAND_P

The VAX ELF psABI does not permit the use of all hardware operand modes
for PIC symbol references due to the need to use PC-relative addressing
for symbols that end up local and the need to make references indirect
for symbols that end up global.

Therefore symbols referred as immediates may only be used with the move
and push address (MOVA and PUSHA) instructions and their PC-relative
displacement address mode, as there is no genuine PC-relative immediate
available that all the other instructions would have to use.

Furthermore global symbol references must not have an offset applied,
which has to be added with a separate instruction, because there is no
support now for GOT entries for external `symbol+offset' references, so
any indirect GOT references made by the static linker from the original
direct symbol references must not have an addend applied.  Consequently
no addend is allowed even if a given external symbol turns out local,
for whatever reason, at the static link time.

Define the LEGITIMATE_PIC_OPERAND_P macro then, a corresponding function
and predicate to exclude the relevant expressions as required, and then
a constraint so that reloads are produced where needed, and use the new
facilities in the machine description, folding corresponding duplicated
patterns for local and external symbols together.  Rewrite predicates to
make use of the new function, rename them to match their sense and also
remove ones no longer used.
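
The offset restriction described above can be modelled with a small
standalone sketch.  Both `struct sym_operand' and
`acceptable_pic_operand_p' are hypothetical simplifications made up for
illustration; the backend function works on RTL expressions and is not
reproduced here:

```c
#include <stdbool.h>

/* Hypothetical, simplified description of a symbolic operand.  */
struct sym_operand
{
  bool local;    /* Symbol known to bind locally.  */
  long offset;   /* Addend applied to the symbol, 0 if none.  */
};

/* Model of the rule described above: under PIC an external symbol
   must not carry an offset, whereas a local one may.  */
bool
acceptable_pic_operand_p (const struct sym_operand *op)
{
  return op->local || op->offset == 0;
}
```

Under this model an external `sE+12' reference is rejected and has to be
split into a plain symbol reference and a separate addition.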

All this fixes an ICE like this:

during RTL pass: postreload
.../gcc/testsuite/gcc.c-torture/execute/20040709-2.c: In function 'testE':
.../gcc/testsuite/gcc.c-torture/execute/20040709-2.c:89:1: internal compiler error: in reload_combine_note_use, at postreload.c:1559
.../gcc/testsuite/gcc.c-torture/execute/20040709-2.c:96:65: note: in expansion of macro 'T'
0x10fe84cb reload_combine_note_use
.../gcc/postreload.c:1559
0x10fe8857 reload_combine_note_use
.../gcc/postreload.c:1621
0x10fe8303 reload_combine_note_use
.../gcc/postreload.c:1517
0x10fe7c7b reload_combine
.../gcc/postreload.c:1408
0x10fe3417 reload_cse_regs
.../gcc/postreload.c:67
0x10feaf9f execute
.../gcc/postreload.c:2358

due to the presence of a pseudo register post-reload:

(insn 435 228 229 13 (set (reg:SI 1 %r1)
        (mem/c:SI (reg/f:SI 341) [25 sE+12 S4 A8])) ".../gcc/testsuite/gcc.c-torture/execute/20040709-2.c":96:65 12 {movsi_2}
     (nil))

(due to the use of an offset `sE+12' symbol reference) and removing
these regressions:

FAIL: gcc.c-torture/execute/20040709-2.c   -O2  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -g  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c   -Os  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c   -Os  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-2.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c   -O2  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c   -O3 -g  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c   -Os  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c   -Os  (test for excess errors)
FAIL: gcc.c-torture/execute/20040709-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
FAIL: gcc.c-torture/execute/20040709-3.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.dg/torture/pr52028.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error)
FAIL: gcc.dg/torture/pr52028.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)

gcc/
* config/vax/constraints.md (A): New constraint.
* config/vax/predicates.md (external_symbolic_operand)
(external_const_operand): Remove predicates.
(local_symbolic_operand): Rename to...
(pic_symbolic_operand): ... this, and rework.
(external_memory_operand): Rename to...
(non_pic_external_memory_operand): ... this, and rework.
(illegal_blk_memory_operand, illegal_addsub_di_memory_operand):
Update accordingly.
* config/vax/vax-protos.h (vax_acceptable_pic_operand_p): New
prototype.
* config/vax/vax.c (vax_acceptable_pic_operand_p): New function.
(vax_output_int_add): Update according to predicate rework.
* config/vax/vax.h (LEGITIMATE_PIC_OPERAND_P): New macro.
* config/vax/vax.md (pushlclsymreg, pushextsymreg): Fold
together, and rename to...
(*pushsymreg): ... this.  Use the `pic_symbolic_operand'
predicate and the `A' constraint for the displacement operand.
(movlclsymreg, movextsymreg): Fold together, and rename to...
(*movsymreg): ... this.  Use the `pic_symbolic_operand'
predicate and the `A' constraint for the displacement operand.
(pushextsym, pushlclsym): Fold together, and rename to...
(*pushsym): ... this.  Use the `pic_symbolic_operand' predicate
and the `A' constraint for the displacement operand.
(movextsym, movlclsym): Fold together, and rename to...
(*movsym): ... this.  Use the `pic_symbolic_operand' predicate
and the `A' constraint for the displacement operand.

VAX: Remove `c' operand format specifier overload

The `c' operand format specifier is handled directly by the middle end
in `output_asm_insn':

   %cN means require operand N to be a constant
      and print the constant expression with no punctuation.

however it resorts to the target for constants that are not valid
addresses:

    else if (letter == 'c')
      {
        if (CONSTANT_ADDRESS_P (operands[opnum]))
          output_addr_const (asm_out_file, operands[opnum]);
        else
          output_operand (operands[opnum], 'c');
      }

The VAX backend expects the fallback never to happen and overloads `c'
with the branch condition code.  This is confusing however and it is not
like we are short of letters, so instead make the branch condition code
use `k', and then for consistency make `K' the reverse branch condition
code format specifier.  This is safe to do as we provide no means to use
a computed branch condition code in user `asm'.
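
A toy model of the new letter assignment is given below.  The enumeration
and the functions `branch_cond', `branch_rev_cond' and `branch_suffix'
are hypothetical and only cover signed conditions; they are meant to show
the `k'/`K' pairing and are not the code of `print_operand':

```c
#include <stddef.h>

/* Hypothetical subset of comparison codes.  */
enum cond { COND_EQ, COND_NE, COND_GT, COND_LE, COND_GE, COND_LT };

/* Mnemonic suffix of the VAX branch taken when the condition holds.  */
const char *
branch_cond (enum cond c)
{
  static const char *const names[] =
    { "eql", "neq", "gtr", "leq", "geq", "lss" };

  return names[c];
}

/* Mnemonic suffix of the branch taken when the condition does not
   hold.  */
const char *
branch_rev_cond (enum cond c)
{
  static const char *const names[] =
    { "neq", "eql", "leq", "gtr", "lss", "geq" };

  return names[c];
}

/* Toy dispatcher: `k' selects the condition, `K' its reverse, and any
   other letter is left to a different handler.  */
const char *
branch_suffix (char letter, enum cond c)
{
  switch (letter)
    {
    case 'k':
      return branch_cond (c);
    case 'K':
      return branch_rev_cond (c);
    default:
      return NULL;
    }
}
```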

gcc/
* config/vax/vax.c (print_operand): Replace `c' and `C' with
`k' and `K' respectively.
* config/vax/vax.md (*branch, *branch_reversed): Update
accordingly.

PR target/58901: reload: Handle SUBREG of MEM with a mode-dependent address

Fix an ICE with the handling of RTL expressions like:

(subreg:QI (mem/c:SI (plus:SI (plus:SI (mult:SI (reg/v:SI 0 %r0 [orig:67 i ] [67])
                    (const_int 4 [0x4]))
                (reg/v/f:SI 7 %r7 [orig:59 doacross ] [59]))
            (const_int 40 [0x28])) [1 MEM[(unsigned int *)doacross_63 + 40B + i_106 * 4]+0 S4 A32]) 0)

that causes the compilation of libgomp to fail:

during RTL pass: reload
.../libgomp/ordered.c: In function 'GOMP_doacross_wait':
.../libgomp/ordered.c:507:1: internal compiler error: in change_address_1, at emit-rtl.c:2275
  507 | }
      | ^
0x10a3462b change_address_1
.../gcc/emit-rtl.c:2275
0x10a353a7 adjust_address_1(rtx_def*, machine_mode, poly_int<1u, long>, int, int, int, poly_int<1u, long>)
.../gcc/emit-rtl.c:2409
0x10ae2993 alter_subreg(rtx_def**, bool)
.../gcc/final.c:3368
0x10ae25cf cleanup_subreg_operands(rtx_insn*)
.../gcc/final.c:3322
0x110922a3 reload(rtx_insn*, int)
.../gcc/reload1.c:1232
0x10de2bf7 do_reload
.../gcc/ira.c:5812
0x10de3377 execute
.../gcc/ira.c:5986

in a `vax-netbsdelf' build, where an attempt is made to change the mode
of the contained memory reference to the mode of the containing SUBREG.
Such RTL expressions are produced by the VAX shift and rotate patterns
(`ashift', `ashiftrt', `rotate', `rotatert') where the count operand
always has the QI mode regardless of the mode, either SI or DI, of the
datum shifted or rotated.

Such a mode change cannot work where the memory reference uses the
indexed addressing mode, where a multiplier is implied that in the VAX
ISA depends on the width of the memory access requested and therefore
changing the machine mode would change the address calculation as well.
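
The dependency can be shown with a short sketch of the address
arithmetic.  The function name `indexed_address' is made up for
illustration; the scaling of the index by the access size is as described
above:

```c
/* Effective address of an indexed operand: the index register is
   scaled by the size in bytes of the datum accessed.  */
unsigned long
indexed_address (unsigned long base, unsigned long index,
                 unsigned long access_size)
{
  return base + index * access_size;
}
```

With a base of 0x1028 and an index of 3 an SI (4-byte) access refers to
0x1034, while a QI (1-byte) access with the same registers refers to
0x102b, so narrowing the reference in place would read the wrong
location.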

Avoid the attempt then by forcing the reload of any SUBREGs containing
a mode-dependent memory reference, also fixing these regressions:

FAIL: gcc.c-torture/compile/pr46883.c   -Os  (internal compiler error)
FAIL: gcc.c-torture/compile/pr46883.c   -Os  (test for excess errors)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2  (internal compiler error)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2  (test for excess errors)
FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler error)
FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -g  (internal compiler error)
FAIL: gcc.c-torture/execute/20120808-1.c   -O3 -g  (test for excess errors)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (internal compiler error)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  (test for excess errors)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (internal compiler error)
FAIL: gcc.c-torture/execute/20120808-1.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  (test for excess errors)
FAIL: gcc.dg/20050629-1.c (internal compiler error)
FAIL: gcc.dg/20050629-1.c (test for excess errors)
FAIL: c-c++-common/torture/pr53505.c   -Os  (internal compiler error)
FAIL: c-c++-common/torture/pr53505.c   -Os  (test for excess errors)
FAIL: gfortran.dg/coarray_failed_images_1.f08   -Os  (internal compiler error)
FAIL: gfortran.dg/coarray_stopped_images_1.f08   -Os  (internal compiler error)

With test case #0 included it causes a reload with:

(insn 15 14 16 4 (set (reg:SI 31)
        (ashift:SI (const_int 1 [0x1])
            (subreg:QI (reg:SI 30 [ MEM[(int *)s_8(D) + 4B + _5 * 4] ]) 0))) "pr58901-0.c":15:12 94 {ashlsi3}
     (expr_list:REG_DEAD (reg:SI 30 [ MEM[(int *)s_8(D) + 4B + _5 * 4] ])
        (nil)))

as follows:

Reloads for insn # 15
Reload 0: reload_in (SI) = (reg:SI 30 [ MEM[(int *)s_8(D) + 4B + _5 * 4] ])
ALL_REGS, RELOAD_FOR_INPUT (opnum = 2)
reload_in_reg: (reg:SI 30 [ MEM[(int *)s_8(D) + 4B + _5 * 4] ])
reload_reg_rtx: (reg:SI 5 %r5)

resulting in:

(insn 37 14 15 4 (set (reg:SI 5 %r5)
        (mem/c:SI (plus:SI (plus:SI (mult:SI (reg/v:SI 1 %r1 [orig:25 i ] [25])
                        (const_int 4 [0x4]))
                    (reg/v/f:SI 4 %r4 [orig:29 s ] [29]))
                (const_int 4 [0x4])) [1 MEM[(int *)s_8(D) + 4B + _5 * 4]+0 S4 A32])) "pr58901-0.c":15:12 12 {movsi_2}
     (nil))
(insn 15 37 16 4 (set (reg:SI 2 %r2 [31])
        (ashift:SI (const_int 1 [0x1])
            (reg:QI 5 %r5))) "pr58901-0.c":15:12 94 {ashlsi3}
     (nil))

and assembly like:

.L3:
movl 4(%r4)[%r1],%r5
ashl %r5,$1,%r2
xorl2 %r2,%r0
incl %r1
cmpl %r1,%r3
jneq .L3

produced for the loop, provided that optimization has been enabled.

Likewise with test case #1 the reload of:

(insn 17 16 18 4 (set (reg:SI 34)
        (and:SI (subreg:SI (reg/v:DI 27 [ t ]) 4)
            (const_int 1 [0x1]))) "pr58901-1.c":18:20 77 {*andsi_const_int}
     (expr_list:REG_DEAD (reg/v:DI 27 [ t ])
        (nil)))

is as follows:

Reloads for insn # 17
Reload 0: reload_in (DI) = (reg/v:DI 27 [ t ])
reload_out (SI) = (reg:SI 2 %r2 [34])
ALL_REGS, RELOAD_OTHER (opnum = 0)
reload_in_reg: (reg/v:DI 27 [ t ])
reload_out_reg: (reg:SI 2 %r2 [34])
reload_reg_rtx: (reg:DI 4 %r4)

resulting in:

(insn 40 16 17 4 (set (reg:DI 4 %r4)
        (mem/c:DI (plus:SI (mult:SI (reg/v:SI 1 %r1 [orig:26 i ] [26])
                    (const_int 8 [0x8]))
                (reg/v/f:SI 3 %r3 [orig:30 s ] [30])) [1 MEM[(const struct s *)s_13(D) + _7 * 8]+0 S8 A32])) "pr58901-1.c":18:20 11 {movdi}
     (nil))
(insn 17 40 41 4 (set (reg:SI 4 %r4)
        (and:SI (reg:SI 5 %r5 [+4 ])
            (const_int 1 [0x1]))) "pr58901-1.c":18:20 77 {*andsi_const_int}
     (nil))

and assembly like:

.L3:
movq (%r3)[%r1],%r4
bicl3 $-2,%r5,%r4
addl2 %r4,%r0
jaoblss %r0,%r1,.L3

First posted at: <https://gcc.gnu.org/ml/gcc/2014-06/msg00060.html>.

2020-12-05  Matt Thomas  <matt@3am-software.com>
    Maciej W. Rozycki  <macro@linux-mips.org>

gcc/
PR target/58901
* reload.c (push_reload): Also reload the inner expression of a
SUBREG for pseudos associated with a mode-dependent memory
reference.
(find_reloads): Force a reload likewise.

2020-12-05  Maciej W. Rozycki  <macro@linux-mips.org>

gcc/testsuite/
PR target/58901
* gcc.c-torture/compile/pr58901-0.c: New test.
* gcc.c-torture/compile/pr58901-1.c: New test.

modulo-sched: Carefully process loop counter initialization [PR97421]

Do not allow direct adjustment of the pre-header initialization instruction
for the count register if it is read in some instruction below in that basic
block.

gcc/ChangeLog:

PR rtl-optimization/97421
* modulo-sched.c (generate_prolog_epilog): Remove forward
declaration, adjust last argument name and type.
(const_iteration_count): Add bool pointer parameter to return
whether count register is read in pre-header after its
initialization.
(sms_schedule): Fix count register initialization adjustment
procedure according to what const_iteration_count said.

gcc/testsuite/ChangeLog:

PR rtl-optimization/97421
* gcc.c-torture/execute/pr97421-1.c: New test.
* gcc.c-torture/execute/pr97421-2.c: New test.
* gcc.c-torture/execute/pr97421-3.c: New test.

Fortran: flag formal argument before resolving an array spec [PR98016].

2020-12-05 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/98016
* resolve.c (resolve_symbol): Set formal_arg_flag before
resolving an array spec and restore value afterwards.

gcc/testsuite/
PR fortran/98016
* gfortran.dg/pr98016.f90: New test.

Darwin : Update libtool and dependencies for Darwin20 [PR97865]

The change in major version (and the increment from Darwin19 to 20)
caused libtool tests to fail which resulted in incorrect build settings
for shared libraries.

We take this opportunity to sort out the shared undefined symbols state
rather than propagating the current unsound behaviour into a new rev.

This change means that we default to the case that missing symbols are
considered an error, and if one wants to allow this intentionally, the
configuration for that case should be set appropriately.

Three existing cases need undefined dynamic lookup:
libitm, where there is already a configuration mechanism to add the
flags.
libcc1, where we add simple configuration to add the flags for Darwin.
libsanitizer, where we can add to the existing extra flags.

libcc1/ChangeLog:

PR target/97865
* Makefile.am: Add dynamic_lookup to LD flags for Darwin.
* configure.ac: Test for Darwin host and set a flag.
* Makefile.in: Regenerate.
* configure: Regenerate.

libitm/ChangeLog:

PR target/97865
* configure.tgt: Add dynamic_lookup to XLDFLAGS for Darwin.
* configure: Regenerate.

libsanitizer/ChangeLog:

PR target/97865
* configure.tgt: Add dynamic_lookup to EXTRA_CXXFLAGS for
Darwin.
* configure: Regenerate.

ChangeLog:

PR target/97865
* libtool.m4: Update handling of Darwin platform link flags
for Darwin20.

gcc/ChangeLog:

PR target/97865
* configure: Regenerate.

libatomic/ChangeLog:

PR target/97865
* configure: Regenerate.

libbacktrace/ChangeLog:

PR target/97865
* configure: Regenerate.

libffi/ChangeLog:

PR target/97865
* configure: Regenerate.

libgfortran/ChangeLog:

PR target/97865
* configure: Regenerate.

libgomp/ChangeLog:

PR target/97865
* configure: Regenerate.

libhsail-rt/ChangeLog:

PR target/97865
* configure: Regenerate.

libobjc/ChangeLog:

PR target/97865
* configure: Regenerate.

libphobos/ChangeLog:

PR target/97865
* configure: Regenerate.

libquadmath/ChangeLog:

PR target/97865
* configure: Regenerate.

libssp/ChangeLog:

PR target/97865
* configure: Regenerate.

libstdc++-v3/ChangeLog:

PR target/97865
* configure: Regenerate.

libvtv/ChangeLog:

PR target/97865
* configure: Regenerate.

zlib/ChangeLog:

PR target/97865
* configure: Regenerate.

X86_64: Enable support for next generation AMD Zen3 CPU.

2020-12-03 Venkataramanan Kumar <Venkataramanan.Kumar@amd.com>
Sharavan Kumar <Shravan.Kumar@amd.com>

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_amd_cpu): Recognize znver3.
* common/config/i386/i386-common.c (processor_names): Add
znver3.
(processor_alias_table): Add znver3 and AMDFAM19H entry.
* common/config/i386/i386-cpuinfo.h (processor_types): Add
AMDFAM19H.
(processor_subtypes): Add AMDFAM19H_ZNVER3.
* config.gcc (i[34567]86-*-linux* | ...): Likewise.
* config/i386/driver-i386.c (host_detect_local_cpu): Let
-march=native recognize znver3 processors.
* config/i386/i386-c.c (ix86_target_macros_internal): Add
znver3.
* config/i386/i386-options.c (m_znver3): New definition.
(m_ZNVER): Include m_znver3.
(processor_cost_table): Add znver3.
* config/i386/i386.c (ix86_reassociation_width): Likewise.
* config/i386/i386.h (TARGET_znver3): New definition.
(enum processor_type): Add PROCESSOR_ZNVER3.
* config/i386/i386.md (define_attr "cpu"): Add znver3.
* config/i386/x86-tune-sched.c (ix86_issue_rate): Likewise.
(ix86_adjust_cost): Likewise.
* config/i386/x86-tune.def (X86_TUNE_AVOID_256FMA_CHAINS):
Likewise.
* config/i386/znver1.md: Add new reservations for znver3.
* doc/extend.texi: Add details about znver3.
* doc/invoke.texi: Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/funcspec-56.inc: Handle new march.
* g++.target/i386/mv29.C: New file.

i386: Combine splitters followup [PR96226]

Here is the patch to simplify the newly added combine splitters: when we
split into 2 insns anyway, there is no reason to split into the masking
define_insn_and_split that we would be splitting again shortly after.

2020-12-05 Jakub Jelinek <jakub@redhat.com>

PR target/96226
* config/i386/i386.md (splitter after *<rotate_insn><mode>3_mask,
splitter after *<rotate_insn><mode>3_mask_1): Drop the masking from
the patterns to split into.

c++: Fix constexpr access to union member through pointer-to-member [PR98122]

We currently incorrectly reject the first testcase, because
cxx_fold_indirect_ref_1 doesn't attempt to handle UNION_TYPEs.
As the second testcase shows, it isn't that easy, because I believe we need
to take into account the active member and prefer that active member over
other members, because if we pick a non-active one, we might reject valid
programs.

2020-12-05 Jakub Jelinek <jakub@redhat.com>

PR c++/98122
* constexpr.c (cxx_union_active_member): New function.
(cxx_fold_indirect_ref_1): Add ctx argument, pass it through to
recursive call. Handle UNION_TYPE.
(cxx_fold_indirect_ref): Add ctx argument, pass it to recursive calls
and cxx_fold_indirect_ref_1.
(cxx_eval_indirect_ref): Adjust cxx_fold_indirect_ref calls.

* g++.dg/cpp1y/constexpr-98122.C: New test.
* g++.dg/cpp2a/constexpr-98122.C: New test.

Daily bump.

runtime: update type descriptor name in fieldtrack C support code

We were using the old name, but nothing noticed because it is a weak
reference that is permitted to be nil, so that it works with code that
does not use the field tracking library.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/275449

c++: Fix deduction from auto template parameter [PR93083]

The check in do_class_deduction to handle passing one class placeholder
template parm as an argument for itself needed to be extended to also handle
equivalent parms from other templates.

gcc/cp/ChangeLog:

PR c++/93083
* pt.c (convert_template_argument): Handle equivalent placeholders.
(do_class_deduction): Look through EXPR_PACK_EXPANSION, too.

gcc/testsuite/ChangeLog:

PR c++/93083
* g++.dg/cpp2a/nontype-class40.C: New test.

vec: Simplify use with C++11 range-based 'for'.

It looks cleaner if we can use a vec* directly as a range for the C++11
range-based 'for' loop, without needing to indirect from it, and this also
works with null pointers.

The change in cp_parser_late_parsing_default_args is an example of how this
can be used to simplify a simple loop over a vector. Reverse or subset
iteration will require adding range adaptors.

I deliberately didn't format the new overloads for etags since they are
trivial.

gcc/ChangeLog:

* vec.h (begin, end): Add overloads for vec*.
* tree.c (build_constructor_from_vec): Remove *.

gcc/cp/ChangeLog:

* decl2.c (clear_consteval_vfns): Remove *.
* pt.c (do_auto_deduction): Remove *.
* parser.c (cp_parser_late_parsing_default_args): Change loop
to use range 'for'.

rs6000: fix PTR_SIZE in rs6000.c

The recent change to rs6000.c for DWARF in AIX references the macro
PTR_SIZE that only is defined in dwarf2out.c. This patch changes the
reference to the equivalent POINTER_SIZE_UNITS defined in defaults.h.

gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_option_override_internal):
Change PTR_SIZE to POINTER_SIZE_UNITS.

doc/implement-c.texi: About same-as-scalar-type volatile aggregate accesses, PR94600

We say very little about reads and writes to aggregate /
compound objects, just scalar objects (i.e. assignments don't
cause reads).  Let's say something safe about aggregate
objects, but only for those that are the same size as a scalar
type.
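
For reference, an aggregate of the kind meant here could look as in the
sketch below.  The names `struct wrapped', `device_reg', `store_wrapped'
and `load_wrapped' are hypothetical, and the sketch only shows the shape
of such whole-object accesses; it does not restate what the manual
guarantees about them:

```c
/* Hypothetical aggregate that is the same size as a scalar type.  */
struct wrapped
{
  int value;
};

_Static_assert (sizeof (struct wrapped) == sizeof (int),
                "aggregate has the size of a scalar");

volatile struct wrapped device_reg;

/* Write the whole of the volatile aggregate with one assignment.  */
void
store_wrapped (int v)
{
  struct wrapped w = { v };

  device_reg = w;
}

/* Read the whole of the volatile aggregate with one initialization.  */
int
load_wrapped (void)
{
  struct wrapped w = device_reg;

  return w.value;
}
```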

There's an equal-sounding section (Volatiles) in extend.texi,
but this seems a more appropriate place, as specifying the
behavior of a standard qualifier.

gcc:

2020-12-04  Hans-Peter Nilsson  <hp@axis.com>
    Martin Sebor  <msebor@redhat.com>

PR middle-end/94600
* doc/implement-c.texi (Qualifiers implementation): Add blurb
about access to the whole of a volatile aggregate object, only for
same-size as a scalar object.

gimple: Return fnspec only for replaceable new/delete operators called from new/delete [PR98130]

As mentioned in the PR, we shouldn't treat non-replaceable operator
new/delete (e.g. with the placement new) as replaceable ones.

There is some pending discussion that perhaps operator delete called from
delete if not replaceable should return some other fnspec, but can we handle
that incrementally, fix this wrong-code and then deal with a missed
optimization? I really don't know what exactly should be returned.

2020-12-04 Jakub Jelinek <jakub@redhat.com>

PR c++/98130
* gimple.c (gimple_call_fnspec): Only return ".co " for replaceable
operator delete or ".mC" for replaceable operator new called from
new/delete.

* g++.dg/opt/pr98130.C: New test.

i386: Add combine splitters to allow combining multiple insns into reg1 = const; reg2 = rotate (reg1, reg3 & cst) [PR96226]

As mentioned in the PR, we can combine ~(1 << x) into -2 r<< x, but we give
up in the ~(1 << (x & 31)) cases, as *<rotate_insn><mode>3_mask* don't allow
immediate operand 1 and find_split_point prefers to split (x & 31) instead
of the constant.

With these combine splitters we help combine decide how to split those
insns.
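
The equivalence the combiner relies on can be checked with a small
standalone sketch for 32-bit operands.  The function names are made up
for illustration and the code is plain C rather than anything taken from
the machine description:

```c
#include <stdint.h>

/* Rotate VALUE left by COUNT bits, COUNT in the [0:31] range.  */
uint32_t
rotl32 (uint32_t value, unsigned int count)
{
  return (value << count) | (value >> ((32 - count) & 31));
}

/* Clear bit (X & 31) by shifting and inverting.  */
uint32_t
clear_bit_shift (unsigned int x)
{
  return ~(UINT32_C (1) << (x & 31));
}

/* The same result expressed as a rotation of -2.  */
uint32_t
clear_bit_rotate (unsigned int x)
{
  return rotl32 (UINT32_C (0xfffffffe), x & 31);
}
```

Both functions return the same value for any X, e.g. 0xffffffdf for an X
of 5.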

2020-12-04 Jakub Jelinek <jakub@redhat.com>

PR target/96226
* config/i386/i386.md (splitter after *<rotate_insn><mode>3_mask,
splitter after *<rotate_insn><mode>3_mask_1): New combine splitters.

* gcc.target/i386/pr96226.c: New test.

fold-const: Don't use build_constructor for non-aggregate types in native_encode_initializer [PR93121]

The following testcase is rejected, because when trying to encode a zeroing
CONSTRUCTOR, the code was using build_constructor to build initializers for
the elements but when recursing the function handles CONSTRUCTOR only for
aggregate types.

The following patch fixes that by using build_zero_cst instead for
non-aggregates.  Another option would be to add handling of CONSTRUCTOR for
non-aggregates in native_encode_initializer.  Or we can do both, I guess
the middle-end generally doesn't like CONSTRUCTORs for scalar variables, but
am not 100% sure if the FE doesn't produce those sometimes.

2020-12-04  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/93121
* fold-const.c (native_encode_initializer): Use build_zero_cst
instead of build_constructor.

* g++.dg/cpp2a/bit-cast6.C: New test.

c++: Revert dependent-array changes [PR 98116]

The changes reverted here are exposing an existing problem with alias
template comparisons. The typename_type changes are also incomplete,
possibly for similar reasons. It seems safer to revert them, fix the
underlying issue and then move forwards.

The testcase is adjusted to more robustly check the specialization
table, and ICEs with and without the c++ changes.

Revert:
62fb1b9e0da c++: Fix array type dependency [PR 98107]
07589ca2b2c c++: typename_type structural comparison
29ae1d7751 c++: Extend build_array_type API

PR c++/98116
gcc/cp/
* cp-tree.h (comparing_typenames): Delete.
(cplus_build_array_type): Remove default parm.
* pt.c (comparing_typenames): Delete.
(spec_hasher::equal): Don't increment it.
* tree.c (set_array_type_canon): Remove dep parm.
(build_cplus_array_type): Remove dep parm changes.
(cp_build_qualified_type_real): Remove dependent array type
changes.
(strip_typedefs): Likewise.
* typeck.c (structural_comptypes): Revert comparing_typename
changes.
gcc/testsuite/
* g++.dg/template/pr98116.C: Enable robust checking.

c++: Module API declarations

This provides the inline predicates about module state, and declares
the functions to be provided.

gcc/cp/
* cp-tree.h: Add various inline module state predicates, and
declare the API that will be provided by modules.cc

debug: Fix another vector DECL_MODE ICE [PR98100]

The PR88587 fix changes DECL_MODE of vars with vector type during inlining/cloning
when the vars are copied, so that their DECL_MODE matches their TYPE_MODE in
the new function.  Unfortunately, the following testcase still ICEs, the var
isn't really used in the new function and so it isn't copied, but becomes
just a nonlocalized var.  So we can't adjust its DECL_MODE because it
appears in multiple functions and needs different modes in between them.
The following patch changes the DEBUG_INSN creation to use TYPE_MODE instead
of DECL_MODE for vars with vector types.

2020-12-04  Jakub Jelinek  <jakub@redhat.com>

PR target/98100
* cfgexpand.c (expand_gimple_basic_block): For vars with
vector type, use TYPE_MODE rather than DECL_MODE.

* gcc.target/i386/pr98100.c: New test.

dwarf: Add -gdwarf{32,64} options

The following patch makes the choice between 32-bit and 64-bit DWARF formats
selectable by command line switch, rather than being hardcoded through
DWARF_OFFSET_SIZE macro.

The options don't turn on debug info themselves, so one needs
to use -g -gdwarf64 or similar.

2020-12-04 Jakub Jelinek <jakub@redhat.com>

* common.opt (-gdwarf32, -gdwarf64): New options.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Default
dwarf_offset_size to 8 if not overridden from the command line.
* dwarf2out.c: Change all occurrences of DWARF_OFFSET_SIZE to
dwarf_offset_size.
* doc/invoke.texi (-gdwarf32, -gdwarf64): Document.

testsuite: use param for if-to-switch tests

gcc/testsuite/ChangeLog:

PR testsuite/98123
* gcc.dg/tree-ssa/if-to-switch-4.c: Add param to make the test
stable on all architectures.
* gcc.dg/tree-ssa/if-to-switch-6.c: Likewise.
* gcc.dg/tree-ssa/if-to-switch-8.c: Likewise.

Add target selector to gcc.dg/pr98099.c

gcc/testsuite/ChangeLog:
* gcc.dg/pr98099.c: Compile only for dfp targets.

Refactor -frecord-gcc-switches.

gcc/ChangeLog:

* doc/tm.texi: Change argument of the record_gcc_switches
hook and remove SWITCH_TYPE_* enum values.
* dwarf2out.c (gen_producer_string): Move to opts.c and remove
handling of the dwarf_record_gcc_switches option.
(dwarf2out_early_finish): Use moved gen_producer_string
function.
* opts.c (gen_producer_string): New.
* opts.h (gen_producer_string): New.
* target.def: Change type of record_gcc_switches.
* target.h (enum print_switch_type): Remove.
(elf_record_gcc_switches): Change first argument.
* toplev.c (MAX_LINE): Remove.
(print_to_asm_out_file): Likewise.
(print_to_stderr): Likewise.
(print_single_switch): Likewise.
(print_switch_values): Likewise.
(init_asm_output): Use new gen_producer_string function.
(process_options): Likewise.
* varasm.c (elf_record_gcc_switches): Just save the string argument
to the ELF container.

Fix checking failure in IPA-SRA

This is a regression present on the mainline and 10 branch: on the one
hand, IPA-SRA does *not* disqualify accesses with zero size but, on the
other hand, it checks that accesses present in the tree have a (strictly)
positive size, thus trivially yielding an ICE in some cases.

gcc/ChangeLog:
* ipa-sra.c (verify_access_tree_1): Relax assertion on the size.

gcc/testsuite/ChangeLog:
* gnat.dg/opt91.ads, gnat.dg/opt91.adb: New test.
* gnat.dg/opt91_pkg.ads, gnat.dg/opt91_pkg.adb: New helper.

Document missing params.

contrib/ChangeLog:

* check-params-in-docs.py: use flake8 and add some
tweaks to ignore aarch64 params.

gcc/ChangeLog:

* doc/invoke.texi: Add missing params.

c++: Change __builtin_source_location to use __PRETTY_FUNCTION__ instead of __FUNCTION__ [PR80780]

On Tue, Dec 01, 2020 at 01:03:52PM +0000, Jonathan Wakely via Gcc-patches wrote:
> I mentioned in PR 80780 that a __builtin__PRETTY_FUNCTION would have
> been nice, because __FUNCTION__ isn't very useful for C++, because of
> overloading and namespace/class scopes. There are an unlimited number
> of functions that have __FUNCTION__ == "s", e.g. "ns::s(int)" and
> "ns::s()" and "another_scope::s::s<T...>(T...)" etc.
>
> Since __builtin_source_location() can do whatever it wants (without
> needing to add __builtin__PRETTY_FUNCTION) it might be nice to use the
> __PRETTY_FUNCTION__ string. JeanHeyd's tests would still need changes,
> because the name would be "s::s(void*)" not "s::s" but that still
> seems better for users.

When I've added template tests for the previous patch, I have noticed that
the current __builtin_source_location behavior is not really __FUNCTION__,
just close, because e.g. in function template __FUNCTION__ is still
"bar" but __builtin_source_location gave "bar<0>".

Anyway, this patch implements above request to follow __PRETTY_FUNCTION__
(on top of the earlier posted patch).
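
The difference between the two spellings can be seen in a small sketch.
The namespace ns and the functions s, pretty and bar below are hypothetical
names chosen for illustration, and __PRETTY_FUNCTION__ is a GNU extension,
so the exact strings may vary between compilers.

```cpp
#include <string>

namespace ns {
// Overloads share the same __FUNCTION__ string, so it cannot tell them apart.
inline std::string s(int) { return __FUNCTION__; }
inline std::string s()    { return __FUNCTION__; }

// __PRETTY_FUNCTION__ additionally spells out the scope and parameter types.
inline std::string pretty(int) { return __PRETTY_FUNCTION__; }

// In a function template the pretty name also records the template argument.
template <int N>
std::string bar() { return __PRETTY_FUNCTION__; }
}  // namespace ns
```

Both overloads of ns::s report just "s", while the pretty names contain
"ns::pretty(int)" and, for ns::bar<0>, a note that N is 0.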

2020-12-04  Jakub Jelinek  <jakub@redhat.com>

PR c++/80780
* cp-gimplify.c (fold_builtin_source_location): Use 2 instead of 0
as last argument to cxx_printable_name.

* g++.dg/cpp2a/srcloc1.C (quux): Use __PRETTY_FUNCTION__ instead of
function.
* g++.dg/cpp2a/srcloc2.C (quux): Likewise.
* g++.dg/cpp2a/srcloc15.C (S::S): Likewise.
(bar): Likewise.  Adjust expected column.
* g++.dg/cpp2a/srcloc17.C (S::S): Likewise.
(bar): Likewise.  Adjust expected column.

* testsuite/18_support/source_location/1.cc (main): Adjust for
__builtin_source_location using __PRETTY_FUNCTION__-like names instead
of __FUNCTION__-like.
* testsuite/18_support/source_location/consteval.cc (main): Likewise.

Daily bump.

c++: XFAIL testcase for PR98019

Apparently it isn't actually fixed on trunk yet, was just passing because of
some WIP in my tree. So XFAIL for now.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-nodiscard1.C: XFAIL.

c++: Fix bootstrap on 32-bit hosts [PR91828]

Using the releasing_vec op[] with an int index was breaking on 32-bit hosts
because of ambiguity with the built-in operator and the conversion
function. Since the built-in operator has a ptrdiff_t, this was fine on
64-bit targets where ptrdiff_t is larger than int, but broke on 32-bit
targets where it's the same as int, making the conversion for that argument
better than the member function. Fixed by changing the member function to
also use ptrdiff_t for the index.
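
A minimal sketch of the overload situation, assuming a hypothetical
vec_like type that stands in for releasing_vec (it is not the real class):
it converts implicitly to a pointer, which makes the built-in subscript a
candidate, and it declares its own operator[] taking ptrdiff_t.

```cpp
#include <cstddef>

// Hypothetical stand-in for a wrapper that converts to a pointer and also
// provides its own subscript operator.
struct vec_like {
  int data[4] = {10, 20, 30, 40};

  // Conversion function: makes the built-in `pointer[ptrdiff_t]` a candidate.
  operator int*() { return data; }

  // With a ptrdiff_t index the member is never worse than the built-in
  // candidate on the index argument, whatever the width of int.
  int& operator[](std::ptrdiff_t i) { return data[i]; }
};

// Subscripting with a plain int index, as in the code that failed to build.
inline int second_element(vec_like& v) {
  int i = 1;
  return v[i];
}
```

The member needs no conversion for the object argument, whereas the
built-in candidate needs the user-defined conversion to int*, so the member
is selected on hosts where ptrdiff_t is int as well as where it is wider.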

gcc/cp/ChangeLog:

* cp-tree.h (releasing_vec::operator[]): Change parameter type to
ptrdiff_t.

Add support for detecting mismatched allocation/deallocation calls.

PR c++/90629 - Support for -Wmismatched-new-delete
PR middle-end/94527 - Add an __attribute__ that marks a function as freeing an object
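
As a rough illustration of the argument form of attribute malloc, the
sketch below pairs a hypothetical allocator with its deallocator.  The names
acquire_buffer, release_buffer and DEALLOC_WITH are invented here, and the
two-argument attribute is assumed to be accepted only by GCC 11 and later,
so the macro expands to nothing elsewhere.

```cpp
#include <cstdlib>

// Assumption: the (deallocator, ptr-index) form needs GCC 11 or later.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11
#  define DEALLOC_WITH(fn) __attribute__((malloc(fn, 1)))
#else
#  define DEALLOC_WITH(fn)
#endif

// Hypothetical allocator pair, declared so the attribute can name the
// deallocator before the allocator is declared.
void release_buffer(void* p);

DEALLOC_WITH(release_buffer)
void* acquire_buffer(std::size_t n);

void release_buffer(void* p) { std::free(p); }
void* acquire_buffer(std::size_t n) { return std::malloc(n); }

// Matched pair: the pointer goes back to the deallocator named above.
inline bool round_trip(std::size_t n) {
  void* p = acquire_buffer(n);
  bool ok = (p != nullptr);
  release_buffer(p);
  return ok;
}
```

Passing the result of acquire_buffer to a different deallocator is the kind
of mismatch -Wmismatched-dealloc is meant to diagnose.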

gcc/ChangeLog:

PR c++/90629
PR middle-end/94527
* builtins.c (access_ref::access_ref): Initialize new member.
(compute_objsize): Use access_ref::deref. Handle simple pointer
assignment.
(expand_builtin): Remove handling of the free built-in.
(call_dealloc_argno): Same.
(find_assignment_location): New function.
(fndecl_alloc_p): Same.
(gimple_call_alloc_p): Same.
(call_dealloc_p): Same.
(matching_alloc_calls_p): Same.
(warn_dealloc_offset): Same.
(maybe_emit_free_warning): Same.
* builtins.h (struct access_ref): Declare new member.
(maybe_emit_free_warning): Make extern. Make use of access_ref.
Handle -Wmismatched-new-delete.
* calls.c (initialize_argument_information): Call
maybe_emit_free_warning.
* doc/extend.texi (attribute malloc): Update.
* doc/invoke.texi (-Wfree-nonheap-object): Expand documentation.
(-Wmismatched-new-delete): Document new option.
(-Wmismatched-dealloc): Document new option.

gcc/c-family/ChangeLog:

PR c++/90629
PR middle-end/94527
* c-attribs.c (handle_dealloc_attribute): New function.
(handle_malloc_attribute): Handle argument forms of attribute.
* c.opt (-Wmismatched-dealloc): New option.
(-Wmismatched-new-delete): New option.

gcc/testsuite/ChangeLog:

PR c++/90629
PR middle-end/94527
* g++.dg/asan/asan_test.cc: Fix a bug.
* g++.dg/warn/delete-array-1.C: Add expected warning.
* g++.old-deja/g++.other/delete2.C: Add expected warning.
* g++.dg/warn/Wfree-nonheap-object-2.C: New test.
* g++.dg/warn/Wfree-nonheap-object.C: New test.
* g++.dg/warn/Wmismatched-new-delete.C: New test.
* g++.dg/warn/Wmismatched-dealloc-2.C: New test.
* g++.dg/warn/Wmismatched-dealloc.C: New test.
* gcc.dg/Wmismatched-dealloc.c: New test.
* gcc.dg/analyzer/malloc-1.c: Prune out expected warning.
* gcc.dg/attr-malloc.c: New test.
* gcc.dg/free-1.c: Adjust text of expected warning.
* gcc.dg/free-2.c: Same.
* gcc.dg/torture/pr71816.c: Prune out expected warning.
* gcc.dg/tree-ssa/pr19831-2.c: Add an expected warning.
* gcc.dg/Wfree-nonheap-object-2.c: New test.
* gcc.dg/Wfree-nonheap-object-3.c: New test.
* gcc.dg/Wfree-nonheap-object.c: New test.

libstdc++-v3/ChangeLog:

* testsuite/ext/vstring/modifiers/clear/56166.cc: Suppress a false
positive warning.

c++: Exported using decls

With modules we need to record whether a (namespace-scope) using decl
is exporting the named entities. Record this on the OVERLOAD marking
the used decl.

gcc/cp/
* cp-tree.h (OVL_EXPORT): New.
(class ovl_iterator): Add get_using, exporting_p.
* tree.c (ovl_insert): Extend using_or_hidden meaning to include
an exported using.

c++: uninstantiated template friends

Template friends need to be recognized by module streaming and
associated with the befriending class, but their context is that of
the friend (a namespace or other class). This adds a flag to mark
such templates, and uses their DECL_CHAIN to point at the befriender.

gcc/cp
* cp-tree.h (DECL_UNINSTANTIATED_TEMPLATE_FRIEND): New.
* pt.c (push_template_decl): Set it.
(tsubst_friend_function): Clear it.

Go testsuite: update new tests to version in source repo

PowerPC: PR libgcc/97543 and libgcc/97643, fix long double issues

If you use a compiler with long double defaulting to 64-bit instead of 128-bit
with IBM extended double, you get linker warnings about mis-matches in the gnu
attributes for long double (PR libgcc/97543).  Even if the compiler is
configured to have long double be 64 bit as the default with the configuration
option '--without-long-double-128' you get the warnings.

You also get the same issues if you use a compiler with long double defaulting
to IEEE 128-bit instead of IBM extended double (PR libgcc/97643).

The issue is the way libgcc.a/libgcc.so is built.  Right now when building
libgcc under Linux, the long double size is set to 128-bits when building
libgcc.  However, the gnu attributes are set, leading to the warnings.

One feature of the current GNU attribute implementation is that if you have a
shared library (such as libgcc_s.so), the GNU attributes for the shared library
are an inclusive OR of those of all the objects within the library.  This means
that if any object file uses the -mlong-double-128 option and uses long double,
the GNU attributes for the library will indicate that it uses 128-bit IBM long
doubles.  If you have a static library, you will get the warning only if you
actually reference an object file with the attribute set.

This patch does two things:

    1) All of the object files that support IBM 128-bit long doubles
explicitly set the ABI to IBM extended double.

    2) I turned off GNU attributes for building the shared library or for
        building the IBM 128-bit long double support.

libgcc/
2020-12-03  Michael Meissner  <meissner@linux.ibm.com>

PR libgcc/97543
PR libgcc/97643
* config/rs6000/t-linux (IBM128_STATIC_OBJS): New make variable.
(IBM128_SHARED_OBJS): New make variable.
(IBM128_OBJS): New make variable.  Set all objects to use the
explicit IBM format, and disable gnu attributes.
(IBM128_CFLAGS): New make variable.
(gcc_s_compile): Add -mno-gnu-attribute to all shared library
modules.

PR fortran/95342 - ICE in gfc_match_subroutine, at fortran/decl.c:7913

Add checks for NULL pointers before dereferencing them.

gcc/fortran/ChangeLog:

PR fortran/95342
* decl.c (gfc_match_function_decl): Avoid NULL pointer dereference.
(gfc_match_subroutine): Likewise.

gcc/testsuite/ChangeLog:

PR fortran/95342
* gfortran.dg/pr95342.f90: New test.

libstdc++: Fix typos in #error strings

libstdc++-v3/ChangeLog:

* testsuite/26_numerics/bit/bit.cast/bit_cast.cc: Remove stray
word from copy&paste.
* testsuite/26_numerics/bit/bit.cast/version.cc: Likewise.

fix __builtin___clear_cache overrider fallout

Machines that had CLEAR_CACHE_INSN and that would thus issue calls to
__clear_cache with the default call expander, would fail on languages
that did not set up the __clear_cache builtin.  This patch arranges
for all languages to set up this builtin.

Machines or multilibs that had ptr_mode != Pmode, such as aarch64 with
-mabi=ilp32, would fail the RTL mode test of the arguments passed to
__clear_cache, because we'd insist on ptr_mode.  This patch arranges
for Pmode to be accepted as well.

for  gcc/ChangeLog

* tree.c (build_common_builtin_nodes): Declare
__builtin___clear_cache for all languages.
* builtins.c (maybe_emit_call_builtin___clear_cache): Accept
Pmode arguments.

libstdc++: Update C++20 library implementation status

libstdc++-v3/ChangeLog:

* doc/xml/manual/status_cxx2020.xml: Update C++20 status.
* doc/html/*: Regenerate.

libstdc++: Define std::source_location for C++20

This doesn't define a new _GLIBCXX_HAVE_BUILTIN_SOURCE_LOCATION macro,
because using __has_builtin(__builtin_source_location) is sufficient.
Currently only GCC supports it, but if/when Clang and Intel add it the
__has_builtin check should work for them too.

Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
libstdc++-v3/ChangeLog:

* doc/doxygen/user.cfg.in (INPUT): Add <source_location>.
* include/Makefile.am: Add <source_location>.
* include/Makefile.in: Regenerate.
* include/std/version (__cpp_lib_source_location): Define.
* include/std/source_location: New file.
* testsuite/18_support/source_location/1.cc: New test.
* testsuite/18_support/source_location/consteval.cc: New test.
* testsuite/18_support/source_location/srcloc.h: New test.
* testsuite/18_support/source_location/version.cc: New test.

libstdc++: Add std::bit_cast for C++20 [PR 93121]

Thanks to Jakub's addition of the built-in, we can add this to the
library now. The compiler tests for the built-in are quite extensive,
including verifying the constraints, so this only adds minimal tests to
the library testsuite.

This doesn't add a new _GLIBCXX_HAVE_BUILTIN_BIT_CAST because using
__has_builtin(__builtin_bit_cast) works for GCC and versions of Clang
that provide the built-in.
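
The feature test described above can be sketched as follows.  The helper
bit_cast_sketch and the macro SKETCH_HAVE_BIT_CAST are hypothetical names,
not the libstdc++ implementation, and the memcpy fallback is an assumption
added so that the sketch also builds where the built-in is missing.

```cpp
#include <cstdint>
#include <cstring>

// Probe for the built-in directly instead of relying on a configure macro.
#if defined(__has_builtin)
#  if __has_builtin(__builtin_bit_cast)
#    define SKETCH_HAVE_BIT_CAST 1
#  endif
#endif

// Hypothetical helper: reinterpret the object representation of `from`.
template <typename To, typename From>
To bit_cast_sketch(const From& from) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
#ifdef SKETCH_HAVE_BIT_CAST
  return __builtin_bit_cast(To, from);
#else
  To to;
  std::memcpy(&to, &from, sizeof(To));  // fallback without the built-in
  return to;
#endif
}

// Example use: the bit pattern of a float, assuming IEEE 754 binary32.
inline std::uint32_t float_bits(float f) {
  return bit_cast_sketch<std::uint32_t>(f);
}
```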

libstdc++-v3/ChangeLog:

PR libstdc++/93121
* include/std/bit (__cpp_lib_bit_cast, bit_cast): Define.
* include/std/version (__cpp_lib_bit_cast): Define.
* testsuite/26_numerics/bit/bit.cast/bit_cast.cc: New test.
* testsuite/26_numerics/bit/bit.cast/version.cc: New test.

Go testsuite: add a bunch of new tests from source repo

go-test.exp: add -I. when compiling in directory

* go.test/go-test.exp (go-gc-tests): Add -I. when building all
sources in a directory (errorcheckdir, compiledir, rundir,
rundircmpout).

c++: Add testcase for PR98019

This has already been fixed on trunk, but I don't see a testcase for it.

gcc/testsuite/ChangeLog:

PR c++/98019
* g++.dg/cpp2a/concepts-nodiscard1.C: New test.

testsuite: update existing Go tests to source repo

This updates a bunch of existing Go tests to the contents of the
source repo. This does not add any of the newer tests.

RTEMS: Add Cortex-R52 multilib

gcc/
* config/arm/t-rtems: Add "-mthumb -mcpu=cortex-r52
-mfloat-abi=hard" multilib.

libstdc++: Update powerpc-linux baselines for GCC 10.1

This should have been done before the GCC 10.1 release.

libstdc++-v3/ChangeLog:

* config/abi/post/powerpc-linux-gnu/baseline_symbols.txt:
Update.
* config/abi/post/powerpc64-linux-gnu/32/baseline_symbols.txt:
Update.

libstdc++: Disable std::array assertions for C++11 constexpr

The recent changes to add assertions to std::array broke the functions
that need to be constexpr in C++11, because of the restrictive rules for
constexpr functions in C++11.

This simply disables the assertions for C++11 mode, so the functions can
be constexpr again.
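
For context, a C++11 constexpr function body is essentially limited to a
single return statement, which leaves no room for a separate assertion
statement.  The sketch below, with the hypothetical names values and middle,
shows the kind of constant-expression element access that has to keep
working.

```cpp
#include <array>

// A constant array whose const accessors are used in constant expressions.
constexpr std::array<int, 3> values{{1, 2, 3}};

// Each access must be usable at compile time and yield the right value.
static_assert(values[1] == 2, "operator[] const in a constant expression");
static_assert(values.front() == 1, "front() const in a constant expression");
static_assert(values.back() == 3, "back() const in a constant expression");

// A single-return constexpr function, valid under the C++11 rules.
constexpr int middle() { return values[1]; }
```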

libstdc++-v3/ChangeLog:

* include/std/array (array::operator[](size_t) const, array::front() const)
(array::back() const) [__cplusplus == 201103]: Disable
assertions.
* testsuite/23_containers/array/element_access/constexpr_element_access.cc:
Check for correct values.
* testsuite/23_containers/array/tuple_interface/get_neg.cc:
Adjust dg-error line numbers.
* testsuite/23_containers/array/debug/constexpr_c++11.cc: New test.

c++:  templatey type creation

This patch makes a couple of type-creation routines available to
modules.  That needs to create unbound template parms, and canonical
template parms.

gcc/cp/
* cp-tree.h (make_unbound_class_template_raw): Declare.
(canonical_type_parameter): Declare.
* decl.c (make_unbound_class_template_raw): Break out of ...
(make_unbound_class_template): ... here.  Call it.
* pt.c (canonical_type_parameter): Externalize.  Refactor & set
structural_equality for type parms.

i386: Fix up ix86_md_asm_adjust for TImode [PR98086]

ix86_md_asm_adjust assumes that dest_mode can be only [QHSD]Imode
and nothing else.  The patch rewrites zero-extension part to use
convert_to_mode to handle TImode and hypothetically even wider modes.

2020-12-03  Uroš Bizjak  <ubizjak@gmail.com>
    Jakub Jelinek  <jakub@redhat.com>

gcc/
PR target/98086
* config/i386/i386.c (ix86_md_asm_adjust): Rewrite
zero-extension part to use convert_to_mode.

gcc/testsuite/
PR target/98086
* gcc.target/i386/pr98086.c: New test.

c++: Testcases [PR 98115]

These two testcases provide coverage for 98115, which doesn't trigger on all hosts.

PR c++/98115
PR c++/98116
gcc/testsuite/
* g++.dg/template/pr98115.C: New.
* g++.dg/template/pr98116.C: New.

compiler: cast comparison function result to expected bool type

Otherwise cases like
type mybool bool
var b mybool = [10]string{} == [10]string{}
get an incorrect type checking error.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/274446

compiler: defer to middle-end for complex division

Go used to use slightly different semantics than C99 for complex division,
so we used runtime routines to handle the difference. The gc compiler
has changed its behavior to match C99, so change ours as well.

For golang/go#14644

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/274213

IBM Z: Fix mode in probe_stack pattern

The probe pattern uses Pmode but the middle-end wants to emit a
word_mode probe check. This - as usual - breaks on Z with -m31
-mzarch where word_mode doesn't match Pmode.

gcc/ChangeLog:

* config/s390/s390.md ("@probe_stack2<mode>"): Change mode
iterator to W.

gcc/testsuite/ChangeLog:

* gcc.target/s390/stack-clash-4.c: New test.

c++: Fix array type dependency [PR 98107]

I'd missed some paths through build_cplus_array_type, plus, some
arrays come via the C-type builder. This propagates dependency in
more places and asserts that in the cases where TYPE_DEPENDENT_P_VALID
is unset, the type is non-dependent.

PR c++/98107
gcc/cp/
* tree.c (build_cplus_array_type): Mark dependency of new variant.
(cp_build_qualified_type_real, strip_typedefs): Assert
TYPE_DEPENDENT_P_VALID, or not a dependent type.

aarch64: Don't fold svundef* at the gimple level

As the testcase shows, folding svundef*() at the gimple level
has the unfortunate side-effect of introducing -Wuninitialized
or -Wmaybe-uninitialized warnings. We don't have a testcase
that relies on the fold, so the easiest fix seems to be to
remove it.

gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svundef_impl::fold):
Delete.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general/undef_1.c: New test.

Fix PR middle-end/98099

This replaces the ICE by a sorry message for the use of reverse scalar
storage order with 128-bit decimal floating-point type on 32-bit targets.

gcc/ChangeLog:
PR middle-end/98099
* expmed.c (flip_storage_order): In the case of a non-integer mode,
sorry out if the integer mode to be used instead is not supported.

gcc/testsuite/ChangeLog:
* gcc.dg/pr98099.c: New test.

Fix PR middle-end/98082

This fixes an ICE introduced by the fix for PR middle-end/97078 where
use_register_for_decl was changed to return true at -O0 for a parameter
of a thunk. It turns out that we need to do the same for a result in
this case.

gcc/ChangeLog:
PR middle-end/98082
* function.c (use_register_for_decl): Also return true for a result
if cfun->tail_call_marked is true.

gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/pr98082.C: New test.

c++: Add __builtin_bit_cast to implement std::bit_cast [PR93121]

The following patch adds a __builtin_bit_cast builtin, similarly to
clang or MSVC, which implement std::bit_cast using such a builtin too.
It checks the various std::bit_cast requirements; when not constexpr
evaluated it acts pretty much like VIEW_CONVERT_EXPR of the source argument
to the destination type, and the hardest part is obviously the constexpr
evaluation.
I've left out PDP11 handling of those, as I couldn't figure out how exactly
bitfields are laid out there.

2020-12-03  Jakub Jelinek  <jakub@redhat.com>

PR libstdc++/93121
* fold-const.h (native_encode_initializer): Add mask argument
defaulted to nullptr.
(find_bitfield_repr_type): Declare.
(native_interpret_aggregate): Declare.
* fold-const.c (find_bitfield_repr_type): New function.
(native_encode_initializer): Add mask argument and support for
filling it.  Handle also some bitfields without integral
DECL_BIT_FIELD_REPRESENTATIVE.
(native_interpret_aggregate): New function.
* gimple-fold.h (clear_type_padding_in_mask): Declare.
* gimple-fold.c (struct clear_padding_struct): Add clear_in_mask
member.
(clear_padding_flush): Handle buf->clear_in_mask.
(clear_padding_union): Copy clear_in_mask.  Don't error if
buf->clear_in_mask is set.
(clear_padding_type): Don't error if buf->clear_in_mask is set.
(clear_type_padding_in_mask): New function.
(gimple_fold_builtin_clear_padding): Set buf.clear_in_mask to false.
* doc/extend.texi (__builtin_bit_cast): Document.

* c-common.h (enum rid): Add RID_BUILTIN_BIT_CAST.
* c-common.c (c_common_reswords): Add __builtin_bit_cast.

* cp-tree.h (cp_build_bit_cast): Declare.
* cp-tree.def (BIT_CAST_EXPR): New tree code.
* cp-objcp-common.c (names_builtin_p): Handle RID_BUILTIN_BIT_CAST.
(cp_common_init_ts): Handle BIT_CAST_EXPR.
* cxx-pretty-print.c (cxx_pretty_printer::postfix_expression):
Likewise.
* parser.c (cp_parser_postfix_expression): Handle
RID_BUILTIN_BIT_CAST.
* semantics.c (cp_build_bit_cast): New function.
* tree.c (cp_tree_equal): Handle BIT_CAST_EXPR.
(cp_walk_subtrees): Likewise.
* pt.c (tsubst_copy): Likewise.
* constexpr.c (check_bit_cast_type, cxx_eval_bit_cast): New functions.
(cxx_eval_constant_expression): Handle BIT_CAST_EXPR.
(potential_constant_expression_1): Likewise.
* cp-gimplify.c (cp_genericize_r): Likewise.

* g++.dg/cpp2a/bit-cast1.C: New test.
* g++.dg/cpp2a/bit-cast2.C: New test.
* g++.dg/cpp2a/bit-cast3.C: New test.
* g++.dg/cpp2a/bit-cast4.C: New test.
* g++.dg/cpp2a/bit-cast5.C: New test.

c++: consteval-defarg1.C test variant for templates

We weren't recognizing a default argument for a consteval member function as
being in immediate function context because there was no function parameter
scope to look at.

The following testcase is an attempt to test it with templates, both
non-dependent and dependent consteval calls in both function and class
templates, and with r11-5694 it now passes.

2020-12-03 Jakub Jelinek <jakub@redhat.com>

* g++.dg/cpp2a/consteval-defarg2.C: New test.

tree-ssa-threadedge.c (record_temporary_equivalences_from_stmts_at_dest): Do not allow __builtin_constant_p.

This is the same as commit 70a62009181f ("tree-ssa-threadbackward.c
(profitable_jump_thread_path): Do not allow __builtin_constant_p."), but
for the old forward threader.

gcc/ChangeLog:

2020-12-03 Ilya Leoshkevich <iii@linux.ibm.com>

* tree-ssa-threadedge.c (record_temporary_equivalences_from_stmts_at_dest):
Do not allow __builtin_constant_p on a threading path.

Fix division by 0 in printf_strlen_execute when dumping

gcc/ChangeLog:

2020-12-03 Ilya Leoshkevich <iii@linux.ibm.com>

* tree-ssa-strlen.c (printf_strlen_execute): Avoid division by
0.

RISC-V: Canonicalize --with-arch

We would like to canonicalize the arch string for --with-arch for
easier multilib handling, so split the canonicalization part into a
standalone script to share the logic.

gcc/ChangeLog:

* config/riscv/multilib-generator (arch_canonicalize): Move
code to arch-canonicalize, and call that script to canonicalize arch
string.
(canonical_order): Move code to arch-canonicalize.
(LONG_EXT_PREFIXES): Ditto.
(IMPLIED_EXT): Ditto.
* config/riscv/arch-canonicalize: New.
* config.gcc (riscv*-*-*): Canonicalize --with-arch.

aarch64: Add +flagm to -march

New +flagm (Condition flag manipulation) feature option for -march command line
option.

Please note that FLAGM stays an Armv8.4-A feature but now can be
assigned to other architectures or CPUs.

gcc/ChangeLog:

* config/aarch64/aarch64-option-extensions.def
(AARCH64_OPT_EXTENSION): New +flagm option in -march for AArch64.
* config/aarch64/aarch64.h (AARCH64_FL_FLAGM): Add new flagm extension bit
mask.
(AARCH64_FL_FOR_ARCH8_4): Add flagm to Armv8.4-A.
* doc/invoke.texi: Update docs with +flagm.

testsuite: Add testcase for already fixed PR [PR98104]

This testcase got broken with r11-3826 and got fixed with r11-5628.

2020-12-03 Jakub Jelinek <jakub@redhat.com>

PR c++/98104
* g++.dg/warn/pr98104.C: New test.

Optimize vpsubusw compared to 0 into vpcmpleuw or vpcmpnleuw [PR96906]

For signed comparisons, it handles cases that are eq or neq to 0.
For unsigned comparisons, it additionally handles cases that are le or
gt to 0 (equivalent to eq or neq to 0). Transform case eq to leu,
case neq to gtu.

i.e. for -mavx512bw -mavx512vl transform eq case code from

vpsubusw        %xmm1, %xmm0, %xmm0
vpxor   %xmm1, %xmm1, %xmm1
vpcmpeqw  %xmm1, %xmm0, %k0
to
vpcmpleuw       %xmm1, %xmm0, %k0

i.e. for -mavx512bw -mavx512vl transform neq case code from

vpsubusw        %xmm1, %xmm0, %xmm0
vpxor   %xmm1, %xmm1, %xmm1
vpcmpneqw  %xmm1, %xmm0, %k0
to
vpcmpnleuw       %xmm1, %xmm0, %k0
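
The transformation rests on a simple identity: an unsigned saturating
subtraction a - b yields 0 exactly when a <= b.  The scalar sketch below
models one 16-bit lane and checks the identity on a sample of values; the
function names are hypothetical and the code is a model of the arithmetic,
not of the instruction encoding.

```cpp
#include <cstdint>

// Scalar model of one vpsubusw lane: unsigned saturating subtraction.
constexpr std::uint16_t sat_sub_u16(std::uint16_t a, std::uint16_t b) {
  return a > b ? static_cast<std::uint16_t>(a - b) : std::uint16_t{0};
}

// Check (sat_sub(a, b) == 0) == (a <= b) on a strided sample of all pairs.
inline bool identity_holds_on_sample() {
  for (std::uint32_t a = 0; a <= 0xFFFF; a += 251) {
    for (std::uint32_t b = 0; b <= 0xFFFF; b += 241) {
      const auto x = static_cast<std::uint16_t>(a);
      const auto y = static_cast<std::uint16_t>(b);
      const bool zero = (sat_sub_u16(x, y) == 0);  // the eq-to-0 form
      const bool leu = (x <= y);                   // the leu form
      if (zero != leu) return false;
    }
  }
  return true;
}
```

The neq case follows by negation: the saturated difference is nonzero
exactly when a > b.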

gcc/ChangeLog
PR target/96906
* config/i386/sse.md
(<avx512>_ucmp<mode>3<mask_scalar_merge_name>): Add a new
define_split after this insn.

gcc/testsuite/ChangeLog

* gcc.target/i386/avx512bw-pr96906-1.c: New test.
* gcc.target/i386/pr96906-1.c: Add -mno-avx512f.

Fix incorrect replacement of vmovdqu32 with vpblendd which can cause fault.

gcc/ChangeLog:

PR target/97642
* config/i386/i386-expand.c
(ix86_expand_special_args_builtin): Don't move all-ones mask
operands into register.
* config/i386/sse.md (UNSPEC_MASKLOAD): New unspec.
(*<avx512>_load<mode>_mask): New define_insns for masked load
instructions.
(<avx512>_load<mode>_mask): Changed to define_expands which
specifically handle memory or all-ones mask operands.
(<avx512>_blendm<mode>): Changed to define_insns which are same
as original <avx512>_load<mode>_mask with adjustment of
operands order.
(*<avx512>_load<mode>): New define_insn_and_split which is
used to optimize for masked load with all one mask.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bw-vmovdqu16-1.c: Adjust testcase to
make sure only masked load instruction is generated.
* gcc.target/i386/avx512bw-vmovdqu8-1.c: Ditto.
* gcc.target/i386/avx512f-vmovapd-1.c: Ditto.
* gcc.target/i386/avx512f-vmovaps-1.c: Ditto.
* gcc.target/i386/avx512f-vmovdqa32-1.c: Ditto.
* gcc.target/i386/avx512f-vmovdqa64-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovapd-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovaps-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovdqa32-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovdqa64-1.c: Ditto.
* gcc.target/i386/pr97642-1.c: New test.
* gcc.target/i386/pr97642-2.c: New test.

c++: Push parms when late parsing default args

In this testcase we weren't catching the error in A::f because the parameter
'I' wasn't in scope, so the default argument for 'b' found the global
typedef I. Fixed by pushing the parms before parsing. This is a bit
complicated because pushdecl clears DECL_CHAIN; do_push_parm_decls deals
with this by nreversing first, but that doesn't work here because we only
want to push them one at a time; if we pushed all of them before parsing,
we'd wrongly reject A::g.

gcc/cp/ChangeLog:

* parser.c (cp_parser_primary_expression): Distinguish
parms from vars in error.
(cp_parser_late_parsing_default_args): Pushdecl parms
as we go.

gcc/testsuite/ChangeLog:

* g++.dg/parse/defarg17.C: New test.

c++: Fix late-parsed default arg context

Jakub noticed that we weren't recognizing a default argument for a consteval
member function as being in immediate function context because there was no
function parameter scope to look at.

Note that this patch doesn't actually push the parameters into the scope,
that happens in a separate commit.

gcc/cp/ChangeLog:

* name-lookup.c (begin_scope): Set immediate_fn_ctx_p.
* parser.c (cp_parser_late_parsing_default_args): Push
sk_function_parms scope.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval-defarg1.C: New test.

Add popcount<mode> expander to enable popcount auto vectorization under AVX512BITALG/AVX512POPCNTDQ target.

gcc/ChangeLog

PR target/97770
* config/i386/sse.md (popcount<mode>2): New expander
for SI/DI vector modes.
(popcount<mode>2): Likewise for QI/HI vector modes.

gcc/testsuite/ChangeLog

PR target/97770
* gcc.target/i386/avx512bitalg-pr97770-1.c: New test.
* gcc.target/i386/avx512vpopcntdq-pr97770-1.c: Likewise.
* gcc.target/i386/avx512vpopcntdq-pr97770-2.c: Likewise.
* gcc.target/i386/avx512vpopcntdqvl-pr97770-1.c: Likewise.

introduce overridable clear_cache emitter

This patch introduces maybe_emit_call_builtin___clear_cache for the
builtin expander machinery and the trampoline initializers to use to
clear the instruction cache, removing a source of inconsistencies and
subtle errors in low-level machinery.

I've adjusted all trampoline_init implementations that used to issue
explicit calls to __clear_cache or similar to use this new primitive.

Specifically on vxworks targets, we needed to drop the __clear_cache
symbol in libgcc, for reasons related with linking that I didn't need
to understand, and we wanted to call cacheTextUpdate directly, despite
the different calling conventions: the second argument is a length
rather than the end address.
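
The difference in calling conventions amounts to converting an end address
into a length.  In the sketch below, os_flush_by_length is a hypothetical
stand-in for an OS routine taking (address, length); it only records its
arguments so the conversion can be observed, and it is not the VxWorks API.

```cpp
#include <cstddef>

// Record of the most recent call to the mock OS routine.
struct flush_record {
  const char* address;
  std::size_t length;
};

inline flush_record& last_flush() {
  static flush_record record{nullptr, 0};
  return record;
}

// Hypothetical OS-level routine: second argument is a length in bytes.
inline void os_flush_by_length(const char* address, std::size_t length) {
  last_flush() = {address, length};
}

// Adapter with a __clear_cache-style (begin, end) interface: the end
// address is turned into a byte count before calling the OS routine.
inline void clear_cache_begin_end(const char* begin, const char* end) {
  os_flush_by_length(begin, static_cast<std::size_t>(end - begin));
}
```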

So I introduced a target hook to enable target OS-level overriding of
builtin __clear_cache call emission, retaining nearly (*) the same
logic to govern the decision on whether to emit a call (or nothing, or
a machine-dependent insn) but enabling a call to a target
system-defined function with different calling conventions to be
issued, without having to modify .md files of the various
architectures supported by the target system to introduce or modify
clear_cache insns.

(*) I write "nearly" mainly because, when not optimizing, we'd issue a
call regardless, but since the call may now be overridden, I added it
to the set of builtins that are not directly turned into calls when
not optimizing, following the normal expansion path instead.  It
wouldn't be hard to skip the emission of cache-clearing insns when not
optimizing, but it didn't seem very important, especially for the new
uses from trampoline init.

    Another difference that might be relevant is that now we expand
the begin and end arguments unconditionally.  This might make a
difference if they have side effects.  That's pretty much impossible
at expand time, but I thought I'd mention it.

I have NOT modified targets that did not issue cache-clearing calls in
trampoline init to use the new clear_cache-calling infrastructure even
if it would expand to nothing.  I have considered doing so, to have
__builtin___clear_cache and trampoline init call cacheTextUpdate on
all vxworks targets, but decided not to, since on targets that don't
do any cache clearing, cacheTextUpdate ought to be a no-op, even
though rs6000 seems to use icbi and dcbf instructions in the function
called to initialize a trampoline, but AFAICT not in the __clear_cache
builtin.  Hopefully target maintainers will have a look and take
advantage of this new piece of infrastructure to remove such
(apparent?) inconsistencies.  Not rs6000 and others that call asm-coded
trampoline setup instructions, for sure, but they might wish to
introduce a CLEAR_INSN_CACHE macro or a clear_cache expander if they
don't have one.

for  gcc/ChangeLog

* builtins.c (default_emit_call_builtin___clear_cache): New.
(maybe_emit_call_builtin___clear_cache): New.
(expand_builtin___clear_cache): Split into the above.
(expand_builtin): Do not issue clear_cache call any more.
* builtins.h (maybe_emit_call_builtin___clear_cache): Declare.
* config/aarch64/aarch64.c (aarch64_trampoline_init): Use
maybe_emit_call_builtin___clear_cache.
* config/arc/arc.c (arc_trampoline_init): Likewise.
* config/arm/arm.c (arm_trampoline_init): Likewise.
* config/c6x/c6x.c (c6x_initialize_trampoline): Likewise.
* config/csky/csky.c (csky_trampoline_init): Likewise.
* config/m68k/linux.h (FINALIZE_TRAMPOLINE): Likewise.
* config/tilegx/tilegx.c (tilegx_trampoline_init): Likewise.
* config/tilepro/tilepro.c (tilepro_trampoline_init): Ditto.
* config/vxworks.c: Include rtl.h, memmodel.h, and optabs.h.
(vxworks_emit_call_builtin___clear_cache): New.
* config/vxworks.h (CLEAR_INSN_CACHE): Drop.
(TARGET_EMIT_CALL_BUILTIN___CLEAR_CACHE): Define.
* target.def (trampoline_init): In the documentation, refer to
maybe_emit_call_builtin___clear_cache.
(emit_call_builtin___clear_cache): New.
* doc/tm.texi.in: Add new hook point.
(CLEAR_CACHE_INSN): Remove duplicate 'both'.
* doc/tm.texi: Rebuilt.
* targhooks.h (default_emit_call_builtin___clear_cache):
Declare.
* tree.h (BUILTIN_ASM_NAME_PTR): New.

for  libgcc/ChangeLog

* config/t-vxworks (LIB2ADD): Drop.
* config/t-vxworks7 (LIB2ADD): Likewise.
* config/vxcache.c: Remove.

options.exp: unsupport tests that depend on missing language

There's a help.exp test that checks that the help message for
-Wabsolute-value mentions it's available in C and ObjC, when compiling
a C++ program.

However, if GCC is built with the C++ language disabled, the
.cc file is compiled as C, and the message [available in C...] becomes
[disabled] instead, because that's the default for the flag in C.

I suppose it might also be possible to disable the C language, and
then the multitude of help.exp tests that name c as the source
language will fail.

This patch avoids these fails: it detects the message "compiler not
installed" in the compiler output, and bails out as "unsupported".

for gcc/testsuite/ChangeLog

* lib/options.exp (check_for_options_with_filter): Detect
unavailable compiler for the selected language, and bail out
as unsupported.