git.libre-soc.org Git

Richard Biener [Mon, 13 Jul 2020 09:41:16 +0000 (11:41 +0200)]

fix global variable alignment for testcase gcc.dg/torture/pr96133.c

The testcase was erroneously accessing the global variable via a
type that might require bigger alignment than provided. Fix that
via an appropriate attribute.

2020-07-13 Richard Biener <rguenther@suse.de>

PR testsuite/96180
* gcc.dg/torture/pr96133.c: Align global variable.
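
A minimal sketch of the kind of fix described above, not the actual testcase: a global is read through a GNU vector type, so it is given that type's alignment explicitly. The names v2si, table and second_pair_sum are hypothetical, and a 4-byte int is assumed.

```c
/* Hypothetical 8-byte vector of two ints (GNU C extension).  may_alias
   keeps the access through the cast valid under strict aliasing. */
typedef int v2si __attribute__((vector_size(2 * sizeof(int)), may_alias));

/* Without the attribute this array is only guaranteed int alignment;
   aligned() raises it to what a v2si access may require. */
static const int table[4] __attribute__((aligned(2 * sizeof(int)))) = {1, 2, 3, 4};

/* Load the second pair of elements as one vector and add its lanes. */
int second_pair_sum(void)
{
    /* &table[2] sits 8 bytes into an 8-byte-aligned object. */
    const v2si *p = (const v2si *)&table[2];
    v2si v = *p;
    return v[0] + v[1];
}
```

Calling second_pair_sum() adds table[2] and table[3].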

Roger Sayle [Mon, 13 Jul 2020 08:49:34 +0000 (09:49 +0100)]

middle-end: Remove truly_noop_truncation check from convert.c

This patch eliminates a check of targetm.truly_noop_truncation from
the early middle-end, where the gimple/generic being generated by
GCC's front-ends is being inappropriately influenced by the target's
TRULY_NOOP_TRUNCATION.  The (recent) intention of TRULY_NOOP_TRUNCATION
is to indicate that a backend requires explicit truncation instructions
rather than using SUBREGs to perform truncations.  A long standing
(and probably unintentional) side-effect has been that this setting
also controls whether the middle-end narrows integer operations at
the tree-level.  Understandably, GCC and its testsuite assume that
GIMPLE and GENERIC behave consistently across platforms, and alas
defining TRULY_NOOP_TRUNCATION away from the default triggers several
regressions (including gcc.dg/fold-rotate-1.c).

2020-07-13  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
* convert.c (convert_to_integer_1): Narrow integer operations
even on targets that require explicit truncation instructions.
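
To illustrate what narrowing an integer operation means at the source level, here is a small sketch with hypothetical function names. Both functions compute the same 8-bit result; the second is the narrowed form, which the commit message says should be produced consistently across platforms.

```c
#include <stdint.h>

/* Wide form: add in 32 bits, then truncate the sum to 8 bits. */
uint8_t add_then_truncate(uint32_t a, uint32_t b)
{
    return (uint8_t)(a + b);
}

/* Narrowed form: truncate each operand to 8 bits first, then add and
   keep the low 8 bits.  The result is the same modulo 256. */
uint8_t truncate_then_add(uint32_t a, uint32_t b)
{
    return (uint8_t)((uint8_t)a + (uint8_t)b);
}
```

The equivalence holds because truncation to 8 bits commutes with addition modulo 256.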

Hans-Peter Nilsson [Sun, 12 Jul 2020 16:41:25 +0000 (18:41 +0200)]

cris: Add new pass eliminating compares after delay-slot-filling

Delayed-branch-slot-filling, a.k.a. reorg or dbr, often causes
opportunities for more compare-elimination than were visible to
the cmpelim pass.  With cc0, these were caught by the
elimination pass run in "final", thus the missed opportunities
are a regression.  A simple reorg-aware pass run just after reorg
handles most of them, if not all.  I chose to keep the "mach2"
pass identifier string I copy-pasted from the SPARC port instead
of inventing one like "postdbr_cmpelim".  Note the gap in numbers
in the test-case file names.

gcc:
PR target/93372
* config/cris/cris-passes.def: New file.
* config/cris/t-cris (PASSES_EXTRA): Add cris-passes.def.
* config/cris/cris.c: Add infrastructure bits and pass execute
function cris_postdbr_cmpelim.
* config/cris/cris-protos.h (make_pass_cris_postdbr_cmpelim): Declare.

gcc/testsuite:
* gcc.target/cris/pr93372-44.c, gcc.target/cris/pr93372-46.c: New.

Hans-Peter Nilsson [Fri, 10 Jul 2020 02:44:49 +0000 (04:44 +0200)]

cris: Remove config/cris/t-cris gt-cris.h cargo

Getting tired of:

make[1]: Entering directory 'x/gccobj/gcc'
Makefile:2682: warning: overriding recipe for target 'gt-cris.h'
xx/gcc/gcc/config/cris/t-cris:29: warning: ignoring old recipe for target 'gt-cris.h'

I'm just going to assume it is just stale cruft no longer (if
ever) needed since nothing else but sh/t-sh has it, and the
commit log shows just (x prepended to avoid commit-log parsing
confusion):
x Merge from pch-branch up to tag pch-commit-20020603.
x
x From-SVN: r54232

Building "works better"; the related warning is gone.

This effectively empties the t-cris file, but stuff will be
added soon enough that it's kept around.

gcc:
* config/cris/t-cris: Remove gt-cris.h-related excessive cargo.

Hans-Peter Nilsson [Wed, 8 Jul 2020 21:59:12 +0000 (23:59 +0200)]

cris: Use addi.b for additions where flags aren't inspected

Comparing to the cc0 version of the CRIS port, I ran a few
microbenchmarks, for example gcc.c-torture/execute/arith-rand.c,
where there's sometimes an addition between an operation of
interest and the test on the result.

Unfortunately this patch doesn't remedy all the performance
regression for that program.  But, this patch by itself helps
and makes sense to commit separately: lots of addi.b in
previously empty delay-slots, with functions shortened by one or
a few insns, in libgcc.  I had an experience with the
reload-related caveat of % on constraints, which is "fixed"
documentationwise since long (soon 15 years ago;
be3914df4cc8/r105517).  I removed an even older related FIXME.

gcc:
PR target/93372
* config/cris/cris.md ("*add<mode>3_addi"): New splitter.
("*addi_b_<mode>"): New pattern.
("*addsi3<setnz>"): Remove stale %-related comment.

gcc/testsuite:
PR target/93372
* gcc.target/cris/pr93372-45.c: New test.

Hans-Peter Nilsson [Sun, 12 Jul 2020 19:58:58 +0000 (21:58 +0200)]

cris: Correct output templates in define_subst patterns.

Whoops. This little gem had the effect of making the output
operand (0) constraints disappear but not the input operand (1)
constraints for define_subst:ed patterns, probably because
there's another (match_dup 1) in the output template (not
investigated).

That went surprisingly unnoticed until I added a pass leaning
just a little bit harder on the define_subst:ed patterns and
then only by the libgfortran library generating assembly with
nominally incorrect syntax. (There was a move to a special
register from a general register, and it incorrectly matched a
pattern affecting condition codes.)

gcc:
* config/cris/cris.md ("setnz_subst", "setnz_subst", "setcc_subst"):
Use match_dup in output template, not match_operand.

Richard Biener [Fri, 10 Jul 2020 09:58:41 +0000 (11:58 +0200)]

make var-tracking iteration consistent

This eliminates the visited bitmap, so whether a block awaiting
processing goes to the next or the current iteration depends only on
its position in RPO order rather than on whether it was visited in the
current iteration.  As an optimization, single-BB iteration is
processed immediately.

2020-07-10  Richard Biener  <rguenther@suse.de>

* var-tracking.c (bb_heap_node_t): Remove unused typedef.
(vt_find_locations): Eliminate visited bitmap in favor of
RPO order check.  Dump statistics about the number of
local BB dataflow computes.

Hans-Peter Nilsson [Sun, 5 Jul 2020 18:50:52 +0000 (20:50 +0200)]

PR94600: fix volatile access to the whole of a compound object.

The store to the whole of each volatile object was picked apart
like there had been an individual assignment to each of the
fields.  Reads were added as part of that; see PR for details.
The reads from volatile memory were a clear bug; individual
stores questionable.  A separate patch clarifies the docs.

gcc:

2020-07-09  Richard Biener  <rguenther@suse.de>

PR middle-end/94600
* expr.c (expand_constructor): Make a temporary also if we're
storing to volatile memory.

gcc/testsuite:

2020-07-09  Hans-Peter Nilsson  <hp@axis.com>

PR middle-end/94600
* gcc.dg/pr94600-1.c, gcc.dg/pr94600-2.c, gcc.dg/pr94600-3.c,
gcc.dg/pr94600-4.c, gcc.dg/pr94600-5.c, gcc.dg/pr94600-6.c,
gcc.dg/pr94600-7.c, gcc.dg/pr94600-8.c: New tests.
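
A minimal sketch of the source-level shape involved, assuming a hypothetical two-field register layout (struct reg and write_whole are not from the PR). The value is built in an ordinary temporary and then stored to the volatile object with one whole-object assignment, so nothing needs to be read back from volatile memory. How many store instructions result is still up to the target.

```c
#include <stdint.h>

/* Hypothetical memory-mapped register pair. */
struct reg {
    uint32_t lo;
    uint32_t hi;
};

/* Build the value in a non-volatile temporary, then assign the whole
   volatile object in a single statement. */
void write_whole(volatile struct reg *dst, uint32_t lo, uint32_t hi)
{
    struct reg tmp = { lo, hi };
    *dst = tmp;
}
```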

Xionghu Luo [Mon, 13 Jul 2020 01:22:56 +0000 (20:22 -0500)]

rs6000: Define define_insn_and_split to split unspec sldi+or to rldimi

The combine pass can recognize the newly defined pattern and split it in
split1.  This patch optimizes:

21: r130:DI=r133:DI<<0x20
11: {r129:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;}
22: r134:DI=r130:DI|r129:DI

to

21: {r149:DI=zero_extend(unspec[[r145:DI]] 87);clobber scratch;}
22: r134:DI=r149:DI&0xffffffff|r133:DI<<0x20

rldimi is generated instead of sldi+or.

gcc/ChangeLog:

2020-07-13 Xionghu Luo <luoxhu@linux.ibm.com>

* config/rs6000/rs6000.md (rotl_unspec): New
define_insn_and_split.

gcc/testsuite/ChangeLog:

2020-07-13 Xionghu Luo <luoxhu@linux.ibm.com>

* gcc.target/powerpc/vector_float.c: New test.
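
The combined RTL above corresponds to the following C expression. This is only an illustration of the shift-and-insert shape, with a hypothetical function name; whether a given compiler emits rldimi for it depends on the target and options.

```c
#include <stdint.h>

/* Keep the low 32 bits of lo and place hi in the upper 32 bits:
   (lo & 0xffffffff) | (hi << 32), the shape shown in insn 22. */
uint64_t pack_hi_lo(uint64_t hi, uint64_t lo)
{
    return (hi << 32) | (lo & 0xffffffffu);
}
```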

Xionghu Luo [Mon, 13 Jul 2020 01:21:05 +0000 (20:21 -0500)]

rs6000: Init V4SF vector without converting SP to DP

Move V4SF to V4SI, initialize the vector as V4SI, and move it back to V4SF.
A better instruction sequence can be generated on Power9:

lfs + xxpermdi + xvcvdpsp + vmrgew
=>
lwz + (sldi + or) + mtvsrdd

With the follow-up patch, this can be further optimized to:

lwz + rldimi + mtvsrdd

The point is to use lwz to avoid converting single precision to
double precision on load, and to pack the four 32-bit values directly
into one 128-bit register.

gcc/ChangeLog:

2020-07-13 Xionghu Luo <luoxhu@linux.ibm.com>

* config/rs6000/rs6000.c (rs6000_expand_vector_init):
Move V4SF to V4SI, init vector like V4SI and move to V4SF back.
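
A rough scalar model of the idea, under stated assumptions: the four floats are treated as raw 32-bit patterns (never widened to double) and packed pairwise into two 64-bit halves. The helper names and the element order within each half are assumptions for illustration, not taken from rs6000_expand_vector_init.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern without any conversion. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Pack four single-precision values into two 64-bit halves using only
   integer shift and or, mirroring the (sldi + or) step above. */
void pack_v4sf(const float in[4], uint64_t out[2])
{
    out[0] = ((uint64_t)float_bits(in[0]) << 32) | float_bits(in[1]);
    out[1] = ((uint64_t)float_bits(in[2]) << 32) | float_bits(in[3]);
}
```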

GCC Administrator [Mon, 13 Jul 2020 00:16:22 +0000 (00:16 +0000)]

Daily bump.

H.J. Lu [Fri, 10 Jul 2020 21:50:03 +0000 (14:50 -0700)]

x86: Require Linux target for PR target/93492 tests

Since -fpatchable-function-entry is only supported on Linux and used by
the Linux kernel, require Linux target for PR target/93492 tests.

PR target/93492
* gcc.target/i386/pr93492-1.c: Require Linux target.
* gcc.target/i386/pr93492-2.c: Likewise.
* gcc.target/i386/pr93492-3.c: Likewise.
* gcc.target/i386/pr93492-4.c: Likewise.
* gcc.target/i386/pr93492-5.c: Likewise.

GCC Administrator [Sun, 12 Jul 2020 00:16:23 +0000 (00:16 +0000)]

Daily bump.

Ian Lance Taylor [Fri, 10 Jul 2020 20:43:09 +0000 (13:43 -0700)]

compiler: avoid generating unnamed bool type descriptor

We were generating it in cases where a boolean expression was
converted directly to an empty interface type.

Fixes golang/go#40152

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/242002

Ian Lance Taylor [Fri, 10 Jul 2020 17:28:34 +0000 (10:28 -0700)]

compiler: handle aliases to pointer types with interfaces

Test case is https://golang.org/cl/241997.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/241998

Roger Sayle [Sat, 11 Jul 2020 19:03:39 +0000 (20:03 +0100)]

middle-end: Improve RTL expansion in expand_mul_overflow

This patch improves the RTL that the middle-end generates for testing
signed overflow following a widening multiplication.  During this
expansion the middle-end generates a truncation which can get used
multiple times.  Placing this intermediate value in a pseudo register
reduces the amount of code generated on platforms where this truncation
requires an explicit instruction.

2020-07-11  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog:
* internal-fn.c (expand_mul_overflow): When checking for signed
overflow from a widening multiplication, we access the truncated
lowpart RES twice, so keep this value in a pseudo register.
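
A source-level sketch of the check being expanded, with a hypothetical function name. The truncated low part is computed once into a local and then used twice (as the result and in the comparison), which is the analogue of keeping it in a pseudo register. The conversion of an out-of-range value to int32_t is implementation-defined in ISO C; GCC documents it as wrapping.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a * b does not fit in int32_t; *res always receives
   the truncated low 32 bits of the product. */
bool mul_overflows_i32(int32_t a, int32_t b, int32_t *res)
{
    int64_t wide = (int64_t)a * (int64_t)b;   /* widening multiplication */
    int32_t low = (int32_t)(uint32_t)wide;    /* truncation, done once */

    *res = low;                               /* first use */
    return (int64_t)low != wide;              /* second use */
}
```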

Thomas Koenig [Sat, 11 Jul 2020 17:16:16 +0000 (19:16 +0200)]

Fix ICE on warning with new interface check.

In the test case, there was a warning about INTENT where an EXTERNAL
masked an interface in an outer scope, when the location of the
symbol was not set, leading to an ICE.

Two problems, two-part solution: It makes no sense to warn about
INTENT for artificially generated formal argument lists, and the
location should be set.

gcc/fortran/ChangeLog:

PR fortran/96073
* frontend-passes.c (check_externals_procedure): Add locus
information for new_sym.
* interface.c (gfc_check_dummy_characteristics): Do not warn
about INTENT for artificially generated variables.

gcc/testsuite/ChangeLog:

PR fortran/96073
* gfortran.dg/interface_48.f90: New test.

David Edelsohn [Sat, 11 Jul 2020 15:37:56 +0000 (11:37 -0400)]

ChangeLog: add missing Bugzilla PR.

Richard Sandiford [Sat, 11 Jul 2020 12:25:26 +0000 (13:25 +0100)]

value-range: Fix handling of POLY_INT_CST anti-ranges [PR96146]

The range infrastructure has code to decompose POLY_INT_CST ranges
to worst-case integer bounds.  However, it had the fundamental flaw
(obvious in hindsight) that it applied to anti-ranges too, meaning
that a range 2+2X would end up with a range of ~[2, +INF], i.e.
[-INF, 1].  This patch decays to varying in that case instead.

I'm still a bit uneasy about this.  ISTM that in terms of
generality:

  SSA_NAME => POLY_INT_CST => INTEGER_CST
           => ADDR_EXPR

I.e. an SSA_NAME could store a POLY_INT_CST and a POLY_INT_CST
could store an INTEGER_CST (before canonicalisation).  POLY_INT_CST
is also “as constant as” ADDR_EXPR (well, OK, only some ADDR_EXPRs
are run-time rather than link-time constants, whereas all POLY_INT_CSTs
are, but still).  So it seems like we should at least be able to treat
POLY_INT_CST as symbolic.  On the other hand, I don't have any examples
in which that would be useful.

gcc/
PR tree-optimization/96146
* value-range.cc (value_range::set): Only decompose POLY_INT_CST
bounds to integers for VR_RANGE.  Decay to VR_VARYING for anti-ranges
involving POLY_INT_CSTs.

gcc/testsuite/
PR tree-optimization/96146
* gcc.target/aarch64/sve/acle/general/pr96146.c: New test.
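
A toy model of why the worst-case decomposition is wrong for anti-ranges. It is not the value_range code: the functions are hypothetical, and the runtime indeterminate X is modeled as a non-negative integer parameter x. The precise anti-range excludes only the single value 2+2X, while the decomposed form ~[2, +INF] excludes every value of 2 or more.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of the polynomial constant 2 + 2X for a given runtime x >= 0. */
static int64_t poly_value(int64_t x)
{
    return 2 + 2 * x;
}

/* Precise anti-range ~[2+2X, 2+2X]: v may be anything except 2+2X. */
bool in_precise_antirange(int64_t v, int64_t x)
{
    return v != poly_value(x);
}

/* Flawed decomposition ~[2, +INF], i.e. [-INF, 1]: v must be below 2. */
bool in_decomposed_antirange(int64_t v)
{
    return v < 2;
}
```

For example, with x = 0 the value 3 is allowed by the precise anti-range but rejected by the decomposed one, so the decomposed range claims too much.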

Simon Cook [Fri, 10 Jul 2020 13:53:19 +0000 (14:53 +0100)]

RISC-V: Fix regular expression in target-specific test

Some square brackets were missing escape characters, causing DejaGnu to
try and call a proc with the name "at".

gcc/testsuite/ChangeLog:
* gcc.target/riscv/read-thread-pointer.c: Fix escaping on
regular expression.

GCC Administrator [Sat, 11 Jul 2020 00:16:31 +0000 (00:16 +0000)]

Daily bump.

David Edelsohn [Fri, 10 Jul 2020 21:06:21 +0000 (17:06 -0400)]

aix: only create named section for VAR_DECL or FUNCTION_DECL

get_constant_section() can be passed constant-like non-DECLs, such as
CONSTRUCTOR or STRING_CST, which make DECL_SECTION_NAME unhappy
(asserted in symtab_node::get). This patch ensures that xcoff select
section only invokes resolve_unique_section() for DECLs.

gcc/ChangeLog

2020-07-10 David Edelsohn <dje.gcc@gmail.com>

* config/rs6000/rs6000.c (rs6000_xcoff_select_section): Only
create named section for VAR_DECL or FUNCTION_DECL.

Joseph Myers [Fri, 10 Jul 2020 21:35:51 +0000 (21:35 +0000)]

c: Add C2X BOOL_MAX and BOOL_WIDTH to limits.h

C2X adds BOOL_MAX and BOOL_WIDTH macros to <limits.h>. As GCC only
supports values 0 and 1 for _Bool (regardless of the number of bits in
the representation, other bits are padding bits and if any of them are
nonzero, the representation is a trap representation), the values of
those macros can just be hardcoded directly in <limits.h> rather than
needing corresponding predefined macros.

Bootstrapped with no regressions on x86_64-pc-linux-gnu.

gcc/
* glimits.h [__STDC_VERSION__ > 201710L] (BOOL_MAX, BOOL_WIDTH):
New macros.

gcc/testsuite/
* gcc.dg/c11-bool-limits-1.c, gcc.dg/c2x-bool-limits-1.c: New
tests.
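
A small usage sketch for code that must also build with headers that predate these macros. The fallback value of 1 for both macros is an inference from the reasoning above (only the values 0 and 1 are supported for _Bool), and bool_holds_max is a hypothetical helper.

```c
#include <limits.h>
#include <stdbool.h>

/* Fallbacks for <limits.h> versions that do not provide the macros.
   The value 1 is an assumption derived from _Bool holding only 0 and 1. */
#ifndef BOOL_MAX
#define BOOL_MAX 1
#endif
#ifndef BOOL_WIDTH
#define BOOL_WIDTH 1
#endif

/* Converting any nonzero value to bool yields the maximum bool value. */
int bool_holds_max(void)
{
    return (bool)42 == BOOL_MAX;
}
```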

Aaron Sawdey [Tue, 30 Jun 2020 19:26:26 +0000 (14:26 -0500)]

rs6000: Add execution tests for mma builtins [v4]

This patch adds execution tests that use the MMA builtins and
check for the right answer, and new tests that check whether
__builtin_cpu_supports and __builtin_cpu_is return sane
answers for power10.

2020-06-30 Rajalakshmi Srinivasaraghavan <rajis@linux.vnet.ibm.com>
Aaron Sawdey <acsawdey@linux.ibm.com>

gcc/testsuite/
* gcc.target/powerpc/p10-identify.c: New file.
* gcc.target/powerpc/p10-arch31.c: New file.
* gcc.target/powerpc/mma-single-test.c: New file.
* gcc.target/powerpc/mma-double-test.c: New file.

Alexander Popov [Fri, 10 Jul 2020 20:10:16 +0000 (14:10 -0600)]

Improve shrink wrapping debug output

Currently if requires_stack_frame_p() returns true for some insn, the
shrink-wrapping debug output contains only the number of a block containing
that insn.

But it is very useful to see the particular insn that requires the prologue.
Let's call print_rtl_single to display that insn in the following pass dump.

gcc/

* shrink-wrap.c (try_shrink_wrapping): Improve debug output.

Mike Nolta [Fri, 10 Jul 2020 20:05:41 +0000 (14:05 -0600)]

This is a harmless bug, as the script still works, but curl's '-O' option isn't the same as wget's.

contrib/ChangeLog:

* download_prerequisites: Don't pass wget options to curl.

Harald Anlauf [Fri, 10 Jul 2020 19:35:35 +0000 (21:35 +0200)]

PR fortran/95980 - ICE in get_unique_type_string, at fortran/class.c:485

In SELECT TYPE, the argument may be an incorrectly specified unlimited
CLASS variable. Avoid NULL pointer dereferences for clean error
recovery.

gcc/fortran/
PR fortran/95980
* class.c (gfc_add_component_ref, gfc_build_class_symbol):
Add checks for NULL pointer dereference.
* primary.c (gfc_variable_attr): Likewise.
* resolve.c (resolve_variable, resolve_assoc_var)
(resolve_fl_var_and_proc, resolve_fl_variable_derived)
(resolve_symbol): Likewise.

Harald Anlauf [Fri, 10 Jul 2020 19:00:13 +0000 (21:00 +0200)]

PR fortran/96086 - ICE in gfc_match_select_rank, at fortran/match.c:6645

Handle NULL pointer dereference on SELECT RANK with an invalid
assumed-rank array declaration.

gcc/fortran/
PR fortran/96086
* match.c (gfc_match_select_rank): Catch NULL pointer
dereference.
* resolve.c (resolve_assoc_var): Catch NULL pointer dereference
that may occur after an illegal declaration.

Ian Lance Taylor [Fri, 10 Jul 2020 17:51:40 +0000 (10:51 -0700)]

libgo: update to Go 1.14.4 release

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/241999

Richard Sandiford [Fri, 10 Jul 2020 18:06:46 +0000 (19:06 +0100)]

expr: Move reduce_bit_field target mode check [PR96151]

In some cases, expand_expr_real_2 prefers to use the mode of the
caller-suggested target instead of the mode of the expression when
passing values to reduce_to_bit_field_precision.  E.g.:

      else if (target == 0)
        op0 = convert_to_mode (mode, op0,
                               TYPE_UNSIGNED (TREE_TYPE
                                              (treeop0)));
      else
        {
          convert_move (target, op0,
                        TYPE_UNSIGNED (TREE_TYPE (treeop0)));
          op0 = target;
        }

where “op0” might not have “mode” for the “else” branch,
but does for all the others.

reduce_to_bit_field_precision discards the suggested target if it
has the wrong mode.  This patch moves that to expand_expr_real_2
instead (conditional on reduce_bit_field).

gcc/
PR middle-end/96151
* expr.c (expand_expr_real_2): When reducing bit fields,
clear the target if it has a different mode from the expression.
(reduce_to_bit_field_precision): Don't do that here.  Instead
assert that the target already has the correct mode.

Richard Sandiford [Fri, 10 Jul 2020 18:06:45 +0000 (19:06 +0100)]

arm: Treat GNU and Advanced SIMD vectors as distinct [PR92789, PR95726]

This is an arm version of aarch64 patch r11-1741.  The approach
is essentially identical, not much more than s/aarch64/arm/.

To recap, PR95726 is about template look-up for things like:

    foo<float vecf __attribute__((vector_size(16)))>
    foo<float32x4_t>

The immediate cause of the problem is that the hash function usually
returns different hashes for these types, yet the equality function
thinks they are equal.  This then raises the question of how the types
are supposed to be treated.

The answer we chose for AArch64 was that the GNU vector type should
be treated as distinct from float32x4_t, but that each type should
implicitly convert to the other.

This would mean that, as far as the PR is concerned, the hashing
function is right to (sometimes) treat the types differently and
the equality function is wrong to treat them as the same.

The most obvious way to enforce the type difference is to use a
target-specific type attribute.  That on its own is enough to fix
the PR.  The difficulty is deciding whether the knock-on effects
are acceptable.

One obvious effect is that GCC then rejects:

    typedef float vecf __attribute__((vector_size(16)));
    vecf x;
    float32x4_t &z = x;

on the basis that the types are no longer reference-compatible.
For AArch64 we took the approach that this was the correct behaviour.
It is also consistent with current Clang.

A trickier question is whether:

    vecf x;
    float32x4_t y;
    … c ? x : y …

should be valid, and if so, what its type should be [PR92789].
As explained in the comment in the testcase, GCC and Clang both
accepted this, but GCC chose the “then” type while Clang chose
the “else” type.  This can lead to different mangling for (probably
artificial) corner cases, as seen for “sel1” and “sel2” in the
testcase.

Adding the attribute makes GCC reject the conditional expression
as ambiguous.  For AArch64 we took the approach that this too is
the correct behaviour, for the reasons described in the testcase.
However, it does seem to have the potential to break existing code.

gcc/
PR target/92789
PR target/95726
* config/arm/arm.c (arm_attribute_table): Add
"Advanced SIMD type".
(arm_comp_type_attributes): Check that the "Advanced SIMD type"
attributes are equal.
* config/arm/arm-builtins.c: Include stringpool.h and
attribs.h.
(arm_mangle_builtin_vector_type): Use the mangling recorded
in the "Advanced SIMD type" attribute.
(arm_init_simd_builtin_types): Add an "Advanced SIMD type"
attribute to each Advanced SIMD type, using the mangled type
as the attribute's single argument.

gcc/testsuite/
PR target/92789
PR target/95726
* g++.target/arm/pr95726.C: New test.

Carl Love [Thu, 9 Jan 2020 19:37:18 +0000 (13:37 -0600)]

RS6000, add VSX mask manipulation support

gcc/ChangeLog

2020-07-09  Carl Love  <cel@us.ibm.com>

* config/rs6000/vsx.md  (VSX_MM): New define_mode_iterator.
(VSX_MM4): New define_mode_iterator.
(vec_mtvsrbmi): New define_insn.
(vec_mtvsr_<mode>): New define_insn.
(vec_cntmb_<mode>): New define_insn.
(vec_extract_<mode>): New define_insn.
(vec_expand_<mode>): New define_insn.
(define_c_enum unspec): Add entries UNSPEC_MTVSBM, UNSPEC_VCNTMB,
UNSPEC_VEXTRACT, UNSPEC_VEXPAND.
* config/rs6000/altivec.h ( vec_genbm, vec_genhm, vec_genwm,
vec_gendm, vec_genqm, vec_cntm, vec_expandm, vec_extractm): Add
defines.
* config/rs6000/rs6000-builtin.def: Add defines BU_P10_2, BU_P10_1.
(BU_P10_1): Add definitions for mtvsrbm, mtvsrhm, mtvsrwm,
mtvsrdm, mtvsrqm, vexpandmb, vexpandmh, vexpandmw, vexpandmd,
vexpandmq, vextractmb, vextractmh, vextractmw, vextractmd, vextractmq.
(BU_P10_2): Add definitions for cntmbb, cntmbh, cntmbw, cntmbd.
(BU_P10_OVERLOAD_1): Add definitions for mtvsrbm, mtvsrhm,
mtvsrwm, mtvsrdm, mtvsrqm, vexpandm, vextractm.
(BU_P10_OVERLOAD_2): Add definition for cntm.
* config/rs6000/rs6000-call.c (rs6000_expand_binop_builtin): Add
checks for CODE_FOR_vec_cntmbb_v16qi, CODE_FOR_vec_cntmb_v8hi,
CODE_FOR_vec_cntmb_v4si, CODE_FOR_vec_cntmb_v2di.
(altivec_overloaded_builtins): Add overloaded argument entries for
P10_BUILTIN_VEC_MTVSRBM, P10_BUILTIN_VEC_MTVSRHM,
P10_BUILTIN_VEC_MTVSRWM, P10_BUILTIN_VEC_MTVSRDM,
P10_BUILTIN_VEC_MTVSRQM, P10_BUILTIN_VEC_VCNTMBB,
P10_BUILTIN_VCNTMBB, P10_BUILTIN_VCNTMBH,
P10_BUILTIN_VCNTMBW, P10_BUILTIN_VCNTMBD,
P10_BUILTIN_VEXPANDMB, P10_BUILTIN_VEXPANDMH,
P10_BUILTIN_VEXPANDMW, P10_BUILTIN_VEXPANDMD,
P10_BUILTIN_VEXPANDMQ, P10_BUILTIN_VEXTRACTMB,
P10_BUILTIN_VEXTRACTMH, P10_BUILTIN_VEXTRACTMW,
P10_BUILTIN_VEXTRACTMD, P10_BUILTIN_VEXTRACTMQ.
(builtin_function_type): Add case entries for P10_BUILTIN_MTVSRBM,
P10_BUILTIN_MTVSRHM, P10_BUILTIN_MTVSRWM, P10_BUILTIN_MTVSRDM,
P10_BUILTIN_MTVSRQM, P10_BUILTIN_VCNTMBB, P10_BUILTIN_VCNTMBH,
P10_BUILTIN_VCNTMBW, P10_BUILTIN_VCNTMBD,
P10_BUILTIN_VEXPANDMB, P10_BUILTIN_VEXPANDMH,
P10_BUILTIN_VEXPANDMW, P10_BUILTIN_VEXPANDMD,
P10_BUILTIN_VEXPANDMQ.
* config/rs6000/rs6000-builtin.def (altivec_overloaded_builtins): Add
entries for MTVSRBM, MTVSRHM, MTVSRWM, MTVSRDM, MTVSRQM, VCNTM,
VEXPANDM, VEXTRACTM.

gcc/testsuite/ChangeLog

2020-07-09  Carl Love  <cel@us.ibm.com>
* gcc.target/powerpc/vsx_mask-count-runnable.c: New test case.
* gcc.target/powerpc/vsx_mask-expand-runnable.c: New test case.
* gcc.target/powerpc/vsx_mask-extract-runnable.c: New test case.
* gcc.target/powerpc/vsx_mask-move-runnable.c: New test case.

Julian Brown [Fri, 22 May 2020 11:06:10 +0000 (04:06 -0700)]

openacc: Adjust dynamic reference count semantics

This patch adjusts how dynamic reference counts work so that they match
the semantics of the source program more closely, instead of representing
"excess" reference counts beyond those that represent pointers in the
internal libgomp splay-tree data structure. This allows some corner
cases to be handled more gracefully.

2020-07-10  Julian Brown  <julian@codesourcery.com>
    Thomas Schwinge  <thomas@codesourcery.com>

libgomp/
* libgomp.h (struct splay_tree_key_s): Change virtual_refcount to
dynamic_refcount.
(struct gomp_device_descr): Remove GOMP_MAP_VARS_OPENACC_ENTER_DATA.
* oacc-mem.c (acc_map_data): Substitute dynamic_refcount for
virtual_refcount.
(acc_unmap_data): Update comment.
(goacc_map_var_existing, goacc_enter_datum): Adjust for
dynamic_refcount semantics.
(goacc_exit_datum_1, goacc_exit_datum): Re-add some error checking.
Adjust for dynamic_refcount semantics.
(goacc_enter_data_internal): Implement "present" case of dynamic
memory-map handling here.  Update "non-present" case for
dynamic_refcount semantics.
(goacc_exit_data_internal): Use goacc_exit_datum_1.
* target.c (gomp_map_vars_internal): Remove
GOMP_MAP_VARS_OPENACC_ENTER_DATA handling.  Update for dynamic_refcount
handling.
(gomp_unmap_vars_internal): Remove virtual_refcount handling.
(gomp_load_image_to_device): Substitute dynamic_refcount for
virtual_refcount.
* testsuite/libgomp.oacc-c-c++-common/pr92843-1.c: Remove XFAILs.
* testsuite/libgomp.oacc-c-c++-common/refcounting-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/refcounting-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/struct-3-1-1.c: New test.
* testsuite/libgomp.oacc-fortran/deep-copy-6.f90: Remove XFAILs and
trace output.
* testsuite/libgomp.oacc-fortran/deep-copy-6-no_finalize.F90: Remove
trace output.
* testsuite/libgomp.oacc-fortran/dynamic-incr-structural-1.f90: New
test.
* testsuite/libgomp.oacc-c-c++-common/structured-dynamic-lifetimes-4.c:
Remove stale comment.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-1.f90: Remove XFAILs.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-1-2.F90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-2-2.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-3-1.f90: Likewise.
* testsuite/libgomp.oacc-fortran/mdc-refcount-1-4-1.f90: Adjust XFAIL.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
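
A toy model of source-level dynamic reference counting, following the OpenACC notion of separate structured and dynamic counters. It is not libgomp code: the struct and function names are hypothetical, and ignoring an "exit data" on a block whose dynamic count is already zero is an assumption of this sketch.

```c
#include <stdbool.h>

/* Toy stand-in for one mapped block; not libgomp's splay_tree_key_s. */
struct toy_mapping {
    unsigned structured_refcount; /* held by structured data regions */
    unsigned dynamic_refcount;    /* held by enter data / exit data */
};

/* "enter data" on the block: one more dynamic reference. */
void toy_enter_data(struct toy_mapping *m)
{
    m->dynamic_refcount++;
}

/* "exit data": drop one dynamic reference, or all of them when the
   finalize clause is given. */
void toy_exit_data(struct toy_mapping *m, bool finalize)
{
    if (finalize)
        m->dynamic_refcount = 0;
    else if (m->dynamic_refcount > 0)
        m->dynamic_refcount--;
}

/* The block stays mapped while either counter is nonzero. */
bool toy_is_mapped(const struct toy_mapping *m)
{
    return m->structured_refcount > 0 || m->dynamic_refcount > 0;
}
```

In this model the dynamic counter equals the number of unmatched "enter data" directives in the source program, which is the property the patch description aims for.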

Julian Brown [Tue, 30 Jun 2020 09:15:56 +0000 (02:15 -0700)]

openacc: Helper functions for enter/exit data using single mapping

This patch factors out the parts of goacc_enter_datum and
goacc_exit_datum that can be shared with goacc_enter_data_internal
and goacc_exit_data_internal respectively (in the next patch),
without overloading function return values or complicating code paths
unnecessarily.

2020-07-10 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>

libgomp/
* oacc-mem.c (goacc_map_var_existing): New function.
(goacc_enter_datum): Use above function.
(goacc_exit_datum_1): New function.
(goacc_exit_datum): Use above function.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>

Bill Seurer, 507-253-3502, seurer@us.ibm.com [Thu, 9 Jul 2020 21:41:38 +0000 (16:41 -0500)]

rs6000: Fix __builtin_altivec_mask_for_load to use correct type

gcc/ChangeLog:

PR target/95581
* config/rs6000/rs6000-call.c: Add new type v16qi_ftype_pcvoid.
(altivec_init_builtins): Change __builtin_altivec_mask_for_load to use
v16qi_ftype_pcvoid with correct number of parameters.

Martin Liska [Fri, 10 Jul 2020 12:45:13 +0000 (14:45 +0200)]

testsuite: Fix WPA scanning.

gcc/testsuite/ChangeLog:

PR gcov-profile/96148
* lib/scanwpaipa.exp: Fix wpa dump file suffix the same way
as others in the file.

Jason Merrill [Sat, 4 Jul 2020 09:45:01 +0000 (05:45 -0400)]

c++: Support non-type template parms of union type.

Another thing newly allowed by P1907R1. The ABI group has discussed
representing unions with designated initializers, and has separately
specified how to represent designators; this patch implements both.

gcc/cp/ChangeLog:

* tree.c (structural_type_p): Allow unions.
* mangle.c (write_expression): Express unions with a designator.

libiberty/ChangeLog:

* cp-demangle.c (cplus_demangle_operators): Add di, dx, dX.
(d_expression_1): Handle di and dX.
(is_designated_init, d_maybe_print_designated_init): New.
(d_print_comp_inner): Use d_maybe_print_designated_init.
* testsuite/demangle-expected: Add designator tests.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/nontype-class-union1.C: New test.

Jason Merrill [Thu, 2 Jul 2020 19:14:52 +0000 (15:14 -0400)]

c++: Allow floating-point template parms in C++20.

P1907R1 made various adjustments to non-type template parameters, notably
introducing the notion of "structural type". I implemented an early version
of that specification in r10-4426, but it was adjusted in the final paper to
allow more. This patch implements allowing template parameters of
floating-point type; still to be implemented are unions and subobjects.

gcc/cp/ChangeLog:

* pt.c (convert_nontype_argument): Handle REAL_TYPE.
(invalid_nontype_parm_type_p): Allow all structural types.
* tree.c (structural_type_p): Use SCALAR_TYPE_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/pr81246.C: No error in C++20.
* g++.dg/cpp0x/variadic74.C: No error in C++20.
* g++.dg/cpp1z/nontype-auto3.C: No error in C++20.
* g++.dg/template/crash106.C: No error in C++20.
* g++.dg/template/crash119.C: No error in C++20.
* g++.dg/template/nontype12.C: No error in C++20.
* g++.dg/template/void3.C: Don't require follow-on message.
* g++.dg/template/void7.C: Don't require follow-on message.
* g++.dg/template/void9.C: Don't require follow-on message.

Jason Merrill [Thu, 9 Jul 2020 19:11:12 +0000 (15:11 -0400)]

c++: [[no_unique_address]] fixes. [PR96105]

We were wrongly checking is_empty_class on the result of strip_array_types
rather than the actual field type. We weren't considering the alignment of
the data member. We needed to handle unions the same way as
layout_nonempty_base_or_field.

gcc/cp/ChangeLog:

PR c++/96105
PR c++/96052
PR c++/95976
* class.c (check_field_decls): An array of empty classes is not an
empty data member.
(layout_empty_base_or_field): Handle explicit alignment.
Fix union handling.

gcc/testsuite/ChangeLog:

PR c++/96105
PR c++/96052
PR c++/95976
* g++.dg/cpp2a/no_unique_address4.C: New test.
* g++.dg/cpp2a/no_unique_address5.C: New test.
* g++.dg/cpp2a/no_unique_address6.C: New test.

H.J. Lu [Thu, 9 Jul 2020 21:56:48 +0000 (14:56 -0700)]

x86: Check TARGET_AVX512VL when enabling FMA

Check TARGET_AVX512VL when enabling FMA to avoid

gcc.target/i386/avx512er-vrsqrt28ps-3.c:25:1: error: unrecognizable insn:
(insn 29 28 30 6 (set (reg:V8SF 108)
        (fma:V8SF (reg:V8SF 106)
            (reg:V8SF 105)
            (reg:V8SF 110)))

when TARGET_AVX512VL isn't enabled.

PR target/96144
* config/i386/i386-expand.c (ix86_emit_swsqrtsf): Check
TARGET_AVX512VL when enabling FMA.

Andrea Corallo [Tue, 26 May 2020 16:47:13 +0000 (17:47 +0100)]

arm: Implement Armv8.1-M low overhead loops

gcc/ChangeLog

2020-06-18  Andrea Corallo  <andrea.corallo@arm.com>
    Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
    Iain Apreotesei  <iain.apreotesei@arm.com>

* config/arm/arm-protos.h (arm_target_insn_ok_for_lob): New
prototype.
* config/arm/arm.c (TARGET_INVALID_WITHIN_DOLOOP): Define.
(arm_invalid_within_doloop): Implement invalid_within_doloop hook.
(arm_target_insn_ok_for_lob): New function.
* config/arm/arm.h (TARGET_HAVE_LOB): Define macro.
* config/arm/thumb2.md (*doloop_end_internal, doloop_begin)
(dls_insn): Add new patterns.
(doloop_end): Modify to select LR when LOB is available.
* config/arm/unspecs.md: Add new unspec.
* doc/sourcebuild.texi (arm_v8_1_lob_ok)
(arm_thumb2_ok_no_arm_v8_1_lob): Document new target supports
options.

gcc/testsuite/ChangeLog

2020-06-18  Andrea Corallo  <andrea.corallo@arm.com>
    Mihail-Calin Ionescu  <mihail.ionescu@arm.com>
    Iain Apreotesei  <iain.apreotesei@arm.com>

* gcc.target/arm/lob.h: New header.
* gcc.target/arm/lob1.c: New testcase.
* gcc.target/arm/lob2.c: Likewise.
* gcc.target/arm/lob3.c: Likewise.
* gcc.target/arm/lob4.c: Likewise.
* gcc.target/arm/lob5.c: Likewise.
* gcc.target/arm/lob6.c: Likewise.
* gcc.target/arm/unsigned-extend-2.c: Do not run when generating
low loop overhead.
* gcc.target/arm/ivopts.c: Fix check for low loop overhead.
* lib/target-supports.exp (check_effective_target_arm_v8_1_lob)
(check_effective_target_arm_thumb2_ok_no_arm_v8_1_lob): New procs.

commit | commitdiff | tree

Piotr Trojanek [Sat, 30 May 2020 18:54:49 +0000 (20:54 +0200)]

[Ada] Revert mistaken negation related to references to labels

gcc/ada/

* sem_ch8.adb (Find_Direct_Name): Fix code to match the comment.

commit | commitdiff | tree

Eric Botcazou [Sat, 30 May 2020 09:14:12 +0000 (11:14 +0200)]

[Ada] Add warning for overlays changing scalar storage order

gcc/ada/

* sem_ch13.adb (Analyze_Attribute_Definition_Clause) <Address>:
Issue an unconditional warning for an overlay that changes the
scalar storage order.

commit | commitdiff | tree

Piotr Trojanek [Sat, 30 May 2020 18:02:52 +0000 (20:02 +0200)]

[Ada] Fix detection of actual parameters for procedure calls

gcc/ada/

* sem_ch8.adb (Is_Actual_Parameter): Fix processing when parent
is a procedure call statement; extend comment.

commit | commitdiff | tree

Bob Duff [Thu, 28 May 2020 20:16:06 +0000 (16:16 -0400)]

[Ada] Ada2020: AI12-0368 Declare expressions can be static

gcc/ada/

* sem_res.adb (Resolve_Expression_With_Actions): Check the rules
of AI12-0368, and mark the declare expression as static or known
at compile time as appropriate.
* sem_ch4.adb: Minor reformatting.
* libgnat/a-stoufo.ads, libgnat/a-stoufo.adb: Allow up to 9
replacement parameters. I'm planning to use this in the test
case for this ticket.

commit | commitdiff | tree

Ed Schonberg [Thu, 28 May 2020 21:09:32 +0000 (17:09 -0400)]

[Ada] Spurious error on parameterless access_to_subprogram

gcc/ada/

* exp_ch3.adb (Build_Access_Subprogram_Wrapper_Body): Create a
proper signature when the access type denotes a parameterless
subprogram.
* exp_ch6.adb (Expand_Call): Handle properly a parameterless
indirect call when the corresponding access type has contracts.

commit | commitdiff | tree

Eric Botcazou [Fri, 29 May 2020 14:30:54 +0000 (16:30 +0200)]

[Ada] Further improve the expansion of array aggregates

gcc/ada/

* exp_aggr.adb
(Convert_To_Positional): Add Dims local variable
and pass it in calls to Is_Flat and Flatten.
(Check_Static_Components): Pass Dims in call to
Is_Static_Element.
(Nonflattenable_Next_Aggr): New predicate.
(Flatten): Add Dims parameter and Expr local variable.  Call
Nonflattenable_Next_Aggr in a couple of places.  In the case
when an Others choice is present, check that the element is
either static or a nested aggregate that can be flattened,
before disregarding the replication limit for elaboration
purposes.  Check that a nested array is flattenable in the case
of a multidimensional array in any position.  Remove redundant
check in the Others case and pass Dims in call to
Is_Static_Element.  Use Expr variable.
(Is_Flat): Change type of Dims parameter from Int to Nat.
(Is_Static_Element): Add Dims parameter.  Replace tests on
literals with call to Compile_Time_Known_Value.  If everything
else failed and the dimension is 1, preanalyze the expression
before calling again Compile_Time_Known_Value on it.  Return
true for null.
(Late_Expansion): Do not expand further if the assignment to the
target can be done directly by the back end.

commit | commitdiff | tree

Arnaud Charlet [Fri, 29 May 2020 08:41:00 +0000 (04:41 -0400)]

[Ada] Preserve casing of output files

gcc/ada/

* osint-c.adb (Set_File_Name): Preserve casing of file.
* osint.adb (File_Names_Equal): New.
(Executable_Name): Use File_Names_Equal instead of
Canonical_Case_File_Name.

commit | commitdiff | tree

Pascal Obry [Fri, 22 May 2020 16:37:17 +0000 (18:37 +0200)]

[Ada] Fix memory leak in routine Wait_On_Socket

gcc/ada/

* libgnat/g-socket.adb (Wait_On_Socket): Fix memory leaks and
file descriptor leaks. Memory was leaked each time the routine
was called without a selector (Selector = Null). Also, if an
exception was raised in the routine, memory and a file descriptor
were leaked because the created selector was not closed.

commit | commitdiff | tree

Pascal Obry [Fri, 22 May 2020 16:36:36 +0000 (18:36 +0200)]

[Ada] Minor style fixes

gcc/ada/

* libgnat/g-socket.adb: Minor style fixes.

commit | commitdiff | tree

Javier Miranda [Wed, 27 May 2020 20:44:40 +0000 (16:44 -0400)]

[Ada] Potentially unevaluated nested expressions

gcc/ada/

* sem_util.adb
(Immediate_Context_Implies_Is_Potentially_Unevaluated): New
subprogram.
(Is_Potentially_Unevaluated): Do not stop climbing the tree on
the first candidate subexpression; required to handle nested
expressions.

commit | commitdiff | tree

Gary Dismukes [Wed, 27 May 2020 20:44:12 +0000 (16:44 -0400)]

[Ada] Reformatting and typo corrections

gcc/ada/

* exp_aggr.adb, exp_spark.adb, sem_ch13.ads, sem_ch13.adb,
snames.ads-tmpl: Minor reformatting and typo fixes.

commit | commitdiff | tree

Yannick Moy [Wed, 27 May 2020 14:46:27 +0000 (16:46 +0200)]

[Ada] Fix detection of volatile properties in SPARK

gcc/ada/

* sem_util.adb (Has_Enabled_Property): Add handling of
non-variable objects.

commit | commitdiff | tree

Piotr Trojanek [Thu, 28 May 2020 10:14:38 +0000 (12:14 +0200)]

[Ada] Cleanup excessive conditions in Check_Completion

gcc/ada/

* sem_ch3.adb (Check_Completion): Refactor chained
if-then-elsif-... statement to be more like a case
statement (note: we can't simply use case statement because of
Is_Intrinsic_Subprogram in the first condition).

commit | commitdiff | tree

Piotr Trojanek [Wed, 27 May 2020 15:41:40 +0000 (17:41 +0200)]

[Ada] Remove references to non-existing E_Protected_Object

gcc/ada/

* einfo.ads (E_Protected_Object): Enumeration literal removed.
* lib-xref.ads (Xref_Entity_Letters): Remove reference to
removed literal.
* sem_ch3.adb (Check_Completion): Likewise.
* sem_util.adb (Has_Enabled_Property): Likewise.

commit | commitdiff | tree

Arnaud Charlet [Mon, 2 Mar 2020 13:43:20 +0000 (08:43 -0500)]

[Ada] Use small limit for aggregates inside subprograms

gcc/ada/

* exp_aggr.adb (Max_Aggregate_Size): Use small limit for
aggregate inside subprograms.
* sprint.adb (Sprint_Node_Actual [N_Object_Declaration]): Do not
print the initialization expression if the No_Initialization
flag is set.
* sem_util.ads, sem_util.adb (Predicate_Enabled): New.
* exp_ch4.adb (Expand_N_Type_Conversion): Code cleanup and apply
predicate check consistently.
* exp_ch6.adb (Expand_Actuals.By_Ref_Predicate_Check): Ditto.
* sem_ch3.adb (Analyze_Object_Declaration): Ditto.
* exp_ch3.adb (Build_Assignment): Revert handling of predicate
check for allocators with qualified expressions, now handled in
Freeze_Expression directly.
* sem_aggr.adb: Fix typos.
* checks.adb: Code refactoring: use Predicate_Enabled.
(Apply_Predicate_Check): Code cleanup.
* freeze.adb (Freeze_Expression): Freeze the subtype mark before
a qualified expression on an allocator.
* exp_util.ads, exp_util.adb (Within_Internal_Subprogram):
Renamed Predicate_Check_In_Scope to clarify usage, refine
handling of predicates within init procs which should be enabled
when the node comes from source.
* sem_ch13.adb (Freeze_Entity_Checks): Update call to
Predicate_Check_In_Scope.

commit | commitdiff | tree

Eric Botcazou [Wed, 27 May 2020 20:42:27 +0000 (22:42 +0200)]

[Ada] Small cleanup throughout Exp_Ch4

gcc/ada/

* exp_ch4.adb (Expand_Array_Comparison): Reformat.
(Expand_Concatenate): Use standard size values directly and use
Standard_Long_Long_Unsigned instead of RE_Long_Long_Unsigned.
(Expand_Modular_Op): Use Standard_Long_Long_Integer in case the
modulus is larger than Integer.
(Expand_N_Op_Expon): Use standard size value directly.
(Narrow_Large_Operation): Use Uint instead of Nat for sizes and
use a local variable for the size of the type.
(Get_Size_For_Range): Return Uint instead of Nat.
(Is_OK_For_Range): Take Uint instead of Nat.

commit | commitdiff | tree

Javier Miranda [Tue, 26 May 2020 18:54:15 +0000 (14:54 -0400)]

[Ada] Spurious error in generic dispatching constructor call

gcc/ada/

* exp_ch6.adb (Make_Build_In_Place_Iface_Call_In_Allocator):
Build the internal anonymous access type using as a reference
the designated type imposed by the context (instead of using the
return type of the called function).

commit | commitdiff | tree

Yannick Moy [Wed, 27 May 2020 15:30:23 +0000 (17:30 +0200)]

[Ada] Fix assertion failure on (in-)out function parameter

gcc/ada/

* sem_res.adb (Resolve_Actuals): Protect call to
Is_Valued_Procedure.

commit | commitdiff | tree

Piotr Trojanek [Wed, 27 May 2020 11:26:36 +0000 (13:26 +0200)]

[Ada] Revert too late setting of Ekind on discriminants

gcc/ada/

* sem_ch3.adb (Process_Discriminants): Revert recent change to
location of Set_Ekind; detect effectively volatile discriminants
by their type only.

commit | commitdiff | tree

Joffrey Huguet [Tue, 26 May 2020 16:06:58 +0000 (18:06 +0200)]

[Ada] Add global contracts to Ada.Numerics.Big_Numbers libraries

gcc/ada/

* libgnat/a-nbnbin.ads, libgnat/a-nbnbre.ads: Add global
contract (Global => null) to all functions.

commit | commitdiff | tree

Ed Schonberg [Tue, 26 May 2020 19:39:38 +0000 (15:39 -0400)]

[Ada] Part of implementation of AI12-0212: container aggregates

gcc/ada/

* aspects.ads: Add Aspect_Aggregate.
* exp_aggr.adb (Expand_Container_Aggregate): Expand positional
container aggregates into separate initialization and insertion
operations.
* sem_aggr.ads (Resolve_Container_Aggregate): New subprogram.
* sem_aggr.adb (Resolve_Container_Aggregate): Parse aspect
aggregate, establish element types and key types if present, and
resolve aggregate components.
* sem_ch13.ads (Parse_Aspect_Aggregate): Public subprogram used
in validation, resolution and expansion of container aggregates.
* sem_ch13.adb
(Parse_Aspect_Aggregate): Retrieve names of primitives specified
in aspect specification.
(Validate_Aspect_Aggregate): Check legality of specified
operations given in aspect specification, before name
resolution.
(Resolve_Aspect_Aggregate): At freeze point resolve operations
and verify that given operations have the required profile.
* sem_res.adb (Resolve): Call Resolve_Aspect_Aggregate if aspect
is present for type.
* snames.ads-tmpl: Add names used in aspect Aggregate: Empty,
Add_Named, Add_Unnamed, New_Indexed, Assign_Indexed.

commit | commitdiff | tree

Arnaud Charlet [Mon, 25 May 2020 15:30:56 +0000 (11:30 -0400)]

[Ada] Make System.Generic_Bignums more flexible

gcc/ada/

* Makefile.rtl (GNATRTL_NONTASKING_OBJS): Add s-shabig.o.
* libgnat/s-shabig.ads: New file to share definitions.
* libgnat/s-genbig.ads, libgnat/s-genbig.adb: Reorganized to
make it more generic and flexible in terms of memory allocation
and data structure returned.
(To_String): Moved to System.Generic_Bignums to allow sharing
this code.
(Big_And, Big_Or, Big_Shift_Left, Big_Shift_Right): New.
* libgnat/s-bignum.adb, libgnat/s-bignum.ads: Adapt to new
System.Generic_Bignums spec.
* libgnat/a-nbnbin.adb: Likewise.
(To_String): Moved to System.Generic_Bignums to allow sharing
this code.
* libgnat/a-nbnbre.adb (Normalize): Fix handling of Num = 0
leading to an exception.

commit | commitdiff | tree

Eric Botcazou [Tue, 26 May 2020 18:06:14 +0000 (20:06 +0200)]

[Ada] Fix crash on quantified expression in expression function (2)

gcc/ada/

* freeze.adb (Freeze_Expr_Types): Replace call to Find_Aspect
with call to Find_Value_Of_Aspect and adjust accordingly.

commit | commitdiff | tree

Eric Botcazou [Tue, 26 May 2020 10:55:26 +0000 (12:55 +0200)]

[Ada] Fix crash on quantified expression in expression function

gcc/ada/

* einfo.adb (Write_Field24_Name): Handle E_Loop_Parameter.
* freeze.adb (Freeze_Expr_Types): Freeze the iterator type used as
Default_Iterator of the name of an N_Iterator_Specification node.

commit | commitdiff | tree

Eric Botcazou [Mon, 25 May 2020 21:27:46 +0000 (23:27 +0200)]

[Ada] Fix internal error on if-expression in call returning tagged type

gcc/ada/

* checks.adb (Determine_Range): Deal with Min and Max attributes.
* exp_ch6.adb (Expand_Call_Helper): When generating code to pass
the accessibility level to the caller in the case of an actual
which is an if-expression, also remove the nodes created after
the declaration of the dummy temporary.
* sem_ch6.adb (Analyze_Subprogram_Body_Helper): Use Natural as
the type of the minimum accessibility level object.

commit | commitdiff | tree

Piotr Trojanek [Tue, 26 May 2020 10:19:01 +0000 (12:19 +0200)]

[Ada] Fix failing assertions related to volatile objects

gcc/ada/

* sem_ch3.adb (Process_Discriminants): Set Ekind of the
processed discriminant entity before passing to
Is_Effectively_Volatile, which was crashing on a failed
assertion.
* sem_prag.adb (Analyze_External_Property_In_Decl_Part): Prevent
call to No_Caching_Enabled with entities other than variables,
which was crashing on a failed assertion.
(Analyze_Pragma): Style cleanups.
* sem_util.adb (Is_Effectively_Volatile): Enforce comment with
an assertion; prevent call to No_Caching_Enabled with entities
other than variables.
(Is_Effectively_Volatile_Object): Only call
Is_Effectively_Volatile on objects, not on types.
(No_Caching_Enabled): Enforce comment with an assertion.

commit | commitdiff | tree

Yannick Moy [Tue, 26 May 2020 08:15:18 +0000 (10:15 +0200)]

[Ada] Remove use of debug flag -gnatdF for GNATprove

gcc/ada/

* debug.adb: Update comments to free usage of -gnatdF.

commit | commitdiff | tree

Piotr Trojanek [Wed, 6 May 2020 20:02:11 +0000 (22:02 +0200)]

[Ada] Reuse SPARK expansion of attribute Update for delta_aggregate

gcc/ada/

* exp_spark.adb (Expand_SPARK_Delta_Or_Update): Refactored from
Expand_SPARK_N_Attribute_Reference; rewrite into N_Aggregate or
N_Delta_Aggregate depending on what is being rewritten.
(Expand_SPARK_N_Delta_Aggregate): New routine to expand
delta_aggregate.
(Expand_SPARK_N_Attribute_Reference): Call the refactored
routine.

commit | commitdiff | tree

Piotr Trojanek [Thu, 21 May 2020 13:42:32 +0000 (15:42 +0200)]

[Ada] Fix expansion of 'Update with multiple choices in GNATprove

gcc/ada/

* exp_spark.adb (Expand_SPARK_N_Attribute_Reference): Fix
expansion of attribute Update.

commit | commitdiff | tree

Arnaud Charlet [Sat, 23 May 2020 18:50:10 +0000 (20:50 +0200)]

[Ada] Crash in Walk_Library_Items on ghost units

gcc/ada/

* sem.adb (Walk_Library_Items): Fix handling of Ghost units.

commit | commitdiff | tree

Richard Biener [Thu, 9 Jul 2020 14:03:45 +0000 (16:03 +0200)]

fix constant folding from array CTORs

This fixes the case where we try to fold a read from an
array initializer and happen to cross the boundary of
multiple CTORs, which isn't really supported. For the
interesting cases like the testcase we actually handle
the folding by encoding the whole initializer.
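
As a rough illustration of such a read (not the actual pr96133.c testcase), consider a constant table with nested initializers and a byte copy that starts in one row and ends in the next. The table, the function name and the offsets below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical constant table; each row has its own nested initializer.
static const std::uint8_t table[2][4] = {{1, 2, 3, 4}, {5, 6, 7, 8}};

// Copy four bytes starting `offset` bytes into the table. With offset == 2
// the read covers the last two bytes of row 0 and the first two of row 1,
// so it crosses the boundary between the two inner initializers.
void read_bytes(std::size_t offset, std::uint8_t out[4]) {
    assert(offset + 4 <= sizeof table);
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&table);
    std::memcpy(out, bytes + offset, 4);
}
```

A compiler that wants to constant-fold read_bytes(2, out) has to look at the bytes of the whole table rather than at a single inner initializer.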

2020-07-10 Richard Biener <rguenther@suse.de>

PR tree-optimization/96133
* gimple-fold.c (fold_array_ctor_reference): Do not
recurse to folding a CTOR that does not fully cover the
asked for object.

* gcc.dg/torture/pr96133.c: New testcase.

commit | commitdiff | tree

Cui,Lili [Wed, 24 Jun 2020 05:08:11 +0000 (13:08 +0800)]

Initial Sapphire Rapids and Alder Lake support from ISA r40

gcc/
* common/config/i386/cpuinfo.h
(get_intel_cpu): Handle sapphirerapids.
* common/config/i386/i386-common.c
(processor_names): Add sapphirerapids and alderlake.
(processor_alias_table): Add sapphirerapids and alderlake.
* common/config/i386/i386-cpuinfo.h
(processor_subtypes): Add INTEL_COREI7_SAPPHIRERAPIDS and
INTEL_COREI7_ALDERLAKE.
* config.gcc: Add -march=sapphirerapids and alderlake.
* config/i386/driver-i386.c
(host_detect_local_cpu): Handle sapphirerapids and alderlake.
* config/i386/i386-c.c
(ix86_target_macros_internal): Handle sapphirerapids and alderlake.
* config/i386/i386-options.c
(m_SAPPHIRERAPIDS): Define.
(m_ALDERLAKE): Ditto.
(m_CORE_AVX512): Add m_SAPPHIRERAPIDS.
(processor_cost_table): Add sapphirerapids and alderlake.
(ix86_option_override_internal): Handle PTA_WAITPKG, PTA_ENQCMD,
PTA_CLDEMOTE, PTA_SERIALIZE, PTA_TSXLDTRK.
* config/i386/i386.h
(ix86_size_cost): Define SAPPHIRERAPIDS and ALDERLAKE.
(processor_type): Add PROCESSOR_SAPPHIRERAPIDS and
PROCESSOR_ALDERLAKE.
(PTA_ENQCMD): New.
(PTA_CLDEMOTE): Ditto.
(PTA_SERIALIZE): Ditto.
(PTA_TSXLDTRK): New.
(PTA_SAPPHIRERAPIDS): Ditto.
(PTA_ALDERLAKE): Ditto.
* doc/extend.texi: Add sapphirerapids and alderlake.
* doc/invoke.texi: Add sapphirerapids and alderlake.

gcc/testsuite/
* gcc.target/i386/funcspec-56.inc: Handle new march.
* g++.target/i386/mv16.C: Handle new march.

commit | commitdiff | tree

Martin Liska [Thu, 9 Jul 2020 09:58:11 +0000 (11:58 +0200)]

Add -fdump-profile-report.

When using -fprofile-report, -fdump-profile-report can be used to
print the report to a foo.c.000i.profile-report file instead
of stderr. I find this handy for comparing reports.

gcc/ChangeLog:

* dumpfile.c [profile-report]: Add new profile dump.
* dumpfile.h (enum tree_dump_index): Add TDI_profile_report.
* passes.c (pass_manager::dump_profile_report): Change stderr
to dump_file.

commit | commitdiff | tree

Kewen Lin [Fri, 10 Jul 2020 02:58:28 +0000 (21:58 -0500)]

vect: Use adjusted niters by considering peeling prologue

This patch is derived from the review of the vector with length patch
series. I relaxed the guard on LOOP_VINFO_PEELING_FOR_ALIGNMENT for
vector with length, as Richard S. suggested, and then encountered one
failure from the case gcc.dg/vect/vect-ifcvt-11.c when run with param
vect-partial-vector-usage=2 enabled. The root cause is that
we still use the original niters for the loop body vectorization,
which leads to out-of-bounds accesses. Instead we should use
LOOP_VINFO_NITERS, which has been adjusted in vect_do_peeling by
considering the number of iterations peeled for the prologue.
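
The effect can be modelled in scalar code. The sketch below is only an illustration under assumed names (sum_with_peeled_prologue, prologue, vf) and is not the vectorizer's implementation: a prologue handles the first few iterations, and the length-controlled main loop must then run over the adjusted count, not the original one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Scalar model of a loop with a peeled prologue followed by a
// length-controlled "vector" body that handles up to vf elements per step.
long sum_with_peeled_prologue(const int* a, std::size_t niters,
                              std::size_t prologue, std::size_t vf) {
    assert(vf > 0);
    prologue = std::min(prologue, niters);
    long sum = 0;
    std::size_t i = 0;
    for (; i < prologue; ++i)          // peeled prologue iterations
        sum += a[i];
    // The body must use the adjusted count. Using niters here would make
    // the loop read `prologue` elements past the end of the array.
    const std::size_t adjusted_niters = niters - prologue;
    for (std::size_t done = 0; done < adjusted_niters; done += vf) {
        const std::size_t len = std::min(vf, adjusted_niters - done);
        for (std::size_t lane = 0; lane < len; ++lane)
            sum += a[i + done + lane];
    }
    return sum;
}
```

Replacing adjusted_niters with niters in the main loop reproduces the kind of out-of-bounds access described above.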

Bootstrapped/regtested on aarch64-linux-gnu and powerpc64le-linux-gnu.

gcc/ChangeLog:

* tree-vect-loop.c (vect_transform_loop): Use LOOP_VINFO_NITERS which
is adjusted by considering peeled prologue for non
vect_use_loop_mask_for_alignment_p cases.

commit | commitdiff | tree

GCC Administrator [Fri, 10 Jul 2020 00:16:28 +0000 (00:16 +0000)]

Daily bump.

commit | commitdiff | tree

Julian Brown [Tue, 9 Jun 2020 13:21:34 +0000 (06:21 -0700)]

openacc: Set bias to zero for explicit attach/detach clauses in C and C++

This is a fix for the pointer (or array) size inadvertently being used
for the bias with attach and detach mapping kinds, for both C and C++.

2020-07-09 Julian Brown <julian@codesourcery.com>
Thomas Schwinge <thomas@codesourcery.com>

gcc/c/
PR middle-end/95270
* c-typeck.c (c_finish_omp_clauses): Set OMP_CLAUSE_SIZE (bias) to zero
for standalone attach/detach clauses.

gcc/cp/
PR middle-end/95270
* semantics.c (finish_omp_clauses): Likewise.

include/
PR middle-end/95270
* gomp-constants.h (gomp_map_kind): Expand comment for attach/detach
mapping kinds.

gcc/testsuite/
PR middle-end/95270
* c-c++-common/goacc/mdc-1.c: Update expected dump output for zero
bias.

libgomp/
PR middle-end/95270
* testsuite/libgomp.oacc-c-c++-common/pr95270-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/pr95270-2.c: New test.

commit | commitdiff | tree

Julian Brown [Thu, 11 Jun 2020 13:43:59 +0000 (06:43 -0700)]

openacc: GOMP_MAP_ATTACH handling in find_group_last

Arrange for GOMP_MAP_ATTACH to be grouped together with a preceding
GOMP_MAP_TO_PSET or other "to" data movement clause, except in cases
where an explicit "attach" clause is used.

2020-07-09 Julian Brown <julian@codesourcery.com>

include/
* gomp-constants.h (gomp_map_kind): Update comment for GOMP_MAP_TO_PSET.

libgomp/
* oacc-mem.c (find_group_last): Group data-movement clauses
(GOMP_MAP_TO_PSET, GOMP_MAP_TO, etc.) together with a subsequent
GOMP_MAP_ATTACH. Allow standalone GOMP_MAP_ATTACH also.

commit | commitdiff | tree

Julian Brown [Wed, 3 Jun 2020 21:25:19 +0000 (14:25 -0700)]

openacc: Fortran derived-type mapping fix

Fix a bug with mapping Fortran components which themselves have derived
types in the OpenACC 2.5+ manual deep-copy support.

2020-07-09 Julian Brown <julian@codesourcery.com>

gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Use 'inner' not 'decl' for
derived type members which themselves have derived types.

gcc/testsuite/
* gfortran.dg/goacc/mapping-tests-3.f90: New test.
* gfortran.dg/goacc/mapping-tests-4.f90: New test.

commit | commitdiff | tree

Peter Bergner [Thu, 9 Jul 2020 20:52:59 +0000 (15:52 -0500)]

rs6000: Allow MMA built-in initialization regardless of compiler options

Built-in initialization occurs only once and fairly early, when the
command line options are in force. If the -mcpu=<CPU> is pre-power10,
then we fail to initialize the MMA built-ins, so they are not
available to call in a #pragma target/attribute target function.
The fix is to basically always (on server type cpus) initialize the MMA
built-ins so we can use them in #pragma target/attribute target functions.

2020-07-09 Peter Bergner <bergner@linux.ibm.com>

gcc/
PR target/96125
* config/rs6000/rs6000-call.c (rs6000_init_builtins): Define the MMA
specific types __vector_quad and __vector_pair, and initialize the
MMA built-ins if TARGET_EXTRA_BUILTINS is set.
(mma_init_builtins): Don't test for mask set in rs6000_builtin_mask.
Remove now unneeded mask variable.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add the
OPTION_MASK_MMA flag for power10 if not already set.

gcc/testsuite/
PR target/96125
* gcc.target/powerpc/pr96125.c: New test.

commit | commitdiff | tree

Richard Biener [Thu, 9 Jul 2020 14:06:04 +0000 (16:06 +0200)]

fixup BIT_FIELD_REF detection in SLP discovery

This fixes a thinko where we end up combining a BIT_FIELD_REF
and a memory access, fixed by checking all stmts are a load or
none.

2020-07-09 Richard Biener <rguenther@suse.de>

PR tree-optimization/96133
* tree-vect-slp.c (vect_build_slp_tree_1): Compare load_p
status between stmts.

commit | commitdiff | tree

Patrick Palka [Thu, 9 Jul 2020 17:47:13 +0000 (13:47 -0400)]

c++: Partially revert fix for PR c++/95497 [PR96132]

I was mistaken to assume that a dependent type is necessarily
incomplete, and indeed there are multiple places in the frontend where
we check a type for both dependency and completeness. So this patch
partially reverts the fix for PR95497, restoring the dependent_type_p
check that guarded the call to is_really_empty_class below.
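
A small example of a type that is dependent yet complete, given here only as a sketch with hypothetical names (it is not the new incomplete12.C test): inside a member function body the enclosing class template is complete, even though its name denotes a dependent type there.

```cpp
#include <cassert>
#include <cstddef>

template <typename T>
struct Wrapper {
    T value;
    // `Wrapper` names the current instantiation here: a dependent type,
    // but a complete one, so sizeof is valid inside the function body.
    static constexpr std::size_t size() { return sizeof(Wrapper); }
};

void check_sizes() {
    assert(Wrapper<int>::size() == sizeof(Wrapper<int>));
    assert(Wrapper<char>::size() == sizeof(Wrapper<char>));
}
```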

gcc/cp/ChangeLog:

PR c++/96132
* constexpr.c (potential_constant_expression_1) <case PARM_DECL>:
Restore dependent_type_p check that guarded the call to
is_really_empty_class.

gcc/testsuite/ChangeLog:

PR c++/96132
* g++.dg/template/incomplete12.C: New test.

commit | commitdiff | tree

H.J. Lu [Wed, 23 Jan 2019 14:33:58 +0000 (06:33 -0800)]

x86: Enable FMA in rsqrt<mode>2 expander

Enable FMA in rsqrt<mode>2 expander and fold rsqrtv16sf2 expander into
rsqrt<mode>2 expander which expands to UNSPEC_RSQRT28 for TARGET_AVX512ER.
Although it doesn't show performance change in our workloads, FMA can
improve other workloads.
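
To show where a fused multiply-add fits, here is a scalar sketch of one Newton-Raphson refinement step for a reciprocal square root estimate, x1 = -0.5 * x0 * (a * x0 * x0 - 3). The function name and the scalar form are assumptions made for illustration; the expander itself emits RTL for vector modes.

```cpp
#include <cassert>
#include <cmath>

// One refinement step for an estimate x0 of 1/sqrt(a).
// The inner term a*x0*x0 - 3 maps naturally onto a single fma.
float refine_rsqrt(float a, float x0) {
    assert(a > 0.0f);
    const float e = std::fma(a * x0, x0, -3.0f);  // (a*x0)*x0 - 3
    return -0.5f * x0 * e;
}
```

For example, refining the estimate 0.49 for a = 4 moves it closer to the exact value 0.5.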

gcc/

PR target/88713
* config/i386/i386-expand.c (ix86_emit_swsqrtsf): Enable FMA.
* config/i386/sse.md (VF_AVX512VL_VF1_128_256): New.
(rsqrt<mode>2): Replace VF1_128_256 with VF_AVX512VL_VF1_128_256.
(rsqrtv16sf2): Removed.

gcc/testsuite/

PR target/88713
* gcc.target/i386/pr88713-1.c: New test.
* gcc.target/i386/pr88713-2.c: Likewise.

commit | commitdiff | tree

Richard Biener [Wed, 8 Jul 2020 14:37:55 +0000 (16:37 +0200)]

remove premature vect_verify_datarefs_alignment

This followup removes vect_verify_datarefs_alignment and its
premature cancellation of vectorization leaving the actual
decision whether alignment is supported to the functions
deciding whether we can vectorize a load or store.

2020-07-08 Richard Biener <rguenther@suse.de>

* tree-vectorizer.h (vect_verify_datarefs_alignment): Remove.
(vect_slp_analyze_and_verify_instance_alignment): Rename to ...
(vect_slp_analyze_instance_alignment): ... this.
* tree-vect-data-refs.c (verify_data_ref_alignment): Remove.
(vect_verify_datarefs_alignment): Likewise.
(vect_enhance_data_refs_alignment): Do not call
vect_verify_datarefs_alignment.
(vect_slp_analyze_node_alignment): Rename from
vect_slp_analyze_and_verify_node_alignment and do not
call verify_data_ref_alignment.
(vect_slp_analyze_instance_alignment): Rename from
vect_slp_analyze_and_verify_instance_alignment.
* tree-vect-stmts.c (vectorizable_store): Dump when
we vectorize an unaligned access.
(vectorizable_load): Likewise.
* tree-vect-loop.c (vect_analyze_loop_2): Do not call
vect_verify_datarefs_alignment.
* tree-vect-slp.c (vect_slp_analyze_bb_1): Adjust.

* gcc.dg/vect/bb-slp-10.c: Adjust.
* gcc.dg/vect/slp-45.c: Likewise.
* gcc.dg/vect/vect-109.c: Likewise.

commit | commitdiff | tree

Bin Cheng [Thu, 9 Jul 2020 10:10:03 +0000 (18:10 +0800)]

Schedule reduction partition in the last.

If a reduction partition's SCC is broken by runtime alias checks, force
a negative post order on it so that it will be scheduled last.

2020-07-09 Bin Cheng <bin.cheng@linux.alibaba.com>

gcc/
PR tree-optimization/95804
* tree-loop-distribution.c (break_alias_scc_partitions): Force
negative post order to reduction partition.

gcc/testsuite/
PR tree-optimization/95804
* gcc.dg/tree-ssa/pr95804.c: New test.

commit | commitdiff | tree

Jakub Jelinek [Thu, 9 Jul 2020 10:07:17 +0000 (12:07 +0200)]

openmp: Optimize triangular loop logical iterator to actual iterators computation using search for quadratic equation root(s)

This patch implements the optimized logical to actual iterators
computation for triangular loops.

I have a rough implementation using integers, but this one uses floating
point.  There is a small problem that -fopenmp programs aren't linked with
-lm, so it does it only if the hw has sqrt optab (and uses ifn rather than
__builtin_sqrt because it obviously doesn't need errno handling etc.).

Do you think it is ok this way, or should I use the integral computation
using inlined isqrt? We have an inequality of the form
start >= x * t10 + t11 * (((x - 1) * x) / 2)
where t10 and t11 are signed long long values and start is unsigned long long,
and the division by 2 actually is a problem for accuracy in some cases, so
if we do it in integral, we actually need to do
      long long t12 = 2 * t10 - t11;
      unsigned long long t13 = t12 * t12 + start * 8 * t11;
      unsigned long long isqrt_ = isqrtull (t13);
      long long x = (((long long) isqrt_ - t12) / t11) >> 1;
with careful overflow checking on all the computations before isqrtull
(and on overflows use the fallback implementation).
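
The integral variant can be tried out in isolation. The following is a minimal sketch under stated assumptions: isqrt_ull is a hypothetical stand-in for isqrtull, t10 and t11 are assumed positive, and the overflow checks mentioned above are omitted.

```cpp
#include <cassert>

// Integer square root (floor) using the classic bit-by-bit method.
unsigned long long isqrt_ull(unsigned long long n) {
    unsigned long long result = 0;
    unsigned long long bit = 1ULL << 62;
    while (bit > n) bit >>= 2;
    while (bit != 0) {
        if (n >= result + bit) {
            n -= result + bit;
            result = (result >> 1) + bit;
        } else {
            result >>= 1;
        }
        bit >>= 2;
    }
    return result;
}

// Largest x with  x * t10 + t11 * (((x - 1) * x) / 2) <= start,
// i.e. the number of complete outer iterations before logical
// iteration `start`. No overflow checking is done in this sketch.
long long outer_iterations(unsigned long long start, long long t10,
                           long long t11) {
    assert(t10 > 0 && t11 > 0);
    const long long t12 = 2 * t10 - t11;
    const unsigned long long t13 =
        (unsigned long long)(t12 * t12) + start * 8 * (unsigned long long)t11;
    const unsigned long long root = isqrt_ull(t13);
    return (((long long)root - t12) / t11) >> 1;
}
```

With t10 = 3 and t11 = 2 the outer iterations contain 3, 5, 7, ... inner iterations, so logical iteration 8 is the first one of the third outer iteration (x = 2).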

2020-07-09  Jakub Jelinek  <jakub@redhat.com>

* omp-general.h (struct omp_for_data): Add min_inner_iterations
and factor members.
* omp-general.c (omp_extract_for_data): Initialize them and remember
them in OMP_CLAUSE_COLLAPSE_COUNT if needed and restore from there.
* omp-expand.c (expand_omp_for_init_counts): Fix up computation of
counts[fd->last_nonrect] if fd->loop.n2 is INTEGER_CST.
(expand_omp_for_init_vars): For
fd->first_nonrect + 1 == fd->last_nonrect loops with for now
INTEGER_CST fd->loop.n2 find quadratic equation roots instead of
using fallback method when possible.

* testsuite/libgomp.c/loop-19.c: New test.
* testsuite/libgomp.c/loop-20.c: New test.

commit | commitdiff | tree

Jakub Jelinek [Thu, 9 Jul 2020 09:29:30 +0000 (11:29 +0200)]

openmp: Change omp_atv_default value and rename omp_atv_sequential to omp_atv_serialized.

While this is an OpenMP 5.1 change, it is undesirable to let people use different
values and then deal with ABI backwards compatibility in a year or two.

2020-07-09  Jakub Jelinek  <jakub@redhat.com>

* omp.h.in (omp_alloctrait_value_t): Change omp_atv_default from
2 to -1.  Add omp_atv_serialized and define omp_atv_sequential using
it.  Remove __omp_alloctrait_value_max__.
* allocator.c (omp_init_allocator): Handle omp_atv_default for
omp_atk_alignment and omp_atk_pool_size.

commit | commitdiff | tree

Omar Tahir [Thu, 9 Jul 2020 09:14:19 +0000 (10:14 +0100)]

ira: Fix unnecessary register spill

The variables first_moveable_pseudo and last_moveable_pseudo aren't
reset after compiling a function, which means they leak into the first
scheduler pass of the following function. In some cases, this can cause
an extra spill during register allocation of the second function.
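
The failure mode is the usual one for file-scope state. The toy model below only illustrates the pattern and is not the code in ira.c; apart from the two variable names and move_unallocated_pseudos, every name and the half-open range convention are assumptions.

```cpp
#include <cassert>

namespace {
// File-scope state shared by all functions being compiled.
int first_moveable_pseudo = 0;
int last_moveable_pseudo = 0;
}

// Hypothetical: record the range [first, last) for the current function.
void record_moveable_range(int first, int last) {
    assert(first <= last);
    first_moveable_pseudo = first;
    last_moveable_pseudo = last;
}

// Hypothetical consumer: a later pass asks whether a register is in range.
bool in_moveable_range(int regno) {
    return regno >= first_moveable_pseudo && regno < last_moveable_pseudo;
}

// Model of the fix: zero the range before returning so that nothing
// leaks into the next function's compilation.
void move_unallocated_pseudos() {
    // ... per-function work would happen here ...
    first_moveable_pseudo = 0;
    last_moveable_pseudo = 0;
}
```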

gcc/ChangeLog:

* ira.c (move_unallocated_pseudos): Zero first_moveable_pseudo and
last_moveable_pseudo before returning.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/nospill.c: New test.

commit | commitdiff | tree

Szabolcs Nagy [Thu, 2 Jul 2020 16:12:05 +0000 (17:12 +0100)]

aarch64: Fix BTI support in libitm

sjlj.S did not have the GNU property note markup and the BTI c
instructions that are necessary when it is built with branch
protection.

The notes are only added when libitm is built with branch
protection, because old linkers mishandle the note (merge
them incorrectly or emit warnings). The BTI instructions
are added unconditionally.

2020-07-09 Szabolcs Nagy <szabolcs.nagy@arm.com>

libitm/ChangeLog:

* config/aarch64/sjlj.S: Add BTI marking and related definitions,
and add BTI c to function entries.

commit | commitdiff | tree

Szabolcs Nagy [Thu, 2 Jul 2020 16:11:56 +0000 (17:11 +0100)]

aarch64: Fix BTI support in libgcc [PR96001]

lse.S did not have the GNU property note markup and the BTI c
instructions that are necessary when it is built with branch
protection.

The notes are only added when libgcc is built with branch
protection, because old linkers mishandle the note (merge
them incorrectly or emit warnings). The BTI instructions
are added unconditionally.

Note: BTI c is only necessary at function entry if the function
may be called indirectly. Currently lse functions are not called
indirectly, but BTI is added for ABI reasons, e.g. to allow
linkers later to emit stub code with an indirect jump.

2020-07-09 Szabolcs Nagy <szabolcs.nagy@arm.com>

libgcc/ChangeLog:

PR target/96001
* config/aarch64/lse.S: Add BTI marking and related definitions,
and add BTI c to function entries.

commit | commitdiff | tree

Szabolcs Nagy [Fri, 3 Jul 2020 13:11:49 +0000 (14:11 +0100)]

aarch64: Fix noexecstack note in libgcc

lse.S did not have the GNU stack note, which may cause a missing
PT_GNU_STACK in binaries on Linux and FreeBSD.

2020-07-09 Szabolcs Nagy <szabolcs.nagy@arm.com>

libgcc/ChangeLog:

* config/aarch64/lse.S: Add stack note.

commit | commitdiff | tree

Szabolcs Nagy [Fri, 3 Jul 2020 13:09:25 +0000 (14:09 +0100)]

aarch64: Fix noexecstack note in libitm

sjlj.S only had the note on Linux, but it is supposed
to have it on FreeBSD too.

2020-07-09 Szabolcs Nagy <szabolcs.nagy@arm.com>

libitm/ChangeLog:

* config/aarch64/sjlj.S: Add stack note if __FreeBSD__ is defined.

commit | commitdiff | tree

Szabolcs Nagy [Thu, 2 Jul 2020 15:04:51 +0000 (16:04 +0100)]

aarch64: Add missing ACLE support for BTI

Define the __ARM_FEATURE_BTI_DEFAULT feature test
macro when BTI branch protection is enabled.
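
User code can test the macro to decide whether hand-written assembly needs landing pads. The sketch below assumes the ACLE convention that the macro is defined to 1 when BTI is on; the helper names are hypothetical, and "hint 34" (the hint-space encoding of BTI c) is an assumption that does not appear in this message.

```cpp
#include <cassert>
#include <cstring>

#if defined(__ARM_FEATURE_BTI_DEFAULT) && __ARM_FEATURE_BTI_DEFAULT == 1
#define SKETCH_BTI_ENABLED 1
#define SKETCH_BTI_C "hint 34"   /* BTI c, spelled so old assemblers accept it */
#else
#define SKETCH_BTI_ENABLED 0
#define SKETCH_BTI_C ""
#endif

// Text to place at the entry of an assembly function in this build.
const char* function_entry_marker() {
    const char* marker = SKETCH_BTI_C;
    // Invariant of this sketch: a marker is present exactly when BTI is on.
    assert((std::strlen(marker) != 0) == (SKETCH_BTI_ENABLED == 1));
    return marker;
}
```

On targets other than AArch64, or when branch protection is off, the marker is simply empty.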

2020-07-09 Szabolcs Nagy <szabolcs.nagy@arm.com>

gcc/ChangeLog:

* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add
__ARM_FEATURE_BTI_DEFAULT support.

Matthew Malcomson [Thu, 9 Jul 2020 08:11:59 +0000 (09:11 +0100)]

aarch64: Mitigate SLS for BLR instruction

This patch introduces the mitigation for Straight Line Speculation past
the BLR instruction.

This mitigation replaces BLR instructions with a BL to a stub which uses
a BR to jump to the original value.  These function stubs are then
appended with a speculation barrier to ensure no straight line
speculation happens after these jumps.

When optimising for speed we use a set of stubs for each function since
this should help the branch predictor make more accurate predictions
about where a stub should branch.

When optimising for size we use one set of stubs for all functions.
This set of stubs can have human readable names, and we are using
`__call_indirect_x<N>` for register x<N>.

When BTI branch protection is enabled the BLR instruction can jump to a
`BTI c` instruction using any register, while the BR instruction can
only jump to a `BTI c` instruction using the x16 or x17 registers.
Hence, in order to ensure this transformation is safe we mov the value
of the original register into x16 and use x16 for the BR.

As an example when optimising for size:
a
    BLR x0
instruction would get transformed to something like
    BL __call_indirect_x0
where __call_indirect_x0 labels a thunk that contains
__call_indirect_x0:
    MOV X16, X0
    BR X16
    <speculation barrier>

The first version of this patch used local symbols specific to a
compilation unit to try and avoid relocations.
This was mistaken since functions coming from the same compilation unit
can still be in different sections, and the assembler will insert
relocations at jumps between sections.

On any relocation the linker is permitted to emit a veneer to handle
jumps between symbols that are very far apart.  The registers x16 and
x17 may be clobbered by these veneers.
Hence the function stubs cannot rely on the values of x16 and x17 being
the same as just before the function stub is called.

The same can be said for the hot/cold partitioning of single functions,
so function-local stubs have the same restriction.

This updated version of the patch never emits function stubs for x16 and
x17, and instead forces other registers to be used.

Given the above, there is now no benefit to local symbols (since they
are not enough to avoid dealing with linker intricacies).  This patch
now uses global symbols with hidden visibility each stored in their own
COMDAT section.  This means stubs can be shared between compilation
units while still avoiding the PLT indirection.

This patch also removes the `__call_indirect_x30` stub (and
function-local equivalent) which would simply jump back to the original
location.
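
The register restrictions described so far can be summarised with a
small model.  This is only a sketch and not the code in aarch64.c: the
function name `sls_stub_name` and its interface are invented for
illustration.  It produces the shared stub name for a register and
rejects x16, x17 and x30, for which no stub is emitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of the naming rule for the shared stubs.
   Writes "__call_indirect_x<N>" into BUF and returns true, or returns
   false when no stub exists for REGNO:
     - x16/x17 may be clobbered by linker veneers,
     - a stub for x30 would just jump back to the call site.  */
static bool sls_stub_name(int regno, char *buf, size_t len)
{
  if (regno < 0 || regno > 30)
    return false;
  if (regno == 16 || regno == 17 || regno == 30)
    return false;

  int n = snprintf(buf, len, "__call_indirect_x%d", regno);
  return n > 0 && (size_t) n < len;
}
```

A caller would replace `BLR x<N>` by `BL` to the returned name, and
would have to move the target into another register first when the
function returns false.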

The function-local stubs are emitted to the assembly output file in one
chunk, which means we need not add the speculation barrier directly
after each one.
This is because we know for certain that the instructions directly after
the BR in all but the last function stub will be from another one of
these stubs and hence will not contain a speculation gadget.
Instead we add a speculation barrier at the end of the sequence of
stubs.

The global stubs are emitted in COMDAT/.linkonce sections by
themselves so that the linker can remove duplicates from multiple object
files.  This means they are not emitted in one chunk, and each one must
include the speculation barrier.

Another difference is that since the global stubs are shared across
compilation units we do not know that all functions will be targeting an
architecture supporting the SB instruction.
Rather than provide multiple stubs for each architecture, we provide a
stub that will work for all architectures -- using the DSB+ISB barrier.

This mitigation does not apply for BLR instructions in the following
places:
- Some accesses to thread-local variables use a code sequence with a BLR
  instruction.  This code sequence is part of the binary interface between
  compiler and linker. If this BLR instruction needs to be mitigated, it'd
  probably be best to do so in the linker. It seems that the code sequence
  for thread-local variable access is unlikely to lead to a Spectre Revelation
  Gadget.
- PLT stubs are produced by the linker and each contain a BLR instruction.
  It seems that at most only after the last PLT stub a Spectre Revelation
  Gadget might appear.

Testing:
  Bootstrap and regtest on AArch64
    (with BOOT_CFLAGS="-mharden-sls=retbr,blr")
  Used a temporary hack(1) in gcc-dg.exp to use these options on every
  test in the testsuite, a slight modification to emit the speculation
  barrier after every function stub, and a script to check that the
  output never emitted a BLR, or unmitigated BR or RET instruction.
  Similar on an aarch64-none-elf cross-compiler.

1) Temporary hack emitted a speculation barrier at the end of every stub
function, and used a script to ensure that:
  a) Every RET or BR is immediately followed by a speculation barrier.
  b) No BLR instruction is emitted by compiler.
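
The script itself is not part of this patch.  A minimal sketch of the
two checks, assuming the assembly has already been reduced to an array
of lower-case instruction strings without labels or leading whitespace,
could look like the following.  All names are hypothetical, and
variants such as RETAA are not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the mnemonic of INSN (text up to the first blank) is NAME.  */
static bool has_mnemonic(const char *insn, const char *name)
{
  size_t len = strcspn(insn, " \t");
  return strlen(name) == len && strncmp(insn, name, len) == 0;
}

/* True if a speculation barrier (SB, or DSB followed by ISB) starts at
   index I.  */
static bool barrier_at(const char *const *insns, size_t n, size_t i)
{
  if (i < n && has_mnemonic(insns[i], "sb"))
    return true;
  return i + 1 < n
	 && has_mnemonic(insns[i], "dsb")
	 && has_mnemonic(insns[i + 1], "isb");
}

/* Check a) every RET or BR is immediately followed by a barrier and
   b) no BLR appears at all.  */
static bool sls_hardened(const char *const *insns, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      if (has_mnemonic(insns[i], "blr"))
	return false;
      if ((has_mnemonic(insns[i], "ret") || has_mnemonic(insns[i], "br"))
	  && !barrier_at(insns, n, i + 1))
	return false;
    }
  return true;
}
```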

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
New declaration.
* config/aarch64/aarch64.c (aarch64_regno_regclass): Handle new
stub registers class.
(aarch64_class_max_nregs): Likewise.
(aarch64_register_move_cost): Likewise.
(aarch64_sls_shared_thunks): Global array to store stub labels.
(aarch64_sls_emit_function_stub): New.
(aarch64_create_blr_label): New.
(aarch64_sls_emit_blr_function_thunks): New.
(aarch64_sls_emit_shared_blr_thunks): New.
(aarch64_asm_file_end): New.
(aarch64_indirect_call_asm): New.
(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
(TARGET_ASM_FUNCTION_EPILOGUE): Use
aarch64_sls_emit_blr_function_thunks.
* config/aarch64/aarch64.h (STB_REGNUM_P): New.
(enum reg_class): Add STUB_REGS class.
(machine_function): Introduce `call_via` array for
function-local stub labels.
* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
aarch64_indirect_call_asm to emit code when hardening BLR
instructions.
* config/aarch64/constraints.md (Ucr): New constraint
representing registers for indirect calls.  Is GENERAL_REGS
usually, and STUB_REGS when hardening BLR instruction against
SLS.
* config/aarch64/predicates.md (aarch64_general_reg): STUB_REGS class
is also a general register.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.

Matthew Malcomson [Thu, 9 Jul 2020 08:11:59 +0000 (09:11 +0100)]

aarch64: Introduce SLS mitigation for RET and BR instructions

Instructions following RET or BR are not necessarily executed.  In order
to avoid speculation past RET and BR we can simply append a speculation
barrier.

Since these speculation barriers will not be architecturally executed,
they are not expected to add a high performance penalty.

The speculation barrier is to be SB when targeting architectures which
have this enabled, and DSB SY + ISB otherwise.
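
The choice between the two barrier forms can be modelled as below.
This is an illustrative sketch rather than the body of
aarch64_sls_barrier; the function names and the `have_sb` parameter are
assumptions.  The byte counts follow from every AArch64 instruction
being 4 bytes long, which is the extra length a pattern has to account
for.

```c
#include <stdbool.h>

/* Hypothetical model: assembly text of the speculation barrier.
   SB is a single instruction; the fallback is DSB SY followed by ISB.  */
static const char *sls_barrier_text(bool have_sb)
{
  return have_sb ? "sb" : "dsb\tsy\n\tisb";
}

/* Extra bytes the barrier adds after a RET or BR.  */
static int sls_barrier_bytes(bool have_sb)
{
  return have_sb ? 4 : 8;
}
```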

We add tests for each of the cases where such an instruction was seen.

This is implemented by modifying each machine description pattern that
emits either a RET or a BR instruction.  We choose not to use something
like `TARGET_ASM_FUNCTION_EPILOGUE` since it does not affect the
`indirect_jump`, `jump`, `sibcall_insn` and `sibcall_value_insn`
patterns and we find it preferable to implement the functionality in the
same way for every pattern.

There is one particular case which is slightly tricky.  The
implementation of TARGET_ASM_TRAMPOLINE_TEMPLATE uses a BR which needs
to be mitigated against.  The trampoline template is used *once* per
compilation unit, and the TRAMPOLINE_SIZE is exposed to the user via the
builtin macro __LIBGCC_TRAMPOLINE_SIZE__.
In the future we may implement function specific attributes to turn on
and off hardening on a per-function basis.
The fixed nature of the trampoline described above implies it will be
safer to ensure this speculation barrier is always used.

Testing:
  Bootstrap and regtest done on aarch64-none-linux
  Used a temporary hack(1) to use these options on every test in the
  testsuite and a script to check that the output never emitted an
  unmitigated RET or BR.

1) Temporary hack was a change to the testsuite to always use
`-save-temps` and run a script on the assembly output of those
compilations which produced one to ensure every RET or BR is immediately
followed by a speculation barrier.

gcc/ChangeLog:

* config/aarch64/aarch64-protos.h (aarch64_sls_barrier): New.
* config/aarch64/aarch64.c (aarch64_output_casesi): Emit
speculation barrier after BR instruction if needs be.
(aarch64_trampoline_init): Handle ptr_mode value & adjust size
of code copied.
(aarch64_sls_barrier): New.
(aarch64_asm_trampoline_template): Add needed barriers.
* config/aarch64/aarch64.h (AARCH64_ISA_SB): New.
(TARGET_SB): New.
(TRAMPOLINE_SIZE): Account for barrier.
* config/aarch64/aarch64.md (indirect_jump, *casesi_dispatch,
simple_return, *do_return, *sibcall_insn, *sibcall_value_insn):
Emit barrier if needs be, also account for possible barrier using
"sls_length" attribute.
(sls_length): New attribute.
(length): Determine default using any non-default sls_length
value.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c: New test.
* gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c:
New test.
* gcc.target/aarch64/sls-mitigation/sls-mitigation.exp: New file.
* lib/target-supports.exp (check_effective_target_aarch64_asm_sb_ok):
New proc.

Matthew Malcomson [Thu, 9 Jul 2020 08:11:58 +0000 (09:11 +0100)]

aarch64: New Straight Line Speculation (SLS) mitigation flags

Here we introduce the flags that will be used for straight line speculation.

The new flag introduced is `-mharden-sls=`.
This flag can take arguments of `none`, `all`, or a comma-separated list of one
or more of `retbr` or `blr`.
`none` indicates no special mitigation of the straight line speculation
vulnerability.
`all` requests all mitigations currently implemented.
`retbr` requests that the RET and BR instructions have a speculation barrier
inserted after them.
`blr` requests that BLR instructions are replaced by a BL to a function stub
using a BR with a speculation barrier after it.
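
The accepted argument forms can be sketched as a small parser.  This is
not aarch64_validate_sls_mitigation; the enum values and the function
name `parse_harden_sls` are invented for the example, and it assumes
that `none` and `all` are only valid on their own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bitmask of requested mitigations.  */
enum { SLS_NONE = 0, SLS_RETBR = 1, SLS_BLR = 2, SLS_ALL = 3 };

/* Parse the argument of -mharden-sls=.  Returns a mask of SLS_* bits,
   or -1 if the argument is not `none`, `all`, or a comma-separated list
   of `retbr` and `blr`.  */
static int parse_harden_sls(const char *arg)
{
  if (strcmp(arg, "none") == 0)
    return SLS_NONE;
  if (strcmp(arg, "all") == 0)
    return SLS_ALL;

  int mask = 0;
  const char *p = arg;
  for (;;)
    {
      size_t len = strcspn(p, ",");
      if (len == 5 && strncmp(p, "retbr", 5) == 0)
	mask |= SLS_RETBR;
      else if (len == 3 && strncmp(p, "blr", 3) == 0)
	mask |= SLS_BLR;
      else
	return -1;		/* Unknown or empty list element.  */
      if (p[len] == '\0')
	break;
      p += len + 1;		/* Skip the comma.  */
    }
  return mask;
}
```

With this model `retbr,blr` yields the same mask as `all`, while
`none,blr` and a trailing comma are rejected.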

Setting this on a per-function basis using attributes or the like is not
enabled, but may be in the future.

gcc/ChangeLog:

2020-06-02 Matthew Malcomson <matthew.malcomson@arm.com>

* config/aarch64/aarch64-protos.h (aarch64_harden_sls_retbr_p):
New.
(aarch64_harden_sls_blr_p): New.
* config/aarch64/aarch64.c (enum aarch64_sls_hardening_type):
New.
(aarch64_harden_sls_retbr_p): New.
(aarch64_harden_sls_blr_p): New.
(aarch64_validate_sls_mitigation): New.
(aarch64_override_options): Parse options for SLS mitigation.
* config/aarch64/aarch64.opt (-mharden-sls): New option.
* doc/invoke.texi: Document new option.

Kewen Lin [Thu, 9 Jul 2020 03:27:41 +0000 (22:27 -0500)]

vect: Enhance condition check to use partial vectors

This patch is derived from the review of vector with length patch
series.  The length-based partial vector approach doesn't support
reduction so far, so we would like to disable vectorization with
partial vectors explicitly for it in vectorizable_condition.
Otherwise, it will cause some unexpected failures for a few cases
like gcc.dg/vect/pr65947-2.c.

But if we disable it for all cases except reduction_type equal
to EXTRACT_LAST_REDUCTION, it causes one regression failure on aarch64:

  gcc.target/aarch64/sve/reduc_8.c -march=armv8.2-a+sve

The disabling means the outer loop can't work with partial vectors,
so the check fails.  But it is safe for this case to use them.  As Richard S.
pointed out in the review comments, the extra inactive lanes only
matter for double reductions, so this patch is to permit vectorization
with partial vectors for cases EXTRACT_LAST_REDUCTION or nested-cycle
reduction.

Bootstrapped/regtested on aarch64-linux-gnu.

gcc/ChangeLog:

* tree-vect-stmts.c (vectorizable_condition): Prohibit vectorization
with partial vectors explicitly excepting for EXTRACT_LAST_REDUCTION
or nested-cycle reduction.

Kewen Lin [Thu, 9 Jul 2020 03:18:54 +0000 (22:18 -0500)]

vect/testsuite: Adjust dumping for fully masking decision

As Richard S. suggested in the review of vector with length patch
series, we can use one message on "partial vectors" instead of
"fully with masking". This patch is to update the dumping string
and related test cases.

Bootstrapped/regtested on aarch64-linux-gnu.

gcc/ChangeLog:

* tree-vect-loop.c (vect_analyze_loop_2): Update dumping string
for fully masking to be more common.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/clastb_1.c: Update dumping string.
* gcc.target/aarch64/sve/clastb_2.c: Likewise.
* gcc.target/aarch64/sve/clastb_3.c: Likewise.
* gcc.target/aarch64/sve/clastb_4.c: Likewise.
* gcc.target/aarch64/sve/clastb_5.c: Likewise.
* gcc.target/aarch64/sve/clastb_6.c: Likewise.
* gcc.target/aarch64/sve/clastb_7.c: Likewise.

Kito Cheng [Tue, 7 Jul 2020 08:20:53 +0000 (16:20 +0800)]

RISC-V: Implement __builtin_thread_pointer

RISC-V has a dedicated register for the thread pointer, which is specified in the
psABI doc, so we can support __builtin_thread_pointer in a straightforward way.
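
A minimal usage sketch follows.  The wrapper name
`current_thread_pointer` is made up for this example, and the builtin
is only used when compiling for RISC-V so that the snippet still builds
on other targets, where it returns NULL.

```c
#include <stddef.h>

/* Hypothetical wrapper: returns the thread pointer on RISC-V and NULL
   on any other target.  */
static void *current_thread_pointer(void)
{
#if defined(__riscv)
  return __builtin_thread_pointer();
#else
  return NULL;
#endif
}
```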

Note: clang/llvm recently added support for __builtin_thread_pointer in its
RISC-V port.
- https://reviews.llvm.org/rGaabc24acf0d5f8677bd22fe9c108581e07c3e180

gcc/ChangeLog:

* config/riscv/riscv.md (get_thread_pointer<mode>): New.
(TP_REGNUM): Ditto.
* doc/extend.texi (Target Builtins): Add RISC-V built-in section.
Document __builtin_thread_pointer.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/read-thread-pointer.c: New.

Kito Cheng [Fri, 3 Jul 2020 05:49:51 +0000 (13:49 +0800)]

RISC-V: Disable remove unneeded save-restore call optimization if there are any arguments on stack.

- This optimization adjusts the stack, but it does not check/update other
   stack pointer use-sites.  For example, when arguments are put on the
   stack, their offsets become wrong after the optimization.

- However, adjusting stack frame usage after register allocation could be
   error prone, so we decided to turn off this optimization for such cases.

- Ye-Ting Kuo reported this issue on GitHub:
   https://github.com/riscv/riscv-gcc/pull/192

gcc/ChangeLog:

* config/riscv/riscv-sr.c (riscv_remove_unneeded_save_restore_calls):
Abort if any arguments on stack.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/save-restore-9.c: New.
