git.libre-soc.org Git

vect: Add a “very cheap” cost model

Currently we have three vector cost models: cheap, dynamic and
unlimited.  -O2 -ftree-vectorize uses “cheap” by default, but that's
still relatively aggressive about peeling and aliasing checks,
and can lead to significant code size growth.

This patch adds an even more conservative choice, which for lack of
imagination I've called “very cheap”.  It only allows vectorisation
if the vector code entirely replaces the scalar code.  It also
requires one iteration of the vector loop to pay for itself,
regardless of how often the loop iterates.  (If the vector loop
needs multiple iterations to be beneficial then things are
probably too close to call, and the conservative thing would
be to stick with the scalar code.)
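A minimal sketch of the acceptance rule described above. It assumes a simplified per-loop summary; the struct, its fields and the function name are hypothetical and are not GCC's loop_vec_info. Setup cost is included because "one iteration pays for itself" has to cover any one-off overhead as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-loop summary; not GCC's loop_vec_info.  */
struct loop_plan
{
  bool needs_peeling;       /* prologue peeling for alignment */
  bool needs_alias_checks;  /* runtime alias checks (versioning) */
  bool needs_epilogue;      /* scalar loop for leftover iterations */
  unsigned vf;              /* scalar iterations per vector iteration */
  unsigned scalar_iter_cost;
  unsigned vector_iter_cost;
  unsigned setup_cost;      /* one-off cost outside the vector loop */
};

/* Accept only if the vector loop replaces the scalar loop outright and
   a single vector iteration, including setup, is cheaper than the VF
   scalar iterations it replaces.  More iterations then only widen the
   gap, so the decision holds whatever the trip count.  */
static bool
very_cheap_accepts (const struct loop_plan *p)
{
  if (p->needs_peeling || p->needs_alias_checks || p->needs_epilogue)
    return false;
  return p->setup_cost + p->vector_iter_cost < p->vf * p->scalar_iter_cost;
}
```

With VF 4, a scalar iteration cost of 4, a vector iteration cost of 6 and a setup cost of 2, the loop is accepted (8 < 16); raising the setup cost to 12 rejects it, as does requiring any alias check.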

The idea is that this should be suitable for -O2, although the patch
doesn't change any defaults itself.

I tested this by building and running a bunch of workloads for SVE,
with three options:

  (1) -O2
  (2) -O2 -ftree-vectorize -fvect-cost-model=very-cheap
  (3) -O2 -ftree-vectorize [-fvect-cost-model=cheap]

All three builds used the default -msve-vector-bits=scalable and
ran with the minimum vector length of 128 bits, which should give
a worst-case bound for the performance impact.

The workloads included a mixture of microbenchmarks and full
applications.  Because it's quite an eclectic mix, there's not
much point giving exact figures.  The aim was more to get a general
impression.

Code size growth with (2) was much lower than with (3).  Only a
handful of tests increased by more than 5%, and all of them were
microbenchmarks.

In terms of performance, (2) was significantly faster than (1)
on microbenchmarks (as expected) but also on some full apps.
Again, performance only regressed on a handful of tests.

As expected, the performance of (3) vs. (1) and (3) vs. (2) is more
of a mixed bag.  There are several significant improvements with (3)
over (2), but also some (smaller) regressions.  That seems to be in
line with -O2 -ftree-vectorize being a kind of -O2.5.

The patch reorders vect_cost_model so that values are in order
of increasing aggressiveness, which makes it possible to use
range checks.  The value 0 still represents “unlimited”,
so “if (flag_vect_cost_model)” is still a meaningful check.
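As a sketch, assume enumerator values along these lines. The VECT_COST_MODEL_* names follow the ChangeLog, but the exact numeric values here are illustrative. Ordering by aggressiveness turns "cheap or more conservative" into a single comparison, while zero keeps its meaning:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative ordering: more negative = more conservative.  Zero is
   still "unlimited", so a plain truth test still means "some cost
   model is in force".  */
enum vect_cost_model
{
  VECT_COST_MODEL_VERY_CHEAP = -3,
  VECT_COST_MODEL_CHEAP = -2,
  VECT_COST_MODEL_DYNAMIC = -1,
  VECT_COST_MODEL_UNLIMITED = 0
};

/* Range check: true for "cheap" and anything more conservative.  */
static bool
at_most_cheap (enum vect_cost_model m)
{
  return m <= VECT_COST_MODEL_CHEAP;
}

/* Mirrors the "if (flag_vect_cost_model)" idiom.  */
static bool
cost_model_active (enum vect_cost_model m)
{
  return m ? true : false;
}
```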

gcc/
* doc/invoke.texi (-fvect-cost-model): Add a very-cheap model.
* common.opt (fvect-cost-model=): Add very-cheap as a possible option.
(fsimd-cost-model=): Likewise.
(vect_cost_model): Add very-cheap.
* flag-types.h (vect_cost_model): Add VECT_COST_MODEL_VERY_CHEAP.
Put the values in order of increasing aggressiveness.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Use
range checks when comparing against VECT_COST_MODEL_CHEAP.
(vect_prune_runtime_alias_test_list): Do not allow any alias
checks for the very-cheap cost model.
* tree-vect-loop.c (vect_analyze_loop_costing): Do not allow
any peeling for the very-cheap cost model.  Also require one
iteration of the vector loop to pay for itself.

gcc/testsuite/
* gcc.dg/vect/vect-cost-model-1.c: New test.
* gcc.dg/vect/vect-cost-model-2.c: Likewise.
* gcc.dg/vect/vect-cost-model-3.c: Likewise.
* gcc.dg/vect/vect-cost-model-4.c: Likewise.
* gcc.dg/vect/vect-cost-model-5.c: Likewise.
* gcc.dg/vect/vect-cost-model-6.c: Likewise.

libstdc++: Add missing header to some tests

These tests use std::this_thread::sleep_for without including <thread>.

libstdc++-v3/ChangeLog:

* testsuite/30_threads/async/async.cc: Include <thread>.
* testsuite/30_threads/future/members/93456.cc: Likewise.

AArch64: Add cost table for Cortex-A76

Add an initial cost table for Cortex-A76; this is copied from
cortexa57_extra_costs but updated based on the Optimization Guide.
Use the new cost table on all Neoverse tunings and ensure the tunings
are consistent for all.  As a result, more compact code is generated
with more combined shift+ALU operations.  E.g. -mcpu=cortex-a76 will now
merge the shifts in:

int f(int x, int y) { return (x & y << 3) * (x | y << 3); }

and  w2, w0, w1, lsl 3
orr  w0, w0, w1, lsl 3
mul  w0, w2, w0
ret

SPEC2017 codesize improves by 0.02% and SPECINT2017 shows 0.24% gain.

2020-11-18  Wilco Dijkstra  <wdijkstr@arm.com>

gcc/
* config/aarch64/aarch64.c (neoversen1_tunings): Use new
cortexa76_extra_costs.
(neoversev1_tunings): Likewise.
(neoversen2_tunings): Likewise.
* config/arm/aarch-cost-tables.h (cortexa76_extra_costs):
Add new costs.

AArch64: Improve inline memcpy expansion

Improve the inline memcpy expansion.  Use integer load/store for copies <= 24
bytes instead of SIMD.  Set the maximum copy to expand to 256 by default,
except that -Os or no Neon expands up to 128 bytes.  When using LDP/STP of
Q-registers, also use Q-register accesses for the unaligned tail, saving 2
instructions (e.g. all sizes up to 48 bytes emit exactly 4 instructions).
Clean up code and comments.

The codesize gain vs the GCC10 expansion is 0.05% on SPECINT2017.
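The overlapping-tail trick can be illustrated in plain C. A copy of 32 to 48 bytes is done as one 32-byte block from the start plus one 16-byte block ending exactly at the last byte, so the two regions overlap instead of needing smaller tail copies. This is a sketch of the idea only: the fixed-size memcpy calls stand in for LDP/STP and LDR/STR of Q registers, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy N bytes, 32 <= N <= 48, with two loads and two stores: bytes
   [0, 32) as one block (a Q-register pair) and bytes [N-16, N) as
   another (one Q register).  The blocks overlap when N < 48, which is
   harmless because both write the same source data.  */
static void
copy_32_to_48 (unsigned char *dst, const unsigned char *src, size_t n)
{
  unsigned char head[32], tail[16];
  assert (n >= 32 && n <= 48);
  memcpy (head, src, 32);           /* load pair */
  memcpy (tail, src + n - 16, 16);  /* load overlapping tail */
  memcpy (dst, head, 32);           /* store pair */
  memcpy (dst + n - 16, tail, 16);  /* store overlapping tail */
}

/* Expansion limit described above: 256 bytes by default, 128 bytes
   when optimizing for size or when SIMD (Neon) is unavailable.  */
static size_t
max_inline_copy (int optimize_size, int have_simd)
{
  return (optimize_size || !have_simd) ? 128 : 256;
}
```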

2020-11-03  Wilco Dijkstra  <wdijkstr@arm.com>

gcc/
* config/aarch64/aarch64.c (aarch64_expand_cpymem): Clean up code and
comments, tweak expansion decisions and improve tail expansion.

Fix PR ada/97805

We need to include limits.h (or <climits>) in adaint.c because of LLONG_MIN.

gcc/ada/ChangeLog:
PR ada/97805
* adaint.c: Include climits in C++ and limits.h otherwise.
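A minimal sketch of the conditional include, which compiles as either C or C++. The helper is hypothetical and only exists to use LLONG_MIN.

```c
#include <assert.h>

/* Pick the header that declares LLONG_MIN for the language being
   compiled: <climits> under C++, <limits.h> under C.  */
#ifdef __cplusplus
#include <climits>
#else
#include <limits.h>
#endif

/* Hypothetical helper: LLONG_MIN is the one long long value whose
   negation overflows.  */
static int
negation_overflows (long long v)
{
  return v == LLONG_MIN;
}
```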

preprocessor: main file searching

This adds the capability to locate the main file on the user or system
include paths.  That's extremely useful to users building header
units.  Searching has to be requested (plain header-unit compilation
will not search).  Also, to make include_next work as expected when
building a header unit, we add a mechanism to retrofit a non-searched
source file as one on the include path.

libcpp/
* include/cpplib.h (enum cpp_main_search): New.
(struct cpp_options): Add main_search field.
(cpp_main_loc): Declare.
(cpp_retrofit_as_include): Declare.
* internal.h (struct cpp_reader): Add main_loc field.
(_cpp_in_main_source_file): Not main if main is a header.
* init.c (cpp_read_main_file): Use main_search option to locate
main file.  Set main_loc.
* files.c (cpp_retrofit_as_include): New.

libstdc++: Move std::thread to a new header

This makes it possible to use std::thread without including the whole of
<thread>. It also makes this_thread::get_id() and this_thread::yield()
available even when there is no gthreads support (e.g. when GCC is built
with --disable-threads or --enable-threads=single).

In order for the std::thread::id return type of this_thread::get_id() to
be defined, std::thread itself is defined unconditionally. However, the
constructor that creates new threads is not defined for single-threaded
builds. The thread::join() and thread::detach() member functions are
defined inline for single-threaded builds and just throw an exception
(because we know the thread cannot be joinable if the constructor that
creates joinable threads doesn't exist).

The thread::hardware_concurrency() member function is also defined
inline and returns 0 (as suggested by the standard when the value "is
not computable or well-defined").

The main benefit for most targets is that other headers such as <future>
do not need to include the whole of <thread> just to be able to create a
std::thread. That avoids including <stop_token> and std::jthread where
not required. This is another partial fix for PR 92546.

This also means we can use this_thread::get_id() and this_thread::yield()
in <stop_token> instead of using the gthread functions directly. This
removes some preprocessor conditionals, simplifying the code.

libstdc++-v3/ChangeLog:

PR libstdc++/92546
* include/Makefile.am: Add new <bits/std_thread.h> header.
* include/Makefile.in: Regenerate.
* include/std/future: Include new header instead of <thread>.
* include/std/stop_token: Include new header instead of
<bits/gthr.h>.
(stop_token::_S_yield()): Use this_thread::yield().
(_Stop_state_t::_M_requester): Change type to std::thread::id.
(_Stop_state_t::_M_request_stop()): Use this_thread::get_id().
(_Stop_state_t::_M_remove_callback(_Stop_cb*)): Likewise.
Use __is_single_threaded() to decide whether to synchronize.
* include/std/thread (thread, operator==, this_thread::get_id)
(this_thread::yield): Move to new header.
(operator<=>, operator!=, operator<, operator<=, operator>)
(operator>=, hash<thread::id>, operator<<): Define even when
gthreads not available.
* src/c++11/thread.cc: Include <memory>.
* include/bits/std_thread.h: New file.
(thread, operator==, this_thread::get_id, this_thread::yield):
Define even when gthreads not available.
[!_GLIBCXX_HAS_GTHREADS] (thread::join, thread::detach)
(thread::hardware_concurrency): Define inline.

libstdc++: Fix overflow checks to use the correct "time_t" [PR 93456]

I recently added overflow checks to src/c++11/futex.cc for PR 93456, but
then changed the type of the timespec for PR 93421. This meant the
overflow checks were no longer using the right range, because the
variable being written to might be smaller than time_t.

This introduces a new typedef that corresponds to the tv_sec member of the
struct being passed to the syscall, and uses that typedef in the range
checks.
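The range problem can be shown with a sketch. It assumes a 32-bit tv_sec in the syscall struct while time_t is 64 bits. The typedef name follows the ChangeLog, but the concrete width, the limit macros and the helper are illustrative. The clamp has to use the limits of the narrower type, not those of time_t.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative assumption: the syscall's tv_sec is 32 bits wide even
   though time_t is 64 bits.  */
typedef int32_t syscall_time_t;
#define SYSCALL_TIME_MIN INT32_MIN
#define SYSCALL_TIME_MAX INT32_MAX

/* Narrow a 64-bit seconds count to the syscall's tv_sec type,
   saturating at that type's limits.  Checking against the limits of a
   64-bit time_t instead would never trigger, so out-of-range values
   would reach the narrowing cast below.  */
static syscall_time_t
to_syscall_seconds (int64_t secs)
{
  if (secs > SYSCALL_TIME_MAX)
    return SYSCALL_TIME_MAX;
  if (secs < SYSCALL_TIME_MIN)
    return SYSCALL_TIME_MIN;
  return (syscall_time_t) secs;
}
```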

libstdc++-v3/ChangeLog:

PR libstdc++/93421
PR libstdc++/93456
* src/c++11/futex.cc (syscall_time_t): New typedef for
the type of the syscall_timespec::tv_sec member.
(relative_timespec, _M_futex_wait_until)
(_M_futex_wait_until_steady): Use syscall_time_t in overflow
checks, not time_t.

preprocessor: main-file cleanup

In preparing module patch 7 I realized there was a cleanup I could
make to simplify it.  This is that cleanup.  Also, when doing the
cleanup I noticed some macros had been turned into inline functions,
but not renamed to the preprocessor's internal namespace
(_cpp_$INTERNAL rather than cpp_$USER).  Thus, this renames those
functions, deletes an internal field of the file structure, and
determines whether we're in the main file by comparing to
pfile->main_file, the _cpp_file of the main file.

libcpp/
* internal.h (cpp_in_system_header): Rename to ...
(_cpp_in_system_header): ... here.
(cpp_in_primary_file): Rename to ...
(_cpp_in_main_source_file): ... here.  Compare main_file equality
and check main_search value.
* lex.c (maybe_va_opt_error, _cpp_lex_direct): Adjust for rename.
* macro.c (_cpp_builtin_macro_text): Likewise.
(replace_args): Likewise.
* directives.c (do_include_next): Likewise.
(do_pragma_once, do_pragma_system_header): Likewise.
* files.c (struct _cpp_file): Delete main_file field.
(pch_open): Check pfile->main_file equality.
(make_cpp_file): Drop cpp_reader parm, don't set main_file.
(_cpp_find_file): Adjust.
(_cpp_stack_file): Check pfile->main_file equality.
(struct report_missing_guard_data): Add cpp_reader field.
(report_missing_guard): Check pfile->main_file equality.
(_cpp_report_missing_guards): Adjust.

Fix bootstrap

This fixes a typo in the TREE_CODE compare which should
compare against TYPE_DECL, not TYPE_NAME.

2020-11-19 Richard Biener <rguenther@suse.de>

* fold-const.c (operand_compare::hash_operand): Fix typo.

Fix gcc.dg/pr97897.c

This adds dg-options "" to avoid the pedantic error on _Complex int.

2020-11-19 Richard Biener <rguenther@suse.de>

* gcc.dg/pr97897.c: Add dg-options.

refactor reassoc's get_rank

This refactors things so assigned ranks are dumped and the cache
is consistently used also for PHIs.

2020-11-19 Richard Biener <rguenther@suse.de>

* tree-ssa-reassoc.c (get_rank): Refactor to consistently
use the cache and dump ranks assigned.

Fix operand_equal_p hash and compare of OBJ_TYPE_REF

* fold-const.c (operand_compare::operand_equal_p): Move OBJ_TYPE_REF
matching to correct place; drop OEP_ADDRESS_OF for TOKEN, OBJECT and
class.
(operand_compare::hash_operand): Hash ODR type for OBJ_TYPE_REF.

[3/3] [AArch64][vect] vec_widen_lshift pattern

Add aarch64 vec_widen_lshift_lo/hi patterns and fix a bug they trigger in
the mid-end.  This pattern takes one vector with N elements of size S, shifts
each element left by the element width and stores the results as N
elements of size 2*S (in 2 result vectors).  The aarch64 backend
implements this with the shll,shll2 instruction pair.
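For illustration, here is a scalar loop of the shape this pattern describes: 8-bit elements widened to 16 bits and shifted left by 8, the source element width. Whether a given GCC version actually vectorizes this loop to SHLL/SHLL2 depends on the target and options. Treat it as a sketch of the semantics, not a guaranteed codegen result. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each 8-bit element is widened to 16 bits and shifted left by 8 (the
   source element width); a vectorized form would handle the low and
   high halves of each input vector separately.  */
static void
widen_shift_by_width (uint16_t *restrict out, const uint8_t *restrict in,
                      size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint16_t) ((uint16_t) in[i] << 8);
}
```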

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md: Add vec_widen_lshift_hi/lo<mode>
patterns.
* tree-vect-stmts.c (vectorizable_conversion): Fix for widen_lshift
case.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-widen-lshift.c: New test.

[2/3] [vect] Add widening add, subtract patterns

Add widening add, subtract patterns to tree-vect-patterns. Update the
widened code of patterns that detect PLUS_EXPR to also detect
WIDEN_PLUS_EXPR. These patterns take 2 vectors with N elements of size
S and perform an add/subtract on the elements, storing the results as N
elements of size 2*S (in 2 result vectors). This is implemented in the
aarch64 backend as addl,addl2 and subl,subl2 respectively. Add aarch64
tests for patterns.
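A scalar loop of the shape WIDEN_PLUS_EXPR and WIDEN_MINUS_EXPR describe: signed 8-bit inputs and 16-bit results, so the operation cannot wrap. This kind of loop is what the new patterns are meant to map onto the widening add/subtract instruction pairs on AArch64. The function names are hypothetical, and this sketch does not guarantee any particular code generation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widening add: 8-bit inputs, 16-bit results.  */
static void
widen_add (int16_t *restrict out, const int8_t *restrict a,
           const int8_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (int16_t) (a[i] + b[i]);
}

/* Widening subtract with the same element sizes.  */
static void
widen_sub (int16_t *restrict out, const int8_t *restrict a,
           const int8_t *restrict b, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (int16_t) (a[i] - b[i]);
}
```

Extreme inputs show why the results need the wider type. 127 + 127 gives 254 and -128 + -128 gives -256, and neither value fits in 8 bits.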

gcc/ChangeLog:
* doc/generic.texi: Document new widen_plus/minus_lo/hi tree codes.
* doc/md.texi: Document new widening add/subtract hi/lo optabs.
* expr.c (expand_expr_real_2): Add widen_add, widen_subtract cases.
* optabs-tree.c (optab_for_tree_code): Add case for widening optabs.
* optabs.def (OPTAB_D): Define vectorized widen add, subtracts.
* tree-cfg.c (verify_gimple_assign_binary): Add case for widening adds,
subtracts.
* tree-inline.c (estimate_operator_cost): Add case for widening adds,
subtracts.
* tree-vect-generic.c (expand_vector_operations_1): Add case for
widening adds, subtracts.
* tree-vect-patterns.c (vect_recog_widen_add_pattern): New recog
pattern.
(vect_recog_widen_sub_pattern): New recog pattern.
(vect_recog_average_pattern): Update widened add code.
* tree-vect-stmts.c (vectorizable_conversion): Add case for widened add,
subtract.
(supportable_widening_operation): Add case for widened add, subtract.
* tree.def
(WIDEN_PLUS_EXPR): New tree code.
(WIDEN_MINUS_EXPR): New tree code.
(VEC_WIDEN_PLUS_HI_EXPR): New tree code.
(VEC_WIDEN_PLUS_LO_EXPR): New tree code.
(VEC_WIDEN_MINUS_HI_EXPR): New tree code.
(VEC_WIDEN_MINUS_LO_EXPR): New tree code.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vect-widen-add.c: New test.
* gcc.target/aarch64/vect-widen-sub.c: New test.

[1/3][aarch64] Add vec_widen patterns to aarch64

Add widening add and subtract patterns to the aarch64
backend. These take vectors of N elements of size S,
perform an add/subtract on the high or low half, widen
the resulting elements and store N/2 elements of size 2*S.
These correspond to the addl,addl2,subl,subl2 instructions.

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md: New patterns
vec_widen_saddl_lo/hi_<mode>.

tree-optimization/97901 - ICE propagating out LC PHIs

We need to fold the stmt to canonicalize MEM_REFs, which means
we're back to using replace_uses_by, which in turn means we need
dominators in order not to require a CFG cleanup upthread.

2020-11-19 Richard Biener <rguenther@suse.de>

PR tree-optimization/97901
* tree-ssa-propagate.c (clean_up_loop_closed_phi): Compute
dominators and use replace_uses_by.

* gcc.dg/torture/pr97901.c: New testcase.

Enhance debug info for fixed-point types

The Ada language supports fixed-point types as first-class citizens so
they need to be described as-is in the debug info. We devised the
langhook get_fixed_point_type_info for this purpose a few years ago,
but it comes with a limitation for the representation of the scale
factor that we would need to lift in order to be able to represent
more fixed-point types.
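A sketch of why the scale factor needs a general representation. A fixed-point value is an integer count multiplied by a scale factor, and Ada allows scale factors such as 1/3 that are neither powers of 2 nor powers of 10. Representing the factor as a numerator/denominator pair covers such cases. The struct and function below are hypothetical illustrations, not the dwarf2out.h fixed_point_type_info layout.

```c
#include <assert.h>

/* Hypothetical arbitrary scale factor: value = raw * num / den.  */
struct scale_factor
{
  long num;
  long den;
};

/* Recover the real value denoted by a raw fixed-point count.  */
static double
fixed_to_double (long raw, struct scale_factor s)
{
  return (double) raw * (double) s.num / (double) s.den;
}
```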

gcc/ChangeLog:
* dwarf2out.h (struct fixed_point_type_info) <scale_factor>: Turn
numerator and denominator into a tree.
* dwarf2out.c (base_type_die): In the case of a fixed-point type
with arbitrary scale factor, call add_scalar_info on numerator and
denominator to emit the appropriate attributes.

gcc/ada/ChangeLog:
* exp_dbug.adb (Is_Handled_Scale_Factor): Delete.
(Get_Encoded_Name): Do not call it.
* gcc-interface/decl.c (gnat_to_gnu_entity) <Fixed_Point_Type>:
Tidy up and always use a meaningful description for arbitrary
scale factors.
* gcc-interface/misc.c (gnat_get_fixed_point_type_info): Remove
obsolete block and adjust the description of the scale factor.

tree-optimization/97897 - complex lowering on abnormal edges

This fixes complex lowering to not put constants into abnormal
edge PHI values by making sure abnormally used SSA names are
VARYING in its propagation lattice.

2020-11-19 Richard Biener <rguenther@suse.de>

PR tree-optimization/97897
* tree-complex.c (complex_propagate::visit_stmt): Make sure
abnormally used SSA names are VARYING.
(complex_propagate::visit_phi): Likewise.
* tree-ssa.c (verify_phi_args): Verify PHI arguments on abnormal
edges are SSA names.

* gcc.dg/pr97897.c: New testcase.

i386: Disable *<absneg:code><mode>2_i387_1 for TARGET_SSE_MATH modes

This pattern interferes with *<absneg:code><mode>2_1 when TARGET_SSE_MATH
modes are active.  The combine pass is able to remove (use) RTXes and transform
*<absneg:code><mode>2_1 into *<absneg:code><mode>2_i387_1, where SSE
alternatives are not available.

2020-11-19 Uroš Bizjak <ubizjak@gmail.com>

gcc/
* config/i386/i386.md (*<absneg:code><mode>2_i387_1):
Disable for TARGET_SSE_MATH modes.

gcc/testsuite/
* gcc.target/i386/pr97887.c: New test.

Minor H8 shift code generation change in preparation for cc0 removal

So I didn't stay up late to work from Pago Pago this year and beat the stage1
close, but I do want to flush out the removal of cc0 from the H8 port this
cycle.  Given these patches only affect the H8 and the H8 would be killed this
cycle without the conversion, I think this is suitable even though we're past
stage1 close.

This patch addresses an initial codegen issue that would have resulted in
regressions after removal of cc0.  The compare/test eliminate pass is unable to
handle multiple clobbers.  So patterns that clobber a scratch and also clobber
a condition code are never used to eliminate a compare/test.

The H8 can shift 1 or 2 bits at a time depending on the precise model.  Not
surprisingly we have multiple strategies to implement shifts, some of which
clobber scratch registers -- but we have a clobber on every shift insn and as
a result they cannot participate in compare/test removal once cc0 is removed
from the port.

This patch removes the clobber in the initial code generation in cases where
it's obviously not needed, allowing those shifts to participate in compare/test
removal in a future patch.  It has the advantage that it also generates
slightly better code.  By installing this now the removal of cc0 is a smaller
patch, but more importantly, it allows for a more direct comparison of the
generated code before/after cc0 removal.

I've had my tester test before/after this patch with no regressions on the
major H8 multilibs.  I've also spot checked the generated code and as expected
it's ever-so-slightly better after this patch.

I'll be installing this on the trunk momentarily.  More patches will follow,
though probably not in rapid succession as my time to push this stuff is very
limited.

gcc/

* config/h8300/constraints.md (R constraint): Add argument to call
to h8300_shift_needs_scratch_p.
(S and T constraints): Similarly.
* config/h8300/h8300-protos.h: Update h8300_shift_needs_scratch_p
prototype.
* config/h8300/h8300.c (expand_a_shift): Emit a different pattern
if the shift does not require a scratch register.
(h8300_shift_needs_scratch_p): Refine to be more accurate.
* config/h8300/shiftrotate.md (shiftqi_noscratch): New pattern.
(shifthi_noscratch, shiftsi_noscratch): Similarly.

Daily bump.

Fix middle-end/85811: Introduce tree_expr_maybe_nan_p et al.

The motivation for this patch is PR middle-end/85811, a wrong-code
regression entitled "Invalid optimization with fmax, fabs and nan".
The optimization involves assuming max(x,y) is non-negative if (say)
y is non-negative, e.g. max(x,2.0).  Unfortunately, this is an invalid
assumption in the presence of NaNs.  For example, max(x,+qNaN) with IEEE fmax
semantics will always return x, even though the qNaN is non-negative.
Worse, max(x,2.0) may return a negative value if x is -sNaN.

I'll quote Joseph Myers (many thanks) who describes things clearly as:
> (a) When both arguments are NaNs, the return value should be a qNaN,
> but sometimes it is an sNaN if at least one argument is an sNaN.
> (b) Under TS 18661-1 semantics, if either argument is an sNaN then the
> result should be a qNaN (whereas if one argument is a qNaN and the
> other is not a NaN, the result should be the non-NaN argument).
> Various implementations treat sNaNs like qNaNs here.

Under this logic, the tree_expr_nonnegative_p for IEEE fmax should be:

    CASE_CFN_FMAX:
    CASE_CFN_FMAX_FN:
      /* Usually RECURSE (arg0) || RECURSE (arg1) but NaNs complicate
         things.  In the presence of sNaNs, we're only guaranteed to be
         non-negative if both operands are non-negative.  In the presence
         of qNaNs, we're non-negative if either operand is non-negative
         and can't be a qNaN, or if both operands are non-negative.  */
      if (tree_expr_maybe_signaling_nan_p (arg0)
          || tree_expr_maybe_signaling_nan_p (arg1))
        return RECURSE (arg0) && RECURSE (arg1);
      return RECURSE (arg0) ? (!tree_expr_maybe_nan_p (arg0)
                              || RECURSE (arg1))
                            : (RECURSE (arg1)
                              && !tree_expr_maybe_nan_p (arg1));
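The fmax rule can be modelled with three facts per operand. The sketch below is a hypothetical stand-in: the struct and function names are illustrative. RECURSE is represented by a precomputed nonneg flag, and the two predicates by maybe_nan and maybe_snan flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Facts a compile-time analysis might know about one operand.  */
struct fp_facts
{
  bool nonneg;      /* stands in for RECURSE (arg) */
  bool maybe_nan;   /* stands in for tree_expr_maybe_nan_p (arg) */
  bool maybe_snan;  /* stands in for tree_expr_maybe_signaling_nan_p (arg) */
};

/* Model of the CASE_CFN_FMAX logic: with possible sNaNs both operands
   must be non-negative; otherwise a non-negative operand only suffices
   if it cannot be a qNaN (fmax would discard it).  */
static bool
fmax_nonneg (struct fp_facts a0, struct fp_facts a1)
{
  if (a0.maybe_snan || a1.maybe_snan)
    return a0.nonneg && a1.nonneg;
  return a0.nonneg ? (!a0.maybe_nan || a1.nonneg)
                   : (a1.nonneg && !a1.maybe_nan);
}
```

The case from the PR corresponds to a first operand that is "non-negative" but may be a qNaN, paired with an operand of unknown sign. The model answers false for that combination, whereas the old "either operand" rule would have answered true.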

The rule above indeed resolves the wrong code in the PR.  The infrastructure that
makes this possible consists of two new functions, tree_expr_maybe_nan_p and
tree_expr_maybe_signaling_nan_p, which test whether a value may potentially
be a NaN or a signaling NaN respectively.  In fact, this patch adds seven
new predicates to the middle-end:

bool tree_expr_finite_p (const_tree);
bool tree_expr_infinite_p (const_tree);
bool tree_expr_maybe_infinite_p (const_tree);
bool tree_expr_signaling_nan_p (const_tree);
bool tree_expr_maybe_signaling_nan_p (const_tree);
bool tree_expr_nan_p (const_tree);
bool tree_expr_maybe_nan_p (const_tree);

These functions correspond to the "must" and "may" operators in modal logic,
and allow us to triage expressions in the middle-end: definitely a NaN,
definitely not a NaN, unknown at compile-time, etc.  A prime example of
the utility of these functions is that an IEEE floating point value promoted
from an integer type can't be a NaN or infinite.  Hence (double)i+0.0 where
i is an integer can be simplified to (double)i even with -fsignaling-nans.
Currently in GCC optimizations are enabled/disabled based on whether the
expression's type supports NaNs or sNaNs; with these new predicates they
can be controlled by whether the actual operands may or may not be NaNs.
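To make the may/must distinction concrete, here is a toy classifier over a hypothetical expression kind. It is not GCC's tree representation. It only illustrates why an int-to-double conversion needs no NaN handling even when the type itself supports NaNs.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical expression kinds a middle end might see.  */
enum toy_kind
{
  TOY_INT_TO_FP,  /* (double) i for an int i */
  TOY_REAL_CST,   /* a known floating-point constant */
  TOY_UNKNOWN     /* e.g. a function parameter */
};

struct toy_expr
{
  enum toy_kind kind;
  double value;   /* only meaningful for TOY_REAL_CST */
};

/* "Must": true only if the expression is definitely a NaN.  */
static bool
toy_nan_p (struct toy_expr e)
{
  return e.kind == TOY_REAL_CST && isnan (e.value);
}

/* "May": false only if the expression is definitely not a NaN.  */
static bool
toy_maybe_nan_p (struct toy_expr e)
{
  switch (e.kind)
    {
    case TOY_INT_TO_FP:
      return false;  /* every int converts to a finite double */
    case TOY_REAL_CST:
      return isnan (e.value);
    default:
      return true;   /* unknown at compile time */
    }
}
```

The three outcomes line up with the triage described above. A parameter is "may but not must", a NaN constant is both, and (double)i is neither.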

Having added these extremely useful helper functions to the middle-end,
I couldn't help but use them in a few places in fold-const.c, builtins.c
and match.pd.  In the near term, these can/should be used in places
where the tree optimizers test for HONOR_NANS, HONOR_INFINITIES or
HONOR_SNANS, or explicitly test whether a REAL_CST is a NaN or Inf.
In the longer term (I'm not volunteering) these predicates could perhaps
be hooked into the middle-end's SSA chaining and/or VRP machinery,
allowing finiteness to be propagated around the CFG, much like we
currently propagate value ranges.

This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap"
and "make -k check".
Ok for mainline?

2020-08-15  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
PR middle-end/85811
* fold-const.c (tree_expr_finite_p): New function to test whether
a tree expression must be finite, i.e. not a FP NaN or infinity.
(tree_expr_infinite_p):  New function to test whether a tree
expression must be infinite, i.e. a FP infinity.
(tree_expr_maybe_infinite_p): New function to test whether a tree
expression may be infinite, i.e. a FP infinity.
(tree_expr_signaling_nan_p): New function to test whether a tree
expression must evaluate to a signaling NaN (sNaN).
(tree_expr_maybe_signaling_nan_p): New function to test whether a
tree expression may be a signaling NaN (sNaN).
(tree_expr_nan_p): New function to test whether a tree expression
must evaluate to a (quiet or signaling) NaN.
(tree_expr_maybe_nan_p): New function to test whether a tree
expression may be a (quiet or signaling) NaN.

(tree_binary_nonnegative_warnv_p) [MAX_EXPR]: In the presence
of NaNs, MAX_EXPR is only guaranteed to be non-negative if both
operands are non-negative.
(tree_call_nonnegative_warnv_p) [CASE_CFN_FMAX,CASE_CFN_FMAX_FN]:
In the presence of signaling NaNs, fmax is only guaranteed to be
non-negative if both operands are non-negative.  In the presence of
quiet NaNs, fmax is non-negative if either operand is non-negative
and not a qNaN, or both operands are non-negative.

* fold-const.h (tree_expr_finite_p, tree_expr_infinite_p,
tree_expr_maybe_infinite_p, tree_expr_signaling_nan_p,
tree_expr_maybe_signaling_nan_p, tree_expr_nan_p,
tree_expr_maybe_nan_p): Prototype new functions here.

* builtins.c (fold_builtin_classify) [BUILT_IN_ISINF]: Fold to
a constant if argument is known to be (or not to be) an Infinity.
[BUILT_IN_ISFINITE]: Fold to a constant if argument is known to
be (or not to be) finite.
[BUILT_IN_ISNAN]: Fold to a constant if argument is known to be
(or not to be) a NaN.
(fold_builtin_fpclassify): Check tree_expr_maybe_infinite_p and
tree_expr_maybe_nan_p instead of HONOR_INFINITIES and HONOR_NANS
respectively.
(fold_builtin_unordered_cmp): Fold UNORDERED_EXPR to a constant
when its arguments are known to be (or not to be) NaNs.  Check
tree_expr_maybe_nan_p instead of HONOR_NANS when choosing between
unordered and regular forms of comparison operators.

* match.pd (ordered(x,y)->true/false): Constant fold ORDERED_EXPR
if its operands are known to be (or not to be) NaNs.
(unordered(x,y)->true/false): Constant fold UNORDERED_EXPR if its
operands are known to be (or not to be) NaNs.
(sqrt(x)*sqrt(x)->x): Check tree_expr_maybe_signaling_nan_p instead
of HONOR_SNANS.

gcc/testsuite/ChangeLog
PR middle-end/85811
* gcc.dg/pr85811.c: New test.
* gcc.dg/fold-isfinite-1.c: New test.
* gcc.dg/fold-isfinite-2.c: New test.
* gcc.dg/fold-isinf-1.c: New test.
* gcc.dg/fold-isinf-2.c: New test.
* gcc.dg/fold-isnan-1.c: New test.
* gcc.dg/fold-isnan-2.c: New test.

lto: Fix typo in comment of gcc/lto/lto-symtab.c

* lto-symtab.c (lto_symtab_merge_symbols): Fix typos in comment.

vrp: Fix operator_trunc_mod::op1_range [PR97888]

As mentioned in the PR, in (x % y) >= 0 && y >= 0, we can't deduce
x's range to be x >= 0, as e.g. -7 % 7 is 0.  But we can deduce it
from (x % y) > 0.  The patch also fixes up the comments.
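Here is a quick exhaustive check over a small range of the two facts the fix relies on, using C's truncating %. It is a sanity sketch, not a proof, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* With truncating division the remainder takes the sign of the
   dividend (or is zero).  Over [lo, hi] x [1, ymax], confirm that
   (x % y) > 0 always implies x > 0, and that (x % y) >= 0 does NOT
   imply x >= 0 (e.g. -7 % 7 == 0).  */
static bool
check_trunc_mod_facts (int lo, int hi, int ymax)
{
  bool ge0_counterexample = false;
  for (int x = lo; x <= hi; x++)
    for (int y = 1; y <= ymax; y++)
      {
        int r = x % y;
        if (r > 0 && x <= 0)
          return false;              /* would refute the new rule */
        if (r >= 0 && x < 0)
          ge0_counterexample = true; /* the old rule was wrong */
      }
  return ge0_counterexample;
}
```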

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/91029
PR tree-optimization/97888
* range-op.cc (operator_trunc_mod::op1_range): Only set op1
range to >= 0 if lhs is > 0, rather than >= 0.  Fix up comments.

* gcc.dg/pr91029.c: Add comment with PR number.
(f2): Use > 0 rather than >= 0.
* gcc.c-torture/execute/pr97888-1.c: New test.
* gcc.c-torture/execute/pr97888-2.c: New test.

plugins: Allow plugins to handle global_options changes

Any time somebody adds or removes an option in some *.opt file (which e.g.
on the 10 branch after branching off 11 happened 7 times already), many
offsets in global_options variable change and so plugins that ever access
GCC options or other global_options values are ABI dependent on it.  It is
true we don't guarantee ABI stability for plugins, but we change the most
often used data structures on the release branches only very rarely and so
the options changes are the most problematic for ABI stability of plugins.

Annobin uses a way to remap accesses to some of the global_options.x_* by
looking them up in the cl_options array where we have
offsetof (struct gcc_options, x_flag_lto)
etc. remembered, but sadly doesn't do it for all options (e.g. some flag_*
etc. option accesses may be hidden in various macros like POINTER_SIZE),
and more importantly some struct gcc_options offsets are not covered at all.
E.g. there is no offsetof (struct gcc_options, x_optimize),
offsetof (struct gcc_options, x_flag_sanitize) etc.  Those are usually
declared in the *.opt files as:
  Variable
  int optimize

The following patch allows the plugins to deal with reshuffling of even
the global_options fields that aren't tracked in cl_options by adding
another array that describes those.  This adds an 816-byte array
and 1039 bytes of string literals, so 1855 .rodata bytes in total at the moment.
The array is only added with --enable-plugin (the default); with
--disable-plugin it is not compiled in.
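A minimal sketch of how a plugin could use such a table. It assumes entries that pair a variable name with its offset in struct gcc_options. The struct layout, field names and lookup helper below are illustrative stand-ins, not the generated GCC code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct gcc_options; the real field order changes
   between releases, which is why offsets must be looked up.  */
struct toy_gcc_options
{
  int x_flag_lto;
  int x_optimize;
  unsigned x_flag_sanitize;
};

/* Name/offset pair, in the spirit of the new cl_var type.  */
struct toy_cl_var
{
  const char *var_name;
  size_t var_offset;
};

static const struct toy_cl_var toy_cl_vars[] = {
  { "optimize", offsetof (struct toy_gcc_options, x_optimize) },
  { "flag_sanitize", offsetof (struct toy_gcc_options, x_flag_sanitize) },
  { NULL, 0 }
};

/* Return a pointer to the named field inside OPTS, or NULL if the
   table has no such entry.  */
static void *
toy_lookup_var (struct toy_gcc_options *opts, const char *name)
{
  for (const struct toy_cl_var *v = toy_cl_vars; v->var_name; v++)
    if (strcmp (v->var_name, name) == 0)
      return (char *) opts + v->var_offset;
  return NULL;
}
```

A plugin built this way resolves fields by name at run time, so reordering the struct between releases does not change which field it reads.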

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

* opts.h (struct cl_var): New type.
(cl_vars): Declare.
* optc-gen.awk: Generate cl_vars array.

analyzer: only use CWE-690 for unchecked return value [PR97893]

CWE-690 is only for dereferencing an unchecked return value; for
other kinds of NULL dereference, use the parent classification, CWE-476.

gcc/analyzer/ChangeLog:
PR analyzer/97893
* sm-malloc.cc (null_deref::emit): Use CWE-476 rather than
CWE-690, as this isn't due to an unchecked return value.
(null_arg::emit): Likewise.

gcc/testsuite/ChangeLog:
PR analyzer/97893
* gcc.dg/analyzer/malloc-1.c: Add CWE-690 and CWE-476 codes to
expected output.

Objective-C++ : Avoid ICE on invalid with empty attributes.

Empty prefix attributes like:

__attribute__ (())
@interface MyClass
@end

currently cause an ICE; check for that case and skip them.

gcc/cp/ChangeLog:

* parser.c (cp_parser_objc_valid_prefix_attributes): Check
for empty attributes.

Optimize two patterns with three xors

gcc/
PR tree-optimization/96671
* match.pd (three xor patterns): New patterns.
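The commit message does not spell the patterns out. Assuming they are the simplifications (a ^ b) & ((b ^ c) ^ a) to (a ^ b) & ~c and (a ^ b) | ((b ^ c) ^ a) to (a ^ b) | c, as requested in PR 96671 (an assumption here, not taken from the patch), an exhaustive check confirms that both identities hold. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Check both identities for every 4-bit a, b, c.  Each result bit
   depends only on the same bit of the inputs, so this already covers
   every per-bit combination.  */
static bool
three_xor_identities_hold (void)
{
  for (unsigned a = 0; a < 16; a++)
    for (unsigned b = 0; b < 16; b++)
      for (unsigned c = 0; c < 16; c++)
        {
          unsigned t = a ^ b;
          if ((t & ((b ^ c) ^ a)) != (t & ~c))
            return false;
          if ((t | ((b ^ c) ^ a)) != (t | c))
            return false;
        }
  return true;
}
```

The reasoning per bit: (b ^ c) ^ a equals t ^ c, so where t is 1 it contributes ~c and where t is 0 it contributes c.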

openmp: Retire nest-var ICV for OpenMP 5.1

This removes the nest-var ICV, expressing nesting in terms of the
max-active-levels-var ICV instead.  The max-active-levels-var ICV
is now per data environment rather than per device.
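A minimal sketch of expressing the old nest-var in terms of max-active-levels-var. It assumes the supported-levels limit of UCHAR_MAX mentioned in the ChangeLog, and that enabling nesting means "more than one active level allowed". The struct and function names are hypothetical stand-ins for the libgomp internals.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define TOY_SUPPORTED_ACTIVE_LEVELS UCHAR_MAX

/* Per-data-environment ICVs: there is no separate nest-var.  */
struct toy_task_icv
{
  unsigned long max_active_levels_var;
};

/* Enabling nesting allows as many active levels as are supported;
   disabling it allows only one.  */
static void
toy_set_nested (struct toy_task_icv *icv, bool val)
{
  icv->max_active_levels_var = val ? TOY_SUPPORTED_ACTIVE_LEVELS : 1;
}

/* Nesting counts as enabled whenever more than one level may be
   active.  */
static bool
toy_get_nested (const struct toy_task_icv *icv)
{
  return icv->max_active_levels_var > 1;
}
```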

2020-11-18  Kwok Cheung Yeung  <kcy@codesourcery.com>

libgomp/
* env.c (gomp_global_icv): Remove nest_var field.  Add
max_active_levels_var field.
(gomp_max_active_levels_var): Remove.
(parse_boolean): Return true on success.
(handle_omp_display_env): Express OMP_NESTED in terms of
max_active_levels_var.  Change format specifier for
max_active_levels_var.
(initialize_env): Set max_active_levels_var from
OMP_MAX_ACTIVE_LEVELS, OMP_NESTED, OMP_NUM_THREADS and
OMP_PROC_BIND.
* icv.c (omp_set_nested): Express in terms of
max_active_levels_var.
(omp_get_nested): Likewise.
(omp_set_max_active_levels): Use max_active_levels_var field instead
of gomp_max_active_levels_var.
(omp_get_max_active_levels): Likewise.
* libgomp.h (struct gomp_task_icv): Remove nest_var field.  Add
max_active_levels_var field.
(gomp_supported_active_levels): Set to UCHAR_MAX.
(gomp_max_active_levels_var): Delete.
* libgomp.texi (omp_get_nested): Update documentation.
(omp_set_nested): Likewise.
(OMP_MAX_ACTIVE_LEVELS): Likewise.
(OMP_NESTED): Likewise.
(OMP_NUM_THREADS): Likewise.
(OMP_PROC_BIND): Likewise.
* parallel.c (gomp_resolve_num_threads): Replace reference
to nest_var with max_active_levels_var.  Use max_active_levels_var
field instead of gomp_max_active_levels_var.

Update gcc zh_TW.po.

* zh_TW.po: Update.

options, lto: Optimize streaming of optimization nodes

Honza mentioned that especially for the new param machinery, most of
streamed values are probably going to be the default values.  Perhaps
somehow we could stream them more effectively.

This patch implements it and brings further savings, the size
goes down from 574 bytes to 273 bytes, i.e. less than half.
This does not try to handle enums, because the code doesn't know whether
(enum ...) 10 is even valid; similarly non-parameters, because those generally
don't have large initializers; or params without Init (those are 0
initialized and thus don't need to be handled).
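Here is a sketch of why the XOR helps. It assumes values are written as unsigned LEB128, a variable-length integer encoding of 7 bits per byte; whether LTO streams these exact bytes is not claimed here. XOR-ing with the default turns the common case into 0, which takes a single byte, and XOR-ing again on input restores the value. The function names and the default of 500 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned LEB128: 7 bits per byte, high bit set while more bytes
   follow.  Returns the number of bytes written (at most 10).  */
static size_t
uleb128_encode (uint64_t v, unsigned char *buf)
{
  size_t n = 0;
  do
    {
      unsigned char byte = v & 0x7f;
      v >>= 7;
      if (v != 0)
        byte |= 0x80;
      buf[n++] = byte;
    }
  while (v != 0);
  return n;
}

static uint64_t
uleb128_decode (const unsigned char *buf)
{
  uint64_t v = 0;
  unsigned shift = 0;
  for (size_t i = 0;; i++)
    {
      v |= (uint64_t) (buf[i] & 0x7f) << shift;
      if ((buf[i] & 0x80) == 0)
        return v;
      shift += 7;
    }
}

/* Stream out VALUE relative to its default: an unchanged param
   becomes 0 and costs one byte.  */
static size_t
stream_out_param (uint64_t value, uint64_t dflt, unsigned char *buf)
{
  return uleb128_encode (value ^ dflt, buf);
}

/* Undo the XOR on input.  */
static uint64_t
stream_in_param (const unsigned char *buf, uint64_t dflt)
{
  return uleb128_decode (buf) ^ dflt;
}
```

Under this encoding, a param left at a default of 500 shrinks from two bytes to one, and non-default values still round-trip exactly.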

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

* optc-save-gen.awk: Initialize var_opt_init.  In
cl_optimization_stream_out for params with default values larger than
10, xor the default value with the actual parameter value.  In
cl_optimization_stream_in repeat the above xor.
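The effect of the XOR trick can be illustrated with a minimal C sketch. It assumes the values are written with a variable-length integer encoding such as unsigned LEB128 (an assumption made here; the actual GCC streamer routines are not shown), and `stream_out_param`/`stream_in_param` are hypothetical helper names. XOR-ing with the default turns a parameter that still has its default value into 0, which encodes in a single byte; repeating the XOR on input restores the original value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode V as unsigned LEB128 into BUF (7 bits per byte, high bit set
   while more bytes follow).  Returns the number of bytes written.  */
static size_t
uleb128_encode (uint64_t v, unsigned char *buf)
{
  size_t n = 0;
  do
    {
      unsigned char byte = v & 0x7f;
      v >>= 7;
      if (v != 0)
        byte |= 0x80;
      buf[n++] = byte;
    }
  while (v != 0);
  return n;
}

/* Decode an unsigned LEB128 value from BUF; store the byte count in *LEN.  */
static uint64_t
uleb128_decode (const unsigned char *buf, size_t *len)
{
  uint64_t v = 0;
  unsigned shift = 0;
  size_t n = 0;
  unsigned char byte;
  do
    {
      byte = buf[n++];
      v |= (uint64_t) (byte & 0x7f) << shift;
      shift += 7;
    }
  while (byte & 0x80);
  *len = n;
  return v;
}

/* Hypothetical: stream VALUE relative to its default DFLT.  A parameter
   that still has its default value is written as 0, i.e. one byte.  */
static size_t
stream_out_param (uint64_t value, uint64_t dflt, unsigned char *buf)
{
  return uleb128_encode (value ^ dflt, buf);
}

/* Hypothetical: the inverse, repeating the same XOR on input.  */
static uint64_t
stream_in_param (const unsigned char *buf, uint64_t dflt, size_t *len)
{
  return uleb128_decode (buf, len) ^ dflt;
}
```

Under such an encoding a small default already fits in one byte, so XOR-ing it gains little, which may be why the patch only applies the XOR to defaults larger than 10.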

configury: --enable-link-serialization support

When performing LTO bootstraps, especially when using tmpfs for /tmp,
one can bring a machine to a halt when using higher levels of parallelism
and a large number of FEs, because there are too many concurrent LTO
link commands running at the same time and each one of them puts most of the
middle-end/backend objects into /tmp.

We have the --enable-link-mutex configure option, but it has a big
problem: it decreases the number of available jobs by the number of
link commands waiting for the lock.  For example, a make -j32 build with
11 different big programs linked with $(LLINKER) ends up with just 22
effective jobs (one link runs while the other 10 each occupy a job slot
waiting for the lock), and with make -j8 and those same 11 big programs
we most likely serialize everything during linking onto a single job.

The following patch implements a new configure option,
--enable-link-serialization, which serializes the links differently.
Since it doesn't use a mutex, just reimplementing the old option
differently would be strange; we can deprecate and later remove the
old option.  The new option doesn't use any shell mutexes; it uses make
dependencies instead.

The option is implemented in the gcc/ configure script and Makefiles,
which means that, when configured this way, even make all inside gcc/
(as well as e.g. make lto-dump) will serialize and build all the
preceding large binaries.
One can always run make -j32 cc1 DO_LINK_SERIALIZATION=
to avoid that.
Furthermore, I've implemented the idea I wrote about, so that
--enable-link-serialization
is the same as
--enable-link-serialization=1
and means the large link commands are serialized.  One can use (the default)
--disable-link-serialization
which allows all links to run in parallel, but one can also use
--enable-link-serialization=3
etc., which says that at most 3 of the large link commands can run
concurrently.
And finally I've implemented (only if the serialization is enabled) simple
progress bars for the linking.
With --enable-link-serialization and e.g. the 5 large links I have in my
current tree (cc1, cc1plus, f951, lto1 and lto-dump), before the linking it
prints
Linking |==--      | 20%
and after it
Linking |====      | 40%
(each == pair stands for an already finished link, and each -- pair
stands for the link being started).
With --enable-link-serialization=3 the way the start is printed changes;
one gets:
Linking |--        | 0%
at the start of cc1 link,
Linking |>>--      | 0%
at the start of the second large link and
Linking |>>>>--    | 0%
at the start of the third large link, where the >> characters stand for
already pending links.  The printing at the end of link command is
the same as with the full serialization, i.e. for the above 3:
Linking |==        | 20%
Linking |====      | 40%
Linking |======    | 60%
but one could actually get them in any order, depending on which of those 3
finishes first.  To get it 100% accurate I'd need to add some directory with
files representing finished links or similar, which doesn't seem worth it.

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

gcc/
* configure.ac: Add $lang.prev rules, INDEX.$lang and SERIAL_LIST and
SERIAL_COUNT variables to Make-hooks.
(--enable-link-serialization): New configure option.
* Makefile.in (DO_LINK_SERIALIZATION, LINK_PROGRESS): New variables.
* doc/install.texi (--enable-link-serialization): Document.
* configure: Regenerated.
gcc/c/
* Make-lang.in (c.serial): New goal.
(.PHONY): Add c.serial c.prev.
(cc1$(exeext)): Call LINK_PROGRESS.
gcc/cp/
* Make-lang.in (c++.serial): New goal.
(.PHONY): Add c++.serial c++.prev.
(cc1plus$(exeext)): Depend on c++.prev.  Call LINK_PROGRESS.
gcc/fortran/
* Make-lang.in (fortran.serial): New goal.
(.PHONY): Add fortran.serial fortran.prev.
(f951$(exeext)): Depend on fortran.prev.  Call LINK_PROGRESS.
gcc/lto/
* Make-lang.in (lto, lto1.serial, lto2.serial): New goals.
(.PHONY): Add lto lto1.serial lto1.prev lto2.serial lto2.prev.
(lto.all.cross, lto.start.encap): Remove dependencies.
($(LTO_EXE)): Depend on lto1.prev.  Call LINK_PROGRESS.
($(LTO_DUMP_EXE)): Depend on lto2.prev.  Call LINK_PROGRESS.
gcc/objc/
* Make-lang.in (objc.serial): New goal.
(.PHONY): Add objc.serial objc.prev.
(cc1obj$(exeext)): Depend on objc.prev.  Call LINK_PROGRESS.
gcc/objcp/
* Make-lang.in (obj-c++.serial): New goal.
(.PHONY): Add obj-c++.serial obj-c++.prev.
(cc1objplus$(exeext)): Depend on obj-c++.prev.  Call LINK_PROGRESS.
gcc/ada/
* gcc-interface/Make-lang.in (ada.serial): New goal.
(.PHONY): Add ada.serial ada.prev.
(gnat1$(exeext)): Depend on ada.prev.  Call LINK_PROGRESS.
gcc/brig/
* Make-lang.in (brig.serial): New goal.
(.PHONY): Add brig.serial brig.prev.
(brig1$(exeext)): Depend on brig.prev.  Call LINK_PROGRESS.
gcc/go/
* Make-lang.in (go.serial): New goal.
(.PHONY): Add go.serial go.prev.
(go1$(exeext)): Depend on go.prev.  Call LINK_PROGRESS.
gcc/jit/
* Make-lang.in (jit.serial): New goal.
(.PHONY): Add jit.serial jit.prev.
($(LIBGCCJIT_FILENAME)): Depend on jit.prev.  Call LINK_PROGRESS.
gcc/d/
* Make-lang.in (d.serial): New goal.
(.PHONY): Add d.serial d.prev.
(d21$(exeext)): Depend on d.prev.  Call LINK_PROGRESS.

testsuite: Adjust bb-slp-pr68892.c for AArch64

AArch64 passes the "not profitable" test because it treats vec_construct
as having a high-enough cost.  This means that we can try other vector
modes, which in turn causes "BB vectorization with gaps at the end of
a load is not supported" to be printed more than once.  The number of
times that we print the message doesn't seem important, so the patch
converts it to a plain scan-tree-dump.

gcc/testsuite/
* gcc.dg/vect/bb-slp-pr68892.c: Don't XFAIL the profitability
test for aarch64*-*-*.  Allow the "BB vectorization with gaps"
message to be printed more than once.

testsuite: Adjust gcc.dg/vect/slp-21.c for Arm targets

On arm* and aarch64* targets, we can vectorise the second of the main
loops using SLP, not just the third. As the comments say, whether this
is supported depends on a very specific permutation, so it seemed better
to use direct target selectors.

gcc/testsuite/
* gcc.dg/vect/slp-21.c: Expect 4 SLP instances to be vectorized
on arm* and aarch64* targets.

testsuite: Add vect_perm3_int guards

SLP vectorisation of gcc.dg/vect/fast-math-vect-call-1.c involves
a group of 3 floats, which requires the same permutation as
vect_perm3_int.

The load/store_lanes XFAILs in gcc.dg/vect/slp-perm-6.c implicitly
assumed vect_perm3_int, which is true for Advanced SIMD but not for
VLA SVE. Whether it's true for fixed-length SVE depends on the
vector length.

The xfail selector applies on top of the target selector, so it's
not necessary to make the xfail selector a strict subset of the
target selector.

gcc/testsuite/
* gcc.dg/vect/fast-math-vect-call-1.c: Only expect SLP to be used
on vect_perm3_int targets.
* gcc.dg/vect/slp-perm-6.c: Likewise. Only XFAIL the LOAD/STORE_LANES
tests on vect_perm3_int targets.

testsuite: Add a vect_partial_vectors_usage_2 guard

We don't need an epilogue loop if the main loop can operate on
partial vectors, so this patch disables an associated test.
The alternative would be to force partial-vectors-usage=1
on the command line.

gcc/testsuite/
* gcc.dg/vect/vect-epilogues.c: XFAIL test for epilogue loop
vectorization if vect_partial_vectors_usage_2.

testsuite: Fix vect/vect-sdiv-pow2-1.c

We're now able to vectorise the set-up loop:

      int p = power2 (fns[i].po2);
      for (int j = 0; j < N; j++)
        a[j] = ((p << 4) * j) / (N - 1) - (p << 5);

This patch adds an asm to stop the loop being vectorised.
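A minimal sketch of the idea, assuming an empty volatile asm with a memory clobber is placed in the loop body. The exact asm statement the patch adds is not shown here, and N and the computation of p are hypothetical stand-ins for the test's own definitions. The asm emits no instructions, but in practice GCC's vectorizer does not handle asm statements, so the loop stays scalar while computing the same values.

```c
#include <assert.h>

#define N 16 /* hypothetical; the test defines its own N */

/* The set-up loop quoted above, with an empty asm added to the body.  */
static void
setup (int *a, int p)
{
  for (int j = 0; j < N; j++)
    {
      a[j] = ((p << 4) * j) / (N - 1) - (p << 5);
      /* Generates no code, but is opaque to the vectorizer.  */
      __asm__ __volatile__ ("" ::: "memory");
    }
}
```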

gcc/testsuite/
* gcc.dg/vect/vect-sdiv-pow2-1.c (main): Add an asm to the
set-up loop.

aix: Fixinclude

This fixes an ODR violation in the AIX headers that is detected by C++
modules. While unnamed structs with typedef names for linkage
purposes are accepted, this case is an anonymous struct without such a
typedef name -- the typedef is attached to the pointer-to-struct type.
Fixed by naming the struct.
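The distinction can be sketched in C. The type and member names below are hypothetical (the actual AIX declaration is not reproduced here). The first typedef is the problematic shape, the second is the accepted one, and the third shows the kind of fix applied: giving the struct a tag.

```c
#include <assert.h>

/* Problematic shape: the struct is anonymous and the typedef names only
   the pointer type, so the struct itself has no name for linkage.  */
typedef struct { long addr; } *anon_ptr_t;

/* Accepted shape: the typedef names the unnamed struct itself, which
   C++ uses as its name for linkage purposes.  */
typedef struct { long addr; } typedef_named_t;

/* Fixed shape: the struct gets a tag (hypothetical name).  */
typedef struct example_physaddr { long addr; } *fixed_ptr_t;
```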

fixincludes/
* inclhack.def (aix_physaddr_t): New.
* fixincl.x: Regenerated.

preprocessor: C++ module-directives

C++20 modules introduces a new kind of preprocessor directive -- a
module directive.  These are directives but without the leading '#'.
We have to detect them by sniffing the start of a logical line.  When
detected we replace the initial identifiers with unspellable tokens
and pass them through to the language parser the same way deferred
pragmas are.  There's a PRAGMA_EOL at the logical end of line too.
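As a rough illustration of sniffing the start of a logical line, the C sketch below checks whether a line begins with `module` or `import`, optionally preceded by `export`, each as a whole identifier. It is a simplified, hypothetical helper, not libcpp's cpp_maybe_module_directive: it ignores comments, line splicing, macros and the finer context rules that decide when these identifiers really start a directive.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static bool
is_ident_char (char c)
{
  return isalnum ((unsigned char) c) || c == '_';
}

/* If S starts with KW as a whole identifier, return a pointer just past
   it; otherwise return NULL.  */
static const char *
match_keyword (const char *s, const char *kw)
{
  size_t n = strlen (kw);
  if (strncmp (s, kw, n) != 0 || is_ident_char (s[n]))
    return NULL;
  return s + n;
}

static const char *
skip_space (const char *s)
{
  while (*s == ' ' || *s == '\t')
    s++;
  return s;
}

/* Simplified sniff: does LINE begin "[export] module" or "[export] import"?  */
static bool
starts_module_directive (const char *line)
{
  const char *p = skip_space (line);
  const char *q = match_keyword (p, "export");
  if (q != NULL)
    p = skip_space (q);
  return match_keyword (p, "module") != NULL
         || match_keyword (p, "import") != NULL;
}
```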

One additional complication is that we have to do header-name lexing
after the initial tokens, and that requires changes in the macro-aware
piece of the preprocessor.  The above sniffer sets a counter in the
lexer state, and that triggers at the appropriate point.  We then do
the same header-name lexing that occurs on a #include directive or
the __has_include pseudo-macro, except that the header name ends up in
the token stream.

A couple of token emitters need to deal with the new token possibility.

gcc/c-family/
* c-lex.c (c_lex_with_flags): CPP_HEADER_NAMEs can now be seen.
libcpp/
* include/cpplib.h (struct cpp_options): Add module_directives
option.
(NODE_MODULE): New node flag.
(struct cpp_hashnode): Make rid-code a bitfield, increase bits in
flags and swap with type field.
* init.c (post_options): Create module-directive identifier nodes.
* internal.h (struct lexer_state): Add directive_file_token &
n_modules fields.  Add module node enumerator.
* lex.c (cpp_maybe_module_directive): New.
(_cpp_lex_token): Call it.
(cpp_output_token): Add '"' around CPP_HEADER_NAME token.
(do_peek_ident, do_peek_module): New.
(cpp_directives_only): Detect module-directive lines.
* macro.c (cpp_get_token_1): Deal with directive_file_token
triggering.

preprocessor: Add support for header unit translation

libcpp/
* files.c (struct _cpp_file): Add header_unit field.
(_cpp_stack_file): Add header unit support.
(cpp_find_header_unit): New.
* include/cpplib.h (cpp_find_header_unit): Declare.

preprocessor: Update mkdeps for modules

This is slightly different to the original patch I posted. This adds
separate module target and dependency functions (rather than a single
bi-modal function).

libcpp/
* include/cpplib.h (struct cpp_options): Add modules to
dep-options.
* include/mkdeps.h (deps_add_module_target): Declare.
(deps_add_module_dep): Declare.
* mkdeps.c (class mkdeps): Add modules, module_name, cmi_name,
is_header_unit fields. Adjust cdtors.
(deps_add_module_target, deps_add_module_dep): New.
(make_write): Write module dependencies, if enabled.

libstdc++: Fix ranges::join_view::_Iterator::operator-> [LWG 3500]

This applies the proposed resolution of LWG 3500, which corrects the
return type and constraints of this member function to use the right
iterator type. Additionally, a nearby local variable is uglified.

libstdc++-v3/ChangeLog:

* include/std/ranges (join_view::_Iterator::_M_satisfy): Uglify
local variable inner.
(join_view::_Iterator::operator->): Use _Inner_iter instead of
_Outer_iter in the function signature as per LWG 3500.
* testsuite/std/ranges/adaptors/join.cc (test08): Test it.

[PR97870] LRA: don't remove asm goto, just nullify it.

gcc/

2020-11-18 Vladimir Makarov <vmakarov@redhat.com>

PR target/97870
* lra-constraints.c (curr_insn_transform): Do not delete asm goto
with wrong constraints.  Nullify it instead, preserving the CFG.

testsuite/libgomp.c/usleep.h: Use sleep-loop also for GCN

As typically configured, newlib's libc.a does not build 'posix' and,
hence, usleep is not available. Thus, use the same fallback as for nvptx.

libgomp/
* testsuite/libgomp.c/usleep.h (fallback_usleep): Renamed from
nvptx_usleep; use also for device={arch(gcn)}.

Fix PR ada/97859, building ada cross compiler targeting powerpc64le-linux-gnu

2020-11-18 Matthias Klose <doko@ubuntu.com>

PR ada/97859
* Makefile.rtl (powerpc% linux%): Also match powerpc64le cpu.

MSP430: Add 64-bit hardware multiply support

Hardware multipliers that support widening 32-bit multiplication can
be used to perform a 64-bit * 64-bit multiplication more efficiently
than a software implementation.

The following equation is used to perform 64-bit multiplication for
devices with "32bit" or "f5series" hardware multiply versions:

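  64bit_result = (low32_op0 * low32_op1)
                 + ((low32_op0 * high32_op1) << 32)
                 + ((high32_op0 * low32_op1) << 32)

The high32_op0 * high32_op1 term is not needed: shifted left by 64 bits,
it falls entirely outside the 64-bit result.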
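The identity can be checked with a short C sketch that builds a 64-bit product from 32 x 32 -> 64 widening multiplies only, the operation these hardware multipliers provide. The function name is hypothetical and this is not the lib2hw_mul.S code; all arithmetic is unsigned and wraps modulo 2^64, so the result matches a native 64-bit multiply.

```c
#include <assert.h>
#include <stdint.h>

/* 64 x 64 -> 64 multiply built only from 32 x 32 -> 64 products.  */
static uint64_t
mul64_via_32 (uint64_t op0, uint64_t op1)
{
  uint32_t low32_op0 = (uint32_t) op0, high32_op0 = (uint32_t) (op0 >> 32);
  uint32_t low32_op1 = (uint32_t) op1, high32_op1 = (uint32_t) (op1 >> 32);

  /* Full widening product of the low halves.  */
  uint64_t result = (uint64_t) low32_op0 * low32_op1;
  /* Cross products: only their low 32 bits survive the shift.  */
  result += ((uint64_t) low32_op0 * high32_op1) << 32;
  result += ((uint64_t) high32_op0 * low32_op1) << 32;
  return result;
}
```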

libgcc/ChangeLog:

* config/msp430/lib2hw_mul.S (mult64_hw): New.
(if MUL_32): Use mult64_hw for __muldi3.
(if MUL_F5): Use mult64_hw for __muldi3.
* config/msp430/lib2mul.c (__muldi3): New.
* config/msp430/t-msp430 (LIB2FUNCS_EXCLUDE): Define.

MSP430: Add mul{hi,si} and {u,}mulsidi3 expanders

GCC generates better code when multiplication operations that require
library functions are caught early in RTL, rather than leaving the
operation to be mapped to a library function later on.

When there is hardware multiply support, it is more efficient to perform
widening multiplication using the hardware multiplier instead of letting
GCC widen the arguments before calling the multiplication routine in the
wider mode.
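For reference, this C sketch shows the source-level shape of a widening multiply: both 32-bit operands are extended and then multiplied in 64 bits. On MSP430, where long is 32 bits and long long is 64 bits, this is SImode x SImode -> DImode, which is what the mulsidi3 and umulsidi3 standard pattern names describe; whether a particular expression is expanded through them is up to the compiler. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Signed 32 x 32 -> 64: both operands are sign-extended first.  */
static int64_t
smul_widen (int32_t a, int32_t b)
{
  return (int64_t) a * (int64_t) b;
}

/* Unsigned 32 x 32 -> 64: both operands are zero-extended first.  */
static uint64_t
umul_widen (uint32_t a, uint32_t b)
{
  return (uint64_t) a * (uint64_t) b;
}
```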

gcc/ChangeLog:

* config/msp430/msp430.md (mulhi3): New.
(mulsi3): New.
(mulsidi3): Rename to *mulsidi3_inline.
(umulsidi3): Rename to *umulsidi3_inline.
(mulsidi3): New define_expand.
(umulsidi3): New define_expand.

tree-optimization/97886 - deal with strange LC PHI nodes

This makes vectorization properly assign vector types to PHI
nodes that copy from externals on loop exit edges.

2020-11-18 Richard Biener <rguenther@suse.de>

PR tree-optimization/97886
* tree-vect-loop.c (vectorizable_lc_phi): Properly assign
vector types to invariants for SLP.

d: Fix LHS of array concatenation evaluated before the RHS.

In an array append expression:

array ~= fun(array);

The array on the left-hand side of the expression was extended before
the right-hand side was evaluated, which resulted in the new, still
uninitialized array element being used before it was set.

This fixes that so that the result of the right-hand side is always
saved in a reusable temporary before assigning to the destination.
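The ordering issue can be modelled with a C analogue. The `int_array` type, `fun` and `append_call_result` are hypothetical stand-ins for a D dynamic array, the called function and the lowering of `~=`; this is not the D front-end code. Evaluating the right-hand side into a temporary before growing the array means `fun` only ever sees initialized elements.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable array standing in for a D dynamic array.  */
struct int_array
{
  int *data;
  size_t len;
};

/* Stand-in for fun(array): reads the current last element, so it would
   read an uninitialized slot if called after the array was extended.  */
static int
fun (const struct int_array *a)
{
  return a->len ? a->data[a->len - 1] + 1 : 0;
}

/* Models "array ~= fun(array)" with the fixed evaluation order.  */
static int
append_call_result (struct int_array *a)
{
  int tmp = fun (a);  /* 1. Evaluate the RHS into a temporary.  */
  int *p = realloc (a->data, (a->len + 1) * sizeof *p);  /* 2. Grow.  */
  if (p == NULL)
    return -1;
  a->data = p;
  a->data[a->len++] = tmp;  /* 3. Store the saved result.  */
  return 0;
}
```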

gcc/d/ChangeLog:

PR d/97843
* d-codegen.cc (build_assign): Evaluate TARGET_EXPR before use in
the right hand side of an assignment.
* expr.cc (ExprVisitor::visit (CatAssignExp *)): Force a TARGET_EXPR
on the element to append if it is a CALL_EXPR.

gcc/testsuite/ChangeLog:

PR d/97843
* gdc.dg/torture/pr97843.d: New test.

d: Fix a couple of ICEs found in the dmd front-end (PR97842)

- Segmentation fault on incomplete static if.
- Segmentation fault resolving typeof() expression when gagging is on.

Reviewed-on: https://github.com/dlang/dmd/pull/11971

gcc/d/ChangeLog:

PR d/97842
* dmd/MERGE: Merge upstream dmd b6a779e49

d: Add dragonflybsd support for D compiler and runtime

gcc/ChangeLog:

* config.gcc (*-*-dragonfly*): Add dragonfly-d.o and t-dragonfly.
* config/dragonfly-d.c: New file.
* config/t-dragonfly: New file.

libphobos/ChangeLog:

* configure.tgt: Add *-*-dragonfly* as a supported target.
* configure: Regenerate.
* m4/druntime/os.m4 (DRUNTIME_OS_SOURCES): Add dragonfly* as a posix
target.

libphobos: Merge upstream phobos 7948e0967.

Removes deprecated functions from std.string module.

Reviewed-on: https://github.com/dlang/phobos/pull/7694

libphobos/ChangeLog:

* src/MERGE: Merge upstream phobos 7948e0967.

openmp: Fix ICE on non-rectangular loop with known 0 iterations

The loops in the testcase are non-rectangular and have 0 iterations
(the outer loop iterates, but the inner one never does).  In this case we
just have the overall number of iterations computed (0), and don't have
factor and other values computed.  We never need to map logical iterations
to the individual iterations in that case, and we were crashing during
expansion of that code.

2020-11-18  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/97862
* omp-expand.c (expand_omp_for_init_vars): Don't use the sqrt path
if number of iterations is constant 0.

* c-c++-common/gomp/pr97862.c: New test.

RISC-V: Support version controlling for ISA standard extensions

- New option -misa-spec=[2.2|20190608|20191213] and the
corresponding configure option --with-isa-spec.

- The current default ISA spec is 2.2, but we intend to bump this to
20191213 or later in the next release.

gcc/ChangeLog:

* common/config/riscv/riscv-common.c (riscv_ext_version): New.
(riscv_ext_version_table): Ditto.
(get_default_version): Ditto.
(riscv_subset_t::implied_p): New field.
(riscv_subset_t::riscv_subset_t): Init implied_p.
(riscv_subset_list::add): New.
(riscv_subset_list::handle_implied_ext): Pass riscv_subset_t
instead of separate arguments.
(riscv_subset_list::to_string): Handle zifencei and zicsr, and
omit version if version is unknown.
(riscv_subset_list::parsing_subset_version): New argument `ext`,
remove default_major_version and default_minor_version, get
default version info via get_default_version.
(riscv_subset_list::parse_std_ext): Update argument for
parsing_subset_version calls.
Handle the 2.2 ISA spec: always enable zicsr and zifencei, since they
were included in the baseline ISA at that time.
(riscv_subset_list::parse_multiletter_ext): Update argument for
`parsing_subset_version` and `add` calls.
(riscv_subset_list::parse): Adjust argument for
riscv_subset_list::handle_implied_ext call.
* config.gcc (riscv*-*-*): Handle --with-isa-spec=.
* config.in (HAVE_AS_MISA_SPEC): New.
(HAVE_AS_MARCH_ZIFENCEI): Ditto.
* config/riscv/riscv-opts.h (riscv_isa_spec_class): New.
(riscv_isa_spec): Ditto.
* config/riscv/riscv.h (HAVE_AS_MISA_SPEC): New.
(ASM_SPEC): Pass -misa-spec if gas supported.
* config/riscv/riscv.opt (riscv_isa_spec_class): New.
* configure.ac (HAVE_AS_MARCH_ZIFENCEI): New test.
(HAVE_AS_MISA_SPEC): Ditto.
* configure: Regen.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-9.c: New.
* gcc.target/riscv/arch-10.c: Ditto.
* gcc.target/riscv/arch-11.c: Ditto.
* gcc.target/riscv/attribute-6.c: Remove; we no longer support G
with a version.
* gcc.target/riscv/attribute-8.c: Reorder arch string to fit canonical
ordering.
* gcc.target/riscv/attribute-9.c: We don't emit version for
unknown extensions now.
* gcc.target/riscv/attribute-11.c: Add -misa-spec=2.2 flags.
* gcc.target/riscv/attribute-12.c: Ditto.
* gcc.target/riscv/attribute-13.c: Ditto.
* gcc.target/riscv/attribute-14.c: Ditto.
* gcc.target/riscv/attribute-15.c: New.
* gcc.target/riscv/attribute-16.c: Ditto.
* gcc.target/riscv/attribute-17.c: Ditto.

RISC-V: Support zicsr and zifencei extension for -march.

- CSR-related instructions and fence instructions have been split out of
the baseline ISA; zicsr and zifencei are the corresponding sub-extensions.

gcc/ChangeLog:

* common/config/riscv/riscv-common.c (riscv_implied_info):
d and f imply zicsr.
(riscv_ext_flag_table): Handle zicsr and zifencei.
* config/riscv/riscv-opts.h (MASK_ZICSR): New.
(MASK_ZIFENCEI): Ditto.
(TARGET_ZICSR): Ditto.
(TARGET_ZIFENCEI): Ditto.
* config/riscv/riscv.md (clear_cache): Check TARGET_ZIFENCEI.
(fence_i): Ditto.
* config/riscv/riscv.opt (riscv_zi_subext): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-8.c: New.
* gcc.target/riscv/attribute-14.c: Ditto.

RISC-V: Handle implied extension in canonical ordering.

- The ISA spec specifies the order of multi-letter extensions, and
implied extensions also need to be stored in canonical order, so the
easiest way is to keep the list in order during insertion.
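Keeping the list ordered at insertion time can be sketched in C as an insertion into a sorted array. The rank function is a simplification and an assumption: known single-letter extensions are ranked by the order "imafdqc" (a subset of the canonical order), and all multi-letter extensions sort after them alphabetically, which ignores the finer rules the spec gives for multi-letter extensions. The helper names are hypothetical, not those in riscv-common.c.

```c
#include <assert.h>
#include <string.h>

#define MAX_EXTS 16

/* Simplified rank: known single letters by position in "imafdqc",
   unknown single letters after them, multi-letter extensions last.  */
static int
ext_rank (const char *ext)
{
  static const char order[] = "imafdqc";
  if (ext[0] != '\0' && ext[1] == '\0')
    {
      const char *p = strchr (order, ext[0]);
      return p ? (int) (p - order) : (int) strlen (order);
    }
  return 100;
}

/* Order by rank, then alphabetically among equal ranks.  */
static int
ext_cmp (const char *a, const char *b)
{
  int ra = ext_rank (a), rb = ext_rank (b);
  if (ra != rb)
    return ra < rb ? -1 : 1;
  return strcmp (a, b);
}

/* Insert EXT into the sorted list EXTS of length *N, keeping canonical
   order; duplicates are ignored.  Returns 0 on success, -1 if full.  */
static int
insert_ext (const char *exts[], size_t *n, const char *ext)
{
  size_t i = 0;
  while (i < *n && ext_cmp (exts[i], ext) < 0)
    i++;
  if (i < *n && ext_cmp (exts[i], ext) == 0)
    return 0;
  if (*n == MAX_EXTS)
    return -1;
  memmove (&exts[i + 1], &exts[i], (*n - i) * sizeof exts[0]);
  exts[i] = ext;
  (*n)++;
  return 0;
}
```

Because every insertion, including that of an implied extension, goes through the same ordered insert, the list never needs a separate sorting pass.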

gcc/ChangeLog:

* common/config/riscv/riscv-common.c (single_letter_subset_rank): New.
(multi_letter_subset_rank): Ditto.
(subset_cmp): Ditto.
(riscv_subset_list::add): Insert subext in canonical ordering.
(riscv_subset_list::parse_std_ext): Move handle_implied_ext to ...
(riscv_subset_list::parse): ... here.

Clean up loop-closed PHIs after loop finalize

This patch propagates loop-closed PHIs out at loop_optimizer_finalize.
In some cases, cleaning up loop-closed PHIs saves work for the
optimization passes that run after loopdone.

Thanks,
Jiufu Guo.

gcc/ChangeLog:
2020-10-18  Jiufu Guo   <guojiufu@linux.ibm.com>

* cfgloop.h (loop_optimizer_finalize): Add flag argument.
* loop-init.c (loop_optimizer_finalize): Call clean_up_loop_closed_phi.
* tree-cfgcleanup.h (clean_up_loop_closed_phi): New declaration.
* tree-ssa-loop.c (tree_ssa_loop_done): Call loop_optimizer_finalize
with flag argument.
* tree-ssa-propagate.c (clean_up_loop_closed_phi): New function.

gcc/testsuite/ChangeLog:
2020-10-18  Jiufu Guo   <guojiufu@linux.ibm.com>

* gcc.dg/tree-ssa/loopclosedphi.c: New test.

cmd/go, cmd/cgo: update gofrontend mangling checks

This is a port of two patches in the master repository.

https://golang.org/cl/259298

    cmd/cgo: split gofrontend mangling checks into cmd/internal/pkgpath

    This is a step toward porting https://golang.org/cl/219817 from the
    gofrontend repo to the main repo.

    Note that this also corrects the implementation of the v2 mangling
    scheme to use ..u and ..U where appropriate.

https://golang.org/cl/259299

    cmd/go: use cmd/internal/pkgpath for gccgo pkgpath symbol

For golang/go#37272
For golang/go#41862

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/270637

Daily bump.

libstdc++: Revert changes for SYS_clock_gettime64 [PR 93421]

As discussed in the PR, it's incredibly unlikely that a system that
needs to use the SYS_clock_gettime syscall (e.g. glibc 2.16 or older) is
going to define the SYS_clock_gettime64 macro. Ancient systems that need
to use the syscall aren't going to have time64 support.

This reverts the recent changes to try and make clock_gettime syscalls
be compatible with systems that have been updated for time64 (those
changes were wrong anyway as they misspelled the SYS_clock_gettime64
macro). The changes for futex syscalls are retained, because we still
use them on modern systems that might be using time64.

To ensure that the clock_gettime syscalls are safe, configure will fail
if SYS_clock_gettime is needed, and SYS_clock_gettime64 is also defined
(but to a distinct value from SYS_clock_gettime), and the tv_sec member
of timespec is larger than long. This means we will be unable to build
on a hypothetical system where we need the time32 version of
SYS_clock_gettime but where userspace is using a time64 struct timespec.
In the unlikely event that this failure is triggered on any real
systems, we can fix it later. But we probably won't need to.
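The compile-time side of this condition can be sketched in C11. This is only an illustration under stated assumptions: it assumes Linux's <sys/syscall.h> provides the SYS_* macros and that the time32 syscall fills a kernel timespec whose tv_sec is a long, and it omits the "is the syscall needed" part of the configure test. It is not the code added to chrono.cc, and the function name is hypothetical.

```c
#include <assert.h>
#include <time.h>
#ifdef __linux__
# include <sys/syscall.h>
#endif

/* If both syscalls exist and differ, SYS_clock_gettime is the time32
   variant, so the userspace struct timespec must not have a wider
   tv_sec than long.  */
#if defined (SYS_clock_gettime) && defined (SYS_clock_gettime64) \
    && SYS_clock_gettime64 != SYS_clock_gettime
_Static_assert (sizeof (((struct timespec *) 0)->tv_sec) <= sizeof (long),
                "struct timespec is not compatible with SYS_clock_gettime");
#endif

/* Runtime view of the same size condition, for illustration.  */
static int
timespec_fits_time32_syscall (void)
{
  return sizeof (((struct timespec *) 0)->tv_sec) <= sizeof (long);
}
```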

libstdc++-v3/ChangeLog:

PR libstdc++/93421
* acinclude.m4 (GLIBCXX_ENABLE_LIBSTDCXX_TIME): Fail if struct
timespec isn't compatible with SYS_clock_gettime.
* configure: Regenerate.
* src/c++11/chrono.cc: Revert changes for time64 compatibility.
Add static_assert instead.
* src/c++11/futex.cc (_M_futex_wait_until_steady): Assume
SYS_clock_gettime can use struct timespec.

add --with-{cpu,arch,tune}-{32,64} as alias flags for --with-{cpu,arch,tune}

gcc/
* config.gcc: Add configure flags --with-{cpu,arch,tune}-{32,64}
as alias flags for --with-{cpu,arch,tune} on AArch64.
* doc/install.texi: Document new flags for aarch64.

add --with-tune configure flag

Fix a configure error on Arm64 when passing --with-tune=... to configure:
```
This target does not support --with-tune.
Valid --with options are: abi cpu arch
```
With this flag, the default target tuning can be set to a value other than generic.

gcc/
* config.gcc: Add --with-tune to AArch64 configure flags.

recognize implied ranges for modulo.

implement op1_range for modulo with implied positive and negative ranges.
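The implied ranges follow from C's truncating remainder: the result of % is zero or has the same sign as the dividend, so a positive result implies a positive first operand and a negative result implies a negative one. A small C illustration with hypothetical function names; the second function shows a condition that a range-aware optimizer could fold to false.

```c
#include <assert.h>
#include <stdbool.h>

/* Checks the implication for one pair of operands.  The caller must
   ensure y != 0 and that x % y does not overflow (INT_MIN % -1).  */
static bool
mod_implies_sign (int x, int y)
{
  int r = x % y;
  if (r > 0)
    return x > 0;  /* a positive remainder needs a positive dividend */
  if (r < 0)
    return x < 0;  /* a negative remainder needs a negative dividend */
  return true;     /* r == 0 says nothing about the sign of x */
}

/* If a % b > 0 then a > 0, so a < 0 cannot also hold.  */
static int
impossible_condition (int a, int b)
{
  return a % b > 0 && a < 0;
}
```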

gcc/
PR tree-optimization/91029
* range-op.cc (operator_trunc_mod::op1_range): New.
gcc/testsuite/
* gcc.dg/pr91029.c: New.

Fix ipa-icf ICE on variadic types

* ipa-icf.c (sem_function::hash_stmt): Fix conditional on
variably_modified_type_p.

extend cache_integer_cst

This modules-related patch extends cache_integer_cst.  Currently, when
given a small cst, that cst is added to the type's small cache and /must
not/ already be there.  Large values are fine if they are already in
the large cache.  This adds a parameter to indicate that small duplicates
are OK, and makes the function return the cached value -- either what
was already there, or the newly inserted cst.
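The changed contract can be sketched with a toy small-value cache in C. The cache, the `toy_cst` type and `cache_small_cst` are hypothetical and much simpler than tree.c; the sketch only shows the behaviour: without might_duplicate a duplicate is still treated as a bug, and with it the already-cached node is returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SMALL_CACHE_SIZE 16 /* hypothetical cache size */

struct toy_cst
{
  long value;
};

/* Hypothetical per-type cache of small constants, indexed by value.  */
static struct toy_cst *small_cache[SMALL_CACHE_SIZE];

/* Cache T and return the canonical node for its value.  A duplicate is
   only permitted when MIGHT_DUPLICATE is set, in which case the node
   that was already cached is returned instead of T.  */
static struct toy_cst *
cache_small_cst (struct toy_cst *t, bool might_duplicate)
{
  assert (t->value >= 0 && t->value < SMALL_CACHE_SIZE);
  struct toy_cst **slot = &small_cache[t->value];
  if (*slot != NULL)
    {
      assert (might_duplicate);
      return *slot;
    }
  *slot = t;
  return t;
}
```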

gcc/
* tree.h (cache_integer_cst): Add defaulted might_duplicate parm.
* tree.c (cache_integer_cst): Return the integer cst, add
might_duplicate parm to permit finding a small duplicate.

c++: duplicate block-scope extern [PR 97877]

We ICEd with a duplicated block-scope extern, as duplicate_decls was
dropping the decl_lang_specific of olddecl.  Simply adding appropriate
retrofitting and copying turned out to be insufficient, because you can
also get a block-scope using-declaration matching the extern.
The latter seems a little suspicious and I have asked CWG for advice.
While there, I made the assert about releasing olddecl's lang-specific
more robust -- if it had one, the new decl had better have one.

PR c++/97877
gcc/cp/
* decl.c (duplicate_decls): Deal with duplicated DECL_LOCAL_DECL_P
decls. Extend decl_lang_specific checking assert.
gcc/testsuite/
* g++.dg/lookup/pr97877.C: New.

global trees

This reorders the common and c++ global tree arrays. It introduces a
module-specific High Water Mark, below which are the immutable slots
initialized at startup and beyond which are the lazily filled slots
(and a few immutables we need to locate by name lookup anyway).

gcc/c-family/
* c-common.h (enum c_tree_index): Reorder to place lazy fields
after newly-added CTI_MODULE_HWM.
gcc/cp/
* cp-tree.h (enum cp_tree_index): Reorder to place lazy fields
after newly-added CPTI_MODULE_HWM.

Fortran texi: Fix description of GFC_RTCHECK_* macros.

gcc/fortran/ChangeLog:

* gfortran.texi: Fix description of GFC_RTCHECK_* to match actual
code.

IOR with nonzero, range cannot contain 0.

Remove zero from IOR ranges with non-zero masks.
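The reasoning is simple: if the mask has at least one bit set, that bit is also set in x | mask, so the result can never be 0 whatever the range of x is. A C illustration with a hypothetical function name, whose comparison a range-aware optimizer could fold to false:

```c
#include <assert.h>

/* x | 4 always has bit 2 set, so it is never zero.  */
static int
ior_is_zero (unsigned x)
{
  return (x | 4u) == 0;
}
```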

gcc/
PR tree-optimization/83072
* range-op.cc (wi_optimize_and_or): Remove zero from IOR range when
mask is non-zero.
gcc/testsuite/
* gcc.dg/pr83072.c: New.

C++ : Remove an overzealous checking assert [PR97871]

It seems we accept __attribute__(()) without any diagnostic at present,
so my added checking assert fires for something like:

__attribute__ (()) int a;

Fixed by removing the assert; in the case that the user enters something
like:

__attribute__ (()) extern "C" int foo;

the diagnostic about attributes before linkage specs will fire and show
the empty attributes.

gcc/cp/ChangeLog:

PR c++/97871
* parser.c (cp_parser_declaration): Remove checking assert.

float.h: Handle C2x __STDC_WANT_IEC_60559_EXT__

TS 18661-1 and 18661-2 have various definitions conditional on
__STDC_WANT_IEC_60559_BFP_EXT__ and __STDC_WANT_IEC_60559_DFP_EXT__
macros.  When those TSes were integrated into C2x, most of the feature
test macro conditionals were removed (with declarations for decimal FP
becoming conditional only on whether decimal FP is supported by the
implementation and those for binary FP becoming unconditionally
required).

A few tests of those feature test macros remained for declarations
that appeared only in Annex F and not in the main part of the
standard.  A change accepted for C2x at the last WG14 meeting (but not
yet added to the working draft in git) was to replace both those
macros by __STDC_WANT_IEC_60559_EXT__; if __STDC_WANT_IEC_60559_EXT__
is defined, the specific declarations in the headers will then depend
on which features are supported by the implementation, as for
declarations not controlled by a feature test macro at all.

Thus, add a check of __STDC_WANT_IEC_60559_EXT__ for CR_DECIMAL_DIG in
float.h, the only case of this change relevant to GCC.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/
2020-11-17  Joseph Myers  <joseph@codesourcery.com>

* ginclude/float.h (CR_DECIMAL_DIG): Also define for
[__STDC_WANT_IEC_60559_EXT__].

gcc/testsuite/
2020-11-17  Joseph Myers  <joseph@codesourcery.com>

* gcc.dg/cr-decimal-dig-3.c: New test.

float.h: C2x *_IS_IEC_60559 macros

C2x adds float.h macros that say whether float, double and long double
match an IEC 60559 (IEEE 754) format and operations.  Add these
macros to GCC's float.h.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.

gcc/c-family/
2020-11-17  Joseph Myers  <joseph@codesourcery.com>

* c-cppbuiltin.c (builtin_define_float_constants): Define
"*_IS_IEC_60559__" macros.

gcc/
2020-11-17  Joseph Myers  <joseph@codesourcery.com>

* ginclude/float.h [__STDC_VERSION__ > 201710L] (FLT_IS_IEC_60559,
DBL_IS_IEC_60559, LDBL_IS_IEC_60559): New macros.

gcc/testsuite/
2020-11-17  Joseph Myers  <joseph@codesourcery.com>

* gcc.dg/c11-float-6.c, gcc.dg/c2x-float-10.c: New tests.

testsuite: allow opd section

PPC64 Linux ELFv1 uses function descriptors with the descriptor
placed in the .opd section. This patch expands the pattern in the testcase
to accept .opd section name associated with the function name.

gcc/testsuite/ChangeLog:

* gcc.dg/pr25376.c: Allow .opd section.

preprocessor: new callbacks

These two callbacks are needed for C++ modules. The first is for
handling macros from header-units. These are resolved lazily. The
second is for include-translation -- whether a #include gets turned
into a header-unit import.

libcpp/
* include/cpplib.h (struct cpp_callbacks): Add
user_deferred_macro & translate_include.

libstdc++: Fix unconditional definition of __cpp_lib_span in <version> [PR 97869]

The <span> header is empty unless Concepts are supported, but <version>
defines the __cpp_lib_span feature test macro unconditionally. It should
be guarded by the same conditions as in <span>.

libstdc++-v3/ChangeLog:

PR libstdc++/97869
* include/precompiled/stdc++.h: Include <coroutine>.
* include/std/version (__cpp_lib_span): Check __cpp_lib_concepts
before defining.

preprocessor: module line maps

This patch adds LC_MODULE as a map kind, used to indicate a C++
module.  Unlike a regular source file, it only contains a single
location, and the source locations in that module are represented by
ordinary locations whose 'included_from' location is the module.

It also exposes some entry points that modules will use to create
blocks of line maps.

In the original posting, I'd missed the deletion of the
linemap_enter_macro from internal.h.  That's included here.

libcpp/
* include/line-map.h (enum lc_reason): Add LC_MODULE.
(MAP_MODULE_P): New.
(line_map_new_raw): Declare.
(linemap_enter_macro): Move declaration from internal.h.
(linemap_module_loc, linemap_module_reparent)
(linemap_module_restore): Declare.
(linemap_lookup_macro_index): Declare.
* internal.h (linemap_enter_macro): Moved to line-map.h.
* line-map.c (linemap_new_raw): New, broken out of ...
(new_linemap): ... here.  Call it.
(LAST_SOURCE_LINE_LOCATION): New.
(linemap_module_loc, linemap_module_reparent)
(linemap_module_restore): New.
(linemap_lookup_macro_index): New, broken out of ...
(linemap_macro_map_lookup): ... here.  Call it.
(linemap_dump): Add module dump.

Add MODE_OPAQUE

After discussion with Richard Sandiford on IRC, he suggested adding a
new mode class MODE_OPAQUE to deal with the problems (PR 96791) we had
been having with POImode/PXImode in powerpc target. This patch is the
accumulation of changes I needed to make to add this and make it useable
for the purposes of what power10 MMA needed.

MODE_OPAQUE modes allow you to have modes for which you can just
define loads and stores. By design, optimization does not expect to
know how to do arithmetic or subregs on these modes. This allows us to
have modes for multi-register vector operations where we don't want to
open Pandora's Box and define general arithmetic operations.

This patch will be followed by a target-specific patch to change the
powerpc power10 MMA builtins to use opaque modes, and will also let us
use the vector pair loads/stores defined with that in the inline expansion
of memcpy/memmove, allowing me to fix PR 96791.

gcc/ChangeLog:
PR target/96791
* mode-classes.def: Add MODE_OPAQUE.
* machmode.def: Add OPAQUE_MODE.
* tree.def: Add OPAQUE_TYPE for types that will use
MODE_OPAQUE.
* doc/generic.texi: Document OPAQUE_TYPE.
* doc/rtl.texi: Document MODE_OPAQUE.
* machmode.h: Add OPAQUE_MODE_P().
* genmodes.c (complete_mode): Add MODE_OPAQUE.
(opaque_mode): New function.
* tree.c (tree_code_size): Add OPAQUE_TYPE.
* tree.h: Add OPAQUE_TYPE_P().
* stor-layout.c (int_mode_for_mode): Treat MODE_OPAQUE modes
like BLKmode.
* ira.c (find_moveable_pseudos): Treat MODE_OPAQUE modes more
like integer/float modes here.
* dbxout.c (dbxout_type): Treat OPAQUE_TYPE like VOID_TYPE.
* tree-pretty-print.c (dump_generic_node): Treat OPAQUE_TYPE
like other types.

libstdc++: Fix ranges::search_n for random access iterators [PR97828]

My ranges transcription of the std::search_n implementation for random
access iterators missed a crucial part of the algorithm which the
existing tests didn't exercise.  When __remainder is less than __count
at the start of an iteration of the outer while loop, it means we're
continuing a partial match of __count - __remainder elements from the
previous iteration.  If at the end of the iteration we don't complete
this partial match, we need to reset __remainder so that it's only
offset by the size of the most recent partial match before starting the
next iteration.

This patch fixes this appropriately, mirroring how it's done in the
corresponding std::search_n implementation.
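To make the role of __remainder concrete, here is a hypothetical index-based sketch of the backtracking strategy for random access ranges, modeled on the std::search_n approach referred to above. It is not the libstdc++ code: the name search_n_backtrack, the use of std::vector<int> and the plain equality test are illustrative assumptions. The key line is the reset after a backward scan fails prematurely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not the libstdc++ code): backtracking search_n over
// a random access range, written with indices into a std::vector<int>.
// Returns the index of the first run of `count` consecutive elements equal
// to `value`, or v.size() if there is none.  Assumes count > 0.
std::size_t search_n_backtrack(const std::vector<int>& v,
                               std::ptrdiff_t count, int value)
{
  std::ptrdiff_t first = 0;  // one past the end of the current candidate
  std::ptrdiff_t tail = static_cast<std::ptrdiff_t>(v.size());  // elements after first
  std::ptrdiff_t remainder = count;  // matching elements still needed

  while (remainder <= tail)
    {
      // Jump ahead just far enough to complete a match.
      first += remainder;
      tail -= remainder;
      // Scan backwards over the newly covered elements only; any partial
      // match carried over from the previous iteration sits just before them.
      std::ptrdiff_t back = first;
      while (v[static_cast<std::size_t>(--back)] == value)
        if (--remainder == 0)
          return static_cast<std::size_t>(first - count);  // success
      // The scan stopped early: the first - back - 1 elements after `back`
      // form a partial match, so only the rest of a full match is needed.
      // This is the reset that the ranges version was missing.
      remainder = count + 1 - (first - back);
    }
  return v.size();  // failure
}
```

For example, with {0, 2, 2, 2, 0} and a count of 3, the first scan matches two 2s and fails at index 0; the reset leaves remainder at 1, so the next iteration only has to check index 3 to complete the match starting at index 1.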

libstdc++-v3/ChangeLog:

PR libstdc++/97828
* include/bits/ranges_algo.h (__search_n_fn::operator()): Check
random_access_iterator before using the backtracking
implementation.  When the backwards scan fails prematurely,
reset __remainder appropriately.
* testsuite/25_algorithms/search_n/97828.cc: New test.

preprocessor: Fix profiled bootstrap warning [pr97858]

As Jakub points out, we only ever pass a single variadic parm (if at
all), so just an optional arg is fine.
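As an illustration of the signature change only, the following hypothetical sketch takes a single optional trailing string instead of variadic arguments. The name munge_sketch and its simplified quoting (only spaces and `$`) are assumptions; the real munge in mkdeps.c may quote differently.

```cpp
#include <string>

// Hypothetical, simplified stand-in for munge: quote characters that are
// special to Make (here only spaces and '$'), then append an optional
// trailing string.  One optional argument replaces a variadic tail,
// because callers only ever pass at most one extra string.
std::string munge_sketch(const std::string& str, const char* trail = nullptr)
{
  std::string out;
  for (char c : str)
    {
      if (c == ' ')
        out += "\\ ";   // escape a space with a backslash
      else if (c == '$')
        out += "$$";    // Make spells a literal dollar as "$$"
      else
        out += c;
    }
  if (trail)
    out += trail;
  return out;
}
```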

PR preprocessor/97858
libcpp/
* mkdeps.c (munge): Drop variadic args, we only ever use one.

Improve handling of memory operands in ipa-icf 3/4

This patch is based on Martin's patch
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg02633.html
but builds on new code that tracks and compares memory accesses,
so it can be implemented correctly.

As shown here
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558773.html
the most common reason for a function body being streamed in but merging
failing is a mismatch in the base alias set.

This patch collects the base and ref types (ao_alias_ptr types), streams them
to WPA, and produces the hash at WPA time.  We can now use alias sets, since
these are assumed to be the same as the ltrans-time alias sets.  This is
currently not always true, but that is a pre-existing issue.  I will try to
produce a testcase and make a followup patch for this (one that will stream out
ODR types whose TYPE_CANONICAL is !ODR as !ODR types).  However, for this patch
this is not a problem, since the real alias sets are finer but definitely not
coarser.

We may make it possible to use the canonical type hash and save some streaming,
but I think it would be better to wait for the next stage1, since it is not
completely trivial WRT ODR types: either we hash ODR type names, and then hash
values would be too coarse for cases where we got a conflict between C and C++
types, or we do not stream them and again get into trouble with hash values
being too weak.  I tried that: we get a lot of types that are structurally the
same but distinguished by ODR names (from template instantiations).

As a followup I will add code for merging with mismatched base alias sets.  This
makes the aforementioned problem about ODR names less pronounced, but it is
still present on pointer loads/stores, which require REF alias set mismatches.

2020-11-13  Jan Hubicka  <hubicka@ucw.cz>
    Martin Liska  <mliska@suse.cz>

* ipa-icf.c: Include data-streamer.h and alias.h.
(sem_function::sem_function): Initialize memory_access_types
and m_alias_sets_hash.
(sem_function::hash_stmt): For memory accesses and when going to
do lto streaming add base and ref types into memory_access_types.
(sem_item_optimizer::write_summary): Stream memory access types.
(sem_item_optimizer::read_section): Likewise and also initialize
m_alias_sets_hash.
(sem_item_optimizer::execute): Call
sem_item_optimizer::update_hash_by_memory_access_type.
(sem_item_optimizer::update_hash_by_memory_access_type): Update.
* ipa-icf.h (sem_function): Add memory_access_types and
m_alias_sets_hash.

Make ltrans type canonicals compatible with WPA ones

This patch fixes profiledbootstrap failure with LTO enabled.
Not refining alias sets from WPA to ltrans time is a good invariant to
maintain and the canonical type hash behaves this way.  However I broke
this with the ODR logic.

Normally we define canonical types for C++ ODR types according to their
type names.  However to make ODR types compatible with C types we check
if structurally equivalent C type exists and if so, we ignore ODR
names giving up on the precision.

This, however, is not stable between WPA and ltrans, since at ltrans time
type merging does not see as many types as WPA does.  To make this
consistent, the patch makes WPA stream out ODR_TYPE_P == 0 for ODR types
that conflicted with a non-ODR type.

I had to drop one sanity check in ipa-utils.h (which I think is not very
important; I added it while introducing the CXX_ODR_P machinery).  Also, it
may now happen that we query odr_based_tbaa_p before registering the first
ODR type, so we do not want to ICE there.
ODR type registration happens early to produce ODR violation warnings.
Those are not done at ltrans, so dropping the registration is safe.  The
type will still be added while computing the type inheritance graph if
needed for devirtualization (and late devirtualization is not very
useful anyway, since it won't enable inlining).

gcc/ChangeLog:
PR bootstrap/97857
* ipa-devirt.c (odr_based_tbaa_p): Do not ICE when
odr_hash is not initialized.
* ipa-utils.h (type_with_linkage_p): Do not sanity check
CXX_ODR_P.
* tree-streamer-out.c (pack_ts_type_common_value_fields): Set
CXX_ODR_P according to the canonical type.

gcc/lto/ChangeLog:
PR bootstrap/97857
* lto-common.c (gimple_register_canonical_type_1): Only
register types with TYPE_CXX_ODR_P flag; sanity check that no
conflict happens at ltrans time.

langhooks: preprocessor hooks for c++ modules

This is a slightly modified version of 01-langhooks.diff.  I realized I
didn't need the deferred macro langhook; that can be installed directly
into the preprocessor callbacks via the preprocess_options lang hook.

gcc/
* langhooks-def.h (LANG_HOOKS_PREPROCESS_MAIN_FILE)
(LANG_HOOKS_PREPROCESS_OPTIONS, LANG_HOOKS_PREPROCESS_UNDEF)
(LANG_HOOKS_PREPROCESS_TOKEN): New.
(LANG_HOOKS_INITIALIZER): Add them.
* langhooks.h (struct lang_hooks): Add preprocess_main_file,
preprocess_options, preprocess_undef, preprocess_token hooks. Add
enum PT_flags.
gcc/c-family/
* c-lex.c: #include "langhooks.h".
(cb_undef): Maybe call preprocess_undef lang hook.
* c-opts.c (c_common_post_options): Maybe call preprocess_options
lang hook.
(push_command_line_include): Maybe call preprocess_main_file lang
hook.
(cb_file_change): Likewise.
* c-ppoutput.c: #include "langhooks.h".
(scan_translation_unit): Maybe call preprocess_token lang hook.
(class do_streamer): New, derive from token_streamer.
(directives_only_cb): Data pointer is do_streamer, call
preprocess_token lang hook.
(scan_translation_unit_directives_only): Use do_streamer.
(print_line_1): Move src_line recording to after string output.
(cb_undef): Maybe call preprocess_undef lang hook.

c-family: token streamer

This is broken out of modules patch 01-langhooks.diff.  I realized that
this part is independent, and it removes some duplicated code by migrating
it to the token_streamer class.

gcc/c-family/
* c-ppoutput.c (scan_translation_unit): Use token_streamer, remove
code duplicating that functionality.

x86: Add a testcase for PR target/31799

Add a testcase for PR target/31799 which was fixed by

commit 4f0473fe89e68bf7c09542ee5c3684da25a5b435
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Fri May 12 21:04:05 2017 +0200

    compare-elim.c (try_eliminate_compare): Canonicalize operation with embedded compare to [(set (reg:CCM) (compare:CCM...

            * compare-elim.c (try_eliminate_compare): Canonicalize
            operation with embedded compare to
            [(set (reg:CCM) (compare:CCM (operation) (immediate)))
             (set (reg) (operation)].

            * config/i386/i386.c (TARGET_FLAGS_REGNUM): New define.

in GCC 8.

PR target/31799
* gcc.target/i386/pr31799.c: New test.

aarch64: Remove XFAILs for two SVE tests

These tests started passing a while ago, so remove the XFAILs.

gcc/testsuite/
* gcc.target/aarch64/sve/cond_cnot_1.c: Remove XFAIL.
* gcc.target/aarch64/sve/cond_unary_1.c: Likewise.

PR97693: Specify required vectype in vectorizable_call

The vectorizable_call part of r11-1143 dropped the required
vectype when moving from vect_get_vec_def_for_operand to
vect_get_vec_defs_for_operand. This caused an ICE on the
testcase for SVE, because we ended up with a non-predicate
value being passed to a predicate input.

AFAICT this was the only instance of that happening. The types
seemed to get carried forward for all the other converted calls.

gcc/
PR tree-optimization/97693
* tree-vect-stmts.c (vectorizable_call): Pass the required vectype
to vect_get_vec_defs_for_operand.

gcc/testsuite/
PR tree-optimization/97693
* gcc.dg/vect/pr97693.c: New test.

testsuite: Add a vect_load_lanes guard

We still fall back to load/store-lanes for slp-46.c, if the target
supports it.

gcc/testsuite/
* gcc.dg/vect/slp-46.c: XFAIL test for SLP on vect_load_lanes targets.

testsuite: Add a vect_element_align_preferred guard

We don't try to increase the alignment of decls if
vect_element_align_preferred.

gcc/testsuite/
* gcc.dg/vect/aligned-section-anchors-nest-1.c: XFAIL alignment
test if vect_element_align_preferred.

testsuite: Adjust vect/bb-slp-subgroups-3.c for VL vectors

Because we disable the cost model, targets with variable-length
vectors can end up vectorising the store to a[0..7] on its own.
With the cost model we do something sensible.

gcc/testsuite/
* gcc.dg/vect/bb-slp-subgroups-3.c: XFAIL for variable-length vectors.

testsuite: Adjust vect/pr65947-8.c for SVE

We can vectorise vect/pr65947-8.c for SVE, as we can for GCN.

gcc/testsuite/
* gcc.dg/vect/pr65947-8.c: Expect the loop to be vectorized for SVE.

testsuite: XFAIL SLP induction tests for VL vectors

We don't yet support SLP inductions for variable-length vectors,
so this patch XFAILs some associated tests.

(Inductions aren't inherently difficult to support. It just hasn't
been done yet.)

gcc/testsuite/
* gcc.dg/vect/pr97678.c: XFAIL test for SLP vectorization
for variable-length vectors.
* gcc.dg/vect/pr97835.c: Likewise.
* gcc.dg/vect/slp-49.c: Likewise.
* gcc.dg/vect/vect-outer-slp-1.c: Likewise.
* gcc.dg/vect/vect-outer-slp-2.c: Likewise.
* gcc.dg/vect/vect-outer-slp-3.c: Likewise.

testsuite: XFAIL some SLP reduction tests for VLA SVE

For variable-length SVE, we can only use SLP for N scalars of type
T if the number of Ts in a vector is a multiple of N. For ints
this means that N must be 4 or 2, so this patch XFAILs two tests
for N==8.

The exact limit seems inherently target-specific -- variable-length
vectors with a 256-bit granule would work fine -- so I used aarch64_sve
selectors on the XFAILs.
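The divisibility argument can be written down as a small check. This is a hypothetical model of the constraint described above, not GCC's actual test; the function name and parameters are illustrative. With vectors whose length is an unknown positive multiple k of granule_bits, a vector holds k * (granule_bits / elem_bits) lanes, and N divides that for every k exactly when N divides the k = 1 lane count.

```cpp
// Hypothetical check: can an SLP group of n scalars, each elem_bits wide,
// be used for every vector length that is a positive multiple of
// granule_bits?  Lanes per vector are k * min_lanes for an unknown k >= 1,
// so n must divide min_lanes itself (the k == 1 case).
bool slp_group_fits_all_lengths(unsigned n, unsigned elem_bits,
                                unsigned granule_bits)
{
  unsigned min_lanes = granule_bits / elem_bits;
  return n != 0 && min_lanes % n == 0;
}
```

For 32-bit ints and SVE's 128-bit granule this accepts N = 1, 2 and 4 and rejects N = 8; with a 256-bit granule, N = 8 would be accepted, matching the note above.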

gcc/testsuite/
* gcc.dg/vect/slp-reduc-4.c: XFAIL test for SLP vectorization
for variable-length SVE.
* gcc.dg/vect/slp-reduc-7.c: Likewise.

testsuite: Remove XFAIL for variable-length vectors

The XFAIL for variable-length vectors is no longer needed since
we can't build the required constant vector and so fall back to
fixed-length alternatives.

gcc/testsuite/
* gcc.dg/vect/bb-slp-43.c: Remove XFAIL for vect_variable_length.

testsuite: Extend vector() regexp

For variable-length vectors, the N inside “vector(N) T” can
contain the characters ‘[’, ‘]’ and ‘,’.
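A hypothetical C++ sketch of a pattern that accepts both forms of lane count. The exact regexp used in pr91750.c may differ, and the `vector([4,4]) int` spelling is assumed here as an example of a variable-length count containing those characters.

```cpp
#include <regex>
#include <string>

// Hypothetical pattern: the lane count may be a plain integer ("4") or,
// for variable-length vectors, contain '[', ']' and ',' (e.g. "[4,4]").
bool mentions_vector_type(const std::string& dump)
{
  static const std::regex re(R"(vector\([0-9\[\],]+\) int)");
  return std::regex_search(dump, re);
}
```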

gcc/testsuite/
* gcc.dg/vect/pr91750.c: Allow "[]," inside a vector(...) lane count.

gcc: Add `ll` and `L` length modifiers for `ms_printf`

Previous code abused `FMT_LEN_L` for the `I` modifier. As `L` is a
valid modifier for `f`, `e`, `g`, etc. and `I` has the same semantics
as the C99 `z` modifier, `FMT_LEN_z` is now used instead.

First, in the Microsoft ABI, type `long double` has the same layout as
type `double`, so `%Lg` behaves identically to `%g`. Users should pass
in `double`s instead of `long double`s, as GCC uses the 10-byte format.

Second, with a CRT that is recent enough (MSVCRT since Vista, MSVCR80,
UCRT, or mingw-w64 8.0), `printf`-family functions can handle the `ll`
length modifier correctly. This ability is assumed to be available
universally. A lot of libraries (such as libgomp) that use the
`format(printf, ...)` attribute used to suffer from warnings about
unknown format specifiers.
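For illustration, a logging helper of the kind described above, declared with the `format(printf, ...)` attribute so that GCC checks its format strings. The helper name format_message is hypothetical. According to GCC's documentation, on Microsoft Windows targets the plain `printf` archetype refers to the formats accepted by msvcrt.dll, which is where `%lld` previously drew warnings; this sketch builds on any GCC target.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper: arguments 1 (format) and 2.. (values) are checked by
// GCC exactly as for printf, so "%lld" must be given a long long.
__attribute__((format(printf, 1, 2)))
std::string format_message(const char* fmt, ...)
{
  char buf[256];
  va_list ap;
  va_start(ap, fmt);
  std::vsnprintf(buf, sizeof buf, fmt, ap);  // truncates if too long
  va_end(ap);
  return buf;
}
```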

Reference: https://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2008/tcxf1dw6(v=vs.90)
Reference: https://docs.microsoft.com/en-us/cpp/porting/visual-cpp-what-s-new-2003-through-2015#new-crt-features
Signed-off-by: Liu Hao <lh_mouse@126.com>
gcc/ChangeLog:
* config/i386/msformat-c.c: Add more length modifiers.

gcc/testsuite/ChangeLog:
* gcc.dg/format/ms_c99-printf-3.c: Update tests.

MingW: Don't add suffix for nul device

This patch fixes an issue where, on systems that define
HAVE_TARGET_EXECUTABLE_SUFFIX, the driver calls convert_filename in order to
add the suffix to the filename.  However, while it excludes `-`, it doesn't
exclude the null device.  This patch changes the check to exclude anything
that is not a file by calling not_actual_file_p instead.

This also fixes a bug in not_actual_file_p, which was accidentally testing
the global variable output_file instead of the supplied argument.  This
hasn't been an issue so far because not_actual_file_p was only used
on output_file until now.

This stops the driver from adding an extension to the nul device, which goes
against the recommendations on MSDN [0] and makes it harder for the next tool
in line to detect it.
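A hypothetical, simplified model of the fixed logic. The helper names, the explicit bit_bucket parameter and the "no dot in the base name" rule are assumptions for illustration, not the driver's actual code. Note that not_actual_file inspects its argument rather than a global, which is the second bug described above.

```cpp
#include <cstring>
#include <string>

// Hypothetical: "-" (stdin/stdout) and the host's null device are not
// actual files.  The check uses its argument, not a global variable.
bool not_actual_file(const char* name, const char* bit_bucket)
{
  return std::strcmp(name, "-") == 0 || std::strcmp(name, bit_bucket) == 0;
}

// Hypothetical: append the executable suffix only to actual files whose
// base name does not already have an extension.
std::string add_executable_suffix(const std::string& name,
                                  const char* bit_bucket,
                                  const std::string& suffix)
{
  if (not_actual_file(name.c_str(), bit_bucket))
    return name;
  std::string::size_type slash = name.find_last_of("/\\");
  std::string::size_type base = (slash == std::string::npos) ? 0 : slash + 1;
  if (name.find('.', base) != std::string::npos)
    return name;  // already has an extension
  return name + suffix;
}
```

Without the not_actual_file check, "nul" has no extension and would become "nul.exe".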

Bootstrapped and regtested on x86_64-w64-mingw32 with no issues.
Also bootstrapped on x86_64-pc-linux-gnu, but did not regtest there, as it
is not a HAVE_TARGET_EXECUTABLE_SUFFIX system.

[0] https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file

gcc/ChangeLog:

PR driver/97574
* gcc.c (convert_filename): Don't add suffix to things that are
not files.
(not_actual_file_p): Use supplied argument.

c: Reject _Atomic type * as last argument to __builtin_*_overflow [PR90628]

While implementing __builtin_clear_padding, I noticed that we don't
diagnose _Atomic whatever * as the last argument to __builtin_*_overflow.
As the store performed by that builtin isn't atomic in any way, I think we
should reject it.
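For reference, the accepted form passes a pointer to a plain, non-atomic object as the last argument. In the hypothetical helper below (checked_add is an illustrative name, and C++17's std::optional is used for the result), `&sum` is an `int *`; with this change, GCC's C front end rejects a pointer to an `_Atomic` type in that position.

```cpp
#include <optional>

// Hypothetical helper: returns a + b, or std::nullopt if the exact sum does
// not fit in an int.  __builtin_add_overflow returns true on overflow and
// performs an ordinary (non-atomic) store of the wrapped result to *(&sum).
std::optional<int> checked_add(int a, int b)
{
  int sum;
  if (__builtin_add_overflow(a, b, &sum))
    return std::nullopt;
  return sum;
}
```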

2020-11-17 Jakub Jelinek <jakub@redhat.com>

PR c/90628
* c-common.c (check_builtin_function_arguments)
<case BUILT_IN_ADD_OVERFLOW>: Diagnose when last argument is pointer
to _Atomic. For the TYPE_READONLY case, adjust message to be usable
for more builtins and argument positions.

* gcc.dg/builtin-arith-overflow-4.c: New test.

guality: Workaround for guality/pr59776.c testcase

The test was added 3 years before the noipa attribute was introduced,
but even then I wanted to keep IPA opts out of the way.  Most of the foo
function is optimized away, and the debug info just points to the caller's
var.  With the recent modref/aliasing changes, the caller's store to the
variable whose address it passes to the function is optimized away too.

I think we should just use noipa to avoid this, though in the longer term
we should perhaps think about debug info improvements to deal with this.

Before dse1, the caller had:
  # DEBUG BEGIN_STMT
  x.f = 5.0e+0;
  x.g = 6.0e+0;
  # DEBUG BEGIN_STMT
  foo (&x);
  # DEBUG BEGIN_STMT
  x ={v} {CLOBBER};
and the x.f and x.g stores are optimized away.  If we had a way to pretend
the memory contains those values anyway...

Tested on x86_64-linux, fixes the guality regressions
+FAIL: gcc.dg/guality/pr59776.c   -O1  -DPREVENT_OPTIMIZATION  line pr59776.c:17 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O1  -DPREVENT_OPTIMIZATION  line pr59776.c:17 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -O1  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O1  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -O1  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s2.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O2  -DPREVENT_OPTIMIZATION  line pr59776.c:17 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O2  -DPREVENT_OPTIMIZATION  line pr59776.c:17 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -O2  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O2  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -O2  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s2.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  -DPREVENT_OPTIMIZATION line pr59776.c:17 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  -DPREVENT_OPTIMIZATION line pr59776.c:17 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  -DPREVENT_OPTIMIZATION line pr59776.c:20 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  -DPREVENT_OPTIMIZATION line pr59776.c:20 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -O2 -flto -fno-use-linker-plugin -flto-partition=none  -DPREVENT_OPTIMIZATION line pr59776.c:20 s2.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line pr59776.c:17 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line pr59776.c:17 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line pr59776.c:20 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line pr59776.c:20 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line pr59776.c:20 s2.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O3 -g  -DPREVENT_OPTIMIZATION  line pr59776.c:17 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O3 -g  -DPREVENT_OPTIMIZATION  line pr59776.c:17 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -O3 -g  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -O3 -g  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -O3 -g  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s2.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -Os  -DPREVENT_OPTIMIZATION  line pr59776.c:17 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -Os  -DPREVENT_OPTIMIZATION  line pr59776.c:17 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -Os  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s1.f == 5.0
+FAIL: gcc.dg/guality/pr59776.c   -Os  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s1.g == 6.0
+FAIL: gcc.dg/guality/pr59776.c   -Os  -DPREVENT_OPTIMIZATION  line pr59776.c:20 s2.f == 5.0
introduced in the last 2 days.

2020-11-17  Jakub Jelinek  <jakub@redhat.com>

* gcc.dg/guality/pr59776.c (foo): Use noipa attribute instead of
noinline, noclone.

Relocatable read-only section support for absolute jump table

This patch puts absolute jump tables into a relocatable read-only section
if the target is ELF and relocation is supported.

gcc/ChangeLog:

* final.c (final_scan_insn_1): Set jump table relocatable as the
second argument of targetm.asm_out.function_rodata_section.
* output.h (default_function_rodata_section,
default_no_function_rodata_section): Add the second argument to the
declarations.
* target.def (function_rodata_section): Change the doc and add
the second argument.
* doc/tm.texi: Regenerate.
* varasm.c (jumptable_relocatable): Implement.
(default_function_rodata_section): Add the second argument
and the support for relocatable read only sections.
(default_no_function_rodata_section): Add the second argument.
(function_mergeable_rodata_prefix): Set the second argument to false.
* config/mips/mips.c (mips_function_rodata_section): Add the second
argument and set it to false.
* config/s390/s390.c (targetm.asm_out.function_rodata_section): Set
the second argument to false.
* config/s390/s390.md: Likewise.