SLP reductions with variable-length vectors
Two things stopped us using SLP reductions with variable-length vectors:
(1) We didn't have a way of constructing the initial vector.
This patch does it by creating a vector full of the neutral
identity value and then using a shift-and-insert function
to insert any non-identity inputs into the low-numbered elements.
(The non-identity values are needed for double reductions.)
Alternatively, for unchained MIN/MAX reductions that have no neutral
value, we instead use the same duplicate-and-interleave approach as
for SLP constant and external definitions (added by a previous
patch).
(2) The epilogue for constant-length vectors would extract the vector
elements associated with each SLP statement and do scalar arithmetic
on these individual elements. For variable-length vectors, the patch
instead creates a reduction vector for each SLP statement, replacing
the elements for other SLP statements with the identity value.
It then uses a hardware reduction instruction on each vector.
2018-01-13 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/md.texi (vec_shl_insert_@var{m}): New optab.
* internal-fn.def (VEC_SHL_INSERT): New internal function.
* optabs.def (vec_shl_insert_optab): New optab.
* tree-vectorizer.h (can_duplicate_and_interleave_p): Declare.
(duplicate_and_interleave): Likewise.
* tree-vect-loop.c: Include internal-fn.h.
(neutral_op_for_slp_reduction): New function, split out from
get_initial_defs_for_reduction.
(get_initial_def_for_reduction): Handle option 2 for variable-length
vectors by loading the neutral value into a vector and then shifting
the initial value into element 0.
(get_initial_defs_for_reduction): Replace the code argument with
the neutral value calculated by neutral_op_for_slp_reduction.
Use gimple_build_vector for constant-length vectors.
Use IFN_VEC_SHL_INSERT for variable-length vectors if all
but the first group_size elements have a neutral value.
Use duplicate_and_interleave otherwise.
(vect_create_epilog_for_reduction): Take a neutral_op parameter.
Update call to get_initial_defs_for_reduction. Handle SLP
reductions for variable-length vectors by creating one vector
result for each scalar result, with the elements associated
with other scalar results stubbed out with the neutral value.
(vectorizable_reduction): Call neutral_op_for_slp_reduction.
Require IFN_VEC_SHL_INSERT for double reductions on
variable-length vectors, or SLP reductions that have
a neutral value. Require can_duplicate_and_interleave_p
support for variable-length unchained SLP reductions if there
is no neutral value, such as for MIN/MAX reductions. Also require
the number of vector elements to be a multiple of the number of
SLP statements when doing variable-length unchained SLP reductions.
Update call to vect_create_epilog_for_reduction.
* tree-vect-slp.c (can_duplicate_and_interleave_p): Make public
and remove initial values.
(duplicate_and_interleave): Make public.
* config/aarch64/aarch64.md (UNSPEC_INSR): New unspec.
* config/aarch64/aarch64-sve.md (vec_shl_insert_<mode>): New insn.
gcc/testsuite/
* gcc.dg/vect/pr37027.c: Remove XFAIL for variable-length vectors.
* gcc.dg/vect/pr67790.c: Likewise.
* gcc.dg/vect/slp-reduc-1.c: Likewise.
* gcc.dg/vect/slp-reduc-2.c: Likewise.
* gcc.dg/vect/slp-reduc-3.c: Likewise.
* gcc.dg/vect/slp-reduc-5.c: Likewise.
* gcc.target/aarch64/sve/slp_5.c: New test.
* gcc.target/aarch64/sve/slp_5_run.c: Likewise.
* gcc.target/aarch64/sve/slp_6.c: Likewise.
* gcc.target/aarch64/sve/slp_6_run.c: Likewise.
* gcc.target/aarch64/sve/slp_7.c: Likewise.
* gcc.target/aarch64/sve/slp_7_run.c: Likewise.
Co-Authored-By: Alan Hayward <alan.hayward@arm.com>
Co-Authored-By: David Sherwood <david.sherwood@arm.com>
From-SVN: r256623
22 files changed: