tree-optimization/98235 - limit SLP discovery
With following backedges and the SLP discovery cache not being
permute aware we have to put some discovery limits in place again.
That's also the opportunity to ditch the separate limit on the
number of permutes we try, so the patch limits the overall work
done (as in vect_build_slp_tree cache misses) to what we compute
as max_tree_size which is based on the number of scalar stmts in
the vectorized region.
Note the limit is global and there's no attempt to divide the
allowed work evenly amongst opportunities, so one degenerate
can eat it all up. That's probably only relevant for BB
vectorization where the limit is based on up to the size of the
whole function.
2020-12-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/98235
* tree-vect-slp.c (vect_build_slp_tree): Exchange npermutes
for limit. Decrement that for each cache miss and fail
discovery when it reaches zero.
(vect_build_slp_tree_2): Remove npermutes handling and
simply pass down limit.
(vect_build_slp_instance): Use pass down limit.
(vect_analyze_slp_instance): Likewise.
(vect_analyze_slp): Base the SLP discovery limit on
max_tree_size and pass it down.
* gcc.dg/torture/pr98235.c: New testcase.