tree-optimization/98855 - redo BB vectorization costing
The following attempts to account for the fact that BB vectorization
regions can now span multiple loop levels, and that an unprofitable
inner loop vectorization shouldn't be offset by a profitable
outer loop vectorization, making the whole appear profitable.
For now I've implemented a heuristic based on the premise that
vectorization should be profitable even if a loop is never entered
or iterates an arbitrary number of times. The first assumption in
particular requires that stmts directly belonging to loop A be
costed separately from stmts belonging to any other loop, which
also simplifies the implementation.
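The per-loop check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual tree-vect-slp.c code. The names `region_cost` and `bb_vect_profitable_p` are hypothetical. The sketch assumes each stmt's scalar and vector cost can be attributed to the loop it directly belongs to, and it treats equal cost as profitable. The grouping sort only stands in for the role a comparator such as `li_cost_vec_cmp` might play.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-stmt cost record; not GCC's internal representation.
struct region_cost
{
  int loop_num;          // number of the loop the stmt directly belongs to
  unsigned scalar_cost;  // cost of keeping the stmt scalar
  unsigned vector_cost;  // share of the vector cost attributed to the stmt
};

// Return true only if the part of the region in each loop is profitable
// on its own, so a cheap outer loop cannot pay for an expensive inner one.
bool
bb_vect_profitable_p (std::vector<region_cost> costs)
{
  // Group the records by loop number so each loop's stmts are adjacent.
  std::sort (costs.begin (), costs.end (),
             [] (const region_cost &a, const region_cost &b)
             { return a.loop_num < b.loop_num; });

  std::size_t i = 0;
  while (i < costs.size ())
    {
      const int loop = costs[i].loop_num;
      unsigned scalar = 0, vec = 0;
      // Sum the costs of all stmts directly belonging to this loop.
      for (; i < costs.size () && costs[i].loop_num == loop; ++i)
        {
          scalar += costs[i].scalar_cost;
          vec += costs[i].vector_cost;
        }
      // Assumption: equal cost counts as profitable.
      if (vec > scalar)
        return false;
    }
  return true;
}
```

For example, `{{1, 100, 30}, {1, 50, 20}, {2, 10, 40}}` totals 90 vector against 160 scalar over the whole region. The sketch still rejects it, because the part in loop 2 costs 40 vectorized against 10 scalar.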
On x86 the added testcase reports for the outer loop
t.c:38:20: note: Cost model analysis for part in loop 1:
Vector cost: 56
Scalar cost: 192
and for the inner loop
t.c:38:20: note: Cost model analysis for part in loop 2:
Vector cost: 132
Scalar cost: 48
and thus the vectorization is considered unprofitable
(the same would happen if the second part were in a loop
enclosing the loop of the first part).
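The arithmetic below shows why costing the loops separately matters for these numbers. It assumes that a single whole-region comparison would simply add the two parts. That is an illustrative simplification, not necessarily the exact previous cost model.

```latex
\begin{aligned}
\text{vector}_{\text{total}} &= 56 + 132 = 188 \\
\text{scalar}_{\text{total}} &= 192 + 48 = 240 \\
188 &< 240 \quad \text{(appears profitable when costed as one region)} \\
132 &> 48  \quad \text{(the part in loop 2 alone is unprofitable)}
\end{aligned}
```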
Future enhancements may use static knowledge of whether
a loop is always entered, which would allow tolerating some
inefficiency in the vectorization of its loop header. Likewise,
stmts only reachable from a loop exit could be treated this way.
2021-02-05 Richard Biener <rguenther@suse.de>
PR tree-optimization/98855
* tree-vectorizer.h (add_stmt_cost): New overload.
* tree-vect-slp.c (li_cost_vec_cmp): New.
(vect_bb_slp_scalar_cost): Cost individual loop regions
separately. Account for the scalar instance root stmt.
* g++.dg/vect/slp-pr98855.cc: New testcase.