improve BB vectorization dump locations
This tries to improve BB vectorization dumps by providing more
precise locations. Currently the vect_location is simply the
very last stmt in a basic-block that has a location. So for
double a[4], b[4];
int x[4], y[4];
void foo()
{
a[0] = b[0]; // line 5
a[1] = b[1];
a[2] = b[2];
a[3] = b[3];
x[0] = y[0]; // line 9
x[1] = y[1];
x[2] = y[2];
x[3] = y[3];
} // line 13
we show the user with -O3 -fopt-info-vec
t.c:13:1: optimized: basic block part vectorized using 16 byte vectors
while with the patch we point to both independently vectorized
opportunities:
t.c:5:8: optimized: basic block part vectorized using 16 byte vectors
t.c:9:8: optimized: basic block part vectorized using 16 byte vectors
there's the possibility that the location regresses in case the
root stmt in the SLP instance has no location. For a SLP subgraph
with multiple entries the location also chooses one entry at random,
not sure in which case we want to dump both.
Still as the plan is to extend the basic-block vectorization
scope from single basic-block to multiple ones this is a first
step to preserve something sensible.
Implementation-wise this makes both costing and code-generation
happen on the subgraphs as analyzed.
2020-09-11 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (_slp_instance::location): New method.
(vect_schedule_slp): Adjust prototype.
* tree-vectorizer.c (vec_info::remove_stmt): Adjust
the BB region begin if we removed the stmt it points to.
* tree-vect-loop.c (vect_transform_loop): Adjust.
* tree-vect-slp.c (_slp_instance::location): Implement.
(vect_analyze_slp_instance): For BB vectorization set
vect_location to that of the instance.
(vect_slp_analyze_operations): Likewise.
(vect_bb_vectorization_profitable_p): Remove wrapper.
(vect_slp_analyze_bb_1): Remove cost check here.
(vect_slp_region): Cost check and code generate subgraphs separately,
report optimized locations and missed optimizations due to
profitability for each of them.
(vect_schedule_slp): Get the vector of SLP graph entries to
vectorize as argument.