pass: Run cleanup passes before SLP [PR96789]
As discussed in PR96789, some scalar statements that passes running
after SLP would eliminate are still included in the cost model when
SLP vectorization is attempted, which can skew the vectorizer's
decision.  A typical example is the case in PR96789 on Power.
As Richard suggested there, this patch introduces a pass called
pre_slp_scalar_cleanup, which groups some secondary cleanup passes;
for now these are FRE and DSE.  It also introduces a new group of
TODO flags called pending TODO flags.  Unlike normal TODO flags,
pending TODO flags are passed down the pipeline until one of their
consumers can perform the requested action.  Consumers should then
clear the flags for the actions that they have taken.
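The producer/consumer protocol for pending TODO flags can be sketched
as the simplified, hypothetical C model below.  Only pending_TODOs and
PENDING_TODO_force_next_scalar_cleanup come from this patch; the bit
value, the struct function_model type, the *_model helpers and the
cleanup_runs counter are illustrative assumptions, not GCC's actual
code.

```c
/* Hypothetical model of pending TODO flags.  Names other than
   pending_TODOs and PENDING_TODO_force_next_scalar_cleanup are
   illustrative assumptions, not GCC code.  */
#include <assert.h>
#include <stdbool.h>

/* Assumed bit value; the real definition lives in tree-pass.h.  */
#define PENDING_TODO_force_next_scalar_cleanup (1u << 0)

/* Stand-in for struct function: only the new member is modeled.  */
struct function_model
{
  unsigned pending_TODOs;
  int cleanup_runs; /* How many times the cleanup group ran.  */
};

/* Producer: complete unrolling does not clean up by itself; when an
   outermost loop is unrolled it only records the request.  */
static void
unroll_loops_completely_model (struct function_model *fun,
                               bool outermost_loop_unrolled)
{
  if (outermost_loop_unrolled)
    fun->pending_TODOs |= PENDING_TODO_force_next_scalar_cleanup;
}

/* Consumer gate: the cleanup group runs only if it was requested.  */
static bool
pre_slp_scalar_cleanup_gate (const struct function_model *fun)
{
  return (fun->pending_TODOs & PENDING_TODO_force_next_scalar_cleanup) != 0;
}

/* Consumer: take the action, then clear the flag so that later
   consumers do not repeat it.  */
static void
pre_slp_scalar_cleanup_execute (struct function_model *fun)
{
  assert (pre_slp_scalar_cleanup_gate (fun));
  fun->cleanup_runs++; /* Stands in for running FRE and DSE.  */
  fun->pending_TODOs &= ~PENDING_TODO_force_next_scalar_cleanup;
}

/* Simplified pipeline: unrolling, then the optional pre-SLP cleanup.  */
void
run_pipeline_model (struct function_model *fun, bool outermost_loop_unrolled)
{
  unroll_loops_completely_model (fun, outermost_loop_unrolled);
  if (pre_slp_scalar_cleanup_gate (fun))
    pre_slp_scalar_cleanup_execute (fun);
}
```

With outermost_loop_unrolled false the flag stays clear and the
cleanup is skipped; with it true the cleanup runs once and the flag is
cleared, so a later run without unrolling skips the cleanup again.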
Some compile-time statistics on all SPEC2017 INT benchmarks were
collected on a Power9 machine for the option sets below:
A1: -Ofast -funroll-loops
A2: -O1
A3: -O1 -funroll-loops
A4: -O2
A5: -O2 -funroll-loops
The corresponding compile-time change is negligible:
  A1      A2      A3      A4      A5
  0.08%   0.00%   -0.38%  -0.10%  -0.05%
Bootstrapped/regtested on powerpc64le-linux-gnu P8.
gcc/ChangeLog:
PR tree-optimization/96789
* function.h (struct function): New member unsigned pending_TODOs.
* passes.c (class pass_pre_slp_scalar_cleanup): New class.
(make_pass_pre_slp_scalar_cleanup): New function.
(pass_data_pre_slp_scalar_cleanup): New pass data.
* passes.def (pass_pre_slp_scalar_cleanup): New pass, add
pass_fre and pass_dse as its children.
* timevar.def (TV_SCALAR_CLEANUP): New timevar.
* tree-pass.h (PENDING_TODO_force_next_scalar_cleanup): New
pending TODO flag.
(make_pass_pre_slp_scalar_cleanup): New declaration.
* tree-ssa-loop-ivcanon.c (tree_unroll_loops_completely_1):
Once any outermost loop gets unrolled, set
PENDING_TODO_force_next_scalar_cleanup in cfun->pending_TODOs.
gcc/testsuite/ChangeLog:
PR tree-optimization/96789
* gcc.dg/tree-ssa/ssa-dse-28.c: Adjust.
* gcc.dg/tree-ssa/ssa-dse-29.c: Likewise.
* gcc.dg/vect/bb-slp-41.c: Likewise.
* gcc.dg/tree-ssa/pr96789.c: New test.