rtl-ssa: Reduce the amount of temporary memory needed [PR98863]
The rtl-ssa code uses an on-the-side IL and needs to build that IL
for each block and RTL insn. I'd originally not used the classical
dominance frontier method for placing phis on the basis that it seemed
like more work in this context: we're having to visit everything in
an RPO walk anyway, so for non-backedge cases we can tell immediately
whether a phi node is needed. We then speculatively created phis for
registers that are live across backedges and simplified them later.
This avoided having to walk most of the IL twice (once to build the
initial IL, and once to link uses to phis).
However, as shown in PR98863, this leads to excessive temporary
memory in extreme cases, since we had to record the value of
every live register on exit from every block. In that PR,
there were many registers that were live (but unused) across
a large region of code.
This patch does use the classical approach to placing phis, but tries
to use the existing DF defs information to avoid two walks of the IL.
We still use the previous approach for memory, since there is no
up-front information to indicate whether a block defines memory or not.
However, since memory is just treated as a single unified thing
(like for gimple vops), memory doesn't suffer from the same
scalability problems as registers.
With this change, fwprop no longer seems to be a memory-hog outlier
in the PR: the maximum RSS is similar with and without fwprop.
The PR also shows the problems inherent in using bitmap operations
involving the live-in and live-out sets, which in the testcase are
very large. I've therefore tried to reduce those operations to the
bare minimum.
The patch also includes other compile-time optimisations motivated
by the PR; see the changelog for details.
I tried adding:
for (int i = 0; i < 200; ++i)
{
crtl->ssa = new rtl_ssa::function_info (cfun);
delete crtl->ssa;
}
to fwprop.c to stress the code. fwprop then took 35% of the compile
time for the problematic partition in the PR (measured on a release
build). fwprop takes less than .5% of the compile time when running
normally.
The command:
git diff
0b76990a9d75d97b84014e37519086b81824c307~ gcc/fwprop.c | \
patch -p1 -R
still gives a working compiler that uses the old fwprop.c. The compile
time with that version is very similar.
For a more reasonable testcase like optabs.ii at -O, I saw a 6.7%
compile time regression with the loop above added (i.e. creating
the info 201 times per pass instead of once per pass). That goes
down to 4.8% with -O -g. I can't measure a significant difference
with a normal compiler (no 200-iteration loop).
So I think that (as expected) the patch does make things a bit
slower in the normal case. But like Richi says, peak memory usage
is harder for users to work around than slighter slower compile times.
gcc/
PR rtl-optimization/98863
* rtl-ssa/functions.h (function_info::bb_live_out_info): Delete.
(function_info::build_info): Turn into a declaration, moving the
definition to internals.h.
(function_info::bb_walker): Declare.
(function_info::create_reg_use): Likewise.
(function_info::calculate_potential_phi_regs): Take a build_info
parameter.
(function_info::place_phis, function_info::create_ebbs): Declare.
(function_info::calculate_ebb_live_in_for_debug): Likewise.
(function_info::populate_backedge_phis): Delete.
(function_info::start_block, function_info::end_block): Declare.
(function_info::populate_phi_inputs): Delete.
(function_info::m_potential_phi_regs): Move information to build_info.
* rtl-ssa/internals.h: New file.
(function_info::bb_phi_info): New class.
(function_info::build_info): Moved from functions.h.
Add a constructor and destructor.
(function_info::build_info::ebb_use): Delete.
(function_info::build_info::ebb_def): Likewise.
(function_info::build_info::bb_live_out): Likewise.
(function_info::build_info::tmp_ebb_live_in_for_debug): New variable.
(function_info::build_info::potential_phi_regs): Likewise.
(function_info::build_info::potential_phi_regs_for_debug): Likewise.
(function_info::build_info::ebb_def_regs): Likewise.
(function_info::build_info::bb_phis): Likewise.
(function_info::build_info::bb_mem_live_out): Likewise.
(function_info::build_info::bb_to_rpo): Likewise.
(function_info::build_info::def_stack): Likewise.
(function_info::build_info::old_def_stack_limit): Likewise.
* rtl-ssa/internals.inl (function_info::build_info::record_reg_def):
Remove the regno argument. Push the previous definition onto the
definition stack where necessary.
* rtl-ssa/accesses.cc: Include internals.h.
* rtl-ssa/changes.cc: Likewise.
* rtl-ssa/blocks.cc: Likewise.
(function_info::build_info::build_info): Define.
(function_info::build_info::~build_info): Likewise.
(function_info::bb_walker): New class.
(function_info::bb_walker::bb_walker): Define.
(function_info::add_live_out_use): Convert a logarithmic-complexity
test into a linear one. Allow the same definition to be passed
multiple times.
(function_info::calculate_potential_phi_regs): Moved from
functions.cc. Take a build_info parameter and store the
information there instead.
(function_info::place_phis): New function.
(function_info::add_entry_block_defs): Update call to record_reg_def.
(function_info::calculate_ebb_live_in_for_debug): New function.
(function_info::add_phi_nodes): Use bb_phis to decide which
registers need phi nodes and initialize ebb_def_regs accordingly.
Do not add degenerate phis here.
(function_info::add_artificial_accesses): Use create_reg_use.
Assert that all definitions are listed in the DF LR sets.
Update call to record_reg_def.
(function_info::record_block_live_out): Record live-out register
values in the phis of successor blocks. Use the live-out set
when processing the last block in an EBB, instead of always
using the live-in sets of successor blocks. AND the live sets
with the set of registers that have been defined in the EBB,
rather than with all potential phi registers. Cope correctly
with branches back to the start of the current EBB.
(function_info::start_block): New function.
(function_info::end_block): Likewise.
(function_info::populate_phi_inputs): Likewise.
(function_info::create_ebbs): Likewise.
(function_info::process_all_blocks): Rewrite into a multi-phase
process.
* rtl-ssa/functions.cc: Include internals.h.
(function_info::calculate_potential_phi_regs): Move to blocks.cc.
(function_info::init_function_data): Remove caller.
* rtl-ssa/insns.cc: Include internals.h
(function_info::create_reg_use): New function. Lazily any
degenerate phis needed by the linear RPO view.
(function_info::record_use): Use create_reg_use. When processing
debug uses, use potential_phi_regs and test it before checking
whether the register is live on entry to the current EBB. Lazily
calculate ebb_live_in_for_debug.
(function_info::record_call_clobbers): Update call to record_reg_def.
(function_info::record_def): Likewise.