i965/fs: Before reg alloc, schedule instructions to reduce live ranges.
This came from an idea by Ben Segovia. 16-wide pixel shaders are very
important for latency hiding on i965, so we want to try really hard to
get them. If scheduling an instruction makes some set of instructions
available, those are probably the ones that make the instruction's
result dead. By choosing those first, we'll have a tendency to reduce
the amount of live data as opposed to creating more.
Previously, we were sometimes getting this behavior out of the
scheduler, which was what produced the scheduler's original performance
wins on lightsmark. Unfortunately, that was mostly an accident of the
lame instruction latency information that I had, which made it
impossible to fix the actual scheduling for performance. Now that we've
fixed the scheduling for setup for register allocation, we can safely
update the latency parameters for the final schedule.
In shader-db, we lose 37 16-wide shaders, but gain 90 new ones. 4
shaders that were spilling change how many registers spill, for a
reduction of 70/3899 instructions.
v2: Simplify the new loop.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>