nir: Split ALU instructions in loops that read phis
A single shader in Unigine Superposition is affected by this change.
A single iadd is moved to the end of a loop. This iadd is involved in
a complex set of logic to terminate the loop, and an extra mov
instruction is inserted. This shader really needs the optimization
suggested by bugzilla #94747, and I expect that to make this tiny
regression go away.
All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs:
15047543 ->
15047545 (<.01%)
instructions in affected programs: 565 -> 567 (0.35%)
helped: 0
HURT: 2
total cycles in shared programs:
369977253 ->
369978253 (<.01%)
cycles in affected programs: 127910 -> 128910 (0.78%)
helped: 0
HURT: 2
v2: Skip nir_op_vec{2,3,4} and nir_op_[fi]mov instructions to avoid
infinite optimization loops. Remove the original ALU instruciton after
all of its readers are modified to read the new ALU instruction.
v3: Extend to the more general case. The if the prev-block value from
the phi is not undef, this means the ALU instruction has to be
duplicated in both the prev-block and the continue-block.
Fixes: 8fb8ebfbb05 ("intel/compiler: More peephole select")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>