radeonsi: do EarlyCSEMemSSA LLVM pass
so that LLVM IR looks like CSE has been run on it. It's also recommended
by the instruction combining pass.
This also fixes:
- GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2 (crash)
- piglit/spec/arb_shader_ballot/execution/fs-readFirstInvocation-uint-loop (fail)
The code size decrease is positive, the register usage isn't. There is
a decrease in VGPR spilling for Tomb Raider, but increase in DiRT Showdown
and GRID Autosport.
EarlyCSEMemSSA has a -0.01% change in code size compared EarlyCSE.
SGPRS:
1935420 ->
1938076 (0.14 %)
VGPRS:
1645504 ->
1645988 (0.03 %)
Spilled SGPRs: 2493 -> 2651 (6.34 %)
Spilled VGPRs: 107 -> 115 (7.48 %)
Private memory VGPRs: 1332 -> 1332 (0.00 %)
Scratch size: 1512 -> 1516 (0.26 %) dwords per thread
Code Size:
61981592 ->
61890012 (-0.15 %) bytes
Max Waves: 371847 -> 371798 (-0.01 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>