aco: enable load/store vectorizer

Totals from affected shaders:
SGPRS: 1890373 -> 1900772 (0.55 %)
VGPRS: 1210024 -> 1215244 (0.43 %)
Spilled SGPRs: 828 -> 828 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 252 -> 252 (0.00 %) dwords per thread
Code Size: 81937504 -> 74608304 (-8.94 %) bytes
LDS: 746 -> 746 (0.00 %) blocks
Max Waves: 230491 -> 230158 (-0.14 %)

In NieR:Automata and GTA V, the code size decrease is especially large:
-13.79% and -15.32%, respectively.

v9: rework the callback function
v10: handle load_shared/store_shared in the callback

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)