nir/algebraic: Optimize common array indexing sequence
Some shaders include code that looks like:
    uniform int i;
    uniform vec4 bones[...];
    foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);
CSE would do some work on this:
    x = i * 3
    foo(bones[x], bones[x + 1], bones[x + 2]);
The compiler may then add '<< 4 + base' to the index calculations.
This results in expressions like
    x = i * 3
    foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);
Just rearranging the math to produce (i * 48) + 16 saves an
instruction, and it allows CSE to do more work.
    x = i * 48;
    foo(bones[x], bones[x + 16], bones[x + 32]);
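As a sanity check on the rearrangement, here is a minimal Python sketch.
It is not the code in this patch. It assumes 32-bit integers that wrap
on overflow, and the helper names (index_before, index_after,
forms_agree) are made up for illustration.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit integers that wrap on overflow


def index_before(i: int, c: int) -> int:
    """Offset as originally emitted: ((i * 3) + c) << 4."""
    x = (i * 3) & MASK
    return (((x + c) & MASK) << 4) & MASK


def index_after(i: int, c: int) -> int:
    """Rearranged form: (i * 48) + (c << 4)."""
    return ((i * 48) + (c << 4)) & MASK


def forms_agree(samples) -> bool:
    """True if both forms match for every sample i and c in {0, 1, 2}."""
    return all(index_before(i, c) == index_after(i, c)
               for i in samples for c in (0, 1, 2))
```

The two forms should agree even when the multiply overflows, since the
shift and the multiply both distribute over addition modulo 2^32.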
So, ~6 instructions become ~3.
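The rough counts can be reproduced with a toy model. The sketch below is
only an illustration: it represents expressions as nested tuples, treats
identical subexpressions as a single operation (an idealized CSE), and
ignores the '+ base' term. The count_ops helper and the tuple encoding
are hypothetical, and the opcode strings are just labels.

```python
def count_ops(exprs):
    """Count distinct operations, sharing identical subexpressions."""
    seen = set()

    def visit(e):
        # Leaves (variable names and constants) cost nothing here.
        if not isinstance(e, tuple) or e in seen:
            return
        seen.add(e)
        for operand in e[1:]:
            visit(operand)

    for e in exprs:
        visit(e)
    return len(seen)


# Before: x = i * 3, then each index is shifted separately.
x = ('imul', 'i', 3)
before = [('ishl', x, 4),
          ('ishl', ('iadd', x, 1), 4),
          ('ishl', ('iadd', x, 2), 4)]

# After: x = i * 48, then the indices are plain additions.
y = ('imul', 'i', 48)
after = [y, ('iadd', y, 16), ('iadd', y, 32)]
```

In this model count_ops(before) gives 6 and count_ops(after) gives 3.
Real instruction counts depend on the backend, hence the '~' above.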
Some individual shader-db results look pretty bad. However, I have a
really, really hard time believing the change in estimated cycles in,
for example, 3dmmes-taiji/51.shader_test after looking at the change in
the generated code.
G45
total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Ironlake
total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
instructions in affected programs: 177460 -> 166690 (-6.07%)
helped: 894
HURT: 0
total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
cycles in affected programs: 3936648 -> 3892638 (-1.12%)
helped: 894
HURT: 0
Sandy Bridge
total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
instructions in affected programs: 432715 -> 414079 (-4.31%)
helped: 2795
HURT: 0
total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
cycles in affected programs: 6114626 -> 6037854 (-1.26%)
helped: 2478
HURT: 317
Ivy Bridge
total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
cycles in affected programs: 1972658 -> 1854360 (-6.00%)
helped: 2458
HURT: 307
Haswell
total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
instructions in affected programs: 388234 -> 372666 (-4.01%)
helped: 2795
HURT: 0
total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
cycles in affected programs: 1891820 -> 1773958 (-6.23%)
helped: 2459
HURT: 261
Broadwell
total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
instructions in affected programs: 658784 -> 642193 (-2.52%)
helped: 2795
HURT: 5
total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
cycles in affected programs: 2873304 -> 2820616 (-1.83%)
helped: 2275
HURT: 415
Skylake
total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
instructions in affected programs: 682625 -> 666034 (-2.43%)
helped: 2795
HURT: 5
total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
cycles in affected programs: 3192120 -> 3151302 (-1.28%)
helped: 2271
HURT: 425
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>