v3d: Use ldunif instructions for uniforms.
The idea is that for repeated use of the same uniform, we could avoid
loading it on each consumer. The results look pretty good.
total instructions in shared programs:
6413571 ->
6521464 (1.68%)
total threads in shared programs: 154214 -> 154000 (-0.14%)
total uniforms in shared programs:
2393604 ->
2119629 (-11.45%)
total spills in shared programs: 4960 -> 4984 (0.48%)
total fills in shared programs: 6350 -> 6418 (1.07%)
Once we do scheduling at the NIR level, the register pressure (and thus
also instructions) issues we see here will drop back down.