vc4: Do instruction scheduling on the QIR to hide texture fetch latency.
author Eric Anholt <eric@anholt.net>
Fri, 19 Sep 2014 19:26:27 +0000 (12:26 -0700)
committer Eric Anholt <eric@anholt.net>
Sat, 19 Dec 2015 01:12:10 +0000 (17:12 -0800)
commit f1fb85e5440d8874997eea1df982cf02b6ca2ca2
tree 0652399c17515cd906719a8c5dde6972db66387c
parent 5278c64de58b545dfe3272b005b331fd5b71da68

This is a rewrite of vc4_opt_qpu_schedule.c to operate on QIR.  A texture
fetch can probably take as many cycles as the rest of the program, so
it's important to hide our other instructions' cycles behind it (which is
hard to do after register allocation).  Also, we can queue up multiple
texture requests before collecting the resulting samples, which keeps the
texture unit busy more of the time.
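
To illustrate the idea above, here is a minimal sketch of a greedy list
scheduler that issues texture requests early, fills the wait with ALU work,
and collects samples late.  It is not the code in vc4_qir_schedule.c: the
toy_* names, the three instruction kinds, the fixed priorities and the
16-instruction limit are all assumptions made for this example, and it
ignores register pressure and real latencies entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instruction kinds for this sketch (not real QIR opcodes). */
enum toy_kind { TOY_TEX_REQUEST, TOY_ALU, TOY_TEX_RESULT };

struct toy_inst {
        enum toy_kind kind;
        int deps[2];    /* indices that must be scheduled first, -1 if unused */
};

/* Lower value means "pick earlier" among the ready instructions. */
static int toy_priority(enum toy_kind kind)
{
        switch (kind) {
        case TOY_TEX_REQUEST: return 0; /* start fetches as early as possible */
        case TOY_ALU:         return 1; /* fill the fetch latency with other work */
        default:              return 2; /* collect samples as late as possible */
        }
}

/* An instruction is ready once all of its dependencies are scheduled. */
static bool toy_ready(const struct toy_inst *insts, const bool *done, int i)
{
        for (int d = 0; d < 2; d++) {
                int dep = insts[i].deps[d];
                if (dep >= 0 && !done[dep])
                        return false;
        }
        return true;
}

/* Greedy list scheduling: writes the chosen order into order[0..n-1].
 * Ties are broken in favor of the original program order.
 */
static void toy_schedule(const struct toy_inst *insts, int n, int *order)
{
        bool done[16] = { false };      /* sketch limit: n <= 16 */

        assert(n <= 16);
        for (int slot = 0; slot < n; slot++) {
                int best = -1;
                for (int i = 0; i < n; i++) {
                        if (done[i] || !toy_ready(insts, done, i))
                                continue;
                        if (best < 0 || toy_priority(insts[i].kind) <
                                        toy_priority(insts[best].kind))
                                best = i;
                }
                assert(best >= 0);      /* fails only on a dependency cycle */
                order[slot] = best;
                done[best] = true;
        }
}
```

Given a program written as request A, result A, an unrelated ALU op,
request B, result B, and an ALU op combining both results, toy_schedule()
reorders it so that both requests are issued first, the unrelated ALU op
runs while they are in flight, and only then are the two results collected
and combined.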

High-settings openarena performance +2.35849% +/- 0.221154% (n=7).  Also
about 2-3% on the multiarb demo.  8 piglit tests
(ext_framebuffer_multisample accuracy depthstencil) go from failing in
rendering to failing in register allocation, but hopefully I can fix that
up with some better register pressure handling here.

total instructions in shared programs: 87723 -> 88448 (0.83%)
instructions in affected programs:     78411 -> 79136 (0.92%)
total estimated cycles in shared programs: 276583 -> 246306 (-10.95%)
estimated cycles in affected programs:     265691 -> 235414 (-11.40%)
src/gallium/drivers/vc4/Makefile.sources
src/gallium/drivers/vc4/vc4_program.c
src/gallium/drivers/vc4/vc4_qir.h
src/gallium/drivers/vc4/vc4_qir_schedule.c [new file with mode: 0644]