vc4: Fill thread switching delay slots
author Jonas Pfeil <pfeiljonas@gmx.de>
Sun, 20 Nov 2016 19:45:13 +0000 (20:45 +0100)
committer Eric Anholt <eric@anholt.net>
Thu, 29 Dec 2016 22:41:09 +0000 (14:41 -0800)
commit d82dbc4cde1415560e259b5aac36f36175e8939a
tree e55b49fedf24303dd2c8a217de2c30cbb1a38d92
parent 63e7671c7e65f9df1678d3d79c92f358ae0bdc82

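Scan the instructions in front of the thread-switching instruction for
one that has no signal set, and move the thread-switch signal up onto
it, so that the delay slots hold useful instructions instead of NOPs.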
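
The idea can be illustrated with a minimal, self-contained sketch. This is
not the code from vc4_qpu_schedule.c: the types, the sketch_* names, the
signal values, and the delay_slots parameter are all hypothetical, and
hardware restrictions on what may sit in a delay slot are ignored. It only
shows the general shape of looking back for a free signal field, moving the
thread-switch signal there, and padding what remains with NOPs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical signal values; the real QPU encoding is not reproduced. */
enum sketch_sig { SKETCH_SIG_NONE, SKETCH_SIG_THRSW, SKETCH_SIG_OTHER };

/* Hypothetical, simplified instruction: an opcode tag plus one signal. */
struct sketch_inst {
    int op;                 /* 0 stands for NOP in this sketch */
    enum sketch_sig sig;
};

#define SKETCH_MAX_INSTS 64

struct sketch_program {
    struct sketch_inst insts[SKETCH_MAX_INSTS];
    size_t count;
};

/* Append one instruction to the program. */
static void sketch_emit(struct sketch_program *p, int op, enum sketch_sig sig)
{
    assert(p->count < SKETCH_MAX_INSTS);
    p->insts[p->count].op = op;
    p->insts[p->count].sig = sig;
    p->count++;
}

/*
 * Emit a thread switch followed by `delay_slots` delay slots (an assumed
 * parameter). Returns the number of instructions appended.
 */
static size_t sketch_emit_thrsw(struct sketch_program *p, size_t delay_slots)
{
    size_t start = p->count;
    size_t thrsw_ip = p->count;   /* default: a new instruction at the end */
    size_t window = p->count < delay_slots ? p->count : delay_slots;

    /* Look back over at most `delay_slots` already-emitted instructions
     * and remember the earliest one whose signal field is still free. */
    for (size_t i = 1; i <= window; i++) {
        if (p->insts[p->count - i].sig == SKETCH_SIG_NONE)
            thrsw_ip = p->count - i;
    }

    if (thrsw_ip != p->count) {
        /* Merge the signal into the existing instruction. */
        p->insts[thrsw_ip].sig = SKETCH_SIG_THRSW;
    } else {
        /* No candidate found: emit a NOP that carries the signal. */
        sketch_emit(p, 0, SKETCH_SIG_THRSW);
    }

    /* Pad with NOPs only for delay slots not already covered by the
     * instructions that follow the chosen position. */
    while (p->count < thrsw_ip + 1 + delay_slots)
        sketch_emit(p, 0, SKETCH_SIG_NONE);

    return p->count - start;
}
```

In this sketch, every instruction that already follows the chosen position
counts as a filled delay slot, so one fewer NOP has to be appended for each
of them.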

shader-db results:

total instructions in shared programs: 94494 -> 93027 (-1.55%)
instructions in affected programs:     23545 -> 22078 (-6.23%)

v2: Fix re-emitting of the instruction in the loop trying to emit NOPs,
    drop a scheduling change from branch delay slots. (by anholt)

Signed-off-by: Jonas Pfeil <pfeiljonas@gmx.de>
src/gallium/drivers/vc4/vc4_qpu_schedule.c