We were expanding the live range too far, breaking register_coalesce_2()
and compute_to_mrf() on 16-wide shaders. Turning it back on improves
GLB2.7 performance by 0.239355% +/- 0.0850649% (n=398). shader-db stats
are:
total instructions in shared programs: 1627211 -> 1609262 (-1.10%)
instructions in affected programs: 450351 -> 432402 (-3.99%)
While 33 new 16-wide shaders are gained, 70 are lost. Despite that,
tropics (the app that lost the most 16-wide) shows a 0.41% +/- 0.16%
(n=7/8, first-run outlier removed) performance improvement on my HSW.
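
For illustration only, here is a minimal sketch of why the old test matched
too often. It assumes that a smear value of -1 means "no smear" and that
values of 0 and up select a channel; the patch suggests this but does not
state it. The struct and helper names are hypothetical and are not the
driver's own types.

```cpp
// Hypothetical stand-in for a source register; only `smear` matters here.
// Assumption: -1 means "not smeared", 0 and up select a channel.
struct src_reg_sketch {
    int smear = -1;
};

// Old-style check: any nonzero value is truthy, so the -1 sentinel
// counts as smeared while a smear of channel 0 does not.
bool is_smeared_old(const src_reg_sketch &src)
{
    return src.smear != 0;
}

// New-style check: only a real channel index counts as smeared.
bool is_smeared_new(const src_reg_sketch &src)
{
    return src.smear >= 0;
}
```

Under that assumption, the old test extended the live range for every
unsmeared source, which would explain the interference with coalescing.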
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* pixel_x/pixel_y, which are registers of 16-bit values and thus
* would get stomped by the first decode as well.
*/
- if (dispatch_width == 16 && (inst->src[i].smear ||
+ if (dispatch_width == 16 && (inst->src[i].smear >= 0 ||
(this->pixel_x.reg == reg ||
this->pixel_y.reg == reg))) {
end_ip++;