git.libre-soc.org Git - mesa.git/commit

author	Jason Ekstrand <jason.ekstrand@intel.com>
	Mon, 23 Mar 2015 22:08:31 +0000 (15:08 -0700)
committer	Jason Ekstrand <jason.ekstrand@intel.com>
	Wed, 1 Apr 2015 19:51:04 +0000 (12:51 -0700)
commit	37703040a142da6bc7c458479a70e35118e10e6b
tree	a374db9eb3199a20212d86b63ce8609ab1367499
parent	7f344721b1a94a6166b53f959ff6b159af3b5f9a

i965/nir: Run the ffma peephole after the rest of the optimizations

The idea here is that fusing multiply-add combinations too early can reduce
our ability to perform CSE and value-numbering.  Instead, we split ffma
opcodes up front, hope CSE cleans up, and then fuse after the fact.
Unless an algebraic pass does something silly where it inserts something
between the multiply and the add, splitting and re-fusing should never
cause a problem.  We run the late algebraic optimizations after this so
that things like compare-with-zero don't hurt our ability to fuse things.
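
To illustrate the ordering argument, here is a minimal sketch on a toy SSA
instruction list.  It is not the NIR API: Instr, split_ffma, cse and
fuse_ffma are hypothetical names made up for this example, the CSE is a
naive exact-match pass that ignores commutativity, and the fuse rule (only
fuse a multiply whose single use is the add) is an assumption of the toy,
not a statement about what the real peephole does.

```python
from collections import namedtuple

# Toy SSA instruction: destination name, opcode, tuple of source names.
Instr = namedtuple("Instr", "dest op srcs")

def split_ffma(prog):
    """Rewrite every ffma(a, b, c) as fmul(a, b) followed by fadd(mul, c)."""
    out = []
    for ins in prog:
        if ins.op == "ffma":
            a, b, c = ins.srcs
            tmp = ins.dest + "_mul"
            out.append(Instr(tmp, "fmul", (a, b)))
            out.append(Instr(ins.dest, "fadd", (tmp, c)))
        else:
            out.append(ins)
    return out

def cse(prog):
    """Naive CSE: drop an instruction if the same (op, srcs) was seen before."""
    seen = {}    # (op, srcs) -> dest of the first instruction computing it
    rename = {}  # eliminated dest -> surviving dest
    out = []
    for ins in prog:
        srcs = tuple(rename.get(s, s) for s in ins.srcs)
        key = (ins.op, srcs)
        if key in seen:
            rename[ins.dest] = seen[key]
        else:
            seen[key] = ins.dest
            out.append(Instr(ins.dest, ins.op, srcs))
    return out

def fuse_ffma(prog, live_out):
    """Fuse fadd(fmul(a, b), c) into ffma(a, b, c) when the multiply has
    exactly one use and is not a program output (toy rule, see lead-in)."""
    defs = {ins.dest: ins for ins in prog}
    uses = {}
    for ins in prog:
        for s in ins.srcs:
            uses.setdefault(s, []).append(ins)
    result, dead = [], set()
    for ins in prog:
        new = ins
        if ins.op == "fadd":
            for i in (0, 1):
                mul = defs.get(ins.srcs[i])
                if (mul is not None and mul.op == "fmul"
                        and len(uses[mul.dest]) == 1
                        and mul.dest not in live_out):
                    new = Instr(ins.dest, "ffma", mul.srcs + (ins.srcs[1 - i],))
                    dead.add(mul.dest)
                    break
        result.append(new)
    return [ins for ins in result if ins.dest not in dead]

# The same value a*b+c appears once fused and once as a separate mul/add.
prog = [
    Instr("t0", "ffma", ("a", "b", "c")),
    Instr("t1", "fmul", ("a", "b")),
    Instr("t2", "fadd", ("t1", "c")),
    Instr("t3", "fmul", ("t0", "t2")),
]

early = cse(prog)  # CSE cannot match the ffma against the mul/add pair
late = fuse_ffma(cse(split_ffma(prog)), live_out={"t3"})

print(len(early))  # 4
print(len(late))   # 2
```

In this toy, CSE on the pre-fused program removes nothing, while
split-then-CSE-then-fuse leaves a single ffma feeding the final multiply.
Under the same toy rule, a multiply with a second use stays unfused, which
is one way an input ffma could come back out as a plain add.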

shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
instructions in affected programs:     989359 -> 978057 (-1.14%)
helped:                                5308
HURT:                                  97
GAINED:                                78
LOST:                                  5

This does, unfortunately, cause some substantial hurt to a shader in Kerbal
Space Program.  However, the damage is caused by changing a single
instruction from an ffma to an add.  This, in turn, increases register
pressure in one part of the program, causing it to fail to register allocate
and spill.  Given the overwhelmingly positive results in other shaders and
the fact that the NIR for the Kerbal shaders is actually better, this
should be considered a positive.

Reviewed-by: Matt Turner <mattst88@gmail.com>

src/mesa/drivers/dri/i965/brw_context.c
src/mesa/drivers/dri/i965/brw_fs_nir.cpp