git.libre-soc.org Git - mesa.git/commit

author	Roland Scheidegger <sroland@vmware.com>
	Fri, 6 Jul 2012 00:53:44 +0000 (02:53 +0200)
committer	José Fonseca <jfonseca@vmware.com>
	Fri, 20 Jul 2012 19:17:15 +0000 (20:17 +0100)
commit	70a969f123c98cf6fca71a5fed4efed983edf6c8
tree	f0ffa94d275d9ade9e8bfb8ffc16b99ec7fcf946
parent	542bd6941f5a56f7a3aa84b44d92591488b146bf

llvmpipe: use runtime loop instead of static loop for looping over quads

This can potentially cut shader program size by a factor of 4 for 4-wide
execution and by a factor of 2 for 8-wide execution, and while these ratios
aren't quite reached for more complex shaders it can be close.
I could not really measure a performance difference so far, except for
trivial shaders (glxgears).
There seem to be a fair number of unnecessary moves generated, especially
at the beginning; it might be possible to optimize those away somehow.
Things aren't quite as clean, since some additional work is needed to keep
both paths working (though llvm might be able to optimize this away).
glxgears seems to lose about 5-10% of performance. Looking at the generated
shaders, this is actually less than I'd think it would be: both the 4-wide
and 8-wide shaders, despite containing a loop, actually have about 10% more
instructions in total, and will have roughly 50% more executed instructions
(though mostly cheap ones). Need to figure out how to reduce overhead...
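
For illustration only, a minimal C sketch of the two approaches (this is not
code from the patch). It models the emitted shader rather than the LLVM IR
builder: the "static" variant repeats the per-quad body once per quad, as a
generator-time loop would emit it, while the "runtime" variant contains the
body once inside a real loop. All names (shade_quad, shade_block_static,
shade_block_runtime, paths_agree) and the 4-quads-per-block layout are
assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define QUAD_SIZE 4   /* pixels per 2x2 quad (assumed 4-wide execution) */
#define NUM_QUADS 4   /* quads per block in this sketch */

/* Hypothetical stand-in for the per-quad shader body. */
static void shade_quad(const float *in, float *out)
{
    for (size_t i = 0; i < QUAD_SIZE; i++)
        out[i] = in[i] * 0.5f + 0.25f;
}

/* Static loop: the body appears once per quad in the program,
 * so code size grows with the number of quads. */
static void shade_block_static(const float *in, float *out)
{
    shade_quad(in + 0 * QUAD_SIZE, out + 0 * QUAD_SIZE);
    shade_quad(in + 1 * QUAD_SIZE, out + 1 * QUAD_SIZE);
    shade_quad(in + 2 * QUAD_SIZE, out + 2 * QUAD_SIZE);
    shade_quad(in + 3 * QUAD_SIZE, out + 3 * QUAD_SIZE);
}

/* Runtime loop: the body appears once; the counter, compare and
 * branch are the extra (mostly cheap) executed instructions. */
static void shade_block_runtime(const float *in, float *out, size_t num_quads)
{
    assert(num_quads <= NUM_QUADS);
    for (size_t q = 0; q < num_quads; q++)
        shade_quad(in + q * QUAD_SIZE, out + q * QUAD_SIZE);
}

/* Returns 1 if both variants produce identical output for one block. */
static int paths_agree(void)
{
    float in[NUM_QUADS * QUAD_SIZE];
    float a[NUM_QUADS * QUAD_SIZE];
    float b[NUM_QUADS * QUAD_SIZE];

    for (size_t i = 0; i < NUM_QUADS * QUAD_SIZE; i++)
        in[i] = (float)i;

    shade_block_static(in, a);
    shade_block_runtime(in, b, NUM_QUADS);

    for (size_t i = 0; i < NUM_QUADS * QUAD_SIZE; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

In this model the runtime variant trades the repeated body for per-iteration
loop overhead, which matches the trade-off described above: smaller programs,
but more executed instructions for trivial shaders.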

v2: keep complex interpolation for 4-wide mode, adapt to interface changes.

Reviewed-by: José Fonseca <jfonseca@vmware.com>

src/gallium/drivers/llvmpipe/lp_bld_interp.c
src/gallium/drivers/llvmpipe/lp_bld_interp.h
src/gallium/drivers/llvmpipe/lp_state_fs.c