i965: compute DDX in a subspan based only on top row
authorChia-I Wu <olv@lunarg.com>
Thu, 12 Sep 2013 05:00:52 +0000 (13:00 +0800)
committerChia-I Wu <olv@lunarg.com>
Wed, 2 Oct 2013 07:26:40 +0000 (15:26 +0800)
commit848c0e72f36d0e1e460193a2d30b2f631529156f
treea68300ffa0b3a4a30f584f5ea5733328a52a33a8
parent72edba16592e6b1589f2db410b9ab2939e196a07
i965: compute DDX in a subspan based only on top row

Consider only the top-left and top-right pixels to approximate DDX in a 2x2
subspan, unless the application requests a more accurate approximation via
GL_FRAGMENT_SHADER_DERIVATIVE_HINT or this optimization is disabled from the
new driconf option disable_derivative_optimization.

This results in a less accurate approximation.  However, it improves the
performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at 95.0%
confidence) on Haswell.  No noticeable image quality difference observed.

The improvement comes from faster sample_d.  It seems, on Haswell, some
optimizations are introduced to allow faster sample_d when all pixels in a
subspan have the same derivative.  I considered SAMPLE_STATE too, which allows
one to control the quality of sample_d on Haswell.  But it gave much worse
image quality without giving better performance comparing to this change.

No piglit quick.tests regression on Haswell (tested with v1).

v2: better guess for precompile program key

Signed-off-by: Chia-I Wu <olv@lunarg.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
src/mesa/drivers/dri/i965/brw_context.c
src/mesa/drivers/dri/i965/brw_context.h
src/mesa/drivers/dri/i965/brw_fs.cpp
src/mesa/drivers/dri/i965/brw_fs_generator.cpp
src/mesa/drivers/dri/i965/brw_wm.c
src/mesa/drivers/dri/i965/brw_wm.h
src/mesa/drivers/dri/i965/intel_screen.c