i965/fs: Improve accuracy of dFdy() to match dFdx().
author Paul Berry <stereotype441@gmail.com>
Fri, 20 Sep 2013 16:04:31 +0000 (09:04 -0700)
committer Paul Berry <stereotype441@gmail.com>
Thu, 3 Oct 2013 20:49:15 +0000 (13:49 -0700)
commit 800610f9eb6ad24b5fefc9206fb700c7ae2f0ec8
tree 6e62eed31079f763378240c0cd6c7863686e8e3e
parent 9267565ee4248f7bc8efebd8c994a93ff1e0683d

Previously, we computed dFdy() using the following instruction:

  add(8) dst<1>F src<4,4,0>F -src.2<4,4,0>F { align1 1Q }

That had the disadvantage that it computed the same value for all 4
pixels of a 2x2 subspan, which meant that it was less accurate than
dFdx().  This patch changes it to the following instruction when
c->key.high_quality_derivatives is set:

  add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }

This gives it comparable accuracy to dFdx().
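
As a rough illustration of the difference between the two forms, the following sketch models one SIMD8 register as eight values holding two 2x2 subspans. The layout [top-left, top-right, bottom-left, bottom-right] within each subspan, the function names, and the sample data are assumptions made for this example; they are not taken from the driver source, and the sign simply mirrors the operand order of the instructions above.

```python
# Hypothetical model: one SIMD8 register = 8 values = two 2x2 subspans,
# each assumed to be stored as [top-left, top-right, bottom-left, bottom-right].

def dfdy_coarse(src):
    """Model of the old align1 form: src<4,4,0> - src.2<4,4,0>."""
    dst = []
    for i in range(len(src)):
        base = 4 * (i // 4)  # vertical stride 4: start of this subspan
        # Horizontal stride 0: all four channels read the same element,
        # so every pixel of the subspan gets (element 0 - element 2).
        dst.append(src[base + 0] - src[base + 2])
    return dst


def dfdy_fine(src):
    """Model of the new align16 form: src.xyxy - src.zwzw."""
    xyxy = (0, 1, 0, 1)
    zwzw = (2, 3, 2, 3)
    dst = []
    for i in range(len(src)):
        base = 4 * (i // 4)  # each group of 4 channels is swizzled separately
        c = i % 4
        dst.append(src[base + xyxy[c]] - src[base + zwzw[c]])
    return dst


# Sample data: f(x, y) = x * y at pixels x = 0..3, y = 0..1.
src = [0, 0, 0, 1,   # subspan covering x = 0..1
       0, 0, 2, 3]   # subspan covering x = 2..3
coarse = dfdy_coarse(src)
fine = dfdy_fine(src)
```

With this data the coarse form yields one value per subspan, while the fine form yields a separate value for the left and right columns of each subspan.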

Unfortunately, align16 instructions can't be compressed, so in SIMD16
shaders, instead of emitting this instruction:

  add(16) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1H }

we need to unroll it into two instructions:

  add(8) dst<1>F src<4,4,1>.xyxyF -src<4,4,1>.zwzwF { align16 1Q }
  add(8) (dst+1)<1>F (src+1)<4,4,1>.xyxyF -(src+1)<4,4,1>.zwzwF { align16 2Q }
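
The unrolling can be pictured with a small sketch, assuming the same hypothetical subspan layout ([top-left, top-right, bottom-left, bottom-right] per group of four channels). The helper names are illustrative only; the point is that a 16-channel operation is split into two independent 8-channel halves, corresponding to the 1Q and 2Q instructions above.

```python
# Hypothetical model of unrolling a SIMD16 derivative into two SIMD8 halves.

def dfdy_fine_simd8(src):
    """Model of src.xyxy - src.zwzw over 8 channels (two 2x2 subspans)."""
    assert len(src) == 8
    xyxy = (0, 1, 0, 1)
    zwzw = (2, 3, 2, 3)
    dst = []
    for i in range(8):
        base = 4 * (i // 4)  # start of the subspan this channel belongs to
        c = i % 4
        dst.append(src[base + xyxy[c]] - src[base + zwzw[c]])
    return dst


def dfdy_fine_simd16(src):
    """Process 16 channels as two 8-channel halves, like 1Q and 2Q."""
    assert len(src) == 16
    first_quarter = dfdy_fine_simd8(src[:8])    # models dst, src      { 1Q }
    second_quarter = dfdy_fine_simd8(src[8:])   # models dst+1, src+1  { 2Q }
    return first_quarter + second_quarter


result = dfdy_fine_simd16([i * i for i in range(16)])
```

Because each subspan is handled independently, splitting at the 8-channel boundary gives the same result as a single 16-wide operation would in this model.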

Fixes piglit test spec/glsl-1.10/execution/fs-dfdy-accuracy.

Acked-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
src/mesa/drivers/dri/i965/brw_fs_generator.cpp
src/mesa/drivers/dri/i965/brw_reg.h