i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
author Scott D Phillips <scott.d.phillips@intel.com>
Mon, 24 Sep 2018 05:33:06 +0000 (08:33 +0300)
committer Tapani Pälli <tapani.palli@intel.com>
Tue, 23 Oct 2018 11:08:05 +0000 (14:08 +0300)
commit 11b1afdc92db98e93f2ca50beeb7fc481a11e708
tree f53bc832b081664396707a5be7faa8e5138889b1
parent 91d3a5d1a86915480e9e07cf370ad0e9743ab5b5

The reference for MOVNTDQA says:

    For WC memory type, the nontemporal hint may be implemented by
    loading a temporary internal buffer with the equivalent of an
    aligned cache line without filling this data to the cache.
    [...] Subsequent MOVNTDQA reads to unread portions of the WC
    cache line will receive data from the temporary internal
    buffer if data is available.

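This hidden, cache-line-sized temporary buffer can improve read
performance from WC (write-combining) maps, because consecutive
MOVNTDQA loads within one cache line are served from the buffer.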
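
As an illustration of the access pattern described above, here is a
minimal sketch, not the code from this commit. The helper name
streaming_load_copy is hypothetical, and the sketch assumes src is
16-byte aligned and bytes is a multiple of 64. It reads one cache line
with four MOVNTDQA loads (_mm_stream_load_si128) after a leading mfence,
as mentioned in the v2 note, and falls back to memcpy on non-x86 builds.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */

/* Hypothetical helper for illustration only.
 * Assumes: src is 16-byte aligned, bytes is a multiple of 64. */
__attribute__((target("sse4.1")))
static void
streaming_load_copy(void *dst, const void *src, size_t bytes)
{
   __m128i *s = (__m128i *)src; /* the intrinsic takes a non-const pointer on some compilers */
   char *d = dst;

   /* Order earlier stores before the streaming loads begin. */
   _mm_mfence();

   for (size_t i = 0; i < bytes; i += 64, s += 4, d += 64) {
      /* Four 16-byte loads cover one 64-byte cache line. */
      __m128i r0 = _mm_stream_load_si128(s + 0);
      __m128i r1 = _mm_stream_load_si128(s + 1);
      __m128i r2 = _mm_stream_load_si128(s + 2);
      __m128i r3 = _mm_stream_load_si128(s + 3);

      /* Unaligned stores: dst is ordinary memory with no alignment assumed. */
      _mm_storeu_si128((__m128i *)(d + 0), r0);
      _mm_storeu_si128((__m128i *)(d + 16), r1);
      _mm_storeu_si128((__m128i *)(d + 32), r2);
      _mm_storeu_si128((__m128i *)(d + 48), r3);
   }
}
#else
/* Portable fallback so the sketch still builds on non-x86 targets. */
static void
streaming_load_copy(void *dst, const void *src, size_t bytes)
{
   memcpy(dst, src, bytes);
}
#endif
```

On ordinary write-back memory MOVNTDQA behaves like a regular aligned
load, so any benefit should only be expected when src is a WC mapping.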

v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)
v3: add Android build support (Tapani)
v4: squash 'fix i915: Fix streaming loads for intel_tiled_memcpy',
    separate sse41 into its own static library (Tapani)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> (v2)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
src/mesa/drivers/dri/i965/Android.mk
src/mesa/drivers/dri/i965/Makefile.am
src/mesa/drivers/dri/i965/Makefile.sources
src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
src/mesa/drivers/dri/i965/intel_tiled_memcpy.h
src/mesa/drivers/dri/i965/intel_tiled_memcpy_normal.c [new file with mode: 0644]
src/mesa/drivers/dri/i965/intel_tiled_memcpy_sse41.c [new file with mode: 0644]
src/mesa/drivers/dri/i965/intel_tiled_memcpy_sse41.h [new file with mode: 0644]
src/mesa/drivers/dri/i965/meson.build