i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
author Scott D Phillips <scott.d.phillips@intel.com>
Mon, 30 Apr 2018 17:25:48 +0000 (10:25 -0700)
committer Kenneth Graunke <kenneth@whitecape.org>
Fri, 25 May 2018 18:05:46 +0000 (11:05 -0700)
commit d21c086d819d78fb3f6abcbb14aa492970f442aa
tree a4d98307cb9590a10c5938ac21793f57dd43b1d6
parent fb20ae0374425ae3aff2a50a498c7e2b428632a4

The reference for MOVNTDQA says:

    For WC memory type, the nontemporal hint may be implemented by
    loading a temporary internal buffer with the equivalent of an
    aligned cache line without filling this data to the cache.
    [...] Subsequent MOVNTDQA reads to unread portions of the WC
    cache line will receive data from the temporary internal
    buffer if data is available.

This hidden, cache-line-sized temporary buffer can improve read
performance from WC (write-combining) maps.
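
To illustrate the idea, here is a minimal sketch of a 64-byte-at-a-time copy using MOVNTDQA through the SSE4.1 intrinsic `_mm_stream_load_si128`. It is not the code from this commit: the function names (`sketch_copy_from_wc` and its helpers) are hypothetical, and it assumes a GCC- or Clang-compatible compiler, a 16-byte-aligned source, and a byte count that is a multiple of 64. The leading `_mm_mfence()` follows the v2 note below; where exactly the driver places it is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */
#define SKETCH_HAVE_MOVNTDQA 1

/* Hypothetical helper: copy one 64-byte line with four MOVNTDQA loads.
 * src must be 16-byte aligned; dst may be unaligned. */
static __attribute__((target("sse4.1"))) void
sketch_copy_line_movntdqa(void *dst, const void *src)
{
    /* Some headers declare the intrinsic with a non-const pointer. */
    __m128i *s = (__m128i *)src;
    __m128i *d = (__m128i *)dst;

    /* Issue all four loads of the line before any store, so the later
     * loads have a chance to hit the temporary buffer described above. */
    __m128i v0 = _mm_stream_load_si128(s + 0);
    __m128i v1 = _mm_stream_load_si128(s + 1);
    __m128i v2 = _mm_stream_load_si128(s + 2);
    __m128i v3 = _mm_stream_load_si128(s + 3);

    _mm_storeu_si128(d + 0, v0);
    _mm_storeu_si128(d + 1, v1);
    _mm_storeu_si128(d + 2, v2);
    _mm_storeu_si128(d + 3, v3);
}

static __attribute__((target("sse4.1"))) void
sketch_copy_streaming(void *dst, const void *src, size_t bytes)
{
    /* Fence once before the first streaming load (assumed placement). */
    _mm_mfence();
    for (size_t off = 0; off < bytes; off += 64)
        sketch_copy_line_movntdqa((char *)dst + off,
                                  (const char *)src + off);
}
#endif

/* Hypothetical entry point: uses MOVNTDQA when the CPU supports SSE4.1,
 * otherwise falls back to plain memcpy. */
void
sketch_copy_from_wc(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)src & 15) == 0); /* MOVNTDQA needs 16-byte alignment */
    assert((bytes & 63) == 0);          /* whole 64-byte lines only */
#ifdef SKETCH_HAVE_MOVNTDQA
    if (__builtin_cpu_supports("sse4.1")) {
        sketch_copy_streaming(dst, src, bytes);
        return;
    }
#endif
    memcpy(dst, src, bytes);
}
```

On ordinary write-back memory this produces the same bytes as `memcpy`, so it can be checked for correctness anywhere; any speedup would only be expected when `src` is a WC mapping.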

v2: Add mfence at start of tiled_to_linear for streaming loads (Chris)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
src/mesa/drivers/dri/i965/Makefile.am
src/mesa/drivers/dri/i965/Makefile.sources
src/mesa/drivers/dri/i965/intel_tiled_memcpy.c
src/mesa/drivers/dri/i965/meson.build