i965/bufmgr: Use write-combine mappings where available
Write-combine mappings give much better performance on writes than
uncached access through the GTT.
Improves performance of GFXBench 4's gl_driver2 benchmark at 1024x768
on Apollolake by 3.6086% +/- 0.674193% (n=15).
v2: (by Ken) Rebase on lockless mappings, map_count deletion, valgrind
updates, potential for CPU/WC maps failing, and other changes.
v3: (by Ken and Chris Wilson)
(Ken): Rebase on set_domain -> gem_wait
(Chris): Fix up a failed CPU/WC mmaping with a GTT mapping
Not all objects will be mappable for direct access by the CPU
(either using WC/CPU or WC paths), for example, a dmabuf wrapping an
object on a foreign device or an object wrapping access to stolen
memory. Since either the physical pages are not known or even do not
exist, we need to use the mediated, indirect access via the GTT. (If
one day, the kernel does suddenly start providing mediated access
via a regular WB/WC mmapping, we no longer need the fallback.)
v4: Avoid falling back for MAP_RAW (Chris).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>