Always use the streaming load when reading back from the tiled surface
to map the resource (since we know we have Broadwell+, all of our
target CPUs support SSE4.1). This means we hit the fast WC handling
paths on Atoms (without LLC), and on big Core (with LLC) the streaming
load is no less efficient, as we do not require the tiled buffer to be
pulled into the CPU cache.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
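
For readers unfamiliar with the term, the "streaming load" above is the
SSE4.1 MOVNTDQA instruction, exposed in C as _mm_stream_load_si128().
The following is a minimal sketch of a copy loop built on it, not the
isl implementation: copy_streaming_load() is a hypothetical helper
named here only for illustration, and it assumes a 16-byte-aligned
source and a GCC- or Clang-compatible compiler (for the target
attribute).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <smmintrin.h> /* SSE4.1: _mm_stream_load_si128 (MOVNTDQA) */

/* Hypothetical helper for illustration only, not part of Mesa or isl.
 * Copies `bytes` bytes from `src` to `dst` using non-temporal loads.
 * The target attribute lets this build without passing -msse4.1.
 */
__attribute__((target("sse4.1")))
static void
copy_streaming_load(void *dst, const void *src, size_t bytes)
{
   uint8_t *d = dst;
   const uint8_t *s = src;

   /* MOVNTDQA requires a 16-byte-aligned source address. */
   assert(((uintptr_t)s & 15) == 0);

   while (bytes >= 16) {
      /* Non-temporal load: on WC memory this can fetch through the
       * streaming load buffers instead of issuing uncached reads.
       */
      __m128i v = _mm_stream_load_si128((__m128i *)s);

      /* The destination is ordinary memory, so a plain unaligned
       * store is sufficient.
       */
      _mm_storeu_si128((__m128i *)d, v);

      s += 16;
      d += 16;
      bytes -= 16;
   }

   /* Copy any tail shorter than one 16-byte vector. */
   if (bytes)
      memcpy(d, s, bytes);
}
```

On ordinary write-back memory the same loop still produces a correct
copy; the non-temporal hint is only expected to matter for WC mappings.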
* most drawing while non-persistent mappings are active, we may still use
* the GPU for blits or other operations, causing batches to happen at
* inconvenient times.
+ *
+ * If RAW is set, we expect the caller to be able to handle a WC buffer
+ * more efficiently than the involuntary clflushes.
*/
- if (flags & (MAP_PERSISTENT | MAP_COHERENT | MAP_ASYNC))
+ if (flags & (MAP_PERSISTENT | MAP_COHERENT | MAP_ASYNC | MAP_RAW))
return false;
return !(flags & MAP_WRITE);
isl_memcpy_tiled_to_linear(x1, x2, y1, y2, ptr, src, xfer->stride,
surf->row_pitch_B, has_swizzling,
- surf->tiling, ISL_MEMCPY);
+ surf->tiling, ISL_MEMCPY_STREAMING_LOAD);
box.z++;
}
}