u_tile: Skip the packed temporary and just store tiles directly.
We were generating a packed copy and then memcpying it, but we can just
pack directly to the destination. The change on glmark2 -b build:use-vbo=true
is modest: 1.06328% +/- 0.994771% (n=84), but it does remove the function that
was 0.6% of CPU time.
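For illustration only, here is a minimal sketch of the two shapes. It is not
the actual u_tile code: the function names, the float-RGBA-to-RGBA8 packer and
the byte stride parameter are all assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: clamp a float to [0, 1] and convert to 8-bit unorm. */
static uint8_t float_to_unorm8(float f)
{
   if (!(f > 0.0f))            /* negative values, zero and NaN map to 0 */
      return 0;
   if (f >= 1.0f)
      return 255;
   return (uint8_t)(f * 255.0f + 0.5f);
}

/* Hypothetical packer: one row of w float RGBA pixels -> RGBA8. */
static void pack_row_rgba8(uint8_t *dst, const float *src, unsigned w)
{
   for (unsigned i = 0; i < w * 4; i++)
      dst[i] = float_to_unorm8(src[i]);
}

/* Old shape: pack into a tightly packed temporary, then memcpy each row
 * into the strided destination. Returns 0 on success, -1 on OOM. */
static int store_tile_via_temp(uint8_t *dst, size_t dst_stride,
                               const float *src, unsigned w, unsigned h)
{
   size_t row_bytes = (size_t)w * 4;
   uint8_t *packed;

   assert(dst_stride >= row_bytes);
   if (row_bytes == 0 || h == 0)
      return 0;

   packed = malloc(row_bytes * h);
   if (!packed)
      return -1;

   for (unsigned y = 0; y < h; y++)
      pack_row_rgba8(packed + y * row_bytes, src + (size_t)y * w * 4, w);
   for (unsigned y = 0; y < h; y++)
      memcpy(dst + y * dst_stride, packed + y * row_bytes, row_bytes);

   free(packed);
   return 0;
}

/* New shape: pack each row straight into the strided destination, so
 * there is no temporary allocation and no second pass over the data. */
static void store_tile_direct(uint8_t *dst, size_t dst_stride,
                              const float *src, unsigned w, unsigned h)
{
   assert(dst_stride >= (size_t)w * 4);
   for (unsigned y = 0; y < h; y++)
      pack_row_rgba8(dst + y * dst_stride, src + (size_t)y * w * 4, w);
}
```

Both functions write the same bytes to the destination and leave any row
padding untouched. The direct version simply drops the allocation and the
extra copy.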
I'm not doing the equivalent "get" path at this time because doing so
exposes some clipping issues in softpipe's texture cache.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3698>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3698>