Alyssa Rosenzweig [Wed, 19 Jun 2019 18:27:59 +0000 (11:27 -0700)]
panfrost: Implement command stream scoreboarding
This is a rather complex change, adding a lot of code but ideally
cleaning up quite a bit as we go.
Within a batch (single frame), there are multiple distinct Mali job
types: SET_VALUE, VERTEX, TILER, FRAGMENT for the few that we emit right
now (eventually more for compute and geometry shaders). Each hardware
job has a mali_job_descriptor_header, which contains three fields of
interest: job index, a dependencies list, and a next job pointer.
The next job pointer in each job is used to form a linked list of
submitted jobs. Easy enough.
The job index and dependencies list, however, are used to form a
dependency graph (a DAG, where each hardware job is a node and each
dependency is a directed edge). Internally, this sets up a scoreboarding
data structure for the hardware to dispatch jobs in parallel, enabling
(for example) vertex shaders from different draws to execute in parallel
while there are strict dependencies between tiling the geometry of a
draw and running that vertex shader.
For a while, we got by with an incredible series of total hacks,
manually coding indices, lists, and dependencies. That worked for a
moment, but combinatorial kaboom kicked in and it became an
unmaintainable mess of spaghetti code.
We can do better. This commit explicitly handles the scoreboarding by
providing high-level manipulation for jobs. Rather than a command like
"set dependency #2 to index 17", we can express quite naturally "add a
dependency from job T on job V". Instead of some open-coded logic to
copy a draw pointer into a delicate context array, we now have an
elegant exposed API to simple "queue a job of type XYZ".
The design is influenced by both our current requirements (standard ES2
draws and u_blitter) as well as the need for more complex scheduling in
the future. For instance, blits can be optimized to use only a tiler
job, without a vertex job first (since the screen-space vertices are
known ahead-of-time) -- causing tiler-only jobs. Likewise, when using
transform feedback with rasterizer discard enabled, vertex jobs are
created (to run vertex shaders) with no corresponding tiler job. Both of
these cases break the original model and could not be expressed with the
open-coded logic. More generally, this will make it easier to add
support for compute shaders, geometry shaders, and fused jobs (an
optimization available on Bifrost).
Incidentally, this moves quite a bit of state from the driver context to
the batch, which helps with Rohan's refactor to eventually permit
pipelining across framebuffers (one important outstanding optimization
for FBO-heavy workloads).
v2: Add comment explaining the meaning of "primary batch" as suggested
by Tomeu (trivial - not reviewed).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Rohan Garg <rohan.garg@collabora.com>
Anuj Phogat [Fri, 14 Jun 2019 00:34:46 +0000 (17:34 -0700)]
intel/icl: Add new ICL PCI-IDs
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Wed, 19 Jun 2019 22:07:43 +0000 (17:07 -0500)]
anv: Implement "pop-free" clipping
This is the preferred clipping mode since it doesn't mean your points
disappear the moment part of the point crosses over the edge of the
viewport and that lines have weird endpoints at viewport edges. We've
just never bothered to hook it up until now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Wed, 19 Jun 2019 21:04:54 +0000 (16:04 -0500)]
anv: Enable the guardband clip test
In workloads where there is a lot of geometry drawn that crosses over
the edge of the viewport, this should substantially improve clipper
performance. Not really sure why it's taken 3 years to turn it on but
we never got around to it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Wed, 19 Jun 2019 20:52:55 +0000 (15:52 -0500)]
i965,iris: Move guardband calculations to a common location
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Mauro Rossi [Sat, 15 Jun 2019 05:39:02 +0000 (07:39 +0200)]
android: virgl: fix libmesa_winsys_virgil_common build and dependencies
Fixes the following building errors and resolves Bug 110922
Fixes gallium_dri target missing symbols at linking.
external/mesa/src/gallium/winsys/virgl/drm/Android.mk:
error: libmesa_winsys_virgl (STATIC_LIBRARIES android-x86_64) missing libmesa_winsys_virgl_common (STATIC_LIBRARIES android-x86_64)
...
external/mesa/src/gallium/winsys/virgl/vtest/Android.mk:
error: libmesa_winsys_virgl_vtest (STATIC_LIBRARIES android-x86_64) missing libmesa_winsys_virgl_common (STATIC_LIBRARIES android-x86_64)
...
build/core/main.mk:728: error: exiting from previous errors.
In file included from external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_socket.c:34:
external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_winsys.h:35:10:
fatal error: 'virgl_resource_cache.h' file not found
^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
In file included from external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_winsys.c:32:
external/mesa/src/gallium/winsys/virgl/vtest/virgl_vtest_winsys.h:35:10:
fatal error: 'virgl_resource_cache.h' file not found
#include "virgl_resource_cache.h"
^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Fixes: b18f09a ("virgl: Introduce virgl_resource_cache")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Mauro Rossi [Sat, 8 Jun 2019 13:15:09 +0000 (15:15 +0200)]
android: winsys/amdgpu,radv: fix generated amdgfxregs.h header dependecies
Fix android building errors in winsys/amdgpu and radv
due to 'amdgfxregs.h' not found.
Changelog:
amd/common - generated $(intermediated)/common path is added to exports
winsys/amdgpu - libmesa_amd_common static dependency is added
radv - correct generated $(intermediated)/common path is added to includes
Fixes: f480b8a ("amd/common: use generated register header")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Samuel Pitoiset [Mon, 20 May 2019 09:46:33 +0000 (11:46 +0200)]
radv: add support for VK_KHR_depth_stencil_resolve
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 12 Jun 2019 09:39:58 +0000 (11:39 +0200)]
radv: pass sample locations for transitions before depth/stencil resolves
HTILE decompressions need the user sample locations if specified
in the current subpass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 22 May 2019 09:23:03 +0000 (11:23 +0200)]
radv: clear the depth/stencil resolve attachment if necessary
The driver might need to clear one aspect of the depth/stencil
resolve attachment before performing the resolve itself.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 22 May 2019 12:56:01 +0000 (14:56 +0200)]
radv: decompress HTILE if the resolve src image is compressed
It's required to decompress HTILE before resolving with the
compute path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 22 May 2019 07:45:19 +0000 (09:45 +0200)]
radv: select the depth/stencil resolve method based on some conditions
Only fallback to the compute path for layers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 22 May 2019 07:42:12 +0000 (09:42 +0200)]
radv: implement all depth/stencil resolve modes using compute
This path supports layers but it requires to decompress HTILE
before resolving. The driver also needs to fixup HTILE after
the resolve. This path is probably slower than the graphics one.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 22 May 2019 07:43:39 +0000 (09:43 +0200)]
radv: implement all depth/stencil resolve modes using graphics
When using graphics, the driver doesn't need to decompress HTILE
before resolving. This path currently doesn't support layers
so we have to fallback to the compute path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 20 May 2019 09:47:02 +0000 (11:47 +0200)]
radv: record if a render pass has depth/stencil resolve attachments
Only supported with vkCreateRenderPass2().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 12 Jun 2019 08:20:41 +0000 (10:20 +0200)]
radv: rename has_resolve to has_color_resolve
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 13 Jun 2019 11:56:22 +0000 (13:56 +0200)]
radv: emit framebuffer state from primary if secondary doesn't inherit it
Otherwise fast color/depth clears can't work because they depend
on the framebuffer.
This fixes the following CTS (when the small hint is disabled):
- dEQP-VK.geometry.layered.1d_array.secondary_cmd_buffer
- dEQP-VK.geometry.layered.2d_array.secondary_cmd_buffer
- dEQP-VK.geometry.layered.cube.secondary_cmd_buffer
- dEQP-VK.geometry.layered.cube_array.secondary_cmd_buffer
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110810
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107986
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Engestrom [Fri, 23 Nov 2018 17:04:25 +0000 (17:04 +0000)]
drisw: move build logic to build systems
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tomeu Vizoso [Fri, 21 Jun 2019 06:10:57 +0000 (08:10 +0200)]
panfrost: ci: Exclude two more flip-flop from results
These three tests pass on RK3399, but fail on RK3288:
dEQP-GLES2.functional.shaders.matrix.div.const_lowp_mat2_mat2_vertex
dEQP-GLES2.functional.shaders.operator.unary_operator.pre_increment_effect.highp_ivec4_vertex
dEQP-GLES2.functional.shaders.texture_functions.vertex.texture2dprojlod_vec3
They reliably pass when run individually, but reliably fail when run in
a full CI run.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Gert Wollny [Thu, 20 Jun 2019 13:38:30 +0000 (15:38 +0200)]
gallium/st: Add Gallium hud to swrast drivers
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Iago Toral Quiroga [Wed, 19 Jun 2019 08:28:12 +0000 (10:28 +0200)]
v3d: flush jobs writing to vertex buffers used in the current draw call
This can happen when any of our vertex buffers was written by a previous
transform feedback draw.
Fixes the following piglit tests:
spec/ext_transform_feedback/position-render-bufferbase
spec/ext_transform_feedback/position-render-bufferbase-discard
spec/ext_transform_feedback/position-render-bufferoffset
spec/ext_transform_feedback/position-render-bufferoffset-discard
spec/ext_transform_feedback/position-render-bufferrange
spec/ext_transform_feedback/position-render-bufferrange-discard
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Wed, 19 Jun 2019 07:48:12 +0000 (09:48 +0200)]
v3d: flush jobs reading from transform feedback output buffers
If we are about to write to a transform feedback buffer, we should
make sure that we flush any prior work that intended to read from
any of these buffers.
Fixes piglit test:
spec/ext_transform_feedback/immediate-reuse
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Wed, 19 Jun 2019 08:23:43 +0000 (10:23 +0200)]
v3d: add a helper to check if transform feedback is enabled
v2: We should be safe assuming that bind_vs != NULL (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
Dave Airlie [Wed, 19 Jun 2019 20:47:08 +0000 (06:47 +1000)]
llvmpipe: make remove_shader_variant static.
this isn't used outside this file.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Eric Engestrom [Wed, 1 May 2019 10:51:01 +0000 (11:51 +0100)]
util/os_file: resize buffer to what was actually needed
Fixes: 316964709e21286c2af5 "util: add os_read_file() helper"
Reported-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tomeu Vizoso [Thu, 20 Jun 2019 18:57:30 +0000 (20:57 +0200)]
panfrost: ci: Update expectations
These tests have been fixed recently.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Wed, 19 Jun 2019 14:23:27 +0000 (07:23 -0700)]
panfrost/midgard: Broadcast swizzle
Fixes regression in shaders using ball/etc by explicitly passing through
the number of channels in the NIR op and broadcasting the last
components of the channel appropriately, as the Midgard ops are all vec4
implicitly but NIR can be vec2/3.
v2: Don't also regress every other swizzle in Equestria.
v3: Don't regress the swizzles at Canterlot High either.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Kenneth Graunke [Thu, 20 Jun 2019 06:08:04 +0000 (01:08 -0500)]
iris: Use stream uploader for shader draw parameters.
Most vertex data lives in user VBOs in IRIS_MEMZONE_OTHER, which
typically have high bits set to 0xffff. The shader draw parameters were
being uploaded in IRIS_MEMZONE_DYNAMIC, which have high bets set to 0x2.
This was causing a lot of ping-ponging of high bits, leading to
unnecessary VF cache flushing.
Cuts 7.2% of the flushes in the Civizilation VI demo on Kabylake GT2.
Kenneth Graunke [Thu, 20 Jun 2019 05:47:33 +0000 (00:47 -0500)]
iris: Don't check VF address high bits when there is no buffer.
If there is no buffer, then it doesn't matter. Leave the old stale
high bits in place (for next time) and don't bother invalidating.
Cuts 5.6% of the flushes in the Civilization VI demo on Kabylake GT2.
Kenneth Graunke [Thu, 20 Jun 2019 04:12:52 +0000 (23:12 -0500)]
iris: Drop RT flushes from depth stencil clearing flushes.
These write depth and stencil, not color writes, so there's no need
to flush the render target.
Kenneth Graunke [Thu, 20 Jun 2019 04:30:52 +0000 (23:30 -0500)]
iris: Don't bother with PIPE_CONTROLs for CPU writes and no history
If a buffer has no usage history, we don't have any read only cache
invalidates to do. If we've written it with the CPU, we don't need
to flush the render cache. The only bit remaining is the CS stall
from iris_flush_bits_for_history. We can just skip the PIPE_CONTROL
in this case.
This is pretty common - an app creates a buffer, fills it with data,
and then binds it for some purpose.
Cuts 36% of the flushes in Manhattan 3.0 on Kabylake GT2.
Kenneth Graunke [Thu, 20 Jun 2019 04:04:37 +0000 (23:04 -0500)]
iris: Only do an RT flush for transfer maps if using copy_region.
If we wrote the data via the CPU, there's no point in doing a render
target flush. If using BLORP, we do want a render target flush so the
data lands.
Kenneth Graunke [Thu, 20 Jun 2019 04:01:21 +0000 (23:01 -0500)]
iris: Use iris_flush_bits_for_history in iris_transfer_flush_region
Instead of using the combined iris_flush_and_dirty_for_history, use
iris_flush_bits_for_history directly - we were already using the split
out iris_dirty_for_history. There's no need to dirty twice, and we can
avoid the looping altogether for non-buffers.
Kenneth Graunke [Thu, 20 Jun 2019 03:51:51 +0000 (22:51 -0500)]
iris: Avoid double flushing in iris_transfer_flush_region when copying.
My intention was to have iris_copy_region not do flushing, and leave
that up to the callers. iris_resource_copy_region needs to do this,
but iris_transfer_flush_region was already doing it. The net result
was that we were doing it twice for transfers.
So, move the flushing from iris_copy_region to iris_resource_copy_region
so that it only happens in the callers as I intended.
Kenneth Graunke [Thu, 20 Jun 2019 04:08:25 +0000 (23:08 -0500)]
iris: Fix iris_flush_and_dirty_history to actually dirty history.
When I split iris_flush_and_dirty_history into two helper functions,
I accidentally made it stop dirtying. Which was...sort of the point.
Fixes: 21688a306b2 iris: Split iris_flush_and_dirty_for_history into two helpers.
Kenneth Graunke [Thu, 20 Jun 2019 14:50:56 +0000 (09:50 -0500)]
iris: Add maybe_flush calls to texture_barrier and memory_barrier
Otherwise, tests which loop on glMemoryBarrier may run us out of
batch space with piles of flushing. (Ideally, we'd elide those bonus
PIPE_CONTROLs, but presumably this isn't that common of a case...)
Piglit's arb_pipeline_statistics_query-comp would hit this case after
some of the next patches remove other PIPE_CONTROLs with maybe_flushes.
Kenneth Graunke [Wed, 19 Jun 2019 21:04:50 +0000 (16:04 -0500)]
iris: Implement INTEL_DEBUG=pc for pipe control logging.
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush. That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on. It should make it easier to determine whether we're doing
too many flushes and why.
Alyssa Rosenzweig [Tue, 18 Jun 2019 19:30:55 +0000 (12:30 -0700)]
panfrost: Skip shading unaffected tiles
Looking at the scissor, we can discard some tiles. We specifially don't
care about the scissor on the wallpaper, since that's a no-op if the
entire tile is culled.
v2: Clarify clear comment (not reviewed but trivial).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Eric Engestrom [Fri, 14 Jun 2019 14:15:10 +0000 (15:15 +0100)]
glx: fix glvnd pointer types
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110709
Fixes: 22a9e00aab66d3dd6890 ("glx: Implement the libglvnd interface.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Engestrom [Wed, 19 Jun 2019 20:31:34 +0000 (21:31 +0100)]
glx: drop misleading comment about the file being "generated"
This `gen_scrn_dispatch.pl` has never existed, in the sense that NVIDIA
never published it. There have been a number (6) of commits to fix
various things in there over the years, and never anything from NVIDIA.
For all intents and purposes this file is hand-written and
hand-maintained, and we're on our own.
Let's make this clear by removing this misleading comment.
Suggested-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Boris Brezillon [Wed, 19 Jun 2019 13:05:34 +0000 (15:05 +0200)]
nir/lower_tex: Add an assert() in nir_lower_txs_lod()
We don't expect the output of a TXS instruction to be wider than a
vec3. Add an assert() to make sure this never happens.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tomeu Vizoso [Thu, 20 Jun 2019 13:37:10 +0000 (15:37 +0200)]
panfrost: Set job requirements during draw
Right now we are doing it at a moment when we don't have all the
information we need.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Suggested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Rohan Garg <rohan.garg@collabora.com>
Cc: Rohan Garg <rohan.garg@collabora.com>
Fixes: bfca21b622df ("panfrost: Figure out job requirements in pan_job.c")
Alyssa Rosenzweig [Thu, 20 Jun 2019 15:54:56 +0000 (08:54 -0700)]
panfrost/meson: Link with libpanfrost_shared
Fixes: 035a07c0 ("panfrost: Switch to lima tiling")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Hyunjun Ko [Thu, 20 Jun 2019 13:12:23 +0000 (22:12 +0900)]
freedreno/ir3: fix typo
Fixes: a9b556d3a04 ("freedreno/ir3: check the type of regs of absneg opcode in is_same_type_mov")
Reviewed-by: Rob Clark <robdclark@gmail.com>
Alyssa Rosenzweig [Tue, 18 Jun 2019 17:48:43 +0000 (10:48 -0700)]
panfrost: Load from tiled images
Now that we have lima tiling code available, use it to load from a tiled
source.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 18 Jun 2019 17:43:07 +0000 (10:43 -0700)]
panfrost: Switch to lima tiling
Lima and Panfrost both have implementations of software tiling
(the Lima one was forked off the Panfrost one which was forked off the
original Lima one...). Switch to the most recent Lima code, since it's
more complete than ours at this point.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 20 Jun 2019 15:19:06 +0000 (08:19 -0700)]
panfrost: Fix tiled NPOT textures with bpp<4
Panfrost's tiling routines (incorrectly) ignored the source stride,
masking this bug; lima's routines respect this stride, causing issues
when tiling NPOT textures whose stride is not a multiple of 64
(for instance, NPOT textures with bpp=1).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 18 Jun 2019 18:16:21 +0000 (11:16 -0700)]
lima,panfrost: Move lima_tiling.c/h to /src/panfrost
This will allow both drivers to share this code. Both drivers
build-tested with meson. Android build not tested.
v2: Change naming from tiling->shared, in case Lima and Panfrost can
share more in the future. Fix Android build system.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-and-tested-by: Qiang Yu <yuq825@gmail.com>
Kenneth Graunke [Thu, 20 Jun 2019 15:02:42 +0000 (10:02 -0500)]
iris: Use render_batch/compute_batch locals in memory_barrier
We have them, may as well use them.
Lionel Landwerlin [Sun, 21 Apr 2019 00:01:55 +0000 (01:01 +0100)]
anv: only resort to sync fds internally with no syncobj support
We can rely on only one kind of synchronization object (drm-syncobj)
when it is available. This reduces the number of file descriptors we
use in our implementation.
This will be required later for timeline semaphores implementation, at
this point we won't ever want to use anything else but syncobjs.
v2: Only use has_syncobj for semaphores (Jason)
v3: Only has_syncobj in assert on semaphores in QueueSubmit (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Alyssa Rosenzweig [Wed, 19 Jun 2019 16:36:22 +0000 (09:36 -0700)]
panfrost: Remove other commented pointers
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 19 Jun 2019 16:35:57 +0000 (09:35 -0700)]
panfrost/decode: Elide more zero fields
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 19 Jun 2019 16:33:19 +0000 (09:33 -0700)]
panfrost/decode: Remove memory comments
These do more harm than good at this point.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 19 Jun 2019 16:31:49 +0000 (09:31 -0700)]
panfrost: Add missing 0x in invocation_count
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 19 Jun 2019 16:31:16 +0000 (09:31 -0700)]
panfrost/decode: Skip decode of fragment backend in non-fragment
This is all zero for anything but fragment shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 19 Jun 2019 16:24:01 +0000 (09:24 -0700)]
panfrost/decode: Clip mali_compute_fbd at 64-bytes
Looking at internal evidence (later fields including a literal other
compute job inception-style, seeming memory corruption, no clear
function, and the field after this being a pointer to *itself*), it
looks like this is really a much smaller descriptor.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 19 Jun 2019 16:07:13 +0000 (09:07 -0700)]
panfrost/decode: Print COMPUTE uniforms as pointers
In OpenGL, uniforms generally represent fp32 vec4s (at least in highp
mode). In OpenCL, they represent vec2s of 64-bit pointers.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 19 Jun 2019 15:59:23 +0000 (08:59 -0700)]
panfrost/decode: Show int uniforms
Float is ambiguous.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 19 Jun 2019 15:55:03 +0000 (08:55 -0700)]
panfrost/decode: Expand pointers in compute descriptor
Just as an aid.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 19 Jun 2019 15:41:51 +0000 (08:41 -0700)]
panfrost/decode: Identify "compute FBD"
There is fundamentally not a framebuffer associated with a compute job.
Allocate a new structure for it so we don't mess up graphics when
decoding.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tomeu Vizoso [Wed, 19 Jun 2019 11:16:45 +0000 (13:16 +0200)]
panfrost: Allocate panfrost_job in panfrost_context
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tomeu Vizoso [Wed, 19 Jun 2019 11:16:14 +0000 (13:16 +0200)]
panfrost: Release transient pools
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tomeu Vizoso [Thu, 20 Jun 2019 05:59:01 +0000 (07:59 +0200)]
panfrost: ci: Exclude flip-flops from results
These tests are failing at times, blacklist for now:
dEQP-GLES2.functional.fbo.render.shared_colorbuffer_clear.tex2d_rgba
dEQP-GLES2.functional.fbo.render.shared_colorbuffer_clear.tex2d_rgb
dEQP-GLES2.functional.shaders.matrix.mul.dynamic_highp_mat4_vec4_vertex
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alejandro Piñeiro [Thu, 20 Jun 2019 12:48:35 +0000 (14:48 +0200)]
util: add empty line before virgl options
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Alejandro Piñeiro [Thu, 20 Jun 2019 11:13:06 +0000 (13:13 +0200)]
util: add missing DRI_CONF_OPT_END
When DRI_CONF_GLES_EMULATE_BGRA was added for the virgl driver, it
missed a DRI_CONF_OPT_END.
This make some drivers, like v4c/v3d to crash with the following
error:
Fatal error in __driConfigOptions line 99, column 2: mismatched tag.
Not sure why it doesn't fail with virgl.
Fixes: b79366344929c6e477c64a63f246c6db0766a71c
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Eric Engestrom [Wed, 19 Jun 2019 21:18:15 +0000 (22:18 +0100)]
isl: tag unreachable path as such
GCC should be able to figure out that all the possible enum values are
exhausted in the switch() and all the branches return from the function,
but apparently it doesn't, so let's tell the compiler explicitly.
This gets rid of the following warnings in GCC 9:
[1/24] Compiling C object 'src/intel/isl/
60d23f8@@isl@sta/isl.c.o'.
../src/intel/isl/isl.c: In function ‘isl_surf_init_s’:
../src/intel/isl/isl.c:1569:10: warning: ‘array_pitch_el_rows’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1569 | *surf = (struct isl_surf) {
| ~~~~~~^~~~~~~~~~~~~~~~~~~~~
1570 | .dim = info->dim,
| ~~~~~~~~~~~~~~~~~
1571 | .dim_layout = dim_layout,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
1572 | .msaa_layout = msaa_layout,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~
1573 | .tiling = tiling,
| ~~~~~~~~~~~~~~~~~
1574 | .format = info->format,
| ~~~~~~~~~~~~~~~~~~~~~~~
1575 |
|
1576 | .levels = info->levels,
| ~~~~~~~~~~~~~~~~~~~~~~~
1577 | .samples = info->samples,
| ~~~~~~~~~~~~~~~~~~~~~~~~~
1578 |
|
1579 | .image_alignment_el = image_align_el,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1580 | .logical_level0_px = logical_level0_px,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1581 | .phys_level0_sa = phys_level0_sa,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1582 |
|
1583 | .size_B = size_B,
| ~~~~~~~~~~~~~~~~~
1584 | .alignment_B = base_alignment_B,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1585 | .row_pitch_B = row_pitch_B,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~
1586 | .array_pitch_el_rows = array_pitch_el_rows,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1587 | .array_pitch_span = array_pitch_span,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1588 |
|
1589 | .usage = info->usage,
| ~~~~~~~~~~~~~~~~~~~~~
1590 | };
| ~
../src/intel/isl/isl.c:1488:24: warning: ‘*((void *)&phys_total_el+4)’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1488 | struct isl_extent2d phys_total_el;
| ^~~~~~~~~~~~~
../src/intel/isl/isl.c:1335:38: warning: ‘phys_total_el’ may be used uninitialized in this function [-Wmaybe-uninitialized]
1335 | isl_align_div(phys_total_el->w * tile_el_scale,
| ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
../src/intel/isl/isl.c:1488:24: note: ‘phys_total_el’ was declared here
1488 | struct isl_extent2d phys_total_el;
| ^~~~~~~~~~~~~
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Samuel Pitoiset [Thu, 20 Jun 2019 07:17:37 +0000 (09:17 +0200)]
radv: enable DCC for mipmapped color textures on GFX8
It's tricky on GFX9, so only GFX8 for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 20 Jun 2019 07:17:36 +0000 (09:17 +0200)]
radv: do not fast clears if one level can't be fast cleared
And fallback to slow color clears.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 20 Jun 2019 07:17:35 +0000 (09:17 +0200)]
radv: add fast clears support for mipmapped color images with DCC
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 20 Jun 2019 07:17:34 +0000 (09:17 +0200)]
radv: add radv_dcc_clear_level() helper
For clearing only one level.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 20 Jun 2019 07:17:33 +0000 (09:17 +0200)]
radv: re-initialize DCC metadata after decompressing using compute
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 20 Jun 2019 07:17:32 +0000 (09:17 +0200)]
radv: initialize levels without DCC during layout transitions
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Thomas Hellstrom [Fri, 5 Apr 2019 15:07:49 +0000 (17:07 +0200)]
svga: Support ARB_buffer_storage
This basically boils down to supporting persistent and coherent buffer
storage.
We chose to use coherent buffer storage for all persistent buffers
even if it's not explicitly specified, since using glMemoryBarrier to
obtain coherency would be particularly expensive in our driver stack,
and require a lot of additional bookkeeping.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Thomas Hellstrom [Wed, 10 Apr 2019 11:54:30 +0000 (13:54 +0200)]
gallium/util: Make it possible to disable persistent maps in the upload manager
For svga, the use of persistent / coherent maps is typically slightly
slower than without them. It's probably a bit case-dependent and
possible to tune, but for now, make sure we can disable those.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Thomas Hellstrom [Thu, 11 Apr 2019 08:10:04 +0000 (10:10 +0200)]
svga: Map vertex- index- and constant buffers ansynchronously when reading
With SWTNL and index translation we're mapping buffers for reading. These
buffers are commonly upload_mgr buffers that might already be referenced
by another submitted or unsubmitted GPU command. A synchronous map will
then trigger a flush and sync, at least on Linux that doesn't distinguish
between read- and write referencing. So map these buffers async. If they
for some obscure reason happen to be dirty (stream-output, buffer-copy),
the resource_buffer code will read-back and sync anyway. For persistent /
coherent buffers a corresponding read-back and sync will happen in the
kernel fault handler.
Testing: Piglit quick. No regressions.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Thomas Hellstrom [Thu, 11 Apr 2019 06:56:20 +0000 (08:56 +0200)]
svga: Fix index buffer uploads
In the case of SWTNL and index translation we were uploading index buffers
and then reading out from them using the CPU. Furthermore, when translating
indices we often cached the results with an upload_mgr buffer, causing the
cached indexes to be immediately discarded on the next write to that
upload_mgr buffer.
Fix this by only uploading when we know the index buffer is going to be
used by hardware. If translating, only cache translated indices if the
original buffer was not a user buffer. In the latter case when we're not
caching, use an upload_mgr buffer for the hardware indices.
This means we can also remove the SWTNL hand-crafted index buffer upload
mechanism in favour of the upload_mgr.
Finally avoid using util_upload_index_buffer(). It wastes index buffer
space by trying to make sure that the offset of the indices in the
upload_mgr buffer is larger or equal to the position of the indices in
the source buffer. From what I can tell, the SVGA device does not
require that.
Testing done: Piglit quick. No regressions.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Thomas Hellstrom [Fri, 5 Apr 2019 07:09:19 +0000 (09:09 +0200)]
winsys/svga: Make it possible to specify coherent resources
Add a flag in the surface cache key and a winsys usage flag to
specify coherent memory.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Thomas Hellstrom [Sat, 6 Apr 2019 09:13:39 +0000 (11:13 +0200)]
gallium/util: Make u_debug_flush support persistent maps
Previously unsynchronized maps have been assumed to also be persistent,
Now destinguish between persistent and unsynchronized map and also support
PIPE_TRANSFER_PERSISTENT from ARB_buffer_storage.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Gert Wollny [Mon, 27 May 2019 14:38:38 +0000 (16:38 +0200)]
virgl: Add debug flag to bypass driconf to enable the BGRA tweaks
This useful for testing, also because with vtest the dri configuration
is not read.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Gert Wollny [Mon, 27 May 2019 14:38:14 +0000 (16:38 +0200)]
virgl: Add a tweak to set the value for emulated queries of GL_SAMPLES_PASSED
On GLES hosts GL_SAMPLES_PASSED is emulated by GL_ANY_SAMPLES_PASSED which returns a boolen.
With this tweak the value that is returned if any sample passed can be set. This
may be of iterest when an application decides whether some geometry is rendered based
on an amount of visibility and not just a binary desicion. virgelrenderer sets a default
of 1024 on th host.
v2: Remove reference from virgl and correct description (Emil)
v3: Send the tweak binary encoded instead of using strings (Gurchetan)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Gert Wollny [Mon, 27 May 2019 14:32:40 +0000 (16:32 +0200)]
virgl: Add tweak to apply a swizzle when drawing/blitting to a emulated BGRA texture
With Qemu this final swizzle is not needed, but with vtest it is, i.e. it depends on
how a program using virglrenderer uses the surface that is rendered to, hence
a tweak is added.
v2: Update description and fix spelling (Emil)
v3: Send tweak as binary value instead of using strings (Gurchetan)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Gert Wollny [Mon, 27 May 2019 14:31:17 +0000 (16:31 +0200)]
virgl: Add driconf tweak for emulating BGRA surfaces on GLES
These tweaks are used to fix rendering issues with Valve games and
at least also "The Raven Remastered" when run on a GLES host.
v2: Fix type in define and remove virgl from driconf option (Emil)
v3: Encode tweak binary instead of using strings (Gurchetan)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Gert Wollny [Mon, 27 May 2019 14:28:44 +0000 (16:28 +0200)]
virgl: Add override for BGRA format to use swizzled SRGB format
Tie in the check whether the host supports tweaks and whether this tweak
is enabled.
v2: Add comment about the emulated formats not being used directly in the
guest (Gurchetan)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Gert Wollny [Mon, 27 May 2019 14:26:25 +0000 (16:26 +0200)]
virgl: Add code to accept BGRx_SRGB as RGBx_SRGB
This will be enabled in later patches by the emulation tweak.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Gert Wollny [Mon, 27 May 2019 14:02:28 +0000 (16:02 +0200)]
virgl: Add skeleton to evaluate cap and send tweaks
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Gert Wollny [Fri, 12 Apr 2019 07:52:31 +0000 (09:52 +0200)]
virgl: factor out format host bits check
This will make it a single location when we want to replace a format.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Gert Wollny [Wed, 10 Apr 2019 11:54:14 +0000 (13:54 +0200)]
gallium/virgl: Add code path for virgl to read driconf
This works only for the drm variant of virgl and not for the vtest
variant.
v2: Rebase, replace the configuration query function by a pointer to
the configuration data.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Gert Wollny [Wed, 10 Apr 2019 12:03:00 +0000 (14:03 +0200)]
virgl: Add driinfo file and tie it into the build
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Caio Marcelo de Oliveira Filho [Sat, 23 Mar 2019 17:28:03 +0000 (10:28 -0700)]
glspirv: Call pass to lower frexp instructions
These were previously handled by the spirv_to_nir, but that changed to
be an explict pass in
23d30f4099f "spirv,nir: lower
frexp_exp/frexp_sig inside a new NIR pass"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Fri, 22 Mar 2019 05:58:30 +0000 (22:58 -0700)]
spirv: Restrict use of descriptor intrinsics to Vulkan
In ARB_gl_spirv we'll be able to use variables for uniform buffers, so
don't use the descriptor intrinsics to lower the block access.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nicolai Hähnle [Thu, 23 May 2019 13:17:51 +0000 (15:17 +0200)]
ac/rtld: report better error messages for LDS overallocation
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Marek Olšák [Tue, 21 May 2019 23:17:34 +0000 (19:17 -0400)]
ac/rtld: check correct LDS max size
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Nicolai Hähnle [Fri, 7 Dec 2018 09:31:41 +0000 (10:31 +0100)]
radeonsi: add s_sethalt to shaders for debugging
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Nicolai Hähnle [Thu, 23 May 2019 13:17:24 +0000 (15:17 +0200)]
ac/rtld: fix sorting of LDS symbols by alignment
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Bas Nieuwenhuizen [Wed, 19 Jun 2019 13:16:51 +0000 (15:16 +0200)]
meson: Allow building radeonsi with just the android platform.
Just as was allowed by autotools.
Fixes: 108d257a168 "meson: build libEGL"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Bas Nieuwenhuizen [Wed, 19 Jun 2019 13:05:40 +0000 (15:05 +0200)]
anv: Fix vulkan build in meson.
Apparently the android part was never ported to meson.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Bas Nieuwenhuizen [Wed, 19 Jun 2019 13:03:43 +0000 (15:03 +0200)]
radv: Fix vulkan build in meson.
Apparently the android part was never ported to meson.
CC: <mesa-stable@lists.freedesktop.org>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Jason Ekstrand [Wed, 19 Jun 2019 20:18:06 +0000 (15:18 -0500)]
anv/image: Set different usage flags for shadow surfaces
For the block BLOCK_TEXEL_VIEW_COMPATIBLE case, this didn't matter
because the flags were already more-or-less what we wanted. However,
for gen7 stencil shadow images, it still had ISL_SURF_USAGE_STENCIL_BIT
so we were getting W-tiled which isn't what we want for the shadow. By
passing just ISL_SURF_USAGE_TEXTURE_BIT (and CUBE if we care), we now
get something that's actually texturable.
Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
Jason Ekstrand [Wed, 19 Jun 2019 19:14:20 +0000 (14:14 -0500)]
anv: Flush caches in anv_image_copy_to_shadow
Copies to a shadow image happen during a VkCmdPipelineBarrier or at
subpass transitions. We could potentially be a bit more conservative
but these transitions shouldn't happen often and it's better to have our
bases covered.
Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
Jason Ekstrand [Thu, 6 Jun 2019 15:51:25 +0000 (10:51 -0500)]
nir: Make nir_constant a vector rather than a matrix
Most places in NIR, we treat matrices like arrays. The one annoying
exception to this has been nir_constant where a matrix is a first-class
thing. This commit changes that so a matrix nir_constant is the same as
an array nir_constant. This makes matrix nir_constants a tiny bit more
expensive but shrinks all others by 96B.
Reviewed-by: Karol Herbst <kherbst@redhat.com>