mesa.git
4 years agopan/midgard: Simplify spillability test
Alyssa Rosenzweig [Fri, 6 Dec 2019 16:19:05 +0000 (11:19 -0500)]
pan/midgard: Simplify spillability test

Let's not worry about spilling twice in a bundle; that's too
restrictive. We'll need to change the schedule itself -- unfortunately,
this can have second-order effects due to pipeline registers.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Split spill node selection/spilling
Alyssa Rosenzweig [Fri, 6 Dec 2019 15:44:21 +0000 (10:44 -0500)]
pan/midgard: Split spill node selection/spilling

Instead of having a giant function for both, split into the two
subtasks so we can handle errors better.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Move spilling code out of scheduler
Alyssa Rosenzweig [Fri, 6 Dec 2019 14:32:38 +0000 (09:32 -0500)]
pan/midgard: Move spilling code out of scheduler

We move it to the register allocator itself. It doesn't belong in
midgard_schedule.c!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agost/mesa: Don't access members of NULL pointers
Tomeu Vizoso [Thu, 12 Dec 2019 13:52:47 +0000 (14:52 +0100)]
st/mesa: Don't access members of NULL pointers

Should be harmless, but UBSAN complains about it and fills the logs with
noise.
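
A minimal sketch of the kind of guard that silences this class of report. The struct and function names are hypothetical stand-ins, not the actual st_framebuffer code: the point is only that forming the address of a member through a NULL pointer is what UBSAN flags, even when the result is never dereferenced.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a wrapper struct that embeds a base object. */
struct base_fb { int refcount; };
struct wrapper_fb { struct base_fb base; int stamp; };

/* Evaluating &fb->base with fb == NULL is reported as "member access
 * within null pointer". Checking the pointer first avoids the report
 * and still yields NULL for a NULL input. */
static struct base_fb *
wrapper_fb_base(struct wrapper_fb *fb)
{
   return fb ? &fb->base : NULL;
}
```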

../src/mesa/state_tracker/st_manager.c:523:27: runtime error: member access within null pointer of type 'struct st_framebuffer'
    #0 0xaad4e89c in st_framebuffer_reference ../src/mesa/state_tracker/st_manager.c:523
    #1 0xaad4e89c in st_api_make_current ../src/mesa/state_tracker/st_manager.c:1091
    #2 0xaab69e0e in dri_make_current ../src/gallium/state_trackers/dri/dri_context.c:301
    #3 0xaab48fd2 in driBindContext ../src/mesa/drivers/dri/common/dri_util.c:581
    #4 0xb682a122 in dri2_make_current ../src/egl/drivers/dri2/egl_dri2.c:1625
    #5 0xb67f95a4 in eglMakeCurrent ../src/egl/main/eglapi.c:884
    #6 0x4c2b0e in tcu::surfaceless::EglRenderContext::EglRenderContext(glu::RenderConfig const&, tcu::CommandLine const&) (/deqp/modules/gles2/deqp-gles2+0x29b0e)
    #7 0x4c3302 in tcu::surfaceless::ContextFactory::createContext(glu::RenderConfig const&, tcu::CommandLine const&, glu::RenderContext const*) const (/deqp/modules/gles2/deqp-gles2+0x2a302)
    #8 0x73a9b0 in glu::createRenderContext(tcu::Platform&, tcu::CommandLine const&, glu::RenderConfig const&, glu::RenderContext const*) (/deqp/modules/gles2/deqp-gles2+0x2a19b0)
    #9 0x73ad86 in glu::createDefaultRenderContext(tcu::Platform&, tcu::CommandLine const&, glu::ApiType) (/deqp/modules/gles2/deqp-gles2+0x2a1d86)
    #10 0x4c6a78 in deqp::gles2::Context::Context(tcu::TestContext&) (/deqp/modules/gles2/deqp-gles2+0x2da78)
    #11 0x4c3ba0 in deqp::gles2::TestPackage::init() (/deqp/modules/gles2/deqp-gles2+0x2aba0)
    #12 0x852fd8 in tcu::TestHierarchyIterator::next() (/deqp/modules/gles2/deqp-gles2+0x3b9fd8)
    #13 0x829660 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x390660)
    #14 0x810aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
    #15 0x4c1d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
    #16 0xb64b6aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)

../src/mesa/state_tracker/st_atom.c:115:8: runtime error: member access within null pointer of type 'struct st_program'
    #0 0xaae11a58 in check_program_state ../src/mesa/state_tracker/st_atom.c:115
    #1 0xaae128f6 in st_validate_state ../src/mesa/state_tracker/st_atom.c:192
    #2 0xaadc58c2 in prepare_draw ../src/mesa/state_tracker/st_draw.c:132
    #3 0xaadc58c2 in st_draw_vbo ../src/mesa/state_tracker/st_draw.c:184
    #4 0xabc4f924 in _mesa_validated_drawrangeelements ../src/mesa/main/draw.c:816
    #5 0xabc50240 in _mesa_DrawElements ../src/mesa/main/draw.c:970
    #6 0x73ebd2 in glu::CallLogWrapper::glDrawElements(unsigned int, int, unsigned int, void const*) (/deqp/modules/gles2/deqp-gles2+0x2d4bd2)
    #7 0x6d86b2 in deqp::gls::FragOpInteractionCase::iterate() (/deqp/modules/gles2/deqp-gles2+0x26e6b2)
    #8 0x494d16 in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad16)
    #9 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
    #10 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
    #11 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
    #12 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
    #13 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: Don't lose bits!
Tomeu Vizoso [Thu, 12 Dec 2019 13:49:57 +0000 (14:49 +0100)]
panfrost: Don't lose bits!

UBSAN complained that when alpha was 255 and we shifted it 24 positions
to the left, it didn't fit in a signed int. That's because bitwise
operations automatically promote to signed int.
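
A minimal sketch of the usual fix, assuming 8-bit channels packed into a 32-bit word. The helper name and the channel order are illustrative assumptions, not the actual pan_pack_color() code. A uint8_t operand undergoes integer promotion to (signed) int before the shift, so the operand has to be widened to an unsigned 32-bit type first.

```c
#include <stdint.h>

/* Hypothetical helper; the channel order is an assumption for illustration. */
static uint32_t
pack_rgba8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
   /* Without the casts, `a << 24` is evaluated in int, and 255 << 24
    * does not fit in a 32-bit signed int. Casting to uint32_t keeps the
    * whole expression unsigned. */
   return ((uint32_t) a << 24) |
          ((uint32_t) b << 16) |
          ((uint32_t) g << 8)  |
          (uint32_t) r;
}
```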

../src/gallium/drivers/panfrost/pan_job.c:1130:64: runtime error: left shift of 255 by 24 places cannot be represented in type 'int'
    #0 0xacf953d6 in pan_pack_color ../src/gallium/drivers/panfrost/pan_job.c:1130
    #1 0xacf953d6 in panfrost_batch_clear ../src/gallium/drivers/panfrost/pan_job.c:1204
    #2 0xaae3226a in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:513
    #3 0x4c3d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)
    #4 0x828cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
    #5 0x8295f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
    #6 0x810aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
    #7 0x4c1d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
    #8 0xb64b6aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agoutil: Don't access members of NULL pointers
Tomeu Vizoso [Thu, 12 Dec 2019 14:00:40 +0000 (15:00 +0100)]
util: Don't access members of NULL pointers

Should be harmless, but UBSAN complains about it and fills the logs with
noise.

../src/gallium/auxiliary/util/u_inlines.h:110:8: runtime error: member access within null pointer of type 'struct pipe_surface'
    #0 0xaaccf186 in pipe_surface_reference ../src/gallium/auxiliary/util/u_inlines.h:110
    #1 0xaaccf186 in util_copy_framebuffer_state ../src/gallium/auxiliary/util/u_framebuffer.c:105
    #2 0xaabfb60e in cso_set_framebuffer ../src/gallium/auxiliary/cso_cache/cso_context.c:723
    #3 0xaae195ce in st_update_framebuffer_state ../src/mesa/state_tracker/st_atom_framebuffer.c:207
    #4 0xaae12316 in st_validate_state ../src/mesa/state_tracker/st_atom.c:261
    #5 0xaae31302 in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:438
    #6 0x4c3d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)
    #7 0x828cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
    #8 0x8295f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
    #9 0x810aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
    #10 0x4c1d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
    #11 0xb64b6aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agonir: Don't copy empty array
Tomeu Vizoso [Thu, 12 Dec 2019 13:40:07 +0000 (14:40 +0100)]
nir: Don't copy empty array

It's undefined behavior UBSAN complains about, so fixing this will
reduce the noise a bit.
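
A minimal sketch of the guard, assuming the empty array is represented by a NULL pointer with a count of zero. The helper name is hypothetical and this is not the actual nir_clone.c change. The arguments of memcpy() are declared non-null, so passing NULL is flagged even when the size is zero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `count` ints, tolerating an empty source. */
static void
copy_ints(int *dst, const int *src, size_t count)
{
   /* Skip the call entirely for an empty array, so a NULL `src`
    * never reaches memcpy(). */
   if (count == 0)
      return;

   memcpy(dst, src, count * sizeof(*src));
}
```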

../src/compiler/nir/nir_clone.c:710:4: runtime error: null pointer passed as argument 2, which is declared to never be null
    #0 0xac781be4 in clone_function ../src/compiler/nir/nir_clone.c:710
    #1 0xac781be4 in nir_shader_clone ../src/compiler/nir/nir_clone.c:740
    #2 0xacf99442 in panfrost_shader_compile ../src/gallium/drivers/panfrost/pan_assemble.c:54
    #3 0xacf6b268 in panfrost_bind_shader_state ../src/gallium/drivers/panfrost/pan_context.c:1960
    #4 0xaae326bc in set_fragment_shader ../src/mesa/state_tracker/st_cb_clear.c:135
    #5 0xaae326bc in clear_with_quad ../src/mesa/state_tracker/st_cb_clear.c:335
    #6 0xaae326bc in st_Clear ../src/mesa/state_tracker/st_cb_clear.c:518
    #7 0x494d0e in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad0e)
    #8 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
    #9 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
    #10 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
    #11 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
    #12 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Remove undefined behavior
Tomeu Vizoso [Thu, 12 Dec 2019 13:37:46 +0000 (14:37 +0100)]
pan/midgard: Remove undefined behavior

As found by UBSAN, it should be harmless but it's good to remove any UB
so the tool's output is useful.

../src/panfrost/midgard/midgard_schedule.c:1094:9: runtime error: index -1 out of bounds for type 'midgard_instruction *[6]'
    #0 0xad047872 in schedule_block ../src/panfrost/midgard/midgard_schedule.c:1094
    #1 0xad04d41a in schedule_program ../src/panfrost/midgard/midgard_schedule.c:1116
    #2 0xad031f98 in midgard_compile_shader_nir ../src/panfrost/midgard/midgard_compile.c:2588
    #3 0xacf9874e in panfrost_shader_compile ../src/gallium/drivers/panfrost/pan_assemble.c:68
    #4 0xacf6b268 in panfrost_bind_shader_state ../src/gallium/drivers/panfrost/pan_context.c:1960
    #5 0xaae2596e in st_update_fp ../src/mesa/state_tracker/st_atom_shader.c:168
    #6 0xaae12316 in st_validate_state ../src/mesa/state_tracker/st_atom.c:261
    #7 0xaadc58c2 in prepare_draw ../src/mesa/state_tracker/st_draw.c:132
    #8 0xaadc58c2 in st_draw_vbo ../src/mesa/state_tracker/st_draw.c:184
    #9 0xabc4f924 in _mesa_validated_drawrangeelements ../src/mesa/main/draw.c:816
    #10 0xabc50240 in _mesa_DrawElements ../src/mesa/main/draw.c:970
    #11 0x73ebd2 in glu::CallLogWrapper::glDrawElements(unsigned int, int, unsigned int, void const*) (/deqp/modules/gles2/deqp-gles2+0x2d4bd2)
    #12 0x6d86b2 in deqp::gls::FragOpInteractionCase::iterate() (/deqp/modules/gles2/deqp-gles2+0x26e6b2)
    #13 0x494d16 in deqp::gles2::TestCaseWrapper::iterate(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x2ad16)
    #14 0x7f9cf2 in tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (/deqp/modules/gles2/deqp-gles2+0x38fcf2)
    #15 0x7fa5f0 in tcu::TestSessionExecutor::iterate() (/deqp/modules/gles2/deqp-gles2+0x3905f0)
    #16 0x7e1aac in tcu::App::iterate() (/deqp/modules/gles2/deqp-gles2+0x377aac)
    #17 0x492d4c in main (/deqp/modules/gles2/deqp-gles2+0x28d4c)
    #18 0xb64b9aa8 in __libc_start_main (/lib/arm-linux-gnueabihf/libc.so.6+0x1aaa8)

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: Hold a reference to sampler views
Tomeu Vizoso [Thu, 12 Dec 2019 07:43:12 +0000 (08:43 +0100)]
panfrost: Hold a reference to sampler views

Before we were just copying, but we need to hold a reference as well.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agogallium/swr: Fix Windows build
Jan Zielinski [Thu, 12 Dec 2019 13:07:20 +0000 (14:07 +0100)]
gallium/swr: Fix Windows build

The tessellator defines its own fmin/fmax functions, which conflict
with those defined in the cmath header. We need to use the legacy
math.h, which was originally used in the MS code.

Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
4 years agoac/nir: fix out-of-bound access when loading constants from global
Samuel Pitoiset [Tue, 10 Dec 2019 16:46:26 +0000 (17:46 +0100)]
ac/nir: fix out-of-bound access when loading constants from global

Global load/store instructions can't know if the offset is
out-of-bound because they don't use descriptors (no range).

Fix this by clamping the offset for arrays that are indexed
with a non-constant offset that's greater or equal to the array
size.
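
The idea can be sketched in plain C as follows. The real change operates on LLVM IR inside ac/nir, so the helper name, the element-index units, and the choice of clamping to the last valid element are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a dynamic index so that a load never reads
 * past the end of a constant array. Assumes num_elems > 0. */
static uint32_t
clamp_index(uint32_t index, uint32_t num_elems)
{
   /* Global loads have no descriptor range to bound them, so the
    * bound is enforced on the index itself. */
   return index >= num_elems ? num_elems - 1 : index;
}
```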

This fixes VM faults and GPU hangs with Dead Rising 4.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2148
Fixes: 71a67942003 ("ac/nir: Enable nir_opt_large_constants")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoanv: fix assumptions about temporary fence payload
Lionel Landwerlin [Wed, 11 Dec 2019 23:51:37 +0000 (01:51 +0200)]
anv: fix assumptions about temporary fence payload

Since f9a3d9738b12, temporary BO_WSI payloads are definitely a thing, so
one of our asserts is wrong.

Take that opportunity to expand a bit on an existing comment.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: f9a3d9738b12 ("anv: Use BO fences/semaphores for AcquireNextImage")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
4 years agoanv: fix fence underlying primitive checks
Lionel Landwerlin [Wed, 11 Dec 2019 23:58:01 +0000 (01:58 +0200)]
anv: fix fence underlying primitive checks

We appear to have got lucky that the only type of temporary fence
payload we could have was a syncobj and that would only happen when
the type of the permanent payload was also a syncobj.

This code was broken if that assumption changed and it did in commit
f9a3d9738b12.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
4 years agovtn/opencl: add shuffle/shuffle support
Dave Airlie [Mon, 27 May 2019 01:03:24 +0000 (11:03 +1000)]
vtn/opencl: add shuffle/shuffle support

This adds NIR encoding for these. Generating them from libclc
was very expensive, and this is a lot simpler.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
4 years agovtn: convert vload/store to single value loops
Dave Airlie [Wed, 22 May 2019 01:58:40 +0000 (11:58 +1000)]
vtn: convert vload/store to single value loops

There is an alignment issue doing this the other way; the
spec clearly says vload/store don't require alignment.
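
A rough C sketch of the per-component approach, assuming a 4-wide float vector. The function name and signature are illustrative, not the vtn code. Each component is loaded individually, so only element alignment is assumed and the address never has to be aligned to the full vector size.

```c
#include <stddef.h>

/* Hypothetical vload4-style helper: reads the vector at p + offset * 4
 * one component at a time instead of issuing a single vector-wide load. */
static void
vload4_by_component(size_t offset, const float *p, float out[4])
{
   for (size_t i = 0; i < 4; i++)
      out[i] = p[offset * 4 + i];
}
```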

Reviewed-by: Karol Herbst <kherbst@redhat.com>
4 years agoiris: Default to X-tiling for scanout buffers without modifiers
Kenneth Graunke [Wed, 11 Dec 2019 17:49:38 +0000 (09:49 -0800)]
iris: Default to X-tiling for scanout buffers without modifiers

Neither Mutter nor KWin's wayland compositors appear to use modifiers.
In the non-modifier case, iris was still trying to use Y-tiling for
scan-out surfaces, leading to this error:

(gnome-shell:7247): mutter-WARNING **: 09:23:47.787: meta_drm_buffer_gbm_new failed: drmModeAddFB failed: Invalid argument

We now fall back to the historical X-tiling for scanout buffers, which
ought to work everywhere, at lower performance.  To regain that, we need
to ensure modifiers are actually supported in environments people use.

Fixes: fbf31247710 ("iris: Rework tiling/modifiers handling")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agollvmpipe: enable ARB_shader_draw_parameters.
Dave Airlie [Wed, 11 Dec 2019 03:30:35 +0000 (13:30 +1000)]
llvmpipe: enable ARB_shader_draw_parameters.

All the bits should be in place for this now.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: fixup base_vertex support
Dave Airlie [Wed, 11 Dec 2019 03:29:45 +0000 (13:29 +1000)]
gallivm: fixup base_vertex support

base vertex should be 0 for non-indexed draws according to the
piglit tests.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm/draw: add support for draw_id system value.
Dave Airlie [Wed, 11 Dec 2019 03:02:34 +0000 (13:02 +1000)]
gallivm/draw: add support for draw_id system value.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add base instance sysval support
Dave Airlie [Tue, 3 Dec 2019 05:54:21 +0000 (15:54 +1000)]
gallivm: add base instance sysval support

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agonv50/ir: implement global atomics and handle it for nir
Karol Herbst [Thu, 24 Oct 2019 00:50:51 +0000 (02:50 +0200)]
nv50/ir: implement global atomics and handle it for nir

TGSI doesn't have any concept of global memory right now.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Dave Airlie <airlied@redhat.com>
4 years agonir: handle nir_deref_type_ptr_as_array in rematerialize_deref_in_block
Karol Herbst [Sat, 9 Nov 2019 01:13:25 +0000 (02:13 +0100)]
nir: handle nir_deref_type_ptr_as_array in rematerialize_deref_in_block

I forgot why that was required, but it still is the correct thing to do.

Hit it at some point when working on implementing more CL features.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agospirv: add OpLifetime*
Rob Clark [Fri, 23 Feb 2018 20:54:24 +0000 (15:54 -0500)]
spirv: add OpLifetime*

These are just hints so we can ignore them.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
4 years agoclover/spirv: allow Int64 Atomics for supported devices
Karol Herbst [Mon, 2 Dec 2019 16:03:36 +0000 (17:03 +0100)]
clover/spirv: allow Int64 Atomics for supported devices

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agoclover/nir: set spirv environment to OpenCL
Karol Herbst [Thu, 5 Dec 2019 10:30:11 +0000 (11:30 +0100)]
clover/nir: set spirv environment to OpenCL

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agoclover/nir: treat UniformConstant as global memory
Karol Herbst [Thu, 5 Dec 2019 10:42:14 +0000 (11:42 +0100)]
clover/nir: treat UniformConstant as global memory

Just like we already do in the llvm backend. The current constant buffer code
seems fundamentally flawed and right now we are thinking about how we want to
reimplement all of that.

But until that happens, just treat it as global memory and go on.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agospirv: handle UniformConstant for OpenCL kernels
Karol Herbst [Thu, 5 Dec 2019 10:37:34 +0000 (11:37 +0100)]
spirv: handle UniformConstant for OpenCL kernels

The caller is responsible for setting up the ubo_addr_format value because,
contrary to shared and global, it's not controlled by the spirv.

Right now clover's implementation of CL constant memory uses a 24/8 bit format
to encode the buffer index and offset, but that code is dead as all backends
treat constants as global memory to work around annoying issues within OpenCL.

Maybe that will change, maybe not. But just in case somebody wants to look at
it, add a toggle for this inside vtn.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agogallivm/nir: copy compare ordering code from tgsi
Dave Airlie [Tue, 3 Dec 2019 04:48:03 +0000 (14:48 +1000)]
gallivm/nir: copy compare ordering code from tgsi

This fixes some isinf/isnan tests by copying what the tgsi code
paths do for float compares.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm/nir: cleanup code and call cmp wrapper
Dave Airlie [Tue, 3 Dec 2019 04:45:32 +0000 (14:45 +1000)]
gallivm/nir: cleanup code and call cmp wrapper

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: fix perspective enable if usage_mask doesn't have 0 bit set
Dave Airlie [Tue, 3 Dec 2019 03:42:03 +0000 (13:42 +1000)]
gallivm: fix perspective enable if usage_mask doesn't have 0 bit set

The current code looks like a typo, and fails if the usage_mask
is for a y/z enabled input.

Fixes piglit ext_transform_feedback-immediate-reuse-index-buffer
with llvmpipe/nir

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: fix transpose for when first channel isn't created
Dave Airlie [Tue, 3 Dec 2019 03:41:56 +0000 (13:41 +1000)]
gallivm: fix transpose for when first channel isn't created

The previous fix worked when the second channel wasn't exposed, but
a couple of piglit tests have inputs with just the y/z chans, no x/w.

Partly Fixes piglit ext_transform_feedback-immediate-reuse-index-buffer
with llvmpipe/nir

Fixes: 5363cda52b84 ("gallivm: add swizzle support where one channel isn't defined.")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agollvmpipe/nir: handle texcoord requirements
Dave Airlie [Fri, 6 Dec 2019 04:16:52 +0000 (14:16 +1000)]
llvmpipe/nir: handle texcoord requirements

Switch to using texcoord intrinsic support.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agofreedreno/a6xx: Silence warning for unused perf counters
Kristian H. Kristensen [Wed, 11 Dec 2019 17:33:56 +0000 (09:33 -0800)]
freedreno/a6xx: Silence warning for unused perf counters

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/a6xx: Convert some tile setup to OUT_REG()
Kristian H. Kristensen [Tue, 10 Dec 2019 03:45:31 +0000 (19:45 -0800)]
freedreno/a6xx: Convert some tile setup to OUT_REG()

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/a6xx: Convert gmem blits to OUT_REG()
Kristian H. Kristensen [Tue, 10 Dec 2019 03:31:26 +0000 (19:31 -0800)]
freedreno/a6xx: Convert gmem blits to OUT_REG()

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/a6xx: Convert VSC pipe setup to OUT_REG()
Kristian H. Kristensen [Tue, 10 Dec 2019 03:27:38 +0000 (19:27 -0800)]
freedreno/a6xx: Convert VSC pipe setup to OUT_REG()

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/a6xx: Convert emit_zs() to OUT_REG()
Kristian H. Kristensen [Tue, 10 Dec 2019 03:25:14 +0000 (19:25 -0800)]
freedreno/a6xx: Convert emit_zs() to OUT_REG()

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/a6xx: Convert emit_mrt() to OUT_REG()
Kristian H. Kristensen [Tue, 10 Dec 2019 03:24:19 +0000 (19:24 -0800)]
freedreno/a6xx: Convert emit_mrt() to OUT_REG()

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/a6xx: Include fd6_pack.h in a few files
Kristian H. Kristensen [Tue, 10 Dec 2019 03:22:04 +0000 (19:22 -0800)]
freedreno/a6xx: Include fd6_pack.h in a few files

Including non-functional changes to get the value from the fd_reg_pair
in places.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/a6xx: Drop stale include
Kristian H. Kristensen [Tue, 10 Dec 2019 03:08:31 +0000 (19:08 -0800)]
freedreno/a6xx: Drop stale include

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/registers: Add 64 bit address registers
Kristian H. Kristensen [Sat, 9 Nov 2019 08:57:56 +0000 (00:57 -0800)]
freedreno/registers: Add 64 bit address registers

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno: New struct packing macros
Kristian H. Kristensen [Fri, 8 Nov 2019 04:03:21 +0000 (20:03 -0800)]
freedreno: New struct packing macros

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/registers: Remove duplicate register definitions
Kristian H. Kristensen [Wed, 23 Oct 2019 23:55:43 +0000 (16:55 -0700)]
freedreno/registers: Remove duplicate register definitions

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agodocs: remove mailing list as way of submitting patches
Timothy Arceri [Tue, 10 Dec 2019 22:55:03 +0000 (09:55 +1100)]
docs: remove mailing list as way of submitting patches

All developers now use gitlab, don't confuse newcomers by suggesting
they might use the mailing list. We want everyone to use gitlab so
that patches get run through basic CI before they are merged.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
4 years agoanv: Bump the advertised patch version to 129
Jason Ekstrand [Tue, 10 Dec 2019 20:26:07 +0000 (14:26 -0600)]
anv: Bump the advertised patch version to 129

We've been keeping up with the spec updates.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agoanv: Unconditionally advertise Vulkan 1.1
Jason Ekstrand [Tue, 10 Dec 2019 20:22:25 +0000 (14:22 -0600)]
anv: Unconditionally advertise Vulkan 1.1

Vulkan 1.1 requires VK_KHR_external_fence which requires syncobj support
to be actually usable.  However, it doesn't strictly require that we
support any external handle types.  We should be able to advertise 1.1
even on old kernels that don't have syncobj support.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agoanv: Flush the queue on DeviceWaitIdle
Jason Ekstrand [Wed, 11 Dec 2019 04:55:58 +0000 (22:55 -0600)]
anv: Flush the queue on DeviceWaitIdle

When we have syncobj_wait, we can trust in WAIT_FOR_SUBMIT but when we
don't, we only have BO waits and those aren't quite as nice.  This
commit adds a flag to _anv_queue_submit to wait for the queue to drain
before returning.  This gives us the behavior we need to implement
DeviceWaitIdle.

Fixes: 246261f0add "anv: prepare the driver for delayed submissions"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agonir/tests: MSVC build fix
Karol Herbst [Wed, 11 Dec 2019 16:52:17 +0000 (17:52 +0100)]
nir/tests: MSVC build fix

Fixes: 11f736a6f9c "nir/tests: add serializer tests"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agoswr/rasterizer: Add tessellator implementation to the rasterizer
Jan Zielinski [Wed, 4 Dec 2019 12:10:18 +0000 (13:10 +0100)]
swr/rasterizer: Add tessellator implementation to the rasterizer

This is the initial commit on the way to implementing the
ARB_tessellation_shader extension in OpenSWR. It introduces a tessellator
implementation taken from Microsoft's GitHub (published under the MIT license):

https://github.com/microsoft/DirectX-Specs/blob/master/d3d/archive/images/d3d11/tessellator.cpp
https://github.com/microsoft/DirectX-Specs/blob/master/d3d/archive/images/d3d11/tessellator.hpp

It also adds some glue code that connects the tessellator
to the internals of SWR rasterizer.

Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Bruce Cherniak <bruce.cherniak@intel.com>
Reviewed-by: Alok Hota <alok.hota@intel.com>
4 years agogitlab-ci: set RADV_DEBUG=checkir for RADV test jobs
Samuel Pitoiset [Fri, 6 Dec 2019 16:07:35 +0000 (17:07 +0100)]
gitlab-ci: set RADV_DEBUG=checkir for RADV test jobs

This is used to validate if the driver emits correct LLVM IR.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agointel: add mi_builder_test for gen12
Eric Engestrom [Tue, 10 Dec 2019 16:09:36 +0000 (16:09 +0000)]
intel: add mi_builder_test for gen12

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agogitlab-ci: Use lavacli from packages
Rohan Garg [Thu, 5 Dec 2019 15:35:08 +0000 (16:35 +0100)]
gitlab-ci: Use lavacli from packages

lavacli 0.9.8 is now available in Debian Testing.
Ref: https://tracker.debian.org/news/1066828/lavacli-098-1-migrated-to-testing/
Fixes: 555c0de ("gitlab-ci: Move LAVA-related files into top-level ci dir")
Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agolima/ppir: enable lower_fdph
Erico Nunes [Tue, 10 Dec 2019 18:08:29 +0000 (19:08 +0100)]
lima/ppir: enable lower_fdph

Otherwise we may lower some fdot to fdph which is not implemented in pp.

Fixes #2126

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
4 years agonir/tests: add serializer tests
Karol Herbst [Mon, 2 Dec 2019 21:41:33 +0000 (22:41 +0100)]
nir/tests: add serializer tests

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agonir/serialize: fix vec8 and vec16
Karol Herbst [Mon, 2 Dec 2019 14:22:49 +0000 (15:22 +0100)]
nir/serialize: fix vec8 and vec16

NIR serialize uses nir_ssa_alu_instr_src_components in a few places to
determine how many components a src has, but that's not what this function
returns. It simply returns how many channels are used, which is still fine
for most of the code.

This was breaking code like this:

vec16 32 ssa_1 = intrinsic load_global
vec1  32 ssa_2 = fmax ssa_1.a, ssa_2.b

v2: make the 16bit encoding work for identity swizzles again

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agoradv: Fix RGBX Android<->Vulkan format correspondence.
Bas Nieuwenhuizen [Tue, 10 Dec 2019 15:53:56 +0000 (16:53 +0100)]
radv: Fix RGBX Android<->Vulkan format correspondence.

This is correct per the Vulkan spec format equivalence table.

Fixes: f36b52740a0 "radv/android: Add android hardware buffer queries."
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agopanfrost: Add PAN_MESA_DEBUG=sync
Tomeu Vizoso [Mon, 9 Dec 2019 07:39:59 +0000 (08:39 +0100)]
panfrost: Add PAN_MESA_DEBUG=sync

Sometimes it's useful to get information about GPU faults in the
console, so it's synchronized with other messages.

This commit will cause Mesa to wait for completion and check if there
are any faults raised by the GPU.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agoiris: Create smaller program keys without legacy features
Kenneth Graunke [Mon, 9 Dec 2019 04:25:42 +0000 (20:25 -0800)]
iris: Create smaller program keys without legacy features

A lot of the brw_*_prog_key fields are for emulating features on legacy
hardware that iris doesn't support.  In particular, all of the texture
swizzle fields take up a lot of space.  These dead fields make hashing
the shader keys more expensive than it ought to be.

We introduce iris-specific keys with only the information we need, and
translate them to brw keys when actually compiling new variants.  This
way, key comparisons can use the small keys.  The size reductions are:

   VS:  328 bytes ->  8 bytes
   TCS: 312 bytes -> 24 bytes
   TES: 304 bytes -> 24 bytes
   GS:  284 bytes ->  8 bytes
   FS:  304 bytes -> 16 bytes
   CS:  280 bytes ->  4 bytes

Scores for the Piglit drawoverhead microbenchmark case with a shader
program change improve by roughly 30%.
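
The idea can be sketched roughly as follows. The struct layouts and
function names are hypothetical stand-ins, not the real brw or iris key
definitions; the point is only that comparisons touch the small key and
the large key is filled in at compile time.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical large key carrying legacy-emulation state. */
struct legacy_vs_key {
   uint32_t program_id;
   uint32_t nr_userclip_planes;
   uint8_t  tex_swizzles[128];      /* dead weight on newer hardware */
   uint8_t  other_legacy_state[64];
};

/* Hypothetical compact key with only the fields the driver varies. */
struct small_vs_key {
   uint32_t program_id;
   uint32_t nr_userclip_planes;
};

/* Expand a small key into a legacy key right before compiling a variant. */
void small_to_legacy_vs_key(const struct small_vs_key *in,
                            struct legacy_vs_key *out)
{
   memset(out, 0, sizeof(*out));
   out->program_id = in->program_id;
   out->nr_userclip_planes = in->nr_userclip_planes;
}

/* Variant lookup only ever compares the small keys (no padding here). */
bool small_vs_keys_equal(const struct small_vs_key *a,
                         const struct small_vs_key *b)
{
   return memcmp(a, b, sizeof(*a)) == 0;
}
```

Hashing or comparing 8 bytes instead of a few hundred is where the
drawoverhead gain would come from in this sketch.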

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agocompiler/spirv: Fix uses of gnu struct = {} extension
Pierre Moreau [Wed, 27 Nov 2019 19:50:12 +0000 (20:50 +0100)]
compiler/spirv: Fix uses of gnu struct = {} extension
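
For illustration, a small sketch of the portable spelling. The struct and
function are hypothetical and not taken from the SPIR-V compiler sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, used only to show the initializer syntax. */
struct toy_value {
   uint32_t id;
   const char *name;
};

struct toy_value make_zeroed_value(void)
{
   /* "= {}" is a GNU extension before C23 and is rejected by MSVC in C
    * mode; "= {0}" is standard C and zero-initializes every member. */
   struct toy_value v = {0};
   return v;
}
```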

Fixes: a24d6fbae60 ("meson: Add -Werror=gnu-empty-initializer to MSVC compat args")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Pierre Moreau <dev@pmoreau.org>
4 years agoutil/u_thread: Restrict u_thread_get_time_nano on macOS.
Vinson Lee [Tue, 10 Dec 2019 07:09:58 +0000 (23:09 -0800)]
util/u_thread: Restrict u_thread_get_time_nano on macOS.

macOS does not have pthread_getcpuclockid.

src/util/u_thread.h:156:4: error: implicit declaration of function 'pthread_getcpuclockid' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
   pthread_getcpuclockid(thread, &cid);
   ^

Fixes: 4913215d145e ("util/u_thread: don't restrict u_thread_get_time_nano() to __linux__")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2171
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
4 years agotu: Move UBWC layout into fdl6_layout() and use that function.
Eric Anholt [Tue, 26 Nov 2019 20:29:19 +0000 (12:29 -0800)]
tu: Move UBWC layout into fdl6_layout() and use that function.

This gets us shared non-UBWC layout code between gallium and turnip.
Until I fix up the rest of gallium to handle UBWC mipmapping, we do the
single-level UBWC setup in gallium as a fixup after layout.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno: Switch the 16-bit workaround to match what turnip does.
Eric Anholt [Tue, 26 Nov 2019 23:25:29 +0000 (15:25 -0800)]
freedreno: Switch the 16-bit workaround to match what turnip does.

Prevents regressions on argb1555 and rgb565 when making turnip use
freedreno's layout.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno: Move a6xx's setup_slices() to a shareable helper function.
Eric Anholt [Tue, 26 Nov 2019 20:02:34 +0000 (12:02 -0800)]
freedreno: Move a6xx's setup_slices() to a shareable helper function.

We pass in all the parameters for setting up the layout, though freedreno
still sets a few of them up early (since it uses layout helpers in making
some decisions about the layout setup parameters that will be cleaned up
once krh's blitter work lands).

4 years agotu: Move our image layout into a freedreno_layout struct.
Eric Anholt [Tue, 26 Nov 2019 18:56:57 +0000 (10:56 -0800)]
tu: Move our image layout into a freedreno_layout struct.

This lets us start using some of the fdl_* helpers and have more obviously
matching code between gallium and turnip.  We can't yet use the fdl_* UBWC
helpers, since the gallium driver doesn't do UBWC mipmaps (which I'm
working on in another branch).

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno: Move UBWC layout into a slices array like the non-UBWC slices.
Eric Anholt [Mon, 25 Nov 2019 19:23:03 +0000 (11:23 -0800)]
freedreno: Move UBWC layout into a slices array like the non-UBWC slices.

This is a little refactor in preparation for UBWC mipmapping support.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno: Refactor the UBWC flags registers emission.
Eric Anholt [Fri, 22 Nov 2019 00:20:11 +0000 (16:20 -0800)]
freedreno: Refactor the UBWC flags registers emission.

It's the same logic for each of these being emitted, and I was about to
change the rsc->layout.* for UBWC.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno: Drop the extra offset field for mipmap slices.
Eric Anholt [Thu, 21 Nov 2019 22:53:58 +0000 (14:53 -0800)]
freedreno: Drop the extra offset field for mipmap slices.

We can just bake the UBWC-goes-first delta into the slices at setup time.
I did have to fix up the resource shadowing swap path to swap the slice
fields, as it was missing and regressed the format reinterprets otherwise.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agointel/decoder: Make get_state_size take a full 64-bit address and a base
Kenneth Graunke [Wed, 2 Oct 2019 19:09:33 +0000 (15:09 -0400)]
intel/decoder: Make get_state_size take a full 64-bit address and a base

i965 wants to use an offset from a base because everything is in a
single buffer whose address may be relocated, and all base addresses
are set to the start of that buffer.

iris wants to use a full 64-bit address, because state lives in separate
buffers which may be in the shader, surface, and dynamic memory zones,
where addresses grow downward from the top of a 4GB zone.  So it's very
possible for a 32-bit offset to exist relative to multiple bases,
leading to the wrong state size.
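
A rough sketch of the ambiguity, assuming a simple table of recorded
state. The table type and both lookup helpers are hypothetical and are not
the decoder's real callback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of a piece of state the driver emitted. */
struct state_entry {
   uint64_t address; /* full GPU virtual address */
   uint32_t size;    /* size in bytes */
};

/* Old style: match on the low 32 bits only, so states in different
 * 4GB zones with the same offset are indistinguishable. */
uint32_t size_by_offset32(const struct state_entry *t, size_t n,
                          uint32_t offset)
{
   for (size_t i = 0; i < n; i++) {
      if ((uint32_t)t[i].address == offset)
         return t[i].size;
   }
   return 0;
}

/* New style: the caller passes base + offset as one 64-bit address
 * (base is simply 0 when the address is already absolute). */
uint32_t size_by_address(const struct state_entry *t, size_t n,
                         uint64_t address, uint64_t base)
{
   for (size_t i = 0; i < n; i++) {
      if (t[i].address == base + address)
         return t[i].size;
   }
   return 0;
}
```

With entries at 0x100001000 and 0x200001000, the 32-bit lookup of 0x1000
always returns the first entry, while the 64-bit lookup tells them apart.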

4 years agoiris: INTEL performance query implementation
Dongwon Kim [Tue, 15 Oct 2019 19:43:02 +0000 (12:43 -0700)]
iris: INTEL performance query implementation

Low-level implementation of the INTEL_performance_query APIs in the
Intel iris driver. Most of the functions and procedures defined here
are adapted from the i965 driver (brw_performance_query.c).

v2: - replace genX_init_performance_query with
      iris_init_perfquery_functions which is gen-version agnostic
    - general code clean-up

v3: include gen_perf_gens.h as some of the defines were moved to this new
    header file

v4: - checking for kernel 4.13+ won't be needed here as Iris won't be
      loaded anyway without DRM_SYNCOBJ that is enabled after Kernel
      4.13.

    - checking whether gen < 8 or is_cherryview won't be required as
      well because those cases are screened in iris_screen_create.

v5: remove genX(init_performance_query)

v6: - remove oa_metrics_kernel_support as iris works only with kernel
    4.18 and newer.

    - use perf functions defined in separate file, iris_perf.h/c

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoiris: separating out common perf code
Mark Janes [Fri, 22 Nov 2019 21:46:22 +0000 (13:46 -0800)]
iris: separating out common perf code

The configuration of the gen_perf vtable will be the same for
INTEL_performance_query and AMD_performance_monitor.
Initialize the table in a single routine that can be called from both
implementations.

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agogallium: enable INTEL_PERFORMANCE_QUERY
Dongwon Kim [Tue, 15 Oct 2019 19:43:04 +0000 (12:43 -0700)]
gallium: enable INTEL_PERFORMANCE_QUERY

New state tracker APIs are added for INTEL_performance_query.
This extension is enabled if all vendor-specific functions for it
exist.

v2: add st_cb_perfquery.* to the list of sources in Makefile
v3: minor code clean-up
v4: - add driver hooks for intel-performance-query apis
    - add PIPE level performance counter and type enums that
      match to OpenGL enums
    - do conversion of pipe_perf_counter_type and
      pipe_perf_counter_data_type enums to GL defines in state_tracker

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agomeson/broadcom: libbroadcom_cle also needs zlib
Dylan Baker [Tue, 10 Dec 2019 19:15:37 +0000 (11:15 -0800)]
meson/broadcom: libbroadcom_cle also needs zlib

Fixes: 1ae8018a6af81eec4832a57d9d0346aa3dd98d28
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoanv: Enable Gen11 Color/Z write merging optimization
Kenneth Graunke [Tue, 3 Dec 2019 01:30:06 +0000 (17:30 -0800)]
anv: Enable Gen11 Color/Z write merging optimization

TCCNTLREG contains additional L3 cache write merging optimizations.

The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)

Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging".  This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.

It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.
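
As a sketch of the bit manipulation described above: the bit positions are
the ones listed in this message, while the macro and function names are
made up for illustration and the real drivers program the register through
genxml packing rather than raw masks.

```c
#include <stdint.h>

/* Bit positions as listed in the commit message; names are hypothetical. */
#define TOY_TCCNTL_URB_PARTIAL_WRITE_MERGING     (1u << 0)
#define TOY_TCCNTL_COLOR_Z_PARTIAL_WRITE_MERGING (1u << 1)
#define TOY_TCCNTL_L3_DATA_PARTIAL_WRITE_MERGING (1u << 2)
#define TOY_TCCNTL_TC_DISABLE                    (1u << 3)

/* Turn on Color/Z partial write merging and leave every other bit,
 * including bits 0 and 3, at whatever value it already had. */
uint32_t toy_tccntl_enable_color_z_merging(uint32_t current)
{
   return current | TOY_TCCNTL_COLOR_Z_PARTIAL_WRITE_MERGING;
}
```

Starting from the observed default of bits 0, 2 and 3 (0xd), this yields
0xf.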

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoiris: Enable Gen11 Color/Z write merging optimization
Kenneth Graunke [Sat, 31 Aug 2019 00:19:46 +0000 (17:19 -0700)]
iris: Enable Gen11 Color/Z write merging optimization

TCCNTLREG contains additional L3 cache write merging optimizations.

The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)

Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging".  This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.

It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.

Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed
frequency, according to Felix Degrood.  I didn't see any improvements
at out-of-the-box power management settings, however.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agointel/genxml: Add a partial TCCNTLREG definition
Kenneth Graunke [Mon, 2 Dec 2019 07:01:19 +0000 (23:01 -0800)]
intel/genxml: Add a partial TCCNTLREG definition

TCCNTLREG contains additional cache programming settings.  In
particular, there are several write combining controls we'd like to use.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoutil: Detect use-after-destroy in simple_mtx
Kenneth Graunke [Mon, 21 Oct 2019 21:51:13 +0000 (14:51 -0700)]
util: Detect use-after-destroy in simple_mtx

This makes simple_mtx_destroy set the counter to an invalid canary
value and then makes lock/unlock assert that the value is legal.

That way, calling lock/unlock after destroy will assert fail,
rather than deadlocking or potentially even working.

This has caught real deadlocks in dEQP multithreaded tests (in st/mesa
shader variant zombie list handling), which have since been fixed.
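
The canary idea can be sketched with a toy, single-threaded lock. The
names and the canary value below are hypothetical, and the sketch leaves
out the atomics and futex calls a real mutex needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MTX_INVALID 0xd0d0d0d0u /* hypothetical canary value */

/* Toy lock: val is 0 when unlocked and 1 when locked. */
struct toy_mtx {
   uint32_t val;
};

/* True only while the lock holds one of its legal states. */
bool toy_mtx_is_legal(const struct toy_mtx *m)
{
   return m->val == 0 || m->val == 1;
}

void toy_mtx_init(struct toy_mtx *m)
{
   m->val = 0;
}

void toy_mtx_lock(struct toy_mtx *m)
{
   assert(toy_mtx_is_legal(m)); /* trips on use-after-destroy */
   m->val = 1;
}

void toy_mtx_unlock(struct toy_mtx *m)
{
   assert(toy_mtx_is_legal(m)); /* trips on use-after-destroy */
   m->val = 0;
}

/* Poison the counter so later lock/unlock calls assert instead of
 * silently succeeding or hanging. */
void toy_mtx_destroy(struct toy_mtx *m)
{
   m->val = TOY_MTX_INVALID;
}
```

In a debug build, calling toy_mtx_lock() after toy_mtx_destroy() aborts at
the assert, which is the behaviour the commit is after.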

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
4 years agofreedreno/a6xx: enable LRZ by default
Rob Clark [Tue, 10 Dec 2019 22:41:46 +0000 (14:41 -0800)]
freedreno/a6xx: enable LRZ by default

Now that dEQP should be happy, let's flip the switch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/a6xx: fix LRZ logic
Rob Clark [Fri, 6 Dec 2019 19:34:39 +0000 (11:34 -0800)]
freedreno/a6xx: fix LRZ logic

In particular, we need to invalidate the LRZ state when we cannot be
confident in what the Z state would be during rendering:

1) depth test modes not supported by LRZ
2) stencil test, which would require full rasterization and stencil
   test in the binning pass (whereas LRZ normally just needs to
   determine the min and max z value in an 8x8 quad)
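
The two conditions above can be sketched as a small state update. The
struct, the function and the idea that invalidation stays in effect until
something else revalidates the buffer are assumptions for illustration;
which depth modes LRZ supports is deliberately left to the caller.

```c
#include <stdbool.h>

/* Hypothetical per-batch tracking of whether the LRZ buffer is trusted. */
struct toy_lrz_state {
   bool valid;
};

/* Called per draw: drop trust in LRZ when the depth mode is one LRZ
 * cannot handle, or when the stencil test is enabled. This sketch never
 * sets valid back to true; that is assumed to happen elsewhere. */
void toy_lrz_update(struct toy_lrz_state *lrz,
                    bool depth_func_supported_by_lrz,
                    bool stencil_test_enabled)
{
   if (!depth_func_supported_by_lrz || stencil_test_enabled)
      lrz->valid = false;
}
```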

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/a6xx: fix LRZ layout
Rob Clark [Tue, 10 Dec 2019 22:27:20 +0000 (14:27 -0800)]
freedreno/a6xx: fix LRZ layout

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/a5xx+a6xx: split LRZ layout to per-gen
Rob Clark [Tue, 10 Dec 2019 22:24:59 +0000 (14:24 -0800)]
freedreno/a5xx+a6xx: split LRZ layout to per-gen

Seems to be a bit different for a6xx, so let's split this out.

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/a6xx: disable LRZ when blending
Rob Clark [Thu, 5 Dec 2019 19:54:33 +0000 (11:54 -0800)]
freedreno/a6xx: disable LRZ when blending

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agoradeonsi: don't rely on CLEAR_STATE to set PA_SC_GENERIC_SCISSOR_*
Marek Olšák [Tue, 10 Dec 2019 00:27:26 +0000 (19:27 -0500)]
radeonsi: don't rely on CLEAR_STATE to set PA_SC_GENERIC_SCISSOR_*

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi/gfx10: simplify the tess_turns_off_ngg condition
Marek Olšák [Tue, 12 Nov 2019 22:10:05 +0000 (17:10 -0500)]
radeonsi/gfx10: simplify the tess_turns_off_ngg condition

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi/gfx10: disable vertex grouping
Marek Olšák [Mon, 28 Oct 2019 20:37:53 +0000 (16:37 -0400)]
radeonsi/gfx10: disable vertex grouping

based on PAL.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi: enable NIR by default and document GL 4.6 support
Marek Olšák [Sat, 26 Oct 2019 03:32:18 +0000 (23:32 -0400)]
radeonsi: enable NIR by default and document GL 4.6 support

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agost/dri: assume external consumers of back buffers can write to the buffers
Marek Olšák [Thu, 17 Oct 2019 20:46:06 +0000 (16:46 -0400)]
st/dri: assume external consumers of back buffers can write to the buffers

This was reverted needlessly because it was part of another series.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
4 years agoANV: Stop advertising smoothLines support on gen10+
Jason Ekstrand [Fri, 6 Dec 2019 21:20:35 +0000 (15:20 -0600)]
ANV: Stop advertising smoothLines support on gen10+

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
4 years agomeson/broadcom: libbroadcom_cle needs expat headers
Dylan Baker [Tue, 10 Dec 2019 18:19:04 +0000 (10:19 -0800)]
meson/broadcom: libbroadcom_cle needs expat headers

Fixes: 1ae8018a6af81eec4832a57d9d0346aa3dd98d28
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoanv: fix incorrect VMA alignment for CCS main surfaces
Lionel Landwerlin [Tue, 10 Dec 2019 11:49:49 +0000 (03:49 -0800)]
anv: fix incorrect VMA alignment for CCS main surfaces

Maybe a finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.

Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4a4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv: fix missing gen12 handling
Lionel Landwerlin [Tue, 10 Dec 2019 11:49:16 +0000 (03:49 -0800)]
anv: fix missing gen12 handling

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 181be14d4303 ("anv: Build for gen12")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agodocs: reword a bit and list HTTPS before FTP
Eric Engestrom [Fri, 22 Nov 2019 14:36:02 +0000 (14:36 +0000)]
docs: reword a bit and list HTTPS before FTP

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
4 years agomeson: drop `intel_` prefix on imgui_core
Eric Engestrom [Thu, 21 Nov 2019 23:13:01 +0000 (23:13 +0000)]
meson: drop `intel_` prefix on imgui_core

Again, no real effect, just the name of a temporary build file.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
4 years agomeson: drop duplicate `lib` prefix on libiris_gen*
Eric Engestrom [Thu, 21 Nov 2019 23:11:07 +0000 (23:11 +0000)]
meson: drop duplicate `lib` prefix on libiris_gen*

This has no real effect other than the names of the temporary files in
the build folder.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
4 years agoradv: implement VK_KHR_separate_depth_stencil_layouts
Samuel Pitoiset [Mon, 9 Dec 2019 12:56:24 +0000 (13:56 +0100)]
radv: implement VK_KHR_separate_depth_stencil_layouts

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: initialize HTILE for separate depth/stencil aspects
Samuel Pitoiset [Wed, 6 Nov 2019 14:49:10 +0000 (15:49 +0100)]
radv: initialize HTILE for separate depth/stencil aspects

It either clears the whole HTILE buffer or part of it depending
on the HTILE mask parameter.
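
A CPU-side sketch of the masked clear, assuming HTILE is viewed as an
array of 32-bit words. The function name is hypothetical, and the real
driver performs this on the GPU rather than in a loop like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Write "value" into every HTILE word. With a full mask the whole word
 * is overwritten; otherwise only the bits selected by "mask" change and
 * the other aspect's bits are preserved. */
void toy_clear_htile(uint32_t *htile, size_t count,
                     uint32_t value, uint32_t mask)
{
   for (size_t i = 0; i < count; i++) {
      if (mask == UINT32_MAX)
         htile[i] = value;
      else
         htile[i] = (htile[i] & ~mask) | (value & mask);
   }
}
```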

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: do not init HTILE as compressed state when dst layout allows it
Samuel Pitoiset [Wed, 6 Nov 2019 15:31:56 +0000 (16:31 +0100)]
radv: do not init HTILE as compressed state when dst layout allows it

I don't think this makes much difference and a potential clear
following the initialization will overwrite HTILE anyways.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: synchronize after performing separate depth/stencil fast clears
Samuel Pitoiset [Tue, 26 Nov 2019 15:55:02 +0000 (16:55 +0100)]
radv: synchronize after performing separate depth/stencil fast clears

For depth+stencil images, the driver might use an optimized path
if only one aspect is cleared. It either clears the depth or the
stencil part of HTILE. Because the two separate aspects might use
the same HTILE memory we have to synchronize.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agogitlab-ci: Don't exclude any piglit quick_shader tests
Michel Dänzer [Fri, 6 Dec 2019 11:02:13 +0000 (12:02 +0100)]
gitlab-ci: Don't exclude any piglit quick_shader tests

Now that we're running these with process isolation enabled, their
results will hopefully be stable.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agogallivm: add TGSI bit arithmetic opcodes support
Krzysztof Raszkowski [Thu, 5 Dec 2019 17:01:08 +0000 (18:01 +0100)]
gallivm: add TGSI bit arithmetic opcodes support

Add TGSI_OPCODE_BFI, TGSI_OPCODE_POPC, TGSI_OPCODE_LSB,
TGSI_OPCODE_IMSB, TGSI_OPCODE_UMSB, TGSI_OPCODE_IBFE,
TGSI_OPCODE_UBFE, TGSI_OPCODE_BREV support.
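
For reference, scalar C versions of three of these operations. The
function names are hypothetical, the LSB result of -1 for a zero input
follows the GLSL findLSB convention and is an assumption here, and this
is not the gallivm implementation, which emits LLVM IR instead.

```c
#include <stdint.h>

/* POPC: number of set bits. */
uint32_t ref_popc(uint32_t v)
{
   uint32_t count = 0;
   while (v) {
      v &= v - 1; /* clear the lowest set bit */
      count++;
   }
   return count;
}

/* LSB: index of the lowest set bit, or -1 when no bit is set. */
int32_t ref_lsb(uint32_t v)
{
   int32_t index = 0;
   if (v == 0)
      return -1;
   while (!(v & 1u)) {
      v >>= 1;
      index++;
   }
   return index;
}

/* BREV: reverse the order of all 32 bits. */
uint32_t ref_brev(uint32_t v)
{
   uint32_t r = 0;
   for (int i = 0; i < 32; i++) {
      r = (r << 1) | (v & 1u);
      v >>= 1;
   }
   return r;
}
```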

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
4 years agoradv: fix possibly wrong PA_SC_AA_CONFIG value for conservative rast
Samuel Pitoiset [Fri, 6 Dec 2019 11:19:11 +0000 (12:19 +0100)]
radv: fix possibly wrong PA_SC_AA_CONFIG value for conservative rast

PA_SC_AA_CONFIG might be updated when conservative rasterization is
enabled. Because the driver only re-emits the multisample state if
the number of samples is different, that register value might not
be updated correctly.
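
The hazard can be sketched as a dirty check that looks at more than the
sample count. The struct and function are hypothetical, and this is one
possible way to express the condition, not necessarily the change this
commit makes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of pipeline state that feeds PA_SC_AA_CONFIG. */
struct toy_ms_state {
   uint32_t num_samples;
   bool conservative_rast;
};

/* Comparing num_samples alone would miss a switch between two pipelines
 * that differ only in conservative rasterization. */
bool toy_needs_ms_reemit(const struct toy_ms_state *prev,
                         const struct toy_ms_state *next)
{
   return prev->num_samples != next->num_samples ||
          prev->conservative_rast != next->conservative_rast;
}
```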

Found by inspection, doesn't fix anything known.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>