Rhys Perry [Fri, 22 Nov 2019 15:00:04 +0000 (15:00 +0000)]
aco: improve can_use_VOP3()
No pipeline-db changes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
Rhys Perry [Wed, 20 Nov 2019 16:31:43 +0000 (16:31 +0000)]
aco: combine two sgprs into a VALU if they're the same
This was supposed to be done before, but it wasn't done correctly or
everywhere.
pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 784680 -> 786128 (0.18 %)
VGPRS: 574012 -> 573892 (-0.02 %)
Spilled SGPRs: 461 -> 461 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 45477088 -> 45478172 (0.00 %) bytes
Max Waves: 81294 -> 81277 (-0.02 %)
Instructions: 8657970 -> 8622483 (-0.41 %)
pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 780664 -> 782072 (0.18 %)
VGPRS: 573880 -> 573760 (-0.02 %)
Spilled SGPRs: 629 -> 629 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 45445244 -> 45448340 (0.01 %) bytes
Max Waves: 81178 -> 81161 (-0.02 %)
Instructions: 8649902 -> 8614918 (-0.40 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
Rhys Perry [Wed, 20 Nov 2019 19:09:25 +0000 (19:09 +0000)]
aco: apply literals to split mads
Removing the return is also needed to apply literals to mads (which can be
done on GFX10).
pipeline-db (Navi):
Totals from affected shaders:
SGPRS: 368787 -> 367555 (-0.33 %)
VGPRS: 312436 -> 312448 (0.00 %)
Spilled SGPRs: 461 -> 461 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 26113388 -> 26098260 (-0.06 %) bytes
Max Waves: 35982 -> 35982 (0.00 %)
Instructions: 5038670 -> 5028941 (-0.19 %)
pipeline-db (Vega):
Totals from affected shaders:
SGPRS: 369843 -> 368659 (-0.32 %)
VGPRS: 317224 -> 317196 (-0.01 %)
Spilled SGPRs: 629 -> 629 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 26310540 -> 26295156 (-0.06 %) bytes
Max Waves: 36324 -> 36326 (0.01 %)
Instructions: 5073957 -> 5064164 (-0.19 %)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
Rhys Perry [Mon, 25 Nov 2019 16:12:44 +0000 (16:12 +0000)]
aco: update IR validator
GFX10 increased the constant bus limit and allowed literals on VOP3
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2883>
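As a rough illustration of the rule the validator commit above refers to, here is a minimal C sketch of a constant bus check. It assumes a limit of one constant bus read before GFX10 and two on GFX10, counts each distinct SGPR once (so the same SGPR used twice costs a single slot), counts a literal as one slot, and rejects literals on VOP3 before GFX10. The types and function names are hypothetical and are not ACO's actual data structures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operand model, not ACO's real Operand class. */
enum operand_kind { OPERAND_VGPR, OPERAND_SGPR, OPERAND_LITERAL };

struct operand {
   enum operand_kind kind;
   uint32_t reg; /* register index; ignored for literals */
};

/* Count constant bus slots: each distinct SGPR once, plus one per literal. */
static unsigned constant_bus_uses(const struct operand *ops, unsigned num_ops)
{
   unsigned uses = 0;
   for (unsigned i = 0; i < num_ops; i++) {
      if (ops[i].kind == OPERAND_LITERAL) {
         uses++;
      } else if (ops[i].kind == OPERAND_SGPR) {
         bool seen = false;
         for (unsigned j = 0; j < i; j++) {
            if (ops[j].kind == OPERAND_SGPR && ops[j].reg == ops[i].reg)
               seen = true;
         }
         if (!seen)
            uses++;
      }
   }
   return uses;
}

/* ASSUMPTION: limit is 1 before GFX10 and 2 on GFX10;
 * VOP3 may only carry a literal on GFX10. */
static bool operands_are_valid(const struct operand *ops, unsigned num_ops,
                               bool is_vop3, bool is_gfx10)
{
   for (unsigned i = 0; i < num_ops; i++) {
      if (ops[i].kind == OPERAND_LITERAL && is_vop3 && !is_gfx10)
         return false;
   }
   unsigned limit = is_gfx10 ? 2 : 1;
   return constant_bus_uses(ops, num_ops) <= limit;
}
```

With this model, an instruction reading the same SGPR in two operands passes on every generation, while one reading two different SGPRs passes only on GFX10.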
Rhys Perry [Tue, 15 Oct 2019 15:46:02 +0000 (16:46 +0100)]
nir/lower_gs_intrinsics: add option for per-stream counts
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2422>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2422>
Rhys Perry [Mon, 14 Oct 2019 16:03:07 +0000 (17:03 +0100)]
nir/divergence: handle load_primitive_id in GS
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2323>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2323>
Erik Faye-Lund [Tue, 24 Sep 2019 13:51:13 +0000 (15:51 +0200)]
mesa/st: use float literals
This removes a warning on MSVC.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Erik Faye-Lund [Mon, 23 Sep 2019 11:36:45 +0000 (13:36 +0200)]
gallium: fix a warning
On some platforms (like Win64), unsigned long is 32-bit, so the first
cast doesn't do anything, and the compiler complains about an implicit
cast to a smaller type. So let's cast to a uintptr_t instead first,
as that's large enough on all platforms.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
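The cast pattern described in the gallium warning fix above can be sketched as follows. This is only an illustration: the helper name is hypothetical and the actual call site in gallium is not shown in this log. The idea is to convert the pointer to an integer of pointer width first, and only then narrow it explicitly.

```c
#include <stdint.h>

/* Hypothetical helper: derive a 32-bit key from a pointer. */
static inline uint32_t pointer_to_key(const void *ptr)
{
   /* uintptr_t is wide enough for a pointer on every platform, unlike
    * unsigned long, which is 32-bit on Win64. The outer cast makes the
    * narrowing explicit, so the compiler has nothing to warn about. */
   return (uint32_t)(uintptr_t)ptr;
}
```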
Erik Faye-Lund [Fri, 20 Sep 2019 14:07:47 +0000 (16:07 +0200)]
st/wgl: eliminate implicit cast warning
I get warnings on MSVC for these implicit casts. Let's use explicit
casts instead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Erik Faye-Lund [Fri, 20 Sep 2019 14:04:06 +0000 (16:04 +0200)]
util: initialize float-array with float-literals
We currently initialize this float-array with double-literals. Some
compilers generate warnings for this, so let's switch these to
float-literals instead.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Lionel Landwerlin [Mon, 13 Jan 2020 15:50:36 +0000 (17:50 +0200)]
anv: Implement Gen12 workaround for non pipelined state
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3365>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3365>
Lionel Landwerlin [Mon, 13 Jan 2020 15:50:06 +0000 (17:50 +0200)]
iris: Implement Gen12 workaround for non pipelined state
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3365>
Vasily Khoruzhick [Sun, 12 Jan 2020 06:23:36 +0000 (22:23 -0800)]
lima: add new findings to texture descriptor
The lower 8 bits of unknown_1_3 seem to be min_lod, the remaining 4 bits
plus miplevels are max_lod, and min_mipfilter seems to be lod_bias. All
are in fixed-point format with a 4-bit integer part and a 4-bit fraction;
lod_bias also has a sign bit.
Blob also seems to do some magic with lod_bias if min filter is nearest --
it adds 0.5 to lod_bias in this case. Same story when all filters are
nearest and mipmapping is enabled, but in this case it subtracts 1/16
from lod_bias.
Fixes 134 dEQP tests in dEQP-GLES2.functional.texture.*
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3359>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3359>
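The fixed-point layout described in the lima texture descriptor commit above can be sketched in C as shown here. The helper names are hypothetical, and the signed encoding is an assumption: the commit only says that lod_bias has a sign bit, so this sketch guesses a 9-bit two's complement value rather than sign-magnitude.

```c
#include <stdint.h>

/* Unsigned 4.4 fixed point: value * 16, rounded and clamped to 8 bits. */
static inline uint8_t lod_to_fixed_4_4(float lod)
{
   const float max_lod = 255.0f / 16.0f;
   if (lod < 0.0f)
      lod = 0.0f;
   if (lod > max_lod)
      lod = max_lod;
   return (uint8_t)(lod * 16.0f + 0.5f);
}

/* Signed variant for lod_bias.
 * ASSUMPTION: 9-bit two's complement (sign bit + 4.4 magnitude bits). */
static inline uint16_t lod_bias_to_fixed(float bias)
{
   if (bias < -16.0f)
      bias = -16.0f;
   if (bias > 255.0f / 16.0f)
      bias = 255.0f / 16.0f;
   /* Round to nearest, away from zero on ties. */
   int v = (int)(bias * 16.0f + (bias >= 0.0f ? 0.5f : -0.5f));
   return (uint16_t)(v & 0x1ff);
}
```

Under these assumptions, the 1/16 adjustment mentioned in the commit message corresponds to exactly one step of the fraction.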
Kenneth Graunke [Tue, 17 Dec 2019 08:51:20 +0000 (00:51 -0800)]
intel: Use similar brand strings to the Windows drivers
This updates our product name strings to match the ones reported
by the Windows driver, which is typically the marketing name.
We retain a platform abbreviation and GT level in parentheses so that
we're able to distinguish similar parts more easily, helping us better
understand at a glance which GPU a bug reporter has.
Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3371>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3371>
Kenneth Graunke [Tue, 17 Dec 2019 10:57:55 +0000 (02:57 -0800)]
iris: Simplify iris_get_renderer_string()
We use gen_get_device_name() instead of PCI ID list munging.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3371>
Kenneth Graunke [Tue, 17 Dec 2019 09:00:14 +0000 (01:00 -0800)]
i965: Simplify brw_get_renderer_string()
This stops using driGetRendererString() in favor of a simple snprintf().
This should have the same functionality on 64-bit systems, but drops
a "x86/MMX/SSE2" suffix on 32-bit systems. (People shouldn't be using
the GL_RENDERER string to check for CPU features...)
We also use gen_get_device_name() instead of PCI ID list munging.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3371>
Kenneth Graunke [Tue, 14 Jan 2020 01:35:35 +0000 (17:35 -0800)]
Revert "nir: assert that nir_lower_tex runs after lowering derefs"
This reverts commit 4cda61f11e922fb5914ae73d22cc0c495abf0377 for now,
as it appears to break i965 CI (32,000+ failures). Rob and I suspect
we need to do the equivalent of 1c6a2efa06e9bb5914f4557118930fc61065a467
on i965 - we are doing nir_lower_tex and brw_nir_lower_resources in the
wrong order and that's likely triggering this condition. Once we fix
that, we should put this patch back.
Erik Faye-Lund [Thu, 19 Dec 2019 09:17:14 +0000 (10:17 +0100)]
zink: fixup initialization of operand_mask / num_extra_operands
This doesn't change behavior, but makes the code a bit easier to read.
Both values are zero, but I somehow swapped the logical meaning of them
when initializing.
Eric Anholt [Mon, 13 Jan 2020 21:06:01 +0000 (13:06 -0800)]
mesa: Fix detection of invalidating both depth and stencil.
Fixes an extra 1024x1024x4 MSAA Z/S store on WebGL fishtank on cheza.
Reported-by: Dave Airlie <airlied@redhat.com>
Fixes: db2ae5112106 ("mesa: Skip partial InvalidateFramebuffer of packed depth/stencil.")
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3370>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3370>
Rob Clark [Mon, 13 Jan 2020 19:36:19 +0000 (11:36 -0800)]
mesa/st: lower samplers before nir_lower_tex
Fixes incorrect lowering of YUV samplers when there are non-yuv
samplers.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3368>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3368>
Rob Clark [Mon, 13 Jan 2020 19:34:53 +0000 (11:34 -0800)]
nir: assert that nir_lower_tex runs after lowering derefs
It isn't going to do the right thing, because texture_index/
sampler_index defaults to zero.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3368>
Gurchetan Singh [Thu, 15 Aug 2019 01:09:28 +0000 (18:09 -0700)]
i965: support EXT_EGL_image_storage
i965 can support this.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Gurchetan Singh [Thu, 7 Nov 2019 02:02:37 +0000 (18:02 -0800)]
i965: refactor intel_image_target_texture_2d
intel_image_target_texture_tex_storage can reuse much of this
code.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Gurchetan Singh [Wed, 21 Aug 2019 22:07:28 +0000 (15:07 -0700)]
i965: track if image is created by a dmabuf
Will be used by EXT_EGL_image_storage later.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Gurchetan Singh [Thu, 7 Nov 2019 01:18:13 +0000 (17:18 -0800)]
dri_util: add driImageFormatToSizedInternalGLFormat function
This is needed to implement the EXT_EGL_image_storage spec:
"If <target> is GL_TEXTURE_2D, then the resultant texture must have a
sized internal format which is colorspace and size compatible with the
dma-buf. If the GL is unable to determine such a format, the error
INVALID_OPERATION is generated."
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Gurchetan Singh [Wed, 14 Aug 2019 22:16:04 +0000 (15:16 -0700)]
glapi / teximage: implement EGLImageTargetTexStorageEXT
Check various parts of the EXT_EGL_image_storage spec, and add a
new vfunc for drivers implementing it.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Gurchetan Singh [Wed, 14 Aug 2019 01:16:41 +0000 (18:16 -0700)]
teximage: split out helper from EGLImageTargetTexture2DOES
The major differences between EXT_EGL_image_storage and
EGLImageTargetTexture2DOES are:
(1) The texture target is made immutable
(2) EXT_EGL_image_storage supports non-2D targets.
We can reuse EGLImageTargetTexture2D and FreeTextureImageBuffer
for (1) pretty easily. For (2), let's just not support the
complicated targets. Let's reuse aspects of the
EGLImageTargetTexture2DOES implementation.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Jason Ekstrand [Mon, 13 Jan 2020 19:49:57 +0000 (13:49 -0600)]
anv: Memset array properties
This is probably better than possibly leaving those bytes uninitialized
even if the app will theoretically not use them.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3369>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3369>
Jason Ekstrand [Mon, 13 Jan 2020 18:55:41 +0000 (12:55 -0600)]
anv: Don't over-advertise descriptor indexing features
We should only advertise sub-features if we advertise the extension.
Fixes: 6e230d7607f "anv: Implement VK_EXT_descriptor_indexing"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3369>
Jason Ekstrand [Fri, 10 Jan 2020 21:30:02 +0000 (15:30 -0600)]
intel/blorp: Fill out all the dwords of MI_ATOMIC
This makes us valgrind clean again.
Fixes: 9175c7058efb "intel/blorp: Make blorp update the clear color..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3366>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3366>
Tomeu Vizoso [Mon, 13 Jan 2020 10:47:58 +0000 (11:47 +0100)]
gitlab-ci: Upgrade kernel for LAVA jobs to v5.5-rc5
Some fixes got in that should prevent hangs in lima jobs.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3363>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3363>
Daniel Schürmann [Fri, 10 Jan 2020 16:19:40 +0000 (17:19 +0100)]
aco: fix unconditional demote_to_helper
This patch fixes an out-of-bounds access on p_exit_early
and binds the exec register to the correct operand.
Fixes: 2ea9e59e8d976ec77800d2a20645087b96d1e241 ('aco: move s_andn2_b64 instructions out of the p_discard_if')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3347>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3347>
Marek Olšák [Thu, 9 Jan 2020 21:41:13 +0000 (16:41 -0500)]
radeonsi: don't enable VBOs in user SGPRs if compute-based culling can be used
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 7 Jan 2020 23:23:53 +0000 (18:23 -0500)]
radeonsi: put up to 5 VBO descriptors into user SGPRs
gfx6-8: 1 VBO descriptor in user SGPRs
gfx9-10: 5 VBO descriptors in user SGPRs
We no longer pull up to 5 VBO descriptors from GTT when SDMA is disabled.
Totals from affected shaders:
SGPRS: 1110528 -> 1170528 (5.40 %)
VGPRS: 952896 -> 951936 (-0.10 %)
Spilled SGPRs: 83 -> 61 (-26.51 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 23766296 -> 22843920 (-3.88 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 179344 -> 179344 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
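The per-generation split described in the radeonsi VBO descriptor commit above could be modeled roughly like this. The enum and function names are hypothetical stand-ins, not radeonsi's real code; only the 1 and 5 descriptor counts come from the commit message.

```c
/* Hypothetical generation enum for illustration only. */
enum gfx_level { LEVEL_GFX6 = 6, LEVEL_GFX7, LEVEL_GFX8, LEVEL_GFX9, LEVEL_GFX10 };

/* gfx6-8: 1 VBO descriptor in user SGPRs; gfx9-10: 5. */
static inline unsigned max_vbo_descs_in_user_sgprs(enum gfx_level level)
{
   return level >= LEVEL_GFX9 ? 5 : 1;
}

/* Split the vertex elements between user SGPRs and the in-memory list. */
static inline void split_vbo_descs(enum gfx_level level, unsigned num_elements,
                                   unsigned *in_sgprs, unsigned *in_memory)
{
   unsigned limit = max_vbo_descs_in_user_sgprs(level);
   *in_sgprs = num_elements < limit ? num_elements : limit;
   *in_memory = num_elements - *in_sgprs;
}
```

A shader with up to five vertex elements on gfx9 or gfx10 would then need no descriptor list in memory at all under this model.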
Marek Olšák [Wed, 8 Jan 2020 20:52:44 +0000 (15:52 -0500)]
ac,radeonsi: increase the maximum number of shader args and return values
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 8 Jan 2020 00:45:01 +0000 (19:45 -0500)]
radeonsi: simplify si_set_vertex_buffers
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 7 Jan 2020 23:16:59 +0000 (18:16 -0500)]
radeonsi: don't allow draw calls with uninitialized VS inputs
These always hang, because vertex buffer descriptors are not set up.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 7 Jan 2020 23:10:38 +0000 (18:10 -0500)]
radeonsi: add si_context::num_vertex_elements
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 7 Jan 2020 23:06:14 +0000 (18:06 -0500)]
radeonsi: rename desc_list_byte_size -> vb_desc_list_alloc_size
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Lionel Landwerlin [Tue, 26 Nov 2019 15:53:09 +0000 (17:53 +0200)]
anv: set stencil layout for input attachments
If an input attachment has a stencil format, we need to set this.
v2: Fish out VkAttachmentReferenceStencilLayoutKHR from
VkAttachmentReference2KHR::pNext (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: c1c346f16673 ("anv: implement VK_KHR_separate_depth_stencil_layouts")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2891>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2891>
Jason Ekstrand [Mon, 13 Jan 2020 18:20:48 +0000 (12:20 -0600)]
anv: Drop an unused variable
Jason Ekstrand [Wed, 8 Jan 2020 01:22:13 +0000 (19:22 -0600)]
nir/lower_atomics_to_ssbo: Also lower barriers
This is more correct for a pass which is supposed to completely lower
away atomic counters. It also lets us stop supporting atomic counter
barriers in most of the drivers.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Jason Ekstrand [Tue, 7 Jan 2020 20:54:26 +0000 (14:54 -0600)]
nir: Rename nir_intrinsic_barrier to control_barrier
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Jason Ekstrand [Tue, 7 Jan 2020 20:58:45 +0000 (14:58 -0600)]
intel/nir: Stop adding redundant barriers
Now that both GLSL and SPIR-V are adding shared and tcs_patch barriers
(as appropriate) prior to the nir_intrinsic_barrier, we don't need to do
it ourselves in the back-end. This reverts commit
26e950a5de01564e3b5f2148ae994454ae5205fe.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Jason Ekstrand [Tue, 7 Jan 2020 20:40:53 +0000 (14:40 -0600)]
nir/glsl: Emit memory barriers as part of barrier()
The GLSL barrier() intrinsic does an implicit shared memory barrier in
compute shaders and an implicit TCS patch output barrier in tessellation
control shaders. We'd like NIR's barrier intrinsic to just be a control
flow barrier and not have memory implications. To satisfy this, we need
to add an extra memory barrier in front of each nir_intrinsic_barrier.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Jason Ekstrand [Tue, 7 Jan 2020 18:01:13 +0000 (12:01 -0600)]
spirv: Add output memory semantics to OpControlBarrier in TCS
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Jason Ekstrand [Tue, 7 Jan 2020 17:35:54 +0000 (11:35 -0600)]
spirv: Add a workaround for OpControlBarrier on old GLSLang
As per the Vulkan memory model, the proper translation of GLSL barrier()
is an OpControlBarrier with a scope of Workgroup and semantics of
Acquire, Release, and WorkgroupMemory. Older versions of GLSLang gave
an OpControlBarrier with semantics of None so we need to patch it up on
those versions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
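The patch-up described in the GLSLang workaround commit above can be sketched as a small pure function. The numeric values are taken from the SPIR-V specification; the function name and the old_glslang flag are hypothetical, and how the real code detects an old GLSLang is not shown in this log. The sketch uses the combined AcquireRelease bit to stand for "Acquire, Release", which is an assumption about the encoding chosen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the SPIR-V specification. */
enum { SCOPE_WORKGROUP = 2 };
enum {
   SEMANTICS_NONE = 0x0,
   SEMANTICS_ACQUIRE_RELEASE = 0x8,
   SEMANTICS_WORKGROUP_MEMORY = 0x100,
};

/* Hypothetical helper: return the memory semantics to use for an
 * OpControlBarrier, fixing up the "None" emitted by old generators. */
static inline uint32_t
fixup_control_barrier_semantics(bool old_glslang, uint32_t execution_scope,
                                uint32_t semantics)
{
   if (old_glslang && execution_scope == SCOPE_WORKGROUP &&
       semantics == SEMANTICS_NONE)
      return SEMANTICS_ACQUIRE_RELEASE | SEMANTICS_WORKGROUP_MEMORY;

   /* Anything else is passed through unchanged. */
   return semantics;
}
```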
Jason Ekstrand [Tue, 7 Jan 2020 20:18:56 +0000 (14:18 -0600)]
nir: Add a new memory_barrier_tcs_patch intrinsic
Right now, it's implemented as a no-op for everyone. For most drivers,
it's a switch case in the NIR -> whatever which just breaks. For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Jason Ekstrand [Wed, 8 Jan 2020 01:21:37 +0000 (19:21 -0600)]
llvmpipe: No-op implement more barriers
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Jason Ekstrand [Tue, 7 Jan 2020 20:13:43 +0000 (14:13 -0600)]
nir: Handle barriers with more granularity in combine_stores
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Jason Ekstrand [Tue, 7 Jan 2020 20:11:55 +0000 (14:11 -0600)]
nir: Handle more barriers in dead_write and copy_prop
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Jason Ekstrand [Tue, 7 Jan 2020 22:14:56 +0000 (16:14 -0600)]
intel/vec4: Support scoped_memory_barrier
Fixes: 06aecb14c0476 "anv: Implement VK_KHR_vulkan_memory_model"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Andreas Baierl [Tue, 7 Jan 2020 16:06:46 +0000 (17:06 +0100)]
lima: Add stencil support
This re-enables and fixes support for stencil buffer.
It fixes 365 stencil-related dEQP tests. All tests that use INCR, INCR_WRAP,
DECR and DECR_WRAP as a stencil op still fail, but they also fail with the
blob, so we may ignore that for now.
We still have dEQP-GLES2.functional.depth_stencil_clear.depth_stencil_masked
failing, which is strange because it's the only one out of the
depth_stencil_clear.* set.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Andreas Baierl [Mon, 13 Jan 2020 07:58:09 +0000 (08:58 +0100)]
lima/parser: Make rsw alpha blend parsing more readable
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Boris Brezillon [Mon, 6 Jan 2020 13:31:38 +0000 (14:31 +0100)]
panfrost: Remove unneeded phi nodes
Add a pass to remove unneeded phi nodes as done in other drivers.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3294>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3294>
Rhys Perry [Thu, 2 Jan 2020 17:05:30 +0000 (17:05 +0000)]
aco: check if multiplication/clamp is live when applying output modifier
It's possible that a multiplication/clamp is dead code and the single use
is from a different user.
Fixes portal rendering in Path of Exile when global illumination is
enabled.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 93c8ebfa780 ('aco: Initial commit of independent AMD compiler')
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Rhys Perry [Mon, 16 Dec 2019 13:58:16 +0000 (13:58 +0000)]
aco: disable add combining for ds_swizzle_b32
ds_bpermute_b32/ds_permute_b32 are fine, I think
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa780 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Rhys Perry [Mon, 16 Dec 2019 13:30:10 +0000 (13:30 +0000)]
aco: don't DCE atomics with return values
We don't create atomics with definitions if they are not used in NIR, but
our own DCE can remove the uses if an export turns out to be null.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa780 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Rhys Perry [Mon, 16 Dec 2019 11:29:08 +0000 (11:29 +0000)]
aco: set exec_potentially_empty for demotes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa780 ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Rhys Perry [Fri, 13 Dec 2019 16:59:54 +0000 (16:59 +0000)]
aco: better handle neg/abs of sgprs
isel/label_instruction currently doesn't create these but we should
probably check anyway.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Rhys Perry [Wed, 11 Dec 2019 19:41:22 +0000 (19:41 +0000)]
aco: check usesModifiers() when identifying a neg/abs
This was fine because a literal used to mean that it didn't use modifiers,
but now VOP3 can take a literal on GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Rhys Perry [Wed, 11 Dec 2019 15:54:18 +0000 (15:54 +0000)]
aco: handle omod successors with the constant in the first operand
No pipeline-db changes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Rhys Perry [Wed, 11 Dec 2019 16:57:11 +0000 (16:57 +0000)]
aco: handle VOP3 modifiers when combining a constant comparison's NaN test
No pipeline-db changes
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Rhys Perry [Tue, 24 Sep 2019 16:21:51 +0000 (17:21 +0100)]
aco: fix uninitialized data in the binary
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Fixes: 93c8ebfa780 ('aco: Initial commit of independent AMD compiler')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Rhys Perry [Mon, 9 Dec 2019 18:00:55 +0000 (18:00 +0000)]
aco: fix imageSize()/textureSize() with large buffers on GFX8
Tested on Navi by using dEQP-VK.image.image_size.buffer.* and the GFX8
path with the size multiplied by the stride.
dEQP-VK.image.image_size.buffer.* was also run with the tests modified to
use a 96bit format.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa780 ('aco: Initial commit of independent AMD compiler')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Rhys Perry [Mon, 9 Dec 2019 13:38:47 +0000 (13:38 +0000)]
aco: set vm for pos0 exports on GFX10
RADV's LLVM backend and radeonsi do the same thing.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3081>
Daniel Ogorchock [Tue, 7 Jan 2020 16:07:37 +0000 (10:07 -0600)]
panfrost: Fix headers and gpu_headers memory leak
The per-batch headers/gpu_headers dynarrays need to be freed during the
batch cleanup to prevent leaking.
Signed-off-by: Daniel Ogorchock <daniel.ogorchock@garmin.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3308>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3308>
Daniel Ogorchock [Mon, 6 Jan 2020 23:33:49 +0000 (17:33 -0600)]
panfrost: Fix panfrost_bo_access memory leak
The bo access needs to be freed prior to removing it from its hash
table. This prevents leaking them over time.
Signed-off-by: Daniel Ogorchock <daniel.ogorchock@garmin.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3308>
Samuel Pitoiset [Wed, 8 Jan 2020 07:55:16 +0000 (08:55 +0100)]
radv/gfx10: improve performance for TES using PrimID but not exporting it
This field is for the primitive ID export to the fragment shader.
Ported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 9 Jan 2020 07:24:11 +0000 (08:24 +0100)]
radv/gfx10: add support for NGG passthrough mode
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 8 Jan 2020 07:39:10 +0000 (08:39 +0100)]
radv/gfx10: do not declare LDS for NGG if useless
Only needed for NGG without passthrough mode or for NGG streamout.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 9 Jan 2020 07:23:12 +0000 (08:23 +0100)]
radv/gfx10: determine if a pipeline is eligible for NGG passthrough
It can't be enabled for geometry shaders, for NGG streamout and
for vertex shaders that export the primitive ID. NGG passthrough
requires that LDS isn't used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 7 Jan 2020 16:01:39 +0000 (17:01 +0100)]
radv/gfx10: disable vertex grouping
RadeonSI and AMDVLK do that.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Ilia Mirkin [Sun, 12 Jan 2020 05:07:05 +0000 (00:07 -0500)]
nvc0: treat all draws without color0 broadcast as MRT
Per the semi-recently-released NVIDIA docs, when this bit is not
enabled, then the result for RT[0] will be used. So if e.g. only a
single RT is drawn to and it's not RT[2], the results will not be
visible. Fixes
GTF-GL45.gtf33.GL3Tests.explicit_attrib_location.explicit_attrib_location_pipeline
which was failing due to a frag shader outputting only to location=2.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Tue, 7 Jan 2020 02:54:26 +0000 (21:54 -0500)]
gm107/ir: avoid combining geometry shader stores at 0x60
This corresponds to gl_PrimitiveID and gl_Layer. When both of these are
stored in a single AST.64 or AST.128 operation, then it appears as
though the whole store fails. Fixes the recently extended
glsl-1.50-transform-feedback-builtins piglit, and also
gtf30.GL3Tests.transform_feedback.transform_feedback_builtins.
The issue was reproduced on GM107 and GP108 but not on GK208 or GK104.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Tue, 31 Dec 2019 02:30:28 +0000 (21:30 -0500)]
nvc0: add dummy reset status support
Perhaps in a future implementation, such events could be passed back to
the driver, or queried directly. However for now, this is required for
GL 4.3 robustness contexts.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 30 Dec 2019 02:50:34 +0000 (21:50 -0500)]
nv50,nvc0: fix destination coordinates of blit
The fix was found by Karol Herbst a long time ago, but it was unclear
why it helped or if it would create additional problems. This change
adds a comment that explains what's going on, and in the process also
normalizes the nv50 implementation to match.
The coordinates which are fed to gl_Position map directly to pixel
coordinates, since the viewport transform is disabled. If the
framebuffer is MSAA, then that doesn't affect the pixel coordinates at
all, it's just that each pixel has multiple samples.
Note that this makes it really clear that this approach is inappropriate
for EXT_framebuffer_multisample_blit_scaled, and also the 3d path will
fail terribly for direct copies. Thankfully the 2d path normally takes
care of this.
Fixes KHR-GL43.packed_depth_stencil.blit.depth32f_stencil8 as well as
scaling issues in a number of EXT_framebuffer_multisample-related piglit
tests (although they continue to fail due to inaccuracies).
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bas Nieuwenhuizen [Tue, 31 Dec 2019 20:19:20 +0000 (21:19 +0100)]
radv: Use new scanout gfx9 metadata flag.
This updates for the new metadata ABI in radeonsi.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3244>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3244>
Vasily Khoruzhick [Sat, 11 Jan 2020 03:58:41 +0000 (19:58 -0800)]
lima: fix PIPE_CAP_* to mark features that aren't supported yet
lima doesn't support alpha test, flat shading, two-sided color nor
clip planes. We can enable these caps when corresponding hw features
are implemented in the driver.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Vasily Khoruzhick [Sat, 11 Jan 2020 03:48:11 +0000 (19:48 -0800)]
lima: implement polygon offset
Fixes some of the dEQP-GLES2.functional.polygon_offset.* tests and shadows in Q3A.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Vasily Khoruzhick [Sat, 11 Jan 2020 03:35:11 +0000 (19:35 -0800)]
lima: fix viewport clipping
Apparently Mali4x0 doesn't do viewport clipping, so anything drawn beyond the
viewport is still rendered. Looks like we need to use scissors to do clipping.
Fixes most of dEQP-GLES2.functional.clipping.*; 6 of the 7 remaining failures
also fail on the blob. The remaining one [1] fails on many other gallium drivers.
[1] dEQP-GLES2.functional.clipping.triangle_vertex.clip_three.clip_neg_x_neg_z_and_pos_x_pos_z_and_neg_x_neg_y_pos_z
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
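The scissor-based clipping described in the lima viewport commit above can be illustrated with a rectangle intersection. This is only a sketch under assumptions: the rect_i32 type and clip_scissor_to_viewport() are hypothetical names, not lima's actual code, and the driver may compute its scissor differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical half-open pixel rectangle: [minx, maxx) x [miny, maxy). */
typedef struct {
    int32_t minx, miny, maxx, maxy;
} rect_i32;

static int32_t min_i32(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t max_i32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Intersect the scissor with the viewport so that nothing outside the
 * viewport is rasterized. Returns an empty rectangle when they don't overlap. */
static rect_i32 clip_scissor_to_viewport(rect_i32 scissor, rect_i32 viewport)
{
    assert(viewport.minx <= viewport.maxx && viewport.miny <= viewport.maxy);

    rect_i32 r;
    r.minx = max_i32(scissor.minx, viewport.minx);
    r.miny = max_i32(scissor.miny, viewport.miny);
    r.maxx = min_i32(scissor.maxx, viewport.maxx);
    r.maxy = min_i32(scissor.maxy, viewport.maxy);

    /* Collapse to an empty rectangle instead of leaving max < min. */
    if (r.maxx < r.minx)
        r.maxx = r.minx;
    if (r.maxy < r.miny)
        r.maxy = r.miny;
    return r;
}
```

With a full-surface scissor, the result is simply the viewport rectangle, which is the case that matters when the application has not enabled its own scissor test.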
Vasily Khoruzhick [Sat, 11 Jan 2020 03:33:21 +0000 (19:33 -0800)]
lima: fix PLBU_CMD_PRIMITIVE_SETUP command
Apparently it doesn't depend on the primitive type; the value
only depends on whether we specify point size via a PLBU command:
bit 12 is set in this case.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Timothy Arceri [Fri, 10 Jan 2020 01:28:24 +0000 (12:28 +1100)]
glsl: fix potential bug in nir uniform linker
The state value of main_uniform_storage_index will be wrong for
add_parameter() when find_and_update_previous_uniform_storage()
finds a uniform if there is more than 1 uniform used in
multiple shader stages.
The new code is also simpler.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Christian Gmeiner [Fri, 10 Jan 2020 19:12:28 +0000 (20:12 +0100)]
etnaviv: add deqp debug option
This new debug option will fake some driver CAPs to be able to run dEQP
for GLES3.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3351>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3351>
Timur Kristóf [Fri, 10 Jan 2020 16:13:06 +0000 (17:13 +0100)]
aco/wave32: Set the definitions of v_cmp instructions to the lane mask.
The output of v_cmp instructions is s1 (a single SGPR) in wave32 mode,
as opposed to s2 (an SGPR-pair) in wave64 mode.
A couple of cases where this should have been fixed were omitted from
the previous patch by mistake.
Fixes: e0bcefc3a0a15a8c7ec00cfa53fd8fffcc07342a
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
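As a rough illustration of the wave32 commit above: a lane mask needs one bit per lane, so its size in 32-bit SGPRs follows directly from the wave size. The helper name lane_mask_sgprs() is made up for this sketch and is not an ACO function.

```c
#include <assert.h>

/* Number of 32-bit SGPRs needed to hold one bit per lane:
 * 1 (s1) in wave32 mode, 2 (s2, an SGPR pair) in wave64 mode. */
static unsigned lane_mask_sgprs(unsigned wave_size)
{
    assert(wave_size == 32 || wave_size == 64);
    return wave_size / 32;
}
```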
Alyssa Rosenzweig [Fri, 10 Jan 2020 22:47:57 +0000 (17:47 -0500)]
pan/midgard: Support indirect UBO offsets
...in case we have arrays in a UBO block that we'd like to access
indirectly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3352>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3352>
Francisco Jerez [Sat, 28 Dec 2019 00:38:26 +0000 (16:38 -0800)]
intel/fs: Make implied_mrf_writes() an fs_inst method.
This will be convenient in a later commit enabling SIMD32 fragment
shaders, and happens to fix the calculation for MATH instructions
which is currently inaccurate for SIMD-lowered instructions on Gen4-5
platforms (all of them on Gen4 in SIMD16 mode), since it was based on
the shader's dispatch width rather than on the actual execution size
of the instruction.
This causes some shader-db noise on Gen4 due to the more compact
register allocation interacting with the SEND dependency workarounds,
but otherwise no major changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Sun, 29 Dec 2019 14:10:47 +0000 (06:10 -0800)]
intel/fs/cse: Fix non-deterministic behavior due to inaccurate liveness calculation.
The liveness calculation done by the local CSE pass in order to prune
AEB entries whose sources are no longer live is currently inaccurate,
because the live intervals are calculated once at the beginning of the
pass, so they don't take into account any of the copy instructions
inserted by the CSE pass as it makes progress. However the IP counter
used in that calculation is based on the start_ip of the basic block,
which is updated automatically whenever any instructions are inserted
into the CFG. This causes the IP counter and liveness intervals to
get out of sync in programs with multiple basic blocks, causing the
CSE pass to toss AEB entries prematurely, which can lead to missed
optimization opportunities rather non-deterministically.
On BDW this leads to the following shader-db changes:
total instructions in shared programs: 14952488 -> 14951763 (-0.00%)
instructions in affected programs: 45416 -> 44691 (-1.60%)
helped: 40
HURT: 4
total spills in shared programs: 20989 -> 20970 (-0.09%)
spills in affected programs: 103 -> 84 (-18.45%)
helped: 3
HURT: 0
total fills in shared programs: 24981 -> 24926 (-0.22%)
fills in affected programs: 127 -> 72 (-43.31%)
helped: 3
HURT: 0
In addition, it avoids a number of regressions in combination with some
of the optimization changes I'm working on for SIMD32, which would
have made CSE more effective in some places while, astonishingly,
causing it to be less effective elsewhere in the program.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
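The percentages in the shader-db report of the CSE commit above are plain relative changes, (after - before) / before. A small sketch, where percent_change() is an illustrative helper and not part of shader-db:

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) * 100.0 / before;
}
```

For the affected programs, percent_change(45416, 44691) is about -1.60, percent_change(103, 84) about -18.45 and percent_change(127, 72) about -43.31, matching the instruction, spill and fill figures reported in that commit.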
Francisco Jerez [Sat, 28 Dec 2019 01:06:30 +0000 (17:06 -0800)]
intel/fs: Fix nir_intrinsic_load_barycentric_at_sample for SIMD32.
For uniform sample ID, only the first channel of msg_data will be
initialized. We need to pass that component only to the SEND message
for SIMD lowering to unzip the descriptor source correctly.
Fixes several dozens of conformance test failures with SIMD32 fragment
shaders enabled, including:
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.dynamic_sample_number.*
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Sat, 28 Dec 2019 00:08:04 +0000 (16:08 -0800)]
intel/fs/gen8+: Fix r127 dst/src overlap RA workaround for EOT message payload.
The problem occurred when the return payload of a SIMD8 SEND
instruction was re-used as source payload of an EOT SEND message. In
such cases the interference edge added by that workaround between the
payload and grf127_send_hack_node would have no effect, because the
payload would be allocated to a fixed range of registers containing
r127 by the special handling of EOT message payloads in the same
function. This would cause things to blow up if the source payload of
the first SIMD8 message ended up being allocated to a range which
happened to overlap the destination.
Fix it by avoiding r127 altogether in the allocation of EOT message
payloads.
The problem can be reproduced on ICL with the fp-indirections2 Piglit
test-case in combination with the other optimizer changes of this
series.
Fixes: 232ed8980217 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Mon, 25 Nov 2019 00:12:12 +0000 (16:12 -0800)]
intel/fs/gen11+: Handle ROR/ROL in lower_simd_width().
Prevents invalid code from being emitted for ROR/ROL instructions in
SIMD32 shaders.
The problem can be reproduced with the following tests while forcing
SIMD32 to be used for fragment shaders:
piglit.shaders.glsl-rotate-left
piglit.shaders.glsl-rotate-right
However the issue could occur in production already with compute
shaders and a workgroup size large enough to trigger SIMD32 dispatch.
Fixes: 83fdec0f0de "intel/compiler: Enable the emission of ROR/ROL instructions"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Fri, 27 Dec 2019 22:10:31 +0000 (14:10 -0800)]
glsl: Fix software 64-bit integer to 32-bit float conversions.
The current implementation was broken for any integers between 2^24
and 2^30 (it would return zero for me on ICL). The reason is that for
such integers we wouldn't take the 'if (0 <= shiftCount)' early return
path, however 'shiftCount + 7' would be positive, leading to a
negative 'count' argument passed to __shift64RightJamming(), which
would give undefined results.
This reworks the affected conversion functions to use either
__shortShift64Left() or __shift64RightJamming() based on the sign of
the final shift count, which should avoid the problem. In addition
this should qualify as a clean-up/optimization -- This implementation
of the conversion functions translates to 7 fewer instructions than the
original on Intel hardware.
This fixes the 'KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot'
conformance tests on soft fp64 hardware with large enough subgroup
size (>16).
Fixes: d5cf6e92b4f7 "glsl: Add built-in functions to do uint64_to_fp32(uint64_t)"
Fixes: c9d333a6b76e "glsl: Add built-in functions to do int64_to_fp32(int64_t)"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
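The idea in the conversion fix above, picking a left or a right shift from the sign of the shift count, can be sketched in plain C. This is not the GLSL soft-float code from that commit: u64_to_f32_sketch() and highest_set_bit() are hypothetical names, the helper stands in for a count-leading-zeros instruction, and rounding is done explicitly (round to nearest, ties to even) rather than through a jamming shift.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Index of the highest set bit of a non-zero value (0..63). */
static int highest_set_bit(uint64_t a)
{
    assert(a != 0);
    int msb = 0;
    while (a >>= 1)
        msb++;
    return msb;
}

/* Convert an unsigned 64-bit integer to the nearest binary32 value. */
static float u64_to_f32_sketch(uint64_t a)
{
    if (a == 0)
        return 0.0f;

    int msb = highest_set_bit(a);
    int shift = msb - 23;       /* > 0: bits must be dropped, <= 0: exact */
    uint32_t mant;              /* 24-bit significand including the hidden bit */

    if (shift <= 0) {
        /* Small values fit exactly: shift left into position. */
        mant = (uint32_t)(a << -shift);
    } else {
        /* Large values: shift right and round to nearest, ties to even. */
        uint64_t half = 1ull << (shift - 1);
        uint64_t rem = a & ((1ull << shift) - 1);
        mant = (uint32_t)(a >> shift);
        if (rem > half || (rem == half && (mant & 1u)))
            mant++;
    }

    /* Adding the significand with its hidden bit to (exponent - 1) lets a
     * rounding carry (mant == 1 << 24) bump the exponent automatically. */
    uint32_t bits = ((uint32_t)(127 + msb - 1) << 23) + mant;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because the shift direction is decided once from the sign of the count, neither shift is ever called with a negative amount, which is the failure mode the commit message describes.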
Daniel Schürmann [Wed, 8 Jan 2020 11:46:47 +0000 (12:46 +0100)]
aco: compact aco::span<T> to use uint16_t offset and size instead of pointer and size_t.
This reduces the size of the Instruction base class
from 40 bytes to 16 bytes. No pipelinedb changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
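The space saving in the span commit above comes from replacing a pointer plus size_t (16 bytes on a typical 64-bit target) with two 16-bit fields. A C sketch of the idea follows; the real aco::span<T> is a C++ template, and the names compact_span and demo_instr as well as the choice to make the offset relative to the span object itself are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact span: 4 bytes instead of pointer + size_t. */
typedef struct {
    uint16_t offset;  /* bytes from this span object to the first element */
    uint16_t length;  /* number of elements */
} compact_span;

/* Hypothetical instruction whose operands live in the same allocation. */
typedef struct {
    compact_span operands;
    uint32_t storage[4];
} demo_instr;

static void demo_instr_init(demo_instr *in, uint16_t n)
{
    assert(n <= 4);
    in->operands.offset = (uint16_t)(offsetof(demo_instr, storage) -
                                     offsetof(demo_instr, operands));
    in->operands.length = n;
}

/* Recover the element pointer from the span's own address. */
static uint32_t *compact_span_begin(compact_span *s)
{
    return (uint32_t *)((char *)s + s->offset);
}
```

The trade-off is that such a span is only valid while it sits at a fixed distance from its data, and both the offset and the element count are limited to 65535.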
Daniel Schürmann [Wed, 8 Jan 2020 10:49:11 +0000 (11:49 +0100)]
aco: compact various Instruction classes
No pipelinedb changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3332>
Andrii Simiklit [Fri, 10 Jan 2020 15:37:13 +0000 (17:37 +0200)]
mesa/st: fix a memory leak in get_version
This patch prevents a memory leak in the get_version function in st_manager.c.
This issue was found by valgrind:
16 bytes in 1 blocks are definitely lost in loss record 6 of 1,418
at 0x483CD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
by 0x63D9476: st_init_extensions (st_extensions.c:1679)
by 0x63B803B: get_version (st_manager.c:1271)
by 0x63B8124: st_api_query_versions (st_manager.c:1289)
by 0x63266EF: dri_init_screen_helper (dri_screen.c:583)
by 0x6321B12: dri2_init_screen (dri2.c:2110)
by 0x631AACC: driCreateNewScreen2 (dri_util.c:155)
by 0x5D58192: dri3_create_screen (dri3_glx.c:897)
by 0x5D39829: AllocAndFetchScreenConfigs (glxext.c:815)
by 0x5D39C57: __glXInitialize (glxext.c:941)
by 0x5D3290A: GetGLXPrivScreenConfig (glxcmds.c:174)
by 0x5D34F38: glXQueryExtensionsString (glxcmds.c:1307)
Fixes: eca8032f20d0970184843d98e2bddb688e94a3a9 ("gallium: Add ARB_gl_spirv support")
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3345>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3345>
Lasse Lopperi [Fri, 10 Jan 2020 08:47:55 +0000 (10:47 +0200)]
freedreno/drm: Fix memory leak in softpin implementation
Free the memory allocated for cmds/reloc_bos array when destroying the
associated ringbuffer.
For a similar fix for the non-softpin implementation see:
https://gitlab.freedesktop.org/mesa/mesa/commit/d014af98b7afc69f4f733c8b8b6f2e3438e68407
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2324
Fixes: f3cc0d2 ("freedreno: import libdrm_freedreno + redesign submit")
Signed-off-by: Lasse Lopperi <lasse.lopperi@ge.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3342>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3342>
Rhys Perry [Wed, 18 Dec 2019 16:18:35 +0000 (16:18 +0000)]
aco: limit register usage for large work groups
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Timur Kristóf [Fri, 10 Jan 2020 11:06:53 +0000 (12:06 +0100)]
ac/llvm: Fix ac_build_reduce in wave32 mode.
Previously, when cluster_size was set to 0, it always worked as if
the cluster size was 64. This commit fixes it in wave32 mode by
making it work as if the cluster size was set to 32.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
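The behaviour described in the ac_build_reduce commit above amounts to treating a cluster size of 0 as "the whole wave". A minimal sketch, with effective_cluster_size() as a hypothetical helper rather than the actual ac/llvm code:

```c
#include <assert.h>

/* A cluster_size of 0 means "reduce across the whole wave", so it should
 * resolve to 32 in wave32 mode and 64 in wave64 mode. */
static unsigned effective_cluster_size(unsigned cluster_size, unsigned wave_size)
{
    assert(wave_size == 32 || wave_size == 64);
    return cluster_size == 0 ? wave_size : cluster_size;
}
```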
Pierre-Eric Pelloux-Prayer [Thu, 9 Jan 2020 13:59:49 +0000 (14:59 +0100)]
radeonsi: release saved resources in si_compute_do_clear_or_copy
Fixes: 9b331e462e5 ("radeonsi: use compute shaders for clear_buffer & copy_buffer")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pierre-Eric Pelloux-Prayer [Thu, 9 Jan 2020 13:57:41 +0000 (14:57 +0100)]
radeonsi: release saved resources in si_compute_clear_12bytes_buffer
Fixes: 6c901f06752 ("radeonsi: use compute shader for clear 12-byte buffer")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>