Rhys Perry [Wed, 27 Nov 2019 17:27:36 +0000 (17:27 +0000)]
aco: improve FLAT/GLOBAL scheduling
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Wed, 27 Nov 2019 17:15:54 +0000 (17:15 +0000)]
aco: don't enable store_global for helper invocations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Wed, 27 Nov 2019 17:08:27 +0000 (17:08 +0000)]
aco: fix SADDR with FLAT on GFX10
The reference guide is incorrect and SADDR is actually used with FLAT on
GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Wed, 27 Nov 2019 17:06:10 +0000 (17:06 +0000)]
aco: fix assembly of FLAT/GLOBAL atomics
They can take both a definition and data operand
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Tue, 26 Nov 2019 21:06:35 +0000 (21:06 +0000)]
aco: fix GFX10 opcodes for some global/flat atomics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Wed, 27 Nov 2019 17:23:02 +0000 (17:23 +0000)]
aco: improve WAR hazard workaround with >64bit stores
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Wed, 27 Nov 2019 17:20:15 +0000 (17:20 +0000)]
aco: add v_nop inbetween exec write and VMEM/DS/FLAT
LLVM and the proprietary compiler seem to do this
Fixes: b01847bd9 ("aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Wed, 27 Nov 2019 17:11:58 +0000 (17:11 +0000)]
aco: fix incorrect cast in parse_wait_instr()
s_waitcnt is SOPP, not SOPK
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Wed, 27 Nov 2019 17:24:23 +0000 (17:24 +0000)]
aco: fix i2i64
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Thu, 28 Nov 2019 15:29:40 +0000 (15:29 +0000)]
aco: propagate p_wqm on an image_sample's coordinate p_create_vector
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2156
Fixes: 93c8ebfa780 ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Christian Gmeiner [Fri, 29 Nov 2019 14:15:27 +0000 (15:15 +0100)]
etnaviv: remove dead code
ptiled is always NULL so the if statement is useless.
CoverityID:
1415572
Fixes: b9627765303 ("etnaviv: rework compatible render base")
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Christian Gmeiner [Sat, 19 Oct 2019 17:12:53 +0000 (19:12 +0200)]
etnaviv: handle integer case for GENERIC_ATTRIB_SCALE
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Christian Gmeiner [Sat, 19 Oct 2019 16:48:35 +0000 (18:48 +0200)]
etnaviv: fix R10G10B10A2 vertex format entries
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Christian Gmeiner [Wed, 16 Oct 2019 04:31:17 +0000 (06:31 +0200)]
etnaviv: use NORMALIZE_SIGN_EXTEND
The blob driver does something like this for all vertex formats:
if (normalize) {
if (OPENGL_ES30)
val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_SIGN_EXTEND;
else
val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_ON;
} else {
val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_OFF;
}
As there is no way to get to that information in gallium we always
assume OPENGL_ES30.
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Christian Gmeiner [Sun, 13 Oct 2019 08:54:48 +0000 (10:54 +0200)]
etnaviv: fix integer vertex formats
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Gray [Thu, 28 Nov 2019 05:57:23 +0000 (16:57 +1100)]
i965: update Makefile.sources for perf changes
brw_performance_query_metrics.h was removed in
134e750e16bfc53480e0bba6f0ae3e1d2a7fb87c and
brw_performance_query.h was removed in
8ae6667992ccca41d08884d863b8aeb22a4c4e65
remove reference to these files from Makefile.sources
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Fixes: 134e750e16bfc53480e0 ("i965: extract performance query metrics")
Fixes: 8ae6667992ccca41d088 ("intel/perf: move query_object into perf")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Vinson Lee [Thu, 28 Nov 2019 08:05:13 +0000 (00:05 -0800)]
scons: Bump C standard to gnu11 on macOS 10.15.
Fix build error on macOS 10.15 Catalina.
src/util/u_queue.c:179:7: error: implicit declaration of function 'timespec_get' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
timespec_get(&ts, TIME_UTC);
^
timespec_get needs C11 starting with macOS 10.15.
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/time.h
193 #if (__DARWIN_C_LEVEL >= __DARWIN_C_FULL) && \
194 ((defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L) || \
195 (defined(__cplusplus) && __cplusplus >= 201703L))
196 /* ISO/IEC 9899:201x 7.27.2.5 The timespec_get function */
197 #define TIME_UTC 1 /* time elapsed since epoch */
198 __API_AVAILABLE(macosx(10.15), ios(13.0), tvos(13.0), watchos(6.0))
199 int timespec_get(struct timespec *ts, int base);
200 #endif
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Boris Brezillon [Thu, 14 Nov 2019 08:35:27 +0000 (09:35 +0100)]
panfrost: Make sure we reset the damage region of RTs at flush time
We must reset the damage info of our render targets here even though a
damage reset normally happens when the DRI layer swaps buffers. That's
because there can be implicit flushes the GL app is not aware of, and
those might impact the damage region: if part of the damaged portion
is drawn during those implicit flushes, you have to reload those areas
before next draws are pushed, and since the driver can't easily know
what's been modified by the draws it flushed, the easiest solution is
to reload everything.
Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 65ae86b85422 ("panfrost: Add support for KHR_partial_update()")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Boris Brezillon [Fri, 8 Nov 2019 23:02:54 +0000 (00:02 +0100)]
gallium: Fix the ->set_damage_region() implementation
BACK_LEFT attachment can be outdated when the user calls
KHR_partial_update() (->lastStamp != ->texture_stamp), leading to a
damage region update on the wrong pipe_resource object.
Let's delay the ->set_damage_region() call until the attachments are
updated when we're in that case.
Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 492ffbed63a2 ("st/dri2: Implement DRI2bufferDamageExtension")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Erik Faye-Lund [Wed, 27 Nov 2019 16:46:29 +0000 (17:46 +0100)]
zink: silence coverity error
Coverity doesn't know that we always have coordinates if we have lod. To
avoid annoying errors, let's just zero-initialize this.
CoverityID:
1455202
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Wed, 27 Nov 2019 16:44:05 +0000 (17:44 +0100)]
zink: error-check right variable
That's not the value we just allocated...
CoverityID:
1455177
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Wed, 27 Nov 2019 16:38:53 +0000 (17:38 +0100)]
zink: avoid NULL-deref
Same story as the previous two commits; these functions dereference the
memory they are pointed at. We can't do that.
CoverityID:
1455180
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Wed, 27 Nov 2019 16:34:08 +0000 (17:34 +0100)]
zink: avoid NULL-deref
Similar to the previous commit, pipe_resource_reference also dereference
the memory pointed at. Let's avoid it.
CoverityID:
1455198
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Wed, 27 Nov 2019 16:17:08 +0000 (17:17 +0100)]
zink: avoid NULL-deref
zink_render_pass_reference will dereference the memory 'dst' points at,
which can't really go well. All we want to do here is to increase the
reference-count, so let's use a different helper for that instead.
CoverityID:
1455200
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Wed, 27 Nov 2019 16:31:28 +0000 (17:31 +0100)]
zink: handle calloc-failure
In case we fail to allocate the context, we should notice and fail
gracefully.
CoverityID:
1455193
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Wed, 27 Nov 2019 16:22:24 +0000 (17:22 +0100)]
zink: do not try to destroy NULL-fence
destroy_fence doesn't handle NULL-pointers gracefully. So let's avoid
hitting that code-path, by simply returning NULL early here instead.
CoverityID:
1455179
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Wed, 27 Nov 2019 15:30:29 +0000 (16:30 +0100)]
zink: delete query rather than allocating a new one
It seems I had some fat fingers when writing this function, and I
accidentally ended up allocating a new query and immediately trying to
delete an uninitialized pool instead of just deleting the pool of the
query that was passed.
CoverityID:
1455196
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Thu, 28 Nov 2019 17:22:24 +0000 (18:22 +0100)]
zink: fix crash when restoring sampler-states
When I changed to heap-allocated sampler-objects, I missed the code-path
that restores sampler-states after the blitter; it needs an array of
pointers, not an array of VkSampler objects to behave.
This fixes spec@arb_texture_cube_map@copyteximage for me.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 5ea787950f6 ("zink: heap-allocate samplers objects")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Thu, 28 Nov 2019 17:41:30 +0000 (18:41 +0100)]
zink: reject invalid sample-counts
Vulkan only allows power-of-two sample counts. We already kinda checked
for this, but forgot to validate the result in the end. Let's check the
result and error properly.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Thu, 28 Nov 2019 17:41:05 +0000 (18:41 +0100)]
zink: use true/false instead of TRUE/FALSE
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Thu, 28 Nov 2019 16:51:20 +0000 (17:51 +0100)]
st/mesa: unmap pbo after updating cache
Unmapping first leads to accessing an invalid pointer. So let's switch
these lines around.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Vinson Lee [Thu, 28 Nov 2019 07:37:00 +0000 (23:37 -0800)]
panfrost: Fix gnu-empty-initializer build errors.
Fixes: a24d6fbae60c ("meson: Add -Werror=gnu-empty-initializer to MSVC compat args")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Timothy Arceri [Thu, 28 Nov 2019 04:26:34 +0000 (15:26 +1100)]
docs: update source code repository documentation
This drops all the old documentaion around applying for push access.
Also this removes the documentation stating that you can push
directly to mesa rather than using merge requests.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1969
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Bas Nieuwenhuizen [Fri, 22 Nov 2019 00:51:36 +0000 (01:51 +0100)]
radv: Fix timeline semaphore refcounting.
Was totally broken ...
Removed two if(point) {} because point is always non-NULL and we
were counting on that already for counting, since we NULL our
references to semaphores without active point earlier.
Fixes: 4aa75bb3bdd "radv: Add wait-before-submit support for timelines."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2137
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Jonathan Gray [Thu, 28 Nov 2019 05:56:30 +0000 (16:56 +1100)]
winsys/amdgpu: avoid double simple_mtx_unlock()
pthread_mutex_unlock() when unlocked is documented by posix as
being undefined behaviour. On OpenBSD pthread_mutex_unlock() will call
abort(3) if this happens.
This occurs in amdgpu_winsys_create() after
cb446dc0fa5c68f681108f4613560543aa4cf553
winsys/amdgpu: Add amdgpu_screen_winsys
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Tue, 26 Nov 2019 01:05:47 +0000 (20:05 -0500)]
util/driconfig: print ATTENTION if MESA_DEBUG=silent is not set
unix-bytebenchmark refuses to run if the driver prints ATTENTION to stderr.
Acked-by: Eric Engestrom <eric@engestrom.ch>
Tapani Pälli [Fri, 8 Nov 2019 06:17:17 +0000 (08:17 +0200)]
glsl: handle max uniform limits with lower_const_arrays_to_uniforms
Fixes arb_tessellation_shader-large-uniforms Piglit test.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bas Nieuwenhuizen [Wed, 27 Nov 2019 23:36:24 +0000 (00:36 +0100)]
radv: Unify max_descriptor_set_size.
They were out of sync. Besides syncing, lets ensure they never diverge
again.
Fixes: 8d2654a4197 "radv: Support VK_EXT_inline_uniform_block."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Wed, 27 Nov 2019 22:33:59 +0000 (23:33 +0100)]
amd/llvm: Refactor ac_build_scan.
Split out the logic for exclusive scans into a separate function
that makes clear what it does instead of having this opaque 60
line if.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Tue, 26 Nov 2019 07:32:02 +0000 (08:32 +0100)]
radv: add more constants to avoid using magic numbers
Trivial.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 27 Nov 2019 14:32:45 +0000 (15:32 +0100)]
ac/llvm: convert src operands to pointers if necessary
To avoid generating invalid LLVM IR when both operands don't have
the same type. This might happen when performing pointer comparisons
with SPIRV 1.4.
Fixes invalid LLVM IR for:
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrequal.variable_pointers_ssbo_equal
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrnotequal.variable_pointers_ssbo_not_equal
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dave Airlie [Thu, 5 Sep 2019 05:49:25 +0000 (15:49 +1000)]
llvmpipe: add initial nir support
This adds the hooks between llvmpipe and the gallivm NIR
code, for compute and fragment shaders.
NIR support is hidden behind LP_DEBUG=nir for now until
all the intergration issues are solved
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Thu, 5 Sep 2019 05:41:05 +0000 (15:41 +1000)]
gallivm: add swizzle support where one channel isn't defined.
NIR doesn't always define all output channels
relies on outputs being memset to 0
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Thu, 5 Sep 2019 05:47:39 +0000 (15:47 +1000)]
gallium: add nir lowering passes for the draw pipe stages. (v2)
This transforms the NIR shaders like the TGSI transforms worked.
v2: fix some nir info requirements, use 32-bit bools
Acked-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Thu, 5 Sep 2019 05:47:19 +0000 (15:47 +1000)]
draw: add nir info gathering and building support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Thu, 5 Sep 2019 05:46:31 +0000 (15:46 +1000)]
gallivm: add nir->llvm translation (v2)
This add the initial implementation of the NIR->LLVM conversion
for llvmpipe NIR support.
v2: lower bool to int32 in nir not llvm
Acked-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Thu, 5 Sep 2019 05:43:38 +0000 (15:43 +1000)]
gallivm: add selection for non-32 bit types
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Wed, 20 Nov 2019 01:44:22 +0000 (11:44 +1000)]
gallivm: add cttz wrapper
this will be used to write find_lsb support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Mon, 28 Oct 2019 04:21:43 +0000 (14:21 +1000)]
gallivm: add popcount intrinsic wrapper
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Thu, 5 Sep 2019 05:32:21 +0000 (15:32 +1000)]
gallivm: nir->tgsi info convertor (v2)
This is a port of the old radeonsi code to be used for llvmpipe NIR support.
Once we remove TGSI support from llvmpipe (I can dream? :-), then
we should be able to refine most of this down and remove it.
v2: port to later radeonsi code for vertex inputs and sampler/io parsing.
Acked-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Thu, 5 Sep 2019 05:34:46 +0000 (15:34 +1000)]
gallivm: split out the flow control ir to a common file.
We can share a bunch of flow control handling between NIR and TGSI.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Marek Olšák [Wed, 6 Nov 2019 23:03:30 +0000 (18:03 -0500)]
radeonsi: enable SPIR-V and GL 4.6 for NIR
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 7 Nov 2019 01:50:26 +0000 (20:50 -0500)]
radeonsi/nir: support interface output types to fix SPIR-V xfb piglits
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 7 Nov 2019 01:19:17 +0000 (20:19 -0500)]
radeonsi/nir: fix location_frac handling for TCS outputs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 7 Nov 2019 01:18:23 +0000 (20:18 -0500)]
radeonsi/nir: don't rely on data.patch for tess factors
GLCTS SPIR-V tests have this issue.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 7 Nov 2019 01:12:40 +0000 (20:12 -0500)]
radeonsi/nir: validate is_patch because SPIR-V doesn't set it for tess factors
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 7 Nov 2019 00:48:34 +0000 (19:48 -0500)]
radeonsi: simplify get_tcs_tes_buffer_address_from_generic_indices
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 7 Nov 2019 00:40:23 +0000 (19:40 -0500)]
radeonsi: simplify the interface of get_dw_address_from_generic_indices
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 7 Nov 2019 00:06:09 +0000 (19:06 -0500)]
radeonsi/nir: implement subgroup system values for SPIR-V
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 7 Nov 2019 01:19:10 +0000 (20:19 -0500)]
ac/nir: don't rely on data.patch for tess factors
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Kenneth Graunke [Fri, 22 Nov 2019 09:37:02 +0000 (01:37 -0800)]
drirc: Set vs_position_always_invariant for Shadow of Mordor on Intel
When drawing the main character in Shadow of Mordor, the game appears
to draw Talion with one vertex shader, and the Wraith with another.
If the compiler optimizes those in different ways which lead to slight
imprecisions, then the resulting positions may not line up, leading to
Z-fighting occurring as the game decides which of the two are in front.
brw_nir_opt_peephole_ffma looks at usages of multiply adds across the
entire shader, and may make different decisions between the two, leading
to such imprecisions and Z-fighting. This started happening recently
after a NIR change to eliminate unnecessary MOVs (
7025dbe7), but that
change simply exposed the existing problem.
Improves performance on Skylake GT4e by 1.22945% +/- 0.398672% (n=3),
likely due to the fixed rendering.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1985
Fixes: 7025dbe794b ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Kenneth Graunke [Fri, 22 Nov 2019 00:11:15 +0000 (16:11 -0800)]
driconf, glsl: Add a vs_position_always_invariant option
Many applications use multi-pass rendering and require their vertex
shader position to be computed the same way each time. Optimizations
may consider, say, fusing a multiply-add based on global usage of an
expression in a shader. But a second shader with the same expression
may have different code, causing that optimization to make the other
choice the second time around.
The correct solution is for applications to mark their VS outputs
'invariant', indicating they need multiple shaders to compute that
output in the same manner. However, most applications fail to do so.
So, we add a new driconf option - vs_position_always_invariant - which
forces the gl_Position output in vertex shaders to be marked invariant.
Fixes: 7025dbe794b ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Eric Anholt [Wed, 27 Nov 2019 00:16:05 +0000 (16:16 -0800)]
turnip: Disable timestamp queries for now.
They're not implemented, and not critical to bring up immediately. Avoids
failures in the CTS when nothing gets written to the query.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jonathan Marek [Wed, 27 Nov 2019 15:46:22 +0000 (10:46 -0500)]
freedreno/perfcntrs/fdperf: add missing a2xx case in select_counter
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Jonathan Marek [Wed, 27 Nov 2019 15:45:41 +0000 (10:45 -0500)]
freedreno/perfcntrs/fdperf: add missing a20x compatible
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Jonathan Marek [Wed, 27 Nov 2019 15:44:57 +0000 (10:44 -0500)]
freedreno/perfcntrs/fdperf: fix u64 print on 32-bit builds
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Jonathan Marek [Wed, 27 Nov 2019 15:40:59 +0000 (10:40 -0500)]
freedreno/perfcntrs: add a2xx MH counters
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Jonathan Marek [Wed, 27 Nov 2019 15:38:14 +0000 (10:38 -0500)]
freedreno/registers: add missing MH perfcounter enum for a2xx
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Michel Dänzer [Mon, 25 Nov 2019 17:42:10 +0000 (18:42 +0100)]
gitlab-ci: Put HTML summary in artifacts for failed piglit jobs
This will make it easier to look at details of failed / skipped tests.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Michel Dänzer [Tue, 26 Nov 2019 15:27:07 +0000 (16:27 +0100)]
gitlab-ci: Stop storing piglit test results as JUnit
Since we're not reporting test results as JUnit anymore, we can use the
default JSON format.
This affects how test results are summarized, update the reference files
accordingly.
Reviewed-by: Eric Anholt <eric@anholt.net>
Michel Dänzer [Tue, 26 Nov 2019 16:44:49 +0000 (17:44 +0100)]
gitlab-ci: Stop reporting piglit test results via JUnit
It was basically useless in this form, and processing the JUnit data in
the GitLab backend was pretty expensive.
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Tue, 26 Nov 2019 14:28:52 +0000 (15:28 +0100)]
v3d: fix indirect BO allocation for uniforms
We were always ensuring a minimum size of 4 bytes for uniforms
for the case where we don't have any, to account for hardware pre-fetching
of the uniform stream, however, pre-fetching could also lead to to out
of bounds reads when have read the last uniform in the stream, so we
probably want to have the extra 4 bytes to prevent the kernel from
observing invalid memory accesses when the uniform stream sits right at
the end of a page.
This seems to fix MMU exceptions reported with a Linux 5.4 kernel.
Credit goes to Phil Elwell for identifying the problem and narrowing
it down to memory accesses in the uniform stream.
Reported-by: Phil Elwell <phil@raspberrypi.org>
Tested-by: Phil Elwell <phil@raspberrypi.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Samuel Pitoiset [Mon, 25 Nov 2019 15:53:54 +0000 (16:53 +0100)]
radv: enable VK_KHR_shader_subgroup_extended_types on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 25 Nov 2019 16:02:44 +0000 (17:02 +0100)]
ac: add 8-bit and 16-bit supports to ac_build_permlane16()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 23 Aug 2019 15:53:05 +0000 (17:53 +0200)]
radv/gfx10: fix implementation of exclusive scans
This implementation is loosely based on ROCm.
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl
This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10.
Fixes: 227c29a80de ("amd/common/gfx10: implement scan & reduce operations")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 26 Nov 2019 17:29:00 +0000 (18:29 +0100)]
radv: fix enabling sample shading with SampleID/SamplePosition
When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.
Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jonathan Marek [Tue, 19 Nov 2019 04:01:18 +0000 (23:01 -0500)]
turnip: fix integer render targets
Add missing required bits. Fixes at least:
dEQP-VK.pipeline.render_to_image.dedicated_allocation.1d.small.r16g16_sint_d24_unorm_s8_uint
dEQP-VK.pipeline.render_to_image.dedicated_allocation.2d.mipmap.r16g16_sint_d24_unorm_s8_uint
dEQP-VK.renderpass.dedicated_allocation.attachment.4.401
dEQP-VK.renderpass2.suballocation.formats.r16_uint.load.draw
dEQP-VK.synchronization.op.single_queue.barrier.write_draw_read_copy_image_to_buffer.image_128x128_r16_uint
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Mon, 25 Nov 2019 17:05:42 +0000 (11:05 -0600)]
anv: Push constants are relative to dynamic state on IVB
Fixes: aecde2351 "anv: Pre-compute push ranges for graphics pipelines"
Closes: #2136
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Dylan Baker [Thu, 21 Nov 2019 17:11:45 +0000 (09:11 -0800)]
meson: Add -Werror=gnu-empty-initializer to MSVC compat args
Only clang has this argument (at least as of clang 8 and gcc 9), which
errors when using the gcc empty initializer syntax in C:
```C
struct foo f = {};
```
GCC has a warning for this, but only when using -Wpedantic, which is a
lot of noise to lose useful warnings in.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Dylan Baker [Thu, 21 Nov 2019 17:50:27 +0000 (09:50 -0800)]
gallium/auxiliary: Fix uses of gnu struct = {} extension
Most of these will never actually be compiled by windows, but in the
interest of being able to make using struct foo = {}; an error and
avoiding breaking windows removing a handful of safe uses seems like a
good trade off.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Marek Olšák [Mon, 25 Nov 2019 22:58:45 +0000 (17:58 -0500)]
st/mesa: add st_variant base class to simplify code for shader variants
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Sat, 23 Nov 2019 00:31:46 +0000 (19:31 -0500)]
st/mesa: don't use ** in the st_nir_link_shaders signature
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Tue, 19 Nov 2019 22:30:03 +0000 (17:30 -0500)]
st/mesa: simplify looping over linked shaders when linking NIR
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Wed, 13 Nov 2019 04:48:02 +0000 (23:48 -0500)]
st/mesa: propagate gl_PatchVerticesIn from TCS to TES before linking for NIR
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Wed, 13 Nov 2019 04:46:37 +0000 (23:46 -0500)]
st/mesa: don't call ProgramStringNotify in glsl_to_nir
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Thu, 21 Nov 2019 00:18:21 +0000 (19:18 -0500)]
st/mesa: don't use redundant stp->state.ir.nir
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Mon, 25 Nov 2019 22:01:42 +0000 (17:01 -0500)]
st/mesa: don't serialize all streamout state if there are no SO outputs
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Kenneth Graunke [Mon, 25 Nov 2019 18:04:38 +0000 (10:04 -0800)]
iris: Disable VF cache partial address workaround on Gen11+
The vertex cache uses the full 48-bit address on Gen11+. See the
documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
workaround and lists it as pre-Icelake.
Interestingly, the docs don't mention index buffers as needing a
workaround at all. So either we've been overzealous, or the docs
never got updated to record that. Which begs the question of whether
the issue there was fixed, if there was one...
Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; appears
that it improves performance by about 1-2% on Icelake 8x8 (not frequency
locked).
Rob Clark [Sun, 5 May 2019 17:59:37 +0000 (10:59 -0700)]
freedreno: switch to layout helper
The slices table and most of the other layout fields in the
freedreno_resource moves into fdl_layout.
v2: Changes by anholt to not have duplicate fields, which was introducing
a surprising behavior change in resource layout (using the
level_linear helper before the setup of the shadowed fields)
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Clark <robdclark@chromium.org>
Eric Anholt [Thu, 21 Nov 2019 04:54:27 +0000 (20:54 -0800)]
freedreno/a6xx: Log the tiling mode in resource layout debug.
This was important for figuring out what went wrong with the layout
refactor.
Acked-by: Rob Clark <robdclark@chromium.org>
Eric Anholt [Wed, 20 Nov 2019 20:40:25 +0000 (12:40 -0800)]
freedreno: Convert the slice struct to the new resource header.
This gets the worst of the sed required for shared resource layout out of
the way. The texture layout comment is dropped now that we're referencing
the shared header, which has a more complete description.
Acked-by: Rob Clark <robdclark@chromium.org>
Eric Anholt [Wed, 20 Nov 2019 20:28:43 +0000 (12:28 -0800)]
freedreno: Introduce a resource layout header.
This will be used for sharing resource layout code between freedreno and
tu. Mostly copied from a commit by Rob, with a new location and the slice
struct renamed for consistency.
Acked-by: Rob Clark <robdclark@chromium.org>
Eric Anholt [Wed, 20 Nov 2019 21:17:27 +0000 (13:17 -0800)]
freedreno: Introduce a fd_resource_tile_mode() helper.
Multiple places were doing the same thing to get the tile mode of a level,
so refactor it out. This will make the shared resource helper transition
cleaner.
Acked-by: Rob Clark <robdclark@chromium.org>
Eric Anholt [Wed, 20 Nov 2019 20:55:56 +0000 (12:55 -0800)]
freedreno: Introduce a fd_resource_layer_stride() helper.
This factors out a bit of duplicated code, but will also make the shared
resource layout transition process clearer.
Acked-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Sun, 5 May 2019 15:10:24 +0000 (08:10 -0700)]
freedreno: use rsc->slice accessor everywhere
This will make it easier to extract the slice table out into a layout
helper.
Acked-by: Rob Clark <robdclark@chromium.org>
Eric Anholt [Wed, 2 Oct 2019 17:59:13 +0000 (10:59 -0700)]
nir: Make algebraic backtrack and reprocess after a replacement.
The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(x)) -> b2f(iand(x, y)) to
transform:
result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);
->
temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);
nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.
Back in 2016 in
7be8d0773229 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first. This helped his cases, but I
believe introduced this failure mode. Instead of reverting that, now
that we've got the automaton, we can update the automaton's state
recursively and just re-process any instructions whose state has
changed (indicating that they might match new things). There's a
small chance that the state will hash to the same value and miss out
on this round of algebraic, but this seems to be good enough to fix
dEQP.
Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):
Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling
outliers removed)
dEQP-GLES2.functional.uniform_api.random.3 runtime
-65.3512% +/- 4.22369% (n=21, was 1.4s)
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime
-68.8066% +/- 6.49523% (was 4.8s)
v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of
tricky code. Runtime of uniform_api.random.3 down -0.790299% +/-
0.244213% compred to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2,
fixing infinite loops.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Eric Anholt [Wed, 13 Nov 2019 20:15:08 +0000 (12:15 -0800)]
nir: Refactor algebraic's block walk
My motivation was to clarify the changes in the following commit, but
incidentally, it reduces runtime of
dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy
testcase) by -5.39524% +/- 2.21179% (n=15)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Connor Abbott [Fri, 10 May 2019 14:57:45 +0000 (16:57 +0200)]
nir: Maintain the algebraic automaton's state as we work.
In order to have nir_opt_algebraic be able to do further algebraic
work on the output of a replacement, we need to maintain the
automaton's state.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jonathan Marek [Sun, 20 Oct 2019 06:11:45 +0000 (02:11 -0400)]
etnaviv: support 3d/array/integer formats in texture descriptors
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Fri, 9 Aug 2019 14:55:46 +0000 (10:55 -0400)]
etnaviv: blt: fix partial ZS clears with TS
If not all bits are cleared, then BLT needs to be given the current clear
value and not the new one.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>