Marek Olšák [Thu, 19 Oct 2017 20:22:15 +0000 (22:22 +0200)]
mesa: enable ARB_texture_buffer_* extensions in the Compatibility profile
We already have piglit tests testing alpha, luminance, and intensity
formats. They were skipped by piglit until now.
Additionally, I'm enabling one ARB_texture_buffer_range piglit test to run
with the compat profile.
i965 behavior is unchanged except that it doesn't expose TBOs in the Compat
profile. Not sure how that affects the GL version override.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dave Airlie [Thu, 2 Nov 2017 01:04:53 +0000 (11:04 +1000)]
docs: update r600 atomic counter status.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 2 Nov 2017 00:26:51 +0000 (10:26 +1000)]
r600: add support for hw atomic counters. (v3)
This adds support for the evergreen/cayman atomic counters.
These are implemented using GDS append/consume counters. The values
for each counter are loaded before drawing and saved after each draw
using special CP packets.
v2: move hw atomic assignment into driver.
v3: fix messing up caps (Gert Wollny), only store ranges in driver,
drop buffers.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Dave Airlie [Wed, 1 Nov 2017 19:51:36 +0000 (05:51 +1000)]
st/mesa: add support for hw atomics to glsl->tgsi. (v5)
This adds support for creating the hw atomic tgsi from
the glsl codepaths.
v2: drop the atomic index and move to backend.
v3: drop buffer decls. (Marek)
v4: fix off by one (Gert)
v5: fix off by one the other way (Dave)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 1 Nov 2017 19:49:40 +0000 (05:49 +1000)]
st/mesa: setup hw atomic limits. (v1.1)
HW atomics need to use caps to set some limits, and some
other limits may also need limiting.
This fixes things up to work for evergreen hw, it may need
more changes in the future if other hw wants to use this path.
v1.1: fix indent.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 1 Nov 2017 04:30:13 +0000 (14:30 +1000)]
st/mesa: start adding support for hw atomics atom. (v2)
This adds a new atom that calls the new driver API to
bind buffers containing hw atomics.
v2: fixup bindings for sparse buffers. (mareko/nha)
don't bind buffer atomics when hw atomics are enabled.
use NewAtomicBuffer (mareko)
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 1 Nov 2017 04:22:04 +0000 (14:22 +1000)]
mesa/program: add hw atomic counter file
This is needed for the GLSL->TGSI translation for hw atomic counters.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 1 Nov 2017 04:17:49 +0000 (14:17 +1000)]
gallium: add hw atomic buffer binding API.
This API binds atomic buffers for all bound shaders (as per the
GL semantics).
This is needed to support cross shader hw atomic counters.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 1 Nov 2017 04:05:19 +0000 (14:05 +1000)]
gallium/tgsi: start adding hw atomics (v3.2)
This adds support for a hw atomic counters to TGSI.
A new register file for storing atomic counters is added,
along with a new atomic counter semantic, along with docs
for both.
v2: drop semantic, move hw counter to backend,
Ilia pointed out SSO would have busted my plan, and he
was right.
v3: drop BUFFER decls. (Marek)
v3.1: minor fixups for whitespace, set ureg error
if we overflow the hw atomic limits. (nha)
v3.2: fix some docs inconsistencies (Ilia)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 8 Aug 2017 03:13:03 +0000 (13:13 +1000)]
gallium: add CAPs to support HW atomic counters. (v3)
This looks like an evergreen specific feature, but with atomic
counters AMD have hw specific counters they use instead of operating
on buffers directly. These are separate to the buffer atomics,
so require different limits and code paths.
I've left the CAP for atomic type extensible in case someone
else has a variant on this sort of thing (freedreno maybe?)
and needs to change it.
This adds all the CAPs required to add support for those atomic
counters, along with a related CAP for limiting the number of
output resources.
I'd like to land this and the st patch then I can start to
upstream the evergreen support for these and other GL4.x features.
v2: drop the ATOMIC_COUNTER_MODE cap, just use the return
from the HW counters. If 0 we use the current mode.
v3: fix some rebase errors (Gert Wollny)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 9 Nov 2017 05:48:18 +0000 (15:48 +1000)]
r600/query: drop rest of vi workaround code.
This isn't needed in r600 anymore.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Roland Scheidegger [Tue, 7 Nov 2017 00:43:51 +0000 (01:43 +0100)]
docs: Fix GL_MESA_program_debug enums
13b303ff9265b89bdd9100e32f905e9cdadfad81 added the actual enums but
didn't remove the already existing XXXX ones. (And also duplicated
the "fragment" names instead of using the "vertex" names.)
Fixes: 13b303ff9265b89bdd91 "docs: Update the list of used MESA GL enums."
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Brian Paul [Thu, 9 Nov 2017 17:48:42 +0000 (10:48 -0700)]
st/mesa: remove 'struct' keyword on function parameter
st_src_reg is a class, not a struct. Simply remove 'struct' to silence
a MSVC compiler warning (class vs. struct mismatch).
Reviewed-by; Charmaine Lee <charmainel@vmware.com>
Brian Paul [Thu, 9 Nov 2017 16:43:37 +0000 (09:43 -0700)]
threads: fix MinGW build breakage
Fixes: f1a364878431c8 ("threads: update for late C11 changes")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Thu, 9 Nov 2017 05:18:40 +0000 (22:18 -0700)]
mesa: s/GLint/gl_buffer_index/ for _ColorDrawBufferIndexes
Also fix local variable declarations and replace -1 with BUFFER_NONE.
No Piglit changes.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Thu, 9 Nov 2017 05:09:59 +0000 (22:09 -0700)]
mesa: s/GLint/gl_buffer_index/ for _ColorReadBufferIndex
BUFFER_NONE is -1 so no reason for GLint.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Thu, 9 Nov 2017 05:07:08 +0000 (22:07 -0700)]
mesa: minor reformatting, add const to gl_external_samplers()
This function should probably be moved elsewhere, too.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Brian Paul [Mon, 6 Nov 2017 16:28:02 +0000 (09:28 -0700)]
st/mesa: whitespace clean-up in st_mesa_to_tgsi.c
Remove trailing whitespace, fix indentation, wrap lines to 78 columns, etc.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Dylan Baker [Mon, 30 Oct 2017 17:17:22 +0000 (10:17 -0700)]
meson: implement default driver arguments
This allows drivers to be set by OS/arch in a sane manner.
v2: - set _drivers to a list of drivers instead of manually assigning
each with_*
v3: - Use "auto" instead of "default", which matches the value of other
automatically configured options.
- Set vulkan drivers as well
- Add error message if no automatic drivers are known for a given
arch/OS combo
- use not(darwin or windows) instead of (linux or *bsd), which is
probably more accurate (that way Solaris and other *nix systems
aren't excluded)
- rename softpipe to swrast, as swrast is the actual option name
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Kenneth Graunke [Wed, 8 Nov 2017 18:56:00 +0000 (10:56 -0800)]
i965: Pretend there are 4 subslices for compute shader threads on Gen9+.
Similar to what we did for pixel shader threads - see gen_device_info.c.
We don't want to bump the actual Maximum Number of Threads though, so
we adjust it here. For pixel shaders, we don't use max_wm_threads, so
we could just bump it globally.
Supposedly fixes Piglit tests:
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-i64vec3-int64_t
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-i64vec4-int64_t
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-u64vec4-uint64_t
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Dylan Baker [Wed, 1 Nov 2017 18:54:10 +0000 (11:54 -0700)]
meson: Add script to use VERSION file for getting version
Meson has up until this point set it's version in the root meson.build
script, while the other build systems read the VERSION file. This is
just "one more thing" to duplicate between meson and every other build
system. This script is a simple "read, strip, print" sort of deal to
allow meson to read the VERSION file.
I chose to implement this in python since python is portable, and to
keep the meson.build script clean. This is also complicated by the fact
that the project() call *must* be the first non-comment,non-blank in the
toplevel meson.build script.
v2: - Move from scripts/ to bin/
- use python explicitly to run the scripts to support windows
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Boris Brezillon [Tue, 26 Sep 2017 11:37:40 +0000 (13:37 +0200)]
broadcom/vc4: Mark BOs as purgeable when they enter the BO cache
This patch makes use of the DRM_IOCTL_VC4_GEM_MADVISE ioctl to mark all
BOs placed in the mesa BO cache as purgeable so that the system can
reclaim this memory under memory pressure.
v2:
- Removed BOs from the cache when they've been purged by the kernel
- Check whether the madvise ioctl is supported or not before using it
v3: Don't walk the whole list when we find a busy BO (by anholt, acked by
Boris)
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Boris Brezillon [Tue, 26 Sep 2017 07:35:53 +0000 (09:35 +0200)]
drm-uapi: Update vc4 header from drm-next
Taken from drm-next
d65d31388a23 ("Merge tag
'drm-misc-next-fixes-2017-11-07' of
git://anongit.freedesktop.org/drm/drm-misc into drm-next")
v2: Add the NOTSUPP definition from the final drm-next version, not the
commit (anholt).
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Eric Anholt [Wed, 8 Nov 2017 22:07:38 +0000 (14:07 -0800)]
meson: Enable VC4's NEON assembly support.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
Eric Anholt [Wed, 8 Nov 2017 22:00:51 +0000 (14:00 -0800)]
meson: Always link libgallium_dri.so against dep_thread.
Somehow on my cross build the -pthread is getting lost. All the other
deps seem to work out fine.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
Eric Anholt [Wed, 8 Nov 2017 19:54:24 +0000 (11:54 -0800)]
meson: Drop stale comment about making valgrind conditional.
It was fixed in
5c2ff5773a707519f6a773126f201c4e1e8a42d7.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
Eric Anholt [Wed, 8 Nov 2017 19:52:09 +0000 (11:52 -0800)]
meson: Leave dep_llvm empty if !with_llvm
The gallium auxiliary build would link against llvm, for the gallivm code
that it didn't build. This broke the build on my armhf cross, where
libLLVM-3.9.so is not multiarch and thus points to x86-64 libs.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
Adam Jackson [Thu, 9 Nov 2017 16:41:14 +0000 (11:41 -0500)]
Revert "glx: Implement GLX_EXT_no_config_context (v2)"
Pushed ahead of things actually working.
This reverts commit
5293b96b160b904c0e53cbce93679c3aa090f846.
Marek Olšák [Tue, 7 Nov 2017 18:00:20 +0000 (19:00 +0100)]
radeonsi: pack r600_surface better
160 -> 136 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 7 Nov 2017 17:53:37 +0000 (18:53 +0100)]
radeonsi: pack r600_texture better
1752 -> 1736 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 7 Nov 2017 17:53:59 +0000 (18:53 +0100)]
radeonsi: clean up r600_surface
216 -> 160 bytes
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 7 Nov 2017 17:42:53 +0000 (18:42 +0100)]
radeonsi: remove r600_texture::non_disp_tiling
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 7 Nov 2017 17:42:23 +0000 (18:42 +0100)]
radeonsi: remove DBG_NO_DISCARD_RANGE
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Adam Jackson [Tue, 7 Nov 2017 16:36:53 +0000 (11:36 -0500)]
glx: Implement GLX_EXT_no_config_context (v2)
This more or less ports EGL_KHR_no_config_context to GLX.
v2: Enable the extension only for those backends that support it.
Khronos: https://github.com/KhronosGroup/OpenGL-Registry/pull/102
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Adam Jackson [Tue, 7 Nov 2017 16:36:52 +0000 (11:36 -0500)]
glx: Prepare the DRI backends for GLX_EXT_no_config_context
This should be safe as these backends already support the EGL version of
this extension. DRI1 is not affected because it does not support
GLX_ARB_create_context anyway. DRI-Windows is not prepared to implement
this as there's no equivalent WGL extension, and wglCreateContextAttribs
seems to really want the HDC's pixel format to be set.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Adam Jackson [Tue, 7 Nov 2017 16:36:51 +0000 (11:36 -0500)]
glx: Relax validate_renderType_against_config for EXT_no_config_context
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Nicolai Hähnle [Thu, 9 Nov 2017 13:49:19 +0000 (14:49 +0100)]
anv: fix build failure
Fixes: e3a8013de8ca ("util/u_queue: add util_queue_fence_wait_timeout")
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:04 +0000 (17:39 +0200)]
mesa: flush and wait after creating a fallback texture
Fixes non-deterministic failures in
dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_sync.images.texture_source.teximage2d_render
and others in dEQP-EGL.functional.sharing.gles2.multithread.*
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:03 +0000 (17:39 +0200)]
mesa: increase MaxServerWaitTimeout
The current value was introduced in commit
a27180d0d8666, which claims
that it represents ~1.11 years. However, it is interpreted in nanoseconds,
so it actually only represents ~9.8 hours. That seems a bit short.
Use the largest value consistent with both int32 and int64. It
corresponds to ~292 years in nanoseconds.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:02 +0000 (17:39 +0200)]
st/mesa: remove redundant flushes from st_flush
st_flush should flush state tracker-internal state and the pipe, but
not mesa/main state. Of the four callers:
- glFlush/glFinish already call FLUSH_{VERTICES,STATE}.
- st_vdpau doesn't need to call them.
- st_manager will now call them explicitly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:37 +0000 (17:38 +0200)]
st/dri: use stapi flush instead of pipe flush when creating fences
There may be pending operations (e.g. vertices) that need to be flushed
by the state tracker.
Found by inspection.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:05 +0000 (17:39 +0200)]
radeonsi: use a threaded context even for debug contexts
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:05 +0000 (17:39 +0200)]
radeonsi: record and dump time of flush
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:01 +0000 (17:39 +0200)]
ddebug: optionally handle transfer commands like draws
Transfer commands can have associated GPU operations.
Enabled by passing GALLIUM_DDEBUG=transfers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:01 +0000 (17:39 +0200)]
ddebug: dump context and before/after times of draws
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:00 +0000 (17:39 +0200)]
ddebug: generalize print_named_xxx via a PRINT_NAMED macro
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:59 +0000 (17:38 +0200)]
ddebug: rewrite to always use a threaded approach
This patch has multiple goals:
1. Off-load the writing of records in 'always' mode to another thread
for performance.
2. Allow using ddebug with threaded contexts. This really forces us to
move some of the "after_draw" handling into another thread.
3. Simplify the different modes of ddebug, both in the code and in
the user interface, i.e. GALLIUM_DDEBUG. In particular, there's
no 'pipelined' anymore, since we're always pipelined; and 'noflush'
is replaced by 'flush', since we no longer flush by default.
4. Fix the fences in pipelining mode. They previously relied on writes
via pipe_context::clear_buffer. However, on radeonsi, those could
(quite reasonably) end up in the SDMA buffer. So we use the newly
added PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE fences instead.
5. Improve pipelined mode overall, using the finer grained information
provided by the new fences.
Overall, the result is that pipelined mode should be more useful, and
using ddebug in default mode is much less invasive, in the sense that
it changes the overall driver behavior less (which is kind of crucial
for a driver debugging tool).
An example of the new hang debug output:
Gallium debugger active.
Hang detection timeout is 1000ms.
GPU hang detected, collecting information...
Draw # driver prev BOP TOP BOP dump file
-------------------------------------------------------------
2 YES YES YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000000
3 YES NO YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000001
4 YES NO YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000002
5 YES NO YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000003
Done.
We can see that there were almost certainly 4 draws in flight when
the hang happened: the top-of-pipe fence was signaled for all 4 draws,
the bottom-of-pipe fence for none of them. In virtually all cases,
we'd expect the first draw in the list to be at fault, but due to the
GPU parallelism, it's possible (though highly unlikely) that one of
the later draws causes a component to get stuck in a way that prevents
the earlier draws from making progress as well.
(In the above example, there were actually only 3 draws truly in flight:
the last draw is a blit that waits for the earlier draws; however, its
top-of-pipe fence is emitted before the cache flush and wait, and so
the fact that the draw hasn't truly started yet can only be seen from a
closer inspection of GPU state.)
Acked-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:58 +0000 (17:38 +0200)]
ddebug: use an atomic increment when numbering files
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:58 +0000 (17:38 +0200)]
dd/util: extract dd_get_debug_filename_and_mkdir
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:57 +0000 (17:38 +0200)]
gallium/u_dump: add and use util_dump_transfer_usage
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:56 +0000 (17:38 +0200)]
gallium/u_dump: add util_dump_ns
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:55 +0000 (17:38 +0200)]
gallium/u_dump: export util_dump_ptr
Change format to %p while we're at it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:54 +0000 (17:38 +0200)]
radeonsi: implement PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE
v2: use uncached system memory for the fence, and use the CPU to
clear it so we never read garbage when checking the fence
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:53 +0000 (17:38 +0200)]
radeonsi: document some subtle details of fence_finish & fence_server_sync
v2: remove the change to si_fence_server_sync, we'll handle that more
robustly
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:53 +0000 (17:38 +0200)]
gallium: add pipe_context::callback
For running post-draw operations inside the driver thread. ddebug will
use it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:52 +0000 (17:38 +0200)]
gallium/u_threaded: implement pipe_context::set_log_context
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:51 +0000 (17:38 +0200)]
gallium/u_threaded: avoid syncs for get_query_result
Queries should still get marked as flushed when flushes are executed
asynchronously in the driver thread.
To this end, the management of the unflushed_queries list is moved into
the driver thread.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:50 +0000 (17:38 +0200)]
gallium/u_threaded: implement asynchronous flushes
This requires out-of-band creation of fences, and will be signaled to
the pipe_context::flush implementation by a special TC_FLUSH_ASYNC flag.
v2:
- remove an incorrect assertion
- handle fence_server_sync for unsubmitted fences by
relying on the improved cs_add_fence_dependency
- only implement asynchronous flushes on amdgpu
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:50 +0000 (17:38 +0200)]
gallium/u_threaded: mark queries flushed only for non-deferred flushes
The driver uses (and must use) the flushed flag of queries as a hint that
it does not have to check for synchronization with currently queued up
commands. Deferred flushes do not actually flush queued up commands, so
we must not set the flushed flag for them.
Found by inspection.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:49 +0000 (17:38 +0200)]
radeonsi: move fence functions to si_fence.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 9 Nov 2017 13:00:22 +0000 (14:00 +0100)]
winsys/amdgpu: handle cs_add_fence_dependency for deferred/unsubmitted fences
The idea is to fix the following interleaving of operations
that can arise from deferred fences:
Thread 1 / Context 1 Thread 2 / Context 2
-------------------- --------------------
f = deferred flush
<------- application-side synchronization ------->
fence_server_sync(f)
...
flush()
flush()
We will now stall in fence_server_sync until the flush of context 1
has completed.
This scenario was unlikely to occur previously, because applications
seem to be doing
Thread 1 / Context 1 Thread 2 / Context 2
-------------------- --------------------
f = glFenceSync()
glFlush()
<------- application-side synchronization ------->
glWaitSync(f)
... and indeed they probably *have* to use this ordering to avoid
deadlocks in the GLX model, where all GL operations conceptually
go through a single connection to the X server. However, it's less
clear whether applications have to do this with other WSI (i.e. EGL).
Besides, even this sequence of GL commands can be translated into
the Gallium-level sequence outlined above when Gallium threading
and asynchronous flushes are used. So it makes sense to be more
robust.
As a side effect, we no longer busy-wait on submission_in_progress.
We won't enable asynchronous flushes on radeon, but add a
cs_add_fence_dependency stub anyway to document the potential
issue.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:48 +0000 (17:38 +0200)]
gallium: add PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE bits
These bits are intended to be used by the ddebug hang detection and are
named in analogy to the Vulkan stage bits (and the corresponding Radeon
pipeline event).
Hang detection needs fences on the granularity of individual commands,
which nothing else really covers. The closest alternative would have
been PIPE_QUERY_GPU_FINISHED, but (a) queries are a per-context object
and we really want a per-screen object, (b) queries don't offer a
wait with timeout, and (c) in any case, PIPE_QUERY_GPU_FINISHED is
meant to imply that GPU caches are flushed, which the new bits
explicitly aren't.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:47 +0000 (17:38 +0200)]
gallium: add PIPE_FLUSH_ASYNC and PIPE_FLUSH_HINT_FINISH
Also document some subtleties of pipe_context::flush.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:46 +0000 (17:38 +0200)]
util/u_queue: add util_queue_fence_wait_timeout
v2:
- style fixes
- fix missing timeout handling in futex path
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:45 +0000 (17:38 +0200)]
threads: update for late C11 changes
C11 threads were changed to use struct timespec instead of xtime, and
thrd_sleep got a second argument.
See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1554.htm and
http://en.cppreference.com/w/c/thread/{thrd_sleep,cnd_timedwait,mtx_timedlock}
Note that cnd_timedwait is spec'd to be relative to TIME_UTC / CLOCK_REALTIME.
v2: Fix Windows build errors. Tested with a default Appveyor config
that uses Visual Studio 2013. Judging from Brian's email and
random internet sources, Visual Studio 2015 does have timespec
and timespec_get, hence the _MSC_VER-based guard which I have
not tested.
Cc: Jose Fonseca <jfonseca@vmware.com>
Cc: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:44 +0000 (17:38 +0200)]
gallium: remove unused and deprecated u_time.h
Cc: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:44 +0000 (17:38 +0200)]
util: move os_time.[ch] to src/util
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:43 +0000 (17:38 +0200)]
radeonsi: always use async compiles when creating shader/compute states
With Gallium threaded contexts, creating shader/compute states is
effectively a screen operation, so we should not use context state.
In particular, this allows us to avoid using the context's LLVM
TargetMachine.
This isn't an issue yet because u_threaded_context filters out non-async
debug callbacks, and we disable threaded contexts for debug contexts.
However, we may want to change that in the future.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:42 +0000 (17:38 +0200)]
radeonsi: fix potential use-after-free of debug callbacks
Found by inspection.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:39 +0000 (17:38 +0200)]
radeonsi: move pipe debug callback to si_context
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:41 +0000 (17:38 +0200)]
u_queue: add util_queue_finish for waiting for previously added jobs
Schedule one job for every thread, and wait on a barrier inside the job
execution function.
v2: avoid alloca (fixes Windows build error)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:40 +0000 (17:38 +0200)]
util: move pipe_barrier into src/util and rename to util_barrier
The #if guard is probably not 100% equivalent to the previous PIPE_OS
check, but if anything it should be an over-approximation (are there
pthread implementations without barriers?), so people will get either
a good implementation or compile errors that are easy to fix.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:38 +0000 (17:38 +0200)]
gallium: add async debug message forwarding helper
v2: use util_vasprintf for Windows portability
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:36 +0000 (17:38 +0200)]
st/mesa: guard sampler views changes with a mutex
Some locking is unfortunately required, because well-formed GL programs
can have multiple threads racing to access the same texture, e.g.: two
threads/contexts rendering from the same texture, or one thread destroying
a context while the other is rendering from or modifying a texture.
Since even the simple mutex caused noticable slowdowns in the piglit
drawoverhead micro-benchmark, this patch uses a slightly more involved
approach to keep locks out of the fast path:
- the initial lookup of sampler views happens without taking a lock
- a per-texture lock is only taken when we have to modify the sampler
view(s)
- since each thread mostly operates only on the entry corresponding to
its context, the main issue is re-allocation of the sampler view array
when it needs to be grown, but the old copy is not freed
Old copies of the sampler views array are kept around in a linked list
until the entire texture object is deleted. The total memory wasted
in this way is roughly equal to the size of the current sampler views
array.
Fixes non-deterministic memory corruption in some
dEQP-EGL.functional.sharing.gles2.multithread.* tests, e.g.
dEQP-EGL.functional.sharing.gles2.multithread.simple.images.texture_source.create_texture_render
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:34 +0000 (17:38 +0200)]
st/mesa: re-arrange st_finalize_texture
Move the early-out for surface-based textures earlier. This narrows the
scope of the locking added in a follow-up commit.
Fix one remaining case of initializing a surface-based texture
without properly finalizing it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:33 +0000 (17:38 +0200)]
gallium: clarify the constraints on sampler_view_destroy
r600 expects the context that created the sampler view to still be alive
(there is a per-context list of sampler views).
svga currently bails when the context of destruction is not the same as
creation.
The GL state tracker, which is the only one that runs into the
multi-context subtleties (due to share groups), already guarantees that
sampler views are destroyed before their context of creation is destroyed.
Most drivers are context-agnostic, so the warning message in
pipe_sampler_view_release doesn't really make sense.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:32 +0000 (17:38 +0200)]
radeonsi: reduce the scope of sel->mutex in si_shader_select_with_key
We only need the lock to guard changes in the variant linked list. The
actual compilation can happen outside the lock, since we use the ready
fence as a guard.
v2: fix double-unlock
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:32 +0000 (17:38 +0200)]
radeonsi: use ready fences on all shaders, not just optimized ones
There's a race condition between si_shader_select_with_key and
si_bind_XX_shader:
Thread 1 Thread 2
-------- --------
si_shader_select_with_key
begin compiling the first
variant
(guarded by sel->mutex)
si_bind_XX_shader
select first_variant by default
as state->current
si_shader_select_with_key
match state->current and early-out
Since thread 2 never takes sel->mutex, it may go on rendering without a
PM4 for that shader, for example.
The solution taken by this patch is to broaden the scope of
shader->optimized_ready to a fence shader->ready that applies to
all shaders. This does not hurt the fast path (if anything it makes
it faster, because we don't explicitly check is_optimized).
It will also allow reducing the scope of sel->mutex locks, but this is
deferred to a later commit for better bisectability.
Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:31 +0000 (17:38 +0200)]
u_queue: add a futex-based implementation of fences
Fences are now 4 bytes instead of 96 bytes (on my 64-bit system).
Signaling a fence is a single atomic operation in the fast case plus a
syscall in the slow case.
Testing if a fence is signaled is the same as before (a simple comparison),
but waiting on a fence is now no more expensive than just testing it in
the fast (already signaled) case.
v2:
- style fixes
- use p_atomic_xxx macros with the right barriers
Acked-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:30 +0000 (17:38 +0200)]
u_queue: add util_queue_fence_reset
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:29 +0000 (17:38 +0200)]
u_queue: export util_queue_fence_signal
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:29 +0000 (17:38 +0200)]
u_queue: group fence functions together
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 3 Nov 2017 14:19:57 +0000 (15:19 +0100)]
util/u_atomic: add p_atomic_xchg
The closest to it in the old-style gcc builtins is __sync_lock_test_and_set,
however, that is only guaranteed to work with values 0 and 1 and only
provides an acquire barrier. I also don't know about other OSes, so we
provide a simple & stupid emulation via p_atomic_cmpxchg.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:26 +0000 (17:38 +0200)]
util: move futex helpers into futex.h
v2: style fixes
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Kenneth Graunke [Tue, 7 Nov 2017 08:57:52 +0000 (00:57 -0800)]
glsl: Make #pragma STDGL invariant(all) only modify outputs.
According to the GLSL ES 3.20, GLSL 4.50, and GLSL 1.20 specs:
"To force all output variables to be invariant, use the pragma
#pragma STDGL invariant(all)
before all declarations in a shader."
Notably, this is only supposed to affect output variables. Furthermore,
"Only variables output from a shader can be candidates for invariance."
It looks like this has been wrong since we first supported the pragma in
2011 (commit
86b4398cd158024f6be9fa830554a11c2a7ebe0c).
Fixes dEQP-GLES2.functional.shaders.preprocessor.pragmas.pragma_fragment.
v2: Now that all cases are identical (other than compute shaders, which
have no output variables anyway), we can drop the switch statement
entirely. We also don't need the current_function == NULL check;
this was a hold over from when we had a single var_mode_out for both
function parameters and shader varyings, in the bad old days.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tapani Pälli [Tue, 31 Oct 2017 08:56:28 +0000 (10:56 +0200)]
i965: expose SRGB visuals and turn on EGL_KHR_gl_colorspace
Patch exposes sRGB visuals and adds DRI integer query support for
__DRI2_RENDERER_HAS_FRAMEBUFFER_SRGB. Further changes make sure that
we mark if the app explicitly wanted sRGB and for these framebuffers
we don't turn sRGB off in intel_gles3_srgb_workaround. This way we
keep compatibility for existing applications relying on default sRGB
and ony add more visual support.
With this change, following dEQP tests start to pass:
dEQP-EGL.functional.wide_color.window_8888_colorspace_srgb
dEQP-EGL.functional.wide_color.pbuffer_8888_colorspace_srgb
v2: some code cleanup (Emil Velikov)
update num_formats correctly (reported by deveee@gmail.com)
v3: cleanup, remove redundant is_srgb
rename explicit_srgb as 'need_srgb' to follow style better
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v2)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102264
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102354
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102503
Neil Roberts [Mon, 30 Oct 2017 12:22:49 +0000 (13:22 +0100)]
glsl: Transform fb buffers are only active if a variable uses them
The GL spec will soon be revised to clarify that a buffer binding for
a transform feedback buffer is only required if a variable is actually
defined to use the buffer binding point. Previously a declaration for
the default transform buffer would make it require a binding even if
nothing was declared to use the default buffer.
Affects:
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list_and_api
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Sat, 28 Oct 2017 16:02:14 +0000 (09:02 -0700)]
intel/nir: Use the correct indirect lowering masks in link_shaders
Previously, if we were linking a vec4 VS with a SIMD8/16 FS, we wouldn't
lower indirects on the fragment shader which is wrong. Instead of using
a single indirect mask, take advantage of our new little helper.
Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
Ilia Mirkin [Sat, 4 Nov 2017 17:49:45 +0000 (13:49 -0400)]
r600g: use SIMPLE_FLOAT for blending to enable some optimizations
Radeonsi also sets this flag. Seems to avoid pulling up the desintation
RT value when the dst blend factor is zero if it's not otherwise being
loaded. Among other things, it allows blending to overwrite infinity/NaN
values in the destination RT.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Ilia Mirkin [Sat, 4 Nov 2017 17:49:04 +0000 (13:49 -0400)]
nv50: make blending work so that zero wins in a multiplication
This matches nvc0 behavior, tested with the fbo-float-nan piglit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann<tobias.johannes.klausmann@mni.thm.de>
Ian Romanick [Thu, 2 Nov 2017 00:34:12 +0000 (17:34 -0700)]
glsl: Minor cleanups after previous commit
I think it's more clear to only call emit_access once. The only
difference between the two calls is the value of size_mul used for the
offset parameter... but you really have to look at it to be sure.
The s/is_64bit/is_double/ change is because there are no int64_t or
uint64_t matrix types.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Wed, 1 Nov 2017 23:40:32 +0000 (16:40 -0700)]
glsl: Use more link_calculate_matrix_stride in lower_buffer_access
I was going to squash this with the previous commit, but there's a lot
of churn in that commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Wed, 1 Nov 2017 23:34:32 +0000 (16:34 -0700)]
glsl: Use link_calculate_matrix_stride in lower_buffer_access and friends
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Wed, 1 Nov 2017 23:27:53 +0000 (16:27 -0700)]
glsl: Refactor matrix stride calculation into a utility function
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Wed, 1 Nov 2017 05:18:10 +0000 (22:18 -0700)]
glsl/linker: Optimize swizzles again after linking
Without this, the SPIR-V generator has to deal with a bunch of junk
like:
(swiz z (swiz xxx (swiz x (var_ref packed:binormal.z,light_dir))))
It seems better to cull that stuff out than to add code to deal with
it. The problem is the way swizzles to and from scalars have to be
handled in SPIR-V.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Ian Romanick [Wed, 1 Nov 2017 06:37:14 +0000 (23:37 -0700)]
glsl: Combine nop-swizzle optimization with swizzle-swizzle optimization
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: <thomashelland90@gmail.com>
Ian Romanick [Wed, 1 Nov 2017 06:16:38 +0000 (23:16 -0700)]
glsl: Make the swizzle-swizzle optimization greedy
If there is a long sequence of swizzled swizzles, compact all of them
down to a single swizzle.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: <thomashelland90@gmail.com>
Ian Romanick [Tue, 31 Oct 2017 22:06:24 +0000 (15:06 -0700)]
glsl: Remove program_resource_visitor::visit_field(const glsl_struct_field *)
I could not find any remaining users.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Ian Romanick [Wed, 1 Nov 2017 23:25:37 +0000 (16:25 -0700)]
glsl: Silence unused parameter warning
glsl/lower_shared_reference.cpp: In member function ‘virtual void
{anonymous}::lower_shared_reference_visitor::insert_buffer_access(void*,
ir_dereference*, const glsl_type*, ir_rvalue*, unsigned int, int)’:
glsl/lower_shared_reference.cpp:244:58: warning: unused parameter
‘channel’ [-Wunused-parameter]
int channel)
^~~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Dave Airlie [Thu, 9 Nov 2017 01:00:50 +0000 (01:00 +0000)]
ac/nir: add support for all intrinsics. (v2)
This is derived from tgsi/radeonsi code from the GLSL intrinsics.
This should pre-fix radv for the upcoming spirv patches.
v2: actually use wait_cnt, sleep deprived dad time! (Bas)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>