mesa.git
7 years agonir: handle get_buffer_size in nir_lower_atomics_to_ssbo
Rob Clark [Sun, 5 Nov 2017 18:53:50 +0000 (13:53 -0500)]
nir: handle get_buffer_size in nir_lower_atomics_to_ssbo

Overlooked initially, be we need to remap the SSBO index for this as
well.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoanv/meson: Generate dev_icd.json
Chad Versace [Wed, 1 Nov 2017 20:47:55 +0000 (13:47 -0700)]
anv/meson: Generate dev_icd.json

I tested this in a setup where the builddir was outside of the srcdir.

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
7 years agoanv: Fix architecture in intel_icd.{arch}.json
Chad Versace [Thu, 9 Nov 2017 23:02:13 +0000 (15:02 -0800)]
anv: Fix architecture in intel_icd.{arch}.json

Use the host arch, not the target arch. In Meson and in recent
Autotools, the host arch is where the binary will be used. The target
arch is useful only when compiling a compiler.

See: http://mesonbuild.com/Cross-compilation.html
See: https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation.html
Reported-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
7 years agoradv: Fix architecture in radeon_icd.{arch}.json
Chad Versace [Thu, 9 Nov 2017 22:55:37 +0000 (14:55 -0800)]
radv: Fix architecture in radeon_icd.{arch}.json

Use the host arch, not the target arch. In Meson and in recent
Autotools, the host arch is where the binary will be used. The target
arch is useful only when compiling a compiler.

See: http://mesonbuild.com/Cross-compilation.html
See: https://www.gnu.org/software/automake/manual/html_node/Cross_002dCompilation.html
Reported-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoanv: Refactor anv_GetImageSubresourceLayout()
Chad Versace [Tue, 7 Nov 2017 03:35:43 +0000 (19:35 -0800)]
anv: Refactor anv_GetImageSubresourceLayout()

Its helper function, anv_surface_get_subresource_layout(), was not very
helpful. So fold it into the main function.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv/image: Refactor choice of isl_tiling_flags_t
Chad Versace [Tue, 7 Nov 2017 02:51:53 +0000 (18:51 -0800)]
anv/image: Refactor choice of isl_tiling_flags_t

Instead of choosing the tiling flags inside make_surface(), which is
called once per aspect in a loop, and which chooses the same tiling for
each aspect, choose the tiling flags exactly once before entering the
aspect loop.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Refactor anv_get_format_plane() - explicit unsupported
Chad Versace [Fri, 3 Nov 2017 23:42:16 +0000 (16:42 -0700)]
anv: Refactor anv_get_format_plane() - explicit unsupported

The same local variable, 'plane_format', was returned on success *and*
failure. Be more explicit in distinguishing the two cases: return
'plane_format' on success and return 'unsupported' on failure.

This simplifies the diff in upcoming patches for
VK_EXT_image_drm_format_modifier.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Remove anv_physical_device_get_format_properties()
Chad Versace [Fri, 3 Nov 2017 23:35:12 +0000 (16:35 -0700)]
anv: Remove anv_physical_device_get_format_properties()

Fold its body into its sole caller,
anv_GetPhysicalDeviceFormatProperties().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Simplify anv_physical_device_get_format_properties()
Chad Versace [Fri, 3 Nov 2017 19:44:34 +0000 (12:44 -0700)]
anv: Simplify anv_physical_device_get_format_properties()

Now that get_image_format_properties() returns the correct
VkFormatFeatureFlags, we can remove the unneeded if-branch and some
local variables.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Simplify anv_get_image_format_properties()
Chad Versace [Fri, 3 Nov 2017 23:26:51 +0000 (16:26 -0700)]
anv: Simplify anv_get_image_format_properties()

Now that get_image_format_features() has a VkImageTiling parameter, we
can bypass anv_physical_device_get_format_properties() and call
get_image_format_features() directly.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Rename get_image_format_properties()
Chad Versace [Fri, 3 Nov 2017 21:50:00 +0000 (14:50 -0700)]
anv: Rename get_image_format_properties()

The name is misleading. It looks like vkGetPhysicalDeviceImageFormatProperties(),
but it actually implement vkGetPhysicalDeviceFormatProperties. Let's
rename it to what it actually does, get_image_format_features(), because it
returns VkFormatFeatureFlags.

For consistency, also rename get_buffer_format_properties() to
get_buffer_format_features().

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Fix get_image_format_properties() - YCbCr
Chad Versace [Fri, 3 Nov 2017 19:32:00 +0000 (12:32 -0700)]
anv: Fix get_image_format_properties() - YCbCr

Teach it to calculate the format features for YCbCr.

The goal (which is completed in this patch) is to incrementally fix
get_image_format_properties() to return a correct result.  Previously,
it returned incorrect VkFormatFeatureFlags which the caller needed clean
up.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Fix get_image_format_properties() - 3-channel formats
Chad Versace [Fri, 3 Nov 2017 19:25:51 +0000 (12:25 -0700)]
anv: Fix get_image_format_properties() - 3-channel formats

Teach it to calculate the format features for 3-channel formats.

The goal is to incrementally fix get_image_format_properties() to return
a correct result.  Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Refactor get_image_format_properties() - Reduce params
Chad Versace [Fri, 3 Nov 2017 19:20:21 +0000 (12:20 -0700)]
anv: Refactor get_image_format_properties() - Reduce params

Replace parameters 'enum isl_format' and 'struct anv_format_plane' with
new parameter 'const struct anv_format *'.

The goal is to incrementally fix get_image_format_properties() to return
a correct result.  Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Refactor get_image_format_properties() - base_isl_format
Chad Versace [Fri, 3 Nov 2017 19:19:36 +0000 (12:19 -0700)]
anv: Refactor get_image_format_properties() - base_isl_format

Rename parameter 'base' to 'base_isl_format'.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Refactor get_image_format_properties() - plane_format
Chad Versace [Fri, 3 Nov 2017 01:28:02 +0000 (18:28 -0700)]
anv: Refactor get_image_format_properties() - plane_format

Rename parameter 'format' to 'plane_format'.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Refactor get_image_format_properties() - ASTC
Chad Versace [Fri, 3 Nov 2017 01:26:23 +0000 (18:26 -0700)]
anv: Refactor get_image_format_properties() - ASTC

Teach it to calculate the format features for ASTC.

The goal is to incrementally fix get_image_format_properties() to return
a correct result.  Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.

v2: New commit message

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Refactor get_image_format_properties() - depthstencil (v2)
Chad Versace [Fri, 3 Nov 2017 01:24:08 +0000 (18:24 -0700)]
anv: Refactor get_image_format_properties() - depthstencil (v2)

Teach it to calculate the features of depthstencil formats.

The goal is to incrementally fix get_image_format_properties() to return
a correct result.  Currently, it returns incorrect VkFormatFeatureFlags
which the caller must clean up.

v2: New commit message

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Better types for 'aspect' function params
Chad Versace [Fri, 3 Nov 2017 00:26:51 +0000 (17:26 -0700)]
anv: Better types for 'aspect' function params

Some functions have a comment that says "Exactly one bit must be in
'aspect'". So change the type of their 'aspect' parameter from
VkImageAspectFlags to VkImageAspectFlagBits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Refactor get_buffer_format_properties()
Chad Versace [Thu, 2 Nov 2017 23:55:55 +0000 (16:55 -0700)]
anv: Refactor get_buffer_format_properties()

Make it a stand-alone function. Pre-patch, for some formats the function
returned incorrect VkFormatFeatureFlags which were cleaned up by the
caller.

This prepares for a cleaner implementation of
VK_EXT_image_drm_format_modifier.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agobroadcom/vc4: Fix simulator mode for the MADVISE usage.
Eric Anholt [Thu, 9 Nov 2017 23:51:56 +0000 (15:51 -0800)]
broadcom/vc4: Fix simulator mode for the MADVISE usage.

7 years agomesa: enable ARB_texture_buffer_* extensions in the Compatibility profile
Marek Olšák [Thu, 19 Oct 2017 20:22:15 +0000 (22:22 +0200)]
mesa: enable ARB_texture_buffer_* extensions in the Compatibility profile

We already have piglit tests testing alpha, luminance, and intensity
formats. They were skipped by piglit until now.

Additionally, I'm enabling one ARB_texture_buffer_range piglit test to run
with the compat profile.

i965 behavior is unchanged except that it doesn't expose TBOs in the Compat
profile. Not sure how that affects the GL version override.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
7 years agodocs: update r600 atomic counter status.
Dave Airlie [Thu, 2 Nov 2017 01:04:53 +0000 (11:04 +1000)]
docs: update r600 atomic counter status.

Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agor600: add support for hw atomic counters. (v3)
Dave Airlie [Thu, 2 Nov 2017 00:26:51 +0000 (10:26 +1000)]
r600: add support for hw atomic counters. (v3)

This adds support for the evergreen/cayman atomic counters.

These are implemented using GDS append/consume counters. The values
for each counter are loaded before drawing and saved after each draw
using special CP packets.

v2: move hw atomic assignment into driver.
v3: fix messing up caps (Gert Wollny), only store ranges in driver,
drop buffers.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
7 years agost/mesa: add support for hw atomics to glsl->tgsi. (v5)
Dave Airlie [Wed, 1 Nov 2017 19:51:36 +0000 (05:51 +1000)]
st/mesa: add support for hw atomics to glsl->tgsi. (v5)

This adds support for creating the hw atomic tgsi from
the glsl codepaths.

v2: drop the atomic index and move to backend.
v3: drop buffer decls. (Marek)
v4: fix off by one (Gert)
v5: fix off by one the other way (Dave)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agost/mesa: setup hw atomic limits. (v1.1)
Dave Airlie [Wed, 1 Nov 2017 19:49:40 +0000 (05:49 +1000)]
st/mesa: setup hw atomic limits. (v1.1)

HW atomics need to use caps to set some limits, and some
other limits may also need limiting.

This fixes things up to work for evergreen hw, it may need
more changes in the future if other hw wants to use this path.

v1.1: fix indent.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agost/mesa: start adding support for hw atomics atom. (v2)
Dave Airlie [Wed, 1 Nov 2017 04:30:13 +0000 (14:30 +1000)]
st/mesa: start adding support for hw atomics atom. (v2)

This adds a new atom that calls the new driver API to
bind buffers containing hw atomics.

v2: fixup bindings for sparse buffers. (mareko/nha)
don't bind buffer atomics when hw atomics are enabled.
use NewAtomicBuffer (mareko)

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agomesa/program: add hw atomic counter file
Dave Airlie [Wed, 1 Nov 2017 04:22:04 +0000 (14:22 +1000)]
mesa/program: add hw atomic counter file

This is needed for the GLSL->TGSI translation for hw atomic counters.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agogallium: add hw atomic buffer binding API.
Dave Airlie [Wed, 1 Nov 2017 04:17:49 +0000 (14:17 +1000)]
gallium: add hw atomic buffer binding API.

This API binds atomic buffers for all bound shaders (as per the
GL semantics).

This is needed to support cross shader hw atomic counters.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agogallium/tgsi: start adding hw atomics (v3.2)
Dave Airlie [Wed, 1 Nov 2017 04:05:19 +0000 (14:05 +1000)]
gallium/tgsi: start adding hw atomics (v3.2)

This adds support for a hw atomic counters to TGSI.

A new register file for storing atomic counters is added,
along with a new atomic counter semantic, along with docs
for both.

v2: drop semantic, move hw counter to backend,
Ilia pointed out SSO would have busted my plan, and he
was right.
v3: drop BUFFER decls. (Marek)
v3.1: minor fixups for whitespace, set ureg error
if we overflow the hw atomic limits. (nha)
v3.2: fix some docs inconsistencies (Ilia)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agogallium: add CAPs to support HW atomic counters. (v3)
Dave Airlie [Tue, 8 Aug 2017 03:13:03 +0000 (13:13 +1000)]
gallium: add CAPs to support HW atomic counters. (v3)

This looks like an evergreen specific feature, but with atomic
counters AMD have hw specific counters they use instead of operating
on buffers directly. These are separate to the buffer atomics,
so require different limits and code paths.

I've left the CAP for atomic type extensible in case someone
else has a variant on this sort of thing (freedreno maybe?)
and needs to change it.

This adds all the CAPs required to add support for those atomic
counters, along with a related CAP for limiting the number of
output resources.

I'd like to land this and the st patch then I can start to
upstream the evergreen support for these and other GL4.x features.

v2: drop the ATOMIC_COUNTER_MODE cap, just use the return
from the HW counters. If 0 we use the current mode.
v3: fix some rebase errors (Gert Wollny)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agor600/query: drop rest of vi workaround code.
Dave Airlie [Thu, 9 Nov 2017 05:48:18 +0000 (15:48 +1000)]
r600/query: drop rest of vi workaround code.

This isn't needed in r600 anymore.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agodocs: Fix GL_MESA_program_debug enums
Roland Scheidegger [Tue, 7 Nov 2017 00:43:51 +0000 (01:43 +0100)]
docs: Fix GL_MESA_program_debug enums

13b303ff9265b89bdd9100e32f905e9cdadfad81 added the actual enums but
didn't remove the already existing XXXX ones. (And also duplicated
the "fragment" names instead of using the "vertex" names.)

Fixes: 13b303ff9265b89bdd91 "docs: Update the list of used MESA GL enums."
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agost/mesa: remove 'struct' keyword on function parameter
Brian Paul [Thu, 9 Nov 2017 17:48:42 +0000 (10:48 -0700)]
st/mesa: remove 'struct' keyword on function parameter

st_src_reg is a class, not a struct.  Simply remove 'struct' to silence
a MSVC compiler warning (class vs. struct mismatch).

Reviewed-by; Charmaine Lee <charmainel@vmware.com>

7 years agothreads: fix MinGW build breakage
Brian Paul [Thu, 9 Nov 2017 16:43:37 +0000 (09:43 -0700)]
threads: fix MinGW build breakage

Fixes: f1a364878431c8 ("threads: update for late C11 changes")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
7 years agomesa: s/GLint/gl_buffer_index/ for _ColorDrawBufferIndexes
Brian Paul [Thu, 9 Nov 2017 05:18:40 +0000 (22:18 -0700)]
mesa: s/GLint/gl_buffer_index/ for _ColorDrawBufferIndexes

Also fix local variable declarations and replace -1 with BUFFER_NONE.
No Piglit changes.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agomesa: s/GLint/gl_buffer_index/ for _ColorReadBufferIndex
Brian Paul [Thu, 9 Nov 2017 05:09:59 +0000 (22:09 -0700)]
mesa: s/GLint/gl_buffer_index/ for _ColorReadBufferIndex

BUFFER_NONE is -1 so no reason for GLint.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agomesa: minor reformatting, add const to gl_external_samplers()
Brian Paul [Thu, 9 Nov 2017 05:07:08 +0000 (22:07 -0700)]
mesa: minor reformatting, add const to gl_external_samplers()

This function should probably be moved elsewhere, too.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agost/mesa: whitespace clean-up in st_mesa_to_tgsi.c
Brian Paul [Mon, 6 Nov 2017 16:28:02 +0000 (09:28 -0700)]
st/mesa: whitespace clean-up in st_mesa_to_tgsi.c

Remove trailing whitespace, fix indentation, wrap lines to 78 columns, etc.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agomeson: implement default driver arguments
Dylan Baker [Mon, 30 Oct 2017 17:17:22 +0000 (10:17 -0700)]
meson: implement default driver arguments

This allows drivers to be set by OS/arch in a sane manner.

v2: - set _drivers to a list of drivers instead of manually assigning
      each with_*
v3: - Use "auto" instead of "default", which matches the value of other
      automatically configured options.
    - Set vulkan drivers as well
    - Add error message if no automatic drivers are known for a given
      arch/OS combo
    - use not(darwin or windows) instead of (linux or *bsd), which is
      probably more accurate (that way Solaris and other *nix systems
      aren't excluded)
    - rename softpipe to swrast, as swrast is the actual option name

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
7 years agoi965: Pretend there are 4 subslices for compute shader threads on Gen9+.
Kenneth Graunke [Wed, 8 Nov 2017 18:56:00 +0000 (10:56 -0800)]
i965: Pretend there are 4 subslices for compute shader threads on Gen9+.

Similar to what we did for pixel shader threads - see gen_device_info.c.

We don't want to bump the actual Maximum Number of Threads though, so
we adjust it here.  For pixel shaders, we don't use max_wm_threads, so
we could just bump it globally.

Supposedly fixes Piglit tests:
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-i64vec3-int64_t
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-i64vec4-int64_t
arb_gpu_shader_int64/execution/built-in-functions/cs-op-div-u64vec4-uint64_t

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agomeson: Add script to use VERSION file for getting version
Dylan Baker [Wed, 1 Nov 2017 18:54:10 +0000 (11:54 -0700)]
meson: Add script to use VERSION file for getting version

Meson has up until this point set it's version in the root meson.build
script, while the other build systems read the VERSION file. This is
just "one more thing" to duplicate between meson and every other build
system. This script is a simple "read, strip, print" sort of deal to
allow meson to read the VERSION file.

I chose to implement this in python since python is portable, and to
keep the meson.build script clean. This is also complicated by the fact
that the project() call *must* be the first non-comment,non-blank in the
toplevel meson.build script.

v2: - Move from scripts/ to bin/
    - use python explicitly to run the scripts to support windows

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agobroadcom/vc4: Mark BOs as purgeable when they enter the BO cache
Boris Brezillon [Tue, 26 Sep 2017 11:37:40 +0000 (13:37 +0200)]
broadcom/vc4: Mark BOs as purgeable when they enter the BO cache

This patch makes use of the DRM_IOCTL_VC4_GEM_MADVISE ioctl to mark all
BOs placed in the mesa BO cache as purgeable so that the system can
reclaim this memory under memory pressure.

v2:
- Removed BOs from the cache when they've been purged by the kernel
- Check whether the madvise ioctl is supported or not before using it

v3: Don't walk the whole list when we find a busy BO (by anholt, acked by
    Boris)

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
7 years agodrm-uapi: Update vc4 header from drm-next
Boris Brezillon [Tue, 26 Sep 2017 07:35:53 +0000 (09:35 +0200)]
drm-uapi: Update vc4 header from drm-next

Taken from drm-next d65d31388a23 ("Merge tag
'drm-misc-next-fixes-2017-11-07' of
git://anongit.freedesktop.org/drm/drm-misc into drm-next")

v2: Add the NOTSUPP definition from the final drm-next version, not the
    commit (anholt).

Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
7 years agomeson: Enable VC4's NEON assembly support.
Eric Anholt [Wed, 8 Nov 2017 22:07:38 +0000 (14:07 -0800)]
meson: Enable VC4's NEON assembly support.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agomeson: Always link libgallium_dri.so against dep_thread.
Eric Anholt [Wed, 8 Nov 2017 22:00:51 +0000 (14:00 -0800)]
meson: Always link libgallium_dri.so against dep_thread.

Somehow on my cross build the -pthread is getting lost.  All the other
deps seem to work out fine.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agomeson: Drop stale comment about making valgrind conditional.
Eric Anholt [Wed, 8 Nov 2017 19:54:24 +0000 (11:54 -0800)]
meson: Drop stale comment about making valgrind conditional.

It was fixed in 5c2ff5773a707519f6a773126f201c4e1e8a42d7.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agomeson: Leave dep_llvm empty if !with_llvm
Eric Anholt [Wed, 8 Nov 2017 19:52:09 +0000 (11:52 -0800)]
meson: Leave dep_llvm empty if !with_llvm

The gallium auxiliary build would link against llvm, for the gallivm code
that it didn't build.  This broke the build on my armhf cross, where
libLLVM-3.9.so is not multiarch and thus points to x86-64 libs.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agoRevert "glx: Implement GLX_EXT_no_config_context (v2)"
Adam Jackson [Thu, 9 Nov 2017 16:41:14 +0000 (11:41 -0500)]
Revert "glx: Implement GLX_EXT_no_config_context (v2)"

Pushed ahead of things actually working.

This reverts commit 5293b96b160b904c0e53cbce93679c3aa090f846.

7 years agoradeonsi: pack r600_surface better
Marek Olšák [Tue, 7 Nov 2017 18:00:20 +0000 (19:00 +0100)]
radeonsi: pack r600_surface better

160 -> 136 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: pack r600_texture better
Marek Olšák [Tue, 7 Nov 2017 17:53:37 +0000 (18:53 +0100)]
radeonsi: pack r600_texture better

1752 -> 1736 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: clean up r600_surface
Marek Olšák [Tue, 7 Nov 2017 17:53:59 +0000 (18:53 +0100)]
radeonsi: clean up r600_surface

216 -> 160 bytes

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: remove r600_texture::non_disp_tiling
Marek Olšák [Tue, 7 Nov 2017 17:42:53 +0000 (18:42 +0100)]
radeonsi: remove r600_texture::non_disp_tiling

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: remove DBG_NO_DISCARD_RANGE
Marek Olšák [Tue, 7 Nov 2017 17:42:23 +0000 (18:42 +0100)]
radeonsi: remove DBG_NO_DISCARD_RANGE

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoglx: Implement GLX_EXT_no_config_context (v2)
Adam Jackson [Tue, 7 Nov 2017 16:36:53 +0000 (11:36 -0500)]
glx: Implement GLX_EXT_no_config_context (v2)

This more or less ports EGL_KHR_no_config_context to GLX.

v2: Enable the extension only for those backends that support it.

Khronos: https://github.com/KhronosGroup/OpenGL-Registry/pull/102
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Adam Jackson <ajax@redhat.com>
7 years agoglx: Prepare the DRI backends for GLX_EXT_no_config_context
Adam Jackson [Tue, 7 Nov 2017 16:36:52 +0000 (11:36 -0500)]
glx: Prepare the DRI backends for GLX_EXT_no_config_context

This should be safe as these backends already support the EGL version of
this extension. DRI1 is not affected because it does not support
GLX_ARB_create_context anyway. DRI-Windows is not prepared to implement
this as there's no equivalent WGL extension, and wglCreateContextAttribs
seems to really want the HDC's pixel format to be set.

Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
7 years agoglx: Relax validate_renderType_against_config for EXT_no_config_context
Adam Jackson [Tue, 7 Nov 2017 16:36:51 +0000 (11:36 -0500)]
glx: Relax validate_renderType_against_config for EXT_no_config_context

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>
7 years agoanv: fix build failure
Nicolai Hähnle [Thu, 9 Nov 2017 13:49:19 +0000 (14:49 +0100)]
anv: fix build failure

Fixes: e3a8013de8ca ("util/u_queue: add util_queue_fence_wait_timeout")
7 years agomesa: flush and wait after creating a fallback texture
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:04 +0000 (17:39 +0200)]
mesa: flush and wait after creating a fallback texture

Fixes non-deterministic failures in
dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_sync.images.texture_source.teximage2d_render
and others in dEQP-EGL.functional.sharing.gles2.multithread.*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agomesa: increase MaxServerWaitTimeout
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:03 +0000 (17:39 +0200)]
mesa: increase MaxServerWaitTimeout

The current value was introduced in commit a27180d0d8666, which claims
that it represents ~1.11 years. However, it is interpreted in nanoseconds,
so it actually only represents ~9.8 hours. That seems a bit short.

Use the largest value consistent with both int32 and int64. It
corresponds to ~292 years in nanoseconds.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agost/mesa: remove redundant flushes from st_flush
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:02 +0000 (17:39 +0200)]
st/mesa: remove redundant flushes from st_flush

st_flush should flush state tracker-internal state and the pipe, but
not mesa/main state. Of the four callers:

- glFlush/glFinish already call FLUSH_{VERTICES,STATE}.
- st_vdpau doesn't need to call them.
- st_manager will now call them explicitly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agost/dri: use stapi flush instead of pipe flush when creating fences
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:37 +0000 (17:38 +0200)]
st/dri: use stapi flush instead of pipe flush when creating fences

There may be pending operations (e.g. vertices) that need to be flushed
by the state tracker.

Found by inspection.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: use a threaded context even for debug contexts
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:05 +0000 (17:39 +0200)]
radeonsi: use a threaded context even for debug contexts

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: record and dump time of flush
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:05 +0000 (17:39 +0200)]
radeonsi: record and dump time of flush

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoddebug: optionally handle transfer commands like draws
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:01 +0000 (17:39 +0200)]
ddebug: optionally handle transfer commands like draws

Transfer commands can have associated GPU operations.

Enabled by passing GALLIUM_DDEBUG=transfers.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoddebug: dump context and before/after times of draws
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:01 +0000 (17:39 +0200)]
ddebug: dump context and before/after times of draws

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoddebug: generalize print_named_xxx via a PRINT_NAMED macro
Nicolai Hähnle [Sun, 22 Oct 2017 15:39:00 +0000 (17:39 +0200)]
ddebug: generalize print_named_xxx via a PRINT_NAMED macro

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoddebug: rewrite to always use a threaded approach
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:59 +0000 (17:38 +0200)]
ddebug: rewrite to always use a threaded approach

This patch has multiple goals:

1. Off-load the writing of records in 'always' mode to another thread
   for performance.

2. Allow using ddebug with threaded contexts. This really forces us to
   move some of the "after_draw" handling into another thread.

3. Simplify the different modes of ddebug, both in the code and in
   the user interface, i.e. GALLIUM_DDEBUG. In particular, there's
   no 'pipelined' anymore, since we're always pipelined; and 'noflush'
   is replaced by 'flush', since we no longer flush by default.

4. Fix the fences in pipelining mode. They previously relied on writes
   via pipe_context::clear_buffer. However, on radeonsi, those could
   (quite reasonably) end up in the SDMA buffer. So we use the newly
   added PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE fences instead.

5. Improve pipelined mode overall, using the finer grained information
   provided by the new fences.

Overall, the result is that pipelined mode should be more useful, and
using ddebug in default mode is much less invasive, in the sense that
it changes the overall driver behavior less (which is kind of crucial
for a driver debugging tool).

An example of the new hang debug output:

  Gallium debugger active.
  Hang detection timeout is 1000ms.
  GPU hang detected, collecting information...

  Draw #   driver  prev BOP  TOP  BOP  dump file
  -------------------------------------------------------------
  2          YES      YES    YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000000
  3          YES      NO     YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000001
  4          YES      NO     YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000002
  5          YES      NO     YES  NO   /home/nha/ddebug_dumps/shader_runner_19919_00000003

  Done.

We can see that there were almost certainly 4 draws in flight when
the hang happened: the top-of-pipe fence was signaled for all 4 draws,
the bottom-of-pipe fence for none of them. In virtually all cases,
we'd expect the first draw in the list to be at fault, but due to the
GPU parallelism, it's possible (though highly unlikely) that one of
the later draws causes a component to get stuck in a way that prevents
the earlier draws from making progress as well.

(In the above example, there were actually only 3 draws truly in flight:
the last draw is a blit that waits for the earlier draws; however, its
top-of-pipe fence is emitted before the cache flush and wait, and so
the fact that the draw hasn't truly started yet can only be seen from a
closer inspection of GPU state.)

Acked-by: Marek Olšák <marek.olsak@amd.com>
7 years agoddebug: use an atomic increment when numbering files
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:58 +0000 (17:38 +0200)]
ddebug: use an atomic increment when numbering files

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agodd/util: extract dd_get_debug_filename_and_mkdir
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:58 +0000 (17:38 +0200)]
dd/util: extract dd_get_debug_filename_and_mkdir

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/u_dump: add and use util_dump_transfer_usage
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:57 +0000 (17:38 +0200)]
gallium/u_dump: add and use util_dump_transfer_usage

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/u_dump: add util_dump_ns
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:56 +0000 (17:38 +0200)]
gallium/u_dump: add util_dump_ns

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/u_dump: export util_dump_ptr
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:55 +0000 (17:38 +0200)]
gallium/u_dump: export util_dump_ptr

Change format to %p while we're at it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: implement PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:54 +0000 (17:38 +0200)]
radeonsi: implement PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE

v2: use uncached system memory for the fence, and use the CPU to
    clear it so we never read garbage when checking the fence

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: document some subtle details of fence_finish & fence_server_sync
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:53 +0000 (17:38 +0200)]
radeonsi: document some subtle details of fence_finish & fence_server_sync

v2: remove the change to si_fence_server_sync, we'll handle that more
    robustly

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium: add pipe_context::callback
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:53 +0000 (17:38 +0200)]
gallium: add pipe_context::callback

For running post-draw operations inside the driver thread. ddebug will
use it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/u_threaded: implement pipe_context::set_log_context
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:52 +0000 (17:38 +0200)]
gallium/u_threaded: implement pipe_context::set_log_context

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/u_threaded: avoid syncs for get_query_result
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:51 +0000 (17:38 +0200)]
gallium/u_threaded: avoid syncs for get_query_result

Queries should still get marked as flushed when flushes are executed
asynchronously in the driver thread.

To this end, the management of the unflushed_queries list is moved into
the driver thread.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/u_threaded: implement asynchronous flushes
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:50 +0000 (17:38 +0200)]
gallium/u_threaded: implement asynchronous flushes

This requires out-of-band creation of fences, and will be signaled to
the pipe_context::flush implementation by a special TC_FLUSH_ASYNC flag.

v2:
- remove an incorrect assertion
- handle fence_server_sync for unsubmitted fences by
  relying on the improved cs_add_fence_dependency
- only implement asynchronous flushes on amdgpu

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/u_threaded: mark queries flushed only for non-deferred flushes
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:50 +0000 (17:38 +0200)]
gallium/u_threaded: mark queries flushed only for non-deferred flushes

The driver uses (and must use) the flushed flag of queries as a hint that
it does not have to check for synchronization with currently queued up
commands. Deferred flushes do not actually flush queued up commands, so
we must not set the flushed flag for them.

Found by inspection.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: move fence functions to si_fence.c
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:49 +0000 (17:38 +0200)]
radeonsi: move fence functions to si_fence.c

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agowinsys/amdgpu: handle cs_add_fence_dependency for deferred/unsubmitted fences
Nicolai Hähnle [Thu, 9 Nov 2017 13:00:22 +0000 (14:00 +0100)]
winsys/amdgpu: handle cs_add_fence_dependency for deferred/unsubmitted fences

The idea is to fix the following interleaving of operations
that can arise from deferred fences:

 Thread 1 / Context 1          Thread 2 / Context 2
 --------------------          --------------------
 f = deferred flush
 <------- application-side synchronization ------->
                               fence_server_sync(f)
                               ...
                               flush()
 flush()

We will now stall in fence_server_sync until the flush of context 1
has completed.

This scenario was unlikely to occur previously, because applications
seem to be doing

 Thread 1 / Context 1          Thread 2 / Context 2
 --------------------          --------------------
 f = glFenceSync()
 glFlush()
 <------- application-side synchronization ------->
                               glWaitSync(f)

... and indeed they probably *have* to use this ordering to avoid
deadlocks in the GLX model, where all GL operations conceptually
go through a single connection to the X server. However, it's less
clear whether applications have to do this with other WSI (i.e. EGL).
Besides, even this sequence of GL commands can be translated into
the Gallium-level sequence outlined above when Gallium threading
and asynchronous flushes are used. So it makes sense to be more
robust.

As a side effect, we no longer busy-wait on submission_in_progress.

We won't enable asynchronous flushes on radeon, but add a
cs_add_fence_dependency stub anyway to document the potential
issue.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium: add PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE bits
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:48 +0000 (17:38 +0200)]
gallium: add PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE bits

These bits are intended to be used by the ddebug hang detection and are
named in analogy to the Vulkan stage bits (and the corresponding Radeon
pipeline event).

Hang detection needs fences on the granularity of individual commands,
which nothing else really covers. The closest alternative would have
been PIPE_QUERY_GPU_FINISHED, but (a) queries are a per-context object
and we really want a per-screen object, (b) queries don't offer a
wait with timeout, and (c) in any case, PIPE_QUERY_GPU_FINISHED is
meant to imply that GPU caches are flushed, which the new bits
explicitly aren't.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium: add PIPE_FLUSH_ASYNC and PIPE_FLUSH_HINT_FINISH
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:47 +0000 (17:38 +0200)]
gallium: add PIPE_FLUSH_ASYNC and PIPE_FLUSH_HINT_FINISH

Also document some subtleties of pipe_context::flush.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoutil/u_queue: add util_queue_fence_wait_timeout
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:46 +0000 (17:38 +0200)]
util/u_queue: add util_queue_fence_wait_timeout

v2:
- style fixes
- fix missing timeout handling in futex path

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agothreads: update for late C11 changes
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:45 +0000 (17:38 +0200)]
threads: update for late C11 changes

C11 threads were changed to use struct timespec instead of xtime, and
thrd_sleep got a second argument.

See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1554.htm and
http://en.cppreference.com/w/c/thread/{thrd_sleep,cnd_timedwait,mtx_timedlock}

Note that cnd_timedwait is spec'd to be relative to TIME_UTC / CLOCK_REALTIME.

v2: Fix Windows build errors. Tested with a default Appveyor config
    that uses Visual Studio 2013. Judging from Brian's email and
    random internet sources, Visual Studio 2015 does have timespec
    and timespec_get, hence the _MSC_VER-based guard which I have
    not tested.

Cc: Jose Fonseca <jfonseca@vmware.com>
Cc: Brian Paul <brianp@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
7 years agogallium: remove unused and deprecated u_time.h
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:44 +0000 (17:38 +0200)]
gallium: remove unused and deprecated u_time.h

Cc: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoutil: move os_time.[ch] to src/util
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:44 +0000 (17:38 +0200)]
util: move os_time.[ch] to src/util

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: always use async compiles when creating shader/compute states
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:43 +0000 (17:38 +0200)]
radeonsi: always use async compiles when creating shader/compute states

With Gallium threaded contexts, creating shader/compute states is
effectively a screen operation, so we should not use context state.

In particular, this allows us to avoid using the context's LLVM
TargetMachine.

This isn't an issue yet because u_threaded_context filters out non-async
debug callbacks, and we disable threaded contexts for debug contexts.
However, we may want to change that in the future.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: fix potential use-after-free of debug callbacks
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:42 +0000 (17:38 +0200)]
radeonsi: fix potential use-after-free of debug callbacks

Found by inspection.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: move pipe debug callback to si_context
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:39 +0000 (17:38 +0200)]
radeonsi: move pipe debug callback to si_context

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agou_queue: add util_queue_finish for waiting for previously added jobs
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:41 +0000 (17:38 +0200)]
u_queue: add util_queue_finish for waiting for previously added jobs

Schedule one job for every thread, and wait on a barrier inside the job
execution function.

v2: avoid alloca (fixes Windows build error)

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
7 years agoutil: move pipe_barrier into src/util and rename to util_barrier
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:40 +0000 (17:38 +0200)]
util: move pipe_barrier into src/util and rename to util_barrier

The #if guard is probably not 100% equivalent to the previous PIPE_OS
check, but if anything it should be an over-approximation (are there
pthread implementations without barriers?), so people will get either
a good implementation or compile errors that are easy to fix.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium: add async debug message forwarding helper
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:38 +0000 (17:38 +0200)]
gallium: add async debug message forwarding helper

v2: use util_vasprintf for Windows portability

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
7 years agost/mesa: guard sampler views changes with a mutex
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:36 +0000 (17:38 +0200)]
st/mesa: guard sampler views changes with a mutex

Some locking is unfortunately required, because well-formed GL programs
can have multiple threads racing to access the same texture, e.g.: two
threads/contexts rendering from the same texture, or one thread destroying
a context while the other is rendering from or modifying a texture.

Since even the simple mutex caused noticable slowdowns in the piglit
drawoverhead micro-benchmark, this patch uses a slightly more involved
approach to keep locks out of the fast path:

- the initial lookup of sampler views happens without taking a lock
- a per-texture lock is only taken when we have to modify the sampler
  view(s)
- since each thread mostly operates only on the entry corresponding to
  its context, the main issue is re-allocation of the sampler view array
  when it needs to be grown, but the old copy is not freed

Old copies of the sampler views array are kept around in a linked list
until the entire texture object is deleted. The total memory wasted
in this way is roughly equal to the size of the current sampler views
array.

Fixes non-deterministic memory corruption in some
dEQP-EGL.functional.sharing.gles2.multithread.* tests, e.g.
dEQP-EGL.functional.sharing.gles2.multithread.simple.images.texture_source.create_texture_render

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agost/mesa: re-arrange st_finalize_texture
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:34 +0000 (17:38 +0200)]
st/mesa: re-arrange st_finalize_texture

Move the early-out for surface-based textures earlier. This narrows the
scope of the locking added in a follow-up commit.

Fix one remaining case of initializing a surface-based texture
without properly finalizing it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium: clarify the constraints on sampler_view_destroy
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:33 +0000 (17:38 +0200)]
gallium: clarify the constraints on sampler_view_destroy

r600 expects the context that created the sampler view to still be alive
(there is a per-context list of sampler views).

svga currently bails when the context of destruction is not the same as
creation.

The GL state tracker, which is the only one that runs into the
multi-context subtleties (due to share groups), already guarantees that
sampler views are destroyed before their context of creation is destroyed.

Most drivers are context-agnostic, so the warning message in
pipe_sampler_view_release doesn't really make sense.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: reduce the scope of sel->mutex in si_shader_select_with_key
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:32 +0000 (17:38 +0200)]
radeonsi: reduce the scope of sel->mutex in si_shader_select_with_key

We only need the lock to guard changes in the variant linked list. The
actual compilation can happen outside the lock, since we use the ready
fence as a guard.

v2: fix double-unlock

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: use ready fences on all shaders, not just optimized ones
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:32 +0000 (17:38 +0200)]
radeonsi: use ready fences on all shaders, not just optimized ones

There's a race condition between si_shader_select_with_key and
si_bind_XX_shader:

  Thread 1                         Thread 2
  --------                         --------
  si_shader_select_with_key
    begin compiling the first
    variant
    (guarded by sel->mutex)
                                   si_bind_XX_shader
                                     select first_variant by default
                                     as state->current
                                   si_shader_select_with_key
                                     match state->current and early-out

Since thread 2 never takes sel->mutex, it may go on rendering without a
PM4 for that shader, for example.

The solution taken by this patch is to broaden the scope of
shader->optimized_ready to a fence shader->ready that applies to
all shaders. This does not hurt the fast path (if anything it makes
it faster, because we don't explicitly check is_optimized).

It will also allow reducing the scope of sel->mutex locks, but this is
deferred to a later commit for better bisectability.

Fixes dEQP-EGL.functional.sharing.gles2.multithread.simple.buffers.bufferdata_render

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agou_queue: add a futex-based implementation of fences
Nicolai Hähnle [Sun, 22 Oct 2017 15:38:31 +0000 (17:38 +0200)]
u_queue: add a futex-based implementation of fences

Fences are now 4 bytes instead of 96 bytes (on my 64-bit system).

Signaling a fence is a single atomic operation in the fast case plus a
syscall in the slow case.

Testing if a fence is signaled is the same as before (a simple comparison),
but waiting on a fence is now no more expensive than just testing it in
the fast (already signaled) case.

v2:
- style fixes
- use p_atomic_xxx macros with the right barriers

Acked-by: Marek Olšák <marek.olsak@amd.com>