mesa.git
6 years agoglsl: replace conditional compilation with MAYBE_UNUSED
Eric Engestrom [Mon, 18 Sep 2017 15:35:28 +0000 (16:35 +0100)]
glsl: replace conditional compilation with MAYBE_UNUSED

Suggested-by: Nicolai Hähnle <nhaehnle@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
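
The message above does not show the change itself, so here is a minimal sketch of the general pattern, not the actual glsl diff. It assumes a variable that is only read inside assert(), which compiles away under NDEBUG. The macro definition below is a stand-in that assumes a GCC- or Clang-compatible compiler (Mesa defines its own MAYBE_UNUSED), and sum_values and expected_max are hypothetical names.

```c
#include <assert.h>

/* Stand-in definition for illustration only; assumes a GCC- or
 * Clang-compatible compiler. */
#ifndef MAYBE_UNUSED
#define MAYBE_UNUSED __attribute__((unused))
#endif

/* Hypothetical helper: sums `count` values. */
static int sum_values(const int *values, int count)
{
   /* `expected_max` is only read by assert(). Marking it MAYBE_UNUSED
    * silences the unused-variable warning in NDEBUG builds without
    * wrapping the declaration in #ifndef NDEBUG ... #endif. */
   MAYBE_UNUSED const int expected_max = 1024;
   int total = 0;

   assert(count <= expected_max);
   for (int i = 0; i < count; i++)
      total += values[i];
   return total;
}
```

The attribute keeps the declaration in one place for both debug and release builds, which is the readability gain over conditional compilation.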
6 years agobroadcom/vc4: Fix use-after-free when deleting a program.
Eric Anholt [Tue, 19 Sep 2017 00:47:44 +0000 (17:47 -0700)]
broadcom/vc4: Fix use-after-free when deleting a program.

By leaving the compiled shader in the context's stage state, the next
compile of a new FS would look in the old compiled FS for figuring out
whether to set various dirty flags for the VS compile.  Clear out the
pointer when deleting the program, and make sure that we always mark the
state as dirty if the previous program had been lost.  Fixes valgrind
warnings on glsl-max-varyings.

Fixes: 2350569a78c6 ("vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.")
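
The fix described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the vc4 code: struct context, struct compiled_shader, bound_fs, delete_shader and bind_shader are all hypothetical names. It shows the two halves of the fix: clear the cached pointer on delete, and treat a missing previous shader as "always dirty".

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a compiled shader and a driver context. */
struct compiled_shader {
   int num_inputs;
};

struct context {
   struct compiled_shader *bound_fs;  /* last compiled FS the context saw */
   bool dirty;
};

static void delete_shader(struct context *ctx, struct compiled_shader *shader)
{
   /* Clear the cached pointer first so that no later code path can
    * dereference freed memory. */
   if (ctx->bound_fs == shader)
      ctx->bound_fs = NULL;
   free(shader);
}

static void bind_shader(struct context *ctx, struct compiled_shader *shader)
{
   /* If the previous shader was lost there is nothing to compare against,
    * so always mark the state dirty. */
   if (ctx->bound_fs == NULL ||
       ctx->bound_fs->num_inputs != shader->num_inputs)
      ctx->dirty = true;
   ctx->bound_fs = shader;
}
```

Without the NULL assignment in delete_shader, the comparison in bind_shader would read num_inputs from freed memory, which is the kind of access valgrind reports.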
6 years agoi965: Fix batch map failure check in INTEL_DEBUG=bat handling.
Kenneth Graunke [Tue, 19 Sep 2017 01:50:06 +0000 (18:50 -0700)]
i965: Fix batch map failure check in INTEL_DEBUG=bat handling.

I originally wrote the code to call the maps 'batch' and 'state',
until I remembered that 'batch' is the intel_batchbuffer struct pointer.
The NULL check was still using the wrong variable.

Caught by Coverity.

CID: 1418109

6 years agobroadcom/vc4: Fix crashes since the gallium blitter reworks.
Eric Anholt [Mon, 18 Sep 2017 19:58:05 +0000 (12:58 -0700)]
broadcom/vc4: Fix crashes since the gallium blitter reworks.

Even if we're not clearing color, the blitter has started dereferencing
the color value.

6 years agobroadcom/vc4: Fix use-after-free trying to mix a quad and tile clear.
Eric Anholt [Mon, 18 Sep 2017 22:17:31 +0000 (15:17 -0700)]
broadcom/vc4: Fix use-after-free trying to mix a quad and tile clear.

The blitter will bind just the depth buffer, which flushes the current job
if we had both a color and depth/stencil.  If the clear was doing partial
depth/stencil (quad-based) and color (tile-based), we'd go on to try to
set up the rest of the tile clear in the now flushed job.

Instead, move the partial clear up before we start setting up the job for
the current FBO state, and re-fetch the job if we're continuing on to a
tile-based clear.  Fixes valgrind failures in fbo-depthtex.

Fixes: 9421a6065c4e ("vc4: Fix fallback to quad clears of depth in GLX.")
6 years agobroadcom/vc4: Fix use-after-free for flushing when writing to a texture.
Eric Anholt [Mon, 18 Sep 2017 21:52:32 +0000 (14:52 -0700)]
broadcom/vc4: Fix use-after-free for flushing when writing to a texture.

I was trying to continue the hash table loop, not the inner loop.  This
tended to work out, because we would have *just* freed the job struct.
Fixes some valgrind failures in fbo-depthtex.

Fixes: f597ac396640 ("vc4: Implement job shuffling")
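
The inner-versus-outer loop mistake described above is easy to reproduce in miniature. The following sketch is hypothetical and uses a plain array where the driver uses a hash table; struct job, flush_jobs_using and the flushed flag (which stands in for "the job struct has been freed") are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUFFERS 4

/* Hypothetical job record. */
struct job {
   int buffers[MAX_BUFFERS];
   int num_buffers;
   bool flushed;   /* stands in for "this job has been freed" */
};

/* Flush every job that references `buffer`; returns how many were flushed. */
static int flush_jobs_using(struct job *jobs, size_t num_jobs, int buffer)
{
   int flushed = 0;

   for (size_t j = 0; j < num_jobs; j++) {
      struct job *job = &jobs[j];

      for (int i = 0; i < job->num_buffers; i++) {
         if (job->buffers[i] == buffer) {
            job->flushed = true;
            flushed++;
            /* A plain `continue` here would only advance the inner loop
             * and keep reading from a job that is already gone. Jump to
             * the next iteration of the outer loop instead. */
            goto next_job;
         }
      }
next_job:
      ;
   }
   return flushed;
}
```

In this reduced form a `break` would behave the same way, since nothing follows the inner loop; the label just makes the intent explicit.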
6 years agottn: Fix out-of-bounds accesses since the always-2D-constants change.
Eric Anholt [Thu, 7 Sep 2017 17:17:02 +0000 (10:17 -0700)]
ttn: Fix out-of-bounds accesses since the always-2D-constants change.

Only one of the three checks for dim was updated, so we would try to set a
UBO buffer index source value on a nir_load_uniform, and wouldn't actually
declare non-UBO uniforms.

Fixes: 37dd8e8dee1d ("gallium: all drivers should accept two-dimensional constant buffer indexing")
Tested-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoanv/android: Disable surface and swapchain extensions (v2)
Chad Versace [Fri, 1 Sep 2017 22:54:38 +0000 (15:54 -0700)]
anv/android: Disable surface and swapchain extensions (v2)

Android's Vulkan loader implements VK_KHR_surface and VK_KHR_swapchain,
and applications cannot access the driver's implementation. Moreover,
if the driver exposes those extension strings, then tests
dEQP-VK.api.info.instance.extensions and dEQP-VK.api.info.device fail
due to the duplicated strings.

v2: Replace !ANDROID with ANV_HAS_SURFACE. (for jekstrand)

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
6 years agoanv: Feed vk_android_native_buffer.xml to generators (v2)
Chad Versace [Tue, 22 Aug 2017 23:26:03 +0000 (16:26 -0700)]
anv: Feed vk_android_native_buffer.xml to generators (v2)

Feed the XML to anv_extensions.py and anv_entrypoints_gen.py.
Do it on all platforms, not just Android. Tested on Android and Fedora.

We always parse the Android XML, regardless of target platform, to
help reduce the chance that people working on non-Android break the
Android build.

v2:
  - Squash in Tapani's changes to Android.*.mk.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
6 years agoanv: Teach generator scripts how to parse multiple XML files
Chad Versace [Tue, 22 Aug 2017 23:23:26 +0000 (16:23 -0700)]
anv: Teach generator scripts how to parse multiple XML files

The taught scripts are anv_extensions.py and anv_entrypoints_gen.py.  To
give a script multiple XML files, call it like so:

    anv_extensions.py --xml a.xml --xml b.xml --xml c.xml ...

The scripts parse the XML files in the given order.

This will allow us to feed the scripts XML files for extensions that are
missing from the official vk.xml, such as VK_ANDROID_native_buffer.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agovulkan/registry: Feed vk_android_native_buffer.xml to gen_enum_to_str.py
Chad Versace [Tue, 15 Aug 2017 23:48:38 +0000 (16:48 -0700)]
vulkan/registry: Feed vk_android_native_buffer.xml to gen_enum_to_str.py

Tested on Android and Fedora.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agovulkan/util: Teach gen_enum_to_str.py to parse multiple XML files
Chad Versace [Tue, 15 Aug 2017 23:34:20 +0000 (16:34 -0700)]
vulkan/util: Teach gen_enum_to_str.py to parse multiple XML files

To give the script multiple XML files, call it like so:

    gen_enum_to_str.py --xml a.xml --xml b.xml --xml c.xml ...

The script parses the XML files in the given order.

This will allow us to feed the script XML files for extensions that are
missing from the official vk.xml, such as VK_ANDROID_native_buffer.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agovulkan/registry: Add VK_ANDROID_native_buffer
Chad Versace [Mon, 10 Jul 2017 17:43:08 +0000 (10:43 -0700)]
vulkan/registry: Add VK_ANDROID_native_buffer

The VK_ANDROID_native_buffer extension is missing from the official
vk.xml. This patch defines the extension in a separate, minimal XML
file: vk_android_native_buffer.xml.

I chose to add the extension to a new XML file instead of adding it to
the official vk.xml in order to avoid conflicts each time we sync the
vk.xml from Khronos.

This should be only a temporary solution until Jesse Hall is persuaded
to add it to the official vk.xml.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agovulkan: Add #ifdef hack to vk_android_native_buffer.h
Chad Versace [Tue, 29 Aug 2017 21:41:24 +0000 (14:41 -0700)]
vulkan: Add #ifdef hack to vk_android_native_buffer.h

This patch consolidates many potential `#ifdef ANDROID` messes
throughout src/vulkan and src/intel/vulkan into a simple, localized
hack. The hack is an `#ifdef ANDROID` in vk_android_native_buffer.h
that, on non-Android platforms, avoids including the Android platform
headers and typedefs any Android-specific types to void*.

This hack doesn't remove *all* the `#ifdef ANDROID`s in upcoming
patches, but it does remove a lot.

I first tried implementing VK_ANDROID_native_buffer without this hack,
but eventually gave up when the yak shaving became too much.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
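
The shape of such a localized hack might look like the sketch below. It is not the contents of vk_android_native_buffer.h: the header name, platform_buffer_handle_t, struct native_buffer_info and native_buffer_stride_bytes are hypothetical and only illustrate the idea of typedef'ing a platform type to void* when the platform headers are unavailable.

```c
#include <stdint.h>

#ifdef ANDROID
/* On Android the real platform header would be included here. This header
 * name is hypothetical and stands in for whatever defines the type. */
#include <platform/buffer_handle.h>
#else
/* Elsewhere, define the platform-specific type as an opaque pointer so
 * that code which only stores or passes the handle still compiles. */
typedef void *platform_buffer_handle_t;
#endif

/* Shared code below needs no #ifdef of its own. */
struct native_buffer_info {
   platform_buffer_handle_t handle;
   uint32_t stride;   /* in pixels */
};

static uint32_t native_buffer_stride_bytes(const struct native_buffer_info *info,
                                           uint32_t bytes_per_pixel)
{
   return info->stride * bytes_per_pixel;
}
```

The trade-off is that non-Android builds type-check the shared code against void* rather than the real type, so mistakes in how the handle is used can still go unnoticed until an Android build.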
6 years agovulkan: Import vk_android_native_buffer.h
Chad Versace [Tue, 15 Nov 2016 00:05:40 +0000 (16:05 -0800)]
vulkan: Import vk_android_native_buffer.h

Just as Mesa imports the Khronos Vulkan headers, it should import this
Android-private Vulkan header too. This guarantees that Mesa will
continue to build even when upstream Android breaks header
compatibility.

This header is only for *implementers* of Vulkan, not for consumers of
Vulkan.

Imported from tag 'android-7.1.1_r28' in aosp/frameworks/native.

References: https://android.googlesource.com/platform/frameworks/native/+/android-7.1.1_r28/vulkan/include/vulkan/vk_android_native_buffer.h
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agoi965: Use prepare_external instead of make_shareable in setTexBuffer2
Jason Ekstrand [Tue, 12 Sep 2017 22:40:19 +0000 (15:40 -0700)]
i965: Use prepare_external instead of make_shareable in setTexBuffer2

The setTexBuffer2 hook from GLX is used to implement glXBindTexImageEXT,
which has tighter restrictions than just "it's shared".  In particular,
it says that any rendering to the image while it is bound causes the
contents to become undefined.  This means that we can do whatever aux
tracking we want between glXBindTexImageEXT and glXReleaseTexImageEXT so
long as we always transition from external in Bind and to external in
Release.

The fact that we were using make_shareable before was a problem because
it would resolve away 100% of the aux data and then throw away our
reference to the aux buffer.  If the aux data was shared with some other
application (i.e. if we're using I915_FORMAT_MOD_Y_TILED_CCS) then we
would forget that the aux data even existed for the rest of eternity.
This is fine for the first frame, but any subsequent call to
glXBindTexImageEXT would bind the texture as if it had no aux
whatsoever: no resolves would happen, and texturing would proceed as if
there were no aux.  This was causing rendering corruption in mutter when
running on top of X11 with modifiers.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
6 years agoi965/tex_image: Reference the renderbuffer miptree in setTexBuffer2
Jason Ekstrand [Tue, 12 Sep 2017 21:26:04 +0000 (14:26 -0700)]
i965/tex_image: Reference the renderbuffer miptree in setTexBuffer2

The old code made a new miptree that referenced the same BO as the
renderbuffer and just trusted in the memory aliasing to work.  There are
only two ways in which the new miptree is liable to differ from the one
in the renderbuffer and neither of them matter:

 1) It may have a different target.  The only targets that we can ever
    see in intelSetTexBuffer2 are GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE
    and the difference between the two doesn't matter as far as the
    miptree is concerned; genX(update_sampler_state) only looks at the
    gl_texture_object and not the miptree when determining whether or
    not to use normalized coordinates.

 2) It may have a very slightly different format.  Again, this doesn't
    matter because we've supported texture views for quite some time so
    we always look at the gl_texture_object format instead of the
    miptree format for hardware setup anyway.

On the other hand, because we were recreating the miptree, we were using
intel_miptree_create_for_bo which doesn't understand modifiers.  We
really want this function to work without doing a resolve so long as you
have modifiers so we need to fix that.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agoi965: Reset miptree aux state on update_image_buffer
Jason Ekstrand [Tue, 12 Sep 2017 22:24:40 +0000 (15:24 -0700)]
i965: Reset miptree aux state on update_image_buffer

When we get a miptree in through glXBindTexImageEXT, we don't know the
current aux state so we have to assume the worst case.  If the image
gets recreated, everything is fine because miptree_create_for_dri_image
sets it to the default.  However, if our miptree is recycled, then we
may have stale aux_usage and we need to reset to the default; otherwise
our aux_state tracking will get messed up.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
6 years agointel/isl: Add a drm_modifier_get_default_aux_state helper
Jason Ekstrand [Tue, 12 Sep 2017 22:20:26 +0000 (15:20 -0700)]
intel/isl: Add a drm_modifier_get_default_aux_state helper

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
6 years agoi965: Warn for GTT fallbacks when mapping the batch/state buffers.
Kenneth Graunke [Sat, 16 Sep 2017 07:24:41 +0000 (00:24 -0700)]
i965: Warn for GTT fallbacks when mapping the batch/state buffers.

This shouldn't really happen in practice, but I hit it a couple of times
when running a driver with a bad memory leak.  We may as well hook up
the warning, because if it ever triggers, we'll know something is wrong.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Plumb brw through to intel_batchbuffer_reset().
Kenneth Graunke [Sat, 16 Sep 2017 07:23:51 +0000 (00:23 -0700)]
i965: Plumb brw through to intel_batchbuffer_reset().

We'll want to pass this to brw_bo_map in a moment.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoradeonsi: reallocate if a non-sharable texture is being shared
Marek Olšák [Thu, 14 Sep 2017 13:41:09 +0000 (15:41 +0200)]
radeonsi: reallocate if a non-sharable texture is being shared

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoradeonsi: PIPE_BIND_SHARED should allow inter-process sharing
Marek Olšák [Thu, 14 Sep 2017 13:40:45 +0000 (15:40 +0200)]
radeonsi: PIPE_BIND_SHARED should allow inter-process sharing

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agofreedreno: compile fix
Nicolai Hähnle [Mon, 18 Sep 2017 15:38:41 +0000 (17:38 +0200)]
freedreno: compile fix

Fixes: 3f6b3d9db ("gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE")
Reported-by: Jan Vesely <jan.vesely@rutgers.edu>
6 years agoclover: add missing include to compat.h
Jan Vesely [Sun, 17 Sep 2017 02:05:39 +0000 (22:05 -0400)]
clover: add missing include to compat.h

Fixes build issues with llvm-3.6
Fixes: 3115687f9b9830417c408228db2bc679e346bba6 (clover: Fix build after
LLVM r313390)

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
6 years agoclover: Query and export half precision support
Jan Vesely [Fri, 1 Sep 2017 21:48:39 +0000 (17:48 -0400)]
clover: Query and export half precision support

v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16
    has_halfs -> has_halves

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
6 years agogallium: Add PIPE_SHADER_CAP_FP16
Jan Vesely [Fri, 1 Sep 2017 21:47:55 +0000 (17:47 -0400)]
gallium: Add PIPE_SHADER_CAP_FP16

Denotes native half precision float operations capability
v2: PIPE_CAP_HALFS -> PIPE_SHADER_CAP_FP16
    fix indentation

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoanv: Implement VK_KHR_image_format_list
Jason Ekstrand [Mon, 19 Jun 2017 15:38:48 +0000 (08:38 -0700)]
anv: Implement VK_KHR_image_format_list

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agoanv: Implement VK_KHR_bind_memory2
Jason Ekstrand [Tue, 18 Jul 2017 16:02:53 +0000 (09:02 -0700)]
anv: Implement VK_KHR_bind_memory2

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agonvc0: fix compile error
Benedikt Schemmer [Mon, 18 Sep 2017 13:27:26 +0000 (15:27 +0200)]
nvc0: fix compile error

Fixes: 3f6b3d9db ("gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE")
Signed-off-by: Benedikt Schemmer <ben@besd.de>
Previously-pointed-out-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoradeonsi: allow out-of-order rasterization in commutative blending cases
Nicolai Hähnle [Mon, 18 Sep 2017 09:24:10 +0000 (11:24 +0200)]
radeonsi: allow out-of-order rasterization in commutative blending cases

We do not enable this by default for additive blending, since it slightly
breaks OpenGL invariance guarantees due to non-determinism.

Still, there may be some applications that can benefit from white-listing
via the radeonsi_commutative_blend_add drirc setting without any real
visible artifacts.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agoradeonsi: add drirc option "radeonsi_assume_no_z_fights"
Nicolai Hähnle [Fri, 8 Sep 2017 13:15:08 +0000 (15:15 +0200)]
radeonsi: add drirc option "radeonsi_assume_no_z_fights"

This option enables a performance optimization where typical non-blending
draws with depth buffer may be rasterized out-of-order (on VI+, multi-SE
chips).

This optimization can lead to incorrect results when an application
renders multiple objects with the same Z value at the same pixel, so we
will never enable it by default. But there may be applications that could
benefit from white-listing.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agoradeonsi: enable out-of-order rasterization when possible on VI and GFX9 dGPUs
Nicolai Hähnle [Fri, 8 Sep 2017 10:05:24 +0000 (12:05 +0200)]
radeonsi: enable out-of-order rasterization when possible on VI and GFX9 dGPUs

This does not take commutative blending into account yet.

R600_DEBUG=nooutoforder disables it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agogallium/radeon: pass old_(perfect_)enable to set_occlusion_query_state
Nicolai Hähnle [Fri, 8 Sep 2017 09:54:37 +0000 (11:54 +0200)]
gallium/radeon: pass old_(perfect_)enable to set_occlusion_query_state

The callee can derive the current enable state itself.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agogallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE
Nicolai Hähnle [Tue, 12 Sep 2017 16:46:46 +0000 (18:46 +0200)]
gallium: add PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE

To be able to properly distinguish between GL_ANY_SAMPLES_PASSED
and GL_ANY_SAMPLES_PASSED_CONSERVATIVE.

This patch goes through all drivers, having them treat the two
query types identically, except:

1. radeon incorrectly enabled conservative mode on
   PIPE_QUERY_OCCLUSION_PREDICATE. We now do it correctly, only
   on PIPE_QUERY_OCCLUSION_PREDICATE_CONSERVATIVE.
2. st/mesa uses the new query type.

Fixes dEQP-GLES31.functional.fbo.no_attachments.*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
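
A rough sketch of the two behaviours described above follows. It is an illustration, not gallium code: the enum and both helper functions are hypothetical, with enumerator names shortened from the PIPE_QUERY_* names in the message.

```c
#include <stdbool.h>

/* Hypothetical query types, modelled on the names in the message. */
enum query_type {
   QUERY_OCCLUSION_COUNTER,
   QUERY_OCCLUSION_PREDICATE,
   QUERY_OCCLUSION_PREDICATE_CONSERVATIVE,
};

/* Most drivers: both predicate types share a single code path. */
static bool query_returns_boolean(enum query_type type)
{
   switch (type) {
   case QUERY_OCCLUSION_PREDICATE:
   case QUERY_OCCLUSION_PREDICATE_CONSERVATIVE:
      return true;
   default:
      return false;
   }
}

/* The exception: a conservative (possibly over-reporting) result is only
 * acceptable for the explicitly conservative query type. */
static bool query_allows_conservative(enum query_type type)
{
   return type == QUERY_OCCLUSION_PREDICATE_CONSERVATIVE;
}
```

Splitting the type this way lets a driver keep the exact path for GL_ANY_SAMPLES_PASSED while still using a cheaper approximation where the API permits it.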
6 years agoamd/common: add workaround for cube map array layer clamping
Nicolai Hähnle [Wed, 13 Sep 2017 13:33:23 +0000 (15:33 +0200)]
amd/common: add workaround for cube map array layer clamping

Fixes dEQP-GLES31.functional.texture.filtering.cube_array.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoamd/common: remove has_ds_bpermute argument from ac_build_ddxy
Nicolai Hähnle [Wed, 13 Sep 2017 12:38:17 +0000 (14:38 +0200)]
amd/common: remove has_ds_bpermute argument from ac_build_ddxy

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoamd/common: add chip_class to ac_llvm_context
Nicolai Hähnle [Wed, 13 Sep 2017 12:36:23 +0000 (14:36 +0200)]
amd/common: add chip_class to ac_llvm_context

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoamd/common: round cube array slice in ac_prepare_cube_coords
Nicolai Hähnle [Wed, 13 Sep 2017 08:47:02 +0000 (10:47 +0200)]
amd/common: round cube array slice in ac_prepare_cube_coords

The NIR-to-LLVM pass already does this; now the same fix covers
radeonsi as well.

Fixes various tests of
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradeonsi: workaround for gather4 on integer cube maps
Nicolai Hähnle [Wed, 13 Sep 2017 08:20:03 +0000 (10:20 +0200)]
radeonsi: workaround for gather4 on integer cube maps

This is the same workaround that radv already applied in commit
3ece76f03dc0 ("radv/ac: gather4 cube workaround integer").

Fixes dEQP-GLES31.functional.texture.gather.basic.cube.rgba8i/ui.*

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agost/glsl_to_tgsi: fix theoretical memory leak
Nicolai Hähnle [Wed, 13 Sep 2017 16:08:22 +0000 (18:08 +0200)]
st/glsl_to_tgsi: fix theoretical memory leak

It can't *really* happen since we don't use subroutines.

CID: 1417491
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>
6 years agoi965: emit BRW_NEW_AUX_STATE on aux state changes
Iago Toral Quiroga [Fri, 15 Sep 2017 07:13:07 +0000 (09:13 +0200)]
i965: emit BRW_NEW_AUX_STATE on aux state changes

Fixes a regression introduced with b96313c0e1289b296d7, which removed
BRW_NEW_BLORP for a bunch of SURFACE_STATE setup code, including render
targets, on the basis that blorp invalidates binding tables but not
surface states.  However, at least on Broadwell, this caused a regression
in a CTS test, which Ken and Jason tracked down to the fact that we
are not uploading new render target surface states after allocating
new CCS_D surfaces for fast clears (an allocation that is deferred until
an actual clear occurs).

The reason this only fails on BDW is that on SKL+ we use CCS_E, which
is allocated up front so it exists in the initial surface state.  The
problem can be reproduced on these platforms too if we use
INTEL_DEBUG=norcb to force the CCS_D path.

This patch, together with the ones preceding it, fixes the regression
by ensuring that we track and flag as dirty all aux state changes.

Credit goes to Jason and Ken for figuring out the reason for the
regression.

Fixes:
KHR-GL45.transform_feedback.draw_xfb_test

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: emit BRW_NEW_AUX_STATE when we change the fast clear value
Iago Toral Quiroga [Fri, 15 Sep 2017 07:06:11 +0000 (09:06 +0200)]
i965: emit BRW_NEW_AUX_STATE when we change the fast clear value

v2: rename intel_miptree_set_clear_value to intel_miptree_set_clear_color
    (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: emit BRW_NEW_AUX_STATE if we drop the aux surface
Iago Toral Quiroga [Thu, 14 Sep 2017 09:06:59 +0000 (11:06 +0200)]
i965: emit BRW_NEW_AUX_STATE if we drop the aux surface

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: rename BRW_NEW_FAST_CLEAR_COLOR to BRW_NEW_AUX_STATE
Iago Toral Quiroga [Thu, 14 Sep 2017 08:06:33 +0000 (10:06 +0200)]
i965: rename BRW_NEW_FAST_CLEAR_COLOR to BRW_NEW_AUX_STATE

We want to use this flag to signal changes to the aux surfaces,
so let's not make it about fast clearing only. Suggested by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agodocs: update calendar, add news item and link release notes for 17.2.1
Emil Velikov [Sun, 17 Sep 2017 23:16:42 +0000 (00:16 +0100)]
docs: update calendar, add news item and link release notes for 17.2.1

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
6 years agodocs: add sha256 checksums for 17.2.1
Emil Velikov [Sun, 17 Sep 2017 23:12:36 +0000 (00:12 +0100)]
docs: add sha256 checksums for 17.2.1

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit bd903d4ee15333288848708a60d6c8002cbb5cb1)

6 years agodocs: add release notes for 17.2.1
Emil Velikov [Sun, 17 Sep 2017 22:57:32 +0000 (23:57 +0100)]
docs: add release notes for 17.2.1

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit d6d2b6b5ec9b1638c0827582872670c7da79bb53)

6 years agodocs: update sourcetree following omx rename
Eric Engestrom [Sat, 16 Sep 2017 22:56:08 +0000 (23:56 +0100)]
docs: update sourcetree following omx rename

Fixes: 6a8aa11c207b99920b93 "st/omx_bellagio: Rename state tracker and option"
Cc: Gurkirpal Singh <gurkirpal204@gmail.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Andres Gomez <agomez@igalia.com>
6 years agogbm: Add gbm_device_get_format_modifier_plane_count to test
Gert Wollny [Sat, 16 Sep 2017 16:03:16 +0000 (18:03 +0200)]
gbm: Add gbm_device_get_format_modifier_plane_count to test

Adding gbm_device_get_format_modifier_plane_count made the
test gbm-symbols-check fail.  This patch adds the corresponding
function name to the test.

Fixes: 8824141b8d48d9120ddbf542d6fb661046c41c62
 (gbm: Add a gbm_device_get_format_modifier_plane_count function)

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Andres Gomez <agomez@igalia.com>
6 years agotravis: replace omx feature flag with omx-bellagio one
Andres Gomez [Sat, 16 Sep 2017 17:23:56 +0000 (20:23 +0300)]
travis: replace omx feature flag with omx-bellagio one

Fixes: 6a8aa11c207 ("st/omx_bellagio: Rename state tracker and
option")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Cc: Gurkirpal Singh <gurkirpal204@gmail.com>
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
6 years agodocs/submittingpatches: add 'test each commit' instructions
Eric Engestrom [Fri, 15 Sep 2017 17:10:57 +0000 (17:10 +0000)]
docs/submittingpatches: add 'test each commit' instructions

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
6 years agoradv: Add support for more DCC compression with VK_KHR_image_format_list.
Bas Nieuwenhuizen [Fri, 15 Sep 2017 22:30:18 +0000 (00:30 +0200)]
radv: Add support for more DCC compression with VK_KHR_image_format_list.

Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradv: Add code to check if two formats can share DCC metadata.
Bas Nieuwenhuizen [Fri, 15 Sep 2017 21:39:45 +0000 (23:39 +0200)]
radv: Add code to check if two formats can share DCC metadata.

Ported from radeonsi.

Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoi965: Add an INTEL_DEBUG=reemit option.
Kenneth Graunke [Sat, 16 Sep 2017 00:47:07 +0000 (17:47 -0700)]
i965: Add an INTEL_DEBUG=reemit option.

Jason and I use this for debugging all the time.  Recompiling the driver
to enable it is kind of annoying.  It's a great thing to try along with
always_flush_batch=true and always_flush_cache=true to detect a class of
problems - namely, atoms listening to an insufficient set of dirty bits.

Reviewed-by: Matt Turner <mattst88@gmail.com>
6 years agoclover: Fix build after LLVM r313390
Jan Vesely [Sat, 16 Sep 2017 00:34:42 +0000 (20:34 -0400)]
clover: Fix build after LLVM r313390

v2: pass llvm context reference instead of a pointer

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
6 years agoradv: Don't redundantly emit pipelines after secondary cmd buffer.
Bas Nieuwenhuizen [Tue, 12 Sep 2017 22:12:48 +0000 (00:12 +0200)]
radv: Don't redundantly emit pipelines after secondary cmd buffer.

Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradv: Check for GFX9 for 1D arrays in image_size intrinsic.
Bas Nieuwenhuizen [Fri, 15 Sep 2017 19:40:00 +0000 (21:40 +0200)]
radv: Check for GFX9 for 1D arrays in image_size intrinsic.

We implement them as 2D images only on GFX9.

This fixes:
dEQP-VK.image.image_size.1d_array.readonly_12x34
dEQP-VK.image.image_size.1d_array.readonly_1x1
dEQP-VK.image.image_size.1d_array.readonly_32x32
dEQP-VK.image.image_size.1d_array.readonly_7x1
dEQP-VK.image.image_size.1d_array.readonly_writeonly_12x34
dEQP-VK.image.image_size.1d_array.readonly_writeonly_1x1
dEQP-VK.image.image_size.1d_array.readonly_writeonly_32x32
dEQP-VK.image.image_size.1d_array.readonly_writeonly_7x1
dEQP-VK.image.image_size.1d_array.writeonly_12x34
dEQP-VK.image.image_size.1d_array.writeonly_1x1
dEQP-VK.image.image_size.1d_array.writeonly_32x32
dEQP-VK.image.image_size.1d_array.writeonly_7x1

Fixes: 1bcb953e166 "radv: handle GFX9 1D textures"
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoi965: drop unused variables
Eric Engestrom [Fri, 15 Sep 2017 17:11:11 +0000 (18:11 +0100)]
i965: drop unused variables

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965/tex: Unify the TexImage and TexSubImage code
Jason Ekstrand [Wed, 31 May 2017 20:48:10 +0000 (13:48 -0700)]
i965/tex: Unify the TexImage and TexSubImage code

It's nearly the same so there's no good reason why it can't be in a
common function.  The one difference is that _mesa_store_teximage
calls AllocTextureImageBuffer for us, while _mesa_store_texsubimage
doesn't, but we don't need that anyway - intelTexImage already does it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
6 years agoi965/tex: Remove the for_glTexImage parameter from texsubimage_tiled_memcpy
Jason Ekstrand [Wed, 31 May 2017 20:43:54 +0000 (13:43 -0700)]
i965/tex: Remove the for_glTexImage parameter from texsubimage_tiled_memcpy

It is set to false in both callers.  It isn't needed for glTexImage
because intelTexImage calls AllocTextureImageBuffer before calling
texsubimage_tiled_memcpy.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
6 years agoi965/tex: Make a couple of helpers static
Jason Ekstrand [Wed, 31 May 2017 20:35:30 +0000 (13:35 -0700)]
i965/tex: Make a couple of helpers static

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
6 years agoi965: Move TexSubImage functions to intel_tex_image.c
Jason Ekstrand [Wed, 31 May 2017 20:32:29 +0000 (13:32 -0700)]
i965: Move TexSubImage functions to intel_tex_image.c

These two paths are basically the same.  There's no good reason to have
them in different files.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
6 years agoi965/blorp: Set r8stencil_needs_update when writing stencil
Jason Ekstrand [Sat, 17 Jun 2017 20:50:30 +0000 (13:50 -0700)]
i965/blorp: Set r8stencil_needs_update when writing stencil

This fixes a crash on Haswell when we try to upload a stencil texture
with blorp.  It would also be a problem if someone tried to texture from
stencil after glBlitFramebuffers.

Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
6 years agoutil/u_atomic: Add implementation of __sync_val_compare_and_swap_8
Matt Turner [Thu, 14 Sep 2017 18:00:26 +0000 (11:00 -0700)]
util/u_atomic: Add implementation of __sync_val_compare_and_swap_8

Needed for 32-bit PowerPC.

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Fixes: a6a38a038bd ("util/u_atomic: provide 64bit atomics where
they're missing")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
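
For readers unfamiliar with what such a fallback has to do, here is a minimal sketch of a lock-based 64-bit compare-and-swap for targets without a native instruction. It is not the Mesa implementation: the function name fallback_val_compare_and_swap_8 and the choice of a C11 atomic_flag spinlock are assumptions made to keep the example self-contained.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One global lock guards every emulated 64-bit operation. */
static atomic_flag cas64_lock = ATOMIC_FLAG_INIT;

/* Returns the value found at *ptr; stores `desired` only if that value
 * equals `expected`. This mirrors the contract of the
 * __sync_val_compare_and_swap builtins. */
static uint64_t fallback_val_compare_and_swap_8(uint64_t *ptr,
                                                uint64_t expected,
                                                uint64_t desired)
{
   uint64_t old;

   /* Spin until the lock is acquired. */
   while (atomic_flag_test_and_set_explicit(&cas64_lock,
                                            memory_order_acquire))
      ;

   old = *ptr;
   if (old == expected)
      *ptr = desired;

   atomic_flag_clear_explicit(&cas64_lock, memory_order_release);
   return old;
}
```

An emulation like this is only atomic with respect to other accesses that take the same lock, so every 64-bit atomic operation on the platform has to go through the same fallback.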
6 years agoutil: Link libmesautil into u_atomic_test
Matt Turner [Thu, 14 Sep 2017 17:48:57 +0000 (10:48 -0700)]
util: Link libmesautil into u_atomic_test

Platforms without particular atomic operations require the
implementations in u_atomic.c

Cc: "17.2" <mesa-stable@lists.freedesktop.org>
Fixes: a6a38a038bd ("util/u_atomic: provide 64bit atomics where
they're missing")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agovulkan: update headers & registry to VK 1.0.61
Lionel Landwerlin [Fri, 15 Sep 2017 14:10:53 +0000 (15:10 +0100)]
vulkan: update headers & registry to VK 1.0.61

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoautomake: enable libunwind in `make distcheck'
Emil Velikov [Mon, 11 Sep 2017 17:13:55 +0000 (18:13 +0100)]
automake: enable libunwind in `make distcheck'

Enable the toggle to catch when the library is missing from the link
path. Better to test, fail and address before releasing Mesa ;-)

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
6 years agotravis: Add libunwind-dev to gallium/make builds
Gert Wollny [Thu, 14 Sep 2017 10:27:42 +0000 (12:27 +0200)]
travis: Add libunwind-dev to gallium/make builds

libunwind is an optional dependency used by the gallium aux module
(libgallium) and consequently the final binaries must be linked against
it. To test whether the library is properly specified in the link pass
add it to the travis-ci build environment and force its use.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agotravis: force llvm-3.3 for "make Gallium ST Other"
Gert Wollny [Thu, 14 Sep 2017 10:27:41 +0000 (12:27 +0200)]
travis: force llvm-3.3 for "make Gallium ST Other"

In Ubuntu Trusty the default version of llvm is 3.4 and the build was
actually randomly picking 3.5 or 3.9. Adding libunwind would then result
in build success or failure depending on which version was picked.

Install the llvm-3.3-dev package and force its use: On one hand it is
the minimum required version we want to build-test against, and on
the other hand forcing the version stabilizes the build.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agomesa/st/tests: Correct build flags and force -std=c++11
Gert Wollny [Wed, 13 Sep 2017 13:03:34 +0000 (15:03 +0200)]
mesa/st/tests: Correct build flags and force -std=c++11

Include src/gallium/Automake.inc, correct the build flags accordingly.

Force -std=c++11 (extensively used by the test) as otherwise it gets
defined only when building against llvm >= 3.9.

Fixes: 7be6d8fe12 ("mesa/st: glsl_to_tgsi: add tests for the new
temporary lifetime tracker")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102665
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v1)
6 years agoautomake: include radv_shader.h in the sources list
Emil Velikov [Fri, 15 Sep 2017 12:40:22 +0000 (13:40 +0100)]
automake: include radv_shader.h in the sources list

Otherwise it will be missing from the tarball, leading to a build failure.

Fixes: d4d777317b9 ("radv: move shaders related code to radv_shader.c")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
6 years agost/omx_bellagio: Rename state tracker and option
Gurkirpal Singh [Sat, 12 Aug 2017 16:07:15 +0000 (21:37 +0530)]
st/omx_bellagio: Rename state tracker and option

Changes --enable-omx option to --enable-omx-bellagio

Signed-off-by: Gurkirpal Singh <gurkirpal204@gmail.com>
Reviewed-and-Tested-by: Julien Isorce <julien.iso...@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
6 years agoi965: fix build warning on clang
Tapani Pälli [Thu, 14 Sep 2017 07:26:39 +0000 (10:26 +0300)]
i965: fix build warning on clang

Fixes the following warning:
   warning: format specifies type 'long' but the argument has type 'uint64_t' (aka 'unsigned long long')

A cast is needed to avoid this change turning into another warning:
   warning: format specifies type 'unsigned long long' but the argument has type 'uint64_t' (aka 'unsigned long')
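
The two warnings arise because uint64_t is 'unsigned long' on some targets
and 'unsigned long long' on others, so no single plain length modifier
matches both. A minimal sketch of two portable options follows. The helper
names are hypothetical, and the PRIu64 variant is an alternative shown for
comparison, not something this commit is known to use.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Option 1: cast to a type whose length modifier is fixed by the C
 * standard.  Returns the number of characters snprintf wanted to write. */
static int
format_u64_with_cast(char *buf, size_t size, uint64_t value)
{
   assert(buf != NULL);
   return snprintf(buf, size, "%llu", (unsigned long long) value);
}

/* Option 2: let <inttypes.h> supply the right conversion for uint64_t. */
static int
format_u64_with_macro(char *buf, size_t size, uint64_t value)
{
   assert(buf != NULL);
   return snprintf(buf, size, "%" PRIu64, value);
}

/* Both options should produce the same text for any value. */
static bool
formats_agree(uint64_t value)
{
   char a[32], b[32];

   format_u64_with_cast(a, sizeof(a), value);
   format_u64_with_macro(b, sizeof(b), value);
   return strcmp(a, b) == 0;
}
```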

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years agoradv: fix a potential crash if attachments allocation failed
Samuel Pitoiset [Thu, 14 Sep 2017 16:47:04 +0000 (18:47 +0200)]
radv: fix a potential crash if attachments allocation failed

Also, it's useless to set the error code twice. Though, we
should probably skip the next commands when the command buffer
is considered invalid.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: dump the device name into the hang report
Samuel Pitoiset [Thu, 14 Sep 2017 09:25:24 +0000 (11:25 +0200)]
radv: dump the device name into the hang report

Similar to RadeonSI renderer string.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: add get_chip_name() callback
Samuel Pitoiset [Thu, 14 Sep 2017 09:25:23 +0000 (11:25 +0200)]
radv: add get_chip_name() callback

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agor600: add .gitignore for egd_tables.h
Dave Airlie [Fri, 15 Sep 2017 03:52:22 +0000 (13:52 +1000)]
r600: add .gitignore for egd_tables.h

6 years agoradeonsi: enable STD430 packing of UBOs by default
Timothy Arceri [Thu, 14 Sep 2017 22:22:33 +0000 (08:22 +1000)]
radeonsi: enable STD430 packing of UBOs by default

Before this change we were defaulting to STD140 which is slightly
less efficient at packing arrays.
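
To make the difference concrete: std140 rounds the stride of every array
element up to the alignment of a vec4 (16 bytes), while std430 does not. A
minimal sketch of the stride rule for arrays of scalars and vectors
follows. The helper names are hypothetical, and matrices and structs are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Round value up to a power-of-two alignment. */
static size_t
align_up(size_t value, size_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Byte stride between consecutive elements of an array whose element is a
 * scalar or vector with the given base alignment and size. */
static size_t
array_stride(size_t elem_base_align, size_t elem_size, bool use_std140)
{
   /* std140 raises the element alignment to that of a vec4 (16 bytes). */
   size_t align = use_std140 ? align_up(elem_base_align, 16)
                             : elem_base_align;

   return align_up(elem_size, align);
}
```

For a float array (base alignment 4, size 4) this gives a stride of 16
bytes under std140 and 4 bytes under std430, so a float[4] occupies 64
bytes in the first case and 16 in the second.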

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agost/mesa: set UseSTD430AsDefaultPacking const based on CAP
Timothy Arceri [Thu, 14 Sep 2017 22:21:22 +0000 (08:21 +1000)]
st/mesa: set UseSTD430AsDefaultPacking const based on CAP

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agogallium: introduce PIPE_CAP_LOAD_CONSTBUF
Timothy Arceri [Thu, 17 Aug 2017 10:12:42 +0000 (20:12 +1000)]
gallium: introduce PIPE_CAP_LOAD_CONSTBUF

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: make use of LOAD for UBOs
Timothy Arceri [Thu, 17 Aug 2017 03:29:54 +0000 (13:29 +1000)]
radeonsi: make use of LOAD for UBOs

v2: always set can_speculate and allow_smem to true

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa/st: add LOAD support for UBOs
Timothy Arceri [Tue, 25 Jul 2017 03:08:36 +0000 (13:08 +1000)]
mesa/st: add LOAD support for UBOs

This will allow us to use STD430 packing by default if the driver
supports it.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agomesa/st: create add_buffer_to_load_and_stores() helper
Timothy Arceri [Thu, 17 Aug 2017 10:29:27 +0000 (20:29 +1000)]
mesa/st: create add_buffer_to_load_and_stores() helper

Will be used to add LOAD support to UBOs.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agogallium: add CONSTBUF type to tgsi_file_type
Timothy Arceri [Thu, 17 Aug 2017 06:50:01 +0000 (16:50 +1000)]
gallium: add CONSTBUF type to tgsi_file_type

This will be used to distinguish between load types when using
the TGSI_OPCODE_LOAD opcode.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agovirgl: drop const dimensions on first block.
Dave Airlie [Tue, 12 Sep 2017 23:23:15 +0000 (09:23 +1000)]
virgl: drop const dimensions on first block.

The virgl protocol version of tgsi doesn't handle this yet,
transform it back to the old ways.

Thanks to Nicolai Hähnle <nicolai.haehnle@amd.com>
for also writing nearly the same patch.

Fixes: 41e342d5 tgsi/ureg: always emit constants (and their decls) as 2D
Tested-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agost/glsl->tgsi: fix u64 to bool comparisons.
Dave Airlie [Thu, 14 Sep 2017 04:03:19 +0000 (05:03 +0100)]
st/glsl->tgsi: fix u64 to bool comparisons.

Otherwise we end up using a 32-bit comparison, which doesn't end well.
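
A minimal sketch of how a 32-bit comparison can give the wrong answer for
a 64-bit value, assuming the narrow comparison only sees the low word. The
function names are hypothetical, and the failure in the TGSI backend may
have differed in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy pattern: only the low 32 bits take part in the comparison. */
static bool
u64_to_bool_truncating(uint64_t value)
{
   return (uint32_t) value != 0;
}

/* Intended behavior: compare all 64 bits against zero. */
static bool
u64_to_bool(uint64_t value)
{
   return value != 0;
}
```

Any value whose low word is zero but whose high word is not, such as
1 << 32, converts to false with the first function and true with the
second.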

Timothy caught this while playing around with some opt passes.

Fixes: 278580729a (st/glsl_to_tgsi: add support for 64-bit integers)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoi965: Print size of validation and relocation lists in INTEL_DEBUG=flush
Kenneth Graunke [Wed, 6 Sep 2017 17:55:07 +0000 (10:55 -0700)]
i965: Print size of validation and relocation lists in INTEL_DEBUG=flush

It's nice to have this information.  While we're at it, tweak the
formatting to try and vertically align numbers in the common case.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Disentangle batch and state buffer flushing.
Kenneth Graunke [Tue, 5 Sep 2017 22:14:18 +0000 (15:14 -0700)]
i965: Disentangle batch and state buffer flushing.

We now flush the batch when either the batchbuffer or statebuffer
reaches the original intended batch size, instead of when the sum of
the two reaches a certain size (which makes no sense now that they're
separate buffers).

With this change, we also need to update our "are we near the end?"
estimate to require separate batch and state buffer space.  I obtained
these estimates by looking at the size of draw calls in the Unreal 4
Elemental Demo (using INTEL_DEBUG=flush and always_flush_batch=true).

This will significantly impact the size of our batches.  I've adjusted
both down to try and be roughly similar to what we had been doing.  On
various benchmarks, a 20kB batch and 16kB statebuffer seemed to be about
right, but we may need to adjust this further.  I tried a 16kB batch,
but that regressed Synmark OglMultithread performance by a fair bit.
32kB for both would have significantly increased our batch sizes.
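
A minimal sketch of the two flush policies, using the 20kB and 16kB
figures quoted above. The function names, macro names and the combined
limit passed to the old policy are hypothetical and only illustrate the
shape of the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_FLUSH_SIZE (20 * 1024)   /* figure from the text above */
#define STATE_FLUSH_SIZE (16 * 1024)   /* figure from the text above */

/* Old policy: flush when the sum of both buffers reaches one limit. */
static bool
should_flush_combined(size_t batch_used, size_t state_used, size_t limit)
{
   assert(limit > 0);
   return batch_used + state_used >= limit;
}

/* New policy: each buffer is checked against its own limit. */
static bool
should_flush_separate(size_t batch_used, size_t state_used)
{
   return batch_used >= BATCH_FLUSH_SIZE || state_used >= STATE_FLUSH_SIZE;
}
```

With 19kB of commands and 15kB of state, the separate policy keeps going,
whereas a combined limit of 32kB (an assumed value) would already have
flushed.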

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Delete BATCH_RESERVED handling.
Kenneth Graunke [Tue, 5 Sep 2017 22:03:48 +0000 (15:03 -0700)]
i965: Delete BATCH_RESERVED handling.

Now that we can grow the batchbuffer if we absolutely need the extra
space, we don't need to reserve space for the final do-or-die ending
commands.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Make BLORP properly avoid batch wrapping.
Kenneth Graunke [Tue, 5 Sep 2017 22:57:41 +0000 (15:57 -0700)]
i965: Make BLORP properly avoid batch wrapping.

We need to set brw->no_batch_wrap to actually avoid flushing in the
middle of our BLORP operation, and instead grow the batchbuffer.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Grow the batch/state buffers if we need space and can't flush.
Kenneth Graunke [Fri, 1 Sep 2017 23:42:56 +0000 (16:42 -0700)]
i965: Grow the batch/state buffers if we need space and can't flush.

Previously, we would just assert fail and die in this case.  The only
safeguard is the "estimated max prim size" checks when starting a draw
(or compute dispatch or BLORP operation)...which are woefully broken.

Growing is fairly straightforward:

1. Allocate a new larger BO.
2. memcpy the existing contents over to the new buffer
3. Set the new BO to the same GTT offset as the old BO.  When emitting
   relocations, we write the presumed GTT offset of the target BO.  If
   we changed it, we'd have to update all the existing values (by
   walking the relocation list and looking at offsets), which is more
   expensive.  With the old BO freed, ideally the kernel could simply
   place the new BO at that offset anyway.
4. Update the validation list to contain the new BO.
5. Update the relocation list to have the GEM handle for the new BO
   (which we can skip if using I915_EXEC_HANDLE_LUT).
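
Steps 1 to 3 can be sketched in plain C for a malloc'd buffer. The struct
and function names are hypothetical, and the kernel-facing steps 4 and 5
(validation and relocation lists) are only noted in a comment because they
depend on driver state not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a batch or state buffer. */
struct toy_buffer {
   uint8_t *map;         /* CPU-visible contents */
   size_t size;          /* bytes allocated */
   size_t used;          /* bytes written so far */
   uint64_t gtt_offset;  /* presumed GPU address written into relocations */
};

/* Grow the buffer to new_size bytes.  Returns 0 on success, -1 if the
 * allocation fails (the buffer is then left untouched). */
static int
toy_buffer_grow(struct toy_buffer *buf, size_t new_size)
{
   uint8_t *new_map;

   assert(buf != NULL && new_size > buf->size);

   /* Step 1: allocate a new, larger buffer. */
   new_map = malloc(new_size);
   if (new_map == NULL)
      return -1;

   /* Step 2: copy the existing contents over. */
   memcpy(new_map, buf->map, buf->used);
   free(buf->map);
   buf->map = new_map;
   buf->size = new_size;

   /* Step 3: gtt_offset is deliberately left unchanged, so values already
    * written into relocations stay valid.  Steps 4 and 5 would update the
    * validation and relocation lists here. */
   return 0;
}
```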

v2: Update to handle malloc'd shadow buffers.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Use a separate state buffer, but avoid changing flushing behavior.
Kenneth Graunke [Wed, 30 Aug 2017 08:37:24 +0000 (01:37 -0700)]
i965: Use a separate state buffer, but avoid changing flushing behavior.

Previously, we emitted GPU commands and indirect state into the same
buffer, using a stack/heap like system where we filled in commands from
the start of the buffer, and state from the end of the buffer.  We then
flushed before the two met in the middle.
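
A minimal sketch of such a two-ended allocator, with commands growing up
from offset 0 and state growing down from the end. The names are
hypothetical, and alignment of state allocations is ignored for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single buffer shared by commands and indirect state. */
struct toy_batch {
   size_t size;         /* total bytes in the buffer */
   size_t cmd_end;      /* commands occupy [0, cmd_end) */
   size_t state_start;  /* state occupies [state_start, size) */
};

/* Append command bytes at the front.  Fails if they would reach the state. */
static bool
toy_emit_commands(struct toy_batch *b, size_t bytes)
{
   assert(b->cmd_end <= b->state_start);
   if (bytes > b->state_start - b->cmd_end)
      return false;
   b->cmd_end += bytes;
   return true;
}

/* Carve state bytes off the back.  On success, *offset receives the start
 * of the new allocation.  Fails if it would reach the commands. */
static bool
toy_alloc_state(struct toy_batch *b, size_t bytes, size_t *offset)
{
   assert(b->cmd_end <= b->state_start);
   if (bytes > b->state_start - b->cmd_end)
      return false;
   b->state_start -= bytes;
   *offset = b->state_start;
   return true;
}
```

A false return corresponds to the two regions meeting in the middle, which
is the case the caller has to prevent by flushing beforehand.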

Meeting in the middle is fatal, so you have to be certain that you
reserve the correct amount of space before emitting commands or state
for a draw.  Currently, we will assert !no_batch_wrap and die if the
estimate is ever too small.  This has been mercifully obscure, but has
happened on a number of occasions, and could in theory happen to any
application that issues a large draw at just the wrong time.

Estimating the amount of batch space required is painful - it's hard to
get right, and getting it right involves a lot of code that would burn
CPU time, and also be painful to maintain.  Rolling back to a saved
state and retrying is also painful - failing to save/restore all the
required state will break things, and redoing state emission burns a
lot of CPU.  memcpy'ing to a new batch and continuing is painful,
because commands we issue for a draw depend on earlier commands as well
(such as STATE_BASE_ADDRESS, or the GPU being in a particular state).

The best plan is to never run out of space, which is totally doable but
pretty wasteful - a pessimal draw requires a huge amount of space, and
rarely occurs.  Instead, we'd like to grow the batch buffer if we need
more space and can't safely flush.

We can't grow with a meet in the middle approach - we'd have to move the
state to the end, which would mean updating every offset from dynamic
state base address.  Using separate batch and state buffers, where both
fill starting at the beginning, makes it easy to grow either as needed.

This patch separates the two concepts.  We create a separate state
buffer, with a second relocation list, and use that for brw_state_batch.

However, this patch tries to retain the original flushing behavior - it
adds the amount of batch and state space together, as if they were still
co-existing in a single buffer.  The hope is to flush at the same time
as before.  This is necessary to avoid provoking bugs caused by broken
batch wrap handling (which we'll fix shortly).  It also avoids suddenly
increasing the size of the batch (due to state not taking up space),
which could have a significant performance impact.  We'll tune it later.

v2:
- Mark the statebuffer with EXEC_OBJECT_CAPTURE when supported (caught
  by Chris).  Unfortunately, we lose the ability to capture state data
  on older kernels.
- Continue to support the malloc'd shadow buffers.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Pass screen to intel_batchbuffer_reset().
Kenneth Graunke [Fri, 8 Sep 2017 06:43:46 +0000 (23:43 -0700)]
i965: Pass screen to intel_batchbuffer_reset().

This will let us access screen->kernel_features in the next patch.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Prepare INTEL_DEBUG=bat decoding for a separate statebuffer.
Kenneth Graunke [Wed, 30 Aug 2017 08:37:24 +0000 (01:37 -0700)]
i965: Prepare INTEL_DEBUG=bat decoding for a separate statebuffer.

We'll need to read from both buffers when decoding state.

This also drops the "failed to map" fallback - it's completely useless
on LLC systems where we write directly to the mapped BO.  It's not that
useful on non-LLC systems either.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Split brw_emit_reloc into brw_batch_reloc and brw_state_reloc.
Kenneth Graunke [Thu, 31 Aug 2017 20:10:19 +0000 (13:10 -0700)]
i965: Split brw_emit_reloc into brw_batch_reloc and brw_state_reloc.

brw_batch_reloc emits a relocation from the batchbuffer to elsewhere.
brw_state_reloc emits a relocation from the statebuffer to elsewhere.

For now, they do the same thing, but when we actually split the two
buffers, we'll change brw_state_reloc to use the state buffer.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Refactor relocs into a brw_reloc_list structure.
Kenneth Graunke [Thu, 31 Aug 2017 18:57:01 +0000 (11:57 -0700)]
i965: Refactor relocs into a brw_reloc_list structure.

I'm planning on splitting batch and state into separate buffers, at
which point we'll need two relocation lists.  In preparation for that,
this patch refactors the relocation stuff into a structure we can
replicate...which looks a lot like anv_reloc_list.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
6 years agoi965: Move brw_state_batch code to intel_batchbuffer.c
Kenneth Graunke [Wed, 30 Aug 2017 08:40:00 +0000 (01:40 -0700)]
i965: Move brw_state_batch code to intel_batchbuffer.c

The batch buffer and state buffer code is fairly tied together,
and having it in one .c file will make refactoring easier.

Also, drop some commentary above brw_state_batch.  The "aperture
checking performance hacks" are long since gone, so that paragraph
makes little sense at this point.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Drop a useless ret == 0 check.
Kenneth Graunke [Wed, 30 Aug 2017 07:47:03 +0000 (00:47 -0700)]
i965: Drop a useless ret == 0 check.

Prior to the previous patch, we would pwrite the batchbuffer contents,
and wanted to skip the execbuffer if that failed.  Now that we memcpy,
we don't set ret != 0 on failure anymore, so it will always be 0.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
6 years agoi965: Use a WC map and memcpy for the batch instead of pwrite.
Kenneth Graunke [Fri, 8 Sep 2017 22:00:14 +0000 (15:00 -0700)]
i965: Use a WC map and memcpy for the batch instead of pwrite.

We'd like to eliminate the malloc'd shadow copy eventually, but there
are still unresolved performance problems.  In the meantime, let's at
least get rid of pwrite.

On Apollolake, improves Synmark OglBatch6 performance by:
1.53581% +/- 0.269589% (n=108).

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>