Mathias Fröhlich [Sat, 3 Feb 2018 18:42:20 +0000 (19:42 +0100)]
vbo: Use _DrawVAO for array type draw commands.
Switch over to use the _DrawVAO for all the array type draws.
The _DrawVAO needs to be set before we enter _mesa_update_state, so move
setting the draw method in front of the first call to _mesa_update_state
which is in turn called from the *validate*Draw* calls. Using the
gl_vertex_array_object::_Enabled bitmask, gl_vertex_program_state::_VPMode
and gl_vertex_array_object::_AttributeMapMode we can already set
varying_vp_inputs before we call _mesa_update_state the first time.
Thus remove duplicate state validation.
v2: Update comments.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 3 Feb 2018 16:19:24 +0000 (17:19 +0100)]
vbo: Implement method to track the inputs array.
Provided the _DrawVAO and the derived state that is maintained if we have
the _DrawVAO set, implement a method to incrementally update the array of
gl_vertex_array input pointers.
v2: Add some more comments.
Rename _vbo_array_init to _vbo_init_inputs.
Rename vbo_context::arrays to vbo_context::draw_arrays.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Wed, 24 Aug 2016 06:45:05 +0000 (08:45 +0200)]
mesa: Introduce a yet unused _DrawVAO.
During the patch series this VAO gets populated with either the currently
bound VAO or an internal VAO that will be used for immediate mode and
dlist rendering.
v2: More comments about the _DrawVAO, filter and enabled mask.
Rename _DrawVAOEnabled to _DrawVAOEnabledAttribs.
v3: Fix and move comment.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 3 Feb 2018 09:44:10 +0000 (10:44 +0100)]
vbo: Remove get_vp_mode() and enum vp_mode.
Is now unused.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 3 Feb 2018 09:42:01 +0000 (10:42 +0100)]
vbo: Use _VPMode instead of get_vp_mode().
At those places where we used get_vp_mode() use
gl_vertex_program_state::_VPMode instead.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Fri, 2 Feb 2018 20:31:27 +0000 (21:31 +0100)]
mesa: Provide an alternative to get_vp_mode()
To get information equivalent to get_vp_mode(), track the vertex
processing mode in a per context variable at
gl_vertex_program_state::_VPMode.
This aims to replace get_vp_mode() as seen in the vbo module.
But unlike the get_vp_mode() implementation, which only gives correct
answers after calling _mesa_update_state(), this context variable is
immediately tracked when the vertex processing state is modified. The
correctness of this value is asserted on state validation.
With this in place we should be able to untangle the dependency with
varying_vp_inputs and state invalidation.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
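The eager tracking described in the commit above can be illustrated with a minimal sketch. All names here (ctx_sketch, vp_mode_sketch, set_vertex_program_enabled) are hypothetical stand-ins and not the real Mesa types or functions; the only point is that the mode is updated at the moment the state changes, and validation merely asserts that it is still correct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real Mesa definitions. */
enum vp_mode_sketch { VP_MODE_FIXED_FUNC, VP_MODE_SHADER };

struct ctx_sketch {
   bool vertex_program_enabled;  /* state the application toggles */
   enum vp_mode_sketch vp_mode;  /* tracked eagerly, in the spirit of _VPMode */
};

/* Derive the mode from scratch; used by setters and for cross-checking. */
static enum vp_mode_sketch
compute_vp_mode(const struct ctx_sketch *ctx)
{
   return ctx->vertex_program_enabled ? VP_MODE_SHADER : VP_MODE_FIXED_FUNC;
}

/* Every state setter refreshes the tracked mode immediately. */
static void
set_vertex_program_enabled(struct ctx_sketch *ctx, bool enabled)
{
   assert(ctx != NULL);
   ctx->vertex_program_enabled = enabled;
   ctx->vp_mode = compute_vp_mode(ctx);
}

/* State validation no longer computes the mode, it only checks it. */
static void
validate_state(const struct ctx_sketch *ctx)
{
   assert(ctx->vp_mode == compute_vp_mode(ctx));
}
```

Callers that run before state validation can read ctx->vp_mode directly, which is what removes the ordering dependency the commit message mentions.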
Ilia Mirkin [Thu, 22 Feb 2018 04:32:49 +0000 (23:32 -0500)]
nv50,nvc0: fix integer MS resolves using 2d engine
We don't want filtering for integer textures, same as depth/stencil.
Fixes: KHR-GL45.direct_state_access.renderbuffers_storage_multisample
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
Ilia Mirkin [Wed, 21 Feb 2018 05:10:24 +0000 (00:10 -0500)]
nvc0: fix writing query results into buffer
We need to mark the range as valid, and validate the resource using a
helper to ensure that the buffer status is marked properly.
Fixes some CTS pipeline stats query tests, and
KHR-GL45.direct_state_access.queries_functional
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
Ilia Mirkin [Wed, 21 Feb 2018 04:17:31 +0000 (23:17 -0500)]
nv50,nvc0: fix clear buffer acceleration
Two things were off:
- valid range was not updated, which could affect waiting for future
maps
- fencing was done manually instead of using the *_resource_validate
helper, which resulted in a missed dirty buffer flag being set
Fixes: KHR-GL45.direct_state_access.buffers_clear
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Karol Herbst <kherbst@redhat.com>
Lionel Landwerlin [Wed, 7 Feb 2018 10:48:32 +0000 (10:48 +0000)]
i965: perf: ensure reading config IDs from sysfs isn't interrupted
Fixes: 458468c136e "i965: Expose OA counters via INTEL_performance_query"
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
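A common way to keep a short read from being cut off by a signal is to retry while errno is EINTR. The sketch below shows that pattern with a hypothetical helper, read_retry_eintr(); whether the commit above uses exactly this loop is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len - 1 bytes from fd and NUL-terminate
 * the result, retrying whenever a signal interrupts the call. */
static ssize_t
read_retry_eintr(int fd, char *buf, size_t len)
{
   ssize_t n;

   assert(buf != NULL && len > 0);

   do {
      n = read(fd, buf, len - 1);
   } while (n < 0 && errno == EINTR);

   if (n >= 0)
      buf[n] = '\0';

   return n;
}
```

Any other error, or end of file, is returned to the caller unchanged.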
Bas Nieuwenhuizen [Fri, 23 Feb 2018 00:42:07 +0000 (01:42 +0100)]
radv: Fix autotools build.
Somewhere along the way the Makefile changes got lost ...
Fixes: 4db78f3a6b "radv: Put supported extensions in a struct."
Acked-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 11 Feb 2018 13:38:42 +0000 (14:38 +0100)]
radv: Return NULL for entrypoints when not supported.
This implements strict checking for the entrypoint ProcAddr
functions.
- InstanceProcAddr with instance = NULL, only returns the 3 allowed
entrypoints.
- DeviceProcAddr does not return any instance entrypoints.
- InstanceProcAddr does not return non-supported or disabled
instance entrypoints.
- DeviceProcAddr does not return non-supported or disabled device
entrypoints.
- InstanceProcAddr still returns non-supported device entrypoints.
Reviewed-by: Dave Airlie <airlied@redhat.com>
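The lookup rules listed in the commit above can be sketched as two small filters over one entrypoint table. The table contents, the level enum and both function names are hypothetical; the real code is generated by radv_entrypoints_gen.py and is not reproduced here. Keeping global entrypoints visible when an instance is given is also an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of an entrypoint. */
enum level_sketch { LEVEL_GLOBAL, LEVEL_INSTANCE, LEVEL_DEVICE };

struct entrypoint_sketch {
   const char *name;
   enum level_sketch level;
   bool enabled;  /* supported and enabled by core version or extension */
};

/* Illustrative table; the names are made up. */
static const struct entrypoint_sketch table[] = {
   { "createInstance", LEVEL_GLOBAL,   true  },
   { "instanceFoo",    LEVEL_INSTANCE, true  },
   { "instanceExtBar", LEVEL_INSTANCE, false },
   { "deviceFoo",      LEVEL_DEVICE,   true  },
   { "deviceExtBar",   LEVEL_DEVICE,   false },
};

static const struct entrypoint_sketch *
find_entrypoint(const char *name)
{
   assert(name != NULL);
   for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
      if (strcmp(table[i].name, name) == 0)
         return &table[i];
   }
   return NULL;
}

/* Instance-level lookup. */
static const struct entrypoint_sketch *
instance_proc_addr(bool have_instance, const char *name)
{
   const struct entrypoint_sketch *e = find_entrypoint(name);

   if (e == NULL)
      return NULL;
   /* Without an instance only the global entrypoints are visible. */
   if (!have_instance)
      return e->level == LEVEL_GLOBAL ? e : NULL;
   /* Disabled or unsupported instance entrypoints are hidden ... */
   if (e->level == LEVEL_INSTANCE && !e->enabled)
      return NULL;
   /* ... but device entrypoints are returned even when unsupported. */
   return e;
}

/* Device-level lookup: only enabled device entrypoints. */
static const struct entrypoint_sketch *
device_proc_addr(const char *name)
{
   const struct entrypoint_sketch *e = find_entrypoint(name);

   if (e == NULL || e->level != LEVEL_DEVICE || !e->enabled)
      return NULL;
   return e;
}
```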
Bas Nieuwenhuizen [Sun, 11 Feb 2018 12:12:33 +0000 (13:12 +0100)]
radv: Reword radv_entrypoints_gen.py
With a big inspiration from anv as always ...
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sat, 10 Feb 2018 23:32:34 +0000 (00:32 +0100)]
radv: Track enabled extensions.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sat, 10 Feb 2018 20:43:55 +0000 (21:43 +0100)]
radv: Put supported extensions in a struct.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jose Fonseca [Mon, 22 Jan 2018 11:11:16 +0000 (11:11 +0000)]
appveyor: Build with MSVC 2015.
The MSVC version we (at VMware) primarily care about from now on is
2015.
See https://ci.appveyor.com/project/jrfonseca/mesa/build/46
We can drop support for building with 2013 in a future commit. I'm not
aware of significant changes in C99/C11 support from MSVC 2013 to 2015,
but there's no point in continuing supporting old MSVC versions when
nobody cares.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Samuel Pitoiset [Mon, 5 Feb 2018 14:51:37 +0000 (15:51 +0100)]
ac/nir: remove emission of nir_op_fpow
fpow is now lowered at NIR level.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 2 Feb 2018 18:04:57 +0000 (19:04 +0100)]
radv: enable lowering of fpow to fexp2 and flog2
There is no fpow in hardware, so it's always lowered somewhere,
but it appears that lowering at NIR level is better. Figured while
comparing compute shaders between RadeonSI and RADV.
Polaris10:
Totals from affected shaders:
SGPRS: 18936 -> 18904 (-0.17 %)
VGPRS: 12240 -> 12220 (-0.16 %)
Spilled SGPRs: 2809 -> 2809 (0.00 %)
Code Size: 718116 -> 719848 (0.24 %) bytes
Max Waves: 1409 -> 1410 (0.07 %)
Vega10:
Totals from affected shaders:
SGPRS: 18392 -> 18392 (0.00 %)
VGPRS: 12008 -> 11920 (-0.73 %)
Spilled SGPRs: 3001 -> 2981 (-0.67 %)
Code Size: 777444 -> 778788 (0.17 %) bytes
Max Waves: 1503 -> 1504 (0.07 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 5 Feb 2018 14:08:03 +0000 (15:08 +0100)]
nir: lower fexp2(fmul(flog2(a), 2)) to fmul(a, a)
Similar for the 4 case.
Suggested by Bas.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
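The rewrite in the commit above follows from the identity that the fpow lowering relies on. A short derivation, assuming a > 0 so that the logarithm is defined (the grouping shown for the 4 case is only one plausible way to emit it, not necessarily what the pass produces):

```latex
\begin{align*}
a^{b} &= 2^{\,b \log_2 a} && (a > 0) \\
2^{\,2 \log_2 a} &= a^{2} = a \cdot a \\
2^{\,4 \log_2 a} &= a^{4} = (a \cdot a) \cdot (a \cdot a)
\end{align*}
```

In NIR terms the left-hand sides are fexp2(fmul(flog2(a), 2)) and fexp2(fmul(flog2(a), 4)), and the right-hand sides need only fmul.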
Samuel Pitoiset [Mon, 5 Feb 2018 15:07:45 +0000 (16:07 +0100)]
nir: add is_used_once for fmul(fexp2(a), fexp2(b)) to fexp2(fadd(a, b))
Otherwise the code size increases because the original fexp2()
instructions can't be deleted.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 22 Feb 2018 09:25:38 +0000 (10:25 +0100)]
ac/nir: set GLC=1 for load/store of coherent/volatile images
This disables persistence across wavefronts.
F1 2017 and Wolfenstein 2 appear to use some coherent images
but this patch doesn't seem to change anything.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 22 Feb 2018 09:25:37 +0000 (10:25 +0100)]
spirv: apply memory qualifiers to images
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Chuck Atkins [Thu, 22 Feb 2018 14:19:37 +0000 (09:19 -0500)]
glx: Properly handle cases where screen creation fails
This fixes a segfault exposed by
a29d63ecf7 which occurs when swr is
used on an unsupported architecture.
v2: re-work to place logic in xmesa_init_display
Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Cc: George Kyriazis <george.kyriazis@intel.com>
Cc: Bruce Cherniak <bruce.cherniak@intel.com>
Iago Toral Quiroga [Wed, 14 Feb 2018 10:48:05 +0000 (11:48 +0100)]
anv/blorp: multisample resolve all attachment layers
We were only resolving the first.
v2:
- Do not require that the number of layers on dst and src are an
exact match, it is okay if the dst has more layers so long as
it has at least the same that we are going to resolve.
- Do not always resolve array_len layers, we should resolve
only from base_array_layer to array_len.
v3:
- v2 was assuming that array_len represented the total number of
layers in the image, but it represents the number of layers
starting at the base array layer.
v4:
- The number of layers to resolve should be taken from the
framebuffer (Nanley).
Fixes new CTS tests for multisampled layered rendering:
dEQP-VK.renderpass.multisample_resolve.layers_*
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
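The layer bookkeeping that the v2 to v4 notes converge on can be sketched as follows. The view_sketch struct and plan_layer_resolves() are hypothetical and only compute which source layer maps to which destination layer; they do not call into blorp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image view: array_len counts layers starting at
 * base_array_layer, not the total number of layers in the image. */
struct view_sketch {
   uint32_t base_array_layer;
   uint32_t array_len;
};

/* Fill src_layers/dst_layers with the absolute layer indices to resolve.
 * The count comes from the framebuffer; dst may cover more layers than
 * are resolved, but never fewer. Returns the number of resolves planned. */
static uint32_t
plan_layer_resolves(const struct view_sketch *src,
                    const struct view_sketch *dst,
                    uint32_t fb_layers,
                    uint32_t *src_layers, uint32_t *dst_layers)
{
   assert(src != NULL && dst != NULL);
   assert(src->array_len >= fb_layers);
   assert(dst->array_len >= fb_layers);

   for (uint32_t i = 0; i < fb_layers; i++) {
      src_layers[i] = src->base_array_layer + i;
      dst_layers[i] = dst->base_array_layer + i;
   }
   return fb_layers;
}
```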
Jason Ekstrand [Tue, 28 Nov 2017 19:07:48 +0000 (11:07 -0800)]
intel/isl: Improve the documentation on get_default_aux_state
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Mon, 23 Oct 2017 23:32:42 +0000 (16:32 -0700)]
i965: Use finish_external instead of make_shareable in setTexBuffer2
The setTexBuffer2 hook from GLX is used to implement glXBindTexImageEXT
which has tighter restrictions than just "it's shared". In particular,
it says that any rendering to the image while it is bound causes the
contents to become undefined.
The GLX_EXT_texture_from_pixmap extension provides us with an acquire
and release in the form of glXBindTexImageEXT and glXReleaseTexImageEXT.
The extension spec says,
"Rendering to the drawable while it is bound to a texture will leave
the contents of the texture in an undefined state. However, no
synchronization between rendering and texturing is done by GLX. It
is the application's responsibility to implement any synchronization
required."
From the EGL 1.4 spec for eglBindTexImage:
"After eglBindTexImage is called, the specified surface is no longer
available for reading or writing. Any read operation, such as
glReadPixels or eglCopyBuffers, which reads values from any of the
surface’s color buffers or ancillary buffers will produce
indeterminate results. In addition, draw operations that are done
to the surface before its color buffer is released from the texture
produce indeterminate results."
In other words, between the bind and release calls, we effectively own
those pixels and can assume, so long as we don't crash, that no one else
is reading from/writing to the surface. The GLX and EGL implementations
call the setTexBuffer2 and releaseTexBuffer function pointers that the
driver can hook.
In theory, this means that, between BindTexImage and ReleaseTexImage, we
own the pixels and it should be safe to track aux usage so we
can avoid redundant resolves so long as we start off with the right
assumption at the start of the bind/release pair.
In practice, however, X11 has slightly different expectations. It's
expected that the server may be drawing to the image at the same time as
the compositor is texturing from it. In that case, the worst expected
outcome should be tearing or partial rendering and not random corruption
like we see when rendering races with scanout with CCS. Fortunately,
the GEM rules about texture/render dependencies save us here. If X11
submits work to write to a pixmap after the compositor has submitted
work to texture from it, GEM inserts a dependency between the compositor
and X11. If X11 is using a high-priority context, this will cause the
compositor to get a temporarily boosted priority while the batch from
X11 is waiting on it. This means that we will never have an actual race
between X11 and the compositor so no corruption can happen.
Unfortunately, however, this means that X11 will likely be rendering to it
between the compositor's BindTexImage and ReleaseTexImage calls. If we
want to avoid strange issues, we need to be a bit careful about
resolves because we can't really transition it away from the "default"
aux usage. The only case where this would practically be a problem is
with image_load_store where we have to do a full resolve in order to use
the image via the data port. Even there it would only be a problem if
batches were split such that X11's rendering happens between the resolve
and the use of it as a storage image. However, the chances of this
happening are very slim so we just emit a warning and hope for the best.
This commit adds a new helper intel_miptree_finish_external which resets
all aux state to whatever ISL says is the right worst-case "default" for
the given modifier. It feels a little awkward to call it "finish"
because it's actually an acquire from the perspective of the driver, but
it matches the semantics of the other prepare/finish functions. This
new helper gets called in intelSetTexBuffer2 instead of make_shareable.
We also add an intelReleaseTexBuffer (we passed NULL to releaseTexBuffer
before) and call intel_miptree_prepare_external in it. This probably
does nothing most of the time but it means that the prepare/finish calls
are properly matched.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Tue, 12 Sep 2017 21:26:04 +0000 (14:26 -0700)]
i965/tex_image: Reference the renderbuffer miptree in setTexBuffer2
The old code made a new miptree that referenced the same BO as the
renderbuffer and just trusted in the memory aliasing to work. There are
only two ways in which the new miptree is liable to differ from the one
in the renderbuffer and neither of them matter:
1) It may have a different target. The only targets that we can ever
see in intelSetTexBuffer2 are GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE
and the difference between the two doesn't matter as far as the
miptree is concerned; genX(update_sampler_state) only looks at the
gl_texture_object and not the miptree when determining whether or
not to use normalized coordinates.
2) It may have a very slightly different format. Again, this doesn't
matter because we've supported texture views for quite some time so
we always look at the gl_texture_object format instead of the
miptree format for hardware setup anyway.
On the other hand, because we were recreating the miptree, we were using
intel_miptree_create_for_bo which doesn't understand modifiers. We
really want this function to work without doing a resolve so long as you
have modifiers so we need to fix that.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Tue, 28 Nov 2017 19:26:55 +0000 (11:26 -0800)]
i965/tex_image: Pull the tex format from the renderbuffer in intelSetTexBuffer2
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Mon, 23 Oct 2017 22:06:11 +0000 (15:06 -0700)]
i965/miptree: Loosen the format check in miptree_match_image
This function is used to determine when we need to re-allocate a
miptree. Since we do nothing different in miptree allocation for
sRGB vs. linear, loosening this should be safe and may lead to less
copying and reallocating in some odd cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Jason Ekstrand [Wed, 29 Nov 2017 00:06:27 +0000 (16:06 -0800)]
i965/state: Ignore intel_obj->_Format for depth/stencil and ETC2
We're about to start letting the intel_obj->_Format be the "real"
texture format. For depth/stencil textures, this may be a combined
depth stencil format. For ETC2 on gen7 and earlier, this will be the
actual ETC2 format. This makes a bit more GL sense but means we have to
be careful in state upload.
Reviewed-by: Chad Versace <chadversary@chromium.org>
Kenneth Graunke [Mon, 19 Feb 2018 17:35:46 +0000 (09:35 -0800)]
glsl: Parse 'layout' as a token with advanced blending or bindless
Both KHR_blend_equation_advanced and ARB_bindless_texture provide
layout qualifiers, and are exposed in compatibility contexts. We
need to parse the layout qualifier as a token in order for those
to work, but forgot to extend this check.
ARB_shader_image_load_store would need a similar treatment, but we
don't expose that in legacy OpenGL contexts.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105161
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Daniel Stone [Tue, 20 Feb 2018 20:56:02 +0000 (20:56 +0000)]
vulkan/wsi/x11: Consistently update and return swapchain status
Use a helper function for updating the swapchain status. This will be
used later to handle VK_SUBOPTIMAL_KHR, where we need to make a
non-error status stick to the swapchain until recreation. Instead of
direct comparisons to VK_SUCCESS to check for error, test for negative
numbers meaning an error status, and positive numbers indicating
non-error statuses.
v2 (Jason Ekstrand):
- Use a pattern of "return x11_swapchain_result(chain, VK_WHATEVER)"
- Handle wsi_queue_pull returning VK_TIMEOUT
- Call x11_swapchain_result in x11_present_to_x11
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
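A helper of the kind described above might look like the following sketch. The enum values and names are hypothetical stand-ins rather than the real VkResult codes, and the sticky handling of the suboptimal status is the later behavior the commit message anticipates, not necessarily what this commit already does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes: negative means error, positive means a
 * non-error status, zero means success. */
enum result_sketch {
   RES_ERROR_SURFACE_LOST = -2,
   RES_ERROR_OUT_OF_DATE  = -1,
   RES_SUCCESS            = 0,
   RES_NOT_READY          = 1,
   RES_TIMEOUT            = 2,
   RES_SUBOPTIMAL         = 3,
};

struct chain_sketch {
   enum result_sketch status;  /* last sticky status of the swapchain */
};

/* Fold a new result into the swapchain status and return what the
 * caller should report. */
static enum result_sketch
chain_result(struct chain_sketch *chain, enum result_sketch result)
{
   assert(chain != NULL);

   /* An earlier error is permanent until the swapchain is recreated. */
   if (chain->status < 0)
      return chain->status;

   /* A new error sticks to the swapchain. */
   if (result < 0) {
      chain->status = result;
      return result;
   }

   /* Transient statuses are reported but not remembered. */
   if (result == RES_TIMEOUT || result == RES_NOT_READY)
      return result;

   /* Suboptimal is not an error, but it sticks as well. */
   if (result == RES_SUBOPTIMAL) {
      chain->status = result;
      return result;
   }

   /* Plain success: report whatever status is currently recorded. */
   return chain->status;
}
```

Every exit path of a swapchain function can then end in "return chain_result(chain, RES_WHATEVER)", which is the calling pattern the v2 note asks for.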
Jason Ekstrand [Wed, 21 Feb 2018 20:38:12 +0000 (12:38 -0800)]
vulkan/wsi/x11: Set OUT_OF_DATE if wait_for_special_event fails
This most likely means we lost our connection to the X server so
OUT_OF_DATE is reasonable. This was also the one case where we pushed a
UINT32_MAX into the queue without setting an error condition.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Daniel Stone <daniels@collabora.com>
Daniel Stone [Fri, 9 Feb 2018 23:43:30 +0000 (15:43 -0800)]
vulkan/wsi/wayland: Add support for zwp_dmabuf
zwp_linux_dmabuf_v1 lets us use multi-planar images and buffer
modifiers.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Tue, 14 Nov 2017 00:44:07 +0000 (16:44 -0800)]
anv/image: Add support for modifiers for WSI
This adds support for the modifiers portion of the WSI "extension".
Reviewed-by: Daniel Stone <daniels@collabora.com>
Jason Ekstrand [Thu, 25 Jan 2018 03:47:14 +0000 (19:47 -0800)]
anv/image: Separate modifiers from legacy scanout
For a bit there, we had a bug in i965 where it ignored the tiling of the
modifier and used the one from the BO instead. At one point, we though
this was best fixed by setting a tiling from Vulkan. However, we've
decided that i965 was just doing the wrong thing and have fixed it as of
50485723523d2948a44570ba110f02f726f86a54.
The old assumptions also affected the solution we used for legacy
scanout in Vulkan. Instead of treating it specially, we just treated it
like a modifier like we do in GL. This commit goes back to making it
its own thing so that it's clear in the driver when we're using
modifiers and when we're using legacy paths.
v2 (Jason Ekstrand):
- Rename legacy_scanout to needs_set_tiling
Reviewed-by: Daniel Stone <daniels@collabora.com>
Jason Ekstrand [Fri, 9 Feb 2018 23:43:27 +0000 (15:43 -0800)]
vulkan/wsi: Add modifiers support to wsi_create_native_image
This involves extending our fake extension a bit to allow for additional
querying and passing of modifier information. The added bits are
intended to look a lot like the draft of VK_EXT_image_drm_format_modifier.
Once the extension gets finalized, we'll simply transition all of the
structs used in wsi_common to the real extension structs.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Daniel Stone [Fri, 9 Feb 2018 23:43:26 +0000 (15:43 -0800)]
vulkan/wsi: Add drm_modifier member to wsi_image
Not yet used anywhere.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Daniel Stone [Fri, 9 Feb 2018 23:43:25 +0000 (15:43 -0800)]
vulkan/wsi: Add multiple planes to wsi_image
Not currently used.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Timothy Arceri [Wed, 21 Feb 2018 03:36:09 +0000 (14:36 +1100)]
nir: remove old assert
This was originally intended to make sure the remap location
was not -1. However the code has changed a lot since then,
the location is now never set to -1 and we also handle
components meaning this old assert has been doing comparisons
with the pointer to the array of component data.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105183
Timothy Arceri [Wed, 21 Feb 2018 02:27:17 +0000 (13:27 +1100)]
radeonsi/nir: collect more accurate output_usagemask
Fixes assert in the glsl-1.50-gs-max-output-components piglit test.
Note that the double handling will only work for doubles that
don't take up multiple slots i.e. double and dvec2. However
dual slot double handling is an existing bug which is made no
worse by this patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 21 Feb 2018 01:30:30 +0000 (12:30 +1100)]
radeonsi/nir: disable GLSL IR loop unrolling
Delaying unrolling and allowing NIR to do it instead has been shown
to result in better code in drivers such as i965. shader-db results
appear to show the same is true for radeonsi.
The other advantage is that using NIR unrolling improves compile
times significantly.
Totals from affected shaders:
SGPRS: 9624 -> 10016 (4.07 %)
VGPRS: 6800 -> 6464 (-4.94 %)
Spilled SGPRs: 0 -> 2 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 359176 -> 332264 (-7.49 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1355 -> 1432 (5.68 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Tue, 20 Feb 2018 23:10:33 +0000 (10:10 +1100)]
radeonsi/nir: fix tess varying loads for doubles
Fixes the following piglit tests:
tests/spec/arb_tessellation_shader/execution/double-array-vs-tcs-tes.shader_test
tests/spec/arb_tessellation_shader/execution/double-vs-tcs-tes.shader_test
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Tue, 20 Feb 2018 23:09:18 +0000 (10:09 +1100)]
ac/radeonsi: pass type to load_tess_varyings()
We need this to be able to load 64bit varyings.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Daniel Stone [Wed, 21 Feb 2018 10:39:34 +0000 (10:39 +0000)]
x11/dri3: Store raw present completion mode
The DRI3 drawable info struct currently stores a boolean for whether the
last completed operation was a flip or not. As we need to track the full
completion mode for handling suboptimal returns, change the 'flipping'
field to the raw present completion mode from the server.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Daniel Stone [Wed, 21 Feb 2018 11:39:09 +0000 (11:39 +0000)]
x11/dri3: Don't open-code ARRAY_SIZE
Signed-off-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jason Ekstrand [Wed, 21 Feb 2018 21:07:10 +0000 (13:07 -0800)]
anv: Don't assert that stencil HiZ clears are single-slice
It's true for depth HiZ clears because we only have HiZ on single-slice
images right now. However, for stencil-only clears there is no such
restriction.
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sun, 11 Feb 2018 06:10:03 +0000 (22:10 -0800)]
anv: Only copy clear dwords if we're rendering to the first slice
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Marek Olšák [Tue, 30 Jan 2018 01:51:47 +0000 (02:51 +0100)]
radeonsi: don't flush when si_eliminate_fast_color_clear is no-op
Marek Olšák [Sun, 7 Jan 2018 20:04:55 +0000 (21:04 +0100)]
radeonsi: make texture_discard_cmask/eliminate functions non-static
James Zhu [Mon, 5 Feb 2018 17:02:50 +0000 (12:02 -0500)]
radeonsi: enable uvd encode for HEVC main
Enable UVD encode for HEVC main profile
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
James Zhu [Mon, 5 Feb 2018 22:08:22 +0000 (17:08 -0500)]
radeonsi:create uvd hevc enc entry
Add UVD hevc encode pipe video codec creation entry
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
James Zhu [Tue, 6 Feb 2018 18:29:11 +0000 (13:29 -0500)]
radeon/uvd:add uvd hevc enc functions
Implement UVD hevc encode functions
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
James Zhu [Tue, 6 Feb 2018 18:26:28 +0000 (13:26 -0500)]
radeon/uvd:add uvd hevc enc hw ib implementation
Implement required IBs for UVD HEVC encode.
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
James Zhu [Tue, 6 Feb 2018 18:18:21 +0000 (13:18 -0500)]
radeon/uvd:add uvd hevc enc hw interface header
Add hevc encode hardware interface for UVD
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
James Zhu [Tue, 6 Feb 2018 17:39:03 +0000 (12:39 -0500)]
winsys/amdgpu:add uvd hevc enc support in amdgpu cs
Support UVD HEVC encode in amdgpu cs
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
James Zhu [Mon, 5 Feb 2018 21:28:13 +0000 (16:28 -0500)]
amd/common:add uvd hevc enc support check in hw query
Use the amdgpu hardware query information to check whether UVD HEVC encode is supported
Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Karol Herbst [Mon, 19 Feb 2018 23:45:14 +0000 (00:45 +0100)]
nvir/nvc0: fix legalizing of ld unlock c0[0x10000]
We have to increase the file index also for 0x10000 not just for values
greater than 0x10000.
Fixes: 37b67db6ae34fb6586d640a7a1b6232f091dd812
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
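The boundary condition fixed above can be shown with a small sketch. The struct and function are hypothetical, and folding the upper offset bits into the file index is an assumption about how the legalization works; the point is only that an offset of exactly 0x10000 no longer fits in 16 bits, so the comparison has to be inclusive.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant buffer reference: c<file_index>[offset]. */
struct cb_ref_sketch {
   uint32_t file_index;
   uint32_t offset;
};

/* Move the part of the offset that does not fit in 16 bits into the
 * file index. */
static struct cb_ref_sketch
legalize_cb_offset(struct cb_ref_sketch ref)
{
   /* ">=" rather than ">": 0x10000 itself is already out of range. */
   if (ref.offset >= 0x10000) {
      ref.file_index += ref.offset >> 16;
      ref.offset &= 0xffff;
   }

   assert(ref.offset < 0x10000);
   return ref;
}
```

With a strict ">" comparison the c0[0x10000] case would be left untouched and keep an offset that does not fit.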
Samuel Pitoiset [Tue, 20 Feb 2018 10:11:43 +0000 (11:11 +0100)]
ac/nir: add glsl_is_array_image() helper
For consistency.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 20 Feb 2018 10:11:42 +0000 (11:11 +0100)]
ac/nir: set the DA field when performing atomics on 3D images
This doesn't fix anything known but it should definitely be set.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Anholt [Sat, 10 Feb 2018 11:19:00 +0000 (11:19 +0000)]
i965: Fix compiler warning about write being undefined.
This looks like it should be protected by the assume() about
nr_color_regions, but my compiler warns anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Anholt [Sat, 10 Feb 2018 11:03:38 +0000 (11:03 +0000)]
glsl/tests: Fix a compiler warning about signed/unsigned loop comparison.
Fixes: d32956935edf ("glsl: Walk a list of ir_dereference_array to mark array elements as accessed")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Eric Anholt [Sat, 10 Feb 2018 10:45:18 +0000 (10:45 +0000)]
loader: Fix compiler warnings about truncating the PCI ID path.
My build was producing:
../src/loader/loader.c:121:67: warning: ‘%1u’ directive output may be truncated writing between 1 and 3 bytes into a region of size 2 [-Wformat-truncation=]
and we can avoid this careful calculation by just using asprintf (as we do
elsewhere in the file).
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
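The idea behind the change above is to let the formatter size the buffer instead of computing the length by hand. Since asprintf is a GNU/BSD extension, the sketch below reproduces the same behavior with two standard snprintf calls; that substitution, the helper name format_pci_path() and the path format are assumptions for illustration and not what loader.c contains.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical path format; "%1u" may legitimately expand to more than
 * one digit, which is what the truncation warning was about. */
#define PCI_PATH_FMT "/sys/bus/pci/devices/%04x:%02x:%02x.%1u/config"

/* Return a freshly allocated path string, or NULL on allocation failure.
 * The caller frees the result. */
static char *
format_pci_path(unsigned domain, unsigned bus, unsigned dev, unsigned func)
{
   /* First pass: ask snprintf how many characters are needed. */
   int len = snprintf(NULL, 0, PCI_PATH_FMT, domain, bus, dev, func);
   assert(len >= 0);

   char *path = malloc((size_t)len + 1);
   if (path == NULL)
      return NULL;

   /* Second pass: format into a buffer that is exactly large enough. */
   snprintf(path, (size_t)len + 1, PCI_PATH_FMT, domain, bus, dev, func);
   return path;
}
```

Because the size comes from the formatter itself, a function number that takes three digits simply yields a longer string rather than a truncated one.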
Eric Anholt [Sat, 10 Feb 2018 10:41:07 +0000 (10:41 +0000)]
glsl: Silence warnings in the uniform initializer test about 16-bit types
They should probably get unit tests implemented, but this cleans up a
bunch of warnings in my build for now.
Fixes: 59f458cd8703 ("glsl: Add 16-bit types")
Cc: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jordan Justen [Wed, 8 Nov 2017 23:42:14 +0000 (15:42 -0800)]
i965: Enable disk shader cache by default
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Dave Airlie [Mon, 19 Feb 2018 04:59:53 +0000 (04:59 +0000)]
radv: don't send num_tcs_input_cp to sgprs.
We never use it in the shaders.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 19 Feb 2018 04:55:52 +0000 (04:55 +0000)]
radv/tess: don't need to look in constant for vertices_per_patch
This just avoids passing this value via user sgprs.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 19 Feb 2018 06:19:07 +0000 (06:19 +0000)]
ac/radv: cleanup some tcs output values access
Just consolidates some code to make it easier to change.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 19 Feb 2018 06:53:21 +0000 (06:53 +0000)]
ac/radv: remove total_vertices variable
This just removes an unneeded variable.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 19 Feb 2018 20:33:17 +0000 (20:33 +0000)]
ac/radv: don't mark tess inner as used if we don't use it.
This just avoids marking it as a used output if we don't
actually use it.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 20 Feb 2018 00:15:18 +0000 (10:15 +1000)]
ac/nir: to integer the args to bcsel.
dEQP-VK.tessellation.invariance.outer_edge_symmetry.triangles_equal_spacing_ccw
was hitting an llvm assert due to one value being an int and the
other a float.
This just casts both values to integer and fixes the test.
Fixes: dEQP-VK.tessellation.invariance.outer_edge_symmetry.triangles_equal_spacing_ccw
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Fri, 2 Feb 2018 22:51:56 +0000 (14:51 -0800)]
anv/blorp: Use layout_to_aux_usage when a layout is provided
Instead of having aux usage and ANV_AUX_USAGE_DEFAULT to mean "give me
something reasonable" we now use anv_layout_to_aux_usage whenever a
layout is available. If a layout is available, we ignore the aux_usage
parameter. For the cases where we have an explicit aux usage such as
clears and aux ops, we have a new ANV_IMAGE_LAYOUT_EXPLICIT_AUX layout.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 2 Feb 2018 04:02:48 +0000 (20:02 -0800)]
anv/cmd_buffer: Delete some assert-only variables
Checking the sample count is almost as good as aux usage in this case.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 2 Feb 2018 03:36:22 +0000 (19:36 -0800)]
anv/cmd_buffer: Use layout_to_* helpers in compute_aux_usage
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 2 Feb 2018 03:13:12 +0000 (19:13 -0800)]
anv/cmd_buffer: Simplify transition_depth_buffer
If we don't have HiZ, then anv_layout_to_aux_usage will return NONE for
both layouts. If the two layouts are the same, they will get the same aux
usage. In either case, the code below will give us ISL_AUX_OP_NONE and
we'll return without doing anything.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
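The reasoning in the commit above can be illustrated with a small standalone sketch. Everything here is hypothetical: the enum names, `layout_to_aux_usage()` and `depth_transition_op()` are simplified stand-ins, not the real anv or isl API, and the two non-NONE operations are placeholders. The only point is that both early-out cases fall out of the general comparison, so they need no special handling.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the driver's enums. */
enum aux_usage { AUX_USAGE_NONE, AUX_USAGE_HIZ };
enum aux_op { AUX_OP_NONE, AUX_OP_RESOLVE, AUX_OP_AMBIGUATE };
enum layout { LAYOUT_GENERAL, LAYOUT_DEPTH_ATTACHMENT };

/* Assumption: without HiZ every layout maps to NONE. */
static enum aux_usage layout_to_aux_usage(bool has_hiz, enum layout l)
{
    if (!has_hiz)
        return AUX_USAGE_NONE;
    return l == LAYOUT_DEPTH_ATTACHMENT ? AUX_USAGE_HIZ : AUX_USAGE_NONE;
}

/* Pick the operation needed to go from one layout to another. */
static enum aux_op depth_transition_op(bool has_hiz, enum layout from,
                                       enum layout to)
{
    bool hiz_before = layout_to_aux_usage(has_hiz, from) == AUX_USAGE_HIZ;
    bool hiz_after = layout_to_aux_usage(has_hiz, to) == AUX_USAGE_HIZ;

    if (hiz_before && !hiz_after)
        return AUX_OP_RESOLVE;      /* placeholder op */
    if (!hiz_before && hiz_after)
        return AUX_OP_AMBIGUATE;    /* placeholder op */

    /* No HiZ, or identical layouts: both usages match, nothing to do. */
    return AUX_OP_NONE;
}
```

With no HiZ, or with `from == to`, the two usages are equal and the function returns `AUX_OP_NONE`, which is why separate checks for those cases are redundant.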
Jason Ekstrand [Tue, 21 Nov 2017 23:56:35 +0000 (15:56 -0800)]
anv/cmd_buffer: Do subpass image transitions in begin/end_subpass
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 13 Jan 2018 18:59:05 +0000 (10:59 -0800)]
anv/cmd_buffer: Mark depth/stencil surfaces written in begin_subpass
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Tue, 21 Nov 2017 23:16:40 +0000 (15:16 -0800)]
anv/cmd_buffer: Sync clear values in begin_subpass
This is quite a bit cleaner because we now sync the clear values at the
same time as we do the fast clear. For loading the clear values into
the surface state, we now do it once when we handle the LOAD_OP_LOAD
instead of every subpass.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 13 Jan 2018 18:45:55 +0000 (10:45 -0800)]
anv/pass: Store usage in each subpass attachment
This requires us to ditch the VkAttachmentReference struct in favor of
an anv-specific struct. However, we can now easily identify from just
the subpass attachment what kind of an attachment it is. This will make
iteration over anv_subpass::attachments a little easier in some cases.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
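A rough sketch of why carrying the usage in each entry helps iteration. The struct, the enum, and `count_with_usage()` are hypothetical and do not reproduce the actual anv definitions; the assumption is only that each subpass attachment records what kind of attachment it is.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical usage tags; the real driver's type may differ. */
enum att_usage {
    ATT_USAGE_INPUT,
    ATT_USAGE_COLOR,
    ATT_USAGE_DEPTH_STENCIL,
};

/* Hypothetical driver-specific replacement for a bare attachment reference. */
struct subpass_attachment {
    enum att_usage usage;   /* what kind of attachment this entry is */
    uint32_t attachment;    /* index into the render pass attachments */
};

/* One flat loop can classify entries without consulting separate arrays. */
static size_t count_with_usage(const struct subpass_attachment *atts,
                               size_t n, enum att_usage usage)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (atts[i].usage == usage)
            count++;
    }
    return count;
}
```

Because the kind travels with the entry, code that walks all attachments does not need to know which sub-array an entry came from.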
Jason Ekstrand [Wed, 22 Nov 2017 04:29:36 +0000 (20:29 -0800)]
anv/cmd_buffer: Add a concept of pending load aspects
These are the same as pending clear aspects only for the "load"
operation.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Mon, 27 Nov 2017 18:43:03 +0000 (10:43 -0800)]
anv/cmd_buffer: Iterate all subpass attachments when clearing
This unifies things a bit because we now handle depth and stencil at the
same time.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Mon, 27 Nov 2017 18:20:00 +0000 (10:20 -0800)]
anv/cmd_buffer: Decide whether or not to HiZ clear up-front
This moves the decision out of begin_subpass and into BeginRenderPass
like the decision for color clears. We use a similar name for the
function for depth/stencil as for color even though no aux usage is
really getting computed.
v2 (Jason Ekstrand):
- Don't always disable HiZ clears by accident
- Use the initial layout to decide whether to do fast clears
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Tue, 21 Nov 2017 22:46:25 +0000 (14:46 -0800)]
anv/cmd_buffer: Move the rest of clear_subpass into begin_subpass
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Tue, 21 Nov 2017 22:00:44 +0000 (14:00 -0800)]
intel/blorp: Add a blorp_hiz_clear_depth_stencil helper
This is similar to blorp_gen8_hiz_clear_attachments except that it takes
actual images instead of trusting in the already set depth state.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Tue, 21 Nov 2017 21:30:49 +0000 (13:30 -0800)]
anv/cmd_buffer: Move the color portion of clear_subpass into begin_subpass
This doesn't really change much now but it will give us more/better
control over clears in the future. The one interesting functional
change here is that we are now re-emitting 3DSTATE_DEPTH_BUFFERS and
friends for each clear. However, this only happens at begin_subpass
time so it shouldn't be substantially more expensive.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Tue, 21 Nov 2017 20:42:45 +0000 (12:42 -0800)]
anv/cmd_buffer: Pass a subpass id into begin_subpass
This is a bit less awkward than passing in the subpass because it means
we don't have to extract the subpass id from the subpass.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Tue, 21 Nov 2017 20:41:01 +0000 (12:41 -0800)]
anv/cmd_buffer: Add begin/end_subpass helpers
Having begin/end_subpass is a bit nicer than the begin/next/end hooks
that Vulkan gives us.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Tue, 21 Nov 2017 20:27:43 +0000 (12:27 -0800)]
anv/cmd_buffer: Apply subpass flushes before set_subpass
This seems slightly more correct because it means that the flushes
happen before any clears or resolves implied by the subpass transition.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Fri, 9 Feb 2018 00:44:56 +0000 (16:44 -0800)]
anv: Use framebuffer layers for implicit subpass transitions
Fixes: de3be618016 "anv/cmd_buffer: Rework aux tracking"
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Tue, 13 Feb 2018 00:03:28 +0000 (16:03 -0800)]
anv: Be more careful about fast-clear colors
Previously, we just used all the channels regardless of the format.
This is less than ideal because some channels may have undefined values
and this should be ok from the client's perspective. Even though the
driver should do the correct thing regardless of what is in the
undefined value, it makes things less deterministic. In particular, the
driver may choose to fast-clear or not based on undefined values. This
level of nondeterminism is bad.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
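One way to picture the fix above is to look only at the channels the format actually has. The following is a minimal sketch under that assumption; `union color_value`, the channel mask, and both helpers are hypothetical and are not the isl or anv API. It compares raw bits, so a float -0.0 would count as non-zero here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a four-channel clear color. */
union color_value {
    uint32_t u32[4];
    float f32[4];
};

/* channel_mask: bit i is set if the format has channel i (R=0 .. A=3). */

/* Zero every channel the format lacks so undefined input cannot leak. */
static union color_value sanitize_clear_color(union color_value c,
                                              unsigned channel_mask)
{
    assert(channel_mask <= 0xf);
    for (int i = 0; i < 4; i++) {
        if (!(channel_mask & (1u << i)))
            c.u32[i] = 0;
    }
    return c;
}

/* True if all channels present in the format are bitwise zero. */
static bool color_is_zero(union color_value c, unsigned channel_mask)
{
    assert(channel_mask <= 0xf);
    for (int i = 0; i < 4; i++) {
        if ((channel_mask & (1u << i)) && c.u32[i] != 0)
            return false;
    }
    return true;
}
```

With a two-channel format (mask `0x3`), garbage in the blue and alpha slots no longer influences whether the color counts as zero, so a decision based on it stays deterministic.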
Jason Ekstrand [Mon, 12 Feb 2018 23:50:12 +0000 (15:50 -0800)]
intel/isl: Add an isl_color_value_is_zero helper
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Jason Ekstrand [Sat, 17 Feb 2018 01:35:15 +0000 (17:35 -0800)]
anv/gpu_memcpy: CS Stall before a MI memcpy on gen7
This fixes a pile of hangs caused by the recent shuffling of resolves
and transitions. The particularly problematic case is when you have at
least three attachments with load ops of CLEAR, LOAD, CLEAR. In this
case, we execute the first CLEAR followed by a MI memcpy to copy the
clear values over for the LOAD followed by a second CLEAR. The MI
commands cause the first CLEAR to hang which causes us to get stuck on
the 3DSTATE_MULTISAMPLE in the second CLEAR.
We also add guards for BLORP to fix the same issue. These shouldn't
actually do anything right now because the only use of indirect clears
in BLORP today is for resolves which are already guarded by a render
cache flush and CS stall. However, this will guard us against potential
issues in the future.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Nanley Chery <nanley.g.chery@intel.com>
Guillaume Charifi [Tue, 20 Feb 2018 11:49:28 +0000 (12:49 +0100)]
st/mesa: Factorize duplicate code for atomic buffer binding
Signed-off-by: Guillaume Charifi <guillaume.charifi@sfr.fr>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Guillaume Charifi [Fri, 5 Jan 2018 16:49:39 +0000 (17:49 +0100)]
st/mesa: Factorize duplicate code in st_update_framebuffer_state()
Signed-off-by: Guillaume Charifi <guillaume.charifi@sfr.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Rob Clark [Tue, 20 Feb 2018 18:40:46 +0000 (13:40 -0500)]
freedreno/ir3: fix use_count refcnt'ing issue
Was hitting an assert with vs-varying-array-mat4-index-col-row-wr.shader_test.
When eliminating a copy, we were dropping the use_count of the mov that
is skipped, but not increasing the use_count of its src instruction.
Fixes: 76440fcca91 freedreno/ir3: clean up dangling false-dep's
Signed-off-by: Rob Clark <robdclark@gmail.com>
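The bookkeeping described above can be sketched as follows. `struct instr` and `eliminate_copy()` are hypothetical and much simpler than the real ir3 data structures; the sketch only shows that skipping a mov has to adjust two counters, not one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction with a single source. */
struct instr {
    int use_count;      /* how many instructions read this result */
    struct instr *src;  /* the one source, or NULL */
    int is_mov;         /* non-zero if this is a plain copy */
};

/* Make `user` read the mov's source directly instead of the mov. */
static void eliminate_copy(struct instr *user)
{
    struct instr *mov = user->src;
    assert(mov != NULL && mov->is_mov && mov->src != NULL);

    struct instr *real_src = mov->src;
    user->src = real_src;

    mov->use_count--;       /* user no longer reads the mov */
    real_src->use_count++;  /* user now reads the mov's source */
}
```

Omitting the increment leaves the source's count one too low, so a later pass that trusts the count could see it reach zero while the source is still referenced.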
Eric Engestrom [Tue, 20 Feb 2018 13:35:56 +0000 (13:35 +0000)]
docs: fix patent url
Reported-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Brian Paul [Fri, 16 Feb 2018 20:57:51 +0000 (13:57 -0700)]
svga: replaced 'unsigned' with proper enum types in shader code
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Jonathan Gray [Tue, 20 Feb 2018 06:38:00 +0000 (17:38 +1100)]
configure.ac: pthread-stubs not present on OpenBSD
pthread-stubs is no longer required on OpenBSD and has been removed.
The libpthread parts involved have moved to libc.
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: 17.3 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Andres Gomez [Tue, 13 Feb 2018 22:42:57 +0000 (00:42 +0200)]
swr: bump minimum supported LLVM version to 4.0
Since radv and radeonsi removed support for LLVM 3.9 the distcheck
target got broken because SWR distribution needed 3.9.x.
After checking with George Kyriazis, SWR is OK with moving to LLVM 4.0
and above, which will solve this problem.
Fixes: 3bf1e036e8a ("amd: remove support for LLVM 3.9")
Cc: George Kyriazis <george.kyriazis@intel.com>
Cc: Tim Rowley <timothy.o.rowley@intel.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Andres Gomez [Tue, 6 Feb 2018 15:42:42 +0000 (17:42 +0200)]
travis: radeonsi and radv need LLVM 4.0
Fixes: 3bf1e036e8a ("amd: remove support for LLVM 3.9")
Cc: Marek Olšák <marek.olsak@amd.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>