mesa.git
6 years ago ac: move ac_shader_info to radv folder
Samuel Pitoiset [Tue, 13 Mar 2018 13:49:11 +0000 (14:49 +0100)]
ac: move ac_shader_info to radv folder

This is RADV specific code.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago ac/nir: move ac_shader_variant_info and friends to radv folder
Samuel Pitoiset [Tue, 13 Mar 2018 13:34:35 +0000 (14:34 +0100)]
ac/nir: move ac_shader_variant_info and friends to radv folder

Also replace ac_ by radv_.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago ac/nir: move all RADV related code to radv_nir_to_llvm.c
Samuel Pitoiset [Fri, 9 Mar 2018 15:58:10 +0000 (16:58 +0100)]
ac/nir: move all RADV related code to radv_nir_to_llvm.c

Now the "ac/nir" prefix will really be the shared code between
RadeonSI and RADV, which might avoid confusion in the future.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago ac/nir: make emit_barrier() non-static
Samuel Pitoiset [Fri, 9 Mar 2018 15:56:31 +0000 (16:56 +0100)]
ac/nir: make emit_barrier() non-static

Required in order to move all RADV specific code outside of ac/nir.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago ac/nir: move radeon_llvm_reg_index_soa() to ac_nir_to_llvm.h
Samuel Pitoiset [Fri, 9 Mar 2018 15:54:46 +0000 (16:54 +0100)]
ac/nir: move radeon_llvm_reg_index_soa() to ac_nir_to_llvm.h

Required in order to move all RADV specific code outside of ac/nir.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago ac/nir: make handle_shader_output_decl() non-static
Samuel Pitoiset [Fri, 9 Mar 2018 15:53:06 +0000 (16:53 +0100)]
ac/nir: make handle_shader_output_decl() non-static

Required in order to move all RADV specific code outside of ac/nir.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago ac/nir: change prototype of handle_shader_output_decl()
Samuel Pitoiset [Fri, 9 Mar 2018 15:49:55 +0000 (16:49 +0100)]
ac/nir: change prototype of handle_shader_output_decl()

This allows removing the ac_nir_context dependency.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago ac/nir: move unpack_param() to ac_llvm_build.c
Samuel Pitoiset [Fri, 9 Mar 2018 15:39:35 +0000 (16:39 +0100)]
ac/nir: move unpack_param() to ac_llvm_build.c

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago ac/nir: move trim_vector to ac_llvm_build.c
Samuel Pitoiset [Fri, 9 Mar 2018 15:36:31 +0000 (16:36 +0100)]
ac/nir: move trim_vector to ac_llvm_build.c

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago ac/nir: move cast_ptr() to ac_llvm_build.c
Samuel Pitoiset [Fri, 9 Mar 2018 15:26:34 +0000 (16:26 +0100)]
ac/nir: move cast_ptr() to ac_llvm_build.c

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago ac/nir: move ac_build_alloca() to ac_llvm_build.c
Samuel Pitoiset [Fri, 9 Mar 2018 15:22:44 +0000 (16:22 +0100)]
ac/nir: move ac_build_alloca() to ac_llvm_build.c

As well as si_build_alloca_undef() and drop the si prefix.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago gallium: silence __builtin_frame_address nonzero argument is unsafe warning
Timothy Arceri [Fri, 9 Mar 2018 00:00:55 +0000 (11:00 +1100)]
gallium: silence __builtin_frame_address nonzero argument is unsafe warning

Calling __builtin_frame_address with a nonzero argument is unsafe
but is sometimes done for debugging purposes. Since this code is
part of some debug util code I'm assuming that is the case here
and using a GCC pragma to silence the warning.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
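
The approach described in the commit above can be sketched roughly as follows. This is only an illustration and not the actual Mesa change: the helper names `current_frame` and `caller_frame` are hypothetical, and the sketch assumes a GCC version that knows the `-Wframe-address` diagnostic.

```c
#include <stddef.h>

/* Hypothetical helper: frame address of this function itself.
 * Requesting level 0 does not trigger the warning. */
static void *current_frame(void)
{
    return __builtin_frame_address(0);
}

/* Hypothetical debug helper: frame address one level up the stack.
 * A nonzero level is unsafe in general, so the diagnostic is silenced
 * only around this one function and restored afterwards. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wframe-address"
static void *caller_frame(void)
{
    return __builtin_frame_address(1);
}
#pragma GCC diagnostic pop
```

Scoping the pragma with push/pop keeps the warning active for the rest of the translation unit, so unrelated uses are still reported.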
6 years ago meson: Add moduledir to d3d.pc
Dylan Baker [Fri, 9 Mar 2018 16:27:31 +0000 (08:27 -0800)]
meson: Add moduledir to d3d.pc

This is required to build wine with the nine patchset.

Fixes: 6b4c7047d57178d3362a710ad503057c6a582ca3
       ("meson: build gallium nine state_tracker")
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
6 years ago gallium: Use struct gl_array_attributes* as st_pipe_vertex_format argument.
Mathias Fröhlich [Sat, 10 Mar 2018 15:01:31 +0000 (16:01 +0100)]
gallium: Use struct gl_array_attributes* as st_pipe_vertex_format argument.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago mesa: Don't write to user buffer in glGetTexParameterIuiv on error
Ian Romanick [Thu, 8 Mar 2018 05:05:34 +0000 (21:05 -0800)]
mesa: Don't write to user buffer in glGetTexParameterIuiv on error

With some sets of optimization flags, GCC will generate warnings like
this:

src/mesa/main/texparam.c:2327:27: warning: ‘*((void *)&ip+12)’ may be used uninitialized in this function [-Wmaybe-uninitialized]
             params[3] = ip[3];
                         ~~^~~
src/mesa/main/texparam.c:2320:16: note: ‘*((void *)&ip+12)’ was declared here
          GLint ip[4];
                ^~

ip is not initialized in cases where a GL error is generated.  In these
cases, we should *not* write to the user's buffer, so this is actually a
bug.  I wrote a new piglit test gl-3.0-texparameteri to show this bug.

I suspect that Coverity also detected this, but the scan site is
currently down.

Fixes: c2c507786 "main: Added entry points for glGetTextureParameteriv, Iiv, and Iuiv."
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
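
The pattern behind the fix above (do not copy a temporary into the caller's buffer when the query failed) might look like the following minimal sketch. The functions `query_params` and `get_params_uint` are hypothetical stand-ins, not the real Mesa entry points, and the "only pname 1 is valid" rule is invented for the example.

```c
#include <stdbool.h>

/* Hypothetical internal query: fills ip[] and returns true on success,
 * or returns false and leaves ip[] uninitialized on error. */
static bool query_params(int pname, int ip[4])
{
    if (pname != 1)          /* assumption: only pname 1 is valid here */
        return false;
    ip[0] = 10; ip[1] = 20; ip[2] = 30; ip[3] = 40;
    return true;
}

/* Copy to the caller's buffer only when the query succeeded, so that an
 * error never writes uninitialized data to user memory. */
static void get_params_uint(int pname, unsigned params[4])
{
    int ip[4];

    if (!query_params(pname, ip))
        return;              /* error path: user buffer is left untouched */

    for (int i = 0; i < 4; i++)
        params[i] = (unsigned)ip[i];
}
```

With this shape, the compiler can also see that `ip` is only read after it has been written, which is what silences the `-Wmaybe-uninitialized` style of warning.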
6 years ago gallium: work around libtool relink issue for libdrm
Roman Gilg [Mon, 5 Mar 2018 16:41:44 +0000 (17:41 +0100)]
gallium: work around libtool relink issue for libdrm

This is similar to commit 90633079. libtool links first to system directories
instead of custom locations of libdrm on relinking. Since a more recent libdrm
version than the one provided by the system is often needed when compiling
mesa, make sure this works by putting libdrm in front.

See also: https://bugs.freedesktop.org/show_bug.cgi?id=100259

Signed-off-by: Roman Gilg <subdiff@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago vulkan: autotools: do not redirect stdin/stdout for wayland-scanner
Emil Velikov [Thu, 8 Mar 2018 17:08:45 +0000 (17:08 +0000)]
vulkan: autotools: do not redirect stdin/stdout for wayland-scanner

The tool accepts the input and output files as arguments.
There's no need for the redirection.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years ago wayland-drm: autotools: do not redirect stdin/stdout for wayland-scanner
Emil Velikov [Thu, 8 Mar 2018 17:07:39 +0000 (17:07 +0000)]
wayland-drm: autotools: do not redirect stdin/stdout for wayland-scanner

The tool accepts the input and output files as arguments.
There's no need for the redirection.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years ago egl: autotools: do not redirect stdin/stdout for wayland-scanner
Emil Velikov [Thu, 8 Mar 2018 16:16:18 +0000 (16:16 +0000)]
egl: autotools: do not redirect stdin/stdout for wayland-scanner

The tool accepts the input and output files as arguments.
There's no need for the redirection.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years ago docs: document removal of GLX_SGIX_swap_{barrier,group} stubs
Emil Velikov [Thu, 8 Mar 2018 14:07:07 +0000 (14:07 +0000)]
docs: document removal of GLX_SGIX_swap_{barrier,group} stubs

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago glx: remove empty GLX_SGIX_swap_group stubs
Emil Velikov [Mon, 5 Mar 2018 18:33:14 +0000 (18:33 +0000)]
glx: remove empty GLX_SGIX_swap_group stubs

The extension was never implemented. Quick search suggests:
 - no actual users (on my Arch setup)
 - the Nvidia driver does not implement the extension

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
6 years ago gallium/x11: remove empty GLX_SGIX_swap_group stubs
Emil Velikov [Mon, 5 Mar 2018 18:30:40 +0000 (18:30 +0000)]
gallium/x11: remove empty GLX_SGIX_swap_group stubs

The extension was never implemented. Quick search suggests:
 - no actual users (on my Arch setup)
 - the Nvidia driver does not implement the extension

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
6 years ago x11: remove empty GLX_SGIX_swap_group stubs
Emil Velikov [Mon, 5 Mar 2018 18:28:35 +0000 (18:28 +0000)]
x11: remove empty GLX_SGIX_swap_group stubs

The extension was never implemented. Quick search suggests:
 - no actual users (on my Arch setup)
 - the Nvidia driver does not implement the extension

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
6 years ago glx: remove empty GLX_SGIX_swap_barrier stubs
Emil Velikov [Mon, 5 Mar 2018 18:25:16 +0000 (18:25 +0000)]
glx: remove empty GLX_SGIX_swap_barrier stubs

The extension was never implemented. Quick search suggests:
 - no actual users (on my Arch setup)
 - the Nvidia driver does not implement the extension

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
6 years ago gallium/x11: remove empty GLX_SGIX_swap_barrier stubs
Emil Velikov [Mon, 5 Mar 2018 18:22:38 +0000 (18:22 +0000)]
gallium/x11: remove empty GLX_SGIX_swap_barrier stubs

The extension was never implemented. Quick search suggests:
 - no actual users (on my Arch setup)
 - the Nvidia driver does not implement the extension

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
6 years ago x11: remove empty GLX_SGIX_swap_barrier stubs
Emil Velikov [Mon, 5 Mar 2018 18:17:13 +0000 (18:17 +0000)]
x11: remove empty GLX_SGIX_swap_barrier stubs

The extension was never implemented. Quick search suggests:
 - no actual users (on my Arch setup)
 - the Nvidia driver does not implement the extension

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
6 years ago configure: remove unused AM_CONDITIONAL
Emil Velikov [Mon, 5 Mar 2018 18:14:51 +0000 (18:14 +0000)]
configure: remove unused AM_CONDITIONAL

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
6 years ago radv: Increase the number of dynamic uniform buffers.
Bas Nieuwenhuizen [Fri, 9 Mar 2018 16:18:03 +0000 (17:18 +0100)]
radv: Increase the number of dynamic uniform buffers.

The Vulkan API is not ideal as it does not allow us to have a
shared limit.

Feral needs 15+6 for one of their games, and I'm not a fan
of overcommitting the limits, so increase the number of
dynamic uniform buffers to 16.

CC: <mesa-stable@lists.freedesktop.org>
CC: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years ago u_vbuf/translate: pass max_index into the set_buffer.
Dave Airlie [Thu, 8 Mar 2018 20:18:55 +0000 (06:18 +1000)]
u_vbuf/translate: pass max_index into the set_buffer.

This fixes a memory trashing crash (not the test) seen with
dEQP-GLES3.stress.draw.unaligned_data.random.203
on virgl.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years ago r600: implement callstack workaround for evergreen.
Dave Airlie [Fri, 9 Mar 2018 06:03:53 +0000 (16:03 +1000)]
r600: implement callstack workaround for evergreen.

This is ported from the sb backend. There are some issues with
evergreen stacks on the boundary between entries and ALU_PUSH_BEFORE
instructions.

Whenever we are going to use a push before, we check the stack
usage and if we have to use the workaround, then we switch to
a separate push.

I noticed this problem when dealing with some of the soft fp64 shaders;
in nosb mode, they are quite stack happy.

This fixes all the glitches and inconsistencies I've seen with them.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Tested-by: Elie Tournier <elie.tournier@collabora.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years ago gallium/util: add helper util_wait_for_idle
Marek Olšák [Sun, 11 Mar 2018 00:23:45 +0000 (19:23 -0500)]
gallium/util: add helper util_wait_for_idle

This is an old patch that I had.

6 years ago u_blit: (trivial) u_blit.h needs to include p_defines.h
Roland Scheidegger [Sat, 10 Mar 2018 01:48:42 +0000 (02:48 +0100)]
u_blit: (trivial) u_blit.h needs to include p_defines.h

(For the pipe_tex_filter enum)

Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
6 years ago travis: bump libxcb version to 1.13
Christian Gmeiner [Sat, 10 Mar 2018 14:53:27 +0000 (15:53 +0100)]
travis: bump libxcb version to 1.13

Fixes following dependency problem:
  Native dependency xcb-dri3 found: NO found '1.11' but need: '>= 1.13'

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Fixes: c80c08e22603 ("vulkan/wsi/x11: Add support for DRI3 v1.2")
6 years ago mesa: Make gl_vertex_array contain pointers to first order VAO members.
Mathias Fröhlich [Sun, 4 Mar 2018 17:15:53 +0000 (18:15 +0100)]
mesa: Make gl_vertex_array contain pointers to first order VAO members.

Instead of keeping a copy of the vertex array content in
struct gl_vertex_array, only keep pointers to the first order
information originally in the VAO.
For that, represent the current values by struct gl_array_attributes
and struct gl_vertex_buffer_binding.

v2: Change comments.
    Remove gl... prefix from variables except in the i965 directory where
    it was like that before. Reindent because of that.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years ago draw: fix alpha value for very short aa lines
Roland Scheidegger [Fri, 9 Mar 2018 04:27:25 +0000 (05:27 +0100)]
draw: fix alpha value for very short aa lines

The logic would not work correctly for line lengths smaller than 1.0;
even a degenerate line with length 0 would still produce a fragment
with an alpha anywhere between 0.0 and 0.5.

Reviewed-by: Brian Paul <brianp@vmware.com>
6 years ago intel/vulkan: Hard code CS scratch_ids_per_subslice for Cherryview
Jordan Justen [Wed, 7 Mar 2018 07:28:00 +0000 (23:28 -0800)]
intel/vulkan: Hard code CS scratch_ids_per_subslice for Cherryview

Ken suggested that we might be underallocating scratch space on HD
400. Allocating scratch space as though there were actually 8 EUs
seems to help with a GPU hang seen on synmark CSDof.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years ago i965: Hard code CS scratch_ids_per_subslice for Cherryview
Jordan Justen [Tue, 6 Mar 2018 16:35:50 +0000 (08:35 -0800)]
i965: Hard code CS scratch_ids_per_subslice for Cherryview

Ken suggested that we might be underallocating scratch space on HD
400. Allocating scratch space as though there were actually 8 EUs
seems to help with a GPU hang seen on synmark CSDof.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
6 years ago st/dri: fix OpenGL-OpenCL interop for GL_TEXTURE_BUFFER
Marek Olšák [Wed, 7 Mar 2018 18:47:28 +0000 (13:47 -0500)]
st/dri: fix OpenGL-OpenCL interop for GL_TEXTURE_BUFFER

Tested by our OpenCL team.

Fixes: 9c499e6759b26c5e "st/mesa: don't invoke st_finalize_texture & st_convert_sampler for TBOs"
Acked-by: Alex Deucher <alexander.deucher@amd.com>
6 years ago radeonsi: add a workaround for GFX9 hang with init_config alignment
Marek Olšák [Fri, 9 Mar 2018 21:25:42 +0000 (16:25 -0500)]
radeonsi: add a workaround for GFX9 hang with init_config alignment

Fixes: 75c5d25f0f34cd702 "radeonsi: align command buffer starting address to fix some Raven hangs"
Cc: 17.3 18.0 <mesa-stable@lists.freedesktop.org>
6 years ago ac/gpu_info: print ib_start_alignment, add assertion
Marek Olšák [Fri, 9 Mar 2018 21:24:40 +0000 (16:24 -0500)]
ac/gpu_info: print ib_start_alignment, add assertion

6 years ago meson: Use system_has_kms_drm in default driver selection
Greg V [Tue, 6 Mar 2018 19:16:03 +0000 (22:16 +0300)]
meson: Use system_has_kms_drm in default driver selection

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years ago broadcom/vc4: Add an accelerated path to turn raster R8/RG88 into tiled.
Eric Anholt [Tue, 6 Feb 2018 16:43:24 +0000 (16:43 +0000)]
broadcom/vc4: Add an accelerated path to turn raster R8/RG88 into tiled.

Drawing a 1080p YV12 video stream generated by MMAL goes from 10.5 FPS to
36.

6 years ago gallium: Add a util_blitter path for using a custom VS and FS.
Eric Anholt [Wed, 7 Feb 2018 14:40:08 +0000 (14:40 +0000)]
gallium: Add a util_blitter path for using a custom VS and FS.

Like the r600 paths to use other custom states, we pass in a couple of
parameters to customize the innards of the blitter.  It's up to the caller
to wrap other state necessary for its shaders (for example, constant
buffers for the uniforms the shader uses).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago broadcom/vc4: Allow binding non-zero constant buffers.
Eric Anholt [Wed, 7 Feb 2018 15:22:19 +0000 (15:22 +0000)]
broadcom/vc4: Allow binding non-zero constant buffers.

We're going to use UBO loads for implementing YUV linear-to-T-format
blits.

6 years ago broadcom: Remove our defines of DRM_FORMAT_MOD_INVALID.
Eric Anholt [Wed, 28 Feb 2018 22:16:54 +0000 (14:16 -0800)]
broadcom: Remove our defines of DRM_FORMAT_MOD_INVALID.

The imported drm_fourcc.h handles it now.

6 years ago broadcom: Suppress compiler warnings about enum pipe_tex_filter.
Eric Anholt [Wed, 28 Feb 2018 22:15:34 +0000 (14:15 -0800)]
broadcom: Suppress compiler warnings about enum pipe_tex_filter.

6 years ago egl/x11: Re-allocate buffers if format is suboptimal
Louis-Francis Ratté-Boulianne [Fri, 6 Oct 2017 05:26:51 +0000 (01:26 -0400)]
egl/x11: Re-allocate buffers if format is suboptimal

If PresentCompleteNotify event says the pixmap was presented
with mode PresentCompleteModeSuboptimalCopy, it means the pixmap
could possibly have been flipped instead if allocated with a
different format/modifier.

Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
6 years ago egl/x11: Support DRI3 v1.1
Louis-Francis Ratté-Boulianne [Fri, 7 Jul 2017 06:54:26 +0000 (02:54 -0400)]
egl/x11: Support DRI3 v1.1

Add support for DRI3 v1.1, which allows pixmaps to be backed by
multi-planar buffers, or those with format modifiers. This is both
for allocating render buffers, as well as EGLImage imports from a
native pixmap (EGL_NATIVE_PIXMAP_KHR).

Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
6 years ago vulkan/wsi/x11: Return VK_SUBOPTIMAL_KHR for X11
Louis-Francis Ratté-Boulianne [Wed, 27 Sep 2017 03:11:55 +0000 (23:11 -0400)]
vulkan/wsi/x11: Return VK_SUBOPTIMAL_KHR for X11

Return VK_SUBOPTIMAL_KHR when it is detected that a window could have
been flipped but was copied instead because of a suboptimal
format/modifier. The Vulkan client should then re-create the swapchain.

Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
6 years ago vulkan/wsi/x11: Add support for DRI3 v1.2
Daniel Stone [Thu, 8 Jun 2017 16:24:30 +0000 (17:24 +0100)]
vulkan/wsi/x11: Add support for DRI3 v1.2

Adds support for multiple planes and buffer modifiers.

v4: Rename "has_dri3_v1_1" to "has_dri3_modifiers"
v12: Multi-planar/modifier support is now DRI3 v1.2; also update release
     versions

6 years ago autotools: include all meson.build files
Dylan Baker [Fri, 2 Mar 2018 17:57:54 +0000 (09:57 -0800)]
autotools: include all meson.build files

Otherwise SWR cannot be built with meson from an autotools generated
tarball, such as the 18.0.0-rc4 tarball.

Fixes: 16bf81383080 ("meson/swr: re-shuffle generated files")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago st/mesa: gl_program::info.system_values_read is a 64-bit-field
Michel Dänzer [Thu, 8 Mar 2018 16:32:50 +0000 (17:32 +0100)]
st/mesa: gl_program::info.system_values_read is a 64-bit-field

We were dropping the upper 32 bits, which caused assertion failures in
some compute shader piglit tests with radeonsi since the commit below.

Fixes: 752e96970303 ("compiler: Add two new system values for subgroups")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
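
The class of bug fixed above can be illustrated with a small sketch: once a 64-bit mask is narrowed to 32 bits, any bit at index 32 or higher is silently lost. The helper names and the bit index 40 used below are hypothetical and only serve the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy pattern: the mask is narrowed to 32 bits before testing,
 * so bits 32..63 can never be seen. */
static bool bit_set_32(uint64_t mask, unsigned bit)
{
    uint32_t narrowed = (uint32_t)mask;   /* upper 32 bits dropped here */
    return bit < 32 && ((narrowed >> bit) & 1u);
}

/* Fixed pattern: keep the full 64-bit width. */
static bool bit_set_64(uint64_t mask, unsigned bit)
{
    return bit < 64 && ((mask >> bit) & 1u);
}
```

For a mask with bits 0 and 40 set, `bit_set_64` reports both, while `bit_set_32` reports only bit 0.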
6 years ago swr/rast: Refactor memory gather operations
George Kyriazis [Thu, 1 Mar 2018 18:39:18 +0000 (12:39 -0600)]
swr/rast: Refactor memory gather operations

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years ago swr/rast: Add KNOB_DISABLE_SPLIT_DRAW
George Kyriazis [Tue, 27 Feb 2018 21:29:52 +0000 (15:29 -0600)]
swr/rast: Add KNOB_DISABLE_SPLIT_DRAW

This is useful for archrast data collection. This greatly speeds up the
post processing script since significantly fewer events are generated.

Finally, this is a simpler option to communicate to users than having
them directly adjust MAX_PRIMS_PER_DRAW and MAX_TESS_PRIMS_PER_DRAW.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years ago swr/rast: Add VPOPCNT
George Kyriazis [Fri, 2 Mar 2018 06:54:38 +0000 (00:54 -0600)]
swr/rast: Add VPOPCNT

Supports popcnt on vector masks (e.g. <8 x i1>)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
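
As a scalar illustration of what a population count over an 8-lane boolean mask (`<8 x i1>`) computes, consider the sketch below. It is not the SWR builder code: `pack_mask` and `popcount8` are hypothetical helpers, and treating the mask as a packed 8-bit integer is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 8

/* Pack one boolean per lane into the low 8 bits, lane 0 in bit 0. */
static uint8_t pack_mask(const bool lanes[LANES])
{
    uint8_t bits = 0;
    for (int i = 0; i < LANES; i++)
        if (lanes[i])
            bits |= (uint8_t)(1u << i);
    return bits;
}

/* Count the set bits by clearing the lowest set bit on each iteration. */
static unsigned popcount8(uint8_t bits)
{
    unsigned n = 0;
    while (bits) {
        bits &= (uint8_t)(bits - 1);
        n++;
    }
    return n;
}
```

The result is simply the number of active lanes in the mask.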
6 years ago swr/rast: Add tracking for stream out topology
George Kyriazis [Wed, 28 Feb 2018 23:33:13 +0000 (17:33 -0600)]
swr/rast: Add tracking for stream out topology

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years ago swr/rast: Add split draw and other state information to DrawInfoEvent.
George Kyriazis [Mon, 26 Feb 2018 23:55:23 +0000 (17:55 -0600)]
swr/rast: Add split draw and other state information to DrawInfoEvent.

Removed specific split draw events.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years ago swr/rast: Refactor api and worker event handlers.
George Kyriazis [Mon, 26 Feb 2018 21:19:08 +0000 (15:19 -0600)]
swr/rast: Refactor api and worker event handlers.

In the API event handler we want to share information between the core
layer and the API. Specifically, around associating various ids with
different kinds of events. For example, associate render pass id with
draw ids, or command buffer ids with draw ids.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years ago swr/rast: Add support for generalized late and early z/stencil stats
George Kyriazis [Sat, 24 Feb 2018 00:51:18 +0000 (18:51 -0600)]
swr/rast: Add support for generalized late and early z/stencil stats

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years ago swr/rast: Rasterized Subspans stats support
George Kyriazis [Fri, 23 Feb 2018 22:11:04 +0000 (16:11 -0600)]
swr/rast: Rasterized Subspans stats support

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years ago swr/rast: Added comment
George Kyriazis [Wed, 21 Feb 2018 01:24:55 +0000 (19:24 -0600)]
swr/rast: Added comment

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years ago vulkan/wsi: clean up cleanup path
Eric Engestrom [Mon, 26 Feb 2018 13:34:54 +0000 (13:34 +0000)]
vulkan/wsi: clean up cleanup path

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago radv: Fix the autotools build take 2.
Bas Nieuwenhuizen [Fri, 9 Mar 2018 13:08:38 +0000 (14:08 +0100)]
radv: Fix the autotools build take 2.

Forgot to remove a word....

Fixes: 04ffabf17a "radv: Fix autotools build."
6 years ago etnaviv: allow mixing different bit depths for color and depth surfaces
Lucas Stach [Wed, 7 Mar 2018 13:31:59 +0000 (14:31 +0100)]
etnaviv: allow mixing different bit depths for color and depth surfaces

Vivante hardware supports this just fine. There is no reason why this shouldn't
be advertised as a valid combination.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
6 years ago autotools: Add tegra to AM_DISTCHECK_CONFIGURE_FLAGS
Thierry Reding [Thu, 22 Feb 2018 17:21:45 +0000 (18:21 +0100)]
autotools: Add tegra to AM_DISTCHECK_CONFIGURE_FLAGS

This allows the driver to be built during make distcheck and makes sure
that it properly builds when a distribution tarball is made.

Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
6 years ago tegra: Initial support
Thierry Reding [Tue, 27 May 2014 22:36:48 +0000 (00:36 +0200)]
tegra: Initial support

Tegra K1 and later use a GPU that can be driven by the Nouveau driver.
But the GPU is a pure render node and has no display engine, hence the
scanout needs to happen on the Tegra display hardware. The GPU and the
display engine each have a separate DRM device node exposed by the
kernel.

To make the setup appear as a single device, this driver instantiates
a Nouveau screen with each instance of a Tegra screen and forwards GPU
requests to the Nouveau screen. For purposes of scanout it will import
buffers created on the GPU into the display driver. Handles that
userspace requests are those of the display driver so that they can be
used to create framebuffers.

This has been tested with some GBM test programs, as well as kmscube and
weston. All of those run without modifications, but I'm sure there is a
lot that can be improved.

Some fixes contributed by Hector Martin <marcan@marcan.st>.

Changes in v2:
- duplicate file descriptor in winsys to avoid potential issues
- require nouveau when building the tegra driver
- check for nouveau driver name on render node
- remove unneeded dependency on libdrm_tegra
- remove zombie references to libudev
- add missing headers to C_SOURCES variable
- drop unneeded tegra/ prefix for includes
- open device files with O_CLOEXEC
- update copyrights

Changes in v3:
- properly unwrap resources in ->resource_copy_region()
- support vertex buffers passed by user pointer
- allocate custom stream and const uploader
- silence error message on pre-Tegra124
- support X without explicit PRIME

Changes in v4:
- ship Meson build files in distribution tarball
- drop duplicate driver_tegra dependency

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
6 years ago nouveau: Add framebuffer modifier support
Thierry Reding [Wed, 11 Oct 2017 12:38:56 +0000 (14:38 +0200)]
nouveau: Add framebuffer modifier support

This adds support for framebuffer modifiers to Nouveau. This will be
used by the Tegra driver to share metadata about the format of buffers
(such as the tiling mode or compression).

Changes in v2:
- remove unused parameters to nouveau_buffer_create()
- move format modifier query code to nvc0 backend
- restrict format modifiers to 2D textures
- implement ->query_dmabuf_modifiers()

Changes in v4:
- add UAPI include path on meson builds

Changes in v5:
- remove unnecessary includes

Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Thierry Reding <treding@nvidia.com>
6 years ago nouveau/nvc0: Extract common tile mode macro
Thierry Reding [Wed, 11 Oct 2017 12:41:26 +0000 (14:41 +0200)]
nouveau/nvc0: Extract common tile mode macro

Add a new macro that can be used to extract the tiling mode from a
tile_mode value. This will be used to determine the number of GOBs
used in block linear mode.

Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Thierry Reding <treding@nvidia.com>
6 years ago drm/tegra: Sanitize format modifiers
Thierry Reding [Tue, 20 Feb 2018 14:48:37 +0000 (15:48 +0100)]
drm/tegra: Sanitize format modifiers

The existing format modifier definitions were merged prematurely, and
recent work has unveiled that the definitions are suboptimal in several
ways:

  - The format specifiers, except for one, are not Tegra specific, but
    the names don't reflect that.
  - The number space is split into two, reserving 32 bits for some
    "parameter" which most of the modifiers are not going to have.
  - Symbolic names for the modifiers are not using the standard
    DRM_FORMAT_MOD_* prefix, which makes them awkward to use.
  - The vendor prefix NV is somewhat ambiguous.

Fortunately, nobody's started using these modifiers, so we can still fix
the above issues. Do so by using the standard prefix. Also, remove TEGRA
from the name of those modifiers that exist on NVIDIA GPUs as well. In
case of the block linear modifiers, make the "parameter" smaller (4
bits, though only 6 values are valid) and don't let that leak into any
of the other modifiers.

Finally, also use the more canonical NVIDIA instead of the ambiguous NV
prefix.

This is based on commit 268892cb63a822315921a8dab48ac3e4abf7dd03 from
Linux v4.16-rc1.

Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
6 years ago drm/fourcc: Fix fourcc_mod_code() definition
Thierry Reding [Tue, 20 Feb 2018 14:47:25 +0000 (15:47 +0100)]
drm/fourcc: Fix fourcc_mod_code() definition

Avoid a compiler warning when the val parameter is an expression.

This is based on commit 5843f4e02fbe86a59981e35adc6cabebee46fdc0 from
Linux v4.16-rc1.

Acked-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
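
Why an expression argument matters for a macro like the one in the commit above can be sketched as follows. The names `MOD_CODE`, `VALUE_MASK` and `unparenthesized_expansion` are hypothetical, and the assumption that the fix amounts to parenthesizing `val` in the macro body is mine, not taken from the commit text.

```c
#include <stdint.h>

#define VALUE_MASK 0x00ffffffffffffffULL

/* Hypothetical macro with `val` parenthesized: an expression argument
 * is masked as a whole before being combined with the vendor bits. */
#define MOD_CODE(vendor, val) \
    ((((uint64_t)(vendor)) << 56) | ((val) & VALUE_MASK))

/* What an argument `hi | lo` would turn into if the macro body used a
 * bare `val & VALUE_MASK`: `&` binds tighter than `|`, so only `lo` is
 * masked and `hi` can leak into the vendor byte. */
static uint64_t unparenthesized_expansion(uint64_t vendor,
                                          uint64_t hi, uint64_t lo)
{
    return (vendor << 56) | (hi | (lo & VALUE_MASK));
}
```

With a vendor of 3, `hi` set to bit 60 and `lo` set to 5, the parenthesized macro keeps the vendor byte intact, while the unparenthesized form corrupts it.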
6 years ago radv: Fix autotools build.
Bas Nieuwenhuizen [Fri, 9 Mar 2018 07:43:01 +0000 (08:43 +0100)]
radv: Fix autotools build.

Forgot it again ....

Fixes: b6347807a9 "radv: Generate icd files."
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years ago ac/nir: set number of channels for packed mrt exports
Samuel Pitoiset [Thu, 8 Mar 2018 16:30:05 +0000 (17:30 +0100)]
ac/nir: set number of channels for packed mrt exports

Bit 0 enables VSRC0 (R in low bits, G high) and bit 2 enables
VSRC1 (B in low bits, A high).
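
As a rough illustration of the layout described above, the sketch below
packs two 16-bit channels per export source and builds the matching
enable mask. All names are hypothetical and not taken from ac/nir:

```c
#include <stdint.h>

/* Pack two 16-bit channels into one 32-bit export source:
 * first channel in the low bits, second channel in the high bits. */
static inline uint32_t sketch_pack_2x16(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

enum {
    SKETCH_EN_VSRC0 = 1u << 0, /* VSRC0 carries R (low) and G (high) */
    SKETCH_EN_VSRC1 = 1u << 2, /* VSRC1 carries B (low) and A (high) */
};

/* Enable mask for a packed export that writes all four channels. */
static inline unsigned sketch_packed_export_mask(void)
{
    return SKETCH_EN_VSRC0 | SKETCH_EN_VSRC1;
}
```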

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: Update version to 1.1.70.
Bas Nieuwenhuizen [Thu, 8 Mar 2018 23:49:57 +0000 (00:49 +0100)]
radv: Update version to 1.1.70.

Turns out they did not reset the patch number on release.

Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradv: Generate icd files.
Bas Nieuwenhuizen [Thu, 8 Mar 2018 23:47:26 +0000 (00:47 +0100)]
radv: Generate icd files.

If the api version is too low, the loader clamps the application
requested version to the advertized version, which messes with
which extensions are enabled.

Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agonir: Don't i2b a value that is already Boolean
Ian Romanick [Thu, 22 Feb 2018 02:15:52 +0000 (18:15 -0800)]
nir: Don't i2b a value that is already Boolean

A bunch of shaders have sequences like:

    i2b(u2i(floatBitsToUint(intBitsToFloat(x == y ? -1 : 0))))

Other optimizations (and NIR's typeless nature) reduce this to

    i2b(x == y)

which is silly.
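
A small sketch of why the conversion is redundant, assuming 32-bit
Booleans where false is 0 and true is -1 (all bits set), as in the
sequence above. The sketch_* helpers are illustrative models, not NIR
code:

```c
#include <stdint.h>

/* Model of an integer comparison that produces a Boolean (0 or -1). */
static inline int32_t sketch_ieq(int32_t x, int32_t y)
{
    return x == y ? -1 : 0;
}

/* Model of i2b: any non-zero integer becomes true (-1). */
static inline int32_t sketch_i2b(int32_t v)
{
    return v != 0 ? -1 : 0;
}
```

Because sketch_ieq() only ever yields 0 or -1, applying sketch_i2b() to
its result returns the same value, so the conversion can be dropped.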

Skylake
total instructions in shared programs: 14498698 -> 14497948 (<.01%)
instructions in affected programs: 74480 -> 73730 (-1.01%)
helped: 277
HURT: 0
helped stats (abs) min: 1 max: 32 x̄: 2.71 x̃: 2
helped stats (rel) min: 0.04% max: 13.79% x̄: 1.45% x̃: 0.68%
95% mean confidence interval for instructions value: -3.35 -2.06
95% mean confidence interval for instructions %-change: -1.74% -1.16%
Instructions are helped.

total cycles in shared programs: 532015500 -> 531999238 (<.01%)
cycles in affected programs: 5943878 -> 5927616 (-0.27%)
helped: 251
HURT: 74
helped stats (abs) min: 1 max: 13149 x̄: 127.89 x̃: 14
helped stats (rel) min: 0.01% max: 17.31% x̄: 1.55% x̃: 0.53%
HURT stats (abs)   min: 1 max: 4550 x̄: 214.04 x̃: 15
HURT stats (rel)   min: <.01% max: 44.43% x̄: 2.81% x̃: 0.33%
95% mean confidence interval for cycles value: -158.51 58.43
95% mean confidence interval for cycles %-change: -1.07% -0.04%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 4753 -> 4735 (-0.38%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.

Haswell and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 14791877 -> 14791127 (<.01%)
instructions in affected programs: 77326 -> 76576 (-0.97%)
helped: 278
HURT: 1
helped stats (abs) min: 1 max: 32 x̄: 2.70 x̃: 2
helped stats (rel) min: 0.04% max: 13.79% x̄: 1.42% x̃: 0.68%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.49% max: 0.49% x̄: 0.49% x̃: 0.49%
95% mean confidence interval for instructions value: -3.33 -2.05
95% mean confidence interval for instructions %-change: -1.70% -1.13%
Instructions are helped.

total cycles in shared programs: 558250067 -> 558252872 (<.01%)
cycles in affected programs: 5806328 -> 5809133 (0.05%)
helped: 235
HURT: 83
helped stats (abs) min: 1 max: 10630 x̄: 81.73 x̃: 16
helped stats (rel) min: 0.03% max: 18.58% x̄: 1.60% x̃: 0.51%
HURT stats (abs)   min: 1 max: 10590 x̄: 265.19 x̃: 20
HURT stats (rel)   min: <.01% max: 15.28% x̄: 1.89% x̃: 0.54%
95% mean confidence interval for cycles value: -89.87 107.51
95% mean confidence interval for cycles %-change: -1.06% -0.32%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 4735 -> 4717 (-0.38%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.

total fills in shared programs: 83111 -> 83110 (<.01%)
fills in affected programs: 28 -> 27 (-3.57%)
helped: 1
HURT: 0

Ivy Bridge
total instructions in shared programs: 11774173 -> 11773436 (<.01%)
instructions in affected programs: 70819 -> 70082 (-1.04%)
helped: 267
HURT: 0
helped stats (abs) min: 1 max: 48 x̄: 2.76 x̃: 2
helped stats (rel) min: 0.21% max: 19.51% x̄: 1.57% x̃: 0.63%
95% mean confidence interval for instructions value: -3.51 -2.01
95% mean confidence interval for instructions %-change: -1.94% -1.21%
Instructions are helped.

total cycles in shared programs: 257153833 -> 257148932 (<.01%)
cycles in affected programs: 585341 -> 580440 (-0.84%)
helped: 167
HURT: 100
helped stats (abs) min: 1 max: 1327 x̄: 44.89 x̃: 16
helped stats (rel) min: 0.04% max: 26.54% x̄: 2.41% x̃: 0.88%
HURT stats (abs)   min: 1 max: 200 x̄: 25.95 x̃: 16
HURT stats (rel)   min: 0.04% max: 9.81% x̄: 1.34% x̃: 0.65%
95% mean confidence interval for cycles value: -33.25 -3.46
95% mean confidence interval for cycles %-change: -1.47% -0.54%
Cycles are helped.

total loops in shared programs: 3416 -> 3398 (-0.53%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.

LOST:   2
GAINED: 0

Sandy Bridge
total instructions in shared programs: 10499306 -> 10499094 (<.01%)
instructions in affected programs: 6051 -> 5839 (-3.50%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 32 x̄: 4.93 x̃: 2
helped stats (rel) min: 0.39% max: 12.90% x̄: 4.29% x̃: 2.45%
95% mean confidence interval for instructions value: -7.66 -2.20
95% mean confidence interval for instructions %-change: -5.47% -3.12%
Instructions are helped.

total cycles in shared programs: 145862568 -> 145861370 (<.01%)
cycles in affected programs: 61733 -> 60535 (-1.94%)
helped: 36
HURT: 2
helped stats (abs) min: 16 max: 66 x̄: 36.61 x̃: 35
helped stats (rel) min: 0.45% max: 17.31% x̄: 4.92% x̃: 2.81%
HURT stats (abs)   min: 18 max: 102 x̄: 60.00 x̃: 60
HURT stats (rel)   min: 1.10% max: 1.85% x̄: 1.48% x̃: 1.48%
95% mean confidence interval for cycles value: -41.28 -21.77
95% mean confidence interval for cycles %-change: -6.16% -3.00%
Cycles are helped.

total loops in shared programs: 1803 -> 1785 (-1.00%)
loops in affected programs: 18 -> 0
helped: 18
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for loops value: -1.00 -1.00
95% mean confidence interval for loops %-change: -100.00% -100.00%
Loops are helped.

LOST:   4
GAINED: 0

No changes on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years agoi965/vec4: Allow CSE on subset VF constant loads
Ian Romanick [Sat, 17 Feb 2018 01:33:13 +0000 (17:33 -0800)]
i965/vec4: Allow CSE on subset VF constant loads

v2: Rewrite the code that generates the VF mask.  Suggested by Ken.

No changes on other platforms.

Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13059891 -> 13059884 (<.01%)
instructions in affected programs: 431 -> 424 (-1.62%)
helped: 7
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.19% max: 5.26% x̄: 2.05% x̃: 1.49%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -3.39% -0.71%
Instructions are helped.

total cycles in shared programs: 409260032 -> 409260018 (<.01%)
cycles in affected programs: 4228 -> 4214 (-0.33%)
helped: 7
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 2.04% x̄: 0.54% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %-change: -1.15% 0.07%

Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965/vec4: Relax writemask condition in CSE
Ian Romanick [Sat, 17 Feb 2018 01:26:11 +0000 (17:26 -0800)]
i965/vec4: Relax writemask condition in CSE

If the previously seen instruction generates more fields than the new
instruction, still allow CSE to happen.  This doesn't do much, but it
also enables a couple more shaders in the next patch.  It helped quite a
bit in another change series that I have (at least for now) abandoned.
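
The relaxed condition can be sketched roughly like this, assuming one
writemask bit per vec4 channel. The names are hypothetical and the check
in the actual CSE pass may be structured differently:

```c
#include <stdbool.h>

/* One writemask bit per channel (hypothetical names). */
enum { SKETCH_WM_X = 1, SKETCH_WM_Y = 2, SKETCH_WM_Z = 4, SKETCH_WM_W = 8 };

/* The earlier instruction can stand in for the new one when it already
 * writes every channel that the new instruction would write. */
static inline bool sketch_writemask_allows_cse(unsigned prev_mask,
                                               unsigned new_mask)
{
    return (prev_mask & new_mask) == new_mask;
}
```

A strict check would instead require prev_mask == new_mask, which rejects
the case where the earlier instruction computes extra channels.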

v2: Add some extra commentary about the parameters to instructions_match.
Suggested by Ken.

No changes on Skylake, Broadwell, Iron Lake or GM45.

Ivy Bridge and Haswell had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11780295 -> 11780294 (<.01%)
instructions in affected programs: 302 -> 301 (-0.33%)
helped: 1
HURT: 0

total cycles in shared programs: 257308315 -> 257308313 (<.01%)
cycles in affected programs: 2074 -> 2072 (-0.10%)
helped: 1
HURT: 0

Sandy Bridge
total instructions in shared programs: 10506687 -> 10506686 (<.01%)
instructions in affected programs: 335 -> 334 (-0.30%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965/fs: Merge CMP and SEL into CSEL on Gen8+
Ian Romanick [Thu, 22 Feb 2018 02:06:56 +0000 (18:06 -0800)]
i965/fs: Merge CMP and SEL into CSEL on Gen8+

v2: Fix several problems handling inverted predicates.  Add a much
bigger comment around the BRW_CONDITIONAL_NZ case.

v3: Allow uniforms and shader inputs as sources for the original SEL and
CMP instructions.  This enables a LOT more shaders to receive CSEL
merging (5816 vs 8564 on SKL).

v4: Report progress.

Broadwell and Skylake had similar results. (Broadwell shown)
helped: 8527
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 2.44 x̃: 1
helped stats (rel) min: 0.03% max: 17.80% x̄: 1.12% x̃: 0.70%
95% mean confidence interval for instructions value: -2.51 -2.36
95% mean confidence interval for instructions %-change: -1.15% -1.10%
Instructions are helped.

total cycles in shared programs: 559442317 -> 558288357 (-0.21%)
cycles in affected programs: 372699860 -> 371545900 (-0.31%)
helped: 6748
HURT: 1450
helped stats (abs) min: 1 max: 32000 x̄: 182.41 x̃: 12
helped stats (rel) min: <.01% max: 66.08% x̄: 3.42% x̃: 0.70%
HURT stats (abs)   min: 1 max: 2538 x̄: 53.08 x̃: 14
HURT stats (rel)   min: <.01% max: 96.72% x̄: 3.32% x̃: 0.90%
95% mean confidence interval for cycles value: -179.01 -102.51
95% mean confidence interval for cycles %-change: -2.37% -2.08%
Cycles are helped.

LOST:   0
GAINED: 6

No changes on earlier platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
6 years agoi965/fs: Add infrastructure for generating CSEL instructions.
Kenneth Graunke [Mon, 23 Nov 2015 04:12:17 +0000 (20:12 -0800)]
i965/fs: Add infrastructure for generating CSEL instructions.

v2 (idr): Don't allow CSEL with a non-float src2.

v3 (idr): Add CSEL to fs_inst::flags_written.  Suggested by Matt.

v4 (idr): Only set BRW_ALIGN_16 on Gen < 10 (suggested by Matt).  Don't
reset the access mode afterwards (suggested by Samuel and Matt).  Add
support for CSEL not modifying the flags to more places (requested by
Matt).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
6 years agonir: Narrow some dot product operations
Ian Romanick [Thu, 15 Feb 2018 22:49:55 +0000 (14:49 -0800)]
nir: Narrow some dot product operations

On vector platforms, this helps elide some constant loads.
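
One way to picture the narrowing, assuming it applies when trailing
components of one operand are constant zero: the wider dot product gives
the same result as a narrower one, so the zero constants never need to be
loaded. This is an illustrative sketch, not the NIR rules themselves:

```c
/* Three-component dot product. */
static inline float sketch_dot3(const float a[3], const float b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Four-component dot product. When a[3] is the constant 0.0 and b[3] is
 * finite, the last term contributes nothing and sketch_dot3() suffices. */
static inline float sketch_dot4(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```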

v2: Reorder the transformations.

No changes on Broadwell or Skylake.

Haswell
total instructions in shared programs: 13093793 -> 13060163 (-0.26%)
instructions in affected programs: 1277532 -> 1243902 (-2.63%)
helped: 13216
HURT: 95
helped stats (abs) min: 1 max: 18 x̄: 2.56 x̃: 2
helped stats (rel) min: 0.21% max: 20.00% x̄: 3.63% x̃: 2.78%
HURT stats (abs)   min: 1 max: 6 x̄: 1.77 x̃: 1
HURT stats (rel)   min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
95% mean confidence interval for instructions value: -2.57 -2.49
95% mean confidence interval for instructions %-change: -3.65% -3.54%
Instructions are helped.

total cycles in shared programs: 409580819 -> 409268463 (-0.08%)
cycles in affected programs: 71730652 -> 71418296 (-0.44%)
helped: 9898
HURT: 2352
helped stats (abs) min: 2 max: 16014 x̄: 37.08 x̃: 16
helped stats (rel) min: <.01% max: 35.55% x̄: 6.26% x̃: 4.50%
HURT stats (abs)   min: 2 max: 276 x̄: 23.25 x̃: 6
HURT stats (rel)   min: <.01% max: 40.00% x̄: 3.54% x̃: 1.97%
95% mean confidence interval for cycles value: -33.19 -17.80
95% mean confidence interval for cycles %-change: -4.50% -4.26%
Cycles are helped.

total fills in shared programs: 82059 -> 82052 (<.01%)
fills in affected programs: 21 -> 14 (-33.33%)
helped: 7
HURT: 0

Sandy Bridge and Ivy Bridge had similar results (Ivy Bridge shown)
total instructions in shared programs: 11811851 -> 11780605 (-0.26%)
instructions in affected programs: 1155007 -> 1123761 (-2.71%)
helped: 12304
HURT: 95
helped stats (abs) min: 1 max: 18 x̄: 2.55 x̃: 2
helped stats (rel) min: 0.21% max: 20.00% x̄: 3.69% x̃: 2.86%
HURT stats (abs)   min: 1 max: 6 x̄: 1.77 x̃: 1
HURT stats (rel)   min: 0.09% max: 5.56% x̄: 1.25% x̃: 1.19%
95% mean confidence interval for instructions value: -2.56 -2.48
95% mean confidence interval for instructions %-change: -3.71% -3.59%
Instructions are helped.

total cycles in shared programs: 257618409 -> 257316805 (-0.12%)
cycles in affected programs: 71999580 -> 71697976 (-0.42%)
helped: 9155
HURT: 2380
helped stats (abs) min: 2 max: 16014 x̄: 38.44 x̃: 16
helped stats (rel) min: <.01% max: 35.75% x̄: 6.39% x̃: 4.62%
HURT stats (abs)   min: 2 max: 290 x̄: 21.14 x̃: 4
HURT stats (rel)   min: <.01% max: 41.55% x̄: 3.14% x̃: 1.33%
95% mean confidence interval for cycles value: -34.32 -17.97
95% mean confidence interval for cycles %-change: -4.55% -4.29%
Cycles are helped.

GM45 and Iron Lake had nearly identical results (Iron Lake shown)
total instructions in shared programs: 7886750 -> 7879944 (-0.09%)
instructions in affected programs: 373781 -> 366975 (-1.82%)
helped: 3715
HURT: 47
helped stats (abs) min: 1 max: 8 x̄: 1.86 x̃: 1
helped stats (rel) min: 0.22% max: 16.67% x̄: 2.88% x̃: 2.06%
HURT stats (abs)   min: 1 max: 6 x̄: 2.55 x̃: 2
HURT stats (rel)   min: 1.09% max: 5.00% x̄: 1.93% x̃: 2.35%
95% mean confidence interval for instructions value: -1.85 -1.77
95% mean confidence interval for instructions %-change: -2.91% -2.73%
Instructions are helped.

total cycles in shared programs: 178114636 -> 178095452 (-0.01%)
cycles in affected programs: 7227666 -> 7208482 (-0.27%)
helped: 3349
HURT: 301
helped stats (abs) min: 2 max: 90 x̄: 6.55 x̃: 4
helped stats (rel) min: <.01% max: 14.18% x̄: 0.95% x̃: 0.63%
HURT stats (abs)   min: 2 max: 42 x̄: 9.13 x̃: 10
HURT stats (rel)   min: 0.01% max: 11.19% x̄: 1.22% x̃: 1.50%
95% mean confidence interval for cycles value: -5.52 -4.99
95% mean confidence interval for cycles %-change: -0.81% -0.73%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
6 years agoi965: perf: consolidate unmapping oa perf bo outside accumulation
Lionel Landwerlin [Wed, 7 Mar 2018 14:10:15 +0000 (14:10 +0000)]
i965: perf: consolidate unmapping oa perf bo outside accumulation

Do this in one place outside the only caller of the accumulation
function.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: perf: count number of accumulated reports
Lionel Landwerlin [Tue, 6 Mar 2018 17:11:56 +0000 (17:11 +0000)]
i965: perf: count number of accumulated reports

This will be reused later.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: perf: reuse timescale base function from query
Lionel Landwerlin [Tue, 6 Mar 2018 15:47:00 +0000 (15:47 +0000)]
i965: perf: reuse timescale base function from query

We already have the same function in brw_queryobj.c

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: perf: store sysfs device entry into context
Lionel Landwerlin [Wed, 7 Feb 2018 18:09:58 +0000 (18:09 +0000)]
i965: perf: store sysfs device entry into context

We want to reuse it later on.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: perf: store the hw_id of the context in the query
Lionel Landwerlin [Wed, 7 Feb 2018 18:10:57 +0000 (18:10 +0000)]
i965: perf: store the hw_id of the context in the query

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoi965: perf: default case for unknown query types
Lionel Landwerlin [Tue, 6 Feb 2018 17:29:32 +0000 (17:29 +0000)]
i965: perf: default case for unknown query types

Just some extra safety before further changes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoradeonsi: remove chip_class parameter from si_lower_nir
Marek Olšák [Tue, 6 Mar 2018 23:30:06 +0000 (18:30 -0500)]
radeonsi: remove chip_class parameter from si_lower_nir

We can get it from si_screen.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
6 years agowinsys/amdgpu: query GDS info
Marek Olšák [Sun, 11 Sep 2016 19:53:20 +0000 (21:53 +0200)]
winsys/amdgpu: query GDS info

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
6 years agowinsys/amdgpu: pad compute IBs
Marek Olšák [Tue, 6 Mar 2018 20:03:09 +0000 (15:03 -0500)]
winsys/amdgpu: pad compute IBs

v2: pad with PKT2 NOPs on SI

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
6 years agoradeonsi: expand constbuf 0 address correctly to fix Vega10 hangs
Marek Olšák [Wed, 7 Mar 2018 16:36:26 +0000 (11:36 -0500)]
radeonsi: expand constbuf 0 address correctly to fix Vega10 hangs

This is only required with the latest libdrm.

This fixes 32-bit support with high addresses.
(and possibly 64-bit support too because the high bits need to be masked out)

Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
6 years agoradeonsi: align command buffer starting address to fix some Raven hangs
Marek Olšák [Wed, 7 Mar 2018 00:07:58 +0000 (19:07 -0500)]
radeonsi: align command buffer starting address to fix some Raven hangs

Cc: 17.3 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
6 years agoetnaviv: add get_driver_query_group_info(..)
Christian Gmeiner [Mon, 5 Mar 2018 22:26:43 +0000 (23:26 +0100)]
etnaviv: add get_driver_query_group_info(..)

This enables AMD_performance_monitor extension.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
6 years agoetnaviv: add query_group_info for sw counters
Christian Gmeiner [Mon, 5 Mar 2018 22:26:42 +0000 (23:26 +0100)]
etnaviv: add query_group_info for sw counters

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
6 years agomeson: Fix building gallium media libs without egl
Dylan Baker [Wed, 28 Feb 2018 21:07:57 +0000 (13:07 -0800)]
meson: Fix building gallium media libs without egl

v2: - rebase on omx fix

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
6 years agomeson: Allow building dri based EGL without GLX
Dylan Baker [Wed, 28 Feb 2018 18:13:38 +0000 (10:13 -0800)]
meson: Allow building dri based EGL without GLX

It should be possible to build EGL without GLX, but the meson build
currently doesn't allow that because it too tightly couples glx and dri.
This patch eases dri and glx apart, so that EGL without GLX can be
built.

CC: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
6 years agoglx/apple: Ship meson build file in tarball
Thierry Reding [Tue, 6 Mar 2018 09:44:08 +0000 (10:44 +0100)]
glx/apple: Ship meson build file in tarball

The meson build file for Apple GLX is not listed in the EXTRA_DIST make
variable and therefore isn't shipped as part of the release tarball, so
meson builds from the tarball will fail.

Add the file to EXTRA_DIST to ensure it is included in the tarball.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
6 years agoac/nir: do not emit unnecessary null exports in fragment shaders
Samuel Pitoiset [Thu, 8 Mar 2018 08:53:14 +0000 (09:53 +0100)]
ac/nir: do not emit unnecessary null exports in fragment shaders

Null exports should only be needed when no other exports are
emitted. This removes a bunch of 'exp null off, off, off, off done vm'.
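
The rule amounts to something like the following sketch. The struct and
function names are hypothetical and do not mirror the ac/nir code:

```c
#include <stdbool.h>

/* Hypothetical summary of what a fragment shader exports. */
struct sketch_ps_exports {
    unsigned num_color_exports;
    bool exports_depth_stencil_or_mask;
};

/* A null export is only needed when nothing else is exported. */
static inline bool sketch_needs_null_export(const struct sketch_ps_exports *e)
{
    return e->num_color_exports == 0 && !e->exports_depth_stencil_or_mask;
}
```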

Affected games are Dota 2 and Wolfenstein 2. It is unclear whether this
improves performance, but code size decreases there.

Polaris10:
Totals from affected shaders:
SGPRS: 8216 -> 8216 (0.00 %)
VGPRS: 7072 -> 7072 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 454968 -> 453896 (-0.24 %) bytes
Max Waves: 772 -> 772 (0.00 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agodrirc: whitespace fix
Eric Engestrom [Thu, 8 Mar 2018 09:52:16 +0000 (09:52 +0000)]
drirc: whitespace fix

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
6 years agodrirc: Disable the GLX_SGI_video_sync extension for gnome-shell on vmware
Thomas Hellstrom [Mon, 26 Feb 2018 13:32:01 +0000 (14:32 +0100)]
drirc: Disable the GLX_SGI_video_sync extension for gnome-shell on vmware

With this extension enabled and a server GLX implementation that actually
honors it, window movement lags considerably on gnome-shell/vmware, so
disable it by default.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>
6 years agogallium/st_dri: Honor the glx_disable_sgi_video_sync config option
Thomas Hellstrom [Mon, 26 Feb 2018 13:30:33 +0000 (14:30 +0100)]
gallium/st_dri: Honor the glx_disable_sgi_video_sync config option

This option is disabled by default. Primarily intended for drivers on
virtual hardware.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Deepak Rawat <drawat@vmware.com>