mesa.git
8 years agomesa: Add MESA_SHADER_CAPTURE_PATH for writing .shader_test files.
Kenneth Graunke [Sun, 7 Sep 2014 03:26:51 +0000 (20:26 -0700)]
mesa: Add MESA_SHADER_CAPTURE_PATH for writing .shader_test files.

This writes linked shader programs as .shader_test files into
$MESA_SHADER_CAPTURE_PATH, in the format used by shader-db
(http://cgit.freedesktop.org/mesa/shader-db).

It supports both GLSL shaders and ARB programs.  All stages that
are linked together are written in a single .shader_test file.

This eliminates the need for shader-db's split-to-files.py, as Mesa
produces the desired format directly.  It's much more reliable than
parsing stdout/stderr, as those may contain extraneous messages, or
simply be closed by the application and unavailable.

We have many similar features already, but this is a bit different:
- MESA_GLSL=dump writes to stdout, not files.
- MESA_GLSL=log writes each stage to separate files (rather than
  all linked shaders in one file), at draw time (not link time),
  with uniform data and state flag info.
- Tapani's shader replacement mechanism (MESA_SHADER_DUMP_PATH and
  MESA_SHADER_READ_PATH) also uses separate files per shader stage,
  but allows reading in files to replace an app's shader code.

v2:  Dump ARB programs too, not just GLSL.
v3:  Don't dump bogus 0.shader_test file.
v4:  Add "GL_ARB_separate_shader_objects" to the [require] block.
v5:  Print "GLSL 4.00" instead of "GLSL 4.0" in the [require] block.
v6:  Don't hardcode /tmp/mesa.
v7:  Fix memoization of getenv().
v8:  Also print "SSO ENABLED" (suggested by Timothy).
v9:  Also handle ES shaders (suggested by Ilia).
v10: Guard against MESA_SHADER_CAPTURE_PATH being too long; add
     _mesa_warning calls on error handling (suggested by Ben).
v11: Fix crash when variable is unset introduced in v10.
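
As a rough illustration of the file layout described above, the following
Python sketch assembles the text of a .shader_test file from a set of linked
stages. The function name format_shader_test and its parameters are
hypothetical. Only the [require] contents named in the changelog (two-digit
minor version, GL_ARB_separate_shader_objects, SSO ENABLED, ES handling) come
from this commit message; the ">=" comparison and the "GLSL ES" spelling are
assumptions, and the exact text Mesa emits may differ.

```python
def format_shader_test(version, stages, is_es=False, separate=False):
    """Build the text of a shader-db style .shader_test file.

    version  -- GLSL version as an integer, e.g. 400 for GLSL 4.00
    stages   -- list of (stage_name, source) pairs that were linked together
    is_es    -- True for GLSL ES shaders (assumed spelling: "GLSL ES")
    separate -- True when the program uses separate shader objects
    """
    major, minor = divmod(version, 100)
    es_suffix = " ES" if is_es else ""
    # Minor version is always printed with two digits ("4.00", not "4.0").
    lines = ["[require]", f"GLSL{es_suffix} >= {major}.{minor:02d}"]
    if separate:
        lines += ["GL_ARB_separate_shader_objects", "SSO ENABLED"]
    lines.append("")
    # All stages of one linked program go into the same file.
    for name, source in stages:
        lines.append(f"[{name} shader]")
        lines.append(source.rstrip("\n"))
        lines.append("")
    return "\n".join(lines)
```

Keeping every stage in one string mirrors the "single .shader_test file per
linked program" behavior the message describes.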

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agonv50,nvc0: fix BGR10_A2UI vertex format
Ilia Mirkin [Sun, 5 Jun 2016 19:00:36 +0000 (15:00 -0400)]
nv50,nvc0: fix BGR10_A2UI vertex format

This is mostly academic as this is not reachable from GL, which only has
the packed RGB10_A2UI vertex format.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonvc0: do not clear surfaces bins in the validate function
Samuel Pitoiset [Sun, 5 Jun 2016 16:53:26 +0000 (18:53 +0200)]
nvc0: do not clear surfaces bins in the validate function

We should not call nouveau_bufctx_reset() inside a validate function.
This only affects Fermi where images are aliased between 3D and CP.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonvc0: re-validate images after launching a grid on Fermi
Samuel Pitoiset [Sun, 5 Jun 2016 16:01:19 +0000 (18:01 +0200)]
nvc0: re-validate images after launching a grid on Fermi

Image invalidation is a bit weird on Fermi and there is already a hack
which forces invalidating all images when launching a compute shader
to help in fixing 3D<->CP interaction.

However, we need to re-validate images for compute because
nvc0_compute_invalidate_surfaces() will destroy the previous binding.
This is not great for performance, but it might be
improved later.

This fixes the following piglits:
- spec/arb_compute_shader/execution/basic-uniform-access
- spec/arb_compute_shader/execution/mutiple-texture-reading
- spec/arb_compute_shader/execution/multiple-workgroups
- spec/glsl-4.30/execution/built-in-functions/cs-* (207 tests)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoradeonsi: fix images with level > 0
Marek Olšák [Fri, 3 Jun 2016 17:17:46 +0000 (19:17 +0200)]
radeonsi: fix images with level > 0

This should fix spec@arb_shader_image_load_store@level.

Broken by:
    Commit: 95c5bbae66af3ca1f805d94f6fe8d8e4ba2c9c43
    radeonsi: set some image descriptor fields at bind time

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agonvc0: reduce overhead from always marking images dirty
Ilia Mirkin [Sat, 4 Jun 2016 18:13:38 +0000 (14:13 -0400)]
nvc0: reduce overhead from always marking images dirty

We would revalidate images when anything was touched at all, which is
unfortunate, since the state tracker does not use CSOs to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the images.
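
The general idea of checking for a real change before revalidating can be
sketched as below. This is not the nvc0 code or its actual protocol; the
class ImageBindings and its methods are hypothetical names used only to show
a dirty flag that is raised when a binding really differs.

```python
class ImageBindings:
    """Hypothetical image binding table with change detection."""

    def __init__(self, slots):
        self.bound = [None] * slots
        self.dirty = False

    def bind(self, slot, image):
        # Only flag the table as dirty when the binding really changes.
        if self.bound[slot] != image:
            self.bound[slot] = image
            self.dirty = True

    def validate(self):
        """Return True if a (costly) revalidation was performed."""
        if not self.dirty:
            return False
        self.dirty = False
        return True
```

Rebinding the same image leaves the flag clear, so validate() becomes a
cheap no-op in the common case.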

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonvc0: reduce overhead from always marking buffers dirty
Ilia Mirkin [Sat, 4 Jun 2016 17:50:21 +0000 (13:50 -0400)]
nvc0: reduce overhead from always marking buffers dirty

We would revalidate buffers when anything was touched at all, which is
unfortunate, since the state tracker does not use CSOs to reduce the
workload. So instead implement a protocol to ensure that something has
changed before revalidating all the SSBOs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonvc0: fix memory barrier flag handling
Ilia Mirkin [Fri, 3 Jun 2016 01:36:04 +0000 (21:36 -0400)]
nvc0: fix memory barrier flag handling

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonvc0: mark bound buffer range valid
Ilia Mirkin [Fri, 3 Jun 2016 01:42:14 +0000 (21:42 -0400)]
nvc0: mark bound buffer range valid

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv/entrypoints: don't go using wayland/xcb unless they are configured
Dave Airlie [Sat, 4 Jun 2016 20:49:42 +0000 (06:49 +1000)]
anv/entrypoints: don't go using wayland/xcb unless they are configured

The fix in:
anv: let anv_entrypoints_gen.py generate proper Wayland/Xcb guards

breaks things if wayland headers aren't installed.

Separate things out properly to avoid that problem.

[airlied: fixed up to put in pre-existing sections].
Reported-by: Arjan van de Ven
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agogallium/radeon: don't use the DMA ring for pipelined buffer uploads
Marek Olšák [Thu, 26 May 2016 16:20:42 +0000 (18:20 +0200)]
gallium/radeon: don't use the DMA ring for pipelined buffer uploads

Submitting a DMA IB flushes the GFX IB and all GPU caches.

Vedran Miletić said:
  "On Tonga 380X, this improves The Talos Principle from 8.3 fps to 28.3 fps
   (all graphics settings Ultra, 4xAA, 1080p resolution with downsampling
   from 1200p)."

Some anonymous dude said:
   R9 390 results:
      Tomb Raider (normal settings): 80 -> 88 FPS
      Talos Principle (custom settings): 23 -> 56 FPS
      Metro Last Light Redux (default benchmark settings): 39 -> 40 FPS

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Vedran Miletić <vedran@miletic.net>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
8 years agor600g: don't flush caches when binding shader resources
Marek Olšák [Thu, 26 May 2016 16:14:27 +0000 (18:14 +0200)]
r600g: don't flush caches when binding shader resources

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
8 years agor600g: only do necessary cache flushes in cp_dma_copy_buffer
Marek Olšák [Thu, 26 May 2016 15:25:46 +0000 (17:25 +0200)]
r600g: only do necessary cache flushes in cp_dma_copy_buffer

The main impact is that {upload, draw, upload, draw, ..} doesn't flush
framebuffer caches before every upload.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
8 years agor600g: only do necessary cache flushes in cp_dma_clear_buffer
Marek Olšák [Thu, 26 May 2016 15:18:13 +0000 (17:18 +0200)]
r600g: only do necessary cache flushes in cp_dma_clear_buffer

The main impact is that fast color clear doesn't flush TC, CONST, DB.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
8 years agor600g: remove a CP DMA workaround that's not needed anymore
Marek Olšák [Wed, 1 Jun 2016 16:39:53 +0000 (18:39 +0200)]
r600g: remove a CP DMA workaround that's not needed anymore

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
8 years agor600g: fix CP DMA hazard with index buffer fetches (v3)
Marek Olšák [Thu, 26 May 2016 20:00:03 +0000 (22:00 +0200)]
r600g: fix CP DMA hazard with index buffer fetches (v3)

v3: use PFP_SYNC_ME on EG-CM only when supported by the kernel,
    otherwise use MEM_WRITE + WAIT_REG_MEM to emulate that

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
8 years agor600g: properly sync CP with CP DMA on R6xx
Marek Olšák [Wed, 1 Jun 2016 16:35:33 +0000 (18:35 +0200)]
r600g: properly sync CP with CP DMA on R6xx

This will allow removing useless cache & IB flushes.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
8 years agor600g: write WAIT_UNTIL in the correct place
Marek Olšák [Tue, 31 May 2016 21:07:15 +0000 (23:07 +0200)]
r600g: write WAIT_UNTIL in the correct place

This has been wrong all along. Fixing this will allow removing useless
cache flushes.

Cc: 11.1 11.2 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
8 years agogallium/radeon: rename allocator_so_filled_size -> allocator_zeroed_memory
Marek Olšák [Tue, 31 May 2016 17:11:54 +0000 (19:11 +0200)]
gallium/radeon: rename allocator_so_filled_size -> allocator_zeroed_memory

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
8 years agogallium/u_suballoc: allow different alignment for each allocation
Marek Olšák [Tue, 31 May 2016 17:06:45 +0000 (19:06 +0200)]
gallium/u_suballoc: allow different alignment for each allocation

Just move the alignment parameter from u_suballocator_create
to u_suballocator_alloc.
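
A minimal sketch of what a per-allocation alignment parameter looks like,
assuming a simple bump allocator over one fixed-size buffer. The class and
method names are illustrative and are not the Gallium u_suballocator API;
this toy also returns None when the buffer is full, where a real
suballocator would likely handle that case differently.

```python
class SubAllocator:
    """Toy bump suballocator; names are illustrative only."""

    def __init__(self, buffer_size):
        # No alignment here: it is supplied with each allocation instead.
        self.buffer_size = buffer_size
        self.offset = 0

    def alloc(self, size, alignment):
        """Return the offset of a new allocation, or None if it does not fit.

        alignment must be a power of two and may differ on every call.
        """
        assert alignment > 0 and (alignment & (alignment - 1)) == 0
        # Round the current offset up to the requested alignment.
        start = (self.offset + alignment - 1) & ~(alignment - 1)
        if start + size > self.buffer_size:
            return None
        self.offset = start + size
        return start
```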

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
8 years agoanv/blit: Use CLAMP_TO_EDGE for scaled blits
Jason Ekstrand [Thu, 2 Jun 2016 23:34:11 +0000 (16:34 -0700)]
anv/blit: Use CLAMP_TO_EDGE for scaled blits

When upscaling you can end up interpolating between the edge pixel and one
past the edge.  Using CLAMP_TO_EDGE seems like the most reasonable thing to
do in this case.  This fixes two of the new Vulkan CTS tests in
dEQP-VK.api.copy_and_blit.blit_image.*
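
The edge case described above can be illustrated with a one-dimensional
linear filter. This is a simplified model, not the anv blit code: the
function name sample_linear_clamp is hypothetical, and texel centers are
assumed to sit at (i + 0.5) / width.

```python
import math

def sample_linear_clamp(texels, u):
    """Linearly filter a 1-D row of texels at normalized coordinate u.

    Neighbor indices that fall outside the row are clamped to the nearest
    edge texel, which is the CLAMP_TO_EDGE behavior.
    """
    n = len(texels)
    x = u * n - 0.5                  # position in texel-center space
    i0 = math.floor(x)
    frac = x - i0
    lo = min(max(i0, 0), n - 1)      # clamp both taps into [0, n - 1]
    hi = min(max(i0 + 1, 0), n - 1)
    return texels[lo] * (1.0 - frac) + texels[hi] * frac
```

Near u = 0 or u = 1 both taps land on the edge texel, so the result is the
edge color rather than a blend with whatever lies past the edge.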

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv/copy: Account for the anv_surface.offset when creating a blit2d_surf
Jason Ekstrand [Thu, 2 Jun 2016 23:25:44 +0000 (16:25 -0700)]
anv/copy: Account for the anv_surface.offset when creating a blit2d_surf

This was causing problems if the user tried to copy to/from the stencil
portion of a combined depth/stencil image.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonir/spirv: Make a decoration switch complete
Jason Ekstrand [Thu, 2 Jun 2016 21:36:58 +0000 (14:36 -0700)]
nir/spirv: Make a decoration switch complete

Getting rid of the default case makes the compiler warn if we are missing
cases.  While we're here, we also add the one missing case.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonir/spirv: Make unhandled decorations and capabilities non-fatal
Jason Ekstrand [Thu, 2 Jun 2016 21:34:15 +0000 (14:34 -0700)]
nir/spirv: Make unhandled decorations and capabilities non-fatal

glslang frequently throws bogus decorations into shaders.  While we are free
to assert-fail, it's a bit nicer to the application to just warn.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonir/spirv: Add a way to print non-fatal warnings
Jason Ekstrand [Thu, 2 Jun 2016 21:32:56 +0000 (14:32 -0700)]
nir/spirv: Add a way to print non-fatal warnings

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonir/spirv: Add string lookup tables for a couple of SPIR-V enums
Jason Ekstrand [Thu, 2 Jun 2016 21:06:30 +0000 (14:06 -0700)]
nir/spirv: Add string lookup tables for a couple of SPIR-V enums

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonir/spirv: Complete the list of capabilities
Jason Ekstrand [Thu, 2 Jun 2016 20:43:19 +0000 (13:43 -0700)]
nir/spirv: Complete the list of capabilities

Previously we supported a subset of capabilities and just left a default
case for the others.  It's time to stop being lazy and actually audit the
capabilities.  This should bring them up-to-date with reality.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv/pipeline: Add support for early depth stencil
Jason Ekstrand [Wed, 1 Jun 2016 03:16:01 +0000 (20:16 -0700)]
anv/pipeline: Add support for early depth stencil

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agomesa: Get rid of _mesa_active_fragment_shader_has_side_effects
Jason Ekstrand [Thu, 2 Jun 2016 01:53:32 +0000 (18:53 -0700)]
mesa: Get rid of _mesa_active_fragment_shader_has_side_effects

It is no longer used.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/ps_state: Use wm_prog_data.has_side_effects
Jason Ekstrand [Thu, 2 Jun 2016 01:55:35 +0000 (18:55 -0700)]
i965/ps_state: Use wm_prog_data.has_side_effects

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs Add a wm_prog_data bit for has_side_effects
Jason Ekstrand [Thu, 2 Jun 2016 01:46:30 +0000 (18:46 -0700)]
i965/fs Add a wm_prog_data bit for has_side_effects

This is more accurate than calling
_mesa_active_fragment_shader_has_side_effects because it looks at whether
or not the SSBOs, images, or atomic buffers are actually written rather
than just existing in the program.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonir/info: Get rid of uses_interp_var_at_offset
Jason Ekstrand [Thu, 2 Jun 2016 01:29:09 +0000 (18:29 -0700)]
nir/info: Get rid of uses_interp_var_at_offset

We were using this briefly in the i965 driver to trigger recompiles but we
haven't been using it since we switched to the NIR y-transform lowering
pass.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoanv/pipeline: Silently pass tests if depth or stencil is missing
Jason Ekstrand [Wed, 1 Jun 2016 05:23:18 +0000 (22:23 -0700)]
anv/pipeline: Silently pass tests if depth or stencil is missing

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv/pipeline: Unify gen7/8 emit_ds_state
Jason Ekstrand [Wed, 1 Jun 2016 05:19:53 +0000 (22:19 -0700)]
anv/pipeline: Unify gen7/8 emit_ds_state

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agogenxml/gen6,7,75: s/BackFace/Backface
Jason Ekstrand [Wed, 1 Jun 2016 05:15:38 +0000 (22:15 -0700)]
genxml/gen6,7,75: s/BackFace/Backface

This is more consistent with gen8+

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonir/spirv: Handle the WorkgroupSize builtin decoration
Jason Ekstrand [Wed, 1 Jun 2016 18:20:22 +0000 (11:20 -0700)]
nir/spirv: Handle the WorkgroupSize builtin decoration

This fixes the 7 dEQP-VK.pipeline.spec_constant.compute.local_size.* tests
in the latest dev version of the Vulkan CTS.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonir/spirv: Use breaks instead of returns in constant handling
Jason Ekstrand [Wed, 1 Jun 2016 17:34:04 +0000 (10:34 -0700)]
nir/spirv: Use breaks instead of returns in constant handling

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv/pipeline: Refactor specialization constant handling a bit
Jason Ekstrand [Tue, 31 May 2016 23:27:19 +0000 (16:27 -0700)]
anv/pipeline: Refactor specialization constant handling a bit

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agonir/lower_indirect_derefs: Use the direct array deref for recursion
Jason Ekstrand [Tue, 31 May 2016 22:02:10 +0000 (15:02 -0700)]
nir/lower_indirect_derefs: Use the direct array deref for recursion

This fixes about 100 of the new Vulkan CTS tests.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv/clear: Handle ClearImage on 3-D images
Jason Ekstrand [Tue, 31 May 2016 18:26:06 +0000 (11:26 -0700)]
anv/clear: Handle ClearImage on 3-D images

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoRevert "i965/fs: Allow scalar source regions on SNB math instructions."
Francisco Jerez [Fri, 3 Jun 2016 19:32:15 +0000 (12:32 -0700)]
Revert "i965/fs: Allow scalar source regions on SNB math instructions."

This reverts commit c1107cec44ab030c7fcc97c67baa12df1cc9d7b5.
Apparently the hardware spec text I quoted in the commit message was
outright lying about scalar source math being supported on SNB: the
hardware seems to load 32 contiguous bits of data for each channel
regardless of the regioning mode.  Fixes regressions in the following
CTS tests (which we didn't catch early due to CTS being temporarily
disabled in our CI system):

   es2-cts.gtf.gl.atan.atan_vec3_frag_xvary
   es2-cts.gtf.gl.cos.cos_vec2_frag_xvary
   es2-cts.gtf.gl.atan.atan_vec2_frag_xvary
   es2-cts.gtf.gl.pow.pow_vec2_frag_xvary_yconsthalf
   es2-cts.gtf.gl.cos.cos_float_frag_xvary
   es2-cts.gtf.gl.pow.pow_float_frag_xvary_yconsthalf
   es2-cts.gtf.gl.atan.atan_vec3_frag_xvaryyvary
   es2-cts.gtf.gl.pow.pow_vec3_frag_xvary_yconsthalf
   es2-cts.gtf.gl.cos.cos_vec3_frag_xvary
   es2-cts.gtf.gl.atan.atan_vec2_frag_xvaryyvary

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96346
Reported-by: Mark Janes <mark.a.janes@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
8 years agoi965/vec4: Fix cmod propagation not to propagate non-identity cmod into CMP(N).
Francisco Jerez [Wed, 1 Jun 2016 23:27:52 +0000 (16:27 -0700)]
i965/vec4: Fix cmod propagation not to propagate non-identity cmod into CMP(N).

The conditional mod of these instructions determines the semantics of
the comparison itself (rather than being evaluated based on the result
of the instruction as is usually the case for most other instructions
that allow conditional mods), so it's in general not legal to
propagate a conditional mod into a CMP instruction.  This prevents
cmod propagation from (mis)optimizing:

 cmp.z.f0 tmp, ...
 mov.z.f0 null, tmp

into:

 cmp.z.f0 tmp, ...

which gives the negation of the flag result of the original sequence.
I originally noticed this while working on SIMD32 in the scalar
back-end, but the same scenario is likely to be possible in vec4
programs so this commit ports the bugfix with the same name from the
scalar back-end to the vec4 cmod propagation pass.
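
The flag negation can be checked with a small simulation. This is a
simplified model under stated assumptions, not the i965 back-end: cmp.z is
assumed to write all-ones to its destination when the operands are equal and
zero otherwise, and the function names are hypothetical.

```python
def cmp_z(a, b):
    """cmp.z.f0 tmp, a, b: the cmod defines the comparison itself."""
    flag = (a == b)
    tmp = 0xFFFFFFFF if flag else 0
    return tmp, flag

def mov_z(src):
    """mov.z.f0 null, src: the cmod is evaluated on the moved value."""
    return src == 0

def original_sequence(a, b):
    tmp, _ = cmp_z(a, b)
    return mov_z(tmp)          # f0 after cmp followed by mov

def misoptimized_sequence(a, b):
    _, flag = cmp_z(a, b)
    return flag                # f0 after the mov was folded into the cmp
```

For equal operands the original sequence leaves the flag clear (tmp is
non-zero), while the folded form sets it, which is the negation the message
refers to.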

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoanv: add the X related and Wayland CFLAGS to VULKAN_ENTRYPOINT_CPPFLAGS
Emil Velikov [Fri, 3 Jun 2016 23:20:53 +0000 (00:20 +0100)]
anv: add the X related and Wayland CFLAGS to VULKAN_ENTRYPOINT_CPPFLAGS

Otherwise we will fail to find the headers in some scenarios.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reported-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
8 years agonir: automake: add nir_search_helpers.h to the sources list(s)
Emil Velikov [Fri, 3 Jun 2016 23:18:40 +0000 (00:18 +0100)]
nir: automake: add nir_search_helpers.h to the sources list(s)

Fixes: dfbae7d64f4 ("nir/algebraic: support for power-of-two
optimizations")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
8 years agofreedreno/ir3: do idiv lowering after main opt loop
Rob Clark [Mon, 9 May 2016 16:41:00 +0000 (12:41 -0400)]
freedreno/ir3: do idiv lowering after main opt loop

Give algebraic-opt pass a chance to catch udiv by const power-of-two,
before running lower-idiv pass.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agonir/algebraic: support for power-of-two optimizations
Rob Clark [Sat, 7 May 2016 17:01:24 +0000 (13:01 -0400)]
nir/algebraic: support for power-of-two optimizations

Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression,
such as requiring that a variable is a constant power of two.  Support
these cases by allowing a function name to be appended to the search var
expression (i.e. "a#32(is_power_of_two)").
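
The arithmetic behind such a constrained rule can be sketched as follows.
This only models the identity on unsigned 32-bit values and does not use the
nir_algebraic machinery; strength_reduce and its operation strings are
hypothetical names chosen for the example.

```python
def is_power_of_two(value):
    """True for positive integers with exactly one bit set."""
    return value > 0 and (value & (value - 1)) == 0

def strength_reduce(op, x, c):
    """Evaluate `x op c` on unsigned 32-bit values, using a shift if allowed.

    Returns (result, used_shift). Only 'imul' and 'udiv' are modeled.
    """
    mask = 0xFFFFFFFF
    if is_power_of_two(c):
        shift = c.bit_length() - 1       # log2(c) for a power of two
        if op == "imul":
            return (x << shift) & mask, True
        if op == "udiv":
            return x >> shift, True
    # The constraint failed (or the op is unknown): no shift rewrite.
    if op == "imul":
        return (x * c) & mask, False
    if op == "udiv":
        return x // c, False
    raise ValueError(op)
```

The predicate plays the role of the appended function name: the rewrite is
applied only when the constant satisfies it.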

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoradeonsi: mark buffer texture range valid for shader images
Nicolai Hähnle [Thu, 2 Jun 2016 20:17:40 +0000 (22:17 +0200)]
radeonsi: mark buffer texture range valid for shader images

When a shader image view into a buffer texture can be written to, the buffer's
valid range must be updated, or subsequent transfers may incorrectly skip
synchronization.

This fixes a bug that was exposed in Xephyr by PBO acceleration for glReadPixels,
reported by Michel Dänzer.

Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoRevert "egl: Check if API is supported when using eglBindAPI."
Marek Olšák [Fri, 3 Jun 2016 09:25:19 +0000 (11:25 +0200)]
Revert "egl: Check if API is supported when using eglBindAPI."

This reverts commit e8b38ca202fbe8c281aeb81a4b64256983f185e0.

It broke Glamor for Gallium at least.

8 years agomesa/formatquery: expand NUM_SAMPLE_COUNTS OpenGL ES comment
Alejandro Piñeiro [Fri, 6 May 2016 14:13:26 +0000 (16:13 +0200)]
mesa/formatquery: expand NUM_SAMPLE_COUNTS OpenGL ES comment

For ES 3.0, the spec says that NUM_SAMPLE_COUNTS will always be zero for
some formats, but on ES 3.1 it can be non-zero.

The current code correctly checks for exactly version 3.0, but the
comment only mentions the 3.0 spec. It is clearer to mention both.

v2: better wording on the comment (Ian Romanick)

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agomesa/get: return correct value for layer provoking vertex.
Dave Airlie [Fri, 3 Jun 2016 02:26:05 +0000 (12:26 +1000)]
mesa/get: return correct value for layer provoking vertex.

This fixes:
GL45-CTS.geometry_shader.layered_rendering.layered_rendering

on Skylake.

Reviewed-by: Chris Forbes <chrisforbes@google.com>
Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoegl: Account for default values of texture target and format
Plamena Manolova [Wed, 1 Jun 2016 16:31:29 +0000 (17:31 +0100)]
egl: Account for default values of texture target and format

When validating attributes during surface creation we should account
for the default values of texture target and format (EGL_NO_TEXTURE)
since the user is not obligated to explicitly set both via the
attribute list passed to eglCreatePbufferSurface.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
8 years agonvc0: mark buffer texture range valid for shader images
Samuel Pitoiset [Thu, 2 Jun 2016 22:00:27 +0000 (00:00 +0200)]
nvc0: mark buffer texture range valid for shader images

Loosely based on radeonsi (Thanks to Nicolai).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 12.0 <mesa-stable@lists.freedesktop.org>
8 years agoisl: add support for Android libmesa_isl static library
Mauro Rossi [Thu, 2 Jun 2016 19:15:35 +0000 (21:15 +0200)]
isl: add support for Android libmesa_isl static library

The isl library is needed to build i965, so the libmesa_isl static library
is added to fix the related Android build errors.

Any attempt to build libmesa_genxml as a phony package module failed to
deliver the generated gen{7,75,8,9}_pack.h headers needed for
libmesa_isl_gen{7,75,8,9}.

Due to constraints in the Android build system, libmesa_genxml is built as a
static library. At least one source file is needed, so dummy.c is
autogenerated for this purpose. The libmesa_genxml dependency is declared
using LOCAL_WHOLE_STATIC_LIBRARIES to avoid build errors due to missing
genxml/gen{7,75,8,9}_pack.h headers.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years agoandroid: libmesa_glsl: add a dependency on libmesa_nir static
Mauro Rossi [Mon, 30 May 2016 22:20:28 +0000 (00:20 +0200)]
android: libmesa_glsl: add a dependency on libmesa_nir static

Fixes the following build error:

target  C++: libmesa_glsl <= external/mesa/src/compiler/glsl/glsl_to_nir.cpp
In file included from external/mesa/src/compiler/glsl/glsl_to_nir.h:28:0,
                 from external/mesa/src/compiler/glsl/glsl_to_nir.cpp:28:
external/mesa/src/compiler/nir/nir.h:42:25: fatal error: nir_opcodes.h: No such file or directory
compilation terminated.
build/core/binary.mk:432: recipe for target 'out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o' failed
make: *** [out/target/product/x86/obj/STATIC_LIBRARIES/libmesa_glsl_intermediates/glsl/glsl_to_nir.o] Error 1
make: *** Waiting for unfinished jobs....

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years agoisl: automake: don't include isl_format_layout.c in two lists.
Emil Velikov [Tue, 31 May 2016 15:59:39 +0000 (16:59 +0100)]
isl: automake: don't include isl_format_layout.c in two lists.

Including the file in both ISL_FILES and ISL_GENERATED_FILES makes
the actual dependency list less obvious.

v2: Drop unrelated vulkan hunk (Jason).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoautomake: bring back the .PHONY git_sha1.h.tmp rule
Emil Velikov [Tue, 31 May 2016 13:46:19 +0000 (14:46 +0100)]
automake: bring back the .PHONY git_sha1.h.tmp rule

With earlier commit 3689ef32afd ("automake: rework the git_sha1.h rule,
include in tarball") we/I erroneously removed the PHONY rule and the
temporary file.

The former is used to ensure that the header is regenerated on each
make invocation, while the latter helps us avoid the unneeded rebuild(s)
when the SHA1 hasn't changed.
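
The role of the temporary file can be sketched in Python as below. This is
an illustration of the "regenerate always, replace only on change" pattern,
not the automake rule itself; the function name update_if_changed and the
".tmp" suffix handling are assumptions made for the example.

```python
import os

def update_if_changed(path, new_content):
    """Regenerate `path` via a temporary file; keep the old file if unchanged.

    Returns True when `path` was replaced (so dependents need rebuilding).
    """
    tmp_path = path + ".tmp"
    # Always regenerate into the temporary file, like an always-run rule.
    with open(tmp_path, "w") as tmp:
        tmp.write(new_content)
    try:
        with open(path) as old:
            unchanged = old.read() == new_content
    except FileNotFoundError:
        unchanged = False
    if unchanged:
        os.remove(tmp_path)      # the timestamp of `path` stays untouched
        return False
    os.replace(tmp_path, path)
    return True
```

Because the real header keeps its old timestamp when the content is
identical, files that depend on it are not rebuilt needlessly.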

Reported-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
8 years agoi965: Add _NEW_POINT to a couple of comments.
Kenneth Graunke [Thu, 2 Jun 2016 00:32:55 +0000 (17:32 -0700)]
i965: Add _NEW_POINT to a couple of comments.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
8 years agosvga: allow copy box in svga_transfer_dma_band()
Charmaine Lee [Tue, 31 May 2016 23:33:52 +0000 (16:33 -0700)]
svga: allow copy box in svga_transfer_dma_band()

Instead of only allowing a rectangle to be copied in svga_transfer_dma_band(),
this patch allows it to copy a box, which makes it possible to copy a 3D
texture in one transfer.

Fixes a black screen when running Heaven after commit fb9fe35. (Bug 1663282)

Tested with Heaven, glretrace, piglit.

Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agofreedreno: fix bad bitshift warnings
Rob Clark [Thu, 2 Jun 2016 20:23:36 +0000 (16:23 -0400)]
freedreno: fix bad bitshift warnings

Coverity doesn't realize idx will never be negative.  Throw in some
assert()s to help it out.

(Hopefully assert() isn't getting compiled out for coverity build.. but
there seems to be just one way to find out.  We might have to change
these to assume())

Fixes CID 1362442, 1362443

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno: assume builtin shaders do compile
Rob Clark [Thu, 2 Jun 2016 20:17:16 +0000 (16:17 -0400)]
freedreno: assume builtin shaders do compile

Maybe we should switch to ureg to build the builtin shaders.  But at any
rate, if they fail to compile it is because someone messed them up (or
changed TGSI syntax?).

CID 1362444

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agoi965/fs: Reindent emit_zip().
Francisco Jerez [Fri, 27 May 2016 08:02:19 +0000 (01:02 -0700)]
i965/fs: Reindent emit_zip().

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Skip SIMD lowering destination zipping if possible.
Francisco Jerez [Fri, 27 May 2016 07:45:04 +0000 (00:45 -0700)]
i965/fs: Skip SIMD lowering destination zipping if possible.

Skipping the temporary allocation and copy instructions is easy (just
return dst), but the conditions used to find out whether the copy can
be optimized out safely without breaking the program are rather
complex: The destination must be exactly one component of at most the
execution width of the lowered instruction, and all source regions of
the instruction must be either fully disjoint from the destination or
be aligned with it group by group.

v2: Don't handle partial source-destination overlap for simplicity
    (Jason).  No instruction count regressions with respect to v1 in
    either shader-db or the few FP64 shader_runner test-cases with
    partial overlap I've checked manually.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoblorp: Fix 16x multisample scaled blits
Anuj Phogat [Thu, 2 Jun 2016 18:05:44 +0000 (11:05 -0700)]
blorp: Fix 16x multisample scaled blits

Piglit test ext_framebuffer_multisample_blit_scaled-blit-scaled
(with added 16x sample support) now passes with this patch.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agometa: Fix indentation in shader code
Anuj Phogat [Tue, 31 May 2016 17:57:03 +0000 (10:57 -0700)]
meta: Fix indentation in shader code

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
8 years agomesa/copyimage: report INVALID_VALUE for missing cube face
Dave Airlie [Thu, 2 Jun 2016 04:13:18 +0000 (14:13 +1000)]
mesa/copyimage: report INVALID_VALUE for missing cube face

The spec says INVALID_VALUE for exceeding dimensions,
which is really what is happening here.

This fixes:
GL45-CTS.copy_image.non_existent_mipmap

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agomesa/copyimage: fix num samples check to handle renderbuffers.
Dave Airlie [Thu, 2 Jun 2016 03:41:28 +0000 (13:41 +1000)]
mesa/copyimage: fix num samples check to handle renderbuffers.

This test was only happening for textures, but there is
nothing in the spec to say this, so test it for all cases.

This fixes:
GL45-CTS.copy_image.invalid_target

Cc: "11.2 12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agofreedreno/a4xx: silence coverity warning
Rob Clark [Thu, 2 Jun 2016 15:47:11 +0000 (11:47 -0400)]
freedreno/a4xx: silence coverity warning

CID 1362451

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno/a3xx+a4xx: fix potential null ptr deref
Rob Clark [Thu, 2 Jun 2016 15:42:25 +0000 (11:42 -0400)]
freedreno/a3xx+a4xx: fix potential null ptr deref

Coverity spotted the a3xx case (not sure why not the a4xx).

CID 1362452

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno/ir3: fix coverity warning
Rob Clark [Thu, 2 Jun 2016 15:19:43 +0000 (11:19 -0400)]
freedreno/ir3: fix coverity warning

CID 1362453

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno/ir3: use nir_shader_get_entrypoint() helper
Rob Clark [Thu, 2 Jun 2016 15:13:26 +0000 (11:13 -0400)]
freedreno/ir3: use nir_shader_get_entrypoint() helper

Should also fix coverity warning: CID 1362454

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno/a4xx: fix incorrect enum type
Rob Clark [Thu, 2 Jun 2016 14:49:18 +0000 (10:49 -0400)]
freedreno/a4xx: fix incorrect enum type

a4xx has its own enum, different from a2xx/a3xx.

Spotted by coverity: CID 1362458, 1362459

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno: fix coverity negative array index warning
Rob Clark [Thu, 2 Jun 2016 14:36:23 +0000 (10:36 -0400)]
freedreno: fix coverity negative array index warning

This can never happen, since the query would not have been created in
the first place if pidx(query_type) returned a negative value.  Let's
let coverity realize this.

CID 1362460

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno: fix dereference before null check
Rob Clark [Thu, 2 Jun 2016 14:33:08 +0000 (10:33 -0400)]
freedreno: fix dereference before null check

ptr can actually never be null so just drop the check.

CID 1362464 (#1 of 1): Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking ptr suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agogallium/util: remove u_staging
Rob Clark [Sun, 29 May 2016 16:25:32 +0000 (12:25 -0400)]
gallium/util: remove u_staging

Unused, and fixes a couple of coverity warnings: CID 1362171, 1362170

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Marek Olšák <marek.olsak@amd.com>
8 years agofreedreno/a3xx: only update/emit bordercolor state when needed
Rob Clark [Wed, 1 Jun 2016 17:40:53 +0000 (13:40 -0400)]
freedreno/a3xx: only update/emit bordercolor state when needed

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agofreedreno/a4xx: only update/emit bordercolor state when needed
Rob Clark [Wed, 1 Jun 2016 16:23:58 +0000 (12:23 -0400)]
freedreno/a4xx: only update/emit bordercolor state when needed

I noticed in stk that it was contributing to a lot of overhead.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
8 years agoi965: Add missing types to type_sz().
Matt Turner [Tue, 24 May 2016 22:10:25 +0000 (15:10 -0700)]
i965: Add missing types to type_sz().

Coverity warns in multiple places about the potential for division by
zero, caused by this function's default case.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
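
The hazard Coverity describes can be sketched as follows. This is a minimal illustration, not the driver's code: the enum, the function names, and the 32-byte register size are hypothetical. If a size lookup can fall through to a path that yields 0, every caller that divides by the result looks like a potential division by zero; listing every type in the switch removes that path.

```c
#include <assert.h>

/* Hypothetical register types, not the driver's actual enum. */
enum sketch_reg_type {
   SKETCH_TYPE_UB,
   SKETCH_TYPE_UW,
   SKETCH_TYPE_UD,
   SKETCH_TYPE_DF
};

/* Size in bytes of one element.  Every enumerator has its own case, so no
 * valid input reaches the fallback after the switch. */
static unsigned
sketch_type_sz(enum sketch_reg_type type)
{
   switch (type) {
   case SKETCH_TYPE_UB: return 1;
   case SKETCH_TYPE_UW: return 2;
   case SKETCH_TYPE_UD: return 4;
   case SKETCH_TYPE_DF: return 8;
   }
   assert(!"unknown register type");
   return 0;
}

/* A typical caller: the divisor is non-zero for every listed type.
 * The 32-byte register size is an assumption of this sketch. */
static unsigned
sketch_components_per_reg(enum sketch_reg_type type)
{
   return 32 / sketch_type_sz(type);
}
```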
8 years agomesa/extensions: Fix ES1 extension reporting
Nanley Chery [Tue, 24 May 2016 21:27:26 +0000 (14:27 -0700)]
mesa/extensions: Fix ES1 extension reporting

Commit eda15abd84af575d3bde432e2163e30d743a7c87 unintentionally
advertised these extensions in ES1 contexts. Undo this error.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoegl: Check if API is supported when using eglBindAPI.
Plamena Manolova [Tue, 31 May 2016 16:32:38 +0000 (17:32 +0100)]
egl: Check if API is supported when using eglBindAPI.

According to the EGL specification, before binding an API we must
first check whether it is supported. If it is not, eglBindAPI
should return EGL_FALSE and generate an EGL_BAD_PARAMETER error.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
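
The check described above can be sketched roughly as follows. This is not Mesa's implementation: the types, the constants, and the supported_apis bitmask are hypothetical stand-ins, and the enum values are placeholders rather than the ones from the EGL headers. The point is only the ordering: validate support first, and leave the current binding untouched on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the EGL API and error enums. */
enum sketch_api {
   SKETCH_OPENGL_ES_API,
   SKETCH_OPENVG_API,
   SKETCH_OPENGL_API
};

enum sketch_error {
   SKETCH_SUCCESS,
   SKETCH_BAD_PARAMETER
};

struct sketch_thread {
   unsigned supported_apis;       /* bitmask, one bit per enum sketch_api */
   enum sketch_api current_api;
   enum sketch_error last_error;
};

/* Returns false and records SKETCH_BAD_PARAMETER when the API is not
 * supported; the previous binding is kept in that case. */
static bool
sketch_bind_api(struct sketch_thread *t, enum sketch_api api)
{
   assert(t != NULL);

   if (!(t->supported_apis & (1u << api))) {
      t->last_error = SKETCH_BAD_PARAMETER;
      return false;
   }

   t->current_api = api;
   t->last_error = SKETCH_SUCCESS;
   return true;
}
```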
8 years agost/osmesa: remove double-write (overwriting)
Eric Engestrom [Tue, 31 May 2016 01:26:00 +0000 (19:26 -0600)]
st/osmesa: remove double-write (overwriting)

These two lines have been here since the file was created.
I'm guessing the second one was just for testing during dev, so it's the
one that's going away.

CoverityID: 1296205

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agost/vdpau: check for null pointer in get/put bits.
Nayan Deshmukh [Thu, 2 Jun 2016 06:41:58 +0000 (12:11 +0530)]
st/vdpau: check for null pointer in get/put bits.

Check for null pointer before accessing arrays in get/put bits
native/YCbCr/Indexed in VdpOutputSurface and VdpVideoSurface.

Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
8 years agoradeon/uvd: fix the H264 level for Tonga v2
Christian König [Wed, 25 May 2016 14:55:48 +0000 (16:55 +0200)]
radeon/uvd: fix the H264 level for Tonga v2

We have supported 5.2 for a while now.

v2: we even support 5.2 for H264, 5.1 is for HEVC.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: <mesa-stable@lists.freedesktop.org>
8 years agomesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERRED
Alejandro Piñeiro [Thu, 5 May 2016 09:27:05 +0000 (11:27 +0200)]
mesa/formatquery: add a comment to clarify INTERNALFORMAT_PREFERRED

The comment clarifies that the driver is called only to try to get
a preferred internalformat, and that it was already checked if the
format is supported or not.

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965/formatquery: remove INTERNALFORMAT_PREFERRED implementation
Alejandro Piñeiro [Thu, 5 May 2016 09:28:37 +0000 (11:28 +0200)]
i965/formatquery: remove INTERNALFORMAT_PREFERRED implementation

Right now the implementation only checks if the internalformat is
supported or not. But that implementation is wrong, returning
unsupported for some internalformats. Additionally, checking if
the internalformat is supported or not is already done at mesa/main
before calling the driver hook, so this new check is not needed.

Acked-by: Eduardo Lima <elima@igalia.com>
Acked-by: Antia Puentes <apuentes@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965/eu: use simd8 when exec_size != EXECUTE_16
Alejandro Piñeiro [Wed, 1 Jun 2016 16:49:29 +0000 (18:49 +0200)]
i965/eu: use simd8 when exec_size != EXECUTE_16

Among other things, fix a gpu hang when using INTEL_DEBUG=shader_time
for any shader.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoi965: Remove old CS local ID handling
Jordan Justen [Mon, 23 May 2016 05:31:06 +0000 (22:31 -0700)]
i965: Remove old CS local ID handling

The old method pushed each channel's uvec3 data of
gl_LocalInvocationID.

The new method pushes 1 dword of data that is a 'thread local ID'
value. Based on that value, we can generate gl_LocalInvocationIndex
and gl_LocalInvocationID with some calculations.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Enable cross-thread constants and compact local IDs for hsw+
Jordan Justen [Tue, 31 May 2016 22:45:24 +0000 (15:45 -0700)]
i965: Enable cross-thread constants and compact local IDs for hsw+

The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.

Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.

To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.

v4:
 * Minimize size of patch that switches from the old local ID layout
   to the new layout (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoanv: Support new local ID generation & cross-thread constants
Jordan Justen [Fri, 27 May 2016 07:53:27 +0000 (00:53 -0700)]
anv: Support new local ID generation & cross-thread constants

The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.

v4:
 * Support the old local ID push constant layout as well (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Support new local ID push constant & cross-thread constants
Jordan Justen [Mon, 23 May 2016 04:55:43 +0000 (21:55 -0700)]
i965: Support new local ID push constant & cross-thread constants

The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.

We also support per-thread data which allows us to store a per-thread
ID in one of the uniforms that can be used to calculate the
gl_LocalInvocationIndex and gl_LocalInvocationID variables.

v4:
 * Support the old local ID push constant layout as well (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Add CS push constant info to brw_cs_prog_data
Jordan Justen [Mon, 23 May 2016 04:46:28 +0000 (21:46 -0700)]
i965: Add CS push constant info to brw_cs_prog_data

We need information about push constants in a few places for the GL
driver, and another couple places for the vulkan driver.

When we add support for uploading both a common (cross-thread) set of
push constants, combined with the previous per-thread push constant
data, things are going to get even more complicated. To simplify
things, we add push constant info into the cs prog_data struct.

The cross-thread constant support is added as of Haswell. To support
it we need to make sure all push constants with uniform values are
added to earlier registers. The register that varies per thread and
holds the thread invocation's unique local ID needs to be added last.

For now we add the code that would calculate cross-thread constant
information for hsw+, but we force it (cross_thread_supported) off
until the other parts of the driver support it.

v4:
 * Support older local ID push constant layout as well. (Jason)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
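
A rough sketch of the bookkeeping described above, under stated assumptions: the struct, the function names, and the 8-dwords-per-register figure are hypothetical and are not taken from brw_cs_prog_data. When the cross-thread path is usable and the local ID uniform is the last parameter, everything before it is counted as cross-thread data and only the final dword is per-thread; otherwise all parameters are uploaded once per thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: one push constant register holds 8 dwords (32 bytes). */
#define SKETCH_DWORDS_PER_REG 8u

struct sketch_push_info {
   unsigned cross_thread_dwords;
   unsigned cross_thread_regs;
   unsigned per_thread_dwords;
   unsigned per_thread_regs;
   unsigned total_regs;          /* registers uploaded for the whole group */
};

static unsigned
sketch_div_round_up(unsigned n, unsigned d)
{
   return (n + d - 1) / d;
}

static struct sketch_push_info
sketch_fill_push_info(unsigned nr_params, int thread_local_id_index,
                      unsigned threads, bool cross_thread_supported)
{
   struct sketch_push_info info = {0};

   assert(thread_local_id_index < 0 ||
          (unsigned)thread_local_id_index < nr_params);

   if (cross_thread_supported && thread_local_id_index >= 0 &&
       (unsigned)thread_local_id_index + 1 == nr_params) {
      /* Uniform values go first; the per-thread ID is the last dword. */
      info.cross_thread_dwords = nr_params - 1;
      info.per_thread_dwords = 1;
   } else {
      /* Fallback: every parameter is duplicated for each thread. */
      info.per_thread_dwords = nr_params;
   }

   info.cross_thread_regs =
      sketch_div_round_up(info.cross_thread_dwords, SKETCH_DWORDS_PER_REG);
   info.per_thread_regs =
      sketch_div_round_up(info.per_thread_dwords, SKETCH_DWORDS_PER_REG);
   info.total_regs = info.cross_thread_regs + threads * info.per_thread_regs;
   return info;
}
```

With 17 parameters and 4 threads, this sketch uploads 6 registers when the split is enabled and 12 when it is forced off, which is the kind of saving the cross-thread mechanism is meant to provide.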
8 years agoi965: Store number of threads in brw_cs_prog_data
Jordan Justen [Thu, 26 May 2016 20:49:07 +0000 (13:49 -0700)]
i965: Store number of threads in brw_cs_prog_data

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Add nir based intrinsic lowering and thread ID uniform
Jordan Justen [Sun, 22 May 2016 07:08:06 +0000 (00:08 -0700)]
i965: Add nir based intrinsic lowering and thread ID uniform

We add a lowering pass for nir intrinsics. This pass can replace nir
intrinsics with driver specific nir lower code.

We lower the gl_LocalInvocationIndex intrinsic based on a uniform
which is loaded with a thread specific ID.

We also lower the gl_LocalInvocationID based on
gl_LocalInvocationIndex.

v2:
 * Create variable during lowering pass. (Ken)

v3:
 * Don't create a variable, but instead just insert an intrinsic call
   to load a uniform from the allocated location. (Jason)

v4:
 * Don't run this pass if thread_local_id_index < 0

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
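
The two lowerings described above amount to a small piece of arithmetic, sketched here in plain C rather than NIR. The names are hypothetical, and the sketch assumes the per-thread uniform already holds the flat index of the thread's first channel, so adding the channel number yields gl_LocalInvocationIndex. The decomposition into gl_LocalInvocationID is the inverse of the GLSL definition index = z * size.x * size.y + y * size.x + x.

```c
#include <assert.h>

struct sketch_uvec3 {
   unsigned x, y, z;
};

/* Assumption: thread_local_id is the flat index of channel 0 of this
 * thread, so each channel only has to add its own channel number. */
static unsigned
sketch_local_invocation_index(unsigned thread_local_id, unsigned channel_num)
{
   return thread_local_id + channel_num;
}

/* Split a flat index back into a 3D ID for a workgroup of the given size. */
static struct sketch_uvec3
sketch_local_invocation_id(unsigned index, struct sketch_uvec3 size)
{
   struct sketch_uvec3 id;

   assert(size.x > 0 && size.y > 0 && size.z > 0);

   id.x = index % size.x;
   id.y = (index / size.x) % size.y;
   id.z = (index / (size.x * size.y)) % size.z;
   return id;
}
```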
8 years agoi965: Put CS local thread ID uniform in last push register
Jordan Justen [Mon, 23 May 2016 04:29:53 +0000 (21:29 -0700)]
i965: Put CS local thread ID uniform in last push register

This thread ID uniform will be used to compute the
gl_LocalInvocationIndex and gl_LocalInvocationID values.

It is important for this uniform to be added in the last push constant
register. fs_visitor::assign_constant_locations is updated to make
sure this happens.

The reason this is important is that the cross-thread push constant
registers are loaded first, and the per-thread push constant registers
are loaded after that. (Broadwell adds another push constant upload
mechanism which reverses this order, but we are ignoring this for
now.)

v2:
 * Add variable in intrinsics lowering pass
 * Make sure the ID is pushed last in assign_constant_locations, and
   that we save a spot for the ID in the push constants

v3:
 * Simplify code based on Jason's suggestions.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Add uniform for a CS thread local base ID
Jordan Justen [Sun, 29 May 2016 06:45:21 +0000 (23:45 -0700)]
i965: Add uniform for a CS thread local base ID

v4:
 * Force thread_local_id_index to -1 for now, and have
   fs_visitor::setup_cs_payload look at thread_local_id_index. This
   enables us to more easily cut over from the old local ID layout to
   the new layout, as suggested by Jason.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965: Add nir channel_num system value
Jordan Justen [Sun, 22 May 2016 23:33:44 +0000 (16:33 -0700)]
i965: Add nir channel_num system value

v2:
 * simd16/32 fixes (curro)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agonir: Make lowering gl_LocalInvocationIndex optional
Jordan Justen [Sun, 22 May 2016 22:54:48 +0000 (15:54 -0700)]
nir: Make lowering gl_LocalInvocationIndex optional

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoglsl: Add glsl LowerCsDerivedVariables option
Jordan Justen [Sat, 21 May 2016 21:21:32 +0000 (14:21 -0700)]
glsl: Add glsl LowerCsDerivedVariables option

v2:
 * Move lower flag to context constants. (Ken)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/fs: Copy the offset when lowering logical pull constant sends
Jason Ekstrand [Wed, 1 Jun 2016 22:01:04 +0000 (15:01 -0700)]
i965/fs: Copy the offset when lowering logical pull constant sends

This fixes 64 Vulkan CTS tests per gen

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96299
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoglsl/distance: make sure we use clip dist varying slot for lowered var.
Dave Airlie [Tue, 24 May 2016 20:03:24 +0000 (06:03 +1000)]
glsl/distance: make sure we use clip dist varying slot for lowered var.

When lowering, we always want to use the clip dist varying.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agowinsys/amdgpu: decay max_ib_size over time
Nicolai Hähnle [Sun, 8 May 2016 17:53:23 +0000 (12:53 -0500)]
winsys/amdgpu: decay max_ib_size over time

So that memory use will eventually decrease again after a temporary peak.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
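
One way to get the behaviour described above is sketched below. The function name and the 1/32 decay rate are assumptions made for illustration, not values confirmed by the commit message: on every submission the remembered maximum shrinks by a fixed fraction and is then raised back to at least the size that was just used.

```c
/* Hypothetical helper: returns the new remembered maximum IB size. */
static unsigned
sketch_update_max_ib_size(unsigned max_ib_size, unsigned used_size)
{
   /* Let the old peak fade a little on every submission. */
   max_ib_size -= max_ib_size / 32;

   /* Never remember less than what was actually needed this time. */
   return used_size > max_ib_size ? used_size : max_ib_size;
}
```

After a single large submission followed by many small ones, the value returned here drifts back down to the steady-state size instead of staying at the peak.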