mesa.git
6 years ago anv: Free the app and engine name
Jason Ekstrand [Wed, 29 Aug 2018 15:06:56 +0000 (10:06 -0500)]
anv: Free the app and engine name

Fixes: 8c048af5890d4 "anv: Copy the appliation info into the instance"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
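
The leak fixed here follows a common pattern: once an instance keeps its own copies of caller-supplied strings, the destroy path has to release each copy. A minimal sketch in plain C, assuming a hypothetical struct app_info and hypothetical helpers (dup_string, app_info_init, app_info_finish); the driver itself goes through Vulkan allocation callbacks rather than malloc, so this is an illustration, not the actual patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-instance copy of the application info. */
struct app_info {
   char *app_name;
   char *engine_name;
};

/* Duplicate a string with malloc; returns NULL for NULL input or on OOM. */
static char *
dup_string(const char *s)
{
   if (s == NULL)
      return NULL;
   size_t size = strlen(s) + 1;
   char *copy = malloc(size);
   if (copy != NULL)
      memcpy(copy, s, size);
   return copy;
}

/* Copy both names; on failure nothing stays allocated. */
static bool
app_info_init(struct app_info *info, const char *app, const char *engine)
{
   assert(info != NULL);
   info->app_name = dup_string(app);
   info->engine_name = dup_string(engine);
   if ((app && !info->app_name) || (engine && !info->engine_name)) {
      free(info->app_name);
      free(info->engine_name);
      info->app_name = info->engine_name = NULL;
      return false;
   }
   return true;
}

/* The step that is easy to forget: release both copies on destroy. */
static void
app_info_finish(struct app_info *info)
{
   free(info->app_name);
   free(info->engine_name);
   info->app_name = NULL;
   info->engine_name = NULL;
}
```

Every successful app_info_init() call is meant to be paired with one app_info_finish() call.
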
6 years ago nv50/ir: silence partitionLoadStore() unused function warning
Rhys Kidd [Fri, 10 Aug 2018 16:44:37 +0000 (12:44 -0400)]
nv50/ir: silence partitionLoadStore() unused function warning

Move this now-unused function into the existing comment block, which was its only prior use.

../../../../../src/gallium/drivers/nouveau/codegen/nv50_ir_from_tgsi.cpp:2645:1: warning:
      unused function 'partitionLoadStore' [-Wunused-function]
partitionLoadStore(uint8_t comp[2], uint8_t size[2], uint8_t mask)

Fixes: 86e4440361 ("nouveau: codegen: Disable more old resource handling code")
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
6 years ago glsl/linker: Link all out vars from a shader objects on a single stage
vadym.shovkoplias [Tue, 28 Aug 2018 07:32:18 +0000 (10:32 +0300)]
glsl/linker: Link all out vars from a shader objects on a single stage

During intra-stage linking, some out variables can be dropped because
they are not used in the shader that contains the main function. But
these out vars can be referenced in later stages, which can lead to
further linking errors.

Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105731

6 years ago anv: blorp: support multiple aspect blits
Lionel Landwerlin [Tue, 28 Aug 2018 10:16:33 +0000 (11:16 +0100)]
anv: blorp: support multiple aspect blits

Newer blit tests enable combined depth and stencil blits. We currently
don't support them, but we can by iterating over the aspect mask
(copying some logic from the CopyImage function).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9f44745eca0e41 ("anv: Use blorp to implement VkBlitImage")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago mesa: allow GL_UNSIGNED_BYTE type for SNORM reads
Tapani Pälli [Mon, 27 Aug 2018 11:40:41 +0000 (14:40 +0300)]
mesa: allow GL_UNSIGNED_BYTE type for SNORM reads

OpenGL ES spec states:
   "For normalized fixed-point rendering surfaces, the combination format
    RGBA and type UNSIGNED_BYTE is accepted."

This fixes the following failing VK-GL-CTS tests:

   KHR-GLES3.packed_pixels.pbo_rectangle.rgba8_snorm
   KHR-GLES3.packed_pixels.rectangle.rgba8_snorm
   KHR-GLES3.packed_pixels.varied_rectangle.rgba8_snorm

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107658
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Andres Gomez <agomez@igalia.com>
6 years ago nir: add loop unroll support for wrapper loops
Timothy Arceri [Sat, 7 Jul 2018 07:56:26 +0000 (17:56 +1000)]
nir: add loop unroll support for wrapper loops

This adds support for unrolling the classic

    do {
        // ...
    } while (false)

that is used to wrap multi-line macros. GLSL IR also wraps switch
statements in a loop like this.

shader-db results IVB:

total loops in shared programs: 2515 -> 2512 (-0.12%)
loops in affected programs: 33 -> 30 (-9.09%)
helped: 3
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago nir/opt_loop_unroll: Remove unneeded phis if we make progress
Timothy Arceri [Wed, 11 Jul 2018 00:50:16 +0000 (10:50 +1000)]
nir/opt_loop_unroll: Remove unneeded phis if we make progress

Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis.  In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This avoids validation issues with some CTS tests with the following
patch, but it's possible we could also see the same problem with
the existing unrolling passes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago nir: add complex_loop bool to loop info
Timothy Arceri [Sat, 7 Jul 2018 02:09:26 +0000 (12:09 +1000)]
nir: add complex_loop bool to loop info

In order to be sure loop_terminator_list is an accurate
representation of all the jumps in the loop we need to be sure we
didn't encounter any other complex behaviour such as continues,
nested breaks, etc during analysis.

This will be used in the following patch.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago nir: always attempt to find loop terminators
Timothy Arceri [Sat, 7 Jul 2018 02:02:08 +0000 (12:02 +1000)]
nir: always attempt to find loop terminators

This will help later patches with unrolling loops that end with a
break, i.e. loops that always exit on their first iteration.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago ac/surface: fix CMASK fast clear for NPOT textures with mipmapping on SI/CI/VI
Marek Olšák [Tue, 28 Aug 2018 18:39:09 +0000 (14:39 -0400)]
ac/surface: fix CMASK fast clear for NPOT textures with mipmapping on SI/CI/VI

This fixes VM faults and corruption.

Cc: 18.1 18.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
Ian Romanick [Sat, 25 Aug 2018 00:24:36 +0000 (17:24 -0700)]
i965/vec4: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1

No shader-db changes on any Intel platform.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years ago i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1
Ian Romanick [Sat, 25 Aug 2018 00:41:01 +0000 (17:41 -0700)]
i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for imageAtomicAdd of +1 or -1

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.

No changes on any other Intel platforms.

Skylake
total instructions in shared programs: 14304261 -> 14304241 (<.01%)
instructions in affected programs: 1625 -> 1605 (-1.23%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 1.01% max: 14.29% x̄: 5.86% x̃: 4.07%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -15.91% 4.19%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527531226 -> 527531194 (<.01%)
cycles in affected programs: 92204 -> 92172 (-0.03%)
helped: 2
HURT: 0

Haswell and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 14615730 -> 14615710 (<.01%)
instructions in affected programs: 1838 -> 1818 (-1.09%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 5.00 x̃: 5
helped stats (rel) min: 0.89% max: 13.04% x̄: 5.37% x̃: 3.78%
95% mean confidence interval for instructions value: -10.66 0.66
95% mean confidence interval for instructions %-change: -14.59% 3.85%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago i965/fs: Refactor image atomics to be a bit more like other atomics
Ian Romanick [Sat, 25 Aug 2018 00:23:26 +0000 (17:23 -0700)]
i965/fs: Refactor image atomics to be a bit more like other atomics

This greatly simplifies the next patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years ago i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1
Ian Romanick [Thu, 23 Aug 2018 03:31:11 +0000 (20:31 -0700)]
i965/fs: Emit BRW_AOP_INC or BRW_AOP_DEC for atomicAdd of +1 or -1

Funny story... a single shader was hurt for instructions, spills, fills.
That same shader was also the most helped for cycles.  #GPUsAreWeird

No changes on any other Intel platform.

v2: Refactor selection of atomic opcode to a separate function.
Suggested by Jason.
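
A separate opcode-selection function of the kind v2 describes might look roughly like the following sketch. The enum and function names are hypothetical stand-ins (the real code uses the BRW_AOP_* values and its own IR types), and the sketch assumes the operand is only inspected when it is a known constant:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcode enum standing in for the BRW_AOP_* values. */
enum atomic_op { AOP_ADD, AOP_INC, AOP_DEC };

/* Pick the opcode for an atomic add. The operand is only inspected when
 * it is a compile-time constant; anything else stays a generic add. */
static enum atomic_op
select_atomic_add_op(bool operand_is_constant, int32_t operand)
{
   if (operand_is_constant) {
      if (operand == 1)
         return AOP_INC;
      if (operand == -1)
         return AOP_DEC;
   }
   return AOP_ADD;
}
```

Increment and decrement carry no data operand, which is a plausible reason the specialised forms can save instructions.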

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14304116 -> 14304261 (<.01%)
instructions in affected programs: 12776 -> 12921 (1.13%)
helped: 19
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 2.32 x̃: 1
helped stats (rel) min: 0.05% max: 7.27% x̄: 0.92% x̃: 0.55%
HURT stats (abs)   min: 189 max: 189 x̄: 189.00 x̃: 189
HURT stats (rel)   min: 4.87% max: 4.87% x̄: 4.87% x̃: 4.87%
95% mean confidence interval for instructions value: -12.83 27.33
95% mean confidence interval for instructions %-change: -1.57% 0.31%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 527552861 -> 527531226 (<.01%)
cycles in affected programs: 1459195 -> 1437560 (-1.48%)
helped: 16
HURT: 2
helped stats (abs) min: 2 max: 21328 x̄: 1353.69 x̃: 6
helped stats (rel) min: 0.01% max: 5.29% x̄: 0.36% x̃: 0.03%
HURT stats (abs)   min: 12 max: 12 x̄: 12.00 x̃: 12
HURT stats (rel)   min: 0.03% max: 0.03% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -3699.81 1295.92
95% mean confidence interval for cycles %-change: -0.94% 0.30%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 8025 -> 8033 (0.10%)
spills in affected programs: 208 -> 216 (3.85%)
helped: 1
HURT: 1

total fills in shared programs: 10989 -> 11040 (0.46%)
fills in affected programs: 444 -> 495 (11.49%)
helped: 1
HURT: 1

Ivy Bridge
total instructions in shared programs: 11709181 -> 11709153 (<.01%)
instructions in affected programs: 3505 -> 3477 (-0.80%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 9.33 x̃: 4
helped stats (rel) min: 0.11% max: 1.16% x̄: 0.63% x̃: 0.61%

total cycles in shared programs: 254741126 -> 254738801 (<.01%)
cycles in affected programs: 919067 -> 916742 (-0.25%)
helped: 3
HURT: 0
helped stats (abs) min: 21 max: 2144 x̄: 775.00 x̃: 160
helped stats (rel) min: 0.03% max: 0.90% x̄: 0.32% x̃: 0.03%

total spills in shared programs: 4536 -> 4533 (-0.07%)
spills in affected programs: 40 -> 37 (-7.50%)
helped: 1
HURT: 0

total fills in shared programs: 4819 -> 4813 (-0.12%)
fills in affected programs: 94 -> 88 (-6.38%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago intel/compiler: Silence unused parameter warnings in brw_eu.h
Ian Romanick [Sat, 25 Aug 2018 00:24:59 +0000 (17:24 -0700)]
intel/compiler: Silence unused parameter warnings in brw_eu.h

All of the other brw_*_desc functions take a devinfo parameter, and all
of the others at least have an assert that uses it.  Keep the parameter,
but mark it as unused.
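
As a sketch of the approach, a parameter can be kept for signature consistency and annotated so that -Wunused-parameter stays quiet. EXAMPLE_UNUSED, struct example_device_info and example_desc are hypothetical names, and the bit position used below is made up for the example:

```c
#include <stdint.h>

/* Hypothetical portable marker; compilers without the GNU attribute
 * simply get an empty expansion. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_UNUSED __attribute__((unused))
#else
#define EXAMPLE_UNUSED
#endif

struct example_device_info; /* opaque, hypothetical stand-in */

/* Keeps the same signature shape as its sibling helpers even though this
 * one does not need the device info to build its descriptor. */
static inline uint32_t
example_desc(EXAMPLE_UNUSED const struct example_device_info *devinfo,
             uint32_t msg_type)
{
   return msg_type << 12; /* made-up field position */
}
```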

Silences 37 warnings like:

In file included from src/intel/common/gen_disasm.c:27:0:
src/intel/compiler/brw_eu.h: In function ‘brw_pixel_interp_desc’:
src/intel/compiler/brw_eu.h:377:53: warning: unused parameter ‘devinfo’ [-Wunused-parameter]
 brw_pixel_interp_desc(const struct gen_device_info *devinfo,
                                                     ^~~~~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years ago i965: enable AMD_depth_clamp_separate
Sagar Ghuge [Tue, 21 Aug 2018 21:28:17 +0000 (14:28 -0700)]
i965: enable AMD_depth_clamp_separate

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago i965: add functional changes for AMD_depth_clamp_separate
Sagar Ghuge [Tue, 21 Aug 2018 21:25:17 +0000 (14:25 -0700)]
i965: add functional changes for AMD_depth_clamp_separate

Gen >= 9 has the ability to control clamping of depth values separately
at the near and far planes.

z_w is clamped to the range [min(n,f), 0] if clamping at the near plane
is enabled, [0, max(n,f)] if clamping at the far plane is enabled, and
[min(n,f), max(n,f)] if clamping at both planes is enabled.

v2: 1) Use better coding style (Ian Romanick)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago mesa: add EXTRA_EXT for AMD_depth_clamp_separate
Sagar Ghuge [Fri, 27 Jul 2018 22:03:54 +0000 (15:03 -0700)]
mesa: add EXTRA_EXT for AMD_depth_clamp_separate

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago mesa: add support for GL_AMD_depth_clamp_separate tokens
Sagar Ghuge [Tue, 21 Aug 2018 20:40:55 +0000 (13:40 -0700)]
mesa: add support for GL_AMD_depth_clamp_separate tokens

_mesa_set_enable() and _mesa_IsEnabled() are extended to accept the two
new tokens GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD.

v2: Remove unnecessary parentheses (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago mesa: Add support for AMD_depth_clamp_separate
Sagar Ghuge [Fri, 27 Jul 2018 21:55:57 +0000 (14:55 -0700)]
mesa: Add support for AMD_depth_clamp_separate

Enable _mesa_PushAttrib() and _mesa_PopAttrib() to handle
GL_DEPTH_CLAMP_NEAR_AMD and GL_DEPTH_CLAMP_FAR_AMD tokens.

Remove DepthClamp, because DepthClampNear + DepthClampFar replaces it,
as suggested by Marek Olsak.

A driver that enables AMD_depth_clamp_separate will only ever look at
DepthClampNear and DepthClampFar, as suggested by Ian Romanick.

v2: 1) Remove unnecessary parentheses (Marek Olsak)
    2) if AMD_depth_clamp_separate is unsupported, TEST_AND_UPDATE
       GL_DEPTH_CLAMP only (Marek Olsak)
    3) Clamp against near and far plane separately (Marek Olsak)
    4) Clip point separately for near and far Z clipping plane (Marek
       Olsak)

v3: Clamp raster position zw to the range [min(n,f), 0] for near plane
    and [0, max(n,f)] for far plane (Marek Olsak)

v4: Use MIN2 and MAX2 instead of CLAMP (Marek Olsak)
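
The separate near/far clamping from v2 through v4 can be sketched as below. This is an assumption-laden illustration, not the patch itself: the function name is hypothetical, MIN2 and MAX2 are defined locally, and each enabled plane is treated as a one-sided bound, which is how the "MIN2 and MAX2 instead of CLAMP" note reads:

```c
#include <stdbool.h>

#define MIN2(a, b) ((a) < (b) ? (a) : (b))
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Clamp a window-space depth against the near and far planes separately.
 * n and f are the depth range values; each enabled plane contributes a
 * one-sided bound, so enabling both yields [min(n,f), max(n,f)]. */
static float
clamp_depth_separate(float z, float n, float f,
                     bool clamp_near, bool clamp_far)
{
   if (clamp_near)
      z = MAX2(z, MIN2(n, f));
   if (clamp_far)
      z = MIN2(z, MAX2(n, f));
   return z;
}
```

With both flags set this reduces to the single-range behaviour of plain depth clamping.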

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago mesa: Add types for AMD_depth_clamp_separate.
Sagar Ghuge [Fri, 27 Jul 2018 03:08:44 +0000 (20:08 -0700)]
mesa: Add types for AMD_depth_clamp_separate.

Add some basic types and storage for the AMD_depth_clamp_separate
extension.

v2: 1) Drop unnecessary definition (Marek Olsak)
    2) Expose extension in compatibility profile (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago glapi: define AMD_depth_clamp_separate
Sagar Ghuge [Fri, 27 Jul 2018 21:42:36 +0000 (14:42 -0700)]
glapi: define AMD_depth_clamp_separate

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago anv: Claim to support depthBounds for ID games
Jason Ekstrand [Tue, 30 Jan 2018 02:41:15 +0000 (18:41 -0800)]
anv: Claim to support depthBounds for ID games

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago anv: Copy the appliation info into the instance
Jason Ekstrand [Tue, 30 Jan 2018 02:12:04 +0000 (18:12 -0800)]
anv: Copy the appliation info into the instance

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago vulkan/alloc: Add a vk_strdup helper
Jason Ekstrand [Tue, 30 Jan 2018 02:11:38 +0000 (18:11 -0800)]
vulkan/alloc: Add a vk_strdup helper

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago meson: Actually load translation files
Dylan Baker [Fri, 24 Aug 2018 14:05:36 +0000 (07:05 -0700)]
meson: Actually load translation files

Currently we run the script but don't actually load any files, even in a
tarball where they exist.

Fixes: 3218056e0eb375eeda470058d06add1532acd6d4
       ("meson: Build i965 and dri stack")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
6 years ago nir: Remove outdated comment
Caio Marcelo de Oliveira Filho [Mon, 27 Aug 2018 22:18:10 +0000 (15:18 -0700)]
nir: Remove outdated comment

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago i965: Add INTEL_fragment_shader_ordering support.
Kevin Rogovin [Mon, 27 Aug 2018 06:54:24 +0000 (09:54 +0300)]
i965: Add INTEL_fragment_shader_ordering support.

Adds support for INTEL_fragment_shader_ordering. We achieve
the fragment ordering by using the same instruction as for
beginInvocationInterlockARB(), that is, by issuing a memory
fence via sendc.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
6 years ago mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering
Kevin Rogovin [Mon, 27 Aug 2018 06:54:23 +0000 (09:54 +0300)]
mesa: Add GL/GLSL plumbing for INTEL_fragment_shader_ordering

This extension provides a new GLSL built-in function,
beginFragmentShaderOrderingINTEL(), that guarantees
(taking the wording of the GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
the same xy window coordinates (and the same sample when
per-sample shading is active) complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().

One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrier (instead
of defining a critical section) that can be called
under arbitrary control flow from any function (in
contrast, the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow).

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
6 years ago i965/gen6/xfb: handle case where transform feedback is not active
Andrii Simiklit [Wed, 15 Aug 2018 15:20:32 +0000 (18:20 +0300)]
i965/gen6/xfb: handle case where transform feedback is not active

When the SVBI Payload Enable is false, I guess the register R1.4,
which contains the Maximum Streamed Vertex Buffer Index, is filled with
zero and the GS stops writing transform feedback when transform feedback
is not active.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107579
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years ago docs: add forgotten features to 18.2.0 release notes
Rhys Perry [Tue, 21 Aug 2018 10:08:17 +0000 (11:08 +0100)]
docs: add forgotten features to 18.2.0 release notes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: 18.2: <mesa-stable@lists.freedesktop.org>
6 years ago virgl: add debug-switch to output TGSI
Erik Faye-Lund [Mon, 20 Aug 2018 11:46:32 +0000 (12:46 +0100)]
virgl: add debug-switch to output TGSI

This is quite useful for debugging shader-transpiling issues in
virglrenderer.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
6 years ago virgl: introduce $VIRGL_DEBUG=verbose
Erik Faye-Lund [Mon, 20 Aug 2018 11:08:55 +0000 (13:08 +0200)]
virgl: introduce $VIRGL_DEBUG=verbose

This adds an environment variable that can be used for driver-specific
flags, as well as a flag for it to enable verbose output.

While we're at it, quiet some overly chatty debug-output by default.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
6 years ago virgl: replace fprintf-call with debug_printf
Erik Faye-Lund [Mon, 20 Aug 2018 10:48:51 +0000 (12:48 +0200)]
virgl: replace fprintf-call with debug_printf

This is the only direct call-site for fprintf in virgl; all other
call-sites call debug_printf instead. So let's follow in style here.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
6 years ago virgl: delete commented out fprintf-call
Erik Faye-Lund [Mon, 20 Aug 2018 10:45:14 +0000 (12:45 +0200)]
virgl: delete commented out fprintf-call

This is just debug-cruft left over. Let's just get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
6 years ago meson: Don't enable any vulkan drivers on arm, aarch64
Guido Günther [Sun, 26 Aug 2018 20:24:00 +0000 (22:24 +0200)]
meson: Don't enable any vulkan drivers on arm, aarch64

There's no Vulkan support for arm atm.

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years ago meson: Be a bit more helpful when arch or OS is unknown
Guido Günther [Sun, 26 Aug 2018 20:23:59 +0000 (22:23 +0200)]
meson: Be a bit more helpful when arch or OS is unknown

V2: Add one missing @0@

Signed-off-by: Guido Günther <guido.gunther@puri.sm>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years ago intel/eu: print bytes instead of 32 bit hex value
Sagar Ghuge [Mon, 27 Aug 2018 17:23:19 +0000 (10:23 -0700)]
intel/eu: print bytes instead of 32 bit hex value

INTEL_DEBUG=hex prints 32-bit hex values, so because of the CPU's
endianness the byte order appears reversed. In order to disassemble
binary files, print each byte instead of the 32-bit hex value.
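
To make the difference concrete: on a little-endian CPU the four bytes 01 02 03 04 in memory print as 04030201 when formatted as one 32-bit value. A minimal sketch of byte-wise output follows; format_bytes is a hypothetical helper, not the function the patch touches:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write the bytes of an instruction in memory order as "aa bb cc ..."
 * into out. Returns the number of characters written (excluding the NUL). */
static size_t
format_bytes(const void *data, size_t size, char *out, size_t out_size)
{
   const unsigned char *bytes = data;
   size_t used = 0;

   assert(out != NULL && out_size >= size * 3 + 1);
   out[0] = '\0';
   for (size_t i = 0; i < size; i++) {
      if (i > 0)
         out[used++] = ' ';
      used += (size_t)snprintf(out + used, out_size - used, "%02x", bytes[i]);
   }
   return used;
}
```

Because the bytes are read through an unsigned char pointer, the output does not depend on host byte order.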

v2: Print blank spaces in order to vertically align output of compacted
    instructions hex value with uncompacted instructions hex value.
    (Matt Turner)

v3: Fix line wrap at correct length

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
6 years ago intel: decoder: handle 0 sized structs
Lionel Landwerlin [Sat, 25 Aug 2018 17:22:00 +0000 (18:22 +0100)]
intel: decoder: handle 0 sized structs

Gen7.5 has a BLEND_STATE of size 0 which includes a variable length
group. We did not deal with that very well, leading to an endless
loop.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+
Rhys Perry [Fri, 3 Aug 2018 21:11:28 +0000 (22:11 +0100)]
nv50/ir,nvc0: use constant buffers for compute when possible on Kepler+

Gives a +7.79% increase in FPS with Hitman on lowest quality settings on
my GTX 1060.

total instructions in shared programs : 5787979 -> 5748677 (-0.68%)
total gprs used in shared programs    : 669901 -> 669373 (-0.08%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21064 (-0.02%)

                local     shared        gpr       inst      bytes
    helped           1           0         152         274         274
      hurt           0           0           0           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years ago nv50/ir: optimize multiplication by 16-bit immediates into two xmads
Rhys Perry [Sat, 18 Aug 2018 14:06:50 +0000 (15:06 +0100)]
nv50/ir: optimize multiplication by 16-bit immediates into two xmads

Rather than the usual three that would be created.
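
The arithmetic behind the saving can be sketched in C. xmad16 below is a simplified, hypothetical model of a 16x16-bit multiply-add; how the real instruction folds the shift into its modes is not described in this message and is left explicit here. With a 16-bit immediate the a_lo * imm_hi partial product is zero, so two multiply-adds suffice:

```c
#include <stdint.h>

/* Simplified model of a 16x16-bit multiply-add: a * b + c, wrapping at
 * 32 bits. The real instruction has extra modes not modelled here. */
static uint32_t
xmad16(uint16_t a, uint16_t b, uint32_t c)
{
   return (uint32_t)a * (uint32_t)b + c;
}

/* a * imm for a 16-bit immediate, using two multiply-adds:
 *    a * imm = ((a_hi * imm) << 16) + a_lo * imm   (mod 2^32)
 * The a_lo * imm_hi partial product is zero and needs no instruction. */
static uint32_t
mul_by_imm16(uint32_t a, uint16_t imm)
{
   uint32_t hi = xmad16((uint16_t)(a >> 16), imm, 0);
   return xmad16((uint16_t)(a & 0xffff), imm, hi << 16);
}
```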

total instructions in shared programs : 5796385 -> 5786560 (-0.17%)
total gprs used in shared programs    : 670103 -> 669968 (-0.02%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21068 (-0.45%)

                local     shared        gpr       inst      bytes
    helped           1           0          64        1040        1040
      hurt           0           0          27           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years ago nv50/ir: optimize near power-of-twos into shladd
Rhys Perry [Sat, 18 Aug 2018 14:06:01 +0000 (15:06 +0100)]
nv50/ir: optimize near power-of-twos into shladd
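
As a rough illustration of the idea, multiplying by an immediate one away from a power of two can be rewritten as a shift plus an add or subtract: a * (2^k + 1) = (a << k) + a and a * (2^k - 1) = (a << k) - a. The sketch below uses hypothetical helper names, and exactly which immediates the pass matches is an assumption:

```c
#include <stdbool.h>
#include <stdint.h>

static bool
is_pow2(uint32_t x)
{
   return x != 0 && (x & (x - 1)) == 0;
}

/* log2 of a value that is known to be a power of two. */
static unsigned
log2_pow2(uint32_t x)
{
   unsigned k = 0;
   while (x > 1) {
      x >>= 1;
      k++;
   }
   return k;
}

/* If imm is 2^k + 1 or 2^k - 1, compute a * imm as a shift plus or minus a
 * and return true; otherwise leave *result untouched and return false. */
static bool
mul_near_pow2(uint32_t a, uint32_t imm, uint32_t *result)
{
   if (is_pow2(imm - 1)) {
      *result = (a << log2_pow2(imm - 1)) + a;
      return true;
   }
   if (is_pow2(imm + 1)) {
      *result = (a << log2_pow2(imm + 1)) - a;
      return true;
   }
   return false;
}
```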

total instructions in shared programs : 5819319 -> 5796385 (-0.39%)
total gprs used in shared programs    : 670571 -> 670103 (-0.07%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21164 (0.00%)

                local     shared        gpr       inst      bytes
    helped           0           0         318        1758        1758
      hurt           0           0          63           0           0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years ago nv50/ir: move a * b -> a << log2(b) code into createMul()
Rhys Perry [Wed, 13 Jun 2018 15:30:01 +0000 (16:30 +0100)]
nv50/ir: move a * b -> a << log2(b) code into createMul()

With this commit, OP_MAD is handled on nv50 too. This commit is also
useful for later commits.

Also, instead of creating a shladd, it relies on LateAlgebraicOpt to
create one. This simplifies the code and helps shader-db slightly overall.

total instructions in shared programs : 5820882 -> 5819319 (-0.03%)
total gprs used in shared programs    : 670595 -> 670571 (-0.00%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21164 -> 21164 (0.00%)

                local     shared        gpr       inst      bytes
    helped           0           0          18         230         230
      hurt           0           0           8         263         263

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years ago nv50/ir: optimize imul/imad to xmads
Rhys Perry [Wed, 13 Jun 2018 15:25:23 +0000 (16:25 +0100)]
nv50/ir: optimize imul/imad to xmads

This hits the shader-db numbers a good bit, though a few xmads are way
faster than an imul or imad, and the cost is mitigated by the next commit,
which optimizes many multiplications by immediates into shorter and less
register-heavy instructions than the xmads.

total instructions in shared programs : 5768871 -> 5820882 (0.90%)
total gprs used in shared programs    : 669919 -> 670595 (0.10%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21164 (0.46%)

                local     shared        gpr       inst      bytes
    helped           0           0          38           0           0
      hurt           1           0         365        3076        3076

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years ago gm107/ir: add support for OP_XMAD on GM107+
Rhys Perry [Wed, 13 Jun 2018 15:21:56 +0000 (16:21 +0100)]
gm107/ir: add support for OP_XMAD on GM107+

v4: make the immediate field 16 bits
v5: don't ever emit h1 flags for immediates

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years ago nv50/ir: add preliminary support for OP_XMAD
Rhys Perry [Wed, 13 Jun 2018 15:21:20 +0000 (16:21 +0100)]
nv50/ir: add preliminary support for OP_XMAD

v4: remove uint16_t(...)
v4: don't allow immediates outside [0,65535] in insnCanLoad()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years ago glsl/linker: Allow unused in blocks which are not declated on previous stage
vadym.shovkoplias [Thu, 23 Aug 2018 10:12:16 +0000 (13:12 +0300)]
glsl/linker: Allow unused in blocks which are not declated on previous stage

From Section 4.3.4 (Inputs) of the GLSL 1.50 spec:

    "Only the input variables that are actually read need to be written
     by the previous stage; it is allowed to have superfluous
     declarations of input variables."

Fixes:
    * interstage-multiple-shader-objects.shader_test

v2:
  Update comment in ir.h since the usage of "used" field
  has been extended.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101247
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years ago nir: Pull block_ends_in_jump into nir.h
Jason Ekstrand [Fri, 24 Aug 2018 14:34:05 +0000 (09:34 -0500)]
nir: Pull block_ends_in_jump into nir.h

We had two different implementations in different files.  May as well
have one and put it in nir.h.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years ago anv: Add support for protected memory properties on anv_GetPhysicalDeviceProperties2()
Samuel Iglesias Gonsálvez [Fri, 24 Aug 2018 10:11:49 +0000 (12:11 +0200)]
anv: Add support for protected memory properties on anv_GetPhysicalDeviceProperties2()

VkPhysicalDeviceProtectedMemoryProperties structure is new on Vulkan 1.1.

Fixes Vulkan CTS CL#2849.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago intel/tools: Add 0x in front of a couple of hex values
Jason Ekstrand [Sat, 25 Aug 2018 22:34:17 +0000 (17:34 -0500)]
intel/tools: Add 0x in front of a couple of hex values

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago anv: Fill holes in the VF VUE to zero
Jason Ekstrand [Sat, 25 Aug 2018 22:08:04 +0000 (17:08 -0500)]
anv: Fill holes in the VF VUE to zero

This fixes a GPU hang in DOOM 2016 running under wine.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104809
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago intel: tools: Fix aubinator_error's fprintf call (format-security)
Kai Wasserbäch [Sat, 25 Aug 2018 10:00:30 +0000 (12:00 +0200)]
intel: tools: Fix aubinator_error's fprintf call (format-security)

The recent commit 4616639b49b4bbc91e503c1c27632dccc1c2b5be introduced
the new function aubinator_error() which is a trivial wrapper around
fprintf() to STDERR. The call to fprintf() however is passed the message
msg directly:
  fprintf(stderr, msg);

This is a format-security violation and leads to an FTBFS with
-Werror=format-security (GCC 8):
  ../../../src/intel/tools/aubinator.c: In function 'aubinator_error':
  ../../../src/intel/tools/aubinator.c:74:4: error: format not a string literal and no format arguments [-Werror=format-security]
      fprintf(stderr, msg);
      ^~~~~~~

This patch fixes this trivially by introducing a catch-all "%s" format
argument.
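
The shape of the fix, sketched with a hypothetical wrapper (report_error is not the function from the patch, and the FILE pointer parameter is added here only to make the example easy to exercise):

```c
#include <assert.h>
#include <stdio.h>

/* Print msg verbatim. Passing msg as the format string instead would let
 * any '%' in it be interpreted as a conversion specification. */
static int
report_error(FILE *out, const char *msg)
{
   assert(out != NULL && msg != NULL);
   return fprintf(out, "%s", msg);
}
```

A message such as "disk 100%s full" is then written out unchanged instead of making fprintf() look for a string argument that was never passed.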

Fixes: 4616639b49b ("intel: tools: split aub parsing from aubinator")
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago intel/batch_decoder: Print blend states properly
Jason Ekstrand [Fri, 24 Aug 2018 21:05:08 +0000 (16:05 -0500)]
intel/batch_decoder: Print blend states properly

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago intel/batch_decoder: Fix dynamic state printing
Jason Ekstrand [Fri, 24 Aug 2018 21:04:03 +0000 (16:04 -0500)]
intel/batch_decoder: Fix dynamic state printing

Instead of printing addresses like everyone else, we were accidentally
printing the offset from state base address.  Also, state_map is a void
pointer so we were incrementing in bytes instead of dwords and every
state other than the first was wrong.
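
The pointer-arithmetic pitfall can be shown in isolation. With GNU C, adding to a void pointer advances by bytes, so stepping through dword-sized state needs a cast first. state_at and its parameters are hypothetical names for this sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to state number index, where every state occupies
 * state_dwords 32-bit words. Casting first makes the addition count in
 * dwords; arithmetic on the void pointer itself (a GNU C extension)
 * would count in bytes. */
static const uint32_t *
state_at(const void *state_map, unsigned index, unsigned state_dwords)
{
   assert(state_map != NULL);
   return (const uint32_t *)state_map + (size_t)index * state_dwords;
}
```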

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago intel/decoder: Print ISL formats for vertex elements
Jason Ekstrand [Fri, 24 Aug 2018 20:27:38 +0000 (15:27 -0500)]
intel/decoder: Print ISL formats for vertex elements

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years ago intel/decoder: Clean up field iteration and fix sub-dword fields
Jason Ekstrand [Fri, 24 Aug 2018 20:23:04 +0000 (15:23 -0500)]
intel/decoder: Clean up field iteration and fix sub-dword fields

First of all, setting iter->name in advance_field is unnecessary because
it gets set by gen_decode_field which gets called immediately after
advance_field in the one call-site.  Second, we weren't properly
initializing start_bit and end_bit in the initial condition of
gen_field_iterator_next so the first field of a struct would get printed
wrong if it doesn't start on the first bit.  This is fixed by adding an
iter_start_field helper which sets the field and also sets up the other
bits we need.  This fixes decoding of 3DSTATE_SBE_SWIZ.
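
To illustrate why start_bit and end_bit matter for a sub-dword field,
here is a small sketch (field_value is a hypothetical helper, not the
decoder's code) that extracts an inclusive bit range from one dword:

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [start_bit, end_bit] (inclusive, bit 0 = LSB) from a dword.
 * Hypothetical helper for illustration only. */
uint32_t
field_value(uint32_t dword, unsigned start_bit, unsigned end_bit)
{
   assert(start_bit <= end_bit && end_bit < 32);
   unsigned width = end_bit - start_bit + 1;
   uint32_t mask = width == 32 ? UINT32_MAX : (UINT32_C(1) << width) - 1;
   return (dword >> start_bit) & mask;
}
```

For a field occupying bits 4..7 of 0xF0, the correct range yields 0xF,
while a range that wrongly starts at bit 0 (bits 0..3) yields 0.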

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agogallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE.
Kenneth Graunke [Sun, 24 Jun 2018 00:26:47 +0000 (17:26 -0700)]
gallium: Split out PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE.

Some hardware can do PIPE_TEX_WRAP_MIRROR_REPEAT but not
PIPE_TEX_WRAP_MIRROR_CLAMP and PIPE_TEX_WRAP_MIRROR_CLAMP_TO_BORDER.

Drivers for such hardware would like to advertise support for
ARB_texture_mirror_clamp_to_edge but not EXT_texture_mirror_clamp.

This commit adds a new PIPE_CAP_TEXTURE_MIRROR_CLAMP_TO_EDGE bit,
changes the extension enable to be based on that, and enables it
in all upstream drivers which supported PIPE_CAP_TEXTURE_MIRROR_CLAMP
(so they continue supporting this mode).

6 years agointel: decoder: unify MI_BB_START field naming
Lionel Landwerlin [Tue, 14 Aug 2018 10:22:12 +0000 (11:22 +0100)]
intel: decoder: unify MI_BB_START field naming

The batch decoder looks for a field with a particular name to decide
whether an MI_BB_START leads into a second batch buffer level. Because
the names are different between Gen7.5/8 and the newer generation we
fail that test and keep on reading (invalid) instructions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107544
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agodocs: Update calendar, news, relnotes for 18.1.7
Dylan Baker [Fri, 24 Aug 2018 16:35:24 +0000 (09:35 -0700)]
docs: Update calendar, news, relnotes for 18.1.7

6 years agodocs: Add mesa 18.1.7 notes
Dylan Baker [Fri, 24 Aug 2018 16:29:07 +0000 (09:29 -0700)]
docs: Add mesa 18.1.7 notes

6 years agodocs: Add mesa 18.1.7 docs
Dylan Baker [Thu, 23 Aug 2018 16:39:20 +0000 (09:39 -0700)]
docs: Add mesa 18.1.7 docs

6 years agodocs: update calendar 18.2.0-rc4 is out, extend to 18.2.0-rc5
Andres Gomez [Fri, 24 Aug 2018 15:58:00 +0000 (18:58 +0300)]
docs: update calendar 18.2.0-rc4 is out, extend to 18.2.0-rc5

Signed-off-by: Andres Gomez <agomez@igalia.com>
6 years agodocs/relnotes: Mark NV_fragment_shader_interlock support in i965
Kevin Rogovin [Fri, 24 Aug 2018 06:00:46 +0000 (09:00 +0300)]
docs/relnotes: Mark NV_fragment_shader_interlock support in i965

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoegl/drm: use gbm_dri_bo() wrapper
Emil Velikov [Thu, 9 Aug 2018 14:13:07 +0000 (15:13 +0100)]
egl/drm: use gbm_dri_bo() wrapper

Remove the explicit cast, using the appropriate wrapper instead.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
6 years agoegl/drm: use gbm_dri_surface() wrapper
Emil Velikov [Thu, 9 Aug 2018 14:11:38 +0000 (15:11 +0100)]
egl/drm: use gbm_dri_surface() wrapper

Remove the explicit cast, using the appropriate wrapper instead.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
6 years agoegl/drm: use gbm_dri_device() wrapper
Emil Velikov [Thu, 9 Aug 2018 14:05:58 +0000 (15:05 +0100)]
egl/drm: use gbm_dri_device() wrapper

Remove the explicit cast, using the appropriate wrapper instead.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
6 years agoegl/android: simplify device open/probe
Emil Velikov [Wed, 8 Aug 2018 14:40:56 +0000 (15:40 +0100)]
egl/android: simplify device open/probe

Currently droid_probe_device does not do any 'probing'; it only filters
out a device if it doesn't match the given vendor string.

Rename the function, straighten the return type and call it only when
needed, i.e. when an actual vendor string is provided.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
6 years agoegl/android: remove drmVersion::name NULL check
Emil Velikov [Wed, 8 Aug 2018 14:13:20 +0000 (15:13 +0100)]
egl/android: remove drmVersion::name NULL check

The name string is guaranteed to be non-NULL.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
6 years agoegl/android: remove droid_probe_driver()
Emil Velikov [Wed, 8 Aug 2018 14:05:56 +0000 (15:05 +0100)]
egl/android: remove droid_probe_driver()

The function name is misleading - it effectively checks whether
loader_get_driver_for_fd fails, which can happen only on strdup
error - a close to impossible scenario.

Drop the function - we call the loader API at a later stage.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
6 years agoegl/android: use strcmp with drmVersion::name
Emil Velikov [Wed, 8 Aug 2018 13:56:00 +0000 (14:56 +0100)]
egl/android: use strcmp with drmVersion::name

The name string is guaranteed to be NUL-terminated. Drop the explicit
length check that comes with strncmp().
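
A minimal sketch of the resulting comparison (name_matches is a
hypothetical helper, not the egl/android code), relying on both strings
being NUL-terminated:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the NUL-terminated driver name is exactly
 * the requested vendor string.  strcmp() compares the whole strings, so
 * no separate length argument is needed. */
bool
name_matches(const char *name, const char *vendor)
{
   assert(name != NULL && vendor != NULL);
   return strcmp(name, vendor) == 0;
}
```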

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
6 years agoegl/android: use drmDevice instead of the manual /dev/dri iteration
Emil Velikov [Wed, 8 Aug 2018 13:19:05 +0000 (14:19 +0100)]
egl/android: use drmDevice instead of the manual /dev/dri iteration

Replace the manual handling of /dev/dri in favor of the drmDevice API.
The latter provides a consistent way of enumerating the devices,
providing device details as needed.

v2:
 - Use ARRAY_SIZE (Frank)
 - s/famour/favor/ typo (Frank)
 - Make MAX_DRM_DEVICES a macro - fix vla errors (RobF)
 - Remove left-over dev_path instance (RobF)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Robert Foss <robert.foss@collabora.com> (v1)
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
6 years agoRevert "configure: allow building with python3"
Emil Velikov [Fri, 24 Aug 2018 10:14:15 +0000 (11:14 +0100)]
Revert "configure: allow building with python3"

This reverts commit ae7898dfdbe5c8dab7d11c71862353f1ae43feb0.

Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported using get_xmlpool.py with LANG=C produces some weird
output - see the link for details.

Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and lack of lang handling of the meson
build.)

https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html

6 years agoRevert "travis: use python3 for the autoconf builds"
Emil Velikov [Fri, 24 Aug 2018 10:10:24 +0000 (11:10 +0100)]
Revert "travis: use python3 for the autoconf builds"

This reverts commit 855af9a5a209f061355513b92f3ba4576f48d091.

Turns out the python scripts are _not_ fully python 3 compatible.
As Ilia reported using get_xmlpool.py with LANG=C produces some weird
output - see the link for details.

Even though the issue was spotted with the autoconf build, it exposes a
genuine problem with the script (and lack of lang handling of the meson
build.)

https://lists.freedesktop.org/archives/mesa-dev/2018-August/203508.html

6 years agoRevert "mesa: bump GL_MAX_ELEMENTS_INDICES and GL_MAX_ELEMENTS_VERTICES"
Kenneth Graunke [Fri, 24 Aug 2018 07:32:09 +0000 (00:32 -0700)]
Revert "mesa: bump GL_MAX_ELEMENTS_INDICES and GL_MAX_ELEMENTS_VERTICES"

This reverts commit 095515e16ca3cb2c9f1813b6602ee57ae28325a8.

This breaks KHR-GL46.map_buffer_alignment.functional on i965.

This code was apparently not reviewed and I don't know why we would
move from a driver configurable constant to a hardcoded value for all
drivers.  This really looks like an accidental hack push.

6 years agoRevert recent changes about not including compute in combined limits.
Kenneth Graunke [Fri, 24 Aug 2018 03:58:32 +0000 (20:58 -0700)]
Revert recent changes about not including compute in combined limits.

As far as I can tell, no one reviewed these changes, they made i965
assert fail on driver load, and I am not certain they are correct.
(Hopefully reverting these does not break radeonsi too badly...)

The uniform related changes seem fine and reasonable, but the texture
image units change is possibly incorrect.  According to the
OES_tessellation_shader spec issue 5:

   (5) How are aggregate shader limits computed?

    RESOLVED: Following the GL 4.4 model, but we restrict uniform
    buffer bindings to 12/stage instead of 14, this results in

        MAX_UNIFORM_BUFFER_BINDINGS = 72
            This is 12 bindings/stage * 6 shader stages, allowing a static
            partitioning of the bindings even though at most 5 stages can
            appear in a program object).
        MAX_COMBINED_UNIFORM_BLOCKS = 60
            This is 12 blocks/stage * 5 stages, since compute shaders can't
            be mixed with other stages.
        MAX_COMBINED_TEXTURE_IMAGE_UNITS = 96
            This is 16 textures/stage * 6 stages.

which definitely is including compute shaders in that last limit.
Not including compute shaders breaks the following test:
dEQP-GLES31.functional.state_query.integer.max_combined_texture_image_units_getinteger
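
The arithmetic in the quoted spec text can be checked with a short
sketch; the per-stage counts are the ones quoted above, while the enum
and function names are made up for illustration:

```c
#include <assert.h>

/* Per-stage values quoted from OES_tessellation_shader issue 5. */
enum {
   UBO_BINDINGS_PER_STAGE = 12,
   TEXTURES_PER_STAGE = 16,
   STAGES_WITH_COMPUTE = 6,    /* VS, TCS, TES, GS, FS and CS */
   STAGES_WITHOUT_COMPUTE = 5
};

/* The two uniform-buffer limits from the quote. */
static_assert(UBO_BINDINGS_PER_STAGE * STAGES_WITH_COMPUTE == 72,
              "MAX_UNIFORM_BUFFER_BINDINGS");
static_assert(UBO_BINDINGS_PER_STAGE * STAGES_WITHOUT_COMPUTE == 60,
              "MAX_COMBINED_UNIFORM_BLOCKS");

/* Hypothetical helper: combined limit for a given number of stages. */
unsigned
combined_limit(unsigned per_stage, unsigned stages)
{
   return per_stage * stages;
}
```

Counting six stages gives 16 * 6 = 96 combined texture image units;
leaving compute out gives 16 * 5 = 80, which falls short of the 96 in
the quoted text.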

There was enough breakage that I figured we should just send this back
to the drawing board.

Revert "i965: don't include compute resources in "Combined" limits"
Revert "st/mesa: don't include compute resources in "Combined" limits"
Revert "mesa: don't include compute resources in MAX_COMBINED_* limits"

This reverts commit b03dcb1e5f507c5950d0de053a6f76e6306ee71f.
This reverts commit cff290df4c09547cd2cb3b129ec59bdebdadba90.
This reverts commit 45f87a48f94148b484961f18a4f1ccf86f066b1c.

6 years agogallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0
Roland Scheidegger [Thu, 23 Aug 2018 17:07:05 +0000 (19:07 +0200)]
gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0

These have been removed. Unfortunately auto-upgrade doesn't work for
jit. (Worse, it seems we don't get a compilation error anymore when
compiling the shader, rather llvm will just do a call to a null
function in the jitted shaders making it difficult to detect when
intrinsics vanish.)

Luckily the signed ones are still there; I helped convince llvm that
removing them is a bad idea for now, since while the unsigned ones have
sort of agreed-upon simplest patterns to replace them with, this is not
the case for the signed ones, and they require _significantly_ more
complex patterns - to the point that the recognition is IMHO probably
unlikely to ever work reliably in practice (due to other optimizations
interfering). (Even for the relatively trivial unsigned patterns, llvm
already added test cases where recognition doesn't work, unsaturated
add followed by saturated add may produce atrocious code.)
Nevertheless, it seems there's a serious quest to squash all
cpu-specific intrinsics going on, so I'd expect patches to nuke them as
well to resurface.

Adapt the existing fallback code to match the simple patterns llvm uses
and hope for the best. I've verified with lp_test_blend that it does
produce the expected saturated assembly instructions. Our cmp/select
build helpers don't use boolean masks, but that doesn't seem to
interfere with llvm's ability to recognize the pattern.
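
For readers unfamiliar with the compare/select style of fallback, here
is a scalar C model of unsigned saturating add and subtract on 8-bit
values. It is only a sketch with made-up function names; the gallivm
code builds LLVM IR on vectors, and the exact patterns llvm recognizes
are not reproduced here:

```c
#include <assert.h>
#include <stdint.h>

static_assert(UINT8_MAX == 255, "the sketch models 8-bit lanes");

/* Unsigned saturating add: the wrapped sum is smaller than an operand
 * exactly when the addition overflowed, so select all-ones in that case. */
uint8_t
uadd_sat8(uint8_t a, uint8_t b)
{
   uint8_t sum = (uint8_t)(a + b);
   return sum < a ? UINT8_MAX : sum;
}

/* Unsigned saturating subtract: clamp to zero when b exceeds a. */
uint8_t
usub_sat8(uint8_t a, uint8_t b)
{
   return a > b ? (uint8_t)(a - b) : 0;
}
```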

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106231
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
6 years agost/mesa: expose KHR_texture_compression_astc_sliced_3d
Marek Olšák [Mon, 6 Aug 2018 02:16:48 +0000 (22:16 -0400)]
st/mesa: expose KHR_texture_compression_astc_sliced_3d

This is ASTC 2D LDR allowing texture arrays and 3D, compressing each
slice as a separate 2D image. Tested by piglit. Trivial.

6 years agost/mesa: expose EXT_disjoint_timer_query
Marek Olšák [Mon, 6 Aug 2018 01:41:11 +0000 (21:41 -0400)]
st/mesa: expose EXT_disjoint_timer_query

same cap as ARB_timer_query, no changes needed, tested by piglit

6 years agomesa: expose EXT_vertex_attrib_64bit
Marek Olšák [Mon, 6 Aug 2018 06:48:12 +0000 (02:48 -0400)]
mesa: expose EXT_vertex_attrib_64bit

because the closed driver exposes it.
It's the same as the ARB extension.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agomesa: expose AMD_query_buffer_object
Marek Olšák [Mon, 6 Aug 2018 06:26:09 +0000 (02:26 -0400)]
mesa: expose AMD_query_buffer_object

it's a subset of the ARB extension.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agomesa: expose AMD_multi_draw_indirect
Marek Olšák [Mon, 6 Aug 2018 05:55:59 +0000 (01:55 -0400)]
mesa: expose AMD_multi_draw_indirect

because the closed driver exposes it.
This is equivalent to the ARB extension.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agomesa: expose AMD_gpu_shader_int64
Marek Olšák [Mon, 6 Aug 2018 04:56:35 +0000 (00:56 -0400)]
mesa: expose AMD_gpu_shader_int64

because the closed driver exposes it.

It's equivalent to ARB_gpu_shader_int64.
In this patch, I did everything the same as we do for ARB_gpu_shader_int64.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agomesa: expose ARB_post_depth_coverage in the Compatibility profile
Marek Olšák [Mon, 6 Aug 2018 05:25:51 +0000 (01:25 -0400)]
mesa: expose ARB_post_depth_coverage in the Compatibility profile

It only contains GLSL changes.

v2: allow the layout qualifier on GLSL <= 1.30

6 years agointel/nir: Enable nir_opt_find_array_copies
Jason Ekstrand [Tue, 24 Jul 2018 05:20:41 +0000 (22:20 -0700)]
intel/nir: Enable nir_opt_find_array_copies

We have to be a bit careful with this one because we want it to run in
the optimization loop but only in the first brw_nir_optimize call.
Later calls assume that we've lowered away copy_deref instructions and
we don't want to introduce any more.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15176942 -> 15176942 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

In spite of the lack of any shader-db improvement, this patch completely
eliminates spilling in the Batman: Arkham City tessellation shaders.
This is because we are now able to detect that the temporary array
created by DXVK for storing TCS inputs is a copy of the input arrays and
use indirect URB reads instead of making a copy of 4.5 KiB of input data
and then indirecting on it with if-ladders.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years agonir: Add an array copy optimization
Jason Ekstrand [Tue, 24 Jul 2018 02:16:56 +0000 (19:16 -0700)]
nir: Add an array copy optimization

This peephole optimization looks for a series of load/store_deref or
copy_deref instructions that copy an array from one variable to another
and turns it into a copy_deref that copies the entire array.  The
pattern it looks for is extremely specific but it's good enough to pick
up on the input array copies in DXVK and should also be able to pick up
the sequence generated by spirv_to_nir for an OpLoad of a large composite
followed by OpStore.  It can always be improved later if needed.
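
As a rough C analogy only (this is not NIR and not the pass itself; the
function names are invented), the transformation is comparable to
recognizing a run of per-element copies and replacing it with one
whole-array copy:

```c
#include <assert.h>
#include <string.h>

#define NUM_ELEMS 4

/* Before: one load/store pair per element, as a front-end might emit. */
void
copy_per_element(float dst[NUM_ELEMS], const float src[NUM_ELEMS])
{
   dst[0] = src[0];
   dst[1] = src[1];
   dst[2] = src[2];
   dst[3] = src[3];
}

/* After: a single whole-array copy, the analogue of one copy_deref. */
void
copy_whole_array(float dst[NUM_ELEMS], const float src[NUM_ELEMS])
{
   assert(dst != NULL && src != NULL);
   memcpy(dst, src, NUM_ELEMS * sizeof(float));
}
```

Both functions leave dst with the same contents; the second form makes
it obvious to later passes that dst is simply a copy of src.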

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years agointel/nir: Use nir_shrink_vec_array_vars
Jason Ekstrand [Wed, 25 Jul 2018 15:54:09 +0000 (08:54 -0700)]
intel/nir: Use nir_shrink_vec_array_vars

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15176765 (<.01%)
    instructions in affected programs: 4259 -> 3419 (-19.72%)
    helped: 1
    HURT: 0

    total spills in shared programs: 10954 -> 10855 (-0.90%)
    spills in affected programs: 295 -> 196 (-33.56%)
    helped: 1
    HURT: 0

    total fills in shared programs: 22222 -> 22117 (-0.47%)
    fills in affected programs: 417 -> 312 (-25.18%)
    helped: 1
    HURT: 0

The helped shader is from the OglCSDof synmark test.  On my Kaby Lake
laptop, the actual framerate of the benchmark didn't appear to improve
beyond the noise.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years agonir: Add a array-of-vector variable shrinking pass
Jason Ekstrand [Wed, 25 Jul 2018 02:32:27 +0000 (19:32 -0700)]
nir: Add a array-of-vector variable shrinking pass

This pass looks for variables with vector or array-of-vector types and
narrows the type to only the components used.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years agointel/nir: Use the new structure and array splitting passes
Jason Ekstrand [Tue, 24 Jul 2018 17:08:20 +0000 (10:08 -0700)]
intel/nir: Use the new structure and array splitting passes

We call structure splitting once because it is guaranteed to split all
the structures in the entire shader in one go.  We call array splitting
in the loop in case future optimizations turn indirects into direct
dereferences and we can split more arrays.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15177605 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

This is unsurprising because nir_lower_vars_to_ssa already effectively
does structure and array splitting internally.  It doesn't actually
split the variables but its ability to reason about aliasing in the
presence of arrays and structures and pick out scalars or vectors to be
lowered to SSA values is fairly advanced.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years agonir: Add an array splitting pass
Jason Ekstrand [Tue, 24 Jul 2018 19:33:46 +0000 (12:33 -0700)]
nir: Add an array splitting pass

This pass looks for array variables where at least one level of the
array is never indirected and splits it into multiple smaller variables.

This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through arrays of arrays and can detect indirects on just
one level or even see that arr[i][0][5] does not alias arr[i][1][j].
This pass exists to help other passes more easily see through arrays of
arrays.  If a back-end does implement arrays using scratch or indirects
on registers, having more smaller arrays is likely to have better memory
efficiency.
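
A C analogy of the split, using the arr[i][0][5] / arr[i][1][j] example
above (the sizes and the names arr_0, arr_1, read_before and read_after
are assumptions for illustration; the pass itself works on NIR
variables): when the middle level is only ever indexed with constants,
the variable can become one smaller array per constant index:

```c
#include <assert.h>

#define N 4

/* Before: the middle level is only ever indexed with constants 0 and 1. */
float arr[N][2][8];

/* After splitting on that level: two independent, smaller variables. */
float arr_0[N][8];
float arr_1[N][8];

float
read_before(unsigned i, unsigned j)
{
   assert(i < N && j < 8);
   return arr[i][0][5] + arr[i][1][j];
}

float
read_after(unsigned i, unsigned j)
{
   assert(i < N && j < 8);
   return arr_0[i][5] + arr_1[i][j];
}
```

In the split form it is evident from the variable names alone that the
two accesses cannot alias.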

v2 (Jason Ekstrand):
 - Better comments and naming (some from Caio)
 - Rework to use one hash map instead of two

v2.1 (Jason Ekstrand):
 - Fix a couple of bugs that were added in the rework including one
   which basically prevented it from running

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years agonir: Add a structure splitting pass
Jason Ekstrand [Tue, 24 Jul 2018 17:08:06 +0000 (10:08 -0700)]
nir: Add a structure splitting pass

This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through structures and considers them to be "split".  This
pass exists to help other passes more easily see through structure
variables.  If a back-end does implement arrays using scratch or
indirects on registers, having more smaller arrays is likely to have
better memory efficiency.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
6 years agonir/types: Add array_or_matrix helpers
Jason Ekstrand [Wed, 25 Jul 2018 15:53:58 +0000 (08:53 -0700)]
nir/types: Add array_or_matrix helpers

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
6 years agoi965: don't include compute resources in "Combined" limits
Kenneth Graunke [Fri, 24 Aug 2018 00:24:19 +0000 (17:24 -0700)]
i965: don't include compute resources in "Combined" limits

The combined limits should only include shader stages that can be active
at the same time.  We don't need to include compute.

See also cff290df4c09547cd2cb3b129ec59bdebdadba90 for st/mesa.

Unbreaks i965 from assert failing on driver load since Marek's
45f87a48f94148b484961f18a4f1ccf86f066b1c, which dropped the core
Mesa capabilities before adjusting driver limits down to match.

6 years agoradeonsi: increase the maximum UBO size to 2 GB
Marek Olšák [Wed, 8 Aug 2018 19:37:21 +0000 (15:37 -0400)]
radeonsi: increase the maximum UBO size to 2 GB

Same as the closed driver.

This causes a failure in GL45-CTS.compute_shader.max, which has a trivial
bug.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agoradeonsi: bump MAX_GS_INVOCATIONS
Marek Olšák [Mon, 6 Aug 2018 12:09:52 +0000 (08:09 -0400)]
radeonsi: bump MAX_GS_INVOCATIONS

same as the closed driver

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agogallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZE
Marek Olšák [Mon, 6 Aug 2018 12:38:54 +0000 (08:38 -0400)]
gallium: add PIPE_CAP_MAX_SHADER_BUFFER_SIZE

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agogallium: add PIPE_CAP_MAX_GS_INVOCATIONS
Marek Olšák [Mon, 6 Aug 2018 12:07:25 +0000 (08:07 -0400)]
gallium: add PIPE_CAP_MAX_GS_INVOCATIONS

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agotgsi/ureg: don't call tgsi_sanity when it's too slow
Marek Olšák [Wed, 8 Aug 2018 19:07:51 +0000 (15:07 -0400)]
tgsi/ureg: don't call tgsi_sanity when it's too slow

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agost/mesa: fix up uniform limits to be able to expose large UBOs
Marek Olšák [Wed, 8 Aug 2018 19:17:26 +0000 (15:17 -0400)]
st/mesa: fix up uniform limits to be able to expose large UBOs

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agost/mesa: don't include compute resources in "Combined" limits
Marek Olšák [Wed, 8 Aug 2018 19:21:05 +0000 (15:21 -0400)]
st/mesa: don't include compute resources in "Combined" limits

The combined limits should only include shader stages that can be active
at the same time.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
6 years agost/mesa: set ctx->Const.SubPixelBits
Marek Olšák [Mon, 6 Aug 2018 08:25:15 +0000 (04:25 -0400)]
st/mesa: set ctx->Const.SubPixelBits

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>