mesa.git
6 years ago  Android: fix missing generation of vtn_gather_types.c
Rob Herring [Wed, 13 Dec 2017 21:06:08 +0000 (15:06 -0600)]
Android: fix missing generation of vtn_gather_types.c

Commit bb1e6ff161c9 ("spirv: Add a prepass to set types on vtn_values")
added generation of vtn_gather_types.c, but forgot to add it to the
Android build files.

Fixes: bb1e6ff161c9 ("spirv: Add a prepass to set types on vtn_values")
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Rob Herring <robh@kernel.org>
6 years ago  mesa: Add glSpecializeShaderARB to common_desktop_functions
Dylan Baker [Tue, 12 Dec 2017 19:48:31 +0000 (11:48 -0800)]
mesa: Add glSpecializeShaderARB to common_desktop_functions

CC: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104231
Fixes: 46b21b8f906 ("mesa: add GL_ARB_gl_spirv boilerplate")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  egl/android: Partially handle HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED
Tomasz Figa [Mon, 4 Dec 2017 18:22:39 +0000 (19:22 +0100)]
egl/android: Partially handle HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED

There is no API available to properly query the IMPLEMENTATION_DEFINED
format. As a workaround we rely here on gralloc allocating either
an arbitrary YCbCr 4:2:0 or RGBX_8888, with the latter being recognized
by lock_ycbcr failing.

Reviewed-on: https://chromium-review.googlesource.com/566793

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Signed-off-by: Rob Herring <robh@kernel.org>
6 years ago  swr: Correct texture allocation and limit max size to 2GB
Bruce Cherniak [Mon, 20 Nov 2017 17:32:55 +0000 (11:32 -0600)]
swr: Correct texture allocation and limit max size to 2GB

This patch fixes piglit tex3d-maxsize by correcting 4 things:

The total_size calculation was using 32-bit math, therefore a >4GB
allocation request overflowed and was not returning false (unsupported).

Changed AlignedMalloc arguments from "unsigned int" to size_t, to handle
>4GB allocations.

Added error checking on texture allocations to fail gracefully.

Finally, temporarily decreased supported max texture size from 4GB to 2GB.
The gallivm texture-sampler needs some additional work to correctly handle
larger than 2GB textures (offsets to LLVMBuildGEP are signed).

I'm working on a follow-on patch to allow up to 4GB textures, as this is
useful in HPC visualization applications.

Fixes piglit tex3d-maxsize.

v2: Updated patch description to clarify ">4GB".
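
Illustration only, not the code from this patch: a minimal sketch of a
size check done in 64-bit math, so that a >4GB request is rejected
instead of wrapping. The function name, the MAX_TEXTURE_BYTES constant
and the inclusive 2GB cap are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap mirroring the temporary 2GB limit described above. */
#define MAX_TEXTURE_BYTES ((uint64_t)2 << 30)

/* Returns true and stores the byte size when the texture fits under the
 * cap. Every multiply happens in 64 bits and is checked right away, so
 * the running size never exceeds what a uint64_t can hold. */
static bool
texture_size_supported(uint32_t width, uint32_t height, uint32_t depth,
                       uint32_t bytes_per_texel, uint64_t *total_size)
{
   const uint32_t factors[] = { height, depth, bytes_per_texel };
   uint64_t size = width;

   assert(total_size != NULL);

   for (size_t i = 0; i < sizeof(factors) / sizeof(factors[0]); i++) {
      size *= factors[i];
      if (size > MAX_TEXTURE_BYTES)
         return false;   /* unsupported: too large */
   }

   *total_size = size;
   return true;
}
```

With 32-bit math, a 2048x2048x2048 texture has 2^33 texels, which wraps
to 0; the 64-bit version above reports it as unsupported.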

Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
6 years ago  swr: Fix KNOB_MAX_WORKER_THREADS thread creation override.
Bruce Cherniak [Tue, 12 Dec 2017 23:18:23 +0000 (17:18 -0600)]
swr: Fix KNOB_MAX_WORKER_THREADS thread creation override.

Environment variable KNOB_MAX_WORKER_THREADS allows the user to override
default thread creation and thread binding.  A previous commit that adjusted
Linux CPU topology handling caused setting this KNOB to bind all threads to a
single core.

This patch restores correct functionality of the override.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
6 years ago  meson: fix glx-test race
Dylan Baker [Tue, 12 Dec 2017 18:23:48 +0000 (10:23 -0800)]
meson: fix glx-test race

This test should rely on dispatch.h being generated, but it doesn't.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years ago  gallium/docs: document behavior of set_sample_mask()
Brian Paul [Wed, 13 Dec 2017 03:32:06 +0000 (20:32 -0700)]
gallium/docs: document behavior of set_sample_mask()

The sample mask is used even if msaa is not explicitly enabled when we
have a framebuffer with multisampled surfaces.  That's DX behavior and
what the Radeon drivers do.  Not sure about other drivers at this point.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
6 years ago  glsl: trivial whitespace fixes in link_varyings.cpp
Brian Paul [Tue, 12 Dec 2017 22:11:21 +0000 (15:11 -0700)]
glsl: trivial whitespace fixes in link_varyings.cpp

6 years ago  program: Don't reset SamplersValidated when restoring from shader cache
Jordan Justen [Tue, 12 Dec 2017 19:44:01 +0000 (11:44 -0800)]
program: Don't reset SamplersValidated when restoring from shader cache

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103988
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years ago  mesa: remove second include of errors.h in src/mesa/main/glspirv.c
Kai Wasserbäch [Tue, 12 Dec 2017 15:20:06 +0000 (16:20 +0100)]
mesa: remove second include of errors.h in src/mesa/main/glspirv.c

Fixes: 5bc03d2508 ("mesa: implement SPIR-V loading in glShaderBinary")
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  radeonsi: create get_tcs_tes_buffer_address helper
Timothy Arceri [Thu, 23 Nov 2017 01:59:01 +0000 (12:59 +1100)]
radeonsi: create get_tcs_tes_buffer_address helper

This will be shared between the NIR and TGSI backends.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years ago  ac: fix nir_op_f2f64
Timothy Arceri [Tue, 12 Dec 2017 05:10:24 +0000 (16:10 +1100)]
ac: fix nir_op_f2f64

Without this we get the error "FPExt only operates on FP" when
converting the following:

   vec1 32 ssa_5 = b2f ssa_4
   vec1 64 ssa_6 = f2f64 ssa_5

Which results in:

   %44 = and i32 %43, 1065353216
   %45 = fpext i32 %44 to double

With this patch we now get:

   %44 = and i32 %43, 1065353216
   %45 = bitcast i32 %44 to float
   %46 = fpext float %45 to double
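
For illustration, the same sequence in plain C (a sketch, not code from
this patch): the masked integer has to be reinterpreted as a float
before it can be widened. The function name and the all-ones encoding
of a true boolean are assumptions for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static double
b2f_then_f2f64(bool b)
{
   /* Assumed boolean encoding: all ones for true, zero for false. */
   uint32_t mask = b ? 0xffffffffu : 0u;

   /* "and i32 ..., 1065353216": 0x3f800000 is the bit pattern of 1.0f. */
   uint32_t bits = mask & 0x3f800000u;

   /* "bitcast i32 to float": reinterpret the bits, do not convert. */
   float f;
   assert(sizeof(f) == sizeof(bits));
   memcpy(&f, &bits, sizeof(f));

   /* "fpext float to double": only valid on a floating-point value. */
   return (double)f;
}
```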

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years ago  nir: fix shift for uint64_t
Timothy Arceri [Tue, 12 Dec 2017 02:52:50 +0000 (13:52 +1100)]
nir: fix shift for uint64_t

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years ago  st/glsl_to_nir: skip forced array splitting for tcs
Timothy Arceri [Tue, 12 Dec 2017 02:49:41 +0000 (13:49 +1100)]
st/glsl_to_nir: skip forced array splitting for tcs

nir_lower_io_to_temporaries() does not support tcs so we cannot
assume there are no indirects here. Also the radeonsi backend
(the only backend to support tess) has support for tcs indirects
so there is no need to lower them anyway.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years ago  intel/fs/bank_conflicts: Don't touch Gen7 MRF hack registers.
Francisco Jerez [Tue, 12 Dec 2017 04:24:53 +0000 (20:24 -0800)]
intel/fs/bank_conflicts: Don't touch Gen7 MRF hack registers.

Fixes: af2c320190f3c731 "intel/fs: Implement GRF bank conflict mitigation pass."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104199
Reported-by: Darius Spitznagel <d.spitznagel@goodbytez.de>
Reviewed-by: Matt Turner <mattst88@gmail.com>
6 years ago  i965: compute scratch space size correctly for Gen9+
Kevin Rogovin [Tue, 12 Dec 2017 12:17:27 +0000 (14:17 +0200)]
i965: compute scratch space size correctly for Gen9+

Fixes: 8ecdbb61360 "i965: Pretend there are 4 subslices for compute shader threads on Gen9+."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104005
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Eero Tamminen <eero.t.tamminen@intel.com>
6 years ago  i965: Program MEDIA_VFE_STATE in a more readable fashion.
Kevin Rogovin [Tue, 12 Dec 2017 12:17:26 +0000 (14:17 +0200)]
i965: Program MEDIA_VFE_STATE in a more readable fashion.

This patch is purely for readability improvements when programming
the MEDIA_VFE_STATE.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years ago  cso: add point rasterization sanity check assertion
Brian Paul [Fri, 8 Dec 2017 04:11:40 +0000 (21:11 -0700)]
cso: add point rasterization sanity check assertion

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
6 years ago  gallium/u_blitter: replace tabs with spaces
Brian Paul [Thu, 7 Dec 2017 15:52:36 +0000 (08:52 -0700)]
gallium/u_blitter: replace tabs with spaces

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago  xlib: call _mesa_warning() instead of fprintf()
Brian Paul [Fri, 8 Dec 2017 16:31:08 +0000 (09:31 -0700)]
xlib: call _mesa_warning() instead of fprintf()

We use _mesa_warning() everywhere else in this code.  Change requested
by Rick Irons of Mathworks.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  gallium/util: don't pass a pipe_resource to util_resource_is_array_texture()
Brian Paul [Thu, 7 Dec 2017 22:00:49 +0000 (15:00 -0700)]
gallium/util: don't pass a pipe_resource to util_resource_is_array_texture()

No need to pass a pipe_resource when we can just pass the target.
This makes the function potentially more usable.  Rename it too.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago  gallium/aux: include nr_samples in util_resource_size() computation
Brian Paul [Thu, 7 Dec 2017 21:47:32 +0000 (14:47 -0700)]
gallium/aux: include nr_samples in util_resource_size() computation

This function is only used in two places:
1. VMware driver, but only for HUD reporting
2. st/nine state tracker, used for texture memory accounting

Fixes: a69efa9482d ("util: add new util_resource_size() function in
u_resource.[ch]")

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago  svga: trivial whitespace/formatting fixes in svga_pipe_rasterizer.c
Brian Paul [Thu, 7 Dec 2017 21:30:29 +0000 (14:30 -0700)]
svga: trivial whitespace/formatting fixes in svga_pipe_rasterizer.c

6 years ago  st/mesa: trivial whitespace/formatting fixes in st_atom_rasterizer.c
Brian Paul [Thu, 7 Dec 2017 20:33:57 +0000 (13:33 -0700)]
st/mesa: trivial whitespace/formatting fixes in st_atom_rasterizer.c

6 years ago  spirv: Handle image and sampler function parameters
Jason Ekstrand [Fri, 8 Dec 2017 07:42:16 +0000 (23:42 -0800)]
spirv: Handle image and sampler function parameters

6 years ago  spirv/cfg: Refactor the function parameter loop a bit
Jason Ekstrand [Fri, 8 Dec 2017 07:42:15 +0000 (23:42 -0800)]
spirv/cfg: Refactor the function parameter loop a bit

6 years ago  spirv/cfg: Be a bit more precise about function parameters
Jason Ekstrand [Fri, 8 Dec 2017 07:42:14 +0000 (23:42 -0800)]
spirv/cfg: Be a bit more precise about function parameters

Pointers with no storage type are converted to inout variables but SSA
values and pointers with a storage type (which turns into a uint or
uvec2) are just input variables.

6 years ago  spirv: Make sampled images a real type
Jason Ekstrand [Fri, 8 Dec 2017 07:42:13 +0000 (23:42 -0800)]
spirv: Make sampled images a real type

Previously, we just gave them exactly the same type as the respective
image (which already had a sampler2D or similar type).  Now they have
their own base type and a pointer to the vtn_type for the image.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years ago  i915: add missing 0 defines
Eric Engestrom [Mon, 4 Dec 2017 14:30:13 +0000 (14:30 +0000)]
i915: add missing 0 defines

Thanks to Emil's -Wundef, t_dd_dmatmp.h now complains that intel_render.c
is missing a couple `#define`s.

Assigning them to 0 keeps the existing behaviour; I'll let someone else
turn them on if this is the behaviour that was intended.
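
A small sketch of why defining to 0 keeps behaviour unchanged (not the
actual defines from this patch; HAVE_EXAMPLE_PATH is a hypothetical
name): an undefined identifier in an #if already evaluates to 0, and
-Wundef only turns that silent default into a warning.

```c
#include <assert.h>

/* Hypothetical macro; without this line "#if HAVE_EXAMPLE_PATH" would
 * still evaluate to 0, but -Wundef would warn about it. */
#define HAVE_EXAMPLE_PATH 0

static int
example_path_enabled(void)
{
#if HAVE_EXAMPLE_PATH
   return 1;   /* code path that stays compiled out */
#else
   return 0;   /* same result as when the macro was undefined */
#endif
}
```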

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  mesa: refuse to compile SPIR-V shaders or link mixed shaders
Nicolai Hähnle [Sat, 10 Jun 2017 17:57:18 +0000 (19:57 +0200)]
mesa: refuse to compile SPIR-V shaders or link mixed shaders

Note that gl_shader::CompileStatus will also indicate whether a shader
has been successfully specialized.

v2: Use the 'spirv_data' member of gl_shader to know if it is a SPIR-V
   shader, instead of a dedicated flag. (Timothy Arceri)

v3: Use bool instead of GLboolean. (Ian Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  mesa/shaderapi: add a getter for GL_SPIR_V_BINARY_ARB
Nicolai Hähnle [Sat, 10 Jun 2017 17:46:58 +0000 (19:46 +0200)]
mesa/shaderapi: add a getter for GL_SPIR_V_BINARY_ARB

v2: Use the 'spirv_data' member of gl_shader instead of a
   dedicated flag. (Timothy Arceri)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  mesa: implement SPIR-V loading in glShaderBinary
Nicolai Hähnle [Sat, 10 Jun 2017 18:35:21 +0000 (20:35 +0200)]
mesa: implement SPIR-V loading in glShaderBinary

v2: * Add a gl_shader_spirv_data member to gl_shader, which already
   encapsulates a gl_spirv_module where the binary will be saved.
   (Eduardo Lima)

    * Just use the 'spirv_data' member to know whether a gl_shader has
   the SPIR_V_BINARY_ARB state. (Timothy Arceri)

    * Remove redundant argument checks. Move extension presence check
   to API entry point where the rest of checks are. Retype 'n' and
   'length' arguments to use the correct and more standard types.
   (Ian Romanick)

    * Fix some nitpicks. (Ian Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  mesa/glspirv: Add struct gl_shader_spirv_data
Eduardo Lima Mitev [Mon, 13 Nov 2017 12:57:46 +0000 (13:57 +0100)]
mesa/glspirv: Add struct gl_shader_spirv_data

This is a per-shader structure holding the SPIR-V data associated with the
shader (binary module, specialization constants and entry-point).

This is needed because both gl_shader and gl_linked_shader need to share this
data. Instead of copying the data, we pass a reference to it upon program
linking. That's why it is reference-counted.

This struct is created and associated with the shader upon calling
glShaderBinary(), then subsequently filled up by the call to
glSpecializeShaderARB().
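
A minimal sketch of the sharing scheme described above, assuming a
plain integer reference count. The struct and function names are
hypothetical and do not match the real Mesa helpers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-shader SPIR-V data. */
struct shader_data {
   int refcount;
   unsigned payload;   /* placeholder for module, constants, entry point */
};

/* Create the data with one reference, held by the shader object. */
static struct shader_data *
shader_data_create(unsigned payload)
{
   struct shader_data *d = calloc(1, sizeof(*d));
   assert(d != NULL);
   d->refcount = 1;
   d->payload = payload;
   return d;
}

/* Take another reference, e.g. for the linked shader at link time. */
static struct shader_data *
shader_data_ref(struct shader_data *d)
{
   d->refcount++;
   return d;
}

/* Drop a reference; returns 1 when the last holder released the data. */
static int
shader_data_unref(struct shader_data *d)
{
   if (--d->refcount == 0) {
      free(d);
      return 1;
   }
   return 0;
}
```

Both holders see the same object, and it is freed only when the second
one lets go, so nothing has to be copied at link time.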

v2: Readability improvements (Ian Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  mesa/glspirv: Add struct gl_spirv_module
Nicolai Hähnle [Sat, 10 Jun 2017 18:14:44 +0000 (20:14 +0200)]
mesa/glspirv: Add struct gl_spirv_module

v2: * Make the SPIR-V module struct part of a larger gl_shader_spirv_data
    struct that will be introduced later, and don't reference it directly
    in gl_shader. (Eduardo Lima)
    * Readability improvements (Ian Romanick)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  mesa: add GL_ARB_gl_spirv boilerplate
Nicolai Hähnle [Sat, 10 Jun 2017 17:39:02 +0000 (19:39 +0200)]
mesa: add GL_ARB_gl_spirv boilerplate

v2: * Add meson build bits (Eric Engestrom)
    * Return INVALID_OPERATION error on SpecializeShaderARB (Ian Romanick)

v3: Include boilerplate for the GL 4.6 alias of glSpecializeShaderARB
   (Neil Roberts)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years ago  spirv: Add support for all bit sizes in OpSwitch
Jason Ekstrand [Wed, 6 Dec 2017 18:01:22 +0000 (10:01 -0800)]
spirv: Add support for all bit sizes in OpSwitch

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101560

6 years ago  spirv: Restructure the case loop in OpSwitch handling
Jason Ekstrand [Wed, 6 Dec 2017 18:09:28 +0000 (10:09 -0800)]
spirv: Restructure the case loop in OpSwitch handling

Instead of calling vtn_add_case for the default case and then looping,
add an is_default variable and do everything inside the loop.  This will
make the next commit easier.
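
A rough sketch of the loop shape, not the actual vtn code. It assumes
the operand words after the selector are laid out as a default label
followed by (literal, label) pairs with one-word literals; the function
and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Walks [default_label, lit0, label0, lit1, label1, ...] and stores the
 * target labels in order. Returns the number of targets found. */
static unsigned
collect_switch_targets(const uint32_t *words, unsigned count,
                       uint32_t *labels, unsigned max_labels)
{
   unsigned n = 0;
   bool is_default = true;

   for (const uint32_t *w = words; w < words + count;) {
      uint32_t label;

      if (is_default) {
         label = w[0];              /* default case: label only */
         w += 1;
      } else {
         assert(w + 1 < words + count);
         label = w[1];              /* w[0] is the case literal */
         w += 2;
      }

      assert(n < max_labels);
      labels[n++] = label;          /* one code path for every case */
      is_default = false;
   }

   return n;
}
```

Only the operand decoding differs between the default and the literal
cases; everything after it is shared.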

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  spirv: Add better parameter validation for vector and matrix types
Jason Ekstrand [Wed, 6 Dec 2017 17:35:10 +0000 (09:35 -0800)]
spirv: Add better parameter validation for vector and matrix types

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  spirv: Add type validation for OpSelect
Jason Ekstrand [Wed, 6 Dec 2017 17:14:20 +0000 (09:14 -0800)]
spirv: Add type validation for OpSelect

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  spirv: Add basic type validation for OpLoad, OpStore, and OpCopyMemory
Jason Ekstrand [Wed, 6 Dec 2017 06:51:53 +0000 (22:51 -0800)]
spirv: Add basic type validation for OpLoad, OpStore, and OpCopyMemory

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
6 years ago  spirv: Add a prepass to set types on vtn_values
Jason Ekstrand [Wed, 6 Dec 2017 06:31:02 +0000 (22:31 -0800)]
spirv: Add a prepass to set types on vtn_values

This autogenerated pass will automatically find and set the type field
on all vtn_values.  This way we always have the type and can use it for
validation and other checks.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  spirv: Add a vtn_type field to all vtn_values
Jason Ekstrand [Wed, 6 Dec 2017 05:39:51 +0000 (21:39 -0800)]
spirv: Add a vtn_type field to all vtn_values

At the moment, this just lets us drop the const_type for constants and
unify things a bit.  Eventually, we will use this to store the types of
all SPIR-V SSA values.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  anv: fix bug when using component qualifier in FS outputs
Samuel Iglesias Gonsálvez [Tue, 31 Oct 2017 10:47:57 +0000 (11:47 +0100)]
anv: fix bug when using component qualifier in FS outputs

We can write to the same output but in different components, like
in this example:

layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0;
layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1;

Therefore, they are not two different outputs but only one.

Fixes:

dEQP-VK.glsl.440.linkage.varying.component.frag_out.*

v3:
- Remove FRAG_RESULT_MAX.
- Add const and use sizeof (Ian).
- Do three passes to properly set the locations of fragment
  outputs when having arrays (Jason).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago  st/mesa: swizzle argument when there's a vector size mismatch
Ilia Mirkin [Sat, 2 Dec 2017 16:20:46 +0000 (11:20 -0500)]
st/mesa: swizzle argument when there's a vector size mismatch

GLSL IR operation arguments can sometimes have an implicit swizzle as a
result of a vector arg and a scalar arg, where the scalar argument is
implicitly expanded to the size of the vector argument.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103955
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years ago  gallivm: fix texture wrapping for texture gather for mirror modes
Roland Scheidegger [Tue, 12 Dec 2017 03:22:28 +0000 (04:22 +0100)]
gallivm: fix texture wrapping for texture gather for mirror modes

Care must be taken that all coords end up correct; the tests are very
sensitive to everything being correctly rounded. This doesn't matter
for bilinear filter (since picking a wrong texel with weight zero is
ok), and we could also switch the per-sample coords mistakenly.
While here, also optimize the coord_mirror helper a bit (we can do the
mirroring directly by exploiting float rounding, no need for fixing up
odd/even manually).
I did not touch the mirror_clamp and mirror_clamp_to_border modes.
In contrast to mirror_clamp_to_edge and mirror_repeat these are legacy
modes. They are specified against old gl rules, which actually do
the mirroring not per sample (so you get swapped order if the coord
is in the mirrored section). I think the idea though is that they should
follow the respecified mirror_clamp_to_edge rules so the order would be
correct.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
6 years ago  spirv: Allow ignoring decorations for workgroup variables
Jason Ekstrand [Mon, 11 Dec 2017 23:31:22 +0000 (15:31 -0800)]
spirv: Allow ignoring decorations for workgroup variables

Since we switched over to lowering SLM access directly in SPIR-V -> NIR,
we no longer have vtn_variables for SLM.  It's all safe as with UBOs and
SSBOs but we need to let it through in the assert.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104213
Fixes: 8761a04d0d9332d9c0c99164faf855fc3c741f7c
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years ago  spirv: Set lengths on scalar and vector types
Jason Ekstrand [Wed, 6 Dec 2017 17:13:29 +0000 (09:13 -0800)]
spirv: Set lengths on scalar and vector types

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years ago  ac/nir: Support vulkan_resource_reindex.
Bas Nieuwenhuizen [Sun, 10 Dec 2017 22:31:45 +0000 (23:31 +0100)]
ac/nir: Support vulkan_resource_reindex.

Fixes: 93b4cb61eb2 "spirv: Allow OpPtrAccessChain for block indices"
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years ago  ac/nir: Don't load the descriptor in vulkan_resource_index.
Bas Nieuwenhuizen [Sun, 10 Dec 2017 22:18:32 +0000 (23:18 +0100)]
ac/nir: Don't load the descriptor in vulkan_resource_index.

To support the reindex intrinsic, we need the result to be
something on which we can adjust the index/address.

Since it is all within a basic block, the compiler should be
able to merge any extra loads.

v2: Change visit_get_buffer_size too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years ago  winsys/amdgpu: disable local BOs again due to worse performance
Marek Olšák [Mon, 11 Dec 2017 15:29:40 +0000 (16:29 +0100)]
winsys/amdgpu: disable local BOs again due to worse performance

Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years ago  drirc: whitelist glthread for Mount and Blade Warband again
Marek Olšák [Wed, 26 Jul 2017 13:21:45 +0000 (15:21 +0200)]
drirc: whitelist glthread for Mount and Blade Warband again

6 years ago  radv: Don't use local BOs when allocating with export options.
Bas Nieuwenhuizen [Sun, 10 Dec 2017 14:34:54 +0000 (15:34 +0100)]
radv: Don't use local BOs when allocating with export options.

If the app does not plan to put a buffer or image in it
(why? But it is allowed and CTS does it), they do not need to
allocate it with the dedicated allocation struct.

Fixes: a639d40f133 "radv: add support for local bos. (v3)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years ago  spirv: Fix loading an entire block at once.
Bas Nieuwenhuizen [Sun, 3 Dec 2017 14:35:39 +0000 (15:35 +0100)]
spirv: Fix loading an entire block at once.

There is no chain, so  checking the length ends with a SEGFAULT.
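
A sketch of the kind of guard this implies, not the actual fix: treat a
missing chain as a chain of length zero instead of dereferencing it.
The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an access chain. */
struct access_chain {
   unsigned length;
};

/* A whole-block load has no chain at all, so check the pointer before
 * reading the length. */
static unsigned
chain_length_or_zero(const struct access_chain *chain)
{
   return chain != NULL ? chain->length : 0;
}
```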

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103579
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years ago  anv: Enable UBO pushing
Jason Ekstrand [Sat, 2 Dec 2017 00:10:48 +0000 (16:10 -0800)]
anv: Enable UBO pushing

Push constants on Intel hardware are significantly more performant than
pull constants.  Since most Vulkan applications don't actively use push
constants, or at least don't use them heavily, we're pulling way
more than we should be.  By enabling pushing chunks of UBOs we can get
rid of a lot of those pulls.

On my SKL GT4e, this improves the performance of Dota 2 and Talos by
around 2.5% and improves Aztec Ruins by around 2%.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years ago  i965/fs: Handle !supports_pull_constants and push UBOs properly
Jason Ekstrand [Sun, 3 Dec 2017 06:34:47 +0000 (22:34 -0800)]
i965/fs: Handle !supports_pull_constants and push UBOs properly

In Vulkan, we don't support classic pull constants and everything the
client asks us to push, we push.  However, for pushed UBOs, we still
want to fall back to conventional pulls if we run out of space.

6 years ago  anv/device: Increase the UBO alignment requirement to 32
Jason Ekstrand [Sat, 2 Dec 2017 00:07:23 +0000 (16:07 -0800)]
anv/device: Increase the UBO alignment requirement to 32

Push constants work in terms of 32-byte chunks so if we want to be able
to push UBOs, everything needs to be 32-byte aligned.  Currently, we
only require 16-byte alignment, which is too small.
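
To make the arithmetic concrete, a small sketch (not driver code; the
helper names are made up for the example): an offset that satisfies a
16-byte requirement can still fall in the middle of a 32-byte push
chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round value up to a power-of-two alignment. */
static uint32_t
align_up(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* True when the offset starts exactly on a 32-byte push chunk. */
static bool
starts_on_push_chunk(uint32_t offset)
{
   return offset % 32 == 0;
}
```

For example, offset 16 is fine under the old requirement but does not
start on a chunk boundary; rounding it up gives 32.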

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years ago  anv/cmd_buffer: Add support for pushing UBO ranges
Jason Ekstrand [Fri, 1 Dec 2017 22:28:46 +0000 (14:28 -0800)]
anv/cmd_buffer: Add support for pushing UBO ranges

In order to do this we have to modify push constant set up to handle
ranges.  We also have to tweak the way we handle dirty bits a bit so
that we re-push whenever a descriptor set changes.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years ago  anv/cmd_buffer: Add some stage asserts
Jason Ekstrand [Fri, 1 Dec 2017 22:43:25 +0000 (14:43 -0800)]
anv/cmd_buffer: Add some stage asserts

There are several places where we look up opcodes in an array of stages.
Assert that we don't end up going out-of-bounds.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years ago  anv/cmd_buffer: Add some helpers for working with descriptor sets
Jason Ekstrand [Fri, 1 Dec 2017 12:25:05 +0000 (04:25 -0800)]
anv/cmd_buffer: Add some helpers for working with descriptor sets

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years ago  anv/pipeline: Translate vulkan_resource_index to a constant when possible
Jason Ekstrand [Fri, 1 Dec 2017 11:18:51 +0000 (03:18 -0800)]
anv/pipeline: Translate vulkan_resource_index to a constant when possible

We want to call brw_nir_analyze_ubo_ranges immediately after
anv_nir_apply_pipeline_layout and it badly wants constants.  We could
run an optimization step and let constant folding do it but that's way
more expensive than needed.  It's really easy to just handle constants
in apply_pipeline_layout.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years ago  i965/fs: Rewrite assign_constant_locations
Jason Ekstrand [Sun, 3 Dec 2017 06:32:59 +0000 (22:32 -0800)]
i965/fs: Rewrite assign_constant_locations

This rewires the logic for assigning uniform locations to work in terms
of "complex alignments".  The basic idea is that, as we walk the list of
instructions, we keep track of the alignment and continuity requirements
of each slot and assert that the alignments all match up.  We then use
those alignments in the compaction stage to ensure that everything gets
placed at a properly aligned register.  The old mechanism handled
alignments by special-casing each of the bit sizes and placing 64-bit
values first followed by 32-bit values.

The old scheme had the advantage of never leaving a hole since all the
64-bit values could be tightly packed and so could the 32-bit values.
However, the new scheme has no type size special cases so it handles not
only 32 and 64-bit types but should gracefully extend to 16 and 8-bit
types as the need arises.
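
A simplified sketch of alignment-driven placement, to show the
trade-off described above. This is not the i965 implementation; the
struct, the byte units and the in-order placement are assumptions for
the example.

```c
#include <assert.h>

/* Hypothetical description of one uniform slot, in bytes. */
struct slot {
   unsigned size;
   unsigned align;    /* power of two */
   unsigned offset;   /* filled in by assign_offsets() */
};

/* Place each slot at the next offset that satisfies its alignment.
 * No bit size is special-cased, but padding holes can appear.
 * Returns the total space used. */
static unsigned
assign_offsets(struct slot *slots, unsigned count)
{
   unsigned next = 0;

   for (unsigned i = 0; i < count; i++) {
      unsigned a = slots[i].align;
      assert(a != 0 && (a & (a - 1)) == 0);

      next = (next + a - 1) & ~(a - 1);   /* round up to alignment */
      slots[i].offset = next;
      next += slots[i].size;
   }

   return next;
}
```

Placing a 4-byte, an 8-byte and another 4-byte value in that order
leaves a 4-byte hole after the first one, which a scheme that sorts
64-bit values first would avoid.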

Tested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
6 years ago  anv: Disable VK_KHR_16bit_storage
Jason Ekstrand [Fri, 8 Dec 2017 23:39:00 +0000 (15:39 -0800)]
anv: Disable VK_KHR_16bit_storage

The testing for this extension is currently very poor.  The CTS tests
only test accessing UBOs and SSBOs at dynamic offsets so none of our
constant-offset paths get triggered at all.  Also, there's an assertion
in our handling of nir_intrinsic_load_uniform that offset % 4 == 0 which
is never triggered indicating that nothing every gets loaded from an
offset which is not a dword.  Both push constants and the constant
offset pull paths are complex enough, we really don't want to ship
without tests.  We'll turn the extension back on once we have decent
tests.

6 years ago  radeon/vce: move destroy command before feedback command
Leo Liu [Thu, 7 Dec 2017 17:04:59 +0000 (12:04 -0500)]
radeon/vce: move destroy command before feedback command

VCE processes IBs starting from the session and task info at the first
level, with other commands processed subsequently. The task info for
destroy is embedded in the destroy command, so the feedback command is not
properly processed. This causes the kernel to emit VM fault messages on
Polaris and Vega10 cards when an encode application finishes running.

The fix is also verified on a VCE physical mode card.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Christian König <christian.koenig@amd.com>
6 years ago  docs/llvmpipe: document ppc64le as alternative architecture to x86.
Ben Crocker [Mon, 27 Nov 2017 19:44:58 +0000 (14:44 -0500)]
docs/llvmpipe: document ppc64le as alternative architecture to x86.

Power8, Power8NV, and Power9 are supported on an equal footing
with X86.

Cc: "17.2" "17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
[Eric: changed formatting, reworded a bit (with Ben's ack)]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
6 years ago  docs/release-calendar: drop 17.3.0 from the table
Emil Velikov [Fri, 8 Dec 2017 13:59:27 +0000 (13:59 +0000)]
docs/release-calendar: drop 17.3.0 from the table

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago  docs: add news item and link release notes for 17.3.0
Emil Velikov [Fri, 8 Dec 2017 13:56:01 +0000 (13:56 +0000)]
docs: add news item and link release notes for 17.3.0

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
6 years ago  docs: add sha256 checksums for 17.3.0
Emil Velikov [Fri, 8 Dec 2017 13:53:30 +0000 (13:53 +0000)]
docs: add sha256 checksums for 17.3.0

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 49a612d1580b3316392273a069d20d93967126a8)

6 years ago  docs: Update 17.3.0 release notes
Emil Velikov [Fri, 8 Dec 2017 13:47:33 +0000 (13:47 +0000)]
docs: Update 17.3.0 release notes

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 8d55da9f579463038f4305ed7d505aa7fffa0f37)

6 years ago  radv: do not print ASM to stderr when dumping shaders
Samuel Pitoiset [Fri, 1 Dec 2017 15:15:40 +0000 (16:15 +0100)]
radv: do not print ASM to stderr when dumping shaders

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years ago  radv/winsys: implement query_value()
Samuel Pitoiset [Wed, 6 Dec 2017 11:06:43 +0000 (12:06 +0100)]
radv/winsys: implement query_value()

Might be useful to know the VRAM/GTT usage, the number of VRAM
CPU page faults, etc. Nothing is currently using that new
interface, but it's a first step.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years ago  radv: remove useless check radv_set_dcc_need_cmask_elim_pred()
Samuel Pitoiset [Wed, 6 Dec 2017 16:49:37 +0000 (17:49 +0100)]
radv: remove useless check radv_set_dcc_need_cmask_elim_pred()

emit_fast_color_clear() already checks that.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years ago  radv: remove useless checks in radv_set_{color,depth}_clear_regs()
Samuel Pitoiset [Wed, 6 Dec 2017 16:49:36 +0000 (17:49 +0100)]
radv: remove useless checks in radv_set_{color,depth}_clear_regs()

Already checked by the respective callers.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years ago  radv: only re-emit the index type when it changes
Samuel Pitoiset [Wed, 6 Dec 2017 16:49:20 +0000 (17:49 +0100)]
radv: only re-emit the index type when it changes

dota2 binds a ton of index buffers but the type is always 16-bit.
Note that we have to invalidate the type when switching from
indexed draws to normal draws.
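
The caching pattern can be sketched as follows. This is an illustration
only, not the radv code; the struct, the -1 sentinel and the emit
counter are assumptions for the example.

```c
#include <assert.h>

/* Hypothetical per-command-buffer state. */
struct draw_state {
   int last_index_type;   /* -1 means "unknown, must emit" */
   unsigned emit_count;   /* stands in for packets written */
};

static void
draw_state_init(struct draw_state *st)
{
   st->last_index_type = -1;
   st->emit_count = 0;
}

/* Emit the index type only when it differs from the cached one. */
static void
draw_indexed(struct draw_state *st, int index_type)
{
   assert(index_type >= 0);
   if (st->last_index_type != index_type) {
      st->emit_count++;
      st->last_index_type = index_type;
   }
}

/* A non-indexed draw invalidates the cached type, so the next indexed
 * draw emits it again. */
static void
draw_non_indexed(struct draw_state *st)
{
   st->last_index_type = -1;
}
```

Binding many index buffers of the same type then costs a single emit
until a non-indexed draw or a different type comes along.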

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
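
The caching idea in the index-type commit above can be illustrated with a
minimal sketch. All names here (cmd_state, draw_indexed, draw, emit_count)
are hypothetical stand-ins and not the actual radv structures or functions;
the counter only simulates the register write that the driver would skip.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-command-buffer state; not radv code. */
struct cmd_state {
    int32_t last_index_type;   /* -1 means "unknown, must be emitted again" */
    unsigned emit_count;       /* counts simulated index-type emissions */
};

void cmd_state_init(struct cmd_state *s)
{
    s->last_index_type = -1;
    s->emit_count = 0;
}

/* Indexed draw: emit the index type only if it differs from the cached one. */
void draw_indexed(struct cmd_state *s, int32_t index_type)
{
    if (s->last_index_type != index_type) {
        s->emit_count++;                 /* stands in for the real emission */
        s->last_index_type = index_type;
    }
}

/* Non-indexed draw: invalidate the cached type, as the commit message
 * requires, so the next indexed draw emits it again. */
void draw(struct cmd_state *s)
{
    s->last_index_type = -1;
}
```

With this scheme, binding many index buffers of the same type costs one
emission, while an intervening non-indexed draw forces one more.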
6 years agoradv: only reset command buffers that are not in the initial state
Samuel Pitoiset [Wed, 6 Dec 2017 16:48:41 +0000 (17:48 +0100)]
radv: only reset command buffers that are not in the initial state

dota2 always calls vkResetCommandBuffer() before
vkBeginCommandBuffer() which is quite useless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
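
A rough sketch of the status-based reset described above, under stated
assumptions: the enum values other than INVALID are modelled on the Vulkan
command buffer lifecycle, and every identifier (cmd_buffer, cmd_buffer_begin,
reset_count, ...) is hypothetical rather than taken from radv.

```c
/* Hypothetical status values; only an INVALID status is named in the log. */
enum cmd_buffer_status {
    CMD_BUFFER_STATUS_INITIAL,
    CMD_BUFFER_STATUS_RECORDING,
    CMD_BUFFER_STATUS_EXECUTABLE,
    CMD_BUFFER_STATUS_INVALID      /* declared but unused in this sketch */
};

struct cmd_buffer {
    enum cmd_buffer_status status;
    unsigned reset_count;          /* counts the (expensive) reset work */
};

void cmd_buffer_init(struct cmd_buffer *cb)
{
    cb->status = CMD_BUFFER_STATUS_INITIAL;
    cb->reset_count = 0;
}

/* Explicit reset requested by the application. */
void cmd_buffer_reset(struct cmd_buffer *cb)
{
    cb->reset_count++;             /* stands in for releasing resources */
    cb->status = CMD_BUFFER_STATUS_INITIAL;
}

/* Begin recording: reset implicitly only when the buffer is not already
 * in the initial state, so a preceding explicit reset is not repeated. */
void cmd_buffer_begin(struct cmd_buffer *cb)
{
    if (cb->status != CMD_BUFFER_STATUS_INITIAL)
        cmd_buffer_reset(cb);
    cb->status = CMD_BUFFER_STATUS_RECORDING;
}

void cmd_buffer_end(struct cmd_buffer *cb)
{
    cb->status = CMD_BUFFER_STATUS_EXECUTABLE;
}
```

An application that always calls reset followed by begin then pays for one
reset per cycle instead of two.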
6 years agoradv: track different status of a command buffer
Samuel Pitoiset [Wed, 6 Dec 2017 16:48:40 +0000 (17:48 +0100)]
radv: track different status of a command buffer

RADV_CMD_BUFFER_STATUS_INVALID is not used for now, but I think
it makes sense to declare it. Could be used later with better
command buffer error handling.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradv: fix TC-compat HTILE with VK_FORMAT_D32_SFLOAT_S8_UINT on Vega
Samuel Pitoiset [Thu, 7 Dec 2017 10:39:46 +0000 (11:39 +0100)]
radv: fix TC-compat HTILE with VK_FORMAT_D32_SFLOAT_S8_UINT on Vega

Copied from RadeonSI.

This fixes all CTS
dEQP-VK.renderpass.dedicated_allocation.formats.d32_sfloat_s8_uint.clear.*

And some other ones which use the same format.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agodocs: Update GL_ARB_get_program_binary docs to support 1 format
Jordan Justen [Mon, 20 Nov 2017 21:42:33 +0000 (13:42 -0800)]
docs: Update GL_ARB_get_program_binary docs to support 1 format

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
6 years agoi965: Add ARB_get_program_binary support using nir_serialization
Jordan Justen [Sat, 4 Nov 2017 23:53:15 +0000 (16:53 -0700)]
i965: Add ARB_get_program_binary support using nir_serialization

This resolves an apparent game bug described in 85564. The game
doesn't properly handle ARB_get_program_binary with 0 supported
formats.

V2 (Timothy Arceri):
 - less driver code as more has been moved into the common helpers.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85564
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agomain: Clear shader program data whenever ProgramBinary is called
Jordan Justen [Tue, 7 Nov 2017 10:11:28 +0000 (02:11 -0800)]
main: Clear shader program data whenever ProgramBinary is called

The GL_ARB_get_program_binary extension spec says:

 "If ProgramBinary fails to load a binary, no error is generated, but
  any information about a previous link or load of that program object
  is lost."

v2:
 * Re-initialize shProg->data after clear. (Jordan)
   (Required after 6a72eba755fea15a0d97abb913a6315d9d32e274)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agomain: add binary support to ProgramBinary
Jordan Justen [Sat, 4 Nov 2017 23:47:54 +0000 (16:47 -0700)]
main: add binary support to ProgramBinary

V2: call generic _mesa_program_binary() helper rather than driver
    function directly to allow greater code sharing.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agomain: add binary support to GetProgramBinary
Jordan Justen [Sat, 4 Nov 2017 23:47:25 +0000 (16:47 -0700)]
main: add binary support to GetProgramBinary

V2: call generic _mesa_get_program_binary() helper rather than driver
    function directly to allow greater code sharing.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agomain: Support getting GL_PROGRAM_BINARY_LENGTH
Jordan Justen [Sat, 4 Nov 2017 23:43:21 +0000 (16:43 -0700)]
main: Support getting GL_PROGRAM_BINARY_LENGTH

V2: call generic _mesa_get_program_binary_length() helper
    rather than driver function directly to allow greater
    code sharing.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agomesa: Add Mesa ARB_get_program_binary helper functions
Jordan Justen [Sat, 4 Nov 2017 23:52:14 +0000 (16:52 -0700)]
mesa: Add Mesa ARB_get_program_binary helper functions

V2 (Timothy Arceri):
 - add extra code comment
 - stop passing around void *binary and just pass
   program_binary_header *hdr instead.
 - move to src/mesa/main rather than src/util

V3 (Timothy Arceri):
 - Move more code out of the backend and into the common
   helpers.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agomesa: add driver callbacks for serialising ProgramBinary blobs
Timothy Arceri [Tue, 28 Nov 2017 03:27:51 +0000 (14:27 +1100)]
mesa: add driver callbacks for serialising ProgramBinary blobs

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years agomain: Support 1 Mesa format with get for GL_PROGRAM_BINARY_FORMATS
Jordan Justen [Tue, 7 Nov 2017 08:21:33 +0000 (00:21 -0800)]
main: Support 1 Mesa format with get for GL_PROGRAM_BINARY_FORMATS

Mesa supports either 0 or 1 formats. If 1 format is supported, it is
GL_PROGRAM_BINARY_FORMAT_MESA as defined in the
GL_MESA_program_binary_formats extension spec.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agomain: Allow non-zero NUM_PROGRAM_BINARY_FORMATS
Jordan Justen [Sat, 4 Nov 2017 23:39:08 +0000 (16:39 -0700)]
main: Allow non-zero NUM_PROGRAM_BINARY_FORMATS

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoi965: Fix memory leak when serializing nir
Jordan Justen [Sat, 4 Nov 2017 00:18:32 +0000 (17:18 -0700)]
i965: Fix memory leak when serializing nir

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoi965: Add brw_program_serialize_nir
Jordan Justen [Fri, 3 Nov 2017 23:57:42 +0000 (16:57 -0700)]
i965: Add brw_program_serialize_nir

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoi965: Free serialized nir after deserializing
Jordan Justen [Fri, 3 Nov 2017 23:45:46 +0000 (16:45 -0700)]
i965: Free serialized nir after deserializing

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoi965: Add brw_program_deserialize_nir
Jordan Justen [Fri, 3 Nov 2017 23:40:17 +0000 (16:40 -0700)]
i965: Add brw_program_deserialize_nir

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agomain, glsl: Add UniformDataDefaults which stores uniform defaults
Jordan Justen [Mon, 30 Oct 2017 18:16:48 +0000 (11:16 -0700)]
main, glsl: Add UniformDataDefaults which stores uniform defaults

The ARB_get_program_binary extension requires that uniform values in a
program be restored to their initial value just after linking.

This patch saves off the initial values just after linking. When the
program is restored by glProgramBinary, we can use this to copy the
initial value of uniforms into UniformDataSlots.

V2 (Timothy Arceri):
 - Store UniformDataDefaults only when serializing GLSL as this
   is what we want for both disk cache and ARB_get_program_binary.
   This saves us having to come back later and reset the Uniforms
   on program binary restores.

Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
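
The save-and-restore idea from the UniformDataDefaults commit above might
look roughly like the following. This is only a sketch: the struct and
function names are hypothetical, plain floats stand in for Mesa's uniform
storage, and it snapshots right after "linking" as in the original
description rather than at serialization time as in V2.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical program object; not Mesa's gl_shader_program. */
struct program {
    unsigned num_slots;   /* assumed to be greater than zero */
    float *slots;         /* current uniform values */
    float *defaults;      /* snapshot of the initial values, or NULL */
};

/* Take a snapshot of the uniform values (call right after linking).
 * Returns 0 on success, -1 if the allocation fails. */
int program_save_defaults(struct program *p)
{
    size_t size = p->num_slots * sizeof(*p->slots);
    float *copy = malloc(size);

    if (!copy)
        return -1;
    memcpy(copy, p->slots, size);
    free(p->defaults);    /* drop any older snapshot */
    p->defaults = copy;
    return 0;
}

/* Copy the snapshot back, e.g. when a program binary is loaded. */
void program_restore_defaults(struct program *p)
{
    if (p->defaults)
        memcpy(p->slots, p->defaults, p->num_slots * sizeof(*p->slots));
}
```

The caller owns the snapshot and is expected to free p->defaults when the
program object is destroyed.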
6 years agoglsl: Split out shader program serialization
Jordan Justen [Fri, 27 Oct 2017 08:04:53 +0000 (01:04 -0700)]
glsl: Split out shader program serialization

This will allow us to use the program serialization to implement
ARB_get_program_binary.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoinclude: Add GL_MESA_program_binary_formats to GL/GLES2 ext.h files
Jordan Justen [Tue, 7 Nov 2017 08:16:47 +0000 (00:16 -0800)]
include: Add GL_MESA_program_binary_formats to GL/GLES2 ext.h files

This was merged into the OpenGL Registry in version
667c5a253781834b40a6ae9eb19d05af4542cfe1.

Ref: https://github.com/KhronosGroup/OpenGL-Registry/pull/127
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agomesa: add GL_PROGRAM_BINARY_FORMAT_MESA enum
Jordan Justen [Tue, 28 Nov 2017 00:15:07 +0000 (11:15 +1100)]
mesa: add GL_PROGRAM_BINARY_FORMAT_MESA enum

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agointel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.
Francisco Jerez [Sat, 14 Oct 2017 00:52:00 +0000 (17:52 -0700)]
intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.

This addresses a long-standing back-end compiler bug that could lead
to cross-channel data corruption in loops executed non-uniformly.  In
some cases live variables extending through a loop divergence point
(e.g. a non-uniform break) into a convergence point (e.g. the end of
the loop) wouldn't be considered live along all physical control flow
paths the SIMD thread could possibly have taken in between due to some
channels remaining in the loop for additional iterations.

This patch fixes the problem by extending the CFG with physical edges
that don't exist in the idealized non-vectorized program, but
represent valid control flow paths the SIMD EU may take due to the
divergence of logical threads.  This makes sense because the i965 IR
is explicitly SIMD, and it's not uncommon for instructions to have an
influence on neighboring channels (e.g. a force_writemask_all header
setup), so the behavior of the SIMD thread as a whole needs to be
considered.

No changes in shader-db.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/fs: Don't let undefined values prevent copy propagation.
Francisco Jerez [Mon, 23 Oct 2017 20:47:10 +0000 (13:47 -0700)]
intel/fs: Don't let undefined values prevent copy propagation.

This makes the dataflow propagation logic of the copy propagation pass
more intelligent in cases where the destination of a copy is known to
be undefined for some incoming CFG edges, building upon the
definedness information provided by the last patch.  Helps a few
programs, and avoids a handful of shader-db regressions from the next
patch.

shader-db results on ILK:

  total instructions in shared programs: 6541547 -> 6541523 (-0.00%)
  instructions in affected programs: 360 -> 336 (-6.67%)
  helped: 8
  HURT: 0

  LOST:   0
  GAINED: 10

shader-db results on BDW:

  total instructions in shared programs: 8174323 -> 8173882 (-0.01%)
  instructions in affected programs: 7730 -> 7289 (-5.71%)
  helped: 5
  HURT: 2

  LOST:   0
  GAINED: 4

shader-db results on SKL:

  total instructions in shared programs: 8185669 -> 8184598 (-0.01%)
  instructions in affected programs: 10364 -> 9293 (-10.33%)
  helped: 5
  HURT: 2

  LOST:   0
  GAINED: 2

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agointel/fs: Restrict live intervals to the subset possibly reachable from any definition.
Francisco Jerez [Thu, 7 Sep 2017 07:26:03 +0000 (00:26 -0700)]
intel/fs: Restrict live intervals to the subset possibly reachable from any definition.

Currently the liveness analysis pass would extend a live interval up
to the top of the program when no unconditional and complete
definition of the variable is found that dominates all of its uses.

This can lead to a serious performance problem in shaders containing
many partial writes, like scalar arithmetic, FP64 and soon FP16
operations.  The number of oversize live intervals in such workloads
can cause the compilation time of the shader to explode because of the
worse than quadratic behavior of the register allocator and scheduler
when running out of registers, and it can also cause the running time
of the shader to explode due to the amount of spilling it leads to,
which is orders of magnitude slower than GRF memory.

This patch fixes it by computing the intersection of our current live
intervals with the subset of the program that can possibly be reached
from any definition of the variable.  Extending the storage allocation
of the variable beyond that is pretty useless because its value is
guaranteed to be undefined at a point that cannot be reached from any
definition.

According to Jason, this improves performance of the subgroup Vulkan
CTS tests significantly (e.g. the runtime of the dvec4 broadcast test
improves by nearly 50x).

No significant change in the running time of shader-db (with 5%
statistical significance).

shader-db results on IVB:

  total cycles in shared programs: 61108780 -> 60932856 (-0.29%)
  cycles in affected programs: 16335482 -> 16159558 (-1.08%)
  helped: 5121
  HURT: 4347

  total spills in shared programs: 1309 -> 1288 (-1.60%)
  spills in affected programs: 249 -> 228 (-8.43%)
  helped: 3
  HURT: 0

  total fills in shared programs: 1652 -> 1597 (-3.33%)
  fills in affected programs: 262 -> 207 (-20.99%)
  helped: 4
  HURT: 0

  LOST:   2
  GAINED: 209

shader-db results on BDW:

  total cycles in shared programs: 67617262 -> 67361220 (-0.38%)
  cycles in affected programs: 23397142 -> 23141100 (-1.09%)
  helped: 8045
  HURT: 6488

  total spills in shared programs: 1456 -> 1252 (-14.01%)
  spills in affected programs: 465 -> 261 (-43.87%)
  helped: 3
  HURT: 0

  total fills in shared programs: 1720 -> 1465 (-14.83%)
  fills in affected programs: 471 -> 216 (-54.14%)
  helped: 4
  HURT: 0

  LOST:   2
  GAINED: 162

shader-db results on SKL:

  total cycles in shared programs: 65436248 -> 65245186 (-0.29%)
  cycles in affected programs: 22560936 -> 22369874 (-0.85%)
  helped: 8457
  HURT: 6247

  total spills in shared programs: 437 -> 437 (0.00%)
  spills in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total fills in shared programs: 870 -> 854 (-1.84%)
  fills in affected programs: 16 -> 0
  helped: 1
  HURT: 0

  LOST:   0
  GAINED: 107

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
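
The intersection described in the live-interval commit above can be sketched
at basic-block granularity. This is a simplification made for illustration:
the real pass is not reproduced here, the bitmask CFG representation and all
names (cfg, reachable_from_defs, restrict_live) are hypothetical, and a block
containing a definition is treated as reachable in its entirety.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BLOCKS 32

/* Hypothetical CFG: succs[b] is a bitmask of the successors of block b. */
struct cfg {
    unsigned num_blocks;
    uint32_t succs[MAX_BLOCKS];
};

/* Blocks forward-reachable from any block that defines the variable
 * (including the defining blocks themselves), found by iterating to a
 * fixed point. */
uint32_t reachable_from_defs(const struct cfg *cfg, uint32_t def_blocks)
{
    uint32_t reached = def_blocks;
    bool progress;

    assert(cfg->num_blocks <= MAX_BLOCKS);
    do {
        progress = false;
        for (unsigned b = 0; b < cfg->num_blocks; b++) {
            if (!(reached & (UINT32_C(1) << b)))
                continue;
            uint32_t next = reached | cfg->succs[b];
            if (next != reached) {
                reached = next;
                progress = true;
            }
        }
    } while (progress);

    return reached;
}

/* Clip a conservative live set to the blocks where the variable can
 * actually hold a defined value. */
uint32_t restrict_live(const struct cfg *cfg, uint32_t live,
                       uint32_t def_blocks)
{
    return live & reachable_from_defs(cfg, def_blocks);
}
```

For a CFG 0 -> 1 -> 2 -> 3 with a back edge 2 -> 1 and a single partial
write in block 2, a live set that had been extended to the top of the program
(blocks 0 to 3) is clipped to blocks 1 to 3, since block 0 cannot be reached
from the definition.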
6 years agointel/fs: Teach instruction scheduler about GRF bank conflict cycles.
Francisco Jerez [Wed, 6 Dec 2017 19:42:54 +0000 (11:42 -0800)]
intel/fs: Teach instruction scheduler about GRF bank conflict cycles.

This should allow the post-RA scheduler to do a slightly better job at
hiding latency in the presence of instructions incurring bank conflicts.
The main purpose of this patch is not to improve performance though,
but to get conflict cycles to show up in shader-db statistics in order
to make sure that regressions in the bank conflict mitigation pass
don't go unnoticed.

Acked-by: Matt Turner <mattst88@gmail.com>
6 years agointel/fs: Implement GRF bank conflict mitigation pass.
Francisco Jerez [Thu, 15 Jun 2017 22:23:57 +0000 (15:23 -0700)]
intel/fs: Implement GRF bank conflict mitigation pass.

Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput.  This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation.  It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.

In a shader-db run on SKL this helps roughly 46k shaders:

   total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
   conflicts in affected programs: 816222 -> 407702 (-50.05%)
   helped: 46234
   HURT: 72

The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.

On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies.  E.g. for a shader-db run on SNB:

   total conflicts in shared programs: 944636 -> 623185 (-34.03%)
   conflicts in affected programs: 853258 -> 531807 (-37.67%)
   helped: 31052
   HURT: 19

And on BDW:

   total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
   conflicts in affected programs: 1179787 -> 748933 (-36.52%)
   helped: 47592
   HURT: 70

On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.

NOTE: This patch intentionally disregards some i965 coding conventions
      for the sake of reviewability.  This is addressed by the next
      squash patch which introduces an amount of (for the most part
      boring) boilerplate that might distract reviewers from the
      non-trivial algorithmic details of the pass.

The following patch is squashed in:

SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.

Acked-by: Matt Turner <mattst88@gmail.com>
6 years agomeson: Fix building gallium media targets with gallium-xlib glx
Dylan Baker [Tue, 5 Dec 2017 17:40:03 +0000 (09:40 -0800)]
meson: Fix building gallium media targets with gallium-xlib glx

To demonstrate this bug run meson with the options:
-Ddri-drivers= -Dglx=gallium-xlib

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>