git.libre-soc.org Git - mesa.git/log

nir: Allow to skip integer ops in nir_lower_to_source_mods

Some hardware supports source mods only for float operations. Make it
possible to skip lowering to source mods in these cases.

v2: use option flags instead of a boolean (Jason Ekstrand)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
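
The move from a boolean to option flags in v2 can be illustrated with a minimal sketch. The enum and function names below are hypothetical and are not taken from the NIR headers; the point is only that a bitmask lets a driver request float-only lowering, which a single boolean cannot express.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option flags; the real NIR names may differ. */
enum lower_source_mods_flags {
   LOWER_INT_SOURCE_MODS   = 1 << 0,
   LOWER_FLOAT_SOURCE_MODS = 1 << 1,
   LOWER_ALL_SOURCE_MODS   = LOWER_INT_SOURCE_MODS | LOWER_FLOAT_SOURCE_MODS,
};

/* Decide whether an ALU op of the given kind should be lowered to source mods. */
static inline bool
should_lower_source_mods(bool op_is_float, unsigned options)
{
   /* Reject bits that this sketch does not define. */
   assert((options & ~(unsigned)LOWER_ALL_SOURCE_MODS) == 0);

   unsigned wanted = op_is_float ? LOWER_FLOAT_SOURCE_MODS
                                 : LOWER_INT_SOURCE_MODS;
   return (options & wanted) != 0;
}
```

Hardware that supports source mods only for float operations would pass just the float flag in this sketch, so integer ops are left alone.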

nir/spirv: cast shift operand to u32

v2: fix for specialization constants as well

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Karol Herbst <kherbst@redhat.com>

nir: replace nir_load_system_value calls with appropriate builder functions

This helps reduce the overall code changes when a bit_size parameter is
added to nir_load_system_value.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>

nir: add const_index parameters to system value builder function

This allows replacing some nir_load_system_value calls with the specific
system value constructor.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>

radv: make use of nir_move_out_const_to_consumer()

vkpipeline-db results:

Totals from affected shaders:
SGPRS: 28400 -> 28576 (0.62 %)
VGPRS: 27916 -> 27692 (-0.80 %)
Spilled SGPRs: 140 -> 138 (-1.43 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1534456 -> 1520560 (-0.91 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 3541 -> 3582 (1.16 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

anv: move helper function internally

It's only used in anv_image.c

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv: use image aspects rather than computed ones

This shouldn't make any difference, but I feel uneasy using the
expanded aspects that do not represent the image in its entirety. If
we ever change the implementation of the anv_image_aspect_to_plane()
helper, this is safer.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv: associate vulkan formats with aspects

This will make it easier to associate an aspect with a plane number.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv/lower_ycbcr: make sure to set 0s on all components

To play around with debugging, we might want to disable one or the
other component. Having 0s as default values makes this work.
Otherwise we might have NULL components, leading to crashes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv/image: remove unused parameter

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv: simplify internal address offset

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

meson: fix wayland-less builds

Those empty variables in the !wayland case are useless and running that
meson.build with them breaks the build:

  [287/850] Generating wayland-drm-client-protocol.h with a custom command.
  FAILED: src/egl/wayland/wayland-drm/wayland-drm-client-protocol.h
  client-header ../src/egl/wayland/wayland-drm/wayland-drm.xml src/egl/wayland/wayland-drm/wayland-drm-client-protocol.h
  /bin/sh: client-header: command not found
  ninja: build stopped: subcommand failed.

Fixes: d1992255bb29054fa5176 "meson: Add build Intel "anv" vulkan driver"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

gbm: remove unnecessary meson include

`inc_wayland_drm` is only used if wayland is built, and it's already
added in that case a few lines below.

Fixes: a29869e8720b385d3692f "gbm: Don't traverse backwards for includes"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

meson: only run vulkan's meson.build when building vulkan

Fixes: d1992255bb29054fa5176 "meson: Add build Intel "anv" vulkan driver"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

xmlpool: update translation po files

These files are close to 4 years out of date; a lot's changed since.
Let's just check in a recently-regenerated version.

Changes generated by running `ninja xmlpool-{pot,update-po,gmo}`.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>

REVIEWERS: add Vulkan reviewer group

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>

REVIEWERS: add Emil as EGL reviewer

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>

REVIEWERS: add include path for EGL

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>

intel/genxml: Add engine definition to render engine instructions (gen11)

Instructions meant for the render engine now have a definition specifying that,
so that we can differentiate instructions meant for different engines that share
opcodes.

v2: Divided into individual patches for each gen

v3: Added additional engine definitions.

v4: Added missing engine definition to MI_TOPOLOGY_FILTER.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/genxml: Add engine definition to render engine instructions (gen10)

Instructions meant for the render engine now have a definition specifying that,
so that we can differentiate instructions meant for different engines that share
opcodes.

v2: Divided into individual patches for each gen

v3: Added additional engine definitions.

v4: Added missing engine definition to MI_TOPOLOGY_FILTER.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/genxml: Add engine definition to render engine instructions (gen9)

Instructions meant for the render engine now have a definition specifying that,
so that we can differentiate instructions meant for different engines that share
opcodes.

v2: Divided into individual patches for each gen

v3: Added additional engine definitions.

v4: Added more missing engine definitions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/genxml: Add engine definition to render engine instructions (gen8)

Instructions meant for the render engine now have a definition specifying that,
so that we can differentiate instructions meant for different engines that share
opcodes.

v2: Divided into individual patches for each gen

v3: Added additional engine definitions.

v4: Added missing engine tag for MI_TOPOLOGY_FILTER and MI_LOAD_URB_MEM.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/genxml: Add engine definition to render engine instructions (gen75)

Instructions meant for the render engine now have a definition specifying that,
so that we can differentiate instructions meant for different engines that share
opcodes.

v2: Divided into individual patches for each gen

v3: Added additional engine definitions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/genxml: Add engine definition to render engine instructions (gen7)

Instructions meant for the render engine now have a definition specifying that,
so that we can differentiate instructions meant for different engines that share
opcodes.

v2: Divided into individual patches for each gen

v3: Added additional engine definitions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/genxml: Add engine definition to render engine instructions (gen6)

Instructions meant for the render engine now have a definition specifying that,
so that we can differentiate instructions meant for different engines that share
opcodes.

v2: Divided into individual patches for each gen

v3: Added additional engine definitions

v4: Added missing engine to MEDIA_GATEWAY_STATE

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/genxml: Add engine definition to render engine instructions (gen5)

Instructions meant for the render engine now have a definition specifying that,
so that we can differentiate instructions meant for different engines that share
opcodes.

v2: Divided into individual patches for each gen

v3: Added additional engine definitions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/genxml: Add engine definition to render engine instructions (gen45)

Instructions meant for the render engine now have a definition specifying that,
so that we can differentiate instructions meant for different engines that share
opcodes.

v2: Divided into individual patches for each gen

v3: Added additional engine definitions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/genxml: Add engine definition to render engine instructions (gen4)

Instructions meant for the render engine now have a definition specifying that,
so that we can differentiate instructions meant for different engines that share
opcodes.

v2: Divided into individual patches for each gen

v3: Added additional engine definitions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/decoder: tools: Use engine for decoding batch instructions

The engine to which the batch was sent is now set in the decoder context when
decoding the batch. This is needed so that we can distinguish between
instructions, as the render and video pipes share some instruction opcodes.

v2: The engine is now in the decoder context and the batch decoder uses a local
function for finding the instruction for an engine.

v3: Spec uses engine_mask now instead of engine, replaced engine class enums
with the definitions from UAPI.

v4: Fix up aubinator_viewer (Lionel)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/decoder: tools: gen_engine to drm_i915_gem_engine_class

Removed the gen_engine enum and changed the involved functions to use the
drm_i915_gem_engine_class enum from UAPI instead.

v3: Wrong engine was being used for blocks in video ring

v4: Fixed aubinator_viewer.cpp

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/decoder: Engine parameter for instructions

Preliminary work for adding handling of different pipes to gen_decoder. Each
instruction needs to have a definition describing which engine it is meant for.
If left undefined, by default, the instruction is defined for all engines.

v2: Changed to use the engine class definitions from UAPI

v3: Changed I915_ENGINE_CLASS_TO_MASK to use BITSET_BIT, change engine to
engine_mask, added check for incorrect engine and added the possibility to
define an instruction to multiple engines using the "|" as a delimiter in the
engine attribute.

v4: Fixed the memory leak.

v5: Removed an unnecessary ralloc_free().

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
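
The "|"-delimited engine attribute described in v3 can be sketched as follows. This is an illustration under assumptions, not the gen_decoder code: the enum, the ENGINE_BIT macro, the engine names "render", "blitter" and "video", and the function parse_engine_mask are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical engine classes, numbered like a small class enum. */
enum engine_class { ENGINE_RENDER = 0, ENGINE_COPY = 1, ENGINE_VIDEO = 2 };

#define ENGINE_BIT(cls) (UINT32_C(1) << (cls))
#define ENGINE_ALL      UINT32_MAX

/* Turn an attribute such as "render|video" into a bitmask.
 * A missing attribute means "all engines"; an unknown name yields 0. */
static inline uint32_t
parse_engine_mask(const char *attr)
{
   if (attr == NULL || attr[0] == '\0')
      return ENGINE_ALL;

   char buf[64];
   size_t len = strlen(attr);
   assert(len < sizeof(buf)); /* sketch: long attributes are a caller bug */
   memcpy(buf, attr, len + 1);

   uint32_t mask = 0;
   for (char *tok = strtok(buf, "|"); tok != NULL; tok = strtok(NULL, "|")) {
      if (strcmp(tok, "render") == 0)
         mask |= ENGINE_BIT(ENGINE_RENDER);
      else if (strcmp(tok, "blitter") == 0)
         mask |= ENGINE_BIT(ENGINE_COPY);
      else if (strcmp(tok, "video") == 0)
         mask |= ENGINE_BIT(ENGINE_VIDEO);
      else
         return 0; /* incorrect engine name */
   }
   return mask;
}
```

A decoder could then test `mask & ENGINE_BIT(current_engine)` to decide whether an instruction definition applies to the batch being decoded.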

virgl: Add command and flags to initiate debugging on the host (v2)

On the host VREND_DEBUG=guestallow must be set to let the guest override
the debug flags.

v2: Send flag string instead of flags, this avoids the need to keep
the flags in sync.
v3: Only request host logging if the host actually understands the command

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>

mesa: Reference count shaders that are used by transform feedback objects

Transform feedback objects may hold a pointer to a shader program, and
at least in Gallium, this must be a valid pointer until
ctx->Driver.EndTransformFeedback in glEndTransformFeedback has been called.
This conforms to the spec, which says that any program that is part of the
current rendering state should only be flagged for deletion by glDeleteProgram.
This was not handled properly for transform feedback objects, so that
a call sequence

  glUseProgram(x)
  glBeginTransformFeedback(...)
  glPauseTransformFeedback(...)
  glDeleteProgram(x)
  glEndTransformFeedback(...)

would result in a use-after-free bug. With this patch the transform
feedback object also updates the reference count of the program it uses,
thereby keeping the program valid as long as the transform feedback
object links to it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108713
Fixes: 654587696b4234d09a6b471b70e9629cf2887c27
       mesa: add end_transform_feedback() helper

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
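
The reference-counting idea can be sketched in isolation. All types and functions below are hypothetical and much simpler than Mesa's; destruction is modelled by clearing an `alive` flag so the lifetime can be observed without touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical program object; real code would free it at refcount 0. */
struct program {
   int refcount;
   bool alive;
};

static inline void
program_ref(struct program *prog)
{
   assert(prog != NULL && prog->alive);
   prog->refcount++;
}

static inline void
program_unref(struct program *prog)
{
   assert(prog != NULL && prog->alive && prog->refcount > 0);
   if (--prog->refcount == 0)
      prog->alive = false; /* stands in for the actual destruction */
}

/* Hypothetical transform feedback object holding its own reference. */
struct xfb_object {
   struct program *program;
};

static inline void
xfb_begin(struct xfb_object *obj, struct program *current)
{
   program_ref(current); /* keep the program valid while feedback is active */
   obj->program = current;
}

static inline void
xfb_end(struct xfb_object *obj)
{
   program_unref(obj->program);
   obj->program = NULL;
}
```

In this sketch, dropping the application's reference between `xfb_begin` and `xfb_end` leaves the program alive, which mirrors the call sequence from the commit message.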

radv: set optimal OVERWRITE_COMBINER_WATERMARK on GFX9

Ported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: set PA.SC_CONSERVATIVE_RASTERIZATION.NULL_SQUAD_AA_MASK_ENABLE

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: binding streamout buffers doesn't change context regs

Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

nir: Don't lower the local work group size if it's variable.

If the local work group size is variable it won't be available
at compile time so we can't lower it in nir_lower_system_values().

Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

util/ralloc: Make sizeof(linear_header) a multiple of 8

Prior to this patch sizeof(linear_header) was 20 bytes in a
non-debug build on 32-bit platforms. We do some pointer arithmetic to
calculate the next available location with

   ptr = (linear_size_chunk *)((char *)&latest[1] + latest->offset);

in linear_alloc_child(). The &latest[1] adds 20 bytes, so an allocation
would only be 4-byte aligned.

On 32-bit SPARC a 'sttw' instruction (which stores a consecutive pair of
4-byte registers to memory) requires an 8-byte aligned address. Such an
instruction is used to store to an 8-byte integer type, like intmax_t
which is used in glcpp's expression_value_t struct.

As a result of the 4-byte alignment returned by linear_alloc_child() we
would generate a SIGBUS (unaligned exception) on SPARC.

According to the GNU libc manual malloc() always returns memory that has
at least an alignment of 8-bytes [1]. I think our allocator should do
the same.

So, simple fix with two parts:

   (1) Increase SUBALLOC_ALIGNMENT to 8 unconditionally.
   (2) Mark linear_header with an aligned attribute, which will cause
       its sizeof to be rounded up to that alignment. (We already do
       this for ralloc_header)

With this done, all Mesa's unit tests now pass on SPARC.

[1] https://www.gnu.org/software/libc/manual/html_node/Aligned-Memory-Blocks.html

Fixes: 47e17586924f ("glcpp: use the linear allocator for most objects")
Bug: https://bugs.gentoo.org/636326
Reviewed-by: Eric Anholt <eric@anholt.net>
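
The effect of part (2) can be checked with a small sketch. The structs below are hypothetical stand-ins with 20 bytes of fields, not the real linear_header, and the `aligned` attribute syntax assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUBALLOC_ALIGNMENT 8

/* Stand-in header with 20 bytes of 4-byte fields and no attribute. */
struct header_unaligned {
   uint32_t offset;
   uint32_t size;
   uint32_t other[3];
};

/* Same fields; the attribute rounds sizeof up to a multiple of 8. */
struct header_aligned {
   uint32_t offset;
   uint32_t size;
   uint32_t other[3];
} __attribute__((aligned(SUBALLOC_ALIGNMENT)));

/* Address of the first allocation placed directly after a header,
 * i.e. the equivalent of (char *)&latest[1] for an aligned base. */
static inline uintptr_t
first_alloc_after(uintptr_t base, size_t header_size)
{
   assert(base % SUBALLOC_ALIGNMENT == 0);
   return base + header_size;
}
```

With the unattributed struct the first allocation lands 20 bytes past an 8-byte-aligned base, so it is only 4-byte aligned; with the attribute it lands 24 bytes past the base and stays 8-byte aligned.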

util/ralloc: Switch from DEBUG to NDEBUG

The debug code is all asserts, so protect it with the same thing that
controls assert.

Reviewed-by: Eric Anholt <eric@anholt.net>

nir: add support for removing redundant stores to copy prop var

For example the following type of thing is seen in TCS from
a number of Vulkan and DXVK games:

vec1 32 ssa_557 = deref_var &oPatch (shader_out float)
vec1 32 ssa_558 = intrinsic load_deref (ssa_557) ()
vec1 32 ssa_559 = deref_var &oPatch@42 (shader_out float)
vec1 32 ssa_560 = intrinsic load_deref (ssa_559) ()
vec1 32 ssa_561 = deref_var &oPatch@43 (shader_out float)
vec1 32 ssa_562 = intrinsic load_deref (ssa_561) ()
intrinsic store_deref (ssa_557, ssa_558) (1) /* wrmask=x */
intrinsic store_deref (ssa_559, ssa_560) (1) /* wrmask=x */
intrinsic store_deref (ssa_561, ssa_562) (1) /* wrmask=x */

No shader-db changes on i965 (SKL).

vkpipeline-db results RADV (VEGA):

Totals from affected shaders:
SGPRS: 7832 -> 7728 (-1.33 %)
VGPRS: 6476 -> 6740 (4.08 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 469572 -> 456596 (-2.76 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 989 -> 960 (-2.93 %)
Wait states: 0 -> 0 (0.00 %)

The Max Waves and VGPRS changes here are misleading. What is
happening is a bunch of TCS outputs are being optimised away as
they are now recognised as unused. This results in more varyings
being compacted via nir_compact_varyings() which can result in
more register pressure when they are not packed in an optimal way.
This is an existing problem independent of this patch. I've run
some benchmarks and haven't noticed any performance regressions
in affected games.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv/i965: make use of nir_link_constant_varyings()

shader-db results for SKL:

total instructions in shared programs: 13106498 -> 13091573 (-0.11%)
instructions in affected programs: 1186244 -> 1171319 (-1.26%)
helped: 6186
HURT: 0

total cycles in shared programs: 332062633 -> 331961653 (-0.03%)
cycles in affected programs: 8537165 -> 8436185 (-1.18%)
helped: 5371
HURT: 862

LOST: 6
GAINED: 14

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

egl: Improve the debugging of gbm format matching in DRI configs.

Previously the debug would be:

libEGL debug: No DRI config supports native format 0x20203852
libEGL debug: No DRI config supports native format 0x38385247

but

libEGL debug: No DRI config supports native format R8
libEGL debug: No DRI config supports native format GR88

is a lot easier to understand.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>

gbm: Introduce a helper function for printing GBM format names.

This requires that the caller make a little (stack) allocation to store
the string.

v2: Use gbm_format_canonicalize (suggested by Daniel)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
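
A minimal sketch of such a helper, assuming the format code is a FourCC with the first character in the least significant byte. The names format_name_buf and format_name are hypothetical and are not the gbm API; the buffer is supplied by the caller, typically on the stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Caller-provided storage: four FourCC characters plus a NUL. */
struct format_name_buf {
   char name[5];
};

/* Unpack a FourCC code, least significant byte first, into buf. */
static inline const char *
format_name(uint32_t fourcc, struct format_name_buf *buf)
{
   assert(buf != NULL);
   for (int i = 0; i < 4; i++)
      buf->name[i] = (char)((fourcc >> (8 * i)) & 0xff);
   buf->name[4] = '\0';
   return buf->name;
}
```

Applied to 0x38385247 this yields the characters G, R, 8, 8, and 0x20203852 yields R, 8 followed by two spaces.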

gbm: Move gbm_format_canonicalize() to the core.

I want it for the format name debugging code.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>

meson: fix libatomic tests

There are two problems:
1) the extra underscore in MISSING_64BIT_ATOMICS
2) we should link with libatomic if the previous test decided we needed
it

Fixes: d1992255bb29054fa51763376d125183a9f602f3
("meson: Add build Intel "anv" vulkan driver")
Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>

mesa: mark GL_SR8_EXT non-renderable on GLES

Fixes: dEQP-GLES3.functional.fbo.completeness.renderable.texture.color0.sr8_ext
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

st/mesa: disable L3 thread pinning

This implementation can have massive drawbacks.

Cc: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>

nir: add lowering for ffloor

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

util: Fix warning in u_cpu_detect on non-x86

regs is only set and used on x86; on other platforms (like ARM), this
code causes a trivial warning, solved by moving the regs declaration to
the architecture-dependent usage.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

meson: Don't set -Wall

meson does this for you with its warn levels, so we don't need to set
it ourselves.

Fixes: d1992255bb29054fa51763376d125183a9f602f3
("meson: Add build Intel "anv" vulkan driver")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

freedreno/drm: fix unused 'entry' warnings

Looks like importing libdrm_freedreno into mesa crossed paths with
e27902a2613.

Signed-off-by: Rob Clark <robdclark@gmail.com>

i965: add support for sampling from AYUV

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

dri: add AYUV format

v2: Add an AYUV entry in the Android backend (Tapani)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

nir/lower_tex: Add AYUV lowering support

Byte ordering is:

0: V
1: U
2: Y
3: A

v2: Split refactoring of alpha channel (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com> (v1)
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
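
The byte ordering listed above can be made concrete with a CPU-side sketch. The struct and function names are hypothetical; the actual lowering operates on NIR texture instructions, not on raw bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One unpacked AYUV texel. */
struct ayuv_texel {
   uint8_t v, u, y, a;
};

/* Read one texel from memory using the byte order V, U, Y, A. */
static inline struct ayuv_texel
unpack_ayuv(const uint8_t *bytes)
{
   assert(bytes != NULL);
   struct ayuv_texel t;
   t.v = bytes[0];
   t.u = bytes[1];
   t.y = bytes[2];
   t.a = bytes[3];
   return t;
}
```

Reading individual bytes rather than a 32-bit word keeps the sketch independent of host endianness.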

nir/lower_tex: add alpha channel parameter for yuv lowering

We're about to introduce AYUV support, which provides its own alpha
channel. So give alpha as a parameter and set it to 1 on existing
formats.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

radv: make use of num_good_cu_per_sh in si_emit_graphics() too

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: clean up setting partial_es_wave for distributed tess on VI

Only needed when the pipeline actually uses tessellation. I don't
think that changes anything, except improving readability.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: cleanup and document a Hawaii bug with offchip buffers

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

glsl/test: Fix use after free in test_optpass.

The variable state is freed and afterwards state->error is used
as the return value, resulting in a use-after-free bug detected
by memory safety tools like AddressSanitizer.

Signed-off-by: Hanno Böck <hanno@hboeck.de>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108636
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
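
The usual shape of this kind of fix is to copy the needed field before releasing the object. The sketch below uses a hypothetical parse_state struct and plain malloc/free, not the real test_optpass code or its allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state object carrying an error flag. */
struct parse_state {
   bool error;
};

/* Release the state and report its error flag as an exit status. */
static inline int
finish_and_free(struct parse_state *state)
{
   assert(state != NULL);
   const bool error = state->error; /* read before the memory is released */
   free(state);
   return error ? 1 : 0;
}
```

Returning `state->error` after `free(state)` instead would read freed memory, which is the pattern the commit removes.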

nir: don't pack varyings ints with floats unless flat

Fixes: 1c9c42d16b4c ("nir: add varying component packing helpers")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: add glsl_type_is_integer() helper

Fixes: 1c9c42d16b4c ("nir: add varying component packing helpers")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/fs: Prevent emission of IR instructions not aligned to their own execution size.

This can occur during payload setup of SIMD-split send message
instructions, which can lead to the emission of header setup
instructions with a non-zero channel group and fixed SIMD width. Such
instructions could end up using undefined channel enable signals
except they don't care since they're always marked force_writemask_all.

Not known to affect correctness of any workload at this point, but it
would be trivial to back-port to stable if something comes up.

Reported-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Sagar Ghuge <sagar.ghuge@intel.com>

st/mesa: make use of nir_link_constant_varyings()

Shader-db results radeonsi (VEGA):

Totals from affected shaders:
SGPRS: 161464 -> 161368 (-0.06 %)
VGPRS: 86904 -> 86292 (-0.70 %)
Spilled SGPRs: 296 -> 314 (6.08 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 3618596 -> 3573852 (-1.24 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 26189 -> 26276 (0.33 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Eric Anholt <eric@anholt.net>

nir: add new linking opt nir_link_constant_varyings()

This pass moves constant outputs to the consuming shader stage
where possible.

Reviewed-by: Eric Anholt <eric@anholt.net>

st/nine: clean up thread shutdown sequence a bit

Just break out of the loop instead, it does the same thing.

Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>

st/nine: plug thread related leaks

Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>

st/nine: fix stack corruption due to ABI mismatch

This fixes various crashes and hangs when using nine's 'thread_submit'
feature.

On 64bit, the thread function's data argument would just be NULL.
On 32bit, the data argument would be garbage depending on the compiler
flags (in my case -march>=core2).

Fixes: f3fa7e3068512d ("st/nine: Use WINE thread for threadpool")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>

radeonsi: stop command submission with PIPE_CONTEXT_LOSE_CONTEXT_ON_RESET only

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

gallium: add PIPE_CONTEXT_LOSE_CONTEXT_ON_RESET

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

radeonsi: don't set the CB clear color registers for 0/1 clear colors on Raven2

and add has_dcc_constant_encode.

radeonsi: use better DCC clear codes

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

ac/surface: remove the overallocation workaround for Vega12

not needed anymore (probably since the tile_swizzle fix)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

intel/aub_read: remove useless breaks

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

Revert "mesa: expose NV_conditional_render on GLES"

This reverts commit 5213be9fab72548c799b30e320dd1b257534f096.

Revert "mesa/main: fixup make check after NV_conditional_render for gles"

This reverts commit cccd7a253f9ed14ea748a222f58b0e5c895eb939.

mesa/main: fixup make check after NV_conditional_render for gles

It seems I missed some details when exposing NV_conditional_render
on GLES; this fixes up "make check".

Fixes: 5213be9fab7 ("mesa: expose NV_conditional_render on GLES")
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@intel.com>

radv: include LLVM IR in the VK_AMD_shader_info "disassembly"

Helpful for debugging compiler backend problems: this allows us to
easily retrieve the LLVM IR from RenderDoc.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

mesa: expose NV_conditional_render on GLES

The extension spec has been updated to include GLES 2 support, so let's
enable it there.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

nir/constant_folding: fix incorrect bit-size check

nir_alu_type_get_type_size takes a type as parameter and we were
passing a bit-size instead, which did what we wanted by accident,
since a bit-size of zero matches nir_type_invalid, which has a
size of 0 too.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

intel/compiler: fix node interference of simd16 instructions

SIMD16 instructions need to have additional interferences to prevent
source / destination hazards when the source and destination registers
are off by one register.

While we already have code to handle this, it was only running for SIMD16
dispatches; however, we can have SIMD16 instructions in a SIMD8 dispatch.
One example is pull constant loads since commit b56fa830c6095,
but there are more cases.

This fixes a number of CTS test failures found in work-in-progress
tests that were hitting this situation for 16-wide pull constants
in a SIMD8 program.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

gallivm: fix improper clamping of vertex index when fetching gs inputs

Because we only have one file_max for the (2d) gs input file, the value
actually represents the max of attrib and vertex index (although I'm
not entirely sure if we really want the max, since the max valid value
of the vertex dimension can be easily deduced from the input primitive).

Thus in cases where the number of inputs is higher than the number of
vertices per prim, we did not properly clamp the vertex index, which
would result in out-of-bound fetches, potentially causing segfaults
(the segfaults seemed actually difficult to trigger, but valgrind
certainly wasn't happy). This might have happened even if the shader
did not actually try to fetch bogus vertices, if the fetching happened
in non-active conditional clauses.

To fix simply use the correct max vertex index value (derived from
the input prim type) instead when clamping for this case.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
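
The clamping rule can be sketched on the CPU side. The enum and both functions are hypothetical illustrations; gallivm emits the equivalent comparison as LLVM IR rather than calling C helpers like these.

```c
#include <assert.h>

/* Hypothetical geometry shader input primitive types. */
enum gs_input_prim {
   GS_INPUT_POINTS,
   GS_INPUT_LINES,
   GS_INPUT_LINES_ADJACENCY,
   GS_INPUT_TRIANGLES,
   GS_INPUT_TRIANGLES_ADJACENCY,
};

/* Number of vertices a geometry shader sees per input primitive. */
static inline unsigned
vertices_per_prim(enum gs_input_prim prim)
{
   switch (prim) {
   case GS_INPUT_POINTS:              return 1;
   case GS_INPUT_LINES:               return 2;
   case GS_INPUT_LINES_ADJACENCY:     return 4;
   case GS_INPUT_TRIANGLES:           return 3;
   case GS_INPUT_TRIANGLES_ADJACENCY: return 6;
   }
   return 1;
}

/* Clamp against the primitive's vertex count, not the attribute count. */
static inline unsigned
clamp_vertex_index(unsigned index, enum gs_input_prim prim)
{
   const unsigned count = vertices_per_prim(prim);
   assert(count > 0);
   return index < count ? index : count - 1;
}
```

With triangle input, an index of 7 clamps to 2 here even if the shader declares more than three input attributes.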

i965: Lift restriction in external textures for EGLImage support

Fixes Skqp's unitTest_EGLImageTest test.

For Intel platforms, we support external textures only for EGLImages
created with EGL_EXT_image_dma_buf_import. This restriction seems to
be Intel specific and not present for other platforms.

While running SKQP test - unitTest_EGLImageTest, GL_INVALID is sent
to the test because of this restriction.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105301
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>

glsl: Add pragma to disable all warnings

Use #pragma warning(off) and #pragma warning(on) to disable or enable
all warnings.  This is a big hammer.  If we ever need a smaller hammer,
we can enhance this functionality.

There is one lame thing about this.  Because we parse everything, create
an AST, then convert the AST to GLSL IR, we have to treat the #pragma
like a statement.  This means that you can't do something like

'    void
'    #pragma warning(off)
'    __foo
'    #pragma warning(on)
'    (float param0);

Fixing that would, as far as I can tell, require a huge amount of work.

I did try just handling the #pragma during parsing (like we do for
state for the whole shader.

v2: Fix the #pragma lines in the commit message that git-commit ate.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

glsl: Add warning tests for identifiers with __

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/fs: Add an assert to optimize_frontfacing_ternary

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

anv: Use nir_src_is_const and friends in lowering code

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/analyze_ubo_ranges: Use nir_src_is_const and friends

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/vec4: Use the new nir_src_is_const and friends

As of this commit, all uses of const sources either go through a
nir_src_as_<type> helper which handles bit sizes correctly or else are
accompanied by a nir_src_bit_size() == 32 assertion to assert that we
have the size we think we have.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir: Add a read_mask helper for ALU instructions

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/fs: Use the new nir_src_is_const and friends

As of this commit, all uses of const sources either go through a
nir_src_as_<type> helper which handles bit sizes correctly or else are
accompanied by a nir_src_bit_size() == 32 assertion to assert that we
have the size we think we have.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/fs,vec4: Clean up a repeated pattern with SSBOs

Everywhere we handle SSBO intrinsics, we have exactly the same pattern
for computing the index so we may as well make a helper for it. We also
add a get_nir_src_imm to vec4 and use it for SSBO offsets.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

radv: fix GPU hangs when loading depth/stencil clear values on SI/CIK

HTILE is supported on these chips, not sure how I missed that.
This restores using PFP_SYNC_ME when LOAD_CONTEXT_REG is not used.

Fixes: f425d9ee74 ("radv: use LOAD_CONTEXT_REG when loading fast clear values")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

radv: use LOAD_CONTEXT_REG when loading fast clear values

This avoids syncing the Micro Engine. This is only supported
for VI+ currently. There is probably a way to use
LOAD_CONTEXT_REG on previous chips, but that could be done later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: only expose VK_SUBGROUP_FEATURE_ARITHMETIC_BIT for VI+

Inclusive and exclusive scans are missing because older chips
don't have llvm.amdgcn.update.dpp.

This fixes crashes with dEQP-VK.subgroups.arithmetic.*.

CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

glx: Demand success from CreateContext requests (v2)

GLXCreate{,New}Context, like most X resource creation requests, does not
emit a reply and therefore is emitted into the X stream asynchronously.
However, unlike most resource creation requests, the GLXContext we
return is a handle to library state instead of an XID. So if context
creation fails for any reason - say, the server doesn't support indirect
contexts - then we will fail in strange places for strange reasons.

We could make every GLX entrypoint robust against half-created contexts,
or we could just verify that context creation worked. Reuse the
__glXIsDirect code to do this, as a cheap way of verifying that the
XID is real.

glXCreateContextAttribsARB solves this by using the _checked version of
the xcb command, so effectively this change makes the classic context
creation paths as robust as CreateContextAttribs.

v2: Better use of Bool, check that error != NULL first (Olivier Fourdan)

Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

gm107/ir: fix compile time warning in getTEXSMask

In function 'uint8_t nv50_ir::getTEXSMask(uint8_t)':
warning: control reaches end of non-void function [-Wreturn-type]

Reported-by: Moiman@freenode
Fixes: f821e80213e38e93f96255b3deacb737a600ed40
"gm107/ir: use scalar tex instructions where possible"
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

winsys/amdgpu: Stop using amdgpu_bo_handle_type_kms_noimport

It only behaves differently from amdgpu_bo_handle_type_kms with
libdrm 2.4.93, and it breaks if an older version is picked up.

Bugzilla: https://bugs.freedesktop.org/108096
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>

intel/dump_gpu: add platform option

Got tired of remembering the PCI ids.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

intel/dump_gpu: move output option together

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

radv: disable conditional rendering for vkCmdCopyQueryPoolResults()

VK_EXT_conditional_rendering says that copy commands should not be
affected by conditional rendering.

Cc: 18.2 18.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>