glsl/nir: add glsl_types::explicit_size plus nir C wrapper

When using SPIR-V shaders (ARB_gl_spirv), layout data is not implied
by a layout qualifier (std140, std430, etc.) but explicitly included in
the type (explicit values for offset, stride and row_major).

So this method is equivalent to the existing std140_size and
std430_size, but using such explicit values.

Note that the value returned by this method is only valid if such data
is set, that is, when dealing with SPIR-V shaders.

v2: (all changes suggested by Jason Ekstrand)
   * Iterate through all struct members, instead of assuming that fields
     are ordered by offset
   * Use else if
   * Take into account the case that explicit_stride > elem_size, to
     fine-tune the final size on arrays and matrices
   * Handle different bit-sizes in general, not just 32 and 64.

v3: (change suggested by Caio Marcelo de Oliveira Filho)
   * fix up explicit_size() to consider interface types
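
The size rule described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the Scalar, Array,
Field and Struct classes and the explicit_size() function below are
hypothetical stand-ins, not the glsl_type API. The sketch takes the maximum
of offset + size over all struct members (so member order does not matter)
and lets an array stride exceed the element size.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Scalar:
    bit_size: int          # any bit size, not only 32 or 64


@dataclass
class Array:
    element: object        # Scalar, Array or Struct (hypothetical model)
    length: int
    explicit_stride: int   # bytes between the starts of consecutive elements


@dataclass
class Field:
    type: object
    offset: int            # explicit byte offset inside the struct


@dataclass
class Struct:
    fields: List[Field]


def explicit_size(t):
    """Size in bytes implied by explicit offsets and strides."""
    if isinstance(t, Scalar):
        return t.bit_size // 8
    elif isinstance(t, Array):
        if t.length == 0:
            return 0
        # Only the last element contributes its own size; every earlier
        # element contributes a full stride, which may include padding.
        return t.explicit_stride * (t.length - 1) + explicit_size(t.element)
    elif isinstance(t, Struct):
        # Do not assume fields are sorted by offset: look at all of them.
        return max((f.offset + explicit_size(f.type) for f in t.fields),
                   default=0)
    raise TypeError("unsupported type in this sketch")


# The array is listed first but ends last: 16 + (2 * 16 + 4) = 52 bytes.
block = Struct([
    Field(Array(Scalar(32), 3, 16), offset=16),
    Field(Scalar(64), offset=0),
])
print(explicit_size(block))
```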

Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

glsl_types: add type::bit_size and glsl_base_type_bit_size helpers

Note that the nir_types glsl_get_bit_size is not a wrapper of this
one, because for bools at the nir level, we want to return size 1, but
at the glsl_types we want to return 32.

v2: reuse the new method in order to simplify is_16bit and is_32bit
helpers (Timothy)

v3: add a comment clarifying the difference between
glsl_base_type_bit_size and glsl_get_bit_size.
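
To make the bool difference concrete, here is a small Python sketch. The
table and both function names are hypothetical and only mirror the
behaviour described in this message (32 bits for bool at the glsl_types
level, 1 bit at the nir level); they are not the real helpers.

```python
# Hypothetical per-base-type bit sizes at the glsl_types level.
GLSL_LEVEL_BITS = {
    "bool": 32,
    "int16": 16,
    "float16": 16,
    "int": 32,
    "float": 32,
    "double": 64,
    "int64": 64,
}


def glsl_level_bit_size(base_type):
    """Bit size as seen by glsl_types: bools occupy 32 bits."""
    return GLSL_LEVEL_BITS[base_type]


def nir_level_bit_size(base_type):
    """Bit size as seen at the nir level: bools are 1 bit wide."""
    if base_type == "bool":
        return 1
    return glsl_level_bit_size(base_type)


def is_16bit(base_type):
    # A helper like this can be written in terms of the bit size.
    return glsl_level_bit_size(base_type) == 16
```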

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir: add is_in_ubo/ssbo/block helpers

Equivalent to the already existing ir_variable is_in_buffer_block and
is_in_shader_storage_block, adding the uniform buffer object one. I'm
using the short forms (ssbo, ubo) to avoid having method names too
long.
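
A rough Python sketch of what such helpers could look like. The Variable
class, the mode strings and the exact conditions are assumptions made for
illustration; the actual nir helpers may check different fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    mode: str                             # e.g. "ubo", "ssbo", "uniform"
    interface_type: Optional[str] = None  # set when declared inside a block


def is_in_ubo(var):
    # Assumption: a UBO member has the UBO mode and an interface type.
    return var.mode == "ubo" and var.interface_type is not None


def is_in_ssbo(var):
    return var.mode == "ssbo" and var.interface_type is not None


def is_in_block(var):
    # "block" covers both kinds of buffer-backed interface blocks.
    return is_in_ubo(var) or is_in_ssbo(var)
```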

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

spirv/nir: fill up nir variable info for ubos and ssbo

The data for some nir variables is only filled in for some specific
modes. We now need it for UBO/SSBO too, as that info will be used when
linking for OpenGL (ARB_gl_spirv).

There is an existing comment just before that code (it starts with XXX)
that points out that binding still needs to be filled in for uniform
variables at that point, and that this should be fixed, although it
doesn't specify why that's a problem or what the alternative would be.
For now, do the same for UBO/SSBO, and hope that the future fix covers
all of them.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

spirv/nir: create nir variable for UBO/SSBO

Providing nir variables for UBO/SSBO is not required for Vulkan, but
it is needed for OpenGL (ARB_gl_spirv), for example to gather info
from the UBO/SSBO while linking.

Unlike most cases where nir variables are created, here the type
assigned is the full type (not just the bare type). This is needed
because while linking using the nir shader we need the explicit layout
info (explicit stride, explicit offset, row_major, etc).

Also, we need to assign an interface type, which the OpenGL linker
also uses to check whether a variable is a UBO/SSBO. See
ir_variable::is_in_buffer_block for an example.

v2: assign interface_type to be the variable type, no need to account
for arrayness (Timothy)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

vl: Use CS composite shader only if TEX_LZ and DIV are supported

Enable the compute shader compositor only when TEX_LZ is supported by the driver.

v2: Also check whether DIV is supported.

https://bugs.freedesktop.org/show_bug.cgi?id=110783

Fixes: 9364d66cb7f7 ("gallium/auxiliary/vl: Add video compositor compute shader render")

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium: Add CAP for opcode DIV

Not all drivers support TGSI_OPCODE_DIV, so we should have a cap to be able
to check this.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

vl: replace DIV-ADD with MAD using inverse size

Optimize the shader a bit by emitting MAD with the inverse size values
instead of DIV+ADD.
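
The rewrite relies on x / size + offset being equal, up to rounding, to
x * (1 / size) + offset, with the reciprocal computed once ahead of time.
A minimal Python sketch of that arithmetic follows; the function names are
made up for illustration and are not part of the vl code.

```python
import math


def div_add(x, size, offset):
    """Original form: one divide followed by one add."""
    return x / size + offset


def mad_inverse(x, inv_size, offset):
    """Rewritten form: a single multiply-add with a precomputed 1/size."""
    return x * inv_size + offset


size = 1920.0
inv_size = 1.0 / size  # computed once, outside the per-pixel work

for x in (0.0, 1.0, 959.5, 1919.0):
    a = div_add(x, size, 0.25)
    b = mad_inverse(x, inv_size, 0.25)
    # The two forms may differ in the last bits because of rounding.
    print(x, a, b, math.isclose(a, b, rel_tol=1e-12))
```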

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

etnaviv: blt: blit with the original format when possible

This fixes BGR565 blit: currently BGRA444 is used for the blit, but with
swizzles from the original BGR565 format, so the 4 alpha bits are set to 1.
We can't just use the swizzle from the 'compatible' format, since there are
cases where BGR<->RGB swap needs to happen.

We can avoid all this trouble by using the original formats and only
falling back to the 'compatible' format when we need to.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

etnaviv: clear all bits for 24bpp depth without stencil

For fast clear to happen, all bits must be cleared.

This allows using fast clear for 24bpp depth without stencil.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

mesa: use binary search for MESA_EXTENSION_OVERRIDE

Not a hot path obviously, but the table still has 425 extensions, which
you can go through in just 9 steps with a binary search.

The table is already sorted, as required by other parts of the code and
enforced by mesa's `main-test`.
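
The step count can be checked with a short Python sketch. The table below
is synthetic (425 made-up, zero-padded names standing in for the real
extension table), and find_extension() is a hypothetical helper, not the
function changed by this commit.

```python
def find_extension(sorted_names, name):
    """Binary search; returns (index or -1, number of probes)."""
    lo, hi = 0, len(sorted_names) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_names[mid] == name:
            return mid, steps
        if sorted_names[mid] < name:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


# Synthetic stand-in for the sorted extension table.
table = ["GL_FAKE_%03d" % i for i in range(425)]

worst = max(find_extension(table, name)[1] for name in table)
print(worst)  # worst-case number of probes over all 425 entries
```

The search is only valid because the table is sorted, which is the
property the message says is enforced elsewhere.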

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

gitlab-ci: test meson installation

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

anv: fix indentation

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv: fix typo

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv: replace hard-coded platform list with vk.xml parse

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

android: fix typo LOCAL_EXPORT_C_INCLUDES

Should be LOCAL_EXPORT_C_INCLUDE_DIRS.

Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Tested-by: Mauro Rossi <issor.oruam@gmail.com>

android: virgl: fix generated virgl_driinfo.h building rules

Changelog in Android makefile:
- Add LOCAL_MODULE_CLASS, intermediates and LOCAL_GENERATED_SOURCES
- Use LOCAL_EXPORT_C_INCLUDE_DIRS to export $(intermediates) path
- Move generated header rules before 'include $(BUILD_STATIC_LIBRARY)'

Fixes the following building error:

In file included from external/mesa/src/gallium/targets/dri/target.c:1:
external/mesa/src/gallium/auxiliary/target-helpers/drm_helper.h:257:16:
fatal error: 'virgl/virgl_driinfo.h' file not found
#include "virgl/virgl_driinfo.h"
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: cf800998a ("virgl: Add driinfo file and tie it into the build")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>

intel/compiler: don't use byte operands for src1 on ICL

The simulator complains about using byte operands, and we also have
documentation telling us not to.

Note that some operations on bytes (like ADD) seem to work fine on HW.
Using dword operands with CMP & SEL fixes the following tests:

   dEQP-VK.spirv_assembly.type.vec*.i8.*

v2: Drop the GLK changes (Matt)
    Add validator tests (Matt)

v3: Drop GLK ref (Matt)
    Don't mix float/integer in MAD (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
BSpec: 3017
Cc: <mesa-stable@lists.freedesktop.org>

egl: Enable eglGetPlatformDisplay on Android Platform

This helps to add eglGetPlatformDisplay support on Android
Platform.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

nir/search: Increase maximum commutative expressions from 4 to 8

No shader-db change on any Intel platform. No shader-db run-time
difference on a certain 36-core / 72-thread system at 95% confidence
(n=20).

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir/algebraic: Don't mark expression with duplicate sources as commutative

There is no reason to mark the fmul in the expression

    ('fmul', ('fadd', a, b), ('fadd', a, b))

as commutative.  If a source of an instruction doesn't match one of the
('fadd', a, b) patterns, it won't match the other either.

This change is enough to make this pattern work:

    ('~fadd@32', ('fmul', ('fadd', 1.0, ('fneg', a)),
                          ('fadd', 1.0, ('fneg', a))),
                 ('fmul', ('flrp', a, 1.0, a), b))

This pattern has 5 commutative expressions (versus a limit of 4), but
the first fmul does not need to be commutative.
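
The counting argument can be reproduced with a small Python sketch. It is
a simplification under stated assumptions: only fadd and fmul are treated
as commutative, variables are plain strings, and duplicate sources are
detected with ordinary tuple equality rather than the weaker "equivalent"
check that nir_algebraic ends up using. count_commutative() is a
hypothetical helper, not part of nir_algebraic.py.

```python
COMMUTATIVE = {"fadd", "fmul"}  # assumption: enough for this pattern


def opcode(expr):
    """Strip the inexact marker (~) and the bit-size suffix (@32)."""
    return expr[0].lstrip("~").split("@")[0]


def count_commutative(expr, prune_duplicates):
    if not isinstance(expr, tuple):
        return 0  # a variable or a constant
    sources = expr[1:]
    count = sum(count_commutative(s, prune_duplicates) for s in sources)
    if opcode(expr) in COMMUTATIVE:
        # Swapping two identical sources cannot produce a new match.
        if not (prune_duplicates and sources[0] == sources[1]):
            count += 1
    return count


a, b = "a", "b"
pattern = ("~fadd@32", ("fmul", ("fadd", 1.0, ("fneg", a)),
                                ("fadd", 1.0, ("fneg", a))),
                       ("fmul", ("flrp", a, 1.0, a), b))

print(count_commutative(pattern, prune_duplicates=False))
print(count_commutative(pattern, prune_duplicates=True))
```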

No shader-db change on any Intel platform.  No shader-db run-time
difference on a certain 36-core / 72-thread system at 95% confidence
(n=20).

There are more subpatterns that could be marked as non-commutative, but
detecting these is more challenging.  For example, this fadd:

    ('fadd', ('fmul', a, b), ('fmul', a, c))

The first fadd:

    ('fmul', ('fadd', a, b), ('fadd', a, b))

And this fadd:

    ('flt', ('fadd', a, b), 0.0)

This last case may be easier to detect.  If all sources are variables
and they are the only instances of those variables, then the pattern can
be marked as non-commutative.  It's probably not worth the effort now,
but if we end up with some patterns that bump up on the limit again, it
may be worth revisiting.

v2: Update the comment about the explicit "len(self.sources)" check to
be more clear about why it is necessary.  Requested by Connor.  Many
Python style / idiom fixes suggested by Dylan.  Add missing (!!!)
opcode check in Expression::__eq__ method.  This bug is the reason the
expected number of commutative expressions in the bitfield_reverse
pattern changed from 61 to 45 in the first version of this patch.

v3: Use all() in Expression::__eq__ method.  Suggested by Connor.
Revert away from using __eq__ overloads.  The "equality" implementation
of Constant and Variable needed for commutativity pruning is weaker than
the one needed for propagating and validating bit sizes.  Using actual
equality caused the pruning to fail for my ('fmul', ('fadd', 1, a),
('fadd', 1, a)) case.  I changed the name to "equivalent" rather than
the previous "same_as" to further differentiate it from __eq__.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

nir/search: Log Boolean constants instead of asserting

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir/algebraic: Fail build when too many commutative expressions are used

Search patterns that are expected to have too many (e.g., the giant
bitfield_reverse pattern) can be added to a white list.

This would have saved me a few hours debugging. :(

v2: Implement the expected-failure annotation as a property of the
search-replace pattern instead of as a property of the whole list of
patterns. Suggested by Connor.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

nir/algebraic: Fix whitespace error

Trivial

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

panfrost: Allow R11G11B10 rendering

Doesn't fully work yet, but better than crashing.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Default to util_pack_color for clears

This might help as we bring up more render-target formats.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

intel/vec4: Try both sources as candidates for being immediates

For some reason, when I first wrote try_immediate_source, I thought the
sources had already been ordered so that the immediate value was the
second source.  That's rubbish.  The generator assumes *neither* source
is immediate, and it relies on later copy/constant propagation passes to
do the reordering.

For this reason, the changes to try_immediate_source have to go to some
efforts to reorder the operands and tell the caller when it reordered
them.  The generator for comparison instructions uses this to determine
when the comparison needs to change (e.g., from GT to LT).
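
The reordering rule for comparisons can be sketched as follows. This is an
illustrative Python model, not the vec4 backend code: registers are
strings, immediates are Python numbers, and
compare_with_immediate_last() is a hypothetical name.

```python
# Condition to use after the two operands of a comparison are swapped.
SWAPPED_CONDITION = {
    "GT": "LT", "LT": "GT",
    "GE": "LE", "LE": "GE",
    "EQ": "EQ", "NE": "NE",
}


def is_immediate(src):
    return isinstance(src, (int, float))


def compare_with_immediate_last(cond, src0, src1):
    """Return (cond, src0, src1, reordered) with any immediate in src1."""
    if is_immediate(src0) and not is_immediate(src1):
        # (imm cond reg) is the same test as (reg swapped_cond imm).
        return SWAPPED_CONDITION[cond], src1, src0, True
    return cond, src0, src1, False


print(compare_with_immediate_last("GT", 1.0, "r1"))
print(compare_with_immediate_last("GT", "r1", 1.0))
```

The returned flag plays the role described above: it tells the caller that
the operands were reordered, so the caller knows the condition changed.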

No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

Haswell
total instructions in shared programs: 13484431 -> 13480500 (-0.03%)
instructions in affected programs: 441138 -> 437207 (-0.89%)
helped: 1883
HURT: 0
helped stats (abs) min: 1 max: 49 x̄: 2.09 x̃: 1
helped stats (rel) min: 0.07% max: 8.91% x̄: 1.10% x̃: 0.90%
95% mean confidence interval for instructions value: -2.19 -1.98
95% mean confidence interval for instructions %-change: -1.14% -1.06%
Instructions are helped.

total cycles in shared programs: 376420286 -> 376406400 (<.01%)
cycles in affected programs: 15995668 -> 15981782 (-0.09%)
helped: 1692
HURT: 219
helped stats (abs) min: 2 max: 764 x̄: 13.78 x̃: 4
helped stats (rel) min: <.01% max: 9.69% x̄: 0.69% x̃: 0.35%
HURT stats (abs)   min: 2 max: 516 x̄: 43.09 x̃: 22
HURT stats (rel)   min: 0.02% max: 12.09% x̄: 2.30% x̃: 1.13%
95% mean confidence interval for cycles value: -9.70 -4.83
95% mean confidence interval for cycles %-change: -0.42% -0.28%
Cycles are helped.

total spills in shared programs: 23166 -> 23158 (-0.03%)
spills in affected programs: 66 -> 58 (-12.12%)
helped: 2
HURT: 0

total fills in shared programs: 34592 -> 34580 (-0.03%)
fills in affected programs: 75 -> 63 (-16.00%)
helped: 2
HURT: 0

Ivy Bridge
total instructions in shared programs: 12051590 -> 12048513 (-0.03%)
instructions in affected programs: 355911 -> 352834 (-0.86%)
helped: 1481
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.08 x̃: 1
helped stats (rel) min: 0.07% max: 4.92% x̄: 1.08% x̃: 0.90%
95% mean confidence interval for instructions value: -2.17 -1.98
95% mean confidence interval for instructions %-change: -1.12% -1.04%
Instructions are helped.

total cycles in shared programs: 180319624 -> 180307642 (<.01%)
cycles in affected programs: 15591028 -> 15579046 (-0.08%)
helped: 1340
HURT: 174
helped stats (abs) min: 2 max: 764 x̄: 14.19 x̃: 2
helped stats (rel) min: <.01% max: 8.68% x̄: 0.64% x̃: 0.32%
HURT stats (abs)   min: 2 max: 518 x̄: 40.41 x̃: 14
HURT stats (rel)   min: 0.02% max: 8.37% x̄: 1.59% x̃: 0.67%
95% mean confidence interval for cycles value: -10.85 -4.97
95% mean confidence interval for cycles %-change: -0.45% -0.31%
Cycles are helped.

All Gen6 and earlier platforms had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10863159 -> 10861462 (-0.02%)
instructions in affected programs: 157839 -> 156142 (-1.08%)
helped: 715
HURT: 0
helped stats (abs) min: 1 max: 12 x̄: 2.37 x̃: 2
helped stats (rel) min: 0.23% max: 4.33% x̄: 1.07% x̃: 0.85%
95% mean confidence interval for instructions value: -2.53 -2.21
95% mean confidence interval for instructions %-change: -1.13% -1.02%
Instructions are helped.

total cycles in shared programs: 153957782 -> 153948778 (<.01%)
cycles in affected programs: 3171648 -> 3162644 (-0.28%)
helped: 696
HURT: 62
helped stats (abs) min: 2 max: 390 x̄: 15.72 x̃: 4
helped stats (rel) min: 0.02% max: 10.57% x̄: 0.57% x̃: 0.12%
HURT stats (abs)   min: 2 max: 300 x̄: 31.29 x̃: 2
HURT stats (rel)   min: 0.11% max: 7.23% x̄: 0.83% x̃: 0.34%
95% mean confidence interval for cycles value: -15.65 -8.11
95% mean confidence interval for cycles %-change: -0.56% -0.36%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/vec4: Try immediate sources for dot products too

No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

All Haswell and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 13484467 -> 13484431 (<.01%)
instructions in affected programs: 8540 -> 8504 (-0.42%)
helped: 33
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.09 x̃: 1
helped stats (rel) min: 0.31% max: 1.53% x̄: 0.49% x̃: 0.35%
95% mean confidence interval for instructions value: -1.19 -0.99
95% mean confidence interval for instructions %-change: -0.60% -0.38%
Instructions are helped.

total cycles in shared programs: 376420572 -> 376420286 (<.01%)
cycles in affected programs: 56260 -> 55974 (-0.51%)
helped: 26
HURT: 5
helped stats (abs) min: 2 max: 204 x̄: 11.85 x̃: 2
helped stats (rel) min: 0.11% max: 3.08% x̄: 0.39% x̃: 0.13%
HURT stats (abs) min: 2 max: 6 x̄: 4.40 x̃: 6
HURT stats (rel) min: 0.03% max: 0.35% x̄: 0.24% x̃: 0.35%
95% mean confidence interval for cycles value: -22.91 4.45
95% mean confidence interval for cycles %-change: -0.56% -0.02%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/vec4: Try emitting non-scalar immediates

Sometimes an instruction has a vector as a source, but all of the
components have the same value.  For example,

    vec3 32 ssa_16 = load_const (1.0, 1.0, 1.0)
    ...
    vec3 32 ssa_82 = fadd ssa_16, -ssa_81.xyz
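
The check for such a source can be sketched in Python as below. This is an
assumption-laden illustration: uniform_immediate() is a hypothetical
helper, components are modelled as Python floats, and they are compared by
their 32-bit patterns so that 0.0 and -0.0 are not treated as the same
immediate.

```python
import struct


def float_bits(value):
    """32-bit IEEE-754 pattern of a float, as an integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def uniform_immediate(components):
    """Return the shared value if every component is identical, else None."""
    if not components:
        return None
    first = float_bits(components[0])
    if all(float_bits(c) == first for c in components):
        return components[0]
    return None


print(uniform_immediate((1.0, 1.0, 1.0)))  # usable as one scalar immediate
print(uniform_immediate((1.0, 2.0, 1.0)))  # components differ
```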

No changes on any Gen8 or later platform because those platforms do not
use the vec4 backend.

Haswell
total instructions in shared programs: 13487811 -> 13484467 (-0.02%)
instructions in affected programs: 421981 -> 418637 (-0.79%)
helped: 1859
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.80 x̃: 1
helped stats (rel) min: 0.04% max: 9.80% x̄: 1.04% x̃: 0.84%
95% mean confidence interval for instructions value: -1.85 -1.74
95% mean confidence interval for instructions %-change: -1.07% -1.00%
Instructions are helped.

total cycles in shared programs: 376423252 -> 376420572 (<.01%)
cycles in affected programs: 14800970 -> 14798290 (-0.02%)
helped: 1519
HURT: 329
helped stats (abs) min: 2 max: 462 x̄: 10.59 x̃: 4
helped stats (rel) min: 0.03% max: 16.73% x̄: 0.79% x̃: 0.36%
HURT stats (abs)   min: 2 max: 598 x̄: 40.74 x̃: 16
HURT stats (rel)   min: <.01% max: 10.32% x̄: 2.56% x̃: 0.98%
95% mean confidence interval for cycles value: -3.53 0.63
95% mean confidence interval for cycles %-change: -0.30% -0.09%
Inconclusive result (value mean confidence interval includes 0).

total fills in shared programs: 34601 -> 34592 (-0.03%)
fills in affected programs: 91 -> 82 (-9.89%)
helped: 9
HURT: 0

Ivy Bridge
total instructions in shared programs: 12053565 -> 12051626 (-0.02%)
instructions in affected programs: 298103 -> 296164 (-0.65%)
helped: 1228
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.58 x̃: 1
helped stats (rel) min: 0.04% max: 3.57% x̄: 0.91% x̃: 0.81%
95% mean confidence interval for instructions value: -1.63 -1.53
95% mean confidence interval for instructions %-change: -0.95% -0.88%
Instructions are helped.

total cycles in shared programs: 180322270 -> 180319922 (<.01%)
cycles in affected programs: 14123840 -> 14121492 (-0.02%)
helped: 1036
HURT: 195
helped stats (abs) min: 2 max: 462 x̄: 11.93 x̃: 2
helped stats (rel) min: 0.03% max: 14.05% x̄: 0.82% x̃: 0.35%
HURT stats (abs)   min: 2 max: 598 x̄: 51.33 x̃: 16
HURT stats (rel)   min: <.01% max: 9.68% x̄: 3.02% x̃: 0.72%
95% mean confidence interval for cycles value: -4.92 1.10
95% mean confidence interval for cycles %-change: -0.35% -0.07%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total instructions in shared programs: 10864286 -> 10863189 (-0.01%)
instructions in affected programs: 159722 -> 158625 (-0.69%)
helped: 724
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.52 x̃: 1
helped stats (rel) min: 0.10% max: 2.91% x̄: 0.79% x̃: 0.62%
95% mean confidence interval for instructions value: -1.58 -1.46
95% mean confidence interval for instructions %-change: -0.82% -0.75%
Instructions are helped.

total cycles in shared programs: 153967938 -> 153957926 (<.01%)
cycles in affected programs: 1923186 -> 1913174 (-0.52%)
helped: 654
HURT: 56
helped stats (abs) min: 2 max: 170 x̄: 20.00 x̃: 4
helped stats (rel) min: 0.03% max: 11.82% x̄: 0.89% x̃: 0.18%
HURT stats (abs)   min: 2 max: 390 x̄: 54.75 x̃: 32
HURT stats (rel)   min: 0.05% max: 6.92% x̄: 3.09% x̃: 2.92%
95% mean confidence interval for cycles value: -17.42 -10.78
95% mean confidence interval for cycles %-change: -0.76% -0.40%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8142677 -> 8141721 (-0.01%)
instructions in affected programs: 139511 -> 138555 (-0.69%)
helped: 588
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 1.63 x̃: 1
helped stats (rel) min: 0.21% max: 4.39% x̄: 0.84% x̃: 0.46%
95% mean confidence interval for instructions value: -1.70 -1.55
95% mean confidence interval for instructions %-change: -0.89% -0.78%
Instructions are helped.

total cycles in shared programs: 188549394 -> 188547676 (<.01%)
cycles in affected programs: 3171960 -> 3170242 (-0.05%)
helped: 527
HURT: 0
helped stats (abs) min: 2 max: 18 x̄: 3.26 x̃: 2
helped stats (rel) min: <.01% max: 0.80% x̄: 0.08% x̃: 0.06%
95% mean confidence interval for cycles value: -3.49 -3.03
95% mean confidence interval for cycles %-change: -0.09% -0.07%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>

nir: Fix lowering of bitfield_insert to shifts.

The bfi/bfm behavior change replaced the bfi/bfm usage in
lower_bitfield_insert_to_shifts with actual shifts like the name says,
but it failed to handle the offset=0, bits==32 case in the new
lowering.

v2: Use 31 < bits instead of bits == 32, to get the 31 < (iand bits,
31) -> false optimization.
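
The corner case can be reproduced with a Python model of 32-bit shifts.
The sketch assumes that a shift only uses the low 5 bits of its count, so
shifting by 32 behaves like shifting by 0; insert_naive() and
insert_fixed() are hypothetical names, not the actual nir lowering.

```python
MASK32 = 0xFFFFFFFF


def shl32(value, count):
    """32-bit left shift; assumes only the low 5 bits of count are used."""
    return (value << (count & 31)) & MASK32


def insert_naive(base, insert, offset, bits):
    # mask = ((1 << bits) - 1) << offset, which collapses to 0 for bits == 32
    mask = shl32((shl32(1, bits) - 1) & MASK32, offset)
    return (base & ~mask & MASK32) | (shl32(insert, offset) & mask)


def insert_fixed(base, insert, offset, bits):
    # "31 < bits" rather than "bits == 32", as in the v2 note above.
    if 31 < bits:
        return insert & MASK32
    return insert_naive(base, insert, offset, bits)


print(hex(insert_naive(0x12345678, 0xDEADBEEF, 0, 32)))  # base leaks through
print(hex(insert_fixed(0x12345678, 0xDEADBEEF, 0, 32)))
```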

Fixes regressions in dEQP-GLES31.*bitfield_insert* on freedreno.

Fixes: 165b7f3a4487 ("nir: define behavior of nir_op_bfm and nir_op_u/ibfe according to SM5 spec.")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

Revert "meson: Add support for using cmake for finding LLVM"

This reverts commit 5157a4276500c77e2210e853b262be1d1b30aedf.

There is a meson bug that causes llvm to always be statically linked,
which is obviously not what we want. I haven't had time to look into it
yet, but for now let's just revert it.

Revert "meson: try to use cmake as a finder for clang"

This reverts commit 0ba0c0c15c633a5a3b7a4651a743f800f30bcbf6.

mesa: stop trying new filenames if the filename existing is not the issue

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: use os_file_create_unique()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

util: add os_file_create_unique()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

panfrost: Disable DXT-style texture compression

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Dump unknown formats before aborting

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost/midgard: Fix 3D texture regression

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Add some special formats

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost/midgard: Implement integer sampler

Turns out one of the magic bits in the texture instruction meant
'float'. Different magic bits mean int and uint then :)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Remove dubious assert

We already *can* support texture formats with bpp > 4, so..

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Implement primitive restart

For GLES3, just pass the flag through.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

i965/icl: Apply WA_1606682166 to compute workloads

We missed the workaround for compute workloads in earlier patches.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Revert "iris/icl: Add WA_2204188704 to disable pixel shader panic dispatch"

SLICE_COMMON_CHICKEN3 is a privileged register not accessible from userspace.
This patch silences a simulator warning about it.

We don't need to add this workaround in the Linux kernel as the WA description
says it's fixed on the latest stepping.

This reverts commit 9c421d6b47e0c5f206959acd68814b63232946be.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Revert "anv/icl: Add WA_2204188704 to disable pixel shader panic dispatch"

SLICE_COMMON_CHICKEN3 is a privileged register not accessible from userspace.
This patch silences a simulator warning about it.

We don't need to add this workaround in the Linux kernel as the WA description
says it's fixed on the latest stepping.

This reverts commit 2be60e0c73ed1555a919c5725cc0cab119a2b6de.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

Revert "i965/icl: Add WA_2204188704 to disable pixel shader panic dispatch"

SLICE_COMMON_CHICKEN3 is a privileged register not accessible from userspace.
This patch silences a simulator warning about it.

We don't need to add this workaround in the Linux kernel as the WA description
says it's fixed on the latest stepping.

This reverts commit 85ecd14ef6a084f5e82860de6dbc79870b335682.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

i965/icl: Fix WA_1606682166

An earlier change was setting the SamplerCount = 0 for Gen 11
under #if GEN_GEN < 7. This commit fixes the problem.

This WA has also been added to the linux kernel.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

freedreno/ir3: small cleanup

`target` cannot be NULL here.

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: fix missing (ss) in dummy bary.f case

In case we need to insert a dummy bary.f for the (ei) flag, it also
needs (ss) so we don't release varying storage to the next VS wave
before the ldlv completed. Fixes random failures in:

dEQP-GLES3.functional.transform_feedback.random.interleaved.lines.*

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

freedreno/a6xx: wire up dither state

Fixes:
dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.rebind_rbo_rgba4
dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.no_rebind_rbo_rgba4
dEQP-GLES2.functional.fbo.render.recreate_colorbuffer.no_rebind_rbo_rgba4_stencil_index8
dEQP-GLES2.functional.fbo.render.recreate_depthbuffer.rebind_rbo_rgba4_depth_component16
dEQP-GLES2.functional.fbo.render.recreate_depthbuffer.no_rebind_rbo_rgba4_depth_component16
dEQP-GLES2.functional.fbo.render.recreate_stencilbuffer.rebind_rbo_rgba4_stencil_index8
dEQP-GLES2.functional.fbo.render.recreate_stencilbuffer.no_rebind_rbo_rgba4_stencil_index8

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

meson: Improve detection of Python when using Meson >=0.50.

Previously, on systems where multiple versions of Python 3 (e.g. 3.6 and 3.7)
are installed, the wrong version of Python 3 could be used.

The proper fix requires availability of path() method in Meson's python
module, which has been added in Meson 0.50:
https://github.com/mesonbuild/meson/pull/4616

Distro Bug: https://bugs.gentoo.org/671308
Signed-off-by: Arfrever Frehtes Taifersar Arahesis <Arfrever@Apache.Org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
v2: - Add missing `endif` keyword (Dylan)

radeon/uvd: fix calc_ctx_size_h265_main10

Left shift was applied twice.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110702

Reviewed-by: Leo Liu <leo.liu@amd.com>
Tested-by: <irherder@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: <mesa-stable@lists.freedesktop.org>

mesa: add display list support for gl(Compressed)TextureSubImage2DEXT

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: add glTextureParameteri/iv/f/fvEXT

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: extend _mesa_lookup_or_create_texture to support EXT_dsa

Adds a boolean to implement EXT_dsa specifics.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: refactor bind_texture

Splits texture lookup and binding actions.

The new _mesa_lookup_or_create_texture will be useful to implement the EXT_direct_state_access extension.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: extract helper function for glTexParameter*

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: add buffer != 0 checks to glNamedBufferEXT functions

The EXT_direct_state_access spec says:

    INVALID_OPERATION is generated by GetNamedBufferParameterivEXT,
    GetNamedBufferPointervEXT, GetNamedBufferSubDataEXT,
    MapNamedBufferEXT, NamedBufferDataEXT, NamedBufferSubDataEXT, and
    UnmapNamedBufferEXT if the buffer parameter is zero.

This commit adds buffer != 0 validation to the implemented functions.

glNamedBufferStorageEXT isn't included in this list and the EXT_buffer_storage
spec doesn't say that buffer = 0 is an error either, so I didn't add the same
validation for this function.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: fix a typo in map_named_buffer_range

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: add support for glMapNamedBufferEXT()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: add support for glUnmapNamedBufferEXT()

Since the ARB DSA function glUnmapNamedBuffer() is only exposed
for 3.1 or above we make glUnmapNamedBuffer() an alias of
glUnmapNamedBufferEXT() rather than the other way around.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: add support for glCompressedTextureSubImage2DEXT()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: add support for glTextureSubImage2DEXT()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: add support for glMapNamedBufferRangeEXT()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: add support for glNamedBufferStorageEXT

This is available in ARB_buffer_storage when
EXT_direct_state_access is present.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: add support for glNamedBuffer*DataEXT()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: add support for glBindMultiTextureEXT

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa: delete framebuffer texture attachment sampler views

When a context is destroyed the destroy_tex_sampler_cb makes sure that all the
sampler views created by that context are destroyed.
This is done by walking the ctx->Shared->TexObjects hash table.

In a multiple context environment the texture can be deleted by a different context,
so it will be removed from the TexObjects table, which prevents the above mechanism
from working.
This can result in an assertion in st_save_zombie_sampler_view because the
sampler_view owns a reference to a destroyed context.

This issue occurs in blender 2.80.

This commit fixes this by explicitly releasing sampler_view created by the destroyed
context for all texture attachments.

Fixes: 593e36f956 (st/mesa: implement "zombie" sampler views (v2))
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110944
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

meson: GNU/kFreeBSD has DRM/KMS and requires -D_GNU_SOURCE

This is a regression from the old autotools build system.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>

gallium/u_transfer_helper: Don't leak a reference to the resource.

We pipe_resource_reference when handling transfers in map, we need to
do a corresponding unreference in unmap.
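
The pairing described above can be sketched in a few lines. This is a toy model, not the u_transfer_helper code: toy_resource, toy_transfer, toy_reference, toy_map and toy_unmap are hypothetical names, and toy_reference only imitates the idea of pipe_resource_reference (it never frees anything).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted resource; not the gallium struct. */
struct toy_resource {
   int refcount;
};

/* Hypothetical transfer that keeps its resource alive while mapped. */
struct toy_transfer {
   struct toy_resource *res;
};

/* Point *dst at src and adjust both refcounts. Unlike the real helper,
 * this sketch does not destroy the old resource when it reaches zero. */
static void toy_reference(struct toy_resource **dst, struct toy_resource *src)
{
   if (src)
      src->refcount++;
   if (*dst)
      (*dst)->refcount--;
   *dst = src;
}

static void toy_map(struct toy_transfer *t, struct toy_resource *res)
{
   t->res = NULL;
   toy_reference(&t->res, res);   /* take a reference for the mapping */
}

static void toy_unmap(struct toy_transfer *t)
{
   toy_reference(&t->res, NULL);  /* drop it again; omitting this leaks */
}
```

After a toy_map/toy_unmap pair the refcount is back at its starting value; without the call in toy_unmap it would stay one too high for every transfer.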

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

meson: only add empty lines between active summary sections

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

meson: bump required libdrm version to 2.4.81

dbb4457d9858fa977246 started using drmDevicesEqual(), which was
introduced in libdrm 2.4.81

We could either copy the function locally, or bump the required version.
Since the function is non-trivial and 2.4.81 is old enough already,
I suggest the latter.

Fixes: dbb4457d9858fa977246 ("egl: add EGL_EXT_device_drm support")
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

ac: change ac_query_gpu_info() signature

Currently libdrm_amdgpu provides a typedef of the various handles. While
the goal was to make those opaque, it effectively became part of the API

To the best of my knowledge there are two ways to have opaque handles:
- "typedef void *foo;" - rather messy IMHO
- "struct foo;" and use "struct foo *" through the API
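
A minimal sketch of the two options, for illustration only. Apart from the name foo, everything here (foo_void_handle, foo_create, foo_value, foo_destroy and the struct layout) is hypothetical and has nothing to do with the libdrm_amdgpu API.

```c
#include <assert.h>
#include <stdlib.h>

/* Option 1: "typedef void *foo;". Any object pointer converts to this
 * type silently, so passing the wrong handle is not diagnosed. */
typedef void *foo_void_handle;

/* Option 2: forward-declare the struct and pass "struct foo *" around.
 * Callers cannot see the layout, yet the pointer type is still checked. */
struct foo;

struct foo *foo_create(int value);
int foo_value(const struct foo *f);
void foo_destroy(struct foo *f);

/* In a real library this definition would live in a private .c file. */
struct foo {
   int value;
};

struct foo *foo_create(int value)
{
   struct foo *f = malloc(sizeof(*f));
   if (f)
      f->value = value;
   return f;
}

int foo_value(const struct foo *f)
{
   return f->value;
}

void foo_destroy(struct foo *f)
{
   free(f);
}
```

With option 2, passing an unrelated pointer such as an int * to foo_value() is diagnosed by the compiler, whereas a void * handle would accept it.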

In our case amdgpu_device_handle is used only internally, plus
respective code is not used or applicable for r300 and r600. Hence we
copied the typedef.

Seemingly this will be a problem since libdrm_amdgpu wants to change the
API, while not updating the code(?).

Either way, we can safely s/amdgpu_device_handle/void */ and carry on.

Cc: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak at amd.com>

panfrost: Only tag AFBC addresses when sampling

Rendering to AFBC was broken, as the HW will complain loudly if we pass
a tagged pointer in bifrost_render_target.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Fixes: 3609b50a6443 ("panfrost: Merge AFBC slab with BO backing")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

gallivm: Improve lp_build_rcp_refine.

Use the alternative more accurate expression from
https://en.wikipedia.org/wiki/Division_algorithm#Newton%E2%80%93Raphson_division

v2: Use lp_build_fmuladd as suggested by Roland

Tested by enabling this code path, and running lp_test_arit.
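
For reference, the linked article gives the refinement step as X + X(1 - DX), which is algebraically equal to X(2 - DX). The sketch below assumes that this is the expression meant here; rcp_refine is a hypothetical scalar helper and is not the gallivm code. Each of its two statements has the multiply-then-add shape that a fused multiply-add (lp_build_fmuladd in gallivm) can evaluate in one operation.

```c
#include <assert.h>

/* One Newton-Raphson step for 1/a, given a current estimate x:
 *    e  = 1 - a*x      (residual of the estimate)
 *    x' = x + x*e      (corrected estimate)
 * Plain float arithmetic is used so the sketch needs no math library. */
static float rcp_refine(float a, float x)
{
   float e = 1.0f - a * x;
   return x + x * e;
}
```

Starting from the rough estimate 0.3 for 1/3, one step gives about 0.33 and a second step about 0.3333, so the error shrinks roughly quadratically per step.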

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

panfrost/ci: Don't error out on RK3288

At the moment we don't have enough people to ensure that RK3288 is
regression-free, so don't fail the CI in that case.

For now we'll focus on not regressing on RK3399 and we can expand to
other SoCs as more people join the effort.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Suggested-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost/ci: Don't print every kernel file

As there's lots of them and Gitlab struggles rendering logs with so many
lines.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost/ci: Fix the image name

These changes will make sure we get the right image from the container
registry.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost/ci: Remove batching

Panfrost has grown and doesn't leak as much as before.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

iris: Don't leak resources in iris_create_surface for incomplete FBOs

We were failing to pipe_resource_unreference on the failure path due
to a non-renderable format. Instead of fixing this, just move the
checks earlier, before we even bother with refcounting or calloc.
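
The "check first, acquire later" ordering can be sketched as follows. This is a toy model with hypothetical names (toy_resource, toy_surface, toy_create_surface, toy_destroy_surface), not the iris implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted resource with a precomputed format check. */
struct toy_resource {
   int refcount;
   bool renderable;
};

struct toy_surface {
   struct toy_resource *res;
};

static struct toy_surface *toy_create_surface(struct toy_resource *res)
{
   /* Validate before acquiring anything, so this failure path has
    * nothing to clean up. */
   if (!res->renderable)
      return NULL;

   struct toy_surface *surf = calloc(1, sizeof(*surf));
   if (!surf)
      return NULL;

   res->refcount++;               /* reference taken only on success */
   surf->res = res;
   return surf;
}

static void toy_destroy_surface(struct toy_surface *surf)
{
   surf->res->refcount--;
   free(surf);
}
```

Because the format check runs before the allocation and the reference, the early return cannot leak either of them.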

radv: only enable VK_AMD_gpu_shader_{half_float,int16} on GFX9+

These two extensions are supported on GFX8 but the throughput
of 16-bit floats/integers is the same as 32-bit. Also, shaderInt16
is only enabled on GFX9+ for the same reason, so be more consistent.

This fixes a crash with Wolfenstein II because it expects
shaderInt16 to be enabled when VK_AMD_gpu_shader_half_float is
exposed. Note that AMDVLK only enables these extensions on GFX9+.

Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: add si_emit_ia_multi_vgt_param() helper

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

virgl: Don't allow creating staging pipe_resources

Staging buffers are now created directly by the virgl_staging_mgr. We
don't need to support creating staging pipe_resources.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>

virgl: Use virgl_staging_mgr

Use an instance of virgl_staging_mgr instead of u_upload_mgr to handle
the staging buffer. This removes the need to track the availability
of the staging manager, since virgl_staging_mgr can handle concurrent
active allocations.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>

virgl: Add tests for virgl_staging_mgr

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>

virgl: Introduce virgl_staging_mgr

Add a manager for the staging buffer used in virgl. The staging manager
is heavily inspired by u_upload_mgr, but is simpler and is a better fit
for virgl's purposes. In particular, the staging manager:

* Allows concurrent staging allocations.
* Calls the virgl winsys directly to create and map resources, avoiding
  unnecessarily going through gallium resources and transfers.

olv: make virgl_staging_alloc_buffer return a bool

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>

virgl: Store the virgl_hw_res for copy transfers

Store the virgl_hw_res instead of the pipe_resource for copy transfer
sources. This prepares the codebase for a change to provide only the
virgl_hw_res for the staging buffers in upcoming commits.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>

iris: Fix major resource leak in iris_set_shader_images

We were failing to unreference the old image resource. Instead of open
coding this and doing it badly, just use the copier function which does
the right thing.

gallium: Make util_copy_image_view handle shader_access

A while back, we added a new field, but failed to update the copier.
I believe iris is the only current user of the new field, and it hasn't
used the copier, so no one noticed.

Fixes: 8b626a22b24 st/mesa: Record shader access qualifiers for images
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

gallium: Teach GALLIUM_REFCNT_LOG about array textures

Otherwise they are classified as pipe_martian_resource, and don't
contain any helpful information about the texture.

Reviewed-by: Eric Anholt <eric@anholt.net>

isl: Don't align phys_level0_sa by block dimension

Aligning phys_level0_sa by the compression block dimension prior to
mipmap layout causes the layout of compressed surfaces to differ from
the sampler's expectations in certain cases. The hardware docs agree:

From the BDW PRM, Vol. 5, Compressed Mipmap Layout,

   The compressed mipmaps are stored in a similar fashion to
   uncompressed mipmaps [...]

   The following exceptions apply to the layout of compressed (vs.
   uncompressed) mipmaps:
      * [...]
      * The dimensions of the mip maps are first determined by applying
        the sizing algorithm presented in Non-Power-of-Two Mipmaps
        above. Then, if necessary, they are padded out to compression
        block boundaries.

The last bullet indicates that alignment should not be done for
calculating a miplevel's dimensions, but rather for determining miplevel
placement/padding. Comply with this text by removing the extra
alignment.
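
A small worked example of why the order matters. This is a sketch with hypothetical helpers (align_up, minify and the two level_width_* functions), not the isl code, and it assumes the sizing rule is the usual max(1, size >> level).

```c
#include <assert.h>

/* Round v up to the next multiple of a (a > 0). */
static unsigned align_up(unsigned v, unsigned a)
{
   return (v + a - 1) / a * a;
}

/* Size of mip `level` for a given level-0 size, clamped to 1.
 * Assumed here to match the non-power-of-two sizing rule. */
static unsigned minify(unsigned size0, unsigned level)
{
   unsigned s = size0 >> level;
   return s ? s : 1;
}

/* Order described by the quoted text: size the level, then pad it. */
static unsigned level_width_minify_then_pad(unsigned w0, unsigned level,
                                            unsigned block_w)
{
   return align_up(minify(w0, level), block_w);
}

/* Order with the extra alignment: pad level 0, then size the level. */
static unsigned level_width_pad_then_minify(unsigned w0, unsigned level,
                                            unsigned block_w)
{
   return align_up(minify(align_up(w0, block_w), level), block_w);
}
```

For a 9-pixel-wide surface with 4-pixel-wide blocks, level 1 is 4 pixels (one block) when sized first, but 8 pixels (two blocks) when level 0 is aligned to 12 beforehand.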

Fixes some fbo-generatemipmap-formats piglit failures on all tested
platforms (SNB-KBL).

v2:
- Note fixed platforms.
- Update some consumers via a helper function.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel: Add and use helpers for level0 extent

Prepare for a bug fix by adding and using helpers which convert
isl_surf::logical_level0_px and isl_surf::phys_level0_sa to units of
surface elements.

v2:
- Update iris (Ken).
- Update anv.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

meson: try to use cmake as a finder for clang

Clang (like LLVM), very annoyingly refuses to provide pkg-config, and
only provides cmake (unlike LLVM which at least provides llvm-config,
even if llvm-config is terrible). Meson has gained the ability to use
cmake to find dependencies, and can successfully find Clang. This change
attempts to use cmake to find clang instead of a bunch of library
searches, when paired with -Dcmake_prefix_path we can much more reliably
use cmake to control which clang we're getting. This is only enabled for
meson >= 0.51, which adds the required options.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

meson: Add support for using cmake for finding LLVM

Meson has support for using cmake as a finder for some dependencies,
including LLVM. Using cmake has a lot of advantages: it needs less meson
maintenance to keep working (even for llvm updates); it works more
sanely for cross compiles (as llvm-config is a compiled binary not a
shell script). Meson 0.51.0 also has a new generic variable getter that
can be used to get information from either cmake, pkg-config, or
config-tools dependencies, which is needed for cmake. We continue to
support using llvm-config if you don't have cmake installed, or if cmake
cannot find a suitable version.

Fixes: 0d59459432cf077d768164091318af8fb1612500
("meson: Force the use of config-tool for llvm")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

iris: Fix memory leak of SO targets

We need to pitch these on context destroy.

iris: Fix memory leak for draw parameter resources

Need to pitch these on context destroy.

iris: Drop u_upload_unmap

We use persistent maps so this does nothing.

intel/compiler: fix derivative on y axis implementation

This rewrites the ddy in EXECUTE_4 mode with a loop to make it more
obvious what is going on, and also sets the group that each of the
4 threads in the group is supposed to execute.

Fixes the following CTS tests :

dEQP-VK.glsl.derivate.dfdyfine.dynamic_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2134ea380033d5 ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")

meson: set up a proper internal dependency for xmlconfig

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

xmlconfig: add missing #include

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>