git.libre-soc.org Git - mesa.git/log

bifrost: Set RTZ rounding mode for f2i conversion

Fixes dEQP-GLES2.functional.shaders.conversions.scalar_to_scalar.float_to_int_fragment

Signed-off-by: Chris Forbes <chrisforbes@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5779>

tu: Enable KHR_variable_pointers

Passes dEQP-VK.spirv_assembly.instruction.graphics.variable_pointers.*
and dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.*

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5684>

tu: Rewrite variable lowering

Don't lower to offsets, instead use nir_lower_explicit_io here and
use actual pointers for UBO's and SSBO's. This makes
KHR_variable_pointers trivial. This also fixes asserts with shared
variables, which are now supposed to be lowered with
nir_lower_explicit_io.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5684>

anv: garbage collect timeline semaphore when querying value

If we don't garbage collect the timeline, the value never progresses
even though work completed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3226
Fixes: 34f32a6d664807 ("anv: implement VK_KHR_timeline_semaphore")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5774>

v3d: Enable perpendicular line caps when line smoothing

V3D has a bit to set the line caps to be perpendicular to the line
rather than aligned to the edges of the framebuffer. I don’t know what
the disadvantages are of enabling this, but I noticed by experimentation
that enabling line smoothing on the Intel driver also enables nicer line
caps, so it seems nice to enable it here too.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>

v3d: Add a lowering pass for line smoothing

When line smoothing is enabled, the driver now increases the width of
the line so that it can add some semi-transparent pixels to either side
of the line. A lowering pass is added which modifies the alpha component
of every write to fragment output 0 so that if the fragment is outside
the width of the line then the alpha is reduced. It additionally
discards fragments that are completely invisible. It might seem bad to
use discard on a tiled renderer but the assumption is that any bad
effects from using discard will also happen anyway because of enabling
alpha blending.

v2: Disable the line smoothing pass entirely when the framebuffer
    contains an integer colour output or one with no alpha channel.
    Calculate the coverage once upfront and store in a global variable
    instead of calculating each time an output write is modified. Also
    do the conditional discard once upfront.
v3: Don’t check whether the output buffer has an alpha channel. Only
    look at output 0. Use aa_line_width intrinsic instead of calculating
    the real line width in the shader. Clamp the coverage as part of the
    global variable, not per output write.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>

v3d: Handle the line width intrinsics

Adds new QUNIFORMs to store the line widths.

v2: Also handle the aa_line_width intrinsic

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>

nir: Add intrinsics for the line width

The first intrinsic is intended to expose the value set by glLineWidth
to shaders internally. The second intrinsic exposes the value actually
sent to the hardware. This may be wider than the first one in order to
implement anti-aliasing. These will be used in later patches to
implement a line smoothing lowering pass.

v2: Add a second intrinsic for the expanded line width for
anti-aliasing.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>

v3d: Implement the line coord intrinsic

The line coord intrinsic is loaded from the implicit varying stored in
the same slot as the point coord when drawing lines.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>

compiler: Add a system value for the line coord

The line coord is a coordinate along the axis perpendicular to the line.
It is in the range [0,1] between the two edges of the line. It is
available at least on Broadcom hardware.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>

intel/perf: move query_mask and location out of gen_perf_query_counter

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5399>

iris: remove iris_monitor_config

perf_cfg is enough - it already contains almost all necessary
information and is constructed in a more optimal way (O(n) vs O(n^2)
- it uses hash table to build the unique counter list).

"Almost all", because it doesn't contain OA raw counters, but
we should have not exposed them anyway. Quoting Mark Janes:
"I see no reason to include the OA raw counters in the list that
are provided to the user. They are unusable.
The MDAPI library can be used to configure raw counters in a way
that provides esoteric metrics, but that library is written against
INTEL_performance_query."

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5399>

a4xx: hook up centroid ij coords

This is necessary now that the compiler respects centroid interpolation,
even in non-MSAA mode. Otherwise the interpolation doesn't work. Fixes a
bunch of dEQP centroid transform feedback tests.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5778>

nir: Add docs to nir_lower[_explicit]_io

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>

nir: Remove shared support from lower_io

No drivers are using this anymore so we can delete it and not keep
maintaining this legacy code-path. If any drivers want this in the
future, they should use nir_lower_varst_to_explicit_types followed by
nir_lower_explicit_io.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>

nir: Assert that nir_lower_io is only called with allowed modes

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>

panfrost: Only call nir_lower_io on shader_in/out

Gallium drivers should never see nir_var_uniform because gallium lowers
regular uniforms to a UBO. No GL driver should ever see either
nir_var_mem_shared because that's lowered in GLSL IR.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>

v3d: Only call nir_lower_io on shader_in/out

Gallium drivers should never see nir_var_uniform because gallium lowers
regular uniforms to a UBO. No GL driver should ever see either
nir_var_mem_shared because that's lowered in GLSL IR.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>

vc4: Only call nir_lower_io on shader_in/out

Gallium drivers should never see nir_var_uniform because gallium lowers
regular uniforms to a UBO. No GL driver should ever see either
nir_var_mem_shared because that's lowered in GLSL IR.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>

nouveau: Only call nir_lower_io on shader_in/out

Gallium drivers should never see nir_var_uniform because gallium lowers
regular uniforms to a UBO. No GL driver should ever see either
nir_var_mem_shared because that's lowered in GLSL IR.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>

lima: Only call nir_lower_io on shader_in/out

Gallium drivers should never see nir_var_uniform because gallium lowers
regular uniforms to a UBO. No GL driver should ever see either
nir_var_mem_shared because that's lowered in GLSL IR.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>

freedreno: Only call nir_lower_io on shader_in/out

Gallium drivers should never see nir_var_uniform because gallium lowers
regular uniforms to a UBO. No GL driver should ever see either
nir_var_mem_shared because that's lowered in GLSL IR.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5418>

ir3: mark ucp_enables as allowed values on all keys

Both vertex and fragment shaders need to have the lowering.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5751>

etnaviv: replace prims-emitted query

As we do not support stream output buffers we only count the primitives
processed by the pipeline. Use the correct query type.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5754>

a4xx: add polygon offset clamp, fix units

For some reason, in order to get all tests to pass, pretty much all
hardware (across vendors) has to program in offset_units * 2. This fixes
dEQP-GLES3.functional.polygon_offset.float32_displacement_with_units.

While we're at it, add polygon offset clamp support.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5763>

a4xx: add noperspective interpolation support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5753>

nir: add vec2_index_32bit_offset address format

For turnip, we use the "bindless" model on a6xx. Loads and stores with
the bindless model require a bindless base, which is an immediate field
in the instruction that selects between 5 different 64-bit "bindless
base registers", a 32-bit descriptor index that's added to the base, and
the usual 32-bit offset. The bindless base usually, but not always,
corresponds to the Vulkan descriptor set. We can handle the case where
the base is non-constant by using a bunch of if-statements, to make it a
little easier in core NIR, and this seems to be what Qualcomm's driver
does too. Therefore, the pointer format we need to use in NIR has a vec2
index, for the bindless base and descriptor index. Plumb this format
through core NIR.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>

nir: Refactor load/store intrinsic helper

Add the possibility to specify the source components. This is necessary
to let the UBO/SSBO index have more than one component, and it also lets
us remove a few hand-rolled load intrinsic definitions.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>

freedreno/regs: document SS6_UBO state src

Document this new a6xx_state_src value seen in A640/A650 tess traces.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5760>

freedreno/fdperf: prefer render node

Avoid inadvertantly becoming master if fdperf happens to be the first
thing to open the device.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5762>

freedreno/fdperf: better compatible string matching

Previously we would match the start of the compatible string, in
a couple of cases, in order to match compatible strings like
"qcom,adreno-630.2". But these cases would always list a more
generic compatible (ie. "qcom,adreno") as a later choice. So if
we parse all the compatible strings, we can do a more precise
exact match.

This avoids us accidentially matching on "qcom,adreno-smmu" and
the hilarity that ensues.

Fixes: 5a13507164a ("freedreno/perfcntrs: add fdperf")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5762>

freedreno/fdperf: fix print of base address

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5762>

wsi/x11: Log swapchain status changes

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5672>

vulkan/wsi: Don't consider VK_SUBOPTIMAL_KHR to be an error condition

This was causing vkAcquireNextImageKHR to not signal the fences and
semaphores. In the case where the semaphore was brand new, this could
cause an unsignalled syncobj to be passed into execbuffer2 which it will
reject with -EINVAL leading to VK_ERROR_DEVICE_LOST. Thanks to Henrik
Rydgård who works on the PPSSPP project for helping me figure this out.

Fixes: ca3cfbf6f1e00 "vk: Add an initial implementation of the actual..."
Fixes: 778b51f491cfe "vulkan/wsi: Add a hooks for signaling semaphores..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5672>

Revert "radv: add support for MRTs compaction to avoid holes"

This reverts commit 7a5e6fd25f2e132ef4cacc3a5b714c4e153227b0.

Since we have two different users bisecting issues to this commit, let's
revert.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 7a5e6fd25f2 "radv: add support for MRTs compaction to avoid holes"
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3202
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3228
(Other report in https://gitlab.freedesktop.org/mesa/mesa/-/issues/3151#note_558589)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5758>

radv: Always enable PERFECT_ZPASS_COUNTS.

We have an issue with early depth testing and discard, where
non-perfect counts count the tile if the early depth test succeeds.

We could spend a lot of effort to set this conditionally based
on the presence of the two conditions, but in the presence of
inherited queries let's try this first.

Changing PERFECT_ZPASS_COUNTS since I'm pretty sure this has a lower
performance impact than always using late depth testing.

CC: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3218
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5757>

radv: Set handle types in Android semaphore/fence import.

Seems like we forgot to set it all this time ...

Fixes: b1444c9ccb0 "radv: Implement VK_ANDROID_native_buffer."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5759>

radv: disable FMASK compression when drawing with GENERAL layout

Fixes: 96063100 "radv: enable shaderStorageImageMultisample feature on GFX8+"
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3219
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/855
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3165>

Revert "nir: Support sysval tess levels in SPIR-V to NIR"

This reverts commit d2d4677b56efa0003065b61e39c1ef977c83f7da.

The option is not used by any driver.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5744>

Revert "nir: Add an option for lowering TessLevelInner/Outer to vecs"

This reverts commit d2df0761200ba9680f0d22defaa02c33fb051fcf.

The option is not used by any driver.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5744>

freedreno/ir3: fix/rework tess levels

The previous version assumes tess level outputs will only be written once
in the shader, however its not possible to guarantee that.

It also assumes all invocations will write all the levels, which is also
not guaranteed.

This is required to fix the "tesselation" and "terraintessellation" demos
with turnip.

The comment about nir_lower_io_to_temporaries in lower_tess_ctrl_block is
removed because nir_lower_io_to_temporaries specifically skips TESS_CTRL
shaders so the comment doesn't make sense.

The split load for tess levels workaround is removed, the new version only
has scalar access unless if ever gets vectorized.

This sets NIR_COMPACT_ARRAYS cap to avoid the glsl tess vec lowering with
gallium. It seems this will also disable "LowerCombinedClipCullDistance",
which I'm not sure was needed or not.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5744>

freedreno/layout: fix explicit layout offset not added to slice offset

Accidentally broke this when rebasing the offending commit.

My use case with non-zero explicit offset is UV plane of UBWC NV12, and
only the UBWC slice offset is used for the UBWC sampler, so I didn't catch
it immediately.

Fixes: d53dc6c37680eba8e8 ("freedreno/fdl6: rework layout code a bit (reduce linear align to 64 bytes)")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5761>

amd/addrlib: fix another C++ one definition rule violation

Clashes with the SI definition.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3116
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5673>

iris: return max counter value for AMD_performance_monitor

glGetPerfMonitorCounterInfoAMD(..., ..., GL_COUNTER_RANGE_AMD, ...)
returned NAN (binary representation of uint64_t(-1) as float) as
a max value.

Fixes: 0fd4359733e6 ("iris/perf: implement routines to return counter info")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5473>

st/mesa: fix reporting of float perf counters max value

Some Piglit tests (rightfully) fail because of min >= max when exposed
to perf counters that do not explicitly define their max value.

Failing tests:
spec/amd_performance_monitor/api/test_counter_info
spec/amd_performance_monitor/vc4/test_counter_info

u32/u64 changes are no-ops.

Fixes: 4cd1cfb9831d ("st/mesa: implement GL_AMD_performance_monitor")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5473>

llvmpipe: enable GL 4.2

mostly just docs patch, features were all complete already

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5724>

llvmpipe: bump to GL support to GL 4.1

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5724>

llvmpipe: bump texture/scene limits to enable GL 4.1

Do we need to make this more dynamic? or have some options for vmware
embedded?

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5724>

mesa/version: only enable GL4.1 with correct limits.

I haven't tested all the limits, but these two should be enough
for driver writers to realise.

I've also submitted a minmax test for piglit to test this.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5727>

turnip: enable 420_UNORM formats

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4600>

turnip: support multi-image layouts

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4600>

turnip: clear_blit: pass aspect mask to setup function

Avoids having to duplicate logic to figure out the write mask on D24S8

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4600>

st/mesa: allow R8 to not be exposed as renderable by driver

A3xx GPUs support RG8 and RGBA8, but not R8 for rendering. Add RG8 as
fallbacks for integer formats, and require a renderable format to be
picked for all R8 variants.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5748>

mesa/glformats: make _mesa_gles_error_check_format_and_type() more consistent

Let's consistently use the following code format instead of relying on
falling through to `default`:

    if (!req)
       return GL_INVALID_OPERATION;
    break;

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5729>

drirc: Add picom to adaptive_sync exclusion list

The compton compositor is unmaintained, with a new fork named picom taking
its place. As with the other compositors (including compton), adaptive
sync should not be enabled.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5740>

turnip: fix tess param bo size calculation

ir3 already calculates the stride in the tess param bo, so use that instead
of a incorrect calculation. The calculation of per_vertex_output_size /
per_patch_output_size is wrong because it counts dwords instead of bytes,
and what it counts for per_vertex_output_size is a per-patch size because
the glsl type is already an array of # vertex/patch elements.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5743>

nir: Add nir_lower_clip_disable.c to SCons build.

Fixes: fb2fe802f638 ("nir: add lowering pass for clip plane enabling")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3217
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5741>

gitlab-ci: Enable -Werror in `meson-classic` job

It's warning-clean.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5730>

nouveau: fix pointer-sign warning

Fixes: e630271e0ec3 ("mesa: don't ever set NullBufferObj in gl_vertex_array_binding")
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5730>

util: Avoid strict aliasing bugs in xxhash.

XXH32 is doing access through u32 *, and with strict aliasing the compiler
gets to assume that those are independent of the u16 writes we did in
fd6_texture_key setup, and based on various tweaks to the code, would
result in bad hashes computed after inlining. The failure was:

../src/util/hash_table.c:326:_mesa_hash_table_search_pre_hashed: Assertion
`ht->key_hash_function == ((void *)0) || hash == ht->key_hash_function(key)'
failed.)

By setting these two flags, we always take the unaligned,
memcpy-the-32-bit-data path. I believe this should be same perf on x86
(which will happily unaligned load 32 bits in the end), while it will be
slower on arm (where you have to a special unaligned load operation iirc).
This should still be far faster than our old hash.

Fixes: edd62619a1c4 ("freedreno: replace fnv1a hash function with xxhash")
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5271>

draw/clip: fix viewport index for geometry shaders

The old code updated the viewport index on the first vertex in
a primitive, however it was picking the first vertex wrong
when used with geometry shaders.

This code has access to the prim info with the primitive lengths
so instead keep track of when a new primitive starts by tracking
the lengths and updating the viewport index then. The prim info
is only valid after a GS or prim assembly, so enable prim assembly
if a vertex shader ever uses viewport index.

This fixes:
piglit arb_viewport_array-render-viewport-2
KHR-GLES31.core.viewport_array.draw_to_single_layer_with_multiple_viewports,Fail
KHR-GLES31.core.viewport_array.draw_mulitple_viewports_with_single_invocation,Fail
KHR-GLES31.core.viewport_array.draw_multiple_layers,Fail
KHR-GLES31.core.viewport_array.depth_range,Fail

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5489>

draw/clip: cleanup viewport index handling code.

This moves code around, and adds initial clamping

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5489>

turnip: vsc improvements

* Remove scratch_bo from cmdbuffer, use a device-global bo instead, which
  also includes border color (and eventually shaders for 3D blit path)
* Use CP_SET_BIN_DATA5_OFFSET to allow setting VSC buffer addresses only
  once at the start of the cmdstream
* Use scratch bo mechanism for a resizable VSC buffer
* Use feedback from "vsc_draw_overflow" and "vsc_prim_overflow" values to
  increase the size of VSC buffer when beginning to record a new cmdbuffer

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5570>

turnip: rework render_tiles loop

Loop through pipes and then loop over the tiles in that pipe instead of
looping over all tiles then having to calculate the pipe # and slot #.

Mainly this avoids the hard to follow "config_get_tile" logic, but should
also be a gain due to better use of cache with the VSC data.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5570>

turnip: make tiling config part of framebuffer state

Compute the tiling config at framebuffer creation time. A framebuffer will b
be re-used multiple times, so this will avoid having to re-calculate the
tiling config every time a command buffer is recorded.

The tiling config already couldn't use the render area's x1/y1 because of
hw binning, this move makes it so the render area isn't used at all for the
tiling config.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5570>

Revert "loader/dri3: Check for window destruction in dri3_wait_for_event_locked"

This reverts commit d7d7687829875e401690219d4a72458fb2bbe4de.

It caused freezes with e.g. kwin_x11 due to hitting the 1s timeout.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3214
Reopens: https://gitlab.freedesktop.org/mesa/mesa/-/issues/116
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5722>

meson: Add versioning for xvmc tracker

The xvmc tracker used to be versionned with autotool but this seems to have been
lost in the meson switch.

Fixes: 22a817af8a89eb3c762f ("meson: build gallium xvmc state tracker")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Emmanuel Vadot <manu@FreeBSD.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5708>

st/program: use nir_lower_clip_disable instead of nir_lower_clip_vs conditionally

if the shader already outputs gl_ClipDistance, nir_lower_clip_vs will create
duplicate variables when what we want is to just change the existing values

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5529>

nir: add lowering pass for clip plane enabling

a pass which rewrites gl_ClipDistance[n] to an undef if the corresponding
clip plane is disabled in the rasterizer state

this pass is needed for zink to handle api disables of clip planes

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5529>

v3d/tex: handle correctly coordinates for cube/cubearrays images

When fetching for cube maps, we need to interpret them as 2d texture
arrays, being the third coordinate the index for the face.

Fixes Vulkan CTS tests like the following using v3dv:

dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_image.fragment.single_descriptor.cube_base_mip
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_image.compute.multiple_descriptor_sets.multiple_contiguous_descriptors.cube_array_base_mip

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5675>

CI: reduce bandwidth for git pull

Over the last 7 days, git pulls represented a total of 1.7 TB.

On those 1.7 TB, we can see:
- ~300 GB for the CI farm on hetzner
- ~730 GB for the CI farm on packet.net
- ~680 GB for the rest of the world

We can not really change the rest of the world*, but we can
certainly reduce the egress costs towards our CI farms.

Right now, the gitlab runners are not doing a good job at
caching the git trees for the various jobs we make, and
we end up with a lot of cache-misses. A typical pipeline
ends up with a good 2.8GB of git pull data. (a compressed
archive of the mesa folder accounts for 280MB)

In this patch, we implemented what was suggested in
https://gitlab.com/gitlab-org/gitlab/-/issues/215591#note_334642576

- we host a brand new MinIO server on packet
- jobs can upload files on 2 locations:
  * git-cache/<namespace>/<project>/<branch-name>.tar.gz
  * artifacts/<namespace>/<project>/<pipeline-id>/
- the authorization is handled by gitlab with short tokens
  valid only for the time of the job is running
- whenever a job runs, the runner are configured to execute
  (eval) $CI_PRE_CLONE_SCRIPT
- this variable is set globally to download the current cache
  from the MinIO packet server, unpack it and replace the
  possibly out of date cache found on the runner
- then git fetch is run by the runner, and only the delta
  between the upstream tree and the local tree gets pulled.

We can rebuild the git cache in a schedule job (once a day
seems sufficient), and then we can stop the cache miss
entirely.

First results showed that instead of pulling 280MB of data
in my fork, I got a pull of only 250KB. That should help us.

* arguably, there are other farms in the rest of the world, so
hopefully we can change those too.

Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Peter Hutterer <peter.hutterer@who-t.net>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5428>

tu,radv: fix potentially wrong offset of flexible array.

v2. Remove redundant memset and make the expression simpler.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5703>

meson: turn on Wimplicit-fallthrough project wide

This will help avoid coding errors and allows for less warnings
from some static analysis tools.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

nv30: add missing fallthrough comment

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

mesa: update fallthrough comment so gcc can see it

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

svga: add missing fallthrough comments

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

r300: add and fix up fallthrough comments

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

mesa: fix unintended fallthrough in glIsEnabled()

Fixes: 08fae07f5246 ("mesa: Handle GL_TEXTURE_GEN_STR_OES in _mesa_Enable()")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

mesa: add missing fallthrough comment to teximage.c

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

mesa/vbo: add some missing fallthrough comments

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

spirv: add missing fallthrough comments

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

radeon: add missing fallthrough comments

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

glsl: move fallthrough comment to where gcc can see it

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

glx: add missing fallthrough comment

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

radeonsi: add missing fallthrough comment

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

mesa: add fallthrough comments to COPY_SZ_4V()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

nir: fix implicit fallthrough warnings

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

mesa: add fallthrough comments to get.c

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

mesa: add fallthrough comments to glformats.c

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

mesa: fix fallthrough in glformats

Before 908f817918fb this would fallthrough to GL_INVALID_OPERATION
if the validation condition was not met. But since that change it
will now only return GL_INVALID_OPERATION if
!_mesa_has_EXT_texture_compression_bptc(ctx) is true. This seems
unintended.

Here we fix up the fallthrough and add the fallthrough comment so
this doesn't happen again.

Fixes: 908f817918fb ("mesa: expose EXT_texture_compression_bptc in GLES")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3005
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5705>

nir/algebraic: Don't distrubte absolute-value into dot-products

Dot product is multiplication followed by addition, and absolute value
does not distribute into addition.

Only vec4 platforms are affected by this change as scalar-only platforms
never have any of the fdot_replicated instructions.  In the shader-db
results, below, shaders in MANY different applications are affected.
Trine, Doom3, Enemy Territory: Quake Wars, Counter Strike: Global
Offensive, Mad Max, Metro Last Light, and on and on...  I'm really
shocked that there were no test regressions!

All Haswell and earlier platforms had similar results. (Haswell shown)
total instructions in shared programs: 16219743 -> 16219820 (<.01%)
instructions in affected programs: 12171 -> 12248 (0.63%)
helped: 1
HURT: 78
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.78% max: 0.78% x̄: 0.78% x̃: 0.78%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.35% max: 2.38% x̄: 0.91% x̃: 1.06%
95% mean confidence interval for instructions value: 0.92 1.03
95% mean confidence interval for instructions %-change: 0.78% 1.00%
Instructions are HURT.

total cycles in shared programs: 538481383 -> 538491045 (<.01%)
cycles in affected programs: 470796 -> 480458 (2.05%)
helped: 149
HURT: 142
helped stats (abs) min: 1 max: 1338 x̄: 71.13 x̃: 4
helped stats (rel) min: 0.06% max: 40.99% x̄: 2.76% x̃: 0.67%
HURT stats (abs)   min: 1 max: 2092 x̄: 142.68 x̃: 12
HURT stats (rel)   min: 0.07% max: 55.38% x̄: 5.07% x̃: 1.07%
95% mean confidence interval for cycles value: -5.28 71.69
95% mean confidence interval for cycles %-change: -0.07% 2.19%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 62795475e8f ("nir/algebraic: Distribute source modifiers into instructions")
Closes: #3129
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5581>

ci: Disable pixmark-piano trace on a630 due to GPU hangs.

I haven't reproduced it with just this trace in a loop locally, but it's
blocked some CI jobs with hangs where a few tiles didn't get
rendered. For example:

https://gitlab.freedesktop.org/mesa/mesa/-/jobs/3314062

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5667>

pan/mdg: Schedule based on liveness

By estimating liveness in the scheduler and choosing instructions likely
to reduce register pressure, on average we can decrease pressure given a
sufficiently larger window. On the other hand, decreasing pressure
instead of leaning too heavily on the search window enables us to use a
much larger search window without inflating pressure too much. So by
doing both in lockstep, we benefit pretty well.

total instructions in shared programs: 49458 -> 48540 (-1.86%)
instructions in affected programs: 26931 -> 26013 (-3.41%)
helped: 221
HURT: 15
helped stats (abs) min: 1 max: 36 x̄: 4.37 x̃: 2
helped stats (rel) min: 0.31% max: 16.90% x̄: 4.97% x̃: 3.85%
HURT stats (abs)   min: 1 max: 4 x̄: 3.13 x̃: 3
HURT stats (rel)   min: 0.50% max: 7.14% x̄: 4.53% x̃: 4.55%
95% mean confidence interval for instructions value: -4.65 -3.13
95% mean confidence interval for instructions %-change: -4.94% -3.81%
Instructions are helped.

total bundles in shared programs: 25199 -> 23446 (-6.96%)
bundles in affected programs: 21600 -> 19847 (-8.12%)
helped: 277
HURT: 170
helped stats (abs) min: 1 max: 45 x̄: 7.33 x̃: 6
helped stats (rel) min: 1.06% max: 33.83% x̄: 11.01% x̃: 8.57%
HURT stats (abs)   min: 1 max: 6 x̄: 1.63 x̃: 1
HURT stats (rel)   min: 1.19% max: 40.00% x̄: 13.36% x̃: 11.11%
95% mean confidence interval for bundles value: -4.61 -3.23
95% mean confidence interval for bundles %-change: -3.00% -0.49%
Bundles are helped.

total quadwords in shared programs: 40269 -> 39652 (-1.53%)
quadwords in affected programs: 35881 -> 35264 (-1.72%)
helped: 242
HURT: 244
helped stats (abs) min: 1 max: 36 x̄: 4.61 x̃: 3
helped stats (rel) min: 0.39% max: 16.33% x̄: 5.33% x̃: 5.13%
HURT stats (abs)   min: 1 max: 20 x̄: 2.04 x̃: 1
HURT stats (rel)   min: 0.81% max: 21.74% x̄: 7.57% x̃: 6.25%
95% mean confidence interval for quadwords value: -1.71 -0.83
95% mean confidence interval for quadwords %-change: 0.46% 1.82%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total registers in shared programs: 3786 -> 3336 (-11.89%)
registers in affected programs: 2161 -> 1711 (-20.82%)
helped: 262
HURT: 35
helped stats (abs) min: 1 max: 7 x̄: 1.87 x̃: 1
helped stats (rel) min: 6.25% max: 66.67% x̄: 28.91% x̃: 25.00%
HURT stats (abs)   min: 1 max: 3 x̄: 1.11 x̃: 1
HURT stats (rel)   min: 7.69% max: 100.00% x̄: 19.76% x̃: 12.50%
95% mean confidence interval for registers value: -1.70 -1.33
95% mean confidence interval for registers %-change: -25.56% -20.79%
Registers are helped.

total threads in shared programs: 2453 -> 2592 (5.67%)
threads in affected programs: 160 -> 299 (86.87%)
helped: 79
HURT: 6
helped stats (abs) min: 1 max: 2 x̄: 1.85 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
HURT stats (abs)   min: 1 max: 2 x̄: 1.17 x̃: 1
HURT stats (rel)   min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for threads value: 1.45 1.82
95% mean confidence interval for threads %-change: 81.08% 97.75%
Threads are [helped].

total spills in shared programs: 168 -> 17 (-89.88%)
spills in affected programs: 167 -> 16 (-90.42%)
helped: 13
HURT: 0

total fills in shared programs: 186 -> 35 (-81.18%)
fills in affected programs: 186 -> 35 (-81.18%)
helped: 14

HURT: 0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5513>

pan/mdg: Vectorize vlut operations

total instructions in shared programs: 49462 -> 49458 (<.01%)
instructions in affected programs: 348 -> 344 (-1.15%)
helped: 2
HURT: 0

total bundles in shared programs: 25201 -> 25199 (<.01%)
bundles in affected programs: 142 -> 140 (-1.41%)
helped: 2
HURT: 0

total quadwords in shared programs: 40273 -> 40269 (<.01%)
quadwords in affected programs: 244 -> 240 (-1.64%)
helped: 2
HURT: 0

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5513>

pan/mdg: Skip r1.w write where possible

Should help cycle count. Register pressure is spurious here.

total instructions in shared programs: 50501 -> 49517 (-1.95%)
instructions in affected programs: 33342 -> 32358 (-2.95%)
helped: 393
HURT: 0
helped stats (abs) min: 2 max: 3 x̄: 2.50 x̃: 3
helped stats (rel) min: 0.26% max: 33.33% x̄: 11.99% x̃: 9.09%
95% mean confidence interval for instructions value: -2.55 -2.45
95% mean confidence interval for instructions %-change: -13.01% -10.97%
Instructions are helped.

total bundles in shared programs: 25511 -> 25309 (-0.79%)
bundles in affected programs: 7778 -> 7576 (-2.60%)
helped: 202
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.43% max: 20.00% x̄: 5.97% x̃: 4.35%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -6.65% -5.28%
Bundles are helped.

total quadwords in shared programs: 40789 -> 40339 (-1.10%)
quadwords in affected programs: 25453 -> 25003 (-1.77%)
helped: 273
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 1.65 x̃: 2
helped stats (rel) min: 0.16% max: 22.22% x̄: 5.99% x̃: 3.92%
95% mean confidence interval for quadwords value: -1.71 -1.59
95% mean confidence interval for quadwords %-change: -6.68% -5.30%
Quadwords are helped.

total registers in shared programs: 3911 -> 3784 (-3.25%)
registers in affected programs: 275 -> 148 (-46.18%)
helped: 129
HURT: 2
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 14.29% max: 50.00% x̄: 48.69% x̃: 50.00%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%
95% mean confidence interval for registers value: -1.01 -0.93
95% mean confidence interval for registers %-change: -49.45% -44.91%
Registers are helped.

total threads in shared programs: 2455 -> 2455 (0.00%)
threads in affected programs: 0 -> 0
helped: 0

HURT: 0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5513>

pan/mdg: Prioritize non-moves on VADD/VLUT

This helps reduce ALU cycle count.

total instructions in shared programs: 50507 -> 50501 (-0.01%)
instructions in affected programs: 487 -> 481 (-1.23%)
helped: 7
HURT: 3
helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1
helped stats (rel) min: 1.01% max: 8.33% x̄: 4.11% x̃: 4.35%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.54% max: 4.35% x̄: 2.80% x̃: 2.50%
95% mean confidence interval for instructions value: -1.44 0.24
95% mean confidence interval for instructions %-change: -5.12% 1.04%
Inconclusive result (value mean confidence interval includes 0).

total bundles in shared programs: 25640 -> 25511 (-0.50%)
bundles in affected programs: 5879 -> 5750 (-2.19%)
helped: 67
HURT: 7
helped stats (abs) min: 1 max: 16 x̄: 2.04 x̃: 1
helped stats (rel) min: 0.63% max: 18.18% x̄: 4.11% x̃: 2.12%
HURT stats (abs)   min: 1 max: 2 x̄: 1.14 x̃: 1
HURT stats (rel)   min: 1.75% max: 14.29% x̄: 5.42% x̃: 3.70%
95% mean confidence interval for bundles value: -2.29 -1.20
95% mean confidence interval for bundles %-change: -4.41% -2.00%
Bundles are helped.

total quadwords in shared programs: 40899 -> 40789 (-0.27%)
quadwords in affected programs: 11438 -> 11328 (-0.96%)
helped: 70
HURT: 26
helped stats (abs) min: 1 max: 8 x̄: 2.17 x̃: 1
helped stats (rel) min: 0.42% max: 9.76% x̄: 3.29% x̃: 2.56%
HURT stats (abs)   min: 1 max: 5 x̄: 1.62 x̃: 1
HURT stats (rel)   min: 0.48% max: 9.68% x̄: 3.58% x̃: 1.99%
95% mean confidence interval for quadwords value: -1.60 -0.69
95% mean confidence interval for quadwords %-change: -2.28% -0.58%
Quadwords are helped.

total registers in shared programs: 3916 -> 3911 (-0.13%)
registers in affected programs: 129 -> 124 (-3.88%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 2 x̄: 1.10 x̃: 1
helped stats (rel) min: 8.33% max: 25.00% x̄: 12.84% x̃: 9.55%
HURT stats (abs)   min: 1 max: 2 x̄: 1.20 x̃: 1
HURT stats (rel)   min: 11.11% max: 66.67% x̄: 27.30% x̃: 14.29%
95% mean confidence interval for registers value: -0.98 0.32
95% mean confidence interval for registers %-change: -12.67% 13.75%
Inconclusive result (value mean confidence interval includes 0).

total threads in shared programs: 2455 -> 2455 (0.00%)
threads in affected programs: 6 -> 6 (0.00%)
helped: 1
HURT: 1
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 50.00% max: 50.00% x̄: 50.00% x̃: 50.00%

total loops in shared programs: 6 -> 6 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 168 -> 168 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 186 -> 186 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5513>

pan/mdg: Allow Z/S writes to use any 2nd stage unit

This ensures there will not be dependency problems if we emit a move
that tries to read from a parallel instruction.

No shader-db changes.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5513>

pan/mdg: Defer smul, vlut until after writeout moves

We can end up with bad dependencies with a depth/stencil export. Let's
let the writeout special cases consume these values if possible, using a
move otherwise in which case it won't be used in the other slots anyway.

total instructions in shared programs: 50508 -> 50507 (<.01%)
instructions in affected programs: 12 -> 11 (-8.33%)
helped: 1
HURT: 0

total bundles in shared programs: 25640 -> 25640 (0.00%)
bundles in affected programs: 0 -> 0
helped: 0
HURT: 0

total quadwords in shared programs: 40899 -> 40899 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0

total registers in shared programs: 3917 -> 3916 (-0.03%)
registers in affected programs: 3 -> 2 (-33.33%)
helped: 1
HURT: 0

total threads in shared programs: 2455 -> 2455 (0.00%)
threads in affected programs: 0 -> 0
helped: 0
HURT: 0

total spills in shared programs: 168 -> 168 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0

total fills in shared programs: 186 -> 186 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5513>

pan/mdg: Schedule writeout to VLUT

Many thanks to Icecream95 for noticing this is possible if alpha is not
written.

total instructions in shared programs: 50509 -> 50508 (<.01%)
instructions in affected programs: 221 -> 220 (-0.45%)
helped: 2
HURT: 1
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.74% max: 1.35% x̄: 1.04% x̃: 1.04%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 9.09% max: 9.09% x̄: 9.09% x̃: 9.09%

total bundles in shared programs: 25675 -> 25640 (-0.14%)
bundles in affected programs: 5434 -> 5399 (-0.64%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.03 x̃: 1
helped stats (rel) min: 0.27% max: 20.00% x̄: 2.29% x̃: 0.67%
95% mean confidence interval for bundles value: -1.09 -0.97
95% mean confidence interval for bundles %-change: -3.64% -0.94%
Bundles are helped.

total quadwords in shared programs: 40887 -> 40899 (0.03%)
quadwords in affected programs: 1995 -> 2007 (0.60%)
helped: 2
HURT: 16
helped stats (abs) min: 1 max: 3 x̄: 2.00 x̃: 2
helped stats (rel) min: 1.67% max: 2.40% x̄: 2.03% x̃: 2.03%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.54% max: 5.88% x̄: 1.40% x̃: 0.86%
95% mean confidence interval for quadwords value: 0.15 1.18
95% mean confidence interval for quadwords %-change: 0.13% 1.90%
Quadwords are HURT.

total registers in shared programs: 3916 -> 3917 (0.03%)
registers in affected programs: 2 -> 3 (50.00%)
helped: 0
HURT: 1

total threads in shared programs: 2455 -> 2455 (0.00%)
threads in affected programs: 0 -> 0
helped: 0
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5513>

pan/mdg: Remove bundle interference code

This incorrectly worked around the r1 issue fixed earlier.

total instructions in shared programs: 50514 -> 50509 (<.01%)
instructions in affected programs: 826 -> 821 (-0.61%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.10% max: 4.17% x̄: 2.04% x̃: 1.59%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.16% max: 5.00% x̄: 3.10% x̃: 2.17%
95% mean confidence interval for instructions value: -0.87 0.21
95% mean confidence interval for instructions %-change: -1.90% 1.25%
Inconclusive result (value mean confidence interval includes 0).

total bundles in shared programs: 25680 -> 25675 (-0.02%)
bundles in affected programs: 539 -> 534 (-0.93%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.54% max: 9.09% x̄: 3.51% x̃: 2.22%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 2.22% max: 8.33% x̄: 5.44% x̃: 4.17%
95% mean confidence interval for bundles value: -0.87 0.21
95% mean confidence interval for bundles %-change: -3.40% 2.35%
Inconclusive result (value mean confidence interval includes 0).

total quadwords in shared programs: 40887 -> 40887 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0

total registers in shared programs: 3916 -> 3916 (0.00%)
registers in affected programs: 22 -> 22 (0.00%)
helped: 2
HURT: 2
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 16.67% max: 25.00% x̄: 20.83% x̃: 20.83%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 16.67% max: 16.67% x̄: 16.67% x̃: 16.67%
95% mean confidence interval for registers value: -1.84 1.84
95% mean confidence interval for registers %-change: -36.96% 32.79%
Inconclusive result (value mean confidence interval includes 0).

total threads in shared programs: 2455 -> 2455 (0.00%)
threads in affected programs: 0 -> 0
helped: 0
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5513>