git.libre-soc.org Git - mesa.git/log

intel/fs: Implement quad_swap_horizontal with a swizzle on gen7

This fixes dEQP-VK.subgroups.quad.compute.subgroupquadswaphorizontal_*
on all gen7 platforms.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/fs: Use ALIGN16 instructions for all derivatives on gen <= 7

The issue here was discovered by a set of Vulkan CTS tests:

    dEQP-VK.glsl.derivate.*.dynamic_*

These tests use ballot ops to construct a branch condition that takes
the same path for each 2x2 quad but may not be uniform across the whole
subgroup.  They then tests that derivatives work and give the correct
value even when executed inside such a branch.  Because the derivative
isn't executed in uniform control-flow and the values coming into the
derivative aren't smooth (or worse, linear), they nicely catch bugs that
aren't uncovered by simpler derivative tests.

Unfortunately, these tests require Vulkan and the equivalent GL test
would require the GL_ARB_shader_ballot extension which requires int64.
Because the requirements for these tests are so high, it's not easy to
test on older hardware and the bug is only proven to exist on gen7;
gen4-6 are a conjecture.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>

scons+meson: suppress spammy build warning on MacOS

Originally introduced in c7f36574506838274460 ("darwin: Suppress type
conversion warnings for GLhandleARB") to fix Bugzilla #66346 [1], this
workaround was never ported to Scons or Meson.

[1] https://bugs.freedesktop.org/66346

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

i965/fs: Print the scheduler mode.

Line wrap some awfully long lines while we are here.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965/fs: Add a shader_stats struct.

It'll grow further, and we'd like to avoid adding an additional
parameter to fs_generator() for each new piece of data.

v2 (idr): Rebase on 17 months. Track a visitor instead of a cfg.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

lima/gp: Support exp2 and log2

log2 is tricky because there cannot be a move between complex1 and
postlog2. We can't guarantee that scheduling complex1 will succeed when
we schedule postlog2, so we try to schedule complex1 and if it fails we
back out by rewriting the postlog2 as a move and introducing a new
postlog2 so that we can try again later.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>

lima/gpir: Always schedule complex2 and *_impl right after complex1

See https://gitlab.freedesktop.org/lima/mesa/issues/94 for the gory
details of why this is needed. For *_impl this is easy, since it never
increases register pressure and it goes in the complex slot hence it
never counts against max nodes. It's a bit more challenging for
complex2, since it does count against max nodes, so we need to change
the reservation logic to reserve an extra slot for complex2 when
scheduling complex1. This second part isn't strictly necessary yet, but
it will be for exp2.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>

radv: Fix descriptor set allocation failure.

Set all the handles to VK_NULL_HANDLE:

"If the creation of any of those descriptor sets fails, then the implementation
must destroy all successfully created descriptor set objects from this command,
set all entries of the pDescriptorSets array to VK_NULL_HANDLE and return the
error."

(Vulkan 1.1.117 Spec, section 13.2)

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: fix queries with WAIT_BIT returning VK_NOT_READY

When vkGetQueryPoolResults() is called with VK_QUERY_RESULT_WAIT_BIT
set, the driver is supposed to wait for the query to become available
before returning.

Currently, radv returns once the query is indeed ready, but it returns
VK_NOT_READY. It also fails to populate the results.

The problem is a missing volatile in the secondary check for query
availability. This patch removes the secondary check altogether since it
is redundant with the preceding loop.

This bug was found with an unreleased version of SteamVR.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

meson: Test for program_invocation_name

program_invocation_name and program_invocation_short_name are both GNU
extensions. I don't believe one can exist without the other, so only
check for program_invocation_name.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

scons: Test for random_r()

Suggested-by: Eric Engestrom <eric.engestrom@intel.com>

meson: Test for random_r()

It's better to test for needed functions instead of using external
knowledge about presence in this or that C library.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

st/nine: Drop preprocessor guards for glibc-2.12

Same rationale as the previous patch, but additionally these checks just
seem entirely unnecessary. pthread_self() has been used in Mesa since at
least 1999.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>

util: Drop preprocessor guards for glibc-2.12

glibc-2.12 was released in 2010. No one is building new Mesa against 9
year old glibc, and removing these checks allows the code to work on
other C libraries like musl.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>

pan/midgard: Nothing to see here, move along folks

Fixes: dee1e18fe4f ("pan/midgard: Cleanup ops table")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

spirv: don't discard access set by vtn_pointer_dereference

We can have a access flag already set here so just augment the
existing ones.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0fb61dfdeb ("spirv: propagate access qualifiers through ssa & pointer")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

iris: Enable EXT_texture_shadow_lod

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

gallium: Add PIPE_CAP_TEXTURE_SHADOW_LOD

v2: Line wrap to 80 char (Marek Olsak)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Enable EXT_texture_shadow_lod

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl: Add builtin functions for EXT_texture_shadow_lod

With the help of Sagar, Ian and Ivan.

v2: Fix dependencies (Ian Romanick)

v3: 1) fix function name (Marek Olsak)
2) Add check for extension enable (Marek Olsak)

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl: Allow _textureCubeArrayShadow function to accept ir_texture_opcode

This will be used to support one of the function from
Ext_texture_shadow_lod specification.

With the help of Sagar, Ian and Ivan.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: extension boilerplate for EXT_texture_shadow_lod

With the help of Sagar, Ian and Ivan.

Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

pan/midgard: Cleanup ops table

Hopefully this should make a few ops make more sense. No functional
changes.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Extend copy-propagation to swizzles

We can compose them when we rewrite, which is.. more code.. but helps.

total instructions in shared programs: 3611 -> 3513 (-2.71%)
instructions in affected programs: 672 -> 574 (-14.58%)
helped: 11
HURT: 2
helped stats (abs) min: 2 max: 14 x̄: 9.09 x̃: 10
helped stats (rel) min: 5.71% max: 24.56% x̄: 17.99% x̃: 18.87%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.19% max: 2.08% x̄: 1.64% x̃: 1.64%
95% mean confidence interval for instructions value: -10.45 -4.62
95% mean confidence interval for instructions %-change: -20.07% -9.87%
Instructions are helped.

total bundles in shared programs: 2117 -> 2067 (-2.36%)
bundles in affected programs: 356 -> 306 (-14.04%)
helped: 11
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 4.55 x̃: 5
helped stats (rel) min: 4.55% max: 15.22% x̄: 13.63% x̃: 14.71%
95% mean confidence interval for bundles value: -5.64 -3.45
95% mean confidence interval for bundles %-change: -15.71% -11.55%
Bundles are helped.

total quadwords in shared programs: 3567 -> 3468 (-2.78%)
quadwords in affected programs: 695 -> 596 (-14.24%)
helped: 11
HURT: 1
helped stats (abs) min: 2 max: 14 x̄: 9.09 x̃: 10
helped stats (rel) min: 5.56% max: 21.88% x̄: 14.97% x̃: 15.15%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 2.38% max: 2.38% x̄: 2.38% x̃: 2.38%
95% mean confidence interval for quadwords value: -10.96 -5.54
95% mean confidence interval for quadwords %-change: -17.42% -9.63%
Quadwords are helped.

total registers in shared programs: 391 -> 383 (-2.05%)
registers in affected programs: 46 -> 38 (-17.39%)
helped: 9
HURT: 1
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 25.00% max: 25.00% x̄: 25.00% x̃: 25.00%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 10.00% max: 10.00% x̄: 10.00% x̃: 10.00%
95% mean confidence interval for registers value: -1.25 -0.35
95% mean confidence interval for registers %-change: -29.42% -13.58%
Registers are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Extract simple source mod check

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Lower texr/texw mixed registers

Conceptually, r28-r29 (as used for reading) and r28-r29 (as used for
writing) aren't registers at all, merely push/pull arrangements. So you
can't feed a texture result back into itself without explicitly moving
in the middle.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Always set .cont for derivatives in loops

We need to keep the helper invocations alive.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Implement derivatives

Implement the fdd* and fdd* opcodes in the Midgard compiler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Compose original texture swizzle in RA

Used for lowering derivatives.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Add new swizzles

Used for derivatives.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Add OP_IS_DERIVATIVE helper

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Add make_compiler_temp_reg helper

Corrollary to make_compiler_temp (for SSA).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Move nir_*_src_index to compiler.h

These helpers are useful for code emission everywhere. Share the love!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Disassemble unknown texture ops as hex

I'm not sure why I ever thought decimal was a good idea.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Add support for disassembling derivatives

They're just texture ops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

nir/find_array_copies: Use correct parent array length

instr->type is the type of the array element, not the type of the array
being dereferenced. Rather than fishing out the parent type, just use
parent->num_children which should be the length plus 1. While we're here
add another assert for the issue fixed by the previous commit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111251
Fixes: 156306e5e62 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: Fix comparison for nir_deref_instr_is_known_out_of_bounds()

There was an off-by-one error.

Fixes: 156306e5e62 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

radv/gfx10: only compile the GS copy shader on-demand

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

gitlab-ci: Fix scons build directory path

Fixes: dd3d0b2897b8 "gitlab-ci: Only keep the build logs as artifacts."
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

swr/rasterizer: Add memory tracking support

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

swr/rasterizer: Better implementation of scatter

Added support for avx512 scatter instruction. Non-avx512 will
now call into a C function to do the scatter emulation.

This has better jit compile performance than
the previous approach of jitting scalar loops.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

swr/rasterizer: cleanups for tessellation

This commit introduces small fixes in preparation for tessellation
support.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

rasterizer/swr: move BucketMgr to SwrContext

This move gets us back to parity with global manager
in that we can dump render context buckets now.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>

v3d: take into account separate_stencil when checking if stencil should be cleared

In most cases this is not needed because the usual is that when a
separate stencil is written, the parent resource is also written.

This is needed if we have a separate stencil, no depth buffer, and the
source and destination is the same, as in that case the stencil can be
updated, but not the parent source (like if you are blitting only the
stencil buffer). On that situation, the following access to the
stencil buffer would clear the stencil buffer (so overwritting the
previous blitting) cleared because the parent source has
v3d_resource.writes to 0.

As far as I see, that situation only happens with the
GL_DEPTH32F_STENCIL8 format.

Note that one alternative would consider that if the separate_stencil
has been written, the parent should also be considered written (and
update its "writes" field accordingly). But I found this patch more
natural.

Fixes the following piglit tests:
spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-blit
spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-copypixels

the latter regressed when internally glCopyPixels implementation
started to use blitting. So:

Fixes: 131d40cfc91f ("st/mesa: accelerate glCopyPixels(STENCIL)")
Reviewed-by: Eric Anholt <eric@anholt.net>

radv: Don't include radv_private.h from radv_shader.h

This patch decouples radv_shader.h from any LLVM dependency.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

i965/gen10: Remove unnecessary workaround.

In fact, the description of the workaround states that the mask field
doesn't work correctly on gen10, and we need to set it to 0xffff even we
we only want to update a single field:

"The mask bits are not implemented properly on 3DSTATE_3D_MODE. Driver
must always program bits 31:16 of DW1 a value of 0xFFFF. This means
if it is only updating 1 field, it must update all the fields to the
correct value."

So unless we want to change any of the fields of 3DSTATE_3D_MODE,
there's not need to emit. Additionally, it seems this workaround is not
required on gen11. And last but not least, this workaround is not
implemented on iris or anv, and it doesn't seem to be missed there.

So let's just remove the whole thing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

iris: Fix SO offset to be 32-bit in DrawTransformFeedback handling

We accidentally started copying a full 64-bit value rather than copying
a 32-bit offset and zeroing the top 32-bits. This caused us to compute
bogus vertex counts which could lead to GPU hangs in some cases.

Thanks to Clayton Craft for catching the regressions!

Fixes: 0e24d10ff5c ("iris: Use gen_mi_builder to handle CS ALU operations.")

intel: Use a system value for gl_FragCoord

It's kind-of an anomaly that the Intel drivers are still treating
gl_FragCoord as an input. It also makes zero sense because we have to
special-case it in the back-end.

Because ANV is the only user of nir_lower_wpos_center, we go ahead and
just update it to look for nir_intrinsic_load_frag_coord as part of this
patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl: Treat gl_FragCoord as a varying even when it's a system value

This fixes glsl-fcoord-invariant-pass.shader_test on drivers that set
GLSLFragCoordIsSysVal which includes radeonsi among others.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

mesa/spirv: Set frag_coord_is_sysval to GLSLFragCoordIsSysVal

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/fs: Remove calculate_urb_setup from fs_visitor

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

freedreno/a6xx: fix MSAA resolve hangs

Seems like RB_BLIT_SCISSOR needs to be aligned to (minimum?) tile size.

Fixes intermittent GPU hangs triggered by some of the three.js samples
on https://threejs.org/

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: fix for array/reg store vs meta instructions

fishgl.com has a shader which does roughly:

   foo = texture(...);
   if (bar)
      foo = texture(...);

after lowering phi webs to regs we end up w/ a vec4 reg (array).  But
since it was not an indirect access, we try to skip the extra mov.  This
results that the per-component fanout (split) meta instructions store
directly to the reg (array).  Which doesn't work out in RA.

Signed-off-by: Rob Clark <robdclark@chromium.org>

meson: bump required version to 0.46

0.45 has a few annoying bugs (like the one in !358 [1]), and 0.46 is
well over a year old by now, so let's move to it.

[1] https://gitlab.freedesktop.org/mesa/mesa/merge_requests/358

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

radeon/vcn/vp9: add Arcturus VP9 support

Arcturus CHIP enum is less than Navi10, since it's still gfx9,
but its VCN version belongs to VCN2.x

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

radeon/vcn: add Arcturus decode support

different internal registers offset from previous HW

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

amd: add support for Arcturus

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

radeonsi: add AMD_DEBUG=nogfx for testing

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

radeonsi: add support for compute-only chips

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

gallium/auxiliary/vl: add compute shaders for deint yuv

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

gallium/auxiliary/vl: don't call gfx functions on compute-only chips

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

gallium/auxiliary/vl: add PIPE_CAP_GRAPHICS check for vl compositor

Init graphic shader Only when PIPE_CAP_GRAPHICS is true.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

gallium: create multimedia contexts as compute-only if graphics is unsupported

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

gallium: add PIPE_CAP_GRAPHICS

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

radv: implement VK_EXT_index_type_uint8

Natively supported on VI+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

anv: implement VK_EXT_index_type_uint8

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

vulkan: Bump headers to 1.1.117

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

include/vulkan: bump vk_android_native_buffer

Taken off https://android.googlesource.com/platform/frameworks/native/+/refs/tags/android-9.0.0_r45/vulkan/include/vulkan/vk_android_native_buffer.h

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/mi: only resolve to a temp register if source isn't in memory

aka. fix a s/||/&&/ typo

Fixes: 74063ee61aadd1371a9b ("intel/mi: Add a new gen_mi_store_if() helper.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

gitlab-ci: Enable freedreno shader-db runs.

Now that helgrind is less upset and I've completed many successful
full shader-db runs, we should be able to enable freedreno shader-db
runs for Mesa checkins on the tiny public shader-db.

Reviewed-by: Rob Clark <robdclark@gmail.com>

nir: Fix helgrind complaints about data race in trivial_swizzle init.

Even if the data race wasn't real (I'm not great at reasoning about
this), helgrind is a nice enough tool that keeping noise out of it is
probably worthwhile. Besides, typing out the numbers keeps the data
in the read-only data section instead of emitting code to initialize
it every time.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

freedreno: Fix data race on making the shader's id.

The value is only used for IR3_DBG_DISASM, but it cleans up the
helgrind output.

Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno: Take a lock around shader variant creation.

Shaders are shared across contexts in gallium (part of making it so
that you get shader compile at link time, for shader-db and to reduce
compiles at draw time). So, we need to protect from variant creation
for a shader from multiple threads at the same time.

Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno: Fix data races with allocating/freeing struct ir3.

There is a single ir3_compiler in the screen, and each context may be
compiling ir3 shaders, which call ir3_create. ralloc doesn't do any
locking on its own, so eventually you can end up racing to break
ralloc's linked lists.

We really don't want struct ir3 to live as long as the compiler (maybe
struct ir3_shader's lifetime, if anything), so you'd better be freeing
it anyway.

Fixes: 8fe20762433d ("freedreno/ir3: convert over to ralloc")
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno: Fix helgrind complaint on shader-db key setup.

If the variable's going to be static, we shouldn't be memsetting it
from every thread and instead just have it in the data section.

Reviewed-by: Rob Clark <robdclark@gmail.com>

radv: Take variable descriptor counts into account for buffer entries.

Fixes: b5e04e9217b "radv: Support allocating variable size descriptor sets."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111019
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

anv: Don't claim support for 24 and 48-bit formats on IVB

Cc: mesa-stable@lists.freedesktop.org

isl/formats: R8G8B8_UNORM_SRGB isn't supported on HSW

On Haswell, the format works but it doesn't properly do an sRGB decode.
It appears to act identically to R8G8B8_UNORM. Only Vulkan uses this
format so this only affects Vulkan on HSW.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric@engestrom.ch>

pan/midgard: Fix alpha test w.r.t new indexing

Fixes: 9beb3391b55 ("pan/midgard: Tag SSA/reg")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

softpipe: Don't draw when rasterizer_discard is set

Fixes:
  dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.basic.write_stencil_points
  dEQP-GLES3.functional.rasterizer_discard.fbo.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.fbo.write_stencil_points
  dEQP-GLES3.functional.rasterizer_discard.scissor.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.scissor.write_stencil_points

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

softpipe: Fix cube arrays layer selection

To select the correct layer the z-coordinate must be rounded before it
is multiplied by six.

Fixes a number of tests out of
dEQP-GLES31.functional.texture.filtering.cube_array.formats.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

vulkan/wsi/wayland: implement acquire timeout

v2: Eric's nits

v3: Reuse timespec utils (Daniel)
Deal with ppoll being interrupted by a signal (Daniel)

v4: Remove unnecessary time check

v5: Deal with EAGAIN from wl_display_prepare_read_queue() (Daniel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Reviewed-by: Daniel Stone <daniels@collabora.com>

util: add a timespec helper

Copied from Weston, upon Daniel's suggestion

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>

intel: replace large stack buffer with heap allocation

For now, this keeps the "100 bytes" allocation; we can try to figure out
the correct size as a follow up.

Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

radv/gfx10: do not use the fast depth or stencil clear bytes path

It causes issues on GFX10.

This fixes rendering issues with vkmark and Wreckfest at least.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl

ac: do not crash when the buffer data format is invalid

This might happen when a pipeline doesn't define the vertex input
state, so the buffer data format is 0 (aka INVALID).

This fixes crashes when compiling some shaders on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

ac/nir: fix txf_ms with an offset

Seems to fix some hair artifacts in Max Payne 3:
https://github.com/daniel-schuermann/mesa/issues/76

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: f4e499ec791 ('radv: add initial non-conformant radv vulkan driver')
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

radv: Delete unused local variables in optimization loop

Totals from affected shaders:
SGPRS: 376 -> 376 (0.00 %)
VGPRS: 620 -> 560 (-9.68 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 292 -> 292 (0.00 %) dwords per thread
Code Size: 20024 -> 20144 (0.60 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 25 -> 25 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

nir/find_array_copies: Handle wildcards and overlapping copies

This commit rewrites opt_find_array_copies to be able to handle
an array copy sequence with other intervening operations in between. In
particular, this handles the case where we OpLoad an array of structs
and then OpStore it, which generates code like:

foo[0].a = bar[0].a
foo[0].b = bar[0].b
foo[1].a = bar[1].a
foo[1].b = bar[1].b
...

that wasn't recognized by the previous pass.

In order to correctly handle copying arrays of arrays, and in particular
to correctly handle copies involving wildcards, we need to use a tree
structure similar to lower_vars_to_ssa so that we can walk all the
partial array copies invalidated by a particular write, including
ones where one of the common indices is a wildcard. I actually think
that when factoring in the needed hashing/comparing code, a hash table
based approach wouldn't be a lot smaller anyways.

All of the changes come from tessellation control shaders in Strange
Brigade, where we're able to remove the DXVK-inserted copy at the
beginning of the shader. These are the result for radv:

Totals from affected shaders:
SGPRS: 4576 -> 4576 (0.00 %)
VGPRS: 13784 -> 5560 (-59.66 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8696 -> 6876 (-20.93 %) dwords per thread
Code Size: 329940 -> 263268 (-20.21 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 330 -> 898 (172.12 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: Print array deref indices as decimal

We print the size as decimal too, and using hex without a leading "0x"
was very confusing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

lima/gpir/sched: Handle more special ops in can_use_complex()

We were missing handling for a few other ops that rearrange their
sources somehow in codegen, namely complex2 and select.

This should fix spec@glsl-1.10@execution@built-in-functions@vs-asin-vec3
and possibly other random regressions from the new scheduler which were
supposed to be fixed in the commit right after.

Fixes: 54434fe6706 ("lima/gpir: Rework the scheduler")
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>

lima/gp: Clean up lima_program_optimize_vs_nir() a little

Remove an unnecessary nir_lower_regs_to_ssa as that should be done by
the state tracker, and add a missing DCE pass after running copy
propagation in order to remove the dead copies. This shouldn't fix
anything but the second part will reduce shader sizes.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>

lima/gpir/sched: Don't try to spill when something else has succeeded

In try_node(), we assume that the node we pick can still be scheduled
successfully after speculatively trying all the other nodes. Normally we
always undo every node after speculating it, so that when we finally
schedule best_node the scheduler state is exactly the same and it
succeeds. However, we also try to spill nodes, which can change the
state and in a corner case that can make scheduling best_node fail. In
particular, the following sequence of events happened with piglit
shaders@glsl-vs-if-nested: a partially-ready node N was spilled and a
register store node S, which is a use of N, was created and then later
the other uses of N were scheduled, so that S is now ready and N is
partially ready. First we try to schedule S and succeed, then we try to
schedule another node M, which fails, so we try to spill the remaining
uses of N. This succeeds, but scheduling M still fails so that best_node
is still S. However since one of the uses of N is one cycle ago, and
therefore we inserted a read dependent on S one cycle ago when spilling
N, S can no longer be scheduled as read-after-write latency is three
cycles.

While we could ad-hoc try to catch cases like this, or (the best option
but very complicated) treat the spill as speculative and roll it back if
we decide not to schedule the node, a simpler solution is to just
give up on spilling if we've already successfully speculatively
scheduled another node. We'd give up a few cases where we discover that
by spilling even harder we could schedule a more desirable node, but
that seems like it would be pretty rare in practice. With this we
guarantee that nothing has been touched after best_node was successfully
scheduled. We also cut down on pointless spilling, since if we already
scheduled a node it's unlikely that spilling harder will let us schedule
an even better node, and hence any spilling at this point is probably
useless.

While we're here, clean up the code around spilling by flattening the
two if's and getting rid of the second unnecessary check for INT_MIN.

Fixes: 54434fe6706 ("lima/gpir: Rework the scheduler")
Acked-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>

nv50/ir: don't consider the main compute function as taking arguments

With OpenCL, kernels can take arguments and return values (?). However
in practice, there is no more TGSI compute implementation, and even if
there were, it would probably have named functions and no explicit main.

This improves RA considerably for compute shaders, since temps are not
kept around as return values.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

nv50/ir: handle insn not being there for definition of CVT arg

This can happen if it's e.g. a uniform or a function argument.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111217
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org

nouveau: flip DEBUG -> !NDEBUG

The meson conversion chose to change the meaning of DEBUG to "used for
debugging" to be "used for expensive things for debugging", primarily
for nir_validate. Flip things over so that we get nice things with
optimizations enabled.

While we're at it, also kill off nouveau_statebuf.h which is unused (and
has a mention of DEBUG which is how I found it).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

nvc0: allow a non-user buffer to be bound at position 0

Previously the code only handled it for positions 1 and up (as would be
for UBO's in GL). It's not a lot of trouble to handle this, and vl or
vdpau want this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org

nv50,nvc0: update sampler/view bind functions to accept NULL array

Apparently vl (or vdpau) wants to pass that in now. Handle it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org

gallium/vl: fix compute tgsi shaders to not process undefined components

This caused nouveau's function handling logic to think that the MAIN
function was due to receive external parameters, and cascaded some
failures after that. Instead avoid having the undefined components in
the first place.

Fixes: f6ac0b5d71 (gallium/auxiliary/vl: Add compute shader to support video compositor render)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111217
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

pan/midgard: Introduce invert field

This will enable us to fuse inverts in various ways. Marginal hurt:

total instructions in shared programs: 3610 -> 3611 (0.03%)
instructions in affected programs: 67 -> 68 (1.49%)
helped: 0
HURT: 1

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>