git.libre-soc.org Git - mesa.git/log

turnip: Fix support for immutable samplers.

We were setting up the hardware sampler state when updating a combined
image sampler, but never looking at the immutable sampler for in the
separate case.

Fixes failures in
dEQP-VK.binding_model.shader_access.primary_cmd_buf.sampler_immutable.fragment.*

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3127>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3127>

turnip: don't set LRZ enable at end of renderpass

Fixes hanging with cases that use more than one renderpass.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3122>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3122>

freedreno/ir3: lower pack/unpack ops

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>

nir: add option to lower half packing opcodes

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3106>

turnip: Add support for descriptor arrays.

I had a bigger rework I was working on, but this is simple and gets tests
passing.

Fixes 36 failures in
dEQP-VK.binding_model.shader_access.primary_cmd_buf.sampler_mutable.fragment.*
(now all passing)

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3124>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3124>

turnip: Drop unused variable.

We really need -Werror in CI.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3124>

panfrost: Don't double-create scratchpad

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 4f7fddbd716 ("panfrost: Pass size to panfrost_batch_get_scratchpad")
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3119>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3119>

panfrost: Simplify sampler upload condition

Makes it more obvious what's going on.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3119>

gallium/auxiliary: Handle count == 0 in u_vbuf_get_minmax_index_mapped

This makes u_vbuf_get_minmax_index_mapped return min = 0 / max = 0
when info->count == 0.

That should never happen anyway, but this commit makes it at least
return a sane value that callers expect, and also allows us - and
GCC - to assume count != 0 for optimization purposes.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3050>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3050>

gallium/auxiliary: Reduce conversions in u_vbuf_get_minmax_index_mapped

With this patch, GCC generates vectorized code that does the comparisons
without converting the indices to 32-bit first.

This optimization makes the aforementioned function almost twice as fast
for ARM NEON, and should speed up vectorised code on other platforms.

Without vectorisation, the function is still a percent or two faster,
but slightly larger.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3050>

amd/addrlib: update to the latest version

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

turnip: remove duplicate A6XX_SP_CS_CONFIG_NIBO

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3104>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3104>

turnip: change emit_ibo to be like emit_textures

Adds missing alignment and error checking.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3104>

turnip: fix emit_ibo

Based on the GL driver:
-Compute needs different opcode (this fixes a GPU hang problem)
-REG_A6XX_SP_IBO_LO/REG_A6XX_SP_CS_IBO_LO were swapped

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3104>

turnip: remove compute emit_border_color

Current tu6_emit_border_color doesn't work for compute and there's no
example from the GL driver to base it on, so replace it with a finishme.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3104>

turnip: fix emit_textures for compute shaders

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3104>

utils/os_socket: Define ssize_t on windows.

Fixes: ef5266ebd50 ("util/os_socket: Add socket related functions.")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

radeonsi/gfx10: fix ngg_get_ordered_id

This could have caused issues with NGG streamout.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

radeonsi: reset more fields in si_llvm_context_set_ir to fix reusing ctx

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

radeonsi: fix determining whether the VS prolog is needed

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

radeonsi: allow generating VS prologs with 0 inputs

If "ls_vgpr_fix" is set, we use a prolog, but it can have 0 inputs.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

radeonsi/gfx10: don't insert NGG streamout atomics if they are never used

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

radeonsi: don't wrap the VS prolog in if (ES thread) .. endif

We can execute it unconditionally and the values computed for disabled
threads won't be used anyway.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

radeonsi: set is_monolithic for VS prologs when the shader is really monolithic

This fixes a bug with NGG that is probably harmless.

Basically, !is_monolithic makes the VS prolog emit
llvm.amdgcn.init.exec.from.input, which sets the EXEC mask to only enable
ES threads. In the NGG non-GS case, the GS threads <= ES threads, so it was
never an issue.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

radeonsi: disallow compute-based culling if polygon mode is enabled

Polygon mode can generate thick points or lines.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

radeonsi: deduplicate ES and GS thread enablement code

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

ac: fix the return value in cull_bbox when bbox culling is disabled

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

ac: fix ac_get_i1_sgpr_mask for Wave32

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3095>

panfrost: Remove asserts in panfrost_pack_work_groups_compute

It's a hot routine and these are exceedingly unlikely to break.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3067>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3067>

panfrost: Pack invocation_shifts manually instead of a bit field

gcc generates exceptionally bad code for panfrost_pack_work_groups_fused
otherwise ... although that routine is somehow still hot ...

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3067>

anv: Export VK_KHR_buffer_device_address only when really supported

Fixes: 1b6991ba1d8 ("anv: Implement VK_KHR_buffer_device_address")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>

anv: Export filter_minmax support only when it's really supported

Fixes: bea4d4c78c3 ("anv: add VK_EXT_sampler_filter_minmax support")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3071>

freedreno/ir3: lower mul_2x32_64

lower_mul_2x32_64 generates mul_high opcodes, and lower_mul_high is done by
nir_lower_alu, so call nir_lower_alu after nir_opt_algebraic.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>

turnip: implement CmdFillBuffer/CmdUpdateBuffer

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>

turnip: don't require src image to be set for clear blits

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>

turnip: use common blit path for buffer copy

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>

turnip: use single substream cs

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>

panfrost: Remove fbd_type enum

Just use the MALI_MFBD tag directly; it's clean.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3118>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3118>

ci: Reinstate Panfrost CI

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3118>

panfrost: Fix FBD issue

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: b0e915b4e65 ("panfrost: Emit SFBD/MFBD after a batch, instead of before")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3118>

vulkan/wsi: error out when image fence doesn't signal

If for some reason the fence associated with an image doesn't signal,
we're likely in a device lost scenario, we should report that error.

We can't really wait for a given amount of time because we could get a
timeout and that is not a valid error to report for vkQueuePresentKHR,
so just wait forever.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/830
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

anv: drop unused parameter from apply layout pass

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

anv: constify pipeline layout in nir passes

Was hoping to find potential issues but nothing. Still probably a good
idea.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

pan/midgard: Set r1.w magic

I'm honestly unsure what this is for, but it's needed on MFBD systems
for unknown reasons, at least when MRT is actually in use and then
sometimes without MRT (it fixes a blend shader issue on T760?)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>

pan/midgard: Fix liveness analysis with multiple epilogues

Epilogues are special fixed-function blocks, so they need special
handling for liveness analysis to work completely. This in turns fixes
RA issues for many shaders using MRT.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>

pan/midgard: Writeout per render target

The flow is considerably more complicated. Instead of one writeout loop
like usual, we have a separate write loop for each render target. This
requires some scheduling shenanigans to get right.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>

pan/midgard: Add schedule barrier after fragment writeout

This is a branch, like discard, so we need a barrier to make it safe.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>

panfrost: Pass blend RT number through

We have to key the blend shader for the render target number due to
writeout silliness.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Visoso <tomeu.vizoso@collabora.com>

gallium: refuse to create buffers larger than UINT32_MAX

pipe_resource.width0 is 32 bits and hardware support for bigger buffer is
limited (eg: AMD hardware doesn't support buffer shader resources bigger
than 4GB).

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2053
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2948>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2948>

radeonsi: disable dcc for 2x MSAA surface and bpe < 4

This fixes a series of dEQP tests on Raven platforms:
  - dEQP-GLES3.functional.fbo.msaa.2_samples.rgba4
  - dEQP-GLES3.functional.fbo.msaa.2_samples.rgb5_a1
  - dEQP-GLES3.functional.fbo.msaa.2_samples.rgb565
  - dEQP-GLES3.functional.fbo.msaa.2_samples.rg8
  - dEQP-GLES3.functional.fbo.msaa.2_samples.r16f

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3090>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3090>

v3d: expose OES_geometry_shader

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: support precompiling geometry shaders

At present, this is only relevant for shader-db.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: disable lowering of indirect inputs

V3D can do indirect inputs so we don't need it. Also, the lowering
produces horrible if-ladder code that is particularly bad for geometry
shaders where inputs are always arrays and shader bodies usually have
a loop indexing into them.

This fixes a couple of geometry shader tests in CTS that would fail to
register allocate otherwise.

There are no changes in shader-db.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: fix primitive queries for geometry shaders

With geometry shaders the number of emitted primitived is decided
at run time, so we cannot precompute it in the CPU and we need to
use the PRIMITIVE_COUNTS_FEEDBACK commands to have the GPU provide
the number like we do for the number of primitives written to
transform feedback. This may have a performance impact though, since
it requires a sync wait for the draw to complete, so we only do
it when geometry shaders are present.

v2: remove '> 0' comparison for ponter type (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: handle writes to gl_Layer from geometry shaders

When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.

For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: move layer rendering to a separate helper

This helps with reducing nesting level after adding the loop
to handle layered rendering.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: support rendering to multi-layered framebuffers

When doing layered rendering the binning stage will prepare per-tile
lists for each layer in the framebuffer, so we need to make sure
we allocate enough space for them .

We also need to emit the NUMBER_OF_LAYERS packet. This is required
even when the number of layers is only 1, otherwise the simulator
detects buffer overflows in the tile_state BO during some CTS test
cases involving layered FBOs.

When rendering, we need to emit commands for each layer of the
framebuffer separately and make sure we address the correct layers for
each one.

v2: fixed typo in comment (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: do not limit new CL space allocations with branch to 4096 bytes

For layered rendering we need to emit per layer rendering commands
lists so we we can end up requiring a fairly large buffer for this
if the number of layers is large enough.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: remove obsolete assertion

OES_geometry_shader introduced the concept of layered framebuffers.
Removing this assertion gets a bunch of CTS tests to pass. We will
also need layered images to implement layered rendering with geometry
shaders.

v2: fix typo in commit message (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: support transform feedback with geometry shaders

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: save geometry shader state for blitting

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: predicate geometry shader outputs inside non-uniform control flow

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: don't try to render if shaders failed to compile

This is the same we do in the compute path to avoid crashes
at draw time.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: add support for adjacency primitives

v2: remove obsolete comment (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: we always have at least one output segment

If we program an output size of 0 the simulator asserts. This was
not a problem until now because our VS would always have to
emit fixed function outputs, however, now that it can be paired
with a GS we can end up with a VS shader that no longer emits
any outputs.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: compute appropriate VPM memory configuration for geometry shader workloads

Geometry shaders can output many vertices and thus have higher VPM memory
pressure as a result. It is possible that too wide geometry shader dispatches
exceed the maximum available VPM output allocated, in which case we need
to reduce the dispatch width until we can fit the VPM memory requirements.
Supported dispatch widths for geometry shaders are 16, 8, 4, 1.

There is a limit in the number of VPM output sectors that can be used by a
geometry shader that we can meet by lowering the dispatch width at compile
time, however, at draw time we need to revisit this number and, together with
other elements that can contribute to total VPM memory requirements, decide
on a configuration that can fit the program into the available VPM memory.
Ideally, we also want to aim for not using more than half of the available
memory so we that we can run a pair of bin and render programs in parallel.

v2: fixed language in comment and typo in commit log. (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: add 1-way SIMD packing definition

According to the documentation, the 1-way dispatch width is only supported
with geometry shaders.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: implement geometry shader instancing

v2:
- Remove unused field uses_iid from v3d_gs_prog_data (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: emit geometry shader state commands

This is good enough to get basic GS workloads working, later patches will
improve this by adding instancing support, proper SIMD configuration, etc.

Notice that most of the TESSELLATION_GEOMETRY_SHADER_PARAMS fields are only
relevant when tessellation shaders are present. We do not support tessellation
yet, but we still need to fill in these tessellation state with default values
since our packing functions require some of these to have non-zero values.

v2:
- Add a comment in the code explaining why we fill in
tessellation fields (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: fix packet descriptions for geometry and tessellation shaders

Every code address starts at bit 3 (addresses must be 64-bit aligned),
with the first 3 bits used to specify threading and NaN propagation
parameters for the shader program.

We generally skip "reserved" bits, however, doing this when the
reserved field is the last in a struct and it is large enough can
make us compute incorrect (smaller) struct sizes which can
lead to corrupt CLs. In particular, the "Tess/Geom Common Params"
struct has a reserved field at the end that is 8-bit, so if we
don't include this we compute a packet size that is 1 byte smaller
than it shold, making the next packet we emit start 1 byte
earlier and therefore leading to incorrect CL data from that point
forward.

The name of one of the fields was not correct.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: add initial compiler plumbing for geometry shaders

Most of the relevant work happens in the v3d_nir_lower_io. Since
geometry shaders can write any number of output vertices, this pass
injects a few variables into the shader code to keep track of things
like the number of vertices emitted or the offsets into the VPM
of the current vertex output, etc. This is also where we handle
EmitVertex() and EmitPrimitive() intrinsics.

The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.

When vertex shaders are paired with geometry shaders we also need
to consider the following:
  - Only geometry shaders emit fixed function outputs.
  - The coordinate shader used for the vertex stage during binning must
    not drop varyings other than those used by transform feedback, since
    these may be read by the binning GS.

v2:
- Use MAX3 instead of a chain of MAX2 (Alejandro).
- Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
- Update comment in IO owering so it includes the GS stage (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: remove unused variable

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: enable debug options for geometry shader dumps

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: add debug assert

While lowering vpm outputs we look for the NIR variables matching
particular store output instructions and we expect to find a match,
so assert on that.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

v3d: add missing plumbing for VPM load instructions

We will need to use LDVPMG_IN specifically to read VPM inputs
in geometry shaders.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

turnip: Lower usub_borrow.

Fixes dEQP-VK.glsl.builtin.function.integer.usubborrow.uvec2_mediump_fragment.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2986>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2986>

intel/fs: Lower 64-bit MOVs after lower_load_payload()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070>

amd/common: Always use addrlib for HTILE tc-compat.

Even without depth+stencil addrlib can (correctly!) decide to
disable tc compatible HTILE.

One example is 8x sampling with 32-bit depth on Stoney. The row size
on Stoney is 1024, while the tile size is 2048, which results in
tile splits which are not supported with tc-compat.

On Stoney, this fixes
dEQP-VK.glsl.builtin_var.fragdepth.*_list_d32_sfloat_multisample_8

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>

amd/common: Fix tcCompatible degradation on Stoney.

addrlib sometimes returns smaller sizes for tcCompat as it does
not seem to take into account the depth+stencil matching config
gymnastics with tcCompat.

This fixes
dEQP-VK.pipeline.render_to_image.core.2d_array.huge.height.r8g8b8a8_unorm_d32_sfloat_s8_uint

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3054>

docs/features: mark GL_ARB_texture_compression_bptc as done for llvmpipe, softpipe, swr

Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: Marek Olšák <maraeo@gmail.com>
CC: Rhys Perry <pendingchaos02@gmail.com>
CC: Bruce Cherniak <bruce.cherniak@intel.com>
CC: Matt Turner <mattst88@gmail.com>

gallium/swr: Enable support bptc format.

Reuse Code from:
f69bc797e1 gallium/auxiliary: Add helper support for bptc format compress/decompress

Signed-off-by: Denis Pauk <pauk.denis@gmail.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: Marek Olšák <maraeo@gmail.com>
CC: Tim Rowley <timothy.o.rowley@intel.com>

freedreno/a6xx: fix OUT_REG() vs growable cmdstream

BEGIN_RING() could decide we can't fit the next packet in the current
cmdstream segment, and grow a new segment. So we need to grab ring->cur
*after* BEGIN_RING(), otherwise we are writing cmdstream past the end of
the previous segment.

Fixes: bdd98b892f3 ("freedreno: New struct packing macros")
Signed-off-by: Rob Clark <robdclark@chromium.org>

lima: split draw calls on 64k vertices

The Mali400 only supports draws with up to 64k vertices per command.
To handle this, break the draw_vbo call into multiple commands.
Indexed drawing is left to a separate code path.
This implementation was ported from vc4_draw_vbo.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2445>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2445>

vc4: move the draw splitting routine to shared code

This can also be useful for other hardware which has similar limitations
on vertex count per single draw.
The Mali400 has a similar limitation and can reuse this.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2445>

lima: refactor indexed draw indices upload

As of this commit this is just a refactor in preparation to enable
support for more than 64k vertices.
To support splitting the draw_vbo call, indices shouldn't be re-uploaded
every time.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2445>

lima: allocate separate bo to store varyings

The current strategy using the suballocator with fixed size doesn't
scale and causes some programs with large number of vertices (like some
glmark2 scenes) to crash.
Change it to dynamically allocate a separate bo to accomodate for
arbitrary number of vertices.
This also fixes the buffer read/write flags for gp.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2445>

gallium/util: add alignment parameter to util_upload_index_buffer

At least on Mali Utgard, index buffers need to be aligned on 0x40.
To avoid duplicating this, add an alignment parameter.
Keep the previous default for the other existing users.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2445>

drirc: Final Fantasy VIII: Remastered needs allow_higher_compat_version

This gets it running on i965 with Mesa master. (The game won't start
without GL 3.3 compatibility, but uses 1.20 with GL_EXT_gpu_shader4
for shaders.)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3076>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3076>

st/glsl_to_nir: fix SSO validation regression

Fixes: b77907edb554 ("st/glsl_to_nir: use nir based program resource list builder")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2216
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

ci: Remove T760/T860 from CI temporarily

I feel really bad about this but this one test is flaking. I don't want
to do a mass revert (and bisection is extremely difficult with
nondeterministic/Heisenbugs), but it's Friday night and master needs to
pass. This commit should be reverted asap (once the flake is solved)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

iris: Implement WA for push constants.

v2: Apply WA to gen11+ instead of gen12+ (Jordan).

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

lima/parser: Add texture descriptor parser

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2980>

lima/parser: Add RSW parsing

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2980>

lima/parser: Some fixes and cleanups

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2980>

vulkan/overlay: Update docs.

Add mention to overlay control socket.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

vulkan/overlay: Add basic overlay control script.

This can be used to start/stop statistics capturing from the command
line.

v3:
- Install script (Lionel)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

vulkan/overlay: Add a command to start capturing data to a file.

By default, if an output_file is specified, the overlay layer will start
capturing data immediately. After this commit, when a control socket is
used, the capture starts disabled by default, and is only enabled when a
command ":capture=1;" is received.

when the capture is enabled, we might have already accumulated some
stats. To avoid capturing such noise, we discard and reset the fps and
stats, updating the display and capturing only data from that point on.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

vulkan/overlay: Add support for a control socket.

Add support for socket from which the overlay layer can receive
commands. This control socket can be useful to allow setting options
once the application is already running. For instance, triggering the
capture of fps data at a certain point.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

vulkan/overlay: Add a control socket.

v2: Use a socket instead of named pipe.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

util/os_socket: Add socket related functions.

v3:
- Add os_socket.c/h into Makefile.sources (Lionel)
- Add empty non-linux implementation to public functions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>