git.libre-soc.org Git - mesa.git/log

intel/fs: refactor surface header setup

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

intel/fs: Add DWord scattered read/write opcodes

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

intel/nir: Use nir_extract_bits in lower_mem_access_bit_sizes

The new helper solves most of the annoying problems with data wrangling
in brw_nir_lower_mem_access_bit_sizes.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

nir: Add tests for nir_extract_bits

nir/builder: Add a nir_extract_bits helper

This new helper is better than nir_bitcast_vector because it's able to
take a (mostly) arbitrary range from the source vector. The only
requirement is that first_bit has to be aligned to the smaller of the
two bit sizes. It wouldn't be hard to lift that requirement but it's
reasonable for now.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

egl: fix _EGL_NATIVE_PLATFORM fallback

When the X11 or Haiku platforms were compiled in, they would bypass the
`_EGL_NATIVE_PLATFORM` fallback by always returning themselves instead.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

anv: Unify GetDeviceQueue and GetDeviceQueue2

Avoid duplicating some checks and code by making anv_GetDeviceQueue a
subcase of anv_GetDeviceQueue2, like radv does.

Signed-off-by: Ricardo Garcia <rgarcia@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

panfrost: Select format-specific blending intrinsics

If we have an accelerated path for a particular framebuffer format,
let's use it to save a bunch of instructions in a blend shader.

[Tomeu: Only use the faster intrinsic on >T760]

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

pan/midgard: Pack load/store masks

While most load/store operations on 32-bit/vec4 intriniscally, some are
not and have special type-size-dependent semantics for the mask. We need
to convert into this native format.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

pan/midgard: Implement nir_intrinsic_load_output_u8_as_fp16_pan

We can use the native Midgard ops for this, depending what chip we're
on.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

pan/midgard: Identify ld_color_buffer_u8_as_fp16*

There are two versions of this opcode, depending what version of the ISA
you're using. I'm not sure if there's a semantic difference; I think
there might be some slight subtleties but it's too early to know at this
stage.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

nir: Add load_output_u8_as_fp16_pan intrinsic

This is a single opcode, at least on newer Midgard chips. It's easier to
have this represented in NIR rather than trying to optimize out the
conversions, so let's add the intrinsic.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

panfrost: Set depth and stencil for SFBD based on the format

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

zink: correct depth-stencil format

When using packed vulkan-formats on little-endian systems, we need to
swap the components for the gallium formats. And since Zink isn't
big-endian safe yet, little-endian is the only endianess we care about
right now.

This fixes a bunch of piglit tests, amongs others:
- spec@arb_depth_texture@depth-level-clamp
- spec@arb_depth_texture@depthstencil-render-miplevels * d=z24
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-blit
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-copypixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-drawpixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-readpixels

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16e ("zink: introduce opengl over vulkan")

zink/spirv: add support for nir_op_flrp

This fixes the following piglit:

spec@ati_fragment_shader@ati_fragment_shader-render-fog

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>

egl: Mention if swrast is being forced

The system can be disabling HW acceleration unbeknown to the user,
leading to a long debug session trying to work out which component is
failing. A quick mention that it is the environment override would be
very useful.

v2: Use more generic "CPU renderer" and so try to avoid jargon.

Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Martin Peres <martin.peres@linux.intel.com>

spirv: Sort out the mess that is sampled image

This commit makes two major changes.  First, we add a second case to
OpLoad for sampled images which constructs a vtn_sampled_image and
stashes that rather than stashing a pointer to the combined image
sampler like we do for bare samplers and images.  This should be more in
line with how SPIR-V is intended to work and hopefully doesn't cause any
weird problems.  The second is a rework of vtn_handle_texture to assume
that everything has an image but not everything has a sampler.  We also
add a vtn_fail_if for the case where a texture instructions require a
sampler but none is provided.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

spirv: Add a vtn_decorate_pointer helper

This helper makes a duplicate copy of the pointer if any new access
flags are set at this stage. This way we don't end up propagating
access flags further than they actual SPIR-V decorations. In several
instances where we create new pointers, we still call the decoration
helper directly because no copy is needed.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

spirv: Remove the type from sampled_image

We have types on all vtn_values at this point so there's no reason to
carry the redundant type information.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

freedreno/ir3: also track # of nops for shader-db

The instruction count is (mostly) a measure of what optimization passes
can do, while # of nops is more an indication of how effectively the
scheduler is balancing register pressure vs instruction count. So track
these independently.

(There could be opportunities to rematerialize values to reduce register
pressure, swapping some nop's with other alu instructions, so nothing is
truely independent.. but it is still useful to break these stats out.)

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: sync disasm changes from envytools

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/a4xx: fix SP_FS_MRT_REG.HALF_PRECISION

Set flag based on actual output reg type.

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/a3xx: fix SP_FS_MRT_REG.HALF_PRECISION

We should really be setting this based on the actual output register
type.

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: remove obsolete comment

The meta PHI instruction was removed long ago. And fanin/fanout
themselves to not contribute actual instructions (at least not by the
time you get to sched, they may prevent copy-propagating away a mov)

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3/ra: remove ir print after livein/out

The IR hasn't changed at this point, so it isn't really adding any
value.

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3/ra: move regs_count==0 check

Fold it in to writes_gpr() (since a register that does not reference any
registers by definition does not write a register). This lets us avoid
having to handle this case in a few other places.

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: ir3_print tweaks

Handle HALF/HIGH flags in all cases, and colorize SSA src notation.

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: use SSA flag on dest register too

We did this in some places before, but not consistantly. But it will be
useful for two-pass RA, to identify which registers have already been
assigned.

While we are cleaning this up, use __ssa_src() and new __ssa_dst()
helper more consistently. (If nothing else, this reduces the # of
callers of ir3_reg_create() to audit that we didn't miss something)

Signed-off-by: Rob Clark <robdclark@chromium.org>

freedreno/ir3: split pre-coloring to it's own function

Signed-off-by: Rob Clark <robdclark@chromium.org>

spirv: Don't leak GS initialization to other stages

The stage specific fields of shader_info are in an union. We've
likely been lucky that this value was either overwritten or ignored by
other stages. The recent change in shader_info layout in commit
84a1a2578da ("compiler: pack shader_info from 160 bytes to 96 bytes")
made this issue visible.

Fixes: cf2257069cb ("nir/spirv: Set a default number of invocations for geometry shaders")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

compiler: pack shader_info from 160 bytes to 96 bytes

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

glsl/linker: pass shader_info to analyze_clip_cull_usage directly

This will be needed by the next commit.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

radeonsi/nir: fix compute shader crash due to nir_binary == NULL

This partially reverts 8b30114dda8.

Fixes: 8b30114dda8 "radeonsi/nir: call nir_serialize only once per shader"

radeonsi/nir: call nir_serialize only once per shader

We were calling it twice.

First serialize it, then use it to compute the cache key.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

util: add blob_finish_get_buffer

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

u_format: Fix swizzle of A1R5G5B5.

Found once I started using the generated unpack code from the Mesa side.

Fixes: 4bbaac3782ad ("gallium: Add some more channel orderings of packed formats.")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>

virgl: support emulating planar image sampling

Mesa emulates planar format sampling with per-plane samplers. Virgl now
supports this by allowing the plane index to be passed when creating a
sampler view from a planar image. With this change, mesa now passes that
information to virgl.

Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Lepton Wu <lepton@chromium.org>

gallium/swr: Enable some ARB_gpu_shader5 extensions
Enable / add to features.txt:
- Enhanced textureGather.
- Geometry shader instancing.
- Geometry shader multiple streams.

Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>

gallium/swr: Fix GS invocation issues
- Fixed proper setting gl_InvocationID.
- Fixed GS vertices output memory overflow.

Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>

ac: Handle invalid GFX10 format correctly in ac_get_tbuffer_format.

It happens that some games try to access a vertex buffer without
a valid format. This case was incorrectly handled by
ac_get_tbuffer_format which made ACO emit an invalid instruction.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

panfrost: Try to evict unused BOs from the cache

The panfrost BO cache can only grow since all newly allocated BOs are
returned to the cache (unless they've been exported).

With the MADVISE ioctl that's not a big issue because the kernel can
come and reclaim this memory, but MADVISE will only be available on 5.4
kernels. This means an app can currently allocate a lot memory without
ever releasing it, leading to some situations where the OOM-killer kicks
in and kills the app (or even worse, kills another process consuming
more memory than the GL app) to get some of this memory back.

Let's try to limit the amount of BOs we keep in the cache by evicting
entries that have not been used for more than one second (if the app
stopped allocating BOs of this size, it's likely to not allocate
similar BOs in a near future).

This solution is based on the VC4/V3D implementation.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

panfrost: Move BO cache related fields to a sub-struct

We will soon introduce an LRU list to evict BOs that have been unused
for more than 1 second. Let's first move all BO cache fields to a
sub-struct to clarify which fields are used by the BO caching logic.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Switch base for vertex texturing on T720

There aren't texture pipeline registers anymore; instead, space is
shared with work and ldst registers for output and input respectively.
We need to shift the base registers to represent this correctly.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Pass shader stage to disassembler

Vertex texturing behaves differently from fragment texturing on some
GPUs.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Disassemble half-steps correctly

The meaning of some bits shifts; we need to account for this to print
swizzles sanely.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

pan/midgard: Fix printing of half-registers in texture ops

We were using old style half-registers; let's update that to be
consistent, preparing us for more disassmbler changes in this area.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

freedreno/ir3: Use regid() helper when setting up precolor regs

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Turn on tessellation shaders

Wow. Very triangle. So shader.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Only use merged regs and four quads for VS+FS

When other geometry stages are present, we chose two quads and no
merged regs.

Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/blitter: Save tessellation state

We have tessellation state now.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Only set emit.hs/ds when we're drawing patches

At least the gallium blitter helper will call us to draw with
tessellation shaders set but a non-patch primitive.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno: Use bypass rendering for tessellation

It seems like tiling could work in the Adreno architecture, but we've
only ever seen bypass rendering with tessellation. For now, let's do
that too.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Program state for tessellation stages

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Emit constant parameters for tessellation stages

Assemble the information the stages need and emit the constants.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Allocate and program tessellation buffer

Tessellation needs a couple of buffers that should hold the entire
output from a full VS+TCS draw call.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Build the right draw command for tessellation

We need to select the right primitive type, set a bit to turn on
tessellation and or in the TES output primitive type.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Allocate const space for tessellation parameters

The tessellation stages need size and stride or the patch layout as
well as locations of attributes in the patch. The tesselation stages
also use two system memory BOs and need the iovas of those.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Pre-color TCS header and primitive ID inputs

Similar to GS, the registers are shared and not reinitialized betewen
VS and TCS, so we need to make sure to allocate the same registers for
the system values between stages.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Don't assume binning shader is always VS

In tessellation mode, the TES is (probably) the binning shader.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Setup inputs and outputs for tessellation stages

Similar to GS, some inputs are reused when the chsh from VS to TCS or
TES to GS, so we need to make sure we setup the right inputs and make
the shared system values outputs so they don't get clobbered.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Implement TCS synchronization intrinsics

We add two new IR3 specific nir intrinsics that map to the new condend
and endpatch instructions.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Implement tess coord intrinsic

Our lowering pass made the z component unused by replacing its uses
by 1 - x - y. The intrinsic implementation then just need to return
the x and y components.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: End TES with chsh when using GS

When we have both TES and GS, the TES needs to chain to the VS with
chmask and chsh GS just like the VS does to either TCS or GS.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Add new synchronization opcodes

There are two new opcodes in use in tesselation control shaders:
category 0, opcodes 13 and 15. unk13 is a kill type of instruction
that terminates threads where !p0.x and it used to narrow down a patch
wavefront to just thread 0. Then, once thread 0 has written the tess
levels, it issues unk15, which might signal the TE that another patch
has been fully written.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Extend geometry lowering pass to handle tessellation

VS and TCS pass varyings the same way as VS and GS does. TCS then
writes entire patch to a system memory BO and TES eventually reads
back from the BO once the TE starts generating vertices. TES outputs
vertices the same way as VS and GS, except when there's a GS as well,
in which case TES passes varyings to GS same way the VS would.

In addition, the TCS needs a little bit of control flow massaging so
that it only runs for valid invocations needs a couple of unknown
instructions to synchronize with the TE.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Add tessellation field to shader key

Whether we're tessellating and which primitives the TES outputs
affects the entire pipeline so let's add a field to the key to track
that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Use imul24 in offset calculations

With the imul24 opcode in place, we can now use it for computing local
offsets (ie for ldlw/stlw).

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Add ir3 intrinsics for tessellation

These provide the iovas for system memory buffers used for
tessellation as well as a new HW specific system value.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno: Don't count primitives for patches

The gallium helper doesn't like patches and we can't determine how
many primitives it gets tessellated into anyway. On gens where we
have tessellation, we get the prim count from a HW counter so just
skip counting on the CPU.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Add load and store intrinsics for global io

These intrinsics take a ivec2 for the 64 bit base address and a
integer offset.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/ir3: Emit link map as byte or dwords offsets as needed

Stages that load inputs with ldlw (TCS, GS) need byte offsets, stages
that load with ldg (TES) need dwords offsets.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Add register offset for STG/LDG

These instructions take a 64 bit iova as two conescutive registers and
a immediate offset. This patch adds support for the offset to be a
single register, which is added to the 64 bit iova.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6x: Rename z/s formats

What we call eRB6_Z24_UNORM_S8_UINT now is actually
RB6_Z24_UNORM_S8_UINT_AS_R8G8B8A8 and RB6_X8Z24_UNORM is actually
RB6_Z24_UNORM_S8_UINT.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Fix layered texture type enum

2D array textures and 3D textures are different enum values after all.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno: Add nogmem debug option to force bypass rendering

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Clear sysmem with CP_BLIT

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/a6xx: Fix primitive counters again

We use one mechanism for (REG_A6XX_RBBM_PRIMCTR_8_LO)
PIPE_QUERY_PRIMITIVES_GENERATED, which counts all primitives that exit
the geometry pipeline, whether or not xfb is on. Then for
PIPE_QUERY_PRIMITIVES_EMITTED, we use the CP_EVENT_WRITE subfunction
that writes out per-stream counts for generated and emitted, but only
when xfb is enabled.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/registers: Add comments about primitive counters

Adding comments about best guess at what the counters count.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/registers: Move SP_PRIMITIVE_CNTL and SP_VS_VPC_DST

Move these two to be in order with the other VS regs.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

freedreno/registers: Fix typo

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>

aco: add Instruction::usesModifiers() and add more checks in the optimizer

No pipeline-db changes.

v2: use early-exit for VOP3

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1)

radv: adjust loop unrolling heuristics for int64

In particular, increase the cost of 64-bit integer division.

Fixes huge shaders with dEQP-VK.spirv_assembly.type.scalar.i64.mod_geom
, with ACO used for GS this creates shaders requiring a branch with
>32767 dword offset.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

lima: fix bo submit memory leak

Fix memory leak on allocation for lima submit, reported by valgrind.

128 bytes in 1 blocks are definitely lost in loss record 38 of 84
   at 0x484A6E8: realloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
   by 0x58689C7: util_dynarray_ensure_cap (u_dynarray.h:91)
   by 0x5868BBB: util_dynarray_grow_bytes (u_dynarray.h:139)
   by 0x5868BBB: lima_submit_add_bo (lima_submit.c:113)
   by 0x585D7D3: lima_ctx_buff_va (lima_context.c:57)
   by 0x586378F: lima_pack_plbu_cmd (lima_draw.c:802)
   by 0x586378F: lima_draw_vbo (lima_draw.c:1351)
   by 0x5406A2F: u_vbuf_draw_vbo (u_vbuf.c:1184)
   by 0x55D0A57: st_draw_vbo (st_draw.c:268)
   by 0x55576CB: _mesa_draw_arrays (draw.c:374)
   by 0x55576CB: _mesa_draw_arrays (draw.c:351)
   by 0x43610B: Mesh::render_vbo() (mesh.cpp:583)
   by 0x415DBB: SceneBuild::draw() (scene-build.cpp:242)
   by 0x41131B: MainLoop::draw() (main-loop.cpp:133)
   by 0x411947: MainLoop::step() (main-loop.cpp:108)

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>

lima: fix nir shader memory leak

Fix memory leak on allocation for nir shader, reported by valgrind.

3,502 (480 direct, 3,022 indirect) bytes in 1 blocks are definitely lost in loss record 77 of 84
   at 0x48483F8: malloc (in /usr/lib/valgrind/vgpreload_memcheck-arm64-linux.so)
   by 0x5750817: ralloc_size (ralloc.c:119)
   by 0x5750977: rzalloc_size (ralloc.c:151)
   by 0x575C173: nir_shader_create (nir.c:45)
   by 0x5763ACB: nir_shader_clone (nir_clone.c:728)
   by 0x55D5003: st_create_fp_variant (st_program.c:1242)
   by 0x55D789F: st_get_fp_variant (st_program.c:1522)
   by 0x55D789F: st_get_fp_variant (st_program.c:1507)
   by 0x56400C3: st_update_fp (st_atom_shader.c:163)
   by 0x563D333: st_validate_state (st_atom.c:261)
   by 0x55D07CB: prepare_draw (st_draw.c:132)
   by 0x55D08DF: st_draw_vbo (st_draw.c:184)
   by 0x55576CB: _mesa_draw_arrays (draw.c:374)
   by 0x55576CB: _mesa_draw_arrays (draw.c:351)

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>

Meson: Remove lib prefix from graw and osmesa when building with Mingw.
Also remove version sufix from osmesa swrast on Windows.

v2: Make sure we don't remove lib prefix on *nix platforms.

Signed-off-by: Prodea Alexandru-Liviu <liviuprodea@yahoo.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Cc: "19.3" <mesa-stable@lists.freedesktop.org>

mesa: expose SPIR-V extensions in the Compatibility profile too

We would like to have GL 4.6 Compatibility too.

The extensions don't support compatibility features, so no other changes
are needed.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>

st_get_external_sampler_key: improve error message

Signed-off-by: Marek Olšák <marek.olsak@amd.com>

mesa/st: Make st_pipe_format_to_mesa_format an effective no-op.

All callers other than the unit test just wanted to convert back from
a known-mesa-equivalent format, which is now a no-op.

v2: Fix assertion failure in iris GL startup with BGR565 by continuing
to return MESA_FORMAT_NONE for non-Mesa formats.

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)

mesa/st: Gut most of st_mesa_format_to_pipe_format().

Now that MESA_FORMAT_x is just a PIPE_FORMAT_x define, we can strip
this function down to just the compression fallbacks.

v2: Restore the SRGB format for ASTC SRGB fallback case.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: Redefine MESA_FORMAT_* in terms of PIPE_FORMAT_*.

There are various places in Mesa where we would like to be able to
have a shared format enum between Mesa and gallium (NIR compiler's
image formats, for example, or mapping from gallium's formats to
mesa's and vice versa in st_format.c). Rewriting all MESA_FORMAT to
PIPE_FORMAT would be disruptive and possibly more work than it's worth
(And I actually prefer MESA_FORMAT's name scheme), so for now just
make it so that there's one shared set of enum values.

The #defines here were generated by printing out from the
tests/st_format.c round-tripping loop, with the exception of 8888
formats where I hand-edited the #defines to point at the corresponding
gallium packed format define.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: Prepare for the MESA_FORMAT_* enum to be sparse.

To redefine MESA_FORMAT in terms of PIPE_FORMAT enums, we need to fix
places where we iterated up to MESA_FORMAT_COUNT. I use
_mesa_get_format_name(f) == NULL as the signal that it's not an enum
value with a MESA_FORMAT.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa/st: Test round-tripping of all compressed formats.

We checked round-tripping of formats without fallbacks, but weren't
setting the compression support flags in the mock context and thus
needed to skip testing those. Just set all the flags and assert that
no fallbacks are triggered, so we get full test coverage.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa: Stop defining a full separate format for RGBA_UINT8.

We have packed formats for RGBA and ABGR already, so we can just
pack/unpack code.

v2: Rebase on endianness macro rename

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)

gallium: Add equivalents of packed MESA_FORMAT_*UINT formats.

These are the last formats that MESA_FORMAT had and PIPE_FORMAT
didn't. The .csv entries channel sizes and swizzles all came from the
corresponding UNORM format.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium: Add an equivalent of MESA_FORMAT_BGR_UNORM8.

This is the last unorm format that MESA_FORMAT had and PIPE_FORMAT
didn't. Note that it's an array format on gallium's side as well,
since it's a NPOT pixel size.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium: Add some more channel orderings of packed formats.

This covers everything that MESA_FORMAT had for packed unorm.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gallium: Add defines for FXT1 texture compression.

This texture compression is exposed by 830 and 915, and to make
MESA_FORMAT match PIPE_FORMAT defines I need a corresponding
PIPE_FORMAT.

v2: Set is_hand_written so we don't try to generate pack/unpack code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

mesa/st: Add mapping of MESA_FORMAT_RGB_SNORM16 to gallium.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

radv/gfx10: fix primitive indices orientation for NGG GS

The primitive indices have to be swapped to follow the drawing
order.

This fixes corruption with Overwatch when NGG GS is force enabled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Revert "intel/blorp: Fix usage of uninitialized memory in key hashing"

This reverts commit 4432a2d14d80081d062f7939a950d65ea3a16eed.

Pretty much every SKQP test dies with this assertion:
skqp: ../src/mesa/drivers/dri/i965/brw_program_cache.c:102: hash_key: Assertion `item->key_size % 4 == 0' failed.