git.libre-soc.org Git - mesa.git/log

st/mesa: move st_init_driver_flags() earlier in file

To get rid of forward declaration.

Reviewed-By: Gert Wollny <gw.fossdev@gmail.com>

docs: update llvmpipe.html build instructions

etnaviv: Add sampler TS support

Sampler TS is an hardware optimization that can be used when rendering
to textures. After rendering to a resource with TS enabled, the
texture unit can use this to bypass lookups to empty tiles. This also
means a resolve-in-place can be avoided to flush the TS.

This commit is also an optimization when not using sampler TS, as
resolve-in-place will now be skipped if a resource has no (valid) TS.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

etnaviv: Flush TS cache before changing TS configuration

This is to make sure that the TS is properly flushed to memory before
rendering to a new surface starts.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

etnaviv: Add TS_SAMPLER formats to etnaviv_format

Sampler TS introduces yet another format enumeration for
renderable+textureable formats. Introduce it into the etnaviv_format
table as another column.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

etnaviv: Check that resource has a valid TS in etna_resource_needs_flush

Resources only need a resolve-to-itself if their TS is valid for any
level, not just if it happens to be allocated.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

etnaviv: rnndb update

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

radv: it isn't an error to not support a format or driver

This reverts two of the vk_error changes:

reporting unsupported format is common,
and testing non-amdgpu drivers and ignoring them is also common.

Fixes: cd64a4f70 (radv: use vk_error() everywhere an error is returned)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

i965: Drop some reserved space remnants.

BATCH_RESERVED was deleted in commit 2c46a67b4138631217 (i965: Delete
BATCH_RESERVED handling.) The reserved_space field is dead code, and
the comments aren't useful these days.

intel: Drop mtypes.h include from brw_compiler.h.

This isn't necessary and causes trouble for a project I'm working on.

i965: Fold ABO state upload code into the SSBO/UBO state upload code.

Having this separate could potentially make programs that rebind atomics
but no other surfaces ever so slightly faster. But it's a tiny amount
of code to add to the existing UBO/SSBO atom, and very related.

The extra atoms have a cost on every draw call, and so dropping some of
them would be nice. This also reclaims a dirty bit.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Use nir_lower_atomics_to_ssbos and delete ABO compiler code.

We use the same hardware mechanism for both atomic counters and SSBO
atomics, so there's really no benefit to maintaining separate code to
handle each case. Instead, we can just use Rob's shiny new NIR pass to
convert atomic_uints to SSBOs, and delete piles of code.

The ssbo_start section of the binding table becomes a combined ABO and
SSBO section, with ABOs first, then SSBOs.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Make a better helper function for UBO/SSBO/ABO surface handling.

This fixes the missing AutomaticSize handling in the ABO code, removes
a bunch of duplicated code, and drops an extra layer of wrapping around
brw_emit_buffer_surface_state().

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

radv: add the vertex buffers BO to the list at bind time

This should reduce the overhead of adding a BO to the current
list, especially when the list is huge. Also, when a new pipeline
is bound, we only need to update the descriptor, the buffer objects
should already be in the list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: replace vb_dirty with RADV_CMD_DIRTY_VERTEX_BUFFER

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: drop radv_cmd_dirty_mask_t typedef

I don't think we will need a 64-bit unsigned integer for the
dirty flags in the future, and there is still 20 bits left.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: use an unsigned 32-bit integer for radv_queue::family_index

VkDeviceQueueCreateInfo::queueFamilyIndex is an unsigned 32-bit
integer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: do not add the image BO in radv_set_dcc_need_cmask_elim_pred()

radv_fill_buffer() ensures that the image BO is added to the list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: do not add the image BO in radv_set_color_clear_regs()

radv_fill_buffer() ensures that the image BO is added to the list.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

r600: set the number type correctly for float rts in cb setup

Float rts were always set as unorm instead of float.
Not sure of the consequences, but at least it looks like the blend clamp
would have been enabled, which is against the rules (only eg really bothered
to even attempt to specify this correctly, r600 always used clamp anyway).
Albeit r600 (not r700) setup still looks bugged to me due to never setting
BLEND_FLOAT32 which must be set according to docs...
Not sure if the hw really cares, no piglit change (on eg/juniper).

Reviewed-by: Dave Airlie <airlied@redhat.com>

r600: use ieee version of rsq

Both r600 and evergreen used the clamped version, whereas cayman used the
ieee one. I don't think there's a valid reason for this discrepancy, so let's
switch to the ieee version for r600 and evergreen too, since we generally
want to stick to ieee arithmetic.
With this, behavior for both rcp and rsq should now be the same for all of
r600, eg, cm, all using ieee versions (albeit note rsq retains the abs
behavior for everybody, which may not be a good idea ultimately).

Reviewed-by: Dave Airlie <airlied@redhat.com>

r600: use ieee version of rcp

r600 used the clamped version for rcp, whereas both evergreen and cayman
used the ieee version. I don't know why that discrepancy exists (it does so
since day 1) but there does not seem to be a valid reason for this, so make
it consistent. This seems now safer than before the previous commit (using
the dx10 clamp bit).
Note that rsq still uses clamped version (as before even though the table
may have suggested otherwise for evergreen) for r600/eg, but not for cayman.
Will be changed separately for better regression tracking...

Reviewed-by: Dave Airlie <airlied@redhat.com>

r600: use DX10_CLAMP bit in shader setup

The docs are not very concise in what this really does, however both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this bit set, if an instruction has the CLAMP modifier bit (which
clamps to [0,1]) set, then NaNs will be converted to zero, otherwise the result
will be NaN.
D3D10 would require this, glsl doesn't have modifiers (with mesa
clamp(x,0,1) would get converted to such a modifier) coupled with a
whatever-floats-your-boat specified NaN behavior, but the clamp behavior
should probably always be used (this also matches what a decomposition into
min(1.0, max(x, 0.0)) would do, if min/max also adhere to the ieee spec of
picking the non-nan result).
Some apps may in fact rely on this, as this prevents misrenderings in
This War of Mine since using ieee muls
(ce7a045feeef8cad155f1c9aa07f166e146e3d00), without having to use clamped
rcp opcode, which would also fix this bug there.
radeonsi also seems to set this bit nowadays if I see that righ (albeit the
llvm amdgpu code comment now says "Make clamp modifier on NaN input returns 0"
instead of "Do not clamp NAN to 0" since it was changed, which also looks
a bit misleading).

v2: set it in all shader stages.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103544

Reviewed-by: Dave Airlie <airlied@redhat.com>

r600: use min_dx10/max_dx10 instead of min/max

I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here, however the dx10 versions will pick
a non-nan result over a NaN one (this is also the ieee754 behavior), whereas
the non-dx10 ones will pick the NaN (verified by newly changed piglit
isinf-and-isnan test).
Other "modern" drivers will most likely do the same.
This was shown to make some difference for bug 103544, albeit it is not
required to fix it.

Reviewed-by: Dave Airlie <airlied@redhat.com>

r600: fix cubemap arrays

A lot of cubemap array piglits fail, port the texture type
picking code from radeonsi which seems to fix most of them.

For images I will port the rest of the code.

Fixes:
getteximage-depth gl_texture_cube_map_array-*
fbo-generatemipmap-cubemap array
getteximage-targets cube_array
amongst others.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

freedreno/a5xx: small comment fix

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno/a5xx: indirect draw support

A couple failures in piglit tests w/ TF or gl_VertexID + indirect draws.
OTOH all the deqp tests (although they don't test those combinations).
I suspect this could be fixed by a firmware update, but I don't think
there is much we can do in mesa for that.

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno/a5xx: split out helper for pipeline stalls

We need a similar thing for indirect draws.

Signed-off-by: Rob Clark <robdclark@gmail.com>

freedreno: update generated headers

Signed-off-by: Rob Clark <robdclark@gmail.com>

gallium/radeon: disable the cache when nir backend enabled

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

st/glsl_to_tgsi: use tgsi_get_gl_varying_semantic() for gs/tes outputs

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

gallium/tgsi: add tess output supoort to tgsi_get_gl_varying_semantic()

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

st/glsl_to_tgsi: make use of tgsi_get_gl_varying_semantic()

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

gallium/tgsi: add prim id to tgsi_get_gl_varying_semantic()

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

i965: Make use of brw_load_register_imm32() helper function

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Nanley Chery <nanley.g.chery@intel.com>

i965/gen8+: Fix the number of dwords programmed in MI_FLUSH_DW

Number of dwords in MI_FLUSH_DW changed from 4 to 5 in gen8+.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>

i965: Program DWord Length in MI_FLUSH_DW

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>

anv/gen10: Enable float blend optimization

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel/genxml: Add Cache Mode SubSlice Register to gen10.xml

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

anv/gen10: Implement WaSampleOffsetIZ workaround

We already have this workaround in OpenGL driver.
See Mesa commit 3cf4fe2219.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: Nanley Chery <nanley.g.chery@intel.com>
Cc: Rafael Antognolli <rafael.antognolli@intel.com>

mesa/st: add missing copyright headers to memoryobjects files

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

mesa: minor tidy up for memory object error strings

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

broadcom/vc4: fix indentation in vc4_screen.c

Stumbled into this when adding a new PIPE_CAP.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

Revert "intel/fs: Use a pure vertical stride for large register strides"

This reverts commit e8c9e65185de3e821e1e482e77906d1d51efa3ec.

With the actual bug fixed (by commit 6ac2d1690192), this is not
necessary. I'm doubtful of its correctness in any case.

i965/fs: Fix extract_i8/u8 to a 64-bit destination

The MOV instruction can extract bytes to words/double words, and
words/double words to quadwords, but not byte to quadwords.

For unsigned byte to quadword, we can read them as words and AND off the
high byte and extract to quadword in one instruction. For signed bytes,
we need to first sign extend to word and the sign extend that word to a
quadword.

Fixes the following test on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103628
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965/fs: Split all 32->64-bit MOVs on CHV, BXT, GLK

Fixes the following tests on CHV, BXT, and GLK:
KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint32_to_int64
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103115

swr/rast: Faster emulated simd16 permute

Speed up simd16 frontend (default) on avx/avx2 platforms;
fixes performance regression caused by switch to simdlib.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: mesa-stable@lists.freedesktop.org

swr/rast: Use gather instruction for i32gather_ps on simd16/avx512

Speed up avx512 platforms; fixes performance regression caused
by swithc to simdlib.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Cc: mesa-stable@lists.freedesktop.org

egl/wayland: Add a fallback when fourcc query isn't supported

When queryImage doesn't support __DRI_IMAGE_ATTRIB_FOURCC wayland clients
will die with a NULL derefence in wl_proxy_add_listener.

Attempt to provide a simple fallback to keep ancient systems working.

Fixes: 6595c699511 ("egl/wayland: Remove more surface specifics from
create_wl_buffer")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103519
Signed-off-by: Derek Foreman <derekf@osg.samsung.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

radeonsi: remove has_cp_dma, has_streamout flags (v2)

v2: remove r600_can_dma_copy_buffer

i965: implement (un)mapImage

Already implemented for Gallium drivers.

Useful for gbm_bo_(un)map.

Tests:
  By porting wayland/weston/clients/simple-dmabuf-drm.c to GBM.
  kmscube --mode=rgba
  kmscube --mode=nv12-1img
  kmscube --mode=nv12-2img
  piglit ext_image_dma_buf_import-refcount -auto
  piglit ext_image_dma_buf_import-transcode-nv12-as-r8-gr88 -auto
  piglit ext_image_dma_buf_import-sample_rgb -fmt=XR24 -alpha-one -auto
  piglit ext_image_dma_buf_import-sample_rgb -fmt=AR24 -auto
  piglit ext_image_dma_buf_import-sample_yuv -fmt=NV12 -auto
  piglit ext_image_dma_buf_import-sample_yuv -fmt=YU12 -auto
  piglit ext_image_dma_buf_import-sample_yuv -fmt=YV12 -auto

v2: add early return if (flag & MAP_INTERNAL_MASK)
v3: take input rect into account and test with kmscube and piglit.
v4: handle wraparound and bo reference.
v5: indent, exclude 0 width and height on the boundary, map bo
    independently of the image.

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

radv: force enable LLVM sisched for The Talos Principle

It seems safe and it improves performance by +4% (73->76).

A drirc based solution is not what we want for now, keep it
simple and improve later if it's really needed.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: add nosisched debug option

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>

spirv: fix typo on DO NOT EDIT header

Introduced on commit 157c9a13414b524ce98ea0ea07fce819efc1ba65

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>

meson: if dep_dl is an empty list, it's not a dependency object

It's ok to use an empty list for dependencies:, but it's not ok to try to
use the found() method of it.

See also https://github.com/mesonbuild/meson/issues/2324

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

radv: Free temporary syncobj after waiting on it.

Otherwise we leak it.

Fixes: eaa56eab6da "radv: initial support for shared semaphores (v2)"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

radv: Free syncobj with multiple imports.

Otherwise we can leak the old syncobj.

Fixes: eaa56eab6da "radv: initial support for shared semaphores (v2)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

i965: Track the depth and render caches separately

Previously, we just had one hash set for tracking depth and render
caches called brw_context::render_cache.  This is less than ideal
because the depth and render caches are separate and we can't track
moves between the depth and the render caches.  This limitation led
to some unnecessary flushing around the depth cache.  There are cases
(mostly with BLORP) where we can end up touching a depth or stencil
buffer through the render cache.  To guard against this, blorp would
unconditionally do a render_cache_set_check_flush on it's destination
which meant that if you did any rendering (including a BLORP operation)
to a given surface and then used it as a blorp destination, you would
end up flushing it out of the render cache before rendering into it.

Things get worse when you dig into the depth/stencil state code for
regular GL draw calls.  Because we may end up rendering to a depth
or stencil buffer via BLORP, we did a render_cache_set_check_flush on
all depth and stencil buffers in brw_emit_depthbuffer to ensure that
they got flushed out of the render cache prior to using them for depth
or stencil testing.  However, because we also need to track dirtiness
for depth and stencil so that we can implement depth and stencil
texturing correctly, we were adding all depth and stencil buffers to the
render cache set in brw_postdraw_set_buffers_need_resolve.  This meant
that, if anything caused 3DSTATE_DEPTH_BUFFER to get re-emitted
(currently _NEW_BUFFERS, BRW_NEW_BATCH, and BRW_NEW_BLORP), we would
almost always do a full pipeline stall and render/depth cache flush.

The root cause of both of these problems is that we can't tell the
difference between the render and depth caches in our tracking.  This
commit splits our cache tracking into two sets, one for render and one
for depth, and properly handles transitioning between the two.  We still
flush all the caches whenever anything needs to be flushed.  The idea is
that if we're going to take the hit of a flush and stall, we may as well
flush everything in the hopes that we can avoid a flush by something
else later.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/blorp: Add more destination flushing

Right now we just always flush the destination for render and aren't
particularly careful about depth or stencil. Soon, flush_for_render
isn't going to do the same thing as flush_for_depth and we may be doing
a good deal less depth flushing so we should be a bit more precise.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Add more precise cache tracking helpers

In theory, this will let us track the depth and render caches
separately. Right now, they're just wrappers around
brw_render_cache_set_*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Add stencil buffers to cache set regardless of stencil texturing

We may access them as a texture using blorp regardless of whether or not
stencil texturing is enabled.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org

i965: Switch over to fully external-or-not MOCS scheme

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Use PTE MOCS for all external buffers

We were already using PTE for all render targets in case one happened to
get scanned out. However, this still wasn't 100% correct because there
are still possibly cases where we may want to texture from an external
buffer even though we don't know the caching mode. This can happen, for
instance, on buffers imported from another GPU via prime.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101691
Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/blorp: Make the MOCS setting part of blorp_address

This makes our MOCS settings significantly more flexible.

Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

anv/blorp: Add a device parameter to blorp_surf_for_anv_image

Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/blorp: Use mocs.tex for depth stencil

Cc: "17.3" <mesa-stable@lists.freedesktop.org>
Tested-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/tools/error: Decode compute shaders.

This is a bit more annoying than your average shader - we need to look
at MEDIA_INTERFACE_DESCRIPTOR_LOAD in the batch buffer, then hop over
to the dynamic state buffer to read the INTERFACE_DESCRIPTOR_DATA, then
hop over to the instruction buffer to decode the program.

Now that we store all the buffers before decoding, we can actually do
this fairly easily.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/tools/error: Use do-while for field iterator loops.

while loops skip the first field of the instruction/structure, which
is not what the code intended. It works out because the field we're
looking for doesn't happen to be first, but we ought to do it right
regardless.

Found while writing the next patch, where Kernel Start Pointer is
the first field of INTERFACE_DESCRIPTOR_DATA.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/tools/error: Decode shaders while decoding batch commands.

This makes aubinator_error_decode's shader dumping work like aubinator.
Instead of printing them after the fact, it prints them right inside the
3DSTATE_VS/HS/DS/GS/PS packet that references them. This saves you the
effort of cross-referencing things and jumping back and forth.

It also reduces a bunch of book-keeping, and eliminates the limitation
that we could only handle 4096 programs. That code was also broken and
failed to print any shaders if there were under 4096 programs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/tools/error: Save error state sections and decode them later.

This lets us complete parsing and storing of each buffer's data before
we begin decoding the batchbuffer. This makes it possible to inspect
the state buffer and program buffer, so we can properly decode any
indirect state or shader programs.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

intel/tools/error: Fix null termination of ring name string.

Ported from intel_error_decode. We don't want to run off the end.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

intel/tools/error: Drop unused MAX_RINGS #define.

Dead code.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

intel/tools/error: Refactor buffer matching, add more buffers.

Based on a similar patch to intel_error_decode by Chris Wilson.

While we're de-duplicating the gtt_offset calculation, we can simplify
it to assume two hex digits are there - the kernel has done this since
v4.6, and we already require error states from v4.10.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

intel/tools/error: Only decode a few sections of error states.

These three are the only we can reasonably decode with genxml.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

intel/tools/error: Drop unused parameters from decode() helper.

Also change count from a pointer into a value. We were supposed to
be resetting it to 0 (and failed to), but that's gone since we dropped
the pre-ascii85 handling.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

intel/tools/error: Drop support for non-ascii85 encoded error states.

Error state files used to look like:

   render ring --- gtt_offset = 0x0e8f6000
   00000000 :  69040000
   00000004 :  79090000
   ...
   00007ffc :  00000000
   --- ringbuffer = 0x00001000

There were thousands of lines between sections.  The file format changed
with Kernel 4.10, and now has a single ascii85-encoded line following
each section heading.  This is much easier to parse.

There are a bunch of bugs in our handling of the old style format,
where we'd decode the wrong data, at the wrong time.  Fixing all of
these is going to be a giant pain.  It's also a lot of extra code
complexity.  In order to properly decode indirect state, or compute
shaders, we'll also need to parse data in advance of decoding, which
is going to be a giant pain with this ad-hoc "decode everywhere!"
mentality.  So, let's just drop support for the older file format.

This unfortunately requires an error state generated by Kernel 4.10 or
later.  That's probably not the end of the world, as we encourage users
to upgrade to the latest kernel when encountering GPU hangs anyway.  It
might be a giant pain for people with LTS kernels, though...

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

intel/tools/error: Do ascii85 decode first.

The dashes "---" may occur within an ascii85 block, but only an ascii85
block starts with ':' or '~'.

Ported from Chris Wilson's intel-gpu-tools commit:
bceec7e1d8a160226b783c6344eae8cbf4ece144

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

c11/haiku: Define missing timespec_get on Haiku

Reviewed-by: Brian Paul <brianp@vmware.com>

egl/haiku: Correct invalid void* conversion in calloc

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

meson: Remove build_by_default from amd code

This is the same logic as the previous two patches.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

meson: Don't build intel shared components by default

It's a neat idea, and still useful in some cases, but the intel common
code is used by i965 and anvil only, this is a little clearer.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

meson: don't use build_by_default for specific gallium drivers

Using build_by_default : false is convenient for dependencies that can
be pulled in by various diverse components of the build system, the
gallium hardware/software drivers and state trackers do not fit that
description. Instead, these should be guarded using the variable that tracks
whether that driver should be enabled.

This leaves a few helper libraries: trace, rbug, etc, and the generic
winsys bits as `build_by_default : false` because there are a large
number of gallium components that pull them in.

v2: - remove build_by_default from winsys convenience libs as well.
v3: - Always put drivers before winsys for consistency

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>

r600/shader: handle bitfield extract semantics properly.

Fixes:
tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-bitfieldExtract.shader_test

Signed-off-by: Dave Airlie <airlied@redhat.com>

r600: handle bitfieldInsert corner case.

This handles the bits >= 32 corner case in bitfieldInsert.

Fixes:
tests/spec/arb_gpu_shader5/execution/built-in-functions/fs-bitfieldInsert.shader_test.

Signed-off-by: Dave Airlie <airlied@redhat.com>

r600: add gs tri strip adjacency fix.

Like
radeonsi: generate GS prolog to (partially) fix triangle strip adjacency rotation

evergreen hw suffers from the same problem, so rotate the
geometry inputs to fix this.

This fixes:
./bin/glsl-1.50-geometry-primitive-types GL_TRIANGLE_STRIP_ADJACENCY
on evergreen.

Signed-off-by: Dave Airlie <airlied@redhat.com>

r600: fix isoline tess factor component swapping.

As per radeonsi, the tess factor components for isolines
are reversed.

Fixes: tests/spec/arb_tessellation_shader/execution/isoline.shader_test
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

r600/shader: reserve first register of vertex shader.

r0 in input into vertex shaders contains things like vertexid,
we need to reserve it even if we have no inputs.

This fixes a bunch of tessellation piglits.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

meson: Move -Dvulkan-drivers handling higher in the file

The window-system auto-detection code (specifically for glx) relies on
with_any_vk being available. This fixes the Vulkan-only build. Also,
this puts it up near the handling of -Ddri-drivers and -Dgallium-drivers
which seems to make a bit more sense.

Fixes: 118a7f044191d4ab15ac9 "meson: add support for xlib glx"
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

meson: Stop requiring platforms for Vulkan

It should be perfectly valid to build a completely headless Vulkan
driver. We don't need to require a platform.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

r600: don't emit atomic save if we have no atomic counters.

Otherwise we end up emitting the fence.

Tested-By: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

glx/dri3: Fix passing renderType into glXCreateContext

Without this, trying to create a GLX_RGBA_FLOAT_TYPE_ARB context would
fail, because GLX_RGBA_TYPE would be a mismatch with the fbconfig.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>

glx/drisw: Fix glXMakeCurrent(dpy, None, ctx)

This is perfectly legal in GL 3.0+.

Fixes piglit/glx-create-context-current-no-framebuffer.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>

glx: Lower GLX opcode lookup into SendMakeCurrentRequest

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>

aubinator: Don't skip the first field in each subgroup

The previous iteration algorithm would advance the field pointer right
after we advance the group. This meant that you would end up with
skipping the first field of the group. In the common case, where the
only field is a struct (e.g. 3DSTATE_VERTEX_BUFFERS), it would get
skipped entirely.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/genxml: Delete empty groups

They serve no purpose other than to just fill empty space in the packet
so each dword has something. Just disallowing empty groups is a bit
easier on some of the tools. This does not change the generated packing
headers in any way.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv: Don't crash on invalid heap sizes when the PCI ID is overriden

nir/spirv: tg4 requires a sampler

Gather operations in both GLSL and SPIR-V require a sampler. Fixes
gathers returning garbage when using separate texture/samplers (on AMD,
was using an invalid sampler descriptor).

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

spirv: Use correct type for sampled images

We should use the result type of the OpSampledImage opcode, rather than
the type of the underlying image/samplers.

This resolves an issue when using separate images and shadow samplers
with glslang. Example:

    layout (...) uniform samplerShadow s0;
    layout (...) uniform texture2D res0;
    ...
    float result = textureLod(sampler2DShadow(res0, s0), uv, 0);

For this, for the combined OpSampledImage, the type of the base image
was being used (which does not have the Depth flag set, whereas the
result type does), therefore it was not being recognised as a shadow
sampler. This led to the wrong LLVM intrinsics being emitted by RADV.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

spirv: add DO NOT EDIT warning on generated spirv_info.c

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

loader/dri3: Improve dri3 thread-safety

It turned out that with recent changes that call into dri3 from glFinish(),
it appears like different thread end up waiting for X events simultaneously,
causing deadlocks since they steal events from eachoter and update the dri3
counters behind eachothers backs.

This patch intends to improve on that. It allows at most one thread at a
time to wait on events for a single drawable. If another thread intends to
do the same, it's put to sleep until the first thread finishes waiting, and
then it rechecks counters and optionally retries the waiting. Threads that
poll for X events never pulls X events off the event queue if there are
other threads waiting for events on that drawable. Counters in the
dri3 drawable structure are protected by a mutex. Finally, the mutex we
introduce is never held while waiting for the X server to avoid
unnecessary stalls.

This does not make dri3 drawables completely thread-safe but at least it's a
first step.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102358
Fixes: d5ba75f8881 "st/dri2 Plumb the flush_swapbuffer functionality through to dri3"
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>