git.libre-soc.org Git - mesa.git/log

radv: move descriptor sets out of cmd_state.

Instead of storing all the pointers and zeroing them all out,
just store a valid bitmask in the state. This also moves
CmdBindPipeline down the CPU usage profile for the
multithreading demo, as it no longer has to traverse MAX_SETS
entries to find the active descriptor sets.
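
The idea can be sketched roughly as follows. This is a minimal illustration, not the radv code: the struct, the function names and the MAX_SETS value of 32 are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SETS 32 /* assumed limit for this sketch; must not exceed 32 */

struct descriptor_state {
    const void *sets[MAX_SETS]; /* only entries whose bit is set in 'valid' are meaningful */
    uint32_t valid;             /* bit i set => sets[i] is bound */
};

/* Resetting is a single store; the pointer array is left untouched. */
static void descriptor_state_reset(struct descriptor_state *s)
{
    s->valid = 0;
}

static void descriptor_state_bind(struct descriptor_state *s, unsigned idx,
                                  const void *set)
{
    assert(idx < MAX_SETS);
    s->sets[idx] = set;
    s->valid |= UINT32_C(1) << idx;
}

/* Writes the indices of the bound sets to 'out' in ascending order and
 * returns how many there are. The loop ends as soon as no bits remain,
 * so it never walks all MAX_SETS slots unless the highest one is bound. */
static unsigned descriptor_state_active(const struct descriptor_state *s,
                                        unsigned out[MAX_SETS])
{
    unsigned count = 0;
    uint32_t mask = s->valid;

    for (unsigned idx = 0; mask != 0; idx++, mask >>= 1) {
        if (mask & 1)
            out[count++] = idx;
    }
    return count;
}
```

Stale pointers may remain in the array after a reset; that is harmless as long as every reader consults the mask first.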

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radv: add helper for setting a descriptor.

This is just a simple refactor.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radv: move vertex binding out of cmd state.

This isn't required to be cleared, since buffers are only linked
by vertex elements, so if elements are clear then no buffers
should be referenced.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radv: reorder cmd_state to remove a hole.

This just removes a hole in the cmd_state and packs some bools
together.
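
As a generic illustration of what removing a hole means (the structs and member names below are invented for the example and are not the radv cmd_state layout):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A bool in front of a pointer forces padding up to the pointer's
 * alignment, and the trailing bool adds tail padding as well. */
struct state_with_hole {
    bool dirty;
    void *pipeline;
    bool predicating;
};

/* Same members, with the bools grouped after the pointer. */
struct state_packed {
    void *pipeline;
    bool dirty;
    bool predicating;
};

/* Bytes saved by the reordering. The exact number depends on the ABI;
 * on a typical 64-bit target the structs are 24 and 16 bytes. */
static size_t bytes_saved_by_reordering(void)
{
    assert(sizeof(struct state_packed) <= sizeof(struct state_with_hole));
    return sizeof(struct state_with_hole) - sizeof(struct state_packed);
}
```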

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radv: free attachments on end command buffer.

If we allocate attachments in the begin command buffer due to the
render pass continue bit, we were leaking them.

Since renderpasses inside a cmd buffer malloc/free these properly,
and set to NULL, we just need to call free at end.

Fixes a memory leak with multithreading demo.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radv: Optimize calling radv_save_descriptors.

uint32_t data[MAX_SETS * 2] = {}; was getting executed before
the exit and took significant amounts of time. By having the
check outside the function, we skip the execution of the clear.
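
A rough sketch of the pattern, under stated assumptions: save_descriptors() and flush_descriptors() are hypothetical stand-ins, the "tracing enabled" condition is invented for the example, and MAX_SETS is assumed to be 32. The counter exists only to make the effect observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SETS 32 /* assumed value for this sketch */

static unsigned clears_performed; /* counts how often the scratch array was zeroed */

/* Hypothetical save routine: it zero-initializes a scratch array on
 * entry, so that cost is paid on every call, even if the function
 * would otherwise return early. */
static void save_descriptors(uint32_t out[MAX_SETS * 2])
{
    uint32_t data[MAX_SETS * 2] = {0};

    assert(out != NULL);
    clears_performed++;
    memcpy(out, data, sizeof(data));
}

/* The caller tests the condition, so the common path never enters the
 * function and never pays for the clear. */
static void flush_descriptors(bool tracing_enabled, uint32_t out[MAX_SETS * 2])
{
    if (tracing_enabled)
        save_descriptors(out);
}
```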

Reviewed-by: Dave Airlie <airlied@redhat.com>

radv: Use an array to store descriptor sets.

The vram_list linked list resulted in lots of pointer chasing.
Replacing this with an array instead improves descriptor set
allocation CPU usage by 3x at least (when also considering the free),
because it had to iterate through 300-400 sets on average.

Not a huge improvement as the pre-improvement CPU usage was only
about 2.3% in the busiest thread.
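
One way such an array could look, as a sketch only: the pool below keeps its live ranges sorted by offset in a contiguous array and allocates first-fit. All names and the fixed capacity are assumptions for the example; the actual radv data structure may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POOL_MAX_ENTRIES 8 /* assumed capacity for this sketch */

struct pool_entry {
    uint32_t offset;
    uint32_t size;
};

struct pool {
    uint32_t size;        /* total bytes managed by the pool */
    uint32_t entry_count; /* number of live allocations */
    struct pool_entry entries[POOL_MAX_ENTRIES]; /* kept sorted by offset */
};

/* First-fit allocation: returns the offset of the new range, or
 * UINT32_MAX if nothing fits. */
static uint32_t pool_alloc(struct pool *p, uint32_t size)
{
    uint32_t offset = 0;
    uint32_t i;

    assert(p->entry_count <= POOL_MAX_ENTRIES);
    if (p->entry_count == POOL_MAX_ENTRIES)
        return UINT32_MAX;

    /* Look for a large enough gap in front of entry i. */
    for (i = 0; i < p->entry_count; i++) {
        if (p->entries[i].offset - offset >= size)
            break;
        offset = p->entries[i].offset + p->entries[i].size;
    }
    /* No gap between entries: check the tail of the pool. */
    if (i == p->entry_count && p->size - offset < size)
        return UINT32_MAX;

    /* Shift the tail up by one slot so the array stays sorted. */
    memmove(&p->entries[i + 1], &p->entries[i],
            sizeof(p->entries[0]) * (p->entry_count - i));
    p->entries[i].offset = offset;
    p->entries[i].size = size;
    p->entry_count++;
    return offset;
}

/* Removes the range starting at 'offset'; returns 1 if it was found. */
static int pool_free(struct pool *p, uint32_t offset)
{
    for (uint32_t i = 0; i < p->entry_count; i++) {
        if (p->entries[i].offset == offset) {
            memmove(&p->entries[i], &p->entries[i + 1],
                    sizeof(p->entries[0]) * (p->entry_count - i - 1));
            p->entry_count--;
            return 1;
        }
    }
    return 0;
}
```

Both operations still scan linearly, but over contiguous memory rather than by following list pointers.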

Reviewed-by: Dave Airlie <airlied@redhat.com>

nv50,nvc0: Display shared memory usage in pipe_debug_message

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>

nv50,nvc0: Copy shared memory per block to the program info structure and back

In OpenCL/CUDA kernels, shared memory usage can be defined within the
kernel code. That usage will only be picked up while parsing the
SPIR-V, during the translation phase of the program.

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>

nv50/ir: Store shared memory per block in nv50_ir_prog_info

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>

i965/gen10: Implement Wa3DStateMode

This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.

V2: Remove the bits enabling Float blend optimization. It is
    enabled through CACHE_MODE_SS register.
    Update the comment.
    Move gen10 if block on top of gen9 if block.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>

i965/gen10: Enable float blend optimization

This optimization is enabled for previous generations too.
See Mesa commit c17e214a6b
On CNL this bit has been moved to CACHE_MODE_SS register.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>

i965/gen10: Implement WaForceRCPFEHangWorkaround

This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.

V2: Add the check for Post Sync Operation.
Update the workaround comment.
Use braces around if-else.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>

i965/gen10: Implement WaSampleOffsetIZ workaround

There are a few other (duplicate) workarounds which have similar recommendations:
WaFlushHangWhenNonPipelineStateAndMarkerStalled
WaCSStallBefore3DSamplePattern
WaPipeControlBefore3DStateSamplePattern

WaPipeControlBefore3DStateSamplePattern has some extra recommendations if the
driver is using mid-batch context restore. Ignoring it for now because we're
not doing mid-batch context restore in Mesa.

This workaround doesn't fix any of the piglit hangs we've seen
on CNL. But it might be fixing something we haven't tested yet.

V2: Use brw_load_register_imm32() to program CACHE_MODE_0.
Get rid of brw_flush_gpu_caches().

V3: Make the workaround helper functions static.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>

i965/gen10: Don't set Antialiasing Enable in 3DSTATE_RASTER if num_samples > 1

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/gen10: Don't set Smooth Point Enable in 3DSTATE_SF if num_samples > 1

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

winsys/amdgpu: Add R600_DEBUG flag to reserve VMID per ctx.

Fixes reverted patch f03b7c9 by doing VMID reservation per
process and not per context.
Also updates required amdgpu libdrm version since the change
involved interface updates in amdgpu libdrm.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

i965: perf: list registers to program for queries

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: perf: factorize code for availability

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: perf: make revision variable available

This will be used in the next commit to build up register programming.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl: fix interpolateAtXxx(some_vec[idx], ...) with dynamic idx

The dynamic index of a vector (not array!) is lowered to a sequence of
conditional assignments. However, the interpolate_at_* expressions
require that the interpolant is an l-value of a shader input.

So instead of doing conditional assignments of parts of the shader input
and then interpolating that (which is nonsensical), we interpolate the
entire shader input and then do conditional assignments of the interpolated
result.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

glsl: allow any l-value of an input variable as interpolant in interpolateAt*

The intended rule has been clarified in GLSL 4.60, Section 8.13.2
(Interpolation Functions):

   "For all of the interpolation functions, interpolant must be an l-value
    from an in declaration; this can include a variable, a block or
    structure member, an array element, or some combination of these.
    Component selection operators (e.g., .xy) may be used when specifying
    interpolant."

For members of interface blocks, var->data.must_be_shader_input must be
determined on-the-fly after lowering interface blocks, since we don't want
to disable varying packing for an entire block just because one input in it
is used in interpolateAt*.

v2: keep setting must_be_shader_input in ast_function (Ian)
v3: follow the relaxed rule of GLSL 4.60
v4: only apply the relaxed rules to desktop GL
    (the ES WG decided that the relaxed rules may apply in a future version
     but not retroactively; see also
     dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.*)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

nir/serialize: fix build with gcc 4.4.7

I had to build on RHEL6 today, and noticed this.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

i915g: remove some unknown cap warnings.

i915g: make gears run again.

We need to validate some structs exist before we dirty the states, and
avoid the problem in some other places.

Fixes: e027935a7 ("st/mesa: don't update unrelated states in non-draw calls such as Clear")

ac: remove the remaining duplicate llvm types

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: remove unused v4f32

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: add v2f32 to the common code and make use of it

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: use the ac f16 llvm type

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: use the ac f32 llvm type

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: use the ac f64 llvm type

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: use the common v8i32 llvm type

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: use the common v4i32 llvm type

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: add v3i32 to the common code and make use of it

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: add v2i32 to the common code and use it

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: use the ac i64 llvm type

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: remove unused i16 llvm type

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: use the ac ivoidt llvm type

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: use the ac i8 llvm type

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: use the ac i1 llvm type

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac: use the ac i32 llvm type

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

ac/radeonsi: add support for tex instr without a dereference

These are produced by nir_lower_bitmap(). Adding the missing dereference
would cause other issues that need to be hacked around, such as
skipping sampler lowering and uniform location assignment, so this
change seems the correct way to go.

Fixes 194 piglit crashes on radeonsi using NIR.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

nir: skip lowering sampler if there is no dereference

This avoids a crash on the output of nir_lower_bitmap().

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

r600: add support for early depth/stencil.

This adds support for the early depth/stencil property found
on image shaders.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

r600: add support for emitting RAT instructions to the assembler.

This adds support for emitting RAT instructions to the assembler.
RAT instructions are used to implement image accessors.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

r600: add support for mark bit to the assembler.

This adds support to the assembler for the mark bit
on the export word1.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

r600: add support for valid pixel mode on CF clauses

This just adds support to the assembler for setting the valid
pixel mode on the CF clause.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

r600: add support for some ALU sources.

These special ALU sources provide the shader engine,
simd and hw wave ids.

These are required for images support.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radv: use the optimal packets order for dispatch calls

This should reduce the time where compute units are idle, mainly
for meta operations because they use a bunch of compute shaders.

This seems to have a really minor positive effect for Talos, at least.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

nir: add tess patch support to nir_remove_unused_varyings()

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

es2api/ABI-check: Add es3.x symbols

Currently this ABI check only checks for es2 symbols, but es3.x symbols
are also exposed. Exposing these symbols is recommended by Khronos, and
as such the test should accept that as ABI.

see: https://lists.freedesktop.org/archives/mesa-stable/2016-June/004545.html
for the discussion about exposing these symbols

cc: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>

meson: Set c visibility args for wayland-drm

Because otherwise gbm will expose wayland symbols that it shouldn't.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>

st/glsl_to_nir: pass gl_shader_program to st_finalize_nir()

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

radv: Don't expose heaps with 0 memory.

It confuses CTS. This pregenerates the heap info into the
physical device, so we can use it for translating contiguous
indices into our "standard" ones.

This also makes the WSI a bit smarter in case the first preferred
heap does not exist.

Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>

gbm: Don't traverse backwards for includes

This is just a bad idea and should be avoided. Instead, make the #include
flat and fix the build systems to pass the proper -I flags.

v2: - add an inc_wayland_drm instead passing a path to
include_directories (Emil)
- update commit message (Emil)

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)

automake: Remove unused include path

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

radeonsi: remove 'Authors:' comments

It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

intel/fs: Don't allocate a param array for zero push constants

Thanks to the ralloc invariant of "any pointer returned from ralloc can
be used as a context", calling ralloc_size with a size of zero will
cause it to allocate at least a header. If we don't have any push
constants, then NULL is perfectly acceptable (and even preferred).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>

intel/fs: Alloc pull constants off mem_ctx

It doesn't actually matter since the only user of push constants, i965,
ralloc_steals it back to NULL but it's more consistent and probably
fixes memory leaks in some error cases.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org

Revert "meson: bump libdrm version required by amdgpu"

This reverts commit d364684711a5894fd3221191811d56713d6abdee.

The commit that bumped the autotools version was reverted, so let's
revert the meson version to match.

fixes: 1f2640bfa940362c7550cdd065d37555f21c8ae8
"Revert "winsys/amdgpu: Add R600_DEBUG flag to reserve VMID per ctx.""
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

gallivm: allow arch rounding with avx512

Fixes piglit vs-roundeven-{float,vec[234]} with simd16 VS.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>

etnaviv: Allow clearing constant buffer using buffer==NULL user_buffer==NULL

Prevents an assertion when using GALLIUM_HUD with ioquake3,
when cso_restore_constant_buffer_slot0 restores an empty
constant buffer in slot 0.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

etnaviv: Don't flush on transfer when UNSYNCHRONIZED

Structure code to only flush when we will potentially call cpu_prep. This
prevents spurious flushes in applications that heavily rely on u_uploader.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

etnaviv: don't do resolve-in-place without valid TS

GC3000 resolve-in-place assumes that the TS state is configured.
If it is not, this will result in MMU errors. This is especially
apparent when using glGenerateMipmap().

Fixes: 78ade659569e ("etnaviv: Do GC3000 resolve-in-place when possible")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

radv: make radv_bind_descriptor_set() static

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: make sure we set buffers as shareable properly.

This should make sure we don't treat exported buffers as local
bos.

Fixes: a639d40f13 (radv: add support for local bos. (v3))
Tested-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

svga: Use __asm__ instead of asm

__asm__ is portable, and allows the svga driver to be compiled with the
c99 standard instead of requiring the gnu99 standard.

I have compile tested this with GCC and Clang on Linux.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>

Revert "winsys/amdgpu: Add R600_DEBUG flag to reserve VMID per ctx."

This reverts commit f03b7c9ad92c1656a221297819fbc6d065cc0af7.

The libdrm interface is wrong.

intel: decoder: enable decoding a single field

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: expose missing find_enum()

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: extract field value computation

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: rename field() to field_value()

We would like to avoid collisions with variables named field.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: rename internal function to free name

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: simplify field_is_header()

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: common: make intel utils available from C++

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: remove unused platform field

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: error-decode: implement a rolling window of programs

If we have more programs than what we can store,
aubinator_error_decode will assert. Instead let's have a rolling
window of programs.

v2: Fix overflowing issues (Eric Engestrom)

v3: Go through programs starting at idx_program (Scott)
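
A rolling window of this kind can be sketched as a small ring buffer. The names and the capacity of 4 below are assumptions for illustration and do not come from aubinator_error_decode itself.

```c
#include <assert.h>

#define MAX_PROGRAMS 4 /* assumed capacity for this sketch */

struct program {
    int id;
};

struct program_window {
    struct program items[MAX_PROGRAMS];
    unsigned next;  /* slot the next push will overwrite */
    unsigned count; /* number of valid entries, at most MAX_PROGRAMS */
};

/* Stores a program, overwriting the oldest one once the window is full
 * instead of asserting. */
static void window_push(struct program_window *w, struct program p)
{
    w->items[w->next] = p;
    w->next = (w->next + 1) % MAX_PROGRAMS;
    if (w->count < MAX_PROGRAMS)
        w->count++;
}

/* Returns the i-th oldest program still in the window (0 = oldest). */
static const struct program *window_get(const struct program_window *w,
                                        unsigned i)
{
    unsigned start;

    assert(i < w->count);
    start = (w->next + MAX_PROGRAMS - w->count) % MAX_PROGRAMS;
    return &w->items[(start + i) % MAX_PROGRAMS];
}
```

Iteration begins at the oldest surviving entry rather than at slot 0, which is the reason for computing a start index.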

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

gallium: increase pipe_sampler_view::target bitfield size for MSVC

MSVC treats enums as being signed. The 4-bit target field isn't large
enough to correctly store the value 8 (for PIPE_TEXTURE_CUBE_ARRAY).
The bitfield value 0x8 was being interpreted as -8 so matching the
target with PIPE_TEXTURE_CUBE_ARRAY in switch statements, etc. was
failing.

To keep the structure size the same, we reduce the format field from
16 bits to 15. There don't appear to be any other enum bitfields
which need to be adjusted.

This fixes a number of Piglit cube map array tests.
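
The arithmetic behind the failure can be reproduced without relying on any compiler's bitfield rules. The helper below is a hypothetical illustration that sign-extends the low bits of a raw value the way a signed bitfield of that width would be read back.

```c
#include <assert.h>

#define PIPE_TEXTURE_CUBE_ARRAY 8 /* value quoted in the commit message above */

/* Interprets the low 'bits' bits of 'raw' as a two's complement signed
 * value, which is what reading a signed bitfield of that width yields. */
static int read_signed_field(unsigned raw, unsigned bits)
{
    unsigned sign, mask;

    assert(bits >= 1 && bits < 32);
    sign = 1u << (bits - 1);
    mask = (sign << 1) - 1;
    raw &= mask;
    return (int)(raw ^ sign) - (int)sign;
}
```

With 4 bits the value 8 reads back as -8 and no longer compares equal to PIPE_TEXTURE_CUBE_ARRAY; with 5 bits it reads back as 8.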

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

mapi: fix .so path in ABI-check

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>

intel: decoder: extract instruction/structs length

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: pack iterator variable declarations

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: simplify creation of struct when 0-allocated

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: add destructor for gen_spec

This makes use of ralloc to simplify the destruction. We can also
store instructions in hash tables.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: expose helper to test header fields

These fields are of little importance as they're used to recognize
instructions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: don't read qword outside instruction/struct limit

We used to print invalid data when the last field was being clamped to
32bits due to Dword Length of the whole instruction. Here is an
example where the decoder read part of the next instruction instead of
stopping at the 32bit limit:

0x000ce0b4:  0x10000002:  MI_STORE_DATA_IMM
0x000ce0b4:  0x10000002 : Dword 0
    DWord Length: 2
    Store Qword: 0
    Use Global GTT: false
0x000ce0b8:  0x00045010 : Dword 1
    Core Mode Enable: 0
    Address: 0x00045010
0x000ce0bc:  0x00000000 : Dword 2
0x000ce0c0:  0x00000000 : Dword 3
    Immediate Data: 8791026489807077376

With this change we have the proper value:

0x000ce0b4:  0x10000002:  MI_STORE_DATA_IMM (4 Dwords)
0x000ce0b4:  0x10000002 : Dword 0
    DWord Length: 2
    Store Qword: 0
    Use Global GTT: false
0x000ce0b8:  0x00045010 : Dword 1
    Core Mode Enable: 0
    Address: 0x00045010
0x000ce0bc:  0x00000000 : Dword 2
0x000ce0c0:  0x00000000 : Dword 3
    Immediate Data: 0
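
The bogus value 8791026489807077376 is 0x7a00000400000000, i.e. its upper half is the dword that follows the instruction. A clamped read can be sketched as below; read_field_qword() is a hypothetical helper written for this illustration, not the decoder's actual function.

```c
#include <assert.h>
#include <stdint.h>

/* Reads a field that nominally spans two dwords starting at
 * 'start_dword'. If the second dword lies beyond the instruction
 * length, only the low 32 bits are returned. */
static uint64_t read_field_qword(const uint32_t *dwords, unsigned dword_count,
                                 unsigned start_dword)
{
    uint64_t value;

    assert(start_dword < dword_count);
    value = dwords[start_dword];
    if (start_dword + 1 < dword_count)
        value |= (uint64_t)dwords[start_dword + 1] << 32;
    return value;
}
```

Passing the real instruction length (4 dwords) yields 0 for the field at Dword 3, while letting the read run one dword further reproduces the invalid value.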

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: split out getting the next field and decoding it

Due to the new way we handle fields, we must *not* forget the first
field when decoding instructions. The issue was that the advance
function was called first and skipped the first field.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: move field name copy

This should be inside the function that actually decodes fields.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: reorder iterator init function

Making the next change more readable.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: common: print out all dword with field spanning multiple dwords

For example, we were skipping Dword 3 in this PIPE_CONTROL:

0x000ce130:  0x7a000004:  PIPE_CONTROL
    DWord Length: 4
0x000ce134:  0x00000010 : Dword 1
    Flush LLC: false
    Destination Address Type: 0 (PPGTT)
    LRI Post Sync Operation: 0 (No LRI Operation)
    Store Data Index: 0
    Command Streamer Stall Enable: false
    Global Snapshot Count Reset: false
    TLB Invalidate: false
    Generic Media State Clear: false
    Post Sync Operation: 0 (No Write)
    Depth Stall Enable: false
    Render Target Cache Flush Enable: false
    Instruction Cache Invalidate Enable: false
    Texture Cache Invalidation Enable: false
    Indirect State Pointers Disable: false
    Notify Enable: false
    Pipe Control Flush Enable: false
    DC Flush Enable: false
    VF Cache Invalidation Enable: true
    Constant Cache Invalidation Enable: false
    State Cache Invalidation Enable: false
    Stall At Pixel Scoreboard: false
    Depth Cache Flush Enable: false
0x000ce138:  0x00000000 : Dword 2
    Address: 0x00000000
0x000ce140:  0x00000000 : Dword 4
    Immediate Data: 0

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: build sorted linked lists of fields

The xml files don't always have fields in order. This might confuse
our parsing of the commands. Let's have the fields in order. To do
this, the easiest way is to use a linked list. It also helps a bit
with the iterator.
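
A minimal sketch of a sorted insert into a singly linked list, assuming fields are ordered by their starting bit. The struct layout and names are invented for the example and are not the decoder's own types.

```c
#include <assert.h>
#include <stddef.h>

struct field {
    const char *name;
    unsigned start;     /* first bit of the field within the command */
    struct field *next;
};

/* Inserts 'f' so that the list stays ordered by ascending start bit,
 * whatever order the fields arrive in. */
static void field_list_insert(struct field **head, struct field *f)
{
    struct field **link = head;

    assert(f != NULL);
    while (*link != NULL && (*link)->start <= f->start)
        link = &(*link)->next;
    f->next = *link;
    *link = f;
}
```

Walking the list from the head then visits fields in bit order, so an iterator only needs to follow next pointers.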

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: common: expose gen_spec fields

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

travis: build meson first for quicker feedback

Meson is much quicker to build Mesa, giving quicker feedback if
executed first.

Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

meson: bump libdrm version required by amdgpu

Fixes: f03b7c9ad92c1656a221 "winsys/amdgpu: Add R600_DEBUG flag to
reserve VMID per ctx."
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

i965: Initialize disk shader cache if MESA_GLSL_CACHE_DISABLE is false

(Apologies for the double negative.)

For now, the shader cache is disabled by default on i965 to allow us
to verify its stability.

In other words, to enable the shader cache on i965, set
MESA_GLSL_CACHE_DISABLE to false or 0. If the variable is unset, then
the shader cache will be disabled.
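
The resulting rule can be sketched as follows. The helper names are hypothetical, and only the two spellings mentioned above are handled; the real parsing in Mesa may accept more.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Returns true only when the variable is explicitly set to a "false"
 * value. An unset variable keeps the cache disabled. */
static bool cache_enabled_from_value(const char *value)
{
    if (value == NULL)
        return false;
    return strcmp(value, "0") == 0 || strcmp(value, "false") == 0;
}

static bool shader_cache_enabled(void)
{
    return cache_enabled_from_value(getenv("MESA_GLSL_CACHE_DISABLE"));
}
```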

We use the build-id of i965_dri.so for the timestamp, and the pci
device id for the device name.

v2:
* Simplify code by forcing link to include build id sha. (Matt)

v3:
* Don't use a for loop with snprintf for bin to hex. (Matt)
* Assume fixed length render and timestamp string to further simplify
code.

Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

dri drivers: Always add the sha1 build-id

v4:
* Add Android build changes. (Emil)

Cc: Dylan Baker <dylanx.c.baker@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

disk_cache: Fix issue reading GLSL metadata

This would cause the read of the metadata content to fail, which would
prevent the linking from being skipped.

Seen on Rocket League with i965 shader cache.

Fixes: b86ecea3446e "util/disk_cache: write cache item metadata to disk"
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl/shader_cache: Save fs (BlendSupport) metadata

Fixes many GL 4.5 CTS blend tests, such as:

* GL45-CTS.blend_equation_advanced.extension_directive_enable
* GL45-CTS.blend_equation_advanced.extension_directive_warn
* GL45-CTS.blend_equation_advanced.blend_all.GL_MULTIPLY_KHR_all_qualifier
* GL45-CTS.blend_equation_advanced.blend_specific.GL_COLORBURN_KHR

v2:
* Directly save the BlendSupport field to avoid potentially including
a pointer in the future if the structure is updated. (tarceri)

Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Initialize sha1 hash of dri config options

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Don't link when the program was found in the disk cache

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: add cache fallback support using serialized nir

If the i965 gen program cannot be loaded from the cache, then we
fall back to using a serialized nir program.

This is based on "i965: add cache fallback support" by Timothy Arceri
<timothy.arceri@collabora.com>. Tim's version was written to fall back
to compiling from source, and therefore had to be much more complex.
After Connor and Jason implemented nir serialization, I was able to
rewrite and greatly simplify this patch.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>