git.libre-soc.org Git - mesa.git/log

etnaviv: Don't flush on transfer when UNSYNCHRONIZED

Structure code to only flush when we will potentially call cpu_prep. This
prevents spurious flushes in applications that heavily rely on u_uploader.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

etnaviv: don't do resolve-in-place without valid TS

GC3000 resolve-in-place assumes that the TS state is configured.
If it is not, this will result in MMU errors. This is especially
apparent when using glGenMipmaps().

Fixes: 78ade659569e ("etnaviv: Do GC3000 resolve-in-place when possible")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Tested-by: Chris Healy <cphealy@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

radv: make radv_bind_descriptor_set() static

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: make sure we set buffers as shareable properly.

This should make sure we don't treat exports buffers as local
bos.

Fixes: a639d40f13 (radv: add support for local bos. (v3))
Tested-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

svga: Use __asm__ instead of asm

__asm__ is portable, and allows the svga driver to be compiled with the
c99 standard instead of requiring the gnu99 standard.

I have compile tested this with GCC and Clang on Linux.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>

Revert "winsys/amdgpu: Add R600_DEBUG flag to reserve VMID per ctx."

This reverts commit f03b7c9ad92c1656a221297819fbc6d065cc0af7.

The libdrm interface is wrong.

intel: decoder: enable decoding a single field

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: expose missing find_enum()

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: extract field value computation

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: rename field() to field_value()

We would like to avoid collisions with variables named field.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: rename internal function to free name

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: simplify field_is_header()

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: common: make intel utils available from C++

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: remove unused platform field

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: error-decode: implement a rolling window of programs

If we have more programs than what we can store,
aubinator_error_decode will assert. Instead let's have a rolling
window of programs.

v2: Fix overflowing issues (Eric Engestrom)

v3: Go through programs starting at idx_program (Scott)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

gallium: increase pipe_sampler_view::target bitfield size for MSVC

MSVC treats enums as being signed. The 4-bit target field isn't large
enough to correctly store the value 8 (for PIPE_TEXTURE_CUBE_ARRAY).
The bitfield value 0x8 was being interpreted as -8 so matching the
target with PIPE_TEXTURE_CUBE_ARRAY in switch statements, etc. was
failing.

To keep the structure size the same, we reduce the format field from
16 bits to 15. There don't appear to be any other enum bitfields
which need to be adjusted.

This fixes a number of Piglit cube map array tests.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

mapi: fix .so path in ABI-check

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>

intel: decoder: extract instruction/structs length

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: pack iterator variable declarations

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: simplify creation of struct when 0-allocated

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: add destructor for gen_spec

This makes use of ralloc to simplify the destruction. We can also
store instructions in hash tables.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: expose helper to test header fields

These fields are of little importance as they're used to recognize
instructions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: don't read qword outside instruction/struct limit

We used to print invalid data when the last field was being clamped to
32bits due to Dword Length of the whole instruction. Here is an
example where the decoder read part of the next instruction instead of
stopping at the 32bit limit:

0x000ce0b4:  0x10000002:  MI_STORE_DATA_IMM
0x000ce0b4:  0x10000002 : Dword 0
    DWord Length: 2
    Store Qword: 0
    Use Global GTT: false
0x000ce0b8:  0x00045010 : Dword 1
    Core Mode Enable: 0
    Address: 0x00045010
0x000ce0bc:  0x00000000 : Dword 2
0x000ce0c0:  0x00000000 : Dword 3
    Immediate Data: 8791026489807077376

With this change we have the proper value :

0x000ce0b4:  0x10000002:  MI_STORE_DATA_IMM (4 Dwords)
0x000ce0b4:  0x10000002 : Dword 0
    DWord Length: 2
    Store Qword: 0
    Use Global GTT: false
0x000ce0b8:  0x00045010 : Dword 1
    Core Mode Enable: 0
    Address: 0x00045010
0x000ce0bc:  0x00000000 : Dword 2
0x000ce0c0:  0x00000000 : Dword 3
    Immediate Data: 0

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: split out getting the next field and decoding it

Due to the new way we handle fields, we need *not* to forget the first
field when decoding instructions. The issue was that the advance
function was called first and skipped the first field.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: move field name copy

This should be inside the function that actually decodes fields.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: reorder iterator init function

Making the next change more readable.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: common: print out all dword with field spanning multiple dwords

For example, we were skipping Dword 3 in this PIPE_CONTROL :

0x000ce130:  0x7a000004:  PIPE_CONTROL
    DWord Length: 4
0x000ce134:  0x00000010 : Dword 1
    Flush LLC: false
    Destination Address Type: 0 (PPGTT)
    LRI Post Sync Operation: 0 (No LRI Operation)
    Store Data Index: 0
    Command Streamer Stall Enable: false
    Global Snapshot Count Reset: false
    TLB Invalidate: false
    Generic Media State Clear: false
    Post Sync Operation: 0 (No Write)
    Depth Stall Enable: false
    Render Target Cache Flush Enable: false
    Instruction Cache Invalidate Enable: false
    Texture Cache Invalidation Enable: false
    Indirect State Pointers Disable: false
    Notify Enable: false
    Pipe Control Flush Enable: false
    DC Flush Enable: false
    VF Cache Invalidation Enable: true
    Constant Cache Invalidation Enable: false
    State Cache Invalidation Enable: false
    Stall At Pixel Scoreboard: false
    Depth Cache Flush Enable: false
0x000ce138:  0x00000000 : Dword 2
    Address: 0x00000000
0x000ce140:  0x00000000 : Dword 4
    Immediate Data: 0

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: decoder: build sorted linked lists of fields

The xml files don't always have fields in order. This might confuse
our parsing of the commands. Let's have the fields in order. To do
this, the easiest way it to use a linked list. It also helps a bit
with the iterator.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

intel: common: expose gen_spec fields

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

travis: build meson first for quicker feedback

Meson is much quicker to build Mesa, giving quicker feedback if
executed first.

Cc: Dylan Baker <dylan@pnwbakers.com>
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

meson: bump libdrm version required by amdgpu

Fixes: f03b7c9ad92c1656a221 "winsys/amdgpu: Add R600_DEBUG flag to
reserve VMID per ctx."
Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

i965: Initialize disk shader cache if MESA_GLSL_CACHE_DISABLE is false

(Apologies for the double negative.)

For now, the shader cache is disabled by default on i965 to allow us
to verify its stability.

In other words, to enable the shader cache on i965, set
MESA_GLSL_CACHE_DISABLE to false or 0. If the variable is unset, then
the shader cache will be disabled.

We use the build-id of i965_dri.so for the timestamp, and the pci
device id for the device name.

v2:
* Simplify code by forcing link to include build id sha. (Matt)

v3:
* Don't use a for loop with snprintf for bin to hex. (Matt)
* Assume fixed length render and timestamp string to further simplify
code.

Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

dri drivers: Always add the sha1 build-id

v4:
* Add Android build changes. (Emil)

Cc: Dylan Baker <dylanx.c.baker@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

disk_cache: Fix issue reading GLSL metadata

This would cause the read of the metadata content to fail, which would
prevent the linking from being skipped.

Seen on Rocket League with i965 shader cache.

Fixes: b86ecea3446e "util/disk_cache: write cache item metadata to disk"
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl/shader_cache: Save fs (BlendSupport) metadata

Fixes many GL 4.5 CTS blend tests, such as:

* GL45-CTS.blend_equation_advanced.extension_directive_enable
* GL45-CTS.blend_equation_advanced.extension_directive_warn
* GL45-CTS.blend_equation_advanced.blend_all.GL_MULTIPLY_KHR_all_qualifier
* GL45-CTS.blend_equation_advanced.blend_specific.GL_COLORBURN_KHR

v2:
* Directly save the BlendSupport field to avoid potentially including
a pointer in the future in the structure is updated. (tarceri)

Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Initialize sha1 hash of dri config options

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Don't link when the program was found in the disk cache

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: add cache fallback support using serialized nir

If the i965 gen program cannot be loaded from the cache, then we
fallback to using a serialized nir program.

This is based on "i965: add cache fallback support" by Timothy Arceri
<timothy.arceri@collabora.com>. Tim's version was written to fallback
to compiling from source, and therefore had to be much more complex.
After Connor and Jason implemented nir serialization, I was able to
rewrite and greatly simplify this patch.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: add support for cached shaders with xfb qualifiers

For now this disables the shader cache when transform feedback is
enabled via the GL API as we don't currently allow for it when
generating the sha for the shader.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

mesa/glsl: add api_enabled flag to gl_transform_feedback_info

This will be used to disable the shader cache when xfb is enabled
via the api as we don't currently allow for it when generating the
sha for the shader.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Add shader cache support for compute

v2:
* Use MAYBE_UNUSED. (Matt)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: add shader cache support for tess stages

v2:
* Use MAYBE_UNUSED. (Matt)

[jordan.l.justen@intel.com: *_cached_program => brw_disk_cache_*_program]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: add shader cache support for geometry shaders

v2:
* Use MAYBE_UNUSED. (Matt)

[jordan.l.justen@intel.com: *_cached_program => brw_disk_cache_*_program]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Add shader cache support for vertex and fragment stages

This enables the cache on vertex and fragment shaders only.

v2:
* Use MAYBE_UNUSED. (Matt)

[jordan.l.justen@intel.com: reword subject]
[jordan.l.justen@intel.com: *_cached_program => brw_disk_cache_*_program]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: add initial implementation of on disk shader cache

This uses the Mesa disk_cache support to write out the final linked
binary for vertex and fragment shader programs.

This is based off the initial implementation done by Carl Worth. It
has been significantly reworked, first by Tim Arceri, and then by
Jordan Justen.

v2:
* Squash 'i965: add image param shader cache support'
* Squash 'i965: add shader cache support for pull param pointers'
* Sustantially simplified by a rework on top of Jason's 2975e4c56a7a.
* Rename load_program_data to read_program_data. (Jason)

v3:
* Simplify and align program read/write. (Jason)

v4:
* Don't save prog_data size since we know it from the stage. (Ken)
* Don't save program size, since prog_data includes the size. (Ken)
* Remove `assert` that potentially could be triggered by disk
corruption of the cache entries. (Ken)
* Fix compute shader scratch allocation. (Ken)
* Remove special case mapping for non-LLC. (Ken)
* Remove SET_UPLOAD_PARAMS macro

[jordan.l.justen@intel.com: *_cached_program => brw_disk_cache_*_program]
[jordan.l.justen@intel.com: brw_shader_cache.c => brw_disk_cache.c]
[jordan.l.justen@intel.com: don't map to write program when LLC is present]
[jordan.l.justen@intel.com: set program_written_to_cache on read from cache]
[jordan.l.justen@intel.com: only try cache when status is linking_skipped]
[jordan.l.justen@intel.com: all v2-v4 changes noted above]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Calculate thread_count in brw_alloc_stage_scratch

Previously, thread_count was sent in from the stage after some stage
specific calculations. Those stage specific calculations were moved
into brw_alloc_stage_scratch, which will allow the shader cache to
also use the same calculations.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Add functions to get prog_data and prog_key sizes for a stage

v2:
* Return unsigned instead of size_t. (Ken)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

intel/compiler: Add union types for prog_data and prog_key stages

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

blob: Don't set overrun if reading 0 bytes at end of data

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/compiler: Remove final_program_size from brw_compile_*

The caller can now use brw_stage_prog_data::program_size which is set
by the brw_compile_* functions.

Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/compiler: add new field for storing program size

This will be used by the on disk shader cache.

v2:
* Set in brw_compile_* rather than brw_codegen_*. (Jason)

Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
[jordan.l.justen@intel.com: Only add to brw_stage_prog_data]
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Don't rely on nir for uses_texture_gather

When a program is restored from the shader cache, prog->nir will be
NULL, but prog->info will be restored.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/link: Serialize program to nir after linking for shader cache

If the shader cache is enabled, after linking the program, we
serialize the program to nir. This will be saved out by the glsl
shader cache support.

Later, if the same program is found in the cache, we can use the nir
for a fallback in the unlikely case that the gen binary program is not
found in the cache.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

glsl/shader_cache: Save and restore serialized nir in gl_program

v3:
* Rename serialized_nir* to driver_cache_blob*. (Tim)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

main: Add driver cache blob fields to gl_program

These fields can be used to optionally save off a driver blob with the
program metadata. For example, serialized nir, or tgsi.

v3:
* Rename serialized_nir* to driver_cache_blob*. (Tim)
* Free memory. (Jason)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

nir: Add hooks for testing serialization

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

nir: add serialization and deserialization

v2 (Jason Ekstrand):
- Various whitespace cleanups
- Add helpers for reading/writing objects
- Rework derefs
- [de]serialize nir_shader::num_*
- Fix uses of blob_reserve_bytes
- Use a bitfield struct for packing tex_instr data

v3:
- Zero nir_variable struct on deserialization. (Jordan)
- Allow nir_serialize.h to be included in C++. (Jordan)
- Handle NULL info.name. (Jason)
- Set info.name to NULL when name is NULL. (Jordan)

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>

mesa/st: implement max combined output resources limiting.

if the driver sets the cap, then use the value it gives us.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>

gallium: add cap for driver specified max combined shader resources.

Some hw (evergreen) has a limit on how many combined (images/buffers/mrts)
a fragment shader can access.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>

r600/sb: bail out if prepare_alu_group() doesn't find a proper scheduling

It is possible that the optimizer ends up in an infinite loop in
post_scheduler::schedule_alu(), because post_scheduler::prepare_alu_group()
does not find a proper scheduling. This can be deducted from
pending.count() being larger than zero and not getting smaller.

This patch works around this problem by signalling this failure so that the
optimizers bails out and the un-optimized shader is used.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103142
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

radeonsi: fix culldist_writemask in nir path

The shared si_create_shader_selector() code already offsets the mask.

Fixes the following piglit tests:

arb_cull_distance/clip-cull-3.shader_test
arb_cull_distance/clip-cull-4.shader_test

Fixes: 29d7bdd179bb (radeonsi: scan NIR shaders to obtain required info)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

nir/opt_intrinsics: Fix values for gl_SubGroupG{e,t}MaskARB

Previously the values were calculated by just shifting ~0 by the
invocation ID. This would end up including bits that are higher than
gl_SubGroupSizeARB. The corresponding CTS test effectively requires that
these high bits be zero so it was failing. There is a Piglit test as
well but this appears to checking the wrong values so it passes.

For the two greater-than bitmasks, this patch adds an extra mask with
(~0>>(64-gl_SubGroupSizeARB)) to force these bits to zero.

Fixes: KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680#c3
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Neil Roberts <nroberts@igalia.com>

i965: Check CCS_E compatibility for texture view rendering

Only use CCS_E to render to a texture that is CCS_E-compatible with the
original texture's miptree (linear) format. This prevents render
operations from writing data that can't be decoded with the original
miptree format.

On Gen10, with the new CCS_E-enabled formats handled, this enables the
driver to pass the arb_texture_view-rendering-formats piglit test.

v2. Add a TODO for texturing. (Jason)

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/isl: Disable some gen10 CCS_E formats for now

CannonLake additionally supports R11G11B10_FLOAT and four 10-10-10-2
formats with CCS_E. None of these formats fit within the current
blorp_copy framework so disable them until support is added.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

meson: pass correct args to gles2 ABI test

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

meson: pass correct args to gles1 ABI test

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

meson: pass correct args to gbm symbol test

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

meson: pass correct args to wayland-egl symbol test

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

automake+meson: don't run egl symbol check on libglvnd lib

We might want to add a symbol check for the glvnd variant though.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

meson: pass correct env/args to egl tests

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

gles2: fail symbol check if lib is missing

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

gles1: fail symbol check if lib is missing

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

gbm: fail symbol check if lib is missing

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

wayland-egl: fail symbol check if lib is missing

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

egl: fail symbol check if lib is missing

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

meson: set visibility flags on gbm

This is done in autotools, and is an oversight in the meson build.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>

meson: Don't link gbm with threads

It's supposed to be linked with pthread-stubs (if the platform needs
pthread-stubs). Pthread stubs support isn't (yet) implemented in the
meson build, so add a TODO.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

meson: Use true and false instead of yes and no for tristate options

This allows a user to not care whether they're setting a tristate or a
boolean option, which is a nice user facing feature, and something I've
personally run into.

Suggested-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

winsys/amdgpu: Add R600_DEBUG flag to reserve VMID per ctx.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

meson: do not search for needless deps

If we don't want to use these deps, there's no good reason to search
for them in the first place. This should shave a bit of time for the
initial build.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

radv: bail out when binding the same vertex buffers

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: bail out when binding the same index buffer

DOW3 appears to hit this path.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

meson: use dep_m in libgallium

The u_format_other.c users sqrtf, which on some systems require
a math-library. So let's make sure we link with it.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

radv: use correct alloc function when loading from disk

Fixes regression in:

dEQP-VK.api.object_management.alloc_callback_fail.graphics_pipeline

Fixes: 1e84e53712ae "radv: add cache items to in memory cache when reading from disk"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

i965: Fix ARB_indirect_parameters logic.

This patch modifies the ARB_indirect_parameters logic in
brw_draw_prims, so that our implementation isn't affected if
another application attempts to use predicates. Previously we
were using a predicate with a DELTAS_EQUAL comparison operation
and relying on the MI_PREDICATE_DATA register being 0. Our code
to initialize MI_PREDICATE_DATA to 0 was incorrect, so we were
accidentally using whatever value was written there. Because the
kernel does not initialize the MI_PREDICATE_DATA register on
hardware context creation, we might inherit the value from whatever
context was last running on the GPU (likely another process).
The Haswell command parser also does not currently allow us to write
the MI_PREDICATE_DATA register. Rather than fixing this and requiring
an updated kernel, we switch to a different approach which uses a
SRCS_EQUAL predicate that makes no assumptions about the states of any
of the predicate registers.

Fixes Piglit's spec/arb_indirect_parameters/tf-count-arrays test.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103085
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Don't flag BRW_NEW_SURFACES unless some push constants are dirty.

Due to a gaffe on my part, we were re-emitting all binding table entries
on every single draw call.  The push_constant_packets atom listens to
BRW_NEW_DRAW_CALL, but skips emitting 3DSTATE_CONSTANT_XS for each stage
unless stage_state->push_constants_dirty is true.  However, it flagged
BRW_NEW_SURFACES unconditionally at the end, by mistake.

Instead, it should only flag it if we actually emit 3DSTATE_CONSTANT_XS
for a stage.  We can move it a few lines up, inside the loop - the early
continues will skip over it if push constants aren't dirty for a stage.

With INTEL_NO_HW=1 set, improves performance of GFXBench5 gl_driver_2
on Apollolake at 1280x720 by 1.01122% +/- 0.470723% (n=35).

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel/genxml: Fix decoding of groups with fields smaller than a DWord.

Groups containing fields smaller than a DWord were not being decoded
correctly.  For example:

    <group count="32" start="32" size="4">
      <field name="Vertex Element Enables" start="0" end="3" type="uint"/>
    </group>

gen_field_iterator_next would properly walk over each element of the
array, incrementing group_iter, and calling iter_group_offset_bits()
to advance to the proper DWord.  However, the code to print the actual
values only considered iter->field->start/end, which are 0 and 3 in the
above example.  So it would always fetch bits 3:0 of the current DWord
when printing values, instead of advancing to each element of the array,
printing bits 0-3, 4-7, 8-11, and so on.

To fix this, we add new iter->start/end tracking, which properly
advances for each instance of a group's field.

Caught by Matt Turner while working on 3DSTATE_VF_COMPONENT_PACKING,
with a patch to convert it to use an array of bitfields (the example
above).

This also fixes the decoding of 3DSTATE_SBE's "Attribute Active
Component Format" fields.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

glsl: Fix bad formatting in a comment

Trivial

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>

broadcom/vc5: Force blending to treat alpha as 1 for formats without alpha.

Fixes fbo-blending-formats on RGB8 and 565. We will still need to demote
blending to shader code in the MRT case to fix it in general, but that can
be added when we start doing 32F blending (which also needs to be done in
the shader).

broadcom/vc5: Do BGRA vs RGBA swapping for the BLEND_CONSTANT_COLOR.

Fixes many of the fbo-blending-formats tests.

broadcom/vc5: Pack clear colors according to the TLB internal format/type.

The previous packing I did got us all the R*16F and R*32F formats, where
the pipe format basically matched the TLB's format, but since the clear
color will just be memcpyed to the TLB, we should be looking at its format
for deciding how to pack.

Fixes RGB565, RGB5_A1 and RGBA10 fbo-clear-formats tests and improves
4444.

broadcom/vc5: Don't do r/b channel swapping on 565.

The HW's format actually matches the gallium format.

broadcom/vc5: Use the proper gallium format for our RGB10_A2.

This keeps us from needing our own reswizzling of the B vs R fields.

broadcom/vc5: Add some comments about the texture/output format ordering.

The output formats are consistent with their channels appearing from low
to high in their name. Textures are interpreted the same way, but their
names may have the channels swapped around. I'm retaining the texture
names so that we are consistent with the documentation, but I want to
leave a warning for others.

broadcom/vc5: Drop duplicated setup of clip_window_height_in_pixels.

broadcom/vc5: Don't forget to actually turn on stencil testing.

I had the rest of stencil state set up, but forgot to actually enable it
in the higher level configuration bits packet.

broadcom/vc5: Stop lowering negates to subs.

In the case of fneg(0.0), we were getting back 0.0 instead of -0.0. We
were also needing an immediate 0 value for ineg, when there's an opcode to
do the job properly.

Fixes fs-floatBitsToInt-neg.shader_test.

broadcom/vc5: Set up MSAA texture type according to the internal format.

It gets most of EXT_framebuffer_multisample-formats passing, but doesn't
really work for texture views.

broadcom/vc5: Use the sampler view's format, not the resource's.

This should help with texture views, though I just noticed this while
reading the code.

broadcom/vc5: Emit raw loads for MSAA buffers.

Similar to stores, but we also need to emit dummy stores in between each
load, to flush out the previous queued load.