intel/compiler: Cast to target type before shifting left

Otherwise a smaller type may be promoted to int, which can hit undefined
behaviour:

../src/intel/compiler/brw_packed_float.c:66:17: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
    #0 0x5604a03969aa in brw_vf_to_float ../src/intel/compiler/brw_packed_float.c:66
    #1 0x5604a0391305 in vf_float_conversion_test_test_vf_to_float_Test::TestBody() ../src/intel/compiler/test_vf_float_conversions.cpp:70
    #2 0x5604a041a323 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #3 0x5604a0405c31 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #4 0x5604a03ab03b in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
    #5 0x5604a03ad714 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
    #6 0x5604a03afea2 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
    #7 0x5604a03cb87c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
    #8 0x5604a041df3c in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #9 0x5604a0409609 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #10 0x5604a03c2e9e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
    #11 0x5604a0442d57 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
    #12 0x5604a0442c17 in main ../src/gtest/src/gtest_main.cc:37
    #13 0x7f9a1983dbba in __libc_start_main ../csu/libc-start.c:308
    #14 0x5604a0390d89 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/vf_float_conversions+0x8dd89)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
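
The promotion problem in the commit above can be reproduced in isolation. The following is a minimal sketch, assuming a 32-bit int; `top_byte` is a hypothetical helper, not the function touched by the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not Mesa code: place an 8-bit value in the top
 * byte of a 32-bit word. */
static uint32_t
top_byte(uint8_t v)
{
   /* `v << 24` would first promote v to int. For v >= 128 the result
    * cannot be represented in a 32-bit signed int, which is undefined
    * behaviour. Casting first performs the shift in an unsigned type. */
   return (uint32_t)v << 24;
}
```

With the cast, `top_byte(128)` is well defined and yields 0x80000000.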

intel/compiler: Don't left-shift by >= the number of bits of the type

To avoid it, use the modulo of the number of bits in the value being
shifted, which is presumably what ended up happening on x86.

Flagged by UBSan:

../src/intel/compiler/brw_eu_validate.c:974:33: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int'
    #0 0x561abb612ab3 in general_restrictions_on_region_parameters ../src/intel/compiler/brw_eu_validate.c:974
    #1 0x561abb617574 in brw_validate_instructions ../src/intel/compiler/brw_eu_validate.c:1851
    #2 0x561abb53bd31 in validate ../src/intel/compiler/test_eu_validate.cpp:106
    #3 0x561abb555369 in validation_test_source_cannot_span_more_than_2_registers_Test::TestBody() ../src/intel/compiler/test_eu_validate.cpp:486
    #4 0x561abb742651 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #5 0x561abb72e64d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #6 0x561abb6d5451 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
    #7 0x561abb6d7b2a in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
    #8 0x561abb6da2b8 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
    #9 0x561abb6f5c92 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
    #10 0x561abb74626a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #11 0x561abb732025 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #12 0x561abb6ed2b4 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
    #13 0x561abb768b3b in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
    #14 0x561abb7689fb in main ../src/gtest/src/gtest_main.cc:37
    #15 0x7f525e5a9bba in __libc_start_main ../csu/libc-start.c:308
    #16 0x561abb538ed9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/eu_validate+0x1b8ed9)

Reviewed-by: Adam Jackson <ajax@redhat.com>
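
A minimal sketch of the modulo approach described above, using a hypothetical helper `shl64_wrapped` rather than the code in brw_eu_validate.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: shift a 64-bit value left with the count reduced
 * modulo the width of the type, so the count is always in [0, 63]. */
static uint64_t
shl64_wrapped(uint64_t value, unsigned count)
{
   /* `value << 64` is undefined in C; `count % 64` keeps it defined. */
   return value << (count % 64);
}
```

A count of 64 then behaves like a count of 0, which matches what the commit message presumes x86 ended up doing.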

anv: fix error message

`strerror()` takes an `errno`, not the negative value returned by the
`ioctl()`.
Instead of fixing this as `"%s", strerror(errno)`, let's just use the
`"%m"` shortcut for it.

Fixes: 2b5f30b1d91b98ab27ba ("anv: implement VK_INTEL_performance_query")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
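
The difference between the return value and `errno` can be sketched as follows. This is an illustration only: `close(-1)` stands in for a failing `ioctl()`, and `format_failure` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: close(-1) stands in for a failing ioctl(). Both
 * return -1 and set errno on failure. */
static int
format_failure(char *buf, size_t size)
{
   int ret;

   buf[0] = '\0';
   ret = close(-1);
   if (ret < 0) {
      /* strerror(ret) or strerror(-ret) would describe the return
       * value, not the error. The error code lives in errno. */
      snprintf(buf, size, "close failed: %s", strerror(errno));
   }
   return ret;
}
```

The `"%m"` conversion mentioned in the commit is a glibc extension to the printf family that expands to `strerror(errno)`, so it is only an option where that extension is available.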

meson: add -Werror=empty-body to disallow `if(x);`

This would have prevented a bug in MR 2058 [1]; with that MR fixed,
nothing else uses empty-body blocks, so let's just forbid them altogether.

[1] https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2058#note_237880

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
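
For reference, the kind of mistake this warning catches looks like the commented-out form below. The function is a hypothetical example, not the code from MR 2058.

```c
#include <assert.h>

/* Hypothetical example of the pattern the warning catches. */
static int
clamp_to_zero(int x)
{
   /* Buggy form, rejected by -Werror=empty-body:
    *
    *    if (x < 0);
    *       x = 0;
    *
    * The stray semicolon is the whole body of the `if`, so the
    * assignment would run unconditionally. */
   if (x < 0)
      x = 0;
   return x;
}
```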

llvmpipe: avoid generating empty-body blocks

Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

llvmpipe: avoid compiling no-op block on release builds

Suggested-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

winsys/svga: Limit the maximum DMA hardware buffer size

The total GMR/DMA size in the kernel is limited, but the kernel may still
allow a larger buffer allocation to succeed. Command submission using
that buffer as a GMR would then fail, typically causing an application
crash.

So have the winsys limit the size of GMR/DMA buffers. The pipe driver will
then resort to allocating smaller buffers and perform the DMA transfer in
multiple bands, also allowing for the pre-flush mechanism to kick in.

This avoids the related application crashes.

Fixes: e7843273fae ("winsys/svga: Update to vmwgfx kernel module 2.1")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
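
The banding idea can be sketched as a loop over row ranges that each fit in the size-limited buffer. This is a rough illustration under assumed names (`count_dma_bands`, `max_buf_bytes`), not the svga driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the svga code: count how many bands are
 * needed to transfer `rows` rows of `row_bytes` bytes each when one DMA
 * buffer may hold at most `max_buf_bytes`. Returns 0 if even a single
 * row does not fit. */
static size_t
count_dma_bands(size_t rows, size_t row_bytes, size_t max_buf_bytes)
{
   size_t rows_per_band, first, bands = 0;

   if (row_bytes == 0 || row_bytes > max_buf_bytes)
      return 0;

   rows_per_band = max_buf_bytes / row_bytes;
   for (first = 0; first < rows; first += rows_per_band) {
      /* A real driver would upload rows starting at `first` here. */
      bands++;
   }
   return bands;
}
```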

svga: Fix banded DMA upload unmap

Even with banded DMA uploads, st->hwbuf is always non-NULL, but when we've
allocated a software buffer to hold the full upload, unmapping of the
hardware buffer has already been done before
svga_texture_transfer_unmap_dma(), and the code was performing an unmap of
an already unmapped buffer.

Fix this by testing for software buffer not present.

Fixes: a9c4a861d5d ("svga: refactor svga_texture_transfer_map/unmap functions")
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

gitlab-ci: Update kernel for LAVA jobs to 5.4-rc4

Update to 5.4-rc4 so we can test Panfrost on devices with Mali T720 and
T820.

A bug was found that prevented things working at all on RK3288 devices,
so we carry a patch for now in my personal fork.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>

glsl: remove propagate_invariance() call from the linker

This was added in 586f4a42e78f and became redundant with 34ab9b0947cd.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

nir: improve nir_variable packing

Before:

/* size: 136, cachelines: 3, members: 10 */

After:

/* size: 128, cachelines: 2, members: 10 */

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
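
Size reductions like the one above typically come from reordering members so that less padding is needed. A small sketch with two hypothetical structs, unrelated to the actual nir_variable layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical structs, unrelated to the real nir_variable layout. */
struct loose {
   uint8_t  a;   /* followed by padding so that b is aligned */
   uint64_t b;
   uint8_t  c;   /* followed by tail padding */
};

struct tight {
   uint64_t b;   /* widest member first */
   uint8_t  a;
   uint8_t  c;   /* a and c share one padded tail */
};

/* Reordering never makes this pair larger on common ABIs. */
_Static_assert(sizeof(struct tight) <= sizeof(struct loose),
               "reordered struct should not grow");
```

On a typical x86-64 ABI the first struct occupies 24 bytes and the second 16; the exact numbers depend on the platform's alignment rules.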

nir: fix nir_variable_data packing

Before:

/* size: 60, cachelines: 1, members: 29 */

After:

/* size: 56, cachelines: 1, members: 29 */

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>

radeonsi/nir: implement pipe_screen::finalize_nir

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

st/mesa: use pipe_screen::finalize_nir

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

tgsi_to_nir: use pipe_screen::finalize_nir

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

gallium: add pipe_screen::finalize_nir

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

st/mesa: update VS shader_info for NIR after lowering passes

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

st/mesa: assign driver locations for VS inputs for NIR before caching

Fix up edge flags in the NIR pass, because st/mesa doesn't touch the inputs
after caching.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

st/mesa: don't lower_global_vars_to_local for VS if there are no dead inputs

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

st/mesa: move some NIR lowering before shader caching

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

util/u_queue: skip util_queue_finish if num_threads is 0

This fixes a deadlock in pthread_barrier_destroy.

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

util/disk_cache: finish all queue jobs in destroy instead of killing them

If there are queued shaders to be written to disk, wait for that.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

iris: Rework edgeflag handling

We were relying on specific pass ordering in st to avoid setting
inputs_read/outputs_written for edge flags. Instead, just assume
that it happens and throw out the results we don't want.

We should probably revisit this and try and add a vertex element
property like I originally wanted so we can avoid having it be
associated with the VS altogether.

gallium/noop: implement get_disk_shader_cache and get_compiler_options

trivial

aco: take LDS into account when calculating num_waves

pipeline-db (Vega):
SGPRS: 344 -> 344 (0.00 %)
VGPRS: 424 -> 524 (23.58 %)
Spilled SGPRs: 84 -> 80 (-4.76 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 52812 -> 52484 (-0.62 %) bytes
LDS: 135 -> 135 (0.00 %) blocks
Max Waves: 56 -> 53 (-5.36 %)

v2: consider WGP, rework to be clearer and apply the
"maximum 16 workgroups per CU" limit properly
v2: use "SIMD" instead of "EU"
v2: fix spiller by introducing "Program::max_waves"
v2: rename "lds_size" to "lds_limit"
v3: make max_waves actually independent of register usage
v3: fix issue where max_waves was way too high
v3: use DIV_ROUND_UP(a, b) instead of max(a / b, 1)
v3: rename "workgroups_per_cu" to "workgroups_per_cu_wgp"
v4: fix typo from "workgroups_per_cu" rename

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
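
The v3 note about DIV_ROUND_UP(a, b) versus max(a / b, 1) matters whenever a is larger than b but not a multiple of it. A small sketch of the two expressions follows; the macro is redefined locally for illustration and `max_u` is a hypothetical helper.

```c
#include <assert.h>

/* Round-up integer division for unsigned a and positive b. Defined
 * locally here for illustration. */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Hypothetical helper standing in for max(). */
static unsigned
max_u(unsigned a, unsigned b)
{
   return a > b ? a : b;
}
```

For a = 5 and b = 4, `max_u(5 / 4, 1)` gives 1 while `DIV_ROUND_UP(5, 4)` gives 2; both give 1 for a = 1.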

aco: increase accuracy of SGPR limits

SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed
number of SGPRs and has 106 addressable SGPRs.

pipeline-db (Vega):
SGPRS: 5912 -> 6232 (5.41 %)
VGPRS: 1772 -> 1780 (0.45 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 88228 -> 87904 (-0.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 559 -> 571 (2.15 %)

pipeline-db (Navi):
SGPRS: 341256 -> 363384 (6.48 %)
VGPRS: 171536 -> 170960 (-0.34 %)
Spilled SGPRs: 832 -> 581 (-30.17 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 14207332 -> 14190872 (-0.12 %) bytes
LDS: 33 -> 33 (0.00 %) blocks
Max Waves: 18072 -> 18251 (0.99 %)

v2: unconditionally count vcc as an extra sgpr on GFX10+
v3: pass SGPRs rounded to 8

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
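
Why allocation granularity affects the wave count can be sketched with a round-up helper. The granule of 16 comes from the commit message; the register budget in the usage note is a made-up number, and both function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: round a register count up to a multiple of
 * `granule` (granule must be non-zero). */
static unsigned
round_up_to(unsigned count, unsigned granule)
{
   return (count + granule - 1) / granule * granule;
}

/* Hypothetical helper: how many waves fit into `budget` registers when
 * each wave's usage is rounded up to the allocation granule. `used`
 * must be non-zero. */
static unsigned
waves_for_budget(unsigned budget, unsigned used, unsigned granule)
{
   return budget / round_up_to(used, granule);
}
```

With an illustrative budget of 800 registers and 81 SGPRs in use, the unrounded estimate is 800 / 81 = 9 waves, while rounding to groups of 16 (96 registers) gives 8.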

radv: round vgprs/sgprs before calculating max_waves

Note that ACO doesn't correctly round SGPR counts on GFX8/GFX9.

pipeline-db (ACO/Vega):
SGPRS: 11000 -> 11000 (0.00 %)
VGPRS: 3120 -> 3120 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 164328 -> 164328 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1125 -> 1000 (-11.11 %)

v2: consider wave32

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

docs: Add new Intel extension

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

Revert "vc4: do not report alpha-test as supported"

This reverts commit a79b93269cf340ce4d23b5b34100039bcaafc841.

Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>

Revert "v3d: do not report alpha-test as supported"

This reverts commit 9d0523b569bb7208c6e74cafc0f3945415d94336.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>

Revert "nir: drop support for using load_alpha_ref_float"

This reverts commit 5af272b47469398762e984e27f65fc4ecc293d28.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>

Revert "nir: drop unused alpha_ref_float"

This reverts commit e8095f2af0736b5937674ca319f29cc9dabb17d4.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>

radv: fix a performance regression with graphics depth/stencil clears

I recently changed the slow depth/stencil clear path to make sure
depth values are explicitly exported by the fragment shader. This
is actually only useful when VK_EXT_depth_range_unrestricted is
enabled.

While this path is correct, it introduced a performance regression
with Heroes of the Storm, Shadow of Mordor (Vulkan beta) and
probably more titles. This is because it prevents the hardware
from doing some optimizations like discarding fragments.

This commit re-introduces the previous (a bit faster) slow
depth/stencil clear path and it selects the unrestricted path
only if VK_EXT_depth_range_unrestricted is enabled.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/863
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: fix vkUpdateDescriptorSets with inline uniform blocks

descriptorCount is the number of bytes into the descriptor, so
it shouldn't be used as an index. srcArrayElement/dstArrayElement
specify the starting byte offset within the binding to copy from/to.

This fixes new CTS tests:
dEQP-VK.binding_model.descriptor_copy.*.inline_uniform_block_*
dEQP-VK.binding_model.descriptor_copy.*.mix_3
dEQP-VK.binding_model.descriptor_copy.*.mix_array1

Fixes: 8d2654a4197 ("radv: Support VK_EXT_inline_uniform_block.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
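
The byte-offset semantics can be sketched as a plain memory copy. This is an assumption-laden illustration: the parameter names mirror the VkCopyDescriptorSet fields, but the flat byte arrays and the function itself are hypothetical, not radv code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a descriptor copy between two inline
 * uniform block bindings, each modelled as a flat byte array. */
static void
copy_inline_uniform_block(uint8_t *dst_binding, uint32_t dst_array_element,
                          const uint8_t *src_binding,
                          uint32_t src_array_element,
                          uint32_t descriptor_count)
{
   /* For inline uniform blocks both "array element" values are byte
    * offsets and descriptor_count is a byte count, so none of them is
    * used as a descriptor index. */
   memcpy(dst_binding + dst_array_element,
          src_binding + src_array_element,
          descriptor_count);
}
```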

radv/gfx10: fix 3D images

GFX10 does act like GFX9 actually.

This fixes
dEQP-VK.glsl.texture_functions.query.texturesize.*sampler3d_*.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv/gfx10: re-enable fast depth/stencil clears with separate aspects

It used to cause weird issues on GFX10 in the past with vkmark and
Wreckfest, and they can't be reproduced now. Shadow Of Mordor
(Vulkan beta) hits that path and it works fine.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: do not emit rbplus if attachments are undefined

Fixes some crashes with dEQP-VK.geometry.layered.*.secondary_cmd_buffer
on Raven and other chips that allow rbplus.

This just prevents a crash and rbplus probably needs more work.

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: add an assertion in radv_gfx10_compute_bin_size()

To prevent out of bounds access.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: do not create meta pipelines with 16 samples

The driver only supports up to 8 samples, so it's useless to
create more pipelines than needed.

This fixes a conditional jump reported by Valgrind on GFX10:

==194282== Conditional jump or move depends on uninitialised value(s)
==194282==    at 0xDBF925A: radv_gfx10_compute_bin_size (radv_pipeline.c:3242)
==194282==    by 0xDBF95A6: radv_pipeline_generate_binning_state (radv_pipeline.c:3334)
==194282==    by 0xDBFC1A0: radv_pipeline_generate_pm4 (radv_pipeline.c:4440)
==194282==    by 0xDBFD15E: radv_pipeline_init (radv_pipeline.c:4764)
==194282==    by 0xDBFD23E: radv_graphics_pipeline_create (radv_pipeline.c:4788)
==194282==    by 0xDBB95A3: create_pipeline (radv_meta_clear.c:114)
==194282==    by 0xDBB9AC5: create_color_pipeline (radv_meta_clear.c:297)
==194282==    by 0xDBBCF05: radv_device_init_meta_clear_state (radv_meta_clear.c:1277)
==194282==    by 0xDB9ACD9: radv_device_init_meta (radv_meta.c:363)
==194282==    by 0xDB7FE3A: radv_CreateDevice (radv_device.c:2080)

This is caused by an out-of-bounds access of 'fmask_array' (i.e. the
index is 4 for 16 samples).

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
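
The out-of-bounds index can be illustrated with a table indexed by log2 of the sample count. This sketch assumes a four-entry table covering 1, 2, 4 and 8 samples; the table contents and function names are placeholders, not the radv data.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical table with entries for 1, 2, 4 and 8 samples. The
 * values are placeholders. */
static const unsigned table_by_log2_samples[4] = { 10, 20, 30, 40 };

/* Integer log2 for a power-of-two sample count. */
static unsigned
log2_samples(unsigned samples)
{
   unsigned index = 0;

   while (samples > 1) {
      samples >>= 1;
      index++;
   }
   return index;
}

/* Returns 1 and writes *out when the sample count has a table entry,
 * 0 otherwise. */
static int
lookup_by_samples(unsigned samples, unsigned *out)
{
   unsigned index = log2_samples(samples);

   if (index >= ARRAY_SIZE(table_by_log2_samples))
      return 0;   /* 16 samples -> index 4, one past the end */
   *out = table_by_log2_samples[index];
   return 1;
}
```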

anv: implement VK_INTEL_performance_query

v2: Introduce the appropriate pipe controls
    Properly deal with changes in metric sets (using execbuf parameter)
    Record marker at query end

v3: Fill out PerfCntr1&2

v4: Introduce vkUninitializePerformanceApiINTEL

v5: Use new execbuf extension mechanism

v6: Fix comments in genX_query.c (Rafael)
    Use PIPE_CONTROL workarounds (Rafael)
    Refactor on the last kernel series update (Lionel)

v7: Only I915_PERF_IOCTL_CONFIG when perf stream is already opened (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel/perf: add mdapi writes for register perf counters

Those are not part of the OA reports.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel/genxml: add RPSTAT register for core frequency

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel/genxml: add generic perf counters registers

We have 2 of those we can configure to source programmable events.
Those are not part of the OA reports. Configuration happens in i915
through the metric set selected by the application. On the Mesa side
we'll just sample those and do a diff.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel/perf: add support for querying kernel loaded configurations

We use this as a communication mechanism between MDAPI & Anv.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

drm-uapi: Update headers from drm-next

Pull new updates from drm-next as of the following commit:

commit f1b4a9217efd61d0b84c6dc404596c8519ff6f59
Merge: 400e91347e1d f3a36d469621
Author: Dave Airlie <airlied@redhat.com>
Date: Tue Oct 22 15:04:00 2019 +1000

Merge tag 'du-next-20191016' of git://linuxtv.org/pinchartl/media into drm-next

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/perf: move registers to their own header

Will conflict with the genxml RPSTAT register.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel/perf: extract register configuration

We want to query the content of register configurations from the
kernel. Let's pull this out of the query.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel/perf: expose some utility functions

The Vulkan performance query extension is a bit lower level than the
GL one. Expose some of the functions to do the result accumulation
directly in the Anv driver.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel/perf: add mdapi marker helper

A simple utility to put the marker at the right location.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

st/mesa: Silence chatty debug printf

Other debug_printf's in this file are in if (0) blocks.

Trivial.

st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM

This is useful for PBO texture upload with GL_RGB and GL_UNSIGNED_BYTE.

v2: Vasily Khoruzhick provided an update for the Lima CI expectations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

anv: fix unwind of vkCreateDevice fail

We're skipping the context destruction in some cases, which in the
grand scheme of things is not that important because closing device->fd
will destroy the associated context as well.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Fixes: b30e01aef56 ("anv: fix memory leak on device destroy")

Revert "aco: only emit waitcnt on loop continues if we there was some load or export"

We don't properly pass on ctx.lgkm_cnt/ctx.barrier_imm/etc, so this
waitcnt was necessary for barriers and correctly waiting for SMEM before
s_dcache_wb on GFX10.

Totals from affected shaders:
SGPRS: 33200 -> 33200 (0.00 %)
VGPRS: 31376 -> 31376 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 2431804 -> 2433956 (0.09 %) bytes
LDS: 316 -> 316 (0.00 %) blocks
Max Waves: 1609 -> 1609 (0.00 %)

This reverts commit 2c050b49b3d776f054f1265d5523cabb61f22fc3.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco: add missing bld.scc()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco: keep can_reorder/barrier when combining addition into SMEM

Affects 30 shaders in the pipeline-db (all youngblood).

Totals from affected shaders:
SGPRS: 2656 -> 2456 (-7.53 %)
VGPRS: 2260 -> 2260 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 240680 -> 240944 (0.11 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 90 -> 90 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco: add a few missing checks in value numbering

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco: use ds_read2_b64/ds_write2_b64

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco: properly combine additions into ds_write2_b64/ds_read2_b64

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco: fix sparse store_lds()

p_extract_vector's second operand is in units of the definition size, not
dwords.

v2: move extract_subvector() to right before ds_write_helper

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco: create load_lds/store_lds helpers

We'll want these for GS, since VS->GS IO on Vega is done using LDS.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco: fix 64-bit p_extract_vector on 32-bit p_create_vector

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco: small stage corrections

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

st/mesa: replace pipe_shader_state with tgsi_token* in st_vp_variant

we don't need more than that

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir: allow nir_lower_uniforms_to_ubo to be run repeatedly

for st/mesa

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

freedreno/ir3: fixup register footprint fixup

Small typo resulted in not converting footprint to vec4, meaning that we
could potentially ask for quite a few more registers than required

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

freedreno/ir3: handle scalarized varying inputs

If the load_interpolated_input is scalarized, we would be too
conservative about deciding the tex instruction wasn't a candidate to
pre-fetch:

vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (0) /* interp_mode=0 */
vec1 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 0) /* base=0 */ /* component=0 */ /* packed:v_uv,v_uv1 */
vec1 32 ssa_3 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 1) /* base=0 */ /* component=1 */ /* packed:v_uv,v_uv1 */
vec2 32 ssa_8 = vec2 ssa_2, ssa_3
vec4 32 ssa_9 = tex ssa_8 (coord), 0 (texture), 0 (sampler)

Really we don't care that the texcoord components come from different
load_interpolated_input instructions, just that they have consecutive
varying offsets.

Reported-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

aco: refactor value numbering

Previously, we used one hashset per BB, so that we could
always initialize the current hashset from the immediate
dominator. This patch changes the behavior to a single
hashmap using the block index per instruction to resolve
dominance.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

mesa/st: assert that lowering is supported

Some of these lowerings aren't supported for drivers that support
tessellation and geometry shaders. Let's add a couple of asserts to make
it obvious if these have been enabled when it's not possible.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

gitlab-ci: Enable llvmpipe in ARM build jobs

v2:
* Use LLVM 8 from buster-backports
v3:
* Use LLVM 7 again for armhf, llvmpipe is still broken there with LLVM 8

Acked-by: Eric Engestrom <eric.engestrom@intel.com>

gitlab-ci: Update the meson cross file for LLVM_VERSION as well

Cross builds don't use the llvm-config path from the native file.

gitlab-ci: Use native aarch64 runner for ARM build jobs

This allows running the regression tests.

One downside is that we can't easily build the Vulkan overlay layer,
because only x86 binaries of the glslang validator are available. If
that's important, we could either use those binaries via qemu, or build
it from source.

v2:
* Add :amd64 suffix to existing debian-9/10 job names (Eric Engestrom)

Acked-by: Eric Engestrom <eric.engestrom@intel.com> # v1

gitlab-ci: Explicitly list debian-10 in needs: for .deqp-test template

Apparently needs: in a definition overwrites inherited ones. So
.deqp-test effectively didn't declare needs: for debian-10, which means
any jobs based on .deqp-test could spuriously run after the debian-10
job failed or was cancelled.

gitlab-ci: Bring ARM docker image install script in line with x86_64

Use https:// URLs in the APT configuration.

Drop --no-install-recommends, the image generation template disables
installation of recommended packages in /etc/apt/apt.conf.

Run apt-get autoremove at the end, cleaning up packages which were
installed to satisfy dependencies but are no longer needed.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

gitlab-ci: Sort ARM docker image packages in alphabetical order

No functional change.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

radv: fix updating bound fast ds clear values with different aspects

On GFX9, the driver is able to do an optimized fast depth/stencil
clear with only one aspect (ie. clear the stencil part of a
depth/stencil image). When this happens, the driver should only
update the clear values of the given aspect.

Note that it's currently only supported on GFX9 but I have some
local patches that extend this optimized path for other gens.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

intel/compiler: Refactor disassembly of sources in 3src instruction

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/compiler: Don't move immediate in register

On Gen12, we support mixed mode HF/F operands, and 3 source
instructions also support immediate values, so keep the immediate as it
is if it fits properly in a 16 bit field.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/compiler: Set bits according to source file

On Gen >= 12, if src0 or src2 holds an immediate value, we need to set
the src[0/2]_is_imm bits instead of the register file.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

intel/compiler: Add Immediate support for 3 source instruction

On Gen >= 10, either src0 or src2 can use a 16-bit immediate value, but
not both.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

ci: Disable lima until its farm can get fixed.

It's been throwing the following error today:

"<Fault -32603: 'Internal Server Error (contact server administrator
for details): could not extend file "base/17952/18226": No space left
on device\nHINT: Check free disk space.\n'>"

Reviewed-by: Daniel Stone <daniels@collabora.com>

intel: Add missing entry for brw_nir_lower_alpha_to_coverage in Makefile

Fixes: 7ecfbd4f6d4 ("nir: Add alpha_to_coverage lowering pass")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>

llvmpipe: handle compute shader launch with 0 threads

If you set LP_NUM_THREADS=0, compute shaders would hang. Just
execute the workloads in sequence if we have no threads
in the pool.

Fixes: 1b24e3ba75 ("llvmpipe: add compute threadpool + mutex")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
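
The fallback described above can be sketched as follows. All names here (`tiny_pool`, `run_tasks`, `add_index`) are hypothetical and the threaded path is deliberately left out; this is not the llvmpipe implementation.

```c
#include <assert.h>

typedef void (*work_fn)(void *data, unsigned index);

/* Hypothetical pool descriptor; only the thread count matters here. */
struct tiny_pool {
   unsigned num_threads;
};

/* Returns the number of work items that were run inline. */
static unsigned
run_tasks(const struct tiny_pool *pool, work_fn fn, void *data,
          unsigned count)
{
   unsigned i;

   if (pool->num_threads == 0) {
      /* No worker would ever pick the tasks up, so waiting for them
       * would hang. Run them in sequence on the calling thread. */
      for (i = 0; i < count; i++)
         fn(data, i);
      return count;
   }
   /* A real pool would enqueue the tasks and wait for the workers.
    * This sketch leaves that path out. */
   return 0;
}

/* Example work item: accumulate the task index. */
static void
add_index(void *data, unsigned index)
{
   *(unsigned *)data += index;
}
```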

freedreno/ir3: Add missing ir3_nir_lower_tex_prefetch.c to Android.mk

This file was created in 2a0d45ae6cf09d60c048d7854e3d082bf15e374f but its
addition to the Android makefiles was omitted. That breaks the build with
missing references to symbols defined in this file.
List the file in ir3_SOURCES to make the build succeed.

Signed-off-by: Marijn Suijten <marijns95@gmail.com>

ac/llvm: fix ac_to_integer_type() for 32-bit const addr space pointers

This fixes some crashes with dEQP-VK.descriptor_indexing.* when
read_first_invocation has its source from a descriptor.

Most of these tests still fail because of an LLVM bug (they work
with ACO).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

aco: run opt_algebraic in a loop

Totals from affected shaders:
SGPRS: 13920 -> 13656 (-1.90 %)
VGPRS: 12972 -> 12960 (-0.09 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1005680 -> 1000648 (-0.50 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Max Waves: 688 -> 688 (0.00 %)
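
Running a pass in a loop usually means repeating it until it reports no
further progress, since one rewrite can expose another. The sketch below
shows that pattern on a toy program. It is written in C for illustration
and is not the ACO optimizer: every sketch_* name is hypothetical, and the
toy pass only folds nested negations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy "program": a value wrapped in neg_depth nested negations. */
struct sketch_program {
   unsigned neg_depth;
};

/* Toy pass: folds one neg(neg(x)) pair per call and reports progress. */
static bool
sketch_opt_algebraic(struct sketch_program *prog)
{
   if (prog->neg_depth < 2)
      return false;
   prog->neg_depth -= 2;
   return true;
}

/* Re-runs the pass until it stops making progress. Returns the number of
 * invocations, including the final one that changed nothing. */
static unsigned
sketch_opt_until_fixed_point(struct sketch_program *prog)
{
   assert(prog != NULL);

   unsigned runs = 0;
   bool progress;
   do {
      progress = sketch_opt_algebraic(prog);
      runs++;
   } while (progress);
   return runs;
}
```

Starting from five nested negations, a single invocation leaves three
behind, while the loop reduces them to one.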

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco: use nir_lower_idiv_precise

v7: rename _nv50/_llvm to _fast/_precise

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

nir/lower_idiv: add new llvm-based path

v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

intel/compiler: Remove emit_alpha_to_coverage workaround from backend

Remove the emit_alpha_to_coverage workaround from the backend compiler
and start using the workaround that was ported to NIR.

v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)

Fixes piglit test on HSW:
- arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

nir: Add alpha_to_coverage lowering pass

Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround()
in intel/compiler.

v2 (Caio Marcelo de Oliveira Filho):
- Track store output and sample mask instruction
- Nest math instruction for more readability
- Bail out early if no gl_SampleMask

v3 (Caio Marcelo de Oliveira Filho):
- Do math instructions after instruction block
- Restructure code
- Move pass under src/intel/compiler

v4 (Caio Marcelo de Oliveira Filho):
- Organize dither mask calculation

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

aco: ensure that uniform booleans are computed in WQM if their uses happen in WQM

This fixes graphical corruption in SC2.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

meson: Require meson >= 0.49.1 when using icc or icl

0.49.0 can compile most of mesa with ICC or ICL, but not SWR without
additional workarounds in our meson.build files. Bumping patch version
is easier and shouldn't be a big burden anyway, especially to cover a
niche compiler. The check originally only covered ICC, but now covers
ICL as well.

Fixes: 3740ffb59c89d8d879b1e0c1aed32c389dd82a35
("meson: add switches for SWR with MSVC")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1937
Acked-by: Eric Engestrom <eric.engestrom@intel.com>

docs: update calendar, add news item and link release notes for 19.1.8

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>

docs: add release notes for 19.1.8

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit cc88eeb6ffc4e86d76dfdbfc601d519bc35b6c41)

docs: add release notes for 19.1.8

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 5c6d266c591208b1c27e06f61b814210fc6e095f)

aco/gfx10: Update constant addresses in fix_branches_gfx10.

Due to a bug in GFX10 hardware, s_nop instructions must be added
if a branch is at 0x3f. We already do this, but forgot to also update
the constant addresses that come after this instruction.
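
The bookkeeping part of this fix can be sketched as follows: once extra
instructions are inserted, every recorded position behind the insertion
point has to move by the same amount. This is an illustration only, not
the fix_branches_gfx10 code. The function name is hypothetical, and
measuring positions in dwords is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Shifts every recorded position at or after insert_at by num_dwords.
 * Returns how many positions were moved. */
static unsigned
sketch_shift_positions_after_insert(unsigned *positions, size_t count,
                                    unsigned insert_at, unsigned num_dwords)
{
   assert(positions != NULL || count == 0);

   unsigned moved = 0;
   for (size_t i = 0; i < count; i++) {
      if (positions[i] >= insert_at) {
         positions[i] += num_dwords;
         moved++;
      }
   }
   return moved;
}
```

For positions {2, 10, 40} and one dword inserted at 10, the result is
{2, 11, 41}.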

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco/gfx10: Fix PS exports for SPI_SHADER_32_AR.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

aco/gfx10: Wait for pending SMEM stores before loads

Currently if you have an SMEM store followed by an SMEM load that
loads the same location as was written, it won't work because the
store isn't finished before the load is executed. This is NOT
mitigated by an s_nop instruction on GFX10.

Since we currently don't have proper alias analysis, this commit adds
a workaround which will insert an s_waitcnt lgkmcnt(0) before each
SSBO load if they follow a store. We should further refine this in
the future when we can make sure to only add the wait when we load the
same thing as has been stored.
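
A minimal sketch of the conservative workaround described above, assuming
a flat list of opcodes rather than ACO's real IR. The enum and the
function name are hypothetical. As in the message, no alias analysis is
attempted: any load that follows a store gets a wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_op {
   SKETCH_OP_OTHER,
   SKETCH_OP_SMEM_STORE,
   SKETCH_OP_SMEM_LOAD,
   SKETCH_OP_WAIT_LGKMCNT0,   /* stands in for s_waitcnt lgkmcnt(0) */
};

/* Copies in[0..n) to out, inserting a wait before each load that follows
 * a store with no wait in between. Returns the number of ops written. */
static size_t
sketch_insert_smem_waits(const enum sketch_op *in, size_t n,
                         enum sketch_op *out, size_t out_cap)
{
   assert(in != NULL && out != NULL);
   assert(out_cap >= 2 * n);   /* conservative bound on the output size */

   bool store_pending = false;
   size_t m = 0;

   for (size_t i = 0; i < n; i++) {
      if (in[i] == SKETCH_OP_SMEM_LOAD && store_pending) {
         out[m++] = SKETCH_OP_WAIT_LGKMCNT0;
         store_pending = false;
      }

      if (in[i] == SKETCH_OP_SMEM_STORE)
         store_pending = true;
      else if (in[i] == SKETCH_OP_WAIT_LGKMCNT0)
         store_pending = false;

      out[m++] = in[i];
   }
   return m;
}
```

For the sequence store, other, load, load, only the first load gets a
wait, because the wait already covers the pending store for the second.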

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>

panfrost: Fix the DISCARD_WHOLE_RES case in transfer_map()

The current implementation does not synchronize on BO readiness when
DISCARD_WHOLE_RES flag is set, which can lead to misbehaviours when the
resource being updated is being used by one of the pending or already
flushed batches.

Adding unconditional BO synchronization would do the trick, but we can
sometimes optimize this path by re-allocating a new BO instead of
waiting for the existing one to be ready.
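
The choice between waiting and re-allocating can be pictured with the
sketch below. It is not the panfrost transfer_map() code: all names are
hypothetical, and can_realloc is a placeholder for whatever conditions the
driver actually checks before swapping in a new BO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_map_action {
   SKETCH_MAP_DIRECT,   /* BO is idle, map it as-is */
   SKETCH_MAP_NEW_BO,   /* swap in a freshly allocated BO */
   SKETCH_MAP_WAIT,     /* block until the existing BO is ready */
};

struct sketch_resource {
   bool bo_busy;        /* used by a pending or already flushed batch */
   bool can_realloc;    /* hypothetical: replacing the BO is allowed */
};

static enum sketch_map_action
sketch_pick_map_action(const struct sketch_resource *rsrc,
                       bool discard_whole_res)
{
   assert(rsrc != NULL);

   if (!rsrc->bo_busy)
      return SKETCH_MAP_DIRECT;

   /* The old contents are discarded anyway, so a new BO avoids the stall. */
   if (discard_whole_res && rsrc->can_realloc)
      return SKETCH_MAP_NEW_BO;

   return SKETCH_MAP_WAIT;
}
```

The wait remains the fallback, so a busy BO is never mapped without
synchronization.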

Reported-by: Daniel Stone <daniels@collabora.com>
Reported-by: Heinrich Fink <heinrich.fink@daqri.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

st/mesa: only require ESSL 3.1 for geometry shaders

According to the OES_geometry_shader spec, section Dependencies:

"OpenGL ES 3.1 and OpenGL ES Shading Language 3.10
are required."

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

egl/android: Remove our own reference to buffers.

We currently don't maintain it correctly, and the buffer gets leaked if
the surface is destroyed before swapping buffers.

From Android frameworks/native/libs/nativewindow/include/system/window.h:

  The window holds a reference to the buffer between dequeueBuffer and
  either queueBuffer or cancelBuffer, so clients only need their own
  reference if they might use the buffer after queueing or canceling it.

v2: Remove our own reference.

Fixes: 0212db35040 ("egl/android: Cancel any outstanding ANativeBuffer in surface destructor")
Reviewed-by: Chia-I Wu <olvaffe@gmail.com> (v1)
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>