Boris Brezillon [Mon, 20 Jan 2020 21:00:48 +0000 (22:00 +0100)]
panfrost/midgard: Print the actual source register for store operations
Store operation use r26/r27 but have a word->reg set to 0 or 1 (base is
r26). Let's take this base offset into account in
print_load_store_instr().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3482>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3482>
Alyssa Rosenzweig [Thu, 16 Jan 2020 15:43:03 +0000 (10:43 -0500)]
panfrost: Add pandecode entries for ASTC/ETC formats
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
Icecream95 [Sat, 11 Jan 2020 06:19:45 +0000 (19:19 +1300)]
panfrost: Add ASTC texture formats
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
Icecream95 [Sat, 11 Jan 2020 07:00:38 +0000 (20:00 +1300)]
panfrost: Add ETC1/ETC2 texture formats
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
Alyssa Rosenzweig [Wed, 15 Jan 2020 18:15:01 +0000 (13:15 -0500)]
panfrost: Rework linear<--->tiled conversions
There's a lot going on here (it's a ton of commits squashed together
since otherwise this would be impossible to review...)
1. We have a fast path for linear->tiled for whole (aligned) tiles, but we
have to use a slow path for unaligned accesses. We can get a pretty
major win for partial updates by using this slow path simply on the
borders of the update region, and then hit the fast path for the
tile-aligned interior. This does require some shuffling.
2. Mark the LUTs constant, which allows the compiler to inline them,
which pairs well with loop unrolling (eliminating the memory accesses
and just becoming some immediates.. which are not as immediate on
aarch64 as I'd like..)
3. Add fast path for bpp1/2/8/16. These use the same algorithm and we
have native types for them, so may as well get the fast path.
4. Drop generic path for bpp != 1/2/8/16, since these formats are
generally awful and there's no way to tile them efficienctly and
honestly there's not a good reason too either. Lima doesn't support any
of these formats; Panfrost can make the opinionated choice to make them
linear.
5. Specialize the unaligned routines. They don't have to be fully
generic, they just can't assume alignment. So now they should be nearly
as fast as the aligned versions (which get some extra tricks to be even
faster but the difference might be neglible on some workloads).
6. Specialize also for the size of the tile, to allow 4x4 tiling as well
as 16x16 tiling. This allows compressed textures to be efficiently tiled
with the same routines (so we add support for tiling ASTC/ETC textures
while we're at it)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> #lima on Mali400
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
Alyssa Rosenzweig [Tue, 14 Jan 2020 17:52:02 +0000 (12:52 -0500)]
panfrost,lima: De-Galliumize tiling routines
There's an implicit dependence on Gallium here that will add more
complexity than needed when testing/optimizing out of driver as well as
potentially Vulkanizing. We don't need a full pipe_box, just the x/y/w/h
properties directly.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> #lima on Mali400
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
Alyssa Rosenzweig [Tue, 14 Jan 2020 17:27:47 +0000 (12:27 -0500)]
panfrost: Compile tiling routines with -O3
These are major hot spots for panfrost and lima; better let the compiler
do its thing even on debug builds.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul@gmail.com> #lima on Mali400
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3414>
Bas Nieuwenhuizen [Tue, 21 Jan 2020 10:49:55 +0000 (11:49 +0100)]
radv: Remove syncobj_handle variable in header.
I strongly suspect it was supposed to be a typedef. However, used
nowhere, we should remove it.
Fixes: eaa56eab6da "radv: initial support for shared semaphores (v2)"
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2385
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3479>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3479>
Neil Armstrong [Tue, 15 Oct 2019 13:22:07 +0000 (15:22 +0200)]
gitlab-ci/lava: add pipeline information in the lava job name
In order to have more informations in the LAVA jobs list, add the
current pipeline URL and commit ref name in the LAVA job name.
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2337>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2337>
Jan Zielinski [Mon, 20 Jan 2020 12:57:36 +0000 (13:57 +0100)]
gallium/gallivm: enable linking lp_bld_printf function with C++ code
To enable linking functions declared in lp_bld_printf.h file with C++,
we need to add appropriate macros to the header.
Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3470>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3470>
Danylo Piliaiev [Wed, 15 Jan 2020 16:56:42 +0000 (18:56 +0200)]
iris: Fix value of out-of-bounds accesses for vertex attributes
Having VERTEX_BUFFER_STATE.BufferSize greater than the size of
a bound vertex buffer allows shader to read uninitialized vertex
attributes from BO, instead of allowing hardware to return zeroes
on out-of-bounds access.
OpenGL spec "6.4 Effects of Accessing Outside Buffer Bounds" says:
"Robust buffer access can be enabled by creating a context with robust access
enabled through the window system binding APIs. When enabled, any command
unable to generate a GL error as described above, such as buffer object accesses
from the active program, will not read or modify memory outside of the data
store of the buffer object and will not result in GL interruption or termination.
Out-of-bounds reads may return values from within the buffer object or zero
values."
Fixes three webgl tests:
conformance/rendering/out-of-bounds-array-buffers.html
conformance2/rendering/out-of-bounds-index-buffers-after-copying.html
conformance2/rendering/element-index-uint.html
See #1996
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3427>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3427>
Vasily Khoruzhick [Thu, 19 Dec 2019 06:15:11 +0000 (22:15 -0800)]
ci: Re-enable CI for lima on mali450
Amend fails and skips lists basing on lists from Andreas Baierl,
shard mali400 job across two devices since it takes close to 10min
and rename jobs to lima-mali400-test and lima-mali450-test.
Also don't set MESA_GLES_VERSION_OVERRIDE=3.0 for lima since we don't support
GLES 3.0 and lower DEQP_PARALLEL to 3 for jobs on H3.
Keep mali400 jobs disabled atm since they take too much time to complete
and we also get some unexplicable failures in dEQP-GLES2.functional.default_vertex_attrib.*
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
Vasily Khoruzhick [Fri, 17 Jan 2020 03:37:32 +0000 (19:37 -0800)]
ci: lava: pass CI_NODE_INDEX and CI_NODE_TOTAL to lava jobs
deqp-runner.sh uses it to determine whether we split job across multiple
devices and if we do what's the node index.
With this change we now can set 'parallel: N' in job description if we want
to split the job.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3163>
Hyunjun Ko [Fri, 17 Jan 2020 07:23:03 +0000 (07:23 +0000)]
turnip: fix invalid VK_ERROR_OUT_OF_POOL_MEMORY
When VK_DESCRIPTOR_TYPE_SAMPLER is provided, it doesn't need to be
counted as a buffer count. Otherwise it leads to mismatch of allocated
buffer size, hitting VK_ERROR_OUT_OF_POOL_MEMORY finally.
Fixes: c39afe68f0390d45130c1317b3b7e65f55542c36
Also fixes amber tests:
./tests/cases/address_modes_float.amber
./tests/cases/address_modes_int.amber
./tests/cases/magfilter_linear.amber
./tests/cases/magfilter_nearest.amber
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Jan Vesely [Sun, 19 Jan 2020 02:27:01 +0000 (21:27 -0500)]
clover: Initialize Asm Parsers
Fixes piglits that use ADMGCN inline assembly:
program@execute@calls
program@execute@amdgcn-mubuf-negative-vaddr
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Jason Ekstrand [Sat, 18 Jan 2020 05:52:50 +0000 (23:52 -0600)]
anv: Allow enumerating multiple physical devices
Instead of having a single physical device in anv_instance, have a
linked list of them. What we have now works today because we our GPUs
are build into the CPU and so you're guaranteed to only ever have one of
them. One day, that will change and we want ANV to be ready.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Jason Ekstrand [Sat, 18 Jan 2020 05:48:12 +0000 (23:48 -0600)]
anv: Re-arrange physical_device_init
This commit simply moves fetching the device info and checking if ANV
supports the device a bit higher up. This way we fail earlier and it'll
make error checking easier in the next commit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Jason Ekstrand [Sat, 18 Jan 2020 05:45:31 +0000 (23:45 -0600)]
anv: Drop separate chipset_id fields
This already exists in gen_device_info. There's no reason to keep
duplicate copies.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Jason Ekstrand [Sat, 18 Jan 2020 05:17:48 +0000 (23:17 -0600)]
anv: Move the physical device dispatch table to anv_instance
We don't actually have genX versions of any physical device level
commands so we don't need the trampoline versions and we don't need to
have a separate table per physical device.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Jason Ekstrand [Sat, 18 Jan 2020 05:05:13 +0000 (23:05 -0600)]
anv: Drop the instance pointer from anv_device
There are very few times when we actually want to fetch the instance
from the anv_device. We can put up with a bit of pain there in exchange
for strongly discouraging people from doing this in general.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Jason Ekstrand [Sat, 18 Jan 2020 05:03:41 +0000 (23:03 -0600)]
anv: Stop allocating WSI event fences off the instance
Fixes: 16eb390834d "anv: add VK_EXT_display_control to anv driver [v5]"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Jason Ekstrand [Sat, 18 Jan 2020 04:57:35 +0000 (22:57 -0600)]
anv: Take a device in anv_perf_warn
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Jason Ekstrand [Sat, 18 Jan 2020 04:43:06 +0000 (22:43 -0600)]
anv: Take an anv_device in vk_errorf
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Jason Ekstrand [Sat, 18 Jan 2020 04:23:30 +0000 (22:23 -0600)]
anv: Add an anv_physical_device field to anv_device
Having to always pull the physical device from the instance has been
annoying for almost as long as the driver has existed. It also won't
work in a world where we ever have more than one physical device. This
commit adds a new field called "physical" to anv_device and switches
every location where we use device->instance->physicalDevice to use the
new field instead.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3461>
Marek Olšák [Thu, 9 Jan 2020 01:21:04 +0000 (20:21 -0500)]
radeonsi/gfx10: enable GS fast launch for triangles and strips with NGG culling
Only non-indexed triangle lists and strips are supported. This increases
performance if there is something to cull.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 9 Jan 2020 21:09:47 +0000 (16:09 -0500)]
radeonsi/gfx10: rewrite late alloc computation
- Use conservative late alloc when the number of CUs <= 6.
- Move the late alloc GS register to the GS shader state, so that it can be
tuned for NGG culling.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Fri, 10 Jan 2020 00:12:36 +0000 (19:12 -0500)]
ac: add helper ac_build_triangle_strip_indices_to_triangle
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Mon, 30 Dec 2019 19:23:16 +0000 (14:23 -0500)]
radeonsi/gfx10: implement NGG culling for 4x wave32 subgroups
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 24 Dec 2019 18:50:06 +0000 (13:50 -0500)]
radeonsi/gfx10: move GE_PC_ALLOC setting to shader states
The value is not changed. I just use a different way to compute it.
The value will vary with NGG culling.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Sat, 4 Jan 2020 02:16:22 +0000 (21:16 -0500)]
radeonsi/gfx10: don't initialize VGPRs not used by NGG passthrough
v2: TES doesn't use the GS PrimitiveID
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Fri, 3 Jan 2020 22:07:38 +0000 (17:07 -0500)]
radeonsi/gfx10: merge main and pos/param export IF blocks into one if possible
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Fri, 3 Jan 2020 21:59:20 +0000 (16:59 -0500)]
radeonsi/gfx10: export primitives at the beginning of VS/TES
This decreases VGPR usage and will allow us to merge some IF blocks
in shaders.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Fri, 3 Jan 2020 21:20:40 +0000 (16:20 -0500)]
radeonsi/gfx10: move s_sendmsg gs_alloc_req to the beginning of shaders
This will allow us to merge some IF blocks in shaders.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Fri, 6 Dec 2019 01:46:30 +0000 (20:46 -0500)]
radeonsi/gfx10: correct VS PrimitiveID implementation for NGG
We didn't use the correct LDS pointer, though it probably doesn't matter,
because I think that nothing else is using LDS here.
This commit makes it consistent with all other esgs_ring use.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 24 Dec 2019 00:42:46 +0000 (19:42 -0500)]
radeonsi/gfx10: update comments and remove invalid TODOs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 2 Jan 2020 23:41:26 +0000 (18:41 -0500)]
ac: add ac_build_readlane without optimization barrier
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Mon, 30 Dec 2019 19:08:45 +0000 (14:08 -0500)]
ac: add prefix bitcount functions
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Sat, 18 Jan 2020 00:55:13 +0000 (19:55 -0500)]
radeonsi: turn an assertion into return in si_nir_store_output_tcs
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Fri, 17 Jan 2020 23:37:35 +0000 (18:37 -0500)]
radeonsi: fix doubles and int64
Fixes: 57bd73e2296 - radeonsi: remove llvm_type_is_64bit
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Sat, 18 Jan 2020 02:24:14 +0000 (21:24 -0500)]
radeonsi: don't invoke decompression inside internal launch_grid
Decompress resources properly but don't do it inside launch_grid
to prevent recursion.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Marek Olšák [Sat, 18 Jan 2020 02:23:12 +0000 (21:23 -0500)]
radeonsi: clean up how internal compute dispatches are handled
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Marek Olšák [Sat, 18 Jan 2020 00:19:43 +0000 (19:19 -0500)]
Revert "radeonsi: unbind image before compute clear"
This reverts commit
3a527eda7ceee37643f948bfcf05285c5aa3a4d6.
It's incorrect.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Samuel Pitoiset [Thu, 16 Jan 2020 16:03:43 +0000 (17:03 +0100)]
aco: implement nir_intrinsic_load_barycentric_at_sample on GFX6
GFX6 doesn't have FLAT instructions which means we have to emit
a 64-bit MUBUF load.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
Samuel Pitoiset [Thu, 16 Jan 2020 16:02:44 +0000 (17:02 +0100)]
aco: add new addr64 bit to MUBUF instructions on GFX6-GFX7
According to the different ISA docs (and to LLVM), this bit seems
to only exists on GFX6-GFX7.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
Samuel Pitoiset [Thu, 16 Jan 2020 13:44:02 +0000 (14:44 +0100)]
aco: do not use the vec3 variant for loads on GFX6
GFX6 only supports vec3 with load/store format.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
Samuel Pitoiset [Thu, 16 Jan 2020 13:37:11 +0000 (14:37 +0100)]
aco: do not use the vec3 variant for stores on GFX6
GFX6 only supports vec3 with load/store format.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
Samuel Pitoiset [Thu, 16 Jan 2020 13:04:49 +0000 (14:04 +0100)]
aco: fix constant folding of SMRD instructions on GFX6
SMRD instructions have an 8-bit dword offset on SI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3432>
Jason Ekstrand [Fri, 17 Jan 2020 23:46:31 +0000 (17:46 -0600)]
anv: Canonicalize buffer formats for image/buffer copies
Some formats, in particular YCbCr formats and ASTC have additional
restrictions. We already whack ASTC formats to RGBA32_UINT because the
hardware doesn't allow LINEAR with ASTC. However, we need to fix YCbCr
formats as well because they come with alignment restrictions that we
can't guarantee are satisfied. We're using blorp_copy to do the copies
so we may as well just stomp formats for everything.
Fixes: b24b93d5843 "anv: enable VK_KHR_sampler_ycbcr_conversion"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
Jason Ekstrand [Sat, 18 Jan 2020 00:25:33 +0000 (18:25 -0600)]
anv/blorp: Rename buffer image stride parameters
The new names fit better with the Vulkan names and don't pretend to be
an actual image extent.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3460>
Daniel Stone [Mon, 20 Jan 2020 12:33:29 +0000 (12:33 +0000)]
Revert "gallium: add st_context_iface::flush_resource to call FLUSH_VERTICES"
This reverts commit
bec9c90b5ecf9cc2dc580f9ff297f94ba5aa3506.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3472>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3472>
Daniel Stone [Mon, 20 Jan 2020 12:33:22 +0000 (12:33 +0000)]
Revert "st/dri: do FLUSH_VERTICES before calling flush_resource"
This reverts commit
3ba16d36c988a1c7b31c7fe44c1b6a24d9d8227d.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3472>
Rhys Perry [Fri, 17 Jan 2020 20:08:34 +0000 (20:08 +0000)]
aco: fix fall-through test in try_remove_simple_block() with back-edges
3bca0af2 enhanced empty block determination which exposed this bug and
created an infinite loop in a Guild Wars 2 shader.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 3bca0af25dbf6d6b162463138100abb20bc1a1cc
('aco: ignore parallelcopies to the same register on jump threading')
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2364
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3452>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3452>
Krzysztof Raszkowski [Mon, 20 Jan 2020 11:24:53 +0000 (12:24 +0100)]
docs/GL4: update gallium/swr features
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Rhys Perry [Fri, 17 Jan 2020 11:35:20 +0000 (11:35 +0000)]
aco: fix stack buffer overflow in apply_sgprs()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: cef78797191 ('aco: rewrite apply_sgprs()')
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2361
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3442>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3442>
Tapani Pälli [Mon, 20 Jan 2020 07:36:10 +0000 (09:36 +0200)]
anv: add assert for isl_mod_info in choose_isl_tiling_flags
CID:
1457859
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3469>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3469>
Tapani Pälli [Mon, 20 Jan 2020 07:13:48 +0000 (09:13 +0200)]
anv: fix assert in GetImageDrmFormatModifierPropertiesEXT
CID:
1457861
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3469>
Tapani Pälli [Fri, 17 Jan 2020 08:30:03 +0000 (10:30 +0200)]
isl/gen12: add reminder comment about missing WA with 3D surfaces
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3441>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3441>
Icecream95 [Sun, 12 Jan 2020 01:19:25 +0000 (14:19 +1300)]
panfrost: Dynamically allocate shader variants
This fixes a crash in LZDoom where over 16 shader variants are needed
for a few shaders in some maps, and should also save a few kilobytes
of RAM as most of the time only one or two variants of the 8 previously
allocated are actually needed.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 18 Jan 2020 14:44:19 +0000 (09:44 -0500)]
panfrost: Expose some functionality with dEQP flag
These features are stable enough that they don't need to be hidden.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3464>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3464>
Alyssa Rosenzweig [Sat, 18 Jan 2020 14:34:39 +0000 (09:34 -0500)]
pan/midgard: Fix recursive csel scheduling
Corner case causing invalid scheduling on shaders with nested csels,
i.e. GLSL code resembling:
(foo ? bool1 : bool2) ? x : y
By explicitly disallowing csels this is fixed.
Fixes INSTR_INVALID_ENC on a glamor shader (noticeable with slowdown and
visual corruption when scrolling "too far" on GTK apps).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3463>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3463>
Alyssa Rosenzweig [Wed, 8 Jan 2020 20:11:45 +0000 (15:11 -0500)]
panfrost: Identify un/pack colour opcodes
We still need to identify formats in the disassembler, but this will at
least get the opcode name clear.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3462>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3462>
Alyssa Rosenzweig [Fri, 10 Jan 2020 23:04:39 +0000 (18:04 -0500)]
pan/midgard: Bytemasks should round up, not round down
Otherwise we'll lost components in DCE.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3462>
Icecream95 [Wed, 15 Jan 2020 20:51:17 +0000 (09:51 +1300)]
panfrost: Compact the bo_access readers array
Previously, the array bo_access->readers was only cleared when there
were no unsignaled fences, which in some situations never happened.
That resulted in the array having thousands of NULL pointers, but only
a handful of active readers.
With this patch, all the unsignaled readers are moved to the front of
the array, effectively building a new array only containing the active
readers in-place. This results in the readers array usually only having
a couple of elements.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3419>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3419>
Erik Faye-Lund [Fri, 3 Jan 2020 13:51:55 +0000 (14:51 +0100)]
zink: support arrays of samplers
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>
Erik Faye-Lund [Fri, 3 Jan 2020 11:42:02 +0000 (12:42 +0100)]
zink: support sampling non-float textures
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>
Erik Faye-Lund [Fri, 3 Jan 2020 12:55:35 +0000 (13:55 +0100)]
zink: store image-type per texture
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>
Erik Faye-Lund [Fri, 3 Jan 2020 11:22:38 +0000 (12:22 +0100)]
zink: avoid incorrect vector-construction
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>
Erik Faye-Lund [Thu, 2 Jan 2020 10:26:38 +0000 (11:26 +0100)]
zink: support offset-variants of texturing
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>
Erik Faye-Lund [Wed, 30 Oct 2019 09:50:20 +0000 (10:50 +0100)]
zink: implement nir_texop_txs
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>
Erik Faye-Lund [Thu, 16 Jan 2020 20:11:29 +0000 (21:11 +0100)]
docs: fixup indentation
The most canonical indentation-style here is two spaces, which is what
the standard boilerplate in all documents use. So let's normalize to
that.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
Erik Faye-Lund [Thu, 16 Jan 2020 20:03:08 +0000 (21:03 +0100)]
docs: remove pointless, stray newline
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
Erik Faye-Lund [Thu, 16 Jan 2020 19:45:22 +0000 (20:45 +0100)]
docs: use [1] instead of asterisk for footnote
While we're at it, make it a link as well.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
Erik Faye-Lund [Thu, 16 Jan 2020 19:28:53 +0000 (20:28 +0100)]
docs: remove trailing newlines
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
Erik Faye-Lund [Thu, 16 Jan 2020 19:18:07 +0000 (20:18 +0100)]
docs: remove leading spaces
There's no good reason to have leading space in these pre-formatted
blocks. It looks strange, so let's get rid of it.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
Erik Faye-Lund [Thu, 16 Jan 2020 19:02:17 +0000 (20:02 +0100)]
docs: remove trailing header
This header has been there since the document was added, but contains
nothing. So let's get rid of it.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
Erik Faye-Lund [Thu, 16 Jan 2020 18:57:13 +0000 (19:57 +0100)]
docs: use figure/figcaption instead of tables
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
Erik Faye-Lund [Thu, 16 Jan 2020 18:34:02 +0000 (19:34 +0100)]
docs: do not use definition-list for sub-topics
The dl-tag isn't a neat tool for defining sub-headings, it's a semantic
tool for defining definitions and their meaning. Let's insetad use
normal sub-headings instead.
To make the last few paragraphs stand out from the above, let's add a
sub-heading for those as well.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
Rob Clark [Thu, 16 Jan 2020 23:14:19 +0000 (15:14 -0800)]
freedreno/a6xx: add PROG_FB_RAST stateobj
For the handful of registers that depend on the union of program/
framebuffer/rasterizer state.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
Rob Clark [Thu, 16 Jan 2020 22:38:41 +0000 (14:38 -0800)]
freedreno/a6xx: move dynamic program state to streaming stateobj
Move the program state which we can't pre-bake to a streaming state
object, rather than emitting directly in the draw cmdstream.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
Rob Clark [Thu, 16 Jan 2020 20:42:45 +0000 (12:42 -0800)]
freedreno/a6xx: drop a few more per-draw registers
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
Rob Clark [Thu, 16 Jan 2020 20:15:37 +0000 (12:15 -0800)]
freedreno/a6xx: separate rast stateobj for prim restart
This lets us move PC_PRIMITIVE_CNTL into the rasterizr stateobj, rather
than unconditionally emitting it directly in the cmdstream on every
draw.
This also starts adding some tracking about previous draw state, so that
following patches can limit some of the register writes we currently
emit on every draw.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
Rob Clark [Thu, 16 Jan 2020 19:25:24 +0000 (11:25 -0800)]
freedreno/a6xx: cleanup rasterizer state
All but one of the reg values is only used in the stateobj, so we can
inline the register value setup and stateobj construction. While we
are at it, switch over to the new register builders.
Prep work for next patch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
Rob Clark [Thu, 16 Jan 2020 18:42:39 +0000 (10:42 -0800)]
freedreno/a6xx: limit scratch/debug markers to debug builds
The overhead does seem to matter when you have a high enough # of draw
calls that effect few bins/pixels, because these writes would happen
unconditionally (ie. not part of a state-group).
Possibly we could keep these if we moved them into a state-group so the
register writes would be no-ops on bins with no geometry. OTOH I
usually end up adding in a WFI when using them scratch reg values to
track down a crash. (So add a WFI to mitigate the annoyance of needing
to use a debug build to get scratch regs to locate the position of a
crash/hang in the cmdstream.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
Jordan Justen [Tue, 14 Jan 2020 02:07:34 +0000 (18:07 -0800)]
iris: Fix some indentation in iris_init_render_context
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
C Stout [Thu, 16 Jan 2020 23:05:06 +0000 (15:05 -0800)]
util/vector: Fix u_vector_foreach when head rolls over
Also add unit tests for u_vector.
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3453>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3453>
Francisco Jerez [Sat, 4 Jan 2020 01:08:51 +0000 (17:08 -0800)]
intel/fs: Switch to standard vector layout for barycentrics at optimization time.
This involves permuting the registers of barycentric vectors to have
the standard X[0-n] Y[0-n] layout at NIR translation time.
Barycentrics are converted to the format expected by the PLN
instruction in the lower_barycentrics() pass run after the
optimization loop.
Main reason is correctness of SIMD32 fragment shaders. The
shuffle_from_pln_layout() and shuffle_to_pln_layout() helpers used
during NIR translation are busted for SIMD32. This leads to serious
corruption at present with INTEL_DEBUG=do32, especially on Gen11+
where these helpers are hit more frequently due to the lack of a
hardware PLN instruction.
Of course one could have chosen to fix those helpers instead, but
there is another far more subtle issue that was reported during review
of the SIMD32 fragment shader codegen changes: The SIMD splitting pass
currently handles SIMD32 barycentric vectors as if they had the
standard X[0-n] Y[0-n] layout, even though they are interleaved for
the PLN instruction, which causes incorrect execution masks to be
applied to the MOVs unzipping barycentric vectors in cases where a
LINTERP instruction occurs under non-uniform control flow.
I'm not aware of any conformance regressions due to the latter issue
at present, but for our peace of mind let's move the conversion to the
PLN layout into the lower_barycentrics() pass run after
lower_simd_width().
This leads to the following shader-db improvements (including SIMD32
shaders) in combination with the previous back-end preparation changes
-- Without them (especially the copy propagation changes) this would
lead to a massive number of regressions. On ICL:
total instructions in shared programs:
20662316 ->
20466903 (-0.95%)
instructions in affected programs:
10538474 ->
10343061 (-1.85%)
helped: 68775
HURT: 6
total spills in shared programs: 8938 -> 8748 (-2.13%)
spills in affected programs: 376 -> 186 (-50.53%)
helped: 9
HURT: 5
total fills in shared programs: 8965 -> 8663 (-3.37%)
fills in affected programs: 965 -> 663 (-31.30%)
helped: 9
HURT: 6
LOST: 146
GAINED: 43
On SKL:
total instructions in shared programs:
18725867 ->
18614912 (-0.59%)
instructions in affected programs:
3876590 ->
3765635 (-2.86%)
helped: 27492
HURT: 2
LOST: 191
GAINED: 417
On SNB:
total instructions in shared programs:
14573613 ->
13980646 (-4.07%)
instructions in affected programs:
5199074 ->
4606107 (-11.41%)
helped: 29998
HURT: 0
LOST: 21
GAINED: 30
Results are somewhat less impressive but still significant without
SIMD32 fragment shaders enabled. On ICL:
total instructions in shared programs:
16148728 ->
16061659 (-0.54%)
instructions in affected programs:
6114788 ->
6027719 (-1.42%)
helped: 42046
HURT: 6
total spills in shared programs: 8218 -> 8028 (-2.31%)
spills in affected programs: 376 -> 186 (-50.53%)
helped: 9
HURT: 5
total fills in shared programs: 8953 -> 8651 (-3.37%)
fills in affected programs: 965 -> 663 (-31.30%)
helped: 9
HURT: 6
LOST: 0
GAINED: 3
On SKL:
total instructions in shared programs:
14927994 ->
14926738 (-0.01%)
instructions in affected programs: 168850 -> 167594 (-0.74%)
helped: 711
HURT: 2
On SNB:
total instructions in shared programs:
10770538 ->
10734403 (-0.34%)
instructions in affected programs:
2702172 ->
2666037 (-1.34%)
helped: 17818
HURT: 0
All of the hurt shaders are either spilling slightly more or emitting
additional NOP instructions due to the SIMD16 POW workaround for
Gen8-9 combined with differences in scheduling.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Sat, 4 Jan 2020 00:12:23 +0000 (16:12 -0800)]
intel/fs: Introduce barycentric layout lowering pass.
The goal is to represent barycentrics with the standard vector layout
during optimization and particularly SIMD lowering. Instead of
emitting the barycentric layout conversions at NIR translation time,
do it later as a lowering pass. For the moment this is only applied
to PI messages, but we'll give the same treatment to LINTERP
instructions too.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Fri, 3 Jan 2020 22:41:15 +0000 (14:41 -0800)]
intel/fs: Split fetch_payload_reg() into separate helper for barycentrics.
We're about to change the layout of barycentric vectors, which will
involve permuting the GRFs of barycentrics fetched from the thread
payload. Make room for this in a function separate from the generic
fetch_payload_reg(), since the permutation will only be applicable to
barycentric vectors. This allows simplifying fetch_payload_reg(),
since there was no need for handling multiple-component payload
registers except for barycentrics.
This causes some minor shader-db noise due to the new helper emitting
a LOAD_PAYLOAD instruction unconditionally, but it will be cleaned up
shortly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Fri, 3 Jan 2020 23:58:05 +0000 (15:58 -0800)]
intel/fs/gen6: Use SEL instead of bashing thread payload for unlit centroid workaround.
This prevents regressions on SNB due to the redundant MOVs lying
around in cases where fetch_payload_reg() returns a VGRF (currently
only in SIMD32 but soon in pretty much all cases). The MOVs can't be
register-coalesced due to their source being a FIXED_GRF, and they
can't be copy-propagated either due to the unlit centroid workaround
partial writes. They can be copy-propagated just fine into a SEL
instruction though.
On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:
total instructions in shared programs:
13996898 ->
14001982 (0.04%)
instructions in affected programs: 197461 -> 202545 (2.57%)
helped: 0
HURT: 1251
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Fri, 3 Jan 2020 23:06:52 +0000 (15:06 -0800)]
intel/fs/gen6: Generalize aligned_pairs_class to SIMD16 aligned barycentrics.
This is mainly meant to avoid shader-db regressions on SNB as we start
using VGRFs for barycentrics more frequently. Currently the
aligned_pairs_class is only useful in SIMD8 mode, because in SIMD16
mode barycentric vectors are typically 4 GRFs. This is not a problem
on Gen4-5, because on those platforms all VGRF allocations are
pair-aligned in SIMD16 mode. However on Gen6 we end up using either
the fast or the slow path of LINTERP rather non-deterministically
based on the behavior of the register allocator.
Fix it by repurposing aligned_pairs_class to hold PLN-aligned
registers of whatever the natural size of a barycentric vector is in
the current dispatch width.
On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:
total instructions in shared programs:
13983257 ->
14527274 (3.89%)
instructions in affected programs:
1766255 ->
2310272 (30.80%)
helped: 0
HURT: 11608
LOST: 26
GAINED: 13
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Tue, 31 Dec 2019 00:34:22 +0000 (16:34 -0800)]
intel/fs/gen6: Constrain barycentric source of LINTERP during bank conflict mitigation.
This avoids regressions on SNB due to the bank conflict mitigation
pass moving a VGRF-allocated barycentric vector to a misaligned
location, which would prevent the PLN instruction from being used.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Fri, 3 Jan 2020 22:53:11 +0000 (14:53 -0800)]
intel/fs/gen4-6: Allocate registers from aligned_pairs_class based on LINTERP use.
Previously we would hardcode fs_visitor::delta_xy barycentrics to be
allocated from aligned_pairs_class on hardware with PLN source
alignment restrictions (pre-Gen7). Instead allocate any registers
consumed by LINTERP from aligned_pairs_class, even if some barycentric
vector had ended up in a temporary.
On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:
total instructions in shared programs:
13983257 ->
14527274 (3.89%)
instructions in affected programs:
1766255 ->
2310272 (30.80%)
helped: 0
HURT: 11608
LOST: 26
GAINED: 13
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Mon, 30 Dec 2019 08:37:35 +0000 (00:37 -0800)]
intel/fs: Allow limited copy propagation of a LOAD_PAYLOAD into another.
This is particularly useful in cases where register coalaesce is
unlikely to succeed because the LOAD_PAYLOAD isn't a plain copy --
E.g. when a LOAD_PAYLOAD is shuffling the contents of a barycentric
vector in order to transform it into the PLN layout.
This prevents the following shader-db regressions (including SIMD32
programs) in combination with the interpolation rework part of this
series. On SKL:
total instructions in shared programs:
18596672 ->
18976097 (2.04%)
instructions in affected programs:
7937041 ->
8316466 (4.78%)
helped: 39
HURT: 67427
LOST: 466
GAINED: 220
On SNB:
total instructions in shared programs:
13993866 ->
14202963 (1.49%)
instructions in affected programs:
7611309 ->
7820406 (2.75%)
helped: 624
HURT: 52943
LOST: 6
GAINED: 18
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Mon, 30 Dec 2019 08:36:48 +0000 (00:36 -0800)]
intel/fs: Add support for copy-propagating a block of multiple FIXED_GRFs.
In cases where a LOAD_PAYLOAD instruction copies a single block of
sequential GRF registers into the destination (see
is_identity_payload()), splitting the block copy into a number of ACP
entries (one for each LOAD_PAYLOAD source) is undesirable, because
that prevents copy propagation into any instructions which read
multiple components at once with the same source (the barycentric
source of the LINTERP instruction is going to be the overwhelmingly
most common example).
Technically it would also be possible to do this for VGRF sources, but
there is little benefit from that since register coalesce already
covers many of those cases -- There is no way for a block of
FIXED_GRFs to be coalesced into a VGRF though.
This prevents the following shader-db regressions (including SIMD32
programs) in combination with the interpolation rework part of this
series. On SKL:
total instructions in shared programs:
18595160 ->
18828562 (1.26%)
instructions in affected programs:
13374946 ->
13608348 (1.75%)
helped: 7
HURT: 108977
total spills in shared programs: 9116 -> 9106 (-0.11%)
spills in affected programs: 404 -> 394 (-2.48%)
helped: 7
HURT: 9
total fills in shared programs: 8994 -> 9176 (2.02%)
fills in affected programs: 898 -> 1080 (20.27%)
helped: 7
HURT: 9
LOST: 469
GAINED: 220
On SNB:
total instructions in shared programs:
13996898 ->
14096222 (0.71%)
instructions in affected programs:
8088546 ->
8187870 (1.23%)
helped: 2
HURT: 66520
total spills in shared programs: 2985 -> 2961 (-0.80%)
spills in affected programs: 632 -> 608 (-3.80%)
helped: 2
HURT: 0
total fills in shared programs: 3144 -> 3128 (-0.51%)
fills in affected programs: 1515 -> 1499 (-1.06%)
helped: 2
HURT: 0
LOST: 0
GAINED: 4
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Mon, 30 Dec 2019 08:38:08 +0000 (00:38 -0800)]
intel/fs: Add partial support for copy-propagating FIXED_GRFs.
This will be useful for eliminating redundant copies from the FS
thread payload, particularly in SIMD32 programs. For the moment we
only allow FIXED_GRFs with identity strides in order to avoid dealing
with composing the arbitrary bidimensional strides that FIXED_GRF
regions potentially have, which are rarely used at the IR level
anyway.
This enables the following commit allowing block-propagation of
FIXED_GRF LOAD_PAYLOAD copies, and prevents the following shader-db
regressions (including SIMD32 programs) in combination with the
interpolation rework part of this series. On ICL:
total instructions in shared programs:
20484665 ->
20529650 (0.22%)
instructions in affected programs:
6031235 ->
6076220 (0.75%)
helped: 5
HURT: 42073
total spills in shared programs: 8748 -> 8925 (2.02%)
spills in affected programs: 186 -> 363 (95.16%)
helped: 5
HURT: 9
total fills in shared programs: 8663 -> 8960 (3.43%)
fills in affected programs: 647 -> 944 (45.90%)
helped: 5
HURT: 9
On SKL:
total instructions in shared programs:
18937442 ->
19128162 (1.01%)
instructions in affected programs:
8378187 ->
8568907 (2.28%)
helped: 39
HURT: 68176
LOST: 1
GAINED: 4
On SNB:
total instructions in shared programs:
14094685 ->
14243499 (1.06%)
instructions in affected programs:
7751062 ->
7899876 (1.92%)
helped: 623
HURT: 53586
LOST: 7
GAINED: 25
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Fri, 3 Jan 2020 02:54:13 +0000 (18:54 -0800)]
intel/fs: Extend copy propagation dataflow analysis to copies with FIXED_GRF source.
This involves indexing the ACP tables used internally by
fs_copy_prop_dataflow::setup_initial_values() by reg_space() instead
of register number. Both are nearly equivalent for virtual GRFs
(barring the single bit of entropy lost in the hash), and this makes
handling FIXED_GRFs straightforward.
Because we're only going to support FIXED_GRFs for the source of a
copy, this change is only strictly necessary during the second pass
that checks for source interference, but we also apply the same change
to the first pass for consistency.
Note that this shouldn't change the behavior of the copy propagation
pass until we start inserting FIXED_GRF entries into the ACP. Even
then FIXED_GRF writes are extremely rare so this change will hardly
ever have an effect, but they aren't completely non-existing so we
need to handle them for correctness.
No functional nor shader-db changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Tue, 31 Dec 2019 08:10:28 +0000 (00:10 -0800)]
intel/fs: Rework fs_inst::is_copy_payload() into multiple classification helpers.
This reworks the current fs_inst::is_copy_payload() method into a
number of classification helpers with well-defined semantics. This
will be useful later on in order to optimize LOAD_PAYLOAD instructions
more aggressively in cases where we can determine it's safe to do so.
The closest equivalent of the present fs_inst::is_copy_payload()
method is the is_coalescing_payload() helper introduced here.
No functional nor shader-db changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Thu, 2 Jan 2020 23:32:56 +0000 (15:32 -0800)]
intel/fs: Generalize fs_reg::is_contiguous() to register files other than VGRF.
No functional nor shader-db changes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Mon, 30 Dec 2019 02:17:10 +0000 (18:17 -0800)]
intel/fs: Try to vectorize header setup in lower_load_payload().
In cases where LOAD_PAYLOAD is provided a pair of contiguous registers
as header sources, try to use a single SIMD16 instruction in order to
initialize them. This is unlikely to affect the overall cycle count
of the shader, since the compressed instruction has twice the issue
time, except due to the reduced pressure on the instruction cache.
Main motivation is avoiding instruction-count regressions in
combination with the following copy propagation improvements, which
will allow the SIMD16 g0-1 header setup emitted for framebuffer writes
to be copy-propagated into its LOAD_PAYLOAD, leading to the emission
of two SIMD8 MOV instructions instead of a single SIMD16 MOV.
Reverting this commit on top of the copy propagation changes would
lead to the following shader-db regressions on SKL and other
platforms:
total instructions in shared programs:
14926738 ->
14935415 (0.06%)
instructions in affected programs:
1892445 ->
1901122 (0.46%)
helped: 0
HURT: 8676
Without the following copy propagation changes this doesn't have any
effect on shader-db on Gen7+, because we would typically set up the FB
write header with a separate SIMD16 MOV that isn't currently
copy-propagated into the LOAD_PAYLOAD, so the individual SIMD8 MOVs
result of LOAD_PAYLOAD lowering would get register-coalesced away
under normal circumstances. However that wasn't the case for MRF
LOAD_PAYLOAD destinations on Gen6 and earlier, because register
coalesce only kicks in for GRFs, leaving a number of redundant SIMD8
MOVs lying around. On SNB this leads to the following shader-db
improvements:
total instructions in shared programs:
10770538 ->
10734681 (-0.33%)
instructions in affected programs:
2700655 ->
2664798 (-1.33%)
helped: 17791
HURT: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek Olšák [Tue, 10 Dec 2019 20:45:14 +0000 (15:45 -0500)]
st/dri: do FLUSH_VERTICES before calling flush_resource