Rob Clark [Tue, 28 Apr 2020 20:07:16 +0000 (13:07 -0700)]
freedreno: switch to simple_mtx
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4810>
Rob Clark [Tue, 28 Apr 2020 20:04:16 +0000 (13:04 -0700)]
freedreno: add screen lock wrappers
This will make it easier to swap out to simple_mtx_t
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4810>
Rob Clark [Tue, 28 Apr 2020 19:39:32 +0000 (12:39 -0700)]
util/simple_mtx: add assert_locked()
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4810>
Jonathan Marek [Tue, 28 Apr 2020 23:54:06 +0000 (19:54 -0400)]
turnip: fix wrong substream size in parse_multisample_and_color_blend
Missed updating this when adding tu6_emit_sample_locations
Fixes: a92d2e11095 ("turnip: implement VK_EXT_sample_locations")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4795>
Eric Anholt [Mon, 13 Apr 2020 18:14:23 +0000 (11:14 -0700)]
util/ra: Improve ra_set_finalize() performance.
BITSET_FOR_EACH_SET can walk a sparse set (such as a register class's set
of registers) much faster than just iterating over individual bits.
Improves freedreno startup time (as measured by shader-db ./run
shaders/closed/gputest/triangle on my x86 system) by -4.12679% +/-
1.99006% (n=151)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
Eric Anholt [Mon, 13 Apr 2020 17:47:17 +0000 (10:47 -0700)]
util/ra: Use util_dynarray for handling the conflict lists.
Again, shortens the code significantly.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
Eric Anholt [Mon, 13 Apr 2020 17:36:08 +0000 (10:36 -0700)]
util/ra: Use util_dynarray for the adjacency list.
This make the code significantly more readable, I think (along with
shorter). Also, using util_dynarray_delete_unordered() saves us a move of
the rest of the list when removing adjacency on a node.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
Eric Anholt [Thu, 9 Apr 2020 22:10:08 +0000 (15:10 -0700)]
util/ra: Sanity check that we're adding a valid reg to a class.
BITSET_SET might not segfault on you right away if you're just slightly
off, and an assert is nicer anyway.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
Eric Anholt [Thu, 9 Apr 2020 21:11:51 +0000 (14:11 -0700)]
util/ra: Sanity check that the driver selected a valid reg.
freedreno was returning -1 when it didn't pick a reg from the given bitset
due to an off-by-a-small-number error.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>
Konrad Dybcio [Thu, 26 Mar 2020 16:48:52 +0000 (17:48 +0100)]
freedreno/a4xx: enable A405
This patch brings support for Adreno A405
as found on MSM8939. That chip is a cut-down
version of A4XX IP and requires no special handling.
Tested on Asus Zenfone 2 Laser (Z00T) smartphone.
Signed-off-by: Konrad Dybcio <konradybcio@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4753>
Mike Blumenkrantz [Tue, 24 Mar 2020 15:58:29 +0000 (11:58 -0400)]
iris: handle PIPE_CAP_CLEAR_SCISSORED
this allows passing scissored clear calls through the driver where it can
be handled by a repclear shader
fix kwg/mesa#61
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4310>
Mike Blumenkrantz [Tue, 24 Mar 2020 16:02:51 +0000 (12:02 -0400)]
gallium: add pipe cap for scissored clears and pass scissor state to clear() hook
this adds a new pipe cap that drivers can support which enables passing buffer
clears with scissor test enabled through to be handled by the driver instead
of having mesa draw a quad
also adjust all existing clear() hooks to have the new parameter
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4310>
Caio Marcelo de Oliveira Filho [Wed, 29 Apr 2020 04:05:05 +0000 (21:05 -0700)]
i965: Use correct constant for max_variable_local_size
Fixes: 5664bd6db38 ("i965: Implement ARB_compute_variable_group_size")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4799>
Mike Blumenkrantz [Mon, 30 Mar 2020 14:37:29 +0000 (10:37 -0400)]
iris: move iris_vtable to iris_screen
instead of inlining this into every context, now a struct is used in the screen
struct to reduce memory usage and simplify a couple of the methods
Closes: https://gitlab.freedesktop.org/kwg/mesa/-/issues/6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4376>
Jason Ekstrand [Mon, 27 Apr 2020 20:31:12 +0000 (15:31 -0500)]
intel/fs: Don't delete coalesced MOVs if they have a cmod
Shader-db results on ICL:
total instructions in shared programs:
17133088 ->
17133287 (<.01%)
instructions in affected programs: 61300 -> 61499 (0.32%)
helped: 0
HURT: 199
This means it's likely fixing 199 bugs. :-) All the changed shaders are
in Mad Max. It's surprisingly difficult to get the back-end compiler to
generate a pattern that hits this we don't tend to emit a lot coalescable
MOVs. The pattern in Mad Max that's able to hit is fsign(fsat(x)) under
the right conditions.
Closes: #2820
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4773>
Marek Olšák [Mon, 27 Apr 2020 03:17:41 +0000 (23:17 -0400)]
st/mesa: expose more SPIR-V capabilities
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4760>
Marek Olšák [Mon, 27 Apr 2020 05:03:38 +0000 (01:03 -0400)]
mesa: report GL_INVALID_OPERATION for invalid glTextureBuffer target
This fixes:
KHR-GL46.direct_state_access.textures_buffer_errors
KHR-GL46.direct_state_access.textures_buffer_range_errors
Fixes: 98e64e538af - main: Added entry point for glTextureBuffer
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4759>
Alyssa Rosenzweig [Tue, 28 Apr 2020 21:44:39 +0000 (17:44 -0400)]
pan/mdg: Replicate 16-bit swizzles
We don't support vec8 quite yet anyway, this fixes dot products.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Tue, 28 Apr 2020 21:44:19 +0000 (17:44 -0400)]
pan/mdg: Ensure fdot is scalar out in disasm
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Tue, 28 Apr 2020 21:43:56 +0000 (17:43 -0400)]
pan/mdg: Move condense_writemask to disasm
The compiler should *never* use this. Packing should be 1 way.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Tue, 28 Apr 2020 00:35:39 +0000 (20:35 -0400)]
pan/mdg: Pass through some types from scheduling
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Tue, 28 Apr 2020 00:35:00 +0000 (20:35 -0400)]
pan/mdg: Don't crash on unknown branch target
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Tue, 28 Apr 2020 00:34:36 +0000 (20:34 -0400)]
pan/mdg: Make some branch targets more explicit
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Tue, 28 Apr 2020 21:57:38 +0000 (17:57 -0400)]
pan/mdg: Always print the mask
Meaningful for fp16.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Tue, 28 Apr 2020 00:12:55 +0000 (20:12 -0400)]
pan/mdg: Specialize swizzle to type
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Tue, 28 Apr 2020 00:09:43 +0000 (20:09 -0400)]
pan/mdg: Lower specials to 32-bit
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:23:17 +0000 (19:23 -0400)]
pan/mdg: Move sampler_type emission to pack time
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:22:06 +0000 (19:22 -0400)]
pan/mdg: Set texture full fields at pack time
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:11:19 +0000 (19:11 -0400)]
pan/mdg: Track texture types
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:11:07 +0000 (19:11 -0400)]
pan/mdg: Track v_mov type (force uint32 for now?)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:04:24 +0000 (19:04 -0400)]
pan/mdg: Denoise prints
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:01:40 +0000 (19:01 -0400)]
pan/mdg: Track a primary type for I/O
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:58:21 +0000 (18:58 -0400)]
pan/mdg: Another goofy comment gone
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:57:34 +0000 (18:57 -0400)]
pan/mdg: Track ALU dest type
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:55:11 +0000 (18:55 -0400)]
pan/mdg: Track ALU src types
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:43:12 +0000 (18:43 -0400)]
pan/mdg: Add type fields to IR
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:43:01 +0000 (18:43 -0400)]
pan/bi: Share ALU type printing
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:33:10 +0000 (18:33 -0400)]
pan/mdg: Set lower_flrp16
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:30:53 +0000 (18:30 -0400)]
pan/mdg: Remove old hack
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>
Alyssa Rosenzweig [Mon, 27 Apr 2020 21:55:54 +0000 (17:55 -0400)]
pan/mdg: Remove goofy 16-bit comment
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>
Alyssa Rosenzweig [Mon, 27 Apr 2020 21:47:13 +0000 (17:47 -0400)]
pan/mdg: Don't break SSA
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>
Alyssa Rosenzweig [Mon, 27 Apr 2020 21:19:24 +0000 (17:19 -0400)]
pan/mdg: SSA_FIXED_MINIMUM already covered by PAN_IS_REG
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>
Alyssa Rosenzweig [Mon, 27 Apr 2020 20:34:53 +0000 (16:34 -0400)]
pan/mdg: Use PAN_IS_REG
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>
Alyssa Rosenzweig [Mon, 27 Apr 2020 20:33:54 +0000 (16:33 -0400)]
pan/mdg: Remove nir_alu_src_index
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>
Alyssa Rosenzweig [Mon, 27 Apr 2020 20:04:05 +0000 (16:04 -0400)]
pan/bi: Use common IR indices
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>
Alyssa Rosenzweig [Mon, 27 Apr 2020 20:00:38 +0000 (16:00 -0400)]
panfrost: Move Bifrost IR indexing to common
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>
Alyssa Rosenzweig [Tue, 28 Apr 2020 19:28:11 +0000 (15:28 -0400)]
panfrost: Fix BO reference counting
Typo.
Fixes: 3283c7f4dad ("panfrost: Inline reference counting routines")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>
Marek Olšák [Mon, 20 Apr 2020 21:29:39 +0000 (17:29 -0400)]
ac: enable displayable DCC on Navi12 & Navi14
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>
Marek Olšák [Sat, 18 Apr 2020 00:44:14 +0000 (20:44 -0400)]
ac/surface: validate that DCC is enabled correctly on gfx9+
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>
Marek Olšák [Sat, 18 Apr 2020 00:37:41 +0000 (20:37 -0400)]
ac/surface: add code for gfx10 displayable DCC
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>
Marek Olšák [Wed, 22 Apr 2020 22:51:42 +0000 (18:51 -0400)]
ac/surface: move non-displayable DCC to the end of the buffer
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>
Marek Olšák [Sat, 18 Apr 2020 00:27:32 +0000 (20:27 -0400)]
ac/surface: don't compute DCC if it's unsupported by DCN on gfx9+
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>
Marek Olšák [Sat, 18 Apr 2020 00:19:26 +0000 (20:19 -0400)]
ac/surface: match get_display_flag() with expectations for is_displayable
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>
Marek Olšák [Thu, 23 Apr 2020 05:00:24 +0000 (01:00 -0400)]
ac/surface: replace RADEON_SURF_OPTIMIZE_FOR_SPACE with !FORCE_SWIZZLE_MODE
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>
Marek Olšák [Thu, 23 Apr 2020 04:47:04 +0000 (00:47 -0400)]
ac/surface: remove RADEON_SURF_TC_COMPATIBLE_HTILE and assume it's always set
So that drivers can enable it without worrying how the texture was
allocated.
v2: reworked the mechanism, hopefully fixes now
added Bas Nieuwenhuizen's diff to fix radv
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>
Marek Olšák [Thu, 23 Apr 2020 04:31:36 +0000 (00:31 -0400)]
ac/surface: rename micro tile mode enums like gfx10 uses them
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>
Thomas Hellstrom [Wed, 22 Apr 2020 13:03:15 +0000 (15:03 +0200)]
winsys/svga: Optionally avoid caching buffer maps
Mapping of graphics kernel buffers is quite costly. Therefore the svga
drm winsys caches all kernel buffer maps. However, that may lead to
less testing coverage of the unmap paths and (possibly) processes running
out of virtual memory space. Introduce a possibility to avoid that caching
by setting the environment variable SVGA_FORCE_KERNEL_UNMAPS to 1.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4804>
Thomas Hellstrom [Wed, 22 Apr 2020 11:27:35 +0000 (13:27 +0200)]
gallium/pipebuffer: Use persistent maps for slabs
Instead of the ugly practice of relying on the provider caching maps,
introduce and use persistent pipebuffer maps. Providers that can't handle
persistent maps can't use the slab manager.
The only current user is the svga drm winsys which always maps
persistently.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4804>
Timur Kristóf [Thu, 23 Apr 2020 13:13:31 +0000 (15:13 +0200)]
radv: Use smaller esgs_itemsize for ACO.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Timur Kristóf [Mon, 30 Mar 2020 15:23:25 +0000 (17:23 +0200)]
aco: Use new default driver locations.
The way the new locations are set up has much fewer gaps
between each I/O slot, so this results in a massive reduction
in the LDS usage of tessellation shaders.
Totals (GFX10):
VGPRS:
3976792 ->
3974864 (-0.05 %)
Code Size:
260552784 ->
260532860 (-0.01 %) bytes
LDS: 48723 -> 30179 (-38.06 %) blocks
Max Waves:
1053407 ->
1053583 (0.02 %)
Totals from affected shaders (1407 shaders on GFX10):
SGPRS: 59144 -> 59216 (0.12 %)
VGPRS: 63024 -> 61096 (-3.06 %)
Code Size:
2695508 ->
2675584 (-0.74 %) bytes
LDS: 47109 -> 28565 (-39.36 %) blocks
Max Waves: 12999 -> 13175 (1.35 %)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Timur Kristóf [Mon, 27 Apr 2020 10:22:03 +0000 (12:22 +0200)]
radv: Use new linking helper to set default driver locations.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Timur Kristóf [Mon, 30 Mar 2020 13:58:07 +0000 (15:58 +0200)]
nir: Add new linking helper to set linked driver locations.
This commit introduces a new function nir_assign_linked_io_var_locations
which is intended to help with assigning driver locations to shaders
during linking, primarily aimed at the VS->TCS->TES->GS stages.
It ensures that the linked shaders have the same driver locations,
and it also packs these as close to each other as possible.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Timur Kristóf [Thu, 23 Apr 2020 12:02:47 +0000 (14:02 +0200)]
aco: Set config->lds_size when TES or VS is running on HW ESGS.
This doesn't fix anything, just reports the LDS size used by
merged ESGS shaders, such as vertex_geometry_gs and
tess_eval_geometry_gs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Timur Kristóf [Mon, 27 Apr 2020 17:51:40 +0000 (19:51 +0200)]
aco: Calculate workgroup size of legacy GS.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Timur Kristóf [Mon, 30 Mar 2020 14:54:56 +0000 (16:54 +0200)]
aco: Remember VS/TCS output driver locations.
Instead of relying on calling shader_io_get_unique_index repeatedly,
remember the which output driver location corresponds to which
varying slot.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Timur Kristóf [Mon, 30 Mar 2020 14:11:14 +0000 (16:11 +0200)]
aco: Use context variables instead of calculating TCS inputs/outputs.
VS needs the number of TCS inputs, and TES needs the number of TCS
outputs.
It is error-prone to repeat those calculations in both instruction
selection and setup. Just set them in one place instead.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Timur Kristóf [Mon, 30 Mar 2020 14:04:53 +0000 (16:04 +0200)]
radv: Refactor calculate_tess_lds_size and get_tcs_num_patches.
Previously these functions needed the bit mask of the TCS outputs
and patch outputs written, and concluded the number of outputs
from that.
Now, they take the number of outputs and patch outputs instead.
This will allow the backend compiler to better optimize the
LDS layout.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>
Rhys Perry [Mon, 27 Apr 2020 12:53:59 +0000 (13:53 +0100)]
aco: consider blocks unreachable if they are in the logical cfg
unreachable was true if the last block is unreachable in the linear cfg,
but it should also be true if it is unreachable in the logical cfg.
Fixes dEQP-VK.graphicsfuzz.for-with-ifs-and-return
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 8d8c864beba399ae4ee2267f680d1f600ad32767
('aco: improve check for unreachable loop continue blocks')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4764>
Christopher James Halse Rogers [Tue, 24 Mar 2020 03:19:51 +0000 (14:19 +1100)]
egl/wayland: Fix zwp_linux_dmabuf usage
There's no guarantee that the formats advertised by wl_drm and the formats
advertised by zwp_linux_dmabuf_v1 are the same.
get_back_bo() handles this by falling back from createImageWithModifiers() to
createImage() when there's a wl_drm format but no corresponding linux_dmabuf
format, but create_wl_buffer() unconditionally tries to create a linux_dmabuf
buffer unless DRIimage has DRM_FORMAT_MOD_INVALID.
Fix this by always checking if the DRIimage modifier has been advertised
by zwp_linux_dmabuf_v1, and falling back to wl_drm if not.
If DRM_FORMAT_MOD_INVALID has been advertised then we trust the client
has allocated something appropriate and treat any modifier as matching.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2220
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4294>
Danylo Piliaiev [Tue, 28 Apr 2020 11:51:26 +0000 (14:51 +0300)]
iris/bufmgr: Check if iris_bo_gem_mmap failed
After refactoring of iris_bo_map_cpu and iris_bo_map_wc - immediate
return of NULL on failure to mmap a buffer was lost.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2855
Fixes: 5bc3f52dd8c2b5acaae959ccae2e1fb7c769bb22
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4786>
Tapani Pälli [Fri, 24 Apr 2020 12:28:41 +0000 (15:28 +0300)]
anv: remove assert from GetImageMemoryRequirements[2]
This assert is actually correct but due to how android hardware buffer
support is implemented we should remove it, otherwise debug build of
mesa hits the assert with Android CTS tests.
Test creates VkImage with non-external format and sets up
VkExternalMemoryImageCreateInfo to indicate that image *may* be used
with Android hardwarebuffer handle. Then test attempts to get image
memory requirements. Problem with this is that we setup all android
supporting images as having external format and thus hit the assert as
the size has not been set yet. This is not a problem in practice since
android will bind ahw memory with the image later on.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2807
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4762>
Samuel Pitoiset [Wed, 29 Apr 2020 07:53:48 +0000 (09:53 +0200)]
gitlab-ci: add a list of expected failures for FIJI with ACO
Timur has this chip now. The depth stencil resolve failures are
somehow unexpected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4805>
Samuel Pitoiset [Wed, 15 Apr 2020 09:39:28 +0000 (11:39 +0200)]
radv: advertise VK_EXT_robustness2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
Samuel Pitoiset [Wed, 15 Apr 2020 09:48:13 +0000 (11:48 +0200)]
radv: handle NULL vertex bindings
With VK_EXT_robustness2, an element of pBuffers can be NULL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
Samuel Pitoiset [Thu, 23 Apr 2020 14:02:59 +0000 (16:02 +0200)]
radv: handle NULL descriptors
All fields must be zero, otherwise the HW hangs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
Samuel Pitoiset [Mon, 27 Apr 2020 15:27:22 +0000 (17:27 +0200)]
aco: fix adjusting the sample index with FMASK if value is negative
The SPIR-V spec doesn't say explicitly that the sample index
must be an unsigned integer.
This fixes crashes with some new VK_EXT_robustness2 tests.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
Samuel Pitoiset [Mon, 27 Apr 2020 15:02:18 +0000 (17:02 +0200)]
aco: fix nir_texop_texture_samples with NULL descriptors
With VK_EXT_robustness2, descriptors can be NULL and the number of
samples returned by nir_texop_texture_samples should be 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
Samuel Pitoiset [Mon, 27 Apr 2020 11:04:40 +0000 (13:04 +0200)]
ac/llvm: fix nir_texop_texture_samples with NULL descriptors
With VK_EXT_robustness2, descriptors can be NULL and the number of
samples returned by nir_texop_texture_samples should be 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>
Caio Marcelo de Oliveira Filho [Fri, 17 Jan 2020 22:17:58 +0000 (14:17 -0800)]
intel/fs: Only stall after sending all memory fence messages
In Gen11+, when emitting a fence for both L3 and SLM, the generated
code would look like
SEND, MOV (for stall), SEND, MOV (for stall)
This commit change that so two SENDs are emitted before the MOVs for
stall. This is similar to the approach used in Ivy Bridge for the
render fence.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
Caio Marcelo de Oliveira Filho [Fri, 17 Jan 2020 23:07:44 +0000 (15:07 -0800)]
intel/fs,vec4: Pull stall logic for memory fences up into the IR
Instead of emitting the stall MOV "inside" the
SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when
creating the IR.
For IvyBridge, every (data cache) fence is accompained by a render
cache fence, that now is explicit in the IR, two
SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs).
Because Begin and End interlock intrinsics are effectively memory
barriers, move its handling alongside the other memory barrier
intrinsics. The SHADER_OPCODE_INTERLOCK is still used to distinguish
if we are going to use a SENDC (for Begin) or regular SEND (for End).
This change is a preparation to allow emitting both SENDs in Gen11+
before we can stall on them.
Shader-db results for IVB (i965):
total instructions in shared programs:
11971190 ->
11971200 (<.01%)
instructions in affected programs: 11482 -> 11492 (0.09%)
helped: 0
HURT: 8
HURT stats (abs) min: 1 max: 3 x̄: 1.25 x̃: 1
HURT stats (rel) min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10%
95% mean confidence interval for instructions value: 0.66 1.84
95% mean confidence interval for instructions %-change: 0.01% 0.27%
Instructions are HURT.
Unlike the previous code, that used the `mov g1 g2` trick to force
both `g1` and `g2` to stall, the scheduling fence will generate `mov
null g1` and `mov null g2`. During review it was decided it was not
worth keeping the special codepath for the small effect will have.
Shader-db results for HSW (i965), BDW and SKL don't have a change
on instruction count, but do report changes in cycles count, showing
SKL results below
total cycles in shared programs:
341738444 ->
341710570 (<.01%)
cycles in affected programs:
7240002 ->
7212128 (-0.38%)
helped: 46
HURT: 5
helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154
helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95%
HURT stats (abs) min: 2 max: 1768 x̄: 646.40 x̃: 362
HURT stats (rel) min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08%
95% mean confidence interval for cycles value: -777.71 -315.38
95% mean confidence interval for cycles %-change: -1.42% -0.83%
Cycles are helped.
This seems to be the effect of allocating two registers separatedly
instead of a single one with size 2, which causes different register
allocation, affecting the cycle estimates.
while ICL also has not change on instruction count but report changes
negative changes in cycles
total cycles in shared programs:
352665369 ->
352707484 (0.01%)
cycles in affected programs:
9608288 ->
9650403 (0.44%)
helped: 4
HURT: 104
helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101
helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49%
HURT stats (abs) min: 2 max: 2016 x̄: 408.36 x̃: 48
HURT stats (rel) min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45%
95% mean confidence interval for cycles value: 256.67 523.24
95% mean confidence interval for cycles %-change: 0.63% 1.03%
Cycles are HURT.
AFAICT this is the result of the case above.
Shader-db results for TGL have similar cycles result as ICL, but also
affect instructions
total instructions in shared programs:
17690586 ->
17690597 (<.01%)
instructions in affected programs: 64617 -> 64628 (0.02%)
helped: 55
HURT: 32
helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3
helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74%
HURT stats (abs) min: 1 max: 65 x̄: 7.44 x̃: 2
HURT stats (rel) min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69%
95% mean confidence interval for instructions value: -2.03 2.28
95% mean confidence interval for instructions %-change: -0.41% 0.15%
Inconclusive result (value mean confidence interval includes 0).
Now that more is done in the IR, more dependencies are visible and
more SWSB annotations are emitted. Mixed with different register
allocation decisions like above, some shaders will see more `sync
nops` while others able to avoid them.
Most of the new `sync nops` are also redundant and could be dropped,
which will be fixed in a separate change.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
Caio Marcelo de Oliveira Filho [Fri, 17 Jan 2020 22:52:13 +0000 (14:52 -0800)]
intel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers
It will generate the MOVs (or SYNC_NOP in Gen12+) needed for stall.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>
Bas Nieuwenhuizen [Tue, 28 Apr 2020 15:04:25 +0000 (17:04 +0200)]
radv: Expose 4G element texel buffers.
Old value seems to be copied from anv.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4787>
Kenneth Graunke [Tue, 28 Apr 2020 21:04:58 +0000 (14:04 -0700)]
iris: Fix downcast of bound_vertex_buffers from uint64_t to int
This is the wrong data type, the original field - and the values we're
adding in - are both 64-bit unsigned. Keep the original data type.
Thanks to Dave Airlie for finding this while reading the code.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4802>
Francisco Jerez [Fri, 3 Apr 2020 00:42:21 +0000 (17:42 -0700)]
intel/ir: Remove scheduling-based cycle count estimates.
The cycle count estimation logic part of the scheduler is now
redundant with the shader performance modeling pass, and the estimates
can be consolidated into the brw::performance analysis result object
instead of being part of the CFG, which guarantees that the estimates
cannot be accessed without previously calling the
performance_analysis::require() method, which makes sure that the
right analysis pass is executed at the right time if we don't already
have up-to-date cached results.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Fri, 3 Apr 2020 00:42:57 +0000 (17:42 -0700)]
intel/ir: Pass block cycle count information explicitly to disassembler.
So we can eventually remove the cycle count estimates from the CFG
data structure and consolidate performance information in the
brw::performance object.
It would be cleaner to pass the brw::performance object directly to
the disassembler but that isn't straightforward since the disassembler
is built as a plain C file unlike the rest of the compiler back-end.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Thu, 26 Mar 2020 23:27:32 +0000 (16:27 -0700)]
intel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats.
These should be more accurate than the current cycle counts, since
among other things they consider the effect of post-scheduling passes
like the software scoreboard on TGL. In addition it will enable us to
clean up some of the now redundant cycle-count estimation
functionality in the instruction scheduler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Wed, 22 Apr 2020 20:29:34 +0000 (13:29 -0700)]
intel/fs: Add INTEL_DEBUG=no32 debugging flag.
This is useful in order to identify codegen issues caused by SIMD32.
It doesn't currently have any effect on compute shaders since SIMD32
dispatch is only enabled for CS when it's strictly necessary to do so
in order to support the workgroup size requested for the shader --
That might change in the future though when we hook up the SIMD32
heuristic to CS compilation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Fri, 3 Apr 2020 00:30:06 +0000 (17:30 -0700)]
intel/fs: Implement performance analysis-based SIMD32 heuristic for fragment shaders.
The heuristic enables the SIMD32 fragment shader based on whether the
IR performance modeling pass predicts it to have greater throughput
than the SIMD16 and SIMD8 variants of the same shader. It would be
straightforward to do the same thing in order to control whether
SIMD16 dispatch is enabled, but it's pending additional performance
evaluation.
The INTEL_DEBUG=do32 option is left around in order to force the
SIMD32 shader to be used regardless of the result of the heuristic,
since it's useful as a debugging aid e.g. in order to identify
SIMD32-specific codegen issues which may be masked by the SIMD32
heuristic, or cases where the heuristic is incorrectly disabling
SIMD32 shaders that offer a performance advantage.
Currently this is only enabled on Gen6+, since SIMD32 codegen support
is incomplete on earlier platforms.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Fri, 3 Apr 2020 00:16:45 +0000 (17:16 -0700)]
intel/fs: Heap-allocate fs_visitors in brw_compile_fs().
This makes brw_compile_fs() look a bit more similar to
brw_compile_cs(). It saves us three v*_shader_stats local variables,
and will save us additional triplicated declarations as we start
tracking IR performance analysis results.
The triplicated cfg pointers are left around because they're set to
NULL to mark specific dispatch modes as disabled (e.g. in order to
enforce hardware restrictions). Doing the same thing with the visitor
pointers would cause data leaks.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Thu, 26 Mar 2020 21:59:02 +0000 (14:59 -0700)]
intel/ir: Import shader performance analysis pass.
This introduces an analysis pass intended to estimate several
performance statistics of the shader, including cycle count latency
and throughput values, based on static modeling. It has instruction
performance information more comprehensive than the current scheduling
pass for all platforms between Gen4-11, and works on both the FS and
VEC4 back-end.
The most immediate purpose of this pass is to implement a heuristic
meant to determine whether using SIMD32 dispatch for a fragment shader
can be expected to help more than it hurts. In addition this will
allow the effect of passes run after scheduling (e.g. the TGL software
scoreboard pass and the VEC4 dependency control pass) to be visible in
shader-db statistics.
But that isn't the end of the story, other potential applications of
this pass (not part of this MR) I've been playing around with are:
- Implement a similar SIMD16 heuristic allowing the identification of
inefficient SIMD16 fragment shaders.
- Implement similar SIMD16 and SIMD32 heuristics for the compute
shader stage -- Currently compute shader builds always use the
SIMD16 shader if available and never use the SIMD32 shader unless
strictly necessary, which is suboptimal under certain conditions.
- Hook up to the instruction scheduler in order to improve the
accuracy of its timing information.
- Use as heuristic in order to drive the selection of scheduling
modes (Matt was experimenting with that).
- Plug to the TGL software scoreboard pass in order to implement a
more effective SBID token allocation algorithm, since in general
the optimal token allocation depends on the timings of all
instructions in the program.
- Use its bottleneck detection functionality in order to implement a
heuristic computing a more optimal bound for the number of fragment
shader threads executed in parallel (by adjusting the
MaximumNumberofThreadsPerPSD control of 3DSTATE_PS).
As a follow-up I'm planning to submit updated timing information for
Gen12 platforms -- Everything else required to support Gen12 like SWSB
handling is already included in this patch, but there were some IP
concerns regarding the TGL timing parameters since they cannot
currently be obtained with the documentation and hardware which is
publicly available. The timing parameters for any previous Gen7-11
platforms can be obtained by anyone by sampling the timestamp register
using e.g. shader_time, though I have some more convenient
instrumentation coming up.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Sat, 22 Feb 2020 09:17:21 +0000 (01:17 -0800)]
intel/vec4: Fix constness of vec4_instruction::reads_flag() and ::writes_flag().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Thu, 2 Apr 2020 23:20:34 +0000 (16:20 -0700)]
intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function.
This will be re-usable by the IR performance analysis pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Thu, 2 Apr 2020 23:18:12 +0000 (16:18 -0700)]
intel/fs: Fix constness of argument of fs_instruction_scheduler::is_compressed().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Fri, 3 Apr 2020 20:04:43 +0000 (13:04 -0700)]
intel/fs: Rename half() helpers to quarter(), allow index up to 3.
Makes more sense considering SIMD32. Relaxing the assertion in
brw_ir_fs.h will be required in order to avoid assertion failures on
SNB with SIMD32 fragment shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Thu, 26 Mar 2020 22:01:13 +0000 (15:01 -0700)]
intel/ir: Add missing initialization of backend_reg::offset during construction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Tue, 7 Apr 2020 23:39:59 +0000 (16:39 -0700)]
intel/fs/gen12: Fix Render Target Read header setup for new thread payload layout.
In Gen12 the Poly 0 Info DWORD containing the Viewport Index and
Render Target Index fields were moved from r0.0 to r1.1 in order to
make room for dual-polygon dispatch. The render target message format
was updated to expect that information in the same location, so we
didn't need to make any changes for framebuffer fetch to work with
SIMD8 and SIMD16 dispatch. Unfortunately that won't work with SIMD32,
since the render target message header is assembled from r0 and r2
instead of r1, and the r2 thread payload wasn't updated with an
additional copy of the same information. We need to fix things up
manually instead. This avoids a handful of
EXT_shader_framebuffer_fetch regressions in combination with SIMD32
fragment shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Wed, 8 Apr 2020 00:31:07 +0000 (17:31 -0700)]
intel/fs/gen12: Work around dual-source blending hangs in combination with SIMD32.
This applies the same work-around I commited as
b84fa0b31e67
"intel/fs/gen11: Work around dual-source blending hangs in combination
with SIMD32." to Gen12, which seems to suffer from the same hardware
bug found empirically. The failure mode seems to be identical.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Wed, 8 Apr 2020 00:22:10 +0000 (17:22 -0700)]
intel/fs/gen12: Fix hangs with per-sample SIMD32 fragment shader dispatch.
The Gen12 docs are rather contradictory regarding the dispatch
configurations supported by the fragment shader -- The same table
present in previous generations seems to imply that only one dispatch
mode can be enabled when doing per-sample shading, but a restriction
documented in the 3DSTATE_PS_BODY page implies the opposite: That
SIMD32 can only be used in combination with some other dispatch mode.
The latter seems to match the behavior of real hardware as I could
tell from my testing: A bunch of multisample test-cases that do
per-sample shading hang if we only provide a SIMD32 shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dylan Baker [Wed, 22 Apr 2020 06:32:45 +0000 (23:32 -0700)]
mesa: Follow OpenGL conversion rules for values that exceed storage size
Section 2.2.2 (Data Conversions For State Query Commands) of the
OpenGL 4.5 spec says:
Following these steps, if a value is so large in magnitude that
it cannot be represented by the returned data type, then the
nearest value representable using that type is returned.
The current code doesn't do the correct thing, because it truncates a
long (potentially a 64bit values) to an int.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2828
Fixes: 53c36dfcfe3eb3749a53267f054870280afb0d71
("replace IROUND with util functions")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4673>
Alyssa Rosenzweig [Tue, 28 Apr 2020 17:57:31 +0000 (13:57 -0400)]
pan/bit: Add BITWISE test
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>