mesa.git

Marek Olšák [Thu, 23 Apr 2020 04:31:36 +0000 (00:31 -0400)]
ac/surface: rename micro tile mode enums like gfx10 uses them

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>

Thomas Hellstrom [Wed, 22 Apr 2020 13:03:15 +0000 (15:03 +0200)]
winsys/svga: Optionally avoid caching buffer maps

Mapping graphics kernel buffers is quite costly, so the svga drm
winsys caches all kernel buffer maps. However, that reduces test
coverage of the unmap paths and could cause processes to run out of
virtual address space. Make it possible to avoid that caching by
setting the environment variable SVGA_FORCE_KERNEL_UNMAPS to 1.
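
A minimal sketch of such an opt-out, assuming the variable must be exactly "1". The names (`sketch_force_kernel_unmaps`, `struct sketch_buffer`, `sketch_buffer_unmap`) are hypothetical; the real winsys parses the option and manages its maps in its own way:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: true only when SVGA_FORCE_KERNEL_UNMAPS is exactly "1"
 * (an assumption; the real option parsing may accept other values). */
static bool
sketch_force_kernel_unmaps(void)
{
   const char *val = getenv("SVGA_FORCE_KERNEL_UNMAPS");
   return val != NULL && strcmp(val, "1") == 0;
}

/* Hypothetical buffer holding a cached CPU mapping. */
struct sketch_buffer {
   void *cached_map;
};

/* Hypothetical unmap path: by default the mapping stays cached so the
 * next map is cheap; with the override it is released immediately. */
static void
sketch_buffer_unmap(struct sketch_buffer *buf, void (*kernel_unmap)(void *))
{
   assert(buf != NULL && kernel_unmap != NULL);
   if (sketch_force_kernel_unmaps() && buf->cached_map != NULL) {
      kernel_unmap(buf->cached_map);
      buf->cached_map = NULL;
   }
}
```

Keeping the decision in one helper means the unmap path is exercised in testing just by exporting the variable, without rebuilding.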

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4804>

Thomas Hellstrom [Wed, 22 Apr 2020 11:27:35 +0000 (13:27 +0200)]
gallium/pipebuffer: Use persistent maps for slabs

Instead of the ugly practice of relying on the provider to cache maps,
introduce and use persistent pipebuffer maps. Providers that can't handle
persistent maps can't use the slab manager.

The only current user is the svga drm winsys, which always maps
persistently.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4804>

Timur Kristóf [Thu, 23 Apr 2020 13:13:31 +0000 (15:13 +0200)]
radv: Use smaller esgs_itemsize for ACO.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

Timur Kristóf [Mon, 30 Mar 2020 15:23:25 +0000 (17:23 +0200)]
aco: Use new default driver locations.

The new locations leave far fewer gaps between I/O slots, so this
results in a massive reduction in the LDS usage of tessellation
shaders.

Totals (GFX10):
VGPRS: 3976792 -> 3974864 (-0.05 %)
Code Size: 260552784 -> 260532860 (-0.01 %) bytes
LDS: 48723 -> 30179 (-38.06 %) blocks
Max Waves: 1053407 -> 1053583 (0.02 %)

Totals from affected shaders (1407 shaders on GFX10):
SGPRS: 59144 -> 59216 (0.12 %)
VGPRS: 63024 -> 61096 (-3.06 %)
Code Size: 2695508 -> 2675584 (-0.74 %) bytes
LDS: 47109 -> 28565 (-39.36 %) blocks
Max Waves: 12999 -> 13175 (1.35 %)
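
The percentages above are plain relative changes, (after - before) / before * 100. A small helper (hypothetical name) to re-derive them:

```c
#include <assert.h>

/* Relative change in percent, as printed in the totals above. */
static double
percent_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}
```

For example, `percent_change(48723, 30179)` gives about -38.06. The absolute LDS, VGPR and code size deltas are identical in both tables (18544 blocks, 1928 VGPRs, 19924 bytes), so all of those savings come from the 1407 affected shaders.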

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

Timur Kristóf [Mon, 27 Apr 2020 10:22:03 +0000 (12:22 +0200)]
radv: Use new linking helper to set default driver locations.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

Timur Kristóf [Mon, 30 Mar 2020 13:58:07 +0000 (15:58 +0200)]
nir: Add new linking helper to set linked driver locations.

This commit introduces a new function nir_assign_linked_io_var_locations
which is intended to help with assigning driver locations to shaders
during linking, primarily aimed at the VS->TCS->TES->GS stages.

It ensures that the linked shaders have the same driver locations,
and it also packs these as close to each other as possible.
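
A sketch of the packing idea under two assumptions: each stage's used varying slots are available as a 64-bit mask, and both stages derive locations from the union of the producer's outputs and the consumer's inputs. Because both shaders compute from the same combined mask they agree on every location, and counting only the used slots below a given slot leaves no gaps. The function names are hypothetical, not NIR's API:

```c
#include <assert.h>
#include <stdint.h>

/* Portable popcount: number of set bits in a 64-bit mask. */
static unsigned
sketch_popcount64(uint64_t v)
{
   unsigned n = 0;
   for (; v; v &= v - 1) /* clear the lowest set bit each iteration */
      n++;
   return n;
}

/* Hypothetical dense driver location for a varying slot: the number of
 * linked slots below it in the combined producer/consumer mask. */
static unsigned
sketch_linked_driver_location(uint64_t producer_outputs,
                              uint64_t consumer_inputs, unsigned slot)
{
   assert(slot < 64);
   uint64_t linked = producer_outputs | consumer_inputs;
   assert(linked & (UINT64_C(1) << slot)); /* slot must be in use */
   return sketch_popcount64(linked & ((UINT64_C(1) << slot) - 1));
}
```

With slots 0, 5, 31 and 40 in use, they get locations 0, 1, 2 and 3, instead of leaving the unused slots in between as holes.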

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

Timur Kristóf [Thu, 23 Apr 2020 12:02:47 +0000 (14:02 +0200)]
aco: Set config->lds_size when TES or VS is running on HW ESGS.

This doesn't fix anything; it just reports the LDS size used by
merged ESGS shaders, such as vertex_geometry_gs and
tess_eval_geometry_gs.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

Timur Kristóf [Mon, 27 Apr 2020 17:51:40 +0000 (19:51 +0200)]
aco: Calculate workgroup size of legacy GS.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

Timur Kristóf [Mon, 30 Mar 2020 14:54:56 +0000 (16:54 +0200)]
aco: Remember VS/TCS output driver locations.

Instead of relying on calling shader_io_get_unique_index repeatedly,
remember which output driver location corresponds to which
varying slot.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

Timur Kristóf [Mon, 30 Mar 2020 14:11:14 +0000 (16:11 +0200)]
aco: Use context variables instead of calculating TCS inputs/outputs.

VS needs the number of TCS inputs, and TES needs the number of TCS
outputs.

It is error-prone to repeat those calculations in both instruction
selection and setup. Just set them in one place instead.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

Timur Kristóf [Mon, 30 Mar 2020 14:04:53 +0000 (16:04 +0200)]
radv: Refactor calculate_tess_lds_size and get_tcs_num_patches.

Previously these functions needed the bit mask of the TCS outputs
and patch outputs written, and derived the number of outputs
from that.

Now, they take the number of outputs and patch outputs instead.
This will allow the backend compiler to better optimize the
LDS layout.
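
To illustrate why a count helps, here is a simplified, hypothetical estimate (not radv's actual formula) of one output patch's LDS footprint, assuming one vec4 (16 bytes) per slot. If the number of slots is derived from the highest written bit, unused slots below it still take space; a count of used slots does not, provided the backend packs locations densely:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: one vec4 per output slot per output control point,
 * plus one vec4 per per-patch output slot. */
static unsigned
sketch_output_patch_bytes(unsigned num_outputs, unsigned num_patch_outputs,
                          unsigned output_vertices)
{
   assert(output_vertices > 0);
   return (num_outputs * output_vertices + num_patch_outputs) * 16u;
}

/* Slots reserved by a mask-based layout: highest written slot + 1. */
static unsigned
sketch_slots_from_mask(uint64_t mask)
{
   unsigned n = 0;
   for (; mask; mask >>= 1)
      n++;
   return n;
}

/* Slots reserved by a count-based layout: only the slots used. */
static unsigned
sketch_slots_from_count(uint64_t mask)
{
   unsigned n = 0;
   for (; mask; mask &= mask - 1)
      n++;
   return n;
}
```

With outputs written only to slots 0 and 31 and 4 output control points, the mask-based size is 32 * 4 * 16 = 2048 bytes, versus 2 * 4 * 16 = 128 bytes with a count.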

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

Rhys Perry [Mon, 27 Apr 2020 12:53:59 +0000 (13:53 +0100)]
aco: consider blocks unreachable if they are in the logical cfg

unreachable was true if the last block is unreachable in the linear cfg,
but it should also be true if it is unreachable in the logical cfg.

Fixes dEQP-VK.graphicsfuzz.for-with-ifs-and-return

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 8d8c864beba399ae4ee2267f680d1f600ad32767
    ('aco: improve check for unreachable loop continue blocks')

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4764>

Christopher James Halse Rogers [Tue, 24 Mar 2020 03:19:51 +0000 (14:19 +1100)]
egl/wayland: Fix zwp_linux_dmabuf usage

There's no guarantee that the formats advertised by wl_drm and the formats
advertised by zwp_linux_dmabuf_v1 are the same.

get_back_bo() handles this by falling back from createImageWithModifiers() to
createImage() when there's a wl_drm format but no corresponding linux_dmabuf
format, but create_wl_buffer() unconditionally tries to create a linux_dmabuf
buffer unless DRIimage has DRM_FORMAT_MOD_INVALID.

Fix this by always checking if the DRIimage modifier has been advertised
by zwp_linux_dmabuf_v1, and falling back to wl_drm if not.

If DRM_FORMAT_MOD_INVALID has been advertised then we trust the client
has allocated something appropriate and treat any modifier as matching.
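
The check described above can be sketched as follows. The function name is hypothetical, and the per-format list of advertised modifiers is assumed to have been collected from zwp_linux_dmabuf_v1 events already. `SKETCH_MOD_INVALID` mirrors the value of `DRM_FORMAT_MOD_INVALID` from the kernel's `drm_fourcc.h`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as DRM_FORMAT_MOD_INVALID in drm_fourcc.h. */
#define SKETCH_MOD_INVALID UINT64_C(0x00ffffffffffffff)

/* Hypothetical: may this image be sent as a linux_dmabuf buffer? */
static bool
sketch_can_use_linux_dmabuf(const uint64_t *advertised, size_t count,
                            uint64_t image_modifier)
{
   assert(advertised != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      /* An advertised INVALID modifier means "trust the client's
       * allocation", so any image modifier matches. */
      if (advertised[i] == SKETCH_MOD_INVALID ||
          advertised[i] == image_modifier)
         return true;
   }
   /* Modifier not advertised for this format: fall back to wl_drm. */
   return false;
}
```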

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2220
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4294>

Danylo Piliaiev [Tue, 28 Apr 2020 11:51:26 +0000 (14:51 +0300)]
iris/bufmgr: Check if iris_bo_gem_mmap failed

After the refactoring of iris_bo_map_cpu and iris_bo_map_wc, the
immediate return of NULL on failure to mmap a buffer was lost.
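
A sketch of the caller pattern whose early return was lost. It uses plain POSIX `mmap()` as a stand-in for the real GEM mapping helpers, and the struct and function names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical buffer object with a lazily created CPU mapping. */
struct sketch_bo {
   void *map;
};

/* Stand-in low-level helper: mmap() reports failure as MAP_FAILED,
 * which is translated to NULL so callers have one value to check. */
static void *
sketch_gem_mmap(int fd, size_t size)
{
   void *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
   return map == MAP_FAILED ? NULL : map;
}

/* On failure, return NULL right away instead of caching or handing
 * out an invalid pointer. */
static void *
sketch_bo_map(struct sketch_bo *bo, int fd, size_t size)
{
   assert(size > 0);
   if (bo->map == NULL) {
      void *map = sketch_gem_mmap(fd, size);
      if (map == NULL)
         return NULL;
      bo->map = map;
   }
   return bo->map;
}
```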

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2855
Fixes: 5bc3f52dd8c2b5acaae959ccae2e1fb7c769bb22
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4786>

Tapani Pälli [Fri, 24 Apr 2020 12:28:41 +0000 (15:28 +0300)]
anv: remove assert from GetImageMemoryRequirements[2]

This assert is actually correct, but due to how Android hardware buffer
support is implemented we should remove it; otherwise debug builds of
Mesa hit the assert in Android CTS tests.

The test creates a VkImage with a non-external format and sets up
VkExternalMemoryImageCreateInfo to indicate that the image *may* be used
with an Android hardware buffer handle. The test then queries the image
memory requirements. The problem is that we set up all images that
support Android as having an external format, and thus hit the assert
because the size has not been set yet. This is not a problem in practice,
since Android binds the AHW memory to the image later on.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2807
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4762>

Samuel Pitoiset [Wed, 29 Apr 2020 07:53:48 +0000 (09:53 +0200)]
gitlab-ci: add a list of expected failures for FIJI with ACO

Timur has this chip now. The depth/stencil resolve failures are
somewhat unexpected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4805>

Samuel Pitoiset [Wed, 15 Apr 2020 09:39:28 +0000 (11:39 +0200)]
radv: advertise VK_EXT_robustness2

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

Samuel Pitoiset [Wed, 15 Apr 2020 09:48:13 +0000 (11:48 +0200)]
radv: handle NULL vertex bindings

With VK_EXT_robustness2, an element of pBuffers can be NULL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

Samuel Pitoiset [Thu, 23 Apr 2020 14:02:59 +0000 (16:02 +0200)]
radv: handle NULL descriptors

All fields must be zero, otherwise the HW hangs.
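
A sketch of writing such a descriptor, assuming an 8-dword image descriptor and hypothetical names. The point is only that a NULL view (allowed by VK_EXT_robustness2's nullDescriptor feature) yields an all-zero descriptor rather than stale or partially initialised fields:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative descriptor size in dwords (an assumption). */
#define SKETCH_DESC_DWORDS 8

static void
sketch_write_image_descriptor(uint32_t *dst, const uint32_t *view_desc)
{
   assert(dst != NULL);
   if (view_desc == NULL) {
      /* NULL descriptor: every field must be zero. */
      memset(dst, 0, SKETCH_DESC_DWORDS * sizeof(*dst));
      return;
   }
   memcpy(dst, view_desc, SKETCH_DESC_DWORDS * sizeof(*dst));
}
```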

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

Samuel Pitoiset [Mon, 27 Apr 2020 15:27:22 +0000 (17:27 +0200)]
aco: fix adjusting the sample index with FMASK if value is negative

The SPIR-V spec doesn't say explicitly that the sample index
must be an unsigned integer.

This fixes crashes with some new VK_EXT_robustness2 tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

Samuel Pitoiset [Mon, 27 Apr 2020 15:02:18 +0000 (17:02 +0200)]
aco: fix nir_texop_texture_samples with NULL descriptors

With VK_EXT_robustness2, descriptors can be NULL and the number of
samples returned by nir_texop_texture_samples should be 0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

Samuel Pitoiset [Mon, 27 Apr 2020 11:04:40 +0000 (13:04 +0200)]
ac/llvm: fix nir_texop_texture_samples with NULL descriptors

With VK_EXT_robustness2, descriptors can be NULL and the number of
samples returned by nir_texop_texture_samples should be 0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

Caio Marcelo de Oliveira Filho [Fri, 17 Jan 2020 22:17:58 +0000 (14:17 -0800)]
intel/fs: Only stall after sending all memory fence messages

In Gen11+, when emitting a fence for both L3 and SLM, the generated
code would look like

    SEND, MOV (for stall), SEND, MOV (for stall)

This commit changes that so that both SENDs are emitted before the
stall MOVs.  This is similar to the approach used in Ivy Bridge for the
render fence.
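
A sketch of the two orderings, using a hypothetical emitter that records opcode names instead of real instructions. Presumably, issuing both SENDs first lets the two fences be in flight at the same time before the stalls wait on them:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical emitter: writes the opcode order for num_fences fence
 * messages into out[] (which needs room for 2 * num_fences entries)
 * and returns the number of entries written. */
static size_t
sketch_emit_fences(unsigned num_fences, bool stall_after_all,
                   const char **out)
{
   assert(out != NULL);
   size_t n = 0;
   for (unsigned i = 0; i < num_fences; i++) {
      out[n++] = "SEND";
      if (!stall_after_all)
         out[n++] = "MOV"; /* old: stall right after each SEND */
   }
   if (stall_after_all) {
      for (unsigned i = 0; i < num_fences; i++)
         out[n++] = "MOV"; /* new: all stalls after all SENDs */
   }
   return n;
}
```

For the L3 + SLM case, `sketch_emit_fences(2, false, seq)` produces SEND, MOV, SEND, MOV, and `sketch_emit_fences(2, true, seq)` produces SEND, SEND, MOV, MOV.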

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>

Caio Marcelo de Oliveira Filho [Fri, 17 Jan 2020 23:07:44 +0000 (15:07 -0800)]
intel/fs,vec4: Pull stall logic for memory fences up into the IR

Instead of emitting the stall MOV "inside" the
SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when
creating the IR.

For Ivy Bridge, every (data cache) fence is accompanied by a render
cache fence.  That is now explicit in the IR: two
SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs).

Because Begin and End interlock intrinsics are effectively memory
barriers, move their handling alongside the other memory barrier
intrinsics.  The SHADER_OPCODE_INTERLOCK is still used to distinguish
whether we are going to use a SENDC (for Begin) or regular SEND (for End).

This change prepares for emitting both SENDs in Gen11+ before
stalling on them.

Shader-db results for IVB (i965):

    total instructions in shared programs: 11971190 -> 11971200 (<.01%)
    instructions in affected programs: 11482 -> 11492 (0.09%)
    helped: 0
    HURT: 8
    HURT stats (abs)   min: 1 max: 3 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10%
    95% mean confidence interval for instructions value: 0.66 1.84
    95% mean confidence interval for instructions %-change: 0.01% 0.27%
    Instructions are HURT.

  Unlike the previous code, which used the `mov g1 g2` trick to force
  both `g1` and `g2` to stall, the scheduling fence will generate `mov
  null g1` and `mov null g2`.  During review it was decided that keeping
  the special codepath was not worth it for the small effect it has.

Shader-db results for HSW (i965), BDW and SKL show no change in
instruction count, but do report changes in cycle count.  SKL results
are shown below:

    total cycles in shared programs: 341738444 -> 341710570 (<.01%)
    cycles in affected programs: 7240002 -> 7212128 (-0.38%)
    helped: 46
    HURT: 5
    helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154
    helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95%
    HURT stats (abs)   min: 2 max: 1768 x̄: 646.40 x̃: 362
    HURT stats (rel)   min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08%
    95% mean confidence interval for cycles value: -777.71 -315.38
    95% mean confidence interval for cycles %-change: -1.42% -0.83%
    Cycles are helped.

  This seems to be the effect of allocating two registers separately
  instead of a single one with size 2, which causes different register
  allocation, affecting the cycle estimates.

ICL also shows no change in instruction count, but reports negative
changes in cycles:

    total cycles in shared programs: 352665369 -> 352707484 (0.01%)
    cycles in affected programs: 9608288 -> 9650403 (0.44%)
    helped: 4
    HURT: 104
    helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101
    helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49%
    HURT stats (abs)   min: 2 max: 2016 x̄: 408.36 x̃: 48
    HURT stats (rel)   min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45%
    95% mean confidence interval for cycles value: 256.67 523.24
    95% mean confidence interval for cycles %-change: 0.63% 1.03%
    Cycles are HURT.

  AFAICT this is the result of the case above.

Shader-db results for TGL show cycle results similar to ICL, but the
instruction count is also affected:

    total instructions in shared programs: 17690586 -> 17690597 (<.01%)
    instructions in affected programs: 64617 -> 64628 (0.02%)
    helped: 55
    HURT: 32
    helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3
    helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74%
    HURT stats (abs)   min: 1 max: 65 x̄: 7.44 x̃: 2
    HURT stats (rel)   min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69%
    95% mean confidence interval for instructions value: -2.03 2.28
    95% mean confidence interval for instructions %-change: -0.41% 0.15%
    Inconclusive result (value mean confidence interval includes 0).

  Now that more is done in the IR, more dependencies are visible and
  more SWSB annotations are emitted.  Mixed with different register
  allocation decisions like above, some shaders will see more `sync
  nops` while others are able to avoid them.

  Most of the new `sync nops` are also redundant and could be dropped,
  which will be fixed in a separate change.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>

Caio Marcelo de Oliveira Filho [Fri, 17 Jan 2020 22:52:13 +0000 (14:52 -0800)]
intel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers

It will generate the MOVs (or SYNC_NOP in Gen12+) needed for the stall.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>

Bas Nieuwenhuizen [Tue, 28 Apr 2020 15:04:25 +0000 (17:04 +0200)]
radv: Expose 4G element texel buffers.

The old value seems to have been copied from anv.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4787>

Kenneth Graunke [Tue, 28 Apr 2020 21:04:58 +0000 (14:04 -0700)]
iris: Fix downcast of bound_vertex_buffers from uint64_t to int

This is the wrong data type: the original field, and the values we're
adding in, are both 64-bit unsigned.  Keep the original data type.
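
An illustration of the bug class with hypothetical helpers. It narrows to `uint32_t` so that the truncation is well defined. Converting an out-of-range value to a signed `int`, as the original code did, is implementation-defined in C, which is one more reason to keep the 64-bit unsigned type:

```c
#include <assert.h>
#include <stdint.h>

/* Buggy pattern: the accumulated 64-bit mask is stored in a 32-bit
 * type, so any bits 32..63 are silently dropped. */
static uint32_t
sketch_accumulate_narrowed(uint64_t bound, uint64_t added)
{
   return (uint32_t)(bound | added);
}

/* Fixed pattern: keep the field's original 64-bit unsigned type. */
static uint64_t
sketch_accumulate(uint64_t bound, uint64_t added)
{
   return bound | added;
}
```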

Thanks to Dave Airlie for finding this while reading the code.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4802>

Francisco Jerez [Fri, 3 Apr 2020 00:42:21 +0000 (17:42 -0700)]
intel/ir: Remove scheduling-based cycle count estimates.

The cycle count estimation logic in the scheduler is now redundant
with the shader performance modeling pass, and the estimates can be
consolidated into the brw::performance analysis result object instead
of being part of the CFG.  This guarantees that the estimates cannot
be accessed without first calling the performance_analysis::require()
method, which makes sure that the right analysis pass is executed at
the right time if we don't already have up-to-date cached results.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Fri, 3 Apr 2020 00:42:57 +0000 (17:42 -0700)]
intel/ir: Pass block cycle count information explicitly to disassembler.

So we can eventually remove the cycle count estimates from the CFG
data structure and consolidate performance information in the
brw::performance object.

It would be cleaner to pass the brw::performance object directly to
the disassembler but that isn't straightforward since the disassembler
is built as a plain C file unlike the rest of the compiler back-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Thu, 26 Mar 2020 23:27:32 +0000 (16:27 -0700)]
intel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats.

These should be more accurate than the current cycle counts, since
among other things they consider the effect of post-scheduling passes
like the software scoreboard on TGL.  In addition, this will enable us to
clean up some of the now redundant cycle-count estimation
functionality in the instruction scheduler.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Wed, 22 Apr 2020 20:29:34 +0000 (13:29 -0700)]
intel/fs: Add INTEL_DEBUG=no32 debugging flag.

This is useful in order to identify codegen issues caused by SIMD32.
It doesn't currently have any effect on compute shaders since SIMD32
dispatch is only enabled for CS when it's strictly necessary to do so
in order to support the workgroup size requested for the shader.
That might change in the future, though, when we hook up the SIMD32
heuristic to CS compilation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Fri, 3 Apr 2020 00:30:06 +0000 (17:30 -0700)]
intel/fs: Implement performance analysis-based SIMD32 heuristic for fragment shaders.

The heuristic enables the SIMD32 fragment shader based on whether the
IR performance modeling pass predicts it to have greater throughput
than the SIMD16 and SIMD8 variants of the same shader.  It would be
straightforward to do the same thing in order to control whether
SIMD16 dispatch is enabled, but it's pending additional performance
evaluation.

The INTEL_DEBUG=do32 option is left around in order to force the
SIMD32 shader to be used regardless of the result of the heuristic,
since it's useful as a debugging aid e.g. in order to identify
SIMD32-specific codegen issues which may be masked by the SIMD32
heuristic, or cases where the heuristic is incorrectly disabling
SIMD32 shaders that offer a performance advantage.

Currently this is only enabled on Gen6+, since SIMD32 codegen support
is incomplete on earlier platforms.
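
A sketch of the selection rule, assuming the performance model yields one throughput number per compiled variant (higher is better). The struct, the `force_do32` parameter standing in for INTEL_DEBUG=do32, and letting ties go to the narrower variant are assumptions, not the actual brw_compile_fs() logic:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-variant result of a static performance model. */
struct sketch_perf {
   bool compiled;     /* variant exists and was not disabled */
   double throughput; /* predicted throughput, higher is better */
};

/* Use SIMD32 only if it is predicted to beat every other compiled
 * variant, unless the debug override forces it. */
static bool
sketch_use_simd32(struct sketch_perf simd8, struct sketch_perf simd16,
                  struct sketch_perf simd32, bool force_do32)
{
   if (!simd32.compiled)
      return false;
   if (force_do32)
      return true;
   assert(simd32.throughput >= 0.0);
   /* Ties go to the narrower variant. */
   if (simd8.compiled && simd8.throughput >= simd32.throughput)
      return false;
   if (simd16.compiled && simd16.throughput >= simd32.throughput)
      return false;
   return true;
}
```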

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Fri, 3 Apr 2020 00:16:45 +0000 (17:16 -0700)]
intel/fs: Heap-allocate fs_visitors in brw_compile_fs().

This makes brw_compile_fs() look a bit more similar to
brw_compile_cs().  It saves us three v*_shader_stats local variables,
and will save us additional triplicated declarations as we start
tracking IR performance analysis results.

The triplicated cfg pointers are left around because they're set to
NULL to mark specific dispatch modes as disabled (e.g. in order to
enforce hardware restrictions).  Doing the same thing with the visitor
pointers would cause memory leaks.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Thu, 26 Mar 2020 21:59:02 +0000 (14:59 -0700)]
intel/ir: Import shader performance analysis pass.

This introduces an analysis pass intended to estimate several
performance statistics of the shader, including cycle count latency
and throughput values, based on static modeling.  Its instruction
performance information is more comprehensive than that of the current
scheduling pass for all platforms from Gen4 to Gen11, and it works on
both the FS and VEC4 back-ends.

The most immediate purpose of this pass is to implement a heuristic
meant to determine whether using SIMD32 dispatch for a fragment shader
can be expected to help more than it hurts.  In addition this will
allow the effect of passes run after scheduling (e.g. the TGL software
scoreboard pass and the VEC4 dependency control pass) to be visible in
shader-db statistics.

But that isn't the end of the story.  Other potential applications of
this pass (not part of this MR) that I've been playing around with are:

 - Implement a similar SIMD16 heuristic allowing the identification of
   inefficient SIMD16 fragment shaders.

 - Implement similar SIMD16 and SIMD32 heuristics for the compute
   shader stage -- Currently compute shader builds always use the
   SIMD16 shader if available and never use the SIMD32 shader unless
   strictly necessary, which is suboptimal under certain conditions.

 - Hook it up to the instruction scheduler in order to improve the
   accuracy of its timing information.

 - Use it as a heuristic to drive the selection of scheduling
   modes (Matt was experimenting with that).

 - Plug it into the TGL software scoreboard pass in order to implement a
   more effective SBID token allocation algorithm, since in general
   the optimal token allocation depends on the timings of all
   instructions in the program.

 - Use its bottleneck detection functionality in order to implement a
   heuristic computing a more optimal bound for the number of fragment
   shader threads executed in parallel (by adjusting the
   MaximumNumberofThreadsPerPSD control of 3DSTATE_PS).

As a follow-up I'm planning to submit updated timing information for
Gen12 platforms.  Everything else required to support Gen12, like SWSB
handling, is already included in this patch, but there were some IP
concerns regarding the TGL timing parameters since they cannot
currently be obtained with the documentation and hardware which is
publicly available.  The timing parameters for any previous Gen7-11
platforms can be obtained by anyone by sampling the timestamp register
using e.g. shader_time, though I have some more convenient
instrumentation coming up.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Sat, 22 Feb 2020 09:17:21 +0000 (01:17 -0800)]
intel/vec4: Fix constness of vec4_instruction::reads_flag() and ::writes_flag().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Thu, 2 Apr 2020 23:20:34 +0000 (16:20 -0700)]
intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function.

This will be re-usable by the IR performance analysis pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Thu, 2 Apr 2020 23:18:12 +0000 (16:18 -0700)]
intel/fs: Fix constness of argument of fs_instruction_scheduler::is_compressed().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Fri, 3 Apr 2020 20:04:43 +0000 (13:04 -0700)]
intel/fs: Rename half() helpers to quarter(), allow index up to 3.

Makes more sense considering SIMD32.  Relaxing the assertion in
brw_ir_fs.h will be required in order to avoid assertion failures on
SNB with SIMD32 fragment shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Thu, 26 Mar 2020 22:01:13 +0000 (15:01 -0700)]
intel/ir: Add missing initialization of backend_reg::offset during construction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Tue, 7 Apr 2020 23:39:59 +0000 (16:39 -0700)]
intel/fs/gen12: Fix Render Target Read header setup for new thread payload layout.

In Gen12 the Poly 0 Info DWORD containing the Viewport Index and
Render Target Index fields was moved from r0.0 to r1.1 in order to
make room for dual-polygon dispatch.  The render target message format
was updated to expect that information in the same location, so we
didn't need to make any changes for framebuffer fetch to work with
SIMD8 and SIMD16 dispatch.  Unfortunately that won't work with SIMD32,
since the render target message header is assembled from r0 and r2
instead of r1, and the r2 thread payload wasn't updated with an
additional copy of the same information.  We need to fix things up
manually instead.  This avoids a handful of
EXT_shader_framebuffer_fetch regressions in combination with SIMD32
fragment shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Wed, 8 Apr 2020 00:31:07 +0000 (17:31 -0700)]
intel/fs/gen12: Work around dual-source blending hangs in combination with SIMD32.

This applies the same work-around I committed as b84fa0b31e67
"intel/fs/gen11: Work around dual-source blending hangs in combination
with SIMD32." to Gen12, which seems to suffer from the same empirically
found hardware bug.  The failure mode seems to be identical.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Francisco Jerez [Wed, 8 Apr 2020 00:22:10 +0000 (17:22 -0700)]
intel/fs/gen12: Fix hangs with per-sample SIMD32 fragment shader dispatch.

The Gen12 docs are rather contradictory regarding the dispatch
configurations supported by the fragment shader.  The same table
present in previous generations seems to imply that only one dispatch
mode can be enabled when doing per-sample shading, but a restriction
documented in the 3DSTATE_PS_BODY page implies the opposite: that
SIMD32 can only be used in combination with some other dispatch mode.

The latter seems to match the behavior of real hardware as I could
tell from my testing: A bunch of multisample test-cases that do
per-sample shading hang if we only provide a SIMD32 shader.
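
The restriction as described can be expressed as a small validity check. This is a hypothetical helper that only encodes the per-sample observation above, not the full set of 3DSTATE_PS constraints:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check of the enabled fragment shader dispatch modes:
 * with per-sample shading, SIMD32 must not be the only enabled mode. */
static bool
sketch_gen12_dispatch_ok(bool persample, bool simd8, bool simd16,
                         bool simd32)
{
   /* At least one dispatch mode has to be enabled. */
   if (!simd8 && !simd16 && !simd32)
      return false;
   /* SIMD32-only per-sample dispatch was observed to hang. */
   if (persample && simd32 && !simd8 && !simd16)
      return false;
   return true;
}
```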

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

Dylan Baker [Wed, 22 Apr 2020 06:32:45 +0000 (23:32 -0700)]
mesa: Follow OpenGL conversion rules for values that exceed storage size

Section 2.2.2 (Data Conversions For State Query Commands) of the
OpenGL 4.5 spec says:

  Following these steps, if a value is so large in magnitude that
  it cannot be represented by the returned data type, then the
  nearest value representable using that type is returned.

The current code doesn't do the correct thing, because it truncates a
long (potentially a 64-bit value) to an int.
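
A minimal sketch of the clamping rule quoted above, for the 64-bit to 32-bit case (the function name is hypothetical; GLint is 32 bits):

```c
#include <assert.h>
#include <stdint.h>

/* Return the nearest 32-bit value instead of truncating. */
static int32_t
sketch_clamp_int64_to_int32(int64_t v)
{
   if (v > INT32_MAX)
      return INT32_MAX;
   if (v < INT32_MIN)
      return INT32_MIN;
   return (int32_t)v;
}
```

By contrast, truncating 2^32 to 32 bits yields 0, far from the nearest representable value.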

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2828
Fixes: 53c36dfcfe3eb3749a53267f054870280afb0d71
       ("replace IROUND with util functions")

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4673>

Alyssa Rosenzweig [Tue, 28 Apr 2020 17:57:31 +0000 (13:57 -0400)]
pan/bit: Add BITWISE test

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>

Alyssa Rosenzweig [Tue, 28 Apr 2020 17:49:24 +0000 (13:49 -0400)]
pan/bit: Interpret BI_BITWISE

No shifting yet.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>

Alyssa Rosenzweig [Tue, 28 Apr 2020 18:36:17 +0000 (14:36 -0400)]
pan/bi: Handle iand/ior/ixor in NIR->BIR

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>

Alyssa Rosenzweig [Tue, 28 Apr 2020 18:19:34 +0000 (14:19 -0400)]
pan/bi: Pack BI_BITWISE

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>

Alyssa Rosenzweig [Tue, 28 Apr 2020 17:48:37 +0000 (13:48 -0400)]
pan/bi: Add bitwise modifiers

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>

Rob Clark [Fri, 24 Apr 2020 22:15:09 +0000 (15:15 -0700)]
freedreno/a6xx: invalidate tex state cache entries on rebind

When a resource's backing bo changes, its seqno is incremented, which
results in a new tex state cache key, with nothing to clean up the old
tex state until the sampler view/state is destroyed.  But in some games
that may never happen, or at least not before we run out of memory.
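A rough sketch of why stale entries pile up and how invalidating on rebind bounds the cache, assuming the cache key is (resource id, seqno). Everything here (`struct tex_cache`, `tex_cache_get()`, `tex_cache_invalidate()`, the fixed 16-entry array) is hypothetical and only stands in for the driver's real cache. Invalidation is keyed on the *old* seqno, so it has to happen before the bo changes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TEX_CACHE_SIZE 16

/* Hypothetical entry: the key combines resource id and seqno, so a new
 * backing bo (new seqno) always produces a new key. */
struct tex_cache_entry {
   bool valid;
   uint32_t rsc_id;
   uint32_t seqno;
   uint32_t state; /* placeholder for the packed texture state */
};

struct tex_cache {
   struct tex_cache_entry entries[TEX_CACHE_SIZE];
};

/* Look up (rsc_id, seqno), creating an entry on a miss.  Returns NULL
 * when every slot is taken, i.e. stale entries have exhausted the cache. */
static struct tex_cache_entry *
tex_cache_get(struct tex_cache *c, uint32_t rsc_id, uint32_t seqno)
{
   struct tex_cache_entry *free_slot = NULL;
   for (int i = 0; i < TEX_CACHE_SIZE; i++) {
      struct tex_cache_entry *e = &c->entries[i];
      if (e->valid && e->rsc_id == rsc_id && e->seqno == seqno)
         return e;
      if (!e->valid && !free_slot)
         free_slot = e;
   }
   if (!free_slot)
      return NULL;
   free_slot->valid = true;
   free_slot->rsc_id = rsc_id;
   free_slot->seqno = seqno;
   free_slot->state = rsc_id ^ seqno;
   return free_slot;
}

/* Called on rebind, while the old seqno is still known, so entries keyed
 * on it are dropped instead of lingering until the view is destroyed. */
static void
tex_cache_invalidate(struct tex_cache *c, uint32_t rsc_id, uint32_t old_seqno)
{
   for (int i = 0; i < TEX_CACHE_SIZE; i++) {
      struct tex_cache_entry *e = &c->entries[i];
      if (e->valid && e->rsc_id == rsc_id && e->seqno == old_seqno)
         e->valid = false;
   }
}
```

Without invalidation, 16 reallocations of one resource fill the cache. With invalidation on each rebind, the same resource keeps reusing a single slot.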

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2830
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>

Rob Clark [Fri, 24 Apr 2020 22:10:49 +0000 (15:10 -0700)]
freedreno: rebind_resource() *before* bo changes

This will matter in the next patch, where we need the original
rsc->seqno.

It means slight shuffling of where we call rebind_resource() in the
`fd_try_shadow_resource()` path.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>

Rob Clark [Fri, 24 Apr 2020 22:00:20 +0000 (15:00 -0700)]
freedreno: rebind resource in all contexts

If the resource is rebound, we need to invalidate in all contexts.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>

Rob Clark [Fri, 24 Apr 2020 21:45:04 +0000 (14:45 -0700)]
freedreno: optimize rebind_resource()

Track how resources are used, i.e. which state they may potentially dirty
if the backing bo is changed/reallocated, to optimize rebind_resource().

This will be more important in a later patch when we hook up eviction of
entries in the a6xx tex state cache.
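A minimal sketch of the idea, assuming a usage bitmask that is recorded whenever a resource is bound and then mapped to dirty flags on rebind. The enum values, `struct resource` and `rebind_dirty_mask()` are hypothetical and are not freedreno's actual names.

```c
#include <stdint.h>

/* Hypothetical bits recording where a resource has been bound. */
enum rsc_usage {
   RSC_USAGE_VBO   = 1 << 0,
   RSC_USAGE_INDEX = 1 << 1,
   RSC_USAGE_TEX   = 1 << 2,
   RSC_USAGE_CONST = 1 << 3,
};

/* Hypothetical dirty flags for context state that may need re-emitting. */
enum ctx_dirty {
   DIRTY_TEX      = 1 << 0,
   DIRTY_CONST    = 1 << 1,
   DIRTY_VTXBUF   = 1 << 2,
   DIRTY_INDEXBUF = 1 << 3,
};

struct resource {
   uint32_t usage; /* OR of rsc_usage bits, accumulated at bind time */
};

/* Only dirty the state this resource could actually appear in, instead
 * of dirtying every binding point on each rebind. */
static uint32_t
rebind_dirty_mask(const struct resource *rsc)
{
   uint32_t dirty = 0;
   if (rsc->usage & RSC_USAGE_VBO)
      dirty |= DIRTY_VTXBUF;
   if (rsc->usage & RSC_USAGE_INDEX)
      dirty |= DIRTY_INDEXBUF;
   if (rsc->usage & RSC_USAGE_TEX)
      dirty |= DIRTY_TEX;
   if (rsc->usage & RSC_USAGE_CONST)
      dirty |= DIRTY_CONST;
   return dirty;
}
```

A resource that was never bound as a texture never causes texture state (or, later, tex state cache entries) to be revisited.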

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>

Rob Clark [Fri, 24 Apr 2020 20:54:43 +0000 (13:54 -0700)]
freedreno: mark more state dirty when rebinding resources

Plus a bonus typo fix.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>

Rob Clark [Fri, 24 Apr 2020 20:56:09 +0000 (13:56 -0700)]
freedreno: don't realloc idle bo's

The `DISCARD_WHOLE_RESOURCE` flag is just a hint.  And `rebind_resource()` is
a bunch of faffing about (and is going to get worse in a later patch), so
let's not bother reallocating when the bo is already idle.
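A sketch of the decision, assuming a buffer object that only tracks whether the GPU is still using it. `MAP_DISCARD_WHOLE`, `struct bo` and `map_maybe_realloc()` are hypothetical stand-ins, not the gallium or freedreno definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for gallium's discard-whole-resource map flag. */
#define MAP_DISCARD_WHOLE (1u << 0)

/* Hypothetical buffer object that only tracks GPU busyness. */
struct bo {
   bool busy;    /* GPU may still be reading/writing it */
   unsigned gen; /* bumped each time the backing storage is replaced */
};

/* Returns true if the backing storage was replaced, which is the only
 * case that needs the costly rebind. */
static bool
map_maybe_realloc(struct bo *bo, unsigned flags)
{
   if (!(flags & MAP_DISCARD_WHOLE))
      return false;
   /* Discard is only a hint: an idle bo can simply be reused in place. */
   if (!bo->busy)
      return false;
   bo->gen++;        /* stand-in for allocating a fresh bo */
   bo->busy = false; /* the new storage has no pending GPU work */
   return true;
}
```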

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>

Rob Clark [Wed, 22 Apr 2020 23:22:22 +0000 (16:22 -0700)]
freedreno: small whitespace fix

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4744>

Jan Zielinski [Tue, 28 Apr 2020 19:05:46 +0000 (21:05 +0200)]
gallium/swr: Fix crashes and failures in vertex fetch

This commit fixes two problems:
- In some cases SWR does not correctly report to Gallium
  which formats are supported.
- In some situations, incorrect LLVM instructions are used in vertex fetch.

Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4788>

Rob Clark [Wed, 15 Apr 2020 20:36:21 +0000 (13:36 -0700)]
freedreno/log-parser: support to read gzip'd logs

~50MB gzip'd log files are nicer than ~300MB uncompressed ones.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>

Rob Clark [Sat, 25 Apr 2020 19:16:35 +0000 (12:16 -0700)]
freedreno/a6xx: pre-calculate expected vsc stream sizes

We should only rely on overflow detection for indirect draws, where we
have no other option.

This doesn't use quite the worst-possible-case sizes, which in practice
seem to be ~20x larger than what is required, but instead uses roughly
half of that.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>

Rob Clark [Sat, 25 Apr 2020 17:45:31 +0000 (10:45 -0700)]
freedreno: add helper to estimate # of bins per pipe

For vsc size calculation, we need to know the # of bins per pipe, or at
least the worst-case # of bins, assuming we don't eliminate an unused
depth/stencil buffer.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>

Rob Clark [Sat, 25 Apr 2020 16:51:09 +0000 (09:51 -0700)]
freedreno/a6xx+tu: rename VSC_DATA/VSC_DATA2

These are the draw-stream and primitive-stream, so let's give them more
descriptive names.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4750>

Rhys Perry [Mon, 27 Apr 2020 20:17:56 +0000 (21:17 +0100)]
aco: fix vgpr nir_op_vecn with sgpr operands

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 20:16:15 +0000 (21:16 +0100)]
aco: improve clamped integer addition disassembly workaround

Make it work with 16-bit and GFX10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 19:57:28 +0000 (20:57 +0100)]
aco: add various GFX10 int16 opcodes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 19:52:20 +0000 (20:52 +0100)]
aco: fix sub-dword overwrite check in RA validator

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 19:51:56 +0000 (20:51 +0100)]
aco: fix sub-dword out-of-bounds check in RA validator

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 19:42:24 +0000 (20:42 +0100)]
aco: add missing adjust_max_used_regs()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 19:28:41 +0000 (20:28 +0100)]
aco: improve RA for uneven p_split_vector

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 19:24:24 +0000 (20:24 +0100)]
aco: don't recurse in sub-dword get_reg_simple()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 19:13:53 +0000 (20:13 +0100)]
aco: split self-intersecting copies instead of swapping

Example situation:
v1 = {v0.hi, v1.lo}
v0.hi = v1.hi

The 4-byte copy's definition is completely used, but swapping it makes no
sense. We have to split it to generate correct code:
swap(v0.hi, v1.lo)
swap(v0.hi, v1.hi)

Found in dEQP-VK.spirv_assembly.type.vec3.i16.constant_composite_vert

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 17:15:23 +0000 (18:15 +0100)]
aco: fix neighboring register check in get_reg_simple()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 16:54:49 +0000 (17:54 +0100)]
aco: check alignment of non-subdword registers in get_reg_specified()

When splitting a v6b vector into v1 and v2b components, we should ensure
the v1 definition doesn't start at the upper half.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Rhys Perry [Mon, 27 Apr 2020 16:49:22 +0000 (17:49 +0100)]
aco: make RegisterFile::block() take a regclass

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4772>

Jason Ekstrand [Mon, 13 Jan 2020 16:14:01 +0000 (10:14 -0600)]
anv: Claim VK_EXT_robustness2 support

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>

Jason Ekstrand [Fri, 7 Feb 2020 03:38:40 +0000 (21:38 -0600)]
anv: Handle null vertex buffer bindings

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>

Jason Ekstrand [Fri, 7 Feb 2020 03:18:59 +0000 (21:18 -0600)]
anv: Handle NULL descriptors

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>

Jason Ekstrand [Mon, 27 Apr 2020 16:33:44 +0000 (11:33 -0500)]
nir/combine_stores: Handle volatile

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>

Jason Ekstrand [Mon, 27 Apr 2020 06:54:40 +0000 (01:54 -0500)]
nir/dead_write_vars: Handle volatile

We can't remove volatile writes and we can't combine them with other
volatile writes, so all we can do is clear the unused bits.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>

Jason Ekstrand [Mon, 27 Apr 2020 15:38:31 +0000 (10:38 -0500)]
nir/copy_prop_vars: Report progress when deleting self-copies

Fixes: 62332d139c8f6 "nir: Add a local variable-based copy prop..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>

Jason Ekstrand [Tue, 14 Apr 2020 16:01:15 +0000 (11:01 -0500)]
nir/copy_prop_vars: Handle volatile better

For deref_store, we can still delete invalid stores that write to
statically OOB data.  In all cases, we need to make sure that we kill
aliases of destinations even if the access is volatile.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>

Jason Ekstrand [Mon, 27 Apr 2020 06:24:49 +0000 (01:24 -0500)]
vulkan: Update Vulkan XML and headers to 1.2.139

Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4767>

Jason Ekstrand [Tue, 31 Mar 2020 22:46:37 +0000 (17:46 -0500)]
anv: Allow all clear colors for texturing on Gen11+

Starting with Gen11, we have two indirect clear colors: an unconverted
float/int version which is used for rendering and a converted pixel
value version which is used for texturing.  Because the one used for
texturing is stored as a single pixel of that color, it works no matter
what format is being used.  Because it's a simple HW indirect and
doesn't involve copying surface states around, we can use it in the
sampler without having to worry about surface states having out-of-date
clear values.  The result is that we can now allow any clear color when
texturing.

This cuts the number of resolves in a RenderDoc trace of Dota2 by 95%
on Gen11+ (you read that right) and improves perf by 3.5%.  It improves
perf in a handful of other workloads by < 1%.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Thu, 26 Mar 2020 15:32:03 +0000 (10:32 -0500)]
anv: Use anv_layout_to_aux_usage for color during render passes

Previously, we tried to treat color image layouts as a special case
during render passes.  This is largely an artifact of history as our
initial understanding of Vulkan placed much more emphasis on render
passes than our current understanding.  The only real practical use for
magic layouts in the middle of a render pass, as far as I can tell, is
to allow more clear colors to get passed through to input attachments.
However, most apps aren't very creative with their clear colors and very
few of them (none coming from DXVK) actually use render passes in any
interesting way.  Therefore, the risk of being able to pass fewer clear
colors through to input attachments should be minimal.

There are, however, three very big advantages to this change:

 1. We are now consistent in our handling of aux usage and layouts
    between color and depth/stencil.

 2. We are now actually following the layout guidelines from the app and
    aren't nearly as likely to see strange behavior due to us overriding
    the image layouts manually.

 3. It's more obviously correct.  While I think our old render pass code
    was probably correct, it was full of corner cases and it's very
    possible that it was behaving badly in weird ways.  This follows the
    Vulkan API much more blindly and, as such, is more likely to be
    correct and behave the same as other implementations.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Wed, 25 Mar 2020 21:38:28 +0000 (16:38 -0500)]
anv: Split color_attachment_compute_aux_usage in two

In particular, we split out an anv_can_fast_clear_color_view helper
which only cares about fast-clear and not aux_usage itself.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Wed, 25 Mar 2020 20:31:55 +0000 (15:31 -0500)]
anv: Rework depth_stencil_attachment_compute_aux_usage

Instead of making it a function that pretends to choose aux usage (which
isn't what it does at all), make it a function which returns whether or
not we want to do a fast clear.  This is far more accurate to the
purpose of the function.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Wed, 25 Mar 2020 20:29:09 +0000 (15:29 -0500)]
anv: Refactor cmd_buffer_setup_attachments

This commit just renames some things so that we use names for temporary
variables which are more consistent with other places in the code-base.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Wed, 25 Mar 2020 20:02:15 +0000 (15:02 -0500)]
anv: Stop allowing non-zero clear colors in input attachments

Previously, we bent over backwards to allow non-zero clear colors in input
attachments whenever we could.  However, very few apps use input
attachments and very few use non-zero clear colors.  Getting rid of
support for non-zero clear colors in input attachments will allow us to
treat them identically to textures, which should help us simplify things
a good bit.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Wed, 25 Mar 2020 19:54:19 +0000 (14:54 -0500)]
anv: Disallow fast-clears which require format-reinterpretation

In order to actually hit this case you have to be using a very odd
color/view combination.  The common cases of clear-to-zero and 0/1 clear
colors with an sRGB view don't require any re-interpretation.  This is
probably better than always resolving whenever we have a format mismatch,
as we do today, because that hits the sRGB case every time.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Wed, 25 Mar 2020 19:35:53 +0000 (14:35 -0500)]
intel: Move swizzle_color_value from blorp to ISL

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Wed, 25 Mar 2020 00:24:54 +0000 (19:24 -0500)]
anv: Allocate surface states per-subpass

Instead of allocating surface states for attachments in BeginRenderPass,
we now allocate them in begin_subpass.  Also, since we're zeroing
things, we can be a bit cleaner about our implementation and just fill
out all those passes for which we have allocated surface states.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Tue, 24 Mar 2020 23:32:01 +0000 (18:32 -0500)]
anv: Split command buffer attachment setup in three

This commit splits genX(cmd_buffer_setup_attachments)() into three
functions: one which sets up cmd_buffer->state.attachments, one which
allocates surface states, and one which fills out the surface states.
While we're here, we make both functions take the framebuffer (if any)
as an argument instead of pulling it from the command buffer so it's
more clear what things are inputs to the functions.  We also make the
render pass and framebuffer parameters const as those are immutable
objects.  The only functional change here should be that we now
vk_zalloc the attachments, which should be a bit safer.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Wed, 25 Mar 2020 02:28:06 +0000 (21:28 -0500)]
anv: Mark images written in end_subpass

This makes a lot more sense than marking them written in begin_subpass
since, at that point, we haven't written them yet.  This should reduce
the chances of accidental extra resolves.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Tue, 24 Mar 2020 23:30:17 +0000 (18:30 -0500)]
anv: Use ANV_FROM_HANDLE for pInheritanceInfo fields

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Wed, 25 Mar 2020 05:29:31 +0000 (00:29 -0500)]
anv: Assert surface states are valid

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Tue, 24 Mar 2020 23:08:03 +0000 (18:08 -0500)]
anv: Stop filling out the clear color in compute_aux_usage

It's a pointless micro-optimization that just makes compute_aux_usage
unnecessarily entangled with setting up surface states.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Wed, 25 Mar 2020 05:43:14 +0000 (00:43 -0500)]
anv: Add TRANSFER_SRC to pass usage not subpass usage

The subpass usage flags are supposed to always be one bit and never
multiple bits.  However, when adding in TRANSFER_SRC usage for resolve
attachments we were adding it to the subpass bits and not the render
pass bits.  This was potentially causing issues where images weren't
getting marked as written properly.
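A sketch of the invariant, assuming per-attachment pass and subpass usage masks. The two flag values match VK_IMAGE_USAGE_TRANSFER_SRC_BIT (0x1) and VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT (0x10) but are redefined locally to keep the example self-contained; `struct att_usage` and `mark_resolve_src()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as the corresponding VkImageUsageFlagBits. */
#define USAGE_TRANSFER_SRC     0x00000001u
#define USAGE_COLOR_ATTACHMENT 0x00000010u

/* Hypothetical per-attachment bookkeeping. */
struct att_usage {
   uint32_t pass_usage;    /* union of usages across the whole render pass */
   uint32_t subpass_usage; /* how one subpass uses it: exactly one bit */
};

static bool
is_single_bit(uint32_t x)
{
   return x != 0 && (x & (x - 1)) == 0;
}

/* A resolve source is read as a transfer source: record that in the
 * pass-wide mask, never in the per-subpass one. */
static void
mark_resolve_src(struct att_usage *u)
{
   u->pass_usage |= USAGE_TRANSFER_SRC;
   assert(is_single_bit(u->subpass_usage));
}
```

OR-ing TRANSFER_SRC into the subpass mask instead would leave it with two bits set, breaking the one-bit assumption that other code relies on.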

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Jason Ekstrand [Tue, 24 Mar 2020 23:18:28 +0000 (18:18 -0500)]
anv: Return an error if allocating attachment memory fails

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4393>

Samuel Pitoiset [Mon, 25 Nov 2019 14:51:03 +0000 (15:51 +0100)]
radv: advertise VK_AMD_memory_overallocation_behavior

Doom Eternal explicitly allows overallocation via this extension,
but that shouldn't change anything because it's the default RADV
behavior.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4785>

Samuel Pitoiset [Tue, 28 Apr 2020 11:10:56 +0000 (13:10 +0200)]
radv: track memory heaps usage if overallocation is explicitly disallowed

By default, RADV supports overallocation in the sense that it doesn't
reject an allocation if the target heap is full.

With VK_AMD_memory_overallocation_behavior, apps can disable
overallocation, in which case the driver should account for all
allocations explicitly made by the application and reject new ones
once the heap is full.
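A minimal single-threaded sketch of the accounting described above. `struct heap`, `heap_try_alloc()` and `heap_free()` are hypothetical; RADV's real tracking presumably differs in detail and must also be thread-safe.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-heap accounting (no locking or atomics here). */
struct heap {
   uint64_t size; /* bytes the heap advertises */
   uint64_t used; /* bytes currently allocated by the app */
};

/* Returns false when overallocation is disallowed and the request does
 * not fit; the caller would then fail the allocation with an
 * out-of-memory error such as VK_ERROR_OUT_OF_DEVICE_MEMORY. */
static bool
heap_try_alloc(struct heap *h, uint64_t bytes, bool allow_overallocation)
{
   /* Written as a subtraction to avoid overflowing used + bytes. */
   if (!allow_overallocation &&
       (h->used > h->size || bytes > h->size - h->used))
      return false;
   h->used += bytes;
   return true;
}

static void
heap_free(struct heap *h, uint64_t bytes)
{
   h->used -= bytes;
}
```

With overallocation allowed (the default), the check is skipped entirely, matching the behavior where RADV never rejects an allocation for lack of heap space.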

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4785>

Samuel Pitoiset [Tue, 28 Apr 2020 09:54:06 +0000 (11:54 +0200)]
radv: remove unused radv_device_memory::map_size field

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4785>