mesa.git
4 years agofreedreno: switch to simple_mtx
Rob Clark [Tue, 28 Apr 2020 20:07:16 +0000 (13:07 -0700)]
freedreno: switch to simple_mtx

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4810>

4 years agofreedreno: add screen lock wrappers
Rob Clark [Tue, 28 Apr 2020 20:04:16 +0000 (13:04 -0700)]
freedreno: add screen lock wrappers

This will make it easier to swap out to simple_mtx_t

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4810>

4 years agoutil/simple_mtx: add assert_locked()
Rob Clark [Tue, 28 Apr 2020 19:39:32 +0000 (12:39 -0700)]
util/simple_mtx: add assert_locked()

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4810>

4 years agoturnip: fix wrong substream size in parse_multisample_and_color_blend
Jonathan Marek [Tue, 28 Apr 2020 23:54:06 +0000 (19:54 -0400)]
turnip: fix wrong substream size in parse_multisample_and_color_blend

Missed updating this when adding tu6_emit_sample_locations

Fixes: a92d2e11095 ("turnip: implement VK_EXT_sample_locations")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4795>

4 years agoutil/ra: Improve ra_set_finalize() performance.
Eric Anholt [Mon, 13 Apr 2020 18:14:23 +0000 (11:14 -0700)]
util/ra: Improve ra_set_finalize() performance.

BITSET_FOR_EACH_SET can walk a sparse set (such as a register class's set
of registers) much faster than just iterating over individual bits.

Improves freedreno startup time (as measured by shader-db ./run
shaders/closed/gputest/triangle on my x86 system) by -4.12679% +/-
1.99006% (n=151)
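
As a rough illustration of why walking only the set bits helps on sparse
sets, here is a minimal sketch. It does not use Mesa's BITSET_FOR_EACH_SET;
collect_set_bits() is a hypothetical helper, and __builtin_ctz is assumed to
be available (GCC/Clang). An all-zero word costs one comparison instead of
32 individual bit tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Mesa): stores the index of every set bit
 * of a bitset made of 32-bit words into out[], up to max_out entries, and
 * returns how many indices were stored. */
static size_t
collect_set_bits(const uint32_t *words, size_t num_words,
                 unsigned *out, size_t max_out)
{
   size_t n = 0;

   assert(words != NULL && out != NULL);

   for (size_t w = 0; w < num_words; w++) {
      uint32_t word = words[w];

      /* Empty words fall straight through this loop. */
      while (word != 0 && n < max_out) {
         /* Index of the lowest set bit (GCC/Clang builtin, assumed). */
         unsigned bit = (unsigned)__builtin_ctz(word);

         out[n++] = (unsigned)(w * 32) + bit;
         word &= word - 1;   /* clear the lowest set bit */
      }
   }
   return n;
}
```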

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>

4 years agoutil/ra: Use util_dynarray for handling the conflict lists.
Eric Anholt [Mon, 13 Apr 2020 17:47:17 +0000 (10:47 -0700)]
util/ra: Use util_dynarray for handling the conflict lists.

Again, shortens the code significantly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>

4 years agoutil/ra: Use util_dynarray for the adjacency list.
Eric Anholt [Mon, 13 Apr 2020 17:36:08 +0000 (10:36 -0700)]
util/ra: Use util_dynarray for the adjacency list.

This makes the code significantly more readable, I think (as well as
shorter).  Also, using util_dynarray_delete_unordered() saves us a move of
the rest of the list when removing adjacency on a node.
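
The unordered delete can be sketched as follows. This is not the
util_dynarray code; struct adj_list and adj_list_remove_unordered() are
hypothetical stand-ins that only show the idea: the removed entry is
overwritten with the last one, so nothing after it has to be shifted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size adjacency list, standing in for a dynarray. */
struct adj_list {
   unsigned nodes[16];
   size_t count;
};

/* Removes the first occurrence of node by moving the last element into its
 * place.  Order is not preserved.  Returns 1 if something was removed. */
static int
adj_list_remove_unordered(struct adj_list *list, unsigned node)
{
   assert(list != NULL);

   for (size_t i = 0; i < list->count; i++) {
      if (list->nodes[i] == node) {
         list->nodes[i] = list->nodes[list->count - 1];
         list->count--;
         return 1;
      }
   }
   return 0;
}
```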

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>

4 years agoutil/ra: Sanity check that we're adding a valid reg to a class.
Eric Anholt [Thu, 9 Apr 2020 22:10:08 +0000 (15:10 -0700)]
util/ra: Sanity check that we're adding a valid reg to a class.

BITSET_SET might not segfault on you right away if you're just slightly
off, and an assert is nicer anyway.
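
A minimal sketch of the kind of check meant here, with hypothetical names
(struct reg_class_sketch, class_add_reg_sketch) rather than the real
util/ra types: an index just past the valid range can still land inside the
allocated words, so setting the bit would silently succeed without the
assert.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_REGS 64   /* hypothetical storage capacity */

/* Hypothetical register class: reg_count registers are valid, but the
 * bitset storage is rounded up to whole 32-bit words. */
struct reg_class_sketch {
   unsigned reg_count;
   uint32_t regs[SKETCH_MAX_REGS / 32];
};

static void
class_add_reg_sketch(struct reg_class_sketch *c, unsigned reg)
{
   /* Catch slightly-out-of-range indices that would not crash otherwise. */
   assert(reg < c->reg_count);

   c->regs[reg / 32] |= 1u << (reg % 32);
}
```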

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>

4 years agoutil/ra: Sanity check that the driver selected a valid reg.
Eric Anholt [Thu, 9 Apr 2020 21:11:51 +0000 (14:11 -0700)]
util/ra: Sanity check that the driver selected a valid reg.

freedreno was returning -1 when it didn't pick a reg from the given bitset
due to an off-by-a-small-number error.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4537>

4 years agofreedreno/a4xx: enable A405
Konrad Dybcio [Thu, 26 Mar 2020 16:48:52 +0000 (17:48 +0100)]
freedreno/a4xx: enable A405

This patch brings support for Adreno A405
as found on MSM8939. That chip is a cut-down
version of A4XX IP and requires no special handling.

Tested on Asus Zenfone 2 Laser (Z00T) smartphone.

Signed-off-by: Konrad Dybcio <konradybcio@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4753>

4 years agoiris: handle PIPE_CAP_CLEAR_SCISSORED
Mike Blumenkrantz [Tue, 24 Mar 2020 15:58:29 +0000 (11:58 -0400)]
iris: handle PIPE_CAP_CLEAR_SCISSORED

this allows passing scissored clear calls through the driver where it can
be handled by a repclear shader

fix kwg/mesa#61

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4310>

4 years agogallium: add pipe cap for scissored clears and pass scissor state to clear() hook
Mike Blumenkrantz [Tue, 24 Mar 2020 16:02:51 +0000 (12:02 -0400)]
gallium: add pipe cap for scissored clears and pass scissor state to clear() hook

this adds a new pipe cap that drivers can support which enables passing buffer
clears with scissor test enabled through to be handled by the driver instead
of having mesa draw a quad

also adjust all existing clear() hooks to have the new parameter

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4310>

4 years agoi965: Use correct constant for max_variable_local_size
Caio Marcelo de Oliveira Filho [Wed, 29 Apr 2020 04:05:05 +0000 (21:05 -0700)]
i965: Use correct constant for max_variable_local_size

Fixes: 5664bd6db38 ("i965: Implement ARB_compute_variable_group_size")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4799>

4 years agoiris: move iris_vtable to iris_screen
Mike Blumenkrantz [Mon, 30 Mar 2020 14:37:29 +0000 (10:37 -0400)]
iris: move iris_vtable to iris_screen

instead of inlining this into every context, now a struct is used in the screen
struct to reduce memory usage and simplify a couple of the methods

Closes: https://gitlab.freedesktop.org/kwg/mesa/-/issues/6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4376>

4 years agointel/fs: Don't delete coalesced MOVs if they have a cmod
Jason Ekstrand [Mon, 27 Apr 2020 20:31:12 +0000 (15:31 -0500)]
intel/fs: Don't delete coalesced MOVs if they have a cmod

Shader-db results on ICL:

    total instructions in shared programs: 17133088 -> 17133287 (<.01%)
    instructions in affected programs: 61300 -> 61499 (0.32%)
    helped: 0
    HURT: 199

This means it's likely fixing 199 bugs. :-)  All the changed shaders are
in Mad Max.  It's surprisingly difficult to get the back-end compiler to
generate a pattern that hits this because we don't tend to emit a lot of
coalescable MOVs.  The pattern in Mad Max that's able to hit it is
fsign(fsat(x)) under the right conditions.

Closes: #2820
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4773>

4 years agost/mesa: expose more SPIR-V capabilities
Marek Olšák [Mon, 27 Apr 2020 03:17:41 +0000 (23:17 -0400)]
st/mesa: expose more SPIR-V capabilities

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4760>

4 years agomesa: report GL_INVALID_OPERATION for invalid glTextureBuffer target
Marek Olšák [Mon, 27 Apr 2020 05:03:38 +0000 (01:03 -0400)]
mesa: report GL_INVALID_OPERATION for invalid glTextureBuffer target

This fixes:
    KHR-GL46.direct_state_access.textures_buffer_errors
    KHR-GL46.direct_state_access.textures_buffer_range_errors

Fixes: 98e64e538af - main: Added entry point for glTextureBuffer
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4759>

4 years agopan/mdg: Replicate 16-bit swizzles
Alyssa Rosenzweig [Tue, 28 Apr 2020 21:44:39 +0000 (17:44 -0400)]
pan/mdg: Replicate 16-bit swizzles

We don't support vec8 quite yet anyway; this fixes dot products.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Ensure fdot is scalar out in disasm
Alyssa Rosenzweig [Tue, 28 Apr 2020 21:44:19 +0000 (17:44 -0400)]
pan/mdg: Ensure fdot is scalar out in disasm

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Move condense_writemask to disasm
Alyssa Rosenzweig [Tue, 28 Apr 2020 21:43:56 +0000 (17:43 -0400)]
pan/mdg: Move condense_writemask to disasm

The compiler should *never* use this. Packing should be 1 way.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Pass through some types from scheduling
Alyssa Rosenzweig [Tue, 28 Apr 2020 00:35:39 +0000 (20:35 -0400)]
pan/mdg: Pass through some types from scheduling

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Don't crash on unknown branch target
Alyssa Rosenzweig [Tue, 28 Apr 2020 00:35:00 +0000 (20:35 -0400)]
pan/mdg: Don't crash on unknown branch target

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Make some branch targets more explicit
Alyssa Rosenzweig [Tue, 28 Apr 2020 00:34:36 +0000 (20:34 -0400)]
pan/mdg: Make some branch targets more explicit

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Always print the mask
Alyssa Rosenzweig [Tue, 28 Apr 2020 21:57:38 +0000 (17:57 -0400)]
pan/mdg: Always print the mask

Meaningful for fp16.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Specialize swizzle to type
Alyssa Rosenzweig [Tue, 28 Apr 2020 00:12:55 +0000 (20:12 -0400)]
pan/mdg: Specialize swizzle to type

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Lower specials to 32-bit
Alyssa Rosenzweig [Tue, 28 Apr 2020 00:09:43 +0000 (20:09 -0400)]
pan/mdg: Lower specials to 32-bit

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Move sampler_type emission to pack time
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:23:17 +0000 (19:23 -0400)]
pan/mdg: Move sampler_type emission to pack time

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Set texture full fields at pack time
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:22:06 +0000 (19:22 -0400)]
pan/mdg: Set texture full fields at pack time

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Track texture types
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:11:19 +0000 (19:11 -0400)]
pan/mdg: Track texture types

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Track v_mov type (force uint32 for now?)
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:11:07 +0000 (19:11 -0400)]
pan/mdg: Track v_mov type (force uint32 for now?)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Denoise prints
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:04:24 +0000 (19:04 -0400)]
pan/mdg: Denoise prints

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Track a primary type for I/O
Alyssa Rosenzweig [Mon, 27 Apr 2020 23:01:40 +0000 (19:01 -0400)]
pan/mdg: Track a primary type for I/O

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Another goofy comment gone
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:58:21 +0000 (18:58 -0400)]
pan/mdg: Another goofy comment gone

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Track ALU dest type
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:57:34 +0000 (18:57 -0400)]
pan/mdg: Track ALU dest type

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Track ALU src types
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:55:11 +0000 (18:55 -0400)]
pan/mdg: Track ALU src types

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Add type fields to IR
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:43:12 +0000 (18:43 -0400)]
pan/mdg: Add type fields to IR

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/bi: Share ALU type printing
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:43:01 +0000 (18:43 -0400)]
pan/bi: Share ALU type printing

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Set lower_flrp16
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:33:10 +0000 (18:33 -0400)]
pan/mdg: Set lower_flrp16

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Remove old hack
Alyssa Rosenzweig [Mon, 27 Apr 2020 22:30:53 +0000 (18:30 -0400)]
pan/mdg: Remove old hack

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4793>

4 years agopan/mdg: Remove goofy 16-bit comment
Alyssa Rosenzweig [Mon, 27 Apr 2020 21:55:54 +0000 (17:55 -0400)]
pan/mdg: Remove goofy 16-bit comment

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>

4 years agopan/mdg: Don't break SSA
Alyssa Rosenzweig [Mon, 27 Apr 2020 21:47:13 +0000 (17:47 -0400)]
pan/mdg: Don't break SSA

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>

4 years agopan/mdg: SSA_FIXED_MINIMUM already covered by PAN_IS_REG
Alyssa Rosenzweig [Mon, 27 Apr 2020 21:19:24 +0000 (17:19 -0400)]
pan/mdg: SSA_FIXED_MINIMUM already covered by PAN_IS_REG

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>

4 years agopan/mdg: Use PAN_IS_REG
Alyssa Rosenzweig [Mon, 27 Apr 2020 20:34:53 +0000 (16:34 -0400)]
pan/mdg: Use PAN_IS_REG

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>

4 years agopan/mdg: Remove nir_alu_src_index
Alyssa Rosenzweig [Mon, 27 Apr 2020 20:33:54 +0000 (16:33 -0400)]
pan/mdg: Remove nir_alu_src_index

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>

4 years agopan/bi: Use common IR indices
Alyssa Rosenzweig [Mon, 27 Apr 2020 20:04:05 +0000 (16:04 -0400)]
pan/bi: Use common IR indices

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>

4 years agopanfrost: Move Bifrost IR indexing to common
Alyssa Rosenzweig [Mon, 27 Apr 2020 20:00:38 +0000 (16:00 -0400)]
panfrost: Move Bifrost IR indexing to common

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>

4 years agopanfrost: Fix BO reference counting
Alyssa Rosenzweig [Tue, 28 Apr 2020 19:28:11 +0000 (15:28 -0400)]
panfrost: Fix BO reference counting

Typo.

Fixes: 3283c7f4dad ("panfrost: Inline reference counting routines")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4792>

4 years agoac: enable displayable DCC on Navi12 & Navi14
Marek Olšák [Mon, 20 Apr 2020 21:29:39 +0000 (17:29 -0400)]
ac: enable displayable DCC on Navi12 & Navi14

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>

4 years agoac/surface: validate that DCC is enabled correctly on gfx9+
Marek Olšák [Sat, 18 Apr 2020 00:44:14 +0000 (20:44 -0400)]
ac/surface: validate that DCC is enabled correctly on gfx9+

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>

4 years agoac/surface: add code for gfx10 displayable DCC
Marek Olšák [Sat, 18 Apr 2020 00:37:41 +0000 (20:37 -0400)]
ac/surface: add code for gfx10 displayable DCC

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>

4 years agoac/surface: move non-displayable DCC to the end of the buffer
Marek Olšák [Wed, 22 Apr 2020 22:51:42 +0000 (18:51 -0400)]
ac/surface: move non-displayable DCC to the end of the buffer

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>

4 years agoac/surface: don't compute DCC if it's unsupported by DCN on gfx9+
Marek Olšák [Sat, 18 Apr 2020 00:27:32 +0000 (20:27 -0400)]
ac/surface: don't compute DCC if it's unsupported by DCN on gfx9+

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>

4 years agoac/surface: match get_display_flag() with expectations for is_displayable
Marek Olšák [Sat, 18 Apr 2020 00:19:26 +0000 (20:19 -0400)]
ac/surface: match get_display_flag() with expectations for is_displayable

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>

4 years agoac/surface: replace RADEON_SURF_OPTIMIZE_FOR_SPACE with !FORCE_SWIZZLE_MODE
Marek Olšák [Thu, 23 Apr 2020 05:00:24 +0000 (01:00 -0400)]
ac/surface: replace RADEON_SURF_OPTIMIZE_FOR_SPACE with !FORCE_SWIZZLE_MODE

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>

4 years agoac/surface: remove RADEON_SURF_TC_COMPATIBLE_HTILE and assume it's always set
Marek Olšák [Thu, 23 Apr 2020 04:47:04 +0000 (00:47 -0400)]
ac/surface: remove RADEON_SURF_TC_COMPATIBLE_HTILE and assume it's always set

So that drivers can enable it without worrying how the texture was
allocated.

v2: reworked the mechanism, hopefully fixes now
    added Bas Nieuwenhuizen's diff to fix radv

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>

4 years agoac/surface: rename micro tile mode enums like gfx10 uses them
Marek Olšák [Thu, 23 Apr 2020 04:31:36 +0000 (00:31 -0400)]
ac/surface: rename micro tile mode enums like gfx10 uses them

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4697>

4 years agowinsys/svga: Optionally avoid caching buffer maps
Thomas Hellstrom [Wed, 22 Apr 2020 13:03:15 +0000 (15:03 +0200)]
winsys/svga: Optionally avoid caching buffer maps

Mapping of graphics kernel buffers is quite costly. Therefore the svga
drm winsys caches all kernel buffer maps. However, that may lead to
less testing coverage of the unmap paths and (possibly) processes running
out of virtual memory space. Introduce a possibility to avoid that caching
by setting the environment variable SVGA_FORCE_KERNEL_UNMAPS to 1.
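
A minimal sketch of such an opt-in switch, assuming only that the value "1"
enables it. env_flag_enabled() is a hypothetical helper, not the winsys
code, which may parse the variable differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treats a variable as enabled only when it is set to
 * exactly "1".  An unset variable (NULL) counts as disabled. */
static int
env_flag_enabled(const char *value)
{
   return value != NULL && strcmp(value, "1") == 0;
}
```

A caller would pass in the result of getenv("SVGA_FORCE_KERNEL_UNMAPS")
(from <stdlib.h>) and skip the map cache when the helper returns nonzero.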

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4804>

4 years agogallium/pipebuffer: Use persistent maps for slabs
Thomas Hellstrom [Wed, 22 Apr 2020 11:27:35 +0000 (13:27 +0200)]
gallium/pipebuffer: Use persistent maps for slabs

Instead of the ugly practice of relying on the provider caching maps,
introduce and use persistent pipebuffer maps. Providers that can't handle
persistent maps can't use the slab manager.

The only current user is the svga drm winsys which always maps
persistently.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4804>

4 years agoradv: Use smaller esgs_itemsize for ACO.
Timur Kristóf [Thu, 23 Apr 2020 13:13:31 +0000 (15:13 +0200)]
radv: Use smaller esgs_itemsize for ACO.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

4 years agoaco: Use new default driver locations.
Timur Kristóf [Mon, 30 Mar 2020 15:23:25 +0000 (17:23 +0200)]
aco: Use new default driver locations.

The new locations leave far fewer gaps between I/O slots, which
results in a massive reduction in the LDS usage of tessellation
shaders.

Totals (GFX10):
VGPRS: 3976792 -> 3974864 (-0.05 %)
Code Size: 260552784 -> 260532860 (-0.01 %) bytes
LDS: 48723 -> 30179 (-38.06 %) blocks
Max Waves: 1053407 -> 1053583 (0.02 %)

Totals from affected shaders (1407 shaders on GFX10):
SGPRS: 59144 -> 59216 (0.12 %)
VGPRS: 63024 -> 61096 (-3.06 %)
Code Size: 2695508 -> 2675584 (-0.74 %) bytes
LDS: 47109 -> 28565 (-39.36 %) blocks
Max Waves: 12999 -> 13175 (1.35 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

4 years agoradv: Use new linking helper to set default driver locations.
Timur Kristóf [Mon, 27 Apr 2020 10:22:03 +0000 (12:22 +0200)]
radv: Use new linking helper to set default driver locations.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

4 years agonir: Add new linking helper to set linked driver locations.
Timur Kristóf [Mon, 30 Mar 2020 13:58:07 +0000 (15:58 +0200)]
nir: Add new linking helper to set linked driver locations.

This commit introduces a new function nir_assign_linked_io_var_locations
which is intended to help with assigning driver locations to shaders
during linking, primarily aimed at the VS->TCS->TES->GS stages.

It ensures that the linked shaders have the same driver locations,
and it also packs these as close to each other as possible.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

4 years agoaco: Set config->lds_size when TES or VS is running on HW ESGS.
Timur Kristóf [Thu, 23 Apr 2020 12:02:47 +0000 (14:02 +0200)]
aco: Set config->lds_size when TES or VS is running on HW ESGS.

This doesn't fix anything, just reports the LDS size used by
merged ESGS shaders, such as vertex_geometry_gs and
tess_eval_geometry_gs.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

4 years agoaco: Calculate workgroup size of legacy GS.
Timur Kristóf [Mon, 27 Apr 2020 17:51:40 +0000 (19:51 +0200)]
aco: Calculate workgroup size of legacy GS.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

4 years agoaco: Remember VS/TCS output driver locations.
Timur Kristóf [Mon, 30 Mar 2020 14:54:56 +0000 (16:54 +0200)]
aco: Remember VS/TCS output driver locations.

Instead of relying on calling shader_io_get_unique_index repeatedly,
remember which output driver location corresponds to which
varying slot.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

4 years agoaco: Use context variables instead of calculating TCS inputs/outputs.
Timur Kristóf [Mon, 30 Mar 2020 14:11:14 +0000 (16:11 +0200)]
aco: Use context variables instead of calculating TCS inputs/outputs.

VS needs the number of TCS inputs, and TES needs the number of TCS
outputs.

It is error-prone to repeat those calculations in both instruction
selection and setup. Just set them in one place instead.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

4 years agoradv: Refactor calculate_tess_lds_size and get_tcs_num_patches.
Timur Kristóf [Mon, 30 Mar 2020 14:04:53 +0000 (16:04 +0200)]
radv: Refactor calculate_tess_lds_size and get_tcs_num_patches.

Previously these functions needed the bit masks of the TCS outputs
and patch outputs written, and derived the number of outputs
from them.

Now, they take the number of outputs and patch outputs instead.
This will allow the backend compiler to better optimize the
LDS layout.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4388>

4 years agoaco: consider blocks unreachable if they are in the logical cfg
Rhys Perry [Mon, 27 Apr 2020 12:53:59 +0000 (13:53 +0100)]
aco: consider blocks unreachable if they are in the logical cfg

unreachable was true if the last block is unreachable in the linear cfg,
but it should also be true if it is unreachable in the logical cfg.

Fixes dEQP-VK.graphicsfuzz.for-with-ifs-and-return

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 8d8c864beba399ae4ee2267f680d1f600ad32767
    ('aco: improve check for unreachable loop continue blocks')

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4764>

4 years agoegl/wayland: Fix zwp_linux_dmabuf usage
Christopher James Halse Rogers [Tue, 24 Mar 2020 03:19:51 +0000 (14:19 +1100)]
egl/wayland: Fix zwp_linux_dmabuf usage

There's no guarantee that the formats advertised by wl_drm and the formats
advertised by zwp_linux_dmabuf_v1 are the same.

get_back_bo() handles this by falling back from createImageWithModifiers() to
createImage() when there's a wl_drm format but no corresponding linux_dmabuf
format, but create_wl_buffer() unconditionally tries to create a linux_dmabuf
buffer unless DRIimage has DRM_FORMAT_MOD_INVALID.

Fix this by always checking if the DRIimage modifier has been advertised
by zwp_linux_dmabuf_v1, and falling back to wl_drm if not.

If DRM_FORMAT_MOD_INVALID has been advertised then we trust the client
has allocated something appropriate and treat any modifier as matching.
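
The decision described above can be sketched as below. The function name is
hypothetical and the list of advertised modifiers is passed as a plain
array; SKETCH_MOD_INVALID repeats the value of DRM_FORMAT_MOD_INVALID from
drm_fourcc.h only so that the sketch is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as DRM_FORMAT_MOD_INVALID; redefined to avoid the header. */
#define SKETCH_MOD_INVALID UINT64_C(0x00ffffffffffffff)

/* Hypothetical helper: returns 1 if a linux_dmabuf buffer may be created
 * for an image with the given modifier, 0 if the caller should fall back
 * to wl_drm.  advertised[] holds the modifiers the compositor announced
 * for the image's format. */
static int
use_linux_dmabuf_sketch(const uint64_t *advertised, size_t count,
                        uint64_t image_modifier)
{
   assert(advertised != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      /* An advertised INVALID modifier matches any image modifier. */
      if (advertised[i] == SKETCH_MOD_INVALID ||
          advertised[i] == image_modifier)
         return 1;
   }
   return 0;
}
```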

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2220
Signed-off-by: Christopher James Halse Rogers <christopher.halse.rogers@canonical.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4294>

4 years agoiris/bufmgr: Check if iris_bo_gem_mmap failed
Danylo Piliaiev [Tue, 28 Apr 2020 11:51:26 +0000 (14:51 +0300)]
iris/bufmgr: Check if iris_bo_gem_mmap failed

After the refactoring of iris_bo_map_cpu and iris_bo_map_wc, the
immediate return of NULL on failure to mmap a buffer was lost.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2855
Fixes: 5bc3f52dd8c2b5acaae959ccae2e1fb7c769bb22
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4786>

4 years agoanv: remove assert from GetImageMemoryRequirements[2]
Tapani Pälli [Fri, 24 Apr 2020 12:28:41 +0000 (15:28 +0300)]
anv: remove assert from GetImageMemoryRequirements[2]

This assert is actually correct, but because of how Android hardware
buffer support is implemented we should remove it; otherwise a debug
build of Mesa hits the assert with Android CTS tests.

The test creates a VkImage with a non-external format and sets up
VkExternalMemoryImageCreateInfo to indicate that the image *may* be used
with an Android hardware buffer handle. The test then attempts to get the
image memory requirements. The problem is that we set up all
Android-supporting images as having an external format and thus hit the
assert, as the size has not been set yet. This is not a problem in
practice since Android will bind AHW memory to the image later on.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2807
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4762>

4 years agogitlab-ci: add a list of expected failures for FIJI with ACO
Samuel Pitoiset [Wed, 29 Apr 2020 07:53:48 +0000 (09:53 +0200)]
gitlab-ci: add a list of expected failures for FIJI with ACO

Timur has this chip now. The depth stencil resolve failures are
somewhat unexpected.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4805>

4 years agoradv: advertise VK_EXT_robustness2
Samuel Pitoiset [Wed, 15 Apr 2020 09:39:28 +0000 (11:39 +0200)]
radv: advertise VK_EXT_robustness2

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

4 years agoradv: handle NULL vertex bindings
Samuel Pitoiset [Wed, 15 Apr 2020 09:48:13 +0000 (11:48 +0200)]
radv: handle NULL vertex bindings

With VK_EXT_robustness2, an element of pBuffers can be NULL.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

4 years agoradv: handle NULL descriptors
Samuel Pitoiset [Thu, 23 Apr 2020 14:02:59 +0000 (16:02 +0200)]
radv: handle NULL descriptors

All fields must be zero, otherwise the HW hangs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

4 years agoaco: fix adjusting the sample index with FMASK if value is negative
Samuel Pitoiset [Mon, 27 Apr 2020 15:27:22 +0000 (17:27 +0200)]
aco: fix adjusting the sample index with FMASK if value is negative

The SPIR-V spec doesn't say explicitly that the sample index
must be an unsigned integer.

This fixes crashes with some new VK_EXT_robustness2 tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

4 years agoaco: fix nir_texop_texture_samples with NULL descriptors
Samuel Pitoiset [Mon, 27 Apr 2020 15:02:18 +0000 (17:02 +0200)]
aco: fix nir_texop_texture_samples with NULL descriptors

With VK_EXT_robustness2, descriptors can be NULL and the number of
samples returned by nir_texop_texture_samples should be 0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

4 years agoac/llvm: fix nir_texop_texture_samples with NULL descriptors
Samuel Pitoiset [Mon, 27 Apr 2020 11:04:40 +0000 (13:04 +0200)]
ac/llvm: fix nir_texop_texture_samples with NULL descriptors

With VK_EXT_robustness2, descriptors can be NULL and the number of
samples returned by nir_texop_texture_samples should be 0.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4775>

4 years agointel/fs: Only stall after sending all memory fence messages
Caio Marcelo de Oliveira Filho [Fri, 17 Jan 2020 22:17:58 +0000 (14:17 -0800)]
intel/fs: Only stall after sending all memory fence messages

In Gen11+, when emitting a fence for both L3 and SLM, the generated
code would look like

    SEND, MOV (for stall), SEND, MOV (for stall)

This commit changes that so that both SENDs are emitted before the MOVs
for the stall.  This is similar to the approach used in Ivy Bridge for
the render fence.
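
A rough sketch of the reordering, not the actual brw code generator:
the enum and function names below are hypothetical, and each fence is
reduced to one SEND plus one stalling MOV.

```c
#include <stddef.h>

/* Hypothetical opcode tags used only for this illustration. */
enum sketch_op { SKETCH_SEND, SKETCH_MOV_STALL };

/* Old ordering: each fence is stalled on right after it is sent.
 * Writes 2 * num_fences entries to out and returns that count. */
static size_t
emit_fences_interleaved(enum sketch_op *out, size_t num_fences)
{
   size_t n = 0;
   for (size_t i = 0; i < num_fences; i++) {
      out[n++] = SKETCH_SEND;
      out[n++] = SKETCH_MOV_STALL;
   }
   return n;
}

/* New ordering: send every fence first, then stall on all of them. */
static size_t
emit_fences_batched(enum sketch_op *out, size_t num_fences)
{
   size_t n = 0;
   for (size_t i = 0; i < num_fences; i++)
      out[n++] = SKETCH_SEND;
   for (size_t i = 0; i < num_fences; i++)
      out[n++] = SKETCH_MOV_STALL;
   return n;
}
```

With two fences (L3 and SLM), the batched variant yields SEND, SEND,
MOV, MOV, so the second message is already in flight while waiting on
the first.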

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>

4 years agointel/fs,vec4: Pull stall logic for memory fences up into the IR
Caio Marcelo de Oliveira Filho [Fri, 17 Jan 2020 23:07:44 +0000 (15:07 -0800)]
intel/fs,vec4: Pull stall logic for memory fences up into the IR

Instead of emitting the stall MOV "inside" the
SHADER_OPCODE_MEMORY_FENCE generation, use the scheduling fences when
creating the IR.

For IvyBridge, every (data cache) fence is accompanied by a render
cache fence, which is now explicit in the IR: two
SHADER_OPCODE_MEMORY_FENCEs are emitted (with different SFIDs).

Because Begin and End interlock intrinsics are effectively memory
barriers, move their handling alongside the other memory barrier
intrinsics.  The SHADER_OPCODE_INTERLOCK is still used to distinguish
if we are going to use a SENDC (for Begin) or regular SEND (for End).

This change is a preparation to allow emitting both SENDs in Gen11+
before we can stall on them.

Shader-db results for IVB (i965):

    total instructions in shared programs: 11971190 -> 11971200 (<.01%)
    instructions in affected programs: 11482 -> 11492 (0.09%)
    helped: 0
    HURT: 8
    HURT stats (abs)   min: 1 max: 3 x̄: 1.25 x̃: 1
    HURT stats (rel)   min: 0.03% max: 0.50% x̄: 0.14% x̃: 0.10%
    95% mean confidence interval for instructions value: 0.66 1.84
    95% mean confidence interval for instructions %-change: 0.01% 0.27%
    Instructions are HURT.

  Unlike the previous code, that used the `mov g1 g2` trick to force
  both `g1` and `g2` to stall, the scheduling fence will generate `mov
  null g1` and `mov null g2`.  During review it was decided it was not
  worth keeping the special codepath for the small effect it will have.

Shader-db results for HSW (i965), BDW and SKL show no change in
instruction count, but do report changes in cycle count.  SKL results
are shown below

    total cycles in shared programs: 341738444 -> 341710570 (<.01%)
    cycles in affected programs: 7240002 -> 7212128 (-0.38%)
    helped: 46
    HURT: 5
    helped stats (abs) min: 14 max: 1940 x̄: 676.22 x̃: 154
    helped stats (rel) min: <.01% max: 2.62% x̄: 1.28% x̃: 0.95%
    HURT stats (abs)   min: 2 max: 1768 x̄: 646.40 x̃: 362
    HURT stats (rel)   min: <.01% max: 0.83% x̄: 0.28% x̃: 0.08%
    95% mean confidence interval for cycles value: -777.71 -315.38
    95% mean confidence interval for cycles %-change: -1.42% -0.83%
    Cycles are helped.

  This seems to be the effect of allocating two registers separately
  instead of a single one with size 2, which causes different register
  allocation, affecting the cycle estimates.

ICL also shows no change in instruction count, but reports negative
changes in cycles

    total cycles in shared programs: 352665369 -> 352707484 (0.01%)
    cycles in affected programs: 9608288 -> 9650403 (0.44%)
    helped: 4
    HURT: 104
    helped stats (abs) min: 24 max: 128 x̄: 88.50 x̃: 101
    helped stats (rel) min: <.01% max: 0.85% x̄: 0.46% x̃: 0.49%
    HURT stats (abs)   min: 2 max: 2016 x̄: 408.36 x̃: 48
    HURT stats (rel)   min: <.01% max: 3.31% x̄: 0.88% x̃: 0.45%
    95% mean confidence interval for cycles value: 256.67 523.24
    95% mean confidence interval for cycles %-change: 0.63% 1.03%
    Cycles are HURT.

  AFAICT this is the result of the case above.

Shader-db results for TGL show cycle results similar to ICL, but
instructions are also affected

    total instructions in shared programs: 17690586 -> 17690597 (<.01%)
    instructions in affected programs: 64617 -> 64628 (0.02%)
    helped: 55
    HURT: 32
    helped stats (abs) min: 1 max: 16 x̄: 4.13 x̃: 3
    helped stats (rel) min: 0.05% max: 2.78% x̄: 0.86% x̃: 0.74%
    HURT stats (abs)   min: 1 max: 65 x̄: 7.44 x̃: 2
    HURT stats (rel)   min: 0.05% max: 4.58% x̄: 1.13% x̃: 0.69%
    95% mean confidence interval for instructions value: -2.03 2.28
    95% mean confidence interval for instructions %-change: -0.41% 0.15%
    Inconclusive result (value mean confidence interval includes 0).

  Now that more is done in the IR, more dependencies are visible and
  more SWSB annotations are emitted.  Mixed with different register
  allocation decisions like above, some shaders will see more `sync
  nops` while others are able to avoid them.

  Most of the new `sync nops` are also redundant and could be dropped,
  which will be fixed in a separate change.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>

4 years agointel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers
Caio Marcelo de Oliveira Filho [Fri, 17 Jan 2020 22:52:13 +0000 (14:52 -0800)]
intel/fs: Allow FS_OPCODE_SCHEDULING_FENCE stall on registers

It will generate the MOVs (or SYNC_NOP in Gen12+) needed for stall.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3278>

4 years agoradv: Expose 4G element texel buffers.
Bas Nieuwenhuizen [Tue, 28 Apr 2020 15:04:25 +0000 (17:04 +0200)]
radv: Expose 4G element texel buffers.

Old value seems to be copied from anv.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4787>

4 years agoiris: Fix downcast of bound_vertex_buffers from uint64_t to int
Kenneth Graunke [Tue, 28 Apr 2020 21:04:58 +0000 (14:04 -0700)]
iris: Fix downcast of bound_vertex_buffers from uint64_t to int

This is the wrong data type: the original field and the values we're
adding in are both 64-bit unsigned.  Keep the original data type.
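
To illustrate what such a downcast loses, here is a minimal sketch.
It assumes the field is used as a 64-bit mask; both helper names are
hypothetical, and the narrowing goes through uint32_t rather than int
so that the result is well-defined in C.

```c
#include <stdint.h>

/* Keeps the 64-bit type end to end, as the fix does. */
static uint64_t
add_bindings(uint64_t bound, uint64_t added)
{
   return bound | added;
}

/* For comparison: the intermediate is narrowed to 32 bits, so any bit
 * at position 32 or above is silently dropped. */
static uint64_t
add_bindings_narrowed(uint64_t bound, uint64_t added)
{
   uint32_t tmp = (uint32_t)(bound | added);
   return tmp;
}
```

With bit 40 set in the added value, the first helper preserves it and
the second one returns a mask without it.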

Thanks to Dave Airlie for finding this while reading the code.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4802>

4 years agointel/ir: Remove scheduling-based cycle count estimates.
Francisco Jerez [Fri, 3 Apr 2020 00:42:21 +0000 (17:42 -0700)]
intel/ir: Remove scheduling-based cycle count estimates.

The cycle count estimation logic that is part of the scheduler is now
redundant with the shader performance modeling pass.  The estimates can
be consolidated into the brw::performance analysis result object
instead of being part of the CFG.  This guarantees that the estimates
cannot be accessed without previously calling the
performance_analysis::require() method, which makes sure that the
right analysis pass is executed at the right time if we don't already
have up-to-date cached results.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/ir: Pass block cycle count information explicitly to disassembler.
Francisco Jerez [Fri, 3 Apr 2020 00:42:57 +0000 (17:42 -0700)]
intel/ir: Pass block cycle count information explicitly to disassembler.

So we can eventually remove the cycle count estimates from the CFG
data structure and consolidate performance information in the
brw::performance object.

It would be cleaner to pass the brw::performance object directly to
the disassembler but that isn't straightforward since the disassembler
is built as a plain C file unlike the rest of the compiler back-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats.
Francisco Jerez [Thu, 26 Mar 2020 23:27:32 +0000 (16:27 -0700)]
intel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats.

These should be more accurate than the current cycle counts, since
among other things they consider the effect of post-scheduling passes
like the software scoreboard on TGL.  In addition it will enable us to
clean up some of the now redundant cycle-count estimation
functionality in the instruction scheduler.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/fs: Add INTEL_DEBUG=no32 debugging flag.
Francisco Jerez [Wed, 22 Apr 2020 20:29:34 +0000 (13:29 -0700)]
intel/fs: Add INTEL_DEBUG=no32 debugging flag.

This is useful in order to identify codegen issues caused by SIMD32.
It doesn't currently have any effect on compute shaders since SIMD32
dispatch is only enabled for CS when it's strictly necessary to do so
in order to support the workgroup size requested for the shader --
That might change in the future though when we hook up the SIMD32
heuristic to CS compilation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/fs: Implement performance analysis-based SIMD32 heuristic for fragment shaders.
Francisco Jerez [Fri, 3 Apr 2020 00:30:06 +0000 (17:30 -0700)]
intel/fs: Implement performance analysis-based SIMD32 heuristic for fragment shaders.

The heuristic enables the SIMD32 fragment shader based on whether the
IR performance modeling pass predicts it to have greater throughput
than the SIMD16 and SIMD8 variants of the same shader.  It would be
straightforward to do the same thing in order to control whether
SIMD16 dispatch is enabled, but it's pending additional performance
evaluation.

The INTEL_DEBUG=do32 option is left around in order to force the
SIMD32 shader to be used regardless of the result of the heuristic,
since it's useful as a debugging aid e.g. in order to identify
SIMD32-specific codegen issues which may be masked by the SIMD32
heuristic, or cases where the heuristic is incorrectly disabling
SIMD32 shaders that offer a performance advantage.
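
The decision described above could be sketched roughly as follows.
This is an illustration under assumptions, not the code from the
commit: struct variant_perf, its fields and use_simd32() are
hypothetical stand-ins for the brw::performance results, and
throughput is treated as a single number where higher is better.

```c
#include <stdbool.h>

/* Hypothetical per-variant summary; not the real brw::performance type. */
struct variant_perf {
   bool available;     /* false if this dispatch width was not compiled */
   float throughput;   /* predicted throughput, higher is better */
};

/* Returns true if the SIMD32 variant should be enabled. */
static bool
use_simd32(struct variant_perf simd8, struct variant_perf simd16,
           struct variant_perf simd32, bool force_do32)
{
   if (!simd32.available)
      return false;
   if (force_do32)
      return true;   /* plays the role of INTEL_DEBUG=do32 */
   if (simd8.available && simd32.throughput <= simd8.throughput)
      return false;
   if (simd16.available && simd32.throughput <= simd16.throughput)
      return false;
   return true;
}
```

The force flag is checked only after availability, so forcing SIMD32
cannot select a variant that was never compiled.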

Currently this is only enabled on Gen6+, since SIMD32 codegen support
is incomplete on earlier platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/fs: Heap-allocate fs_visitors in brw_compile_fs().
Francisco Jerez [Fri, 3 Apr 2020 00:16:45 +0000 (17:16 -0700)]
intel/fs: Heap-allocate fs_visitors in brw_compile_fs().

This makes brw_compile_fs() look a bit more similar to
brw_compile_cs().  It saves us three v*_shader_stats local variables,
and will save us additional triplicated declarations as we start
tracking IR performance analysis results.

The triplicated cfg pointers are left around because they're set to
NULL to mark specific dispatch modes as disabled (e.g. in order to
enforce hardware restrictions).  Doing the same thing with the visitor
pointers would cause memory leaks.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/ir: Import shader performance analysis pass.
Francisco Jerez [Thu, 26 Mar 2020 21:59:02 +0000 (14:59 -0700)]
intel/ir: Import shader performance analysis pass.

This introduces an analysis pass intended to estimate several
performance statistics of the shader, including cycle count latency
and throughput values, based on static modeling.  It has instruction
performance information more comprehensive than the current scheduling
pass for all platforms from Gen4 through Gen11, and works on both the
FS and VEC4 back-ends.

The most immediate purpose of this pass is to implement a heuristic
meant to determine whether using SIMD32 dispatch for a fragment shader
can be expected to help more than it hurts.  In addition this will
allow the effect of passes run after scheduling (e.g. the TGL software
scoreboard pass and the VEC4 dependency control pass) to be visible in
shader-db statistics.

But that isn't the end of the story, other potential applications of
this pass (not part of this MR) I've been playing around with are:

 - Implement a similar SIMD16 heuristic allowing the identification of
   inefficient SIMD16 fragment shaders.

 - Implement similar SIMD16 and SIMD32 heuristics for the compute
   shader stage -- Currently compute shader builds always use the
   SIMD16 shader if available and never use the SIMD32 shader unless
   strictly necessary, which is suboptimal under certain conditions.

 - Hook up to the instruction scheduler in order to improve the
   accuracy of its timing information.

 - Use as a heuristic in order to drive the selection of scheduling
   modes (Matt was experimenting with that).

 - Plug into the TGL software scoreboard pass in order to implement a
   more effective SBID token allocation algorithm, since in general
   the optimal token allocation depends on the timings of all
   instructions in the program.

 - Use its bottleneck detection functionality in order to implement a
   heuristic computing a more optimal bound for the number of fragment
   shader threads executed in parallel (by adjusting the
   MaximumNumberofThreadsPerPSD control of 3DSTATE_PS).

As a follow-up I'm planning to submit updated timing information for
Gen12 platforms -- Everything else required to support Gen12 like SWSB
handling is already included in this patch, but there were some IP
concerns regarding the TGL timing parameters since they cannot
currently be obtained with the documentation and hardware which is
publicly available.  The timing parameters for any previous Gen7-11
platforms can be obtained by anyone by sampling the timestamp register
using e.g. shader_time, though I have some more convenient
instrumentation coming up.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/vec4: Fix constness of vec4_instruction::reads_flag() and ::writes_flag().
Francisco Jerez [Sat, 22 Feb 2020 09:17:21 +0000 (01:17 -0800)]
intel/vec4: Fix constness of vec4_instruction::reads_flag() and ::writes_flag().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function.
Francisco Jerez [Thu, 2 Apr 2020 23:20:34 +0000 (16:20 -0700)]
intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function.

This will be re-usable by the IR performance analysis pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/fs: Fix constness of argument of fs_instruction_scheduler::is_compressed().
Francisco Jerez [Thu, 2 Apr 2020 23:18:12 +0000 (16:18 -0700)]
intel/fs: Fix constness of argument of fs_instruction_scheduler::is_compressed().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/fs: Rename half() helpers to quarter(), allow index up to 3.
Francisco Jerez [Fri, 3 Apr 2020 20:04:43 +0000 (13:04 -0700)]
intel/fs: Rename half() helpers to quarter(), allow index up to 3.

Makes more sense considering SIMD32.  Relaxing the assertion in
brw_ir_fs.h will be required in order to avoid assertion failures on
SNB with SIMD32 fragment shaders.
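
As a rough picture of why the index range grows, assume each quarter
covers 8 channels of a 32-wide register.  The helper name below is
hypothetical and only computes the first channel of a quarter; the
real helpers operate on register objects.

```c
#include <assert.h>

/* First channel of the idx-th 8-channel group of a SIMD32 register. */
static unsigned
quarter_first_channel(unsigned idx)
{
   assert(idx < 4);   /* a half() style helper would only allow idx < 2 */
   return 8 * idx;
}
```

Indices 2 and 3 address channels 16..31, which only exist with SIMD32.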

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/ir: Add missing initialization of backend_reg::offset during construction.
Francisco Jerez [Thu, 26 Mar 2020 22:01:13 +0000 (15:01 -0700)]
intel/ir: Add missing initialization of backend_reg::offset during construction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/fs/gen12: Fix Render Target Read header setup for new thread payload layout.
Francisco Jerez [Tue, 7 Apr 2020 23:39:59 +0000 (16:39 -0700)]
intel/fs/gen12: Fix Render Target Read header setup for new thread payload layout.

In Gen12 the Poly 0 Info DWORD containing the Viewport Index and
Render Target Index fields was moved from r0.0 to r1.1 in order to
make room for dual-polygon dispatch.  The render target message format
was updated to expect that information in the same location, so we
didn't need to make any changes for framebuffer fetch to work with
SIMD8 and SIMD16 dispatch.  Unfortunately that won't work with SIMD32,
since the render target message header is assembled from r0 and r2
instead of r1, and the r2 thread payload wasn't updated with an
additional copy of the same information.  We need to fix things up
manually instead.  This avoids a handful of
EXT_shader_framebuffer_fetch regressions in combination with SIMD32
fragment shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/fs/gen12: Work around dual-source blending hangs in combination with SIMD32.
Francisco Jerez [Wed, 8 Apr 2020 00:31:07 +0000 (17:31 -0700)]
intel/fs/gen12: Work around dual-source blending hangs in combination with SIMD32.

This applies the same work-around I committed as b84fa0b31e67
"intel/fs/gen11: Work around dual-source blending hangs in combination
with SIMD32." to Gen12, which seems to suffer from the same hardware
bug found empirically.  The failure mode seems to be identical.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/fs/gen12: Fix hangs with per-sample SIMD32 fragment shader dispatch.
Francisco Jerez [Wed, 8 Apr 2020 00:22:10 +0000 (17:22 -0700)]
intel/fs/gen12: Fix hangs with per-sample SIMD32 fragment shader dispatch.

The Gen12 docs are rather contradictory regarding the dispatch
configurations supported by the fragment shader -- The same table
present in previous generations seems to imply that only one dispatch
mode can be enabled when doing per-sample shading, but a restriction
documented in the 3DSTATE_PS_BODY page implies the opposite: That
SIMD32 can only be used in combination with some other dispatch mode.

The latter seems to match the behavior of real hardware as far as I
could tell from my testing: A bunch of multisample test-cases that do
per-sample shading hang if we only provide a SIMD32 shader.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agomesa: Follow OpenGL conversion rules for values that exceed storage size
Dylan Baker [Wed, 22 Apr 2020 06:32:45 +0000 (23:32 -0700)]
mesa: Follow OpenGL conversion rules for values that exceed storage size

Section 2.2.2 (Data Conversions For State Query Commands) of the
OpenGL 4.5 spec says:

  Following these steps, if a value is so large in magnitude that
  it cannot be represented by the returned data type, then the
  nearest value representable using that type is returned.

The current code doesn't do the correct thing, because it truncates a
long (potentially a 64-bit value) to an int.
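
The rule quoted above amounts to clamping rather than truncating.  A
minimal sketch, using a hypothetical helper name rather than the
function touched by this commit:

```c
#include <limits.h>
#include <stdint.h>

/* Return the int nearest to v, saturating at INT_MIN and INT_MAX
 * instead of discarding the upper bits. */
static int
clamp_int64_to_int(int64_t v)
{
   if (v > INT_MAX)
      return INT_MAX;
   if (v < INT_MIN)
      return INT_MIN;
   return (int)v;
}
```

A plain cast of an out-of-range value would instead keep only the low
bits on typical implementations, which is the truncation described
above.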

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2828
Fixes: 53c36dfcfe3eb3749a53267f054870280afb0d71
       ("replace IROUND with util functions")

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4673>

4 years agopan/bit: Add BITWISE test
Alyssa Rosenzweig [Tue, 28 Apr 2020 17:57:31 +0000 (13:57 -0400)]
pan/bit: Add BITWISE test

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4790>