mesa.git
4 years agoradeonsi: tell the shader disk cache what IR is used
Pierre-Eric Pelloux-Prayer [Wed, 30 Oct 2019 13:28:01 +0000 (14:28 +0100)]
radeonsi: tell the shader disk cache what IR is used

Until 8bef4df196fbb the IR (TGSI or NIR) was used in disk_cache driver_flags.
This commit restores this feature to avoid crashing when switching from
one IR to the other.

As radeonsi's default is TGSI, I used "driver_flags & 0x8000000 = 0" for TGSI
to keep the same driver_flags.
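
As a rough illustration of the scheme described above (not the actual radeonsi code), the IR choice can be folded into the flags that key the disk cache. Only the bit value 0x8000000 comes from this message; the names `SKETCH_CACHE_FLAG_NIR` and `sketch_cache_driver_flags` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; only the bit value is taken from the commit message. */
#define SKETCH_CACHE_FLAG_NIR UINT64_C(0x8000000)

/* Fold the IR choice into the flags that key the shader disk cache.
 * TGSI leaves the bit clear, so flags for the default IR stay unchanged. */
static uint64_t sketch_cache_driver_flags(uint64_t base_flags, bool use_nir)
{
    uint64_t flags = base_flags & ~SKETCH_CACHE_FLAG_NIR;
    if (use_nir)
        flags |= SKETCH_CACHE_FLAG_NIR;
    return flags;
}
```

Because the two IRs yield different flags, a shader cached under one IR should not be looked up under the other.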

Fixes: 8bef4df196f ("radeonsi: add si_debug_options for convenient adding/removing of options")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agointel/perf: add TGL support
Lionel Landwerlin [Fri, 20 Sep 2019 18:11:33 +0000 (21:11 +0300)]
intel/perf: add TGL support

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoandroid: Add panfrost support to build scripts
Robert Foss [Tue, 22 Oct 2019 17:31:52 +0000 (19:31 +0200)]
android: Add panfrost support to build scripts

Currently the Android build system doesn't expose the panfrost
driver.

This patch enables the panfrost driver to be built for the
Android platform.

Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-By: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agonir: Build nir_lower_point_size.c in libmesa_nir
Robert Foss [Fri, 25 Oct 2019 15:34:37 +0000 (17:34 +0200)]
nir: Build nir_lower_point_size.c in libmesa_nir

nir_lower_point_size.c was not built into the libmesa_nir library for non-meson
builds. However it was included in the meson build.

This patch fixes that.

Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agov3d: rename vertex shader key (num)_fs_inputs fields
Iago Toral Quiroga [Tue, 29 Oct 2019 07:32:44 +0000 (08:32 +0100)]
v3d: rename vertex shader key (num)_fs_inputs fields

Until now this made sense because we always paired vertex shaders
with fragment shaders, but as soon as we implement geometry and
tessellation shaders that will no longer be the case, so rename
this to (num_)used_outputs.

v2: Use 'used_outputs' instead of ns_outputs, which is more explicit (Eric).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoandroid: aco: fix Lower to CSSA
Mauro Rossi [Thu, 31 Oct 2019 00:59:07 +0000 (01:59 +0100)]
android: aco: fix Lower to CSSA

Fixes the following build error:

external/mesa/src/amd/compiler/aco_spill.cpp:1768:
error: undefined reference to 'aco::lower_to_cssa(aco::Program*, aco::live&, radv_nir_compiler_options const*)'

Fixes: 0b8216b ("aco: Lower to CSSA")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
4 years agogallium/swr: Fix depth values for blit scenario
Jan Zielinski [Tue, 29 Oct 2019 18:29:27 +0000 (19:29 +0100)]
gallium/swr: Fix depth values for blit scenario

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
4 years agoiris/gen11+: Move flush for render target change
Jordan Justen [Fri, 15 Feb 2019 19:35:28 +0000 (11:35 -0800)]
iris/gen11+: Move flush for render target change

When starting a BLORP operation, we do the BTI-change flush.  However,
when ending it and transitioning back to regular drawing, we change the
render target again - without a set_framebuffer_state() call.  We need
to do the BTI flush there too.  BLORP flags IRIS_DIRTY_RENDER_BUFFER
now, which will cause the next draw to get the BTI flush again.

(explanation of fix by Ken)

Fixes: 2b956a093a1 ("iris: totally untested icelake support")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoiris: Add IRIS_DIRTY_RENDER_BUFFER state flag
Jordan Justen [Fri, 15 Feb 2019 19:31:31 +0000 (11:31 -0800)]
iris: Add IRIS_DIRTY_RENDER_BUFFER state flag

Fixes: 2b956a093a1 ("iris: totally untested icelake support")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoradv: declare NGG scratch for VS or TES and only on GFX10
Samuel Pitoiset [Mon, 28 Oct 2019 13:41:13 +0000 (14:41 +0100)]
radv: declare NGG scratch for VS or TES and only on GFX10

It does not need to be declared for other stages because it is only for
streamout.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agolima: add cubemap support
Arno Messiaen [Tue, 17 Sep 2019 21:40:03 +0000 (23:40 +0200)]
lima: add cubemap support

Signed-off-by: Arno Messiaen <arnomessiaen@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
4 years agolima: introduce ppir_op_load_coords_reg to differentiate between loading texture...
Arno Messiaen [Sat, 12 Oct 2019 22:05:57 +0000 (00:05 +0200)]
lima: introduce ppir_op_load_coords_reg to differentiate between loading texture coordinates straight from a varying vs loading them from a register

Signed-off-by: Arno Messiaen <arnomessiaen@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
4 years agolima: add layer_stride field to lima_resource struct
Arno Messiaen [Sun, 29 Sep 2019 21:20:45 +0000 (23:20 +0200)]
lima: add layer_stride field to lima_resource struct

Signed-off-by: Arno Messiaen <arnomessiaen@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
4 years agolima: fix stride in texture descriptor
Arno Messiaen [Sun, 29 Sep 2019 21:21:39 +0000 (23:21 +0200)]
lima: fix stride in texture descriptor

Signed-off-by: Arno Messiaen <arnomessiaen@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
4 years agointel/compiler: Report the number of non-spill/fill SEND messages on vec4 too
Ian Romanick [Tue, 29 Oct 2019 19:18:16 +0000 (12:18 -0700)]
intel/compiler: Report the number of non-spill/fill SEND messages on vec4 too

This makes shader-db's report.py work on Haswell and earlier platforms.
The problem is that the script would detect the "sends" output for
scalar shaders and expect it in vec4 shaders too.  When it didn't find
it, the script would fail with:

    Traceback (most recent call last):
      File "./report.py", line 351, in <module>
        main()
      File "./report.py", line 182, in main
        before_count = before[p][m]
    KeyError: 'sends'

Fixes: f192741ddd8 ("intel/compiler: Report the number of non-spill/fill SEND messages")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agonir: fix couple of compile warnings
Tapani Pälli [Wed, 30 Oct 2019 12:43:57 +0000 (14:43 +0200)]
nir: fix couple of compile warnings

Fixes "warning: braces around scalar initializer" warnings.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agoradv: Fix timeout handling in syncobj wait.
Bas Nieuwenhuizen [Wed, 30 Oct 2019 20:58:42 +0000 (21:58 +0100)]
radv: Fix timeout handling in syncobj wait.

libdrm returns -errno rather than the raw ioctl return value of -1.
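
A minimal sketch of what this implies for the caller, assuming a wrapper that reports failure as a negated errno value. `sketch_syncobj_wait` is a hypothetical stand-in rather than a real libdrm function, and the actual radv code may be structured differently.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a libdrm-style wrapper: on failure it returns
 * the negated errno value instead of -1. */
static int sketch_syncobj_wait(bool signalled)
{
    return signalled ? 0 : -ETIME;
}

/* Returns true on success and false otherwise. A timeout leaves *error at 0;
 * any other failure stores a positive errno value in *error. */
static bool sketch_wait(bool signalled, int *error)
{
    int ret = sketch_syncobj_wait(signalled);

    *error = 0;
    if (ret == 0)
        return true;
    if (ret == -ETIME) /* not: ret == -1 && errno == ETIME */
        return false;
    *error = -ret;
    return false;
}
```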

Fixes: 1c3cda7d277 "radv: Add syncobj signal/reset/wait to winsys."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agonv50/ir: mark STORE destination inputs as used
Ilia Mirkin [Mon, 14 Oct 2019 06:40:11 +0000 (02:40 -0400)]
nv50/ir: mark STORE destination inputs as used

Observed an issue when looking at the code generated by the
image-vertex-attrib-input-output piglit test. Even though the test
itself worked fine (due to TIC 0 being used for the image), this needs
to be fixed.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
4 years agogm107/ir: fix loading z offset for layered 3d image bindings
Ilia Mirkin [Mon, 4 Feb 2019 04:25:07 +0000 (23:25 -0500)]
gm107/ir: fix loading z offset for layered 3d image bindings

Unfortunately we don't know if a particular load is a real 2d image (as
would be a cube face or 2d array element), or a layer of a 3d image.
Since we pass in the TIC reference, the instruction's type has to match
what's in the TIC (experimentally). In order to properly support
bindless images, this also can't be done by looking at the current
bindings and generating appropriate code.

As a result all plain 2d loads are converted into a pair of 2d/3d loads,
with appropriate predicates to ensure only one of those actually
executes, and the values are all merged in.

This goes somewhat against the current flow, so for GM107 we do the OOB
handling directly in the surface processing logic. Perhaps the other
gens should do something similar, but that is left to another change.

This fixes dEQP tests like image_load_store.3d.*_single_layer and GL-CTS
tests like shader_image_load_store.non-layered_binding without breaking
anything else.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "20.0" <mesa-stable@lists.freedesktop.org>
4 years agointel/dev: set default num_eu_per_subslice on gen12
Lionel Landwerlin [Wed, 30 Oct 2019 22:03:30 +0000 (00:03 +0200)]
intel/dev: set default num_eu_per_subslice on gen12

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8125d7960b ("intel/dev: Add preliminary device info for Tigerlake")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agodocs/new_features: Empty the feature list for the 20.0 cycle
Dylan Baker [Wed, 30 Oct 2019 22:18:27 +0000 (15:18 -0700)]
docs/new_features: Empty the feature list for the 20.0 cycle

4 years agoBump VERSION to 20.0.0-devel
Dylan Baker [Wed, 30 Oct 2019 21:56:02 +0000 (14:56 -0700)]
Bump VERSION to 20.0.0-devel

4 years agodocs/relnotes/new_features.txt: Add note about gen12 support
Jordan Justen [Fri, 25 Oct 2019 11:20:37 +0000 (04:20 -0700)]
docs/relnotes/new_features.txt: Add note about gen12 support

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
4 years agointel/eu/validate/gen12: Add TGL to eu_validate tests.
Jordan Justen [Tue, 20 Mar 2018 15:23:35 +0000 (08:23 -0700)]
intel/eu/validate/gen12: Add TGL to eu_validate tests.

These reworks were combined into this patch:

 * Matt Turner: i965: Disable NoDDChk/NoDDClr test on Gen12+
 * Francisco Jerez: intel/eu/validate/gen12: Disable
   qword_low_power_no_depctrl eu_validate test.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/dev: Add preliminary device info for Tigerlake
Jordan Justen [Tue, 8 Aug 2017 21:08:58 +0000 (14:08 -0700)]
intel/dev: Add preliminary device info for Tigerlake

Reworks:
 * adjust 64-bit support, hiz (Jason Ekstrand)
 * sim-id (Lionel Landwerlin)
 * adjust threads, urb size (Rafael Antognolli)
 * adjust urb size (Kenneth Graunke)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/dump_gpu: handle context create extended ioctl
Lionel Landwerlin [Fri, 25 Oct 2019 10:52:47 +0000 (13:52 +0300)]
intel/dump_gpu: handle context create extended ioctl

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
4 years agoradv: Allocate space for temp. semaphore parts.
Bas Nieuwenhuizen [Wed, 30 Oct 2019 18:52:51 +0000 (19:52 +0100)]
radv: Allocate space for temp. semaphore parts.

Calculated the number for allocation and did not
reserve space ....

Fixes: 2117c53b723 "radv: Add temporary datastructure for submissions."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoanv: Add Tile Cache Flush for Unified Cache.
Rafael Antognolli [Tue, 30 Apr 2019 20:34:20 +0000 (13:34 -0700)]
anv: Add Tile Cache Flush for Unified Cache.

4 years agoblorp: Add Tile Cache Flush for Unified Cache.
Rafael Antognolli [Tue, 30 Apr 2019 20:34:06 +0000 (13:34 -0700)]
blorp: Add Tile Cache Flush for Unified Cache.

4 years agoiris: Add Tile Cache Flush for Unified Cache.
Rafael Antognolli [Mon, 29 Apr 2019 18:05:07 +0000 (11:05 -0700)]
iris: Add Tile Cache Flush for Unified Cache.

4 years agointel/genxml: Add gen12 tile cache flush bit
Jordan Justen [Sat, 9 Sep 2017 02:08:21 +0000 (19:08 -0700)]
intel/genxml: Add gen12 tile cache flush bit

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
4 years agoaco: implement VGPR spilling
Daniel Schürmann [Thu, 24 Oct 2019 16:27:25 +0000 (18:27 +0200)]
aco: implement VGPR spilling

VGPR spilling is implemented via MUBUF instructions and scratch memory.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: always set scratch_offset in startpgm
Daniel Schürmann [Wed, 30 Oct 2019 17:24:39 +0000 (18:24 +0100)]
aco: always set scratch_offset in startpgm

This patch also moves private_segment_buffer and
scratch_offset to Program to easily access it.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: omit linear VGPRs as spill variables
Daniel Schürmann [Wed, 30 Oct 2019 13:54:44 +0000 (14:54 +0100)]
aco: omit linear VGPRs as spill variables

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: ensure that spilled VGPR reloads are done after p_logical_start
Daniel Schürmann [Wed, 30 Oct 2019 13:42:00 +0000 (14:42 +0100)]
aco: ensure that spilled VGPR reloads are done after p_logical_start

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: simplify calculation of target register pressure when spilling
Daniel Schürmann [Thu, 24 Oct 2019 09:38:37 +0000 (11:38 +0200)]
aco: simplify calculation of target register pressure when spilling

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: fix new_demand calculation for first instructions
Rhys Perry [Wed, 30 Oct 2019 18:00:36 +0000 (18:00 +0000)]
aco: fix new_demand calculation for first instructions

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: don't add interferences between spilled phi operands
Daniel Schürmann [Wed, 30 Oct 2019 11:32:32 +0000 (12:32 +0100)]
aco: don't add interferences between spilled phi operands

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: consider loop_exit blocks like merge blocks, even if they have only one predecessor
Daniel Schürmann [Wed, 30 Oct 2019 11:04:22 +0000 (12:04 +0100)]
aco: consider loop_exit blocks like merge blocks, even if they have only one predecessor

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: don't insert the exec mask into set of live-out variables when spilling
Daniel Schürmann [Wed, 30 Oct 2019 11:00:23 +0000 (12:00 +0100)]
aco: don't insert the exec mask into set of live-out variables when spilling

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: fix transitive affinities of spilled variables
Daniel Schürmann [Wed, 16 Oct 2019 14:39:06 +0000 (16:39 +0200)]
aco: fix transitive affinities of spilled variables

Variables spilled on both branch legs need to be assigned to the same spilling slot.
These affinities can be transitive through multiple merge blocks.
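
One way to picture transitive affinities is a union-find structure over spill ids, where every id in a group must end up in the same slot. This is only an illustrative sketch: union-find is swapped in here for clarity, ACO may track affinities differently, and all names are hypothetical.

```c
#define SKETCH_MAX_SPILLS 16

/* sketch_parent[i] == i marks the representative of a group of spill ids
 * that must share one spilling slot. */
static unsigned sketch_parent[SKETCH_MAX_SPILLS];

static void sketch_init(void)
{
    for (unsigned i = 0; i < SKETCH_MAX_SPILLS; i++)
        sketch_parent[i] = i;
}

static unsigned sketch_find(unsigned id)
{
    while (sketch_parent[id] != id) {
        /* Path halving keeps later lookups short. */
        sketch_parent[id] = sketch_parent[sketch_parent[id]];
        id = sketch_parent[id];
    }
    return id;
}

/* Record that spill ids a and b must end up in the same slot. */
static void sketch_add_affinity(unsigned a, unsigned b)
{
    unsigned ra = sketch_find(a);
    unsigned rb = sketch_find(b);

    if (ra != rb)
        sketch_parent[rb] = ra;
}
```

With affinities 0-1 from one merge block and 1-2 from another, ids 0 and 2 land in the same group even though no single merge block relates them directly.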

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: fix live-range splits of phis
Daniel Schürmann [Tue, 29 Oct 2019 10:58:21 +0000 (11:58 +0100)]
aco: fix live-range splits of phis

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: remove potential critical edge on loops.
Daniel Schürmann [Tue, 29 Oct 2019 10:57:11 +0000 (11:57 +0100)]
aco: remove potential critical edge on loops.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: improve live variable analysis
Daniel Schürmann [Tue, 29 Oct 2019 10:56:09 +0000 (11:56 +0100)]
aco: improve live variable analysis

This patch makes the live variable analysis more precise
w.r.t. killed phi operands and the block's register pressure.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: Lower to CSSA
Daniel Schürmann [Tue, 15 Oct 2019 16:23:52 +0000 (18:23 +0200)]
aco: Lower to CSSA

Converting to 'Conventional SSA Form' ensures correctness w.r.t. spilling of phi nodes.
Previously, it was possible that phi operands have intersecting live-ranges, and thus,
couldn't get spilled to the same spilling slot. For this reason, ACO tried to avoid
spilling phis, even if it was beneficial.
This patch implements a conversion pass which is currently only called if spilling is necessary.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoetnaviv: fix non-pointsprite points on GC7000L
Jonathan Marek [Wed, 3 Jul 2019 18:08:37 +0000 (14:08 -0400)]
etnaviv: fix non-pointsprite points on GC7000L

Fixes these deqp tests (and more):
dEQP-GLES2.functional.draw.draw_arrays.points.single_attribute
dEQP-GLES2.functional.draw.draw_arrays.points.multiple_attributes
dEQP-GLES2.functional.draw.draw_arrays.points.default_attribute
dEQP-GLES2.functional.draw.draw_elements.points.single_attribute
dEQP-GLES2.functional.draw.draw_elements.points.multiple_attributes
dEQP-GLES2.functional.draw.draw_elements.points.default_attribute

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: stencil fix
Jonathan Marek [Sun, 20 Oct 2019 18:37:25 +0000 (14:37 -0400)]
etnaviv: stencil fix

The final version of previous stencil fix patch ended up breaking one-sided
stencil.

Fixes remaining failures in these deqp tests (tested on GC3000/GC7000L):
dEQP-GLES2.functional.fragment_ops.depth_stencil.*

Note: deqp tests require --deqp-gl-config-name=rgba8888d24s8ms0

Fixes: 05da025f ("etnaviv: fix two-sided stencil")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: fix depth bias
Jonathan Marek [Mon, 2 Sep 2019 18:46:15 +0000 (14:46 -0400)]
etnaviv: fix depth bias

Fixes remaining failures in these deqp tests (tested on GC3000/GC7000L):
dEQP-GLES2.functional.polygon_offset.*

Fixes: 6c3c05dc ("etnaviv: fix polygon offset")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoiris: Set MOCS for external surfaces to uncached
Jordan Justen [Fri, 10 May 2019 18:50:54 +0000 (11:50 -0700)]
iris: Set MOCS for external surfaces to uncached

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoiris: Align fast clear color state buffer to a page.
Rafael Antognolli [Tue, 13 Aug 2019 21:47:27 +0000 (14:47 -0700)]
iris: Align fast clear color state buffer to a page.

On gen11 and older, compressed images are tiled and aligned to 4K. On
gen12 this 4K alignment restriction was removed. However, only aligning
the fast clear color buffer to 64B (a cacheline, as stated in the
documentation) is causing some bugs where the fast clear color is not
converted during the fast clear operation. Aligning things to 4K seems
to fix it.
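
For reference, rounding an offset up to a power-of-two boundary can be sketched as below. `sketch_align_u64` is a hypothetical helper for illustration and not necessarily what the driver calls.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of alignment (a power of two). */
static uint64_t sketch_align_u64(uint64_t value, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

An offset of 130 rounds to 192 with 64B alignment but to 4096 with page alignment, which is the difference this change makes for the clear color buffer.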

v2: Fix typo case in the comment (Nanley)
v3: Rebase and fix conflicts.
v4: Fix rebase mistake (Nanley).

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
4 years agoanv: Align fast clear color state buffer to a page.
Rafael Antognolli [Tue, 13 Aug 2019 21:47:27 +0000 (14:47 -0700)]
anv: Align fast clear color state buffer to a page.

On gen11 and older, compressed images are tiled and aligned to 4K. On
gen12 this 4K alignment restriction was removed. However, only aligning
the fast clear color buffer to 64B (a cacheline, as stated in the
documentation) is causing some bugs where the fast clear color is not
converted during the fast clear operation. Aligning things to 4K seems
to fix it.

v2: Assert that image->planes[plane].offset is 4K aligned (Nanley)

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
4 years agozink: only enable KHR_external_memory_fd if supported
Erik Faye-Lund [Wed, 23 Oct 2019 10:16:22 +0000 (12:16 +0200)]
zink: only enable KHR_external_memory_fd if supported

While we're at it, make sure we error out if it's not supported when
required.

This brings us a bit closer to being able to test on SwiftShader, which
doesn't currently support KHR_external_memory_fd.

4 years agoradv: Start signalling semaphores in WSI acquire.
Bas Nieuwenhuizen [Wed, 30 Oct 2019 13:51:17 +0000 (14:51 +0100)]
radv: Start signalling semaphores in WSI acquire.

Winsys semaphores without signal operation get silently ignored.

Not so for syncobjs, so actually signal them.

Fixes: 84d9551b232 "radv: Always enable syncobj when supported for all fences/semaphores."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2030
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoaco: rename README to README.md
Rhys Perry [Mon, 21 Oct 2019 14:08:07 +0000 (15:08 +0100)]
aco: rename README to README.md

Closes: #1974
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: a couple loop handling fixes for GFX10 hazard pass
Rhys Perry [Tue, 29 Oct 2019 11:19:39 +0000 (11:19 +0000)]
aco: a couple loop handling fixes for GFX10 hazard pass

It was joining from the wrong blocks and block.kind is a bitmask instead
of an enum.
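
The bitmask point can be illustrated with a small sketch. The flag names are hypothetical and do not match the real ACO definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag names; each kind is one bit so a block can carry several. */
enum sketch_block_kind {
    SKETCH_KIND_LOOP_HEADER = 1u << 0,
    SKETCH_KIND_LOOP_EXIT   = 1u << 1,
    SKETCH_KIND_MERGE       = 1u << 2,
};

/* Correct for a bitmask: test the bit. */
static bool sketch_is_loop_header(uint32_t kind)
{
    return (kind & SKETCH_KIND_LOOP_HEADER) != 0;
}

/* Wrong for a bitmask: equality fails once any other bit is set. */
static bool sketch_is_loop_header_buggy(uint32_t kind)
{
    return kind == SKETCH_KIND_LOOP_HEADER;
}
```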

Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
4 years agointel/compiler: Add instruction compaction support on Gen12
Matt Turner [Mon, 9 Sep 2019 20:01:06 +0000 (13:01 -0700)]
intel/compiler: Add instruction compaction support on Gen12

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/compiler: Make separate src0/src1 index tables
Matt Turner [Thu, 15 Feb 2018 18:33:18 +0000 (10:33 -0800)]
intel/compiler: Make separate src0/src1 index tables

TGL uses different data (and even a different format!) for each source.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/compiler: Inline get_src_index()
Matt Turner [Tue, 13 Feb 2018 00:35:49 +0000 (16:35 -0800)]
intel/compiler: Inline get_src_index()

TGL will have separate tables for src0 and src1, so the shared function
will no longer make sense.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/compiler: Restructure instruction compaction in preparation for Gen12
Matt Turner [Tue, 13 Feb 2018 00:26:20 +0000 (16:26 -0800)]
intel/compiler: Restructure instruction compaction in preparation for Gen12

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/compiler: Remove unreachable() from brw_reg_type.c
Matt Turner [Wed, 16 Oct 2019 19:45:55 +0000 (12:45 -0700)]
intel/compiler: Remove unreachable() from brw_reg_type.c

The EU compaction unit test fuzzes the compaction code by flipping bits.
We use a simple skip_bits() function with a list of reserved bits to
ignore, but for more complex cases like invalid combinations of register
file:type, we need either machinery to check validity or for these
functions to simply inform us whether a combination was valid.

enum brw_reg_type is a 4-bit field in brw_reg, so rather than expanding it
with an "INVALID" value, just return -1 and let the caller check for
that.
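
A minimal sketch of the "return -1 and let the caller check" pattern, assuming an invented encoding table. The type names and encodings below are hypothetical and are not the real brw_reg_type values.

```c
/* Hypothetical register types; in the real 4-bit field there is no spare
 * enumerator available to mean "invalid". */
enum sketch_reg_type {
    SKETCH_TYPE_UD = 0,
    SKETCH_TYPE_D  = 1,
    SKETCH_TYPE_F  = 2,
};

/* Decode a hardware encoding. Returns the type as a non-negative int,
 * or -1 when the encoding is not valid, instead of aborting. */
static int sketch_hw_to_reg_type(unsigned hw_encoding)
{
    switch (hw_encoding) {
    case 0:  return SKETCH_TYPE_UD;
    case 1:  return SKETCH_TYPE_D;
    case 7:  return SKETCH_TYPE_F;
    default: return -1; /* invalid combination: caller must check */
    }
}
```

A fuzzing test can then skip inputs for which the decode returns -1 rather than hitting an abort inside driver code.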

Scott suggested redefining unreachable() within the unit test to
longjmp() which would allow driver code like this to still use it and
allow the test to handle expected failures like this. If that plan works
out, I plan to revert this.

4 years agofreedreno/a2xx: add missing vertex formats (SSCALE/USCALE/FIXED)
Jonathan Marek [Fri, 6 Sep 2019 16:59:15 +0000 (12:59 -0400)]
freedreno/a2xx: add missing vertex formats (SSCALE/USCALE/FIXED)

Mostly for vertex formats, but they are supported as texture formats too
(untested however).

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agoradeonsi: disable sdma for gfx10
Pierre-Eric Pelloux-Prayer [Tue, 22 Oct 2019 08:12:49 +0000 (10:12 +0200)]
radeonsi: disable sdma for gfx10

Disable sdma on gfx10 until all timeout bugs are fixed.

See:
    https://gitlab.freedesktop.org/mesa/mesa/issues/1907
    https://bugs.freedesktop.org/show_bug.cgi?id=111481

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoradeonsi: sdma misc fixes
Pierre-Eric Pelloux-Prayer [Thu, 17 Oct 2019 14:15:54 +0000 (16:15 +0200)]
radeonsi: sdma misc fixes

SDMA IB doesn't need to be padded for SDMA.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoradeonsi: align sdma byte count to dw
Pierre-Eric Pelloux-Prayer [Tue, 15 Oct 2019 13:19:22 +0000 (15:19 +0200)]
radeonsi: align sdma byte count to dw

If src/dst addresses are dw aligned and size is > 4 then we align
byte count to dw as well.
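
The rule above might look like the sketch below. It assumes that "align" means rounding the byte count down to a multiple of 4 and leaving any tail bytes to a separate copy; the message does not state the rounding direction, and `sketch_sdma_byte_count` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: aligning the byte count means rounding it down to a dword
 * multiple; the remaining 1-3 bytes would need a separate copy. */
static uint32_t sketch_sdma_byte_count(uint64_t src, uint64_t dst, uint32_t size)
{
    bool dw_aligned = (src & 3) == 0 && (dst & 3) == 0;

    if (dw_aligned && size > 4)
        return size & ~UINT32_C(3);
    return size;
}
```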

PAL implementation works like this.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoradv: Enable ACO on Navi.
Timur Kristóf [Tue, 17 Sep 2019 17:59:52 +0000 (19:59 +0200)]
radv: Enable ACO on Navi.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoradeonsi: enable 8K video decode support for HEVC and VP9
Leo Liu [Mon, 28 Oct 2019 17:17:04 +0000 (13:17 -0400)]
radeonsi: enable 8K video decode support for HEVC and VP9

HW 8K decode support starts at Renoir

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
4 years agoradeon/vcn: Add VP9 8K decode support
Leo Liu [Mon, 28 Oct 2019 17:08:25 +0000 (13:08 -0400)]
radeon/vcn: Add VP9 8K decode support

Requires an increase of the context buffer size.

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
4 years agoaco: try to group together VMEM loads of the same resource
Rhys Perry [Fri, 18 Oct 2019 12:05:00 +0000 (13:05 +0100)]
aco: try to group together VMEM loads of the same resource

v2: remove accidental shaderInt16 change
v2: simplify can_move_down initialization
v2: simplify VMEM_CLAUSE_MAX_GRAB_DIST

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: don't schedule instructions through depending VMEM instructions
Daniel Schürmann [Thu, 10 Oct 2019 14:31:40 +0000 (16:31 +0200)]
aco: don't schedule instructions through depending VMEM instructions

Previously, the scheduler tried to move up instructions from below depending
VMEM instructions only to move them down again when scheduling the VMEM
instruction.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: add can_reorder flags to load_ubo and load_constant
Daniel Schürmann [Thu, 10 Oct 2019 12:55:13 +0000 (14:55 +0200)]
aco: add can_reorder flags to load_ubo and load_constant

These got lost due to some refactoring.
Due to the way our scheduler works currently, for now
we add back the reorder flag for divergent loads only.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: only skip RAR dependencies if the variable is killed somewhere
Daniel Schürmann [Wed, 28 Aug 2019 10:08:12 +0000 (12:08 +0200)]
aco: only skip RAR dependencies if the variable is killed somewhere

This patch changes VMEM scheduling so that VMEM instructions can only
be moved upwards by previous VMEM instructions but not downwards.
This way, it improves the order of VMEM instructions in relation
to their users.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: restrict scheduling depending on max_waves
Daniel Schürmann [Thu, 29 Aug 2019 15:17:32 +0000 (17:17 +0200)]
aco: restrict scheduling depending on max_waves

Previously, we allowed all shaders to reduce the number of max_waves to as low as 5.
Restricting this on shaders with low register demand increases the total number of waves
while the VMEM def-use distances hardly change.
This patch also changes the max number of move operations per MEM instruction.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoanv: Avoid emitting UBO surface states that won't be used
Jason Ekstrand [Tue, 29 Oct 2019 21:10:49 +0000 (16:10 -0500)]
anv: Avoid emitting UBO surface states that won't be used

This shaves around 4-5% off of a CPU-limited example running with the
Dawn WebGPU implementation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/vec4: Set brw_stage_prog_data::has_ubo_pull
Jason Ekstrand [Tue, 29 Oct 2019 22:28:18 +0000 (17:28 -0500)]
intel/vec4: Set brw_stage_prog_data::has_ubo_pull

In 0e4a75f917, Ken added a flag to brw_stage_prog_data which indicates
whether any UBO pulls ever occur.  Unfortunately, he neglected to set
the bit in the vec4 back-end.  This was fine at the time because the
optimization was intended for iris which does not support gen7 and using
the vec4 back-end on Gen8+ requires an environment variable.  We want to
use this in Vulkan which does support Gen7 so we want the information
from the vec4 back-end as well as scalar.

Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoradv: fix perftest options
Samuel Pitoiset [Mon, 28 Oct 2019 14:12:27 +0000 (15:12 +0100)]
radv: fix perftest options

RADV_PERFTEST=outooforder was removed a while ago. This fixes
dumping the options into hang reports.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: move nomemorycache debug option at the right palce
Samuel Pitoiset [Mon, 28 Oct 2019 14:12:03 +0000 (15:12 +0100)]
radv: move nomemorycache debug option at the right palce

Fixes: 6571000071d ("radv: add debug option to turn off in memory cache")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: fix dumping SPIR-V into hang reports
Samuel Pitoiset [Mon, 28 Oct 2019 15:56:15 +0000 (16:56 +0100)]
radv: fix dumping SPIR-V into hang reports

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agomesa: enable ARB_gpu_shader_int64 in compat profile
Tapani Pälli [Fri, 25 Oct 2019 08:06:05 +0000 (11:06 +0300)]
mesa: enable ARB_gpu_shader_int64 in compat profile

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agomesa: add [Program]Uniform*64ARB display list support
Tapani Pälli [Fri, 25 Oct 2019 08:00:04 +0000 (11:00 +0300)]
mesa: add [Program]Uniform*64ARB display list support

This is required for int64 to be enabled in compat profile.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoradv: Enable VK_KHR_timeline_semaphore.
Bas Nieuwenhuizen [Fri, 25 Oct 2019 08:26:50 +0000 (10:26 +0200)]
radv: Enable VK_KHR_timeline_semaphore.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoradv: Add wait-before-submit support for timelines.
Bas Nieuwenhuizen [Mon, 28 Oct 2019 01:44:54 +0000 (02:44 +0100)]
radv: Add wait-before-submit support for timelines.

This is actually a non-threaded implementation. I'd summarize this
as event-based submission.

When a submit happens, we walk a tree of submissions that depend on
the syncobj signal operations to be submitted, and if those submissions
have no other dependencies we start to execute them immediately.

Or, well I still use a list to avoid issues with long chains and
the stacksize when using recursion.
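
The event-based walk with a list instead of recursion could be sketched roughly as follows. The struct layout, the fixed-size arrays and all names are hypothetical simplifications, not the radv data structures.

```c
#define SKETCH_MAX 8

struct sketch_submission {
    int pending_deps;           /* unsubmitted submissions we still wait on */
    int dependents[SKETCH_MAX]; /* indices of submissions waiting on us */
    int num_dependents;
    int submitted;
};

/* Submit `first` and everything it unblocks, using an explicit worklist
 * rather than recursion so long chains cannot overflow the stack.
 * Assumes at most SKETCH_MAX submissions and no dependency cycles.
 * Returns the number of submissions executed. */
static int sketch_submit(struct sketch_submission *subs, int first)
{
    int worklist[SKETCH_MAX];
    int head = 0, tail = 0, executed = 0;

    worklist[tail++] = first;
    while (head < tail) {
        struct sketch_submission *s = &subs[worklist[head++]];

        s->submitted = 1;
        executed++;
        for (int i = 0; i < s->num_dependents; i++) {
            struct sketch_submission *d = &subs[s->dependents[i]];

            /* A dependent runs only once its last dependency is gone. */
            if (--d->pending_deps == 0)
                worklist[tail++] = s->dependents[i];
        }
    }
    return executed;
}
```

Each submission enters the worklist at most once, when its dependency count reaches zero, so the work is bounded by the number of submissions regardless of chain length.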

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoradv: Add timelines with a VK_KHR_timeline_semaphore impl.
Bas Nieuwenhuizen [Tue, 22 Oct 2019 08:18:06 +0000 (10:18 +0200)]
radv: Add timelines with a VK_KHR_timeline_semaphore impl.

This does not fully do wait-before-submit, to be done in a follow
up patch.

For kernels without support for timeline syncobjs, this adds an
implementation of non-shareable timelines using legacy syncobjs.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years ago radv: Add temporary datastructure for submissions.
Bas Nieuwenhuizen [Wed, 23 Oct 2019 13:31:43 +0000 (15:31 +0200)]
radv: Add temporary datastructure for submissions.

So we can defer them.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years ago radv: Split semaphore into two parts as enum+union.
Bas Nieuwenhuizen [Sun, 20 Oct 2019 20:50:58 +0000 (22:50 +0200)]
radv: Split semaphore into two parts as enum+union.

This is in preparation for adding more types.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
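
As an illustration of the enum+union layout named in the commit above, here is a minimal sketch. All type, field, and function names are hypothetical, and reading the "two parts" as a permanent and a temporary payload is an assumption, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the enum says which union member is valid. */
enum sem_kind {
    SEM_KIND_NONE,
    SEM_KIND_WINSYS,
    SEM_KIND_SYNCOBJ,
    /* further kinds can be added here without touching the users of
     * the existing ones */
};

struct sem_part {
    enum sem_kind kind;
    union {
        void *ws_sem;      /* valid when kind == SEM_KIND_WINSYS */
        uint32_t syncobj;  /* valid when kind == SEM_KIND_SYNCOBJ */
    } u;
};

/* Assumption: the two parts are a permanent and a temporary payload. */
struct semaphore {
    struct sem_part permanent;
    struct sem_part temporary;
};

/* The temporary part, when present, takes precedence. */
static const struct sem_part *active_part(const struct semaphore *sem)
{
    return sem->temporary.kind != SEM_KIND_NONE ? &sem->temporary
                                                : &sem->permanent;
}

/* Accessors check the tag before touching the union. */
static uint32_t part_syncobj(const struct sem_part *part)
{
    assert(part->kind == SEM_KIND_SYNCOBJ);
    return part->u.syncobj;
}
```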
4 years ago radv: Always enable syncobj when supported for all fences/semaphores.
Bas Nieuwenhuizen [Sun, 20 Oct 2019 17:15:24 +0000 (19:15 +0200)]
radv: Always enable syncobj when supported for all fences/semaphores.

This simplifies code for timeline semaphores by needing to support
fewer configurations.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years ago radv: Improve fence signalling in QueueSubmit.
Bas Nieuwenhuizen [Sun, 20 Oct 2019 17:12:24 +0000 (19:12 +0200)]
radv: Improve fence signalling in QueueSubmit.

Only signalling it once.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years ago radv: Do sparse binding in queue submission.
Bas Nieuwenhuizen [Sat, 19 Oct 2019 15:05:22 +0000 (17:05 +0200)]
radv: Do sparse binding in queue submission.

So we have one place to do queue things if we end up deferring
submissions.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years ago radv: Split out commandbuffer submission.
Bas Nieuwenhuizen [Thu, 3 Oct 2019 19:08:29 +0000 (21:08 +0200)]
radv: Split out commandbuffer submission.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years ago radv: Clean up unused variable.
Bas Nieuwenhuizen [Tue, 1 Oct 2019 16:14:34 +0000 (18:14 +0200)]
radv: Clean up unused variable.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years ago radv: Add an early exit in the secure compile if we already have the cache entries.
Bas Nieuwenhuizen [Wed, 30 Oct 2019 02:29:21 +0000 (03:29 +0100)]
radv: Add an early exit in the secure compile if we already have the cache entries.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago radv: Compute hashes in secure process for secure compilation.
Bas Nieuwenhuizen [Wed, 30 Oct 2019 01:54:37 +0000 (02:54 +0100)]
radv: Compute hashes in secure process for secure compilation.

To prevent poisoning arbitrary cache entries.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago zink: drop nop descriptor-updates
Erik Faye-Lund [Tue, 29 Oct 2019 13:12:02 +0000 (14:12 +0100)]
zink: drop nop descriptor-updates

If there's nothing to be done, let's actually do nothing. Seems like a
good idea.

Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years ago zink: use bitfield for dirty flagging
Erik Faye-Lund [Tue, 29 Oct 2019 12:27:58 +0000 (13:27 +0100)]
zink: use bitfield for dirty flagging

Bitfields are a bit more idiomatic than explicit flags, and harder to
get wrong.

Reviewed-by: Dave Airlie <airlied@redhat.com>
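
To illustrate the difference between explicit flags and bitfields that the commit above refers to, here is a small sketch. The struct, macro, and field names are hypothetical and not taken from zink.

```c
#include <stdbool.h>

/* Explicit flags: callers must remember the mask and the right operator. */
#define DIRTY_PROGRAM     (1u << 0)
#define DIRTY_DESCRIPTORS (1u << 1)

struct ctx_flags {
    unsigned dirty;
};

/* Bitfields: each dirty bit is a named one-bit member, so setting or
 * clearing it is a plain assignment. */
struct ctx_bits {
    unsigned dirty_program : 1;
    unsigned dirty_descriptors : 1;
};

/* Flag version: test and clear with mask arithmetic. */
static bool flush_program_flags(struct ctx_flags *ctx)
{
    if (!(ctx->dirty & DIRTY_PROGRAM))
        return false;
    ctx->dirty &= ~DIRTY_PROGRAM;
    return true;
}

/* Bitfield version: the same logic without any masks. */
static bool flush_program_bits(struct ctx_bits *ctx)
{
    if (!ctx->dirty_program)
        return false;
    ctx->dirty_program = 0;
    return true;
}
```

Both functions return true only when there was work to do, and both leave the other dirty bits untouched.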
4 years ago zink: use dynamic state for line-width
Erik Faye-Lund [Tue, 29 Oct 2019 11:43:56 +0000 (12:43 +0100)]
zink: use dynamic state for line-width

This will lead to fewer pipelines in the cache, which is assumed to
become our most unavoidable performance bottleneck down the line.

Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years ago zink: Use optimal layout instead of general. Reduces valid layer warnings. Fixes...
Duncan Hopkins [Wed, 14 Aug 2019 10:07:47 +0000 (11:07 +0100)]
zink: Use optimal layout instead of general. Reduces valid layer warnings. Fixes RADV image noise.

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
4 years ago gitlab-ci: Disable meson-windows job for the time being
Michel Dänzer [Wed, 30 Oct 2019 08:38:20 +0000 (09:38 +0100)]
gitlab-ci: Disable meson-windows job for the time being

It needs a CI runner carrying the mesa-windows tag, but there's none
available currently.

4 years ago radv: make use of radv_sc_read()
Timothy Arceri [Tue, 29 Oct 2019 06:46:57 +0000 (17:46 +1100)]
radv: make use of radv_sc_read()

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radv: add radv_sc_read() helper
Timothy Arceri [Tue, 29 Oct 2019 06:43:40 +0000 (17:43 +1100)]
radv: add radv_sc_read() helper

This is a function with timeout support for reading from the pipe
between processes used for secure compile.

Initially we hardcode the timeout to 5 seconds. We can adjust the
timeout limit in future if needed.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
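
A pipe read with a timeout, as described in the commit above, is commonly built on select(). The sketch below is not the actual radv_sc_read(): the function name, signature, and the choice to take the timeout as a parameter (the commit hardcodes 5 seconds) are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `size` bytes from `fd`.
 * Returns false on error, on end of file, or if no data arrives within
 * `timeout_sec` seconds. The timeout applies to each wait, not to the
 * whole transfer. */
static bool read_with_timeout(int fd, void *buf, size_t size, int timeout_sec)
{
    char *p = buf;

    while (size > 0) {
        fd_set fds;
        struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };

        FD_ZERO(&fds);
        FD_SET(fd, &fds);

        /* Wait until the pipe is readable or the timeout expires. */
        int ready = select(fd + 1, &fds, NULL, NULL, &tv);
        if (ready < 0 && errno == EINTR)
            continue;
        if (ready <= 0)
            return false; /* timeout (0) or error (< 0) */

        ssize_t n = read(fd, p, size);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return false; /* error, or the writer closed the pipe */

        p += n;
        size -= (size_t)n;
    }
    return true;
}
```

A caller matching the commit's description would pass 5 as the timeout and treat a false return as a failed or stalled peer process.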
4 years ago radv: allow select() calls in secure compile
Timothy Arceri [Tue, 29 Oct 2019 06:41:41 +0000 (17:41 +1100)]
radv: allow select() calls in secure compile

This will be used in the following patch to support timeouts for
reading the pipe between processes.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago mapi: Improve the x86 tsd stubs performance.
Lepton Wu [Wed, 30 Oct 2019 00:52:21 +0000 (17:52 -0700)]
mapi: Improve the x86 tsd stubs performance.

This skips touching %ebx most of the time; glGetString performance
increased from 114M/s to 120M/s on my desktop.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>