mesa.git
5 years agoac: initial Wave32 support in LLVM build helpers
Marek Olšák [Fri, 12 Jul 2019 21:12:17 +0000 (17:12 -0400)]
ac: initial Wave32 support in LLVM build helpers

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi: assume that selector != NULL for compute shaders
Marek Olšák [Tue, 16 Jul 2019 02:00:05 +0000 (22:00 -0400)]
radeonsi: assume that selector != NULL for compute shaders

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi: remove what appears to be legacy compute code
Marek Olšák [Tue, 16 Jul 2019 01:55:43 +0000 (21:55 -0400)]
radeonsi: remove what appears to be legacy compute code

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi: remove si_program::use_code_object_v2
Marek Olšák [Tue, 16 Jul 2019 01:49:30 +0000 (21:49 -0400)]
radeonsi: remove si_program::use_code_object_v2

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi: add si_shader_selector into si_compute
Marek Olšák [Tue, 16 Jul 2019 01:39:22 +0000 (21:39 -0400)]
radeonsi: add si_shader_selector into si_compute

Now we can assume that shader->selector is always set.
This will simplify some code.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi: set threadgroup size to 0 for threadgroups with only 1 wave
Marek Olšák [Fri, 12 Jul 2019 21:22:30 +0000 (17:22 -0400)]
radeonsi: set threadgroup size to 0 for threadgroups with only 1 wave

This has no effect on Wave64.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: set as_ngg for GS prolog
Marek Olšák [Fri, 12 Jul 2019 21:26:24 +0000 (17:26 -0400)]
radeonsi/gfx10: set as_ngg for GS prolog

as_ngg is required by Wave32.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: remove the disable_ngg option
Marek Olšák [Fri, 12 Jul 2019 19:31:14 +0000 (15:31 -0400)]
radeonsi/gfx10: remove the disable_ngg option

because legacy VS hangs.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: combine hw edgeflags with user edgeflags for correct behavior
Marek Olšák [Sat, 6 Jul 2019 04:12:26 +0000 (00:12 -0400)]
radeonsi/gfx10: combine hw edgeflags with user edgeflags for correct behavior

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: deduplicate code for esvert_lds_size
Marek Olšák [Sat, 6 Jul 2019 04:11:36 +0000 (00:11 -0400)]
radeonsi/gfx10: deduplicate code for esvert_lds_size

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: simplify a streamout loop in gfx10_emit_ngg_epilogue
Marek Olšák [Sat, 6 Jul 2019 03:32:36 +0000 (23:32 -0400)]
radeonsi/gfx10: simplify a streamout loop in gfx10_emit_ngg_epilogue

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: don't use MALLOC for outputs
Marek Olšák [Sat, 6 Jul 2019 03:22:33 +0000 (23:22 -0400)]
radeonsi/gfx10: don't use MALLOC for outputs

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: clean up ESGS ring size computation
Marek Olšák [Sat, 6 Jul 2019 02:19:47 +0000 (22:19 -0400)]
radeonsi/gfx10: clean up ESGS ring size computation

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: fix unnecessary LDS overallocation for NGG GS
Marek Olšák [Sat, 6 Jul 2019 02:12:36 +0000 (22:12 -0400)]
radeonsi/gfx10: fix unnecessary LDS overallocation for NGG GS

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: don't compile the GS copy shader if it's 100% not needed
Marek Olšák [Sat, 6 Jul 2019 01:19:41 +0000 (21:19 -0400)]
radeonsi/gfx10: don't compile the GS copy shader if it's 100% not needed

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: set GE_CNTL.PACKET_TO_ONE_PA for NGG
Marek Olšák [Sat, 6 Jul 2019 01:06:04 +0000 (21:06 -0400)]
radeonsi/gfx10: set GE_CNTL.PACKET_TO_ONE_PA for NGG

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: update a tunable max_es_verts_base for NGG
Marek Olšák [Fri, 5 Jul 2019 21:53:47 +0000 (17:53 -0400)]
radeonsi/gfx10: update a tunable max_es_verts_base for NGG

We have to fix the computation so as not to break quads.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: implement ARB_post_depth_coverage
Marek Olšák [Fri, 5 Jul 2019 21:30:08 +0000 (17:30 -0400)]
radeonsi/gfx10: implement ARB_post_depth_coverage

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi: fix leaked compute shader NIR
Marek Olšák [Tue, 16 Jul 2019 04:08:27 +0000 (00:08 -0400)]
radeonsi: fix leaked compute shader NIR

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi: save the enable_nir option in the shader cache correctly
Marek Olšák [Fri, 12 Jul 2019 19:42:44 +0000 (15:42 -0400)]
radeonsi: save the enable_nir option in the shader cache correctly

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi/gfx10: enable SDMA
Marek Olšák [Sat, 13 Jul 2019 00:21:01 +0000 (20:21 -0400)]
radeonsi/gfx10: enable SDMA

no changes since gfx9 for buffers

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoac: use llvm.amdgcn.writelane
Marek Olšák [Tue, 16 Jul 2019 05:07:49 +0000 (01:07 -0400)]
ac: use llvm.amdgcn.writelane

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoac: fix shader clock on LLVM 9
Marek Olšák [Tue, 16 Jul 2019 03:42:35 +0000 (23:42 -0400)]
ac: fix shader clock on LLVM 9

Probably relevant commit:

commit dd32dc3f72ec99b1794d62c74d2beb3b60468d50
Author: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com>
Date:   Tue Jul 9 03:10:18 2019 +0000

    [AMDGPU] Always use s_memtime for readcyclecounter

    Differential Revision: https://reviews.llvm.org/D64369

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@365431 91177308-0d34-0410-b5e6-96231b3b80d8

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeon/vcn: adding engine type for new fw interface
Boyuan Zhang [Wed, 15 May 2019 19:05:21 +0000 (15:05 -0400)]
radeon/vcn: adding engine type for new fw interface

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradeonsi: use the correct buffer size in si_vid_clear_buffer
Marek Olšák [Mon, 8 Jul 2019 18:57:42 +0000 (14:57 -0400)]
radeonsi: use the correct buffer size in si_vid_clear_buffer

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agomesa: add EXT_dsa glEnabledIndexedEXT
Pierre-Eric Pelloux-Prayer [Fri, 26 Apr 2019 14:50:57 +0000 (16:50 +0200)]
mesa: add EXT_dsa glEnabledIndexedEXT

The implementation uses _mesa_ActiveTexture to change the active texture unit and
then reset it.

It causes an unnecessary _NEW_TEXTURE_STATE but:
  - adding an index argument to _mesa_set_enable causes a lot of changes (~140 callers)
  - enable_texture (called by _mesa_set_enable) might cause a _NEW_TEXTURE_STATE
    anyway.
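
The save, switch, and restore pattern described above can be sketched roughly as follows. This is only an illustration, not Mesa's code: `struct ctx`, `set_active_unit`, `set_enable`, and `set_enable_indexed` are hypothetical stand-ins for the GL context, `_mesa_ActiveTexture`, `_mesa_set_enable`, and the new indexed entry point.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_UNITS 8   /* illustrative limits, not Mesa's */
#define MAX_CAPS  4

/* Hypothetical, simplified context: per-unit enable bits plus the active unit. */
struct ctx {
   unsigned active_unit;
   bool enabled[MAX_UNITS][MAX_CAPS];
};

/* Stand-in for _mesa_ActiveTexture: selects the unit later calls operate on. */
static void
set_active_unit(struct ctx *c, unsigned unit)
{
   assert(unit < MAX_UNITS);
   c->active_unit = unit;
}

/* Stand-in for _mesa_set_enable: it has no index argument, so it always
 * acts on the currently active unit. */
static void
set_enable(struct ctx *c, unsigned cap, bool state)
{
   assert(cap < MAX_CAPS);
   c->enabled[c->active_unit][cap] = state;
}

/* Indexed variant built on top of the two helpers above. */
static void
set_enable_indexed(struct ctx *c, unsigned cap, unsigned unit, bool state)
{
   unsigned saved = c->active_unit;

   set_active_unit(c, unit);   /* temporarily switch units */
   set_enable(c, cap, state);
   set_active_unit(c, saved);  /* restore the caller's active unit */
}
```

The trade-off is the one the message names: the unit switch may flag texture state as dirty even when nothing changed, but no existing caller of the non-indexed helper needs to be touched.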

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: add EXT_dsa glGetTextureLevelParameter*vEXT functions
Pierre-Eric Pelloux-Prayer [Mon, 20 May 2019 12:12:54 +0000 (14:12 +0200)]
mesa: add EXT_dsa glGetTextureLevelParameter*vEXT functions

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: add EXT_dsa gl(Copy)Texture(Sub)Image1D/2D/3DEXT functions
Pierre-Eric Pelloux-Prayer [Fri, 26 Apr 2019 14:50:31 +0000 (16:50 +0200)]
mesa: add EXT_dsa gl(Copy)Texture(Sub)Image1D/2D/3DEXT functions

Added functions:
- glTextureImage1DEXT
- glTextureImage2DEXT
- glTextureImage3DEXT
- glTextureSubImage1DEXT
- glTextureSubImage3DEXT
- glCopyTextureImage1DEXT
- glCopyTextureImage2DEXT
- glCopyTextureSubImage1DEXT
- glCopyTextureSubImage2DEXT
- glCopyTextureSubImage3DEXT
- glGetTextureImageEXT

All but the last one can be compiled in a display list.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: move lookup_texture_ext_dsa up in teximage.c
Pierre-Eric Pelloux-Prayer [Tue, 2 Jul 2019 09:33:36 +0000 (11:33 +0200)]
mesa: move lookup_texture_ext_dsa up in teximage.c

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: pass gl_texture_object as arg to not depend on state
Pierre-Eric Pelloux-Prayer [Tue, 2 Jul 2019 09:32:06 +0000 (11:32 +0200)]
mesa: pass gl_texture_object as arg to not depend on state

This will allow the same functions to be used for the EXT_dsa implementation.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: refactor get_texture_image to remove duplicate code
Pierre-Eric Pelloux-Prayer [Tue, 2 Jul 2019 08:59:21 +0000 (10:59 +0200)]
mesa: refactor get_texture_image to remove duplicate code

Move shared code into a new function (_get_texture_image) and use it instead
of duplicating the same lines.
It will also be used by the EXT_dsa functions (GetTextureImageEXT and GetMultiTexImageEXT).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agopipe-loader: use radeonsi for MM if amdgpu dri is used
Jeremy Newton [Wed, 10 Jul 2019 14:23:53 +0000 (10:23 -0400)]
pipe-loader: use radeonsi for MM if amdgpu dri is used

The amdgpu dri is used for the closed source AMD driver. Since this driver
does not implement multimedia, we fall back to radeonsi in mesa to do
multimedia. This corrects the dri driver name for when it is set to amdgpu.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
5 years agoegl: drop incorrect pkg-config file for glvnd
Eric Engestrom [Thu, 4 Jul 2019 13:48:43 +0000 (14:48 +0100)]
egl: drop incorrect pkg-config file for glvnd

With b01524fff05eef66e8cd ("meson: don't build libGLES*.so with GLVND")
we dropped the incorrect pkg-config files for GLES*.

Since then, the glvnd issue of its missing files has become painfully
apparent, since it breaks the build for everyone using glvnd.

NVIDIA has had a fix for a few years now, but has yet to accept it:
https://github.com/NVIDIA/libglvnd/pull/86

Since the breakage is already there, let's clean up everything on our side
while we wait for NVIDIA to accept the fix.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agodocs: simplify `Fixes:` git command
Eric Engestrom [Fri, 19 Jul 2019 20:27:44 +0000 (21:27 +0100)]
docs: simplify `Fixes:` git command

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agomesa/tests: add missing dep_thread
Eric Engestrom [Fri, 19 Jul 2019 14:00:35 +0000 (15:00 +0100)]
mesa/tests: add missing dep_thread

Fixes: f8c27c277585141f2d27 ("state_tracker: Move the format test out to be an actual unit test.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
5 years agoutil: drop strncat(), strcmp(), strncmp(), snprintf() & vsnprintf() MSVC fallbacks
Eric Engestrom [Thu, 4 Jul 2019 15:13:34 +0000 (16:13 +0100)]
util: drop strncat(), strcmp(), strncmp(), snprintf() & vsnprintf() MSVC fallbacks

It would seem MSVC>=2015 is now C99-compliant wrt these functions:
strncat:   https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strncat-strncat-l-wcsncat-wcsncat-l-mbsncat-mbsncat-l?view=vs-2017
strcmp:    https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strcmp-wcscmp-mbscmp?view=vs-2017
strncmp:   https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/strncmp-wcsncmp-mbsncmp-mbsncmp-l?view=vs-2017
snprintf:  https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/snprintf-snprintf-snprintf-l-snwprintf-snwprintf-l?view=vs-2017
vsnprintf: https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/vsnprintf-vsnprintf-vsnprintf-l-vsnwprintf-vsnwprintf-l?view=vs-2017

Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoutil: use standard name for vsnprintf()
Eric Engestrom [Tue, 20 Nov 2018 12:02:36 +0000 (12:02 +0000)]
util: use standard name for vsnprintf()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoutil: use standard name for snprintf()
Eric Engestrom [Tue, 20 Nov 2018 11:59:28 +0000 (11:59 +0000)]
util: use standard name for snprintf()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoutil: use standard name for vasprintf()
Eric Engestrom [Tue, 20 Nov 2018 11:55:55 +0000 (11:55 +0000)]
util: use standard name for vasprintf()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoutil: use standard name for sprintf()
Eric Engestrom [Tue, 20 Nov 2018 11:55:00 +0000 (11:55 +0000)]
util: use standard name for sprintf()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoutil: use standard name for strcmp()
Eric Engestrom [Tue, 20 Nov 2018 11:49:52 +0000 (11:49 +0000)]
util: use standard name for strcmp()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoutil: use standard name for strcasecmp()
Eric Engestrom [Tue, 20 Nov 2018 11:39:28 +0000 (11:39 +0000)]
util: use standard name for strcasecmp()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoutil: use standard name for strncmp()
Eric Engestrom [Tue, 20 Nov 2018 11:48:38 +0000 (11:48 +0000)]
util: use standard name for strncmp()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoutil: use standard name for strncat()
Eric Engestrom [Tue, 20 Nov 2018 11:47:06 +0000 (11:47 +0000)]
util: use standard name for strncat()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoutil: use standard name for strdup()
Eric Engestrom [Tue, 20 Nov 2018 11:42:14 +0000 (11:42 +0000)]
util: use standard name for strdup()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoutil: use standard name for strchrnul()
Eric Engestrom [Tue, 20 Nov 2018 11:24:55 +0000 (11:24 +0000)]
util: use standard name for strchrnul()

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoutil: drop unused vsprintf() wrapper
Eric Engestrom [Tue, 20 Nov 2018 11:57:03 +0000 (11:57 +0000)]
util: drop unused vsprintf() wrapper

Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoutil: drop unused strchr() wrapper
Eric Engestrom [Tue, 20 Nov 2018 11:51:35 +0000 (11:51 +0000)]
util: drop unused strchr() wrapper

Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoutil: drop unused strstr() wrapper
Eric Engestrom [Tue, 20 Nov 2018 11:43:57 +0000 (11:43 +0000)]
util: drop unused strstr() wrapper

Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agonir: Only rematerialize comparisons with all SSA sources
Jason Ekstrand [Fri, 19 Jul 2019 18:07:39 +0000 (13:07 -0500)]
nir: Only rematerialize comparisons with all SSA sources

Otherwise, you may end up moving a register read and that could result
in an incorrect shader.  This commit fixes a rendering issue in Elite:
Dangerous.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111152
Fixes: 3ee2e84c60 "nir: Rematerialize compare instructions"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agospirv: Fix order of barriers in SpvOpControlBarrier
Daniel Schürmann [Thu, 18 Jul 2019 18:48:14 +0000 (20:48 +0200)]
spirv: Fix order of barriers in SpvOpControlBarrier

Semantically, the memory barrier has to come first to wait
for the completion of pending memory requests.
Afterwards, the workgroups can be synchronized.
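
The ordering described above can be illustrated with a small sketch. The opcodes, the `struct builder` type, and `lower_control_barrier` are hypothetical and only record the emission order; the actual SPIR-V to NIR code emits NIR intrinsics instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcodes; stand-ins for the real barrier intrinsics. */
enum op { OP_MEMORY_BARRIER, OP_CONTROL_BARRIER };

/* Hypothetical builder that just logs what was emitted, in order. */
struct builder {
   enum op ops[8];
   int count;
};

static void
emit(struct builder *b, enum op op)
{
   assert(b->count < 8);
   b->ops[b->count++] = op;
}

/* Emit the memory barrier first so pending memory requests complete,
 * then the execution barrier that synchronizes the workgroup. */
static void
lower_control_barrier(struct builder *b, bool has_memory_semantics)
{
   if (has_memory_semantics)
      emit(b, OP_MEMORY_BARRIER);
   emit(b, OP_CONTROL_BARRIER);
}
```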

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: use a switch when printing intrinsic indices
Caio Marcelo de Oliveira Filho [Thu, 7 Mar 2019 19:07:04 +0000 (11:07 -0800)]
nir: use a switch when printing intrinsic indices

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
5 years agonir/algebraic: mark a few comparison simplifications as precise
Rhys Perry [Mon, 1 Jul 2019 14:49:40 +0000 (15:49 +0100)]
nir/algebraic: mark a few comparison simplifications as precise

No vkpipeline-db changes found.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/algebraic: optimize contradictory iand operands
Rhys Perry [Fri, 28 Jun 2019 15:13:04 +0000 (16:13 +0100)]
nir/algebraic: optimize contradictory iand operands

Some of these were found in a few GTAV, Rise of the Tomb Raider and
Shadow of the Tomb Raider shaders.
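
As an illustration of what "contradictory iand operands" means, the sketch below assumes a pair such as `a < b` and `a >= b`; the exact patterns added by the commit are not listed in this message. Both conditions can never hold at once, so the AND folds to false. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Unoptimized form: two comparisons combined with an AND. */
static bool
contradictory_and(int a, int b)
{
   return (a < b) & (a >= b);
}

/* Folded form: the two conditions are mutually exclusive. */
static bool
folded(int a, int b)
{
   (void)a;
   (void)b;
   return false;
}

/* Exhaustively compare both forms over a small inclusive range. */
static bool
equivalent_on_range(int lo, int hi)
{
   assert(lo <= hi);
   for (int a = lo; a <= hi; a++) {
      for (int b = lo; b <= hi; b++) {
         if (contradictory_and(a, b) != folded(a, b))
            return false;
      }
   }
   return true;
}
```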

Results from vkpipeline-db run with ACO:
Totals from affected shaders:
SGPRS: 376 -> 376 (0.00 %)
VGPRS: 220 -> 220 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 13492 -> 11560 (-14.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 69 -> 69 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: use False instead of 0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agolima/ppir: handle all node types in ppir_node_replace_child
Erico Nunes [Tue, 16 Jul 2019 23:31:01 +0000 (01:31 +0200)]
lima/ppir: handle all node types in ppir_node_replace_child

ppir_node_replace_child is used by the const lowering routine in ppir.
All types need to be handled here, otherwise the src node is not updated
properly when one of the lowered nodes is a const, which results in, for
example, regalloc not assigning registers correctly.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years agolima/ppir: branch regalloc fixes
Erico Nunes [Tue, 16 Jul 2019 23:30:55 +0000 (01:30 +0200)]
lima/ppir: branch regalloc fixes

The branch instruction has sources which must be handled in src handling
paths so that regalloc assigns registers to them properly.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
5 years agomain: Destroy static hash table
Yevhenii Kolesnikov [Thu, 18 Jul 2019 14:38:48 +0000 (17:38 +0300)]
main: Destroy static hash table

format_array_format_table has a static lifetime - it will be destroyed
by an atexit handler.
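
A minimal sketch of this pattern, assuming a lazily created table whose cleanup is registered with the standard `atexit()`. The names `format_table`, `get_format_table`, and `destroy_format_table` are hypothetical, a plain array stands in for the hash table, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SLOTS 16   /* illustrative size */

/* Hypothetical stand-in for the static hash table. */
static int *format_table;

/* Runs at normal process exit and releases the table. */
static void
destroy_format_table(void)
{
   free(format_table);
   format_table = NULL;
}

/* Creates the table on first use and registers its cleanup handler. */
static int *
get_format_table(void)
{
   if (!format_table) {
      format_table = calloc(TABLE_SLOTS, sizeof(*format_table));
      assert(format_table != NULL);   /* sketch: allocation failure is fatal */
      atexit(destroy_format_table);
   }
   return format_table;
}
```

Registering the handler at creation time means that a process that never touches the table pays nothing at exit.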

Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoradv: reset the window scissor with no clear state.
Dave Airlie [Thu, 18 Jul 2019 01:19:11 +0000 (11:19 +1000)]
radv: reset the window scissor with no clear state.

If we don't have clear state (which gfx10 doesn't currently)
we will fail to reset the scissor. AMDVLK will leave it set
to something else.

Marek also has this fix for radeonsi pending.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: fix crash in shader tracing.
Dave Airlie [Thu, 18 Jul 2019 00:44:10 +0000 (10:44 +1000)]
radv: fix crash in shader tracing.

Enabling tracing, and then having a vmfault, can lead to a segfault
before we print out the traces if a meta shader is executing
and we don't have the NIR for it.

Just pass the stage and give back a default.

Fixes: 9b9ccee4d64 ("radv: take LDS into account for compute shader occupancy stats")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoiris: change last_vue_stage() to look at uncompiled shaders
Timothy Arceri [Fri, 28 Jun 2019 01:13:11 +0000 (11:13 +1000)]
iris: change last_vue_stage() to look at uncompiled shaders

This allows us to find the last vue stage before we have compiled
the shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agonir/lower_clip: add support for geometry shaders
Timothy Arceri [Thu, 27 Jun 2019 04:20:37 +0000 (14:20 +1000)]
nir/lower_clip: add support for geometry shaders

This will be used to enable compat profile support for geometry
shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agonir/lower_clip: add lower_clip_outputs() helper
Timothy Arceri [Fri, 28 Jun 2019 00:50:15 +0000 (10:50 +1000)]
nir/lower_clip: add lower_clip_outputs() helper

This will be reused in the following patch to add support for clip
vertex lowering in geometry shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agonir/lower_clip: add create_clipdist_vars() helper
Timothy Arceri [Fri, 28 Jun 2019 00:35:11 +0000 (10:35 +1000)]
nir/lower_clip: add create_clipdist_vars() helper

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agonir/lower_clip: add a find_clipvertex_and_position_outputs() helper
Timothy Arceri [Fri, 28 Jun 2019 00:10:28 +0000 (10:10 +1000)]
nir/lower_clip: add a find_clipvertex_and_position_outputs() helper

This will allow code sharing in a following patch that adds support
for lowering in geometry shaders. It also allows us to exit early
if there is no lowering to do which allows a small code tidy up.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agopanfrost: Set rt_count
Alyssa Rosenzweig [Thu, 18 Jul 2019 19:43:39 +0000 (12:43 -0700)]
panfrost: Set rt_count

This doesn't quite work yet, but it illustrates how MRT is implemented
in the MFBD: rt_count is set appropriately based on the number of render
targets, while additional render target descriptors are appended on with
an index variable in them (not quite decoded since there are some aspects
we don't understand there, but conceptually this should be right).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Trace invisible BOs
Alyssa Rosenzweig [Thu, 18 Jul 2019 19:42:27 +0000 (12:42 -0700)]
panfrost: Trace invisible BOs

Helps make the decode a little more readable (names instead of
addresses).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/decode: Preserve empty tiler heap symmetry
Alyssa Rosenzweig [Thu, 18 Jul 2019 19:28:56 +0000 (12:28 -0700)]
panfrost/decode: Preserve empty tiler heap symmetry

If tiler_heap_end == tiler_heap_start, ensure it's printed the same
rather than one erroring out as hex.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Zero polygon list body size for clears
Alyssa Rosenzweig [Thu, 18 Jul 2019 19:10:39 +0000 (12:10 -0700)]
panfrost: Zero polygon list body size for clears

There are no polygons, so you can't have any size to the polygon list,
although there is a minimal header.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/mfbd: Unify depth-only with masked FBO path
Alyssa Rosenzweig [Thu, 18 Jul 2019 18:09:19 +0000 (11:09 -0700)]
panfrost/mfbd: Unify depth-only with masked FBO path

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Simplify set_framebuffer_state
Alyssa Rosenzweig [Thu, 18 Jul 2019 18:05:01 +0000 (11:05 -0700)]
panfrost: Simplify set_framebuffer_state

Most of the ad hoc logic is already in Gallium.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Check for NULL surface in places
Alyssa Rosenzweig [Thu, 18 Jul 2019 17:59:59 +0000 (10:59 -0700)]
panfrost: Check for NULL surface in places

Fixes a bunch of NULL dereferences, although it does cause GPU faults of
course.

This is caused by color buffers masked out in MRT, which we'll
eventually have to solve the right way... one thing at a time.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Expose 4 render targets
Alyssa Rosenzweig [Thu, 18 Jul 2019 17:48:19 +0000 (10:48 -0700)]
panfrost: Expose 4 render targets

Hidden behind deqp flag as usual.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Shrink tiler heap
Alyssa Rosenzweig [Thu, 18 Jul 2019 20:04:44 +0000 (13:04 -0700)]
panfrost: Shrink tiler heap

128MB is excessive and 16MB is still plenty. Saves 112MB/context on
kernels without growable/heap support.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agonir/large_constants: De-duplicate constants
Caio Marcelo de Oliveira Filho [Fri, 7 Jun 2019 16:28:14 +0000 (09:28 -0700)]
nir/large_constants: De-duplicate constants

If a function has a constant and is called more than once, after
inlining we may end up with different variables representing the same
constant.  This commit looks into the data and de-duplicates them.

The first pass now will collect the constant data in a per variable
buffer, then de-duplication happens (by sorting then linear walk), and
the second pass will use the data in var->data.location.
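
The "sort, then linear walk" step can be sketched as below. This is a simplified illustration rather than the pass itself: `struct const_blob`, `blob_cmp`, and `dedup_blobs` are hypothetical names, and each blob simply records which original entry it was merged into.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the per-variable constant data. */
struct const_blob {
   const void *data;   /* constant bytes collected in the first pass */
   size_t size;        /* size of the data in bytes */
   int canonical;      /* output: original index of the blob to use */
   int index;          /* original position, filled in by dedup_blobs() */
};

/* Order by size, then contents, then original position. */
static int
blob_cmp(const void *pa, const void *pb)
{
   const struct const_blob *a = pa, *b = pb;

   if (a->size != b->size)
      return a->size < b->size ? -1 : 1;
   int c = memcmp(a->data, b->data, a->size);
   if (c != 0)
      return c;
   return a->index - b->index;
}

/* Sorts the blobs so that duplicates become adjacent, then walks the
 * array once. Returns the number of unique blobs. Note that the array
 * is reordered as a side effect. */
static size_t
dedup_blobs(struct const_blob *blobs, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      assert(blobs[i].data != NULL);
      blobs[i].index = (int)i;
   }
   qsort(blobs, count, sizeof(*blobs), blob_cmp);

   size_t unique = 0;
   for (size_t i = 0; i < count; i++) {
      if (i > 0 && blobs[i].size == blobs[i - 1].size &&
          memcmp(blobs[i].data, blobs[i - 1].data, blobs[i].size) == 0) {
         /* Same contents as the previous entry: share its canonical blob. */
         blobs[i].canonical = blobs[i - 1].canonical;
      } else {
         blobs[i].canonical = blobs[i].index;
         unique++;
      }
   }
   return unique;
}
```

Sorting makes equal blobs adjacent, so a single pass is enough to find every duplicate.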

One side-effect of the current implementation is that constants will
be reordered.  If this turns out to be a problem, it can be fixed.

An alternative strategy considered was to perform this on a
per-function basis and then merge the results; the problem is that we
would have to fix up the offsets during the merge.  Given the data we
have, the current patch is good enough.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir/large_constants: Use ralloc for var_infos
Caio Marcelo de Oliveira Filho [Fri, 7 Jun 2019 16:21:09 +0000 (09:21 -0700)]
nir/large_constants: Use ralloc for var_infos

This will be used later on to allocate constant data for each
variable (and then deduplicate).  Also drop initializing found_read,
as it is already implicitly false in the literal.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agofreedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper.
Eric Anholt [Tue, 16 Jul 2019 18:19:28 +0000 (11:19 -0700)]
freedreno: Convert nir_lower_tg4_to_tex to the NIR lowering helper.

Cuts a bunch of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: Convert load_barycentric_at_sample to the NIR lowering helper.
Eric Anholt [Tue, 16 Jul 2019 18:15:15 +0000 (11:15 -0700)]
freedreno: Convert load_barycentric_at_sample to the NIR lowering helper.

Cuts out a ton of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: Convert load_barycentric_at_offset to the NIR lowering helper.
Eric Anholt [Tue, 16 Jul 2019 18:09:39 +0000 (11:09 -0700)]
freedreno: Convert load_barycentric_at_offset to the NIR lowering helper.

Cuts out a ton of boilerplate.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agov3d: Use nir_shader_lower_instructions() for txf_ms lowering.
Eric Anholt [Tue, 16 Jul 2019 17:55:56 +0000 (10:55 -0700)]
v3d: Use nir_shader_lower_instructions() for txf_ms lowering.

Cuts out a bunch of boilerplate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
5 years agonir: Allow internal changes to the instr in nir_shader_lower_instructions().
Eric Anholt [Tue, 16 Jul 2019 17:52:25 +0000 (10:52 -0700)]
nir: Allow internal changes to the instr in nir_shader_lower_instructions().

v3d's NIR txf_ms lowering wants to swizzle around the input coordinates in
NIR, but doesn't generate a new txf_ms instruction as a replacement.  It's
pretty easy to allow that in nir_shader_lower_instructions, and it may be
common in lowering passes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agovc4: Convert vc4_nir_lower_txf_ms to nir_shader_lower_instructions().
Eric Anholt [Tue, 16 Jul 2019 17:21:13 +0000 (10:21 -0700)]
vc4: Convert vc4_nir_lower_txf_ms to nir_shader_lower_instructions().

Cuts out a bunch of boilerplate.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
5 years agov3d: Fix assertion failures in debug builds.
Eric Anholt [Tue, 16 Jul 2019 18:59:35 +0000 (11:59 -0700)]
v3d: Fix assertion failures in debug builds.

nir_lower_io leaves around deref_var instructions after lowering away
deref intrinsics.  This ends up breaking validation after v3d_nir_lower_io
removes variables not actually being stored by the shader's
store_output()s.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
5 years agopanfrost: Handle Z24 textures
Alyssa Rosenzweig [Thu, 18 Jul 2019 00:39:34 +0000 (17:39 -0700)]
panfrost: Handle Z24 textures

Just use the Z32 code.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/ci: Update expectations
Alyssa Rosenzweig [Thu, 18 Jul 2019 00:17:41 +0000 (17:17 -0700)]
panfrost/ci: Update expectations

We just fixed some stencil tests.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Make scissor test more robust
Alyssa Rosenzweig [Wed, 17 Jul 2019 23:30:09 +0000 (16:30 -0700)]
panfrost: Make scissor test more robust

See v3d implementation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Use correct NO_DITHER field on MFBD
Alyssa Rosenzweig [Wed, 17 Jul 2019 23:19:45 +0000 (16:19 -0700)]
panfrost: Use correct NO_DITHER field on MFBD

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Implement Z32F(_S8) support
Alyssa Rosenzweig [Wed, 17 Jul 2019 22:49:42 +0000 (15:49 -0700)]
panfrost: Implement Z32F(_S8) support

Z32F uses a dedicated float path. Z32F_S8 uses separate stencil planes
in the hardware, lowered via u_transfer_helper.
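
A minimal sketch of what splitting a packed depth/stencil buffer into
separate planes can look like. It assumes a 64-bit pixel with a 32-bit float
depth followed by a word whose low 8 bits hold stencil; the struct and
function names are hypothetical, and this is not the u_transfer_helper code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed packed layout: float depth, then stencil in the low byte of the
 * second word with 24 unused bits above it. */
struct z32f_s8x24 {
    float z;
    uint32_t s8x24;
};

/* Copy `count` packed pixels into a separate depth plane and stencil plane. */
static void
split_z32f_s8(const struct z32f_s8x24 *src, size_t count,
              float *z_plane, uint8_t *s_plane)
{
    assert(src != NULL && z_plane != NULL && s_plane != NULL);
    for (size_t i = 0; i < count; i++) {
        z_plane[i] = src[i].z;
        s_plane[i] = (uint8_t) (src[i].s8x24 & 0xff);
    }
}
```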

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost/decode: Don't disassemble NULL shaders
Alyssa Rosenzweig [Wed, 17 Jul 2019 22:43:24 +0000 (15:43 -0700)]
panfrost/decode: Don't disassemble NULL shaders

It is legal to load a shader from a NULL address, particularly when the
TILER job is used strictly for effects on the Z/S buffer with 0x0 color
mask. Don't crash the decoder in this case.
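
The guard amounts to something like the following sketch. The function name
and output text are hypothetical, not the actual decoder's; the point is
only that a zero address is reported rather than dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical decoder hook. Returns true if a shader was disassembled. */
static bool
decode_shader(uint64_t shader_addr, FILE *out)
{
    assert(out != NULL);

    if (shader_addr == 0) {
        /* A NULL shader is legal, so note it and move on. */
        fprintf(out, "/* no shader */\n");
        return false;
    }

    /* A real decoder would map the address and disassemble here. */
    fprintf(out, "shader at 0x%llx\n", (unsigned long long) shader_addr);
    return true;
}
```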

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopanfrost: Copy stencil front to back if back disabled
Alyssa Rosenzweig [Wed, 17 Jul 2019 22:42:48 +0000 (15:42 -0700)]
panfrost: Copy stencil front to back if back disabled

When backside stenciling is disabled, backfacing primitives just do the
same thing as frontfacing primitives.
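
In other words, the back-face state falls back to the front-face state. A
minimal sketch, using hypothetical simplified structs rather than the real
Gallium or panfrost state:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-face stencil state. */
struct stencil_side {
    bool enabled;
    unsigned func;
    unsigned ref;
    unsigned mask;
};

struct stencil_state {
    struct stencil_side front;
    struct stencil_side back;
};

/* Return the state that back-facing primitives should use: the back state
 * if it is enabled, otherwise a copy of the front state. */
static struct stencil_side
effective_back_stencil(const struct stencil_state *s)
{
    assert(s != NULL);
    return s->back.enabled ? s->back : s->front;
}
```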

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agoswr/rast: Refactor memory API between rasterizer core and swr
Jan Zielinski [Wed, 17 Jul 2019 15:22:16 +0000 (17:22 +0200)]
swr/rast: Refactor memory API between rasterizer core and swr

This commit cleans up the API between the core of the rasterizer and swr.
It also makes some formatting changes.

Reviewed-by: Alok Hota <alok.hota@intel.com>
5 years agolima/ppir: Add gl_PointCoord handling
Andreas Baierl [Fri, 31 May 2019 07:54:27 +0000 (09:54 +0200)]
lima/ppir: Add gl_PointCoord handling

Treat gl_PointCoord as a system value and
add the necessary bits for correct codegen.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agogallium: Add PIPE_CAP_TGSI_FS_POINT_IS_SYSVAL
Andreas Baierl [Tue, 4 Jun 2019 11:25:28 +0000 (13:25 +0200)]
gallium: Add PIPE_CAP_TGSI_FS_POINT_IS_SYSVAL

This adds an option to treat gl_PointCoord as a system value.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agonir/tgsi: Extend tgsi_to_nir.c to support gl_PointCoord as a system value.
Andreas Baierl [Tue, 11 Jun 2019 12:59:11 +0000 (14:59 +0200)]
nir/tgsi: Extend tgsi_to_nir.c to support gl_PointCoord as a system value.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agonir: Add gl_PointCoord system value
Andreas Baierl [Tue, 4 Jun 2019 11:24:53 +0000 (13:24 +0200)]
nir: Add gl_PointCoord system value

gl_PointCoord handling needs some special bits set in lima/ppir code
generation. Treating gl_PointCoord as a system value makes it easier
to distinguish from a regular varying.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoglsl: Optionally declare gl_PointCoord as a system value
Andreas Baierl [Tue, 4 Jun 2019 11:23:44 +0000 (13:23 +0200)]
glsl: Optionally declare gl_PointCoord as a system value

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agolima/gp: Fix problem with complex moves
Connor Abbott [Sat, 11 May 2019 16:43:30 +0000 (18:43 +0200)]
lima/gp: Fix problem with complex moves

When writing the scheduler, we forgot that you can't read the complex
unit in certain sources because it gets overwritten to 0 or 1. Fixing
this turned out to be possible without giving up and reducing
GPIR_VALUE_REG_NUM to 10, although it was difficult in a way I didn't
expect. There can be at most 4 next-max nodes that can't have moves
scheduled in the complex slot, so it actually isn't a problem for
getting the number of next-max nodes at 5 or lower. However, it is a
problem for stores. If a given node is a next-max node whose move cannot
go in the complex slot *and* is used by a store that we decide to
schedule, we have to reserve one of the non-complex slots for a move
instead of all the slots, or we can wind up in a situation where only
the complex slot is free and we fail the move. This means that we have
to add another term to the reservation logic, for stores whose children
cannot be in the complex slot.

Acked-by: Qiang Yu <yuq825@gmail.com>
5 years agolima/gpir: Rework the scheduler
Connor Abbott [Thu, 11 Jan 2018 23:35:58 +0000 (18:35 -0500)]
lima/gpir: Rework the scheduler

Now, we do scheduling at the same time as value register allocation. The
ready list now acts similarly to the array of registers in
value_regalloc, keeping us from running out of slots. Before this, the
value register allocator wasn't aware of the scheduling constraints of
the actual machine, which meant that it sometimes chose the wrong false
dependencies to insert. Now, we assign value registers at the same time
as we actually schedule instructions, making its choices reflect reality
much better. It was also conservative in some cases where the new scheme
doesn't have to be. For example, in something like:

1 = ld_att
2 = ld_uni
3 = add 1, 2

It's possible that one of 1 and 2 can't be scheduled in the same
instruction as 3, meaning that a move needs to be inserted, so the value
register allocator needs to assume that this sequence requires two
registers. But when actually scheduling, we could discover that 1, 2,
and 3 can all be scheduled together, so that they only require one
register. The new scheduler speculatively inserts the instruction under
consideration, as well as all of its child load instructions, and then
counts the number of live value registers after all is said and done.
This lets us be more aggressive with scheduling when we're close to the
limit.
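
The counting step can be illustrated with a deliberately simplified model
(not the lima code): a value is assumed to occupy a value register only if
some user sits in a different instruction, is not yet scheduled, or lies
outside the nodes being considered. All names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define UNSCHEDULED (-1)
#define MAX_SRCS 2

/* Hypothetical scheduler node. */
struct node {
    int instr;           /* instruction index, or UNSCHEDULED */
    int srcs[MAX_SRCS];  /* indices of source nodes, -1 if unused */
    int has_later_use;   /* nonzero if used outside this node array */
};

/* Count values that must stay live in a register under the model above. */
static int
count_value_regs(const struct node *nodes, size_t n)
{
    int regs = 0;

    assert(nodes != NULL);
    for (size_t def = 0; def < n; def++) {
        if (nodes[def].instr == UNSCHEDULED)
            continue;

        int needs_reg = nodes[def].has_later_use;
        for (size_t use = 0; use < n && !needs_reg; use++) {
            for (int s = 0; s < MAX_SRCS; s++) {
                if (nodes[use].srcs[s] == (int) def &&
                    nodes[use].instr != nodes[def].instr)
                    needs_reg = 1;
            }
        }
        regs += needs_reg;
    }
    return regs;
}
```

With ld_att, ld_uni and add all placed in instruction 0, only the add's
result needs a register, so the count is 1. If ld_att ends up in a different
instruction than the add, its value must be held as well and the count is 2.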

With the new scheduler, the kmscube vertex shader is now scheduled in 40
instructions, versus 66 before.

Acked-by: Qiang Yu <yuq825@gmail.com>
5 years agolima/gp: Mark more add-only nodes as maybe-two-slot
Connor Abbott [Mon, 22 Apr 2019 19:54:06 +0000 (21:54 +0200)]
lima/gp: Mark more add-only nodes as maybe-two-slot

Reviewed-by: Qiang Yu <yuq825@gmail.com>
5 years agolima/gpir: Fix some bugs in instruction handling
Connor Abbott [Tue, 16 Jan 2018 00:38:17 +0000 (19:38 -0500)]
lima/gpir: Fix some bugs in instruction handling

Reviewed-by: Qiang Yu <yuq825@gmail.com>
5 years agolima: Reintroduce the standalone compiler
Connor Abbott [Fri, 3 Nov 2017 21:34:32 +0000 (17:34 -0400)]
lima: Reintroduce the standalone compiler

I used this to test things without needing to have a device handy.

Acked-by: Qiang Yu <yuq825@gmail.com>