mesa.git
5 years agopan/midgard: Add OP_IS_DERIVATIVE helper
Alyssa Rosenzweig [Mon, 29 Jul 2019 23:52:55 +0000 (16:52 -0700)]
pan/midgard: Add OP_IS_DERIVATIVE helper

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Add make_compiler_temp_reg helper
Alyssa Rosenzweig [Mon, 29 Jul 2019 23:52:36 +0000 (16:52 -0700)]
pan/midgard: Add make_compiler_temp_reg helper

Corrollary to make_compiler_temp (for SSA).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Move nir_*_src_index to compiler.h
Alyssa Rosenzweig [Mon, 29 Jul 2019 22:10:41 +0000 (15:10 -0700)]
pan/midgard: Move nir_*_src_index to compiler.h

These helpers are useful for code emission everywhere. Share the love!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Disassemble unknown texture ops as hex
Alyssa Rosenzweig [Mon, 29 Jul 2019 21:07:42 +0000 (14:07 -0700)]
pan/midgard: Disassemble unknown texture ops as hex

I'm not sure why I ever thought decimal was a good idea.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Add support for disassembling derivatives
Alyssa Rosenzweig [Mon, 29 Jul 2019 21:07:19 +0000 (14:07 -0700)]
pan/midgard: Add support for disassembling derivatives

They're just texture ops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agonir/find_array_copies: Use correct parent array length
Connor Abbott [Tue, 30 Jul 2019 09:05:22 +0000 (11:05 +0200)]
nir/find_array_copies: Use correct parent array length

instr->type is the type of the array element, not the type of the array
being dereferenced. Rather than fishing out the parent type, just use
parent->num_children which should be the length plus 1. While we're here
add another assert for the issue fixed by the previous commit.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111251
Fixes: 156306e5e62 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Fix comparison for nir_deref_instr_is_known_out_of_bounds()
Connor Abbott [Tue, 30 Jul 2019 09:04:14 +0000 (11:04 +0200)]
nir: Fix comparison for nir_deref_instr_is_known_out_of_bounds()

There was an off-by-one error.

Fixes: 156306e5e62 ("nir/find_array_copies: Handle wildcards and overlapping copies")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoradv/gfx10: only compile the GS copy shader on-demand
Samuel Pitoiset [Tue, 30 Jul 2019 13:14:35 +0000 (15:14 +0200)]
radv/gfx10: only compile the GS copy shader on-demand

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agogitlab-ci: Fix scons build directory path
Michel Dänzer [Fri, 26 Jul 2019 10:20:41 +0000 (12:20 +0200)]
gitlab-ci: Fix scons build directory path

Fixes: dd3d0b2897b8 "gitlab-ci: Only keep the build logs as artifacts."
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoswr/rasterizer: Add memory tracking support
Jan Zielinski [Fri, 26 Jul 2019 07:37:12 +0000 (09:37 +0200)]
swr/rasterizer: Add memory tracking support

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
5 years agoswr/rasterizer: Better implementation of scatter
Jan Zielinski [Wed, 24 Jul 2019 10:25:27 +0000 (12:25 +0200)]
swr/rasterizer: Better implementation of scatter

Added support for avx512 scatter instruction. Non-avx512 will
now call into a C function to do the scatter emulation.

This has better jit compile performance than
the previous approach of jitting scalar loops.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
5 years agoswr/rasterizer: cleanups for tessellation
Jan Zielinski [Wed, 24 Jul 2019 10:10:27 +0000 (12:10 +0200)]
swr/rasterizer: cleanups for tessellation

This commit introduces small fixes in preparation for tessellation
support.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
5 years agorasterizer/swr: move BucketMgr to SwrContext
Jan Zielinski [Wed, 24 Jul 2019 10:03:49 +0000 (12:03 +0200)]
rasterizer/swr: move BucketMgr to SwrContext

This move gets us back to parity  with global manager
in that we can dump render context buckets now.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
5 years agov3d: take into account separate_stencil when checking if stencil should be cleared
Alejandro Piñeiro [Mon, 29 Jul 2019 11:06:44 +0000 (13:06 +0200)]
v3d: take into account separate_stencil when checking if stencil should be cleared

In most cases this is not needed because the usual is that when a
separate stencil is written, the parent resource is also written.

This is needed if we have a separate stencil, no depth buffer, and the
source and destination is the same, as in that case the stencil can be
updated, but not the parent source (like if you are blitting only the
stencil buffer). On that situation, the following access to the
stencil buffer would clear the stencil buffer (so overwritting the
previous blitting) cleared because the parent source has
v3d_resource.writes to 0.

As far as I see, that situation only happens with the
GL_DEPTH32F_STENCIL8 format.

Note that one alternative would consider that if the separate_stencil
has been written, the parent should also be considered written (and
update its "writes" field accordingly). But I found this patch more
natural.

Fixes the following piglit tests:
   spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-blit
   spec/arb_depth_buffer_float/fbo-stencil-gl_depth32f_stencil8-copypixels

the latter regressed when internally glCopyPixels implementation
started to use blitting. So:

Fixes: 131d40cfc91f ("st/mesa: accelerate glCopyPixels(STENCIL)")
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoradv: Don't include radv_private.h from radv_shader.h
Daniel Schürmann [Mon, 29 Jul 2019 15:51:01 +0000 (17:51 +0200)]
radv: Don't include radv_private.h from radv_shader.h

This patch decouples radv_shader.h from any LLVM dependency.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoi965/gen10: Remove unnecessary workaround.
Rafael Antognolli [Fri, 19 Jul 2019 22:41:35 +0000 (15:41 -0700)]
i965/gen10: Remove unnecessary workaround.

In fact, the description of the workaround states that the mask field
doesn't work correctly on gen10, and we need to set it to 0xffff even we
we only want to update a single field:

 "The mask bits are not implemented properly on 3DSTATE_3D_MODE.  Driver
 must always program bits 31:16 of DW1 a value of 0xFFFF.   This means
 if it is only updating 1 field, it must update all the fields to the
 correct value."

So unless we want to change any of the fields of 3DSTATE_3D_MODE,
there's not need to emit. Additionally, it seems this workaround is not
required on gen11. And last but not least, this workaround is not
implemented on iris or anv, and it doesn't seem to be missed there.

So let's just remove the whole thing.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Fix SO offset to be 32-bit in DrawTransformFeedback handling
Kenneth Graunke [Mon, 29 Jul 2019 22:33:02 +0000 (15:33 -0700)]
iris: Fix SO offset to be 32-bit in DrawTransformFeedback handling

We accidentally started copying a full 64-bit value rather than copying
a 32-bit offset and zeroing the top 32-bits.  This caused us to compute
bogus vertex counts which could lead to GPU hangs in some cases.

Thanks to Clayton Craft for catching the regressions!

Fixes: 0e24d10ff5c ("iris: Use gen_mi_builder to handle CS ALU operations.")
5 years agointel: Use a system value for gl_FragCoord
Jason Ekstrand [Thu, 18 Jul 2019 14:59:44 +0000 (09:59 -0500)]
intel: Use a system value for gl_FragCoord

It's kind-of an anomaly that the Intel drivers are still treating
gl_FragCoord as an input.  It also makes zero sense because we have to
special-case it in the back-end.

Because ANV is the only user of nir_lower_wpos_center, we go ahead and
just update it to look for nir_intrinsic_load_frag_coord as part of this
patch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoglsl: Treat gl_FragCoord as a varying even when it's a system value
Jason Ekstrand [Fri, 19 Jul 2019 15:42:56 +0000 (10:42 -0500)]
glsl: Treat gl_FragCoord as a varying even when it's a system value

This fixes glsl-fcoord-invariant-pass.shader_test on drivers that set
GLSLFragCoordIsSysVal which includes radeonsi among others.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agomesa/spirv: Set frag_coord_is_sysval to GLSLFragCoordIsSysVal
Jason Ekstrand [Thu, 18 Jul 2019 18:54:57 +0000 (13:54 -0500)]
mesa/spirv: Set frag_coord_is_sysval to GLSLFragCoordIsSysVal

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agointel/fs: Remove calculate_urb_setup from fs_visitor
Jason Ekstrand [Thu, 18 Jul 2019 14:15:15 +0000 (09:15 -0500)]
intel/fs: Remove calculate_urb_setup from fs_visitor

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agofreedreno/a6xx: fix MSAA resolve hangs
Rob Clark [Wed, 24 Jul 2019 20:31:13 +0000 (13:31 -0700)]
freedreno/a6xx: fix MSAA resolve hangs

Seems like RB_BLIT_SCISSOR needs to be aligned to (minimum?) tile size.

Fixes intermittent GPU hangs triggered by some of the three.js samples
on https://threejs.org/

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agofreedreno/ir3: fix for array/reg store vs meta instructions
Rob Clark [Fri, 19 Jul 2019 23:47:15 +0000 (16:47 -0700)]
freedreno/ir3: fix for array/reg store vs meta instructions

fishgl.com has a shader which does roughly:

   foo = texture(...);
   if (bar)
      foo = texture(...);

after lowering phi webs to regs we end up w/ a vec4 reg (array).  But
since it was not an indirect access, we try to skip the extra mov.  This
results that the per-component fanout (split) meta instructions store
directly to the reg (array).  Which doesn't work out in RA.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agomeson: bump required version to 0.46
Eric Engestrom [Thu, 18 Jul 2019 11:55:09 +0000 (12:55 +0100)]
meson: bump required version to 0.46

0.45 has a few annoying bugs (like the one in !358 [1]), and 0.46 is
well over a year old by now, so let's move to it.

[1] https://gitlab.freedesktop.org/mesa/mesa/merge_requests/358

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agoradeon/vcn/vp9: add Arcturus VP9 support
Leo Liu [Fri, 12 Jul 2019 13:47:44 +0000 (09:47 -0400)]
radeon/vcn/vp9: add Arcturus VP9 support

Arcturus CHIP enum is less than Navi10, since it's still gfx9,
but its VCN version belongs to VCN2.x

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agoradeon/vcn: add Arcturus decode support
Leo Liu [Thu, 20 Jun 2019 13:00:27 +0000 (09:00 -0400)]
radeon/vcn: add Arcturus decode support

different internal registers offset from previous HW

Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agoamd: add support for Arcturus
Marek Olšák [Mon, 22 Jul 2019 19:11:37 +0000 (15:11 -0400)]
amd: add support for Arcturus

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agoradeonsi: add AMD_DEBUG=nogfx for testing
Marek Olšák [Mon, 22 Jul 2019 19:09:54 +0000 (15:09 -0400)]
radeonsi: add AMD_DEBUG=nogfx for testing

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agoradeonsi: add support for compute-only chips
Marek Olšák [Thu, 7 Feb 2019 05:04:32 +0000 (00:04 -0500)]
radeonsi: add support for compute-only chips

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agogallium/auxiliary/vl: add compute shaders for deint yuv
Sonny Jiang [Fri, 7 Jun 2019 20:07:29 +0000 (16:07 -0400)]
gallium/auxiliary/vl: add compute shaders for deint yuv

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agogallium/auxiliary/vl: don't call gfx functions on compute-only chips
Sonny Jiang [Fri, 17 May 2019 19:07:29 +0000 (15:07 -0400)]
gallium/auxiliary/vl: don't call gfx functions on compute-only chips

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agogallium/auxiliary/vl: add PIPE_CAP_GRAPHICS check for vl compositor
James Zhu [Fri, 26 Apr 2019 15:40:09 +0000 (11:40 -0400)]
gallium/auxiliary/vl: add PIPE_CAP_GRAPHICS check for vl compositor

Init graphic shader Only when PIPE_CAP_GRAPHICS is true.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agogallium: create multimedia contexts as compute-only if graphics is unsupported
Marek Olšák [Thu, 7 Feb 2019 05:13:44 +0000 (00:13 -0500)]
gallium: create multimedia contexts as compute-only if graphics is unsupported

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agogallium: add PIPE_CAP_GRAPHICS
Marek Olšák [Thu, 7 Feb 2019 05:06:28 +0000 (00:06 -0500)]
gallium: add PIPE_CAP_GRAPHICS

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agoradv: implement VK_EXT_index_type_uint8
Samuel Pitoiset [Mon, 29 Jul 2019 08:50:56 +0000 (10:50 +0200)]
radv: implement VK_EXT_index_type_uint8

Natively supported on VI+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoanv: implement VK_EXT_index_type_uint8
Lionel Landwerlin [Mon, 13 May 2019 15:33:22 +0000 (16:33 +0100)]
anv: implement VK_EXT_index_type_uint8

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agovulkan: Bump headers to 1.1.117
Lionel Landwerlin [Mon, 13 May 2019 15:29:31 +0000 (16:29 +0100)]
vulkan: Bump headers to 1.1.117

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoinclude/vulkan: bump vk_android_native_buffer
Lionel Landwerlin [Mon, 29 Jul 2019 13:00:00 +0000 (16:00 +0300)]
include/vulkan: bump vk_android_native_buffer

Taken off https://android.googlesource.com/platform/frameworks/native/+/refs/tags/android-9.0.0_r45/vulkan/include/vulkan/vk_android_native_buffer.h

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/mi: only resolve to a temp register if source isn't in memory
Eric Engestrom [Mon, 29 Jul 2019 14:11:13 +0000 (15:11 +0100)]
intel/mi: only resolve to a temp register if source isn't in memory

aka. fix a s/||/&&/ typo

Fixes: 74063ee61aadd1371a9b ("intel/mi: Add a new gen_mi_store_if() helper.")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agogitlab-ci: Enable freedreno shader-db runs.
Eric Anholt [Thu, 25 Jul 2019 22:46:51 +0000 (15:46 -0700)]
gitlab-ci: Enable freedreno shader-db runs.

Now that helgrind is less upset and I've completed many successful
full shader-db runs, we should be able to enable freedreno shader-db
runs for Mesa checkins on the tiny public shader-db.

Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agonir: Fix helgrind complaints about data race in trivial_swizzle init.
Eric Anholt [Thu, 25 Jul 2019 20:37:28 +0000 (13:37 -0700)]
nir: Fix helgrind complaints about data race in trivial_swizzle init.

Even if the data race wasn't real (I'm not great at reasoning about
this), helgrind is a nice enough tool that keeping noise out of it is
probably worthwhile.  Besides, typing out the numbers keeps the data
in the read-only data section instead of emitting code to initialize
it every time.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
5 years agofreedreno: Fix data race on making the shader's id.
Eric Anholt [Thu, 25 Jul 2019 20:26:01 +0000 (13:26 -0700)]
freedreno: Fix data race on making the shader's id.

The value is only used for IR3_DBG_DISASM, but it cleans up the
helgrind output.

Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: Take a lock around shader variant creation.
Eric Anholt [Thu, 25 Jul 2019 20:21:56 +0000 (13:21 -0700)]
freedreno: Take a lock around shader variant creation.

Shaders are shared across contexts in gallium (part of making it so
that you get shader compile at link time, for shader-db and to reduce
compiles at draw time).  So, we need to protect from variant creation
for a shader from multiple threads at the same time.

Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: Fix data races with allocating/freeing struct ir3.
Eric Anholt [Thu, 25 Jul 2019 20:09:51 +0000 (13:09 -0700)]
freedreno: Fix data races with allocating/freeing struct ir3.

There is a single ir3_compiler in the screen, and each context may be
compiling ir3 shaders, which call ir3_create.  ralloc doesn't do any
locking on its own, so eventually you can end up racing to break
ralloc's linked lists.

We really don't want struct ir3 to live as long as the compiler (maybe
struct ir3_shader's lifetime, if anything), so you'd better be freeing
it anyway.

Fixes: 8fe20762433d ("freedreno/ir3: convert over to ralloc")
Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: Fix helgrind complaint on shader-db key setup.
Eric Anholt [Thu, 25 Jul 2019 19:58:59 +0000 (12:58 -0700)]
freedreno: Fix helgrind complaint on shader-db key setup.

If the variable's going to be static, we shouldn't be memsetting it
from every thread and instead just have it in the data section.

Reviewed-by: Rob Clark <robdclark@gmail.com>
5 years agoradv: Take variable descriptor counts into account for buffer entries.
Bas Nieuwenhuizen [Mon, 29 Jul 2019 14:52:23 +0000 (16:52 +0200)]
radv: Take variable descriptor counts into account for buffer entries.

Fixes: b5e04e9217b "radv: Support allocating variable size descriptor sets."
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111019
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoanv: Don't claim support for 24 and 48-bit formats on IVB
Jason Ekstrand [Sat, 27 Jul 2019 13:55:47 +0000 (08:55 -0500)]
anv: Don't claim support for 24 and 48-bit formats on IVB

Cc: mesa-stable@lists.freedesktop.org
5 years agoisl/formats: R8G8B8_UNORM_SRGB isn't supported on HSW
Jason Ekstrand [Fri, 26 Jul 2019 22:41:59 +0000 (17:41 -0500)]
isl/formats: R8G8B8_UNORM_SRGB isn't supported on HSW

On Haswell, the format works but it doesn't properly do an sRGB decode.
It appears to act identically to R8G8B8_UNORM.  Only Vulkan uses this
format so this only affects Vulkan on HSW.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
5 years agopan/midgard: Fix alpha test w.r.t new indexing
Alyssa Rosenzweig [Mon, 29 Jul 2019 15:31:03 +0000 (08:31 -0700)]
pan/midgard: Fix alpha test w.r.t new indexing

Fixes: 9beb3391b55 ("pan/midgard: Tag SSA/reg")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agosoftpipe: Don't draw when rasterizer_discard is set
Gert Wollny [Thu, 25 Jul 2019 13:30:09 +0000 (15:30 +0200)]
softpipe: Don't draw when rasterizer_discard is set

Fixes:
  dEQP-GLES3.functional.rasterizer_discard.basic.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.basic.write_stencil_points
  dEQP-GLES3.functional.rasterizer_discard.fbo.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.fbo.write_stencil_points
  dEQP-GLES3.functional.rasterizer_discard.scissor.write_depth_points
  dEQP-GLES3.functional.rasterizer_discard.scissor.write_stencil_points

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years agosoftpipe: Fix cube arrays layer selection
Gert Wollny [Thu, 18 Jul 2019 06:21:39 +0000 (08:21 +0200)]
softpipe: Fix cube arrays layer selection

To select the correct layer the z-coordinate must be rounded before it
is multiplied by six.

Fixes a number of tests out of
   dEQP-GLES31.functional.texture.filtering.cube_array.formats.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years agovulkan/wsi/wayland: implement acquire timeout
Lionel Landwerlin [Fri, 26 Jul 2019 09:08:13 +0000 (12:08 +0300)]
vulkan/wsi/wayland: implement acquire timeout

v2: Eric's nits

v3: Reuse timespec utils (Daniel)
    Deal with ppoll being interrupted by a signal (Daniel)

v4: Remove unnecessary time check

v5: Deal with EAGAIN from wl_display_prepare_read_queue() (Daniel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v2)
Reviewed-by: Daniel Stone <daniels@collabora.com>
5 years agoutil: add a timespec helper
Lionel Landwerlin [Mon, 29 Jul 2019 09:54:04 +0000 (12:54 +0300)]
util: add a timespec helper

Copied from Weston, upon Daniel's suggestion

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
5 years agointel: replace large stack buffer with heap allocation
Eric Engestrom [Thu, 18 Oct 2018 16:19:56 +0000 (17:19 +0100)]
intel: replace large stack buffer with heap allocation

For now, this keeps the "100 bytes" allocation; we can try to figure out
the correct size as a follow up.

Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoradv/gfx10: do not use the fast depth or stencil clear bytes path
Samuel Pitoiset [Mon, 29 Jul 2019 12:15:23 +0000 (14:15 +0200)]
radv/gfx10: do not use the fast depth or stencil clear bytes path

It causes issues on GFX10.

This fixes rendering issues with vkmark and Wreckfest at least.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
5 years agoac: do not crash when the buffer data format is invalid
Samuel Pitoiset [Mon, 29 Jul 2019 10:03:46 +0000 (12:03 +0200)]
ac: do not crash when the buffer data format is invalid

This might happen when a pipeline doesn't define the vertex input
state, so the buffer data format is 0 (aka INVALID).

This fixes crashes when compiling some shaders on GFX10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoac/nir: fix txf_ms with an offset
Rhys Perry [Fri, 19 Jul 2019 14:20:44 +0000 (15:20 +0100)]
ac/nir: fix txf_ms with an offset

Seems to fix some hair artifacts in Max Payne 3:
https://github.com/daniel-schuermann/mesa/issues/76

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: f4e499ec791 ('radv: add initial non-conformant radv vulkan driver')
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: Delete unused local variables in optimization loop
Connor Abbott [Wed, 26 Jun 2019 12:03:31 +0000 (14:03 +0200)]
radv: Delete unused local variables in optimization loop

Totals from affected shaders:
SGPRS: 376 -> 376 (0.00 %)
VGPRS: 620 -> 560 (-9.68 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 292 -> 292 (0.00 %) dwords per thread
Code Size: 20024 -> 20144 (0.60 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 25 -> 25 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agonir/find_array_copies: Handle wildcards and overlapping copies
Connor Abbott [Tue, 18 Jun 2019 10:12:49 +0000 (12:12 +0200)]
nir/find_array_copies: Handle wildcards and overlapping copies

This commit rewrites opt_find_array_copies to be able to handle
an array copy sequence with other intervening operations in between. In
particular, this handles the case where we OpLoad an array of structs
and then OpStore it, which generates code like:

foo[0].a = bar[0].a
foo[0].b = bar[0].b
foo[1].a = bar[1].a
foo[1].b = bar[1].b
...

that wasn't recognized by the previous pass.

In order to correctly handle copying arrays of arrays, and in particular
to correctly handle copies involving wildcards, we need to use a tree
structure similar to lower_vars_to_ssa so that we can walk all the
partial array copies invalidated by a particular write, including
ones where one of the common indices is a wildcard. I actually think
that when factoring in the needed hashing/comparing code, a hash table
based approach wouldn't be a lot smaller anyways.

All of the changes come from tessellation control shaders in Strange
Brigade, where we're able to remove the DXVK-inserted copy at the
beginning of the shader. These are the result for radv:

Totals from affected shaders:
SGPRS: 4576 -> 4576 (0.00 %)
VGPRS: 13784 -> 5560 (-59.66 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8696 -> 6876 (-20.93 %) dwords per thread
Code Size: 329940 -> 263268 (-20.21 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 330 -> 898 (172.12 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Print array deref indices as decimal
Connor Abbott [Wed, 26 Jun 2019 11:41:20 +0000 (13:41 +0200)]
nir: Print array deref indices as decimal

We print the size as decimal too, and using hex without a leading "0x"
was very confusing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agolima/gpir/sched: Handle more special ops in can_use_complex()
Connor Abbott [Sat, 27 Jul 2019 18:24:32 +0000 (20:24 +0200)]
lima/gpir/sched: Handle more special ops in can_use_complex()

We were missing handling for a few other ops that rearrange their
sources somehow in codegen, namely complex2 and select.

This should fix spec@glsl-1.10@execution@built-in-functions@vs-asin-vec3
and possibly other random regressions from the new scheduler which were
supposed to be fixed in the commit right after.

Fixes: 54434fe6706 ("lima/gpir: Rework the scheduler")
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Acked-by: Qiang Yu <yuq825@gmail.com>
5 years agolima/gp: Clean up lima_program_optimize_vs_nir() a little
Connor Abbott [Sat, 27 Jul 2019 17:38:53 +0000 (19:38 +0200)]
lima/gp: Clean up lima_program_optimize_vs_nir() a little

Remove an unnecessary nir_lower_regs_to_ssa as that should be done by
the state tracker, and add a missing DCE pass after running copy
propagation in order to remove the dead copies. This shouldn't fix
anything but the second part will reduce shader sizes.

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
5 years agolima/gpir/sched: Don't try to spill when something else has succeeded
Connor Abbott [Sat, 27 Jul 2019 16:31:09 +0000 (18:31 +0200)]
lima/gpir/sched: Don't try to spill when something else has succeeded

In try_node(), we assume that the node we pick can still be scheduled
successfully after speculatively trying all the other nodes. Normally we
always undo every node after speculating it, so that when we finally
schedule best_node the scheduler state is exactly the same and it
succeeds. However, we also try to spill nodes, which can change the
state and in a corner case that can make scheduling best_node fail. In
particular, the following sequence of events happened with piglit
shaders@glsl-vs-if-nested: a partially-ready node N was spilled and a
register store node S, which is a use of N, was created and then later
the other uses of N were scheduled, so that S is now ready and N is
partially ready. First we try to schedule S and succeed, then we try to
schedule another node M, which fails, so we try to spill the remaining
uses of N. This succeeds, but scheduling M still fails so that best_node
is still S. However since one of the uses of N is one cycle ago, and
therefore we inserted a read dependent on S one cycle ago when spilling
N, S can no longer be scheduled as read-after-write latency is three
cycles.

While we could ad-hoc try to catch cases like this, or (the best option
but very complicated) treat the spill as speculative and roll it back if
we decide not to schedule the node, a simpler solution is to just
give up on spilling if we've already successfully speculatively
scheduled another node. We'd give up a few cases where we discover that
by spilling even harder we could schedule a more desirable node, but
that seems like it would be pretty rare in practice. With this we
guarantee that nothing has been touched after best_node was successfully
scheduled. We also cut down on pointless spilling, since if we already
scheduled a node it's unlikely that spilling harder will let us schedule
an even better node, and hence any spilling at this point is probably
useless.

While we're here, clean up the code around spilling by flattening the
two if's and getting rid of the second unnecessary check for INT_MIN.

Fixes: 54434fe6706 ("lima/gpir: Rework the scheduler")
Acked-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonv50/ir: don't consider the main compute function as taking arguments
Ilia Mirkin [Fri, 26 Jul 2019 05:18:23 +0000 (01:18 -0400)]
nv50/ir: don't consider the main compute function as taking arguments

With OpenCL, kernels can take arguments and return values (?). However
in practice, there is no more TGSI compute implementation, and even if
there were, it would probably have named functions and no explicit main.

This improves RA considerably for compute shaders, since temps are not
kept around as return values.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
5 years agonv50/ir: handle insn not being there for definition of CVT arg
Ilia Mirkin [Fri, 26 Jul 2019 05:01:45 +0000 (01:01 -0400)]
nv50/ir: handle insn not being there for definition of CVT arg

This can happen if it's e.g. a uniform or a function argument.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111217
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
5 years agonouveau: flip DEBUG -> !NDEBUG
Ilia Mirkin [Fri, 26 Jul 2019 03:23:08 +0000 (23:23 -0400)]
nouveau: flip DEBUG -> !NDEBUG

The meson conversion chose to change the meaning of DEBUG to "used for
debugging" to be "used for expensive things for debugging", primarily
for nir_validate. Flip things over so that we get nice things with
optimizations enabled.

While we're at it, also kill off nouveau_statebuf.h which is unused (and
has a mention of DEBUG which is how I found it).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
5 years agonvc0: allow a non-user buffer to be bound at position 0
Ilia Mirkin [Fri, 26 Jul 2019 03:27:56 +0000 (23:27 -0400)]
nvc0: allow a non-user buffer to be bound at position 0

Previously the code only handled it for positions 1 and up (as would be
for UBO's in GL). It's not a lot of trouble to handle this, and vl or
vdpau want this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
5 years agonv50,nvc0: update sampler/view bind functions to accept NULL array
Ilia Mirkin [Fri, 26 Jul 2019 03:26:46 +0000 (23:26 -0400)]
nv50,nvc0: update sampler/view bind functions to accept NULL array

Apparently vl (or vdpau) wants to pass that in now. Handle it.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Cc: mesa-stable@lists.freedesktop.org
5 years agogallium/vl: fix compute tgsi shaders to not process undefined components
Ilia Mirkin [Fri, 26 Jul 2019 03:18:18 +0000 (23:18 -0400)]
gallium/vl: fix compute tgsi shaders to not process undefined components

This caused nouveau's function handling logic to think that the MAIN
function was due to receive external parameters, and cascaded some
failures after that. Instead avoid having the undefined components in
the first place.

Fixes: f6ac0b5d71 (gallium/auxiliary/vl: Add compute shader to support video compositor render)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111213
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111217
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agopan/midgard: Introduce invert field
Alyssa Rosenzweig [Fri, 26 Jul 2019 18:15:31 +0000 (11:15 -0700)]
pan/midgard: Introduce invert field

This will enable us to fuse inverts in various ways. Marginal hurt:

total instructions in shared programs: 3610 -> 3611 (0.03%)
instructions in affected programs: 67 -> 68 (1.49%)
helped: 0
HURT: 1

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Tag SSA/reg
Alyssa Rosenzweig [Fri, 26 Jul 2019 18:30:06 +0000 (11:30 -0700)]
pan/midgard: Tag SSA/reg

Rather than putting registers after SSA in the MIR indexing, put them
side-by-side, shifted 1, using the bottom bit as the SSA/reg select.
This will allow us to generate SSA temps in the compiler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agoradeon/vcn: enable rate control for hevc encoding
Boyuan Zhang [Mon, 17 Jun 2019 19:02:32 +0000 (15:02 -0400)]
radeon/vcn: enable rate control for hevc encoding

Set cu_qp_delta_enable_flag on when rate control is enabled, and set it
off when rate control is disabled (e.g. constant qp).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org
V2: fix typo and add bugzilla info

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
5 years agoradeon/uvd: enable rate control for hevc encoding
Boyuan Zhang [Mon, 17 Jun 2019 19:00:53 +0000 (15:00 -0400)]
radeon/uvd: enable rate control for hevc encoding

Set cu_qp_delta_enable_flag on when rate control is enabled, and set it
off when rate control is disabled (e.g. constant qp).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org
V2: fix typo and add bugzilla info

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
5 years agoradeon/vcn: fix poc for hevc encode
Boyuan Zhang [Wed, 29 May 2019 18:25:38 +0000 (14:25 -0400)]
radeon/vcn: fix poc for hevc encode

MaxPicOrderCntLsb should be at least 16 according to the spec,
therefore add minimum value check.

Also use poc value passed from st instead of calculation
in slice header encoding.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org
V2: Fix typo

V3: Use MAX2 macro instead of coding. Also MaxPicOrderCntLsb
should be power of 2 according to spec.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
5 years agoradeon/uvd: fix poc for hevc encode
Boyuan Zhang [Wed, 29 May 2019 18:25:07 +0000 (14:25 -0400)]
radeon/uvd: fix poc for hevc encode

MaxPicOrderCntLsb should be at least 16 according to the spec,
therefore add minimum value check.

Also use poc value passed from st instead of calculation
in slice header encoding.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110673
Cc: mesa-stable@lists.freedesktop.org
V2: Fix typo

V3: Use MAX2 macro instead of coding. Also MaxPicOrderCntLsb
should be power of 2 according to spec.

Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Leo Liu <leo.liu@amd.com>
5 years agonir: Optimize umod lowering
Sagar Ghuge [Mon, 22 Jul 2019 23:30:56 +0000 (16:30 -0700)]
nir: Optimize umod lowering

We don't have calculate final quotient in order to calculate unsigned
modulo result.  Once we are done with error correction we have partial
result which can be used to find out modulo operation result

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agopan/midgard: Improve scheduling
Alyssa Rosenzweig [Fri, 26 Jul 2019 17:28:46 +0000 (10:28 -0700)]
pan/midgard: Improve scheduling

Make scalar scheduling onto vector units more aggressive (it can only
help while we schedule strictly in order). Also, allow imov on VLUT.

total bundles in shared programs: 2176 -> 2117 (-2.71%)
bundles in affected programs: 901 -> 842 (-6.55%)
helped: 24
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 2.46 x̃: 2
helped stats (rel) min: 2.08% max: 20.00% x̄: 8.68% x̃: 5.94%
95% mean confidence interval for bundles value: -3.93 -0.99
95% mean confidence interval for bundles %-change: -10.92% -6.45%
Bundles are helped.

total quadwords in shared programs: 3605 -> 3566 (-1.08%)
quadwords in affected programs: 1984 -> 1945 (-1.97%)
helped: 28
HURT: 5
helped stats (abs) min: 1 max: 3 x̄: 1.68 x̃: 2
helped stats (rel) min: 1.02% max: 14.29% x̄: 5.12% x̃: 2.94%
HURT stats (abs)   min: 1 max: 3 x̄: 1.60 x̃: 1
HURT stats (rel)   min: 0.57% max: 9.09% x̄: 6.40% x̃: 9.09%
95% mean confidence interval for quadwords value: -1.67 -0.69
95% mean confidence interval for quadwords %-change: -5.37% -1.37%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Specialize mod checking by type when checking constants
Alyssa Rosenzweig [Fri, 26 Jul 2019 15:50:22 +0000 (08:50 -0700)]
pan/midgard: Specialize mod checking by type when checking constants

Fixes inlining of integer constants.

total quadwords in shared programs: 3585 -> 3568 (-0.47%)
quadwords in affected programs: 625 -> 608 (-2.72%)
helped: 13
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.31 x̃: 1
helped stats (rel) min: 1.27% max: 9.52% x̄: 3.84% x̃: 2.94%
95% mean confidence interval for quadwords value: -1.60 -1.02
95% mean confidence interval for quadwords %-change: -5.60% -2.07%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Use more aggressive writeout criteria
Alyssa Rosenzweig [Fri, 26 Jul 2019 15:30:22 +0000 (08:30 -0700)]
pan/midgard: Use more aggressive writeout criteria

We loosen the requirement of "no dependencies" to simply be "no
non-pipelined dependencies", so we check for what could be pipelined.

total bundles in shared programs: 2176 -> 2156 (-0.92%)
bundles in affected programs: 779 -> 759 (-2.57%)
helped: 20
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.33% max: 20.00% x̄: 6.47% x̃: 2.78%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -9.44% -3.50%
Bundles are helped.

total quadwords in shared programs: 3605 -> 3585 (-0.55%)
quadwords in affected programs: 1391 -> 1371 (-1.44%)
helped: 20
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 14.29% x̄: 3.84% x̃: 1.64%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -5.73% -1.94%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Pipeline non-SSA registers
Alyssa Rosenzweig [Fri, 26 Jul 2019 16:37:58 +0000 (09:37 -0700)]
pan/midgard: Pipeline non-SSA registers

Rather than bailing if we see something that's not SSA, do out the
analysis to check if we can pipeline and do so if we can.

total registers in shared programs: 392 -> 391 (-0.26%)
registers in affected programs: 3 -> 2 (-33.33%)
helped: 1
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Add mir_mask_of_read_components helper
Alyssa Rosenzweig [Fri, 26 Jul 2019 16:37:28 +0000 (09:37 -0700)]
pan/midgard: Add mir_mask_of_read_components helper

This facilitates analysis of vec4 registers (after going out-of-SSA).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Add mir_is_written_before helper
Alyssa Rosenzweig [Fri, 26 Jul 2019 16:20:52 +0000 (09:20 -0700)]
pan/midgard: Add mir_is_written_before helper

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Obey fragment writeout criteria
Alyssa Rosenzweig [Tue, 16 Jul 2019 22:57:19 +0000 (15:57 -0700)]
pan/midgard: Obey fragment writeout criteria

Rather than always emitting an extra move for fragments, check the
actual criteria and emit accordingly. (This was lost during the RA
improvements at the end of May).

total bundles in shared programs: 2210 -> 2176 (-1.54%)
bundles in affected programs: 501 -> 467 (-6.79%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.59% max: 33.33% x̄: 13.13% x̃: 12.50%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -16.06% -10.21%
Bundles are helped.

total quadwords in shared programs: 3639 -> 3605 (-0.93%)
quadwords in affected programs: 795 -> 761 (-4.28%)
helped: 34
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.96% max: 33.33% x̄: 11.22% x̃: 8.33%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -14.31% -8.13%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Add post-RA move elimination
Alyssa Rosenzweig [Thu, 25 Jul 2019 22:34:13 +0000 (15:34 -0700)]
pan/midgard: Add post-RA move elimination

Think of this pass as register coalescing part 2. After RA runs, but
before scheduling, we scan for code of the form:

   mov rN, rN

and delete the move, since it's totally redundant. This pass helps
already, but it'd of course be much more effective paired with
register coalescing to encourage moves in general to end up in this
form. Nevertheless, even by itself:

total instructions in shared programs: 3665 -> 3613 (-1.42%)
instructions in affected programs: 2046 -> 1994 (-2.54%)
helped: 52
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 8.02% x̃: 4.00%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -10.26% -5.79%
Instructions are helped.

total bundles in shared programs: 2256 -> 2213 (-1.91%)
bundles in affected programs: 1154 -> 1111 (-3.73%)
helped: 43
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.33% max: 25.00% x̄: 9.10% x̃: 5.56%
95% mean confidence interval for bundles value: -1.00 -1.00
95% mean confidence interval for bundles %-change: -11.60% -6.60%
Bundles are helped.

total quadwords in shared programs: 3689 -> 3642 (-1.27%)
quadwords in affected programs: 2025 -> 1978 (-2.32%)
helped: 47
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 7.86% x̃: 3.85%
95% mean confidence interval for quadwords value: -1.00 -1.00
95% mean confidence interval for quadwords %-change: -10.30% -5.42%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Share mir_nontrivial_outmod
Alyssa Rosenzweig [Thu, 25 Jul 2019 22:33:56 +0000 (15:33 -0700)]
pan/midgard: Share mir_nontrivial_outmod

To be used with redundant move elimination.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Implement texture RA
Alyssa Rosenzweig [Thu, 25 Jul 2019 15:44:53 +0000 (08:44 -0700)]
pan/midgard: Implement texture RA

total instructions in shared programs: 3916 -> 3665 (-6.41%)
instructions in affected programs: 1405 -> 1154 (-17.86%)
helped: 35
HURT: 0
helped stats (abs) min: 1 max: 21 x̄: 7.17 x̃: 3
helped stats (rel) min: 3.00% max: 28.57% x̄: 20.11% x̃: 21.74%
95% mean confidence interval for instructions value: -9.35 -4.99
95% mean confidence interval for instructions %-change: -22.75% -17.46%
Instructions are helped.

total bundles in shared programs: 2472 -> 2256 (-8.74%)
bundles in affected programs: 906 -> 690 (-23.84%)
helped: 32
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 6.75 x̃: 3
helped stats (rel) min: 5.56% max: 32.26% x̄: 20.83% x̃: 16.67%
95% mean confidence interval for bundles value: -9.09 -4.41
95% mean confidence interval for bundles %-change: -23.77% -17.89%
Bundles are helped.

total quadwords in shared programs: 3965 -> 3689 (-6.96%)
quadwords in affected programs: 1568 -> 1292 (-17.60%)
helped: 35
HURT: 0
helped stats (abs) min: 1 max: 21 x̄: 7.89 x̃: 3
helped stats (rel) min: 2.08% max: 28.57% x̄: 19.87% x̃: 20.00%
95% mean confidence interval for quadwords value: -10.38 -5.39
95% mean confidence interval for quadwords %-change: -22.57% -17.17%
Quadwords are helped.

total registers in shared programs: 411 -> 392 (-4.62%)
registers in affected programs: 76 -> 57 (-25.00%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.27 x̃: 1
helped stats (rel) min: 9.09% max: 50.00% x̄: 30.97% x̃: 33.33%
95% mean confidence interval for registers value: -1.52 -1.01
95% mean confidence interval for registers %-change: -39.12% -22.82%
Registers are helped.

total threads in shared programs: 426 -> 432 (1.41%)
threads in affected programs: 6 -> 12 (100.00%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Fix backwards blend color load
Alyssa Rosenzweig [Fri, 26 Jul 2019 15:15:50 +0000 (08:15 -0700)]
pan/midgard: Fix backwards blend color load

The source and destination were incorrectly flipped in the move, but
some details of our internal regalloc made this function anyway. Now
that we're changing the regalloc, we need to fix this to avoid
regressing blend shaders.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Fix scheduling mishap
Alyssa Rosenzweig [Thu, 25 Jul 2019 21:53:20 +0000 (14:53 -0700)]
pan/midgard: Fix scheduling mishap

We shouldn't try to schedule onto a vmul if the last unit was a smul;
that would force a break ("traveling back in time").

total bundles in shared programs: 2519 -> 2472 (-1.87%)
bundles in affected programs: 791 -> 744 (-5.94%)
helped: 20
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 2.35 x̃: 1
helped stats (rel) min: 1.52% max: 11.76% x̄: 7.94% x̃: 7.69%
95% mean confidence interval for bundles value: -3.47 -1.23
95% mean confidence interval for bundles %-change: -9.36% -6.51%
Bundles are helped.

total quadwords in shared programs: 4028 -> 3965 (-1.56%)
quadwords in affected programs: 1223 -> 1160 (-5.15%)
helped: 17
HURT: 0
helped stats (abs) min: 1 max: 17 x̄: 3.71 x̃: 2
helped stats (rel) min: 2.97% max: 10.64% x̄: 6.97% x̃: 7.14%
95% mean confidence interval for quadwords value: -5.71 -1.70
95% mean confidence interval for quadwords %-change: -8.03% -5.91%
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Fix vector->scalar swizzles
Alyssa Rosenzweig [Fri, 26 Jul 2019 13:30:16 +0000 (06:30 -0700)]
pan/midgard: Fix vector->scalar swizzles

The swizzle should be taken on the masked component, rather than
unconditionally X.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Add dead move elimination pass
Alyssa Rosenzweig [Thu, 25 Jul 2019 21:43:32 +0000 (14:43 -0700)]
pan/midgard: Add dead move elimination pass

This is a special case of DCE designed to run after the out-of-ssa pass
to cleanup special register lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Move DCE into its own file
Alyssa Rosenzweig [Thu, 25 Jul 2019 21:33:58 +0000 (14:33 -0700)]
pan/midgard: Move DCE into its own file

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Add mir_rewrite_dst_tag helper
Alyssa Rosenzweig [Thu, 25 Jul 2019 19:28:38 +0000 (12:28 -0700)]
pan/midgard: Add mir_rewrite_dst_tag helper

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Fix flipped register bias fields
Alyssa Rosenzweig [Thu, 25 Jul 2019 17:55:09 +0000 (10:55 -0700)]
pan/midgard: Fix flipped register bias fields

We mixed up component_lo and full, which made it appear that we had
less freedom in RA than we actually do. Fix this to fix some
disassemblies as well as prepare for RA with the bias field.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Update RA for cubemap coords
Alyssa Rosenzweig [Thu, 25 Jul 2019 14:09:40 +0000 (07:09 -0700)]
pan/midgard: Update RA for cubemap coords

Following the RA work, we apply the same technique to eliminate the move
to r27 when loading cubemaps.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agoanv+tu+radv: delete unusable dev_icd.json
Eric Engestrom [Wed, 10 Jul 2019 15:22:29 +0000 (16:22 +0100)]
anv+tu+radv: delete unusable dev_icd.json

As per previous commit, Meson doesn't support using uninstalled libs,
they're simply not ready until `ninja install` is ran, so delete them.

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> # for anv
Reviewed-by: Eric Anholt <eric@anholt.net> # for tu
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> # for radv
5 years agodocs: fix intel_icd.json path
Eric Engestrom [Fri, 21 Jun 2019 15:53:17 +0000 (16:53 +0100)]
docs: fix intel_icd.json path

Meson doesn't support using uninstalled libs, they're simply not ready
until `ninja install` is ran, at which point one might as well use the
proper icd.json file in the install folder.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agovulkan/wsi/x11: Increase the effective min. images for mailbox.
Bas Nieuwenhuizen [Mon, 20 May 2019 20:58:32 +0000 (22:58 +0200)]
vulkan/wsi/x11: Increase the effective min. images for mailbox.

We need 5 images:
1) CPU work
2) GPU work
3) idle
4) queued for flip
5) presenting

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agovulkan/wsi/x11: Wait for GPU work before present with mailbox.
Bas Nieuwenhuizen [Mon, 20 May 2019 01:10:46 +0000 (03:10 +0200)]
vulkan/wsi/x11: Wait for GPU work before present with mailbox.

Otherwise the wait only happens at flip time, which messes with
keeping idle buffers around if the GPU work makes the image miss
the next flip.

I decided not to use the wait fences as those are still xshm fences,
so that means we'd still have to wait in the application. Just doing
it before presenting makes things simpler.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agovulkan/wsi/x11: Allow using thread present-only.
Bas Nieuwenhuizen [Mon, 20 May 2019 00:59:00 +0000 (02:59 +0200)]
vulkan/wsi/x11: Allow using thread present-only.

This allows doing a potential long blocking operation before present.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agovulkan/wsi: Use one fence per image.
Bas Nieuwenhuizen [Mon, 20 May 2019 00:51:51 +0000 (02:51 +0200)]
vulkan/wsi: Use one fence per image.

Much easier to work with if we want to use them in the WS-specific
WSI implementation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>