mesa.git
Paul Berry [Wed, 14 Nov 2012 22:39:21 +0000 (14:39 -0800)]
dri: Update dri_util to keep track of __DRI_BACKGROUND_CALLABLE

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Wed, 14 Nov 2012 19:13:02 +0000 (11:13 -0800)]
dri_interface: Add new marshalling interfaces to dri_interface.h

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Roland Scheidegger [Thu, 16 Mar 2017 03:01:41 +0000 (04:01 +0100)]
gallivm: (trivial) remove duplicated line

pointed out by clang (stored value never read)

Roland Scheidegger [Thu, 16 Mar 2017 02:59:52 +0000 (03:59 +0100)]
draw: (trivial) remove an unnecessary lp_build_alloca()

pointed out by clang (stored value never read)

Ilia Mirkin [Sun, 5 Mar 2017 23:24:44 +0000 (18:24 -0500)]
swr: support layer output in geometry shaders

This makes bin/gl-3.2-layered-rendering-gl-layer-render fail only with
2DMS_ARRAY, which is expected given the lackluster MSAA support. However
all the regular types pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Bas Nieuwenhuizen [Wed, 15 Mar 2017 17:49:29 +0000 (18:49 +0100)]
Revert "radv: Emit cache flushes before CP DMA."

This reverts commit cce43f6d8c40222099badaf52344d6a0eed993f3.

Redundant, as the flush already happens at si_cp_dma_prepare.

Acked-by: Dave Airlie <airlied@redhat.com>
Francisco Jerez [Tue, 14 Mar 2017 00:31:39 +0000 (17:31 -0700)]
gallium/tgsi: Treat UCMP sources as floats to match the GLSL-to-TGSI pass expectations.

Currently the GLSL-to-TGSI translation pass assumes it can use
floating point source modifiers on the UCMP instruction.  See the bug
report linked below for an example where an unrelated change in the
GLSL built-in lowering code for atan2 (e9ffd12827ac11a2d2002a42fa8eb1)
caused the generation of floating-point ir_unop_neg instructions
followed by ir_triop_csel, which is translated into UCMP with a negate
modifier on back-ends with native integer support.

Allowing floating-point source modifiers on an integer instruction
seems like rather dubious design for a transport IR, since the same
semantics could be represented as a sequence of MOV+UCMP instructions
instead, but supposedly this matches the expectations of TGSI
back-ends other than tgsi_exec, and the expectations of the DX10 API.
I take no responsibility for future headaches caused by this
inconsistency.
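
As a rough illustration of the MOV+UCMP alternative mentioned above, the
sketch below models registers as raw 32-bit values. It assumes that a float
negate modifier simply flips the sign bit and that UCMP selects its second
operand when the first is non-zero; mov_fneg() and ucmp() are hypothetical
names, not TGSI or Mesa functions.

```c
#include <stdint.h>

/* Hypothetical model of "MOV dst, -src": a float negate flips the sign bit. */
static uint32_t
mov_fneg(uint32_t bits)
{
   return bits ^ 0x80000000u;
}

/* Hypothetical model of UCMP without modifiers: integer test on cond. */
static uint32_t
ucmp(uint32_t cond, uint32_t if_nonzero, uint32_t if_zero)
{
   return cond != 0 ? if_nonzero : if_zero;
}
```

With this split the float negate lives in its own instruction and UCMP only
ever sees plain 32-bit values.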

Fixes a regression of piglit glsl-fs-tan-1 on softpipe introduced by
the above-mentioned glsl front-end commit.  Even though the commit
that triggered the regression doesn't seem to have made it to any
stable branches yet, this might be worth back-porting since I don't
see any reason why the bug couldn't have been reproduced before that
point.

Suggested-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99817
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Grazvydas Ignotas [Wed, 15 Mar 2017 18:53:56 +0000 (20:53 +0200)]
util/disk_cache: do eviction before creating .tmp

cache_put() first creates a .tmp file and then tries to do eviction.
The recently added LRU eviction code selects the non-empty directory
with the oldest access time, but that may easily be the one holding
just the new .tmp file, especially on Linux where atime is updated
lazily (with the "relatime" mount option, which is the default). So
when the cache is small and the random pick doesn't hit another dir,
LRU keeps selecting the same dir with just the .tmp and never deletes
anything. To fix this (and the tests), do eviction earlier.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tim Rowley [Wed, 15 Mar 2017 16:42:43 +0000 (11:42 -0500)]
swr: validate backend state numAttributes

General protection and prevents us from smashing the stack
on the first clear state validation (a7b8d50bcb).  Fixes crash
using icc.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Ben Widawsky [Fri, 21 Oct 2016 01:21:24 +0000 (18:21 -0700)]
gbm: Export a get modifiers

This patch originally had i965 specific code and was named:
commit 61cd3c52b868cf8cb90b06e53a382a921eb42754
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Thu Oct 20 18:21:24 2016 -0700

    gbm: Get modifiers from DRI

To accomplish this, two new query tokens are added to the extension:
__DRI_IMAGE_ATTRIB_MODIFIER_UPPER
__DRI_IMAGE_ATTRIB_MODIFIER_LOWER

The query extension only supported 32b queries, and modifiers are 64b,
so we needed two of them.
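
A minimal sketch of how a caller might reassemble the 64-bit modifier from
the two 32-bit query results. The helper names are hypothetical and only the
bit arithmetic is shown, not the actual queryImage call.

```c
#include <stdint.h>

/* Hypothetical helper: join the UPPER and LOWER halves into one modifier. */
static uint64_t
modifier_from_halves(uint32_t upper, uint32_t lower)
{
   return ((uint64_t)upper << 32) | lower;
}

/* Hypothetical helpers: split a modifier back into its two halves. */
static uint32_t
modifier_upper(uint64_t modifier)
{
   return (uint32_t)(modifier >> 32);
}

static uint32_t
modifier_lower(uint64_t modifier)
{
   return (uint32_t)(modifier & 0xffffffffu);
}
```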

NOTE: The extension version is still set to 13, so none of this will
actually be called.

v2: Error handling of queryImage (Emil)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Ben Widawsky [Tue, 14 Mar 2017 01:20:02 +0000 (18:20 -0700)]
i965: introduce modifier selection.

Nothing special here other than a brief introduction to modifier
selection. Originally this was part of another patch but was split out
from "gbm: Introduce modifiers into surface/bo creation" at the request
of Emil.

Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ben Widawsky [Tue, 14 Mar 2017 01:19:00 +0000 (18:19 -0700)]
egl/drm: Use modifiers for backbuffer creation

Split into a separate patch from the previous patch as requested by
Emil.

Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ben Widawsky [Thu, 3 Nov 2016 23:14:44 +0000 (16:14 -0700)]
gbm: Introduce modifiers into surface/bo creation

The idea behind modifiers like this is that the user of GBM will have
some mechanism to query what properties the hardware supports for its BO
or surface. This information is directly passed in (and stored) so that
the DRI implementation can create an image with the appropriate
attributes.

A getter() will be added later so that the user of GBM will be able to
query what modifier should be used.

Only in surface creation, the modifiers are stored until the BO is
actually allocated. In regular buffer allocation, the correct modifier
can (and, in future patches, will) be chosen at creation time.

v2: Make sure to check if count is non-zero in addition to testing if
calloc fails. (Daniel)

v3: Remove "usage" and "flags" from modifier creation. Requested by
Kristian.

v4: Take advantage of the "INVALID" modifier added by the GET_PLANE2
series.

v5: Don't bother with storing modifiers for gbm_bo_create because that's
a synchronous operation and we can actually select the correct modifier
at create time (done in a later patch) (Jason)

v6: Make modifier condition outside the check so that dri_use will work
properly (Jason)

Cc: Kristian Høgsberg <krh@bitplanet.net>
References (v4): https://lists.freedesktop.org/archives/intel-gfx/2017-January/116636.html
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Ben Widawsky [Mon, 13 Mar 2017 21:53:43 +0000 (14:53 -0700)]
i965: Implement basic modifier image creation

This is just a stub for now and will be filled in later.

This was split out of an earlier patch.

Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ben Widawsky [Fri, 4 Nov 2016 18:31:15 +0000 (11:31 -0700)]
dri: Add an image creation with modifiers

Modifiers will be obtained or guessed by the client and passed in during
image creation/import. In guessing, a client might decide to simply pass
along all known modifiers.

This requires bumping the DRIimage version.

As of this patch, the modifiers aren't plumbed all the way down; this
patch simply makes sure the interface level stuff is correct.

v2: Don't allow usage + modifiers

v3: Make NAND actually NAND. Bug introduced in v2. (Jason)

v4:
- s/obtains/obtained (Jason)
- Pull out i965 implementation into a later patch (Emil)
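
The "usage + modifiers" rule from v2/v3 boils down to a NAND of the two
inputs. A minimal sketch, assuming a zero usage mask and a zero modifier
count both mean "not supplied"; the function name is hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: reject only the case where both are supplied. */
static bool
usage_nand_modifiers(uint32_t use, unsigned modifier_count)
{
   return !(use != 0 && modifier_count > 0);
}
```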

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Marek Olšák [Tue, 7 Mar 2017 01:19:47 +0000 (02:19 +0100)]
radeonsi: implement TGSI opcodes TEX_LZ and TXF_LZ

This massively decreases VGPR spilling for DiRT Showdown, because we
no longer have to use v4i32 for 2D fetches when level == 0.
We now use v2i32 for those cases.

DiRT Showdown - Spilled VGPRs: -26 (-81%)

This surprisingly doesn't have any useful effect on performance (+ 0.05%).

Marek Olšák [Tue, 7 Mar 2017 01:26:47 +0000 (02:26 +0100)]
glsl_to_tgsi: use TEX_LZ and TXF_LZ when available

Marek Olšák [Tue, 7 Mar 2017 01:01:08 +0000 (02:01 +0100)]
glsl_to_tgsi: remove a redundant statement

it's the same as the last "else".

Marek Olšák [Tue, 7 Mar 2017 01:15:14 +0000 (02:15 +0100)]
gallium: add TGSI opcodes TEX_LZ and TXF_LZ

for better code generation in radeonsi

Marek Olšák [Tue, 7 Mar 2017 01:09:03 +0000 (02:09 +0100)]
gallium: add PIPE_CAP_TGSI_TEX_TXF_LZ

Samuel Pitoiset [Tue, 14 Mar 2017 23:59:13 +0000 (00:59 +0100)]
radeonsi: disable sinking common instructions down to the end block

Initially this was a workaround for a bug introduced in LLVM 4.0
in the SimplifyCFG pass that caused image intrinsics to disappear
(because they were badly sunk). In the end, this is a win because it
decreases SGPR spilling and increases the number of waves a bit.

Although shader-db results are good, I think we might want to
remove it in the future once the issue is fixed. For now, enable
it for LLVM >= 4.0.

This also fixes a rendering issue with the speedometer in Dirt Rally.

More information can be found here https://reviews.llvm.org/D26348.

Thanks to Dave Airlie for the patch.

v2: - add a FIXME comment
    - use if (HAVE_LLVM >= 0x0400) instead
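
The version test from v2 can be sketched as follows, assuming HAVE_LLVM
packs the version as (major << 8) | minor, which is what the 0x0400 literal
suggests. LLVM_VERSION_CODE() and want_sinking_workaround() are hypothetical
names used only for illustration.

```c
/* Assumed encoding: LLVM 4.0 -> 0x0400, LLVM 3.9 -> 0x0309. */
#define LLVM_VERSION_CODE(major, minor) (((major) << 8) | (minor))

/* Hypothetical predicate: apply the workaround for LLVM >= 4.0 only. */
static int
want_sinking_workaround(int have_llvm)
{
   return have_llvm >= LLVM_VERSION_CODE(4, 0);
}
```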

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 15 Mar 2017 11:40:13 +0000 (12:40 +0100)]
tgsi: add missing compute shader entry in tgsi_get_processor_name()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 15 Mar 2017 12:00:02 +0000 (13:00 +0100)]
radeonsi: clean up tex_fetch_ptrs()

Will also help when the src sampler register is
TGSI_FILE_CONSTANT for bindless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Emil Velikov [Thu, 2 Mar 2017 19:02:45 +0000 (19:02 +0000)]
configure.ac: bump pthread-stubs requirement

On platforms that require it, we bump the requirement to 0.4 or later.
Due to an issue with the project [design], any earlier version is bound
to cause issues. For the specifics see the pthread-stubs README.

Cc: Uli Schlachter <psychon@znc.in>
Cc: Jonathan Gray <jsg@jsg.id.au>
Cc: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Cc: François Tigeot <ftigeot@wolfpond.org>
Cc: Tobias Nygren <tnn@NetBSD.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Tue, 27 Sep 2016 12:39:36 +0000 (13:39 +0100)]
glx: don't expose systemTimeExtension for DRI2/DRI3/DRISW

Used by/applicable to dri1 drivers only.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Thu, 1 Dec 2016 21:21:10 +0000 (21:21 +0000)]
anv: do not open random render node(s)

drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless of whether it's the one we're interested in or not.

v2: Rebase, explicitly require/check for libdrm
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)
v4: Rebase

Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 20:58:20 +0000 (20:58 +0000)]
radv: do not open random render node(s)

drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless of whether it's the one we're interested in or not.

v2: Rebase.
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)

Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:53:11 +0000 (19:53 +0000)]
radv/winsys: use drmGetDevice2 API

Analogous to previous commit

v2: Add explicit require_libdrm check.

Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:54:39 +0000 (19:54 +0000)]
winsys/amdgpu: use drmGetDevice2 API

Analogous to previous commit

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98502
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:51:03 +0000 (19:51 +0000)]
loader: use drmGetDevice[s]2 API

This allows us to fetch the device list/info w/o the revision field.
At the moment retrieving the latter wakes up the device.

Note: kernel patch to resolve that should be in 4.10.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:48:43 +0000 (19:48 +0000)]
autoconf/scons: bump libdrm to 2.4.75

We'll be using the drmGetDevice[s]2 API in src/loader with next patch.

v2: Rebase.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Tue, 24 Jan 2017 21:21:10 +0000 (21:21 +0000)]
util/sha1: drop _mesa_sha1_{update, format} return type

Unused/unchecked by any of the callers.

v2: Fix the glsl cases that have crept in since v1

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Emil Velikov [Tue, 24 Jan 2017 21:21:09 +0000 (21:21 +0000)]
util/sha1: rework _mesa_sha1_{init,final}

Rather than having an extra memory allocation [that we currently do not
check and act accordingly] just make the API take a pointer to a stack
allocated instance.

This and follow-up steps will effectively make the _mesa_sha1_foo simple
define/inlines around their SHA1 counterparts.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Emil Velikov [Tue, 24 Jan 2017 21:21:08 +0000 (21:21 +0000)]
util/sha1: add non-typedef name for the SHA1_CTX struct

Using typedef(s) is not always the answer and makes it harder for people
to do clever (or one might call nasty) things with the code.

Add a struct name which we will use in a follow-up commit.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Bas Nieuwenhuizen [Wed, 15 Mar 2017 07:54:04 +0000 (08:54 +0100)]
radv: Remove unused descriptor set field.

Trivial.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Dave Airlie [Thu, 31 Mar 2016 05:33:16 +0000 (15:33 +1000)]
r600: refactor binding code for attach buffer to CB.

This refactors out the code and fixes it up to be used
for images later. It uses the code in the current RAT binding
for compute.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 31 Mar 2016 05:27:42 +0000 (15:27 +1000)]
r600: refactor out CB setup.

This moves the code to create CB info out into
a separate function so it can be reused in images
code to create RATs.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 31 Mar 2016 05:24:47 +0000 (15:24 +1000)]
r600: refactor texture resource words setup code.

This refactors out the code to setup a texture resource
so we can reuse it later from the images code.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 31 Mar 2016 05:20:42 +0000 (15:20 +1000)]
r600: factor out the code to initialise a buffer resource.

This takes the code required to initialise a buffer resource
out of the texture buffer code, into its own function.

This is going to be used for the image support later.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 26 Jan 2016 03:35:08 +0000 (13:35 +1000)]
r600g: make framebuffer atom rely on dual src blend state.

In order to implement ARB_shader_image_load_store, we have to share
the CB space with RATs, so we should only steal the dual src
space if we have dual src enabled.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Mon, 13 Mar 2017 21:23:34 +0000 (14:23 -0700)]
intel/debug: Add a common INTEL_DEBUG=nohiz option

The GL driver had a driconf option (which doesn't make much sense) and
the Vulkan driver had a hand-rolled environment variable.  Instead,
let's tie both into the INTEL_DEBUG mechanism and unify things.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Mon, 13 Mar 2017 15:10:38 +0000 (08:10 -0700)]
anv/image: Move handling of INTEL_VK_HIZ

This makes it so that you don't get an "Implement gen7 HiZ" perf warning
when you manually disable HiZ on gen8.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Timothy Arceri [Tue, 14 Mar 2017 04:50:34 +0000 (15:50 +1100)]
radv: trivial tidy ups

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Alan Swanson [Mon, 6 Mar 2017 16:17:32 +0000 (16:17 +0000)]
util/disk_cache: scale cache according to filesystem size

Select higher of current 1G default or 10% of filesystem where
cache is located.
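
The sizing rule above can be sketched like this, assuming "1G" means 1 GiB
and that the caller already knows the total size of the filesystem (for
example from statvfs()). default_max_cache_size() is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: pick the larger of 1 GiB and 10% of the filesystem. */
static uint64_t
default_max_cache_size(uint64_t fs_total_bytes)
{
   const uint64_t one_gib = 1024ull * 1024 * 1024;
   const uint64_t tenth = fs_total_bytes / 10;

   return tenth > one_gib ? tenth : one_gib;
}
```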

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Alan Swanson [Mon, 6 Mar 2017 16:17:31 +0000 (16:17 +0000)]
util/disk_cache: actually enforce cache size

Currently eviction is only one in, one out, so if the cache is at
max_size and cache files keep increasing in size then so does the
cache. Evict repeatedly instead, with a limit of 8 evictions per new
cache entry.
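
A toy model of the bounded eviction loop, not the disk_cache code itself.
It assumes the evictable entries are already given in eviction order and
that their sizes add up to no more than the current total; evict_for_put()
and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_EVICTIONS_PER_PUT 8

/* Evict entries until the new item fits, giving up after 8 evictions.
 * Returns how many entries were evicted and updates *total in place. */
static unsigned
evict_for_put(uint64_t *total, uint64_t max_size, uint64_t new_size,
              const uint64_t *sizes, size_t count)
{
   unsigned evicted = 0;

   while (*total + new_size > max_size &&
          evicted < MAX_EVICTIONS_PER_PUT && evicted < count) {
      *total -= sizes[evicted];
      evicted++;
   }
   return evicted;
}
```

The cap keeps the cost of a single put bounded even when many small files
would have to go to make room for one large entry.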

V2: (Timothy Arceri) fix make check tests

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Alan Swanson [Fri, 10 Mar 2017 16:22:51 +0000 (16:22 +0000)]
util/disk_cache: use LRU eviction rather than random eviction

Still using fast random selection of two-character subdirectory in
which to check cache files rather than scanning entire cache.
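
Within the randomly chosen subdirectory, the LRU choice amounts to picking
the entry with the oldest access time. A minimal sketch over an in-memory
list, assuming the atimes have already been read (for example with stat());
struct cache_entry and pick_lru() are hypothetical.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical record for one cache file in the chosen subdirectory. */
struct cache_entry {
   const char *name;
   time_t atime;
};

/* Return the index of the least recently accessed entry, or -1 if none. */
static int
pick_lru(const struct cache_entry *entries, size_t count)
{
   int oldest = -1;

   for (size_t i = 0; i < count; i++) {
      if (oldest < 0 || entries[i].atime < entries[oldest].atime)
         oldest = (int)i;
   }
   return oldest;
}
```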

v2: Factor out double strlen call
v3: C99 declaration of variables where used

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Timothy Arceri [Tue, 14 Mar 2017 00:22:44 +0000 (11:22 +1100)]
util/disk_cache: don't fallback to an empty cache dir on evict

If we fail to randomly select a two letter cache dir, don't select
an empty dir on fallback.

In real world use we should never hit the fallback path but it can
be hit by tests when the cache is set to a very small max value.

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Timothy Arceri [Mon, 13 Mar 2017 00:07:30 +0000 (11:07 +1100)]
util/disk_cache: use a thread queue to write to shader cache

This should help reduce any overhead added by the shader cache
when programs are not found in the cache.

To avoid creating any special function just for the sake of the
tests we add a one second delay whenever we call disk_cache_put()
to give it time to finish.

V2: poll for file when waiting for thread in test
V3: fix poll delay to really be 100ms, and simplify the wait function

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Timothy Arceri [Sun, 12 Mar 2017 23:14:35 +0000 (10:14 +1100)]
util/disk_cache: add helpers for creating/destroying disk cache put jobs

V2: Make a copy of the data so we don't have to worry about it being
freed before we are done compressing/writing.

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Timothy Arceri [Wed, 8 Mar 2017 23:51:01 +0000 (10:51 +1100)]
util/disk_cache: add thread queue to disk cache

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Dave Airlie [Tue, 14 Mar 2017 21:15:50 +0000 (07:15 +1000)]
radv/ac: workaround regression in llvm 4.0 release

LLVM 4.0 was released with a pretty messy regression that will hopefully
get fixed in the future.

This workaround was proposed by Tom, and it fixes the CTS regressions
here at least. I'm not sure if this will cause any major side effects,
but correctness over speed and all that.

radeonsi should possibly consider the same workaround until an llvm
fix can be found.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 27 Feb 2017 01:30:41 +0000 (11:30 +1000)]
radv/ac: gather4 cube workaround integer

This fix is extracted from amdgpu-pro shader traces.

It appears the gather4 workaround for integer types doesn't
work for cubes, so instead it forces a float scaled sample,
then converts to integer.

It modifies the descriptor before calling the gather.

This also produces some ugly asm code for reasons specified
in the patch, llvm could probably do better than dumping
sgprs to vgprs.

This fixes:
dEQP-VK.glsl.texture_gather.basic.cube.rgba8*

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 21:57:55 +0000 (22:57 +0100)]
radv: Set driver version to mesa version

I couldn't really find an encoding in the spec. I'm not sure it
prescribes VK_MAKE_VERSION format, but vulkan.gpuinfo.org interprets
it that way by default. vulkaninfo gives the raw number, so we could
alternatively do something like 17001000, but that doesn't show
up right on vulkan.gpuinfo.org again. Looking at that site, the -pro
driver also uses VK_MAKE_VERSION, so keeping consistency is probably
best.
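
To make the difference concrete, here is a small sketch of the
VK_MAKE_VERSION packing (major in the top 10 bits, minor in the next 10,
patch in the low 12). The macros are local copies so the snippet builds
without the Vulkan headers, and 17.1.0 is only an illustrative version.

```c
#include <stdint.h>

/* Local equivalents of VK_MAKE_VERSION and the VK_VERSION_* accessors. */
#define MAKE_VERSION(major, minor, patch) \
   (((uint32_t)(major) << 22) | ((uint32_t)(minor) << 12) | (uint32_t)(patch))
#define VERSION_MAJOR(v) ((uint32_t)(v) >> 22)
#define VERSION_MINOR(v) (((uint32_t)(v) >> 12) & 0x3ffu)
#define VERSION_PATCH(v) ((uint32_t)(v) & 0xfffu)

/* A tool that decodes the raw decimal 17001000 with these accessors
 * would report 4.54.2600 rather than 17.1.0. */
static uint32_t
raw_decimal_major(void)
{
   return VERSION_MAJOR(17001000u);
}
```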

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 21:37:03 +0000 (22:37 +0100)]
radv: Increase api version to 1.0.42.

I've skimmed the changes from 1.0.5 to 1.0.42 and I think we have all
changes. We're still not conformant of course, but this should not
regress stuff.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Tue, 14 Mar 2017 02:26:06 +0000 (19:26 -0700)]
util/vk: Add helpers for finding an extension struct

Reviewed-by: Dave Airlie <airlied@redhat.com>
Alex Smith [Tue, 14 Mar 2017 15:26:32 +0000 (15:26 +0000)]
radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer

Need to flush before updating the buffer to ensure that the copy is
ordered after previous accesses (assuming the app has performed the
appropriate barriers).

This fixes potential issues due to draws prior to an update reading
the new buffer content, despite having the necessary barriers between
them.

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 20:46:54 +0000 (21:46 +0100)]
radv: Emit cache flushes before CP DMA.

The flushes could be due to TRANSFER barriers.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jan Beich [Sun, 12 Mar 2017 03:19:14 +0000 (03:19 +0000)]
Convert sed(1) syntax to be compatible with FreeBSD and OpenBSD

BSD regex library doesn't support extended RE escapes (e.g. \+) and
shorthand character classes (e.g. \s, \S) and SVR4-style word
delimiters[1] (on DragonFly and NetBSD). Both GNU and BSD sed support
-E and -r to enable extended RE but OS X still lacks -r.

[1] https://www.illumos.org/issues/516

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com> (GNU sed)
Jason Ekstrand [Tue, 14 Mar 2017 02:30:26 +0000 (19:30 -0700)]
anv: Properly enumerate physical devices when none are present

Jason Ekstrand [Thu, 9 Mar 2017 04:23:05 +0000 (20:23 -0800)]
nir/constant_expressions: Refactor helper functions

Apart from avoiding some unneeded size cases, this shouldn't have any
actual functional impact.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 03:54:37 +0000 (19:54 -0800)]
nir: Rework conversion opcodes

The NIR story on conversion opcodes is a mess.  We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random.  This commit re-organizes things and makes them all
consistent:

 - All non-bool conversion opcodes now have the explicit size in the
   destination and are named <src_type>2<dst_type><size>.

 - Integer <-> integer conversion opcodes now only come in i2i and u2u
   forms (i2u and u2i have been removed) since the only difference
   between the different integer conversions is whether or not they
   sign-extend when up-converting.

 - Boolean conversion opcodes all have the explicit size on the bool and
   are named <src_type>2<dst_type>.

Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako.  This will make adding
int8, int16, and float16 versions much easier when the time comes.
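
The naming scheme above can be sketched as a small formatter. It is not
the mako-generated code; it assumes single-letter type prefixes (f, i, u, b)
and treats a conversion as boolean when either side is 'b'.
conversion_op_name() is a hypothetical helper.

```c
#include <stdio.h>

/* Format "<src>2<dst><size>" for non-bool conversions and
 * "<src>2<dst>" when either side is a bool. */
static void
conversion_op_name(char *buf, size_t len, char src, char dst,
                   unsigned dst_bits)
{
   if (src == 'b' || dst == 'b')
      snprintf(buf, len, "%c2%c", src, dst);
   else
      snprintf(buf, len, "%c2%c%u", src, dst, dst_bits);
}
```

For instance, a float to 32-bit int conversion comes out as "f2i32" and a
bool to float conversion as "b2f".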

Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Wed, 8 Mar 2017 03:32:50 +0000 (19:32 -0800)]
i965/fs: Re-arrange conversion operations

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 02:32:17 +0000 (18:32 -0800)]
i965/vec4: Get rid of the type parameter from to/from_double

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 00:46:44 +0000 (16:46 -0800)]
glsl/nir: Use nir_type_conversion_op

Using the helper is way better than hand-coding the universe.

Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Mon, 13 Mar 2017 20:07:24 +0000 (13:07 -0700)]
nir: Rewrite nir_type_conversion_op

The original version was very convoluted and tried way too hard to not
just have the nested switch statement that it needs.  Let's just write
the obvious code and then we know it's correct.  This fixes a bunch of
missing cases particularly with int64.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 00:46:17 +0000 (16:46 -0800)]
nir: Add a get_nir_type_for_glsl_base_type helper

Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Wed, 8 Mar 2017 18:32:40 +0000 (10:32 -0800)]
nir/validate: Rework ALU bit-size rule validation

The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes.  An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information.  The new validation code is much more straightforward and
should be more correct.

Reviewed-by: Eric Anholt <eric@anholt.net>
7 years agonir/validate: Validate that bit sizes and components always match
Jason Ekstrand [Fri, 3 Mar 2017 00:25:59 +0000 (16:25 -0800)]
nir/validate: Validate that bit sizes and components always match

We've always required bit sizes to match but the rules for number of
components have been a bit loose.  You've never been allowed to source
from something with fewer components than you consume, but more has
always been fine.  This changes the validator to require that they match
exactly.  The fact that they don't always match has been a source of
confusion in NIR for quite some time and it's time we got rid of it.
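
The difference between the old and new rule can be written as two predicates.
This is only a sketch with hypothetical names, not the validator code itself.

```c
#include <stdbool.h>

/* Old rule as described above: the value being read may have more
 * components than the source consumes, but never fewer. */
static bool sketch_src_ok_old(unsigned def_components, unsigned used_components)
{
   return def_components >= used_components;
}

/* New rule: the counts have to match exactly. */
static bool sketch_src_ok_new(unsigned def_components, unsigned used_components)
{
   return def_components == used_components;
}
```

Reading two channels of a vec4 passes the old check and fails the new one, so
such a use now needs an explicit mov or vecN in between.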

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
7 years agonir: Make image_size a variable-width intrinsic
Jason Ekstrand [Fri, 3 Mar 2017 05:42:06 +0000 (21:42 -0800)]
nir: Make image_size a variable-width intrinsic

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
7 years agoi965/fs: Use num_components from the SSA def in image intrinsics
Jason Ekstrand [Fri, 3 Mar 2017 05:39:58 +0000 (21:39 -0800)]
i965/fs: Use num_components from the SSA def in image intrinsics

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
7 years agonir/lower_tex: Use tex_instr_dest_size for txs destinations
Jason Ekstrand [Fri, 3 Mar 2017 03:27:57 +0000 (19:27 -0800)]
nir/lower_tex: Use tex_instr_dest_size for txs destinations

Using coord_components of the source texture is correct for everything
except cube maps where it's off by one.
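
A small sketch of why the two counts disagree for cube maps.  The enum and
both helpers are hypothetical and only model the counting, not the NIR
functions named in the subject line.

```c
#include <stdbool.h>

/* Hypothetical sampler dimensionalities, for illustration only. */
enum sketch_dim { SK_DIM_1D, SK_DIM_2D, SK_DIM_3D, SK_DIM_CUBE };

/* Number of coordinate components needed to sample the texture. */
static unsigned sketch_coord_components(enum sketch_dim dim, bool is_array)
{
   unsigned n = 0;
   switch (dim) {
   case SK_DIM_1D:   n = 1; break;
   case SK_DIM_2D:   n = 2; break;
   case SK_DIM_3D:   n = 3; break;
   case SK_DIM_CUBE: n = 3; break;   /* direction vector */
   }
   return n + (is_array ? 1 : 0);    /* plus the array index */
}

/* Number of components in the result of a size query. */
static unsigned sketch_txs_dest_size(enum sketch_dim dim, bool is_array)
{
   unsigned n = 0;
   switch (dim) {
   case SK_DIM_1D:   n = 1; break;
   case SK_DIM_2D:   n = 2; break;
   case SK_DIM_3D:   n = 3; break;
   case SK_DIM_CUBE: n = 2; break;   /* each face is 2D: width, height */
   }
   return n + (is_array ? 1 : 0);    /* plus the layer count */
}
```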

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
7 years agonir/spirv: Restrict the number of channels in texture coordinates
Jason Ekstrand [Fri, 3 Mar 2017 03:03:01 +0000 (19:03 -0800)]
nir/spirv: Restrict the number of channels in texture coordinates

Some SPIR-V texturing instructions pack more than the texture coordinate
into the coordinate source.  We need to mask off the unused channels.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
7 years agonir/copy_prop: Respect the source's number of components
Jason Ekstrand [Fri, 3 Mar 2017 01:10:24 +0000 (17:10 -0800)]
nir/copy_prop: Respect the source's number of components

In the near future we are going to require that the num_components in a
src dereference match the num_components of the SSA value being
dereferenced.  To do that, we need copy_prop to not remove our MOVs from
a larger SSA value into an instruction that uses fewer channels.

Because we suddenly have to know how many components each source has,
this makes the pass a bit more complicated.  Fortunately, copy
propagation is the only pass that cares about the number of components
read by any given source so it's fairly contained.

Shader-db results on Sky Lake:

   total instructions in shared programs: 13318947 -> 13320265 (0.01%)
   instructions in affected programs: 260633 -> 261951 (0.51%)
   helped: 324
   HURT: 1027

Looking through the hurt programs, about a dozen are hurt by 3
instructions and the rest are all hurt by 2 instructions.  From a
spot-check of the shaders, the story is always the same:  They get a
vec4 from somewhere (frequently an input) and use the first two or three
components as a texture coordinate.  Because of the vector component
mismatch, we have a mov or, more likely, a vecN sitting between the
texture instruction and the input.  This means that the back-end inserts
a bunch of MOVs and split_virtual_grfs() goes to town.  Because the
texture coordinate is also used by some other calculation, register
coalesce can't combine them back together and we end up with an extra 2
MOV instructions in our shader.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
7 years agonir/intrinsics: Make load_barycentric_input take a 2-component coor
Jason Ekstrand [Fri, 3 Mar 2017 01:39:11 +0000 (17:39 -0800)]
nir/intrinsics: Make load_barycentric_input take a 2-component coor

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
7 years agoanv/blorp: Only set a clear color for resolves if fast-cleared
Jason Ekstrand [Fri, 3 Mar 2017 07:03:03 +0000 (23:03 -0800)]
anv/blorp: Only set a clear color for resolves if fast-cleared

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years agoanv/blorp: Turn off AUX after doing a CCS_D resolve
Jason Ekstrand [Fri, 10 Mar 2017 00:37:23 +0000 (16:37 -0800)]
anv/blorp: Turn off AUX after doing a CCS_D resolve

For render passes with multiple subpasses on gen7, we only fast-clear at
the top but an input attachment use can cause us to do a resolve in the
middle of the render pass.  Once we've done so, we no longer have a
fast-cleared surface so we can just set aux_usage to NONE.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years agoandroid: add '/vulkan' to libmesa_anv_entrypoints path
Tapani Pälli [Mon, 13 Mar 2017 12:08:38 +0000 (14:08 +0200)]
android: add '/vulkan' to libmesa_anv_entrypoints path

otherwise generated entrypoint headers are not found during build

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoandroid: add src/intel/compiler to libmesa_intel_compiler includes
Tapani Pälli [Mon, 13 Mar 2017 12:08:37 +0000 (14:08 +0200)]
android: add src/intel/compiler to libmesa_intel_compiler includes

fixes build error when brw_nir.h is not found in the generated file
brw_nir_trig_workarounds.c.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoanv: Add missing error-checking to anv_CreateDevice (v3)
Gwan-gyeong Mun [Tue, 29 Nov 2016 21:59:15 +0000 (06:59 +0900)]
anv: Add missing error-checking to anv_CreateDevice (v3)

This patch adds missing error-checking and fixes a resource leak in the
allocation failure path of anv_CreateDevice()

v2: Fixes from Jason Ekstrand's review
  a) Add missing destructors for all of the state pools on allocation
     failure path
  b) Add missing destructor for batch bo pools on allocation failure path

v3: Fixes from Emil Velikov's review
  Add missing destructor for queue and scratch_pool on allocation failure
  path

Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoradv: setup llvm target data layout
Dave Airlie [Mon, 13 Mar 2017 20:50:59 +0000 (06:50 +1000)]
radv: setup llvm target data layout

Ported from radeonsi, pointed out by Tom.

"This prevents LLVM from using sext instructions for local memory
offsets and allows the backend to fold immediate offsets into the
instruction. This also prevents some incorrect code generation for
ptrtoint and inttoptr instructions."

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Reinitialise loaderMagic when allocating a cached command buffer
Alex Smith [Mon, 13 Mar 2017 13:28:19 +0000 (13:28 +0000)]
radv: Reinitialise loaderMagic when allocating a cached command buffer

This must be set to ICD_LOADER_MAGIC by vkAllocateCommandBuffers, which
was being done when allocating a new buffer but not when reusing an
existing one in the cache. This would hit an assertion and crash in
debug builds of the Vulkan loader.
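
The shape of the fix can be sketched as a pool where the magic value is
written on a path shared by fresh and reused buffers.  All names here are
hypothetical (including SKETCH_LOADER_MAGIC, which merely stands in for
ICD_LOADER_MAGIC); this is not the radv code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_LOADER_MAGIC 0x01CDC0DEu   /* placeholder for ICD_LOADER_MAGIC */
#define SKETCH_POOL_SIZE 4

struct sketch_cmd_buffer {
   uintptr_t loader_magic;                /* must be valid on every allocation */
   struct sketch_cmd_buffer *next_free;
};

struct sketch_pool {
   struct sketch_cmd_buffer storage[SKETCH_POOL_SIZE];
   unsigned num_created;
   struct sketch_cmd_buffer *free_list;   /* the cache of released buffers */
};

static struct sketch_cmd_buffer *sketch_allocate(struct sketch_pool *pool)
{
   struct sketch_cmd_buffer *cmd;

   if (pool->free_list) {
      cmd = pool->free_list;              /* reuse a cached buffer */
      pool->free_list = cmd->next_free;
   } else if (pool->num_created < SKETCH_POOL_SIZE) {
      cmd = &pool->storage[pool->num_created++];   /* create a new one */
   } else {
      return NULL;
   }

   /* Common tail: both paths (re)initialise the magic value. */
   cmd->next_free = NULL;
   cmd->loader_magic = SKETCH_LOADER_MAGIC;
   return cmd;
}

static void sketch_free(struct sketch_pool *pool, struct sketch_cmd_buffer *cmd)
{
   cmd->loader_magic = 0;                 /* models a stale value in the cache */
   cmd->next_free = pool->free_list;
   pool->free_list = cmd;
}
```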

Fixes: 682248db451f ("radv: Cache command buffers in command pool.")
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agogallium/radeon: disable the shader cache if dumping shaders
Marek Olšák [Fri, 10 Mar 2017 11:18:07 +0000 (12:18 +0100)]
gallium/radeon: disable the shader cache if dumping shaders

otherwise, cached shaders aren't dumped.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agoradeonsi: mark all bound shader buffer ranges as initialized
Marek Olšák [Mon, 6 Mar 2017 00:47:52 +0000 (01:47 +0100)]
radeonsi: mark all bound shader buffer ranges as initialized

This should prevent cases when a buffer was incorrectly mapped without
synchronization just because this wasn't done.

Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agost/mesa: disable the shader cache if dumping shaders
Marek Olšák [Fri, 10 Mar 2017 11:19:50 +0000 (12:19 +0100)]
st/mesa: disable the shader cache if dumping shaders

otherwise, cached shaders aren't dumped.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agoanv: Use vk_outarray in vkGetPhysicalDeviceQueueFamilyProperties
Chad Versace [Sun, 5 Mar 2017 21:15:06 +0000 (13:15 -0800)]
anv: Use vk_outarray in vkGetPhysicalDeviceQueueFamilyProperties

No intended change in behavior. Just a refactor.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Use vk_outarray in vkEnumeratePhysicalDevices (v2)
Chad Versace [Sun, 5 Mar 2017 21:07:13 +0000 (13:07 -0800)]
anv: Use vk_outarray in vkEnumeratePhysicalDevices (v2)

No intended change in behavior. Just a refactor.

v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
    Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoutil/vulkan: Add vk_outarray (v2)
Chad Versace [Sat, 25 Feb 2017 04:58:59 +0000 (20:58 -0800)]
util/vulkan: Add vk_outarray (v2)

This is a wrapper for a Vulkan output array. A Vulkan output array is
one that follows the convention of the parameters to
vkGetPhysicalDeviceQueueFamilyProperties().
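
For reference, the output array convention can be sketched as below.  The
function, types and result codes are hypothetical stand-ins (SKETCH_INCOMPLETE
plays the role of VK_INCOMPLETE); this does not show the vk_outarray API
itself, only the behaviour such a wrapper has to implement.

```c
#include <stddef.h>
#include <stdint.h>

enum sketch_result { SKETCH_SUCCESS = 0, SKETCH_INCOMPLETE = 1 };

/* If out is NULL, only report how many elements are available.  Otherwise
 * write at most *count elements, store the number actually written back
 * into *count, and report whether anything was left out. */
static enum sketch_result
sketch_enumerate(const int *available, uint32_t available_count,
                 uint32_t *count, int *out)
{
   if (out == NULL) {
      *count = available_count;
      return SKETCH_SUCCESS;
   }

   uint32_t n = *count < available_count ? *count : available_count;
   for (uint32_t i = 0; i < n; i++)
      out[i] = available[i];

   *count = n;
   return n < available_count ? SKETCH_INCOMPLETE : SKETCH_SUCCESS;
}
```

A caller typically calls once with out set to NULL to learn the count, sizes
its array, then calls again to fetch the elements.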

v2: Replace vk_outarray_is_incomplete() with vk_outarray_status(). For
    Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agointel: genxml: prevent missing ; with address fields dwords
Lionel Landwerlin [Sun, 12 Mar 2017 16:53:29 +0000 (16:53 +0000)]
intel: genxml: prevent missing ; with address fields dwords

Before this change, the generator could print this kind of thing:

   const uint32_t v0 =
      __gen_uint(values->ValidBit, 0, 0) |
      __gen_uint(values->FaultType, 1, 2) |
      __gen_uint(values->SRCIDofFault, 3, 10) |
      __gen_uint(values->GTTSEL, 11, 1) |
   dw[0] = __gen_combine_address(data, &dw[0], values->VirtualAddressofFault, v0);

This change fixes the trailing '|'.
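
The general way to avoid this class of bug is to emit the separator between
terms rather than after each one.  The sketch below shows that idea in C with
a hypothetical helper; the real generator is a separate script and is not
reproduced here.

```c
#include <stddef.h>
#include <stdio.h>

/* Joins n terms with " |\n" placed between them, never after the last one.
 * Returns the number of characters written, or -1 if out is too small. */
static int sketch_join_or_terms(char *out, size_t cap,
                                const char *const *terms, size_t n)
{
   size_t used = 0;

   if (cap == 0)
      return -1;
   out[0] = '\0';

   for (size_t i = 0; i < n; i++) {
      /* The separator is a prefix of every term except the first. */
      int w = snprintf(out + used, cap - used, "%s%s",
                       i ? " |\n" : "", terms[i]);
      if (w < 0 || (size_t)w >= cap - used)
         return -1;
      used += (size_t)w;
   }
   return (int)used;
}
```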

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agogallium/hud: check NULL return from u_upload_alloc
Julien Isorce [Fri, 10 Mar 2017 17:16:07 +0000 (17:16 +0000)]
gallium/hud: check NULL return from u_upload_alloc

Fixes the following segmentation fault:

signal SIGSEGV: invalid address (fault address: 0x0)
 frame #0: 0x00007fffe718e117 radeonsi_dri.so hud_draw_background_quad hud_context.c:170
   167
   168     assert(hud->bg.num_vertices + 4 <= hud->bg.max_num_vertices);
   169
-> 170     vertices[num++] = (float) x1;
   171     vertices[num++] = (float) y1;
   172
   173     vertices[num++] = (float) x1;
(lldb) bt
  * frame #0: 0x00007fffe718e117 radeonsi_dri.so`hud_draw_background_quad
    frame #1: 0x00007fffe718f458 radeonsi_dri.so`hud_draw
    frame #2: 0x00007fffe712967f radeonsi_dri.so`dri_flush

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
7 years agowinsys/radeon: check null return from radeon_cs_create_fence in cs_flush
Julien Isorce [Fri, 10 Mar 2017 17:20:56 +0000 (17:20 +0000)]
winsys/radeon: check null return from radeon_cs_create_fence in cs_flush

Follow-up of patch:
"radeon_cs_create_fence: check null return from radeon_winsys_bo_create"

radeon_drm_cs_flush
  radeon_cs_create_fence
    radeon_winsys_bo_create

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
7 years agowinsys/radeon: check null in radeon_cs_create_fence
Julien Isorce [Fri, 10 Mar 2017 17:16:05 +0000 (17:16 +0000)]
winsys/radeon: check null in radeon_cs_create_fence

Fixes the following segmentation fault:

radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
  -> if (!bo->handle)
(gdb) bt
0  radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c
1  0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c
2  0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
7 years agovulkan/wsi: include builddir for generated headers
Juan A. Suarez Romero [Mon, 13 Mar 2017 15:04:20 +0000 (16:04 +0100)]
vulkan/wsi: include builddir for generated headers

wayland-drm-client-protocol.h is generated in builddir, so when
builddir != srcdir the header is not found, and compilation of
wsi_common_wayland.c will fail.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoanv: Use on-the-fly surface states for dynamic buffer descriptors
Jason Ekstrand [Sat, 4 Mar 2017 17:23:26 +0000 (09:23 -0800)]
anv: Use on-the-fly surface states for dynamic buffer descriptors

We have a performance problem with dynamic buffer descriptors.  Because
we are currently implementing them by pushing an offset into the shader
and adding that offset onto the already existing offset for the UBO/SSBO
operation, all UBO/SSBO operations on dynamic descriptors are indirect.
The back-end compiler implements indirect pull constant loads using what
basically amounts to a texelFetch instruction.  For pull constant loads
with constant offsets, however, we use an oword block read message which
goes through the constant cache and reads a whole cache line at a time.
Because of these two things, direct pull constant loads are much faster
than indirect pull constant loads.  Because all loads from dynamically
bound buffers are indirect, the user takes a substantial performance
penalty when using this "performance" feature.

There are two potential solutions I have seen for this problem.  The
alternate solution is to continue pushing offsets into the shader but
wire things up in the back-end compiler so that we use the oword block
read messages anyway.  The only reason we can do this is because we know a
priori that the dynamic offsets are uniform and 16-byte aligned.
Unfortunately, thanks to the 16-byte alignment requirement of the oword
messages, we can't do some general "if the indirect offset is uniform,
use an oword message" sort of thing.

This solution, however, is recommended for a few reasons:

 1. Surface states are relatively cheap.  We've been using on-the-fly
    surface state setup for some time in GL and it works well.  Also,
    dynamic offsets with on-the-fly surface state should still be
    cheaper than allocating new descriptor sets every time you want to
    change a buffer offset which is really the only requirement of the
    dynamic offsets feature.

 2. This requires substantially less compiler plumbing.  Not only can we
    delete the entire apply_dynamic_offsets pass but we can also avoid
    having to add architecture for passing dynamic offsets to the back-
    end compiler in such a way that it can continue using oword messages.

 3. We get robust buffer access range-checking for free.  Because the
    offset and range are baked into the surface state, we no longer need
    to pass ranges around and do bounds-checking in the shader.

 4. Once we finally get UBO pushing implemented, it will be much easier
    to handle pushing chunks of dynamic descriptors if the compiler
    remains blissfully unaware of dynamic descriptors.

This commit improves performance of The Talos Principle on ULTRA
settings by around 50% and brings it nicely into line with OpenGL
performance.
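
A minimal sketch of what baking the dynamic offset into the surface state
amounts to.  The struct and function are hypothetical, and the clamping is an
assumption about how the range-checking in point 3 could fall out; it is not
the anv code.

```c
#include <stdint.h>

/* Hypothetical description of the region a surface state would cover. */
struct sketch_buffer_view {
   uint64_t offset;
   uint64_t range;
};

/* Combine the descriptor's static offset with the dynamic offset supplied
 * at bind time, then clamp both offset and range to the buffer.  Accesses
 * outside the resulting range are handled by the hardware surface bounds,
 * so the shader needs neither the offset nor the range. */
static struct sketch_buffer_view
sketch_dynamic_view(uint64_t buffer_size, uint64_t desc_offset,
                    uint64_t desc_range, uint32_t dynamic_offset)
{
   struct sketch_buffer_view view;

   view.offset = desc_offset + dynamic_offset;
   if (view.offset > buffer_size)
      view.offset = buffer_size;

   uint64_t remaining = buffer_size - view.offset;
   view.range = desc_range < remaining ? desc_range : remaining;
   return view;
}
```

With this arrangement the shader always sees a constant offset into its
binding, which is what lets the back-end keep using the faster direct loads.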

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Stall before fast-clear operations
Jason Ekstrand [Sat, 11 Mar 2017 07:00:49 +0000 (23:00 -0800)]
anv: Stall before fast-clear operations

During initial CCS bring-up, I discovered that you have to do a full CS
stall prior to doing a CCS resolve as well as afterwards.  It appears
that the same is needed for fast-clears as well.  This fixes rendering
corruptions on The Talos Principle on Sky Lake GT4.  The issue hasn't
been demonstrated on any other hardware; however, given that this appears
to be a "too many things in the pipe" problem, having it be easier to
reproduce on a system with more EUs makes sense.  The issue with
resolves is demonstrable on a GT3 or GT2 so this is probably also a
problem on all GTs.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
7 years agoanv: Accurately advertise dynamic descriptor limits
Jason Ekstrand [Sat, 4 Mar 2017 18:52:43 +0000 (10:52 -0800)]
anv: Accurately advertise dynamic descriptor limits

The number of dynamic descriptors is limited by both the number of
descriptors and the total number of dynamic things.  Because there isn't
a single "maximum dynamic things" limit, we need to divide by two so
that they can create the maximum of both UBOs and SSBOs.
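
The arithmetic can be sketched as follows.  The helper name and the numbers
used with it are hypothetical; taking the minimum with the per-type
descriptor limit is an assumption drawn from the first sentence above.

```c
/* Per-type dynamic limit: half of the shared dynamic budget, further capped
 * by the ordinary per-type descriptor limit.  Splitting the budget in two
 * guarantees that the UBO and SSBO maxima can be used at the same time. */
static unsigned
sketch_per_type_dynamic_limit(unsigned max_descriptors_of_type,
                              unsigned max_dynamic_total)
{
   unsigned half = max_dynamic_total / 2;
   return max_descriptors_of_type < half ? max_descriptors_of_type : half;
}
```

For example, with a shared budget of 16 dynamic buffers each type would
advertise 8, and 8 + 8 never exceeds the budget.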

Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
7 years agoanv: Add a helper for working with VK_WHOLE_SIZE for buffers
Jason Ekstrand [Sat, 4 Mar 2017 18:07:56 +0000 (10:07 -0800)]
anv: Add a helper for working with VK_WHOLE_SIZE for buffers

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
7 years agofreedreno/ir3: fragz cannot be half precision
Rob Clark [Tue, 31 Jan 2017 13:31:37 +0000 (08:31 -0500)]
freedreno/ir3: fragz cannot be half precision

Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agofreedreno/ir3: optimize less in glsl
Rob Clark [Mon, 30 Jan 2017 22:27:35 +0000 (17:27 -0500)]
freedreno/ir3: optimize less in glsl

Rely on nir for optimization, to reduce compile times.  Very minimal impact
on shader-db:

  total instructions in shared programs:          104170 -> 104199 (0.03%)
  total dwords in shared programs:                209664 -> 209728 (0.03%)
  total full registers used in shared programs:   7156 -> 7161 (0.07%)
  total half registers used in shader programs:   109 -> 109 (0.00%)
  total const registers used in shared programs:  24222 -> 24224 (0.01%)

                   half       full      const      instr     dwords
      helped          12         107         103         112          98
        hurt          11         104         105         115         102

But shader db runtime dropped from ~29.3s user to ~20.4s user.

Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agoaubinator/genxml: use gzipped files to store embedded genxml
Lionel Landwerlin [Fri, 10 Mar 2017 16:14:43 +0000 (16:14 +0000)]
aubinator/genxml: use gzipped files to store embedded genxml

This reduces the size of the aubinator binary from ~1.4Mb to ~700Kb.
We can now drop the checks on xxd in configure.

v2: Fix incorrect makefile dependency (Lionel)

v3: use $(PYTHON2) (Emil)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agointel: genxml: add script to generate gzipped genxml
Lionel Landwerlin [Fri, 10 Mar 2017 16:12:51 +0000 (16:12 +0000)]
intel: genxml: add script to generate gzipped genxml

v2 (from Dylan):
   Add main function
   Add missing Copyright
   Use print_function

v3: Add actual license (Dylan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>