Iago Toral Quiroga [Thu, 9 Mar 2017 08:07:39 +0000 (09:07 +0100)]
anv: remove unnecessary function prototype.
The function is defined right after the prototype declaration. Also, the
protoype for it is included in anv_genX.h which is included via anv_private.h.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Timothy Arceri [Thu, 16 Mar 2017 04:43:48 +0000 (15:43 +1100)]
mapi: don't include X11/Xlib-xcb.h on non PTHREAD platforms
Should fix the last of the glthread build issues on windows.
Timothy Arceri [Thu, 16 Mar 2017 04:28:47 +0000 (15:28 +1100)]
mesa: fix glthread marshal build issues on platforms without PTHREAD
Timothy Arceri [Thu, 16 Mar 2017 03:46:16 +0000 (14:46 +1100)]
mesa: fix glthread build issues on platforms without PTHREAD
Marek Olšák [Sun, 5 Feb 2017 00:20:51 +0000 (01:20 +0100)]
gallium: implement the backend of threaded GL dispatch
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Gregory Hainaut [Sun, 12 Feb 2017 14:21:47 +0000 (15:21 +0100)]
mesa/glthread: restore the dispatch table when incompatible gl calls are detected
While a context only has a single glthread, the context itself can be
attached to several threads. Therefore the dispatch table must be
updated in all threads before the destruction of glthread. In others
words, glthread can only be destroyed safely when the context is deleted.
Fixes remaining crashes in the glx-multithread-makecurrent* tests.
V2: (Timothy Arceri) updated gl_API.dtd marshal_fail description.
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Gregory Hainaut [Mon, 13 Feb 2017 18:14:28 +0000 (19:14 +0100)]
mesa/glthread: don't set a dispatch table if we aren't the owner
Fix crashes when glxMakeCurrent is called.
Signed-off-by: Gregory Hainaut <gregory.hainaut@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Mon, 4 Mar 2013 18:30:15 +0000 (10:30 -0800)]
mesa: Track the current vertex/element array buffers for glthread.
We want to support glthread on GLES contexts with reasonable apps, and on
desktop for apps that use VBOs but haven't completely moved to core GL.
To do so, we have to deal with the "the user may or may not pass user
pointers to draw calls" problem.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Mon, 4 Mar 2013 17:47:40 +0000 (09:47 -0800)]
mesa: Disable glthread when glBegin() is called.
glBegin() swaps dispatch tables, and we don't have any code in place for
handling that in glthread (which also messes with dispatch tables), and I
don't particularly care to at this point.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Fri, 1 Mar 2013 02:15:58 +0000 (18:15 -0800)]
mesa: Add an attribute for conditions to turn off threading.
The threading for GL core is in place, but there are so few applications
actually using a core GL context that it would be nice to extend support
back. However, some of the features of compat GL (particularly user
vertex arrays) would be so expensive to track state for that we want to be
able to disable threading when we discover that the app is using them.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Wed, 27 Feb 2013 22:28:16 +0000 (14:28 -0800)]
mesa: Add support for asynchronous glDraw* on GL core.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Wed, 27 Feb 2013 21:37:14 +0000 (13:37 -0800)]
mesa: Add support for NULL arguments like in glBufferData() in marshalling.
This will let us support things like glBufferData() that should be
asynchronous.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Sat, 16 Feb 2013 00:33:33 +0000 (16:33 -0800)]
mesa: Statically allocate glthread command buffer in the batch struct.
This avoids an extra pointer dereference in the marshalling functions,
which, with the instruction count doing in the low 30s, could actually
matter for main-thread performance.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Thu, 27 Dec 2012 20:11:57 +0000 (12:11 -0800)]
glapi: Mark vertex attrib pointer functions as async.
These don't actually read data out of the pointers, they set the
pointers (or offsets in a VBO) to be used in a later draw call.
v2: Don't forget glVertexAttribIPointer, and don't bother with annotations
on aliases.
v3: Mark CompressedTexSubImage1D as sync also.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Fri, 16 Nov 2012 19:43:08 +0000 (11:43 -0800)]
mesa: Custom thread marshalling for Flush.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Wed, 14 Nov 2012 18:40:22 +0000 (10:40 -0800)]
mesa: Custom thread marshalling for ShaderSource.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Wed, 2 Jan 2013 22:12:04 +0000 (14:12 -0800)]
mesa: Connect the generated GL command marshalling code to the build.
v2: Rebase on the Begin/End changes, and just disable this feature on
non-GL-core.
v3: (Timothy Arceri) enable for non-GL-core contexts. Remove
unrelated safe_mul() hunk. while loop style fix.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Marek Olšák [Wed, 28 Sep 2016 23:00:39 +0000 (01:00 +0200)]
Revert "mesa: make _mesa_alloc_dispatch_table() static"
This reverts commit
4009d22b61e76850b1b725f4e491da05c2406fa4.
glthread needs it.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Wed, 3 Oct 2012 22:39:52 +0000 (15:39 -0700)]
mesa: Create pointers for multithread marshalling dispatch table.
This patch splits the context's CurrentDispatch pointer into two
pointers, CurrentClientDispatch, and CurrentServerDispatch, so that
when doing multithread marshalling, we can distinguish between the
dispatch table that's being used by the client (to serialize GL calls
into the marshal buffer) and the dispatch table that's being used by
the server (to execute the GL calls).
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Fri, 28 Dec 2012 23:05:34 +0000 (15:05 -0800)]
mesa: Add infrastructure for a worker thread to process GL commands.
v2: Keep an allocated buffer around instead of checking for one at the
start of every GL command. Inline the now-small space allocation
function.
v3: Remove duplicate !glthread->shutdown check, process remaining work
before shutdown.
v4: Fix leaks on destroy.
V5: (Timothy Arceri) fix order of source files in makefile
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Thu, 3 Jan 2013 19:56:54 +0000 (11:56 -0800)]
mesa: Validate count parameters when marshalling.
Otherwise, for example, glDeleteBuffers(-1, &bo) gets you a segfault
instead of GL_INVALID_VALUE.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Wed, 7 Nov 2012 21:03:13 +0000 (13:03 -0800)]
glapi: Generate GL API marshalling code from the XML.
This is not yet used in the build, just generated.
v2: Add missing build dependencies.
v3: Avoid mixing declarations and code, remove logic for avoiding emitting
code that the compiler's optimizer can deal with anyway.
v4: (Timothy Arceri) move safe_mul() genereation here from a later patch.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Tue, 5 Mar 2013 19:51:18 +0000 (11:51 -0800)]
glapi: Mark compressed teximage functions as sync.
Without doing some additional tracking, we won't know whether the data
will be immediate user data, or will be loaded from a PBO. The normal
teximage functions will be sync by default because they don't know up
front what the size of their image data is. But for compressed teximage,
we have the count information, so they would end up async by default.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Wed, 7 Nov 2012 21:04:40 +0000 (13:04 -0800)]
glapi: Annotate functions with "marshal" attribute.
Several API functions require special treatment in order to be marshalled
to a background thread. Others can't be safely executed in a background
thread and need to be executed synchronously (e.g. since they return data
through a pointer argument).
This annotation will be used when code generating thread marshalling code,
to ensure that each function is marshalled in the correct way.
Note that PixelMap functions are marked as synchronous for now since
their pointer may be relative to buffer on the GPU, so we'll need
special logic to marshal them properly.
v2: Move description of attribute types to a comment in the dtd file.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Eric Anholt [Fri, 28 Dec 2012 01:39:37 +0000 (17:39 -0800)]
egl: Implement __DRI_BACKGROUND_CALLABLE
v2: (Timothy Arceri) use C99 initializers.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Wed, 14 Nov 2012 23:50:27 +0000 (15:50 -0800)]
glx: Implement __DRI_BACKGROUND_CALLABLE
v2: Marek: Add DRI3 support.
v3: (Timothy Arceri) use C99 initializers.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Wed, 14 Nov 2012 22:26:22 +0000 (14:26 -0800)]
mesa: Add SetBackgroundContext to dd_function_table
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Wed, 14 Nov 2012 22:39:21 +0000 (14:39 -0800)]
dri: Update dri_util to keep track of __DRI_BACKGROUND_CALLABLE
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Paul Berry [Wed, 14 Nov 2012 19:13:02 +0000 (11:13 -0800)]
dri_interface: Add new marshalling interfaces to dri_interface.h
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Marek Olšák <maraeo@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Roland Scheidegger [Thu, 16 Mar 2017 03:01:41 +0000 (04:01 +0100)]
gallivm: (trivial) remove duplicated line
pointed out by clang (stored value never read)
Roland Scheidegger [Thu, 16 Mar 2017 02:59:52 +0000 (03:59 +0100)]
draw: (trivial) remove a unnecessary lp_build_alloca()
pointed out by clang (stored value never read)
Ilia Mirkin [Sun, 5 Mar 2017 23:24:44 +0000 (18:24 -0500)]
swr: support layer output in geometry shaders
This makes bin/gl-3.2-layered-rendering-gl-layer-render fail only with
2DMS_ARRAY, which is expected given the lackluster MSAA support. However
all the regular types pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Bas Nieuwenhuizen [Wed, 15 Mar 2017 17:49:29 +0000 (18:49 +0100)]
Revert "radv: Emit cache flushes before CP DMA."
This reverts commit
cce43f6d8c40222099badaf52344d6a0eed993f3.
Redundant, as the flush already happens at si_cp_dma_prepare.
Acked-by: Dave Airlie <airlied@redhat.com>
Francisco Jerez [Tue, 14 Mar 2017 00:31:39 +0000 (17:31 -0700)]
gallium/tgsi: Treat UCMP sources as floats to match the GLSL-to-TGSI pass expectations.
Currently the GLSL-to-TGSI translation pass assumes it can use
floating point source modifiers on the UCMP instruction. See the bug
report linked below for an example where an unrelated change in the
GLSL built-in lowering code for atan2 (
e9ffd12827ac11a2d2002a42fa8eb1)
caused the generation of floating-point ir_unop_neg instructions
followed by ir_triop_csel, which is translated into UCMP with a negate
modifier on back-ends with native integer support.
Allowing floating-point source modifiers on an integer instruction
seems like rather dubious design for a transport IR, since the same
semantics could be represented as a sequence of MOV+UCMP instructions
instead, but supposedly this matches the expectations of TGSI
back-ends other than tgsi_exec, and the expectations of the DX10 API.
I take no responsibility for future headaches caused by this
inconsistency.
Fixes a regression of piglit glsl-fs-tan-1 on softpipe introduced by
the above-mentioned glsl front-end commit. Even though the commit
that triggered the regression doesn't seem to have made it to any
stable branches yet, this might be worth back-porting since I don't
see any reason why the bug couldn't have been reproduced before that
point.
Suggested-by: Roland Scheidegger <sroland@vmware.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99817
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Grazvydas Ignotas [Wed, 15 Mar 2017 18:53:56 +0000 (20:53 +0200)]
util/disk_cache: do eviction before creating .tmp
cache_put() first creates a .tmp file and then tries to do eviction.
The recently added LRU eviction code selects non-empty directory with
the oldest access time, but that may easily be the one with just the
new .tmp file, especially on Linux where atime is updated lazily
(with "relatime" mount option, which is the default). So when cache is
small, if random doesn't hit another dir LRU keeps selecting the same
dir with just the .tmp and not deleting anything. To fix this (and the
tests), do eviction earlier.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tim Rowley [Wed, 15 Mar 2017 16:42:43 +0000 (11:42 -0500)]
swr: validate backend state numAttributes
General protection and prevents us from smashing the stack
on the first clear state validation (
a7b8d50bcb). Fixes crash
using icc.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Ben Widawsky [Fri, 21 Oct 2016 01:21:24 +0000 (18:21 -0700)]
gbm: Export a get modifiers
This patch originally had i965 specific code and was named:
commit
61cd3c52b868cf8cb90b06e53a382a921eb42754
Author: Ben Widawsky <ben@bwidawsk.net>
Date: Thu Oct 20 18:21:24 2016 -0700
gbm: Get modifiers from DRI
To accomplish this, two new query tokens are added to the extension:
__DRI_IMAGE_ATTRIB_MODIFIER_UPPER
__DRI_IMAGE_ATTRIB_MODIFIER_LOWER
The query extension only supported 32b queries, and modifiers are 64b,
so we needed two of them.
NOTE: The extension version is still set to 13, so none of this will
actually be called.
v2: Error handling of queryImage (Emil)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Ben Widawsky [Tue, 14 Mar 2017 01:20:02 +0000 (18:20 -0700)]
i965: introduce modifier selection.
Nothing special here other than a brief introduction to modifier
selection. Originally this was part of another patch but was split out
from
gbm: Introduce modifiers into surface/bo creation by request of Emil.
Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ben Widawsky [Tue, 14 Mar 2017 01:19:00 +0000 (18:19 -0700)]
egl/drm: Use modifiers for backbuffer creation
Split into a separate patch from the previous patch as requested by
Emil.
Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ben Widawsky [Thu, 3 Nov 2016 23:14:44 +0000 (16:14 -0700)]
gbm: Introduce modifiers into surface/bo creation
The idea behind modifiers like this is that the user of GBM will have
some mechanism to query what properties the hardware supports for its BO
or surface. This information is directly passed in (and stored) so that
the DRI implementation can create an image with the appropriate
attributes.
A getter() will be added later so that the user GBM will be able to
query what modifier should be used.
Only in surface creation, the modifiers are stored until the BO is
actually allocated. In regular buffer allocation, the correct modifier
can (will be, in future patches be chosen at creation time.
v2: Make sure to check if count is non-zero in addition to testing if
calloc fails. (Daniel)
v3: Remove "usage" and "flags" from modifier creation. Requested by
Kristian.
v4: Take advantage of the "INVALID" modifier added by the GET_PLANE2
series.
v5: Don't bother with storing modifiers for gbm_bo_create because that's
a synchronous operation and we can actually select the correct modifier
at create time (done in a later patch) (Jason)
v6: Make modifier condition outside the check so that dri_use will work
properly (Jason)
Cc: Kristian Høgsberg <krh@bitplanet.net>
References (v4): https://lists.freedesktop.org/archives/intel-gfx/2017-January/116636.html
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Ben Widawsky [Mon, 13 Mar 2017 21:53:43 +0000 (14:53 -0700)]
i965: Implement basic modifier image creation
This is just a stub for now and will be filled in later.
This was split out of an earlier patch
Requested-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ben Widawsky [Fri, 4 Nov 2016 18:31:15 +0000 (11:31 -0700)]
dri: Add an image creation with modifiers
Modifiers will be obtained or guessed by the client and passed in during
image creation/import. In guessing, a client might decide to simply pass
along all known modifiers
This requires bumping the DRIimage version.
As of this patch, the modifiers aren't plumbed all the way down, this
patch simply makes sure the interface level stuff is correct.
v2: Don't allow usage + modifiers
v3: Make NAND actually NAND. Bug introduced in v2. (Jason)
v4:
- s/obtains/obtained (Jason)
- Pull out i965 imlemnentation into a later patch (Emil)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Marek Olšák [Tue, 7 Mar 2017 01:19:47 +0000 (02:19 +0100)]
radeonsi: implement TGSI opcodes TEX_LZ and TXF_LZ
This massively decreases VGPR spilling for DiRT Showdown, because we
no longer have to use v4i32 for 2D fetches when level == 0.
We now use v2i32 for those cases.
DiRT Showdown - Spilled VGPRs: -26 (-81%)
This surprisingly doesn't have any useful effect on performance (+ 0.05%).
Marek Olšák [Tue, 7 Mar 2017 01:26:47 +0000 (02:26 +0100)]
glsl_to_tgsi: use TEX_LZ and TXF_LZ when available
Marek Olšák [Tue, 7 Mar 2017 01:01:08 +0000 (02:01 +0100)]
glsl_to_tgsi: remove a redundant statement
it's the same as the last "else".
Marek Olšák [Tue, 7 Mar 2017 01:15:14 +0000 (02:15 +0100)]
gallium: add TGSI opcodes TEX_LZ and TXF_LZ
for better code generation in radeonsi
Marek Olšák [Tue, 7 Mar 2017 01:09:03 +0000 (02:09 +0100)]
gallium: add PIPE_CAP_TGSI_TEX_TXF_LZ
Samuel Pitoiset [Tue, 14 Mar 2017 23:59:13 +0000 (00:59 +0100)]
radeonsi: disable sinking common instructions down to the end block
Initially this was a workaround for a bug introduced in LLVM 4.0
in the SimplifyCFG pass that caused image instrinsics to disappear
(because they were badly sunk). Finally, this is a win because it
decreases SGPR spilling and increases the number of waves a bit.
Although, shader-db results are good I think we might want to
remove it in the future once the issue is fixed. For now, enable
it for LLVM >= 4.0.
This also fixes a rendering issue with the speedometer in Dirt Rally.
More information can be found here https://reviews.llvm.org/D26348.
Thanks to Dave Airlie for the patch.
v2: - add a FIXME comment
- use if (HAVE_LLVM >= 0x0400) instead
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 15 Mar 2017 11:40:13 +0000 (12:40 +0100)]
tgsi: add missing compute shader entry in tgsi_get_processor_name()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 15 Mar 2017 12:00:02 +0000 (13:00 +0100)]
radeonsi: clean up tex_fetch_ptrs()
Will also help when the src sampler register will be
TGSI_FILE_CONSTANT for bindless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Emil Velikov [Thu, 2 Mar 2017 19:02:45 +0000 (19:02 +0000)]
configure.ac: bump pthread-stubs requirement
On platforms that require it, we bump the requirement to 0.4 or later.
Due to an issue with the project [design] any version earlier than it,
is bound to cause issues. For the specifics see the pthread-stubs README
Cc: Uli Schlachter <psychon@znc.in>
Cc: Jonathan Gray <jsg@jsg.id.au>
Cc: Jean-Sébastien Pédron <dumbbell@FreeBSD.org>
Cc: François Tigeot <ftigeot@wolfpond.org>
Cc: Tobias Nygren <tnn@NetBSD.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Tue, 27 Sep 2016 12:39:36 +0000 (13:39 +0100)]
glx: don't expose systemTimeExtension for DRI2/DRI3/DRISW
Used/applicable to only dri1 drivers.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Thu, 1 Dec 2016 21:21:10 +0000 (21:21 +0000)]
anv: do not open random render node(s)
drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless if it's the one we're interested or not.
v2: Rebase, explicitly require/check for libdrm
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)
v4: Rebase
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 20:58:20 +0000 (20:58 +0000)]
radv: do not open random render node(s)
drmGetDevices2() provides us with enough flexibility to build heuristics
upon. Opening a random node on the other hand will wake up the device,
regardless if it's the one we're interested or not.
v2: Rebase.
v3: Return VK_ERROR_INCOMPATIBLE_DRIVER for no devices (Ilia)
Cc: Michel Dänzer <michel.daenzer@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:53:11 +0000 (19:53 +0000)]
radv/winsys: use drmGetDevice2 API
Analogous to previous commit
v2: Add explicit require_libdrm check.
Cc: Dave Airlie <airlied@redhat.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:54:39 +0000 (19:54 +0000)]
winsys/amdgpu: use drmGetDevice2 API
Analogous to previous commit
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98502
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:51:03 +0000 (19:51 +0000)]
loader: use drmGetDevice[s]2 API
By this allows us to fetch the device list/info w/o the revision field.
At the moment retrieving the latter wakes up the device.
Note: kernel patch to resolve that should be in 4.10.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Thu, 1 Dec 2016 19:48:43 +0000 (19:48 +0000)]
autoconf/scons: bump libdrm to 2.4.75
We'll be using the drmGetDevice[s]2 API in src/loader with next patch.
v2: Rebase.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Emil Velikov [Tue, 24 Jan 2017 21:21:10 +0000 (21:21 +0000)]
util/sha1: drop _mesa_sha1_{update, format} return type
Unused/unchecked by any of the callers.
v2: Fix the glsl cases that have crept in since v1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Emil Velikov [Tue, 24 Jan 2017 21:21:09 +0000 (21:21 +0000)]
util/sha1: rework _mesa_sha1_{init,final}
Rather than having an extra memory allocation [that we currently do not
and act accordingly] just make the API take an pointer to a stack
allocated instance.
This and follow-up steps will effectively make the _mesa_sha1_foo simple
define/inlines around their SHA1 counterparts.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Emil Velikov [Tue, 24 Jan 2017 21:21:08 +0000 (21:21 +0000)]
util/sha1: add non-typedef name for the SHA1_CTX struct
Using typedef(s) is not always the answer and makes it harder for people
to do clever (or one might call nasty) things with the code.
Add a struct name which we will use with follow-up commit.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Bas Nieuwenhuizen [Wed, 15 Mar 2017 07:54:04 +0000 (08:54 +0100)]
radv: Remove unused descriptor set field.
Trivial.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Dave Airlie [Thu, 31 Mar 2016 05:33:16 +0000 (15:33 +1000)]
r600: refactor binding code for attach buffer to CB.
This refactors out the code and fixes it up to be used
for images later. It uses the code in the current RAT binding
for compute.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 31 Mar 2016 05:27:42 +0000 (15:27 +1000)]
r600: refactor out CB setup.
This moves the code to create CB info out into
a separate function so it can be reused in images
code to create RATs.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 31 Mar 2016 05:24:47 +0000 (15:24 +1000)]
r600: refactor texture resource words setup code.
This refactors out the code to setup a texture resource
so we can reuse it later from the images code.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 31 Mar 2016 05:20:42 +0000 (15:20 +1000)]
r600: factor out the code to initialise a buffer resource.
This takes the code required to initialise a buffer resource
out of the texture buffer code, into it's own function.
This is going to be used for the image support later.
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 26 Jan 2016 03:35:08 +0000 (13:35 +1000)]
r600g: make framebuffer atom rely on dual src blend state.
In order to make ARB_shader_image_load_store, we have to share
the CB space with RATs, so we should only steal the dual src
space if we have dual src enabled.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Mon, 13 Mar 2017 21:23:34 +0000 (14:23 -0700)]
intel/debug: Add a common INTEL_DEBUG=nohiz option
The GL driver had a driconf option (which doesn't make much sense) and
the Vulkan driver had a hand-rolled environment variable. Instead,
let's tie both into the INTEL_DEBUG mechanism and unify things.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Mon, 13 Mar 2017 15:10:38 +0000 (08:10 -0700)]
anv/image: Move handling of INTEL_VK_HIZ
This makes it so that you don't get an "Implement gen7 HiZ" perf warning
when you manually disable HiZ on gen8.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Timothy Arceri [Tue, 14 Mar 2017 04:50:34 +0000 (15:50 +1100)]
radv: trivial tidy ups
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Alan Swanson [Mon, 6 Mar 2017 16:17:32 +0000 (16:17 +0000)]
util/disk_cache: scale cache according to filesystem size
Select higher of current 1G default or 10% of filesystem where
cache is located.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Alan Swanson [Mon, 6 Mar 2017 16:17:31 +0000 (16:17 +0000)]
util/disk_cache: actually enforce cache size
Currently only a one in one out eviction so if at max_size and
cache files were to constantly increase in size then so would the
cache. Restrict to limit of 8 evictions per new cache entry.
V2: (Timothy Arceri) fix make check tests
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Alan Swanson [Fri, 10 Mar 2017 16:22:51 +0000 (16:22 +0000)]
util/disk_cache: use LRU eviction rather than random eviction
Still using fast random selection of two-character subdirectory in
which to check cache files rather than scanning entire cache.
v2: Factor out double strlen call
v3: C99 declaration of variables where used
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Timothy Arceri [Tue, 14 Mar 2017 00:22:44 +0000 (11:22 +1100)]
util/disk_cache: don't fallback to an empty cache dir on evict
If we fail to randomly select a two letter cache dir, don't select
an empty dir on fallback.
In real world use we should never hit the fallback path but it can
be hit by tests when the cache is set to a very small max value.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Timothy Arceri [Mon, 13 Mar 2017 00:07:30 +0000 (11:07 +1100)]
util/disk_cache: use a thread queue to write to shader cache
This should help reduce any overhead added by the shader cache
when programs are not found in the cache.
To avoid creating any special function just for the sake of the
tests we add a one second delay whenever we call dick_cache_put()
to give it time to finish.
V2: poll for file when waiting for thread in test
V3: fix poll delay to really be 100ms, and simplify the wait function
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Timothy Arceri [Sun, 12 Mar 2017 23:14:35 +0000 (10:14 +1100)]
util/disk_cache: add helpers for creating/destroying disk cache put jobs
V2: Make a copy of the data so we don't have to worry about it being
freed before we are done compressing/writing.
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Timothy Arceri [Wed, 8 Mar 2017 23:51:01 +0000 (10:51 +1100)]
util/disk_cache: add thread queue to disk cache
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Dave Airlie [Tue, 14 Mar 2017 21:15:50 +0000 (07:15 +1000)]
radv/ac: workaround regression in llvm 4.0 release
LLVM 4.0 released with a pretty messy regression, that hopefully
get fixed in the future.
This work around was proposed by Tom, and it fixes the CTS regressions
here at least, I'm not sure if this will cause any major side effects,
but correctness over speed and all that.
radeonsi should possibly consider the same workaround until an llvm
fix can be found.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 27 Feb 2017 01:30:41 +0000 (11:30 +1000)]
radv/ac: gather4 cube workaround integer
This fix is extracted from amdgpu-pro shader traces.
It appears the gather4 workaround for integer types doesn't
work for cubes, so instead if forces a float scaled sample,
then converts to integer.
It modifies the descriptor before calling the gather.
This also produces some ugly asm code for reasons specified
in the patch, llvm could probably do better than dumping
sgprs to vgprs.
This fixes:
dEQP-VK.glsl.texture_gather.basic.cube.rgba8*
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 21:57:55 +0000 (22:57 +0100)]
radv: Set driver version to mesa version;
I couldn't really find an encoding in the spec. I'm not sure it
prescribes VK_MAKE_VERSION format, but vulkan.gpuinfo.org interprets
it that way by default. vulkaninfo gives the raw number, so we could
alternatively do something like
17001000, but that doesn't show
up right on vulkan.gpuinfo.org again. Looking at that site, the -pro
driver also uses VK_MAKE_VERSION, so keeping consistency is probably
best.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 21:37:03 +0000 (22:37 +0100)]
radv: Increase api version to 1.0.42.
I've skimmed to changes from 1.0.5 to 1.0.42 and I think we have all
changes. We're still not conformant ofcourse, but this should not
regress stuff,
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Tue, 14 Mar 2017 02:26:06 +0000 (19:26 -0700)]
util/vk: Add helpers for finding an extension struct
Reviewed-by: Dave Airlie <airlied@redhat.com>
Alex Smith [Tue, 14 Mar 2017 15:26:32 +0000 (15:26 +0000)]
radv: Flush before copying with PKT3_WRITE_DATA in CmdUpdateBuffer
Need to flush before updating the buffer to ensure that the copy is
ordered after previous accesses (assuming the app has performed the
appropriate barriers).
This fixes potential issues due to draws prior to an update reading
the new buffer content, despite having the necessary barriers between
them.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 14 Mar 2017 20:46:54 +0000 (21:46 +0100)]
radv: Emit cache flushes before CP DMA.
The flushes could be due to TRANSFER barriers.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jan Beich [Sun, 12 Mar 2017 03:19:14 +0000 (03:19 +0000)]
Convert sed(1) syntax to be compatible with FreeBSD and OpenBSD
BSD regex library doesn't support extended RE escapes (e.g. \+) and
shorthand character classes (e.g. \s, \S) and SVR4-style word
delimiters[1] (on DragonFly and NetBSD). Both GNU and BSD sed support
-E and -r to enable extended RE but OS X still lacks -r.
[1] https://www.illumos.org/issues/516
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Eric Engestrom <eric.engestrom@imgtec.com> (GNU sed)
Jason Ekstrand [Tue, 14 Mar 2017 02:30:26 +0000 (19:30 -0700)]
anv: Properly enumerate physical devices when none are present
Jason Ekstrand [Thu, 9 Mar 2017 04:23:05 +0000 (20:23 -0800)]
nir/constant_expressions: Refactor helper functions
Apart from avoiding some unneeded size cases, this shouldn't have any
actual functional impact.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 03:54:37 +0000 (19:54 -0800)]
nir: Rework conversion opcodes
The NIR story on conversion opcodes is a mess. We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random. This commit re-organizes things and makes them all
consistent:
- All non-bool conversion opcodes now have the explicit size in the
destination and are named <src_type>2<dst_type><size>.
- Integer <-> integer conversion opcodes now only come in i2i and u2u
forms (i2u and u2i have been removed) since the only difference
between the different integer conversions is whether or not they
sign-extend when up-converting.
- Boolean conversion opcodes all have the explicit size on the bool and
are named <src_type>2<dst_type>.
Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako. This will make adding
int8, int16, and float16 versions much easier when the time comes.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Wed, 8 Mar 2017 03:32:50 +0000 (19:32 -0800)]
i965/fs: Re-arrange conversion operations
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 02:32:17 +0000 (18:32 -0800)]
i965/vec4: Get rid of the type parameter from to/from_double
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 00:46:44 +0000 (16:46 -0800)]
glsl/nir: Use nir_type_conversion_op
Using the helper is way better than hand-coding the universe.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Mon, 13 Mar 2017 20:07:24 +0000 (13:07 -0700)]
nir: Rewrite nir_type_conversion_op
The original version was very convoluted and tried way too hard to not
just have the nested switch statement that it needs. Let's just write
the obvious code and then we know it's correct. This fixes a bunch of
missing cases particularly with int64.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Jason Ekstrand [Wed, 8 Mar 2017 00:46:17 +0000 (16:46 -0800)]
nir: Add a get_nir_type_for_glsl_base_type helper
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Wed, 8 Mar 2017 18:32:40 +0000 (10:32 -0800)]
nir/validate: Rework ALU bit-size rule validation
The original bit-size validation wasn't capable of properly dealing with
instructions with variable bit sizes. An attempt was made to handle it
by looking at source and destinations but, because the validation was
done in validate_alu_(src|dest), it didn't really have the needed
information. The new validation code is much more straightforward and
should be more correct.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Fri, 3 Mar 2017 00:25:59 +0000 (16:25 -0800)]
nir/validate: Validate that bit sizes and components always match
We've always required bit sizes to match but the rules for number of
components have been a bit loose. You've never been allowed to source
from something with less components than you consume, but more has
always been fine. This changes the validator to require that they match
exactly. The fact that they don't always match has been a source of
confusion in NIR for quite some time and it's time we got rid of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 05:42:06 +0000 (21:42 -0800)]
nir: Make image_size a variable-width intrinsic
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 05:39:58 +0000 (21:39 -0800)]
i965/fs: Use num_components from the SSA def in image intrinsics
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 03:27:57 +0000 (19:27 -0800)]
nir/lower_tex: Use tex_instr_dest_size for txs destinations
Using coord_components of the source texture is correct for everything
except cube maps where it's off by one.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 03:03:01 +0000 (19:03 -0800)]
nir/spirv: Restrict the number of channels in texture coordinates
Some SPIR-V texturing instructions pack more than the texture coordinate
into the coordinate source. We need to mask off the unused channels.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 3 Mar 2017 01:10:24 +0000 (17:10 -0800)]
nir/copy_prop: Respect the source's number of components
In the near future we are going to require that the num_components in a
src dereference match the num_components of the SSA value being
dereferenced. To do that, we need copy_prop to not remove our MOVs from
a larger SSA value into an instruction that uses fewer channels.
Because we suddenly have to know how many components each source has,
this makes the pass a bit more complicated. Fortunately, copy
propagation is the only pass that cares about the number of components
are read by any given source so it's fairly contained.
Shader-db results on Sky Lake:
total instructions in shared programs:
13318947 ->
13320265 (0.01%)
instructions in affected programs: 260633 -> 261951 (0.51%)
helped: 324
HURT: 1027
Looking through the hurt programs, about a dozen are hurt by 3
instructions and the rest are all hurt by 2 instructions. From a
spot-check of the shaders, the story is always the same: They get a
vec4 from somewhere (frequently an input) and use the first two or three
components as a texture coordinate. Because of the vector component
mismatch, we have a mov or, more likely, a vecN sitting between the
texture instruction and the input. This means that the back-end inserts
a bunch of MOVs and split_virtual_grfs() goes to town. Because the
texture coordinate is also used by some other calculation, register
coalesce can't combine them back together and we end up with an extra 2
MOV instructions in our shader.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>