mesa.git

Eric Anholt [Tue, 18 Oct 2016 20:08:02 +0000 (13:08 -0700)]
vc4: Avoid making temporaries for assignments to NIR registers.

Getting stores to NIR regs to not generate new MOVs is tricky, since the
result we're trying to store into the NIR reg may have been from a
conditional update of a temp, or a series of packed writes.  The easiest
solution seems to be to require that nir_store_dest()'s arg comes from an
SSA temp.

This causes us to put in a few more temporary MOVs in the NIR SSA dest
case, but copy propagation successfully cleans those up.

The shader-db change is modest:

total instructions in shared programs: 93774 -> 93598 (-0.19%)
instructions in affected programs:     14760 -> 14584 (-1.19%)
total estimated cycles in shared programs: 212135 -> 211946 (-0.09%)
estimated cycles in affected programs:     27005 -> 26816 (-0.70%)

but I was seeing patterns in some register-allocation failures in DEQP
tests that looked like the extra MOVs would increase maximum register
pressure in loops.  Some debug code indicates that that's not the case,
though I'm still a bit confused by that result.

Eric Anholt [Thu, 20 Oct 2016 20:34:54 +0000 (13:34 -0700)]
vc4: Add a comment with discussion of how simulation works.

Eric Anholt [Thu, 20 Oct 2016 18:50:03 +0000 (11:50 -0700)]
vc4: Move simulator winsys mapping and tracking to the simulator.

One tiny hack is left in vc4_bufmgr.c for what kind of mapping we got so
that we can free it.

Eric Anholt [Wed, 12 Oct 2016 17:30:41 +0000 (10:30 -0700)]
vc4: Move simulator memory management to a u_mm.h heap.

Now we aren't limited to 256MB total allocated across a driver instance,
just 256MB at one time.  We're still copying in and out, which should get
fixed.

Eric Anholt [Wed, 12 Oct 2016 00:44:43 +0000 (17:44 -0700)]
vc4: Move simulator globals into a struct.

I would like to put a couple more things in here, so it's time to package
it up.

Eric Anholt [Tue, 11 Oct 2016 23:47:58 +0000 (16:47 -0700)]
vc4: Restructure the simulator mode.

Rather than having simulator mode changes scattered around vc4_bufmgr.c
and vc4_screen.c, make vc4_bufmgr.c just call a vc4_simulator_ioctl, which
then dispatches to a corresponding implementation.

This will give the simulator support a centralized place to do tricks like
storing most BOs directly in simulator memory rather than copying in and
out.

This leaves special-casing of mmapping BOs and of execution, because of
the winsys mapping.
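A minimal sketch of that single-entry-point shape, not the vc4 code: the request codes, argument struct and handler names below are hypothetical stand-ins for the real DRM ioctl numbers and vc4_simulator_ioctl() internals, which this log does not show.

```c
#include <errno.h>

/* Hypothetical request codes; the real driver uses DRM ioctl numbers. */
enum sim_request {
    SIM_CREATE_BO,
    SIM_GEM_CLOSE,
    SIM_GET_PARAM,
};

/* Hypothetical argument block shared by the requests in this sketch. */
struct sim_args {
    unsigned size;   /* in: BO size for SIM_CREATE_BO */
    unsigned handle; /* out for SIM_CREATE_BO, in for SIM_GEM_CLOSE */
};

static unsigned sim_next_handle = 1;

static int sim_create_bo(struct sim_args *args)
{
    if (args->size == 0)
        return -EINVAL;
    args->handle = sim_next_handle++;
    return 0;
}

static int sim_gem_close(struct sim_args *args)
{
    return args->handle != 0 ? 0 : -EINVAL;
}

/* The one call the buffer manager makes; simulator special-casing lives
 * behind this switch instead of being scattered across callers. */
static int sim_ioctl(enum sim_request req, struct sim_args *args)
{
    switch (req) {
    case SIM_CREATE_BO:
        return sim_create_bo(args);
    case SIM_GEM_CLOSE:
        return sim_gem_close(args);
    default:
        return -EINVAL; /* request not handled by this sketch */
    }
}
```

Per the commit, mmapping BOs and execution would still sit outside such a dispatcher.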

Eric Anholt [Thu, 20 Oct 2016 23:48:12 +0000 (16:48 -0700)]
vc4: Fix termination of the initial scan for branch targets.

The loop is scanning until the original max_ip (size of the BO), but we
want to not examine any code after the PROG_END's delay slots.  There was
a block trying to do that, except that we had some early continue
statements if the signal wasn't a PROG_END or a BRANCH.

The failure mode would be that a valid shader is rejected because some
undefined memory after the PROG_END slots is parsed as a branch and the
rest of its setup is illegal.  I haven't seen this in the wild, but
valgrind was complaining and the new userland simulator code started
triggering it.
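A minimal sketch of the fixed control flow, assuming a hypothetical pre-decoded signal array and two delay slots after PROG_END (the real validator decodes QPU instruction words and records branch targets rather than counting branches). The point is that the end-of-program check runs before any early continue:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical signal values standing in for the decoded QPU signal field. */
enum sig { SIG_NONE, SIG_BRANCH, SIG_PROG_END };

#define PROG_END_DELAY_SLOTS 2 /* assumption for this sketch */

/* Counts branches that can actually execute: everything up to and including
 * the PROG_END delay slots, ignoring whatever else is in the BO. */
static size_t count_branches(const enum sig *insts, size_t max_ip)
{
    size_t branches = 0;
    bool found_end = false;
    size_t end_ip = 0;

    for (size_t ip = 0; ip < max_ip; ip++) {
        /* Termination check first, so no early `continue` can skip it. */
        if (found_end && ip > end_ip + PROG_END_DELAY_SLOTS)
            break;

        if (insts[ip] == SIG_PROG_END) {
            found_end = true;
            end_ip = ip;
            continue;
        }
        if (insts[ip] != SIG_BRANCH)
            continue;

        branches++;
    }
    return branches;
}
```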

Jason Ekstrand [Thu, 20 Oct 2016 23:04:36 +0000 (16:04 -0700)]
configure: Get rid of the --disable-vulkan-icd-full-driver-path flag

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>

Jason Ekstrand [Thu, 20 Oct 2016 23:04:16 +0000 (16:04 -0700)]
anv: Always use the full driver path in the intel_icd.*.json

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>

Jason Ekstrand [Thu, 20 Oct 2016 22:46:21 +0000 (15:46 -0700)]
anv: Suffix the intel_icd file with the host CPU

Vulkan has a multi-arch problem... The idea behind the Vulkan loader is
that you have a little json file on your disk that tells the loader where
to find drivers.  The loader looks for these json files in standard
locations, and then goes and loads the my_driver.so's that they specify.
This allows you, as a driver implementer, to put your driver wherever you
want on disk, so long as the ICD points to the right place.

For a multi-arch system, however, you may have multiple libvulkan_intel.so
files installed that the loader needs to pick depending on architecture.
Since the ICD file format does not specify any architecture information,
you can't tell the loader where to find the 32-bit version vs. the 64-bit
version.  The way that packagers have been dealing with this is to place
libvulkan_intel.so in the top level lib directory and provide just a name
(and no path) to the loader.  It will then use the regular system search
paths and find the correct driver.  While this solution works fine for
distro-installed Vulkan drivers, it doesn't work so well for user-installed
drivers because they may put it in /opt or $HOME/.local or some other more
exotic location.  In this case, you can't use an ICD json file with just a
library name because the loader doesn't know where to find it; you also have to add
that to your library lookup path via LD_LIBRARY_PATH or similar.

This patch handles both use-cases by taking advantage of the fact that the
loader dlopen()s each of the drivers and, if one dlopen() call fails, it
silently continues on to open other drivers.  By suffixing the icd file, we
can provide two different json files: intel_icd.x86_64.json and
intel_icd.i686.json with different paths.  Since dlopen() will only succeed
on the libvulkan_intel.so of the right arch, the loader will happily ignore
the others and load that one.  This allows us to properly handle multi-arch
while still providing a full path so user installs will work fine.
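A minimal sketch of the loader behaviour this relies on, not the Vulkan loader's code: each ICD's library is tried in turn and a failed open is skipped. The opener is passed in as a function pointer (in a real loader it would wrap dlopen(path, RTLD_NOW)), and fake_open_64bit_host() is a hypothetical stand-in that only succeeds for a lib64 path, so the sketch runs without any driver installed:

```c
#include <stddef.h>
#include <string.h>

/* Opens a shared library by path; returns NULL on failure. */
typedef void *(*open_fn)(const char *path);

/* Tries every ICD library path; a failed open (wrong architecture, missing
 * file) is skipped rather than treated as fatal. Returns how many opened. */
static size_t open_icds(const char *const *paths, size_t n, open_fn open_lib,
                        void **handles)
{
    size_t opened = 0;
    for (size_t i = 0; i < n; i++) {
        void *h = open_lib(paths[i]);
        if (h == NULL)
            continue;
        handles[opened++] = h;
    }
    return opened;
}

/* Hypothetical stand-in for dlopen() on a 64-bit host: only libraries in a
 * lib64 directory "load". */
static int fake_handle;
static void *fake_open_64bit_host(const char *path)
{
    return strstr(path, "/lib64/") != NULL ? (void *)&fake_handle : NULL;
}
```

With one JSON file per architecture, each carrying a full path, only the library matching the host opens and the others are ignored.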

I tested this on my Fedora 25 machine with 32 and 64-bit builds of our
Vulkan driver installed and 32 and 64-bit builds of crucible.  It seems to
work just fine.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>

Nicolai Hähnle [Thu, 20 Oct 2016 11:05:40 +0000 (13:05 +0200)]
radeonsi: fix a regression in si_eliminate_const_output

A constant value of float type is not necessarily a ConstantFP: it could also
be a constant expression that for some reason hasn't been folded.

This fixes a regression in GL45-CTS.arrays_of_arrays_gl.InteractionFunctionCalls2
that was introduced by commit 3ec9975555d1cc5365413ad9062f412904f944a3.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Ilia Mirkin [Thu, 20 Oct 2016 02:36:03 +0000 (22:36 -0400)]
nv50,nvc0: don't keep track of whether fb rt0 is integer-only

This reverts commits 1af0641db345209c076e9b1ba4dca7524541671a and
a6ad49cbbd599aec054d0a3163fff5ad724f2b18.

st/mesa adjusts the rasterizer state for us now.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Francisco Jerez [Wed, 19 Oct 2016 03:44:10 +0000 (20:44 -0700)]
Revert "Revert "mapi: export all GLES 3.2 functions in libGLESv2.so""

This reverts commit 85e9bbc14d93fa7166c9ae075ee7ae29a8313e3f.  The
previous commit should help with the scons build failure caused by the
original commit.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

Francisco Jerez [Tue, 18 Oct 2016 21:53:20 +0000 (14:53 -0700)]
glapi: Move PrimitiveBoundingBox and BlendBarrier definitions into ES3.2 category.

These two GLES 3.2 entry points were being defined in the category of
the ARB_ES3_2_compatibility and KHR_blend_equation_advanced extensions
respectively instead of in the ES3.2 category.  Defining them in the
ES3.2 category makes sure that the gl_procs.py generator emits
declarations in the glprocs.h header file for the unsuffixed GLES-only
entry points that PrimitiveBoundingBoxARB and BlendBarrierKHR
respectively alias.  This should avoid a compilation failure during
scons builds in combination with "mapi: export all GLES 3.2 functions
in libGLESv2.so".

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

Vinson Lee [Thu, 20 Oct 2016 01:03:12 +0000 (18:03 -0700)]
util: Include string.h in bitscan.h.

Fix build error with clang.

  Compiling src/compiler/glsl/link_varyings.cpp ...
In file included from src/compiler/glsl/link_varyings.cpp:33:
In file included from src/compiler/glsl/glsl_symbol_table.h:34:
In file included from src/compiler/glsl/ir.h:33:
In file included from src/compiler/glsl_types.h:29:
/usr/include/string.h:518:12: error: exception specification in declaration does not match previous declaration
extern int ffs (int __i) __THROW __attribute__ ((__const__));
           ^
src/util/bitscan.h:51:13: note: expanded from macro 'ffs'
            ^
src/util/bitscan.h:96:18: note: previous declaration is here
   const int i = ffs(*mask) - 1;
                 ^
src/util/bitscan.h:51:13: note: expanded from macro 'ffs'
            ^
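A minimal sketch of the include-order fix, under the assumption (suggested by the "expanded from macro 'ffs'" notes) that bitscan.h maps ffs to a compiler builtin; bit_scan() is a hypothetical stand-in for the helper at bitscan.h:96. Once string.h has been included before the macro exists, its include guard keeps it from being processed again with the macro active:

```c
/* Declare the libc ffs() before the macro below can rewrite it. */
#include <string.h>
#include <strings.h>

#if defined(__GNUC__)
#define ffs __builtin_ffs
#endif

/* Returns the index of the lowest set bit in *mask and clears that bit.
 * The caller must ensure *mask is non-zero. */
static inline int bit_scan(unsigned *mask)
{
    const int i = ffs((int)*mask) - 1;
    *mask ^= (1u << i);
    return i;
}
```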

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97952
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>

Samuel Pitoiset [Wed, 19 Oct 2016 22:41:00 +0000 (00:41 +0200)]
nvc0: do not break 3D state by pushing MS coordinates on Fermi

Long story short, 3D and CP are aliased on Fermi and initializing
compute after pushing the MS sample coordinate offsets seems to
corrupt 3D state for weird reasons.

I still don't have the faintest clue what is going on, but
this seems to only affect Fermi generation. A possible fix
could be to use two different channels, one for 3D and one
for CP.

This fixes a bunch of regressions pinpointed by piglit.

Fixes: "nvc0: fix up image support for allowing multiple samples"
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

Samuel Pitoiset [Thu, 20 Oct 2016 16:08:44 +0000 (18:08 +0200)]
nvc0: translate compute shaders at program creation

This makes shader-db report results for compute shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

Ben Widawsky [Tue, 18 Oct 2016 20:50:08 +0000 (13:50 -0700)]
i965: Reorder PCI ID list to match release order

I have some OCD...

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

Ben Widawsky [Tue, 18 Oct 2016 20:32:08 +0000 (13:32 -0700)]
i965: Add some APL and KBL SKU strings

We got a couple for products that exist on ark.intel.com, so let's just
put them in now.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>

Brian Paul [Fri, 14 Oct 2016 15:35:43 +0000 (09:35 -0600)]
vbo: clean up with 'indent', whitespace fixes, etc in vbo_exec_array.c

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

Brian Paul [Fri, 14 Oct 2016 15:23:37 +0000 (09:23 -0600)]
vbo: whitespace fixes and reformatting in vbo_exec_api.c

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

Brian Paul [Fri, 14 Oct 2016 15:18:18 +0000 (09:18 -0600)]
vbo: minor clean-up in vbo_exec_api.c

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

Brian Paul [Thu, 13 Oct 2016 20:43:36 +0000 (14:43 -0600)]
vbo: move attribute type assignment

If the attribute type is changing, we would have found that earlier in
the ATTR_UNION() macro and would have called vbo_exec_fixup_vertex().
So move the assignment into that function so we don't do it every time.

No Piglit regressions.
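A minimal sketch of the resulting split, with hypothetical names and a simplified attribute record (the real ATTR_UNION() macro and vbo_exec_fixup_vertex() carry much more state): the per-call path only compares, and the type is written on the rare path that already runs when it changes.

```c
/* Hypothetical per-attribute state, loosely modelled on the vbo exec path. */
struct attr_state {
    unsigned size;   /* number of components currently stored */
    unsigned type;   /* e.g. a GLenum such as GL_FLOAT */
    unsigned fixups; /* counts slow-path calls, for illustration */
};

/* Slow path: runs only when size or type changes, so the type assignment
 * lives here instead of on every call. */
static void fixup_vertex(struct attr_state *a, unsigned size, unsigned type)
{
    a->size = size;
    a->type = type;
    a->fixups++;
}

/* Fast path, standing in for the ATTR_UNION() macro. */
static void set_attr(struct attr_state *a, unsigned size, unsigned type)
{
    if (a->size != size || a->type != type)
        fixup_vertex(a, size, type);
    /* ...store the component values here... */
}
```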

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

Brian Paul [Thu, 13 Oct 2016 20:21:46 +0000 (14:21 -0600)]
vbo: rename reset_attrfv() to vbo_reset_all_attr()

Use a better name.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

Brian Paul [Thu, 13 Oct 2016 20:20:25 +0000 (14:20 -0600)]
vbo: make vbo_reset_attr() static

Not called from any other file.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

Brian Paul [Thu, 13 Oct 2016 20:11:06 +0000 (14:11 -0600)]
vbo: trivial indentation fix in vbo_exec_api.c

Marek Olšák [Thu, 20 Oct 2016 09:21:26 +0000 (11:21 +0200)]
gallivm: try to fix build with LLVM <= 3.4 due to missing CallSite.h

Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>

Marek Olšák [Wed, 19 Oct 2016 22:11:48 +0000 (00:11 +0200)]
radeonsi: fix build of si_eliminate_const_vs_outputs on LLVM <= 3.8

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Wed, 19 Oct 2016 22:09:44 +0000 (00:09 +0200)]
gallivm: add wrappers for missing functions in LLVM <= 3.8

radeonsi needs these.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Nicolai Hähnle [Tue, 18 Oct 2016 16:40:38 +0000 (18:40 +0200)]
radeonsi: fix 64-bit loads from LDS

Fixes spec/arb_tessellation_shader/execution/dvec[23]-vs-tcs-tes, among
others.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>

Nicolai Hähnle [Wed, 19 Oct 2016 16:14:48 +0000 (18:14 +0200)]
st/mesa: only set primitive_restart when the restart index is in range

Even when enabled, primitive restart has no effect when the restart index
is larger than the largest value representable in the index buffer's
index type.

Fixes GL45-CTS.gtf31.GL3Tests.primitive_restart.primitive_restart_upconvert
for radeonsi VI.
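A minimal sketch of the range check, with hypothetical function names and assuming 1-, 2- or 4-byte indices:

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest value an index of `index_size` bytes can hold. */
static uint32_t max_index_for_size(unsigned index_size)
{
    switch (index_size) {
    case 1: return UINT8_MAX;
    case 2: return UINT16_MAX;
    default: return UINT32_MAX;
    }
}

/* Restart only needs to be enabled if the restart index can actually occur
 * in the index buffer. */
static bool effective_primitive_restart(bool enabled, uint32_t restart_index,
                                        unsigned index_size)
{
    return enabled && restart_index <= max_index_for_size(index_size);
}
```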

v2: add an explanatory comment

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)

Nicolai Hähnle [Tue, 18 Oct 2016 15:35:45 +0000 (17:35 +0200)]
st/glsl_to_tgsi: sort input and output decls by TGSI index

Fixes a regression introduced by commit 777dcf81b.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98307
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>

Nicolai Hähnle [Sun, 16 Oct 2016 15:34:33 +0000 (17:34 +0200)]
st/glsl_to_tgsi: fix block copies of arrays of structs

Use a full writemask in this case. This is relevant e.g. when a function
has an inout argument which is an array of structs.

v2: use C-style comment (Timothy Arceri)

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Cc: 13.0 <mesa-stable@lists.freedesktop.org>

Nicolai Hähnle [Sun, 16 Oct 2016 15:33:51 +0000 (17:33 +0200)]
st/glsl_to_tgsi: fix block copies of arrays of doubles

Set the type of the left-hand side to the same as the right-hand side,
so that when the base type is double, the writemask of the MOV instruction
is properly fixed up.
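To make "fixed up" concrete, here is a minimal sketch of the kind of expansion involved. It assumes the TGSI convention that each double occupies a pair of 32-bit channels (xy for the first, zw for the second). The names are hypothetical, and this is not the glsl_to_tgsi code:

```c
/* TGSI-style 32-bit channel bits. */
enum { MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8 };

/* Expand a writemask counted in doubles (bit 0 = first double, bit 1 =
 * second double) into 32-bit channels, one channel pair per double. */
static unsigned double_to_channel_writemask(unsigned double_mask)
{
    unsigned mask = 0;
    if (double_mask & 1)
        mask |= MASK_X | MASK_Y;
    if (double_mask & 2)
        mask |= MASK_Z | MASK_W;
    return mask;
}
```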

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Cc: 13.0 <mesa-stable@lists.freedesktop.org>

Iago Toral Quiroga [Tue, 18 Oct 2016 12:15:36 +0000 (14:15 +0200)]
glsl: Indirect array indexing on non-last SSBO member must fail compilation

After the changes in commit 5b2675093e863a52, we moved this check to the
linker, but the spec expects this to be checked at compile-time. There are
dEQP tests that expect an error at compile time and the spec seems to confirm
that expectation:

"Except for the last declared member of a shader storage block (section 4.3.9
 “Interface Blocks”), the size of an array must be declared (explicitly sized)
 before it is indexed with anything other than an integral constant expression.
 The size of any array must be declared before passing it as an argument to a
 function. Violation of any of these rules result in compile-time errors. It
 is legal to declare an array without a size (unsized) and then later
 redeclare the same name as an array of the same type and specify a size, or
 index it only with integral constant expressions (implicitly sized)."

Commit 5b2675093e863a52 tries to take care of the case where we have implicitly
sized arrays in SSBOs and it does so by checking the max_array_access field
in ir_variable during linking. In this patch we change the approach: we look
for indirect access on SSBO arrays, and when we find one, we emit a
compile-time error if the accessed member is not the last in the SSBO
definition.
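A minimal sketch of the rule as the quoted spec text states it. It uses a hypothetical flattened description of the access rather than the GLSL IR types:

```c
#include <stdbool.h>

/* Hypothetical, simplified view of an array access inside an SSBO. */
struct ssbo_array_access {
    unsigned member_index;  /* position of the member in the block */
    unsigned num_members;   /* members declared in the block (>= 1) */
    bool member_is_sized;   /* array declared with an explicit size? */
    bool index_is_constant; /* integral constant expression index? */
};

/* True if the access must be a compile-time error: a non-constant index
 * into an array without an explicit size that is not the block's last member. */
static bool ssbo_indirect_access_is_error(const struct ssbo_array_access *a)
{
    if (a->index_is_constant || a->member_is_sized)
        return false;
    return a->member_index != a->num_members - 1;
}
```

Under this rule, an unsized array that is never indexed produces no error. That is the corner case discussed next.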

There is a corner case that the specs do not address directly though and that
dEQP checks for: the case of an unsized array in an SSBO definition that is
not defined last but is never used in the shader code either. The following
dEQP tests expect a compile-time error in this scenario:

dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader
dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader

However, since the unsized array is never used it is never indexed with a
non-constant expression, so by the spec quotation above, it should be valid and
the tests are probably incorrect.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Ilia Mirkin [Wed, 19 Oct 2016 05:20:03 +0000 (01:20 -0400)]
nv50/ir: process texture offset sources as regular sources

With ARB_gpu_shader5, texture offsets can be any source, including TEMPs
and INs. Make sure to process them as regular sources so that we pick
up masks, etc.

This should fix some CTS tests that feed offsets directly to
textureGatherOffset: we were not picking up the input use, and thus not
advertising it in the shader header.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dave Airlie <airlied@redhat.com>
Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>

Ilia Mirkin [Wed, 19 Oct 2016 04:05:26 +0000 (00:05 -0400)]
nv50,nvc0: avoid reading out of bounds when getting bogus so info

The state tracker tries to attach the info to the wrong shader. This is
easy enough to protect against.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>

Eric Engestrom [Wed, 19 Oct 2016 23:09:11 +0000 (00:09 +0100)]
wsi/wayland: fix error path

Fixes: 1720bbd353d87412754f ("anv/wsi: split image alloc/free out to separate fns.")
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Dave Airlie [Wed, 19 Oct 2016 03:36:23 +0000 (13:36 +1000)]
anv: drop unused zero macro.

I can't see this being used anywhere.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Dave Airlie [Thu, 20 Oct 2016 00:42:22 +0000 (01:42 +0100)]
radv: use emit_icmp for samples_identical

On a debug llvm build we'd assert on the next compare
when the return from samples_identical was i1 instead
of i32.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Jordan Justen [Wed, 6 Jul 2016 22:08:27 +0000 (15:08 -0700)]
i965/cs: Don't use a thread channel ID for small local sizes

When the local group size is 8 or less, we will execute the program at
most 1 time. Therefore, the local channel ID will always be 0. By
using a constant 0 in this case we can prevent using push constant
data.
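A minimal sketch of the condition, assuming SIMD8 as the narrowest dispatch width; the function name is hypothetical:

```c
#include <stdbool.h>

#define MIN_SIMD_WIDTH 8 /* narrowest dispatch width, assumed here */

/* If the whole workgroup fits in one SIMD8 thread, only one thread runs,
 * so its thread index is the constant 0 and need not be pushed. */
static bool needs_thread_id(unsigned sx, unsigned sy, unsigned sz)
{
    return sx * sy * sz > MIN_SIMD_WIDTH;
}
```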

This is not expected to be a common occurrence in real applications,
but it has been seen in tests.

We could extend this optimization to 16 and 32 for SIMD16 and SIMD32,
but it gets a bit more complicated, because this optimization is
currently being done early on, before we have decided the SIMD size.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Jordan Justen [Wed, 19 Oct 2016 17:25:21 +0000 (10:25 -0700)]
i965/cs: Use udiv/umod for local IDs

This allows for more optimizations relating to power-of-two divisions.
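A minimal sketch, with hypothetical names, of deriving a 3D local ID from a linear invocation index using unsigned / and %. With unsigned operands and power-of-two sizes, each operation reduces to a shift or a mask. Signed division would need extra rounding fixups for negative values:

```c
/* Hypothetical 3D local invocation ID. */
struct local_id { unsigned x, y, z; };

/* idx = x + sx * (y + sy * z), decomposed with unsigned div/mod. */
static struct local_id local_id_from_index(unsigned idx,
                                           unsigned sx, unsigned sy)
{
    struct local_id id;
    id.x = idx % sx;
    id.y = (idx / sx) % sy;
    id.z = idx / (sx * sy);
    return id;
}
```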

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Timothy Arceri [Tue, 18 Oct 2016 23:51:48 +0000 (10:51 +1100)]
mesa: remove unused LocalSizeVariable

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Samuel Pitoiset [Wed, 19 Oct 2016 11:09:49 +0000 (13:09 +0200)]
nvc0/ir: simplify predicate logic for GK104 atomic operations

The predicate is always CC_NOT_P as defined in
processSurfaceCoordsNVE4(), so we only want to emit OR.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

Samuel Pitoiset [Wed, 19 Oct 2016 11:02:02 +0000 (13:02 +0200)]
nvc0/ir: remove useless NVC0LoweringPass::gMemBase

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Samuel Pitoiset [Wed, 19 Oct 2016 12:01:33 +0000 (14:01 +0200)]
nv50/ir: print CCTL subops in debug mode

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

Ian Romanick [Wed, 19 Oct 2016 15:53:10 +0000 (08:53 -0700)]
nir: Optimize integer division and modulus with 1

The previous power-of-two rules didn't catch idiv (because i965 doesn't
set lower_idiv) and imod cases.  The udiv and umod cases should have
been caught, but I included them for orthogonality.

This fixes silly code observed from compute shaders with local_size_[xy]
= 1.
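A minimal sketch of the identities involved (x / 1 = x and x % 1 = 0, signed and unsigned) over a hypothetical expression node. NIR itself expresses such rules as search/replace patterns in nir_opt_algebraic.py, which this C version does not reproduce:

```c
#include <stdbool.h>

enum op { OP_IDIV, OP_UDIV, OP_IMOD, OP_UMOD };

/* Hypothetical binary expression whose second source may be a constant. */
struct expr {
    enum op op;
    int src0;            /* id of the first operand */
    bool src1_is_const;
    long long src1_value;
};

enum fold { FOLD_NONE, FOLD_TO_SRC0, FOLD_TO_ZERO };

/* x / 1 == x and x % 1 == 0, for both signed and unsigned variants. */
static enum fold fold_div_mod_by_one(const struct expr *e)
{
    if (!e->src1_is_const || e->src1_value != 1)
        return FOLD_NONE;
    switch (e->op) {
    case OP_IDIV:
    case OP_UDIV:
        return FOLD_TO_SRC0;
    case OP_IMOD:
    case OP_UMOD:
        return FOLD_TO_ZERO;
    }
    return FOLD_NONE;
}
```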

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98299
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Marek Olšák [Tue, 18 Oct 2016 21:20:29 +0000 (23:20 +0200)]
configure.ac: enable EGL platform DRM if GBM is enabled

since GBM is enabled by default, this is also enabled by default

the whitespace changes remove tabs

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

Marek Olšák [Tue, 18 Oct 2016 21:19:58 +0000 (23:19 +0200)]
configure.ac: enable GBM by default

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

Marek Olšák [Tue, 18 Oct 2016 21:18:28 +0000 (23:18 +0200)]
configure.ac: print whether GBM is enabled

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

Marek Olšák [Tue, 18 Oct 2016 13:20:22 +0000 (15:20 +0200)]
radeonsi: eliminate trivial constant VS outputs

These constant value VS PARAM exports:
- 0,0,0,0
- 0,0,0,1
- 1,1,1,0
- 1,1,1,1
can be loaded into PS inputs using the DEFAULT_VAL field, and the VS exports
can be removed from the IR to save export & parameter memory.

After LLVM optimizations, analyze the IR to see which exports are equal to
the ones listed above (or undef) and remove them if they are.
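A minimal sketch of the matching step over a hypothetical per-component description of an export. The returned index is the position in the list above, which is not necessarily the hardware DEFAULT_VAL encoding. Undef components match anything, as in v2:

```c
#include <stdbool.h>

/* One exported component: either undef or a known float constant. */
struct comp { bool is_undef; float value; };

/* The four constant patterns listed above, in the same order. */
static const float default_val_patterns[4][4] = {
    {0, 0, 0, 0},
    {0, 0, 0, 1},
    {1, 1, 1, 0},
    {1, 1, 1, 1},
};

/* Returns the index (0-3) of the first pattern the export matches, or -1
 * if the export has to be kept. */
static int match_default_val(const struct comp c[4])
{
    for (int p = 0; p < 4; p++) {
        bool ok = true;
        for (int i = 0; i < 4 && ok; i++)
            ok = c[i].is_undef || c[i].value == default_val_patterns[p][i];
        if (ok)
            return p;
    }
    return -1;
}
```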

Targeted use cases:
- All DX9 eON ports always clear 10 VS outputs to 0.0 even if most of them
  are unused by PS (such as Witcher 2 below).
- VS output arrays with unused elements that the GLSL compiler can't
  eliminate (such as Batman below).

The shader-db deltas are quite interesting:
(not from upstream si-report.py, it won't be upstreamed)

PERCENTAGE DELTAS    Shaders PARAM exports (affected only)
batman_arkham_origins    589  -67.17 %
bioshock-infinite       1769   -0.47 %
dirt-showdown            548   -2.68 %
dota2                   1747   -3.36 %
f1-2015                  776   -4.94 %
left_4_dead_2           1762   -0.07 %
metro_2033_redux        2670   -0.43 %
portal                   474   -0.22 %
talos_principle          324   -3.63 %
warsow                   176   -2.20 %
witcher2                1040  -73.78 %
----------------------------------------
All affected             991  -65.37 %  ... 9681 -> 3353
----------------------------------------
Total                  26725  -10.82 %  ... 58490 -> 52162

v2: treat Undef as both 0 and 1

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)

Samuel Pitoiset [Tue, 18 Oct 2016 17:59:27 +0000 (19:59 +0200)]
nv50/ir: silent TGSI_PROPERTY_FS_DEPTH_LAYOUT

Found that information message while replaying a trace from
Metro 2033 Redux. Mark that property as useless for now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

Emil Velikov [Wed, 19 Oct 2016 17:46:22 +0000 (18:46 +0100)]
docs: add 13.1.0-devel release notes template, bump version

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Emil Velikov [Wed, 19 Oct 2016 16:33:38 +0000 (17:33 +0100)]
docs: rename release notes to 13.0.0

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>

Marek Olšák [Fri, 16 Sep 2016 20:42:54 +0000 (22:42 +0200)]
radeonsi: remove cb0_is_integer handling

st/mesa does this for us.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Fri, 16 Sep 2016 20:39:15 +0000 (22:39 +0200)]
st/mesa: disable alpha-test, alpha-to-coverage, alpha-to-one for integer FBs

v2: rebased

Reviewed-by: Brian Paul <brianp@vmware.com>

Marek Olšák [Sun, 16 Oct 2016 22:54:35 +0000 (00:54 +0200)]
mesa: remove gl_shader_compiler_options::EmitNoNoise

it's always true

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 22:47:49 +0000 (00:47 +0200)]
glsl_to_tgsi: remove code for fixing up TGSI labels

I don't know what this was supposed to do, but all TGSI labels were
always 0.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 22:38:41 +0000 (00:38 +0200)]
glsl_to_tgsi: remove subroutine support

Never used. The GLSL compiler doesn't even look at EmitNoFunctions.

v2: add back "return" support in "main"

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 22:11:21 +0000 (00:11 +0200)]
mesa_to_tgsi: remove remnants of flow control and subroutine support

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 22:07:01 +0000 (00:07 +0200)]
mesa_to_tgsi: drop support for instructions that can't occur here

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 20:08:03 +0000 (22:08 +0200)]
glsl_to_tgsi: allocate glsl_to_tgsi_instruction::tex_offsets on demand

sizeof(glsl_to_tgsi_instruction): 384 -> 264

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 20:04:02 +0000 (22:04 +0200)]
glsl_to_tgsi: merge buffer and sampler fields in glsl_to_tgsi_instruction

sizeof(glsl_to_tgsi_instruction): 416 -> 384

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 19:58:13 +0000 (21:58 +0200)]
glsl_to_tgsi: reduce the size of glsl_to_tgsi_instruction using bitfields

sizeof(glsl_to_tgsi_instruction): 464 -> 416
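A minimal sketch of the technique, using hypothetical fields and widths. Exact sizes depend on the ABI. The figures in these commits are the author's measurements, and this sketch does not reproduce them:

```c
#include <stdbool.h>

/* Before: every small field gets a full unsigned or bool. */
struct inst_wide {
    unsigned op;         /* opcode, < 256 in this sketch */
    unsigned saturate;   /* 0 or 1 */
    unsigned tex_target; /* < 16 */
    bool is_64bit;
};

/* After: the same fields packed as bitfields into one unsigned. */
struct inst_packed {
    unsigned op:8;
    unsigned saturate:1;
    unsigned tex_target:4;
    unsigned is_64bit:1;
};
```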

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 19:30:05 +0000 (21:30 +0200)]
glsl_to_tgsi: reduce the size of st_dst_reg and st_src_reg

I noticed that glsl_to_tgsi_instruction is too huge.

sizeof(glsl_to_tgsi_instruction): 752 -> 464 (-38%)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 19:28:36 +0000 (21:28 +0200)]
glsl_to_tgsi: remove unused st_translate::tex_offsets

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 19:22:11 +0000 (21:22 +0200)]
glsl_to_tgsi: remove unused parameters from calc_deref_offsets

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Marek Olšák [Sun, 16 Oct 2016 21:22:55 +0000 (23:22 +0200)]
glsl_to_tgsi: use array_id for temp arrays instead of hacking high bits
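A hypothetical contrast between "hacking high bits" and a dedicated field: packing an array number into the top bits of a register index limits both ranges and forces every reader of the index to mask it, while a separate `array_id` keeps the index meaningful on its own. The shift width, helpers, and the "0 means no array" convention are assumptions for illustration, not Mesa's code.

```c
#include <assert.h>

/* Before (hypothetical): array number stored in the index's high bits. */
#define ARRAY_SHIFT 16
#define INDEX_MASK  ((1u << ARRAY_SHIFT) - 1)

static unsigned hacked_encode(unsigned array, unsigned index)
{
   assert(index <= INDEX_MASK); /* index range is capped by the hack */
   return (array << ARRAY_SHIFT) | index;
}

static unsigned hacked_array(unsigned packed) { return packed >> ARRAY_SHIFT; }
static unsigned hacked_index(unsigned packed) { return packed & INDEX_MASK; }

/* After: an explicit field; `index` is just the index again. */
struct temp_reg {
   unsigned index;
   unsigned array_id; /* assumed convention: 0 = not part of an array */
};
```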

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Adam Jackson [Thu, 6 Oct 2016 19:37:54 +0000 (15:37 -0400)]
reviewers: Throw myself on the GLX grenade

Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Eric Engestrom [Wed, 19 Oct 2016 14:09:26 +0000 (15:09 +0100)]
egl: bring back the default glapi.so name

An earlier commit replaced the default platform-specific libglapi.so name
with an #error.

This may have been overzealous, since the name is correct at least for the
BSD platforms. Reinstate the hunk, bringing OpenBSD et al. back to a
successful build state.
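A sketch of the pattern being restored: platform-specific names where they differ, and a generic ELF-style default in the final branch instead of an #error. The macro name, the helper, and the exact library strings are assumptions for illustration, not copied from Mesa's source.

```c
#include <assert.h>
#include <string.h>

#if defined(__APPLE__)
#define GLAPI_LIB_NAME "libglapi.0.dylib"
#elif defined(__CYGWIN__)
#define GLAPI_LIB_NAME "cygglapi-0.dll"
#else
/* Generic default: also right for the BSDs, so fall back to it here
 * instead of stopping the build with #error. */
#define GLAPI_LIB_NAME "libglapi.so.0"
#endif

/* Hypothetical helper returning the name to pass to the dynamic loader. */
static const char *glapi_lib_name(void)
{
   return GLAPI_LIB_NAME;
}
```

With an #error in the last branch, any platform not listed explicitly (OpenBSD, for instance) fails to compile; with a default it builds and uses the common name.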

Fixes: 7a9c92d071d ("egl/dri2: non-shared glapi cleanups")
[Emil Velikov: format the patch from Eric, add commit message and tag.]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

Iago Toral Quiroga [Tue, 27 Sep 2016 10:23:44 +0000 (12:23 +0200)]
i965: fix subnr overflow in suboffset()

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

Dave Airlie [Wed, 19 Oct 2016 07:34:28 +0000 (17:34 +1000)]
radv: decompress fmask before reading using texture unit

Before we can read the fmask using the compute shader, we need
to decompress the fmask in place.

This fixes a bunch of remaining failures, and hopefully multisampling
in Talos.
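The ordering requirement can be modelled abstractly: a read of FMASK data is only meaningful after an in-place decompress, and the decompress is a no-op if it has already happened. Everything here (`struct msaa_image`, `fmask_decompress`, `resolve_via_texture`) is a hypothetical model, not radv code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an MSAA image whose FMASK may be compressed. */
struct msaa_image {
   bool fmask_compressed;
};

static int decompress_calls; /* counts actual decompress passes */

/* Stand-in for the in-place FMASK decompress pass. */
static void fmask_decompress(struct msaa_image *img)
{
   if (!img->fmask_compressed)
      return; /* already expanded: nothing to do */
   decompress_calls++;
   img->fmask_compressed = false;
}

/* Stand-in for a shader read of FMASK; only valid once decompressed. */
static bool fmask_read_is_valid(const struct msaa_image *img)
{
   return !img->fmask_compressed;
}

static bool resolve_via_texture(struct msaa_image *img)
{
   fmask_decompress(img); /* must precede the read */
   return fmask_read_is_valid(img);
}
```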

Dave Airlie [Wed, 19 Oct 2016 05:43:26 +0000 (15:43 +1000)]
radv: fix samples_identical return value.

This was returning the inverted result, so it did not do what it should.

We need to compare the fmask value with 0 and return the result
of that comparison.
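The fix reduces to a single comparison. This sketch assumes the usual FMASK meaning (each sample's entry is the index of the fragment it uses, so an all-zero word means every sample points at fragment 0); the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The inverted version described above. */
static bool samples_identical_buggy(uint32_t fmask)
{
   return fmask != 0;
}

/* Fixed: all samples identical exactly when the fmask word is 0. */
static bool samples_identical(uint32_t fmask)
{
   return fmask == 0;
}
```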

Dave Airlie [Wed, 19 Oct 2016 03:53:55 +0000 (13:53 +1000)]
radv: fix wsi porting regression in swapchain destroy.

The code in anv is correct. There's a pending patch to fix this up
differently, but I'll sync the code for now.

Dave Airlie [Wed, 19 Oct 2016 02:27:04 +0000 (12:27 +1000)]
radv: fix fmask ptr issue

We were using the wrong descriptor in the fmask picking code.

Dave Airlie [Tue, 18 Oct 2016 03:20:11 +0000 (13:20 +1000)]
radv: simplify fast clear shaders

There is no need for anything but a noop shader here.

Dave Airlie [Wed, 19 Oct 2016 00:53:51 +0000 (10:53 +1000)]
vulkan/wsi: fix out of tree build.

Dave Airlie [Mon, 10 Oct 2016 02:20:36 +0000 (03:20 +0100)]
radv: start using defines for the user sgpr offsets

This adds some comments and adds defines for the user SGPRs,
so that we can move them around more easily later without having
to change/revalidate every one of these.
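A sketch of the idea: derive every user SGPR slot from named defines so that moving one slot is a single edit rather than a hunt through every emitter. The names and slot numbers are hypothetical, not radv's actual values; the layout assumes each descriptor-set pointer is a 64-bit address occupying two 32-bit SGPRs, and that consecutive user-data registers are 4 bytes apart.

```c
#include <assert.h>

#define USER_SGPR_DESC_SET_0     0
#define USER_SGPR_DESC_SET(i)    (USER_SGPR_DESC_SET_0 + 2 * (i))
#define USER_SGPR_NUM_DESC_SETS  4
#define USER_SGPR_PUSH_CONST     USER_SGPR_DESC_SET(USER_SGPR_NUM_DESC_SETS)
#define USER_SGPR_VERTEX_BUFFERS (USER_SGPR_PUSH_CONST + 2)
#define USER_SGPR_COUNT          (USER_SGPR_VERTEX_BUFFERS + 2)

/* Register address for user SGPR `slot`, given a hypothetical base
 * register address and 4 bytes per user-data register. */
static unsigned user_sgpr_reg(unsigned base, unsigned slot)
{
   assert(slot < USER_SGPR_COUNT);
   return base + slot * 4;
}
```

Because later slots are defined relative to earlier ones, inserting a new slot shifts everything after it automatically.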

Signed-off-by: Dave Airlie <airlied@redhat.com>

Dave Airlie [Fri, 14 Oct 2016 06:49:34 +0000 (07:49 +0100)]
radv: port to common wsi codebase

This drops all the radv WSI code in favour of using
the new shared code that was ported from anv.

This regresses Talos for now; Jason has pointed out that
the bug is in Talos and we should wait for them to fix it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Fri, 14 Oct 2016 06:12:33 +0000 (07:12 +0100)]
anv: move to using shared wsi code

This moves the shared code to a common subdirectory
and makes anv link against that code instead of the copy
it was using.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Fri, 14 Oct 2016 05:36:17 +0000 (06:36 +0100)]
anv/wsi: remove all anv references from WSI common code

The WSI code should now be clean for sharing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Fri, 14 Oct 2016 04:42:29 +0000 (05:42 +0100)]
anv: move common wsi code to x11/wayland common files.

The next task is to rename all the anv_ symbols out of this
and move it to a common location.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Fri, 14 Oct 2016 04:14:45 +0000 (05:14 +0100)]
anv/wsi/wayland: add callback to get device format properties.

This avoids having to know the toplevel API name.
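A sketch of the callback approach: the driver hands the shared WSI code a function pointer, so the shared code can query format properties without calling an anv_- or radv_-prefixed entry point by name. The types are simplified stand-ins for the Vulkan ones, and both the struct layout and the assumption that formats are filtered on a colour-attachment feature bit are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

typedef int format_t; /* stand-in for VkFormat */
struct format_props { unsigned optimal_tiling_features; };

#define FEATURE_COLOR_ATTACHMENT 0x1u /* assumed feature bit */

/* Hypothetical callback table supplied by the driver. */
struct wsi_callbacks {
   void (*get_format_properties)(void *physical_device, format_t format,
                                 struct format_props *out);
};

/* Shared WSI code: keep only formats the driver can render to. */
static size_t wsi_filter_formats(const struct wsi_callbacks *cb, void *pdev,
                                 const format_t *in, size_t n, format_t *out)
{
   size_t count = 0;
   for (size_t i = 0; i < n; i++) {
      struct format_props props;
      cb->get_format_properties(pdev, in[i], &props);
      if (props.optimal_tiling_features & FEATURE_COLOR_ATTACHMENT)
         out[count++] = in[i];
   }
   return count;
}

/* Toy driver implementation: only even-numbered formats are renderable. */
static void toy_get_format_properties(void *pdev, format_t format,
                                      struct format_props *out)
{
   (void)pdev;
   out->optimal_tiling_features =
      (format % 2 == 0) ? FEATURE_COLOR_ATTACHMENT : 0;
}
```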

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Fri, 14 Oct 2016 02:09:02 +0000 (03:09 +0100)]
anv/wsi/wl: stop using device in more places

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Fri, 14 Oct 2016 01:51:36 +0000 (02:51 +0100)]
anv/wsi: split out surface creation to avoid instance API

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Fri, 14 Oct 2016 01:38:49 +0000 (02:38 +0100)]
anv/wsi: move further away from passing anv displays around

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Fri, 14 Oct 2016 00:34:10 +0000 (01:34 +0100)]
anv/wsi: split image alloc/free out to separate fns.

This moves them out of the WSI platform code so that we can reuse
that code.
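A sketch of what splitting image alloc/free into separate functions enables: the platform code creates and frees presentable images only through a pair of hooks, so it needs no knowledge of driver internals. The struct, hook names, and signatures are hypothetical simplifications; Mesa's actual hooks may differ.

```c
#include <assert.h>
#include <stdlib.h>

struct wsi_image { int width, height; };

/* Hypothetical hooks implemented by the driver. */
struct wsi_image_fns {
   struct wsi_image *(*create_image)(void *device, int width, int height);
   void (*free_image)(void *device, struct wsi_image *image);
};

/* Platform-side swapchain setup using only the hooks. */
static int swapchain_create_images(const struct wsi_image_fns *fns,
                                   void *device, struct wsi_image **images,
                                   int count, int width, int height)
{
   for (int i = 0; i < count; i++) {
      images[i] = fns->create_image(device, width, height);
      if (!images[i]) {
         while (i-- > 0) /* roll back the images already created */
            fns->free_image(device, images[i]);
         return -1;
      }
   }
   return 0;
}

/* Toy driver implementation that tracks live images. */
static int live_images;

static struct wsi_image *toy_create(void *device, int w, int h)
{
   (void)device;
   struct wsi_image *img = malloc(sizeof(*img));
   if (img) {
      img->width = w;
      img->height = h;
      live_images++;
   }
   return img;
}

static void toy_free(void *device, struct wsi_image *img)
{
   (void)device;
   free(img);
   live_images--;
}
```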

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 23:42:56 +0000 (00:42 +0100)]
anv/wsi: switch to using VkDevice in swapchain

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 23:35:12 +0000 (00:35 +0100)]
anv/wsi/x11: more refactoring to use generic handles

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 23:21:17 +0000 (00:21 +0100)]
anv/wsi/x11: start refactoring out the image allocation/free functionality

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 04:32:41 +0000 (05:32 +0100)]
anv/wsi: drop device from get format

Just use the wsi_device instead.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 04:26:03 +0000 (05:26 +0100)]
anv/wsi: remove device from get_support interface

Replace it with wsi_device and allocator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 04:25:33 +0000 (05:25 +0100)]
anv/wsi/x11: abstract WSI interface from internals.

This allows the API and the internals to be split, and the
internals shared.
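One common way to split an API from shareable internals is a per-platform table of function pointers that the API entry points dispatch through. This sketch assumes that shape; the struct names, the two-entry table, and the toy backends (including their return values) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum wsi_platform { WSI_PLATFORM_X11, WSI_PLATFORM_WAYLAND, WSI_PLATFORM_COUNT };

/* Hypothetical per-platform interface. */
struct wsi_interface {
   bool (*get_support)(unsigned queue_family);
   unsigned (*get_min_image_count)(void);
};

struct wsi_device {
   const struct wsi_interface *wsi[WSI_PLATFORM_COUNT];
};

/* Toy backends with made-up behaviour. */
static bool x11_get_support(unsigned qf) { return qf == 0; }
static unsigned x11_min_images(void) { return 2; }
static const struct wsi_interface x11_iface = { x11_get_support, x11_min_images };

static bool wl_get_support(unsigned qf) { (void)qf; return true; }
static unsigned wl_min_images(void) { return 4; }
static const struct wsi_interface wl_iface = { wl_get_support, wl_min_images };

/* API-level entry point: dispatch on the surface's platform. */
static bool surface_supported(const struct wsi_device *dev,
                              enum wsi_platform platform, unsigned qf)
{
   const struct wsi_interface *iface = dev->wsi[platform];
   return iface && iface->get_support(qf);
}
```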

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 04:18:34 +0000 (05:18 +0100)]
anv/wsi/x11: push anv_device out of the init/finish routines

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 04:14:52 +0000 (05:14 +0100)]
anv/wsi: abstract wsi interfaces away from device a bit more.

This is a step towards separating out the WSI code for sharing.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 04:07:27 +0000 (05:07 +0100)]
anv/wsi/x11: push device out of x11 connection fns.

Just pass the allocator/wsi_interface instead.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 04:27:56 +0000 (05:27 +0100)]
anv/wsi: drop device from get caps

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 04:33:28 +0000 (05:33 +0100)]
anv/wsi: drop get present modes device arg

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Dave Airlie [Thu, 13 Oct 2016 03:43:27 +0000 (04:43 +0100)]
radv/anv/wsi: drop unneeded parameter

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

Roland Scheidegger [Sat, 15 Oct 2016 01:53:48 +0000 (03:53 +0200)]
draw: improve vertex fetch (v2)

The per-element fetch has quite a few calculations which are constant;
these can be moved outside both the per-element loop and the main
shader loop (llvm can mostly figure out on its own that they're constant,
but this can have a significant compile-time cost).
Similarly, it looks easier to swap the fetch loops (outer loop per attrib,
inner loop filling up the per-vertex elements). This way the aos->soa
conversion can also be done per attrib rather than only at the end, though
again this doesn't really make much of a difference in the generated code.
(This would also make it possible to vectorize the calculations leading to
the fetches.)
There's also a minor change simplifying the overflow math slightly.
All in all, the generated code seems slightly simpler (depending on the
actual vs), but more importantly I've seen a significant reduction in
compile times for some vs (albeit with an old (3.3) llvm version, and the
time reduction is only really for the optimizations run on the IR).
v2: adapt to other draw change.

No changes with piglit.
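The loop swap and hoisting described above can be modelled in scalar C. The draw module does this while generating LLVM IR and also handles vertex formats, instancing, and overflow checks; this sketch keeps only the loop structure, treats each attribute as a single float in an interleaved buffer, and uses hypothetical names (`struct vertex_attrib`, `fetch_soa`, `MAX_VERTS`).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vertex_attrib {
   const unsigned char *buffer; /* start of the vertex buffer */
   size_t stride;               /* bytes between consecutive vertices */
   size_t offset;               /* byte offset of this attribute */
};

#define MAX_VERTS 8

/* Outer loop per attribute, inner loop per vertex. Per-attribute values
 * are computed once, outside the inner loop, and each attribute fills
 * its own SoA row out[a][v] directly. */
static void fetch_soa(const struct vertex_attrib *attribs, size_t num_attribs,
                      const unsigned *indices, size_t num_verts,
                      float out[][MAX_VERTS])
{
   assert(num_verts <= MAX_VERTS);
   for (size_t a = 0; a < num_attribs; a++) {
      /* hoisted: depends only on the attribute */
      const unsigned char *base = attribs[a].buffer + attribs[a].offset;
      const size_t stride = attribs[a].stride;

      for (size_t v = 0; v < num_verts; v++) {
         float value;
         memcpy(&value, base + (size_t)indices[v] * stride, sizeof(value));
         out[a][v] = value;
      }
   }
}
```

Since each inner loop produces one complete SoA row, the AoS-to-SoA conversion happens per attribute rather than in a separate pass at the end.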

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>