mesa.git
6 years agopython: Use range() instead of xrange()
Mathieu Bridon [Fri, 6 Jul 2018 10:22:18 +0000 (12:22 +0200)]
python: Use range() instead of xrange()

Python 2 has a range() function which returns a list, and an xrange()
one which returns an iterator.

Python 3 lost the function returning a list, and renamed the function
returning an iterator as range().

As a result, using range() makes the scripts compatible with both Python
versions 2 and 3.
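
A minimal sketch of the resulting idiom, assuming nothing about the actual
Mesa scripts (the loop and variable names below are illustrative only):

```python
# Illustrative only: these names do not come from the Mesa build scripts.
# On Python 2 range() builds a list; on Python 3 it is a lazy sequence.
# Either way the loop visits the same values.
squares = []
for i in range(5):
    squares.append(i * i)

# Wrap the call in list() where a real list is needed on both versions.
indices = list(range(3))
```

The only cost on Python 2 is that the full list is built up front, which
should not matter for small ranges.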

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agopython: Better use iterators
Mathieu Bridon [Thu, 5 Jul 2018 13:17:39 +0000 (15:17 +0200)]
python: Better use iterators

In Python 2, iterators had a .next() method.

In Python 3, instead they have a .__next__() method, which is
automatically called by the next() builtin.

In addition, it is better to use the iter() builtin to create an
iterator, rather than calling its __iter__() method.

Both builtins are available in Python 2.6 and later, so using them makes the
scripts compatible with Python 2 and 3.
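
As a rough illustration of the portable spelling (the token list is made up
for this sketch and is not taken from the scripts):

```python
# Illustrative only: the data below is invented for this sketch.
tokens = ["vec4", "mat3", "float"]

# iter() and next() are builtins on Python 2.6+ and on Python 3, so they
# replace tokens.__iter__() and it.next() respectively.
it = iter(tokens)
first = next(it)
second = next(it)

# next() also accepts a default, returned once the iterator is exhausted
# instead of raising StopIteration.
third = next(it, None)
fourth = next(it, None)
```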

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years agopython: Better sort dictionary keys/values
Mathieu Bridon [Fri, 6 Jul 2018 10:17:50 +0000 (12:17 +0200)]
python: Better sort dictionary keys/values

In Python 2, dict.keys() and dict.values() both return a list, which can
be sorted in two ways:

* l.sort() modifies the list in-place;
* sorted(l) returns a new, sorted list;

In Python 3, dict.keys() and dict.values() do not return lists any more,
but view objects. View objects do not have a .sort() method.

This commit moves the build scripts to using sorted() on dict keys and
values, which makes them compatible with both Python 2 and Python 3.
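
A small sketch of the pattern, using a made-up dictionary rather than any
table from the build scripts:

```python
# Illustrative only: a made-up mapping standing in for the scripts' tables.
table = {"c": 3, "a": 1, "b": 2}

# Python 2 only:
#     keys = table.keys()
#     keys.sort()
# Portable: sorted() accepts any iterable and always returns a new list.
names = sorted(table.keys())
values = sorted(table.values())
```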

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agopython: Better iterate over dictionaries
Mathieu Bridon [Fri, 6 Jul 2018 10:20:26 +0000 (12:20 +0200)]
python: Better iterate over dictionaries

In Python 2, dictionaries have 2 sets of methods to iterate over their
keys and values: keys()/values()/items() and iterkeys()/itervalues()/iteritems().

The former return lists while the latter return iterators.

Python 3 dropped the methods that return lists, and renamed the methods
returning iterators to keys()/values()/items().

Using those names makes the scripts compatible with both Python 2 and 3.
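
For illustration, a loop written with the common names (the dictionary
contents are invented for this sketch):

```python
# Illustrative only: invented data, not from the Mesa scripts.
enums = {"GL_ONE": 1, "GL_ZERO": 0}

# Python 2 only: for name, value in enums.iteritems(): ...
# Portable: items() exists on both versions. On Python 2 it builds a list
# first, which is fine for small dictionaries.
pairs = []
for name, value in enums.items():
    pairs.append((value, name))
pairs.sort()

# Wrap in list() where indexing or repeated traversal is needed.
key_list = sorted(list(enums.keys()))
```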

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years agopython: Stop using the string module
Mathieu Bridon [Thu, 5 Jul 2018 13:17:36 +0000 (15:17 +0200)]
python: Stop using the string module

Most functions in the builtin string module also exist as methods of
string objects.

Since the functions were removed from the string module in Python 3,
using the instance methods directly makes the code compatible with both
Python 2 and Python 3.
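
A sketch of the kind of substitution this implies; the input string is made
up and the exact calls touched by the patch may differ:

```python
# Illustrative only: the input string is invented for this sketch.
name = "  GL_TEXTURE_2D "

# Python 2 only:
#     string.strip(name), string.lower(name), string.join(parts, "_")
# Portable: call the methods on the string objects themselves.
stripped = name.strip()
lowered = stripped.lower()
parts = lowered.split("_")
joined = "_".join(parts[1:])
```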

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years agopython: Better check for keys in dicts
Mathieu Bridon [Thu, 5 Jul 2018 13:17:35 +0000 (15:17 +0200)]
python: Better check for keys in dicts

Python 3 lost the dict.has_key() method. Instead it requires using the
"in" operator.

This is also compatible with Python 2.
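
For example (the dictionary and the second key are invented for this sketch):

```python
# Illustrative only: invented data, not from the Mesa scripts.
extensions = {"GL_ARB_copy_image": True}

# Python 2 only: extensions.has_key("GL_ARB_copy_image")
# Portable: the "in" and "not in" operators.
has_copy_image = "GL_ARB_copy_image" in extensions
lacks_other = "GL_EXT_made_up" not in extensions
```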

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agointel: Make the disassembler take a const pointer to the assembly.
Kenneth Graunke [Wed, 11 Jul 2018 17:30:12 +0000 (10:30 -0700)]
intel: Make the disassembler take a const pointer to the assembly.

Disassembling doesn't modify the assembly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agotravis: manually generate sys/syscall.h
Andres Gomez [Thu, 19 Jul 2018 12:33:33 +0000 (15:33 +0300)]
travis: manually generate sys/syscall.h

Until now, the needed bits were wrongly included in linux/memfd.h.

Since Travis' sys/syscall.h doesn't provide SYS_memfd_create, we
generate that header manually, including the needed bits to avoid
compilation problems such as the ones observed after:
3228335b55c ("intel: aubinator: handle GGTT mappings")

v2: replace fixes commit with the first direct user of
    syscall.h (Emil).

Fixes: 3228335b55c ("intel: aubinator: handle GGTT mappings")
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Dylan Baker <dylan.c.baker@intel.com>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
6 years agodocs: update calendar to match the 18.2 plan with the one announced
Andres Gomez [Thu, 19 Jul 2018 13:03:11 +0000 (16:03 +0300)]
docs: update calendar to match the 18.2 plan with the one announced

Additionally, I've extended the 18.1 cycle by one more release,
tentatively assigned to Dylan, due to the ~2 weeks delay for 18.2.

Cc: Dylan Baker <dylan.c.baker@intel.com>
Cc: Juan A. Suarez <jasuarez@igalia.com>
Cc: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
6 years agodocs: move releases from Fridays to Wednesdays
Andres Gomez [Thu, 19 Jul 2018 13:00:07 +0000 (16:00 +0300)]
docs: move releases from Fridays to Wednesdays

As discussed at:
https://lists.freedesktop.org/archives/mesa-dev/2018-March/188525.html

Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: Dylan Baker <dylan.c.baker@intel.com>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Carl Worth <cworth@cworth.org>
Cc: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
6 years agodocs: correct typo in the submitting patches instructions
Andres Gomez [Thu, 19 Jul 2018 13:02:19 +0000 (16:02 +0300)]
docs: correct typo in the submitting patches instructions

Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agoradv: Still enable inmemory & API level caching if disk cache is not enabled.
Bas Nieuwenhuizen [Tue, 24 Jul 2018 12:57:42 +0000 (14:57 +0200)]
radv: Still enable inmemory & API level caching if disk cache is not enabled.

That we don't have a background disk cache does not mean we should
prevent the app caching anything.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agogallium/tests: Don't ignore S3TC errors.
Jose Fonseca [Tue, 24 Jul 2018 12:57:05 +0000 (13:57 +0100)]
gallium/tests: Don't ignore S3TC errors.

Now that we do full S3TC decompression, they should no longer fail.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
6 years agoegl: Fix missing clamping in eglSetDamageRegionKHR
Harish Krupo [Sun, 8 Jul 2018 07:23:00 +0000 (12:53 +0530)]
egl: Fix missing clamping in eglSetDamageRegionKHR

Clamp the x and y co-ordinates of the rectangles.

v2: Clamp width/height after converting to co-ordinates
    (Ilia Mirkin)
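
A rough sketch of the ordering described in the v2 note, written in Python
for brevity. The function names and the (x, y, width, height) layout are
assumptions for illustration; this is not the code in the EGL
implementation:

```python
def clamp(value, low, high):
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def clamp_rect(x, y, width, height, surf_width, surf_height):
    """Hypothetical helper: clamp a damage rectangle to the surface.

    The rectangle is converted to corner coordinates first, each corner is
    clamped, and only then is it converted back to a width and height.
    """
    x1 = clamp(x, 0, surf_width)
    y1 = clamp(y, 0, surf_height)
    x2 = clamp(x + width, 0, surf_width)
    y2 = clamp(y + height, 0, surf_height)
    return (x1, y1, x2 - x1, y2 - y1)


# A rectangle hanging off the left and bottom edges of a 64x64 surface.
clamped = clamp_rect(-10, 20, 50, 100, 64, 64)
```

Clamping the width and height directly, before converting to corners, would
keep the original extent even when the origin had been moved.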

Signed-off-by: Harish Krupo <harish.krupo.kps@intel.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
6 years agoforward precise-flag if supported
Erik Faye-Lund [Wed, 11 Jul 2018 14:28:30 +0000 (15:28 +0100)]
forward precise-flag if supported

New versions of virglrenderer support the precise-flag, so let's
forward it from TGSI if that's the case.

This fixes a few dEQP-GLES31 tests:
- dEQP-GLES31.functional.tessellation.common_edge.quads_equal_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_even_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.quads_fractional_odd_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_equal_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_even_spacing_precise
- dEQP-GLES31.functional.tessellation.common_edge.triangles_fractional_odd_spacing_precise

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
6 years agoradeonsi: fix pk2h breakage
Marek Olšák [Tue, 24 Jul 2018 02:11:12 +0000 (22:11 -0400)]
radeonsi: fix pk2h breakage

6 years agoradeonsi: reduce LDS stalls by 40% for tessellation
Marek Olšák [Fri, 13 Jul 2018 04:23:36 +0000 (00:23 -0400)]
radeonsi: reduce LDS stalls by 40% for tessellation

40% is the decrease in the LGKM counter (which includes SMEM too)
for the GFX9 LSHS stage.

This will make the LDS size slightly larger, but I wasn't able to increase
the patch stride without corruption, so I'm increasing the vertex stride.

6 years agoradeonsi: Add debug option to enable LLVM GlobalISel (v2)
Tom Stellard [Fri, 20 Jul 2018 17:54:56 +0000 (19:54 +0200)]
radeonsi: Add debug option to enable LLVM GlobalISel (v2)

R600_DEBUG=gisel will tell LLVM to use GlobalISel rather than
SelectionDAG for instruction selection.

v2: mareko: move the helper to src/amd/common

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <tstellar@redhat.com>
6 years agointel/compiler: Account for built-in uniforms in analyze_ubo_ranges
Jason Ekstrand [Mon, 23 Jul 2018 16:41:26 +0000 (09:41 -0700)]
intel/compiler: Account for built-in uniforms in analyze_ubo_ranges

The original pass only looked for load_uniform intrinsics but there are
a number of other places that could end up loading a push constant.  One
obvious omission was images which always implicitly use a push constant.
Legacy VS clip planes also get pushed into the shader.  This fixes some
new Vulkan CTS tests that test random combinations of bindings and, in
particular, test lots of UBOs and images together.

Cc: mesa-stable@lists.freedesktop.org
Cc: Kenneth Graunke <kenneth@whitecape.org>
6 years agoradv: enable VK_KHR_16bit_storage extension / 16bit storage features
Daniel Schürmann [Tue, 15 May 2018 15:10:12 +0000 (17:10 +0200)]
radv: enable VK_KHR_16bit_storage extension / 16bit storage features

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: add support for 16bit load_push_constant
Daniel Schürmann [Mon, 16 Jul 2018 18:45:24 +0000 (20:45 +0200)]
ac: add support for 16bit load_push_constant

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: add support for 16bit input/output
Daniel Schürmann [Tue, 15 May 2018 15:09:03 +0000 (17:09 +0200)]
radv: add support for 16bit input/output

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agonir: add 16bit type information to glsl types
Daniel Schürmann [Tue, 6 Feb 2018 17:53:33 +0000 (18:53 +0100)]
nir: add 16bit type information to glsl types

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: add support for 16bit buffer loads
Daniel Schürmann [Tue, 15 May 2018 14:01:25 +0000 (16:01 +0200)]
ac: add support for 16bit buffer loads

v2: Fixed dvec3 loads (bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: add support for 16bit UBO loads
Daniel Schürmann [Wed, 7 Feb 2018 18:40:43 +0000 (19:40 +0100)]
ac: add support for 16bit UBO loads

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: add support for 16bit ssbo stores
Daniel Schürmann [Tue, 15 May 2018 09:27:25 +0000 (11:27 +0200)]
ac: add support for 16bit ssbo stores

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoac: add 16bit conversion operations
Daniel Schürmann [Sat, 3 Feb 2018 13:37:26 +0000 (14:37 +0100)]
ac: add 16bit conversion operations

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agor600: enable tess_input_info for TES
Dave Airlie [Thu, 19 Jul 2018 04:39:15 +0000 (05:39 +0100)]
r600: enable tess_input_info for TES

There might be a nicer way to do this, but this is at least correct.

This fixes:
KHR-GL44.tessellation_shader.single.max_patch_vertices
KHR-GL44.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
6 years agodocs/features: fix virgl gles3.1 entries
Dave Airlie [Mon, 23 Jul 2018 20:10:06 +0000 (06:10 +1000)]
docs/features: fix virgl gles3.1 entries

6 years agodraw: force draw pipeline if there's more than 65535 vertices
Roland Scheidegger [Sat, 21 Jul 2018 23:05:39 +0000 (01:05 +0200)]
draw: force draw pipeline if there's more than 65535 vertices

The pt emit path can only handle 65535 - the number of vertices is
truncated to a ushort, resulting in a too small buffer allocation, which
will crash.

Forcing the pipeline path looks suboptimal, then again this bug has
probably been there ever since GS has been supported, so it seems it's not
happening often. (Note that the vertex_id in the vertex header is 16
bit too, however this is only used by the draw pipeline, and it denotes
the emit vertex nr, and that uses vbuf code, which will only emit smaller
chunks, so should be fine I think.)
Other solutions would be to simply allow 32bit counts for vertex
allocation, however 65535 is already larger than this was intended for
(the idea being it should be more cache friendly). Or could try to teach
the pt emit path to split the emit in smaller chunks (only the non-index
path can be affected, since gs output is always linear), but it's a bit
tricky (we don't know the primitive boundaries up-front).
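
To illustrate the truncation, here is a small sketch in Python. The helper
names are hypothetical and only mimic the arithmetic; they are not functions
in the draw module:

```python
USHORT_MAX = 0xFFFF  # largest value a 16-bit unsigned integer can hold


def truncate_to_ushort(count):
    """Mimic storing a count in a 16-bit unsigned integer."""
    return count & USHORT_MAX


def needs_pipeline_path(vertex_count):
    """Hypothetical check: counts above 65535 cannot use the pt emit path."""
    return vertex_count > USHORT_MAX


requested = 70000
# 70000 - 65536 = 4464, so the buffer would be sized for far too few vertices.
allocated = truncate_to_ushort(requested)
```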

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=107295
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
6 years agodocs/features: note ARB_copy_image is working on virgl
Dave Airlie [Mon, 23 Jul 2018 20:05:50 +0000 (06:05 +1000)]
docs/features: note ARB_copy_image is working on virgl

6 years agoRevert "virgl: remove unused stride-arguments"
Dave Airlie [Mon, 23 Jul 2018 20:03:03 +0000 (06:03 +1000)]
Revert "virgl: remove unused stride-arguments"

This reverts commit dc938b8398c0dafb60507e41685f7518b681c24d.

This adds warnings in vtest, and possibly breaks it.

6 years agodocs/features: note ssbo and atomic counters done for virgl
Dave Airlie [Wed, 18 Jul 2018 02:36:04 +0000 (12:36 +1000)]
docs/features: note ssbo and atomic counters done for virgl

6 years agovirgl: add initial shader_storage_buffer_object support. (v2)
Dave Airlie [Tue, 17 Jul 2018 07:24:29 +0000 (17:24 +1000)]
virgl: add initial shader_storage_buffer_object support. (v2)

This adds the guest side support for ARB_shader_storage_buffer_object.

Co-authors: Gurchetan Singh <gurchetansingh@chromium.org>

v2: move to using separate maximums
(fixup macros)

Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
6 years agonir: Add a couple trivial abs optimizations
Jason Ekstrand [Mon, 23 Jul 2018 06:57:07 +0000 (23:57 -0700)]
nir: Add a couple trivial abs optimizations

Spotted in a shader in Batman: Arkham City.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
6 years agoglsl: remove delegating constructors to allow build with C++98
Caio Marcelo de Oliveira Filho [Fri, 20 Jul 2018 20:21:33 +0000 (13:21 -0700)]
glsl: remove delegating constructors to allow build with C++98

Delegating constructors are a C++11 feature, so this was breaking when
compiling with C++98. Change the copy_propagation_state() calls that
used the convenience constructor to use a static member function
instead.

Since copy_propagation_state is expected to be heap allocated, this
change is a good fit.

Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107305

6 years agov3d: Implement a small immediates optimization, based on VC4's.
Eric Anholt [Fri, 20 Jul 2018 21:27:09 +0000 (14:27 -0700)]
v3d: Implement a small immediates optimization, based on VC4's.

We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).

total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs:     82711 -> 80163 (-3.08%)
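
The percentages follow directly from the instruction counts. A quick check
in Python (the helper name is made up for this sketch):

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / float(before) * 100.0


total = percent_change(90768, 88220)
affected = percent_change(82711, 80163)

total_text = "%.2f%%" % total
affected_text = "%.2f%%" % affected
```

Both lines drop by the same 2548 instructions; the second percentage is
larger only because its baseline is smaller.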

6 years agov3d: Return an invalid src number if asked for a missing implicit uniform.
Eric Anholt [Fri, 20 Jul 2018 21:06:57 +0000 (14:06 -0700)]
v3d: Return an invalid src number if asked for a missing implicit uniform.

Sometimes when iterating over sources, we might want to check if it's the
implicit one.  We wouldn't want to match on a non-implicit src using this
function.

6 years agov3d: Skip emitting texture config parameter 2 if it's just the defaults.
Eric Anholt [Fri, 20 Jul 2018 20:31:49 +0000 (13:31 -0700)]
v3d: Skip emitting texture config parameter 2 if it's just the defaults.

shader-db:
total instructions in shared programs: 91275 -> 90768 (-0.56%)
instructions in affected programs:     20702 -> 20195 (-2.45%)

6 years agov3d: Update an XXX comment for a path we handled in HW on V3D 4.x.
Eric Anholt [Fri, 20 Jul 2018 20:24:53 +0000 (13:24 -0700)]
v3d: Update an XXX comment for a path we handled in HW on V3D 4.x.

6 years agov3d: Switch to using the new SFU instructions on V3D 4.x.
Eric Anholt [Fri, 20 Jul 2018 20:06:50 +0000 (13:06 -0700)]
v3d: Switch to using the new SFU instructions on V3D 4.x.

These instructions let us write directly to the phys regfile, instead of
just R4.  That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.

There is still an extra instruction of latency, which is not represented
in the scheduler at the moment.  If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value.  That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.

total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs:     82590 -> 78196 (-5.32%)

6 years agov3d: Add QPU pack/unpack for the new SFU instructions.
Eric Anholt [Fri, 20 Jul 2018 19:19:36 +0000 (12:19 -0700)]
v3d: Add QPU pack/unpack for the new SFU instructions.

These instructions allow writing the result to any register, instead of a
special writeback to r4.

6 years agov3d: Fix the name of the "flpop" operation.
Eric Anholt [Fri, 20 Jul 2018 19:43:37 +0000 (12:43 -0700)]
v3d: Fix the name of the "flpop" operation.

Noticed while trying to sort a new op into the appropriate place to match
the documentation.

6 years agov3d: Print the instruction we're testing in the QPU disasm/pack round-trip.
Eric Anholt [Fri, 20 Jul 2018 19:29:39 +0000 (12:29 -0700)]
v3d: Print the instruction we're testing in the QPU disasm/pack round-trip.

If we fail initial disassembly, it's good to know what instruction it was
that failed.

6 years agov3d: Drop unused vir_SAT() operation.
Eric Anholt [Fri, 20 Jul 2018 19:10:08 +0000 (12:10 -0700)]
v3d: Drop unused vir_SAT() operation.

We lower saturates in NIR.

6 years agov3d: Rotate through registers to improve post-RA scheduling options.
Eric Anholt [Fri, 20 Jul 2018 19:05:57 +0000 (12:05 -0700)]
v3d: Rotate through registers to improve post-RA scheduling options.

Similarly to VC4's implementation, by not picking r0 immediately upon
freeing it, we give the scheduler more of a chance to fit later writes in
earlier.  I'm not clear on whether there's any real cost to picking phys
over accumulators, so keep that behavior for now.

shader-db:
total instructions in shared programs: 96831 -> 95669 (-1.20%)
instructions in affected programs:     77254 -> 76092 (-1.50%)

6 years agov3d: Allow reading from physical regs written in the previous instruction.
Eric Anholt [Fri, 20 Jul 2018 18:53:25 +0000 (11:53 -0700)]
v3d: Allow reading from physical regs written in the previous instruction.

This restriction existed in V3D 2.x, but lifting it was a major change in
3.x.

shader-db results:
total instructions in shared programs: 98117 -> 96831 (-1.31%)
instructions in affected programs:     48520 -> 47234 (-2.65%)

6 years agoanv: remove unnecessary runtime copy of static string
Eric Engestrom [Tue, 17 Jul 2018 15:58:22 +0000 (16:58 +0100)]
anv: remove unnecessary runtime copy of static string

It's actually also a bit safer, since now the compiler will warn if
the string is larger than the `.name` array.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoanv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT
Alex Smith [Fri, 20 Jul 2018 10:39:32 +0000 (11:39 +0100)]
anv: Pay attention to VK_ACCESS_MEMORY_(READ|WRITE)_BIT

According to the spec, these should apply to all read/write access
types (so would be equivalent to specifying all other access types
individually). Until now, they did nothing.

v2: Handle VK_ACCESS_MEMORY_WRITE_BIT in dstAccessMask.
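
One way to picture the intended semantics, sketched in Python. Only a subset
of the access bits is listed, and the expand_access() helper is hypothetical;
the driver itself maps access flags to flushes and invalidations rather than
rewriting the mask like this:

```python
# A subset of VkAccessFlagBits values, for illustration.
VK_ACCESS_SHADER_READ_BIT = 0x20
VK_ACCESS_SHADER_WRITE_BIT = 0x40
VK_ACCESS_TRANSFER_READ_BIT = 0x800
VK_ACCESS_TRANSFER_WRITE_BIT = 0x1000
VK_ACCESS_MEMORY_READ_BIT = 0x8000
VK_ACCESS_MEMORY_WRITE_BIT = 0x10000

# Assumption for this sketch: only the bits above exist.
READ_BITS = VK_ACCESS_SHADER_READ_BIT | VK_ACCESS_TRANSFER_READ_BIT
WRITE_BITS = VK_ACCESS_SHADER_WRITE_BIT | VK_ACCESS_TRANSFER_WRITE_BIT


def expand_access(mask):
    """Hypothetical helper: treat the generic bits as all specific ones."""
    if mask & VK_ACCESS_MEMORY_READ_BIT:
        mask |= READ_BITS
    if mask & VK_ACCESS_MEMORY_WRITE_BIT:
        mask |= WRITE_BITS
    return mask


expanded_read = expand_access(VK_ACCESS_MEMORY_READ_BIT)
expanded_write = expand_access(VK_ACCESS_MEMORY_WRITE_BIT)
```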

Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agovirgl: remove unused stride-arguments
Erik Faye-Lund [Wed, 18 Jul 2018 10:57:13 +0000 (11:57 +0100)]
virgl: remove unused stride-arguments

The IOCTLs don't pass these along, so computing them in the first
place is kinda pointless.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
6 years agoradv: print a big warning when RADV_TRACE_FILE is set
Samuel Pitoiset [Fri, 20 Jul 2018 16:47:03 +0000 (18:47 +0200)]
radv: print a big warning when RADV_TRACE_FILE is set

Users shouldn't use this debugging option except when we
ask them to!

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: fix a memleak for merged shaders on GFX9
Samuel Pitoiset [Fri, 20 Jul 2018 16:48:07 +0000 (18:48 +0200)]
radv: fix a memleak for merged shaders on GFX9

modules[i] can be NULL for merged shaders but we have to
free the NIR code. radv_can_dump_shader_stats() already handles
the case where modules[i] is NULL, so there is no need to check it twice.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agointel/blorp: Fix blits to R8G8B8_UNORM_SRGB sRGB harder
Jason Ekstrand [Fri, 20 Jul 2018 22:10:57 +0000 (15:10 -0700)]
intel/blorp: Fix blits to R8G8B8_UNORM_SRGB sRGB harder

The first fix attempt contained a nasty typo which somehow didn't get
caught in review.  It also didn't work as intended because the sRGB
conversion was happening but then throwing away all but the red channel
because it didn't know it was RGB.  Really, it's my fault for trying to
fix a bug without first writing tests.  I've now written tests and they
pass with this change. :)

Fixes: 11712b9ca17 "intel/blorp: Fix blits to R8G8B8_UNORM_SRGB"
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agoanv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV
Jason Ekstrand [Wed, 11 Jul 2018 23:31:02 +0000 (16:31 -0700)]
anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV

We've had several Broadwell hangs that have come down to this bit just
not working correctly.  Most recently, we've had a pile of hangs
reported with apps running under DXVK:

https://github.com/doitsujin/dxvk/issues/469

Instead, use the bit that doesn't try to imply weird D3D coherency
things and just force-enables the PS like we want.

cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoanv: Properly handle GetImageSubresourceLayout on complex images
Jason Ekstrand [Fri, 20 Jul 2018 21:24:17 +0000 (14:24 -0700)]
anv: Properly handle GetImageSubresourceLayout on complex images

We support mipmapped and arrayed linear images so we need to support
vkGetImageSubresourceLayout on them.  Fortunately, it's just a trivial
call into ISL.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agoradeonsi/nir: make use of nir_lower_load_const_to_scalar()
Timothy Arceri [Mon, 16 Jul 2018 04:01:40 +0000 (14:01 +1000)]
radeonsi/nir: make use of nir_lower_load_const_to_scalar()

This allows NIR to CSE more operations. LLVM does this also so the
impact is limited, however doing this in NIR allows other opts to
make progress. For example some loops in Civilization Beyond Earth
shaders are unrolled.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoanv/gen9: expose VK_EXT_post_depth_coverage
Ilia Mirkin [Fri, 20 Jul 2018 21:50:02 +0000 (15:50 -0600)]
anv/gen9: expose VK_EXT_post_depth_coverage

Note that the use of ICMS_INNER_CONSERVATIVE disagrees with the GL driver.
Perhaps it's more performant than ICMS_NORMAL and is otherwise permitted?
Not sure, so I left it as-is.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agospirv: add support for SPV_KHR_post_depth_coverage
Ilia Mirkin [Fri, 20 Jul 2018 21:50:01 +0000 (15:50 -0600)]
spirv: add support for SPV_KHR_post_depth_coverage

Allow the capability to be exposed, and convert the new execution mode
into fs state.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoandroid: util/disk_cache: fix building errors in gallium drivers
Mauro Rossi [Sat, 21 Jul 2018 08:40:32 +0000 (10:40 +0200)]
android: util/disk_cache: fix building errors in gallium drivers

This patch applies the necessary changes in Android.common.mk
as per automake rules, to avoid the following build error:

external/mesa/src/gallium/drivers/nouveau/nouveau_screen.c:159:8:
error: implicit declaration of function 'disk_cache_get_function_timestamp'
is invalid in C99 [-Werror,-Wimplicit-function-declaration]
   if (disk_cache_get_function_timestamp(nouveau_disk_cache_create,
       ^
1 error generated.

(v2) -DENABLE_SHADER_CACHE Android cflag is kept, to leave the AS-IS capability enabled

Fixes: cc10b34 ("util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoAndroid: fix a missing nir_intrinsics.h error
Chih-Wei Huang [Thu, 24 May 2018 07:03:31 +0000 (15:03 +0800)]
Android: fix a missing nir_intrinsics.h error

The commit 76dfed8ae2d5 changed nir_intrinsics.h to be a generated
header, but the corresponding dependency was not updated for Android.
It causes the error:

[  0% 19/4336] target  C: libmesa_pipe_radeonsi <= external/mesa/src/gallium/drivers/radeonsi/si_debug.c
...
In file included from external/mesa/src/gallium/drivers/radeonsi/si_debug.c:25:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_pipe.h:28:
In file included from external/mesa/src/gallium/drivers/radeonsi/si_shader.h:140:
In file included from external/mesa/src/amd/common/ac_llvm_build.h:30:
external/mesa/src/compiler/nir/nir.h:966:10: fatal error: 'nir_intrinsics.h' file not found
         ^~~~~~~~~~~~~~~~~~
1 error generated.

Fixes: 76dfed8ae2d5 ("nir: mako all the intrinsics")
Signed-off-by: Chih-Wei Huang <cwhuang@linux.org.tw>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Mauro Rossi <issor.oruam@gmail.com>
6 years agonir: Fix end of function without return warning/error.
Bas Nieuwenhuizen [Fri, 20 Jul 2018 17:54:56 +0000 (19:54 +0200)]
nir: Fix end of function without return warning/error.

There always is a continue block, so let us just do unreachable.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 8cacf38f527 "nir: Do not use continue block after removing it."
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107312

6 years agost: Sweep NIR after linking phase to free held memory
Danylo Piliaiev [Tue, 10 Jul 2018 08:51:45 +0000 (11:51 +0300)]
st: Sweep NIR after linking phase to free held memory

After optimization passes and many transformations, most of the memory
NIR holds is garbage, which was being freed only after shader deletion.
Freeing it at the end of linking saves memory, which is useful
when a lot of complex shaders are being compiled.
The common case for this issue is a 32-bit game running under Wine.

The cost of the optimization is around 3-5% of compilation speed
with complex shaders.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agost/dri: Don't require a dri_format for image creation.
Eric Anholt [Mon, 16 Jul 2018 22:22:57 +0000 (15:22 -0700)]
st/dri: Don't require a dri_format for image creation.

Nothing in EGL_KHR_gl_image.txt seems to let us deny creation based on
formats, and doing so causes many failures in
dEQP-EGL.functional.image.api.*

The NONE value we were protecting from only gets looked at in the
__DRI_IMAGE_ATTRIB_FORMAT and __DRI_IMAGE_ATTRIB_FOURCC queries, which are
used from wayland and gbm (which throw an error cleanly on unknown format)
and DMABUF export.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agoegl: Refuse EGL_MESA_image_dma_buf_export if we don't have a DRM fourcc.
Eric Anholt [Mon, 16 Jul 2018 23:18:03 +0000 (16:18 -0700)]
egl: Refuse EGL_MESA_image_dma_buf_export if we don't have a DRM fourcc.

The EGL CTS expects that you can make images from all sorts of things,
including things like z16 and s8, which we don't have DRM fourccs for.
Just return an error when trying to export one of those.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
6 years agov3d: Fix incorrect handling of two fences created back-to-back.
Eric Anholt [Mon, 9 Jul 2018 19:41:46 +0000 (12:41 -0700)]
v3d: Fix incorrect handling of two fences created back-to-back.

Recreating our context's syncobj with ALREADY_SIGNALED meant that if you
created two fences in a row, then waiting on the second would succeed
immediately.  Instead, export a sync file in the gallium fence (since we
don't have a syncobj clone ioctl), and just create a new syncobj to wait
on whenever we need to.

Noticed while debugging
dEQP-GLES3.functional.fence_sync.client_wait_sync_finish

6 years agov3d: Fix the timeout value passed to drmSyncobjWait().
Eric Anholt [Mon, 9 Jul 2018 20:18:34 +0000 (13:18 -0700)]
v3d: Fix the timeout value passed to drmSyncobjWait().

The API wants an absolute time, so we need to go add gallium's argument to
CLOCK_MONOTONIC.
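
The conversion amounts to the following, sketched in Python. The helper name
and the clamp to INT64_MAX are assumptions for illustration, not a copy of
the driver code; time.monotonic_ns() needs Python 3.7 or later:

```python
import time

INT64_MAX = 2**63 - 1


def absolute_timeout_ns(relative_ns, now_ns=None):
    """Hypothetical helper: turn a relative timeout into an absolute one.

    now_ns defaults to the current monotonic clock reading. The sum is
    clamped so that a very large "wait forever" value cannot overflow a
    signed 64-bit deadline.
    """
    if now_ns is None:
        now_ns = time.monotonic_ns()
    return min(now_ns + relative_ns, INT64_MAX)


deadline = absolute_timeout_ns(500, now_ns=1000)
forever = absolute_timeout_ns(2**64 - 1, now_ns=1000)
```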

6 years agov3d: Fix drmSyncobjWait() return value checking even more.
Eric Anholt [Wed, 18 Jul 2018 19:06:45 +0000 (12:06 -0700)]
v3d: Fix drmSyncobjWait() return value checking even more.

It tends to return >0 in the success case (I think the value is something
like "how much of the timeout remained").  Fixes
dEQP-GLES3.functional.fence_sync.client_wait_sync_finish

6 years agov3d: Use the list_first_entry/list_last_entry macros.
Eric Anholt [Tue, 17 Jul 2018 21:33:19 +0000 (14:33 -0700)]
v3d: Use the list_first_entry/list_last_entry macros.

6 years agov3d: Move BO cache counting to dump time instead of cache management.
Eric Anholt [Tue, 17 Jul 2018 21:29:41 +0000 (14:29 -0700)]
v3d: Move BO cache counting to dump time instead of cache management.

This is one less way to get the dump stats wrong.

6 years agov3d: Reduce the stale BO reclamation spam with dump_stats set.
Eric Anholt [Tue, 17 Jul 2018 20:21:58 +0000 (13:21 -0700)]
v3d: Reduce the stale BO reclamation spam with dump_stats set.

This was obviously meant to be when we were actually freeing a BO, not
just when there was at least one BO in the list.

6 years agov3d: Respect a sampler view's first_layer field.
Eric Anholt [Mon, 16 Jul 2018 23:44:58 +0000 (16:44 -0700)]
v3d: Respect a sampler view's first_layer field.

Fixes texturing from EGL images created from cubemap faces, as in
dEQP-EGL.functional.image.create.gles2_cubemap_negative_x_rgba_texture

6 years agoradeonsi: emit_spi_map packets optimization
Sonny Jiang [Wed, 18 Jul 2018 21:48:50 +0000 (17:48 -0400)]
radeonsi: emit_spi_map packets optimization

v2: marek: remove an empty line before break;
    rename reg_val_seq -> spi_ps_input_cntl
    "type * x" -> "type *x"

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
6 years agovirgl: Expose GL_ARB_copy_image if host supports it
Gert Wollny [Tue, 3 Jul 2018 11:32:21 +0000 (13:32 +0200)]
virgl: Expose GL_ARB_copy_image if host supports it

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
6 years agovirgl: Allow RGB32* textures only as buffer objects
Gert Wollny [Thu, 12 Jul 2018 10:55:36 +0000 (12:55 +0200)]
virgl: Allow RGB32* textures only as buffer objects

When requesting a texture of the internal format GL_RGB32F, Gallium will
try to allocate a renderable texture and return RGBA32F or RGBX32F, but
when one requests GL_RGB32I or GL_RGB32UI the corresponding 3-component
texture will be returned. This leads to problems later, when one wants
to use glCopyImageSubData to copy data between these textures that should
be compatible, but given the way virgl and Gallium handle this the latter
fails with an assertion, because the per-texel bit size is different.

By allowing the GL_RGB32* only for texture buffers these problems are avoided
without losing the ARB_tbo_rgb32 extension (thanks Ilia Mirkin).

v2: Correct spelling (Gurchetan Singh)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
6 years agointel: tools: dump: protect against multiple calls on destructor
Lionel Landwerlin [Fri, 20 Jul 2018 10:20:41 +0000 (11:20 +0100)]
intel: tools: dump: protect against multiple calls on destructor

When running gdb, make sure to pass the LD_PRELOAD variable only to
the executed program, not the debugger. Otherwise the debugger will
run the preloaded constructor/destructor too and bad things will
happen.

Suggested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
6 years agointel: tools: dump: make dump tool reliable under gdb
Lionel Landwerlin [Fri, 20 Jul 2018 10:18:18 +0000 (11:18 +0100)]
intel: tools: dump: make dump tool reliable under gdb

The problem with passing the configuration of the dump lib through a
file descriptor is that it can be read only once. But under gdb you
might want to rerun your program multiple times.

This change hands the configuration through a temporary file that is
deleted once the command line passed to intel_dump_gpu has exited.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
6 years agoradv: don't flush DB before subpass FS resolves
Samuel Pitoiset [Fri, 20 Jul 2018 13:07:34 +0000 (15:07 +0200)]
radv: don't flush DB before subpass FS resolves

That shouldn't be needed because the DB state is invalid.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agor600: Correct evaluation of cube array index and face
Gert Wollny [Tue, 17 Jul 2018 17:04:09 +0000 (19:04 +0200)]
r600: Correct evaluation of cube array index and face

The array index needs to be corrected and it must be ensured that it is
rounded and its value is non-negative before it is combined with the
face id.

v5: Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin)

v6: Fix type (Roland Scheidegger)

Fixes 182 from android/cts/master/gles31-master.txt:
  dEQP-GLES31.functional.texture.filtering.cube_array.formats.*
  dEQP-GLES31.functional.texture.filtering.cube_array.sizes.*
  dEQP-GLES31.functional.texture.filtering.cube_array.combinations.nearest_mipmap_*
  dEQP-GLES31.functional.texture.filtering.cube_array.combinations.linear_mipmap_*
  dEQP-GLES31.functional.texture.filtering.cube_array.no_edges_visible.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
6 years agor600: correct texture offset for array index lookup
Gert Wollny [Tue, 17 Jul 2018 17:04:08 +0000 (19:04 +0200)]
r600: correct texture offset for array index lookup

Correct the array index for TEXTURE_*1D_ARRAY and TEXTURE_*2D_ARRAY.
The standard says the array index is evaluated according to

   floor(z + 0.5)

but RNDNE is sufficient also for the test cases where z is close to 1.5
and it is likely to hit 1.5, the corner case where RNDNE gives a result
different from the above formula.
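
To make the two rounding rules concrete, here is a small sketch for
non-negative z. It emulates round-to-nearest-even in plain C rather than
using the hardware RNDNE instruction; both function names are hypothetical
and not taken from the r600 driver. In this sketch the two functions
disagree only at exact ties whose integer part is even, such as 0.5 or 2.5.

```c
#include <assert.h>

/* floor(z + 0.5) for non-negative z; truncation equals floor here. */
static int
layer_floor_half(float z)
{
    assert(z >= 0.0f);
    return (int)(z + 0.5f);
}

/* Round to nearest, ties to even, for non-negative z.
 * This is a software stand-in for what RNDNE is documented to do.
 */
static int
layer_round_nearest_even(float z)
{
    assert(z >= 0.0f);

    int base = (int)z;               /* integer part of z */
    float frac = z - (float)base;    /* fractional part in [0, 1) */

    if (frac > 0.5f)
        return base + 1;
    if (frac < 0.5f)
        return base;

    /* Exact tie: pick whichever neighbour is even. */
    return base + (base & 1);
}
```

For values that are not exact ties, both functions return the same layer.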

v5: - Use RNDNE instead of ADD 0.5 and FLOOR (Ilia Mirkin)
    - update commit message

Fixes 325 tests from android/cts/master/gles3-master.txt:
  dEQP-GLES3.functional.shaders.texture_functions.texture.*sampler2darray*
  dEQP-GLES3.functional.shaders.texture_functions.textureoffset.*sampler2darray*
  dEQP-GLES3.functional.shaders.texture_functions.texturelod.sampler2darray*
  dEQP-GLES3.functional.shaders.texture_functions.texturelodoffset.*sampler2darray*
  dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*sampler2darray*
  dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.*sampler2darray*
  dEQP-GLES3.functional.texture.filtering.2d_array.formats.*
  dEQP-GLES3.functional.texture.filtering.2d_array.sizes.*
  dEQP-GLES3.functional.texture.filtering.2d_array.combinations.*
  dEQP-GLES3.functional.texture.shadow.2d_array.*
  dEQP-GLES3.functional.texture.vertex.2d_array.*

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
6 years agor600: Delay emission of texture gradients and lookup offsets
Gert Wollny [Tue, 17 Jul 2018 17:04:07 +0000 (19:04 +0200)]
r600: Delay emission of texture gradients and lookup offsets

Gradients used in texture lookups and the offsets must reside in the
same fetch clause (the first is imposed by the hardware and the second
is expected by sb). In order to ensure that no ALU clause is inserted
between emission and use of these, delay the emission of these
instructions until the texture instruction using them is also emitted.

This is needed in preparation for the correction of the texture array
indices.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
6 years agoutil/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.
Bas Nieuwenhuizen [Wed, 18 Jul 2018 11:58:49 +0000 (13:58 +0200)]
util/disk_cache: Fix disk_cache_get_function_timestamp with disabled cache.

radv always needs it, so just check the header instead. Also
do not declare the function if the variable is not set, so we
get a nice compile error instead of failing to open a device
at runtime.

Fixes: b87ef9e606a "util: fix MSVC build issue in disk_cache.h"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agonir: Do not use continue block after removing it.
Bas Nieuwenhuizen [Sat, 14 Jul 2018 23:19:17 +0000 (01:19 +0200)]
nir: Do not use continue block after removing it.

Reinserting code directly before a jump means the block gets split
and merged, removing the original block and replacing it in the
process.

Hence keeping a pointer to the continue block over a reinsert
causes issues.

This code changes nir_opt_if to simply look for the new continue
block.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107275
CC: 18.1 <mesa-stable@lists.freedesktop.org>
6 years agoradv: simplify a condition in radv_src_access_flush()
Samuel Pitoiset [Wed, 18 Jul 2018 14:19:07 +0000 (16:19 +0200)]
radv: simplify a condition in radv_src_access_flush()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: save current state just before resolving with FS
Samuel Pitoiset [Wed, 18 Jul 2018 14:19:06 +0000 (16:19 +0200)]
radv: save current state just before resolving with FS

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: don't check if a subpass has resolve attachments twice
Samuel Pitoiset [Wed, 18 Jul 2018 14:19:05 +0000 (16:19 +0200)]
radv: don't check if a subpass has resolve attachments twice

We already check that in radv_cmd_buffer_resolve_subpass().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: make use of radv_subpass_barrier() when resolving subpasses
Samuel Pitoiset [Wed, 18 Jul 2018 14:19:04 +0000 (16:19 +0200)]
radv: make use of radv_subpass_barrier() when resolving subpasses

The goal is to use radv_barrier()/radv_subpass_barrier() as
much as possible for further optimizations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agonv50/ir: move LateAlgebraicOpt back to right after ConstantFolding
Rhys Perry [Tue, 12 Jun 2018 11:14:14 +0000 (12:14 +0100)]
nv50/ir: move LateAlgebraicOpt back to right after ConstantFolding

total instructions in shared programs : 5480808 -> 5472107 (-0.16%)
total gprs used in shared programs    : 647530 -> 647532 (0.00%)
total shared used in shared programs  : 389120 -> 389120 (0.00%)
total local used in shared programs   : 21064 -> 21064 (0.00%)
total bytes used in shared programs   : 58551648 -> 58459352 (-0.16%)

                local     shared        gpr       inst      bytes
    helped           0           0          73        2609        2609
      hurt           0           0          71          34          34

6 years agonv50/ir: handle SHLADD in IndirectPropagation
Rhys Perry [Tue, 12 Jun 2018 10:43:49 +0000 (11:43 +0100)]
nv50/ir: handle SHLADD in IndirectPropagation

An alternative solution to the problem fixed in
0bd83d0 ("nv50/ir: move LateAlgebraicOpt to the very end").

total instructions in shared programs : 5481195 -> 5480808 (-0.01%)
total gprs used in shared programs    : 647535 -> 647530 (-0.00%)
total shared used in shared programs  : 389120 -> 389120 (0.00%)
total local used in shared programs   : 21064 -> 21064 (0.00%)
total bytes used in shared programs   : 58555784 -> 58551648 (-0.01%)

                local     shared        gpr       inst      bytes
    helped           0           0           2          34          34
      hurt           0           0           0           0           0

6 years agogm107/ir: use CS2R for SV_CLOCK
Rhys Perry [Thu, 19 Jul 2018 15:58:46 +0000 (16:58 +0100)]
gm107/ir: use CS2R for SV_CLOCK

This instruction seems to be faster than S2R and requires no barrier,
though the range of special registers it can read from is limited.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
6 years agointel: tools: dump: remove mentions of intel_aubdump
Lionel Landwerlin [Wed, 18 Jul 2018 16:38:52 +0000 (17:38 +0100)]
intel: tools: dump: remove mentions of intel_aubdump

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
6 years agointel: tools: aubwrite: fix invalid frees on finish
Lionel Landwerlin [Wed, 18 Jul 2018 16:39:19 +0000 (17:39 +0100)]
intel: tools: aubwrite: fix invalid frees on finish

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
6 years agoac/nir: add a workaround for bitfield_extract when count is 0
Samuel Pitoiset [Thu, 19 Jul 2018 18:27:11 +0000 (20:27 +0200)]
ac/nir: add a workaround for bitfield_extract when count is 0

LLVM 7 returns incorrect results when count is 0; something
has been broken since LLVM 6. Of course, the best solution is
to fix LLVM but this workaround works as expected for now.
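
For reference, GLSL specifies that bitfieldExtract returns 0 when the
number of bits is zero. The sketch below shows that special case in plain
C, assuming the workaround amounts to selecting 0 whenever count is 0. It
is not the LLVM IR emitted by ac/nir, and bitfield_extract_u32() is a
hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned bitfield extract with an explicit count == 0 case. */
static uint32_t
bitfield_extract_u32(uint32_t value, unsigned offset, unsigned count)
{
    assert(offset < 32 && count <= 32 - offset);

    /* The case the workaround guards: zero bits requested yields 0. */
    if (count == 0)
        return 0;

    /* Avoid the undefined shift by 32 when the full word is requested. */
    uint32_t mask = count == 32 ? UINT32_MAX : (UINT32_C(1) << count) - 1;

    return (value >> offset) & mask;
}
```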

Original workaround by Philippe Rebohle.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107276
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agointel/isl/gen4: Make depth/stencil buffers Y-Tiled
Nanley Chery [Mon, 16 Jul 2018 22:42:39 +0000 (15:42 -0700)]
intel/isl/gen4: Make depth/stencil buffers Y-Tiled

Rendering to a linear depth buffer on gen4 is causing a GPU hang in the
CI system. Until a better explanation is found, assume that errata is
applicable to all gen4 platforms.

Fixes fbe01625f6bf2cef6742e1ff0d3d44a2afec003e
("i965/miptree: Share tiling_flags in miptree_create").

Reported-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107248
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965/misc: Use depth/stencil surf's tiling on gen4-5
Nanley Chery [Mon, 16 Jul 2018 20:03:09 +0000 (13:03 -0700)]
i965/misc: Use depth/stencil surf's tiling on gen4-5

Make the 3D engine aware of the depth/stencil surface's tiling before
doing any render operations.

Fixes fbe01625f6bf2cef6742e1ff0d3d44a2afec003e
("i965/miptree: Share tiling_flags in miptree_create").

Reported-by: Mark Janes <mark.a.janes@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107248
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoglsl: don't let an 'if' then-branch kill copy propagation (elements) for else-branch
Caio Marcelo de Oliveira Filho [Tue, 26 Jun 2018 23:26:46 +0000 (16:26 -0700)]
glsl: don't let an 'if' then-branch kill copy propagation (elements) for else-branch

When handling 'if' in copy propagation elements, if a certain variable
was killed when processing the first branch of the 'if', then the
second would not get any propagation from previous nodes.

    x = y;
    if (...) {
        z = x;  // This would turn into z = y.
        x = 22; // x gets killed.
    } else {
        w = x;  // This would NOT turn into w = y.
    }

With the change, we let copy propagation happen independently in the
two branches and only then apply the killed values for the subsequent
code.
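
The idea can be modelled with a toy state that is cloned per branch. This
is a simplified sketch, not the GLSL compiler's implementation: variables
are plain integers, there are no channels, and every name here
(copy_state, clone_for_branch, merge_kills, ...) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { X, Y, Z, W, NUM_VARS };
enum { NO_COPY = -1 };

struct copy_state {
    int copy_of[NUM_VARS];   /* copy_of[v] == s: v is known to equal s */
    bool killed[NUM_VARS];   /* v was written while this state was live */
};

static void
state_init(struct copy_state *s)
{
    for (int v = 0; v < NUM_VARS; v++) {
        s->copy_of[v] = NO_COPY;
        s->killed[v] = false;
    }
}

/* What a read of v can be replaced with. */
static int
resolve(const struct copy_state *s, int v)
{
    assert(v >= 0 && v < NUM_VARS);
    return s->copy_of[v] != NO_COPY ? s->copy_of[v] : v;
}

/* v was overwritten: drop its copy and every copy taken from it. */
static void
kill_var(struct copy_state *s, int v)
{
    assert(v >= 0 && v < NUM_VARS);
    s->killed[v] = true;
    s->copy_of[v] = NO_COPY;
    for (int u = 0; u < NUM_VARS; u++)
        if (s->copy_of[u] == v)
            s->copy_of[u] = NO_COPY;
}

/* Handle "dst = src". */
static void
assign_copy(struct copy_state *s, int dst, int src)
{
    int value = resolve(s, src);
    kill_var(s, dst);
    if (value != dst)
        s->copy_of[dst] = value;
}

/* Each branch starts from the parent's copies with no kills recorded. */
static struct copy_state
clone_for_branch(const struct copy_state *parent)
{
    struct copy_state branch = *parent;
    for (int v = 0; v < NUM_VARS; v++)
        branch.killed[v] = false;
    return branch;
}

/* After both branches ran, apply their kills to the code that follows. */
static void
merge_kills(struct copy_state *parent, const struct copy_state *branch)
{
    for (int v = 0; v < NUM_VARS; v++)
        if (branch->killed[v])
            kill_var(parent, v);
}

struct demo_result { int then_use, else_use, after_use; };

static struct demo_result
run_demo(void)
{
    struct demo_result r;
    struct copy_state parent;
    state_init(&parent);
    assign_copy(&parent, X, Y);               /* x = y; */

    struct copy_state then_s = clone_for_branch(&parent);
    r.then_use = resolve(&then_s, X);         /* z = x; reads y */
    assign_copy(&then_s, Z, X);
    kill_var(&then_s, X);                     /* x = 22; */

    struct copy_state else_s = clone_for_branch(&parent);
    r.else_use = resolve(&else_s, X);         /* w = x; still reads y */
    assign_copy(&else_s, W, X);

    merge_kills(&parent, &then_s);
    merge_kills(&parent, &else_s);
    r.after_use = resolve(&parent, X);        /* x is unknown after the if */
    return r;
}
```

Because the else-branch works on its own clone, the kill of x in the
then-branch only takes effect for the code after the 'if'.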

One example in shader-db part of shaders/unity/8.shader_test:

    (assign  (xyz) (var_ref col_1)  (var_ref tmpvar_8) )
    (if (expression bool < (swiz y (var_ref xlv_TEXCOORD0) )(constant float (0.000000)) ) (
      (assign  (xyz) (var_ref col_1)  (expression vec3 + (var_ref tmpvar_8) ... ) ... )
    )
    (
      (assign  (xyz) (var_ref col_1)  (expression vec3 lrp (var_ref col_1) ... ) ... )
    ))

The variable col_1 was replaced by tmpvar_8 in the then-part but not
in the else-part.

NIR deals well with copy propagation, so it already covered for the
missing ones that this patch fixes.

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agoglsl: change opt_copy_propagation_elements data structures
Caio Marcelo de Oliveira Filho [Mon, 25 Jun 2018 17:44:56 +0000 (10:44 -0700)]
glsl: change opt_copy_propagation_elements data structures

Instead of keeping multiple acp_entries in lists, have a single
acp_entry per variable. With this, the implementation of clone is more
convenient and now fully implemented. In the previous code, clone was
only partial.

Before this patch, each acp_entry struct represented a write to a
variable including LHS, RHS and a mask of what channels were written
to. There were two main hash tables, the first (lhs_ht) stored a list
of acp_entries per LHS variable, with the values available to copy for
that variable; the second (rhs_ht) was a "reverse index" for the first
hash table, so stored acp_entries per RHS variable.

After the patch, there's a single acp_entry struct per LHS variable,
it contains an array with references to the RHS variables per
channel. There now is a single hash table, from LHS variable to the
corresponding entry. The "reverse index" is stored in the ACP entry,
in the form of a set of variables that copy from the LHS. To make the
clone operation cheaper, the ACP entries are created on demand.
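
A simplified picture of that layout is sketched below. It is an
illustration under assumptions, not the struct from the patch: variables
are small integer ids, a fixed array stands in for the hash table, the
reverse index is a boolean array instead of a set, and on-demand entry
creation is not modelled. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_VARS = 8, NUM_CHANNELS = 4, NO_VAR = -1 };

struct acp_entry_sketch {
    int rhs_var[NUM_CHANNELS];      /* source variable per channel */
    int rhs_channel[NUM_CHANNELS];  /* channel read from that source */
    bool dsts[MAX_VARS];            /* variables that copy from this one */
};

/* Stand-in for the single LHS-variable -> entry hash table. */
struct acp_table_sketch {
    struct acp_entry_sketch entries[MAX_VARS];
};

static void
table_init(struct acp_table_sketch *t)
{
    for (int v = 0; v < MAX_VARS; v++) {
        for (int c = 0; c < NUM_CHANNELS; c++) {
            t->entries[v].rhs_var[c] = NO_VAR;
            t->entries[v].rhs_channel[c] = 0;
        }
        for (int d = 0; d < MAX_VARS; d++)
            t->entries[v].dsts[d] = false;
    }
}

/* Record "lhs.ch = rhs.rhs_ch" and update the reverse index on rhs. */
static void
table_write_copy(struct acp_table_sketch *t, int lhs, int ch,
                 int rhs, int rhs_ch)
{
    assert(lhs >= 0 && lhs < MAX_VARS && rhs >= 0 && rhs < MAX_VARS);
    assert(ch >= 0 && ch < NUM_CHANNELS);
    assert(rhs_ch >= 0 && rhs_ch < NUM_CHANNELS);

    t->entries[lhs].rhs_var[ch] = rhs;
    t->entries[lhs].rhs_channel[ch] = rhs_ch;
    t->entries[rhs].dsts[lhs] = true;
}

/* var was overwritten: forget its sources and any copies taken from it.
 * Stale dsts bits are harmless because rhs_var is checked before clearing.
 */
static void
table_erase(struct acp_table_sketch *t, int var)
{
    assert(var >= 0 && var < MAX_VARS);

    for (int c = 0; c < NUM_CHANNELS; c++)
        t->entries[var].rhs_var[c] = NO_VAR;

    for (int dst = 0; dst < MAX_VARS; dst++) {
        if (!t->entries[var].dsts[dst])
            continue;
        for (int c = 0; c < NUM_CHANNELS; c++)
            if (t->entries[dst].rhs_var[c] == var)
                t->entries[dst].rhs_var[c] = NO_VAR;
        t->entries[var].dsts[dst] = false;
    }
}
```

With this shape, cloning the table copies one entry per variable instead
of walking per-variable lists.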

This should not change the result of copy propagation; a later patch
will take advantage of the clone operation.

v2: Add note clarifying how the hashtable is destroyed.

v3: (all from Eric Anholt)
    Add remove_unused_var_from_dsts() function for reuse.
    Remove from dsts as we go instead of clearing at the end.
    Add clarifying comment to erase().

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agoglsl: separate copy propagation state
Caio Marcelo de Oliveira Filho [Sat, 23 Jun 2018 00:35:23 +0000 (17:35 -0700)]
glsl: separate copy propagation state

Separate higher level logic of visiting instructions and choosing when
to store and use new copy data from the data structure holding the copy
propagation information. This will also make later patches that change
the structure easier.

v2: Remove empty destructor and clarify how hash tables are destroyed.

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agointel: tools: dump: trace memory writes
Lionel Landwerlin [Wed, 18 Jul 2018 17:19:31 +0000 (18:19 +0100)]
intel: tools: dump: trace memory writes

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
6 years agointel: tools: dump: remove command execution feature
Lionel Landwerlin [Wed, 18 Jul 2018 14:12:57 +0000 (15:12 +0100)]
intel: tools: dump: remove command execution feature

In commit 86cb05a6d35a52 ("intel: aubinator: remove standard input
processing option") we removed the ability to process aub as an input
stream because we now rely on mmapping the aub file to back the
buffers aubinator is parsing.

intel_aubdump was the provider of the standard input data and since
we've copied/reworked intel_aubdump into intel_dump_gpu within Mesa,
we don't need that code anymore.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agoradv: Fix incorrect assumption about ternary operator precedence
Danylo Piliaiev [Wed, 18 Jul 2018 08:47:19 +0000 (11:47 +0300)]
radv: Fix incorrect assumption about ternary operator precedence

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>