mesa.git
Ian Romanick [Wed, 27 Jun 2018 01:30:09 +0000 (18:30 -0700)]
i965/vec4: Properly handle sign(-abs(x))

This is achieved by copying the sign(abs(x)) optimization from the FS
backend.

On Gen7 and earlier platforms, this fixes new piglit tests:

 - glsl-1.10/execution/vs-sign-neg-abs.shader_test
 - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Ian Romanick [Tue, 26 Jun 2018 22:11:21 +0000 (15:11 -0700)]
i965/fs: Properly handle sign(-abs(x))

Fixes new piglit tests:

 - glsl-1.10/execution/fs-sign-neg-abs.shader_test
 - glsl-1.10/execution/fs-sign-sat-neg-abs.shader_test
 - glsl-1.10/execution/vs-sign-neg-abs.shader_test
 - glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Lionel Landwerlin [Fri, 6 Jul 2018 10:48:23 +0000 (11:48 +0100)]
vulkan: utils: handle hexadecimal values in registry

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Marek Olšák [Thu, 5 Jul 2018 22:15:31 +0000 (18:15 -0400)]
st/dri: fix a crash in server_wait_sync

Ported from i965 including the comment.

This fixes:
    dEQP-EGL.functional.reusable_sync.valid.wait_server

Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Mathieu Bridon [Fri, 6 Jul 2018 10:13:36 +0000 (12:13 +0200)]
python: Stop using the Python 2 exception syntax

We could have made this compatible with Python 3 by using:

    except Exception as e:

But since none of this code actually uses the exception objects, let's
just drop them entirely.
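As a minimal illustration (the helper below is hypothetical, not taken from Mesa's scripts), an `except` clause that never uses the exception object can omit the binding entirely, and that form parses the same way on Python 2 and Python 3:

```python
def parse_int_or_none(text):
    """Return int(text), or None if text is not a valid integer."""
    try:
        return int(text)
    except ValueError:  # no "as e" binding: the exception object is unused
        return None
```

For example, `parse_int_or_none("42")` returns 42, while `parse_int_or_none("abc")` returns None.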

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Mathieu Bridon [Thu, 5 Jul 2018 13:17:33 +0000 (15:17 +0200)]
python: Use spaces, not tabs

Unlike Python 2, Python 3 rejects scripts that mix tabs and spaces
inconsistently in their indentation.
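A small sketch, using only the standard library, of how Python 3 reports such a file: `compile()` raises `TabError` when one line of a block is indented with a tab and the next with spaces. The helper name is illustrative, not Mesa's.

```python
def has_inconsistent_indentation(source):
    """Return True if Python 3 rejects source for mixing tabs and spaces."""
    try:
        compile(source, "<sketch>", "exec")
    except TabError:
        return True
    return False

# The first body line is indented with a tab, the second with 8 spaces.
# Python 2 treated both as column 8; Python 3 refuses the ambiguity.
MIXED = "if True:\n\tx = 1\n        y = 2\n"
SPACES = "if True:\n    x = 1\n    y = 2\n"
```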

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Mathieu Bridon [Thu, 5 Jul 2018 13:17:32 +0000 (15:17 +0200)]
python: Use the print function

In Python 2, `print` was a statement, but it became a function in
Python 3.

Using print functions everywhere makes the script compatible with Python
versions >= 2.6, including Python 3.
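A short sketch of what the function form enables (the helper is hypothetical): keyword arguments such as `sep=` and `file=` only exist because `print` is a function. On Python 2.6 and 2.7, a module also needs `from __future__ import print_function` at its top for calls with multiple arguments or keywords to behave this way.

```python
import io

def render(pairs):
    """Return 'name = value' lines built with the print() function."""
    out = io.StringIO()
    for name, value in pairs:
        # sep= and file= are keyword arguments of the print() function
        print(name, value, sep=" = ", file=out)
    return out.getvalue()
```

For instance, `render([("a", 1), ("b", 2)])` produces `"a = 1\nb = 2\n"`.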

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Jon Turney [Thu, 5 Jul 2018 13:40:58 +0000 (14:40 +0100)]
vma/tests: Fix compilation if limits.h defines PAGE_SIZE (v2)

Per POSIX, limits.h may define PAGE_SIZE when the value is not
indeterminate.

v2: just change the variable name, since there's no intended correlation
here between this value and the machine's actual page size.

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Samuel Pitoiset [Thu, 5 Jul 2018 16:56:55 +0000 (18:56 +0200)]
radv: fix emitting the view index on GFX9

This affects merged shaders, for example VS as HS.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Ian Romanick [Tue, 3 Jul 2018 03:29:27 +0000 (20:29 -0700)]
i965/vec4: Make the vec4_visitor::nir_emit_instr default case unreachable

The bug fixed by the previous commit went undetected because extra
stderr messages are not flagged by the CI.  Copy the solution from
fs_visitor::nir_emit_instr and mark the default case unreachable.

An alternate solution is to delete the default case so that the compiler
will issue a warning.  That may require more work since there are other
(impossible) cases that exist.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ian Romanick [Tue, 3 Jul 2018 18:49:07 +0000 (11:49 -0700)]
intel/compiler: More DCE after lowering

Some of the lowering passes, nir_lower_locals_to_regs for example, can
cause some previously live code to be dead.  This pass in particular
leaves a bunch of nir_instr_type_deref instructions floating around.
This causes shader-db runs on Gen5 through Haswell to spew tons of
messages like:

    VS instruction not yet implemented by NIR->vec4

UnrealEngine4/EffectsCaveDemo/239.shader_test is one shader that
generates these messages.  Cleaning up the dead code fixes that.

To verify, I did a shader-db before and after.  Even though all the
messages are gone, the results make my brain hurt. :(

Haswell
total cycles in shared programs: 411890163 -> 411891145 (<.01%)
cycles in affected programs: 57016 -> 57998 (1.72%)
helped: 3
HURT: 11
helped stats (abs) min: 2 max: 154 x̄: 96.67 x̃: 134
helped stats (rel) min: 0.08% max: 2.23% x̄: 1.42% x̃: 1.96%
HURT stats (abs)   min: 18 max: 686 x̄: 115.64 x̃: 20
HURT stats (rel)   min: 0.81% max: 7.12% x̄: 1.87% x̃: 0.93%
95% mean confidence interval for cycles value: -51.39 191.67
95% mean confidence interval for cycles %-change: -0.14% 2.46%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total cycles in shared programs: 259114802 -> 259115032 (<.01%)
cycles in affected programs: 24034 -> 24264 (0.96%)
helped: 1
HURT: 9
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
HURT stats (abs)   min: 18 max: 48 x̄: 25.78 x̃: 20
HURT stats (rel)   min: 0.80% max: 1.94% x̄: 1.08% x̃: 0.80%
95% mean confidence interval for cycles value: 12.42 33.58
95% mean confidence interval for cycles %-change: 0.54% 1.38%
Cycles are HURT.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 5a02ffb733e nir: Rework lower_locals_to_regs to use deref instructions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Anholt [Thu, 5 Jul 2018 20:34:14 +0000 (13:34 -0700)]
v3d: Fix leak of the default attributes BOs.

The GLES3 CTS makes a lot more progress on a run now.

Eric Anholt [Thu, 5 Jul 2018 20:30:03 +0000 (13:30 -0700)]
v3d: Fix leak of the spill BO on context destruction.

Eric Anholt [Tue, 3 Jul 2018 22:39:21 +0000 (15:39 -0700)]
nir: Apply fragment color clamping to gl_FragData[] as well.

From the ARB_color_buffer_float spec:

   35. Should the clamping of fragment shader output gl_FragData[n]
       be controlled by the fragment color clamp.

       RESOLVED: Since the destination of the FragData is a color
       buffer, the fragment color clamp control should apply.

Fixes arb_color_buffer_float-mrt mixed on v3d.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Eric Anholt [Tue, 3 Jul 2018 23:27:39 +0000 (16:27 -0700)]
v3d: Skip emitting per-RT blend state for RTs with blend disabled.

Cleans up the CL of fbo-drawbuffers2-blend a bit.  We could do better on
more complicated cases by noticing if multiple RTs have the same blend
state and emitting them in a single packet.

Eric Anholt [Tue, 3 Jul 2018 23:24:35 +0000 (16:24 -0700)]
v3d: Add proper support for GL_EXT_draw_buffers2's blending enables.

I had flagged it as enabled on V3D 4.x, but had not actually implemented
the per-RT enables.  Fixes piglit fbo-drawbuffers2-blend.

Eric Anholt [Tue, 3 Jul 2018 22:56:48 +0000 (15:56 -0700)]
v3d: Add support for GL_SAMPLE_ALPHA_TO_ONE.

Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one

Eric Anholt [Tue, 3 Jul 2018 22:52:59 +0000 (15:52 -0700)]
v3d: Respect swap_color_rb for the f32_color_rb case.

We don't actually set the two flags together, but I want to use the
r/g/b/a reordered fields in the next commit.

Eric Anholt [Wed, 20 Jun 2018 20:26:52 +0000 (13:26 -0700)]
st/nir: Disable varying packing when doing transform feedback.

The varying packing would result in st_nir_assign_var_locations() picking
new driver_locations, despite the pipe_stream_output already being set up
for the old driver location.  This left the gallium driver with no way to
work back to what varying was referenced by pipe_stream_output.

Fixes these tests on V3D:
dEQP-GLES3.functional.transform_feedback.random.separate.points.3
dEQP-GLES3.functional.transform_feedback.random.separate.points.7
dEQP-GLES3.functional.transform_feedback.random.separate.points.9
dEQP-GLES3.functional.transform_feedback.random.separate.triangles.3
dEQP-GLES3.functional.transform_feedback.random.separate.triangles.8

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jon Turney [Mon, 15 Jan 2018 19:39:46 +0000 (19:39 +0000)]
meson: Set with_dri from with_gallium when DRI glx is explicitly configured

Set with_dri from with_gallium when DRI GLX is explicitly configured, as
well as when DRI GLX is chosen automatically.

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Samuel Pitoiset [Thu, 5 Jul 2018 15:07:07 +0000 (17:07 +0200)]
radv/winsys: make use of radeon_emit()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 5 Jul 2018 10:54:18 +0000 (12:54 +0200)]
radv: only flush CB meta in pipeline image barriers when needed

If the given image doesn't enable CMASK, FMASK or DCC, flushing the CB
metadata is useless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 5 Jul 2018 10:54:17 +0000 (12:54 +0200)]
radv: only flush DB meta in pipeline image barriers when needed

If the given image doesn't have HTILE, flushing the DB metadata is
useless.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 5 Jul 2018 15:01:23 +0000 (17:01 +0200)]
radv: fix "error: initializer element is not constant" build error

GCC 4.8 fails to compile with "static const", while GCC 8.1
fails to compile with only "static".

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Lionel Landwerlin [Thu, 5 Jul 2018 10:55:43 +0000 (11:55 +0100)]
util: u_queue: fix android build error

mesa/src/util/u_queue.c:242:15: error: address of array 'queue->name'
  will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]

Fixes: b238e33bc9d48b814370 "kutil/queue: add a process name into a thread name"
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Benedikt Schemmer [Thu, 5 Jul 2018 07:49:15 +0000 (09:49 +0200)]
Util: fix msvc build

The MSVC preprocessor doesn't understand #warning.

Fixes: 2e1e6511f76 ("util: extract get_process_name from xmlconfig.c")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Mathieu Bridon [Wed, 27 Jun 2018 10:37:39 +0000 (12:37 +0200)]
python: Specify the JSON separators

On Python 2, the default JSON separators are ', ' between items and ': '
between keys and values.

On Python 3, the default is the same when no indent is specified, but if
one is (and we do specify one) then the default item separator becomes
',' (the key separator remains unchanged).

This change explicitly specifies the Python 3 default, which helps
ensure that the output is identical whether it was generated by
Python 2 or 3.
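A minimal sketch of the idea using only the standard `json` module (the data is made up): passing `separators=(",", ": ")` together with `indent` pins the output to the Python 3 form, so no line ends with a trailing space.

```python
import json

DATA = {"name": "example", "values": [1, 2]}

def dump_stable(obj):
    """Serialize obj with explicit separators so Python 2 and 3 agree."""
    # (",", ": ") is Python 3's default when indent is set; Python 2 would
    # otherwise use ", " and leave a trailing space after each item.
    return json.dumps(obj, indent=2, separators=(",", ": "))
```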

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Mathieu Bridon [Wed, 27 Jun 2018 10:37:38 +0000 (12:37 +0200)]
python: Stabilize some script outputs

In Python, dictionaries and sets are unordered, and as a result there
is no guarantee that running this script twice will produce the same
output.

Using ordered dicts and explicitly sorting items makes the build more
reproducible, and will make it possible to verify that we're not
breaking anything when we move the build scripts to Python 3.
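A sketch of the technique on a hypothetical `{group: set(members)}` mapping (not the actual script's data): members are sorted explicitly, and groups are placed in an `OrderedDict` in sorted key order, so the rendered text no longer depends on hash ordering.

```python
from collections import OrderedDict

def emit(groups):
    """Render a {group: set(members)} mapping deterministically."""
    # Sets have no stable iteration order, so sort members explicitly,
    # and keep groups in an OrderedDict built from sorted keys.
    ordered = OrderedDict(sorted(groups.items()))
    lines = []
    for group, members in ordered.items():
        lines.append("%s: %s" % (group, ", ".join(sorted(members))))
    return "\n".join(lines)
```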

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Lionel Landwerlin [Mon, 18 Jun 2018 19:46:57 +0000 (20:46 +0100)]
intel: tools: remove drm-uapi defines

We already embed the headers, no need to redefine defines/structs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Lionel Landwerlin [Sat, 16 Jun 2018 22:25:12 +0000 (23:25 +0100)]
intel: intel_dump_gpu: use simulator id in captures

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Lionel Landwerlin [Sat, 16 Jun 2018 22:22:00 +0000 (23:22 +0100)]
intel: devinfo: add simulator id

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Scott D Phillips [Fri, 30 Mar 2018 19:56:25 +0000 (12:56 -0700)]
intel: tools: dump-gpu: dump 48-bit addresses

For gen8+, write out PPGTT tables in aub files so that full 48-bit
addresses can be serialized.
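To make the 48-bit part concrete, here is a simplified model, not Mesa's code, of walking a four-level gen8-style PPGTT. Each level consumes 9 bits of the virtual address (bits 47:39, 38:30, 29:21, 20:12), bit 0 of each entry is assumed to mark it present, and page tables are modeled as Python dicts keyed by physical address. Large pages and the other entry bits are ignored.

```python
PRESENT = 1 << 0
ADDR_MASK = ((1 << 48) - 1) & ~0xFFF  # bits 47:12 hold the next-level address
LEVEL_SHIFTS = (39, 30, 21, 12)       # PML4, PDP, PD, PT

def ppgtt_translate(tables, pml4, va):
    """Translate a 48-bit virtual address to a physical one, or None if unmapped.

    tables maps a table's physical address to {index: entry}.
    """
    table = pml4
    for shift in LEVEL_SHIFTS:
        index = (va >> shift) & 0x1FF       # 9 bits -> 512 entries per table
        entry = tables.get(table, {}).get(index, 0)
        if not entry & PRESENT:
            return None
        table = entry & ADDR_MASK           # next table, or the final page
    return table | (va & 0xFFF)             # add the offset within the 4 KiB page

# Map one page by hand: PML4 @0x1000 -> PDP @0x2000 -> PD @0x3000
# -> PT @0x4000 -> page @0x5000.
VA = (3 << 39) | (5 << 30) | (7 << 21) | (9 << 12) | 0x123
TABLES = {
    0x1000: {3: 0x2000 | PRESENT},
    0x2000: {5: 0x3000 | PRESENT},
    0x3000: {7: 0x4000 | PRESENT},
    0x4000: {9: 0x5000 | PRESENT},
}
```

With these tables, `ppgtt_translate(TABLES, 0x1000, VA)` resolves to `0x5123`, while an address whose PML4 entry is absent yields None.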

v2: Fix handling of `end` index in map_ppgtt

v3: Correctly mark GGTT entry as present (Rafael)

Signed-off-by: Scott D Phillips <scott.d.phillips@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Lionel Landwerlin [Sat, 16 Jun 2018 16:42:13 +0000 (17:42 +0100)]
intel: tools: import intel_aubdump

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
Lionel Landwerlin [Sat, 16 Jun 2018 11:16:03 +0000 (12:16 +0100)]
intel: tools: update intel_aub.h

Scott added new stuff in IGT.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Lionel Landwerlin [Sun, 10 Jun 2018 11:54:59 +0000 (12:54 +0100)]
intel: batch-decoder: add missing return line

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Lionel Landwerlin [Thu, 14 Jun 2018 16:29:16 +0000 (17:29 +0100)]
intel: batch-decoder: don't asks for constant BO until decoding

With PPGTT mappings, our aubinator implementation can be quite slow if
we request a buffer that doesn't exist. Instead of doing a PPGTT walk
for invalid addresses (0 lengths), wait until we're sure we want to
decode the data.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Scott D Phillips [Mon, 9 Apr 2018 19:46:51 +0000 (12:46 -0700)]
intel/batch-decoder: handle non-contiguous binding table / surface state

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Scott D Phillips [Fri, 6 Apr 2018 18:02:55 +0000 (11:02 -0700)]
intel/tools/aubinator: aubinate ppgtt aubs

v2: by Lionel
    Fix memfd_create compilation issue
    Fix pml4 address stored in 32 instead of 64 bits
    Return no buffer if first ppgtt page is not mapped

v3: Drop additional memfd_create() (Rafael)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Lionel Landwerlin [Tue, 19 Jun 2018 11:34:26 +0000 (12:34 +0100)]
intel: aubinator: handle GGTT mappings

We use memfd to store physical pages as they get read/written to and
the GGTT entries translating virtual address to physical pages.

Based on a commit by Scott Phillips.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Jason Ekstrand [Thu, 5 Apr 2018 15:21:49 +0000 (08:21 -0700)]
util: rb-tree: A simple, invasive, red-black tree

This is a simple, invasive, liberally licensed red-black tree
implementation. It's an invasive data structure similar to the
Linux kernel linked list, where the intention is that you embed an
rb_node struct in the data structure you intend to put into the
tree.

The implementation is mostly based on the one in "Introduction to
Algorithms", third edition, by Cormen, Leiserson, Rivest, and
Stein. There were a few other key design points:

 * It's an invasive data structure similar to the Linux kernel
   linked list.

 * It uses NULL for leaves instead of a sentinel. This means a few
   algorithms differ slightly from the ones in "Introduction to
   Algorithms".

 * All search operations are inlined so that the compiler can
   optimize away the function pointer call.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Lionel Landwerlin [Tue, 19 Jun 2018 11:11:20 +0000 (12:11 +0100)]
intel: aubinator: drop the 1Tb GTT mapping

Now that we're softpinning the address of our BOs in anv & i965, the
addresses selected start at the top of the address space. This is a
problem for the current implementation of aubinator, which uses only a
40-bit mmapped address space.

This change keeps track of all the memory writes from the aub file and
fetches them on request by the batch decoder. As a result we can get rid
of the 1<<40 mmapped address space and rely only on the mmapped aub file
\o/
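The approach described above, recording every memory write found in the aub file and serving the decoder's reads from those records instead of from a 1<<40 mmap, can be sketched as follows. This is a simplified model with hypothetical names, assuming later writes override earlier ones and that each read is served from a single write covering it.

```python
class WriteLog:
    """Record (address, bytes) writes and serve reads from them."""

    def __init__(self):
        self._writes = []  # in file order; later entries win on overlap

    def record(self, address, data):
        self._writes.append((address, bytes(data)))

    def fetch(self, address, size):
        """Return size bytes at address from the newest covering write, else None."""
        for start, data in reversed(self._writes):
            if start <= address and address + size <= start + len(data):
                offset = address - start
                return data[offset:offset + size]
        return None
```

The linear scan keeps the sketch short; a large trace would want an ordered index keyed by address. Note that addresses near the top of a 48-bit space cost nothing here, since no address range is reserved up front.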

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Lionel Landwerlin [Tue, 19 Jun 2018 11:08:46 +0000 (12:08 +0100)]
intel: aubinator: rework register writes handling

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Lionel Landwerlin [Sun, 10 Jun 2018 18:49:12 +0000 (19:49 +0100)]
intel: aubinator: remove standard input processing option

In a follow-up commit in this series, we stop copying the data from
the mmap'ed file into our big GTT mmap, and start referencing data in
it directly. So reallocating the read buffer and adding more data from
stdin wouldn't work. For that reason, let's stop supporting stdin
processing.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Lionel Landwerlin [Tue, 19 Jun 2018 10:19:22 +0000 (11:19 +0100)]
intel: aubinator: remove unused variables

These memory offsets are stored in the gen_batch_decode_ctx.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Mathieu Bridon [Thu, 5 Jul 2018 10:43:04 +0000 (12:43 +0200)]
gallium/auxiliary: Fix string matching

Commit f69bc797e15fe6beb9e439009fab55f7fae0b7f9 did the following:

-        if format.layout in ('bptc', 'astc'):
+        if format.layout in ('astc'):

The intention was to go from matching either 'bptc' or 'astc' to
matching only 'astc'.

But the new code doesn't respect this intention any more, because in
Python `('astc')` is not a tuple containing a string, it is just the
string: without a trailing comma, the parentheses only group the
expression.

That means we now match any substring of 'astc', for example 'a'.

This commit fixes the test to respect the original intention.
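The pitfall is easy to reproduce in isolation. The function names are illustrative, and the fixed check shown is one possible form, not necessarily the exact one the commit uses.

```python
def is_astc_buggy(layout):
    return layout in ('astc')    # ('astc') is just the string 'astc': substring test

def is_astc_fixed(layout):
    return layout == 'astc'      # or: layout in ('astc',) with a trailing comma
```

With the buggy version, `is_astc_buggy('a')` is True because `'a'` is a substring of `'astc'`; the fixed version only accepts `'astc'` itself.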

Fixes: f69bc797e15fe6beb9e4 "gallium/auxiliary: Add helper support for
                             bptc format compress/decompress"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Samuel Pitoiset [Thu, 28 Jun 2018 10:21:18 +0000 (12:21 +0200)]
radv: optimize vkCmd{Set,Reset}Event() a little bit

Always emitting a bottom-of-pipe event is quite dumb. Instead,
start to optimize these functions by syncing PFP for the
top-of-pipe and syncing ME for the post-index-fetch event.

This can still be improved by emitting EOS events for
syncing PS and CS stages.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 27 Jun 2018 12:15:58 +0000 (14:15 +0200)]
radv: optimize radv_CmdWaitEvents()

This introduces radv_barrier() (same as the draw/dispatch codepath).
This helper is used for merging the code from CmdWaitEvents() and
CmdPipelineBarrier because it's quite similar.

We do ignore the source stage mask for CmdWaitEvents because
it's irrelevant when event objects are used.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Roland Scheidegger [Wed, 4 Jul 2018 01:44:50 +0000 (03:44 +0200)]
nir/linker: fix msvc build

Empty initializer braces aren't valid C (they're a GNU extension, though
they are valid in C++).
Hopefully fixes appveyor / msvc build...

Fixes 6677e131b806b10754adcb7cf3f427a7fcc2aa09
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Gert Wollny [Sun, 1 Jul 2018 08:37:12 +0000 (10:37 +0200)]
r600: compare structure elements instead of doing a memcmp

Structures might be padded by the compiler, and these padding bytes remain
uninitialized, which in turn makes memcmp report a difference where, from
the logical point of view, there is none.

 Fixes valgrind:
     Conditional jump or move depends on uninitialised value(s)
       at 0x4C32CBA: __memcmp_sse4_1 (vg_replace_strmem.c:1099)
       by 0xB8D2537: r600_set_vertex_buffers (r600_state_common.c:573)
       by 0xB71D44A: u_vbuf_set_driver_vertex_buffers (u_vbuf.c:1129)
       by 0xB71F7BB: u_vbuf_draw_vbo (u_vbuf.c:1153)
       by 0xB3B92CB: st_draw_vbo (st_draw.c:235)
       by 0xB36B1AE: vbo_draw_arrays (vbo_exec_array.c:391)
       by 0xB36BB0D: vbo_exec_DrawArrays (vbo_exec_array.c:550)
       by 0x10A989: piglit_display (textureSize.c:157)
       by 0x4F8F174: run_test (piglit_fbo_framework.c:52)
       by 0x4F7BA12: piglit_gl_test_run (piglit-framework-gl.c:229)
       by 0x10A60A: main (textureSize.c:71)
     Uninitialised value was created by a stack allocation
       at 0xB3948FD: st_update_array (st_atom_array.c:388)
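The effect can be reproduced from Python with `ctypes`, which lays out structures according to the platform's C ABI. This is only a sketch with a made-up two-field structure (not r600's vertex buffer type), assuming the usual alignment that inserts padding after a 1-byte member: two structures with identical fields can still differ byte for byte, so a memcmp-style comparison reports a spurious difference while a field-by-field comparison does not.

```python
import ctypes

class VertexBuffer(ctypes.Structure):
    # Hypothetical layout: a 1-byte field followed by a 4-byte field, so the
    # ABI normally inserts padding bytes between them.
    _fields_ = [("stride", ctypes.c_uint8), ("buffer_offset", ctypes.c_uint32)]

def raw_bytes(s):
    """Return the structure's bytes, padding included (what memcmp compares)."""
    return ctypes.string_at(ctypes.addressof(s), ctypes.sizeof(s))

def fields_equal(a, b):
    """Compare member by member, which ignores padding bytes."""
    return all(getattr(a, name) == getattr(b, name) for name, _ in a._fields_)

a = VertexBuffer(stride=16, buffer_offset=4)
b = VertexBuffer(stride=16, buffer_offset=4)
# Simulate stack garbage by scribbling over b's padding (between the fields).
padding = VertexBuffer.buffer_offset.offset - ctypes.sizeof(ctypes.c_uint8)
ctypes.memset(ctypes.addressof(b) + ctypes.sizeof(ctypes.c_uint8), 0xAB, padding)
```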

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Gert Wollny [Sun, 1 Jul 2018 17:32:10 +0000 (19:32 +0200)]
r600: Add R4G4B4A4 and A1B5G5R5 to supported vertex formats

The tests below would fail with the error message
  "Vertex format (R4G4B4A4|R5G5B5A1) not supported."
Add the formats to the translation routine to enable them.

Fixes:
  dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgba4_2d
  dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgba4_cube
  dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb5_a1_2d
  dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb5_a1_cube
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgba4_2d
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgba4_cube
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb5_a1_2d
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb5_a1_cube
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgba4_2d_array
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgba4_3d
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb5_a1_2d_array
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb5_a1_3d
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgba4_2d_array
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgba4_3d
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb5_a1_2d_array
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb5_a1_3d
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Gert Wollny [Sun, 1 Jul 2018 17:32:09 +0000 (19:32 +0200)]
r600: force LOD range to be only one value when mip.min filter is NONE

For a texture that has only one LOD defined, but for which
GL_TEXTURE_MAX_LEVEL is the default (1000) and
GL_TEXTURE_MIN_LOD != GL_TEXTURE_MAX_LOD, reading from the texture does
not properly resolve the LOD level and the texture lookup might fail.
Hence, when no mipmap filter is given (indicating that no mip-mapping
takes place), force the LOD range to contain only one value.

Fixes:
  dEQP-GLES3.functional.shaders.texture_functions.texture*.(i|u)sampler2d*
  dEQP-GLES3.functional.texture.format.sized.cube.rgb*
  out of VK_GL_CTS/android/cts/master/gles3-master.txt
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Gert Wollny [Sun, 1 Jul 2018 08:05:35 +0000 (10:05 +0200)]
mesa/st: draw_vbo: initialize restart_index too

restart_index is later always used in a comparison, so it should be
initialized properly.

Fixes valgrind warning:
 Conditional jump or move depends on uninitialised value(s)
    at 0xB8D682F: r600_draw_vbo (r600_state_common.c:2153)
    by 0xB71F743: u_vbuf_draw_vbo (u_vbuf.c:1156)
    by 0xB3B92DB: st_draw_vbo (st_draw.c:235)
    by 0xB36B1AE: vbo_draw_arrays (vbo_exec_array.c:391)
    by 0xB36BB0D: vbo_exec_DrawArrays (vbo_exec_array.c:550)
    by 0x10A989: piglit_display (textureSize.c:157)
    by 0x4F8F174: run_test (piglit_fbo_framework.c:52)
    by 0x4F7BA12: piglit_gl_test_run (piglit-framework-gl.c:229)
    by 0x10A60A: main (textureSize.c:71)
 Uninitialised value was created by a stack allocation
    at 0xB3B90B0: st_draw_vbo (st_draw.c:143)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Timothy Arceri [Wed, 4 Jul 2018 01:06:41 +0000 (11:06 +1000)]
mesa: enable ARB_direct_state_access in OpenGL 4.5 compat profile

It's unlikely anyone will add proper ARB_direct_state_access compat
support before we branch 18.2. Enabling the extension in 4.5 at
least allows users to make use of MESA_GL_VERSION_OVERRIDE=4.5COMPAT
for games like No Man's Sky.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 4 Jul 2018 00:38:17 +0000 (10:38 +1000)]
util/drirc: turn on force_glsl_extensions_warn for No Mans Sky

The game forgets to enable multiple extensions in its shaders; one
of those extensions is EXT_texture_array. But enabling this config
entry fixes at least one other rendering issue that enabling
EXT_texture_array on its own doesn't fix.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Thu, 5 Jul 2018 02:19:08 +0000 (22:19 -0400)]
util/queue: remove leftover debug code

Marek Olšák [Tue, 3 Jul 2018 18:49:42 +0000 (14:49 -0400)]
Shorten u_queue names

There is a 15-character limit for thread names shared by the queue name
and process name. Shorten the thread name to make space for the process
name.
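On Linux, `pthread_setname_np` limits a thread name to 16 bytes including the terminating NUL, hence the 15-character budget. The following is a hypothetical sketch of how a combined name could be built, truncating the process name first so that the (now shorter) queue name survives. The "process:queue" format is an assumption for illustration and may differ from what Mesa actually does.

```python
THREAD_NAME_MAX = 15  # 16-byte kernel buffer minus the terminating NUL

def make_thread_name(process_name, queue_name, limit=THREAD_NAME_MAX):
    """Combine 'process:queue' within limit characters (hypothetical format)."""
    room = limit - len(queue_name) - 1  # characters left for the process name
    if room <= 0:
        return queue_name[:limit]       # the queue name alone fills the budget
    return "%s:%s" % (process_name[:room], queue_name)
```

A shorter queue name leaves more room for the process name: with a 4-character queue name, 10 characters of the process name fit.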

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Tue, 3 Jul 2018 18:48:16 +0000 (14:48 -0400)]
kutil/queue: add a process name into a thread name

v2: simplifications

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Marek Olšák [Tue, 3 Jul 2018 18:16:17 +0000 (14:16 -0400)]
gallium/os: use util_get_process_name when possible

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Marek Olšák [Tue, 3 Jul 2018 18:07:05 +0000 (14:07 -0400)]
util: extract get_process_name from xmlconfig.c

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Marek Olšák [Wed, 4 Jul 2018 05:37:30 +0000 (01:37 -0400)]
ac: fold LLVMContext creation into ac_llvm_context_init

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 4 Jul 2018 05:35:10 +0000 (01:35 -0400)]
radeonsi: reorder code in si_llvm_context_init

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 4 Jul 2018 05:28:17 +0000 (01:28 -0400)]
radeonsi: use ac_compile_module_to_binary to reduce compile times

Compile times of simple shaders are reduced by ~20%.
Compile times of prologs and epilogs are reduced by up to 40%.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 4 Jul 2018 05:11:47 +0000 (01:11 -0400)]
ac: add reusable helpers for direct LLVM compilation

This is basically LLVMTargetMachineEmitToMemoryBuffer inlined and reworked.

struct ac_compiler_passes (opaque type) contains the main pass manager.

ac_create_llvm_passes -- the result can go to thread local storage
ac_destroy_llvm_passes -- can be called by a destructor in TLS
ac_compile_module_to_binary -- from LLVMModuleRef to ac_shader_binary

The motivation is to do the expensive call addPassesToEmitFile once
per context or thread.
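
The per-thread reuse described above can be sketched in C++ with a
thread_local object. This is a hypothetical model, not Mesa's code:
CompilerPasses stands in for struct ac_compiler_passes, its constructor
stands in for the expensive addPassesToEmitFile setup, its destructor
for ac_destroy_llvm_passes running at thread exit, and
compile_module_to_binary() is an illustrative name.

```cpp
#include <atomic>
#include <cassert>

// Counts how often the expensive setup ran (illustration only).
static std::atomic<int> g_pass_setups{0};

// Hypothetical stand-in for struct ac_compiler_passes.
struct CompilerPasses {
  // Models the costly pass-manager setup (addPassesToEmitFile).
  CompilerPasses() { ++g_pass_setups; }
  // Models ac_destroy_llvm_passes; runs when the owning thread exits.
  ~CompilerPasses() = default;
  // Stand-in for turning a module into a shader binary.
  int compile(int module) const { return module + 1; }
};

// Each thread builds its own CompilerPasses on first use and reuses it
// for every later compile on that thread.
int compile_module_to_binary(int module) {
  thread_local CompilerPasses passes;
  return passes.compile(module);
}
```

Calling compile_module_to_binary() repeatedly on one thread runs the
setup once; a second thread would run it once more for its own copy.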

Reviewed-by: Dave Airlie <airlied@redhat.com>
Rhys Perry [Wed, 4 Jul 2018 09:21:41 +0000 (10:21 +0100)]
nvc0: implement multisampled images on Maxwell+

Changes in v2:
- make loadSuInfo32() protected without making the rest protected
- move NVC0_SU_INFO_* into nv50_ir_lowering_nvc0.h instead of duplicating
  NVC0_SU_INFO_MS

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Neil Roberts [Fri, 18 May 2018 11:39:13 +0000 (13:39 +0200)]
i965: Fix output register sizes when variable ranges are interleaved

In 6f5abf31466aed this code was fixed to calculate the maximum size of
an attribute in a separate pass and then allocate the registers to
that size. However, this wasn’t taking into account ranges that overlap
but don’t have the same starting location. For example:

layout(location = 0, component = 0) out float a[4];
layout(location = 2, component = 1) out float b[4];

Previously, if ‘a’ was processed first then it would allocate a
register of size 4 for location 0 and it wouldn’t allocate another
register for location 2 because it would already be covered by the
range of 0. Then if something tries to write to b[2] it would try to
write past the end of the register allocated for ‘a’ and it would hit
an assert.

This patch changes it to scan for any overlapping ranges that start
within each range to calculate the maximum extent and allocate that
instead.
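
A minimal sketch of the overlap scan, assuming each output is modelled
as a half-open slot range [location, location + size). SlotRange and
allocation_end() are hypothetical names, not the i965 code. As an extra
assumption, the sketch keeps extending until no further range starts
inside the allocation, so chains of overlaps are covered too.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of an output occupying slots [location, location + size).
struct SlotRange {
  int location;
  int size;
};

// Returns the end slot of the register allocated at `start`, grown to
// cover every range that starts inside the current allocation.
int allocation_end(const std::vector<SlotRange>& ranges, int start) {
  int end = start;
  for (const SlotRange& r : ranges)
    if (r.location == start && r.location + r.size > end)
      end = r.location + r.size;

  bool grew = true;
  while (grew) {
    grew = false;
    for (const SlotRange& r : ranges) {
      // r overlaps the allocation if it starts inside it and runs past its end.
      if (r.location >= start && r.location < end &&
          r.location + r.size > end) {
        end = r.location + r.size;
        grew = true;
      }
    }
  }
  return end;
}
```

For the example above, ‘a’ covers slots 0..3 and ‘b’ covers 2..5, so the
register allocated at location 0 must span 6 slots rather than 4.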

Fixed Piglit’s arb_enhanced_layouts/execution/component-layout/
vs-fs-array-interleave-range.shader_test

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 6f5abf31466 "i965: Fix output register sizes when multiple variables
       share a slot."

Dave Airlie [Fri, 29 Jun 2018 02:47:26 +0000 (03:47 +0100)]
r600/sb: cleanup if_conversion iterator to be legal C++

The current code causes:
/usr/include/c++/8/debug/safe_iterator.h:207:
Error: attempt to copy from a singular iterator.

This is due to the iterators getting invalidated. Fix the
reverse iterator to use the return value from erase, and
cast it properly.
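
The general C++ idiom for erasing while walking a container backwards
can be sketched as follows. It uses std::list as an assumption (the
r600/sb container types are not shown here) and is not the actual
if_conversion code: erase() takes a forward iterator, so the reverse
iterator is converted with std::next(it).base(), and the forward
iterator that erase() returns is wrapped back into a reverse_iterator.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Removes every element matching `pred` while walking `items` backwards,
// and returns the order in which elements were visited.
template <typename Pred>
std::vector<int> erase_reverse(std::list<int>& items, Pred pred) {
  std::vector<int> visited;
  for (auto it = items.rbegin(); it != items.rend();) {
    visited.push_back(*it);
    if (pred(*it)) {
      // std::next(it).base() is the forward iterator to *it. erase()
      // returns the element after it; as a reverse_iterator that is the
      // next element in reverse order, so `it` never becomes singular.
      it = std::list<int>::reverse_iterator(items.erase(std::next(it).base()));
    } else {
      ++it;
    }
  }
  return visited;
}
```

After erase(), the rebuilt reverse iterator already points at the next
element in reverse order, so the loop must not increment it again.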

(used Mathias's suggestion)
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Marek Olšák [Wed, 4 Jul 2018 04:13:04 +0000 (00:13 -0400)]
radeonsi: fix compiler breakage

Broken by d853d3a59bd5f8720a5b021bcd64a193d370b623.

Dave Airlie [Wed, 27 Jun 2018 00:24:18 +0000 (10:24 +1000)]
ac: make some fns static

Some of the compiler functions are no longer called outside
the util file.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Tue, 26 Jun 2018 23:27:03 +0000 (09:27 +1000)]
ac/radv: move llvm compiler info to struct and init in one place

This ports radv to the shared code. However, due to a bug in LLVM
versions prior to 7, radv cannot add target info at this stage, as it
would leak one for every shader compile. I'd prefer to keep this LLVM
damage in the shared code, since the driver isn't at fault here, so we
just add a flag to denote whether the driver can tolerate leaking the
target info, and the common code does the right thing depending on the
LLVM version.
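
The decision described above reduces to a single predicate. A minimal
sketch, with a hypothetical function name and the LLVM version passed
as a plain major number (the real code's flag and version macros may
differ):

```cpp
#include <cassert>

// Create target info when either the driver tolerates the per-compile
// leak, or LLVM is new enough (7+) not to leak it.
bool should_create_target_info(int llvm_major, bool okay_to_leak) {
  return okay_to_leak || llvm_major >= 7;
}
```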

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Mon, 2 Jul 2018 23:51:42 +0000 (09:51 +1000)]
ac/radeonsi: port compiler init/destroy out of radeonsi.

We want to share this code with radv in the future, so port
it out of radeonsi.

Add a return value, as radv will want it to know whether this
succeeds.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Mon, 2 Jul 2018 23:44:22 +0000 (09:44 +1000)]
radv/radeonsi: add a check ir tm options

This doesn't do much yet, but it makes it easier to move the code
to a common shared code base.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Mon, 2 Jul 2018 23:39:27 +0000 (09:39 +1000)]
radeonsi: rename si_compiler -> ac_llvm_compiler

As a precursor to moving init to common code, just rename the struct
and move it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Tue, 26 Jun 2018 23:34:42 +0000 (09:34 +1000)]
ac: add target library info helpers

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Tue, 26 Jun 2018 23:11:47 +0000 (09:11 +1000)]
radv: create/destroy passmgr at the higher level.

This is prep work for moving this to a per-thread struct

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Tue, 26 Jun 2018 23:02:25 +0000 (09:02 +1000)]
radv: port to use common passmgr code.

This adds an always-inline pass, but should otherwise work the
same.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dave Airlie [Tue, 26 Jun 2018 22:52:20 +0000 (08:52 +1000)]
ac/radeonsi: refactor out pass manager init to common code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Tue, 26 Jun 2018 22:38:30 +0000 (08:38 +1000)]
radv: drop copy of ac_create_target_machine.

Once we split the init once stuff out, this can be shared again.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Tue, 26 Jun 2018 22:36:41 +0000 (08:36 +1000)]
ac/radv: split the non-common init_once code from the common target code. (v2)

This just splits out the non-shared code and reuses ac_get_llvm_target in radv.

v2: rebase on Marek's patch - fixup brace position/whitespace

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Neil Roberts [Wed, 29 Nov 2017 09:14:25 +0000 (10:14 +0100)]
i965: Use the new nir atomic counter linker for SPIR-V shaders

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Sat, 28 Oct 2017 09:27:17 +0000 (11:27 +0200)]
i965: enable AtomicStorage capability for gen7+

This is the same gen requirement as for ARB_shader_atomic_counters.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Antia Puentes [Wed, 14 Feb 2018 11:58:33 +0000 (12:58 +0100)]
mesa/glspirv: lower workgroup access to offsets

This will perform the CS shared lowering. See 8761a04d0d93

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Antia Puentes [Thu, 22 Feb 2018 12:50:23 +0000 (13:50 +0100)]
nir: Fix OpAtomicCounterIDecrement for uniform atomic counters

From the SPIR-V 1.0 specification, section 3.32.18, "Atomic
Instructions":

   "OpAtomicIDecrement:
    <skip>
    The instruction's result is the Original Value."

However, for uniform atomic counters we were implementing it as a
pre-decrement operation, like the one available in GLSL.

Renamed the former nir intrinsic 'atomic_counter_dec*' to
'atomic_counter_pre_dec*' for clarification purposes, as it implements
a pre-decrement operation as specified for GLSL. From GLSL 4.50 spec,
section 8.10, "Atomic Counter Functions":

   "uint atomicCounterDecrement (atomic_uint c)

    Atomically
    1. decrements the counter for c, and
    2. returns the value resulting from the decrement operation.

    These two steps are done atomically with respect to the atomic
    counter functions in this table."

Added a new nir intrinsic 'atomic_counter_post_dec*' which implements
a post-decrement operation as required by SPIR-V.
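
The difference between the two intrinsics can be sketched with
std::atomic, whose fetch_sub() returns the value held before the
subtraction. The function names are illustrative, not NIR's.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// GLSL atomicCounterDecrement: returns the value *after* the decrement,
// i.e. a pre-decrement (atomic_counter_pre_dec*).
uint32_t glsl_atomic_counter_decrement(std::atomic<uint32_t>& c) {
  return c.fetch_sub(1) - 1;
}

// SPIR-V OpAtomicIDecrement: returns the Original Value, i.e. the value
// *before* the decrement, a post-decrement (atomic_counter_post_dec*).
uint32_t spirv_atomic_i_decrement(std::atomic<uint32_t>& c) {
  return c.fetch_sub(1);
}
```

Starting from a counter value of 5, both leave the counter at 4, but the
GLSL form returns 4 while the SPIR-V form returns 5.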

v2: (Timothy Arceri)
   * Add extra spec quotes on commit message
   * Use "post" instead of "pos" to avoid confusion with "position"

Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Neil Roberts [Tue, 28 Nov 2017 12:39:44 +0000 (13:39 +0100)]
nir/linker: Add a pure NIR implementation of the atomic counter linker

This is mostly just a straightforward conversion of
link_assign_atomic_counter_resources to C, directly using nir variables
instead of GLSL IR variables.

It is based on the version of link_assign_atomic_counter_resources in
6b8909f2d1906. I’m noting this here to make it easier to track changes
and keep the NIR version up-to-date.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Neil Roberts [Tue, 28 Nov 2017 12:38:32 +0000 (13:38 +0100)]
nir/types: Add wrappers for a couple of atomic counter methods

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Sat, 28 Oct 2017 08:57:35 +0000 (10:57 +0200)]
spirv/nir: add capability check for SpvCapabilityAtomicStorage

This capability indicates whether atomic counters are supported. From
the SPIR-V 1.0 spec, section 3.7, "Storage Class", item 10 from the table:

(Column "Storage Class"):

   "AtomicCounter For holding atomic counters. Visible across all
    functions of the current invocation. Atomic counter-specific
    memory."

(Column "Required Capability"):

   "AtomicStorage"

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Tue, 31 Oct 2017 12:12:11 +0000 (13:12 +0100)]
spirv/nir: add atomic counter support on vtn_handle_ssbo_or_shared_atomic

As a result, it is renamed to the more general vtn_handle_atomics.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Sun, 5 Nov 2017 15:19:43 +0000 (16:19 +0100)]
spirv/nir: initialize offset on the nir var at vtn_create_variable

This is convenient when dealing with atomic counter uniforms. The
alternative would be doing that at vtn_handle_atomics.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Antia Puentes [Wed, 2 May 2018 20:28:43 +0000 (22:28 +0200)]
nir/spirv: Fix atomic counter (multidimensional-)arrays

When constructing NIR if we have a SPIR-V uint variable and the
storage class is SpvStorageClassAtomicCounter, we store as NIR's
glsl_type an atomic_uint to reflect the fact that the variable is an
atomic counter.

However, we were only tweaking the type for atomic_uint scalars; we
have to do it as well for atomic_uint arrays and atomic_uint arrays of
arrays of any depth.
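
The recursion needed for arrays of any depth can be sketched on a toy
type model. Type, Base, array_of() and scalar() are hypothetical
stand-ins for NIR's glsl_type helpers; repair_atomic_type() reuses the
name from the v3 note, but this body only illustrates the idea: strip
one array level at a time, replace the innermost uint with atomic_uint,
and rebuild each array level with its original length.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy type model (hypothetical, not NIR's glsl_type).
enum class Base { Uint, AtomicUint, Array };

struct Type {
  Base base;
  unsigned length = 0;                 // only meaningful for arrays
  std::shared_ptr<const Type> element; // only set for arrays
};

using TypeRef = std::shared_ptr<const Type>;

TypeRef scalar(Base b) {
  return std::make_shared<const Type>(Type{b, 0, nullptr});
}

TypeRef array_of(TypeRef elem, unsigned len) {
  return std::make_shared<const Type>(Type{Base::Array, len, std::move(elem)});
}

// Rebuilds `t` with its innermost uint replaced by atomic_uint, keeping
// every array dimension: uint[2][3] becomes atomic_uint[2][3].
TypeRef repair_atomic_type(const TypeRef& t) {
  if (t->base == Base::Array)
    return array_of(repair_atomic_type(t->element), t->length);
  assert(t->base == Base::Uint);
  return scalar(Base::AtomicUint);
}
```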

Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
v2: update after deref patches got pushed (Alejandro Piñeiro)
v3: simplify repair_atomic_type (suggested by Timothy Arceri, included
    on the patch by Alejandro)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Fri, 10 Nov 2017 15:57:40 +0000 (16:57 +0100)]
spirv/nir: tweak nir type when storage class is SpvStorageClassAtomicCounter

GLSL differentiates uint from atomic_uint. In SPIR-V the type is
uint, and the variable has a specific storage class, so we need to
tweak the type based on the storage class.

Ideally we would like to get the proper type at vtn_handle_type, but
we don't have the storage class at that moment.

We tweak only the nir type, as it is the one that really requires it.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Fri, 10 Nov 2017 15:32:41 +0000 (16:32 +0100)]
nir_types: add glsl_atomic_uint_type() helper

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Sun, 5 Nov 2017 11:00:19 +0000 (12:00 +0100)]
spirv/nir: add offset at vtn_variable

Also initialize it in var_decoration_cb.

This is equivalent to nir_variable.offset, used to record where an
atomic counter is stored.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Fri, 27 Oct 2017 10:40:35 +0000 (12:40 +0200)]
spirv/nir: SpvStorageClassAtomicCounter support on vtn_storage_class_to_mode

Atomic Counters are uniforms per spec.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Fri, 13 Apr 2018 13:47:49 +0000 (15:47 +0200)]
nir/linker: handle uniforms without explicit location

ARB_gl_spirv points out that uniforms in general need an explicit
location. But there are still some cases of uniforms without a
location, for example uniform atomic counters. These don't have a
location from the OpenGL point of view (they are identified by a
binding and offset), but Mesa internally assigns them a location.

Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
v2: squash with another patch, minor variable name tweak (Timothy
Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Tue, 26 Jun 2018 14:28:59 +0000 (16:28 +0200)]
compiler/glsl: refactor empty_uniform_block utilities to linker_util

This includes:
  * Move the definition of empty_uniform_block to linker_util.h
  * Move find_empty_block (with a rename) to linker_util.h
  * Refactor some code at linker.cpp to a new method at linker_util.h
    (link_util_update_empty_uniform_locations)

This way, all that code can be used by both the GLSL linker and the
NIR linker used for ARB_gl_spirv.

v2: include just "ir_uniform.h" (Timothy Arceri)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Ian Romanick [Thu, 28 Jun 2018 00:25:34 +0000 (17:25 -0700)]
i965/vec4: Don't cmod propagate from CMP to ADD if the writemask isn't compatible

Otherwise we can incorrectly cmod propagate in situations like

    add(8)          g10<1>.xD       g2<0>.xD        -16D
    ...
    cmp.ge.f0(8)    null<1>D        g2<0>.xD        16D
    ...
    (+f0) sel(8)    g21<1>.xyUD     g14<4>.xyyyUD   g18<4>.xyyyUD

Sadly, this change hurts quite a few shaders.

v2: Refactor writemask compatibility check into a separate function.
Suggested by Caio.
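
One necessary condition behind such a compatibility check can be
sketched as a bitmask test. This is a simplified, hypothetical helper,
not the function Mesa added: it assumes a 4-bit per-channel writemask
and only checks that the ADD writes every channel the CMP would have
set flags for. In the example above, the ADD writes only .x, while the
CMP's null destination shows no writemask (presumably all channels) and
the predicated SEL writes .xy.

```cpp
#include <cassert>
#include <cstdint>

// One bit per vec4 channel (assumed encoding).
constexpr uint8_t WRITEMASK_X = 1u << 0;
constexpr uint8_t WRITEMASK_Y = 1u << 1;
constexpr uint8_t WRITEMASK_Z = 1u << 2;
constexpr uint8_t WRITEMASK_W = 1u << 3;
constexpr uint8_t WRITEMASK_XYZW =
    WRITEMASK_X | WRITEMASK_Y | WRITEMASK_Z | WRITEMASK_W;

// True if the CMP covers a channel that the earlier ADD does not write.
// The ADD then cannot produce the flag values the CMP would have, so
// the conditional modifier must not be propagated.
bool writemasks_incompatible(uint8_t add_mask, uint8_t cmp_mask) {
  return (cmp_mask & ~add_mask) != 0;
}
```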

Ivy Bridge and Haswell had similar results. (Haswell shown)
total instructions in shared programs: 12968489 -> 12968738 (<.01%)
instructions in affected programs: 60679 -> 60928 (0.41%)
helped: 0
HURT: 249
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.22% max: 0.81% x̄: 0.46% x̃: 0.44%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.44% 0.48%
Instructions are HURT.

total cycles in shared programs: 409171965 -> 409172317 (<.01%)
cycles in affected programs: 260056 -> 260408 (0.14%)
helped: 0
HURT: 176
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.04% max: 0.34% x̄: 0.17% x̃: 0.17%
95% mean confidence interval for cycles value: 2.00 2.00
95% mean confidence interval for cycles %-change: 0.16% 0.18%
Cycles are HURT.

Sandy Bridge
total instructions in shared programs: 10423577 -> 10423753 (<.01%)
instructions in affected programs: 40667 -> 40843 (0.43%)
helped: 0
HURT: 176
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.29% max: 0.79% x̄: 0.48% x̃: 0.42%
95% mean confidence interval for instructions value: 1.00 1.00
95% mean confidence interval for instructions %-change: 0.46% 0.51%
Instructions are HURT.

total cycles in shared programs: 146097503 -> 146097855 (<.01%)
cycles in affected programs: 503990 -> 504342 (0.07%)
helped: 0
HURT: 176
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.02% max: 0.36% x̄: 0.12% x̃: 0.11%
95% mean confidence interval for cycles value: 2.00 2.00
95% mean confidence interval for cycles %-change: 0.11% 0.13%
Cycles are HURT.

No changes on any other platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: cd635d149b2 i965/vec4: Propagate conditional modifiers from compares to adds
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Ian Romanick [Wed, 23 May 2018 18:33:51 +0000 (11:33 -0700)]
intel/compiler: Silence unused parameter warnings brw_nir.c

src/intel/compiler/brw_nir.c: In function ‘brw_nir_lower_vue_outputs’:
src/intel/compiler/brw_nir.c:464:32: warning: unused parameter ‘is_scalar’ [-Wunused-parameter]
                           bool is_scalar)
                                ^~~~~~~~~
src/intel/compiler/brw_nir.c: In function ‘lower_bit_size_callback’:
src/intel/compiler/brw_nir.c:610:57: warning: unused parameter ‘data’ [-Wunused-parameter]
 lower_bit_size_callback(const nir_alu_instr *alu, void *data)
                                                         ^~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Kenneth Graunke [Mon, 2 Jul 2018 21:17:37 +0000 (14:17 -0700)]
i965: Fix BRW_NEW_NUM_SAMPLES to be in .brw, not .mesa

This is the wrong kind of dirty bit.  Caught by GCC warnings, due to
64-bit values being truncated to 32 bits.
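
Why the misplaced bit is lost can be sketched with a struct modeled, as
an assumption, on i965's dirty-flag layout (a 32-bit field for core
Mesa _NEW_* bits and a 64-bit field for driver BRW_NEW_* bits). The bit
position and names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: core Mesa dirty bits in 32 bits, driver bits in 64 bits.
struct StateFlags {
  uint32_t mesa;
  uint64_t brw;
};

// Hypothetical driver dirty bit above bit 31.
constexpr uint64_t kNewNumSamples = 1ull << 40;

// An atom is re-emitted when any of its driver bits is dirty.
bool atom_triggers(const StateFlags& atom, uint64_t brw_dirty) {
  return (atom.brw & brw_dirty) != 0;
}

// Buggy registration: the driver bit is stored in the 32-bit .mesa
// field, and the narrowing conversion discards every bit above 31.
StateFlags buggy_atom() {
  return StateFlags{static_cast<uint32_t>(kNewNumSamples), 0};
}

// Fixed registration: the bit lives in .brw, where it is checked.
StateFlags fixed_atom() { return StateFlags{0, kNewNumSamples}; }
```

Even for a bit below 32, listing it under .mesa would compare it
against the wrong set of dirty flags, which is the underlying bug the
title describes.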

Fixes: b95b0e2918c052068caeb4f6c2802ba89be043a3 (intel/anv,blorp,i965: Implement the SKL 16x MSAA SIMD32 workaround)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Sat, 30 Jun 2018 00:08:30 +0000 (17:08 -0700)]
anv: Add support for the on-disk shader cache

The Vulkan API provides a mechanism for applications to cache their own
shaders and manage on-disk pipeline caching themselves.  Generally, this
is what I would recommend to application developers and I've resisted
implementing driver-side transparent caching in the Vulkan driver for a
long time.  However, not all applications do this and, for some
use-cases, it's just not practical.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jason Ekstrand [Sat, 30 Jun 2018 01:12:34 +0000 (18:12 -0700)]
anv/pipeline_cache: Add a _locked suffix to a function

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jason Ekstrand [Sat, 30 Jun 2018 01:02:07 +0000 (18:02 -0700)]
anv: Add device-level helpers for searching for and uploading kernels

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>