mesa.git
7 years agonv50/ir: move LateAlgebraicOpt to the very end
Ilia Mirkin [Thu, 16 Nov 2017 06:48:20 +0000 (01:48 -0500)]
nv50/ir: move LateAlgebraicOpt to the very end

Memory loads can take offsets, but the SHLADD will often attempt to
consume the offsets too. As there may be multiple memory loads with the
same base but different offsets, those would end up in a SHLADD instead
of the offset of the memory operation.

This moves the pass after we've had a chance to attempt to propagate
immediate adds into the indirect offset.

total instructions in shared programs : 6580681 -> 6567716 (-0.20%)
total gprs used in shared programs    : 944261 -> 943375 (-0.09%)
total shared used in shared programs  : 0 -> 0 (0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)
total bytes used in shared programs   : 60339896 -> 60221504 (-0.20%)

                local     shared        gpr       inst      bytes
    helped           0           0         555        2698        2698
      hurt           0           0         138         336         336

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonv50/ir: when merging immediates/consts, load directly
Ilia Mirkin [Thu, 16 Nov 2017 04:32:16 +0000 (23:32 -0500)]
nv50/ir: when merging immediates/consts, load directly

When a MERGE operation gets its constraint moves added, it
substantially extends live ranges by reusing an immediate from
earlier in the program (not to mention the silliness of loading an
immediate into a register, and then moving it into another register).

We detect these scenarios and insert moves that take the immediate or
constbuf load directly into the register. If it's the last use, then we
can just move that operation to the closer location.

With SM35 (255 regs) we get these results:

total instructions in shared programs : 6583670 -> 6580681 (-0.05%)
total gprs used in shared programs    : 950818 -> 944261 (-0.69%)
total shared used in shared programs  : 0 -> 0 (0.00%)
total local used in shared programs   : 15328 -> 15328 (0.00%)
total bytes used in shared programs   : 60367456 -> 60339896 (-0.05%)

                local     shared        gpr       inst      bytes
    helped           0           0        4584        3186        3186
      hurt           0           0          55         968         968

I suspect they will be better for SM20 and SM30.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonv50/ir: add optimization for modulo by a non-power-of-2 value
Ilia Mirkin [Sat, 11 Nov 2017 03:10:46 +0000 (22:10 -0500)]
nv50/ir: add optimization for modulo by a non-power-of-2 value

We can still use the optimized division methods which make use of
multiplication with overflow.
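
As a rough illustration of the idea (not the code from this commit), the
remainder can be derived from a quotient computed with a high multiply,
i.e. keeping the upper half of a widened product. The sketch below uses
the well-known invariant-divisor scheme for unsigned 32-bit values; the
names sketch_udiv_magic, sketch_udiv_magic_init and sketch_umod are
hypothetical, and the exact method used by the nv50 code generator is an
assumption left open here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not taken from the nv50 code generator. */
struct sketch_udiv_magic {
    uint32_t mul;   /* multiplier m' */
    unsigned shift; /* post-shift, l - 1 */
};

/* Precompute the multiplier and shift for an invariant divisor d >= 2. */
static struct sketch_udiv_magic sketch_udiv_magic_init(uint32_t d)
{
    struct sketch_udiv_magic mg;
    unsigned l = 0;

    assert(d >= 2);
    /* l = ceil(log2(d)) */
    while (((uint64_t)1 << l) < d)
        l++;
    /* m' = floor(2^32 * (2^l - d) / d) + 1, which fits in 32 bits */
    mg.mul = (uint32_t)((((uint64_t)1 << 32) * (((uint64_t)1 << l) - d)) / d + 1);
    mg.shift = l - 1;
    return mg;
}

/* n % d without a divide instruction: quotient via high multiply. */
static uint32_t sketch_umod(uint32_t n, uint32_t d, struct sketch_udiv_magic mg)
{
    /* Upper 32 bits of the 64-bit product. */
    uint32_t t = (uint32_t)(((uint64_t)mg.mul * n) >> 32);
    /* t <= n, so neither the subtraction nor the sum can wrap. */
    uint32_t q = (t + ((n - t) >> 1)) >> mg.shift;
    return n - q * d;
}
```

The same quotient serves both division and modulo, which is presumably
why the existing division path can be reused for the modulo case.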

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
7 years agonv50/ir: optimize signed integer modulo by pow-of-2
Ilia Mirkin [Sat, 11 Nov 2017 02:47:59 +0000 (21:47 -0500)]
nv50/ir: optimize signed integer modulo by pow-of-2

It's common to use signed int modulo in GLSL. As it happens, the GLSL
specs allow the result to be undefined, but that seems fairly
surprising. It's not that much more effort to get it right, at least for
positive modulo operators.
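
A minimal sketch of what "getting it right" could look like, assuming
the desired result matches C's truncating % (the result takes the sign
of the dividend) and that the divisor is a positive power of two. The
function name sketch_smod_pow2 is hypothetical and this is plain C, not
the IR transformation from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* a % d for a power-of-two d, using a mask instead of a division. */
static int32_t sketch_smod_pow2(int32_t a, uint32_t d)
{
    uint32_t mag, r;

    /* d must be a non-zero power of two. */
    assert(d != 0 && (d & (d - 1)) == 0);

    /* |a| computed in unsigned arithmetic so INT32_MIN is handled. */
    mag = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    /* Remainder of the magnitude is just the low bits. */
    r = mag & (d - 1);
    /* Re-apply the sign of the dividend. */
    return a < 0 ? -(int32_t)r : (int32_t)r;
}
```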

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agoutil: Just give up and define PIPE_ARCH_LITTLE_ENDIAN on MSVC
Matt Turner [Sun, 26 Nov 2017 00:45:27 +0000 (16:45 -0800)]
util: Just give up and define PIPE_ARCH_LITTLE_ENDIAN on MSVC

MSVC doesn't support #warning?! Getting really tired of this.

7 years agodocs: remove bug 103626 from fix list as per 17.2.6
Andres Gomez [Sun, 26 Nov 2017 00:15:43 +0000 (02:15 +0200)]
docs: remove bug 103626 from fix list as per 17.2.6

Bug https://bugs.freedesktop.org/show_bug.cgi?id=103626 was
incorrectly listed as fixed.

Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit b9b60dbf55a1307a60a333c70c3add3643243c36)

7 years agoutil: Use preprocessor correctly
Matt Turner [Sat, 25 Nov 2017 23:56:43 +0000 (15:56 -0800)]
util: Use preprocessor correctly

Fixes: 6a353479a757 ("util: Assume little endian in the absence of
                      platform-specific handling")

7 years agodocs: update calendar, add news item and link release notes for 17.2.6
Andres Gomez [Sat, 25 Nov 2017 23:46:25 +0000 (01:46 +0200)]
docs: update calendar, add news item and link release notes for 17.2.6

Signed-off-by: Andres Gomez <agomez@igalia.com>
7 years agodocs: add sha256 checksums for 17.2.6
Andres Gomez [Sat, 25 Nov 2017 23:40:36 +0000 (01:40 +0200)]
docs: add sha256 checksums for 17.2.6

Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 93c2beafc0a7fa2f210b006d22aba61caa71f773)

7 years agodocs: add release notes for 17.2.6
Andres Gomez [Sat, 25 Nov 2017 23:32:53 +0000 (01:32 +0200)]
docs: add release notes for 17.2.6

Signed-off-by: Andres Gomez <agomez@igalia.com>
(cherry picked from commit 00b52f8e99653316a090826914509a138a1c78f7)

7 years agofreedreno/a4xx: add ARB_framebuffer_no_attachments support
Ilia Mirkin [Sun, 19 Nov 2017 21:36:08 +0000 (16:36 -0500)]
freedreno/a4xx: add ARB_framebuffer_no_attachments support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
7 years agofreedreno/a4xx: add indirect draw support
Ilia Mirkin [Sun, 19 Nov 2017 21:32:12 +0000 (16:32 -0500)]
freedreno/a4xx: add indirect draw support

This is a copy of the a5xx logic. Fails a few tests, but basic
functionality is there.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
7 years agofreedreno: regenerate pm4 header, adjust code for new names
Ilia Mirkin [Sun, 19 Nov 2017 21:31:02 +0000 (16:31 -0500)]
freedreno: regenerate pm4 header, adjust code for new names

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
7 years agofreedreno/a4xx: add stencil texturing support
Ilia Mirkin [Sun, 19 Nov 2017 20:13:41 +0000 (15:13 -0500)]
freedreno/a4xx: add stencil texturing support

Copied from a5xx, should be identical.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
7 years agofreedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx
Ilia Mirkin [Sun, 19 Nov 2017 17:28:53 +0000 (12:28 -0500)]
freedreno/ir3: add a pass to lower tg4 to txl, enable gather on a4xx

Unfortunately Adreno A4xx hardware returns incorrect results with the
GATHER4 opcodes. As a result, we have to lower to 4 individual texture
calls (txl since we have to force lod to 0). We achieve this using
offsets, including on cube maps which normally never have offsets.
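
The lowering can be pictured with a small CPU-side model: one gather is
replaced by four single-texel fetches at LOD 0, each with its own
integer offset. This is only a sketch under stated assumptions: the
types and functions (sketch_tex2d, sketch_fetch_lod0, sketch_gather4)
are hypothetical, the texture is single-channel with clamp-to-edge, and
the offset order follows GL's textureGather definition rather than
anything confirmed about the ir3 pass.

```c
#include <stddef.h>

/* Hypothetical single-channel 2D texture, row-major, LOD 0 only. */
struct sketch_tex2d {
    int width, height;
    const float *texels;
};

static int sketch_clamp(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* One texel fetch at LOD 0 with an integer offset, clamped to the edge. */
static float sketch_fetch_lod0(const struct sketch_tex2d *t,
                               int x, int y, int dx, int dy)
{
    int sx = sketch_clamp(x + dx, 0, t->width - 1);
    int sy = sketch_clamp(y + dy, 0, t->height - 1);
    return t->texels[(size_t)sy * (size_t)t->width + (size_t)sx];
}

/* Gather emulated as four fetches: (0,1), (1,1), (1,0), (0,0). */
static void sketch_gather4(const struct sketch_tex2d *t,
                           int x, int y, float out[4])
{
    static const int offsets[4][2] = { {0, 1}, {1, 1}, {1, 0}, {0, 0} };

    for (int i = 0; i < 4; i++)
        out[i] = sketch_fetch_lod0(t, x, y, offsets[i][0], offsets[i][1]);
}
```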

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
7 years agonir: allow texture offsets with cube maps
Ilia Mirkin [Sun, 19 Nov 2017 17:27:12 +0000 (12:27 -0500)]
nir: allow texture offsets with cube maps

GL doesn't have this, but some hardware supports it. This is convenient
for lowering tg4 to plain texture calls, which is necessary on Adreno
A4xx hardware.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
7 years agoutil: Fix disk_cache index calculation on big endian
Matt Turner [Thu, 23 Nov 2017 18:41:34 +0000 (10:41 -0800)]
util: Fix disk_cache index calculation on big endian

The cache-test test program attempts to create a collision (using key_a
and key_a_collide) by making the first two bytes identical. The idea is
fine -- the shader cache wants to use the first four characters of a
SHA1 hex digest as the index.

The following program

        unsigned char array[4] = {1, 2, 3, 4};
        int *ptr = (int *)array;

        for (int i = 0; i < 4; i++) {
            printf("%02x", array[i]);
        }
        printf("\n");

        printf("%08x\n", *ptr);

prints

   01020304
   04030201

on little endian, and

   01020304
   01020304

on big endian.

On big endian platforms reading the character array back as an int (as
is done in disk_cache.c) does not yield the same results as reading the
byte array.

To get the first four characters of the SHA1 hex digest when we mask
with CACHE_INDEX_KEY_MASK, we need to byte swap the int on big endian
platforms.
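
For illustration, the same index can also be obtained independently of
host byte order by composing the value from individual bytes instead of
byte swapping after an int load. This is a different technique from the
one the commit applies, shown only to make the byte-order dependence
concrete. sketch_cache_index is a hypothetical name, and the mask value
0xffff (16 index bits, i.e. the first four hex characters) is an
assumption standing in for CACHE_INDEX_KEY_MASK.

```c
#include <stdint.h>

/* Assumed stand-in for CACHE_INDEX_KEY_MASK: 16 index bits. */
#define SKETCH_INDEX_KEY_MASK 0xffffu

/* Compose the first four key bytes as a little-endian 32-bit value,
 * so the low 16 bits always come from key[0] and key[1]. */
static uint32_t sketch_cache_index(const unsigned char *key)
{
    uint32_t v = (uint32_t)key[0] |
                 ((uint32_t)key[1] << 8) |
                 ((uint32_t)key[2] << 16) |
                 ((uint32_t)key[3] << 24);

    return v & SKETCH_INDEX_KEY_MASK;
}
```

With this formulation two keys that share their first two bytes map to
the same index on any host, which is the collision the test relies on.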

Bugzilla: https://bugs.freedesktop.org/103668
Bugzilla: https://bugs.gentoo.org/637060
Bugzilla: https://bugs.gentoo.org/636326
Fixes: 87ab26b2ab35 ("glsl: Add initial functions to implement an
                      on-disk cache")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoutil: Add a SHA1 unit test program
Matt Turner [Wed, 22 Nov 2017 23:10:47 +0000 (15:10 -0800)]
util: Add a SHA1 unit test program

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoutil: Fix SHA1 implementation on big endian
Matt Turner [Thu, 23 Nov 2017 06:39:51 +0000 (22:39 -0800)]
util: Fix SHA1 implementation on big endian

The code defines a macro blk0(i) based on the preprocessor condition
BYTE_ORDER == LITTLE_ENDIAN. If true, blk0(i) is defined as a byte swap
operation. Unfortunately, if the preprocessor macros used in the test
are not defined, then the comparison becomes 0 == 0 and it evaluates as
true.
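
The pitfall is easy to reproduce in isolation. In the sketch below the
macros SKETCH_BYTE_ORDER and SKETCH_LITTLE_ENDIAN are hypothetical and
deliberately left undefined; they stand in for the real ones, and the
guarded form is just one possible way to avoid the problem, not
necessarily what the fix does.

```c
/* Neither macro is defined, so in #if both are replaced by 0
 * and the comparison below is 0 == 0, i.e. true. */
#if SKETCH_BYTE_ORDER == SKETCH_LITTLE_ENDIAN
#define SKETCH_NAIVE_TAKEN 1
#else
#define SKETCH_NAIVE_TAKEN 0
#endif

/* Requiring the macros to be defined first avoids the false positive. */
#if defined(SKETCH_BYTE_ORDER) && defined(SKETCH_LITTLE_ENDIAN) && \
    SKETCH_BYTE_ORDER == SKETCH_LITTLE_ENDIAN
#define SKETCH_GUARDED_TAKEN 1
#else
#define SKETCH_GUARDED_TAKEN 0
#endif

/* Report which preprocessor branch was selected at compile time. */
static int sketch_naive_taken(void)
{
    return SKETCH_NAIVE_TAKEN;
}

static int sketch_guarded_taken(void)
{
    return SKETCH_GUARDED_TAKEN;
}
```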

Fixes: d1efa09d342b ("util: import sha1 implementation from OpenBSD")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoutil: Assume little endian in the absence of platform-specific handling
Matt Turner [Sat, 25 Nov 2017 04:25:04 +0000 (20:25 -0800)]
util: Assume little endian in the absence of platform-specific handling

7 years agomesa: shrink VERT_ATTRIB bitfields to 32 bits
Marek Olšák [Wed, 15 Nov 2017 22:53:04 +0000 (23:53 +0100)]
mesa: shrink VERT_ATTRIB bitfields to 32 bits

There are only 32 vertex attribs now.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
7 years agomesa: remove unused vertex attrib WEIGHT
Marek Olšák [Wed, 15 Nov 2017 22:24:56 +0000 (23:24 +0100)]
mesa: remove unused vertex attrib WEIGHT

We don't support ARB_vertex_blend.

Note that the attribute aliasing check for ARB_vertex_program had to be
rewritten.

vbo_context: 20344 -> 20008 bytes
gl_context: 74672 -> 74616 bytes

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
7 years agomesa: don't assign numbers to vertex attrib enums manually
Marek Olšák [Wed, 15 Nov 2017 21:58:58 +0000 (22:58 +0100)]
mesa: don't assign numbers to vertex attrib enums manually

I plan to remove one of them.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
7 years agogallium/hud: add HUD sharing within a context share group
Marek Olšák [Sun, 19 Nov 2017 20:29:46 +0000 (21:29 +0100)]
gallium/hud: add HUD sharing within a context share group

This is needed for profiling multi-context applications like Chrome.
One context can record queries and another context can draw the HUD.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/hud: update the HUD interface for multiple contexts
Marek Olšák [Sun, 19 Nov 2017 20:04:07 +0000 (21:04 +0100)]
gallium/hud: update the HUD interface for multiple contexts

This is the boring subset of the following commit.
All new parameters are optional.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/hud: prevent a crash if the recording context is inactive
Marek Olšák [Sun, 19 Nov 2017 03:36:38 +0000 (04:36 +0100)]
gallium/hud: prevent a crash if the recording context is inactive

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/hud: separate code for record context init/release
Marek Olšák [Sat, 18 Nov 2017 17:07:40 +0000 (18:07 +0100)]
gallium/hud: separate code for record context init/release

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/hud: separate code for draw context init/release
Marek Olšák [Sat, 18 Nov 2017 17:07:40 +0000 (18:07 +0100)]
gallium/hud: separate code for draw context init/release

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/hud: don't use hud->pipe in hud_parse_env_var
Marek Olšák [Sat, 18 Nov 2017 16:53:34 +0000 (17:53 +0100)]
gallium/hud: don't use hud->pipe in hud_parse_env_var

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/hud: use cso_get_pipe_context
Marek Olšák [Sat, 18 Nov 2017 16:46:51 +0000 (17:46 +0100)]
gallium/hud: use cso_get_pipe_context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agocso: add cso_get_pipe_context
Marek Olšák [Sat, 18 Nov 2017 16:43:42 +0000 (17:43 +0100)]
cso: add cso_get_pipe_context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/hud: pass pipe_context explicitly to most functions
Marek Olšák [Sat, 18 Nov 2017 15:25:52 +0000 (16:25 +0100)]
gallium/hud: pass pipe_context explicitly to most functions

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/hud: split hud_draw into 3 separate functions
Marek Olšák [Sat, 18 Nov 2017 14:23:23 +0000 (15:23 +0100)]
gallium/hud: split hud_draw into 3 separate functions

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agost/dri: remove dead code and incorrect comment around make_current
Marek Olšák [Sat, 18 Nov 2017 23:24:40 +0000 (00:24 +0100)]
st/dri: remove dead code and incorrect comment around make_current

Core Mesa already handles flushing based on ContextReleaseBehavior,
so the comment is wrong.

Also, old_st is always NULL, because unbind_context always precedes
make_current.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agost/dri: clean up dri_unbind_context
Marek Olšák [Sat, 18 Nov 2017 23:19:19 +0000 (00:19 +0100)]
st/dri: clean up dri_unbind_context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: expose all CB performance counters on Stoney
Marek Olšák [Tue, 21 Nov 2017 00:47:30 +0000 (01:47 +0100)]
radeonsi: expose all CB performance counters on Stoney

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: handle imported textures with DCC robustly
Marek Olšák [Mon, 20 Nov 2017 00:51:50 +0000 (01:51 +0100)]
radeonsi: handle imported textures with DCC robustly

Now you can hack the driver to enable DCC for displayable textures, and
Glamor, which doesn't enable that by default, won't crash anymore.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: fix a typo in creating monolithic ES-GS
Marek Olšák [Fri, 17 Nov 2017 16:52:09 +0000 (17:52 +0100)]
radeonsi: fix a typo in creating monolithic ES-GS

This has no effect because both occupy the same memory in a union.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: don't write undefined output channels to LDS in LS
Marek Olšák [Fri, 17 Nov 2017 03:56:13 +0000 (04:56 +0100)]
radeonsi: don't write undefined output channels to LDS in LS

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: use ac.lds for shared memory
Marek Olšák [Thu, 9 Nov 2017 22:34:26 +0000 (23:34 +0100)]
radeonsi: use ac.lds for shared memory

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: do 64-bit LDS loads recursively
Marek Olšák [Thu, 9 Nov 2017 22:25:34 +0000 (23:25 +0100)]
radeonsi: do 64-bit LDS loads recursively

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agomapi: Teach es{1,2}api/ABI-check shared library names on Cygwin
Jon Turney [Sat, 11 Nov 2017 14:48:10 +0000 (14:48 +0000)]
mapi: Teach es{1,2}api/ABI-check shared library names on Cygwin

Ideally we'd be able to get the library filename from libtool, but that
doesn't seem to be a feature...

Use of ${uname} is presumably ok here as we won't be running 'make check' if
we are cross-compiling

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agoRevert "radv: remove unnecessary memset() in radv_AllocateCommandBuffers()"
Samuel Pitoiset [Wed, 22 Nov 2017 15:13:28 +0000 (16:13 +0100)]
Revert "radv: remove unnecessary memset() in radv_AllocateCommandBuffers()"

This fixes two CTS regressions:
- dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_primary
- dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_secondary

These two tests are part of the mustpass lists, so presumably they
are correct and my change was wrong.

This reverts commit 0f68208f1d1d3b7b2963dab40e84c60212518692.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv/winsys: improve error messages when the buffer list creation failed
Samuel Pitoiset [Wed, 22 Nov 2017 19:13:26 +0000 (20:13 +0100)]
radv/winsys: improve error messages when the buffer list creation failed

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv/winsys: do not try to create a BO list with 0 buffers
Samuel Pitoiset [Wed, 22 Nov 2017 19:13:25 +0000 (20:13 +0100)]
radv/winsys: do not try to create a BO list with 0 buffers

This happens when all BOs have the RADEON_FLAG_NO_INTERPROCESS_SHARING
(DRM version >= 3.23) flag set. This flag is mainly used for reducing
overhead on the userspace side because we don't have to put those BOs
inside the list.

Though, if the driver tries to create a list with 0 buffers inside it,
libdrm returns -EINVAL and the app just crashes.
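
The shape of the guard can be sketched as follows. Everything here is
hypothetical: sketch_bo_list, sketch_backend_create_list and
sketch_get_bo_list are illustrative names, and the backend function is
a local stand-in that merely mimics the "-EINVAL on an empty list"
behaviour described above rather than calling libdrm.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list handle; valid == 0 means "no list was created". */
struct sketch_bo_list {
    int valid;
    size_t count;
};

/* Stand-in for the library call: rejects empty lists with -EINVAL. */
static int sketch_backend_create_list(const uint32_t *handles, size_t count,
                                      struct sketch_bo_list *out)
{
    if (handles == NULL || count == 0)
        return -EINVAL;
    out->valid = 1;
    out->count = count;
    return 0;
}

/* Skip the backend entirely when there is nothing to put in the list. */
static int sketch_get_bo_list(const uint32_t *handles, size_t count,
                              struct sketch_bo_list *out)
{
    out->valid = 0;
    out->count = 0;
    if (count == 0)
        return 0;
    return sketch_backend_create_list(handles, count, out);
}
```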

This fixes a bunch of CTS dEQP-VK.sparse_resources.* fails (~100).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoi965/vec4: fix splitting of interleaved attributes
Iago Toral Quiroga [Tue, 21 Nov 2017 10:33:53 +0000 (11:33 +0100)]
i965/vec4: fix splitting of interleaved attributes

When we split an instruction that reads a uniform value
(vstride 0) we need to respect the vstride on the second
half of the instruction (that is, the second half should
read the same region as the first).

We were doing this already, but we didn't account for
stages that have interleaved input attributes which also
have a vstride of 0 and need the same treatment.

Fixes the following on Haswell:
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_structure_locations

Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Andres Gomez <agomez@igalia.com>
7 years agoetnaviv: Emit vertex buffers consecutively
Wladimir J. van der Laan [Thu, 23 Nov 2017 09:08:34 +0000 (10:08 +0100)]
etnaviv: Emit vertex buffers consecutively

Vertex buffer legacy state is no longer picked up with new drawing
commands. Change to use different cases depending on the number of
vertex streams in the GPU specs.

This results in slightly more compact state emission as well, on all
vivantes.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoREVIEWERS: add Alexander von Gluck IV as a reviewer for Haiku
Eric Engestrom [Thu, 9 Nov 2017 17:38:25 +0000 (17:38 +0000)]
REVIEWERS: add Alexander von Gluck IV as a reviewer for Haiku

There's been some Haiku-related activity lately, so let's document who
to cc on these patches.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Alexander von Gluck IV <kallisti5@unixzen.com>
7 years agogenxml: fix assert guards
Eric Engestrom [Wed, 22 Nov 2017 10:11:13 +0000 (10:11 +0000)]
genxml: fix assert guards

This removes a few hundred warnings on debug builds with asserts off.

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agomeson: add variable for mapi_abi.py instead of going back up the tree
Eric Engestrom [Tue, 21 Nov 2017 15:07:50 +0000 (15:07 +0000)]
meson: add variable for mapi_abi.py instead of going back up the tree

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
7 years agomeson: reorder subdirs to avoid directly including more than one level
Eric Engestrom [Tue, 21 Nov 2017 15:07:11 +0000 (15:07 +0000)]
meson: reorder subdirs to avoid directly including more than one level

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
7 years agomeson: fix strtof locale support check
Eric Engestrom [Tue, 21 Nov 2017 14:24:01 +0000 (14:24 +0000)]
meson: fix strtof locale support check

Fixes: d1992255bb29054fa5176 "meson: Add build Intel "anv" vulkan driver"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
7 years agor600: set DX10_CLAMP for compute shader too
Roland Scheidegger [Wed, 22 Nov 2017 02:11:33 +0000 (03:11 +0100)]
r600: set DX10_CLAMP for compute shader too

I really intended to set this for all shader stages by
3835009796166968750ff46cf209f6d4208cda86 but missed it for compute shaders
(because it's in a different source file...).

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoanv: flag batch & instruction BOs for capture
Lionel Landwerlin [Fri, 17 Nov 2017 17:29:26 +0000 (17:29 +0000)]
anv: flag batch & instruction BOs for capture

When the kernel supports flagging our BOs, let's mark batch &
instruction BOs for capture so they can be included in the error
state.

v2: Only add EXEC_CAPTURE if supported (Kristian)

v3: Fix operator precedence issue (Lionel)
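
The commit text does not say which expression had the precedence
problem. One common way such an issue arises when a flag is added
conditionally is sketched below, purely as an assumption; the flag
names and values are hypothetical and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define SKETCH_FLAG_BASE    (1u << 0)
#define SKETCH_FLAG_CAPTURE (1u << 7)

/* Intended: always set BASE, add CAPTURE only when supported. */
static uint32_t sketch_bo_flags(bool capture_supported)
{
    return SKETCH_FLAG_BASE | (capture_supported ? SKETCH_FLAG_CAPTURE : 0u);
}

/* How `BASE | supported ? CAPTURE : 0` parses without the parentheses:
 * `|` binds tighter than `?:`, so BASE becomes part of the condition,
 * the condition is always true, and BASE itself is lost. */
static uint32_t sketch_bo_flags_misparsed(bool capture_supported)
{
    return (SKETCH_FLAG_BASE | capture_supported) ? SKETCH_FLAG_CAPTURE : 0u;
}
```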

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoanv: setup BO flags at state_pool/block_pool creation
Lionel Landwerlin [Fri, 17 Nov 2017 17:26:59 +0000 (17:26 +0000)]
anv: setup BO flags at state_pool/block_pool creation

This will allow setting the flags on any anv_bo created/filled from a
state pool or block pool later.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agor600/shader: Fix all warnings issed with "-Wall -Wextra"
Gert Wollny [Wed, 15 Nov 2017 09:29:11 +0000 (10:29 +0100)]
r600/shader: Fix all warnings issed with "-Wall -Wextra"

- fix a number of -Wsign-compare warnings
- fix two warnings for -Woverride-init because TGSI_OPCODE_CEIL == 83, and
  the corresponding field was defined twice.

[airlied: don't use -1 with unsigned type,
fix whitespace]

Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agor600: Emit EOP for more CF instruction types
Gert Wollny [Fri, 17 Nov 2017 11:13:40 +0000 (12:13 +0100)]
r600: Emit EOP for more CF instruction types

So far, on pre-cayman chipsets, an extra CF_NOP instruction was added
for the CF instructions CF_OP_LOOP_END, CF_OP_CALL_FS, CF_OP_POP, and
CF_OP_GDS to carry the EOP flag, even though this is not actually
needed, because all these instructions support the EOP flag.

This patch removes the fixup code, adds setting the EOP flag for the
corresponding instructions as well as others like CF_OP_TEX and CF_OP_VTX,
and adds writing out EOP for this type of instruction in the disassembler.

This also fixes a bug where shaders were created that didn't actually have
the EOP flag set in the last CF instruction, which might have resulted
in GPU lockups.

[airlied: cleaned up a little]
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agomeson: replace with_*dri with with_dri_platform
Dylan Baker [Tue, 21 Nov 2017 00:34:28 +0000 (16:34 -0800)]
meson: replace with_*dri with with_dri_platform

This fixes the windows and macos stubs to be consistent with the *nix
path.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomeson: add logic to select apple and windows dri
Dylan Baker [Sat, 28 Oct 2017 00:20:52 +0000 (17:20 -0700)]
meson: add logic to select apple and windows dri

This is still not fully correct (Haiku and BSD in particular are probably
not correct), but Linux is not regressed and this should be correct for
macOS and Windows.

v2: - set the dri_platform to windows on Cygwin as well (Jon)
v3: - Add a better todo for Hurd (Eric)

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomeson: Fix LLVM requires for radeonsi
Dylan Baker [Tue, 21 Nov 2017 00:26:06 +0000 (16:26 -0800)]
meson: Fix LLVM requires for radeonsi

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomeson: convert llvm option to tristate
Dylan Baker [Sat, 18 Nov 2017 00:37:50 +0000 (16:37 -0800)]
meson: convert llvm option to tristate

This option has been acting as a strange sort of half-tri state anyway.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomeson: Convert platform to auto
Dylan Baker [Thu, 16 Nov 2017 01:31:32 +0000 (17:31 -0800)]
meson: Convert platform to auto

This is necessary to support operating systems other than the *nix
family (excluding macOS). For Linux nothing has changed, the defaults
are still the same.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomeson: Remove duplicate _GNU_SOURCE
Dylan Baker [Thu, 16 Nov 2017 01:30:52 +0000 (17:30 -0800)]
meson: Remove duplicate _GNU_SOURCE

There is one provided unconditionally, and one guarded by platform ==
linux. Remove the unconditional one.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomeson: Remove completed or irrelevant TODO comments
Dylan Baker [Thu, 16 Nov 2017 01:09:33 +0000 (17:09 -0800)]
meson: Remove completed or irrelevant TODO comments

These are all either done already, or are autotools specific. The
misspelled gallium G3DVL is the autotools specific bit, meson is
handling that via build_by_default.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomeson: Fix TODO for missing dl_iterate_phdr function
Dylan Baker [Thu, 16 Nov 2017 01:07:37 +0000 (17:07 -0800)]
meson: Fix TODO for missing dl_iterate_phdr function

This function is required for both the Intel "Anvil" vulkan driver and
the i965 GL driver. Error out if either of those is enabled but this
function isn't found.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomeson: disable x86 asm in fewer cases.
Dylan Baker [Thu, 16 Nov 2017 00:53:40 +0000 (16:53 -0800)]
meson: disable x86 asm in fewer cases.

This patch allows building asm for x86 on x86_64 platforms, when the
operating system is the same. Previously cross compile always turned off
assembly. This allows using a cross file to cross compile x86 binaries
on x86_64 with asm.

This could probably be relaxed further thanks to meson's "exe_wrapper",
which is a way to specify an emulator or compatibility layer (wine) that
can run the foreign binaries on the build system. Since the meson build
at this point only supports building on Linux I can't test this and I
don't want to write/enable code that cannot even be build tested.

v4: - set condition to build == x86_64 and host == x86 and
      build.system == host.system

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomeson: Enable SSE4.1 optimizations
Dylan Baker [Thu, 16 Nov 2017 00:09:22 +0000 (16:09 -0800)]
meson: Enable SSE4.1 optimizations

This patch checks for and then enables SSE4.1 optimizations if the
host machine will be x86/x86_64.

v2: - Don't compile code, it's unnecessary since we require a compiler
      which always has SSE4.1 (Matt)
v3: - x64 -> x86_64 (Matt)

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agobroadcom/vc5: Fix BASE_LEVEL handling with txl.
Eric Anholt [Wed, 22 Nov 2017 00:32:33 +0000 (16:32 -0800)]
broadcom/vc5: Fix BASE_LEVEL handling with txl.

The HW doesn't add the base level anywhere (the min/max lod clamping is
what does base level), so we need to add it manually in this case.

Fixes piglit tex-miplevel-selection *Lod 2D.

7 years agobroadcom/vc5: Fix array texture layer count setup.
Eric Anholt [Wed, 22 Nov 2017 00:05:49 +0000 (16:05 -0800)]
broadcom/vc5: Fix array texture layer count setup.

Fixes piglit array-texture.

7 years agobroadcom/vc5: Don't increment primitive queries while they're paused.
Eric Anholt [Tue, 21 Nov 2017 23:27:20 +0000 (15:27 -0800)]
broadcom/vc5: Don't increment primitive queries while they're paused.

Fixes ext_transform_feedback-generatemipmap prims_generated

7 years agobroadcom/vc5: Fix incorrect padding of TF outputs.
Eric Anholt [Tue, 21 Nov 2017 23:20:31 +0000 (15:20 -0800)]
broadcom/vc5: Fix incorrect padding of TF outputs.

After the first output, we were padding by an extra size of the previous
output.  Fixes piglit ext_transform_feedback-output-type mat4x3[2] and
friends.

7 years agobroadcom/vc5: Fix UIF surface size setup for ARB_fbo's mismatched sizes.
Eric Anholt [Tue, 21 Nov 2017 23:00:36 +0000 (15:00 -0800)]
broadcom/vc5: Fix UIF surface size setup for ARB_fbo's mismatched sizes.

The HW was computing an implicit height for the surface based on the image
size, but that may be smaller than the surface with ARB_fbo mismatched
sizes.  In that case, we need to tell it about the pad, either with the
little 4-bit field in the RT config, or the extended field in
CLEAR_COLORS_PART3.

Fixes piglit arb_framebuffer_object-mixed-buffer-sizes.

7 years agoetnaviv: Put HALTI level in specs
Wladimir J. van der Laan [Sat, 18 Nov 2017 09:44:25 +0000 (10:44 +0100)]
etnaviv: Put HALTI level in specs

The HALTI level is an indication of the gross architecture of the GPU.
It determines in significant part what feature level the GPU has, what
state (especially frontend state) is there, and where it is located.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
7 years agoetnaviv: Const-correctness etnaviv_emit.h
Wladimir J. van der Laan [Sat, 18 Nov 2017 09:44:24 +0000 (10:44 +0100)]
etnaviv: Const-correctness etnaviv_emit.h

The relocation structure is never changed by submitting it.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
7 years agomeson: add si_driinfo.h in libgallium_dri
Juan A. Suarez Romero [Tue, 21 Nov 2017 11:38:27 +0000 (12:38 +0100)]
meson: add si_driinfo.h in libgallium_dri

v2: generate target conditionally (Dylan)

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
7 years agonir/gather_info: recognize load_patch_vertices_in as a system value
Iago Toral Quiroga [Thu, 16 Nov 2017 07:53:07 +0000 (08:53 +0100)]
nir/gather_info: recognize load_patch_vertices_in as a system value

This intrinsic is produced to load SYSTEM_VALUE_VERTICES_IN, which is
generated to load gl_PatchVerticesIn in the SPIR-V path for both
Vulkan and OpenGL.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoi965: Support decoding INTERFACE_DESCRIPTOR_DATA with INTEL_DEBUG=bat
Jordan Justen [Wed, 15 Nov 2017 00:27:34 +0000 (16:27 -0800)]
i965: Support decoding INTERFACE_DESCRIPTOR_DATA with INTEL_DEBUG=bat

This will dump the INTERFACE_DESCRIPTOR_DATA along with the associated
samplers & surfaces.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
7 years agointel/genxml: Add helpers for determining field type
Kristian H. Kristensen [Wed, 30 Nov 2016 05:07:57 +0000 (21:07 -0800)]
intel/genxml: Add helpers for determining field type

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoi965/fs: Check ADD/MAD with immediates in satprop unit test
Matt Turner [Mon, 20 Nov 2017 22:21:43 +0000 (14:21 -0800)]
i965/fs: Check ADD/MAD with immediates in satprop unit test

The gen had to be changed from 4 to 6 so that we could test MAD, which
is new on Gen6.

mad_imm_float_neg_mov_sat tests the case fixed by the previous commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
7 years agoi965/fs: Handle negating immediates on MADs when propagating saturates
Matt Turner [Mon, 20 Nov 2017 22:24:57 +0000 (14:24 -0800)]
i965/fs: Handle negating immediates on MADs when propagating saturates

MADs don't take immediate sources, but we allow them in the IR since it
simplifies a lot of things. I neglected to consider that case.
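
A minimal sketch of the case described above, assuming that a negation
has to be folded into an immediate's value because an immediate cannot
carry a negate source modifier. The src_operand type and negate_src()
helper are hypothetical and are not the i965 IR types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified source operand; not the real i965 IR. */
enum src_file { SRC_GRF, SRC_IMM };

struct src_operand {
   enum src_file file;
   float imm;     /* only meaningful when file == SRC_IMM */
   bool negate;   /* source modifier, only used for registers */
};

/* Negate one source in place. */
static void negate_src(struct src_operand *src)
{
   assert(src != NULL);
   if (src->file == SRC_IMM) {
      /* Assumption: fold the negation into the immediate value. */
      src->imm = -src->imm;
   } else {
      /* Registers just toggle the negate modifier. */
      src->negate = !src->negate;
   }
}
```

Calling negate_src() on an immediate 2.0 leaves the modifier clear and
changes the value to -2.0, while a register source only has its negate
flag flipped.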

Fixes: 4009a9ead490 ("i965/fs: Allow saturate propagation to propagate
                      negations into MADs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103616
Reported-and-Tested-by: Ruslan Kabatsayev <b7.10110111@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
7 years agomesa/teximage: add TEXTURE_CUBE_MAP_ARRAY target for CompressedTexImage3D
Juan A. Suarez Romero [Wed, 15 Nov 2017 16:49:21 +0000 (16:49 +0000)]
mesa/teximage: add TEXTURE_CUBE_MAP_ARRAY target for CompressedTexImage3D

From section 8.7, page 179 of OpenGL ES 3.2 spec:

  An INVALID_OPERATION error is generated by CompressedTexImage3D
  if internalformat is one of the formats in table 8.17 and target
  is not TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY or TEXTURE_3D.

  An INVALID_OPERATION error is generated by CompressedTexImage3D if
  target is TEXTURE_CUBE_MAP_ARRAY and the “Cube Map Array” column of
  table 8.17 is not checked, or if target is TEXTURE_3D and the
  “3D Tex.” column of table 8.17 is not checked.

So far, only TEXTURE_2D_ARRAY was considered a valid target. But since
the "Cube Map Array" column is checked for all the cases, in practice we
can also accept TEXTURE_CUBE_MAP_ARRAY.
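
The rule quoted above could be sketched roughly as follows. The
tex_target enum, the format_columns struct and the function name are
hypothetical stand-ins for illustration; they are not Mesa's types, and
the per-format column values would come from table 8.17:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the GL targets involved. */
enum tex_target {
   TARGET_2D,
   TARGET_2D_ARRAY,
   TARGET_CUBE_MAP_ARRAY,
   TARGET_3D
};

/* Hypothetical per-format copy of two columns of table 8.17. */
struct format_columns {
   bool cube_map_array;   /* "Cube Map Array" column checked */
   bool tex_3d;           /* "3D Tex." column checked */
};

/* Returns false where the spec asks for INVALID_OPERATION. */
static bool
compressed_teximage3d_target_ok(enum tex_target target,
                                const struct format_columns *cols)
{
   assert(cols != NULL);
   switch (target) {
   case TARGET_2D_ARRAY:
      return true;
   case TARGET_CUBE_MAP_ARRAY:
      return cols->cube_map_array;
   case TARGET_3D:
      return cols->tex_3d;
   default:
      return false;
   }
}
```

For a format whose "Cube Map Array" column is checked and whose
"3D Tex." column is not, this accepts both array targets and rejects
TEXTURE_3D.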

This fixes KHR-GLES32.core.texture_cube_map_array.etc2_texture

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
7 years agointel: fix disasm_info memory leaks
Tapani Pälli [Mon, 20 Nov 2017 08:57:17 +0000 (10:57 +0200)]
intel: fix disasm_info memory leaks

Fixes: 4f82b1728719 ("i965: Rewrite disassembly annotation code")
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agost/glsl_to_nir: don't generate nir twice for gs
Timothy Arceri [Thu, 16 Nov 2017 00:16:10 +0000 (11:16 +1100)]
st/glsl_to_nir: don't generate nir twice for gs

This was left out of c980a3aa3133

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agollvmpipe: fix snorm blending
Roland Scheidegger [Sat, 18 Nov 2017 05:23:35 +0000 (06:23 +0100)]
llvmpipe: fix snorm blending

The blend math gets a bit funky because inverse blend factors are in
range [0,2] rather than [-1,1], which our normalized math can't really
cover.
The src_alpha_saturate blend factor has a similar problem.
(Note that piglit fbo-blending-formats test is mostly useless for
anything but unorm formats, since not only are all src/dst values
in [0,1], but the tests are crafted in a way that the results
are in [0,1] too.)
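
As a small illustration of the range problem (not the llvmpipe code
itself), assume an inverse blend factor is computed as 1 - f for a
snorm factor f in [-1,1]. The helper names below are hypothetical:

```c
#include <assert.h>

/* Inverse blend factor for a factor f in the snorm range [-1,1]. */
static float inverse_factor(float f)
{
   assert(f >= -1.0f && f <= 1.0f);
   return 1.0f - f;   /* result lies in [0,2] */
}

/* Clamp to the range a snorm value can represent. */
static float clamp_snorm(float x)
{
   return x < -1.0f ? -1.0f : (x > 1.0f ? 1.0f : x);
}
```

With f = -1 the inverse factor is 2, which falls outside [-1,1], so
storing it as a normalized value would clamp it to 1 and change the
blend result.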

v2: some formatting fixes, and fix a fairly obscure (to debug)
issue with alpha-only formats (not related to snorm at all), where
blend optimization would think it could simplify the blend equation
if the blend factors were complementary, but was using the
completely unrelated rgb blend factors instead of the alpha ones...

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
7 years agor600: add cull distance support
Dave Airlie [Fri, 13 May 2016 04:35:33 +0000 (14:35 +1000)]
r600: add cull distance support

This passes all the tests in piglit.

Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoi965: Optimize bucket index calculation
Aravindan Muthukumar [Thu, 9 Nov 2017 05:45:28 +0000 (11:15 +0530)]
i965: Optimize bucket index calculation

Reduce bucket index calculation to O(1).

This algorithm calculates the index using a matrix method.  Assuming
PAGE_SIZE is 4096, the matrix arrangement is as below:

          1*4096   2*4096    3*4096    4*4096
          5*4096   6*4096    7*4096    8*4096
          10*4096  12*4096   14*4096   16*4096
          20*4096  24*4096   28*4096   32*4096
           ...      ...       ...       ...
           ...      ...       ...       ...
           ...      ...       ...   max_cache_size

From this matrix it is clear that every row follows the pattern below:

          ...       ...       ...        n
        n+(1/4)n  n+(1/2)n  n+(3/4)n    2n

Row is calculated as log2(size/PAGE_SIZE). Column is calculated by
converting the difference between the elements to fit into a power-of-two
size and indexing it.

Final Index is (row*4)+(col-1)
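
The row/column scheme above can be sketched as follows. This is an
illustrative reimplementation under the stated PAGE_SIZE of 4096, not
the code from the patch; bucket_index_for_size() and floor_log2() are
hypothetical names, and a portable loop stands in for whatever log2
primitive the driver uses:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Portable floor(log2(x)) for x > 0. */
static unsigned floor_log2(unsigned x)
{
   unsigned r = 0;
   while (x >>= 1)
      r++;
   return r;
}

/* Map an allocation size in bytes to its bucket index in the matrix. */
static unsigned bucket_index_for_size(uint64_t size)
{
   assert(size > 0);

   /* Round the size up to whole pages. */
   const unsigned pages = (unsigned)((size + PAGE_SIZE - 1) / PAGE_SIZE);

   /* Row 0 covers 1..4 pages, row 1 covers 5..8, row 2 covers 9..16... */
   const unsigned row = floor_log2((pages - 1) | 3) - 1;
   const unsigned row_max_pages = 4u << row;

   /* Row 0 has no previous row, so its base is zero. */
   const unsigned prev_row_max_pages = row ? row_max_pages / 2 : 0;

   /* Columns step by 1 page in rows 0 and 1, then 2, 4, 8, ... pages. */
   const unsigned col_size_log2 = row > 1 ? row - 1 : 0;
   const unsigned col = (pages - prev_row_max_pages +
                         (1u << col_size_log2) - 1) >> col_size_log2;

   return row * 4 + (col - 1);
}
```

For example, a 9-page request lands in row 2, column 1 (the 10*4096
bucket), giving index 8, and a 32-page request lands in row 3,
column 4, giving index 15.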

Tested with Intel Mesa CI.

Improves performance of 3DMark on BXT by 0.705966% +/- 0.229767% (n=20)

v4: Review comments on style and code comments implemented (Ian).
v3: Review comments implemented (Ian).
v2: Review comments implemented (Jason).

Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Signed-off-by: Kedar Karanje <kedar.j.karanje@intel.com>
Reviewed-by: Yogesh Marathe <yogesh.marathe@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
7 years agomeson: Guard the gallium dri component
Dylan Baker [Wed, 15 Nov 2017 01:04:27 +0000 (17:04 -0800)]
meson: Guard the gallium dri component

Currently the target has a redundant guard, and the state tracker isn't
properly guarded.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
7 years agomeson: don't build gallium subdir unless we're building gallium
Dylan Baker [Wed, 15 Nov 2017 01:03:39 +0000 (17:03 -0800)]
meson: don't build gallium subdir unless we're building gallium

This will allow us to simplify some guards within the gallium directory.

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
7 years agobroadcom/vc5: Align 1D texture miplevels to 64b.
Eric Anholt [Mon, 20 Nov 2017 18:14:38 +0000 (10:14 -0800)]
broadcom/vc5: Align 1D texture miplevels to 64b.

Fixes tex-miplevel-selection GL2:texture() 1D

7 years agobroadcom/vc5: Clamp min lod to the last level.
Eric Anholt [Mon, 20 Nov 2017 18:07:24 +0000 (10:07 -0800)]
broadcom/vc5: Clamp min lod to the last level.

Otherwise, the simulator would complain in tex-miplevel-selection that the
min/max clamp was out of order.  The actual HW seems to have clamped to
the max anyway.
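
A minimal sketch of the clamp, assuming the max LOD clamp equals the
texture's last mip level. clamp_min_lod() is a hypothetical helper, not
the vc5 driver code:

```c
#include <assert.h>

/* Keep the min LOD clamp from exceeding the last mip level, so the
 * min/max pair stays ordered. */
static float clamp_min_lod(float min_lod, unsigned last_level)
{
   assert(min_lod >= 0.0f);
   const float max_lod = (float)last_level;
   return min_lod > max_lod ? max_lod : min_lod;
}
```

With a last level of 4, a requested min LOD of 9.0 becomes 4.0, while
1.5 is passed through unchanged.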

7 years agobroadcom/vc5: Increase simulator memory for tex-miplevel-selection.
Eric Anholt [Mon, 20 Nov 2017 20:26:49 +0000 (12:26 -0800)]
broadcom/vc5: Increase simulator memory for tex-miplevel-selection.

We were overflowing, because of all the little 4k allocations for CLs that
were getting expanded to 128kb in the simulator due to the GMP alignment.

7 years agoswr/rast: Repair simd8 frontend code rot
Tim Rowley [Fri, 10 Nov 2017 22:45:38 +0000 (16:45 -0600)]
swr/rast: Repair simd8 frontend code rot

Keep non-default simd8 frontend code running for comparison purposes.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Implement AVX-512 GATHERPS in SIMD16 fetch shader
Tim Rowley [Thu, 9 Nov 2017 01:17:24 +0000 (19:17 -0600)]
swr/rast: Implement AVX-512 GATHERPS in SIMD16 fetch shader

Disabled for now.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Simplify GATHER* jit builder api
Tim Rowley [Wed, 8 Nov 2017 20:07:33 +0000 (14:07 -0600)]
swr/rast: Simplify GATHER* jit builder api

General cleanup, and prep work for possibly moving to llvm masked
gather intrinsic.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Add alignment to transpose targets
Tim Rowley [Tue, 7 Nov 2017 21:24:25 +0000 (15:24 -0600)]
swr/rast: Add alignment to transpose targets

Needed to ensure alignment for avx512.

Fixes address sanitizer crash.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Cache eventmanager
Tim Rowley [Tue, 7 Nov 2017 19:50:11 +0000 (13:50 -0600)]
swr/rast: Cache eventmanager

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Enable AVX-512 targets in the jitter
Tim Rowley [Tue, 31 Oct 2017 21:46:59 +0000 (16:46 -0500)]
swr/rast: Enable AVX-512 targets in the jitter

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Points with clipdistance can't go through simplepoints path
Tim Rowley [Tue, 31 Oct 2017 14:41:02 +0000 (09:41 -0500)]
swr/rast: Points with clipdistance can't go through simplepoints path

Fixes piglit glsl-1.20:vs-clip-vertex-primitives and
glsl-1.30:vs-clip-distance-primitives.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Code style change (NFC)
Tim Rowley [Mon, 23 Oct 2017 20:10:35 +0000 (15:10 -0500)]
swr/rast: Code style change (NFC)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Widen fetch shader to SIMD16
Tim Rowley [Thu, 19 Oct 2017 22:33:37 +0000 (17:33 -0500)]
swr/rast: Widen fetch shader to SIMD16

Widen fetch shader to SIMD16, enable SIMD16 types in the jitter,
and provide EXTRACT/INSERT SIMD8 <-> SIMD16 utility functions.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>