mesa.git
7 years agoswr/rast: Properly size GS stage scratch space
Tim Rowley [Thu, 8 Jun 2017 16:48:37 +0000 (11:48 -0500)]
swr/rast: Properly size GS stage scratch space

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Fix early z / query interaction
Tim Rowley [Wed, 7 Jun 2017 18:32:11 +0000 (13:32 -0500)]
swr/rast: Fix early z / query interaction

For certain cases, we perform early z for optimization. The GL_SAMPLES_PASSED
query was providing erroneous results because we were counting the number
of samples passed before the fragment shader, which did not work if the
fragment shader contained a discard.

Account properly for discard and early z, by anding the zpass mask with
the post fragment shader active mask, after the fragment shader.

Fixes the following piglit tests:
    - occlusion-query-discard
    - occlusion_query_meta_fragments

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Share vertex memory between VS input/output
Tim Rowley [Wed, 7 Jun 2017 18:16:15 +0000 (13:16 -0500)]
swr/rast: Share vertex memory between VS input/output

Removes large simdvertex stack allocation.

Vertex shader must ensure reads happen before writes.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Add support for dynamic vertex size for VS output
Tim Rowley [Tue, 6 Jun 2017 23:41:40 +0000 (18:41 -0500)]
swr/rast: Add support for dynamic vertex size for VS output

Add support for dynamic vertex size for the vertex shader output.

Add new state in SWR_FRONTEND_STATE to specify the size.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: SIMD16 FE - improve calcDeterminantIntVertical
Tim Rowley [Tue, 6 Jun 2017 20:34:54 +0000 (15:34 -0500)]
swr/rast: SIMD16 FE - improve calcDeterminantIntVertical

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Add support to PA for variable sized vertices
Tim Rowley [Mon, 5 Jun 2017 21:13:25 +0000 (16:13 -0500)]
swr/rast: Add support to PA for variable sized vertices

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Rework attribute layout
Tim Rowley [Thu, 1 Jun 2017 18:08:04 +0000 (13:08 -0500)]
swr/rast: Rework attribute layout

Move fixed attributes to the top and pack single component SGVs.
WIP to support dynamically allocated vertex size.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Remove explicit primitive id slot in the vertex layout
Tim Rowley [Wed, 31 May 2017 16:24:08 +0000 (11:24 -0500)]
swr/rast: Remove explicit primitive id slot in the vertex layout

- Remove any special casing in the PS stage when primitive ID is input.
  Treat as a normal attribute that must be set up properly in the FE linkage.
- Remove primitive id from the PS_CONTEXT and TRI_FLAGS

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Fix invalid 16-bit format traits for A1R5G5B5
Tim Rowley [Fri, 26 May 2017 06:47:58 +0000 (01:47 -0500)]
swr/rast: Fix invalid 16-bit format traits for A1R5G5B5

Correctly handle formats of <= 16 bits where the component bits don't
add up to the pixel size.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr/rast: Implement JIT shader caching to disk
Tim Rowley [Thu, 25 May 2017 02:54:43 +0000 (21:54 -0500)]
swr/rast: Implement JIT shader caching to disk

Disabled by default; currently doesn't cache shaders (fs,gs,vs).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agogallium/docs: improve docs for SAMPLE_POS, SAMPLE_INFO, TXQS, MSAA semantics
Brian Paul [Fri, 26 May 2017 19:56:37 +0000 (13:56 -0600)]
gallium/docs: improve docs for SAMPLE_POS, SAMPLE_INFO, TXQS, MSAA semantics

For the SAMPLE_POS and SAMPLE_INFO opcodes, clarify resource vs. render
target queries, range of postion values, swizzling, etc.  We basically
follow the DX10.1 conventions.

For the TXQS opcode and TGSI_SEMANTIC_SAMPLEID, clarify return value
and type.

For the TGSI_SEMANTIC_SAMPLEPOS system value, clarify the range of
positions returned.

v2: use 'undef' for unused vector components.  Use (0.5, 0.5, undef, undef)
for sample pos when MSAA not applicable.

v3: Add note that OPCODE_SAMPLE_INFO, OPCODE_SAMPLE_POS are not used yet
and the information is subject to change.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
7 years agosvga: add some missing SVGA_STATS_* enum values, prefix strings
Brian Paul [Fri, 16 Jun 2017 19:16:30 +0000 (13:16 -0600)]
svga: add some missing SVGA_STATS_* enum values, prefix strings

To fix the build when VMX86_STATS is defined.
Also, some minor whitespace changes to match upstream code.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agoradeonsi: add new polaris12 pci id
Alex Deucher [Fri, 16 Jun 2017 16:12:21 +0000 (12:12 -0400)]
radeonsi: add new polaris12 pci id

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: 17.0 17.1 <mesa-stable@lists.freedesktop.org>
7 years agoswr: Don't crash when encountering a VBO with stride = 0.
Bruce Cherniak [Thu, 15 Jun 2017 16:24:47 +0000 (11:24 -0500)]
swr: Don't crash when encountering a VBO with stride = 0.

The swr driver uses vertex_buffer->stride to determine the number
of elements in a VBO. A recent change to the state-tracker made it
possible for VBO's with stride=0. This resulted in a divide by zero
crash in the driver. The solution is to use the pre-calculated vertex
element stream_pitch in this case.

This patch fixes the crash in a number of piglit and VTK tests introduced
by 17f776c27be266f2.

There are several VTK tests that still crash and need proper handling of
vertex_buffer_index.  This will come in a follow-on patch.

v2: Correctly update all parameters for VBO constants (stride = 0).
    Also fixes the remaining crashes/regressions that v1 did
    not address, without touching vertex_buffer_index.

Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agointel/isl: Add the maximum surface size limit
Anuj Phogat [Fri, 19 May 2017 19:09:22 +0000 (12:09 -0700)]
intel/isl: Add the maximum surface size limit

V2: Use 2^31 bytes (2GB) surface size limit on pre-gen9 and
    2^38 bytes for gen9+.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
7 years agointel/isl: Use uint64_t to store total surface size
Anuj Phogat [Fri, 19 May 2017 20:47:12 +0000 (13:47 -0700)]
intel/isl: Use uint64_t to store total surface size

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
7 years agoi965: Mark freshly allocate bo as idle
Chris Wilson [Thu, 8 Jun 2017 23:35:09 +0000 (00:35 +0100)]
i965: Mark freshly allocate bo as idle

When created, buffers are idle, so mark them as such to save an early
ioctl or mistakenly assuming the fresh buffer is busy.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoetnaviv: add rs-operations sw query
Christian Gmeiner [Fri, 9 Jun 2017 10:34:49 +0000 (12:34 +0200)]
etnaviv: add rs-operations sw query

It could be useful to get the number of emited resolve operations when
doing driver optimizations.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
7 years agoetnaviv: advertise correct max LOD bias
Lucas Stach [Sun, 4 Jun 2017 19:06:33 +0000 (21:06 +0200)]
etnaviv: advertise correct max LOD bias

The maximum LOD bias supported is the same as the max texture level
supported.

Fixes piglit: ext_texture_lod_bias

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoetnaviv: mask correct channel for RB swapped rendertargets
Lucas Stach [Sun, 4 Jun 2017 19:06:32 +0000 (21:06 +0200)]
etnaviv: mask correct channel for RB swapped rendertargets

Now that we support RB swapped targets by using a shader variant, we
must derive the color mask from both the blend state and the bound
framebuffer.

Fixes piglit: fbo-colormask-formats

Fixes: 7f62ffb68ad ("etnaviv: add support for rb swap")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoetnaviv: replace translate_clear_color with util_pack_color
Lucas Stach [Sun, 4 Jun 2017 19:06:31 +0000 (21:06 +0200)]
etnaviv: replace translate_clear_color with util_pack_color

This replaces the open coded etnaviv version of the color pack with the
common util_pack_color.

Fixes piglits:
arb_color_buffer_float-clear
fcc-front-buffer-distraction
fbo-clearmipmap

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoetnaviv: remove bogus assert
Lucas Stach [Sun, 4 Jun 2017 19:06:30 +0000 (21:06 +0200)]
etnaviv: remove bogus assert

etna_resource_copy_region handles resources with multiple samples
by falling back to the software path. There is no need to kill the
application there.

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoetnaviv: use padded width/height for resource copies
Lucas Stach [Sun, 4 Jun 2017 19:06:29 +0000 (21:06 +0200)]
etnaviv: use padded width/height for resource copies

When copying a resource fully we can just blit the whole level. This allows
to use the RS even for level sizes not aligned to the RS min alignment. This
is especially useful, as etna_copy_resource is part of the software fallback
paths (used in etna_transfer), that are used for doing unaligned copies.

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoetnaviv: don't try RS blit if blit region is unaligned
Lucas Stach [Sun, 4 Jun 2017 19:06:28 +0000 (21:06 +0200)]
etnaviv: don't try RS blit if blit region is unaligned

If the blit region is not aligned to the RS min alignment don't try
to execute the blit, but fall back to the software path.

Fixes: c9e8b49b ("etnaviv: gallium driver for Vivante GPUs")
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Lucas Stach <dev@lynxeye.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoRevert "amd/common: add missing libdrm include path"
Emil Velikov [Fri, 26 May 2017 15:32:53 +0000 (16:32 +0100)]
Revert "amd/common: add missing libdrm include path"

This reverts commit 44b29dd7b6cdc1a3fde58c367b9de8081ac4167b.

Should no longer be required as of last patch.

Cc: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agoac: remove amdgpu.h dependency
Emil Velikov [Mon, 29 May 2017 13:50:47 +0000 (14:50 +0100)]
ac: remove amdgpu.h dependency

Add a couple of forward declarations and drop the amdgpu.h requirement.

With this we can build the r300 and r600 drivers without the need for
amdgpu.

v2:
 - Add amdgpu.h include in the C file (Marek)
 - Add a comment about pre C11 typedef redeclaration warning (Eric)

Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101189
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
7 years agor600g,compute: provide local copy of functions from ac_binary.c
Jan Vesely [Fri, 2 Jun 2017 16:37:07 +0000 (12:37 -0400)]
r600g,compute: provide local copy of functions from ac_binary.c

This is a verbatim copy of the code. The functions can be cleaned up since
r600 does not use all the stuff that gcn does.
The symbol names have been changed since we still use ac_binary.h header
(for struct definition)

v2: Add ifdef guard around r600_binary_clean call (Aaron)
    Remove stray comment

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Tested-By: Aaron Watry <awatry@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agor600: android: amdgpu_common is only required when building OpenCL
Jan Vesely [Fri, 2 Jun 2017 16:37:06 +0000 (12:37 -0400)]
r600: android: amdgpu_common is only required when building OpenCL

v2: split off Android changes

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoegl/display: make platform detection thread-safe
Eric Engestrom [Thu, 15 Jun 2017 22:53:55 +0000 (23:53 +0100)]
egl/display: make platform detection thread-safe

Imagine there are 2 threads that both call _eglGetNativePlatform()
simultaneously:
- thread 1 completes the first "if (native_platform ==
  _EGL_INVALID_PLATFORM)" check and is preempted to do something else
- thread 2 executes the whole function, does "native_platform =
  _EGL_NATIVE_PLATFORM" and just before returning it's preempted
- thread 1 wakes up and calls _eglGetNativePlatformFromEnv() which
  returns _EGL_INVALID_PLATFORM because no env vars are set, updates
  native_platform and then gets preempted again
- thread 2 wakes up and returns wrong _EGL_INVALID_PLATFORM

Solve this by doing the detection in a local var and only overwriting
the global one at the end, if no other thread has updated it since.

This means the platform detected in the thread might not be the platform
returned by the function, but this is a different issue that will need
to be discussed when this becomes possible.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101252
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
7 years agoegl/display: only detect the platform once
Eric Engestrom [Thu, 15 Jun 2017 22:53:54 +0000 (23:53 +0100)]
egl/display: only detect the platform once

My refactor missed the fact that `native_platform` is static.
Add the proper guard around the detection code, as it might not be
necessary, and only print the debug message when a detection was
actually performed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101252
Fixes: 7adb9b094894a512c019 ("egl/display: remove unnecessary code and
                              make it easier to read")
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
7 years agosvga: Relax the format checks for copy_region_vgpu10 somewhat
Thomas Hellstrom [Wed, 14 Jun 2017 13:53:40 +0000 (15:53 +0200)]
svga: Relax the format checks for copy_region_vgpu10 somewhat

The new generic checks were actually more restrictive than the previous svga-
specific tests and not vice versa. So bypass the common format checks for
copy_region_vgpu10.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
7 years agosvga: Fix incorrect format conversion blit destination
Thomas Hellstrom [Wed, 14 Jun 2017 13:39:42 +0000 (15:39 +0200)]
svga: Fix incorrect format conversion blit destination

The blit.dst.resource member that was used as destination was
modified earlier in the function, effectively making us try to blit
the content onto itself. Fix this and also add a debug printout when the
format conversion blits fail.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
7 years agosvga: Fix srgb copy_region regression
Thomas Hellstrom [Wed, 3 May 2017 12:26:02 +0000 (05:26 -0700)]
svga: Fix srgb copy_region regression

This fixes a tf2 srgb copy_region regression from
"svga: Rework the blit and resource_copy_region functionality v3"

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
7 years agosvga: Prefer accelerated blits over cpu copy region
Thomas Hellstrom [Thu, 27 Apr 2017 06:58:47 +0000 (23:58 -0700)]
svga: Prefer accelerated blits over cpu copy region

This reduces the number of cpu copy_region fallbacks on a Nvidia system
running the piglit command

./publish/bin/piglit run  -1 -t copy -t blit tests/quick

from 64789 to 780

Previously this has caused a regression in piglit test
spec@!opengl 1.0@gl-1.0-scissor-copypixels, but I'm currently not able to
reproduce that regression.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agosvga: Support accelerated conditional blitting
Thomas Hellstrom [Wed, 12 Apr 2017 08:38:23 +0000 (10:38 +0200)]
svga: Support accelerated conditional blitting

The blitter has functions to save and restore the conditional rendering state,
but we currently don't save the needed info.

Since also the copy_region_vgpu10 path supports conditional blitting,
we instead use the same function as the clearing routines and move
that function to svga_pipe_query.c

Note that we still haven't implemented conditional blitting with
the software fallbacks.

Fixes piglit nv_conditional_render::copyteximage

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agosvga: Use utility functions to help determine whether we can use copy_region
Thomas Hellstrom [Wed, 12 Apr 2017 07:28:49 +0000 (09:28 +0200)]
svga: Use utility functions to help determine whether we can use copy_region

It seems like the SVGA tests are in general more stringent than the utility
tests, but they also miss some blitter features like filters and window
rectangles, and if new blitter features are added in the future, it might
be possible that we forget adding tests for those.

So in addition to the SVGA tests, use the utility tests to restrict the
situations where we can use copy_region.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agosvga: Rework the blit and resource_copy_region functionality v3
Thomas Hellstrom [Tue, 11 Apr 2017 13:18:04 +0000 (15:18 +0200)]
svga: Rework the blit and resource_copy_region functionality v3

This work was initially trigged by the fact that imported surfaces may
be backed by other SVGA3D formats than the default. Therefore some fixes were
needed to avoid using the copy_region_vgpu10() functionality for incompatible
SVGA3D formats where the pipe formats were OK. This situation happens when
using dri3.

Also in some situations, for example where a R8G8_UNORM surface is backed by
an SVGA3D_NV12 format, we can't use the copy_region functionality at all and
thus need to fall back to the quad blitter also for the resource_copy_region
function. This situation doesn't happen currently, but will if we start using
video textures.

The patch makes the blit- and copy_region paths similar and the decision whether
to use a certain gpu command should now be easy to locate. Probably the
resource_copy_region path will suffer from a minor additional cpu overhead,
but on the other hand there are more cases now that we accelerate, since
we try harder before falling back to cpu copies / blits.

v2: Addressed review comments and fixed up piglit failures by sometimes
preferring cpu_copy_region() over blit().
v3: Removed a stray test statement. Updated commit message.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agoi965: Improve conditional rendering in fallback paths.
Kenneth Graunke [Thu, 2 Mar 2017 00:41:05 +0000 (16:41 -0800)]
i965: Improve conditional rendering in fallback paths.

We need to fall back in a couple of cases:
- Sandybridge (it just doesn't do this in hardware)
- Occlusion queries on Gen7-7.5 with command parser version < 2
- Transform feedback overflow queries on Gen7, or on Gen7.5 with
  command parser version < 7

In these cases, we printed a perf_debug message and fell back to
_mesa_check_conditional_render(), which stalls until the full
query result is available.  Additionally, the code to handle this
was a bit of a mess.

We can do better by using our normal conditional rendering code,
and setting a new state, BRW_PREDICATE_STATE_STALL_FOR_QUERY, when
we would have set BRW_PREDICATE_STATE_USE_BIT.  Only if that state
is set do we perf_debug and potentially stall.  This means we avoid
stalls when we have a partial query result (i.e. we know it's > 0,
but don't have the full value).  The perf_debug should trigger less
often as well.

Still, this is primarily intended as a cleanup.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoconfigure.ac: remove manual AC_SUBST for pthread-stubs
Emil Velikov [Sun, 4 Jun 2017 23:04:02 +0000 (00:04 +0100)]
configure.ac: remove manual AC_SUBST for pthread-stubs

Unneeded, since the PKG_CHECK_MODULES macro already does the
substitution of the package Cflags/Libs.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agoconfigure.ac: add -pthread to PTHREAD_LIBS
Emil Velikov [Sun, 4 Jun 2017 23:03:59 +0000 (00:03 +0100)]
configure.ac: add -pthread to PTHREAD_LIBS

As described inline - follow what's written in the manual and what works
for all platforms that Mesa supports.

We want to untangle things leaving only -pthread, yet that has a
potential of causing regressions. Thus we'll do it as a follow-up patch.

As a nice side-effect this resolves issues, where the system lacks
libpthread.so, yet the linker does not warn about it and we and up with
unresolved symbols.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101071
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomesa: stop assigning unused storage for non-bindless opaque types
Timothy Arceri [Thu, 15 Jun 2017 01:56:28 +0000 (11:56 +1000)]
mesa: stop assigning unused storage for non-bindless opaque types

The storage was once used by get_sampler_uniform_value() but that
was fixed long ago to use the uniform storage assigned by the
linker.

By not assigning storage for images/samplers the constant buffer
for gallium drivers will be reduced which could result in small
perf improvements.

V2: rebase on ARB_bindless_texture

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agoegl/android: Fix typ-o
Robert Foss [Thu, 15 Jun 2017 20:47:53 +0000 (16:47 -0400)]
egl/android: Fix typ-o

Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
7 years agodraw: check for line_width != 1.0f in validate_pipeline()
Brian Paul [Thu, 15 Jun 2017 17:29:38 +0000 (11:29 -0600)]
draw: check for line_width != 1.0f in validate_pipeline()

We shouldn't use the wide line stage if the line width is 1.
This check isn't strictly needed because all drivers are (now)
specifying a line wide threshold of at least 1.0 pixels, but
let's play it safe.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agosvga: clamp device line width to at least 1 to fix HWv8 line stippling
Brian Paul [Thu, 15 Jun 2017 17:31:53 +0000 (11:31 -0600)]
svga: clamp device line width to at least 1 to fix HWv8 line stippling

The line stipple fallback code for virtual HW version 8 didn't work.

With HW version 8, we were getting zero when querying the max line
widths (AA and non-AA).  This means we were setting the draw module's
wide line threshold to zero.  This caused the wide line stage to always
get enabled.  That caused the line stipple module to fall because the
wide line stage was clobbering the rasterization state with a state
object setting the line stipple pattern to 0xffff.

Now the wide_lines variable in draw's validate_pipeline() will not
be incorrectly set.

Also improve debug output.

BTW, also this fixes several other piglit tests: polygon-mode,
primitive- restart-draw-mode, and line-flat-clip-color since they
all use the draw module fallback.

See VMware bug 1895811.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
7 years agodraw: whitespace and formatting fixes
Brian Paul [Thu, 15 Jun 2017 17:40:37 +0000 (11:40 -0600)]
draw: whitespace and formatting fixes

Trivial.

7 years agoautomake: increase the MESA_GIT_SHA1 hash id length from 7 to 10 digits
Brian Paul [Thu, 15 Jun 2017 03:38:31 +0000 (21:38 -0600)]
automake: increase the MESA_GIT_SHA1 hash id length from 7 to 10 digits

The SCons build has been using 10 digits of the git hash id for the
MESA_GIT_SHA1 string in git_sha1.h for about a year now.  I bumped it
up after running into a case where a 7-digit hash ID was ambiguous.

This patch makes the same change for the autotools build.

The command "git log | grep "^commit" | cut -b 8-14 | sort | uniq -d"
shows there are currently 17 cases where 7 digits of hash id are
ambiguous on master (probably quite a few more if we'd consider other
branches).

Instead of using "git log -n 1 --oneline" use
"git rev-parse --short=10 HEAD" to get the HEAD hash id.

v2: use printf instead of sed, per Eric's suggestion.

Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agogallium: Add renderonly-based support for pl111+vc4.
Eric Anholt [Mon, 8 May 2017 22:57:21 +0000 (15:57 -0700)]
gallium: Add renderonly-based support for pl111+vc4.

This follows the model of imx (display) and etnaviv (render): pl111 is a
display-only device, so when asked to do GL for it, we see if we have a
vc4 renderer, make the vc4 screen, and have vc4 call back to pl111 to do
scanout allocations.

The difference from etnaviv is that we share the same BO between vc4 and
pl111, rather than having a vc4 bo and a pl11 bo and copies between the
two.  The only mismatch between their requirements is that vc4 requires
4-pixel (at 32bpp) stride alignment, while pl111 requires that stride
match width.  The kernel will reject any modesets to an incorrect stride,
so the 3D driver doesn't need to worry about that.

v2: Rebase on Android rework, drop unused include.
v3: Fix another Android bug, from Rob Herring's build-testing.

Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoetnaviv: Only use renderonly_get_handle for GEM handles.
Eric Anholt [Wed, 10 May 2017 23:06:11 +0000 (16:06 -0700)]
etnaviv: Only use renderonly_get_handle for GEM handles.

Note that for requests for Prime FDs or flink names, we return handles to
the etanviv BO, not the scanout BO.  This is at least better than previous
behavior of returning GEM handles for a request for an FD or flink name.

And add an assert that renderonly_get_handle is only used for getting the
GEM handle.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoandroid: r600/eg: add support for tracing IBs after a hang.
Mauro Rossi [Sun, 4 Jun 2017 17:11:48 +0000 (19:11 +0200)]
android: r600/eg: add support for tracing IBs after a hang.

The rules to generate egd_tables.h are added in Android makefile

Fixes: f42fb00 "r600/eg: add support for tracing IBs after a hang."
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agosvga: fix git_sha1.h include path in Android.mk (v3)
Mauro Rossi [Tue, 6 Jun 2017 21:28:33 +0000 (23:28 +0200)]
svga: fix git_sha1.h include path in Android.mk (v3)

Adds libmesa_git_sha1 static (dummy) library to generate git_sha1.h
with some polishing to header dependency on .git/HEAD and scripted rules.

The now redundant generation rules are removed from Android.gen.mk
libmesa_git_sha1 whole static depedency is added to libmesa_pipe_svga,
libmesa_dricore and libmesa_st_mesa modules

Fixes the following building error:

external/mesa/src/gallium/drivers/svga/svga_screen.c:26:10:
fatal error: 'git_sha1.h' file not found
         ^
1 error generated.

Fixes: 1ce3a27 ("svga: Add the ability to log messages to
vmware.log on the host.")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agobin/get-fixes-pick-list.sh: better identify multiple "fixes:" tags
Andres Gomez [Sat, 13 May 2017 00:11:08 +0000 (03:11 +0300)]
bin/get-fixes-pick-list.sh: better identify multiple "fixes:" tags

We were not considering as multiple fixes lines with:
Fixes: $sha_1, Fixes: $sha_2
Now, we split the lines so we will consider them individually, as in:
Fixes: $sha_1,
Fixes: $sha_2
Additionally, we try to get the SHA from split lines so:
Fixes:
$sha_1

Will be considered as:
Fixes: $sha_1
v2:
 - Treat empty spaces earlier in fix lines (Emil)
 - Fold 2 lines into one to gather fix commit ids (Emil)

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
7 years agobin/get-fixes-pick-list.sh: parse just the commit message
Andres Gomez [Sat, 13 May 2017 00:11:07 +0000 (03:11 +0300)]
bin/get-fixes-pick-list.sh: parse just the commit message

We were parsing the whole diff, although the candidates were
identified only by the commit message.

Now, we only use the commit message for parsing.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emli.velikov@collabora.com>
7 years agogallium/radeon: fix initialization of new resource bindless fields
Samuel Pitoiset [Wed, 14 Jun 2017 19:11:19 +0000 (21:11 +0200)]
gallium/radeon: fix initialization of new resource bindless fields

r600_resource objects are not calloc'd.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogbm: implement FD import with modifier
Lucas Stach [Thu, 8 Jun 2017 18:56:17 +0000 (20:56 +0200)]
gbm: implement FD import with modifier

This implements a way to import FDs with modifiers on plain GBM devices,
without the need to go through EGL. This is mostly to the benefit of
gbm_gralloc, which can keep its dependencies low.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
7 years agogbm: add API to to import FD with modifier
Lucas Stach [Thu, 8 Jun 2017 18:56:16 +0000 (20:56 +0200)]
gbm: add API to to import FD with modifier

This allows to import an FD with an explicit modifier passed through
userspace protocols.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
7 years agoi965: gen4_blorp_exec.h to the sources list
Emil Velikov [Wed, 14 Jun 2017 16:00:50 +0000 (17:00 +0100)]
i965: gen4_blorp_exec.h to the sources list

We tend to use the sources, as opposed to EXTRA_DIST to include the
headers.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
7 years agogallium/util: Break recursion in pipe_resource_reference
Michel Dänzer [Tue, 13 Jun 2017 03:02:59 +0000 (12:02 +0900)]
gallium/util: Break recursion in pipe_resource_reference

It calling itself recursively prevented it from being inlined, resulting
in a copy being generated in every compilation unit referencing it. This
bloated the text segment of the Gallium mega-driver *_dri.so by ~4%,
and might also have impacted performance.

Fixes: ecd6fce2611e ("mesa/st: support lowering multi-planar YUV")
v2:
* Add comment above pipe_resource_next_reference [Samuel Pitoiset]
v3:
* Use loop to unreference the full chain of resources referenced via
  the next members [Timothy Arceri]
v4:
* Stop chasing ->next chain at the first sub-resource which isn't
  destroyed [Nicolai Hähnle]

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agomesa: fix 'make check' by moving bindless functions at the right place
Samuel Pitoiset [Wed, 14 Jun 2017 16:08:09 +0000 (18:08 +0200)]
mesa: fix 'make check' by moving bindless functions at the right place

Fixes: 5f249b9f05e ("mapi: add GL_ARB_bindless_texture entry points")
Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Aaron Watry <awatry@gmail.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
7 years agoi965/miptree: Use the new simple alloc_tiled for CCS buffers
Jason Ekstrand [Mon, 12 Jun 2017 16:44:20 +0000 (09:44 -0700)]
i965/miptree: Use the new simple alloc_tiled for CCS buffers

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965/bufmgr: Add a new, simpler, bo_alloc_tiled
Jason Ekstrand [Mon, 12 Jun 2017 16:40:42 +0000 (09:40 -0700)]
i965/bufmgr: Add a new, simpler, bo_alloc_tiled

ISL already has all of the complexity required to figure out the correct
surface pitch and size taking tile alignment into account.  When we get
a surface out of ISL, the pitch and size are already correct and using
brw_bo_alloc_tiled_2d doesn't actually gain us anything other than extra
asserts we have to do in order to ensure that the bufmgr code and ISL
agree.  This new helper doesn't try to be smart but just allocates the
BO you ask for and sets up the tiling.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965/bufmgr: Rename bo_alloc_tiled to bo_alloc_tiled_2d
Jason Ekstrand [Mon, 12 Jun 2017 16:35:22 +0000 (09:35 -0700)]
i965/bufmgr: Rename bo_alloc_tiled to bo_alloc_tiled_2d

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Use blorp for depth/stencil clears on gen6+
Jason Ekstrand [Sun, 9 Oct 2016 05:54:00 +0000 (22:54 -0700)]
i965: Use blorp for depth/stencil clears on gen6+

Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965: Set step_rate = 0 for interleaved vertex buffers
Jason Ekstrand [Wed, 7 Jun 2017 00:53:26 +0000 (17:53 -0700)]
i965: Set step_rate = 0 for interleaved vertex buffers

Before, we weren't setting step rate so we got whatever old value
happened to be lying around.  This can lead to some interesting
rendering errors.  In particular, if you run the OpenGL ES CTS with
dEQP-GLES3.functional.instanced.types.mat2x4 immediately followed by one
of the dEQP-GLES3.functional.transform_feedback.* tests, the transform
feedback test gets stale instancing data from the other test and fails.
The only thing that is causing this to not be a problem today is that we
use meta for clears and meta is setting up vertex buffers via the VBO or
non-interleaved path and setting step_rate to 0 for us.  When blorp
depth/stencil clears are enabled, meta is no longer sitting between the
two tests and the stale data starts causing noticeable problems.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965: Disable the interleaved vertex optimization when instancing
Jason Ekstrand [Wed, 7 Jun 2017 03:58:31 +0000 (20:58 -0700)]
i965: Disable the interleaved vertex optimization when instancing

Instance divisor is a property of the vertex buffer and not the vertex
element so if we ever see anything other than 0, bail.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agointel/blorp: Work around Sandy Bridge occlusion query issue
Jason Ekstrand [Thu, 8 Jun 2017 16:36:15 +0000 (09:36 -0700)]
intel/blorp: Work around Sandy Bridge occlusion query issue

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965/blorp: Set no_depth_or_stencil correctly
Jason Ekstrand [Sat, 3 Jun 2017 21:48:15 +0000 (14:48 -0700)]
i965/blorp: Set no_depth_or_stencil correctly

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965: Remove some unneeded fields from brw_context
Jason Ekstrand [Sat, 3 Jun 2017 22:51:29 +0000 (15:51 -0700)]
i965: Remove some unneeded fields from brw_context

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965: Remove some of the remnants of meta
Jason Ekstrand [Sat, 3 Jun 2017 22:20:32 +0000 (15:20 -0700)]
i965: Remove some of the remnants of meta

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agointel/isl: Properly set SeparateStencilBufferEnable on gen5-6
Jason Ekstrand [Fri, 2 Jun 2017 17:36:04 +0000 (10:36 -0700)]
intel/isl: Properly set SeparateStencilBufferEnable on gen5-6

On gen5-6, SeparateStencilBufferEnable and HierarchicalDepthBufferEnable
come hand in hand and we have to set either both or neither.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965/miptree: Choose the stencil layout in miptree_create_layout
Jason Ekstrand [Fri, 2 Jun 2017 17:05:21 +0000 (10:05 -0700)]
i965/miptree: Choose the stencil layout in miptree_create_layout

This ensures that we get the correct layout for all stencil buffers, not
just those which are created as separate stencil for a depth buffer.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agomesa: Add a BUFFER_BITS mask for depth+stencil
Jason Ekstrand [Wed, 12 Oct 2016 21:15:41 +0000 (14:15 -0700)]
mesa: Add a BUFFER_BITS mask for depth+stencil

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965/blorp: Set aux_usage to NONE for miplevels without HiZ
Jason Ekstrand [Mon, 10 Oct 2016 18:18:06 +0000 (11:18 -0700)]
i965/blorp: Set aux_usage to NONE for miplevels without HiZ

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoradeon/winsys: Limit max allocation size to 70% of VRAM
Aaron Watry [Fri, 9 Jun 2017 17:57:42 +0000 (12:57 -0500)]
radeon/winsys: Limit max allocation size to 70% of VRAM

The CL CTS queries the max allocation size, and then attempts to
allocate buffers of that size. If not enough contiguous RAM/VRAM is
available, this causes errors in the radeon kernel module due to
inability to allocate the required memory.

It's a bit of a hack, but experimentally on my system, I can use ~3/4
of the card's VRAM for a single global/constant buffer allocation given
current GUI/compositor use.

For a 1GB Pitcairn (HD7850) this gets me from the reported clinfo values of:
Global memory size                              2143076352 (1.996GiB)
Max memory allocation                           1500153446 (1.397GiB)
Max constant buffer size                        1500153446 (1.397GiB)

To:
Global memory size                              2143076352 (1.996GiB)
Max memory allocation                           751619276 (716MiB)
Max constant buffer size                        751619276 (716MiB)

Fixes: OpenCL CTS test/conformance/api/min_max_mem_alloc_size,
       OpenCL CTS test/conformance/api/min_max_constant_buffer_size

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoi965: Use a line end cap width of 0.5 unless smooth lines enabled.
Kenneth Graunke [Wed, 10 May 2017 09:45:53 +0000 (02:45 -0700)]
i965: Use a line end cap width of 0.5 unless smooth lines enabled.

This updates the Gen4-5 code to use a line end cap width of 0.5
for non-smooth lines, and 1.0 for smooth lines - which is what we
do on Gen6+.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
7 years agoi965: Use brw_get_line_width() in Gen4-5 SF_STATE code.
Kenneth Graunke [Wed, 10 May 2017 09:41:43 +0000 (02:41 -0700)]
i965: Use brw_get_line_width() in Gen4-5 SF_STATE code.

This unifies the Gen4-5 and Gen6+ line width calculations.

I believe it also fixes a bug - we weren't rounding the line width
to the nearest integer.  The GL 4.5 (and GL 2.1) specs "Wide Lines"
section says:

"The actual width of non-antialiased lines is determined by rounding
 the supplied width to the nearest integer, then clamping it to the
 implementation-dependent maximum non-antialiased line width."

We don't need to care about _NEW_MULTISAMPLE here because multisampling
doesn't exist on Gen4-5, so the state shouldn't change.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
7 years agogenxml: Fix Gen4-5 SF_STATE "Line Width" fixed point type.
Kenneth Graunke [Wed, 10 May 2017 09:40:47 +0000 (02:40 -0700)]
genxml: Fix Gen4-5 SF_STATE "Line Width" fixed point type.

It's a U3.1.  It became a U3.7 on Sandybridge.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
7 years agoi965: Stop using BRW_RASTRULE_LOWER_RIGHT on Gen4-5.
Kenneth Graunke [Wed, 10 May 2017 09:25:23 +0000 (02:25 -0700)]
i965: Stop using BRW_RASTRULE_LOWER_RIGHT on Gen4-5.

This effectively reverts Robert Ellison's 2009 commit
cc8afbd3862fedfe42e51c3774960d1c7078ec53.

I'm not seeing any GL spec text indicating that UPPER won't work.
On Gen6+, this bit moved to 3DSTATE_WM as a single bit, controlling
UPPER_LEFT vs. UPPER_RIGHT.  There is no way to request LOWER_RIGHT,
so UPPER_RIGHT is the best you can do.

In the G45 docs, it's marked as "Reserved" as well, but we just
decided to use it anyway.

This patch unifies the behavior between Gen4-5 and Gen6+.

Note that this is separate from point sprite texcoord behavior.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
7 years agoi965: When gl_PointSize is unwritten, default to 1.0 on Gen4-5.
Kenneth Graunke [Wed, 10 May 2017 09:07:46 +0000 (02:07 -0700)]
i965: When gl_PointSize is unwritten, default to 1.0 on Gen4-5.

Modern GL specifications say that the point size should be 1.0 when
gl_PointSize is unwritten and the last enabled stage is a geometry
or tessellation shader.  If it's a vertex shader, though, both the
GL specs and ES 3.0 spec say that it's undefined - so since Gen4-5
only support vertex shaders, there's no actual requirement to do this.

Since there is a cost associated (an extra dirty bit, which may cause
SF_STATE to be emitted more often), it may not be a good idea.

The real benefit is that it makes all generations behave identically.
And that seems somewhat nice...

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
7 years agoi965: Make Gen4-5 SF_STATE use the point size calculations from Gen6+.
Kenneth Graunke [Wed, 10 May 2017 07:57:05 +0000 (00:57 -0700)]
i965: Make Gen4-5 SF_STATE use the point size calculations from Gen6+.

Apparently, Nanhai made the Gen4-5 point size calculations round to the
nearest integer in commit 8d5231a3582e4f2769ac0685cf0174e09750700e,
"according to spec".  When Eric first ported the driver to Sandybridge,
he did not implement this rounding.

In the GL 2.1 and 3.0 specs "Basic Point Rasterization" section, it does
say "If antialiasing and point sprites are disabled, the actual width is
determined by rounding the supplied width to the nearest integer, then
clamping it to the implementation-dependent maximum non-antialised point
width."

In contrast, GL 3.1 and later do not appear to contain this rounding.

It might be reasonable to round, given that we only implement GL 2.1.
Of course, if we were to do that, we should actually implement the AA
vs. non-AA distinction.  Brian added an XXX comment reminding us to fix
this 10 years ago, but it never happened.

I think a better plan is to follow the newer, unrounded behavior.  This
is what we do on Gen6+ and it passes all the relevant conformance tests.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
7 years agoi965: Do an end-of-pipe sync after flushes
Jason Ekstrand [Tue, 13 Jun 2017 17:31:41 +0000 (10:31 -0700)]
i965: Do an end-of-pipe sync after flushes

According to the docs, a simple CS stall is insufficient to ensure that
the memory from the flush is visible and an end-of-pipe sync is needed.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965/blorp: Do an end-of-pipe sync around CCS ops
Jason Ekstrand [Tue, 13 Jun 2017 16:29:42 +0000 (09:29 -0700)]
i965/blorp: Do an end-of-pipe sync around CCS ops

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Do an end-of-pipe sync prior to STATE_BASE_ADDRESS
Jason Ekstrand [Tue, 13 Jun 2017 17:19:56 +0000 (10:19 -0700)]
i965: Do an end-of-pipe sync prior to STATE_BASE_ADDRESS

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Add an end-of-pipe sync helper
Topi Pohjolainen [Fri, 20 Jan 2017 11:17:39 +0000 (13:17 +0200)]
i965: Add an end-of-pipe sync helper

v2 (Jason Ekstrand):
 - Take a flags parameter to control the flushes
 - Refactoring

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Unify the two emit_pipe_control functions
Jason Ekstrand [Tue, 13 Jun 2017 16:59:18 +0000 (09:59 -0700)]
i965: Unify the two emit_pipe_control functions

These two functions contain almost identical logic except for one SNB
workaround required for render target cache flushes.  They may as well
call into the same code so we only have to handle the work-arounds in
one place.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Take a uint64_t immediate in emit_pipe_control_write
Jason Ekstrand [Tue, 13 Jun 2017 16:56:31 +0000 (09:56 -0700)]
i965: Take a uint64_t immediate in emit_pipe_control_write

It's a 64-bit value.  Splitting it up just makes the function arguments
awkward.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Flush around state base address
Jason Ekstrand [Thu, 8 Jun 2017 04:39:52 +0000 (21:39 -0700)]
i965: Flush around state base address

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: Print "force dual color blending" in FS recompile debug output.
Kenneth Graunke [Wed, 14 Jun 2017 08:52:04 +0000 (01:52 -0700)]
i965: Print "force dual color blending" in FS recompile debug output.

I forgot to add this when introducing the new key field.  It doesn't
happen often - just with the Unigine workarounds.  But we may as well
have it, so we get an accurate picture of why recompiles happen.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
7 years agoFix khrplatform.h not installed if EGL is disabled.
Eric Le Bihan [Mon, 12 Jun 2017 11:00:07 +0000 (12:00 +0100)]
Fix khrplatform.h not installed if EGL is disabled.

KHR/khrplatform.h is required by the EGL, GLES and VG headers, but is
only installed if Mesa3d is compiled with EGL support.

This patch installs this header file unconditionally.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77240
Signed-off-by: Eric Le Bihan <eric.le.bihan.dev@free.fr>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoi915: Fix wpos_tex vs. -1 comparison
Ville Syrjälä [Mon, 5 Jun 2017 11:01:58 +0000 (14:01 +0300)]
i915: Fix wpos_tex vs. -1 comparison

wpos_tex used to be a GLuint so assigning -1 to it and
later comparing with -1 worked correctly, but commit
c349031c27b7 ("i915: Fix texcoord vs. varying collision in
fragment programs") changed wpos_tex to uint8_t and hence
broke the comparison. To fix this define a more explicit
invalid value for wpos_tex.

gcc warns us:
i915_fragprog.c:1255:57: warning: comparison is always true due to limited range of data type [-Wtype-limits]
    if (inputsRead & VARYING_BITS_TEX_ANY || p->wpos_tex != -1) {
                                                         ^

And clang says:
i915_fragprog.c:1255:57: warning: comparison of constant -1 with expression of type 'uint8_t' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
   if (inputsRead & VARYING_BITS_TEX_ANY || p->wpos_tex != -1) {
                                            ~~~~~~~~~~~ ^  ~~

Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: c349031c27b7 ("i915: Fix texcoord vs. varying collision in fragment programs")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agotgsi/scan: add missing 'static' to tgsi_is_bindless_image_file()
Samuel Pitoiset [Wed, 14 Jun 2017 09:37:17 +0000 (11:37 +0200)]
tgsi/scan: add missing 'static' to tgsi_is_bindless_image_file()

This should fix compilation errors in some situations.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101418
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoconfigure.ac: Reduce zlib requirement from 1.2.8 to 1.2.3.
Chuck Atkins [Thu, 8 Jun 2017 17:11:32 +0000 (13:11 -0400)]
configure.ac: Reduce zlib requirement from 1.2.8 to 1.2.3.

Testing with zlib versions 1.2.{3,4,5,6,7,8} showed no difference in
functionality, correctness, or zlib API usage and 1.2.3 is the oldest
version available in still actively deployed production Linux
distributions (RHEL/CentOS 6 and SuSE 11).

Build 17.1.1 against the system supplied zlib-devel packages for 1.2.3
in EL6 and 1.2.7 on EL7. I then swapped out the zlib version at runtime
via LD_LIBRARY_PATH with ones build from the release tarballs from
zlib.net

Testwise - I ran the piglit shader profile with --quick addded to the
tests since I figured that would exercise the shader cache, which would
in turn use zlib.

Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
[Emil Velikov: add hunk about version/piglit testing]
Acked-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoradeonsi: enable ARB_bindless_texture
Samuel Pitoiset [Mon, 27 Feb 2017 12:15:38 +0000 (13:15 +0100)]
radeonsi: enable ARB_bindless_texture

This has only been tested on RX480.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: add support for loading bindless images
Samuel Pitoiset [Thu, 30 Mar 2017 14:55:58 +0000 (16:55 +0200)]
radeonsi: add support for loading bindless images

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: add support for loading bindless samplers
Samuel Pitoiset [Thu, 30 Mar 2017 15:34:49 +0000 (17:34 +0200)]
radeonsi: add support for loading bindless samplers

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: invalidate buffers which are made resident if needed
Samuel Pitoiset [Thu, 18 May 2017 22:04:26 +0000 (00:04 +0200)]
radeonsi: invalidate buffers which are made resident if needed

When a buffer becomes resident, check if it has been invalidated,
if so update the descriptor and the dirty flag.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: upload new descriptors when resident buffers are invalidated
Samuel Pitoiset [Thu, 18 May 2017 21:51:26 +0000 (23:51 +0200)]
radeonsi: upload new descriptors when resident buffers are invalidated

When texture buffers are invalidated the addr in the resident
descriptor has to be updated but we can't create a new descriptor
because the resident handle has to be the same.

Instead, use the WRITE_DATA packet which allows to update memory
directly but graphics/compute have to be idle in case the GPU is
reading the descriptor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: only decompress resident textures/images when used
Samuel Pitoiset [Tue, 16 May 2017 20:31:30 +0000 (22:31 +0200)]
radeonsi: only decompress resident textures/images when used

When the current bound shaders don't use any bindless textures
or images, it's useless to decompress the resident resources.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: track use of bindless samplers/images from tgsi_shader_info
Samuel Pitoiset [Tue, 16 May 2017 10:25:28 +0000 (12:25 +0200)]
radeonsi: track use of bindless samplers/images from tgsi_shader_info

This adds some new helper functions to know if the current draw
call (or dispatch compute) is using bindless samplers/images,
based on TGSI analysis.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: decompress resident textures/images before graphics/compute
Samuel Pitoiset [Mon, 15 May 2017 21:50:32 +0000 (23:50 +0200)]
radeonsi: decompress resident textures/images before graphics/compute

Similar to the existing decompression code path except that it
loops over the list of resident textures/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: decompress DCC for resident textures/images
Samuel Pitoiset [Mon, 15 May 2017 21:48:08 +0000 (23:48 +0200)]
radeonsi: decompress DCC for resident textures/images

Analogous to bound textures/images. We should also update the
resident descriptors and disable COMPRESSION_EN for avoiding
useless DCC fetches, but I postpone this optimization for a
separate series.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>