mesa.git
7 years agost/nine: Implement normal transformation with vertex blending
Axel Davy [Thu, 29 Sep 2016 20:16:19 +0000 (22:16 +0200)]
st/nine: Implement normal transformation with vertex blending

The formula is different from the one of the spec,
but otherwise nothing particular.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Increase MaxVertexBlendMatrixIndex
Axel Davy [Sat, 24 Sep 2016 19:09:08 +0000 (21:09 +0200)]
st/nine: Increase MaxVertexBlendMatrixIndex

Modern cards do advertise 8.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Compact ff vs constants a bit
Axel Davy [Sat, 24 Sep 2016 19:05:04 +0000 (21:05 +0200)]
st/nine: Compact ff vs constants a bit

There are several holes. This patch reduces
the holes a bit, which reduces the size of
the constant buffer uploaded.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Fix vertex blending aVtx computation
Axel Davy [Sat, 24 Sep 2016 08:42:08 +0000 (10:42 +0200)]
st/nine: Fix vertex blending aVtx computation

There was a multiplication by world matrix 0
that did not belong there.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Reorganize ff vtx processing
Axel Davy [Sat, 24 Sep 2016 08:22:30 +0000 (10:22 +0200)]
st/nine: Reorganize ff vtx processing

The new order simplifies the code a bit for
the next patches.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Small simplification for position_t and fog
Axel Davy [Sat, 24 Sep 2016 08:14:42 +0000 (10:14 +0200)]
st/nine: Small simplification for position_t and fog

position_t disables fog computation.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Cleaning code for vs temporaries
Axel Davy [Fri, 23 Sep 2016 21:14:36 +0000 (23:14 +0200)]
st/nine: Cleaning code for vs temporaries

This has been a real mess up to now: the temporaries
were allocated once, and shared after that between
the different parts of the code.

To make the code easier to maintain, the temporaries are now
allocated and released as needed.

Surprisingly, this patch, which was
supposed to introduce no behaviour change, actually
solved a visual bug observed on a sample program.
This was due to ureg_normalize3 polluting a temporary
variable.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: No need for the local flag for temporaries in ff
Axel Davy [Fri, 23 Sep 2016 20:24:42 +0000 (22:24 +0200)]
st/nine: No need for the local flag for temporaries in ff

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
7 years agost/nine: Handle D3DRS_NORMALIZENORMALS
Axel Davy [Fri, 23 Sep 2016 19:50:51 +0000 (21:50 +0200)]
st/nine: Handle D3DRS_NORMALIZENORMALS

When this state is set, the normals computed
in the vs ff shader should be normalized.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
7 years agost/nine: Initial ProcessVertices support
Axel Davy [Mon, 19 Sep 2016 17:00:23 +0000 (19:00 +0200)]
st/nine: Initial ProcessVertices support

For now only VS 3 support is implemented.

This enables The Sims 2 to work.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Partial software vertex processing support
Axel Davy [Sat, 17 Sep 2016 12:16:41 +0000 (14:16 +0200)]
st/nine: Partial software vertex processing support

Software Vertex Processing allows:
. Fewer limitations for shaders (more loops, etc.)
. Fewer limitations for ff (more enabled lights, 255
matrices for VertexBlend)

In particular shaders can get more constants.
This patch implements support for this (not using software
rendering, but hardware rendering, as llvmpipe and dx10+ hw
have the same limits...)

This is considered a second-class path. Even apps asking for
"Mixed Vertex processing" (i.e. the ability to switch to swvp
on demand) do not use the feature much. Some just initialize
more constants than the normal limit at the start of the
application, but never use more than the normal limit.
When the apps do not need the software vertex processing
features, they do not seem to turn it on. This means it is
ok if that path is slow.
Thus no effort has been made to optimize this path.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Rework vs int and bool constants buffer
Axel Davy [Tue, 4 Oct 2016 17:45:40 +0000 (19:45 +0200)]
st/nine: Rework vs int and bool constants buffer

This will help to support swvp constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Change dirty tracking for vs int and bool constants
Axel Davy [Tue, 4 Oct 2016 17:29:59 +0000 (19:29 +0200)]
st/nine: Change dirty tracking for vs int and bool constants

This change makes it easier to introduce tracking for
swvp constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Drop unused constant upload path
Axel Davy [Tue, 4 Oct 2016 17:14:42 +0000 (19:14 +0200)]
st/nine: Drop unused constant upload path

This path has been disabled for some time because
of some bugs with it. It hasn't been updated to the
new features, and is not faster.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
7 years agost/nine: Add support for swvp constants in shaders
Axel Davy [Sat, 17 Sep 2016 10:14:58 +0000 (12:14 +0200)]
st/nine: Add support for swvp constants in shaders

swvp has relaxed limits (more nested loops, etc).
In particular it enables more constants.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Initial mixed vertex processing support
Axel Davy [Thu, 15 Sep 2016 21:00:02 +0000 (23:00 +0200)]
st/nine: Initial mixed vertex processing support

In mixed vertex processing, the user can enable or disable
software vertex processing. It is on hardware by default.

This feature is not a state, and thus the setting doesn't
need to be recorded by stateblocks.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Implement SetNPatchMode
Axel Davy [Sun, 11 Sep 2016 14:21:43 +0000 (16:21 +0200)]
st/nine: Implement SetNPatchMode

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Implement D3DUSAGE_SOFTWAREPROCESSING
Axel Davy [Sun, 11 Sep 2016 13:57:12 +0000 (15:57 +0200)]
st/nine: Implement D3DUSAGE_SOFTWAREPROCESSING

Buffers with this flag must be usable with both software
and hardware vertex processing. Use Staging for fast cpu access.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
7 years agost/nine: Allocate more space for ATI1
Patrick Rudolph [Sat, 20 Aug 2016 07:39:08 +0000 (09:39 +0200)]
st/nine: Allocate more space for ATI1

ATIx are "unknown" formats that do not follow block format conventions.
Tests showed that pitch*height bytes are allocated.
apitrace used to depend on this behaviour.
It used to copy more bytes than it has to for the ATI1 block format,
but it didn't crash on Windows.

Increase the buffer size for ATI1 to fix this crash.
The same issue was present in Wine; I have sent a patch for it.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Add missing break
Patrick Rudolph [Sat, 6 Aug 2016 05:54:54 +0000 (07:54 +0200)]
st/nine: Add missing break

Add a missing break statement.

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Implement relative addressing for ps inputs
Axel Davy [Sun, 12 Jun 2016 20:20:21 +0000 (22:20 +0200)]
st/nine: Implement relative addressing for ps inputs

To implement the feature we copy the ps inputs to a temp array.
This is not optimal for performance, but it is the simplest solution.

This feature is very rarely used.
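
The copy-to-temp-array approach described above can be modelled in plain C. This is only a sketch: the four-register input set, the names ps_inputs and read_input_relative, and the clamping of out-of-range indices are assumptions made for illustration, not what the nine shader translator emits.

```c
/* Hypothetical model: inputs live in separate "registers" that cannot be
 * indexed dynamically, so they are first copied into an indexable array. */
struct vec4 { float x, y, z, w; };

struct ps_inputs { struct vec4 v0, v1, v2, v3; };

static struct vec4
read_input_relative(const struct ps_inputs *in, unsigned base, int addr)
{
   /* Copy all inputs to a temporary array (simple, not optimal). */
   struct vec4 tmp[4] = { in->v0, in->v1, in->v2, in->v3 };
   int idx = (int)base + addr;

   /* Assumption: clamp out-of-range indices instead of faulting. */
   if (idx < 0)
      idx = 0;
   if (idx > 3)
      idx = 3;

   return tmp[idx];
}
```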

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Wait for pending tasks to execute in swapchain
Axel Davy [Sat, 7 May 2016 14:25:03 +0000 (16:25 +0200)]
st/nine: Wait for pending tasks to execute in swapchain

Fixes crash after Reset() when using thread_submit=true

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Use fixed size arrays for swapchain buffers
Axel Davy [Sat, 7 May 2016 14:14:00 +0000 (16:14 +0200)]
st/nine: Use fixed size arrays for swapchain buffers

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Fix buffer count check for Ex devices
Patrick Rudolph [Sat, 7 May 2016 14:02:59 +0000 (16:02 +0200)]
st/nine: Fix buffer count check for Ex devices

Signed-off-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Disable seamless cubemap for d3d
Axel Davy [Sun, 17 Apr 2016 17:14:35 +0000 (19:14 +0200)]
st/nine: Disable seamless cubemap for d3d

d3d9 doesn't have seamless cubemap.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Fix some check flags
Axel Davy [Mon, 14 Mar 2016 21:19:54 +0000 (22:19 +0100)]
st/nine: Fix some check flags

Uses the new defines introduced in the previous commit.
See the comment in the commit for more explanation.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agost/nine: Unify some check flags
Axel Davy [Mon, 14 Mar 2016 20:26:19 +0000 (21:26 +0100)]
st/nine: Unify some check flags

The new defines will be reused in a later patch.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
7 years agogallium/util: Really allow aliasing of dst for u_box_union_*
Axel Davy [Fri, 30 Sep 2016 18:48:22 +0000 (20:48 +0200)]
gallium/util: Really allow aliasing of dst for u_box_union_*

Gallium nine relies on aliasing to work with this function.
Without this patch, dirty region tracking was incorrect, which
could lead to incorrect textures or vertex buffers.
Fixes several game bugs with nine.
Fixes https://github.com/iXit/Mesa-3D/issues/234
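
To illustrate why aliasing matters for a box union, here is a minimal sketch on a simplified one-dimensional box. The struct box1d and box1d_union names are hypothetical stand-ins, not the gallium helpers themselves. The point is that every input is read before the first store to dst, so dst may be the same object as a or b.

```c
#include <stdint.h>

/* Hypothetical 1D box, a simplified stand-in for a gallium box. */
struct box1d {
   int32_t x;      /* start coordinate */
   int32_t width;  /* extent along x */
};

static inline int32_t min_i32(int32_t a, int32_t b) { return a < b ? a : b; }
static inline int32_t max_i32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Write the union of a and b to dst; dst may alias a or b. */
static inline void
box1d_union(struct box1d *dst, const struct box1d *a, const struct box1d *b)
{
   /* Read all inputs first. Storing dst->x before computing the end
    * would corrupt the result when dst == a or dst == b. */
   int32_t x = min_i32(a->x, b->x);
   int32_t end = max_i32(a->x + a->width, b->x + b->width);

   dst->x = x;
   dst->width = end - x;
}
```

A dirty-region tracker would typically call this as box1d_union(&dirty, &dirty, &update), which is exactly the aliased case.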

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Patrick Rudolph <siro@das-labor.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
7 years agosoftpipe: Cap to 2 GB on 32 bits
Axel Davy [Thu, 6 Oct 2016 17:42:21 +0000 (19:42 +0200)]
softpipe: Cap to 2 GB on 32 bits

On 32-bit systems, application memory is quite limited.
softpipe uses application memory. To help prevent memory
exhaustion, limit reported memory availability to 2GB.

Some gallium nine apps do check reported memory by allocating
resources until memory is full. Gallium nine refuses allocations
when 80% of the reported memory limit is used. This change
helps some apps to start.
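
A rough sketch of the two mechanisms described above, assuming hypothetical helper names (cap_reported_memory and allocation_allowed are not driver or state tracker functions) and assuming the 80% rule is checked against the total that would be in use after the request:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: cap the reported memory to 2 GB on 32-bit builds. */
static uint64_t
cap_reported_memory(uint64_t system_memory, bool is_32bit)
{
   const uint64_t two_gb = UINT64_C(2) << 30;

   if (is_32bit && system_memory > two_gb)
      return two_gb;
   return system_memory;
}

/* Hypothetical check: refuse an allocation once it would push usage
 * past 80% of the reported limit. */
static bool
allocation_allowed(uint64_t used, uint64_t request, uint64_t reported)
{
   return used + request <= reported / 10 * 8;
}
```

With the cap in place, an app that allocates until failure is refused at roughly 1.6 GB, which still fits in a 32-bit address space.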

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
7 years agollvmpipe: Cap to 2 GB on 32 bits
Axel Davy [Mon, 28 Mar 2016 20:34:35 +0000 (22:34 +0200)]
llvmpipe: Cap to 2 GB on 32 bits

On 32-bit systems, application memory is quite limited.
llvmpipe uses application memory. To help prevent memory
exhaustion, limit reported memory availability to 2GB.

Some gallium nine apps do check reported memory by allocating
resources until memory is full. Gallium nine refuses allocations
when 80% of the reported memory limit is used. This change
helps some apps to start.

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
7 years agogallium/os: Fix overflow on 32 bits
Axel Davy [Thu, 6 Oct 2016 17:35:04 +0000 (19:35 +0200)]
gallium/os: Fix overflow on 32 bits

On 32-bit builds running on systems with more than 4 GB of RAM,
os_get_total_physical_memory triggered an integer
overflow in the Linux and Haiku paths.
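
The class of bug can be sketched as follows. This assumes the total is computed as page count times page size, both typed as long (as values returned by sysconf() would be); total_physical_memory is a hypothetical name, not the gallium function.

```c
#include <stdint.h>

/* Hypothetical helper. With a 32-bit long, phys_pages * page_size
 * overflows once the product no longer fits in 32 bits, so both
 * operands are widened to 64 bits before the multiplication. */
static uint64_t
total_physical_memory(long phys_pages, long page_size)
{
   return (uint64_t)phys_pages * (uint64_t)page_size;
}
```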

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94561

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agost/nine: Memset pipe_resource templates
Axel Davy [Sun, 9 Oct 2016 12:26:32 +0000 (14:26 +0200)]
st/nine: Memset pipe_resource templates

Fixes regression introduced by
ecd6fce2611e88ff8468a354cff8eda39f260a31
and is more future proof than just clearing the next
field.

Other nine usages did already zero out the templates.
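
Why zeroing the whole template is more future proof than clearing one field, as a minimal sketch. The struct resource_template below is a hypothetical stand-in, not the real pipe_resource layout.

```c
#include <string.h>

/* Hypothetical template struct; "next" plays the role of a field
 * that was added after the callers were written. */
struct resource_template {
   unsigned width;
   unsigned height;
   unsigned format;
   struct resource_template *next;
};

static void
init_template(struct resource_template *templ, unsigned width, unsigned height)
{
   /* Zero everything first, so fields added later start out as 0/NULL
    * without every caller having to be updated. */
   memset(templ, 0, sizeof(*templ));
   templ->width = width;
   templ->height = height;
}
```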

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
7 years agonvc0: fix valid range for shader buffers
Samuel Pitoiset [Sun, 9 Oct 2016 20:17:51 +0000 (22:17 +0200)]
nvc0: fix valid range for shader buffers

When offset != 0, the valid range was wrong because the second
argument of util_range_add() is end, not size.
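
The start/end convention can be shown with a simplified model. The valid_range struct and both functions below are hypothetical and only mirror the semantics stated above (the second argument is an end offset, not a size).

```c
/* Hypothetical range tracker: [start, end) of bytes known to be valid.
 * An empty range is represented as start = ~0u, end = 0. */
struct valid_range {
   unsigned start;
   unsigned end;
};

/* Extend the range to cover [start, end). Note: end, not size. */
static void
range_add(struct valid_range *r, unsigned start, unsigned end)
{
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}

/* Correct caller: converts (offset, size) to (start, end).
 * Passing size directly is only right when offset == 0. */
static void
mark_buffer_written(struct valid_range *r, unsigned offset, unsigned size)
{
   range_add(r, offset, offset + size);
}
```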

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonvc0/ir: fix overwriting of value backing non-constant gather offset
Ilia Mirkin [Mon, 10 Oct 2016 16:06:59 +0000 (12:06 -0400)]
nvc0/ir: fix overwriting of value backing non-constant gather offset

Normally the value is an immediate, which is moved to some temporary, so
there's no problem. In the case of a non-constant offset (as allowed by
ARB_gpu_shader5), we have to take care to copy it first before using it
to build up the bits.

This fixes a compilation error observed in F1 2015.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
7 years agoglsl: Add missing cache_destroy stub function.
Vinson Lee [Fri, 7 Oct 2016 20:57:44 +0000 (13:57 -0700)]
glsl: Add missing cache_destroy stub function.

  CC       glsl/tests/cache_test.o
glsl/tests/cache_test.c: In function ‘test_cache_create’:
glsl/tests/cache_test.c:160:4: error: implicit declaration of function ‘cache_destroy’ [-Werror=implicit-function-declaration]
    cache_destroy(cache);
    ^

Fixes: 87ab26b2ab35 ("glsl: Add initial functions to implement an on-disk cache")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
7 years agodocs: Mark GL_OES_viewport_array done on i965
Anuj Phogat [Wed, 5 Oct 2016 19:18:55 +0000 (12:18 -0700)]
docs: Mark GL_OES_viewport_array done on i965

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
7 years agoegl: Unify the EGLint/EGLAttrib paths in eglCreateSync* (v3)
Chad Versace [Tue, 27 Sep 2016 20:27:21 +0000 (13:27 -0700)]
egl: Unify the EGLint/EGLAttrib paths in eglCreateSync* (v3)

Pre-patch, there were two code paths for parsing EGLSync attribute
lists: one path for old-style EGLint lists, used by eglCreateSyncKHR,
and another for new-style EGLAttrib lists, used by eglCreateSync (1.5)
and eglCreateSync64 (EGL_KHR_cl_event2).

There were two attrib_list parsing functions,
  _eglParseSyncAttribList(_EGLSync *sync, const EGLint *attrib_list)
  _eglParseSyncAttribList64(_EGLSync *sync, const EGLAttrib *attrib_list)
This patch unifies the two attrib_list parsing functions into one,
  _eglParseSyncAttribList(_EGLSync *sync, const EGLAttrib *attrib_list)

Many internal EGLSync function signatures had *two* attrib_list
parameters to accommodate both code paths: one parameter was an EGLint
list and the other an EGLAttrib list. At most one of the parameters was
allowed to be non-null.  This patch removes the `EGLint *attrib_list`
parameter, leaving only the `EGLAttrib *attrib_list` parameter, for all
internal EGLSync functions.
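
One way an entry point holding an old-style EGLint list could feed a parser that only accepts EGLAttrib lists is to widen the list first. A minimal sketch, using stand-in typedefs instead of the real EGL headers; widen_attrib_list is a hypothetical name and not necessarily what this patch does.

```c
#include <stdint.h>
#include <stdlib.h>

typedef int32_t  attr_int;   /* stands in for EGLint */
typedef intptr_t attr_wide;  /* stands in for EGLAttrib */

#define ATTR_NONE 0x3038     /* list terminator, same value as EGL_NONE */

/* Convert a key/value list terminated by ATTR_NONE into a newly
 * allocated wide list. Returns NULL for a NULL input or on allocation
 * failure; the caller frees the result. */
static attr_wide *
widen_attrib_list(const attr_int *int_list)
{
   size_t len = 0;
   attr_wide *out;

   if (int_list == NULL)
      return NULL;

   /* Entries come in key/value pairs, so step by two. */
   while (int_list[len] != ATTR_NONE)
      len += 2;

   out = malloc((len + 1) * sizeof(*out));
   if (out == NULL)
      return NULL;

   for (size_t i = 0; i < len; i++)
      out[i] = int_list[i];
   out[len] = ATTR_NONE;

   return out;
}
```

When both types have the same size, as the v2 note below hints, the copy can be skipped and the list passed through as-is.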

v2:
  - Consistently use condition (sizeof(int_list[0]) ==
    sizeof(attrib_list[0])). [for emil]
v3:
  - Don't double-unlock the display in eglCreateSyncKHR.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v2)
7 years agointel: Fix bash-specific redirection.
Eric Anholt [Mon, 10 Oct 2016 16:18:19 +0000 (09:18 -0700)]
intel: Fix bash-specific redirection.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agogallium: Fix install-gallium-links.mk on non-bash /bin/sh
Eric Anholt [Thu, 6 Oct 2016 22:19:21 +0000 (15:19 -0700)]
gallium: Fix install-gallium-links.mk on non-bash /bin/sh

Debian uses dash by default, which doesn't do '+='.  Fixes servo's
osmesa-based headless testing system, which was looking for libOSMesa in
the lib/ directory.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Cc: mesa-stable@lists.freedesktop.org
7 years agonv50/ir: only stick one preret per function
Ilia Mirkin [Sun, 9 Oct 2016 04:09:54 +0000 (00:09 -0400)]
nv50/ir: only stick one preret per function

A function with multiple returns would have had multiple preret settings
at the top of the function. While this is unlikely to have caused issues
since we don't use functions in earnest, it could have in some cases
overflowed the call stack, in case a function had a lot of early
returns.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agoradeonsi: make more use of si_have_tgsi_compute
Nicolai Hähnle [Thu, 6 Oct 2016 20:57:55 +0000 (22:57 +0200)]
radeonsi: make more use of si_have_tgsi_compute

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/radeon: assign a name to LLVM output variables in debug builds
Nicolai Hähnle [Fri, 7 Oct 2016 15:14:54 +0000 (17:14 +0200)]
gallium/radeon: assign a name to LLVM output variables in debug builds

This can be helpful with R600_DEBUG=preoptir.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/radeon: avoid redundant work with overlapping in/out arrays
Nicolai Hähnle [Fri, 7 Oct 2016 10:54:34 +0000 (12:54 +0200)]
gallium/radeon: avoid redundant work with overlapping in/out arrays

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeonsi: support ARB_compute_variable_group_size
Nicolai Hähnle [Fri, 9 Sep 2016 08:08:11 +0000 (10:08 +0200)]
radeonsi: support ARB_compute_variable_group_size

Not sure if it's possible to avoid programming the block size twice (once for
the userdata and once for the dispatch).

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoanv: turn on samplerAnisotropy in VkPhysicalDeviceFeatures
Lionel Landwerlin [Fri, 7 Oct 2016 12:53:04 +0000 (13:53 +0100)]
anv: turn on samplerAnisotropy in VkPhysicalDeviceFeatures

According to the Vulkan spec 5.63.4 :

  samplerAnisotropy indicates whether anisotropic filtering is supported. If
  this feature is not enabled, the maxAnisotropy member of the
  VkSamplerCreateInfo structure must be 1.0.

Since we already set maxAnisotropy to 16 and program the hardware according
to the VkSamplerCreateInfo.maxAnisotropy, it seems we can turn this on.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoradv: Use proper header guards over 'pragma once' directives
Edward O'Callaghan [Fri, 7 Oct 2016 11:19:19 +0000 (22:19 +1100)]
radv: Use proper header guards over 'pragma once' directives

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agomesa: throw error if bufSize negative in GetSynciv on OpenGL ES
Tapani Pälli [Fri, 7 Oct 2016 05:41:15 +0000 (08:41 +0300)]
mesa: throw error if bufSize negative in GetSynciv on OpenGL ES

Fixes following dEQP tests:

   dEQP-GLES31.functional.debug.negative_coverage.callbacks.state.get_synciv
   dEQP-GLES31.functional.debug.negative_coverage.get_error.state.get_synciv
   dEQP-GLES31.functional.debug.negative_coverage.log.state.get_synciv

v2: drop _mesa_is_gles check (Kenneth)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98133
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoglsl: prohibit lowp, mediump precision on atomic_uint
Tapani Pälli [Fri, 7 Oct 2016 05:23:41 +0000 (08:23 +0300)]
glsl: prohibit lowp, mediump precision on atomic_uint

Fixes following dEQP tests:

   dEQP-GLES31.functional.debug.negative_coverage.callbacks.atomic_counter.atomic_precision
   dEQP-GLES31.functional.debug.negative_coverage.get_error.atomic_counter.atomic_precision
   dEQP-GLES31.functional.debug.negative_coverage.log.atomic_counter.atomic_precision

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98131
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoglsl: optimize copy_propagation_elements pass
Tapani Pälli [Fri, 30 Sep 2016 15:12:12 +0000 (08:12 -0700)]
glsl: optimize copy_propagation_elements pass

Changes make copy_propagation_elements pass faster, reducing link
time spent in test case of bug 94477. Does not fix the actual issue
but brings down the total time. No regressions seen in CI.

v2 (idr): Formatting / whitespace fixes.  Embed the acp_ref in the
acp_entry.

v3 (idr): Delete unused copy constructor.  Use while(pop_head) instead
of foreach() { remove }.
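
The while(pop_head) idiom from v3 can be sketched on a plain singly linked list. The node and list types here are hypothetical; Mesa's own list helpers differ.

```c
#include <stdlib.h>

struct node {
   struct node *next;
   int value;
};

struct list {
   struct node *head;
};

/* Detach and return the first node, or NULL if the list is empty. */
static struct node *
pop_head(struct list *l)
{
   struct node *n = l->head;

   if (n != NULL)
      l->head = n->next;
   return n;
}

/* Empty the list by popping until nothing is left. No iterator is kept
 * across a removal, so there is no "safe foreach" bookkeeping.
 * Returns the number of nodes freed. */
static unsigned
clear_list(struct list *l)
{
   struct node *n;
   unsigned freed = 0;

   while ((n = pop_head(l)) != NULL) {
      free(n);
      freed++;
   }
   return freed;
}
```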

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoradv: don't build without SHA1.
Dave Airlie [Mon, 10 Oct 2016 00:06:52 +0000 (10:06 +1000)]
radv: don't build without SHA1.

Just copy the section from anv above this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98167
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agodocs/features.txt: Add GL_KHR_robustness supported on ES 3.2
Edward O'Callaghan [Fri, 7 Oct 2016 12:21:32 +0000 (23:21 +1100)]
docs/features.txt: Add GL_KHR_robustness supported on ES 3.2

Both radeonsi and nvc0 should also support ES so fixup doc.

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agointel: aubinator: enable loading dumps from standard input
Lionel Landwerlin [Tue, 4 Oct 2016 14:15:50 +0000 (15:15 +0100)]
intel: aubinator: enable loading dumps from standard input

In conjunction with an intel_aubdump change, you can now look at your
application's output like this:

$ intel_aubdump -c '/path/to/aubinator --gen=hsw' my_gl_app

v2: Add print_help() comment about standard input handling (Eero)
    Remove shrunk gtt space debug workaround (Eero)

v3: Use realloc rather than memcpy/free (Ben)
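
Reading a dump of unknown length from a stream, growing the buffer with realloc as mentioned in v3, might look like the sketch below. read_stream is a hypothetical name, and the doubling growth policy and 4096-byte initial size are assumptions rather than aubinator's actual choices.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole stream into a heap buffer. Returns NULL on allocation
 * failure; otherwise stores the byte count in *out_size and returns
 * the buffer, which the caller frees. */
static unsigned char *
read_stream(FILE *f, size_t *out_size)
{
   unsigned char *buf = NULL;
   size_t size = 0, capacity = 0;

   for (;;) {
      if (size == capacity) {
         /* realloc keeps the old contents, so no memcpy/free is needed. */
         size_t new_capacity = capacity ? capacity * 2 : 4096;
         unsigned char *tmp = realloc(buf, new_capacity);

         if (tmp == NULL) {
            free(buf);
            return NULL;
         }
         buf = tmp;
         capacity = new_capacity;
      }

      size_t n = fread(buf + size, 1, capacity - size, f);
      if (n == 0)
         break; /* end of file or read error */
      size += n;
   }

   *out_size = size;
   return buf;
}
```

Calling read_stream(stdin, &size) covers the piped case shown in the command line above.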

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
7 years agointel: aubinator: enable loading xml files from a given directory
Lionel Landwerlin [Wed, 5 Oct 2016 22:52:51 +0000 (23:52 +0100)]
intel: aubinator: enable loading xml files from a given directory

This might be useful for people who debug with out of tree descriptions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sirisha Gandikota <Sirisha.Gandikota@intel.com>
7 years agointel: aubinator: generate a standalone binary
Lionel Landwerlin [Tue, 4 Oct 2016 15:19:43 +0000 (16:19 +0100)]
intel: aubinator: generate a standalone binary

Embed the xml files into the binary, so aubinator can be used from any
location.

v2: Split generation packing into another patch (Jason)
    Check for xxd (Jason)

v3: Fix out of tree builds (Jason)
    Generate custom variable name rather than names generated by xxd
    (Lionel)

v4: Move generated _xml.h files to genxml/ (Sirisha)

v5: Remove newline from makefile (Jason)

v6: Add comment on gen*_xml.h creation (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/TODO: Update the HiZ task
Nanley Chery [Fri, 7 Oct 2016 19:07:33 +0000 (12:07 -0700)]
anv/TODO: Update the HiZ task

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
7 years agoanv: Enable fast depth clears
Nanley Chery [Fri, 7 Oct 2016 19:07:31 +0000 (12:07 -0700)]
anv: Enable fast depth clears

Provides an FPS increase of ~30% on the Sascha triangle and multisampling
demos.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
7 years agoanv/cmd_buffer: Enable rendering to HiZ
Chad Versace [Thu, 6 Oct 2016 22:21:53 +0000 (15:21 -0700)]
anv/cmd_buffer: Enable rendering to HiZ

Nanley Chery:
(rebase)
 - Resolve conflicts with new anv_batch_emit macro
(amend)
 - Handle a QPitch TODO
 - Emit 3DSTATE_HIER_DEPTH_BUFFER on pre-BDW systems
 - Only use HiZ for single-subpass renderpasses
 - Emit the HiZ instruction before the stencil instruction to follow the
   optimized clear sequence specified in the PRMs
 - Don't modify clear params
 - Enable resolves when a HiZ buffer is used to ensure depth buffer validity

Provides an FPS increase of ~15% on the Sascha triangle and multisampling
demos.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/cmd_buffer: Add code for performing HZ operations
Nanley Chery [Fri, 7 Oct 2016 19:07:27 +0000 (12:07 -0700)]
anv/cmd_buffer: Add code for performing HZ operations

Create a function that performs one of three HiZ operations -
depth/stencil clears, HiZ resolve, and depth resolves.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
7 years agoanv/image: Memset hiz surfaces to 0 when binding memory
Jason Ekstrand [Thu, 6 Oct 2016 22:21:51 +0000 (15:21 -0700)]
anv/image: Memset hiz surfaces to 0 when binding memory

Nanley Chery (amend):
 - Change memset value from 0xff to 0 (a defined value for HiZ).

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Move BindImageMemory to anv_image.c
Jason Ekstrand [Thu, 6 Oct 2016 22:21:50 +0000 (15:21 -0700)]
anv: Move BindImageMemory to anv_image.c

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Allocate hiz surface
Chad Versace [Thu, 6 Oct 2016 22:21:49 +0000 (15:21 -0700)]
anv: Allocate hiz surface

Nanley Chery:
(rebase)
 - Use isl_surf_get_hiz_surf()
(amend)
 - Only add a HiZ surface onto a depth/stencil attachment
 - Add comment above HiZ surface addition
 - Hide HiZ behind INTEL_VK_HIZ prior to BDW
 - Disable HiZ for untested cases
 - Remove DISABLE_AUX_BIT instead of preventing it from being added

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
7 years agoanv: Add func anv_image_has_hiz()
Chad Versace [Thu, 6 Oct 2016 22:21:48 +0000 (15:21 -0700)]
anv: Add func anv_image_has_hiz()

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Add anv_image::hiz_surface
Chad Versace [Thu, 6 Oct 2016 22:21:47 +0000 (15:21 -0700)]
anv: Add anv_image::hiz_surface

Unused.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoisl: Correct a comment in the isl_format enum
Nanley Chery [Fri, 7 Oct 2016 19:07:22 +0000 (12:07 -0700)]
isl: Correct a comment in the isl_format enum

HiZ is not a color surface, but an auxiliary depth surface.

Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agogallium: add missing zero-init for resource templates
Rob Clark [Fri, 7 Oct 2016 15:58:16 +0000 (11:58 -0400)]
gallium: add missing zero-init for resource templates

Mostly test code, plus one spot I noticed in r600.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agofreedreno: don't try to shadow layered textures
Rob Clark [Fri, 7 Oct 2016 15:59:28 +0000 (11:59 -0400)]
freedreno: don't try to shadow layered textures

We will only hit this with multi-planar YUV external images, so we would
probably never hit this code path in the first place.  But if we did, it
wouldn't do the right thing so just bail.

Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agofreedreno/a3xx+a4xx: fix clip-plane lowering state
Rob Clark [Mon, 3 Oct 2016 20:24:07 +0000 (16:24 -0400)]
freedreno/a3xx+a4xx: fix clip-plane lowering state

If enabled clip-planes have changed, we need to mark program state
dirty.

Signed-off-by: Rob Clark <robdclark@gmail.com>
7 years agoglsl: Let cache_test build when the shader cache is not enabled
Ian Romanick [Wed, 5 Oct 2016 20:21:53 +0000 (13:21 -0700)]
glsl: Let cache_test build when the shader cache is not enabled

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Tested-by: Aaron Watry <awatry@gmail.com>
7 years agoanv: pipeline cache: fix return value of vkGetPipelineCacheData
Lionel Landwerlin [Fri, 7 Oct 2016 16:16:55 +0000 (17:16 +0100)]
anv: pipeline cache: fix return value of vkGetPipelineCacheData

According to the spec - 9.6. Pipeline Cache :

  If pDataSize is less than the maximum size that can be retrieved by the
  pipeline cache, at most pDataSize bytes will be written to pData, and
  vkGetPipelineCacheData will return VK_INCOMPLETE.
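
The return-value rule quoted above, as a self-contained sketch. The enum and get_cache_data are hypothetical stand-ins for VkResult and the driver entry point, and the NULL-pointer size query is the usual Vulkan two-call idiom rather than something stated in this commit. A real driver may also write fewer bytes than requested, for example only whole cache entries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for VK_SUCCESS / VK_INCOMPLETE. */
enum result { RESULT_SUCCESS, RESULT_INCOMPLETE };

static enum result
get_cache_data(const void *cache, size_t cache_size,
               size_t *data_size, void *data)
{
   /* Size query: report how many bytes could be retrieved. */
   if (data == NULL) {
      *data_size = cache_size;
      return RESULT_SUCCESS;
   }

   /* Buffer too small: write at most *data_size bytes and say so. */
   if (*data_size < cache_size) {
      memcpy(data, cache, *data_size);
      return RESULT_INCOMPLETE;
   }

   memcpy(data, cache, cache_size);
   *data_size = cache_size;
   return RESULT_SUCCESS;
}
```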

Fixes the following test from Vulkan CTS :

  dEQP-VK.pipeline.cache.pipeline_from_incomplete_get_data.vertex_stage_fragment_stage
  dEQP-VK.pipeline.cache.pipeline_from_incomplete_get_data.vertex_stage_geometry_stage_fragment_stage
  dEQP-VK.pipeline.cache.misc_tests.invalid_size_test

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoutil: remove unused variable
Timothy Arceri [Fri, 7 Oct 2016 10:10:37 +0000 (21:10 +1100)]
util: remove unused variable

Also initialise page at declaration.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoloader/dri3: import prime buffers in the currently-bound screen
Martin Peres [Thu, 6 Oct 2016 14:10:35 +0000 (17:10 +0300)]
loader/dri3: import prime buffers in the currently-bound screen

This tries to mirror the codepath taken by DRI2 in IntelSetTexBuffer2()
and fixes many applications when using DRI3:
 - Totem with libva on hw-accelerated decoding
 - obs-studio, using Window Capture (Xcomposite) as a Source
 - gstreamer with VAAPI

v2:
 - introduce get_dri_screen() in the dri3 loader's vtable (krh)

Tested-by: Timo Aaltonen <tjaalton@ubuntu.com>
Tested-by: Ionut Biru <biru.ionut@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71759
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
7 years agoloader/dri3: add get_dri_screen() to the vtable
Martin Peres [Thu, 6 Oct 2016 14:07:22 +0000 (17:07 +0300)]
loader/dri3: add get_dri_screen() to the vtable

This allows querying the current active screen from the
loader's common code.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
7 years agoanv/entrypoints: Save off the entire devinfo rather than a pointer
Jason Ekstrand [Fri, 7 Oct 2016 00:16:51 +0000 (17:16 -0700)]
anv/entrypoints: Save off the entire devinfo rather than a pointer

Since the gen_device_info structs are no longer just constant memory, a
pointer to one is not a pointer to something in the .data section so we
shouldn't be storing it in a static variable.  Instead, we should just
store the entire device_info structure.
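
The difference between keeping a pointer and keeping a copy in a static variable, as a minimal sketch. The struct device_info and both functions are hypothetical and much smaller than the real structures.

```c
/* Hypothetical device description; the real struct has many more fields. */
struct device_info {
   int gen;
   int num_slices;
};

/* A full copy: stays valid even if the caller's struct lives on the
 * stack or is modified later. A static pointer would dangle instead. */
static struct device_info saved_devinfo;

static void
set_dispatch_devinfo(const struct device_info *devinfo)
{
   saved_devinfo = *devinfo;
}

static int
saved_gen(void)
{
   return saved_devinfo.gen;
}
```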

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoradv: drop all uint for unsigned.
Dave Airlie [Fri, 7 Oct 2016 02:08:26 +0000 (12:08 +1000)]
radv: drop all uint for unsigned.

Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agovc4: Don't worry about partial Z/S clear if the other is already cleared.
Eric Anholt [Thu, 6 Oct 2016 21:50:04 +0000 (14:50 -0700)]
vc4: Don't worry about partial Z/S clear if the other is already cleared.

We have to be careful to not smash the value they're clearing to, but
other than that we're fine.  Avoids quad clears in Processing, which likes
to do glClear(Z|S); glClear(Z).
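
A rough sketch of the check described above, under stated assumptions: job_sketch, the CLEAR_* bits and try_fast_zs_clear() are hypothetical names, the buffer is assumed to hold both depth and stencil, and no draws are assumed to be queued yet. The real driver has more conditions than this.

```c
#include <stdbool.h>
#include <stdint.h>

#define CLEAR_DEPTH   (1u << 0)
#define CLEAR_STENCIL (1u << 1)
#define CLEAR_ZS      (CLEAR_DEPTH | CLEAR_STENCIL)

/* Hypothetical per-job clear state. */
struct job_sketch {
   unsigned cleared;       /* aspects already fast-cleared in this job */
   uint32_t clear_depth;
   uint8_t clear_stencil;
};

/* Returns true if the clear can be folded into the job's fast clear,
 * false if the caller has to fall back to drawing a quad.
 */
bool
try_fast_zs_clear(struct job_sketch *job, unsigned buffers,
                  uint32_t depth, uint8_t stencil)
{
   unsigned zs = buffers & CLEAR_ZS;
   unsigned other = zs ^ CLEAR_ZS;

   if (zs == 0)
      return true;   /* nothing to clear in the Z/S buffer */

   /* A partial clear is only safe if the other aspect is already cleared. */
   if (zs != CLEAR_ZS && (job->cleared & other) != other)
      return false;

   /* Only update the value being cleared, so the other one is not smashed. */
   if (zs & CLEAR_DEPTH)
      job->clear_depth = depth;
   if (zs & CLEAR_STENCIL)
      job->clear_stencil = stencil;

   job->cleared |= zs;
   return true;
}
```

With glClear(Z|S) followed by glClear(Z), the second call only replaces the depth clear value and leaves the stencil clear value alone.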

Improves performance of Processing's QuadRendering demo at 5000 quads by
5.46507% +/- 1.35576% (n=15 before, 32 after)

7 years agovc4: Try to fix the HW-2116 workaround.
Eric Anholt [Thu, 6 Oct 2016 21:45:23 +0000 (14:45 -0700)]
vc4: Try to fix the HW-2116 workaround.

We were incrementing the count at the end of vc4_start_draw(), except that
that function returns immediately if we've already started drawing on this
batch.  It also failed to count the statechanges from the GFXH-515
workaround.

This incidentally allows repeated glClear() to be coalesced, because the
fast clears aren't counted in draw_calls_queued any more.  Fixes most of
the extra flushes in Processing, which emits glClear(Z|S); glClear(Z);
glClear(C) during its frame setup.
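
The counting problem can be illustrated with a small sketch. The names below (job_sketch, start_draw, draw, fast_clear) are hypothetical and the structure is heavily simplified; the point is only where the increment lives relative to the early return.

```c
#include <stdbool.h>

/* Hypothetical, simplified job state. */
struct job_sketch {
   bool draw_started;
   unsigned draw_calls_queued;
   unsigned cleared;
};

/* One-time setup per batch; returns early once the batch has started. */
static void
start_draw(struct job_sketch *job)
{
   if (job->draw_started)
      return;
   job->draw_started = true;
   /* An increment placed here would only ever count the first draw. */
}

/* Counting in the caller covers every draw, not just the first one. */
void
draw(struct job_sketch *job)
{
   start_draw(job);
   job->draw_calls_queued++;
}

/* Fast clears only record state and do not count as queued draws. */
void
fast_clear(struct job_sketch *job, unsigned buffers)
{
   job->cleared |= buffers;
}
```

Since fast_clear() leaves draw_calls_queued at zero, several clears in a row can accumulate in the same job.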

Improves performance of Processing's QuadRendering demo at 5000 quads by
3.33538% +/- 2.05846% (n=21 before, 15 after)

7 years agovc4: Drop dead argument from vc4_start_draw().
Eric Anholt [Thu, 6 Oct 2016 21:35:58 +0000 (14:35 -0700)]
vc4: Drop dead argument from vc4_start_draw().

7 years agovc4: Fix fallback to quad clears of depth in GLX.
Eric Anholt [Wed, 5 Oct 2016 21:10:30 +0000 (14:10 -0700)]
vc4: Fix fallback to quad clears of depth in GLX.

The fix in the vc4-jobs series ended up triggering the fallback path on
GLX apps that use depth but not stencil.

7 years agovc4: Add the format name in miptree_debug.
Eric Anholt [Wed, 5 Oct 2016 21:22:09 +0000 (14:22 -0700)]
vc4: Add the format name in miptree_debug.

I was curious if my Z/S buffer was actually ZS or ZX, and the vc4 format
of "0" didn't tell me much.

7 years agovc4: Fix perf debug formatting on partial Z/S clear.
Eric Anholt [Wed, 5 Oct 2016 21:05:51 +0000 (14:05 -0700)]
vc4: Fix perf debug formatting on partial Z/S clear.

7 years agovc4: Drop destination register when it's unused.
Eric Anholt [Wed, 5 Oct 2016 16:21:37 +0000 (09:21 -0700)]
vc4: Drop destination register when it's unused.

This slightly reduces instructions on shader-db, but I think it's just
perturbing register allocation -- the allocator should always have
trivially colored these nodes before.  This commit is just to make the QIR
code more intelligible when register allocation fails.

7 years agovc4: Fix live intervals analysis for screening defs in if statements.
Eric Anholt [Wed, 5 Oct 2016 16:07:46 +0000 (09:07 -0700)]
vc4: Fix live intervals analysis for screening defs in if statements.

If a conditional assignment is only conditioned on the exec mask, that's
still screening off the value in the executed channels (and, since we're
not storing to the unexecuted channels, we don't care what's in there).

Fixes a bunch of extra register pressure on Processing's Ribbons demo,
which is failing to allocate.

7 years agovc4: Fix simulator when more than one vc4_screen is opened.
Eric Anholt [Tue, 4 Oct 2016 23:29:26 +0000 (16:29 -0700)]
vc4: Fix simulator when more than one vc4_screen is opened.

We would hit an assertion failure when setting up the simulator the second
time around.  This at least postpones the assertion failure until we've closed
all of the first set of screens and started opening a new set.

7 years agovc4: Fix assertion fails from trying to cast non-ALU instrs to ALU.
Eric Anholt [Thu, 6 Oct 2016 23:46:35 +0000 (16:46 -0700)]
vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU.

Fixes 100 piglit tests since the assertions were added to nir.h.  What's
amazing is that these tests used to pass, even when casting garbage.

7 years agoanv/cmd_buffer: Move the clear_subpasses calls to set_subpass
Jason Ekstrand [Wed, 5 Oct 2016 23:54:57 +0000 (16:54 -0700)]
anv/cmd_buffer: Move the clear_subpasses calls to set_subpass

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
7 years agoanv/cmd_buffer: Don't call set_subpass in a secondary
Jason Ekstrand [Wed, 5 Oct 2016 23:51:02 +0000 (16:51 -0700)]
anv/cmd_buffer: Don't call set_subpass in a secondary

Initially, we had intended set_subpass to be an interesting function that
did whatever setup (presumably a lot) we needed for a subpass.  In reality,
it just sets a pointer and a dirty bit and then emits depth and stencil
state.  When we call BeginCommandBuffer on a secondary, there's no point in
setting depth and stencil state since it will already be set by the
primary.  Instead, the only thing we need to do at the start of a secondary
is set the subpass pointer and the dirty bit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
7 years agoanv/cmd_buffer: Rework descriptor dirtying in set_subpass
Jason Ekstrand [Thu, 6 Oct 2016 22:50:21 +0000 (15:50 -0700)]
anv/cmd_buffer: Rework descriptor dirtying in set_subpass

We have a DIRTY_RENDER_TARGETS flag and that makes a lot more sense than
just dirtying fragment descriptors.  We're checking for it in some of the
gen7 code but unfortunately, nothing was setting it and it didn't do what
it was supposed to do in cmd_buffer_flush_state.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv/wsi: Advertise UNORM formats as well as sRGB
Jason Ekstrand [Wed, 5 Oct 2016 20:23:22 +0000 (13:23 -0700)]
anv/wsi: Advertise UNORM formats as well as sRGB

Because WSI images are created with VkImageCreateInfo::flags explicitly set
to 0, they don't ever have the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT set.
This means that you can't create an image view of it with a different
format so applications can't render directly in sRGB (without automatic
encoding) unless we actually advertise UNORM formats.  There are a lot of
applications that want to do their own sRGB conversion, so we should allow
for that.  We do, however, make UNORM come after sRGB in the list so that
the default for dumb apps that just grab the first thing is to render in
linear and let the sRGB conversion happen automatically.
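
The ordering argument can be sketched like this. The enum and function names are hypothetical stand-ins rather than Vulkan or anv identifiers; the sketch only shows that both formats are advertised while the first entry stays sRGB.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical format names, standing in for the real Vulkan formats. */
enum format_sketch {
   FORMAT_SKETCH_BGRA8_SRGB,
   FORMAT_SKETCH_BGRA8_UNORM,
};

/* sRGB first, so an app that takes entry 0 gets automatic encoding. */
static const enum format_sketch advertised_formats[] = {
   FORMAT_SKETCH_BGRA8_SRGB,
   FORMAT_SKETCH_BGRA8_UNORM,
};

#define NUM_ADVERTISED \
   (sizeof(advertised_formats) / sizeof(advertised_formats[0]))

/* What an app that "just grabs the first thing" ends up with. */
enum format_sketch
pick_first_format(void)
{
   return advertised_formats[0];
}

/* What an app doing its own sRGB conversion would check for. */
bool
format_is_advertised(enum format_sketch format)
{
   for (size_t i = 0; i < NUM_ADVERTISED; i++) {
      if (advertised_formats[i] == format)
         return true;
   }
   return false;
}
```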

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoradv: fix configure.ac check
Dave Airlie [Thu, 6 Oct 2016 23:27:36 +0000 (09:27 +1000)]
radv: fix configure.ac check

This should be a positive test.

Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Skip already signalled fences.
Gustaw Smolarczyk [Wed, 5 Oct 2016 23:09:54 +0000 (01:09 +0200)]
radv: Skip already signalled fences.

If the user created a fence with VK_FENCE_CREATE_SIGNALED_BIT set, waiting
on it shouldn't fail just because it was never submitted; an
already-signalled fence needs no submission.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
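
A minimal sketch of the wait loop this describes, assuming a hypothetical fence_sketch struct and a wait_for_fences() helper that are not the radv data structures. The actual GPU wait is replaced by a comment.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fence state. */
struct fence_sketch {
   bool submitted;
   bool signalled;
};

/* Returns false if a fence can never be waited on, true otherwise. */
bool
wait_for_fences(struct fence_sketch *fences, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      /* Already signalled, e.g. created in the signalled state: skip it
       * before looking at whether it was ever submitted.
       */
      if (fences[i].signalled)
         continue;

      if (!fences[i].submitted)
         return false;

      /* A real driver would block on the GPU here; the sketch simply
       * assumes the submitted work has finished.
       */
      fences[i].signalled = true;
   }
   return true;
}
```

The order of the two checks is the whole point: testing `submitted` first would reject a fence that was created signalled and never submitted.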
7 years agoradv: add initial non-conformant radv vulkan driver
Dave Airlie [Thu, 6 Oct 2016 23:16:09 +0000 (09:16 +1000)]
radv: add initial non-conformant radv vulkan driver

This squashes all the radv development up until now into
one for merging.

History can be found:
https://github.com/airlied/mesa/tree/semi-interesting

This requires llvm 3.9 and is in no way considered
a conformant vulkan implementation. It can run a number
of vulkan applications, and supports all GPUs using
the amdgpu kernel driver.

Thanks to Intel for providing anv and spirv->nir,
and Emil Velikov for reviewing build integration.

Parts of this are:
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Authors: Bas Nieuwenhuizen and Dave Airlie
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agonv50/ir: fix wrong check when optimizing MAD to SHLADD
Samuel Pitoiset [Thu, 6 Oct 2016 23:08:54 +0000 (01:08 +0200)]
nv50/ir: fix wrong check when optimizing MAD to SHLADD

Checking if MAD is supported is definitely wrong, and it's
more likely a typo I introduced a few days ago which breaks
NV50 because SHLADD is not supported there.
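
As an illustration only: rewriting a * 2^n + c as (a << n) + c has to be guarded by support for the instruction being produced, not the one being replaced. The target_sketch struct and can_use_shladd() below are hypothetical and do not mirror the nv50/ir classes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of what a target supports. */
struct target_sketch {
   bool has_mad;
   bool has_shladd;
};

/* Decide whether a multiply-add by the immediate `mul` may become a
 * shift-left-add, and if so by how many bits.
 */
bool
can_use_shladd(const struct target_sketch *targ, uint32_t mul,
               unsigned *shift)
{
   /* Check the op we are about to emit, not MAD. */
   if (!targ->has_shladd)
      return false;

   /* Only powers of two can be expressed as a shift. */
   if (mul == 0 || (mul & (mul - 1)) != 0)
      return false;

   unsigned s = 0;
   while ((mul >> s) != 1)
      s++;

   *shift = s;
   return true;
}
```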

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agointel: aubinator: use getopt to parse arguments
Lionel Landwerlin [Tue, 4 Oct 2016 15:29:55 +0000 (16:29 +0100)]
intel: aubinator: use getopt to parse arguments

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sirisha Gandikota <sirisha.gandikota@intel.com>
7 years agonvc0: dump program binary only when NV50_PROG_DEBUG is set
Samuel Pitoiset [Thu, 6 Oct 2016 22:43:51 +0000 (00:43 +0200)]
nvc0: dump program binary only when NV50_PROG_DEBUG is set

When the chipset is forced with NV50_PROG_CHIPSET, we actually
only want to output the binary if NV50_PROG_DEBUG is also
enabled. Otherwise, this pollutes the shader-db output.
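
The condition can be sketched as a small predicate. should_dump_binary() is a hypothetical helper that takes the two variable values as strings; how the driver actually parses them is not shown in this message, so parsing with strtol() is an assumption.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Dump only when the chipset is forced AND debugging is enabled.
 * Both arguments are raw environment values and may be NULL.
 */
bool
should_dump_binary(const char *chipset_env, const char *debug_env)
{
   bool chipset_forced =
      chipset_env != NULL && strtol(chipset_env, NULL, 0) != 0;
   bool debug_enabled =
      debug_env != NULL && strtol(debug_env, NULL, 0) != 0;

   return chipset_forced && debug_enabled;
}
```

A caller could pass getenv("NV50_PROG_CHIPSET") and getenv("NV50_PROG_DEBUG") directly, since getenv() returns NULL for unset variables.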

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agonir: Fix the control flow tests for nir_loop_first_block changes
Jason Ekstrand [Thu, 6 Oct 2016 22:46:22 +0000 (15:46 -0700)]
nir: Fix the control flow tests for nir_loop_first_block changes

Commit 2ed17d46de045404042f13c6591895a1cf31b167 changed
nir_loop_first_cf_node and friends to return a nir_block instead of a
nir_cf_node.  This broke one of the NIR control flow tests.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98128

7 years agodocs: mark ARB_compute_variable_group_size as done for nvc0
Samuel Pitoiset [Sun, 11 Sep 2016 15:46:24 +0000 (17:46 +0200)]
docs: mark ARB_compute_variable_group_size as done for nvc0

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agonvc0: expose ARB_compute_variable_group_size
Samuel Pitoiset [Sat, 10 Sep 2016 14:45:32 +0000 (16:45 +0200)]
nvc0: expose ARB_compute_variable_group_size

Only expose 512 threads/block on Fermi to not be limited by
32 GPRs/thread.

v4: - use 512 threads on Fermi, 1024 on Kepler+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agonv50/ir: set number of threads/block for variable local size
Samuel Pitoiset [Tue, 6 Sep 2016 22:12:51 +0000 (00:12 +0200)]
nv50/ir: set number of threads/block for variable local size

When a variable local size is defined as specified by
ARB_compute_variable_group_size, the fixed local size is set to 0
and a SIGFPE occurs when we compute the maximum number of regs.

This allows using 64 GPRs/thread.
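
A sketch of the arithmetic, with assumptions called out: max_gprs_per_thread() is a hypothetical helper, and the register file size of 32768 used in the note below is an illustrative value chosen so that the results match the 64 and 32 GPRs/thread figures quoted in this series.

```c
#include <assert.h>

/* Registers available to each thread for a given block size.
 * fixed_threads is 0 when the shader uses a variable local size.
 */
unsigned
max_gprs_per_thread(unsigned regfile_size, unsigned fixed_threads,
                    unsigned max_variable_threads)
{
   /* Fall back to the variable-size limit to avoid dividing by zero. */
   unsigned threads =
      fixed_threads != 0 ? fixed_threads : max_variable_threads;

   assert(threads != 0);
   return regfile_size / threads;
}
```

With the assumed size, max_gprs_per_thread(32768, 0, 512) gives 64, while a fixed block of 1024 threads gives 32.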

v4: - use 512 threads on Fermi, 1024 on Kepler+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agost/mesa: expose ARB_compute_variable_group_size
Samuel Pitoiset [Wed, 7 Sep 2016 16:00:16 +0000 (18:00 +0200)]
st/mesa: expose ARB_compute_variable_group_size

This extension is only exposed if the underlying driver supports
ARB_compute_shader and if PIPE_COMPUTE_MAX_VARIABLE_THREADS_PER_BLOCK
is set.

v3: - initialize max_variable_threads_per_block to 0
v2: - expose the ext based on that new cap

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agost/mesa: add support for dispatching a variable local size
Samuel Pitoiset [Tue, 6 Sep 2016 18:15:00 +0000 (20:15 +0200)]
st/mesa: add support for dispatching a variable local size

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>