mesa.git
8 years ago svga: add svga_init_clear_functions()
Neha Bhende [Thu, 11 Aug 2016 23:43:03 +0000 (16:43 -0700)]
svga: add svga_init_clear_functions()

Define svga_init_clear_functions() and set svga_clear_texture as
svga->pipe.clear_texture. This is part of the ARB_clear_texture
extension.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years ago svga: add new function svga_clear_texture()
Neha Bhende [Thu, 11 Aug 2016 23:37:24 +0000 (16:37 -0700)]
svga: add new function svga_clear_texture()

This function can be used to clear a texture. This is part of the
ARB_clear_texture extension, which allows you to clear a texture with
given color values.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years ago svga: add new begin_blit()
Neha Bhende [Thu, 11 Aug 2016 23:30:14 +0000 (16:30 -0700)]
svga: add new begin_blit()

All blitter state is now saved in begin_blit(), so begin_blit() can be
called before performing any blit operation.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years ago svga: add opt to the list of valid build types
Charmaine Lee [Fri, 12 Aug 2016 01:41:52 +0000 (18:41 -0700)]
svga: add opt to the list of valid build types

For the opt build, add VMX86_STATS to the list of cpp defines.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years ago svga: add guest statistic gathering interface
Charmaine Lee [Fri, 19 Aug 2016 14:49:17 +0000 (08:49 -0600)]
svga: add guest statistic gathering interface

This patch adds a guest statistics gathering interface to the svga
winsys interface; it can be used to gather svga driver statistics.
The winsys module can then share the statistics with the VMX host via
the mksstats interface.

The statistics enums used in the svga driver are defined in
svga_stats_count and svga_stats_time in svga_winsys.h.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years ago svga: fix indirect non-indexable temp access
Charmaine Lee [Tue, 25 Aug 2015 21:53:51 +0000 (14:53 -0700)]
svga: fix indirect non-indexable temp access

If the shader has indirect access to non-indexable temporaries,
convert these non-indexable temporaries to an indexable temporary array.
This works around a bug in the GLSL->TGSI translator.

Fixes glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test
on DX11Renderer.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years ago gallium/hud: move signo declaration inside PIPE_OS_UNIX block
Brian Paul [Wed, 17 Aug 2016 14:29:55 +0000 (08:29 -0600)]
gallium/hud: move signo declaration inside PIPE_OS_UNIX block

This silences an unused variable warning with MSVC and MinGW.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years ago i965: Embrace "unlimited" GTT mmap support
Chris Wilson [Wed, 24 Aug 2016 19:35:46 +0000 (20:35 +0100)]
i965: Embrace "unlimited" GTT mmap support

From about kernel 4.9, GTT mmaps are virtually unlimited. A new
parameter, I915_PARAM_MMAP_GTT_VERSION, is added to advertise the
feature so query it and use it to avoid limiting tiled allocations to
only fit within the mappable aperture.

A couple of caveats:

 - fence support is still limited by stride to 262144 and the stride
   needs to be a multiple of tile_width (as before, and same limitation as
   the current 3D pipeline in hardware)

 - the max_gtt_map_object_size forcing untiled may be hiding a few bugs
   in handling of large objects, though none were spotted in piglits.

See kernel commit 4cc6907501ed ("drm/i915: Add I915_PARAM_MMAP_GTT_VERSION
to advertise unlimited mmaps").

v2: Include some commentary on mmap virtual space vs CPU addressable
space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
8 years ago mesa/main: Fix missing return in non void function
Tobias Klausmann [Thu, 25 Aug 2016 21:48:31 +0000 (23:48 +0200)]
mesa/main: Fix missing return in non void function

This was found by obs:
I: Program returns random data in a function
E: Mesa no-return-in-nonvoid-function main/program_resource.c:109

v2: Remove the ! on the string (Ian Romanick)

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years ago i965: Implement GL_KHR_blend_equation_advanced_coherent on Gen9+.
Kenneth Graunke [Thu, 30 Jun 2016 05:16:49 +0000 (22:16 -0700)]
i965: Implement GL_KHR_blend_equation_advanced_coherent on Gen9+.

We always use a coherent read, and ignore the "opt out" enable flag.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago mesa: Implement GL_KHR_blend_equation_advanced_coherent.
Kenneth Graunke [Thu, 30 Jun 2016 04:53:06 +0000 (21:53 -0700)]
mesa: Implement GL_KHR_blend_equation_advanced_coherent.

This adds the extension enable (so drivers can advertise it) and the
extra boolean state flag, GL_BLEND_ADVANCED_COHERENT_KHR, which can
be set to request coherent blending.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago i965: Enable GL_KHR_blend_equation_advanced on G45 and later.
Kenneth Graunke [Tue, 28 Jun 2016 06:02:24 +0000 (23:02 -0700)]
i965: Enable GL_KHR_blend_equation_advanced on G45 and later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago i965: Disable hardware blending if advanced blending is in use.
Kenneth Graunke [Tue, 28 Jun 2016 15:24:11 +0000 (08:24 -0700)]
i965: Disable hardware blending if advanced blending is in use.

We'll do blending in the shader in this case, so just disable the
hardware blending.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago glsl: Add a lowering pass to handle advanced blending modes.
Kenneth Graunke [Mon, 27 Jun 2016 18:32:16 +0000 (11:32 -0700)]
glsl: Add a lowering pass to handle advanced blending modes.

Many GPUs cannot handle GL_KHR_blend_equation_advanced natively, and
need to emulate it in the pixel shader.  This lowering pass implements
all the necessary math for advanced blending.  It fetches the existing
framebuffer value using the MESA_shader_framebuffer_fetch built-in
variables, and the previous commit's state var uniform to select
which equation to use.

This is done at the GLSL IR level to make it easy for all drivers to
implement the GL_KHR_blend_equation_advanced extension and share code.

Drivers need to hook up MESA_shader_framebuffer_fetch functionality:
1. Hook up the fb_fetch_output variable
2. Implement BlendBarrier()

Then to get KHR_blend_equation_advanced, they simply need to:
3. Disable hardware blending based on ctx->Color._AdvancedBlendEnabled
4. Call this lowering pass.

Very little driver specific code should be required.

v2: Handle multiple output variables per render target (which may exist
    due to ARB_enhanced_layouts), and array variables (even with one
    render target, we might have out vec4 color[1]), and non-vec4
    variables (it's easier than finding spec text to justify not
    handling it).  Thanks to Francisco Jerez for the feedback.
v3: Lower main returns so that we have a single exit point where we
    can add our blending epilogue (caught by Francisco Jerez).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago compiler: Add a new STATE_VAR_ADVANCED_BLENDING_MODE built-in uniform.
Kenneth Graunke [Tue, 28 Jun 2016 16:02:42 +0000 (09:02 -0700)]
compiler: Add a new STATE_VAR_ADVANCED_BLENDING_MODE built-in uniform.

This will be used for emulating GL_KHR_blend_equation_advanced features
in shader code.  We'll pass in the blending mode that's in use, and use
that in (effectively) a switch statement in the shader.
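
As a rough illustration of the "switch statement" idea, here is a minimal
C sketch that selects a per-channel blend function from a mode value. The
enum and function names are hypothetical (they are not Mesa's), only four
modes are shown, and the coverage and alpha weighting terms of the full
extension equations are left out; only the basic f(Cs, Cd) term is
modelled.

```c
#include <assert.h>

/* Hypothetical mode enum for illustration; not the names used in Mesa. */
enum sketch_blend_mode {
   SKETCH_BLEND_MULTIPLY,
   SKETCH_BLEND_SCREEN,
   SKETCH_BLEND_DARKEN,
   SKETCH_BLEND_LIGHTEN
};

/* Compute the f(Cs, Cd) term for one color channel.  The mode plays the
 * role of the uniform that tells the shader which equation is in use. */
static float
sketch_blend_channel(enum sketch_blend_mode mode, float src, float dst)
{
   switch (mode) {
   case SKETCH_BLEND_MULTIPLY:
      return src * dst;
   case SKETCH_BLEND_SCREEN:
      return src + dst - src * dst;
   case SKETCH_BLEND_DARKEN:
      return src < dst ? src : dst;
   case SKETCH_BLEND_LIGHTEN:
      return src > dst ? src : dst;
   }
   /* Unknown mode: fall back to the source color in this sketch. */
   return src;
}
```

In a shader the mode would arrive as a uniform rather than a function
argument, so one compiled program can serve several blend equations.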

v2: Use the new _AdvancedBlendMode field.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago mesa: Add draw time validation for advanced blending modes.
Kenneth Graunke [Sat, 20 Aug 2016 19:51:03 +0000 (12:51 -0700)]
mesa: Add draw time validation for advanced blending modes.

v2: Add null checks (requested by Curro).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago mesa: Restyle _mesa_check_blend_func_error().
Kenneth Graunke [Sat, 20 Aug 2016 19:18:16 +0000 (12:18 -0700)]
mesa: Restyle _mesa_check_blend_func_error().

I'm about to add more error conditions to this function, so I wanted to
move the current spec citation above the code that checks it.  Indenting
it required reformatting, so I tried to move it to our newer style.

While there, I also decided to drop some GL type usage, and drop the
unnecessary "_mesa_" prefix on a static function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago mesa: Track the current advanced blending mode.
Kenneth Graunke [Tue, 28 Jun 2016 15:17:57 +0000 (08:17 -0700)]
mesa: Track the current advanced blending mode.

This will be useful for a number of things:
- Checking the current advanced blending mode against the shader's
  blend_support_* qualifiers.
- Disabling hardware blending when emulating advanced blending.
- Uploading the current advanced blending mode as a state var.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago mesa: Allow advanced blending enums in glBlendEquation[i].
Kenneth Graunke [Tue, 28 Jun 2016 16:18:19 +0000 (09:18 -0700)]
mesa: Allow advanced blending enums in glBlendEquation[i].

Don't allow them in glBlendEquationSeparate[i], though, as required
by the spec.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago glsl: Merge blend_support qualifiers when linking.
Kenneth Graunke [Tue, 28 Jun 2016 17:02:06 +0000 (10:02 -0700)]
glsl: Merge blend_support qualifiers when linking.

Since each qualifier represents a blending mode the shader can be used
with, we take the union of all possible modes when linking.
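
A minimal sketch of the union, assuming each shader's qualifiers are
stored as a bitmask with one bit per blending mode. The bit assignments
and the function name are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments, one bit per supported blending mode. */
#define SKETCH_SUPPORT_MULTIPLY (1u << 0)
#define SKETCH_SUPPORT_SCREEN   (1u << 1)
#define SKETCH_SUPPORT_OVERLAY  (1u << 2)

/* OR together the per-shader masks; the result lists every mode that any
 * of the linked shaders declared support for. */
static uint32_t
sketch_merge_blend_support(const uint32_t *per_shader, size_t count)
{
   uint32_t merged = 0;
   for (size_t i = 0; i < count; i++)
      merged |= per_shader[i];
   return merged;
}
```

A mask of zero then means that no qualifiers are in use at all.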

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago glsl: process blend_support_* qualifiers
Ilia Mirkin [Sat, 2 Apr 2016 02:51:39 +0000 (22:51 -0400)]
glsl: process blend_support_* qualifiers

v2 (Ken): Add a BLEND_NONE enum value (no qualifiers in use).
v3 (Ken): Rename gl_blend_support_qualifier to gl_advanced_blend_mode.
v4 (Ken): Mark map[] as static const (Ilia).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago glsl: add basic KHR_blend_equation_advanced infrastructure
Ilia Mirkin [Sat, 2 Apr 2016 02:17:27 +0000 (22:17 -0400)]
glsl: add basic KHR_blend_equation_advanced infrastructure

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago mesa: add KHR_blend_equation_advanced enable and extension string
Ilia Mirkin [Sat, 2 Apr 2016 02:13:22 +0000 (22:13 -0400)]
mesa: add KHR_blend_equation_advanced enable and extension string

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago glapi: add KHR_blend_equation_advanced dispatch
Ilia Mirkin [Sat, 2 Apr 2016 02:08:13 +0000 (22:08 -0400)]
glapi: add KHR_blend_equation_advanced dispatch

v2 (Ken): Fix enum values, drop _mesa_BlendBarrierKHR stub as Curro has
          already implemented it.
v3 (Ken): Rework for _mesa_BlendBarrierKHR -> _mesa_BlendBarrier rename.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago mesa: Rename _mesa_BlendBarrierMESA to _mesa_BlendBarrier.
Kenneth Graunke [Sat, 13 Aug 2016 02:07:33 +0000 (19:07 -0700)]
mesa: Rename _mesa_BlendBarrierMESA to _mesa_BlendBarrier.

Note that _mesa_BlendBarrierMESA is not currently hooked up in the
glapi XML, so we can just rename it.  We'll hook it up for the
KHR_blend_equation_advanced extension shortly.

We may as well use the ES 3.2 core name with no suffixes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years ago i965: Safely iterate the predecessors of the end block.
Kenneth Graunke [Thu, 25 Aug 2016 04:33:16 +0000 (21:33 -0700)]
i965: Safely iterate the predecessors of the end block.

We want to insert code in each of the predecessors of the end block.
This code includes a nir_if, which would split the block, altering
the set.  To avoid that, I emitted a dead constant at the end of each
block before splitting it, so that the set of predecessors remained
unchanged.  This was admittedly ugly.

Connor suggested instead saving a copy of the set, so we can iterate
it safely.  This is also a little ugly, but a much better plan.
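
The snapshot approach can be sketched in plain C as follows. This is not
the NIR code: the array-backed int_set type and all function names are
hypothetical stand-ins for the predecessor set and for the code that
splits a block. The point is only that the loop walks a private copy, so
the visitor is free to grow the real set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical array-backed set standing in for the predecessor set. */
struct int_set {
   int *items;
   size_t count;
   size_t capacity;
};

static void
set_add(struct int_set *set, int value)
{
   if (set->count == set->capacity) {
      size_t new_capacity = set->capacity ? set->capacity * 2 : 4;
      int *grown = realloc(set->items, new_capacity * sizeof *grown);
      assert(grown != NULL);
      set->items = grown;
      set->capacity = new_capacity;
   }
   set->items[set->count++] = value;
}

typedef void (*visit_fn)(struct int_set *set, int member);

/* Visit every member that was present when the call started.  The
 * members are copied first, so the visitor may add to the set (and even
 * trigger a realloc) without disturbing the iteration. */
static size_t
visit_snapshot(struct int_set *set, visit_fn visit)
{
   size_t n = set->count;
   if (n == 0)
      return 0;

   int *snapshot = malloc(n * sizeof *snapshot);
   assert(snapshot != NULL);
   memcpy(snapshot, set->items, n * sizeof *snapshot);

   for (size_t i = 0; i < n; i++)
      visit(set, snapshot[i]);

   free(snapshot);
   return n;
}

/* Example visitor that alters the set, much like splitting a block. */
static void
split_member(struct int_set *set, int member)
{
   set_add(set, member + 100);
}
```

Iterating set->items directly here would either walk into the newly added
members or read freed memory after a realloc, which is the kind of hazard
the copy avoids.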

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years ago nir: Use nir_shader_get_entrypoint in TCS quad workaround code.
Kenneth Graunke [Thu, 18 Aug 2016 17:56:48 +0000 (10:56 -0700)]
nir: Use nir_shader_get_entrypoint in TCS quad workaround code.

We want to insert the code at the end of the program.  Looping over
all the functions (of which there was only one) was the old way of doing
this, but now we have nir_shader_get_entrypoint(), so let's use it.

Suggested by Connor Abbott.

v2: Update for nir_shader_get_entrypoint API change.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years ago nir: Change nir_shader_get_entrypoint to return an impl.
Kenneth Graunke [Thu, 25 Aug 2016 02:09:57 +0000 (19:09 -0700)]
nir: Change nir_shader_get_entrypoint to return an impl.

Jason suggested adding an assert(function->impl) here.  All callers
of this function actually want ->impl, so I decided just to change
the API.

We also change the nir_lower_io_to_temporaries API here.  All but one
caller passed nir_shader_get_entrypoint(), and with the previous commit,
it now uses a nir_function_impl internally.  Folding this change in
avoids the need to change it and change it back.

v2: Fix one call I missed in ir3_compiler (caught by Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years ago nir: Make nir_lower_io_to_temporaries store an impl internally.
Kenneth Graunke [Thu, 25 Aug 2016 02:15:53 +0000 (19:15 -0700)]
nir: Make nir_lower_io_to_temporaries store an impl internally.

This changes the pass internals to work with a nir_function_impl
directly rather than a nir_function.  The next patch will change
the API.

v2: Rebase after framebuffer fetch landed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years ago i965: Expose shader framebuffer fetch extensions on Gen9+.
Francisco Jerez [Fri, 22 Jul 2016 22:52:49 +0000 (15:52 -0700)]
i965: Expose shader framebuffer fetch extensions on Gen9+.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Hook up coherent framebuffer reads to the NIR front-end.
Francisco Jerez [Fri, 19 Aug 2016 05:12:37 +0000 (22:12 -0700)]
i965/fs: Hook up coherent framebuffer reads to the NIR front-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Remove special casing of framebuffer writes in scheduler code.
Francisco Jerez [Thu, 21 Jul 2016 23:56:05 +0000 (16:56 -0700)]
i965/fs: Remove special casing of framebuffer writes in scheduler code.

The reason why it was safe for the scheduler to ignore the side
effects of framebuffer write instructions was that its side effects
couldn't have had any influence on any other instruction in the
program, because we weren't doing framebuffer reads, and framebuffer
writes were always non-overlapping.  We need actual memory dependency
analysis in order to determine whether a side-effectful instruction
can be reordered with respect to other instructions in the program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Don't CSE render target messages with different target index.
Francisco Jerez [Thu, 7 Jul 2016 03:49:58 +0000 (20:49 -0700)]
i965/fs: Don't CSE render target messages with different target index.

We weren't checking the fs_inst::target field when comparing whether
two instructions are equal.  For FB writes it doesn't matter because
they aren't CSE-able anyway, but this would have become a problem with
FB reads which are expression-like instructions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Define logical framebuffer read opcode and lower it to physical reads.
Francisco Jerez [Thu, 21 Jul 2016 23:55:45 +0000 (16:55 -0700)]
i965/fs: Define logical framebuffer read opcode and lower it to physical reads.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Define framebuffer read virtual opcode.
Francisco Jerez [Thu, 21 Jul 2016 23:52:33 +0000 (16:52 -0700)]
i965/fs: Define framebuffer read virtual opcode.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/disasm: Fix RC message type strings on Gen7+.
Francisco Jerez [Tue, 19 Jul 2016 18:52:23 +0000 (11:52 -0700)]
i965/disasm: Fix RC message type strings on Gen7+.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/eu: Add codegen support for the Gen9+ render target read message.
Francisco Jerez [Fri, 22 Jul 2016 02:13:55 +0000 (19:13 -0700)]
i965/eu: Add codegen support for the Gen9+ render target read message.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/eu: Take into account the target cache argument in brw_set_dp_read_message.
Francisco Jerez [Fri, 22 Jul 2016 01:49:36 +0000 (18:49 -0700)]
i965/eu: Take into account the target cache argument in brw_set_dp_read_message.

brw_set_dp_read_message() was setting the data cache as send message
SFID on Gen7+ hardware, ignoring the target cache specified by the
caller.  Some of the callers were passing a bogus target cache value
as argument relying on brw_set_dp_read_message not to take it into
account.  Fix them too.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Flip the non-coherent framebuffer fetch extension bit on G45-Gen8 hardware.
Francisco Jerez [Tue, 19 Jul 2016 22:23:30 +0000 (15:23 -0700)]
i965: Flip the non-coherent framebuffer fetch extension bit on G45-Gen8 hardware.

This is not enabled on the original Gen4 part because it lacks surface
state tile offsets so it may not be possible to sample from arbitrary
non-zero layers of the framebuffer depending on the miptree layout (it
should be possible to work around this by allocating a scratch surface
and doing the same hack currently used for render targets, but meh...).

On Gen9+ even though it should mostly work (feel free to force-enable
it in order to compare the coherent and non-coherent paths in terms of
performance), there are some corner cases like 1D array layered
framebuffers that cannot be handled easily by the non-coherent path
because of the incompatible layout in memory of 1D and 2D miptrees (it
should be possible to work around this too by doing state-dependent
recompiles, but it's hard to care enough since Gen9 has native support
for coherent render target reads...)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Implement glBlendBarrier.
Francisco Jerez [Fri, 1 Jul 2016 20:54:05 +0000 (13:54 -0700)]
i965: Implement glBlendBarrier.

This is a no-op if the platform supports coherent framebuffer fetch.
If it doesn't, we just need to flush the render cache and invalidate
the texture cache in order for previous rendering to be visible to
framebuffer fetch.
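
The control flow described above can be sketched like this. Everything in
the block is hypothetical: the context struct, its field names, and the
counters that stand in for the real cache flush and invalidate commands
are illustration only, not the i965 implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context; counters stand in for emitted hardware commands. */
struct sketch_barrier_ctx {
   bool has_coherent_fb_fetch;
   unsigned render_cache_flushes;
   unsigned texture_cache_invalidations;
};

static void
sketch_blend_barrier(struct sketch_barrier_ctx *ctx)
{
   /* Coherent reads already observe previous rendering: nothing to do. */
   if (ctx->has_coherent_fb_fetch)
      return;

   /* Non-coherent path: write out pending rendering, then make sure the
    * read side does not return stale cached texels. */
   ctx->render_cache_flushes++;
   ctx->texture_cache_invalidations++;
}
```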

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Upload surface state for non-coherent framebuffer fetch.
Francisco Jerez [Fri, 1 Jul 2016 20:56:47 +0000 (13:56 -0700)]
i965: Upload surface state for non-coherent framebuffer fetch.

This iterates over the list of attached render buffers and binds
appropriate surface state structures to the binding table block
allocated for shader framebuffer read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Implement support for overriding the texture target in brw_emit_surface_state.
Francisco Jerez [Fri, 22 Jul 2016 05:23:13 +0000 (22:23 -0700)]
i965: Implement support for overriding the texture target in brw_emit_surface_state.

This allows the caller to bind a miptree using a texture target other
than the one it was created with.  The code should work even if the
memory layouts of the specified and original targets don't match, as
long as the caller only intends to access a single slice of the
miptree structure.

This will be exploited by the next commit in order to support
non-coherent framebuffer fetch of a single layer of a 3D texture
(since some generations lack the minimum array element control for 3D
textures bound to the sampler unit), and multiple layers of a 1D array
texture (since binding it as an actual 1D array texture would require
state-dependent recompiles because the same shader couldn't
simultaneously work for 1D and 2D array textures due to the different
texel fetch coordinate ordering).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Massage argument list of brw_emit_surface_state().
Francisco Jerez [Fri, 19 Aug 2016 05:08:10 +0000 (22:08 -0700)]
i965: Massage argument list of brw_emit_surface_state().

This commit does three different things in a single pass in order to
keep the amount of churn low: Remove the for_gather boolean argument
which was unused, pass the isl_view argument by value rather than by
reference since I'll have to modify it from within the function, and
add a target argument to allow callers to bind textures using a target
other than the original.  The prototype of the function now looks
like:

 void brw_emit_surface_state(struct brw_context *brw,
                             struct intel_mipmap_tree *mt,
                             GLenum target, struct isl_view view,
                             uint32_t mocs, uint32_t *surf_offset, int surf_index,
                             unsigned read_domains, unsigned write_domains);

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Add missing has_surface_tile_offset flag to the Gen8+ device info structures.
Francisco Jerez [Tue, 19 Jul 2016 01:06:02 +0000 (18:06 -0700)]
i965: Add missing has_surface_tile_offset flag to the Gen8+ device info structures.

This surface state control has been supported by all hardware
generations since G45.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Return the correct layout from get_isl_dim_layout for pre-ILK cube textures.
Francisco Jerez [Fri, 22 Jul 2016 05:09:46 +0000 (22:09 -0700)]
i965: Return the correct layout from get_isl_dim_layout for pre-ILK cube textures.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Factor out isl_surf_dim/isl_dim_layout calculation into functions.
Francisco Jerez [Tue, 19 Jul 2016 01:07:35 +0000 (18:07 -0700)]
i965: Factor out isl_surf_dim/isl_dim_layout calculation into functions.

The logic to calculate the right layout and dimensionality for a given
GL texture target is going to be useful elsewhere, factor it out from
intel_miptree_get_isl_surf().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Resolve color for non-coherent FB fetch at UpdateState time.
Francisco Jerez [Fri, 1 Jul 2016 20:45:22 +0000 (13:45 -0700)]
i965: Resolve color for non-coherent FB fetch at UpdateState time.

This is required because the sampler unit used to fetch from the
framebuffer is unable to interpret non-color-compressed fast-cleared
single-sample texture data.  Roughly the same limitation applies for
surfaces bound to texture or image units, but unlike texture sampling,
non-coherent framebuffer fetch is by definition non-coherent with
previous rendering, so the brw_render_cache_set_check_flush() call can
be omitted except after resolve.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Return whether the miptree was resolved from intel_miptree_resolve_color().
Francisco Jerez [Sat, 23 Jul 2016 01:16:45 +0000 (18:16 -0700)]
i965: Return whether the miptree was resolved from intel_miptree_resolve_color().

This will allow optimizing out the cache flush in some cases when
resolving wasn't necessary.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Translate nir_intrinsic_load_output on a fragment output.
Francisco Jerez [Fri, 22 Jul 2016 04:57:00 +0000 (21:57 -0700)]
i965/fs: Translate nir_intrinsic_load_output on a fragment output.

This gets the non-coherent framebuffer fetch path hooked up to the NIR
front-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Allocate fragment output temporaries on demand.
Francisco Jerez [Fri, 22 Jul 2016 04:47:45 +0000 (21:47 -0700)]
i965/fs: Allocate fragment output temporaries on demand.

This gets rid of the duplication of logic between nir_setup_outputs()
and get_frag_output() by allocating fragment output temporaries lazily
whenever get_frag_output() is called.  This makes nir_setup_outputs()
a no-op for the fragment shader stage.
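
A minimal sketch of on-demand allocation, assuming a small fixed table of
output slots. The struct, the slot limit, and the function name are
hypothetical and only model the pattern: the first lookup of a location
allocates its temporary, and later lookups return the same one.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_FRAG_OUTPUTS 8   /* hypothetical limit for the sketch */

struct sketch_output_table {
   bool allocated[SKETCH_MAX_FRAG_OUTPUTS];
   int reg[SKETCH_MAX_FRAG_OUTPUTS];
   int next_reg;   /* next free temporary register number */
};

/* Return the temporary for a location, allocating it on first use so no
 * separate setup pass has to know which outputs exist. */
static int
sketch_get_frag_output(struct sketch_output_table *table, unsigned location)
{
   assert(location < SKETCH_MAX_FRAG_OUTPUTS);

   if (!table->allocated[location]) {
      table->reg[location] = table->next_reg++;
      table->allocated[location] = true;
   }
   return table->reg[location];
}
```

Outputs that the shader never touches never receive a register in this
scheme.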

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Rework representation of fragment output locations in NIR.
Francisco Jerez [Fri, 22 Jul 2016 04:26:20 +0000 (21:26 -0700)]
i965/fs: Rework representation of fragment output locations in NIR.

The problem with the current approach is that driver output locations
are represented as a linear offset within the nir_outputs array, which
makes it rather difficult for the back-end to figure out what color
output and index some nir_intrinsic_load/store_output was meant for,
because the offset of a given output within the nir_output array is
dependent on the type and size of all previously allocated outputs.
Instead this defines the driver location of an output to be the pair
formed by its GLSL-assigned location and index (I've borrowed the
bitfield macros from brw_defines.h in order to represent the pair of
integers as a single scalar value that can be assigned to
nir_variable_data::driver_location).  nir_assign_var_locations is no
longer useful for fragment outputs.
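
To illustrate packing the pair into one scalar, here is a sketch that
assumes the index occupies bit 0 and the location the remaining 31 bits.
That layout, the macro names, and the helper names are assumptions made
for this example; the commit itself only says that the bitfield macros
from brw_defines.h are reused.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for this sketch: index in bit 0, location in bits 1..31. */
#define SKETCH_FRAG_OUTPUT_INDEX_SHIFT     0
#define SKETCH_FRAG_OUTPUT_INDEX_MASK      0x00000001u
#define SKETCH_FRAG_OUTPUT_LOCATION_SHIFT  1
#define SKETCH_FRAG_OUTPUT_LOCATION_MASK   0xfffffffeu

static uint32_t
sketch_pack_frag_output(uint32_t location, uint32_t index)
{
   assert(index <= 1u);
   assert(location <= 0x7fffffffu);
   return (location << SKETCH_FRAG_OUTPUT_LOCATION_SHIFT) |
          (index << SKETCH_FRAG_OUTPUT_INDEX_SHIFT);
}

static uint32_t
sketch_frag_output_location(uint32_t packed)
{
   return (packed & SKETCH_FRAG_OUTPUT_LOCATION_MASK) >>
          SKETCH_FRAG_OUTPUT_LOCATION_SHIFT;
}

static uint32_t
sketch_frag_output_index(uint32_t packed)
{
   return (packed & SKETCH_FRAG_OUTPUT_INDEX_MASK) >>
          SKETCH_FRAG_OUTPUT_INDEX_SHIFT;
}
```

Because the packed value depends only on the location and index, it does
not shift around when other outputs are added or change type.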

Because fragment outputs are now allocated independently rather than
within the nir_outputs array, the get_frag_output() helper becomes
necessary in order to obtain the right temporary register for a given
location-index pair.

The type_size helper passed to nir_lower_io is now type_size_dvec4
rather than type_size_vec4_times_4 so that output array offsets are
provided in terms of whole array elements rather than in terms of
scalar components (dvec4 is the largest vector type supported by
GLSL so this will cause all individual fragment outputs to have a size
of one regardless of the type).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits.
Francisco Jerez [Fri, 22 Jul 2016 04:58:56 +0000 (21:58 -0700)]
i965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits.

Most likely we had only ever used this macro on bitfields of less than
31 bits -- That's going to change shortly.
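
For context, a mask macro of this kind typically shifts a 1 left by the
field width. With a plain int constant, a 31-bit field means computing
1 << 31, which overflows a 32-bit signed int and is undefined behavior in
C. The sketch below uses an unsigned constant instead, which is one common
way to avoid the problem; the macro name is hypothetical and this is not
claimed to be the exact change made to INTEL_MASK.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits low..high inclusive.  The 1u keeps the arithmetic
 * unsigned, so a 31-bit wide field (a shift by 31) is well defined.
 * A full 32-bit field would still need separate handling, since shifting
 * a 32-bit value by 32 is undefined even for unsigned types. */
#define SKETCH_MASK(high, low) \
   (((1u << ((high) - (low) + 1)) - 1u) << (low))
```

With this definition, SKETCH_MASK(30, 0) evaluates to 0x7fffffff and
SKETCH_MASK(31, 1) to 0xfffffffe.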

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Special-case nir_intrinsic_store_output for the fragment shader.
Francisco Jerez [Fri, 22 Jul 2016 04:25:46 +0000 (21:25 -0700)]
i965/fs: Special-case nir_intrinsic_store_output for the fragment shader.

I'm about to change how fragment shader output locations are
represented, so the generic nir_intrinsic_store_output implementation
that assumes that outputs are just contiguous elements in the big
nir_outputs array won't work anymore.  This somewhat simplified
implementation of nir_intrinsic_store_output for fragment shaders
should be functionally equivalent to the current fall-back one.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Implement non-coherent framebuffer fetch using the sampler unit.
Francisco Jerez [Fri, 22 Jul 2016 03:25:28 +0000 (20:25 -0700)]
i965/fs: Implement non-coherent framebuffer fetch using the sampler unit.

v2: Memoize sample ID, misc codestyle changes. (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.
Francisco Jerez [Fri, 22 Jul 2016 03:35:29 +0000 (20:35 -0700)]
i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.

This will be required for the next commit since the non-coherent path
makes use of the fragment coordinates implicitly, so they need to be
calculated.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.
Francisco Jerez [Thu, 21 Jul 2016 23:20:07 +0000 (16:20 -0700)]
i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.

The result of a framebuffer fetch from a multisample FBO is inherently
per-sample, so the spec requires at least those sections of the shader
that depend on the framebuffer fetch result to be executed once per
sample.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965: Allocate space in the binding table for non-coherent FB fetch.
Francisco Jerez [Fri, 1 Jul 2016 20:46:40 +0000 (13:46 -0700)]
i965: Allocate space in the binding table for non-coherent FB fetch.

Unfortunately due to the inconsistent meaning of some surface state
structure fields, we cannot re-use the same binding table entries for
sampling from and rendering into the same set of render buffers, so we
need to allocate a separate binding table block specifically for
render target reads if the non-coherent path is in use.

The slight noise is due to the change of
brw_assign_common_binding_table_offsets to return the next available
binding table index rather than void.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Add brw_wm_prog_key bit specifying whether FB reads should be coherent.
Francisco Jerez [Fri, 22 Jul 2016 03:32:12 +0000 (20:32 -0700)]
i965/fs: Add brw_wm_prog_key bit specifying whether FB reads should be coherent.

Some of the following changes in this series are specific to the
non-coherent path, so I need some way to tell whether the coherent or
non-coherent path is in use.  The flag defaults to the value of the
gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be
overridden easily on hardware that supports both framebuffer fetch
extensions in order to test the non-coherent path, like:

 MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch

(Of course trying to force-enable the coherent framebuffer fetch
extension on hardware without native support won't work and will lead
to assertion failures).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years ago i965/fs: Get rid of fs_visitor::do_dual_src.
Francisco Jerez [Thu, 21 Jul 2016 19:46:04 +0000 (12:46 -0700)]
i965/fs: Get rid of fs_visitor::do_dual_src.

This boolean flag was being used for two different things:

 - To set the brw_wm_prog_data::dual_src_blend flag.  Instead we can
   just set it based on whether the dual_src_output register is valid,
   which will be the case if the shader writes the secondary blending
   color.

 - To decide whether to call emit_single_fb_write() once, or in a loop
   that would iterate only once, which seems pretty useless.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agonir: Handle FB fetch outputs correctly in nir_lower_io_to_temporaries.
Francisco Jerez [Wed, 20 Jul 2016 03:35:26 +0000 (20:35 -0700)]
nir: Handle FB fetch outputs correctly in nir_lower_io_to_temporaries.

This requires emitting a series of copies at the top of the program
from each output variable to the corresponding temporary.  The initial
copy can be skipped for non-framebuffer fetch outputs whose initial
value is undefined, and the final copy needs to be skipped for
read-only outputs (i.e. gl_LastFragData), since it would be illegal to
emit a store output intrinsic for it.
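
The two rules above can be sketched as a pair of predicates. The struct and function names are hypothetical and only model the decision, not the real nir_variable layout or the pass itself.

```c
#include <stdbool.h>

/* Hypothetical model of the two properties of an output variable that
 * matter here.
 */
struct output_var {
   bool fb_fetch_output;   /* initial value comes from the framebuffer */
   bool read_only;         /* e.g. gl_LastFragData */
};

/* Copy from the output to its temporary at the top of the program?
 * Only needed when the initial value is actually defined.
 */
static bool
needs_initial_copy(const struct output_var *var)
{
   return var->fb_fetch_output;
}

/* Copy from the temporary back to the output at the end?  Must be
 * skipped for read-only outputs, which cannot legally be stored to.
 */
static bool
needs_final_copy(const struct output_var *var)
{
   return !var->read_only;
}
```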

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agonir: Pass through fb_fetch_output and OutputsRead from GLSL IR.
Francisco Jerez [Wed, 20 Jul 2016 03:33:46 +0000 (20:33 -0700)]
nir: Pass through fb_fetch_output and OutputsRead from GLSL IR.

The NIR representation of framebuffer fetch is the same as the GLSL
IR's until interface variables are lowered away, at which point it
will be translated to load output intrinsics.  The GLSL-to-NIR pass
just needs to copy the bits over to the NIR program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agovc4: Add support for fddx/fddy
Eric Anholt [Thu, 25 Aug 2016 19:32:19 +0000 (12:32 -0700)]
vc4: Add support for fddx/fddy

Based vaguely on a patch by jonasarrow on github.

8 years agovc4: Add register allocation support for MUL output rotation.
Eric Anholt [Thu, 25 Aug 2016 21:32:47 +0000 (14:32 -0700)]
vc4: Add register allocation support for MUL output rotation.

We need the source to be in r0-r3, so make a new register class for it.
It will be up to the surrounding passes to make sure that the r0-r3
allocation of its source won't conflict with any other class
requirements on that temp.

8 years agovc4: Add support for MUL output rotation.
Eric Anholt [Thu, 25 Aug 2016 19:31:49 +0000 (12:31 -0700)]
vc4: Add support for MUL output rotation.

Extracted from a patch by jonasarrow on github.

8 years agovc4: Add support for the 2-bit LOAD_IMM variants.
Eric Anholt [Thu, 25 Aug 2016 19:15:29 +0000 (12:15 -0700)]
vc4: Add support for the 2-bit LOAD_IMM variants.

Extracted and fixed up from a patch by jonasarrow on github.  This ended
up not getting used for ddx/ddy, but seems like it might still be useful.

8 years agovc4: Add QPU scheduling to handle MUL rotate sources.
Eric Anholt [Thu, 25 Aug 2016 20:40:27 +0000 (13:40 -0700)]
vc4: Add QPU scheduling to handle MUL rotate sources.

We need MUL rotates to do ddx/ddy support.

8 years agovc4: Add disassembly for constant MUL rotates
Eric Anholt [Thu, 25 Aug 2016 20:26:50 +0000 (13:26 -0700)]
vc4: Add disassembly for constant MUL rotates

8 years agovc4: Add real validation for MUL rotation.
Eric Anholt [Thu, 25 Aug 2016 20:21:58 +0000 (13:21 -0700)]
vc4: Add real validation for MUL rotation.

Caught problems in the upcoming DDX/DDY implementation.

8 years agovc4: Add a QIR value for the QPU element register.
Eric Anholt [Thu, 25 Aug 2016 20:48:21 +0000 (13:48 -0700)]
vc4: Add a QIR value for the QPU element register.

This will be used in the ddx/ddy support for "Am I the top half?" or "Am I
the left half?" checks.

8 years agoi965: Respect miptree offsets in intel_readpixels_tiled_memcpy()
Chad Versace [Thu, 25 Aug 2016 23:08:27 +0000 (16:08 -0700)]
i965: Respect miptree offsets in intel_readpixels_tiled_memcpy()

Respect intel_miptree_slice::x_offset,y_offset and
intel_mipmap_tree::offset. All three may be non-zero when glReadPixels
is called on an EGLImage created from the non-base slice of a miptree.

Patch 2/2 that fixes test
'dEQP-EGL.functional.image.create.gles2_cubemap_*'.

Reported-by: Haixia Shi <hshi@chromium.org>
Diagnosed-by: Haixia Shi <hshi@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change-Id: I4b397b27e55a743a7094d29fb0a6a4b6b34352b0

8 years agoi965: Fix miptree layout for EGLImage-based renderbuffers
Chad Versace [Thu, 25 Aug 2016 23:08:15 +0000 (16:08 -0700)]
i965: Fix miptree layout for EGLImage-based renderbuffers

When glEGLImageTargetRenderbufferStorageOES() was given an EGLImage
created from the non-base slice of a miptree,
intel_image_target_renderbuffer_storage() forgot to apply the intra-tile
offsets __DRIimage::tile_x,tile_y to the miptree layout.

This patch fixes the problem with a quick hack suitable for
cherry-picking. A proper fix requires more thorough plumbing in
intel_miptree_create_layout() and brw_tex_layout().

Patch 1/2 that fixes test
'dEQP-EGL.functional.image.create.gles2_cubemap_*'.

Reported-by: Haixia Shi <hshi@chromium.org>
Diagnosed-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Change-Id: I8a64b0048a1ee9e714ebb3f33fffd8334036450b

8 years agointel: Flatten the makefile structure
Jason Ekstrand [Mon, 22 Aug 2016 21:10:46 +0000 (14:10 -0700)]
intel: Flatten the makefile structure

This pulls isl and genxml into a single make file so that they can properly
build in parallel.  This isn't terribly important now, as genxml just
generates sources, which happens serially first anyway, but it will be more
important as we add more stuff to src/intel.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agoisl/tests: Use a longer path for isl.h
Jason Ekstrand [Mon, 22 Aug 2016 21:24:01 +0000 (14:24 -0700)]
isl/tests: Use a longer path for isl.h

The tests assumed that isl would be in the include path but that usually
isn't the case.  Instead, we usually have src/intel and you need to add an
"isl/" prefix.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years agointel/isl/gen9: Only use the magic 1D alignment for GEN9_1D surfaces
Jason Ekstrand [Wed, 24 Aug 2016 04:46:58 +0000 (21:46 -0700)]
intel/isl/gen9: Only use the magic 1D alignment for GEN9_1D surfaces

If the surface has a layout of GEN4_2D then we need to compute a normal 2D
alignment and not use the magic linear 1D alignment.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
8 years agointel/isl: Pass the dim_layout into choose_alignment_el
Jason Ekstrand [Wed, 24 Aug 2016 04:46:23 +0000 (21:46 -0700)]
intel/isl: Pass the dim_layout into choose_alignment_el

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
8 years agointel/isl: Use DIM_LAYOUT_GEN4_2D for tiled 1-D surfaces on SKL
Jason Ekstrand [Wed, 24 Aug 2016 04:35:36 +0000 (21:35 -0700)]
intel/isl: Use DIM_LAYOUT_GEN4_2D for tiled 1-D surfaces on SKL

The Sky Lake 1D layout is only used if the surface is linear.  For tiled
surfaces such as depth and stencil the old gen4 2D layout is used.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
8 years agonir/phi_builder: Don't recurse in value_get_block_def
Jason Ekstrand [Thu, 25 Aug 2016 04:49:10 +0000 (21:49 -0700)]
nir/phi_builder: Don't recurse in value_get_block_def

In some programs, we can have very deep dominance trees and the recursion
can cause us to risk stack overflows.  Instead, we replace the recursion
with a pair of loops, one at the start and one at the end.  This is
functionally equivalent to what we had before and it's actually a bit
easier to read in the new form without the recursion.
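
As a rough illustration of the pair-of-loops shape, the sketch below walks up the dominance tree in one loop to find a cached definition, then walks the same path again to fill in the cache. The block struct, its idom and def fields, and get_block_def() are hypothetical stand-ins, not the actual nir_phi_builder data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a basic block in a dominance tree. */
struct block {
   struct block *idom;   /* immediate dominator, NULL for the root */
   const void *def;      /* cached definition, NULL if not yet known */
};

/* Return the definition visible at 'start' without recursing. */
static const void *
get_block_def(struct block *start)
{
   assert(start != NULL);

   /* First loop: walk up the dominance tree until a block with a
    * cached definition (or the root) is reached.
    */
   struct block *dom = start;
   while (dom->def == NULL && dom->idom != NULL)
      dom = dom->idom;

   const void *def = dom->def;

   /* Second loop: walk the same path again and cache the result in
    * every block that was visited, so later lookups stop early.
    */
   for (struct block *b = start; b != dom; b = b->idom)
      b->def = def;

   return def;
}
```

Stack usage stays constant regardless of how deep the dominance tree is, which is the property the recursion lacked.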

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
8 years ago.mailmap: Update my address again
Chad Versace [Thu, 25 Aug 2016 20:54:47 +0000 (13:54 -0700)]
.mailmap: Update my address again

I joined Google's Chrome OS graphics team.

8 years agonir: Walk blocks in source code order in lower_vars_to_ssa.
Matt Turner [Thu, 25 Aug 2016 02:25:58 +0000 (19:25 -0700)]
nir: Walk blocks in source code order in lower_vars_to_ssa.

Prior to this commit, rename_variables_block() was called recursively,
performing a depth-first traversal of the control flow graph. The
function uses a non-trivial amount of stack space for local variables,
which puts us in danger of smashing the stack, given a sufficiently deep
dominance tree.

XCOM: Enemy Within contains a shader with such a dominance tree (1574
nir_blocks in total, depth of at least 143).

Jason tells me that he believes that any walk over the nir_blocks that
respects dominance is sufficient (a DFS might have been necessary prior
to the introduction of nir_phi_builder).

In fact, the introduction of nir_phi_builder made the problem worse:
rename_variables_block() walks to the bottom of the dominance tree
before calling nir_phi_builder_value_get_block_def() which walks back to
the top of the dominance tree...

In any case, this patch ensures we avoid that problem as well.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years agoradeonsi: don't use allocas for arrays with LLVM 3.8
Marek Olšák [Thu, 25 Aug 2016 18:22:59 +0000 (20:22 +0200)]
radeonsi: don't use allocas for arrays with LLVM 3.8

It crashes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413

8 years agogallium/radeon: unify and simplify checking for an empty gfx IB
Marek Olšák [Wed, 24 Aug 2016 23:26:54 +0000 (01:26 +0200)]
gallium/radeon: unify and simplify checking for an empty gfx IB

We can take advantage of the fact that multi_fence does the obvious thing
with NULL fences.

This fixes unflushed fences that can get stuck due to empty IBs.

8 years agomesa: Drop sed of now dead Plo files.
Matt Turner [Thu, 25 Aug 2016 18:19:55 +0000 (11:19 -0700)]
mesa: Drop sed of now dead Plo files.

gen6/7/8_blorp.c were removed in commits c8bc1ae96ae198983c61, and
16a9fcbbb6 respectively.

8 years agometa: Always do GenerateMipmaps in linear colorspace.
Kenneth Graunke [Fri, 12 Aug 2016 21:48:54 +0000 (14:48 -0700)]
meta: Always do GenerateMipmaps in linear colorspace.

When generating mipmaps for sRGB textures, force both decode and encode,
so the filtering is done in linear colorspace, regardless of settings.

Fixes a WebGL conformance test in Chrome:
https://www.khronos.org/registry/webgl/sdk/tests/conformance2/textures/misc/tex-srgb-mipmap.html?webglVersion=2

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97322
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoconfigure.ac: raise Mako required version to 0.8.0
Eric Engestrom [Tue, 19 Jul 2016 12:41:36 +0000 (13:41 +0100)]
configure.ac: raise Mako required version to 0.8.0

It seems [0] old versions of Mako are no longer supported. Emil mentioned it
might need v0.8.0 [1] for isl_format_layout [2], although I didn't get
a confirmation that it's really the minimum.
Let's raise it to that to avoid getting other bugs.
We might lower it a bit again later if it turns out we can.

[0] https://lists.freedesktop.org/archives/mesa-dev/2016-July/122772.html
[1] https://lists.freedesktop.org/archives/mesa-dev/2016-July/122775.html
[2] https://lists.freedesktop.org/archives/mesa-dev/2016-July/123278.html

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Dave Airlie <Airlied@redhat.com>
8 years agoswrast: fix incorrectly positioned putImage() in swrast driver
Brian Paul [Wed, 24 Aug 2016 14:52:29 +0000 (08:52 -0600)]
swrast: fix incorrectly positioned putImage() in swrast driver

Some front buffer rendering was in the wrong position.  This included
scissored clears, glDrawPixels and glCopyPixels.  The problem was the
y coordinate passed to putImage() didn't match the y coordinate passed
to getImage().

We fix this by setting xrb->map_y to the inverted coordinate in
swrast_map_renderbuffer() which is used later by the putImage() call.
Also pass xrb->map_y to getImage() to be symmetric.
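
A small sketch of the coordinate flip involved, assuming the inverted coordinate is computed as buffer height minus y minus region height (GL places the origin at the bottom-left, the X image at the top-left). The helper name is hypothetical; the point is that getImage() and putImage() must both receive this same value.

```c
#include <assert.h>

/* Hypothetical helper: convert the bottom-left-origin y of a region
 * of height 'h' into the top-left-origin y of the same region.
 */
static int
invert_y(int buffer_height, int y, int h)
{
   assert(y >= 0 && h >= 0 && y + h <= buffer_height);
   return buffer_height - y - h;
}
```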

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97426
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
8 years agoradeonsi: disable SDMA texture copying on Carrizo
Marek Olšák [Wed, 24 Aug 2016 21:34:01 +0000 (23:34 +0200)]
radeonsi: disable SDMA texture copying on Carrizo

Cc: 12.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
8 years agogallium/noop: use 3-space indentation
Marek Olšák [Sun, 21 Aug 2016 10:41:29 +0000 (12:41 +0200)]
gallium/noop: use 3-space indentation

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agogallium: add a pipe_context parameter to resource_get_handle
Marek Olšák [Sun, 21 Aug 2016 10:24:59 +0000 (12:24 +0200)]
gallium: add a pipe_context parameter to resource_get_handle

radeonsi needs to do some operations (DCC decompression) for OpenGL-OpenCL
interop and this is the only way to make it coherent with the current
context. It can optionally be set to NULL.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agost/mesa: fix sRGB BlitFramebuffer regression
Nicolai Hähnle [Thu, 11 Aug 2016 11:06:47 +0000 (13:06 +0200)]
st/mesa: fix sRGB BlitFramebuffer regression

Broken since: 3190c7ee9727161d627f107c2e7f8ec3a11941c1

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97285

Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
8 years agoloader/dri3: Overhaul dri3_update_num_back
Michel Dänzer [Wed, 17 Aug 2016 08:02:04 +0000 (17:02 +0900)]
loader/dri3: Overhaul dri3_update_num_back

Always use 3 buffers when flipping. With only 2 buffers, we have to wait
for a flip to complete (which takes non-0 time even with asynchronous
flips) before we can start working on the next frame. We were previously
only using 2 buffers for flipping if the X server supports asynchronous
flips, even when we're not using asynchronous flips. This could result
in bad performance (the referenced bug report is an extreme case, where
the inter-frame stalls were preventing the GPU from reaching its maximum
clocks).

I couldn't measure any performance boost using 4 buffers with flipping.
Performance actually seemed to go down slightly, but that might have
been just noise.

Without flipping, a single back buffer is enough for swap interval 0,
but we need to use 2 back buffers when the swap interval is non-0,
otherwise we have to wait for the swap interval to pass before we can
start working on the next frame. This condition was previously reversed.
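
The resulting policy can be summarized in a few lines. This is only a sketch of the rules described above; choose_num_back() and its parameters are hypothetical names, not the loader's actual interface.

```c
#include <stdbool.h>

/* Hypothetical helper returning the number of back buffers to use. */
static int
choose_num_back(bool flipping, int swap_interval)
{
   /* Flipping: always use 3 buffers so the next frame can be started
    * while a flip is still pending.
    */
   if (flipping)
      return 3;

   /* Not flipping: one back buffer is enough for swap interval 0,
    * otherwise use 2 so we need not wait for the interval to pass.
    */
   return swap_interval != 0 ? 2 : 1;
}
```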

Cc: "12.0 11.2" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97260
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
8 years agoanv: Include the pipeline layout in the shader hash
Jason Ekstrand [Thu, 25 Aug 2016 00:14:11 +0000 (17:14 -0700)]
anv: Include the pipeline layout in the shader hash

The pipeline layout affects shader compilation because it is what
determines binding table locations as well as whether or not a particular
buffer has dynamic offsets.  Since this affects the generated shader, it
needs to be in the hash.  This fixes a bunch of CTS tests now that the CTS
is using a pipeline cache.
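
To illustrate the idea, the sketch below folds a layout description into the same running hash as the shader code, so two pipelines that differ only in layout get different cache keys. It uses FNV-1a purely for brevity, which is not claimed to be the hash the driver uses, and layout_info and hash_shader() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold 'size' bytes into a running 32-bit FNV-1a hash. */
static uint32_t
fnv1a_update(uint32_t hash, const void *data, size_t size)
{
   const uint8_t *bytes = data;

   for (size_t i = 0; i < size; i++) {
      hash ^= bytes[i];
      hash *= 16777619u;   /* FNV-1a 32-bit prime */
   }
   return hash;
}

/* Hypothetical stand-in for the layout state that affects codegen. */
struct layout_info {
   uint32_t binding_table_index;
   uint32_t has_dynamic_offset;
};

/* Hash the shader code together with the layout, if there is one. */
static uint32_t
hash_shader(const void *code, size_t code_size,
            const struct layout_info *layout)
{
   uint32_t hash = 2166136261u;   /* FNV-1a 32-bit offset basis */

   hash = fnv1a_update(hash, code, code_size);
   if (layout)
      hash = fnv1a_update(hash, layout, sizeof(*layout));

   return hash;
}
```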

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: "12.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv: Add a --disable-vulkan-icd-full-driver-path option
Jason Ekstrand [Tue, 23 Aug 2016 01:11:41 +0000 (18:11 -0700)]
anv: Add a --disable-vulkan-icd-full-driver-path option

This option makes installed Vulkan ICD files contain only a driver library
name and not a path.  This is intended for distros to help them work around
multi-arch issues.

Reviewed-by: Dave Airlie <airlied@redhat.com>
8 years agoi965/fs: Don't consider the stencil output to be a color output.
Francisco Jerez [Tue, 23 Aug 2016 01:50:41 +0000 (18:50 -0700)]
i965/fs: Don't consider the stencil output to be a color output.

This would cause gl_FragStencilRef to be counted as a color output
incorrectly during the precompile phase, which leads to unnecessary
recompilation on master and could trigger an assertion failure in
fs_visitor::emit_fb_writes() on my i965-fb-fetch branch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoglsl: Keep track of the set of fragment outputs read by a GL program.
Francisco Jerez [Wed, 20 Jul 2016 03:30:24 +0000 (20:30 -0700)]
glsl: Keep track of the set of fragment outputs read by a GL program.

This is the set of shader outputs whose initial value is provided to
the shader by some external means when the shader is executed, rather
than computed by the shader itself.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoglsl: Don't consider read-only fragment outputs to be written to.
Francisco Jerez [Wed, 20 Jul 2016 03:29:55 +0000 (20:29 -0700)]
glsl: Don't consider read-only fragment outputs to be written to.

Since they cannot be written.  This prevents adding fragment outputs
to the OutputsWritten set that are only read from via the
gl_LastFragData array but never written to.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoglsl/linker: Allow fragment output overlap for gl_LastFragData.
Francisco Jerez [Thu, 14 Jul 2016 19:57:14 +0000 (12:57 -0700)]
glsl/linker: Allow fragment output overlap for gl_LastFragData.

gl_LastFragData overlaps gl_FragData by definition.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoglsl/ast: Allow redeclaration of gl_LastFragData with different precision qualifier.
Francisco Jerez [Thu, 14 Jul 2016 19:52:51 +0000 (12:52 -0700)]
glsl/ast: Allow redeclaration of gl_LastFragData with different precision qualifier.

v2: No need to check the GLSL version. (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoglsl: Don't attempt to do dead varying elimination on gl_LastFragData arrays.
Francisco Jerez [Wed, 20 Jul 2016 03:23:17 +0000 (20:23 -0700)]
glsl: Don't attempt to do dead varying elimination on gl_LastFragData arrays.

Apparently this pass can only handle elimination of a single built-in
fragment output array, so the presence of gl_LastFragData (which it
wouldn't split correctly anyway) could prevent it from splitting the
actual gl_FragData array.  Just match gl_FragData by name since it's
the only built-in it can handle.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoglsl: Define a gl_LastFragData built-in for older GLSL versions.
Francisco Jerez [Wed, 20 Jul 2016 03:11:53 +0000 (20:11 -0700)]
glsl: Define a gl_LastFragData built-in for older GLSL versions.

The EXT_shader_framebuffer_fetch extension defines alternative
language for GLES2 shaders where user-defined fragment outputs are not
allowed.  Instead of using inout user-defined fragment outputs the
shader is expected to read from the gl_LastFragData built-in array.
In addition this allows using the same language on desktop GLSL
versions prior to 4.2 that support the deprecated gl_FragData built-in
in preparation for the MESA_shader_framebuffer_fetch desktop GL
extension.

Both legacy and user-defined inout outputs have a common
representation at the GLSL IR level, so it shouldn't make any
difference for optimization passes and back-ends whether the
application is using gl_LastFragData or user-defined outputs, all
they'll see is a variable dereference of a fragment output at a
certain interface location with the fb_fetch_output bit set to one.

v2: Don't define the built-in variable on GLSL versions for which
    gl_FragData exists but is deprecated. (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoglsl: Handle the inout qualifier in fragment shader output declarations.
Francisco Jerez [Wed, 20 Jul 2016 03:10:21 +0000 (20:10 -0700)]
glsl: Handle the inout qualifier in fragment shader output declarations.

According to the EXT_shader_framebuffer_fetch extension the inout
qualifier can be used on ESSL 3.0+ shaders to declare a special kind
of fragment output that gets implicitly initialized with the previous
framebuffer contents at the current fragment coordinates.  In addition
we allow using the same language to define FB fetch outputs in GLSL
1.3+ shaders in preparation for the desktop MESA_shader_framebuffer_fetch
extension.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>