mesa.git
8 years agosvga: remove local variable in create_backed_surface_view()
Brian Paul [Thu, 25 Aug 2016 21:04:52 +0000 (15:04 -0600)]
svga: remove local variable in create_backed_surface_view()

To simplify the code a bit.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
8 years agodocs: Add GL_KHR_blend_equation_advanced to relnotes.
Kenneth Graunke [Fri, 26 Aug 2016 20:17:22 +0000 (13:17 -0700)]
docs: Add GL_KHR_blend_equation_advanced to relnotes.

8 years agor600: increase performance for DRI PRIME offloading if 2nd GPU is Evergreen+
Mario Kleiner [Fri, 26 Aug 2016 16:59:05 +0000 (18:59 +0200)]
r600: increase performance for DRI PRIME offloading if 2nd GPU is Evergreen+

This is a direct port of Marek Olšák's patch
"radeonsi: increase performance for DRI PRIME
offloading if 2nd GPU is CIK or VI" to r600.

It uses SDMA for the detiling blit from renderoffload VRAM
to GTT, as SDMA is much faster for tiled->linear blits from
VRAM to GTT.

Testing on a dual Radeon HD-5770 setup reduced the time
for the render offload gpu to get its rendering into
system RAM from approximately 16 msecs for simple rendering
at 1920x1080 pixel 32 bpp to 5 msecs, a > 3x speedup!

This was measured using ftrace to trace the time the radeon kms
driver waited on the dmabuf fence of the renderoffload gpu to
complete.

All in all this brought the time for a flip down from 20 msecs
to 9 msecs, so the prime setup can display at full 60 fps instead
of barely 30 fps vsync'ed.
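
The jump from roughly 30 fps to 60 fps follows from vsync arithmetic: a flip
that takes longer than one refresh period has to wait for the next vblank. A
minimal sketch of that calculation, assuming a 60 Hz display and a simplified
model where flip time is the only cost (vsynced_fps() is a hypothetical
helper, not driver code):

```c
#include <assert.h>

/* Frame rate a vsynced display reaches when each flip takes flip_us
 * microseconds: a flip that misses a vblank waits for the next one. */
unsigned
vsynced_fps(unsigned flip_us, unsigned refresh_hz)
{
   assert(flip_us > 0 && refresh_hz > 0 && refresh_hz <= 1000000u);
   unsigned period_us = 1000000u / refresh_hz;                 /* 16666 us at 60 Hz */
   unsigned intervals = (flip_us + period_us - 1) / period_us; /* round up */
   return refresh_hz / intervals;
}
```

With the figures above, vsynced_fps(20000, 60) gives 30 and
vsynced_fps(9000, 60) gives 60.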

The current r600 implementation supports SDMA on Evergreen and
later, but not R600/R700 due to some bugs apparently present
in their SDMA implementation.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
8 years agodocs: Update stencil texturing & ES 3.1 status for i965 Haswell
Jordan Justen [Thu, 18 Aug 2016 22:05:13 +0000 (15:05 -0700)]
docs: Update stencil texturing & ES 3.1 status for i965 Haswell

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965: Enable OpenGLES 3.1 for Haswell
Jordan Justen [Wed, 8 Jun 2016 20:17:41 +0000 (13:17 -0700)]
i965: Enable OpenGLES 3.1 for Haswell

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965: Enable ARB_texture_stencil8 for Haswell
Jordan Justen [Tue, 14 Jun 2016 22:57:49 +0000 (15:57 -0700)]
i965: Enable ARB_texture_stencil8 for Haswell

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965: Enable ARB_stencil_texturing for Haswell
Jordan Justen [Wed, 8 Jun 2016 20:21:10 +0000 (13:21 -0700)]
i965: Enable ARB_stencil_texturing for Haswell

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965/gen7: Use R8_UINT stencil copy when sampling the stencil texture
Jordan Justen [Sat, 11 Jun 2016 23:41:18 +0000 (16:41 -0700)]
i965/gen7: Use R8_UINT stencil copy when sampling the stencil texture

v2:
 * Check gen <= 7, rather than gen == 7. (Ian)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965/gen7: Copy stencil when sampling the stencil texture
Jordan Justen [Sat, 11 Jun 2016 23:46:13 +0000 (16:46 -0700)]
i965/gen7: Copy stencil when sampling the stencil texture

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965: Add function to copy a stencil miptree to an R8_UINT miptree
Jordan Justen [Sat, 11 Jun 2016 23:44:27 +0000 (16:44 -0700)]
i965: Add function to copy a stencil miptree to an R8_UINT miptree

v2:
 * Cleanups suggested by Ian, Matt and Topi

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965: Track that the stencil data was updated when using Tex*Image
Jordan Justen [Wed, 6 Jul 2016 22:50:34 +0000 (15:50 -0700)]
i965: Track that the stencil data was updated when using Tex*Image

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965: Track that the stencil data was updated when rendering
Jordan Justen [Sat, 11 Jun 2016 23:29:36 +0000 (16:29 -0700)]
i965: Track that the stencil data was updated when rendering

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965: Track that the stencil data was updated when clearing
Jordan Justen [Sat, 11 Jun 2016 23:27:48 +0000 (16:27 -0700)]
i965: Track that the stencil data was updated when clearing

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965/gen7: Add R8_UINT stencil miptree copy for sampling
Jordan Justen [Sat, 11 Jun 2016 23:21:36 +0000 (16:21 -0700)]
i965/gen7: Add R8_UINT stencil miptree copy for sampling

For gen < 8, we can't sample from the stencil buffer, which is
required for the ARB_stencil_texturing extension. We'll make a copy of
the stencil data into a new texture that we can sample using the
R8_UINT surface type.
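
Together with the "track that the stencil data was updated" patches, this
amounts to a lazily refreshed shadow copy. A rough sketch of the pattern
follows. All names are hypothetical, and a plain memcpy() stands in for the
real stencil-to-R8_UINT copy, which this sketch does not attempt to model:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a stencil miptree and its sampleable shadow. */
struct stencil_tree {
   uint8_t *stencil;          /* data the hardware renders to */
   uint8_t *r8_copy;          /* R8_UINT-style copy used for sampling */
   size_t size;               /* bytes in each buffer */
   bool r8_needs_update;      /* set whenever the stencil is written */
};

/* Call after clears, rendering, or Tex*Image uploads touch the stencil. */
void
stencil_mark_written(struct stencil_tree *t)
{
   t->r8_needs_update = true;
}

/* Call before sampling: refresh the copy only if it is stale. */
const uint8_t *
stencil_get_sampleable(struct stencil_tree *t)
{
   assert(t->stencil && t->r8_copy);
   if (t->r8_needs_update) {
      memcpy(t->r8_copy, t->stencil, t->size);
      t->r8_needs_update = false;
   }
   return t->r8_copy;
}
```

The dirty flag keeps the copy off the fast path: sampling the same stencil
contents twice costs one copy, not two.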

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965: Fix assert with multisampling and cubemaps
Jordan Justen [Wed, 24 Aug 2016 04:46:58 +0000 (21:46 -0700)]
i965: Fix assert with multisampling and cubemaps

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965/hsw: Adjust uploading default color for stencil surfaces
Jordan Justen [Tue, 23 Aug 2016 05:47:50 +0000 (22:47 -0700)]
i965/hsw: Adjust uploading default color for stencil surfaces

v2:
 * has_component (Ken); const bits_per_channel (Topi)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965/hsw: Don't advertise more than 64 threads for compute shaders
Jordan Justen [Tue, 14 Jun 2016 22:04:34 +0000 (15:04 -0700)]
i965/hsw: Don't advertise more than 64 threads for compute shaders

thread_width_max in the GPGPU walker command limits us to a maximum of
64 threads.
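
As an illustration only, the clamp could look like the sketch below. The
helper names are hypothetical, and the relation "invocations = threads x SIMD
width" is an assumption made for the example, not something this commit
message states:

```c
#include <assert.h>

#define MAX_WALKER_THREADS 64u  /* limit stated in the commit message */

/* Clamp the thread count the hardware could run to what the walker allows. */
unsigned
clamp_cs_threads(unsigned hw_threads)
{
   return hw_threads < MAX_WALKER_THREADS ? hw_threads : MAX_WALKER_THREADS;
}

/* Assumed: invocations per work group = threads * channels per thread. */
unsigned
max_cs_invocations(unsigned hw_threads, unsigned simd_width)
{
   assert(simd_width == 8 || simd_width == 16 || simd_width == 32);
   return clamp_cs_threads(hw_threads) * simd_width;
}
```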

This fixes a crash on Haswell in the OpenGLES 3.1 conformance test
suite which tests the advertised limits of the max invocation counts.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agomain: Add MESA_VERBOSE=api support for glClearStencil
Jordan Justen [Sat, 11 Jun 2016 23:23:44 +0000 (16:23 -0700)]
main: Add MESA_VERBOSE=api support for glClearStencil

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agomain: Add MESA_VERBOSE=api support for glTexImage
Jordan Justen [Sat, 16 Jul 2016 01:03:29 +0000 (18:03 -0700)]
main: Add MESA_VERBOSE=api support for glTexImage

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agosvga: add guest statistic gathering interface
Charmaine Lee [Fri, 26 Aug 2016 13:58:59 +0000 (07:58 -0600)]
svga: add guest statistic gathering interface

This file was supposed to be added with the previous "svga: add guest
statistic gathering interface" patch but went MIA for some reason.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agoradeonsi: disable CE on SI + AMDGPU
Marek Olšák [Thu, 18 Aug 2016 23:37:34 +0000 (01:37 +0200)]
radeonsi: disable CE on SI + AMDGPU

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agowinsys/amdgpu: disable IB chaining on SI
Marek Olšák [Fri, 24 Jun 2016 16:13:21 +0000 (18:13 +0200)]
winsys/amdgpu: disable IB chaining on SI

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agowinsys/amdgpu: finish up SI addrlib integration
Marek Olšák [Thu, 18 Aug 2016 23:40:29 +0000 (01:40 +0200)]
winsys/amdgpu: finish up SI addrlib integration

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
8 years agowinsys/amdgpu: initial SI support
Ronie Salgado [Thu, 11 Feb 2016 09:17:33 +0000 (06:17 -0300)]
winsys/amdgpu: initial SI support

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
8 years agogallium/radeon: add a driver query for AMDGPU_INFO_NUM_EVICTIONS
Marek Olšák [Wed, 17 Aug 2016 23:18:14 +0000 (01:18 +0200)]
gallium/radeon: add a driver query for AMDGPU_INFO_NUM_EVICTIONS

If the kernel driver doesn't support it, it returns 0.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agoradeonsi: fix printing shaders and states on a VM fault
Marek Olšák [Thu, 18 Aug 2016 13:24:41 +0000 (15:24 +0200)]
radeonsi: fix printing shaders and states on a VM fault

This was missed while rewriting the PIPE_DUMP flags.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agoradeonsi: increase performance for DRI PRIME offloading if 2nd GPU is CIK or VI
Marek Olšák [Thu, 18 Aug 2016 11:05:29 +0000 (13:05 +0200)]
radeonsi: increase performance for DRI PRIME offloading if 2nd GPU is CIK or VI

SDMA is much faster for tiled->linear blits from VRAM to GTT.
I have Bonaire in my second PCIe slot.

$ glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD TONGA ...

$ DRI_PRIME=1 glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD BONAIRE ...

Without SDMA:
$ DRI_PRIME=1 glxgears
8796 frames in 5.0 seconds = 1759.074 FPS
8899 frames in 5.0 seconds = 1779.672 FPS

With SDMA:
$ DRI_PRIME=1 glxgears
12765 frames in 5.0 seconds = 2552.788 FPS
12888 frames in 5.0 seconds = 2577.495 FPS

The 1st GPU is irrelevant. The improvement should be much lower at 60 fps,
but definitely measurable.

SI will get this once we add SDMA blit support for it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agoradeonsi: enable SDMA on CIK
Marek Olšák [Thu, 18 Aug 2016 11:03:26 +0000 (13:03 +0200)]
radeonsi: enable SDMA on CIK

It passes R600_DEBUG=testdma on Bonaire/radeon.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agogallium/radeon: increase priority for shader binaries
Marek Olšák [Wed, 17 Aug 2016 12:24:26 +0000 (14:24 +0200)]
gallium/radeon: increase priority for shader binaries

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agogallium/radeon: merge USER_SHADER and INTERNAL_SHADER priority flags
Marek Olšák [Wed, 17 Aug 2016 12:22:11 +0000 (14:22 +0200)]
gallium/radeon: merge USER_SHADER and INTERNAL_SHADER priority flags

There's no reason to separate these.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
8 years agovbo: set draw_id
Miklós Máté [Fri, 26 Aug 2016 12:48:00 +0000 (06:48 -0600)]
vbo: set draw_id

Fixes conditional jump depending on uninitialized value
in si_state_draw.c:593

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: fix regression related to srgb
Neha Bhende [Thu, 18 Aug 2016 22:27:45 +0000 (15:27 -0700)]
svga: fix regression related to srgb

This regression was caused by commit 3190c7ee9727161d627f107c2e7f8ec3a11941c1,
that is, by Mesa following the OpenGL 4.4 spec rules related to
GL_FRAMEBUFFER_SRGB.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: use local variable blit instead of pointer
Neha Bhende [Fri, 19 Aug 2016 19:52:57 +0000 (13:52 -0600)]
svga: use local variable blit instead of pointer

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: s/INDEX_0D/INDEX_IMMEDIATE32/
Brian Paul [Fri, 19 Aug 2016 15:37:11 +0000 (09:37 -0600)]
svga: s/INDEX_0D/INDEX_IMMEDIATE32/

Both are zero, but the latter is the right token.

8 years agosvga: add comment about unsupported blend modes
Brian Paul [Fri, 19 Aug 2016 15:36:02 +0000 (09:36 -0600)]
svga: add comment about unsupported blend modes

8 years agosvga: fix ordering of mksstats counter strings
Charmaine Lee [Fri, 19 Aug 2016 01:02:17 +0000 (18:02 -0700)]
svga: fix ordering of mksstats counter strings

String for SVGA_STATS_COUNT_TEXREADBACK was swapped
with the string for SVGA_STATS_COUNT_SURFACEWRITEFLUSH.

Trivial fix.

8 years agosvga: avoid emitting redundant SetShaderResource command
Charmaine Lee [Wed, 17 Aug 2016 23:50:23 +0000 (16:50 -0700)]
svga: avoid emitting redundant SetShaderResource command

Tested with Lightsmark2008, Heaven, MTT piglit, glretrace, viewperf, conform.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: add a cleanup function to clean up sampler state
Charmaine Lee [Wed, 17 Aug 2016 21:53:38 +0000 (14:53 -0700)]
svga: add a cleanup function to clean up sampler state

This patch adds a cleanup function to clean up sampler state at
context destruction time.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: loosen the condition to flush in get_query_result_vgpu10()
Brian Paul [Fri, 19 Aug 2016 16:15:14 +0000 (10:15 -0600)]
svga: loosen the condition to flush in get_query_result_vgpu10()

Fixes piglit spec/ext_transform_feedback/overflow-edge-cases segfaults
because the query's fence pointer was null.

Tested with Piglit, Sauerbraten, ETQW.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: fix vgpu10 query fencing
Brian Paul [Thu, 18 Aug 2016 16:15:46 +0000 (10:15 -0600)]
svga: fix vgpu10 query fencing

We don't want to flush the command buffer or sync on the fence when ending
a query (that kind of defeats the whole purpose of async queries).  Do that
instead in get_query_result().
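
A minimal sketch of the deferred pattern, under the assumption that a missing
fence means "not flushed yet". The types and functions are hypothetical
stand-ins, not the svga driver's own:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names are the svga driver's. */
struct fence { bool signalled; };
struct context { struct fence pending; int flushes; };
struct query { struct fence *fence; int result; };

/* Submit queued commands and return the fence that guards them. */
struct fence *
context_flush(struct context *ctx)
{
   ctx->flushes++;
   return &ctx->pending;
}

/* Ending a query only records it: no flush, no wait. */
void
end_query(struct context *ctx, struct query *q)
{
   (void)ctx;
   q->fence = NULL;
}

/* Flush lazily, and only block when the caller asked to wait. */
bool
get_query_result(struct context *ctx, struct query *q, bool wait, int *out)
{
   assert(out != NULL);
   if (q->fence == NULL)
      q->fence = context_flush(ctx);
   if (!q->fence->signalled) {
      if (!wait)
         return false;
      q->fence->signalled = true;   /* stands in for a real fence wait */
   }
   *out = q->result;
   return true;
}
```

Polling with wait == false costs at most one flush. Only a caller that asks
to wait blocks on the fence.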

Tested with Piglit, arbocclude, Sauerbraten game, Nobel Clinician Viewer,
ETQW.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: avoid emitting redundant DXSetSamplers command
Charmaine Lee [Tue, 16 Aug 2016 01:35:28 +0000 (18:35 -0700)]
svga: avoid emitting redundant DXSetSamplers command

This patch avoids emitting redundant DXSetSamplers commands.
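
The usual way to do this is to remember what was last sent and compare before
emitting. A sketch under that assumption; the cache layout and set_samplers()
are hypothetical and do not mirror the driver's real state tracking:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_SAMPLERS 16

/* Hypothetical per-stage cache of the sampler ids last sent to the device. */
struct sampler_cache {
   uint32_t ids[MAX_SAMPLERS];
   unsigned count;
   bool valid;               /* false until the first emit */
   unsigned commands_emitted;
};

/* Returns true if a command was emitted, false if it was redundant. */
bool
set_samplers(struct sampler_cache *c, const uint32_t *ids, unsigned count)
{
   assert(count <= MAX_SAMPLERS);
   if (c->valid && c->count == count &&
       memcmp(c->ids, ids, count * sizeof(ids[0])) == 0)
      return false;

   memcpy(c->ids, ids, count * sizeof(ids[0]));
   c->count = count;
   c->valid = true;
   c->commands_emitted++;    /* stands in for writing the real command */
   return true;
}
```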

Tested with Lightsmark2008, Heaven, MTT piglit, glretrace, viewperf.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: enable ARB_clear_texture extension in the driver.
Neha Bhende [Thu, 11 Aug 2016 23:56:01 +0000 (16:56 -0700)]
svga: enable ARB_clear_texture extension in the driver.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: define svga_clear() in svga_init_clear_functions()
Neha Bhende [Thu, 11 Aug 2016 23:53:04 +0000 (16:53 -0700)]
svga: define svga_clear() in svga_init_clear_functions()

Put all the clearing-related functions in svga_init_clear_functions().

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: add svga_init_clear_functions()
Neha Bhende [Thu, 11 Aug 2016 23:43:03 +0000 (16:43 -0700)]
svga: add svga_init_clear_functions()

Define svga_init_clear_functions() and install svga_clear_texture() as
svga->pipe.clear_texture. This is part of the ARB_clear_texture
extension.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: add new function svga_clear_texture()
Neha Bhende [Thu, 11 Aug 2016 23:37:24 +0000 (16:37 -0700)]
svga: add new function svga_clear_texture()

This function clears a texture. It is part of the ARB_clear_texture
extension, which lets an application clear a texture to given
color values.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: add new begin_blit()
Neha Bhende [Thu, 11 Aug 2016 23:30:14 +0000 (16:30 -0700)]
svga: add new begin_blit()

Saving all blitter states will be done in begin_blit() so that
begin_blit() can be used before performing any blit operation.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: add opt to the list of valid build types
Charmaine Lee [Fri, 12 Aug 2016 01:41:52 +0000 (18:41 -0700)]
svga: add opt to the list of valid build types

For opt build, add VMX86_STATS to the list of cpp defines.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: add guest statistic gathering interface
Charmaine Lee [Fri, 19 Aug 2016 14:49:17 +0000 (08:49 -0600)]
svga: add guest statistic gathering interface

This patch adds a guest statistic gathering interface to the svga
winsys interface; it can be used to gather svga driver statistics.
The winsys module can then share the statistics with the VMX host
via the mksstats interface.

The statistic enums used in the svga driver are defined in
svga_stats_count and svga_stats_time in svga_winsys.h

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: fix indirect non-indexable temp access
Charmaine Lee [Tue, 25 Aug 2015 21:53:51 +0000 (14:53 -0700)]
svga: fix indirect non-indexable temp access

If the shader has indirect access to non-indexable temporaries,
convert them to an indexable temporary array.
This works around a bug in the GLSL->TGSI translator.

Fixes glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test
on DX11Renderer.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agogallium/hud: move signo declaration inside PIPE_OS_UNIX block
Brian Paul [Wed, 17 Aug 2016 14:29:55 +0000 (08:29 -0600)]
gallium/hud: move signo declaration inside PIPE_OS_UNIX block

To silence unused var warning with MSVC, MinGW.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoi965: Embrace "unlimited" GTT mmap support
Chris Wilson [Wed, 24 Aug 2016 19:35:46 +0000 (20:35 +0100)]
i965: Embrace "unlimited" GTT mmap support

From about kernel 4.9, GTT mmaps are virtually unlimited. A new
parameter, I915_PARAM_MMAP_GTT_VERSION, is added to advertise the
feature so query it and use it to avoid limiting tiled allocations to
only fit within the mappable aperture.

A couple of caveats:

 - fence support is still limited by stride to 262144 and the stride
needs to be a multiple of tile_width (as before, and same limitation as
the current 3D pipeline in hardware)

 - the max_gtt_map_object_size forcing untiled may be hiding a few bugs
in handling of large objects, though none were spotted in piglits.
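
A hedged sketch of how the two rules might be expressed. The function names
are hypothetical, "version >= 1 means unlimited" and the one-quarter fallback
are assumptions for illustration, and the stride limit is treated as
inclusive, which the text above does not specify:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_FENCE_STRIDE 262144u   /* stride limit quoted in the commit message */

/* Largest object we are willing to map through the GTT.
 * Assumption: version >= 1 means "unlimited"; the one-quarter fallback
 * is purely illustrative. */
uint64_t
max_gtt_map_object_size(int mmap_gtt_version, uint64_t mappable_aperture)
{
   if (mmap_gtt_version >= 1)
      return UINT64_MAX;
   return mappable_aperture / 4;
}

/* A tiled surface still needs a stride the fence hardware accepts. */
bool
stride_is_fenceable(uint32_t stride, uint32_t tile_width)
{
   assert(tile_width > 0);
   return stride <= MAX_FENCE_STRIDE && stride % tile_width == 0;
}
```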

See kernel commit 4cc6907501ed ("drm/i915: Add I915_PARAM_MMAP_GTT_VERSION
to advertise unlimited mmaps").

v2: Include some commentary on mmap virtual space vs CPU addressable
space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
8 years agomesa/main: Fix missing return in non void function
Tobias Klausmann [Thu, 25 Aug 2016 21:48:31 +0000 (23:48 +0200)]
mesa/main: Fix missing return in non void function

This was found by obs:
I: Program returns random data in a function
E: Mesa no-return-in-nonvoid-function main/program_resource.c:109

v2: Remove the ! on the string (Ian Romanick)
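
For readers unfamiliar with this class of warning, here is a generic
illustration. The enums and function are hypothetical and are not the code at
main/program_resource.c:109. An assert alone cannot be a function's only exit,
because it compiles away under NDEBUG and the function then falls off the end:

```c
#include <assert.h>

enum shader_stage { STAGE_VERTEX, STAGE_FRAGMENT, STAGE_INVALID };

/* Hypothetical interface ids; not Mesa's real enums. */
enum program_interface { IFACE_VERTEX_SUBROUTINE, IFACE_FRAGMENT_SUBROUTINE };

enum shader_stage
stage_from_interface(enum program_interface iface)
{
   switch (iface) {
   case IFACE_VERTEX_SUBROUTINE:
      return STAGE_VERTEX;
   case IFACE_FRAGMENT_SUBROUTINE:
      return STAGE_FRAGMENT;
   default:
      /* The assert vanishes under NDEBUG, so an explicit return follows. */
      assert(0 && "unexpected interface value");
      return STAGE_INVALID;
   }
}
```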

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
8 years agoi965: Implement GL_KHR_blend_equation_advanced_coherent on Gen9+.
Kenneth Graunke [Thu, 30 Jun 2016 05:16:49 +0000 (22:16 -0700)]
i965: Implement GL_KHR_blend_equation_advanced_coherent on Gen9+.

We always use a coherent read, and ignore the "opt out" enable flag.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agomesa: Implement GL_KHR_blend_equation_advanced_coherent.
Kenneth Graunke [Thu, 30 Jun 2016 04:53:06 +0000 (21:53 -0700)]
mesa: Implement GL_KHR_blend_equation_advanced_coherent.

This adds the extension enable (so drivers can advertise it) and the
extra boolean state flag, GL_BLEND_ADVANCED_COHERENT_KHR, which can
be set to request coherent blending.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoi965: Enable GL_KHR_blend_equation_advanced on G45 and later.
Kenneth Graunke [Tue, 28 Jun 2016 06:02:24 +0000 (23:02 -0700)]
i965: Enable GL_KHR_blend_equation_advanced on G45 and later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoi965: Disable hardware blending if advanced blending is in use.
Kenneth Graunke [Tue, 28 Jun 2016 15:24:11 +0000 (08:24 -0700)]
i965: Disable hardware blending if advanced blending is in use.

We'll do blending in the shader in this case, so just disable the
hardware blending.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoglsl: Add a lowering pass to handle advanced blending modes.
Kenneth Graunke [Mon, 27 Jun 2016 18:32:16 +0000 (11:32 -0700)]
glsl: Add a lowering pass to handle advanced blending modes.

Many GPUs cannot handle GL_KHR_blend_equation_advanced natively, and
need to emulate it in the pixel shader.  This lowering pass implements
all the necessary math for advanced blending.  It fetches the existing
framebuffer value using the MESA_shader_framebuffer_fetch built-in
variables, and the previous commit's state var uniform to select
which equation to use.

This is done at the GLSL IR level to make it easy for all drivers to
implement the GL_KHR_blend_equation_advanced extension and share code.

Drivers need to hook up MESA_shader_framebuffer_fetch functionality:
1. Hook up the fb_fetch_output variable
2. Implement BlendBarrier()

Then to get KHR_blend_equation_advanced, they simply need to:
3. Disable hardware blending based on ctx->Color._AdvancedBlendEnabled
4. Call this lowering pass.

Very little driver specific code should be required.

v2: Handle multiple output variables per render target (which may exist
    due to ARB_enhanced_layouts), and array variables (even with one
    render target, we might have out vec4 color[1]), and non-vec4
    variables (it's easier than finding spec text to justify not
    handling it).  Thanks to Francisco Jerez for the feedback.
v3: Lower main returns so that we have a single exit point where we
    can add our blending epilogue (caught by Francisco Jerez).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agocompiler: Add a new STATE_VAR_ADVANCED_BLENDING_MODE built-in uniform.
Kenneth Graunke [Tue, 28 Jun 2016 16:02:42 +0000 (09:02 -0700)]
compiler: Add a new STATE_VAR_ADVANCED_BLENDING_MODE built-in uniform.

This will be used for emulating GL_KHR_blend_equation_advanced features
in shader code.  We'll pass in the blending mode that's in use, and use
that in (effectively) a switch statement in the shader.
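
To make the "switch statement in the shader" concrete, here is a CPU-side
sketch for a single color channel and two modes. The enum, the function, and
the fallback case are hypothetical. The coverage weighting is written from
memory of the extension and should be checked against the
KHR_blend_equation_advanced specification before being relied on:

```c
#include <assert.h>

/* Hypothetical mode ids; a driver would upload the real value as a uniform. */
enum blend_mode { BLEND_MULTIPLY, BLEND_SCREEN };

/* Blend one premultiplied color channel.  src/dst are premultiplied
 * values, src_a/dst_a the matching alphas. */
float
blend_channel(enum blend_mode mode, float src, float src_a,
              float dst, float dst_a)
{
   assert(src_a >= 0.0f && src_a <= 1.0f && dst_a >= 0.0f && dst_a <= 1.0f);

   /* Undo premultiplication; treat fully transparent colors as zero. */
   float cs = src_a > 0.0f ? src / src_a : 0.0f;
   float cd = dst_a > 0.0f ? dst / dst_a : 0.0f;

   float f;
   switch (mode) {
   case BLEND_MULTIPLY: f = cs * cd;           break;
   case BLEND_SCREEN:   f = cs + cd - cs * cd; break;
   default:             f = cs;                break; /* illustrative fallback */
   }

   /* Weight by the area covered by both, by src only, and by dst only. */
   float both = src_a * dst_a;
   float src_only = src_a * (1.0f - dst_a);
   float dst_only = dst_a * (1.0f - src_a);
   return f * both + cs * src_only + cd * dst_only;
}
```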

v2: Use the new _AdvancedBlendMode field.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agomesa: Add draw time validation for advanced blending modes.
Kenneth Graunke [Sat, 20 Aug 2016 19:51:03 +0000 (12:51 -0700)]
mesa: Add draw time validation for advanced blending modes.

v2: Add null checks (requested by Curro).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agomesa: Restyle _mesa_check_blend_func_error().
Kenneth Graunke [Sat, 20 Aug 2016 19:18:16 +0000 (12:18 -0700)]
mesa: Restyle _mesa_check_blend_func_error().

I'm about to add more error conditions to this function, so I wanted to
move the current spec citation above the code that checks it.  Indenting
it required reformatting, so I tried to move it to our newer style.

While there, I also decided to drop some GL type usage, and drop the
unnecessary "_mesa_" prefix on a static function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agomesa: Track the current advanced blending mode.
Kenneth Graunke [Tue, 28 Jun 2016 15:17:57 +0000 (08:17 -0700)]
mesa: Track the current advanced blending mode.

This will be useful for a number of things:
- Checking the current advanced blending mode against the shader's
  blend_support_* qualifiers.
- Disabling hardware blending when emulating advanced blending.
- Uploading the current advanced blending mode as a state var.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agomesa: Allow advanced blending enums in glBlendEquation[i].
Kenneth Graunke [Tue, 28 Jun 2016 16:18:19 +0000 (09:18 -0700)]
mesa: Allow advanced blending enums in glBlendEquation[i].

Don't allow them in glBlendEquationSeparate[i], though, as required
by the spec.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoglsl: Merge blend_support qualifiers when linking.
Kenneth Graunke [Tue, 28 Jun 2016 17:02:06 +0000 (10:02 -0700)]
glsl: Merge blend_support qualifiers when linking.

Since each qualifier represents a blending mode the shader can be used
with, we take the union of all possible modes when linking.
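
If each qualifier is one bit in a mask, the union is a bitwise OR over the
shaders being linked. A small sketch with hypothetical names (the real enum
is gl_advanced_blend_mode, whose values are not reproduced here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit per blend_support_* qualifier. */
enum {
   SUPPORT_NONE     = 0,
   SUPPORT_MULTIPLY = 1 << 0,
   SUPPORT_SCREEN   = 1 << 1,
   SUPPORT_OVERLAY  = 1 << 2
};

/* Union of the modes declared by every shader being linked. */
uint32_t
link_blend_support(const uint32_t *per_shader, size_t count)
{
   assert(per_shader != NULL || count == 0);
   uint32_t linked = SUPPORT_NONE;
   for (size_t i = 0; i < count; i++)
      linked |= per_shader[i];
   return linked;
}
```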

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoglsl: process blend_support_* qualifiers
Ilia Mirkin [Sat, 2 Apr 2016 02:51:39 +0000 (22:51 -0400)]
glsl: process blend_support_* qualifiers

v2 (Ken): Add a BLEND_NONE enum value (no qualifiers in use).
v3 (Ken): Rename gl_blend_support_qualifier to gl_advanced_blend_mode.
v4 (Ken): Mark map[] as static const (Ilia).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoglsl: add basic KHR_blend_equation_advanced infrastructure
Ilia Mirkin [Sat, 2 Apr 2016 02:17:27 +0000 (22:17 -0400)]
glsl: add basic KHR_blend_equation_advanced infrastructure

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agomesa: add KHR_blend_equation_advanced enable and extension string
Ilia Mirkin [Sat, 2 Apr 2016 02:13:22 +0000 (22:13 -0400)]
mesa: add KHR_blend_equation_advanced enable and extension string

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoglapi: add KHR_blend_equation_advanced dispatch
Ilia Mirkin [Sat, 2 Apr 2016 02:08:13 +0000 (22:08 -0400)]
glapi: add KHR_blend_equation_advanced dispatch

v2 (Ken): Fix enum values, drop _mesa_BlendBarrierKHR stub as Curro has
          already implemented it.
v3 (Ken): Rework for _mesa_BlendBarrierKHR -> _mesa_BlendBarrier rename.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agomesa: Rename _mesa_BlendBarrierMESA to _mesa_BlendBarrier.
Kenneth Graunke [Sat, 13 Aug 2016 02:07:33 +0000 (19:07 -0700)]
mesa: Rename _mesa_BlendBarrierMESA to _mesa_BlendBarrier.

Note that _mesa_BlendBarrierMESA is not currently hooked up in the
glapi XML, so we can just rename it.  We'll hook it up for the
KHR_blend_equation_advanced extension shortly.

We may as well use the ES 3.2 core name with no suffixes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoi965: Safely iterate the predecessors of the end block.
Kenneth Graunke [Thu, 25 Aug 2016 04:33:16 +0000 (21:33 -0700)]
i965: Safely iterate the predecessors of the end block.

We want to insert code in each of the predecessors of the end block.
This code includes a nir_if, which would split the block, altering
the set.  To avoid that, I emitted a dead constant at the end of each
block before splitting it, so that the set of predecessors remained
unchanged.  This was admittedly ugly.

Connor suggested instead saving a copy of the set, so we can iterate
it safely.  This is also a little ugly, but a much better plan.
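
The snapshot approach, sketched with a fixed-size array in place of the real
predecessor set. All names are hypothetical, and split_block() merely mimics
the side effect of inserting an if:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PREDS 16

/* Hypothetical predecessor set: block ids in a fixed array. */
struct pred_set {
   int blocks[MAX_PREDS];
   size_t count;
};

/* Stands in for inserting an if: splitting adds a new block to the set. */
void
split_block(struct pred_set *live, int block)
{
   assert(live->count < MAX_PREDS);
   live->blocks[live->count++] = block + 100;
}

/* Visit every original predecessor exactly once, even though
 * split_block() grows the live set while we work. */
size_t
emit_in_predecessors(struct pred_set *live)
{
   struct pred_set snapshot;
   memcpy(&snapshot, live, sizeof(snapshot));   /* iterate the copy */

   size_t visited = 0;
   for (size_t i = 0; i < snapshot.count; i++) {
      split_block(live, snapshot.blocks[i]);    /* mutate the original */
      visited++;
   }
   return visited;
}
```

Iterating the live set directly would also visit the blocks added during the
loop, which is the hazard described above.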

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years agonir: Use nir_shader_get_entrypoint in TCS quad workaround code.
Kenneth Graunke [Thu, 18 Aug 2016 17:56:48 +0000 (10:56 -0700)]
nir: Use nir_shader_get_entrypoint in TCS quad workaround code.

We want to insert the code at the end of the program.  Looping over
all the functions (of which there was only one) was the old way of doing
this, but now we have nir_shader_get_entrypoint(), so let's use it.

Suggested by Connor Abbott.

v2: Update for nir_shader_get_entrypoint API change.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years agonir: Change nir_shader_get_entrypoint to return an impl.
Kenneth Graunke [Thu, 25 Aug 2016 02:09:57 +0000 (19:09 -0700)]
nir: Change nir_shader_get_entrypoint to return an impl.

Jason suggested adding an assert(function->impl) here.  All callers
of this function actually want ->impl, so I decided just to change
the API.
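
The shape of the change, shown on a hypothetical miniature of the types
involved (these are not NIR's definitions):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the types involved; not NIR's definitions. */
struct function_impl { int num_blocks; };
struct function { const char *name; struct function_impl *impl; };
struct shader { struct function *entry; };

/* Return the implementation directly, so callers never write ->impl. */
struct function_impl *
shader_get_entrypoint(const struct shader *s)
{
   assert(s != NULL && s->entry != NULL);
   assert(s->entry->impl != NULL);   /* the check suggested in review */
   return s->entry->impl;
}
```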

We also change the nir_lower_io_to_temporaries API here.  All but one
caller passed nir_shader_get_entrypoint(), and with the previous commit,
it now uses a nir_function_impl internally.  Folding this change in
avoids the need to change it and change it back.

v2: Fix one call I missed in ir3_compiler (caught by Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years agonir: Make nir_lower_io_to_temporaries store an impl internally.
Kenneth Graunke [Thu, 25 Aug 2016 02:15:53 +0000 (19:15 -0700)]
nir: Make nir_lower_io_to_temporaries store an impl internally.

This changes the pass internals to work with a nir_function_impl
directly rather than a nir_function.  The next patch will change
the API.

v2: Rebase after framebuffer fetch landed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
8 years agoi965: Expose shader framebuffer fetch extensions on Gen9+.
Francisco Jerez [Fri, 22 Jul 2016 22:52:49 +0000 (15:52 -0700)]
i965: Expose shader framebuffer fetch extensions on Gen9+.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Hook up coherent framebuffer reads to the NIR front-end.
Francisco Jerez [Fri, 19 Aug 2016 05:12:37 +0000 (22:12 -0700)]
i965/fs: Hook up coherent framebuffer reads to the NIR front-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Remove special casing of framebuffer writes in scheduler code.
Francisco Jerez [Thu, 21 Jul 2016 23:56:05 +0000 (16:56 -0700)]
i965/fs: Remove special casing of framebuffer writes in scheduler code.

The reason why it was safe for the scheduler to ignore the side
effects of framebuffer write instructions was that those side effects
couldn't have had any influence on any other instruction in the
program, because we weren't doing framebuffer reads, and framebuffer
writes were always non-overlapping.  We need actual memory dependency
analysis in order to determine whether a side-effectful instruction
can be reordered with respect to other instructions in the program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Don't CSE render target messages with different target index.
Francisco Jerez [Thu, 7 Jul 2016 03:49:58 +0000 (20:49 -0700)]
i965/fs: Don't CSE render target messages with different target index.

We weren't checking the fs_inst::target field when comparing whether
two instructions are equal.  For FB writes it doesn't matter because
they aren't CSE-able anyway, but this would have become a problem with
FB reads which are expression-like instructions.
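
As a rough illustration only (the struct and function below are
hypothetical stand-ins, not the real fs_inst or the real CSE code; only
the `target` field name is taken from this message), the comparison has
to include the render target index along these lines:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an instruction. */
struct toy_inst {
   int opcode;
   int target;   /* render target index */
   int src[2];   /* source register numbers */
};

/* Two instructions may be merged by CSE only if every field that
 * affects the result matches, including the render target index. */
static bool
toy_instructions_match(const struct toy_inst *a, const struct toy_inst *b)
{
   assert(a != NULL && b != NULL);
   return a->opcode == b->opcode &&
          a->target == b->target &&
          a->src[0] == b->src[0] &&
          a->src[1] == b->src[1];
}
```

Without the `target` comparison, two reads of different render targets
with identical sources would be treated as the same expression.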

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Define logical framebuffer read opcode and lower it to physical reads.
Francisco Jerez [Thu, 21 Jul 2016 23:55:45 +0000 (16:55 -0700)]
i965/fs: Define logical framebuffer read opcode and lower it to physical reads.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Define framebuffer read virtual opcode.
Francisco Jerez [Thu, 21 Jul 2016 23:52:33 +0000 (16:52 -0700)]
i965/fs: Define framebuffer read virtual opcode.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/disasm: Fix RC message type strings on Gen7+.
Francisco Jerez [Tue, 19 Jul 2016 18:52:23 +0000 (11:52 -0700)]
i965/disasm: Fix RC message type strings on Gen7+.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/eu: Add codegen support for the Gen9+ render target read message.
Francisco Jerez [Fri, 22 Jul 2016 02:13:55 +0000 (19:13 -0700)]
i965/eu: Add codegen support for the Gen9+ render target read message.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/eu: Take into account the target cache argument in brw_set_dp_read_message.
Francisco Jerez [Fri, 22 Jul 2016 01:49:36 +0000 (18:49 -0700)]
i965/eu: Take into account the target cache argument in brw_set_dp_read_message.

brw_set_dp_read_message() was setting the data cache as send message
SFID on Gen7+ hardware, ignoring the target cache specified by the
caller.  Some of the callers were passing a bogus target cache value
as argument relying on brw_set_dp_read_message not to take it into
account.  Fix them too.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Flip the non-coherent framebuffer fetch extension bit on G45-Gen8 hardware.
Francisco Jerez [Tue, 19 Jul 2016 22:23:30 +0000 (15:23 -0700)]
i965: Flip the non-coherent framebuffer fetch extension bit on G45-Gen8 hardware.

This is not enabled on the original Gen4 part because it lacks surface
state tile offsets so it may not be possible to sample from arbitrary
non-zero layers of the framebuffer depending on the miptree layout (it
should be possible to work around this by allocating a scratch surface
and doing the same hack currently used for render targets, but meh...).

On Gen9+ even though it should mostly work (feel free to force-enable
it in order to compare the coherent and non-coherent paths in terms of
performance), there are some corner cases like 1D array layered
framebuffers that cannot be handled easily by the non-coherent path
because of the incompatible layout in memory of 1D and 2D miptrees (it
should be possible to work around this too by doing state-dependent
recompiles, but it's hard to care enough since Gen9 has native support
for coherent render target reads...)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Implement glBlendBarrier.
Francisco Jerez [Fri, 1 Jul 2016 20:54:05 +0000 (13:54 -0700)]
i965: Implement glBlendBarrier.

This is a no-op if the platform supports coherent framebuffer fetch.
If it doesn't, we just need to flush the render cache and invalidate
the texture cache in order for previous rendering to be visible to
framebuffer fetch.
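
A minimal sketch of that decision, assuming made-up flag names (the
driver's real cache-control flags and entry points are not reproduced
here):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cache operation bits, for illustration only. */
enum toy_cache_op {
   TOY_RENDER_CACHE_FLUSH       = 1 << 0,
   TOY_TEXTURE_CACHE_INVALIDATE = 1 << 1,
};

/* Returns the cache operations a blend barrier would need. */
static unsigned
toy_blend_barrier_cache_ops(bool has_coherent_fb_fetch)
{
   unsigned ops = 0;

   /* Coherent framebuffer fetch sees previous rendering already. */
   if (!has_coherent_fb_fetch)
      ops = TOY_RENDER_CACHE_FLUSH | TOY_TEXTURE_CACHE_INVALIDATE;

   /* Either nothing is needed or both operations are. */
   assert(ops == 0 ||
          ops == (TOY_RENDER_CACHE_FLUSH | TOY_TEXTURE_CACHE_INVALIDATE));
   return ops;
}
```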

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Upload surface state for non-coherent framebuffer fetch.
Francisco Jerez [Fri, 1 Jul 2016 20:56:47 +0000 (13:56 -0700)]
i965: Upload surface state for non-coherent framebuffer fetch.

This iterates over the list of attached render buffers and binds
appropriate surface state structures to the binding table block
allocated for shader framebuffer read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Implement support for overriding the texture target in brw_emit_surface_state.
Francisco Jerez [Fri, 22 Jul 2016 05:23:13 +0000 (22:23 -0700)]
i965: Implement support for overriding the texture target in brw_emit_surface_state.

This allows the caller to bind a miptree using a texture target other
than the one it was created with.  The code should work even if the
memory layouts of the specified and original targets don't match, as
long as the caller only intends to access a single slice of the
miptree structure.

This will be exploited by the next commit in order to support
non-coherent framebuffer fetch of a single layer of a 3D texture
(since some generations lack the minimum array element control for 3D
textures bound to the sampler unit), and multiple layers of a 1D array
texture (since binding it as an actual 1D array texture would require
state-dependent recompiles because the same shader couldn't
simultaneously work for 1D and 2D array textures due to the different
texel fetch coordinate ordering).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Massage argument list of brw_emit_surface_state().
Francisco Jerez [Fri, 19 Aug 2016 05:08:10 +0000 (22:08 -0700)]
i965: Massage argument list of brw_emit_surface_state().

This commit does three different things in a single pass in order to
keep the amount of churn low: Remove the for_gather boolean argument
which was unused, pass the isl_view argument by value rather than by
reference since I'll have to modify it from within the function, and
add a target argument to allow callers to bind textures using a target
other than the original.  The prototype of the function now looks
like:

 void brw_emit_surface_state(struct brw_context *brw,
                             struct intel_mipmap_tree *mt,
                             GLenum target, struct isl_view view,
                             uint32_t mocs, uint32_t *surf_offset, int surf_index,
                             unsigned read_domains, unsigned write_domains);

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Add missing has_surface_tile_offset flag to the Gen8+ device info structures.
Francisco Jerez [Tue, 19 Jul 2016 01:06:02 +0000 (18:06 -0700)]
i965: Add missing has_surface_tile_offset flag to the Gen8+ device info structures.

This surface state control has been supported by all hardware
generations since G45.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Return the correct layout from get_isl_dim_layout for pre-ILK cube textures.
Francisco Jerez [Fri, 22 Jul 2016 05:09:46 +0000 (22:09 -0700)]
i965: Return the correct layout from get_isl_dim_layout for pre-ILK cube textures.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Factor out isl_surf_dim/isl_dim_layout calculation into functions.
Francisco Jerez [Tue, 19 Jul 2016 01:07:35 +0000 (18:07 -0700)]
i965: Factor out isl_surf_dim/isl_dim_layout calculation into functions.

The logic to calculate the right layout and dimensionality for a given
GL texture target is going to be useful elsewhere, so factor it out
from intel_miptree_get_isl_surf().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Resolve color for non-coherent FB fetch at UpdateState time.
Francisco Jerez [Fri, 1 Jul 2016 20:45:22 +0000 (13:45 -0700)]
i965: Resolve color for non-coherent FB fetch at UpdateState time.

This is required because the sampler unit used to fetch from the
framebuffer is unable to interpret non-color-compressed fast-cleared
single-sample texture data.  Roughly the same limitation applies for
surfaces bound to texture or image units, but unlike texture sampling,
non-coherent framebuffer fetch is by definition non-coherent with
previous rendering, so the brw_render_cache_set_check_flush() call can
be omitted except after resolve.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Return whether the miptree was resolved from intel_miptree_resolve_color().
Francisco Jerez [Sat, 23 Jul 2016 01:16:45 +0000 (18:16 -0700)]
i965: Return whether the miptree was resolved from intel_miptree_resolve_color().

This will allow optimizing out the cache flush in some cases when
resolving wasn't necessary.
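
Roughly, the caller can then do something like the following.  This is
a sketch with hypothetical names and a toy miptree, not the driver
code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a miptree. */
struct toy_miptree {
   bool fast_cleared;   /* true while a resolve is still pending */
};

/* Returns true only if a resolve was actually performed. */
static bool
toy_resolve_color(struct toy_miptree *mt)
{
   assert(mt != NULL);
   if (!mt->fast_cleared)
      return false;
   mt->fast_cleared = false;
   return true;
}

/* Returns the number of cache flushes issued (0 or 1). */
static unsigned
toy_prepare_fb_fetch(struct toy_miptree *mt)
{
   unsigned flushes = 0;

   if (toy_resolve_color(mt))
      flushes++;   /* flush only after a real resolve */
   return flushes;
}
```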

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Translate nir_intrinsic_load_output on a fragment output.
Francisco Jerez [Fri, 22 Jul 2016 04:57:00 +0000 (21:57 -0700)]
i965/fs: Translate nir_intrinsic_load_output on a fragment output.

This gets the non-coherent framebuffer fetch path hooked up to the NIR
front-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Allocate fragment output temporaries on demand.
Francisco Jerez [Fri, 22 Jul 2016 04:47:45 +0000 (21:47 -0700)]
i965/fs: Allocate fragment output temporaries on demand.

This gets rid of the duplication of logic between nir_setup_outputs()
and get_frag_output() by allocating fragment output temporaries lazily
whenever get_frag_output() is called.  This makes nir_setup_outputs()
a no-op for the fragment shader stage.
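
The lazy allocation pattern looks roughly like this.  The visitor
struct, the register numbering and the function names are assumptions
made for the sake of the example:

```c
#include <assert.h>

#define TOY_MAX_OUTPUTS 8
#define TOY_NO_REG      (-1)

/* Hypothetical visitor state: one temporary per fragment output. */
struct toy_visitor {
   int next_reg;
   int output_reg[TOY_MAX_OUTPUTS];
};

static void
toy_visitor_init(struct toy_visitor *v)
{
   v->next_reg = 0;
   for (int i = 0; i < TOY_MAX_OUTPUTS; i++)
      v->output_reg[i] = TOY_NO_REG;
}

/* Allocates the temporary on first use and returns the same
 * register on every later call for that location. */
static int
toy_get_frag_output(struct toy_visitor *v, unsigned location)
{
   assert(location < TOY_MAX_OUTPUTS);
   if (v->output_reg[location] == TOY_NO_REG)
      v->output_reg[location] = v->next_reg++;
   return v->output_reg[location];
}
```

Outputs the shader never touches never get a temporary, so no separate
setup pass is needed.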

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Rework representation of fragment output locations in NIR.
Francisco Jerez [Fri, 22 Jul 2016 04:26:20 +0000 (21:26 -0700)]
i965/fs: Rework representation of fragment output locations in NIR.

The problem with the current approach is that driver output locations
are represented as a linear offset within the nir_outputs array, which
makes it rather difficult for the back-end to figure out what color
output and index some nir_intrinsic_load/store_output was meant for,
because the offset of a given output within the nir_output array is
dependent on the type and size of all previously allocated outputs.
Instead this defines the driver location of an output to be the pair
formed by its GLSL-assigned location and index (I've borrowed the
bitfield macros from brw_defines.h in order to represent the pair of
integers as a single scalar value that can be assigned to
nir_variable_data::driver_location).  nir_assign_var_locations is no
longer useful for fragment outputs.

Because fragment outputs are now allocated independently rather than
within the nir_outputs array, the get_frag_output() helper becomes
necessary in order to obtain the right temporary register for a given
location-index pair.
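
To illustrate the encoding, here is a sketch that packs the pair into
one 32-bit value.  The bit layout (index in bit 0, location in bits
31:1) and all names are assumptions for the example:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 holds the index, bits 31:1 the location. */
#define TOY_FRAG_OUTPUT_INDEX_SHIFT     0
#define TOY_FRAG_OUTPUT_INDEX_MASK      UINT32_C(0x00000001)
#define TOY_FRAG_OUTPUT_LOCATION_SHIFT  1
#define TOY_FRAG_OUTPUT_LOCATION_MASK   UINT32_C(0xfffffffe)

/* Packs a (location, index) pair into a single scalar. */
static uint32_t
toy_pack_frag_output(uint32_t location, uint32_t index)
{
   assert(location <= (TOY_FRAG_OUTPUT_LOCATION_MASK >>
                       TOY_FRAG_OUTPUT_LOCATION_SHIFT));
   assert(index <= 1);
   return (location << TOY_FRAG_OUTPUT_LOCATION_SHIFT) |
          (index << TOY_FRAG_OUTPUT_INDEX_SHIFT);
}

static uint32_t
toy_frag_output_location(uint32_t packed)
{
   return (packed & TOY_FRAG_OUTPUT_LOCATION_MASK) >>
          TOY_FRAG_OUTPUT_LOCATION_SHIFT;
}

static uint32_t
toy_frag_output_index(uint32_t packed)
{
   return (packed & TOY_FRAG_OUTPUT_INDEX_MASK) >>
          TOY_FRAG_OUTPUT_INDEX_SHIFT;
}
```

Unlike a linear offset, the packed value does not depend on the type or
size of any other output.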

The type_size helper passed to nir_lower_io is now type_size_dvec4
rather than type_size_vec4_times_4 so that output array offsets are
provided in terms of whole array elements rather than in terms of
scalar components (dvec4 is the largest vector type supported by
GLSL, so this will cause all individual fragment outputs to have a size
of one regardless of the type).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits.
Francisco Jerez [Fri, 22 Jul 2016 04:58:56 +0000 (21:58 -0700)]
i965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits.

Most likely we had only ever used this macro on bitfields of fewer
than 31 bits, but that's going to change shortly.
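
For reference, a sketch of the hazard and one way around it.  The
helper below is hypothetical and is not claimed to match the actual
change to INTEL_MASK; it assumes a 32-bit int:

```c
#include <assert.h>
#include <stdint.h>

/* Builds a mask covering bits high..low inclusive.
 *
 * The shifted constant is unsigned.  With a plain int `1`, a field
 * 31 bits wide would evaluate 1 << 31, which overflows a 32-bit
 * signed int and is undefined behaviour in C. */
static uint32_t
toy_mask(unsigned high, unsigned low)
{
   assert(low <= high && high < 32);
   const unsigned width = high - low + 1;
   /* A full 32-bit field would need separate handling, since
    * shifting a 32-bit value by 32 is undefined as well. */
   assert(width < 32);
   return ((UINT32_C(1) << width) - 1) << low;
}
```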

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Special-case nir_intrinsic_store_output for the fragment shader.
Francisco Jerez [Fri, 22 Jul 2016 04:25:46 +0000 (21:25 -0700)]
i965/fs: Special-case nir_intrinsic_store_output for the fragment shader.

I'm about to change how fragment shader output locations are
represented, so the generic nir_intrinsic_store_output implementation
that assumes that outputs are just contiguous elements in the big
nir_outputs array won't work anymore.  This somewhat simplified
implementation of nir_intrinsic_store_output for fragment shaders
should be functionally equivalent to the current fall-back one.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Implement non-coherent framebuffer fetch using the sampler unit.
Francisco Jerez [Fri, 22 Jul 2016 03:25:28 +0000 (20:25 -0700)]
i965/fs: Implement non-coherent framebuffer fetch using the sampler unit.

v2: Memoize sample ID, misc codestyle changes. (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.
Francisco Jerez [Fri, 22 Jul 2016 03:35:29 +0000 (20:35 -0700)]
i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.

This will be required for the next commit since the non-coherent path
makes use of the fragment coordinates implicitly, so they need to be
calculated.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.
Francisco Jerez [Thu, 21 Jul 2016 23:20:07 +0000 (16:20 -0700)]
i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.

The result of a framebuffer fetch from a multisample FBO is inherently
per-sample, so the spec requires at least those sections of the shader
that depend on the framebuffer fetch result to be executed once per
sample.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agoi965: Allocate space in the binding table for non-coherent FB fetch.
Francisco Jerez [Fri, 1 Jul 2016 20:46:40 +0000 (13:46 -0700)]
i965: Allocate space in the binding table for non-coherent FB fetch.

Unfortunately due to the inconsistent meaning of some surface state
structure fields, we cannot re-use the same binding table entries for
sampling from and rendering into the same set of render buffers, so we
need to allocate a separate binding table block specifically for
render target reads if the non-coherent path is in use.

The slight noise is due to the change of
brw_assign_common_binding_table_offsets to return the next available
binding table index rather than void.
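
The shape of that change, sketched with hypothetical structures and
section names (the real binding table has more sections than shown
here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical binding table layout. */
struct toy_binding_table {
   unsigned texture_start;
   unsigned ubo_start;
   unsigned rt_read_start;   /* only meaningful on the non-coherent path */
};

/* Assigns the common sections starting at `next` and returns the
 * next available index instead of void. */
static unsigned
toy_assign_common_offsets(struct toy_binding_table *bt, unsigned next,
                          unsigned num_textures, unsigned num_ubos)
{
   assert(bt != NULL);
   bt->texture_start = next;
   next += num_textures;
   bt->ubo_start = next;
   next += num_ubos;
   return next;
}

/* In this sketch the render target write entries come first; a separate
 * block for render target reads follows the common sections. */
static unsigned
toy_assign_fs_offsets(struct toy_binding_table *bt,
                      unsigned num_render_targets,
                      unsigned num_textures, unsigned num_ubos,
                      bool non_coherent_fb_fetch)
{
   unsigned next = toy_assign_common_offsets(bt, num_render_targets,
                                             num_textures, num_ubos);
   if (non_coherent_fb_fetch) {
      bt->rt_read_start = next;
      next += num_render_targets;
   }
   return next;
}
```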

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>