git.libre-soc.org Git - mesa.git/log

glsl: Allow invocations layout qualifier with GL_OES_geometry_shader

Fixes

dEQP-GLES31.functional.geometry_shading.instanced.geometry_1_invocations
dEQP-GLES31.functional.geometry_shading.instanced.invocation_per_layer_2d_array
dEQP-GLES31.functional.geometry_shading.instanced.invocation_per_layer_2d_multisample_array
dEQP-GLES31.functional.geometry_shading.instanced.invocation_per_layer_3d
dEQP-GLES31.functional.geometry_shading.instanced.invocation_per_layer_cubemap
dEQP-GLES31.functional.geometry_shading.instanced.multiple_layers_per_invocation_2d_array
dEQP-GLES31.functional.geometry_shading.instanced.multiple_layers_per_invocation_2d_multisample_array
dEQP-GLES31.functional.geometry_shading.instanced.multiple_layers_per_invocation_3d
dEQP-GLES31.functional.geometry_shading.instanced.multiple_layers_per_invocation_cubemap
dEQP-GLES31.functional.geometry_shading.query.geometry_shader_invocations

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl: Allow gl_InvocationID and gl_Layer with GL_OES_geometry_shader

Fixes

dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_2d_array
dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_2d_multisample_array
dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_3d
dEQP-GLES31.functional.geometry_shading.layered.fragment_layer_cubemap

v2: Don't enable gl_ViewportIndex in GLSL ES 3.20. Noticed by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: Allow GL_EXT_geometry_shader and GL_EXT_geometry_point_size

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: Document reasons for allowing XFB drawing modes in GLES 3.1 w/GL_OES_geometry_shader

Originally this patch added the checks to allow the draw calls with XFB,
but commit 2dabd497 beat me to it.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: Remove redundant _mesa_has_shader_subroutine

The checks in _mesa_has_shader_subroutine are slightly different than
_mesa_has_ARB_shader_subroutine, but they're not different in a way
that matters. The only way to have ctx->Version >= 40 is if
ctx->Extensions.ARB_shader_subroutine is set.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>

nouveau: Enable EXT_texture_env_dot3 on NV10 and NV20

GL_DOT3_RGB_EXT and GL_DOT3_RGBA_EXT are nearly identical to
GL_DOT3_RGB and GL_DOT3_RGBA. The only difference is that the _EXT
versions do not apply the post-scale. Just smash logscale to 0 so
that RC_OUT_SCALE_1 is always used.

NOTE: I have not actually tested this.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>

nouveau: Fix non-1x post-scale factor with DOT3 combiner

Fixes a long-standing bug on NV10 and NV20 where using a non-1x RGB or
alpha post-scale with the GL_DOT3_RGB or GL_DOT3_RGBA texture environment
would not work.

The old combiner math uses HALF_BIAS_NORMAL and HALF_BIAS_NEGATE.  The
GL_NV_register_combiners spec defines these as

    HALF_BIAS_NORMAL_NV       max(0.0, e) - 0.5
    HALF_BIAS_NEGATE_NV       -max(0.0, e) + 0.5

In order to get the correct result from the dot-product, the
intermediate dot-product must be multiplied by 4.  This is a literal
implementation of the GL_ARB_texture_env_dot3 spec.  It also requires
using the register combiner post-scale.  As a result, the post-scale
is not available for the scale factor set by the application.

The new combiner math uses EXPAND_NORMAL and EXPAND_NEGATE.  The
GL_NV_register_combiners spec defines these as

    EXPAND_NORMAL_NV          2.0 * max(0.0, e) - 1.0
    EXPAND_NEGATE_NV          -2.0 * max(0.0, e) + 1.0

Since this fully expands the value to the [-1, 1] range, the intermediate
dot-product result is the desired value.  This leaves the register
combiner post-scale available for application use.
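To make the equivalence concrete, here is a small C sketch (not nouveau code; all function names are hypothetical) that evaluates the GL_ARB_texture_env_dot3 dot product both ways for inputs already in [0, 1], ignoring the final clamp of the result. Because (2a - 1)(2b - 1) = 4(a - 0.5)(b - 0.5) term by term, the EXPAND mapping produces the final value without any post-scale, which leaves the post-scale for the application's GL_RGB_SCALE / GL_ALPHA_SCALE.

```c
#include <assert.h>

/* Stand-in for max(0.0, e) from the GL_NV_register_combiners mappings. */
static float clamp0(float e)
{
   return e > 0.0f ? e : 0.0f;
}

/* Old mapping: HALF_BIAS_NORMAL on both operands. The intermediate dot
 * product is a quarter of the ARB_texture_env_dot3 result, so the
 * combiner post-scale (4x) is used up to correct it. */
static float dot3_half_bias(const float a[3], const float b[3])
{
   float sum = 0.0f;
   for (int i = 0; i < 3; i++)
      sum += (clamp0(a[i]) - 0.5f) * (clamp0(b[i]) - 0.5f);
   return 4.0f * sum;
}

/* New mapping: EXPAND_NORMAL on both operands. Each product already
 * equals 4(a - 0.5)(b - 0.5), so no post-scale is needed. */
static float dot3_expand(const float a[3], const float b[3])
{
   float sum = 0.0f;
   for (int i = 0; i < 3; i++)
      sum += (2.0f * clamp0(a[i]) - 1.0f) * (2.0f * clamp0(b[i]) - 1.0f);
   return sum;
}

/* The post-scale is now free for the application's 1x, 2x or 4x scale. */
static float dot3_scaled(const float a[3], const float b[3], float app_scale)
{
   assert(app_scale == 1.0f || app_scale == 2.0f || app_scale == 4.0f);
   return dot3_expand(a, b) * app_scale;
}
```

For example, with a = {1, 0.5, 0} and b = {1, 1, 0} both mappings give 2.0, and a 2x application scale then gives 4.0 (before GL's clamp to [0, 1]).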

NOTE: I have not actually tested this.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>

docs: Rename GL3.txt to features.txt

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>

docs: Update GL3.txt for OpenGL 4.x on i965-ish hardware

v2: Note that GL_KHR_blend_equation_advanced and
GL_KHR_blend_equation_advanced_coherent are done.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

docs: add links to clarify patch mailing section

* Changed "Mesa mailing list" to "mesa-dev mailing list" to clarify
  which list patches should be sent to

* Added an explicit link to
  https://lists.freedesktop.org/mailman/listinfo/mesa-dev to show
  where to subscribe to the list

* Added a link to https://git-scm.com/docs/git-send-email to help new
  users of that command

v2: add signed-off-by

Signed-off-by: Nicholas Bishop <nicholasbishop@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>

svga: minor whitespace, etc clean-ups in svga_pipe_misc.c

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: move some code in svga_propagate_surface()

Move computation of zslice, layer inside the conditional where they're
used.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: simplify surface propagation code in svga_set_framebuffer_state()

Rewrite the comment too.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: add some comments in the svga_surface struct

Give more info about backing resources/surfaces.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: use new svga_check_sampler_framebuffer_resource_collision()

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: add new svga_check_sampler_framebuffer_resource_collision()

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: remove assertions in svga_surface cast wrappers

We don't do this for other cast wrappers. And this will simplify some
code at call sites.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: minor code simplification in svga_texture_transfer_unmap()

Use the tex variable instead of using svga_texture() again.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: reformat some expressions in svga_texture_transfer_map()

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: remove duplicated variable in svga_texture_transfer_map()

tex was already declared at the function body scope.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: move some assignments in svga_texture_transfer_map()

Put near other assignments to the svga_transfer variable.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: minor simplifications in svga_texture_transfer_map()

Use local vars instead of jumping through a pointer.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: minor reformatting of svga_texture() cast wrapper

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: rewrite svga_buffer() cast wrapper

To make it symmetric with the svga_texture() cast wrapper.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

svga: remove local variable in create_backed_surface_view()

To simplify the code a bit.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

docs: Add GL_KHR_blend_equation_advanced to relnotes.

r600: increase performance for DRI PRIME offloading if 2nd GPU is Evergreen+

This is a direct port of Marek Olšák's patch
"radeonsi: increase performance for DRI PRIME
offloading if 2nd GPU is CIK or VI" to r600.

It uses SDMA for the detiling blit from render offload VRAM
to GTT, as SDMA is much faster for tiled->linear blits from
VRAM to GTT.

Testing on a dual Radeon HD-5770 setup, this reduced the time
for the render offload GPU to get its rendering into
system RAM from approximately 16 msecs to 5 msecs for simple
rendering at 1920x1080 pixels, 32 bpp: a more than 3x speedup!

This was measured using ftrace to trace the time the radeon KMS
driver waited on the dma-buf fence of the render offload GPU to
complete.

All in all, this brought the time for a flip down from 20 msecs
to 9 msecs, so the PRIME setup can display at a full 60 fps with
vsync instead of barely 30 fps.
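The numbers above can be sanity-checked with a simplified vsync model. This is an assumption for illustration, not how the kernel or X server actually schedules flips: at 60 Hz the budget between vblanks is about 16.7 ms, and a flip that misses one vblank waits for the next, halving the displayed rate.

```c
#include <assert.h>

/* Time between vblanks, in milliseconds. */
static double frame_budget_ms(double refresh_hz)
{
   assert(refresh_hz > 0.0);
   return 1000.0 / refresh_hz;
}

/* Simplified vsync model: the displayed rate is refresh_hz / N for the
 * smallest N such that flip_ms <= N * budget. */
static double vsynced_fps(double flip_ms, double refresh_hz)
{
   double budget = frame_budget_ms(refresh_hz);
   int intervals = 1;
   while (flip_ms > intervals * budget)
      intervals++;
   return refresh_hz / intervals;
}
```

Under this model a 20 ms flip yields 30 fps and a 9 ms flip fits in one interval for 60 fps, matching the observation; the offload copy itself went from 16 to 5 msecs, a 3.2x speedup.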

The current r600 implementation supports SDMA on Evergreen and
later, but not R600/R700 due to some bugs apparently present
in their SDMA implementation.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

docs: Update stencil texturing & ES 3.1 status for i965 Haswell

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965: Enable OpenGLES 3.1 for Haswell

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965: Enable ARB_texture_stencil8 for Haswell

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965: Enable ARB_stencil_texturing for Haswell

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965/gen7: Use R8_UINT stencil copy when sampling the stencil texture

v2:
* Check gen <= 7, rather than gen == 7. (Ian)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965/gen7: Copy stencil when sampling the stencil texture

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965: Add function to copy a stencil miptree to an R8_UINT miptree

v2:
* Cleanups suggested by Ian, Matt and Topi

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965: Track that the stencil data was updated when using Tex*Image

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965: Track that the stencil data was updated when rendering

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965: Track that the stencil data was updated when clearing

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965/gen7: Add R8_UINT stencil miptree copy for sampling

For gen < 8, we can't sample from the stencil buffer, which is
required for the ARB_stencil_texturing extension. We'll make a copy of
the stencil data into a new texture that we can sample using the
R8_UINT surface type.
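Together with the "Track that the stencil data was updated" commits (TexImage, rendering, clearing), this amounts to a dirty-flag scheme: every write marks the R8_UINT copy stale, and sampling refreshes it only when needed. The sketch below is a hypothetical, heavily simplified model (linear instead of W-tiled stencil, a fixed 64-byte surface, invented names), not the i965 implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miptree: stencil bytes plus a shadow copy that can be
 * sampled as R8_UINT. */
struct stencil_mt {
   uint8_t stencil[64];
   uint8_t r8_copy[64];
   bool    r8_copy_stale;   /* set by every path that writes stencil */
   int     copies_made;     /* counts refreshes, for illustration */
};

/* One of the write paths (a clear); uploads and rendering would also
 * set r8_copy_stale. */
static void stencil_clear(struct stencil_mt *mt, uint8_t value)
{
   memset(mt->stencil, value, sizeof(mt->stencil));
   mt->r8_copy_stale = true;
}

/* Before sampling, refresh the R8_UINT copy only if it is out of date. */
static const uint8_t *stencil_prepare_for_sampling(struct stencil_mt *mt)
{
   if (mt->r8_copy_stale) {
      memcpy(mt->r8_copy, mt->stencil, sizeof(mt->r8_copy));
      mt->r8_copy_stale = false;
      mt->copies_made++;
   }
   return mt->r8_copy;
}
```

Sampling twice after a single clear performs only one copy, which is the point of tracking updates rather than copying on every sample.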

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

i965: Fix assert with multisampling and cubemaps

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965/hsw: Adjust uploading default color for stencil surfaces

v2:
* has_component (Ken); const bits_per_channel (Topi)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965/hsw: Don't advertise more than 64 threads for compute shaders

thread_width_max in the GPGPU walker command limits us to a maximum of
64 threads.
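As a hedged illustration of the clamp: the constants and function names below are hypothetical, and the SIMD32 multiplier used to turn a thread count into an invocation limit is an assumption about how the advertised limit is derived, not something this commit message states.

```c
#include <assert.h>

#define WALKER_MAX_THREADS     64u  /* limit imposed by thread_width_max */
#define ASSUMED_MAX_SIMD_WIDTH 32u  /* assumption: widest dispatch mode */

/* Never advertise more threads than the GPGPU walker can address. */
static unsigned max_cs_threads(unsigned hw_threads)
{
   assert(hw_threads > 0);
   return hw_threads < WALKER_MAX_THREADS ? hw_threads : WALKER_MAX_THREADS;
}

/* Advertised max work group invocations derived from the clamped count. */
static unsigned max_work_group_invocations(unsigned hw_threads)
{
   return max_cs_threads(hw_threads) * ASSUMED_MAX_SIMD_WIDTH;
}
```

With this model, a device reporting more than 64 threads still advertises at most 64 * 32 = 2048 invocations, so a conformance test that dispatches the advertised maximum stays within what the walker can launch.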

This fixes a crash on Haswell in the OpenGLES 3.1 conformance test
suite which tests the advertised limits of the max invocation counts.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

main: Add MESA_VERBOSE=api support for glClearStencil

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

main: Add MESA_VERBOSE=api support for glTexImage

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

svga: add guest statistic gathering interface

This file was supposed to be added with the previous "svga: add guest
statistic gathering interface" patch but went MIA for some reason.

Reviewed-by: Brian Paul <brianp@vmware.com>

radeonsi: disable CE on SI + AMDGPU

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

winsys/amdgpu: disable IB chaining on SI

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

winsys/amdgpu: finish up SI addrlib integration

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

winsys/amdgpu: initial SI support

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>

gallium/radeon: add a driver query for AMDGPU_INFO_NUM_EVICTIONS

If the kernel driver doesn't support it, it returns 0.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radeonsi: fix printing shaders and states on a VM fault

This was missed while rewriting the PIPE_DUMP flags.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radeonsi: increase performance for DRI PRIME offloading if 2nd GPU is CIK or VI

SDMA is much faster for tiled->linear blits from VRAM to GTT.
I have Bonaire in my second PCIe slot.

$ glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD TONGA ...

$ DRI_PRIME=1 glxinfo | grep OpenGL.renderer
OpenGL renderer string: Gallium 0.4 on AMD BONAIRE ...

Without SDMA:
$ DRI_PRIME=1 glxgears
8796 frames in 5.0 seconds = 1759.074 FPS
8899 frames in 5.0 seconds = 1779.672 FPS

With SDMA:
$ DRI_PRIME=1 glxgears
12765 frames in 5.0 seconds = 2552.788 FPS
12888 frames in 5.0 seconds = 2577.495 FPS

The 1st GPU is irrelevant. The improvement should be much lower at 60 fps,
but definitely measurable.
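The arithmetic behind "much lower at 60 fps" can be redone from the glxgears runs above. This sketch only averages the quoted numbers; treating the per-frame saving as a fixed cost that carries over to a 60 fps workload is an assumption.

```c
#include <assert.h>

static double mean2(double a, double b)
{
   return (a + b) / 2.0;
}

/* Averages of the two glxgears runs quoted above. */
static double fps_without_sdma(void) { return mean2(1759.074, 1779.672); }
static double fps_with_sdma(void)    { return mean2(2552.788, 2577.495); }

/* Ratio of frame rates, roughly 1.45x. */
static double sdma_speedup(void)
{
   return fps_with_sdma() / fps_without_sdma();
}

/* Milliseconds saved per frame by the faster detiling blit. */
static double ms_saved_per_frame(void)
{
   return 1000.0 / fps_without_sdma() - 1000.0 / fps_with_sdma();
}
```

The saving is about 0.175 ms per frame, which is roughly 1% of a 16.7 ms frame at 60 fps: small, but measurable.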

SI will get this once we add SDMA blit support for it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radeonsi: enable SDMA on CIK

It passes R600_DEBUG=testdma on Bonaire/radeon.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

gallium/radeon: increase priority for shader binaries

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

gallium/radeon: merge USER_SHADER and INTERNAL_SHADER priority flags

There's no reason to separate these.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

vbo: set draw_id

Fixes a conditional jump depending on an uninitialized value
in si_state_draw.c:593

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Miklós Máté <mtmkls@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>

svga: fix regression related to srgb

This regression was caused by commit 3190c7ee9727161d627f107c2e7f8ec3a11941c1,
which made Mesa follow the OpenGL 4.4 spec rules related to
GL_FRAMEBUFFER_SRGB.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: use local variable blit instead of pointer

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: s/INDEX_0D/INDEX_IMMEDIATE32/

Both are zero, but the latter is the right token.

svga: add comment about unsupported blend modes

svga: fix ordering of mksstats counter strings

String for SVGA_STATS_COUNT_TEXREADBACK was swapped
with the string for SVGA_STATS_COUNT_SURFACEWRITEFLUSH.

Trivial fix.

svga: avoid emitting redundant SetShaderResource command

Tested with Lightsmark2008, Heaven, MTT piglit, glretrace, viewperf, conform.

Reviewed-by: Brian Paul <brianp@vmware.com>

svga: add a cleanup function to clean up sampler state

This patch adds a cleanup function to clean up sampler state at
context destruction time.

Reviewed-by: Brian Paul <brianp@vmware.com>

svga: loosen the condition to flush in get_query_result_vgpu10()

Fixes piglit spec/ext_transform_feedback/overflow-edge-cases segfaults
because the query's fence pointer was null.

Tested with Piglit, Sauerbraten, ETQW.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: fix vgpu10 query fencing

We don't want to flush the command buffer or sync on the fence when ending
a query (that kind of defeats the whole purpose of async queries). Do that
instead in get_query_result().

Tested with Piglit, arbocclude, Sauerbraten game, Nobel Clinician Viewer,
ETQW.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: avoid emitting redundant DXSetSamplers command

This patch avoids emitting redundant DXSetSamplers commands.

Tested with Lightsmark2008, Heaven, MTT piglit, glretrace, viewperf.

Reviewed-by: Brian Paul <brianp@vmware.com>

svga: enable ARB_clear_texture extension in the driver.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: define svga_clear() in svga_init_clear_functions()

Put all the clearing-related functions in svga_init_clear_functions().

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: add svga_init_clear_functions()

Define svga_init_clear_functions() and install svga_clear_texture
as svga->pipe.clear_texture. This is part of the
ARB_clear_texture extension.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: add new function svga_clear_texture()

This function can be used to clear a texture. It is part of the
ARB_clear_texture extension, which allows a texture to be cleared
to a given color value.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: add new begin_blit()

Saving all blitter states will be done in begin_blit() so that
begin_blit() can be used before performing any blit operation.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>

svga: add opt to the list of valid build types

For the opt build, add VMX86_STATS to the list of cpp defines.

Reviewed-by: Brian Paul <brianp@vmware.com>

svga: add guest statistic gathering interface

This patch adds a guest statistic gathering interface to the svga
winsys interface, which can be used to gather svga driver
statistics. The winsys module can then share the statistics with
the VMX host via the mksstats interface.

The statistic enums used in the svga driver are defined in
svga_stats_count and svga_stats_time in svga_winsys.h

Reviewed-by: Brian Paul <brianp@vmware.com>

svga: fix indirect non-indexable temp access

If the shader has indirect access to non-indexable temporaries,
convert these non-indexable temporaries to an indexable temporary array.
This works around a bug in the GLSL->TGSI translator.

Fixes glsl-1.20/execution/fs-const-array-of-struct-of-array.shader_test
on DX11Renderer.

Reviewed-by: Brian Paul <brianp@vmware.com>

gallium/hud: move signo declaration inside PIPE_OS_UNIX block

To silence an unused variable warning with MSVC and MinGW.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>

i965: Embrace "unlimited" GTT mmap support

From about kernel 4.9, GTT mmaps are virtually unlimited. A new
parameter, I915_PARAM_MMAP_GTT_VERSION, advertises the feature, so
query it and use it to avoid limiting tiled allocations to only those
that fit within the mappable aperture.

A couple of caveats:

- fence support is still limited by stride to 262144 and the stride
needs to be a multiple of tile_width (as before, and same limitation as
the current 3D pipeline in hardware)

- the max_gtt_map_object_size forcing untiled may be hiding a few bugs
in handling of large objects, though none were spotted in piglits.
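The allocation policy and the fencing caveat can be sketched as a single predicate. The struct, the function name, and the assumption that a kernel with unlimited GTT mmaps reports a version of at least 1 (and that older kernels are treated as version 0) are illustrative only, not the actual i965 code; the aperture check is also simplified to the full mappable size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FENCE_MAX_STRIDE 262144u

/* Hypothetical capabilities record. */
struct gtt_caps {
   int      mmap_gtt_version;       /* assumed 0 if the param is unsupported */
   uint64_t mappable_aperture_size; /* bytes */
};

/* Returns true if a buffer may be tiled, false to fall back to linear. */
static bool can_tile(const struct gtt_caps *caps, uint64_t size,
                     uint32_t stride, uint32_t tile_width)
{
   assert(tile_width > 0);
   /* Fence limits apply regardless of the mmap version. */
   if (stride > FENCE_MAX_STRIDE || stride % tile_width != 0)
      return false;
   /* Older kernels: a tiled object must fit in the mappable aperture. */
   if (caps->mmap_gtt_version < 1 && size > caps->mappable_aperture_size)
      return false;
   return true;
}
```

A 512 MiB buffer with a 256 MiB aperture is forced linear on an old kernel but may be tiled on a new one, while an oversized or misaligned stride is rejected either way.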

See kernel commit 4cc6907501ed ("drm/i915: Add I915_PARAM_MMAP_GTT_VERSION
to advertise unlimited mmaps").

v2: Include some commentary on mmap virtual space vs CPU addressable
space.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

mesa/main: Fix missing return in non void function

This was found by obs:
I: Program returns random data in a function
E: Mesa no-return-in-nonvoid-function main/program_resource.c:109

v2: Remove the ! on the string (Ian Romanick)

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965: Implement GL_KHR_blend_equation_advanced_coherent on Gen9+.

We always use a coherent read, and ignore the "opt out" enable flag.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

mesa: Implement GL_KHR_blend_equation_advanced_coherent.

This adds the extension enable (so drivers can advertise it) and the
extra boolean state flag, GL_BLEND_ADVANCED_COHERENT_KHR, which can
be set to request coherent blending.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

i965: Enable GL_KHR_blend_equation_advanced on G45 and later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

i965: Disable hardware blending if advanced blending is in use.

We'll do blending in the shader in this case, so just disable the
hardware blending.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

glsl: Add a lowering pass to handle advanced blending modes.

Many GPUs cannot handle GL_KHR_blend_equation_advanced natively, and
need to emulate it in the pixel shader.  This lowering pass implements
all the necessary math for advanced blending.  It fetches the existing
framebuffer value using the MESA_shader_framebuffer_fetch built-in
variables, and the previous commit's state var uniform to select
which equation to use.

This is done at the GLSL IR level to make it easy for all drivers to
implement the GL_KHR_blend_equation_advanced extension and share code.

Drivers need to hook up MESA_shader_framebuffer_fetch functionality:
1. Hook up the fb_fetch_output variable
2. Implement BlendBarrier()

Then to get KHR_blend_equation_advanced, they simply need to:
3. Disable hardware blending based on ctx->Color._AdvancedBlendEnabled
4. Call this lowering pass.

Very little driver specific code should be required.

v2: Handle multiple output variables per render target (which may exist
    due to ARB_enhanced_layouts), and array variables (even with one
    render target, we might have out vec4 color[1]), and non-vec4
    variables (it's easier than finding spec text to justify not
    handling it).  Thanks to Francisco Jerez for the feedback.
v3: Lower main returns so that we have a single exit point where we
    can add our blending epilogue (caught by Francisco Jerez).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

compiler: Add a new STATE_VAR_ADVANCED_BLENDING_MODE built-in uniform.

This will be used for emulating GL_KHR_blend_equation_advanced features
in shader code. We'll pass in the blending mode that's in use, and use
that in (effectively) a switch statement in the shader.
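For a feel of what that switch computes, here is a C sketch of the per-channel math from the KHR_blend_equation_advanced spec for a few modes. The enum is hypothetical (in the real pass the mode arrives through the STATE_VAR_ADVANCED_BLENDING_MODE uniform and the code is generated as GLSL IR). Inputs are premultiplied; f() operates on unpremultiplied values and the result is premultiplied, with p0 = As*Ad, p1 = As*(1-Ad), p2 = Ad*(1-As).

```c
#include <assert.h>

/* Hypothetical mode enum standing in for the state var's value. */
enum adv_blend_mode { ADV_MULTIPLY, ADV_SCREEN, ADV_DARKEN, ADV_LIGHTEN };

/* f(Cs, Cd) on unpremultiplied components. */
static float blend_f(enum adv_blend_mode mode, float cs, float cd)
{
   switch (mode) {
   case ADV_MULTIPLY: return cs * cd;
   case ADV_SCREEN:   return cs + cd - cs * cd;
   case ADV_DARKEN:   return cs < cd ? cs : cd;
   case ADV_LIGHTEN:  return cs > cd ? cs : cd;
   }
   assert(!"unknown advanced blend mode");
   return 0.0f;
}

/* RGB = f(Cs', Cd') * p0 + Cs' * p1 + Cd' * p2 for one color channel. */
static float blend_channel(enum adv_blend_mode mode,
                           float cs, float as, float cd, float ad)
{
   float cs_u = as > 0.0f ? cs / as : 0.0f;   /* unpremultiply source */
   float cd_u = ad > 0.0f ? cd / ad : 0.0f;   /* unpremultiply dest */
   float p0 = as * ad, p1 = as * (1.0f - ad), p2 = ad * (1.0f - as);
   return blend_f(mode, cs_u, cd_u) * p0 + cs_u * p1 + cd_u * p2;
}

/* A = p0 + p1 + p2 for all of these modes. */
static float blend_alpha(float as, float ad)
{
   return as * ad + as * (1.0f - ad) + ad * (1.0f - as);
}
```

With opaque inputs p0 = 1, so the result is just f(); with a fully transparent destination the source passes through unchanged.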

v2: Use the new _AdvancedBlendMode field.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

mesa: Add draw time validation for advanced blending modes.

v2: Add null checks (requested by Curro).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

mesa: Restyle _mesa_check_blend_func_error().

I'm about to add more error conditions to this function, so I wanted to
move the current spec citation above the code that checks it. Indenting
it required reformatting, so I tried to move it to our newer style.

While there, I also decided to drop some GL type usage, and drop the
unnecessary "_mesa_" prefix on a static function.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

mesa: Track the current advanced blending mode.

This will be useful for a number of things:
- Checking the current advanced blending mode against the shader's
blend_support_* qualifiers.
- Disabling hardware blending when emulating advanced blending.
- Uploading the current advanced blending mode as a state var.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

mesa: Allow advanced blending enums in glBlendEquation[i].

Don't allow them in glBlendEquationSeparate[i], though, as required
by the spec.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

glsl: Merge blend_support qualifiers when linking.

Since each qualifier represents a blending mode the shader can be used
with, we take the union of all possible modes when linking.
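Modeling each blend_support_* qualifier as one bit makes the merge a bitwise OR. The bit assignments and helper names below are hypothetical; the second helper illustrates the corresponding draw-time check that the active mode is one the linked program declared.

```c
#include <assert.h>

/* Hypothetical bitmask: one bit per mode declared with, e.g.,
 * layout(blend_support_multiply) out; */
enum {
   BLEND_SUPPORT_NONE     = 0,
   BLEND_SUPPORT_MULTIPLY = 1 << 0,
   BLEND_SUPPORT_SCREEN   = 1 << 1,
   BLEND_SUPPORT_OVERLAY  = 1 << 2,
};

/* Linking takes the union of every shader's supported modes. */
static unsigned link_blend_support(const unsigned *per_shader, int count)
{
   unsigned merged = BLEND_SUPPORT_NONE;
   for (int i = 0; i < count; i++)
      merged |= per_shader[i];
   return merged;
}

/* Draw-time check: the active mode must be among the linked modes. */
static int blend_mode_supported(unsigned linked, unsigned active_mode_bit)
{
   return (linked & active_mode_bit) != 0;
}
```

Two fragment shaders declaring multiply and screen respectively link into a program that accepts either mode, but not overlay.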

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

glsl: process blend_support_* qualifiers

v2 (Ken): Add a BLEND_NONE enum value (no qualifiers in use).
v3 (Ken): Rename gl_blend_support_qualifier to gl_advanced_blend_mode.
v4 (Ken): Mark map[] as static const (Ilia).

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

glsl: add basic KHR_blend_equation_advanced infrastructure

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

mesa: add KHR_blend_equation_advanced enable and extension string

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

glapi: add KHR_blend_equation_advanced dispatch

v2 (Ken): Fix enum values, drop _mesa_BlendBarrierKHR stub as Curro has
already implemented it.
v3 (Ken): Rework for _mesa_BlendBarrierKHR -> _mesa_BlendBarrier rename.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

mesa: Rename _mesa_BlendBarrierMESA to _mesa_BlendBarrier.

Note that _mesa_BlendBarrierMESA is not currently hooked up in the
glapi XML, so we can just rename it. We'll hook it up for the
KHR_blend_equation_advanced extension shortly.

We may as well use the ES 3.2 core name with no suffixes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>

i965: Safely iterate the predecessors of the end block.

We want to insert code in each of the predecessors of the end block.
This code includes a nir_if, which would split the block, altering
the set.  To avoid that, I emitted a dead constant at the end of each
block before splitting it, so that the set of predecessors remained
unchanged.  This was admittedly ugly.

Connor suggested instead saving a copy of the set, so we can iterate
it safely.  This is also a little ugly, but a much better plan.
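A minimal C sketch of the "iterate a copy" pattern, using a hypothetical fixed-size array in place of NIR's predecessor set: splitting a block replaces it in the live set, so the loop walks a snapshot taken before any edits.

```c
#include <assert.h>

/* Hypothetical stand-in for the end block's predecessor set. */
struct block_set {
   int ids[16];
   int count;
};

static void set_add(struct block_set *s, int id)
{
   assert(s->count < 16);
   s->ids[s->count++] = id;
}

static void set_remove(struct block_set *s, int id)
{
   for (int i = 0; i < s->count; i++) {
      if (s->ids[i] == id) {
         s->ids[i] = s->ids[--s->count];  /* swap-remove reorders the set */
         return;
      }
   }
}

static int set_contains(const struct block_set *s, int id)
{
   for (int i = 0; i < s->count; i++)
      if (s->ids[i] == id)
         return 1;
   return 0;
}

/* Hypothetical edit: inserting control flow at the end of block `id`
 * splits it, and the new tail block (id + 100) replaces it as a
 * predecessor of the end block. */
static void split_block(struct block_set *end_preds, int id)
{
   set_remove(end_preds, id);
   set_add(end_preds, id + 100);
}

/* Walk a copy so edits to the live set cannot disturb the iteration. */
static int insert_in_each_predecessor(struct block_set *end_preds)
{
   struct block_set snapshot = *end_preds;
   int visited = 0;
   for (int i = 0; i < snapshot.count; i++) {
      split_block(end_preds, snapshot.ids[i]);
      visited++;
   }
   return visited;
}
```

Each original predecessor is visited exactly once, even though the live set is reordered and gains new members during the loop.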

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir: Use nir_shader_get_entrypoint in TCS quad workaround code.

We want to insert the code at the end of the program. Looping over
all the functions (of which there was only one) was the old way of doing
this, but now we have nir_shader_get_entrypoint(), so let's use it.

Suggested by Connor Abbott.

v2: Update for nir_shader_get_entrypoint API change.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir: Change nir_shader_get_entrypoint to return an impl.

Jason suggested adding an assert(function->impl) here.  All callers
of this function actually want ->impl, so I decided just to change
the API.

We also change the nir_lower_io_to_temporaries API here.  All but one
caller passed nir_shader_get_entrypoint(), and with the previous commit,
it now uses a nir_function_impl internally.  Folding this change in
avoids the need to change it and change it back.

v2: Fix one call I missed in ir3_compiler (caught by Eric).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir: Make nir_lower_io_to_temporaries store an impl internally.

This changes the pass internals to work with a nir_function_impl
directly rather than a nir_function. The next patch will change
the API.

v2: Rebase after framebuffer fetch landed.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

i965: Expose shader framebuffer fetch extensions on Gen9+.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Hook up coherent framebuffer reads to the NIR front-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Remove special casing of framebuffer writes in scheduler code.

The reason why it was safe for the scheduler to ignore the side
effects of framebuffer write instructions was that its side effects
couldn't have had any influence on any other instruction in the
program, because we weren't doing framebuffer reads, and framebuffer
writes were always non-overlapping. We need actual memory dependency
analysis in order to determine whether a side-effectful instruction
can be reordered with respect to other instructions in the program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Don't CSE render target messages with different target index.

We weren't checking the fs_inst::target field when comparing whether
two instructions are equal. For FB writes it doesn't matter because
they aren't CSE-able anyway, but this would have become a problem with
FB reads which are expression-like instructions.
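A minimal sketch of the equality check with the missing field included. The struct is a hypothetical cut-down stand-in for fs_inst, not its real layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instruction record with just the fields compared here. */
struct inst {
   int opcode;
   int src[2];
   int target;   /* render target index for FB reads/writes */
};

/* Two instructions are CSE candidates only if everything that affects
 * their result matches, including the render target index. */
static bool insts_equal(const struct inst *a, const struct inst *b)
{
   return a->opcode == b->opcode &&
          a->src[0] == b->src[0] &&
          a->src[1] == b->src[1] &&
          a->target == b->target;
}
```

Without the final comparison, two FB reads from render targets 0 and 1 with identical sources would be merged into one.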

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>