git.libre-soc.org Git - mesa.git/log

glsl: Reformat and/or/xor cases in ir_expression ctor

Replace tabs with spaces. According to docs/devinfo.html, Mesa's
indetation style is:
indent -br -i3 -npcs --no-tabs infile.c -o outfile.c

This patch prevents whitespace weirdness in the next patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

glsl/ir_builder: Add helpers for making if-statements

Add two overloaded variants of
ir_if *if_tree()

The new functions allow one to chain together if-trees within a single C++
expression that resembles a real if-statement.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

glsl/ir_builder: Add `enum writemask`

Using this enum improves the readibility of calls to assign(), whose third
argument is a writemask.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

glsl/ir_factory: Add helper method for making an ir_constant

Add method ir_factory::constant. This little method constructs an
ir_constant using the factory's mem_ctx.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

glsl/ir_builder: Add more helpers for constructing expressions

Add the following functions, each of which construct the similarly named
ir expression:
    div, round_even, clamp

    equal, less, greater, lequal, gequal

    logic_not, logic_and, logic_or

    bit_not, bit_or, bit_and, lshift, rshift

    f2i, i2f, f2u, u2f, i2u, u2i

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

glsl/ir_factory: Initialize members to NULL in constructor

This eliminates unexpected behavior due to unitialized values.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

glsl: Evaluate constant GLSL ES 3.00 pack/unpack expressions (v3)

That is, evaluate constant expressions of the following functions:
  packSnorm2x16  unpackSnorm2x16
  packUnorm2x16  unpackUnorm2x16
  packHalf2x16   unpackHalf2x16

v2: Reuse _mesa_pack_float_to_half and its inverse to evaluate
    pack/unpackHalf2x16. [for idr]
v3: Whitespace fixes. [for mattst88]
    Don't cast neg floats directly to uint16; use an intermediate cast to
    int16. [for paul]

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

mesa: Remove rounding bias in _mesa_float_to_half()

Not all float32 values can be exactly represented as a float16.
_mesa_float_to_half() rounded such intermediate float32 values to zero by
truncating unrepresentable bits in the mantissa.

This patch improves _mesa_float_to_half() by rounding intermediate float32
values to the nearest float16; when the float32 is exactly between two
float16 values we round to the one with an even mantissa. This behavior is
preferred over the old behavior because:
  - It has reduced bias relative to the old behavior.

  - It reproduces the behavior of real hardware: opcode F32TO16 in
    Intel's GPU ISA.

  - By reproducing the behavior of the GPU (at least on Intel hardware),
    compile-time evaluation of constant packHalf2x16 GLSL expressions will
    result in the same value as if the expression were executed on the GPU.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

mesa,glsl: Move round_to_even() from glsl to mesa/main (v2)

Move round_to_even's definition to mesa/main so that _mesa_float_to_half()
can use it in order to eliminate rounding bias.

In additon to moving the fuction definition, prefix its name with "_mesa",
just as all other functions in mesa/main are prefixed.

v2: Fix Android build.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

glsl/standalone_scaffolding: Add stub for _mesa_warning()

A subsequent patch will add mesa/main/imports.c as a dependency to the
compiler, which in turn requires that _mesa_warning() be defined.

The real definition of _mesa_warning() is in mesa/main/errors.c, but to
pull that file into the standalone scaffolding would require transitively
pulling in the dispatch tables.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

glsl: Extend ir_expression_operation for GLSL 3.00 pack/unpack functions (v2)

For each function {pack,unpack}{Snorm,Unorm,Half}2x16, add a corresponding
opcode to enum ir_expression_operation.  Validate the new opcodes in
ir_validate.cpp.

Also, add opcodes for scalarized variants of the Half2x16 functions.  (The
code generator for the i965 fragment shader requires that all vector
operations be scalarized.  A lowering pass, to be added later, will
scalarize the Half2x16 functions).

v2: Fix assertion message in ir_to_mesa [for idr].

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

glsl: Add IR lisp for GLSL ES 3.00 pack/unpack functions

For each of the following functions, add a declaration to
builtins/profiles/300es.glsl and create new file
builtins/ir/${funcname}.ir:

  packSnorm2x16  unpackSnorm2x16
  packUnorm2x16  unpackUnorm2x16
  packHalf2x16   unpackHalf2x16

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Tuner <mattst88@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

glsl: Fix typo in comment

s/num_operands()/get_num_operands()/

Discovered because Eclipse failed to resolve the false reference.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

i965/disasm: Fix horizontal stride of dest registers

The bug: The printed horizontal stride was the numerical value of the
BRW_HORIZONTAL_$N enum.
The fix: Translate the enum before printing.

Note: This is a candidate for the stable releases.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>

intel: Fix glCopyTexSubImage on buffers whose width >= 32kbytes

When possible, glCopyTexSubImage calls are performed using the
hardware blitter.  However, according to the Ivy Bridge PRM, Vol1
Part4, section 1.2.1.2 (Graphics Data Size Limitations):

    The BLT engine is capable of transferring very large quantities of
    graphics data. Any graphics data read from and written to the
    destination is permitted to represent a number of pixels that
    occupies up to 65,536 scan lines and up to 32,768 bytes per scan
    line at the destination. The maximum number of pixels that may be
    represented per scan line’s worth of graphics data depends on the
    color depth.

With an RGBA32F color buffer (which has 16 bytes per pixel) this
imposes a maximum width of 2048 pixels.  Other pixel formats have
accordingly larger limits.

To make matters worse, if the pitch of the buffer is 32k or greater,
intel_copy_texsubimage's call to intelEmitCopyBlit will overflow
intelEmitCopyBlit's src_pitch and dst_pitch parameters (which are
16-bit signed integers).

We can conveniently avoid both problems by avoiding use of the blitter
when the miptree's pitch is >= 32k.

Fixes gles3conform "framebuffer_blit_functionality_magnifying_blit"
tests when the buffer width is equal to 8192.

Note: this is very similar to the recent patch "intel: Fix ReadPixels
on buffers whose width >= 32kbytes" except that it applies to
glCopyTexSubImage instead of glReadPixels.  In a future patch it would
be nice to refactor the code so that (a) overflow is avoided, and (b)
intelEmitCopyBlit is responsible for checking whether the blitter can
handle the width, so that all callers of intelEmitCopyBlit work
properly, rather than just these two.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

glsl: Allow varying structs in GLSL ES 3.00 and GLSL 1.50.

Previously I thought that varying structs had been added to GLSL ES
3.00 by mistake, because chapter 11 of the GLSL ES 3.00 spec
("Counting of Inputs and Outputs") failed to mention how structs
should be handled. Khronos has clarified
(https://cvs.khronos.org/bugzilla/show_bug.cgi?id=9828) that varying
structs are indeed required, and that chapter 11 will be modified to
indicate that the minimal reference packing algorithm flattens varying
structs to their individual components.

Mesa doesn't flatten varying structs to their individual components,
but this is ok, since it packs varyings of all kinds with no wasted
space at all (except where this is impossible due to differing
interpolation modes), so it will outperform the minimal reference
packing algorithm in all but the most pathological cases.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

glsl: Disable transform feedback of varying structs.

It is not clear from the GLSL ES 3.00 spec how transform feedback is
supposed to apply to varying structs:

- There is no specification for how the structure is to be packed when
it is recorded into the transform feedback buffer.

- There is no reasonable value for GetTransformFeedbackVarying to
return as the "type" of the variable.

We currently have a Khronos bug requesting clarification on how this
feature is supposed to work
(https://cvs.khronos.org/bugzilla/show_bug.cgi?id=9856).

This patch just disables transform feedback of varying structs for
now; we can implement the proper behaviour once we find out from
Khronos what it is.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

glsl: Update lower_packed_varyings to handle varying structs.

This patch adds code to lower_packed_varyings to handle varyings of
type struct.  Varying structs are currently packed in the most naive
possible way (in declaration order, with no gaps), so there is a
potential loss of runtime efficiency.  In a later patch it would be
nice to replace this with a "flattening" approach (wherein a varying
struct is flattened to individual varyings corresponding to each of
its structure elements), so that the linker can align each structure
element independently.  However, that would require a significantly
more complex implementation.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

glsl: Generalize compute_packing_order for varying structs.

This patch paves the way for allowing varying structs by generalizing
varying_matches::compute_packing_order to handle any type of varying.
Previously, we packed in the order (vec4, vec2, float, vec3), with
matrices being packed according to the size of their columns.  Now, we
pack everything according to its number of components mod 4, in the
order (0, 2, 1, 3).

There is no behavioural change for vectors.  Matrices are now packed
slightly differently:

- mat2x2 gets assigned PACKING_ORDER_VEC4 instead of
  PACKING_ORDER_VEC2.  This is slightly better, because it guarantees
  that the matrix occupies a single varying slot.

- mat2x3 gets assigned PACKING_ORDER_VEC2 instead of
  PACKING_ORDER_VEC3.  This is kind of a wash.  Previously, mat2x3 had
  a 25% chance of having neither of its columns double parked, a 50%
  chance of having exactly one of its columns double parked, and a 25%
  chance of having both of its columns double parked.  Now it always
  has exactly one of its columns double parked.

- mat3x3 gets assigned PACKING_ORDER_SCALAR instead of
  PACKING_ORDER_VEC3.  This doesn't affect much, since in both cases
  there is no guarantee of how the matrix will be aligned.

- mat4x2 gets assigned PACKING_ORDER_VEC4 instead of
  PACKING_ORDER_VEC2.  This is slightly better for the same reason as
  in mat2x2.

- mat4x3 gets assigned PACKING_ORDER_VEC4 instead of
  PACKING_ORDER_VEC3.  This is slightly better for the same reason as
  in mat2x2.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

glsl: Disable structure splitting for shader ins/outs.

Previously, it didn't matter whether structure splitting tried to
split shader ins/outs, because structs were prohibited from being used
for shader ins/outs. However, GLSL 3.00 ES supports varying structs.
In order for varying structs to work, we need to make sure that
structure splitting doesn't get applied to them, because if it does,
then the linker won't be able to match up varyings properly.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

glsl: Eliminate ambiguity between function ins/outs and shader ins/outs

This patch replaces the three ir_variable_mode enums:

- ir_var_in
- ir_var_out
- ir_var_inout

with the following five:

- ir_var_shader_in
- ir_var_shader_out
- ir_var_function_in
- ir_var_function_out
- ir_var_function_inout

This eliminates a frustrating ambiguity: it used to be impossible to
tell whether an ir_var_{in,out} variable was a shader in/out or a
function in/out without seeing where the variable was declared in the
IR. This complicated some optimization and lowering passes, and would
have become a problem for implementing varying structs.

In the lisp-style serialization of GLSL IR to strings performed by
ir_print_visitor.cpp and ir_reader.cpp, I've retained the names "in",
"out", and "inout" for function parameters, to avoid introducing code
churn to the src/glsl/builtins/ir/ directory.

Note: a couple of comments in the code seemed to indicate that we were
planning for a possible future in which geometry shaders could have
shader-scope inout variables. Our GLSL grammar rejects shader-scope
inout variables, and I've been unable to find any evidence in the GLSL
standards documents (or extensions) that this will ever be allowed, so
I've eliminated these comments.

Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

glsl: Clean up case statement in builtin_variables.cpp's add_variable.

The case statement purported to handle the addition of ir_var_const_in
and ir_var_inout builtin variables. But no such variables exist.
This patch removes the unnecessary cases, and adds a comment
explaining why they're not needed.

Reviewed-by: Carl Worth <cworth@cworth.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

i965/vs: Do headerless texturing for texelFetchOffset().

For texelFetchOffset(), we just add the texel offsets to the coordinate
rather than using the message header's offset fields. So we don't
actually need a header on Gen5+.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>

libgl-xlib/build: Link with C++ when LLVM is used

Also link-in libX11 and libXext.

Tested-by: Brian Paul <brianp@vmware.com>

intel: Fix ReadPixels on buffers whose width >= 32kbytes

When possible, glReadPixels calls are performed using the hardware
blitter.  However, according to the Ivy Bridge PRM, Vol1 Part4,
section 1.2.1.2 (Graphics Data Size Limitations):

    The BLT engine is capable of transferring very large quantities of
    graphics data. Any graphics data read from and written to the
    destination is permitted to represent a number of pixels that
    occupies up to 65,536 scan lines and up to 32,768 bytes per scan
    line at the destination. The maximum number of pixels that may be
    represented per scan line’s worth of graphics data depends on the
    color depth.

With an RGBA32F color buffer (which has 16 bytes per pixel) this
imposes a maximum width of 2048 pixels.

To make matters worse, if the pitch of the buffer is 32k or greater,
intel_miptree_map_blit's call to intelEmitCopyBlit will overflow
intelEmitCopyBlit's src_pitch and dst_pitch parameters (which are
16-bit signed integers).

We can conveniently avoid both problems by avoiding the readpixels
blit path when the miptree's pitch is >= 32k.

Fixes gles3conform "half_float" tests when the buffer width is greater
than 2048.

Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>

intel: callocing a 32 byte temp is silly, so don't

I believe that the size used to vary, so the dynamic allocation is
necessary.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

st/mesa: implement ARB_internalformat_query v2

Reviewed-by: Brian Paul <brianp@vmware.com>

st/mesa: advertise OES_depth_texture_cube_map if GLSL 1.30 is supported

Reviewed-by: Brian Paul <brianp@vmware.com>

st/dri: disallow recursion in dri_flush

ST_FLUSH_FRONT may call driThrottle, which is implemented with dri_flush.
This prevents double flush as well as fence leaks caused by a recursion
in the middle of throttling.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58839

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>

st/dri: add null-pointer check, remove duplicated local variable

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>

Revert "Revert "targets/opencl: Link against libgallium.la instead of libgallium.a""

This reverts commit 7824ab807050c03c6df01c44774914dcbef88248.

Now that we force linking with LLVM shared libs when building clover,
we can link against libgallium.la with no problems.

configure.ac: Force use of LLVM shared libs with --enable-opencl v2

If we build clover with LLVM static libraries, then clover and also each
pipe_*.so driver that is built will contain their own static copy of
LLVM.  The recent automake changes have uncovered a problem where
the pipe_*.so drivers try to use clover's LLVM symbols.  This causes
LLVM's static registry objects to be initialized each time
a pipe_*.so driver is loaded by clover.  Initializing these objects
multiple times is not allowed and leads to assertion failures in the
LLVM code.

We can avoid all these problems by having clover and all the pipe_*.so
drivers link against the same LLVM shared library.

https://bugs.freedesktop.org/show_bug.cgi?id=59334
https://bugs.freedesktop.org/show_bug.cgi?id=59534

v2:
  - Fix shared library detection when LLVM is built with CMake

configure.ac: Compute the required llvm static libraries only once

In order to determine which static LLVM libraries are needed we pass
a list of components to llvm-config and it generates the list of
library dependencies for us. The advantage of only calling llvm-config
one time is that it can determine if two components depend on the same
library and then add it to the output list only once. The old practice
of having each driver call llvm-config to add its own dependencies to
$(LLVM_LIBS) caused many libraries to be added to this variable multiple
times.

radeonsi: Fall back to dummy pixel shader instead of trying indirect addressing.

Indirect addressing isn't fully handled yet.

Fixes crashes with piglit tests using indirect addressing.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: make sure copying of all texture formats is accelerated

[ Cherry-picked from r600g commit 7c371f46958910dd2ca9487c89af1b72bbfdada9 ]

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: Handle PIPE_FORMAT_L32A32_S/UINT for rendering.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: Make sure to use float number format for packed float colour formats.

These aren't covered by UTIL_FORMAT_TYPE_FLOAT.

Fixes 15 piglit (sub)tests.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

intel: Enable S3TC extensions always

Always enable the use of pre-compressed texture data.  The ability to
perform on-line compression still requires the presence of libtxc_dxtn
or an explicit driconf over-ride.  Applications that just want to submit
precompessed data when an on-line compressor is not available can look
for the GL_EXT_texture_compression_dxt1 and
GL_ANGLE_texture_compression_dxt[35] extensions.

v2: Only enable the extensions that do not require on-line compression
by default.  The previous statement "This should not impact many (if
any) real applications." proved to be false for at least Sauerbraten.
This application mostly submits pre-compressed data, but it also can
submit uncompressed data that it asks the driver to compress.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Acked-by: Eric Anholt <eric@anholt.net> [v1]
Acked-by: Lee Salzman <lsalzman@gmail.com>

mesa: Like EXT_texture_compression_dxt1, advertise ANGLE_texture_compression_dxt in all APIs

This is technically outside the ANGLE spec, but it seems unlikely to
cause any harm.

v2: Simplify the extension checks by assuming the ANGLE extension will
always be enabled by any driver that enables the EXT. Suggested by
Eric Anholt.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Lee Salzman <lsalzman@gmail.com>

mesa: Simplify _mesa_choose_tex_format handling of compressed formats

For non-generic compressed format we assert two things:

1. The format has already been validated against the set of available
extensions.

2. The driver only enables the extension if it supports all of the
formats that are part of that extension.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

mesa: Use a single flag for the S3TC extensions that don't require on-line compression

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Lee Salzman <lsalzman@gmail.com>

i965: Use swizzles to force R, G, and B to 0.0 for ALPHA textures.

Similar to the previous commit, we may be using a texture with actual RGBA
storage for the GL_ALPHA format, so force the color values to 0.0.

This commit fixes the following piglit (sub) tests:

EXT_texture_snorm/fbo-blending-formats
GL_ALPHA16_SNORM
GL_ALPHA8_SNORM
GL_ALPHA_SNORM

Note: Haswell bypasses this swizzle code, so may require an independent fix
for this bug.

Reviewed-by: Eric Anholt <eric@anholt.net>

i965: Use swizzles to force alpha to 1.0 for RED, RG, or RGB textures.

We may be using a texture with actual RGBA storage for these formats, so force
the alpha value read to 1.0.

This commit fixes the following piglit (sub) tests:

ARB_texture_float/fb-blending-formats
GL_RGB16F_ARB
EXT_framebuffer_object/fbo-blending-formats
GL_RGB10
GL_RGB12
GL_RGB16
EXT_texture_snorm/fbo-blending-formats
GL_RGB16_SNORM
GL_RGB8_SNORM
GL_RGB_SNORM

These test improvements depend on the previous commit as well. That commit
smashes alpha to 1.0 for the case of ReadPixels (so fixes "FBO testing" as
reported by this test), while this commit smashes alpha to 1.0 for the case of
texturing (fixed the "window testing" as reported by this test).

Note: Haswell bypasses this swizzle code, so may require an independent fix
for this bug.

Reviewed-by: Eric Anholt <eric@anholt.net>

ReadPixels: Force ALPHA to 1 while rebasing RGBA values for GL_RGB format

When performing a ReadPixels operation, we may be reading from a buffer that
stores alpha values, but that is actually representing a buffer with no alpha
channel. In this case, while rebasing the values, touch up all alpha values
read to 1.0.

This commit fixes the following piglit (sub) tests:

ARB_texture_float/fbo-colormask-formats
GL_RBG16F_ARB
EXT_texture_snorm/fbo-colormask-formats
GL_RGB16_SNORM
GL_RGB8_SNORM
GL_RGB_SNORM

It likely improves the results of other tests as well, but a PASS remains
elusive due to additional bugs.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

i965: Examine _BaseFormat when deciding to perform xRGB_alpha fixups

The renderbuffer's Format field may have an alpha channel even when the
underlying _BaseFormat does not. This can happen when mesa chooses to use
RGBA16 for an RGB16 format, for example.

So look at _BaseFormat when deciding whether to fixup the blend factors.

This test improves the results of at least the following piglit tests:

EXT_frambebuffer_object/fbo-blending-formats
{GL_RGB10, GL_RGB12, GL_RGB16}
EXT_texture_snorm/fbo-blending-formats
{GL_RGB16_SNORM, GLRGB8_SNORM, GL_RGB_SNORM}

But none of these actually change from FAIL to PASS yet. The R, G, and B probe
values are fixed with this commit, but the tests still fail because the alpha
values are still wrong.

Reviewed-by: Eric Anholt <eric@anholt.net>

scons: Fix source lists parsing on Windows.

/ vs \ mismatch was causing .objs to be put in the source tree, causing
breakeage when doing different build types in the same tree (eg., debug
vs release).

Fix this by normalizing everything to / slashes.

It's probably a good idea to purge all .objs from source tree to prevent
issues completely.

GL3.txt: i965 supports ARB_base_instance

Added in commit cdd3f549.

wmesa: include api_exec.h to fix compilation

draw: fix MSVC divide-by-zero compilation error

Kind of lame, but it works.

i965: Implement the GL_ARB_base_instance extension.

Thanks to Fredrik Höglund, all the hard work was already done.

Tested using a modified oglconform (that actually runs these tests on
our driver); it looks like there may be some bugs when using client
arrays. All applicable non-compatibility tests passed.

For now, only enable it in core profiles.

Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ian Romanick <idr@freedesktop.org>

glsl/build: Build libglcpp and libglslcore in builtin_compiler

And reuse them if not cross compiling.

Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>

glsl/Makefile.sources: Correct BUILTIN_COMPILER_CXX_FILES

Squashed with two reverts:

Revert "android: Update for builtin_stubs.cpp move"

This reverts commit c0def90ede1e939173041b8785303de90f8fdc6c.

Revert "scons: Update for builtin_stubs.cpp"

This reverts commit 8ac4b82699ad0a59ae6ae6d3415702eaa5d4fe3b.

Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Tested-on-Android-by: Chad Versace <chad.versace@linux.intel.com>

build: Use AX_PROG_FLEX

Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47248

build: Use AX_PROG_BISON

No one tests yacc/byacc. Let's just request bison specifically.

Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=46815

builtin_compiler/build: Use generated parser files

... instead of generating them again.

Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>

glsl/build: Build tests via the glsl Makefile

Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>

glsl/build: Build glcpp via the glsl Makefile

Removing the subdirectory recursion provides a small speed up.

Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>

glsl/build: Don't build builtin_compiler separately if not cross compiling

Reduces the number of times that src/glsl/ is compiled when not cross
compiling.

Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>

glsl/build: Don't build glsl_compiler

Use glslparsertest from piglit instead.

Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>

draw: fix problem in screen-space interpolation clip code

I don't see how this could have ever worked right.

The screen-space interpolation code uses the vertex->data[pos_attr]
position which contain window coords. But window coords are only
computed for the unclipped vertices; the clipped vertices have
undefined window coords (see draw_cliptest_tmp.h).

Use the vertex clip coords instead which are always defined.

Fixes http://bugs.freedesktop.org/show_bug.cgi?id=55476
(piglit fbo-blit-stretch failure on softpipe)

Note: This is a candidate for the 9.0 branch.

Reviewed-by: José Fonseca <jfonseca@vmware.com>

draw: improve the clipper debug/printf code

Reviewed-by: José Fonseca <jfonseca@vmware.com>

draw: add new debug code and comments in clip code template

In debug builds, set clipped vertex window coordinates to NaN values
to help debugging. Otherwise, we're just leaving the coordinate in clip
space and it's invalid to use it later expecting it to be a window coord.

Reviewed-by: José Fonseca <jfonseca@vmware.com>

swrast: fix blit code's nearest/linear coordinate arithmetic

Fixes piglit's fbo-blit-stretch test.

Reviewed-by: José Fonseca <jfonseca@vmware.com>

swrast: fix incorrect width for direct/nearest blit

Reviewed-by: José Fonseca <jfonseca@vmware.com>

swrast: move resampleRow setup code in blit_nearest()

The resampleRow setup depends on pixelSize. For color buffers,
we don't know the pixelSize until we're in the buffer loop. Move
that code inside the loop.

Fixes: http://bugs.freedesktop.org/show_bug.cgi?id=59541
Reviewed-by: José Fonseca <jfonseca@vmware.com>

docs: import release notes for 9.0.2, add news item

scons: Disable frame pointer omission for all build types except release.

In particular for checked builds, where debug_backtrace_capture relies
on it.

nouveau/build: Fix build failures when drm is not in /usr/include.

Fixes failures to include libdrm/nouveau.h when drm is not installed in
/usr/include.

Reviewed-by: Matt Turner <mattst88@gmail.com>

radeon/llvm: Handle LP_CHAN_ALL in emit_fetch_immediate().

Fixes piglit spec/ARB_sampler_objects/sampler-incomplete and
spec/EXT_texture_swizzle/depth_texture_mode_and_swizzle.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

build: Fix build on systems where /usr/bin/python isn't python 2.

configure.ac sets up a PYTHON2 variable, which is what we want
AX_PYTHON_MODULE to use (since we only use Python 2 for now).

NOTE: This is a candidate for the 9.0 branch.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=31598
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>

mesa/es3: Apply stricter multisample blit rules for ES3.

Fixes gles3conform
framebuffer_blit_error_blitframebuffer_multisampled_read_buffer_different_origins.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

mesa/es3: Disallow FRAMEBUFFER_ATTACHMENT_COMPONENT_TYPE query of DEPTH_STENCIL_ATTACHMENT

This error was added in the 3.0.1 update to the OpenGL ES 3.0 spec.
Fixes the updated gles3conform packed_depth_stencil_parameters test.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

mesa: Don't allow blits to / from the same buffer in OpenGL ES 3.0

Fixes gles3conform test CoverageES30. It temporarily regresses some
framebuffer_blit tests, but the failing subcases have been determined to
be invalid for OpenGL ES 3.0.

v2: Fix typo in depth (and stencil) RB checking. Noticed by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Remove exec thunks from the dlist.c module.

These were introduced in 2000 during a rework of the TNL module (commit
cab974cf6c2dbfbf5dd5d291e1aae0f8eeb34290), though I'm having a hard time
finding an instance there of one of these Exec functions being changed
at runtime.

Regardless, as far as I can tell now, these functions don't get changed,
by grepping for calls to SET_* to change the dispatch table (we do change
functions in GLvertexformat at runtime, but those don't overlap with
this set of functions). Remove them and just let them be initialized to
the same functions as are in the Exec table.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Initially populate the display list with the exec list.

This cuts out a ton of code to make functions not set to a save_ variant
match.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Delay display list save dispatch setup until Exec is set up.

This will let us copy from the Exec dispatch to deal with our commands that
don't get compiled into display lists.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Make the drivers call a non-code-generated dispatch table setup.

I want to drive the Save dispatch table setup from this same function.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Remove the size argument from _mesa_alloc_dispatch_table().

All callers are in Mesa core and all use _gloffset_COUNT, so just rely on
the already baked-in use of _gloffset_COUNT in the function.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Remove two of the now unused ASSERT_OUTSIDE_BEGIN_END macros.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Drop manual checks for outside begin/end.

We now have a separate dispatch table for begin/end that prevent these
functions from being entered during that time. The
ASSERT_OUTSIDE_BEGIN_END_WITH_RETVALs are left because I don't want to
change any return values or introduce new error-only stubs at this
point.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Install a minimal dispatch table during glBegin()/glEnd().

This is a step toward getting rid of ASSERT_OUTSIDE_BEGIN_END() in Mesa.

v2: Finish create_beginend_table() comment, move loopback API init into it,
and add a const flag. (suggestions by Brian)

Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)

mesa: Remove the dead PrepareExecBegin() driver hook.

This was used in i965 for a while, but no more.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Use an early return to unindent most of vbo_exec_Begin/End().

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Improve a glTexEnv error message by looking up the enum.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: Fix regression in dlist save primitive tracking.

My change 7ca4f07b5b77ccac0a9b60dc5ac9082906b5947e caused errors to not
be thrown when they should, because the new if statement for ExecuteFlag
made the CurrentSavePrimitive not get set. And on further review, we
shouldn't be validating our primitive in GL_COMPILE mode, since the
command shouldn't be executed yet.

Partially fixes piglit gl-1.0-beginend-coverage.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

vl: round next_msc to integer frame, and kill skew_msc

This reduces jitter slightly in a cleaner way, without desynchronizing mplayer2 as badly
when falling behind.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Reviewed-by: Christian König <christian.koenig@amd.com>

scons: Fix dependencies of generated headers.

It appears that scons implicit dependency scanners fail to chain
dependencies of generated headers when these are outside the build tree.

This patch ensures generated source files are _always_ put in the build
tree. I'm not 100% this will fix all depency issues, but from my
experiments it does seem to fix this.

NOTE: For this to be effective it is necessary to clean the source tree
from generated header/source files.

Reviewed-by: Brian Paul <brianp@vmware.com>

intel: Don't expose XRGB8888 visuals any more

There really isn't any point.  There is no resource savings, and we have
to do gymnastics in the driver to make it work.

There are also bad interactions with multisampling and OpenGL ES 3.0.
In ES3, a multisample-to-singlesample blit must have identical source
and destination format.  This means a multisample RGBA8 to singlesample
RGB8 (window) blit will generate an error.  Also in ES3, RGB8 is not a
renderable format.  This means that the application CANNOT make an RGB8
multisample renderbuffer.

As a result, if an application gets an RGB8 window and wants to do
multisample FBO rendering, it will probably break.

"Fixes" gles3conform
framebuffer_blit_functionality_multisampled_to_singlesampled_blit test
on RGB8 visuals.

v2: Fix 'formats' array size.  Suggested by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>

i965: Enable floating-point textures always

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Eric Anholt <eric@anholt.net>

r300g: add a workaround for the AA colorbuffer addressing bug on R500

r300g: allow resolutions up to 1280x1024 with AA optimizations on 1-pipe cards

because single-pipe cards have bigger CMASK RAM

r300g: enable AA optimizations for the RGBA16F format

radeonsi: More assorted depth/stencil changes ported from r600g.

[ Squashed port of the following r600g commits: - Michel Dänzer ]

commit 428e37c2da420f7dc14a2ea265f2387270f9bee1
Author: Marek Olšák <maraeo@gmail.com>
Date:   Tue Oct 2 22:02:54 2012 +0200

    r600g: add in-place DB decompression and texturing with DB tiling

    The decompression is done in-place and only the compressed tiles are
    decompressed. Note: R6xx-R7xx can do that only with Z16 and Z32F.

    The texture unit is programmed to use non-displayable tiling and depth
    ordering of samples, so that it can fetch the texture in the native DB format.

    The latest version of the libdrm surface allocator is required for stencil
    texturing to work. The old one didn't create the mipmap tree correctly.
    We need a separate mipmap tree for stencil, because the stencil mipmap
    offsets are not really depth offsets/4.

    There are still some known bugs, but this should save some memory and it also
    improves performance a little bit in Lightsmark (especially with low
    resolutions; tested with Radeon HD 5000).

    The DB->CB copy is still used for transfers.

commit e2f623f1d6da9bc987582ff68d0471061ae44030
Author: Marek Olšák <maraeo@gmail.com>
Date:   Sat Jul 28 13:55:59 2012 +0200

    r600g: don't decompress depth or stencil if there isn't any

commit 43e226b6efb77db2247741cc2057d9625a2cfa05
Author: Marek Olšák <maraeo@gmail.com>
Date:   Wed Jul 18 00:32:50 2012 +0200

    r600g: optimize uploading depth textures

    Make it only copy the portion of a depth texture being uploaded and
    not the whole 2D layer.

    There is also a little code cleanup.

commit b242adbe5cfa165b252064a1ea36f802d8251ef1
Author: Marek Olšák <maraeo@gmail.com>
Date:   Wed Jul 18 00:17:46 2012 +0200

    r600g: remove needless wrapper r600_texture_depth_flush

commit 611dd529425281d73f1f0ad2000362d4a5525a25
Author: Marek Olšák <maraeo@gmail.com>
Date:   Wed Jul 18 00:05:14 2012 +0200

    r600g: init_flushed_depth_texture should be able to report errors

commit 80755ff56317446a8c89e611edc1fdf320d6779b
Author: Marek Olšák <maraeo@gmail.com>
Date:   Sat Jul 14 17:06:27 2012 +0200

    r600g: properly track which textures are depth

    This fixes the issue with have_depth_texture never being set to false.

commit fe1fd675565231b49d3ac53d0b4bec39d8bc6781
Author: Marek Olšák <maraeo@gmail.com>
Date:   Sun Jul 8 03:10:37 2012 +0200

    r600g: don't flush depth textures set as colorbuffers

    The only case a depth buffer can be set as a color buffer is when flushing.

    That wasn't always the case, but now this code isn't required anymore.

commit 5a17d8318ec2c20bf86275044dc8f715105a88e7
Author: Marek Olšák <maraeo@gmail.com>
Date:   Sun Jul 8 02:14:18 2012 +0200

    r600g: flush depth textures bound to vertex shaders

    This was missing/broken. There are also minor code cleanups.

commit dee58f94af833906863b0ff2955b20f3ab407e63
Author: Marek Olšák <maraeo@gmail.com>
Date:   Sun Jul 8 01:54:24 2012 +0200

    r600g: do fine-grained depth texture flushing

    - maintain a mask of which mipmap levels are dirty (instead of one big flag)
    - only flush what was requested at a given point and not the whole resource
      (most often only one level and one layer has to be flushed)

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: improve flushed depth texture handling

Use r600_resource_texture::flished_depth_texture for GPU access, and
allocate it in the VRAM. For transfers we'll allocate texture in the GTT
and store it in the r600_transfer::staging.

Improves performance when flushed depth texture is frequently used by the
GPU, e.g. in Lightsmark

[ Ported from r600g commit 37708479608af877986b76302a9c92611d1e23d0 ]

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: Assorted depth/stencil changes ported from r600g.

[ Squashed port of the following r600g commits: - Michel Dänzer ]

commit c1e8c845ea9c6f843cc5bba5974668c007799bbc
Author: Marek Olšák <maraeo@gmail.com>
Date:   Sat Jul 7 19:10:00 2012 +0200

    r600g: inline r600_hw_copy_region

commit 4891c5dc64ccd8cf2bf8a8550ae23e1a61806a7d
Author: Marek Olšák <maraeo@gmail.com>
Date:   Mon Jun 25 22:53:21 2012 +0200

    r600g: inline r600_blit_push_depth and use resource_copy_region

    We are going to have a separate resource for depth texturing and transfers
    and this is just a transfer thing.

commit da98bb6fc105e1a2f688a1713ca9e50f0ac8fbed
Author: Marek Olšák <maraeo@gmail.com>
Date:   Mon Jun 25 12:45:32 2012 +0200

    r600g: split flushed depth texture creation and flushing

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: Enable 1D tiling for non-depth resources as well.

No piglit regressions anymore thanks to fixes in libdrm_radeon and here.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: Fix 1D tiling mode index for non-scanout resources.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

build: Remove dead SHARED_GLAPI variable

The static Makefiles used it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>

glsl/build: Build glsl_test only on make check

Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>

glsl/build: Remove dead LIBRARY_* variables

Reviewed-by: Andreas Boll <andreas.boll.dev@gmail.com>