mesa.git
11 years ago r600g: rename GPU_FLUSH -> INVAL_READ_CACHES
Marek Olšák [Sat, 22 Dec 2012 18:05:37 +0000 (19:05 +0100)]
r600g: rename GPU_FLUSH -> INVAL_READ_CACHES

because that's what it does.

11 years ago r600g: remove redundant parameter alloc_bo from r600_texture_create_object
Marek Olšák [Sat, 22 Dec 2012 01:54:52 +0000 (02:54 +0100)]
r600g: remove redundant parameter alloc_bo from r600_texture_create_object

alloc_bo == !buf

11 years ago Make IsVertexArray() return false before BindVertexArray()
Matt Turner [Thu, 20 Dec 2012 04:20:34 +0000 (20:20 -0800)]
Make IsVertexArray() return false before BindVertexArray()

Rename existing _Used flag to EverBound.

The GL 4.3 and ES 3.0 specs say

   These names are marked as used, for the purposes of GenVertexArrays
   only, but they do not acquire array state until they are first bound.

This also affects Apple VAOs, which is fine since the
APPLE_vertex_array_object spec says

   A vertex array object is created by binding an unused name. This
   binding is accomplished by calling BindVertexArrayAPPLE with id set
   to the name of the new vertex array object.

Fixes arb_vertex_array_object_isvertexarray.
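
A minimal sketch of the naming rule quoted above, not Mesa's actual code: the table, `struct array_object`, and all three function names are hypothetical. It assumes that a name returned by the Gen call is only reserved, and that it becomes an object the first time it is bound.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NAMES 16

/* Hypothetical, simplified stand-in for a vertex array object. */
struct array_object {
   bool reserved;   /* name was handed out by the Gen call */
   bool ever_bound; /* set on first bind; mirrors the EverBound flag */
};

static struct array_object objects[MAX_NAMES]; /* index 0 is never used */

/* Reserve the lowest unused name; returns 0 if the table is full. */
unsigned gen_vertex_array(void)
{
   for (unsigned id = 1; id < MAX_NAMES; id++) {
      if (!objects[id].reserved) {
         objects[id].reserved = true;
         return id;
      }
   }
   return 0;
}

/* Binding is what turns a reserved name into an object. */
void bind_vertex_array(unsigned id)
{
   assert(id < MAX_NAMES);
   if (id != 0 && objects[id].reserved)
      objects[id].ever_bound = true;
}

/* True only for names that have been bound at least once. */
bool is_vertex_array(unsigned id)
{
   return id != 0 && id < MAX_NAMES && objects[id].ever_bound;
}
```

With this split, a freshly generated name is reserved (so it is not handed out twice) but `is_vertex_array` stays false until the first bind. The Apple variant, where binding an unused name creates the object, is left out of the sketch.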

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago Make IsTransformFeedback() return false before BindTransformFeedback()
Matt Turner [Wed, 19 Dec 2012 21:43:31 +0000 (13:43 -0800)]
Make IsTransformFeedback() return false before BindTransformFeedback()

The GL 4.3 and ES 3.0 specs say

   A transform feedback object is created by binding a name returned by
   GenTransformFeedbacks with the command
      void BindTransformFeedback( enum target, uint id );

Fixes arb_transform_feedback2-istransformfeedback and part of
es3conform's CoverageES30.test.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago nouveau: deal with tbo cap for now.
Dave Airlie [Sat, 22 Dec 2012 03:12:30 +0000 (13:12 +1000)]
nouveau: deal with tbo cap for now.

This fixes the printk triggered when running apps against master.

Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years ago r600g: always use a tiled resource as the destination of MSAA resolve
Marek Olšák [Fri, 21 Dec 2012 19:34:52 +0000 (20:34 +0100)]
r600g: always use a tiled resource as the destination of MSAA resolve

i.e. we have to allocate a temporary tiled resource if dst isn't tiled.

This fixes hardlocks on r6xx-r7xx, though using a linear resource is forbidden
on later asics as well.

NOTE: This is a candidate for the stable branches.

11 years ago winsys/radeon: the env var RADEON_NOOP can be used to skip CS ioctls
Marek Olšák [Fri, 21 Dec 2012 18:15:20 +0000 (19:15 +0100)]
winsys/radeon: the env var RADEON_NOOP can be used to skip CS ioctls

11 years ago r600g: remove a false comment
Marek Olšák [Fri, 21 Dec 2012 18:15:02 +0000 (19:15 +0100)]
r600g: remove a false comment

11 years ago r600g: don't suspend TIME_ELAPSED queries during flushing
Marek Olšák [Fri, 21 Dec 2012 15:29:19 +0000 (16:29 +0100)]
r600g: don't suspend TIME_ELAPSED queries during flushing

According to the GL spec, the result should be equivalent to comparing
two timestamps.

11 years ago gallium/tests: fix build breakage after pipe_surface::usage removal
Marek Olšák [Fri, 21 Dec 2012 15:56:41 +0000 (16:56 +0100)]
gallium/tests: fix build breakage after pipe_surface::usage removal

11 years ago mesa: add bounds checking for uniform array access
Frank Henigman [Fri, 14 Dec 2012 20:52:17 +0000 (15:52 -0500)]
mesa: add bounds checking for uniform array access

No piglit regressions and now passes glsl-uniform-out-of-bounds-2.

validate_uniform_parameters now checks that the array index is
valid.  This means if an index is out of bounds, glGetUniform* now
fails with GL_INVALID_OPERATION, as it should.
_mesa_uniform and _mesa_uniform_matrix also call
validate_uniform_parameters so the bounds checks there became
redundant and were removed.

The test in glGetUniformLocation is modified to check array bounds
so it now returns GL_INVALID_INDEX (-1) if you ask for the location
of a non-existent array element, as it should.
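
The bounds rule described above can be sketched as follows. This is an illustration only: `struct uniform_info` and both functions are hypothetical, and the "base location plus index" layout is an assumption, not necessarily how Mesa encodes uniform locations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified description of one uniform. */
struct uniform_info {
   int base_location;       /* location of element 0 (assumed layout) */
   unsigned array_elements; /* 0 for a non-array uniform */
};

/* Return true if 'index' names an element that actually exists. */
bool uniform_index_is_valid(const struct uniform_info *uni, unsigned index)
{
   if (uni->array_elements == 0)
      return index == 0;
   return index < uni->array_elements;
}

/* Return the element's location, or -1 for a non-existent element. */
int uniform_element_location(const struct uniform_info *uni, unsigned index)
{
   if (!uniform_index_is_valid(uni, index))
      return -1;
   return uni->base_location + (int)index;
}
```

Keeping the check in one helper mirrors the point of the commit: callers that share the validation no longer need their own redundant bounds tests.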

Signed-off-by: Frank Henigman <fjhenigman@google.com>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
11 years ago util/u_format: Round when converting depth values from float to z16_unorm.
José Fonseca [Thu, 20 Dec 2012 12:03:45 +0000 (12:03 +0000)]
util/u_format: Round when converting depth values from float to z16_unorm.

This makes the z16_unorm -> float -> z16_unorm conversion lossless.
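
A sketch of why rounding makes the round trip lossless. The function names are hypothetical and this is not the code in util/u_format; it only assumes the usual unorm mapping of 0..65535 onto 0.0..1.0. The float division and multiplication each introduce a small relative error, so a truncating conversion can land just below the original integer, while adding 0.5 before truncating absorbs that error.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 16-bit unorm depth value onto [0.0, 1.0]. */
float z16_unorm_to_float(uint16_t z)
{
   return (float)z / 65535.0f;
}

/* Map [0.0, 1.0] back to 16-bit unorm, rounding to nearest. */
uint16_t float_to_z16_unorm(float z)
{
   if (z <= 0.0f)
      return 0;
   if (z >= 1.0f)
      return 0xffff;
   /* Adding 0.5 before the truncating cast rounds to nearest. */
   return (uint16_t)(z * 65535.0f + 0.5f);
}
```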

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
11 years ago r600g: add cs tracing infrastructure for lockup pin pointing
Jerome Glisse [Wed, 19 Dec 2012 17:23:50 +0000 (12:23 -0500)]
r600g: add cs tracing infrastructure for lockup pin pointing

This is a build-time option: set R600_TRACE_CS to 1 and every CS is
printed to stderr, along with the CS trace point value, which gives
the last offset into a CS processed by the GPU.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
11 years ago r600g: add htile support v16
Jerome Glisse [Thu, 11 Oct 2012 14:40:30 +0000 (10:40 -0400)]
r600g: add htile support v16

htile is used for HiZ and HiS support and fast Z/S clears.
This commit just adds the htile setup and Fast Z clear.
We don't take full advantage of HiS with that patch.

v2 really use fast clear, still random issue with some tiles
   need to try more flush combination, fix depth/stencil
   texture decompression
v3 fix random issue on r6xx/r7xx
v4 rebase on top of latest mesa, disable CB export when clearing
   htile surface to avoid wasting bandwidth
v5 resummarize htile surface when uploading z value. Fix z/stencil
   decompression, the custom blitter with custom dsa is no longer
   needed.
v6 Reorganize render control/override update mechanism, fixing more
   issues in the process.
v7 Add nop after depth surface base update to work around some htile
   flushing issue. Force htile to 8x8 on r6xx/r7xx as other combinations
   have issues. Do not enable hyperz when flushing/uncompressing
   depth buffer.
v8 Fix htile surface, preload and prefetch setup. Only set preload
   and prefetch on htile surface clear like fglrx. Record depth
   clear value per level. Support several level for the htile
   surface. First depth clear can't be a fast clear.
v9 Fix comments, properly account new register in emit function,
   disable fast zclear if clearing different layer of texture
   array to different value
v10 Disable hyperz for texture array making test simpler. Force
    db_misc_state update when no depth buffer is bound. Remove
    unused variable, rename depth_clearstencil to depth_clear.
    Don't allocate htile surface for flushed depth. Something
    broke the cliprect change; this needs to be investigated.
v11 Rebase on top of newer mesa
v12 Rebase on top of newer mesa
v13 Rebase on top of newer mesa. The htile surface needs to be initialized
    to zero; somehow, special-casing the first clear to not use fast clear
    and thus initialize the htile surface with a proper value does not
    work in all cases.
v14 Use a resource, not a texture, for the htile buffer; this makes the
    htile buffer size computation easier and simpler. Disable preload on
    evergreen as it's still troublesome in some cases
v15 Clean up some comments and remove some leftovers
v16 Define name for bit 20 of CP_COHER_CNTL

Signed-off-by: Pierre-Eric Pelloux-Prayer <pelloux@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
11 years ago r600g: rework flusing and synchronization pattern v7
Jerome Glisse [Thu, 1 Nov 2012 20:09:40 +0000 (16:09 -0400)]
r600g: rework flusing and synchronization pattern v7

This brings r600g almost in line with the closed-source driver when
it comes to flushing and synchronization patterns.

v2-v4: history lost somewhere in outer space
v5: Fix compute size of flushing, use define for flags, update
    worst case cs size requirement for flush, treat rs780 and
    newer as r7xx when it comes to streamout.
v6: Fix num dw computation for framebuffer state, remove dead
    code, use define instead of hardcoded value.
v7: Remove dead code

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
11 years ago mesa: Allow glReadBuffer(GL_NONE) for winsys framebuffers.
Paul Berry [Wed, 19 Dec 2012 18:08:58 +0000 (10:08 -0800)]
mesa: Allow glReadBuffer(GL_NONE) for winsys framebuffers.

Previously, Mesa code assumed that glReadBuffer(GL_NONE) was only
valid for user-created framebuffer objects.  However, the spec is
quite clear that it should also be valid for the default framebuffer.
From section 18.2.1 ("Obtaining Pixels from the Framebuffer") of the
GL 4.3 spec:

    "When READ_FRAMEBUFFER_BINDING is zero, i.e. the default
    framebuffer, src must be one of the values listed in table 17.4,
    including NONE."

Similar language exists in the GLES 3.0 spec, and in desktop GL all
the way back to ARB_framebuffer_object.

Partially fixes GLES3 conformance test "CoverageES30.test".

NOTE: This is a candidate for stable branches.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
11 years ago llvmpipe: Drop PIPE_QUERY_TIME_ELAPSED support.
José Fonseca [Sun, 9 Dec 2012 10:15:19 +0000 (10:15 +0000)]
llvmpipe: Drop PIPE_QUERY_TIME_ELAPSED support.

It was slightly wrong: we were computing the longest duration of
the query among all the rasterizer tasks.

Regardless, for tile-based implementations such as llvmpipe, time differences
will never be very useful, because rendering before/during/after the query
is all interleaved.  And this is expected, see ARB_timer_query spec, issue 10.

In particular, piglit ext_timer_query-time-elapsed still fails, because
it makes assumptions that don't hold true in tiled architectures. Not
sure how to fix that though.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years ago mesa/st: Implement GL_TIME_ELAPSED w/ PIPE_QUERY_TIMESTAMP.
José Fonseca [Sun, 9 Dec 2012 10:08:13 +0000 (10:08 +0000)]
mesa/st: Implement GL_TIME_ELAPSED w/ PIPE_QUERY_TIMESTAMP.

ARB/EXT_timer_query's definition of GL_TIME_ELAPSED matches precisely the
subtraction of two GL_TIMESTAMP queries.

And for a lot of drivers, that's precisely how they have to implement it
internally -- by emitting two hardware timestamp queries.

So, to simplify driver implementation, simply allow doing so in the state
tracker.

Eventually if no driver implements PIPE_QUERY_TIME_ELAPSED then we could
retire it.
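
The idea of building an elapsed-time query from two timestamps can be sketched as below. The struct and function names are hypothetical, and the timestamps are passed in explicitly rather than read from hardware, so this is an illustration of the subtraction only, not the state tracker's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical query object holding two timestamps in nanoseconds. */
struct elapsed_query {
   uint64_t begin_ns;
   uint64_t end_ns;
};

/* Record the timestamp taken when the query begins. */
void elapsed_query_begin(struct elapsed_query *q, uint64_t now_ns)
{
   q->begin_ns = now_ns;
   q->end_ns = now_ns;
}

/* Record the timestamp taken when the query ends. */
void elapsed_query_end(struct elapsed_query *q, uint64_t now_ns)
{
   assert(now_ns >= q->begin_ns);
   q->end_ns = now_ns;
}

/* The elapsed time is simply the difference of the two timestamps. */
uint64_t elapsed_query_result(const struct elapsed_query *q)
{
   return q->end_ns - q->begin_ns;
}
```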

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years ago gallium: s/PIPE_CAP_TIMER_QUERY/PIPE_CAP_QUERY_TIME_ELAPSED/
José Fonseca [Sun, 9 Dec 2012 09:50:34 +0000 (09:50 +0000)]
gallium: s/PIPE_CAP_TIMER_QUERY/PIPE_CAP_QUERY_TIME_ELAPSED/

To better reflect what is being advertised.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years ago r600g: add assertions to prevent creation of invalid surfaces
Marek Olšák [Wed, 19 Dec 2012 23:28:24 +0000 (00:28 +0100)]
r600g: add assertions to prevent creation of invalid surfaces

11 years ago r600g: refactor and make streamout dumping more informative
Marek Olšák [Wed, 19 Dec 2012 16:05:01 +0000 (17:05 +0100)]
r600g: refactor and make streamout dumping more informative

Reviewed-by: Dave Airlie <airlied@redhat.com>
11 years ago r600g: try to fix streamout for the cases where BURST_COUNT > 0
Marek Olšák [Wed, 19 Dec 2012 15:59:45 +0000 (16:59 +0100)]
r600g: try to fix streamout for the cases where BURST_COUNT > 0

The burst was incorrectly used, because ELEM_SIZE was always 0.
I don't know if the burst works, because I don't know of any test
which uses it.

NOTE: This is a candidate for the stable branches.

Reviewed-by: Dave Airlie <airlied@redhat.com>
11 years ago r600g: lower stream outputs with dst_offset < start_component
Marek Olšák [Wed, 19 Dec 2012 14:09:54 +0000 (15:09 +0100)]
r600g: lower stream outputs with dst_offset < start_component

This fixes streamout breakage caused by the varying packing.

Reviewed-by: Dave Airlie <airlied@redhat.com>
11 years ago r600g: use r600_get_temp to get temporaries for CLIPDIST shader outputs
Marek Olšák [Wed, 19 Dec 2012 14:00:45 +0000 (15:00 +0100)]
r600g: use r600_get_temp to get temporaries for CLIPDIST shader outputs

I need this to be able to use r600_get_temp in the function later.

Reviewed-by: Dave Airlie <airlied@redhat.com>
11 years ago softpipe: fix up FS variant unbinding / deletion
Brian Paul [Fri, 14 Dec 2012 17:47:46 +0000 (10:47 -0700)]
softpipe: fix up FS variant unbinding / deletion

The old call to tgsi_exec_machine_bind_shader() in
softpipe_delete_fs_state() was never called since the shader's original
tokens are never passed to the tgsi interpreter (only shader _variant_
tokens are).  Now, unbind the variant's tokens from the tgsi interpreter
when we free the variant.

This doesn't fix any known bugs but it's the right thing to do.

Note: This is a candidate for the stable branches.

11 years ago softpipe: fix unreliable FS variant binding bug
Brian Paul [Fri, 14 Dec 2012 17:34:33 +0000 (10:34 -0700)]
softpipe: fix unreliable FS variant binding bug

In exec_prepare() we were comparing pointers to see if the fragment
shader variant had changed before calling tgsi_exec_machine_bind_shader().
This didn't work reliably when there was a lot of shader token malloc/
freeing going on because the memory might get reused.
Instead, bind the shader variant during regular state validation.
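
A sketch of the difference, under the assumption that state validation is driven by a dirty flag. All names here (`struct context`, `set_fs_variant`, `validate_state`) are hypothetical and much simpler than softpipe's real state tracking. The point is that the rebind is triggered by the state change itself, so it still happens when a new variant is allocated at the address of a freed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fragment shader variant. */
struct fs_variant {
   int id;
};

/* Hypothetical interpreter state: what is bound and how often we bound. */
struct exec_machine {
   const struct fs_variant *bound;
   unsigned bind_count;
};

struct context {
   const struct fs_variant *fs_variant;
   bool fs_dirty; /* set whenever the variant is (re)selected */
   struct exec_machine machine;
};

/* Selecting a variant always marks the state dirty, even if the
 * pointer value happens to equal the previously bound one. */
void set_fs_variant(struct context *ctx, const struct fs_variant *v)
{
   ctx->fs_variant = v;
   ctx->fs_dirty = true;
}

/* Bind during validation, keyed on the dirty flag and not on a
 * pointer comparison. */
void validate_state(struct context *ctx)
{
   if (ctx->fs_dirty) {
      ctx->machine.bound = ctx->fs_variant;
      ctx->machine.bind_count++;
      ctx->fs_dirty = false;
   }
}
```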

Fixes http://bugs.freedesktop.org/show_bug.cgi?id=40404
(fixes a couple of piglit's glsl-max-varyings tests)

Note: This is a candidate for the stable branches.

11 years ago Revert "r600g: work around ddx over alignment"
Jerome Glisse [Wed, 19 Dec 2012 14:56:17 +0000 (09:56 -0500)]
Revert "r600g: work around ddx over alignment"

This reverts commit d8287bac1fd4a77abc2db38de134f14176740d23.

Causes more issues than it fixes. Need to think of a proper solution.

11 years ago r600g: work around ddx over alignment
Jerome Glisse [Tue, 18 Dec 2012 17:45:31 +0000 (12:45 -0500)]
r600g: work around ddx over alignment

This forces surfaces allocated by the ddx to be considered
height-aligned to 8 and fixes the 1D->2D tiling transition that
results from this.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
11 years ago i965: Fix gl_VertexID when there are no other vertex inputs.
Paul Berry [Mon, 17 Dec 2012 17:55:48 +0000 (09:55 -0800)]
i965: Fix gl_VertexID when there are no other vertex inputs.

brw_emit_vertices contains special case logic to handle the case where
a vertex shader doesn't read any inputs.  This special case logic was
incorrectly activating in the case where the only vertex input is
gl_VertexID.  As a result, if a shader used gl_VertexID but used no
other inputs, then all vertices got a gl_VertexID of zero.

Fixes oglconform test "ubo-usage advanced.transform_feedback".

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago mesa: Make a function is_transform_feedback_active_and_unpaused.
Paul Berry [Sat, 15 Dec 2012 22:21:32 +0000 (14:21 -0800)]
mesa: Make a function is_transform_feedback_active_and_unpaused.

The rather unwieldy logic for determining this condition was repeated
in a large number of places.  This patch consolidates it to a single
inline function.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
11 years ago mesa: Fix corner cases of BindBufferBase with transform feedback.
Paul Berry [Sat, 15 Dec 2012 21:06:10 +0000 (13:06 -0800)]
mesa: Fix corner cases of BindBufferBase with transform feedback.

This patch implements the following behaviours, which are mandated by
the GL 4.3 and GLES3 specs.

1. Regarding the GL_TRANSFORM_FEEDBACK_BUFFER_SIZE query: "If the
   ... size was not specified when the buffer object was bound
   (e.g. if it was bound with BindBufferBase), ... zero is returned."
   (GL 4.3 section 6.7.1 "Indexed Buffer Object Limits and Binding
   Queries").

2. "BindBufferBase binds the entire buffer, even when the size of the
   buffer is changed after the binding is established. It is
   equivalent to calling BindBufferRange with offset zero, while size
   is determined by the size of the bound buffer at the time the
   binding is used."  (GL 4.3 section 6.1.1 "Binding Buffer Objects to
   Indexed Targets").  I interpret "at the time the binding is used"
   to mean "at the time of the call to glBeginTransformFeedback".

3. "Regardless of the size specified with BindBufferRange, or
   indirectly with BindBufferBase, the GL will never read or write
   beyond the end of a bound buffer. In some cases this constraint may
   result in visibly different behavior when a buffer overflow would
   otherwise result, such as described for transform feedback
   operations in section 13.2.2."  (GL 4.3 section 6.1.1 "Binding
   Buffer Objects to Indexed Targets").

Item 1 has been part of the spec all the way back to the inception of
the EXT_transform_feedback extension.  Items 2 and 3 were added in GL
4.2 and GLES 3.

Prior to GL 4.2, in place of items 2 and 3, the spec simply said
"BindBufferBase is equivalent to calling BindBufferRange with offset
zero and size equal to the size of buffer."  For transform feedback,
Mesa behaved as though this meant "...equal to the size of buffer at
the time of the call to BindBufferBase".  However, this was
problematic because it left it ambiguous what to do if the buffer is
shrunk between the call to BindBuffer{Base,Range} and the call to
BeginTransformFeedback.  Prior to this patch, Mesa's behaviour was to
try to write beyond the end of the buffer, likely resulting in memory
corruption.  In light of this, I'm interpreting the spec change as a
clarification, not an intended behavioural change, so I'm making the
change apply regardless of API version.
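
The three items above can be sketched as a small model. Everything here is hypothetical (struct names, function names, and the use of a requested size of 0 to mean "bound with BindBufferBase"); it is not the Mesa implementation. The key point is that the usable size is computed when the binding is used, not when it is established.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical buffer object: only its current size matters here. */
struct buffer {
   int64_t size;
};

/* Hypothetical indexed binding point. */
struct tfb_binding {
   const struct buffer *buf;
   int64_t offset;
   int64_t requested_size; /* 0 means "bound with BindBufferBase" */
};

void bind_buffer_base(struct tfb_binding *b, const struct buffer *buf)
{
   b->buf = buf;
   b->offset = 0;
   b->requested_size = 0;
}

void bind_buffer_range(struct tfb_binding *b, const struct buffer *buf,
                       int64_t offset, int64_t size)
{
   assert(offset >= 0 && size > 0);
   b->buf = buf;
   b->offset = offset;
   b->requested_size = size;
}

/* Item 1: the size query returns zero if no size was specified. */
int64_t queried_size(const struct tfb_binding *b)
{
   return b->requested_size;
}

/* Items 2 and 3: evaluate at time of use and never reach past the
 * end of the buffer, whatever size was requested at bind time. */
int64_t effective_size(const struct tfb_binding *b)
{
   int64_t available = b->buf->size - b->offset;
   if (available < 0)
      available = 0;
   if (b->requested_size == 0)
      return available;
   return b->requested_size < available ? b->requested_size : available;
}
```

In this model `effective_size` would be called at the point corresponding to glBeginTransformFeedback, which matches the interpretation of "at the time the binding is used" given in item 2.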

Fixes GLES3 conformance test transform_feedback2_pause_resume.test.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
11 years ago mesa/gles3: Generate error on draw call if transform feedback would overflow.
Paul Berry [Thu, 13 Dec 2012 17:30:09 +0000 (09:30 -0800)]
mesa/gles3: Generate error on draw call if transform feedback would overflow.

In desktop GL, if a draw call would cause transform feedback buffers
to overflow, the draw call should succeed, and the extra primitives
should simply not be recorded in the transform feedback buffers.

In GLES3, however, if a draw call would cause transform feedback
buffers to overflow, the draw call is supposed to produce an
INVALID_OPERATION error and no drawing should occur.

This patch implements the GLES3-required behaviour.
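
A sketch contrasting the two behaviours. The enum, struct, and function are hypothetical, and for simplicity the model counts vertices as if every primitive were a single point; real implementations work in whole primitives.

```c
#include <assert.h>
#include <stdbool.h>

enum api { API_DESKTOP_GL, API_GLES3 };

/* Hypothetical result of one draw call with transform feedback active. */
struct draw_outcome {
   bool error;        /* INVALID_OPERATION was generated */
   unsigned drawn;    /* vertices actually drawn */
   unsigned recorded; /* vertices written to the feedback buffers */
};

/* 'count' is the number of vertices in the draw call; 'space' is how
 * many more vertices fit in the transform feedback buffers. */
struct draw_outcome
draw_with_feedback(enum api api, unsigned count, unsigned space)
{
   struct draw_outcome out = { false, 0, 0 };

   if (api == API_GLES3 && count > space) {
      /* GLES3: the whole draw call fails and nothing is drawn. */
      out.error = true;
      return out;
   }

   /* Desktop GL: the draw succeeds; the overflow is just not recorded. */
   out.drawn = count;
   out.recorded = count < space ? count : space;
   return out;
}
```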

Fixes GLES3 conformance test "transform_feedback_overflow.test".

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
11 years ago mesa/gles3: Generate error on DrawElements* calls if transform feedback active.
Paul Berry [Fri, 14 Dec 2012 20:20:08 +0000 (12:20 -0800)]
mesa/gles3: Generate error on DrawElements* calls if transform feedback active.

In GLES3, only glDrawArrays() and glDrawArraysInstanced() calls are
allowed when transform feedback is active.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
11 years ago mesa: refactor _mesa_compute_max_transform_feedback_vertices from i965.
Paul Berry [Wed, 12 Dec 2012 22:14:12 +0000 (14:14 -0800)]
mesa: refactor _mesa_compute_max_transform_feedback_vertices from i965.

Previously, the i965 driver contained code to compute the maximum
number of vertices that could be written without overflowing any
transform feedback buffers.  This code wasn't driver-specific, and for
GLES3 support we're going to need to use it in core mesa.  So this
patch moves the code into a core mesa function,
_mesa_compute_max_transform_feedback_vertices().
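
A rough sketch of the kind of computation being moved, assuming each bound buffer has a size and a per-vertex stride in bytes. The struct and function name are hypothetical and do not reproduce the signature of _mesa_compute_max_transform_feedback_vertices(); the sketch only shows taking the minimum over all buffers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical description of one transform feedback buffer binding. */
struct tfb_buffer {
   unsigned size_bytes;   /* usable bytes in the bound range */
   unsigned stride_bytes; /* bytes written per vertex; 0 if unused */
};

/* Largest vertex count that fits in every buffer that is written to.
 * Returns UINT_MAX if no buffer receives any output. */
unsigned max_feedback_vertices(const struct tfb_buffer *bufs, size_t count)
{
   unsigned max_vertices = UINT_MAX;

   for (size_t i = 0; i < count; i++) {
      if (bufs[i].stride_bytes == 0)
         continue; /* nothing is written to this buffer */

      unsigned fits = bufs[i].size_bytes / bufs[i].stride_bytes;
      if (fits < max_vertices)
         max_vertices = fits;
   }
   return max_vertices;
}
```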

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
v2: Eliminate C++-style variable declarations, since these won't work
with MSVC.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years ago mesa: Change args to vbo_count_tessellated_primitives.
Paul Berry [Wed, 12 Dec 2012 21:37:45 +0000 (13:37 -0800)]
mesa: Change args to vbo_count_tessellated_primitives.

No functional change--this simply paves the way to allow future
patches to call vbo_count_tessellated_primitives() during error
checking, before the _mesa_prim struct has been constructed.

This will be needed for GLES3, which requires draw calls to fail if
there is not enough space available in transform feedback buffers to
accommodate the primitives to be drawn.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years ago radeon/llvm: improve cube map handling
Vadim Girlin [Tue, 18 Dec 2012 13:39:19 +0000 (17:39 +0400)]
radeon/llvm: improve cube map handling

Add support for TEX2, TXB2, TXL2, fix SHADOWCUBE

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
11 years ago radeon/llvm: fix TXQ_LZ handling for cube maps
Vadim Girlin [Tue, 18 Dec 2012 13:40:36 +0000 (17:40 +0400)]
radeon/llvm: fix TXQ_LZ handling for cube maps

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
11 years ago r600g: initialize inst_mod in r600_tex_from_byte_stream
Vadim Girlin [Tue, 18 Dec 2012 13:40:06 +0000 (17:40 +0400)]
r600g: initialize inst_mod in r600_tex_from_byte_stream

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
11 years ago gallivm: fix conversion for pure integer formats
Roland Scheidegger [Mon, 17 Dec 2012 21:06:40 +0000 (22:06 +0100)]
gallivm: fix conversion for pure integer formats

Since the idea is to just expand or shrink the bit width but not otherwise do
conversion we also need to adjust the sign bit according to src, otherwise
the conversion code will incorrectly clamp the values. (Since this only works
for casting to ordinary floats the norm and fixed bits should always be fine.)

This fixes the remaining piglit attribs GL3 failures.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years ago glsl: Fix gl_context vs. ralloc context in check_version again, again.
Kenneth Graunke [Mon, 17 Dec 2012 19:20:53 +0000 (11:20 -0800)]
glsl: Fix gl_context vs. ralloc context in check_version again, again.

Dave found some, but there were more.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58039

11 years ago vega: fix for object handle leak
Andreas Pokorny [Sat, 15 Dec 2012 22:28:57 +0000 (23:28 +0100)]
vega: fix for object handle leak

Frees the object handle when an OpenVG object
is destroyed.

Signed-off-by: Andreas Pokorny <andreas.pokorny@elektrobit.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
11 years ago wmesa: include version.h to silence warning
Brian Paul [Mon, 17 Dec 2012 15:39:17 +0000 (08:39 -0700)]
wmesa: include version.h to silence warning

11 years ago xlib: include headers to fix errors/warnings
Brian Paul [Mon, 17 Dec 2012 17:22:10 +0000 (10:22 -0700)]
xlib: include headers to fix errors/warnings

11 years ago mesa osmesa/x11: fix build error introduced in 4bea4cb9
Jordan Justen [Mon, 17 Dec 2012 00:53:02 +0000 (16:53 -0800)]
mesa osmesa/x11: fix build error introduced in 4bea4cb9

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=58380

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years ago gallivm: fix texel fetch for array textures (2)
Roland Scheidegger [Fri, 14 Dec 2012 20:17:05 +0000 (21:17 +0100)]
gallivm: fix texel fetch for array textures (2)

a460aea3f14222af46f88d1bc686f82180b8a872 wasn't entirely correct:
all coords are already ints, so the iround needs to be skipped.
Passes piglit texelFetch with sampler1DArray/sampler2DArray.

Reviewed-by: Dave Airlie <airlied@redhat.com>
11 years ago mesa: assert if driver did not compute the version
Jordan Justen [Fri, 16 Nov 2012 21:40:59 +0000 (13:40 -0800)]
mesa: assert if driver did not compute the version

Make sure drivers initialize the version before:
 * _mesa_initialize_exec_table is called
 * _mesa_initialize_exec_table_vbo is called
 * A context is made current
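
The ordering requirement in the list above can be sketched with an assertion. The struct and function names are hypothetical, and the "major * 10 + minor" encoding with 0 meaning "not computed yet" is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, minimal context. */
struct context {
   unsigned version;      /* 0 until computed, e.g. 30 for version 3.0 */
   bool exec_table_ready;
};

/* The driver computes the version once its capabilities are known. */
void compute_version(struct context *ctx, unsigned major, unsigned minor)
{
   ctx->version = major * 10 + minor;
}

/* Dispatch table setup depends on the version, so refuse to run
 * (in debug builds) if the driver has not computed it yet. */
void initialize_exec_table(struct context *ctx)
{
   assert(ctx->version > 0 && "driver must compute the version first");
   ctx->exec_table_ready = true;
}
```

A driver following this pattern would call `compute_version` and then `initialize_exec_table`, in that order; reversing the calls trips the assertion in a debug build.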

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago mesa: don't initialize VBO vtxfmt in _vbo_CreateContext
Jordan Justen [Mon, 19 Nov 2012 19:21:05 +0000 (11:21 -0800)]
mesa: don't initialize VBO vtxfmt in _vbo_CreateContext

The driver should call _mesa_initialize_vbo_vtxfmt after
computing the context version.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago mesa: don't initialize exec dispatch tables in _mesa_initialize_context
Jordan Justen [Fri, 16 Nov 2012 18:42:02 +0000 (10:42 -0800)]
mesa: don't initialize exec dispatch tables in _mesa_initialize_context

Drivers must compute the context version, and then call
_mesa_initialize_exec_table themselves.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago mesa dispatch_sanity: call new functions to initialize exec table
Jordan Justen [Sat, 17 Nov 2012 02:25:35 +0000 (18:25 -0800)]
mesa dispatch_sanity: call new functions to initialize exec table

In a future patch the exec functions will no longer be set up
by _mesa_initialize_context and _vbo_CreateContext.

Therefore we must call _mesa_initialize_exec_table and
_mesa_initialize_exec_table_vbo.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago drivers: compute version and then initialize exec table
Jordan Justen [Fri, 16 Nov 2012 18:30:19 +0000 (10:30 -0800)]
drivers: compute version and then initialize exec table

This change forces the context version to be computed before
initializing the exec dispatch tables.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago vbo: add _mesa_initialize_vbo_vtxfmt
Jordan Justen [Mon, 19 Nov 2012 19:17:39 +0000 (11:17 -0800)]
vbo: add _mesa_initialize_vbo_vtxfmt

This function initializes the exec/save dispatch tables
for VBO vtxfmt.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago mesa: separate exec allocation from initialization
Jordan Justen [Fri, 16 Nov 2012 18:27:13 +0000 (10:27 -0800)]
mesa: separate exec allocation from initialization

In glapi/gl_genexec.py:
* Remove _mesa_alloc_dispatch_table call

In glapi/gl_genexec.py and api_exec.h:
* Rename _mesa_create_exec_table to _mesa_initialize_exec_table

In context.c:
* Call _mesa_alloc_dispatch_table instead of _mesa_create_exec_table
* Call _mesa_initialize_exec_table (this is temporary)

Once all drivers have been modified to call
_mesa_initialize_exec_table, the temporary call in
_mesa_initialize_context can be removed from context.c.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago r600g: fixup offset types for printing
Dave Airlie [Sun, 16 Dec 2012 10:16:09 +0000 (10:16 +0000)]
r600g: fixup offset types for printing

This allows the debug code to at least show the sign properly.

Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years ago gallium/u_blitter: Remove the overlapped blit assert from util_blitter_blit_generic().
Henri Verbeet [Fri, 14 Dec 2012 03:14:14 +0000 (04:14 +0100)]
gallium/u_blitter: Remove the overlapped blit assert from util_blitter_blit_generic().

This is used by st_BlitFramebuffer() / r600_blit(), and ARB_fbo allows
overlapped blits, even though the result is undefined. No piglit regressions
on r600g / CYPRESS.

Signed-off-by: Henri Verbeet <hverbeet@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
11 years ago glsl_parser_extras.cpp: fixup gl vs mem contexts again.
Dave Airlie [Sun, 16 Dec 2012 01:47:01 +0000 (11:47 +1000)]
glsl_parser_extras.cpp: fixup gl vs mem contexts again.

This should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=58039

Tested-by: Darxus on bug 58039
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years ago i965: Move BRW_MAX_GRF and similar defines to brw_reg.h.
Kenneth Graunke [Sat, 10 Nov 2012 05:20:05 +0000 (21:20 -0800)]
i965: Move BRW_MAX_GRF and similar defines to brw_reg.h.

These don't really belong in brw_structs.h.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago i965: Split struct brw_reg out from brw_eu.h into its own header.
Kenneth Graunke [Fri, 9 Nov 2012 22:00:15 +0000 (14:00 -0800)]
i965: Split struct brw_reg out from brw_eu.h into its own header.

struct brw_instruction and the related instruction emitting code won't
be useful on Gen8+, as the instruction encoding changed.  However, the
struct brw_reg code is still extremely valuable.

While we're at it, fix up some style points:
- s/GLuint/unsigned/g
- s/GLint/int/g
- s/GLshort/int16_t/g
- s/GLushort/uint16_t/g
- s/INLINE/inline/g
- Replace tabs with spaces
- Put return types on a separate line from the function name/parameters
- Remove trailing whitespace
- Remove extraneous whitespace around function parameters

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago docs: add ARB_texture_buffer_object_rgb32
Dave Airlie [Sat, 15 Dec 2012 21:06:54 +0000 (07:06 +1000)]
docs: add ARB_texture_buffer_object_rgb32

Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years ago st/mesa: add texture buffer object rgb32 support.
Dave Airlie [Sat, 15 Dec 2012 03:03:54 +0000 (13:03 +1000)]
st/mesa: add texture buffer object rgb32 support.

This checks if the pipe driver can support RGB32 formats.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years ago mesa: add support for ARB_texture_buffer_object_rgb32
Dave Airlie [Sat, 15 Dec 2012 03:03:40 +0000 (13:03 +1000)]
mesa: add support for ARB_texture_buffer_object_rgb32

This adds the extensions + the tex buffer support for checking
the formats.

There is a piglit test enhancement sent to that list.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years ago glsl: avoid using gl context as a memory context
Dave Airlie [Sat, 15 Dec 2012 03:24:52 +0000 (13:24 +1000)]
glsl: avoid using gl context as a memory context

Not sure what was going on here, but running piglit with debug builds
might be a good plan :-)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years ago i965: Add missing autoconf bits so test_vec4_register_coalesce will build
Ian Romanick [Sat, 15 Dec 2012 01:22:29 +0000 (17:22 -0800)]
i965: Add missing autoconf bits so test_vec4_register_coalesce will build

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Eric Anholt <eric@anholt.net>
11 years ago i965: Generalize VS compute-to-MRF for compute-to-another-GRF, too.
Eric Anholt [Thu, 2 Aug 2012 02:35:18 +0000 (19:35 -0700)]
i965: Generalize VS compute-to-MRF for compute-to-another-GRF, too.

No statistically significant performance difference on glbenchmark 2.7
(n=60).  It reduces cycles spent in the vertex shader by 3.3% +/- 0.8%
(n=5), but that's only about .3% of all cycles spent according to the
fixed shader_time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago i965/vs: Extend opt_compute_to_mrf to handle limited "reswizzling"
Eric Anholt [Sat, 1 Dec 2012 06:29:26 +0000 (22:29 -0800)]
i965/vs: Extend opt_compute_to_mrf to handle limited "reswizzling"

The way our visitor works, scalar expression/swizzle results that get
stored in channels other than .x will have an intermediate MOV from
their result in the .x channel to the real .y (or whatever) channel, and
similarly for vec2/vec3 results.

By knowing how to adjust DP4-type instructions for optimizing out a
swizzled MOV, we can reduce instructions in common matrix multiplication
cases.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/vs: Add a unit test for opt_compute_to_mrf().
Eric Anholt [Sat, 1 Dec 2012 07:52:38 +0000 (23:52 -0800)]
i965/vs: Add a unit test for opt_compute_to_mrf().

The compute-to-mrf code is really twitchy, and it's hard to construct
GLSL testcases for it.  This unit test is also really hard to work with
(for example, if your instruction is removed by dead code elimination,
you end up inspecting something irrelevant), but I did use it for
debugging some of the commits to follow.

I called it test_vec4_register_coalesce because the compute-to-mrf code
is about to morph into that.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Drop an unnecessary _safe on a list walk.
Eric Anholt [Sat, 1 Dec 2012 05:15:35 +0000 (21:15 -0800)]
i965/fs: Drop an unnecessary _safe on a list walk.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Add a note explaining a detail of register_coalesce_2().
Eric Anholt [Fri, 30 Nov 2012 23:54:19 +0000 (15:54 -0800)]
i965/fs: Add a note explaining a detail of register_coalesce_2().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Also consider HALTs a potential block end.
Eric Anholt [Wed, 12 Dec 2012 20:47:50 +0000 (12:47 -0800)]
i965: Also consider HALTs a potential block end.

The final halt of the fragment shader turns off the remaining channels,
then jumps such that everything is turned back on.  So, we can have our
last ENDIF of the shader point at that directly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Jump to the end of the next outer conditional block on ENDIFs.
Kenneth Graunke [Wed, 12 Dec 2012 10:20:05 +0000 (02:20 -0800)]
i965: Jump to the end of the next outer conditional block on ENDIFs.

From the Ivybridge PRM, Volume 4, Part 3, section 6.24 (page 172):

"The endif instruction is also used to hop out of nested conditionals by
 jumping to the end of the next outer conditional block when all
 channels are disabled."

Also:
"Pseudocode:
 Evaluate(WrEn);
 if ( WrEn == 0 ) {  // all channels false
   Jump(IP + JIP);
 }"

First, ENDIF re-enables any channels that were disabled because they
didn't match the conditional.  If any channels are active, it proceeds
to the next instruction (IP + 16).  However, if they're all disabled,
there's no point in walking through all of the instructions that have no
effect---it can jump to the next instruction that might re-enable some
channels (an ELSE, ENDIF, or WHILE).

Previously, we always set JIP on ENDIF instructions to 2 (which is
measured in 8-byte units).  This made it do Jump(IP + 16), which just
meant it would go to the next instruction even if all channels were off.

It turns out that walking over instructions while all the channels are
disabled like this is worse than just instruction dispatch overhead: if
there are texturing messages, it still costs a couple hundred cycles to
not-actually-read from the texture results.

This patch finds the next instruction that could re-enable channels and
sets JIP accordingly.
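
A minimal sketch of the search described above, not the driver's actual code. It assumes a shader modelled as a list of opcode strings, 16-byte instructions, and JIP counted in 8-byte units as stated in the message. The name `endif_jip` and the fallback to the old value of 2 when no later block end exists are illustrative assumptions.

```python
# Hypothetical model: each instruction is 16 bytes and JIP is measured in
# 8-byte units, so a JIP of 2 means "the next instruction".
BLOCK_END_OPCODES = {"ELSE", "ENDIF", "WHILE"}
JIP_UNITS_PER_INSTRUCTION = 2


def endif_jip(program, endif_index):
    """Return a JIP for the ENDIF at endif_index.

    Scans forward for the next instruction that might re-enable channels.
    If there is none, falls back to 2 (the old fixed value); that fallback
    is an assumption made for this sketch.
    """
    if program[endif_index] != "ENDIF":
        raise ValueError("instruction at endif_index is not an ENDIF")
    for i in range(endif_index + 1, len(program)):
        if program[i] in BLOCK_END_OPCODES:
            return (i - endif_index) * JIP_UNITS_PER_INSTRUCTION
    return JIP_UNITS_PER_INSTRUCTION


program = ["IF", "IF", "MOV", "ENDIF", "TEX", "MUL", "ENDIF", "MOV"]
inner_jip = endif_jip(program, 3)   # skips TEX and MUL, lands on the outer ENDIF
outer_jip = endif_jip(program, 6)   # nothing left to re-enable channels
print(inner_jip, outer_jip)
```

With all channels disabled, the inner ENDIF in this example would jump over the TEX and MUL instead of stepping through them.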

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965: expose ARB_texture_cube_map_array
Chris Forbes [Thu, 22 Nov 2012 06:24:22 +0000 (19:24 +1300)]
i965: expose ARB_texture_cube_map_array

V3: Put enable in an existing block rather than making a new
one for no good reason.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Fix setup for textureGrad(samplerCubeArray, coord, dPdx, dPdy)
Eric Anholt [Fri, 14 Dec 2012 20:24:55 +0000 (12:24 -0800)]
i965/fs: Fix setup for textureGrad(samplerCubeArray, coord, dPdx, dPdy)

Caught by tex_grad-01.frag.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Move the failure for gen7 16-wide intdiv to emit_math().
Eric Anholt [Wed, 5 Dec 2012 22:56:32 +0000 (14:56 -0800)]
i965/fs: Move the failure for gen7 16-wide intdiv to emit_math().

The cube map array code adds another caller of emit_math(), which
needs this check.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: fs: Add fixup for textureSize on Gen6/7
Chris Forbes [Thu, 22 Nov 2012 09:13:46 +0000 (22:13 +1300)]
i965: fs: Add fixup for textureSize on Gen6/7

V2: Moved up into emit(ir_texture *) to avoid duplication and fix
ordering for Gen7; Gen6 math quirks moved into previous patches.

Tested on Gen6 only; passes all the cube_map_array piglits.

V3: Fixed weird whitespace
V4: Use sampler->type; otherwise broken on arrays of samplers.
v5: Minor style fixes (by anholt)

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: fs: fix gen6+ math operands in one place
Chris Forbes [Wed, 28 Nov 2012 19:39:08 +0000 (08:39 +1300)]
i965: fs: fix gen6+ math operands in one place

V4: Fix various style nits as pointed out by Eric, and expand IMM
    operands on both Gen6 and Gen7.
v5: minor style nits (by anholt)

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: vs: Add fixup for textureSize with cube array samplers
Chris Forbes [Thu, 22 Nov 2012 08:32:08 +0000 (21:32 +1300)]
i965: vs: Add fixup for textureSize with cube array samplers

V3: Fixed weird whitespace
V4: Use sampler's type rather than variable's type; otherwise broken
    with arrays of samplers. (Thanks Eric)
v5: Fix a couple more style nits (by anholt)

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/vs: Fix gen6+ math operand quirks in one place
Chris Forbes [Sun, 9 Dec 2012 09:03:49 +0000 (22:03 +1300)]
i965/vs: Fix gen6+ math operand quirks in one place

This causes immediate values to get moved to a temp on gen7, which is needed
for an upcoming change but hadn't happened in the visitor until then.

v2: Drop gen > 7 checks (doesn't exist), and style-fix comments (changes by
    anholt).

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Add various plumbing for cubemap arrays
Chris Forbes [Thu, 22 Nov 2012 04:55:08 +0000 (17:55 +1300)]
i965: Add various plumbing for cubemap arrays

V4: Fixed style nits

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Add empirically-determined instruction latencies for gen7.
Eric Anholt [Thu, 6 Dec 2012 00:19:43 +0000 (16:19 -0800)]
i965/fs: Add empirically-determined instruction latencies for gen7.

v2: Actually switch on the other math instructions mentioned in the
    comment.
v3: Add timing data for textureSize(), and clean up some long comment
    lines.

Testing shader_time of fs16 shaders on a few frames of various apps:
nexuiz improved by 2.9% +/- 1.5% (n=10)
no difference on GLB2.5 (n=36, outliers removed)
no difference on GLB2.7 (n=25)
etqw improved by 2.6% +/- 2.2% (n=25)
no difference on lightsmark (n=25)

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Fix the clock increment in scheduling.
Eric Anholt [Thu, 6 Dec 2012 00:17:58 +0000 (16:17 -0800)]
i965/fs: Fix the clock increment in scheduling.

I've tested this to be true with various ALU ops on gen7 (with the
exception of MADs, which go at either 3 or 4 cycles per dispatch).

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Move the old gen4 bspec-based scheduling info to a helper func.
Eric Anholt [Wed, 5 Dec 2012 23:24:07 +0000 (15:24 -0800)]
i965/fs: Move the old gen4 bspec-based scheduling info to a helper func.

For gen7 everything changes, and we have actual information on latency.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Set up gen7 UBO loads as sends from GRFs.
Eric Anholt [Wed, 5 Dec 2012 08:06:30 +0000 (00:06 -0800)]
i965/fs: Set up gen7 UBO loads as sends from GRFs.

This gives the instruction scheduler a chance to schedule between the
loads, whereas before it was restricted due to the dependencies between
the MRFs for setting them up.

For one shader in gles3conform, it goes from getting stuck in register
allocation for as long as anybody's bothered to leave it running down
to 23 seconds, thanks to the LIFO scheduling.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Before reg alloc, schedule instructions to reduce live ranges.
Eric Anholt [Tue, 4 Dec 2012 03:59:55 +0000 (19:59 -0800)]
i965/fs: Before reg alloc, schedule instructions to reduce live ranges.

This came from an idea by Ben Segovia.  16-wide pixel shaders are very
important for latency hiding on i965, so we want to try really hard to
get them.  If scheduling an instruction makes some set of instructions
available, those are probably the ones that make the instruction's
result dead.  By choosing those first, we'll have a tendency to reduce
the amount of live data as opposed to creating more.
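
One way to picture the heuristic is a ready list treated as a stack, so that instructions unblocked by the most recent choice are picked first. The sketch below is an illustration under that assumption only. `schedule_lifo` and the dependency-dict input are hypothetical and ignore latencies entirely.

```python
def schedule_lifo(deps):
    """Order instructions so that newly unblocked ones are scheduled first.

    deps maps each instruction name to the list of names it depends on;
    dict insertion order stands in for original program order.
    """
    remaining = {name: len(parents) for name, parents in deps.items()}
    children = {name: [] for name in deps}
    for name, parents in deps.items():
        for parent in parents:
            children[parent].append(name)

    # The ready list is a stack. It is reversed so that, among the initially
    # ready instructions, the earliest in program order is popped first.
    ready = [name for name in deps if remaining[name] == 0]
    ready.reverse()

    order = []
    while ready:
        chosen = ready.pop()
        order.append(chosen)
        # Consumers that just became available go on top of the stack, so
        # they are chosen before older ready instructions.
        for child in reversed(children[chosen]):
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return order


deps = {
    "load_a": [],
    "load_b": [],
    "use_a": ["load_a"],
    "use_b": ["load_b"],
}
print(schedule_lifo(deps))
```

In this example each load is followed directly by its consumer, so only one loaded value is live at a time, whereas program order would keep both live.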

Previously, we were sometimes getting this behavior out of the
scheduler, which was what produced the scheduler's original performance
wins on lightsmark.  Unfortunately, that was mostly an accident of the
lame instruction latency information that I had, which made it
impossible to fix the actual scheduling for performance.  Now that we've
fixed the scheduling for setup for register allocation, we can safely
update the latency parameters for the final schedule.

In shader-db, we lose 37 16-wide shaders, but gain 90 new ones.  4
shaders that were spilling change how many registers spill, for a
reduction of 70/3899 instructions.

v2: Simplify the new loop.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Add some optional debug printfs to scheduling.
Eric Anholt [Tue, 4 Dec 2012 21:52:19 +0000 (13:52 -0800)]
i965/fs: Add some optional debug printfs to scheduling.

Seeing when instructions become available to schedule is really useful.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Schedule instructions both before and after register allocation.
Eric Anholt [Tue, 4 Dec 2012 01:58:03 +0000 (17:58 -0800)]
i965/fs: Schedule instructions both before and after register allocation.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Make sure that the shader_time report at context destroy happens.
Eric Anholt [Fri, 14 Dec 2012 22:02:34 +0000 (14:02 -0800)]
i965: Make sure that the shader_time report at context destroy happens.

Otherwise, you end up with some report from within a second of context
destroy, which is not what you really want for testing the impact of
changes.

11 years agoi965: Print a total time for the different shader stages.
Eric Anholt [Mon, 10 Dec 2012 18:22:41 +0000 (10:22 -0800)]
i965: Print a total time for the different shader stages.

Sometimes I've got a patch for a performance optimization that's not
showing a statistically significant performance difference on reported
FPS, but still seems like a good idea because it ought to reduce time
spent in the shader.  If I can see the total number of cycles spent in
the shader stage being optimized, it may show that the patch is still
worthwhile (or point out that it's actually broken in some way).

11 years agoi965: Scale shader_time to compensate for resets.
Eric Anholt [Mon, 10 Dec 2012 17:44:19 +0000 (09:44 -0800)]
i965: Scale shader_time to compensate for resets.

Some shaders experience resets more than others, which skews the numbers
reported.  Attempt to correct for this by linearly scaling according to
the number of resets that happen.

Note that this will not be accurate if invocations of shaders have varying
times and longer invocations are more likely to reset.  However, this
should at least be better than the previous situation.
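
A rough sketch of linear scaling, assuming the driver can count how many invocations wrote a time and how many were lost to a reset. The function and parameter names are hypothetical and the exact formula used by the driver may differ.

```python
def scale_shader_time(total_cycles, written, resets):
    """Estimate cycles as if invocations lost to resets had been recorded too.

    total_cycles: cycles accumulated from invocations that wrote a time.
    written:      number of invocations that recorded a time.
    resets:       number of invocations whose time was lost to a reset.
    Assumes lost invocations cost the same on average as recorded ones.
    """
    if written == 0:
        return 0.0
    return total_cycles * (written + resets) / written


# 9 recorded invocations totalling 900 cycles, 1 lost to a reset.
print(scale_shader_time(900, 9, 1))
```

The averaging assumption is exactly the caveat noted above: if long invocations reset more often, the estimate is still too low.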

11 years agoi965: Adjust the split between shader_time_end() and shader_time_write().
Eric Anholt [Mon, 10 Dec 2012 17:21:34 +0000 (09:21 -0800)]
i965: Adjust the split between shader_time_end() and shader_time_write().

I'm about to emit other kinds of writes besides time deltas, and it
turns out with the frequency of resets, we couldn't really use the old
time delta write() function more than once in a shader.

11 years agoglsl/linker: Pack between varyings.
Paul Berry [Wed, 5 Dec 2012 22:37:19 +0000 (14:37 -0800)]
glsl/linker: Pack between varyings.

This patch implements varying packing between varyings.

Previously, each varying occupied components 0 through N-1 of its
assigned varying slot, so there was no way to pack two varyings into
the same slot.  For example, if the varyings were a float, a vec2, a
vec3, and another vec2, they would be stored as follows:

 <----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
  *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
 flt  x   x   x  <vec2->  x   x  <--vec3--->  x  <vec2->  x   x   varyings

(Each * represents a varying component, and the "x"s represent wasted
space).

This change packs the varyings together to eliminate wasted space
between varyings, like so:

 <----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
  *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
 <vec2-> <vec2-> flt <--vec3--->  x   x   x   x   x   x   x   x   varyings

Note that we take advantage of the sort order introduced in previous
patches (vec4's first, then vec2's, then scalars, then vec3's) to
minimize how often a varying is "double parked" (split across varying
slots).
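
The packed layout in the second diagram can be reproduced with a small sketch. It assumes all varyings share one packing class and are already in the sorted order mentioned above. `pack_between_varyings` is a hypothetical name, and slots are numbered from 0 rather than 1.

```python
COMPONENTS_PER_SLOT = 4


def pack_between_varyings(sorted_varyings):
    """Assign consecutive components to varyings that are already sorted.

    sorted_varyings: list of (name, component_count) pairs.
    Returns (layout, slots_used), where layout maps each name to a
    (slot, first_component_within_slot) pair.
    """
    layout = {}
    offset = 0
    for name, components in sorted_varyings:
        layout[name] = divmod(offset, COMPONENTS_PER_SLOT)
        offset += components
    # Round up to whole slots.
    slots_used = (offset + COMPONENTS_PER_SLOT - 1) // COMPONENTS_PER_SLOT
    return layout, slots_used


# The example from the message: two vec2's, a float, and a vec3.
layout, slots_used = pack_between_varyings(
    [("vec2_a", 2), ("vec2_b", 2), ("flt", 1), ("vec3", 3)]
)
print(layout, slots_used)
```

Here the four varyings fit in two slots instead of four, matching the diagram.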

Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.

11 years agoglsl/linker: Pack within compound varyings.
Paul Berry [Mon, 10 Dec 2012 04:59:26 +0000 (20:59 -0800)]
glsl/linker: Pack within compound varyings.

This patch implements varying packing within varyings that are
composed of multiple vectors of size less than 4 (e.g. arrays of
vec2's, or matrices with height less than 4).

Previously, such varyings used up a full 4-wide varying slot for each
constituent vector, meaning that some of the components of each
varying slot went unused.  For example, a mat4x3 would be stored as
follows:

 <----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
  *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
 <-column1->  x  <-column2->  x  <-column3->  x  <-column4->  x   matrix

(Each * represents a varying component, and the "x"s represent wasted
space).  In addition to wasting precious varying components, this
layout complicated transform feedback, since the constituents of the
varying are expected to be output to the transform feedback buffer
contiguously (e.g. without gaps between the columns, in the case of a
matrix).

This change packs the constituents of each varying together so that
all wasted space is at the end.  For the mat4x3 example, this looks
like so:

 <----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
  *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
 <-column1-> <-column2-> <-column3-> <-column4->  x   x   x   x   matrix

Note that matrix columns 2 and 3 now cross a boundary between varying
slots (a characteristic I call "double parking" of a varying).

We don't bother trying to eliminate the wasted space at the end of the
varying, since the patch that follows will take care of that.

Since compiler back-ends don't (yet) support this packed layout, the
lower_packed_varyings function is used to rewrite the shader into a
form where each varying occupies a full varying slot.  Later, if we
add native back-end support for varying packing, we can make this
lowering pass optional.
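
To make the mat4x3 example concrete, the sketch below computes where each column lands once the constituents are packed contiguously. `packed_column_layout` is a hypothetical helper, and slots and columns are numbered from 0 here, unlike the diagrams.

```python
COMPONENTS_PER_SLOT = 4


def packed_column_layout(num_columns, column_height):
    """Return (slot, first_component, double_parked) for each column.

    Columns are laid out back to back with no padding between them.
    double_parked is True when a column straddles two varying slots.
    """
    layout = []
    for column in range(num_columns):
        first = column * column_height
        last = first + column_height - 1
        slot, first_component = divmod(first, COMPONENTS_PER_SLOT)
        double_parked = slot != last // COMPONENTS_PER_SLOT
        layout.append((slot, first_component, double_parked))
    return layout


# mat4x3: 4 columns of 3 components each.
for column, entry in enumerate(packed_column_layout(4, 3)):
    print(column, entry)
```

For a mat4x3 the second and third columns come out double parked, which agrees with the note above.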

Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.

11 years agogallium: Disable varying packing on hardware with <=8 texture indirections.
Paul Berry [Thu, 13 Dec 2012 20:45:54 +0000 (12:45 -0800)]
gallium: Disable varying packing on hardware with <=8 texture indirections.

In practice this will disable varying packing on R300, R400, i915g,
and nv30.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
11 years agomesa: Add an option so driver can opt out of varying packing.
Paul Berry [Thu, 13 Dec 2012 20:45:37 +0000 (12:45 -0800)]
mesa: Add an option so driver can opt out of varying packing.

On hardware that supports a limited number of texture indirections,
varying packing will consume an extra texture indirection, since ALU
operations are needed in the fragment shader to unpack the varyings
before any texturing can be done.

This patch introduces a new driver option,
ctx->Const.DisableVaryingPacking, which can be used by a driver to opt
out of varying packing if the extra texture indirection is costly
enough to outweigh the advantages of packing varyings.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
11 years agoglsl: Add a lowering pass for packing varyings.
Paul Berry [Sun, 9 Dec 2012 23:25:38 +0000 (15:25 -0800)]
glsl: Add a lowering pass for packing varyings.

This lowering pass generates GLSL code that manually packs varyings
into vec4 slots, for the benefit of back-ends that don't support
packed varyings natively.

No functional change--the lowering pass is not yet used.

Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Don't use ir_hierarchical_visitor--just loop over instructions
directly.  Also, make the names of the packed varyings include the
names of the original varyings that were packed into them.

11 years agoglsl/linker: Sort varyings by packing class, then vector size.
Paul Berry [Wed, 5 Dec 2012 18:19:19 +0000 (10:19 -0800)]
glsl/linker: Sort varyings by packing class, then vector size.

This patch paves the way for varying packing by adding a sorting step
before varying assignment, which sorts the varyings into an order that
increases the likelihood of being able to find an efficient packing.

First, varyings are sorted into "packing classes" by considering
attributes that can't be mixed during varying packing--at the moment
this includes base type (float/int/uint/bool) and interpolation mode
(smooth/noperspective/flat/centroid), though later we will hopefully
be able to relax some of these restrictions.  The number of packing
classes places an upper limit on the amount of space that must be
wasted by varying packing, since in theory a shader might have 4n+1
components worth of varyings in each of m packing classes, resulting
in 3m components worth of wasted space.

Then, within each packing class, varyings are sorted by vector size,
with vec4's coming first, then vec2's, then scalars, and then finally
vec3's.  The motivation for this order is that it ensures that the
only vectors that might be "double parked" (with part of the vector in
one varying slot and the remainder in another) are vec3's.

Note that the varyings aren't actually packed yet, merely placed in an
order that will facilitate packing.
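
The two-level ordering can be expressed as a sort key. This is an illustrative sketch only. The `Varying` tuple, the use of a (base type, interpolation) pair as the packing class, and the relative order of the classes themselves are assumptions.

```python
from collections import namedtuple

# Hypothetical record; the linker tracks far more than this.
Varying = namedtuple("Varying", "name base_type interpolation components")

# vec4's first, then vec2's, then scalars, then vec3's.
SIZE_ORDER = {4: 0, 2: 1, 1: 2, 3: 3}


def varying_sort_key(varying):
    """Group by packing class first, then order by vector size."""
    packing_class = (varying.base_type, varying.interpolation)
    return (packing_class, SIZE_ORDER[varying.components])


def sort_varyings(varyings):
    return sorted(varyings, key=varying_sort_key)


varyings = [
    Varying("normal", "float", "smooth", 3),
    Varying("id", "int", "flat", 1),
    Varying("uv", "float", "smooth", 2),
    Varying("pos", "float", "smooth", 4),
    Varying("fog", "float", "smooth", 1),
]
print([v.name for v in sort_varyings(varyings)])
```

Within a class, vec4's leave the running offset at a multiple of 4, vec2's then start at an even offset, and scalars fit anywhere, so only the trailing vec3's can straddle a slot boundary.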

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoglsl/linker: Subdivide the first phase of varying assignment.
Paul Berry [Tue, 4 Dec 2012 23:55:59 +0000 (15:55 -0800)]
glsl/linker: Subdivide the first phase of varying assignment.

This patch further subdivides the loop that assigns varying locations
into two phases: one phase to match up the varyings between shader
stages, and one phase to assign them varying locations.

In between the two phases the matched varyings are stored in a new
data structure called varying_matches.  This will free us to be able
to assign varying locations in any order, which will pave the way for
packing varyings.

Note that the new varying_matches::assign_locations() function returns
the number of varying slots that were used; this return value will be
used in a future patch.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoglsl/linker: Defer recording transform feedback locations.
Paul Berry [Tue, 4 Dec 2012 18:34:45 +0000 (10:34 -0800)]
glsl/linker: Defer recording transform feedback locations.

This patch subdivides the loop that assigns varying locations into two
phases: one phase to match up varyings between shader stages (and
assign them varying locations), and a second phase to record the
varying assignments for use by transform feedback.

This paves the way for varying packing, which will require us to
further subdivide the first phase.

In addition, it lets us avoid a clumsy O(n^2) algorithm, since we can
now record the locations of all transform feedback varyings in a
single pass through the tfeedback_decls array, rather than have to
iterate through the array after assigning each varying.
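
The shape of the two phases might look like the sketch below. The function names and the plain dict of locations are hypothetical stand-ins; only the idea of one pass over the transform feedback declarations after all locations are known comes from the message.

```python
def assign_varying_locations(varying_names):
    """Phase 1: match varyings and give each one a location."""
    return {name: slot for slot, name in enumerate(varying_names)}


def record_tfeedback_locations(tfeedback_decls, locations):
    """Phase 2: a single pass over the transform feedback declarations.

    Each declaration is resolved with one dictionary lookup, instead of
    rescanning the declarations after every individual assignment.
    """
    recorded = []
    for name in tfeedback_decls:
        if name not in locations:
            raise ValueError("transform feedback varying %r not found" % name)
        recorded.append((name, locations[name]))
    return recorded


locations = assign_varying_locations(["color", "normal", "uv"])
print(record_tfeedback_locations(["uv", "color"], locations))
```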

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoglsl: Create a field to store fractional varying locations.
Paul Berry [Wed, 5 Dec 2012 18:47:55 +0000 (10:47 -0800)]
glsl: Create a field to store fractional varying locations.

Currently, the location of each varying is recorded in ir_variable as
a multiple of the size of a vec4.  In order to pack varyings, we need
to be able to record, e.g. that a vec2 is stored in the second half of
a varying slot rather than the first half.

This patch introduces a field ir_variable::location_frac, which
represents the offset within a vec4 where a varying's value is stored.
Varyings that are not subject to packing will always have a
location_frac value of zero.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
11 years agoglsl/linker: Make separate ir_variable field to mean "unmatched".
Paul Berry [Tue, 4 Dec 2012 23:17:01 +0000 (15:17 -0800)]
glsl/linker: Make separate ir_variable field to mean "unmatched".

Previously, the linker used a value of -1 in ir_variable::location to
denote a generic input or output of the shader that had not yet been
matched up to a variable in another pipeline stage.

This patch introduces a new ir_variable field,
is_unmatched_generic_inout, for that purpose.

In future patches, this will allow us to separate the process of
matching varyings between shader stages from the processes of
assigning locations to those varyings.  That will in turn pave the way
for packing varyings.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
11 years agoglsl/linker: Always invalidate shader ins/outs, even in corner cases.
Paul Berry [Wed, 5 Dec 2012 15:17:07 +0000 (07:17 -0800)]
glsl/linker: Always invalidate shader ins/outs, even in corner cases.

Previously, link_invalidate_variable_locations() was only called
during assign_attribute_or_color_locations() and
assign_varying_locations().  This meant that in the corner case when
there was only a vertex shader, and varyings were being captured by
transform feedback, link_invalidate_variable_locations() wasn't being
called for the varyings.

This patch migrates the calls to link_invalidate_variable_locations()
to link_shaders(), so that they will be called in all circumstances.
In addition, it modifies the call semantics so that
link_invalidate_variable_locations() need only be called once per
shader stage (rather than once for inputs and once for outputs).

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
11 years agoglsl/lower_clip_distance: Update symbol table.
Paul Berry [Tue, 4 Dec 2012 19:11:02 +0000 (11:11 -0800)]
glsl/lower_clip_distance: Update symbol table.

This patch modifies the clip distance lowering pass so that the new
symbol it generates (gl_ClipDistanceMESA) is added to the shader's
symbol table.

This will allow a later patch to modify the linker so that it finds
transform feedback varyings using the symbol table rather than having
to iterate through all the declarations in the shader.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>