git.libre-soc.org Git - mesa.git/log

Eric Anholt [Tue, 4 Dec 2012 01:58:03 +0000 (17:58 -0800)]

i965/fs: Schedule instructions both before and after register allocation.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Eric Anholt [Fri, 14 Dec 2012 22:02:34 +0000 (14:02 -0800)]

i965: Make sure that the shader_time report at context destroy happens.

Otherwise, you end up with some report from within a second of context
destroy, which is now what you really want for testing the impact of
changes

commit | commitdiff | tree

Eric Anholt [Mon, 10 Dec 2012 18:22:41 +0000 (10:22 -0800)]

i965: Print a total time for the different shader stages.

Sometimes I've got a patch for a performance optimization that's not
showing a statistically significant performance difference on reported
FPS, but still seems like a good idea because it ought to reduce time
spent in the shader. If I can see the total number of cycles spent in
the shader stage being optimized, it may show that the patch is still
worthwhile (or point out that it's actually broken in some way).

commit | commitdiff | tree

Eric Anholt [Mon, 10 Dec 2012 17:44:19 +0000 (09:44 -0800)]

i965: Scale shader_time to compensate for resets.

Some shaders experience resets more than others, which skews the numbers
reported. Attempt to correct for this by linearly scaling according to
the number of resets that happen.

Note that will not be accurate if invocations of shaders have varying
times and longer invocations are more likely to reset. However, this
should at least be better than the previous situation.

commit | commitdiff | tree

Eric Anholt [Mon, 10 Dec 2012 17:21:34 +0000 (09:21 -0800)]

i965: Adjust the split between shader_time_end() and shader_time_write().

I'm about to emit other kinds of writes besides time deltas, and it
turns out with the frequency of resets, we couldn't really use the old
time delta write() function more than once in a shader.

commit | commitdiff | tree

Paul Berry [Wed, 5 Dec 2012 22:37:19 +0000 (14:37 -0800)]

glsl/linker: Pack between varyings.

This patch implements varying packing between varyings.

Previously, each varying occupied components 0 through N-1 of its
assigned varying slot, so there was no way to pack two varyings into
the same slot.  For example, if the varyings were a float, a vec2, a
vec3, and another vec2, they would be stored as follows:

<----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
  *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
flt  x   x   x  <vec2->  x   x  <--vec3--->  x  <vec2->  x   x   varyings

(Each * represents a varying component, and the "x"s represent wasted
space).

This change packs the varyings together to eliminate wasted space
between varyings, like so:

<----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
  *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
<vec2-> <vec2-> flt <--vec3--->  x   x   x   x   x   x   x   x   varyings

Note that we take advantage of the sort order introduced in previous
patches (vec4's first, then vec2's, then scalars, then vec3's) to
minimize how often a varying is "double parked" (split across varying
slots).

Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.

commit | commitdiff | tree

Paul Berry [Mon, 10 Dec 2012 04:59:26 +0000 (20:59 -0800)]

glsl/linker: Pack within compound varyings.

This patch implements varying packing within varyings that are
composed of multiple vectors of size less than 4 (e.g. arrays of
vec2's, or matrices with height less than 4).

Previously, such varyings used up a full 4-wide varying slot for each
constituent vector, meaning that some of the components of each
varying slot went unused.  For example, a mat4x3 would be stored as
follows:

<----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
  *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
<-column1->  x  <-column2->  x  <-column3->  x  <-column4->  x   matrix

(Each * represents a varying component, and the "x"s represent wasted
space).  In addition to wasting precious varying components, this
layout complicated transform feedback, since the constituents of the
varying are expected to be output to the transform feedback buffer
contiguously (e.g. without gaps between the columns, in the case of a
matrix).

This change packs the constituents of each varying together so that
all wasted space is at the end.  For the mat4x3 example, this looks
like so:

<----slot1----> <----slot2----> <----slot3----> <----slot4---->  slots
  *   *   *   *   *   *   *   *   *   *   *   *   *   *   *   *
<-column1-> <-column2-> <-column3-> <-column4->  x   x   x   x   matrix

Note that matrix columns 2 and 3 now cross a boundary between varying
slots (a characteristic I call "double parking" of a varying).

We don't bother trying to eliminate the wasted space at the end of the
varying, since the patch that follows will take care of that.

Since compiler back-ends don't (yet) support this packed layout, the
lower_packed_varyings function is used to rewrite the shader into a
form where each varying occupies a full varying slot.  Later, if we
add native back-end support for varying packing, we can make this
lowering pass optional.

Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Skip varying packing if ctx->Const.DisableVaryingPacking is true.

commit | commitdiff | tree

Paul Berry [Thu, 13 Dec 2012 20:45:54 +0000 (12:45 -0800)]

gallium: Disable varying packing on hardware with <=8 texture indirections.

In practice this will disable varying packing on R300, R400, i915g,
and nv30.

Reviewed-by: Marek Olšák <maraeo@gmail.com>

commit | commitdiff | tree

Paul Berry [Thu, 13 Dec 2012 20:45:37 +0000 (12:45 -0800)]

mesa: Add an option so driver can opt out of varying packing.

On hardware that supports a limited number of texture indirections,
varying packing will comsume an extra texture indirection, since ALU
operations are needed in the fragment shader to unpack the varyings
before any texturing can be done.

This patch introduces a new driver option,
ctx->Const.DisableVaryingPacking, which can be used by a driver to opt
out of varying packing if the extra texture indirection is costly
enough to outweigh the advantages of packing varyings.

Reviewed-by: Marek Olšák <maraeo@gmail.com>

commit | commitdiff | tree

Paul Berry [Sun, 9 Dec 2012 23:25:38 +0000 (15:25 -0800)]

glsl: Add a lowering pass for packing varyings.

This lowering pass generates GLSL code that manually packs varyings
into vec4 slots, for the benefit of back-ends that don't support
packed varyings natively.

No functional change--the lowering pass is not yet used.

Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Don't use ir_hierarchical_visitor--just loop over instructions
directly. Also, make the names of the packed varyings include the
names of the original varyings that were packed into them.

commit | commitdiff | tree

Paul Berry [Wed, 5 Dec 2012 18:19:19 +0000 (10:19 -0800)]

glsl/linker: Sort varyings by packing class, then vector size.

This patch paves the way for varying packing by adding a sorting step
before varying assignment, which sorts the varyings into an order that
increases the likelihood of being able to find an efficient packing.

First, varyings are sorted into "packing classes" by considering
attributes that can't be mixed during varying packing--at the moment
this includes base type (float/int/uint/bool) and interpolation mode
(smooth/noperspective/flat/centroid), though later we will hopefully
be able to relax some of these restrictions. The number of packing
classes places an upper limit on the amount of space that must be
wasted by varying packing, since in theory a shader might nave 4n+1
components worth of varyings in each of m packing classes, resulting
in 3m components worth of wasted space.

Then, within each packing class, varyings are sorted by vector size,
with vec4's coming first, then vec2's, then scalars, and then finally
vec3's. The motivation for this order is that it ensures that the
only vectors that might be "double parked" (with part of the vector in
one varying slot and the remainder in another) are vec3's.

Note that the varyings aren't actually packed yet, merely placed in an
order that will facilitate packing.

Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Paul Berry [Tue, 4 Dec 2012 23:55:59 +0000 (15:55 -0800)]

glsl/linker: Subdivide the first phase of varying assignment.

This patch further subdivides the loop that assigns varying locations
into two phases: one phase to match up the varyings between shader
stages, and one phase to assign them varying locations.

In between the two phases the matched varyings are stored in a new
data structure called varying_matches. This will free us to be able
to assign varying locations in any order, which will pave the way for
packing varyings.

Note that the new varying_matches::assign_locations() function returns
the number of varying slots that were used; this return value will be
used in a future patch.

Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Paul Berry [Tue, 4 Dec 2012 18:34:45 +0000 (10:34 -0800)]

glsl/linker: Defer recording transform feedback locations.

This patch subdivides the loop that assigns varying locations into two
phases: one phase to match up varyings between shader stages (and
assign them varying locations), and a second phase to record the
varying assignments for use by transform feedback.

This paves the way for varying packing, which will require us to
further subdivide the first phase.

In addition, it lets us avoid a clumsy O(n^2) algorithm, since we can
now record the locations of all transform feedback varyings in a
single pass through the tfeedback_decls array, rather than have to
iterate through the array after assigning each varying.

Reviewed-by: Eric Anholt <eric@anholt.net>

commit | commitdiff | tree

Paul Berry [Wed, 5 Dec 2012 18:47:55 +0000 (10:47 -0800)]

glsl: Create a field to store fractional varying locations.

Currently, the location of each varying is recorded in ir_variable as
a multiple of the size of a vec4. In order to pack varyings, we need
to be able to record, e.g. that a vec2 is stored in the second half of
a varying slot rather than the first half.

This patch introduces a field ir_variable::location_frac, which
represents the offset within a vec4 where a varying's value is stored.
Varyings that are not subject to packing will always have a
location_frac value of zero.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Paul Berry [Tue, 4 Dec 2012 23:17:01 +0000 (15:17 -0800)]

glsl/linker: Make separate ir_variable field to mean "unmatched".

Previously, the linker used a value of -1 in ir_variable::location to
denote a generic input or output of the shader that had not yet been
matched up to a variable in another pipeline stage.

This patch introduces a new ir_variable field,
is_unmatched_generic_inout, for that purpose.

In future patches, this will allow us to separate the process of
matching varyings between shader stages from the processes of
assigning locations to those varying. That will in turn pave the way
for packing varyings.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Paul Berry [Wed, 5 Dec 2012 15:17:07 +0000 (07:17 -0800)]

glsl/linker: Always invalidate shader ins/outs, even in corner cases.

Previously, link_invalidate_variable_locations() was only called
during assign_attribute_or_color_locations() and
assign_varying_locations(). This meant that in the corner case when
there was only a vertex shader, and varyings were being captured by
transform feedback, link_invalidate_variable_locations() wasn't being
called for the varyings.

This patch migrates the calls to link_invalidate_variable_locations()
to link_shaders(), so that they will be called in all circumstances.
In addition, it modifies the call semantics so that
link_invalidate_variable_locations() need only be called once per
shader stage (rather than once for inputs and once for outputs).

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Paul Berry [Tue, 4 Dec 2012 19:11:02 +0000 (11:11 -0800)]

glsl/lower_clip_distance: Update symbol table.

This patch modifies the clip distance lowering pass so that the new
symbol it generates (glClipDistanceMESA) is added to the shader's
symbol table.

This will allow a later patch to modify the linker so that it finds
transform feedback varyings using the symbol table rather than having
to iterate through all the declarations in the shader.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Tapani Pälli [Thu, 13 Dec 2012 08:56:08 +0000 (10:56 +0200)]

android: build fix for libmesa_glsl_utils

hash_table.c compilation requires ralloc.h include path

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>

commit | commitdiff | tree