mesa.git
11 years agonv50,nvc0: disable DEPTH_RANGE_NEAR/FAR clipping during blit
Christoph Bumiller [Sun, 31 Mar 2013 20:10:02 +0000 (22:10 +0200)]
nv50,nvc0: disable DEPTH_RANGE_NEAR/FAR clipping during blit

We send position.z == 0, but DEPTH_RANGE may be some arbitrary range
not including 0 (for example in piglit's hiz tests).

11 years agost/mesa: fix bitmap,drawpix,drawtex for PIPE_CAP_TGSI_TEXCOORD
Christoph Bumiller [Sat, 30 Mar 2013 13:57:21 +0000 (14:57 +0100)]
st/mesa: fix bitmap,drawpix,drawtex for PIPE_CAP_TGSI_TEXCOORD

NOTE: Changed the semantic index for the drawtex coordinate to
be the texture unit index instead of always 0.
Not sure if this is correct but since the value seems to depend
on the unit it would make sense to use different varying slots.

11 years agonouveau: accelerate buffer copies in resource_copy_region
Christoph Bumiller [Sat, 30 Mar 2013 14:55:20 +0000 (15:55 +0100)]
nouveau: accelerate buffer copies in resource_copy_region

11 years agonvc0: demagic some of the NVE4_COMPUTE_UPLOAD methods
Christoph Bumiller [Mon, 1 Apr 2013 19:46:24 +0000 (21:46 +0200)]
nvc0: demagic some of the NVE4_COMPUTE_UPLOAD methods

It's actually the same as P2MF.

11 years agonvc0: read PM counters for each warp scheduler separately
Christoph Bumiller [Tue, 2 Apr 2013 16:24:45 +0000 (18:24 +0200)]
nvc0: read PM counters for each warp scheduler separately

11 years agonvc0: add some metrics to driver specific queries
Christoph Bumiller [Mon, 1 Apr 2013 15:25:40 +0000 (17:25 +0200)]
nvc0: add some metrics to driver specific queries

11 years agonvc0: add some driver statistics queries
Christoph Bumiller [Fri, 29 Mar 2013 15:30:58 +0000 (16:30 +0100)]
nvc0: add some driver statistics queries

11 years agonvc0: disable compressed storage type 0xdb for now
Christoph Bumiller [Sun, 31 Mar 2013 18:10:23 +0000 (20:10 +0200)]
nvc0: disable compressed storage type 0xdb for now

Single-sample color compression doesn't seem that useful anyway.

11 years agonvc0: use correct hw query for PRIMITIVES_GENERATED
Christoph Bumiller [Fri, 29 Mar 2013 14:11:16 +0000 (15:11 +0100)]
nvc0: use correct hw query for PRIMITIVES_GENERATED

It was the same as SO_STATISTICS[1] before.

11 years agonvc0: use fence to check state of queries that don't write sequence
Christoph Bumiller [Fri, 29 Mar 2013 12:50:44 +0000 (13:50 +0100)]
nvc0: use fence to check state of queries that don't write sequence

This still isn't optimal, since the fence will signal a bit late,
but better than checking on the bo, which may never be ready if it
is shared (which is likely).

11 years agogallium/hud: add support for PIPE_QUERY_PIPELINE_STATISTICS
Christoph Bumiller [Fri, 29 Mar 2013 12:56:35 +0000 (13:56 +0100)]
gallium/hud: add support for PIPE_QUERY_PIPELINE_STATISTICS

Also, renamed "pixels-rendered" to "samples-passed" because the
occlusion counter increments even if colour and depth writes are
disabled, or (on some implementations) for killed fragments that
passed the depth test when PS early_fragment_tests is set.

11 years agogallium/docs: fix definition of PIPE_QUERY_SO_STATISTICS
Christoph Bumiller [Fri, 29 Mar 2013 13:30:49 +0000 (14:30 +0100)]
gallium/docs: fix definition of PIPE_QUERY_SO_STATISTICS

Reviewed-by: Marek Olšák <maraeo@gmail.com>
11 years agogallium: add PIPE_CAP_QUERY_PIPELINE_STATISTICS
Christoph Bumiller [Fri, 29 Mar 2013 12:02:49 +0000 (13:02 +0100)]
gallium: add PIPE_CAP_QUERY_PIPELINE_STATISTICS

Reviewed-by: Marek Olšák <maraeo@gmail.com>
11 years agoi965: Reduce code duplication in handling of depth, stencil, and HiZ.
Paul Berry [Tue, 26 Mar 2013 20:24:43 +0000 (13:24 -0700)]
i965: Reduce code duplication in handling of depth, stencil, and HiZ.

This patch consolidates duplicate code in the brw_depthbuffer and
gen7_depthbuffer state atoms.  Previously, these state atoms contained
5 chunks of code for emitting the _3DSTATE_DEPTH_BUFFER packet (3 for
Gen4-6 and 2 for Gen7).  Also a lot of logic for determining the
appropriate buffer setup was duplicated between the Gen4-6 and Gen7
functions.

This refactor splits the code into three separate functions:
brw_emit_depthbuffer(), which determines the appropriate buffer setup
in a mostly generation-independent way, brw_emit_depth_stencil_hiz(),
which emits the appropriate state packets for Gen4-6, and
gen7_emit_depth_stencil_hiz(), which emits the appropriate state
packets for Gen7.

Tested using Piglit on Gen5-7 (no regressions).

v2: Re-word some comments.  Fix an assertion that incorrectly
prohibited packed depth/stencil formats on Gen6 (these are allowed
provided that HiZ is disabled).

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoRevert "glsl: Replace constant-index vector array accesses with swizzles"
Paul Berry [Tue, 2 Apr 2013 16:35:32 +0000 (09:35 -0700)]
Revert "glsl: Replace constant-index vector array accesses with swizzles"

This reverts commit dbf94d105a48b7aafb2c8cf64d8b4392d87efea1, which
was working around a bug in the handling of array indexing when
constant folding built-in functions.  Now that the constant folding
bug has been fixed, the workaround is no longer needed.

11 years agoglsl: Fix array indexing when constant folding built-in functions.
Paul Berry [Fri, 29 Mar 2013 20:34:51 +0000 (13:34 -0700)]
glsl: Fix array indexing when constant folding built-in functions.

Mesa constant-folds built-in functions by using a miniature GLSL
interpreter (see
ir_function_signature::constant_expression_evaluate_expression_list()).
This interpreter had a bug in its handling of array indexing, which
caused expressions like "m[i][j]" (where m is a matrix) to be handled
incorrectly.  Specifically, it incorrectly treated j as indexing into
the whole matrix (rather than indexing just into the vector m[i]); as
a result the offset computed for m[i] was lost and m[i][j] was treated
as m[j][0].
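
As a rough illustration of the indexing bug described above, here is a
minimal Python sketch. It is not the interpreter's actual code: the flat
column-major layout and both helper names are assumptions made for the
example.

```python
# Hypothetical helpers: a matrix is modelled as a flat, column-major list,
# so column i starts at offset i * rows.

def element_offset_fixed(rows, i, j):
    # m[i] selects column i; j then indexes inside that column vector.
    return i * rows + j

def element_offset_buggy(rows, i, j):
    # The bug: j was applied to the whole matrix and the offset computed
    # for m[i] was lost, so m[i][j] resolved to m[j][0].
    return j * rows + 0

# A mat3 whose columns are [0,1,2], [3,4,5], [6,7,8].
mat3 = [float(k) for k in range(9)]
```

With this layout, m[1][2] should read element 5 of the flat list, while
the buggy offset reads element 6, which is m[2][0].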

Fixes piglit tests inverse-mat[234].{vert,frag}.

NOTE: This is a candidate for the 9.1 and 9.0 branches.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57436

11 years agogallivm: bring back optimized but incorrect float to smallfloat optimizations
Roland Scheidegger [Tue, 2 Apr 2013 15:47:30 +0000 (17:47 +0200)]
gallivm: bring back optimized but incorrect float to smallfloat optimizations

Conceptually the same as previously done in float_to_half.
Should cut down number of instructions from 14 to 10 or so, but
will promote some NaNs to Infs, so it's disabled.
It gets a bit tricky though handling all the cases correctly...
Passes basic tests either way (though there are no tests testing special
cases, but some manual tests injecting them seemed promising).

v2: style and comment fixes suggested by Jose

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agogallivm: consolidate code for float-to-half and float-to-packed conversion.
Roland Scheidegger [Tue, 2 Apr 2013 15:41:44 +0000 (17:41 +0200)]
gallivm: consolidate code for float-to-half and float-to-packed conversion.

This replaces the existing float-to-half implementation.
There are definitely a couple of differences - the old implementation
had unspecified(?) rounding behavior, and could at least in theory
construct Inf values out of NaNs. NaNs and Infs should now always be
properly propagated, and rounding behavior is now towards zero
(note this means too large but non-Infinity values get propagated to max
representable value, not Infinity).
The implementation will definitely not match util code, however (which
does nearest rounding, which also means too large values will get
propagated to Infinity).
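
The rounding behavior described above can be sketched with a scalar
Python model. This is only an illustration under stated assumptions: the
real code emits LLVM IR for vectors, the function name is hypothetical,
and the quiet-NaN pattern returned here is a choice of this sketch.

```python
import struct

def float_to_half_rtz(x):
    """Return the 16-bit IEEE half pattern of x, rounding toward zero."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = (bits >> 16) & 0x8000
    exp = (bits >> 23) & 0xFF
    man = bits & 0x7FFFFF

    if exp == 0xFF:
        # Inf stays Inf and NaN stays NaN (never turned into Inf).
        return sign | (0x7E00 if man else 0x7C00)

    e = exp - 127 + 15  # rebias the exponent for half precision
    if e >= 31:
        # Too large but finite: truncation yields the max representable
        # half (65504), not Infinity.
        return sign | 0x7BFF
    if e <= 0:
        # Result is a half denormal or zero; shift in the implicit 1.
        if e < -10:
            return sign
        return sign | ((man | 0x800000) >> (14 - e))
    # Normal case: drop the low 13 mantissa bits (round toward zero).
    return sign | (e << 10) | (man >> 13)
```

For comparison, Python's own `struct` half format rounds to nearest, so a
value such as 1.001708984375 packs to 0x3C02 there but to 0x3C01 with the
truncating sketch.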

Also fix a bogus round mask probably leading to rounding bugs...
v2: fix a logic bug in handling infs/nans.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agor600g: don't reserve more stack space than required v5
Vadim Girlin [Tue, 2 Apr 2013 15:33:40 +0000 (19:33 +0400)]
r600g: don't reserve more stack space than required v5

A reduced stack size allows more threads to run in some cases,
improving performance for shaders that use the stack (that is,
shaders with control flow instructions), e.g. in Unigine-based apps.

v4: implement exact computation taking into account wavefront size
v5: add cases for RV620, RS880

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
11 years agor600g: fix range handling for tgsi input declarations v2
Vadim Girlin [Tue, 2 Apr 2013 15:32:26 +0000 (19:32 +0400)]
r600g: fix range handling for tgsi input declarations v2

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
11 years agogallium/hud: do .xxxx swizzling for the font texture in the fragment shader
Marek Olšák [Tue, 2 Apr 2013 01:30:09 +0000 (03:30 +0200)]
gallium/hud: do .xxxx swizzling for the font texture in the fragment shader

This allows using L8 and R8 for the font if I8 isn't supported.

Tested-by: Brian Paul <brianp@vmware.com>
11 years agohud: flush/unmap the vertex buffer before drawing
Brian Paul [Mon, 1 Apr 2013 22:46:06 +0000 (16:46 -0600)]
hud: flush/unmap the vertex buffer before drawing

The VMware svga driver is picky about making sure the VBO is unmapped
before drawing.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
11 years agodraw: use pipe_transfer_unmap() to match pipe_transfer_map()
Brian Paul [Mon, 1 Apr 2013 22:44:01 +0000 (16:44 -0600)]
draw: use pipe_transfer_unmap() to match pipe_transfer_map()

11 years agogallivm: fix signed small float to float conversion
Roland Scheidegger [Tue, 2 Apr 2013 11:20:24 +0000 (13:20 +0200)]
gallivm: fix signed small float to float conversion

Introduced by 5f41e08cf39d585d600aa506cdcd2f5380c60ddd,
just a silly typo.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=62921.

11 years agoradeonsi: add instance divisor support v3
Christian König [Fri, 22 Mar 2013 14:59:22 +0000 (15:59 +0100)]
radeonsi: add instance divisor support v3

v2: reduce key size, don't copy key around too much.
v3: remove key size reduction

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agoradeonsi: add start instance support
Christian König [Thu, 21 Mar 2013 17:30:23 +0000 (18:30 +0100)]
radeonsi: add start instance support

This works differently than on R600; we need to add the start instance manually.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agoradeonsi: add instanceid support
Christian König [Thu, 21 Mar 2013 17:02:52 +0000 (18:02 +0100)]
radeonsi: add instanceid support

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agoradeon/llvm: move system value fetching to common code
Christian König [Thu, 21 Mar 2013 16:37:37 +0000 (17:37 +0100)]
radeon/llvm: move system value fetching to common code

This should be used by both SI and R600.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
11 years agoradeonsi: Handle arbitrary 2-byte formats in resource_copy_region
Michel Dänzer [Wed, 27 Mar 2013 11:43:32 +0000 (12:43 +0100)]
radeonsi: Handle arbitrary 2-byte formats in resource_copy_region

Fixes mplayer -vo vdpau OSD.

NOTE: This is a candidate for the 9.1 branch.

Reported-by: Igor Vagulin <igor.vagulin@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Tested-by: Christian König <christian.koenig@amd.com>
11 years agonvc0: Fix fd leak in nvc0_create_decoder
Maarten Lankhorst [Sun, 24 Mar 2013 13:37:41 +0000 (14:37 +0100)]
nvc0: Fix fd leak in nvc0_create_decoder

NOTE: This is a candidate for the 9.0 and 9.1 branches.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
11 years agoGLSL: fix lower_jumps to report progress properly
Aras Pranckevicius [Fri, 1 Mar 2013 10:05:11 +0000 (12:05 +0200)]
GLSL: fix lower_jumps to report progress properly

A fix for lower_jumps progress reporting, very much like the similar
fix in c1e591eed.

NOTE: This is a candidate for stable branches.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
11 years agoi965/fs: Allow CSE on pre-gen7 varying-index uniform loads
Eric Anholt [Wed, 20 Mar 2013 00:45:02 +0000 (17:45 -0700)]
i965/fs: Allow CSE on pre-gen7 varying-index uniform loads

All the other expression types allowed here have inst->mlen == 0, and this
one has implied MRF writes for all of its payload, so nothing else in the
implementation should need to change.

Reduces SEND messages for loading from pull constants in kwin's Lanczos
shader from 16 to 6.  (Due to a deficiency in constant propagation, I
can't use the hack I did in the previous commit to test the performance
change)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.

11 years agoi965/fs: Use LD messages for pre-gen7 varying-index uniform loads
Eric Anholt [Mon, 18 Mar 2013 17:16:42 +0000 (10:16 -0700)]
i965/fs: Use LD messages for pre-gen7 varying-index uniform loads

This comes at a minor performance cost at the moment (-3.2% +/- 0.2%, n=14 on
my GM45 forced to load all uniforms through the varying-index path), but we
get a whole vec4 at a time to reuse in the next commit.

v2: Fix comment about channels in the other message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.

11 years agoi965/fs: Don't double-emit SEND dependency workarounds at control flow.
Eric Anholt [Wed, 20 Mar 2013 00:36:10 +0000 (17:36 -0700)]
i965/fs: Don't double-emit SEND dependency workarounds at control flow.

We weren't setting needs_dep[i] in the loops, so we'd continue on to
potentially add the same workaround MOVs to the later basic block
boundaries, too.  We can either set needs_dep[i] to exit through the
normal path, or we can just return since we know we're done.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Bake regs_written into the IR instead of recomputing it later.
Eric Anholt [Mon, 18 Mar 2013 18:30:57 +0000 (11:30 -0700)]
i965/fs: Bake regs_written into the IR instead of recomputing it later.

For sampler messages, it depends on the target gen, and on gen4
SIMD16-sampler-on-SIMD8-execution we were returning 4 instead of 8 like we
should.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NOTE: This is a candidate for the 9.1 branch.

11 years agoi965/fs: Clean up the setup of gen4 simd16 message destinations.
Eric Anholt [Mon, 18 Mar 2013 18:26:17 +0000 (11:26 -0700)]
i965/fs: Clean up the setup of gen4 simd16 message destinations.

I think this makes it much more obvious what's going on here.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Do CSE on gen7's varying-index pull constant loads.
Eric Anholt [Fri, 15 Mar 2013 21:43:28 +0000 (14:43 -0700)]
i965/fs: Do CSE on gen7's varying-index pull constant loads.

This is our first CSE on a regs_written() > 1 instruction, so it takes a
bit of extra fixup.  Reduces the number of loads on kwin's Lanczos shader
from 12 to 2.
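
For readers unfamiliar with the optimization, a toy local CSE pass might
look like the Python sketch below. It is not the i965 implementation: the
tuple encoding, the opcode names, and the assumption that no source is
redefined between duplicates are all hypothetical, and the extra fixup
needed for instructions that write more than one register is left out.

```python
def local_cse(instructions):
    """Replace repeated (op, srcs) computations with a MOV of the first result.

    instructions: list of (dst, op, srcs) tuples; assumes each register is
    written once, so an earlier result is still valid when reused.
    """
    seen = {}
    out = []
    for dst, op, srcs in instructions:
        key = (op, tuple(srcs))
        if key in seen:
            # Same expression already computed: copy instead of reloading.
            out.append((dst, "MOV", (seen[key],)))
        else:
            seen[key] = dst
            out.append((dst, op, tuple(srcs)))
    return out

program = [
    ("r0", "PULL_LOAD", ("surf", "idx")),
    ("r1", "PULL_LOAD", ("surf", "idx")),
    ("r2", "PULL_LOAD", ("surf", "idx2")),
    ("r3", "PULL_LOAD", ("surf", "idx")),
]
optimized = local_cse(program)
```

In this toy program the four loads collapse to two, with the duplicates
turned into register copies.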

v2: Fix compiler warning (false positive on possibly-uninitialized variable)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
NOTE: This is a candidate for the 9.1 branch.

11 years agoi965/fs: Improve performance of varying-index uniform loads on IVB.
Eric Anholt [Wed, 13 Mar 2013 21:48:55 +0000 (14:48 -0700)]
i965/fs: Improve performance of varying-index uniform loads on IVB.

Like we have done for the VS and for constant-index uniform loads, we use
the sampler engine to get caching in front of the L3 to avoid tickling the
IVB L3 bug.  This is also a bit of a functional change, as we're now
loading a vec4 instead of a single dword, though we're not taking
advantage of the other 3 components of the vec4 (yet).

With the driver hacked to always take the varying-index path for all
uniforms, improves performance of my old GLSL demo by 315% +/- 2% (n=4).
This is a major fix for some blur shaders in compositors from the
varying-index uniforms support I introduced in 9.1.

v2: Move old offset computation into the pre-gen7 path.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61554
NOTE: This is a candidate for the 9.1 branch.

11 years agoi965/fs: Avoid inappropriate optimization with regs_written > 1.
Eric Anholt [Fri, 15 Mar 2013 21:31:46 +0000 (14:31 -0700)]
i965/fs: Avoid inappropriate optimization with regs_written > 1.

Right now we don't have anything with regs_written() > 1 and !inst->mlen,
but that's about to change.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Make the fragment shader pull constants index by dwords, not vec4s.
Eric Anholt [Thu, 14 Mar 2013 21:41:37 +0000 (14:41 -0700)]
i965: Make the fragment shader pull constants index by dwords, not vec4s.

We want to load vec4s, since loading a vec4 instead of a dword is
basically no increased latency.  But for variable indexed access, the
previous requirement of aligned vec4s for a sampler LD was hard to
implement.

Note that this change only affects those messages that use the surface
format, like sampler LDs, but not the untyped data cache loads we've
used in other cases.

No significant performance difference on my GLSL demo with uniforms forced
to take the varying pull constants path (n=4).

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Make the constant surface interface take a normal byte size.
Eric Anholt [Wed, 20 Mar 2013 17:46:20 +0000 (10:46 -0700)]
i965: Make the constant surface interface take a normal byte size.

This puts the rounding-up logic into the function itself instead of all
the callers having to manage it.  Also drop an "unused" comment in gen4,
as the stride *is* used for texbos (and will be for uniforms soon).

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Move varying uniform offset computation into the helper func.
Eric Anholt [Wed, 13 Mar 2013 19:27:17 +0000 (12:27 -0700)]
i965/fs: Move varying uniform offset computation into the helper func.

I'm going to want to change the math for gen7 using sampler LD
instructions in a way that gets CSE to occur like we'd hope.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Remove creation of a MOV instruction that's never used.
Eric Anholt [Wed, 13 Mar 2013 19:17:25 +0000 (12:17 -0700)]
i965/fs: Remove creation of a MOV instruction that's never used.

We weren't inserting it into the list, so it did nothing.  This line was
replaced by the MOV/MUL block above.

NOTE: This is a candidate for the 9.1 branch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Allow constant propagation into MACH.
Eric Anholt [Fri, 15 Mar 2013 21:21:30 +0000 (14:21 -0700)]
i965/fs: Allow constant propagation into MACH.

This happens quite a bit with varying-index uniform loads.  We could also
do better by avoiding the MACH entirely, but there's no reason not to at
least take this step.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agor600g/llvm: Update LLVM_REVISION.txt
Vincent Lejeune [Mon, 1 Apr 2013 21:50:20 +0000 (23:50 +0200)]
r600g/llvm: Update LLVM_REVISION.txt

11 years agor600g/llvm: Use stack_size provided from llvm.
Vincent Lejeune [Sat, 30 Mar 2013 01:09:15 +0000 (02:09 +0100)]
r600g/llvm: Use stack_size provided from llvm.

11 years agor600g/llvm: uses function attribute to pass shader type
Vincent Lejeune [Sat, 30 Mar 2013 19:05:45 +0000 (20:05 +0100)]
r600g/llvm: uses function attribute to pass shader type

11 years agor600g/llvm: Add support for cf_alu native encode
Vincent Lejeune [Tue, 26 Mar 2013 14:00:18 +0000 (15:00 +0100)]
r600g/llvm: Add support for cf_alu native encode

11 years agoACTIVE_UNIFORM_MAX_LENGTH should include 3 extra characters for arrays.
Haixia Shi [Mon, 1 Apr 2013 20:24:55 +0000 (13:24 -0700)]
ACTIVE_UNIFORM_MAX_LENGTH should include 3 extra characters for arrays.

If the active uniform is an array, then the length of the uniform name should
include the three extra characters for the "[0]" suffix, which is required by
the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform().
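
The length computation can be sketched in Python as follows. This is an
illustration only, not Mesa's code: the function names and the
(name, is_array) tuples are hypothetical, and the sketch assumes the
reported length counts the terminating null character.

```python
def active_uniform_max_length(uniforms):
    """uniforms: iterable of (name, is_array) pairs (hypothetical model)."""
    longest = 0
    for name, is_array in uniforms:
        # Array names are reported with a "[0]" suffix: three extra
        # characters, plus one for the terminating null.
        length = len(name) + (3 if is_array else 0) + 1
        longest = max(longest, length)
    return longest

def get_active_uniform_name(name, is_array, buf_size):
    # Models a query that writes at most buf_size - 1 characters.
    full = name + "[0]" if is_array else name
    return full[:max(buf_size - 1, 0)]

uniforms = [("foobar", True), ("mvp", False)]
```

A buffer of the reported size holds the full "foobar[0]" here, while a
buffer one byte shorter would cut the name off after "foobar[0".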

This avoids the situation where the output buffer does not have enough space
to hold the "[0]" suffix, resulting in an incomplete array specification like
"foobar[0".

NOTE: This is a candidate for the 9.1 branch.

Change-Id: I41e87ba347a7169eec8c575596cc3416adbe0728
Signed-off-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Stéphane Marchesin <marcheu@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
11 years agoi965/fs: Fix bad interaction between tex swizzles and textureQueryLOD.
Matt Turner [Sun, 31 Mar 2013 04:26:57 +0000 (21:26 -0700)]
i965/fs: Fix bad interaction between tex swizzles and textureQueryLOD.

Reported-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Remove the old brw_optimize() code.
Eric Anholt [Sat, 1 Dec 2012 02:30:40 +0000 (18:30 -0800)]
i965: Remove the old brw_optimize() code.

This is now done in the VS backend before instruction emit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/vs: Add a pass to set dependency control fields on instructions.
Eric Anholt [Sat, 1 Dec 2012 02:29:34 +0000 (18:29 -0800)]
i965/vs: Add a pass to set dependency control fields on instructions.

This is a more aggressive version of the old brw_optimize() path.  Reduces
cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Dump shader source for linked shader programs.
Eric Anholt [Fri, 22 Mar 2013 23:50:58 +0000 (16:50 -0700)]
i965: Dump shader source for linked shader programs.

We dump shader source in ir_to_mesa.cpp, and we dump linked programs here,
but we had no reference from the linked programs to their source.  This
was preventing improvement of shader-db to use linked shader programs
instead of individual shader files (which is bogus, because it means we
optimize out VS outputs, and don't interpolate FS inputs!)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoclover: Fix build with LLVM 3.3
Mike Lothian [Mon, 1 Apr 2013 17:50:23 +0000 (10:50 -0700)]
clover: Fix build with LLVM 3.3

11 years agollvmpipe: use triangle subdivision to avoid fixed-point overflow issues
Brian Paul [Tue, 26 Mar 2013 04:02:47 +0000 (22:02 -0600)]
llvmpipe: use triangle subdivision to avoid fixed-point overflow issues

If we're drawing to a surface that's 2048 x 2048 pixels or larger there's
danger of fixed-point overflow in the triangle rasterization code.  That
leads to various rendering glitches.

Rather than implement some intricate changes to the rasterization code,
simply subdivide triangles into smaller subtriangles to avoid the issue.
Only do this when the drawing surface is larger than 2048 by 2048.
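
The commit message does not spell out the subdivision scheme, so the
Python sketch below is only one plausible reading: a recursive four-way
midpoint split that stops once a triangle's bounding box fits within a
limit. The function names and the `max_extent` parameter are assumptions
of this example, not llvmpipe identifiers.

```python
def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)

def area(tri):
    (x0, y0), (x1, y1), (x2, y2) = tri
    return abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2

def subdivide(tri, max_extent):
    """Split tri until each piece's bounding box is at most max_extent wide/high.

    max_extent must be positive, otherwise the recursion never ends.
    """
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    if max(max(xs) - min(xs), max(ys) - min(ys)) <= max_extent:
        return [tri]
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    pieces = []
    # Three corner triangles plus the central one cover the original exactly.
    for sub in ((a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)):
        pieces.extend(subdivide(sub, max_extent))
    return pieces

big = ((0, 0), (4096, 0), (0, 4096))
pieces = subdivide(big, 2048)
```

Each split halves the extents, so the pieces stay small enough for the
fixed-point math while covering the same area as the original triangle.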

Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agomesa: remove platform checks around __builtin_ffs, __builtin_ffsll
Brian Paul [Thu, 28 Mar 2013 23:06:35 +0000 (17:06 -0600)]
mesa: remove platform checks around __builtin_ffs, __builtin_ffsll

Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC,
not just for specific platforms.  Fixes Solaris build.
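
For reference, the semantics of a find-first-set operation can be
modelled in a few lines of Python. This is a sketch of the behavior
only, not of the GCC builtins themselves; the `bits` parameter is an
assumption used to mimic the 32-bit and 64-bit variants.

```python
def ffs(value, bits=32):
    """Return the 1-based position of the lowest set bit, or 0 if none."""
    value &= (1 << bits) - 1       # treat value as an unsigned bits-wide word
    lowest = value & -value        # isolate the lowest set bit
    return lowest.bit_length()     # 0 when value == 0
```

Passing `bits=64` models the long long variant.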

Note: This is a candidate for the stable branches.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agodocs: add a new page documenting known application issues
Brian Paul [Mon, 25 Mar 2013 19:15:37 +0000 (13:15 -0600)]
docs: add a new page documenting known application issues

Let's try to update this when we find other broken applications...

Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agodrirc: set always_have_depth_buffer for Topogon
Brian Paul [Sat, 30 Mar 2013 00:29:52 +0000 (18:29 -0600)]
drirc: set always_have_depth_buffer for Topogon

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agogallivm: Minor comment cleanup
Adam Jackson [Mon, 1 Apr 2013 13:45:38 +0000 (09:45 -0400)]
gallivm: Minor comment cleanup

Signed-off-by: Adam Jackson <ajax@redhat.com>
11 years agomesa: fix texture storage multisample prototypes harder.
Dave Airlie [Mon, 1 Apr 2013 09:53:55 +0000 (19:53 +1000)]
mesa: fix texture storage multisample prototypes harder.

I just noticed the warnings since I fixed the other bit.

Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years agor600g/llvm: Update LLVM_REVISION
Vincent Lejeune [Sun, 31 Mar 2013 19:37:20 +0000 (21:37 +0200)]
r600g/llvm: Update LLVM_REVISION

11 years agor600g/llvm: use native encode for tex
Vincent Lejeune [Mon, 25 Mar 2013 23:47:08 +0000 (00:47 +0100)]
r600g/llvm: use native encode for tex

11 years agoglapi: fix storage multisample build errors
Dave Airlie [Sun, 31 Mar 2013 10:41:28 +0000 (20:41 +1000)]
glapi: fix storage multisample build errors

Reported on #radeon by udovdh

Signed-off-by: Dave Airlie <airlied@redhat.com>
11 years agodocs: mark ARB_texture_storage_multisample done
Chris Forbes [Sat, 16 Mar 2013 04:02:58 +0000 (17:02 +1300)]
docs: mark ARB_texture_storage_multisample done

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agoi965: enable ARB_texture_storage_multisample on Gen6+
Chris Forbes [Sat, 16 Feb 2013 09:09:38 +0000 (22:09 +1300)]
i965: enable ARB_texture_storage_multisample on Gen6+

This can be enabled everywhere that ARB_texture_multisample is
supported -- ARB_texture_storage is supported on everything.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: allow multisample texture targets in [Get]TexParameter*
Chris Forbes [Fri, 15 Mar 2013 09:52:12 +0000 (22:52 +1300)]
mesa: allow multisample texture targets in [Get]TexParameter*

ARB_texture_storage_multisample allows texture parameters to be
queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY
targets.

Some parameters may also be set, with the following exceptions:

- TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates
   INVALID_OPERATION

- any state which appears in the `per-sampler` state table may not
  be set; generates INVALID_OPERATION
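
The two exceptions above can be sketched as a small validation function.
This Python model is illustrative only: the function name is
hypothetical, enums are represented as strings, and the per-sampler set
lists just a few well-known sampler parameters rather than the full
state table.

```python
MULTISAMPLE_TARGETS = {
    "TEXTURE_2D_MULTISAMPLE",
    "TEXTURE_2D_MULTISAMPLE_ARRAY",
}

# Partial list, for illustration; the real per-sampler table is larger.
PER_SAMPLER_STATE = {
    "TEXTURE_MIN_FILTER",
    "TEXTURE_MAG_FILTER",
    "TEXTURE_WRAP_S",
    "TEXTURE_WRAP_T",
    "TEXTURE_WRAP_R",
}

def tex_parameter_error(target, pname, value):
    """Return the GL error for setting pname on target, or None if allowed."""
    if target not in MULTISAMPLE_TARGETS:
        return None  # other targets are outside the scope of this sketch
    if pname in PER_SAMPLER_STATE:
        return "INVALID_OPERATION"
    if pname == "TEXTURE_BASE_LEVEL" and value != 0:
        return "INVALID_OPERATION"
    return None
```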

V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: improve reported function name in Tex*Multisample
Chris Forbes [Sat, 16 Mar 2013 04:00:05 +0000 (17:00 +1300)]
mesa: improve reported function name in Tex*Multisample

Now that there are 4 variants, just pass the function name into
teximagemultisample rather than reconstructing it.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: add enable bit for ARB_texture_storage_multisample
Chris Forbes [Sat, 16 Feb 2013 09:34:22 +0000 (22:34 +1300)]
mesa: add enable bit for ARB_texture_storage_multisample

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agoglapi: add definition of ARB_texture_storage_multisample
Chris Forbes [Sat, 16 Feb 2013 09:02:00 +0000 (22:02 +1300)]
glapi: add definition of ARB_texture_storage_multisample

Adds XML for the extension, dispatch_sanity enabling, and the two new
entrypoints. These are both implemented by calling the shared
teximagemultisample() with immutable=GL_TRUE.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: add support for immutable textures to teximagemultisample()
Chris Forbes [Sat, 16 Feb 2013 08:38:20 +0000 (21:38 +1300)]
mesa: add support for immutable textures to teximagemultisample()

The new entrypoints will come later, but this adds the actual logic for
supporting immutable multisample textures:

- The immutability flag is set as desired.
- Attempting to modify an immutable multisample texture produces
  INVALID_OPERATION.

Note: The extension spec does not mention adding this behavior to
TexImage*Multisample, but it seems like the reasonable thing to do.

V2: - Cover missing error cases (unsized formats; texture object zero)

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V1] Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: extract _mesa_is_legal_tex_storage_format helper
Chris Forbes [Fri, 22 Mar 2013 06:58:03 +0000 (19:58 +1300)]
mesa: extract _mesa_is_legal_tex_storage_format helper

This is about to be used in teximagemultisample() when immutable=true.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa: Delete VERT_ATTRIB_GENERIC_NV and VERT_BIT_GENERIC_NV macros.
Kenneth Graunke [Thu, 21 Mar 2013 17:57:08 +0000 (10:57 -0700)]
mesa: Delete VERT_ATTRIB_GENERIC_NV and VERT_BIT_GENERIC_NV macros.

These haven't been used since we deleted NV_vertex_program support.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965: Fix an inconsistency in the VUE map with gl_ClipVertex on gen4/5.
Eric Anholt [Fri, 29 Mar 2013 07:26:07 +0000 (00:26 -0700)]
i965: Fix an inconsistency in the VUE map with gl_ClipVertex on gen4/5.

We are intentionally not allocating a slot for gl_ClipVertex.  But by
leaving the bit set in the slots_valid, the fragment shader's computation
of where varyings are in urb entry coming out of the SF would be off by
one.  Fixes rendering in Freespace 2 SCP, and improves rendering in TF2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830
Tested-by: Joaquín Ignacio Aramendía <samsagax@gmail.com>
NOTE: This is a candidate for the 9.1 branch.
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years agointel: Remove a never-taken debug print path.
Eric Anholt [Thu, 28 Mar 2013 22:58:25 +0000 (15:58 -0700)]
intel: Remove a never-taken debug print path.

Alessandro Pignotti noted when I added this code in commit
0e723b135bfd59868c92c3ae243f1adaedaec3a5 that it's in the else block for
"if (busy)", so this debug print couldn't happen.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agost/mesa: add ir_lod case in GLSL->TGSI code to silence warning
Brian Paul [Fri, 29 Mar 2013 23:21:33 +0000 (17:21 -0600)]
st/mesa: add ir_lod case in GLSL->TGSI code to silence warning

11 years agoglsl: Generate masked write instead of vector array index for UBO lowering
Ian Romanick [Sat, 23 Mar 2013 01:55:49 +0000 (18:55 -0700)]
glsl: Generate masked write instead of vector array index for UBO lowering

When reading a column from a row-major matrix, we would slot the single
value read into the vector using an ir_dereference_array of the vector
with a constant index.  This will (eventually) get optimized to a
masked-write, so just generate the masked write in the first place.

v2: Remove unused variable 'chan'.  Suggested by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
11 years agoglsl: Replace open-coded dot-product with dot
Ian Romanick [Mon, 25 Mar 2013 21:40:53 +0000 (14:40 -0700)]
glsl: Replace open-coded dot-product with dot

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Paul Berry <stereotype441@gmail.com>
11 years agoglsl: Replace constant-index vector array accesses with swizzles
Ian Romanick [Sat, 16 Mar 2013 01:05:55 +0000 (18:05 -0700)]
glsl: Replace constant-index vector array accesses with swizzles

Search and replace:

    ][0] -> ].x
    ][1] -> ].y
    ][2] -> ].z
    ][3] -> ].w

Fixes piglit tests inverse-mat[234].{vert,frag}.  These tests call the
inverse function with constant parameters and expect proper constant
folding to happen.  My suspicion is that this patch papers over some bug
in constant propagation involving array accesses.

Either way, all of these accesses eventually get lowered to swizzles.
This cuts out the middle man (saving a trivial amount of CPU).
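
The search and replace above could be reproduced roughly as follows.
This is a sketch of the textual substitution only, assuming a regular
expression is an acceptable stand-in for the editor command actually
used; index_to_swizzle is a hypothetical helper name.

```python
import re

SWIZZLES = "xyzw"


def index_to_swizzle(source):
    """Rewrite a constant index 0..3 that directly follows a ']'
    into the matching swizzle component."""
    return re.sub(
        r"\]\[([0-3])\]",
        lambda m: "]." + SWIZZLES[int(m.group(1))],
        source,
    )


print(index_to_swizzle("m[0][0] * m[1][1] - m[1][0] * m[0][1]"))
```

Only the second subscript is touched, so the column selection (m[0],
m[1]) stays an array access while the component selection becomes a
swizzle.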

NOTE: This is a candidate for the 9.1 branch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Paul Berry <stereotype441@gmail.com>
11 years agoglsl: Add missing bool case in glsl_type::get_scalar_type
Ian Romanick [Fri, 15 Mar 2013 23:47:46 +0000 (16:47 -0700)]
glsl: Add missing bool case in glsl_type::get_scalar_type

Since the case was missing, bvec4->get_scalar_type() would return bvec4,
but vec4->get_scalar_type() would return float.
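
A toy model of the failure, not the glsl_type code itself. It assumes
that base types without an explicit case fall through and return the
type unchanged; the (base, components) tuples and function names are
hypothetical.

```python
# Hypothetical model: a type is (base, components), e.g. ("bool", 4)
# stands for bvec4 and ("float", 4) for vec4.
SCALARS = {
    "uint": ("uint", 1),
    "int": ("int", 1),
    "float": ("float", 1),
}


def get_scalar_type_before(t):
    """Lookup without a bool entry: unhandled bases come back unchanged."""
    return SCALARS.get(t[0], t)


def get_scalar_type_after(t):
    """Same lookup with the bool case added."""
    table = dict(SCALARS, bool=("bool", 1))
    return table.get(t[0], t)


print(get_scalar_type_before(("bool", 4)))   # still the vector type
print(get_scalar_type_after(("bool", 4)))    # the scalar bool type
```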

NOTE: This is a candidate for stable branches.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
11 years agoi965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.
Kenneth Graunke [Thu, 28 Mar 2013 06:19:39 +0000 (23:19 -0700)]
i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.

"discard" instructions generate HALT instructions which jump to a final
HALT near the end of the shader.  Previously, fs_generator created this
final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
to jump right before the FB write epilogue.  This is normally good.

However, INTEL_DEBUG=shader_time also has an epilogue section which
records the final timestamp.  The frontend emits IR for this just before
FS_OPCODE_FB_WRITE.  Unfortunately, this led to the following ordering:

1. Shader Time Epilogue
2. Final HALT (where discards jump)
3. Framebuffer Write Epilogue

This meant that discarded pixels completely skipped the shader time
epilogue, causing no ending timestamp to be written.  This obviously
led to inaccurate results.

This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
before any epilogue sections.  This is where the final HALT should be
generated, and makes it easy to ensure the correct ordering:

1. Final HALT
2. Shader Time Epilogue
3. Framebuffer Write Epilogue

For shaders that don't discard, this opcode compiles away to nothing.
The scheduler adds barrier dependencies to make sure that it doesn't
get moved above any FS_OPCODE_DISCARD_JUMP instructions.
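
The two orderings can be sketched with a toy generator. This is an
illustration under simplifying assumptions, not fs_generator: the IR
is a flat list of strings and generate_old/generate_new are
hypothetical names.

```python
def generate_old(ir):
    """Old scheme: create the final HALT at the first FB write."""
    out, halt_done = [], False
    for op in ir:
        if op == "FB_WRITE" and not halt_done:
            out.append("FINAL_HALT")
            halt_done = True
        out.append(op)
    return out


def generate_new(ir, has_discard):
    """New scheme: the final HALT replaces the placeholder opcode."""
    out = []
    for op in ir:
        if op == "PLACEHOLDER_HALT":
            if has_discard:
                out.append("FINAL_HALT")
            continue  # compiles away to nothing without discards
        out.append(op)
    return out


old_ir = ["BODY", "SHADER_TIME_EPILOGUE", "FB_WRITE"]
new_ir = ["BODY", "PLACEHOLDER_HALT", "SHADER_TIME_EPILOGUE", "FB_WRITE"]

print(generate_old(old_ir))
print(generate_new(new_ir, has_discard=True))
```

In the old output the HALT lands after the shader time epilogue, so a
discarded pixel skips it; in the new output the epilogue always runs.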

One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to
a mere 5.13 Gcycles.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965: Add names for all instructions to dump_instruction() in FS and VS.
Eric Anholt [Tue, 12 Mar 2013 00:36:54 +0000 (17:36 -0700)]
i965: Add names for all instructions to dump_instruction() in FS and VS.

I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965: Enable ARB_texture_query_lod.
Matt Turner [Wed, 6 Mar 2013 22:54:27 +0000 (14:54 -0800)]
i965: Enable ARB_texture_query_lod.

v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Generate LOD sampler message from ir_lod.
Matt Turner [Wed, 6 Mar 2013 22:47:01 +0000 (14:47 -0800)]
i965/fs: Generate LOD sampler message from ir_lod.

v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoglsl: Implement ARB_texture_query_lod
Dave Airlie [Sun, 23 Sep 2012 09:50:41 +0000 (19:50 +1000)]
glsl: Implement ARB_texture_query_lod

v2 [mattst88]:
   - Rebase.
   - #define GL_ARB_texture_query_lod to 1.
   - Remove comma after ir_lod in ir.h for MSVC.
   - Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp,
     opt_tree_grafting.cpp.
   - Rename textureQueryLOD to textureQueryLod, see
     https://www.khronos.org/bugzilla/show_bug.cgi?id=821
   - Fix ir_reader of (lod ...).
v3 [mattst88]:
   - Rename textureQueryLod to textureQueryLOD, pending resolution of
     Khronos 821.
   - Add ir_lod case to ir_to_mesa.cpp.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Use measured Gen7 instruction timings on Gen6.
Matt Turner [Thu, 28 Mar 2013 18:38:57 +0000 (11:38 -0700)]
i965/fs: Use measured Gen7 instruction timings on Gen6.

x before
+ after
+------------------------------------------------------------------------------+
|   x                                   x   +                                  |
|   xx  ++                              x   +                                  |
|   xx  ++ +                           xx   ++                                 |
|x xxx x+++++          +           xxx x*x+*+++ +         x                   +|
|   |_____|____________A______A____M____M_|_______|                            |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
    x  23       8083.78       8287.83       8205.55     8162.7461     68.307951
    +  23       8107.56       8358.74       8224.33     8186.1765     71.506301
    No difference proven at 95.0% confidence
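
The verdict can be cross-checked from the summary rows alone. The
sketch below assumes a pooled-variance Student's t-test, and the
critical value 2.015 (two-sided, 95%, 44 degrees of freedom) is taken
from a standard t table rather than from the tool's output.

```python
import math

# Summary statistics copied from the table above.
n_x, avg_x, sd_x = 23, 8162.7461, 68.307951
n_p, avg_p, sd_p = 23, 8186.1765, 71.506301

# Pooled standard deviation across both samples.
dof = n_x + n_p - 2
pooled_var = ((n_x - 1) * sd_x**2 + (n_p - 1) * sd_p**2) / dof
std_err = math.sqrt(pooled_var) * math.sqrt(1 / n_x + 1 / n_p)

t = (avg_p - avg_x) / std_err

# Assumed table value: two-sided 95% critical t for 44 degrees of freedom.
T_CRIT_95 = 2.015
significant = abs(t) > T_CRIT_95

print(f"t = {t:.3f}, dof = {dof}, significant = {significant}")
```

The t statistic falls well below the critical value, which agrees with
the "No difference proven" line.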

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965/fs: Increase and document MAD latency on Gen7.
Matt Turner [Thu, 28 Mar 2013 18:15:20 +0000 (11:15 -0700)]
i965/fs: Increase and document MAD latency on Gen7.

58% of mad(8) generated in shader-db are reading registers from the same
bank.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965/fs: Add LRP instruction latency.
Matt Turner [Thu, 28 Mar 2013 17:57:34 +0000 (10:57 -0700)]
i965/fs: Add LRP instruction latency.

Set its latency to what happens to be the default floating-point
instruction latency. One day we may want to handle latency based on
register bank information.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965/fs: Add Haswell cycle timings
Matt Turner [Fri, 1 Mar 2013 00:42:51 +0000 (16:42 -0800)]
i965/fs: Add Haswell cycle timings

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965: Note that write-after-write dependencies are blocking.
Matt Turner [Thu, 28 Mar 2013 17:46:17 +0000 (10:46 -0700)]
i965: Note that write-after-write dependencies are blocking.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoi965: Reword comment about the shared mathbox.
Matt Turner [Thu, 28 Mar 2013 17:45:34 +0000 (10:45 -0700)]
i965: Reword comment about the shared mathbox.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agogallivm: consolidate some half-to-float and r11g11b10-to-float code
Roland Scheidegger [Fri, 29 Mar 2013 05:16:33 +0000 (06:16 +0100)]
gallivm: consolidate some half-to-float and r11g11b10-to-float code

Similar enough that we can try to use shared code.
v2: fix a stupid bug using wrong variable causing mayhem with Inf and NaNs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agomesa: provide default implementation of QuerySamplesForFormat
Chris Forbes [Fri, 29 Mar 2013 03:22:09 +0000 (16:22 +1300)]
mesa: provide default implementation of QuerySamplesForFormat

Previously at least i915 failed to provide an implementation, but
exposed ARB_internalformat_query anyway, leading to crashes when
QueryInternalformativ was called.

Default implementation just returns 1 for everything, so is suitable for
any driver which does not support multisampling.
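
The idea of a core-provided fallback can be sketched as below. This
models the pattern only; the Driver class, the hook name, and the list
return value are hypothetical and do not mirror Mesa's C interface.

```python
def default_query_samples_for_format(target, internal_format):
    """Default hook: report exactly one supported sample count, 1."""
    return [1]


class Driver:
    def __init__(self, query_samples_for_format=None):
        # Fall back to the default so the hook is never left unset,
        # which is what previously led to a crash when it was called.
        self.query_samples_for_format = (
            query_samples_for_format or default_query_samples_for_format
        )


# A driver that does not supply its own hook still answers the query.
driver = Driver()
print(driver.query_samples_for_format("GL_RENDERBUFFER", "GL_RGBA8"))
```

A driver that does support multisampling would pass its own hook and
override the default.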

V2: - Move from intel to core mesa.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agonvc0: implement MP performance counters
Christoph Bumiller [Wed, 27 Mar 2013 22:39:06 +0000 (23:39 +0100)]
nvc0: implement MP performance counters

There's more, but this only adds most of the counters that are
handled directly by the shader processors.
The other counter domains are not handled on the multiprocessor and
there are no FIFO object methods for configuring them.
Instead, they have to be programmed by the kernel via PCOUNTER, and
the interface for this isn't in place yet.

11 years agonvc0: enable compression when supported
Christoph Bumiller [Thu, 21 Mar 2013 18:26:01 +0000 (19:26 +0100)]
nvc0: enable compression when supported

11 years agonvc0: use NOUVEAU_GETPARAM_GRAPH_UNITS to get MP count
Christoph Bumiller [Wed, 27 Mar 2013 22:38:29 +0000 (23:38 +0100)]
nvc0: use NOUVEAU_GETPARAM_GRAPH_UNITS to get MP count

11 years agonv50,nvc0: fix 3d blits, restore viewport after blit
Christoph Bumiller [Fri, 22 Mar 2013 12:49:40 +0000 (13:49 +0100)]
nv50,nvc0: fix 3d blits, restore viewport after blit

11 years agonv50: fix 3D render target setup
Christoph Bumiller [Mon, 25 Mar 2013 18:41:18 +0000 (19:41 +0100)]
nv50: fix 3D render target setup

11 years agollvmpipe: put .bmp extension on dumped image files
Brian Paul [Thu, 28 Mar 2013 23:17:26 +0000 (17:17 -0600)]
llvmpipe: put .bmp extension on dumped image files

11 years agollvmpipe: add 'f' suffix to 1.0 in fixed_to_float()
Brian Paul [Thu, 28 Mar 2013 23:17:26 +0000 (17:17 -0600)]
llvmpipe: add 'f' suffix to 1.0 in fixed_to_float()

11 years agodraw: fix some build breakage when LLVM is not used
Brian Paul [Thu, 28 Mar 2013 23:03:57 +0000 (17:03 -0600)]
draw: fix some build breakage when LLVM is not used

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62883
Tested-by: Vinson Lee <vlee@freedesktop.org>