Paul Berry [Sat, 6 Apr 2013 03:15:39 +0000 (20:15 -0700)]
i965/gen7.5: Allow HW primitive restart for all primitive types.
Gen7.5 (Haswell) hardware supports primitive restart for all primitive
types. It also handles all possible primitive restart indices.
Rather than specialize both can_cut_index_handle_restart_index() and
the switch statement in can_cut_index_handle_prims() for Haswell, just
return early if the hardware is Haswell because we know it can handle
everything.
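A minimal sketch of the early return described above. The struct, enum,
and function names are hypothetical stand-ins, not the driver's real
definitions, and the pre-Haswell rules shown (only an all-ones restart
index, only a handful of primitive types) are assumptions made for
illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver context and GL primitive enums. */
struct hw_info {
   bool is_haswell;
};

enum prim_type {
   PRIM_POINTS,
   PRIM_LINE_STRIP,
   PRIM_TRIANGLES,
   PRIM_TRIANGLE_STRIP,
   PRIM_QUADS,
   PRIM_QUAD_STRIP,
   PRIM_POLYGON
};

/* Assumed pre-Haswell rule: the restart index must be all ones for the
 * given index size (1, 2 or 4 bytes). */
static bool
restart_index_ok(uint32_t restart_index, unsigned index_size)
{
   switch (index_size) {
   case 1: return restart_index == 0xffu;
   case 2: return restart_index == 0xffffu;
   case 4: return restart_index == 0xffffffffu;
   default: return false;
   }
}

/* Assumed pre-Haswell rule: only some primitive types are supported. */
static bool
prim_ok(enum prim_type prim)
{
   switch (prim) {
   case PRIM_POINTS:
   case PRIM_LINE_STRIP:
   case PRIM_TRIANGLES:
   case PRIM_TRIANGLE_STRIP:
      return true;
   default:
      return false;
   }
}

static bool
can_use_hw_restart(const struct hw_info *hw, enum prim_type prim,
                   uint32_t restart_index, unsigned index_size)
{
   /* Haswell handles every primitive type and restart index, so neither
    * helper needs a Haswell-specific case. */
   if (hw->is_haswell)
      return true;

   return restart_index_ok(restart_index, index_size) && prim_ok(prim);
}
```

Keeping the Haswell check in the caller leaves both helpers untouched,
which is the point of returning early rather than specializing them.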
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Paul Berry [Fri, 5 Apr 2013 21:56:21 +0000 (14:56 -0700)]
i965: Only use brw_draw.c's trim() function when necessary.
brw_draw.c contains a trim() function which modifies the vertex count
for quads and quad strips in order to discard dangling vertices. In
principle this shouldn't be necessary, since hardware since Gen4 is
capable of discarding dangling vertices by itself. However, it's
necessary because as a hack to speed up rendering on Gen 4-5, we
sometimes convert quads to trifans and quad strips to tristrips. The
trim() function isn't necessary on Gen6 and up.
This patch documents why and when the trim() function is necessary,
and avoids calling it when it's not needed.
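For illustration, a trim-style helper might look roughly like the
sketch below. The enum and the vertex_count_for_draw() wrapper are
hypothetical, and the exact arithmetic is an assumption about how
dangling vertices would be dropped, not a copy of brw_draw.c:

```c
/* Hypothetical primitive enum; the real code uses GL enums. */
enum prim_type { PRIM_TRIANGLES, PRIM_QUADS, PRIM_QUAD_STRIP };

/* Drop dangling vertices: each quad consumes 4 vertices; a quad strip
 * needs at least 4 vertices and then grows 2 at a time. */
static unsigned
trim(enum prim_type prim, unsigned length)
{
   if (prim == PRIM_QUAD_STRIP)
      return length > 3 ? length - length % 2 : 0;
   else if (prim == PRIM_QUADS)
      return length - length % 4;
   else
      return length;
}

/* Hypothetical gate: trimming only matters where quads may have been
 * converted to trifans/tristrips, i.e. before Gen6. */
static unsigned
vertex_count_for_draw(int gen, enum prim_type prim, unsigned length)
{
   return gen < 6 ? trim(prim, length) : length;
}
```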
This will avoid creating problems when we enable hardware support for
primitive restart of quads and quad strips on Haswell.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Paul Berry [Sun, 7 Apr 2013 13:29:46 +0000 (06:29 -0700)]
i965/vs: Fix DEBUG_SHADER_TIME when VS terminates with 2 URB writes.
The call to emit_shader_time_end() before the second URB write was
conditioned with "if (eot)", but eot is always false in this code
path, so emit_shader_time_end() was never being called for vertex
shaders that performed 2 URB writes.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Christian König [Tue, 9 Apr 2013 16:36:22 +0000 (18:36 +0200)]
st/vdpau: fix subtitle related bug v2
Drawing subtitles didn't increase the dirty area of the surface.
Reported and tested by freeedrich on irc.
v2: don't clear the surface
Signed-off-by: Christian König <christian.koenig@amd.com>
Paul Berry [Tue, 9 Apr 2013 17:37:16 +0000 (10:37 -0700)]
glsl/linker: Reduce scope of non-flat integer varying fix.
In the mailing list discussion of "glsl/linker: fix varying packing
for non-flat integer varyings." (commit 7862bde), we concluded that
since the bug only applies to integral variables, it is safer to just
apply the bug fix to integer varyings. I forgot to make the change
before pushing the patch upstream. (Note: we aren't aware of any bugs
in commit 7862bde; it just seems wise to be on the safe side).
This patch makes the change. Assuming commit 7862bde gets
cherry-picked back to 9.1, this commit should be cherry-picked too.
NOTE: This is a candidate for the 9.1 release branch.
Paul Berry [Sat, 6 Apr 2013 17:50:46 +0000 (10:50 -0700)]
glsl/linker: Adapt flat varying handling in preparation for geometry shaders.
When a varying is consumed by transform feedback, but is not used by
the fragment shader, assign_varying_locations() sets its interpolation
type to "flat" in order to ensure that lower_packed_varyings never has
to deal with non-flat integral varyings (the GLSL spec doesn't require
integral vertex outputs to be flat if they aren't consumed by the
fragment shader).
A similar situation will arise when geometry shader support is added,
since the GLSL spec only requires integral vertex shader outputs to be
flat when they are consumed by the fragment shader. This patch
modifies the linker to handle this situation too.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Paul Berry [Sat, 6 Apr 2013 17:33:25 +0000 (10:33 -0700)]
glsl: Document lower_packed_varyings' "flat" requirement with an assert.
To minimize the variety of type conversions that lower_packed_varyings
needs to perform, it assumes that integral varyings are always
qualified as "flat". link_varyings.cpp takes care of ensuring that
this is the case (even in the circumstances where GLSL doesn't require
it).
This patch documents the assumption with an assertion, for ease in
future debugging.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Paul Berry [Sat, 6 Apr 2013 16:36:06 +0000 (09:36 -0700)]
glsl/linker: fix varying packing for non-flat integer varyings.
Commit dfb57e7 (glsl: Fix error checking on "flat" keyword to match
GLSL ES 3.00, GLSL 1.50) relaxed the rules for integral varyings: they
only need to be declared as "flat" if they are fragment shader
inputs. This allowed for the possibility of a vertex shader output
being a non-flat integer, provided that it was not matched to a
fragment shader input. A non-contrived situation where this might
arise is if a vertex shader generates some integral outputs which are
consumed by transform feedback, but not by the fragment shader.
Unfortunately, lower_packed_varyings assumes that *all* integral
varyings are flat, regardless of whether they are consumed by the
fragment shader. As a result, attempting to create a non-flat
integral vertex output of a size that required packing (i.e. a size
other than ivec4 or uvec4) would cause an assertion failure in
lower_packed_varyings.
This patch prevents the assertion failure by forcing vertex shader
outputs to be "flat" whenever they are not consumed by the fragment
shader. This should have no effect on rendering since the "flat"
keyword only affects the behaviour of fragment shader inputs.
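The rule described above can be sketched as follows. This is a
simplified illustration, not the linker's code: the struct, enum, and
function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, pared-down view of a vertex shader output. */
enum interp_qualifier { INTERP_NONE, INTERP_SMOOTH, INTERP_FLAT };

struct varying {
   bool consumed_by_fs;
   enum interp_qualifier interpolation;
};

/* Outputs the fragment shader never reads are never interpolated, so
 * marking them flat cannot change rendering; it only spares the packing
 * pass from ever seeing a non-flat integer. */
static void
force_flat_if_unused_by_fs(struct varying *out)
{
   if (!out->consumed_by_fs)
      out->interpolation = INTERP_FLAT;
}
```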
Fixes piglit test "spec/EXT_transform_feedback/nonflat-integral".
NOTE: This is a candidate for the 9.1 release branch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Paul Berry [Tue, 9 Apr 2013 17:03:11 +0000 (10:03 -0700)]
glsl: Check the size of ir_print_visitor's mode[] array with STATIC_ASSERT.
ir_print_visitor::visit(ir_variable *)'s mode[] array needs to match
the declaration of the enum ir_variable_mode. It's hard to verify
that at compile time, but at least we can use a STATIC_ASSERT to make
sure it's the right size.
This required adding ir_var_mode_count to the enum.
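A sketch of the pattern, using C11's _Static_assert in place of Mesa's
STATIC_ASSERT macro. The enum values and strings are made up for the
example; only the idea of a trailing count entry comes from the commit
message:

```c
/* Hypothetical enum; the trailing COUNT entry must stay last. */
enum variable_mode {
   MODE_AUTO,
   MODE_UNIFORM,
   MODE_IN,
   MODE_OUT,
   MODE_COUNT
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

static const char *const mode_names[] = { "", "uniform ", "in ", "out " };

/* Fails to compile if someone adds an enum value without adding a
 * string. It cannot catch entries that are merely in the wrong order. */
_Static_assert(ARRAY_LEN(mode_names) == MODE_COUNT,
               "mode_names[] needs one entry per enum variable_mode value");

static const char *
mode_name(enum variable_mode mode)
{
   return mode_names[mode];
}
```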
Paul Berry [Sun, 7 Apr 2013 02:16:58 +0000 (19:16 -0700)]
glsl: Fix ir_print_visitor's handling of interpolation qualifiers.
This patch updates the interp[] array to match the enum
glsl_interp_qualifier.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Add a STATIC_ASSERT to make sure the array is the correct size.
This required adding INTERP_QUALIFIER_COUNT to the enum.
Johannes Obermayr [Tue, 9 Apr 2013 16:38:42 +0000 (17:38 +0100)]
autotools: Better describe which cases OProfileJIT is required.
Signed-off-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Tue, 9 Apr 2013 01:13:29 +0000 (19:13 -0600)]
softpipe: misc updates to image dumping in softpipe_flush()
Vinson Lee [Sat, 6 Apr 2013 03:46:30 +0000 (20:46 -0700)]
tgsi: Ensure struct tgsi_ind_register field Index is initialized.
Fixes uninitialized scalar variable defect reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Martin Andersson [Tue, 2 Apr 2013 20:43:33 +0000 (22:43 +0200)]
r600g: Fix UMAD on Cayman
The multiplication part of tgsi_umad did not work on Cayman, because it did
not populate the correct vector slots.
This fixed hardlocks in the EXT_transform_feedback/order tests.
NOTE: This is a candidate for the stable branches.
(might not be easy to cherry-pick though)
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Kenneth Graunke [Tue, 9 Apr 2013 02:39:20 +0000 (19:39 -0700)]
intel: Remove the texture_tiling driconf option.
This option can force textures to be untiled. However, on Gen6+, depth
buffers must be Y-tiled. MSAA buffers also must be Y-tiled. So setting
this option on even a trivial application like glxgears causes assertion
failures in a debug build, and likely GPU hangs in a release build.
It's just giving users a license to shoot themselves in the foot.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Tue, 9 Apr 2013 02:27:38 +0000 (19:27 -0700)]
i965: Prefer Y-tiling on Gen6+.
In the past, we preferred X-tiling for color buffers because our BLT
code couldn't handle Y-tiling. However, the BLT paths have been largely
replaced by BLORP on Gen6+, which can handle any kind of tiling.
We hadn't measured any performance improvement in the past, but that's
probably because compressed textures were all untiled anyway.
Improves performance in GLB27_TRex_C24Z16_FixedTime by 7.69231%.
v2: Rebase on top of Eric's untiled-for-larger-than-aperture changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Tue, 9 Apr 2013 02:27:37 +0000 (19:27 -0700)]
i965: Use tiling even for compressed textures.
The code has no rationale for why we would force compressed textures to
be untiled, and it appears to work fine. Git archeology indicates that
it's been that way dating back to when we first started tiling.
Improves performance in GLB27_TRex_C24Z16_FixedTimeStep at 1280x720 by
10.0529% +/- 0.573075% (n=12). Improves performance in Xonotic by
4.56409% +/- 0.27965% (n=3).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Chad Versace [Tue, 9 Apr 2013 02:27:36 +0000 (19:27 -0700)]
intel: Refactor selection of miptree tiling
This patch (1) extracts from intel_miptree_create() the spaghetti logic
that selects the tiling format, (2) rewrites that spaghetti into a lucid
form, and (3) moves it to a new function, intel_miptree_choose_tiling().
No behavioral change.
As a bonus, it is now evident that the force_y_tiling parameter to
intel_miptree_create() does not really force Y tiling.
v2 (Ken): Rebase on top of Eric's untiled-for-larger-than-aperture
changes. This required passing in the miptree.
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Chad Versace [Fri, 5 Apr 2013 22:18:00 +0000 (15:18 -0700)]
intel: Allocate hiz in intel_renderbuffer_move_to_temp()
When moving the renderbuffer to a new miptree, we neglected to allocate
the hiz buffer for the new miptree. Oops.
Fixes all Piglit depthstencil-render-miplevels tests from crash to pass on
Sandybridge.
Note: This is a candidate for the 9.1 branch.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Dave Airlie [Sun, 7 Apr 2013 04:29:59 +0000 (14:29 +1000)]
st/mesa: fix levels in initial texture creation
calim pointed out we were getting mipmap levels for array multisamples,
which didn't make sense. I then noticed that this function takes
last_level, so we were passing in too high a value here.
I think this should fix the case he was seeing.
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Ian Romanick [Fri, 15 Mar 2013 22:23:19 +0000 (15:23 -0700)]
glsl: Don't early-out for error-type inputs
Check the type of the array operand and the index operand before doing
other checks. This simplifies the code a bit now (eliminating the
error_emitted parameter), and enables some later functional changes.
The shader
uniform float x[6];
uniform sampler2D s;
void main() { gl_Position.x = xx[s + 1]; }
still generates (only) the two expected errors:
0:3(33): error: `xx' undeclared
0:3(39): error: Operands to arithmetic operators must be numeric
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Fri, 15 Mar 2013 22:14:18 +0000 (15:14 -0700)]
glsl: Don't emit spurious errors for constant indexes of the wrong type
Previously the shader
uniform float x[6];
void main() { gl_Position.x = x[1.0]; }
would have generated the errors
0:2(33): error: array index must be integer type
0:2(36): error: array index must be < 6
Now only
0:2(33): error: array index must be integer type
will be generated.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Fri, 15 Mar 2013 22:10:35 +0000 (15:10 -0700)]
glsl: Collect all of the non-constant index error checks together
This puts all of the checks together for easier reading. It also means
that all the checks are blocked on array->type->is_array. Shortly this
will allow elimination of some is_error check work-arounds in this
function.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Fri, 15 Mar 2013 22:09:48 +0000 (15:09 -0700)]
glsl: Minor code compaction in _mesa_ast_array_index_to_hir
Also, document the reason for not checking for type->is_array in some of
the bound-checking cases.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Fri, 15 Mar 2013 21:33:01 +0000 (14:33 -0700)]
glsl: Don't return a value from check_builtin_array_max_size
The last consumer of the return value was changed to not use it by the
previous commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Fri, 15 Mar 2013 21:27:22 +0000 (14:27 -0700)]
glsl: Remove some unnecessary uses of error_emitted
The error_emitted flag is used in semantic checking to prevent spurious
cascading errors. For example,
void foo(sampler2D s, float a)
{
float x = a + (1.2 + s);
...
}
should only generate a single error. Without the error_emitted flag for
the first error, "a + ..." would also generate an error.
However, a bunch of cases in _mesa_ast_array_index_to_hir that were
setting error_emitted would mask legitimate errors. For example,
vec4 a[7];
float b = a[3.14];
should generate two errors (float index and type mismatch in assignment).
The uses of error_emitted would cause only the first to be emitted.
This patch removes most of the places in _mesa_ast_array_index_to_hir
that would set the error_emitted flag.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Fri, 15 Mar 2013 21:10:12 +0000 (14:10 -0700)]
glsl: Refactor handling of ast_array_index to a separate function
I love 800+ line switch-statements as much as the next guy... Future
commits will make changes to this part of the AST-to-HIR conversion, and
extracting this code will make that a bit easier.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Fri, 15 Mar 2013 21:09:00 +0000 (14:09 -0700)]
glsl: Make check_builtin_array_max_size externally visible
A future commit will try to use this function in a different file.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Wed, 3 Apr 2013 00:28:41 +0000 (17:28 -0700)]
intel: Avoid making tiled miptrees we won't be able to blit.
Doing so was breaking miptree mapping, which we really need to be able to
handle. With this change, intel_miptree_map_direct() falls through to
doing a CPU mapping on the buffer like we need.
With the previous 2 patches, all of these should be fixed:
piglit max-texture-size (all 3 patches required!)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=37871
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44958
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=53494
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Wed, 3 Apr 2013 00:21:25 +0000 (17:21 -0700)]
intel: Do temporary CPU maps of textures that are too big to GTT map.
This still fails, since 8192*4bpp == 32768, which is too big to use the
blitter on.
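The arithmetic can be checked with a short sketch. The limit constant
and helper names are assumptions for the example: the only fact taken
from the message is that a 32768-byte row is already too big for the
blitter, and "4bpp" is read as 4 bytes per pixel. Real pitches may also
include alignment padding, which is ignored here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limit, inferred from the statement that 32768 is too big. */
#define BLT_MAX_PITCH 32767u

static uint32_t
row_bytes(uint32_t width_px, uint32_t bytes_per_px)
{
   return width_px * bytes_per_px;
}

static bool
blitter_can_handle(uint32_t width_px, uint32_t bytes_per_px)
{
   return row_bytes(width_px, bytes_per_px) <= BLT_MAX_PITCH;
}
```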
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Eric Anholt [Wed, 3 Apr 2013 00:19:55 +0000 (17:19 -0700)]
intel: Add support for writing to our linear-temporary-CPU-map case.
This will be used for handling updates of large textures.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Kenneth Graunke [Sat, 6 Apr 2013 07:08:37 +0000 (00:08 -0700)]
intel: Remove check for kernel 2.6.29.
Now that we require 2.6.39, there's no need to also check for 2.6.29.
Calling drm_intel_bufmgr_gem_enable_fenced_relocs() without checking
should be safe, as it simply sets a flag.
This does remove the check for zero fences available, but that doesn't
seem worth checking.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Sat, 6 Apr 2013 06:59:52 +0000 (23:59 -0700)]
intel: Require kernel 2.6.39 for relaxed relocation support.
Chris Wilson's relaxed relocation patch landed in March 2011. Anyone
running pre-3.0 kernels probably isn't going to get the latest Mesa
anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Sat, 6 Apr 2013 06:31:57 +0000 (23:31 -0700)]
i965: Remove a few BRW_STATE_... enum values.
These were likely used for BRW_NEW_... dirty bit flags at one point, but
they're unused now.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Sat, 6 Apr 2013 05:58:39 +0000 (22:58 -0700)]
i965: Remove brw->vb.info and struct brw_vertex_info.
Nobody uses this value, so there's no need to set it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Sat, 6 Apr 2013 05:54:17 +0000 (22:54 -0700)]
i965: Remove the BRW_NEW_INPUT_DIMENSIONS flag.
When I removed the proj_attrib_mask optimization, I also removed the
last consumer of this bit without realizing it.
Since nobody uses it, there's no point in flagging it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Tue, 2 Apr 2013 20:38:07 +0000 (13:38 -0700)]
register_allocate: Fix the type of best_benefit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tom Stellard [Mon, 8 Apr 2013 14:43:34 +0000 (07:43 -0700)]
radeon/llvm: Bump minimum LLVM version to 3.3
Niels Ole Salscheider [Thu, 4 Apr 2013 21:26:45 +0000 (23:26 +0200)]
clover: Fix linkage of libOpenCL
Clover needs the irreader component of llvm
v2: Check for irreader component
irreader is only available with LLVM 3.3 >= 177971
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Vincent Lejeune [Sat, 6 Apr 2013 16:12:26 +0000 (18:12 +0200)]
r600g/llvm: Add support for native isa for pre EG
This fixes bug 62756:
https://bugs.freedesktop.org/show_bug.cgi?id=62756#c12
Marek Olšák [Fri, 5 Apr 2013 12:18:22 +0000 (14:18 +0200)]
gallium/util: add const to a parameter of util_max_layer
Marek Olšák [Thu, 28 Mar 2013 02:16:25 +0000 (03:16 +0100)]
st/mesa: don't expose ARB_color_buffer_float without driver support in GL core
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Thu, 28 Mar 2013 02:02:14 +0000 (03:02 +0100)]
mesa: allow drivers not to expose ARB_color_buffer_float in GL core profile
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Thu, 28 Mar 2013 01:48:17 +0000 (02:48 +0100)]
mesa: move updating clamp control derived state out of mesa_update_state_locked
It has 2 dependencies: glClampColor and the framebuffer, we might just as well
do the update where those two are changed.
v2: cosmetic changes from Brian's email
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Thu, 28 Mar 2013 00:56:01 +0000 (01:56 +0100)]
mesa: don't set _ClampFragmentColor to TRUE if it has no effect
This should reduce shader recompilations with drivers that emulate fragment
color clamping, because we want the clamping to be enabled only if there is
a signed normalized or floating-point colorbuffer.
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Thu, 28 Mar 2013 00:50:21 +0000 (01:50 +0100)]
mesa: refactor clamping controls, get rid of _ClampReadColor
v2: cosmetic changes from Brian's email
Reviewed-by: Brian Paul <brianp@vmware.com>
Chris Forbes [Sun, 31 Mar 2013 23:51:59 +0000 (12:51 +1300)]
mesa: don't memcmp() off the end of a cache key.
Reported-by: `per` in #intel-gfx
The size of the cache key varies, so store the actual size as well as
the key blob itself, rather than just assuming it's the same as the size
passed in.
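A minimal sketch of the idea, with a hypothetical struct and function
name. Comparing the stored size first means memcmp() never reads past
the end of the shorter key:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical cache entry: the key blob plus its actual size. */
struct cache_item {
   const void *key;
   unsigned key_size;
};

static bool
key_matches(const struct cache_item *item, const void *key,
            unsigned key_size)
{
   /* Keys of different sizes can never match, and checking that first
    * guarantees both buffers hold at least key_size bytes. */
   return item->key_size == key_size &&
          memcmp(item->key, key, key_size) == 0;
}
```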
NOTE: This is a candidate for stable branches.
V2: Don't leave silly holes in structure; use unsigned instead of GLuint.
V3: Fix missing case for `last` match.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Tom Stellard [Thu, 25 Oct 2012 17:50:10 +0000 (13:50 -0400)]
radeonsi: Add compute support v3
v2:
- Only dump shaders when env variable is set.
v3:
- Don't emit VGT registers
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tom Stellard [Wed, 13 Mar 2013 16:59:33 +0000 (12:59 -0400)]
radeonsi: Set TCL1_ACTION_ENA when invalidating the texture cache
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tom Stellard [Wed, 13 Mar 2013 17:01:32 +0000 (13:01 -0400)]
radeonsi: Remove si_pm4_inval_vertex_cache()
This function is a holdover from r600g and is identical to
si_pm4_inval_texture_cache(), so it is not needed.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tom Stellard [Thu, 7 Mar 2013 15:51:25 +0000 (10:51 -0500)]
gallium: PIPE_COMPUTE_CAP_IR_TARGET - allow drivers to specify a processor v2
This target string now contains four values instead of three. The old
processor field (which was really being interpreted as arch) has been split
into two fields: processor and arch. This allows drivers to pass a
more detailed description of the hardware to compiler frontends.
v2:
- Adapt to libclc changes
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Wladimir [Fri, 5 Apr 2013 17:49:26 +0000 (19:49 +0200)]
util: add ETC as compressed format
Add UTIL_FORMAT_LAYOUT_ETC to util_format_is_compressed. It was missing.
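In outline, the check is a switch over the format layout. The enum and
function below are hypothetical, shortened stand-ins for the real
UTIL_FORMAT_LAYOUT_* values and util_format_is_compressed():

```c
#include <stdbool.h>

/* Hypothetical subset of layouts, for illustration only. */
enum format_layout { LAYOUT_PLAIN, LAYOUT_S3TC, LAYOUT_RGTC, LAYOUT_ETC };

static bool
layout_is_compressed(enum format_layout layout)
{
   switch (layout) {
   case LAYOUT_S3TC:
   case LAYOUT_RGTC:
   case LAYOUT_ETC:   /* the case the commit says was missing */
      return true;
   default:
      return false;
   }
}
```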
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Brian Paul [Fri, 5 Apr 2013 17:21:09 +0000 (11:21 -0600)]
gallium/u_blitter: fix is_blit_generic_supported() stencil checking
Don't check if there's sampler support for stencil if we're not
going to actually blit/copy stencil values. Fixes the case where
we mistakenly said we can't support a blit of depth values from
S8Z24 to X8Z24.
Also, rename the is_stencil variable to dst_has_stencil to improve
readability.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Alexander Monakov [Mon, 1 Apr 2013 21:38:27 +0000 (01:38 +0400)]
Honor GLX_DONT_CARE in MATCH_MASK
NOTE: This is a candidate for stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=47478
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62999
Bugzilla: http://bugs.winehq.org/show_bug.cgi?id=26763
Rob Clark [Fri, 5 Apr 2013 16:54:37 +0000 (12:54 -0400)]
freedreno: use autogenerated register defs
Switch to use the envytools generated headers for register/bitfield
definitions. This is the first step in preparing to add a3xx support,
since it avoids having conflicting names for a3xx and a2xx registers.
And since I'm using envytools for a3xx it is simpler to just use it for
everything.
This shouldn't cause any functional change, it is really just a lot of
renaming.
Signed-off-by: Rob Clark <robdclark@gmail.com>
José Fonseca [Thu, 4 Apr 2013 19:27:39 +0000 (20:27 +0100)]
st/wgl: Install our windows message hook to threads created before the ICD is loaded.
Otherwise we will not receive destroy windows events, causing framebuffers
to leak.
This happens particularly with java and jogl.
Tested with java + jogl, MATLAB.
VMware Internal Bug Number: 1013086.
Reviewed-by: Brian Paul <brianp@vmware.com>
Adam Jackson [Thu, 4 Apr 2013 21:16:22 +0000 (17:16 -0400)]
llvmpipe: Work without sse2 if llvm is new enough
At least on llvm 3.2 this appears to work fine. Tested on an Athlon XP
2600+, which has sse and 3dnow but not sse2.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Jerome Glisse [Wed, 27 Mar 2013 15:04:29 +0000 (11:04 -0400)]
winsys/radeon: add command stream replay dump for faulty lockup v3
Build time option, set RADEON_CS_DUMP_ON_LOCKUP to 1 in radeon_drm_cs.h to
enable it.
When enabled, after each cs submission the code will try to detect a
lockup by waiting for one of the buffers of the cs to become idle. After
a timeout it will consider that the cs triggered a lockup and will write
a radeon_lockup.c file in the current directory that has all the
information for replaying the cs.
To build this file :
gcc -O0 -g radeon_lockup.c -ldrm -o radeon_lockup -I/usr/include/libdrm
v2: Add radeon_ctx.h file to mesa git tree
v3: Slightly improve dumped file for easier editing, only dump first faulty cs
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Brian Paul [Thu, 4 Apr 2013 20:06:51 +0000 (14:06 -0600)]
st/xlib: add HUD support for xlib/GLX
For the softpipe and llvmpipe drivers.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Thu, 4 Apr 2013 22:37:56 +0000 (16:37 -0600)]
gallium/hud: add GALLIUM_HUD_PERIOD env var
To set the graph update rate, in seconds. The default update rate
has also been changed to 1/2 second.
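A sketch of how such a variable could be parsed. The function name and
the fallback rules (unset, unparsable, or non-positive values fall back
to the default) are assumptions for the example, not necessarily what
the HUD code does; only the 1/2 second default comes from the message:

```c
#include <stdlib.h>

#define HUD_DEFAULT_PERIOD 0.5   /* seconds, per the commit message */

/* Takes the raw environment string so the parsing is easy to test. */
static double
hud_period_from_string(const char *value)
{
   char *end;
   double period;

   if (value == NULL)
      return HUD_DEFAULT_PERIOD;

   period = strtod(value, &end);
   if (end == value || period <= 0.0)
      return HUD_DEFAULT_PERIOD;

   return period;
}
```

A caller would pass in getenv("GALLIUM_HUD_PERIOD"), which is NULL when
the variable is unset.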
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Brian Paul [Thu, 4 Apr 2013 22:24:40 +0000 (16:24 -0600)]
gallium/hud: initialize sampler state
The default wrap mode (PIPE_TEX_WRAP_REPEAT) is incompatible with
unnormalized texcoords (at least for softpipe).
v2: use PIPE_TEX_WRAP_CLAMP_TO_EDGE
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Kenneth Graunke [Thu, 4 Apr 2013 06:56:57 +0000 (23:56 -0700)]
glsl: Add an optimization pass to flatten simple nested if blocks.
GLBenchmark 2.7's shaders contain conditional blocks like:
if (x) {
if (y) {
...
}
}
where the outer conditional's then clause contains exactly one statement
(the nested if) and there are no else clauses. This can easily be
optimized into:
if (x && y) {
...
}
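The equivalence of the two forms can be checked with a small C analogue
(hypothetical function names; the real pass works on GLSL IR, not C).
Because && short-circuits, y is still only evaluated when x is true:

```c
#include <stdbool.h>

/* Original shape: the outer then-clause holds only the nested if, and
 * neither if has an else. */
static int
nested(bool x, bool y, int value)
{
   if (x) {
      if (y) {
         value += 1;
      }
   }
   return value;
}

/* Flattened shape produced by the optimization. */
static int
flattened(bool x, bool y, int value)
{
   if (x && y) {
      value += 1;
   }
   return value;
}
```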
This saves a few instructions in GLBenchmark 2.7:
total instructions in shared programs: 11833 -> 11649 (-1.55%)
instructions in affected programs: 8234 -> 8050 (-2.23%)
It also helps CS:GO slightly (-0.05%/-0.22%). More importantly,
however, it simplifies the control flow graph, which could enable other
optimizations.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Wed, 3 Apr 2013 04:11:51 +0000 (21:11 -0700)]
i965: Use a variable for the push constant size in kB.
This clarifies that the offset of 2 is actually 16 kB / 8kB units.
It also keys both computations off of a single variable, which should
make it easier to change in the future.
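The relationship amounts to a unit conversion, sketched here with
hypothetical names (the real code may structure this differently):

```c
/* Size of the push constant area, in kB (16 per the message above). */
#define PUSH_CONSTANT_KB 16u

/* Offsets are expressed in 8 kB units. */
#define OFFSET_UNIT_KB 8u

static unsigned
kb_to_offset_units(unsigned kb)
{
   return kb / OFFSET_UNIT_KB;
}
```

With these values, kb_to_offset_units(PUSH_CONSTANT_KB) yields the
offset of 2 that used to be written as a bare literal.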
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kenneth Graunke [Wed, 3 Apr 2013 04:11:50 +0000 (21:11 -0700)]
i965: Turn brw->urb.vs_size and gs_size into local variables.
These variables are only used within a single function, so we may as
well make them local variables.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kenneth Graunke [Wed, 13 Mar 2013 05:16:37 +0000 (22:16 -0700)]
i965: Remove BRW_NEW_WM_INPUT_DIMENSIONS dirty bit.
This was only produced by the brw_wm_input_dimensions atom, which was
removed in the previous commit. So there's no need for the dirty bit.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Wed, 13 Mar 2013 04:12:08 +0000 (21:12 -0700)]
i965: Delete brw_vs_constval.c and the brw_wm_input_sizes atom.
This was only used to compute proj_attrib_mask, which was removed by the
previous commit. That makes this dead code.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Wed, 13 Mar 2013 04:09:35 +0000 (21:09 -0700)]
i965: Remove now dead brw_wm_prog_key::proj_attrib_mask field.
The previous commit removed the last user of this field, so there's no
longer any point in setting it. Removing this should eliminate
state-dependent recompiles, and make the precompile more reliable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Wed, 13 Mar 2013 04:09:19 +0000 (21:09 -0700)]
i965: Remove fixed-function texture projection avoidance optimization.
This optimization attempts to avoid extra attribute interpolation
instructions for texture coordinates where the W-component is 1.0.
Unfortunately, it requires a lot of complexity: the brw_wm_input_sizes
state atom (all the brw_vs_constval.c code) needs to run on each draw.
It computes the input_size_masks array, then uses that to compute
proj_attrib_mask. Differences in proj_attrib_mask can cause
state-dependent fragment shader recompiles. We also often fail to guess
proj_attrib_mask for the fragment shader precompile, causing us to
needlessly compile it twice.
Furthermore, this optimization only applies to fixed-function programs;
it does not help modern GLSL-based programs at all. Generally, older
fixed-function programs run fine on modern hardware anyway.
The optimization has existed in some form since the initial commit. When
we rewrote the fragment shader backend, we dropped it for a while. Eric
readded it in commit eb30820f268608cf451da32de69723036dddbc62 as part of
an attempt to cure a ~1% performance regression caused by converting the
fixed-function fragment shader generation code from Mesa IR to GLSL IR.
However, no performance data was included in the commit message, so it's
unclear whether or not it was successful.
Time has passed, so I decided to re-measure this. Surprisingly,
Eric's OpenArena timedemo actually runs /faster/ after removing this and
the brw_wm_input_sizes atom. On Ivybridge at 1024x768, I measured a
1.39532% +/- 0.91833% increase in FPS (n = 55). On Ironlake, there was
no statistically significant difference (n = 37).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Tue, 2 Apr 2013 17:28:07 +0000 (10:28 -0700)]
i965: Use ctx->Stencil._WriteEnabled in DEPTH_STENCIL_STATE.
This is the same computation as the _WriteEnabled flag, so we may as
well use it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kenneth Graunke [Tue, 2 Apr 2013 17:29:37 +0000 (10:29 -0700)]
i965: Fix stencil write enable flag in 3DSTATE_DEPTH_BUFFER on Gen7+.
ctx->Stencil.WriteMask is a statically sized array of 3 elements.
Checking it against 0 actually is a NULL check, and can never fail,
which meant that we always said stencil writes were enabled.
Use the new core Mesa derived state flag to fix this.
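The pitfall is easy to reproduce in plain C. The struct and function
below are hypothetical, and the derivation of the flag is a simplified
assumption; core Mesa's real _WriteEnabled computation may take more
state into account:

```c
#include <stdbool.h>

/* Hypothetical, pared-down stencil state. */
struct stencil_state {
   bool enabled;
   unsigned write_mask[3];
};

/* Buggy form (kept as a comment because compilers warn about it):
 *
 *    if (s->write_mask != 0) ...
 *
 * write_mask is an array, so the expression compares its address with
 * NULL and is always true, whatever the mask values are. */

/* Correct form: test an element of the array, not the array itself. */
static bool
stencil_write_enabled(const struct stencil_state *s, unsigned face)
{
   return s->enabled && s->write_mask[face] != 0;
}
```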
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Kenneth Graunke [Tue, 2 Apr 2013 17:22:18 +0000 (10:22 -0700)]
mesa: Add new ctx->Stencil._WriteEnabled derived state flag.
i965 needs to know whether stencil writes are enabled in several places,
and gets the test wrong sometimes. While we could create a function to
compute this, it seems generally useful enough to warrant a new piece of
derived state. Also, all the plumbing is already in place.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Roland Scheidegger [Thu, 4 Apr 2013 21:20:49 +0000 (23:20 +0200)]
gallivm: some minor cube map cleanup
The ar_ge_as_at variable was just very very confusing since the condition
was actually the other way around (as_at_ge_ar). So change the condition
(and the selects depending on it) to match the variable name.
And also change the chosen major axis in case the coord values are the
same. OpenGL doesn't care one bit which one is chosen in this case but
it looks like dx10 would require z chosen over y, and y chosen over x
(previously did x chosen over y, y chosen over z). Since it's all the
same effort just honor dx10's wishes. (Though actually, for some preferred
orderings, we could save one (or two with derivatives) selects since the
tnewx and tnewz (and the corresponding dmax values) are the same.)
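The tie-breaking order described above (z over y, y over x) can be sketched
in scalar C. This is only an illustration with hypothetical names; the real
gallivm code builds vectorized LLVM IR with selects instead of branches:

```c
/* Hypothetical scalar sketch of major-axis selection for a cube map. */
enum cube_axis_sketch { AXIS_X, AXIS_Y, AXIS_Z };

static float
absf_sketch(float v)
{
   return v < 0.0f ? -v : v;
}

static enum cube_axis_sketch
select_major_axis(float x, float y, float z)
{
   float ax = absf_sketch(x);
   float ay = absf_sketch(y);
   float az = absf_sketch(z);

   /* ">=" means z wins any tie with x or y. */
   if (az >= ax && az >= ay)
      return AXIS_Z;

   /* z is not the largest here; ">=" means y wins a tie with x. */
   if (ay >= ax)
      return AXIS_Y;

   return AXIS_X;
}
```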
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Eric Anholt [Sat, 1 Dec 2012 00:34:09 +0000 (16:34 -0800)]
i965: Ask the register allocator to round-robin through registers.
The way we were allocating registers before, packing into low register
numbers for Ironlake, resulted in an overly-constrained dependency graph
for instruction scheduling. Improves GLBenchmark 2.1 performance by
4.5% +/- 0.7% (n=26). No difference on my old GLSL demo (n=20). No
difference on nexuiz (n=15).
v2: Fix off-by-one bug that made the change only work for 16-wide on i965.
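A toy illustration of the difference between packing into low register
numbers and round-robin selection. The pool type and functions are
hypothetical and far simpler than the real graph-coloring allocator; the
point is only that round-robin avoids immediately reusing a register that
was just freed, which is the kind of reuse that adds false dependencies:

```c
#include <stdbool.h>

#define NUM_REGS_SKETCH 8   /* hypothetical register file size */

struct reg_pool_sketch {
   bool in_use[NUM_REGS_SKETCH];
   int next;   /* where the round-robin search starts */
};

/* Packing: always hand out the lowest free register, or -1 if none. */
static int
alloc_first_fit(struct reg_pool_sketch *p)
{
   for (int i = 0; i < NUM_REGS_SKETCH; i++) {
      if (!p->in_use[i]) {
         p->in_use[i] = true;
         return i;
      }
   }
   return -1;
}

/* Round-robin: start searching just past the last register handed out. */
static int
alloc_round_robin(struct reg_pool_sketch *p)
{
   for (int n = 0; n < NUM_REGS_SKETCH; n++) {
      int i = (p->next + n) % NUM_REGS_SKETCH;
      if (!p->in_use[i]) {
         p->in_use[i] = true;
         p->next = (i + 1) % NUM_REGS_SKETCH;
         return i;
      }
   }
   return -1;
}

static void
free_reg(struct reg_pool_sketch *p, int reg)
{
   p->in_use[reg] = false;
}
```

With first-fit, allocating, freeing and allocating again returns the same
register twice; with round-robin the second allocation moves on to the next
register.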
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Zack Rusin [Thu, 4 Apr 2013 04:15:13 +0000 (21:15 -0700)]
llvmpipe: implement ucmp
and add a test for it
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Paul Berry [Tue, 2 Apr 2013 16:51:47 +0000 (09:51 -0700)]
Avoid spurious GCC warnings in STATIC_ASSERT() macro.
GCC 4.8 now warns about typedefs that are local to a scope and not
used anywhere within that scope. This produced spurious warnings with
the STATIC_ASSERT() macro (which used a typedef to provoke a compile
error in the event of an assertion failure).
This patch switches to a simpler technique that avoids the warning.
v2: Avoid GCC-specific syntax. Also update p_compiler.h.
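A sketch of the two styles, for illustration. The macro names are
hypothetical, the typedef form shown in the comment is an assumed shape of
the older approach, and the sizeof form is one typedef-free technique that
sticks to standard C; it is not claimed to be the exact macro in the tree:

```c
/*
 * Typedef style (assumed shape).  The scope-local typedef is never used,
 * which is what GCC 4.8 warns about:
 *
 *    do { typedef int static_assertion_failed[(COND) ? 1 : -1]; } while (0)
 */

/*
 * Typedef-free style: take sizeof an array type whose length is 1 when
 * COND is true and -1 (a compile error) when it is false.
 */
#define STATIC_ASSERT_SKETCH(COND) \
   do { \
      (void) sizeof(char [1 - 2 * !(COND)]); \
   } while (0)

static int
check_layout_sketch(void)
{
   STATIC_ASSERT_SKETCH(sizeof(char) == 1);
   STATIC_ASSERT_SKETCH(sizeof(int) >= 2);
   return 1;   /* reaching here means both assertions compiled */
}
```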
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Erik Faye-Lund [Tue, 26 Mar 2013 13:48:45 +0000 (14:48 +0100)]
freedreno: document debug flag
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Brian Paul <brianp@vmware.com>
Brian Paul [Wed, 3 Apr 2013 19:46:40 +0000 (13:46 -0600)]
st/wgl: add HUD support
v2: fix a few minor issues spotted by Jose.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Wed, 3 Apr 2013 19:45:47 +0000 (13:45 -0600)]
st/wgl: make stw_current_context() non-static
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Wed, 3 Apr 2013 19:36:50 +0000 (13:36 -0600)]
util: add debug_memory_check_block(), debug_memory_tag()
The former just checks that the given block is valid by checking
the header and footer.
The latter sets the memory block's tag. With extra debug code, we
can use that for monitoring/checking particular allocations.
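A minimal sketch of header/footer checking and tagging, assuming a simple
block layout. All names, the magic value and the layout below are
hypothetical and do not reflect the actual u_debug_memory.c implementation:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAGIC_SKETCH 0x1ee7c0deu   /* hypothetical guard value */

struct block_header_sketch {
   unsigned magic;
   unsigned tag;
   size_t size;   /* size of the user area in bytes */
};

/* Allocate header + user area + a footer holding the guard value. */
static void *
sketch_malloc(size_t size)
{
   unsigned footer = MAGIC_SKETCH;
   struct block_header_sketch *hdr =
      malloc(sizeof(*hdr) + size + sizeof(footer));

   if (!hdr)
      return NULL;
   hdr->magic = MAGIC_SKETCH;
   hdr->tag = 0;
   hdr->size = size;
   /* The footer may be unaligned, so copy it byte-wise. */
   memcpy((char *)(hdr + 1) + size, &footer, sizeof(footer));
   return hdr + 1;
}

/* Valid means both guard values are intact. */
static bool
sketch_check_block(const void *ptr)
{
   const struct block_header_sketch *hdr =
      (const struct block_header_sketch *)ptr - 1;
   unsigned footer;

   if (hdr->magic != MAGIC_SKETCH)
      return false;
   memcpy(&footer, (const char *)ptr + hdr->size, sizeof(footer));
   return footer == MAGIC_SKETCH;
}

static void
sketch_tag(void *ptr, unsigned tag)
{
   ((struct block_header_sketch *)ptr - 1)->tag = tag;
}

static unsigned
sketch_get_tag(const void *ptr)
{
   return ((const struct block_header_sketch *)ptr - 1)->tag;
}

static void
sketch_free(void *ptr)
{
   free((struct block_header_sketch *)ptr - 1);
}
```

Writing one byte past the user area clobbers the footer, so a later check
on that block reports it as invalid.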
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Wed, 3 Apr 2013 19:33:38 +0000 (13:33 -0600)]
gallium/hud: replace malloc w/ MALLOC
To match the FREE() call used later. Fixes things on Windows.
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Vincent Lejeune [Wed, 3 Apr 2013 19:19:22 +0000 (21:19 +0200)]
r600g/llvm: Workaround for wrong tex.offset_*
Roland Scheidegger [Wed, 3 Apr 2013 22:56:23 +0000 (00:56 +0200)]
gallivm: honor explicit derivatives values for cube maps.
This is trivial now, though need to make sure we pass all the necessary
derivative values (which is 3 each for ddx/ddy not 2).
Passes piglit arb_shader_texture_lod-texgradcube test.
v2: add the forgotten abs() for all incoming derivatives (discovered
by new piglit arb_shader_texture_lod-texgradcube test, though more by
luck as it was failing only for exactly one pixel...).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Wed, 3 Apr 2013 01:26:22 +0000 (03:26 +0200)]
gallivm: do per-pixel cube face selection (finally!!!)
This proved to be tricky, the problem is that after selection/mirroring
we cannot calculate reasonable derivatives (if not all pixels in a quad
end up on the same face the derivatives could get "randomly" exceedingly
large).
However, it is actually quite easy to simply calculate the derivatives
before selection/mirroring and then transform them similar to
the cube coordinates (they only need selection/projection, but not
mirroring as we're not interested in the sign bit, of course). While
there is a tiny bit more work to do (need to calculate derivs for 3
coords instead of 2, and additional selects) it also simplifies things
somewhat for the coord selection itself (as we save some broadcast aos
shuffles, and we don't need to calculate the average vector) - hence if
derivatives aren't needed this should actually be faster.
Also, this has the benefit that this will (trivially) work for explicit
derivatives too, which we completely ignored before that (will be in a
separate commit for better trackability).
Note that while the way for getting rho looks very different, it should
result in "nearly" the same values as before (the "nearly" is only because
before the code would choose the face based on an "average" vector and hence
the derivatives calculated according to this face, where now (for implicit
derivatives) the derivatives are projected on the face selected for the
first (top-left) pixel in a quad, so not necessarily the same face).
The transformation done might not quite be state-of-the-art, calculating
length(dx,dy) as max(dx,dy) certainly isn't either but this stays the
same as before (that is I think a better transform would _somehow_ take
the "derivative major axis" into account so that derivative changes in
the major axis wouldn't get ignored).
Should solve some accuracy problems with cubemaps (can easily be seen with
the cubemap demo when switching wrapping/filtering), though we still don't
do seamless filtering to fix it completely (so not per-sample but per-pixel
is certainly better than per-quad and already sufficient for accurate
results with nearest tex filter).
As for performance, it seems to be a tiny bit faster too (maybe 3% or so
with cubemap demo). Which I'd have expected with nearest/nearest filtering
where this will be fewer instructions, but the difference seems to actually
be larger with linear/linear_mipmap_linear where it is slightly more
instructions, probably the code appears less serialized allowing better
scheduling (on a sandy bridge cpu). It actually seems to be now at least
as fast as the old path using a conditional when using 128bit vectors too
(that is probably more a result of testing with a newer cpu though), for now
that old path is still there but unused.
No piglit regressions.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Wed, 3 Apr 2013 00:49:56 +0000 (02:49 +0200)]
gallivm: minor rho calculation optimization for 1 or 3 coords
Using a different packing for the single coord case should save a shuffle.
Plus some minor style fixes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Tue, 2 Apr 2013 23:06:52 +0000 (01:06 +0200)]
gallivm: use f16c hw support for float->half and half->float conversion
Should be way faster of course on cpus supporting this (includes AMD
Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)).
Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge.
Reviewed-by: Brian Paul <brianp@vmware.com>
Zack Rusin [Sat, 30 Mar 2013 13:21:41 +0000 (06:21 -0700)]
draw/llvmpipe: allow independent so attachments to the vs
When geometry shaders are present, one needs to be able to create
an empty geometry shader with stream output that needs to be
resolved later and attached to the currently bound vertex shader.
Let's add support for it to llvmpipe and draw. draw allows attaching
independent stream output info to any vertex shader and llvmpipe
resolves at draw time which vertex shader the given empty geometry
shader should be linked to.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Sat, 30 Mar 2013 07:21:03 +0000 (00:21 -0700)]
llvmpipe: reset so buffers when not appending
We need to reset the internal state of the so buffers or we'll
keep appending even though we're not supposed to.
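A small sketch of the idea, assuming each stream output target tracks how
many bytes it has written so far. The struct and functions are hypothetical
and are not llvmpipe's actual types or entry points:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stream output target with a running write offset. */
struct so_target_sketch {
   size_t internal_offset;   /* bytes written so far */
};

/* On bind, keep the offset only when the caller asked to append. */
static void
bind_so_target(struct so_target_sketch *t, bool append)
{
   if (!append)
      t->internal_offset = 0;
}

/* Each write advances the offset, so the next write lands after it. */
static void
so_write(struct so_target_sketch *t, size_t bytes)
{
   t->internal_offset += bytes;
}
```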
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Sat, 30 Mar 2013 07:20:05 +0000 (00:20 -0700)]
draw: remove unused function
we use draw_set_mapped_so_targets nowadays
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Sat, 30 Mar 2013 02:33:34 +0000 (19:33 -0700)]
draw/llvm: use an enum instead of magic numbers
I think this was there before and got accidentally
removed during a merge. Same code as for the GS
context, which is also using an enum instead of
hardcoded numbers.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Sat, 30 Mar 2013 00:18:42 +0000 (17:18 -0700)]
draw/gs: cleanup some debugging code
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Fri, 29 Mar 2013 11:52:29 +0000 (04:52 -0700)]
draw/so: maintain an exact number of written vertices
It's quite helpful during the rendering when we know
exactly the count of the vertices available in the
buffer.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Fri, 29 Mar 2013 11:50:32 +0000 (04:50 -0700)]
draw: Implement support for primitive id
We were largely ignoring primitive id.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Thu, 28 Mar 2013 03:13:13 +0000 (20:13 -0700)]
draw/so: Fix bogus assert
We do support so with multiple primitives.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Thu, 28 Mar 2013 03:11:16 +0000 (20:11 -0700)]
draw/gs: Fix memory corruption with multiple primitives
We were flushing with incorrect number of primitives. TGSI exec
can only work with a single primitive at a time. Plus the fetching
with multiple primitives on llvm paths wasn't copying the last
element.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Wed, 27 Mar 2013 11:27:59 +0000 (04:27 -0700)]
gallivm: cleanup the gs interface
Instead of void pointers use a base interface.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Brian Paul [Wed, 3 Apr 2013 16:23:57 +0000 (10:23 -0600)]
svga: add new memory-used HUD query
To track the amount of memory used by all pipe_resources (textures
and buffers).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Wed, 3 Apr 2013 16:23:16 +0000 (10:23 -0600)]
util: add new util_resource_size() function in u_resource.[ch]
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Wed, 3 Apr 2013 16:21:34 +0000 (10:21 -0600)]
util: move functions from u_resource.c to u_transfer.c
The functions are prototyped in u_transfer.h and are related to the
other functions in u_transfer.c.
The next patch will re-use the u_resource.c file for new code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Vincent Lejeune [Wed, 3 Apr 2013 16:39:18 +0000 (18:39 +0200)]
r600g/llvm: Do not override llvm provided stack_size
Vincent Lejeune [Tue, 2 Apr 2013 17:19:24 +0000 (19:19 +0200)]
r600g/llvm: Do not change cf_alu inst when adding alus