mesa.git
12 years agoglsl: Fix assertion failure on handling switch on uint expressions.
Eric Anholt [Mon, 14 May 2012 15:51:03 +0000 (08:51 -0700)]
glsl: Fix assertion failure on handling switch on uint expressions.

Fixes piglit glsl-1.30/execution/switch/fs-uint.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
12 years agoglsl: Reject non-scalar switch expressions.
Eric Anholt [Mon, 14 May 2012 15:45:59 +0000 (08:45 -0700)]
glsl: Reject non-scalar switch expressions.

The comment quotes spec saying that only scalar integers are allowed,
but we only checked for integer.

Fixes piglit switch-expression-const-ivec2.vert

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
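
The check described in the commit above (integer and scalar, not just integer) can be sketched as follows. This is a minimal C illustration, not the compiler's actual C++ code; `struct simple_type` and the function names are hypothetical stand-ins.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the compiler's type description. */
enum base_type { TYPE_INT, TYPE_UINT, TYPE_FLOAT, TYPE_BOOL };

struct simple_type {
    enum base_type base;
    unsigned vector_elements;  /* 1 for scalars, 2..4 for vectors */
    unsigned matrix_columns;   /* 1 for non-matrices */
};

static bool is_integer(const struct simple_type *t)
{
    return t->base == TYPE_INT || t->base == TYPE_UINT;
}

static bool is_scalar(const struct simple_type *t)
{
    return t->vector_elements == 1 && t->matrix_columns == 1;
}

/* A switch expression must be a scalar integer; checking only
 * is_integer() would wrongly accept an ivec2. */
bool switch_expression_type_ok(const struct simple_type *t)
{
    return is_integer(t) && is_scalar(t);
}
```
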
12 years agoglsl: Let the constructor figure out the types of switch-related expressions.
Eric Anholt [Mon, 14 May 2012 15:39:54 +0000 (08:39 -0700)]
glsl: Let the constructor figure out the types of switch-related expressions.

I noticed this while unindenting the code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
12 years agoglsl: Fix indentation of switch code.
Eric Anholt [Mon, 14 May 2012 15:37:50 +0000 (08:37 -0700)]
glsl: Fix indentation of switch code.

I managed to completely trash it in 22d81f15.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
12 years agoi965/vs: Fix up swizzle for dereference_array of matrices.
Eric Anholt [Thu, 10 May 2012 22:38:11 +0000 (15:38 -0700)]
i965/vs: Fix up swizzle for dereference_array of matrices.

Fixes assertion failure in piglit:
vs-mat2-struct-assignment.shader_test
vs-mat2-array-assignment.shader_test

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
12 years agomesa: Throw error on glGetActiveUniform inside Begin/End.
Eric Anholt [Thu, 10 May 2012 21:56:48 +0000 (14:56 -0700)]
mesa: Throw error on glGetActiveUniform inside Begin/End.

Fixes piglit GL_ARB_shader_objects/getactiveuniform-beginend.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
12 years agoglsl: Improve the local dead code optimization to eliminate unused channels.
Eric Anholt [Thu, 23 Feb 2012 19:51:04 +0000 (11:51 -0800)]
glsl: Improve the local dead code optimization to eliminate unused channels.

Total instructions: 261582 -> 261316
135/2147 programs affected (6.3%)
36752 -> 36486 instructions in affected programs (0.7% reduction)

This excludes a tropics shader that now gets 16-wide mode and throws
off the numbers.  5 shaders are hurt: 4 tropics shaders gain two extra
MOVs, apparently because we don't split register names according to
independent webs, and one gstreamer shader where it looks like
try_rewrite_rhs_to_dst() is falling on its face.

This should also help avoid a regression in VSes from idr's ARB
programs to GLSL work.
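
The idea of eliminating unused channels can be sketched with write masks. This is a simplified illustration under the assumption that each assignment carries a 4-bit write mask; the struct and function names are hypothetical and do not mirror the GLSL IR classes.

```c
/* Hypothetical model: bit i of write_mask set means channel i (x, y, z, w)
 * of the destination variable is written by this assignment. */
struct assignment {
    int var;
    unsigned write_mask;
};

/* Channels of an earlier assignment that are overwritten before any read
 * are dead.  Strip them from the earlier write mask; return 1 when no
 * channel survives, meaning the whole assignment can be removed. */
int trim_unused_channels(struct assignment *prev, unsigned overwritten_unread)
{
    prev->write_mask &= ~overwritten_unread;
    return prev->write_mask == 0;
}
```
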

12 years agoi965/fs: Do more register coalescing by using the interference graph.
Eric Anholt [Tue, 8 May 2012 17:18:20 +0000 (10:18 -0700)]
i965/fs: Do more register coalescing by using the interference graph.

By using the live variables code for determining interference, we can
handle coalescing in the presence of control flow, which the other
register coalescing path couldn't.

Total instructions: 207184 -> 206990
74/1246 programs affected (5.9%)
33993 -> 33799 instructions in affected programs (0.6% reduction)

There is a newerth shader that loses out, because of some extra MOVs
that now get their dead-code nature obscured by coalescing.  This
should be fixed by doing better at dead code elimination.
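
As a rough sketch of the interference test that coalescing relies on: two registers can share a name when their live ranges do not overlap. The `struct live_range` below is a hypothetical simplification (one definition point, one last use); the driver's live variables code handles control flow and is more involved.

```c
#include <stdbool.h>

/* Hypothetical live range: instruction index of the first definition and
 * of the last use. */
struct live_range {
    int start;
    int end;
};

/* Ranges that merely touch (one ends at the instruction where the other
 * begins, as for the source and destination of a MOV) do not interfere. */
bool ranges_interfere(struct live_range a, struct live_range b)
{
    int start = a.start > b.start ? a.start : b.start;
    int end = a.end < b.end ? a.end : b.end;
    return start < end;
}
```
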

12 years agonouveau: place static buffers in VRAM if preferred by the driver
Christoph Bumiller [Thu, 17 May 2012 12:43:47 +0000 (14:43 +0200)]
nouveau: place static buffers in VRAM if preferred by the driver

12 years agonv50/ir: fix reversed order of lane ops in quadops
Christoph Bumiller [Wed, 9 May 2012 18:32:44 +0000 (20:32 +0200)]
nv50/ir: fix reversed order of lane ops in quadops

12 years agonv50,nvc0: handle user vertex buffers
Christoph Bumiller [Wed, 16 May 2012 19:08:37 +0000 (21:08 +0200)]
nv50,nvc0: handle user vertex buffers

And restructure VBO validation a little in the process.

12 years agonv50,nvc0: handle user index buffers
Christoph Bumiller [Wed, 16 May 2012 18:54:23 +0000 (20:54 +0200)]
nv50,nvc0: handle user index buffers

12 years agonv50,nvc0: handle user constbufs without wrapping them in a resource
Christoph Bumiller [Wed, 16 May 2012 18:52:41 +0000 (20:52 +0200)]
nv50,nvc0: handle user constbufs without wrapping them in a resource

12 years agost/mesa: set PIPE_BIND_STREAM_OUTPUT for TFB target in st_bufferobj_data
Christoph Bumiller [Sun, 13 May 2012 19:32:47 +0000 (21:32 +0200)]
st/mesa: set PIPE_BIND_STREAM_OUTPUT for TFB target in st_bufferobj_data

12 years agodarwin: Eliminate a possible race condition while destroying a surface
Jeremy Huddleston [Sat, 28 Apr 2012 01:36:33 +0000 (18:36 -0700)]
darwin: Eliminate a possible race condition while destroying a surface

Introduced by: c60ffd2840036af1ea6f2b6c6e1e9014bb8e2c34
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
12 years agodarwin: Unlock our mutex before destroying it
Jeremy Huddleston [Fri, 11 May 2012 01:56:50 +0000 (18:56 -0700)]
darwin: Unlock our mutex before destroying it

http://xquartz.macosforge.org/trac/ticket/575

Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
12 years agogallium/radeon: Fix r300g tiling breakage.
Michel Dänzer [Wed, 16 May 2012 21:52:19 +0000 (23:52 +0200)]
gallium/radeon: Fix r300g tiling breakage.

Commit 11f056a3f0b87e86267efa8b5ac9d36a343c9dc1 broke the r300g build. Fix it
up, and reinstate some code which isn't needed by r600g and radeonsi but is
by r300g.

12 years agogallium/auxiliary/pipe-loader: Fix usage of anonymous union.
Francisco Jerez [Wed, 16 May 2012 13:43:29 +0000 (15:43 +0200)]
gallium/auxiliary/pipe-loader: Fix usage of anonymous union.

Anonymous unions aren't part of the C99 standard.  Fixes build on GCC
versions older than 4.6.

https://bugs.freedesktop.org/show_bug.cgi?id=50001

Reported-by: Michael Lange <michaell@gmx.org>
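
For illustration, the portable form gives the union a member name so that it is valid C99 (anonymous unions were only standardized in C11). The struct layout and names below are hypothetical, not copied from the pipe loader.

```c
enum device_type { DEVICE_PCI, DEVICE_OTHER };

struct device {
    enum device_type type;
    union {
        struct {
            int vendor_id;
            int chip_id;
        } pci;
    } u;  /* named member: accepted by C99 compilers */
};

/* Access goes through the union's name: dev->u.pci, not dev->pci. */
int device_vendor_id(const struct device *dev)
{
    return dev->type == DEVICE_PCI ? dev->u.pci.vendor_id : -1;
}
```
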
12 years agoradeonsi: Initial tiling support.
Michel Dänzer [Wed, 16 May 2012 16:19:13 +0000 (18:19 +0200)]
radeonsi: Initial tiling support.

Largely based on the corresponding Evergreen support in r600g.

12 years agor600g: Set tiling information for BOs being shared.
Michel Dänzer [Wed, 16 May 2012 15:45:17 +0000 (17:45 +0200)]
r600g: Set tiling information for BOs being shared.

Fixes https://bugs.freedesktop.org/show_bug.cgi?id=48747

12 years agost/xorg: Better handling of EXA copies.
Michel Dänzer [Wed, 16 May 2012 15:45:10 +0000 (17:45 +0200)]
st/xorg: Better handling of EXA copies.

Always use the resource_copy_region hook. If a source and destination rectangle
overlap, copy to/from a temporary pixmap.
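
A minimal sketch of the overlap handling, using plain byte surfaces instead of pixmaps. All names are hypothetical, and the real code goes through the resource_copy_region hook rather than memcpy.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Two equally sized rectangles overlap when their origins are closer
 * than the rectangle size on both axes. */
bool rects_overlap(int sx, int sy, int dx, int dy, int w, int h)
{
    return abs(sx - dx) < w && abs(sy - dy) < h;
}

static void copy_rows(unsigned char *dst, int dst_pitch, int dx, int dy,
                      const unsigned char *src, int src_pitch, int sx, int sy,
                      int w, int h)
{
    for (int y = 0; y < h; y++)
        memcpy(dst + (size_t)(dy + y) * dst_pitch + dx,
               src + (size_t)(sy + y) * src_pitch + sx, (size_t)w);
}

/* Copy a w x h region within one surface.  Overlapping regions are
 * staged through a temporary buffer, so no copy reads data that it
 * has already overwritten. */
bool copy_region(unsigned char *surf, int pitch,
                 int sx, int sy, int dx, int dy, int w, int h)
{
    if (!rects_overlap(sx, sy, dx, dy, w, h)) {
        copy_rows(surf, pitch, dx, dy, surf, pitch, sx, sy, w, h);
        return true;
    }
    unsigned char *tmp = malloc((size_t)w * h);
    if (!tmp)
        return false;
    copy_rows(tmp, w, 0, 0, surf, pitch, sx, sy, w, h);
    copy_rows(surf, pitch, dx, dy, tmp, w, 0, 0, w, h);
    free(tmp);
    return true;
}
```
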

12 years agoradeonsi: Bump MAX_DRAW_CS_DWORDS.
Michel Dänzer [Tue, 15 May 2012 15:14:12 +0000 (17:14 +0200)]
radeonsi: Bump MAX_DRAW_CS_DWORDS.

I missed this when updating si_context_draw().

12 years agodraw,llvmpipe: Avoid named struct types on LLVM 3.0 and later.
José Fonseca [Wed, 16 May 2012 14:00:23 +0000 (15:00 +0100)]
draw,llvmpipe: Avoid named struct types on LLVM 3.0 and later.

Starting with LLVM 3.0, named structures are meant not for debugging, but
for recursive data types, previously also known as opaque types.

The recursive nature of these types leads to several memory management
difficulties.  Given that we don't actually need recursive types, avoid
them altogether.

This is an attempt to address fdo bugs 41791 and 44466. The issue is
somewhat random so there's no easy way to check how effective this is.

12 years agollvmpipe: Color slot interpolation can be flat or perspective, not linear.
Olivier Galibert [Tue, 15 May 2012 20:10:08 +0000 (22:10 +0200)]
llvmpipe: Color slot interpolation can be flat or perspective, not linear.

Fixes a bunch of glsl 1.10 interpolation piglit tests.

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: José Fonseca <jfonseca@vmware.com>
12 years agoconfigure.ac: Fix typos in the r600-llvm-compiler option
Homer Hsing [Tue, 15 May 2012 22:56:17 +0000 (18:56 -0400)]
configure.ac: Fix typos in the r600-llvm-compiler option

Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
12 years agogallivm: Add MCRegisterInfo.h to silence benign warnings about missing implementation.
José Fonseca [Tue, 15 May 2012 12:10:26 +0000 (05:10 -0700)]
gallivm: Add MCRegisterInfo.h to silence benign warnings about missing implementation.

Trivial.

12 years agoi965/blorp: Move exec() out of brw_blorp_params.
Paul Berry [Tue, 15 May 2012 14:29:26 +0000 (07:29 -0700)]
i965/blorp: Move exec() out of brw_blorp_params.

No functional change.  This patch replaces the
brw_blorp_params::exec() method with a global function
brw_blorp_exec() that performs the operation described by the params
data structure.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
12 years agoi965/gen6: Initial implementation of MSAA.
Paul Berry [Mon, 30 Apr 2012 04:41:42 +0000 (21:41 -0700)]
i965/gen6: Initial implementation of MSAA.

This patch enables MSAA for Gen6, by modifying intel_mipmap_tree to
understand multisampled buffers, adapting the rendering pipeline setup
to enable multisampled rendering, and adding multisample resolve
operations to brw_blorp_blit.cpp. Some preparation work is also
included for Gen7, but it is not yet enabled.

MSAA support is still fairly preliminary.  In particular, the
following are not yet supported:
- Fully general blits between MSAA and non-MSAA buffers.
- Formats other than RGBA8, DEPTH24, and STENCIL8.
- Centroid interpolation.
- Coverage parameters (glSampleCoverage, GL_SAMPLE_ALPHA_TO_COVERAGE,
  GL_SAMPLE_ALPHA_TO_ONE, GL_SAMPLE_COVERAGE, GL_SAMPLE_COVERAGE_VALUE,
  GL_SAMPLE_COVERAGE_INVERT).

Fixes piglit tests "EXT_framebuffer_multisample/accuracy" on
i965/Gen6.

v2:
- In intel_alloc_renderbuffer_storage(), quantize the requested number
  of samples to the next higher sample count supported by the
  hardware.  This ensures that a query of GL_SAMPLES will return the
  correct value.  It also ensures that MSAA is fully disabled on Gen7
  for now (since Gen7 MSAA support doesn't work yet).
- When reading from a non-MSAA surface, ensure that s_is_zero is true
  so that we won't try to read from a nonexistent sample.
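
The sample-count quantization from the v2 notes could look roughly like this. It is a sketch with hypothetical names; the list of supported counts is passed in, so an empty list models a generation where MSAA is disabled, and clamping to the largest supported count is an assumption.

```c
#include <stddef.h>

/* Round a requested sample count up to the next count the hardware
 * supports.  `supported` must be sorted in ascending order. */
unsigned quantize_num_samples(unsigned requested,
                              const unsigned *supported, size_t count)
{
    if (requested == 0)
        return 0;  /* single-sampled */
    for (size_t i = 0; i < count; i++) {
        if (supported[i] >= requested)
            return supported[i];
    }
    /* Assumption: requests above the maximum clamp to the maximum; with
     * no supported counts, MSAA stays off. */
    return count ? supported[count - 1] : 0;
}
```
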

12 years agoi965/gen6+: Add code to perform blits on the render path ("blorp").
Paul Berry [Mon, 30 Apr 2012 05:44:25 +0000 (22:44 -0700)]
i965/gen6+: Add code to perform blits on the render path ("blorp").

This patch expands the "blorp" component to be able to perform blits
as well as HiZ resolves.  The new blitting code is located in
brw_blorp_blit.cpp.  This includes the necessary fragment shader code
to look up pixels in the source buffer (which is configured as a
texture) and output them to the destination buffer (which is
configured as the render target).

Most of the time the fragment shader code is simple and
straightforward, since it merely has to apply a coordinate offset,
read from the texture, and write to the render target.  However, in
the case of blitting stencil buffers, things are more complicated,
since the GPU stores stencil data using W tiling, and W tiling is not
supported for textures or render targets.  So, we set up the stencil
buffers as Y tiled, and emit fragment shader code that adjusts the
coordinates to account for the difference between W and Y tiling.
Furthermore, since a rectangular region in W tiling does not
necessarily correspond to a rectangular region in Y tiling, we widen
the rectangle primitive to the nearest tile boundary and have the
fragment shader "kill" any pixels that don't fall inside the actual
desired destination rectangle.
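
The widening and kill steps can be sketched as below. This is only an illustration with hypothetical names: tile dimensions are parameters, coordinates are assumed non-negative, and the W-to-Y coordinate adjustment itself is not shown.

```c
#include <stdbool.h>

struct rect {
    int x0, y0, x1, y1;  /* x1 and y1 are exclusive */
};

static int align_down(int v, int a)
{
    return v - v % a;
}

static int align_up(int v, int a)
{
    return align_down(v + a - 1, a);
}

/* Grow the rectangle outward to the nearest tile boundaries. */
struct rect widen_to_tiles(struct rect r, int tile_w, int tile_h)
{
    struct rect out = {
        align_down(r.x0, tile_w), align_down(r.y0, tile_h),
        align_up(r.x1, tile_w), align_up(r.y1, tile_h)
    };
    return out;
}

/* What the fragment shader's "kill" amounts to: discard pixels of the
 * widened primitive that lie outside the rectangle actually requested. */
bool pixel_killed(struct rect actual, int x, int y)
{
    return x < actual.x0 || x >= actual.x1 ||
           y < actual.y0 || y >= actual.y1;
}
```
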

All of this is a necessary prerequisite for implementing MSAA, since
we'll need to be able to blit between multisample color, depth, and
stencil buffers and their non-multisampled counterparts, and none of
the existing blitting mechanisms support multisampling.

In addition, the new blitting code should speed up operations where we
previously fell back to software rasterization, such as blitting of
stencil buffers.  The current fallback sequence is: first we try to do
a blit using the hardware blitting engine.  If that fails we try to do
a blit using the render path.  If that also fails then we do the blit
using a meta-op (which may or may not fall back to software
rasterization).
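
The fallback order described in this paragraph, sketched with hypothetical callback types. The two stub attempts exist only to make the example self-contained.

```c
#include <stdbool.h>

enum blit_path { BLIT_PATH_BLITTER, BLIT_PATH_RENDER, BLIT_PATH_META };

/* An attempt returns true when that path could perform the blit. */
typedef bool (*blit_attempt)(void);

/* Try the hardware blitter first, then the render path; the meta-op is
 * the last resort and is assumed always to complete the blit. */
enum blit_path blit_with_fallback(blit_attempt try_blitter,
                                  blit_attempt try_render_path)
{
    if (try_blitter())
        return BLIT_PATH_BLITTER;
    if (try_render_path())
        return BLIT_PATH_RENDER;
    return BLIT_PATH_META;
}

/* Stub attempts for demonstration. */
bool attempt_fails(void) { return false; }
bool attempt_succeeds(void) { return true; }
```
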

Note that blitting using the render path has some limitations at the
moment: it only supports a few formats, and it doesn't support
clipping or scissoring.  These limitations will be addressed in future
patch series.

v2:
- Add the code that configures the WM program to
  gen{6,7}_emit_wm_config() and gen7_emit_ps_config() rather than
  creating separate ...enable() functions.
- Call intel_prepare_render before determining which miptrees we are
  blitting from/to, because it may cause miptrees to be reallocated.
- Allow the blit to mirror X and/or Y coordinates.
- Disable blorp blits on Gen7 for now, since they aren't working yet.

12 years agoi965: Expose surface setup internals for use by blits.
Paul Berry [Fri, 27 Apr 2012 01:01:01 +0000 (18:01 -0700)]
i965: Expose surface setup internals for use by blits.

This patch exposes the functions brw_get_surface_tiling_bits and
gen7_set_surface_tiling, so that they can be re-used when setting up
surface states in gen6_blorp.cpp and gen7_blorp.cpp.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
12 years agoi965: split gen{6,7}_blorp_exec functions into manageable chunks.
Paul Berry [Mon, 30 Apr 2012 21:29:35 +0000 (14:29 -0700)]
i965: split gen{6,7}_blorp_exec functions into manageable chunks.

This patch splits up the gen6_blorp_exec and gen7_blorp_exec
functions, which were very long, into simple component functions.
With a few exceptions, there is one function per state packet.

This will allow blit functionality to be added without significantly
complicating the code.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
v2: Rename the functions gen{6,7}_emit_wm_disable() to
gen{6,7}_emit_wm_config() (since the WM is not actually disabled
during HiZ ops; it simply doesn't have a program).  Also, on gen7,
split out the configuration of 3DSTATE_PS to a separate function
gen7_emit_ps_config().

12 years agoi965: Parameterize HiZ code to prepare for adding blitting.
Paul Berry [Mon, 30 Apr 2012 05:00:46 +0000 (22:00 -0700)]
i965: Parameterize HiZ code to prepare for adding blitting.

This patch groups together the parameters used by the HiZ functions
into a new data structure, brw_hiz_resolve_params, rather than passing
each parameter individually between the HiZ functions.  This data
structure is a subclass of brw_blorp_params, which represents the
parameters of a general-purpose blit or resolve operation.  A future
patch will add another subclass for blits.

In addition, this patch generalizes the (width, height) parameters to
a full rect (x0, y0, x1, y1), since blitting operations will need to
be able to operate on arbitrary rectangles.  Also, it renames several
of the HiZ functions to reflect the expanded role they will serve.

v2: Rename brw_hiz_resolve_params to brw_hiz_op_params.  Move
gen{6,7}_blorp_exec() functions back into gen{6,7}_blorp.h.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
12 years agoi965: Implement guardband clipping on Ivybridge.
Kenneth Graunke [Sat, 5 May 2012 00:06:39 +0000 (17:06 -0700)]
i965: Implement guardband clipping on Ivybridge.

Improves performance in Citybench:
- 320x240: 9.19589% +/- 0.557621%
- 1280x480: 3.90797% +/- 0.774429%

No apparent difference in OpenArena.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
12 years agoi965: Implement guardband clipping on Sandybridge.
Kenneth Graunke [Sat, 5 May 2012 00:06:38 +0000 (17:06 -0700)]
i965: Implement guardband clipping on Sandybridge.

Improves performance in Citybench:
- 320x240:  19.8008% +/- 0.937818%
- 1280x480: 6.53856% +/- 0.859083%

No apparent difference in OpenArena nor Xonotic.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
12 years agollvmpipe: Add a test for lp_build_sgn.
José Fonseca [Tue, 15 May 2012 21:38:53 +0000 (22:38 +0100)]
llvmpipe: Add a test for lp_build_sgn.

Only floating point for now, but better than nothing.

12 years agogallivm: Fix lp_build_sgn for normalized/fixed-point integers.
José Fonseca [Tue, 15 May 2012 21:36:09 +0000 (22:36 +0100)]
gallivm: Fix lp_build_sgn for normalized/fixed-point integers.

These types got broken with the recent commit that fixed lp_build_sgn
for negative integers.

12 years agogallivm: Fix lp_build_const_xxx for negative integers.
José Fonseca [Tue, 15 May 2012 21:34:36 +0000 (22:34 +0100)]
gallivm: Fix lp_build_const_xxx for negative integers.

Do proper rounding.

Thanks to Olivier Galibert for investigating this.
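
To illustrate why rounding matters for negative values: adding 0.5 and truncating is only correct for non-negative inputs. The sketch below assumes round-half-away-from-zero, which may not be exactly what the commit does; the function names are hypothetical.

```c
/* Scale a value and round to the nearest integer, rounding halfway
 * cases away from zero. */
long long scale_and_round(double value, double scale)
{
    double scaled = value * scale;
    return (long long)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}

/* The naive version: truncation toward zero after adding 0.5 moves
 * negative results one step too close to zero. */
long long scale_and_round_naive(double value, double scale)
{
    return (long long)(value * scale + 0.5);
}
```
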

12 years agosvga: fix FBO / viewport bugs
Brian Paul [Mon, 14 May 2012 22:33:58 +0000 (16:33 -0600)]
svga: fix FBO / viewport bugs

When drawing to a FBO, the viewport wasn't always set correctly.  It
was fine in the usual case of the viewport dims matching the surface
dims but broken otherwise.  In particular, this was happening because
the viewport scale is negative for FBO rendering.

The piglit fbo-viewport test exercises this.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
12 years agoradeon/llvm: add support for texture offsets, fix TEX_LD
Vadim Girlin [Tue, 15 May 2012 14:48:51 +0000 (18:48 +0400)]
radeon/llvm: add support for texture offsets, fix TEX_LD

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
12 years agoradeon/llvm: add SET_GRADIENTS*, fix SAMPLE_G
Vadim Girlin [Tue, 15 May 2012 14:53:06 +0000 (18:53 +0400)]
radeon/llvm: add SET_GRADIENTS*, fix SAMPLE_G

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
12 years agoradeon/llvm: increase const regs count
Vadim Girlin [Tue, 15 May 2012 14:48:26 +0000 (18:48 +0400)]
radeon/llvm: increase const regs count

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
12 years agoradeon/llvm: use IntrNoMem property for intrinsics where possible
Vadim Girlin [Tue, 15 May 2012 14:48:16 +0000 (18:48 +0400)]
radeon/llvm: use IntrNoMem property for intrinsics where possible

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
12 years agoradeon/llvm: use correct intrinsic for CEIL
Vadim Girlin [Tue, 15 May 2012 14:48:06 +0000 (18:48 +0400)]
radeon/llvm: use correct intrinsic for CEIL

Should be round_posinf instead of round_neginf.

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
12 years agoradeon/llvm: improve ABS_i32 lowering
Vadim Girlin [Tue, 15 May 2012 14:47:53 +0000 (18:47 +0400)]
radeon/llvm: improve ABS_i32 lowering

We can save one instruction by lowering it to:
  SUB_INT tmp, 0, src
  MAX_INT dst, src, tmp

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
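
The two-instruction lowering above corresponds to abs(x) = max(x, 0 - x). A small C model follows; the helper names mirror the opcodes but are otherwise hypothetical.

```c
#include <stdint.h>

/* Wrapping 32-bit subtraction, as an integer ALU would perform it. */
static int32_t sub_int(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a - (uint32_t)b);
}

static int32_t max_int(int32_t a, int32_t b)
{
    return a > b ? a : b;
}

/* SUB_INT tmp, 0, src
 * MAX_INT dst, src, tmp */
int32_t abs_i32(int32_t src)
{
    int32_t tmp = sub_int(0, src);
    return max_int(src, tmp);
}
```
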
12 years agoradeon/llvm: fix BUILD_VECTOR lowering for replicated value
Vadim Girlin [Tue, 15 May 2012 14:47:38 +0000 (18:47 +0400)]
radeon/llvm: fix BUILD_VECTOR lowering for replicated value

We expect that all elements will be assigned even if they are equal.

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
12 years agoradeon/llvm: add names for AMDGPU* passes
Vadim Girlin [Tue, 15 May 2012 14:47:22 +0000 (18:47 +0400)]
radeon/llvm: add names for AMDGPU* passes

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
12 years agoradeon/llvm: add generated files to .gitignore
Vadim Girlin [Tue, 15 May 2012 14:47:02 +0000 (18:47 +0400)]
radeon/llvm: add generated files to .gitignore

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
12 years agoAdd .gitignore files for recently-added gallium projects
Paul Berry [Mon, 14 May 2012 16:24:46 +0000 (09:24 -0700)]
Add .gitignore files for recently-added gallium projects

This patch adds .gitignore files to ignore the makefiles generated by
the gallium pipe loader and the clover OpenCL state tracker.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
12 years agoglsl: Fix lower_discard_flow prototype mismatch.
José Fonseca [Tue, 15 May 2012 11:27:15 +0000 (12:27 +0100)]
glsl: Fix lower_discard_flow prototype mismatch.

Should fix MSVC link failure.

12 years agoRevert "i965/fs: Jump from discard statements to the end of the program when done."
Eric Anholt [Fri, 4 May 2012 20:09:38 +0000 (13:09 -0700)]
Revert "i965/fs: Jump from discard statements to the end of the program when done."

This reverts commit 31866308fcf989df992ace28b5b986c3d3770e90.

Fixes piglit glsl-fs-discard-exit-3 and unigine tropics rendering.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
12 years agoglsl: Implement the GLSL 1.30+ discard control flow rule in GLSL IR.
Eric Anholt [Fri, 4 May 2012 20:08:46 +0000 (13:08 -0700)]
glsl: Implement the GLSL 1.30+ discard control flow rule in GLSL IR.

Previously, I tried implementing this in the i965 driver, but did so
in a way that violated the intent of the spec, and broke Tropics.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
12 years agoglsl: Remove the opt_discard_simplification pass.
Eric Anholt [Fri, 4 May 2012 20:37:08 +0000 (13:37 -0700)]
glsl: Remove the opt_discard_simplification pass.

This conflicts with the GLSL 1.30+ rules for derivatives after a
discard has occurred.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
12 years agoi965/fs: Remove the requirement of no dead code for interference checks.
Eric Anholt [Tue, 8 May 2012 17:36:18 +0000 (10:36 -0700)]
i965/fs: Remove the requirement of no dead code for interference checks.

This will be convenient when I want to comment out optimization code
to see the raw program being optimized, but more importantly will let
the interference check be used during optimization.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
12 years agoi965/fs: Add support for copy propagation.
Eric Anholt [Tue, 8 May 2012 20:01:52 +0000 (13:01 -0700)]
i965/fs: Add support for copy propagation.

We could do more by handling abs/negate and non-GRF sources, but this is
a good start.  Improves tropics performance 0.30% +/- .17% (n=43).

shader-db results:
Total instructions: 208032 -> 207184
60/1246 programs affected (4.8%)
23286 -> 22438 instructions in affected programs (3.6% reduction)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
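
A toy version of local copy propagation, to show the shape of the pass. The instruction format is hypothetical and much simpler than the driver's (no abs/negate modifiers, register sources only, a single basic block).

```c
#define MAX_REGS 16
#define NO_REG (-1)

enum opcode { OP_MOV, OP_ADD, OP_MUL };

struct inst {
    enum opcode op;
    int dst;
    int src[2];  /* NO_REG when unused */
};

/* After "MOV b, a", later reads of b are rewritten to read a, until
 * either a or b is overwritten. */
void copy_propagate(struct inst *insts, int count)
{
    int copy_of[MAX_REGS];
    for (int r = 0; r < MAX_REGS; r++)
        copy_of[r] = NO_REG;

    for (int i = 0; i < count; i++) {
        struct inst *in = &insts[i];

        /* Rewrite sources that are known copies. */
        for (int s = 0; s < 2; s++) {
            int src = in->src[s];
            if (src != NO_REG && copy_of[src] != NO_REG)
                in->src[s] = copy_of[src];
        }

        /* Writing dst invalidates every copy relation that involves it. */
        copy_of[in->dst] = NO_REG;
        for (int r = 0; r < MAX_REGS; r++) {
            if (copy_of[r] == in->dst)
                copy_of[r] = NO_REG;
        }

        /* Record the new copy created by a MOV. */
        if (in->op == OP_MOV && in->src[0] != in->dst)
            copy_of[in->dst] = in->src[0];
    }
}
```
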
12 years agoi965/fs: When doing no work for live interval calculation, do no allocation.
Eric Anholt [Tue, 8 May 2012 20:40:44 +0000 (13:40 -0700)]
i965/fs: When doing no work for live interval calculation, do no allocation.

When I had a bug causing the backend to never finish optimizing, it
also sent me deep into swap.  This avoids extra memory allocation per
trip through optimization, and thus may reduce the peak memory
allocation of the driver even in the success case.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
12 years agoi965/gen7: Set tile_x/y to 0 in the no-stencil case.
Eric Anholt [Thu, 10 May 2012 15:49:17 +0000 (08:49 -0700)]
i965/gen7: Set tile_x/y to 0 in the no-stencil case.

Fixes compiler warnings.

12 years agointel: Fix signed/unsigned comparison warnings.
Eric Anholt [Thu, 10 May 2012 15:50:14 +0000 (08:50 -0700)]
intel: Fix signed/unsigned comparison warnings.

12 years agointel: Fix compile warning from 7b6424143d8bf572cadd46adcbaa91d2a5598635
Eric Anholt [Thu, 10 May 2012 15:46:35 +0000 (08:46 -0700)]
intel: Fix compile warning from 7b6424143d8bf572cadd46adcbaa91d2a5598635

12 years agointel: Fix compiler warning from 3cd7bee48f7caf7850ea64d40f43875d4c975507
Eric Anholt [Thu, 10 May 2012 15:45:25 +0000 (08:45 -0700)]
intel: Fix compiler warning from 3cd7bee48f7caf7850ea64d40f43875d4c975507

12 years agoi965/fs: Add a local common subexpression elimination pass.
Kenneth Graunke [Thu, 10 May 2012 23:10:15 +0000 (16:10 -0700)]
i965/fs: Add a local common subexpression elimination pass.

Total instructions: 18210 -> 17836
49/163 programs affected (30.1%)
12888 -> 12514 instructions in affected programs (2.9% reduction)

This reduces Lightsmark's "Scale down filter" shader from 395
instructions to 283, a whopping 28%.  It also reduces register pressure
significantly: the SIMD8 program now uses 29 registers instead of 101,
giving us more than enough room for a SIMD16 program.

v2: Add && !inst->conditional_mod to the "skip some instructions" check.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
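
A toy local CSE over one basic block, for illustration only. The instruction format is hypothetical, commutativity is ignored, and the checks the real pass needs (such as the conditional_mod test from v2) are omitted.

```c
#define NO_REG (-1)

enum opcode { OP_MOV, OP_ADD, OP_MUL };

struct inst {
    enum opcode op;
    int dst;     /* register number, assumed to be below 32 */
    int src[2];  /* NO_REG when unused */
};

/* Replace a recomputed expression with a MOV from the register that
 * already holds its value. */
void local_cse(struct inst *insts, int count)
{
    for (int i = 0; i < count; i++) {
        struct inst *cur = &insts[i];
        unsigned written = 0;  /* registers written between j and i */

        if (cur->op == OP_MOV)
            continue;

        for (int j = i - 1; j >= 0; j--) {
            const struct inst *prev = &insts[j];
            int writes_src = prev->dst == cur->src[0] ||
                             prev->dst == cur->src[1];

            if (prev->op == cur->op &&
                prev->src[0] == cur->src[0] &&
                prev->src[1] == cur->src[1] &&
                !writes_src && !(written & (1u << prev->dst))) {
                cur->op = OP_MOV;
                cur->src[0] = prev->dst;
                cur->src[1] = NO_REG;
                break;
            }
            if (writes_src)
                break;  /* a source was redefined; older matches are stale */
            written |= 1u << prev->dst;
        }
    }
}
```
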
12 years agoi965/fs: Use a const reference in fs_reg::equals instead of a pointer.
Kenneth Graunke [Thu, 10 May 2012 23:10:14 +0000 (16:10 -0700)]
i965/fs: Use a const reference in fs_reg::equals instead of a pointer.

This lets you omit some ampersands and is more idiomatic C++.  Using
const also marks the function as not altering either register (which
was obvious, but nice to enforce).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
12 years agomesa: print the Git SHA1 in GL_VERSION for ES1 and ES2.
Oliver McFadden [Mon, 7 May 2012 13:01:16 +0000 (16:01 +0300)]
mesa: print the Git SHA1 in GL_VERSION for ES1 and ES2.

Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
12 years agomesa: GLES specifies restrictions on uniform matrix transpose.
Oliver McFadden [Mon, 7 May 2012 11:15:53 +0000 (14:15 +0300)]
mesa: GLES specifies restrictions on uniform matrix transpose.

GL_INVALID_VALUE is generated if transpose is not GL_FALSE.

http://www.khronos.org/opengles/sdk/docs/man/xhtml/glUniform.xml

Signed-off-by: Oliver McFadden <oliver.mcfadden@linux.intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
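
The restriction amounts to a check like the following sketch. The function and constant names are hypothetical; only the numeric value 0x0501 is taken from GL_INVALID_VALUE.

```c
#include <stdbool.h>

#define ERR_NONE          0u
#define ERR_INVALID_VALUE 0x0501u  /* numeric value of GL_INVALID_VALUE */

/* On GLES, glUniformMatrix*() must be called with transpose == GL_FALSE;
 * desktop GL accepts either value. */
unsigned check_uniform_matrix_transpose(bool is_gles, unsigned char transpose)
{
    if (is_gles && transpose != 0)
        return ERR_INVALID_VALUE;
    return ERR_NONE;
}
```
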
12 years agoradeonsi: Keep around copies of original sampler states.
Michel Dänzer [Mon, 14 May 2012 14:34:11 +0000 (16:34 +0200)]
radeonsi: Keep around copies of original sampler states.

Fixes crashes when restoring sampler states after blits.

12 years agoradeonsi: Flesh out shader interpolation related code.
Michel Dänzer [Mon, 14 May 2012 14:26:19 +0000 (16:26 +0200)]
radeonsi: Flesh out shader interpolation related code.

Handle perspective interpolation and centroid vs. center.

12 years agoradeonsi: Add proper SI family names.
Michel Dänzer [Mon, 14 May 2012 13:39:17 +0000 (15:39 +0200)]
radeonsi: Add proper SI family names.

12 years agoradeonsi: Separate states for samplers and sampler views.
Michel Dänzer [Mon, 14 May 2012 13:32:02 +0000 (15:32 +0200)]
radeonsi: Separate states for samplers and sampler views.

And reset nregs on updates. Prevents eventual assertion failure.

12 years agoradeonsi: Fixups for drawing with an index buffer.
Michel Dänzer [Fri, 11 May 2012 13:26:15 +0000 (15:26 +0200)]
radeonsi: Fixups for drawing with an index buffer.

Mostly using the DRAW_INDEX_2 type 3 packet instead of DRAW_INDEX, which is
no longer supported on SI.

12 years agovl: Initialize pipe_vertex_buffer.user_buffer fields.
Vinson Lee [Mon, 14 May 2012 06:40:57 +0000 (23:40 -0700)]
vl: Initialize pipe_vertex_buffer.user_buffer fields.

Fix uninitialized scalar variable defects reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
12 years agollvmpipe: Calculate fixed point coordinates for triangle setup earlier.
James Benton [Mon, 14 May 2012 15:00:06 +0000 (16:00 +0100)]
llvmpipe: Calculate fixed point coordinates for triangle setup earlier.

This allows us to calculate the triangle's area using fixed point;
previously it was calculated in floating point space. It was possible
that a triangle which had negative area in floating point space had
a positive area in fixed point space.

Fixes fdo 40920.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
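
A sketch of computing the signed area from snapped fixed-point coordinates, so that the sign agrees with what the rasterizer sees. The 4 fractional bits and all names are assumptions for illustration, not llvmpipe's actual constants or code.

```c
#include <stdint.h>

#define FIXED_ORDER 4                  /* assumed number of subpixel bits */
#define FIXED_ONE   (1 << FIXED_ORDER)

/* Snap a floating point coordinate to the fixed-point grid. */
static int32_t to_fixed(float v)
{
    float s = v * FIXED_ONE;
    return (int32_t)(s >= 0.0f ? s + 0.5f : s - 0.5f);
}

/* Twice the signed triangle area, evaluated entirely on snapped
 * coordinates.  The sign gives the winding; zero means the triangle is
 * degenerate at this precision. */
int64_t triangle_area_fixed(const float v0[2], const float v1[2],
                            const float v2[2])
{
    int32_t x0 = to_fixed(v0[0]), y0 = to_fixed(v0[1]);
    int32_t x1 = to_fixed(v1[0]), y1 = to_fixed(v1[1]);
    int32_t x2 = to_fixed(v2[0]), y2 = to_fixed(v2[1]);

    return (int64_t)(x0 - x2) * (y1 - y2) -
           (int64_t)(x1 - x2) * (y0 - y2);
}
```
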
12 years agoradeon/llvm: Coding style fixes for R600CodeEmitter.cpp
Tom Stellard [Mon, 14 May 2012 13:32:45 +0000 (09:32 -0400)]
radeon/llvm: Coding style fixes for R600CodeEmitter.cpp

12 years agoradeon/llvm: Lower bitcast instructions to copies
Tom Stellard [Mon, 14 May 2012 14:40:12 +0000 (10:40 -0400)]
radeon/llvm: Lower bitcast instructions to copies

12 years agoradeonsi: remove slab allocator for pipe_resource (used mainly for user buffers)
Marek Olšák [Fri, 11 May 2012 20:56:08 +0000 (22:56 +0200)]
radeonsi: remove slab allocator for pipe_resource (used mainly for user buffers)

12 years agor600g: remove slab allocator for pipe_resource (used mainly for user buffers)
Marek Olšák [Fri, 11 May 2012 20:56:08 +0000 (22:56 +0200)]
r600g: remove slab allocator for pipe_resource (used mainly for user buffers)

12 years agor600g: handle R16G16B16_FLOAT and R32G32B32_FLOAT in translate_colorswap (EG)
Marek Olšák [Sat, 12 May 2012 15:35:42 +0000 (17:35 +0200)]
r600g: handle R16G16B16_FLOAT and R32G32B32_FLOAT in translate_colorswap (EG)

12 years agogallium: remove user_buffer_create from the interface
Marek Olšák [Sat, 12 May 2012 11:08:02 +0000 (13:08 +0200)]
gallium: remove user_buffer_create from the interface

Nothing uses it now.

12 years agogallium/graw: stop using user_buffer_create
Marek Olšák [Sat, 12 May 2012 10:56:19 +0000 (12:56 +0200)]
gallium/graw: stop using user_buffer_create

This is compile-tested.

12 years agogallium/util: remove unused parameter nr_vertex_buffers in util_draw_max_index
Marek Olšák [Thu, 29 Mar 2012 22:20:16 +0000 (00:20 +0200)]
gallium/util: remove unused parameter nr_vertex_buffers in util_draw_max_index

12 years agoclover: Fix build on i386.
Francisco Jerez [Sat, 12 May 2012 17:33:33 +0000 (19:33 +0200)]
clover: Fix build on i386.

12 years agoclover: Check the total work-group size provided to clEnqueueNDRangeKernel.
Francisco Jerez [Sat, 12 May 2012 17:24:09 +0000 (19:24 +0200)]
clover: Check the total work-group size provided to clEnqueueNDRangeKernel.

12 years agoclover, gallium: add PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK
Christoph Bumiller [Sat, 12 May 2012 17:32:46 +0000 (19:32 +0200)]
clover, gallium: add PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK

This is not necessarily the product of MAX_BLOCK_SIZE[i].

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
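
Since the thread limit is not necessarily the product of the per-dimension limits, a work-group size check needs both tests. The sketch below uses hypothetical names and is not clover's actual validation code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Validate a local work-group size against per-dimension limits
 * (MAX_BLOCK_SIZE[i]) and the total limit (MAX_THREADS_PER_BLOCK). */
bool work_group_size_ok(const size_t *local, unsigned dims,
                        const size_t *max_block_size,
                        size_t max_threads_per_block)
{
    size_t total = 1;

    for (unsigned i = 0; i < dims; i++) {
        if (local[i] == 0 || local[i] > max_block_size[i])
            return false;
        /* Reject before multiplying so the product cannot overflow. */
        if (total > max_threads_per_block / local[i])
            return false;
        total *= local[i];
    }
    return true;
}
```
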
12 years agor600g: Handle compute caps.
Francisco Jerez [Sat, 12 May 2012 17:08:43 +0000 (19:08 +0200)]
r600g: Handle compute caps.

12 years agor300g: Handle compute caps.
Francisco Jerez [Sat, 12 May 2012 17:17:13 +0000 (19:17 +0200)]
r300g: Handle compute caps.

12 years agoauxiliary/util: Ensure pipe_constant_buffer::user_buffer is initialized.
José Fonseca [Sat, 12 May 2012 16:23:52 +0000 (17:23 +0100)]
auxiliary/util: Ensure pipe_constant_buffer::user_buffer is initialized.

12 years agoscons: Fix missing gbm symbols in st/egl.
José Fonseca [Sat, 12 May 2012 16:08:30 +0000 (17:08 +0100)]
scons: Fix missing gbm symbols in st/egl.

12 years agotargets/egl-static: Fix some missing symbols.
José Fonseca [Sat, 12 May 2012 16:00:11 +0000 (17:00 +0100)]
targets/egl-static: Fix some missing symbols.

12 years agotrace: Fix pipe_context::clear dumping.
José Fonseca [Sat, 12 May 2012 15:59:41 +0000 (16:59 +0100)]
trace: Fix pipe_context::clear dumping.

12 years agotrace: Fix pipe_shader_state dumping.
José Fonseca [Sat, 12 May 2012 15:59:22 +0000 (16:59 +0100)]
trace: Fix pipe_shader_state dumping.

12 years agoscons: Link r600_drm.so against libdrm-radeon
José Fonseca [Fri, 23 Mar 2012 10:52:47 +0000 (10:52 +0000)]
scons: Link r600_drm.so against libdrm-radeon

12 years agotrace: Match NULL context members.
José Fonseca [Sat, 12 May 2012 15:31:25 +0000 (16:31 +0100)]
trace: Match NULL context members.

12 years agogallium/docs: remove documentation of redefine_user_buffer
Marek Olšák [Sat, 12 May 2012 10:25:33 +0000 (12:25 +0200)]
gallium/docs: remove documentation of redefine_user_buffer

12 years agoradeonsi: Fixed point vertex formats aren't supported.
Michel Dänzer [Fri, 11 May 2012 14:19:19 +0000 (16:19 +0200)]
radeonsi: Fixed point vertex formats aren't supported.

12 years agoradeonsi: Fixups for recent build infrastructure changes.
Michel Dänzer [Sat, 12 May 2012 10:12:21 +0000 (12:12 +0200)]
radeonsi: Fixups for recent build infrastructure changes.

In particular for the pipe loader changes.

12 years agor600g: setup COLOR1 for possible dual-src in the framebuffer bind
Dave Airlie [Fri, 27 Apr 2012 08:38:46 +0000 (09:38 +0100)]
r600g: setup COLOR1 for possible dual-src in the framebuffer bind

As pointed out by Marek, if we have only one cb, we may as well add this
single register write here rather than adding it in the draw loop.

Reviewed-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
12 years agonv30: Silence pipe_cap warnings
Roy Spliet [Sat, 12 May 2012 01:42:31 +0000 (03:42 +0200)]
nv30: Silence pipe_cap warnings

Signed-off-by: Roy Spliet <r.spliet@student.tudelft.nl>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
12 years agonv30/shader: SSG, LIT only requires one source register
Roy Spliet [Sat, 12 May 2012 01:42:30 +0000 (03:42 +0200)]
nv30/shader: SSG, LIT only requires one source register

Fixes crashing due to assertion error

Signed-off-by: Roy Spliet <r.spliet@student.tudelft.nl>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
12 years agonouveau/vieux: finish != flush, how about we do that..
Ben Skeggs [Thu, 10 May 2012 17:02:13 +0000 (03:02 +1000)]
nouveau/vieux: finish != flush, how about we do that..

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
12 years agor300g/swtcl: move vertex buffer updates into set_vertex_buffers
Marek Olšák [Fri, 11 May 2012 21:33:50 +0000 (23:33 +0200)]
r300g/swtcl: move vertex buffer updates into set_vertex_buffers

12 years agor300g/swtcl: move index buffer updates from swtcl_draw_vbo into set_index_buffer
Marek Olšák [Fri, 11 May 2012 21:33:50 +0000 (23:33 +0200)]
r300g/swtcl: move index buffer updates from swtcl_draw_vbo into set_index_buffer

12 years agor300g/swtcl: malloc vertex and index buffers (don't use radeon DRM to get them)
Marek Olšák [Fri, 11 May 2012 21:22:21 +0000 (23:22 +0200)]
r300g/swtcl: malloc vertex and index buffers (don't use radeon DRM to get them)

Vertex and index buffers are never used by hardware, only by Draw.
SWTCL chipsets usually have very little memory, so this might help
with stability and reliability.