mesa.git
9 years agoglsl: use the is_gl_identifier() helper in a couple more places
Brian Paul [Fri, 2 Jan 2015 23:19:48 +0000 (16:19 -0700)]
glsl: use the is_gl_identifier() helper in a couple more places

Reviewed-by: Eric Anholt <eric@anholt.net>
9 years agometa: init var to silence uninitialized variable warning
Brian Paul [Fri, 19 Dec 2014 21:26:57 +0000 (14:26 -0700)]
meta: init var to silence uninitialized variable warning

9 years agodraw: silence uninitialized variable warning
Brian Paul [Fri, 19 Dec 2014 16:37:33 +0000 (09:37 -0700)]
draw: silence uninitialized variable warning

v2: move initialization of llvm_gs to declaration.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years agogallivm: silence a couple compiler warnings
Brian Paul [Fri, 19 Dec 2014 16:36:51 +0000 (09:36 -0700)]
gallivm: silence a couple compiler warnings

Silence warnings about possibly uninitialized variables when making a
release build.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
9 years agogallium/util: make sure cache line size is not zero
Leonid Shatz [Wed, 31 Dec 2014 18:07:44 +0000 (19:07 +0100)]
gallium/util: make sure cache line size is not zero

The "normal" detection (querying clflush size) already made sure it is
non-zero, however another method did not. This led to crashes if this
value happened to be zero (apparently can happen in virtualized environments
at least).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87913
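
A minimal sketch of the guard described above, assuming a hypothetical helper named update_cacheline() (not the actual gallium/util code): a freshly detected value only replaces the current cache line size when it is non-zero.

```c
#include <assert.h>

/* Hypothetical helper: keep the known-good size if detection reports 0. */
static unsigned update_cacheline(unsigned current, unsigned detected)
{
   assert(current != 0);   /* the existing value must already be usable */
   return detected != 0 ? detected : current;
}
```

A caller would pass whatever the secondary detection method returned and keep using the result as a divisor or alignment without further checks.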

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agogallium/util: fix crash with daz detection on x86
Roland Scheidegger [Wed, 31 Dec 2014 16:39:57 +0000 (17:39 +0100)]
gallium/util: fix crash with daz detection on x86

The code used PIPE_ALIGN_VAR for the variable used by fxsave, however this
does not work if the stack isn't aligned. Hence use PIPE_ALIGN_STACK function
decoration to fix the segfault which can happen if stack alignment is only
4 bytes.
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=87658.

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agonvc0: add name to magic number
Ilia Mirkin [Mon, 5 Jan 2015 05:33:58 +0000 (00:33 -0500)]
nvc0: add name to magic number

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonvc0: regenerate rnndb headers
Ilia Mirkin [Wed, 31 Dec 2014 03:27:57 +0000 (22:27 -0500)]
nvc0: regenerate rnndb headers

The headers hadn't been regenerated in a long time and had seen a number
of manual modifications. A few changes:
 - remove nvc0_2d entirely, use the nv50 header which has the nvc0
   values too
 - remove 3ddefs, it's identical to the nv50 file
 - move macros out into a separate file

Also the upstream rnndb changed the overall chip naming convention; this
was fixed up manually in the generated files until a better solution is
determined.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50: regenerate rnndb headers
Ilia Mirkin [Wed, 31 Dec 2014 02:19:14 +0000 (21:19 -0500)]
nv50: regenerate rnndb headers

The headers hadn't been regenerated in a long time, and there were a few
minor divergences. Among other things, rnndb has changed naming to
G80/etc. For now I've not tackled switching that over and manually
replaced the nvidia codenames back to the chip ids. However, no other
modifications of the headergen'd headers were made.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50: enable texture compression
Tobias Klausmann [Sat, 3 Jan 2015 00:00:08 +0000 (01:00 +0100)]
nv50: enable texture compression

Compression seems to be supported for only some formats. Enable it for
those. Previously this was disabled for everything despite the code
looking like it was actually enabled.

Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: enable sat modifier for OP_SUB
Ilia Mirkin [Mon, 5 Jan 2015 00:32:18 +0000 (19:32 -0500)]
nv50/ir: enable sat modifier for OP_SUB

SUB is handled the same as ADD, so no reason not to allow a saturate
modifier on it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: Add sat modifier for mul
Roy Spliet [Sun, 4 Jan 2015 23:22:17 +0000 (00:22 +0100)]
nv50/ir: Add sat modifier for mul

Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50,nvc0: avoid doing work inside of an assert
Ilia Mirkin [Mon, 5 Jan 2015 05:17:26 +0000 (00:17 -0500)]
nv50,nvc0: avoid doing work inside of an assert

assert is compiled out in release builds - don't put logic into it. Note
that this particular instance is only used for vp debugging and is
normally compiled out.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: fix texture offsets in release builds
Ilia Mirkin [Sun, 4 Jan 2015 23:03:20 +0000 (18:03 -0500)]
nv50/ir: fix texture offsets in release builds

asserts get compiled out in release builds, so they can't be relied
upon to perform logic.
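
To illustrate the pitfall, here is a small sketch with hypothetical names (set_tex_offset, apply_tex_offset, and the -8..7 range are made up for the example, not taken from the nv50 code). The call with the side effect is made unconditionally and only its result is asserted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper with a side effect: stores an offset, reports success. */
static bool set_tex_offset(int *slot, int value)
{
   if (value < -8 || value > 7)   /* made-up valid range for the sketch */
      return false;
   *slot = value;
   return true;
}

/* Fragile form: with -DNDEBUG the whole call vanishes and slot stays 0:
 *    assert(set_tex_offset(&slot, value));
 */
static int apply_tex_offset(int value)
{
   int slot = 0;
   bool ok = set_tex_offset(&slot, value);   /* always executed */
   assert(ok);                               /* check only; may compile out */
   (void) ok;                                /* avoid unused warning under NDEBUG */
   return slot;
}
```

With this shape, debug and release builds write the same value; only the sanity check differs.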

Reported-by: Pierre Moreau <pierre.morrow@free.fr>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Roy Spliet <rspliet@eclipso.eu>
Cc: "10.2 10.3 10.4" <mesa-stable@lists.freedesktop.org>
9 years agoi965: Micro-optimize swizzle_to_scs() and make it inlinable.
Kenneth Graunke [Thu, 31 Jul 2014 08:26:30 +0000 (01:26 -0700)]
i965: Micro-optimize swizzle_to_scs() and make it inlinable.

brw_swizzle_to_scs has been showing up in my CPU profiling, which is
rather silly - it's a tiny amount of code.  It really should be inlined,
and can easily be implemented with fewer instructions.

The enum translation is as follows:

SWIZZLE_X, SWIZZLE_Y, SWIZZLE_Z, SWIZZLE_W, SWIZZLE_ZERO, SWIZZLE_ONE
        0          1          2          3             4            5
        4          5          6          7             0            1
  SCS_RED, SCS_GREEN,  SCS_BLUE, SCS_ALPHA,     SCS_ZERO,     SCS_ONE

which is simply (swizzle + 4) & 7.
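
The mapping can be checked with a short sketch. The enum values are copied from the table above; the function here is a standalone illustration, not the driver's actual code.

```c
#include <assert.h>

enum { SWIZZLE_X = 0, SWIZZLE_Y = 1, SWIZZLE_Z = 2, SWIZZLE_W = 3,
       SWIZZLE_ZERO = 4, SWIZZLE_ONE = 5 };

enum { SCS_ZERO = 0, SCS_ONE = 1,
       SCS_RED = 4, SCS_GREEN = 5, SCS_BLUE = 6, SCS_ALPHA = 7 };

/* Rotate the 3-bit enum by four: X..W land on 4..7, ZERO/ONE wrap to 0/1. */
static inline unsigned swizzle_to_scs(unsigned swizzle)
{
   assert(swizzle <= SWIZZLE_ONE);
   return (swizzle + 4) & 7;
}
```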

Haswell needs extra textureGather workarounds to remap GREEN to BLUE,
but Broadwell and later do not.

This patch replicates swizzle_to_scs in gen7_wm_surface_state.c and
gen8_surface_state.c, since the Gen8+ code can be simplified to a mere
two instructions.  Both copies can be marked static for easy inlining.

v2: Put the commit message in the code as comments (requested by
    Jason Ekstrand).  Also fix a typo.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoi965: Support MESA_FORMAT_R8G8B8X8_SRGB.
Kenneth Graunke [Thu, 1 Jan 2015 04:31:26 +0000 (20:31 -0800)]
i965: Support MESA_FORMAT_R8G8B8X8_SRGB.

Valve games use GL_SRGB8 textures.  Instead of supporting that properly,
we fell back to MESA_FORMAT_R8G8B8A8_SRGB (with an alpha channel), which
meant that we had to use texture swizzling to override the alpha to 1.0
when sampling.  This meant shader recompiles on Gen < 7.5 platforms.

By supporting MESA_FORMAT_R8G8B8X8_SRGB, the hardware just returns 1.0
for us, so we can just use SWIZZLE_XYZW, and avoid any recompiles.  All
generations of hardware have supported the format for sampling and
filtering; we can easily support rendering by using the R8G8B8A8_SRGB
format and writing garbage to the X channel.  (We do this already for
the non-SRGB version of this format.)

This removes all remaining shader recompiles in a time demo of "Counter
Strike: Global Offensive" (32 -> 0) on Sandybridge.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years agoi965: Fix BLORP sRGB MSAA overrides to cope with X vs. A formats.
Kenneth Graunke [Thu, 1 Jan 2015 05:51:05 +0000 (21:51 -0800)]
i965: Fix BLORP sRGB MSAA overrides to cope with X vs. A formats.

The logic in brw_blorp_surface_info::set uses brw_format_for_mesa_format
for source surfaces, and brw->render_target_format[] for destination
surfaces.  We should do the same in the sRGB MSAA overrides.

Currently, this isn't a problem, since SRGB MSAA buffers are all RGBA.
The next commit will introduce RGBX SRGB MSAA buffers, at which point
we need to get the RGBX -> RGBA format overrides for rendering right.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years agoi965: Copy shader->shadow_samplers to prog->ShadowSamplers.
Kenneth Graunke [Thu, 1 Jan 2015 01:38:05 +0000 (17:38 -0800)]
i965: Copy shader->shadow_samplers to prog->ShadowSamplers.

ir_to_mesa does this - apparently we just forgot or something.

Without this, we'll guess the wrong texture swizzle (XYZW for color
instead of XXX1 for depth) when doing precompiles.

This cuts 26 shader recompiles in a time demo of "Counter Strike:
Global Offensive" (58 -> 32) on Sandybridge.  Haswell still has 0
recompiles.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years agoi965: Make the precompile ignore DEPTH_TEXTURE_MODE on Gen7.5+.
Kenneth Graunke [Thu, 1 Jan 2015 02:06:41 +0000 (18:06 -0800)]
i965: Make the precompile ignore DEPTH_TEXTURE_MODE on Gen7.5+.

Gen7.5+ platforms that support the "Shader Channel Select" feature leave
key->tex.swizzles[i] as SWIZZLE_NOOP except when GL_DEPTH_TEXTURE_MODE
is GL_ALPHA (which is really uncommon).  So, the precompile should leave
them as SWIZZLE_NOOP (aka SWIZZLE_XYZW) as well.

We didn't notice this because prog->ShadowSamplers is not set correctly.
The next patch will fix that problem.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87886
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years agoi965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.
Kenneth Graunke [Wed, 12 Nov 2014 19:17:55 +0000 (11:17 -0800)]
i965: Implement WaCsStallAtEveryFourthPipecontrol on IVB/BYT.

According to the documentation, we need to do a CS stall on every fourth
PIPE_CONTROL command to avoid GPU hangs.  The kernel does a CS stall
between batches, so we only need to count the PIPE_CONTROLs in our batches.

v2: Get the generation check right (caught by Chris Wilson),
    combine the ++ with the check (suggested by Daniel Vetter).
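
A rough sketch of the counting scheme, with the increment folded into the check as suggested in v2. The struct and function names are hypothetical, and the generation check is left out; the counter is assumed to start at zero for each batch because the kernel stalls between batches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-batch state. */
struct batch_state {
   unsigned pipe_controls_since_last_cs_stall;
};

/* Returns true when the PIPE_CONTROL about to be emitted needs a CS stall
 * added.  already_stalls says whether the caller requested one anyway. */
static bool need_cs_stall(struct batch_state *b, bool already_stalls)
{
   assert(b);
   if (already_stalls) {
      b->pipe_controls_since_last_cs_stall = 0;
      return false;
   }
   if (++b->pipe_controls_since_last_cs_stall == 4) {
      b->pipe_controls_since_last_cs_stall = 0;
      return true;
   }
   return false;
}
```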

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
9 years agor300g: handle vertex format PIPE_FORMAT_NONE
Marek Olšák [Sun, 4 Jan 2015 22:53:23 +0000 (23:53 +0100)]
r300g: handle vertex format PIPE_FORMAT_NONE

9 years agoglsl_to_tgsi: fix a bug in copy propagation
Marek Olšák [Fri, 2 Jan 2015 13:13:43 +0000 (14:13 +0100)]
glsl_to_tgsi: fix a bug in copy propagation

This fixes the new piglit test: arb_uniform_buffer_object/2-buffers-bug

Cc: 10.2 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agoi965: Make INTEL_DEBUG=state ignore state flags with a count of 1.
Kenneth Graunke [Wed, 3 Dec 2014 07:44:30 +0000 (23:44 -0800)]
i965: Make INTEL_DEBUG=state ignore state flags with a count of 1.

There are too many state flags to fit in one terminal screen, even with
a very tall terminal.  Everything is flagged once, so a value of 1 means
that it hasn't ever happened again, and thus isn't terribly interesting.

Skipping those makes it easier to see the interesting values.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agoi965: Fix INTEL_DEBUG=optimizer with VF types.
Kenneth Graunke [Thu, 1 Jan 2015 00:54:44 +0000 (16:54 -0800)]
i965: Fix INTEL_DEBUG=optimizer with VF types.

Hardcoding stderr is wrong; INTEL_DEBUG=optimizer uses other files.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agoi965: Show opt_vector_float() and later passes in INTEL_DEBUG=optimizer.
Kenneth Graunke [Thu, 1 Jan 2015 00:47:25 +0000 (16:47 -0800)]
i965: Show opt_vector_float() and later passes in INTEL_DEBUG=optimizer.

In order to support calling opt_vector_float() inside a condition, this
patch makes OPT() a statement expression:

https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

We've used that elsewhere already.
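
For readers unfamiliar with the extension, a simplified sketch follows. It requires GCC or Clang; the pass functions and run_passes() are hypothetical, and the real OPT() also handles debug dumping, which is omitted here.

```c
#include <stdbool.h>

/* Statement expression: runs the pass, accumulates into a local variable
 * named 'progress', and yields the pass's own result as the macro's value. */
#define OPT(call) ({                         \
      bool this_progress = (call);           \
      progress = progress || this_progress;  \
      this_progress;                         \
   })

/* Hypothetical passes: each returns true if it changed something. */
static bool pass_gather(int *pending)
{
   if (*pending > 0) {
      (*pending)--;
      return true;
   }
   return false;
}

static bool pass_cleanup(int *cleanups)
{
   (*cleanups)++;
   return true;
}

static bool run_passes(int *pending, int *cleanups)
{
   bool progress = false;

   /* Because OPT() has a value, it can sit inside a condition. */
   if (OPT(pass_gather(pending))) {
      (void) OPT(pass_cleanup(cleanups));
   }
   return progress;
}
```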

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agoswrast: Fix -Wduplicate-decl-specifier warning
Jeremy Huddleston Sequoia [Fri, 2 Jan 2015 03:54:41 +0000 (19:54 -0800)]
swrast: Fix -Wduplicate-decl-specifier warning

swrast.c:67:12: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
const char const *swrast_vendor_string = "Mesa Project";
           ^
swrast.c:68:12: warning: duplicate 'const' declaration specifier [-Wduplicate-decl-specifier]
const char const *swrast_renderer_string = "Software Rasterizer";
           ^
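
In `const char const *`, both qualifiers apply to char and the pointer itself stays mutable. If the intent was a constant pointer to constant data, the second const belongs after the `*`. The following is a sketch of that form and an assumption about the intent, not a quote of the actual patch.

```c
/* Read right to left: constant pointer to constant char. */
const char * const swrast_vendor_string = "Mesa Project";
const char * const swrast_renderer_string = "Software Rasterizer";
```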

Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
9 years agonv50/ir: Fold sat into mad
Roy Spliet [Fri, 2 Jan 2015 02:28:50 +0000 (03:28 +0100)]
nv50/ir: Fold sat into mad

The mad instruction emitter already supported the saturate modifier,
but the ModifierFolding pass never tried folding cvt sat operations
in for NV50.

Signed-off-by: Roy Spliet <rspliet@eclipso.eu>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonv50/ir: fold MAD when one of the multiplicands is const
Ilia Mirkin [Thu, 1 Jan 2015 06:01:13 +0000 (01:01 -0500)]
nv50/ir: fold MAD when one of the multiplicands is const

Fold MAD dst, src0, immed, src2 (or src0/immed swapped) when
 - immed = 0 -> MOV dst, src2
 - immed = +/- 1 -> ADD dst, src0, src2

These types of MAD patterns were observed in some st/nine shaders.
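
The two rules can be sketched on a toy instruction record. The struct layout and names below are hypothetical and much simpler than the real nv50_ir classes; the immediate is assumed to have been moved into the imm field already.

```c
#include <assert.h>

enum op { OP_MAD, OP_MOV, OP_ADD };

/* Toy instruction: dst = src0 * imm + src2. */
struct insn {
   enum op op;
   int src0, src2;   /* register numbers */
   float imm;        /* the constant multiplicand */
   int neg_src0;     /* 1 if src0 carries a negate modifier */
};

/* Returns 1 if the MAD was rewritten, 0 if it was left alone. */
static int fold_mad_imm(struct insn *i)
{
   assert(i->op == OP_MAD);
   if (i->imm == 0.0f) {
      i->op = OP_MOV;                    /* dst = src2 */
      return 1;
   }
   if (i->imm == 1.0f || i->imm == -1.0f) {
      i->neg_src0 ^= (i->imm < 0.0f);    /* -1 becomes a negate on src0 */
      i->op = OP_ADD;                    /* dst = (+/-)src0 + src2 */
      return 1;
   }
   return 0;
}
```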

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agogallium/state_tracker: Rewrite Haiku's state tracker
Alexander von Gluck IV [Mon, 29 Dec 2014 21:51:46 +0000 (21:51 +0000)]
gallium/state_tracker: Rewrite Haiku's state tracker

* More gallium-like
* Leverage stamps properly and don't call mesa functions

9 years agoradeonsi: fix warnings
Marek Olšák [Wed, 31 Dec 2014 00:03:30 +0000 (01:03 +0100)]
radeonsi: fix warnings

9 years agoi965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.
Kenneth Graunke [Thu, 18 Dec 2014 12:45:40 +0000 (04:45 -0800)]
i965: Fix start/base_vertex_location for >1 prims but !BRW_NEW_VERTICES.

This is a partial revert of c89306983c07e5a88c0d636267e5ccf263cb4213.
It split the {start,base}_vertex_location handling into several steps:

1. Set brw->draw.start_vertex_location = prim[i].start
   and brw->draw.base_vertex_location = prim[i].basevertex.
   (This happened once per _mesa_prim, in the main drawing loop.)
2. Add brw->vb.start_vertex_bias and brw->ib.start_vertex_offset
   appropriately.  (This happened in brw_prepare_shader_draw_parameters,
   which was called just after brw_prepare_vertices, as part of state
   upload, and only happened when BRW_NEW_VERTICES was flagged.)
3. Use those values when emitting 3DPRIMITIVE (once per _mesa_prim).

If we drew multiple _mesa_prims, but didn't flag BRW_NEW_VERTICES on
the second (or later) primitives, we would do step #1, but not #2.
The first _mesa_prim would get correct values, but subsequent ones
would only get the first half of the summation.

The reason I originally did this was because I needed the value of
gl_BaseVertexARB to exist in a buffer object prior to uploading
3DSTATE_VERTEX_BUFFERS.  I believed I wanted to upload the value
of 3DPRIMITIVE's "Base Vertex Location" field, which was computed
as: (prims[i].indexed ? prims[i].start : prims[i].basevertex) +
brw->vb.start_vertex_bias.  The latter value wasn't available until
after brw_prepare_vertices, and the former weren't available in the
state upload code at all.  Hence the awkward split.

However, I believe that including brw->vb.start_vertex_bias was a
mistake.  It's an extra bias we apply when uploading vertex data into
VBOs, to move [min_index, max_index] to [0, max_index - min_index].

From the GL_ARB_shader_draw_parameters specification:
"<gl_BaseVertexARB> holds the integer value passed to the <baseVertex>
 parameter to the command that resulted in the current shader
 invocation.  In the case where the command has no <baseVertex>
 parameter, the value of <gl_BaseVertexARB> is zero."

I conclude that gl_BaseVertexARB should only include the baseVertex
parameter from glDraw*Elements*, not any internal biases we add for
optimization purposes.

With that in mind, gl_BaseVertexARB only needs prim[i].start or
prim[i].basevertex.  We can simply store that, and go back to computing
start_vertex_location and base_vertex_location in brw_emit_prim(), like
we used to.  This is much simpler, and should actually fix two bugs.
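
As a sketch of computing both values per primitive at emit time: the structs are hypothetical stand-ins, and which bias is added in the indexed and non-indexed cases is my assumption about what "appropriately" means in step 2 above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for _mesa_prim and the relevant driver state. */
struct prim { bool indexed; int start; int basevertex; };
struct draw_state { int vb_start_vertex_bias; int ib_start_vertex_offset; };
struct locations { int start_vertex_location; int base_vertex_location; };

/* Computes the full sums for one primitive, independent of whether any
 * state flag was raised for it. */
static struct locations
emit_prim_locations(const struct draw_state *st, const struct prim *p)
{
   struct locations loc;

   assert(st && p);
   loc.start_vertex_location = p->start;
   loc.base_vertex_location = p->basevertex;

   if (p->indexed) {
      /* Assumed: index buffer offset on start, VBO bias on base vertex. */
      loc.start_vertex_location += st->ib_start_vertex_offset;
      loc.base_vertex_location += st->vb_start_vertex_bias;
   } else {
      /* Assumed: non-indexed draws only bias the start vertex. */
      loc.start_vertex_location += st->vb_start_vertex_bias;
   }
   return loc;
}
```

Since nothing is cached between primitives, the second and later _mesa_prims get the same complete summation as the first.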

Fixes missing geometry in Unvanquished.

Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85529
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years agoi965: Use WARN_ONCE for the single-primitive-exceeded-aperture message.
Kenneth Graunke [Tue, 30 Dec 2014 20:21:03 +0000 (12:21 -0800)]
i965: Use WARN_ONCE for the single-primitive-exceeded-aperture message.

This makes it show up via ARB_debug_output and is also less code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agou_primconvert: Fix leak of the upload BO on context destroy.
Eric Anholt [Tue, 30 Dec 2014 23:39:20 +0000 (15:39 -0800)]
u_primconvert: Fix leak of the upload BO on context destroy.

v2: Conditionalize it on having done any uploads (Turns out
    u_upload_destroy() isn't safe with a NULL arg).

Reviewed-by: Dave Airlie <airlied@redhat.com> (v1)
9 years agovc4: Fix memory leak as of 0404e7fe0ac2a6234a11290b4b1596e8bc127a4b.
Eric Anholt [Wed, 31 Dec 2014 00:10:28 +0000 (16:10 -0800)]
vc4: Fix memory leak as of 0404e7fe0ac2a6234a11290b4b1596e8bc127a4b.

Can't reset the CL before looking at how much we had put in it.

9 years agonv50,nvc0: set vertex id base to index_bias
Ilia Mirkin [Wed, 31 Dec 2014 04:19:47 +0000 (23:19 -0500)]
nv50,nvc0: set vertex id base to index_bias

Fixes the piglits which check that gl_VertexID includes the base vertex
offset:
  arb_draw_indirect-vertexid elements
  gl-3.2-basevertex-vertexid

Note that this leaves out the original G80, for which this will continue
to fail. It could be fixed by passing a driver constbuf value in, but
that's beyond the scope of this change.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.3 10.4" <mesa-stable@lists.freedesktop.org>
9 years agonv50,nvc0: implement half_pixel_center
Tiziano Bacocco [Tue, 30 Dec 2014 20:33:48 +0000 (21:33 +0100)]
nv50,nvc0: implement half_pixel_center

LAST_LINE_PIXEL has actually been renamed to PIXEL_CENTER_INTEGER in
rnndb; use that method to implement the rasterizer setting, used for
st/nine.

Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agovc4: Only render tiles where the scissor ever intersected them.
Eric Anholt [Sun, 28 Dec 2014 18:14:19 +0000 (08:14 -1000)]
vc4: Only render tiles where the scissor ever intersected them.

This gives a 2.7x improvement in x11perf -rect100, since we only end up
load/storing the x11perf window, not the whole screen.

9 years agovc4: Move draw call reset handling to a helper function.
Eric Anholt [Tue, 30 Dec 2014 20:12:15 +0000 (12:12 -0800)]
vc4: Move draw call reset handling to a helper function.

This will be more important in the next commit, when there's more state to
reset to nonzero values, and I want an early exit from the submit
function.

9 years agovc4: Drop the content of vc4_flush_resource().
Eric Anholt [Fri, 26 Dec 2014 02:24:15 +0000 (16:24 -1000)]
vc4: Drop the content of vc4_flush_resource().

The callers all follow it with a flush of the context, and the flush of
the context gives us more information about how things are being flushed.

9 years agodocs: add news item and link release notes for mesa 10.3.6/10.4.1
Emil Velikov [Tue, 30 Dec 2014 02:50:43 +0000 (02:50 +0000)]
docs: add news item and link release notes for mesa 10.3.6/10.4.1

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agodocs: Add sha256 sums for the 10.4.1 release
Emil Velikov [Tue, 30 Dec 2014 02:38:02 +0000 (02:38 +0000)]
docs: Add sha256 sums for the 10.4.1 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agoAdd release notes for the 10.4.1 release
Emil Velikov [Tue, 30 Dec 2014 02:11:34 +0000 (02:11 +0000)]
Add release notes for the 10.4.1 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agodocs: Add sha256 sums for the 10.3.6 release
Emil Velikov [Tue, 30 Dec 2014 01:44:43 +0000 (01:44 +0000)]
docs: Add sha256 sums for the 10.3.6 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agoAdd release notes for the 10.3.6 release
Emil Velikov [Tue, 30 Dec 2014 01:21:22 +0000 (01:21 +0000)]
Add release notes for the 10.3.6 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agomesa: Remove __SSE4_1__ guards from sse_minmax.c.
Matt Turner [Mon, 29 Dec 2014 18:52:17 +0000 (10:52 -0800)]
mesa: Remove __SSE4_1__ guards from sse_minmax.c.

See commit e07c9a288.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
9 years agoi965/vec4: Do separate copy followed by constant propagation after opt_vector_float().
Matt Turner [Sun, 21 Dec 2014 02:02:29 +0000 (18:02 -0800)]
i965/vec4: Do separate copy followed by constant propagation after opt_vector_float().

total instructions in shared programs: 5877012 -> 5876617 (-0.01%)
instructions in affected programs:     33140 -> 32745 (-1.19%)

From before the commit that allows VF constant propagation (which hurt
some programs) to here, the results are:

total instructions in shared programs: 5877951 -> 5876617 (-0.02%)
instructions in affected programs:     123444 -> 122110 (-1.08%)

with no programs hurt.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965/vec4: Allow constant propagation of VF immediates.
Matt Turner [Sun, 21 Dec 2014 01:42:52 +0000 (17:42 -0800)]
i965/vec4: Allow constant propagation of VF immediates.

total instructions in shared programs: 5877951 -> 5877012 (-0.02%)
instructions in affected programs:     155923 -> 154984 (-0.60%)

Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the
next commit.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965/vec4: Add parameter to skip doing constant propagation.
Matt Turner [Sun, 21 Dec 2014 01:37:09 +0000 (17:37 -0800)]
i965/vec4: Add parameter to skip doing constant propagation.

After CSEing some MOV ..., VF instructions we have code like

   mov tmp, [1F, 2F, 3F, 4F]VF
   mov r10, tmp
   mov r11, tmp
   ...
   use r10
   use r11

We want to copy propagate tmp into the uses of r10 and r11, but *not*
constant propagate the VF immediate into the uses of tmp.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float().
Matt Turner [Sat, 20 Dec 2014 21:17:00 +0000 (13:17 -0800)]
i965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float().

total instructions in shared programs: 5869005 -> 5868220 (-0.01%)
instructions in affected programs:     70208 -> 69423 (-1.12%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965/vec4: Perform CSE on MOV ..., VF instructions.
Matt Turner [Thu, 3 Apr 2014 21:29:30 +0000 (14:29 -0700)]
i965/vec4: Perform CSE on MOV ..., VF instructions.

Port of commit a28ad9d4 from the fs backend.

No shader-db changes since we don't emit MOV ..., VF instructions yet.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965/vec4: Add pass to gather constants into a vector-float MOV.
Matt Turner [Sat, 20 Dec 2014 19:50:31 +0000 (11:50 -0800)]
i965/vec4: Add pass to gather constants into a vector-float MOV.

Currently only handles consecutive instructions with the same
destination that collectively write all channels.

total instructions in shared programs: 5879798 -> 5869011 (-0.18%)
instructions in affected programs:     465236 -> 454449 (-2.32%)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965: Add support for saturating immediates.
Matt Turner [Sun, 21 Dec 2014 14:56:54 +0000 (06:56 -0800)]
i965: Add support for saturating immediates.

I don't feel great about assert(!"unimplemented: ...") but these
cases do only seem possible under some currently impossible circumstances.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965: Add fs_reg/src_reg constructors that take vf[4].
Matt Turner [Sat, 20 Dec 2014 19:47:40 +0000 (11:47 -0800)]
i965: Add fs_reg/src_reg constructors that take vf[4].

Sometimes it's easier to generate 4x values into an array, and the
memcpy is 1 instruction, rather than 11 to piece 4 arguments together.
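
A sketch of the idea, using a hypothetical free function in place of the constructors: four 8-bit VF-encoded values are copied into the 32-bit immediate in a single memcpy, rather than shifting and ORing four separate arguments.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack four VF bytes into a 32-bit immediate,
 * preserving their order in memory. */
static uint32_t pack_vf4(const uint8_t vf[4])
{
   uint32_t bits;

   memcpy(&bits, vf, sizeof(bits));
   return bits;
}
```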

I'd forgotten to remove the prototype from fs_reg from a previous patch,
so it's already there for us here.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agogallium/target: Drop no longer needed Haiku viewport override
Alexander von Gluck IV [Sat, 27 Dec 2014 06:12:54 +0000 (06:12 +0000)]
gallium/target: Drop no longer needed Haiku viewport override

* Drop no longer needed mesa headers
* Haiku LLVM pipe working with LLVM 3.5.0 on x86_64

9 years agogallium/st: Clean up Haiku depth mapping, fix colorspace errors
Alexander von Gluck IV [Sat, 27 Dec 2014 05:55:23 +0000 (05:55 +0000)]
gallium/st: Clean up Haiku depth mapping, fix colorspace errors

9 years agovc4: Handle unaligned accesses in CL emits.
Eric Anholt [Thu, 25 Dec 2014 22:22:02 +0000 (12:22 -1000)]
vc4: Handle unaligned accesses in CL emits.

As of 229bf4475ff0a5dbeb9bc95250f7a40a983c2e28 we started getting SIGBUS
from unaligned accesses on the hardware, for reasons I haven't figured
out.  However, we should be avoiding unaligned accesses anyway, and our CL
setup certainly would have produced them.
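
One common portable way to avoid unaligned stores is to go through memcpy, which compilers typically lower to whatever access the target allows. This is a generic sketch with hypothetical helper names, not necessarily the approach this commit takes.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at any byte address without dereferencing a
 * possibly misaligned uint32_t pointer. */
static void put_u32(uint8_t *dst, uint32_t value)
{
   memcpy(dst, &value, sizeof(value));
}

/* Matching load. */
static uint32_t get_u32(const uint8_t *src)
{
   uint32_t value;

   memcpy(&value, src, sizeof(value));
   return value;
}
```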

9 years agovc4: Don't bother zero-initializing the shader reloc indices.
Eric Anholt [Thu, 25 Dec 2014 22:14:54 +0000 (12:14 -1000)]
vc4: Don't bother zero-initializing the shader reloc indices.

They should all be set to real values by the time they're read, and
ideally if you used valgrind you'd see uninitialized value uses.

9 years agovc4: Fix the argument type for cl_u16().
Eric Anholt [Thu, 25 Dec 2014 19:35:46 +0000 (09:35 -1000)]
vc4: Fix the argument type for cl_u16().

It doesn't matter, since it just got truncated to 16 bits inside, anyway.

9 years agoegl: Fix non-dri SCons builds re #87657
Alexander von Gluck IV [Wed, 24 Dec 2014 13:44:25 +0000 (07:44 -0600)]
egl: Fix non-dri SCons builds re #87657

* Revert change to egl main producing Shared Libraries
* Check for dri before including dri code

9 years agoradeonsi: Don't modify PA_SC_RASTER_CONFIG register value if rb_mask == 0
Michel Dänzer [Tue, 9 Dec 2014 08:00:32 +0000 (17:00 +0900)]
radeonsi: Don't modify PA_SC_RASTER_CONFIG register value if rb_mask == 0

E.g. this could happen on older kernels which don't support the
RADEON_INFO_SI_BACKEND_ENABLED_MASK query yet. The code in
si_write_harvested_raster_configs() doesn't deal with this correctly and
would probably mangle the value badly.

Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
9 years agovc4: Optimize CL emits by doing size checks up front.
Eric Anholt [Mon, 22 Dec 2014 18:09:10 +0000 (10:09 -0800)]
vc4: Optimize CL emits by doing size checks up front.

The optimizer obviously doesn't have the ability to rewrite these to skip
the size checks per call, so we have to do it manually.
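
The pattern looks roughly like the sketch below. The names are hypothetical and the buffer is fixed-size for brevity, whereas a real command list would presumably grow instead of failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size command list. */
struct cl {
   uint8_t data[16];
   size_t used;
};

static bool cl_has_space(const struct cl *cl, size_t bytes)
{
   return bytes <= sizeof(cl->data) - cl->used;
}

/* No bounds check here: the caller must have reserved space already. */
static void cl_u8_unchecked(struct cl *cl, uint8_t v)
{
   assert(cl->used < sizeof(cl->data));
   cl->data[cl->used++] = v;
}

/* One size check for the whole packet instead of one per byte. */
static bool emit_packet(struct cl *cl, uint8_t opcode, uint8_t a, uint8_t b)
{
   if (!cl_has_space(cl, 3))
      return false;
   cl_u8_unchecked(cl, opcode);
   cl_u8_unchecked(cl, a);
   cl_u8_unchecked(cl, b);
   return true;
}
```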

Improves a norast benchmark on simulation by 0.779706% +/- 0.405838%
(n=6087).

9 years agovc4: Avoid repeated hindex lookups in the loop over tiles.
Eric Anholt [Sun, 21 Dec 2014 21:10:25 +0000 (13:10 -0800)]
vc4: Avoid repeated hindex lookups in the loop over tiles.

Improves norast performance of a microbenchmark by 11.1865% +/- 2.37673%
(n=20).

9 years agoi965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.
Kenneth Graunke [Tue, 23 Dec 2014 02:43:08 +0000 (18:43 -0800)]
i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.

This was probably missed when moving from a fixed binding table layout
to a dynamic one that changes based on the shader.

Fixes newly proposed Piglit test fbo-mrt-new-bind.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87619
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Mike Stroyan <mike@LunarG.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
9 years agoi965: Cache register write capability checks.
Kenneth Graunke [Mon, 22 Dec 2014 08:55:37 +0000 (00:55 -0800)]
i965: Cache register write capability checks.

Our ability to perform register writes depends on the hardware and
kernel version.  It shouldn't ever change on a per-context basis,
so we only need to check once.

Checking introduces a synchronization point between the CPU and GPU:
even though we submit very few GPU commands, the GPU might be busy doing
other work, which could cause us to stall for a while.

On an idle i7 4750HQ, this improves performance in OglDrvCtx (a context
creation microbenchmark) by 6.14748% +/- 1.6837% (n=20).  With Unigine
Valley running in the background (to keep the GPU busy), it improves
performance in OglDrvCtx by 2290.92% +/- 29.5274% (n=5).
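
The caching itself is a plain check-once pattern. In this sketch the struct, its fields, and probe_register_writes() are hypothetical stand-ins; where the cached result lives in the driver is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

static int probe_count;

/* Hypothetical stand-in for the expensive probe that syncs with the GPU. */
static bool probe_register_writes(void)
{
   probe_count++;
   return true;
}

/* Hypothetical object shared by all contexts. */
struct screen {
   bool checked_register_writes;
   bool can_write_registers;
};

static bool can_do_register_writes(struct screen *s)
{
   assert(s);
   if (!s->checked_register_writes) {
      s->can_write_registers = probe_register_writes();
      s->checked_register_writes = true;
   }
   return s->can_write_registers;
}
```

Every context creation after the first then skips the synchronization point entirely.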

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
9 years agofreedreno/ir3: split out legalize pass
Rob Clark [Sat, 25 Oct 2014 19:47:21 +0000 (15:47 -0400)]
freedreno/ir3: split out legalize pass

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno/ir3: ra debug
Rob Clark [Sat, 25 Oct 2014 18:04:29 +0000 (14:04 -0400)]
freedreno/ir3: ra debug

Some compile time RA debug

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agoegl/haiku: Clean up SConscript whitespace
Alexander von Gluck IV [Mon, 22 Dec 2014 16:29:56 +0000 (11:29 -0500)]
egl/haiku: Clean up SConscript whitespace

9 years agoegl/dri2: Fix build of dri2 egl driver with SCons
Alexander von Gluck IV [Mon, 22 Dec 2014 16:27:35 +0000 (11:27 -0500)]
egl/dri2: Fix build of dri2 egl driver with SCons

* egl/dri2 was missing a SConscript
* Problem caught by Adrián Arroyo Calle

9 years agoegl: Clean up Haiku visual creation
Alexander von Gluck IV [Mon, 22 Dec 2014 16:02:50 +0000 (16:02 +0000)]
egl: Clean up Haiku visual creation

* Only create one struct
* 'final' also is a language conflict
* Some style cleanup

9 years agoegl: Add Haiku code and support
Alexander von Gluck IV [Mon, 22 Dec 2014 15:10:13 +0000 (10:10 -0500)]
egl: Add Haiku code and support

* This is the cleaned up work of the Haiku GCI student
  Adrián Arroyo Calle <adrian.arroyocalle@gmail.com>
* Several patches were consolidated to prevent
  unnecessary touching of non-related code

9 years agoglsl: check if implicitly sized arrays match explicitly sized arrays across the same...
Timothy Arceri [Tue, 25 Nov 2014 12:04:23 +0000 (23:04 +1100)]
glsl: check if implicitly sized arrays match explicitly sized arrays across the same stage

V2: Improve error message.

Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years agoi965: Use safer pointer arithmetic in gather_oa_results()
Chad Versace [Wed, 19 Nov 2014 05:11:26 +0000 (21:11 -0800)]
i965: Use safer pointer arithmetic in gather_oa_results()

This patch reduces the likelihood of pointer arithmetic overflow bugs in
gather_oa_results(), like the one fixed by b69c7c5dac.

I haven't yet encountered any overflow bugs in the wild along this
patch's codepath. But I get nervous when I see code patterns like this:

   (void*) + (int) * (int)

I smell 32-bit overflow all over this code.

This patch retypes 'snapshot_size' to 'ptrdiff_t', which should fix any
potential overflow.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
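
The retyping described above can be sketched in C. This is a minimal
illustration, not the code from gather_oa_results(); the helper names
snapshot_offset() and snapshot_at() are hypothetical. Because
'snapshot_size' is a ptrdiff_t, the int index is converted before the
multiply, so the product is computed at ptrdiff_t width instead of int
width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of snapshot 'index' in a buffer of
 * fixed-size snapshots. The multiplication happens in ptrdiff_t because
 * one operand already has that type. */
static ptrdiff_t
snapshot_offset(ptrdiff_t snapshot_size, int index)
{
   assert(snapshot_size >= 0 && index >= 0);
   return snapshot_size * index;
}

/* Hypothetical helper: address of snapshot 'index'. Byte arithmetic is
 * done on uint8_t*, since arithmetic on void* is not standard C. */
static void *
snapshot_at(void *base, ptrdiff_t snapshot_size, int index)
{
   return (uint8_t *) base + snapshot_offset(snapshot_size, index);
}
```

On a 64-bit build this keeps large offsets from wrapping at 32 bits; on
a 32-bit build ptrdiff_t is typically no wider than int, so nothing is
gained there.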
9 years agoi965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()
Chad Versace [Wed, 19 Nov 2014 05:11:25 +0000 (21:11 -0800)]
i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()

This patch reduces the likelihood of pointer arithmetic overflow bugs in
intel_texsubimage_tiled_memcpy(), like the one fixed by b69c7c5dac.

I haven't yet encountered any overflow bugs in the wild along this
patch's codepath. But I recently solved, in commit b69c7c5dac, an overflow
bug in a line of code that looks very similar to pointer arithmetic in
this function.

This patch conceptually applies the same fix as in b69c7c5dac. Instead
of retyping the variables, though, this patch adds some casts. (I tried
to retype the variables as ptrdiff_t, but it quickly got very messy. The
casts are cleaner).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
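
The cast-based variant can be sketched as follows. This is an
assumption-laden illustration, not the body of
intel_texsubimage_tiled_memcpy(); copy_rows() and its parameters are
hypothetical. The variables keep their int types and are cast to
ptrdiff_t only where they enter pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical row copy. Pitches and the row counter stay int; each is
 * cast to ptrdiff_t at the point where it is multiplied into an offset. */
static void
copy_rows(void *dst, int dst_pitch, const void *src, int src_pitch,
          int width_bytes, int height)
{
   assert(width_bytes >= 0 && height >= 0);

   for (int y = 0; y < height; y++) {
      memcpy((uint8_t *) dst + (ptrdiff_t) y * (ptrdiff_t) dst_pitch,
             (const uint8_t *) src + (ptrdiff_t) y * (ptrdiff_t) src_pitch,
             (size_t) width_bytes);
   }
}
```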
9 years agoi965: Fix intel_miptree_map() signature to be more 64-bit safe
Chad Versace [Wed, 19 Nov 2014 05:11:24 +0000 (21:11 -0800)]
i965: Fix intel_miptree_map() signature to be more 64-bit safe

This patch should diminish the likelihood of pointer arithmetic overflow
bugs, like the one fixed by b69c7c5dac.

Change the type of parameter 'out_stride' from int to ptrdiff_t. The
logic is that if you call intel_miptree_map() and use the value of
'out_stride', then you must be doing pointer arithmetic on 'out_ptr'.
Using ptrdiff_t instead of int should make it a little bit harder to hit
overflow bugs.

As a side-effect, some function-scope variables needed to be retyped to
avoid compilation errors.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
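
A sketch of the signature idea, under stated assumptions: struct image,
image_map() and image_row() are hypothetical stand-ins, not Mesa's
miptree types or intel_miptree_map() itself. The stride is returned as
ptrdiff_t because the caller is expected to use it for pointer
arithmetic on the mapped pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mappable image. */
struct image {
   uint8_t *data;
   int pitch;        /* bytes per row, as stored internally */
};

/* Returns the mapping through out_ptr and the row stride through
 * out_stride. The stride is widened to ptrdiff_t once, here. */
static void
image_map(struct image *img, void **out_ptr, ptrdiff_t *out_stride)
{
   assert(img && out_ptr && out_stride);
   *out_ptr = img->data;
   *out_stride = img->pitch;
}

/* Caller side: address of row y, computed at ptrdiff_t width. */
static uint8_t *
image_row(void *ptr, ptrdiff_t stride, int y)
{
   return (uint8_t *) ptr + stride * y;
}
```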
9 years agoi965: Remove spurious casts in copy_image_with_memcpy()
Chad Versace [Wed, 19 Nov 2014 05:11:23 +0000 (21:11 -0800)]
i965: Remove spurious casts in copy_image_with_memcpy()

If a pointer points to raw, untyped memory and is never dereferenced,
then declare it as 'void*' instead of casting it to 'void*'.

Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
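
The rule stated above, shown as a minimal sketch. copy_bytes() is a
hypothetical helper, not copy_image_with_memcpy(). The pointers are
never dereferenced, so they are declared as void* and handed to
memcpy() without any casts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: src and dst refer to raw, untyped memory.
 * Declaring them void* avoids a "(void *) ptr" cast at every use. */
static void
copy_bytes(void *dst, const void *src, size_t size)
{
   assert(dst && src);
   memcpy(dst, src, size);
}
```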
9 years agoradeonsi: force NaNs to 0
Marek Olšák [Wed, 10 Dec 2014 20:08:50 +0000 (21:08 +0100)]
radeonsi: force NaNs to 0

This fixes incorrect rendering in Unreal Engine demos.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83510

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agost/nine: fix DBG typo (trivial)
David Heidelberg [Fri, 19 Dec 2014 13:13:15 +0000 (14:13 +0100)]
st/nine: fix DBG typo (trivial)

Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
9 years agor300g: implement ARR opcode
David Heidelberg [Fri, 19 Dec 2014 13:11:21 +0000 (14:11 +0100)]
r300g: implement ARR opcode

Same as ARL, just has extra rounding.
Useful for st/nine.

Tested-by: Pavel Ondračka <pavel.ondracka@email.cz>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
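
The difference between the two opcodes can be sketched in C, following
the TGSI description of ARL as a floor and ARR as a round. arl() and
arr() are hypothetical scalar models, not r300g code, and the
tie-breaking rule used for ARR here (floor(x + 0.5)) is an assumption;
the hardware may round ties differently.

```c
#include <assert.h>

/* Model of ARL: load the floor of src into an integer address register.
 * Only meaningful when src fits in an int. */
static int
arl(float src)
{
   int i = (int) src;                  /* truncates toward zero */
   return (src < (float) i) ? i - 1 : i;
}

/* Model of ARR: like ARL, but rounds to the nearest integer first.
 * Ties round up in this sketch (an assumption). */
static int
arr(float src)
{
   return arl(src + 0.5f);
}
```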
9 years agofreedreno/a4xx: blend-color
Rob Clark [Sat, 20 Dec 2014 17:04:05 +0000 (12:04 -0500)]
freedreno/a4xx: blend-color

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno/a4xx: alpha-test
Rob Clark [Sat, 20 Dec 2014 17:01:02 +0000 (12:01 -0500)]
freedreno/a4xx: alpha-test

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno: update generated headers
Rob Clark [Sat, 20 Dec 2014 16:49:34 +0000 (11:49 -0500)]
freedreno: update generated headers

9 years agofreedreno/ir3: trans_kill cleanup
Rob Clark [Sat, 20 Dec 2014 16:46:43 +0000 (11:46 -0500)]
freedreno/ir3: trans_kill cleanup

trans_kill() only handles the single opcode.  Drop the remnant of a time
when both KILL and KILL_IF were handled by the same fxn.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno/ir3: hack for standalone compiler
Rob Clark [Sat, 20 Dec 2014 16:44:28 +0000 (11:44 -0500)]
freedreno/ir3: hack for standalone compiler

Standalone compiler doesn't have screen or context.  We need to come up
with a better way to control the target arch (ie. something that we can
control from cmdline w/ standalone compiler) but for now this hack keeps
it from segfault'ing.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agoi965/fs: Add missing const qualifier.
Matt Turner [Fri, 19 Dec 2014 20:55:13 +0000 (12:55 -0800)]
i965/fs: Add missing const qualifier.

9 years agovc4: Coalesce MOVs into VPM with the instructions generating the values.
Eric Anholt [Thu, 18 Dec 2014 04:35:17 +0000 (20:35 -0800)]
vc4: Coalesce MOVs into VPM with the instructions generating the values.

total instructions in shared programs: 41168 -> 40976 (-0.47%)
instructions in affected programs:     18156 -> 17964 (-1.06%)

9 years agovc4: Redefine VPM writes as a (destination) QIR register file.
Eric Anholt [Thu, 18 Dec 2014 04:23:57 +0000 (20:23 -0800)]
vc4: Redefine VPM writes as a (destination) QIR register file.

This will let me coalesce the VPM writes into the instructions generating
the values.

9 years agodocs: note change in minimum GCC version to 4.2.0
Timothy Arceri [Wed, 17 Dec 2014 20:46:24 +0000 (07:46 +1100)]
docs: note change in minimum GCC version to 4.2.0

Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
9 years agogallium: remove support for GCC older than 4.2.0
Timothy Arceri [Wed, 17 Dec 2014 20:45:04 +0000 (07:45 +1100)]
gallium: remove support for GCC older than 4.2.0

Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agomesa: bump required GCC version to 4.2.0
Timothy Arceri [Wed, 17 Dec 2014 20:29:47 +0000 (07:29 +1100)]
mesa: bump required GCC version to 4.2.0

It turns out Mesa hasn't compiled on less than 4.2 for a while,
so update conf to reflect this.

Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agovc4: Add support for turning constant uniforms into small immediates.
Eric Anholt [Wed, 10 Dec 2014 22:56:46 +0000 (14:56 -0800)]
vc4: Add support for turning constant uniforms into small immediates.

Small immediates have the downside of taking over the raddr B field, so
you might have less chance to pack instructions together thanks to raddr B
conflicts.  However, it also reduces some register pressure since it lets
you load 2 "uniform" values in one instruction (avoiding a previous load
of the constant value to a register), and increases some pairing for the
same reason.

total uniforms in shared programs: 16231 -> 13374 (-17.60%)
uniforms in affected programs:     10280 -> 7423 (-27.79%)
total instructions in shared programs: 40795 -> 41168 (0.91%)
instructions in affected programs:     25551 -> 25924 (1.46%)

In a previous version of this patch I had a reduction in instruction count
by forcing the other args alongside a SMALL_IMM to be in the A file or
accumulators, but that increases register pressure and had a bug in
handling FRAG_Z.  In this patch I just use raddr conflict resolution,
which is more expensive.  I think I'd rather tweak allocation to have some
way to slightly prefer good choices for files in general, rather than risk
failing to register allocate by forcing things into register classes.

9 years agovc4: Move follow_movs() to common QIR code.
Eric Anholt [Wed, 10 Dec 2014 23:37:07 +0000 (15:37 -0800)]
vc4: Move follow_movs() to common QIR code.

I want this from other passes.

9 years agovc4: Fix missing newline for load immediate instruction disasm.
Eric Anholt [Thu, 11 Dec 2014 00:20:36 +0000 (16:20 -0800)]
vc4: Fix missing newline for load immediate instruction disasm.

9 years agomesa: Remove unnecessary -f from $(RM).
Matt Turner [Wed, 17 Dec 2014 21:19:37 +0000 (13:19 -0800)]
mesa: Remove unnecessary -f from $(RM).

$(RM) includes -f.

9 years agomesa: Remove tarballs/checksum rules.
Matt Turner [Mon, 8 Dec 2014 02:09:49 +0000 (18:09 -0800)]
mesa: Remove tarballs/checksum rules.

9 years agogallium: Add egl and gbm to distribution.
Matt Turner [Wed, 17 Dec 2014 21:37:38 +0000 (13:37 -0800)]
gallium: Add egl and gbm to distribution.

9 years agomesa: Set DISTCHECK_CONFIGURE_FLAGS.
Matt Turner [Wed, 17 Dec 2014 20:41:02 +0000 (12:41 -0800)]
mesa: Set DISTCHECK_CONFIGURE_FLAGS.

Enable some non-default options that distros are likely to use.

9 years agotargets/xvmc: Add uninstall hooks to handle megadriver hardlinks.
Matt Turner [Wed, 17 Dec 2014 20:40:43 +0000 (12:40 -0800)]
targets/xvmc: Add uninstall hooks to handle megadriver hardlinks.

9 years agotargets/vdpau: Add uninstall hooks to handle megadriver hardlinks.
Matt Turner [Wed, 17 Dec 2014 20:40:30 +0000 (12:40 -0800)]
targets/vdpau: Add uninstall hooks to handle megadriver hardlinks.

9 years agotargets/vdpau: Add clean-local rule to remove .lib links.
Matt Turner [Wed, 17 Dec 2014 20:39:59 +0000 (12:39 -0800)]
targets/vdpau: Add clean-local rule to remove .lib links.

9 years agovc4: Add a userspace BO cache.
Eric Anholt [Sat, 13 Dec 2014 23:27:39 +0000 (15:27 -0800)]
vc4: Add a userspace BO cache.

Since our kernel BOs require CMA allocation, and the use of them requires
new mmaps, it's pretty expensive and we should avoid it if possible.
Copying my original design for Intel, make a userspace cache that reuses
BOs that haven't been shared to other processes but frees BOs that have
sat in the cache for over a second.

Improves glxgears framerate on RPi by around 30%.
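
The caching policy described above can be sketched as follows. This is
not the vc4 implementation: struct bo, struct bo_cache, the fixed slot
count, the exact-size matching and the use of malloc/free in place of
kernel allocations are all assumptions made to keep the example
self-contained. The current time is passed in explicitly so the
one-second expiry is easy to exercise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define CACHE_SLOTS 8

struct bo {
   size_t size;
   bool shared;       /* exported to another process: never cached */
   time_t free_time;  /* when the BO entered the cache */
};

struct bo_cache {
   struct bo *slots[CACHE_SLOTS];
};

/* Free cached BOs that have sat in the cache for over a second. */
static void
bo_cache_expire(struct bo_cache *cache, time_t now)
{
   for (int i = 0; i < CACHE_SLOTS; i++) {
      struct bo *bo = cache->slots[i];
      if (bo && now - bo->free_time > 1) {
         free(bo);
         cache->slots[i] = NULL;
      }
   }
}

/* Reuse a cached BO of exactly this size, or allocate a new one. */
static struct bo *
bo_alloc(struct bo_cache *cache, size_t size)
{
   for (int i = 0; i < CACHE_SLOTS; i++) {
      struct bo *bo = cache->slots[i];
      if (bo && bo->size == size) {
         cache->slots[i] = NULL;
         return bo;
      }
   }

   struct bo *bo = calloc(1, sizeof(*bo));
   if (bo)
      bo->size = size;
   return bo;
}

/* Release a BO: keep it in the cache unless it is shared or the cache
 * is full, in which case it is freed immediately. */
static void
bo_release(struct bo_cache *cache, struct bo *bo, time_t now)
{
   assert(bo);
   bo_cache_expire(cache, now);

   if (!bo->shared) {
      for (int i = 0; i < CACHE_SLOTS; i++) {
         if (!cache->slots[i]) {
            bo->free_time = now;
            cache->slots[i] = bo;
            return;
         }
      }
   }
   free(bo);
}
```

In real use 'now' would come from a clock such as time(NULL). A
production cache would more likely bucket BOs by size than scan a flat
array.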