mesa.git
11 years ago  mesa: Add unreachable() macro.
Matt Turner [Tue, 5 Nov 2013 00:24:35 +0000 (16:24 -0800)]
mesa: Add unreachable() macro.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
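
For readers unfamiliar with the idiom, the following is a minimal sketch of how an unreachable-style macro is commonly built and used. It is not the macro added by this commit: the name SKETCH_UNREACHABLE, the enum, and the helper function are hypothetical, and the use of __builtin_unreachable() assumes a GCC- or Clang-compatible compiler.

```c
#include <assert.h>

/* Hypothetical sketch, not the macro from this commit. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNREACHABLE() \
   do { assert(!"unreachable code"); __builtin_unreachable(); } while (0)
#else
#define SKETCH_UNREACHABLE() assert(!"unreachable code")
#endif

/* Hypothetical enum used only for illustration. */
enum sketch_channel { SKETCH_CHAN_X, SKETCH_CHAN_Y, SKETCH_CHAN_Z };

static int
sketch_channel_index(enum sketch_channel c)
{
   switch (c) {
   case SKETCH_CHAN_X: return 0;
   case SKETCH_CHAN_Y: return 1;
   case SKETCH_CHAN_Z: return 2;
   }
   /* Every enum value returns above, so control should never get here. */
   SKETCH_UNREACHABLE();
   return -1; /* Only matters for the fallback branch built with NDEBUG. */
}
```

The point of such a macro is to document the invariant and, where the compiler supports it, let the optimizer drop the impossible path.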
11 years ago  gallivm: fix indirect addressing of inputs
Roland Scheidegger [Wed, 6 Nov 2013 14:40:25 +0000 (15:40 +0100)]
gallivm: fix indirect addressing of inputs

We weren't adding the soa offsets when constructing the indices
for the gather functions. That meant that we were always returning
the data in the first element.
(Copied straight from the same fix for temps.)
While here fix up a couple of broken comments in the fetch functions,
plus don't name a straight float type float4 which is just confusing.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Zack Rusin <zackr@vmware.com>
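
The indexing problem described in this commit can be illustrated with a small scalar model. This is only a sketch under assumptions: it supposes that SoA storage keeps each (register, channel) pair as a contiguous vector of `length` lanes, and the function name and layout are hypothetical rather than taken from gallivm.

```c
/*
 * Hypothetical scalar model of SoA gather indices (not gallivm code).
 * Assumed layout: each (register, channel) pair occupies `length`
 * consecutive floats, one per SIMD lane.
 */
static void
sketch_soa_gather_indices(const unsigned *reg_index, unsigned chan,
                          unsigned length, unsigned *out)
{
   for (unsigned lane = 0; lane < length; lane++) {
      /* Start of the vector holding this register/channel pair. */
      unsigned base = (reg_index[lane] * 4 + chan) * length;
      /* Without "+ lane", every lane would read element 0 of the vector. */
      out[lane] = base + lane;
   }
}
```

Dropping the per-lane term reproduces the symptom in the commit message: all lanes fetch the data of the first element.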
11 years ago  r600/llvm: Fix isampleBuffer on preEG
Vincent Lejeune [Mon, 21 Oct 2013 19:05:57 +0000 (21:05 +0200)]
r600/llvm: Fix isampleBuffer on preEG

11 years ago  r600/llvm: Fix texbuf for pre EG gen
Vincent Lejeune [Mon, 21 Oct 2013 16:48:21 +0000 (18:48 +0200)]
r600/llvm: Fix texbuf for pre EG gen

11 years ago  mesa: for GLSL_DUMP_ON_ERROR, also dump the info log
Brian Paul [Tue, 5 Nov 2013 23:58:15 +0000 (16:58 -0700)]
mesa: for GLSL_DUMP_ON_ERROR, also dump the info log

Since it's helpful to know why the shader did not compile.
Also, call fflush() for Windows.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years ago  st/vdpau: resolve delayed rendering for GL interop v2
Grigori Goronzy [Tue, 5 Nov 2013 23:35:31 +0000 (00:35 +0100)]
st/vdpau: resolve delayed rendering for GL interop v2

Otherwise OutputSurface interop has funny results sometimes.
This fixes interop with the mpv media player.

v2 (chk): add proper locking

Signed-off-by: Christian König <christian.koenig@amd.com>
11 years ago  docs: Mark off ARB_sample_shading; minor tidyup.
Chris Forbes [Wed, 6 Nov 2013 06:35:41 +0000 (19:35 +1300)]
docs: Mark off ARB_sample_shading; minor tidyup.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
11 years ago  i965/fs: Gen4-5: Implement alpha test in shader for MRT
Chris Forbes [Sat, 26 Oct 2013 23:32:03 +0000 (12:32 +1300)]
i965/fs: Gen4-5: Implement alpha test in shader for MRT

V2: Add comment explaining what emit_alpha_test() is for;
    fix spurious temp and bogus whitespace.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago  i965/fs: Gen4-5: Setup discard masks for MRT alpha test
Chris Forbes [Sun, 27 Oct 2013 15:18:29 +0000 (04:18 +1300)]
i965/fs: Gen4-5: Setup discard masks for MRT alpha test

The same setup is required here as when the user-provided shader
explicitly uses KIL or discard.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago  i965: Gen4-5: Include alpha func/ref in program key
Chris Forbes [Sat, 26 Oct 2013 23:09:51 +0000 (12:09 +1300)]
i965: Gen4-5: Include alpha func/ref in program key

V2: Better explanation of the rationale for doing this.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago  i965: Gen4-5: Don't enable hardware alpha test with MRT
Chris Forbes [Sat, 26 Oct 2013 23:09:51 +0000 (12:09 +1300)]
i965: Gen4-5: Don't enable hardware alpha test with MRT

We have to do this in the shader instead, since these gens lack an
independent RT0 alpha value in their render target write messages.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
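
As a rough illustration of what "doing the alpha test in the shader" amounts to, the sketch below models the comparison in plain C. It assumes the test compares the alpha written to render target 0 against the reference value from the program key; the enum and function names are hypothetical and this is not the code emitted by emit_alpha_test().

```c
#include <stdbool.h>

/* Hypothetical comparison modes, mirroring the usual alpha functions. */
enum sketch_alpha_func {
   SKETCH_ALPHA_NEVER,
   SKETCH_ALPHA_LESS,
   SKETCH_ALPHA_EQUAL,
   SKETCH_ALPHA_LEQUAL,
   SKETCH_ALPHA_GREATER,
   SKETCH_ALPHA_NOTEQUAL,
   SKETCH_ALPHA_GEQUAL,
   SKETCH_ALPHA_ALWAYS
};

/* Returns true when the fragment survives; false means "discard". */
static bool
sketch_alpha_test_passes(enum sketch_alpha_func func, float alpha, float ref)
{
   switch (func) {
   case SKETCH_ALPHA_NEVER:    return false;
   case SKETCH_ALPHA_LESS:     return alpha < ref;
   case SKETCH_ALPHA_EQUAL:    return alpha == ref;
   case SKETCH_ALPHA_LEQUAL:   return alpha <= ref;
   case SKETCH_ALPHA_GREATER:  return alpha > ref;
   case SKETCH_ALPHA_NOTEQUAL: return alpha != ref;
   case SKETCH_ALPHA_GEQUAL:   return alpha >= ref;
   case SKETCH_ALPHA_ALWAYS:   return true;
   }
   return true; /* Unknown mode: keep the fragment. */
}
```

Because the function and reference value change the generated code in this model, it also hints at why they would need to be part of the program key.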
11 years ago  i965: Combine {brw,gen7}_update_texture_buffer_surface() functions.
Kenneth Graunke [Sat, 2 Nov 2013 03:05:27 +0000 (20:05 -0700)]
i965: Combine {brw,gen7}_update_texture_buffer_surface() functions.

Now that brw_update_texture_buffer_surface() uses the virtual
emit_buffer_surface_state() function, it works for Gen7+ too.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965: Unvirtualize brw_create_constant_surface; delete Gen7+ variant.
Kenneth Graunke [Sat, 2 Nov 2013 00:37:10 +0000 (17:37 -0700)]
i965: Unvirtualize brw_create_constant_surface; delete Gen7+ variant.

Now that brw_create_constant_surface uses a virtual function internally,
it doesn't need to be virtual itself.  We can delete the Gen7+ variant
and simplify things.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965: Use the new emit_buffer_surface_state() vtable entry.
Kenneth Graunke [Sat, 2 Nov 2013 00:33:42 +0000 (17:33 -0700)]
i965: Use the new emit_buffer_surface_state() vtable entry.

This will allow us to combine the Gen4-6 and Gen7 variants of these
functions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965: Virtualize emit_buffer_surface_state().
Kenneth Graunke [Fri, 25 Oct 2013 18:37:06 +0000 (11:37 -0700)]
i965: Virtualize emit_buffer_surface_state().

This entails adding "mocs" and "rw" parameters to the Gen4-5 version.
I made it actually pay attention to the rw flag (even though it is
always false), but mocs is always ignored.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965: Fix compiler warning.
Courtney Goeltzenleuchter [Wed, 30 Oct 2013 21:58:30 +0000 (15:58 -0600)]
i965: Fix compiler warning.

fix: intel_screen.c:1320:4: warning: initialization from
incompatible pointer type [enabled by default]

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago  i965: Tell the unit states how many binding table entries we have.
Eric Anholt [Sat, 2 Nov 2013 00:43:43 +0000 (17:43 -0700)]
i965: Tell the unit states how many binding table entries we have.

Before the series with 3c9dc2d31b80fc73bffa1f40a91443a53229c8e2 to
dynamically assign our binding table indices, we didn't really track our
binding table count per shader, so we never filled in these fields.

Affects cairo-gl trace runtime by -2.47953% +/- 1.07281% (n=20)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago  i965: Fix context initialization after 2f896627175384fd5
Eric Anholt [Mon, 4 Nov 2013 23:49:52 +0000 (15:49 -0800)]
i965: Fix context initialization after 2f896627175384fd5

You can't return stack-initialized values and expect anything good to
happen.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
11 years ago  gallivm: optimize lp_build_minify for sse
Roland Scheidegger [Tue, 5 Nov 2013 18:21:25 +0000 (19:21 +0100)]
gallivm: optimize lp_build_minify for sse

SSE can't handle true vector shifts (with variable shift count),
so llvm is turning them into a mess of extracts, scalar shifts and inserts.
It is however possible to emulate them in lp_build_minify with float muls,
which should be way faster (saves over 20 instructions per 8-wide
lp_build_minify). This wouldn't work for "generic" 32bit shifts though
since we've got only 24bits of mantissa (actually for left shifts it would
work by using sse41 int mul instead of float mul but not for right shifts).
Note that this has very limited scope for now, since this is only used with
per-pixel lod (otherwise we're avoiding the non-constant shift count by doing
per-quad shifts manually), and only 1d textures even then (though the latter
should change).

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
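
The float-multiply trick described in this commit can be modelled in scalar C. This is a sketch, not the gallivm implementation: it assumes base sizes below 2^24 (so they are exactly representable as floats) and levels small enough that 2^-level stays a normal float; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Build 2^-level directly from the IEEE-754 exponent bits (level <= 126). */
static float
sketch_pow2_neg(uint32_t level)
{
   uint32_t bits = (127u - level) << 23;
   float f;
   memcpy(&f, &bits, sizeof f);
   return f;
}

/* minify(base, level) = max(base >> level, 1), without a variable shift. */
static uint32_t
sketch_minify_float(uint32_t base_size, uint32_t level)
{
   /* Multiplying by a power of two is exact for base_size < 2^24. */
   float scaled = (float)base_size * sketch_pow2_neg(level);
   /* Truncation toward zero matches the right shift for these inputs. */
   uint32_t size = (uint32_t)scaled;
   return size > 1 ? size : 1;
}
```

The 24-bit mantissa limit mentioned in the commit message is what restricts this approach to texture sizes rather than generic 32-bit shifts.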
11 years ago  nouveau: Use _NEW_SCISSOR instead of hooking through dd_function_table
Ian Romanick [Fri, 1 Nov 2013 21:56:53 +0000 (14:56 -0700)]
nouveau: Use _NEW_SCISSOR instead of hooking through dd_function_table

This will enable removing the dd_function_table::Scissor hook in the
near future.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
11 years ago  nouveau: Use _NEW_VIEWPORT instead of hooking through dd_function_table
Ian Romanick [Fri, 1 Nov 2013 21:56:28 +0000 (14:56 -0700)]
nouveau: Use _NEW_VIEWPORT instead of hooking through dd_function_table

This will enable removing the dd_function_table::DepthRange hook in the
near future.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
11 years ago  radeon / r200: Don't pass unused parameters to radeon_viewport
Ian Romanick [Fri, 1 Nov 2013 18:40:44 +0000 (11:40 -0700)]
radeon / r200: Don't pass unused parameters to radeon_viewport

The x, y, width, and height parameters aren't used by radeon_viewport,
so don't pass them.  This should make future changes to the
dd_function_table::Viewport interface a little easier.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jljusten@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Courtney Goeltzenleuchter <courtney@lunarg.com>
11 years ago  i915: Bring sanity to the Viewport function
Ian Romanick [Fri, 1 Nov 2013 18:38:25 +0000 (11:38 -0700)]
i915: Bring sanity to the Viewport function

The i830 and the i915 driver have the same dd_function_table::Viewport
function... it just has two names and lives in two places.  Using a
single implementation allows cleaning up the saved_viewport nonsense
too.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jljusten@gmail.com>
Cc: Courtney Goeltzenleuchter <courtney@lunarg.com>
11 years ago  i965: Eliminate the saved_viewport wrapper
Ian Romanick [Fri, 1 Nov 2013 18:36:47 +0000 (11:36 -0700)]
i965: Eliminate the saved_viewport wrapper

The i965 driver never installed a dd_function_table::Viewport function,
so this wrapper never actually did anything.

No piglit regressions on IVB on DRI2.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jljusten@gmail.com>
Cc: Courtney Goeltzenleuchter <courtney@lunarg.com>
11 years ago  mesa: Remove last BEOS checks
Alexander von Gluck IV [Tue, 5 Nov 2013 01:31:26 +0000 (01:31 +0000)]
mesa: Remove last BEOS checks

* Goodbye BeOS, we hardly knew thee
* As BeOS was gcc2 only, there was little chance
  of this being useful.
* Doesn't affect Haiku in any meaningful way

Reviewed-by: Brian Paul <brianp@vmware.com>
11 years ago  util/u_format: take normalized flag in consideration in util_format_is_rgba8_variant
José Fonseca [Fri, 25 Oct 2013 11:39:42 +0000 (12:39 +0100)]
util/u_format: take normalized flag in consideration in util_format_is_rgba8_variant

Just happened to notice it was missing while looking at it.

11 years ago  glsl: Don't generate misleading debug names when packing gs inputs.
Paul Berry [Thu, 31 Oct 2013 00:01:01 +0000 (17:01 -0700)]
glsl: Don't generate misleading debug names when packing gs inputs.

Previously, when packing geometry shader input varyings like this:

    in float foo[3];
    in float bar[3];

lower_packed_varyings would declare a packed varying like this:

    (declare (shader_in flat) (array ivec4 3) packed:foo[0],bar[0])

That's confusing, since the packed varying actually stores all three
values of foo and all three values of bar.

This patch causes it to generate the more sensible declaration:

    (declare (shader_in flat) (array ivec4 3) packed:foo,bar)

Note that there should be no functional change for users of geometry
shaders, since the packed name is only used for generating debug
output.  But this should reduce confusion when using INTEL_DEBUG=gs.

Reviewed-by: Eric Anholt <eric@anholt.net>
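
To make the naming change concrete, here is a small sketch that builds a debug name of the form "packed:foo,bar" from a list of variable names. The helper is hypothetical and written in C purely for illustration; it is not the code in lower_packed_varyings.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: join variable names into "packed:a,b,...". */
static void
sketch_packed_name(char *out, size_t out_size,
                   const char *const *names, size_t count)
{
   size_t used = (size_t)snprintf(out, out_size, "packed:");

   /* Stop appending once the buffer is full; snprintf keeps it terminated. */
   for (size_t i = 0; i < count && used < out_size; i++) {
      used += (size_t)snprintf(out + used, out_size - used, "%s%s",
                               i ? "," : "", names[i]);
   }
}
```

Using the bare variable names, without an array subscript such as [0], avoids suggesting that only one element of each array is stored in the packed varying.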
11 years ago  gallivm: Remove llvm::DisablePrettyStackTrace for LLVM >= 3.4.
Vinson Lee [Mon, 4 Nov 2013 04:27:13 +0000 (20:27 -0800)]
gallivm: Remove llvm::DisablePrettyStackTrace for LLVM >= 3.4.

LLVM 3.4 r193971 removed llvm::DisablePrettyStackTrace and made the
pretty stack trace opt-in rather than opt-out.

The default value of DisablePrettyStackTrace has changed to true in LLVM
3.4 and newer.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60929
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
11 years ago  target/haiku-softpipe: Fix viewport issues
Alexander von Gluck IV [Mon, 4 Nov 2013 18:51:41 +0000 (18:51 +0000)]
target/haiku-softpipe: Fix viewport issues

* Call mesa viewport call on window resize
* Add initial postprocessing code
* Pass hgl_context to private statetracker
  as it is more useful than GalliumContext
* Use Lock and Unlock functions to standardize
  GalliumContext locking
* Create texture resources in texture validation

Acked-by: Brian Paul <brianp@vmware.com>
11 years ago  mesa: remove __alpha__ && CCPML check
Brian Paul [Tue, 5 Nov 2013 01:07:37 +0000 (18:07 -0700)]
mesa: remove __alpha__ && CCPML check

Reviewed-by: Matt Turner <mattst88@gmail.com>
11 years ago  mesa: remove OPENSTEP stuff
Brian Paul [Tue, 5 Nov 2013 00:47:19 +0000 (17:47 -0700)]
mesa: remove OPENSTEP stuff

Reviewed-by: Matt Turner <mattst88@gmail.com>
11 years ago  mesa: remove macintosh preprocessor stuff
Brian Paul [Tue, 5 Nov 2013 00:47:19 +0000 (17:47 -0700)]
mesa: remove macintosh preprocessor stuff

IIRC, this is MacOS 9.x stuff.

Reviewed-by: Matt Turner <mattst88@gmail.com>
11 years ago  mesa: remove __QUICKDRAW__ tests
Brian Paul [Tue, 5 Nov 2013 00:47:19 +0000 (17:47 -0700)]
mesa: remove __QUICKDRAW__ tests

Reviewed-by: Matt Turner <mattst88@gmail.com>
11 years ago  mesa: remove WGLAPI macro
Brian Paul [Tue, 5 Nov 2013 00:47:19 +0000 (17:47 -0700)]
mesa: remove WGLAPI macro

WGLAPI was defined in glheader.h but wasn't used anywhere.

Reviewed-by: Matt Turner <mattst88@gmail.com>
11 years ago  i965: Expose brw_reg_from_fs_reg() to other files.
Kenneth Graunke [Fri, 1 Nov 2013 20:29:37 +0000 (13:29 -0700)]
i965: Expose brw_reg_from_fs_reg() to other files.

This will be useful for Broadwell code as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
11 years ago  i965: Combine gen6_clip_state.c and gen7_clip_state.c.
Kenneth Graunke [Fri, 1 Nov 2013 23:21:01 +0000 (16:21 -0700)]
i965: Combine gen6_clip_state.c and gen7_clip_state.c.

The changes between Gen6-7 are minimal, and can easily be solved with
an extra generation check.  This cuts a lot of duplicated code.

It also helps prevent even more duplication for Broadwell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago  dri/nouveau: Fix nouveau_init_screen2 breakage.
Francisco Jerez [Mon, 4 Nov 2013 19:58:10 +0000 (11:58 -0800)]
dri/nouveau: Fix nouveau_init_screen2 breakage.

Fix incorrect init ordering in nouveau_init_screen2 caused by
083f66fdd6451648fe355b64b02b29a6a4389f0d.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71172

11 years ago  i965/gen7: Add instruction latency estimates for untyped atomics and reads.
Francisco Jerez [Fri, 1 Nov 2013 18:29:13 +0000 (11:29 -0700)]
i965/gen7: Add instruction latency estimates for untyped atomics and reads.

The latency information has been obtained empirically from
measurements taken on Haswell and Ivy Bridge.

Acked-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
11 years ago  i965/gen7: Handle atomic instructions from the VEC4 back-end.
Francisco Jerez [Wed, 25 Sep 2013 23:31:35 +0000 (16:31 -0700)]
i965/gen7: Handle atomic instructions from the VEC4 back-end.

This can deal with all the 15 32-bit untyped atomic operations the
hardware supports, but only INC and PREDEC are going to be exposed
through the API for now.

v2: Represent atomics as GLSL intrinsics.  Add support for variably
    indexed atomic counter arrays.
v3: Add comment on why we don't need to assign uniform storage for
    atomic counters.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965/gen7: Handle atomic instructions from the FS back-end.
Francisco Jerez [Wed, 25 Sep 2013 23:30:20 +0000 (16:30 -0700)]
i965/gen7: Handle atomic instructions from the FS back-end.

This can deal with all the 15 32-bit untyped atomic operations the
hardware supports, but only INC and PREDEC are going to be exposed
through the API for now.

v2: Represent atomics as GLSL intrinsics.  Add support for variably
    indexed atomic counter arrays.  Fix interaction with fragment
    discard.
v3: Add comment on why we don't need to assign uniform storage for
    atomic counters.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965: Add a 'has_side_effects' back-end instruction predicate.
Francisco Jerez [Sun, 20 Oct 2013 21:02:08 +0000 (14:02 -0700)]
i965: Add a 'has_side_effects' back-end instruction predicate.

This patch fixes the three dead code elimination passes and the
VEC4/FS instruction scheduling passes so they leave instructions with
side effects alone.

At some point it might be interesting to have the instruction
scheduler calculate the exact memory dependencies between atomic ops,
but they're rare enough that it seems unlikely that it will make any
practical difference.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
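
The role of such a predicate in dead code elimination can be sketched as follows. Everything here is hypothetical (the opcode enum, the instruction struct, and the pass itself) and far simpler than the i965 back-end; it only shows the shape of the check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcodes; only the atomic one has side effects. */
enum sketch_opcode {
   SKETCH_OP_ADD,
   SKETCH_OP_MOV,
   SKETCH_OP_UNTYPED_ATOMIC
};

struct sketch_inst {
   enum sketch_opcode opcode;
   bool result_is_live;   /* Is the destination read later? */
   bool removed;
};

static bool
sketch_has_side_effects(const struct sketch_inst *inst)
{
   return inst->opcode == SKETCH_OP_UNTYPED_ATOMIC;
}

/* Remove instructions whose result is dead, unless they touch memory. */
static size_t
sketch_dead_code_eliminate(struct sketch_inst *insts, size_t count)
{
   size_t removed = 0;

   for (size_t i = 0; i < count; i++) {
      if (insts[i].removed || insts[i].result_is_live)
         continue;
      /* An atomic with an unused result still modifies the counter. */
      if (sketch_has_side_effects(&insts[i]))
         continue;
      insts[i].removed = true;
      removed++;
   }
   return removed;
}
```

An instruction scheduler would use the same predicate conservatively, for example by refusing to reorder instructions that have side effects.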
11 years ago  clover: Calculate optimal work group size when it's not specified by the user.
Francisco Jerez [Mon, 4 Nov 2013 19:26:13 +0000 (11:26 -0800)]
clover: Calculate optimal work group size when it's not specified by the user.

Inspired by a patch sent to the mailing list by Tom Stellard, but
using a different algorithm to calculate the optimal block size that
has been found to be considerably more effective.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
11 years ago  clover: Constify some command_queue arguments.
Francisco Jerez [Mon, 4 Nov 2013 19:24:10 +0000 (11:24 -0800)]
clover: Constify some command_queue arguments.

11 years ago  clover: Workaround compiler bug present in GCC 4.7.0-4.7.2.
Francisco Jerez [Wed, 30 Oct 2013 18:11:06 +0000 (11:11 -0700)]
clover: Workaround compiler bug present in GCC 4.7.0-4.7.2.

Variadic template aliases make these versions of GCC very confused;
write down the full type spec instead.

11 years ago  st/xorg: handle updates to DamageUnregister API
Emil Velikov [Fri, 1 Nov 2013 16:44:10 +0000 (16:44 +0000)]
st/xorg: handle updates to DamageUnregister API

xserver 1.14.99.2 simplified the DamageUnregister API, by
dropping the drawable argument.
Follow xf86-video-intel and xf86-video-vmware approach and
handle the new API by checking XORG_VERSION_CURRENT.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71110
Reported-by: Michał Górny <mgorny@gentoo.org>
Reported-by: Vinson Lee <vlee@freedesktop.org>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
11 years ago  mesa: remove Watcom C support
Brian Paul [Mon, 4 Nov 2013 14:33:41 +0000 (07:33 -0700)]
mesa: remove Watcom C support

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago  mesa: remove Centerline C support from gl.h
Brian Paul [Mon, 4 Nov 2013 14:26:54 +0000 (07:26 -0700)]
mesa: remove Centerline C support from gl.h

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago  mesa: remove BUILD_FOR_SNAP bits
Brian Paul [Mon, 4 Nov 2013 14:29:57 +0000 (07:29 -0700)]
mesa: remove BUILD_FOR_SNAP bits

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago  mesa: remove SciTech stuff from gl.h
Brian Paul [Mon, 4 Nov 2013 14:25:22 +0000 (07:25 -0700)]
mesa: remove SciTech stuff from gl.h

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago  r600g: properly unbind a DSA state being deleted in r600_delete_dsa_state
Marek Olšák [Sun, 3 Nov 2013 19:27:28 +0000 (20:27 +0100)]
r600g: properly unbind a DSA state being deleted in r600_delete_dsa_state

Tested-by: Christian König <christian.koenig@amd.com>
11 years ago  docs/GL3: document radeonsi support, minor cleanup
Marek Olšák [Thu, 31 Oct 2013 14:49:36 +0000 (15:49 +0100)]
docs/GL3: document radeonsi support, minor cleanup

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
11 years ago  radeonsi: implement ARB_vertex_type_2_10_10_10_rev
Marek Olšák [Thu, 31 Oct 2013 14:20:06 +0000 (15:20 +0100)]
radeonsi: implement ARB_vertex_type_2_10_10_10_rev

11 years ago  r600g,radeonsi: properly expose texture buffer formats
Marek Olšák [Thu, 31 Oct 2013 14:32:30 +0000 (15:32 +0100)]
r600g,radeonsi: properly expose texture buffer formats

This exposes GL_ARB_texture_buffer_object_rgb32.

11 years ago  radeonsi: implement texture buffer objects
Marek Olšák [Thu, 31 Oct 2013 14:08:49 +0000 (15:08 +0100)]
radeonsi: implement texture buffer objects

GLSL 1.40 is done.

11 years ago  radeonsi: report our border color behavior
Marek Olšák [Wed, 30 Oct 2013 20:44:07 +0000 (21:44 +0100)]
radeonsi: report our border color behavior

11 years ago  radeonsi: bind a dummy constant buffer in place of NULL buffers
Marek Olšák [Wed, 30 Oct 2013 19:44:23 +0000 (20:44 +0100)]
radeonsi: bind a dummy constant buffer in place of NULL buffers

11 years ago  radeonsi: implement uniform buffer objects
Marek Olšák [Fri, 25 Oct 2013 09:45:47 +0000 (11:45 +0200)]
radeonsi: implement uniform buffer objects

11 years ago  tgsi/scan: set maximum index for each constant buffer
Marek Olšák [Wed, 30 Oct 2013 13:24:27 +0000 (14:24 +0100)]
tgsi/scan: set maximum index for each constant buffer

11 years ago  radeonsi: try to fix IA_MULTI_VGT_PARAM programming
Marek Olšák [Tue, 29 Oct 2013 23:36:58 +0000 (00:36 +0100)]
radeonsi: try to fix IA_MULTI_VGT_PARAM programming

This doesn't make any difference on Bonaire, but it might help on Hawaii.

11 years ago  winsys/radeon: use type-3 NOPs for CS padding on CIK
Marek Olšák [Tue, 29 Oct 2013 23:22:01 +0000 (00:22 +0100)]
winsys/radeon: use type-3 NOPs for CS padding on CIK

The type-2 NOPs are said to be unstable. It doesn't make a difference here.

11 years ago  clover: fix build with LLVM 3.4
Aaron Watry [Fri, 1 Nov 2013 15:25:43 +0000 (10:25 -0500)]
clover: fix build with LLVM 3.4

dso_list was added as an argument for createInternalizePass in 3.4, and then
it was removed again in the same llvm version.

Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
11 years ago  draw: move type construction out of loop
Brian Paul [Fri, 1 Nov 2013 23:07:55 +0000 (17:07 -0600)]
draw: move type construction out of loop

We can create clip_ptr_type once instead of n times inside the loop.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
11 years ago  i965: Add driconf option clamp_max_samples
Chad Versace [Sun, 3 Nov 2013 21:14:50 +0000 (13:14 -0800)]
i965: Add driconf option clamp_max_samples

The new option clamps GL_MAX_SAMPLES to a hardware-supported MSAA mode.
If negative, then no clamping occurs.

v2: (for Paul)
  - Add option to i965 only, not to all DRI drivers.
  - Do not rely on int->uint cast to convert negative
    values to large positive values. Explicitly check for
    clamp_max_samples < 0.
v3: (for Ken)
   - Don't allow clamp_max_samples to alter context version.
   - Use clearer for-loop and correct comment.
   - Rename variables.
v4: (for Ken)
   - Merge identical if-branches.

Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
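
One way to picture the clamping rule is the sketch below. It assumes a list of supported sample counts sorted from largest to smallest; the mode list, names, and exact selection rule are hypothetical and not taken from the i965 code.

```c
#include <stddef.h>

/* Hypothetical list of supported MSAA modes, largest first. */
static const int sketch_msaa_modes[] = { 8, 4, 0 };

static int
sketch_clamp_max_samples(int hw_max_samples, int clamp_max_samples)
{
   const size_t num_modes =
      sizeof(sketch_msaa_modes) / sizeof(sketch_msaa_modes[0]);

   /* Explicit check instead of relying on an int->uint conversion. */
   if (clamp_max_samples < 0)
      return hw_max_samples;

   /* Pick the largest supported mode that does not exceed either limit. */
   for (size_t i = 0; i < num_modes; i++) {
      if (sketch_msaa_modes[i] <= clamp_max_samples &&
          sketch_msaa_modes[i] <= hw_max_samples)
         return sketch_msaa_modes[i];
   }
   return 0;
}
```

With these assumed modes, a requested clamp of 6 would yield 4, since 6 itself is not a supported mode.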
11 years ago  i965: Fix logic_op check.
Vinson Lee [Sun, 3 Nov 2013 22:43:53 +0000 (14:43 -0800)]
i965: Fix logic_op check.

Fixes "Macro compares unsigned to 0" defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
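
The class of defect Coverity reports here can be shown with a generic range check. The constants and function below are hypothetical and do not reproduce the actual i965 or i915 code; they only illustrate why a lower-bound comparison against 0 is redundant for an unsigned value.

```c
#include <stdbool.h>

/* Hypothetical range of valid values, starting at zero. */
#define SKETCH_LOGICOP_FIRST 0u
#define SKETCH_LOGICOP_LAST  15u

static bool
sketch_logic_op_in_range(unsigned op)
{
   /*
    * A check such as "op >= SKETCH_LOGICOP_FIRST && op <= ..." compares an
    * unsigned value against 0, which is always true. Only the upper bound
    * carries information, so it is the only comparison kept here.
    */
   return op <= SKETCH_LOGICOP_LAST;
}
```

Note that a negative value converted to unsigned wraps to a large number, so the upper-bound check alone still rejects it.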
11 years ago  i915: Fix logic_op check.
Vinson Lee [Sun, 3 Nov 2013 22:42:18 +0000 (14:42 -0800)]
i915: Fix logic_op check.

Fixes "Macro compares unsigned to 0" defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
11 years ago  i965: Initialize vec4_visitor member variables.
Vinson Lee [Sat, 26 Oct 2013 07:10:25 +0000 (00:10 -0700)]
i965: Initialize vec4_visitor member variables.

Fixes "Uninitialized pointer field" defect reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  gallium/targets: remove vdpau-softpipe
Marek Olšák [Sat, 2 Nov 2013 11:20:29 +0000 (12:20 +0100)]
gallium/targets: remove vdpau-softpipe

Reviewed-by: Christian König <christian.koenig@amd.com>
11 years ago  gallium/targets: remove xvmc-softpipe
Marek Olšák [Sat, 2 Nov 2013 11:18:44 +0000 (12:18 +0100)]
gallium/targets: remove xvmc-softpipe

Reviewed-by: Christian König <christian.koenig@amd.com>
11 years ago  gallium/targets: remove r300/vdpau
Marek Olšák [Sat, 2 Nov 2013 11:07:42 +0000 (12:07 +0100)]
gallium/targets: remove r300/vdpau

Reviewed-by: Christian König <christian.koenig@amd.com>
11 years ago  gallium/targets: remove r300/xvmc
Marek Olšák [Sat, 2 Nov 2013 11:03:42 +0000 (12:03 +0100)]
gallium/targets: remove r300/xvmc

Reviewed-by: Christian König <christian.koenig@amd.com>
11 years ago  gallium/targets: remove radeonsi/xorg
Marek Olšák [Fri, 1 Nov 2013 18:42:47 +0000 (19:42 +0100)]
gallium/targets: remove radeonsi/xorg

Reviewed-by: Christian König <christian.koenig@amd.com>
11 years ago  gallium/targets: remove r600/xorg
Marek Olšák [Fri, 1 Nov 2013 18:36:12 +0000 (19:36 +0100)]
gallium/targets: remove r600/xorg

Reviewed-by: Christian König <christian.koenig@amd.com>
11 years ago  freedreno/a3xx/texture: min/max lod
Rob Clark [Fri, 1 Nov 2013 23:46:55 +0000 (19:46 -0400)]
freedreno/a3xx/texture: min/max lod

Signed-off-by: Rob Clark <robclark@freedesktop.org>
11 years ago  freedreno/a3xx: update envytools headers
Rob Clark [Fri, 1 Nov 2013 23:45:02 +0000 (19:45 -0400)]
freedreno/a3xx: update envytools headers

Signed-off-by: Rob Clark <robclark@freedesktop.org>
11 years ago  freedreno/a3xx: fix VS out / FS in linking
Rob Clark [Fri, 1 Nov 2013 14:11:27 +0000 (10:11 -0400)]
freedreno/a3xx: fix VS out / FS in linking

Actually link VS out / FS in based on semantic info, keeping in mind
that position/pointsize can also be an input to the FS.  This fixes a
few fragment shaders which were using gl_Position.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
11 years ago  freedreno/a3xx: allow num_samplers != num_textures
Rob Clark [Fri, 1 Nov 2013 14:09:39 +0000 (10:09 -0400)]
freedreno/a3xx: allow num_samplers != num_textures

Signed-off-by: Rob Clark <robclark@freedesktop.org>
11 years ago  freedreno/a3xx/compiler: highp frag shader
Rob Clark [Thu, 31 Oct 2013 13:59:49 +0000 (09:59 -0400)]
freedreno/a3xx/compiler: highp frag shader

Fixes use of full-precision in fragment shader (ie. don't clobber r0.x
since that can be used by future bary instructions for varying fetch).
And makes use of full-precision the default in fragment shader (but can
be overridden via FD_MESA_DEBUG=fraghalf).

Seems like half precision is often not enough for texture coordinates.
The blob compiler is clever enough to keep texture coords in full
precision registers while using half precision for everything else.  But
we aren't quite that clever yet, so better to default to full precision.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
11 years ago  freedreno/a3xx/compiler: relative addressing fixes.
Rob Clark [Sun, 27 Oct 2013 14:19:58 +0000 (10:19 -0400)]
freedreno/a3xx/compiler: relative addressing fixes.

Handle some relative addressing constraints: cannot handle const or
relative in cat5 and src2 of cat3.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
11 years ago  freedreno: we do actually support sqrt
Rob Clark [Fri, 25 Oct 2013 15:48:24 +0000 (11:48 -0400)]
freedreno: we do actually support sqrt

Signed-off-by: Rob Clark <robclark@freedesktop.org>
11 years ago  i965: Enable ARB_sample_shading on intel hardware >= gen6
Anuj Phogat [Fri, 30 Aug 2013 20:13:15 +0000 (13:13 -0700)]
i965: Enable ARB_sample_shading on intel hardware >= gen6

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ken Graunke <kenneth@whitecape.org>
11 years ago  i965/gen7: Enable the features required for GL_ARB_sample_shading
Anuj Phogat [Mon, 7 Oct 2013 19:45:44 +0000 (12:45 -0700)]
i965/gen7: Enable the features required for GL_ARB_sample_shading

- Enable GEN7_WM_MSDISPMODE_PERSAMPLE, GEN7_WM_POSOFFSET_SAMPLE,
  GEN7_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Only enable one of GEN7_WM_8_DISPATCH_ENABLE or GEN7_WM_16_DISPATCH_ENABLE
  when GEN7_WM_MSDISPMODE_PERSAMPLE is enabled. Refer to the IVB PRM Vol. 2,
  Part 1, Page 288 for details.

V2:
    - Use shared function _mesa_get_min_invocations_per_fragment().
    - Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.

V3:
    - Enable simd16 dispatch with per sample shading.
    - Make changes to give preference to 'simd16 only' mode over
      'simd8 only' mode in case of non 1x per sample shading.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965/gen6: Enable the features required for GL_ARB_sample_shading
Anuj Phogat [Mon, 7 Oct 2013 19:05:56 +0000 (12:05 -0700)]
i965/gen6: Enable the features required for GL_ARB_sample_shading

- Enable GEN6_WM_MSDISPMODE_PERSAMPLE, GEN6_WM_POSOFFSET_SAMPLE,
  GEN6_WM_OMASK_TO_RENDER_TARGET as per extension's specification.
- Only enable one of GEN6_WM_8_DISPATCH_ENABLE or GEN6_WM_16_DISPATCH_ENABLE
  when GEN6_WM_MSDISPMODE_PERSAMPLE is enabled.
  Refer to the SNB PRM Vol. 2, Part 1, Page 279 for details.

V2:
    - Use shared function _mesa_get_min_invocations_per_fragment().
    - Use brw_wm_prog_data variables: uses_pos_offset, uses_omask.

V3:
    - Enable simd16 dispatch with per sample shading.
    - Make changes to give preference to 'simd16 only' mode over
      'simd8 only' mode in case of non 1x per sample shading.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965: Add FS backend for builtin gl_SampleMask[]
Anuj Phogat [Thu, 24 Oct 2013 23:21:13 +0000 (16:21 -0700)]
i965: Add FS backend for builtin gl_SampleMask[]

V2:
   - Update comments
   - Add a special backend instruction to compute sample_mask.
   - Add a new variable uses_omask in brw_wm_prog_data.

V3:
   - Make changes to support simd16 mode.
   - Delete redundant AND instruction and handle the register
     stride in FS backend instruction.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965: Add FS backend for builtin gl_SampleID
Anuj Phogat [Thu, 24 Oct 2013 23:17:08 +0000 (16:17 -0700)]
i965: Add FS backend for builtin gl_SampleID

V2:
   - Update comments
   - Add compute_sample_id variables in brw_wm_prog_key
   - Add a special backend instruction to compute sample_id.

V3:
   - Make changes to support simd16 mode.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965: Add FS backend for builtin gl_SamplePosition
Anuj Phogat [Thu, 24 Oct 2013 22:53:05 +0000 (15:53 -0700)]
i965: Add FS backend for builtin gl_SamplePosition

V2:
   - Update comments.
   - Add compute_pos_offset variable in brw_wm_prog_key.
   - Add variable uses_pos_offset in brw_wm_prog_data.

V3:
   - Make changes to support simd16 mode.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago  i965: Don't do vector splitting for ir_var_system_value
Anuj Phogat [Tue, 22 Oct 2013 19:00:11 +0000 (12:00 -0700)]
i965: Don't do vector splitting for ir_var_system_value

This is required while adding builtin system value vec{2, 3, 4}
variables. For example:
(declare (sys) vec2 gl_SamplePosition)

Without this patch, the GLSL IR above is split into:
(declare (temporary) float gl_SamplePosition_x)
(declare (temporary) float gl_SamplePosition_y)

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
11 years ago  mesa: Add a helper function _mesa_get_min_invocations_per_fragment()
Anuj Phogat [Thu, 17 Oct 2013 00:22:18 +0000 (17:22 -0700)]
mesa: Add a helper function _mesa_get_min_invocations_per_fragment()

This function is used to test if we need to do per sample shading or
per fragment shading.

V2: Use MAX2() to make sure the function returns a number >= 1.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
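
For illustration, the per-sample vs. per-fragment decision described above
can be sketched as follows. This is a minimal sketch, not the Mesa code: the
struct, its fields and the function name are hypothetical, and it assumes the
count is the minimum sample shading value times the sample count, rounded up
and clamped to at least 1 (the MAX2() guard mentioned in V2).

```c
#include <assert.h>
#include <stdbool.h>

/* Same idea as the MAX2() macro named in the commit; defined locally so
 * the sketch is self-contained. */
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical, simplified stand-in for the multisample state involved. */
struct sample_shading_state {
   bool multisample_enabled;  /* multisampling is on */
   bool sample_shading;       /* GL_SAMPLE_SHADING_ARB is enabled */
   float min_sample_shading;  /* value set by glMinSampleShadingARB() */
   int num_samples;           /* samples in the draw buffer */
};

/* Returns how many times the fragment shader runs per fragment.
 * A result > 1 means per-sample shading, 1 means per-fragment shading. */
static int
min_invocations_per_fragment(const struct sample_shading_state *s)
{
   assert(s->num_samples >= 0);

   if (!s->multisample_enabled || !s->sample_shading)
      return 1;

   /* Round the product up without pulling in libm. */
   float scaled = s->min_sample_shading * (float) s->num_samples;
   int n = (int) scaled;      /* truncates toward zero */
   if ((float) n < scaled)
      n++;

   /* MAX2() covers the case where the product is 0. */
   return MAX2(n, 1);
}
```

With 4 samples and a minimum sample shading value of 0.5 this returns 2; with
a value of 0.0 the MAX2() guard keeps the result at 1.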
11 years ago glsl: Add new builtins required by GL_ARB_sample_shading
Anuj Phogat [Fri, 30 Aug 2013 20:10:54 +0000 (13:10 -0700)]
glsl: Add new builtins required by GL_ARB_sample_shading

New builtins added by GL_ARB_sample_shading:
in vec2 gl_SamplePosition
in int gl_SampleID
in int gl_NumSamples
out int gl_SampleMask[]

V2: - Use SWIZZLE_XXXX for STATE_NUM_SAMPLES.
    - Use "result.samplemask" in arb_output_attrib_string.
    - Add comment to explain the size of gl_SampleMask[] array.
    - Make gl_SampleID and gl_SamplePosition system values.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
11 years ago mesa: Pass number of samples as a program state variable
Anuj Phogat [Thu, 3 Oct 2013 01:02:20 +0000 (18:02 -0700)]
mesa: Pass number of samples as a program state variable

The number of samples is required in fragment shader programs by the new
GLSL builtin uniform "gl_NumSamples".

V2: Use "state.numsamples" in place of "state.num.samples"
    Use _NEW_BUFFERS flag in place of _NEW_MULTISAMPLE

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Ken Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago mesa: Add new functions and enums required by GL_ARB_sample_shading
Anuj Phogat [Fri, 30 Aug 2013 19:52:38 +0000 (12:52 -0700)]
mesa: Add new functions and enums required by GL_ARB_sample_shading

New functions added by GL_ARB_sample_shading:
glMinSampleShadingARB()

New enums:
GL_SAMPLE_SHADING_ARB
GL_MIN_SAMPLE_SHADING_VALUE_ARB

V2: Update comments.
    Create new GL4x.xml.
    Remove redundant code in get.c.
    Update the API_XML list in Makefile.am.
    Add extra_gl40_ARB_sample_shading predicate to get.c.

V3:
   Fix make check failure.
   Add checks for desktop GL.
   Use GLfloat in place of GLclampf in glMinSampleShading().
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ken Graunke <kenneth@whitecape.org>
11 years ago mesa: Add infrastructure for GL_ARB_sample_shading
Anuj Phogat [Fri, 30 Aug 2013 19:34:36 +0000 (12:34 -0700)]
mesa: Add infrastructure for GL_ARB_sample_shading

This patch implements the common support code required for the
GL_ARB_sample_shading extension.

V2: Move GL_ARB_sample_shading to ARB extension list.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <idr@freedesktop.org>
Reviewed-by: Ken Graunke <kenneth@whitecape.org>
11 years ago i965/fs: Optimize saturating SEL.G(E) with imm val <= 0.0f.
Matt Turner [Mon, 28 Oct 2013 04:26:36 +0000 (21:26 -0700)]
i965/fs: Optimize saturating SEL.G(E) with imm val <= 0.0f.

Only one program's instruction count is changed, but a shader in Tropics
is also affected.

instructions in affected programs:     326 -> 320 (-1.84%)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago i965/fs: Optimize saturating SEL.L(E) with imm val >= 1.0.
Matt Turner [Mon, 28 Oct 2013 03:03:48 +0000 (20:03 -0700)]
i965/fs: Optimize saturating SEL.L(E) with imm val >= 1.0.

total instructions in shared programs: 1409124 -> 1406971 (-0.15%)
instructions in affected programs:     158376 -> 156223 (-1.36%)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
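
Why the two saturating SEL optimizations above are safe can be checked with a
small scalar model. This is only a sketch under stated assumptions: SEL.L is
taken to select the smaller source (min), SEL.G the larger one (max), and
saturate clamps to [0, 1]. The helper names are hypothetical and NaN inputs
are not considered.

```c
#include <assert.h>

/* Clamp to [0, 1], modelling the saturate modifier. */
static float
saturate(float x)
{
   return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Reference semantics of the unoptimized instructions. */
static float
sel_l_sat(float x, float imm)   /* saturate(min(x, imm)) */
{
   return saturate(x < imm ? x : imm);
}

static float
sel_g_sat(float x, float imm)   /* saturate(max(x, imm)) */
{
   return saturate(x > imm ? x : imm);
}

/* For imm >= 1.0 the min cannot change the clamped result, and for
 * imm <= 0.0 the max cannot either, so both reduce to a saturating MOV. */
static void
check_equivalence(void)
{
   const float xs[] = { -2.0f, 0.0f, 0.25f, 1.0f, 3.5f };
   const float hi[] = { 1.0f, 1.5f, 100.0f };   /* immediates >= 1.0 */
   const float lo[] = { 0.0f, -0.5f, -100.0f }; /* immediates <= 0.0 */

   for (int i = 0; i < 5; i++) {
      for (int j = 0; j < 3; j++) {
         assert(sel_l_sat(xs[i], hi[j]) == saturate(xs[i]));
         assert(sel_g_sat(xs[i], lo[j]) == saturate(xs[i]));
      }
   }
}
```

Calling check_equivalence() exercises both cases over a handful of finite
inputs on either side of the [0, 1] range.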
11 years ago i965/fs: Optimize OR with identical sources into a MOV.
Matt Turner [Mon, 28 Oct 2013 02:34:48 +0000 (19:34 -0700)]
i965/fs: Optimize OR with identical sources into a MOV.

Helps a lot of Steam games.

total instructions in shared programs: 1409360 -> 1409124 (-0.02%)
instructions in affected programs:     20842 -> 20606 (-1.13%)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
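
The rewrite relies on x | x == x. A minimal peephole sketch of that idea is
shown below; the instruction struct, the opcode enum and the function name are
hypothetical and far simpler than the real FS backend IR, where sources are
registers with modifiers rather than plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction representation. */
enum opcode { OP_MOV, OP_OR, OP_AND };

struct inst {
   enum opcode op;
   int dst;      /* destination register number */
   int src[2];   /* source register numbers, -1 if unused */
};

/* Turns "OR dst, a, a" into "MOV dst, a". Returns true if the
 * instruction was changed. */
static bool
opt_or_identical_sources(struct inst *inst)
{
   assert(inst != NULL);

   if (inst->op != OP_OR || inst->src[0] != inst->src[1])
      return false;

   inst->op = OP_MOV;
   inst->src[1] = -1;   /* a MOV has a single source */
   return true;
}
```

An OR whose two sources differ is left untouched, so the pass only fires where
the result is provably the same value as the source.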
11 years ago glsl: Add a CSE pass.
Eric Anholt [Thu, 17 Oct 2013 17:28:40 +0000 (10:28 -0700)]
glsl: Add a CSE pass.

This only operates on constant/uniform values for now, because otherwise I'd
have to deal with killing my available CSE entries when assignments happen,
and getting even this working in the tree ir was painful enough.

As is, it has the following effect in shader-db:

total instructions in shared programs: 1524077 -> 1521964 (-0.14%)
instructions in affected programs:     50629 -> 48516 (-4.17%)
GAINED:                                0
LOST:                                  0

For Tropics, which accounts for most of the effect, the FPS
improvement is 11.67% +/- 0.72% (n=3).

v2: Use read_only field of the variable, manually check the lod_info union
    members, use get_num_operands(), rename cse_operands_visitor to
    is_cse_candidate_visitor, move all is-a-candidate logic to that
    function, and call it before checking for CSE on a given rvalue, more
    comments, use private keyword.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
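
The restriction to constant/uniform operands can be illustrated with a toy
CSE over a flat list of binary expressions. This is a sketch only: the real
pass works on the GLSL tree IR, and every type and function name here is
hypothetical. Operands that may be reassigned (temporaries) are never treated
as candidates, so no available entries ever have to be killed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum operand_kind { OPERAND_CONSTANT, OPERAND_UNIFORM, OPERAND_TEMP };

struct operand {
   enum operand_kind kind;
   int id;               /* which constant, uniform or temporary */
};

struct expr {
   char op;              /* '+', '*', ... */
   struct operand a, b;
   int reuse_of;         /* index of an earlier identical expr, or -1 */
};

/* Only expressions whose operands can never be assigned are candidates. */
static bool
is_cse_candidate(const struct expr *e)
{
   return e->a.kind != OPERAND_TEMP && e->b.kind != OPERAND_TEMP;
}

static bool
same_operand(const struct operand *x, const struct operand *y)
{
   return x->kind == y->kind && x->id == y->id;
}

static bool
same_expr(const struct expr *x, const struct expr *y)
{
   return x->op == y->op &&
          same_operand(&x->a, &y->a) && same_operand(&x->b, &y->b);
}

/* Marks each candidate that repeats an earlier expression and returns the
 * number of expressions eliminated. */
static int
cse_pass(struct expr *exprs, size_t n)
{
   int eliminated = 0;

   assert(exprs != NULL || n == 0);

   for (size_t i = 0; i < n; i++) {
      exprs[i].reuse_of = -1;
      if (!is_cse_candidate(&exprs[i]))
         continue;

      for (size_t j = 0; j < i; j++) {
         /* Reuse only the first computation, never another reuse. */
         if (exprs[j].reuse_of == -1 && same_expr(&exprs[i], &exprs[j])) {
            exprs[i].reuse_of = (int) j;
            eliminated++;
            break;
         }
      }
   }
   return eliminated;
}
```

Two identical uniform-only multiplies collapse into one, while identical
expressions that read a temporary are both kept, since the temporary might
have been written in between.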
11 years ago i965/vec4: Don't overwrite op[1] when doing a UBO load.
Eric Anholt [Thu, 31 Oct 2013 00:09:53 +0000 (17:09 -0700)]
i965/vec4: Don't overwrite op[1] when doing a UBO load.

Prior to the GLSL CSE pass, all of our testing happened to have a freshly
computed temporary in op[1], from the multiply by 16 to get a byte offset.
As of CSE you'll get var_refs of a reused value when you've got multiple
loads from the same offset.

Make a proper temporary for computing our temporary value, to avoid
shifting the value farther and farther down.  Avoids a regression in
gs-float-array-variable-index.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
11 years ago st/mesa: fix _mesa_init_transform_feedback_object() argument
Brian Paul [Fri, 1 Nov 2013 14:43:22 +0000 (08:43 -0600)]
st/mesa: fix _mesa_init_transform_feedback_object() argument

Pass a pointer to the base type, not the st type.
Fixes a compiler warning.

11 years ago i965: Fix brw_store_register_mem64 to stay within a single batch.
Kenneth Graunke [Wed, 30 Oct 2013 23:06:06 +0000 (16:06 -0700)]
i965: Fix brw_store_register_mem64 to stay within a single batch.

Previously, the write of each 32-bit half might land in separate batch
buffers, which is insane.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
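
The general technique is to reserve space for both 32-bit stores before
emitting either, so a flush cannot happen between them. The sketch below uses
a toy batch buffer to show this; the struct, the packet layout, the sizes and
all function names are hypothetical and do not reflect the real i965 batch
macros or command encoding.

```c
#include <assert.h>
#include <stdint.h>

#define BATCH_DWORDS 8   /* deliberately tiny batch, for illustration */
#define STORE_DWORDS 3   /* hypothetical packet: header, register, address */

struct batch {
   uint32_t map[BATCH_DWORDS];
   int used;      /* dwords emitted into the current batch */
   int flushes;   /* how many times a new batch was started */
};

static void
batch_flush(struct batch *b)
{
   b->used = 0;
   b->flushes++;
}

/* Starts a new batch if the requested dwords do not fit in this one. */
static void
batch_require_space(struct batch *b, int dwords)
{
   assert(dwords <= BATCH_DWORDS);
   if (b->used + dwords > BATCH_DWORDS)
      batch_flush(b);
}

/* Emits one 32-bit register store; the caller must have reserved space. */
static void
emit_store32(struct batch *b, uint32_t reg, uint32_t addr)
{
   assert(b->used + STORE_DWORDS <= BATCH_DWORDS);
   b->map[b->used++] = 0;      /* placeholder command header */
   b->map[b->used++] = reg;
   b->map[b->used++] = addr;
}

/* Stores a 64-bit register as two 32-bit halves in the same batch. */
static void
store_register_mem64(struct batch *b, uint32_t reg, uint32_t addr)
{
   /* One reservation covering both halves. */
   batch_require_space(b, 2 * STORE_DWORDS);
   emit_store32(b, reg, addr);
   emit_store32(b, reg + 4, addr + 4);
}
```

If each half reserved its own space instead, a batch with room for only one
packet would take the first half and push the second into the next batch,
which is the situation the commit describes.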
11 years ago docs: List transform_feedback{2,3,instanced} for i965 in release notes.
Kenneth Graunke [Thu, 31 Oct 2013 18:10:20 +0000 (11:10 -0700)]
docs: List transform_feedback{2,3,instanced} for i965 in release notes.

11 years ago i965: Enable the ARB_transform_feedback_instanced extension on Gen7+.
Kenneth Graunke [Sat, 26 Oct 2013 20:27:18 +0000 (13:27 -0700)]
i965: Enable the ARB_transform_feedback_instanced extension on Gen7+.

This depends on ARB_transform_feedback2, so I've predicated it on the
ability to do register writes.

It also depends on ARB_transform_feedback3, which is the only reason we
couldn't expose it previously.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>