mesa.git
10 years agoi965/copy_image: Use the correct texture level
Jason Ekstrand [Mon, 1 Sep 2014 11:38:28 +0000 (04:38 -0700)]
i965/copy_image: Use the correct texture level

Previously, we were using the source image's level for both source and
destination.  Also, we weren't taking the MinLevel from a potential texture
view into account.  This commit fixes both problems.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82804
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agogallivm: Fix build against LLVM SVN >= r216982
Michel Dänzer [Wed, 3 Sep 2014 02:36:34 +0000 (11:36 +0900)]
gallivm: Fix build against LLVM SVN >= r216982

Only MCJIT is available now.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g: fix alpha-test with HyperZ enabled, fixing L4D2 tree corruption
Marek Olšák [Tue, 2 Sep 2014 18:38:08 +0000 (20:38 +0200)]
r600g: fix alpha-test with HyperZ enabled, fixing L4D2 tree corruption

*_update_db_shader_control depends on the alpha test state. The problem was
it was in a block which is only entered if the pixel shader is changed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74863

Cc: mesa-stable@lists.freedesktop.org
Tested-by: Benjamin Bellec <b.bellec@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g,radeonsi: Preserve existing buffer flags
Michel Dänzer [Tue, 2 Sep 2014 08:52:30 +0000 (17:52 +0900)]
r600g,radeonsi: Preserve existing buffer flags

The default case was accidentally clearing RADEON_FLAG_CPU_ACCESS from the
previous fall-through cases.

Reported-by: Mathias Fröhlich <Mathias.Froehlich@gmx.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
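
The fall-through problem described in the commit above can be illustrated with a minimal sketch. The flag bits, the usage enum, and both function names are hypothetical stand-ins, not the driver's actual code; the point is only that a plain assignment in the default case discards what an earlier fall-through case set, while OR-ing preserves it.

```c
#include <stdint.h>

/* Hypothetical flag bits and usage values, for illustration only. */
#define FLAG_CPU_ACCESS (1u << 0)
#define FLAG_GTT_WC     (1u << 1)

enum usage { USAGE_STREAM, USAGE_STAGING, USAGE_DEFAULT };

/* Buggy shape: the default case assigns, so a flag set by an earlier
 * case that falls through is overwritten. */
static uint32_t flags_buggy(enum usage u)
{
    uint32_t flags = 0;

    switch (u) {
    case USAGE_STREAM:
    case USAGE_STAGING:
        flags = FLAG_CPU_ACCESS;
        /* fall through */
    default:
        flags = FLAG_GTT_WC;    /* clears FLAG_CPU_ACCESS */
        break;
    }
    return flags;
}

/* Fixed shape: every case ORs its bits in, so earlier flags survive. */
static uint32_t flags_fixed(enum usage u)
{
    uint32_t flags = 0;

    switch (u) {
    case USAGE_STREAM:
    case USAGE_STAGING:
        flags |= FLAG_CPU_ACCESS;
        /* fall through */
    default:
        flags |= FLAG_GTT_WC;
        break;
    }
    return flags;
}
```

With these definitions, flags_fixed(USAGE_STREAM) carries both bits, while flags_buggy(USAGE_STREAM) carries only FLAG_GTT_WC.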
10 years agomain: Don't leak temporary texture rows
Jason Ekstrand [Mon, 1 Sep 2014 08:33:36 +0000 (01:33 -0700)]
main: Don't leak temporary texture rows

Reviewed-by: Dave Airlie <airlied@gmail.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
10 years agor300g: pointless assignment of info.indexed
Dave Airlie [Mon, 1 Sep 2014 23:17:35 +0000 (09:17 +1000)]
r300g: pointless assignment of info.indexed

Did this code mean to do something else, you tell me!

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
10 years agoomx/h264: remove stray semicolon after if
Dave Airlie [Mon, 1 Sep 2014 23:39:24 +0000 (09:39 +1000)]
omx/h264: remove stray semicolon after if

Coverity reported this, looks wrong to me.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
10 years agovdpau: unlock the mutex on error paths in attribute setting.
Dave Airlie [Mon, 1 Sep 2014 22:57:53 +0000 (08:57 +1000)]
vdpau: unlock the mutex on error paths in attribute setting.

Coverity pointed out we never dropped the lock here, so fix
it by using a common exit path.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
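
A common exit path of the kind the commit above mentions might look like the following sketch. The struct, the function name, and the validity check are assumptions made up for illustration; only the lock/unlock pairing through a single label is the technique being shown.

```c
#include <pthread.h>

/* Hypothetical device object guarded by a mutex. */
struct device {
    pthread_mutex_t mutex;
    int value;
};

/* Returns 0 on success, -1 on error. Every path, including the error
 * path, leaves through the "out" label, so the mutex is always dropped. */
static int set_attribute(struct device *dev, int value)
{
    int ret = 0;

    pthread_mutex_lock(&dev->mutex);

    if (value < 0) {
        ret = -1;       /* an early "return -1" here would leak the lock */
        goto out;
    }

    dev->value = value;

out:
    pthread_mutex_unlock(&dev->mutex);
    return ret;
}
```

Because the error path releases the lock, a failed call can be followed by a successful one on the same device without deadlocking.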
10 years agou_primconvert: Use u_upload_mgr for our little IB allocations.
Eric Anholt [Sat, 30 Aug 2014 01:03:36 +0000 (18:03 -0700)]
u_primconvert: Use u_upload_mgr for our little IB allocations.

tex-miplevel-selection was hammering my memory manager with primconverts
on individual quads.  This gets all those converted IBs packed into larger
IBs.

Reviewed-by: Rob Clark <robclark@freedesktop.org>
10 years agou_primconvert: Shut up compiler warning.
Eric Anholt [Sat, 30 Aug 2014 01:02:23 +0000 (18:02 -0700)]
u_primconvert: Shut up compiler warning.

gcc isn't detecting that src is set before used, since both are under if
(info->indexed).

Reviewed-by: Rob Clark <robclark@freedesktop.org>
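
The warning pattern in the commit above can be reproduced with a small sketch. The function and its parameters are hypothetical, and initialising the pointer is only one common way to quiet such a warning; the commit message does not say which fix was used.

```c
#include <stdbool.h>
#include <stddef.h>

/* `src` is assigned and read only when `indexed` is true, but a compiler
 * may not be able to prove that the two conditions match. */
static int sum_elements(bool indexed, const int *indices, int count)
{
    /* The NULL initialiser exists only to quiet a
     * "may be used uninitialized" warning. */
    const int *src = NULL;
    int sum = 0;

    if (indexed)
        src = indices;

    for (int i = 0; i < count; i++) {
        if (indexed)
            sum += src[i];  /* guarded by the same condition as the assignment */
        else
            sum += i;
    }
    return sum;
}
```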
10 years agogbm: Fix gallium build when X11 is in a non-system directory
Eric Anholt [Fri, 18 Jul 2014 23:25:45 +0000 (16:25 -0700)]
gbm: Fix gallium build when X11 is in a non-system directory

pipe-loader.h will include Xlib.h when HAVE_PIPE_LOADER_XLIB is set in the
build.

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
10 years agovc4: Handle a couple of the transfer map flags.
Eric Anholt [Fri, 22 Aug 2014 20:47:19 +0000 (13:47 -0700)]
vc4: Handle a couple of the transfer map flags.

This is part of fixing extremely long runtimes on some piglit tests that
involve streaming vertex reuploads due to format conversions, and will
similarly be important for X performance, which relies on these flags.

10 years agometa: Make MESA_META_DRAW_BUFFERS restore properly
Kristian Høgsberg [Sat, 16 Aug 2014 06:19:52 +0000 (23:19 -0700)]
meta: Make MESA_META_DRAW_BUFFERS restore properly

A meta begin/end pair with MESA_META_DRAW_BUFFERS will change visible GL
state.  We recreate the draw buffer enums from the buffer bitfield, which
changes GL_BACK to GL_BACK_LEFT (and GL_FRONT to GL_FRONT_LEFT).

This commit modifies the save/restore logic to instead copy the buffer enums
from the gl_framebuffer and then set them on restore using
_mesa_drawbuffers().

It's not clear how this breaks the benchmark in 82796, but fixing meta to not
leak the state change fixes the regression.

No piglit regressions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=82796
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
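
Why rebuilding the enums from the bitfield changes visible state can be seen in a toy model. The enum values, bit names, and helper functions below are hypothetical stand-ins for the GL draw-buffer enums and Mesa's buffer bits; the sketch only shows that the enum-to-bit mapping is many-to-one, so saving the enum itself is the lossless option.

```c
/* Hypothetical stand-ins for the GL draw-buffer enums and buffer bits. */
enum draw_buffer { BUF_FRONT, BUF_FRONT_LEFT, BUF_BACK, BUF_BACK_LEFT };

#define BIT_FRONT_LEFT (1u << 0)
#define BIT_BACK_LEFT  (1u << 1)

/* Several enums map to the same bit, so this direction loses information. */
static unsigned enum_to_mask(enum draw_buffer b)
{
    switch (b) {
    case BUF_FRONT:
    case BUF_FRONT_LEFT:
        return BIT_FRONT_LEFT;
    case BUF_BACK:
    case BUF_BACK_LEFT:
        return BIT_BACK_LEFT;
    }
    return 0;
}

/* Rebuilding an enum from the mask can only pick one spelling per bit. */
static enum draw_buffer mask_to_enum(unsigned mask)
{
    return (mask & BIT_BACK_LEFT) ? BUF_BACK_LEFT : BUF_FRONT_LEFT;
}

/* Saving the enum directly round-trips exactly. */
struct saved_state { enum draw_buffer buffer; };

static void save_state(struct saved_state *s, enum draw_buffer current)
{
    s->buffer = current;
}

static enum draw_buffer restore_state(const struct saved_state *s)
{
    return s->buffer;
}
```

In this model, a BUF_BACK that goes through the mask comes back as BUF_BACK_LEFT, whereas save_state/restore_state returns BUF_BACK unchanged.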
10 years agoRevert "mesa: fix make tarballs"
Emil Velikov [Mon, 1 Sep 2014 11:04:12 +0000 (12:04 +0100)]
Revert "mesa: fix make tarballs"

This reverts commit 0fbb9a599df898d4e1166d6d6f00cb34a0524bea.

Rather than adding hacks around the issue, drop the sources from the
final tarball and re-add them with 'make dist'. This fixes a problem
where a parallel 'make install' fails because it recreates sources and
triggers partial recompilation.

Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83355
Reported-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Tested-by: Kai Wasserbäch <kai@dev.carbon-project.org>
10 years agomesa/program_cache: calloc the correct size for the cache.
Dave Airlie [Mon, 1 Sep 2014 23:21:18 +0000 (09:21 +1000)]
mesa/program_cache: calloc the correct size for the cache.

Coverity reported this, and I think this is the right solution,
since cache->items is struct cache_item ** not struct cache_item *,
we also realloc it using struct cache_item * at some point.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
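
The sizing issue in the commit above comes down to which type the array elements have. The sketch below reuses the names cache->items and struct cache_item from the commit message; the remaining fields and the cache_create function are assumptions for illustration. Writing sizeof(*c->items) ties the element size to the pointer type automatically.

```c
#include <stdlib.h>

struct cache_item {
    unsigned hash;
    struct cache_item *next;
};

struct cache {
    struct cache_item **items;  /* array of pointers to items */
    unsigned size;
};

/* Hypothetical constructor: allocates `size` zeroed bucket pointers. */
static struct cache *cache_create(unsigned size)
{
    struct cache *c = calloc(1, sizeof(*c));
    if (!c)
        return NULL;

    c->size = size;
    /* Each element is a struct cache_item *, not a struct cache_item. */
    c->items = calloc(size, sizeof(*c->items));
    if (!c->items) {
        free(c);
        return NULL;
    }
    return c;
}
```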
10 years agoradeonsi: Compile dummy pixel shader on demand
Michel Dänzer [Wed, 27 Aug 2014 08:29:08 +0000 (17:29 +0900)]
radeonsi: Compile dummy pixel shader on demand

It's never used under normal circumstances.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
10 years agou_blitter: Create all shaders on demand
Michel Dänzer [Wed, 27 Aug 2014 07:43:56 +0000 (16:43 +0900)]
u_blitter: Create all shaders on demand

Not all of these are used in every context, so this can make a
significant difference for short-lived contexts such as in piglit tests.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
10 years agor600g,radeonsi: Inform the kernel if a BO will likely be accessed by the CPU
Michel Dänzer [Tue, 26 Aug 2014 09:06:49 +0000 (18:06 +0900)]
r600g,radeonsi: Inform the kernel if a BO will likely be accessed by the CPU

This allows the kernel to prevent such BOs from ever being stored in the
CPU inaccessible part of VRAM.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
10 years agoglsl: free uniform_map on failure path.
Dave Airlie [Mon, 1 Sep 2014 23:54:36 +0000 (09:54 +1000)]
glsl: free uniform_map on failure path.

If we fail in reserve_explicit_locations, we leak uniform_map.

Reported-by: coverity scanner.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
10 years agomain/cs: Add gl_context::ComputeProgram
Paul Berry [Sat, 11 Jan 2014 05:39:25 +0000 (21:39 -0800)]
main/cs: Add gl_context::ComputeProgram

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
10 years agomesa: Convert NewDriverState to 64-bits
Jordan Justen [Thu, 7 Aug 2014 05:32:03 +0000 (22:32 -0700)]
mesa: Convert NewDriverState to 64-bits

i965 will have more than 32 bits when BRW_STATE_COMPUTE_PROGRAM is added.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
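
The need for a wider field can be shown with a short sketch. STATE_BIT and the two helpers are hypothetical names; the sketch only demonstrates that a flag at bit position 32 or above vanishes when stored in a 32-bit field.

```c
#include <stdint.h>

/* Hypothetical helper: build a state flag at bit position n (0..63). */
#define STATE_BIT(n) (UINT64_C(1) << (n))

/* Returns 1 if flag n is still set after being stored in 32 bits. */
static int survives_32(unsigned n)
{
    uint32_t narrow = (uint32_t)STATE_BIT(n);   /* high bits are truncated */
    return narrow != 0;
}

/* Returns 1 if flag n is still set after being stored in 64 bits. */
static int survives_64(unsigned n)
{
    uint64_t wide = STATE_BIT(n);
    return wide != 0;
}
```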
10 years agoi965: Modify state upload to allow 2 different sets of state atoms.
Paul Berry [Sat, 11 Jan 2014 00:37:09 +0000 (16:37 -0800)]
i965: Modify state upload to allow 2 different sets of state atoms.

The set of state atoms for compute shaders is currently empty; it will
be filled in by future patches.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
10 years agoi965: Modify dirty bit handling to support 2 pipelines.
Paul Berry [Sat, 11 Jan 2014 00:05:11 +0000 (16:05 -0800)]
i965: Modify dirty bit handling to support 2 pipelines.

The hardware state for compute shaders is almost entirely orthogonal
to the hardware state for 3D rendering.  To avoid sending unnecessary
state to the hardware, we'll need to have a separate set of state
atoms for the compute pipeline and the 3D pipeline.  That means we
need to maintain two separate sets of dirty bits to determine which
state atoms need to be run.

But the dirty bits are not completely independent; for example, if
BRW_NEW_SURFACES is flagged while doing 3D rendering, then not only do
we need to re-run 3D state atoms that depend on BRW_NEW_SURFACES, but
we also need to re-run compute state atoms that depend on
BRW_NEW_SURFACES.  But we'll also need to re-run those state atoms the
next time the compute pipeline is run.

To accomplish this, we record two sets of dirty bits, one for each
pipeline.  When bits are dirtied (via SET_DIRTY_BIT() or
SET_DIRTY_ALL()) we set them to the dirty state in both pipelines.
When brw_state_upload() is run, we clear the dirty bits just for the
pipeline that was run.

Note that since the number of pipelines is known at compile time to be
2, the compiler should unroll the loops in SET_DIRTY_BIT() and
SET_DIRTY_ALL().

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
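
The two-pipeline scheme described in the commit above can be sketched as follows. All names here (the enum, struct ctx, the lower-case functions, NEW_SURFACES) are hypothetical simplifications rather than the i965 macros themselves: setting a bit dirties both pipelines, and an upload clears only the pipeline that ran.

```c
#include <stdint.h>

enum pipeline { PIPELINE_RENDER, PIPELINE_COMPUTE, NUM_PIPELINES };

#define NEW_SURFACES (UINT64_C(1) << 0)     /* hypothetical dirty bit */

struct ctx {
    uint64_t dirty[NUM_PIPELINES];          /* one set of bits per pipeline */
};

/* Dirty a bit in every pipeline; the loop bound is a compile-time constant. */
static void set_dirty_bit(struct ctx *c, uint64_t bit)
{
    for (int p = 0; p < NUM_PIPELINES; p++)
        c->dirty[p] |= bit;
}

static void set_dirty_all(struct ctx *c)
{
    for (int p = 0; p < NUM_PIPELINES; p++)
        c->dirty[p] = ~UINT64_C(0);
}

/* Returns the bits that were handled and clears them for this pipeline only. */
static uint64_t state_upload(struct ctx *c, enum pipeline p)
{
    uint64_t handled = c->dirty[p];
    /* A real driver would run the state atoms matching `handled` here. */
    c->dirty[p] = 0;
    return handled;
}
```

After a render upload, the compute pipeline still sees NEW_SURFACES as dirty and handles it the next time it runs.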
10 years agoi965: Create a macro for checking a dirty bit.
Paul Berry [Fri, 10 Jan 2014 23:40:57 +0000 (15:40 -0800)]
i965: Create a macro for checking a dirty bit.

This will make it easier to extend dirty bit handling to support
compute shaders.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
10 years agoi965: Create a macro for setting all dirty bits.
Paul Berry [Fri, 10 Jan 2014 22:23:52 +0000 (14:23 -0800)]
i965: Create a macro for setting all dirty bits.

This will make it easier to extend dirty bit handling to support
compute shaders.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
10 years agoi965: Create a macro for setting a dirty bit.
Paul Berry [Fri, 10 Jan 2014 21:00:51 +0000 (13:00 -0800)]
i965: Create a macro for setting a dirty bit.

This will make it easier to extend dirty bit handling to support
compute shaders.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
10 years agoi965: add missing parens in vec4 visitor
Dave Airlie [Tue, 2 Sep 2014 00:13:02 +0000 (10:13 +1000)]
i965: add missing parens in vec4 visitor

Coverity reported this; Matt said it looks like missing parens,
not bad indenting, so let's try that.

Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
10 years agonouveau: don't leak dec struct on error
Dave Airlie [Mon, 1 Sep 2014 23:07:55 +0000 (09:07 +1000)]
nouveau: don't leak dec struct on error

This one path doesn't goto fail, so it seems to leak dec.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
10 years agoxvmc/tests: %C isn't a valid printf specifier.
Dave Airlie [Tue, 2 Sep 2014 00:03:00 +0000 (10:03 +1000)]
xvmc/tests: %C isn't a valid printf specifier.

Reported-by: Coverity scanner.
Signed-off-by: Dave Airlie <airlied@redhat.com>
10 years agonouveau/nv40: quieten coverity warning in unused vertex texture code.
Dave Airlie [Mon, 1 Sep 2014 22:55:55 +0000 (08:55 +1000)]
nouveau/nv40: quieten coverity warning in unused vertex texture code.

This fixes the code, but we never run it anyways, so silence coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
10 years agonv50: remove unused variables
Ilia Mirkin [Mon, 1 Sep 2014 22:47:01 +0000 (18:47 -0400)]
nv50: remove unused variables

Recent code changes have caused these to no longer be used. Remove them.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
10 years agomesa: force height of 1D textures to be 1 in texture views
Ilia Mirkin [Wed, 20 Aug 2014 06:42:30 +0000 (02:42 -0400)]
mesa: force height of 1D textures to be 1 in texture views

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
10 years agonv50: attach the buffer bo to the miptree structures
Ilia Mirkin [Mon, 1 Sep 2014 14:51:08 +0000 (10:51 -0400)]
nv50: attach the buffer bo to the miptree structures

The current code... makes no sense. Use nouveau_bo_ref to attach the bo
to the exposed resource so as to have the proper lifetime guarantees.

Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
10 years agonv50: mt address may not be the underlying bo's start address
Ilia Mirkin [Mon, 1 Sep 2014 14:48:09 +0000 (10:48 -0400)]
nv50: mt address may not be the underlying bo's start address

With VP2, nv50_miptree is faked because the underlying bo's have to be
laid out in a certain way. This is done by adjusting the address. Make
sure that blits (and everything else for consistency) use the mt address
rather than the bo address as a base.

This fixes retrieving chroma plane with VDPAU.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82255
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
10 years agonv50: set the miptree address when clearing bo's in vp2 init
Ilia Mirkin [Mon, 1 Sep 2014 16:48:12 +0000 (12:48 -0400)]
nv50: set the miptree address when clearing bo's in vp2 init

The mt address is about to be used more, make sure it's set
appropriately.

Reported-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
10 years agonv50/ir: avoid creating instructions that can't be emitted
Ilia Mirkin [Mon, 1 Sep 2014 14:55:27 +0000 (10:55 -0400)]
nv50/ir: avoid creating instructions that can't be emitted

When constant folding a MAD operation, we first fold the multiply and
generate an ADD. However we do so without making sure that the immediate
can be handled in the saturate case. If it can't, load the immediate in
a separate instruction.

Reported-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
10 years agonvc0: don't make 1d staging textures linear
Ilia Mirkin [Mon, 1 Sep 2014 04:43:06 +0000 (00:43 -0400)]
nvc0: don't make 1d staging textures linear

Experimentally, the sampler doesn't appear to like these, neither as
buffer nor as rect textures. So remove 1D from the list of texture types
to make linear when used for staging.

This fixes the OSD in mplayer for VDPAU.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
10 years agonv50: zero out unbound samplers
Ilia Mirkin [Sat, 30 Aug 2014 17:35:47 +0000 (13:35 -0400)]
nv50: zero out unbound samplers

Samplers are only defined up to num_samplers, so set all samplers above
nr to NULL so that we don't try to read them again later.

Tested-by: Christian Ruppert <idl0r@qasl.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
10 years agonvc0/ir: avoid infinite recursion when finding first uses of tex
Ilia Mirkin [Fri, 29 Aug 2014 03:05:49 +0000 (23:05 -0400)]
nvc0/ir: avoid infinite recursion when finding first uses of tex

In certain circumstances, findFirstUses could end up doubling back on
instructions it had already processed, resulting in an infinite
recursion. Avoid this by keeping track of already-visited instructions.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83079
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
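
Tracking already-visited instructions, as the commit above describes, is the usual guard for a recursive walk over a graph that may contain cycles. The node layout and function name below are hypothetical and written in C rather than the compiler's own C++; the sketch shows only the visited-set check that stops the recursion from doubling back.

```c
#include <stdbool.h>

/* Hypothetical graph node: up to two successors, identified by index. */
struct node {
    int num_succ;
    int succ[2];
    bool is_use;    /* whether this node counts as a "use" */
};

/* Counts the use nodes reachable from n. Each node is processed at most
 * once, so a cycle in the graph cannot cause unbounded recursion. */
static int count_uses(const struct node *graph, int n, bool *visited)
{
    if (visited[n])
        return 0;
    visited[n] = true;

    int found = graph[n].is_use ? 1 : 0;
    for (int i = 0; i < graph[n].num_succ; i++)
        found += count_uses(graph, graph[n].succ[i], visited);
    return found;
}
```

The caller supplies a visited array with one entry per node, all initially false.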
10 years agofreedreno/ir3: add DDX/DDY
Rob Clark [Mon, 1 Sep 2014 16:37:26 +0000 (12:37 -0400)]
freedreno/ir3: add DDX/DDY

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agofreedreno/ir3: don't keep IR around
Rob Clark [Mon, 1 Sep 2014 16:36:34 +0000 (12:36 -0400)]
freedreno/ir3: don't keep IR around

Once we've assembled the shader, no need to keep the intermediate
around.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agoi965/fs: Don't segfault when debug-logging a null program
Jason Ekstrand [Fri, 29 Aug 2014 18:23:55 +0000 (11:23 -0700)]
i965/fs: Don't segfault when debug-logging a null program

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agoi965/vec4: Don't segfault when debug-logging a null program
Jason Ekstrand [Thu, 28 Aug 2014 04:49:50 +0000 (21:49 -0700)]
i965/vec4: Don't segfault when debug-logging a null program

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agoradeonsi: implement EXPCLEAR optimization for depth
Marek Olšák [Sat, 23 Aug 2014 14:46:53 +0000 (16:46 +0200)]
radeonsi: implement EXPCLEAR optimization for depth

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g,radeonsi: initialize HTILE to fully-expanded state
Marek Olšák [Sat, 23 Aug 2014 13:48:21 +0000 (15:48 +0200)]
r600g,radeonsi: initialize HTILE to fully-expanded state

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: implement fast depth clear
Marek Olšák [Sat, 23 Aug 2014 01:39:08 +0000 (03:39 +0200)]
radeonsi: implement fast depth clear

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: move DB_RENDER_CONTROL into draw_vbo
Marek Olšák [Sat, 23 Aug 2014 01:25:29 +0000 (03:25 +0200)]
radeonsi: move DB_RENDER_CONTROL into draw_vbo

So that I can add fast depth clear.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: disable occlusion queries if they are not needed
Marek Olšák [Sat, 23 Aug 2014 09:12:01 +0000 (11:12 +0200)]
radeonsi: disable occlusion queries if they are not needed

We always left them enabled, which turned off HiZ in some cases.
This should improve performance with Hyper-Z.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g,radeonsi: force fast stencil and HTILE stencil off, fixing a Hyper-Z hang
Marek Olšák [Sat, 23 Aug 2014 00:03:58 +0000 (02:03 +0200)]
r600g,radeonsi: force fast stencil and HTILE stencil off, fixing a Hyper-Z hang

This should be as fast as no HTILE for stencil. I think we can still get full
performance with depth-only rendering even if stencil is present in the buffer
but not used, but I'm not 100% sure. This may be revisited when HiS and fast
stencil clear are implemented.

This fixes a hang in Brutal Legend.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64471

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g: set VGT_ENHANCE=4 on R7xx
Marek Olšák [Wed, 20 Aug 2014 21:58:24 +0000 (23:58 +0200)]
r600g: set VGT_ENHANCE=4 on R7xx

This is a golden setting on RV740, but there is a hw bug which recommends
setting it on all R7xx chipsets.

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g: expose AMD_vertex_shader_layer and *_viewport_index on R600-R700
Marek Olšák [Wed, 20 Aug 2014 17:17:39 +0000 (19:17 +0200)]
r600g: expose AMD_vertex_shader_layer and *_viewport_index on R600-R700

already implemented

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g: fix layered clear
Marek Olšák [Wed, 20 Aug 2014 17:17:09 +0000 (19:17 +0200)]
r600g: fix layered clear

Cc: mesa-stable@lists.freedesktop.org
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g: some DB bug workarounds for R6xx DB flushing
Marek Olšák [Wed, 20 Aug 2014 15:22:41 +0000 (17:22 +0200)]
r600g: some DB bug workarounds for R6xx DB flushing

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g: enable fast depth clear for array textures and cubemaps
Marek Olšák [Wed, 20 Aug 2014 12:36:53 +0000 (14:36 +0200)]
r600g: enable fast depth clear for array textures and cubemaps

I have a piglit test that hits this.

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g: use HTILE allocator from SI
Marek Olšák [Tue, 19 Aug 2014 23:34:37 +0000 (01:34 +0200)]
r600g: use HTILE allocator from SI

It's almost the same.

This enables tiling for HTILE. It also enables Hyper-Z for other texture
targets (1D, 1D_ARRAY, 2D_ARRAY, CUBE, CUBE_ARRAY, 3D, RECT).

2D array depth textures are tested by Unigine Sanctuary and my new piglit
test.

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g: set DB_DEPTH_SIZE.HEIGHT_TILE_MAX for EG/CM, inline other fields
Marek Olšák [Wed, 20 Aug 2014 10:52:09 +0000 (12:52 +0200)]
r600g: set DB_DEPTH_SIZE.HEIGHT_TILE_MAX for EG/CM, inline other fields

This fixes rendering to non-zero layer/face/slice with HTILE.

v2: added the assertion

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agoradeonsi: set DB_DEPTH_SIZE.HEIGHT_TILE_MAX, inline other fields
Marek Olšák [Tue, 19 Aug 2014 14:22:12 +0000 (16:22 +0200)]
radeonsi: set DB_DEPTH_SIZE.HEIGHT_TILE_MAX, inline other fields

This fixes rendering to a non-zero layer/face/slice with HTILE.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72685

v2: added the assertion

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
10 years agor600g: Implement sm5 geometry shader instancing
Glenn Kennard [Mon, 25 Aug 2014 09:05:06 +0000 (11:05 +0200)]
r600g: Implement sm5 geometry shader instancing

Requires Evergreen or later hardware.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
10 years agoglsl_to_tgsi: allocate and enlarge arrays for temporaries on demand
Marek Olšák [Sat, 23 Aug 2014 22:56:12 +0000 (00:56 +0200)]
glsl_to_tgsi: allocate and enlarge arrays for temporaries on demand

This fixes crashes if the number of temporaries is greater than 4096.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66184

v2: added fail paths for realloc failures

Cc: 10.2 10.3 mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
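
Growing an array on demand with a failure path, as in the commit above, might be sketched like this. The struct, the function name, the starting size of 16, and the doubling policy are all assumptions for illustration, not the glsl_to_tgsi code.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable array of per-temporary data. */
struct temp_array {
    int *data;
    unsigned size;  /* number of allocated elements */
};

/* Makes sure `index` is addressable. Returns false on allocation failure,
 * in which case the existing buffer is left untouched and still valid. */
static bool ensure_capacity(struct temp_array *a, unsigned index)
{
    if (index < a->size)
        return true;

    unsigned new_size = a->size ? a->size : 16;
    while (new_size <= index) {
        if (new_size > UINT_MAX / 2)
            return false;   /* doubling would overflow */
        new_size *= 2;
    }

    int *p = realloc(a->data, (size_t)new_size * sizeof(*p));
    if (!p)
        return false;

    /* Zero only the newly added tail. */
    memset(p + a->size, 0, (size_t)(new_size - a->size) * sizeof(*p));
    a->data = p;
    a->size = new_size;
    return true;
}
```

Starting from an empty array, a request for index 5000 grows the buffer past the old fixed limit of 4096 entries instead of overrunning it.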
10 years agogallium/pb_bufmgr_cache: limit the size of cache
Marek Olšák [Wed, 20 Aug 2014 21:53:40 +0000 (23:53 +0200)]
gallium/pb_bufmgr_cache: limit the size of cache

This should make a machine which is running piglit more responsive at times.
e.g. streaming-texture-leak can easily eat 600 MB because of how fast it
creates new textures.

10 years agopipe-loader: use the correct screen index
Marek Olšák [Tue, 19 Aug 2014 22:34:18 +0000 (00:34 +0200)]
pipe-loader: use the correct screen index

10 years agoegl/dri2: use the correct screen index
Marek Olšák [Tue, 19 Aug 2014 22:33:34 +0000 (00:33 +0200)]
egl/dri2: use the correct screen index

Required for multi-GPU configuration where each GPU has its own X screen.

10 years agodocs: Mark ARB_compute_shader as work in progress
Jordan Justen [Wed, 27 Aug 2014 20:22:12 +0000 (13:22 -0700)]
docs: Mark ARB_compute_shader as work in progress

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
10 years agoi965/fs: don't use ir->shadow_comparitor in emit_texture_*
Connor Abbott [Mon, 4 Aug 2014 22:20:37 +0000 (15:20 -0700)]
i965/fs: don't use ir->shadow_comparitor in emit_texture_*

Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agoi965/fs: don't pass ir_variable * to emit_samplepos_setup()
Connor Abbott [Tue, 5 Aug 2014 18:10:07 +0000 (11:10 -0700)]
i965/fs: don't pass ir_variable * to emit_samplepos_setup()

We were only using it to get at its type, which we already know because
it's a builtin variable.

Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agoi965/fs: don't pass ir_variable * to emit_frontfacing_interpolation()
Connor Abbott [Tue, 5 Aug 2014 17:29:00 +0000 (10:29 -0700)]
i965/fs: don't pass ir_variable * to emit_frontfacing_interpolation()

We were only using it to get at its type, which we already know because
it's a builtin variable.

v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations.

Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agoi965: Fix GPU hangs when INTEL_DEBUG=no16 is set.
Kenneth Graunke [Sat, 30 Aug 2014 06:10:47 +0000 (23:10 -0700)]
i965: Fix GPU hangs when INTEL_DEBUG=no16 is set.

The replicated data clear shader needs to be SIMD16, or else the GPU
will hang.  So, compile it even if INTEL_DEBUG=no16 is set.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
10 years agomesa: fix make tarballs
Emil Velikov [Sun, 31 Aug 2014 22:16:15 +0000 (23:16 +0100)]
mesa: fix make tarballs

Current method of generating distribution tar-balls involves manually
invoking make + target name in the appropriate places. This temporary
solution is used until we get 'make dist' working.

Currently it does not work, as in order to have the target (which is
also a filename) available in the final Makefile we need to add a PHONY
target + use the correct target name.

Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
10 years agoi965/vec4: Remove try_emit_saturate
Abdiel Janulgue [Mon, 16 Jun 2014 20:56:10 +0000 (13:56 -0700)]
i965/vec4: Remove try_emit_saturate

Now that saturate is implemented natively as an instruction,
we can cut down on unneeded functionality.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoi965/fs: Refactor try_emit_saturate
Abdiel Janulgue [Mon, 16 Jun 2014 19:28:00 +0000 (12:28 -0700)]
i965/fs: Refactor try_emit_saturate

v3: Since the fs backend can emit saturate as a separate instruction, there is
    no need to detect for min/max instructions and to rewrite the instruction tree
    accordingly. On the other hand, we don't need to emit a separate saturated
    mov either when the expression generating src can do saturate directly.
v4: Add can_do_saturate() check before enabling saturate modifier (Ken)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate
Abdiel Janulgue [Mon, 16 Jun 2014 19:28:44 +0000 (12:28 -0700)]
ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate

Now that saturate is implemented natively as an instruction,
we can cut down on unneeded functionality.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoi965/vec4: Allow propagation of instructions with saturate flag to sel
Abdiel Janulgue [Fri, 4 Jul 2014 11:52:36 +0000 (04:52 -0700)]
i965/vec4: Allow propagation of instructions with saturate flag to sel

This applies when the sel condition is bounded within 0 and 1.0. It allows code such as:
        mov.sat a b
        sel.ge  dst a 0.25F

To be propagated as:
        sel.ge.sat dst b 0.25F

v3: - Syntax clarifications in inst->saturate assignment
    - Remove extra parenthesis when assigning src_reg value
      from copy_entry (Matt Turner)
v4: - Take channels into consideration when propagating saturated instructions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
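
The reason the immediate has to lie within 0 and 1.0 can be checked numerically. The sketch below assumes that sel.ge acts as a maximum of its two sources and models the instructions with hypothetical C helpers; it compares the original sequence (saturate, then select) with the propagated one (select, then saturate).

```c
/* Clamp to [0, 1], modelling the .sat modifier. */
static float saturate(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Assumed model of sel.ge: pick the first source if it is >= the second. */
static float sel_ge(float a, float b)
{
    return a >= b ? a : b;
}

/* mov.sat a b; sel.ge dst a imm */
static float before_propagation(float b, float imm)
{
    return sel_ge(saturate(b), imm);
}

/* sel.ge.sat dst b imm */
static float after_propagation(float b, float imm)
{
    return saturate(sel_ge(b, imm));
}
```

For an immediate such as 0.25 the two forms agree for inputs below 0, inside the range, and above 1. With an immediate of 1.5 and an input of 2.0 they differ (1.5 versus 1.0), which is why the bound matters.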
10 years agoi965/fs: Allow propagation of instructions with saturate flag to sel
Abdiel Janulgue [Thu, 3 Jul 2014 11:14:39 +0000 (04:14 -0700)]
i965/fs: Allow propagation of instructions with saturate flag to sel

This applies when the sel condition is bounded within 0 and 1.0. It allows code such as:
mov.sat a b
sel.ge  dst a 0.25F

To be propagated as:
sel.ge.sat dst b 0.25F

v3: Syntax clarifications in inst->saturate assignment (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoglsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)
Abdiel Janulgue [Tue, 8 Jul 2014 11:12:50 +0000 (14:12 +0300)]
glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)

v2: - Output max(saturate(x),b) instead of saturate(max(x,b))
    - Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is > 0.0 and
      inner constant is 1.0.
    - Fix comments to show that the optimization is a commutative operation
      (Matt Turner)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoglsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)
Abdiel Janulgue [Fri, 20 Jun 2014 05:17:20 +0000 (22:17 -0700)]
glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)

v2: - Output min(saturate(x),b) instead of saturate(min(x,b)) suggested by Ilia Mirkin
    - Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is zero and
      inner constant is < 1
    - Fix comments to reflect we are doing a commutative operation (Matt Turner)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoglsl: Optimize clamp(x, 0, 1) as saturate(x)
Abdiel Janulgue [Fri, 20 Jun 2014 05:15:14 +0000 (22:15 -0700)]
glsl: Optimize clamp(x, 0, 1) as saturate(x)

v2: - Check that the base type is float (Ian Romanick)
v3: - Make sure comments reflect that we are doing a commutative operation
    - Add missing condition where the inner constant is 1.0 and outer constant is 0.0
    - Make indexing of operands easier to read (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
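
The three clamp rewrites in the commits above can be checked with a small sketch. The helper names are hypothetical, clamp is modelled as min(max(x, lo), hi), and NaN inputs are ignored; rewrites_agree compares each clamp form with its saturate-based replacement for one input x and one bound b with 0 < b < 1.

```c
static float minf(float a, float b) { return a < b ? a : b; }
static float maxf(float a, float b) { return a > b ? a : b; }

/* Modelled as min(max(x, lo), hi). */
static float clampf(float x, float lo, float hi)
{
    return minf(maxf(x, lo), hi);
}

/* Written independently of clampf so the comparison is not circular. */
static float saturate(float x)
{
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* Returns 1 if all three rewrites match clamp for this x and b. */
static int rewrites_agree(float x, float b)
{
    return clampf(x, 0.0f, 1.0f) == saturate(x)
        && clampf(x, 0.0f, b) == minf(saturate(x), b)
        && clampf(x, b, 1.0f) == maxf(saturate(x), b);
}
```

Checking a few inputs on both sides of the bounds, for example x = -2, 0.25, 0.75 and 3 with b = 0.5, exercises every branch of the three identities.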
10 years agoglsl: Implement saturate as ir_unop_saturate
Abdiel Janulgue [Fri, 20 Jun 2014 23:55:03 +0000 (16:55 -0700)]
glsl: Implement saturate as ir_unop_saturate

Now that we have the ir_unop_saturate implemented as a single
instruction, generate the correct simplified expression.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoi965/vec4: Add support for ir_unop_saturate
Abdiel Janulgue [Mon, 16 Jun 2014 20:56:50 +0000 (13:56 -0700)]
i965/vec4: Add support for ir_unop_saturate

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoi965/fs: Add support for ir_unop_saturate
Abdiel Janulgue [Mon, 16 Jun 2014 17:35:44 +0000 (10:35 -0700)]
i965/fs: Add support for ir_unop_saturate

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate
Abdiel Janulgue [Mon, 16 Jun 2014 18:14:32 +0000 (11:14 -0700)]
ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate
Abdiel Janulgue [Mon, 16 Jun 2014 19:16:57 +0000 (12:16 -0700)]
ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate

Needed when vertex programs don't allow saturate.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoglsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)
Abdiel Janulgue [Thu, 12 Jun 2014 21:59:30 +0000 (14:59 -0700)]
glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)

Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
10 years agoglsl: Add constant evaluation of ir_unop_saturate
Abdiel Janulgue [Thu, 12 Jun 2014 20:53:40 +0000 (13:53 -0700)]
glsl: Add constant evaluation of ir_unop_saturate

v2: Use CLAMP macro (Ian Romanick)

Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
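A rough sketch of component-wise constant folding with a CLAMP-style macro, as the v2 note above suggests. The macro here is defined locally for the example and the function name fold_saturate is hypothetical; neither is copied from the tree:

```c
#include <assert.h>

/* Local definition for this sketch only. */
#define CLAMP(X, MIN, MAX) \
   ((X) < (MIN) ? (MIN) : ((X) > (MAX) ? (MAX) : (X)))

/* Fold saturate over each component of a constant vector (1 to 4 wide). */
static void fold_saturate(const float *in, float *out, unsigned components)
{
   assert(components >= 1 && components <= 4);
   for (unsigned c = 0; c < components; c++)
      out[c] = CLAMP(in[c], 0.0f, 1.0f);
}
```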
10 years agoglsl: Add ir_unop_saturate
Abdiel Janulgue [Fri, 20 Jun 2014 18:56:48 +0000 (11:56 -0700)]
glsl: Add ir_unop_saturate

Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
10 years agoi965/vec4/fs: Count loops in shader debug
Abdiel Janulgue [Wed, 6 Aug 2014 08:27:58 +0000 (11:27 +0300)]
i965/vec4/fs: Count loops in shader debug

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoi965/vec4: inline generate_vec4_instruction() within generate_code()
Abdiel Janulgue [Fri, 29 Aug 2014 16:07:08 +0000 (19:07 +0300)]
i965/vec4: inline generate_vec4_instruction() within generate_code()

Suggested by Matt. This patch combines and moves back the code-generation
functions from generate_vec4_instruction() into generate_code(). Makes
generate_code() a bit larger, but helps us to count loops in a
straightforward manner.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
10 years agoi965: Add 2x MSAA support to Broadwell fast clear code.
Kenneth Graunke [Fri, 29 Aug 2014 22:15:43 +0000 (15:15 -0700)]
i965: Add 2x MSAA support to Broadwell fast clear code.

According to the cited documentation section (but in the newer docs),
x_scaledown is the same for 2x and 4x MSAA.

+47 piglits.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83081
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
10 years agoi965/vec4: Update register coalescing test.
Matt Turner [Fri, 29 Aug 2014 03:16:42 +0000 (20:16 -0700)]
i965/vec4: Update register coalescing test.

In commit 04895f5c I added support for reswizzling writemasks. This test
was checking that we didn't support this.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82881

10 years agoi965: Use unreachable() to silence warning.
Matt Turner [Sat, 30 Aug 2014 16:42:18 +0000 (09:42 -0700)]
i965: Use unreachable() to silence warning.

brw_meta_fast_clear.c:211:17: warning: 'x_scaledown' may be used
uninitialized in this function [-Wmaybe-uninitialized]
    unsigned int x_scaledown, y_scaledown;

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
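A sketch of the general pattern behind this kind of fix: marking the default case of a switch as unreachable tells GCC-style compilers that no path leaves the variables unset, which is what -Wmaybe-uninitialized complains about. The macro below is a local stand-in, the function name is hypothetical, and the scaledown values are placeholders rather than the hardware's real numbers:

```c
#include <assert.h>

/* Local stand-in for an unreachable() helper (hypothetical definition). */
#if defined(__GNUC__) || defined(__clang__)
#define unreachable(msg) do { assert(!msg); __builtin_unreachable(); } while (0)
#else
#define unreachable(msg) assert(!msg)
#endif

static unsigned scaledown_product(unsigned num_samples)
{
   unsigned x_scaledown, y_scaledown;

   switch (num_samples) {
   case 2:
   case 4:
      x_scaledown = 8; /* placeholder value */
      y_scaledown = 2; /* placeholder value */
      break;
   case 8:
      x_scaledown = 2; /* placeholder value */
      y_scaledown = 2; /* placeholder value */
      break;
   default:
      /* Without this, the compiler sees a path with both variables unset. */
      unreachable("unexpected sample count");
   }

   return x_scaledown * y_scaledown;
}
```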
10 years agoilo: set INTEL_RELOC_GGTT only on GEN6
Chia-I Wu [Sun, 31 Aug 2014 00:12:27 +0000 (08:12 +0800)]
ilo: set INTEL_RELOC_GGTT only on GEN6

We asked MI commands to use GGTT only on GEN6.

10 years agoilo: fix bound check for 3DSTATE_URB_VS
Chia-I Wu [Sat, 30 Aug 2014 17:34:41 +0000 (01:34 +0800)]
ilo: fix bound check for 3DSTATE_URB_VS

Fix max/min entries on GEN7.5 GT2/GT3.

10 years agoilo: replace cmd by dw0 in GPE
Chia-I Wu [Sat, 30 Aug 2014 23:22:01 +0000 (07:22 +0800)]
ilo: replace cmd by dw0 in GPE

With e3c251071b0c9396c3ec76d1cf943c60ae297281, the magic values are gone.  We
no longer need "cmd" to hide them.  Replace it by dw0.

10 years agost/hgl: Move st_visual create/destroy into hgl state_tracker
Alexander von Gluck IV [Fri, 29 Aug 2014 15:06:09 +0000 (15:06 +0000)]
st/hgl: Move st_visual create/destroy into hgl state_tracker

10 years agost/hgl: Move st_manager create/destroy into hgl state_tracker
Alexander von Gluck IV [Fri, 29 Aug 2014 14:42:26 +0000 (14:42 +0000)]
st/hgl: Move st_manager create/destroy into hgl state_tracker

10 years agofreedreno/ir3: fix potential null ptr deref
Rob Clark [Sat, 30 Aug 2014 20:51:46 +0000 (16:51 -0400)]
freedreno/ir3: fix potential null ptr deref

Fix potential segfault in debug code.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agofreedreno/ir3: add TXB
Rob Clark [Sat, 30 Aug 2014 19:17:49 +0000 (15:17 -0400)]
freedreno/ir3: add TXB

Signed-off-by: Rob Clark <robclark@freedesktop.org>
10 years agofreedreno/ir3: detect scheduler fail
Rob Clark [Fri, 29 Aug 2014 14:51:40 +0000 (10:51 -0400)]
freedreno/ir3: detect scheduler fail

There are some cases where the scheduler can get itself into impossible
situations, by scheduling the wrong write to pred or addr register
first.  (Ie. it could end up being unable to schedule any instruction if
some instruction which depends on the current addr/reg value also
depends on another addr/reg value.)

To solve this we'd need to be able to insert extra mov instructions
(which would also help when register assignment gets into impossible
situations).  To do that, we'd need to move the nop padding from sched
into legalize.

But to start with, just detect when we get into an impossible situation
and bail, rather than sitting forever in an infinite loop.  This way it
will at least fall back to the old compiler, which might even work if
you are lucky.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
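The "detect and bail" idea described above can be sketched as a loop that gives up when a full pass schedules nothing. This is a toy model with a single dependency per instruction; the struct and function names are hypothetical and it does not reflect the actual ir3 data structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct instr {
   int dep;        /* index of the instruction that must go first, or -1 */
   bool scheduled;
};

/* Returns 0 on success, -1 if a pass makes no progress (impossible state). */
static int schedule_all(struct instr *instrs, size_t count)
{
   size_t remaining = count;

   while (remaining > 0) {
      size_t progress = 0;

      for (size_t i = 0; i < count; i++) {
         struct instr *in = &instrs[i];

         if (in->scheduled)
            continue;
         if (in->dep >= 0) {
            assert((size_t)in->dep < count);
            if (!instrs[in->dep].scheduled)
               continue; /* dependency not ready yet */
         }
         in->scheduled = true;
         progress++;
      }

      /* Nothing could be scheduled: bail instead of looping forever. */
      if (progress == 0)
         return -1;
      remaining -= progress;
   }
   return 0;
}
```

A caller can treat the -1 return as the signal to fall back to another code path, in the spirit of the fallback to the old compiler mentioned in the commit message.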
10 years agoglsl: Use bit-flags image attributes and uint16_t for the image format
Ian Romanick [Mon, 14 Jul 2014 22:48:38 +0000 (15:48 -0700)]
glsl: Use bit-flags image attributes and uint16_t for the image format

All of the GL image enums fit in 16-bits.

Also move the fields from the anonymous "image" structure to the next
higher structure.  This will enable packing the bits with the other
bitfield.

Valgrind massif results for a trimmed apitrace of dota2:

                  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
Before (32-bit): 76 40,572,916,873       68,831,248       63,328,783     5,502,465            0
After  (32-bit): 70 40,577,421,777       68,487,584       62,973,695     5,513,889            0

Before (64-bit): 60 36,822,640,058       96,526,824       88,735,296     7,791,528            0
After  (64-bit): 74 37,124,603,758       95,891,808       88,466,712     7,425,096            0

A real savings of 346KiB on 32-bit and 262KiB on 64-bit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
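A simplified illustration of the kind of layout change described above: boolean attributes become one-bit fields and the format shrinks to uint16_t. The struct and field names are made up for the example and do not match ir_variable; exact sizes depend on the ABI, so the sketch only relies on the packed form being smaller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Before: a nested struct with a full-width format and one bool per flag. */
struct image_before {
   struct {
      unsigned format;
      bool read_only;
      bool write_only;
      bool coherent;
      bool is_volatile;
      bool is_restrict;
   } image;
};

/* After: one-bit flags plus a 16-bit format in the enclosing struct. */
struct image_after {
   unsigned image_read_only:1;
   unsigned image_write_only:1;
   unsigned image_coherent:1;
   unsigned image_volatile:1;
   unsigned image_restrict:1;
   uint16_t image_format;
};

/* Store a GL enum value, checking that it really fits in 16 bits. */
static void set_format(struct image_after *v, unsigned gl_format)
{
   assert(gl_format <= UINT16_MAX);
   v->image_format = (uint16_t)gl_format;
}
```

For instance, 0x8058 (the value of GL_RGBA8) fits without truncation.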
10 years agoglsl: Use a single bit for the dual-source blend index
Ian Romanick [Mon, 14 Jul 2014 22:48:37 +0000 (15:48 -0700)]
glsl: Use a single bit for the dual-source blend index

The only values allowed are 0 and 1, and the value is checked before
assigning.

Valgrind massif results for a trimmed apitrace of dota2:

                  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
Before (32-bit): 74 40,580,119,657       69,186,544       63,506,327     5,680,217            0
After  (32-bit): 76 40,572,916,873       68,831,248       63,328,783     5,502,465            0

Before (64-bit): 89 36,822,971,897       96,526,616       88,735,296     7,791,320            0
After  (64-bit): 60 36,822,640,058       96,526,824       88,735,296     7,791,528            0

A real savings of 173KiB on 32-bit and no change on 64-bit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
10 years agoglsl: Eliminate ir_variable::data.atomic.buffer_index
Ian Romanick [Mon, 14 Jul 2014 22:48:36 +0000 (15:48 -0700)]
glsl: Eliminate ir_variable::data.atomic.buffer_index

Just use ir_variable::data.binding... because that's where the
binding is stored for everything else that can use layout(binding=).

Valgrind massif results for a trimmed apitrace of dota2:

                  n        time(i)         total(B)   useful-heap(B) extra-heap(B)    stacks(B)
Before (32-bit): 50 40,564,927,443       69,185,408       63,683,871     5,501,537            0
After  (32-bit): 74 40,580,119,657       69,186,544       63,506,327     5,680,217            0

Before (64-bit): 59 36,822,048,449       96,526,888       89,113,000     7,413,888            0
After  (64-bit): 89 36,822,971,897       96,526,616       88,735,296     7,791,320            0

A real savings of 173KiB on 32-bit and 368KiB on 64-bit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>