mesa.git
11 years agost/mesa: add PIPE_FORMAT_R16G16B16A16_UNORM renderbuffer support
Brian Paul [Tue, 12 Mar 2013 00:31:21 +0000 (18:31 -0600)]
st/mesa: add PIPE_FORMAT_R16G16B16A16_UNORM renderbuffer support

To allow rendering in 16-bit/channel RGBA buffers.

Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agoscons: Re-add ','
José Fonseca [Wed, 13 Mar 2013 00:31:03 +0000 (00:31 +0000)]
scons: Re-add ','

11 years agoautotools: Add missing top-level include dir.
José Fonseca [Wed, 13 Mar 2013 00:16:24 +0000 (00:16 +0000)]
autotools: Add missing top-level include dir.

Fixes autotools build failure.  Not sure if there are more, as I have
difficulties in building the full tree.

11 years agoconfigure.ac: Alphabetize freedreno makefiles.
Matt Turner [Wed, 13 Mar 2013 00:09:55 +0000 (17:09 -0700)]
configure.ac: Alphabetize freedreno makefiles.

11 years agobuild: Get rid of dead MESA_ASM_FILES variable
Matt Turner [Fri, 22 Feb 2013 00:51:19 +0000 (16:51 -0800)]
build: Get rid of dead MESA_ASM_FILES variable

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agomesa/build: Get rid of dead ALL_FILES variable
Matt Turner [Fri, 22 Feb 2013 00:51:03 +0000 (16:51 -0800)]
mesa/build: Get rid of dead ALL_FILES variable

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agoxmlpool/.gitignore: Remove 'Makefile'
Matt Turner [Fri, 22 Feb 2013 01:03:18 +0000 (17:03 -0800)]
xmlpool/.gitignore: Remove 'Makefile'

Handled by top level .gitignore.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agomesa: Use PACKAGE_BUGREPORT macro.
Matt Turner [Sat, 9 Mar 2013 08:28:09 +0000 (00:28 -0800)]
mesa: Use PACKAGE_BUGREPORT macro.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agomesa: Remove unused version #defines from version.h.
Matt Turner [Sat, 9 Mar 2013 08:23:20 +0000 (00:23 -0800)]
mesa: Remove unused version #defines from version.h.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agomesa: Replace MESA_VERSION with PACKAGE_VERSION.
Matt Turner [Sat, 9 Mar 2013 08:25:45 +0000 (00:25 -0800)]
mesa: Replace MESA_VERSION with PACKAGE_VERSION.

One fewer place to have to update.

Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agodraw/so: Fix stream output with geometry shaders
Zack Rusin [Tue, 12 Mar 2013 20:41:35 +0000 (13:41 -0700)]
draw/so: Fix stream output with geometry shaders

If a geometry shader is present, its stream output info should
be used instead of the vs and we shouldn't use the pre-clipped
coordinates.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
11 years agoinclude: Fix build with VS 11 (i.e, 2012).
José Fonseca [Tue, 12 Mar 2013 20:37:47 +0000 (20:37 +0000)]
include: Fix build with VS 11 (i.e, 2012).

NOTE: Candidate for the stable branches.

Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agomesa,gallium,egl,mapi: One definition of C99 inline/__func__ to rule them all.
José Fonseca [Tue, 12 Mar 2013 11:17:49 +0000 (11:17 +0000)]
mesa,gallium,egl,mapi: One definition of C99 inline/__func__ to rule them all.

We were in four already...

NOTE: Candidate for the stable branches.

Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agoscons: Allows choosing VS 10 or 11.
José Fonseca [Tue, 12 Mar 2013 20:33:38 +0000 (20:33 +0000)]
scons: Allows choosing VS 10 or 11.

NOTE: Candidate for the stable branches.

Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agoradeonsi: Fix off-by-one for maximum vertex element index in some cases
Michel Dänzer [Tue, 12 Mar 2013 11:34:37 +0000 (12:34 +0100)]
radeonsi: Fix off-by-one for maximum vertex element index in some cases

In cases where the vertex element size is smaller than the vertex buffer
stride, the previous calculation could end up 1 too low. This would result
in the GPU using index 0 instead of the maximum index for those elements,
which would be visible as intermittent distorted triangles.

NOTE: This is a candidate for the 9.1 branch.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
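
A minimal sketch of the kind of off-by-one described above, not the actual radeonsi code. The helper names and the "naive" formula are assumptions made for illustration; the point is only that the element size, not the stride, decides whether the last element still fits in the buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest index i such that
 * offset + i * stride + elem_size <= buf_size. */
static uint32_t max_vertex_index(uint32_t buf_size, uint32_t offset,
                                 uint32_t stride, uint32_t elem_size)
{
    assert(stride != 0);
    assert(offset + elem_size <= buf_size);
    return (buf_size - offset - elem_size) / stride;
}

/* Hypothetical naive variant: counts whole strides and subtracts one.
 * It is only correct when elem_size == stride. */
static uint32_t max_vertex_index_naive(uint32_t buf_size, uint32_t offset,
                                       uint32_t stride)
{
    assert(stride != 0);
    assert((buf_size - offset) / stride >= 1);
    return (buf_size - offset) / stride - 1;
}
```

With a 100-byte buffer, a 16-byte stride and a 4-byte element, the element at index 6 occupies bytes 96..99 and is still valid, but the naive variant reports 5.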
11 years agonvc0: avoid crash on updating RASTERIZE_ENABLE state
Christoph Bumiller [Mon, 11 Mar 2013 19:53:25 +0000 (20:53 +0100)]
nvc0: avoid crash on updating RASTERIZE_ENABLE state

When doing a blit with the 3D engine, the rasterizer or zsa cso may
be NULL.

11 years agogallium/tests: check format in compute tests, make selectable
Christoph Bumiller [Fri, 1 Mar 2013 15:45:47 +0000 (16:45 +0100)]
gallium/tests: check format in compute tests, make selectable

11 years agonvc0: add MP trap handler for nve4
Christoph Bumiller [Sat, 9 Mar 2013 16:17:14 +0000 (17:17 +0100)]
nvc0: add MP trap handler for nve4

11 years agonvc0: they removed the NTID,NCTAID,GRIDID registers on nve4
Christoph Bumiller [Sat, 9 Mar 2013 11:11:38 +0000 (12:11 +0100)]
nvc0: they removed the NTID,NCTAID,GRIDID registers on nve4

11 years agonvc0: implement compute support for nve4
Christoph Bumiller [Sat, 23 Feb 2013 18:40:23 +0000 (19:40 +0100)]
nvc0: implement compute support for nve4

11 years agonvc0/ir: try to fix CAS (CompareAndSwap)
Christoph Bumiller [Mon, 11 Mar 2013 16:34:43 +0000 (17:34 +0100)]
nvc0/ir: try to fix CAS (CompareAndSwap)

11 years agonv50/ir: add CCTL (cache control) op
Christoph Bumiller [Mon, 11 Mar 2013 16:34:05 +0000 (17:34 +0100)]
nv50/ir: add CCTL (cache control) op

11 years agonvc0/ir/emit: fix emission of large address offsets
Christoph Bumiller [Mon, 11 Mar 2013 16:32:52 +0000 (17:32 +0100)]
nvc0/ir/emit: fix emission of large address offsets

11 years agonvc0: add SHADER/COMPUTE_RESOURCE bind flags to format table
Christoph Bumiller [Fri, 8 Mar 2013 21:40:30 +0000 (22:40 +0100)]
nvc0: add SHADER/COMPUTE_RESOURCE bind flags to format table

11 years agonouveau: align PIPE_BIND_SHADER,COMPUTE_RESOURCEs to 256 bytes
Christoph Bumiller [Sat, 2 Mar 2013 17:27:56 +0000 (18:27 +0100)]
nouveau: align PIPE_BIND_SHADER,COMPUTE_RESOURCEs to 256 bytes

11 years agonv50,nvc0: copy writable flag on surface creation
Christoph Bumiller [Fri, 1 Mar 2013 20:37:37 +0000 (21:37 +0100)]
nv50,nvc0: copy writable flag on surface creation

11 years agonv50/ir: add support for different sampler and resource index on nve4
Christoph Bumiller [Sat, 2 Mar 2013 20:00:26 +0000 (21:00 +0100)]
nv50/ir: add support for different sampler and resource index on nve4

And remove non-working code for indirect sampler/resource selection.
Will be added back later.

Includes code from "nv50/ir/tgsi: Resource indirect indexing" by
Francisco Jerez (when mixing the R and S handles we can only specify
them via a register, i.e. indirectly, unless we upload all the used
handle combinations to c[] space, which we don't for now).

11 years agonv50/ir: implement splitting of 64 bit ops after RA
Christoph Bumiller [Sat, 2 Mar 2013 13:59:06 +0000 (14:59 +0100)]
nv50/ir: implement splitting of 64 bit ops after RA

11 years agonvc0/ir: skip back edges when determining latest sched value
Christoph Bumiller [Thu, 28 Feb 2013 21:08:36 +0000 (22:08 +0100)]
nvc0/ir: skip back edges when determining latest sched value

11 years agonvc0/ir: use large issue delay after RET, too
Christoph Bumiller [Thu, 28 Feb 2013 18:07:24 +0000 (19:07 +0100)]
nvc0/ir: use large issue delay after RET, too

11 years agonv50/ir: fix size adjustment for sched info for multiple functions
Christoph Bumiller [Thu, 28 Feb 2013 18:00:02 +0000 (19:00 +0100)]
nv50/ir: fix size adjustment for sched info for multiple functions

11 years agonv50/ir: print function inputs and outputs
Christoph Bumiller [Wed, 27 Feb 2013 20:02:29 +0000 (21:02 +0100)]
nv50/ir: print function inputs and outputs

11 years agonv50/ir/ssa: add a few comments regarding RenamePass
Christoph Bumiller [Wed, 27 Feb 2013 14:32:35 +0000 (15:32 +0100)]
nv50/ir/ssa: add a few comments regarding RenamePass

11 years agonv50/ir/tgsi: Exclude local declarations from function prototypes.
Francisco Jerez [Mon, 25 Feb 2013 20:57:32 +0000 (21:57 +0100)]
nv50/ir/tgsi: Exclude local declarations from function prototypes.

11 years agonv50/ir/opt: try to make use of SUCLAMP addend
Christoph Bumiller [Mon, 25 Feb 2013 14:52:10 +0000 (15:52 +0100)]
nv50/ir/opt: try to make use of SUCLAMP addend

11 years agonv50/ir: don't assert on type in Modifier.applyTo if it is 0
Christoph Bumiller [Sun, 24 Feb 2013 17:36:44 +0000 (18:36 +0100)]
nv50/ir: don't assert on type in Modifier.applyTo if it is 0

11 years agonv50/ir: add support for barriers
Christoph Bumiller [Sat, 23 Feb 2013 12:09:32 +0000 (13:09 +0100)]
nv50/ir: add support for barriers

nv50 part by Francisco Jerez.

11 years agonv50/ir/tgsi: add support for atomics
Christoph Bumiller [Wed, 20 Feb 2013 20:33:38 +0000 (21:33 +0100)]
nv50/ir/tgsi: add support for atomics

11 years agonv50/ir/tgsi: handle TGSI_OPCODE_LOAD,STORE
Christoph Bumiller [Fri, 22 Feb 2013 23:39:23 +0000 (00:39 +0100)]
nv50/ir/tgsi: handle TGSI_OPCODE_LOAD,STORE

Squashed and (heavily) modified original patches by Francisco Jerez:
nv50/ir/tgsi: Implement resource LOAD/STORE (wip).
nv50/ir/tgsi: Emit SUST/SULD for surface access, and add CB LOAD/STORE support
nv50/ir/tgsi: Fix/clean up the LOAD/STORE handling code.

Left out for now:
nv50/ir/tgsi: Resource indirect indexing

Treating raw, read-only surfaces as constant buffers (CBs) was removed
because CBs are limited to a size of 64 KiB, which isn't desirable, and
because this decision should probably be made by the state tracker.
If we used a number of CB slots for surfaces, the state tracker might
find that we cannot accommodate the advertised limit.

11 years agonvc0/ir: don't replace load from input in COMPUTE progs with VFETCH
Christoph Bumiller [Thu, 28 Feb 2013 20:05:45 +0000 (21:05 +0100)]
nvc0/ir: don't replace load from input in COMPUTE progs with VFETCH

11 years agonvc0/ir: implement lowering of surface ops for nve4
Christoph Bumiller [Fri, 22 Feb 2013 23:00:27 +0000 (00:00 +0100)]
nvc0/ir: implement lowering of surface ops for nve4

11 years agonvc0/ir: add formatted surface load lib code, move to extra header
Christoph Bumiller [Tue, 19 Feb 2013 21:12:01 +0000 (22:12 +0100)]
nvc0/ir: add formatted surface load lib code, move to extra header

OpenGL is nice and makes the user specify a format with an image unit.
OpenCL is evil and doesn't, and what's better than adding a huge load
of functions that we call indirectly to handle the conversion ?

11 years agonv50/ir: extend moveSources for delta < 0
Christoph Bumiller [Sun, 17 Feb 2013 11:01:55 +0000 (12:01 +0100)]
nv50/ir: extend moveSources for delta < 0

11 years agonvc0/ir: lower atomics in s[]
Christoph Bumiller [Fri, 22 Feb 2013 19:46:28 +0000 (20:46 +0100)]
nvc0/ir: lower atomics in s[]

11 years agonvc0/ir/emit: implement INSBF, EXTBF, PERMT and ATOM
Christoph Bumiller [Fri, 22 Feb 2013 19:35:32 +0000 (20:35 +0100)]
nvc0/ir/emit: implement INSBF, EXTBF, PERMT and ATOM

11 years agonv50/ir/emit: handle OP_ATOM
Christoph Bumiller [Wed, 20 Feb 2013 19:54:14 +0000 (20:54 +0100)]
nv50/ir/emit: handle OP_ATOM

11 years agonvc0/ir/target: some ops can't be predicated, e.g. CALL
Christoph Bumiller [Fri, 8 Mar 2013 18:08:23 +0000 (19:08 +0100)]
nvc0/ir/target: some ops can't be predicated, e.g. CALL

11 years agonv50/ir/opt: CALLs cannot load
Christoph Bumiller [Tue, 26 Feb 2013 20:05:03 +0000 (21:05 +0100)]
nv50/ir/opt: CALLs cannot load

11 years agonv50/ir: add support for indirect BRA,CALL
Christoph Bumiller [Fri, 22 Feb 2013 19:08:57 +0000 (20:08 +0100)]
nv50/ir: add support for indirect BRA,CALL

11 years agonvc0/ir/emit: implement move to and logic ops on predicates
Christoph Bumiller [Fri, 22 Feb 2013 18:10:20 +0000 (19:10 +0100)]
nvc0/ir/emit: implement move to and logic ops on predicates

11 years agonvc0/ir/emit: implement surface related ops
Christoph Bumiller [Fri, 22 Feb 2013 18:05:16 +0000 (19:05 +0100)]
nvc0/ir/emit: implement surface related ops

11 years agonv50/ir: initialize CodeEmitters' specialized target fields
Christoph Bumiller [Mon, 25 Feb 2013 11:52:43 +0000 (12:52 +0100)]
nv50/ir: initialize CodeEmitters' specialized target fields

11 years agonv50/ir/opt: make optimization aware of atomics, barriers, surface ops
Christoph Bumiller [Wed, 20 Feb 2013 20:03:30 +0000 (21:03 +0100)]
nv50/ir/opt: make optimization aware of atomics, barriers, surface ops

11 years agonv50/ir: add various new OPs that will be needed for compute
Christoph Bumiller [Fri, 22 Feb 2013 17:45:16 +0000 (18:45 +0100)]
nv50/ir: add various new OPs that will be needed for compute

11 years agonv50/ir: Rename "mkLoad" to "mkLoadv" for consistency.
Francisco Jerez [Fri, 18 May 2012 14:17:44 +0000 (16:17 +0200)]
nv50/ir: Rename "mkLoad" to "mkLoadv" for consistency.

11 years agonv50/ir: fix comparison of system values
Christoph Bumiller [Sun, 24 Feb 2013 17:36:21 +0000 (18:36 +0100)]
nv50/ir: fix comparison of system values

11 years agonv50/ir/tgsi: Translate grid-related system parameters.
Francisco Jerez [Tue, 6 Mar 2012 19:18:12 +0000 (20:18 +0100)]
nv50/ir/tgsi: Translate grid-related system parameters.

11 years agonv50/ir/tgsi: Accept COMPUTE programs.
Francisco Jerez [Mon, 14 Nov 2011 23:12:20 +0000 (00:12 +0100)]
nv50/ir/tgsi: Accept COMPUTE programs.

11 years agonv50/ir/ra: make sure all used function inputs get assigned a reg
Christoph Bumiller [Wed, 27 Feb 2013 20:08:57 +0000 (21:08 +0100)]
nv50/ir/ra: make sure all used function inputs get assigned a reg

A live range [0, 0) counts as empty. For function inputs this can
be a problem, so insert a nop at the beginning to make it [0, 1).
This is a bit of a hack but also the most simple solution.
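
A small model of the half-open live range convention mentioned above, assuming a hypothetical `struct live_range`; it is not the nv50/ir data structure. It only shows why [0, 0) counts as empty and why a leading nop that turns it into [0, 1) makes the value visible to the allocator.

```c
#include <stdbool.h>

/* Hypothetical half-open interval [begin, end) in instruction serials. */
struct live_range {
    int begin;
    int end;
};

/* A half-open range with begin >= end covers no instruction. */
static bool range_is_empty(struct live_range r)
{
    return r.begin >= r.end;
}

/* Models the effect of a nop at serial 0: a function input defined at
 * entry is now live across that nop, so [0, 0) becomes [0, 1). */
static struct live_range range_with_entry_nop(struct live_range r)
{
    if (r.begin == 0 && r.end == 0)
        r.end = 1;
    return r;
}
```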

11 years agonv50/ir/ra: also add pre-existing MERGE,SPLIT to constraint list
Christoph Bumiller [Mon, 25 Feb 2013 13:45:52 +0000 (14:45 +0100)]
nv50/ir/ra: also add pre-existing MERGE,SPLIT to constraint list

11 years agonv50/ir/ra: fix confusion with conditional RegisterSet::occupy
Christoph Bumiller [Wed, 6 Feb 2013 16:14:55 +0000 (17:14 +0100)]
nv50/ir/ra: fix confusion with conditional RegisterSet::occupy

11 years agonv50/ir/ra: swap copyCompound args if src is compound and dst isn't
Christoph Bumiller [Thu, 28 Feb 2013 22:41:41 +0000 (23:41 +0100)]
nv50/ir/ra: swap copyCompound args if src is compound and dst isn't

11 years agonv50/ir/ra: Fix maxGPR calculation for programs with multiple functions.
Francisco Jerez [Mon, 30 Apr 2012 13:22:27 +0000 (15:22 +0200)]
nv50/ir/ra: Fix maxGPR calculation for programs with multiple functions.

11 years agonv50/ir/ra: Fix traversal before the beginning of the active list in buildRIG.
Francisco Jerez [Mon, 30 Apr 2012 13:19:40 +0000 (15:19 +0200)]
nv50/ir/ra: Fix traversal before the beginning of the active list in buildRIG.

11 years agonv50/ir/ra: Fix RegisterSet::occupy(const Value *v).
Francisco Jerez [Mon, 30 Apr 2012 13:13:07 +0000 (15:13 +0200)]
nv50/ir/ra: Fix RegisterSet::occupy(const Value *v).

11 years agonv50/ir/ra: Fix argument const-ness in RegisterSet::idToUnits and idToBytes
Francisco Jerez [Mon, 30 Apr 2012 13:12:15 +0000 (15:12 +0200)]
nv50/ir/ra: Fix argument const-ness in RegisterSet::idToUnits and idToBytes

11 years agonv50/ir/opt: Fix tryPropagateBranch for BBs with several exit branches.
Francisco Jerez [Wed, 6 Feb 2013 13:12:44 +0000 (14:12 +0100)]
nv50/ir/opt: Fix tryPropagateBranch for BBs with several exit branches.

Comments and "if (bf->cfg.incidentCount() == 1)" condition added
by Christoph Bumiller.

11 years agonv50/ir: Clean up references to function values before destroying them.
Francisco Jerez [Mon, 30 Apr 2012 13:06:52 +0000 (15:06 +0200)]
nv50/ir: Clean up references to function values before destroying them.

11 years agonouveau: Bail out from nouveau_fence_wait if flushing the pushbuf fails.
Francisco Jerez [Wed, 25 Apr 2012 21:48:47 +0000 (23:48 +0200)]
nouveau: Bail out from nouveau_fence_wait if flushing the pushbuf fails.

11 years agomesa: Use correct functions for enum conversion.
Vinson Lee [Mon, 11 Mar 2013 05:51:23 +0000 (22:51 -0700)]
mesa: Use correct functions for enum conversion.

Fixes mixing enum types defects reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
11 years agofreedreno: gallium driver for adreno
Rob Clark [Sat, 27 Oct 2012 16:07:34 +0000 (11:07 -0500)]
freedreno: gallium driver for adreno

Currently works on a220.  Others in the a2xx family look pretty similar
and should be pretty straightforward to support with the same driver.

The a3xx has a new shader ISA, and while many registers appear similar,
the register addresses have been completely shuffled around.  I am not
sure yet whether it is best to support with the same driver, but
different compiler, or whether it should be split into a different
driver.

v1: original
v2: build file updates from review comments, and remove GPL licensed
    header files from msm kernel
v3: smarter temp/pred register assignment, fix clear and depth/stencil
    format issues, resource_transfer fixes, scissor fixes

Signed-off-by: Rob Clark <robdclark@gmail.com>
11 years agod3d1x: Remove.
José Fonseca [Mon, 11 Mar 2013 10:13:47 +0000 (10:13 +0000)]
d3d1x: Remove.

Unused/unmaintained.

Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
11 years agonv50: Remove nv0_ir_from_sm4.*
José Fonseca [Mon, 11 Mar 2013 10:14:19 +0000 (10:14 +0000)]
nv50: Remove nv0_ir_from_sm4.*

Unused, depends on d3d1x.

Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
11 years agogallivm: clean up passing derivatives around
Roland Scheidegger [Sat, 9 Mar 2013 00:46:33 +0000 (01:46 +0100)]
gallivm: clean up passing derivatives around

Previously, the derivatives were calculated and passed in a packed form
to the sample code (for implicit derivatives, explicit derivatives were
packed to the same format).
There are several reasons why this wasn't such a good idea:
1) the derivatives may not even be needed (not as bad as it sounds since
llvm will just throw the calculations needed for them away but still)
2) the special packing format really shouldn't be part of the sampler
interface
3) depending what the sample code actually does the derivatives will
be processed differently, hence there is no "ideal" packing. For cube
maps with explicit derivatives (which we don't do yet) for instance the
packing looked downright useless, and for non-isotropic filtering we'd
need different calculations too.

So, instead just pass the derivatives as is (for explicit derivatives),
or let the rho calculating sample code calculate them itself. This still
does exactly the same packing stuff for implicit derivatives for now,
though explicit ones are handled in a more straightforward manner (quick
estimates show performance should be quite similar, though it is much
easier to follow and also does the rho calculation per-pixel until the
end, which we eventually need for spec compliance anyway).

No piglit changes.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agoi965: Fix typo in doxygen hyperlink
Chad Versace [Thu, 21 Feb 2013 03:59:07 +0000 (19:59 -0800)]
i965: Fix typo in doxygen hyperlink

s/brw_state_upload/brw_upload_state/

Found because the link was broken.

Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
11 years agomesa: Reduce memory usage for reg alloc with many graph nodes (part 2).
Eric Anholt [Wed, 20 Feb 2013 01:01:41 +0000 (17:01 -0800)]
mesa: Reduce memory usage for reg alloc with many graph nodes (part 2).

After the previous fix that almost removes an allocation of 4*n^2
bytes, we can use a bitset to reduce another allocation from n^2 bytes
to n^2/8 bytes.

Between the previous commit and this one, the peak heap size for an
oglconform ARB_fragment_program max instructions test on i965 goes from
4GB to 255MB.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55825
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
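
A sketch of the byte-per-entry to bit-per-entry idea described above, assuming a hypothetical `struct interference_matrix`; the names are illustrative and are not Mesa's register allocator API. Packing one flag per bit is what takes an n^2-byte table down to roughly n^2/8 bytes.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define WORD_BITS (sizeof(unsigned) * CHAR_BIT)

/* Hypothetical n x n interference table, one bit per (a, b) pair. */
struct interference_matrix {
    unsigned n;
    unsigned *words;
};

/* Number of unsigned words needed to hold n * n bits. */
static size_t matrix_words(unsigned n)
{
    return ((size_t)n * n + WORD_BITS - 1) / WORD_BITS;
}

static int matrix_init(struct interference_matrix *m, unsigned n)
{
    assert(n > 0);
    m->n = n;
    m->words = calloc(matrix_words(n), sizeof(unsigned));
    return m->words != NULL;
}

static void matrix_set(struct interference_matrix *m, unsigned a, unsigned b)
{
    size_t bit = (size_t)a * m->n + b;
    m->words[bit / WORD_BITS] |= 1u << (bit % WORD_BITS);
}

static int matrix_test(const struct interference_matrix *m,
                       unsigned a, unsigned b)
{
    size_t bit = (size_t)a * m->n + b;
    return (m->words[bit / WORD_BITS] >> (bit % WORD_BITS)) & 1u;
}

static void matrix_free(struct interference_matrix *m)
{
    free(m->words);
    m->words = NULL;
}
```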
11 years agomesa: Reduce the memory usage for reg alloc with many graph nodes (part 1)
Eric Anholt [Wed, 20 Feb 2013 00:46:41 +0000 (16:46 -0800)]
mesa: Reduce the memory usage for reg alloc with many graph nodes (part 1)

We were allocating an adjacency_list entry for every possible
interference that could get created, but that usually doesn't happen.
We can save a lot of memory by resizing the array on demand.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
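
A sketch of growing an adjacency list on demand rather than reserving a slot for every possible interference. `struct graph_node`, the initial capacity of 4 and the doubling policy are assumptions for illustration, not the layout or policy Mesa uses.

```c
#include <stdlib.h>

/* Hypothetical graph node with a lazily grown adjacency list. */
struct graph_node {
    unsigned *adjacency;
    unsigned count;
    unsigned capacity;
};

/* Appends `other` to the list, doubling the storage only when it is
 * full. Returns 1 on success, 0 if the allocation failed. */
static int node_add_adjacency(struct graph_node *node, unsigned other)
{
    if (node->count == node->capacity) {
        unsigned new_capacity = node->capacity ? node->capacity * 2 : 4;
        unsigned *grown = realloc(node->adjacency,
                                  new_capacity * sizeof(*grown));
        if (!grown)
            return 0;
        node->adjacency = grown;
        node->capacity = new_capacity;
    }
    node->adjacency[node->count++] = other;
    return 1;
}
```

A node that never interferes with anything allocates nothing at all, which is where the saving comes from when most possible interferences never occur.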
11 years agoi965/fs: Improve CSE performance by expiring some available expressions.
Eric Anholt [Wed, 20 Feb 2013 00:20:10 +0000 (16:20 -0800)]
i965/fs: Improve CSE performance by expiring some available expressions.

We're already walking the list, and we can easily know when something
has no reason to be in the list any longer, so take a brief extra step
to reduce our worst-case runtime (an oglconform test that emits the
maximum instructions in a fragment program).  I don't actually know what
the worst-case runtime was, because it was too long and I got bored.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Improve live variables calculation performance.
Eric Anholt [Tue, 19 Feb 2013 22:36:06 +0000 (14:36 -0800)]
i965/fs: Improve live variables calculation performance.

We can execute way fewer instructions by doing our boolean manipulation
on an "int" of bits at a time, while also reducing our working set size.

Reduces compile time of L4D2's slowest shader from 4s to 1.1s
(-72.4% +/- 0.2%, n=10)

v2: Remove redundant masking (noted by Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
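
A sketch of doing the liveness update one machine word at a time. It uses the textbook backward dataflow equation livein = use | (liveout & ~def); the function name and array layout are assumptions and do not reproduce the i965 code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Updates one block's livein set, processing a whole word of variables
 * per iteration instead of one boolean per variable.
 * Returns true if any new bit was added to livein. */
static bool update_livein(unsigned *livein, const unsigned *liveout,
                          const unsigned *use, const unsigned *def,
                          size_t words)
{
    bool progress = false;

    for (size_t i = 0; i < words; i++) {
        unsigned new_in = use[i] | (liveout[i] & ~def[i]);

        if (new_in & ~livein[i]) {
            livein[i] |= new_in;
            progress = true;
        }
    }
    return progress;
}
```

The caller would repeat this over all blocks until no call reports progress, as in any fixed-point liveness analysis.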
11 years agoi965/fs: Also do the gen4 SEND dependency workaround against other SENDs.
Eric Anholt [Thu, 7 Mar 2013 01:50:50 +0000 (17:50 -0800)]
i965/fs: Also do the gen4 SEND dependency workaround against other SENDs.

We were handling the dependency workaround for the first written reg
of a send preceding the one we're fixing up, but didn't consider the other
regs.  Thus if you had two sampler calls that got allocated to the same
set of regs, one might, rarely, overwrite the other.  This was occurring in
XBMC's GLSL shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Switch to using sampler LD messages for uniform pull constants.
Eric Anholt [Wed, 6 Mar 2013 22:47:22 +0000 (14:47 -0800)]
i965/fs: Switch to using sampler LD messages for uniform pull constants.

When forcing the compiler to always generate pull constants instead of
push constants (in order to have an easy to use testcase), this improves
performance of my old GLSL demo by 23.3553% +/- 1.42968% (n=7).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Fix broken rendering in large shaders with UBO loads.
Eric Anholt [Wed, 6 Mar 2013 23:58:46 +0000 (15:58 -0800)]
i965/fs: Fix broken rendering in large shaders with UBO loads.

The lowering process creates a new vgrf on gen7 that should be represented
in live interval analysis.  As-is, it was getting a conflicting allocation
with gl_FragDepth in the dolphin emulator, producing broken rendering.

NOTE: This is a candidate for the 9.1 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Add a comment about an implementation detail.
Eric Anholt [Thu, 7 Mar 2013 01:12:28 +0000 (17:12 -0800)]
i965/fs: Add a comment about an implementation detail.

I was going to fix the code above like the previous commit, but we already
had that covered (otherwise all our uniform access would have been broken,
unlike just pull constants).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agoi965/fs: Fix register allocation for uniform pull constants in 16-wide.
Eric Anholt [Thu, 7 Mar 2013 00:38:10 +0000 (16:38 -0800)]
i965/fs: Fix register allocation for uniform pull constants in 16-wide.

We were allowing a compressed instruction to write a register that
contained the last use of a uniform pull constant (either UBO load or push
constant spillover), so it would get half its values smashed.

Since we need to see the actual instruction to decide this, move the
pre-gen6 pixel_x/y logic here, which should improve the performance of
register allocation since virtual_grf_interferes() is called more than
once per instruction.

NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agointel: Remove some unused debug flags.
Eric Anholt [Wed, 6 Mar 2013 00:24:07 +0000 (16:24 -0800)]
intel: Remove some unused debug flags.

I was looking at the list to see what might be interesting to document for
application developers, and it turns out some are completely dead.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
11 years agodraw/gs: Correctly iterate the emitted primitives
Zack Rusin [Fri, 8 Mar 2013 03:15:03 +0000 (19:15 -0800)]
draw/gs: Correctly iterate the emitted primitives

We were assuming that each emitted primitive had the same
number of vertices. That is incorrect. Emitted primitives
can have an arbitrary number of vertices. Simply increment
index on iteration to fix it.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
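
A sketch of walking primitives whose vertex counts differ, assuming a hypothetical flat vertex array plus a per-primitive length array; this is not the draw module's code. A running index advanced by each primitive's own length replaces the "primitive number times fixed vertex count" assumption.

```c
/* Fills starts[p] with the first-vertex offset of primitive p and
 * returns the total number of vertices across all primitives. */
static unsigned primitive_offsets(const unsigned *prim_lengths,
                                  unsigned num_prims, unsigned *starts)
{
    unsigned index = 0;

    for (unsigned p = 0; p < num_prims; p++) {
        starts[p] = index;
        index += prim_lengths[p]; /* advance by this primitive's size */
    }
    return index;
}
```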
11 years agotgsi/exec: Correctly reset NumOutputs before parsing the shader
Zack Rusin [Fri, 8 Mar 2013 03:11:28 +0000 (19:11 -0800)]
tgsi/exec: Correctly reset NumOutputs before parsing the shader

Whenever we're binding the shaders we're incrementing NumOutputs,
assuming the parser spots an output declaration, but we were never
resetting the variable. That means that each subsequent bind of
a geometry shader would add its number of outputs to the number
of outputs bound by all previously run shaders and our indexes
would get completely messed up.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
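
A sketch of the reset described above, assuming a hypothetical `struct exec_machine` and declaration enum rather than the real tgsi_exec types. Without the reset, binding the same shader twice would double the counter.

```c
/* Hypothetical declaration kinds seen while parsing a shader. */
enum decl_kind {
    DECL_INPUT,
    DECL_OUTPUT
};

/* Hypothetical interpreter state; only the counter matters here. */
struct exec_machine {
    unsigned num_outputs;
};

/* Counts output declarations for the shader being bound. The counter
 * is cleared first so earlier binds do not leak into this one. */
static void bind_shader(struct exec_machine *mach,
                        const enum decl_kind *decls, unsigned num_decls)
{
    mach->num_outputs = 0;

    for (unsigned i = 0; i < num_decls; i++) {
        if (decls[i] == DECL_OUTPUT)
            mach->num_outputs++;
    }
}
```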
11 years agodraw/llvm: another quick hack for drawing with no position output
Roland Scheidegger [Mon, 11 Mar 2013 16:03:55 +0000 (17:03 +0100)]
draw/llvm: another quick hack for drawing with no position output

Also need to skip things if we have no cv value but pos value
(happens with geometry shaders enabled).
Needs a round of cleanup, though.

11 years agosoftpipe: don't use samplers with prebaked sampler and sampler_view state
Roland Scheidegger [Fri, 8 Mar 2013 21:29:34 +0000 (22:29 +0100)]
softpipe: don't use samplers with prebaked sampler and sampler_view state

This is needed for handling the dx10-style sample opcodes.
This also simplifies the logic by getting rid of sampler variants
completely (sampler_views though OTOH have sort of variants because
some of their state is different depending on the shader stage they
are bound to).
No significant performance difference (openarena run:
840 frames in 459.8 seconds vs. 840 frames in 460.5 seconds).

v2: fix reference counting bug spotted by Jose.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agotgsi: emit code for SVIEWINFO and SAMPLE_I
Roland Scheidegger [Fri, 8 Mar 2013 21:10:21 +0000 (22:10 +0100)]
tgsi: emit code for SVIEWINFO and SAMPLE_I

Can handle them since the single sampler interface was introduced.

v2: simplify txf/sample_i handling a bit according to Brian's feedback.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agotgsi: fix wrong reg used for unit for TGSI_OPCODE_TXF
Roland Scheidegger [Fri, 8 Mar 2013 18:45:52 +0000 (19:45 +0100)]
tgsi: fix wrong reg used for unit for TGSI_OPCODE_TXF

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
11 years agor600g/llvm: Fix build
Tom Stellard [Mon, 11 Mar 2013 15:10:51 +0000 (11:10 -0400)]
r600g/llvm: Fix build

11 years agor600g: add debug options disabling various copy-buffer-related features
Marek Olšák [Tue, 5 Mar 2013 00:15:45 +0000 (01:15 +0100)]
r600g: add debug options disabling various copy-buffer-related features

This will be invaluable for debugging and bug reports.

11 years agomesa: don't allocate a texture if width or height is 0 in CopyTexImage
Marek Olšák [Mon, 4 Mar 2013 12:26:51 +0000 (13:26 +0100)]
mesa: don't allocate a texture if width or height is 0 in CopyTexImage

NOTE: This is a candidate for the stable branches.

Reviewed-by: Brian Paul <brianp@vmware.com>
11 years agogallium/util: attempt to fix blitting multisample texture arrays
Marek Olšák [Sun, 3 Mar 2013 16:33:11 +0000 (17:33 +0100)]
gallium/util: attempt to fix blitting multisample texture arrays

We don't have a test for this yet, but obviously the swizzle was wrong.

11 years agor600g: allocate FMASK right after the texture, so that it's aligned with it
Marek Olšák [Sun, 3 Mar 2013 13:54:31 +0000 (14:54 +0100)]
r600g: allocate FMASK right after the texture, so that it's aligned with it

This avoids the kernel CS checker errors with MSAA textures.

Reviewed-by: Jerome Glisse <jglisse@redhat.com>
11 years agor600g: remove r600.h, move the stuff elsewhere (mostly to r600_pipe.h)
Marek Olšák [Sun, 3 Mar 2013 13:33:00 +0000 (14:33 +0100)]
r600g: remove r600.h, move the stuff elsewhere (mostly to r600_pipe.h)

Reviewed-by: Jerome Glisse <jglisse@redhat.com>
11 years agor600g: remove r600_hw_context_priv.h, move the stuff to r600_pipe.h
Marek Olšák [Sun, 3 Mar 2013 13:21:34 +0000 (14:21 +0100)]
r600g: remove r600_hw_context_priv.h, move the stuff to r600_pipe.h

Reviewed-by: Jerome Glisse <jglisse@redhat.com>
11 years agor600g: remove deprecated state management code
Marek Olšák [Sat, 2 Mar 2013 16:36:05 +0000 (17:36 +0100)]
r600g: remove deprecated state management code

It's nice to see so much code that did pretty much nothing go away.

Reviewed-by: Jerome Glisse <jglisse@redhat.com>
11 years agor600g: atomize pixel shader
Marek Olšák [Sat, 2 Mar 2013 16:14:51 +0000 (17:14 +0100)]
r600g: atomize pixel shader

Reviewed-by: Jerome Glisse <jglisse@redhat.com>