Vadim Girlin [Tue, 30 Apr 2013 16:50:24 +0000 (20:50 +0400)]
r600g/sb: don't propagate dead values in GVN pass
In some cases we use value::gvn_source field to link values that
are known to be equal before gvn pass (e.g. results of DOT4 in different
slots of the same alu group), but then source value may become dead later
and this confuses further passes.
This patch resets value::gvn_source to NULL in the dce_cleanup pass
if it points to dead value.
Fixes segfault during shader optimization with ETQW.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
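The clean-up described in the commit above can be sketched roughly as follows. This is a simplified C illustration, not the actual r600-sb code (which is C++): only gvn_source comes from the commit message, while the is_dead flag and the array-walking dce_cleanup_gvn() helper are hypothetical names chosen for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the backend's value; is_dead is a hypothetical flag. */
struct value {
   bool is_dead;
   struct value *gvn_source;   /* value known to be equal before GVN, or NULL */
};

/* Drop links to dead sources so that later passes never follow them. */
static void
dce_cleanup_gvn(struct value *vals, size_t count)
{
   for (size_t i = 0; i < count; i++) {
      if (vals[i].gvn_source && vals[i].gvn_source->is_dead)
         vals[i].gvn_source = NULL;
   }
}
```

Links to live sources are left untouched, so GVN can still use them.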
Vadim Girlin [Sat, 27 Apr 2013 08:03:39 +0000 (12:03 +0400)]
r600g/sb: use simple heuristic to limit register pressure
This is not complete register pressure tracking, yet it helps to prevent
register allocation problems in some cases where they were observed.
The problems are uncovered by false dependencies between fetch instructions
introduced by some recent changes in TGSI and/or default backend.
Sometimes we have code like this:
...
SAMPLE R5.xyzw, R5.xyzw
... store R5.xyzw somewhere
MOV R5.x, <next x coord>
MOV R5.y, <next y coord>
SAMPLE R5.xyzw, R5.xyzw
... <may be repeated a lot of times>
With 2D resources, z and w in SAMPLE src reg aren't used and can be simply
masked, but shader backend doesn't have this information, so it's
considered as data dependency by optimization algorithms.
Vadim Girlin [Tue, 23 Apr 2013 06:34:42 +0000 (10:34 +0400)]
r600g/sb: improve error checking in ra_coalesce pass
Vadim Girlin [Tue, 23 Apr 2013 06:34:00 +0000 (10:34 +0400)]
r600g/sb: use source bytecode in case of optimization errors
Vadim Girlin [Tue, 30 Apr 2013 16:53:15 +0000 (20:53 +0400)]
r600g: plug in optimizing backend
Optimization is enabled with "R600_DEBUG=sb".
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Vadim Girlin [Tue, 30 Apr 2013 16:51:36 +0000 (20:51 +0400)]
r600g/sb: initial commit of the optimizing shader backend
Vadim Girlin [Sun, 21 Apr 2013 15:10:32 +0000 (19:10 +0400)]
r600g: use enum type for domains field in struct r600_resource
This prevents the problems when the header is included in C++ code.
Vadim Girlin [Sun, 21 Apr 2013 15:11:36 +0000 (19:11 +0400)]
r600g: add new flags to isa instruction tables
Vadim Girlin [Fri, 1 Feb 2013 09:51:25 +0000 (13:51 +0400)]
r600g: always create reverse lookup isa tables
Vadim Girlin [Thu, 25 Apr 2013 15:42:31 +0000 (19:42 +0400)]
r600g: mask unused source components for SAMPLE
This results in cleaner shader code and may improve the quality of
optimized code produced by r600-sb due to eliminated false dependencies
in some cases.
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Eric Anholt [Fri, 19 Apr 2013 21:51:55 +0000 (14:51 -0700)]
intel: Remove the last spans code!
The remaining bits happen to do nothing that
_swrast_span_render_start()/finish() don't do.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 21:50:43 +0000 (14:50 -0700)]
intel: Move the S8 offset calc function near its remaining usage.
It's not really span code ever since we stopped using spans for S8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 21:47:28 +0000 (14:47 -0700)]
intel: Ensure renderbuffers are current when mapping them.
In the case of rendering to windows in X, we would render to stale buffers
(or not render at all!) if you hit a MapRenderbuffer as the first thing
done to your window after new buffers are ready to be collected in DRI2.
I think this also covers the weird comment about irb->mt being missing
sometimes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 20:57:17 +0000 (13:57 -0700)]
mesa: Add a clarifying comment about rowStride of compressed textures.
I always forget how we do this for compressed textures.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 20:00:02 +0000 (13:00 -0700)]
mesa: Remove the Map field from texture images.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 20:10:55 +0000 (13:10 -0700)]
swrast: Always use MapTextureImage for mapping textures for swrast.
Now that everything goes through ImageSlices[], we can rely on the
driver's existing texture mapping function.
A big block of code goes away on Radeon that looks like it was to deal with
the validate that happened at SpanRenderStart, which no longer occurs since we
don't need validation for the MapTextureImage hook.
v2: Rewrite comment about ImageSlices, fix duplicated swImages, touch up
unmap loop.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Sat, 20 Apr 2013 04:32:06 +0000 (21:32 -0700)]
nouveau: Replace swrast_texture_image->Map usage with ->Buffer.
This code is trying to deal with providing a map in the case that
AllocTexImageBuffer was called, which is hooked up to the swrast variant.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Sat, 20 Apr 2013 05:05:21 +0000 (22:05 -0700)]
nouveau: Just use MapTextureImage instead of duplicating the logic.
MapTextureImage has the exact same logic, except it can also handle
swrast-allocated buffers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 21:00:22 +0000 (14:00 -0700)]
swrast: Make a teximage's stored RowStride be in terms of bytes per row.
For hardware drivers with pitch alignment requirements, a
non-power-of-two-sized texture format won't end up being an integer number
of pixels per row. Also, avoids having to change our units between
MapTextureImage's rowStride and swrast's RowStride.
This doesn't fully convert the compressed texel fetch path, but does make
sure we don't drop any bits (not that we'd expect to).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
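To illustrate why a byte stride is convenient here, consider a minimal sketch of a texel address computation. The helper texel_address() and its parameters are hypothetical and are not the swrast code; the point is only that a pitch of, say, 64 bytes with 3-byte texels is not a whole number of texels, yet it works directly as a byte stride.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: row_stride_bytes plays the role of a RowStride in bytes. */
static const uint8_t *
texel_address(const uint8_t *map, size_t row_stride_bytes,
              size_t bytes_per_texel, size_t x, size_t y)
{
   /* No conversion between texels and bytes is needed for the row step. */
   return map + y * row_stride_bytes + x * bytes_per_texel;
}
```

With a 64-byte pitch and 3 bytes per texel, texel (2, 1) sits at byte offset 64 + 6 = 70 from the start of the map.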
Eric Anholt [Fri, 19 Apr 2013 19:51:20 +0000 (12:51 -0700)]
swrast: Replace use of teximage Map in 1D/2D paths with ImageSlices[0].
This gets us ready for the Map field to die.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 18:44:53 +0000 (11:44 -0700)]
swrast: Replace ImageOffsets with an ImageSlices pointer.
This is a step toward allowing drivers to use their normal mapping paths,
instead of requiring that all slice mappings come from an aligned offset
from the first slice's map.
This incidentally fixes missing slice handling in FXT1 swrast.
v2: Use slice height helper function.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 18:57:28 +0000 (11:57 -0700)]
swrast: Reuse _swrast_free_texture_image_buffer from drivers.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 18:56:35 +0000 (11:56 -0700)]
swrast: Move ImageOffsets allocation to shared code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 20:35:31 +0000 (13:35 -0700)]
swrast: Clean up and explain the mapping process.
v2: Move slice height calculation to a helper function (recommended by Brian).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 20:30:34 +0000 (13:30 -0700)]
swrast: Factor out texture slice counting.
This function is going to get used a lot more in upcoming patches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 22:45:33 +0000 (15:45 -0700)]
radeon: Remove some dead teximage mapping code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Fri, 19 Apr 2013 18:52:36 +0000 (11:52 -0700)]
radeon: Add missing swrast field initialization.
This is the equivalent of intel's
80513ec8b4c812b9c6249cc5824337a5f04ab34c.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Vincent Lejeune [Tue, 30 Apr 2013 14:08:58 +0000 (16:08 +0200)]
r600g/llvm: Fix opencl build
Alexander von Gluck IV [Mon, 29 Apr 2013 23:08:41 +0000 (18:08 -0500)]
Gallium: Use mmap on Haiku for executable memory vs malloc
* Haiku now has DEP enabled by default.
Alexander von Gluck IV [Mon, 29 Apr 2013 23:08:26 +0000 (18:08 -0500)]
Mapi: Use mmap on Haiku for executable memory vs malloc
* Haiku now has DEP enabled by default.
Alexander von Gluck IV [Mon, 29 Apr 2013 23:08:02 +0000 (18:08 -0500)]
Mesa: Use mmap on Haiku for executable memory vs malloc
* Haiku now has DEP enabled by default.
Vincent Lejeune [Sat, 27 Apr 2013 22:01:00 +0000 (00:01 +0200)]
r600g/llvm: get use_kill from compiler shader
Eric Anholt [Mon, 8 Apr 2013 23:38:57 +0000 (16:38 -0700)]
i965/fs: Print out the estimated cycle count in INTEL_DEBUG=wm
This could be used by shader-db for hopefully more accurate regression
testing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Fri, 26 Apr 2013 03:20:05 +0000 (20:20 -0700)]
i965/fs: Allow LRPs with uniform registers.
Improves GLB2.7 performance on my HSW by 0.671455% +/- 0.225037% (n=62).
v2: Make is_valid_3src() a method of the fs_reg. (recommended by Ken)
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
Eric Anholt [Thu, 25 Apr 2013 21:41:36 +0000 (14:41 -0700)]
intel: Be more conservative in disabling tiling to save memory.
Improves GLB2.7 trex performance 1.01985% +/- 0.721366% on my IVB (n=10)
and by 3.38771% +/- 0.584241% (n=15) on my HSW, due to a 32x32 ARGB8888
cubemap going from untiled to tiled.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Eric Anholt [Thu, 25 Apr 2013 19:34:07 +0000 (12:34 -0700)]
i965: Disable Z16 on contexts that don't require it.
It appears that Z16 on Intel hardware is in fact slower than Z24, so
people are getting surprisingly hurt when trying to use Z16 as a
performance-versus-precision tradeoff, or when they're targeting GLES2 and
that's all you get.
GL 3.0+ have Z16 on the list of required exact format sizes, but GLES
doesn't, so choose the better-performing layout in that case. Improves
GLB 2.7 trex performance at 1920x1080 by 10.7% +/- 1.1% (n=3) on my IVB
system.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 23 Apr 2013 02:41:40 +0000 (19:41 -0700)]
intel: Report FBO incompleteness causes through GL_ARB_debug_output.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Mon, 22 Apr 2013 23:04:25 +0000 (16:04 -0700)]
intel: Fold the one last function intel_tex_format.c into the caller.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Wed, 10 Apr 2013 17:04:11 +0000 (10:04 -0700)]
mesa: Fix error checking for GS UBO getters.
These are supposed to be present if both things are available, but we were
enabling them if either one was.
Eric Anholt [Wed, 10 Apr 2013 16:59:41 +0000 (09:59 -0700)]
mesa: Add a clarifying comment about EXTRA_ error checking.
Eric Anholt [Wed, 10 Apr 2013 16:53:11 +0000 (09:53 -0700)]
mesa: Add an extra clarifying set of braces to getter checking.
For this multi-page single statement, my thought at the end was that the
next block was mis-indented, rather than that the dropped indentation
actually indicated the end of the loop.
Eric Anholt [Wed, 10 Apr 2013 16:49:37 +0000 (09:49 -0700)]
mesa: Fix error checking for getters consisting of only API versions.
In almost all of our cases, getters that are turned on for only some API
variants will have an extension listed as one of the things that can
enable it, and thus api_check gets set. For extra_gl30_es3 (used for
NUM_EXTENSIONS, MAJOR_VERSION, MINOR_VERSION) on a GL 2.1 context, though,
we would check twice, not find either one, but never actually throw the
error.
Eric Anholt [Wed, 10 Apr 2013 16:47:12 +0000 (09:47 -0700)]
mesa: Clarify the names of error checking variables for glGet.
There's no reason to actually count these things, so the integer ++
behavior was just confusing.
Eric Anholt [Thu, 18 Apr 2013 02:10:29 +0000 (19:10 -0700)]
i915: Add support for GL_EXT_texture_sRGB and GL_EXT_texture_sRGB_decode.
This brings the driver up to GL 2.1.
Eric Anholt [Wed, 17 Apr 2013 20:55:08 +0000 (13:55 -0700)]
i915: Always enable GL 2.0 support.
There's no point in shipping a non-GL2 driver today.
Eric Anholt [Wed, 17 Apr 2013 20:58:00 +0000 (13:58 -0700)]
i915: Correctly set the OQ counter bits.
While we may provide the extension, we need to tell applications that they
can't actually use it:
An implementation can either set QUERY_COUNTER_BITS_ARB to the
value 0, or to some number greater than or equal to n. If an
implementation returns 0 for QUERY_COUNTER_BITS_ARB, then the
occlusion queries will always return that zero samples passed the
occlusion test, and so an application should not use occlusion
queries on that implementation.
Kenneth Graunke [Sun, 28 Apr 2013 08:35:57 +0000 (01:35 -0700)]
i965: Move is_math/is_tex/is_control_flow() to backend_instruction.
These are entirely based on the opcode, which is available in
backend_instruction. It makes sense to only implement them in one
place.
This changes the VS implementation of is_tex() slightly, which now
accepts FS_OPCODE_TXB and SHADER_OPCODE_LOD. However, since those
aren't generated in the VS anyway, it should be fine.
This also makes is_control_flow() available in the VS.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Zack Rusin [Sat, 27 Apr 2013 06:51:26 +0000 (02:51 -0400)]
draw/so: fix overflow calculation
Only report overflow for missing targets if they're actually being
used. If the targets are missing but are not being used by any
slot in the stream output declaration, we should correctly just
ignore them.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
José Fonseca [Mon, 29 Apr 2013 14:40:06 +0000 (15:40 +0100)]
llvmpipe: Fix queries when screen->num_threads == 0.
That is, when llvmpipe is run in single-threaded mode.
Trivial.
Tested with
LP_NUM_THREADS=0 glean --run results --overwrite --quick --tests occluQry
José Fonseca [Mon, 29 Apr 2013 14:12:26 +0000 (15:12 +0100)]
Revert "st/mesa: add a simple path to BufferData if it only discards buffer contents"
This reverts commit
5649f886f76023532538b8792605a3578cec1ed1.
It causes segfaults when size is zero.
Jerome Glisse [Wed, 24 Apr 2013 23:15:52 +0000 (19:15 -0400)]
r600g: force full cache for hyperz
It seems that in some cases allowing half cache usage confuses the GPU
and triggers a lockup. Force full cache use.
Should fix:
https://bugs.freedesktop.org/show_bug.cgi?id=59592
https://bugs.freedesktop.org/show_bug.cgi?id=60848
https://bugs.freedesktop.org/show_bug.cgi?id=60969
https://bugs.freedesktop.org/show_bug.cgi?id=61747
https://bugs.freedesktop.org/show_bug.cgi?id=62466
https://bugs.freedesktop.org/show_bug.cgi?id=62669
https://bugs.freedesktop.org/show_bug.cgi?id=62721
https://bugs.freedesktop.org/show_bug.cgi?id=63124
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Rob Clark [Mon, 29 Apr 2013 11:36:27 +0000 (07:36 -0400)]
freedreno: fix rebase screw-up
Add back 2nd arg to emit_vertexbufs() which got lost in rebase.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Chris Forbes [Fri, 26 Apr 2013 23:00:46 +0000 (11:00 +1200)]
i965/fs: Don't try to use bogus interpolation modes pre-Gen6.
Interpolation modes other than perspective-barycentric-pixel-center (and
their associated coefficients in the WM payload) only exist in Gen6 and
later.
Unfortunately, if a varying was declared as `centroid`, we would blindly
read the nonexistent values, and so produce all manner of bad behavior
-- texture swimming, snow, etc.
Fixes rendering in Counter-Strike Source and Team Fortress 2 on
Ironlake.
NOTE: This is a candidate for the 9.1 branch.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Jordan Justen <jordan.l.justen@intel.com>
Matt Turner [Sun, 28 Apr 2013 21:35:01 +0000 (14:35 -0700)]
i965/vs: Fix order of source arguments to LRP.
The order of arguments matches DirectX, and is backwards from GLSL's
mix() built-in.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63983
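The argument-order mismatch fixed above can be sketched in plain C. The sketch assumes that LRP computes src0 * src1 + (1 - src0) * src2, with the interpolation weight first as in the DirectX lrp instruction; the function names are made up for the example and are not the i965 emitter API.

```c
/* GLSL mix(): linear blend of x and y with weight a. */
static float
glsl_mix(float x, float y, float a)
{
   return x * (1.0f - a) + y * a;
}

/* Assumed LRP semantics (DirectX order): src0 * src1 + (1 - src0) * src2. */
static float
lrp(float src0, float src1, float src2)
{
   return src0 * src1 + (1.0f - src0) * src2;
}

/* Implementing mix(x, y, a) with LRP means reversing the operand order. */
static float
mix_via_lrp(float x, float y, float a)
{
   return lrp(a, y, x);
}
```

Passing the operands in mix() order, lrp(x, y, a), would blend with the wrong weight.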
Zack Rusin [Sat, 27 Apr 2013 04:52:49 +0000 (00:52 -0400)]
llvmpipe: stop crashing when one of the so targets is null
Fixes a crash when one of the so targets is null.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Sat, 27 Apr 2013 04:49:23 +0000 (00:49 -0400)]
draw/so: indicate overflow when buffer is missing
We were crashing if one of the buffers wasn't set, we should
just treat it as an overflow. It's useful when using so
statistics because it allows one to figure out how much data
would be generated by so without actually writing any of it.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Sat, 27 Apr 2013 02:53:07 +0000 (22:53 -0400)]
gallivm: fix indirect addressing of temps in soa mode
We weren't adding the soa offsets when constructing the indices
for the gather functions. That meant that we were always returning
the data in the first vertex/primitive/pixel in the SoA structure
and not correctly fetching from all structures.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
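As a rough scalar illustration of the indexing issue above, assume temporaries are stored in a flattened temps[reg][chan][lane] layout with four channels per register. Both this layout and the helper soa_gather_index() are assumptions made for the sketch, not the gallivm code itself.

```c
/* Assumed layout: temps[reg][chan][lane], flattened, 4 channels per register. */
static unsigned
soa_gather_index(unsigned reg, unsigned chan, unsigned lane, unsigned num_lanes)
{
   /* The "+ lane" term is the per-lane SoA offset; without it every lane
    * would read the value belonging to lane 0. */
   return (reg * 4 + chan) * num_lanes + lane;
}
```

For register 2, channel 1 and four lanes, lane 3 reads element 39, while omitting the lane offset would always yield element 36.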
Zack Rusin [Wed, 24 Apr 2013 03:36:40 +0000 (23:36 -0400)]
tgsi/ureg: Add a function to return the number of outputs
We already hold the variable, just weren't providing access
to it.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Tue, 23 Apr 2013 22:56:47 +0000 (18:56 -0400)]
draw/so: Fix overflow calculations
We weren't taking the buffer offset, destination offset or the
stride into consideration so we were frequently writing into
an overflown buffer.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
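A minimal sketch of an overflow test that takes the buffer offset, destination offset and stride into account might look like this. The function so_write_overflows() is hypothetical, and for simplicity it assumes that all sizes and offsets are expressed in bytes and that each component is one float; the draw module's real units and data structures may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: would writing num_components floats for the given
 * vertex run past the end of the stream output buffer? */
static bool
so_write_overflows(size_t buffer_size, size_t buffer_offset,
                   size_t stride, size_t dst_offset,
                   size_t vertex_index, size_t num_components)
{
   size_t start = buffer_offset + vertex_index * stride + dst_offset;
   size_t end = start + num_components * sizeof(float);

   return end > buffer_size;
}
```

Leaving out any of the three terms makes the computed start too small, so a write that actually runs past the buffer would be treated as safe.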
Zack Rusin [Tue, 23 Apr 2013 22:47:08 +0000 (18:47 -0400)]
draw/llvm: fix viewport transformations
This was a very serious bug. We were always doing the viewport
transformations on the first output of the vertex shader. That means
that every application that was storing position in anything but
OUT[0] was outputting untransformed vertices and had broken output
for whatever it was storing at OUT[0]. Correctly take into
consideration where the vertex position is actually stored.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
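The idea behind the fix above can be sketched in scalar C: apply the viewport scale and translate to whichever output slot holds the position, not unconditionally to slot 0. The struct vec4 type, the pos_slot parameter and the function name are hypothetical, and the sketch covers only the scale/translate step, not the rest of the draw/llvm pipeline.

```c
struct vec4 {
   float x, y, z, w;
};

/* Transform only the output that actually holds the vertex position. */
static void
viewport_transform(struct vec4 *outputs, unsigned pos_slot,
                   const float scale[3], const float translate[3])
{
   struct vec4 *pos = &outputs[pos_slot];

   pos->x = pos->x * scale[0] + translate[0];
   pos->y = pos->y * scale[1] + translate[1];
   pos->z = pos->z * scale[2] + translate[2];
}
```

With the position in slot 1, slot 0 (for example a color) is left alone, whereas the old behavior would have modified slot 0 and left the position untransformed.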
Zack Rusin [Sat, 27 Apr 2013 03:00:38 +0000 (23:00 -0400)]
gallium: increase the number of available stream output decls
There can be more stream output decls than shader outputs because
individual components from them can be split and distributed
among different so buffers.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Zack Rusin [Tue, 23 Apr 2013 10:19:14 +0000 (06:19 -0400)]
llvmpipe: implement so_overflow query
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Fri, 26 Apr 2013 19:49:35 +0000 (13:49 -0600)]
mesa: fix the compressed TexSubImage size checking code
Before, we'd incorrectly generate an error if we tried to
replace a non-4x4 block near the edge of a NPOT compressed texture.
For example, if the dest image was 15 texels wide and xoffset=12
and width=3 we'd incorrectly generate GL_INVALID_OPERATION.
Verified with new tests added to piglit s3tc-errors test.
Note: This is a candidate for the stable branches.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
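The rule behind the example above (15 texels wide, xoffset=12, width=3) can be sketched for one dimension as follows. This is an illustrative helper with a made-up name, not the Mesa function: it assumes the offset must be block-aligned and that a partial block is acceptable only when the region ends exactly at the image edge.

```c
#include <stdbool.h>

/* One-dimensional check; call once for x/width and once for y/height. */
static bool
compressed_subimage_dim_ok(unsigned offset, unsigned size,
                           unsigned image_size, unsigned block)
{
   /* The region must start on a block boundary. */
   if (offset % block != 0)
      return false;

   /* A partial block is only allowed where it touches the image edge. */
   if (size % block != 0 && offset + size != image_size)
      return false;

   return true;
}
```

With 4-texel blocks, offset 12 and size 3 is accepted in a 15-texel-wide image but rejected in a 16-texel-wide one.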
Brian Paul [Fri, 26 Apr 2013 13:31:49 +0000 (07:31 -0600)]
llvmpipe: replace LP_MAX_THREADS with screen->num_threads in query code
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Fri, 26 Apr 2013 13:26:46 +0000 (07:26 -0600)]
llvmpipe: bump LP_MAX_THREADS to 16
On the mesa-users list, Burlen Loring reported a speed-up with 16 cores
and his test/app.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Fri, 26 Apr 2013 13:26:06 +0000 (07:26 -0600)]
mesa: updated read_buffer_enum_to_index() comment
Remove the part about the value of gl_framebuffer::Name.
Christian König [Fri, 26 Apr 2013 09:49:55 +0000 (11:49 +0200)]
r600/uvd: stop advertising MPEG4 on UVD 2.x chips v2
That is just not supported by the hardware.
v2: fix compare
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Fri, 26 Apr 2013 09:16:19 +0000 (11:16 +0200)]
radeon/uvd: stop using anonymous unions
Signed-off-by: Christian König <christian.koenig@amd.com>
Tapani Pälli [Thu, 18 Apr 2013 06:21:27 +0000 (09:21 +0300)]
mesa: fix type comparison errors in sub-texture error checking code
This patch fixes a crash that happens if glTexSubImage2D is called with a
negative xoffset.
NOTE: This is a candidate for stable branches.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
José Fonseca [Thu, 25 Apr 2013 17:51:17 +0000 (18:51 +0100)]
Revert "draw: Yield zeros for LLVM fetches of non-existing vertex elements."
After more thought/discussion, it seems it is better to handle this sort
of stuff in the state tracker.
So this reverts commit
12096f334b82340dc165ed15e6f8f44d4cf94df4, except the
variant->key -> key shorthands.
Chia-I Wu [Wed, 12 Dec 2012 22:01:23 +0000 (06:01 +0800)]
ilo: add the driver to the build system
Add ilo to targets/egl-static and add a new target dri-ilo. Update autoconf
and automake rules.
Chia-I Wu [Wed, 12 Dec 2012 21:48:46 +0000 (05:48 +0800)]
ilo: compile VS/GS/FS with the toy compiler
Chia-I Wu [Wed, 12 Dec 2012 21:48:28 +0000 (05:48 +0800)]
ilo: add a toy shader compiler
This is a simple shader compiler that performs almost zero optimizations. The
generated code is usually much larger than that generated by i965.
The generated code also requires many more registers.
Function-wise, it lacks register spilling and does not support most TGSI
indirections. Other than those, it works alright.
Chia-I Wu [Wed, 12 Dec 2012 21:44:41 +0000 (05:44 +0800)]
ilo: hook up pipe context GPGPU functions
This just adds a stub.
Chia-I Wu [Wed, 12 Dec 2012 21:43:04 +0000 (05:43 +0800)]
ilo: hook up pipe context video functions
This just hooks them up with auxiliary/vl layer.
Chia-I Wu [Wed, 12 Dec 2012 21:35:37 +0000 (05:35 +0800)]
ilo: add support for time/occlusion/primitive queries
Chia-I Wu [Tue, 16 Apr 2013 08:36:03 +0000 (16:36 +0800)]
ilo: hook up pipe context 3D functions
Chia-I Wu [Tue, 16 Apr 2013 10:09:35 +0000 (18:09 +0800)]
ilo: add GEN7 support for 3D pipeline
Chia-I Wu [Wed, 12 Dec 2012 21:28:42 +0000 (05:28 +0800)]
ilo: add 3D pipeline for GEN6
The 3D pipeline is a high-level interface to emit 3D commands and states. It
uses GEN6 GPE to do the real work.
Chia-I Wu [Tue, 16 Apr 2013 10:09:01 +0000 (18:09 +0800)]
ilo: add GEN7 GPE
Chia-I Wu [Wed, 12 Dec 2012 21:23:34 +0000 (05:23 +0800)]
ilo: add GEN6 GPE
GEN6 GPE (Graphics Processing Engine) is a low-level interface to emit 3D
commands and states.
Chia-I Wu [Wed, 12 Dec 2012 21:20:40 +0000 (05:20 +0800)]
ilo: hook up pipe context query functions
None of the query types are supported yet.
Chia-I Wu [Wed, 12 Dec 2012 21:18:25 +0000 (05:18 +0800)]
ilo: hook up pipe context transfer functions
Chia-I Wu [Wed, 12 Dec 2012 21:15:10 +0000 (05:15 +0800)]
ilo: hook up pipe context blit functions
Chia-I Wu [Tue, 16 Apr 2013 08:27:50 +0000 (16:27 +0800)]
ilo: hook up pipe context state functions
Chia-I Wu [Wed, 12 Dec 2012 21:05:01 +0000 (05:05 +0800)]
ilo: add functions to manage shaders
This commit adds shader cache, shader state, shader variant, etc. It does
not add the shader compiler though.
Chia-I Wu [Tue, 16 Apr 2013 08:24:40 +0000 (16:24 +0800)]
ilo: hook up pipe context flush function
Chia-I Wu [Wed, 12 Dec 2012 20:36:41 +0000 (04:36 +0800)]
ilo: add command parser
The command parser manages batch buffers and command submissions.
Chia-I Wu [Wed, 12 Dec 2012 20:44:21 +0000 (04:44 +0800)]
ilo: hook up pipe screen resource functions
Chia-I Wu [Wed, 12 Dec 2012 20:43:01 +0000 (04:43 +0800)]
ilo: hook up pipe screen format functions
Chia-I Wu [Wed, 12 Dec 2012 20:26:23 +0000 (04:26 +0800)]
ilo: hook up pipe_screen param and fence functions
Chia-I Wu [Wed, 12 Dec 2012 20:24:40 +0000 (04:24 +0800)]
ilo: add debug flags settable through ILO_DEBUG
Chia-I Wu [Wed, 12 Dec 2012 20:07:16 +0000 (04:07 +0800)]
ilo: new pipe driver for Intel GEN6+
This commit adds some boilerplate code. The header files found under include/
are copied from i965.
Chia-I Wu [Wed, 12 Dec 2012 19:52:50 +0000 (03:52 +0800)]
winsys/intel: new winsys for intel
This is a wrapper for libdrm_intel to allow the pipe driver to stay OS
agnostic.
José Fonseca [Fri, 26 Apr 2013 07:43:00 +0000 (08:43 +0100)]
gallivm: Fix trivial out-of-bounds indirection in lp_build_cube_lookup().
Courtesy of clang:
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1483:10: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
tmp[2] = lp_build_swizzle_aos(coord_bld, ddx_ddy[1], swizzle02);
^ ~
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
^
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1487:56: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
rho_vec = lp_build_add(coord_bld, rho_vec, tmp[2]);
^ ~
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
^
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1491:56: warning: array index of '2' indexes past the end of an array (that contains 2 elements) [-Warray-bounds]
rho_vec = lp_build_max(coord_bld, rho_vec, tmp[2]);
^ ~
src/gallium/auxiliary/gallivm/lp_bld_sample.c:1430:10: note: array 'tmp' declared here
LLVMValueRef ddx_ddy[2], tmp[2], rho_vec;
^
Matt Turner [Thu, 25 Apr 2013 18:03:38 +0000 (11:03 -0700)]
i965/vs: Add support for LRP instruction.
Only 13 affected programs in shader-db, but they were all helped.
total instructions in shared programs: 368877 -> 368851 (-0.01%)
instructions in affected programs: 1576 -> 1550 (-1.65%)
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Thu, 25 Apr 2013 18:02:02 +0000 (11:02 -0700)]
i965/vs: Add a function to fix-up uniform arguments for 3-src insts.
Three-source instructions have a vertical stride overloaded to 4, which
prevents directly using vec4 uniforms as arguments. Instead we need to
insert a MOV instruction to do the replication for the three-source
instruction.
With this in place, we can use three-source instructions in the vertex
shader. While some thought needs to go into deciding whether it's better
to use a three-source instruction rather than a sequence of equivalent
instructions (when one or more sources are uniforms or immediates), this
will allow us to skip a lot of ugly lowering code and use the BFE and
BFI2 instructions directly.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Jerome Glisse [Tue, 23 Apr 2013 23:22:33 +0000 (19:22 -0400)]
winsys/radeon: consolidate tracing into winsys v2
This moves the tracing timeout and printing into winsys and adds
a debug environment variable for it (R600_DEBUG=trace_cs).
Lots of files touched because of winsys API changes.
v2: Do not write lockup file if ib uniq id does not match last one
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Tom Stellard [Mon, 22 Apr 2013 16:12:07 +0000 (09:12 -0700)]
r600g/compute: Removed unused and untested code
There was a lot of code in evergreen_compute_internal.c that was not
being used at all and most of it was duplicating code from other parts
of the driver.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tom Stellard [Mon, 22 Apr 2013 15:38:40 +0000 (08:38 -0700)]
r600g/compute: Use a constant buffer to store kernel parameters v2
v2:
- Fix usage of set_constant_buffer()
- Fix typo in comment
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>