Eric Anholt [Tue, 12 Mar 2013 00:36:54 +0000 (17:36 -0700)]
i965: Add names for all instructions to dump_instruction() in FS and VS.
I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 6 Mar 2013 22:54:27 +0000 (14:54 -0800)]
i965: Enable ARB_texture_query_lod.
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 6 Mar 2013 22:47:01 +0000 (14:47 -0800)]
i965/fs: Generate LOD sampler message from ir_lod.
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dave Airlie [Sun, 23 Sep 2012 09:50:41 +0000 (19:50 +1000)]
glsl: Implement ARB_texture_query_lod
v2 [mattst88]:
- Rebase.
- #define GL_ARB_texture_query_lod to 1.
- Remove comma after ir_lod in ir.h for MSVC.
- Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp,
opt_tree_grafting.cpp.
- Rename textureQueryLOD to textureQueryLod, see
https://www.khronos.org/bugzilla/show_bug.cgi?id=821
- Fix ir_reader of (lod ...).
v3 [mattst88]:
- Rename textureQueryLod to textureQueryLOD, pending resolution of
Khronos 821.
- Add ir_lod case to ir_to_mesa.cpp.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 28 Mar 2013 18:38:57 +0000 (11:38 -0700)]
i965/fs: Use measured Gen7 instruction timings on Gen6.
x before
+ after
[ministat distribution plot: layout lost in extraction]
    N           Min           Max        Median           Avg        Stddev
x  23       8083.78       8287.83       8205.55     8162.7461     68.307951
+  23       8107.56       8358.74       8224.33     8186.1765     71.506301
No difference proven at 95.0% confidence
Reviewed-by: Eric Anholt <eric@anholt.net>
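As a rough check of the ministat verdict above, the t statistic can be recomputed from the two summary rows alone. This is only a sketch and not ministat's own code: the helper name is made up for illustration, and the critical value 2.015 (two-sided 95%, 44 degrees of freedom) is an assumption taken from a standard t table.

```python
import math


def t_statistic(n1, mean1, sd1, n2, mean2, sd2):
    """Student's t with pooled variance, computed from summary statistics."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean2 - mean1) / std_err


# Values copied from the "x" (before) and "+" (after) rows of the table.
t = t_statistic(23, 8162.7461, 68.307951, 23, 8186.1765, 71.506301)

# Assumed critical value: two-sided 95% for 23 + 23 - 2 = 44 degrees of freedom.
T_CRITICAL = 2.015
significant = abs(t) > T_CRITICAL

print("t = %.3f, significant at 95%%: %s" % (t, significant))
```

The statistic stays well below the critical value, which is consistent with "No difference proven at 95.0% confidence".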
Matt Turner [Thu, 28 Mar 2013 18:15:20 +0000 (11:15 -0700)]
i965/fs: Increase and document MAD latency on Gen7.
58% of mad(8) generated in shader-db are reading registers from the same
bank.
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Thu, 28 Mar 2013 17:57:34 +0000 (10:57 -0700)]
i965/fs: Add LRP instruction latency.
Set its latency to what happens to be the default floating-point
instruction latency. One day we may want to handle latency based on
register bank information.
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Fri, 1 Mar 2013 00:42:51 +0000 (16:42 -0800)]
i965/fs: Add Haswell cycle timings
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Thu, 28 Mar 2013 17:46:17 +0000 (10:46 -0700)]
i965: Note that write-after-write dependencies are blocking.
Reviewed-by: Eric Anholt <eric@anholt.net>
Matt Turner [Thu, 28 Mar 2013 17:45:34 +0000 (10:45 -0700)]
i965: Reword comment about the shared mathbox.
Reviewed-by: Eric Anholt <eric@anholt.net>
Roland Scheidegger [Fri, 29 Mar 2013 05:16:33 +0000 (06:16 +0100)]
gallivm: consolidate some half-to-float and r11g11b10-to-float code
Similar enough that we can try to use shared code.
v2: fix a stupid bug using wrong variable causing mayhem with Inf and NaNs.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Chris Forbes [Fri, 29 Mar 2013 03:22:09 +0000 (16:22 +1300)]
mesa: provide default implementation of QuerySamplesForFormat
Previously at least i915 failed to provide an implementation, but
exposed ARB_internalformat_query anyway, leading to crashes when
QueryInternalformativ was called.
Default implementation just returns 1 for everything, so is suitable for
any driver which does not support multisampling.
V2: - Move from intel to core mesa.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Christoph Bumiller [Wed, 27 Mar 2013 22:39:06 +0000 (23:39 +0100)]
nvc0: implement MP performance counters
There's more, but this only adds most of the counters that are
handled directly by the shader processors.
The other counter domains are not handled on the multiprocessor and
there are no FIFO object methods for configuring them.
Instead, they have to be programmed by the kernel via PCOUNTER, and
the interface for this isn't in place yet.
Christoph Bumiller [Thu, 21 Mar 2013 18:26:01 +0000 (19:26 +0100)]
nvc0: enable compression when supported
Christoph Bumiller [Wed, 27 Mar 2013 22:38:29 +0000 (23:38 +0100)]
nvc0: use NOUVEAU_GETPARAM_GRAPH_UNITS to get MP count
Christoph Bumiller [Fri, 22 Mar 2013 12:49:40 +0000 (13:49 +0100)]
nv50,nvc0: fix 3d blits, restore viewport after blit
Christoph Bumiller [Mon, 25 Mar 2013 18:41:18 +0000 (19:41 +0100)]
nv50: fix 3D render target setup
Brian Paul [Thu, 28 Mar 2013 23:17:26 +0000 (17:17 -0600)]
llvmpipe: put .bmp extension on dumped image files
Brian Paul [Thu, 28 Mar 2013 23:17:26 +0000 (17:17 -0600)]
llvmpipe: add 'f' suffix to 1.0 in fixed_to_float()
Brian Paul [Thu, 28 Mar 2013 23:03:57 +0000 (17:03 -0600)]
draw: fix some build breakage when LLVM is not used
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62883
Tested-by: Vinson Lee <vlee@freedesktop.org>
Marek Olšák [Thu, 28 Mar 2013 13:50:01 +0000 (14:50 +0100)]
mesa: handle STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED for parameter printing
Reviewed-by: Brian Paul <brianp@vmware.com>
Kenneth Graunke [Thu, 28 Mar 2013 07:18:46 +0000 (00:18 -0700)]
i965: Tidy shader time printing code by using printf's field widths.
We can use %-6s%-6s rather than manually counting characters, resulting
in much more readable code.
This necessitates a small secondary change: using "total fs16" and ""
now causes the "" string to be padded out to 6 characters, resulting in
too much whitespace. Splitting it into "total" and "fs16" produces the
same output as before.
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
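The padding effect described above can be reproduced with a small sketch. Python's % operator follows the same rules as C printf for "%-6s" (left-justify, pad to at least 6 characters, never truncate), so it is used here as a stand-in. The trailing "|" and the helper name are illustration only and not part of the driver's output.

```python
FMT = "%-6s%-6s|"


def row(first, second):
    """Format two left-justified fields, each padded to at least 6 chars."""
    return FMT % (first, second)


# One combined label plus an empty string: the empty field is still
# padded out to 6 spaces, which produces the extra whitespace.
combined = row("total fs16", "")

# Two separate labels: each one fills its own 6-character field.
split = row("total", "fs16")

print(repr(combined))  # 'total fs16      |'
print(repr(split))     # 'total fs16  |'
```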
Eric Anholt [Tue, 19 Mar 2013 23:28:54 +0000 (16:28 -0700)]
i965/vs: Include URB payload setup in shader_time.
This much more accurately reflects the cost of the vertex shader, since
the payload setup is often a significant fraction of the instructions in
the VS.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 18 Dec 2012 01:11:21 +0000 (17:11 -0800)]
i965/vs: Use a send from a 2-register VGRF for shader time writes.
This will let us emit it later, after we're setting up MRFs for the
URB write.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 18 Dec 2012 01:03:02 +0000 (17:03 -0800)]
i965/vs: Teach copy propagation about sends from GRFs.
This incidentally also teaches it a bit about gen6 math -- we now allow
unswizzled, unmodified GRF temps as the sources for math.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 18 Dec 2012 00:48:20 +0000 (16:48 -0800)]
i965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs.
v2: Fix silly bool handling, and don't add new tabs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 19 Mar 2013 22:14:20 +0000 (15:14 -0700)]
i965/fs: Include everything but the final FB write in shader_time.
Previously, if you just wrote a constant color to the render target, no
time got noted at all. This is convenient for doing single-instruction
timings, but not so much for actual program analysis.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 19 Mar 2013 22:28:11 +0000 (15:28 -0700)]
i965/fs: Switch shader_time writes to using GRFs.
This avoids conflicts between shader_time and FB writes, so we can include
more of the program under our profiling. This does mean hiding more of
the message setup from the optimizer, which doesn't have a way to handle
multi-reg sends from GRFs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 19 Mar 2013 21:28:29 +0000 (14:28 -0700)]
i965: Provide more detailed information to match shader_time to programs.
Ken asked me the other day what -1 vs 0 vs 3 vs other meant in our shader
names, and I realized that it was really unclear. I'd like to do even
better, like noting which one is the clear shader, but that would require
exposing the metaops struct to the driver.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Tue, 19 Mar 2013 21:27:42 +0000 (14:27 -0700)]
i965: Track ARB program state along with GLSL state for shader_time.
This will let us do much better printouts for non-GLSL programs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek Olšák [Wed, 27 Mar 2013 00:56:25 +0000 (01:56 +0100)]
st/dri: fix crash with HUD and single buffering
Marek Olšák [Thu, 28 Mar 2013 13:51:28 +0000 (14:51 +0100)]
st/mesa: remove leftover printfs from ReadPixels
Oops, I thought I had removed all debugging code.
Eric Anholt [Fri, 22 Mar 2013 21:11:25 +0000 (14:11 -0700)]
i965/fs: Improve performance of copy propagation dataflow using bitsets.
Reduces compile time of l4d2's slowest shader by 17.8% +/- 1.3% (n=10).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
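For readers unfamiliar with bitset-based dataflow, here is a generic sketch of the idea, not the i965 code itself. It assumes every candidate copy is given one bit, uses Python integers as bitsets, and all names (copy, kill, preds, available_copies) are hypothetical. The point is that the per-block set operations collapse to a few AND/OR operations per iteration.

```python
def available_copies(copy, kill, preds):
    """Forward dataflow over bitsets.

    copy[b]  : bits for copies created in block b
    kill[b]  : bits for copies invalidated in block b
    preds[b] : list of predecessor block indices
    Returns (live_in, live_out), one bitset per block.
    """
    n = len(copy)
    live_in = [0] * n
    live_out = list(copy)
    changed = True
    while changed:
        changed = False
        for b in range(n):
            if not preds[b]:
                continue  # entry block: nothing flows in
            # A copy is available on entry only if every predecessor has it.
            incoming = live_out[preds[b][0]]
            for p in preds[b][1:]:
                incoming &= live_out[p]
            new_out = copy[b] | (incoming & ~kill[b])
            if incoming != live_in[b] or new_out != live_out[b]:
                live_in[b], live_out[b] = incoming, new_out
                changed = True
    return live_in, live_out


# Diamond-shaped CFG: 0 -> 1, 0 -> 2, then 1 and 2 join in 3.
preds = [[], [0], [0], [1, 2]]
copy = [0b011, 0b000, 0b100, 0b000]
kill = [0b000, 0b001, 0b000, 0b000]
live_in, live_out = available_copies(copy, kill, preds)
```

In this example block 1 kills copy 0, so only copy 1 survives the join into block 3.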
Zack Rusin [Wed, 27 Mar 2013 00:53:27 +0000 (17:53 -0700)]
llvmpipe/draw: Fix texture sampling in geometry shaders
We weren't correctly propagating the samplers and sampler views
when they were related to geometry shaders.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Tue, 26 Mar 2013 19:35:45 +0000 (12:35 -0700)]
draw/llvm: Cleanup the store debugging code
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Tue, 26 Mar 2013 19:32:30 +0000 (12:32 -0700)]
draw: Allocate the output buffer for output primitives
We were allocating the output buffer but using the input
primitives. We need to allocate that buffer using the
maximum number of output, not input, primitives.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Wed, 27 Mar 2013 09:38:32 +0000 (02:38 -0700)]
gallivm: Implement the breakc instruction
Required by more modern examples. Like BRK but with a condition.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Wed, 27 Mar 2013 09:30:38 +0000 (02:30 -0700)]
gallivm: implement implicit primitive flushing
TGSI semantics currently require an implicit endprim at the end
of GS if an ending primitive hasn't been emitted.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Mon, 18 Feb 2013 12:00:19 +0000 (04:00 -0800)]
gallium/llvm: implement geometry shaders in the llvm paths
This commit implements code generation of the geometry shaders in
the SOA paths. All the code is there but bugs are likely present.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Thu, 14 Mar 2013 08:07:28 +0000 (01:07 -0700)]
draw/gs: Fetch more than one primitive per invocation
Allows executing gs on up to 4 primitives at a time. Will also be
required by the llvm code because there we definitely don't want
to flush with just a single primitive.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Thu, 14 Mar 2013 07:42:06 +0000 (00:42 -0700)]
draw/gs: Abstract the portions of GS that are tgsi specific
To be able to add llvm paths later on we need to have some common
interface for them.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Thu, 14 Mar 2013 00:13:21 +0000 (17:13 -0700)]
draw/llvm: Remove unused gs_constants from jit_context
The member was never used and we'll need to handle it differently
because gs will also need samplers/textures setup.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Zack Rusin [Wed, 13 Mar 2013 23:46:24 +0000 (16:46 -0700)]
graw/gs: add missing max output vertices to all tests
A few tests were missing this crucial property.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Jerome Glisse [Mon, 25 Mar 2013 15:46:38 +0000 (11:46 -0400)]
radeonsi: add cs tracing v3
Same as on r600: trace cs execution by writing the cs offset after each
state. This allows pinpointing a lockup inside the command stream and
narrows down the scope of lockup investigation.
v2: Use WRITE_DATA packet instead of WRITE_MEM
v3: Remove useless nop packet
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Chris Forbes [Mon, 25 Mar 2013 10:19:07 +0000 (23:19 +1300)]
mesa: only check sample count if we actually wanted multisampling
Fixes various test fallout from
90b5a2425a on Pineview, which claims to
support ARB_internalformat_query but doesn't actually provide the
driverfunc.
That driver is still broken [GetInternalformativ will still segfault!]
but it was silly to be going through the sample count logic in the
nonmultisampling case at all.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Christian König [Tue, 26 Mar 2013 14:08:00 +0000 (15:08 +0100)]
radeon/llvm: document LLVM commit
We need at least that revision to work correctly now.
Signed-off-by: Christian König <christian.koenig@amd.com>
Christian König [Sun, 17 Mar 2013 15:02:42 +0000 (16:02 +0100)]
radeonsi: add preloading for all samplers
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Christian König [Fri, 15 Mar 2013 14:53:25 +0000 (15:53 +0100)]
radeonsi: add preloading of all constants
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Christian König [Wed, 20 Mar 2013 11:10:35 +0000 (12:10 +0100)]
radeonsi: mark most intrinsics as readnone/nounwind
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Christian König [Wed, 20 Mar 2013 13:37:21 +0000 (14:37 +0100)]
radeonsi: mark all loads as constant
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Christian König [Tue, 19 Mar 2013 12:36:26 +0000 (13:36 +0100)]
radeonsi: remove wqm intrinsic
Now the backend handles that itself.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Christian König [Tue, 26 Mar 2013 10:37:45 +0000 (11:37 +0100)]
radeon/llvm: remove unneeded inclusion
The include isn't needed and the file has moved with LLVM master.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Christian König [Sun, 24 Mar 2013 15:24:52 +0000 (16:24 +0100)]
glsl_to_tgsi: avoid creating arrays if driver doesn't support them
Avoid creating arrays if we replace indirect addressing anyway.
Signed-off-by: Christian König <christian.koenig@amd.com>
Christian König [Mon, 25 Mar 2013 09:57:48 +0000 (10:57 +0100)]
glsl_to_tgsi: make simplify_cmp work with arrays
Even when we have arrays it is possible for simplify_cmp
to work on temps, just not on arrays.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=62696
Signed-off-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Tue, 26 Mar 2013 00:37:40 +0000 (01:37 +0100)]
gallium/docs: document get_driver_query_info
Marek Olšák [Fri, 22 Mar 2013 01:39:42 +0000 (02:39 +0100)]
r600g: add a driver query returning the amount of requested VRAM and GTT memory
Marek Olšák [Thu, 21 Mar 2013 18:44:18 +0000 (19:44 +0100)]
r600g: add a driver query returning the number of draw_vbo calls
between begin_query and end_query
Marek Olšák [Thu, 21 Mar 2013 18:51:30 +0000 (19:51 +0100)]
st/dri: integrate the HUD
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Thu, 21 Mar 2013 18:47:06 +0000 (19:47 +0100)]
gallium: implement a heads-up display module
Reviewed-by: Brian Paul <brianp@vmware.com>
v2: lots of cosmetic changes
Marek Olšák [Thu, 21 Mar 2013 18:32:24 +0000 (19:32 +0100)]
gallium: add interface for driver queries like performance counters, etc.
The pipe query interface is reused. The list of available queries can be
obtained using pipe_screen::get_driver_query_info.
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Fri, 22 Mar 2013 16:04:15 +0000 (17:04 +0100)]
gallium/tgsi: fix valgrind warning
"Conditional jump or move depends on uninitialised value(s)"
Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Thu, 21 Mar 2013 23:31:47 +0000 (00:31 +0100)]
st/mesa: fix crash with blit-based GetTexImage
https://bugs.freedesktop.org/show_bug.cgi?id=62573
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
Marek Olšák [Mon, 18 Mar 2013 21:36:21 +0000 (22:36 +0100)]
cso: add constant buffer save/restore feature for postprocessing
Postprocessing is an internal meta op and should restore the states
it changes.
Marek Olšák [Thu, 21 Mar 2013 18:29:29 +0000 (19:29 +0100)]
radeonsi: fix crash while binding a NULL constant buffer
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Thu, 21 Mar 2013 18:29:29 +0000 (19:29 +0100)]
r600g: fix crash while binding a NULL constant buffer
Marek Olšák [Thu, 21 Mar 2013 18:29:29 +0000 (19:29 +0100)]
r300g: fix crash while binding a NULL constant buffer
Martin Andersson [Mon, 25 Mar 2013 22:11:34 +0000 (23:11 +0100)]
r600g: Use virtual address for PIPE_QUERY_SO* in r600_emit_query_end
Virtual address is used for PIPE_QUERY_SO* queries in
r600_emit_query_begin, but not in r600_emit_query_end.
This will trigger a GPU fault when one of those queries is
made and virtual address is enabled.
Note: this is a candidate for the 9.1 branch
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rob Clark [Mon, 25 Mar 2013 18:57:24 +0000 (14:57 -0400)]
freedreno: use u_debug for debug env vars
Signed-off-by: Rob Clark <robdclark@gmail.com>
Jordan Justen [Sun, 10 Mar 2013 01:12:09 +0000 (17:12 -0800)]
glsl ir: add as_dereference_record
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Brian Paul [Mon, 25 Mar 2013 16:24:01 +0000 (10:24 -0600)]
gallium: undef PACKAGE_* macros to silence warnings
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Mon, 25 Mar 2013 16:23:42 +0000 (10:23 -0600)]
gallivm: init vars to silence warnings
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Mon, 25 Mar 2013 16:23:27 +0000 (10:23 -0600)]
swrast: init vars to silence warnings
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Rob Clark [Mon, 25 Mar 2013 15:55:18 +0000 (11:55 -0400)]
freedreno: prefer sw upload for textures
Since we are UMA, in most cases the GPU blit doesn't make much sense for
texture upload.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 6 Mar 2013 15:45:58 +0000 (10:45 -0500)]
freedreno: track maximal scissor bounds
Optimize out parts of the render target that are scissored out by taking
into account maximal scissor bounds in fd_gmem_render_tiles().
This is a big win on things like gnome-shell which frequently do partial
screen updates.
Signed-off-by: Rob Clark <robdclark@gmail.com>
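The idea of tracking maximal scissor bounds can be sketched as follows. This is a simplified illustration under stated assumptions, not the freedreno implementation: rectangles are (x0, y0, x1, y1) tuples with exclusive x1/y1, the tile size is fixed, and both helper names are hypothetical.

```python
def union(bounds, rect):
    """Grow bounds to cover rect; None means nothing has been drawn yet."""
    if bounds is None:
        return rect
    return (min(bounds[0], rect[0]), min(bounds[1], rect[1]),
            max(bounds[2], rect[2]), max(bounds[3], rect[3]))


def tiles_to_render(bounds, width, height, tile_w, tile_h):
    """Return the (tx, ty) tile indices that overlap the tracked bounds."""
    if bounds is None:
        return []
    # Clamp the bounds to the render target.
    x0, y0 = max(bounds[0], 0), max(bounds[1], 0)
    x1, y1 = min(bounds[2], width), min(bounds[3], height)
    if x0 >= x1 or y0 >= y1:
        return []
    return [(tx, ty)
            for ty in range(y0 // tile_h, (y1 - 1) // tile_h + 1)
            for tx in range(x0 // tile_w, (x1 - 1) // tile_w + 1)]


# Two small scissored draws on a 128x128 target split into 32x32 tiles.
bounds = None
for scissor in [(10, 10, 20, 20), (30, 5, 40, 15)]:
    bounds = union(bounds, scissor)

tiles = tiles_to_render(bounds, 128, 128, 32, 32)
```

Here only 2 of the 16 tiles need to be processed, which is the kind of saving a partial screen update would see.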
Adrian Marius Negreanu [Fri, 22 Mar 2013 11:42:40 +0000 (13:42 +0200)]
android: fix Android.mk bug in mesa/drivers/dri/common
Target-specific variables are undefined when used as prerequisites.
Instead, use secondary expansion.
I noticed this when building the patch:
i965: Add a driconf option to disable flush throttling
Signed-off-by: Adrian Marius Negreanu <adrian.m.negreanu@intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Eric Anholt [Mon, 18 Mar 2013 15:42:19 +0000 (08:42 -0700)]
mesa: Disable validate_ir_tree() on release builds.
Since half of ir_validate uses asserts() (the other using printf() then
abort()), there's not much use to calling it in a release build. Cuts
6.3% of the startup time of TF2.
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Roland Scheidegger [Sun, 24 Mar 2013 01:08:01 +0000 (02:08 +0100)]
gallivm: move code for dealing with rgb9e5 and r11g11b10 formats to own file
This is really not generic conversion stuff and the code very particular to
these formats.
Vinson Lee [Sat, 23 Mar 2013 07:24:52 +0000 (00:24 -0700)]
llvmpipe: Fix assertions with assignment instead of comparison.
Fixes assign instead of compare defects reported by Coverity.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Paul Berry [Fri, 22 Mar 2013 00:14:53 +0000 (17:14 -0700)]
i965: Shrink brw_vue_map struct.
This patch changes the arrays in brw_vue_map (which only ever contain
values from -1 to 58) from ints to signed chars. This reduces the
size of the struct from 488 bytes to 136 bytes.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: fix STATIC_ASSERT to use 127 instead of 128.
Reviewed-by: Eric Anholt <eric@anholt.net>
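The size effect of switching array elements from int to signed char can be illustrated with ctypes. This is a sketch with a hypothetical layout: the field names and SLOT_COUNT are assumptions and do not reproduce the real brw_vue_map, so the sizes will not match the 488 and 136 bytes quoted above.

```python
import ctypes

SLOT_COUNT = 64   # hypothetical array length, not the driver's value
SCHAR_MAX = 127   # largest value a signed char can hold


class VueMapInts(ctypes.Structure):
    _fields_ = [("varying_to_slot", ctypes.c_int * SLOT_COUNT),
                ("slot_to_varying", ctypes.c_int * SLOT_COUNT),
                ("num_slots", ctypes.c_int)]


class VueMapChars(ctypes.Structure):
    # c_byte is C's signed char: one byte per element.
    _fields_ = [("varying_to_slot", ctypes.c_byte * SLOT_COUNT),
                ("slot_to_varying", ctypes.c_byte * SLOT_COUNT),
                ("num_slots", ctypes.c_int)]


# Same role as the STATIC_ASSERT mentioned in v2: every slot index
# must fit in a signed char.
assert SLOT_COUNT - 1 <= SCHAR_MAX

small = VueMapChars()
small.varying_to_slot[0] = -1   # "not present" marker still fits
small.varying_to_slot[1] = 58   # largest value named in the commit message

print(ctypes.sizeof(VueMapInts), ctypes.sizeof(VueMapChars))
```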
Paul Berry [Wed, 20 Mar 2013 17:15:52 +0000 (10:15 -0700)]
i965/fs: Rename vp_outputs_written to input_slots_valid.
With the introduction of geometry shaders, fragment inputs will no
longer come exclusively from the vertex shader; sometimes they come
from the geometry shader. So the name "vp_outputs_written" will
become a misnomer. This patch renames vp_outputs_written to
input_slots_valid, to reflect the true meaning of the bitfield from
the fragment shader's point of view: it indicates which of the
possible input slots contain valid data that was written by the
previous shader stage.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Sun, 17 Mar 2013 18:29:28 +0000 (11:29 -0700)]
i965: Use brw.vue_map_geom_out instead of VS output VUE map where appropriate.
This patch modifies post-GS pipeline stages (transform feedback, clip,
sf, fs) to refer to the VUE map through brw->vue_map_geom_out rather
than brw->vs.prog_data->vue_map. This ensures that when geometry
shader support is added, these pipeline stages will consult the
geometry shader output VUE map when appropriate, rather than the
vertex shader output VUE map.
v2: Fixed some stale "CACHE_NEW_VS_PROG" comments.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Mon, 18 Feb 2013 18:16:02 +0000 (10:16 -0800)]
i965: Store the geometry output VUE map in brw_context.
Currently, the GPU pipeline has one active VUE map in effect at any
given time--the one representing the layout of vertex data coming from
the vertex shader. However, when geometry shaders are added, they
will have their own independent VUE map. Later pipeline stages (clip,
sf, fs) will need to consult the geometry shader VUE map if a geometry
shader is in use, and the vertex shader VUE map otherwise.
This patch adds a new field to brw_context, vue_map_geom_out, which
contains the VUE map that should be used by later pipeline stages. It
also adds a new state flag, BRW_NEW_VUE_MAP_GEOM_OUT, which is
signalled whenever the contents of the VUE map change.
Since we don't support geometry shaders yet, vue_map_geom_out is
currently set only by the brw_vs_prog state atom.
v2: Don't set vue_map_geom_out in do_vs_prog--that's redundant and
possibly problematic for precompiles. Only set it in
brw_upload_vs_prog. Also, make a copy instead of using a
pointer--this makes it possible to detect when the VUE map hasn't
changed, so we can avoid redundant state uploads.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
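The v2 note about copying instead of pointing can be sketched like this. It is a minimal model, not the driver code: the Context class, the dictionary standing in for a VUE map, and the string flag are all hypothetical. Keeping a private copy is what makes the comparison meaningful, since a shared pointer would always compare equal to itself.

```python
import copy

BRW_NEW_VUE_MAP_GEOM_OUT = "BRW_NEW_VUE_MAP_GEOM_OUT"


class Context:
    def __init__(self):
        self.vue_map_geom_out = None   # private copy of the last VUE map
        self.dirty = set()             # pending state flags


def upload_vs_prog(ctx, vs_vue_map):
    """Flag the state only when the VUE map contents really changed."""
    if ctx.vue_map_geom_out != vs_vue_map:
        ctx.vue_map_geom_out = copy.deepcopy(vs_vue_map)
        ctx.dirty.add(BRW_NEW_VUE_MAP_GEOM_OUT)


ctx = Context()
vue_map = {"slots_valid": 0b0111, "num_slots": 3}

upload_vs_prog(ctx, vue_map)
first = BRW_NEW_VUE_MAP_GEOM_OUT in ctx.dirty    # initial upload: flagged
ctx.dirty.clear()

upload_vs_prog(ctx, vue_map)
second = BRW_NEW_VUE_MAP_GEOM_OUT in ctx.dirty   # unchanged: no re-upload
ctx.dirty.clear()

vue_map["num_slots"] = 4                         # mutated in place
upload_vs_prog(ctx, vue_map)
third = BRW_NEW_VUE_MAP_GEOM_OUT in ctx.dirty    # change detected via the copy
```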
Paul Berry [Sun, 17 Mar 2013 18:13:56 +0000 (11:13 -0700)]
i965: Move brw_vs_prog_data::outputs_written into VUE map.
Future patches will allow for there to be separate VUE maps when both
a geometry shader and a vertex shader are in use. When this happens,
we will want to have correspondingly separate outputs_written
bitfields. Moving outputs_written into the VUE map will make this
easy.
For consistency with the terminology used in the VUE map, the bitfield
is renamed to "slots_valid" in the process.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Sat, 23 Mar 2013 15:23:03 +0000 (08:23 -0700)]
i965/gen7: Use WE_all mode when enabling channel masks for URB write.
Gen7 adds mask bits to the message header for a URB write which allow
the write to apply only to certain channels. We don't use this
functionality, so to ensure that the entire write always occurs, we
emit an OR instruction to set the mask bits.
With the advent of geometry shaders, URB writes won't just happen at
the end of a thread; they will happen in mid-thread too. Thus, we can
no longer rely on channel 0 being enabled, so we need to emit the OR
instruction in WE_all mode to ensure that it is executed.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Sat, 23 Mar 2013 22:53:33 +0000 (15:53 -0700)]
i965: Rename BRW_VARYING_SLOT_MAX -> BRW_VARYING_SLOT_COUNT.
The new name clarifies that it represents *one more* than the maximum
possible brw_varying_slot value.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Paul Berry [Fri, 22 Mar 2013 16:39:11 +0000 (09:39 -0700)]
i965: Clarify nomenclature: vert_result -> varying
This patch removes the terminology "vert_result" from the i965 driver,
replacing it with "varying". The old terminology, "vert_result", was
confusing because (a) it referred to the enum gl_vert_result, which no
longer exists (it was replaced with gl_varying_slot), and (b) it
implied a vertex output, but with the advent of geometry shaders, it
could be either a vertex or a geometry output, depending on what shaders
are in use. The generic term "varying" is less confusing.
No functional change.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Whitespace fixes.
Chris Forbes [Sun, 24 Mar 2013 03:21:01 +0000 (16:21 +1300)]
i965: bump MAX_DEPTH_TEXTURE_SAMPLES to 4/8
Bump MAX_DEPTH_TEXTURE_SAMPLES to match what GetInternalformativ is
claiming. Since that limit is what is actually enforced now, this
doesn't actually change anything except the queried value.
There's still no piglits verifying that multisample depth textures work,
but this works in the Unigine demos.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chris Forbes [Sun, 3 Mar 2013 08:46:12 +0000 (21:46 +1300)]
mesa: use _mesa_check_sample_count() for multisample textures
Extends _mesa_check_sample_count() to properly support the
TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY targets, which
have subtly different limits than renderbuffers.
This resolves the remaining TODO in the implementation of
TexImage*DMultisample.
V2: - Don't introduce spurious block.
- Do this in multisample.c instead.
- Fix typo in error message.
- Inline spec quotes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chris Forbes [Wed, 6 Feb 2013 07:42:53 +0000 (20:42 +1300)]
mesa: helper for checking renderbuffer sample count
Pulls the checking of the sample count into a helper function, and
extends the existing logic to include the interactions with both
ARB_texture_multisample and ARB_internalformat_query.
_mesa_check_sample_count() checks a desired sample count against
a combination of target/internalformat, and returns the error enum
to be produced, if any. Unfortunately the conditions are messy and the
errors vary.
V2: - Tidy up spurious block.
- Move _mesa_check_sample_count() to multisample.c instead; It
doesn't really belong in fbobject.c or teximage.c.
- Inlined spec quotes
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chris Forbes [Sat, 16 Feb 2013 07:47:11 +0000 (20:47 +1300)]
mesa: allow internalformat_query with multisample texture targets
Now that we support ARB_texture_multisample, there are multiple targets
accepted for this query, and they may have target-dependent limits, so
pass the target to the driverfunc.
For example, the sampling hardware may not be able to do general
texelFetch() for some format/sample count combination, but the driver
may still be able to implement a reasonable resolve operation, so it can
be supported for renderbuffers.
V2: - Don't break Gallium compile.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dmitry Cherkassov [Sat, 23 Mar 2013 19:51:22 +0000 (23:51 +0400)]
clover: add dynamic_cast results checking down in clSetKernelArgument() code path.
Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Roland Scheidegger [Sat, 23 Mar 2013 01:05:54 +0000 (02:05 +0100)]
gallivm: Add code for rgb9e5 shared exponent format to float conversion
And use this (and the code for r11g11b10 packed float to float conversion)
in the soa texturing code (the generated code looks quite good).
Should probably be an order of magnitude faster than using the fallback
(not measured).
Tested with piglit texwrap GL_EXT_packed_float and
GL_EXT_texture_shared_exponent respectively (didn't find much else using
it).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
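For reference, a scalar decode of the rgb9e5 shared exponent format looks roughly like the sketch below. It follows the layout from EXT_texture_shared_exponent (three 9-bit mantissas in the low bits, a 5-bit shared exponent in the top bits, bias 15) and is not the vectorized gallivm code from this commit; the function name is made up for illustration.

```python
def rgb9e5_to_float(packed):
    """Decode one 32-bit rgb9e5 value into an (r, g, b) float tuple."""
    r = packed & 0x1FF
    g = (packed >> 9) & 0x1FF
    b = (packed >> 18) & 0x1FF
    exp = (packed >> 27) & 0x1F
    # Bias of 15, plus 9 more because the mantissas have no implicit one
    # and are treated as 9-bit fractions.
    scale = 2.0 ** (exp - 15 - 9)
    return (r * scale, g * scale, b * scale)


# Exponent 24 gives a scale of exactly 1.0, so mantissas map to themselves.
example = rgb9e5_to_float(1 | (2 << 9) | (4 << 18) | (24 << 27))
```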
Marek Olšák [Thu, 14 Mar 2013 16:18:43 +0000 (17:18 +0100)]
gallium,st/mesa: don't use blit-based transfers with software rasterizers
The blit-based paths for TexImage, GetTexImage, and ReadPixels aren't very
fast with software rasterizers. Now Gallium drivers have the ability to turn
them off.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Thu, 14 Mar 2013 15:36:22 +0000 (16:36 +0100)]
st/mesa: implement blit-based ReadPixels
Initial version contributed by: Martin Andersson <g02maran@gmail.com>
This is only used if the memcpy path cannot be used and if no transfer ops
are needed. It's pretty similar to our TexImage and GetTexImage
implementations.
The motivation behind this is to be able to use ReadPixels every frame and
still have at least 20 fps (or 60 fps with a powerful GPU and CPU)
instead of 0.5 fps.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Thu, 14 Mar 2013 14:20:27 +0000 (15:20 +0100)]
mesa: add common format-independent memcpy-based ReadPixels path
I'll need the _mesa_readpixels_needs_slow_path function for the blit-based
version, but it's also useful to have this memcpy-based path in one place
and not scattered across several functions.
v2: add "const" to function parameters
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Thu, 14 Mar 2013 13:22:56 +0000 (14:22 +0100)]
mesa: add helper func for checking combined depthstencil buffers from st/mesa
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Thu, 14 Mar 2013 12:15:54 +0000 (13:15 +0100)]
mesa: add a common function returning transfer ops for ReadPixels
I'll need both new functions for later. For now, it consolidates the code
for determining what the transfer ops should be and makes it a little bit
smarter.
v2: added "const"
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Wed, 13 Mar 2013 15:47:21 +0000 (16:47 +0100)]
mesa: handle HALF_FLOAT like FLOAT in get_tex_rgba
NOTE: This is a candidate for the stable branches.
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
Roland Scheidegger [Fri, 22 Mar 2013 19:09:18 +0000 (20:09 +0100)]
llvmpipe: add EXT_packed_float render target format support
New conversion code to handle conversion from/to r11g11b10 AoS to/from
SoA floats, and also add code for conversion from rgb9e5 AoS to float SoA
(which works pretty much the same as r11g11b10 except for the packing).
(This code should also be used for texture sampling instead of
relying on u_format conversion but it's not yet, so rgb9e5 is unused.)
Unfortunately a crazy amount of hacks is necessary to get the conversion
code running in llvmpipe's generate_unswizzled_blend, which isn't well
suited for formats where the storage representation has nothing to do
with what's needed for blending (moreover, the conversion will convert
from packed AoS values, which is the storage format, to float SoA values,
because this is much more natural for the conversion, and likewise from
SoA values to packed AoS values - but the "blend" (which includes
trivial things like partial mask) works on AoS values, so incoming fs
values will go SoA->AoS, values from destination will go packed
AoS->SoA->AoS, then do blend, then AoS->SoA->packed AoS which probably
isn't the most efficient way though the shuffles are probably bearable).
Passes piglit fbo-blending-formats (with GL_EXT_packed_float parameter),
still need to verify Inf/NaNs (where most of the complexity in the
conversion comes from actually).
v2: drop the (very bogus) rgb9e5 part, and do component extraction
in the helper code for r11g11b10 to float conversion, making the code
slightly more compact (suggested by Jose), now that there are no other
callers left this works quite well. (Could do the same for the
opposite way but it's less than ideal there, final part of packing
needs to be done in caller anyway and there'd be another conditional.)
v3: minor style and comment fixes. Also fix a potential issue with
negative zero being potentially returned by max(src, zero) as we
don't have well-defined min/max behavior (fortunately no additional cost).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
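To show where the Inf/NaN complexity mentioned above comes from, here is a scalar reference decode of r11g11b10. It follows the EXT_packed_float layout (unsigned floats with a 5-bit exponent, bias 15, and 6-, 6- and 5-bit mantissas) and is only a sketch: the real code emits LLVM IR on SoA vectors, and the helper names here are hypothetical.

```python
import math


def small_float(bits, mantissa_bits):
    """Decode an unsigned float with a 5-bit exponent (bias 15)."""
    mant = bits & ((1 << mantissa_bits) - 1)
    exp = (bits >> mantissa_bits) & 0x1F
    frac = mant / float(1 << mantissa_bits)
    if exp == 0:
        return frac * 2.0 ** -14              # zero or denormal
    if exp == 31:
        return math.inf if mant == 0 else math.nan
    return (1.0 + frac) * 2.0 ** (exp - 15)   # normal number


def r11g11b10_to_float(packed):
    """Unpack one 32-bit r11g11b10 value into an (r, g, b) float tuple."""
    r = small_float(packed & 0x7FF, 6)
    g = small_float((packed >> 11) & 0x7FF, 6)
    b = small_float((packed >> 22) & 0x3FF, 5)
    return (r, g, b)


# 1.0 has exponent 15 and a zero mantissa in every channel.
ones = r11g11b10_to_float((15 << 6) | ((15 << 6) << 11) | ((15 << 5) << 22))
```

There is no sign bit, so negative inputs have to be handled on the packing side.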
Michel Dänzer [Thu, 21 Mar 2013 16:56:52 +0000 (17:56 +0100)]
r600g: Honour legacy debugging environment variables
This helps minimize confusion / effort when moving between branches or
helping others.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>