Kenneth Graunke [Sun, 26 Jan 2014 03:22:56 +0000 (19:22 -0800)]
i965/fs: Assume fragment color clamping is off when precompiling.
Modern applications frequently use both UNORM buffers and FLOAT buffers
with color clamping disabled. (FLOAT with clamping explicitly enabled
and SNORM buffers appear to be less common.) We don't need to emit
saturates in the fragment shader in either of the common cases.
Mesa sets ctx->Color._ClampFragmentColor to false if all the color
buffers are UNORM. Also, for GL_FIXED_ONLY mode (the default in
legacy OpenGL), it will be false if any FLOAT buffers are bound.
Since the common case is false, that should be our default.
Thanks to Roland Scheidegger for pointing out some faulty logic
in v1 of this patch (unnecessary code and incorrect explanations).
v2: Drop superfluous code and reword commit message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
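The two rules above can be sketched as a small predicate. This is a simplified model, not Mesa's actual code: the enum, the function name, and the two boolean inputs are hypothetical, and only the behavior stated in the message is encoded.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GL clamp modes; not Mesa's types. */
enum clamp_mode { CLAMP_OFF, CLAMP_ON, CLAMP_FIXED_ONLY };

/* Simplified model of when fragment color clamping is needed.
 * all_unorm: every bound color buffer is UNORM.
 * any_float: at least one bound color buffer is FLOAT. */
static bool
needs_fragment_clamp(enum clamp_mode mode, bool all_unorm, bool any_float)
{
   if (all_unorm)
      return false;       /* rule 1: all-UNORM buffers never need it */
   if (mode == CLAMP_FIXED_ONLY)
      return !any_float;  /* rule 2: a bound FLOAT buffer turns it off */
   return mode == CLAMP_ON;
}
```

Under this model, only explicit clamping with a non-UNORM buffer, or GL_FIXED_ONLY with SNORM-only buffers, yields true, which is why false is a reasonable precompile guess.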
Sarah Sharp [Tue, 6 May 2014 19:10:57 +0000 (12:10 -0700)]
egl: Add EGL_CHROMIUM_sync_control extension.
Chromium defined a new GL extension (that isn't registered with Khronos).
We need to add an EGL extension for it, so we can migrate ChromeOS on
Intel systems to use EGL instead of GLX.
http://git.chromium.org/gitweb/?p=chromium/src/third_party/khronos.git;a=commitdiff;h=27cbfdab35c601f70aa150581ad1448d0401f447
The EGL_CHROMIUM_sync_control extension is similar to the GLX extension
OML_sync_control, but only defines one function,
eglGetSyncValuesCHROMIUM, which is equivalent to glXGetSyncValuesOML.
http://www.opengl.org/registry/specs/OML/glx_sync_control.txt
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: Jamey Sharp <jamey@minilop.net>
Cc: Ian Romanick <idr@freedesktop.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Sarah Sharp [Tue, 6 May 2014 19:10:56 +0000 (12:10 -0700)]
Import eglextchromium.h from Chromium.
In order to support the (currently unregistered) Chromium-specific EGL
extension eglGetSyncValuesCHROMIUM on Intel systems, we need to import
the Chromium header that defines it. The file was downloaded from
https://chromium.googlesource.com/chromium/chromium/+/trunk/ui/gl/EGL/eglextchromium.h
It is subject to the license found at
https://chromium.googlesource.com/chromium/chromium/+/trunk/LICENSE
I have imported the header file and added the license text to the top.
The only change was to fix the include guard on the Chromium header to
change the last line from a #define to a #endif, which makes the header
actually compile.
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Cc: Jamey Sharp <jamey@minilop.net>
Cc: Ian Romanick <idr@freedesktop.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
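For readers unfamiliar with the include-guard fix, a minimal sketch of a well-formed guard follows. The macro and typedef names are hypothetical and are not taken from eglextchromium.h.

```c
/* Illustrative header body; EXAMPLE_GUARD_H and example_sync_value_t
 * are hypothetical names. */
#ifndef EXAMPLE_GUARD_H
#define EXAMPLE_GUARD_H

typedef unsigned long long example_sync_value_t;

/* The closing line must be #endif. Writing #define here instead leaves
 * the #ifndef unterminated and the header fails to compile. */
#endif /* EXAMPLE_GUARD_H */
```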
Jeremy Huddleston Sequoia [Tue, 20 May 2014 17:53:00 +0000 (10:53 -0700)]
darwin: Fix test for kCGLPFAOpenGLProfile support at runtime
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Rob Clark [Tue, 20 May 2014 14:52:56 +0000 (10:52 -0400)]
freedreno: don't advertise texture arrays for now
I think a3xx and later should support them (they are part of GLES3), but this
isn't needed for the time being and still needs to be reverse engineered.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Jeremy Huddleston Sequoia [Tue, 20 May 2014 08:37:58 +0000 (01:37 -0700)]
glapi: Avoid heap corruption in _glapi_table
Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com>
Rob Clark [Mon, 19 May 2014 21:56:11 +0000 (17:56 -0400)]
freedreno/a3xx: shadow sampler support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 19 May 2014 21:34:54 +0000 (17:34 -0400)]
freedreno/a3xx/compiler: refactor trans_samp()
Split it up into some smaller fxns so it doesn't grow into a huge
monster as we add things.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 19 May 2014 21:28:31 +0000 (17:28 -0400)]
freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Kenneth Graunke [Mon, 19 May 2014 05:26:59 +0000 (22:26 -0700)]
meta: Avoid _swrast_BlitFramebuffer in the meta CopyTexSubImage code.
This is a replacement for bd44ac8b5ca08016bb064b37edaec95eccfdbcd5
that should actually work.
Fixes Piglit's copyteximage-border on swrast, as well as one of
es3conform's packed_pixels_pixelstore tests.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78546
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77705
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Kenneth Graunke [Mon, 19 May 2014 05:16:01 +0000 (22:16 -0700)]
meta: Split _swrast_BlitFramebuffer out of the meta blit path.
Separating the software fallbacks from the rest of the meta path (which
is usually hardware accelerated) gives callers better control over their
blitting options.
For example, i965 might want to try meta blit, hardware blits, then
swrast as a last resort. Splitting it makes that possible.
This updates all callers to maintain the existing behavior (even in the
few cases where it isn't desirable behavior - later patches can change
that).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
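The kind of caller-controlled fallback order this split enables can be sketched as below. The bit names and the three path functions are hypothetical placeholders, not the Mesa entry points; each path is assumed to return the buffer bits it could not handle.

```c
#include <stdint.h>

/* Hypothetical buffer bits, standing in for the GL buffer mask. */
#define BIT_COLOR   0x1u
#define BIT_DEPTH   0x2u
#define BIT_STENCIL 0x4u

/* Each path returns the bits it could NOT handle. */
typedef uint32_t (*blit_path)(uint32_t mask);

static uint32_t meta_path(uint32_t mask)   { return mask & ~BIT_COLOR; }
static uint32_t hw_path(uint32_t mask)     { return mask & ~BIT_DEPTH; }
static uint32_t swrast_path(uint32_t mask) { (void)mask; return 0; }

/* Try each path in the caller's order until nothing is left to blit. */
static uint32_t
blit_with_fallbacks(uint32_t mask, const blit_path *paths, int n)
{
   for (int i = 0; i < n && mask != 0; i++)
      mask = paths[i](mask);
   return mask;
}
```

A driver could pass `{ meta_path, hw_path, swrast_path }` to get the "software as a last resort" ordering, or omit the last entry to avoid the software fallback entirely.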
Kenneth Graunke [Mon, 19 May 2014 02:32:44 +0000 (19:32 -0700)]
meta: Drop unnecessary early returns in _mesa_meta_BlitFramebuffer.
These aren't necessary - all of the following code is predicated on mask
being non-zero, so no code will get executed anyway.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Courtney Goeltzenleuchter <courtney@lunarg.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Kenneth Graunke [Mon, 19 May 2014 02:24:30 +0000 (19:24 -0700)]
Revert "i965: Don't _swrast_BlitFramebuffer when doing CopyTexSubImage."
This reverts commit bd44ac8b5ca08016bb064b37edaec95eccfdbcd5.
Fixes:
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78842
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78843
Re-breaks:
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77705
but that will be fixed properly in a few commits.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Brian Paul [Mon, 19 May 2014 13:54:30 +0000 (07:54 -0600)]
docs: update the prerequisites section
SCons is required for Windows. Add links to flex/bison for Windows.
Reorder items and improve formatting.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Topi Pohjolainen [Mon, 19 May 2014 07:10:33 +0000 (10:10 +0300)]
i965/fbo: Only try stencil meta blits on gen >= 8
I don't have an ILK at hand but the fix should be trivial.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78872
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Wed, 14 May 2014 01:53:28 +0000 (18:53 -0700)]
mesa: Disable GL_EXT_framebuffer_multisample_blit_scaled on Broadwell.
It's not properly implemented in the meta code, and we don't have time
to fix it for 10.2.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Roland Scheidegger [Fri, 16 May 2014 20:45:27 +0000 (22:45 +0200)]
llvmpipe: do IR counting for shader cache management after optimization.
2ea923cf571235dfe573c35c3f0d90f632bd86d8 had the side effect of IR counting
now being done before IR optimization instead of after. Some quick analysis
shows that there's roughly 1.5 times more IR instructions before optimization
than after, hence the effective shader cache size got quite a bit smaller.
Could counter this with an increase of the instruction limit but it probably
makes more sense to count them after optimizations, so move that code.
Reviewed-by: Brian Paul <brianp@vmware.com>
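To illustrate why the counting point matters, here is a small sketch of a cache budget expressed in IR instructions. The struct, the function, and the numbers are hypothetical; the only figure taken from the message is the rough 1.5x ratio between unoptimized and optimized IR.

```c
#include <stddef.h>

/* Hypothetical shader record: instruction counts before and after
 * optimization. */
struct shader_ir { unsigned before_opt; unsigned after_opt; };

/* Number of shaders (taken in order) that fit under `limit` instructions. */
static size_t
shaders_that_fit(const struct shader_ir *s, size_t n, unsigned limit,
                 int count_after_opt)
{
   unsigned total = 0;
   size_t fit = 0;
   for (size_t i = 0; i < n; i++) {
      unsigned cost = count_after_opt ? s[i].after_opt : s[i].before_opt;
      if (total + cost > limit)
         break;
      total += cost;
      fit++;
   }
   return fit;
}
```

With shaders of 150 instructions before and 100 after optimization and a limit of 600, counting unoptimized IR admits 4 shaders while counting optimized IR admits 6.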
Vinson Lee [Mon, 19 May 2014 07:39:12 +0000 (00:39 -0700)]
i965: Rename brw_disasm to brw_disassemble_inst.
Fixes build error introduced with commit 4b04152db055babb8b06929a0c9ebea5c7f4fb92.
CC test_eu_compact.o
test_eu_compact.c: In function ‘test_compact_instruction’:
test_eu_compact.c:54:3: error: implicit declaration of function ‘brw_disasm’ [-Werror=implicit-function-declaration]
brw_disasm(stderr, &src, brw->gen, false);
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78888
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Kenneth Graunke [Mon, 19 May 2014 06:36:19 +0000 (23:36 -0700)]
i965: Fix a "discards 'const' qualifier" warning.
Trivial.
Kenneth Graunke [Wed, 14 May 2014 08:35:30 +0000 (01:35 -0700)]
i965/fs: Finally kill struct brw_wm_compile (better known as 'c').
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 08:32:54 +0000 (01:32 -0700)]
i965/fs: Stop copying the program key.
We already have a perfectly good copy of the program key, and nobody is
going to modify it. The only reason we copied it was because the
brw_wm_compile structure embedded the key rather than pointing to it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
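The difference between embedding the key and pointing to it can be sketched as follows. The struct layouts and function names are hypothetical and much smaller than the real i965 types.

```c
#include <string.h>

/* Hypothetical key and compile-state types; not the i965 definitions. */
struct prog_key { unsigned flags; unsigned char swizzles[16]; };

/* Embedding the key forces a copy into the compile state... */
struct compile_embedded { struct prog_key key; };

/* ...whereas a const pointer just refers to the caller's key. */
struct compile_by_ref { const struct prog_key *key; };

static void
init_embedded(struct compile_embedded *c, const struct prog_key *key)
{
   memcpy(&c->key, key, sizeof(*key));   /* the kind of copy being dropped */
}

static void
init_by_ref(struct compile_by_ref *c, const struct prog_key *key)
{
   c->key = key;                         /* no copy; read-only access */
}
```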
Kenneth Graunke [Wed, 14 May 2014 07:41:41 +0000 (00:41 -0700)]
i965/fs: Rip struct brw_wm_compile out of the visitors and generators.
Instead, just pass the key and prog_data as separate parameters.
This moves it up a level - one step further toward getting rid of it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 08:21:02 +0000 (01:21 -0700)]
i965/fs: Plumb a mem_ctx all the way through the FS compile.
'c' is going away, but we still need a memory context that lives
for the duration of the compile.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 08:07:32 +0000 (01:07 -0700)]
i965/fs: Use 'c' as the mem_ctx in fs_visitor.
Previously, the memory context situation was a bit of a mess:
fs_visitor allocated its own memory context, and freed it in the
destructor. However, some data produced by fs_visitor (such as the list
of instructions) needs to live beyond when fs_visitor is "done", so the
caller can pass it to fs_generator.
Everything worked out because brw_wm_fs_emit's fs_visitor variables
happen to not go out of scope until the end of the function. But that
meant that moving the declaration of, say, the SIMD16 fs_visitor
instance, could cause everything to explode.
Using a memory context that exists for the duration of the compile is
clearer, and should be equivalent.
Ultimately, we don't want to use 'c', but this matches the behavior of
fs_generator and gen8_fs_generator, so it'll be simple to change later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
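The lifetime argument above can be sketched with a toy memory context. This is not ralloc: `mem_ctx`, `ctx_alloc`, `ctx_free`, and the visitor struct are hypothetical, and the sketch only shows data outliving the object that produced it because the caller owns the context.

```c
#include <stddef.h>
#include <stdlib.h>

/* Allocation header; the union keeps the payload suitably aligned. */
union block { union block *next; max_align_t align; };

/* Toy memory context: every allocation is linked to the context and
 * released when the context is freed. */
struct mem_ctx { union block *head; };

static void *
ctx_alloc(struct mem_ctx *ctx, size_t size)
{
   union block *b = malloc(sizeof(*b) + size);
   if (!b)
      return NULL;
   b->next = ctx->head;
   ctx->head = b;
   return b + 1;   /* payload follows the header */
}

static void
ctx_free(struct mem_ctx *ctx)
{
   while (ctx->head) {
      union block *next = ctx->head->next;
      free(ctx->head);
      ctx->head = next;
   }
}

/* Hypothetical visitor that allocates from the caller's context. */
struct visitor { struct mem_ctx *mem_ctx; int *insts; int num_insts; };

static int *
visitor_emit(struct mem_ctx *ctx, int n)
{
   struct visitor v = { ctx, NULL, n };   /* short-lived local */
   v.insts = ctx_alloc(ctx, (size_t)n * sizeof(int));
   for (int i = 0; v.insts && i < n; i++)
      v.insts[i] = i;
   return v.insts;   /* still valid after `v` goes out of scope */
}
```

The instruction list stays valid until the caller frees the context, regardless of where the visitor is declared.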
Kenneth Graunke [Wed, 14 May 2014 08:04:02 +0000 (01:04 -0700)]
i965/fs: Actually free program data on the error path.
We throw away the data generated during compilation on the success path,
so we really ought to on the failure path as well. The caller has no
access to it anyway, so it's purely leaked.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
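A minimal sketch of the pattern, assuming a hypothetical `compile()` routine whose `fail` flag simulates a compile error and a `live` counter that makes leaks observable:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Outstanding allocations; nonzero after a call would indicate a leak. */
static int live;

static bool
compile(bool fail)
{
   void *prog_data = malloc(64);   /* temporary compile data */
   if (!prog_data)
      return false;
   live++;

   bool ok = !fail;

   /* Success and failure share one exit, so the temporary data is
    * released either way. */
   free(prog_data);
   live--;
   return ok;
}
```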
Kenneth Graunke [Wed, 14 May 2014 07:24:50 +0000 (00:24 -0700)]
i965/fs: Replace c->key with a direct reference in the generators.
'c' is going away. This is also a bit shorter.
Marking the key pointer as const will also deter people from changing
it in these classes, as that's absolutely not OK.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 04:06:00 +0000 (21:06 -0700)]
i965/fs: Replace c->key with a direct reference in fs_visitor.
'c' is going away. This is also shorter.
Marking the key pointer as const will also deter people from changing
it in fs_visitor, as it's absolutely not OK to modify it there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 07:20:24 +0000 (00:20 -0700)]
i965/fs: Replace c->prog_data with a direct reference in the generators.
'c' is going away. This is also a bit shorter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 07:17:03 +0000 (00:17 -0700)]
i965/fs: Replace c->prog_data with a direct reference in fs_visitor.
'c' is going away. This is also a bit shorter.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 07:08:58 +0000 (00:08 -0700)]
i965/fs: Move some flags that affect code generation to fs_visitor.
runtime_check_aads_emit isn't actually used currently, but I believe
we should be using it on Gen4-5, so I haven't eliminated it.
See https://bugs.freedesktop.org/show_bug.cgi?id=78679 for details.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 04:52:51 +0000 (21:52 -0700)]
i965/fs: Move payload register info from brw_wm_compile to fs_visitor.
This data is created by fs_visitor and only used when emitting code,
so keeping it in fs_visitor makes sense. I decided it would be
reasonable to group these all together in a struct, since they're
highly related.
v2: s/nr_payload_regs/payload.num_regs/ in some comments (chrisf).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 04:21:21 +0000 (21:21 -0700)]
i965/fs: Simplify gl_SampleMaskIn handling.
As far as I can tell, there's no point in allocating an extra register
and generating a MOV---we can just use the copy provided as part of our
thread payload directly. It's already in the right format.
Of course, there are zero Piglit tests for this. We don't actually ship
the extension (GL_ARB_gpu_shader5) that exposes this functionality
either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 04:36:28 +0000 (21:36 -0700)]
i965/fs: Rename c->sample_mask_reg to sample_mask_in_reg.
This is actually for gl_SampleMaskIn, which is quite different than
gl_SampleMask. Renaming should help avoid confusion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 04:00:35 +0000 (21:00 -0700)]
i965/fs: Move c->last_scratch into fs_visitor.
Nothing outside of fs_visitor uses it, so we may as well keep it
internal.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 03:51:32 +0000 (20:51 -0700)]
i965/fs: Move total_scratch calculation into fs_visitor::run().
With this one use gone, c->last_scratch is now only used inside
fs_visitor. The rest of the driver uses prog_data->total_scratch.
We already compute similar prog_data fields in fs_visitor, so this
seems reasonable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Wed, 14 May 2014 03:41:27 +0000 (20:41 -0700)]
i965/fs: Move perf_debug about register spilling to a more obvious spot.
The if (!allocated_without_spills) block is an obvious spot for this
performance warning message.
In the Vec4 backend, scratch is also used for indirect access of
temporary arrays. The FS backend doesn't implement that yet, but
if it did, this message would be inaccurate, since scratch access
wouldn't necessarily mean spilling. Moving it preemptively fixes that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Kenneth Graunke [Thu, 15 May 2014 23:10:09 +0000 (16:10 -0700)]
i965: Rename brw/gen8_dump_compile to brw/gen8_disassemble.
"Disassemble" is an accurate description of what this function does.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 15 May 2014 23:02:16 +0000 (16:02 -0700)]
i965: Rename brw_disasm/gen8_disassemble to brw/gen8_disassemble_inst.
We're going to use "disassemble" for the function that disassembles
the whole program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Kenneth Graunke [Thu, 15 May 2014 22:58:07 +0000 (15:58 -0700)]
i965: Fix dump_prog_cache to handle compacted instructions.
dump_prog_cache has interpreted compacted instructions as full size
instructions, decoding garbage and complaining about invalid values.
We can just use brw_dump_compile to handle this correctly in less code.
The output format changes slightly, but it's still perfectly acceptable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
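Why a fixed-stride walk decodes garbage can be seen in a small sketch. It assumes 16-byte full instructions, 8-byte compacted ones, and a compaction control bit at bit 29 of the first dword; the bit position and all names here are assumptions for illustration, not the driver's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FULL_INST_SIZE      16   /* bytes in an uncompacted instruction */
#define COMPACTED_INST_SIZE  8   /* bytes in a compacted instruction */
#define CMPT_CONTROL_BIT    29   /* assumed position of the compaction bit */

/* Count instructions in a mixed stream by stepping 8 or 16 bytes. */
static int
count_insts(const uint8_t *assembly, size_t size)
{
   int n = 0;
   size_t offset = 0;
   while (offset + sizeof(uint32_t) <= size) {
      uint32_t dw0;
      memcpy(&dw0, assembly + offset, sizeof(dw0));
      int compacted = (dw0 >> CMPT_CONTROL_BIT) & 1;
      size_t len = compacted ? COMPACTED_INST_SIZE : FULL_INST_SIZE;
      if (offset + len > size)
         break;   /* truncated trailing instruction */
      offset += len;
      n++;
   }
   return n;
}
```

A loop that always advances 16 bytes would treat the second half of a compacted pair as the tail of one instruction, which matches the "decoding garbage" symptom described above.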
Kenneth Graunke [Thu, 15 May 2014 21:12:48 +0000 (14:12 -0700)]
i965: Use brw_dump_compile for clip, SF, and old GS programs.
Looping over the instructions and calling brw_disasm doesn't handle
compacted instructions. In most cases, this hasn't been a problem since
we don't compact prior to Sandybridge.
However, Sandybridge's transform feedback GS program should already be
compacted, and so this ought to fix decoding of that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ilia Mirkin [Tue, 13 May 2014 15:23:33 +0000 (11:23 -0400)]
nv50/ir: fix integer mul lowering for u32 x u32 -> high u32
UNION appears to expect that all of its sources are conditionally
defined. Otherwise it inserts an unpredicated mov instruction which
overwrites the desired result. This fixes tests that use UMUL_HI, and
much less directly, unsigned integer division by a constant, which uses
this functionality in a peephole pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
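For reference, the value UMUL_HI is expected to produce, and its use in division by a constant, can be sketched in plain C. This is a generic illustration built from 16-bit partial products, not the nv50 lowering itself; the function names are hypothetical.

```c
#include <stdint.h>

/* High 32 bits of a 32x32 product, built from 16-bit partial products. */
static uint32_t
umul_hi(uint32_t a, uint32_t b)
{
   uint32_t a_lo = a & 0xffff, a_hi = a >> 16;
   uint32_t b_lo = b & 0xffff, b_hi = b >> 16;

   uint32_t lo_lo = a_lo * b_lo;
   uint32_t hi_lo = a_hi * b_lo;
   uint32_t lo_hi = a_lo * b_hi;
   uint32_t hi_hi = a_hi * b_hi;

   /* Middle column plus the carry out of the low column; the sum is at
    * most 3 * 0xffff, so it cannot overflow 32 bits. */
   uint32_t mid = (lo_lo >> 16) + (hi_lo & 0xffff) + (lo_hi & 0xffff);

   return hi_hi + (hi_lo >> 16) + (lo_hi >> 16) + (mid >> 16);
}

/* Unsigned division by the constant 3 via multiply-high:
 * x / 3 == (x * 0xaaaaaaab) >> 33 for every 32-bit x. */
static uint32_t
udiv3(uint32_t x)
{
   return umul_hi(x, 0xaaaaaaabu) >> 1;
}
```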
Ilia Mirkin [Tue, 13 May 2014 05:31:20 +0000 (01:31 -0400)]
nv50/ir: make sure that texprep/texquerylod's args get coalesced
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Rob Clark [Sun, 18 May 2014 19:19:34 +0000 (15:19 -0400)]
freedreno/a3xx: use util_format_compose_swizzles()
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 17 May 2014 17:49:52 +0000 (13:49 -0400)]
freedreno/a3xx/compiler: 1D textures
Gallium already gives us height==1 for these, so the texture state is
already set up correctly to emulate 1D textures as an Nx1 2D texture. We
just need to supply the .y coord.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sun, 18 May 2014 12:02:08 +0000 (08:02 -0400)]
freedreno: fix caps
In particular, we want mesa to emulate primitive restart for us.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 17 May 2014 17:50:10 +0000 (13:50 -0400)]
freedreno: fix index buffer offset
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 17 May 2014 00:29:44 +0000 (20:29 -0400)]
freedreno/a3xx: add sRGB texture support
That was easy. Turns out it is just a matter of setting one bit.
Enable sampling from sRGB texture, and therefore enable GL 2.1 :-)
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 17 May 2014 00:07:36 +0000 (20:07 -0400)]
freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Roland Scheidegger [Sat, 17 May 2014 00:03:35 +0000 (02:03 +0200)]
gallivm: (trivial) fix compilation with llvm 3.1, 3.2
I actually checked the getModuleIdentifier() function exists with 3.1 but
missed that the file moved...
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=78803
Roland Scheidegger [Thu, 15 May 2014 23:01:07 +0000 (01:01 +0200)]
gallivm: print out how long it takes to optimize shader IR.
Enabled with GALLIVM_DEBUG=perf (which up to now was only used to print
warnings for unoptimized code).
While some unexpectedly long shader compile times for some shaders were fixed
with 8a9f5ecdb116d0449d63f7b94efbfa8b205d826f this should help recognize such
problems in the future. For now though only available in debug builds (which
are not always suitable for such analysis). And since this uses system time,
it might not be all that accurate (even llvmpipe's own rasterization threads
might be running at the same time, or just other tasks).
(llvmpipe also has LP_DEBUG=counters but this only gives an average per shader
and the total time for all shaders.)
This prints information like this:
optimizing module fs17_variant0 took 1 msec
optimizing module setup_variant_0 took 0 msec
optimizing module draw_llvm_vs_variant0 took 9 msec
optimizing module draw_llvm_vs_variant0 took 12 msec
optimizing module fs17_variant1 took 2 msec
v2: rebase for recent gallivm compilation changes, and print time for whole
modules instead of functions (otherwise it would be very spammy since it would
include all trivial inline sse2 functions), using the shiny new module names,
prying them off LLVM using new helper (not available through C bindings).
Per function timings, while possibly giving more information (if there'd be
a problem only in for instance the partial not the whole function), don't seem
all that useful for now.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
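The measurement approach, and its wall-clock caveat, can be sketched as below. The helper names and the `busy_work` workload are hypothetical; the sketch uses C11 `timespec_get` rather than whatever timer gallivm itself uses.

```c
#include <stdint.h>
#include <time.h>

/* Wall-clock time in microseconds. Like any system clock, it also counts
 * time spent in other threads and processes. */
static int64_t
now_usec(void)
{
   struct timespec ts;
   timespec_get(&ts, TIME_UTC);
   return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Run fn(data) and return the elapsed time in whole milliseconds. */
static int64_t
time_msec(void (*fn)(void *), void *data)
{
   int64_t start = now_usec();
   fn(data);
   return (now_usec() - start) / 1000;
}

/* Hypothetical workload standing in for the optimization passes. */
static void
busy_work(void *data)
{
   unsigned *sum = data;
   for (unsigned i = 0; i < 1000; i++)
      *sum += i;
}
```

Because the result is truncated to whole milliseconds, short runs report 0 msec, much like the `setup_variant_0` line in the sample output.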
Roland Scheidegger [Thu, 15 May 2014 23:00:53 +0000 (01:00 +0200)]
gallivm: give more verbose names to modules
When we had just one module "gallivm" was an appropriate name. But now we have
modules containing all functions for a particular variant, so give it a
corresponding name (this is really just for helping debugging).
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Brian Paul [Thu, 15 May 2014 21:49:14 +0000 (15:49 -0600)]
mesa: fix double-freeing of dispatch tables inside glBegin/End.
We allocate dispatch tables for BeginEnd and OutsideBeginEnd. But
when we destroy the context we were freeing the BeginEnd and Exec
tables. If Exec==BeginEnd we did a double-free. This would happen
if the context was destroyed while inside a glBegin/End pair. Now
free the BeginEnd and OutsideBeginEnd pointers.
Cc: "10.1", "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
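The aliasing problem can be sketched with counters in place of real frees, so the double release is observable without undefined behavior. The struct and function names are hypothetical; only the three pointer names come from the message.

```c
/* Each table records how many times it was released. */
struct table { int freed; };

struct context {
   struct table *BeginEnd;
   struct table *OutsideBeginEnd;
   struct table *Exec;   /* not owned: aliases one of the two above */
};

static void
release(struct table *t)
{
   t->freed++;
}

/* Old teardown: releases BeginEnd and Exec. */
static void
destroy_old(struct context *c)
{
   release(c->BeginEnd);
   release(c->Exec);
}

/* New teardown: releases only the two owned allocations. */
static void
destroy_new(struct context *c)
{
   release(c->BeginEnd);
   release(c->OutsideBeginEnd);
}
```

With Exec pointing at the BeginEnd table, the old teardown releases that table twice and never releases the other one.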
Matt Turner [Tue, 4 Mar 2014 03:10:44 +0000 (19:10 -0800)]
i965: Use binary literals for counter select.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Michel Dänzer [Thu, 15 May 2014 03:23:16 +0000 (12:23 +0900)]
glsl_to_tgsi: Make sure the 'shader' member is always initialized
Fixes the valgrind report below and random crashes with piglit on radeonsi.
==30005== Conditional jump or move depends on uninitialised value(s)
==30005== at 0xB13584E: st_translate_program (st_glsl_to_tgsi.cpp:5100)
==30005== by 0xB14698B: st_translate_fragment_program (st_program.c:747)
==30005== by 0xB14777D: st_get_fp_variant (st_program.c:824)
==30005== by 0xB11219C: get_color_fp_variant (st_cb_drawpixels.c:1042)
==30005== by 0xB1131AE: st_DrawPixels (st_cb_drawpixels.c:1154)
==30005== by 0xAFF8806: _mesa_DrawPixels (drawpix.c:162)
==30005== by 0x4EB86DB: stub_glDrawPixels (generated_dispatch.c:6640)
==30005== by 0x4F1DF08: piglit_visualize_image (piglit-util-gl.c:1574)
==30005== by 0x40691D: draw_image_to_window_system_fb(int, bool) (draw-buffers-common.cpp:733)
==30005== by 0x406C8B: draw_reference_image(bool, bool) (draw-buffers-common.cpp:854)
==30005== by 0x40722A: piglit_display (alpha-to-coverage-dual-src-blend.cpp:117)
==30005== by 0x4EA7168: run_test (piglit_fbo_framework.c:52)
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Roland Scheidegger [Thu, 15 May 2014 15:01:40 +0000 (17:01 +0200)]
gallivm: remove optimization workaround when not having sse 4.1
This workaround doesn't list any llvm version, but it was introduced
2010-06-10 (e277d5c1f6b2c5a6d202561e67d2b6821a69ecc4). It is unlikely
this bug is still present in llvm versions we support (3.1+).
There's no specific test listed, but I ran lp_test_arit (which uses
the mentioned functions) on llvm 3.1 and 3.3 with sse41 disabled and
this pass enabled without issues.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Thu, 15 May 2014 14:26:00 +0000 (16:26 +0200)]
gallivm: remove workaround for reversing optimization pass order.
32bit code generation and llvm >= 2.7 used a different optimization pass
order - this code was initially introduced (2010-07-23) by
815e79e72c1f4aa849c0ee6103621685b678bc9d, apparently due to buggy code being
generated with then brand new llvm versions (which was llvm 2.7 plus pre 2.8
devel).
It seems very highly likely that whatever this bug was it has been fixed in
newer llvm versions, though there's no easy way to test this - the mentioned
piglit test has been removed years ago, and even if you'd build it I'm
sceptical the glsl compiler would still produce the required code to trigger
it.
I have no idea what a good order of passes is, but just remove the workaround
and use the same order everywhere.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Matt Turner [Fri, 9 May 2014 00:27:31 +0000 (17:27 -0700)]
i965/gen8: Make disassembly function match brw's signature.
gen8_dump_compile will be called indirectly by common code used by
generations before and after the gen8 instruction format change.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 9 May 2014 23:15:30 +0000 (16:15 -0700)]
i965: Pass brw_context and assembly separately to brw_dump_compile.
brw_dump_compile will be called indirectly by common code used by
generations before and after the gen8 instruction format change.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 7 May 2014 18:53:22 +0000 (11:53 -0700)]
i965: Pull brw_compact_instructions() out of brw_get_program().
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 8 May 2014 23:06:33 +0000 (16:06 -0700)]
i965/disasm: Align send instruction meta-information with dst.
Has been misaligned since we added instruction offset prefixes.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 1 May 2014 18:20:25 +0000 (11:20 -0700)]
i965/disasm: Disassemble the compaction control bit.
brw_disasm doesn't disassemble compacted instructions, so we uncompact
before disassembling them which would unset the compaction control bit.
Instead pass it as a separate argument.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Mon, 12 May 2014 21:40:40 +0000 (14:40 -0700)]
i965/cfg: Embed exec_node in bblock_link.
In order to remove bblock_link's inheritance of exec_node. Also makes
linked list walk code much nicer.
Acked-by: Eric Anholt <eric@anholt.net>
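Embedding a list node as a member, rather than inheriting from it, can be sketched in plain C as follows. `struct node`, `struct block_link`, and the macro are hypothetical stand-ins, not the exec_node or bblock_link definitions.

```c
#include <stddef.h>

/* Minimal intrusive list node, standing in for exec_node. */
struct node { struct node *next; };

/* The link embeds a node as a member instead of inheriting from it,
 * which keeps the type usable from plain C. */
struct block_link {
   struct node link;
   int block_num;
};

/* Recover the containing struct from a pointer to its embedded node. */
#define LINK_FROM_NODE(n) \
   ((struct block_link *)((char *)(n) - offsetof(struct block_link, link)))

/* Walk the list through the embedded nodes. */
static int
sum_block_nums(struct node *head)
{
   int sum = 0;
   for (struct node *n = head; n != NULL; n = n->next)
      sum += LINK_FROM_NODE(n)->block_num;
   return sum;
}
```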
Matt Turner [Mon, 12 May 2014 16:54:15 +0000 (09:54 -0700)]
i965/cfg: Make brw_cfg.h closer to C-includable.
Only bblock_link's inheritance left.
Acked-by: Eric Anholt <eric@anholt.net>
Matt Turner [Wed, 19 Feb 2014 22:47:57 +0000 (14:47 -0800)]
i965/cfg: Protect brw_cfg.h from multiple inclusion.
Acked-by: Eric Anholt <eric@anholt.net>
Matt Turner [Tue, 13 May 2014 01:16:22 +0000 (18:16 -0700)]
glsl: Add C-callable fprint_ir function.
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Topi Pohjolainen [Mon, 12 May 2014 09:42:28 +0000 (12:42 +0300)]
i965/fb: Use meta path for stencil up/downsampling
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Topi Pohjolainen [Mon, 12 May 2014 09:35:40 +0000 (12:35 +0300)]
i965/meta: Stencil blit for miptree updownsampling
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Topi Pohjolainen [Sat, 19 Apr 2014 14:11:10 +0000 (17:11 +0300)]
i965/fb: Use meta path for stencil blits
This is effective only on gen8 for now as previous generations still
go through blorp.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Topi Pohjolainen [Mon, 5 May 2014 19:18:46 +0000 (22:18 +0300)]
i965/meta: Stencil blits
v2: Create the intel renderbuffer with level hardcoded to zero instead
of overriding it in the surface state configuration. Also moved the
dimension adjustments for tiling, mip level, msaa into the render
buffer creation. Finally prepares for another blit path needed for
miptree updownsampling.
v3 (Ken): Dropped unnecessary memory context for "ralloc_asprintf()"
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Topi Pohjolainen [Fri, 18 Apr 2014 23:02:42 +0000 (02:02 +0300)]
i965: Extend brw_get_rb_for_first_slice() for specified level/layer
v2: Configure stencil directly for final dimensions instead of
adjusting bit by bit for tiling, mip level and msaa.
v3 (Ken): Used non-static constant for horizontal alignment
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Topi Pohjolainen [Wed, 7 May 2014 09:16:28 +0000 (12:16 +0300)]
i965/gen8: Surface state overriding for stencil
v2: Allow hardware to offset accesses to individual layers. Also leave
the mip-level overriding for the creator of the intel renderbuffer
to handle. Merged with "i965/gen8: Allow stencil buffers to be
configured as single sampled"
Ken: I left the "_mesa_problem()" still in place. I think it is clearer
to remove it in a separate patch.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Topi Pohjolainen [Wed, 7 May 2014 07:49:50 +0000 (10:49 +0300)]
i965/wm: Surface state overrides for configuring w-tiled as y-tiled
v2: Use intel_mipmap_tree::total_width in order to get correct alignment
automatically. Also use "mt->total_height / mt->physical_depth0" as
surface height allowing hardware to offset to correct slice.
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jordan Justen [Thu, 15 May 2014 06:06:47 +0000 (06:06 +0000)]
i965 meta up/downsample: Fix renderbuffer _BaseFormat
mt->format is of type mesa_format, and therefore can't be
used with _mesa_base_fbo_format which requires a GLenum input.
On gen8, this fixes various piglit fbo-depthstencil tests with
samples > 1.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Matt Turner [Mon, 5 May 2014 21:08:56 +0000 (14:08 -0700)]
i965: Delete current_insn() function.
Matt Turner [Wed, 14 May 2014 22:15:02 +0000 (15:15 -0700)]
i965: Remove blorp unit tests.
They've served their purpose (in transitioning blorp to using
fs_generator) and now they just necessitate large amounts of manual
labor to regenerate if the disassembler changes.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Emil Velikov [Sat, 10 May 2014 14:59:03 +0000 (15:59 +0100)]
egl-static: include libradeonwinsys.la only once
With this and the previous patch, we no longer have multiple
definitions in the final egl_gallium.so.
v2: Drop duplicate libloader link.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chia-I Wu <olv@lunarg.com> (v1)
Reviewed-by: Tom Stellard <thomas.stellard@amd.com> (v1)
Emil Velikov [Sat, 10 May 2014 13:35:08 +0000 (14:35 +0100)]
gallium/radeon: link in libradeon.la at target level
It makes more sense to link the core and common parts of the driver as the
target is built. Additionally, this will help us stop duplicating symbols
for targets that statically link multiple pipe-drivers. Only egl-static needs
that currently, with more to come.
To simplify things a bit, add a HAVE_GALLIUM_RADEON_COMMON variable.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Emil Velikov [Sat, 10 May 2014 13:25:08 +0000 (14:25 +0100)]
gallium/radeon: build only a single common library libradeon
Just fold libllvmradeon into libradeon.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Rob Clark [Wed, 14 May 2014 16:46:42 +0000 (12:46 -0400)]
freedreno/a3xx: fix write to bogus register
The loops for updating the multiple packed fields in SP_VS_OUT[] and
SP_VS_VPC_DST[] will zero out one register beyond the last one that is
required. This is normally not a problem (and is kinda convenient
when looking at cmdstream dumps) unless we have maximum (16) varyings.
Fix loop termination condition so that this does not happen.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
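The off-by-one described above can be sketched as follows. This is a hypothetical model, not the actual freedreno loop: the function name, the per_reg parameter (fields packed per register) and the off_by_one switch are all assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical model: count how many registers a packing loop touches
 * when `count` fields are packed `per_reg` to a register.  With
 * off_by_one set, the loop bound is inclusive and the loop also touches
 * the register after the last one that is needed. */
static unsigned
regs_touched(unsigned count, unsigned per_reg, int off_by_one)
{
    assert(per_reg > 0);
    unsigned needed = (count + per_reg - 1) / per_reg;   /* round up */
    unsigned touched = 0;

    for (unsigned r = 0; off_by_one ? r <= needed : r < needed; r++)
        touched++;   /* a real driver would emit the register write here */

    return touched;
}
```

With 16 fields packed two per register, the exclusive bound touches 8 registers, while the inclusive bound touches a 9th, which is the kind of stray write the commit describes.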
Rob Clark [Wed, 14 May 2014 15:39:44 +0000 (11:39 -0400)]
freedreno/a3xx: account for special inputs/outputs
We need to size input/output tables big enough for special inputs/
outputs (gl_Position, gl_FrontFacing, etc). While they don't count
towards the hw limit of 16 attributes or 16 varyings, we still need
to track them all the same.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 14 May 2014 15:15:26 +0000 (11:15 -0400)]
freedreno/a3xx: fix MAX_INPUTS shader cap
Hardware only supports 16. Which fd3_shader_variant properly reflected,
but the pipe cap did not, leading to array overflow (and shaders that
could not possibly work).
Also add a bunch of asserts to make problems like this easier to see.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 14 May 2014 15:06:21 +0000 (11:06 -0400)]
freedreno/a3xx: add debug flag to expose glsl130
We are starting to add integer support to the compiler, which does not
get exercised with glsl feature level 120 and without advertising
integer support. But doing so breaks too many things right now. So
for now use a debug flag to conditionally expose the functionality
while it is in development.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Ryan Houdek [Wed, 14 May 2014 02:58:03 +0000 (21:58 -0500)]
freedreno/a3xx/compiler: add KILL_IF
The KILL_IF opcode could potentially be merged into the regular KILL
opcode function. It was a pain to do so, so I've left it separate
for cleanliness.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Ryan Houdek [Wed, 14 May 2014 02:44:41 +0000 (21:44 -0500)]
freedreno/a3xx/compiler: start adding integer support
Adds a large number of TGSI opcodes to the a3xx compiler.
For integer opcodes we have 28 opcodes added.
Adds 4 floating point compare opcodes.
If GLSL 1.30 is enabled, this allows the GLSL 1.30 piglits to have a
completion amount of 432/641.
Signed-off-by: Ryan Houdek <Sonicadvance1@gmail.com>
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Roland Scheidegger [Tue, 13 May 2014 01:43:11 +0000 (03:43 +0200)]
draw: better llvm names for shaders for debugging.
All shaders had the same name.
We could probably use some identifier per shader too, but for now only use
the variant number.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Tue, 13 May 2014 00:59:30 +0000 (02:59 +0200)]
llvmpipe: improve setup shader names (for debugging)
The setup shaders were composed of both a fs shader number and a variant
number. But since they aren't tied to a particular fragment shader, the
former was a fixed zero while the latter was also always zero because
it was never assigned. So, similar to what the fs code does, use an
ever-increasing number to give it a more catchy name (unlike fragment shaders
though where this number is for each explicitly created shader, we just use
it for the implicitly created variants).
And while here, fix whitespace a bit.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Tue, 13 May 2014 00:20:32 +0000 (02:20 +0200)]
llvmpipe: kill off llvmpipe_variant_count
Unused except it was increased for both fs and setup shader variants created.
Probably some leftover from ages ago.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Wed, 14 May 2014 19:06:23 +0000 (21:06 +0200)]
mesa/st: fix number of ubos being declared in a shader
Previously the code used the total number of ubos being declared in the
linked program (so the ubos of all shaders combined). Use the number
from the particular shader instead.
This fixes an assertion failure with piglit arb_uniform_buffer_object-maxblocks
seen in llvmpipe since
8a9f5ecdb116d0449d63f7b94efbfa8b205d826f as it now emits
code for each declared buffer, not just the ones actually used.
CC: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
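The difference between the two counts can be sketched like this. The struct, enum and function names are hypothetical and do not correspond to Mesa's data structures; the point is only that a per-stage count and a whole-program count diverge as soon as more than one stage declares uniform blocks.

```c
#include <assert.h>

/* Hypothetical stand-ins for shader stages and a linked program. */
enum stage { STAGE_VERTEX, STAGE_FRAGMENT, STAGE_COUNT };

struct linked_program {
    unsigned num_ubos[STAGE_COUNT];   /* uniform blocks declared per stage */
};

/* Old behaviour as described: the combined count of every stage. */
static unsigned
total_ubos(const struct linked_program *prog)
{
    unsigned sum = 0;
    for (int s = 0; s < STAGE_COUNT; s++)
        sum += prog->num_ubos[s];
    return sum;
}

/* New behaviour as described: only the blocks of the stage being translated. */
static unsigned
stage_ubos(const struct linked_program *prog, enum stage s)
{
    assert(s >= 0 && s < STAGE_COUNT);
    return prog->num_ubos[s];
}
```

If each stage declares its own maximum number of blocks, the combined count exceeds the per-stage limit, which is consistent with the maxblocks test tripping an assertion.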
Ben Skeggs [Fri, 9 May 2014 05:56:08 +0000 (15:56 +1000)]
nvc0: enable support for maxwell boards
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Skeggs [Fri, 9 May 2014 05:56:05 +0000 (15:56 +1000)]
nvc0: add maxwell (sm50) compiler backend
The big missing part here is proper sched data calculations, but
hopefully the chosen placeholder will be sufficient for now.
Passes piglit as well as GK107 does.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Skeggs [Fri, 9 May 2014 05:56:03 +0000 (15:56 +1000)]
nvc0: maxwell isa has no per-instruction join modifier
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Skeggs [Fri, 9 May 2014 05:56:01 +0000 (15:56 +1000)]
nvc0: replace immd 0 with $rLASTGPR for emit/restart opcodes
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Skeggs [Fri, 9 May 2014 05:55:59 +0000 (15:55 +1000)]
nvc0: move nvc0 lowering pass class definitions into header
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Skeggs [Fri, 9 May 2014 05:55:57 +0000 (15:55 +1000)]
nvc0: bump sched data member to 32-bits
SM50 backend requires 21 bits per instruction, not 8.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
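A small sketch of why an 8-bit member is too narrow for 21 bits of scheduling data. The names below are hypothetical and unrelated to the nvc0 code generator; only the 21-bit and 8-bit widths come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

enum { SM50_SCHED_BITS = 21 };   /* width taken from the commit message */

/* Returns nonzero if `value` is representable in `width` bits (1..32). */
static int
fits_in_width(uint32_t value, unsigned width)
{
    assert(width >= 1 && width <= 32);
    uint32_t mask = (width == 32) ? UINT32_MAX
                                  : ((UINT32_C(1) << width) - 1);
    return (value & mask) == value;
}

/* Largest value that SM50_SCHED_BITS bits can hold. */
static uint32_t
max_sched_value(void)
{
    return (UINT32_C(1) << SM50_SCHED_BITS) - 1;
}
```

Storing such a value in a uint8_t keeps only the low 8 bits, whereas a 32-bit member holds it unchanged.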
Ben Skeggs [Fri, 9 May 2014 05:55:55 +0000 (15:55 +1000)]
nvc0: use vertex arrays for eng3d blit
Maxwell doesn't have immediate-mode.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Skeggs [Fri, 9 May 2014 05:55:53 +0000 (15:55 +1000)]
nvc0: restrict "constant vbo" logic to fermi/kepler classes
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Skeggs [Fri, 9 May 2014 05:55:51 +0000 (15:55 +1000)]
nvc0: replace some vb->stride checks with constant_vbo instead
Maxwell no longer has the methods to set constant attributes, and we'll
want to be treating stride 0 vtxbufs the same as for stride > 0.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Skeggs [Fri, 9 May 2014 05:55:49 +0000 (15:55 +1000)]
nvc0: add maxwell class
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Skeggs [Fri, 9 May 2014 05:55:47 +0000 (15:55 +1000)]
nvc0: allow for easier modification of compiler library routines
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ben Skeggs [Fri, 9 May 2014 05:55:44 +0000 (15:55 +1000)]
nvc0: properly distribute macros in source form
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>