mesa.git
9 years agonv50,nvc0: provide debug messages with shader compilation stats
Ilia Mirkin [Fri, 30 Oct 2015 22:41:09 +0000 (18:41 -0400)]
nv50,nvc0: provide debug messages with shader compilation stats

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonouveau: add support for sending debug messages via KHR_debug
Ilia Mirkin [Fri, 30 Oct 2015 21:23:22 +0000 (17:23 -0400)]
nouveau: add support for sending debug messages via KHR_debug

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agost/clover: provide a path for drivers to call through to pfn_notify
Ilia Mirkin [Sat, 31 Oct 2015 03:25:59 +0000 (23:25 -0400)]
st/clover: provide a path for drivers to call through to pfn_notify

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
[ Francisco Jerez: Clean up clover::context interface by passing
  around a function object. ]

9 years agost/mesa: set debug callback for debug contexts
Ilia Mirkin [Sat, 31 Oct 2015 03:28:01 +0000 (23:28 -0400)]
st/mesa: set debug callback for debug contexts

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agogallium: expose a debug message callback settable by context owner
Ilia Mirkin [Fri, 30 Oct 2015 07:17:35 +0000 (03:17 -0400)]
gallium: expose a debug message callback settable by context owner

This will allow gallium drivers to send messages to KHR_debug endpoints

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
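
The callback arrangement described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the Gallium interface: the names debug_callback, driver_debug_message, message_log and log_message are hypothetical, and the real callback carries more fields than shown here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback record installed by the context owner. */
struct debug_callback {
   void (*debug_message)(void *data, const char *fmt, va_list args);
   void *data;
};

/* Driver-side helper: forward a message only if a callback is installed. */
static void
driver_debug_message(const struct debug_callback *cb, const char *fmt, ...)
{
   va_list args;

   if (!cb || !cb->debug_message)
      return;

   va_start(args, fmt);
   cb->debug_message(cb->data, fmt, args);
   va_end(args);
}

/* Hypothetical owner-side sink that remembers the last message. */
struct message_log {
   char last[128];
   unsigned count;
};

static void
log_message(void *data, const char *fmt, va_list args)
{
   struct message_log *log = data;

   vsnprintf(log->last, sizeof(log->last), fmt, args);
   log->count++;
}
```

The driver never needs to know who consumes the message; with no callback set, the call is a no-op.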
9 years agost/mesa: account for texture views when doing CopyImageSubData
Ilia Mirkin [Thu, 5 Nov 2015 05:33:22 +0000 (00:33 -0500)]
st/mesa: account for texture views when doing CopyImageSubData

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agoi965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE
Iago Toral Quiroga [Fri, 30 Oct 2015 10:10:02 +0000 (11:10 +0100)]
i965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE

Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE
Iago Toral Quiroga [Fri, 30 Oct 2015 09:57:47 +0000 (10:57 +0100)]
i965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE

Do it in the visitor, like we do for other opcodes.

v2: use const, get rid of useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/vec4: Do not mark used direct surfaces in VS_OPCODE_PULL_CONSTANT_LOAD
Iago Toral Quiroga [Fri, 30 Oct 2015 09:24:12 +0000 (10:24 +0100)]
i965/vec4: Do not mark used direct surfaces in VS_OPCODE_PULL_CONSTANT_LOAD

Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const, do not add unnecessary temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD
Iago Toral Quiroga [Fri, 30 Oct 2015 07:48:57 +0000 (08:48 +0100)]
i965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD

Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD
Iago Toral Quiroga [Fri, 30 Oct 2015 07:39:11 +0000 (08:39 +0100)]
i965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD

Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.

v2: Use const and remove useless surf_index temporary (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/skl+: Enable support for 16x multisampling
Neil Roberts [Mon, 7 Sep 2015 17:23:14 +0000 (18:23 +0100)]
i965/skl+: Enable support for 16x multisampling

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
9 years agomesa/meta: Use interpolateAtOffset for 16x MSAA copy blit
Neil Roberts [Mon, 28 Sep 2015 17:22:32 +0000 (18:22 +0100)]
mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit

Previously there was a problem in i965 where if 16x MSAA is used then
some of the sample positions are exactly on the 0 x or y axis. When
the MSAA copy blit shader interpolates the texture coordinates at
these sample positions it was possible that it would jump to a
neighboring texel due to rounding errors. It is likely that these
positions would be used on 16x MSAA because that is where they are
defined to be in D3D.

To fix that, this patch makes it use interpolateAtOffset in the blit
shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
is available. This forces it to interpolate the texture coordinates at
the pixel center to avoid these problematic positions.

This fixes ext_framebuffer_multisample-unaligned-blit and
ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
SKL+.

v2: Use interpolateAtOffset instead of interpolateAtSample
v3: Always try to enable GL_ARB_gpu_shader5 in the shader
    [Ian Romanick]

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years agometa/blit: Always try to enable GL_ARB_sample_shading
Neil Roberts [Thu, 22 Oct 2015 08:55:35 +0000 (10:55 +0200)]
meta/blit: Always try to enable GL_ARB_sample_shading

Previously this extension was only enabled when blitting between two
multisampled buffers. However I don't think it does any harm to just
enable it all the time. The ‘enable’ option is used instead of
‘require’ so that the shader will still compile if the extension isn't
available in the cases where it isn't used. This will make the next
patch simpler because it wants to add another optional extension.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years agometa: Support 16x MSAA in the multisample scaled blit shader
Neil Roberts [Wed, 16 Sep 2015 16:43:33 +0000 (17:43 +0100)]
meta: Support 16x MSAA in the multisample scaled blit shader

v2: Fix the x_scale in the shader. Remove the doubts in the commit
    message.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years agoi965/meta: Support 16x MSAA in the meta stencil blit
Neil Roberts [Fri, 11 Sep 2015 17:09:46 +0000 (18:09 +0100)]
i965/meta: Support 16x MSAA in the meta stencil blit

The destination rectangle is now drawn at 4x4 the size and the shader
code to calculate the sample number is adjusted accordingly.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
9 years agoi965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA
Neil Roberts [Wed, 9 Sep 2015 16:44:17 +0000 (17:44 +0100)]
i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA

In order to accommodate 16x MSAA, the starting sample pair index is now
3 bits rather than 2 on SKL+.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years agoi965: Support allocating the MCS buffer for 16x MSAA
Neil Roberts [Wed, 9 Sep 2015 13:38:08 +0000 (14:38 +0100)]
i965: Support allocating the MCS buffer for 16x MSAA

When 16 samples are used the MCS buffer needs 64 bits per pixel.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
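
One way to see where the 64 bits per pixel in the commit above could come from is sketched below. Only the 16-sample figure is stated in the commit message; the rule used here (log2(samples) bits per sample, padded up to a power-of-two size of at least 8 bits) and the helper name mcs_bits_per_pixel are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper: MCS bits per pixel for a given sample count.
 * Assumes log2(num_samples) bits per sample, padded to a power of two
 * that is at least 8 bits. */
static unsigned
mcs_bits_per_pixel(unsigned num_samples)
{
   unsigned bits_per_sample = 0;
   unsigned raw, padded = 8;

   assert(num_samples >= 2);

   /* Smallest bit count able to index every sample. */
   while ((1u << bits_per_sample) < num_samples)
      bits_per_sample++;

   raw = bits_per_sample * num_samples;

   /* Pad to the next power-of-two size. */
   while (padded < raw)
      padded *= 2;

   return padded;
}
```

Under these assumptions, 16 samples need 4 bits each, which gives exactly 64 bits with no padding.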
9 years agoi965: Support calculating the bits needed to set up 16x MSAA
Neil Roberts [Wed, 9 Sep 2015 13:36:42 +0000 (14:36 +0100)]
i965: Support calculating the bits needed to set up 16x MSAA

The gen7_surface_msaa_bits function already returns the right values
for 16 samples but it just needs its assert to be relaxed.

Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
9 years agoi965/fs: Add a sampler program key for whether the texture is 16x MSAA
Neil Roberts [Tue, 15 Sep 2015 15:34:35 +0000 (16:34 +0100)]
i965/fs: Add a sampler program key for whether the texture is 16x MSAA

When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwhile to avoid using this instruction for
the other sample counts. In order to do that this patch adds a new
member to brw_sampler_prog_key_data to track when a sampler refers to
a buffer with 16 samples.

Note that this isn't done for the vec4 backend because it wouldn't
change how many registers it uses.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
9 years agoi965/vec4/skl+: Use ld2dms_w instead of ld2dms
Neil Roberts [Wed, 9 Sep 2015 14:59:36 +0000 (15:59 +0100)]
i965/vec4/skl+: Use ld2dms_w instead of ld2dms

In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data in the response
still fits in a single register so we just need to ensure we copy both
values rather than just the lower one.

Acked-by: Ben Widawsky <ben@bwidawsk.net>
9 years agoi965/fs/skl+: Use ld2dms_w instead of ld2dms
Neil Roberts [Tue, 8 Sep 2015 14:52:09 +0000 (15:52 +0100)]
i965/fs/skl+: Use ld2dms_w instead of ld2dms

In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data retrieved from the
ld_mcs instruction already returns 4 or 8 registers and is documented
to return zeroes for the mcsh value when the sample count is less than
16.

v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when
    the message length would be too long in SIMD16.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
9 years agoi965: Program 16x MSAA sample positions.
Neil Roberts [Wed, 16 Sep 2015 10:48:42 +0000 (11:48 +0100)]
i965: Program 16x MSAA sample positions.

This is the standard pattern used by the other 3D graphics API.

BDW has slots for these values, but they aren't actually used until
SKL. Even though the documentation for BDW says they must be zero, it
doesn't seem to cause any harm to program them anyway.

The comment above for the 8x sample positions says that the hardware
implements centroid interpolation by picking the centre-most sample
that is inside the primitive. That implies that it might be worthwhile
to pick a pattern that includes 0.5,0.5. However by experimentation
this doesn't seem to actually be the case. With the sample positions
in this patch, if I modify the piglit test below so that it instead
reports the centroid position, it reports 0.492188,0.421875 which
doesn't match any of the positions. If I modify the sample positions
so that they include one at exactly 0.5,0.5 it doesn't help and it
reports another position which is even further from the center for
some reason.

arb_gpu_shader5-interpolateAtSample-different

Kenneth Graunke experimented with some other patterns that have a
higher standard deviation but I think after some discussion it was
decided that it would be better to pick the same pattern as the other
graphics API in case there are games that rely on this pattern.

(Based on a patch by Kenneth Graunke)

Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
9 years agoi965: Handle 16x MSAA in IMS dimension munging code.
Kenneth Graunke [Thu, 29 Jan 2015 07:58:43 +0000 (23:58 -0800)]
i965: Handle 16x MSAA in IMS dimension munging code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
9 years agonir: Rename nir_live_variables.c to nir_liveness.c.
Kenneth Graunke [Wed, 4 Nov 2015 01:16:49 +0000 (17:16 -0800)]
nir: Rename nir_live_variables.c to nir_liveness.c.

It doesn't actually operate on variables.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agonir: Rename live_variables to live_ssa_defs.
Kenneth Graunke [Wed, 4 Nov 2015 01:15:24 +0000 (17:15 -0800)]
nir: Rename live_variables to live_ssa_defs.

This computes liveness of SSA values, not nir_variables.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoi965/vec4: select predicate based on writemask for sel emissions
Alejandro Piñeiro [Tue, 20 Oct 2015 11:08:09 +0000 (13:08 +0200)]
i965/vec4: select predicate based on writemask for sel emissions

Equivalent to commit 8ac3b525c but with sel operations. In this case
we select the PredCtrl based on the writemask.

This patch helps on cases like this:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

In this case, cmod propagation can't optimize instruction #2, because
instructions #1 and #2 have different writemasks, and we can't update
directly instruction #2 writemask because our code thinks that sel at
instruction #3 reads all four channels of the flag, when it actually
only reads .x.

So, with this patch, the previous case becomes this:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Now only the x channel of the flag is used, allowing dead code
elimination to update the writemask at the second instruction:
 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

So now cmod propagation can simplify out #2:
 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD

Shader-db numbers:
total instructions in shared programs: 6235835 -> 6228008 (-0.13%)
instructions in affected programs:     219850 -> 212023 (-3.56%)
total loops in shared programs:        1979 -> 1979 (0.00%)
helped:                                1192
HURT:                                  0
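
The predicate choice described in the commit above can be sketched as a mapping from a single-channel writemask to a per-channel predicate. The enum and function names are hypothetical stand-ins rather than the driver's identifiers. Falling back to the normal predicate for multi-channel writemasks is an assumption.

```c
#include <assert.h>

/* Hypothetical writemask bits, one per channel. */
enum {
   WRITEMASK_X = 0x1,
   WRITEMASK_Y = 0x2,
   WRITEMASK_Z = 0x4,
   WRITEMASK_W = 0x8
};

/* Hypothetical predicate controls. */
enum predicate {
   PREDICATE_NORMAL,       /* reads all four flag channels */
   PREDICATE_REPLICATE_X,  /* reads only the .x flag channel */
   PREDICATE_REPLICATE_Y,
   PREDICATE_REPLICATE_Z,
   PREDICATE_REPLICATE_W
};

/* A sel that writes one channel only needs that channel of the flag. */
static enum predicate
predicate_for_writemask(unsigned writemask)
{
   switch (writemask) {
   case WRITEMASK_X: return PREDICATE_REPLICATE_X;
   case WRITEMASK_Y: return PREDICATE_REPLICATE_Y;
   case WRITEMASK_Z: return PREDICATE_REPLICATE_Z;
   case WRITEMASK_W: return PREDICATE_REPLICATE_W;
   default:          return PREDICATE_NORMAL;
   }
}
```

Narrowing the flag read this way is what lets later passes see that only one channel of the comparison result is live.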

9 years agonouveau: relax fence emit space assert
Ilia Mirkin [Thu, 5 Nov 2015 03:42:41 +0000 (22:42 -0500)]
nouveau: relax fence emit space assert

We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
9 years agovc4: When the create ioctl fails, free our cache and try again.
Eric Anholt [Wed, 4 Nov 2015 21:27:16 +0000 (13:27 -0800)]
vc4: When the create ioctl fails, free our cache and try again.

This greatly increases the pressure you can put on the driver before
create fails.  Ultimately we need to let the kernel take control of
our cached BOs and just take them from us (and other clients)
directly, but this is a very easy patch for the moment.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
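
The retry strategy in the commit above can be sketched with a toy allocator. Everything here (fake_device, fake_create_ioctl, fake_cache_free_all, bo_alloc_with_retry) is a hypothetical stand-in for the kernel interface and the userspace BO cache, not vc4 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device and our userspace BO cache. */
struct fake_device {
   size_t free_bytes;    /* memory the "kernel" can still hand out */
   size_t cached_bytes;  /* memory held by idle BOs in our cache */
};

/* Pretend create ioctl: fails when the device is out of memory. */
static bool
fake_create_ioctl(struct fake_device *dev, size_t size)
{
   if (size > dev->free_bytes)
      return false;
   dev->free_bytes -= size;
   return true;
}

/* Release every cached BO back to the device. */
static void
fake_cache_free_all(struct fake_device *dev)
{
   dev->free_bytes += dev->cached_bytes;
   dev->cached_bytes = 0;
}

/* Try to allocate; on failure, drop the cache and try exactly once more. */
static bool
bo_alloc_with_retry(struct fake_device *dev, size_t size)
{
   if (fake_create_ioctl(dev, size))
      return true;

   fake_cache_free_all(dev);
   return fake_create_ioctl(dev, size);
}
```

The single retry keeps the failure path bounded: if the allocation still fails with an empty cache, the error is reported to the caller.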
9 years agovc4: Print the rounded shader size in debug output.
Eric Anholt [Wed, 4 Nov 2015 21:10:28 +0000 (13:10 -0800)]
vc4: Print the rounded shader size in debug output.

It's surprising to see "0kb" printed for debug on short shaders, while
4kb alignment won't be surprising.
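
The difference between the two printouts can be sketched as below. The 4096-byte alignment comes from the commit message; the helper names and the idea that the old output simply truncated the byte count are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define BO_ALIGNMENT 4096u  /* 4kb alignment, as mentioned in the commit */

/* Round value up to a power-of-two alignment. */
static uint32_t
align_up(uint32_t value, uint32_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (value + alignment - 1) & ~(alignment - 1);
}

/* Assumed old behaviour: integer division truncates small sizes to 0. */
static uint32_t
size_in_kb_truncated(uint32_t size_bytes)
{
   return size_bytes / 1024;
}

/* Report the size the allocation actually occupies after alignment. */
static uint32_t
size_in_kb_rounded(uint32_t size_bytes)
{
   return align_up(size_bytes, BO_ALIGNMENT) / 1024;
}
```

A 200-byte shader is reported as 0kb by the truncating version and as 4kb by the rounded one.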

9 years agovc4: Fix dumping the size of BOs allocated/cached.
Eric Anholt [Wed, 4 Nov 2015 21:13:39 +0000 (13:13 -0800)]
vc4: Fix dumping the size of BOs allocated/cached.

60MB of cached BOs are a lot less scary than 600MB.

9 years agomesa/tests: add glBufferStorageEXT to ES 3.1 dispatch list
Ilia Mirkin [Wed, 4 Nov 2015 19:26:37 +0000 (14:26 -0500)]
mesa/tests: add glBufferStorageEXT to ES 3.1 dispatch list

I thought that aliased functions didn't need to be added, but that might
only be if the function aliases something in the same {desktop,ES}
space. Resolves the dispatch sanity test failure.

Fixes: 13b19aa81 (mesa: expose support for GL_EXT_buffer_storage)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92824
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agovbo: fix another GL_LINE_LOOP bug
Brian Paul [Sat, 31 Oct 2015 13:02:36 +0000 (07:02 -0600)]
vbo: fix another GL_LINE_LOOP bug

Very long line loops which spanned 3 or more vertex buffers were not
handled correctly and could result in stray lines.

The piglit lineloop test draws 10000 vertices by default, and is not
long enough to trigger this.  Even 'lineloop -count 100000' doesn't
trigger the bug.

For future reference, the issue can be reproduced by changing Mesa's
VBO_VERT_BUFFER_SIZE to 4096 and changing the piglit lineloop test to
use glVertex2f(), draw 3 loops instead of 1, and specifying -count
1023.

Acked-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years agosvga: implement 'white_fragments' option for VGPU10 fragment shaders
Brian Paul [Tue, 3 Nov 2015 21:34:15 +0000 (14:34 -0700)]
svga: implement 'white_fragments' option for VGPU10 fragment shaders

When we emulate XOR logicop mode with blend-subtract, we need to ensure
that the fragment shader always emits white.  We had this implemented
for VGPU9, but not VGPU10.

VMware bug 1545492.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
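
Why the fragment shader has to emit white can be checked on 8-bit channels. The sketch assumes the subtract computes a clamped src - dst; the operand order and the helper names are assumptions for illustration, not taken from the svga driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed blend equation: src - dst, clamped at zero. */
static uint8_t
blend_subtract(uint8_t src, uint8_t dst)
{
   return src > dst ? (uint8_t)(src - dst) : 0;
}

/* Reference result of the XOR logic op. */
static uint8_t
logicop_xor(uint8_t src, uint8_t dst)
{
   return (uint8_t)(src ^ dst);
}

/* Returns 1 if subtract and XOR agree for every destination value. */
static int
xor_matches_subtract_for_src(uint8_t src)
{
   unsigned dst;

   for (dst = 0; dst < 256; dst++) {
      if (blend_subtract(src, (uint8_t)dst) != logicop_xor(src, (uint8_t)dst))
         return 0;
   }
   return 1;
}
```

With src = 0xFF both operations reduce to inverting dst, so they agree everywhere. For a source such as 0x0F they diverge, which is why the emulation is only valid for white fragments.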
9 years agou_vbuf: minor code reformatting / line wrapping
Brian Paul [Thu, 29 Oct 2015 01:05:27 +0000 (19:05 -0600)]
u_vbuf: minor code reformatting / line wrapping

Trivial.

9 years agou_vbuf: add some const qualifiers
Brian Paul [Thu, 29 Oct 2015 01:02:38 +0000 (19:02 -0600)]
u_vbuf: add some const qualifiers

Trivial.

9 years agosvga: use new enum indices_mode type
Brian Paul [Sat, 31 Oct 2015 13:44:49 +0000 (07:44 -0600)]
svga: use new enum indices_mode type

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
9 years agoutil/indices: replace #define tokens with enum type
Brian Paul [Sat, 31 Oct 2015 13:44:23 +0000 (07:44 -0600)]
util/indices: replace #define tokens with enum type

To ease debugging in gdb.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
9 years agoi965: check inst->predicate when clearing flag_live at dead code eliminate
Alejandro Piñeiro [Thu, 22 Oct 2015 20:22:14 +0000 (22:22 +0200)]
i965: check inst->predicate when clearing flag_live at dead code eliminate

Detected by Matt Turner while reviewing commit
a59359ecd22154cc2b3f88bb8c599f21af8a3934

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agogallivm: fix sampling for s3tc srgb formats when using texture cache
Roland Scheidegger [Wed, 4 Nov 2015 13:21:43 +0000 (14:21 +0100)]
gallivm: fix sampling for s3tc srgb formats when using texture cache

This actually stored the values as 8bit linear values in the cache,
then did another srgb->linear conversion...
We don't want to do the former (decoding 8bit srgb values to 8bit linear
completely defeats the purpose of srgb in the first place), so just decode
to 8bit srgb.
Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.

9 years agoi965/meta: Assert fast clears and rep clears never overlap
Ben Widawsky [Wed, 14 Oct 2015 03:50:25 +0000 (20:50 -0700)]
i965/meta: Assert fast clears and rep clears never overlap

There is nothing wrong with the code today, but as one modifies the code it
turns out to be not too difficult to mess up the code, and this easy assertion
should catch such driver implementation failures quickly.

Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years agomesa: expose support for GL_EXT_buffer_storage
Ryan Houdek [Tue, 3 Nov 2015 01:30:18 +0000 (19:30 -0600)]
mesa: expose support for GL_EXT_buffer_storage

This extension requires ES 3.1 since it relies on glMemoryBarrier.
For testing purposes I temporarily moved glMemoryBarrier to be an ES 3.0
function.
This has been tested with the piglit in the ML and the Dolphin emulator.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agoglsl: make sure to only add subroutines to resource list
Timothy Arceri [Tue, 3 Nov 2015 21:41:29 +0000 (08:41 +1100)]
glsl: make sure to only add subroutines to resource list

Overlooked in 763cd8c080353.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
9 years agoglsl: remove old TODO
Timothy Arceri [Wed, 4 Nov 2015 03:50:49 +0000 (14:50 +1100)]
glsl: remove old TODO

SSBO support now exists as of commits f24e5e and f408a13dd30.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
9 years agodocs: Mark AoA as done for i965
Timothy Arceri [Thu, 15 Oct 2015 23:28:48 +0000 (10:28 +1100)]
docs: Mark AoA as done for i965

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965: enable ARB_arrays_of_arrays
Timothy Arceri [Thu, 15 Oct 2015 23:28:47 +0000 (10:28 +1100)]
i965: enable ARB_arrays_of_arrays

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
9 years agoi965: add support for image AoA
Timothy Arceri [Fri, 30 Oct 2015 23:31:37 +0000 (10:31 +1100)]
i965: add support for image AoA

V3: clamp array index to the correct size (the size of the current array
rather than the inner array) Francisco Jerez.

V2: avoid useless zero-initialization and addition for the first AoA level,
avoid redundant temporary, make use of type_size_scalar(), rename aoa_size
to element_size, assign the indirect indexing temporary directly to
image.reladdr, and replace while loop with a for loop. All suggested
by Francisco Jerez.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agollvmpipe: add cache for compressed textures
Roland Scheidegger [Tue, 27 Oct 2015 04:34:00 +0000 (05:34 +0100)]
llvmpipe: add cache for compressed textures

Compressed textures are very slow because decoding is rather complex
(and because there's no jit code to decode them too for non-technical
reasons).
Thus, add some texture cache which holds a couple of decoded blocks.
Right now this handles only s3tc format albeit it could be extended to work
with other formats rather trivially as long as the result of decode fits into
32bit per texel (ideally, rgtc actually would decode to more than 8 bits
per channel, but even then making it work for it shouldn't be too difficult).
This can improve performance noticeably but don't expect wonders (uncompressed
is unsurprisingly still faster). It's also possible it might be slower in
some cases (using nearest filtering for example or if there's otherwise not
many cache hits, the cache is only direct mapped which isn't great).
Also, actual decode of a block relies on util code, thus even though always
full blocks are decoded it is done texel by texel - this could obviously
benefit greatly from simd-optimized code decoding full blocks at once...
Note the cache is per (raster) thread, and currently only used for fragment
shaders.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
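
A direct-mapped cache of decoded blocks, as described in the commit above, can be sketched like this. The cache size, the block identifier, and all names (block_cache, block_cache_lookup, fake_decode) are hypothetical; the llvmpipe code is JIT-generated and differs in detail.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_ENTRIES 16u     /* hypothetical size; must be a power of two */
#define TEXELS_PER_BLOCK 16u  /* one 4x4 block */

struct block_cache_entry {
   uint64_t tag;                       /* which compressed block is stored */
   int valid;
   uint32_t texels[TEXELS_PER_BLOCK];  /* decoded texels, 32 bits each */
};

struct block_cache {
   struct block_cache_entry entries[CACHE_ENTRIES];
   unsigned hits, misses;
};

typedef void (*decode_block_fn)(uint64_t block_id, uint32_t *texels_out);

/* Direct mapped: each block id can live in exactly one slot. */
static const uint32_t *
block_cache_lookup(struct block_cache *cache, uint64_t block_id,
                   decode_block_fn decode)
{
   struct block_cache_entry *e =
      &cache->entries[block_id & (CACHE_ENTRIES - 1)];

   if (e->valid && e->tag == block_id) {
      cache->hits++;
   } else {
      /* Miss: decode the whole block and overwrite the slot. */
      decode(block_id, e->texels);
      e->tag = block_id;
      e->valid = 1;
      cache->misses++;
   }
   return e->texels;
}

/* Stand-in decoder producing recognizable values. */
static void
fake_decode(uint64_t block_id, uint32_t *texels_out)
{
   unsigned i;

   for (i = 0; i < TEXELS_PER_BLOCK; i++)
      texels_out[i] = (uint32_t)block_id * 100u + i;
}
```

Two block ids that map to the same slot evict each other on every access, which is the weakness of a direct-mapped design that the commit message points out.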
9 years agollvmpipe: use simple coeffs calc for 128bit vectors
Oded Gabbay [Tue, 3 Nov 2015 08:36:01 +0000 (10:36 +0200)]
llvmpipe: use simple coeffs calc for 128bit vectors

There are currently two methods in llvmpipe code to calculate coeffs to
be used as inputs for the fragment shader. The two methods use slightly
different ways to do the floating point calculations and thus produce
slightly different results.

The decision which method to use is determined by the size of the vector
that is used by the platform.

For vectors with size of more than 128bit, a single-step method is used,
in which coeffs_init_simple() + attribs_update_simple() are called.

For vectors with size of 128bit or less, a two-step method is used, in
which coeffs_init() + attribs_update() are called.

This causes some piglit tests (clip-distance-bulk-copy,
interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with
128bit vectors (such as ppc64le or x86-64 without AVX).

This patch makes platforms with 128bit vectors use the single-step
method (aka "simple" method) instead of the two-step method.
This would make the resulting coeffs identical between more platforms,
make sure the piglit tests pass, and make debugging and maintainability
a bit easier as the generated LLVM IR will be the same for more platforms.

The performance impact is negligible for x86-64 without AVX, and
basically non-existent for ppc64le, as it can be seen from the following
benchmarking results:

- glxspheres, on ppc64le:

   - original code:  4.892745317 frames/sec 5.460303857 Mpixels/sec
   - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
   - Additional 0.8% performance boost

- glxspheres, on x86-64 without AVX:

   - original code:  20.16418809 frames/sec 22.50323395 Mpixels/sec
   - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
   - Additional 0.74% performance boost

- glmark2, on ppc64le:

  - original code:  score of 58
  - with the patch: score of 57

- glmark2, on x86-64 without AVX:

  - original code:  score of 175
  - with the patch: score of 167
  - Impact of -4.5% on performance

- OpenArena, on ppc64le:

  - original code:  3398 frames 1719.0 seconds 2.0 fps
                    255.0/505.9/2773.0/0.0 ms

  - with the patch: 3398 frames 1690.4 seconds 2.0 fps
                    241.0/497.5/2563.0/0.2 ms

  - 29 seconds faster with the patch, which is about 2%

- OpenArena, on x86-64 without AVX:

  - original code:  3398 frames 239.6 seconds 14.2 fps
                    38.0/70.5/719.0/14.6 ms

  - with the patch: 3398 frames 244.4 seconds 13.9 fps
                    38.0/71.9/697.0/14.3 ms

  - 0.3 fps slower with the patch (about 2%)

Additional details can be found at:
http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years agonir: Properly invalidate metadata in nir_opt_remove_phis().
Kenneth Graunke [Tue, 3 Nov 2015 05:43:40 +0000 (21:43 -0800)]
nir: Properly invalidate metadata in nir_opt_remove_phis().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
9 years agonir: Properly invalidate metadata in nir_lower_vec_to_movs().
Kenneth Graunke [Tue, 3 Nov 2015 05:38:56 +0000 (21:38 -0800)]
nir: Properly invalidate metadata in nir_lower_vec_to_movs().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
9 years agonir: Properly invalidate metadata in nir_opt_copy_prop().
Kenneth Graunke [Tue, 3 Nov 2015 05:21:25 +0000 (21:21 -0800)]
nir: Properly invalidate metadata in nir_opt_copy_prop().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
9 years agonir: Properly invalidate metadata in nir_remove_dead_variables().
Kenneth Graunke [Tue, 3 Nov 2015 05:28:26 +0000 (21:28 -0800)]
nir: Properly invalidate metadata in nir_remove_dead_variables().

v2: Preserve live_variables too (Jason).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
9 years agonir: Properly invalidate metadata in nir_split_var_copies().
Kenneth Graunke [Tue, 3 Nov 2015 05:05:08 +0000 (21:05 -0800)]
nir: Properly invalidate metadata in nir_split_var_copies().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
9 years agonir: Properly invalidate metadata in nir_lower_global_vars_to_local().
Kenneth Graunke [Tue, 3 Nov 2015 05:02:37 +0000 (21:02 -0800)]
nir: Properly invalidate metadata in nir_lower_global_vars_to_local().

v2: Preserve nir_metadata_live_variables as well (caught by Jason).

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
9 years agonir: Unexpose _impl versions of copy_prop and dce
Jason Ekstrand [Wed, 28 Oct 2015 17:11:11 +0000 (10:11 -0700)]
nir: Unexpose _impl versions of copy_prop and dce

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agomesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex
Jordan Justen [Fri, 23 Oct 2015 23:10:02 +0000 (16:10 -0700)]
mesa: rename UniformBlockStageIndex to InterfaceBlockStageIndex

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: Iago Toral <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
9 years agoi965/vec4: Send from GRF in atomic operations.
Matt Turner [Fri, 30 Oct 2015 17:07:23 +0000 (10:07 -0700)]
i965/vec4: Send from GRF in atomic operations.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agogallium/radeon: allow returning SDMA fences from pipe->flush
Marek Olšák [Wed, 28 Oct 2015 12:50:08 +0000 (13:50 +0100)]
gallium/radeon: allow returning SDMA fences from pipe->flush

pipe->flush never returned SDMA fences. This fixes it.
This is only an issue on amdgpu where fences can signal out of order.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agogallium/radeon: always return the last SDMA fence on SDMA flush if needed
Marek Olšák [Wed, 28 Oct 2015 11:59:38 +0000 (12:59 +0100)]
gallium/radeon: always return the last SDMA fence on SDMA flush if needed

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoi965: Add scalar geometry shader support.
Kenneth Graunke [Thu, 12 Mar 2015 06:14:31 +0000 (23:14 -0700)]
i965: Add scalar geometry shader support.

This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support
instanced geometry shaders, and Orbital Explorer's shader spills like
crazy.  But the infrastructure is in place, and it's largely working.

v2: Lots of rebasing.

v3: (feedback from Kristian Høgsberg)
- Handle stride and subreg_offset correctly for ATTRs; use a helper.
- Fix missing emit_shader_time_end() call.
- Delete dead code after early EOT in static vertex case to avoid
  tripping asserts in emit_shader_time_end().
- Use proper D/UD type in intexp2().
- Fix "EndPrimitve" and "to that" typos.
- Assert that invocations == 1 so we know this is missing.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
9 years agoi965: Add scalar GS input lowering code.
Kenneth Graunke [Thu, 24 Sep 2015 03:52:19 +0000 (20:52 -0700)]
i965: Add scalar GS input lowering code.

We really ought to compute the VUE map at link time and stash it, rather
than recomputing it here, but with the mess of program structures I
wasn't sure where to put it.  We can improve that later.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
9 years agoi965: Fix the fs_visitor GS constructor to take shader_time_index.
Kenneth Graunke [Tue, 3 Nov 2015 20:51:32 +0000 (12:51 -0800)]
i965: Fix the fs_visitor GS constructor to take shader_time_index.

Jason reworked this so it isn't simply ST_GS anymore...it's either -1
(not enabled) or an actual offset.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
9 years agoi965/gen8+: Extract color clear surface state
Ben Widawsky [Wed, 14 Oct 2015 03:50:19 +0000 (20:50 -0700)]
i965/gen8+: Extract color clear surface state

On future generation platforms the color clear value is stored elsewhere in the
surface state. By extracting this logic, we can cleanly implement the difference
in an upcoming patch.

Should have no functional impact.

v2: Move hunk from the next patch into this patch (Matt)
Whitespace fix (Ben)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years agoi965/gen8+: Remove redundant zeroing of surface state
Ben Widawsky [Wed, 14 Oct 2015 03:50:18 +0000 (20:50 -0700)]
i965/gen8+: Remove redundant zeroing of surface state

The allocate_surface_state already zeroes out the surface state, and doing it
later in the function is destructive for what we want to accomplish when we
split out support for gen9 fast clears (next patch).

NOTE: Only dword 12 actually needed to be fixed, but it seemed more consistent
to remove the other instances as well. I can make an argument both ways (open
coding it, vs. not). I can rework the next patch if required.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years agonvc0: add missing compute parameters required by clover
Samuel Pitoiset [Tue, 3 Nov 2015 18:33:08 +0000 (19:33 +0100)]
nvc0: add missing compute parameters required by clover

This fixes crashes with some piglit OpenCL tests.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonvc0: handle NULL pointer in nvc0_get_compute_param()
Samuel Pitoiset [Tue, 3 Nov 2015 18:32:49 +0000 (19:32 +0100)]
nvc0: handle NULL pointer in nvc0_get_compute_param()

To get the size (in bytes) of a compute parameter, clover first calls
get_compute_param() with a NULL data pointer. The RET() macro is based
on nv50.
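
The size-query convention described above can be sketched as follows. This
is only an illustration: the enum, the function name and the values are
hypothetical stand-ins, not the actual nvc0 code or its RET() macro.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameter IDs; the real driver uses Gallium's own enum. */
enum example_param {
   EXAMPLE_PARAM_GRID_DIMENSION,
   EXAMPLE_PARAM_MAX_GRID_SIZE,
};

/* Always return the size in bytes of the parameter. The value is copied
 * only when the caller supplied a buffer, so a first call with NULL is a
 * pure size query. The values below are made up for the example. */
static int
example_get_compute_param(enum example_param param, void *data)
{
   switch (param) {
   case EXAMPLE_PARAM_GRID_DIMENSION: {
      const uint64_t value = 3;
      if (data)
         memcpy(data, &value, sizeof(value));
      return (int)sizeof(value);
   }
   case EXAMPLE_PARAM_MAX_GRID_SIZE: {
      const uint64_t value[3] = { 65535, 65535, 65535 };
      if (data)
         memcpy(data, value, sizeof(value));
      return (int)sizeof(value);
   }
   }
   return 0; /* unknown parameter: nothing to report */
}
```

A caller would first pass NULL to learn the size, allocate that many
bytes, and then call again with the buffer.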

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agoi965/skl: PCI ID cleanup and brand strings
Ben Widawsky [Fri, 23 Oct 2015 18:30:16 +0000 (11:30 -0700)]
i965/skl: PCI ID cleanup and brand strings

A few new PCI ids are added here, and one is removed (0x190B) because it no
longer seems to exist anywhere.

v2-4:
Only use ascii characters (Ilia)
0x1921 is no longer marked as f

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
9 years agoi965/skl: Add GT4 PCI IDs
Ben Widawsky [Fri, 30 Oct 2015 00:30:35 +0000 (17:30 -0700)]
i965/skl: Add GT4 PCI IDs

Like other gen8+ hardware, the hardware automatically scales up thread counts.
We must be careful about the URB sizes since GT4 adds another slice.

One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a
real bug since the URB size will be wrong. Because this patch is simply meant to
add the missing IDs, that will be fixed in a later patch.

v2: No longer relevant.

v3: Update the wm thread count to support GT4. The WM thread count is used to
determine the maximum scratch space required. Currently the code always
allocates the maximum amount even though lower GT SKUs require less. The formula
is threads_per_psd * subslices_per_slice * slices
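
As a worked instance of the formula above, here is a small helper. The
function name is hypothetical and any numbers passed to it are
illustrative inputs, not values taken from this patch.

```c
/* Maximum WM thread count as described in the commit message:
 * threads_per_psd * subslices_per_slice * slices. */
static unsigned
example_max_wm_threads(unsigned threads_per_psd,
                       unsigned subslices_per_slice,
                       unsigned slices)
{
   return threads_per_psd * subslices_per_slice * slices;
}
```

With assumed inputs of 64 threads per PSD, 3 subslices per slice and 4
slices, the helper yields 64 * 3 * 4 = 768.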

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
9 years agomesa: Add spec citations for DispatchCompute*
Jordan Justen [Tue, 13 Oct 2015 22:04:54 +0000 (15:04 -0700)]
mesa: Add spec citations for DispatchCompute*

Note: The OpenGL 4.3 - 4.5 specification language for DispatchCompute
appears to have an error regarding the max allowed values. When adding
the specification citation, we note why the code does not match the
specification language.

v2:
 * Updates based on review from Iago

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
9 years agomesa: Update DispatchComputeIndirect errors for indirect parameter
Jordan Justen [Tue, 13 Oct 2015 22:04:54 +0000 (15:04 -0700)]
mesa: Update DispatchComputeIndirect errors for indirect parameter

There is some discrepancy between the return values for some error
cases for the DispatchComputeIndirect call in the ARB_compute_shader
specification. Regarding the indirect parameter, in one place the
extension spec lists that the error returned for invalid values should
be INVALID_OPERATION, while later it specifies INVALID_VALUE.

The OpenGL 4.3 and OpenGLES 3.1 specifications appear to be consistent
in requiring the INVALID_VALUE error return in this case.

Here we update the code to match the main specifications, and update
the citations to use the main specification rather than the extension
specification.
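
A minimal sketch of the check on the indirect parameter, assuming the
main-specification rule is that INVALID_VALUE is generated when indirect
is negative or not a multiple of four. The function and macro names are
hypothetical; only the numeric error codes match the GL headers.

```c
#include <stdint.h>

#define EXAMPLE_NO_ERROR      0x0000u /* same value as GL_NO_ERROR */
#define EXAMPLE_INVALID_VALUE 0x0501u /* same value as GL_INVALID_VALUE */

/* Validate only the offset itself. Checks on the bound buffer and on
 * the active program are deliberately left out of this sketch. */
static unsigned
example_check_indirect_offset(intptr_t indirect)
{
   if (indirect < 0)
      return EXAMPLE_INVALID_VALUE;

   if (indirect % 4 != 0)
      return EXAMPLE_INVALID_VALUE;

   return EXAMPLE_NO_ERROR;
}
```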

v2:
 * Updates based on review from Iago

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
Cc: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
9 years agoi965/fs: Clean up FBH code.
Matt Turner [Mon, 26 Oct 2015 18:35:57 +0000 (11:35 -0700)]
i965/fs: Clean up FBH code.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965/vec4: Clean up FBH code.
Matt Turner [Mon, 26 Oct 2015 18:35:57 +0000 (11:35 -0700)]
i965/vec4: Clean up FBH code.

It did a bunch of unnecessary stuff, emitting an extra MOV included.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965: Replace default case with list of enum values.
Matt Turner [Mon, 26 Oct 2015 13:58:56 +0000 (06:58 -0700)]
i965: Replace default case with list of enum values.

If we add a new file type, we'd like to get warnings if it's not
handled.
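
The idea can be sketched like this, with a hypothetical enum rather than
the driver's real register file type. With no default label, GCC and
Clang (-Wswitch, part of -Wall) warn when an enumerator has no case.

```c
/* Hypothetical file enum used only for this illustration. */
enum example_file {
   EXAMPLE_FILE_GRF,
   EXAMPLE_FILE_MRF,
   EXAMPLE_FILE_IMM,
};

static const char *
example_file_name(enum example_file file)
{
   /* Every enumerator is listed and there is no default label, so
    * adding a new enumerator without a case triggers a warning. */
   switch (file) {
   case EXAMPLE_FILE_GRF:
      return "grf";
   case EXAMPLE_FILE_MRF:
      return "mrf";
   case EXAMPLE_FILE_IMM:
      return "imm";
   }

   /* Only reached if the value is outside the enum. */
   return "unknown";
}
```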

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoi965/vec4: Don't disable channels in any/all comparisons.
Matt Turner [Mon, 26 Oct 2015 03:49:08 +0000 (20:49 -0700)]
i965/vec4: Don't disable channels in any/all comparisons.

We've made a mistake in calling the Channel Enable bits "writemask",
because they do more than control which channels of the destination are
written -- they actually control which channels are enabled (surprise!
surprise!)

So, if we emit

               cmp.z.f0(8) null.xy<1>D  g10<4,4,1>.xyzzD g2<0,4,1>.xyzzD
               mov(8)      g12<1>.xUD   0x00000000UD
   (+f0.all4h) mov(8)      g12<1>.xUD   0xffffffffUD

where the CMP instruction has only .xy channel enables, it won't write
the .zw channels of the flag register, which are of course read by the
+f0.all4 predicate.

We need to always emit CMP instructions whose flag result might be read
by such a predicate with all channels enabled.
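
The failure mode can be modelled with a few lines of C. This is a
simplified model of one vec4 (four flag bits), not the hardware
semantics in full, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
   EXAMPLE_CH_X = 1 << 0,
   EXAMPLE_CH_Y = 1 << 1,
   EXAMPLE_CH_Z = 1 << 2,
   EXAMPLE_CH_W = 1 << 3,
   EXAMPLE_CH_XYZW = 0xf,
};

/* Model of a CMP writing the flag register: only the bits of enabled
 * channels are updated, the others keep their previous contents. */
static uint8_t
example_cmp_update_flag(uint8_t old_flag, uint8_t results, uint8_t enables)
{
   return (uint8_t)((old_flag & ~enables) | (results & enables));
}

/* Model of the all4h predicate: true only if all four bits are set. */
static bool
example_all4h(uint8_t flag)
{
   return (flag & EXAMPLE_CH_XYZW) == EXAMPLE_CH_XYZW;
}
```

In this model, a comparison that passes on every channel but is emitted
with only the .xy enables leaves the .zw flag bits stale, so the all4h
predicate can be false even though the comparison succeeded.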

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agomesa: fix uniforms calculation in glGetProgramiv
Tapani Pälli [Fri, 30 Oct 2015 12:30:35 +0000 (14:30 +0200)]
mesa: fix uniforms calculation in glGetProgramiv

Since the introduction of SSBO, UniformStorage contains not just uniforms
but also buffer variables. This needs to be taken into account when
calculating active uniforms with GL_ACTIVE_UNIFORMS and
GL_ACTIVE_UNIFORM_MAX_LENGTH.
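
A minimal sketch of the counting described above, assuming a simplified
storage entry with a flag that marks buffer variables. The struct and
function names are hypothetical and do not mirror gl_uniform_storage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one storage entry. */
struct example_storage {
   const char *name;
   bool is_buffer_variable;
};

/* Count only real uniforms, skipping buffer variables. */
static unsigned
example_count_active_uniforms(const struct example_storage *storage,
                              size_t count)
{
   unsigned active = 0;

   for (size_t i = 0; i < count; i++) {
      if (!storage[i].is_buffer_variable)
         active++;
   }
   return active;
}

/* Longest uniform name, including the terminating NUL, again skipping
 * buffer variables. */
static size_t
example_max_uniform_name_length(const struct example_storage *storage,
                                size_t count)
{
   size_t max_len = 0;

   for (size_t i = 0; i < count; i++) {
      if (storage[i].is_buffer_variable)
         continue;

      size_t len = strlen(storage[i].name) + 1;
      if (len > max_len)
         max_len = len;
   }
   return max_len;
}
```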

No Piglit regressions.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
9 years agomesa: fix program resource queries for atomic counter buffers
Tapani Pälli [Fri, 30 Oct 2015 10:02:51 +0000 (12:02 +0200)]
mesa: fix program resource queries for atomic counter buffers

gl_active_atomic_buffer contains an index into UniformStorage, so we need
to calculate the resource index for that gl_uniform_storage.

Fixes the following CTS tests:
   ES31-CTS.program_interface_query.atomic-counters
   ES31-CTS.program_interface_query.atomic-counters-one-buffer

No Piglit regressions.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
9 years agoglsl: join calculate_array_size() and calculate_array_stride()
Juha-Pekka Heikkila [Wed, 21 Oct 2015 09:09:21 +0000 (12:09 +0300)]
glsl: join calculate_array_size() and calculate_array_stride()

These helpers are run for the same case in the same loop. Here their
operation is joined so the loop is run just once. Also fixed an
out-of-memory condition here.

v2: Make the loop simpler to read as per Tapani's suggestion

Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
9 years agomesa: expose support for OES/EXT_draw_elements_base_vertex to OpenGL ES
Ryan Houdek [Mon, 2 Nov 2015 03:25:27 +0000 (21:25 -0600)]
mesa: expose support for OES/EXT_draw_elements_base_vertex to OpenGL ES

This has been tested with the piglits in the mailing list and
on the Dolphin emulator.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonouveau: set MaxDrawBuffers to the same value as MaxColorAttachments
Ilia Mirkin [Mon, 2 Nov 2015 01:13:13 +0000 (20:13 -0500)]
nouveau: set MaxDrawBuffers to the same value as MaxColorAttachments

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
9 years agonv50: use correct heaps for FP and GP code segments
Samuel Pitoiset [Sun, 1 Nov 2015 22:28:02 +0000 (23:28 +0100)]
nv50: use correct heaps for FP and GP code segments

This is just a cosmetic change. Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
9 years agomesa/sso: Add compute shader support
Jordan Justen [Sat, 17 Oct 2015 04:19:45 +0000 (21:19 -0700)]
mesa/sso: Add compute shader support

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
[itoral@igalia.com: Reviewed-by for all except the ctx->_Shader change]
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
9 years agomesa/sso: Add MESA_VERBOSE=api trace support
Jordan Justen [Sat, 17 Oct 2015 04:14:10 +0000 (21:14 -0700)]
mesa/sso: Add MESA_VERBOSE=api trace support

v2:
 * Use %u for unsigned values (Iago)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
9 years agoi965: Setup pull constant state for compute programs
Jordan Justen [Thu, 15 Oct 2015 17:27:00 +0000 (10:27 -0700)]
i965: Setup pull constant state for compute programs

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
9 years agomain/get: Add MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS
Jordan Justen [Wed, 14 Oct 2015 00:19:54 +0000 (17:19 -0700)]
main/get: Add MAX_COMBINED_COMPUTE_UNIFORM_COMPONENTS

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
9 years agoglsl: OpenGLES GLSL 3.1 precision qualifiers ordering rules
Jordan Justen [Thu, 15 Oct 2015 21:47:34 +0000 (14:47 -0700)]
glsl: OpenGLES GLSL 3.1 precision qualifiers ordering rules

The OpenGLES GLSL 3.1 specification uses the precision qualifier
ordering rules from ARB_shading_language_420pack.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
9 years agoglsl: Add compute shader builtin variables for OpenGLES 3.1
Jordan Justen [Wed, 14 Oct 2015 00:18:52 +0000 (17:18 -0700)]
glsl: Add compute shader builtin variables for OpenGLES 3.1

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
9 years agonouveau: get rid of tabs
Ilia Mirkin [Sat, 31 Oct 2015 23:54:38 +0000 (19:54 -0400)]
nouveau: get rid of tabs

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agoi965/sched: don't calculate live intervals for post-RA scheduling
Connor Abbott [Fri, 30 Oct 2015 22:19:34 +0000 (18:19 -0400)]
i965/sched: don't calculate live intervals for post-RA scheduling

For some reason, this causes assertions on gm965 only. In any case, it's
unnecessary since we don't need liveness information in the post-RA
scheduler.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92744
Cc: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agovirgl/vtest: fix extra malloc
Dave Airlie [Sat, 31 Oct 2015 08:04:26 +0000 (18:04 +1000)]
virgl/vtest: fix extra malloc

This somehow got added twice, drop the first one.

Reported by Coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agovirgl: free sampler view on failure path
Dave Airlie [Sat, 31 Oct 2015 06:07:52 +0000 (16:07 +1000)]
virgl: free sampler view on failure path

Reported by Coverity.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agogallium/swrast: fixup build breakage and warnings
Dave Airlie [Sat, 31 Oct 2015 06:11:29 +0000 (16:11 +1000)]
gallium/swrast: fixup build breakage and warnings

The front buffer rendering changes broke an interface; I didn't
fix up all of them.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agogallium/swrast: fix front buffer blitting. (v2)
Dave Airlie [Fri, 9 Oct 2015 00:38:08 +0000 (01:38 +0100)]
gallium/swrast: fix front buffer blitting. (v2)

I've known this was broken for a while, and as far as I know cogl
has a workaround for it: with the gallium-based swrast drivers,
BlitFramebuffer from back to front or vice versa was pretty broken.

The legacy swrast driver tracks when a front buffer is used
and does the get/put images when it is mapped/unmapped,
so this patch attempts to add the same functionality to the
gallium drivers.

It creates a new context interface to denote when a front
buffer is being created, and passes a private pointer to it.
This pointer is then used to decide on map/unmap whether the
contents should be updated from the real frontbuffer using
get/put image.

This is primarily to make gtk's gl code work. The only
thing I've tested so far is the glarea test from
https://github.com/ebassi/glarea-example.git

v2: bump extension version,
check extension version before calling get image. (Ian)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91930

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agoglsl: set image access qualifiers for AoA
Timothy Arceri [Thu, 15 Oct 2015 23:28:45 +0000 (10:28 +1100)]
glsl: set image access qualifiers for AoA

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965: Do legacy userclipping in OpenGL ES 1.x contexts.
Ian Romanick [Tue, 27 Oct 2015 21:50:14 +0000 (14:50 -0700)]
i965: Do legacy userclipping in OpenGL ES 1.x contexts.

Commit fba4823a disabled user clipping for everything except
compatibility profile.  Core profile and OpenGL ES 2.0+ have all removed
the classic, OpenGL 1.0 user clip planes.  ES 1.x, however, still has
them.

Fixes OpenGL ES 1.1 conformance mustpass.c and userclip.c

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Olivier Berthier <olivierx.berthier@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92639
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92641

9 years agogbm.h: Add a missing stddef.h include for size_t.
Emmanuel Gil Peyrot [Thu, 29 Oct 2015 15:22:19 +0000 (15:22 +0000)]
gbm.h: Add a missing stddef.h include for size_t.

This was causing compilation issues when one of its providers wasn’t
already included before gbm.h.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
9 years agowinsys/virgl: rework line wrapping/indent
Emil Velikov [Wed, 28 Oct 2015 09:54:15 +0000 (09:54 +0000)]
winsys/virgl: rework line wrapping/indent

Wrap some of the 'omg it's getting out of hand' long lines, and
re-indent where things feel off.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
9 years agovirgl: unwrap the includes
Emil Velikov [Thu, 29 Oct 2015 10:17:04 +0000 (10:17 +0000)]
virgl: unwrap the includes

Include what you want, rather than relying on a header foo.h N levels
down the include chain, to provide something that you need.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
9 years agowinsys/virgl: remove temporary ret variable
Emil Velikov [Wed, 28 Oct 2015 12:50:47 +0000 (12:50 +0000)]
winsys/virgl: remove temporary ret variable

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
9 years agowinsys/virgl: always memset prior to ioctl
Emil Velikov [Wed, 28 Oct 2015 12:49:08 +0000 (12:49 +0000)]
winsys/virgl: always memset prior to ioctl

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>