Iago Toral Quiroga [Thu, 17 Sep 2015 11:43:52 +0000 (13:43 +0200)]
i965/vec4: Use MRF registers 21-23 for spilling in gen6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Iago Toral Quiroga [Tue, 15 Sep 2015 14:33:48 +0000 (16:33 +0200)]
i965/fs: Use MRF registers 21-23 for spilling in gen6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Iago Toral Quiroga [Tue, 15 Sep 2015 14:00:26 +0000 (16:00 +0200)]
i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generation
There are some bug reports about shaders failing to compile in gen6
because MRF 14 is used when we need to spill. For example:
https://bugs.freedesktop.org/show_bug.cgi?id=86469
https://bugs.freedesktop.org/show_bug.cgi?id=90631
Discussion in bugzilla pointed to the fact that gen6 might actually have
24 MRF registers available instead of 16, so we could use other MRF
registers and avoid these conflicts (we still need to investigate why
some shaders need up to MRF 14 anyway, since this is not expected).
Notice that the hardware docs are not clear about this fact:
SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device
Hardware" says "Number per Thread" - "24 registers"
However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says:
"Normal threads should construct their messages in m1..m15. (...)
Regardless of actual hardware implementation, the thread should
not assume th at MRF addresses above m15 wrap to legal MRF registers."
Therefore experimentation was necessary to evaluate if we had these extra
MRF registers available or not. This was tested in gen6 using MRF
registers 21..23 for spilling and doing a full piglit run (all.py) forcing
spilling of everything on the FS backend. It was also tested by doing
spilling of everything on both the FS and the VS backends with a piglit run
of shader.py. In both cases no regressions were observed. In fact, many of
these tests where helped in the cases where we forced spilling, since that
triggered the same underlying problem described in the bug reports. Here are
some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on
gen6 hardware:
Using MRFs 13..15 for spilling:
crash: 2, fail: 113, pass: 6621, skip: 5461
Using MRFs 21..23 for spilling:
crash: 2, fail: 12, pass: 6722, skip: 5461
This patch sets the ground for later patches to implement spilling
using MRF registers 21..23 in gen6.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Iago Toral Quiroga [Wed, 16 Sep 2015 07:08:19 +0000 (09:08 +0200)]
i965: Move MRF register asserts out of brw_reg.h
In a later patch we will make BRW_MAX_MRF return a different value depending
on the hardware generation, but it is inconvenient to add a gen parameter
to the brw_reg functions only for the assertions, so move these to places where
we have the hardware generation available.
Ken suggested to add the asserts to brw_set_src0 and brw_set_dest since that
would make sure that we catch all uses of MRF registers, even those coming
from modules that generate native code directly, like blorp. Unfortunately,
this is very late in the process which can make things harder to debug, so add
asserts to the generator as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Iago Toral Quiroga [Fri, 18 Sep 2015 06:15:52 +0000 (08:15 +0200)]
i965: Maximum allowed size of SEND messages is 15 (4 bits)
Until now we only used MRFs 1..15 for regular SEND messages, so the
message length could not possibly exceed the maximum size. Soon we'll
allow to use MRF registers 1..23 in gen6, so we need to be careful
not to build messages that can go beyond the limit. That could occur,
specifically, when building URB write messages, which we may need to
split in chunks due to their size. Previously we would simply go and
create a new message when we reached MRF 13 (since 13..15 were
reserved for spilling), now we also want to check the size of the
message explicitly.
Besides adding that condition to split URB write messages properly,
this patch also adds asserts in the generator. Notice that
brw_inst_set_mlen already asserts for this, but asserting in the
generators is easy and can make debugging easier in some cases.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rob Clark [Sun, 20 Sep 2015 18:02:29 +0000 (14:02 -0400)]
nir/print: fix coverity error
Not something actually hit in real life (now state is never non-null,
but only case state->syms is null is if nir_print_instr() path). But it
was something I overlooked the first time, so might as well fix it.
*** CID
1324642: Null pointer dereferences (REVERSE_INULL)
/src/glsl/nir/nir_print.c: 299 in print_var_decl()
293
294 fprintf(fp, " (%s, %u)", loc, var->data.driver_location);
295 }
296
297 fprintf(fp, "\n");
298
>>> CID
1324642: Null pointer dereferences (REVERSE_INULL)
>>> Null-checking "state" suggests that it may be null, but it has already been dereferenced on all paths leading to the check.
299 if (state) {
300 _mesa_set_add(state->syms, name);
301 _mesa_hash_table_insert(state->ht, var, name);
302 }
303 }
304
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Eduardo Lima Mitev [Fri, 18 Sep 2015 08:30:12 +0000 (10:30 +0200)]
i965/vec4/nir: Remove all "this->" snippets
For consistency, either we have all class members dereferenced, or none.
In this case, very few are so lets get rid of them all.
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marcin Ślusarz [Sun, 20 Sep 2015 11:40:10 +0000 (13:40 +0200)]
dri/common: fix gbm-symbols-check regression
Broken by commit
c228514c72cb2fd5fb9e510808e29204fc9e7ae1
"dri/common: use sysconfdir when looking for drirc".
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92054
Signed-off-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Emil Velikov [Sun, 20 Sep 2015 10:59:24 +0000 (11:59 +0100)]
docs: add news item and link release notes for 10.6.8
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Sun, 20 Sep 2015 10:55:41 +0000 (11:55 +0100)]
docs: add sha256 checksums for 10.6.8
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit
02387926addc62198c9b684f4f51f7cbe06b3e25)
Emil Velikov [Sun, 20 Sep 2015 10:05:07 +0000 (11:05 +0100)]
docs: add release notes for 10.6.8
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit
91c6302734574e91424a7ccb52b6368b712366cc)
Nanley Chery [Thu, 27 Aug 2015 23:29:06 +0000 (16:29 -0700)]
mesa/teximage: reuse compressed format utility functions for base_format
Reuse utility functions instead of reimplementing the same logic.
* _mesa_is_compressed_format() performs the required checking to
determine format support in the current context.
* _mesa_gl_compressed_format_base_format() returns the base format.
As a side effect, we now check that we're in a desktop context when
determining support for the FXT1 and RGTC formats. This is in agreement
with our extension table and the glext headers.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Nanley Chery [Thu, 27 Aug 2015 23:25:48 +0000 (16:25 -0700)]
mesa/texcompress: add compressed formats to base format utility function
Add S3TC and PALETTE formats.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Nanley Chery [Wed, 26 Aug 2015 23:36:11 +0000 (16:36 -0700)]
mesa/glformats: refactor compressed format support function
Instead of case statements, use _mesa_get_format_layout() to
determine if a GL format is part of a family of compressed formats.
v2. restrict LATC formats to API_OPENGL_COMPAT (Ilia).
rename the variable mFormat to m_format.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Nanley Chery [Wed, 26 Aug 2015 23:25:44 +0000 (16:25 -0700)]
mesa/formats: add MESA_LAYOUT_LATC
This enables us to predicate statments on a compressed format being
a type of LATC format. Also, remove the comment that lists the enum
(it was getting a tad long).
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Marcin Ślusarz [Sat, 19 Sep 2015 17:17:34 +0000 (19:17 +0200)]
dri/common: use sysconfdir when looking for drirc
Useful when locally installed mesa has more quirks than the system one.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Rob Clark [Thu, 17 Sep 2015 17:35:33 +0000 (13:35 -0400)]
freedreno/ir3: use nir two-sided-color lowering
With this, we completely switch over to nir lowering passes instead of
tgsi_lowering. So one step closer to supporting direct glsl or spirv to
nir support for freedreno a3xx/a4xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Thu, 17 Sep 2015 17:17:08 +0000 (13:17 -0400)]
nir: add two-sided-color lowering pass
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Fri, 18 Sep 2015 17:23:36 +0000 (13:23 -0400)]
nir/build: add nir_vec() helper
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Rob Clark [Wed, 16 Sep 2015 17:42:21 +0000 (13:42 -0400)]
freedreno/ir3: lower txp/clamp in NIR
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 18 Sep 2015 14:44:27 +0000 (10:44 -0400)]
nir/lower_tex: add support to clamp texture coords
Some hardware needs to clamp texture coordinates to [0.0, 1.0] in the
shader to emulate GL_CLAMP. This is added to lower_tex_proj since, in
the case of projected coords, the clamping needs to happen *after*
projection.
v2: comments/suggestions from Ilia and Eric, use txs to get texture size
and clamp RECT textures to their dimensions rather than [0.0, 1.0] to
avoid having to lower RECT textures to 2D.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rob Clark [Wed, 16 Sep 2015 20:49:14 +0000 (16:49 -0400)]
nir/lower_tex: support for lowering RECT textures
v2: comments/suggestions from Ilia and Eric, split out get_texture_size()
helper so we can use it in the next commit for clamping RECT textures.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rob Clark [Wed, 16 Sep 2015 16:56:58 +0000 (12:56 -0400)]
nir/lower_tex: support projector lowering per sampler type
Some hardware, such as adreno a3xx, supports txp on some but not all
sampler types. In this case we want more fine grained control over
which texture projectors get lowered.
v2: split out nir_lower_tex_options struct to make it easier to
add the additional parameters coming in the following patches
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rob Clark [Wed, 16 Sep 2015 16:53:12 +0000 (12:53 -0400)]
nir/lower_tex: split out project_src() helper
Split this out to reduce noise in later patches.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rob Clark [Thu, 17 Sep 2015 11:54:35 +0000 (07:54 -0400)]
nir: rename nir_lower_tex_projector
Since the following patches will add additional tex-lowering related
functionality, which doesn't make sense to split out into a separate
pass (as they would require duplication of the projector lowering
logic), let's give this pass a more generic name.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Alejandro Piñeiro [Wed, 16 Sep 2015 08:26:55 +0000 (10:26 +0200)]
i965/vec4: Change types as needed to propagate source modifiers using current instruction
SEL and MOV instructions, as long as they don't have source modifiers, are
just copying bits around. So those kind of instruction could be propagated
even if there are type mismatches. This is needed because NIR generates
integer SEL and MOV instructions whenever it doesn't know what else to
generate.
This commit adds support for copy propagation using current instruction
as reference.
Equivalent to commit 472ef9 but for vec4.
v2: include check for saturate, as Jason Ekstrand suggested
v3: check that the dst.type and the src type are the same, in order to
solve (among others) the following deqp regression with v2:
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.lowp_uint_vertex
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Iago Toral Quiroga [Fri, 18 Sep 2015 09:02:34 +0000 (11:02 +0200)]
i965/fs: Fix comparison between signed and unsigned integer expressions
brw_fs_visitor.cpp: In member function 'void fs_visitor::emit_urb_writes()':
brw_fs_visitor.cpp:977:58: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tapani Pälli [Tue, 15 Sep 2015 06:17:20 +0000 (09:17 +0300)]
mesa: fix errors when reading depth with glReadPixels
OpenGL ES 3.0 spec 3.7.2 "Transfer of Pixel Rectangles" specifies
DEPTH_COMPONENT, UNSIGNED_INT as a valid couple, validation for
internal format is checked by is_float_depth().
Fix regression caused by
81d2fd91a90e5b2fd9fd74792a7a7c329f0e4d29 in:
ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels
Test uses GL_DEPTH_COMPONENT, UNSIGNED_INT only when GL_NV_read_depth
extension is present.
v2: change check in _mesa_error_check_format_and_type to be explicit
for ES 2.0+, desktop OpenGL does not allow this behaviour + uses
this function for both glReadPixels and glDrawPixels validation.
(No Piglit regressions seen with v2.)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com> [v1]
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92009
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Rob Clark [Fri, 18 Sep 2015 01:07:41 +0000 (21:07 -0400)]
nir/builder: fix c++11 compiler warning
Fixes:
In file included from nir/nir_lower_samplers.cpp:27:0:
nir/nir_builder.h: In function 'nir_ssa_def* nir_channel(nir_builder*, nir_ssa_def*, int)':
nir/nir_builder.h:222:37: warning: narrowing conversion of 'c' from 'int' to 'unsigned int' inside { } is ill-formed in C++11 [-Wnarrowing]
unsigned swizzle[4] = {c, c, c, c};
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 18 Sep 2015 01:06:11 +0000 (21:06 -0400)]
nir: really actually fix comment this time
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Thu, 17 Sep 2015 22:18:45 +0000 (18:18 -0400)]
nir/print: print variable names
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Rob Clark [Thu, 17 Sep 2015 22:18:19 +0000 (18:18 -0400)]
nir: some comment fixups
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Timothy Arceri <t_arceri@yahoo.com.au>
Rob Clark [Wed, 16 Sep 2015 17:57:26 +0000 (13:57 -0400)]
freedreno/ir3: add --gpu arg to cmdline compiler
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 12 Sep 2015 15:15:32 +0000 (11:15 -0400)]
freedreno/a4xx: wire up ucp support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Thu, 10 Sep 2015 20:09:13 +0000 (16:09 -0400)]
freedreno/ir3: add support for ucp
Use nir_lower_clip pass for adding the VS/FS instructions to handle
user-clip-planes and CLIPDIST. Wire up support for load_user_clip_plane
intrinsic to fetch ucp[plane] values as driver-params (passed as const's
to the shader).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Wed, 9 Sep 2015 18:57:15 +0000 (14:57 -0400)]
nir: add lowering stage for user-clip-planes / clipdist
The vertex shader lowering adds calculation for CLIPDIST, if needed
(ie. user-clip-planes), and the frag shader lowering adds conditional
kills based on CLIPDIST value (which should be treated as a normal
interpolated varying by the driver).
Note that this won't quite do the right thing in the face of MSAA plus
user-clip-planes, since all the samples would be killed or not (rather
than potentially only a portion of them). But it's better than no UCP
support at all for drivers that don't have this in hw.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Thu, 27 Aug 2015 21:42:40 +0000 (17:42 -0400)]
nir: add sysval for user-clip-planes
For lowering user-clip-planes, we need a way to pass the enabled/used
user-clip-planes in to shader.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rob Clark [Fri, 11 Sep 2015 21:20:48 +0000 (17:20 -0400)]
freedreno/ir3: convert from tgsi semantic/index to varying-slot
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 11 Sep 2015 21:01:23 +0000 (17:01 -0400)]
glsl: add SYSTEM_VALUE_VERTEX_CNT
Used internally in freedreno/ir3 to calc stream-out position. Seems
like a generic enough way to implement stream-out (using str instrs),
plus it avoids compiler warnings by sneaking in a non-enum value in
switch statements.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Thu, 10 Sep 2015 21:25:18 +0000 (17:25 -0400)]
freedreno/ir3: switch to shader_enums.h interp constants
A small step towards un-TGSI'ifying ir3.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Ilia Mirkin [Thu, 17 Sep 2015 02:17:18 +0000 (22:17 -0400)]
nv50,nvc0: flush texture cache in presence of coherent bufs
This fixes the newly-added arb_texture_buffer_object-bufferstorage
piglit test.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Tue, 15 Sep 2015 05:32:40 +0000 (01:32 -0400)]
nv50,nvc0: detect underlying resource changes and update tic
When updating texture buffers, we might end up replacing the whole
buffer. Check that the tic address matches the resource address, and if
not, update the tic and reupload it.
This fixes:
arb_direct_state_access-texture-buffer
arb_texture_buffer_object-data-sync
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Boyan Ding [Sun, 30 Aug 2015 07:07:33 +0000 (15:07 +0800)]
vc4: Try to pair up instructions when only one of them has PM bit
Instructions with difference in PM field can actually be paired up if
the one without PM doesn't do packing/unpacking and non-NOP
packing/unpacking operations from PM instruction aren't added to the
other without PM.
total instructions in shared programs: 48209 -> 47460 (-1.55%)
instructions in affected programs: 11688 -> 10939 (-6.41%)
Signed-off-by: Boyan Ding <boyan.j.ding@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Tue, 8 Sep 2015 23:45:57 +0000 (16:45 -0700)]
i965/vec4: Use nir_move_vec_src_uses_to_dest
The idea here is not that it gives register coalescing a little bit of a
helping hand. It doesn't actually fix the coalescing problems, but it
seems to help a good bit.
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs:
1746280 ->
1683959 (-3.57%)
instructions in affected programs:
1259166 ->
1196845 (-4.95%)
helped: 11363
HURT: 148
v2 (Jason Ekstrand):
- Run nir_move_vec_src_uses_to_dest after going out of SSA
- New shader-db numbers
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Jason Ekstrand [Tue, 8 Sep 2015 22:18:01 +0000 (15:18 -0700)]
nir: Add a pass to rewrite uses of vecN sources to the vecN destination
v2 (Jason Ekstrand):
- Handle non-SSA sources and destinations
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Jason Ekstrand [Mon, 14 Sep 2015 19:25:28 +0000 (12:25 -0700)]
nir: Add comments to nir_index_instrs and nir_index_ssa_defs
The provided indices have the very nice property that if A dominates B then
A->index <= B->index. We should document that somewhere.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 8 Sep 2015 23:43:51 +0000 (16:43 -0700)]
nir: Add a generic instruction index
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ulrich Weigand [Tue, 15 Sep 2015 13:23:26 +0000 (15:23 +0200)]
mesa: Fix texture compression on big-endian systems
Various pieces of code to create compressed textures will first
generate an uncompressed RGBA texture into a temporary buffer,
and then read from that buffer while creating the final compressed
texture in the requested format.
The code reading from the temporary buffer assumes the buffer is
formatted as an array of bytes in RGBA order. However, the buffer
is filled using a _mesa_texstore call with MESA_FORMAT_R8G8B8A8_UNORM
format -- this is defined as an array of *integers* holding the
RGBA values in packed format (least-significant to most-significant).
This means incorrect bytes are accessed on big-endian systems.
This patch fixes this by using the MESA_FORMAT_A8B8G8R8_UNORM format
instead on big-endian systems when filling the buffer. This fixes
about 100 piglit test case failures on s390x for me.
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Tested-by: Oded Gabbay <oded.gabbay@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@gmail.com>
Thomas Hellstrom [Wed, 16 Sep 2015 12:53:13 +0000 (05:53 -0700)]
st/xa: Use PIPE_FORMAT_R8_UNORM when available
XA has been using L8_UNORM for a8 and yuv component surfaces.
This commit instead makes XA prefer R8_UNORM since it's assumed to have a
higher availability.
Also neither of these formats are suitable as destination formats using
destination alpha blending, so reject those operations.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tapani Pälli [Tue, 1 Sep 2015 10:53:44 +0000 (13:53 +0300)]
mesa: return initial value for VALIDATE_STATUS if pipe not bound
From OpenGL 4.5 Core spec (7.13):
"If pipeline is a name that has been generated (without subsequent
deletion) by GenProgramPipelines, but refers to a program pipeline
object that has not been previously bound, the GL first creates a
new state vector in the same manner as when BindProgramPipeline
creates a new program pipeline object."
I interpret this as "If GetProgramPipelineiv gets called without a
bound (but valid) pipeline object, the state should reflect initial
state of a new pipeline object." This is also expected behaviour by
ES31-CTS.sepshaderobjs.PipelineApi conformance test.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Tapani Pälli [Tue, 1 Sep 2015 10:53:43 +0000 (13:53 +0300)]
mesa: return initial value for PROGRAM_SEPARABLE when not linked
From OpenGL ES 3.1 spec (7.12):
"Most properties set within program objects are specified not to
take effect until the next call to LinkProgram or ProgramBinary.
Some properties further require a successful call to either of
these commands before taking effect. GetProgramiv returns the
properties currently in effect for program, which may differ from
the properties set within program since the most recent call to
LinkProgram or ProgramBinary, which have not yet taken effect. If
there has been no such call putting changes to pname into effect,
initial values are returned."
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Tapani Pälli [Mon, 14 Sep 2015 07:46:01 +0000 (10:46 +0300)]
mesa: enable query of PROGRAM_PIPELINE_BINDING for ES 3.1
Specified in OpenGL ES 3.1 spec, Table 23.32: Program Object State.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
Timothy Arceri [Wed, 26 Aug 2015 12:18:36 +0000 (22:18 +1000)]
nir: support indirect indexing samplers in struct arrays
As a bonus we get indirect support for arrays of arrays for free.
V5: couple of small clean-ups suggested by Jason.
V4: fix struct member location caclulation, use nir_ssa_def rather than
nir_src for the indirect as suggested by Jason
V3: Use nir_instr_rewrite_src() with empty src rather then clearing
the use_link list directly for the old indirects as suggested by Jason
V2: Fixed validation error in debug build
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Timothy [Fri, 11 Sep 2015 21:33:27 +0000 (07:33 +1000)]
glsl: add helper for calculating offsets for struct members
V2: update comments
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Timothy Arceri [Tue, 1 Sep 2015 05:52:10 +0000 (15:52 +1000)]
glsl: make variables private
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Timothy Arceri [Sun, 30 Aug 2015 02:50:34 +0000 (12:50 +1000)]
glsl: store uniform slot id in var location field
This will allow us to access the uniform later on without resorting to
building a name string and looking it up in UniformHash.
V3: remove line wrap change from this patch
V2: store slot number for all non-UBO uniforms to make code more
consitent, renamed explicit_binding to explicit_location and added
comment about what it does. Store the location at every shader stage.
Updated data.location comments in ir/nir.h.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Timothy Arceri [Wed, 2 Sep 2015 01:29:11 +0000 (11:29 +1000)]
glsl: assign hidden uniforms their slot id earlier
This is required so that the next patch can safely assign the slot id
to the var.
The ids are now assigned in the order we want before allocating storage
so there is no need to sort the storage array and move things around.
V2: rename variable to make code easier to follow as suggested by Jason
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Timothy Arceri [Sun, 30 Aug 2015 02:49:46 +0000 (12:49 +1000)]
glsl: order indices for samplers inside a struct array
This allows the correct offset to be easily calculated for indirect
indexing when a struct array contains multiple samplers, or any crazy
nesting.
The indices for the folling struct will now look like this:
Sampler index: 0 Name: s[0].tex
Sampler index: 1 Name: s[1].tex
Sampler index: 2 Name: s[0].si.tex
Sampler index: 3 Name: s[1].si.tex
Sampler index: 4 Name: s[0].si.tex2
Sampler index: 5 Name: s[1].si.tex2
Before this change it looked like this:
Sampler index: 0 Name: s[0].tex
Sampler index: 3 Name: s[1].tex
Sampler index: 1 Name: s[0].si.tex
Sampler index: 4 Name: s[1].si.tex
Sampler index: 2 Name: s[0].si.tex2
Sampler index: 5 Name: s[1].si.tex2
struct S_inner {
sampler2D tex;
sampler2D tex2;
};
struct S {
sampler2D tex;
S_inner si;
};
uniform S s[2];
V3: Update comments with suggestions from Jason
V2: rename struct array counter to have better name
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Dave Airlie [Wed, 16 Sep 2015 20:55:31 +0000 (06:55 +1000)]
Revert "mesa/extensions: restrict GL_OES_EGL_image to GLES"
This reverts commit
48961fa3ba37999a6f8fd812458b735e39604a95.
glamor/Xwayland use this, the spec saying something when it
was written, and the fact that the comment says Mesa relies on it
hasn't changed.
I also don't have a copy of this patch in my mail archive, which
seems wierd, did it get posted to mesa-dev?
Signed-off-by: Dave Airlie <airlied@redhat.com>
Eric Anholt [Wed, 16 Sep 2015 19:51:00 +0000 (15:51 -0400)]
vc4: Only build in simulator mode if we find pkg-config for it.
This will let other developers build it x86 for build-testing purposes.
Ilia Mirkin [Wed, 16 Sep 2015 18:19:21 +0000 (14:19 -0400)]
freedreno/a3xx: use NUM_USER_CLIP_PLANES helper instead of magic number
Use the helper from the newly-updated generated header file.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Mon, 14 Sep 2015 05:59:01 +0000 (01:59 -0400)]
freedreno/a3xx: fix blending of L8 format
Even though luminance formats don't have alpha, we still want the alpha
output to go to the blender. This fixes the luminance blending tests.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Sun, 13 Sep 2015 23:50:45 +0000 (19:50 -0400)]
freedreno/a3xx: add support for dual-source blending
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Eric Anholt [Wed, 9 Sep 2015 17:23:55 +0000 (13:23 -0400)]
vc4: convert from tgsi semantic/index to varying-slot
(originally part of previous patch, split out to separate patch by Rob)
v2: squash in some fixes from Eric
v3: Another fix from Eric for point coords.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Eric Anholt [Tue, 4 Aug 2015 21:28:02 +0000 (14:28 -0700)]
gallium/ttn: Convert to using VARYING_SLOT_* / FRAG_RESULT_*.
This avoids exceeding the size of the .index bitfield since it got
truncated, and should make our NIR look more like the NIR that the rest of
the NIR developers are working on.
v2: split out vc4 updates, first patch uses varying_slot_to_tgsi_semantic()
helper, and second patch does the actual conversion.
v3: add frag_result_to_tgsi_semantic() helper and don't try to map
frag_results to semantic name/index as if they were varying_slot's
v4: use VERT_ATTRIB_ for VS inputs
v5: Fix vc4 build.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Ilia Mirkin [Tue, 15 Sep 2015 23:39:25 +0000 (19:39 -0400)]
nv50, nvc0: fix max texture buffer size to 128M elements
This is what the hardware supports, there never was any sort of 64K
limit.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Tue, 15 Sep 2015 23:32:10 +0000 (19:32 -0400)]
st/mesa: avoid integer overflows with buffers >= 512MB
This fixes failures with the newly-submitted max-size texture buffer
piglit test for GPUs exposing >= 128M max texels.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.6 11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Brian Paul [Tue, 15 Sep 2015 20:28:38 +0000 (14:28 -0600)]
mesa: move GL_APPLE_object_purgeable functions to new file
Move this code out of bufferobj.c since it's not strongly connected to
buffer objects.
Acked-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Tue, 15 Sep 2015 20:04:58 +0000 (14:04 -0600)]
mesa: remove trailing whitespace in bufferobj.c
Trivial.
Brian Paul [Tue, 15 Sep 2015 20:03:04 +0000 (14:03 -0600)]
mesa: whitespace, line wrap fixes in varray.c
Trivial.
Rob Clark [Tue, 15 Sep 2015 22:55:48 +0000 (18:55 -0400)]
nir/print: print symbolic names from shader-enum
v2: split out moving of FILE *fp into state structure into it's own
(more complete patch) to reduce the noise in this one
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Rob Clark [Tue, 15 Sep 2015 22:50:41 +0000 (18:50 -0400)]
nir/print: bit of state refactoring
Rename print_var_state to print_state, and stuff FILE ptr into the state
object. This avoids passing around an extra parameter everywhere.
v2: even more extensive conversion.. use state *everywhere* instead of
FILE ptr, and convert nir_print_instr() to use state as well
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Rob Clark [Fri, 11 Sep 2015 16:48:05 +0000 (12:48 -0400)]
glsl: shader-enum to name debug fxns
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Rob Clark [Fri, 4 Sep 2015 15:35:33 +0000 (11:35 -0400)]
freedreno: one screen to rule them all
Similar to
fee0686c21c631d96d6042741267a3c218c23ffc, but in this case to
ensure that drm_gralloc and libGLES_mesa are sharing a single screen.
Bumps libdrm_freedreno version dependency, as it requires the new
fd_device_fd() API.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 14 Sep 2015 15:54:05 +0000 (11:54 -0400)]
freedreno/ir3: use NIR to lower ffract instead of tgsi_lowering
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 14 Sep 2015 15:13:19 +0000 (11:13 -0400)]
nir: add lowering for ffract
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jordan Justen [Tue, 15 Sep 2015 21:01:17 +0000 (14:01 -0700)]
i965/fs: The barrier send uses only 1 payload register
When preparing the barrier payload, the instructions should operate in
simd8 mode since we only use 1 payload register.
fs_inst::regs_read is also updated to indicate that it only reads one
register for SHADER_OPCODE_BARRIER.
These issues were flagged by:
commit
cadd7dd384b33a779d46bd664f456bed4a21a5b7
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Thu Jul 2 15:41:02 2015 -0700
i965/fs: Add a very basic validation pass
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Jason Ekstrand [Tue, 15 Sep 2015 19:09:06 +0000 (12:09 -0700)]
nir/builder: Use a normal temporary array in nir_channel
C++ gets cranky if we take references of temporaries. This isn't a problem
yet in master because nir_builder is never used from C++. However, it will
be in the future so we should fix it now.
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 14 Sep 2015 19:15:06 +0000 (15:15 -0400)]
freedreno/a4xx: more texture formats
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Tue, 15 Sep 2015 21:25:47 +0000 (17:25 -0400)]
freedreno/a4xx: border-color support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Tue, 15 Sep 2015 21:25:25 +0000 (17:25 -0400)]
freedreno/a4xx: wire up texture clamp lowering
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Tue, 15 Sep 2015 13:23:21 +0000 (09:23 -0400)]
freedreno: helper for a3xx/a4xx border-colors
Both use the same layout for the buffer containing border-color values,
so rather than duplicating the logic in a4xx, split it out into a
helper.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 14 Sep 2015 20:59:36 +0000 (16:59 -0400)]
freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Jason Ekstrand [Thu, 10 Sep 2015 00:18:55 +0000 (17:18 -0700)]
nir/lower_vec_to_movs: Coalesce into destinations of fdot instructions
Now that we have a replicating fdot instruction, we can actually coalesce
into the destinations of vec4 instructions. We couldn't really do this
before because, if the destination had to end up in .z, we couldn't
reswizzle the instruction. With a replicated destination, the result ends
up in all channels so we can just set the writemask and we're done.
Shader-db results for vec4 programs on Haswell:
total instructions in shared programs:
1747753 ->
1746280 (-0.08%)
instructions in affected programs: 143274 -> 141801 (-1.03%)
helped: 667
HURT: 0
It turns out that dot-products matter...
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Jason Ekstrand [Thu, 10 Sep 2015 18:08:15 +0000 (11:08 -0700)]
i965/vec4: Use the replicated fdot instruction in NIR
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Jason Ekstrand [Thu, 10 Sep 2015 17:51:46 +0000 (10:51 -0700)]
nir: Add a fdot instruction that replicates the result to a vec4
Fortunately, nir_constant_expr already auto-splats if "dst" never shows up
in the constant expression field so we don't need to do anything there.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Jason Ekstrand [Wed, 9 Sep 2015 21:40:06 +0000 (14:40 -0700)]
nir/lower_vec_to_movs: Coalesce movs on-the-fly when possible
The old pass blindly inserted a bunch of moves into the shader with no
concern for whether or not it was really needed. This adds code to try and
coalesce into the destination of the instruction providing the value.
Shader-db results for vec4 shaders on Haswell:
total instructions in shared programs:
1754420 ->
1747753 (-0.38%)
instructions in affected programs: 231230 -> 224563 (-2.88%)
helped: 1017
HURT: 2
This approach is heavily based on a different patch by Eduardo Lima Mitev
<elima@igalia.com>. Eduardo's patch did this in a separate pass as opposed
to integrating it into nir_lower_vec_to_movs.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Jason Ekstrand [Wed, 9 Sep 2015 21:47:28 +0000 (14:47 -0700)]
nir/lower_vec_to_movs: Get rid of start_idx and swizzle compacting
Previously, we did this thing with keeping track of a separate start_idx
which was different from the iteration variable. I think this was a relic
of the way that GLSL IR implements writemasks. In NIR, if a given bit in
the writemask is unset then that channel is just "unused", not missing. In
particular, a vec4 operation with a writemask of 0xd will use sources 0, 2,
and 3 and leave source 1 alone. We can simplify things a good deal (and
make them correct) by removing this "compacting" step.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Wed, 9 Sep 2015 20:55:39 +0000 (13:55 -0700)]
i965/vec4_nir: Use partial SSA form rather than full non-SSA
We made this switch in the FS backend some time ago and it seems to make a
number of things a bit easier. In particular, supporting SSA values takes
very little work in the backend and allows us to take advantage of the
majority of the SSA information even after we've gotten rid of Phi nodes.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Jason Ekstrand [Wed, 9 Sep 2015 20:42:14 +0000 (13:42 -0700)]
nir/lower_vec_to_movs: Handle partially SSA shaders
v2 (Jason Ekstrand):
- Use nir_instr_rewrite_dest
- Pass the impl directly into lower_vec_to_movs_block
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Jason Ekstrand [Wed, 9 Sep 2015 19:58:58 +0000 (12:58 -0700)]
nir/lower_vec_to_movs: Pass the shader around directly
Previously, we were passing the shader around, we were just calling it
"mem_ctx". However, the nir_shader is (and must be for the purposes of
mark-and-sweep) the mem_ctx so we might as well pass it around explicitly.
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Jason Ekstrand [Thu, 2 Jul 2015 22:41:02 +0000 (15:41 -0700)]
i965/fs: Add a very basic validation pass
Currently the validation pass only validates that regs_read and
regs_written are consistent with the sizes of VGRF's. We can add more as
we find it to be useful.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Mon, 14 Sep 2015 22:36:24 +0000 (15:36 -0700)]
i965/fs_surface_builder: Only apply predicate to components that exist
In certain conditions, we have to do bounds-checking in the shader for
image_load_store. The way this works for image loads is that we do a
predicated load and then emit a series of selects, one per component,
that gives us 0 or the loaded value depending on whether or not you're
in bounds. However, we were hard-coding 4 components which may not be
correct. Instead, we should be using size which is the number of
components read.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Jason Ekstrand [Mon, 14 Sep 2015 21:18:13 +0000 (14:18 -0700)]
i965/fs: Only read output_components many components when writing an output
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Jason Ekstrand [Mon, 14 Sep 2015 22:09:00 +0000 (15:09 -0700)]
i965/fs: Set output_components for lowered clip distance outputs
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Nanley Chery [Thu, 27 Aug 2015 23:05:22 +0000 (16:05 -0700)]
mesa/teximage: restrict GL_ETC1_RGB8_OES support to GLES
According to the extensions table and our glext headers,
OES_compressed_ETC1_RGB8_texture is only supported in
GLES1 and GLES2. Since we may give users a GLES3 context
when a GLES2 context is requested, we also allow this
extension for GLES3 as well.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Nanley Chery [Thu, 10 Sep 2015 17:48:46 +0000 (10:48 -0700)]
mesa/extensions: restrict GL_OES_EGL_image to GLES
Driver vendors do this as well. The extension specification
lists GLES 1.1 or 2.0 as requirements.
Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Nanley Chery [Thu, 27 Aug 2015 23:05:22 +0000 (16:05 -0700)]
mesa/extensions: restrict luminance alpha formats to API_OPENGL_COMPAT
According the GL 3.1 spec, luminance alpha formats are deprecated.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Thomas Hellstrom [Tue, 15 Sep 2015 06:40:07 +0000 (23:40 -0700)]
gallium/svga: Enable PIPE_FORMAT_L8_UNORM for vgpu10
It's extensively used by XA for a8- and planar yuv component surfaces.
This fixes broken XA yuv blits using vgpu10 contexts.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Emil Velikov [Thu, 10 Sep 2015 13:41:38 +0000 (14:41 +0100)]
egl/dri2: don't leak the fd on dri2_terminate
Currently the check was incorrect as it did not consider the (unlikely)
case of fd == 0. In order to fix this we should first correctly
initialize it to -1, as the swrast implementations leave it set to zero
(props to calloc()).
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Boyan Ding <boyan.j.ding@gmail.com>