mesa.git
9 years ago radeonsi: add support for NULL texture sampler views that return (0,0,0,1)
Marek Olšák [Sun, 1 Feb 2015 12:16:45 +0000 (13:16 +0100)]
radeonsi: add support for NULL texture sampler views that return (0,0,0,1)

This used to hang.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago radeonsi: fix a crash when binding a NULL sampler view list
Marek Olšák [Sun, 1 Feb 2015 12:16:06 +0000 (13:16 +0100)]
radeonsi: fix a crash when binding a NULL sampler view list

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago radeonsi: move the buffer descriptor to the end of the image descriptor
Marek Olšák [Sun, 1 Feb 2015 15:58:08 +0000 (16:58 +0100)]
radeonsi: move the buffer descriptor to the end of the image descriptor

This will allow supporting NULL textures.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago radeonsi: don't use tgsi_parse_context to get processor type
Marek Olšák [Sat, 31 Jan 2015 16:31:23 +0000 (17:31 +0100)]
radeonsi: don't use tgsi_parse_context to get processor type

Also remove unused "tokens".

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago radeonsi: fix instanced arrays with non-zero start instance
Marek Olšák [Sat, 31 Jan 2015 18:00:44 +0000 (19:00 +0100)]
radeonsi: fix instanced arrays with non-zero start instance

Fixes piglit ARB_base_instance/arb_base_instance-drawarrays.

Cc: 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago r600g,radeonsi: don't append to streamout buffers that haven't been used yet
Marek Olšák [Sun, 1 Feb 2015 12:47:01 +0000 (13:47 +0100)]
r600g,radeonsi: don't append to streamout buffers that haven't been used yet

The FILLED_SIZE counter is uninitialized at the beginning, so we can't use it.
Instead, use offset = 0, which is what we always do when not appending.

This unexpectedly fixes spec/ARB_texture_multisample/sample-position/*.
Yes, the test does use transform feedback.

Cc: 10.3 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago gallium: set PIPE_MAX_SAMPLERS to 18
Marek Olšák [Sat, 31 Jan 2015 17:58:19 +0000 (18:58 +0100)]
gallium: set PIPE_MAX_SAMPLERS to 18

So that drivers using higher slots don't crash in tgsi_shader_info.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years ago gallium/u_pstipple: add ability to specify a fixed texture unit
Marek Olšák [Sat, 31 Jan 2015 17:56:54 +0000 (18:56 +0100)]
gallium/u_pstipple: add ability to specify a fixed texture unit

E.g. r600g can use slot 17, which is outside of the API range.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years ago gallium/util: add u_bit_scan64
Marek Olšák [Sat, 31 Jan 2015 16:15:16 +0000 (17:15 +0100)]
gallium/util: add u_bit_scan64

Same as u_bit_scan, but for uint64_t.
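
A minimal sketch of what such a 64-bit bit-scan helper could look like. The
names bit_scan64 and sum_set_bit_indices are hypothetical, and the use of the
GCC/Clang builtin __builtin_ctzll is an assumption, not necessarily what the
patch does:

```c
#include <stdint.h>

/* Returns the index of the lowest set bit in *mask and clears that bit.
 * The caller must make sure *mask is non-zero.
 * __builtin_ctzll is a GCC/Clang builtin (assumption for this sketch). */
static inline int
bit_scan64(uint64_t *mask)
{
   int i = __builtin_ctzll(*mask);
   *mask &= ~(UINT64_C(1) << i);
   return i;
}

/* Typical use: visit every set bit of a 64-bit mask exactly once. */
static inline int
sum_set_bit_indices(uint64_t mask)
{
   int sum = 0;
   while (mask)
      sum += bit_scan64(&mask);
   return sum;
}
```

The loop form is the usual reason for such a helper: it iterates only over
the set bits instead of testing all 64 positions.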

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years ago tgsi: add tgsi_get_processor_type helper from radeon
Marek Olšák [Sat, 31 Jan 2015 16:17:05 +0000 (17:17 +0100)]
tgsi: add tgsi_get_processor_type helper from radeon

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years ago i965/fs: Fix saturate on MAD and LRP with the NIR backend.
Kenneth Graunke [Tue, 3 Feb 2015 08:50:23 +0000 (00:50 -0800)]
i965/fs: Fix saturate on MAD and LRP with the NIR backend.

Fixes misrendering in "Witcher 2" with INTEL_USE_NIR=1, and probably
many other programs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago mesa: Fix _mesa_format_convert fallback path when src is not an array format
Iago Toral Quiroga [Mon, 2 Feb 2015 12:59:27 +0000 (13:59 +0100)]
mesa: Fix _mesa_format_convert fallback path when src is not an array format

When a rebase swizzle is provided and we call _mesa_swizzle_and_convert
after unpacking the source format we were always passing normalized=false.
We should pass true or false depending on the formats involved in the
conversion for the byte and float paths (the integer path cannot ever be
normalized).

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Mark Janes <mark.a.janes@intel.com>
9 years ago st/osmesa: Fix osbuffer->textures indexing
Park, Jeongmin [Tue, 3 Feb 2015 02:52:03 +0000 (11:52 +0900)]
st/osmesa: Fix osbuffer->textures indexing

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88930
Cc: 10.4 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years ago i965/nir: use redundant phi optimization
Connor Abbott [Tue, 3 Feb 2015 06:50:49 +0000 (01:50 -0500)]
i965/nir: use redundant phi optimization

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
9 years ago nir: add an optimization to remove useless phi nodes
Connor Abbott [Tue, 3 Feb 2015 06:49:44 +0000 (01:49 -0500)]
nir: add an optimization to remove useless phi nodes

This removes phi nodes whose sources all point to the same thing.
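
The check can be illustrated with a toy model. The struct toy_phi and the
function toy_phi_is_redundant below are hypothetical stand-ins, not NIR data
structures; sources are modeled as plain SSA value ids:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an SSA phi node. */
struct toy_phi {
   int def_id;           /* SSA value defined by this phi */
   size_t num_srcs;      /* one source per predecessor block */
   const int *src_ids;   /* SSA value read from each predecessor */
};

/* Returns true and stores the common source in *replacement when every
 * source of the phi refers to the same SSA value.  In that case all uses
 * of phi->def_id could be rewritten to *replacement and the phi dropped. */
static bool
toy_phi_is_redundant(const struct toy_phi *phi, int *replacement)
{
   if (phi->num_srcs == 0)
      return false;

   for (size_t i = 1; i < phi->num_srcs; i++) {
      if (phi->src_ids[i] != phi->src_ids[0])
         return false;
   }

   *replacement = phi->src_ids[0];
   return true;
}
```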

Shader-db results:

total NIR instructions in shared programs: 2045293 -> 2041209 (-0.20%)
NIR instructions in affected programs:     126564 -> 122480 (-3.23%)
helped:                                615
HURT:                                  0

total FS instructions in shared programs: 4321840 -> 4320392 (-0.03%)
FS instructions in affected programs:     24622 -> 23174 (-5.88%)
helped:                                138
HURT:                                  0

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Tested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
9 years ago nir/validate: Ensure that phi sources are SSA-only
Jason Ekstrand [Tue, 3 Feb 2015 18:10:59 +0000 (10:10 -0800)]
nir/validate: Ensure that phi sources are SSA-only

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years ago nir/validate: Validate that only float ALU outputs are saturated
Jason Ekstrand [Tue, 3 Feb 2015 20:42:07 +0000 (12:42 -0800)]
nir/validate: Validate that only float ALU outputs are saturated

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years ago nir/lower_source_mods: Don't lower saturate for non-float outputs
Jason Ekstrand [Tue, 3 Feb 2015 20:41:36 +0000 (12:41 -0800)]
nir/lower_source_mods: Don't lower saturate for non-float outputs

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years ago i965/fs_nir: Get rid of get_alu_src
Jason Ekstrand [Thu, 22 Jan 2015 00:00:55 +0000 (16:00 -0800)]
i965/fs_nir: Get rid of get_alu_src

Originally, get_alu_src was supposed to handle resolving swizzles and
things like that.  However, now that basically every instruction we have
only takes scalar sources, we don't really need it anymore.  The only case
where it's still marginally useful is for the mov and vecN operations that
are left over from SSA form.  We can handle those cases as a special case
easily enough.  As a side-effect, we don't need the vec_to_movs pass
anymore.

v2 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Rework the way we detect if we need an extra copy for swizzling.  The
   old code involved a pile of confusing switch fall-throughs; we now use a
   loop.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago i965/fs: Use NIR's scalarizing abilities and stop handling vectors
Jason Ekstrand [Tue, 23 Dec 2014 22:44:19 +0000 (14:44 -0800)]
i965/fs: Use NIR's scalarizing abilities and stop handling vectors

Now that we can scalarize with NIR, there's no need for all this code
anymore.  Let's get rid of it and just do scalar operations.

v2: run copy prop before lowering phi nodes

v3: Get rid of the "emit(...)->saturate = foo" pattern

v4: Run alu_to_scalar as an optimization pass

total instructions in shared programs: 5998321 -> 5974070 (-0.40%)
instructions in affected programs:     732075 -> 707824 (-3.31%)
helped:                                3137
HURT:                                  191
GAINED:                                18
LOST:                                  0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago nir: Add a pass to lower vector phi nodes to scalar phi nodes
Jason Ekstrand [Wed, 21 Jan 2015 23:23:32 +0000 (15:23 -0800)]
nir: Add a pass to lower vector phi nodes to scalar phi nodes

v2 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Add better comments
 - Use nir_ssa_dest_init and nir_src_for_ssa more places
 - Fix some void * casts

v3 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Rework the way we determine whether or not to scalarize a phi node to
   make the recursion non-bogus
 - Treat load_const instructions as scalarizable

v4 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Allow uniform and input loads to be scalarizable

v5 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Also consider loads of inputs (varying, uniform, or ubo) to be
   scalarizable.  We were already doing this for load_var on uniforms and
   inputs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago i965/fs: Add support for constant propagating into sources with modifiers.
Matt Turner [Thu, 29 Jan 2015 02:37:32 +0000 (18:37 -0800)]
i965/fs: Add support for constant propagating into sources with modifiers.

All but 16 of the programs helped were ARB fp programs.

total instructions in shared programs: 5949286 -> 5945470 (-0.06%)
instructions in affected programs:     275162 -> 271346 (-1.39%)
helped:                                1197
GAINED:                                1

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
9 years ago i965/vec4: Use abs/negate functions in const propagation.
Matt Turner [Fri, 30 Jan 2015 23:13:48 +0000 (15:13 -0800)]
i965/vec4: Use abs/negate functions in const propagation.

No changes in shader-db.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
9 years ago i965: Add function to take the abs of immediates.
Matt Turner [Fri, 30 Jan 2015 22:14:43 +0000 (14:14 -0800)]
i965: Add function to take the abs of immediates.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
9 years ago i965: Add function to negate immediates.
Matt Turner [Thu, 29 Jan 2015 19:15:10 +0000 (11:15 -0800)]
i965: Add function to negate immediates.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
9 years ago i965: Mark UB/B immediates as unreachable.
Matt Turner [Thu, 29 Jan 2015 19:16:43 +0000 (11:16 -0800)]
i965: Mark UB/B immediates as unreachable.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
9 years ago gallium/util: Don't use __builtin_clrsb in util_last_bit().
Matt Turner [Tue, 3 Feb 2015 01:26:49 +0000 (17:26 -0800)]
gallium/util: Don't use __builtin_clrsb in util_last_bit().

Unclear circumstances lead to undefined symbols on x86.
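
For illustration, a "last set bit" query can be written without
__builtin_clrsb along these lines.  The names last_bit and last_bit_signed
are hypothetical, and the use of __builtin_clz (GCC/Clang) is an assumption
about the replacement, not a copy of the patch:

```c
#include <limits.h>

/* Returns the 1-based position of the highest set bit, or 0 if u == 0. */
static inline unsigned
last_bit(unsigned u)
{
   const unsigned bits = sizeof(unsigned) * CHAR_BIT;
   return u ? bits - (unsigned)__builtin_clz(u) : 0;
}

/* Signed variant: for negative values, look for the highest bit that
 * differs from the sign bit by scanning the bitwise complement. */
static inline unsigned
last_bit_signed(int i)
{
   return i >= 0 ? last_bit((unsigned)i) : last_bit(~(unsigned)i);
}
```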

Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=536916
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years ago glsl/list: Note that exec_lists may not be realloc'd.
Matt Turner [Tue, 3 Feb 2015 01:23:25 +0000 (17:23 -0800)]
glsl/list: Note that exec_lists may not be realloc'd.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago st/mesa: mark constant array of swizzles as static const
Nils Wallménius [Thu, 22 Jan 2015 19:47:28 +0000 (20:47 +0100)]
st/mesa: mark constant array of swizzles as static const

This saves about 0.5k in the text section for a gallium driver
on amd64.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years ago mesa: Returns a GL_INVALID_VALUE error on several APIs when buffer size is negative
Eduardo Lima Mitev [Wed, 21 Jan 2015 14:32:47 +0000 (15:32 +0100)]
mesa: Returns a GL_INVALID_VALUE error on several APIs when buffer size is negative

Section 2.3.1 (Errors) of the OpenGL 4.5 spec says:

    "If a negative number is provided where an argument of type sizei or
    sizeiptr is specified, an INVALID_VALUE error is generated."

This patch adds checks for negative buffer size values passed to different APIs.
It also moves up the check on other APIs that already had it, making it the first
error check performed in the function, for consistency.

While there may be other APIs throughout the code lacking this check (or at least
not at the beginning of the function), this patch focuses on the cases that break
the dEQP tests listed below. It could be a good exercise for the future to check
all other cases, and improve consistency in the order of the checks throughout the
whole Mesa code base.
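
The ordering described above can be sketched as follows.  toy_get_source and
the TOY_* names are hypothetical and only model the pattern; they are not
Mesa entry points.  The two error constants carry the numeric values of
GL_INVALID_VALUE and GL_INVALID_OPERATION:

```c
#include <stddef.h>

#define TOY_NO_ERROR          0
#define TOY_INVALID_VALUE     0x0501  /* value of GL_INVALID_VALUE */
#define TOY_INVALID_OPERATION 0x0502  /* value of GL_INVALID_OPERATION */

/* Copies at most buf_size - 1 characters of src into out and
 * NUL-terminates it.  Returns the error that would be recorded. */
static unsigned
toy_get_source(const char *src, int buf_size, char *out)
{
   /* The negative-size check comes first, before any other validation. */
   if (buf_size < 0)
      return TOY_INVALID_VALUE;

   /* Other checks (here: a missing object) only run afterwards. */
   if (src == NULL)
      return TOY_INVALID_OPERATION;

   if (buf_size > 0) {
      int i = 0;
      while (i < buf_size - 1 && src[i] != '\0') {
         out[i] = src[i];
         i++;
      }
      out[i] = '\0';
   }
   return TOY_NO_ERROR;
}
```

With this ordering, a call that is wrong in two ways (negative size and a
missing object) reports INVALID_VALUE.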

This fixes 5 dEQP tests:
* dEQP-GLES3.functional.negative_api.state.get_attached_shaders
* dEQP-GLES3.functional.negative_api.state.get_shader_source
* dEQP-GLES3.functional.negative_api.state.get_active_uniform
* dEQP-GLES3.functional.negative_api.state.get_active_attrib
* dEQP-GLES3.functional.negative_api.shader.program_binary

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years ago mesa: fix error value in GetFramebufferAttachmentParameteriv for OpenGL ES 3.0
Samuel Iglesias Gonsalvez [Fri, 16 Jan 2015 15:00:13 +0000 (16:00 +0100)]
mesa: fix error value in GetFramebufferAttachmentParameteriv for OpenGL ES 3.0

Section 6.1.13 "Framebuffer Object Queries" of OpenGL ES 3.0 spec:

 "If the default framebuffer is bound to target, then attachment must be
  BACK, identifying the color buffer; DEPTH, identifying the depth buffer; or
  STENCIL, identifying the stencil buffer."

OpenGL ES 3.0, section 2.5 (GL Errors):

 "If a command that requires an enumerated value is passed a
  symbolic constant that is not one of those specified as allowable
  for that command, an INVALID_ENUM error is generated."

Then change the returned error to INVALID_ENUM.

Fixes:

dEQP-GLES3.functional.fbo.api.attachment_query_default_fbo

Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years ago glsl: Improve precision of mod(x,y)
Iago Toral Quiroga [Tue, 20 Jan 2015 16:09:59 +0000 (17:09 +0100)]
glsl: Improve precision of mod(x,y)

Currently, Mesa uses the lowering pass MOD_TO_FRACT to implement
mod(x,y) as y * fract(x/y). This implementation has a downside though:
it introduces precision errors due to the fract() operation. Even worse,
since the result of fract() is multiplied by y, the larger y gets the
larger the precision error we produce, so for large enough numbers the
precision loss is significant. Some examples on i965:

Operation                           Precision error
-----------------------------------------------------
mod(-1.951171875, 1.9980468750)      0.0000000447
mod(121.57, 13.29)                   0.0000023842
mod(3769.12, 321.99)                 0.0000762939
mod(3769.12, 1321.99)                0.0001220703
mod(-987654.125, 123456.984375)      0.0160663128
mod( 987654.125, 123456.984375)      0.0312500000

This patch replaces the current lowering pass with a different one
(MOD_TO_FLOOR) that follows the recommended implementation in the GLSL
man pages:

mod(x,y) = x - y * floor(x/y)

This implementation eliminates the precision errors at the expense of
an additional add instruction on some systems. On systems that can do
negate with multiply-add in a single operation this new implementation
would come at no additional cost.
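
The two formulas can be compared side by side in plain C.  This is only a
CPU-side sketch of the arithmetic, not the lowering pass itself; toy_floorf
is a hypothetical stand-in for floorf() so that the snippet has no libm
dependency, and it is only valid for values that fit in a long long:

```c
/* Stand-in for floorf(): truncate toward zero, then step down by one
 * when truncation rounded a negative, non-integer value up. */
static float
toy_floorf(float v)
{
   float t = (float)(long long)v;
   return (t > v) ? t - 1.0f : t;
}

/* Old lowering (MOD_TO_FRACT): mod(x, y) = y * fract(x / y),
 * where fract(v) = v - floor(v). */
static float
mod_via_fract(float x, float y)
{
   float q = x / y;
   return y * (q - toy_floorf(q));
}

/* New lowering (MOD_TO_FLOOR): mod(x, y) = x - y * floor(x / y). */
static float
mod_via_floor(float x, float y)
{
   return x - y * toy_floorf(x / y);
}
```

In the fract form the rounding error of the fractional part is scaled by y,
which matches the growth of the error with y described above.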

v2 (Ian Romanick)
- Do not clone operands because when they are expressions we would be
duplicating them and that can lead to suboptimal code.

Fixes the following 16 dEQP tests:
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.mediump_*
dEQP-GLES3.functional.shaders.builtin_functions.precision.mod.highp_*

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years ago mesa: Allow querying for GL_PRIMITIVE_RESTART_FIXED_INDEX under GLES 3
Eduardo Lima Mitev [Tue, 20 Jan 2015 12:58:45 +0000 (13:58 +0100)]
mesa: Allow querying for GL_PRIMITIVE_RESTART_FIXED_INDEX under GLES 3

GLES 3.0.0 spec introduces context state PRIMITIVE_RESTART_FIXED_INDEX
(2.8.1 Transferring Array Elements, page 26) which is not currently
possible to query using glGet*() funcs.

Fixes 4 dEQP tests:
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getboolean
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getinteger64
* dEQP-GLES3.functional.state_query.boolean.primitive_restart_fixed_index_getfloat

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years ago glsl: can't have 'const' qualifier used with struct or interface block members
Iago Toral Quiroga [Mon, 19 Jan 2015 11:32:12 +0000 (12:32 +0100)]
glsl: can't have 'const' qualifier used with struct or interface block members

Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_const_fragment

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years ago glsl: interface blocks must be declared at global scope
Iago Toral Quiroga [Mon, 19 Jan 2015 11:32:10 +0000 (12:32 +0100)]
glsl: interface blocks must be declared at global scope

Fixes the following 2 dEQP tests:
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_vertex
dEQP-GLES3.functional.shaders.declarations.invalid_declarations.uniform_block_in_main_fragment

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years ago i965: Fix negate with unsigned integers
Iago Toral Quiroga [Mon, 19 Jan 2015 11:32:09 +0000 (12:32 +0100)]
i965: Fix negate with unsigned integers

For code such as:

uint tmp1 = uint(in0);
uint tmp2 = -tmp1;
float out0 = float(tmp2);

We produce code like:
mov(8)    g5<1>.xF    -g9<4,4,1>.xUD

which does not produce correct results. This code produces the
results we would expect if tmp1 and tmp2 were signed integers
instead.

It seems that a similar problem was detected and addressed when
using negations with unsigned integers as part of conditionals, but
it looks like the problem has a wider impact than that.

This patch fixes the problem by preventing copy-propagation of
negated UD registers in all scenarios, not only in conditionals.
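
The difference between the two interpretations can be reproduced on the CPU.
The function names below are hypothetical and only model the arithmetic, not
the generated i965 code:

```c
#include <stdint.h>

/* What the shader asks for: negate in unsigned arithmetic (the result
 * wraps modulo 2^32), then convert the unsigned result to float. */
static float
negate_as_unsigned(uint32_t v)
{
   uint32_t neg = 0u - v;
   return (float)neg;
}

/* What the commit message says the folded negate behaved like: the
 * result one would expect if the values were signed integers. */
static float
negate_as_signed(uint32_t v)
{
   return -(float)(int32_t)v;
}
```

For an input of 3 the first function yields a value around 4.29e9 while the
second yields -3.0, which is why the negate modifier cannot simply be
propagated into a source of unsigned type.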

Fixes the following 24 dEQP tests:

dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uint_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec2_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec3_*
dEQP-GLES3.functional.shaders.operator.unary_operator.minus.*_uvec4_*

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years ago scons: Fix Windows builds with LLVM 3.5.
Jose Fonseca [Tue, 3 Feb 2015 10:16:50 +0000 (10:16 +0000)]
scons: Fix Windows builds with LLVM 3.5.

LLVMBitReader dependency was introduced, as pointed out by Rob Conde.

9 years ago st/mesa: add EXT_polygon_offset_clamp support
Ilia Mirkin [Wed, 31 Dec 2014 07:20:51 +0000 (02:20 -0500)]
st/mesa: add EXT_polygon_offset_clamp support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years ago gallium: add a cap to determine whether the driver supports offset_clamp
Ilia Mirkin [Sun, 1 Feb 2015 14:01:50 +0000 (09:01 -0500)]
gallium: add a cap to determine whether the driver supports offset_clamp

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years ago i965/gen6+: enable EXT_polygon_offset_clamp
Ilia Mirkin [Wed, 31 Dec 2014 07:15:23 +0000 (02:15 -0500)]
i965/gen6+: enable EXT_polygon_offset_clamp

Replace the hard-coded 0's with the context clamp value.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago mesa: add support for GL_EXT_polygon_offset_clamp
Ilia Mirkin [Wed, 31 Dec 2014 07:07:55 +0000 (02:07 -0500)]
mesa: add support for GL_EXT_polygon_offset_clamp

Nothing enables the extension yet, but the values are now available.
The spec calls for it to only be exposed for GL 3.3+, which is core-only
in mesa. Instead we allow any driver to enable it, including in a compat
context for any GL version.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
9 years ago glapi: add GL_EXT_polygon_offset_clamp
Ilia Mirkin [Wed, 31 Dec 2014 06:47:15 +0000 (01:47 -0500)]
glapi: add GL_EXT_polygon_offset_clamp

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
9 years ago glsl: Pick ast_conditional branch regardless of op1/2 being constant.
Kenneth Graunke [Fri, 22 Aug 2014 04:49:07 +0000 (21:49 -0700)]
glsl: Pick ast_conditional branch regardless of op1/2 being constant.

If the ?: operator's condition is a constant value, and both branches
are pure expressions, we can just make the resulting value one or the
other.

Previously, we only did this if op[1] and op[2] were also constant
values - but there's no actual reason for that restriction.

No changes in shader-db, probably because we usually optimize this later
anyway.  But it does make us generate less stupid code up front.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago i965: Add a better PRM citation for the IMS dimension mangling.
Kenneth Graunke [Wed, 6 Aug 2014 08:08:19 +0000 (01:08 -0700)]
i965: Add a better PRM citation for the IMS dimension mangling.

Paul originally had to reverse engineer these formulas based on the
description about how the sampler works.  The description here is not
the easiest to follow - especially given that it's from the Sandybridge
era, when the hardware only did 4x multisampling.

Jordan and I recently found another part of the documentation where they
simply state that IMS dimensions must be adjusted by a set of formulas.
Quoting this section provides an easy to follow explanation for the
code, including 2x/4x/8x/16x.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
9 years ago swrast: Whitespace fixes.
Laura Ekstrand [Mon, 2 Feb 2015 18:20:57 +0000 (10:20 -0800)]
swrast: Whitespace fixes.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years ago DD: Refactor BlitFramebuffer.
Laura Ekstrand [Fri, 30 Jan 2015 22:03:53 +0000 (14:03 -0800)]
DD: Refactor BlitFramebuffer.

In preparation for glBlitNamedFramebuffer, the DD table function
BlitFramebuffer needs to accept two arbitrary framebuffer objects rather
than assuming ctx->ReadBuffer and ctx->DrawBuffer.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years ago GL: Update glext.h to Khronos Revision 29537.
Laura Ekstrand [Fri, 23 Jan 2015 21:43:16 +0000 (13:43 -0800)]
GL: Update glext.h to Khronos Revision 29537.

Khronos Revision 29537 fixes ARB_direct_state_access function prototypes that
had GLsizei where they should have had GLsizeiptr. This mainly affects
functions related to buffer objects.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years ago i965: Don't use tiled_memcpy to download from RGBX or BGRX surfaces
Jason Ekstrand [Mon, 2 Feb 2015 17:49:44 +0000 (09:49 -0800)]
i965: Don't use tiled_memcpy to download from RGBX or BGRX surfaces

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88841

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years ago dir-locals.el: Don't set variables for non-programming modes
Neil Roberts [Sat, 31 Jan 2015 16:45:09 +0000 (17:45 +0100)]
dir-locals.el: Don't set variables for non-programming modes

This limits the style changes to modes inherited from prog-mode. The
main reason to do this is to avoid setting fill-column for people
using Emacs to edit commit messages because 78 characters is too many
to make it wrap properly in git log. Note that makefile-mode also
inherits from prog-mode so the fill column should continue to apply
there.

v2: Apply to all the .dir-locals.el files, not just the one in the
    root directory.

Acked-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago i965: Fix intel_miptree_copy_teximage for GL_TEXTURE_1D_ARRAY
Iago Toral Quiroga [Fri, 30 Jan 2015 08:03:57 +0000 (09:03 +0100)]
i965: Fix intel_miptree_copy_teximage for GL_TEXTURE_1D_ARRAY

For GL_TEXTURE_1D_ARRAY targets we store the depth of the array
in the Height field and leave Depth=1 in the underlying texture
object. When we call intel_miptree_copy_teximage in the process
of re-creating a miptree (possibly because the number of miplevels
has changed) we didn't account for this, so we were only copying
texture images for the first slice.
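
A minimal sketch of the slice-count rule this implies.  struct toy_image and
toy_num_slices are hypothetical and much simpler than the real texture
objects; TOY_TEXTURE_1D_ARRAY carries the numeric value of
GL_TEXTURE_1D_ARRAY:

```c
#define TOY_TEXTURE_1D_ARRAY 0x8C18  /* value of GL_TEXTURE_1D_ARRAY */

/* Hypothetical, reduced view of a texture image's dimensions. */
struct toy_image {
   unsigned target;
   unsigned width;
   unsigned height;  /* holds the array length for 1D array targets */
   unsigned depth;   /* stays at 1 for 1D array targets */
};

/* Number of slices a copy has to walk: 1D arrays keep their layer
 * count in the height field, every other target in the depth field. */
static unsigned
toy_num_slices(const struct toy_image *img)
{
   return img->target == TOY_TEXTURE_1D_ARRAY ? img->height : img->depth;
}
```

A copy loop that always iterates over depth would stop after one slice for a
1D array, which is the symptom described above.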

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years ago vc4: Kill a bunch of color write calculation when colormask is all off.
Eric Anholt [Sun, 1 Feb 2015 22:09:12 +0000 (14:09 -0800)]
vc4: Kill a bunch of color write calculation when colormask is all off.

I could have done this in the bit that generates the ANDs and ORs, but
it's probably generally useful.  Sadly, I still need this even if I move
to NIR, because I can't yet express my read of the destination color in
NIR, which I would need to move my blend/logicop/colormask handling into
NIR.

total uniforms in shared programs: 13497 -> 13455 (-0.31%)
uniforms in affected programs:     101 -> 59 (-41.58%)
total instructions in shared programs: 40797 -> 40296 (-1.23%)
instructions in affected programs:     1639 -> 1138 (-30.57%)

9 years ago docs: Update ARB_direct_state_access
Fredrik Höglund [Sun, 1 Feb 2015 21:53:40 +0000 (22:53 +0100)]
docs: Update ARB_direct_state_access

Mark vertex array objects as started.

9 years ago doc: break down ARB_direct_state_access in GL3.txt
Martin Peres [Thu, 29 Jan 2015 14:54:08 +0000 (16:54 +0200)]
doc: break down ARB_direct_state_access in GL3.txt

A student was wondering what was going on + I started working on it too.

CC: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
9 years ago vc4: Dump the VPM read index in QIR disasm.
Eric Anholt [Fri, 30 Jan 2015 19:23:26 +0000 (11:23 -0800)]
vc4: Dump the VPM read index in QIR disasm.

Since the VPM reads have to be in order, it's useful to see their indices
in the dump.

9 years ago i965/pixel_read: Don't try to do a tiled_memcpy from a multisampled buffer
Jason Ekstrand [Sat, 31 Jan 2015 02:47:59 +0000 (18:47 -0800)]
i965/pixel_read: Don't try to do a tiled_memcpy from a multisampled buffer

The GL spec guarantees that glGetTexImage will never get a multisampled
texture, but this is not true for glReadPixels.  If we get a multisampled
buffer, we have to do a multisample resolve on it before we can pull the
data down for the user.  Since this isn't practical to handle in
tiled_memcpy, we just fall back to the other paths that can handle this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago i965: Enable L3 caching of buffer surfaces.
Francisco Jerez [Tue, 16 Dec 2014 14:11:57 +0000 (16:11 +0200)]
i965: Enable L3 caching of buffer surfaces.

And remove the mocs argument of the emit_buffer_surface_state vtbl hook.  Its
semantics vary greatly from one generation to another, so it kind of
encourages the caller to pass 0 which is the only valid setting across
generations.  After this commit the hardware-specific code decides what the
best cacheability settings are for buffer surfaces, just like we do for
textures.

This together with some additional changes coming is expected to improve
performance of pull constants, buffer textures, atomic counters and image
objects on Gen7 and up.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago egl: Pass the correct X visual depth to xcb_put_image().
José Fonseca [Mon, 19 Jan 2015 23:07:17 +0000 (23:07 +0000)]
egl: Pass the correct X visual depth to xcb_put_image().

The dri2_x11_add_configs_for_visuals() function happily matches a 32-bit
EGLConfig with a 24-bit X visual.  However, it was passing a 32-bit
depth to xcb_put_image(), making the X server unhappy:

  https://github.com/apitrace/apitrace/issues/313#issuecomment-70571911

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago intel/pixel_read: Properly flip the results for window system buffers
Jason Ekstrand [Wed, 28 Jan 2015 11:31:06 +0000 (03:31 -0800)]
intel/pixel_read: Properly flip the results for window system buffers

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88841

Reviewed-by: Chad Versace <chad.versace@intel.com>
9 years ago i965/tiled_memcpy: Support a signed linear pitch
Jason Ekstrand [Wed, 28 Jan 2015 11:30:32 +0000 (03:30 -0800)]
i965/tiled_memcpy: Support a signed linear pitch

Reviewed-by: Chad Versace <chad.versace@intel.com>
9 years ago main: Add STENCIL_INDEX formats to base_tex_format
Jason Ekstrand [Fri, 30 Jan 2015 22:24:13 +0000 (14:24 -0800)]
main: Add STENCIL_INDEX formats to base_tex_format

This fixes a bug on BDW when our meta-based stencil blit path assert-fails
due to an invalid internal format even though we do support the
ARB_stencil_texturing extension.

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago teximage: Don't indent switch cases
Jason Ekstrand [Fri, 30 Jan 2015 23:42:59 +0000 (15:42 -0800)]
teximage: Don't indent switch cases

No functional change.

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago mesa: remove some dead display list code
Brian Paul [Fri, 30 Jan 2015 16:12:46 +0000 (09:12 -0700)]
mesa: remove some dead display list code

The size of a Node is always four bytes so no need for the old code
that was used when sizeof(Node)==8.

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago mesa: remove stale comment in dlist.c code
Brian Paul [Fri, 30 Jan 2015 15:54:19 +0000 (08:54 -0700)]
mesa: remove stale comment in dlist.c code

sizeof(Node) is always 4 bytes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago mesa: s/union gl_dlist_node/Node/ in dlist.c code
Brian Paul [Fri, 30 Jan 2015 15:53:46 +0000 (08:53 -0700)]
mesa: s/union gl_dlist_node/Node/ in dlist.c code

Just minor clean-up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago mesa: fix display list 8-byte alignment issue
Brian Paul [Tue, 27 Jan 2015 03:32:58 +0000 (20:32 -0700)]
mesa: fix display list 8-byte alignment issue

The _mesa_dlist_alloc() function is only guaranteed to return a pointer
with 4-byte alignment.  On 64-bit systems which don't support unaligned
loads (e.g. SPARC or MIPS) this could lead to a bus error in the VBO code.

The solution is to add a new _mesa_dlist_alloc_aligned() function which
will return a pointer to an 8-byte aligned address on 64-bit systems.
This is accomplished by inserting a 4-byte NOP instruction in the display
list when needed.

The only place this actually matters is the VBO code where we need to
allocate a 'struct vbo_save_vertex_list' which needs to be 8-byte
aligned (just as if it were malloc'd).
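
The NOP-padding idea can be sketched with a toy display list.  All toy_*
names are hypothetical and the layout (one 4-byte opcode node followed by the
payload, fixed 64-node storage) is an assumption made for the example, not
the actual dlist.c structure:

```c
#include <stddef.h>
#include <stdint.h>

enum { TOY_OP_NOP = 0, TOY_OP_PAYLOAD = 1 };

/* A display list node is always 4 bytes wide. */
typedef union {
   uint32_t opcode;
   uint32_t data;
} toy_node;

struct toy_list {
   union {
      uint64_t force_align;  /* makes the node array 8-byte aligned */
      toy_node nodes[64];
   } buf;
   size_t pos;               /* index of the next free node */
};

/* Reserves one opcode node plus payload_nodes nodes and returns a pointer
 * to the payload, or NULL when the list is full.  With align8 set, a NOP
 * node is emitted first whenever the payload would otherwise start at an
 * address that is only 4-byte aligned. */
static void *
toy_alloc(struct toy_list *l, size_t payload_nodes, int align8)
{
   /* The payload starts at index pos + 1 and is 8-byte aligned exactly
    * when that index is even, so pad when pos itself is even. */
   if (align8 && (l->pos + 1) % 2 != 0) {
      if (l->pos >= 64)
         return NULL;
      l->buf.nodes[l->pos++].opcode = TOY_OP_NOP;
   }

   if (l->pos + 1 + payload_nodes > 64)
      return NULL;

   l->buf.nodes[l->pos++].opcode = TOY_OP_PAYLOAD;
   void *payload = &l->buf.nodes[l->pos];
   l->pos += payload_nodes;
   return payload;
}
```

Padding with a no-op instruction keeps the list a plain sequence of 4-byte
nodes, so code that walks the list only needs to know how to skip a NOP.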

The gears demo and others hit this bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88662
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
9 years ago util/u_atomic: Provide a _InterlockedCompareExchange8 for older MSVC.
José Fonseca [Tue, 20 Jan 2015 23:36:50 +0000 (23:36 +0000)]
util/u_atomic: Provide a _InterlockedCompareExchange8 for older MSVC.

Fixes build with Windows SDK 7.0.7600.

Tested with u_atomic_test, both on x86 and x86_64.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years ago util/u_atomic: Use _Interlocked* intrinsics for non 64bits.
José Fonseca [Tue, 20 Jan 2015 23:34:26 +0000 (23:34 +0000)]
util/u_atomic: Use _Interlocked* intrinsics for non 64bits.

The intrinsics are universally available, whereas older Windows SDKs (e.g.
7.0.7600) don't have the non-intrinsic entrypoint.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years agoi965/skl: Force a BINDING_TABLE_POINTER_* after push constant command
Neil Roberts [Thu, 29 Jan 2015 14:59:49 +0000 (14:59 +0000)]
i965/skl: Force a BINDING_TABLE_POINTER_* after push constant command

According to the SKL bspec the 3DSTATE_CONSTANT_* commands only take
effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_*
command. This patch just makes it set the BRW_NEW_SURFACES state when
uploading the push constants to ensure the binding tables will be
updated.

This fixes the fbo-blending-formats Piglit test and possibly others.

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agometa: Don't write depth when decompressing tex-images
Topi Pohjolainen [Wed, 28 Jan 2015 15:23:59 +0000 (17:23 +0200)]
meta: Don't write depth when decompressing tex-images

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agometa: Don't write depth when generating miptrees
Topi Pohjolainen [Wed, 28 Jan 2015 15:17:22 +0000 (17:17 +0200)]
meta: Don't write depth when generating miptrees

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agometa/blit: Compile programs with and without depth
Topi Pohjolainen [Wed, 28 Jan 2015 14:27:25 +0000 (16:27 +0200)]
meta/blit: Compile programs with and without depth

When color buffers alone are concerned the depth is not needed.

No regression on BDW where meta blit is used instead of blorp. I
also disabled blorp temporarily for fbo-blits on IVB and saw no
regressions there either.
I also compared several graphics benchmarks on BDW and saw neither
regressions nor improvements.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agometa/blit: Write depth only when asked for
Topi Pohjolainen [Wed, 28 Jan 2015 14:36:11 +0000 (16:36 +0200)]
meta/blit: Write depth only when asked for

Implementing an idea from Ken, on i965 the shader program for 2D
blits becomes significantly simpler.

Before:

pln(8)   g6<1>F    g4<0,1,0>F    g2<8,8,1>F  { align1 1Q compacted };
pln(8)   g7<1>F    g4.4<0,1,0>F  g2<8,8,1>F  { align1 1Q compacted };
send(8)  g2<1>UW   g6<8,8,1>F
         sampler (1, 0, 0, 1) mlen 2 rlen 4  { align1 1Q };
mov(8)   g123<1>F  g2<8,8,1>F                { align1 1Q compacted };
mov(8)   g124<1>F  g3<8,8,1>F                { align1 1Q compacted };
mov(8)   g125<1>F  g4<8,8,1>F                { align1 1Q compacted };
mov(8)   g126<1>F  g5<8,8,1>F                { align1 1Q compacted };
mov(8)   g127<1>F  g2<8,8,1>F                { align1 1Q compacted };
nop                                                             ;
sendc(8) null        g123<8,8,1>F
    render RT write SIMD8 LastRT Surface = 0 mlen 5 rlen 0 { align1 1Q EOT };

After:

pln(8)   g6<1>F     g4<0,1,0>F    g2<8,8,1>F   { align1 1Q compacted };
pln(8)   g7<1>F     g4.4<0,1,0>F  g2<8,8,1>F   { align1 1Q compacted };
send(8)  g124<1>UW  g6<8,8,1>F
         sampler (1, 0, 0, 1) mlen 2 rlen 4    { align1 1Q };
sendc(8) null        g124<8,8,1>F
   render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT };

v2 (Matt): Removed unintended white-space change

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agometa/blit: Add plumbing for shaders without depth
Topi Pohjolainen [Wed, 28 Jan 2015 14:24:25 +0000 (16:24 +0200)]
meta/blit: Add plumbing for shaders without depth

Currently all blit programs are unconditionally compiled with
gl_FragDepth.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agonir/opt_algebraic: Add some constant bcsel reductions
Jason Ekstrand [Thu, 29 Jan 2015 00:53:51 +0000 (16:53 -0800)]
nir/opt_algebraic: Add some constant bcsel reductions

total instructions in shared programs: 5998190 -> 5997603 (-0.01%)
instructions in affected programs:     54276 -> 53689 (-1.08%)
helped:                                293

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir/opt_algebraic: Add some boolean simplifications
Jason Ekstrand [Thu, 29 Jan 2015 00:55:03 +0000 (16:55 -0800)]
nir/opt_algebraic: Add some boolean simplifications

total instructions in shared programs: 5998321 -> 5998287 (-0.00%)
instructions in affected programs:     4520 -> 4486 (-0.75%)
helped:                                8

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir/algebraic: Support specifying variable as constant or by type
Jason Ekstrand [Thu, 29 Jan 2015 00:42:20 +0000 (16:42 -0800)]
nir/algebraic: Support specifying variable as constant or by type

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir/algebraic: Fail to compile if a variable is used in a replace but not the search
Jason Ekstrand [Thu, 29 Jan 2015 19:45:31 +0000 (11:45 -0800)]
nir/algebraic: Fail to compile if a variable is used in a replace but not the search

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir/search: Allow for matching variables based on types
Jason Ekstrand [Thu, 29 Jan 2015 00:29:21 +0000 (16:29 -0800)]
nir/search: Allow for matching variables based on types

This allows you to match on an unknown value but only if it is of a given
type.  90% of the uses of this are for matching only booleans, but adding
the generality of arbitrary types is no more complex.

nir_algebraic.py doesn't handle this yet but that's ok because the C
language will ensure that the default type on all variables is void.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir/search: Add support for matching unknown constants
Jason Ekstrand [Thu, 22 Jan 2015 22:15:27 +0000 (14:15 -0800)]
nir/search: Add support for matching unknown constants

There are some algebraic transformations that we want to do but only if
certain things are constants.  For instance, we may want to replace
a * (b + c) with (a * b) + (a * c) as long as a and either b or c are constant.
While this generates more instructions, some of it will get constant
folded.
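
The constant-only matching rule above could be sketched roughly as follows.
This is a toy model, not the nir_search code: the Value and PatternVar types
and the var_matches() and should_distribute() helpers are hypothetical names
chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value in a toy IR; not a NIR data structure. */
typedef struct {
   bool is_const;   /* true if the value is a compile-time constant */
   int  value;      /* only meaningful when is_const is true */
} Value;

/* Hypothetical pattern variable that may insist on matching constants. */
typedef struct {
   bool require_const;   /* zero-initialized default: match anything */
} PatternVar;

static bool var_matches(const PatternVar *var, const Value *v)
{
   return !var->require_const || v->is_const;
}

/* a * (b + c) is worth distributing only when a is constant and at
 * least one of b, c is constant, so that one product can be folded. */
static bool should_distribute(const Value *a, const Value *b, const Value *c)
{
   const PatternVar const_var = { .require_const = true };
   return var_matches(&const_var, a) &&
          (var_matches(&const_var, b) || var_matches(&const_var, c));
}
```

A PatternVar that leaves require_const unset is zero-initialized by C and so
matches any value, which mirrors the note below about false being the default.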

nir_algebraic.py doesn't handle this yet, but that's ok because the C
language will make sure that false is the default for now.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir: Add an invalid type
Jason Ekstrand [Thu, 29 Jan 2015 00:27:40 +0000 (16:27 -0800)]
nir: Add an invalid type

This allows us to indicate a concept of an invalid type.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agogallium/docs: fix docs wrt ARL/ARR/FLR
Roland Scheidegger [Thu, 29 Jan 2015 19:39:50 +0000 (20:39 +0100)]
gallium/docs: fix docs wrt ARL/ARR/FLR

Since the address reg holds integer values, ARL/ARR do an implicit float-to-int
conversion, so clarify that. Thus it is also incorrect to say that FLR really
does the same as ARL.
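
To make the distinction concrete, here is a small sketch of the two opcodes on
a single channel. The function names arl(), flr() and floor_to_int() are
illustrative only, and the helper assumes a finite input that fits in int32_t.

```c
#include <assert.h>
#include <stdint.h>

/* Floor to integer without libm. Assumption: x is finite and within
 * the int32_t range; other inputs are not handled by this sketch. */
static int32_t floor_to_int(float x)
{
   int32_t i = (int32_t)x;              /* truncates toward zero */
   return ((float)i > x) ? i - 1 : i;   /* step down for negative fractions */
}

/* ARL: floor, then implicit float-to-int; the result is an integer. */
static int32_t arl(float src)
{
   return floor_to_int(src);
}

/* FLR: floor only; the result stays a float. */
static float flr(float src)
{
   return (float)floor_to_int(src);
}
```

The two functions agree numerically, but ARL produces an integer for the
address register while FLR produces a float.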

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years agonir: Add variants of some of the comparison simplifications.
Eric Anholt [Tue, 27 Jan 2015 00:48:48 +0000 (16:48 -0800)]
nir: Add variants of some of the comparison simplifications.

We end up with these from TGSI-to-NIR because the pass generating the
comparisons doesn't know if the arg is actually a bool input or not.  vc4
results:

total instructions in shared programs: 41801 -> 41508 (-0.70%)
instructions in affected programs:     4253 -> 3960 (-6.89%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agovc4: Fix point size handling when it's the first output.
Eric Anholt [Thu, 29 Jan 2015 19:33:42 +0000 (11:33 -0800)]
vc4: Fix point size handling when it's the first output.

9 years agonir: Don't try to to-SSA ALU instructions that are already SSA.
Eric Anholt [Mon, 26 Jan 2015 22:37:42 +0000 (14:37 -0800)]
nir: Don't try to to-SSA ALU instructions that are already SSA.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agonir: Fix a bit of broken indentation.
Eric Anholt [Tue, 11 Nov 2014 22:08:13 +0000 (14:08 -0800)]
nir: Fix a bit of broken indentation.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agonir: Add a couple of helpers for glsl types.
Eric Anholt [Thu, 30 Oct 2014 23:49:32 +0000 (16:49 -0700)]
nir: Add a couple of helpers for glsl types.

This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.

v2: Add a missing space.

Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agodocs: fix mesa 10.4.3 release date
Emil Velikov [Thu, 29 Jan 2015 14:02:19 +0000 (14:02 +0000)]
docs: fix mesa 10.4.3 release date

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agoMesa: Advertise GL_OES_texture_*float* extensions support with i965.
Kalyan Kondapally [Thu, 8 Jan 2015 04:30:27 +0000 (20:30 -0800)]
Mesa: Advertise GL_OES_texture_*float* extensions support with i965.

This patch advertises support for GL_OES_texture_*float* extensions
when using i965 drivers.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
9 years agoMesa: Add support for HALF_FLOAT_OES type.
Kalyan Kondapally [Tue, 27 Jan 2015 07:23:00 +0000 (09:23 +0200)]
Mesa: Add support for HALF_FLOAT_OES type.

This patch adds needed support for accepting HALF_FLOAT_OES as valid type
for TexImage*D and TexSubImage*D when Texture Float extensions are supported.

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
9 years agoMesa: Add support for GL_OES_texture_*float* extensions.
Kalyan Kondapally [Thu, 8 Jan 2015 04:30:25 +0000 (20:30 -0800)]
Mesa: Add support for GL_OES_texture_*float* extensions.

This patch series adds support for following GLES2 Texture Float extensions:
1)GL_OES_texture_float,
2)GL_OES_texture_half_float,
3)GL_OES_texture_float_linear,
4)GL_OES_texture_half_float_linear.

This patch adds basic infrastructure and needed boolean flags to advertise
support for these extensions; by default the support is disabled. Next patch
in the series introduces support for HALF_FLOAT_OES token.

v4: take assert away and make valid_filter_for_float conditional (Tapani),
    fix the alphabetical order (Emil)

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
9 years agonir: Make vec-to-movs handle src/dest aliasing.
Eric Anholt [Thu, 22 Jan 2015 21:08:59 +0000 (13:08 -0800)]
nir: Make vec-to-movs handle src/dest aliasing.

It now emits vector MOVs instead of a series of individual MOVs, which
should be useful to any vector backends.  This pushes the problem of
src/dest aliasing of channels on a scalar chip to the backend, but if
there are any vector operations in your shader then you needed to be
handling this already.
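
The aliasing hazard can be illustrated with a toy register model. This sketch
is not the NIR pass: Vec4, swizzle_scalar_movs() and swizzle_vector_mov() are
hypothetical, and the "vector MOV" is modelled as reading every source channel
before writing any destination channel.

```c
#include <assert.h>

typedef struct { float c[4]; } Vec4;

/* Naive lowering: one scalar MOV per channel, in order. If dest and
 * src are the same register, earlier writes clobber later reads. */
static void swizzle_scalar_movs(Vec4 *dest, const Vec4 *src, const int swz[4])
{
   for (int i = 0; i < 4; i++)
      dest->c[i] = src->c[swz[i]];
}

/* Vector MOV semantics as assumed here: read all channels, then write. */
static void swizzle_vector_mov(Vec4 *dest, const Vec4 *src, const int swz[4])
{
   float tmp[4];
   for (int i = 0; i < 4; i++)
      tmp[i] = src->c[swz[i]];
   for (int i = 0; i < 4; i++)
      dest->c[i] = tmp[i];
}
```

Swapping x and y in place (swizzle 1, 0, 2, 3) gives the wrong result with the
per-channel version and the expected result with the read-then-write version.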

Fixes fs-swap-problem with my scalarizing patches.

v2: Rename to insert_mov(), and add a comment about what it does.
v3: Rewrite the comment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)
9 years agogallium: Replace u_simple_list.h with util/simple_list.h
Eric Anholt [Fri, 14 Nov 2014 20:40:46 +0000 (12:40 -0800)]
gallium: Replace u_simple_list.h with util/simple_list.h

The code was exactly the same, except util/ has c++ guards and a struct
simple_node declaration.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agomesa: Port a variant of 68afbe89c72d085dcbbf2b264f0201ab73fe339e to util/
Eric Anholt [Mon, 26 Jan 2015 19:34:18 +0000 (11:34 -0800)]
mesa: Port a variant of 68afbe89c72d085dcbbf2b264f0201ab73fe339e to util/

The idea is that after a remove_from_list(), you might want to be able to
do a remove_from_list() on it again or an is_empty_list().  This is
apparently relied on by r300g.
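
A minimal sketch of that behaviour on a circular doubly-linked list follows.
The ListNode type and the list_* function names are hypothetical; the real
header uses its own names and macros.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ListNode {
   struct ListNode *next, *prev;
} ListNode;

/* An empty list (or a detached node) points at itself. */
static void list_init(ListNode *n)
{
   n->next = n;
   n->prev = n;
}

static bool list_is_empty(const ListNode *n)
{
   return n->next == n;
}

static void list_insert_after(ListNode *head, ListNode *n)
{
   n->prev = head;
   n->next = head->next;
   head->next->prev = n;
   head->next = n;
}

/* Unlink n, then leave it self-linked so that a second removal or an
 * emptiness check on it is harmless. */
static void list_remove(ListNode *n)
{
   n->next->prev = n->prev;
   n->prev->next = n->next;
   list_init(n);
}
```

Without the final list_init() call, a second list_remove() on the same node
would follow stale pointers into the list it was already removed from.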

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agomesa: Move simple_list.h to src/util.
Eric Anholt [Fri, 14 Nov 2014 20:14:40 +0000 (12:14 -0800)]
mesa: Move simple_list.h to src/util.

We have two copies of it in the tree, I'm going to delete one.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agoradeonsi: Enable VGPR spilling for all shader types v5
Tom Stellard [Wed, 10 Dec 2014 14:13:59 +0000 (09:13 -0500)]
radeonsi: Enable VGPR spilling for all shader types v5

v2:
  - Only emit the SPI_TMPRING_SIZE write once per packet.
  - Use context global scratch buffer.

v3:
  - Patch shaders using WRITE_DATA packet instead of map/unmap.
  - Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
    VS_PARTIAL_FLUSH when patching shaders.

v4:
  - Code cleanups.
  - Remove unnecessary multiplies.

v5:
  - Patch shaders in system memory and re-upload to vram.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi/compute: Allocate the scratch buffer during state creation
Tom Stellard [Tue, 27 Jan 2015 16:35:21 +0000 (16:35 +0000)]
radeonsi/compute: Allocate the scratch buffer during state creation

This moves scratch buffer allocation from si_launch_grid() to
si_create_compute_state().  This helps to reduce the overhead of
launching a kernel and also fixes a bug in the code that would cause
the scratch buffer to be too small if a kernel with smaller scratch size
was launched before a kernel with a larger scratch size.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: Add radeon_shader_binary member to struct si_shader
Tom Stellard [Fri, 23 Jan 2015 22:54:43 +0000 (22:54 +0000)]
radeonsi: Add radeon_shader_binary member to struct si_shader

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi/compute: Rename si_compute::program to si_compute::shader
Tom Stellard [Fri, 23 Jan 2015 22:54:08 +0000 (22:54 +0000)]
radeonsi/compute: Rename si_compute::program to si_compute::shader

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: Avoid leaking memory when rebuilding shader states
Marek Olšák [Tue, 27 Jan 2015 14:52:37 +0000 (14:52 +0000)]
radeonsi: Avoid leaking memory when rebuilding shader states

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agonir/opcodes: Use a return type of tfloat for ldexp
Jason Ekstrand [Wed, 28 Jan 2015 20:43:47 +0000 (12:43 -0800)]
nir/opcodes: Use a return type of tfloat for ldexp

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>