Kenneth Graunke [Wed, 6 Aug 2014 08:08:19 +0000 (01:08 -0700)]
i965: Add a better PRM citation for the IMS dimension mangling.
Paul originally had to reverse engineer these formulas based on the
description about how the sampler works. The description here is not
the easiest to follow - especially given that it's from the Sandybridge
era, when the hardware only did 4x multisampling.
Jordan and I recently found another part of the documentation where they
simply state that IMS dimensions must be adjusted by a set of formulas.
Quoting this section provides an easy to follow explanation for the
code, including 2x/4x/8x/16x.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Laura Ekstrand [Mon, 2 Feb 2015 18:20:57 +0000 (10:20 -0800)]
swrast: Whitespace fixes.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Laura Ekstrand [Fri, 30 Jan 2015 22:03:53 +0000 (14:03 -0800)]
DD: Refactor BlitFramebuffer.
In preparation for glBlitNamedFramebuffer, the DD table function
BlitFramebuffer needs to accept two arbitrary framebuffer objects rather
than assuming ctx->ReadBuffer and ctx->DrawBuffer.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Laura Ekstrand [Fri, 23 Jan 2015 21:43:16 +0000 (13:43 -0800)]
GL: Update glext.h to Khronos Revision 29537.
Khronos Revision 29537 fixes ARB_direct_state_access function prototypes that
had GLsizei where they should have had GLsizeiptr. The mainly affects
functions related to buffer objects.
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Mon, 2 Feb 2015 17:49:44 +0000 (09:49 -0800)]
i965: Don't use tiled_memcpy to download from RGBX or BGRX surfaces
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88841
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Neil Roberts [Sat, 31 Jan 2015 16:45:09 +0000 (17:45 +0100)]
dir-locals.el: Don't set variables for non-programming modes
This limits the style changes to modes inherited from prog-mode. The
main reason to do this is to avoid setting fill-column for people
using Emacs to edit commit messages because 78 characters is too many
to make it wrap properly in git log. Note that makefile-mode also
inherits from prog-mode so the fill column should continue to apply
there.
v2: Apply to all the .dir-locals.el files, not just the one in the
root directory.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Iago Toral Quiroga [Fri, 30 Jan 2015 08:03:57 +0000 (09:03 +0100)]
i965: Fix intel_miptree_copy_teximage for GL_TEXTURE_1D_ARRAY
For GL_TEXTURE_1D_ARRAY targets we store the depth of the array
in the Height field and leave Depth=1 in the underlying texture
object. When we call intel_miptree_copy_teximage in the process
of re-creating a miptree (possibily because the number of miplevels
has changed) we didn't account for this, so we where only copying
texture images for the first slice.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Eric Anholt [Sun, 1 Feb 2015 22:09:12 +0000 (14:09 -0800)]
vc4: Kill a bunch of color write calculation when colormask is all off.
I could have done this in the bit that generates the ANDs and ORs, but
it's probably generally useful. Sadly, I still need this even if I move
to NIR, because I can't yet express my read of the destination color in
NIR, which I would need to move my blend/logicop/colormask handling into
NIR.
total uniforms in shared programs: 13497 -> 13455 (-0.31%)
uniforms in affected programs: 101 -> 59 (-41.58%)
total instructions in shared programs: 40797 -> 40296 (-1.23%)
instructions in affected programs: 1639 -> 1138 (-30.57%)
Fredrik Höglund [Sun, 1 Feb 2015 21:53:40 +0000 (22:53 +0100)]
docs: Update ARB_direct_state_access
Mark vertex array objects as started.
Martin Peres [Thu, 29 Jan 2015 14:54:08 +0000 (16:54 +0200)]
doc: break down ARB_direct_state_access in GL3.txt
A student was wondering what was going on + I started working on it too.
CC: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Laura Ekstrand <laura@jlekstrand.net>
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Eric Anholt [Fri, 30 Jan 2015 19:23:26 +0000 (11:23 -0800)]
vc4: Dump the VPM read index in QIR disasm.
Since the VPM reads have to be in order, it's useful to see their indices
in the dump.
Jason Ekstrand [Sat, 31 Jan 2015 02:47:59 +0000 (18:47 -0800)]
i965/pixel_read: Don't try to do a tiled_memcpy from a multisampled buffer
The GL spec guarantees that glGetTexImage will never get a multisampled
texture, but this is not true for glReadPixels. If we get a multisampled
buffer, we have to do a multisample resolve on it before we can pull the
data down for the user. Since this isn't practical to handle in
tiled_memcpy, we just fall back to the other paths that can handle this.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Tue, 16 Dec 2014 14:11:57 +0000 (16:11 +0200)]
i965: Enable L3 caching of buffer surfaces.
And remove the mocs argument of the emit_buffer_surface_state vtbl hook. Its
semantics vary greatly from one generation to another, so it kind of
encourages the caller to pass 0 which is the only valid setting across
generations. After this commit the hardware-specific code decides what the
best cacheability settings are for buffer surfaces, just like we do for
textures.
This together with some additional changes coming is expected to improve
performance of pull constants, buffer textures, atomic counters and image
objects on Gen7 and up.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
José Fonseca [Mon, 19 Jan 2015 23:07:17 +0000 (23:07 +0000)]
egl: Pass the correct X visual depth to xcb_put_image().
The dri2_x11_add_configs_for_visuals() function happily matches a 32
bits EGLconfig with a 24 bits X visual. However it was passing 32bits
depth to xcb_put_image(), making X server unhappy:
https://github.com/apitrace/apitrace/issues/313#issuecomment-
70571911
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Wed, 28 Jan 2015 11:31:06 +0000 (03:31 -0800)]
intel/pixel_read: Properly flip the results for window system buffers
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88841
Reviewed-by: Chad Versace <chad.versace@intel.com>
Jason Ekstrand [Wed, 28 Jan 2015 11:30:32 +0000 (03:30 -0800)]
i965/tiled_memcpy: Support a signed linear pitch
Reviewed-by: Chad Versace <chad.versace@intel.com>
Jason Ekstrand [Fri, 30 Jan 2015 22:24:13 +0000 (14:24 -0800)]
main: Add STENCIL_INDEX formats to base_tex_format
This fixes a bug on BDW when our meta-based stencil blit path assert-fails
due to an invalid internal format even though we do support the
ARB_stencil_texturing extension.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 30 Jan 2015 23:42:59 +0000 (15:42 -0800)]
teximage: Don't indent switch cases
No functional change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Fri, 30 Jan 2015 16:12:46 +0000 (09:12 -0700)]
mesa: remove some dead display list code
The size of a Node is always four bytes so no need for the old code
that was used when sizeof(Node)==8.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Fri, 30 Jan 2015 15:54:19 +0000 (08:54 -0700)]
mesa: remove stale comment in dlist.c code
sizeof(Node) is always 4 bytes.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Fri, 30 Jan 2015 15:53:46 +0000 (08:53 -0700)]
mesa: s/union gl_dlist_node/Node/ in dlist.c code
Just minor clean-up.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Brian Paul [Tue, 27 Jan 2015 03:32:58 +0000 (20:32 -0700)]
mesa: fix display list 8-byte alignment issue
The _mesa_dlist_alloc() function is only guaranteed to return a pointer
with 4-byte alignment. On 64-bit systems which don't support unaligned
loads (e.g. SPARC or MIPS) this could lead to a bus error in the VBO code.
The solution is to add a new _mesa_dlist_alloc_aligned() function which
will return a pointer to an 8-byte aligned address on 64-bit systems.
This is accomplished by inserting a 4-byte NOP instruction in the display
list when needed.
The only place this actually matters is the VBO code where we need to
allocate a 'struct vbo_save_vertex_list' which needs to be 8-byte
aligned (just as if it were malloc'd).
The gears demo and others hit this bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88662
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
José Fonseca [Tue, 20 Jan 2015 23:36:50 +0000 (23:36 +0000)]
util/u_atomic: Provide a _InterlockedCompareExchange8 for older MSVC.
Fixes build with Windows SDK 7.0.7600.
Tested with u_atomic_test, both on x86 and x86_64.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
José Fonseca [Tue, 20 Jan 2015 23:34:26 +0000 (23:34 +0000)]
util/u_atomic: Use _Interlocked* intrinsics for non 64bits.
The intrinsics are universally available, whereas older Windows SDKs (e.g.
7.0.7600) don't have the non-intrisic entrypoint.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Neil Roberts [Thu, 29 Jan 2015 14:59:49 +0000 (14:59 +0000)]
i965/skl: Force a BINDING_TABLE_POINTER_* after push constant command
According to the SKL bspec the 3DSTATE_CONSTANT_* commands only take
effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_*
command. This patch just makes it set the BRW_NEW_SURFACES state when
uploading the push constants to ensure the binding tables will be
updated.
This fixes the fbo-blending-formats Piglit test and possibly others.
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Topi Pohjolainen [Wed, 28 Jan 2015 15:23:59 +0000 (17:23 +0200)]
meta: Don't write depth when decompressing tex-images
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Topi Pohjolainen [Wed, 28 Jan 2015 15:17:22 +0000 (17:17 +0200)]
meta: Don't write depth when generating miptrees
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Topi Pohjolainen [Wed, 28 Jan 2015 14:27:25 +0000 (16:27 +0200)]
meta/blit: Compile programs with and without depth
When color buffers alone are concerned the depth is not needed.
No regression on BDW where meta blit is used instead of blorp. I
also disabled blorp temporarily for fbo-blits on IVB and saw no
regressions there either.
I also compared several graphics benchmarks on BDW and saw neither
regressions or improvements.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Topi Pohjolainen [Wed, 28 Jan 2015 14:36:11 +0000 (16:36 +0200)]
meta/blit: Write depth only when asked for
Implementing an idea from Ken, on i965 the shader program for 2D
blits becomes significantly simpler.
Before:
pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
send(8) g2<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 1Q };
mov(8) g123<1>F g2<8,8,1>F { align1 1Q compacted };
mov(8) g124<1>F g3<8,8,1>F { align1 1Q compacted };
mov(8) g125<1>F g4<8,8,1>F { align1 1Q compacted };
mov(8) g126<1>F g5<8,8,1>F { align1 1Q compacted };
mov(8) g127<1>F g2<8,8,1>F { align1 1Q compacted };
nop ;
sendc(8) null g123<8,8,1>F
render RT write SIMD8 LastRT Surface = 0 mlen 5 rlen 0 { align1 1Q EOT };
After:
pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
send(8) g124<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 1Q };
sendc(8) null g124<8,8,1>F
render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT };
v2 (Matt): Removed unintended white-space change
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Topi Pohjolainen [Wed, 28 Jan 2015 14:24:25 +0000 (16:24 +0200)]
meta/blit: Add plumbing for shaders without depth
Currently all blit programs are unconditionally compiled with
gl_FragDepth.
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 29 Jan 2015 00:53:51 +0000 (16:53 -0800)]
nir/opt_algebraic: Add some constant bcsel reductions
total instructions in shared programs:
5998190 ->
5997603 (-0.01%)
instructions in affected programs: 54276 -> 53689 (-1.08%)
helped: 293
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Thu, 29 Jan 2015 00:55:03 +0000 (16:55 -0800)]
nir/opt_algebraic: Add some boolean simplifications
total instructions in shared programs:
5998321 ->
5998287 (-0.00%)
instructions in affected programs: 4520 -> 4486 (-0.75%)
helped: 8
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Thu, 29 Jan 2015 00:42:20 +0000 (16:42 -0800)]
nir/algebraic: Support specifying variable as constant or by type
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Thu, 29 Jan 2015 19:45:31 +0000 (11:45 -0800)]
nir/algebraic: Fail to compile of a variable is used in a replace but not the search
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Thu, 29 Jan 2015 00:29:21 +0000 (16:29 -0800)]
nir/search: Allow for matching variables based on types
This allows you to match on an unknown value but only if it is of a given
type. 90% of the uses of this are for matching only booleans, but adding
the generality of arbitrary types is no more complex.
nir_algebraic.py doesn't handle this yet but that's ok because the C
language will ensure that the default type on all variables is void.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Thu, 22 Jan 2015 22:15:27 +0000 (14:15 -0800)]
nir/search: Add support for matching unknown constants
There are some algebraic transformations that we want to do but only if
certain things are constants. For instance, we may want to replace
a * (b + c) with (a * b) + (a * c) as long as a and either b or c is constant.
While this generates more instructions, some of it will get constant
folded.
nir_algebraic.py doesn't handle this yet, but that's ok because the C
language will make sure that false is the default for now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Thu, 29 Jan 2015 00:27:40 +0000 (16:27 -0800)]
nir: Add an invalid type
This allows us to indicate a concept of an invalid type.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Roland Scheidegger [Thu, 29 Jan 2015 19:39:50 +0000 (20:39 +0100)]
gallium/docs: fix docs wrt ARL/ARR/FLR
since the address reg holds integer values, ARL/ARR do an implicit float-to-int
conversion, so clarify that. Thus it is also incorrect to say that FLR really
does the same as ARL.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Eric Anholt [Tue, 27 Jan 2015 00:48:48 +0000 (16:48 -0800)]
nir: Add variants of some of the comparison simplifications.
We end up with these from TGSI-to-NIR because the pass generating the
comparisons doesn't know if the arg is actually a bool input or not. vc4
results:
total instructions in shared programs: 41801 -> 41508 (-0.70%)
instructions in affected programs: 4253 -> 3960 (-6.89%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Anholt [Thu, 29 Jan 2015 19:33:42 +0000 (11:33 -0800)]
vc4: Fix point size handling when it's the first output.
Eric Anholt [Mon, 26 Jan 2015 22:37:42 +0000 (14:37 -0800)]
nir: Don't try to to-SSA ALU instructions that are already SSA.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Eric Anholt [Tue, 11 Nov 2014 22:08:13 +0000 (14:08 -0800)]
nir: Fix a bit of broken indentation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Eric Anholt [Thu, 30 Oct 2014 23:49:32 +0000 (16:49 -0700)]
nir: Add a couple of helpers for glsl types.
This will be used by tgsi_to_nir, which needs to get vec4 types for
declaring shader input/output variables.
v2: Add a missing space.
Reviewed-by: Matt Turner <mattst88@gmail.com> (v2)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Emil Velikov [Thu, 29 Jan 2015 14:02:19 +0000 (14:02 +0000)]
docs: fix mesa 10.4.3 release date
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Kalyan Kondapally [Thu, 8 Jan 2015 04:30:27 +0000 (20:30 -0800)]
Mesa: Advertise GL_OES_texture_*float* extensions support with i965.
This patch advertises support for GL_OES_texture_*float* extensions
when using i965 drivers.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Kalyan Kondapally [Tue, 27 Jan 2015 07:23:00 +0000 (09:23 +0200)]
Mesa: Add support for HALF_FLOAT_OES type.
This patch adds needed support for accepting HALF_FLOAT_OES as valid type
for TexImage*D and TexSubImage*D when Texture FLoat extensions are supported.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Kalyan Kondapally [Thu, 8 Jan 2015 04:30:25 +0000 (20:30 -0800)]
Mesa: Add support for GL_OES_texture_*float* extensions.
This patch series adds support for following GLES2 Texture Float extensions:
1)GL_OES_texture_float,
2)GL_OES_texture_half_float,
3)GL_OES_texture_float_linear,
4)GL_OES_texture_half_float_linear.
This patch adds basic infrastructure and needed boolean flags to advertise
support for these extensions, by default the support is disabled. Next patch
in the series introduces support for HALF_FLOAT_OES token.
v4: take assert away and make valid_filter_for_float conditional (Tapani),
fix the alphabetical order (Emil)
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Signed-off-by: Kalyan Kondapally <kalyan.kondapally@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Eric Anholt [Thu, 22 Jan 2015 21:08:59 +0000 (13:08 -0800)]
nir: Make vec-to-movs handle src/dest aliasing.
It now emits vector MOVs instead of a series of individual MOVs, which
should be useful to any vector backends. This pushes the problem of
src/dest aliasing of channels on a scalar chip to the backend, but if
there are any vector operations in your shader then you needed to be
handling this already.
Fixes fs-swap-problem with my scalarizing patches.
v2: Rename to insert_mov(), and add a comment about what it does.
v3: Rewrite the comment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v3)
Eric Anholt [Fri, 14 Nov 2014 20:40:46 +0000 (12:40 -0800)]
gallium: Replace u_simple_list.h with util/simple_list.h
The code was exactly the same, except util/ has c++ guards and a struct
simple_node declaration.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Mon, 26 Jan 2015 19:34:18 +0000 (11:34 -0800)]
mesa: Port a variant of
68afbe89c72d085dcbbf2b264f0201ab73fe339e to util/
The idea is that after a remove_from_list(), you might want to be able to
do a remove_from_list() on it again or an is_empty_list(). This is
apparently relied on by r300g.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Fri, 14 Nov 2014 20:14:40 +0000 (12:14 -0800)]
mesa: Move simple_list.h to src/util.
We have two copies of it in the tree, I'm going to delete one.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tom Stellard [Wed, 10 Dec 2014 14:13:59 +0000 (09:13 -0500)]
radeonsi: Enable VGPR spilling for all shader types v5
v2:
- Only emit write SPI_TMPRING_SIZE once per packet.
- Use context global scratch buffer.
v3:
- Patch shaders using WRITE_DATA packet instead of map/unmap.
- Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
VS_PARTIAL_FLUSH when patching shaders.
v4:
- Code cleanups.
- Remove unnecessary multiplies.
v5:
- Patch shaders in system memory and re-upload to vram.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tom Stellard [Tue, 27 Jan 2015 16:35:21 +0000 (16:35 +0000)]
radeonsi/compute: Allocate the scratch buffer during state creation
This moves scratch buffer allocation from si_launch_grid() to
si_create_compute_state(). This helps to reduce the overhead of
launching a kernel and also fixes a bug in the code that would cause
the scratch buffer to be too small if a kernel with smaller scratch size
was launched before a kernel with a larger scratch size.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tom Stellard [Fri, 23 Jan 2015 22:54:43 +0000 (22:54 +0000)]
radeonsi: Add radeon_shader_binary member to struct si_shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Tom Stellard [Fri, 23 Jan 2015 22:54:08 +0000 (22:54 +0000)]
radeonsi/compute: Rename si_compute::program to si_compute::shader
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Tue, 27 Jan 2015 14:52:37 +0000 (14:52 +0000)]
radeonsi: Avoid leaking memory when rebuilding shader states
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Jason Ekstrand [Wed, 28 Jan 2015 20:43:47 +0000 (12:43 -0800)]
nir/opcodes: Use a return type of tfloat for ldexp
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 28 Jan 2015 20:42:47 +0000 (12:42 -0800)]
Revert "util: Move the alternate fpclassify implementation to util"
This reverts commits
d6eb572905e39c36168b8f5da240af961f9dde0a and
58e8468d113c7d3d4a59ea4a8d70fd45b78e85e6.
This is no longer necessary as we aren't using it in NIR anymore. Also, it
broke the build on some strange systems so let's put it back in querymatrix
where it came from.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88852
Acked-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 28 Jan 2015 20:41:44 +0000 (12:41 -0800)]
Revert "nir/opcodes: Use fpclassify() instead of isnormal() for ldexp"
This reverts commit
d7d340fb2f68c46bd5a0008ecf53c6693e29c916.
We have an isnormal() implementation available, the only problem was that
we had the wrong return type (fixed in a later patch).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806
Acked-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Wed, 28 Jan 2015 18:38:01 +0000 (10:38 -0800)]
util: Predicate the fpclassify fallback on !defined(__cplusplus)
The problem is that the fallbacks we have at the moment don't work in C++.
While we could theoretically fix the fallbacks it would also raise the
issue of correctly detecting the fpclassify function. So, for now, we'll
just disable it until we actually have a C++ user.
Reported-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: EdB <edb+mesa@sigluy.net>
Sven Arvidsson [Mon, 8 Dec 2014 18:43:09 +0000 (19:43 +0100)]
drirc: set allow_glsl_extension_directive_midshader for Dead Island.
Signed-off-by: Sven Arvidsson <sa@whiz.se>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87076
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Jason Ekstrand [Mon, 26 Jan 2015 22:21:15 +0000 (14:21 -0800)]
nir/opcodes: Use fpclassify() instead of isnormal() for ldexp
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Mon, 26 Jan 2015 22:19:30 +0000 (14:19 -0800)]
util: Move the alternate fpclassify implementation to util
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Tue, 27 Jan 2015 22:18:57 +0000 (14:18 -0800)]
i965/tex: Don't create read-write textures with non-renderable formats
I haven't actually seen this bug in the wild, but it's possible that
someone could ask to do a S3TC PBO download or something. This protects us
from accidentally creating a render target with a compressed or otherwise
non-renderable format.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 27 Jan 2015 22:06:44 +0000 (14:06 -0800)]
i965/gen8: Include the buffer offset when emitting renderbuffer relocs
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88792
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tapani Pälli [Mon, 26 Jan 2015 10:35:23 +0000 (12:35 +0200)]
mesa: improve error messaging for format CSV parser
Patch adds 2 error messages that point user directly to fix
mispelled or impossible swizzle field for a format.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
EdB [Wed, 28 Jan 2015 00:20:38 +0000 (02:20 +0200)]
clover/llvm: Dump the OpenCL C code earlier.
[ Francisco Jerez: As discussed on the mailing list, this is intended
to produce more useful debug output in cases where the compilation
terminates unexpectedly. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
EdB [Sun, 14 Dec 2014 10:31:21 +0000 (11:31 +0100)]
clover/llvm: Move CLOVER_DEBUG stuff into anonymous namespace.
[ Francisco Jerez: As we're at it make debug_options[] local to its
only user and remove temporary. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Dave Airlie [Tue, 27 Jan 2015 03:39:51 +0000 (13:39 +1000)]
r600g: add support for primitive id without geom shader (v2)
GLSL 1.50 specifies a fragment shader may have a primitive id
input without a geometry shader present.
On r600 hw there is a special GS scenario for this, you have
to enable GS_SCENARIO_A and pass the primitive id through
the vertex shader which operates in GS_A mode.
This is a first pass attempt at this, and passes the piglit
tests that test for this.
v1.1: clean up debug print + no need to assign
key value to setup output.
v2: add r600 support
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 27 Jan 2015 03:34:50 +0000 (13:34 +1000)]
r600g: move selecting the pixel shader earlier.
In order to detect that a pixel shader has a prim id
input when we have no geometry shader we need to reorder
the shader selection so the pixel shader is selected
first, then the vertex shader key can take into account
the primitive id input requirement and lack of geom shader.
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Michel Dänzer [Thu, 22 Jan 2015 03:30:24 +0000 (12:30 +0900)]
st/clover: Pass target instead of target.begin() to std::string()
Fixes reading beyond allocated memory:
==1936== Invalid read of size 1
==1936== at 0x4C2C1B4: strlen (vg_replace_strmem.c:412)
==1936== by 0x9E00C30: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==1936== by 0x5B44FAE: clover::compile_program_llvm(clover::compat::string const&, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&, pipe_shader_ir, clover::compat::string const&, clover::compat::string const&, clover::compat::string&) (invocation.cpp:698)
==1936== by 0x5B39A20: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==1936== by 0x5B20152: clBuildProgram (program.cpp:182)
==1936== by 0x400F41: main (hello_world.c:109)
==1936== Address 0x56fee1f is 0 bytes after a block of size 15 alloc'd
==1936== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==1936== by 0x5B398F0: alloc (compat.hpp:59)
==1936== by 0x5B398F0: vector<std::basic_string<char> > (compat.hpp:98)
==1936== by 0x5B398F0: string<std::basic_string<char> > (compat.hpp:327)
==1936== by 0x5B398F0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==1936== by 0x5B20152: clBuildProgram (program.cpp:182)
==1936== by 0x400F41: main (hello_world.c:109)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Michel Dänzer [Thu, 22 Jan 2015 03:36:13 +0000 (12:36 +0900)]
r600g,radeonsi: Fix calculation of IR target cap string buffer size
Fixes writing beyond the allocated buffer:
==31855== Invalid write of size 1
==31855== at 0x50AB2A9: vsprintf (iovsprintf.c:43)
==31855== by 0x508F6F6: sprintf (sprintf.c:32)
==31855== by 0xB59C7EC: r600_get_compute_param (r600_pipe_common.c:526)
==31855== by 0x5B2B7DE: get_compute_param<char> (device.cpp:37)
==31855== by 0x5B2B7DE: clover::device::ir_target() const (device.cpp:201)
==31855== by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855== by 0x5B20152: clBuildProgram (program.cpp:182)
==31855== by 0x400F41: main (hello_world.c:109)
==31855== Address 0x56fed5f is 0 bytes after a block of size 15 alloc'd
==31855== at 0x4C29180: operator new(unsigned long) (vg_replace_malloc.c:324)
==31855== by 0x5B2B7C2: allocate (new_allocator.h:104)
==31855== by 0x5B2B7C2: allocate (alloc_traits.h:357)
==31855== by 0x5B2B7C2: _M_allocate (stl_vector.h:170)
==31855== by 0x5B2B7C2: _M_create_storage (stl_vector.h:185)
==31855== by 0x5B2B7C2: _Vector_base (stl_vector.h:136)
==31855== by 0x5B2B7C2: vector (stl_vector.h:278)
==31855== by 0x5B2B7C2: get_compute_param<char> (device.cpp:35)
==31855== by 0x5B2B7C2: clover::device::ir_target() const (device.cpp:201)
==31855== by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855== by 0x5B20152: clBuildProgram (program.cpp:182)
==31855== by 0x400F41: main (hello_world.c:109)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Connor Abbott [Sun, 25 Jan 2015 16:47:53 +0000 (11:47 -0500)]
nir: fix a bug with constant folding non-per-component instructions
Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.
v2: use new helper name
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Connor Abbott [Sun, 25 Jan 2015 16:42:34 +0000 (11:42 -0500)]
nir: add a helper function for getting the number of source components
Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.
v2: use new name nir_ssa_alu_instr_src_components()
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Sisinty Sasmita Patra [Fri, 12 Dec 2014 21:03:21 +0000 (13:03 -0800)]
i965: Implemente a tiled fast-path for glReadPixels and glGetTexImage
Added intel_readpixels_tiled_mempcpy and intel_gettexsubimage_tiled_mempcpy
functions. These are the fast paths for glReadPixels and glGetTexImage.
On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts
amount of time spent in glReadPixels by more than half and reduces the time
of the entire test by 10%.
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- Refactor to make the functions look more like the old
intel_tex_subimage_tiled_memcpy
- Don't export the readpixels_tiled_memcpy function
- Fix some pointer arithmatic bugs in partial image downloads (using
ReadPixels with a non-zero x or y offset)
- Fix a bug when ReadPixels is performed on an FBO wrapping a texture
miplevel other than zero.
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Better documentation fot the *_tiled_memcpy functions
- Add target restrictions for renderbuffers wrapping textures
v4: Jason Ekstrand <jason.ekstrand@intel.com>
- Only check the return value of brw_bo_map for error and not bo->virtual
v5: Jason Ekstrand <jason.ekstrand@intel.com>
- Don't unnecessarily repeat a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Sisinty Sasmita Patra [Sat, 3 Jan 2015 19:16:08 +0000 (11:16 -0800)]
i965/tiled_memcpy: Add tiled-to-linear paths
This commit addes tiled copy functions for coping from tiled memory to
linear memory. These are very similar to the existing linear-to-tiled
paths.
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- New commit message
- Various whitespace fixes
- Added ptrdiff_t casts as done in commit
225a09790
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Fixed a comment
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Sisinty Sasmita Patra [Fri, 12 Dec 2014 19:28:05 +0000 (11:28 -0800)]
i965: Refactor tiled memcpy functions and move them into their own file
This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
moves it into its own file intel_tiled_memcpy files. Also, xtile_copy and
ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
respectively. The *_faster functions are similarly renamed.
There was also a bit of logic to select between the the libc provided
memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle.
This was moved into an intel_get_memcpy function so that rgba8_copy can
live (and be inlined) in intel_tiled_memcpy.c.
v2: Jason Ekstrand <jason.ekstrand@intel.com>
- Better commit message
- Fix up the copyright on the intel_tiled_memcpy files
- Various whitespace fixes
- Moved a bunch of stuff that did not need to be exposed from
intel_tiled_memcpy.h to intel_tiled_memcpy.c
- Added proper documentation for intel_get_memcpy
- Incorperated the ptrdiff_t tweaks from commit
225a09790
v3: Jason Ekstrand <jason.ekstrand@intel.com>
- Fixed a comment
- Move the tile size constants into the .c file
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Jason Ekstrand [Sat, 3 Jan 2015 02:22:04 +0000 (18:22 -0800)]
i965/tex_subimage: Use the fast tiled path for rectangle textures
There's no reason why we should be doing this for 2D textures and not
rectangles. Just a matter of adding another hunk to the condition.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
Dave Airlie [Wed, 21 Jan 2015 06:40:26 +0000 (16:40 +1000)]
mesa/autoconf: attempt to use gnu99 on older gcc compilers
anonymous structs/union don't work with c99 but do work with gnu99
on gcc 4.4.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Felix Janda [Fri, 23 Jan 2015 16:57:15 +0000 (17:57 +0100)]
mesa: simplify detection of fpclassify
Fixes compilation with musl libc.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Mon, 26 Jan 2015 17:40:25 +0000 (09:40 -0800)]
nir/opcodes: Don't go through doubles when constant-folding iabs
Previously, we called the abs() function in math.h. However, this involves
unnecessarily going through double. This commit changes it to use integers
directly with a ternary.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Mon, 26 Jan 2015 17:36:58 +0000 (09:36 -0800)]
nir/opcodes: Simplify and fix the unpack_half_*_split_* constant expressions
Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it use an implicit
assignment.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 24 Jan 2015 00:57:40 +0000 (16:57 -0800)]
nir: Use pointers for nir_src_copy and nir_dest_copy
This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Kenneth Graunke [Sat, 24 Jan 2015 12:16:54 +0000 (04:16 -0800)]
i965: Handle CMP.nz ... 0 and MOV.nz similarly in cmod propagation.
"MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions.
Previously, we deleted MOV.nz instructions when the instruction
generating the MOV's source also wrote the flag register (as the flag
register already contains the desired value). However, we wouldn't
delete CMP.nz instructions that served the same purpose.
We also didn't attempt true cmod propagation on MOV.nz instructions,
while we would for the equivalent CMP.nz form.
This patch fixes both limitations, treating both forms equally.
CMP.nz instructions will now be deleted (helping the NIR backend),
and MOV.nz instructions will have their .nz propagated.
No changes in shader-db without NIR. With NIR,
total instructions in shared programs:
6006153 ->
5969364 (-0.61%)
instructions in affected programs:
2087139 ->
2050350 (-1.76%)
helped: 10704
HURT: 0
GAINED: 2
LOST: 2
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jan Vesely [Sun, 25 Jan 2015 21:11:40 +0000 (16:11 -0500)]
clover: Fix build with llvm after r226981
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88783
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Niels Ole Salscheider [Sat, 24 Jan 2015 21:49:44 +0000 (22:49 +0100)]
configure: Link against all LLVM targets when building clover
Since
8e7df519bd8556591794b2de08a833a67e34d526, we initialise all targets in
clover. This fixes bug 85380.
v2: Mention correct bug in commit message
Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Connor Abbott [Fri, 23 Jan 2015 04:32:16 +0000 (23:32 -0500)]
nir/constant_folding: use the new constant folding infrastructure
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Jason Ekstrand [Fri, 23 Jan 2015 21:38:46 +0000 (13:38 -0800)]
nir: add new constant folding infrastructure
Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.
The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.
v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
- Use mako to generate one function per opcode instead of doing piles of
string splicing
v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
- More comments and better indentation in the mako
- Add a description of the constant expression language in nir_opcodes.py
- Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Connor Abbott [Fri, 23 Jan 2015 04:32:14 +0000 (23:32 -0500)]
nir: use Python to autogenerate opcode information
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before. In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to add keep information centralized as we
add new things like constant propagation that require per-opcode information.
v2:
- make Opcode derive from object (Dylan)
- don't use assert like it's a function (Dylan)
- style fixes for fnoise, use xrange (Dylan)
- use iterkeys() in nir_opcodes_h.py (Dylan)
- use pydoc-style comments (Jason)
- don't make fmin/fmax commutative and associative yet (Jason)
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v3 Jason Ekstrand <jason.ekstrand@intel.com>
- Alphabetize source file lists
- Generate nir_opcodes.h in the builddir instead of the source dir
- Include $(builddir)/src/glsl/nir in the i965 build
- Rework nir_opcodes.h generation so it generates a complete header file
instead of one that has to be embedded inside an enum declaration
Emil Velikov [Sat, 24 Jan 2015 13:18:10 +0000 (13:18 +0000)]
docs: add news item and link release notes for mesa 10.4.3
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Emil Velikov [Sat, 24 Jan 2015 12:54:33 +0000 (12:54 +0000)]
docs: Add sha256 sums for the 10.4.3 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit
49a5bce7801651574d3f4841d7532d8b2b86af63)
Emil Velikov [Sat, 24 Jan 2015 12:49:17 +0000 (12:49 +0000)]
Add release notes for the 10.4.3 release
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit
e92bfa3f9512832b61706b76b5c9f7afa78199b0)
Matt Turner [Mon, 5 Jan 2015 21:51:03 +0000 (13:51 -0800)]
i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.
total instructions in shared programs:
5952059 ->
5951603 (-0.01%)
instructions in affected programs: 138812 -> 138356 (-0.33%)
GAINED: 1
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Wed, 31 Dec 2014 01:19:41 +0000 (17:19 -0800)]
i965/fs: Add support for removing MOV.NZ instructions.
For some reason, we occasionally write the flag register with a MOV.NZ
instruction:
add(8) g25<1>F -g6<0,1,0>F g15<8,8,1>F
cmp.l.f0(8) g26<1>D g25<8,8,1>F 0F
mov.nz.f0(8) null g26<8,8,1>D
A MOV.NZ instruction on the result of a CMP is like comparing for
equality with true in C. It's useless. Removing it allows us to
generate:
add.l.f0(8) null -g6<0,1,0>F g15<8,8,1>F
total instructions in shared programs:
5955701 ->
5951657 (-0.07%)
instructions in affected programs: 302910 -> 298866 (-1.34%)
GAINED: 1
LOST: 0
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 30 Dec 2014 20:18:57 +0000 (12:18 -0800)]
i965/fs: Allow flipping cond mod for negated arguments.
This allows us to apply the optimization in cases where the CMP's
argument is negated, by flipping the conditional mod. For example, it
allows us to optimize this:
add(8) temp a b
cmp.l.f0(8) null -temp 0.0
into
add.g.f0(8) temp a b
total instructions in shared programs:
5958360 ->
5955701 (-0.04%)
instructions in affected programs: 466880 -> 464221 (-0.57%)
GAINED: 0
LOST: 1
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sat, 3 Jan 2015 20:18:15 +0000 (12:18 -0800)]
i965/fs: Propagate cmod across flag read if it contains the same value.
total instructions in shared programs:
5959463 ->
5958900 (-0.01%)
instructions in affected programs: 70031 -> 69468 (-0.80%)
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Thu, 6 Nov 2014 00:13:59 +0000 (16:13 -0800)]
i965/fs: Add unit tests for cmod propagation pass.
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 22 Aug 2014 17:54:43 +0000 (10:54 -0700)]
i965/fs: Add pass to propagate conditional modifiers.
total instructions in shared programs:
5974160 ->
5959463 (-0.25%)
instructions in affected programs:
1743737 ->
1729040 (-0.84%)
GAINED: 0
LOST: 12
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 22 Aug 2014 18:01:13 +0000 (11:01 -0700)]
i965/fs: Eliminate null-dst instructions without side-effects.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Tue, 30 Dec 2014 20:56:13 +0000 (12:56 -0800)]
i965/fs: Apply conditional mod specially to split MAD/LRP.
Otherwise we'll apply the conditional mod to only one of SIMD8
instructions and trigger an assertion.
NoDDClr/NoDDChk have the same problem but we never apply those to these
instructions, so I'm leaving them for a later time.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>