mesa.git
9 years agoclover/llvm: Dump the OpenCL C code earlier.
EdB [Wed, 28 Jan 2015 00:20:38 +0000 (02:20 +0200)]
clover/llvm: Dump the OpenCL C code earlier.

[ Francisco Jerez: As discussed on the mailing list, this is intended
  to produce more useful debug output in cases where the compilation
  terminates unexpectedly. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoclover/llvm: Move CLOVER_DEBUG stuff into anonymous namespace.
EdB [Sun, 14 Dec 2014 10:31:21 +0000 (11:31 +0100)]
clover/llvm: Move CLOVER_DEBUG stuff into anonymous namespace.

[ Francisco Jerez: As we're at it make debug_options[] local to its
  only user and remove temporary. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agor600g: add support for primitive id without geom shader (v2)
Dave Airlie [Tue, 27 Jan 2015 03:39:51 +0000 (13:39 +1000)]
r600g: add support for primitive id without geom shader (v2)

GLSL 1.50 specifies a fragment shader may have a primitive id
input without a geometry shader present.

On r600 hw there is a special GS scenario for this, you have
to enable GS_SCENARIO_A and pass the primitive id through
the vertex shader which operates in GS_A mode.

This is a first pass attempt at this, and passes the piglit
tests that test for this.

v1.1: clean up debug print + no need to assign
key value to setup output.
v2: add r600 support

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agor600g: move selecting the pixel shader earlier.
Dave Airlie [Tue, 27 Jan 2015 03:34:50 +0000 (13:34 +1000)]
r600g: move selecting the pixel shader earlier.

In order to detect that a pixel shader has a prim id
input when we have no geometry shader we need to reorder
the shader selection so the pixel shader is selected
first, then the vertex shader key can take into account
the primitive id input requirement and lack of geom shader.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agost/clover: Pass target instead of target.begin() to std::string()
Michel Dänzer [Thu, 22 Jan 2015 03:30:24 +0000 (12:30 +0900)]
st/clover: Pass target instead of target.begin() to std::string()

Fixes reading beyond allocated memory:

==1936== Invalid read of size 1
==1936==    at 0x4C2C1B4: strlen (vg_replace_strmem.c:412)
==1936==    by 0x9E00C30: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==1936==    by 0x5B44FAE: clover::compile_program_llvm(clover::compat::string const&, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&, pipe_shader_ir, clover::compat::string const&, clover::compat::string const&, clover::compat::string&) (invocation.cpp:698)
==1936==    by 0x5B39A20: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==1936==    by 0x5B20152: clBuildProgram (program.cpp:182)
==1936==    by 0x400F41: main (hello_world.c:109)
==1936==  Address 0x56fee1f is 0 bytes after a block of size 15 alloc'd
==1936==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==1936==    by 0x5B398F0: alloc (compat.hpp:59)
==1936==    by 0x5B398F0: vector<std::basic_string<char> > (compat.hpp:98)
==1936==    by 0x5B398F0: string<std::basic_string<char> > (compat.hpp:327)
==1936==    by 0x5B398F0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==1936==    by 0x5B20152: clBuildProgram (program.cpp:182)
==1936==    by 0x400F41: main (hello_world.c:109)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agor600g,radeonsi: Fix calculation of IR target cap string buffer size
Michel Dänzer [Thu, 22 Jan 2015 03:36:13 +0000 (12:36 +0900)]
r600g,radeonsi: Fix calculation of IR target cap string buffer size

Fixes writing beyond the allocated buffer:

==31855== Invalid write of size 1
==31855==    at 0x50AB2A9: vsprintf (iovsprintf.c:43)
==31855==    by 0x508F6F6: sprintf (sprintf.c:32)
==31855==    by 0xB59C7EC: r600_get_compute_param (r600_pipe_common.c:526)
==31855==    by 0x5B2B7DE: get_compute_param<char> (device.cpp:37)
==31855==    by 0x5B2B7DE: clover::device::ir_target() const (device.cpp:201)
==31855==    by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855==    by 0x5B20152: clBuildProgram (program.cpp:182)
==31855==    by 0x400F41: main (hello_world.c:109)
==31855==  Address 0x56fed5f is 0 bytes after a block of size 15 alloc'd
==31855==    at 0x4C29180: operator new(unsigned long) (vg_replace_malloc.c:324)
==31855==    by 0x5B2B7C2: allocate (new_allocator.h:104)
==31855==    by 0x5B2B7C2: allocate (alloc_traits.h:357)
==31855==    by 0x5B2B7C2: _M_allocate (stl_vector.h:170)
==31855==    by 0x5B2B7C2: _M_create_storage (stl_vector.h:185)
==31855==    by 0x5B2B7C2: _Vector_base (stl_vector.h:136)
==31855==    by 0x5B2B7C2: vector (stl_vector.h:278)
==31855==    by 0x5B2B7C2: get_compute_param<char> (device.cpp:35)
==31855==    by 0x5B2B7C2: clover::device::ir_target() const (device.cpp:201)
==31855==    by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855==    by 0x5B20152: clBuildProgram (program.cpp:182)
==31855==    by 0x400F41: main (hello_world.c:109)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
9 years agonir: fix a bug with constant folding non-per-component instructions
Connor Abbott [Sun, 25 Jan 2015 16:47:53 +0000 (11:47 -0500)]
nir: fix a bug with constant folding non-per-component instructions

Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.
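
The difference between per-component and fixed-size sources can be sketched in Python as follows. This is only an illustration: the `INPUT_SIZES` table, `src_components()` and `copy_sources()` are hypothetical stand-ins, not the NIR C code, and the convention that a size of 0 means "as wide as the destination" is an assumption of the sketch.

```python
# Hypothetical per-source component counts for two opcodes.
# 0 is used here to mean "per-component": as wide as the destination.
INPUT_SIZES = {
    "fadd": (0, 0),
    "fdot3": (3, 3),
}


def src_components(op, src, dest_components):
    """Number of channels of source `src` that the opcode actually reads."""
    size = INPUT_SIZES[op][src]
    return size if size > 0 else dest_components


def copy_sources(op, srcs, dest_components, buggy=False):
    """Copy the constant source channels that will be fed to the folder."""
    out = []
    for i, values in enumerate(srcs):
        # The buggy variant copies only as many channels as the destination has.
        n = dest_components if buggy else src_components(op, i, dest_components)
        out.append(list(values[:n]))
    return out


def fold_fdot3(srcs):
    a, b = srcs
    return sum(x * y for x, y in zip(a, b))


a = (1.0, 2.0, 3.0)
b = (4.0, 5.0, 6.0)
good = fold_fdot3(copy_sources("fdot3", [a, b], dest_components=1))
bad = fold_fdot3(copy_sources("fdot3", [a, b], dest_components=1, buggy=True))
```

With a one-component destination, the buggy copy keeps only the first channel of each source, so the dot product loses two of its three terms.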

v2: use new helper name

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
9 years agonir: add a helper function for getting the number of source components
Connor Abbott [Sun, 25 Jan 2015 16:42:34 +0000 (11:42 -0500)]
nir: add a helper function for getting the number of source components

Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.

v2: use new name nir_ssa_alu_instr_src_components()

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
9 years agoi965: Implement a tiled fast-path for glReadPixels and glGetTexImage
Sisinty Sasmita Patra [Fri, 12 Dec 2014 21:03:21 +0000 (13:03 -0800)]
i965: Implement a tiled fast-path for glReadPixels and glGetTexImage

Added intel_readpixels_tiled_memcpy and intel_gettexsubimage_tiled_memcpy
functions. These are the fast paths for glReadPixels and glGetTexImage.

On Chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts
the amount of time spent in glReadPixels by more than half and reduces the
time of the entire test by 10%.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - Refactor to make the functions look more like the old
     intel_tex_subimage_tiled_memcpy
   - Don't export the readpixels_tiled_memcpy function
   - Fix some pointer arithmetic bugs in partial image downloads (using
     ReadPixels with a non-zero x or y offset)
   - Fix a bug when ReadPixels is performed on an FBO wrapping a texture
     miplevel other than zero.

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Better documentation for the *_tiled_memcpy functions
   - Add target restrictions for renderbuffers wrapping textures

v4: Jason Ekstrand <jason.ekstrand@intel.com>
   - Only check the return value of brw_bo_map for error and not bo->virtual

v5: Jason Ekstrand <jason.ekstrand@intel.com>
   - Don't unnecessarily repeat a comment

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
9 years agoi965/tiled_memcpy: Add tiled-to-linear paths
Sisinty Sasmita Patra [Sat, 3 Jan 2015 19:16:08 +0000 (11:16 -0800)]
i965/tiled_memcpy: Add tiled-to-linear paths

This commit adds tiled copy functions for copying from tiled memory to
linear memory.  These are very similar to the existing linear-to-tiled
paths.
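
As a rough illustration of what a tiled-to-linear copy does, here is a byte-at-a-time Python sketch. It is not the i965 code: it assumes a simple layout where tiles are stored one after another and each tile is row-major (512 bytes by 8 rows by default, as in X tiling), it ignores address swizzling, and it requires the pitch and height to be whole multiples of the tile size. All function names are hypothetical.

```python
def tiled_offset(x, y, pitch, tile_w, tile_h):
    """Byte offset of (x, y) in a surface built from tile_w x tile_h byte tiles."""
    tiles_per_row = pitch // tile_w
    tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
    return tile_index * tile_w * tile_h + (y % tile_h) * tile_w + (x % tile_w)


def linear_to_tiled(linear, pitch, height, tile_w=512, tile_h=8):
    """Scatter a row-major buffer into the assumed tiled layout."""
    tiled = bytearray(len(linear))
    for y in range(height):
        for x in range(pitch):
            tiled[tiled_offset(x, y, pitch, tile_w, tile_h)] = linear[y * pitch + x]
    return bytes(tiled)


def tiled_to_linear(tiled, pitch, height, tile_w=512, tile_h=8):
    """Gather the assumed tiled layout back into a row-major buffer."""
    linear = bytearray(len(tiled))
    for y in range(height):
        for x in range(pitch):
            linear[y * pitch + x] = tiled[tiled_offset(x, y, pitch, tile_w, tile_h)]
    return bytes(linear)


# Tiny surface: 8 bytes wide, 4 rows, split into 4x2 tiles for readability.
linear = bytes(range(32))
tiled = linear_to_tiled(linear, pitch=8, height=4, tile_w=4, tile_h=2)
roundtrip = tiled_to_linear(tiled, pitch=8, height=4, tile_w=4, tile_h=2)
```

The two directions share the same address computation and differ only in which side of the assignment is indexed with it, which matches the commit's remark that the new paths are very similar to the linear-to-tiled ones.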

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - New commit message
   - Various whitespace fixes
   - Added ptrdiff_t casts as done in commit 225a09790

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Fixed a comment

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
9 years agoi965: Refactor tiled memcpy functions and move them into their own file
Sisinty Sasmita Patra [Fri, 12 Dec 2014 19:28:05 +0000 (11:28 -0800)]
i965: Refactor tiled memcpy functions and move them into their own file

This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
moves it into its own intel_tiled_memcpy files.  Also, xtile_copy and
ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
respectively.  The *_faster functions are similarly renamed.

There was also a bit of logic to select between the libc provided
memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle.
This was moved into an intel_get_memcpy function so that rgba8_copy can
live (and be inlined) in intel_tiled_memcpy.c.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - Better commit message
   - Fix up the copyright on the intel_tiled_memcpy files
   - Various whitespace fixes
   - Moved a bunch of stuff that did not need to be exposed from
     intel_tiled_memcpy.h to intel_tiled_memcpy.c
   - Added proper documentation for intel_get_memcpy
   - Incorporated the ptrdiff_t tweaks from commit 225a09790

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Fixed a comment
   - Move the tile size constants into the .c file

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
9 years agoi965/tex_subimage: Use the fast tiled path for rectangle textures
Jason Ekstrand [Sat, 3 Jan 2015 02:22:04 +0000 (18:22 -0800)]
i965/tex_subimage: Use the fast tiled path for rectangle textures

There's no reason why we should be doing this for 2D textures and not
rectangles.  Just a matter of adding another hunk to the condition.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
9 years agomesa/autoconf: attempt to use gnu99 on older gcc compilers
Dave Airlie [Wed, 21 Jan 2015 06:40:26 +0000 (16:40 +1000)]
mesa/autoconf: attempt to use gnu99 on older gcc compilers

Anonymous structs/unions don't work with c99 but do work with gnu99
on gcc 4.4.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agomesa: simplify detection of fpclassify
Felix Janda [Fri, 23 Jan 2015 16:57:15 +0000 (17:57 +0100)]
mesa: simplify detection of fpclassify

Fixes compilation with musl libc.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agonir/opcodes: Don't go through doubles when constant-folding iabs
Jason Ekstrand [Mon, 26 Jan 2015 17:40:25 +0000 (09:40 -0800)]
nir/opcodes: Don't go through doubles when constant-folding iabs

Previously, we called the abs() function in math.h.  However, this involves
unnecessarily going through double.  This commit changes it to use integers
directly with a ternary.
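
The integer-only form can be sketched in Python as follows. The `to_int32()` helper and `fold_iabs()` are hypothetical names for this illustration; the 32-bit wrap-around is emulated by hand and is a property of this sketch, not a statement about what the C code does for the most negative value.

```python
def to_int32(value):
    """Wrap a Python integer to a signed 32-bit value (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def fold_iabs(src0):
    """Integer absolute value using a conditional, with no float round trip."""
    return to_int32(-src0 if src0 < 0 else src0)
```

For example, `fold_iabs(-5)` gives 5 without ever converting the operand to a floating-point type.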

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
9 years agonir/opcodes: Simplify and fix the unpack_half_*_split_* constant expressions
Jason Ekstrand [Mon, 26 Jan 2015 17:36:58 +0000 (09:36 -0800)]
nir/opcodes: Simplify and fix the unpack_half_*_split_* constant expressions

Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it to use an implicit
assignment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agonir: Use pointers for nir_src_copy and nir_dest_copy
Jason Ekstrand [Sat, 24 Jan 2015 00:57:40 +0000 (16:57 -0800)]
nir: Use pointers for nir_src_copy and nir_dest_copy

This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agoi965: Handle CMP.nz ... 0 and MOV.nz similarly in cmod propagation.
Kenneth Graunke [Sat, 24 Jan 2015 12:16:54 +0000 (04:16 -0800)]
i965: Handle CMP.nz ... 0 and MOV.nz similarly in cmod propagation.

"MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions.

Previously, we deleted MOV.nz instructions when the instruction
generating the MOV's source also wrote the flag register (as the flag
register already contains the desired value).  However, we wouldn't
delete CMP.nz instructions that served the same purpose.

We also didn't attempt true cmod propagation on MOV.nz instructions,
while we would for the equivalent CMP.nz form.

This patch fixes both limitations, treating both forms equally.
CMP.nz instructions will now be deleted (helping the NIR backend),
and MOV.nz instructions will have their .nz propagated.

No changes in shader-db without NIR.  With NIR,

total instructions in shared programs: 6006153 -> 5969364 (-0.61%)
instructions in affected programs:     2087139 -> 2050350 (-1.76%)
helped:                                10704
HURT:                                  0
GAINED:                                2
LOST:                                  2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agoclover: Fix build with llvm after r226981
Jan Vesely [Sun, 25 Jan 2015 21:11:40 +0000 (16:11 -0500)]
clover: Fix build with llvm after r226981

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88783
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
9 years agoconfigure: Link against all LLVM targets when building clover
Niels Ole Salscheider [Sat, 24 Jan 2015 21:49:44 +0000 (22:49 +0100)]
configure: Link against all LLVM targets when building clover

Since 8e7df519bd8556591794b2de08a833a67e34d526, we initialise all targets in
clover. This fixes bug 85380.

v2: Mention correct bug in commit message

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agonir/constant_folding: use the new constant folding infrastructure
Connor Abbott [Fri, 23 Jan 2015 04:32:16 +0000 (23:32 -0500)]
nir/constant_folding: use the new constant folding infrastructure

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agonir: add new constant folding infrastructure
Jason Ekstrand [Fri, 23 Jan 2015 21:38:46 +0000 (13:38 -0800)]
nir: add new constant folding infrastructure

Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.
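
A much-reduced Python sketch of the idea follows. It is an illustration only: the real infrastructure generates C functions with Mako, whereas this sketch evaluates the expressions directly with `eval()`. The `opcode()`, `make_folder()` and `constant_fold()` helpers, the `src0`/`src1` naming and the three example opcodes are assumptions made for the example.

```python
class Opcode(object):
    """An opcode whose constant-folding rule is stored next to its definition."""

    def __init__(self, name, num_inputs, const_expr):
        self.name = name
        self.num_inputs = num_inputs
        self.const_expr = const_expr


OPCODES = {}


def opcode(name, num_inputs, const_expr):
    OPCODES[name] = Opcode(name, num_inputs, const_expr)


# Each definition carries the expression that computes its constant result.
opcode("fadd", 2, "src0 + src1")
opcode("fmul", 2, "src0 * src1")
opcode("fneg", 1, "-src0")


def make_folder(op):
    """Expand an opcode's const_expr into a function over constant inputs."""
    code = compile(op.const_expr, "<const_expr:%s>" % op.name, "eval")

    def fold(srcs):
        assert len(srcs) == op.num_inputs
        env = {"src%d" % i: value for i, value in enumerate(srcs)}
        return eval(code, {"__builtins__": {}}, env)

    return fold


FOLDERS = {name: make_folder(op) for name, op in OPCODES.items()}


def constant_fold(name, srcs):
    """Take an opcode name and an array of constant inputs; return the result."""
    return FOLDERS[name](srcs)
```

Adding an opcode in this sketch means adding one `opcode()` line, and the folder for it is derived automatically, which is the "one less place to update" property described above.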

The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.

v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
 - Use mako to generate one function per opcode instead of doing piles of
   string splicing

v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
 - More comments and better indentation in the mako
 - Add a description of the constant expression language in nir_opcodes.py
 - Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agonir: use Python to autogenerate opcode information
Connor Abbott [Fri, 23 Jan 2015 04:32:14 +0000 (23:32 -0500)]
nir: use Python to autogenerate opcode information

Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before.  In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to keep information centralized as we
add new things like constant propagation that require per-opcode information.

v2:
 - make Opcode derive from object (Dylan)
 - don't use assert like it's a function (Dylan)
 - style fixes for fnoise, use xrange (Dylan)
 - use iterkeys() in nir_opcodes_h.py (Dylan)
 - use pydoc-style comments (Jason)
 - don't make fmin/fmax commutative and associative yet (Jason)

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v3 Jason Ekstrand <jason.ekstrand@intel.com>
 - Alphabetize source file lists
 - Generate nir_opcodes.h in the builddir instead of the source dir
 - Include $(builddir)/src/glsl/nir in the i965 build
 - Rework nir_opcodes.h generation so it generates a complete header file
   instead of one that has to be embedded inside an enum declaration

9 years agodocs: add news item and link release notes for mesa 10.4.3
Emil Velikov [Sat, 24 Jan 2015 13:18:10 +0000 (13:18 +0000)]
docs: add news item and link release notes for mesa 10.4.3

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agodocs: Add sha256 sums for the 10.4.3 release
Emil Velikov [Sat, 24 Jan 2015 12:54:33 +0000 (12:54 +0000)]
docs: Add sha256 sums for the 10.4.3 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 49a5bce7801651574d3f4841d7532d8b2b86af63)

9 years agoAdd release notes for the 10.4.3 release
Emil Velikov [Sat, 24 Jan 2015 12:49:17 +0000 (12:49 +0000)]
Add release notes for the 10.4.3 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit e92bfa3f9512832b61706b76b5c9f7afa78199b0)

9 years agoi965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.
Matt Turner [Mon, 5 Jan 2015 21:51:03 +0000 (13:51 -0800)]
i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.

total instructions in shared programs: 5952059 -> 5951603 (-0.01%)
instructions in affected programs:     138812 -> 138356 (-0.33%)
GAINED:                                1
LOST:                                  0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965/fs: Add support for removing MOV.NZ instructions.
Matt Turner [Wed, 31 Dec 2014 01:19:41 +0000 (17:19 -0800)]
i965/fs: Add support for removing MOV.NZ instructions.

For some reason, we occasionally write the flag register with a MOV.NZ
instruction:

   add(8)          g25<1>F         -g6<0,1,0>F     g15<8,8,1>F
   cmp.l.f0(8)     g26<1>D         g25<8,8,1>F     0F
   mov.nz.f0(8)    null            g26<8,8,1>D

A MOV.NZ instruction on the result of a CMP is like comparing for
equality with true in C. It's useless. Removing it allows us to
generate:

   add.l.f0(8)     null            -g6<0,1,0>F     g15<8,8,1>F
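
The MOV.NZ removal step alone can be sketched in Python as below. This is a simplified model, not the i965 pass: instructions are plain dictionaries, there is a single flag register, and the MOV is only treated as redundant when the instruction producing its source is a CMP with a conditional modifier. All names are hypothetical. Folding the CMP into the ADD is left to the separate cmod propagation pass.

```python
def flag_already_set(earlier, reg):
    """True if the CMP that produced `reg` still owns the flag register."""
    for prev in reversed(earlier):
        if prev["dst"] == reg:
            return prev["op"] == "cmp" and prev["cmod"] is not None
        if prev["cmod"] is not None:
            return False  # some other instruction overwrote the flag
    return False


def remove_redundant_mov_nz(insts):
    """Drop 'mov.nz null, src' when the flag already holds the same value."""
    out = []
    for inst in insts:
        if (inst["op"] == "mov" and inst["cmod"] == "nz"
                and inst["dst"] == "null"
                and flag_already_set(out, inst["srcs"][0])):
            continue
        out.append(inst)
    return out


program = [
    {"op": "add", "cmod": None, "dst": "g25", "srcs": ["-g6", "g15"]},
    {"op": "cmp", "cmod": "l", "dst": "g26", "srcs": ["g25", "0F"]},
    {"op": "mov", "cmod": "nz", "dst": "null", "srcs": ["g26"]},
]
optimized = remove_redundant_mov_nz(program)

# A MOV.NZ whose source comes from a plain ADD must be kept.
kept = remove_redundant_mov_nz([
    {"op": "add", "cmod": None, "dst": "g25", "srcs": ["g6", "g15"]},
    {"op": "mov", "cmod": "nz", "dst": "null", "srcs": ["g25"]},
])
```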

total instructions in shared programs: 5955701 -> 5951657 (-0.07%)
instructions in affected programs:     302910 -> 298866 (-1.34%)
GAINED:                                1
LOST:                                  0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965/fs: Allow flipping cond mod for negated arguments.
Matt Turner [Tue, 30 Dec 2014 20:18:57 +0000 (12:18 -0800)]
i965/fs: Allow flipping cond mod for negated arguments.

This allows us to apply the optimization in cases where the CMP's
argument is negated, by flipping the conditional mod. For example, it
allows us to optimize this:

   add(8)       temp   a      b
   cmp.l.f0(8)  null   -temp  0.0

into

   add.g.f0(8)  temp   a      b
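
The flip follows from the fact that -x < 0 exactly when x > 0. A small Python sketch of the rule, using a hypothetical dictionary-based instruction format and hypothetical names:

```python
# Conditional mod to use when the compared operand is negated.
FLIPPED_CMOD = {"l": "g", "g": "l", "le": "ge", "ge": "le", "z": "z", "nz": "nz"}


def propagate_cmp_zero(add, cmp):
    """Fold 'cmp.<cmod> null, [-]temp, 0.0' into the ADD that wrote temp.

    Returns the new ADD, or None if the CMP does not test the ADD's result
    against zero.
    """
    src = cmp["srcs"][0]
    negated = src.startswith("-")
    if src.lstrip("-") != add["dst"] or cmp["srcs"][1] != "0.0":
        return None
    cmod = FLIPPED_CMOD[cmp["cmod"]] if negated else cmp["cmod"]
    return dict(add, cmod=cmod)


add = {"op": "add", "cmod": None, "dst": "temp", "srcs": ["a", "b"]}
cmp_negated = {"op": "cmp", "cmod": "l", "dst": "null", "srcs": ["-temp", "0.0"]}
cmp_plain = {"op": "cmp", "cmod": "l", "dst": "null", "srcs": ["temp", "0.0"]}

flipped = propagate_cmp_zero(add, cmp_negated)
unflipped = propagate_cmp_zero(add, cmp_plain)
```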

total instructions in shared programs: 5958360 -> 5955701 (-0.04%)
instructions in affected programs:     466880 -> 464221 (-0.57%)
GAINED:                                0
LOST:                                  1

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965/fs: Propagate cmod across flag read if it contains the same value.
Matt Turner [Sat, 3 Jan 2015 20:18:15 +0000 (12:18 -0800)]
i965/fs: Propagate cmod across flag read if it contains the same value.

total instructions in shared programs: 5959463 -> 5958900 (-0.01%)
instructions in affected programs:     70031 -> 69468 (-0.80%)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965/fs: Add unit tests for cmod propagation pass.
Matt Turner [Thu, 6 Nov 2014 00:13:59 +0000 (16:13 -0800)]
i965/fs: Add unit tests for cmod propagation pass.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965/fs: Add pass to propagate conditional modifiers.
Matt Turner [Fri, 22 Aug 2014 17:54:43 +0000 (10:54 -0700)]
i965/fs: Add pass to propagate conditional modifiers.

total instructions in shared programs: 5974160 -> 5959463 (-0.25%)
instructions in affected programs:     1743737 -> 1729040 (-0.84%)
GAINED:                                0
LOST:                                  12

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965/fs: Eliminate null-dst instructions without side-effects.
Matt Turner [Fri, 22 Aug 2014 18:01:13 +0000 (11:01 -0700)]
i965/fs: Eliminate null-dst instructions without side-effects.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965/fs: Apply conditional mod specially to split MAD/LRP.
Matt Turner [Tue, 30 Dec 2014 20:56:13 +0000 (12:56 -0800)]
i965/fs: Apply conditional mod specially to split MAD/LRP.

Otherwise we'll apply the conditional mod to only one of SIMD8
instructions and trigger an assertion.

NoDDClr/NoDDChk have the same problem but we never apply those to these
instructions, so I'm leaving them for a later time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965/fs: Add a pass to fixup 3-src instructions that have a null dest.
Matt Turner [Tue, 30 Dec 2014 04:33:12 +0000 (20:33 -0800)]
i965/fs: Add a pass to fixup 3-src instructions that have a null dest.

3-src instructions can only have GRF/MRF destinations. It's really
difficult to deal with that restriction in dead code elimination (that
wants to give instructions null destinations to show that their result
isn't used) while allowing 3-src instructions to have conditional mod,
so don't, and just give them a destination before register allocation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965: Add is_3src() to backend_instruction.
Matt Turner [Tue, 30 Dec 2014 03:29:21 +0000 (19:29 -0800)]
i965: Add is_3src() to backend_instruction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965: Add backend_instruction::can_do_cmod().
Matt Turner [Sun, 24 Aug 2014 21:01:48 +0000 (14:01 -0700)]
i965: Add backend_instruction::can_do_cmod().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965/cfg: Add a foreach_block_reverse macro.
Matt Turner [Wed, 31 Dec 2014 03:54:50 +0000 (19:54 -0800)]
i965/cfg: Add a foreach_block_reverse macro.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoi965/cfg: Add a foreach_inst_in_block_reverse_safe macro.
Matt Turner [Wed, 31 Dec 2014 00:14:43 +0000 (16:14 -0800)]
i965/cfg: Add a foreach_inst_in_block_reverse_safe macro.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoglsl: Add a foreach_in_list_reverse_safe macro.
Matt Turner [Wed, 31 Dec 2014 00:14:25 +0000 (16:14 -0800)]
glsl: Add a foreach_in_list_reverse_safe macro.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965: Don't make instructions with a null dest a barrier to scheduling.
Matt Turner [Wed, 9 Apr 2014 20:38:14 +0000 (13:38 -0700)]
i965: Don't make instructions with a null dest a barrier to scheduling.

Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
integer multiplication pattern:

   mul  acc0, x, y
   mach null, x, y
   mov  dest, acc0

Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.

GAINED:                                103
LOST:                                  43

Causes one program to spill (643 -> 1076 instructions).

I committed this patch last year (commit 42a26cb5) but reverted it
(commit 0d3f83f4) after inexplicable artifacts in Kerbal Space Program
(bug 78648). Tapani reapplied this patch and could not reproduce the bug
with current Mesa.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoi965/fs: Allow SIMD16 on pre-SNB when try_replace_with_sel is successful
Ian Romanick [Fri, 23 Jan 2015 20:31:05 +0000 (12:31 -0800)]
i965/fs: Allow SIMD16 on pre-SNB when try_replace_with_sel is successful

If try_replace_with_sel is able to replace the flow control with a SEL
instruction, then there is no flow control... failing SIMD16 because
of nonexistent flow control is wrong.

No piglit regressions on any i965 platform in Jenkins.

total instructions in shared programs: 4382707 -> 4382707 (0.00%)
instructions in affected programs:     0 -> 0
helped:                                0
HURT:                                  0
GAINED:                                2089
LOST:                                  0

No other platforms affected in shader-db.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir: Expose nir_print_instr() for debug prints
Eric Anholt [Wed, 5 Nov 2014 23:10:37 +0000 (15:10 -0800)]
nir: Expose nir_print_instr() for debug prints

It's nice to have this present in your default cases so you can see what
instruction is triggering an abort.

v2: Just pass a NULL state, now that it won't crash when you do.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agonir: When asked to print with a NULL state, just use bare variable names.
Eric Anholt [Fri, 23 Jan 2015 22:47:50 +0000 (14:47 -0800)]
nir: When asked to print with a NULL state, just use bare variable names.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agonir: Add nir_lower_alu_to_scalar.
Eric Anholt [Thu, 13 Nov 2014 20:40:59 +0000 (12:40 -0800)]
nir: Add nir_lower_alu_to_scalar.

This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.

v2: Use the nir_src_for_ssa() helper, and another instance of
    nir_alu_src_copy().
v3: Drop the non-SSA support.  All intended callers will have SSA-only ALU
    ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
    unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
    about weird input_sizes[].

Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>
9 years agonir: Make some helpers for copying ALU src/dests.
Eric Anholt [Wed, 21 Jan 2015 23:55:23 +0000 (15:55 -0800)]
nir: Make some helpers for copying ALU src/dests.

There aren't many users yet, but I wanted to do this from my scalarizing
pass.

v2: Constify the src arguments.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agonir: Add algebraic optimizations for division and reciprocal.
Kenneth Graunke [Thu, 22 Jan 2015 07:52:17 +0000 (23:52 -0800)]
nir: Add algebraic optimizations for division and reciprocal.

These also exist in opt_algebraic.cpp.

total NIR instructions in shared programs: 2011430 -> 2011211 (-0.01%)
NIR instructions in affected programs:     42221 -> 42002 (-0.52%)
helped:                                    198

total i965 instructions in shared programs: 6020553 -> 6020116 (-0.01%)
i965 instructions in affected programs:     84322 -> 83885 (-0.52%)
helped:                                     394
HURT:                                       1 (by 1 instruction)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agonir: Add algebraic optimizations for exponential/logarithmic functions.
Kenneth Graunke [Thu, 22 Jan 2015 07:47:06 +0000 (23:47 -0800)]
nir: Add algebraic optimizations for exponential/logarithmic functions.

Most of these exist in the GLSL IR algebraic pass already.  However,
SSA allows us to find more instances of the patterns.

total NIR instructions in shared programs: 2015593 -> 2011430 (-0.21%)
NIR instructions in affected programs:     124189 -> 120026 (-3.35%)
helped:                                    604

total i965 instructions in shared programs: 6025505 -> 6018717 (-0.11%)
i965 instructions in affected programs:     261295 -> 254507 (-2.60%)
helped:                                     1295
HURT:                                       3

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agonir: Add algebraic optimizations for simplifying comparisons.
Kenneth Graunke [Thu, 22 Jan 2015 07:36:01 +0000 (23:36 -0800)]
nir: Add algebraic optimizations for simplifying comparisons.

The first batch removes bonus fnot/inot operations, possibly allowing
other optimizations to better recognize patterns.

The next batch replaces a fadd and constant 0.0 with an fneg - negation
is usually free on GPUs, while addition is not.

total NIR instructions in shared programs: 2020814 -> 2015593 (-0.26%)
NIR instructions in affected programs:     411143 -> 405922 (-1.27%)
helped:                                    2233
HURT:                                      214

A few shaders are hurt by a few instructions due to moving neg such
that it has a constant operand, which is then folded, resulting in two
distinct load_consts for x and -x.  We can always clean that up later.

total i965 instructions in shared programs: 6035392 -> 6025505 (-0.16%)
i965 instructions in affected programs:     784980 -> 775093 (-1.26%)
helped:                                     4508
HURT:                                       2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
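
As a sanity check on the two batches described in the comparison commit above, here is a small sketch that compares an original and a rewritten form on a handful of finite values. The specific forms shown (`not (a < b)` versus `a >= b`, and `(a + b) < 0.0` versus `a < -b`) are my assumption about the kind of pattern meant. NaN inputs are deliberately left out, since `not (a < b)` and `a >= b` disagree there.

```python
import itertools

def flt_of_sum(a, b):
    """Original form: (a + b) < 0.0, which costs an add."""
    return (a + b) < 0.0

def flt_with_neg(a, b):
    """Rewritten form: a < -b, with the negation folded into the operand."""
    return a < -b

def not_flt(a, b):
    """Original form: a comparison followed by a separate not."""
    return not (a < b)

def fge(a, b):
    """Rewritten form: the inverse comparison, with no extra not."""
    return a >= b

# Hypothetical sample values; finite only, no NaN.
SAMPLES = [-2.5, -1.0, -0.0, 0.0, 0.5, 1.0, 3.0]

def forms_agree():
    """True if both rewrites agree with the originals on every sample pair."""
    return all(
        flt_of_sum(a, b) == flt_with_neg(a, b) and not_flt(a, b) == fge(a, b)
        for a, b in itertools.product(SAMPLES, repeat=2)
    )
```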
9 years agonir: Add algebraic optimizations for pointless shifts.
Kenneth Graunke [Thu, 22 Jan 2015 07:25:56 +0000 (23:25 -0800)]
nir: Add algebraic optimizations for pointless shifts.

The GLSL IR optimization pass contained these; we may as well include
them too.

v2: Fix a >> 0 and a << 0 optimizations (caught by Matt).

No change in the number of NIR instructions on a shader-db run.

total i965 instructions in shared programs: 6035397 -> 6035392 (-0.00%)
i965 instructions in affected programs:     542 -> 537 (-0.92%)
helped:                                     2 (in glamor)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agonir: Add a bunch of algebraic optimizations on logic/bit operations.
Kenneth Graunke [Mon, 19 Jan 2015 22:57:38 +0000 (14:57 -0800)]
nir: Add a bunch of algebraic optimizations on logic/bit operations.

Matt and I noticed a bunch of "val <- ior a a" operations in a shader,
so we decided to add an algebraic optimization for that.  While there,
I decided to add a bunch more of them.

v2: Delete bogus fand/for optimizations (caught by Jason).

total NIR instructions in shared programs: 2023511 -> 2020814 (-0.13%)
NIR instructions in affected programs:     149634 -> 146937 (-1.80%)
helped:                                    1032

total i965 instructions in shared programs: 6035392 -> 6035397 (0.00%)
i965 instructions in affected programs:     537 -> 542 (0.93%)
HURT:                                       2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
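
A minimal sketch of the `ior a a` case from the commit above, written as a standalone helper. Only `ior a a` is mentioned in the message; the `iand` and `ixor` cases, the function name `simplify_logic`, and the string SSA names are assumptions added for illustration.

```python
def simplify_logic(op, a, b):
    """Return a simpler equivalent of (op a b), or None if nothing applies.

    a and b are SSA value names; the result is either a name or the
    integer constant 0.
    """
    if a != b:
        return None
    if op in ("ior", "iand"):
        return a        # a | a == a and a & a == a
    if op == "ixor":
        return 0        # a ^ a == 0
    return None

def identities_hold(value):
    """Check the underlying integer identities for one 32-bit value."""
    value &= 0xFFFFFFFF
    return (value | value) == value and (value & value) == value and (value ^ value) == 0
```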
9 years agonir: Implement CSE on intrinsics that can be eliminated and reordered.
Kenneth Graunke [Mon, 19 Jan 2015 22:51:24 +0000 (14:51 -0800)]
nir: Implement CSE on intrinsics that can be eliminated and reordered.

Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had
load_input and load_uniform intrinsics repeated several times, with the
same parameters, but each one generating a distinct SSA value.  This
made ALU operations on those values appear distinct as well.

Generating distinct SSA values is silly - these are read only variables.
CSE'ing them makes everything use a single SSA value, which then allows
other operations to be CSE'd away as well.

Generalizing a bit, it seems like we should be able to safely CSE any
intrinsics that can be eliminated and reordered.  I didn't implement
support for variables for the time being.

v2: Assert that info->num_variables == 0 (requested by Jason).

total NIR instructions in shared programs: 2435936 -> 2023511 (-16.93%)
NIR instructions in affected programs:     2413496 -> 2001071 (-17.09%)
helped:                                    16872

total i965 instructions in shared programs: 6028987 -> 6008427 (-0.34%)
i965 instructions in affected programs:     640654 -> 620094 (-3.21%)
helped:                                     2071
HURT:                                       585
GAINED:                                     14
LOST:                                       25

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
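
The knock-on effect described in the CSE commit above can be sketched as follows: once repeated loads share one SSA value, the ALU operations on them become identical and can be merged too. The `(dest, opcode, args)` instruction form, the `CSE_OPS` set, and the `cse` helper are hypothetical and ignore control flow entirely; this is not how the actual pass is structured.

```python
# Hypothetical instruction form: (dest, opcode, args). Not NIR's real
# data structures, and straight-line code only.
CSE_OPS = {"load_input", "load_uniform", "fadd", "fmul"}

def cse(instructions):
    """Drop instructions that recompute an already available value."""
    seen = {}     # (opcode, args) -> dest that first computed it
    rename = {}   # eliminated dest -> surviving dest
    kept = []
    for dest, op, args in instructions:
        # Rewrite operands first so later duplicates become visible.
        args = tuple(rename.get(a, a) for a in args)
        key = (op, args)
        if op in CSE_OPS:
            if key in seen:
                rename[dest] = seen[key]
                continue
            seen[key] = dest
        kept.append((dest, op, args))
    return kept

PROGRAM = [
    ("ssa_0", "load_uniform", (0,)),
    ("ssa_1", "load_uniform", (0,)),           # same load, distinct SSA value
    ("ssa_2", "fmul", ("ssa_0", "ssa_0")),
    ("ssa_3", "fmul", ("ssa_1", "ssa_1")),     # looks distinct until ssa_1 is merged
    ("ssa_4", "fadd", ("ssa_2", "ssa_3")),
]
```

Running `cse(PROGRAM)` leaves three instructions: one load, one `fmul`, and the `fadd` with both operands renamed to `ssa_2`.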
9 years agonir: Pull nir_instr_can_cse()'s SSA checks out of the switch.
Kenneth Graunke [Wed, 21 Jan 2015 20:20:59 +0000 (12:20 -0800)]
nir: Pull nir_instr_can_cse()'s SSA checks out of the switch.

This should not be a change in behavior, as all current cases that
potentially answer "yes" require SSA.

The next patch will introduce another case that requires SSA.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agoi965/nir: Report NIR instruction counts (in SSA form) via KHR_debug.
Kenneth Graunke [Wed, 21 Jan 2015 09:51:21 +0000 (01:51 -0800)]
i965/nir: Report NIR instruction counts (in SSA form) via KHR_debug.

This allows us to count NIR instructions via shader-db.

Use "run" as normal.  The results file will contain both NIR and
assembly.

Then, to generate a NIR report:
./report.py <(grep    NIR results/foo) <(grep    NIR results/bar)

Or, to generate an i965 report:
./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
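
The percentages quoted in the shader-db summaries throughout this log are consistent with a plain relative change between the two counts. A minimal sketch, assuming that reading; `percent_change` is a hypothetical helper and says nothing about how report.py is written.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent, to two decimals."""
    return round((after - before) / before * 100.0, 2)

# Counts taken from the "total NIR instructions" lines in this log.
CSE_TOTAL = percent_change(2435936, 2023511)      # reported as -16.93%
EXP_LOG_TOTAL = percent_change(2015593, 2011430)  # reported as -0.21%
```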
9 years agoi965/nir: Print NIR on INTEL_DEBUG=fs.
Kenneth Graunke [Tue, 20 Jan 2015 07:11:54 +0000 (23:11 -0800)]
i965/nir: Print NIR on INTEL_DEBUG=fs.

This is useful for debugging and looking for optimization opportunities.

It will need to be expanded when we add support for other scalar stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agoi965/nir: Do optimizations again just before lowering source mods.
Kenneth Graunke [Tue, 20 Jan 2015 06:11:39 +0000 (22:11 -0800)]
i965/nir: Do optimizations again just before lowering source mods.

We want to run CSE and algebraic optimizations again after lowering IO.
Some of the passes in the optimization loop don't handle saturates and
other modifiers, so run it before lowering to source modifiers.

total instructions in shared programs: 6046190 -> 6045768 (-0.01%)
instructions in affected programs:     22406 -> 21984 (-1.88%)
helped:                                47
HURT:                                  0
GAINED:                                0
LOST:                                  0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoloader: Remove NEED_OPENGL_COMMON check.
Matt Turner [Thu, 18 Dec 2014 22:10:17 +0000 (14:10 -0800)]
loader: Remove NEED_OPENGL_COMMON check.

HAVE_DRICOMMON is sufficient since OpenGL must be enabled for DRI.

9 years agogitignore: Ignore .tar.xz files.
Matt Turner [Thu, 18 Dec 2014 21:35:47 +0000 (13:35 -0800)]
gitignore: Ignore .tar.xz files.

9 years agomesa: Build with subdir-objects.
Matt Turner [Thu, 18 Dec 2014 19:44:07 +0000 (11:44 -0800)]
mesa: Build with subdir-objects.

9 years agoglsl: Build a libglsl_util library.
Matt Turner [Thu, 18 Dec 2014 21:33:29 +0000 (13:33 -0800)]
glsl: Build a libglsl_util library.

Rather than sourcing files with ../dir/file.c which leads to distclean
wiping out ../dir's .deps directory.

9 years agomapi: Build with subdir-objects.
Matt Turner [Thu, 18 Dec 2014 02:50:25 +0000 (18:50 -0800)]
mapi: Build with subdir-objects.

9 years agomapi: Remove vgapi from SUBDIRS.
Matt Turner [Thu, 18 Dec 2014 04:28:19 +0000 (20:28 -0800)]
mapi: Remove vgapi from SUBDIRS.

OpenVG is disabled via autotools.

9 years agomesa: Drop inclusion of glapi_gen.mk.
Matt Turner [Thu, 18 Dec 2014 03:46:22 +0000 (19:46 -0800)]
mesa: Drop inclusion of glapi_gen.mk.

Some glapi headers used to be generated from this Makefile.am, but no
longer.

9 years agoglsl: Build with subdir-objects.
Matt Turner [Mon, 8 Dec 2014 00:09:35 +0000 (16:09 -0800)]
glsl: Build with subdir-objects.

Apparently $(top_srcdir) is not expanded in a source list when using
subdir-objects, so remove that. It's not clear to me why we were going
to such lengths to prefix each source file anyway.

9 years agonir: Add headers to distribution.
Matt Turner [Fri, 23 Jan 2015 22:27:39 +0000 (14:27 -0800)]
nir: Add headers to distribution.

9 years agonir: Add nir_{opt_,}algebraic.py to distribution.
Matt Turner [Fri, 23 Jan 2015 22:25:46 +0000 (14:25 -0800)]
nir: Add nir_{opt_,}algebraic.py to distribution.

9 years agomesa: Add format_{un,}pack.py to distribution.
Matt Turner [Fri, 23 Jan 2015 22:25:10 +0000 (14:25 -0800)]
mesa: Add format_{un,}pack.py to distribution.

9 years agomesa: Remove pack_tmp.h from sources.
Matt Turner [Fri, 23 Jan 2015 21:35:25 +0000 (13:35 -0800)]
mesa: Remove pack_tmp.h from sources.

Missed in commit 3a4de321.

9 years agonir: add generated file to .gitignore
Connor Abbott [Fri, 23 Jan 2015 04:32:13 +0000 (23:32 -0500)]
nir: add generated file to .gitignore

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoi965: Fix min_vs_entries for CHV
Ville Syrjälä [Mon, 19 Jan 2015 14:09:10 +0000 (16:09 +0200)]
i965: Fix min_vs_entries for CHV

According to BSpec the correct number for min_vs_entries is 34 for CHV.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
9 years agoi965: Fix max_wm_threads for CHV
Ville Syrjälä [Mon, 19 Jan 2015 14:08:31 +0000 (16:08 +0200)]
i965: Fix max_wm_threads for CHV

Change max_wm_threads to match the spec on CHV. The max number of
threads in 3DSTATE_PS is always programmed to 64 and the hardware
internally scales that depending on the GT SKU. So this doesn't
change the max number of threads actually used, but it does affect
the scratch space calculation.

On CHV the old value was too small, so the amount of scratch space
allocated wasn't sufficient to satisfy the actual max number of
threads used.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
9 years agoglsl: fix stale comment
Connor Abbott [Wed, 17 Dec 2014 03:32:21 +0000 (22:32 -0500)]
glsl: fix stale comment

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
9 years agoi965/emit: Assert that src1 is not an MRF after doing the MRF->GRF conversion
Jason Ekstrand [Thu, 22 Jan 2015 23:49:56 +0000 (15:49 -0800)]
i965/emit: Assert that src1 is not an MRF after doing the MRF->GRF conversion

When emitting texturing from indirect texture units, we need to be able to
scratch around in the header message.  Since we only do this for >= HSW,
this is ok since there are no MRFs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj phogat <anuj.phogat@gmail.com>
9 years agoi965/emit: Do the sampler index adjustment directly in header.0.3
Jason Ekstrand [Thu, 22 Jan 2015 21:46:44 +0000 (13:46 -0800)]
i965/emit: Do the sampler index adjustment directly in header.0.3

Prior to this commit, the adjust_sampler_state_pointer function took an
extra register that it could use as scratch space.  The usual candidate was
the destination of the sampler instruction.  However, if that register ever
aliased anything important such as the sampler index, this would scratch
all over important data.  Fortunately, the calculation is such that we can
just do it in place and we don't need the scratch space at all.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years agost/nine: Correctly handle when ff vs should have no texture coord input/output
Axel Davy [Wed, 7 Jan 2015 09:27:23 +0000 (10:27 +0100)]
st/nine: Correctly handle when ff vs should have no texture coord input/output

The previous code semantics were:

. if ff ps will not run a ff stage, then do not output texture coords for this stage
for vs
. if XYZRHW is used (position_t), use only the mode where input coordinates are copied
to the outputs.

The problem arises when apps don't provide texture inputs. When apps specify PASSTHRU, it means
copy texture coord input to texture coord output if there is such an input. The case
where there is no texture coord input wasn't handled correctly.

Drivers like r300 dislike when vs has inputs that are not fed.

Moreover if the app uses ff vs with a programmable ps, we shouldn't look at
what are the parameters of the ff ps to decide to output or not texture
coordinates.

The new code semantics are:

. if XYZRHW is used, restrict to PASSTHRU
. if PASSTHRU is used and no texture input is declared, then do not output
texture coords for this stage

The case where ff ps needs a texture coord input and ff vs doesn't output
it is not handled, and should probably be a runtime error.

This fixes 3Dmark05, which uses ff vs with programmable ps.

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
9 years agost/nine: Change comment relating to vertex shader inputs not matching declaration
Axel Davy [Mon, 5 Jan 2015 15:26:27 +0000 (16:26 +0100)]
st/nine: Change comment relating to vertex shader inputs not matching declaration

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
9 years agost/nine: Allocate vs constbuf buffer for indirect addressing once.
Axel Davy [Sat, 3 Jan 2015 10:29:40 +0000 (11:29 +0100)]
st/nine: Allocate vs constbuf buffer for indirect addressing once.

When the shader does indirect addressing on the constants,
we allocate a temporary constant buffer to which we copy
the constants from the app given user constants and
the constants filled in the shader.

This patch makes this buffer be allocated once.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Allocate the correct size for the user constant buffer
Axel Davy [Fri, 2 Jan 2015 13:38:01 +0000 (14:38 +0100)]
st/nine: Allocate the correct size for the user constant buffer

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Add variables containing the size of the constant buffers
Axel Davy [Fri, 2 Jan 2015 13:22:17 +0000 (14:22 +0100)]
st/nine: Add variables containing the size of the constant buffers

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Fix sm3 relative addressing for non-debug build
Axel Davy [Sun, 7 Dec 2014 12:42:41 +0000 (13:42 +0100)]
st/nine: Fix sm3 relative addressing for non-debug build

Relative addressing needs the constant buffer to get all
the correct constants, even those defined by the shader.

The code to copy the shader constants to the constant buffer
was enabled only for debug build. Enable it always.

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
9 years agost/nine: Remove unused code for ps
Axel Davy [Sat, 6 Dec 2014 23:14:19 +0000 (00:14 +0100)]
st/nine: Remove unused code for ps

Since constant indirect addressing is not allowed for ps,
we can remove our code to handle that.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Correct rules for relative addressing and constants.
Axel Davy [Sat, 6 Dec 2014 21:26:50 +0000 (22:26 +0100)]
st/nine: Correct rules for relative addressing and constants.

Relative addressing for constants is possible only for vs float
constants.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Implement TEXREG2AR, TEXREG2GB and TEXREG2RGB
Axel Davy [Sun, 28 Dec 2014 13:56:02 +0000 (14:56 +0100)]
st/nine: Implement TEXREG2AR, TEXREG2GB and TEXREG2RGB

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Implement TEXDP3TEX
Axel Davy [Sun, 28 Dec 2014 13:46:01 +0000 (14:46 +0100)]
st/nine: Implement TEXDP3TEX

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Implement TEXDP3
Axel Davy [Sun, 28 Dec 2014 13:42:33 +0000 (14:42 +0100)]
st/nine: Implement TEXDP3

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Implement TEXDEPTH
Axel Davy [Sun, 28 Dec 2014 13:38:25 +0000 (14:38 +0100)]
st/nine: Implement TEXDEPTH

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Implement TEXM3x3SPEC
Axel Davy [Sun, 28 Dec 2014 13:26:12 +0000 (14:26 +0100)]
st/nine: Implement TEXM3x3SPEC

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Implement TEXM3x2TEX
Axel Davy [Sun, 28 Dec 2014 13:21:15 +0000 (14:21 +0100)]
st/nine: Implement TEXM3x2TEX

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: implement TEXM3x2DEPTH
Axel Davy [Sun, 28 Dec 2014 13:18:26 +0000 (14:18 +0100)]
st/nine: implement TEXM3x2DEPTH

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Fix TEXM3x3 and implement TEXM3x3VSPEC
Axel Davy [Sun, 28 Dec 2014 12:05:15 +0000 (13:05 +0100)]
st/nine: Fix TEXM3x3 and implement TEXM3x3VSPEC

The fix is that this line:
"src[s] = tx->regs.vT[s];" is wrong if s doesn't start from 0.
Instead access tx->regs.vT directly when needed.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Fill missing dst and src number for some instructions.
Axel Davy [Thu, 8 Jan 2015 21:21:20 +0000 (22:21 +0100)]
st/nine: Fill missing dst and src number for some instructions.

Not filling them correctly results in bad padding and later crash.

Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Implement TEXCOORD special behaviours
Axel Davy [Wed, 3 Dec 2014 15:28:56 +0000 (16:28 +0100)]
st/nine: Implement TEXCOORD special behaviours

texcoord for ps < 1_4 should clamp the values between 0 and 1.

texcrd (texcoord ps 1_4) does not clamp and can be used with
two modifiers, _dw and _dz, which mean the channels are divided
by w or z.
Implement those in shared code, since the same modifiers can be used
for texld ps 1_4.

v2: replace DIV by RCP + MUL
v3: Remove a useless MOV

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
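
A minimal sketch of the two TEXCOORD behaviours described in the commit above, on plain 4-tuples. The function names are hypothetical, and dividing all four channels is a simplification on my part; the message does not say which channels are affected.

```python
def texcoord_pre14(coord):
    """ps < 1_4 texcoord: clamp every channel to [0, 1]."""
    return tuple(min(max(c, 0.0), 1.0) for c in coord)

def texcrd_14(coord, modifier=None):
    """ps 1_4 texcrd: no clamp; _dw / _dz divide by w or z."""
    if modifier is None:
        return tuple(coord)
    x, y, z, w = coord
    if modifier == "_dw":
        divisor = w
    elif modifier == "_dz":
        divisor = z
    else:
        raise ValueError("unknown modifier: %r" % (modifier,))
    rcp = 1.0 / divisor                   # RCP
    return tuple(c * rcp for c in coord)  # MUL, instead of one DIV per channel
```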
9 years agost/nine: Fix CALLNZ implementation
Axel Davy [Fri, 26 Dec 2014 10:14:05 +0000 (11:14 +0100)]
st/nine: Fix CALLNZ implementation

Nothing seems to indicate the negation modifier would be stored in the
instruction flags instead of the source modifier. tx_src_param has
already handled it if it is in the source modifier.

In addition, when the card supports native integers, booleans
are stored in 32-bit ints and are equal to
0 or 0xFFFFFFFF.

Given that 0xFFFFFFFF is NaN when interpreted as a float, it is better
to use UIF than IF.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
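
The NaN remark in the CALLNZ commit above is easy to check by reinterpreting the bit pattern as an IEEE 754 single. The helper names are hypothetical, and the sketch says nothing about how a given driver evaluates IF on a NaN; it only shows why an integer test is the unambiguous choice.

```python
import math
import struct

def bits_as_float(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def uint_nonzero(bits):
    """Integer truth test, in the spirit of UIF: any non-zero bit pattern."""
    return (bits & 0xFFFFFFFF) != 0

TRUE_BITS = 0xFFFFFFFF
TRUE_AS_FLOAT_IS_NAN = math.isnan(bits_as_float(TRUE_BITS))
```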
9 years agost/nine: Fix some fixed function pipeline operation
Axel Davy [Thu, 25 Dec 2014 15:50:09 +0000 (16:50 +0100)]
st/nine: Fix some fixed function pipeline operation

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Clamp ps 1.X constants
Axel Davy [Wed, 24 Dec 2014 08:58:49 +0000 (09:58 +0100)]
st/nine: Clamp ps 1.X constants

This is wine (and windows) behaviour.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Remove duplicated code for ps texcoord input declaration
Axel Davy [Wed, 3 Dec 2014 14:58:34 +0000 (15:58 +0100)]
st/nine: Remove duplicated code for ps texcoord input declaration

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
9 years agost/nine: Fix CND implementation
Axel Davy [Thu, 25 Dec 2014 10:37:28 +0000 (11:37 +0100)]
st/nine: Fix CND implementation

Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Match REP implementation to LOOP
Axel Davy [Fri, 2 Jan 2015 13:57:00 +0000 (14:57 +0100)]
st/nine: Match REP implementation to LOOP

The previous implementation behaved correctly, but improve it by:
. Improving the documentation
. Decreasing the counter (comparing to 0 is likely to be faster than to a constant)
. Moving the counter update to the end, for better performance for shaders that
break out of the loop before the count is exhausted.

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
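
One way to picture the loop shape described in the REP commit above: a counter that counts down to zero, with the decrement placed after the body so an early break never pays for it. This is a sketch under that reading, not the generated shader code; `run_rep` and the `body` callback are hypothetical.

```python
def run_rep(count, body):
    """Run body up to count times and return how many times it ran.

    body(i) returns True to break out of the loop early.
    """
    counter = count
    executed = 0
    while True:
        if counter <= 0:        # compare against 0 rather than against count
            break
        executed += 1
        if body(executed - 1):  # an early break skips the counter update
            break
        counter -= 1            # counter update at the end of the iteration
    return executed
```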
9 years agost/nine: Rewrite LOOP implementation, and a0 aL handling
Axel Davy [Mon, 8 Dec 2014 14:38:28 +0000 (15:38 +0100)]
st/nine: Rewrite LOOP implementation, and a0 aL handling

Previous implementation didn't work well with nested loops.

Instead of using several address registers, put a0 and aL
into normal registers, and copy them to one address register when
we need to use them.

Wine tests loop_index_test() and nested_loop_test() now pass correctly.

Fixes r600g crash while loading Bioshock -
bug https://bugs.freedesktop.org/show_bug.cgi?id=85696

Tested-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years agost/nine: Correct LOG on negative values
Axel Davy [Wed, 3 Dec 2014 14:50:53 +0000 (15:50 +0100)]
st/nine: Correct LOG on negative values

We should take the absolute value of the input.

Also return -FLT_MAX instead of -Inf for an input of 0.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
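
A minimal sketch of the LOG behaviour described in the commit above, assuming the base-2 logarithm of the D3D9 `log` instruction. `nine_log` is a hypothetical name and `FLT_MAX` is spelled out as the largest finite single-precision value.

```python
import math

# Largest finite IEEE 754 single-precision value.
FLT_MAX = (2.0 - 2.0 ** -23) * 2.0 ** 127

def nine_log(x):
    """log2 of |x|, returning -FLT_MAX instead of -Inf for an input of 0."""
    if x == 0.0:
        return -FLT_MAX
    return math.log2(abs(x))
```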