radeonsi: Enable VGPR spilling for all shader types v5

v2:
  - Only emit the SPI_TMPRING_SIZE write once per packet.
  - Use context global scratch buffer.

v3:
  - Patch shaders using WRITE_DATA packet instead of map/unmap.
  - Emit ICACHE_FLUSH, CS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH, and
    VS_PARTIAL_FLUSH when patching shaders.

v4:
  - Code cleanups.
  - Remove unnecessary multiplies.

v5:
  - Patch shaders in system memory and re-upload to vram.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi/compute: Allocate the scratch buffer during state creation

This moves scratch buffer allocation from si_launch_grid() to
si_create_compute_state(). This helps to reduce the overhead of
launching a kernel and also fixes a bug in the code that would cause
the scratch buffer to be too small if a kernel with smaller scratch size
was launched before a kernel with a larger scratch size.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: Add radeon_shader_binary member to struct si_shader

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi/compute: Rename si_compute::program to si_compute::shader

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

radeonsi: Avoid leaking memory when rebuilding shader states

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

nir/opcodes: Use a return type of tfloat for ldexp

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

Revert "util: Move the alternate fpclassify implementation to util"

This reverts commits d6eb572905e39c36168b8f5da240af961f9dde0a and
58e8468d113c7d3d4a59ea4a8d70fd45b78e85e6.

This is no longer necessary as we aren't using it in NIR anymore. Also, it
broke the build on some strange systems so let's put it back in querymatrix
where it came from.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88852

Acked-by: Matt Turner <mattst88@gmail.com>

Revert "nir/opcodes: Use fpclassify() instead of isnormal() for ldexp"

This reverts commit d7d340fb2f68c46bd5a0008ecf53c6693e29c916.

We have an isnormal() implementation available; the only problem was that
we had the wrong return type (fixed in a later patch).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806

Acked-by: Matt Turner <mattst88@gmail.com>

util: Predicate the fpclassify fallback on !defined(__cplusplus)

The problem is that the fallbacks we have at the moment don't work in C++.
While we could theoretically fix the fallbacks it would also raise the
issue of correctly detecting the fpclassify function. So, for now, we'll
just disable it until we actually have a C++ user.

Reported-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: Tom Stellard <thomas.stellard@amd.com>
Tested-by: EdB <edb+mesa@sigluy.net>

drirc: set allow_glsl_extension_directive_midshader for Dead Island.

Signed-off-by: Sven Arvidsson <sa@whiz.se>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87076
Signed-off-by: Marek Olšák <marek.olsak@amd.com>

nir/opcodes: Use fpclassify() instead of isnormal() for ldexp

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88806
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

util: Move the alternate fpclassify implementation to util

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

i965/tex: Don't create read-write textures with non-renderable formats

I haven't actually seen this bug in the wild, but it's possible that
someone could ask to do a S3TC PBO download or something. This protects us
from accidentally creating a render target with a compressed or otherwise
non-renderable format.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/gen8: Include the buffer offset when emitting renderbuffer relocs

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88792
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

mesa: improve error messaging for format CSV parser

Patch adds 2 error messages that point the user directly to a
misspelled or impossible swizzle field for a format.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

clover/llvm: Dump the OpenCL C code earlier.

[ Francisco Jerez: As discussed on the mailing list, this is intended
to produce more useful debug output in cases where the compilation
terminates unexpectedly. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>

clover/llvm: Move CLOVER_DEBUG stuff into anonymous namespace.

[ Francisco Jerez: As we're at it make debug_options[] local to its
only user and remove temporary. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>

r600g: add support for primitive id without geom shader (v2)

GLSL 1.50 specifies a fragment shader may have a primitive id
input without a geometry shader present.

On r600 hw there is a special GS scenario for this: you have
to enable GS_SCENARIO_A and pass the primitive id through
the vertex shader which operates in GS_A mode.

This is a first pass attempt at this, and passes the piglit
tests that test for this.

v1.1: clean up debug print + no need to assign
key value to setup output.
v2: add r600 support

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

r600g: move selecting the pixel shader earlier.

In order to detect that a pixel shader has a prim id
input when we have no geometry shader we need to reorder
the shader selection so the pixel shader is selected
first, then the vertex shader key can take into account
the primitive id input requirement and lack of geom shader.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

st/clover: Pass target instead of target.begin() to std::string()

Fixes reading beyond allocated memory:

==1936== Invalid read of size 1
==1936==    at 0x4C2C1B4: strlen (vg_replace_strmem.c:412)
==1936==    by 0x9E00C30: std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20)
==1936==    by 0x5B44FAE: clover::compile_program_llvm(clover::compat::string const&, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&, pipe_shader_ir, clover::compat::string const&, clover::compat::string const&, clover::compat::string&) (invocation.cpp:698)
==1936==    by 0x5B39A20: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==1936==    by 0x5B20152: clBuildProgram (program.cpp:182)
==1936==    by 0x400F41: main (hello_world.c:109)
==1936==  Address 0x56fee1f is 0 bytes after a block of size 15 alloc'd
==1936==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==1936==    by 0x5B398F0: alloc (compat.hpp:59)
==1936==    by 0x5B398F0: vector<std::basic_string<char> > (compat.hpp:98)
==1936==    by 0x5B398F0: string<std::basic_string<char> > (compat.hpp:327)
==1936==    by 0x5B398F0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==1936==    by 0x5B20152: clBuildProgram (program.cpp:182)
==1936==    by 0x400F41: main (hello_world.c:109)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>

r600g,radeonsi: Fix calculation of IR target cap string buffer size

Fixes writing beyond the allocated buffer:

==31855== Invalid write of size 1
==31855==    at 0x50AB2A9: vsprintf (iovsprintf.c:43)
==31855==    by 0x508F6F6: sprintf (sprintf.c:32)
==31855==    by 0xB59C7EC: r600_get_compute_param (r600_pipe_common.c:526)
==31855==    by 0x5B2B7DE: get_compute_param<char> (device.cpp:37)
==31855==    by 0x5B2B7DE: clover::device::ir_target() const (device.cpp:201)
==31855==    by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855==    by 0x5B20152: clBuildProgram (program.cpp:182)
==31855==    by 0x400F41: main (hello_world.c:109)
==31855==  Address 0x56fed5f is 0 bytes after a block of size 15 alloc'd
==31855==    at 0x4C29180: operator new(unsigned long) (vg_replace_malloc.c:324)
==31855==    by 0x5B2B7C2: allocate (new_allocator.h:104)
==31855==    by 0x5B2B7C2: allocate (alloc_traits.h:357)
==31855==    by 0x5B2B7C2: _M_allocate (stl_vector.h:170)
==31855==    by 0x5B2B7C2: _M_create_storage (stl_vector.h:185)
==31855==    by 0x5B2B7C2: _Vector_base (stl_vector.h:136)
==31855==    by 0x5B2B7C2: vector (stl_vector.h:278)
==31855==    by 0x5B2B7C2: get_compute_param<char> (device.cpp:35)
==31855==    by 0x5B2B7C2: clover::device::ir_target() const (device.cpp:201)
==31855==    by 0x5B398E0: clover::program::build(clover::ref_vector<clover::device> const&, char const*, clover::compat::vector<clover::compat::pair<clover::compat::string, clover::compat::string> > const&) (program.cpp:63)
==31855==    by 0x5B20152: clBuildProgram (program.cpp:182)
==31855==    by 0x400F41: main (hello_world.c:109)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
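
The overflow in the trace above is the usual off-by-one in a query-then-fill
string interface: the caller allocates exactly the size the query reports, so
that size has to include the terminating NUL. A minimal sketch of the pattern,
assuming a "<gpu>-<triple>" layout; ir_target_size() and ir_target_fill() are
hypothetical names for illustration, not the driver's functions:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical size query: bytes needed for "<gpu>-<triple>", counting the
 * separator and the terminating NUL that sprintf()/snprintf() writes. */
size_t ir_target_size(const char *gpu, const char *triple)
{
    return strlen(gpu) + 1 /* '-' */ + strlen(triple) + 1 /* '\0' */;
}

/* Hypothetical fill step: the caller passes a buffer of at least
 * ir_target_size() bytes. snprintf() bounds the write as a second guard. */
void ir_target_fill(char *buf, size_t size, const char *gpu, const char *triple)
{
    snprintf(buf, size, "%s-%s", gpu, triple);
}
```

Forgetting the final "+ 1" yields a buffer one byte too small, which would
match the "0 bytes after a block" report from valgrind.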

nir: fix a bug with constant folding non-per-component instructions

Before, we were only copying the first N channels, where N is the size
of the SSA destination, which is fine for per-component instructions,
but non-per-component instructions like fdot3 can have more source
components than destination components. Fix this using the helper
function introduced in the last patch.

v2: use new helper name

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>

nir: add a helper function for getting the number of source components

Unlike with non-SSA ALU instructions, where if they're per-component
you have to look at the writemask to know which source channels are
being used, SSA ALU instructions always have all the possible channels
enabled so we can just look at the number of components in the SSA
definition for per-component instructions to say how many source
components are being used.

v2: use new name nir_ssa_alu_instr_src_components()

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
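
The rule described above can be sketched as follows. This is a simplified
model, not the NIR code: struct alu_op_info and alu_src_components() are
hypothetical stand-ins, and the convention that a size of 0 means
"per-component" is an assumption of the sketch.

```c
/* Hypothetical, simplified stand-in for a per-opcode info entry. */
struct alu_op_info {
    unsigned num_inputs;
    unsigned output_size;     /* 0 = per-component */
    unsigned input_sizes[4];  /* 0 = per-component for that source */
};

/* Number of components read from source 'src' of an SSA ALU instruction
 * whose destination has 'dest_components' components. */
unsigned alu_src_components(const struct alu_op_info *info,
                            unsigned dest_components, unsigned src)
{
    /* Fixed-size sources (e.g. the vec3 inputs of a dot product) do not
     * depend on the destination size. */
    if (info->input_sizes[src] > 0)
        return info->input_sizes[src];

    /* Per-component sources use as many channels as the SSA destination. */
    return dest_components;
}
```

With an fdot3-like entry (two sources of size 3, one output component) the
helper reports 3 source components even though the destination has only 1,
which is the case the constant-folding fix needs.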

i965: Implement a tiled fast-path for glReadPixels and glGetTexImage

Added intel_readpixels_tiled_memcpy and intel_gettexsubimage_tiled_memcpy
functions. These are the fast paths for glReadPixels and glGetTexImage.

On chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts
amount of time spent in glReadPixels by more than half and reduces the time
of the entire test by 10%.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - Refactor to make the functions look more like the old
     intel_tex_subimage_tiled_memcpy
   - Don't export the readpixels_tiled_memcpy function
   - Fix some pointer arithmetic bugs in partial image downloads (using
     ReadPixels with a non-zero x or y offset)
   - Fix a bug when ReadPixels is performed on an FBO wrapping a texture
     miplevel other than zero.

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Better documentation for the *_tiled_memcpy functions
   - Add target restrictions for renderbuffers wrapping textures

v4: Jason Ekstrand <jason.ekstrand@intel.com>
   - Only check the return value of brw_bo_map for error and not bo->virtual

v5: Jason Ekstrand <jason.ekstrand@intel.com>
   - Don't unnecessarily repeat a comment

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>

i965/tiled_memcpy: Add tiled-to-linear paths

This commit adds tiled copy functions for copying from tiled memory to
linear memory.  These are very similar to the existing linear-to-tiled
paths.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - New commit message
   - Various whitespace fixes
   - Added ptrdiff_t casts as done in commit 225a09790

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Fixed a comment

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
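
A tiled-to-linear copy can be sketched for the X-tiled case as below. The
sketch assumes an X tile of 512 bytes by 8 rows (4096 bytes), tiles stored
row-major, a source pitch that is a multiple of the tile width, and no address
swizzling. The names are hypothetical, and the real functions are considerably
more optimized.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    XTILE_WIDTH  = 512,  /* bytes per tile row (assumed) */
    XTILE_HEIGHT = 8,    /* rows per tile (assumed) */
    XTILE_SIZE   = XTILE_WIDTH * XTILE_HEIGHT
};

/* Byte offset of linear coordinate (x, y) inside an X-tiled surface. */
size_t xtiled_offset(size_t x, size_t y, size_t pitch)
{
    size_t tiles_per_row = pitch / XTILE_WIDTH;
    size_t tile = (y / XTILE_HEIGHT) * tiles_per_row + x / XTILE_WIDTH;

    return tile * XTILE_SIZE + (y % XTILE_HEIGHT) * XTILE_WIDTH +
           x % XTILE_WIDTH;
}

/* Copy a width x height byte rectangle starting at (0, 0) from tiled memory
 * to linear memory, one contiguous in-tile span at a time. */
void xtiled_to_linear_sketch(uint8_t *dst, const uint8_t *src,
                             size_t width, size_t height,
                             size_t dst_pitch, size_t src_pitch)
{
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; ) {
            /* Bytes left in the current tile row, clamped to the image. */
            size_t span = XTILE_WIDTH - x % XTILE_WIDTH;
            if (span > width - x)
                span = width - x;

            memcpy(dst + y * dst_pitch + x,
                   src + xtiled_offset(x, y, src_pitch), span);
            x += span;
        }
    }
}
```

The linear-to-tiled direction is the same loop with the memcpy arguments
swapped, which is why the two sets of paths look so similar.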

i965: Refactor tiled memcpy functions and move them into their own file

This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
moves it into its own intel_tiled_memcpy files.  Also, xtile_copy and
ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
respectively.  The *_faster functions are similarly renamed.

There was also a bit of logic to select between the libc provided
memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle.
This was moved into an intel_get_memcpy function so that rgba8_copy can
live (and be inlined) in intel_tiled_memcpy.c.

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   - Better commit message
   - Fix up the copyright on the intel_tiled_memcpy files
   - Various whitespace fixes
   - Moved a bunch of stuff that did not need to be exposed from
     intel_tiled_memcpy.h to intel_tiled_memcpy.c
   - Added proper documentation for intel_get_memcpy
   - Incorporated the ptrdiff_t tweaks from commit 225a09790

v3: Jason Ekstrand <jason.ekstrand@intel.com>
   - Fixed a comment
   - Move the tile size constants into the .c file

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>
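
The copy-function selection described above can be sketched like this. The
names select_copy_fn() and rgba8_swizzle_copy() are hypothetical, and the
boolean parameter stands in for whatever format check the real
intel_get_memcpy performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as memcpy() so either function can be used by the tile loops. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t bytes);

/* Copy 4-byte pixels while swapping the red and blue channels
 * (RGBA -> BGRA). Trailing bytes that do not form a full pixel are skipped. */
void *rgba8_swizzle_copy(void *dst, const void *src, size_t bytes)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    for (size_t i = 0; i + 4 <= bytes; i += 4) {
        d[i + 0] = s[i + 2];
        d[i + 1] = s[i + 1];
        d[i + 2] = s[i + 0];
        d[i + 3] = s[i + 3];
    }
    return dst;
}

/* Pick the copy routine once, outside the per-tile loops. */
copy_fn select_copy_fn(bool swap_red_blue)
{
    return swap_red_blue ? rgba8_swizzle_copy : memcpy;
}
```

Choosing the function pointer up front keeps the format decision out of the
inner copy loops.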

i965/tex_subimage: Use the fast tiled path for rectangle textures

There's no reason why we should be doing this for 2D textures and not
rectangles. Just a matter of adding another hunk to the condition.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Chad Versace <chad.versace@intel.com>

mesa/autoconf: attempt to use gnu99 on older gcc compilers

anonymous structs/union don't work with c99 but do work with gnu99
on gcc 4.4.

Signed-off-by: Dave Airlie <airlied@redhat.com>

mesa: simplify detection of fpclassify

Fixes compilation with musl libc.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

nir/opcodes: Don't go through doubles when constant-folding iabs

Previously, we called the abs() function in math.h. However, this involves
unnecessarily going through double. This commit changes it to use integers
directly with a ternary.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
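
A minimal sketch of the integer-only form, assuming a 32-bit signed source;
fold_iabs() is a hypothetical name, and the exact expression in the opcode
table may differ:

```c
#include <stdint.h>

/* Constant-fold iabs with integer arithmetic only: no round trip through
 * a floating-point type. INT32_MIN is outside the scope of this sketch,
 * since negating it overflows. */
int32_t fold_iabs(int32_t src0)
{
    return (src0 < 0) ? -src0 : src0;
}
```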

nir/opcodes: Simplify and fix the unpack_half_*_split_* constant expressions

Previously, these functions were explicitly writing to dst.x and dst.y.
However they both return only one component so writing to dst.y is invalid.
Also, since they only return one component, we don't need the explicit
assignment in the expression and can simplify it to use an implicit
assignment.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir: Use pointers for nir_src_copy and nir_dest_copy

This avoids the overhead of copying structures and better matches the newly
added nir_alu_src_copy and nir_alu_dest_copy.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

i965: Handle CMP.nz ... 0 and MOV.nz similarly in cmod propagation.

"MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions.

Previously, we deleted MOV.nz instructions when the instruction
generating the MOV's source also wrote the flag register (as the flag
register already contains the desired value).  However, we wouldn't
delete CMP.nz instructions that served the same purpose.

We also didn't attempt true cmod propagation on MOV.nz instructions,
while we would for the equivalent CMP.nz form.

This patch fixes both limitations, treating both forms equally.
CMP.nz instructions will now be deleted (helping the NIR backend),
and MOV.nz instructions will have their .nz propagated.

No changes in shader-db without NIR.  With NIR,

total instructions in shared programs: 6006153 -> 5969364 (-0.61%)
instructions in affected programs:     2087139 -> 2050350 (-1.76%)
helped:                                10704
HURT:                                  0
GAINED:                                2
LOST:                                  2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

clover: Fix build with llvm after r226981

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88783
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>

configure: Link against all LLVM targets when building clover

Since 8e7df519bd8556591794b2de08a833a67e34d526, we initialise all targets in
clover. This fixes bug 85380.

v2: Mention correct bug in commit message

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>

nir/constant_folding: use the new constant folding infrastructure

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: add new constant folding infrastructure

Add a required field to the Opcode class, const_expr, that contains an
expression or statement that computes the result of the opcode given known
constant inputs. Then take those const_expr's and expand them into a function
that takes an opcode and an array of constant inputs and spits out the constant
result. This means that when adding opcodes, there's one less place to update,
and almost all the opcodes are self-documenting since the information on how to
compute the result is right next to the definition.

The helper functions in nir_constant_expressions.c were taken from
ir_constant_expressions.cpp.

v3 Jason Ekstrand <jason.ekstrand@iastate.edu>
- Use mako to generate one function per opcode instead of doing piles of
string splicing

v4 Jason Ekstrand <jason.ekstrand@iastate.edu>
- More comments and better indentation in the mako
- Add a description of the constant expression language in nir_opcodes.py
- Added nir_constant_expressions.py to EXTRA_DIST in Makefile.am

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

nir: use Python to autogenerate opcode information

Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before. In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to keep information centralized as we
add new things like constant propagation that require per-opcode information.

v2:
- make Opcode derive from object (Dylan)
- don't use assert like it's a function (Dylan)
- style fixes for fnoise, use xrange (Dylan)
- use iterkeys() in nir_opcodes_h.py (Dylan)
- use pydoc-style comments (Jason)
- don't make fmin/fmax commutative and associative yet (Jason)

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
v3 Jason Ekstrand <jason.ekstrand@intel.com>
- Alphabetize source file lists
- Generate nir_opcodes.h in the builddir instead of the source dir
- Include $(builddir)/src/glsl/nir in the i965 build
- Rework nir_opcodes.h generation so it generates a complete header file
instead of one that has to be embedded inside an enum declaration

docs: add news item and link release notes for mesa 10.4.3

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>

docs: Add sha256 sums for the 10.4.3 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 49a5bce7801651574d3f4841d7532d8b2b86af63)

Add release notes for the 10.4.3 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit e92bfa3f9512832b61706b76b5c9f7afa78199b0)

i965: Convert CMP.GE -(abs)reg 0 -> CMP.Z reg 0.

total instructions in shared programs: 5952059 -> 5951603 (-0.01%)
instructions in affected programs:     138812 -> 138356 (-0.33%)
GAINED:                                1
LOST:                                  0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Add support for removing MOV.NZ instructions.

For some reason, we occasionally write the flag register with a MOV.NZ
instruction:

   add(8)          g25<1>F         -g6<0,1,0>F     g15<8,8,1>F
   cmp.l.f0(8)     g26<1>D         g25<8,8,1>F     0F
   mov.nz.f0(8)    null            g26<8,8,1>D

A MOV.NZ instruction on the result of a CMP is like comparing for
equality with true in C. It's useless. Removing it allows us to
generate:

   add.l.f0(8)     null            -g6<0,1,0>F     g15<8,8,1>F

total instructions in shared programs: 5955701 -> 5951657 (-0.07%)
instructions in affected programs:     302910 -> 298866 (-1.34%)
GAINED:                                1
LOST:                                  0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Allow flipping cond mod for negated arguments.

This allows us to apply the optimization in cases where the CMP's
argument is negated, by flipping the conditional mod. For example, it
allows us to optimize this:

   add(8)       temp   a      b
   cmp.l.f0(8)  null   -temp  0.0

into

   add.g.f0(8)  temp   a      b

total instructions in shared programs: 5958360 -> 5955701 (-0.04%)
instructions in affected programs:     466880 -> 464221 (-0.57%)
GAINED:                                0
LOST:                                  1

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
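
Flipping the conditional mod works because comparing -x against zero gives
the same answer as comparing x against zero with the inequality reversed. A
sketch of that mapping with a hypothetical enum and helpers (float sources
assumed; these are not the driver's types):

```c
#include <stdbool.h>

enum cond_mod { COND_Z, COND_NZ, COND_G, COND_GE, COND_L, COND_LE };

/* Conditional mod to use on x so that it matches 'c' applied to -x. */
enum cond_mod flip_cond_mod(enum cond_mod c)
{
    switch (c) {
    case COND_G:  return COND_L;
    case COND_GE: return COND_LE;
    case COND_L:  return COND_G;
    case COND_LE: return COND_GE;
    default:      return c;  /* Z and NZ are unaffected by negation */
    }
}

/* Evaluate a conditional mod as a comparison of v against zero. */
bool eval_cond_mod(enum cond_mod c, float v)
{
    switch (c) {
    case COND_Z:  return v == 0.0f;
    case COND_NZ: return v != 0.0f;
    case COND_G:  return v >  0.0f;
    case COND_GE: return v >= 0.0f;
    case COND_L:  return v <  0.0f;
    case COND_LE: return v <= 0.0f;
    }
    return false;
}
```

In the example from the commit message, cmp.l on -temp becomes .g on the add
that produces temp, which corresponds to flip_cond_mod(COND_L) == COND_G here.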

i965/fs: Propagate cmod across flag read if it contains the same value.

total instructions in shared programs: 5959463 -> 5958900 (-0.01%)
instructions in affected programs: 70031 -> 69468 (-0.80%)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Add unit tests for cmod propagation pass.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Add pass to propagate conditional modifiers.

total instructions in shared programs: 5974160 -> 5959463 (-0.25%)
instructions in affected programs:     1743737 -> 1729040 (-0.84%)
GAINED:                                0
LOST:                                  12

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Eliminate null-dst instructions without side-effects.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Apply conditional mod specially to split MAD/LRP.

Otherwise we'll apply the conditional mod to only one of the SIMD8
instructions and trigger an assertion.

NoDDClr/NoDDChk have the same problem but we never apply those to these
instructions, so I'm leaving them for a later time.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/fs: Add a pass to fixup 3-src instructions that have a null dest.

3-src instructions can only have GRF/MRF destinations. It's really
difficult to deal with that restriction in dead code elimination (that
wants to give instructions null destinations to show that their result
isn't used) while allowing 3-src instructions to have conditional mod,
so don't, and just give them a destination before register allocation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Add is_3src() to backend_instruction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Add backend_instruction::can_do_cmod().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965/cfg: Add a foreach_block_reverse macro.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

i965/cfg: Add a foreach_inst_in_block_reverse_safe macro.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

glsl: Add a foreach_in_list_reverse_safe macro.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

i965: Don't make instructions with a null dest a barrier to scheduling.

Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
integer multiplication pattern:

   mul  acc0, x, y
   mach null, x, y
   mov  dest, acc0

Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.

GAINED:                                103
LOST:                                  43

Causes one program to spill (643 -> 1076 instructions).

I committed this patch last year (commit 42a26cb5) but reverted it
(commit 0d3f83f4) after inexplicable artifacts in Kerbal Space Program
(bug 78648). Tapani reapplied this patch and could not reproduce the bug
with current Mesa.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

i965/fs: Allow SIMD16 on pre-SNB when try_replace_with_sel is successful

If try_replace_with_sel is able to replace the flow control with a SEL
instruction, then there is no flow control... failing SIMD16 because
of nonexistent flow control is wrong.

No piglit regressions on any i965 platform in Jenkins.

total instructions in shared programs: 4382707 -> 4382707 (0.00%)
instructions in affected programs:     0 -> 0
helped:                                0
HURT:                                  0
GAINED:                                2089
LOST:                                  0

No other platforms affected in shader-db.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

nir: Expose nir_print_instr() for debug prints

It's nice to have this present in your default cases so you can see what
instruction is triggering an abort.

v2: Just pass a NULL state, now that it won't crash when you do.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: When asked to print with a NULL state, just use bare variable names.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: Add nir_lower_alu_to_scalar.

This is the equivalent of brw_fs_channel_expressions.cpp, which I wanted
for vc4.

v2: Use the nir_src_for_ssa() helper, and another instance of
    nir_alu_src_copy().
v3: Drop the non-SSA support.  All intended callers will have SSA-only ALU
    ops.
v4: Use insert_before, drop stale bcsel/fcsel comment, drop now-unused
    unsupported() function, drop lower_context struct.
v5: Completely rename the pass to nir_lower_alu_to_scalar(), add an assert
    about weird input_sizes[].

Reviewed-by: Jason Ekstrand <jason.ekstrand@iastate.edu>

nir: Make some helpers for copying ALU src/dests.

There aren't many users yet, but I wanted to do this from my scalarizing
pass.

v2: Constify the src arguments.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

nir: Add algebraic optimizations for division and reciprocal.

These also exist in opt_algebraic.cpp.

total NIR instructions in shared programs: 2011430 -> 2011211 (-0.01%)
NIR instructions in affected programs:     42221 -> 42002 (-0.52%)
helped:                                    198

total i965 instructions in shared programs: 6020553 -> 6020116 (-0.01%)
i965 instructions in affected programs:     84322 -> 83885 (-0.52%)
helped:                                     394
HURT:                                       1 (by 1 instruction)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

nir: Add algebraic optimizations for exponential/logarithmic functions.

Most of these exist in the GLSL IR algebraic pass already.  However,
SSA allows us to find more instances of the patterns.

total NIR instructions in shared programs: 2015593 -> 2011430 (-0.21%)
NIR instructions in affected programs:     124189 -> 120026 (-3.35%)
helped:                                    604

total i965 instructions in shared programs: 6025505 -> 6018717 (-0.11%)
i965 instructions in affected programs:     261295 -> 254507 (-2.60%)
helped:                                     1295
HURT:                                       3

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

nir: Add algebraic optimizations for simplifying comparisons.

The first batch removes bonus fnot/inot operations, possibly allowing
other optimizations to better recognize patterns.

The next batch replaces a fadd and constant 0.0 with an fneg - negation
is usually free on GPUs, while addition is not.

total NIR instructions in shared programs: 2020814 -> 2015593 (-0.26%)
NIR instructions in affected programs:     411143 -> 405922 (-1.27%)
helped:                                    2233
HURT:                                      214

A few shaders are hurt by a few instructions due to moving neg such
that it has a constant operand, which is then folded, resulting in two
distinct load_consts for x and -x.  We can always clean that up later.

total i965 instructions in shared programs: 6035392 -> 6025505 (-0.16%)
i965 instructions in affected programs:     784980 -> 775093 (-1.26%)
helped:                                     4508
HURT:                                       2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

nir: Add algebraic optimizations for pointless shifts.

The GLSL IR optimization pass contained these; we may as well include
them too.

v2: Fix a >> 0 and a << 0 optimizations (caught by Matt).

No change in the number of NIR instructions on a shader-db run.

total i965 instructions in shared programs: 6035397 -> 6035392 (-0.00%)
i965 instructions in affected programs: 542 -> 537 (-0.92%)
helped: 2 (in glamor)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

nir: Add a bunch of algebraic optimizations on logic/bit operations.

Matt and I noticed a bunch of "val <- ior a a" operations in a shader,
so we decided to add an algebraic optimization for that.  While there,
I decided to add a bunch more of them.

v2: Delete bogus fand/for optimizations (caught by Jason).

total NIR instructions in shared programs: 2023511 -> 2020814 (-0.13%)
NIR instructions in affected programs:     149634 -> 146937 (-1.80%)
helped:                                    1032

total i965 instructions in shared programs: 6035392 -> 6035397 (0.00%)
i965 instructions in affected programs:     537 -> 542 (0.93%)
HURT:                                       2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
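
A toy model of this kind of rewrite, covering only the "ior a a" case named
above and the shift-by-zero cases from the shift commit in this log. The types
and the function are hypothetical; the real pass is table-driven and operates
on NIR SSA values.

```c
#include <stdint.h>

enum op { OP_IOR, OP_ISHL, OP_ISHR };

/* Hypothetical operand: an SSA value id, or a constant when is_const is set. */
struct operand {
    int is_const;
    uint32_t value;
};

struct alu {
    enum op op;
    struct operand a, b;
};

static int same_operand(struct operand x, struct operand y)
{
    return x.is_const == y.is_const && x.value == y.value;
}

/* Returns nonzero when the instruction can be replaced by its first source. */
int simplifies_to_src0(const struct alu *instr)
{
    switch (instr->op) {
    case OP_IOR:
        /* ior a a -> a */
        return same_operand(instr->a, instr->b);
    case OP_ISHL:
    case OP_ISHR:
        /* a << 0 -> a, a >> 0 -> a */
        return instr->b.is_const && instr->b.value == 0;
    }
    return 0;
}
```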

nir: Implement CSE on intrinsics that can be eliminated and reordered.

Matt and I noticed that one of the shaders hurt by INTEL_USE_NIR=1 had
load_input and load_uniform intrinsics repeated several times, with the
same parameters, but each one generating a distinct SSA value.  This
made ALU operations on those values appear distinct as well.

Generating distinct SSA values is silly - these are read only variables.
CSE'ing them makes everything use a single SSA value, which then allows
other operations to be CSE'd away as well.

Generalizing a bit, it seems like we should be able to safely CSE any
intrinsics that can be eliminated and reordered.  I didn't implement
support for variables for the time being.

v2: Assert that info->num_variables == 0 (requested by Jason).

total NIR instructions in shared programs: 2435936 -> 2023511 (-16.93%)
NIR instructions in affected programs:     2413496 -> 2001071 (-17.09%)
helped:                                    16872

total i965 instructions in shared programs: 6028987 -> 6008427 (-0.34%)
i965 instructions in affected programs:     640654 -> 620094 (-3.21%)
helped:                                     2071
HURT:                                       585
GAINED:                                     14
LOST:                                       25

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
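
A rough sketch of the idea described above: intrinsics that can be both
eliminated and reordered are keyed on their opcode and arguments, and later
duplicates are replaced by the first SSA value. The Intrinsic class, its
field names and the cse helper are hypothetical and much simpler than the
real NIR data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsic:
    dest: str                   # SSA value defined by this instruction
    op: str                     # e.g. "load_uniform"
    args: tuple                 # constant parameters / source values
    can_eliminate: bool = True
    can_reorder: bool = True


def cse(instrs):
    """Return (kept instructions, map from dropped dest to surviving dest)."""
    seen = {}
    kept = []
    rename = {}
    for ins in instrs:
        if ins.can_eliminate and ins.can_reorder:
            key = (ins.op, ins.args)
            if key in seen:
                # Same read-only load as before: reuse the earlier SSA value.
                rename[ins.dest] = seen[key]
                continue
            seen[key] = ins.dest
        kept.append(ins)
    return kept, rename
```

Once users of the dropped values are rewritten through the rename map, ALU
operations on them have identical sources and become CSE candidates too.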

nir: Pull nir_instr_can_cse()'s SSA checks out of the switch.

This should not be a change in behavior, as all current cases that
potentially answer "yes" require SSA.

The next patch will introduce another case that requires SSA.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/nir: Report NIR instruction counts (in SSA form) via KHR_debug.

This allows us to count NIR instructions via shader-db.

Use "run" as normal. The results file will contain both NIR and
assembly.

Then, to generate a NIR report:
./report.py <(grep NIR results/foo) <(grep NIR results/bar)

Or, to generate an i965 report:
./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
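
The summaries quoted throughout this log follow a "before -> after (change%)"
pattern. A minimal sketch of that arithmetic, assuming a plain relative
change printed with two decimals; change_line is a hypothetical helper and
not code taken from report.py.

```python
def change_line(label, before, after):
    """Format one summary line as 'label: before -> after (pct%)'."""
    pct = (after - before) / before * 100.0
    return f"{label}: {before} -> {after} ({pct:.2f}%)"
```

For example, change_line("instructions in affected programs", 542, 537)
gives "instructions in affected programs: 542 -> 537 (-0.92%)".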

i965/nir: Print NIR on INTEL_DEBUG=fs.

This is useful for debugging and looking for optimization opportunities.

It will need to be expanded when we add support for other scalar stages.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>

i965/nir: Do optimizations again just before lowering source mods.

We want to run CSE and algebraic optimizations again after lowering IO.
Some of the passes in the optimization loop don't handle saturates and
other modifiers, so run it before lowering to source modifiers.

total instructions in shared programs: 6046190 -> 6045768 (-0.01%)
instructions in affected programs:     22406 -> 21984 (-1.88%)
helped:                                47
HURT:                                  0
GAINED:                                0
LOST:                                  0

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

loader: Remove NEED_OPENGL_COMMON check.

HAVE_DRICOMMON is sufficient since OpenGL must be enabled for DRI.

gitignore: Ignore .tar.xz files.

mesa: Build with subdir-objects.

glsl: Build a libglsl_util library.

Build these files into a library rather than sourcing them with
../dir/file.c, which leads to distclean wiping out ../dir's .deps directory.

mapi: Build with subdir-objects.

mapi: Remove vgapi from SUBDIRS.

OpenVG is disabled via autotools.

mesa: Drop inclusion of glapi_gen.mk.

Some glapi headers used to be generated from this Makefile.am, but no
longer.

glsl: Build with subdir-objects.

Apparently $(top_srcdir) is not expanded in a source list when using
subdir-objects, so remove that. It's not clear to me why we were going
to such lengths to prefix each source file anyway.

nir: Add headers to distribution.

nir: Add nir_{opt_,}algebraic.py to distribution.

mesa: Add format_{un,}pack.py to distribution.

mesa: Remove pack_tmp.h from sources.

Missed in commit 3a4de321.

nir: add generated file to .gitignore

Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>

i965: Fix min_vs_entries for CHV

According to BSpec the correct number for min_vs_entries is 34 for CHV.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

i965: Fix max_wm_threads for CHV

Change max_wm_threads to match the spec on CHV. The max number of
threads in 3DSTATE_PS is always programmed to 64 and the hardware
internally scales that depending on the GT SKU. So this doesn't
change the max number of threads actually used, but it does affect
the scratch space calculation.

On CHV the old value was too small, so the amount of scratch space
allocated wasn't sufficient to satisfy the actual max number of
threads used.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
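
To illustrate why an undersized max_wm_threads matters, here is a sketch
that assumes scratch space is sized as per-thread scratch times the maximum
thread count. The helper names and the numbers in the usage note are
hypothetical and are not taken from the driver or from BSpec.

```python
def scratch_bytes(per_thread_bytes, max_threads):
    """Total scratch allocation if every thread gets its own slice."""
    return per_thread_bytes * max_threads


def is_sufficient(per_thread_bytes, assumed_max_threads, actual_max_threads):
    """True if the buffer sized from the assumed count covers the real count."""
    allocated = scratch_bytes(per_thread_bytes, assumed_max_threads)
    needed = scratch_bytes(per_thread_bytes, actual_max_threads)
    return allocated >= needed
```

With made-up values, is_sufficient(1024, 32, 64) is False: the buffer is
sized for 32 threads while 64 may run, which is the kind of shortfall the
commit describes.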

glsl: fix stale comment

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>

i965/emit: Assert that src1 is not an MRF after doing the MRF->GRF conversion

When emitting texturing from indirect texture units, we need to be able to
scratch around in the header message. Since we only do this for >= HSW,
this is ok since there are no MRFs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>

i965/emit: Do the sampler index adjustment directly in header.0.3

Prior to this commit, the adjust_sampler_state_pointer function took an
extra register that it could use as scratch space.  The usual candidate was
the destination of the sampler instruction.  However, if that register ever
aliased anything important such as the sampler index, this would scratch
all over important data.  Fortunately, the calculation is such that we can
just do it in place and we don't need the scratch space at all.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>

st/nine: Correctly handle when ff vs should have no texture coord input/output

The previous code semantics were:

. if the ff ps will not run a ff stage, then do not output texture coords for this stage
for vs
. if XYZRHW is used (position_t), use only the mode where input coordinates are copied
to the outputs.

The problem arises when apps don't provide texture inputs. When apps specify PASSTHRU, it means
copy the texture coord input to the texture coord output if there is such an input. The case
where there is no texture coord input wasn't handled correctly.

Drivers like r300 dislike it when the vs has inputs that are not fed.

Moreover, if the app uses ff vs with a programmable ps, we shouldn't look at
the parameters of the ff ps to decide whether or not to output texture
coordinates.

The new code semantics are:

. if XYZRHW is used, restrict to PASSTHRU
. if PASSTHRU is used and no texture input is declared, then do not output
texture coords for this stage

The case where ff ps needs a texture coord input and ff vs doesn't output
it is not handled, and should probably be a runtime error.

This fixes 3Dmark05, which uses ff vs with programmable ps.

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
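
The new semantics described above can be sketched as a small decision
function. The function name, the mode strings and the boolean parameters
are assumptions made for illustration; they do not mirror the st/nine
source.

```python
PASSTHRU = "passthru"


def texcoord_plan(requested_mode, uses_xyzrhw, has_texcoord_input):
    """Return (mode, emit_output) for one texture stage of the ff vs."""
    # With XYZRHW positions, only the copy-input-to-output mode is allowed.
    mode = PASSTHRU if uses_xyzrhw else requested_mode
    # PASSTHRU with no declared texture input has nothing to copy, so the
    # stage outputs no texture coords and the vs declares no unfed input.
    emit_output = not (mode == PASSTHRU and not has_texcoord_input)
    return mode, emit_output
```

Note that the decision depends only on the vs side, in line with the point
that ff ps parameters should not be consulted.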

st/nine: Change comment relating to vertex shader inputs not matching declaration

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>

st/nine: Allocate vs constbuf buffer for indirect addressing once.

When the shader does indirect addressing on the constants,
we allocate a temporary constant buffer into which we copy
the app-provided user constants and
the constants defined in the shader.

This patch allocates that buffer only once.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Tiziano Bacocco <tizbac2@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
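
A sketch of the allocate-once pattern this commit describes: the temporary
buffer is created lazily and reused, and only its contents are refreshed.
The VsConstCache class, its method and the byte-offset layout are
hypothetical; inputs are assumed to fit within the buffer.

```python
class VsConstCache:
    """Holds one reusable buffer for merged user and shader constants."""

    def __init__(self, size_bytes):
        self.size_bytes = size_bytes
        self._buf = None
        self.allocations = 0

    def merged(self, user_consts, shader_consts):
        """Copy user constants, then overlay shader-defined ones by offset."""
        if self._buf is None:
            # First use: allocate the temporary buffer a single time.
            self._buf = bytearray(self.size_bytes)
            self.allocations += 1
        self._buf[:len(user_consts)] = user_consts
        for offset, data in shader_consts.items():
            self._buf[offset:offset + len(data)] = data
        return self._buf
```

Repeated calls to merged() reuse the same bytearray, so the allocation
count stays at one no matter how often the constants are refreshed.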

st/nine: Allocate the correct size for the user constant buffer

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>

st/nine: Add variables containing the size of the constant buffers

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>

st/nine: Fix sm3 relative addressing for non-debug build

Relative addressing needs the constant buffer to get all
the correct constants, even those defined by the shader.

The code to copy the shader constants to the constant buffer
was enabled only for debug build. Enable it always.

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>

st/nine: Remove unused code for ps

Since constant indirect addressing is not allowed for ps,
we can remove our code to handle that.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>

st/nine: Correct rules for relative addressing and constants.

Relative addressing for constants is possible only for vs float
constants.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>

st/nine: Implement TEXREG2AR, TEXREG2GB and TEXREG2RGB

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>

st/nine: Implement TEXDP3TEX

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>

st/nine: Implement TEXDP3

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>