mesa.git
Jason Ekstrand [Fri, 14 Nov 2014 03:07:22 +0000 (19:07 -0800)]
nir: Add a helper for rewriting an instruction source

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 14 Nov 2014 05:34:41 +0000 (21:34 -0800)]
i965/fs_nir: Properly saturate multiplies

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 14 Jan 2015 19:19:41 +0000 (11:19 -0800)]
nir/from_ssa: Don't lower constant SSA values to registers

Backends want to be able to do special things with constant values such as
put them into immediates or make decisions based on whether or not a value
is constant.  Before, constants always got lowered to a load_const into a
register and then a register use.  Now we leave constants as SSA values so
backends can special-case them if they want.  Since handling constant SSA
values is trivial, this shouldn't be a problem for backends.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 13 Nov 2014 00:24:21 +0000 (16:24 -0800)]
i965/fs_nir: Handle SSA constants

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 12 Nov 2014 19:05:51 +0000 (11:05 -0800)]
i965/fs_nir: Use an array rather than a hash table for register lookup

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 12 Nov 2014 00:12:32 +0000 (16:12 -0800)]
i965/fs_nir: Add the CSE pass and actually run in a loop

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 12 Nov 2014 00:11:34 +0000 (16:11 -0800)]
nir: Add a basic CSE pass

This pass is still fairly basic.  It only handles ALU operations, constant
loads, and phi nodes.  No texture ops or intrinsics yet.
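
The kind of elimination described above can be sketched on a flat list of ALU instructions. This is a minimal illustration, assuming a hypothetical `alu_sketch` record with two SSA sources; none of these names come from NIR, and phi nodes, constant loads, and commutative matching are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode and instruction types, for illustration only. */
enum op_sketch { OP_ADD_SKETCH, OP_MUL_SKETCH };

struct alu_sketch {
    enum op_sketch op;
    int def;       /* SSA index this instruction defines */
    int src[2];    /* SSA indices of the two operands */
    bool dead;     /* set once an earlier equivalent has replaced it */
};

/* Two instructions compute the same value if opcode and sources match. */
static bool same_expr_sketch(const struct alu_sketch *a,
                             const struct alu_sketch *b)
{
    return a->op == b->op &&
           a->src[0] == b->src[0] &&
           a->src[1] == b->src[1];
}

/* Returns the number of instructions eliminated. */
static int cse_block_sketch(struct alu_sketch *instrs, size_t count)
{
    int eliminated = 0;

    for (size_t i = 0; i < count; i++) {
        instrs[i].dead = false;
        for (size_t j = 0; j < i; j++) {
            if (instrs[j].dead || !same_expr_sketch(&instrs[i], &instrs[j]))
                continue;

            /* Point every later use of the duplicate at the original. */
            for (size_t k = i + 1; k < count; k++) {
                for (int s = 0; s < 2; s++) {
                    if (instrs[k].src[s] == instrs[i].def)
                        instrs[k].src[s] = instrs[j].def;
                }
            }
            instrs[i].dead = true;
            eliminated++;
            break;
        }
    }
    return eliminated;
}
```

Rewriting the uses as soon as a duplicate is found lets later instructions that only differed by that duplicate source be matched in the same walk.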

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 11 Nov 2014 20:16:55 +0000 (12:16 -0800)]
nir: Add a fused multiply-add peephole

Jason Ekstrand [Tue, 11 Nov 2014 00:00:03 +0000 (16:00 -0800)]
nir: Validate that the SSA def and register indices are unique

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 8 Nov 2014 00:07:22 +0000 (16:07 -0800)]
i965/fs_nir: Turn on the peephole select optimization

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 4 Nov 2014 18:12:14 +0000 (10:12 -0800)]
nir: Add a peephole select optimization

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 8 Nov 2014 03:35:23 +0000 (19:35 -0800)]
nir/nir: Patch up phi predecessors in move_successors

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 8 Nov 2014 02:27:36 +0000 (18:27 -0800)]
nir/nir: Use safe iterators when iterating over the CFG

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 8 Nov 2014 02:26:50 +0000 (18:26 -0800)]
glsl/list: Add a foreach_list_typed_safe_reverse macro

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 8 Nov 2014 02:25:08 +0000 (18:25 -0800)]
nir/nir: Fix a bug in move_successors

The unlink_blocks function moves successors around to make sure that, if
there is a remaining successor, it is in the first successors slot and not
the second.  To fix this, we simply get both successors up front.
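
The slot-shifting behaviour and the "get both successors up front" fix can be sketched as follows. This is a simplified model, assuming a block with exactly two successor slots; the struct and function names are hypothetical and this is not the NIR code. The message does not spell out the failure, but one plausible reading is that re-reading slot 1 after unlinking slot 0 finds it empty, because the unlink has already shifted the remaining successor down.

```c
#include <stddef.h>

/* Hypothetical, simplified block: at most two successors. */
struct block_sketch {
    struct block_sketch *successors[2];
};

/* Remove succ from pred, keeping any remaining successor in slot 0. */
static void unlink_sketch(struct block_sketch *pred, struct block_sketch *succ)
{
    if (pred->successors[0] == succ) {
        pred->successors[0] = pred->successors[1];
        pred->successors[1] = NULL;
    } else if (pred->successors[1] == succ) {
        pred->successors[1] = NULL;
    }
}

/* Add succ to the first free slot of pred. */
static void link_sketch(struct block_sketch *pred, struct block_sketch *succ)
{
    if (pred->successors[0] == NULL)
        pred->successors[0] = succ;
    else
        pred->successors[1] = succ;
}

/* Move all successors of source to dest. */
static void move_successors_sketch(struct block_sketch *source,
                                   struct block_sketch *dest)
{
    /* Read both slots before unlinking anything: unlink_sketch()
     * reshuffles the slots, so reading them lazily would miss one. */
    struct block_sketch *succ1 = source->successors[0];
    struct block_sketch *succ2 = source->successors[1];

    if (succ1 != NULL) {
        unlink_sketch(source, succ1);
        link_sketch(dest, succ1);
    }
    if (succ2 != NULL) {
        unlink_sketch(source, succ2);
        link_sketch(dest, succ2);
    }
}
```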

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 7 Nov 2014 19:03:12 +0000 (11:03 -0800)]
i965/fs_nir: Validate optimization passes

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 7 Nov 2014 18:59:16 +0000 (10:59 -0800)]
nir: Differentiate between signed and unsigned versions of find_msb

We also make the return types match GLSL.  The GLSL spec specifies that
findMSB and findLSB return a signed integer.  Previously, nir had them
return unsigned.  This updates nir's behavior to match what GLSL expects.

We also update the nir-to-fs generator to take the new instructions.  While
we're at it, we fix the case where the input to findMSB is zero.
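
The GLSL semantics referred to above can be sketched in plain C. This is an illustration of the specified behaviour only, not the NIR opcodes or the i965 code; the function names are hypothetical. Both variants return a signed value and yield -1 when there is no bit to report (an input of 0, or of 0 or -1 in the signed case).

```c
#include <stdint.h>

/* Unsigned findMSB: index of the highest set bit, or -1 if x == 0. */
static int ufind_msb_sketch(uint32_t x)
{
    int bit = -1;

    while (x != 0) {
        x >>= 1;
        bit++;
    }
    return bit;
}

/* Signed findMSB: for negative inputs the highest 0 bit is reported,
 * which is the highest set bit of the bitwise complement. */
static int ifind_msb_sketch(int32_t x)
{
    uint32_t bits = (uint32_t)x;

    if (x < 0)
        bits = ~bits;
    return ufind_msb_sketch(bits);
}
```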

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 6 Nov 2014 19:18:42 +0000 (11:18 -0800)]
nir/print: Don't reindex things

These indices should now be reasonably stable/consistent.  Redoing the
indices in the print functions makes it harder to debug problems.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 5 Nov 2014 21:58:42 +0000 (13:58 -0800)]
nir: Validate all lists in the validator

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 11 Nov 2014 18:12:24 +0000 (10:12 -0800)]
glsl/list: Fix the exec_list_validate function

Some time while refactoring things to make it look nicer before pushing to
master, I completely broke the function.  This fixes it to be correct.
Just goes to show you why you shouldn't push code that has no users yet...

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Fri, 12 Dec 2014 21:05:25 +0000 (13:05 -0800)]
i965/fs_nir: Do retyping for ALU sources in get_nir_alu_src

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 31 Oct 2014 18:17:09 +0000 (11:17 -0700)]
nir: Add a better out-of-SSA pass

This commit rewrites the out-of-SSA pass to not be nearly as naive.  It's
based on "Revisiting Out-of-SSA Translation for Correctness, Code Quality,
and Efficiency" by Boissinot et al.  It should be fairly close to
state of the art.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 12 Dec 2014 20:52:11 +0000 (12:52 -0800)]
nir: Add a function for comparing two sources

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 31 Oct 2014 04:04:15 +0000 (21:04 -0700)]
nir: Add a parallel copy instruction type

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 5 Nov 2014 01:18:48 +0000 (17:18 -0800)]
nir: Add a function for rewriting all the uses of a SSA def

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 4 Nov 2014 19:02:09 +0000 (11:02 -0800)]
nir: Automatically handle SSA uses when an instruction is inserted

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 4 Nov 2014 18:40:48 +0000 (10:40 -0800)]
nir: Add an initialization function for SSA definitions

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 29 Oct 2014 21:17:17 +0000 (14:17 -0700)]
nir: Add an SSA-based liveness analysis pass.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Fri, 31 Oct 2014 04:18:22 +0000 (21:18 -0700)]
nir: set reg_alloc and ssa_alloc when indexing registers and SSA values

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 29 Oct 2014 23:25:51 +0000 (16:25 -0700)]
nir: Add a function to detect if a block is immediately followed by an if

Since we don't actually have an "if" instruction, this is a very common
pattern when iterating over instructions.  This adds a helper function for
it to make things a little less painful.
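
A helper of this shape might look like the sketch below. It assumes a hypothetical control-flow node list in which blocks, ifs, and loops are siblings linked by a `next` pointer; the type and function names are invented for illustration and are not the NIR definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical control-flow node kinds. */
enum cf_type_sketch { CF_BLOCK_SKETCH, CF_IF_SKETCH, CF_LOOP_SKETCH };

/* Hypothetical node in a singly linked list of control-flow siblings. */
struct cf_node_sketch {
    enum cf_type_sketch type;
    struct cf_node_sketch *next;
};

/* True if node is a block whose next sibling is an if. */
static bool block_followed_by_if_sketch(const struct cf_node_sketch *node)
{
    return node->type == CF_BLOCK_SKETCH &&
           node->next != NULL &&
           node->next->type == CF_IF_SKETCH;
}
```

Because the if is a sibling node rather than an instruction, a pass that walks the instructions of a block needs a check like this to know that a condition is consumed right after the block ends.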

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 29 Oct 2014 21:16:54 +0000 (14:16 -0700)]
nir: Add a foreach_block_reverse function

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 29 Oct 2014 21:16:39 +0000 (14:16 -0700)]
nir/foreach_block: Return false if the callback on the last block fails

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 29 Oct 2014 19:42:54 +0000 (12:42 -0700)]
nir: Add a basic metadata management system

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 29 Oct 2014 19:42:33 +0000 (12:42 -0700)]
nir/lower_variables_scalar: Silence a compiler warning

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 22 Oct 2014 18:24:33 +0000 (11:24 -0700)]
i965/fs_nir: Convert the shader to/from SSA

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 22 Oct 2014 19:57:28 +0000 (12:57 -0700)]
nir: Add a lower_vec_to_movs pass

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 22 Oct 2014 18:22:53 +0000 (11:22 -0700)]
nir: Add a naive from-SSA pass

This pass is kind of stupidly implemented but it should be enough to get us
up and going.  We probably want something better that doesn't generate all
of the redundant moves eventually.  However, the i965 backend should be
able to handle the movs, so I'm not too worried about it in the short term.

Jason Ekstrand [Tue, 21 Oct 2014 01:07:28 +0000 (18:07 -0700)]
i965/fs_nir: Don't duplicate emit_general_interpolation

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 21 Oct 2014 01:05:36 +0000 (18:05 -0700)]
i965/fs: Don't take an ir_variable for emit_general_interpolation

Previously, emit_general_interpolation took an ir_variable and pulled the
information it needed from that.  This meant that in fs_fp, we were
constructing a dummy ir_variable just to pass into it.  This commit makes
emit_general_interpolation take only the information it needs and gets rid
of the fs_fp cruft.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Sat, 18 Oct 2014 00:11:34 +0000 (17:11 -0700)]
nir: Add intrinsics to do alternate interpolation on inputs

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 16 Oct 2014 23:53:03 +0000 (16:53 -0700)]
nir: Add NIR_TRUE and NIR_FALSE constants and use them for boolean immediates

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 16 Oct 2014 04:52:58 +0000 (21:52 -0700)]
i965/fs_nir: Add atomic counters support

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Thu, 16 Oct 2014 16:56:14 +0000 (09:56 -0700)]
nir/lower_atomics: Multiply array offsets by ATOMIC_COUNTER_SIZE

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 15 Oct 2014 21:44:00 +0000 (14:44 -0700)]
i965/fs_nir: Handle coarse/fine derivatives

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 15 Oct 2014 23:57:10 +0000 (16:57 -0700)]
nir/glsl: Add support for coarse and fine derivatives

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 15 Oct 2014 23:56:43 +0000 (16:56 -0700)]
nir: Add fine and coarse derivative opcodes

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 15 Oct 2014 23:19:26 +0000 (16:19 -0700)]
nir/glsl: Add support for saturate

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 15 Oct 2014 23:01:04 +0000 (16:01 -0700)]
i965/fs_nir: Add support for sample_pos and sample_id

Jason Ekstrand [Wed, 15 Oct 2014 22:36:43 +0000 (15:36 -0700)]
Fix up varying pull constants

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 15 Oct 2014 20:56:48 +0000 (13:56 -0700)]
Fix what I think are a few NIR typos

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 15 Oct 2014 22:25:10 +0000 (15:25 -0700)]
i965/fs_nir: Use the correct texture offset immediate

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 15 Oct 2014 19:18:25 +0000 (12:18 -0700)]
i965/fs_nir: Use the correct types for texture inputs

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 15 Oct 2014 17:41:04 +0000 (10:41 -0700)]
i965/fs_nir: Make the sampler register always unsigned

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Tue, 14 Oct 2014 23:40:04 +0000 (16:40 -0700)]
i965/fs: Only use nir for 8-wide non-fast-clear shaders.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Connor Abbott [Fri, 15 Aug 2014 17:32:07 +0000 (10:32 -0700)]
i965/fs: add a NIR frontend

This is similar to the GLSL IR frontend, except consuming NIR. This lets
us test NIR as part of an actual compiler.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   Make brw_fs_nir build again
   Only use NIR if INTEL_USE_NIR is set
   whitespace fixes

Connor Abbott [Fri, 15 Aug 2014 17:17:26 +0000 (10:17 -0700)]
i965/fs: Don't pass through the coordinate type

All we really need is the number of components.

Connor Abbott [Tue, 5 Aug 2014 18:02:02 +0000 (11:02 -0700)]
i965/fs: make emit_fragcoord_interpolation() not take an ir_variable

Connor Abbott [Thu, 24 Jul 2014 22:51:58 +0000 (15:51 -0700)]
nir: add an SSA-based dead code elimination pass

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes

Connor Abbott [Wed, 23 Jul 2014 18:19:50 +0000 (11:19 -0700)]
nir: add an SSA-based copy propagation pass

Connor Abbott [Tue, 22 Jul 2014 21:05:06 +0000 (14:05 -0700)]
nir: add a pass to convert to SSA

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes

Connor Abbott [Fri, 18 Jul 2014 23:13:11 +0000 (16:13 -0700)]
nir: calculate dominance information

Connor Abbott [Wed, 30 Jul 2014 19:08:13 +0000 (12:08 -0700)]
nir: add an optimization to turn global registers into local registers

After linking and inlining, this allows us to convert these registers
into SSA values and optimise more code.

Connor Abbott [Wed, 30 Jul 2014 21:43:26 +0000 (14:43 -0700)]
nir: add a pass to lower atomics

v2: Jason Ekstrand <jason.ekstrand@intel.com>
   whitespace fixes

Connor Abbott [Wed, 30 Jul 2014 19:07:45 +0000 (12:07 -0700)]
nir: add a pass to lower system value reads

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes

Connor Abbott [Wed, 30 Jul 2014 19:04:49 +0000 (12:04 -0700)]
nir: add a pass to lower sampler instructions

Connor Abbott [Wed, 30 Jul 2014 18:56:52 +0000 (11:56 -0700)]
nir: add a pass to remove unused variables

After we lower variables, we want to delete them in order to free up
some memory.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
    whitespace fixes

Connor Abbott [Tue, 5 Aug 2014 17:54:27 +0000 (10:54 -0700)]
nir: keep track of the number of input, output, and uniform slots

Connor Abbott [Thu, 17 Jul 2014 16:12:52 +0000 (09:12 -0700)]
nir: add a pass to lower variables for scalar backends

Connor Abbott [Fri, 11 Jul 2014 01:18:17 +0000 (18:18 -0700)]
nir: add a glsl-to-nir pass

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   Make glsl_to_nir build again
   fix whitespace

Connor Abbott [Wed, 30 Jul 2014 22:20:53 +0000 (15:20 -0700)]
nir: add a validation pass

This is similar to ir_validate.cpp.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes

Connor Abbott [Wed, 30 Jul 2014 22:29:27 +0000 (15:29 -0700)]
nir: add a printer

This is similar to ir_print_visitor.cpp.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace fixes

Jason Ekstrand [Thu, 18 Dec 2014 01:30:27 +0000 (17:30 -0800)]
SQUASH: Fix comments from eric

Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Wed, 29 Oct 2014 21:15:13 +0000 (14:15 -0700)]
SQUASH: Add an assert

Connor Abbott [Thu, 31 Jul 2014 23:16:23 +0000 (16:16 -0700)]
nir: add core helper functions

These include functions for adding and removing various bits of IR and
helpers for iterating over all the sources and destinations of an
instruction. This is similar to ir.cpp.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   whitespace and automake fixes

Jason Ekstrand [Wed, 26 Nov 2014 23:08:19 +0000 (15:08 -0800)]
SQUASH: Use the enum for the variable mode

Connor Abbott [Thu, 31 Jul 2014 23:14:51 +0000 (16:14 -0700)]
nir: add the core datastructures

This includes all the instructions, ifs, loops, functions, etc. This is
similar to the information in ir.h.

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
   Include ralloc and hash_table from the util directory
   whitespace fixes

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>

Connor Abbott [Wed, 30 Jul 2014 22:33:32 +0000 (15:33 -0700)]
nir: add a simple C wrapper around glsl_types.h

v2: Jason Ekstrand <jason.ekstrand@intel.com>:
    whitespace and automake fixes

Reviewed-by: Eric Anholt <eric@anholt.net>
Connor Abbott [Wed, 30 Jul 2014 22:32:21 +0000 (15:32 -0700)]
nir: add initial README

Reviewed-by: Eric Anholt <eric@anholt.net>
Connor Abbott [Tue, 22 Jul 2014 00:11:53 +0000 (17:11 -0700)]
exec_list: add a list_foreach_typed_reverse() macro

Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Anholt [Tue, 13 Jan 2015 22:23:43 +0000 (11:23 +1300)]
vc4: Add some dumping for STORE_TILE_BUFFER_GENERAL.

Eric Anholt [Tue, 13 Jan 2015 21:53:20 +0000 (10:53 +1300)]
vc4: Add dumping for the TILE_RENDERING_MODE_CONFIG packet.

I wanted to read it, so I wrote parsing.

Eric Anholt [Tue, 13 Jan 2015 21:06:02 +0000 (10:06 +1300)]
vc4: Fix CL dumping trying to dump too far.

Execution will end at the cl->next, because that's what ct0ea/ct1ea get
programmed to.

Eric Anholt [Tue, 13 Jan 2015 03:43:16 +0000 (16:43 +1300)]
vc4: Fix texture type masking.

Everything from ETC1 to RGBA64 was getting its top bit dropped, but we
didn't use any of those formats.

Eric Anholt [Mon, 12 Jan 2015 01:53:48 +0000 (14:53 +1300)]
vc4: Colormask should apply after all other fragment ops (like logic op).

Theoretically it should apply after dithering as well, but dithering for
565 happens in fixed function in the TLB store.

Eric Anholt [Sun, 11 Jan 2015 20:14:41 +0000 (09:14 +1300)]
vc4: No turning unpack arguments into small immediates.

Since unpack only happens on things read from the A register file, we have
to leave them as something that can be allocated to A (temp or uniform).

Eric Anholt [Sun, 11 Jan 2015 20:10:35 +0000 (09:10 +1300)]
vc4: Move the tests for src needing to be an A register to vc4_qir.c.

I want it from another location.

Eric Anholt [Sun, 11 Jan 2015 20:16:26 +0000 (09:16 +1300)]
vc4: Don't swap the raddr on instructions doing unpacks.

It would mean different unpacking behavior, since only the A file does
unpack (with PM==0).

Eric Anholt [Sun, 11 Jan 2015 06:31:59 +0000 (19:31 +1300)]
vc4: Don't let pairing happen with badly mismatched unpack flags.

No difference on shader-db, but prevents definite regressions in the
blending changes.

Eric Anholt [Sun, 11 Jan 2015 05:27:07 +0000 (18:27 +1300)]
vc4: Don't let pairing happen with badly mismatched pack flags.

No difference on shader-db, but will become more important as I introduce
more use of pack flags with the blending changes.

Eric Anholt [Wed, 14 Jan 2015 04:11:59 +0000 (17:11 +1300)]
vc4: Fix early Z behavior on hardware.

It turns out the simulator was not treating this bit the same as the RPi,
and I'd forgotten to remove it when turning on early Z.  The result was
that you'd get big chunks of your rendering missing.

Michel Dänzer [Tue, 13 Jan 2015 07:38:52 +0000 (16:38 +0900)]
Revert "radeonsi: only set BC_OPTIMIZE_DISABLE when necessary"

This reverts commit 0543630d0b0d9d9f6eefbc14fbd3385d4de37ba0.

It caused flickering artifacts in Steam games such as Team Fortress 2 or
Left 4 Dead 2.

We could probably only enable this optimization by also making sure the
shader code only uses either SI_PARAM_LINEAR_CENTROID or
SI_PARAM_LINEAR_CENTER, not both. This would probably require a shader
variant.

Sorry I didn't remember this when reviewing the reverted change.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Michel Dänzer [Thu, 15 Jan 2015 03:57:05 +0000 (12:57 +0900)]
st/clover: Adapt to TargetLibraryInfo.h move in LLVM SVN r226078

Trivial.

Ian Romanick [Fri, 7 Nov 2014 06:51:45 +0000 (22:51 -0800)]
mesa: Micro-optimize _mesa_is_valid_prim_mode

You would not believe the mess GCC 4.8.3 generated for the old
switch-statement.

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: Difference at 95.0% confidence -0.37374% +/- 0.184057% (n=40)
64-bit: Difference at 95.0% confidence 0.966722% +/- 0.338442% (n=40)

The regression on 32-bit is odd.  Callgrind says the caller,
_mesa_is_valid_prim_mode is faster.  Before it says 2,293,760
cycles, and after it says 917,504.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Tue, 11 Nov 2014 10:29:34 +0000 (10:29 +0000)]
mesa: Check for vertex program the same way in desktop GL and ES

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Multithread:

32-bit: Difference at 95.0% confidence 0.416027% +/- 0.163529% (n=40)
64-bit: Difference at 95.0% confidence 0.494771% +/- 0.259985% (n=40)

Gl32Batch7 had no difference proven at 95.0% confidence (n=120) on
32-bit or 64-bit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Tue, 11 Nov 2014 09:21:40 +0000 (09:21 +0000)]
mesa: Drop index buffer bounds check

The previous check was insufficient (as it did not take 'indices' into
consideration), and DX10 hardware does not need this check anyway.

Since index_bytes is no longer used, remove it.

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: Difference at 95.0% confidence 1.66929% +/- 0.230107% (n=40)
64-bit: Difference at 95.0% confidence -1.40848% +/- 0.288038% (n=40)

The regression on 64-bit is odd.  Callgrind says the caller,
validate_DrawElements_common is faster.  Before it says 10,321,920
cycles, and after it says 8,945,664.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Tue, 11 Nov 2014 11:28:28 +0000 (11:28 +0000)]
mesa: Only check for a current vertex shader in core profile

This doesn't affect performance, but it feels more correct.

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: No difference proven at 95.0% confidence (n=120)
64-bit: No difference proven at 95.0% confidence (n=120)

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Tue, 11 Nov 2014 12:31:22 +0000 (12:31 +0000)]
mesa: Only validate shaders that can exist in the context

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: Difference at 95.0% confidence 0.495267% +/- 0.202063% (n=40)
64-bit: Difference at 95.0% confidence 3.57576% +/- 0.288175% (n=40)

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Tue, 11 Nov 2014 14:51:29 +0000 (14:51 +0000)]
i965: Store the atoms directly in the context

Instead of having an extra pointer indirection in one of the hottest
loops in the driver.
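
The difference between the two layouts can be sketched as below. This is a simplified model, assuming a hypothetical atom record with a dirty mask and an emit callback; the names and fields are invented for illustration and are not the i965 definitions. The atoms are copied into the context once, so the hot loop walks a contiguous array instead of chasing one pointer per atom.

```c
#include <stddef.h>

struct context_sketch;

/* Hypothetical state atom: which dirty bits it reacts to, and how. */
struct atom_sketch {
    unsigned dirty_mask;
    void (*emit)(struct context_sketch *ctx);
};

#define MAX_ATOMS_SKETCH 8

struct context_sketch {
    unsigned emit_count;                        /* bookkeeping for the example */
    size_t num_atoms;
    struct atom_sketch atoms[MAX_ATOMS_SKETCH]; /* copies, not pointers */
};

/* Example emit callback: just counts how often it runs. */
static void count_emit_sketch(struct context_sketch *ctx)
{
    ctx->emit_count++;
}

/* Copy each atom out of a pointer table into the context, once.
 * Assumes count <= MAX_ATOMS_SKETCH. */
static void init_atoms_sketch(struct context_sketch *ctx,
                              const struct atom_sketch *const *table,
                              size_t count)
{
    ctx->emit_count = 0;
    ctx->num_atoms = count;
    for (size_t i = 0; i < count; i++)
        ctx->atoms[i] = *table[i];
}

/* Hot loop: no per-atom pointer load, only array indexing. */
static void upload_state_sketch(struct context_sketch *ctx, unsigned dirty)
{
    for (size_t i = 0; i < ctx->num_atoms; i++) {
        const struct atom_sketch *atom = &ctx->atoms[i];

        if (atom->dirty_mask & dirty)
            atom->emit(ctx);
    }
}
```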

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)

v2 (Ken): Cut size of array from 64 to 57 to save memory.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Mon, 10 Nov 2014 14:06:47 +0000 (06:06 -0800)]
i965: Micro-optimize brw_get_index_type

With the switch-statement, GCC 4.8.3 produces a small pile of code with
a branch.

00000000 <brw_get_index_type>:
  000000:       8b 54 24 04             mov    0x4(%esp),%edx
  000004:       b8 01 00 00 00          mov    $0x1,%eax
  000009:       81 fa 03 14 00 00       cmp    $0x1403,%edx
  00000f:       74 0d                   je     00001e <brw_get_index_type+0x1e>
  000011:       31 c0                   xor    %eax,%eax
  000013:       81 fa 05 14 00 00       cmp    $0x1405,%edx
  000019:       0f 94 c0                sete   %al
  00001c:       01 c0                   add    %eax,%eax
  00001e:       c3                      ret

However, this could be two instructions.

00000000 <brw_get_index_type>:
  000000:       2d 01 14 00 00          sub    $0x1401,%eax
  000005:       d1 e8                   shr    %eax
  000007:       90                      nop
  000008:       90                      nop
  000009:       90                      nop
  00000a:       90                      nop
  00000b:       c3                      ret

The function was also moved to the header so that it could be inlined at
the two call sites.  Without this, 32-bit also needs to pull the
parameter from the stack.  This means there is a push, a call, a move,
and a ret added to a two instruction function.  The above code shows the
function with __attribute__((regparm(1))), but even this adds several
extra instructions.  There is also an extra instruction on 64-bit to
move the parameter to %eax for the subtract.
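
In C terms, the two listings correspond roughly to the sketch below. The return values 0, 1, and 2 are read off the disassembly, and 0x1401, 0x1403, and 0x1405 are the GL enum values for GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT, and GL_UNSIGNED_INT; the macro and function names are hypothetical, and this is not a copy of the driver source.

```c
/* GL enum values for the three legal index types. */
#define GL_UNSIGNED_BYTE_SKETCH  0x1401u
#define GL_UNSIGNED_SHORT_SKETCH 0x1403u
#define GL_UNSIGNED_INT_SKETCH   0x1405u

/* Switch-based form: byte, short, int map to 0, 1, 2. */
static unsigned index_type_switch_sketch(unsigned type)
{
    switch (type) {
    case GL_UNSIGNED_BYTE_SKETCH:  return 0;
    case GL_UNSIGNED_SHORT_SKETCH: return 1;
    case GL_UNSIGNED_INT_SKETCH:   return 2;
    default:                       return 0;
    }
}

/* Arithmetic form: the enums are spaced 2 apart, so subtracting the
 * base and halving gives the same 0, 1, 2 with no branch. */
static unsigned index_type_arith_sketch(unsigned type)
{
    return (type - GL_UNSIGNED_BYTE_SKETCH) >> 1;
}
```

The arithmetic form agrees with the switch only for the three valid enums, so it relies on the type having been validated before the call.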

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: Difference at 95.0% confidence 0.818589% +/- 0.234661% (n=40)
64-bit: Difference at 95.0% confidence 0.54554% +/- 0.354092% (n=40)

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Tue, 11 Nov 2014 14:14:14 +0000 (14:14 +0000)]
meta: Put _mesa_meta_in_progress in the header file

...so that it can be inlined in the two places that call it.

On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
Gl32Batch7:

32-bit: No difference proven at 95.0% confidence (n=120)
64-bit: Difference at 95.0% confidence 1.24042% +/- 0.382277% (n=40)

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>