mesa.git
9 years agoradeonsi: use tgsi_shader_info in si_llvm_emit_vs_epilogue
Marek Olšák [Sat, 4 Oct 2014 20:33:36 +0000 (22:33 +0200)]
radeonsi: use tgsi_shader_info in si_llvm_emit_vs_epilogue

That code was really ugly.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: remove shader->input[] and output[] arrays and dependencies
Marek Olšák [Sat, 4 Oct 2014 20:17:25 +0000 (22:17 +0200)]
radeonsi: remove shader->input[] and output[] arrays and dependencies

They were reinventing tgsi_shader_info. They are unused now.

radeon_llvm_context::load_input can be NULL if input fetching is implemented
in some other way.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: move param_offset out of shader->input[] and output[]
Marek Olšák [Sat, 4 Oct 2014 20:15:07 +0000 (22:15 +0200)]
radeonsi: move param_offset out of shader->input[] and output[]

Those are going away.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info to get a list of GS outputs
Marek Olšák [Sat, 4 Oct 2014 20:07:50 +0000 (22:07 +0200)]
radeonsi: use tgsi_shader_info to get a list of GS outputs

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in si_update_spi_map
Marek Olšák [Sat, 4 Oct 2014 20:03:53 +0000 (22:03 +0200)]
radeonsi: use tgsi_shader_info in si_update_spi_map

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: simplify dereferences in si_update_spi_map
Marek Olšák [Sat, 4 Oct 2014 18:59:48 +0000 (20:59 +0200)]
radeonsi: simplify dereferences in si_update_spi_map

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in si_shader_vs
Marek Olšák [Sat, 4 Oct 2014 19:31:18 +0000 (21:31 +0200)]
radeonsi: use tgsi_shader_info in si_shader_vs

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in si_shader_ps
Marek Olšák [Sat, 4 Oct 2014 16:33:36 +0000 (18:33 +0200)]
radeonsi: use tgsi_shader_info in si_shader_ps

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in fetch_input_gs
Marek Olšák [Sat, 4 Oct 2014 15:40:39 +0000 (17:40 +0200)]
radeonsi: use tgsi_shader_info in fetch_input_gs

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: don't rely on shader->output in si_llvm_emit_fs_epilogue
Marek Olšák [Sat, 4 Oct 2014 20:09:16 +0000 (22:09 +0200)]
radeonsi: don't rely on shader->output in si_llvm_emit_fs_epilogue

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in si_llvm_emit_es_epilogue
Marek Olšák [Sat, 4 Oct 2014 15:04:05 +0000 (17:04 +0200)]
radeonsi: use tgsi_shader_info in si_llvm_emit_es_epilogue

tgsi_shader_info contains everything we need.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: don't recompile shaders when changing nr_cbufs from 0 to 1
Marek Olšák [Sat, 4 Oct 2014 18:44:23 +0000 (20:44 +0200)]
radeonsi: don't recompile shaders when changing nr_cbufs from 0 to 1

Both cases are equivalent.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: remove vs.ucps_enabled from the shader key
Marek Olšák [Sat, 4 Oct 2014 18:41:03 +0000 (20:41 +0200)]
radeonsi: remove vs.ucps_enabled from the shader key

Written CLIPDIST outputs are simply disabled in PA_CL_VS_OUT_CNTL.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: assume ClipDistance usage mask is always 0xf
Marek Olšák [Sat, 4 Oct 2014 17:09:09 +0000 (19:09 +0200)]
radeonsi: assume ClipDistance usage mask is always 0xf

No code in Mesa sets the usage mask to any other value.
The final mask is AND'ed with enable bits from the rasterizer state anyway.

If somebody implements setting usage masks in st/mesa, we can use
tgsi_shader_info to get it more easily.

This is a prerequisite for the following commit.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoclover: Fix unintended fall-through in kernel::argument::bind.
Francisco Jerez [Sun, 12 Oct 2014 08:32:48 +0000 (11:32 +0300)]
clover: Fix unintended fall-through in kernel::argument::bind.

9 years agoclover: Append implicit arguments to the kernel argument list.
Jan Vesely [Wed, 8 Oct 2014 14:43:01 +0000 (17:43 +0300)]
clover: Append implicit arguments to the kernel argument list.

[ Francisco Jerez: Split off from a larger patch, and take a slightly
  different approach for passing the implicit arguments around. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoclover: Pass execution dimensions and offset to the kernel as implicit arguments.
Francisco Jerez [Wed, 8 Oct 2014 14:39:35 +0000 (17:39 +0300)]
clover: Pass execution dimensions and offset to the kernel as implicit arguments.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
9 years agoclover: Add semantic information to module::argument for implicit parameter passing.
Francisco Jerez [Wed, 8 Oct 2014 14:32:18 +0000 (17:32 +0300)]
clover: Add semantic information to module::argument for implicit parameter passing.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
9 years agoclover: Use unreachable() from util/macros.h instead of assert(0).
Francisco Jerez [Wed, 8 Oct 2014 14:29:14 +0000 (17:29 +0300)]
clover: Use unreachable() from util/macros.h instead of assert(0).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agogallium: Add tokens for DragonFly BSD.
Vinson Lee [Tue, 16 Sep 2014 23:11:53 +0000 (16:11 -0700)]
gallium: Add tokens for DragonFly BSD.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Brian Paul <brianp@vmware.com>
9 years agoilo: disassemble compacted instructions
Chia-I Wu [Fri, 10 Oct 2014 19:24:48 +0000 (03:24 +0800)]
ilo: disassemble compacted instructions

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
9 years agoglsl: improve accuracy of atan()
Erik Faye-Lund [Fri, 26 Sep 2014 16:11:19 +0000 (18:11 +0200)]
glsl: improve accuracy of atan()

Our current atan()-approximation is pretty inaccurate at 1.0, so
let's try to improve the situation by doing a direct approximation
without going through atan.

This new implementation uses an 11th degree polynomial to approximate
atan in the [-1..1] range, and the following identity to reduce the
entire range to [-1..1]:

atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)

This range-reduction idea is taken from the paper "Fast computation
of Arctangent Functions for Embedded Applications: A Comparative
Analysis" (Ukil et al. 2011).

The polynomial that approximates atan(x) is:

x   * 0.9999793128310355 - x^3  * 0.3326756418091246 +
x^5 * 0.1938924977115610 - x^7  * 0.1173503194786851 +
x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444

This polynomial was found with the following GNU Octave script:

x = linspace(0, 1);
y = atan(x);
n = [1, 3, 5, 7, 9, 11];
format long;
polyfitc(x, y, n)

The polyfitc function is not built-in, but is too long to include here.
It can be downloaded from the following URL:

http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m
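
As a rough illustration of the range reduction and polynomial described
above, here is a minimal C sketch.  The names atan_poly, approx_atan and
HALF_PI are hypothetical and are not taken from the Mesa sources; only the
coefficients and the identity come from this commit message.

```c
/* Half of pi, spelled out so the sketch does not depend on M_PI. */
#define HALF_PI 1.5707963267948966

/* The 11th degree odd polynomial from the commit message, meant for
 * |x| <= 1.  Evaluated with Horner's scheme in x^2. */
static double atan_poly(double x)
{
   double x2 = x * x;

   return x * (0.9999793128310355 +
          x2 * (-0.3326756418091246 +
          x2 * (0.1938924977115610 +
          x2 * (-0.1173503194786851 +
          x2 * (0.0536813784310406 +
          x2 * -0.0121323213173444)))));
}

/* Hypothetical wrapper: use the polynomial directly inside [-1, 1], and
 * the identity atan(x) = 0.5 * pi * sign(x) - atan(1 / x) outside it. */
double approx_atan(double x)
{
   double sign = x < 0.0 ? -1.0 : 1.0;

   if (x * sign <= 1.0)
      return atan_poly(x);

   return HALF_PI * sign - atan_poly(1.0 / x);
}
```

At x = 1.0 the coefficients sum to roughly 0.785395, which is within a few
millionths of pi / 4.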

This fixes the following piglit test:
shaders/glsl-const-folding-01

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agovc4: Use the fnv1 hash function instead of gallium util's crc32.
Eric Anholt [Fri, 10 Oct 2014 11:56:45 +0000 (13:56 +0200)]
vc4: Use the fnv1 hash function instead of gallium util's crc32.

Improves simulated norast performance on a little benchmark by 13.4012%
+/- 2.08459% (n=13).
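
For readers unfamiliar with FNV-1, the standard 32-bit variant is sketched
below.  This is only an illustration of the hash itself: the function name
fnv1_hash is hypothetical, and the actual vc4 helper and the keys it hashes
are not shown in this message.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1: start from the offset basis, then for every byte
 * multiply by the FNV prime and XOR the byte in. */
uint32_t fnv1_hash(const void *data, size_t size)
{
   const uint8_t *bytes = data;
   uint32_t hash = 2166136261u;      /* FNV offset basis */

   for (size_t i = 0; i < size; i++) {
      hash *= 16777619u;             /* FNV prime */
      hash ^= bytes[i];
   }

   return hash;
}
```

The inner loop is one multiply and one XOR per byte, with no lookup table.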

9 years agovc4: Don't look up the compiled shaders unless state has changed.
Eric Anholt [Fri, 10 Oct 2014 12:17:15 +0000 (14:17 +0200)]
vc4: Don't look up the compiled shaders unless state has changed.

Improves simulated norast performance on a little benchmark by 38.0965%
+/- 3.27534% (n=11).

9 years agovc4: Actually clear the context's dirty flags.
Eric Anholt [Fri, 10 Oct 2014 12:24:06 +0000 (14:24 +0200)]
vc4: Actually clear the context's dirty flags.

I was trying to skip state updates when !dirty, and suspiciously
everything was always dirty.
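
The pattern this fix enables can be sketched as follows.  All names here
(struct context, DIRTY_SHADER, DIRTY_TEXTURE, draw) are hypothetical and
only stand in for the real vc4 state tracking, which is not shown in this
message.

```c
#include <stdint.h>

#define DIRTY_SHADER  (1u << 0)   /* hypothetical flag names */
#define DIRTY_TEXTURE (1u << 1)

struct context {
   uint32_t dirty;            /* bits set by state changes */
   unsigned shader_lookups;   /* counts the expensive lookups */
};

void draw(struct context *ctx)
{
   /* Only redo the expensive work when the relevant state changed. */
   if (ctx->dirty & DIRTY_SHADER)
      ctx->shader_lookups++;  /* stands in for a shader cache lookup */

   /* Without this, every later draw would still see everything dirty. */
   ctx->dirty = 0;
}

unsigned dirty_flags_demo(void)
{
   struct context ctx = { DIRTY_SHADER | DIRTY_TEXTURE, 0 };

   draw(&ctx);                 /* shader state dirty: one lookup */
   draw(&ctx);                 /* nothing dirty: lookup skipped */
   ctx.dirty |= DIRTY_TEXTURE;
   draw(&ctx);                 /* unrelated state dirty: lookup skipped */

   return ctx.shader_lookups;
}
```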

9 years agovc4: Optimize the other case of SEL_X_Y with a 0 -> SEL_X_0(a).
Eric Anholt [Thu, 9 Oct 2014 07:40:51 +0000 (09:40 +0200)]
vc4: Optimize the other case of SEL_X_Y with a 0 -> SEL_X_0(a).

Cleans up some output to be more obvious in a piglit test I'm looking at.

9 years agomesa: fix error reported on glTexSubImage2D when level not valid
Tapani Pälli [Tue, 7 Oct 2014 07:56:49 +0000 (10:56 +0300)]
mesa: fix error reported on glTexSubImage2D when level not valid

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
9 years agoi965: Fix register write checks.
Kenneth Graunke [Tue, 30 Sep 2014 00:00:51 +0000 (17:00 -0700)]
i965: Fix register write checks.

When mapping the buffer a second time, we need to use the new pointer,
not the one from the previous mapping.  Otherwise, we will most likely
crash.

Apparently, we've just been getting lucky and getting the same
bo->virtual pointer in both cases.  libdrm probably has a hand in that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
9 years agovc4: Optimize out adds of 0.
Eric Anholt [Thu, 9 Oct 2014 13:10:52 +0000 (15:10 +0200)]
vc4: Optimize out adds of 0.

9 years agovc4: Optimize fmul(x, 0) and fmul(x, 1).
Eric Anholt [Thu, 9 Oct 2014 13:02:00 +0000 (15:02 +0200)]
vc4: Optimize fmul(x, 0) and fmul(x, 1).

This was being generated frequently by matrix multiplies of 2 and
3-channel vertex attributes (which have the 0 or 1 loaded in the shader).
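
A minimal sketch of this kind of rewrite on a tiny, made-up IR is shown
below.  The types and names (struct inst, struct src, opt_fmul) are
hypothetical and do not match vc4's QIR.  Note that folding x * 0 to 0
ignores NaN and infinity inputs, which this sketch assumes is acceptable.

```c
#include <stdbool.h>

enum op { OP_MOV, OP_FMUL };   /* hypothetical tiny IR */

struct src {
   bool is_const;   /* true if this operand is a known constant */
   float value;     /* constant value, if is_const */
   int reg;         /* register number otherwise */
};

struct inst {
   enum op op;
   struct src src[2];
};

static bool src_is(const struct src *s, float v)
{
   return s->is_const && s->value == v;
}

/* Rewrite the instruction in place as a MOV of the given operand. */
static void to_mov(struct inst *inst, struct src s)
{
   inst->op = OP_MOV;
   inst->src[0] = s;
}

/* Returns true if the instruction was simplified. */
bool opt_fmul(struct inst *inst)
{
   if (inst->op != OP_FMUL)
      return false;

   for (int i = 0; i < 2; i++) {
      if (src_is(&inst->src[i], 0.0f)) {
         to_mov(inst, inst->src[i]);       /* fmul(x, 0) -> mov 0 */
         return true;
      }
      if (src_is(&inst->src[i], 1.0f)) {
         to_mov(inst, inst->src[1 - i]);   /* fmul(x, 1) -> mov x */
         return true;
      }
   }

   return false;
}

int fmul_demo(void)
{
   /* fmul(r3, 1.0) should become mov r3. */
   struct inst inst = { OP_FMUL, { { false, 0.0f, 3 }, { true, 1.0f, 0 } } };
   bool changed = opt_fmul(&inst);

   return changed && inst.op == OP_MOV &&
          !inst.src[0].is_const && inst.src[0].reg == 3;
}
```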

9 years agovc4: Factor out the turn-it-into-a-mov in opt_algebraic.
Eric Anholt [Thu, 9 Oct 2014 13:07:24 +0000 (15:07 +0200)]
vc4: Factor out the turn-it-into-a-mov in opt_algebraic.

This will be used more in the next commits.

9 years agovc4: Eliminate unused texture instructions.
Eric Anholt [Thu, 9 Oct 2014 12:42:14 +0000 (14:42 +0200)]
vc4: Eliminate unused texture instructions.

9 years agovc4: Dead code eliminate unused SF instructions.
Eric Anholt [Thu, 9 Oct 2014 12:45:14 +0000 (14:45 +0200)]
vc4: Dead code eliminate unused SF instructions.

9 years agovc4: Prevent copy propagating out the MOVs from r4.
Eric Anholt [Thu, 9 Oct 2014 14:36:45 +0000 (16:36 +0200)]
vc4: Prevent copy propagating out the MOVs from r4.

Copy propagating these might result in reading the r4 after some other
instruction has written r4.  Just prevent all copy propagation of this for
now.

Fixes bad rendering with upcoming indirect register access support, where
the copy propagation was consistently happening across another read.

9 years agovc4: Split the coordinate shader to its own vc4_compiled_shader.
Eric Anholt [Thu, 2 Oct 2014 16:50:44 +0000 (09:50 -0700)]
vc4: Split the coordinate shader to its own vc4_compiled_shader.

Merging VS and CS into the same struct wasn't winning us anything except
for not allocating a separate BO (but if we want to pack programs into
BOs, we should pack not just those 2 programs together).  What it was
getting us was a bunch of code duplication about hash table lookups and
propagating vc4_compile contents into a vc4_compiled_shader.

I was about to make the situation worse with indirect uniform buffer
access.

9 years agovc4: Add #defines for the texture uniform fields.
Eric Anholt [Thu, 2 Oct 2014 00:32:50 +0000 (17:32 -0700)]
vc4: Add #defines for the texture uniform fields.

I wanted to make another set of texture uploads for handling reladdr
constants, and duplicating all the bitshifting looked like a terrible
idea.  In the process, this fixes a swap of the s/t texture wrap modes.

9 years agovc4: Initialize undefined temporaries to 0.
Eric Anholt [Thu, 9 Oct 2014 15:49:23 +0000 (17:49 +0200)]
vc4: Initialize undefined temporaries to 0.

Under the simulator, reading registers before writing them triggers an
assertion failure.  c->undef gets treated as r0, which will usually be
written, but not if it's used in the first instruction.  We should
definitely not abort in this case, and should return some sort of undefined
value instead.

Fixes glsl-user-varying-ff.

9 years agoi965: Skip uploading border color when unnecessary.
Kenneth Graunke [Sat, 26 Jul 2014 08:16:27 +0000 (01:16 -0700)]
i965: Skip uploading border color when unnecessary.

The border color is only needed when using the GL_CLAMP_TO_BORDER or
(deprecated) GL_CLAMP wrap modes; all others ignore it, including the
common GL_CLAMP_TO_EDGE and GL_REPEAT wrap modes.

In those cases, we can skip uploading it entirely, saving a bit of space
in the batchbuffer.  Instead, we just point it at the start of the
batch (offset 0); we have to program something, and that address is safe
to read.
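
The decision described above can be sketched like this.  The enum and
function names are hypothetical stand-ins for the GL wrap mode enums and
the i965 sampler state code, which are not shown in this message.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GL wrap modes named above. */
enum wrap_mode {
   WRAP_REPEAT,
   WRAP_CLAMP_TO_EDGE,
   WRAP_CLAMP,             /* deprecated GL_CLAMP */
   WRAP_CLAMP_TO_BORDER,
};

static bool wrap_mode_needs_border_color(enum wrap_mode w)
{
   return w == WRAP_CLAMP || w == WRAP_CLAMP_TO_BORDER;
}

/* Returns the offset to program: the uploaded border color if any of the
 * three wrap modes can sample it, otherwise offset 0 (the start of the
 * batch), which is safe to read. */
unsigned border_color_offset(enum wrap_mode s, enum wrap_mode t,
                             enum wrap_mode r, unsigned uploaded_offset)
{
   if (wrap_mode_needs_border_color(s) ||
       wrap_mode_needs_border_color(t) ||
       wrap_mode_needs_border_color(r))
      return uploaded_offset;

   return 0;
}
```

In a real driver the upload itself would also be skipped in the second
case; the sketch only shows which offset ends up being programmed.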

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years agoi965: Use BDW_MOCS_PTE for renderbuffers.
Kenneth Graunke [Tue, 30 Sep 2014 08:15:56 +0000 (01:15 -0700)]
i965: Use BDW_MOCS_PTE for renderbuffers.

Write-back caching cannot be used for buffers being scanned out by the
display engine; surfaces used for scan-out must be write-through or
uncached.  I originally chose WT for render targets because it works in
all cases.  However, we really want to use write-back caching where
possible, as it is more efficient.

Most renderbuffers are not used for scanout - off-screen FBOs certainly
are fine, and non-pageflipped backbuffers should be fine as well.  So
in most cases WB will work.  However, we don't know what will be used
for scan-out, so we instead simply use the PTE value specified by the
kernel, as it knows these things.

This matches our MOCS choice on Haswell.

Fixes performance regressions since commit ee4484be3dc827cf15bcf109f5
in a microbenchmark (spotted by Eero Tamminen).  Improves performance
in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a
Broadwell GT2.  Improves performance in a bunch of other microbenchmarks
by ~15% or so.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
9 years agoi965: Add a BRW_MOCS_PTE #define.
Kenneth Graunke [Tue, 30 Sep 2014 08:15:55 +0000 (01:15 -0700)]
i965: Add a BRW_MOCS_PTE #define.

Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all
three caches (L3, LLC, and eLLC where available), but leaves the LLC
caching mode up to the kernel's page table entry.

This allows the kernel to pick WB/WT/UC based on whether it's using a
buffer for scanout.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
9 years agomesa: Make _mesa_print_arrays use stderr.
Kenneth Graunke [Sat, 27 Sep 2014 05:02:50 +0000 (22:02 -0700)]
mesa: Make _mesa_print_arrays use stderr.

These days, most driver debug output happens via stderr, not stdout.
Some applications (such as Xephyr) also appear to close stdout which
makes these messages go nowhere.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
9 years agor600g,radeonsi: Always use GTT again for PIPE_USAGE_STREAM buffers
Michel Dänzer [Tue, 26 Aug 2014 09:21:50 +0000 (18:21 +0900)]
r600g,radeonsi: Always use GTT again for PIPE_USAGE_STREAM buffers

Putting those in VRAM can cause long pauses due to buffers being moved
into / out of VRAM.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84662
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
9 years agovc4: Optimize SF(ITOF(x)) -> SF(x).
Eric Anholt [Thu, 9 Oct 2014 07:36:03 +0000 (09:36 +0200)]
vc4: Optimize SF(ITOF(x)) -> SF(x).

This is a common production of st_glsl_to_tgsi, because CMP takes a float
argument.

9 years agovc4: Add some optimization of FADD(FSUB(0, x)).
Eric Anholt [Thu, 9 Oct 2014 07:32:10 +0000 (09:32 +0200)]
vc4: Add some optimization of FADD(FSUB(0, x)).

This is a common production of st_glsl_to_tgsi, which uses negate flags on
source arguments to handle subtraction.

9 years agovc4: Mostly fix offset calculation for NPOT mipmap levels.
Eric Anholt [Mon, 6 Oct 2014 22:47:38 +0000 (15:47 -0700)]
vc4: Mostly fix offset calculation for NPOT mipmap levels.

The non-base NPOT levels are stored as POT-aligned images.  We get that
POT alignment by minifying the POT-aligned base level.

This means that level strides are also POT aligned, so we have to tell the
rendering mode config that our resource is larger than the actual
requested area.
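
As a sketch of the size rule described above (reading it as: the base level
keeps its requested size, every other level is the minified POT-aligned
base), with hypothetical helper names that are not the vc4 ones:

```c
#include <stdint.h>

/* Smallest power of two >= v.  Assumes 1 <= v <= 2^31. */
static uint32_t next_power_of_two(uint32_t v)
{
   uint32_t p = 1;

   while (p < v)
      p <<= 1;

   return p;
}

/* Halve once per mip level, never going below 1. */
static uint32_t minify(uint32_t v, unsigned level)
{
   uint32_t m = v >> level;

   return m ? m : 1;
}

/* Width in texels that a given level occupies in memory. */
uint32_t stored_level_width(uint32_t base_width, unsigned level)
{
   if (level == 0)
      return base_width;

   return minify(next_power_of_two(base_width), level);
}
```

For a 100 texel wide texture, level 1 is stored 64 wide (128 >> 1) rather
than the 50 that minifying 100 directly would give.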

Fixes the fbo-generatemipmap-formats NPOT cases.  Regresses
depthstencil-render-miplevels 273 * -- the texture presentation now works
(where it was completely broken before), but it looks like there's some
overflow of image bounds happening at the lower miplevels.

9 years agovc4: Move the mirrored kernel code to a kernel/ directory.
Eric Anholt [Tue, 30 Sep 2014 23:25:48 +0000 (16:25 -0700)]
vc4: Move the mirrored kernel code to a kernel/ directory.

Now this whole setup matches the kernel's file layout much more closely.

9 years agovc4: Enable LIT lowering in TGSI instead of our own code.
Eric Anholt [Tue, 30 Sep 2014 20:27:36 +0000 (13:27 -0700)]
vc4: Enable LIT lowering in TGSI instead of our own code.

This brings us the -128/128 clamping on the w component.

9 years agovc4: Fix scalar math opcodes to replicate their result from the X channel.
Eric Anholt [Wed, 8 Oct 2014 20:26:58 +0000 (22:26 +0200)]
vc4: Fix scalar math opcodes to replicate their result from the X channel.

Thanks to robclark for pointing out that I was probably failing to do this
when I reported a "bug" in his lowering code.

9 years agoilo: fix rectlist on GEN7+
Chia-I Wu [Wed, 8 Oct 2014 19:30:17 +0000 (03:30 +0800)]
ilo: fix rectlist on GEN7+

It was broken by 343b014b57ecc5431477e090100e6a26edbda540.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
9 years agovc4: Add support for two-sided color.
Eric Anholt [Tue, 30 Sep 2014 23:24:26 +0000 (16:24 -0700)]
vc4: Add support for two-sided color.

It's fairly easy, thanks to Rob Clark's lowering code.  Fixes
two-sided-lighting and 4 vertex-program-two-side testcases, while
regressing 8 testcases that involve enabling two-sided color while only
initializing one of the two colors in the VS.  If you're enabling two
sided color, it's of course expected that you really do set up both
colors, so this is still an improvement (and when we set up a linker for
TGSI, we'll hopefully fix those 8 fails).

9 years agovc4: Enable POW lowering in TGSI instead of our own code.
Eric Anholt [Tue, 30 Sep 2014 20:29:22 +0000 (13:29 -0700)]
vc4: Enable POW lowering in TGSI instead of our own code.

9 years agovc4: Enable DP lowering in TGSI instead of our own code.
Eric Anholt [Tue, 30 Sep 2014 20:26:06 +0000 (13:26 -0700)]
vc4: Enable DP lowering in TGSI instead of our own code.

9 years agovc4: Start using tgsi_lowering for opcodes we haven't supported before.
Eric Anholt [Tue, 30 Sep 2014 20:25:24 +0000 (13:25 -0700)]
vc4: Start using tgsi_lowering for opcodes we haven't supported before.

9 years agogallium: Rename freedreno parts of tgsi_lowering.[ch].
Eric Anholt [Tue, 30 Sep 2014 20:14:02 +0000 (13:14 -0700)]
gallium: Rename freedreno parts of tgsi_lowering.[ch].

Acked-by: Rob Clark <robclark@freedesktop.org>
9 years agogallium: Reformat tgsi_lowering.c for the normal style.
Eric Anholt [Tue, 30 Sep 2014 20:08:56 +0000 (13:08 -0700)]
gallium: Reformat tgsi_lowering.c for the normal style.

Acked-by: Rob Clark <robclark@freedesktop.org>
9 years agogallium: Copy fd_lowering.[ch] to tgsi_lowering.[ch] for code sharing.
Eric Anholt [Tue, 30 Sep 2014 20:07:23 +0000 (13:07 -0700)]
gallium: Copy fd_lowering.[ch] to tgsi_lowering.[ch] for code sharing.

Lots of drivers need to transform the weird instructions in TGSI into
reasonable scalar ops, and this code can make those translations
canonical.

Acked-by: Rob Clark <robclark@freedesktop.org>
9 years agovc4: Set unused raddr fields to QPU_R_NOP.
Eric Anholt [Fri, 3 Oct 2014 06:32:59 +0000 (23:32 -0700)]
vc4: Set unused raddr fields to QPU_R_NOP.

The simulator assertion fails if you have a write to a reg and then a read
(for example, in the NOP side of an instruction), even if the read isn't
used for anything.  By setting unused raddrs to NOP, we avoid the problem
(since only the physical registers are tracked).

9 years agovc4: Abstract out the field-merging logic for instructions.
Eric Anholt [Fri, 3 Oct 2014 06:22:03 +0000 (23:22 -0700)]
vc4: Abstract out the field-merging logic for instructions.

I'm going to be doing the same logic for some more fields next.

9 years agor600: Use DMA transfers in r600_copy_global_buffer
Niels Ole Salscheider [Mon, 8 Sep 2014 18:10:31 +0000 (20:10 +0200)]
r600: Use DMA transfers in r600_copy_global_buffer

v2: Do not demote items that are already in the pool

Signed-off-by: Niels Ole Salscheider <niels_ole@salscheider-online.de>
9 years agoglsl: Optimize min/max expression trees
Iago Toral Quiroga [Tue, 29 Jul 2014 09:36:31 +0000 (12:36 +0300)]
glsl: Optimize min/max expression trees

Original patch by Petri Latvala <petri.latvala@intel.com>:

Add an optimization pass that drops min/max expression operands that
can be proven to not contribute to the final result. The algorithm is
similar to alpha-beta pruning on a minimax search, from the field of
AI.

This optimization pass can optimize min/max expressions where operands
are min/max expressions. Such code can appear in shaders by itself, or
as the result of clamp() or AMD_shader_trinary_minmax functions.
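
To give a feel for the idea, here is a heavily simplified scalar sketch in
C.  It is not the Mesa pass: it works on a made-up expression type, handles
only scalar constants, and every name in it (struct expr, get_range, prune)
is hypothetical.  Each node gets a [low, high] range, and an operand is
dropped when the ranges prove it can never win.

```c
#include <float.h>
#include <stddef.h>

enum kind { K_CONST, K_VAR, K_MIN, K_MAX };

struct expr {
   enum kind kind;
   double value;          /* K_CONST only */
   struct expr *a, *b;    /* K_MIN / K_MAX only */
};

struct range { double low, high; };

static double dmin(double x, double y) { return x < y ? x : y; }
static double dmax(double x, double y) { return x > y ? x : y; }

/* Conservative bounds on the value of e; variables are unbounded. */
static struct range get_range(const struct expr *e)
{
   struct range r = { -DBL_MAX, DBL_MAX };

   if (e->kind == K_CONST) {
      r.low = r.high = e->value;
   } else if (e->kind == K_MIN || e->kind == K_MAX) {
      struct range ra = get_range(e->a), rb = get_range(e->b);

      if (e->kind == K_MIN) {
         r.low = dmin(ra.low, rb.low);
         r.high = dmin(ra.high, rb.high);
      } else {
         r.low = dmax(ra.low, rb.low);
         r.high = dmax(ra.high, rb.high);
      }
   }

   return r;
}

/* Returns e, or the one operand that always determines its result. */
struct expr *prune(struct expr *e)
{
   if (e->kind != K_MIN && e->kind != K_MAX)
      return e;

   e->a = prune(e->a);
   e->b = prune(e->b);

   struct range ra = get_range(e->a), rb = get_range(e->b);

   if (e->kind == K_MIN) {
      if (ra.high <= rb.low) return e->a;   /* a is never larger */
      if (rb.high <= ra.low) return e->b;
   } else {
      if (ra.low >= rb.high) return e->a;   /* a is never smaller */
      if (rb.low >= ra.high) return e->b;
   }

   return e;
}

int prune_demo(void)
{
   /* min(min(x, 2.0), 3.0) should reduce to min(x, 2.0). */
   struct expr x = { K_VAR, 0.0, NULL, NULL };
   struct expr two = { K_CONST, 2.0, NULL, NULL };
   struct expr three = { K_CONST, 3.0, NULL, NULL };
   struct expr inner = { K_MIN, 0.0, &x, &two };
   struct expr outer = { K_MIN, 0.0, &inner, &three };

   return prune(&outer) == &inner;
}
```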

This optimization pass improves the generated code for piglit's
AMD_shader_trinary_minmax tests as follows:

total instructions in shared programs: 75 -> 67 (-10.67%)
instructions in affected programs:     60 -> 52 (-13.33%)
GAINED:                                0
LOST:                                  0

All tests (max3, min3, mid3) improved.

A full shader-db run:

total instructions in shared programs: 4293603 -> 4293575 (-0.00%)
instructions in affected programs:     1188 -> 1160 (-2.36%)
GAINED:                                0
LOST:                                  0

Improvements happen in Guacamelee and Serious Sam 3. One shader from
Dungeon Defenders is hurt by shader-db metrics (26 -> 28), because of
dropping of a (constant float (0.00000)) operand, which was
compiled to a saturate modifier.

Version 2 by Iago Toral Quiroga <itoral@igalia.com>:

Changes from review feedback:
- Squashed various cosmetic changes sent by Matt Turner.
- Make less_all_components return an enum rather than setting a class member.
  (Suggested by Matt Turner). Also, renamed it to compare_components.
- Make less_all_components, smaller_constant and larger_constant static.
  (Suggested by Matt Turner)
- Change minmax_range to call its limits "low" and "high" instead of
  "range[0]" and "range[1]". (Suggested by Connor Abbott).
- Use ir_builder swizzle helpers in swizzle_if_required(). (Suggested by
  Connor Abbott).
- Make the logic clearer by rearranging the code and commenting.
  (Suggested by Connor Abbott).
- Added comment to explain why we need to recurse twice. (Suggested by
  Connor Abbott).
- If we cannot prune an expression, do not return early. Instead, attempt
  to prune its children. (Suggested by Connor Abbott).

Other changes:
- Instead of having a global "valid" visitor member, let the various functions
  that can determine this status return a boolean and check for its value
  to decide what to do in each case. This is more flexible and allows us to
  recurse into children of parents that could not be pruned due to invalid
  ranges (so related to the last bullet in the review feedback).
- Make sure we always check if a range is valid before working with it. Since
  any use of get_range, combine_range or range_intersection can invalidate
  a range we should check for this situation every time we use any of these
  functions.

Version 3 by Iago Toral Quiroga <itoral@igalia.com>:

Changes from review feedback:
- Now we can make get_range, combine_range and range_intersection static too
  (suggested by Connor Abbott).
- Do not return NULL when looking for the smaller or larger constant in
  mixed vector constants. Instead, produce a new constant by doing a
  component-wise minmax. With this we can also remove the validations when
  we call into these functions (suggested by Connor Abbott).
- Add a comment explaining the meaning of the baserange argument in
  prune_expression (suggested by Connor Abbott).

Other changes:
- Eliminate minmax expressions operating on constant vectors with mixed values
  by resolving them.

No piglit regressions observed with Version 3.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76861

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agoglsl: do not emit error for non written varyings on OpenGL ES
Tapani Pälli [Tue, 16 Sep 2014 17:18:41 +0000 (20:18 +0300)]
glsl: do not emit error for non written varyings on OpenGL ES

Patch fixes following test case from 'shaders-with-varyings' WebGL
conformance suite: "vertex shader with unused varying and fragment
shader with used varying must succeed"

v2: still emit a warning if the condition happens (Ian)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoradeonsi: Use dummy pixel shader if compilation of the real shader failed
Michel Dänzer [Mon, 6 Oct 2014 08:05:38 +0000 (17:05 +0900)]
radeonsi: Use dummy pixel shader if compilation of the real shader failed

Instead of crashing.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79155#c5
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agoilo: let shaders determine surface counts
Chia-I Wu [Mon, 6 Oct 2014 04:42:56 +0000 (12:42 +0800)]
ilo: let shaders determine surface counts

When a shader needs N surfaces, we should upload N surfaces and not depend on
how many are bound.  This commit is larger than it should be because we did
not export how many surfaces a shader uses before.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
9 years agoilo: let shaders determine sampler counts
Chia-I Wu [Sat, 4 Oct 2014 02:51:20 +0000 (10:51 +0800)]
ilo: let shaders determine sampler counts

When a shader needs N samplers, we should upload N samplers and not depend on
how many are bound.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
9 years agotgsi: change tgsi_shader_info::properties to a one-dimensional array
Marek Olšák [Thu, 2 Oct 2014 14:36:51 +0000 (16:36 +0200)]
tgsi: change tgsi_shader_info::properties to a one-dimensional array

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
v2: fix svga too

9 years agoradeonsi: set number of userdata SGPRs of GS copy shader to 4
Marek Olšák [Tue, 23 Sep 2014 17:42:28 +0000 (19:42 +0200)]
radeonsi: set number of userdata SGPRs of GS copy shader to 4

It only needs the constant buffer with clip planes and read-write resources
for the GS->VS ring and streamout. That's 2 pointers, i.e. 4 SGPRs at two
32-bit SGPRs per 64-bit pointer.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: pass the GS shader directly to si_generate_gs_copy_shader
Marek Olšák [Tue, 30 Sep 2014 16:15:17 +0000 (18:15 +0200)]
radeonsi: pass the GS shader directly to si_generate_gs_copy_shader

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: set LLVMByValAttribute for all descriptor arrays
Marek Olšák [Tue, 30 Sep 2014 16:13:06 +0000 (18:13 +0200)]
radeonsi: set LLVMByValAttribute for all descriptor arrays

I hope this is correct.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: make the vertex shader key smaller
Marek Olšák [Thu, 25 Sep 2014 14:47:55 +0000 (16:47 +0200)]
radeonsi: make the vertex shader key smaller

We only support 16 vertex attribs, not 32.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: don't flush shader caches when building PM4 shader states
Marek Olšák [Tue, 23 Sep 2014 15:25:41 +0000 (17:25 +0200)]
radeonsi: don't flush shader caches when building PM4 shader states

This is a wrong place to flush caches to say the least.

I don't think we need to flush the instruction caches if we don't patch
shaders with DMA.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: remove interp_at_sample from the key, use TGSI_INTERPOLATE_LOC_SAMPLE
Marek Olšák [Tue, 30 Sep 2014 15:09:13 +0000 (17:09 +0200)]
radeonsi: remove interp_at_sample from the key, use TGSI_INTERPOLATE_LOC_SAMPLE

st/mesa has the same flag in its shader key, so we don't need to do it
in the driver anymore.

Instead, use TGSI_INTERPOLATE_LOC_SAMPLE, which is what st/mesa sets.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: move geometry shader properties from si_shader to si_shader_selector
Marek Olšák [Tue, 30 Sep 2014 14:55:36 +0000 (16:55 +0200)]
radeonsi: move geometry shader properties from si_shader to si_shader_selector

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: always compile shaders on demand
Marek Olšák [Tue, 30 Sep 2014 14:25:18 +0000 (16:25 +0200)]
radeonsi: always compile shaders on demand

The first compiled shader is sometimes useless, because the key doesn't match
the key for the draw call where it's used.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: remove unused variable si_shader::gs_input_prim
Marek Olšák [Tue, 30 Sep 2014 14:11:59 +0000 (16:11 +0200)]
radeonsi: remove unused variable si_shader::gs_input_prim

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agotgsi: remove some not so useful variables from tgsi_shader_info
Marek Olšák [Tue, 30 Sep 2014 13:59:37 +0000 (15:59 +0200)]
tgsi: remove some not so useful variables from tgsi_shader_info

9 years agoradeonsi: get fs_write_all from tgsi_shader_info directly
Marek Olšák [Tue, 30 Sep 2014 13:56:14 +0000 (15:56 +0200)]
radeonsi: get fs_write_all from tgsi_shader_info directly

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agotgsi: simplify shader properties in tgsi_shader_info
Marek Olšák [Tue, 30 Sep 2014 13:48:22 +0000 (15:48 +0200)]
tgsi: simplify shader properties in tgsi_shader_info

Use an array of properties indexed by TGSI_PROPERTY_* definitions.
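
The shape of such a table can be sketched as below.  The enum values and
struct here are hypothetical stand-ins; the real TGSI_PROPERTY_* list and
the tgsi_shader_info layout are not reproduced in this message.

```c
/* Hypothetical stand-ins for a few TGSI_PROPERTY_* definitions. */
enum property {
   PROP_GS_INPUT_PRIM,
   PROP_GS_OUTPUT_PRIM,
   PROP_FS_COLOR0_WRITES_ALL_CBUFS,
   PROP_COUNT
};

struct shader_info {
   /* One slot per property, indexed directly by the property name. */
   unsigned properties[PROP_COUNT];
};

/* Called once per PROPERTY token while scanning the shader. */
void record_property(struct shader_info *info, enum property name,
                     unsigned value)
{
   info->properties[name] = value;
}

unsigned properties_demo(void)
{
   struct shader_info info = { { 0 } };

   record_property(&info, PROP_FS_COLOR0_WRITES_ALL_CBUFS, 1);

   /* Drivers can now read a property without searching a list. */
   return info.properties[PROP_FS_COLOR0_WRITES_ALL_CBUFS];
}
```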

9 years agoradeonsi: get tgsi_shader_info only once before compilation
Marek Olšák [Tue, 30 Sep 2014 13:12:09 +0000 (15:12 +0200)]
radeonsi: get tgsi_shader_info only once before compilation

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agogallium/util: add util_bitcount64
Marek Olšák [Wed, 24 Sep 2014 16:26:21 +0000 (18:26 +0200)]
gallium/util: add util_bitcount64

I'll need this in radeonsi.

v2: use __builtin_popcountll if available
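
A minimal sketch of a 64-bit population count with that kind of builtin
fast path is shown below.  The function name bitcount64 and the __GNUC__
check are assumptions made for illustration; the actual util_bitcount64
and its configure-time check are not shown in this message.

```c
#include <stdint.h>

unsigned bitcount64(uint64_t n)
{
#if defined(__GNUC__)
   /* GCC and Clang builtin; assumes unsigned long long is 64 bits wide. */
   return (unsigned)__builtin_popcountll(n);
#else
   /* Portable fallback: clear the lowest set bit until none remain. */
   unsigned count = 0;

   while (n) {
      n &= n - 1;
      count++;
   }

   return count;
#endif
}
```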

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: fix CS tracing and remove excessive CS dumping
Marek Olšák [Fri, 5 Sep 2014 09:59:10 +0000 (11:59 +0200)]
radeonsi: fix CS tracing and remove excessive CS dumping

9 years agogk110/ir: add dnz flag emission for fmul/fmad
Ilia Mirkin [Sun, 28 Sep 2014 16:07:03 +0000 (12:07 -0400)]
gk110/ir: add dnz flag emission for fmul/fmad

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
9 years agogm107/ir: add dnz emission for fmul
Ilia Mirkin [Sun, 28 Sep 2014 05:52:11 +0000 (01:52 -0400)]
gm107/ir: add dnz emission for fmul

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
9 years agost/wgl: add WINAPI qualifiers on wgl function typedefs
Brian Paul [Fri, 3 Oct 2014 15:55:34 +0000 (09:55 -0600)]
st/wgl: add WINAPI qualifiers on wgl function typedefs

Fixes a release build segfault when wglCreateContextAttribsARB()
calls the wglCreateContext() function.

Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
9 years ago freedreno: query fixes
Rob Clark [Fri, 3 Oct 2014 16:48:31 +0000 (12:48 -0400)]
freedreno: query fixes

Fixes a few issues, including a potential empty IB (which triggers GPU
hangs in piglit occlusion_query_meta_no_fragments).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years ago freedreno/a3xx: handle VS only outputting BCOLOR
Rob Clark [Fri, 3 Oct 2014 14:08:59 +0000 (10:08 -0400)]
freedreno/a3xx: handle VS only outputting BCOLOR

Possibly we should map the front color to black (zeroes), but I'm not sure
there is a way to do that without generating a shader variant.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years ago freedreno/ir3: fix lockups with lame FRAG shaders
Rob Clark [Fri, 3 Oct 2014 14:02:31 +0000 (10:02 -0400)]
freedreno/ir3: fix lockups with lame FRAG shaders

Shaders like:

  FRAG
  PROPERTY FS_COLOR0_WRITES_ALL_CBUFS 1
  DCL IN[0], GENERIC[0], PERSPECTIVE
  DCL OUT[0], COLOR
  DCL SAMP[0]
  DCL TEMP[0], LOCAL
  IMM[0] FLT32 {    0.0000,     1.0000,     0.0000,     0.0000}
    0: TEX TEMP[0], IN[0].xyyy, SAMP[0], 2D
    1: MOV OUT[0], IMM[0].xyxx
    2: END

cause unhappiness.  They have an IN[], but once the shader is compiled the
useless TEX instruction goes away, leaving a varying that is never
fetched, which makes the hw unhappy.

In the process fix a signed vs unsigned compare.  If the vertex shader
has max_reg=-1, MAX2() vs an unsigned would not give the desired result.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
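
The signed vs unsigned pitfall mentioned above can be sketched as follows. The `MAX2` definition is the usual ternary macro; the function and parameter names are hypothetical and do not come from the freedreno source.

```c
#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Mixed comparison: the int operand is converted to unsigned first,
 * so -1 becomes the largest unsigned value and always "wins". */
static unsigned
sketch_max_reg_mixed(int vs_max_reg, unsigned fs_max_reg)
{
   return MAX2(vs_max_reg, fs_max_reg);
}

/* Comparing both operands as signed gives the intended maximum
 * (assuming the unsigned value fits in an int). */
static int
sketch_max_reg_signed(int vs_max_reg, unsigned fs_max_reg)
{
   return MAX2(vs_max_reg, (int)fs_max_reg);
}
```

With `vs_max_reg = -1` and `fs_max_reg = 3`, the mixed version returns the wrapped-around value of -1 rather than 3; the signed version returns 3.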
9 years ago i965/compaction: Disable compaction on SNB temporarily.
Matt Turner [Fri, 3 Oct 2014 17:01:54 +0000 (10:01 -0700)]
i965/compaction: Disable compaction on SNB temporarily.

Will investigate after XDC.

9 years ago Revert "i965: Emit ELSE/ENDIF JIP with type D on Gen 7."
Matt Turner [Fri, 3 Oct 2014 16:58:41 +0000 (09:58 -0700)]
Revert "i965: Emit ELSE/ENDIF JIP with type D on Gen 7."

This reverts commit 54e30dbf4db437748509d1319c3f6e4185f76c69.

Will investigate after XDC.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84557

9 years ago i965/fs: Remove dead generate_rep_fb_write prototype.
Matt Turner [Wed, 1 Oct 2014 06:18:34 +0000 (23:18 -0700)]
i965/fs: Remove dead generate_rep_fb_write prototype.

Added in commit f9dc7aab.

9 years ago mesa: fix spurious wglGetProcAddress / GL_INVALID_OPERATION error
Brian Paul [Thu, 2 Oct 2014 15:36:54 +0000 (09:36 -0600)]
mesa: fix spurious wglGetProcAddress / GL_INVALID_OPERATION error

On Windows, the Piglit primitive-restart test was failing a
glGetError()==0 assertion when it was run without any command-line
arguments.  Piglit's all.py script only runs primitive-restart
with arguments so this case isn't normally hit during a full
piglit run.

The basic problem is Microsoft's opengl32.dll calls glFlush
from wglGetProcAddress() and Piglit uses wglGetProcAddress() to
resolve glPrimitiveRestartNV() which is called inside glBegin/End.
See comments in the code for more info.

Plus, improve the comments for _mesa_alloc_dispatch_table().

Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Sinclair Yeh <syeh@vmware.com>
9 years ago freedreno/ir3: add TXF support
Ilia Mirkin [Wed, 24 Sep 2014 21:42:03 +0000 (17:42 -0400)]
freedreno/ir3: add TXF support

Still failing a bunch of the fairly picky texelFetch tests, but the
1D(Array) ones are full passes.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years ago freedreno/ir3: add TXD support and expose ARB_shader_texture_lod
Ilia Mirkin [Sat, 27 Sep 2014 14:50:40 +0000 (10:50 -0400)]
freedreno/ir3: add TXD support and expose ARB_shader_texture_lod

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years ago freedreno/ir3: add texture offset support
Ilia Mirkin [Sat, 27 Sep 2014 06:52:42 +0000 (02:52 -0400)]
freedreno/ir3: add texture offset support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years ago freedreno/ir3: shadow comes before array
Ilia Mirkin [Wed, 1 Oct 2014 00:02:37 +0000 (20:02 -0400)]
freedreno/ir3: shadow comes before array

Experimentally, this makes *ArrayShadow tex-miplevel-selection tests
pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years ago freedreno/ir3: make TXQ return integers, not floats
Ilia Mirkin [Sun, 28 Sep 2014 23:37:27 +0000 (19:37 -0400)]
freedreno/ir3: make TXQ return integers, not floats

We're still doing something wrong for array textures.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years ago freedreno/ir3: add UMAD support
Ilia Mirkin [Wed, 1 Oct 2014 05:13:38 +0000 (01:13 -0400)]
freedreno/ir3: add UMAD support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years ago freedreno/ir3: add ISSG support
Ilia Mirkin [Mon, 29 Sep 2014 02:00:34 +0000 (22:00 -0400)]
freedreno/ir3: add ISSG support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years ago freedreno/ir3: add MOD support
Ilia Mirkin [Wed, 1 Oct 2014 05:03:31 +0000 (01:03 -0400)]
freedreno/ir3: add MOD support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years ago freedreno/ir3: add UMOD support, based on UDIV
Ilia Mirkin [Mon, 29 Sep 2014 01:05:05 +0000 (21:05 -0400)]
freedreno/ir3: add UMOD support, based on UDIV

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
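
One common way to build an unsigned modulo on top of an unsigned divide is the identity a mod b = a - (a / b) * b. The sketch below assumes that approach; the commit message only says UMOD is "based on UDIV", so the exact lowering in ir3 may differ, and both function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the UDIV lowering; plain C division is used here. */
static uint32_t
sketch_udiv(uint32_t a, uint32_t b)
{
   return a / b;
}

/* UMOD expressed through UDIV: remainder = a - quotient * b.
 * The caller must ensure b != 0, as with C's own / and %. */
static uint32_t
sketch_umod(uint32_t a, uint32_t b)
{
   uint32_t q = sketch_udiv(a, b);
   return a - q * b;
}
```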
9 years ago freedreno/ir3: add IDIV/UDIV support
Ilia Mirkin [Fri, 12 Sep 2014 03:15:11 +0000 (23:15 -0400)]
freedreno/ir3: add IDIV/UDIV support

Logic shamelessly copied from nv50 lowering pass.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>