mesa.git
9 years agofreedreno/ir3: large const support
Rob Clark [Wed, 15 Oct 2014 17:08:00 +0000 (13:08 -0400)]
freedreno/ir3: large const support

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno: update generated headers
Rob Clark [Wed, 15 Oct 2014 18:38:07 +0000 (14:38 -0400)]
freedreno: update generated headers

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno: fix layer_stride
Rob Clark [Wed, 15 Oct 2014 14:29:17 +0000 (10:29 -0400)]
freedreno: fix layer_stride

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno: inline fd_draw_emit()
Rob Clark [Wed, 15 Oct 2014 12:12:24 +0000 (08:12 -0400)]
freedreno: inline fd_draw_emit()

Manual LTO

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno/ir3: optimize shader key comparison
Rob Clark [Tue, 14 Oct 2014 20:23:18 +0000 (16:23 -0400)]
freedreno/ir3: optimize shader key comparison

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno/a3xx: refactor/optimize emit
Rob Clark [Tue, 14 Oct 2014 18:27:47 +0000 (14:27 -0400)]
freedreno/a3xx: refactor/optimize emit

Because we reuse various bits of emit code (for state/vertex/prog/etc)
for both regular draws and internal draws (gmem<->mem, clear, etc), the
number of parameters getting passed around has been growing.  Refactor
to group these into fd3_emit.  This simplifies fxn signatures, avoids
passing around shader key on the stack, etc.  It also gives us a nice
place to cache shader-variant lookup to avoid looking up shader variants
multiple times per draw (without having to *also* pass them around as
fxn args everywhere).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agofreedreno/a3xx: refactor vertex state emit
Rob Clark [Tue, 14 Oct 2014 16:20:54 +0000 (12:20 -0400)]
freedreno/a3xx: refactor vertex state emit

Get rid of fd3_vertex_buf and use fd_vertex_state directly for all
draws.  Removes a tiny bit of CPU overhead for munging around the vertex
state every time it is emitted, but more importantly it cleans things up
for later optimizations, so the emit paths don't have to special case
internal draws (gmem<->mem, clears, etc) with regular draws.

Instead of constructing the fd3_vertex_buf array each time for internal
draws, pre-create solid_vbuf_state and blit_vbuf_state at context init
time.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agovc4: Fix the uniform debug output.
Eric Anholt [Wed, 15 Oct 2014 15:16:09 +0000 (16:16 +0100)]
vc4: Fix the uniform debug output.

I dropped the shader index when moving to the compiled shader struct, but
didn't update the format string here.

9 years agovc4: Add support for user clip plane and gl_ClipVertex.
Eric Anholt [Wed, 15 Oct 2014 14:25:57 +0000 (15:25 +0100)]
vc4: Add support for user clip plane and gl_ClipVertex.

Fixes about 15 piglit tests about interpolation and clipping.

9 years agovc4: Move the output semantics setup to a helper.
Eric Anholt [Wed, 15 Oct 2014 15:39:54 +0000 (16:39 +0100)]
vc4: Move the output semantics setup to a helper.

I want to reuse it elsewhere to set up outputs that aren't in the TGSI.

9 years agoi965: Allow CSE on Gen4-5 unary math.
Kenneth Graunke [Tue, 14 Oct 2014 06:45:07 +0000 (23:45 -0700)]
i965: Allow CSE on Gen4-5 unary math.

Due to the implicit move-from-GRF, unary math looks a lot like the Gen6+
math instruction: it's a single instruction (SEND) with a GRF source.
The difference is that it also implicitly clobbers a message register.

The only visible effect is that CSE will remove the MRF-clobbering from
later math operations.  This should be fine; compute_to_mrf and
remove_redundant_mrf_writes don't look at the values populated by
implied writes, so they can't rely on those values being present.
Less interference may actually help those passes make more progress.

Binary math is still problematic, since it involves a separate MOV
instruction to load the second operand.  We continue disallowing CSE for
binary math operations.

total instructions in shared programs: 3340303 -> 3340100 (-0.01%)
instructions in affected programs:     26927 -> 26724 (-0.75%)
Nothing hurt, gained, or lost.  ~6% reduction on a few shaders.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agor600g,radeonsi: Only set use_staging_texture = TRUE once
Michel Dänzer [Wed, 8 Oct 2014 07:05:36 +0000 (16:05 +0900)]
r600g,radeonsi: Only set use_staging_texture = TRUE once

No need to check for setting the flag after we set it already.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agor600g,radeonsi: Use staging texture for transfers if any miplevel is tiled
Michel Dänzer [Wed, 8 Oct 2014 07:01:47 +0000 (16:01 +0900)]
r600g,radeonsi: Use staging texture for transfers if any miplevel is tiled

We set the NO_CPU_ACCESS flag for BO allocation in that case, so direct CPU
access may not work.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agowinsys/radeon: Use separate caching buffer manager for each set of flags
Michel Dänzer [Wed, 8 Oct 2014 07:34:46 +0000 (16:34 +0900)]
winsys/radeon: Use separate caching buffer manager for each set of flags

Otherwise the caching buffer manager may return a buffer which was created
with a different set of flags, which can cause trouble.
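
A minimal sketch of the idea, assuming a toy winsys where buffers are plain dicts and flags are hashable values; the class and method names are hypothetical and are not the radeon winsys API. Keeping one cache per flag set means a released buffer can only be handed back to a request with the same flags:

```python
class CachingManager:
    """Toy free-list cache for buffers that all share one set of flags."""

    def __init__(self):
        self.free = []

    def alloc(self, size):
        # Reuse the first cached buffer that is large enough, if any.
        for i, buf in enumerate(self.free):
            if buf["size"] >= size:
                return self.free.pop(i)
        return None


class Winsys:
    """Hypothetical winsys that keeps a separate cache per flag set."""

    def __init__(self):
        self.managers = {}

    def create_buffer(self, size, flags):
        mgr = self.managers.setdefault(flags, CachingManager())
        buf = mgr.alloc(size)
        if buf is None:
            buf = {"size": size, "flags": flags}
        return buf

    def release_buffer(self, buf):
        # Return the buffer to the cache that matches its creation flags.
        mgr = self.managers.setdefault(buf["flags"], CachingManager())
        mgr.free.append(buf)
```

With a single shared cache, the second allocation in a release/alloc sequence could receive a buffer created with different flags; here it cannot.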

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
9 years agoconfigure.ac: check for libexpat when no pkg-config is available
Andres Gomez [Tue, 7 Oct 2014 14:32:17 +0000 (17:32 +0300)]
configure.ac: check for libexpat when no pkg-config is available

Previously, when no pkg-config was available for
libexpat we would just add the needed linking
flags without any extra check.

Now, we check that the library and the headers are
also installed in the building environment.

Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agoclover: Fix regression in module serialization
Tom Stellard [Tue, 14 Oct 2014 21:55:23 +0000 (17:55 -0400)]
clover: Fix regression in module serialization

We need to serialize semantic information for arguments, which was added
in 06139c56fa070f84a931a4ddbdb894c9e8d24f55.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoi965/fs: Use the correct regs_written on unspill instructions
Jason Ekstrand [Tue, 14 Oct 2014 19:02:19 +0000 (12:02 -0700)]
i965/fs: Use the correct regs_written on unspill instructions

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agost/gbm: fix order of arguments passed to is_format_supported
Ilia Mirkin [Tue, 14 Oct 2014 02:39:48 +0000 (22:39 -0400)]
st/gbm: fix order of arguments passed to is_format_supported

Reported by Coverity

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Cc: mesa-stable@lists.freedesktop.org
9 years agonouveau: 3d textures are unsupported, limit 3d levels to 1
Ilia Mirkin [Sun, 5 Oct 2014 16:35:51 +0000 (12:35 -0400)]
nouveau: 3d textures are unsupported, limit 3d levels to 1

Ideally there would be a swrast fallback, but the driver isn't ready for
that. This should avoid crashes if someone tries to use 3d textures
though.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: mesa-stable@lists.freedesktop.org
9 years agofreedreno: use tgsi_lowering
Rob Clark [Wed, 1 Oct 2014 00:09:11 +0000 (20:09 -0400)]
freedreno: use tgsi_lowering

Now that the freedreno_lowering code is moved to tgsi_lowering, remove
our private copy and switch over to using the common version.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years agor300/compiler: remove useless check
David Heidelberger [Tue, 14 Oct 2014 00:25:01 +0000 (02:25 +0200)]
r300/compiler: remove useless check

This code is already inside if (!variable->C->is_r500), so there is no
need to check twice.

Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
Signed-off-by: David Heidelberger <david.heidelberger@ixit.cz>
9 years agoilo: Build pipe-loader for ilo
Nick Sarnie [Fri, 12 Sep 2014 22:20:46 +0000 (18:20 -0400)]
ilo: Build pipe-loader for ilo

Trivial patch to create the pipe loader for ilo. All the code was already there.

Signed-off-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agoautomake: explicitly set TARGET_RADEON_{WINSYS,COMMON}
Emil Velikov [Tue, 14 Oct 2014 14:25:54 +0000 (15:25 +0100)]
automake: explicitly set TARGET_RADEON_{WINSYS,COMMON}

Originally the variables were set only once via the ?= operator but
that causes issues when doing incremental builds. They appear to be
undefined and missing from the dependency list despite their addition
to LIBADD.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84807
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agovc4: Fix render target NPOT alignment at small miplevels.
Eric Anholt [Tue, 14 Oct 2014 13:28:14 +0000 (14:28 +0100)]
vc4: Fix render target NPOT alignment at small miplevels.

The texturing hardware takes the POT level 0 width/height and minifies
those.  This is different from what we were doing, for example, for
273-wide's level 5: POT(273>>5) == 8, while POT(273)>>5 == 16.

Fixes piglit-depthstencil-render-miplevels 273.
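
The two orderings can be checked with a few lines of arithmetic. This is only an illustration of the numbers quoted above; `next_pot` and `minify` are hypothetical helper names, not vc4 functions:

```python
def next_pot(n):
    """Round n (>= 1) up to the next power of two."""
    return 1 << (n - 1).bit_length()


def minify(size, level):
    """Size of a miplevel: shift right per level, never below 1."""
    return max(size >> level, 1)


width, level = 273, 5

# Old behavior: minify first, then round up to a power of two.
old = next_pot(minify(width, level))   # POT(273 >> 5)

# Hardware behavior: round level 0 up to a power of two, then minify.
new = minify(next_pot(width), level)   # POT(273) >> 5
```

Here `old` is 8 and `new` is 16, matching the mismatch described in the message.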

9 years agovc4: Add support for having 0 vertex elements used.
Eric Anholt [Thu, 25 Sep 2014 21:57:01 +0000 (14:57 -0700)]
vc4: Add support for having 0 vertex elements used.

You have to load at least 1, according to the simulator.  Fixes 4 piglit
tests and even more ES2 conformance tests.

9 years agoauxiliary/os: Add DragonFly BSD support in os_get_total_physical_memory.
Vinson Lee [Sat, 11 Oct 2014 05:40:21 +0000 (22:40 -0700)]
auxiliary/os: Add DragonFly BSD support in os_get_total_physical_memory.

This patch fixes this build error on DragonFly BSD.

  CC       os/os_misc.lo
os/os_misc.c: In function 'os_get_total_physical_memory':
os/os_misc.c:132:2: error: #error Unsupported *BSD

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agoglx: Fix glxUseXFont for glxWindow and glxPixmaps
Daniel Manjarres [Sun, 22 Jun 2014 16:47:58 +0000 (09:47 -0700)]
glx: Fix glxUseXFont for glxWindow and glxPixmaps

The current implementation of glxUseXFont requires creating
a temporary pixmap and graphics context, which requires a real
old-school X11 Window, not a glxDrawable. This patch changes
things so that glxUseXFont will also accept a glxWindow or
glxPixmap, and look up the underlying X11 Drawable. Without
this patch glxUseXFont generates a giant stream of Xerrors
about bad drawables and bad graphics contexts.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=54372

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agoilo: clear writer pointer after unmapping
Chia-I Wu [Tue, 14 Oct 2014 00:29:16 +0000 (08:29 +0800)]
ilo: clear writer pointer after unmapping

It does not look like an issue now but it is good to be future proof.  Spotted
by Courtney Goeltzenleuchter.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
9 years agovc4: Write the VPM read setup multiple times to queue all the inputs.
Eric Anholt [Mon, 13 Oct 2014 15:20:01 +0000 (16:20 +0100)]
vc4: Write the VPM read setup multiple times to queue all the inputs.

There's a 4-element fifo, and the size (number of dwords per vertex) field
is just 4 bits.

Fixes glsl-routing on sim.

9 years agovc4: Add support for the TXL opcode.
Eric Anholt [Mon, 13 Oct 2014 13:38:10 +0000 (14:38 +0100)]
vc4: Add support for the TXL opcode.

There's a bit at the bottom of cube map stride (which has some formatting
bugs in the docs) which flips the bias coordinate to being an absolute
LOD.

9 years agovc4: Improve the accuracy of SIN and COS.
Eric Anholt [Mon, 13 Oct 2014 13:11:28 +0000 (14:11 +0100)]
vc4: Improve the accuracy of SIN and COS.

This gets them to pass glsl-sin/cos.  There was an obvious problem that I
was using the FRC code on the scaled input value, which means that we had
a range in [0, 1], while our Taylor series is most accurate across [-0.5, 0.5].
We can just slide things over, but that means flipping the sign of the
coefficients.  After that, it was just a matter of stuffing more
coefficients in.
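
A rough sketch of that range reduction in floating-point Python, not the QPU code the driver emits. It assumes the input is scaled by 1/(2*pi), the fractional part is taken, and the result is slid over by 0.5; the function names and the number of terms are illustrative choices:

```python
import math


def sin_taylor(t, terms=10):
    """Taylor series of sin(t) around 0 using `terms` odd-power terms."""
    total, term = 0.0, t
    for k in range(terms):
        total += term
        # Next odd-power term: multiply by -t^2 / ((2k+2)(2k+3)).
        term *= -t * t / ((2 * k + 2) * (2 * k + 3))
    return total


def sin_reduced(x):
    scaled = x / (2.0 * math.pi)
    frac = scaled - math.floor(scaled)   # FRC: lands in [0, 1)
    shifted = frac - 0.5                 # slide over to [-0.5, 0.5)
    # sin(2*pi*frac) = sin(2*pi*shifted + pi) = -sin(2*pi*shifted),
    # which is why sliding the range flips the sign of the coefficients.
    return -sin_taylor(2.0 * math.pi * shifted)
```

Evaluating the series on [-pi, pi) instead of [0, 2*pi) halves the largest argument, so the same number of terms gives a much smaller truncation error.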

9 years agoi965: Use unsynchronized maps for the program cache on LLC platforms.
Kenneth Graunke [Thu, 21 Aug 2014 21:41:17 +0000 (14:41 -0700)]
i965: Use unsynchronized maps for the program cache on LLC platforms.

There's no reason to stall on pwrite - the CPU always appends to the
buffer and never modifies existing contents, and the GPU never writes
it.  Further, the CPU always appends new data before submitting a batch
that requires it.

This code predates the unsynchronized mapping feature, so we simply
didn't have the option when it was written.

Ideally, we would do this for non-LLC platforms too, but unsynchronized
mapping support only exists for LLC systems.

Saves a bunch of stall avoidance copies when uploading shaders.

v2: Rebase on changes to previous patch.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
9 years agoi965: Issue performance warnings when copying the program cache BO.
Kenneth Graunke [Thu, 21 Aug 2014 17:50:31 +0000 (10:50 -0700)]
i965: Issue performance warnings when copying the program cache BO.

We don't really want unnecessary buffer copying, so it'd be nice to know
when it's happening.

v2: Drop stall warnings when doing a read-only CPU mapping of the cache
    BO.  The GPU also uses it in a read-only fashion, so there won't be
    any stalls, even though the buffer is busy.  (Thanks to Chris Wilson
    for catching this mistake.)

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net> [v1]
9 years agoi965: Issue performance warnings on MapBufferRange stalls.
Kenneth Graunke [Thu, 21 Aug 2014 17:42:05 +0000 (10:42 -0700)]
i965: Issue performance warnings on MapBufferRange stalls.

This is easy: we just need to use brw_map_bo instead of mapping it
directly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
9 years agovc4: Match VS outputs to FS inputs.
Eric Anholt [Mon, 13 Oct 2014 07:24:57 +0000 (08:24 +0100)]
vc4: Match VS outputs to FS inputs.

If the VS doesn't output a value that the FS needs, we still need to read
the right contents for the remaining FS inputs, by emitting padding.  And
if the VS outputs something the FS doesn't need, we shouldn't put it in
the VPM at all (so the code producing it can get DCEed).

Fixes 77 piglit tests.
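
The matching rule can be sketched as follows, assuming outputs and inputs are identified by name and that padding is a constant 0.0; `link_vs_outputs` is a hypothetical helper, not the driver's function:

```python
def link_vs_outputs(vs_outputs, fs_inputs, padding=0.0):
    """Return the values to store, in the order the FS reads its inputs.

    vs_outputs: dict mapping output name -> value written by the VS.
    fs_inputs:  ordered list of input names the FS consumes.
    """
    stored = []
    for name in fs_inputs:
        # Missing VS output: emit padding so later inputs stay aligned.
        stored.append(vs_outputs.get(name, padding))
    # VS outputs the FS never reads are simply not stored, so the code
    # that computed them has no remaining users and can be eliminated.
    return stored
```

For example, a VS writing `color` and `unused` paired with an FS reading `texcoord` then `color` stores padding followed by the color value, and `unused` never reaches the output.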

9 years agoconfigure: use $libdir/dri as default for VA-API
Christian König [Thu, 9 Oct 2014 17:51:48 +0000 (19:51 +0200)]
configure: use $libdir/dri as default for VA-API

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agoconfigure: remove superfluous VA-API line from configure.ac
Christian König [Thu, 9 Oct 2014 16:03:02 +0000 (18:03 +0200)]
configure: remove superfluous VA-API line from configure.ac

We don't have GALLIUM_STATE_TRACKERS_DIRS any more.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agoconfigure: respect $libdir for the OMX installation dir
Christian König [Thu, 9 Oct 2014 16:42:58 +0000 (18:42 +0200)]
configure: respect $libdir for the OMX installation dir

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agoconfigure: Revert "ask vdpau.pc for the default location of the vdpau drivers"
Christian König [Thu, 9 Oct 2014 16:01:19 +0000 (18:01 +0200)]
configure: Revert "ask vdpau.pc for the default location of the vdpau drivers"

This reverts commit bbe6f7f865cd4316b5f885507ee0b128a20686eb.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agovc4: Add support for the CEIL opcode.
Eric Anholt [Mon, 13 Oct 2014 07:05:35 +0000 (08:05 +0100)]
vc4: Add support for the CEIL opcode.

Not as big of a deal as SSG, but still +9 piglit tests.

9 years agovc4: Add support for the SSG opcode.
Eric Anholt [Sun, 12 Oct 2014 21:02:53 +0000 (22:02 +0100)]
vc4: Add support for the SSG opcode.

9 years agodocs: add news item and link release notes
Emil Velikov [Mon, 13 Oct 2014 01:14:02 +0000 (02:14 +0100)]
docs: add news item and link release notes

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years agodocs: Add sha256 sums for the 10.3.1 release
Emil Velikov [Sun, 12 Oct 2014 23:34:19 +0000 (00:34 +0100)]
docs: Add sha256 sums for the 10.3.1 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit fa98c74692634de4f87694a40a299b59c4716ee5)

9 years agoAdd release notes for the 10.3.1 release
Emil Velikov [Sun, 12 Oct 2014 23:16:59 +0000 (00:16 +0100)]
Add release notes for the 10.3.1 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 088d3501786a2ff0833de45951b63acbe6560a0f)

9 years agodocs: Add sha256 sums for the 10.2.9 release
Emil Velikov [Sun, 12 Oct 2014 20:05:07 +0000 (21:05 +0100)]
docs: Add sha256 sums for the 10.2.9 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 52bd154980e306b8bc9b9d2edc0e728a9f8f3bf6)

9 years agoAdd release notes for the 10.2.9 release
Emil Velikov [Sun, 12 Oct 2014 18:06:25 +0000 (19:06 +0100)]
Add release notes for the 10.2.9 release

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
(cherry picked from commit 9f1149876f2d010c871751a53d02d4d2b6aef1fe)

9 years agor600g: Implement GL_ARB_sample_shading
Glenn Kennard [Wed, 10 Sep 2014 09:54:40 +0000 (11:54 +0200)]
r600g: Implement GL_ARB_sample_shading

Also fixes two sided lighting which was broken at least
on pre-evergreen by commit b1eb00.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
9 years agoradeonsi: use tgsi_shader_info in si_llvm_emit_fs_epilogue
Marek Olšák [Sat, 4 Oct 2014 21:13:50 +0000 (23:13 +0200)]
radeonsi: use tgsi_shader_info in si_llvm_emit_fs_epilogue

This is the last use of tgsi_parse_token in radeonsi.

It looks ugly because the code was re-indented, but there is really no change
in behavior.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: remove si_shader_output_values::index
Marek Olšák [Sat, 4 Oct 2014 20:37:23 +0000 (22:37 +0200)]
radeonsi: remove si_shader_output_values::index

It's redundant now.

It led to a simplification in si_llvm_emit_streamout, because outidx == reg.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in si_llvm_emit_vs_epilogue
Marek Olšák [Sat, 4 Oct 2014 20:33:36 +0000 (22:33 +0200)]
radeonsi: use tgsi_shader_info in si_llvm_emit_vs_epilogue

That code was really ugly.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: remove shader->input[] and output[] arrays and dependencies
Marek Olšák [Sat, 4 Oct 2014 20:17:25 +0000 (22:17 +0200)]
radeonsi: remove shader->input[] and output[] arrays and dependencies

They were reinventing tgsi_shader_info. They are unused now.

radeon_llvm_context::load_input can be NULL if input fetching is implemented
in some other way.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: move param_offset out of shader->input[] and output[]
Marek Olšák [Sat, 4 Oct 2014 20:15:07 +0000 (22:15 +0200)]
radeonsi: move param_offset out of shader->input[] and output[]

Those are going away.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info to get a list of GS outputs
Marek Olšák [Sat, 4 Oct 2014 20:07:50 +0000 (22:07 +0200)]
radeonsi: use tgsi_shader_info to get a list of GS outputs

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in si_update_spi_map
Marek Olšák [Sat, 4 Oct 2014 20:03:53 +0000 (22:03 +0200)]
radeonsi: use tgsi_shader_info in si_update_spi_map

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: simplify dereferences in si_update_spi_map
Marek Olšák [Sat, 4 Oct 2014 18:59:48 +0000 (20:59 +0200)]
radeonsi: simplify dereferences in si_update_spi_map

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in si_shader_vs
Marek Olšák [Sat, 4 Oct 2014 19:31:18 +0000 (21:31 +0200)]
radeonsi: use tgsi_shader_info in si_shader_vs

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in si_shader_ps
Marek Olšák [Sat, 4 Oct 2014 16:33:36 +0000 (18:33 +0200)]
radeonsi: use tgsi_shader_info in si_shader_ps

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in fetch_input_gs
Marek Olšák [Sat, 4 Oct 2014 15:40:39 +0000 (17:40 +0200)]
radeonsi: use tgsi_shader_info in fetch_input_gs

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: don't rely on shader->output in si_llvm_emit_fs_epilogue
Marek Olšák [Sat, 4 Oct 2014 20:09:16 +0000 (22:09 +0200)]
radeonsi: don't rely on shader->output in si_llvm_emit_fs_epilogue

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: use tgsi_shader_info in si_llvm_emit_es_epilogue
Marek Olšák [Sat, 4 Oct 2014 15:04:05 +0000 (17:04 +0200)]
radeonsi: use tgsi_shader_info in si_llvm_emit_es_epilogue

tgsi_shader_info contains everything we need.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: don't recompile shaders when changing nr_cbufs from 0 to 1
Marek Olšák [Sat, 4 Oct 2014 18:44:23 +0000 (20:44 +0200)]
radeonsi: don't recompile shaders when changing nr_cbufs from 0 to 1

Both cases are equivalent.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: remove vs.ucps_enabled from the shader key
Marek Olšák [Sat, 4 Oct 2014 18:41:03 +0000 (20:41 +0200)]
radeonsi: remove vs.ucps_enabled from the shader key

Written CLIPDIST outputs are simply disabled in PA_CL_VS_OUT_CNTL.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoradeonsi: assume ClipDistance usage mask is always 0xf
Marek Olšák [Sat, 4 Oct 2014 17:09:09 +0000 (19:09 +0200)]
radeonsi: assume ClipDistance usage mask is always 0xf

No code in Mesa sets the usage mask to any other value.
The final mask is AND'ed with enable bits from the rasterizer state anyway.

If somebody implements setting usage masks in st/mesa, we can use
tgsi_shader_info to get it more easily.

This is a prerequisite for the following commit.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years agoclover: Fix unintended fall-through in kernel::argument::bind.
Francisco Jerez [Sun, 12 Oct 2014 08:32:48 +0000 (11:32 +0300)]
clover: Fix unintended fall-through in kernel::argument::bind.

9 years agoclover: Append implicit arguments to the kernel argument list.
Jan Vesely [Wed, 8 Oct 2014 14:43:01 +0000 (17:43 +0300)]
clover: Append implicit arguments to the kernel argument list.

[ Francisco Jerez: Split off from a larger patch, and take a slightly
  different approach for passing the implicit arguments around. ]

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agoclover: Pass execution dimensions and offset to the kernel as implicit arguments.
Francisco Jerez [Wed, 8 Oct 2014 14:39:35 +0000 (17:39 +0300)]
clover: Pass execution dimensions and offset to the kernel as implicit arguments.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
9 years agoclover: Add semantic information to module::argument for implicit parameter passing.
Francisco Jerez [Wed, 8 Oct 2014 14:32:18 +0000 (17:32 +0300)]
clover: Add semantic information to module::argument for implicit parameter passing.

Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
9 years agoclover: Use unreachable() from util/macros.h instead of assert(0).
Francisco Jerez [Wed, 8 Oct 2014 14:29:14 +0000 (17:29 +0300)]
clover: Use unreachable() from util/macros.h instead of assert(0).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
9 years agogallium: Add tokens for DragonFly BSD.
Vinson Lee [Tue, 16 Sep 2014 23:11:53 +0000 (16:11 -0700)]
gallium: Add tokens for DragonFly BSD.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Brian Paul <brianp@vmware.com>
9 years agoilo: disassemble compacted instructions
Chia-I Wu [Fri, 10 Oct 2014 19:24:48 +0000 (03:24 +0800)]
ilo: disassemble compacted instructions

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
9 years agoglsl: improve accuracy of atan()
Erik Faye-Lund [Fri, 26 Sep 2014 16:11:19 +0000 (18:11 +0200)]
glsl: improve accuracy of atan()

Our current atan()-approximation is pretty inaccurate at 1.0, so
let's try to improve the situation by doing a direct approximation
without going through asin.

This new implementation uses an 11th degree polynomial to approximate
atan in the [-1..1] range, and the following identity to reduce the
entire range to [-1..1]:

atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)

This range-reduction idea is taken from the paper "Fast computation
of Arctangent Functions for Embedded Applications: A Comparative
Analysis" (Ukil et al. 2011).

The polynomial that approximates atan(x) is:

x   * 0.9999793128310355 - x^3  * 0.3326756418091246 +
x^5 * 0.1938924977115610 - x^7  * 0.1173503194786851 +
x^9 * 0.0536813784310406 - x^11 * 0.0121323213173444

This polynomial was found with the following GNU Octave script:

x = linspace(0, 1);
y = atan(x);
n = [1, 3, 5, 7, 9, 11];
format long;
polyfitc(x, y, n)

The polyfitc function is not built-in, but too long to include here.
It can be downloaded from the following URL:

http://www.mathworks.com/matlabcentral/fileexchange/47851-constraint-polynomial-fit/content/polyfitc.m

This fixes the following piglit test:
shaders/glsl-const-folding-01
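
For reference, the polynomial and the range reduction above can be tried out in Python. This is a sketch of the math only, not the GLSL built-in code; the function names are made up, and the identity is applied only for |x| > 1:

```python
import math

# Coefficients of x, x^3, ..., x^11 as listed in the message above.
COEFFS = [
    0.9999793128310355,
    -0.3326756418091246,
    0.1938924977115610,
    -0.1173503194786851,
    0.0536813784310406,
    -0.0121323213173444,
]


def atan_poly(x):
    """Odd polynomial approximation of atan(x), intended for |x| <= 1."""
    x2 = x * x
    result = 0.0
    for c in reversed(COEFFS):   # Horner's scheme in x^2
        result = result * x2 + c
    return result * x


def atan_approx(x):
    if abs(x) <= 1.0:
        return atan_poly(x)
    # atan(x) = 0.5 * pi * sign(x) - atan(1.0 / x)
    return math.copysign(0.5 * math.pi, x) - atan_poly(1.0 / x)
```

At x = 1.0 the polynomial sums to about 0.785395, against pi/4 of about 0.785398.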

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agovc4: Use the fnv1 hash function instead of gallium util's crc32.
Eric Anholt [Fri, 10 Oct 2014 11:56:45 +0000 (13:56 +0200)]
vc4: Use the fnv1 hash function instead of gallium util's crc32.

Improves simulated norast performance on a little benchmark by 13.4012%
+/- 2.08459% (n=13).
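
For readers unfamiliar with it, the standard 32-bit FNV-1 hash is a multiply-then-xor loop over the input bytes. The sketch below uses the published FNV offset basis and prime; whether the driver uses exactly this variant and width is an assumption:

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1_32(data):
    """32-bit FNV-1: multiply by the prime, then xor in each byte."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
        h ^= byte
    return h
```

There is no lookup table and only one multiply per byte, which is one plausible reason it is cheap for short hash-table keys.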

9 years agovc4: Don't look up the compiled shaders unless state has changed.
Eric Anholt [Fri, 10 Oct 2014 12:17:15 +0000 (14:17 +0200)]
vc4: Don't look up the compiled shaders unless state has changed.

Improves simulated norast performance on a little benchmark by 38.0965%
+/- 3.27534% (n=11).
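
The pattern is ordinary dirty-bit tracking. A toy version, with hypothetical flag and class names rather than the driver's own, looks like this:

```python
DIRTY_PROG = 1 << 0       # hypothetical flag: bound program changed
DIRTY_TEXTURES = 1 << 1   # hypothetical flag: texture bindings changed


class Context:
    def __init__(self):
        self.prog = None
        self.shader = None
        self.lookups = 0
        self.dirty = DIRTY_PROG | DIRTY_TEXTURES

    def bind_program(self, prog):
        self.prog = prog
        self.dirty |= DIRTY_PROG

    def _lookup_shader(self):
        # Stand-in for the hash-table lookup of the compiled variant.
        self.lookups += 1
        return ("compiled", self.prog)

    def draw(self):
        # Only redo the lookup when relevant state has changed.
        if self.dirty & DIRTY_PROG:
            self.shader = self._lookup_shader()
        # The flags must be cleared after the draw, otherwise every
        # draw sees everything as dirty and the check saves nothing.
        self.dirty = 0
```

Two back-to-back draws with the same program then perform a single lookup.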

9 years agovc4: Actually clear the context's dirty flags.
Eric Anholt [Fri, 10 Oct 2014 12:24:06 +0000 (14:24 +0200)]
vc4: Actually clear the context's dirty flags.

I was trying to skip state updates when !dirty, and suspiciously
everything was always dirty.

9 years agovc4: Optimize the other case of SEL_X_Y with a 0 -> SEL_X_0(a).
Eric Anholt [Thu, 9 Oct 2014 07:40:51 +0000 (09:40 +0200)]
vc4: Optimize the other case of SEL_X_Y with a 0 -> SEL_X_0(a).

Cleans up some output to be more obvious in a piglit test I'm looking at.

9 years agomesa: fix error reported on glTexSubImage2D when level not valid
Tapani Pälli [Tue, 7 Oct 2014 07:56:49 +0000 (10:56 +0300)]
mesa: fix error reported on glTexSubImage2D when level not valid

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
9 years agoi965: Fix register write checks.
Kenneth Graunke [Tue, 30 Sep 2014 00:00:51 +0000 (17:00 -0700)]
i965: Fix register write checks.

When mapping the buffer a second time, we need to use the new pointer,
not the one from the previous mapping.  Otherwise, we will most likely
crash.

Apparently, we've just been getting lucky and getting the same
bo->virtual pointer in both cases.  libdrm probably has a hand in that.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
9 years agovc4: Optimize out adds of 0.
Eric Anholt [Thu, 9 Oct 2014 13:10:52 +0000 (15:10 +0200)]
vc4: Optimize out adds of 0.

9 years agovc4: Optimize fmul(x, 0) and fmul(x, 1).
Eric Anholt [Thu, 9 Oct 2014 13:02:00 +0000 (15:02 +0200)]
vc4: Optimize fmul(x, 0) and fmul(x, 1).

This was being generated frequently by matrix multiplies of 2 and
3-channel vertex attributes (which have the 0 or 1 loaded in the shader).
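
As an illustration of the rewrite, here is a sketch on a made-up tuple IR of the form (op, dst, src0, src1), where a source is either a register name or a float constant. It is not the vc4 QIR representation:

```python
def opt_algebraic(inst):
    """Turn fmul by a constant 0 or 1 into a mov; leave the rest alone."""
    op, dst, a, b = inst
    if op == "fmul":
        # x * 0 -> 0 (this ignores NaN/infinity corner cases)
        if a == 0.0 or b == 0.0:
            return ("mov", dst, 0.0, None)
        # 1 * x -> x and x * 1 -> x
        if a == 1.0:
            return ("mov", dst, b, None)
        if b == 1.0:
            return ("mov", dst, a, None)
    return inst
```

Once the multiply is a mov, copy propagation and dead code elimination can usually remove it entirely.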

9 years agovc4: Factor out the turn-it-into-a-mov in opt_algebraic.
Eric Anholt [Thu, 9 Oct 2014 13:07:24 +0000 (15:07 +0200)]
vc4: Factor out the turn-it-into-a-mov in opt_algebraic.

This will be used more in the next commits.

9 years agovc4: Eliminate unused texture instructions.
Eric Anholt [Thu, 9 Oct 2014 12:42:14 +0000 (14:42 +0200)]
vc4: Eliminate unused texture instructions.

9 years agovc4: Dead code eliminate unused SF instructions.
Eric Anholt [Thu, 9 Oct 2014 12:45:14 +0000 (14:45 +0200)]
vc4: Dead code eliminate unused SF instructions.

9 years agovc4: Prevent copy propagating out the MOVs from r4.
Eric Anholt [Thu, 9 Oct 2014 14:36:45 +0000 (16:36 +0200)]
vc4: Prevent copy propagating out the MOVs from r4.

Copy propagating these might result in reading the r4 after some other
instruction has written r4.  Just prevent all copy propagation of this for
now.

Fixes bad rendering with upcoming indirect register access support, where
the copy propagation was consistently happening across another read.

9 years agovc4: Split the coordinate shader to its own vc4_compiled_shader.
Eric Anholt [Thu, 2 Oct 2014 16:50:44 +0000 (09:50 -0700)]
vc4: Split the coordinate shader to its own vc4_compiled_shader.

Merging VS and CS into the same struct wasn't winning us anything except
for not allocating a separate BO (but if we want to pack programs into
BOs, we should pack not just those 2 programs together).  What it was
getting us was a bunch of code duplication about hash table lookups and
propagating vc4_compile contents into a vc4_compiled_shader.

I was about to make the situation worse with indirect uniform buffer
access.

9 years agovc4: Add #defines for the texture uniform fields.
Eric Anholt [Thu, 2 Oct 2014 00:32:50 +0000 (17:32 -0700)]
vc4: Add #defines for the texture uniform fields.

I wanted to make another set of texture uploads for handling reladdr
constants, and duplicating all the bitshifting looked like a terrible
idea.  In the process, this fixes a swap of the s/t texture wrap modes.

9 years agovc4: Initialize undefined temporaries to 0.
Eric Anholt [Thu, 9 Oct 2014 15:49:23 +0000 (17:49 +0200)]
vc4: Initialize undefined temporaries to 0.

Under the simulator, reading registers before writing them triggers an
assertion failure.  c->undef gets treated as r0, which will usually be
written, but not if it's used in the first instruction.  We should
definitely not be aborting in this case, and return some sort of undefined
value instead.

Fixes glsl-user-varying-ff.

9 years agoi965: Skip uploading border color when unnecessary.
Kenneth Graunke [Sat, 26 Jul 2014 08:16:27 +0000 (01:16 -0700)]
i965: Skip uploading border color when unnecessary.

The border color is only needed when using the GL_CLAMP_TO_BORDER or
(deprecated) GL_CLAMP wrap modes; all others ignore it, including the
common GL_CLAMP_TO_EDGE and GL_REPEAT wrap modes.

In those cases, we can skip uploading it entirely, saving a bit of space
in the batchbuffer.  Instead, we just point it at the start of the
batch (offset 0); we have to program something, and that address is safe
to read.
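
The decision can be sketched like this, with wrap modes as strings and a caller-supplied `upload` callback standing in for the batchbuffer upload; both are hypothetical stand-ins for the driver's state code:

```python
# Wrap modes that can actually sample the border color.
NEEDS_BORDER_COLOR = {"GL_CLAMP_TO_BORDER", "GL_CLAMP"}


def border_color_offset(wrap_modes, upload):
    """Return the batch offset to program for the border color.

    wrap_modes: the sampler's wrap modes (s, t, r).
    upload:     callable that uploads the color and returns its offset.
    """
    if any(mode in NEEDS_BORDER_COLOR for mode in wrap_modes):
        return upload()
    # Nothing will read it: point at the start of the batch (offset 0).
    return 0
```

A sampler using only GL_REPEAT and GL_CLAMP_TO_EDGE never calls `upload` at all.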

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years agoi965: Use BDW_MOCS_PTE for renderbuffers.
Kenneth Graunke [Tue, 30 Sep 2014 08:15:56 +0000 (01:15 -0700)]
i965: Use BDW_MOCS_PTE for renderbuffers.

Write-back caching cannot be used for buffers being scanned out by the
display engine; surfaces used for scan-out must be write-through or
uncached.  I originally chose WT for render targets because it works in
all cases.  However, we really want to use write-back caching where
possible, as it is more efficient.

Most renderbuffers are not used for scanout - off-screen FBOs certainly
are fine, and non-pageflipped backbuffers should be fine as well.  So
in most cases WB will work.  However, we don't know what will be used
for scan-out, so we instead simply use the PTE value specified by the
kernel, as it knows these things.

This matches our MOCS choice on Haswell.

Fixes performance regressions since commit ee4484be3dc827cf15bcf109f5
in a microbenchmark (spotted by Eero Tamminen).  Improves performance
in GLBenchmark 2.7/EgyptHD by 7.44362% +/- 0.496939% (n=55) on a
Broadwell GT2.  Improves performance in a bunch of other microbenchmarks
by ~15% or so.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
9 years agoi965: Add a BRW_MOCS_PTE #define.
Kenneth Graunke [Tue, 30 Sep 2014 08:15:55 +0000 (01:15 -0700)]
i965: Add a BRW_MOCS_PTE #define.

Like BDW_MOCS_WB and BDW_MOCS_WT, this specifies that we want to use all
three caches (L3, LLC, and eLLC where available), but leaves the LLC
caching mode up to the kernel's page table entry.

This allows the kernel to pick WB/WT/UC based on whether it's using a
buffer for scanout.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
Cc: mesa-stable@lists.freedesktop.org
9 years agomesa: Make _mesa_print_arrays use stderr.
Kenneth Graunke [Sat, 27 Sep 2014 05:02:50 +0000 (22:02 -0700)]
mesa: Make _mesa_print_arrays use stderr.

These days, most driver debug output happens via stderr, not stdout.
Some applications (such as Xephyr) also appear to close stdout, which
makes these messages go nowhere.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
9 years agor600g,radeonsi: Always use GTT again for PIPE_USAGE_STREAM buffers
Michel Dänzer [Tue, 26 Aug 2014 09:21:50 +0000 (18:21 +0900)]
r600g,radeonsi: Always use GTT again for PIPE_USAGE_STREAM buffers

Putting those in VRAM can cause long pauses due to buffers being moved
into / out of VRAM.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84662
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
9 years agovc4: Optimize SF(ITOF(x)) -> SF(x).
Eric Anholt [Thu, 9 Oct 2014 07:36:03 +0000 (09:36 +0200)]
vc4: Optimize SF(ITOF(x)) -> SF(x).

This is a common production of st_glsl_to_tgsi, because CMP takes a float
argument.
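
A rough sketch of what such a peephole might look like, on a made-up IR
rather than the driver's own.  The enum, struct inst and opt_sf_itof()
are all hypothetical.  The rewrite is presumably safe because an
int-to-float conversion keeps zero as zero and keeps the sign, so the
flags come out the same either way.

```c
#include <stddef.h>

/* Hypothetical opcodes: SF sets flags from its source, ITOF converts
 * an integer to a float. */
enum op { OP_MOV, OP_ITOF, OP_SF };

/* Hypothetical single-source instruction. */
struct inst {
   enum op op;
   struct inst *src;   /* instruction producing the source, or NULL */
};

/* Rewrite SF(ITOF(x)) to SF(x).  Returns 1 if the instruction changed. */
static int
opt_sf_itof(struct inst *inst)
{
   if (inst->op != OP_SF)
      return 0;
   if (!inst->src || inst->src->op != OP_ITOF || !inst->src->src)
      return 0;

   /* Skip the conversion and set flags from the integer directly. */
   inst->src = inst->src->src;
   return 1;
}
```

The ITOF itself is left in place; if nothing else uses it, a later dead
code pass would be expected to remove it.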

9 years agovc4: Add some optimization of FADD(FSUB(0, x)).
Eric Anholt [Thu, 9 Oct 2014 07:32:10 +0000 (09:32 +0200)]
vc4: Add some optimization of FADD(FSUB(0, x)).

This is a common production of st_glsl_to_tgsi, which uses negate flags on
source arguments to handle subtraction.
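
One plausible form of this rewrite, sketched on a made-up IR: an add
whose second operand is a negation written as FSUB(0, b) becomes a
single subtract, FADD(a, FSUB(0, b)) -> FSUB(a, b).  All names here are
hypothetical, and the sketch ignores signed-zero corner cases.

```c
#include <stddef.h>

/* Hypothetical opcodes; OP_CONST_ZERO stands for the constant 0.0. */
enum op { OP_CONST_ZERO, OP_VALUE, OP_FADD, OP_FSUB };

/* Hypothetical two-source instruction. */
struct inst {
   enum op op;
   struct inst *src[2];
};

/* Rewrite FADD(a, FSUB(0, b)) to FSUB(a, b).  Returns 1 on change. */
static int
opt_fadd_of_negate(struct inst *inst)
{
   if (inst->op != OP_FADD)
      return 0;

   struct inst *neg = inst->src[1];
   if (!neg || neg->op != OP_FSUB)
      return 0;
   if (!neg->src[0] || neg->src[0]->op != OP_CONST_ZERO)
      return 0;

   /* a + (0 - b) becomes a - b. */
   inst->op = OP_FSUB;
   inst->src[1] = neg->src[1];
   return 1;
}
```

Only the second operand is checked here.  Since addition is commutative,
a fuller version could swap the operands first when the negation is in
src[0].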

9 years agovc4: Mostly fix offset calculation for NPOT mipmap levels.
Eric Anholt [Mon, 6 Oct 2014 22:47:38 +0000 (15:47 -0700)]
vc4: Mostly fix offset calculation for NPOT mipmap levels.

The non-base NPOT levels are stored as POT-aligned images.  We get that
POT alignment by minifying the POT-aligned base level.

This means that level strides are also POT aligned, so we have to tell the
rendering mode config that our resource is larger than the actual
requested area.
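
A small sketch of the size calculation described above, under the
assumption that "POT-aligned" means rounding up to the next power of
two.  The helper names are hypothetical and this is not the driver's
layout code.

```c
#include <stdint.h>

/* Round up to the next power of two.  Assumes 1 <= v <= 2^31. */
static uint32_t
next_pot(uint32_t v)
{
   uint32_t p = 1;
   while (p < v)
      p <<= 1;
   return p;
}

/* Standard mipmap minification: halve per level, never below 1.
 * Assumes level < 32. */
static uint32_t
minify(uint32_t size, unsigned level)
{
   uint32_t s = size >> level;
   return s ? s : 1;
}

/* Stored size of one dimension at a given level.  The base level keeps
 * the requested size; non-base levels minify the POT-aligned base. */
static uint32_t
level_storage_size(uint32_t base_size, unsigned level)
{
   if (level == 0)
      return base_size;
   return minify(next_pot(base_size), level);
}
```

For a 100-pixel-wide texture, level 1 is stored 64 wide (128 >> 1)
rather than the 50 that minifying the requested size would give, which
is why the level stride is larger than the requested area.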

Fixes the fbo-generatemipmap-formats NPOT cases.  Regresses
depthstencil-render-miplevels 273 * -- the texture presentation now works
(where it was completely broken before), but it looks like there's some
overflow of image bounds happening at the lower miplevels.

9 years agovc4: Move the mirrored kernel code to a kernel/ directory.
Eric Anholt [Tue, 30 Sep 2014 23:25:48 +0000 (16:25 -0700)]
vc4: Move the mirrored kernel code to a kernel/ directory.

Now this whole setup matches the kernel's file layout much more closely.

9 years agovc4: Enable LIT lowering in TGSI instead of our own code.
Eric Anholt [Tue, 30 Sep 2014 20:27:36 +0000 (13:27 -0700)]
vc4: Enable LIT lowering in TGSI instead of our own code.

This brings us the -128/128 clamping on the w component.

9 years agovc4: Fix scalar math opcodes to replicate their result from the X channel.
Eric Anholt [Wed, 8 Oct 2014 20:26:58 +0000 (22:26 +0200)]
vc4: Fix scalar math opcodes to replicate their result from the X channel.

Thanks to robclark for pointing out that I was probably failing to do this
when I reported a "bug" in his lowering code.
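
To illustrate the intended behavior: a TGSI scalar opcode such as RCP
reads only the X channel of its source and writes that one result to
every enabled destination channel.  The sketch below models this with
plain floats; emit_scalar_op() and rcp() are hypothetical names, not
functions from the driver.

```c
/* Example scalar operation: reciprocal. */
static float
rcp(float x)
{
   return 1.0f / x;
}

/* Apply a scalar op to src.x and replicate the result to each channel
 * enabled in writemask (bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w). */
static void
emit_scalar_op(float (*fn)(float), const float src[4], float dst[4],
               unsigned writemask)
{
   float result = fn(src[0]);

   for (int i = 0; i < 4; i++) {
      if (writemask & (1u << i))
         dst[i] = result;
   }
}
```

The bug being fixed would correspond to computing fn(src[i]) per
channel instead of reusing the X result.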

9 years agoilo: fix rectlist on GEN7+
Chia-I Wu [Wed, 8 Oct 2014 19:30:17 +0000 (03:30 +0800)]
ilo: fix rectlist on GEN7+

It was broken by 343b014b57ecc5431477e090100e6a26edbda540.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
9 years agovc4: Add support for two-sided color.
Eric Anholt [Tue, 30 Sep 2014 23:24:26 +0000 (16:24 -0700)]
vc4: Add support for two-sided color.

It's fairly easy, thanks to Rob Clark's lowering code.  Fixes
two-sided-lighting and 4 vertex-program-two-side testcases, while
regressing 8 testcases that involve enabling two-sided color while only
initializing one of the two colors in the VS.  If you're enabling two
sided color, it's of course expected that you really do set up both
colors, so this is still an improvement (and when we set up a linker for
TGSI, we'll hopefully fix those 8 fails).
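
Conceptually, two-sided color comes down to a per-fragment select
between the front and back colors based on facing.  The sketch below
shows only that select, with hypothetical names; it is not Rob Clark's
lowering code.

```c
/* Pick the front or back color for a fragment.  Both inputs are read
 * unconditionally by the caller, so a vertex shader that writes only
 * one of them leaves the other undefined. */
static void
select_two_sided_color(int front_facing, const float front[4],
                       const float back[4], float out[4])
{
   const float *chosen = front_facing ? front : back;

   for (int i = 0; i < 4; i++)
      out[i] = chosen[i];
}
```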

9 years agovc4: Enable POW lowering in TGSI instead of our own code.
Eric Anholt [Tue, 30 Sep 2014 20:29:22 +0000 (13:29 -0700)]
vc4: Enable POW lowering in TGSI instead of our own code.