Dave Airlie [Mon, 1 Sep 2014 23:54:36 +0000 (09:54 +1000)]
glsl: free uniform_map on failure path.
If we fails in reserve_explicit_locations, we leak uniform_map.
Reported-by: coverity scanner.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Paul Berry [Sat, 11 Jan 2014 05:39:25 +0000 (21:39 -0800)]
main/cs: Add gl_context::ComputeProgram
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jordan Justen [Thu, 7 Aug 2014 05:32:03 +0000 (22:32 -0700)]
mesa: Convert NewDriverState to 64-bits
i965 will have more than 32 bits when BRW_STATE_COMPUTE_PROGRAM is added.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Paul Berry [Sat, 11 Jan 2014 00:37:09 +0000 (16:37 -0800)]
i965: Modify state upload to allow 2 different sets of state atoms.
The set of state atoms for compute shaders is currently empty; it will
be filled in by future patches.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Paul Berry [Sat, 11 Jan 2014 00:05:11 +0000 (16:05 -0800)]
i965: Modify dirty bit handling to support 2 pipelines.
The hardware state for compute shaders is almost entirely orthogonal
to the hardware state for 3D rendering. To avoid sending unnecessary
state to the hardware, we'll need to have a separate set of state
atoms for the compute pipeline and the 3D pipeline. That means we
need to maintain two separate sets of dirty bits to determine which
state atoms need to be run.
But the dirty bits are not completely independent; for example, if
BRW_NEW_SURFACES is flagged while doing 3D rendering, then not only do
we need to re-run 3D state atoms that depend on BRW_NEW_SURFACES, but
we also need to re-run compute state atoms that depend on
BRW_NEW_SURFACES. But we'll also need to re-run those state atoms the
next time the compute pipeline is run.
To accomplish this, we record two sets of dirty bits, one for each
pipeline. When bits are dirtied (via SET_DIRTY_BIT() or
SET_DIRTY_ALL()) we set them to the dirty state in both pipelines.
When brw_state_upload() is run, we clear the dirty bits just for the
pipeline that was run.
Note that since the number of pipelines is known at compile time to be
2, the compiler should unroll the loops in SET_DIRTY_BIT() and
SET_DIRTY_ALL().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Paul Berry [Fri, 10 Jan 2014 23:40:57 +0000 (15:40 -0800)]
i965: Create a macro for checking a dirty bit.
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Paul Berry [Fri, 10 Jan 2014 22:23:52 +0000 (14:23 -0800)]
i965: Create a macro for setting all dirty bits.
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Paul Berry [Fri, 10 Jan 2014 21:00:51 +0000 (13:00 -0800)]
i965: Create a macro for setting a dirty bit.
This will make it easier to extend dirty bit handling to support
compute shaders.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Dave Airlie [Tue, 2 Sep 2014 00:13:02 +0000 (10:13 +1000)]
i965: add missing parens in vec4 visitor
coverity reported this, Matt said it look like missing parens,
not bad identing, so lets try that.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 1 Sep 2014 23:07:55 +0000 (09:07 +1000)]
nouveau: don't leak dec struct on error
This one path doesn't goto fail, so it seems to leak dec.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 2 Sep 2014 00:03:00 +0000 (10:03 +1000)]
xvmc/tests: %C isn't a valid printf specifier.
Reported-by: Coverity scanner.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 1 Sep 2014 22:55:55 +0000 (08:55 +1000)]
nouveau/nv40: quiten coverity warning in unused vertex texture code.
This fixes the code, but we never run it anyways, so silence coverity.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Ilia Mirkin [Mon, 1 Sep 2014 22:47:01 +0000 (18:47 -0400)]
nv50: remove unused variables
Recent code changes have caused these to no longer be used. Remove them.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Ilia Mirkin [Wed, 20 Aug 2014 06:42:30 +0000 (02:42 -0400)]
mesa: force height of 1D textures to be 1 in texture views
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Ilia Mirkin [Mon, 1 Sep 2014 14:51:08 +0000 (10:51 -0400)]
nv50: attach the buffer bo to the miptree structures
The current code... makes no sense. Use nouveau_bo_ref to attach the bo
to the exposed resource so as to have the proper lifetime guarantees.
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Mon, 1 Sep 2014 14:48:09 +0000 (10:48 -0400)]
nv50: mt address may not be the underlying bo's start address
With VP2, nv50_miptree is faked because the underlying bo's have to be
laid out in a certain way. This is done by adjusting the address. Make
sure that blits (and everything else for consistency) use the mt address
rather than the bo address as a base.
This fixes retrieving chroma plane with VDPAU.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82255
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Mon, 1 Sep 2014 16:48:12 +0000 (12:48 -0400)]
nv50: set the miptree address when clearing bo's in vp2 init
The mt address is about to be used more, make sure it's set
appropriately.
Reported-by: Emil Velikov <emil.l.velikov@gmail.com>
Tested-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Mon, 1 Sep 2014 14:55:27 +0000 (10:55 -0400)]
nv50/ir: avoid creating instructions that can't be emitted
When constant folding a MAD operation, we first fold the multiply and
generate an ADD. However we do so without making sure that the immediate
can be handled in the saturate case. If it can't, load the immediate in
a separate instruction.
Reported-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Mon, 1 Sep 2014 04:43:06 +0000 (00:43 -0400)]
nvc0: don't make 1d staging textures linear
Experimentally, the sampler doesn't appear to like these, neither as
buffer nor as rect textures. So remove 1D from the list of texture types
to make linear when used for staging.
This fixes the OSD in mplayer for VDPAU.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Sat, 30 Aug 2014 17:35:47 +0000 (13:35 -0400)]
nv50: zero out unbound samplers
Samplers are only defined up to num_samplers, so set all samplers above
nr to NULL so that we don't try to read them again later.
Tested-by: Christian Ruppert <idl0r@qasl.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Ilia Mirkin [Fri, 29 Aug 2014 03:05:49 +0000 (23:05 -0400)]
nvc0/ir: avoid infinite recursion when finding first uses of tex
In certain circumstances, findFirstUses could end up doubling back on
instructions it had already processed, resulting in an infinite
recursion. Avoid this by keeping track of already-visited instructions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83079
Tested-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Rob Clark [Mon, 1 Sep 2014 16:37:26 +0000 (12:37 -0400)]
freedreno/ir3: add DDX/DDY
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Mon, 1 Sep 2014 16:36:34 +0000 (12:36 -0400)]
freedreno/ir3: don't keep IR around
Once we've assembled the shader, no need to keep the intermediate
around.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Jason Ekstrand [Fri, 29 Aug 2014 18:23:55 +0000 (11:23 -0700)]
i965/fs: Don't segfault when debug-logging a null program
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Thu, 28 Aug 2014 04:49:50 +0000 (21:49 -0700)]
i965/vec4: Don't segfault when debug-logging a null program
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek Olšák [Sat, 23 Aug 2014 14:46:53 +0000 (16:46 +0200)]
radeonsi: implement EXPCLEAR optimization for depth
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 23 Aug 2014 13:48:21 +0000 (15:48 +0200)]
r600g,radeonsi: initialize HTILE to fully-expanded state
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 23 Aug 2014 01:39:08 +0000 (03:39 +0200)]
radeonsi: implement fast depth clear
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 23 Aug 2014 01:25:29 +0000 (03:25 +0200)]
radeonsi: move DB_RENDER_CONTROL into draw_vbo
So that I can add fast depth clear.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 23 Aug 2014 09:12:01 +0000 (11:12 +0200)]
radeonsi: disable occlusion queries if they are not needed
We always left them enabled, which turned off HiZ in some cases.
This should improve performace with Hyper-Z.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Sat, 23 Aug 2014 00:03:58 +0000 (02:03 +0200)]
r600g,radeonsi: force fast stencil and HTILE stencil off, fixing a Hyper-Z hang
This should be as fast as no HTILE for stencil. I think we can still get full
performance with depth-only rendering even if stencil is present in the buffer
but not used, but I'm not 100% sure. This may be revisited when HiS and fast
stencil clear are implemented.
This fixes a hang in Brutal Legend.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64471
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 20 Aug 2014 21:58:24 +0000 (23:58 +0200)]
r600g: set VGT_ENHANCE=4 on R7xx
This is a golden setting on RV740, but there is a hw bug which recommends
setting it on all R7xx chipsets.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 20 Aug 2014 17:17:39 +0000 (19:17 +0200)]
r600g: expose AMD_vertex_shader_layer and *_viewport_index on R600-R700
already implemented
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 20 Aug 2014 17:17:09 +0000 (19:17 +0200)]
r600g: fix layered clear
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 20 Aug 2014 15:22:41 +0000 (17:22 +0200)]
r600g: some DB bug workarounds for R6xx DB flushing
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 20 Aug 2014 12:36:53 +0000 (14:36 +0200)]
r600g: enable fast depth clear for array textures and cubemaps
I have a piglit test that hits this.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Tue, 19 Aug 2014 23:34:37 +0000 (01:34 +0200)]
r600g: use HTILE allocator from SI
It's almost the same.
This enables tiling for HTILE. It also enables Hyper-Z for other texture
targets (1D, 1D_ARRAY, 2D_ARRAY, CUBE, CUBE_ARRAY, 3D, RECT).
2D array depth textures are tested by Unigine Sanctuary and my new piglit
test.
Acked-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Wed, 20 Aug 2014 10:52:09 +0000 (12:52 +0200)]
r600g: set DB_DEPTH_SIZE.HEIGHT_TILE_MAX for EG/CM, inline other fields
This fixes rendering to non-zero layer/face/slice with HTILE.
v2: added the assertion
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Marek Olšák [Tue, 19 Aug 2014 14:22:12 +0000 (16:22 +0200)]
radeonsi: set DB_DEPTH_SIZE.HEIGHT_TILE_MAX, inline other fields
This fixes rendering to a non-zero layer/face/slice with HTILE.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=72685
v2: added the assertion
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Glenn Kennard [Mon, 25 Aug 2014 09:05:06 +0000 (11:05 +0200)]
r600g: Implement sm5 geometry shader instancing
Requires Evergreen or later hardware.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Marek Olšák [Sat, 23 Aug 2014 22:56:12 +0000 (00:56 +0200)]
glsl_to_tgsi: allocate and enlarge arrays for temporaries on demand
This fixes crashes if the number of temporaries is greater than 4096.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=66184
v2: added fail paths for realloc failures
Cc: 10.2 10.3 mesa-stable@lists.freedesktop.org
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Marek Olšák [Wed, 20 Aug 2014 21:53:40 +0000 (23:53 +0200)]
gallium/pb_bufmgr_cache: limit the size of cache
This should make a machine which is running piglit more responsive at times.
e.g. streaming-texture-leak can easily eat 600 MB because of how fast it
creates new textures.
Marek Olšák [Tue, 19 Aug 2014 22:34:18 +0000 (00:34 +0200)]
pipe-loader: use the correct screen index
Marek Olšák [Tue, 19 Aug 2014 22:33:34 +0000 (00:33 +0200)]
egl/dri2: use the correct screen index
Required for multi-GPU configuration where each GPU has its own X screen.
Jordan Justen [Wed, 27 Aug 2014 20:22:12 +0000 (13:22 -0700)]
docs: Mark ARB_compute_shader as work in progress
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Connor Abbott [Mon, 4 Aug 2014 22:20:37 +0000 (15:20 -0700)]
i965/fs: don't use ir->shadow_comparitor in emit_texture_*
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Connor Abbott [Tue, 5 Aug 2014 18:10:07 +0000 (11:10 -0700)]
i965/fs: don't pass ir_variable * to emit_samplepos_setup()
We were only using it to get at its type, which we already know because
it's a builtin variable.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Connor Abbott [Tue, 5 Aug 2014 17:29:00 +0000 (10:29 -0700)]
i965/fs: don't pass ir_variable * to emit_frontfacing_interpolation()
We were only using it to get at its type, which we already know because
it's a builtin variable.
v2 (Ken): Rebase on Matt's optimized gl_FrontFacing calculations.
Signed-off-by: Connor Abbott <connor.abbott@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Sat, 30 Aug 2014 06:10:47 +0000 (23:10 -0700)]
i965: Fix GPU hangs when INTEL_DEBUG=no16 is set.
The replicated data clear shader needs to be SIMD16, or else the GPU
will hang. So, compile it even if INTEL_DEBUG=no16 is set.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Emil Velikov [Sun, 31 Aug 2014 22:16:15 +0000 (23:16 +0100)]
mesa: fix make tarballs
Current method of generating distribution tar-balls involves manually
invoking make + target name in the appropriate places. This temporary
solution is used until we get 'make dist' working.
Currently it does not work, as in order to have the target (which is
also a filename) available in the final Makefile we need to add a PHONY
target + use the correct target name.
Cc: "10.2 10.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Abdiel Janulgue [Mon, 16 Jun 2014 20:56:10 +0000 (13:56 -0700)]
i965/vec4: Remove try_emit_saturate
Now that saturate is implemented natively as an instruction,
we can cut down on unneeded functionality.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Mon, 16 Jun 2014 19:28:00 +0000 (12:28 -0700)]
i965/fs: Refactor try_emit_saturate
v3: Since the fs backend can emit saturate as a separate instruction, there is
no need to detect for min/max instructions and to rewrite the instruction tree
accordingly. On the other hand, we don't need to emit a separate saturated
mov either when the expression generating src can do saturate directly.
v4: Add can_do_saturate() check before enabling saturate modifer (Ken)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Mon, 16 Jun 2014 19:28:44 +0000 (12:28 -0700)]
ir_to_mesa, glsl_to_tgsi: Remove try_emit_saturate
Now that saturate is implemented natively as instruction,
we can cut down on unneeded functionality.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Fri, 4 Jul 2014 11:52:36 +0000 (04:52 -0700)]
i965/vec4: Allow propagation of instructions with saturate flag to sel
When sel conditon is bounded within 0 and 1.0. This allows code as:
mov.sat a b
sel.ge dst a 0.25F
To be propagated as:
sel.ge.sat dst b 0.25F
v3: - Syntax clarifications in inst->saturate assignment
- Remove extra parenthesis when assigning src_reg value
from copy_entry (Matt Turner)
v4: - Take channels into consideration when propagating saturated instructions.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Thu, 3 Jul 2014 11:14:39 +0000 (04:14 -0700)]
i965/fs: Allow propagation of instructions with saturate flag to sel
When sel conditon is bounded within 0 and 1.0. This allows code as:
mov.sat a b
sel.ge dst a 0.25F
To be propagated as:
sel.ge.sat dst b 0.25F
v3: Syntax clarifications in inst->saturate assignment (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Tue, 8 Jul 2014 11:12:50 +0000 (14:12 +0300)]
glsl: Optimize clamp(x, b, 1.0), where b > 0.0 as max(saturate(x),b)
v2: - Output max(saturate(x),b) instead of saturate(max(x,b))
- Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is > 0.0 and
inner constant is 1.0.
- Fix comments to show that the optimization is a commutative operation
(Matt Turner)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Fri, 20 Jun 2014 05:17:20 +0000 (22:17 -0700)]
glsl: Optimize clamp(x, 0.0, b), where b < 1.0 as min(saturate(x),b)
v2: - Output min(saturate(x),b) instead of saturate(min(x,b)) suggested by Ilia Mirkin
- Make sure we do component-wise comparison for vectors (Ian Romanick)
v3: - Add missing condition where the outer constant value is zero and
inner constant is < 1
- Fix comments to reflect we are doing a commutative operation (Matt Turner)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Fri, 20 Jun 2014 05:15:14 +0000 (22:15 -0700)]
glsl: Optimize clamp(x, 0, 1) as saturate(x)
v2: - Check that the base type is float (Ian Romanick)
v3: - Make sure comments reflect that we are doing a commutative operation
- Add missing condition where the inner constant is 1.0 and outer constant is 0.0
- Make indexing of operands easier to read (Matt Turner)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Fri, 20 Jun 2014 23:55:03 +0000 (16:55 -0700)]
glsl: Implement saturate as ir_unop_saturate
Now that we have the ir_unop_saturate implemented as a single
instruction, generate the correct simplified expression.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Mon, 16 Jun 2014 20:56:50 +0000 (13:56 -0700)]
yi965/vec4: Add support for ir_unop_saturate
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Mon, 16 Jun 2014 17:35:44 +0000 (10:35 -0700)]
i965/fs: Add support for ir_unop_saturate
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Mon, 16 Jun 2014 18:14:32 +0000 (11:14 -0700)]
ir_to_mesa, glsl_to_tgsi: Add support for ir_unop_saturate
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Mon, 16 Jun 2014 19:16:57 +0000 (12:16 -0700)]
ir_to_mesa, glsl_to_tgsi: lower ir_unop_saturate
Needed when vertex programs doesn't allow saturate
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Thu, 12 Jun 2014 21:59:30 +0000 (14:59 -0700)]
glsl: Add a pass to lower ir_unop_saturate to clamp(x, 0, 1)
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Abdiel Janulgue [Thu, 12 Jun 2014 20:53:40 +0000 (13:53 -0700)]
glsl: Add constant evaluation of ir_unop_saturate
v2: Use CLAMP macro (Ian Romanick)
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Abdiel Janulgue [Fri, 20 Jun 2014 18:56:48 +0000 (11:56 -0700)]
glsl: Add ir_unop_saturate
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Abdiel Janulgue [Wed, 6 Aug 2014 08:27:58 +0000 (11:27 +0300)]
i965/vec4/fs: Count loops in shader debug
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Abdiel Janulgue [Fri, 29 Aug 2014 16:07:08 +0000 (19:07 +0300)]
i965/vec4: inline generate_vec4_instruction() within generate_code()
Suggested by Matt. This patch combines and moves back the code-generation
functions from generate_vec4_instruction() into generate_code(). Makes
generate_code() a bit larger, but helps us to count loops in a
straightforward manner.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
Kenneth Graunke [Fri, 29 Aug 2014 22:15:43 +0000 (15:15 -0700)]
i965: Add 2x MSAA support to Broadwell fast clear code.
According to the cited documentation section (but in the newer docs),
x_scaledown is the same for 2x and 4x MSAA.
+47 piglits.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83081
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: "10.3" <mesa-stable@lists.freedesktop.org>
Matt Turner [Fri, 29 Aug 2014 03:16:42 +0000 (20:16 -0700)]
i965/vec4: Update register coalescing test.
In commit
04895f5c I added support for reswizzling writemasks. This test
was checking that we didn't support this.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82881
Matt Turner [Sat, 30 Aug 2014 16:42:18 +0000 (09:42 -0700)]
i965: Use unreachable() to silence warning.
brw_meta_fast_clear.c:211:17: warning: 'x_scaledown' may be used
uninitialized in this function [-Wmaybe-uninitialized]
unsigned int x_scaledown, y_scaledown;
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Chia-I Wu [Sun, 31 Aug 2014 00:12:27 +0000 (08:12 +0800)]
ilo: set INTEL_RELOC_GGTT only on GEN6
We asked MI commands to use GGTT only on GEN6.
Chia-I Wu [Sat, 30 Aug 2014 17:34:41 +0000 (01:34 +0800)]
ilo: fix bound check for 3DSTATE_URB_VS
Fix max/min entries on GEN7.5 GT2/GT3.
Chia-I Wu [Sat, 30 Aug 2014 23:22:01 +0000 (07:22 +0800)]
ilo: replace cmd by dw0 in GPE
With
e3c251071b0c9396c3ec76d1cf943c60ae297281, the magic values are gone. We
no longer need "cmd" to hide them. Replace it by dw0.
Alexander von Gluck IV [Fri, 29 Aug 2014 15:06:09 +0000 (15:06 +0000)]
st/hgl: Move st_visual create/destroy into hgl state_tracker
Alexander von Gluck IV [Fri, 29 Aug 2014 14:42:26 +0000 (14:42 +0000)]
st/hgl: Move st_manager create/destroy into hgl state_tracker
Rob Clark [Sat, 30 Aug 2014 20:51:46 +0000 (16:51 -0400)]
freedreno/ir3: fix potential null ptr deref
Fix potential segfault in debug code.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Sat, 30 Aug 2014 19:17:49 +0000 (15:17 -0400)]
freedreno/ir3: add TXB
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Rob Clark [Fri, 29 Aug 2014 14:51:40 +0000 (10:51 -0400)]
freedreno/ir3: detect scheduler fail
There are some cases where the scheduler can get itself into impossible
situations, by scheduling the wrong write to pred or addr register
first. (Ie. it could end up being unable to schedule any instruction if
some instruction which depends on the current addr/reg value also
depends on another addr/reg value.)
To solve this we'd need to be able to insert extra mov instructions
(which would also help when register assignment gets into impossible
situations). To do that, we'd need to move the nop padding from sched
into legalize.
But to start with, just detect when we get into an impossible situation
and bail, rather than sitting forever in an infinite loop. This way it
will at least fall back to the old compiler, which might even work if
you are lucky.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Ian Romanick [Mon, 14 Jul 2014 22:48:38 +0000 (15:48 -0700)]
glsl: Use bit-flags image attributes and uint16_t for the image format
All of the GL image enums fit in 16-bits.
Also move the fields from the anonymous "image" structucture to the next
higher structure. This will enable packing the bits with the other
bitfield.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,465 0
After (32-bit): 70 40,577,421,777 68,487,584 62,973,695 5,513,889 0
Before (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,528 0
After (64-bit): 74 37,124,603,758 95,891,808 88,466,712 7,425,096 0
A real savings of 346KiB on 32-bit and 262KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Mon, 14 Jul 2014 22:48:37 +0000 (15:48 -0700)]
glsl: Use a single bit for the dual-source blend index
The only values allowed are 0 and 1, and the value is checked before
assigning.
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,217 0
After (32-bit): 76 40,572,916,873 68,831,248 63,328,783 5,502,465 0
Before (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,320 0
After (64-bit): 60 36,822,640,058 96,526,824 88,735,296 7,791,528 0
A real savings of 173KiB on 32-bit and no change on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Mon, 14 Jul 2014 22:48:36 +0000 (15:48 -0700)]
glsl: Eliminate ir_variable::data.atomic.buffer_index
Just use ir_variable::data.binding... because that's the where the
binding is stored for everything else that can use layout(binding=).
Valgrind massif results for a trimmed apitrace of dota2:
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
Before (32-bit): 50 40,564,927,443 69,185,408 63,683,871 5,501,537 0
After (32-bit): 74 40,580,119,657 69,186,544 63,506,327 5,680,217 0
Before (64-bit): 59 36,822,048,449 96,526,888 89,113,000 7,413,888 0
After (64-bit): 89 36,822,971,897 96,526,616 88,735,296 7,791,320 0
A real savings of 173KiB on 32-bit and 368KiB on 64-bit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Thu, 28 Aug 2014 03:14:54 +0000 (20:14 -0700)]
mesa: Delete ctx->GeometryProgram.Cache.
The VertexProgram and FragmentProgram have a Cache member for dealing
with fixed function programs. There are no fixed function geometry
programs, so this should never have existed, and was just copy and
pasted.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Roland Scheidegger [Fri, 29 Aug 2014 03:12:00 +0000 (05:12 +0200)]
gallivm: fix somewhat broken NaN behavior for exp2
I actually screwed that up in
754319490f6946a9ad5ee619822d5fe4254e6759,
mistakenly thinking the code actually wanted the non-nan result before.
So, introduce that missing nan behavior case and use that instead.
For sse, there's no actual change in the resulting code at all, the fallback
code wouldn't have done the right thing though.
Of course, the actual issue I saw with pow() was completely unrelated...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Thu, 28 Aug 2014 03:13:35 +0000 (05:13 +0200)]
softpipe: handle vertex texture sampling when using llvm for draw
Pretty trivial, just fill in the offsets and such. The implementation
is near 100% copy and paste from llvmpipe. Should be useful for debugging.
No piglit change when not using SOFTPIPE_USE_LLVM=1.
Now that it can do the same tests with and without using llvm for vs/gs,
with llvm more pass, the only things failing only with llvm seems to be
edgeflags tests and vs/gs-pow-float-float (and for the latter I'm not
convinced the zero tolerance it requires is somehow mandated by glsl).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Thu, 28 Aug 2014 01:38:55 +0000 (03:38 +0200)]
llvmpipe: (trivial) enable cube map arrays
The code is all in place now so enable it.
Seems to pass all relevant piglit tests (just like cube maps, some of the
cube map array tests need GALLIVM_DEBUG=no_quad_lod,no_rho_approx)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Fri, 29 Aug 2014 23:32:59 +0000 (01:32 +0200)]
gallivm: handle cube map arrays for texture sampling
Pretty easy, just make sure that all paths testing for PIPE_TEXTURE_CUBE
also recognize PIPE_TEXTURE_CUBE_ARRAY, and add the layer * 6 calculation
to the calculated face.
Also handle it for texture size query, looks like OpenGL wants the number
of cubes, not layers (so need division by 6).
No piglit regressions.
v2: fix up adding cube layer to face for seamless filtering (needs to happen
after calculating per-sample face). Undetected by piglit unfortunately.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com> (v1)
Roland Scheidegger [Wed, 27 Aug 2014 23:06:11 +0000 (01:06 +0200)]
draw: kill off bogus assertion in tgsi_fetch_gs_outputs
Not sure why it was there but it is definitely not an error if gs outputs are
infs/nans. Besides, the outputs can be ints, in which case any small negative
number asserted.
This fixes piglit's texelFetch gs isamplerXX crashes with softpipe (down from
14 to 2).
Bug https://bugs.freedesktop.org/show_bug.cgi?id=80012
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Wed, 27 Aug 2014 22:43:32 +0000 (00:43 +0200)]
softpipe: don't assert on illegal wrap mode for rect textures
piglit tex-miplevel-selection nowadays doesn't use repeat wrap mode due to
sampler objects any longer, however at the time of the clear the wrap mode
is still illegal and at this point we get to verify the state, including
samplers (even though they won't get used), and because mesa doesn't treat
it as an incomplete texture as the spec says it should, we hit the assertion.
Just warn about this for now instead.
Gets crashes down from 44 to 14 in a piglit run (all were in various tests of
tex-miplevel-selection with texture rectangles). Though just about all
tex-miplevel-selection tests fail anyway for other reasons.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Wed, 27 Aug 2014 22:41:05 +0000 (00:41 +0200)]
tgsi: (trivial) fix handling msaa resources on TXF
Just handle as ordinary 2d / 2d array resources. Prevents an assertion failure
with softpipe and piglit glsl-resource-not-bound 2DMS/2DMSArray tests.
While here also fix TXD shadowCube similarly, which fixes the crash with piglit
tex-miplevel-selection textureGrad CubeShadow (the test will still fail due to
softpipe being broken).
This fixes https://bugs.freedesktop.org/show_bug.cgi?id=80011
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Tue, 26 Aug 2014 23:39:44 +0000 (01:39 +0200)]
draw: remove fishy num_samplers/num_sampler_views check in llvm path
This was meant for softpipe to not crash at some point if vertex texturing
was used. It is, however, fishy because it uses values from
draw_set_samplers/draw_set_sampler_views and not from the shader key. Albeit
we should still in all cases actually generate a new shader if this changes
(because the samplers and views themselves are in the key) I don't want to
think again wondering if that's really correct in the future.
Besides, at least today, it does not actually work for softpipe, as this was
relying on softpipe not actually calling draw_set_samplers/sampler_views at
all - I've verified it crashes regardless (if there were a tex instruction in
the vs, which normally should not happen anyway). For drivers which do indeed
not call these functions because they don't support vertex texturing at all
(r300), this should still not crash because the static texture data is all
zero, which causes the sampling functions to take an early out (same as is done
if no texture is bound at the slot used for sampling - verified with hacked up
softpipe).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Thu, 28 Aug 2014 13:54:52 +0000 (15:54 +0200)]
mesa: fix fallback texture for cube map array
mesa was creating a cube map array texture with just one layer, which is
not legal. This caused an assertion failure when using that texture later
in llvmpipe (when enabling cube map arrays) since it verifies the number
of layers in the view is divisible by 6 (the sampling code might well crash
randomly otherwise) with piglit glsl-resource-not-bound CubeArray -fbo -auto.
v2: use appropriately sized texel array...
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Aaron Watry [Wed, 20 Aug 2014 23:24:00 +0000 (18:24 -0500)]
r600/compute: Don't leak compute pool item_list/unallocated_list
v3: Fix multi-line comment format
v2: Change to C-style comments and fix indentation
Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Bruno Jiménez <brunojimen@gmail.com>
Michel Dänzer [Thu, 28 Aug 2014 02:12:20 +0000 (11:12 +0900)]
u_vbuf: Make sure all caps are initialized
Pointed out by valgrind.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83148
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Michel Dänzer [Fri, 29 Aug 2014 03:09:16 +0000 (12:09 +0900)]
r600g: Reinstate include path to common radeon source directory
Fixes build failure since commit
a131263a2f19507ca0d2f6093672d930a7c054d1
('gallium/radeon: cleanup header inclusion'):
../../../../../src/gallium/drivers/r600/evergreen_compute.c:50:30: fatal error: radeon_llvm_util.h: No such file or directory
#include "radeon_llvm_util.h"
^
compilation terminated.
Trivial.
Matt Turner [Thu, 28 Aug 2014 01:39:35 +0000 (18:39 -0700)]
i965: Mark BRW_CONDITIONAL_R as Gen <= 5.
Matt Turner [Tue, 26 Aug 2014 01:40:24 +0000 (18:40 -0700)]
i965/disasm: Show jump count for if/iff/halt.
These instructions don't have pop count.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sun, 24 Aug 2014 06:59:30 +0000 (23:59 -0700)]
i965/disasm: Disassemble JMPI's source properly.
The source can be a register as well as an immediate, and disassembling
a register as an immediate can have some strange results.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Fri, 22 Aug 2014 00:01:15 +0000 (17:01 -0700)]
i965/disasm: Add break/cont/halt to list of has_uip().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Matt Turner [Sun, 24 Aug 2014 21:49:51 +0000 (14:49 -0700)]
i965/disasm: Disassemble Z/NZ conditional modifiers as .z/.nz.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>