mesa.git
9 years ago st/nine: Handle RSQ special cases
Axel Davy [Wed, 3 Dec 2014 14:28:42 +0000 (15:28 +0100)]
st/nine: Handle RSQ special cases

We should use the absolute value of the input as input to ureg_RSQ.

Moreover, an input of 0.0 should return FLT_MAX.

Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Fix POW implementation
Axel Davy [Wed, 3 Dec 2014 14:14:07 +0000 (15:14 +0100)]
st/nine: Fix POW implementation

POW doesn't map directly to TGSI, since we should
take the absolute value of src0.

Fixes black textures in some games

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Fix typo for M4x4
Axel Davy [Fri, 26 Dec 2014 10:02:08 +0000 (11:02 +0100)]
st/nine: Fix typo for M4x4

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
9 years ago st/nine: Correctly declare NineTranslateInstruction_Mkxn inputs
Axel Davy [Fri, 26 Dec 2014 08:22:26 +0000 (09:22 +0100)]
st/nine: Correctly declare NineTranslateInstruction_Mkxn inputs

Let's say we have c1 and c2 declared in the shader and c0 given by the app.

Then here we would have read c0, c1 and c2 as given by the app, instead
of the correct c0 from the app and c1, c2 from the shader.

This correction fixes several issues in some games.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Saturate oFog and oPts vs outputs
Axel Davy [Wed, 3 Dec 2014 13:47:24 +0000 (14:47 +0100)]
st/nine: Saturate oFog and oPts vs outputs

According to docs and Wine, these two vs outputs have
to be saturated.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
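For reference, "saturated" in the commit above means clamped to the range [0, 1]. A minimal sketch of that operation follows; `example_saturate` is a hypothetical helper, not a Mesa function.

```c
/* Hypothetical helper: clamp a value to [0, 1], which is what
 * saturating a shader output amounts to. */
static float example_saturate(float v)
{
    if (v < 0.0f)
        return 0.0f;
    if (v > 1.0f)
        return 1.0f;
    return v;
}
```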
9 years ago st/nine: Remove some shader unused code
Axel Davy [Mon, 22 Dec 2014 17:44:06 +0000 (18:44 +0100)]
st/nine: Remove some shader unused code

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Convert integer constants to floats before storing them when cards don't...
Axel Davy [Fri, 2 Jan 2015 12:42:11 +0000 (13:42 +0100)]
st/nine: Convert integer constants to floats before storing them when cards don't support integers

The shader code already behaves as if they are floats when the card doesn't support integers.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Rework of boolean constants
Axel Davy [Fri, 2 Jan 2015 12:00:06 +0000 (13:00 +0100)]
st/nine: Rework of boolean constants

Convert them to shader booleans at an earlier stage.
The previous code is fine, but a later patch will convert
integers at an earlier stage, so do
the same for booleans.

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Add ATI1 and ATI2 support
Axel Davy [Sun, 7 Dec 2014 17:11:40 +0000 (18:11 +0100)]
st/nine: Add ATI1 and ATI2 support

Adds ATI1 and ATI2 support to nine.

They map to PIPE_FORMAT_RGTC1_UNORM and PIPE_FORMAT_RGTC2_UNORM,
but need special handling.

Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Check if srgb format is supported before trying to use it.
Axel Davy [Wed, 3 Dec 2014 22:33:07 +0000 (23:33 +0100)]
st/nine: Check if srgb format is supported before trying to use it.

According to MSDN, we must act as if the user didn't ask for srgb if we don't
support it.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Hack to generate resource if it doesn't exist when getting view
Stanislaw Halik [Thu, 4 Dec 2014 15:52:22 +0000 (16:52 +0100)]
st/nine: Hack to generate resource if it doesn't exist when getting view

Buffers in the MANAGED pool are supposed to keep their content in a RAM buffer,
with a copy in VRAM if there is enough memory (the driver manages memory and
decides when to delete the buffer in VRAM).

This is not implemented properly in nine: a VRAM copy is created
when the RAM memory is filled, and the VRAM copy gets synced with the RAM
memory updates.

Due to some issues (in the implementation or in app logic), it can happen
that we try to create a sampler view of the resource before we have created the
VRAM resource. This hack creates the resource when we hit this case, which prevents
crashing, but doesn't help with the resource content.

This fixes several games crashing at launch.

Acked-by: Axel Davy <axel.davy@ens.fr>
Acked-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Stanislaw Halik <sthalik@misaki.pl>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: NineBaseTexture9: update sampler view creation
Axel Davy [Tue, 2 Dec 2014 22:33:37 +0000 (23:33 +0100)]
st/nine: NineBaseTexture9: update sampler view creation

While the previous code behaved correctly in general,
this new code is more readable (without checking all gallium formats
manually) and has better-defined behaviour for depth stencil resources.

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Return D3DERR_INVALIDCALL when trying to create a texture of bad format
Axel Davy [Sun, 7 Dec 2014 15:51:49 +0000 (16:51 +0100)]
st/nine: Return D3DERR_INVALIDCALL when trying to create a texture of bad format

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
9 years ago st/nine: Fix crash when deleting non-implicit swapchain
Axel Davy [Tue, 2 Dec 2014 23:07:26 +0000 (00:07 +0100)]
st/nine: Fix crash when deleting non-implicit swapchain

The implicit swapchains are destroyed when the device instance is
destroyed. However, this is not the case for non-implicit swapchains,
and the application may have kept a reference on the swapchain
buffers to reuse them.

Fixes problems with battle.net launcher.

Cc: "10.4" <mesa-stable@lists.freedesktop.org>
Tested-by: Nick Sarnie <commendsarnex@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
9 years ago st/nine: CubeTexture: fix GetLevelDesc
Axel Davy [Tue, 2 Dec 2014 21:44:37 +0000 (22:44 +0100)]
st/nine: CubeTexture: fix GetLevelDesc

This->surfaces contains the surfaces associated to the levels
and faces. This->surfaces[6*Level] is what we want here,
since it gives us a face descriptor for the level 'Level'.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
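The indexing mentioned in the commit above can be sketched as follows. It assumes the surface array stores the six faces of each level contiguously, which is what the index 6*Level in the message implies; `cube_surface_index` is a hypothetical helper, not a nine function.

```c
#include <stddef.h>

enum { EXAMPLE_CUBE_FACES = 6 };

/* Hypothetical helper: index of a (level, face) pair in a flat array
 * laid out as level 0 faces 0..5, then level 1 faces 0..5, and so on. */
static size_t cube_surface_index(unsigned level, unsigned face)
{
    return (size_t)level * EXAMPLE_CUBE_FACES + face;
}
```

Under that layout, face 0 of level `Level` sits at index 6*Level. Any face of a level would serve for a level description if all faces of a level share the same dimensions.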
9 years ago st/nine: NineBaseTexture9: fix setting of last_layer
Axel Davy [Tue, 2 Dec 2014 21:18:30 +0000 (22:18 +0100)]
st/nine: NineBaseTexture9: fix setting of last_layer

Use settings similar to u_sampler_view_default_template

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Correctly advertise D3DPMISCCAPS_CLIPTLVERTS
Axel Davy [Thu, 25 Dec 2014 10:04:10 +0000 (11:04 +0100)]
st/nine: Correctly advertise D3DPMISCCAPS_CLIPTLVERTS

The cap means D3DFVF_XYZRHW vertices will see clipping.
This is not the case when
PIPE_CAP_TGSI_VS_WINDOW_SPACE_POSITION is supported, since
it'll disable clipping.

Reviewed-by: Tiziano Bacocco <tizbac2@gmail.com>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Fix D3DRS_POINTSPRITE support
Xavier Bouchoux [Wed, 17 Dec 2014 22:10:04 +0000 (23:10 +0100)]
st/nine: Fix D3DRS_POINTSPRITE support

It's done by testing the existence of the point sprite output register *after* parsing the vertex shader.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Add new texture format strings
Axel Davy [Sun, 7 Dec 2014 15:46:28 +0000 (16:46 +0100)]
st/nine: Add new texture format strings

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Add missing c++ declaration for IDirect3DVolumeTexture9
Xavier Bouchoux [Mon, 8 Dec 2014 22:28:28 +0000 (23:28 +0100)]
st/nine: Add missing c++ declaration for IDirect3DVolumeTexture9

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Additional defines to d3dtypes.h
Xavier Bouchoux [Mon, 8 Dec 2014 22:31:13 +0000 (23:31 +0100)]
st/nine: Additional defines to d3dtypes.h

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Xavier Bouchoux <xavierb@gmail.com>
Cc: "10.4" <mesa-stable@lists.freedesktop.org>
9 years ago st/nine: Fix clip state logic
Axel Davy [Sun, 21 Dec 2014 12:03:47 +0000 (13:03 +0100)]
st/nine: Fix clip state logic

The clip state was reset every time, incurring an overhead.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Signed-off-by: Axel Davy <axel.davy@ens.fr>
9 years ago st/nine: query: remove unused variable (trivial)
David Heidelberger [Sun, 28 Dec 2014 01:44:53 +0000 (02:44 +0100)]
st/nine: query: remove unused variable (trivial)

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: David Heidelberg <david@ixit.cz>
9 years ago nir: Fix setup of constant bool initializers.
Eric Anholt [Wed, 21 Jan 2015 23:32:48 +0000 (15:32 -0800)]
nir: Fix setup of constant bool initializers.

brw_fs_nir has only seen scalar bools so far, thanks to vector splitting,
and the ralloc in glsl_to_nir.cpp will *usually* get you a 0-filled
chunk of memory, so reading too large of a value will usually get you the
right bool value.  But once we start doing vector bools in a few commits,
we end up getting bad values.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years ago nir: Make an easier helper for setting up SSA defs.
Eric Anholt [Wed, 21 Jan 2015 00:23:51 +0000 (16:23 -0800)]
nir: Make an easier helper for setting up SSA defs.

Almost all instructions we nir_ssa_def_init() for are nir_dests, and you
have to keep from forgetting to set is_ssa when you do.  Just provide the
simpler helper, instead.

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years ago glsl: Link glsl_test with pthreads library.
Jonathan Gray [Fri, 9 Jan 2015 01:33:17 +0000 (12:33 +1100)]
glsl: Link glsl_test with pthreads library.

Otherwise pthread_mutex_lock will be an undefined reference
on OpenBSD.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88219
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
9 years ago scons: Add X11 include path if X11 is available.
Vinson Lee [Mon, 19 Jan 2015 20:54:44 +0000 (12:54 -0800)]
scons: Add X11 include path if X11 is available.

Mac OS X XQuartz places X11 headers at /opt/X11/include.

This patch fixes this Mac OS X SCons build error.

  Compiling src/gallium/state_trackers/glx/xlib/glx_api.c ...
In file included from src/gallium/state_trackers/glx/xlib/glx_api.c:34:
include/GL/glx.h:30:10: fatal error: 'X11/Xlib.h' file not found
         ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
9 years ago meta: Move loop declaration to top of block.
José Fonseca [Thu, 22 Jan 2015 20:05:48 +0000 (20:05 +0000)]
meta: Move loop declaration to top of block.

Fixes MSVC build.

Trivial.

9 years ago i965/tex_subimage: use meta instead of the blitter for PBO TexSubImage
Jason Ekstrand [Tue, 13 Jan 2015 00:22:30 +0000 (16:22 -0800)]
i965/tex_subimage: use meta instead of the blitter for PBO TexSubImage

Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years ago i965/tex_image: Use meta instead of the blitter for PBO TexImage and GetTexImage
Jason Ekstrand [Tue, 13 Jan 2015 00:21:17 +0000 (16:21 -0800)]
i965/tex_image: Use meta instead of the blitter for PBO TexImage and GetTexImage

Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years ago i965/pixel_read: Use meta_pbo_GetTexSubImage for PBO ReadPixels
Jason Ekstrand [Tue, 13 Jan 2015 00:20:27 +0000 (16:20 -0800)]
i965/pixel_read: Use meta_pbo_GetTexSubImage for PBO ReadPixels

Since the meta path can do strictly more than the blitter path, we just
remove the blitter path entirely.

Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years ago meta: Add an implementation of GetTexSubImage for PBOs
Jason Ekstrand [Mon, 12 Jan 2015 23:39:59 +0000 (15:39 -0800)]
meta: Add an implementation of GetTexSubImage for PBOs

Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years ago meta: Add a BlitFramebuffers-based implementation of TexSubImage
Jason Ekstrand [Tue, 6 Jan 2015 02:17:04 +0000 (18:17 -0800)]
meta: Add a BlitFramebuffers-based implementation of TexSubImage

This meta path, designed for use with PBO's, creates a temporary texture
out of the PBO and uses BlitFramebuffers to do the actual texture upload.

v2 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Add support for handling simple packing options

v3 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Refactor to split out the texture-from-pbo code
 - Rename to _mesa_meta_pbo_TexSubImage

Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years ago formats: Use a hash table for _mesa_format_from_array_format
Jason Ekstrand [Mon, 12 Jan 2015 22:43:34 +0000 (14:43 -0800)]
formats: Use a hash table for _mesa_format_from_array_format

Going through the for loop every time has noticeable overhead.  This fixes
things up so we only do that once ever and then just do a hash table lookup
which should be much cheaper.

v2 Jason Ekstrand <jason.ekstrand@intel.com>:
 - Use once_flag and call_once from c11/threads.h instead of pthreads

Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years ago i965: Implement SetTextureStorageForBufferObject
Jason Ekstrand [Thu, 8 Jan 2015 05:13:49 +0000 (21:13 -0800)]
i965: Implement SetTextureStorageForBufferObject

Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years ago i965: Apply the miptree offset to surface state for renderbuffers
Jason Ekstrand [Tue, 13 Jan 2015 17:50:37 +0000 (09:50 -0800)]
i965: Apply the miptree offset to surface state for renderbuffers

Previously, we were completely ignoring the mt->offset field for
renderbuffers.  While it does have some alignment constraints, it is valid
to use it.  This patch adds the code to each of the 4 surface state setup
functions to handle it.

Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years ago i965/mipmap_tree: Add a depth parameter to create_for_bo
Jason Ekstrand [Thu, 8 Jan 2015 20:23:46 +0000 (12:23 -0800)]
i965/mipmap_tree: Add a depth parameter to create_for_bo

Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years ago mesa/dd: Add a function for creating a texture from a buffer object
Jason Ekstrand [Thu, 8 Jan 2015 05:13:15 +0000 (21:13 -0800)]
mesa/dd: Add a function for creating a texture from a buffer object

Reviewed-by: Neil Roberts <neil@linux.intel.com>
9 years ago glsl: do not allow interface block to have name already taken
Tapani Pälli [Mon, 19 Jan 2015 10:28:17 +0000 (12:28 +0200)]
glsl: do not allow interface block to have name already taken

Fixes currently failing Piglit case
   interface-blocks-name-reused-globally.vert

v2: combine var declaration with assignment (Ian)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years ago nir: Replace assert(0) with unreachable().
Matt Turner [Thu, 22 Jan 2015 04:22:18 +0000 (20:22 -0800)]
nir: Replace assert(0) with unreachable().

Fixes a couple of warnings in the process.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years ago i965/vec4: Fix fprintf argument ordering.
Matt Turner [Thu, 22 Jan 2015 04:16:38 +0000 (20:16 -0800)]
i965/vec4: Fix fprintf argument ordering.

Introduced in commit 3167a80b.

9 years ago nir: Stop using designated initializers
Jason Ekstrand [Wed, 21 Jan 2015 19:11:03 +0000 (11:11 -0800)]
nir: Stop using designated initializers

Designated initializers with anonymous unions don't work in MSVC or
GCC < 4.6.  With a couple of constructor methods, we don't need them any
more and the code is actually cleaner.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88467
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
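The pattern described in the commit above can be sketched as follows. The struct is a hypothetical stand-in for a type with an anonymous union, not the real NIR source type, and `example_src_for_ssa` is an illustrative constructor name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a type containing an anonymous union. */
struct example_src {
    bool is_ssa;
    union {
        int reg_index;
        int ssa_index;
    };
};

/* Constructor function used instead of a designated initializer such as
 *   struct example_src s = { .is_ssa = true, .ssa_index = 3 };
 * which the commit reports as not working in MSVC or GCC < 4.6. */
static struct example_src example_src_for_ssa(int ssa_index)
{
    struct example_src s;

    memset(&s, 0, sizeof(s));
    s.is_ssa = true;
    s.ssa_index = ssa_index;
    return s;
}
```

Plain member assignment inside a helper avoids the initializer syntax entirely, so it does not depend on compiler support for designated initializers that reach into anonymous unions.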
9 years ago mesa: change assert to unreachable in two format functions
Tobias Klausmann [Mon, 19 Jan 2015 20:51:38 +0000 (21:51 +0100)]
mesa: change assert to unreachable in two format functions

This fixes two problems reported by osc:
I: Program returns random data in a function
E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/format_utils.c:180
E: Mesa no-return-in-nonvoid-function ../../src/mesa/main/glformats.c:2714

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tobias Klausmann <tobias.johannes.klausmann@mni.thm.de>
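A minimal sketch of why this kind of change silences a no-return-in-nonvoid-function report. The macro and function names are hypothetical, and the macro is a simplified stand-in rather than Mesa's own unreachable() definition.

```c
#if defined(__GNUC__)
/* Tell the compiler that control never reaches this point. */
#define example_unreachable(msg) __builtin_unreachable()
#else
#include <stdlib.h>
#define example_unreachable(msg) abort()
#endif

enum example_format { EXAMPLE_FORMAT_A, EXAMPLE_FORMAT_B };

/* With assert(0) after the switch, a build with NDEBUG defined leaves a
 * path that falls off the end of a non-void function. Marking the path
 * as unreachable removes it in every build configuration. */
static int example_bytes_per_pixel(enum example_format f)
{
    switch (f) {
    case EXAMPLE_FORMAT_A:
        return 1;
    case EXAMPLE_FORMAT_B:
        return 4;
    }
    example_unreachable("invalid format");
}
```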
9 years ago nir: Add src and dest constructors
Jason Ekstrand [Wed, 21 Jan 2015 19:10:11 +0000 (11:10 -0800)]
nir: Add src and dest constructors

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years ago mesa: Add assert to check number of vector elements
Jan Vesely [Wed, 14 Jan 2015 21:12:06 +0000 (16:12 -0500)]
mesa: Add assert to check number of vector elements

The below code crashes when vector_elements <= 0
Fixes Warray-bounds warnings

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years ago mesa: Fix some signed-unsigned comparison warnings
Jan Vesely [Thu, 15 Jan 2015 18:41:04 +0000 (13:41 -0500)]
mesa: Fix some signed-unsigned comparison warnings

v2: s/unsigned int/unsigned/ in prog_optimize.c

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: David Heidelberg <david@ixit.cz>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years ago mesa: remove comparisons that are always true
Jan Vesely [Wed, 14 Jan 2015 20:53:18 +0000 (15:53 -0500)]
mesa: remove comparisons that are always true

Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years ago nir: Add a nir_foreach_phi_src helper macro
Jason Ekstrand [Wed, 21 Jan 2015 00:30:14 +0000 (16:30 -0800)]
nir: Add a nir_foreach_phi_src helper macro

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years ago i965: Extract scalar region checking logic
Ben Widawsky [Tue, 23 Dec 2014 03:29:22 +0000 (19:29 -0800)]
i965: Extract scalar region checking logic

There are currently 2 users of this functionality. I have 2 more users coming
up, and having a simple function makes the results much cleaner. The existing
interface semantics was proposed by Matt.

v2 (Ken): Rename to region_matches()/has_scalar_region().

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago i965: Add QWORD sizes to type_sz macro
Ben Widawsky [Tue, 23 Dec 2014 03:29:13 +0000 (19:29 -0800)]
i965: Add QWORD sizes to type_sz macro

GEN8 added the QWORD as a valid type for certain operations on the EU.
In order to calculate the number of registers used one must have the type
size as part of the equation. Quoting the formula in the code:

   regs_written = (dst.width * dst.stride * type_sz(dst.type) + 31) / 32;

Adding this separately for bisection since there is no simple way to add
an assert in the type_sz function.

NOTE: As a side note, I was confused for a while because it's impossible
to calculate the region, i.e. registers needed, without vstride.  However,
at this point these are all part of the IR, and so no vstride must exist.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
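The formula quoted in the commit above can be worked through with a small sketch. `example_regs_written` is a hypothetical helper that takes the type size in bytes directly instead of calling type_sz(); the constants 31 and 32 come from the quoted formula, which rounds the byte count up to whole 32-byte units.

```c
/* Hypothetical helper mirroring the quoted formula:
 * round width * stride * type size (bytes) up to whole 32-byte units. */
static unsigned example_regs_written(unsigned width, unsigned stride,
                                     unsigned type_size)
{
    return (width * stride * type_size + 31) / 32;
}
```

For a width of 8 and a stride of 1, a 4-byte type gives (32 + 31) / 32 = 1 register, while an 8-byte QWORD type gives (64 + 31) / 32 = 2. This is why the type size has to be known for the result to be right.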
9 years ago vc4: Fix build since 8ed5305d28d9309d651dfec3fbf4349854694694
Eric Anholt [Tue, 20 Jan 2015 22:19:29 +0000 (14:19 -0800)]
vc4: Fix build since 8ed5305d28d9309d651dfec3fbf4349854694694

9 years ago freedreno/a4xx: sysmem bypass
Rob Clark [Sun, 18 Jan 2015 19:47:15 +0000 (14:47 -0500)]
freedreno/a4xx: sysmem bypass

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years ago freedreno: update generated headers
Rob Clark [Sun, 18 Jan 2015 23:10:02 +0000 (18:10 -0500)]
freedreno: update generated headers

Signed-off-by: Rob Clark <robclark@freedesktop.org>
9 years ago radeonsi: Re-enable LLVM IR dumps
Tom Stellard [Wed, 7 Jan 2015 20:51:48 +0000 (15:51 -0500)]
radeonsi: Re-enable LLVM IR dumps

This was inadvertently disabled by
761e36b4caab4e8e09a4c2b1409a825902fc7d2c.

9 years ago radeonsi/compute: Use relocs for scratch pointer rather than user sgprs v2
Tom Stellard [Wed, 10 Dec 2014 01:05:44 +0000 (20:05 -0500)]
radeonsi/compute: Use relocs for scratch pointer rather than user sgprs v2

Instead of passing a pointer to the scratch buffer via user sgprs, we
now patch the shader with the buffer address using reloc information
from the LLVM generated ELF.

v2:
  - Make sure not to break older LLVM.

9 years ago radeon: Teach radeon_elf_read() how to parse reloc information v3
Tom Stellard [Wed, 10 Dec 2014 01:03:50 +0000 (20:03 -0500)]
radeon: Teach radeon_elf_read() how to parse reloc information v3

v2:
  - Use strdup for copying reloc names.
  - Free reloc memory.

v3:
  - Add free_relocs parameter to radeon_shader_binary_free_members()

9 years ago radeon: Add a helper function for freeing members of radeon_shader_binary
Tom Stellard [Wed, 14 Jan 2015 15:01:29 +0000 (10:01 -0500)]
radeon: Add a helper function for freeing members of radeon_shader_binary

9 years ago i965: Work around mysterious Gen4 GPU hangs with minimal state changes.
Kenneth Graunke [Sun, 18 Jan 2015 07:21:15 +0000 (23:21 -0800)]
i965: Work around mysterious Gen4 GPU hangs with minimal state changes.

Gen4 hardware appears to GPU hang frequently when using Chromium, and
also when running 'glmark2 -b ideas'.  Most of the error states contain
3DPRIMITIVE commands in quick succession, with very few state packets
between them - usually VERTEX_BUFFERS/ELEMENTS and CONSTANT_BUFFER.

I trimmed an apitrace of the glmark2 hang down to two draw calls with a
glUniformMatrix4fv call between the two.  Either draw by itself works
fine, but together, they hang the GPU.  Removing the glUniform call
makes the hangs disappear.  In the hardware state, this translates to
removing the CONSTANT_BUFFER packet between the two 3DPRIMITIVE packets.

Flushing before emitting CONSTANT_BUFFER packets also appears to make
the hangs disappear.  I observed a slowdown in glxgears by doing it all
the time, so I've chosen to only do it when BRW_NEW_BATCH and
BRW_NEW_PSP are unset (i.e. we haven't done a CS_URB_STATE change or
already flushed the whole pipeline).

I'd much rather understand the problem, but at this point, I don't see
how we'd ever be able to track it down further.  We have no real tools,
and the hardware people moved on years ago.  I've analyzed 20+ error
states and read every scrap of documentation I could find.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80568
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85367
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Matt Turner <mattst88@gmail.com>
Cc: "10.4 10.3" <mesa-stable@lists.freedesktop.org>
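The condition described in the commit above (flush before CONSTANT_BUFFER only when neither BRW_NEW_BATCH nor BRW_NEW_PSP is set) can be sketched as a bitmask test. The flag values and the function name below are hypothetical placeholders, not the driver's real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define EXAMPLE_NEW_BATCH (1u << 0)
#define EXAMPLE_NEW_PSP   (1u << 1)

/* Flush only when neither flag is set, i.e. when nothing else has
 * already flushed the pipeline for this state upload. */
static bool example_needs_flush_before_constant_buffer(uint32_t dirty)
{
    return (dirty & (EXAMPLE_NEW_BATCH | EXAMPLE_NEW_PSP)) == 0;
}
```

Restricting the flush this way keeps the workaround off the paths where a flush has effectively happened already, which matches the stated goal of limiting the slowdown.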
9 years ago i965/nir: Enable SIMD16 support in the NIR FS backend.
Kenneth Graunke [Fri, 16 Jan 2015 09:40:33 +0000 (01:40 -0800)]
i965/nir: Enable SIMD16 support in the NIR FS backend.

With the previous commits in place, it just works.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago i965/nir: Use offset() instead of altering reg_offset directly.
Kenneth Graunke [Fri, 16 Jan 2015 21:16:18 +0000 (13:16 -0800)]
i965/nir: Use offset() instead of altering reg_offset directly.

offset() properly handles reg_width, so it'll work for SIMD16.

While we're in the area, simplify a few cases, and use retype() to cut a
few more lines of code.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago i965/nir: Replace fs_reg(GRF, virtual_grf_alloc(...)) with vgrf(...).
Kenneth Graunke [Fri, 16 Jan 2015 10:12:17 +0000 (02:12 -0800)]
i965/nir: Replace fs_reg(GRF, virtual_grf_alloc(...)) with vgrf(...).

brw_fs_nir.cpp creates almost all of its registers via:

   fs_reg reg = fs_reg(GRF, virtual_grf_alloc(num_components));

When we add SIMD16 support, we'll need to set reg->width = 16 and
double the VGRF size...on pretty much every VGRF it allocates.

This patch replaces that pattern with a new "vgrf" helper method:

   fs_reg reg = vgrf(num_components);

The new function correctly takes reg_width into account.  For now,
reg_width is always 1, so this should have no functional change.

v2: Just make vgrf() account for reg_width right away, rather than
    changing the behavior in the next patch.

v3: Replace one last virtual_grf_alloc I missed.  It's used in code
    that only runs for dispatch_width == 8, so it doesn't matter,
    but consistency is nice.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years ago i965: Replace fs_reg(fs_visitor, type) with fs_visitor::vgrf(type).
Kenneth Graunke [Fri, 16 May 2014 09:21:51 +0000 (02:21 -0700)]
i965: Replace fs_reg(fs_visitor, type) with fs_visitor::vgrf(type).

I dislike how fs_reg has a constructor that knows about fs_visitor.
Apart from that, it stands alone, with no need to interact with the
rest of the compiler.  Which is sensible - a class that represents
a register should do just that.  Allocating virtual register numbers
should be left up to the compiler (fs_visitor).

This patch replaces the constructor with a new fs_visitor::vgrf method,
eliminating fs_reg's dependency on fs_visitor.  It ends up being no
more code.

v2: Rebase from May 2014 -> January 2015.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years ago st/mesa: don't set vs.key.clamp_color if a shader doesn't write any colors
Marek Olšák [Sun, 11 Jan 2015 17:38:03 +0000 (18:38 +0100)]
st/mesa: don't set vs.key.clamp_color if a shader doesn't write any colors

And update some comments.

9 years ago winsys/radeon: increase the size of buffer cache
Marek Olšák [Mon, 12 Jan 2015 13:35:24 +0000 (14:35 +0100)]
winsys/radeon: increase the size of buffer cache

This should fix this performance regression:
https://bugs.freedesktop.org/show_bug.cgi?id=88227

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
9 years ago Rename sha1.c and sha1.h to mesa-sha1.c and mesa-sha1.h
Carl Worth [Mon, 19 Jan 2015 18:49:41 +0000 (10:49 -0800)]
Rename sha1.c and sha1.h to mesa-sha1.c and mesa-sha1.h

The filename of sha1.h was conflicting with the system-provided
sha1.h (and in some configurations, our sha1.c was unsuccessfully
attempting to include "sha1.h" and <sha1.h> as two different files).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88523

9 years ago mesa: fix a trivial spelling mistake
Martin Peres [Mon, 19 Jan 2015 08:52:05 +0000 (10:52 +0200)]
mesa: fix a trivial spelling mistake

Signed-off-by: Martin Peres <martin.peres@linux.intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago mesa: support GL_RGB for GL_EXT_texture_type_2_10_10_10_REV
Tapani Pälli [Fri, 16 Jan 2015 10:48:43 +0000 (12:48 +0200)]
mesa: support GL_RGB for GL_EXT_texture_type_2_10_10_10_REV

Commit 8ec6534 changed texture upload path and the way how texture
format is being checked, this commit adds support for GL_RGB with
GL_UNSIGNED_INT_2_10_10_10_REV as specified by the extension
EXT_texture_type_2_10_10_10_REV specification.

This fixes regression in ES3 conformance test
   ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels

v2: add MESA_FORMAT_R10G10B10X2_UNORM format (Iago Toral)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88385
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
9 years ago mesa: Add ARB_shader_precision infrastructure
Micah Fedke [Wed, 31 Dec 2014 20:16:52 +0000 (14:16 -0600)]
mesa: Add ARB_shader_precision infrastructure

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
9 years ago i965/fs: Fix the dummy fragment shader.
Kenneth Graunke [Sat, 17 Jan 2015 09:01:35 +0000 (01:01 -0800)]
i965/fs: Fix the dummy fragment shader.

We hit an assertion that the destination of the FB write should not be
an immediate.  (I don't know what we were thinking.)  Use ARF null.

Trying to substitute real shaders with the dummy shader would crash
when trying to upload non-existent uniforms.  Say there are none.

It also wouldn't generate any code because we didn't compute the CFG,
and code generation now requires it.  Compute it.

Gen4-5 also require a message header to be present.

On Gen6+, there were assertion failures in SF/SBE state because
urb_setup was memset to 0 instead of -1, causing it to think there were
attributes when nothing was set up right.  Set to no attributes.

Finally, you have to ensure "Setup URB Entry Read Length" is non-zero
or you get GPU hangs, at least on Crestline.

It now works on at least Crestline and Haswell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
9 years ago gbm: Define _DEFAULT_SOURCE to avoid warning
Kristian Høgsberg [Sat, 17 Jan 2015 05:54:54 +0000 (21:54 -0800)]
gbm: Define _DEFAULT_SOURCE to avoid warning

glibc 2.19 introduced _DEFAULT_SOURCE as a replacement for _BSD_SOURCE,
and deprecates _BSD_SOURCE with an annoying warning.  Defining both is
how you're supposed to transition so let's do that.  It gets rid of the
warning and we can figure out when/if we can drop _BSD_SOURCE later.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
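The transition described in the commit above amounts to defining both feature-test macros before the first system header is included. A minimal sketch follows; strdup() is used only as an example of a function whose declaration can depend on feature-test macros, and `example_copy_name` is a hypothetical helper, neither taken from the gbm sources.

```c
/* Define both macros before any system header: _BSD_SOURCE is understood
 * by older glibc, _DEFAULT_SOURCE by glibc 2.19 and later. */
#define _BSD_SOURCE
#define _DEFAULT_SOURCE

#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of name. The caller
 * is responsible for calling free() on the result. */
static char *example_copy_name(const char *name)
{
    return strdup(name);
}
```

The macros only take effect if they appear before the first include in the translation unit, so they usually sit at the very top of the file or come from the build flags.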
9 years ago sha1: Fix gcry_md_hd_t typo.
Vinson Lee [Sat, 17 Jan 2015 00:21:41 +0000 (16:21 -0800)]
sha1: Fix gcry_md_hd_t typo.

Fix build error.

  CC       libmesautil_la-sha1.lo
sha1.c: In function '_mesa_sha1_final':
sha1.c:210:22: error: 'grcy_md_hd_t' undeclared (first use in this function)
    gcry_md_hd_t h = (grcy_md_hd_t) ctx;
                      ^

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88519
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
9 years ago nir: s/malloc.h/stdlib.h/
Vinson Lee [Sat, 17 Jan 2015 00:14:51 +0000 (16:14 -0800)]
nir: s/malloc.h/stdlib.h/

Fix build error on Mac OS X.

  CC       nir_to_ssa.lo
nir_to_ssa.c:29:10: fatal error: 'malloc.h' file not found
         ^

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88478
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
9 years agoi965: Fix up too-wide comment
Kristian Høgsberg [Fri, 16 Jan 2015 22:42:27 +0000 (14:42 -0800)]
i965: Fix up too-wide comment

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
9 years agogbm/dri: Fix const confusion
Kristian Høgsberg [Fri, 16 Jan 2015 22:29:40 +0000 (14:29 -0800)]
gbm/dri: Fix const confusion

The driver name is no longer const, it's always allocated dynamically
one way or another.  Drop const from dri_screen_create_dri2
driver_name argument to avoid warning.

Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
9 years agoconfigure: Add machinery for --enable-shader-cache (and --disable-shader-cache)
Carl Worth [Wed, 14 Jan 2015 23:53:00 +0000 (15:53 -0800)]
configure: Add machinery for --enable-shader-cache (and --disable-shader-cache)

We don't actually have the code for the shader cache just yet, but
this configure machinery puts everything in place so that the shader
cache can be optionally compiled in.

Specifically, if the user passes no option (neither
--disable-shader-cache, nor --enable-shader-cache), then this feature
will be automatically detected based on the presence of a usable SHA-1
library. If no suitable library can be found, then the shader cache
will be automatically disabled, (and reported in the final output from
configure).

The user can force the shader-cache feature to not be compiled, (even
if a SHA-1 library is detected), by passing
--disable-shader-cache. This will prevent the compiled Mesa libraries
from depending on any library for SHA-1 implementation.

Finally, the user can also force the shader cache on with
--enable-shader-cache. This will cause configure to trigger a fatal
error if no suitable SHA-1 implementation can be found for the
shader-cache feature.
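
The three cases above amount to a small decision table. The following C function is only a model of that table for illustration; the real logic lives in configure.ac, and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* What the user passed to configure (hypothetical names). */
enum toy_option { TOY_AUTO, TOY_ENABLE, TOY_DISABLE };

/* What configure ends up doing. */
enum toy_result { TOY_CACHE_OFF, TOY_CACHE_ON, TOY_FATAL_ERROR };

enum toy_result toy_resolve_shader_cache(enum toy_option option, bool have_sha1)
{
    switch (option) {
    case TOY_DISABLE:
        /* Never built, even if a SHA-1 library was detected. */
        return TOY_CACHE_OFF;
    case TOY_ENABLE:
        /* Forced on: a missing SHA-1 library is a fatal error. */
        return have_sha1 ? TOY_CACHE_ON : TOY_FATAL_ERROR;
    case TOY_AUTO:
        /* No option given: follow the detection result. */
        return have_sha1 ? TOY_CACHE_ON : TOY_CACHE_OFF;
    }
    assert(!"invalid option");
    return TOY_FATAL_ERROR;
}
```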

Bug fix by José Fonseca <jfonseca@vmware.com>: Fix to put conditional
assignment in Makefile.am, not Makefile.sources to avoid breaking
scons build.

Note: As recommended by José, with this commit the scons build will
not compile any of the SHA-1-using code. This is waiting for someone
to write SConstruct detection of the available SHA-1 libraries, (and
set the appropriate HAVE_SHA1_* variables).

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agomesa: Add mesa SHA-1 functions
Carl Worth [Fri, 12 Dec 2014 21:55:30 +0000 (13:55 -0800)]
mesa: Add mesa SHA-1 functions

The upcoming shader cache uses the SHA-1 algorithm for cryptographic
naming. These new mesa_sha1 functions are implemented with any one of
several different cryptographic libraries.

This code was copied from the xserver repository, (where it has
apparently been functioning well on a variety of operating systems),
and comes licensed with a license identical to that of Mesa.

Bug fixes by José Fonseca <jfonseca@vmware.com>: Fix to put
conditional assignment in Makefile.am, not Makefile.sources to avoid
breaking scons build. Fix include file for CryptoAPI section. Fix
missing cast in openssl section.

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agoconfigure: Add copyright and license block to configure.ac
Carl Worth [Thu, 11 Dec 2014 22:33:44 +0000 (14:33 -0800)]
configure: Add copyright and license block to configure.ac

Prior to copying in code from the xserver configure.ac file, it makes
sense to have the license of this file clearly marked, (to show that
it's licensed identically to the configure.ac file from the xserver
repository).

And since the text of the license refers to "the above copyright
notice" it also makes sense to have an actual copyright attribution in
place.

I generated this list of names by looking at the output of:

git shortlog -n --format=%aD -- configure.ac

(and arbitrarily stopping for contributors with fewer than 15
commits). Then for each name, I looked for existing Copyright
attributions in the mesa source tree with the same name, (and using
"Intel Corporation" as the copyright holder where I knew that was
appropriate).

9 years agoglsl: Add unit tests for blob.c
Carl Worth [Mon, 15 Dec 2014 23:58:34 +0000 (15:58 -0800)]
glsl: Add unit tests for blob.c

In addition to exercising all of the functions in blob.h, this
includes a stress test that forces some reallocing, and also tests to
verify the alignment and overrun-detection code in blob.c.

9 years agoglsl: Add blob_overwrite_bytes and blob_overwrite_uint32
Tapani Pälli [Thu, 13 Nov 2014 07:16:51 +0000 (23:16 -0800)]
glsl: Add blob_overwrite_bytes and blob_overwrite_uint32

These functions are useful when serializing an unknown number of items
to a blob. The caller can first save the current offset, write a
placeholder uint32, write out (and count) the items, then use
blob_overwrite_uint32 with the saved offset to replace the placeholder
value.

Then, when deserializing, the reader will first read the count and
know how many subsequent items to expect.
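
The placeholder pattern can be sketched as follows. This is a self-contained toy with a fixed-size buffer; struct toy_blob and the toy_* functions are hypothetical stand-ins and do not reproduce the signatures in blob.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for a blob. */
struct toy_blob {
    uint8_t data[256];
    size_t size;              /* bytes written so far */
};

void toy_write_uint32(struct toy_blob *b, uint32_t value)
{
    assert(b->size + sizeof(value) <= sizeof(b->data));
    memcpy(b->data + b->size, &value, sizeof(value));
    b->size += sizeof(value);
}

/* Replace four bytes that were already written at 'offset'. */
void toy_overwrite_uint32(struct toy_blob *b, size_t offset, uint32_t value)
{
    assert(offset + sizeof(value) <= b->size);
    memcpy(b->data + offset, &value, sizeof(value));
}

uint32_t toy_read_uint32(const struct toy_blob *b, size_t *offset)
{
    uint32_t value;
    assert(*offset + sizeof(value) <= b->size);
    memcpy(&value, b->data + *offset, sizeof(value));
    *offset += sizeof(value);
    return value;
}

/* Serialize only the even items; the count is not known up front. */
void toy_write_evens(struct toy_blob *b, const uint32_t *items, size_t n)
{
    size_t count_offset = b->size;    /* save the current offset */
    uint32_t count = 0;

    toy_write_uint32(b, 0);           /* placeholder for the count */
    for (size_t i = 0; i < n; i++) {
        if (items[i] % 2 == 0) {
            toy_write_uint32(b, items[i]);
            count++;
        }
    }
    toy_overwrite_uint32(b, count_offset, count);   /* patch the count */
}
```

A reader then calls toy_read_uint32() once to get the count and loops that many times over the items.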

(I wrote this code after reading a very similar patch written by
Tapani when he wrote serialization code for IR. Since I re-used the
idea of his code so directly, I've credited him as the author of this
code. --Carl)

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoglsl: Add blob.c---a simple interface for serializing data
Carl Worth [Thu, 4 Dec 2014 22:16:47 +0000 (14:16 -0800)]
glsl: Add blob.c---a simple interface for serializing data

This new interface allows for writing a series of objects to a chunk
of memory (a "blob"). The allocated memory is maintained within the
blob itself, (and re-allocated by doubling when necessary).

There are also functions for reading objects from a blob as well. If
code attempts to read beyond the available memory, the read functions
return 0 values (or its moral equivalent) without reading past the
allocated memory. Once the caller is done with the reads, it can check
blob->overrun to determine whether any invalid values were previously
returned due to attempts to read too far.
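
A rough sketch of the two behaviours described above: growth by doubling on write, and a sticky overrun flag on read. The toy_writer and toy_reader types are assumptions made for illustration and are not the structures defined in blob.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable write buffer. */
struct toy_writer {
    uint8_t *data;
    size_t allocated;
    size_t size;
};

bool toy_writer_append(struct toy_writer *w, const void *bytes, size_t n)
{
    assert(n > 0);
    if (w->size + n > w->allocated) {
        /* Double the allocation until the new data fits. */
        size_t new_alloc = w->allocated ? w->allocated : 16;
        while (new_alloc < w->size + n)
            new_alloc *= 2;
        uint8_t *p = realloc(w->data, new_alloc);
        if (p == NULL)
            return false;
        w->data = p;
        w->allocated = new_alloc;
    }
    memcpy(w->data + w->size, bytes, n);
    w->size += n;
    return true;
}

/* Hypothetical read cursor with a sticky overrun flag. */
struct toy_reader {
    const uint8_t *data;
    size_t size;
    size_t pos;
    bool overrun;
};

uint32_t toy_reader_uint32(struct toy_reader *r)
{
    uint32_t value = 0;
    /* Never touch memory past the end; return 0 and remember the failure. */
    if (r->overrun || r->size - r->pos < sizeof(value)) {
        r->overrun = true;
        return 0;
    }
    memcpy(&value, r->data + r->pos, sizeof(value));
    r->pos += sizeof(value);
    return value;
}
```

The caller can issue a whole series of reads without checking each one and test the overrun flag once at the end.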

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agomesa: Add iterate method for string_to_uint_map
Tapani Pälli [Mon, 2 Jun 2014 12:05:51 +0000 (15:05 +0300)]
mesa: Add iterate method for string_to_uint_map

The upcoming shader cache needs this to be able to cache hash data
from the gl_shader_program structure.

Edited-by: Carl Worth <cworth@cworth.org>:
There is an internal implementation detail that the hash table
underlying the struct string_to_uint_map stores each value internally
as (value+1). The user needn't be very concerned with this (other than
knowing that a value of UINT_MAX cannot be stored) since put() adds 1
and get() subtracts 1.

So in this commit, rather than call the user's function directly with
hash_table_call_foreach, we call through a wrapper that fixes up the
off-by-one values before the caller's callback sees them.

And with this wrapper in place, we also give a better signature to the
callback function being passed to iterate(), so that this callback
function can actually expect a char* and an unsigned argument, (rather
than a couple of void* ).
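
The wrapper idea can be sketched like this. It is a simplified model, assuming a generic table that stores value+1 in a void* and calls back with (key, data, closure); the toy_* names are hypothetical and the array stands in for the real hash table.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Untyped callback, as a generic table would call it. */
typedef void (*toy_raw_cb)(const void *key, void *data, void *closure);
/* Typed callback that the user of the map actually wants to write. */
typedef void (*toy_typed_cb)(const char *key, unsigned value, void *closure);

struct toy_entry {
    const char *key;
    void *data;               /* holds value + 1 */
};

void toy_put(struct toy_entry *e, const char *key, unsigned value)
{
    assert(value != UINT_MAX);            /* value + 1 must not wrap */
    e->key = key;
    e->data = (void *)(uintptr_t)(value + 1);
}

void toy_foreach(struct toy_entry *entries, size_t n,
                 toy_raw_cb cb, void *closure)
{
    for (size_t i = 0; i < n; i++)
        cb(entries[i].key, entries[i].data, closure);
}

struct toy_wrapper_state {
    toy_typed_cb callback;
    void *closure;
};

/* Undo the +1 and restore the real types before the user sees them. */
void toy_subtract_one_wrapper(const void *key, void *data, void *closure)
{
    struct toy_wrapper_state *state = closure;
    unsigned value = (unsigned)(uintptr_t)data - 1;
    state->callback((const char *)key, value, state->closure);
}

void toy_iterate(struct toy_entry *entries, size_t n,
                 toy_typed_cb cb, void *closure)
{
    struct toy_wrapper_state state = { cb, closure };
    toy_foreach(entries, n, toy_subtract_one_wrapper, &state);
}

/* Example user callback: accumulate all values. */
void toy_sum_values(const char *key, unsigned value, void *closure)
{
    (void)key;
    *(unsigned *)closure += value;
}
```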

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
9 years agoutil: Make unreachable at least be an assert
Carl Worth [Fri, 5 Dec 2014 16:05:44 +0000 (08:05 -0800)]
util: Make unreachable at least be an assert

Previously, if __builtin_unreachable() was unavailable, the
unreachable macro was defined to do nothing. We do better here, by at
least still making it an assert.
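
A sketch of the fallback, under the assumption that a build-time macro reports whether the builtin exists. HAVE_BUILTIN_UNREACHABLE, toy_unreachable() and toy_axis_name() are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* HAVE_BUILTIN_UNREACHABLE is assumed to be set by the build system. */
#ifdef HAVE_BUILTIN_UNREACHABLE
#define toy_unreachable(str)      \
do {                              \
    assert(!str);                 \
    __builtin_unreachable();      \
} while (0)
#else
/* No builtin: a debug build still fails loudly with the message. */
#define toy_unreachable(str) assert(!str)
#endif

const char *toy_axis_name(int axis)
{
    switch (axis) {
    case 0: return "x";
    case 1: return "y";
    case 2: return "z";
    }
    toy_unreachable("invalid axis");
    return "";   /* only reached when assertions are compiled out */
}
```

Negating a string literal always yields 0, so the assertion fires and the message appears in the failure output.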

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agoglsl: Add convenience function get_sampler_instance
Carl Worth [Wed, 22 Oct 2014 23:58:26 +0000 (16:58 -0700)]
glsl: Add convenience function get_sampler_instance

This is similar to the existing functions get_instance,
get_array_instance, etc. for getting a type singleton. The new
get_sampler_instance() function will be used by the upcoming shader
cache.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agoi965: Fix some oddities in FB_WRITE register width and execution size.
Kenneth Graunke [Fri, 16 Jan 2015 08:53:53 +0000 (00:53 -0800)]
i965: Fix some oddities in FB_WRITE register width and execution size.

Previously, we generated this for FB writes in SIMD16 mode:

load_payload(16) vgrf5@8+0.0:F, vgrf1:F, vgrf2:F, vgrf3:F, vgrf4:F
fb_write(8) (null):UD, vgrf5@8+0.0:F 1sthalf

The LOAD_PAYLOAD's destination had its register width set to 8, and the
FB_WRITE had its execution size set to 8.  This seems wrong, and while
it probably doesn't affect anything, we should fix it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoi965/fs: Make lower_load_payload etc. appear in INTEL_DEBUG=optimizer.
Kenneth Graunke [Fri, 16 Jan 2015 09:05:21 +0000 (01:05 -0800)]
i965/fs: Make lower_load_payload etc. appear in INTEL_DEBUG=optimizer.

In order to support calling lower_load_payload() inside a condition,
this patch makes OPT() a statement expression:

https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html

We recently did the equivalent change in the vec4 backend (commit
9b8bd67768769b685c25e1276e053505aede5f93).
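
For readers unfamiliar with statement expressions, here is a small sketch of a macro that both records progress and yields a value usable in a condition. It relies on the GNU C extension, so it needs GCC or Clang. TOY_OPT, struct toy_shader and the two passes are hypothetical and do not mirror the real OPT() macro or the real pass ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_shader {
    int payload_ops;   /* instructions still waiting to be lowered */
    int dead_ops;      /* leftovers that a cleanup pass can remove */
    bool progress;     /* did any pass change anything? */
};

/* The last expression inside ({ ... }) is the value of the whole macro. */
#define TOY_OPT(pass, shader) ({                                  \
    bool this_progress = pass(shader);                            \
    if (this_progress)                                            \
        fprintf(stderr, "%s made progress\n", #pass);             \
    (shader)->progress = (shader)->progress || this_progress;     \
    this_progress;                                                \
})

bool toy_lower_payload(struct toy_shader *s)
{
    if (s->payload_ops == 0)
        return false;
    s->dead_ops += s->payload_ops;   /* pretend lowering leaves dead code */
    s->payload_ops = 0;
    return true;
}

bool toy_dead_code_eliminate(struct toy_shader *s)
{
    if (s->dead_ops == 0)
        return false;
    s->dead_ops = 0;
    return true;
}

void toy_optimize(struct toy_shader *s)
{
    assert(s != NULL);
    /* The macro has a value, so it can sit inside a condition. */
    if (TOY_OPT(toy_lower_payload, s)) {
        TOY_OPT(toy_dead_code_eliminate, s);
    }
}
```

A plain do { ... } while (0) macro could log the pass name but could not be used as the condition of the if statement.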

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoformat_utils: Use a more precise conversion when decreasing bits
Neil Roberts [Wed, 14 Jan 2015 15:14:05 +0000 (15:14 +0000)]
format_utils: Use a more precise conversion when decreasing bits

When converting to a format that has fewer bits the previous code was just
shifting off the bits. This doesn't provide very accurate results. For example
when converting from 8 bits to 5 bits it is equivalent to doing this:

x * 32 / 256

This works as if it's taking a value from a range where 256 represents 1.0 and
scaling it down to a range where 32 represents 1.0. However this is not
correct because it is actually 255 and 31 that represent 1.0.

We can do better with a formula like this:

(x * 31 + 127) / 255

The +127 is to make it round correctly.

The new code has a special case to use uint64_t when the result of the
multiplication would overflow an unsigned int. This function is inline and
only ever called with constant values so hopefully the if statements will be
folded.
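
The difference between the two formulas can be checked with a short sketch. The function names are made up for this example and are not the ones in format_utils; the 64-bit branch follows the rule stated in the v2 note.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value representable in 'bits' bits (1 <= bits <= 32). */
uint32_t toy_max_uint(unsigned bits)
{
    return bits == 32 ? UINT32_MAX : (UINT32_C(1) << bits) - 1;
}

/* Old behaviour: drop the low bits, i.e. x * 2^dst / 2^src. */
uint32_t toy_unorm_shift(uint32_t x, unsigned src_bits, unsigned dst_bits)
{
    assert(dst_bits <= src_bits && src_bits <= 32);
    return x >> (src_bits - dst_bits);
}

/* New behaviour: scale by the real maxima and round to nearest. */
uint32_t toy_unorm_round(uint32_t x, unsigned src_bits, unsigned dst_bits)
{
    assert(dst_bits >= 1 && dst_bits <= src_bits && src_bits <= 32);
    const uint32_t src_max = toy_max_uint(src_bits);
    const uint32_t dst_max = toy_max_uint(dst_bits);
    const uint32_t half = src_max / 2;          /* 127 for 8 bits */

    /* The product needs up to src_bits + dst_bits bits. */
    if (src_bits + dst_bits > 32)
        return (uint32_t)(((uint64_t)x * dst_max + half) / src_max);
    return (x * dst_max + half) / src_max;
}
```

For the 8-bit to 5-bit case, an input of 5 maps to 0 with the shift but to 1 with rounding (5/255 * 31 is about 0.61), and an input of 250 maps to 31 with the shift but to 30 with rounding (250/255 * 31 is about 30.39).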

The main incentive to do this is to make the CPU conversion path pick the same
values as the hardware would if it did the conversion. This fixes failures
with the ‘texsubimage pbo’ test when using the patches from here:

http://lists.freedesktop.org/archives/mesa-dev/2015-January/074312.html

v2: Use 64-bit arithmetic when src_bits+dst_bits > 32

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoi965/gen6: Fix crash with VS+TF after rendering with GS
Iago Toral Quiroga [Wed, 7 Jan 2015 09:08:57 +0000 (10:08 +0100)]
i965/gen6: Fix crash with VS+TF after rendering with GS

Rendering with a GS and then using transform feedback with a program that does
not have a GS can crash in gen6. The reason for this is that
brw_begin_transform_feedback checks brw->geometry_program to decide if there
is a GS program, but this is not correct: brw->geometry_program is updated when
issuing drawing commands, so after rendering with a GS it will be non-NULL
until we draw again with a program that does not have a GS. If the next
program uses TF, we will call glBeginTransformFeedback before issuing
the drawing command and hence brw->geometry_program will be non-NULL if
the previous rendering used a GS. The right thing to do here is to check
ctx->_Shader->CurrentProgram[MESA_SHADER_GEOMETRY] instead. This is what the
gen7 code path does too.

Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=87694
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
9 years agonir/live_variables: Use a worklist
Jason Ekstrand [Fri, 19 Dec 2014 19:49:58 +0000 (11:49 -0800)]
nir/live_variables: Use a worklist

This is a rework of the liveness algorithm using a worklist as suggested by
Connor.  Doing so reduces the number of times we walk over the instructions
because we don't have to do an entire pointless walk over the instructions
just to figure out it's time to stop.  Also, the stuff after the last loop
in the function will only ever get visited once.
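
To illustrate the worklist approach in general terms, here is a block-level liveness sketch over a tiny CFG with variables stored as bits in a mask. It is a simplification, not the NIR pass: the toy_* types, the fixed block limit and the two-successor layout are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_BLOCKS 8

struct toy_block {
    uint32_t use;        /* variables read before being written here */
    uint32_t def;        /* variables written here */
    int succ[2];         /* successor block indices, -1 if unused */
    uint32_t live_in;
    uint32_t live_out;
};

/* Circular queue that holds each block at most once. */
struct toy_worklist {
    int items[TOY_MAX_BLOCKS];
    bool present[TOY_MAX_BLOCKS];
    int head;
    int count;
};

void toy_worklist_push(struct toy_worklist *w, int block)
{
    if (w->present[block])
        return;
    w->items[(w->head + w->count) % TOY_MAX_BLOCKS] = block;
    w->count++;
    w->present[block] = true;
}

int toy_worklist_pop(struct toy_worklist *w)
{
    assert(w->count > 0);
    int block = w->items[w->head];
    w->head = (w->head + 1) % TOY_MAX_BLOCKS;
    w->count--;
    w->present[block] = false;
    return block;
}

void toy_compute_liveness(struct toy_block *blocks, int n)
{
    assert(n > 0 && n <= TOY_MAX_BLOCKS);
    struct toy_worklist w = { 0 };

    /* Liveness flows backwards, so seed the list from the last block. */
    for (int i = n - 1; i >= 0; i--) {
        blocks[i].live_in = 0;
        blocks[i].live_out = 0;
        toy_worklist_push(&w, i);
    }

    while (w.count > 0) {
        int b = toy_worklist_pop(&w);
        uint32_t out = 0;
        for (int s = 0; s < 2; s++) {
            if (blocks[b].succ[s] >= 0)
                out |= blocks[blocks[b].succ[s]].live_in;
        }
        uint32_t in = blocks[b].use | (out & ~blocks[b].def);
        blocks[b].live_out = out;

        /* Only predecessors of a block whose live_in changed need a revisit. */
        if (in != blocks[b].live_in) {
            blocks[b].live_in = in;
            for (int p = 0; p < n; p++) {
                if (blocks[p].succ[0] == b || blocks[p].succ[1] == b)
                    toy_worklist_push(&w, p);
            }
        }
    }
}
```

The loop ends as soon as the list is empty, so there is no final full pass whose only purpose is to confirm that nothing changed.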

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agonir: Add a worklist helper structure
Jason Ekstrand [Fri, 19 Dec 2014 19:05:02 +0000 (11:05 -0800)]
nir: Add a worklist helper structure

A worklist is a common concept in optimizations.  This adds a structure
that we can reuse for many different types of optimizations.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agonir: fix incorrect argument passed to validate_src() in validate_tex_instr()
Brian Paul [Fri, 16 Jan 2015 00:38:39 +0000 (17:38 -0700)]
nir: fix incorrect argument passed to validate_src() in validate_tex_instr()

Silences a compiler warning.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agonir: silence compiler warning from visit_src() call
Brian Paul [Thu, 15 Jan 2015 22:28:14 +0000 (15:28 -0700)]
nir: silence compiler warning from visit_src() call

v2: use proper argument

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
9 years agomesa: move GET_CURRENT_CONTEXT() to top of _mesa_init_renderbuffer()
Brian Paul [Thu, 15 Jan 2015 22:30:40 +0000 (15:30 -0700)]
mesa: move GET_CURRENT_CONTEXT() to top of _mesa_init_renderbuffer()

To fix MSVC build.

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agomesa: Fix render buffer initial internal format in GLES 3
Mike Mason [Wed, 14 Jan 2015 20:12:27 +0000 (12:12 -0800)]
mesa: Fix render buffer initial internal format in GLES 3

Changes the initial internal format of a render buffer
to GL_RGBA4 in GLES 3. This fixes a failure in the following
DrawElements test:

  dEQP-GLES3.functional.state_query.rbo.renderbuffer_internal_format

Reviewed-by: Chad Versace <chad.versace@intel.com>
9 years agoutil/hash_set: Rework the API to know about hashing
Jason Ekstrand [Thu, 15 Jan 2015 17:31:18 +0000 (09:31 -0800)]
util/hash_set: Rework the API to know about hashing

Previously, the set API required the user to do all of the hashing of keys
as it passed them in.  Since the hashing function is intrinsically tied to
the comparison function, it makes sense for the hash set to know about
it.  Also, it makes for a somewhat clumsy API as the user is constantly
calling hashing functions many of which have long names.  This is
especially bad when the standard call looks something like

_mesa_set_add(ht, _mesa_pointer_hash(key), key);

In the above case, there is no reason why the hash set shouldn't do the
hashing for you.  We leave the option for you to do your own hashing if
it's more efficient, but it's no longer needed.  Also, if you do do your
own hashing, the hash set will assert that your hash matches what it
expects out of the hashing function.  This should make it harder to mess up
your hashing.
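
The shape of such an API can be sketched with a toy set that stores its hash and comparison functions at creation time. This is a fixed-capacity, linear-probing model written for illustration; the toy_* names and signatures are hypothetical and are not the util/hash_set interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SET_SIZE 16   /* fixed capacity, no resizing in this sketch */

struct toy_set {
    uint32_t (*key_hash)(const void *key);
    bool (*key_equals)(const void *a, const void *b);
    const void *keys[TOY_SET_SIZE];
    uint32_t hashes[TOY_SET_SIZE];
    unsigned entries;
};

void toy_set_init(struct toy_set *s,
                  uint32_t (*key_hash)(const void *key),
                  bool (*key_equals)(const void *a, const void *b))
{
    memset(s, 0, sizeof(*s));
    s->key_hash = key_hash;
    s->key_equals = key_equals;
}

/* For callers that already have the hash; it is checked, not trusted. */
bool toy_set_add_pre_hashed(struct toy_set *s, uint32_t hash, const void *key)
{
    assert(key != NULL);
    assert(s->entries < TOY_SET_SIZE);
    assert(hash == s->key_hash(key));
    for (unsigned i = 0; i < TOY_SET_SIZE; i++) {
        unsigned slot = (hash + i) % TOY_SET_SIZE;
        if (s->keys[slot] == NULL) {
            s->keys[slot] = key;
            s->hashes[slot] = hash;
            s->entries++;
            return true;
        }
        if (s->hashes[slot] == hash && s->key_equals(s->keys[slot], key))
            return false;   /* already present */
    }
    return false;
}

/* The common case: the set does the hashing itself. */
bool toy_set_add(struct toy_set *s, const void *key)
{
    return toy_set_add_pre_hashed(s, s->key_hash(key), key);
}

bool toy_set_contains(const struct toy_set *s, const void *key)
{
    uint32_t hash = s->key_hash(key);
    for (unsigned i = 0; i < TOY_SET_SIZE; i++) {
        unsigned slot = (hash + i) % TOY_SET_SIZE;
        if (s->keys[slot] == NULL)
            return false;
        if (s->hashes[slot] == hash && s->key_equals(s->keys[slot], key))
            return true;
    }
    return false;
}

/* Example key functions for C strings (FNV-1a hash). */
uint32_t toy_string_hash(const void *key)
{
    uint32_t hash = 2166136261u;
    for (const unsigned char *p = key; *p != '\0'; p++) {
        hash ^= *p;
        hash *= 16777619u;
    }
    return hash;
}

bool toy_string_equals(const void *a, const void *b)
{
    return strcmp(a, b) == 0;
}
```

Because the set owns the hash function, a mismatched pre-computed hash is caught by the assertion instead of silently placing the key in the wrong slot.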

This is analogous to 94303a0750, where we did this for hash_table.

Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
9 years agoutil: Move main/set to util/hash_set
Jason Ekstrand [Thu, 15 Jan 2015 16:06:05 +0000 (08:06 -0800)]
util: Move main/set to util/hash_set

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
9 years agohash_table: Rename insert_with_hash to insert_pre_hashed
Jason Ekstrand [Thu, 15 Jan 2015 15:58:07 +0000 (07:58 -0800)]
hash_table: Rename insert_with_hash to insert_pre_hashed

We already have search_pre_hashed.  This makes the APIs match better.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
9 years agoi965: Don't consider null dst instructions as matching non-null dst.
Matt Turner [Mon, 12 Jan 2015 21:58:06 +0000 (13:58 -0800)]
i965: Don't consider null dst instructions as matching non-null dst.

When performing common subexpression elimination on instructions with
non-null destinations we emit a MOV to copy the result to a new
register that must have no other uses. In the case of:

   cmp.g.f0.0(8) null:D, vgrf43:F, 0.500000f
   ...
   cmp.g.f0.0(8) vgrf113:D, vgrf43:F, 0.500000f

we put the first instruction in the AEB and decided that we could reuse
its result when we found the second. Unfortunately, that meant that we'd
emit a MOV from the first's destination, which is null.

Don't do anything if the entry's destination is null and the
instruction's destination is non-null.

Tested-by: Tapani Pälli <tapani.palli@intel.com>
9 years agoi965/vec4: Make sure that imm writes are to registers in the same file.
Matt Turner [Mon, 12 Jan 2015 18:48:04 +0000 (10:48 -0800)]
i965/vec4: Make sure that imm writes are to registers in the same file.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87887

9 years agoi965/fs: Emit MADs from (x + abs(y * z)).
Matt Turner [Tue, 13 Jan 2015 21:35:15 +0000 (13:35 -0800)]
i965/fs: Emit MADs from (x + abs(y * z)).

Just use the abs source modifier on both of the multiplicand
arguments.
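
The rewrite rests on the identity abs(y * z) == abs(y) * abs(z). A quick C check of that identity follows; the two functions are illustrative models of the before and after forms, not compiler code, and they use a plain multiply and add rather than a hardware MAD.

```c
#include <assert.h>
#include <math.h>

/* Original expression: ADD of x and the absolute value of a MUL. */
float toy_add_abs_mul(float x, float y, float z)
{
    return x + fabsf(y * z);
}

/* MAD form: the abs modifier is applied to both multiplicands instead. */
float toy_mad_abs_sources(float x, float y, float z)
{
    return fabsf(y) * fabsf(z) + x;
}
```

Putting abs on only one multiplicand would not be enough, since the other operand could still make the product negative.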

instructions in affected programs:     300 -> 296 (-1.33%)

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>
9 years agoi965/fs: Emit MADs from (x + -(y * z)).
Matt Turner [Sat, 20 Dec 2014 05:30:16 +0000 (21:30 -0800)]
i965/fs: Emit MADs from (x + -(y * z)).

Just use the negation source modifier on one of the multiplicand
arguments.

total instructions in shared programs: 5889529 -> 5880016 (-0.16%)
instructions in affected programs:     600846 -> 591333 (-1.58%)

Reviewed-by: Kristian Høgsberg <krh@bitplanet.net>