mesa.git
9 years agosvga: update the svga3d device header files
Brian Paul [Thu, 6 Aug 2015 22:28:19 +0000 (16:28 -0600)]
svga: update the svga3d device header files

Remove some obsolete svga_dump.c code for items which no longer exist.

Signed-off-by: Brian Paul <brianp@vmware.com>
9 years agosvga: add new version 10 device header files
Brian Paul [Fri, 7 Aug 2015 20:56:03 +0000 (14:56 -0600)]
svga: add new version 10 device header files

Signed-off-by: Brian Paul <brianp@vmware.com>
9 years agowinsys/svga: add new vmw_query.c[h] files
Brian Paul [Wed, 29 Jul 2015 17:23:29 +0000 (11:23 -0600)]
winsys/svga: add new vmw_query.c[h] files

Functions for creating, destroying, getting queries, etc.

Signed-off-by: Brian Paul <brianp@vmware.com>
9 years agometa: Compute correct buffer size with SkipRows/SkipPixels
Chris Wilson [Tue, 1 Sep 2015 08:31:15 +0000 (09:31 +0100)]
meta: Compute correct buffer size with SkipRows/SkipPixels

If the user is specifying a subregion of a buffer using SKIP_ROWS and
SKIP_PIXELS, we must compute the buffer size carefully as the end of the
last row may be much shorter than stride*image_height*depth. The current
code tries to memcpy from beyond the end of the user data, for example
causing:

==28136== Invalid read of size 8
==28136==    at 0x4C2D94E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136==    by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136==    by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136==    by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136==    by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136==    by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136==    by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136==    by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136==    by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136==    by 0xB254C9F: texsubimage (teximage.c:3712)
==28136==    by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136==    by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136==  Address 0xd8bfbe0 is 0 bytes after a block of size 1,024 alloc'd
==28136==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136==    by 0x402014: PerfDraw (teximage.c:270)
==28136==    by 0x402648: Draw (glmain.c:182)
==28136==    by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x4019C1: main (glmain.c:262)
==28136==
==28136== Invalid read of size 8
==28136==    at 0x4C2D940: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==28136==    by 0xB4ADFE3: brw_bo_write (brw_batch.c:1856)
==28136==    by 0xB5B3531: brw_buffer_data (intel_buffer_objects.c:208)
==28136==    by 0xB0F6275: _mesa_buffer_data (bufferobj.c:1600)
==28136==    by 0xB0F6346: _mesa_BufferData (bufferobj.c:1631)
==28136==    by 0xB37A1EE: create_texture_for_pbo (meta_tex_subimage.c:103)
==28136==    by 0xB37A467: _mesa_meta_pbo_TexSubImage (meta_tex_subimage.c:176)
==28136==    by 0xB5C8D61: intelTexSubImage (intel_tex_subimage.c:195)
==28136==    by 0xB254AB4: _mesa_texture_sub_image (teximage.c:3654)
==28136==    by 0xB254C9F: texsubimage (teximage.c:3712)
==28136==    by 0xB2550E9: _mesa_TexSubImage2D (teximage.c:3853)
==28136==    by 0x401CA0: UploadTexSubImage2D (teximage.c:171)
==28136==  Address 0xd8bfbe8 is 8 bytes after a block of size 1,024 alloc'd
==28136==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==28136==    by 0x402014: PerfDraw (teximage.c:270)
==28136==    by 0x402648: Draw (glmain.c:182)
==28136==    by 0x8385E63: ??? (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x83896C8: fgEnumWindows (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x838641C: glutMainLoopEvent (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x8386C1C: glutMainLoop (in /usr/lib/x86_64-linux-gnu/libglut.so.3.9.0)
==28136==    by 0x4019C1: main (glmain.c:262)
==28136==

Fixes regression from commit 7f396189f073d626c5f7a2c232dac92b65f5a23f
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Mon Jan 5 18:17:04 2015 -0800

    meta: Add a BlitFramebuffers-based implementation of TexSubImage

v2: However, the teximage we create does need to be width x full_height x 1

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Neil Roberts <neil@linux.intel.com>
Reviewed-by Neil Roberts <neil@linux.intel.com>

9 years agoi965/vec4: fill src_reg type using the constructor type parameter
Alejandro Piñeiro [Tue, 1 Sep 2015 15:02:20 +0000 (17:02 +0200)]
i965/vec4: fill src_reg type using the constructor type parameter

The src_reg constructor that received the glsl_type was using it
only to build the swizzle, but not to fill this->type as dst_reg
is doing.

This caused some type mismatch between movs and alu operations
on the NIR path, so copy propagation optimization was not applied
to remove unneeded movs if negate modifier was involved. This was
first detected on minus (negate+add) operations.

Shader DB results (taking into account only vec4):

total instructions in shared programs: 20019 -> 19934 (-0.42%)
instructions in affected programs:     2918 -> 2833 (-2.91%)
helped:                                79
HURT:                                  0
GAINED:                                0
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agor600g: Add doubles support for CYPRESS
Glenn Kennard [Wed, 12 Aug 2015 00:27:39 +0000 (10:27 +1000)]
r600g: Add doubles support for CYPRESS

This doesn't enable the support, just adds some of
the code, so we don't have to keep rebasing.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agor600g: add doubles support for CAYMAN
Dave Airlie [Fri, 20 Feb 2015 00:47:15 +0000 (10:47 +1000)]
r600g: add doubles support for CAYMAN

Only a subset of AMD GPUs supported by r600g support doubles,
CAYMAN and CYPRESS are probably all we'll try and support, however
I don't have a CYPRESS so ignore that for now.

This disables SB support for doubles, as we think we need to
make the scheduler smarter to introduce delay slots.

[airlied: pushing this to avoid pain of rebasing, it mostly
works on cayman only so far, Glenn has some ideas about
delay slot issues we need to look into. turned off by
default for now]

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agotgsi/scan: add uses_doubles to tgsi scanner
Dave Airlie [Fri, 20 Feb 2015 00:40:46 +0000 (10:40 +1000)]
tgsi/scan: add uses_doubles to tgsi scanner

This allows drivers to work out if a shader contains any
double opcodes easily.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agor600g: add multiple stream support for geom shaders
Glenn Kennard [Thu, 9 Jul 2015 06:37:28 +0000 (16:37 +1000)]
r600g: add multiple stream support for geom shaders

This patch is taken from work by Glenn and myself,
and I've spent some time making it all work here.

This adds support for the multiple streams part of
ARB_gpu_shader5 to r600g.

It doesn't enable ARB_gpu_shader5 yet.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agor600g/sb: add support for multiple streams to SB backend
Dave Airlie [Thu, 9 Jul 2015 06:36:16 +0000 (16:36 +1000)]
r600g/sb: add support for multiple streams to SB backend

This adds a peephole and removes an assert that isn't
actually valid with some of the stream emit instructions.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agor600g: add support for streams to the assembler.
Dave Airlie [Thu, 9 Jul 2015 06:30:26 +0000 (16:30 +1000)]
r600g: add support for streams to the assembler.

This just adds support to the assembler dumper and allows
stream instructions to be generated. Also fix up the stream
debugging to add stream info.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agor600g/sb: dump sampler/resource index modes for textures.
Dave Airlie [Tue, 25 Aug 2015 01:18:48 +0000 (11:18 +1000)]
r600g/sb: dump sampler/resource index modes for textures.

This just aids debugging.

Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agomesa/readpixels: check strides are equal before skipping conversion
Dave Airlie [Tue, 1 Sep 2015 05:57:02 +0000 (15:57 +1000)]
mesa/readpixels: check strides are equal before skipping conversion

The CTS packed_pixels test checks that readpixels doesn't write
into the space between rows, however we fail that here unless
we check the format and stride match.

This fixes all the core mesa problems with CTS packed_pixels
tests.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agotexcompress_s3tc/fxt1: fix stride checks (v1.1)
Dave Airlie [Tue, 1 Sep 2015 05:44:46 +0000 (15:44 +1000)]
texcompress_s3tc/fxt1: fix stride checks (v1.1)

The fastpath currently checks the RowLength != width, but
if you have a RowLength of 7, and Alignment of 4, then
that shouldn't match.

align the rowlength to the pack alignment before comparing.

This fixes compressed cases in CTS packed_pixels_pixelstore
test when SKIP_PIXELS is enabled, which causes row length
to get set.

v1.1: add fxt1 fix (Iago)

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agost/readpixels: fix accel path for skipimages.
Dave Airlie [Tue, 1 Sep 2015 05:13:45 +0000 (15:13 +1000)]
st/readpixels: fix accel path for skipimages.

We don't need to use the 3d image address here as that will
include SKIP_IMAGES, and we are only blitting a single
2D anyways, so just use the 2D path.

This fixes some memory overruns under CTS
 packed_pixels.packed_pixels_pixelstore when PACK_SKIP_IMAGES
is used.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agomesa/formats: 8-bit channel integer formats addition
Dave Airlie [Thu, 30 Jul 2015 01:48:37 +0000 (02:48 +0100)]
mesa/formats: 8-bit channel integer formats addition

Add enough 8-bit channel formats to handle all the
different things CTS throws at us.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agomesa/formats: add some formats from GL3.3
Dave Airlie [Thu, 30 Jul 2015 01:48:36 +0000 (02:48 +0100)]
mesa/formats: add some formats from GL3.3

GL3.3 added GL_ARB_texture_rgb10_a2ui, which specifies
a lot more things than just rgb10/a2ui.

While playing with ogl conform one of the tests must
attempted all valid formats for GL3.3 and hits the
unreachable here.

This adds the first chunk of formats that hit the
assert.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agomesa: handle SwapBytes in compressed texture get code.
Dave Airlie [Tue, 25 Aug 2015 11:13:13 +0000 (21:13 +1000)]
mesa: handle SwapBytes in compressed texture get code.

This case just wasn't handled, so add support for it.

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agomesa: fix SwapBytes handling in numerous places
Dave Airlie [Tue, 25 Aug 2015 04:36:01 +0000 (14:36 +1000)]
mesa: fix SwapBytes handling in numerous places

In a number of places the SwapBytes handling didn't handle cases with
GL_(UN)PACK_ALIGNMENT set and 7 byte width cases aligned to 8 bytes.

This adds a common routine to swap bytes a 2D image and uses this
code in:

texture storage
texture get
readpixels
swrast drawpixels.

[airlied: updated with Brian's nitpicks].

Cc: "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agoauxiliary/os: Don't implement os_get_option() on embedded builds.
José Fonseca [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
auxiliary/os: Don't implement os_get_option() on embedded builds.

Let it be defined externally instead, allowing setting mechanisms other
than environment variables.

Reviewed-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Matthew McClure <mcclurem@vmware.com>
9 years agoutil: add a couple primitive restart helper functions
Brian Paul [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
util: add a couple primitive restart helper functions

The first function translates prim restart indexes to be 0xffff or
0xffffffff.

The second splits indexed primitives with restart indexes into sub-
primitives without restart indexes.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
9 years agotgsi: add tgsi utility to transform a fragment shader to support aa point
Charmaine Lee [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
tgsi: add tgsi utility to transform a fragment shader to support aa point

This adds a tgsi utility tgsi_add_aa_point to transform a fragment shader
to support anti-aliased wide point by computing the fragment distance from
the point center. This utility assumes the geometry shader is emitting
an extra generic output with point coord data. The semantic index of
this generic output is passed to the tgsi_add_aa_point utility.

Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agotgsi: adds tgsi utility to transform a shader to support point sprite
Charmaine Lee [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
tgsi: adds tgsi utility to transform a shader to support point sprite

This adds a tgsi utility tgsi_add_point_sprite to transform a geometry
shader to emulate wide points by drawing quads. This utility adds an
extra output for the original point position if the point position is
to be written to a stream output buffer. It also assumes the driver will
add a constant for inverse viewport scale after the user defined constants.

Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agotgsi: add new tgsi_two_side.c utility code
Brian Paul [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
tgsi: add new tgsi_two_side.c utility code

This could be used by any driver where the device doesn't directly
support two-sided lighting.  This code modifies a fragment shader
to accecpt back-face colors and choose between the front/back colors
depending on the triangle's front-face sign.

9 years agoutil: add util_strcasecmp() wrapper
Brian Paul [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
util: add util_strcasecmp() wrapper

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
9 years agogallium/util: add a utility to create geometry passthrough shader
Charmaine Lee [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
gallium/util: add a utility to create geometry passthrough shader

Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agogallium/util: fix returning empty box for rectangle intersection
Roland Scheidegger [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
gallium/util: fix returning empty box for rectangle intersection

These functions deal with inclusive coordinates, hence a 0/0/0/0 rect
returned when there's no intersection doesn't actually represent an empty
rectangle. Hence return 0/-1/0/-1 instead.
This fixes some problems in llvmpipe with empty scissor rects (which up
to now didn't really matter because while the intersect test returned the
wrong result all pixels were scissored away later anyway).

9 years agogallium/util: return FALSE for intersection if there's empty rectangles
Roland Scheidegger [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
gallium/util: return FALSE for intersection if there's empty rectangles

It isn't really obvious if intersection test should take into account empty
rectangles or if the caller should do it. But it looks like most callers
actually verified one of the rects but not the other, but since correctly
returning an empty rect that other rect could actually be empty leading to
more bugs. Hence just verify both rects for emptyness in the intersection
test itself which makes the code easier in the caller (though it will be
slower if the caller knows the rectangles are non-empty).

Reviewed-by: Zack Rusin <zackr@vmware.com>
9 years agotgsi: add some more helper functions
Charmaine Lee [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
tgsi: add some more helper functions

This patch adds some more helper functions such as
   . tgsi_transform_temps_decl
   . tgsi_transform_output_decl
   . tgsi_transform_dst_reg
   . tgsi_transform_src_reg

Reviewed-by: Brian Paul <brianp@vmware.com>
9 years agotgsi: added tgsi_is_shadow_target() helper
Brian Paul [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
tgsi: added tgsi_is_shadow_target() helper

9 years agotgsi: add negate parameter to tgsi_transform_kill_inst()
Brian Paul [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
tgsi: add negate parameter to tgsi_transform_kill_inst()

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
9 years agoutil: added ffsll() function
Brian Paul [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
util: added ffsll() function

v2: fix errant _GNU_SOURCE test, per Matt Turner.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
9 years agoutil: added util_set_index_buffer()
Brian Paul [Tue, 1 Sep 2015 22:29:17 +0000 (16:29 -0600)]
util: added util_set_index_buffer()

Like util_set_vertex_buffers_count(), this basically just copies a
pipe_index_buffer object, taking care of refcounting.

9 years agomesa: Move gl_vert_attrib from mtypes.h to shader_enums.h
Jason Ekstrand [Mon, 31 Aug 2015 21:55:49 +0000 (14:55 -0700)]
mesa: Move gl_vert_attrib from mtypes.h to shader_enums.h

It is a shader enum after all...

Acked-by: Brian Paul <brianp@vmware.com>
9 years agoglapi: Inline x86_64_current_tls().
Matt Turner [Fri, 26 Sep 2014 00:28:20 +0000 (17:28 -0700)]
glapi: Inline x86_64_current_tls().

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
9 years agor600g: Simplify out a couple of unnecessary branches
Edward O'Callaghan [Tue, 1 Sep 2015 08:38:34 +0000 (18:38 +1000)]
r600g: Simplify out a couple of unnecessary branches

Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
9 years agoradeonsi: use an indirect buffer for init_config
Marek Olšák [Sun, 30 Aug 2015 16:46:06 +0000 (18:46 +0200)]
radeonsi: use an indirect buffer for init_config

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: add IB2 indirect buffer support for pm4 states
Marek Olšák [Sun, 30 Aug 2015 16:39:19 +0000 (18:39 +0200)]
radeonsi: add IB2 indirect buffer support for pm4 states

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agowinsys/radeon: add a flag telling how gfx IBs should be padded
Marek Olšák [Sun, 30 Aug 2015 15:41:23 +0000 (17:41 +0200)]
winsys/radeon: add a flag telling how gfx IBs should be padded

This is always false on amdgpu (set by calloc).

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agowinsys/amdgpu: remove IB padding for SI
Marek Olšák [Sun, 30 Aug 2015 15:39:03 +0000 (17:39 +0200)]
winsys/amdgpu: remove IB padding for SI

SI is unsupported by amdgpu

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: remove unused macro si_pm4_set_state
Marek Olšák [Sun, 30 Aug 2015 12:43:59 +0000 (14:43 +0200)]
radeonsi: remove unused macro si_pm4_set_state

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: remove si_pm4_cleanup
Marek Olšák [Sun, 30 Aug 2015 12:39:54 +0000 (14:39 +0200)]
radeonsi: remove si_pm4_cleanup

All remaining pm4 state are created and destroyed by state trackers.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: rework uploading border colors
Marek Olšák [Sun, 30 Aug 2015 12:13:10 +0000 (14:13 +0200)]
radeonsi: rework uploading border colors

The border colors are uploaded only once when the state is created.

This brings truly immutable sampler descriptors, because they don't have
to be updated every time a sampler state is re-bound.

It also moves the TA_BC_BASE_ADDR registers to init_config, removing one
more state. The catch is there is now a limit: only 4096 border colors can
be used by one context. I don't think that will be a problem.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: use all built-in border colors
Marek Olšák [Sun, 30 Aug 2015 11:17:15 +0000 (13:17 +0200)]
radeonsi: use all built-in border colors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: inline si_cmd_context_control
Marek Olšák [Sun, 30 Aug 2015 10:39:45 +0000 (12:39 +0200)]
radeonsi: inline si_cmd_context_control

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: remove unused si_pm4_state code
Marek Olšák [Sun, 30 Aug 2015 10:35:02 +0000 (12:35 +0200)]
radeonsi: remove unused si_pm4_state code

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: reorder si_context variables
Marek Olšák [Sun, 30 Aug 2015 10:25:03 +0000 (12:25 +0200)]
radeonsi: reorder si_context variables

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: don't send IB dword usage to si_need_cs_space
Marek Olšák [Sun, 30 Aug 2015 01:56:13 +0000 (03:56 +0200)]
radeonsi: don't send IB dword usage to si_need_cs_space

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: don't set number of IB dwords for states
Marek Olšák [Sun, 30 Aug 2015 01:53:39 +0000 (03:53 +0200)]
radeonsi: don't set number of IB dwords for states

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: don't count IB space for states, just use an upper bound
Marek Olšák [Sun, 30 Aug 2015 01:49:15 +0000 (03:49 +0200)]
radeonsi: don't count IB space for states, just use an upper bound

Since we don't put any resource descriptors in IBs, the space used by draw
calls is quite small.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: convert SPI state to an atom
Marek Olšák [Sun, 30 Aug 2015 01:17:30 +0000 (03:17 +0200)]
radeonsi: convert SPI state to an atom

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agogallium/radeon: rename r600_context_bo_reloc -> radeon_add_to_buffer_list
Marek Olšák [Sun, 30 Aug 2015 00:04:37 +0000 (02:04 +0200)]
gallium/radeon: rename r600_context_bo_reloc -> radeon_add_to_buffer_list

this name should be easy to understand without other knowledge

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agogallium/radeon: rename write_*_reg functions
Marek Olšák [Sat, 29 Aug 2015 23:54:00 +0000 (01:54 +0200)]
gallium/radeon: rename write_*_reg functions

e.g. radeon_set_context_reg is nicer and looks consistent next to
radeon_emit().

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: rename and precalculate polygon offset states
Marek Olšák [Sat, 29 Aug 2015 23:35:03 +0000 (01:35 +0200)]
radeonsi: rename and precalculate polygon offset states

one less calloc and state construction while drawing

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: convert CB_TARGET_MASK setup to an atom
Marek Olšák [Sat, 29 Aug 2015 22:50:42 +0000 (00:50 +0200)]
radeonsi: convert CB_TARGET_MASK setup to an atom

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: don't set VGT_VTX_CNT_EN twice in init_config
Marek Olšák [Sat, 29 Aug 2015 22:16:01 +0000 (00:16 +0200)]
radeonsi: don't set VGT_VTX_CNT_EN twice in init_config

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: convert stencil ref state into an atom
Marek Olšák [Sat, 29 Aug 2015 15:00:11 +0000 (17:00 +0200)]
radeonsi: convert stencil ref state into an atom

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: convert blend color state into an atom
Marek Olšák [Sat, 29 Aug 2015 13:05:53 +0000 (15:05 +0200)]
radeonsi: convert blend color state into an atom

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: convert sample mask state into an atom
Marek Olšák [Sat, 29 Aug 2015 13:05:53 +0000 (15:05 +0200)]
radeonsi: convert sample mask state into an atom

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: convert clip state into an atom
Marek Olšák [Sat, 29 Aug 2015 12:54:58 +0000 (14:54 +0200)]
radeonsi: convert clip state into an atom

Reducing calloc overhead.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: avoid redundant CB and DB register updates
Marek Olšák [Sat, 29 Aug 2015 00:32:13 +0000 (02:32 +0200)]
radeonsi: avoid redundant CB and DB register updates

The main idea is to avoid setting CB_COLORi_INFO = 0 for i>0 repeatedly
when those colorbuffers aren't used. This is mainly for glamor.

Same for DB. Z_INFO and STENCIL_INFO need to be cleared only once.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: don't rebind GSVS ring buffers every draw call using GS
Marek Olšák [Sat, 29 Aug 2015 00:02:29 +0000 (02:02 +0200)]
radeonsi: don't rebind GSVS ring buffers every draw call using GS

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: don't clear the tessellation factor ring buffer
Marek Olšák [Fri, 28 Aug 2015 23:56:27 +0000 (01:56 +0200)]
radeonsi: don't clear the tessellation factor ring buffer

Leftover from the bring-up.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: remove the tf_ring state, add the registers to init_config
Marek Olšák [Fri, 28 Aug 2015 23:45:28 +0000 (01:45 +0200)]
radeonsi: remove the tf_ring state, add the registers to init_config

One less state to worry about.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: remove the gs_rings state, add the registers to init_config
Marek Olšák [Fri, 28 Aug 2015 23:45:28 +0000 (01:45 +0200)]
radeonsi: remove the gs_rings state, add the registers to init_config

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: use a bitmask for tracking dirty atoms
Marek Olšák [Fri, 28 Aug 2015 22:49:40 +0000 (00:49 +0200)]
radeonsi: use a bitmask for tracking dirty atoms

This mainly removes the cache misses when checking the dirty flags.
Not much else though.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: initialize atom IDs for external atoms
Marek Olšák [Fri, 28 Aug 2015 22:03:02 +0000 (00:03 +0200)]
radeonsi: initialize atom IDs for external atoms

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: call si_init_atom for remaining radeonsi atoms
Marek Olšák [Fri, 28 Aug 2015 21:52:47 +0000 (23:52 +0200)]
radeonsi: call si_init_atom for remaining radeonsi atoms

I need to initialize more atom IDs.

This adds 4 more si_init_atom calls, which simplifies the code.
(si_init_atom needs a different context type of the emit functions though)

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: initialize atom IDs
Marek Olšák [Fri, 28 Aug 2015 21:26:50 +0000 (23:26 +0200)]
radeonsi: initialize atom IDs

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: define the state atom array separately
Marek Olšák [Fri, 28 Aug 2015 19:59:22 +0000 (21:59 +0200)]
radeonsi: define the state atom array separately

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: optimize viewport states
Marek Olšák [Fri, 28 Aug 2015 19:48:37 +0000 (21:48 +0200)]
radeonsi: optimize viewport states

same as scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: optimize scissor states
Marek Olšák [Fri, 28 Aug 2015 19:08:49 +0000 (21:08 +0200)]
radeonsi: optimize scissor states

- convert 16 states to 1 atom
- only emit 1 scissor if VIEWPORT_INDEX isn't written
- use only one packet when emitting consecutive scissors

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: add SI_MAX_ATTRIBS
Marek Olšák [Fri, 28 Aug 2015 20:33:02 +0000 (22:33 +0200)]
radeonsi: add SI_MAX_ATTRIBS

PIPE_MAX_ATTRIBS is 32, but we currently only support 16.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: fix memory usage checking for big IBs
Marek Olšák [Sun, 30 Aug 2015 01:44:03 +0000 (03:44 +0200)]
radeonsi: fix memory usage checking for big IBs

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: set all 16 viewport Z bounds for GL 4.1
Marek Olšák [Sat, 29 Aug 2015 22:12:03 +0000 (00:12 +0200)]
radeonsi: set all 16 viewport Z bounds for GL 4.1

Cc: 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoradeonsi: fix a Unigine Heaven hang when drirc is missing
Marek Olšák [Sat, 29 Aug 2015 20:59:23 +0000 (22:59 +0200)]
radeonsi: fix a Unigine Heaven hang when drirc is missing

Cc: 10.6 11.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agowinsys/amdgpu: use small IBs for better performance on VI
Marek Olšák [Sun, 30 Aug 2015 09:59:23 +0000 (11:59 +0200)]
winsys/amdgpu: use small IBs for better performance on VI

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agogallium/util: add u_bit_scan_consecutive_range
Marek Olšák [Fri, 28 Aug 2015 18:53:08 +0000 (20:53 +0200)]
gallium/util: add u_bit_scan_consecutive_range

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
9 years agoi965: Prevent coordinate overflow in intel_emit_linear_blit
Chris Wilson [Sat, 6 Jun 2015 08:33:33 +0000 (09:33 +0100)]
i965: Prevent coordinate overflow in intel_emit_linear_blit

Fixes regression from
commit 8c17d53823c77ac1c56b0548e4e54f69a33285f1
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Wed Apr 15 03:04:33 2015 -0700

    i965: Make intel_emit_linear_blit handle Gen8+ alignment restrictions.

which adjusted the coordinates to be relative to the nearest cacheline.
However, this then offsets the coordinates by up to 63 and this may then
cause them to overflow the BLT limits. For the well aligned large
transfer case, we can use 32bpp pixels and so reduce the coordinates by
4 (versus the current 8bpp pixels). We also have to be more careful
doing the last line just in case it may exceed the coordinate limit.

Reported-and-tested-by: kaillasse91@hotmail.fr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90734
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
9 years agoi965/nir: enable the dead control flow optimization
Connor Abbott [Fri, 1 May 2015 06:51:12 +0000 (02:51 -0400)]
i965/nir: enable the dead control flow optimization

total instructions in shared programs: 7541551 -> 7541381 (-0.00%)
instructions in affected programs:     3054 -> 2884 (-5.57%)
helped:                                29

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir/dead_cf: add support for removing useless loops
Connor Abbott [Fri, 8 May 2015 18:42:14 +0000 (14:42 -0400)]
nir/dead_cf: add support for removing useless loops

v2: fix detecting if the loop has any phi nodes after it.
v2: use nir_foreach_ssa_def() instead of nir_foreach_dest() when
    checking for values live after the loop to catch const_load
    instructions.
v2: fix handling return instructions
v2: add some documentation to loop_is_dead()

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agonir: add a helper for iterating over blocks in a cf node
Connor Abbott [Fri, 8 May 2015 18:40:58 +0000 (14:40 -0400)]
nir: add a helper for iterating over blocks in a cf node

We were already doing this internally for iterating over a function
implementation, so just expose it directly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir: add nir_block_get_following_loop() helper
Connor Abbott [Fri, 8 May 2015 17:17:10 +0000 (13:17 -0400)]
nir: add nir_block_get_following_loop() helper

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir/dead_cf: delete code that's unreachable due to jumps
Connor Abbott [Fri, 8 May 2015 05:44:24 +0000 (01:44 -0400)]
nir/dead_cf: delete code that's unreachable due to jumps

v2: use nir_cf_node_remove_after().
v2: use foreach_list_typed() instead of hardcoding a list walk.
v3: update to new control flow modification helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agonir: add an optimization for removing dead control flow
Connor Abbott [Fri, 1 May 2015 06:38:17 +0000 (02:38 -0400)]
nir: add an optimization for removing dead control flow

v2: use nir_cf_node_remove_after() instead of our own broken thing.
v3: use the new control flow modification helpers.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agor600g: fix calculation for gpr allocation
Dave Airlie [Tue, 1 Sep 2015 02:29:58 +0000 (12:29 +1000)]
r600g: fix calculation for gpr allocation

I've been chasing a geom shader hang on rv635 since I wrote
r600 geom code, and finally I hacked some values from fglrx
in and I could run texelfetch without failures.

This is totally my fault as well, maths fail 101.

This makes geom shaders on r600 not fail heavily.

Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agomesa: Limit Framebuffer Parameter OpenGL ES 3.1 usage
Marta Lofstedt [Mon, 24 Aug 2015 11:01:53 +0000 (13:01 +0200)]
mesa: Limit Framebuffer Parameter OpenGL ES 3.1 usage

According to OpenGL ES 3.1 specification, section 9.2.1 for
glFramebufferParameter and section 9.2.3 for glGetFramebufferParameteriv:

"An INVALID_ENUM error is generated if pname is not FRAMEBUFFER_DEFAULT_WIDTH,
FRAMEBUFFER_DEFAULT_HEIGHT, FRAMEBUFFER_DEFAULT_SAMPLES, or
FRAMEBUFFER_DEFAULT_FIXED_SAMPLE_LOCATIONS."

Therefore exclude OpenGL ES 3.1 from using the GL_FRAMEBUFFER_DEFAULT_LAYERS
parameter.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Kevin Rogovin <kevin.rogovin at intel.com>
9 years agomesa: Expose GL_ARB_framebuffer_no_attachments to GLES 3.1
Marta Lofstedt [Tue, 1 Sep 2015 05:19:11 +0000 (08:19 +0300)]
mesa: Expose GL_ARB_framebuffer_no_attachments to GLES 3.1

V2: Conform to new standard for exposing enums for OpenGL ES 3.1.

Signed-off-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
9 years agonir/builder: Use nir_after_instr to advance the cursor
Jason Ekstrand [Mon, 31 Aug 2015 23:54:02 +0000 (16:54 -0700)]
nir/builder: Use nir_after_instr to advance the cursor

This *should* ensure that the cursor gets properly advanced in all cases.
We had a problem before where, if the cursor was created using
nir_after_cf_node on a non-block cf_node, that would call nir_before_block
on the block following the cf node.  Instructions would then get inserted
in backwards order at the top of the block which is not at all what you
would expect from nir_after_cf_node.  By just resetting to after_instr, we
avoid all these problems.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
9 years agoi965: advertise ASTC support for Skylake
Nanley Chery [Tue, 19 May 2015 19:28:20 +0000 (12:28 -0700)]
i965: advertise ASTC support for Skylake

v2: remove OES ASTC extension reference.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
9 years agomesa/glformats: recognize ASTC formats as color formats
Nanley Chery [Mon, 31 Aug 2015 23:38:09 +0000 (16:38 -0700)]
mesa/glformats: recognize ASTC formats as color formats

ASTC formats contain RGBA components.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
9 years agomesa/texformat: use format conversion function in _mesa_choose_tex_format
Nanley Chery [Wed, 12 Aug 2015 21:41:50 +0000 (14:41 -0700)]
mesa/texformat: use format conversion function in _mesa_choose_tex_format

This function's cases for non-generic compressed formats duplicate
the GL to MESA translation in _mesa_glenum_to_compressed_format().
This patch replaces the switch cases with a call to the translation
function. This change teaches this function about ASTC, thus enabling
ASTC for glTex*Storage*() calls.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
9 years agomesa/texcompress: correct mapping of S3TC formats in conversion function
Nanley Chery [Wed, 26 Aug 2015 19:01:38 +0000 (12:01 -0700)]
mesa/texcompress: correct mapping of S3TC formats in conversion function

MESA_FORMAT_RGBA_DXT5 should actually be reserved for GL_RGBA[4]_DXT5_S3TC.
Also, Gallium and other dri drivers (radeon and nouveau) follow this mapping
scheme.

Reviewed-by: Chad Versace <chad.versace@intel.com>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
9 years agor600/sb: update last_cf for finalize if.
Dave Airlie [Mon, 31 Aug 2015 04:22:23 +0000 (14:22 +1000)]
r600/sb: update last_cf for finalize if.

As Glenn did for finalize_loop we need to update_cf when we
add a POP at the end of a shader.

I think this fixes one of the earlier shader going off end
of memory problems we've stopped.

Reviewed-by: Glenn Kennard <glenn.kennard@gmail.com>
Cc: "10.6" "11.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agoi965/fs: Use greater-equal cmod to implement maximum.
Matt Turner [Sat, 29 Aug 2015 00:10:00 +0000 (17:10 -0700)]
i965/fs: Use greater-equal cmod to implement maximum.

The docs specifically call out SEL with .l and .ge as the
implementations of MIN and MAX respectively. Among other things,
SEL with these conditional mods are commutative.

See commit 3b7f683f.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
9 years agoi965/chv|skl: Apply sampler bypass w/a
Ben Widawsky [Thu, 9 Jul 2015 00:04:10 +0000 (17:04 -0700)]
i965/chv|skl: Apply sampler bypass w/a

Certain compressed formats require this setting. The docs don't go into much
detail as to why it's needed exactly.

This patch introduces no piglit regressions on gen9 (bsw is untested). Note that
the SKL "regressions" are fixed tests, and the egl_khr_gl_colorspace tests are
WTF. The patch also fixes nothing I can find.
http://otc-mesa-ci.jf.intel.com/job/Leeroy/127820/

v2:
Reworded commit message (Matt); Added piglit results link.
Restructured condition (Matt)
Moved check out to function (Nanley). I left the setting of the bit in the
  surface state open coded because it seems to go better with the existing code.

v3:
Use and inline function only in gen8_emit_texture_surface_state() (Matt).

Cc: Matt Turner <mattst88@gmail.com>
Cc: Nanley Chery <nanleychery@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
9 years agost/mesa: move to renumbering registers in a group
Dave Airlie [Thu, 27 Aug 2015 01:13:14 +0000 (02:13 +0100)]
st/mesa: move to renumbering registers in a group

This can be done with a single pass for the instruction base,
and takes renumber_registers out of its spot on the profile.

Acked-by: Marek Olšák <marek.olsak@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agost/mesa: reduce time spent in calculating temp read/writes
Dave Airlie [Thu, 27 Aug 2015 00:46:33 +0000 (01:46 +0100)]
st/mesa: reduce time spent in calculating temp read/writes

The glsl->tgsi convertor does some temporary register reduction
however in profiling shader-db this shows up quite highly,

so optimise things to reduce the number of loops through
all the instructions we do. This drops merge_registers
from 4-5% on the profile to 1%. I think this can be reduced
further by possibly optimising the renumber pass.

Acked-by: Marek Olšák <marek.olsak@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agost/mesa: cache tgsi opcode info in the instruction
Dave Airlie [Thu, 27 Aug 2015 00:01:00 +0000 (01:01 +0100)]
st/mesa: cache tgsi opcode info in the instruction

Instead of looking this up lots, lets just cache it in the instruction
translation up front. I just noticed this function what high in a profile
of shader-db on radeonsi.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
9 years agor600: move prim convert from geom shader to function.
Dave Airlie [Sun, 30 Aug 2015 10:40:31 +0000 (20:40 +1000)]
r600: move prim convert from geom shader to function.

This should avoid C++ fail including this header.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>