mesa.git
7 years agoisl: Refactor row pitch calculation (v2)
Chad Versace [Sat, 25 Feb 2017 00:23:02 +0000 (16:23 -0800)]
isl: Refactor row pitch calculation (v2)

The calculations of row_pitch, the row pitch's alignment, surface size,
and base_alignment were mixed together. This patch moves the calculation
of row_pitch and its alignment to occur before the calculation of
surface_size and base_alignment.

This simplifies a follow-on patch that adds a new member, 'row_pitch',
to struct isl_surf_init_info.

v2:
  - Also extract the row pitch alignment.
  - More helper functions that will later help validate the row pitch.
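
The ordering described above can be sketched as follows. This is only an illustration of "row pitch first, derived values second"; the names align_u32, struct layout and compute_layout are hypothetical, and the arithmetic is a simplification rather than isl's actual formulas.

```c
#include <stdint.h>

/* Round v up to the next multiple of a (a must be non-zero). */
static uint32_t align_u32(uint32_t v, uint32_t a)
{
   return (v + a - 1) / a * a;
}

/* Hypothetical result type, for illustration only. */
struct layout {
   uint32_t row_pitch;  /* bytes between the starts of consecutive rows */
   uint64_t size;       /* total bytes, derived from row_pitch */
};

static struct layout compute_layout(uint32_t width_px, uint32_t rows,
                                    uint32_t bytes_per_px,
                                    uint32_t row_pitch_alignment)
{
   struct layout l;

   /* Step 1: the row pitch and its alignment, computed on their own. */
   uint32_t min_row_pitch = width_px * bytes_per_px;
   l.row_pitch = align_u32(min_row_pitch, row_pitch_alignment);

   /* Step 2: everything that depends on the row pitch comes afterwards. */
   l.size = (uint64_t)l.row_pitch * rows;
   return l;
}
```

With the row pitch isolated in step 1, a caller-supplied row pitch could later be validated or substituted there without touching step 2.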

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
7 years agoisl: Drop misplaced comment about padding
Chad Versace [Fri, 10 Mar 2017 23:13:16 +0000 (15:13 -0800)]
isl: Drop misplaced comment about padding

isl has a giant comment that explains the hardware's padding
requirements. (Hint: Cache lines and page faults). But the comment is in
the wrong place, in isl_calc_linear_row_pitch(), which is unrelated to
padding.

The important parts of that comment were copied to
isl_apply_surface_padding() long ago. So drop the misplaced comment.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/dri: Turn on support for image modifiers
Ben Widawsky [Sat, 18 Mar 2017 18:56:31 +0000 (11:56 -0700)]
i965/dri: Turn on support for image modifiers

All the plumbing is in place so the extension just needs to be
advertised.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/dri: Handle X-tiled modifier
Ben Widawsky [Mon, 2 Jan 2017 23:01:54 +0000 (15:01 -0800)]
i965/dri: Handle X-tiled modifier

This doesn't really "do" anything because the default tiling for the
winsys buffer is X tiled. We do however want the X tiled modifier to
work correctly from the API perspective, which would imply that if you
set this modifier, and later do a get_modifier, you get back at least X
tiled.

Running with a modified kmscube, here are the bandwidth measurements.

Linear:
Read bandwidth: 1039.31 MiB/s
Write bandwidth: 1453.56 MiB/s

Y-tiled:
Read bandwidth: 458.29 MiB/s
Write bandwidth: 542.12 MiB/s

X-tiled:
Read bandwidth: 575.01 MiB/s
Write bandwidth: 606.25 MiB/s

Cc: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/dri: Handle Y-tiled modifier
Ben Widawsky [Fri, 4 Nov 2016 19:34:40 +0000 (12:34 -0700)]
i965/dri: Handle Y-tiled modifier

This patch begins introducing how we'll actually handle the potentially
many modifiers coming in from the API, how we'll store them, and the
structure in the code to support it.

Prior to this patch, the Y-tiled modifier would be entirely ignored. It
shouldn't actually be used until this point because we've not bumped the
DRIimage extension version (which is a requirement to use modifiers).

Measuring later in the series with kmscube:
Linear:
Read bandwidth: 1048.44 MiB/s
Write bandwidth: 1483.17 MiB/s

Y-tiled:
Read bandwidth: 471.13 MiB/s
Write bandwidth: 589.10 MiB/s

Similar functionality was introduced and then reverted here:

commit 6a0d036483caf87d43ebe2edd1905873446c9589
Author: Ben Widawsky <ben@bwidawsk.net>
Date:   Thu Apr 21 20:14:58 2016 -0700

    i965: Always use Y-tiled buffers on SKL+

v2: Use last set bit instead of first set bit in modifiers to address
bug found by Daniel Stone.

v3: Use the new priority modifier selection thing. This nullifies the
bug fixed by v2 also.

v4: Get rid of modifier compaction which originally served another
purpose and now serves none (Jason)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/dri: Handle the linear fb modifier
Ben Widawsky [Fri, 13 Jan 2017 20:01:37 +0000 (12:01 -0800)]
i965/dri: Handle the linear fb modifier

At image creation create a path for dealing with the linear modifier.
This works exactly like the old usage flags where __DRI_IMAGE_USE_LINEAR
was specified.

During development of this patch series, it was decided that a lack of
modifier was an insufficient way to express the required modifiers. As a
result, 0 was repurposed to mean a modifier for a LINEAR layout.

NOTE: This patch was added for v3 of the patch series.

v2: Rework the algorithm for modifier selection to go from a bitmask
based selection to this priority value.

v3: Make DRM_FORMAT_MOD_INVALID allowed at selection as a way of
identifying no modifiers found (because 0 is LINEAR) (Jason)

v4: Remove the logic to prune unknown modifiers (like those from other
vendors) and simply handle it in select_best_modifier (Jason)
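
A minimal sketch of priority-based selection as described in v2 through v4, assuming Y-tiled is preferred over X-tiled over linear (that ordering is an assumption, not stated in this message). The MOD_* macros repeat the numeric values from drm_fourcc.h so the sketch is self-contained; modifier_priority and pick_best_modifier are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Numeric values as in drm_fourcc.h, repeated to keep this self-contained. */
#define MOD_INVALID 0x00ffffffffffffffULL   /* DRM_FORMAT_MOD_INVALID */
#define MOD_LINEAR  0x0000000000000000ULL   /* DRM_FORMAT_MOD_LINEAR */
#define MOD_X_TILED ((0x01ULL << 56) | 1)   /* I915_FORMAT_MOD_X_TILED */
#define MOD_Y_TILED ((0x01ULL << 56) | 2)   /* I915_FORMAT_MOD_Y_TILED */

/* Assumed ordering: higher number wins. 0 means "not usable". */
static int modifier_priority(uint64_t mod)
{
   switch (mod) {
   case MOD_Y_TILED: return 3;
   case MOD_X_TILED: return 2;
   case MOD_LINEAR:  return 1;
   default:          return 0;  /* unknown vendor modifier or INVALID */
   }
}

/* Return the highest-priority known modifier, or MOD_INVALID if the list
 * holds none. Unknown modifiers are skipped here instead of being pruned
 * from the list beforehand. */
static uint64_t pick_best_modifier(const uint64_t *mods, size_t count)
{
   uint64_t best = MOD_INVALID;
   int best_prio = 0;

   for (size_t i = 0; i < count; i++) {
      int prio = modifier_priority(mods[i]);
      if (prio > best_prio) {
         best_prio = prio;
         best = mods[i];
      }
   }
   return best;
}
```

Because 0 is a valid modifier (LINEAR), "nothing found" has to be a distinct value, which is why the sketch returns MOD_INVALID in that case.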

Requested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/dri: Enable modifier queries
Ben Widawsky [Fri, 17 Mar 2017 20:29:08 +0000 (13:29 -0700)]
i965/dri: Enable modifier queries

New to the patch series after reordering things for landing smaller
chunks.

This will essentially enable modifiers from clients that were just
enabled in previous patches. A client could use the modifiers by
setting all of them at create, but had no way to actually query them
after creating the surface (i.e. stupid clients could be broken before
this patch, but in more ways than this).

Obviously, there are no modifiers being actually stored yet - so this
patch shouldn't do anything other than allow the API to get back 0 (or
the LINEAR modifier).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/dri: Store the screen associated with the image
Ben Widawsky [Thu, 20 Oct 2016 21:51:53 +0000 (14:51 -0700)]
i965/dri: Store the screen associated with the image

I will need to get to the devinfo structure, and storing the screen
is an easy way to do that.

It seems to be the consensus that you cannot share an image between
multiple screens.

Scape-goat: Rob Clark <robdclark@gmail.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agogbm: Disallow INVALID modifiers returned upon image creation
Ben Widawsky [Tue, 21 Mar 2017 18:59:51 +0000 (11:59 -0700)]
gbm: Disallow INVALID modifiers returned upon image creation

v2: Add a TODO about modifier validation (Jason)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/dri: Disallow image with INVALID modifier
Ben Widawsky [Tue, 21 Mar 2017 18:59:33 +0000 (11:59 -0700)]
i965/dri: Disallow image with INVALID modifier

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965: Shut up major()/minor() warnings.
Kenneth Graunke [Mon, 20 Mar 2017 23:03:07 +0000 (16:03 -0700)]
i965: Shut up major()/minor() warnings.

Recent glibc generates this warning:

brw_performance_query.c:1648:13: warning: In the GNU C Library, "minor" is defined
 by <sys/sysmacros.h>. For historical compatibility, it is
 currently defined by <sys/types.h> as well, but we plan to
 remove this soon. To use "minor", include <sys/sysmacros.h>
 directly. If you did not intend to use a system-defined macro
 "minor", you should undefine it after including <sys/types.h>.

    min = minor(sb.st_rdev);

So, include sys/sysmacros.h to shut up the warning.

v2: Use the AC_HEADER_MAJOR defines to figure out the right header
    (thanks to Jonathan Gray for helping me not break non-glibc systems)
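
The v2 approach can be sketched as the include fragment below. AC_HEADER_MAJOR defines MAJOR_IN_MKDEV or MAJOR_IN_SYSMACROS according to which header provides major() and minor() on the build platform; where exactly such a fragment would live in the tree is not stated here and is left open.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Pull in whichever header AC_HEADER_MAJOR found major()/minor() in.
 * If neither macro is defined, <sys/types.h> is expected to provide them. */
#ifdef MAJOR_IN_MKDEV
#include <sys/mkdev.h>
#endif
#ifdef MAJOR_IN_SYSMACROS
#include <sys/sysmacros.h>
#endif
```

This avoids hardcoding sys/sysmacros.h, which is not available on every non-glibc system.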

Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoi965: Drop AUB_TRACE_* stuff.
Kenneth Graunke [Mon, 20 Mar 2017 09:21:41 +0000 (02:21 -0700)]
i965: Drop AUB_TRACE_* stuff.

This was used for aubdumping (deleted a while ago) and INTEL_DEBUG=bat
decoding (deleted recently).

While we're changing parameters, delete the wrapper macro and make the
actual function brw_state_batch instead of __brw_state_batch.

This subsumes a patch by Emil Velikov to drop this from BLORP.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoi965: Use aubinator/genxml for INTEL_DEBUG=bat state decoding.
Kenneth Graunke [Thu, 16 Mar 2017 00:53:44 +0000 (17:53 -0700)]
i965: Use aubinator/genxml for INTEL_DEBUG=bat state decoding.

This deletes all of our handwritten code in favor of autogenerated
genxml-based decoding.  This should be much more usable, as the old
code isn't entirely accurate - we updated some things for new
generations, but not everything.

Aubinator has one annoying limitation: it has no idea how many entries
to print when encountering e.g. 3DSTATE_BINDING_TABLE_POINTERS_VS.  It
picks an arbitrary number, which may skip decoding valid data, and may
print extra garbage entries.

We do a better job here by making brw_state_batch track the size of the
data stored at a particular batchbuffer offset.  Then, we can divide by
the structure size to obtain the exact number of entries.
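
The size-tracking idea reduces to one division. The sketch below assumes a hypothetical struct state_record and entry_count helper; it is not the driver's actual bookkeeping.

```c
#include <stdint.h>

/* Hypothetical record of one piece of state written into the batch. */
struct state_record {
   uint32_t offset;  /* batchbuffer offset where the data starts */
   uint32_t size;    /* number of bytes stored at that offset */
};

/* Number of whole entries of entry_size bytes in the recorded data.
 * Returns 0 for a zero entry_size instead of dividing by zero. */
static uint32_t entry_count(const struct state_record *rec,
                            uint32_t entry_size)
{
   return entry_size ? rec->size / entry_size : 0;
}
```

For example, 64 bytes recorded for a table of 4-byte entries decodes as exactly 16 entries, with no guessing.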

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoi965: Use aubinator/genxml for INTEL_DEBUG=bat commands.
Kenneth Graunke [Wed, 15 Mar 2017 00:32:03 +0000 (17:32 -0700)]
i965: Use aubinator/genxml for INTEL_DEBUG=bat commands.

This should give substantially better decoding, as the public libdrm
decoder hasn't been properly maintained in years.

For now, we reuse the existing state dumping mechanism.  We'll improve
that in the next patch.

To avoid increasing the size of the driver, we restrict this feature
to debug builds of Mesa.  There's probably very little use for it in
release builds anyway.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agointel: Move tools/decoder.[ch] to common/gen_decoder.[ch].
Kenneth Graunke [Mon, 20 Mar 2017 18:13:07 +0000 (11:13 -0700)]
intel: Move tools/decoder.[ch] to common/gen_decoder.[ch].

This way they become part of libintel_common.la so I can use them in
the i965 driver.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agointel: Add a INTEL_DEBUG=color option.
Kenneth Graunke [Mon, 20 Mar 2017 08:58:48 +0000 (01:58 -0700)]
intel: Add a INTEL_DEBUG=color option.

This will be used for color output in debug messages.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agonir: Add positional argument specifiers.
Vinson Lee [Fri, 17 Mar 2017 23:21:38 +0000 (16:21 -0700)]
nir: Add positional argument specifiers.

Fix build with Python < 2.7.

  File "src/compiler/nir/nir_builder_opcodes_h.py", line 46, in <module>
    from nir_opcodes import opcodes
  File "src/compiler/nir/nir_opcodes.py", line 178, in <module>
    unop_convert("{}2{}{}".format(src_t[0], dst_t[0], bit_size),
ValueError: zero length field name in format

Fixes: 762a6333f21f ("nir: Rework conversion opcodes")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
7 years agor600_shader.c: check returned value of eg_get_interpolator_index
Julien Isorce [Thu, 16 Mar 2017 14:25:24 +0000 (14:25 +0000)]
r600_shader.c: check returned value of eg_get_interpolator_index

As is done elsewhere in the same file.

CID 1250588

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoutil/disk_cache: fix build on platforms where shader cache is disabled
Timothy Arceri [Tue, 21 Mar 2017 00:49:11 +0000 (11:49 +1100)]
util/disk_cache: fix build on platforms where shader cache is disabled

7 years agoutil/disk_cache: add a write helper
Grazvydas Ignotas [Wed, 15 Mar 2017 23:09:32 +0000 (01:09 +0200)]
util/disk_cache: add a write helper

Simplifies the write code a bit and handles EINTR.

V2: (Timothy Arceri) Drop EINTR handling. To do it
    properly we would need a retry limit but it's
    probably best to just avoid trying to write if
    we hit EINTR and try again next time we see
    the program.
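
A minimal sketch of such a helper, reflecting the V2 decision: short writes are continued, but any error, EINTR included, ends the attempt. The name write_all is used for illustration and is an assumption here.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all count bytes of buf to fd, looping over short writes.
 * Returns count on success and -1 on the first error. EINTR is treated
 * like any other failure, so the caller just gives up for now. */
static ssize_t write_all(int fd, const void *buf, size_t count)
{
   const char *out = buf;
   size_t done = 0;

   while (done < count) {
      ssize_t written = write(fd, out + done, count - done);
      if (written == -1)
         return -1;
      done += (size_t)written;
   }
   return (ssize_t)done;
}
```

A retry-on-EINTR variant would need a retry limit to avoid spinning, which is the complication V2 sidesteps.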

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agotests/cache_test: use the blob key's actual first byte
Grazvydas Ignotas [Wed, 15 Mar 2017 23:09:28 +0000 (01:09 +0200)]
tests/cache_test: use the blob key's actual first byte

There is no need to hardcode it, we can just use blob_key[0].
This is needed because the next patches are going to change how cache
keys are computed.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agoutil/disk_cache: use a helper to compute cache keys
Grazvydas Ignotas [Wed, 15 Mar 2017 23:09:27 +0000 (01:09 +0200)]
util/disk_cache: use a helper to compute cache keys

This will allow us to hash additional data into the cache keys or even
change the hashing algorithm easily, should we decide to do so.

v2: don't try to compute key (and crash) if cache is disabled

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agoradv: move KHR_get_physical_device_properties2 to instance props.
Dave Airlie [Sun, 19 Mar 2017 03:41:53 +0000 (13:41 +1000)]
radv: move KHR_get_physical_device_properties2 to instance props.

This is an instance property not a device one.

Fixes:
dEQP-VK.api.info.device.extensions

Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: drop illegal DB format error.
Dave Airlie [Mon, 20 Mar 2017 20:10:15 +0000 (06:10 +1000)]
radv: drop illegal DB format error.

We'll get this if we have a stencil only setup.

Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoi965: Add autogenerated OA files to .gitignore.
Kenneth Graunke [Mon, 20 Mar 2017 23:26:25 +0000 (16:26 -0700)]
i965: Add autogenerated OA files to .gitignore.

7 years agoswr: [rasterizer] Cleanup naming of codegen files
Tim Rowley [Thu, 16 Mar 2017 18:44:52 +0000 (13:44 -0500)]
swr: [rasterizer] Cleanup naming of codegen files

All template files and generated files are prefixed with gen_.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer codegen] Remove BOM from knob_defs.py
Tim Rowley [Wed, 15 Mar 2017 20:40:31 +0000 (15:40 -0500)]
swr: [rasterizer codegen] Remove BOM from knob_defs.py

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer codegen] Rewrite gen_llvm_types.py to use mako
Tim Rowley [Wed, 15 Mar 2017 18:37:50 +0000 (13:37 -0500)]
swr: [rasterizer codegen] Rewrite gen_llvm_types.py to use mako

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer codegen] Fix generation of knobs
Tim Rowley [Wed, 15 Mar 2017 16:58:10 +0000 (11:58 -0500)]
swr: [rasterizer codegen] Fix generation of knobs

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer codegen] Change backend template comment style
Tim Rowley [Thu, 16 Mar 2017 15:26:49 +0000 (10:26 -0500)]
swr: [rasterizer codegen] Change backend template comment style

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer codegen] Rewrite gen_llvm_ir_macros.py to use mako
Tim Rowley [Wed, 15 Mar 2017 06:12:59 +0000 (01:12 -0500)]
swr: [rasterizer codegen] Rewrite gen_llvm_ir_macros.py to use mako

Don't create/use cpp files, header only now.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer codegen] Quiet gen_backends.py execution
Tim Rowley [Wed, 15 Mar 2017 03:10:07 +0000 (22:10 -0500)]
swr: [rasterizer codegen] Quiet gen_backends.py execution

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer scripts] Put codegen scripts into a separate directory
Tim Rowley [Wed, 15 Mar 2017 00:12:20 +0000 (19:12 -0500)]
swr: [rasterizer scripts] Put codegen scripts into a separate directory

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] Fix trifan regression from 9d3442575f
Tim Rowley [Wed, 15 Mar 2017 00:37:30 +0000 (19:37 -0500)]
swr: [rasterizer core] Fix trifan regression from 9d3442575f

Fixes piglit triangle-rasterization-overdraw.

SIMD16 path not working.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] SIMD16 Frontend WIP - fix tesselation crashes
Tim Rowley [Wed, 8 Mar 2017 00:23:18 +0000 (16:23 -0800)]
swr: [rasterizer core] SIMD16 Frontend WIP - fix tesselation crashes

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer jitter] Fix LogicOp blend jit after assert changes
Tim Rowley [Thu, 2 Mar 2017 17:43:11 +0000 (09:43 -0800)]
swr: [rasterizer jitter] Fix LogicOp blend jit after assert changes

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer] Convert more SWR_ASSERT(false, ...) to SWR_INVALID(...)
Tim Rowley [Wed, 1 Mar 2017 00:56:01 +0000 (16:56 -0800)]
swr: [rasterizer] Convert more SWR_ASSERT(false, ...) to SWR_INVALID(...)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] Fix typo in SIMD16 code path
Tim Rowley [Wed, 1 Mar 2017 00:48:40 +0000 (16:48 -0800)]
swr: [rasterizer core] Fix typo in SIMD16 code path

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core/common] Fix the native AVX512 build under ICC
Tim Rowley [Tue, 28 Feb 2017 21:19:26 +0000 (13:19 -0800)]
swr: [rasterizer core/common] Fix the native AVX512 build under ICC

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] Allow no arguments to SWR_INVALID macro
Tim Rowley [Tue, 28 Feb 2017 01:59:37 +0000 (17:59 -0800)]
swr: [rasterizer core] Allow no arguments to SWR_INVALID macro

Turns out this is somewhat tricky with gcc/g++.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer] Slight assert refactoring
Tim Rowley [Mon, 27 Feb 2017 18:11:47 +0000 (10:11 -0800)]
swr: [rasterizer] Slight assert refactoring

Make asserts more robust.

Add SWR_INVALID(...) as a replacement for SWR_ASSERT(0, ...)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer] Backend code adjustments
Tim Rowley [Thu, 16 Feb 2017 18:53:01 +0000 (10:53 -0800)]
swr: [rasterizer] Backend code adjustments

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer archrast] Fix the early and late depthstencil events
Tim Rowley [Wed, 22 Feb 2017 19:11:58 +0000 (11:11 -0800)]
swr: [rasterizer archrast] Fix the early and late depthstencil events

The coverage and stencil mask arguments were reversed.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] Implement double pumped SIMD16 TESS
Tim Rowley [Tue, 21 Feb 2017 19:20:38 +0000 (11:20 -0800)]
swr: [rasterizer core] Implement double pumped SIMD16 TESS

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer archrast/core/scripts] Fix archrast multithreading issue
Tim Rowley [Sat, 18 Feb 2017 08:29:06 +0000 (00:29 -0800)]
swr: [rasterizer archrast/core/scripts] Fix archrast multithreading issue

Per pixel stats are cached but were not always being flushed as threads
moved from one draw context to the next.  Added an explicit flush to allow
all archrast objects to flush any cached events.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer archrast] Remove redundant data from archrast files
Tim Rowley [Fri, 17 Feb 2017 22:14:06 +0000 (14:14 -0800)]
swr: [rasterizer archrast] Remove redundant data from archrast files

If a count can be derived from other counts, then the derivation can be
done in post-processing scripts.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer archrast/scripts] Further archrast cleanups
Tim Rowley [Fri, 17 Feb 2017 21:16:59 +0000 (13:16 -0800)]
swr: [rasterizer archrast/scripts] Further archrast cleanups

Removed redundant data being written out to file

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] Fix RECT_LIST primitive assembly
Tim Rowley [Fri, 17 Feb 2017 04:28:28 +0000 (20:28 -0800)]
swr: [rasterizer core] Fix RECT_LIST primitive assembly

The bug would make the 3rd component of attributes on the second
triangle of a RECT be invalid.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer common] Add InterpolateComponentFlat utility
Tim Rowley [Fri, 17 Feb 2017 02:31:09 +0000 (18:31 -0800)]
swr: [rasterizer common] Add InterpolateComponentFlat utility

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer archrast] Fix performance issue with archrast stats
Tim Rowley [Thu, 16 Feb 2017 22:48:28 +0000 (14:48 -0800)]
swr: [rasterizer archrast] Fix performance issue with archrast stats

Running with archrast is now 50x faster, since we're properly
filtering out all of the rdtsc begin/end.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] Implement SIMD16 GS and STREAMOUT
Tim Rowley [Thu, 16 Feb 2017 21:50:21 +0000 (13:50 -0800)]
swr: [rasterizer core] Implement SIMD16 GS and STREAMOUT

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer archrast] Add additional API events
Tim Rowley [Thu, 16 Feb 2017 07:57:50 +0000 (23:57 -0800)]
swr: [rasterizer archrast] Add additional API events

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core/scripts] Autogen backend initialization function(s)
Tim Rowley [Wed, 15 Feb 2017 21:45:16 +0000 (13:45 -0800)]
swr: [rasterizer core/scripts] Autogen backend initialization function(s)

Autogen functions that instantiate different BackendPixelRate templates.
Functions get split into separate files after reaching a user defined
threshold (currently 512 per file) to speed up compilation.

This change will enable the addition of more template flags in the pixel
back end.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] backend.h declares gBackendPixelRateTable
Tim Rowley [Thu, 16 Mar 2017 17:00:15 +0000 (12:00 -0500)]
swr: [rasterizer core] backend.h declares gBackendPixelRateTable

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] Finish SIMD16 PA OPT including tesselation
Tim Rowley [Fri, 10 Feb 2017 22:56:57 +0000 (14:56 -0800)]
swr: [rasterizer core] Finish SIMD16 PA OPT including tesselation

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] Finish SIMD16 PA OPT except tesselation
Tim Rowley [Thu, 9 Feb 2017 21:43:32 +0000 (13:43 -0800)]
swr: [rasterizer core] Finish SIMD16 PA OPT except tesselation

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: [rasterizer core] Support sparse numa id values on all OSes
Tim Rowley [Mon, 6 Feb 2017 23:25:57 +0000 (15:25 -0800)]
swr: [rasterizer core] Support sparse numa id values on all OSes

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoi965: Skip register write detection when possible.
Kenneth Graunke [Fri, 3 Mar 2017 02:27:32 +0000 (18:27 -0800)]
i965: Skip register write detection when possible.

Detecting register write support by trial and error introduces a
stall at screen creation time, which it would be nice to avoid.
Certain command parser versions guarantee this will work (see the
giant comment in intelInitScreen2 below, or a few commits ago):

- Ivybridge: version >= 1 (kernel v3.16)
- Baytrail:  version >= 2 (kernel v3.19)
- Haswell:   version >= 7 (kernel v4.8)

For simplicity, we don't bother with version 1 in this patch.

This assumes that the user hasn't disabled aliasing PPGTT via a kernel
command line parameter.  Don't do that - you're only breaking things.
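
One reading of the list above, as a sketch: since version 1 is not special-cased, Ivybridge and Baytrail share the threshold of 2, and Haswell needs 7. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical platform tags for the three cases listed above. */
enum platform { PLATFORM_IVYBRIDGE, PLATFORM_BAYTRAIL, PLATFORM_HASWELL };

/* True when the command parser version alone guarantees that register
 * writes work, so the trial-and-error probe (and its stall) can be
 * skipped. Otherwise the caller falls back to probing. */
static bool register_writes_guaranteed(enum platform p,
                                       int cmd_parser_version)
{
   /* Version 1 on Ivybridge is deliberately not special-cased. */
   int min_version = (p == PLATFORM_HASWELL) ? 7 : 2;

   return cmd_parser_version >= min_version;
}
```

A false result does not mean writes are unavailable, only that the probe is still needed to find out.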

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965: Set screen->cmd_parser_version to 0 if we can't write registers.
Kenneth Graunke [Fri, 3 Mar 2017 02:21:31 +0000 (18:21 -0800)]
i965: Set screen->cmd_parser_version to 0 if we can't write registers.

If we can't write registers, then the effective command parser version
is 0 - it may exist, but it's not usefully enabling anything.

See kernel commit 1ca3712ca3429a617ed6c5f87718e4f6fe4ae0c6 (in v4.8)
where the kernel starts doing this for us.  This makes us do more or
less the same thing on older kernels.

This should preserve a bit of sanity by allowing us to perform a
screen->cmd_parser_version >= N check to determine that we really can
use the features promised by command parser version N.
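
The idea can be sketched as two small helpers. Both names are hypothetical; the driver stores the result in screen->cmd_parser_version rather than calling functions like these.

```c
#include <stdbool.h>

/* The version the rest of the driver should trust: whatever the kernel
 * reported, unless register writes do not actually work. */
static int effective_cmd_parser_version(int reported,
                                        bool can_write_registers)
{
   return can_write_registers ? reported : 0;
}

/* Features promised by command parser version n are usable once the
 * effective version has reached n. */
static bool has_parser_feature(int effective_version, int n)
{
   return effective_version >= n;
}
```

With the version forced to 0, every feature check fails uniformly, so no call site needs a separate "can we write registers" test.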

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965: Document the sad story of the kernel command parser.
Kenneth Graunke [Fri, 3 Mar 2017 02:12:28 +0000 (18:12 -0800)]
i965: Document the sad story of the kernel command parser.

This should help us figure out the complexities of which kernel
versions we need to get various features on various platforms.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965: Fall back to GL 4.2/4.3 on Haswell if the kernel isn't new enough.
Kenneth Graunke [Thu, 2 Mar 2017 19:33:37 +0000 (11:33 -0800)]
i965: Fall back to GL 4.2/4.3 on Haswell if the kernel isn't new enough.

In commit d2590eb65ff28a9cbd592353d15d7e6cbd2c6fc6 I enabled GL 4.5
on Haswell...but failed to check if we could do indirect compute
shader dispatch...and query buffer objects.

Indirect compute shader dispatch requires command parser version 5
(kernel commit 7b9748cb513a6bef4af87b79f0da3ff7e8b56cd8, which is in
Linux v4.4).  On earlier kernels we would have disabled
ARB_compute_shader, which is a mandatory part of OpenGL 4.3+.

Query buffer objects currently require MI_MATH and MI_LOAD_REGISTER_REG,
which mean command parser version 7 (Linux v4.8).  On earlier kernels
we would have disabled ARB_query_buffer_object, which is a mandatory
part of OpenGL 4.4+.

The new version support looks like:

- Kernel 4.1 and older => OpenGL 3.3
- Kernel 4.2-4.3       => OpenGL 4.2
- Kernel 4.4-4.7       => OpenGL 4.3
- Kernel 4.8+          => OpenGL 4.5
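
The table above, restated as a lookup. This sketch is keyed on the kernel version only to mirror the table; per the text, the underlying conditions are command parser versions, which is what a driver would more plausibly check. The names are hypothetical.

```c
/* Hypothetical pair type for an OpenGL version. */
struct gl_version {
   int major;
   int minor;
};

/* Highest OpenGL version exposed on Haswell for a given kernel version,
 * following the table in this commit message. */
static struct gl_version haswell_gl_version(int kernel_major,
                                            int kernel_minor)
{
   int k = kernel_major * 100 + kernel_minor;  /* e.g. 4.8 becomes 408 */

   if (k >= 408)
      return (struct gl_version){ 4, 5 };
   if (k >= 404)
      return (struct gl_version){ 4, 3 };
   if (k >= 402)
      return (struct gl_version){ 4, 2 };
   return (struct gl_version){ 3, 3 };
}
```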

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agor600g/sb: Fix memory leak by reworking uses list (rebased)
Constantine Kharlamov [Mon, 20 Mar 2017 18:16:25 +0000 (21:16 +0300)]
r600g/sb: Fix memory leak by reworking uses list (rebased)

The author is Heiko Przybyl (CC'd); the patch is rebased on top of Bartosz Tomczyk's patch per Dieter Nützel's comment.
Tested-by: Constantine Charlamov <Hi-Angel@yandex.ru>
v2: Resend the patch through git-email. The previous rebase was sent
through Thunderbird, which mangled the tab characters so that the patch
did not apply.

--------------
When fixing the stalls on evergreen I introduced leaking of the useinfo
structure(s). Sorry. Instead of allocating a new object to hold 3 values
where only one is actually used, rework the list to just store the node
pointer. Thus no allocation or deallocation is needed. Since use_info
and use_kind aren't used anywhere, drop them and reduce code complexity.
This might also save some small amount of cycles.

Thanks to Bartosz Tomczyk for finding the bug.

Reported-by: Bartosz Tomczyk <bartosz.tomczyk86 at gmail.com>
Signed-off-by: Heiko Przybyl <lil_tux at web.de>
Supersedes: https://patchwork.freedesktop.org/patch/135852
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
7 years agoradeonsi: check the IR type before waiting for a compute compilation fence
Marek Olšák [Mon, 20 Mar 2017 15:39:02 +0000 (16:39 +0100)]
radeonsi: check the IR type before waiting for a compute compilation fence

This should fix OpenCL getting stuck.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100288
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
7 years agoaubinator: Move the guts of decode_group() to decoder.c.
Kenneth Graunke [Mon, 20 Mar 2017 05:11:52 +0000 (22:11 -0700)]
aubinator: Move the guts of decode_group() to decoder.c.

This lets us use it outside of the aubinator binary itself.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoaubinator: Drop spec parameter to decode_group().
Kenneth Graunke [Mon, 20 Mar 2017 05:10:46 +0000 (22:10 -0700)]
aubinator: Drop spec parameter to decode_group().

No longer necessary - the iterator gets it from the group.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoaubinator: Make the iterator store a pointer to structure descriptions.
Kenneth Graunke [Mon, 20 Mar 2017 04:22:20 +0000 (21:22 -0700)]
aubinator: Make the iterator store a pointer to structure descriptions.

When the iterator encounters a structure field, it now looks up the
gen_group for that structure definition and saves a pointer to it.

This lets us drop a lot of ridiculous code in the caller, which looked
at item->value (<struct NAME dword>), strtok'd the structure name back
out, and looked it up itself.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoaubinator: Track the current field's starting dword offset.
Kenneth Graunke [Mon, 20 Mar 2017 04:45:20 +0000 (21:45 -0700)]
aubinator: Track the current field's starting dword offset.

The iterator code already computed this value, then we stored it in
the structure name, strtok'd it back out, and also manually computed
it when printing dword headers.

Just put the value in the struct and use it.  Way simpler.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoaubinator: Drop decode_structure() helper.
Kenneth Graunke [Mon, 20 Mar 2017 04:47:25 +0000 (21:47 -0700)]
aubinator: Drop decode_structure() helper.

It made more sense when decode_group() took a bunch of extra options,
but now that there's only one...we may as well pass 0 and call it a day.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoaubinator: Drop unused print_dword_headers flag.
Kenneth Graunke [Mon, 20 Mar 2017 04:30:37 +0000 (21:30 -0700)]
aubinator: Drop unused print_dword_headers flag.

I added this flag in 65a9d5eabb05e4925c1c9a17836cad57304210d6 but
it was completely unused.  Both callers appear to have printed dword
headers, so we can just drop the flag and continue doing it
unconditionally.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoaubinator: Store a pointer from gen_group back to gen_spec.
Kenneth Graunke [Mon, 20 Mar 2017 04:24:24 +0000 (21:24 -0700)]
aubinator: Store a pointer from gen_group back to gen_spec.

When decoding a structure field within a group, we may want to look up
that structure type.  Having a gen_spec pointer makes it easy to do so.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoaubinator: Store enum textual name in iter->value.
Kenneth Graunke [Mon, 20 Mar 2017 03:59:08 +0000 (20:59 -0700)]
aubinator: Store enum textual name in iter->value.

gen_field_iterator_next() produces a string representing the value of
the field.  For enum values, it also produced a separate "description"
string containing the textual name of the enum.

The only caller of this function combines the two, printing enums as
"<numeric value> (<textual enum name>)".  We may as well just store
that in item->value directly, eliminating the description field, and
a layer of wrapping.

v2: Use non-overlapping source and destination strings in snprintf.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agosi_descriptor: move velems nullity check before dereference
Julien Isorce [Thu, 16 Mar 2017 13:09:21 +0000 (13:09 +0000)]
si_descriptor: move velems nullity check before dereference

CID 1399479: Dereference before null check (REVERSE_INULL)
check_after_deref: Null-checking velems suggests that it may be null,
but it has already been dereferenced on all paths leading to the check.

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeon_drm_bo: explicitly check return value of drmCommandWriteRead
Julien Isorce [Wed, 15 Mar 2017 17:40:25 +0000 (17:40 +0000)]
radeon_drm_bo: explicitly check return value of drmCommandWriteRead

CID 1313492

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agosi_pipe: remove nullity check after dereference
Julien Isorce [Wed, 15 Mar 2017 17:31:40 +0000 (17:31 +0000)]
si_pipe: remove nullity check after dereference

sscreen cannot be NULL

CID 1354483

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoradeon: initialize hole variable before calling container_of
Julien Isorce [Mon, 27 Feb 2017 13:42:17 +0000 (13:42 +0000)]
radeon: initialize hole variable before calling container_of

Like in a few other places in that radeon_drm_bo.c file.

CID 715739.

Signed-off-by: Julien Isorce <jisorce@oblong.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agointel: Correct the BDW surface state size
Nanley Chery [Tue, 7 Mar 2017 19:17:05 +0000 (11:17 -0800)]
intel: Correct the BDW surface state size

The PRMs state that this packet is 16 DWORDS long. Ensure that the last
three DWORDS are zeroed as required by the hardware when allocating a
null surface state.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
7 years agor600g: Fix out of bounds access
Bartosz Tomczyk [Wed, 8 Feb 2017 16:16:13 +0000 (17:16 +0100)]
r600g: Fix out of bounds access

The fc_sp variable should indicate the number of elements in the
fc_stack array, but fc_sp was incremented at the beginning of the
fc_pushlevel function. As a result, idx=0 was never used and the last
(32nd) element was stored outside the fc_stack array.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
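
A minimal sketch of the indexing rule described above, assuming a
32-entry stack (the size implied by the message). The names
fc_stack_sketch and fc_push_sketch are hypothetical and are not the
r600g code; the point is only that the counter is used as the index
first and incremented afterwards.

```c
#include <stdbool.h>

#define FC_STACK_SIZE 32 /* assumed from the "32nd element" in the message */

struct fc_entry_sketch {
    int type;
};

struct fc_stack_sketch {
    struct fc_entry_sketch stack[FC_STACK_SIZE];
    unsigned sp; /* number of elements currently stored */
};

/* Store at index sp, then increment, so slot 0 is used and the last
 * valid write lands at index FC_STACK_SIZE - 1. */
static bool fc_push_sketch(struct fc_stack_sketch *s, int type)
{
    if (s->sp >= FC_STACK_SIZE)
        return false; /* full: refuse instead of writing out of bounds */

    s->stack[s->sp].type = type;
    s->sp++;
    return true;
}
```

Incrementing before the store would instead write to indices 1..32,
which is the off-by-one the commit describes.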
7 years agor600g: update sb documentation
Constantine Kharlamov [Mon, 20 Mar 2017 12:19:43 +0000 (15:19 +0300)]
r600g: update sb documentation

v2: s/r600/r600g in the title

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
7 years agor600g: make condition clearer
Constantine Kharlamov [Mon, 20 Mar 2017 12:19:42 +0000 (15:19 +0300)]
r600g: make condition clearer

The second check in the old code looked pretty much unreachable, esp.
because it's not obvious that "max_entries" could be zero. To find out
that it was intentional I had to run some checks, and to dig into
the old versions of the file.

So, rewrite the check to make the intention clear.

v2: s/r600/r600g in the title, and per Dieter Nützel's comment wrap
lines of condition.

Signed-off-by: Constantine Kharlamov <Hi-Angel@yandex.ru>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
7 years agodocs: add news item and link release notes for 13.0.6/17.0.2
Emil Velikov [Mon, 20 Mar 2017 14:25:18 +0000 (14:25 +0000)]
docs: add news item and link release notes for 13.0.6/17.0.2

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
7 years agodocs: add sha256 checksums for 17.0.2
Emil Velikov [Mon, 20 Mar 2017 14:17:20 +0000 (14:17 +0000)]
docs: add sha256 checksums for 17.0.2

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 9b66351f5b274f3d79cb2c48afa3b2fcc2bf3442)

7 years agodocs: add release notes for 17.0.2
Emil Velikov [Mon, 20 Mar 2017 14:07:38 +0000 (14:07 +0000)]
docs: add release notes for 17.0.2

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 373d88a7117150de984510453e1c30a455987686)

7 years agodocs: add sha256 checksums for 13.0.6
Emil Velikov [Mon, 20 Mar 2017 11:54:35 +0000 (11:54 +0000)]
docs: add sha256 checksums for 13.0.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 879d24c49727cfc6c62cbd5bca58efad4c914e40)

7 years agodocs: add release notes for 13.0.6
Emil Velikov [Mon, 20 Mar 2017 11:42:19 +0000 (11:42 +0000)]
docs: add release notes for 13.0.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit fcef88d13a9ebdcadc6a878e9284c55651785301)

7 years agoanv/genX: Solve the vkCreateGraphicsPipelines crash
Xu,Randy [Sat, 18 Mar 2017 11:20:17 +0000 (19:20 +0800)]
anv/genX: Solve the vkCreateGraphicsPipelines crash

The crash is due to NULL pColorBlendState, which is legal if the
pipeline has rasterization disabled or if the subpass of the render pass
the pipeline is created against does not use any color attachments.

Test: Sample subpasses from LunarG can run without crash

Signed-off-by: Xu,Randy <randy.xu@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
7 years agoradv: fix logic for when to flush on multiple CS emission
Dave Airlie [Sun, 19 Mar 2017 23:00:36 +0000 (09:00 +1000)]
radv: fix logic for when to flush on multiple CS emission

The current code always evaluated to true, but we only want to flush
on the first submit. Rename the variable to do_flush, and only
emit on the first iteration.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agospirv: Implement IsInf using an integer comparison
Jason Ekstrand [Fri, 17 Mar 2017 04:18:10 +0000 (21:18 -0700)]
spirv: Implement IsInf using an integer comparison

Since we already do fabs on the one source, we're guaranteed to get
positive infinity if we get any infinity at all.  Since +inf only has
one IEEE 754 representation, we can use an integer comparison and avoid
all of the ordered/unordered issues.

Cc: Dave Airlie <airlied@redhat.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
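
The idea can be sketched in plain C for 32-bit floats. This is an
illustration only, not the NIR code the commit emits; the helper name
is_inf_via_int_compare is hypothetical. After fabsf() the sign bit is
clear, so any infinity has exactly the bit pattern 0x7f800000, and an
integer equality test cannot be confused by NaN.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true if x is +inf or -inf. */
static bool is_inf_via_int_compare(float x)
{
    float ax = fabsf(x); /* clears the sign bit, so -inf becomes +inf */
    uint32_t bits;

    /* Reinterpret the float's bits without violating aliasing rules. */
    memcpy(&bits, &ax, sizeof bits);

    /* +inf: exponent all ones, mantissa zero. NaN has a nonzero mantissa. */
    return bits == UINT32_C(0x7f800000);
}
```

A floating-point comparison against infinity would need a decision
about ordered versus unordered behaviour for NaN inputs; the integer
comparison sidesteps that question.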
7 years agoradv/meta: fix image clears for r4g4 format.
Dave Airlie [Fri, 17 Mar 2017 04:23:56 +0000 (14:23 +1000)]
radv/meta: fix image clears for r4g4 format.

This just uses an 8-bit clear and packs the values.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
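
A sketch of what "packs the values" could look like, assuming the
VK_FORMAT_R4G4_UNORM_PACK8 layout (R in bits 4..7, G in bits 0..3) and
round-to-nearest conversion. Both helper names are hypothetical, and
the rounding used by radv itself may differ.

```c
#include <stdint.h>

/* Convert a float to a 4-bit UNORM value, clamping to [0, 1]. */
static uint8_t float_to_unorm4(float x)
{
    if (!(x > 0.0f))
        return 0; /* negative values, zero and NaN all map to 0 */
    if (x >= 1.0f)
        return 15;
    return (uint8_t)(x * 15.0f + 0.5f); /* round to nearest */
}

/* Pack two channels into the single byte used by an 8-bit clear. */
static uint8_t pack_r4g4(float r, float g)
{
    return (uint8_t)((float_to_unorm4(r) << 4) | float_to_unorm4(g));
}
```

The packed byte can then be used as the value of an ordinary 8-bit
clear.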
7 years agoRevert "radv: fallback to an in-memory cache when no pipline cache is provided"
Dave Airlie [Mon, 20 Mar 2017 03:24:02 +0000 (13:24 +1000)]
Revert "radv: fallback to an in-memory cache when no pipline cache is provided"

This reverts commit 2845a108a9a8bd4b0e6e9b590c976452fb99eb10.

This breaks VK-GL-CTS randomly.
./deqp-vk --deqp-case=dEQP-VK.texture.filtering.3d.formats.r4g4b4a4*

bounces around here from 6/6 to 3/6 or 4/6 to hanging.

Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agomesa: disable glthread when glNewList() is called
Timothy Arceri [Thu, 16 Mar 2017 06:01:26 +0000 (17:01 +1100)]
mesa: disable glthread when glNewList() is called

glNewList() swaps dispatch tables, and we don't have anything in
place to handle that in glthread.

Tested-by: Michel Dänzer <michel.daenzer@amd.com>
7 years agoradv: fix primitive reset index emission
Dave Airlie [Sun, 19 Mar 2017 04:17:14 +0000 (14:17 +1000)]
radv: fix primitive reset index emission

This was meant to be checking the index type to get the correct
index not the last emitted one. This fixes:
dEQP-VK.pipeline.input_assembly.primitive_restart.index_type_uint32.triangle_strip_with_adjacency

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoutil/disk_cache: check rename result
Grazvydas Ignotas [Sat, 18 Mar 2017 20:58:55 +0000 (22:58 +0200)]
util/disk_cache: check rename result

I haven't seen this causing problems in practice, but for correctness
we should also check if rename succeeded to avoid breaking accounting
and leaving a .tmp file behind.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agoutil/disk_cache: delete .tmp if target exists
Grazvydas Ignotas [Sat, 18 Mar 2017 20:58:54 +0000 (22:58 +0200)]
util/disk_cache: delete .tmp if target exists

At the time of target file check, .tmp file is already created and file
lock is held, so we should remove the .tmp, like in other error paths.

With this, piglit no longer leaves a large number of empty .tmp files
behind, which waste directory entries and may interfere with eviction.

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agoutil/disk_cache: fix stored_keys index
Grazvydas Ignotas [Sat, 18 Mar 2017 22:46:39 +0000 (00:46 +0200)]
util/disk_cache: fix stored_keys index

It seems there is a bug because:
- 20 bytes are compared, but only 1 byte stored_keys step is used
- entries can overlap each other by 19 bytes
- index_mmap is ~1.3M in size, but only first 64K is used

With this fix for Deus Ex:
- startup time (from launch to Feral logo): ~38s -> ~16s
- disk_cache_has_key() hit rate: ~50% -> ~96%

Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
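
The stride problem can be sketched as follows. This assumes 64K slots
of 20 bytes each (which matches the ~1.3M figure above) and a
hypothetical slot selection from the first two key bytes; none of the
names are taken from disk_cache.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_SIZE      20      /* bytes compared per key, per the message */
#define INDEX_ENTRIES 0x10000 /* assumed slot count: 64K */
#define INDEX_BYTES   ((size_t)INDEX_ENTRIES * KEY_SIZE) /* ~1.3M */

/* Hypothetical slot selection: the first two key bytes pick the slot. */
static size_t slot_for_key(const uint8_t key[KEY_SIZE])
{
    return (size_t)key[0] | ((size_t)key[1] << 8);
}

/* Step by KEY_SIZE bytes per slot so neighbouring entries never overlap.
 * Stepping by 1 byte would confine all entries to the first 64K and let
 * adjacent slots clobber each other. */
static uint8_t *entry_for_key(uint8_t *stored_keys, const uint8_t key[KEY_SIZE])
{
    return stored_keys + slot_for_key(key) * KEY_SIZE;
}

static void put_key_sketch(uint8_t *stored_keys, const uint8_t key[KEY_SIZE])
{
    memcpy(entry_for_key(stored_keys, key), key, KEY_SIZE);
}

static bool has_key_sketch(uint8_t *stored_keys, const uint8_t key[KEY_SIZE])
{
    return memcmp(entry_for_key(stored_keys, key), key, KEY_SIZE) == 0;
}
```

With a 1-byte step, storing a key in slot 2 would overwrite 19 bytes
of the key in slot 1, so a later lookup of the first key would miss.
That is consistent with the low hit rate reported above.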
7 years agonv30: create uploader after pipe->screen is set
Ilia Mirkin [Sun, 19 Mar 2017 05:22:29 +0000 (01:22 -0400)]
nv30: create uploader after pipe->screen is set

Fixes crashes after recent upload rework.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonv50,nvc0: enable TEX_LZ and TXF_LZ
Ilia Mirkin [Thu, 16 Mar 2017 03:29:47 +0000 (23:29 -0400)]
nv50,nvc0: enable TEX_LZ and TXF_LZ

There should be minimal gain, if any, for nvc0, but nv50 may end up
noticing more often that the lod argument is uniform. This, in turn,
will remove the need for some unnecessary transformations, which were
being hit due to the checks being done pre-ssa.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agost/mesa: set result writemask based on ir type
Ilia Mirkin [Sat, 4 Mar 2017 18:52:48 +0000 (13:52 -0500)]
st/mesa: set result writemask based on ir type

This prevents textureQueryLevels, which maps as LODQ, from ending up
with a xyzw writemask, which is illegal.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100061
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agonvc0/ir: treat FMA like MAD for operand propagation
Karol Herbst [Sun, 19 Mar 2017 00:08:56 +0000 (01:08 +0100)]
nvc0/ir: treat FMA like MAD for operand propagation

Helps mainly Feral-ported games, due to their use of fma()

shader-db changes:
total instructions in shared programs : 3901147 -> 3842505 (-1.50%)
total gprs used in shared programs    : 471258 -> 467359 (-0.83%)
total local used in shared programs   : 27405 -> 27361 (-0.16%)
total bytes used in shared programs   : 35749888 -> 35214176 (-1.50%)

                local        gpr       inst      bytes
    helped          17        1829        4091        4091
      hurt           4          44           3           3

Signed-off-by: Karol Herbst <karolherbst@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
7 years agoutil/disk_cache: pass predicate functions file stats directly (v4)
Alan Swanson [Fri, 17 Mar 2017 18:05:43 +0000 (18:05 +0000)]
util/disk_cache: pass predicate functions file stats directly (v4)

Since switching to LRU eviction, the only user of these predicate
functions resolves directory entry stats itself, so pass them
directly. This saves calling fstat and strlen twice (and the
expensive strlen is skipped entirely if the access time is newer).

v2: Update for empty cache dir detection changes
v3: Fix passing string length to predicate with the +1 for NULL
    termination and also pass sb as pointer
v4: Missed ampersand for passing sb as pointer

Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
7 years agoglsl: use set for copy propagation kills
Timothy Arceri [Fri, 17 Mar 2017 13:40:56 +0000 (00:40 +1100)]
glsl: use set for copy propagation kills

Previously, each time we saw a variable we just created a duplicate
entry in the list. This is particularly bad for loops, where we add
everything twice; throw nested loops into the mix and the
list was growing exponentially.

This stops the glsl-vs-unroll-explosion test, which has 16 nested
loops, from reaching the test's mem usage limit in this pass. The
test now hits the mem limit in opt_copy_propagation_elements()
instead.

I suspect this was also part of the reason this pass can be so
slow with some shaders.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>