mesa.git
5 years agoRevert "glsl: Set location on structure-split sampler uniform variables"
Kenneth Graunke [Wed, 17 Apr 2019 21:48:10 +0000 (14:48 -0700)]
Revert "glsl: Set location on structure-split sampler uniform variables"

This reverts commit 9e0c744f07a21fc7bb018a77cf83b057436d0d1b, which
regressed dEQP-GLES2.functional.uniform_api.random.3.  It turns out
that the newly produced location is meaningless and impossible to
consume by drivers that want to look at gl_uniform_storage, so it's
probably better to leave it unset (0) than a number that looks usable.

Leave a tombstone^Wcomment to discourage the next person from making
the obvious looking fix.

See the next commit for a longer description of the problem.

This breaks tests/spec/glsl-1.10/execution/samplers/uniform-struct
on i965, which was originally fixed by the reverted commit.  The next
commit will fix it again.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoradeonsi: use CP DMA for the null const buffer clear on CIK
Marek Olšák [Fri, 12 Apr 2019 15:12:34 +0000 (11:12 -0400)]
radeonsi: use CP DMA for the null const buffer clear on CIK

This is a workaround for a thread deadlock whose cause I have not
been able to identify.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
Fixes: 9b331e462e5021d994859756d46cd2519d9c9c6e
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agodrirc: Add workaround for Epic Games Launcher
Danylo Piliaiev [Wed, 17 Apr 2019 11:32:47 +0000 (14:32 +0300)]
drirc: Add workaround for Epic Games Launcher

The Epic Games Launcher can be launched in OpenGL mode with the
"-opengl" option. It creates a 4.4 OpenGL core context, but it
still uses deprecated functionality, e.g. the default vertex
buffer object.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110462

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
5 years agoiris: Track bound and writable SSBOs
Kenneth Graunke [Wed, 17 Apr 2019 05:54:40 +0000 (22:54 -0700)]
iris: Track bound and writable SSBOs

Marek recently extended pipe->set_shader_buffers() to take an extra
writable_bitmask parameter, indicating which SSBOs are writable (some
may be bound read-only).  We can use this to decide whether to set
EXEC_OBJECT_WRITE when pinning.  Avoiding the write flag can save us
some cross-batch flushing if the SSBO is used for reading in both the
render and compute engines.
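
As a rough illustration of the idea above (not the actual iris code), the
per-slot decision could look like the sketch below. The flag value and all
sketch_* names are hypothetical stand-ins, not the real kernel or driver
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's EXEC_OBJECT_WRITE flag. */
#define SKETCH_EXEC_OBJECT_WRITE (1u << 2)

/* Return the pin flags for SSBO slot `slot`: request write access only
 * if the matching bit is set in writable_bitmask, so read-only bindings
 * are pinned without the write flag. */
static inline uint32_t
sketch_ssbo_pin_flags(uint32_t writable_bitmask, unsigned slot)
{
   assert(slot < 32);
   bool writable = (writable_bitmask >> slot) & 1u;
   return writable ? SKETCH_EXEC_OBJECT_WRITE : 0u;
}
```

With a bitmask of 0x5, slots 0 and 2 would be pinned for writing and slot 1
would be pinned read-only.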

5 years agovirgl: clear vertex_array_dirty
Chia-I Wu [Thu, 18 Apr 2019 22:10:22 +0000 (15:10 -0700)]
virgl: clear vertex_array_dirty

Clear vertex_array_dirty after the state is emitted.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
5 years agogallivm: disable NEON instructions if they are not supported
Lubomir Rintel [Mon, 11 Mar 2019 20:18:48 +0000 (21:18 +0100)]
gallivm: disable NEON instructions if they are not supported

The LLVM project made some questionable decisions about defaults for
armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell
platforms).

On top of that, getHostCPUFeatures() doesn't disable missing machine
attributes. Finally, -neon alone is not sufficient to disable emission
of NEON instructions.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agogallivm: guess CPU features also on ARM
Lubomir Rintel [Mon, 11 Mar 2019 18:16:40 +0000 (19:16 +0100)]
gallivm: guess CPU features also on ARM

getHostCPUFeatures() is also available on ARM, and has been for even longer
than on x86. Use it -- it potentially enables instructions that may speed
things up.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/518
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agoiris: Enable the dual_color_blend_by_location driconf option.
Kenneth Graunke [Fri, 19 Apr 2019 05:29:27 +0000 (22:29 -0700)]
iris: Enable the dual_color_blend_by_location driconf option.

This fixes rendering in Unigine Valley 1.0 and Heaven 4.0.

5 years agoiris: Add mechanism for iris-specific driconf options
Kenneth Graunke [Fri, 19 Apr 2019 05:13:41 +0000 (22:13 -0700)]
iris: Add mechanism for iris-specific driconf options

Based on Nicolai's 0f8c5de8690e7c87aa2e24383065efaca7e6fe78.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agonir: Use the NIR_SRC_AS_ macro to define nir_src_as_deref
Jason Ekstrand [Fri, 19 Apr 2019 20:09:04 +0000 (15:09 -0500)]
nir: Use the NIR_SRC_AS_ macro to define nir_src_as_deref

We have a macro for this now; no reason to hand-roll it for derefs.
While we're here, move the NIR_DEFINE_CAST for derefs down to where all
the other ones are.

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoanv,radv: Update release notes for newly implemented extensiosn
Jason Ekstrand [Sat, 20 Apr 2019 14:44:57 +0000 (09:44 -0500)]
anv,radv: Update release notes for newly implemented extensiosn

A lot has happened in those two drivers since the 19.0 release and we
keep forgetting to update release notes.  Time to bring everything up to
date again before 19.1 gets released.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: add VK_NV_compute_shader_derivates support
Samuel Pitoiset [Fri, 19 Apr 2019 10:40:37 +0000 (12:40 +0200)]
radv: add VK_NV_compute_shader_derivates support

Only computeDerivativeGroupLinear is supported for now.

All crucible tests pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agointel/fs: Add support for float16 to the fsign optimizations
Ian Romanick [Thu, 18 Apr 2019 22:09:06 +0000 (15:09 -0700)]
intel/fs: Add support for float16 to the fsign optimizations

Commit ad98fbc2174 ("intel/fs: Refactor code generation for nir_op_fsign
to its own function") criss-crossed with c2b8fb9a810 ("anv/device:
expose VK_KHR_shader_float16_int8 in gen8+"), and I was not paying
enough attention when I rebased.  This adds back the float16 changes and
enables the optimization.

v2: Incorporate more changes from 19cd2f5debd and a8d8b1a1391 that I
missed in the previous version.

Fixes: ad98fbc2174 ("intel/fs: Refactor code generation for nir_op_fsign to its own function")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110474
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
5 years agolima: add Android build
Icenowy Zheng [Mon, 15 Apr 2019 04:32:43 +0000 (12:32 +0800)]
lima: add Android build

Currently only meson build support is added for the lima driver.

Add Android build support for lima.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Acked-by: Qiang Yu <yuq825@gmail.com>
5 years agost/nine: skip position checks in SetCursorPosition()
Andre Heider [Thu, 11 Apr 2019 06:42:47 +0000 (08:42 +0200)]
st/nine: skip position checks in SetCursorPosition()

For HW cursors, "cursor.pos" doesn't hold the current position of the
pointer, just the position of the last call to SetCursorPosition().

Skip the check against stale values and bump the d3dadapter9 drm version
to expose this change of behaviour.

Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
5 years agoanv: Rework the descriptor set layout create loop
Jason Ekstrand [Fri, 19 Apr 2019 19:45:34 +0000 (14:45 -0500)]
anv: Rework the descriptor set layout create loop

Previously, we were storing the per-binding create info pointer in the
immutable_samplers field temporarily so that we can switch the order in
which we walk the loop.  However, now that we have multiple arrays of
structs to walk, it makes more sense to store an index of some sort.
Because we want to leave immutable_samplers as NULL for undefined
bindings, we store index + 1 and then subtract one later.
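
The index + 1 trick described above can be sketched as follows. This is only
an illustration under the assumption that the temporary field is
pointer-sized; the sketch_* helpers are hypothetical and are not anv
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store an array index in a pointer-sized field.  Adding one keeps NULL
 * free to mean "this binding is undefined". */
static inline void *
sketch_encode_index(uint32_t index)
{
   assert(index < UINT32_MAX);
   return (void *)(uintptr_t)(index + 1u);
}

/* Recover the index from a field written by sketch_encode_index(). */
static inline uint32_t
sketch_decode_index(const void *field)
{
   assert(field != NULL);
   return (uint32_t)((uintptr_t)field - 1u);
}
```

Index 0 encodes to a non-NULL value, so a NULL field can still be told apart
from the first entry of the create-info array.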

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Ignore descriptor binding flags if bindingCount == 0
Jason Ekstrand [Fri, 19 Apr 2019 19:43:01 +0000 (14:43 -0500)]
anv: Ignore descriptor binding flags if bindingCount == 0

I missed this on the first go round.  The bindingCount field of
VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero
which means the flags array is ignored.

Fixes: d6c9bd6e01b4d "anv: Put binding flags in descriptor set layouts"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agopanfrost/mdg: Use shared fsign lowering
Alyssa Rosenzweig [Fri, 19 Apr 2019 23:15:45 +0000 (23:15 +0000)]
panfrost/mdg: Use shared fsign lowering

Fixes failures in shaders.operator.common_functions.sign.*

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: Fixup vertex offsets to prevent shadow copy
Alyssa Rosenzweig [Mon, 15 Apr 2019 04:08:46 +0000 (04:08 +0000)]
panfrost: Fixup vertex offsets to prevent shadow copy

Mali attribute buffers have to be 64-byte aligned. However, Gallium
enforces no such requirement; for unaligned buffers, we were previously
forced to create a shadow copy (slow!). To prevent this, we instead use
the offset buffer's address with the lower bits masked off, and then
add those masked off bits to the src_offset. Proof of correctness
included, possibly for the opportunity to say "QED" unironically.
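
A minimal sketch of the fixup described above, assuming a 64-byte alignment
requirement as stated. The struct and function names are hypothetical and
only illustrate the arithmetic; they are not the panfrost implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ATTR_ALIGN 64u

/* Hypothetical pair of values handed to the hardware. */
struct sketch_attr {
   uint64_t base;       /* 64-byte aligned buffer address */
   uint32_t src_offset; /* offset applied on top of base */
};

/* Mask off the low bits of the address and move them into src_offset,
 * so that base + src_offset still points at the same byte. */
static inline struct sketch_attr
sketch_fixup_attr(uint64_t addr, uint32_t src_offset)
{
   uint64_t low = addr & (SKETCH_ATTR_ALIGN - 1);
   struct sketch_attr out = {
      .base = addr & ~(uint64_t)(SKETCH_ATTR_ALIGN - 1),
      .src_offset = src_offset + (uint32_t)low,
   };

   /* The invariant the commit message alludes to: the effective
    * address is unchanged by the fixup. */
   assert(out.base + out.src_offset == addr + src_offset);
   return out;
}
```

For example, an address of 0x1234 with a src_offset of 8 becomes a base of
0x1200 and a src_offset of 60.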

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: Track BO lifetime with jobs and reference counts
Alyssa Rosenzweig [Sun, 14 Apr 2019 22:42:44 +0000 (22:42 +0000)]
panfrost: Track BO lifetime with jobs and reference counts

This (fairly large) patch continues work surrounding the panfrost_job
abstraction to improve job lifetime management. In particular, we add
infrastructure to track which BOs are used by a particular job
(currently limited to the vertex buffer BOs), to reference count these
BOs, and to automatically manage the BOs' memory based on the reference
count. This set of changes serves as a code cleanup, as a way of future
proofing for allowing flushing BOs, and immediately as a bugfix to
work around the missing reference counting for vertex buffer BOs.
Meanwhile, there are a few cleanups to vertex buffer handling code
itself, so in the short-term, this allows us to remove the costly VBO
staging workaround, since this patch addresses the underlying causes.
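
The general shape of reference-counted BO lifetime can be sketched as below.
This is a generic illustration, not the panfrost code: the sketch_bo type and
its helpers are hypothetical, and plain malloc/free stands in for real GPU
memory management.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer object that only carries a reference count. */
struct sketch_bo {
   int refcount;
};

/* Create a BO holding one reference for the caller. */
static inline struct sketch_bo *
sketch_bo_create(void)
{
   struct sketch_bo *bo = calloc(1, sizeof(*bo));
   if (bo)
      bo->refcount = 1;
   return bo;
}

/* A job that uses the BO takes a reference for as long as it is alive. */
static inline void
sketch_bo_reference(struct sketch_bo *bo)
{
   assert(bo->refcount > 0);
   bo->refcount++;
}

/* Drop one reference; free the BO and return true when it was the last. */
static inline bool
sketch_bo_unreference(struct sketch_bo *bo)
{
   assert(bo->refcount > 0);
   if (--bo->refcount == 0) {
      free(bo);
      return true;
   }
   return false;
}
```

The BO's memory is released only when both the creator and every job that
referenced it have dropped their references.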

v2: Use pipe_reference for BO reference counting, rather than managing
it ourselves. Don't duplicate hash-table key removal. Fix vertex buffer
counting.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agodocs/relnotes: add support for VK_KHR_shader_float16_int8
Andres Gomez [Thu, 18 Apr 2019 19:00:37 +0000 (21:00 +0200)]
docs/relnotes: add support for VK_KHR_shader_float16_int8

v2: radv also supports it now (Samuel Pitoiset).

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoanv/nir: Add a central helper for figuring out SSBO address formats
Jason Ekstrand [Thu, 18 Apr 2019 17:08:57 +0000 (12:08 -0500)]
anv/nir: Add a central helper for figuring out SSBO address formats

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: Add helpers for getting the type of an address format
Jason Ekstrand [Thu, 18 Apr 2019 17:08:34 +0000 (12:08 -0500)]
nir: Add helpers for getting the type of an address format

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Implement VK_EXT_descriptor_indexing
Jason Ekstrand [Wed, 27 Feb 2019 22:08:20 +0000 (16:08 -0600)]
anv: Implement VK_EXT_descriptor_indexing

Now that everything is in place to do bindless for all resource types
except input attachments and UBOs, VK_EXT_descriptor_indexing is
"trivial".

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Put binding flags in descriptor set layouts
Jason Ekstrand [Tue, 2 Oct 2018 20:35:59 +0000 (15:35 -0500)]
anv: Put binding flags in descriptor set layouts

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Use bindless handles for images
Jason Ekstrand [Tue, 12 Feb 2019 07:02:28 +0000 (01:02 -0600)]
anv: Use bindless handles for images

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/fs: Add support for bindless image load/store/atomic
Jason Ekstrand [Tue, 12 Feb 2019 06:47:54 +0000 (00:47 -0600)]
intel/fs: Add support for bindless image load/store/atomic

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Use bindless textures and samplers
Jason Ekstrand [Thu, 7 Feb 2019 20:10:33 +0000 (14:10 -0600)]
anv: Use bindless textures and samplers

This commit changes anv to put bindless handles and sampler pointers
into the descriptor buffer and use those instead of bindful when we run
out of binding table space.  This "spilling" of descriptors allows us to
advertise an almost unbounded number of images and samplers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Pass the plane into lower_tex_deref
Jason Ekstrand [Fri, 8 Feb 2019 23:04:07 +0000 (17:04 -0600)]
anv: Pass the plane into lower_tex_deref

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Use write_image_view to initialize immutable samplers
Jason Ekstrand [Fri, 8 Feb 2019 04:34:57 +0000 (22:34 -0600)]
anv: Use write_image_view to initialize immutable samplers

Instead of setting it manually, call the helper.  When setting
descriptor sets becomes more complicated than just setting some struct
values, this will keep immutable sampler handling correct.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Count the number of planes in each descriptor binding
Jason Ekstrand [Thu, 7 Feb 2019 16:16:24 +0000 (10:16 -0600)]
anv: Count the number of planes in each descriptor binding

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/fs: Add support for bindless texture ops
Jason Ekstrand [Wed, 6 Feb 2019 21:42:17 +0000 (15:42 -0600)]
intel/fs: Add support for bindless texture ops

We add two new texture sources for bindless surface and sampler handles.
Bindless surface handles are expected to be pre-shifted so that the
20-bit surface state table index is in the top 20 bits of the 32-bit
handle.  This lets us avoid any extra shifts in the shader.  Bindless
sampler handles are 32-byte aligned byte offsets from general state base
address.  We use 32-byte aligned instead of 16-byte aligned to avoid
having to use more indirect messages than needed.  It means we can't
tightly pack samplers but that's probably not a big deal.
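
The handle layout described above can be sketched as follows. The helper
names are hypothetical; only the 20-bit index in the top bits of a 32-bit
handle and the 32-byte sampler alignment come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pre-shift a 20-bit surface state table index into the top 20 bits of
 * a 32-bit handle, so the shader needs no extra shift. */
static inline uint32_t
sketch_surface_handle(uint32_t index)
{
   assert(index < (1u << 20));
   return index << 12;
}

/* A bindless sampler handle is a byte offset that must be 32-byte
 * aligned; check that requirement. */
static inline bool
sketch_sampler_offset_is_valid(uint32_t byte_offset)
{
   return (byte_offset & 31u) == 0;
}
```

Index 1 becomes handle 0x1000, and a sampler offset of 48 would be rejected
because it is only 16-byte aligned.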

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel,nir: Lower TXD with a bindless sampler
Jason Ekstrand [Fri, 8 Feb 2019 23:56:52 +0000 (17:56 -0600)]
intel,nir: Lower TXD with a bindless sampler

When we have a bindless sampler, we need an instruction header.  Even in
SIMD8, this pushes the instruction over the sampler message size maximum
of 11 registers.  Instead, we have to lower TXD to TXL.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Implement VK_KHR_shader_atomic_int64
Jason Ekstrand [Sun, 13 Jan 2019 00:30:47 +0000 (18:30 -0600)]
anv: Implement VK_KHR_shader_atomic_int64

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Implement SSBOs bindings with GPU addresses in the descriptor BO
Jason Ekstrand [Wed, 9 Jan 2019 22:04:22 +0000 (16:04 -0600)]
anv: Implement SSBOs bindings with GPU addresses in the descriptor BO

This commit adds a new way for ANV to do SSBO bindings by just passing a
GPU address in through the descriptor buffer and using the A64 messages
to access the GPU address directly.  This means that our variable
pointers are now "real" pointers instead of a vec2(BTI, offset) pair.
This carries a few advantages:

 1. It lets us support a virtually unbounded number of SSBO bindings.

 2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't
    implement before because those atomic messages are only available
    in the bindless A64 form.

 3. It's way better than messing around with bindless handles for SSBOs
    which is the only other option for VK_EXT_descriptor_indexing.

 4. It's more future looking, maybe?  At the least, this is what NVIDIA
    does (they don't have binding based SSBOs at all).  This doesn't a
    priori mean it's better, it just means it's probably not terrible.

The big disadvantage, of course, is that we have to start doing our own
bounds checking for robustBufferAccess again and have to push in dynamic
offsets.
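
To illustrate what "doing our own bounds checking" means in principle, here
is a generic overflow-safe range check. It is a sketch only; the function
name and parameters are hypothetical, and the real driver emits such checks
as shader code rather than calling a C helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if an access of access_size bytes starting at offset
 * lies entirely inside a buffer of buffer_size bytes.  The comparison
 * is arranged so that offset + access_size is never computed and
 * therefore cannot overflow. */
static inline bool
sketch_access_in_bounds(uint64_t offset, uint64_t access_size,
                        uint64_t buffer_size)
{
   return access_size <= buffer_size &&
          offset <= buffer_size - access_size;
}
```

With a 16-byte buffer, a 4-byte access at offset 12 is in bounds while the
same access at offset 13 is not.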

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Lower some SSBO operations in apply_pipeline_layout
Jason Ekstrand [Fri, 11 Jan 2019 22:52:43 +0000 (16:52 -0600)]
anv: Lower some SSBO operations in apply_pipeline_layout

In order to avoid the potential overhead of A64 operations on all SSBO
ops, we look for those SSBO ops where we can get to the descriptor set
from the SSBO access operation and lower those to a binding-table
approach.  When robustBufferAccess is enabled, this lets the hardware do
the bounds checking for us.  It also avoids some potentially expensive
64-bit integer calculations.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Add a has_a64_buffer_access to anv_physical_device
Jason Ekstrand [Thu, 7 Feb 2019 18:01:18 +0000 (12:01 -0600)]
anv: Add a has_a64_buffer_access to anv_physical_device

This is more descriptive and a bit nicer than checking for gen >= 8 &&
use_softpin everywhere.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/nir: Re-run int64 lowering in postprocess_nir
Jason Ekstrand [Thu, 10 Jan 2019 22:05:06 +0000 (16:05 -0600)]
intel/nir: Re-run int64 lowering in postprocess_nir

We're about to start doing 64-bit pointer calculations in ANV.  They
will get applied after brw_preprocess_nir which is where we currently do
64-bit integer arithmetic lowering.  Because we're adding 64-bit integer
arithmetic after the initial lowering has happened, we need to lower
again.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir/lower_io: Expose some explicit I/O lowering helpers
Jason Ekstrand [Tue, 8 Jan 2019 00:00:22 +0000 (18:00 -0600)]
nir/lower_io: Expose some explicit I/O lowering helpers

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv/pipeline: Add skeleton support for spilling to bindless
Jason Ekstrand [Mon, 25 Feb 2019 19:59:07 +0000 (13:59 -0600)]
anv/pipeline: Add skeleton support for spilling to bindless

If the number of surfaces or samplers exceeds what we can put in a
table, we will want to spill out to bindless.  There is no bindless
support yet but this gets us the basic framework that will be used by
later commits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv/pipeline: Sort bindings by most used first
Jason Ekstrand [Thu, 21 Feb 2019 00:14:56 +0000 (18:14 -0600)]
anv/pipeline: Sort bindings by most used first

This commit just sorts the bindings by how often they're used vs the
array size of the binding.  This will let us make more nuanced decisions
about what goes in the binding table vs. what to make bindless.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Add a #define for the max binding table size
Jason Ekstrand [Tue, 16 Apr 2019 22:35:05 +0000 (17:35 -0500)]
anv: Add a #define for the max binding table size

This also fixes a bug where we mis-calculate maximum binding table sizes
and may return true in vkGetDescriptorSetLayoutSupport even for sets too
large to fit in a binding table.

Fixes: ddc40691221 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Put image params in the descriptor set buffer on gen8 and earlier
Jason Ekstrand [Thu, 22 Nov 2018 00:26:27 +0000 (18:26 -0600)]
anv: Put image params in the descriptor set buffer on gen8 and earlier

This is really where they belong; not push constants.  The one downside
here is that we can't push them anymore for compute shaders.  However,
that's a general problem and we should figure out how to push descriptor
sets for compute shaders.  This lets us bump MAX_IMAGES to 64 on BDW and
earlier platforms because we no longer have to worry about push constant
overhead limits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Make all VkDeviceMemory BOs resident permanently
Jason Ekstrand [Wed, 27 Feb 2019 00:05:34 +0000 (18:05 -0600)]
anv: Make all VkDeviceMemory BOs resident permanently

We spend a lot of time in the driver adding things to hash sets to track
residency.  The reality is that a properly built Vulkan app uses large
memory objects and sub-allocates from them.  In a typical frame, most,
if not all, of those allocations are going to be resident for the entire
frame so we're really not saving ourselves much by tracking fine-grained
residency.  Just throwing everything in the validation list does make it
a little bit more expensive inside the kernel to walk the list and
ensure that all our VA is in order.  However, without relocations, the
overhead of that is pretty small.

If we ever do run into a memory pressure situation where the fine-
grained residency could even potentially help, we would likely be
swapping one page out to make room for another within the draw call and
performance is totally lost at that point.  We're better off swapping
out other apps and just letting ours run a whole frame.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agofreedreno/ir3: fix const assert
Rob Clark [Fri, 19 Apr 2019 18:32:22 +0000 (11:32 -0700)]
freedreno/ir3: fix const assert

Fixes: fe8c57e859d freedreno/ir3: use nir_src_as_uint in a few places
Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agogallium/auxiliary/vl: Fix a couple of warnings
Kristian H. Kristensen [Thu, 18 Apr 2019 18:33:10 +0000 (11:33 -0700)]
gallium/auxiliary/vl: Fix a couple of warnings

Remove unused functions and mark unhandled default case with
unreachable.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoegl/dri2: Mark potentially unused 'display' variable with MAYBE_UNUSED
Kristian H. Kristensen [Thu, 18 Apr 2019 18:30:22 +0000 (11:30 -0700)]
egl/dri2: Mark potentially unused 'display' variable with MAYBE_UNUSED

Sometimes there is no X11 platform.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoralloc: Fully qualify non-virtual destructor call
Kristian H. Kristensen [Thu, 18 Apr 2019 18:28:12 +0000 (11:28 -0700)]
ralloc: Fully qualify non-virtual destructor call

This suppresses warning about calling a non-virtual destructor in a
non-final class with virtual functions:

src/compiler/glsl/ast.h:53:4: warning: destructor called on non-final 'ast_node' that has virtual functions but non-virtual destructor [-Wdelete-non-virtual-dtor]
   DECLARE_LINEAR_ZALLOC_CXX_OPERATORS(ast_node);

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agonir_opcodes.py: Saturate to expression that doesn't overflow
Kristian H. Kristensen [Thu, 18 Apr 2019 18:23:13 +0000 (11:23 -0700)]
nir_opcodes.py: Saturate to expression that doesn't overflow

Compiler warns about overflow when assigning UINT64_MAX to something
smaller than a uint64_t:

src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion]
            uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1);
                    ~~~                          ^~~~~~~~~~

Shift UINT64_MAX down to the appropriate maximum value for the type
being assigned to.
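
The shift described above can be sketched like this. It assumes values are
carried in a uint64_t together with an explicit bit size; the sketch_* names
are hypothetical and this is not the generated constant-folding code.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum unsigned value representable in bit_size bits, obtained by
 * shifting UINT64_MAX down instead of truncating it implicitly. */
static inline uint64_t
sketch_uint_max(unsigned bit_size)
{
   assert(bit_size >= 1 && bit_size <= 64);
   return UINT64_MAX >> (64 - bit_size);
}

/* Saturating unsigned add for values that already fit in bit_size bits. */
static inline uint64_t
sketch_uadd_sat(uint64_t a, uint64_t b, unsigned bit_size)
{
   uint64_t max = sketch_uint_max(bit_size);
   assert(a <= max && b <= max);
   return (b > max - a) ? max : a + b;
}
```

For an 8-bit type, 200 + 100 saturates to 255 rather than relying on an
implicit conversion from UINT64_MAX.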

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoglsl_to_nir: Initialize debug variable
Kristian H. Kristensen [Wed, 10 Apr 2019 20:10:48 +0000 (13:10 -0700)]
glsl_to_nir: Initialize debug variable

If we want to assert on found == true when the loop exits early, we
need to initialize it to false.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agotgsi: Mark tgsi_strings_check() unused
Kristian H. Kristensen [Wed, 10 Apr 2019 20:09:01 +0000 (13:09 -0700)]
tgsi: Mark tgsi_strings_check() unused

It's there to hold the static asserts; don't warn about it being
unused.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
5 years agoanv: limit URB reconfigurations when using blorp
Lionel Landwerlin [Wed, 6 Mar 2019 11:42:14 +0000 (11:42 +0000)]
anv: limit URB reconfigurations when using blorp

If the last graphics pipeline bound to the command buffer has enough
space in its VS URB entries for Blorp then avoid reconfiguring the URB
partitions.

v2: s/0/MESA_SHADER_VERTEX/ (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/devinfo: add basic sanity tests on device database
Lionel Landwerlin [Thu, 11 Apr 2019 11:12:38 +0000 (12:12 +0100)]
intel/devinfo: add basic sanity tests on device database

v2: #undef NDEBUG (Eric)
    Use inc_include & inc_src (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agointel/devinfo: fix missing num_thread_per_eu on ICL
Lionel Landwerlin [Thu, 11 Apr 2019 11:20:36 +0000 (12:20 +0100)]
intel/devinfo: fix missing num_thread_per_eu on ICL

There was an assumption that num_thread_per_eu would be set in the
Gen8 features. Since this is mostly the same for all of gen8->11 (except
GEN9_LP that overwrites it) let's just factor it out.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agonir: Use the nir_builder _imm helpers in setting up deref offsets.
Eric Anholt [Wed, 17 Apr 2019 17:12:48 +0000 (10:12 -0700)]
nir: Use the nir_builder _imm helpers in setting up deref offsets.

When looking at the dEQP nested_struct_array_dynamic_index_fragment code
after lowering, I was horrified at the amount of adding and multiplying by
0 we were doing.  The builder _imm helpers handle that for you so that the
following optimization passes have less work to do.  Plus, it's easier to
read.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Fix deref offset calculation for structs.
Eric Anholt [Wed, 17 Apr 2019 17:09:14 +0000 (10:09 -0700)]
nir: Fix deref offset calculation for structs.

We were calculating the offset for the field within the struct, and just
dropping it on the floor.  Fixes a regression in
KHR-GLES3.shaders.struct.local.nested_struct_array_dynamic_index_fragment
and a few of its friends since the scratch lowering commit.

Fixes: e8e159e9df40 ("nir/deref: Add helpers for getting offsets")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agolima: enable nir fsign lowering in ppir
Erico Nunes [Tue, 16 Apr 2019 20:49:51 +0000 (22:49 +0200)]
lima: enable nir fsign lowering in ppir

The mali utgard pp doesn't support a sign instruction.
Use the nir lowering function for fsign to implement fsign in ppir.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agonir/algebraic: add lowering for fsign
Erico Nunes [Tue, 16 Apr 2019 20:49:41 +0000 (22:49 +0200)]
nir/algebraic: add lowering for fsign

The mali utgard pp doesn't support a sign instruction.
In the ARM offline shader compiler, the sign function is implemented
using sub(gt(0.0, a), lt(0.0, a)).
This is a generic optimization, so implement it at the NIR level when
lower_fsign is set, alongside the lowering for isign.
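
As a scalar illustration of the lowering, sign(a) can be written as the
difference of two comparisons. The sketch below uses the usual definition
(a > 0) - (a < 0); the operand order of gt/lt quoted above follows the ARM
compiler's notation, which is not verified here. The function name is
hypothetical.

```c
/* sign(a) built from two comparisons and a subtract: each comparison
 * yields 0 or 1, so the result is -1.0, 0.0 or 1.0. */
static inline float
sketch_fsign(float a)
{
   return (float)(0.0f < a) - (float)(a < 0.0f);
}
```

Both comparisons are false for zero, so the result there is 0.0.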

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agodocs: s/Aptril/April/
Brian Paul [Fri, 19 Apr 2019 14:30:27 +0000 (08:30 -0600)]
docs: s/Aptril/April/

Found by Manuel Huber.  Trivial.

5 years agolima/ppir: support ppir_op_ceil
Erico Nunes [Tue, 16 Apr 2019 21:21:24 +0000 (23:21 +0200)]
lima/ppir: support ppir_op_ceil

Add a few missing ppir_op_ceil enum handling entries to implement
nir_op_fceil in lima ppir.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
5 years agoradv: Support VK_EXT_inline_uniform_block.
Bas Nieuwenhuizen [Thu, 14 Mar 2019 10:20:53 +0000 (11:20 +0100)]
radv: Support VK_EXT_inline_uniform_block.

Basically just reserve the memory in the descriptor sets.

On the shader side we construct a buffer descriptor, since
AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken.

This fully supports update after bind and variable descriptor set
sizes. However, the limits are somewhat arbitrary and are mostly
about finding a reasonable division of a 2 GiB max memory size over
the set.

v2: - rebased on top of master (Samuel)
    - remove the loading resources rework (Samuel)
    - only load UBO descriptors if it's a pointer (Samuel)
    - use LLVMBuildPtrToInt to avoid IR failures (Samuel)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)
5 years agoac/nir: use the new raw/struct SSBO atomic intrisics for comp_swap
Samuel Pitoiset [Thu, 18 Apr 2019 07:09:55 +0000 (09:09 +0200)]
ac/nir: use the new raw/struct SSBO atomic intrisics for comp_swap

This is actually fixed now.

This change requires LLVM r358579. Make sure to have it in
your tree, otherwise the following piglit will hang:

tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoac/nir: only use the new raw/struct SSBO atomic intrinsics with LLVM 9+
Samuel Pitoiset [Thu, 18 Apr 2019 07:06:49 +0000 (09:06 +0200)]
ac/nir: only use the new raw/struct SSBO atomic intrinsics with LLVM 9+

They are buggy with older LLVM versions, see r358579.

Fixes: 78c551aca1c ("ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+
Samuel Pitoiset [Thu, 18 Apr 2019 07:17:04 +0000 (09:17 +0200)]
ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+

They are buggy with LLVM 8 because they weren't marked as source
of divergence, see r358579.

Fixes: dd0172e865f ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoiris: Be less aggressive at postdraw work skipping
Kenneth Graunke [Thu, 18 Apr 2019 21:35:26 +0000 (14:35 -0700)]
iris: Be less aggressive at postdraw work skipping

We empty the cache sets when flushing the batch, at which point we need
to add any framebuffer related BOs even though the bindings haven't
changed.  So, we now do the cache set tracking unconditionally.

For now, we continue skipping resolve work based on the same conditions
in the predraw functions - the thinking is if we didn't trigger
resolves, there's nothing to update here.  Time will tell if this works.

Partly reverts commit 365886ebe1a54f893b688b457553eead6aa572ea, and
fixes Unigine Valley rendering on Gen9+.  Drops drawoverhead scores
by about 10-12%.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110353

5 years agointel/fs: Account for live range lengths in spill costs
Jason Ekstrand [Sat, 13 Apr 2019 21:01:50 +0000 (16:01 -0500)]
intel/fs: Account for live range lengths in spill costs

The current register allocator has a concept of "spill benefit" which is
based on the number of nodes with which a given node interferes.  The
idea is that you want to spill stuff with high interference because
those are the most likely registers to help when spilling.  However,
this fails to take into account the length of the live range so the
allocator frequently picks "cheap" (not many uses) registers which are
actually very short lived and so spilling them doesn't help with the
pressure situation.

This commit takes into account the length of the live range to make
long-lived registers more likely to get spilled than short-lived ones.
This encourages the spill chooser to choose slightly larger registers
which will affect a larger area of the program and hopefully we have to
spill fewer of them to get the same reduction in over-all register
pressure.
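
As a rough illustration of the idea above, here is a minimal sketch that divides a spill cost by the live range length so that long-lived registers look cheaper to spill. The struct, the field names, and the plain division are assumptions made for this example; the message does not state the exact weighting the allocator uses.

```c
#include <stddef.h>

/* Hypothetical per-register record; the field names are illustrative only. */
struct spill_candidate {
   float spill_cost;   /* estimated number of spill/fill instructions needed */
   int live_start;     /* first instruction index where the value is live */
   int live_end;       /* last instruction index where the value is live */
};

/* Assumed weighting: cost divided by live range length, so a register that
 * is live for a long time gets a lower adjusted cost.
 */
static float
adjusted_spill_cost(const struct spill_candidate *c)
{
   int live_length = c->live_end - c->live_start;
   if (live_length <= 0)
      live_length = 1;   /* avoid dividing by zero for degenerate ranges */
   return c->spill_cost / (float)live_length;
}

/* Return the index of the candidate with the lowest adjusted cost,
 * or -1 when there are no candidates.
 */
static int
choose_spill_candidate(const struct spill_candidate *c, size_t count)
{
   int best = -1;
   for (size_t i = 0; i < count; i++) {
      if (best < 0 ||
          adjusted_spill_cost(&c[i]) < adjusted_spill_cost(&c[best]))
         best = (int)i;
   }
   return best;
}
```

With a short-lived register of cost 2 over 2 instructions and a long-lived one of cost 4 over 100 instructions, this sketch picks the long-lived one even though its raw cost is higher.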

Shader-db results on Kaby Lake:

    total spills in shared programs: 23664 -> 12050 (-49.08%)
    spills in affected programs: 19243 -> 7629 (-60.35%)
    helped: 296
    HURT: 8

    total fills in shared programs: 32028 -> 25139 (-21.51%)
    fills in affected programs: 20378 -> 13489 (-33.81%)
    helped: 295
    HURT: 16

Of course, most of that is in Deus Ex...

Shader-db results on Kaby Lake (without Deus Ex):

    total spills in shared programs: 6479 -> 5834 (-9.96%)
    spills in affected programs: 3231 -> 2586 (-19.96%)
    helped: 40
    HURT: 4

    total fills in shared programs: 17165 -> 17099 (-0.38%)
    fills in affected programs: 6951 -> 6885 (-0.95%)
    helped: 40
    HURT: 7

Even without Deus Ex, the spill help is pretty respectable.  The worst
hurt shaders were one compute shader in Aztec Ruins and one fragment
shader in KSP that were each hurt by around 13% in fills and 9% in spills.

VkPipeline-db results on Kaby Lake:

    total spills in shared programs: 9149 -> 8069 (-11.80%)
    spills in affected programs: 5197 -> 4117 (-20.78%)
    helped: 27
    HURT: 16

    total fills in shared programs: 26390 -> 25477 (-3.46%)
    fills in affected programs: 12662 -> 11749 (-7.21%)
    helped: 24
    HURT: 22

The Vulkan results were decidedly more mixed but we don't have nearly as
many apps in that database yet.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agovirgl/vtest: bump up protocol version + support encoded transfers
Gurchetan Singh [Sat, 15 Dec 2018 00:07:19 +0000 (16:07 -0800)]
virgl/vtest: bump up protocol version + support encoded transfers

This more accurately reflects what the drm winsys does.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: wait after issuing a transfer get
Gurchetan Singh [Sat, 15 Dec 2018 00:36:07 +0000 (16:36 -0800)]
virgl/vtest: wait after issuing a transfer get

Otherwise, there are artifacts when running Unigine Valley with
protocol version 2.

We can get away with not waiting for most buffers, but let's
be conservative.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: modify sending and receiving data for shared memory
Gurchetan Singh [Thu, 13 Dec 2018 02:01:06 +0000 (18:01 -0800)]
virgl/vtest: modify sending and receiving data for shared memory

We need to copy the shared memory region to the display target.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: receive and handle shared memory fd
Gurchetan Singh [Wed, 12 Dec 2018 17:49:35 +0000 (09:49 -0800)]
virgl/vtest: receive and handle shared memory fd

The only tricky part is with protocol 0 we can either have
a display target or resource backing store.  With protocol
2 we can have both.  Make the map/unmap functions only deal
with the resource backing store.

v2: Handle MSAA texture case.
v3: spelling
v4: Fix dangling else (@prak)
v5: mmap --> os_mmap (@prak) + added comments (@gerddie)

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: plumb support for shared memory
Gurchetan Singh [Wed, 12 Dec 2018 18:08:06 +0000 (10:08 -0800)]
virgl/vtest: plumb support for shared memory

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: add utilities for receiving fds
Gurchetan Singh [Wed, 12 Dec 2018 01:01:34 +0000 (17:01 -0800)]
virgl/vtest: add utilities for receiving fds

v2: recieve --> receive (airlied@)

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: execute a transfer_get when flushing the front buffer
Gurchetan Singh [Wed, 12 Dec 2018 23:43:43 +0000 (15:43 -0800)]
virgl/vtest: execute a transfer_get when flushing the front buffer

This just moves everything to a helper function -- "flush_front_buffer"
will be used later.

virgl_vtest_resource_map / virgl_vtest_resource_unmap already take
care to map the display target.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl: wait after a flush
Gurchetan Singh [Tue, 16 Apr 2019 03:36:54 +0000 (20:36 -0700)]
virgl: wait after a flush

We really need to wait under certain circumstances, or we can end
up writing to memory at the same time the host is reading.

Partial revert of d6dc68 ("virgl: use uint16_t mask instead of separate booleans").

Test cases:
   - dEQP-GLES31.functional.texture.texture_buffer.render_modify.as_vertex_array.bufferdata
     on vtest protocol version 2
   - Flickering during Alien Isolation
Fixes: d6dc68 ("virgl: use uint16_t mask instead of separate booleans")
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agoanv: fix uninitialized pthread cond clock domain
Lionel Landwerlin [Thu, 18 Apr 2019 16:39:36 +0000 (17:39 +0100)]
anv: fix uninitialized pthread cond clock domain

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 843775bab78a6b ("anv: Rework fences")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago.gitignore: Remove autotool artifacts
Kristian H. Kristensen [Thu, 18 Apr 2019 17:31:31 +0000 (10:31 -0700)]
.gitignore: Remove autotool artifacts

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
5 years agov3d: Fix atomic cmpxchg in shaders on hardware.
Eric Anholt [Wed, 17 Apr 2019 21:44:44 +0000 (14:44 -0700)]
v3d: Fix atomic cmpxchg in shaders on hardware.

In what might be my first case of finding a divergence between hardware
and simpenrose for v3d 4.x, it seems that despite what the spec claims,
you actually need specific values in the TYPE field for atomic ops.

Fixes dEQP-GLES31.functional.*.compswap.*

5 years agov3d: Fix an invalid reuse of flags generation from before a thrsw.
Eric Anholt [Wed, 17 Apr 2019 21:07:20 +0000 (14:07 -0700)]
v3d: Fix an invalid reuse of flags generation from before a thrsw.

Noticed while debugging the last GLES 3.1 failure, though it doesn't seem
to affect that bug.

5 years agoanv: Drop some unneeded ANV_FROM_HANDLE for physical devices
Jason Ekstrand [Thu, 18 Apr 2019 20:04:42 +0000 (15:04 -0500)]
anv: Drop some unneeded ANV_FROM_HANDLE for physical devices

Ever since 48ed2a7bb009618ed, we've had one at the top of the function.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Re-sort the GetPhysicalDeviceFeatures2 switch statement
Jason Ekstrand [Thu, 18 Apr 2019 19:19:29 +0000 (14:19 -0500)]
anv: Re-sort the GetPhysicalDeviceFeatures2 switch statement

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoradeonsi/gfx9: use the correct condition for the DPBB + QUANT_MODE workaround
Marek Olšák [Wed, 17 Apr 2019 15:17:18 +0000 (11:17 -0400)]
radeonsi/gfx9: use the correct condition for the DPBB + QUANT_MODE workaround

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agonir/algebraic: Strength reduce some compares of x and -x
Ian Romanick [Tue, 18 Dec 2018 06:29:26 +0000 (22:29 -0800)]
nir/algebraic: Strength reduce some compares of x and -x

Converting the x vs -x comparison to an x vs 0 comparison enables cmod
propagation to help.
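
For illustration only, the sketch below shows one such comparison in plain C. The message does not list which comparison operators are covered, so the choice of less-than here is an assumption, and the function names are made up for the example.

```c
#include <stdbool.h>

/* Original form: compares x against its own negation. */
static bool
less_than_negation(float x)
{
   return x < -x;
}

/* Reduced form: compares x against zero.  For floats this has the same
 * truth value, including for +0.0, -0.0, infinities, and NaN.
 */
static bool
less_than_zero(float x)
{
   return x < 0.0f;
}
```

The reduced form drops the negation and leaves a compare against a constant, which is the shape the message says later passes can take advantage of.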

This seems to be a win everywhere except Gen7.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15566733 -> 15566014 (<.01%)
instructions in affected programs: 72617 -> 71898 (-0.99%)
helped: 302
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2
helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98%
95% mean confidence interval for instructions value: -2.55 -2.21
95% mean confidence interval for instructions %-change: -1.40% -1.16%
Instructions are helped.

total cycles in shared programs: 413014786 -> 413015475 (<.01%)
cycles in affected programs: 707594 -> 708283 (0.10%)
helped: 227
HURT: 101
helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20
helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49%
HURT stats (abs)   min: 2 max: 334 x̄: 87.90 x̃: 45
HURT stats (rel)   min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36%
95% mean confidence interval for cycles value: -8.12 12.32
95% mean confidence interval for cycles %-change: -0.67% 0.34%
Inconclusive result (value mean confidence interval includes 0).

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13828220 -> 13827881 (<.01%)
instructions in affected programs: 60887 -> 60548 (-0.56%)
helped: 253
HURT: 6
helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1
helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27%
95% mean confidence interval for instructions value: -1.39 -1.23
95% mean confidence interval for instructions %-change: -0.85% -0.70%
Instructions are helped.

total cycles in shared programs: 386870095 -> 386894412 (<.01%)
cycles in affected programs: 1537307 -> 1561624 (1.58%)
helped: 127
HURT: 188
helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4
helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33%
HURT stats (abs)   min: 2 max: 5585 x̄: 141.43 x̃: 14
HURT stats (rel)   min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06%
95% mean confidence interval for cycles value: 21.95 132.45
95% mean confidence interval for cycles %-change: 0.32% 0.85%
Cycles are HURT.

Sandy Bridge
total instructions in shared programs: 10896339 -> 10896276 (<.01%)
instructions in affected programs: 10757 -> 10694 (-0.59%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1
helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89%
95% mean confidence interval for instructions value: -1.42 -1.15
95% mean confidence interval for instructions %-change: -1.03% -0.72%
Instructions are helped.

total cycles in shared programs: 155091003 -> 155090480 (<.01%)
cycles in affected programs: 102761 -> 102238 (-0.51%)
helped: 51
HURT: 0
helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4
helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36%
95% mean confidence interval for cycles value: -12.98 -7.53
95% mean confidence interval for cycles %-change: -0.97% -0.56%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8234667 -> 8234652 (<.01%)
instructions in affected programs: 2063 -> 2048 (-0.73%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.97% -0.67%
Instructions are helped.

total cycles in shared programs: 188700906 -> 188700598 (<.01%)
cycles in affected programs: 283480 -> 283172 (-0.11%)
helped: 83
HURT: 3
helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4
helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04%
95% mean confidence interval for cycles value: -3.87 -3.29
95% mean confidence interval for cycles %-change: -0.16% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir/algebraic: Fix some 1-bit Boolean weirdness
Ian Romanick [Tue, 18 Dec 2018 05:34:11 +0000 (21:34 -0800)]
nir/algebraic: Fix some 1-bit Boolean weirdness

Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total cycles in shared programs: 372594532 -> 372594460 (<.01%)
cycles in affected programs: 46854 -> 46782 (-0.15%)
helped: 9
HURT: 0
helped stats (abs) min: 2 max: 22 x̄: 8.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.41% x̄: 0.16% x̃: 0.09%
95% mean confidence interval for cycles value: -14.34 -1.66
95% mean confidence interval for cycles %-change: -0.28% -0.04%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 12038379 -> 12038373 (<.01%)
instructions in affected programs: 1278 -> 1272 (-0.47%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.31% max: 0.77% x̄: 0.54% x̃: 0.55%

total cycles in shared programs: 180889027 -> 180888997 (<.01%)
cycles in affected programs: 29979 -> 29949 (-0.10%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 5
helped stats (rel) min: 0.02% max: 0.34% x̄: 0.11% x̃: 0.07%
95% mean confidence interval for cycles value: -13.40 1.40
95% mean confidence interval for cycles %-change: -0.27% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total cycles in shared programs: 155091021 -> 155091003 (<.01%)
cycles in affected programs: 8842 -> 8824 (-0.20%)
helped: 2
HURT: 0

No changes on Iron Lake or GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel
Ian Romanick [Thu, 6 Sep 2018 03:45:19 +0000 (20:45 -0700)]
nir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel

All of the affected shaders are in Mad Max.  I noticed this while
looking at some other things.  I tried a couple similar patterns, but
the effect on cycles was generally negative.  It may be worth revisiting
this later.
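
A minimal C sketch of the general equivalence, assuming the Boolean has been widened to an all-zeros or all-ones integer mask. The exact NIR pattern is not given in the message, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Masking form: negating the 0/1 condition gives 0 or 0xffffffff, so the
 * AND either clears the value or passes it through unchanged.
 */
static uint32_t
select_by_mask(bool cond, uint32_t value)
{
   uint32_t mask = (uint32_t)-(int32_t)cond;
   return mask & value;
}

/* Select form: the same result written as a conditional select
 * (the role bcsel plays in NIR).
 */
static uint32_t
select_by_bcsel(bool cond, uint32_t value)
{
   return cond ? value : 0u;
}
```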

v2: Rebase on 1-bit Boolean changes.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282073 -> 15282053 (<.01%)
instructions in affected programs: 1192 -> 1172 (-1.68%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1
helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39%
95% mean confidence interval for instructions value: -1.73 -1.13
95% mean confidence interval for instructions %-change: -1.91% -1.38%
Instructions are helped.

total cycles in shared programs: 372595954 -> 372594532 (<.01%)
cycles in affected programs: 11477 -> 10055 (-12.39%)
helped: 14
HURT: 0
helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104
helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78%
95% mean confidence interval for cycles value: -111.05 -92.09
95% mean confidence interval for cycles %-change: -14.90% -10.98%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir/algebraic: Recognize open-coded copysign(1.0, a)
Ian Romanick [Thu, 22 Feb 2018 02:30:20 +0000 (18:30 -0800)]
nir/algebraic: Recognize open-coded copysign(1.0, a)

All of the affected shaders are in Mad Max.  The inner part of the
pattern is itself an open-coded sign(a).  I tried using that as a
pattern, but the results were not good.  A bunch of shaders were helped
for instructions, but overall cycles, spill, and fills were hurt.
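
To make the shape of the pattern concrete, here is a hedged C sketch. It assumes the open-coded form substitutes 1.0 when a is zero and otherwise uses an open-coded sign(a); the actual NIR expression is not quoted in the message, and all three function names are invented for the example. NaN inputs and the sign of -0.0 are deliberately ignored.

```c
/* Open-coded sign(a): yields -1.0, 0.0, or +1.0. */
static float
open_coded_sign(float a)
{
   return (float)((a > 0.0f) - (a < 0.0f));
}

/* Assumed open-coded form: force the result to 1.0 when a is zero,
 * otherwise take the sign.
 */
static float
open_coded_copysign_one(float a)
{
   return (a == 0.0f) ? 1.0f : open_coded_sign(a);
}

/* Simplified form: a single select on the sign test. */
static float
simplified_copysign_one(float a)
{
   return (a < 0.0f) ? -1.0f : 1.0f;
}
```

For ordinary finite inputs the two forms agree, and the simplified one needs a single comparison instead of three.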

v2: Rebase on 1-bit Boolean changes.

v3: Fix order of copysign() parameters in comments and commit message.
Noticed by Matt.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282141 -> 15282073 (<.01%)
instructions in affected programs: 6106 -> 6038 (-1.11%)
helped: 17
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.30% -1.00%
Instructions are helped.

total cycles in shared programs: 372597886 -> 372595954 (<.01%)
cycles in affected programs: 32701 -> 30769 (-5.91%)
helped: 17
HURT: 0
helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118
helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83%
95% mean confidence interval for cycles value: -152.84 -74.45
95% mean confidence interval for cycles %-change: -8.89% -3.51%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/fs: Generate better code for fsign multiplied by a value
Ian Romanick [Tue, 26 Jun 2018 02:55:31 +0000 (19:55 -0700)]
intel/fs: Generate better code for fsign multiplied by a value

v2: Rebase on v2 changes in previous two commits.

v3: Rebase on 85c35885b38 ("nir: Rework nir_src_as_alu_instr to not take
a pointer").

shader-db results:

Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 15297100 -> 15282141 (-0.10%)
instructions in affected programs: 956685 -> 941726 (-1.56%)
helped: 4527
HURT: 0
helped stats (abs) min: 1 max: 221 x̄: 3.30 x̃: 2
helped stats (rel) min: 0.07% max: 10.53% x̄: 1.85% x̃: 1.37%
95% mean confidence interval for instructions value: -3.48 -3.12
95% mean confidence interval for instructions %-change: -1.88% -1.81%
Instructions are helped.

total cycles in shared programs: 372809551 -> 372597886 (-0.06%)
cycles in affected programs: 13645512 -> 13433847 (-1.55%)
helped: 4362
HURT: 125
helped stats (abs) min: 1 max: 2088 x̄: 50.73 x̃: 28
helped stats (rel) min: 0.01% max: 28.20% x̄: 2.77% x̃: 2.39%
HURT stats (abs)   min: 1 max: 1836 x̄: 76.90 x̃: 28
HURT stats (rel)   min: <.01% max: 34.36% x̄: 3.03% x̃: 1.42%
95% mean confidence interval for cycles value: -50.98 -43.37
95% mean confidence interval for cycles %-change: -2.67% -2.55%
Cycles are helped.

total spills in shared programs: 23465 -> 23463 (<.01%)
spills in affected programs: 42 -> 40 (-4.76%)
helped: 1
HURT: 0

total fills in shared programs: 31766 -> 31763 (<.01%)
fills in affected programs: 69 -> 66 (-4.35%)
helped: 1
HURT: 0

Haswell
total instructions in shared programs: 13839992 -> 13828311 (-0.08%)
instructions in affected programs: 712503 -> 700822 (-1.64%)
helped: 3477
HURT: 0
helped stats (abs) min: 1 max: 221 x̄: 3.36 x̃: 2
helped stats (rel) min: 0.07% max: 10.64% x̄: 1.96% x̃: 1.52%
95% mean confidence interval for instructions value: -3.58 -3.14
95% mean confidence interval for instructions %-change: -2.01% -1.92%
Instructions are helped.

total cycles in shared programs: 387026330 -> 386872483 (-0.04%)
cycles in affected programs: 11329966 -> 11176119 (-1.36%)
helped: 3307
HURT: 139
helped stats (abs) min: 2 max: 1776 x̄: 49.58 x̃: 18
helped stats (rel) min: 0.01% max: 20.38% x̄: 2.27% x̃: 1.79%
HURT stats (abs)   min: 1 max: 2314 x̄: 72.68 x̃: 20
HURT stats (rel)   min: <.01% max: 33.99% x̄: 2.28% x̃: 0.96%
95% mean confidence interval for cycles value: -49.31 -39.98
95% mean confidence interval for cycles %-change: -2.15% -2.01%
Cycles are helped.

LOST:   1
GAINED: 0

Ivy Bridge
total instructions in shared programs: 12045602 -> 12038463 (-0.06%)
instructions in affected programs: 623837 -> 616698 (-1.14%)
helped: 2498
HURT: 0
helped stats (abs) min: 1 max: 39 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.05% max: 10.00% x̄: 1.30% x̃: 1.05%
95% mean confidence interval for instructions value: -2.96 -2.75
95% mean confidence interval for instructions %-change: -1.34% -1.26%
Instructions are helped.

total cycles in shared programs: 181025675 -> 180891323 (-0.07%)
cycles in affected programs: 11329329 -> 11194977 (-1.19%)
helped: 2439
HURT: 47
helped stats (abs) min: 1 max: 1565 x̄: 57.06 x̃: 26
helped stats (rel) min: 0.02% max: 24.56% x̄: 2.02% x̃: 1.64%
HURT stats (abs)   min: 1 max: 1269 x̄: 102.51 x̃: 43
HURT stats (rel)   min: 0.11% max: 52.94% x̄: 4.15% x̃: 1.34%
95% mean confidence interval for cycles value: -59.91 -48.17
95% mean confidence interval for cycles %-change: -1.99% -1.82%
Cycles are helped.

Sandy Bridge, Iron Lake, and GM45 had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10896368 -> 10896339 (<.01%)
instructions in affected programs: 3767 -> 3738 (-0.77%)
helped: 17
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.71 x̃: 1
helped stats (rel) min: 0.13% max: 9.52% x̄: 3.58% x̃: 2.73%
95% mean confidence interval for instructions value: -2.27 -1.14
95% mean confidence interval for instructions %-change: -5.14% -2.03%
Instructions are helped.

total cycles in shared programs: 155091109 -> 155091021 (<.01%)
cycles in affected programs: 47241 -> 47153 (-0.19%)
helped: 15
HURT: 8
helped stats (abs) min: 2 max: 81 x̄: 15.73 x̃: 4
helped stats (rel) min: 0.03% max: 10.59% x̄: 1.55% x̃: 0.71%
HURT stats (abs)   min: 14 max: 32 x̄: 18.50 x̃: 17
HURT stats (rel)   min: 0.32% max: 2.79% x̄: 2.43% x̃: 2.71%
95% mean confidence interval for cycles value: -14.59 6.93
95% mean confidence interval for cycles %-change: -1.41% 1.08%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
5 years agointel/fs: Add a scale factor to emit_fsign
Ian Romanick [Tue, 26 Jun 2018 02:53:38 +0000 (19:53 -0700)]
intel/fs: Add a scale factor to emit_fsign

Normally fsign generates -1, 0, or +1.  The new scale factor, S, causes
fsign to generate -S, 0, or +S.
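
A scalar C sketch of the described behaviour, not the code generator itself. The function name is hypothetical, and treating NaN as zero is an assumption of this example.

```c
/* Returns -scale, 0, or +scale depending on the sign of x.
 * With scale == 1.0f this is the ordinary fsign result.
 */
static float
scaled_fsign(float x, float scale)
{
   if (x > 0.0f)
      return scale;
   if (x < 0.0f)
      return -scale;
   return 0.0f;   /* zero (and, in this sketch, NaN) maps to 0 */
}
```

Folding the scale into the sign computation means that sign(x) * S needs no separate multiply.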

v2: Rebase on v2 changes in previous commit.

v3: Rebase on 85c35885b38 ("nir: Rework nir_src_as_alu_instr to not take
a pointer").

Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
5 years agointel/fs: Refactor code generation for nir_op_fsign to its own function
Ian Romanick [Tue, 26 Jun 2018 02:50:56 +0000 (19:50 -0700)]
intel/fs: Refactor code generation for nir_op_fsign to its own function

v2: Call emit_fsign from inside the existing switch statement.
Suggested by Matt.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/fs: Eliminate dead code first
Ian Romanick [Sun, 9 Sep 2018 18:37:24 +0000 (11:37 -0700)]
intel/fs: Eliminate dead code first

This simplifies the later patch "i965/fs: Generate better code for fsign
multiplied by a value".

shader-db results:

Broadwell and Skylake had similar results. (Skylake shown)
total cycles in shared programs: 372808735 -> 372809551 (<.01%)
cycles in affected programs: 1519520 -> 1520336 (0.05%)
helped: 243
HURT: 277
helped stats (abs) min: 1 max: 226 x̄: 34.05 x̃: 5
helped stats (rel) min: 0.01% max: 13.88% x̄: 1.46% x̃: 0.27%
HURT stats (abs)   min: 1 max: 1810 x̄: 32.82 x̃: 5
HURT stats (rel)   min: 0.01% max: 16.03% x̄: 1.56% x̃: 0.29%
95% mean confidence interval for cycles value: -7.18 10.32
95% mean confidence interval for cycles %-change: -0.17% 0.46%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge, Haswell and Ivy Bridge had similar results. (Sandy Bridge shown)
total cycles in shared programs: 155091458 -> 155091109 (<.01%)
cycles in affected programs: 370797 -> 370448 (-0.09%)
helped: 24
HURT: 36
helped stats (abs) min: 1 max: 331 x̄: 103.17 x̃: 41
helped stats (rel) min: 0.02% max: 7.70% x̄: 2.07% x̃: 0.56%
HURT stats (abs)   min: 1 max: 291 x̄: 59.08 x̃: 10
HURT stats (rel)   min: 0.02% max: 5.29% x̄: 1.02% x̃: 0.15%
95% mean confidence interval for cycles value: -37.92 26.28
95% mean confidence interval for cycles %-change: -0.88% 0.45%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (GM45 shown)
total cycles in shared programs: 129133970 -> 129133978 (<.01%)
cycles in affected programs: 111966 -> 111974 (<.01%)
helped: 3
HURT: 1
helped stats (abs) min: 2 max: 4 x̄: 2.67 x̃: 2
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 16 max: 16 x̄: 16.00 x̃: 16
HURT stats (rel)   min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07%
95% mean confidence interval for cycles value: -12.93 16.93
95% mean confidence interval for cycles %-change: -0.05% 0.08%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agofreedreno: Fix format string warning
Kristian H. Kristensen [Thu, 18 Apr 2019 17:44:02 +0000 (10:44 -0700)]
freedreno: Fix format string warning

Modifiers are uint64_t.
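
For reference, a small sketch of printing a 64-bit value portably with the PRIx64 macro from <inttypes.h>. The helper name and the message text are made up for the example; the message does not show the format string that was changed.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit modifier into buf.  PRIx64 expands to the correct
 * length modifier for uint64_t on the target, which avoids the
 * format string warning that "%lx" or "%llx" can trigger.
 */
static int
format_modifier(char *buf, size_t size, uint64_t modifier)
{
   return snprintf(buf, size, "modifier 0x%016" PRIx64, modifier);
}
```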

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/a6xx: Add helper for incrementing regid
Kristian H. Kristensen [Thu, 18 Apr 2019 17:40:45 +0000 (10:40 -0700)]
freedreno/a6xx: Add helper for incrementing regid

Increments the regid by the specified amount unless the regid is
r63.x (invalid).
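
A minimal sketch of such a helper. The register encoding (register number shifted left by two, component in the low two bits) and the macro and function names are assumptions for illustration; the message itself only describes the behaviour.

```c
#include <stdint.h>

/* Assumed encoding: register number in the upper bits, component
 * (x, y, z, w) in the low two bits.
 */
#define REGID(num, comp) ((uint32_t)(((num) << 2) | ((comp) & 0x3)))
#define INVALID_REGID    REGID(63, 0)   /* r63.x marks "no register" */

/* Advance a regid by the given amount, but keep r63.x as-is so an
 * invalid register stays invalid.
 */
static uint32_t
next_regid(uint32_t reg, uint32_t increment)
{
   if (reg == INVALID_REGID)
      return INVALID_REGID;
   return reg + increment;
}
```

Under this encoding, adding 4 moves from r0.x to r1.x, and adding 1 moves from r2.y to r2.z.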

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: Use enum values from matching enum
Kristian H. Kristensen [Thu, 18 Apr 2019 17:38:56 +0000 (10:38 -0700)]
freedreno: Use enum values from matching enum

We get a couple of warnings from using mismatched enum values. This
fixes that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/a2xx: Fix redundant if statement
Kristian H. Kristensen [Wed, 10 Apr 2019 20:08:00 +0000 (13:08 -0700)]
freedreno/a2xx: Fix redundant if statement

We test the condition, declare a few variables, then test the exact
same condition again. Let's not do that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
5 years agofreedreno/ir3: Mark ir3_context_error() as NORETURN
Kristian H. Kristensen [Wed, 10 Apr 2019 20:06:39 +0000 (13:06 -0700)]
freedreno/ir3: Mark ir3_context_error() as NORETURN

Fixes a few warnings.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
5 years agonir: Add a nir_src_as_intrinsic() helper
Jason Ekstrand [Wed, 17 Apr 2019 22:18:19 +0000 (17:18 -0500)]
nir: Add a nir_src_as_intrinsic() helper

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: Rework nir_src_as_alu_instr to not take a pointer
Jason Ekstrand [Wed, 17 Apr 2019 22:10:18 +0000 (17:10 -0500)]
nir: Rework nir_src_as_alu_instr to not take a pointer

Other nir_src_as_* functions just take a nir_src.  It's not that much
more memory copying and the constness preserving really isn't worth the
cognitive dissonance.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: Drop "struct" from some nir_* declarations
Jason Ekstrand [Wed, 17 Apr 2019 22:01:14 +0000 (17:01 -0500)]
nir: Drop "struct" from some nir_* declarations

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: implement WaEnableStateCacheRedirectToCS
Lionel Landwerlin [Thu, 18 Apr 2019 11:00:19 +0000 (12:00 +0100)]
anv: implement WaEnableStateCacheRedirectToCS

This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.

[1] : https://patchwork.freedesktop.org/series/59494/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agoi965: implement WaEnableStateCacheRedirectToCS
Lionel Landwerlin [Thu, 18 Apr 2019 11:00:08 +0000 (12:00 +0100)]
i965: implement WaEnableStateCacheRedirectToCS

This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.

[1] : https://patchwork.freedesktop.org/series/59494/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agoiris: implement WaEnableStateCacheRedirectToCS
Lionel Landwerlin [Thu, 18 Apr 2019 10:57:57 +0000 (11:57 +0100)]
iris: implement WaEnableStateCacheRedirectToCS

This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.

[1] : https://patchwork.freedesktop.org/series/59494/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>