mesa.git
5 years agolima: add Android build
Icenowy Zheng [Mon, 15 Apr 2019 04:32:43 +0000 (12:32 +0800)]
lima: add Android build

Currently only meson build support is added for the lima driver.

Add Android build support for lima.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Acked-by: Qiang Yu <yuq825@gmail.com>
5 years agost/nine: skip position checks in SetCursorPosition()
Andre Heider [Thu, 11 Apr 2019 06:42:47 +0000 (08:42 +0200)]
st/nine: skip position checks in SetCursorPosition()

For HW cursors, "cursor.pos" doesn't hold the current position of the
pointer, just the position of the last call to SetCursorPosition().

Skip the check against stale values and bump the d3dadapter9 drm version
to expose this change of behaviour.

Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
5 years agoanv: Rework the descriptor set layout create loop
Jason Ekstrand [Fri, 19 Apr 2019 19:45:34 +0000 (14:45 -0500)]
anv: Rework the descriptor set layout create loop

Previously, we were storing the per-binding create info pointer in the
immutable_samplers field temporarily so that we can switch the order in
which we walk the loop.  However, now that we have multiple arrays of
structs to walk, it makes more sense to store an index of some sort.
Because we want to leave immutable_samplers as NULL for undefined
bindings, we store index + 1 and then subtract one later.
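
The index + 1 encoding described above can be sketched as follows. This
is a minimal illustration, not the anv code: struct binding_slot and the
slot_* helpers are hypothetical names, and the sketch uses a plain
integer field where the driver temporarily reuses a pointer field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slot: 0 means "undefined binding", any other value is
 * (create-info index + 1), so a zero-initialized slot stays undefined. */
struct binding_slot {
   uint32_t index_plus_one;
};

static inline void
slot_set_index(struct binding_slot *slot, uint32_t index)
{
   slot->index_plus_one = index + 1;
}

static inline bool
slot_is_defined(const struct binding_slot *slot)
{
   return slot->index_plus_one != 0;
}

/* Only meaningful when slot_is_defined() returns true. */
static inline uint32_t
slot_get_index(const struct binding_slot *slot)
{
   return slot->index_plus_one - 1;
}
```

The point of the offset is that index 0 remains distinguishable from
"never set" without a separate flag.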

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Ignore descriptor binding flags if bindingCount == 0
Jason Ekstrand [Fri, 19 Apr 2019 19:43:01 +0000 (14:43 -0500)]
anv: Ignore descriptor binding flags if bindingCount == 0

I missed this on the first go round.  The bindingCount field of
VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero
which means the flags array is ignored.
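
A minimal sketch of the rule, assuming a stand-in struct rather than the
real Vulkan header: demo_binding_flags_info mirrors only the
bindingCount and pBindingFlags fields, and demo_get_binding_flags is a
hypothetical helper, not the anv function.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two relevant fields of
 * VkDescriptorSetLayoutBindingFlagsCreateInfoEXT. */
struct demo_binding_flags_info {
   uint32_t bindingCount;
   const uint32_t *pBindingFlags;
};

/* Returns the flags for binding b, or 0 when no flags apply. */
static inline uint32_t
demo_get_binding_flags(const struct demo_binding_flags_info *info, uint32_t b)
{
   /* A zero bindingCount means the array must not be read at all. */
   if (info == NULL || info->bindingCount == 0)
      return 0;

   return b < info->bindingCount ? info->pBindingFlags[b] : 0;
}
```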

Fixes: d6c9bd6e01b4d "anv: Put binding flags in descriptor set layouts"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agopanfrost/mdg: Use shared fsign lowering
Alyssa Rosenzweig [Fri, 19 Apr 2019 23:15:45 +0000 (23:15 +0000)]
panfrost/mdg: Use shared fsign lowering

Fixes failures in shaders.operator.common_functions.sign.*

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: Fixup vertex offsets to prevent shadow copy
Alyssa Rosenzweig [Mon, 15 Apr 2019 04:08:46 +0000 (04:08 +0000)]
panfrost: Fixup vertex offsets to prevent shadow copy

Mali attribute buffers have to be 64-byte aligned. However, Gallium
enforces no such requirement; for unaligned buffers, we were previously
forced to create a shadow copy (slow!). To prevent this, we instead use
the offset buffer's address with the lower bits masked off, and then
add those masked off bits to the src_offset. Proof of correctness
included, possibly for the opportunity to say "QED" unironically.
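
The fixup can be sketched as below. This is an illustration under
assumptions, not the panfrost code: struct fixed_attr and
fixup_vertex_offset are hypothetical names, and only the 64-byte
alignment figure comes from the text above.

```c
#include <stdint.h>

#define ATTR_ALIGN 64u /* alignment stated for Mali attribute buffers */

struct fixed_attr {
   uint64_t base;       /* 64-byte aligned buffer address */
   uint32_t src_offset; /* original offset plus the masked-off low bits */
};

static inline struct fixed_attr
fixup_vertex_offset(uint64_t addr, uint32_t src_offset)
{
   uint64_t low_bits = addr & (ATTR_ALIGN - 1);
   struct fixed_attr out = {
      .base = addr & ~(uint64_t)(ATTR_ALIGN - 1),
      .src_offset = src_offset + (uint32_t)low_bits,
   };
   return out;
}
```

Correctness rests on one identity: base + src_offset after the fixup
equals addr + src_offset before it, so every attribute fetch lands on
the same byte.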

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: Track BO lifetime with jobs and reference counts
Alyssa Rosenzweig [Sun, 14 Apr 2019 22:42:44 +0000 (22:42 +0000)]
panfrost: Track BO lifetime with jobs and reference counts

This (fairly large) patch continues work surrounding the panfrost_job
abstraction to improve job lifetime management. In particular, we add
infrastructure to track which BOs are used by a particular job
(currently limited to the vertex buffer BOs), to reference count these
BOs, and to automatically manage the BOs' memory based on the reference
count. This set of changes serves as a code cleanup, as a way of future
proofing for allowing flushing BOs, and immediately as a bugfix to
work around the missing reference counting for vertex buffer BOs.
Meanwhile, there are a few cleanups to vertex buffer handling code
itself, so in the short-term, this allows us to remove the costly VBO
staging workaround, since this patch addresses the underlying causes.

v2: Use pipe_reference for BO reference counting, rather than managing
it ourselves. Don't duplicate hash-table key removal. Fix vertex buffer
counting.
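
A minimal sketch of reference-counted BO lifetime, assuming made-up
names (struct demo_bo and the demo_bo_* helpers). It uses a plain,
non-atomic counter for brevity; the patch itself relies on
pipe_reference as noted in v2.

```c
#include <stdbool.h>
#include <stdlib.h>

struct demo_bo {
   int refcount;
   void *cpu_map; /* stands in for the BO's backing memory */
};

/* The creator holds the first reference. Returns NULL on failure. */
static struct demo_bo *
demo_bo_create(size_t size)
{
   struct demo_bo *bo = calloc(1, sizeof(*bo));
   if (bo == NULL)
      return NULL;

   bo->cpu_map = malloc(size);
   if (bo->cpu_map == NULL) {
      free(bo);
      return NULL;
   }

   bo->refcount = 1;
   return bo;
}

/* Called when a job starts using the BO. */
static void
demo_bo_reference(struct demo_bo *bo)
{
   bo->refcount++;
}

/* Drops one reference; returns true if that freed the BO. */
static bool
demo_bo_unreference(struct demo_bo *bo)
{
   if (--bo->refcount > 0)
      return false;

   free(bo->cpu_map);
   free(bo);
   return true;
}
```

With this shape, a job that references its vertex buffer BOs keeps them
alive until the job itself is retired, even if the application drops its
own reference first.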

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agodocs/relnotes: add support for VK_KHR_shader_float16_int8
Andres Gomez [Thu, 18 Apr 2019 19:00:37 +0000 (21:00 +0200)]
docs/relnotes: add support for VK_KHR_shader_float16_int8

v2: radv also supports it now (Samuel Pitoiset).

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoanv/nir: Add a central helper for figuring out SSBO address formats
Jason Ekstrand [Thu, 18 Apr 2019 17:08:57 +0000 (12:08 -0500)]
anv/nir: Add a central helper for figuring out SSBO address formats

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: Add helpers for getting the type of an address format
Jason Ekstrand [Thu, 18 Apr 2019 17:08:34 +0000 (12:08 -0500)]
nir: Add helpers for getting the type of an address format

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Implement VK_EXT_descriptor_indexing
Jason Ekstrand [Wed, 27 Feb 2019 22:08:20 +0000 (16:08 -0600)]
anv: Implement VK_EXT_descriptor_indexing

Now that everything is in place to do bindless for all resource types
except input attachments and UBOs, VK_EXT_descriptor_indexing is
"trivial".

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Put binding flags in descriptor set layouts
Jason Ekstrand [Tue, 2 Oct 2018 20:35:59 +0000 (15:35 -0500)]
anv: Put binding flags in descriptor set layouts

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Use bindless handles for images
Jason Ekstrand [Tue, 12 Feb 2019 07:02:28 +0000 (01:02 -0600)]
anv: Use bindless handles for images

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/fs: Add support for bindless image load/store/atomic
Jason Ekstrand [Tue, 12 Feb 2019 06:47:54 +0000 (00:47 -0600)]
intel/fs: Add support for bindless image load/store/atomic

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Use bindless textures and samplers
Jason Ekstrand [Thu, 7 Feb 2019 20:10:33 +0000 (14:10 -0600)]
anv: Use bindless textures and samplers

This commit changes anv to put bindless handles and sampler pointers
into the descriptor buffer and use those instead of bindful when we run
out of binding table space.  This "spilling" of descriptors allows us to
advertise an almost unbounded number of images and samplers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Pass the plane into lower_tex_deref
Jason Ekstrand [Fri, 8 Feb 2019 23:04:07 +0000 (17:04 -0600)]
anv: Pass the plane into lower_tex_deref

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Use write_image_view to initialize immutable samplers
Jason Ekstrand [Fri, 8 Feb 2019 04:34:57 +0000 (22:34 -0600)]
anv: Use write_image_view to initialize immutable samplers

Instead of setting it manually, call the helper.  When setting
descriptor sets becomes more complicated than just setting some struct
values, this will keep immutable sampler handling correct.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Count the number of planes in each descriptor binding
Jason Ekstrand [Thu, 7 Feb 2019 16:16:24 +0000 (10:16 -0600)]
anv: Count the number of planes in each descriptor binding

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/fs: Add support for bindless texture ops
Jason Ekstrand [Wed, 6 Feb 2019 21:42:17 +0000 (15:42 -0600)]
intel/fs: Add support for bindless texture ops

We add two new texture sources for bindless surface and sampler handles.
Bindless surface handles are expected to be pre-shifted so that the
20-bit surface state table index is in the top 20 bits of the 32-bit
handle.  This lets us avoid any extra shifts in the shader.  Bindless
sampler handles are 32-byte aligned byte offsets from general state base
address.  We use 32-byte aligned instead of 16-byte aligned to avoid
having to use more indirect messages than needed.  It means we can't
tightly pack samplers but that's probably not a big deal.
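
The two handle layouts can be sketched as follows. The function names
are hypothetical; the only facts taken from the text above are the
20-bit index in the top bits of a 32-bit handle and the 32-byte
alignment of sampler offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pre-shift a 20-bit surface state index into the top 20 bits of a
 * 32-bit handle, so the shader can use it without shifting. */
static inline uint32_t
pack_bindless_surface_handle(uint32_t surface_index)
{
   assert(surface_index < (1u << 20));
   return surface_index << 12;
}

/* Sampler handles are byte offsets that must be 32-byte aligned. */
static inline bool
sampler_offset_is_valid(uint32_t byte_offset)
{
   return (byte_offset & 31u) == 0;
}
```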

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel,nir: Lower TXD with a bindless sampler
Jason Ekstrand [Fri, 8 Feb 2019 23:56:52 +0000 (17:56 -0600)]
intel,nir: Lower TXD with a bindless sampler

When we have a bindless sampler, we need an instruction header.  Even in
SIMD8, this pushes the instruction over the sampler message size maximum
of 11 registers.  Instead, we have to lower TXD to TXL.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Implement VK_KHR_shader_atomic_int64
Jason Ekstrand [Sun, 13 Jan 2019 00:30:47 +0000 (18:30 -0600)]
anv: Implement VK_KHR_shader_atomic_int64

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Implement SSBOs bindings with GPU addresses in the descriptor BO
Jason Ekstrand [Wed, 9 Jan 2019 22:04:22 +0000 (16:04 -0600)]
anv: Implement SSBOs bindings with GPU addresses in the descriptor BO

This commit adds a new way for ANV to do SSBO bindings by just passing a
GPU address in through the descriptor buffer and using the A64 messages
to access the GPU address directly.  This means that our variable
pointers are now "real" pointers instead of a vec2(BTI, offset) pair.
This carries a few advantages:

 1. It lets us support a virtually unbounded number of SSBO bindings.

 2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't
    implement before because those atomic messages are only available
    in the bindless A64 form.

 3. It's way better than messing around with bindless handles for SSBOs
    which is the only other option for VK_EXT_descriptor_indexing.

 4. It's more future looking, maybe?  At the least, this is what NVIDIA
    does (they don't have binding based SSBOs at all).  This doesn't a
    priori mean it's better, it just means it's probably not terrible.

The big disadvantage, of course, is that we have to start doing our own
bounds checking for robustBufferAccess again and have to push in dynamic
offsets.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Lower some SSBO operations in apply_pipeline_layout
Jason Ekstrand [Fri, 11 Jan 2019 22:52:43 +0000 (16:52 -0600)]
anv: Lower some SSBO operations in apply_pipeline_layout

In order to avoid the potential overhead of A64 operations on all SSBO
ops, we look for those SSBO ops where we can get to the descriptor set
from the SSBO access operation and lower those to a binding-table
approach.  When robustBufferAccess is enabled, this lets the hardware do
the bounds checking for us.  It also avoids some potentially expensive
64-bit integer calculations.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Add a has_a64_buffer_access to anv_physical_device
Jason Ekstrand [Thu, 7 Feb 2019 18:01:18 +0000 (12:01 -0600)]
anv: Add a has_a64_buffer_access to anv_physical_device

This is more descriptive and a bit nicer than checking for gen >= 8 &&
use_softpin everywhere.
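
A sketch of the idea, assuming a simplified stand-in for
anv_physical_device (struct demo_physical_device and demo_init_caps are
hypothetical; only the gen >= 8 && use_softpin condition comes from the
text above):

```c
#include <stdbool.h>

struct demo_physical_device {
   int gen;
   bool use_softpin;
   bool has_a64_buffer_access; /* derived once, checked everywhere */
};

/* Compute the capability in one place instead of at every call site. */
static inline void
demo_init_caps(struct demo_physical_device *pdevice)
{
   pdevice->has_a64_buffer_access =
      pdevice->gen >= 8 && pdevice->use_softpin;
}
```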

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/nir: Re-run int64 lowering in postprocess_nir
Jason Ekstrand [Thu, 10 Jan 2019 22:05:06 +0000 (16:05 -0600)]
intel/nir: Re-run int64 lowering in postprocess_nir

We're about to start doing 64-bit pointer calculations in ANV.  They
will get applied after brw_preprocess_nir which is where we currently do
64-bit integer arithmetic lowering.  Because we're adding 64-bit integer
arithmetic after the initial lowering has happened, we need to lower
again.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir/lower_io: Expose some explicit I/O lowering helpers
Jason Ekstrand [Tue, 8 Jan 2019 00:00:22 +0000 (18:00 -0600)]
nir/lower_io: Expose some explicit I/O lowering helpers

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv/pipeline: Add skeleton support for spilling to bindless
Jason Ekstrand [Mon, 25 Feb 2019 19:59:07 +0000 (13:59 -0600)]
anv/pipeline: Add skeleton support for spilling to bindless

If the number of surfaces or samplers exceeds what we can put in a
table, we will want to spill out to bindless.  There is no bindless
support yet but this gets us the basic framework that will be used by
later commits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv/pipeline: Sort bindings by most used first
Jason Ekstrand [Thu, 21 Feb 2019 00:14:56 +0000 (18:14 -0600)]
anv/pipeline: Sort bindings by most used first

This commit just sorts the bindings by how often they're used vs the
array size of the binding.  This will let us make more nuanced decisions
about what goes in the binding table vs. what to make bindless.
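
One way to picture such an ordering is sketched below. The scoring is an
assumption for illustration (uses per array element, highest first), and
struct demo_binding and demo_sort_bindings are hypothetical names; the
actual heuristic in the driver may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_binding {
   uint32_t id;
   uint32_t use_count;  /* how often the shader touches the binding */
   uint32_t array_size; /* binding table slots it would occupy */
};

/* Orders by use_count / array_size, descending. Cross-multiplying
 * avoids floating point; ties fall back to the binding id. */
static int
compare_by_use_density(const void *pa, const void *pb)
{
   const struct demo_binding *a = pa;
   const struct demo_binding *b = pb;
   uint64_t lhs = (uint64_t)a->use_count * b->array_size;
   uint64_t rhs = (uint64_t)b->use_count * a->array_size;

   if (lhs != rhs)
      return lhs > rhs ? -1 : 1;

   return (a->id > b->id) - (a->id < b->id);
}

static void
demo_sort_bindings(struct demo_binding *bindings, size_t count)
{
   qsort(bindings, count, sizeof(*bindings), compare_by_use_density);
}
```

A large, rarely used array sorts last under this score, which makes it
the natural first candidate for going bindless.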

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Add a #define for the max binding table size
Jason Ekstrand [Tue, 16 Apr 2019 22:35:05 +0000 (17:35 -0500)]
anv: Add a #define for the max binding table size

This also fixes a bug where we mis-calculate maximum binding table sizes
and may return true in vkGetDescriptorSetLayoutSupport even for sets too
large to fit in a binding table.

Fixes: ddc40691221 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Put image params in the descriptor set buffer on gen8 and earlier
Jason Ekstrand [Thu, 22 Nov 2018 00:26:27 +0000 (18:26 -0600)]
anv: Put image params in the descriptor set buffer on gen8 and earlier

This is really where they belong; not push constants.  The one downside
here is that we can't push them anymore for compute shaders.  However,
that's a general problem and we should figure out how to push descriptor
sets for compute shaders.  This lets us bump MAX_IMAGES to 64 on BDW and
earlier platforms because we no longer have to worry about push constant
overhead limits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Make all VkDeviceMemory BOs resident permanently
Jason Ekstrand [Wed, 27 Feb 2019 00:05:34 +0000 (18:05 -0600)]
anv: Make all VkDeviceMemory BOs resident permanently

We spend a lot of time in the driver adding things to hash sets to track
residency.  The reality is that a properly built Vulkan app uses large
memory objects and sub-allocates from them.  In a typical frame, most,
if not all, of those allocations are going to be resident for the entire
frame so we're really not saving ourselves much by tracking fine-grained
residency.  Just throwing everything in the validation list does make it
a little bit more expensive inside the kernel to walk the list and
ensure that all our VA is in order.  However, without relocations, the
overhead of that is pretty small.

If we ever do run into a memory pressure situation where the fine-
grained residency could even potentially help, we would likely be
swapping one page out to make room for another within the draw call and
performance is totally lost at that point.  We're better off swapping
out other apps and just letting ours run a whole frame.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agofreedreno/ir3: fix const assert
Rob Clark [Fri, 19 Apr 2019 18:32:22 +0000 (11:32 -0700)]
freedreno/ir3: fix const assert

Fixes: fe8c57e859d freedreno/ir3: use nir_src_as_uint in a few places
Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agogallium/auxiliary/vl: Fix a couple of warnings
Kristian H. Kristensen [Thu, 18 Apr 2019 18:33:10 +0000 (11:33 -0700)]
gallium/auxiliary/vl: Fix a couple of warnings

Remove unused functions and mark unhandled default case with
unreachable.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoegl/dri2: Mark potentially unused 'display' variable with MAYBE_UNUSED
Kristian H. Kristensen [Thu, 18 Apr 2019 18:30:22 +0000 (11:30 -0700)]
egl/dri2: Mark potentially unused 'display' variable with MAYBE_UNUSED

Sometimes there is no X11 platform.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoralloc: Fully qualify non-virtual destructor call
Kristian H. Kristensen [Thu, 18 Apr 2019 18:28:12 +0000 (11:28 -0700)]
ralloc: Fully qualify non-virtual destructor call

This suppresses a warning about calling a non-virtual destructor in a
non-final class with virtual functions:

src/compiler/glsl/ast.h:53:4: warning: destructor called on non-final 'ast_node' that has virtual functions but non-virtual destructor [-Wdelete-non-virtual-dtor]
   DECLARE_LINEAR_ZALLOC_CXX_OPERATORS(ast_node);

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agonir_opcodes.py: Saturate to expression that doesn't overflow
Kristian H. Kristensen [Thu, 18 Apr 2019 18:23:13 +0000 (11:23 -0700)]
nir_opcodes.py: Saturate to expression that doesn't overflow

Compiler warns about overflow when assigning UINT64_MAX to something
smaller than a uint64_t:

src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion]
            uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1);
                    ~~~                          ^~~~~~~~~~

Shift UINT64_MAX down to the appropriate maximum value for the type
being assigned to.
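
The shift can be sketched in plain C as follows. The helper names are
hypothetical and this is not the generated nir_constant_expressions.c
code; it only shows how a per-width maximum falls out of UINT64_MAX.

```c
#include <stdint.h>

/* Largest unsigned value for a given width. bit_size must be in
 * 1..64; shifting a 64-bit value by 64 would be undefined. */
static inline uint64_t
uint_max_for_bits(unsigned bit_size)
{
   return UINT64_MAX >> (64 - bit_size);
}

/* Saturating unsigned add for operands that already fit in bit_size. */
static inline uint64_t
uadd_sat(uint64_t a, uint64_t b, unsigned bit_size)
{
   uint64_t max = uint_max_for_bits(bit_size);

   /* Comparing against max - a avoids computing an overflowing sum. */
   return b > max - a ? max : a + b;
}
```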

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoglsl_to_nir: Initialize debug variable
Kristian H. Kristensen [Wed, 10 Apr 2019 20:10:48 +0000 (13:10 -0700)]
glsl_to_nir: Initialize debug variable

If we want to assert on found == true when the loop exits early, we
need to initialize it to false.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agotgsi: Mark tgsi_strings_check() unused
Kristian H. Kristensen [Wed, 10 Apr 2019 20:09:01 +0000 (13:09 -0700)]
tgsi: Mark tgsi_strings_check() unused

It's there to hold the static asserts; don't warn about it being
unused.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
5 years agoanv: limit URB reconfigurations when using blorp
Lionel Landwerlin [Wed, 6 Mar 2019 11:42:14 +0000 (11:42 +0000)]
anv: limit URB reconfigurations when using blorp

If the last graphics pipeline bound to the command buffer has enough
space in its VS URB entries for Blorp then avoid reconfiguring the URB
partitions.
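
The check amounts to something like the sketch below. struct
demo_pipeline and demo_blorp_needs_urb_reconfig are hypothetical names
and the single entry-size field is a simplification of the real URB
state.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_pipeline {
   unsigned vs_urb_entry_size; /* VS URB entry size of the pipeline */
};

/* Returns true when Blorp has to repartition the URB before drawing. */
static inline bool
demo_blorp_needs_urb_reconfig(const struct demo_pipeline *last_gfx_pipeline,
                              unsigned blorp_vs_entry_size)
{
   /* No pipeline bound yet: nothing to reuse. */
   if (last_gfx_pipeline == NULL)
      return true;

   /* Reuse the current partitioning if its VS entries are big enough. */
   return last_gfx_pipeline->vs_urb_entry_size < blorp_vs_entry_size;
}
```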

v2: s/0/MESA_SHADER_VERTEX/ (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/devinfo: add basic sanity tests on device database
Lionel Landwerlin [Thu, 11 Apr 2019 11:12:38 +0000 (12:12 +0100)]
intel/devinfo: add basic sanity tests on device database

v2: #undef NDEBUG (Eric)
    Use inc_include & inc_src (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agointel/devinfo: fix missing num_thread_per_eu on ICL
Lionel Landwerlin [Thu, 11 Apr 2019 11:20:36 +0000 (12:20 +0100)]
intel/devinfo: fix missing num_thread_per_eu on ICL

There was an assumption that num_thread_per_eu would be set in the
Gen8 features. Since this is mostly the same for all gen8->11 (except
GEN9_LP that overwrites it) let's just factor it out.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agonir: Use the nir_builder _imm helpers in setting up deref offsets.
Eric Anholt [Wed, 17 Apr 2019 17:12:48 +0000 (10:12 -0700)]
nir: Use the nir_builder _imm helpers in setting up deref offsets.

When looking at the dEQP nested_struct_array_dynamic_index_fragment code
after lowering, I was horrified at the amount of adding and multiplying by
0 we were doing.  The builder _imm helpers handle that for you so that the
following optimization passes have less work to do.  Plus, it's easier to
read.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Fix deref offset calculation for structs.
Eric Anholt [Wed, 17 Apr 2019 17:09:14 +0000 (10:09 -0700)]
nir: Fix deref offset calculation for structs.

We were calculating the offset for the field within the struct, and just
dropping it on the floor.  Fixes a regression in
KHR-GLES3.shaders.struct.local.nested_struct_array_dynamic_index_fragment
and a few of its friends since the scratch lowering commit.

Fixes: e8e159e9df40 ("nir/deref: Add helpers for getting offsets")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agolima: enable nir fsign lowering in ppir
Erico Nunes [Tue, 16 Apr 2019 20:49:51 +0000 (22:49 +0200)]
lima: enable nir fsign lowering in ppir

The mali utgard pp doesn't support a sign instruction.
Use the nir lowering function for fsign to implement fsign in ppir.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agonir/algebraic: add lowering for fsign
Erico Nunes [Tue, 16 Apr 2019 20:49:41 +0000 (22:49 +0200)]
nir/algebraic: add lowering for fsign

The mali utgard pp doesn't support a sign instruction.
In the ARM offline shader compiler, the sign function is implemented
using sub(gt(0.0, a), lt(0.0, a)).
This is a generic optimization, so implement it at the NIR level when
lower_fsign is set, alongside the lowering for isign.
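
In scalar C the lowering reads roughly as below, taking the formula to
mean (a > 0) - (a < 0) with each comparison producing 1.0 or 0.0. The
function name lowered_fsign is hypothetical and this is not the NIR
algebraic rule itself.

```c
/* sign(a) built from two comparisons and one subtraction. */
static inline float
lowered_fsign(float a)
{
   float is_positive = (float)(a > 0.0f); /* 1.0 if a > 0, else 0.0 */
   float is_negative = (float)(a < 0.0f); /* 1.0 if a < 0, else 0.0 */

   return is_positive - is_negative;
}
```

Both comparisons are false for zero, so the result there is 0.0 with no
special case.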

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agodocs: s/Aptril/April/
Brian Paul [Fri, 19 Apr 2019 14:30:27 +0000 (08:30 -0600)]
docs: s/Aptril/April/

Found by Manuel Huber.  Trivial.

5 years agolima/ppir: support ppir_op_ceil
Erico Nunes [Tue, 16 Apr 2019 21:21:24 +0000 (23:21 +0200)]
lima/ppir: support ppir_op_ceil

Add a few missing ppir_op_ceil enum handling entries to implement
nir_op_fceil in lima ppir.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
5 years agoradv: Support VK_EXT_inline_uniform_block.
Bas Nieuwenhuizen [Thu, 14 Mar 2019 10:20:53 +0000 (11:20 +0100)]
radv: Support VK_EXT_inline_uniform_block.

Basically just reserve the memory in the descriptor sets.

On the shader side we construct a buffer descriptor, since
AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken.

This fully supports update after bind and variable descriptor set
sizes. However, the limits are somewhat arbitrary and are mostly
about finding a reasonable division of a 2 GiB max memory size over
the set.

v2: - rebased on top of master (Samuel)
    - remove the loading resources rework (Samuel)
    - only load UBO descriptors if it's a pointer (Samuel)
    - use LLVMBuildPtrToInt to avoid IR failures (Samuel)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)
5 years agoac/nir: use the new raw/struct SSBO atomic intrisics for comp_swap
Samuel Pitoiset [Thu, 18 Apr 2019 07:09:55 +0000 (09:09 +0200)]
ac/nir: use the new raw/struct SSBO atomic intrisics for comp_swap

This is actually fixed now.

This change requires LLVM r358579. Make sure to have it in
your tree, otherwise the following piglit will hang:

tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoac/nir: only use the new raw/struct SSBO atomic intrinsics with LLVM 9+
Samuel Pitoiset [Thu, 18 Apr 2019 07:06:49 +0000 (09:06 +0200)]
ac/nir: only use the new raw/struct SSBO atomic intrinsics with LLVM 9+

They are buggy with older LLVM versions, see r358579.

Fixes: 78c551aca1c ("ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+
Samuel Pitoiset [Thu, 18 Apr 2019 07:17:04 +0000 (09:17 +0200)]
ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+

They are buggy with LLVM 8 because they weren't marked as source
of divergence, see r358579.

Fixes: dd0172e865f ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoiris: Be less aggressive at postdraw work skipping
Kenneth Graunke [Thu, 18 Apr 2019 21:35:26 +0000 (14:35 -0700)]
iris: Be less aggressive at postdraw work skipping

We empty the cache sets when flushing the batch, at which point we need
to add any framebuffer related BOs even though the bindings haven't
changed.  So, we now do the cache set tracking unconditionally.

For now, we continue skipping resolve work based on the same conditions
in the predraw functions - the thinking is if we didn't trigger
resolves, there's nothing to update here.  Time will tell if this works.

Partly reverts commit 365886ebe1a54f893b688b457553eead6aa572ea, and
fixes Unigine Valley rendering on Gen9+.  Drops drawoverhead scores
by about 10-12%.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110353

5 years agointel/fs: Account for live range lengths in spill costs
Jason Ekstrand [Sat, 13 Apr 2019 21:01:50 +0000 (16:01 -0500)]
intel/fs: Account for live range lengths in spill costs

The current register allocator has a concept of "spill benefit" which is
based on the number of nodes with which a given node interferes.  The
idea is that you want to spill stuff with high interference because
those are the most likely registers to help when spilling.  However,
this fails to take into account the length of the live range so the
allocator frequently picks "cheap" (not many uses) registers which are
actually very short lived and so spilling them doesn't help with the
pressure situation.

This commit takes into account the length of the live range to make
long-lived registers more likely to get spilled than short-lived ones.
This encourages the spill chooser to choose slightly larger registers
which will affect a larger area of the program and hopefully we have to
spill fewer of them to get the same reduction in over-all register
pressure.
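
The effect of the change can be sketched as below. The exact scaling
used by the allocator is not quoted in this message, so dividing the
cost by the live range length is an assumption for illustration, and
struct demo_vgrf and demo_choose_spill_reg are hypothetical names.

```c
#include <stddef.h>

struct demo_vgrf {
   float spill_cost; /* cost estimate, e.g. weighted number of uses */
   int live_start;   /* first instruction where the value is live */
   int live_end;     /* last instruction where the value is live */
};

/* Returns the index of the cheapest spill candidate after scaling the
 * cost by live range length, or -1 if there is no candidate. */
static int
demo_choose_spill_reg(const struct demo_vgrf *regs, size_t count)
{
   int best = -1;
   float best_cost = 0.0f;

   for (size_t i = 0; i < count; i++) {
      int live_length = regs[i].live_end - regs[i].live_start;

      /* Spilling a value with no live range cannot reduce pressure. */
      if (live_length <= 0)
         continue;

      /* Assumed scaling: longer live ranges look cheaper to spill. */
      float adjusted = regs[i].spill_cost / (float)live_length;
      if (best < 0 || adjusted < best_cost) {
         best = (int)i;
         best_cost = adjusted;
      }
   }

   return best;
}
```

Under this scaling a register with cost 6 that lives for 12 instructions
is preferred over one with cost 4 that lives for 2, which is the
behaviour described above.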

Shader-db results on Kaby Lake:

    total spills in shared programs: 23664 -> 12050 (-49.08%)
    spills in affected programs: 19243 -> 7629 (-60.35%)
    helped: 296
    HURT: 8

    total fills in shared programs: 32028 -> 25139 (-21.51%)
    fills in affected programs: 20378 -> 13489 (-33.81%)
    helped: 295
    HURT: 16

Of course, most of that is in Deus Ex...

Shader-db results on Kaby Lake (without Deus Ex):

    total spills in shared programs: 6479 -> 5834 (-9.96%)
    spills in affected programs: 3231 -> 2586 (-19.96%)
    helped: 40
    HURT: 4

    total fills in shared programs: 17165 -> 17099 (-0.38%)
    fills in affected programs: 6951 -> 6885 (-0.95%)
    helped: 40
    HURT: 7

Even without Deus Ex, the spill help is pretty respectable.  The worst
hurt shaders were one compute shader in Aztec Ruins and one fragment
shader in KSP that were each hurt by around 13% fill and 9% spill.

VkPipeline-db results on Kaby Lake:

    total spills in shared programs: 9149 -> 8069 (-11.80%)
    spills in affected programs: 5197 -> 4117 (-20.78%)
    helped: 27
    HURT: 16

    total fills in shared programs: 26390 -> 25477 (-3.46%)
    fills in affected programs: 12662 -> 11749 (-7.21%)
    helped: 24
    HURT: 22

The Vulkan results were decidedly more mixed but we don't have nearly as
many apps in that database yet.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agovirgl/vtest: bump up protocol version + support encoded transfers
Gurchetan Singh [Sat, 15 Dec 2018 00:07:19 +0000 (16:07 -0800)]
virgl/vtest: bump up protocol version + support encoded transfers

This more accurately reflects what the drm winsys does.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: wait after issuing a transfer get
Gurchetan Singh [Sat, 15 Dec 2018 00:36:07 +0000 (16:36 -0800)]
virgl/vtest: wait after issuing a transfer get

Otherwise, there are artifacts when running Unigine Valley with
protocol version 2.

We can get away with not waiting for most buffers, but let's
be conservative.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: modify sending and receiving data for shared memory
Gurchetan Singh [Thu, 13 Dec 2018 02:01:06 +0000 (18:01 -0800)]
virgl/vtest: modify sending and receiving data for shared memory

We need to copy the shared memory region to the display target.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: receive and handle shared memory fd
Gurchetan Singh [Wed, 12 Dec 2018 17:49:35 +0000 (09:49 -0800)]
virgl/vtest: receive and handle shared memory fd

The only tricky part is with protocol 0 we can either have
a display target or resource backing store.  With protocol
2 we can have both.  Make the map/unmap functions only deal
with the resource backing store.

v2: Handle MSAA texture case.
v3: spelling
v4: Fix dangling else (@prak)
v5: mmap --> os_mmap (@prak) + added comments (@gerddie)

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: plumb support for shared memory
Gurchetan Singh [Wed, 12 Dec 2018 18:08:06 +0000 (10:08 -0800)]
virgl/vtest: plumb support for shared memory

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: add utilities for receiving fds
Gurchetan Singh [Wed, 12 Dec 2018 01:01:34 +0000 (17:01 -0800)]
virgl/vtest: add utilities for receiving fds

v2: recieve --> receive (airlied@)

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: execute a transfer_get when flushing the front buffer
Gurchetan Singh [Wed, 12 Dec 2018 23:43:43 +0000 (15:43 -0800)]
virgl/vtest: execute a transfer_get when flushing the front buffer

This just moves everything to a helper function -- "flush_front_buffer"
will be used later.

virgl_vtest_resource_map / virgl_vtest_resource_unmap already take
care to map the display target.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl: wait after a flush
Gurchetan Singh [Tue, 16 Apr 2019 03:36:54 +0000 (20:36 -0700)]
virgl: wait after a flush

We really need to wait under certain circumstances, or we can end
up writing to memory at the same time the host is reading.

Partial revert of d6dc68 ("virgl: use uint16_t mask instead of separate booleans").

Test cases:
   - dEQP-GLES31.functional.texture.texture_buffer.render_modify.as_vertex_array.bufferdata
     on vtest protocol version 2
   - Flickering during Alien Isolation
Fixes: d6dc68 ("virgl: use uint16_t mask instead of separate booleans")
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agoanv: fix uninitialized pthread cond clock domain
Lionel Landwerlin [Thu, 18 Apr 2019 16:39:36 +0000 (17:39 +0100)]
anv: fix uninitialized pthread cond clock domain

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 843775bab78a6b ("anv: Rework fences")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago.gitignore: Remove autotool artifacts
Kristian H. Kristensen [Thu, 18 Apr 2019 17:31:31 +0000 (10:31 -0700)]
.gitignore: Remove autotool artifacts

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
5 years agov3d: Fix atomic cmpxchg in shaders on hardware.
Eric Anholt [Wed, 17 Apr 2019 21:44:44 +0000 (14:44 -0700)]
v3d: Fix atomic cmpxchg in shaders on hardware.

In what might be my first case of finding a divergence between hardware
and simpenrose for v3d 4.x, it seems that despite what the spec claims,
you actually need specific values in the TYPE field for atomic ops.

Fixes dEQP-GLES31.functional.*.compswap.*

5 years agov3d: Fix an invalid reuse of flags generation from before a thrsw.
Eric Anholt [Wed, 17 Apr 2019 21:07:20 +0000 (14:07 -0700)]
v3d: Fix an invalid reuse of flags generation from before a thrsw.

Noticed while debugging the last GLES 3.1 failure, though it doesn't seem
to affect that bug.

5 years agoanv: Drop some unneeded ANV_FROM_HANDLE for physical devices
Jason Ekstrand [Thu, 18 Apr 2019 20:04:42 +0000 (15:04 -0500)]
anv: Drop some unneeded ANV_FROM_HANDLE for physical devices

Ever since 48ed2a7bb009618ed, we've had one at the top of the function.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Re-sort the GetPhysicalDeviceFeatures2 switch statement
Jason Ekstrand [Thu, 18 Apr 2019 19:19:29 +0000 (14:19 -0500)]
anv: Re-sort the GetPhysicalDeviceFeatures2 switch statement

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoradeonsi/gfx9: use the correct condition for the DPBB + QUANT_MODE workaround
Marek Olšák [Wed, 17 Apr 2019 15:17:18 +0000 (11:17 -0400)]
radeonsi/gfx9: use the correct condition for the DPBB + QUANT_MODE workaround

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agonir/algebraic: Strength reduce some compares of x and -x
Ian Romanick [Tue, 18 Dec 2018 06:29:26 +0000 (22:29 -0800)]
nir/algebraic: Strength reduce some compares of x and -x

Converting the x vs -x comparison to an x vs 0 comparison enables cmod
propagation to help.
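
A minimal sketch of why the rewrite is safe for floats, assuming IEEE 754
comparison semantics. The function names are hypothetical and are not part
of NIR; each open-coded compare of x against -x is shown next to the compare
against zero that it reduces to.

```c
#include <stdbool.h>

/* Open-coded forms: compare x against its own negation. */
static bool lt_neg(float x) { return x < -x; }
static bool ge_neg(float x) { return x >= -x; }

/* Strength-reduced forms: compare x against zero instead. */
static bool lt_zero(float x) { return x < 0.0f; }
static bool ge_zero(float x) { return x >= 0.0f; }
```

Both forms agree for negative, positive, and zero inputs (including -0.0),
and both are false for NaN. The zero form drops the negate and leaves a
compare against a constant.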

This seems to be a win everywhere except Gen7.

Skylake and Broadwell had similar results. (Broadwell shown)
total instructions in shared programs: 15566733 -> 15566014 (<.01%)
instructions in affected programs: 72617 -> 71898 (-0.99%)
helped: 302
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.38 x̃: 2
helped stats (rel) min: 0.15% max: 7.69% x̄: 1.28% x̃: 0.98%
95% mean confidence interval for instructions value: -2.55 -2.21
95% mean confidence interval for instructions %-change: -1.40% -1.16%
Instructions are helped.

total cycles in shared programs: 413014786 -> 413015475 (<.01%)
cycles in affected programs: 707594 -> 708283 (0.10%)
helped: 227
HURT: 101
helped stats (abs) min: 1 max: 612 x̄: 36.07 x̃: 20
helped stats (rel) min: 0.04% max: 19.39% x̄: 2.25% x̃: 1.49%
HURT stats (abs)   min: 2 max: 334 x̄: 87.90 x̃: 45
HURT stats (rel)   min: 0.07% max: 14.51% x̄: 4.54% x̃: 3.36%
95% mean confidence interval for cycles value: -8.12 12.32
95% mean confidence interval for cycles %-change: -0.67% 0.34%
Inconclusive result (value mean confidence interval includes 0).

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 13828220 -> 13827881 (<.01%)
instructions in affected programs: 60887 -> 60548 (-0.56%)
helped: 253
HURT: 6
helped stats (abs) min: 1 max: 5 x̄: 1.36 x̃: 1
helped stats (rel) min: 0.16% max: 3.85% x̄: 0.81% x̃: 0.64%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.26% max: 0.89% x̄: 0.47% x̃: 0.27%
95% mean confidence interval for instructions value: -1.39 -1.23
95% mean confidence interval for instructions %-change: -0.85% -0.70%
Instructions are helped.

total cycles in shared programs: 386870095 -> 386894412 (<.01%)
cycles in affected programs: 1537307 -> 1561624 (1.58%)
helped: 127
HURT: 188
helped stats (abs) min: 1 max: 381 x̄: 17.89 x̃: 4
helped stats (rel) min: 0.02% max: 14.33% x̄: 1.00% x̃: 0.33%
HURT stats (abs)   min: 2 max: 5585 x̄: 141.43 x̃: 14
HURT stats (rel)   min: 0.03% max: 11.50% x̄: 1.65% x̃: 1.06%
95% mean confidence interval for cycles value: 21.95 132.45
95% mean confidence interval for cycles %-change: 0.32% 0.85%
Cycles are HURT.

Sandy Bridge
total instructions in shared programs: 10896339 -> 10896276 (<.01%)
instructions in affected programs: 10757 -> 10694 (-0.59%)
helped: 49
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.29 x̃: 1
helped stats (rel) min: 0.12% max: 1.85% x̄: 0.87% x̃: 0.89%
95% mean confidence interval for instructions value: -1.42 -1.15
95% mean confidence interval for instructions %-change: -1.03% -0.72%
Instructions are helped.

total cycles in shared programs: 155091003 -> 155090480 (<.01%)
cycles in affected programs: 102761 -> 102238 (-0.51%)
helped: 51
HURT: 0
helped stats (abs) min: 1 max: 36 x̄: 10.25 x̃: 4
helped stats (rel) min: 0.02% max: 2.57% x̄: 0.76% x̃: 0.36%
95% mean confidence interval for cycles value: -12.98 -7.53
95% mean confidence interval for cycles %-change: -0.97% -0.56%
Cycles are helped.

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8234667 -> 8234652 (<.01%)
instructions in affected programs: 2063 -> 2048 (-0.73%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.30% max: 1.56% x̄: 0.82% x̃: 0.81%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.97% -0.67%
Instructions are helped.

total cycles in shared programs: 188700906 -> 188700598 (<.01%)
cycles in affected programs: 283480 -> 283172 (-0.11%)
helped: 83
HURT: 3
helped stats (abs) min: 2 max: 8 x̄: 3.78 x̃: 4
helped stats (rel) min: 0.04% max: 0.55% x̄: 0.15% x̃: 0.12%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.04%
95% mean confidence interval for cycles value: -3.87 -3.29
95% mean confidence interval for cycles %-change: -0.16% -0.12%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir/algebraic: Fix some 1-bit Boolean weirdness
Ian Romanick [Tue, 18 Dec 2018 05:34:11 +0000 (21:34 -0800)]
nir/algebraic: Fix some 1-bit Boolean weirdness

Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total cycles in shared programs: 372594532 -> 372594460 (<.01%)
cycles in affected programs: 46854 -> 46782 (-0.15%)
helped: 9
HURT: 0
helped stats (abs) min: 2 max: 22 x̄: 8.00 x̃: 2
helped stats (rel) min: 0.02% max: 0.41% x̄: 0.16% x̃: 0.09%
95% mean confidence interval for cycles value: -14.34 -1.66
95% mean confidence interval for cycles %-change: -0.28% -0.04%
Cycles are helped.

Ivy Bridge
total instructions in shared programs: 12038379 -> 12038373 (<.01%)
instructions in affected programs: 1278 -> 1272 (-0.47%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.31% max: 0.77% x̄: 0.54% x̃: 0.55%

total cycles in shared programs: 180889027 -> 180888997 (<.01%)
cycles in affected programs: 29979 -> 29949 (-0.10%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 16 x̄: 6.00 x̃: 5
helped stats (rel) min: 0.02% max: 0.34% x̄: 0.11% x̃: 0.07%
95% mean confidence interval for cycles value: -13.40 1.40
95% mean confidence interval for cycles %-change: -0.27% 0.05%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge
total cycles in shared programs: 155091021 -> 155091003 (<.01%)
cycles in affected programs: 8842 -> 8824 (-0.20%)
helped: 2
HURT: 0

No changes on Iron Lake or GM45.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel
Ian Romanick [Thu, 6 Sep 2018 03:45:19 +0000 (20:45 -0700)]
nir/algebraic: Replace a pattern where iand with a Boolean is used as a bcsel

All of the affected shaders are in Mad Max.  I noticed this while
looking at some other things.  I tried a couple similar patterns, but
the effect on cycles was generally negative.  It may be worth revisiting
this later.
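
A rough sketch of the kind of equivalence the title refers to, assuming the
Boolean has been widened to a 32-bit mask that is either all ones or all
zeros. The helper names are hypothetical; the exact NIR pattern is in the
patch itself and may differ from this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Widen a Boolean to 0x00000000 or 0xffffffff (assumed representation). */
static uint32_t bool_to_mask(bool b) { return b ? UINT32_MAX : 0u; }

/* Select written as a bitwise AND with the mask. */
static uint32_t select_with_iand(bool b, uint32_t value)
{
    return bool_to_mask(b) & value;
}

/* The same select written directly, as bcsel(b, value, 0) would be. */
static uint32_t select_with_bcsel(bool b, uint32_t value)
{
    return b ? value : 0u;
}
```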

v2: Rebase on 1-bit Boolean changes.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282073 -> 15282053 (<.01%)
instructions in affected programs: 1192 -> 1172 (-1.68%)
helped: 14
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.43 x̃: 1
helped stats (rel) min: 1.16% max: 2.17% x̄: 1.65% x̃: 1.39%
95% mean confidence interval for instructions value: -1.73 -1.13
95% mean confidence interval for instructions %-change: -1.91% -1.38%
Instructions are helped.

total cycles in shared programs: 372595954 -> 372594532 (<.01%)
cycles in affected programs: 11477 -> 10055 (-12.39%)
helped: 14
HURT: 0
helped stats (abs) min: 76 max: 122 x̄: 101.57 x̃: 104
helped stats (rel) min: 7.76% max: 15.62% x̄: 12.94% x̃: 14.78%
95% mean confidence interval for cycles value: -111.05 -92.09
95% mean confidence interval for cycles %-change: -14.90% -10.98%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agonir/algebraic: Recognize open-coded copysign(1.0, a)
Ian Romanick [Thu, 22 Feb 2018 02:30:20 +0000 (18:30 -0800)]
nir/algebraic: Recognize open-coded copysign(1.0, a)

All of the affected shaders are in Mad Max.  The inner part of the
pattern is itself an open-coded sign(a).  I tried using that as a
pattern, but the results were not good.  A bunch of shaders were helped
for instructions, but overall cycles, spills, and fills were hurt.

v2: Rebase on 1-bit Boolean changes.

v3: Fix order of copysign() parameters in comments and commit message.
Noticed by Matt.

All Gen7+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15282141 -> 15282073 (<.01%)
instructions in affected programs: 6106 -> 6038 (-1.11%)
helped: 17
HURT: 0
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 1.02% max: 2.20% x̄: 1.15% x̃: 1.06%
95% mean confidence interval for instructions value: -4.00 -4.00
95% mean confidence interval for instructions %-change: -1.30% -1.00%
Instructions are helped.

total cycles in shared programs: 372597886 -> 372595954 (<.01%)
cycles in affected programs: 32701 -> 30769 (-5.91%)
helped: 17
HURT: 0
helped stats (abs) min: 6 max: 216 x̄: 113.65 x̃: 118
helped stats (rel) min: 0.40% max: 21.86% x̄: 6.20% x̃: 5.83%
95% mean confidence interval for cycles value: -152.84 -74.45
95% mean confidence interval for cycles %-change: -8.89% -3.51%
Cycles are helped.

No changes on any Gen6 or earlier platforms.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/fs: Generate better code for fsign multiplied by a value
Ian Romanick [Tue, 26 Jun 2018 02:55:31 +0000 (19:55 -0700)]
intel/fs: Generate better code for fsign multiplied by a value

v2: Rebase on v2 changes in previous two commits.

v3: Rebase on 85c35885b38 ("nir: Rework nir_src_as_alu_instr to not take
a pointer").

shader-db results:

Skylake and Broadwell had similar results. (Skylake shown)
total instructions in shared programs: 15297100 -> 15282141 (-0.10%)
instructions in affected programs: 956685 -> 941726 (-1.56%)
helped: 4527
HURT: 0
helped stats (abs) min: 1 max: 221 x̄: 3.30 x̃: 2
helped stats (rel) min: 0.07% max: 10.53% x̄: 1.85% x̃: 1.37%
95% mean confidence interval for instructions value: -3.48 -3.12
95% mean confidence interval for instructions %-change: -1.88% -1.81%
Instructions are helped.

total cycles in shared programs: 372809551 -> 372597886 (-0.06%)
cycles in affected programs: 13645512 -> 13433847 (-1.55%)
helped: 4362
HURT: 125
helped stats (abs) min: 1 max: 2088 x̄: 50.73 x̃: 28
helped stats (rel) min: 0.01% max: 28.20% x̄: 2.77% x̃: 2.39%
HURT stats (abs)   min: 1 max: 1836 x̄: 76.90 x̃: 28
HURT stats (rel)   min: <.01% max: 34.36% x̄: 3.03% x̃: 1.42%
95% mean confidence interval for cycles value: -50.98 -43.37
95% mean confidence interval for cycles %-change: -2.67% -2.55%
Cycles are helped.

total spills in shared programs: 23465 -> 23463 (<.01%)
spills in affected programs: 42 -> 40 (-4.76%)
helped: 1
HURT: 0

total fills in shared programs: 31766 -> 31763 (<.01%)
fills in affected programs: 69 -> 66 (-4.35%)
helped: 1
HURT: 0

Haswell
total instructions in shared programs: 13839992 -> 13828311 (-0.08%)
instructions in affected programs: 712503 -> 700822 (-1.64%)
helped: 3477
HURT: 0
helped stats (abs) min: 1 max: 221 x̄: 3.36 x̃: 2
helped stats (rel) min: 0.07% max: 10.64% x̄: 1.96% x̃: 1.52%
95% mean confidence interval for instructions value: -3.58 -3.14
95% mean confidence interval for instructions %-change: -2.01% -1.92%
Instructions are helped.

total cycles in shared programs: 387026330 -> 386872483 (-0.04%)
cycles in affected programs: 11329966 -> 11176119 (-1.36%)
helped: 3307
HURT: 139
helped stats (abs) min: 2 max: 1776 x̄: 49.58 x̃: 18
helped stats (rel) min: 0.01% max: 20.38% x̄: 2.27% x̃: 1.79%
HURT stats (abs)   min: 1 max: 2314 x̄: 72.68 x̃: 20
HURT stats (rel)   min: <.01% max: 33.99% x̄: 2.28% x̃: 0.96%
95% mean confidence interval for cycles value: -49.31 -39.98
95% mean confidence interval for cycles %-change: -2.15% -2.01%
Cycles are helped.

LOST:   1
GAINED: 0

Ivy Bridge
total instructions in shared programs: 12045602 -> 12038463 (-0.06%)
instructions in affected programs: 623837 -> 616698 (-1.14%)
helped: 2498
HURT: 0
helped stats (abs) min: 1 max: 39 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.05% max: 10.00% x̄: 1.30% x̃: 1.05%
95% mean confidence interval for instructions value: -2.96 -2.75
95% mean confidence interval for instructions %-change: -1.34% -1.26%
Instructions are helped.

total cycles in shared programs: 181025675 -> 180891323 (-0.07%)
cycles in affected programs: 11329329 -> 11194977 (-1.19%)
helped: 2439
HURT: 47
helped stats (abs) min: 1 max: 1565 x̄: 57.06 x̃: 26
helped stats (rel) min: 0.02% max: 24.56% x̄: 2.02% x̃: 1.64%
HURT stats (abs)   min: 1 max: 1269 x̄: 102.51 x̃: 43
HURT stats (rel)   min: 0.11% max: 52.94% x̄: 4.15% x̃: 1.34%
95% mean confidence interval for cycles value: -59.91 -48.17
95% mean confidence interval for cycles %-change: -1.99% -1.82%
Cycles are helped.

Sandy Bridge, Iron Lake, and GM45 had similar results. (Sandy Bridge shown)
total instructions in shared programs: 10896368 -> 10896339 (<.01%)
instructions in affected programs: 3767 -> 3738 (-0.77%)
helped: 17
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.71 x̃: 1
helped stats (rel) min: 0.13% max: 9.52% x̄: 3.58% x̃: 2.73%
95% mean confidence interval for instructions value: -2.27 -1.14
95% mean confidence interval for instructions %-change: -5.14% -2.03%
Instructions are helped.

total cycles in shared programs: 155091109 -> 155091021 (<.01%)
cycles in affected programs: 47241 -> 47153 (-0.19%)
helped: 15
HURT: 8
helped stats (abs) min: 2 max: 81 x̄: 15.73 x̃: 4
helped stats (rel) min: 0.03% max: 10.59% x̄: 1.55% x̃: 0.71%
HURT stats (abs)   min: 14 max: 32 x̄: 18.50 x̃: 17
HURT stats (rel)   min: 0.32% max: 2.79% x̄: 2.43% x̃: 2.71%
95% mean confidence interval for cycles value: -14.59 6.93
95% mean confidence interval for cycles %-change: -1.41% 1.08%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
5 years agointel/fs: Add a scale factor to emit_fsign
Ian Romanick [Tue, 26 Jun 2018 02:53:38 +0000 (19:53 -0700)]
intel/fs: Add a scale factor to emit_fsign

Normally fsign generates -1, 0, or +1.  The new scale factor, S, causes
fsign to generate -S, 0, or +S.
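
A minimal scalar sketch of the semantics described above. The function name
is hypothetical, and the NaN behaviour (mapped to 0 here) is an assumption
of this sketch, not something the commit message specifies.

```c
/* Returns -scale, 0, or +scale depending on the sign of x. */
static float scaled_fsign(float x, float scale)
{
    if (x > 0.0f)
        return scale;
    if (x < 0.0f)
        return -scale;
    return 0.0f; /* zero (and, in this sketch, NaN) maps to 0 */
}
```

Passing 1.0f as the scale gives the ordinary fsign result.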

v2: Rebase on v2 changes in previous commit.

v3: Rebase on 85c35885b38 ("nir: Rework nir_src_as_alu_instr to not take
a pointer").

Reviewed-by: Matt Turner <mattst88@gmail.com> [v2]
5 years agointel/fs: Refactor code generation for nir_op_fsign to its own function
Ian Romanick [Tue, 26 Jun 2018 02:50:56 +0000 (19:50 -0700)]
intel/fs: Refactor code generation for nir_op_fsign to its own function

v2: Call emit_fsign from inside the existing switch statement.
Suggested by Matt.

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/fs: Eliminate dead code first
Ian Romanick [Sun, 9 Sep 2018 18:37:24 +0000 (11:37 -0700)]
intel/fs: Eliminate dead code first

This simplifies the later patch "i965/fs: Generate better code for fsign
multiplied by a value".

shader-db results:

Broadwell and Skylake had similar results. (Skylake shown)
total cycles in shared programs: 372808735 -> 372809551 (<.01%)
cycles in affected programs: 1519520 -> 1520336 (0.05%)
helped: 243
HURT: 277
helped stats (abs) min: 1 max: 226 x̄: 34.05 x̃: 5
helped stats (rel) min: 0.01% max: 13.88% x̄: 1.46% x̃: 0.27%
HURT stats (abs)   min: 1 max: 1810 x̄: 32.82 x̃: 5
HURT stats (rel)   min: 0.01% max: 16.03% x̄: 1.56% x̃: 0.29%
95% mean confidence interval for cycles value: -7.18 10.32
95% mean confidence interval for cycles %-change: -0.17% 0.46%
Inconclusive result (value mean confidence interval includes 0).

Sandy Bridge, Haswell and Ivy Bridge had similar results. (Sandy Bridge shown)
total cycles in shared programs: 155091458 -> 155091109 (<.01%)
cycles in affected programs: 370797 -> 370448 (-0.09%)
helped: 24
HURT: 36
helped stats (abs) min: 1 max: 331 x̄: 103.17 x̃: 41
helped stats (rel) min: 0.02% max: 7.70% x̄: 2.07% x̃: 0.56%
HURT stats (abs)   min: 1 max: 291 x̄: 59.08 x̃: 10
HURT stats (rel)   min: 0.02% max: 5.29% x̄: 1.02% x̃: 0.15%
95% mean confidence interval for cycles value: -37.92 26.28
95% mean confidence interval for cycles %-change: -0.88% 0.45%
Inconclusive result (value mean confidence interval includes 0).

Iron Lake and GM45 had similar results. (GM45 shown)
total cycles in shared programs: 129133970 -> 129133978 (<.01%)
cycles in affected programs: 111966 -> 111974 (<.01%)
helped: 3
HURT: 1
helped stats (abs) min: 2 max: 4 x̄: 2.67 x̃: 2
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs)   min: 16 max: 16 x̄: 16.00 x̃: 16
HURT stats (rel)   min: 0.07% max: 0.07% x̄: 0.07% x̃: 0.07%
95% mean confidence interval for cycles value: -12.93 16.93
95% mean confidence interval for cycles %-change: -0.05% 0.08%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agofreedreno: Fix format string warning
Kristian H. Kristensen [Thu, 18 Apr 2019 17:44:02 +0000 (10:44 -0700)]
freedreno: Fix format string warning

Modifiers are uint64_t.
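
A small sketch of the usual portable way to print a 64-bit value, assuming
the warning came from a mismatched conversion specifier. The helper name is
hypothetical; PRIx64 is the standard macro from <inttypes.h>.

```c
#include <inttypes.h>
#include <stdio.h>

/* Writes the modifier as hex into buf; returns what snprintf returns. */
static int format_modifier(char *buf, size_t len, uint64_t modifier)
{
    /* PRIx64 expands to the correct length modifier for uint64_t. */
    return snprintf(buf, len, "0x%016" PRIx64, modifier);
}
```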

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/a6xx: Add helper for incrementing regid
Kristian H. Kristensen [Thu, 18 Apr 2019 17:40:45 +0000 (10:40 -0700)]
freedreno/a6xx: Add helper for incrementing regid

Increments the regid by the specified amount unless regid is
r63.x (invalid).
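
A minimal sketch of such a helper. The encoding used here (register number
in the upper bits, component in the low two bits) and both function names
are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed encoding: register number shifted left by 2, component in bits 0-1. */
static uint32_t make_regid(uint32_t num, uint32_t comp)
{
    return (num << 2) | (comp & 0x3u);
}

/* Advance a regid, but leave the invalid marker r63.x untouched. */
static uint32_t next_regid(uint32_t reg, uint32_t increment)
{
    const uint32_t invalid = make_regid(63, 0); /* r63.x */
    return reg == invalid ? invalid : reg + increment;
}
```

Keeping the check inside the helper means callers can advance a possibly
unused register without testing for validity first.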

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: Use enum values from matching enum
Kristian H. Kristensen [Thu, 18 Apr 2019 17:38:56 +0000 (10:38 -0700)]
freedreno: Use enum values from matching enum

We get a couple of warnings from using mismatched enum values. This
fixes that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/a2xx: Fix redundant if statement
Kristian H. Kristensen [Wed, 10 Apr 2019 20:08:00 +0000 (13:08 -0700)]
freedreno/a2xx: Fix redundant if statement

We test the condition, declare a few variables, then test the exact
same condition again. Let's not do that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
5 years agofreedreno/ir3: Mark ir3_context_error() as NORETURN
Kristian H. Kristensen [Wed, 10 Apr 2019 20:06:39 +0000 (13:06 -0700)]
freedreno/ir3: Mark ir3_context_error() as NORETURN

Fixes a few warnings.
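
A generic sketch of how a no-return annotation silences this class of
warning, assuming a GCC or Clang style attribute. SKETCH_NORETURN,
sketch_fatal and component_count are hypothetical names, not Mesa's.

```c
#include <stdio.h>
#include <stdlib.h>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NORETURN __attribute__((noreturn))
#else
#define SKETCH_NORETURN
#endif

/* Declared as never returning, so callers need no dummy return after it. */
static void sketch_fatal(const char *msg) SKETCH_NORETURN;

static void sketch_fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}

static int component_count(int kind)
{
    switch (kind) {
    case 0: return 1;
    case 1: return 4;
    }
    /* Without the annotation the compiler may warn that control
     * reaches the end of a non-void function here. */
    sketch_fatal("unknown kind");
}
```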

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
5 years agonir: Add a nir_src_as_intrinsic() helper
Jason Ekstrand [Wed, 17 Apr 2019 22:18:19 +0000 (17:18 -0500)]
nir: Add a nir_src_as_intrinsic() helper

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: Rework nir_src_as_alu_instr to not take a pointer
Jason Ekstrand [Wed, 17 Apr 2019 22:10:18 +0000 (17:10 -0500)]
nir: Rework nir_src_as_alu_instr to not take a pointer

Other nir_src_as_* functions just take a nir_src.  It's not that much
more memory copying and the constness preserving really isn't worth the
cognitive dissonance.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: Drop "struct" from some nir_* declarations
Jason Ekstrand [Wed, 17 Apr 2019 22:01:14 +0000 (17:01 -0500)]
nir: Drop "struct" from some nir_* declarations

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: implement WaEnableStateCacheRedirectToCS
Lionel Landwerlin [Thu, 18 Apr 2019 11:00:19 +0000 (12:00 +0100)]
anv: implement WaEnableStateCacheRedirectToCS

This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.

[1] : https://patchwork.freedesktop.org/series/59494/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agoi965: implement WaEnableStateCacheRedirectToCS
Lionel Landwerlin [Thu, 18 Apr 2019 11:00:08 +0000 (12:00 +0100)]
i965: implement WaEnableStateCacheRedirectToCS

This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.

[1] : https://patchwork.freedesktop.org/series/59494/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agoiris: implement WaEnableStateCacheRedirectToCS
Lionel Landwerlin [Thu, 18 Apr 2019 10:57:57 +0000 (11:57 +0100)]
iris: implement WaEnableStateCacheRedirectToCS

This 3d performance workaround was initially put in the kernel but the
media driver requires different settings so the register has been
whitelisted in i915 [1] and userspace drivers are left initializing it as
they wish.

[1] : https://patchwork.freedesktop.org/series/59494/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agoanv/device: expose VK_KHR_shader_float16_int8 in gen8+
Iago Toral Quiroga [Fri, 22 Jun 2018 09:41:28 +0000 (11:41 +0200)]
anv/device: expose VK_KHR_shader_float16_int8 in gen8+

v2 (Jason):
 - Merge shaderFloat16 and shaderInt8 enablement into a single patch.
 - Merge extension enable.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
5 years agoanv/pipeline: support Float16 and Int8 SPIR-V capabilities in gen8+
Iago Toral Quiroga [Tue, 22 Jan 2019 10:26:03 +0000 (11:26 +0100)]
anv/pipeline: support Float16 and Int8 SPIR-V capabilities in gen8+

v2:
  - Merge Float16 and Int8 capabilities into a single patch (Jason)
  - Merged patch that enabled SPIR-V front-end checks for these caps
    (except for Int8, which was already merged)

v3:
 - Keep capabilities sorted (Jason)

v4:
- SpvCapabilityFloat16 support already added in master (Juan)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
5 years agocompiler/spirv: move the check for Int8 capability
Iago Toral Quiroga [Tue, 22 Jan 2019 10:27:09 +0000 (11:27 +0100)]
compiler/spirv: move the check for Int8 capability

So it is right after the checks for the other various Int* capabilities.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/compiler: validate region restrictions for mixed float mode
Iago Toral Quiroga [Wed, 6 Feb 2019 08:13:22 +0000 (09:13 +0100)]
intel/compiler: validate region restrictions for mixed float mode

v2:
 - Adapted unit tests to make them consistent with the changes done
   to the validation of half-float conversions.

v3 (Curro):
- Check all the accumulators
- Constify declarations
- Do not check src1 type in single-source instructions.
- Check for all instructions that read accumulator (either implicitly or
  explicitly)
- Check restrictions in src1 too.
- Merge conditional block
- Add invalid test case.

v4 (Curro):
- Assert on 3-src instructions, as they are not validated.
- Get rid of types_are_mixed_float(), as we know instruction is mixed
  float at that point.
- Remove conditions from not verified case.
- Fix brackets on conditional.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
5 years agointel/compiler: validate conversions between 64-bit and 8-bit types
Iago Toral Quiroga [Fri, 8 Feb 2019 08:20:56 +0000 (09:20 +0100)]
intel/compiler: validate conversions between 64-bit and 8-bit types

v2:
 - Add some tests with UB type too (Jason)

v3:
 - consider implicit conversions from 2src instructions too (Curro).

v4:
 - Do not check src1 type in single-source instructions (Curro).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
5 years agointel/compiler: validate region restrictions for half-float conversions
Iago Toral Quiroga [Fri, 1 Feb 2019 10:41:33 +0000 (11:41 +0100)]
intel/compiler: validate region restrictions for half-float conversions

v2:
 - Consider implicit conversions in 2-src instructions too (Curro)
 - For restrictions that involve destination stride requirements
   only validate them for Align1, since Align16 always requires
   packed data.
 - Skip general rule for the dst/execution type size ratio for
   mixed float instructions on CHV and SKL+, these have their own
   set of rules that will be validated separately.

v3 (Curro):
 - Do not check src1 type in single-source instructions.
 - Check restriction on src1.
 - Remove invalid test.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
5 years agointel/compiler: also set F execution type for mixed float mode in BDW
Iago Toral Quiroga [Tue, 5 Feb 2019 12:50:09 +0000 (13:50 +0100)]
intel/compiler: also set F execution type for mixed float mode in BDW

The section 'Execution Data Types' of 3D Media GPGPU volume, which
describes execution types, is exactly the same in BDW and SKL+.

Also, this section states that there is a single execution type, so it
makes sense that this is the wider of the two floating point types
involved in mixed float mode, which is what we do for SKL+ and CHV.
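
A simplified sketch of "the wider of the two floating point types", with the
destination included as the v2 note below describes. The enum and function
are hypothetical stand-ins, not the compiler's real types.

```c
/* Hypothetical float types; the enumerator value is the size in bytes. */
enum sketch_float_type {
    SKETCH_TYPE_HF = 2,
    SKETCH_TYPE_F = 4
};

/* The single execution type is the widest type among sources and destination. */
static enum sketch_float_type
mixed_float_exec_type(enum sketch_float_type src0,
                      enum sketch_float_type src1,
                      enum sketch_float_type dst)
{
    enum sketch_float_type widest = src0;
    if (src1 > widest)
        widest = src1;
    if (dst > widest)
        widest = dst;
    return widest;
}
```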

v2:
 - Make sure we also account for the destination type in mixed mode (Curro).

Acked-by: Francisco Jerez <currojerez@riseup.net>
5 years agointel/compiler: implement SIMD16 restrictions for mixed-float instructions
Iago Toral Quiroga [Thu, 14 Mar 2019 09:35:58 +0000 (10:35 +0100)]
intel/compiler: implement SIMD16 restrictions for mixed-float instructions

v2: f32to16/f16to32 can use a :W destination (Curro)
v3: check destination is packed (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
5 years agointel/compiler: skip MAD algebraic optimization for half-float or mixed mode
Iago Toral Quiroga [Tue, 12 Feb 2019 08:34:10 +0000 (09:34 +0100)]
intel/compiler: skip MAD algebraic optimization for half-float or mixed mode

It is very likely that this optimization is never useful and we'll probably
just end up removing it, so let's not bother adding more cases to it for
now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/compiler: remove inexact algebraic optimizations from the backend
Iago Toral Quiroga [Tue, 12 Feb 2019 11:43:30 +0000 (12:43 +0100)]
intel/compiler: remove inexact algebraic optimizations from the backend

NIR already has these and correctly considers exact/inexact qualification,
whereas the backend doesn't and can apply the optimizations where it
shouldn't. This happened to be the case in a handful of Tomb Raider shaders,
where NIR would skip the optimizations because of a precise qualification
but the backend would then (incorrectly) apply them anyway.

Besides this, considering that we are not emitting much math in the backend
these days it is unlikely that these optimizations are useful in general. A
shader-db run confirms that MAD and LRP optimizations, for example, were only
being triggered in cases where NIR would skip them due to precise
requirements, so in the near future we might want to remove more of these,
but for now we just remove the ones that are not completely correct.
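
A generic illustration of what makes an algebraic rewrite inexact, assuming
IEEE 754 floats. Folding x * 0.0 to 0.0 is used as the example; it is not
necessarily one of the optimizations this commit removes, and the function
names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* What the shader asked for. */
static float mul_zero_exact(float x) { return x * 0.0f; }

/* What an inexact optimization would substitute. */
static float mul_zero_folded(float x) { (void)x; return 0.0f; }

/* True when folding changes the value; the sign of zero is ignored here. */
static bool fold_is_unsafe_for(float x)
{
    float exact = mul_zero_exact(x);
    return isnan(exact) || exact != mul_zero_folded(x);
}
```

For finite inputs the two agree, but an infinite or NaN input yields NaN in
the exact form, which is the kind of difference a precise qualification is
meant to preserve.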

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/compiler: fix cmod propagation for non 32-bit types
Iago Toral Quiroga [Mon, 19 Nov 2018 12:08:07 +0000 (13:08 +0100)]
intel/compiler: fix cmod propagation for non 32-bit types

v2:
 - Do not propagate if the bit-size changes

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/compiler: add a brw_reg_type_is_integer helper
Iago Toral Quiroga [Tue, 20 Nov 2018 13:04:26 +0000 (14:04 +0100)]
intel/compiler: add a brw_reg_type_is_integer helper

v2:
 - Fixed typo: meant BRW_REGISTER_TYPE_UB instead of BRW_REGISTER_TYPE_UV

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
5 years agointel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit
Iago Toral Quiroga [Fri, 26 Oct 2018 11:40:27 +0000 (13:40 +0200)]
intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit

There are no 8-bit immediates, so assert in that case.
16-bit immediates are replicated in each word of a 32-bit immediate, so
we only need to check the lower 16-bits.

v2:
 - Fix is_zero with half-float to consider -0 as well (Jason).
 - Fix is_negative_one for word type.
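
A minimal sketch of the 16-bit checks described above, assuming the
immediate is available as a raw 32-bit value with the 16-bit payload
replicated in both words. The function names are hypothetical; the
half-float bit patterns (0x3c00 for 1.0, 0xbc00 for -1.0) are the standard
IEEE 754 binary16 encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Half-float zero: only the low word matters, and the sign bit is
 * masked off so that -0 counts as zero too. */
static bool imm_hf_is_zero(uint32_t imm) { return (imm & 0x7fffu) == 0; }

static bool imm_hf_is_one(uint32_t imm) { return (imm & 0xffffu) == 0x3c00u; }

static bool imm_hf_is_negative_one(uint32_t imm) { return (imm & 0xffffu) == 0xbc00u; }

/* Signed word -1 is all ones in the low 16 bits. */
static bool imm_w_is_negative_one(uint32_t imm) { return (imm & 0xffffu) == 0xffffu; }
```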

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>