mesa.git
5 years ago winsys/amdgpu: make IBs writable and expose their address
Marek Olšák [Thu, 16 Aug 2018 01:17:06 +0000 (21:17 -0400)]
winsys/amdgpu: make IBs writable and expose their address

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years ago ac: add REWIND and GDS registers to register headers
Marek Olšák [Tue, 12 Feb 2019 20:01:18 +0000 (15:01 -0500)]
ac: add REWIND and GDS registers to register headers

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years ago ac: add ac_get_i1_sgpr_mask
Marek Olšák [Tue, 12 Feb 2019 20:00:53 +0000 (15:00 -0500)]
ac: add ac_get_i1_sgpr_mask

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years ago ac: add radeon_info::is_pro_graphics
Marek Olšák [Tue, 12 Feb 2019 17:19:33 +0000 (12:19 -0500)]
ac: add radeon_info::is_pro_graphics

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years ago ac: add radeon_info::marketing_name, replacing the winsys callback
Marek Olšák [Tue, 12 Feb 2019 17:14:15 +0000 (12:14 -0500)]
ac: add radeon_info::marketing_name, replacing the winsys callback

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years ago tgsi/scan: add uses_drawid
Marek Olšák [Mon, 4 Feb 2019 19:31:59 +0000 (14:31 -0500)]
tgsi/scan: add uses_drawid

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
5 years ago iris: Track valid data range and infer unsynchronized mappings.
Kenneth Graunke [Fri, 5 Apr 2019 18:54:10 +0000 (11:54 -0700)]
iris: Track valid data range and infer unsynchronized mappings.

Applications frequently call glBufferSubData() to consecutive regions
of a VBO to append new vertex data.  If no data exists there yet, we
can promote these to unsynchronized writes, even if the buffer is busy,
since the GPU can't be doing anything useful with undefined content.
This can avoid a bunch of unnecessary blitting on the GPU.
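
A rough sketch of the inference described above, not the actual iris code.
Every name here (sketch_buffer, sketch_write_needs_sync, ...) is made up for
illustration, and a single tracked range is assumed, so any gap between two
writes is conservatively treated as valid data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver buffer.  [valid_start, valid_end) is
 * the half-open byte range that has ever been written; it is empty when
 * valid_start >= valid_end. */
struct sketch_buffer {
   uint32_t valid_start;
   uint32_t valid_end;
   bool busy;   /* the GPU may still be using the buffer */
};

/* True if a write to [offset, offset + size) touches defined data. */
static bool
sketch_overlaps_valid(const struct sketch_buffer *buf,
                      uint32_t offset, uint32_t size)
{
   if (buf->valid_start >= buf->valid_end)
      return false;
   return offset < buf->valid_end && offset + size > buf->valid_start;
}

/* Returns true if the write has to synchronize with the GPU (stall or
 * blit), false if it can be promoted to an unsynchronized write.  Either
 * way, the written range is added to the valid range afterwards. */
static bool
sketch_write_needs_sync(struct sketch_buffer *buf,
                        uint32_t offset, uint32_t size)
{
   assert(size > 0);
   bool need_sync = buf->busy && sketch_overlaps_valid(buf, offset, size);

   if (buf->valid_start >= buf->valid_end) {
      buf->valid_start = offset;
      buf->valid_end = offset + size;
   } else {
      if (offset < buf->valid_start)
         buf->valid_start = offset;
      if (offset + size > buf->valid_end)
         buf->valid_end = offset + size;
   }
   return need_sync;
}
```

With this, appending consecutive regions to a busy buffer never reports a
sync, while overwriting a region that was already written does.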

u_threaded_context would do this for us, and in fact prohibits us from
doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED).  But we haven't
hooked that up yet, and it may be useful to disable u_threaded_context
when debugging...at which point we'd still want this optimization.  At
the very least, it would let us measure the benefit of threading
independently from this optimization.  And it's not a lot of code.

Removes most stall avoidance blits in "Total War: WARHAMMER."

On my Skylake GT4e at 1920x1080, this appears to improve performance
in games by the following (but I did not do many runs for proper
statistics gathering):

   ----------------------------------------------
   | DiRT Rally        | +2% (avg) | + 2% (max) |
   | Bioshock Infinite | +3% (avg) | + 9% (max) |
   | Shadow of Mordor  | +7% (avg) | +20% (max) |
   ----------------------------------------------

5 years ago iris: Make a resource_is_busy() helper
Kenneth Graunke [Tue, 16 Apr 2019 20:23:06 +0000 (13:23 -0700)]
iris: Make a resource_is_busy() helper

This checks both "is it busy" and "do we have work queued up for it"?

5 years ago iris: Replace buffer backing storage and rebind to update addresses.
Kenneth Graunke [Tue, 12 Mar 2019 21:51:22 +0000 (14:51 -0700)]
iris: Replace buffer backing storage and rebind to update addresses.

This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(),
as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag.  When either
of these happens, we swap out the backing storage of the buffer for a
new idle BO, allowing us to write to it immediately without stalling
or queueing a blit.
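
The swap itself can be pictured roughly as follows.  This is only a sketch
under simplifying assumptions: the types and the needs_rebind flag are
hypothetical, the caller supplies the new idle BO, and the old BO is handed
back so it can be released once the GPU is done with it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified buffer object and resource. */
struct sketch_bo {
   bool busy;     /* still referenced by in-flight GPU work */
   size_t size;
};

struct sketch_resource {
   struct sketch_bo *bo;
   bool needs_rebind;   /* bindings still point at the old address */
};

/* Swap in a fresh idle BO and return the old one.  The old contents are
 * discarded, so nothing is copied; the caller can write to the new BO
 * right away and must re-emit any state that referenced the old one. */
static struct sketch_bo *
sketch_replace_storage(struct sketch_resource *res, struct sketch_bo *fresh)
{
   assert(res->bo != NULL && fresh != NULL);
   assert(!fresh->busy && fresh->size >= res->bo->size);

   struct sketch_bo *old = res->bo;
   res->bo = fresh;
   res->needs_rebind = true;
   return old;
}
```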

On my Skylake GT4e at 1920x1080, this improves performance in games:

   -----------------------------------------------
   | DiRT Rally        | +25% (avg) | +17% (max) |
   | Bioshock Infinite | +22% (avg) | +11% (max) |
   | Shadow of Mordor  | +27% (avg) | +83% (max) |
   -----------------------------------------------

5 years ago iris: Make memzone_for_address non-static
Kenneth Graunke [Mon, 22 Apr 2019 22:16:49 +0000 (15:16 -0700)]
iris: Make memzone_for_address non-static

I want to use this in iris_resource.c.

5 years ago iris: Make a gl_shader_stage -> pipe_shader_stage helper function
Kenneth Graunke [Wed, 17 Apr 2019 06:54:37 +0000 (23:54 -0700)]
iris: Make a gl_shader_stage -> pipe_shader_stage helper function

This is probably not the best place for it, but I don't feel like moving
the one out of the TGSI translator today, and we already have the other
direction here, so...*shrug*

5 years ago iris: Rework image views to store pipe_image_view.
Kenneth Graunke [Mon, 22 Apr 2019 18:27:37 +0000 (11:27 -0700)]
iris: Rework image views to store pipe_image_view.

This will be useful when rebinding images.

5 years ago iris: Rework UBOs and SSBOs to use pipe_shader_buffer
Kenneth Graunke [Wed, 17 Apr 2019 06:44:15 +0000 (23:44 -0700)]
iris: Rework UBOs and SSBOs to use pipe_shader_buffer

This unifies a bunch of the UBO and SSBO code to use common structures.
Beyond iris_state_ref, pipe_shader_buffer also gives us a buffer size,
which can be useful when filling out the surface state.

5 years ago iris: Track bound constant buffers
Kenneth Graunke [Wed, 17 Apr 2019 06:01:41 +0000 (23:01 -0700)]
iris: Track bound constant buffers

This helps avoid having to iterate over [0, PIPE_MAX_CONSTANT_BUFFERS)
looking to see if any resources are bound.
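
A minimal sketch of this kind of tracking, assuming a 32-slot limit and a
plain bitmask; the struct and function names are hypothetical and do not
correspond to the driver's real state structures:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed slot limit for this sketch; stands in for
 * PIPE_MAX_CONSTANT_BUFFERS. */
#define SKETCH_MAX_CONSTANT_BUFFERS 32

struct sketch_shader_state {
   uint32_t bound_cbufs;   /* bit i is set when slot i has a buffer */
   const void *cbuf[SKETCH_MAX_CONSTANT_BUFFERS];
};

/* Bind or unbind (buffer == NULL) a slot, keeping the mask in sync. */
static void
sketch_bind_cbuf(struct sketch_shader_state *s, unsigned slot,
                 const void *buffer)
{
   assert(slot < SKETCH_MAX_CONSTANT_BUFFERS);
   s->cbuf[slot] = buffer;
   if (buffer)
      s->bound_cbufs |= UINT32_C(1) << slot;
   else
      s->bound_cbufs &= ~(UINT32_C(1) << slot);
}

/* Count the slots that reference `buffer`, visiting only bound slots
 * instead of walking every possible slot index. */
static unsigned
sketch_count_bindings(const struct sketch_shader_state *s,
                      const void *buffer)
{
   unsigned count = 0;
   uint32_t mask = s->bound_cbufs;

   while (mask) {
      unsigned slot = 0;
      while (!((mask >> slot) & 1))
         slot++;               /* index of the lowest set bit */
      mask &= mask - 1;        /* clear that bit */
      if (s->cbuf[slot] == buffer)
         count++;
   }
   return count;
}
```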

5 years ago iris: Mark constants dirty on transfer unmap even if no flushes occur
Kenneth Graunke [Tue, 23 Apr 2019 02:11:44 +0000 (19:11 -0700)]
iris: Mark constants dirty on transfer unmap even if no flushes occur

I have various conditions in place to try and avoid unnecessary
PIPE_CONTROL flushes, especially to batches which may have never
used the buffer being mapped.  But if we do a CPU map to a bound
constant buffer, we still need to mark push constants dirty, even
if there's nothing happening in batches that would warrant a flush.

Fixes obvious misrendering in the "XCOM 2: War of the Chosen" menus
(lots of rainbow colored triangles).  Fixes lots of blinking elements
in "Shadow of Mordor".  Fixes missing crowd rendering in "DiRT Rally".

5 years ago intel: workaround VS fixed function issue on Gen9 GT1 parts
Lionel Landwerlin [Wed, 20 Feb 2019 12:50:56 +0000 (12:50 +0000)]
intel: workaround VS fixed function issue on Gen9 GT1 parts

The issue is noticeable in the
dEQP-GLES31.functional.geometry_shading.layered.render_with_default_layer_3d
test where a triangle goes missing when we use the maximum number of
URB entries as specified by the documentation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107505
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
5 years ago intel/compiler: Improve fix_3src_operand()
Matt Turner [Thu, 18 Apr 2019 21:29:03 +0000 (14:29 -0700)]
intel/compiler: Improve fix_3src_operand()

Allow ATTR and IMM sources unconditionally (ATTR are just GRFs, IMM will
be handled by opt_combine_constants()). Both are already allowed by
opt_copy_propagation().

Also allow FIXED_GRF if the regioning is 8,8,1. Could also allow other
stride=1 regions (e.g., 4,4,1) and scalar regions but I don't think
those occur. This is sufficient to allow a pass added in a future commit
(fs_visitor::lower_linterp) to avoid emitting extra MOV instructions.

I removed the 'src.stride > 1' case because it seems wrong: 3-src
instructions on Gen6-9 are align16-only and can only do stride=1 or
stride=0. A run through Jenkins with an assert(src.stride <= 1) never
triggers, so it seems that it was dead code.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
5 years ago intel/compiler: Add unit tests for sat prop for different exec sizes
Matt Turner [Thu, 18 Apr 2019 17:11:54 +0000 (10:11 -0700)]
intel/compiler: Add unit tests for sat prop for different exec sizes

The two new unit tests verify that propagating a saturate between
instructions of different exec sizes does not happen.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
5 years ago intel/compiler: Use SIMD16 instructions in fs saturate prop unit test
Matt Turner [Thu, 18 Apr 2019 17:09:08 +0000 (10:09 -0700)]
intel/compiler: Use SIMD16 instructions in fs saturate prop unit test

Will allow us to test that propagation between instructions of different
exec sizes does not happen (in the next commit).

The stray-looking change in intervening_dest_write is to adjust the size
of the texture result to keep the test functioning identically when the
instructions' exec sizes are doubled. Without the change, the texture
does not overwrite the destination fully as the unit test intends.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
5 years ago intel/fs: Remove fs_generator::generate_linterp from gen11+.
Rafael Antognolli [Tue, 23 Oct 2018 21:06:33 +0000 (14:06 -0700)]
intel/fs: Remove fs_generator::generate_linterp from gen11+.

We now have a lowering pass that will do this at the fs_visitor level,
so we can remove this code from gen11+.

v2: Reduce size of the "i" array from 4 to 2 (Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago intel/fs: Add a lowering pass for linear interpolation.
Rafael Antognolli [Tue, 23 Oct 2018 16:03:32 +0000 (09:03 -0700)]
intel/fs: Add a lowering pass for linear interpolation.

On gen11, instead of using a PLN instruction, we convert
FS_OPCODE_LINTERP to 2 or 4 multiply adds. That is done in the
fs_generator code.

This patch adds a lowering pass that does the same thing at the
fs_visitor. It also drops the usage of NF types, since we don't need the
extra precision and it lets us skip the accumulator. With all that, some
optimizations will still be run on the generated code, and we should get
better scheduling.

v2: Update comment about saturation and conditional mod (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago intel/fs: Move the scalar-region conversion to the generator.
Rafael Antognolli [Fri, 19 Oct 2018 22:44:15 +0000 (15:44 -0700)]
intel/fs: Move the scalar-region conversion to the generator.

Move the scalar-region conversion from the IR to the generator, so it
doesn't affect the Gen11 path. We need the non-scalar regioning
for a later lowering pass that we are adding.

v2: Better commit message (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago intel/fs: Only propagate saturation if exec_size is the same.
Rafael Antognolli [Fri, 19 Oct 2018 22:33:50 +0000 (15:33 -0700)]
intel/fs: Only propagate saturation if exec_size is the same.

Otherwise it could propagate the saturation from a SIMD16 instruction
into a SIMD8 instruction. With that, only part of the destination
register, which is the source of the move with saturation, would have
been updated.
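
The added condition can be sketched like this.  The instruction struct is a
hypothetical reduction to the fields that matter here; the real pass checks
many more things (types, intervening writes, and so on) that are left out:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down instruction. */
struct sketch_inst {
   unsigned exec_size;   /* 8 for SIMD8, 16 for SIMD16 */
   bool saturate;
   bool is_mov;
};

/* Try to fold the saturate of `mov` into `def`, the instruction that
 * produced its source.  If the exec sizes differ, `def` only writes part
 * of what `mov` reads, so the saturate has to stay on the MOV. */
static bool
sketch_try_propagate_saturate(struct sketch_inst *def,
                              struct sketch_inst *mov)
{
   assert(def != mov);

   if (!mov->is_mov || !mov->saturate)
      return false;
   if (def->exec_size != mov->exec_size)
      return false;

   def->saturate = true;
   mov->saturate = false;
   return true;
}
```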

Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago i965: Tidy bogus indentation left by previous commit
Kenneth Graunke [Mon, 22 Apr 2019 16:52:24 +0000 (09:52 -0700)]
i965: Tidy bogus indentation left by previous commit

I left code indented one level too far in the previous commit to make
the diff easier to review.  Drop that extra level now.

Fixes: 6981069fc80 i965: Ignore uniform storage for samplers or images, use binding info
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago i965: Ignore uniform storage for samplers or images, use binding info
Kenneth Graunke [Thu, 18 Apr 2019 00:25:29 +0000 (17:25 -0700)]
i965: Ignore uniform storage for samplers or images, use binding info

gl_nir_lower_samplers_as_deref creates new top level sampler and image
uniforms which have been split from structure uniforms.  i965 assumed
that it could walk through gl_uniform_storage slots by starting at
var->data.location and walking forward based on a simple slot count.
This assumed that structure types were walked in a particular order.

With samplers and images split out of structures, it becomes impossible
to assign meaningful locations.  Consider:

   struct S {
      sampler2D a;
      sampler2D b;
   } s[2];

The gl_uniform_storage locations for these follow this map:

   0 => a[0], 1 => b[0], 2 => a[1], 3 => b[1].

But the new split variables look like:

   sampler2D lowered_a[2];
   sampler2D lowered_b[2];

and there is no way to know that there's effectively a stride to get to
the location for successive elements of a[] or b[].  So, working with
location becomes effectively impossible.
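
To make the stride problem concrete, here is a small sketch for the
struct S s[2] example.  The helper names are invented, and the "naive"
walk models the old assumption of one slot per array element:

```c
#include <assert.h>

/* struct S has two opaque fields: a (field 0) and b (field 1). */
enum { SKETCH_FIELDS_PER_STRUCT = 2 };

/* Storage slot that actually holds field `field` of s[elem]. */
static unsigned
sketch_storage_slot(unsigned elem, unsigned field)
{
   assert(field < SKETCH_FIELDS_PER_STRUCT);
   return elem * SKETCH_FIELDS_PER_STRUCT + field;
}

/* Slot reached by starting at a split variable's base location and
 * stepping forward one slot per array element. */
static unsigned
sketch_naive_slot(unsigned base_location, unsigned elem)
{
   return base_location + elem;
}
```

For lowered_a[1], the naive walk from base location 0 lands on slot 1,
which really holds s[0].b; the correct slot is 2.  Nothing in the split
variable records that the stride is 2.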

Ultimately, the point of looking at uniform storage was to pull out the
bindings from the opaque index fields.  gl_nir_lower_samplers_as_deref
can obtain this information while doing the splitting, however, and sets
up var->data.binding to have the desired values.

We move gl_nir_lower_samplers before brw_nir_lower_image_load_store so
gl_nir_lower_samplers_as_deref has the opportunity to set proper image
bindings.  Then, we make the uniform handling code skip sampler(-array)
variables, and handle image param setup based on var->data.binding.

Fixes Piglit tests/spec/glsl-1.10/execution/samplers/uniform-struct,
this time without regressing dEQP-GLES2.functional.uniform_api.random.3.

Fixes: f003859f97c nir: Make gl_nir_lower_samplers use gl_nir_lower_samplers_as_deref
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago Revert "glsl: Set location on structure-split sampler uniform variables"
Kenneth Graunke [Wed, 17 Apr 2019 21:48:10 +0000 (14:48 -0700)]
Revert "glsl: Set location on structure-split sampler uniform variables"

This reverts commit 9e0c744f07a21fc7bb018a77cf83b057436d0d1b, which
regressed dEQP-GLES2.functional.uniform_api.random.3.  It turns out
that the newly produced location is meaningless and impossible to
consume by drivers that want to look at gl_uniform_storage, so it's
probably better to leave it unset (0) than a number that looks usable.

Leave a tombstone^Wcomment to discourage the next person from making
the obvious looking fix.

See the next commit for a longer description of the problem.

This breaks tests/spec/glsl-1.10/execution/samplers/uniform-struct
on i965, which was originally fixed by the reverted commit.  The next commit
will fix it again.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years ago radeonsi: use CP DMA for the null const buffer clear on CIK
Marek Olšák [Fri, 12 Apr 2019 15:12:34 +0000 (11:12 -0400)]
radeonsi: use CP DMA for the null const buffer clear on CIK

This is a workaround for a thread deadlock whose cause I have not
been able to determine.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108879
Fixes: 9b331e462e5021d994859756d46cd2519d9c9c6e
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years ago drirc: Add workaround for Epic Games Launcher
Danylo Piliaiev [Wed, 17 Apr 2019 11:32:47 +0000 (14:32 +0300)]
drirc: Add workaround for Epic Games Launcher

The Epic Games Launcher can be launched in OpenGL mode
with the "-opengl" option. It creates an OpenGL 4.4 core context
but still uses deprecated functionality, e.g. the default
vertex buffer object.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110462

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
5 years ago iris: Track bound and writable SSBOs
Kenneth Graunke [Wed, 17 Apr 2019 05:54:40 +0000 (22:54 -0700)]
iris: Track bound and writable SSBOs

Marek recently extended pipe->set_shader_buffers() to take an extra
writable_bitmask parameter, indicating which SSBOs are writable (some
may be bound read-only).  We can use this to decide whether to set
EXEC_OBJECT_WRITE when pinning.  Avoiding the write flag can save us
some cross-batch flushing if the SSBO is used for reading in both the
render and compute engines.
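
In sketch form, the decision is a single bit test per slot.  The flag
value below is only a stand-in chosen for this example, and the helper name
is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's EXEC_OBJECT_WRITE flag; the value is an
 * assumption made for this sketch. */
#define SKETCH_EXEC_OBJECT_WRITE (1u << 2)

/* Flags to use when pinning the SSBO bound at `slot`, given the
 * writable_bitmask that was passed to set_shader_buffers(). */
static uint32_t
sketch_ssbo_pin_flags(uint32_t writable_bitmask, unsigned slot)
{
   assert(slot < 32);
   return (writable_bitmask & (UINT32_C(1) << slot))
             ? SKETCH_EXEC_OBJECT_WRITE : 0;
}
```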

5 years ago virgl: clear vertex_array_dirty
Chia-I Wu [Thu, 18 Apr 2019 22:10:22 +0000 (15:10 -0700)]
virgl: clear vertex_array_dirty

Clear vertex_array_dirty after the state is emitted.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
5 years ago gallivm: disable NEON instructions if they are not supported
Lubomir Rintel [Mon, 11 Mar 2019 20:18:48 +0000 (21:18 +0100)]
gallivm: disable NEON instructions if they are not supported

The LLVM project made some questionable decisions about defaults for
armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell
platforms).

On top of that, getHostCPUFeatures() doesn't disable missing machine
attributes. Finally, -neon alone is not sufficient to disable emission
of NEON instructions.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago gallivm: guess CPU features also on ARM
Lubomir Rintel [Mon, 11 Mar 2019 18:16:40 +0000 (19:16 +0100)]
gallivm: guess CPU features also on ARM

getHostCPUFeatures() is also available on ARM, and has been for even
longer than on x86. Use it -- it potentially enables instructions that may speed
things up.

Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/518
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years ago iris: Enable the dual_color_blend_by_location driconf option.
Kenneth Graunke [Fri, 19 Apr 2019 05:29:27 +0000 (22:29 -0700)]
iris: Enable the dual_color_blend_by_location driconf option.

This fixes rendering in Unigine Valley 1.0 and Heaven 4.0.

5 years ago iris: Add mechanism for iris-specific driconf options
Kenneth Graunke [Fri, 19 Apr 2019 05:13:41 +0000 (22:13 -0700)]
iris: Add mechanism for iris-specific driconf options

Based on Nicolai's 0f8c5de8690e7c87aa2e24383065efaca7e6fe78.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years ago nir: Use the NIR_SRC_AS_ macro to define nir_src_as_deref
Jason Ekstrand [Fri, 19 Apr 2019 20:09:04 +0000 (15:09 -0500)]
nir: Use the NIR_SRC_AS_ macro to define nir_src_as_deref

We have a macro for this now; no reason to hand-roll it for derefs.
While we're here, move the NIR_DEFINE_CAST for derefs down to where all
the other ones are.

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago anv,radv: Update release notes for newly implemented extensions
Jason Ekstrand [Sat, 20 Apr 2019 14:44:57 +0000 (09:44 -0500)]
anv,radv: Update release notes for newly implemented extensions

A lot has happened in those two drivers since the 19.0 release and we
keep forgetting to update release notes.  Time to bring everything up to
date again before 19.1 gets released.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years ago radv: add VK_NV_compute_shader_derivatives support
Samuel Pitoiset [Fri, 19 Apr 2019 10:40:37 +0000 (12:40 +0200)]
radv: add VK_NV_compute_shader_derivatives support

Only computeDerivativeGroupLinear is supported for now.

All crucible tests pass.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years ago intel/fs: Add support for float16 to the fsign optimizations
Ian Romanick [Thu, 18 Apr 2019 22:09:06 +0000 (15:09 -0700)]
intel/fs: Add support for float16 to the fsign optimizations

Commit ad98fbc2174 ("intel/fs: Refactor code generation for nir_op_fsign
to its own function") criss-crossed with c2b8fb9a810 ("anv/device:
expose VK_KHR_shader_float16_int8 in gen8+"), and I was not paying
enough attention when I rebased.  This adds back the float16 changes and
enables the optimization.

v2: Incorporate more changes from 19cd2f5debd and a8d8b1a1391 that I
missed in the previous version.

Fixes: ad98fbc2174 ("intel/fs: Refactor code generation for nir_op_fsign to its own function")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110474
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
5 years ago lima: add Android build
Icenowy Zheng [Mon, 15 Apr 2019 04:32:43 +0000 (12:32 +0800)]
lima: add Android build

Currently only meson build support has been added for the lima driver.

Add Android build support for lima.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Acked-by: Qiang Yu <yuq825@gmail.com>
5 years ago st/nine: skip position checks in SetCursorPosition()
Andre Heider [Thu, 11 Apr 2019 06:42:47 +0000 (08:42 +0200)]
st/nine: skip position checks in SetCursorPosition()

For HW cursors, "cursor.pos" doesn't hold the current position of the
pointer, just the position of the last call to SetCursorPosition().

Skip the check against stale values and bump the d3dadapter9 drm version
to expose this change of behaviour.

Signed-off-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Axel Davy <davyaxel0@gmail.com>
5 years ago anv: Rework the descriptor set layout create loop
Jason Ekstrand [Fri, 19 Apr 2019 19:45:34 +0000 (14:45 -0500)]
anv: Rework the descriptor set layout create loop

Previously, we were storing the per-binding create info pointer in the
immutable_samplers field temporarily so that we can switch the order in
which we walk the loop.  However, now that we have multiple arrays of
structs to walk, it makes more sense to store an index of some sort.
Because we want to leave immutable_samplers as NULL for undefined
bindings, we store index + 1 and then subtract one later.
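
The index + 1 trick can be sketched as below.  Only the immutable_samplers
field name comes from the description above; the struct and helper names
are made up, and the real layout code does considerably more:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-binding layout entry.  While the layout is being
 * built, the pointer field temporarily carries (index + 1) instead of a
 * real pointer, so NULL still means "this binding is undefined". */
struct sketch_binding {
   const void *immutable_samplers;
};

/* Remember which create-info entry describes this binding. */
static void
sketch_stash_index(struct sketch_binding *b, uint32_t info_idx)
{
   assert(info_idx < UINT32_MAX);
   b->immutable_samplers = (const void *)((uintptr_t)info_idx + 1);
}

/* Recover the stashed index and reset the field.  Returns false for
 * bindings that were never defined. */
static bool
sketch_take_index(struct sketch_binding *b, uint32_t *info_idx)
{
   if (b->immutable_samplers == NULL)
      return false;

   *info_idx = (uint32_t)((uintptr_t)b->immutable_samplers - 1);
   b->immutable_samplers = NULL;
   return true;
}
```

Index 0 is stored as 1, so a defined binding whose create info happens to
be first is still distinguishable from an undefined one.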

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Ignore descriptor binding flags if bindingCount == 0
Jason Ekstrand [Fri, 19 Apr 2019 19:43:01 +0000 (14:43 -0500)]
anv: Ignore descriptor binding flags if bindingCount == 0

I missed this on the first go round.  The bindingCount field of
VkDescriptorSetLayoutBindingFlagsCreateInfoEXT is allowed to be zero
which means the flags array is ignored.

Fixes: d6c9bd6e01b4d "anv: Put binding flags in descriptor set layouts"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago panfrost/mdg: Use shared fsign lowering
Alyssa Rosenzweig [Fri, 19 Apr 2019 23:15:45 +0000 (23:15 +0000)]
panfrost/mdg: Use shared fsign lowering

Fixes failures in shaders.operator.common_functions.sign.*

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years ago panfrost: Fixup vertex offsets to prevent shadow copy
Alyssa Rosenzweig [Mon, 15 Apr 2019 04:08:46 +0000 (04:08 +0000)]
panfrost: Fixup vertex offsets to prevent shadow copy

Mali attribute buffers have to be 64-byte aligned. However, Gallium
enforces no such requirement; for unaligned buffers, we were previously
forced to create a shadow copy (slow!). To prevent this, we instead use
the offset buffer's address with the lower bits masked off, and then
add those masked off bits to the src_offset. Proof of correctness
included, possibly for the opportunity to say "QED" unironically.
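
A sketch of the fixup, with hypothetical names and a simplified attribute
record.  The correctness argument is the identity
(addr & ~63) + (addr & 63) == addr, which the assert restates:

```c
#include <assert.h>
#include <stdint.h>

/* Required alignment of attribute buffer addresses, per the text above. */
#define SKETCH_ATTR_ALIGN 64u

/* Hypothetical result: an aligned base plus the per-attribute offset. */
struct sketch_attr {
   uint64_t base;        /* 64-byte aligned address given to the GPU */
   uint32_t src_offset;  /* offset applied on top of base */
};

static struct sketch_attr
sketch_fixup_vertex_offset(uint64_t buffer_addr, uint32_t buffer_offset,
                           uint32_t src_offset)
{
   uint64_t addr = buffer_addr + buffer_offset;
   uint32_t low_bits = (uint32_t)(addr & (SKETCH_ATTR_ALIGN - 1));

   struct sketch_attr out;
   out.base = addr & ~(uint64_t)(SKETCH_ATTR_ALIGN - 1);
   out.src_offset = src_offset + low_bits;

   /* The effective address is unchanged by the fixup. */
   assert(out.base + out.src_offset == addr + src_offset);
   return out;
}
```

For example, a buffer at 0x1000 with offset 100 and src_offset 8 becomes
base 0x1040 with src_offset 44; both describe address 0x106c.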

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years ago panfrost: Track BO lifetime with jobs and reference counts
Alyssa Rosenzweig [Sun, 14 Apr 2019 22:42:44 +0000 (22:42 +0000)]
panfrost: Track BO lifetime with jobs and reference counts

This (fairly large) patch continues work surrounding the panfrost_job
abstraction to improve job lifetime management. In particular, we add
infrastructure to track which BOs are used by a particular job
(currently limited to the vertex buffer BOs), to reference count these
BOs, and to automatically manage the BOs' memory based on the reference
count. This set of changes serves as a code cleanup, as a way of future
proofing for allowing flushing BOs, and immediately as a bugfix to
workaround the missing reference counting for vertex buffer BOs.
Meanwhile, there are a few cleanups to vertex buffer handling code
itself, so in the short-term, this allows us to remove the costly VBO
staging workaround, since this patch addresses the underlying causes.

v2: Use pipe_reference for BO reference counting, rather than managing
it ourselves. Don't duplicate hash-table key removal. Fix vertex buffer
counting.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years ago docs/relnotes: add support for VK_KHR_shader_float16_int8
Andres Gomez [Thu, 18 Apr 2019 19:00:37 +0000 (21:00 +0200)]
docs/relnotes: add support for VK_KHR_shader_float16_int8

v2: radv also supports it now (Samuel Pitoiset).

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years ago anv/nir: Add a central helper for figuring out SSBO address formats
Jason Ekstrand [Thu, 18 Apr 2019 17:08:57 +0000 (12:08 -0500)]
anv/nir: Add a central helper for figuring out SSBO address formats

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago nir: Add helpers for getting the type of an address format
Jason Ekstrand [Thu, 18 Apr 2019 17:08:34 +0000 (12:08 -0500)]
nir: Add helpers for getting the type of an address format

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Implement VK_EXT_descriptor_indexing
Jason Ekstrand [Wed, 27 Feb 2019 22:08:20 +0000 (16:08 -0600)]
anv: Implement VK_EXT_descriptor_indexing

Now that everything is in place to do bindless for all resource types
except input attachments and UBOs, VK_EXT_descriptor_indexing is
"trivial".

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Put binding flags in descriptor set layouts
Jason Ekstrand [Tue, 2 Oct 2018 20:35:59 +0000 (15:35 -0500)]
anv: Put binding flags in descriptor set layouts

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Use bindless handles for images
Jason Ekstrand [Tue, 12 Feb 2019 07:02:28 +0000 (01:02 -0600)]
anv: Use bindless handles for images

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago intel/fs: Add support for bindless image load/store/atomic
Jason Ekstrand [Tue, 12 Feb 2019 06:47:54 +0000 (00:47 -0600)]
intel/fs: Add support for bindless image load/store/atomic

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Use bindless textures and samplers
Jason Ekstrand [Thu, 7 Feb 2019 20:10:33 +0000 (14:10 -0600)]
anv: Use bindless textures and samplers

This commit changes anv to put bindless handles and sampler pointers
into the descriptor buffer and use those instead of bindful when we run
out of binding table space.  This "spilling" of descriptors allows us to
advertise an almost unbounded number of images and samplers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Pass the plane into lower_tex_deref
Jason Ekstrand [Fri, 8 Feb 2019 23:04:07 +0000 (17:04 -0600)]
anv: Pass the plane into lower_tex_deref

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Use write_image_view to initialize immutable samplers
Jason Ekstrand [Fri, 8 Feb 2019 04:34:57 +0000 (22:34 -0600)]
anv: Use write_image_view to initialize immutable samplers

Instead of setting it manually, call the helper.  When setting
descriptor sets becomes more complicated than just setting some struct
values, this will keep immutable sampler handling correct.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Count the number of planes in each descriptor binding
Jason Ekstrand [Thu, 7 Feb 2019 16:16:24 +0000 (10:16 -0600)]
anv: Count the number of planes in each descriptor binding

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago intel/fs: Add support for bindless texture ops
Jason Ekstrand [Wed, 6 Feb 2019 21:42:17 +0000 (15:42 -0600)]
intel/fs: Add support for bindless texture ops

We add two new texture sources for bindless surface and sampler handles.
Bindless surface handles are expected to be pre-shifted so that the
20-bit surface state table index is in the top 20 bits of the 32-bit
handle.  This lets us avoid any extra shifts in the shader.  Bindless
sampler handles are 32-byte aligned byte offsets from general state base
address.  We use 32-byte aligned instead of 16-byte aligned to avoid
having to use more indirect messages than needed.  It means we can't
tightly pack samplers but that's probably not a big deal.
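
The two handle encodings described above can be sketched as follows.  The
function names are hypothetical; the shift of 12 simply follows from
placing a 20-bit index in the top 20 bits of a 32-bit value:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pre-shift a 20-bit surface state table index into the top 20 bits of
 * the 32-bit handle, so the shader does not have to shift it. */
static uint32_t
sketch_bindless_surface_handle(uint32_t state_index)
{
   assert(state_index < (UINT32_C(1) << 20));
   return state_index << 12;
}

/* A sampler handle is a byte offset from the general state base address
 * and has to be 32-byte aligned. */
static bool
sketch_is_valid_sampler_handle(uint32_t byte_offset)
{
   return (byte_offset & 31u) == 0;
}
```

An offset of 48 is 16-byte aligned but not 32-byte aligned, so it would be
rejected; this is the packing cost mentioned above.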

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago intel,nir: Lower TXD with a bindless sampler
Jason Ekstrand [Fri, 8 Feb 2019 23:56:52 +0000 (17:56 -0600)]
intel,nir: Lower TXD with a bindless sampler

When we have a bindless sampler, we need an instruction header.  Even in
SIMD8, this pushes the instruction over the sampler message size maximum
of 11 registers.  Instead, we have to lower TXD to TXL.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Implement VK_KHR_shader_atomic_int64
Jason Ekstrand [Sun, 13 Jan 2019 00:30:47 +0000 (18:30 -0600)]
anv: Implement VK_KHR_shader_atomic_int64

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Implement SSBOs bindings with GPU addresses in the descriptor BO
Jason Ekstrand [Wed, 9 Jan 2019 22:04:22 +0000 (16:04 -0600)]
anv: Implement SSBOs bindings with GPU addresses in the descriptor BO

This commit adds a new way for ANV to do SSBO bindings by just passing a
GPU address in through the descriptor buffer and using the A64 messages
to access the GPU address directly.  This means that our variable
pointers are now "real" pointers instead of a vec2(BTI, offset) pair.
This carries a few advantages:

 1. It lets us support a virtually unbounded number of SSBO bindings.

 2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't
    implement before because those atomic messages are only available
    in the bindless A64 form.

 3. It's way better than messing around with bindless handles for SSBOs
    which is the only other option for VK_EXT_descriptor_indexing.

 4. It's more future looking, maybe?  At the least, this is what NVIDIA
    does (they don't have binding based SSBOs at all).  This doesn't a
    priori mean it's better, it just means it's probably not terrible.

The big disadvantage, of course, is that we have to start doing our own
bounds checking for robustBufferAccess again and have to push in dynamic
offsets.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Lower some SSBO operations in apply_pipeline_layout
Jason Ekstrand [Fri, 11 Jan 2019 22:52:43 +0000 (16:52 -0600)]
anv: Lower some SSBO operations in apply_pipeline_layout

In order to avoid the potential overhead of A64 operations on all SSBO
ops, we look for those SSBO ops where we can get to the descriptor set
from the SSBO access operation and lower those to a binding-table
approach.  When robustBufferAccess is enabled, this lets the hardware do
the bounds checking for us.  It also avoids some potentially expensive
64-bit integer calculations.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years ago anv: Add a has_a64_buffer_access to anv_physical_device
Jason Ekstrand [Thu, 7 Feb 2019 18:01:18 +0000 (12:01 -0600)]
anv: Add a has_a64_buffer_access to anv_physical_device

This is more descriptive and a bit nicer than checking for gen >= 8 &&
use_softpin everywhere.
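
A minimal sketch of the idea, assuming a cut-down device struct: the two
conditions are evaluated once and stored in a single descriptive flag.
The struct name and the init helper below are hypothetical; only the
has_a64_buffer_access, gen and use_softpin names come from the commit
message.

```c
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for the physical device struct. */
struct physical_device_sketch {
   int  gen;                    /* hardware generation */
   bool use_softpin;            /* buffers are pinned at fixed addresses */
   bool has_a64_buffer_access;  /* derived capability flag */
};

/* Compute the derived flag once, at device creation time. */
static void
init_buffer_access_caps(struct physical_device_sketch *pdev)
{
   pdev->has_a64_buffer_access = pdev->gen >= 8 && pdev->use_softpin;
}
```

Callers then test pdev->has_a64_buffer_access instead of repeating the
compound condition.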

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/nir: Re-run int64 lowering in postprocess_nir
Jason Ekstrand [Thu, 10 Jan 2019 22:05:06 +0000 (16:05 -0600)]
intel/nir: Re-run int64 lowering in postprocess_nir

We're about to start doing 64-bit pointer calculations in ANV.  They
will get applied after brw_preprocess_nir which is where we currently do
64-bit integer arithmetic lowering.  Because we're adding 64-bit integer
arithmetic after the initial lowering has happened, we need to lower
again.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir/lower_io: Expose some explicit I/O lowering helpers
Jason Ekstrand [Tue, 8 Jan 2019 00:00:22 +0000 (18:00 -0600)]
nir/lower_io: Expose some explicit I/O lowering helpers

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv/pipeline: Add skeleton support for spilling to bindless
Jason Ekstrand [Mon, 25 Feb 2019 19:59:07 +0000 (13:59 -0600)]
anv/pipeline: Add skeleton support for spilling to bindless

If the number of surfaces or samplers exceeds what we can put in a
table, we will want to spill out to bindless.  There is no bindless
support yet but this gets us the basic framework that will be used by
later commits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv/pipeline: Sort bindings by most used first
Jason Ekstrand [Thu, 21 Feb 2019 00:14:56 +0000 (18:14 -0600)]
anv/pipeline: Sort bindings by most used first

This commit just sorts the bindings by how often they're used vs the
array size of the binding.  This will let us make more nuanced decisions
about what goes in the binding table vs. what to make bindless.
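
One way such a sort could look is sketched below.  It is not the driver's
implementation: the binding_info struct, its fields and the exact scoring
(uses per array element, compared by cross-multiplication) are
assumptions made for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-binding record. */
struct binding_info {
   uint32_t binding;     /* binding index within the set */
   uint32_t use_count;   /* number of shader uses of the binding */
   uint32_t array_size;  /* number of array elements, at least 1 */
};

/* Order by use_count / array_size, highest first.  Cross-multiplying
 * compares the two ratios without floating point or division. */
static int
compare_binding_infos(const void *_a, const void *_b)
{
   const struct binding_info *a = _a;
   const struct binding_info *b = _b;
   uint64_t lhs = (uint64_t)a->use_count * b->array_size;
   uint64_t rhs = (uint64_t)b->use_count * a->array_size;

   if (lhs != rhs)
      return lhs > rhs ? -1 : 1;

   /* Tie-break on the binding index so the order is deterministic. */
   return (a->binding > b->binding) - (a->binding < b->binding);
}

static void
sort_bindings_most_used_first(struct binding_info *infos, size_t count)
{
   qsort(infos, count, sizeof(*infos), compare_binding_infos);
}
```

After sorting, a caller can hand out binding table slots from the front
of the array and treat whatever does not fit as a candidate for
bindless.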

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Add a #define for the max binding table size
Jason Ekstrand [Tue, 16 Apr 2019 22:35:05 +0000 (17:35 -0500)]
anv: Add a #define for the max binding table size

This also fixes a bug where we mis-calculate maximum binding table sizes
and may return true in vkGetDescriptorSetLayoutSupport even for sets too
large to fit in a binding table.

Fixes: ddc40691221 "anv: Implement VK_KHR_maintenance3"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Put image params in the descriptor set buffer on gen8 and earlier
Jason Ekstrand [Thu, 22 Nov 2018 00:26:27 +0000 (18:26 -0600)]
anv: Put image params in the descriptor set buffer on gen8 and earlier

This is really where they belong; not push constants.  The one downside
here is that we can't push them anymore for compute shaders.  However,
that's a general problem and we should figure out how to push descriptor
sets for compute shaders.  This lets us bump MAX_IMAGES to 64 on BDW and
earlier platforms because we no longer have to worry about push constant
overhead limits.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoanv: Make all VkDeviceMemory BOs resident permanently
Jason Ekstrand [Wed, 27 Feb 2019 00:05:34 +0000 (18:05 -0600)]
anv: Make all VkDeviceMemory BOs resident permanently

We spend a lot of time in the driver adding things to hash sets to track
residency.  The reality is that a properly built Vulkan app uses large
memory objects and sub-allocates from them.  In a typical frame, most,
if not all, of those allocations are going to be resident for the entire
frame so we're really not saving ourselves much by tracking fine-grained
residency.  Just throwing everything in the validation list does make it
a little bit more expensive inside the kernel to walk the list and
ensure that all our VA is in order.  However, without relocations, the
overhead of that is pretty small.

If we ever do run into a memory pressure situation where the fine-
grained residency could even potentially help, we would likely be
swapping one page out to make room for another within the draw call and
performance is totally lost at that point.  We're better off swapping
out other apps and just letting ours run a whole frame.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agofreedreno/ir3: fix const assert
Rob Clark [Fri, 19 Apr 2019 18:32:22 +0000 (11:32 -0700)]
freedreno/ir3: fix const assert

Fixes: fe8c57e859d freedreno/ir3: use nir_src_as_uint in a few places
Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agogallium/auxiliary/vl: Fix a couple of warnings
Kristian H. Kristensen [Thu, 18 Apr 2019 18:33:10 +0000 (11:33 -0700)]
gallium/auxiliary/vl: Fix a couple of warnings

Remove unused functions and mark unhandled default case with
unreachable.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoegl/dri2: Mark potentially unused 'display' variable with MAYBE_UNUSED
Kristian H. Kristensen [Thu, 18 Apr 2019 18:30:22 +0000 (11:30 -0700)]
egl/dri2: Mark potentially unused 'display' variable with MAYBE_UNUSED

Sometimes there is no X11 platform.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agoralloc: Fully qualify non-virtual destructor call
Kristian H. Kristensen [Thu, 18 Apr 2019 18:28:12 +0000 (11:28 -0700)]
ralloc: Fully qualify non-virtual destructor call

This suppresses the warning about calling a non-virtual destructor in a
non-final class with virtual functions:

src/compiler/glsl/ast.h:53:4: warning: destructor called on non-final 'ast_node' that has virtual functions but non-virtual destructor [-Wdelete-non-virtual-dtor]
   DECLARE_LINEAR_ZALLOC_CXX_OPERATORS(ast_node);

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agonir_opcodes.py: Saturate to expression that doesn't overflow
Kristian H. Kristensen [Thu, 18 Apr 2019 18:23:13 +0000 (11:23 -0700)]
nir_opcodes.py: Saturate to expression that doesn't overflow

Compiler warns about overflow when assigning UINT64_MAX to something
smaller than a uint64_t:

src/compiler/nir/nir_constant_expressions.c:16909:50: warning: implicit conversion from 'unsigned long long' to 'uint1_t' (aka 'unsigned char') changes value from 18446744073709551615 to 255 [-Wconstant-conversion]
            uint1_t dst = (src0 + src1) < src0 ? UINT64_MAX : (src0 + src1);
                    ~~~                          ^~~~~~~~~~

Shift UINT64_MAX down to the appropriate maximum value for the type
being assigned to.
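
The shift can be sketched in plain C as follows.  This is an
illustration, not the generated constant-folding code: the helper names
are hypothetical, and the overflow test is phrased as a comparison
against the per-type maximum rather than the wrap-around check quoted in
the warning.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum unsigned value representable in bit_size bits, obtained by
 * shifting UINT64_MAX down.  bit_size must be in [1, 64]. */
static uint64_t
uint_max_for_bits(unsigned bit_size)
{
   assert(bit_size >= 1 && bit_size <= 64);
   return UINT64_MAX >> (64 - bit_size);
}

/* Saturating unsigned add for a bit_size-wide type.  Both sources are
 * assumed to already fit in bit_size bits. */
static uint64_t
uadd_sat(uint64_t src0, uint64_t src1, unsigned bit_size)
{
   const uint64_t max = uint_max_for_bits(bit_size);

   /* src0 <= max, so (max - src0) cannot underflow. */
   return src1 > max - src0 ? max : src0 + src1;
}
```

For an 8-bit destination the saturated value is 255 instead of a
truncated UINT64_MAX, so no implicit narrowing of the constant is
needed.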

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoglsl_to_nir: Initialize debug variable
Kristian H. Kristensen [Wed, 10 Apr 2019 20:10:48 +0000 (13:10 -0700)]
glsl_to_nir: Initialize debug variable

If we want to assert on found == true when the loop exits early, we
need to initialize it to false.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agotgsi: Mark tgsi_strings_check() unused
Kristian H. Kristensen [Wed, 10 Apr 2019 20:09:01 +0000 (13:09 -0700)]
tgsi: Mark tgsi_strings_check() unused

It's there to hold the static asserts, don't warn about it being
unused.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
5 years agoanv: limit URB reconfigurations when using blorp
Lionel Landwerlin [Wed, 6 Mar 2019 11:42:14 +0000 (11:42 +0000)]
anv: limit URB reconfigurations when using blorp

If the last graphics pipeline bound to the command buffer has enough
space in its VS URB entries for Blorp then avoid reconfiguring the URB
partitions.

v2: s/0/MESA_SHADER_VERTEX/ (Caio)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agointel/devinfo: add basic sanity tests on device database
Lionel Landwerlin [Thu, 11 Apr 2019 11:12:38 +0000 (12:12 +0100)]
intel/devinfo: add basic sanity tests on device database

v2: #undef NDEBUG (Eric)
    Use inc_include & inc_src (Eric)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agointel/devinfo: fix missing num_thread_per_eu on ICL
Lionel Landwerlin [Thu, 11 Apr 2019 11:20:36 +0000 (12:20 +0100)]
intel/devinfo: fix missing num_thread_per_eu on ICL

There was an assumption that num_thread_per_eu would be set in the
Gen8 features. Since this is mostly the same for all gen8->11 (except
GEN9_LP that overwrites it) let's just factor it out.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
5 years agonir: Use the nir_builder _imm helpers in setting up deref offsets.
Eric Anholt [Wed, 17 Apr 2019 17:12:48 +0000 (10:12 -0700)]
nir: Use the nir_builder _imm helpers in setting up deref offsets.

When looking at the dEQP nested_struct_array_dynamic_index_fragment code
after lowering, I was horrified at the amount of adding and multiplying by
0 we were doing.  The builder _imm helpers handle that for you so that the
following optimization passes have less work to do.  Plus, it's easier to
read.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Fix deref offset calculation for structs.
Eric Anholt [Wed, 17 Apr 2019 17:09:14 +0000 (10:09 -0700)]
nir: Fix deref offset calculation for structs.

We were calculating the offset for the field within the struct, and just
dropping it on the floor.  Fixes a regression in
KHR-GLES3.shaders.struct.local.nested_struct_array_dynamic_index_fragment
and a few of its friends since the scratch lowering commit.

Fixes: e8e159e9df40 ("nir/deref: Add helpers for getting offsets")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agolima: enable nir fsign lowering in ppir
Erico Nunes [Tue, 16 Apr 2019 20:49:51 +0000 (22:49 +0200)]
lima: enable nir fsign lowering in ppir

The mali utgard pp doesn't support a sign instruction.
Use the nir lowering function for fsign to implement fsign in ppir.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agonir/algebraic: add lowering for fsign
Erico Nunes [Tue, 16 Apr 2019 20:49:41 +0000 (22:49 +0200)]
nir/algebraic: add lowering for fsign

The mali utgard pp doesn't support a sign instruction.
In the ARM offline shader compiler, the sign function is implemented
using sub(gt(0.0, a), lt(0.0, a)).
This is a generic optimization, so implement it in the nir level when
lower_fsign is set, alongside the lowering for isign.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agodocs: s/Aptril/April/
Brian Paul [Fri, 19 Apr 2019 14:30:27 +0000 (08:30 -0600)]
docs: s/Aptril/April/

Found by Manuel Huber.  Trivial.

5 years agolima/ppir: support ppir_op_ceil
Erico Nunes [Tue, 16 Apr 2019 21:21:24 +0000 (23:21 +0200)]
lima/ppir: support ppir_op_ceil

Add a few missing ppir_op_ceil enum handling entries to implement
nir_op_fceil in lima ppir.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
5 years agoradv: Support VK_EXT_inline_uniform_block.
Bas Nieuwenhuizen [Thu, 14 Mar 2019 10:20:53 +0000 (11:20 +0100)]
radv: Support VK_EXT_inline_uniform_block.

Basically just reserve the memory in the descriptor sets.

On the shader side we construct a buffer descriptor, since
AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken.

This fully supports update after bind and variable descriptor set
sizes. However, the limits are somewhat arbitrary and are mostly
about finding a reasonable division of a 2 GiB max memory size over
the set.

v2: - rebased on top of master (Samuel)
    - remove the loading resources rework (Samuel)
    - only load UBO descriptors if it's a pointer (Samuel)
    - use LLVMBuildPtrToInt to avoid IR failures (Samuel)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)
5 years agoac/nir: use the new raw/struct SSBO atomic intrinsics for comp_swap
Samuel Pitoiset [Thu, 18 Apr 2019 07:09:55 +0000 (09:09 +0200)]
ac/nir: use the new raw/struct SSBO atomic intrinsics for comp_swap

This is actually fixed now.

This change requires LLVM r358579. Make sure to have it in
your tree, otherwise the following piglit will hang:

tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoac/nir: only use the new raw/struct SSBO atomic intrinsics with LLVM 9+
Samuel Pitoiset [Thu, 18 Apr 2019 07:06:49 +0000 (09:06 +0200)]
ac/nir: only use the new raw/struct SSBO atomic intrinsics with LLVM 9+

They are buggy with older LLVM versions, see r358579.

Fixes: 78c551aca1c ("ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+
Samuel Pitoiset [Thu, 18 Apr 2019 07:17:04 +0000 (09:17 +0200)]
ac/nir: only use the new raw/struct image atomic intrinsics with LLVM 9+

They are buggy with LLVM 8 because they weren't marked as source
of divergence, see r358579.

Fixes: dd0172e865f ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agoiris: Be less aggressive at postdraw work skipping
Kenneth Graunke [Thu, 18 Apr 2019 21:35:26 +0000 (14:35 -0700)]
iris: Be less aggressive at postdraw work skipping

We empty the cache sets when flushing the batch, at which point we need
to add any framebuffer related BOs even though the bindings haven't
changed.  So, we now do the cache set tracking unconditionally.

For now, we continue skipping resolve work based on the same conditions
in the predraw functions - the thinking is if we didn't trigger
resolves, there's nothing to update here.  Time will tell if this works.

Partly reverts commit 365886ebe1a54f893b688b457553eead6aa572ea, and
fixes Unigine Valley rendering on Gen9+.  Drops drawoverhead scores
by about 10-12%.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110353

5 years agointel/fs: Account for live range lengths in spill costs
Jason Ekstrand [Sat, 13 Apr 2019 21:01:50 +0000 (16:01 -0500)]
intel/fs: Account for live range lengths in spill costs

The current register allocator has a concept of "spill benefit" which is
based on the number of nodes with which a given node interferes.  The
idea is that you want to spill stuff with high interference because
those are the most likely registers to help when spilling.  However,
this fails to take into account the length of the live range so the
allocator frequently picks "cheap" (not many uses) registers which are
actually very short lived and so spilling them doesn't help with the
pressure situation.

This commit takes into account the length of the live range to make
long-lived registers more likely to get spilled than short-lived ones.
This encourages the spill chooser to choose slightly larger registers
which will affect a larger area of the program and hopefully we have to
spill fewer of them to get the same reduction in over-all register
pressure.
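
A simplified sketch of the idea follows.  It ignores the interference
based "spill benefit" entirely and assumes the simplest possible
weighting, dividing each register's spill cost by the length of its live
range; the actual weighting function, the vreg_info struct and
choose_spill_reg() are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical per-register data. */
struct vreg_info {
   float spill_cost;  /* estimated cost of spilling (roughly, its uses) */
   int   live_start;  /* first instruction index where it is live */
   int   live_end;    /* last instruction index where it is live */
};

/* Return the index of the register with the lowest length-adjusted
 * cost, or -1 if no register has a positive live range length. */
static int
choose_spill_reg(const struct vreg_info *regs, size_t count)
{
   int best = -1;
   float best_cost = 0.0f;

   for (size_t i = 0; i < count; i++) {
      int live_length = regs[i].live_end - regs[i].live_start;
      if (live_length <= 0)
         continue;  /* spilling this cannot relieve any pressure */

      /* A longer live range lowers the adjusted cost, making the
       * register a more attractive spill candidate. */
      float adjusted = regs[i].spill_cost / (float)live_length;
      if (best < 0 || adjusted < best_cost) {
         best = (int)i;
         best_cost = adjusted;
      }
   }

   return best;
}
```

With this weighting a register that costs twice as much to spill but
lives fifty times longer is preferred over a cheap, short-lived one.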

Shader-db results on Kaby Lake:

    total spills in shared programs: 23664 -> 12050 (-49.08%)
    spills in affected programs: 19243 -> 7629 (-60.35%)
    helped: 296
    HURT: 8

    total fills in shared programs: 32028 -> 25139 (-21.51%)
    fills in affected programs: 20378 -> 13489 (-33.81%)
    helped: 295
    HURT: 16

Of course, most of that is in Deus Ex...

Shader-db results on Kaby Lake (without Deus Ex):

    total spills in shared programs: 6479 -> 5834 (-9.96%)
    spills in affected programs: 3231 -> 2586 (-19.96%)
    helped: 40
    HURT: 4

    total fills in shared programs: 17165 -> 17099 (-0.38%)
    fills in affected programs: 6951 -> 6885 (-0.95%)
    helped: 40
    HURT: 7

Even without Deus Ex, the spill help is pretty respectable.  The worst
hurt shaders were one compute shader in Aztec Ruins and one fragment
shader in KSP that were each hurt by around 13% fill and 9% spill.

VkPipeline-db results on Kaby Lake:

    total spills in shared programs: 9149 -> 8069 (-11.80%)
    spills in affected programs: 5197 -> 4117 (-20.78%)
    helped: 27
    HURT: 16

    total fills in shared programs: 26390 -> 25477 (-3.46%)
    fills in affected programs: 12662 -> 11749 (-7.21%)
    helped: 24
    HURT: 22

The Vulkan results were decidedly more mixed but we don't have nearly as
many apps in that database yet.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agovirgl/vtest: bump up protocol version + support encoded transfers
Gurchetan Singh [Sat, 15 Dec 2018 00:07:19 +0000 (16:07 -0800)]
virgl/vtest: bump up protocol version + support encoded transfers

This more accurately reflects what the drm winsys does.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: wait after issuing a transfer get
Gurchetan Singh [Sat, 15 Dec 2018 00:36:07 +0000 (16:36 -0800)]
virgl/vtest: wait after issuing a transfer get

Otherwise, there are artifacts when running Unigine Valley with
protocol version 2.

We can get away with not waiting for most buffers, but let's
be conservative.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: modify sending and receiving data for shared memory
Gurchetan Singh [Thu, 13 Dec 2018 02:01:06 +0000 (18:01 -0800)]
virgl/vtest: modify sending and receiving data for shared memory

We need to copy the shared memory region to the display target.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: receive and handle shared memory fd
Gurchetan Singh [Wed, 12 Dec 2018 17:49:35 +0000 (09:49 -0800)]
virgl/vtest: receive and handle shared memory fd

The only tricky part is with protocol 0 we can either have
a display target or resource backing store.  With protocol
2 we can have both.  Make the map/unmap functions only deal
with the resource backing store.

v2: Handle MSAA texture case.
v3: spelling
v4: Fix dangling else (@prak)
v5: mmap --> os_mmap (@prak) + added comments (@gerddie)

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: plumb support for shared memory
Gurchetan Singh [Wed, 12 Dec 2018 18:08:06 +0000 (10:08 -0800)]
virgl/vtest: plumb support for shared memory

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: add utilities for receiving fds
Gurchetan Singh [Wed, 12 Dec 2018 01:01:34 +0000 (17:01 -0800)]
virgl/vtest: add utilities for receiving fds

v2: recieve --> receive (airlied@)

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl/vtest: execute a transfer_get when flushing the front buffer
Gurchetan Singh [Wed, 12 Dec 2018 23:43:43 +0000 (15:43 -0800)]
virgl/vtest: execute a transfer_get when flushing the front buffer

This just moves everything to a helper function -- "flush_front_buffer"
will be used later.

virgl_vtest_resource_map / virgl_vtest_resource_unmap already take
care to map the display target.

Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agovirgl: wait after a flush
Gurchetan Singh [Tue, 16 Apr 2019 03:36:54 +0000 (20:36 -0700)]
virgl: wait after a flush

We really need to wait under certain circumstances, or we can end
up writing to memory at the same time the host is reading.

Partial revert of d6dc68 ("virgl: use uint16_t mask instead of separate booleans").

Test cases:
   - dEQP-GLES31.functional.texture.texture_buffer.render_modify.as_vertex_array.bufferdata
     on vtest protocol version 2
   - Flickering during Alien Isolation
Fixes: d6dc68 ("virgl: use uint16_t mask instead of separate booleans")
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-By: Piotr Rak <p.rak@samsung.com>
5 years agoanv: fix uninitialized pthread cond clock domain
Lionel Landwerlin [Thu, 18 Apr 2019 16:39:36 +0000 (17:39 +0100)]
anv: fix uninitialized pthread cond clock domain

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 843775bab78a6b ("anv: Rework fences")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>