mesa.git
Mathias Fröhlich [Fri, 7 Jun 2019 05:12:42 +0000 (07:12 +0200)]
egl: Let the caller of dri2_create_drawable decide about loaderPrivate.

In the call arguments to dri2_create_drawable decouple loaderPrivate
from dri2_surf. For all callers of dri2_create_drawable the two
pointers are the same with the exception of the gbm backed platform.
Let the calling code of dri2_create_drawable decide what
loaderPrivate shall be.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
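A minimal sketch of the calling convention this commit describes. It uses hypothetical simplified types (`struct surface`, `struct drawable`, `create_drawable`) rather than the real dri2_create_drawable() signature. The caller decides what is stored as loaderPrivate: most callers pass the surface itself, while a gbm-style caller can pass a different object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the EGL/DRI objects. */
struct drawable {
   void *loader_private;   /* handed back to the loader in callbacks */
};

struct surface {
   struct drawable draw;
};

/* The caller chooses loader_private explicitly instead of the function
 * assuming it is always the surface itself. */
static void create_drawable(struct surface *surf, void *loader_private)
{
   assert(surf != NULL);
   surf->draw.loader_private = loader_private;
}

/* Most platforms: loaderPrivate is the surface itself. */
static void create_window_surface(struct surface *surf)
{
   create_drawable(surf, surf);
}

/* gbm-like platform: the caller passes a different object. */
static void create_gbm_surface(struct surface *surf, void *gbm_surf)
{
   create_drawable(surf, gbm_surf);
}
```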
Samuel Pitoiset [Thu, 6 Jun 2019 14:31:01 +0000 (16:31 +0200)]
radv: fix alpha-to-coverage when there are unused color attachments

When alphaToCoverage is enabled, we should always write the alpha
channel of MRT0 if it's unused. This now matches RadeonSI.

This fixes the new CTS tests:
dEQP-VK.pipeline.multisample.alpha_to_coverage_unused_attachment.samples_*.alpha_invisible

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tomeu Vizoso [Fri, 7 Jun 2019 08:20:28 +0000 (10:20 +0200)]
panfrost: ci: Switch from direct Docker use to buildah

Use the infrastructure in wayland/ci-templates to build the container
images.

This prevents situations in which the images wouldn't be rebuilt, and
allows us to share some infrastructure with other projects on
freedesktop.org.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Suggested-by: Michel Dänzer <michel@daenzer.net>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Kenneth Graunke [Fri, 7 Jun 2019 08:16:16 +0000 (01:16 -0700)]
gallium/u_transfer_helper: Free the staging buffer on unmap.

u_transfer_helper sometimes mallocs a staging buffer but never freed it,
leaking the memory.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
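A minimal sketch of the map/unmap pairing, assuming a hypothetical `struct transfer` that records whether map had to malloc a staging copy. The write-back on unmap is part of this sketch, not a description of u_transfer_helper's exact behavior. The point is that the unmap path owns the staging allocation and must free it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transfer object: either maps the resource directly or
 * goes through a malloc'd staging copy. */
struct transfer {
   unsigned char *resource;  /* backing storage of the "resource" */
   size_t size;
   unsigned char *staging;   /* non-NULL only if map had to malloc */
};

static void *transfer_map(struct transfer *t, unsigned char *res,
                          size_t size, bool need_staging)
{
   t->resource = res;
   t->size = size;
   t->staging = NULL;
   if (!need_staging)
      return res;

   t->staging = malloc(size);
   if (!t->staging)
      return NULL;
   memcpy(t->staging, res, size);
   return t->staging;
}

static void transfer_unmap(struct transfer *t)
{
   if (t->staging) {
      /* Copy the staging contents back, then free the buffer; without
       * this free(), every staged map leaks `size` bytes. */
      memcpy(t->resource, t->staging, t->size);
      free(t->staging);
      t->staging = NULL;
   }
}
```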
Lionel Landwerlin [Sat, 8 Jun 2019 20:48:02 +0000 (23:48 +0300)]
intel/gpu_dump: fix argument passing

We were dropping the " or ' quotes around arguments grouped together.
This was triggering failures with:

   $ ./framemetrics -g "Memory Writes Distribution Gen9" -o /tmp/output.csv -f ./my.trace 10 11

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Eric Engestrom [Thu, 16 May 2019 14:37:28 +0000 (15:37 +0100)]
util/os_file: suppress sign comparison warning

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Engestrom [Thu, 16 May 2019 12:08:53 +0000 (13:08 +0100)]
util/os_file: fix error being sign-cast back and forth

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Engestrom [Thu, 16 May 2019 14:02:45 +0000 (15:02 +0100)]
util/os_file: avoid shadowing read() with a local variable

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Engestrom [Thu, 16 May 2019 13:57:07 +0000 (14:57 +0100)]
util/os_file: actually return the error read() gave us

Fixes: 316964709e21286c2af5 "util: add os_read_file() helper"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
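The four util/os_file commits above all concern error handling around read(). Below is a minimal sketch of a whole-file reader that follows the same rules. It uses a hypothetical name (`read_whole_file`) rather than Mesa's os_read_file(). The error read() reports is preserved in errno, the local result variable does not shadow read(), and the ssize_t result is cast to size_t only once it is known to be positive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read a whole file into a NUL-terminated malloc'd
 * buffer. Returns NULL on failure with errno set to the original error. */
static char *read_whole_file(const char *path, size_t *out_len)
{
   assert(path != NULL);

   int fd = open(path, O_RDONLY);
   if (fd < 0)
      return NULL;                  /* errno already set by open() */

   size_t cap = 4096, len = 0;
   char *buf = malloc(cap + 1);     /* +1 for the terminating NUL */
   if (!buf) {
      close(fd);
      errno = ENOMEM;
      return NULL;
   }

   for (;;) {
      /* Named "got", not "read", so the read() function is not shadowed. */
      ssize_t got = read(fd, buf + len, cap - len);
      if (got < 0) {
         if (errno == EINTR)
            continue;
         int err = errno;           /* keep read()'s error as-is */
         free(buf);
         close(fd);
         errno = err;               /* don't let close() clobber it */
         return NULL;
      }
      if (got == 0)
         break;                     /* end of file */
      len += (size_t)got;           /* got > 0, so the cast is safe */
      if (len == cap) {
         char *bigger = realloc(buf, cap * 2 + 1);
         if (!bigger) {
            free(buf);
            close(fd);
            errno = ENOMEM;
            return NULL;
         }
         buf = bigger;
         cap *= 2;
      }
   }

   close(fd);
   buf[len] = '\0';
   if (out_len)
      *out_len = len;
   return buf;
}
```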
Alexandros Frantzis [Wed, 5 Jun 2019 13:50:11 +0000 (16:50 +0300)]
virgl: Work around possible memory exhaustion

Since we don't normally flush before performing copy transfers, it's
possible in some scenarios to use too much memory for staging resources
and start failing. This can happen either because we exhaust the total
available memory (including system memory virtio-gpu swaps out to), or,
more commonly, because the total size of resources in a command buffer
doesn't fit in virtio-gpu video memory.

To reduce the chances of this happening, force a flush before a copy
transfer if the total size of queued staging resources exceeds a certain
limit. Since any queued staging resources will eventually be released
after a flush, this ensures both that each command buffer doesn't
require too much video memory, and that we don't end up consuming too
much memory for staging resources in total.

Fixes kernel errors reported when running texture_upload tests in glbench.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
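A minimal sketch of the flush heuristic, assuming a hypothetical per-context byte counter and an arbitrary 64 MiB limit. The actual virgl field names and threshold are not taken from the commit. In this sketch the flush happens when the next copy transfer would push the queued total over the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit; the value used by virgl may differ. */
#define STAGING_FLUSH_LIMIT (64u * 1024u * 1024u)

struct ctx {
   uint64_t queued_staging_bytes; /* staging memory referenced by the cmdbuf */
   unsigned flushes;
};

static void ctx_flush(struct ctx *c)
{
   /* Staging resources queued before the flush will eventually be
    * released, so accounting for the new command buffer restarts at 0. */
   c->queued_staging_bytes = 0;
   c->flushes++;
}

/* Called before emitting a copy transfer that uses `size` staging bytes. */
static void ctx_queue_copy_transfer(struct ctx *c, uint64_t size)
{
   if (c->queued_staging_bytes + size > STAGING_FLUSH_LIMIT)
      ctx_flush(c);
   c->queued_staging_bytes += size;
}
```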
Alexandros Frantzis [Mon, 27 May 2019 21:06:03 +0000 (00:06 +0300)]
virgl: Remove incorrect resource wait condition

Now that we have copy transfers in place, we can remove the incorrect
resource wait condition. Copy transfers and other optimizations minimize
the performance impact of this removal, while providing the correct
behavior.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Alexandros Frantzis [Fri, 24 May 2019 11:03:28 +0000 (14:03 +0300)]
virgl: Use copy transfers for textures

Extend copy transfers to also be used for busy textures.

Performance results:
Unigine Valley, qemu before: 22.7 FPS after: 23.1 FPS

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Alexandros Frantzis [Wed, 8 May 2019 09:10:21 +0000 (12:10 +0300)]
virgl: Use buffer copy transfers to avoid waiting when mapping

We typically need to wait for a buffer to become ready before mapping,
so that we don't write new contents while the host is still using the
old contents. However, if we are allowed to discard the contents of the
mapped buffer range, then we can avoid waiting by using a staging buffer
range which we guarantee to never be busy, copying from the staging
buffer range to the target buffer in the host.

This commit implements this optimization by utilizing a dedicated
u_upload_mgr for the staging buffer.

Performance results:
Twilight Struggle (Steam/Proton), qemu before: 7 FPS after: 25 FPS
glmark2 ubo, qemu before: 38 FPS after: 331 FPS

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
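A minimal sketch of the mapping decision described above, with hypothetical names. It models only which path a map takes. The u_upload_mgr staging allocation and the copy_transfer3d command are not modeled. The `copy_transfer_cap` flag stands in for the virgl capability introduced alongside copy_transfer3d.

```c
#include <assert.h>
#include <stdbool.h>

enum map_path {
   MAP_DIRECT,          /* resource idle: map it directly */
   MAP_WAIT_THEN_MAP,   /* busy and contents must be kept: wait first */
   MAP_COPY_TRANSFER,   /* busy but range may be discarded: use staging */
};

struct map_request {
   bool resource_busy;     /* host may still be using the old contents */
   bool discard_range;     /* caller allows discarding the mapped range */
   bool copy_transfer_cap; /* host supports copy transfers */
};

static enum map_path choose_map_path(const struct map_request *r)
{
   if (!r->resource_busy)
      return MAP_DIRECT;
   /* Write into a never-busy staging range and let the host copy it into
    * the target buffer later, instead of waiting for the host. */
   if (r->discard_range && r->copy_transfer_cap)
      return MAP_COPY_TRANSFER;
   return MAP_WAIT_THEN_MAP;
}
```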
Alexandros Frantzis [Wed, 8 May 2019 13:17:53 +0000 (16:17 +0300)]
virgl: Support copy transfers

Support transfers that use a different resource as the source of data to
transfer. This will be used in upcoming commits to send data to host
buffers through a transfer upload buffer, in order to avoid waiting
when the buffer resource is busy.

Note that we don't support queueing copy transfers in the transfer
queue. Copy transfers should be emitted directly in the command queue,
which allows us to avoid flushes before them and leads to better
performance.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Alexandros Frantzis [Tue, 14 May 2019 09:38:24 +0000 (12:38 +0300)]
virgl: Add copy_transfer3d definitions

Introduce definitions for the copy_transfer3d protocol command and virgl
capability. This command transfers data to the host by copying through
another resource, and will be used in upcoming commits to avoid waiting
when transferring data for busy resources.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Alexandros Frantzis [Tue, 4 Jun 2019 13:43:31 +0000 (16:43 +0300)]
virgl: Make VIRGL_BIND_STAGING resources cacheable

This could help performance when trying to recreate such resources for
copy transfers.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Alexandros Frantzis [Mon, 20 May 2019 10:00:38 +0000 (13:00 +0300)]
virgl: Support VIRGL_BIND_STAGING

Support a new virgl bind type for staging buffers which don't require
dedicated host-side storage. These will be used to implement copy
transfers.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Alexandros Frantzis [Thu, 23 May 2019 18:16:48 +0000 (21:16 +0300)]
virgl: Avoid unfinished transfer_get with PIPE_TRANSFER_DONTBLOCK

If we are not allowed to block, and we know that we will have to wait,
either because the resource is busy, or because it will become busy due
to a readback, return early to avoid performing an incomplete
transfer_get. Such an incomplete transfer_get may finish at any time,
during which another unsynchronized map could write to the resource
contents, leaving the contents in an undefined state.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
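A minimal sketch of the early return, using a hypothetical `struct res_state`. The readback and wait paths are only stubbed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct res_state {
   bool busy;            /* host still using the resource */
   bool needs_readback;  /* map requires a transfer_get from the host */
};

/* Returns NULL ("would block") instead of starting a transfer_get that
 * could complete later and race with an unsynchronized write. */
static void *map_resource(const struct res_state *s, bool dont_block,
                          void *storage)
{
   /* A readback makes the resource busy, so either condition means we
    * would have to wait. */
   bool would_wait = s->busy || s->needs_readback;
   if (dont_block && would_wait)
      return NULL;
   /* Otherwise: issue the readback if needed and wait (not modeled). */
   return storage;
}
```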
Alexandros Frantzis [Tue, 4 Jun 2019 13:40:33 +0000 (16:40 +0300)]
virgl: Deduplicate checks for resource caching

Also fixes a missed check for VIRGL_BIND_CUSTOM in one of the duplicate
code snippets.

Note that legacy fences also use VIRGL_BIND_CUSTOM, but we ensured they
don't go through the cache in the previous commit.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
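A minimal sketch of folding the duplicated checks into one predicate. The bit values and the exact set of cacheable bind types are illustrative assumptions. What the commits above imply is that VIRGL_BIND_CUSTOM belongs in the cacheable set, while legacy fences, which also use it, must bypass the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only. */
#define BIND_VERTEX_BUFFER   (1u << 0)
#define BIND_INDEX_BUFFER    (1u << 1)
#define BIND_CONSTANT_BUFFER (1u << 2)
#define BIND_CUSTOM          (1u << 3)
#define BIND_STAGING         (1u << 4) /* made cacheable by a later commit */
#define BIND_RENDER_TARGET   (1u << 5)

/* One shared predicate instead of copies that can drift apart
 * (one copy had forgotten BIND_CUSTOM). Illustrative set only. */
static bool bind_is_cacheable(uint32_t bind)
{
   return bind == BIND_CONSTANT_BUFFER ||
          bind == BIND_INDEX_BUFFER ||
          bind == BIND_VERTEX_BUFFER ||
          bind == BIND_CUSTOM ||
          bind == BIND_STAGING;
}

/* Legacy fences also use BIND_CUSTOM but derive their status from the
 * resource's creation busy state, so they must never come from the cache. */
static bool may_use_cache(uint32_t bind, bool is_legacy_fence)
{
   return !is_legacy_fence && bind_is_cacheable(bind);
}
```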
Alexandros Frantzis [Wed, 5 Jun 2019 07:32:01 +0000 (10:32 +0300)]
virgl: Don't try to use cached resources for legacy fences

Resources for fences should not be from the cache, since we are basing
the fence status on the resource creation busy status.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Alexandros Frantzis [Thu, 23 May 2019 11:58:46 +0000 (14:58 +0300)]
virgl: More info about chosen alignment value

Add more info about why the value of VIRGL_MAP_BUFFER_ALIGNMENT was chosen.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Chia-I Wu [Thu, 16 May 2019 22:01:36 +0000 (15:01 -0700)]
virgl: store all info about atomic buffers

We will need the full info.  This also speeds up
virgl_attach_res_atomic_buffers and fixes resource leaks when the
context is destroyed.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Thu, 16 May 2019 21:33:15 +0000 (14:33 -0700)]
virgl: add shader images to virgl_shader_binding_state

It replaces virgl_context::images.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Thu, 16 May 2019 21:33:15 +0000 (14:33 -0700)]
virgl: add SSBOs to virgl_shader_binding_state

It replaces virgl_context::ssbos.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Thu, 16 May 2019 21:00:54 +0000 (14:00 -0700)]
virgl: add UBOs to virgl_shader_binding_state

It replaces virgl_context::ubos.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Chia-I Wu [Thu, 16 May 2019 20:32:18 +0000 (13:32 -0700)]
virgl: add virgl_shader_binding_state

virgl_shader_binding_state will be used to manage all per-stage
shader bindings.  For now, it manages only sampler views.

This replaces virgl_textures_info and fixes some issues:

 - start_slot is now honored
 - views outside of [start_slot, start_slot+count) are unmodified
 - views are released when the context is destroyed

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
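A minimal sketch of the three fixes listed above. It assumes a hypothetical refcounted `struct view` and a fixed slot count rather than the real pipe_sampler_view and virgl_shader_binding_state types.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIEWS 16

/* Hypothetical refcounted sampler view. */
struct view {
   int refcount;
};

/* Take a reference on src before dropping dst, so src == dst is safe. */
static void view_reference(struct view **dst, struct view *src)
{
   if (src)
      src->refcount++;
   if (*dst)
      (*dst)->refcount--;
   *dst = src;
}

struct binding_state {
   struct view *views[MAX_VIEWS];
};

/* Only slots [start_slot, start_slot + count) change; a NULL `views`
 * array unbinds that range. */
static void set_sampler_views(struct binding_state *b, unsigned start_slot,
                              unsigned count, struct view **views)
{
   assert(start_slot + count <= MAX_VIEWS);
   for (unsigned i = 0; i < count; i++)
      view_reference(&b->views[start_slot + i], views ? views[i] : NULL);
}

/* Release every bound view when the context is destroyed. */
static void binding_state_release(struct binding_state *b)
{
   for (unsigned i = 0; i < MAX_VIEWS; i++)
      view_reference(&b->views[i], NULL);
}
```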
Kenneth Graunke [Fri, 7 Jun 2019 19:41:28 +0000 (12:41 -0700)]
iris: Zero shs->cbuf0 when binding a passthrough TCS

Fixes valgrind errors when running two CTS tests back to back:
- KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*
(The first test has an actual TCS, the second uses passthrough.)

Jason Ekstrand [Fri, 7 Jun 2019 20:13:30 +0000 (15:13 -0500)]
intel/blorp: Only double the fast-clear rect alignment on HSW

This restriction was accidentally added to the BSpec/PRM as a general
restriction (not limited to HSW) starting with the HSW docs, and it was
never removed.  However, it only ever applied to HSW and can actually
cause problems on BDW and above, where we have mipmapped fast-clears.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Rob Clark [Thu, 6 Jun 2019 16:55:33 +0000 (09:55 -0700)]
freedreno/a6xx: re-arrange program stageobj/group

Split out a separate program config state group to run early before the
other groups.

This seems to help w/ intermittent "missed tiles" (although I had
assumed that was a mem2gmem issue), or at least I can't reproduce that
issue with this patch, but can without.

It also has the benefit that HLSQ_VS_CNTL.CONSTLEN matches for VS and BS.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Thu, 6 Jun 2019 17:22:04 +0000 (10:22 -0700)]
freedreno/a6xx: fix hangs with newer sqe fw

With the newer (v1.76) fw, we were getting hangs (compared to older
v1.66 fw).  Re-work the GMEM code to structure things a bit closer to
the blob.  This moves some PKT7 packets from IB2 to IB1, which I think
is what was confusing SQE and causing it to get stuck in an infinite
loop.  But in general, structuring things closer to the way the
blob does makes it easier to compare cmdstream.

Note: this is a bit on the large side for what I'd normally consider for
stable, but right now it looks like it is the newer fw that is headed
for linux-firmware.  This should definitely have some soak time on
master, but it is probably a good idea for this patch to end up in distro
mesa builds by the time a630_sqe.fw hits linux-firmware.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Thu, 6 Jun 2019 17:19:07 +0000 (10:19 -0700)]
freedreno/a6xx: WFI before RB_CCU_CNTL writes

This seems to be in a block of non buffered/context regs.  Blob always
WFIs before write, so probably a good idea.

Annoyingly, compared to earlier gens, it is a bit harder to tell from the
register offset whether it is a buffered reg; it isn't as simple as
everything below 0x2000, it seems.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Thu, 6 Jun 2019 16:53:15 +0000 (09:53 -0700)]
freedreno/a6xx: don't pre-dispatch texture fetch on accident

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Rob Clark [Thu, 6 Jun 2019 16:45:25 +0000 (09:45 -0700)]
freedreno/a6xx: fix issues with gallium HUD

In some cases the draw for the text wasn't working.  This seems to be
fixed by resyncing some of the "golden registers" from blob (initial
values were based on a somewhat older blob version).

Perhaps good to have a bit of soak time on master, but would be good
to eventually land in 19.x stable branches.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Nanley Chery [Wed, 24 Oct 2018 21:50:32 +0000 (14:50 -0700)]
anv/cmd_buffer: Initialize the clear color struct for CNL+

On CNL+, the clear color struct is composed of RGBA channel values and
fields which are either reserved by the HW or used to control
fast-clears. Currently anv initializes the channel values to zero and
allows the other fields to be undefined.

Satisfy the MBZ field requirements by removing an optimization that
doesn't hold true for CNL+ and pulling in the number of dwords to
initialize from ISL.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jon Turney [Thu, 6 Jun 2019 15:44:08 +0000 (16:44 +0100)]
glx/windows: Fix compilation with -Werror-format

Fix compilation where the DWORD type is used with a format specifier,
after -Werror-format was added by c9c1e261.

Some Win32 API types are different fundamental types in the 32-bit and
64-bit versions. This problem is then further compounded by the fact
that whilst both 32-bit Cygwin and 32-bit MinGW use the ILP32 data
model, 64-bit MinGW uses the LLP64 data model, but 64-bit Cygwin uses
the LP64 data model. This makes it nearly impossible to write printf
format specifiers which are correct for all those targets.

In the Win32 API, DWORD is an unsigned, 32-bit type.  So, it is defined
in terms of an unsigned long, except in the LP64 data model used by
64-bit Cygwin, where it is an unsigned int.

It should always be safe to cast it to unsigned int and use %u or %x.

Reviewed-by: Eric Anholt <eric@anholt.net>
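A minimal sketch of the recommended cast. Since <windows.h> is not assumed to be available, `DWORD_T` is a hypothetical stand-in typedef. On the real targets DWORD's fundamental type varies as described above, while the cast to unsigned int keeps %u and %x correct.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the Win32 DWORD (unsigned long on ILP32/LLP64, unsigned
 * int on 64-bit Cygwin). uint32_t is used only so this builds anywhere. */
typedef uint32_t DWORD_T;

static void format_error(char *out, size_t n, DWORD_T err)
{
   /* DWORD is always 32 bits and so is unsigned int on all of these
    * targets, so the cast is lossless and matches %u / %x exactly. */
   snprintf(out, n, "error %u (0x%x)", (unsigned int)err, (unsigned int)err);
}
```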
Kenneth Graunke [Fri, 7 Jun 2019 18:26:20 +0000 (11:26 -0700)]
iris: Rename bind_state to bind_shader_state.

bind_state is possibly the worst name ever.  For create, we used
create_shader_state, which is more descriptive.  Put shader in the name.

Kenneth Graunke [Fri, 7 Jun 2019 00:36:09 +0000 (17:36 -0700)]
isl: Mark enum isl_channel_select packed so it becomes 1 byte.

I recently discovered that the following code led to valgrind errors:

   struct isl_swizzle swizzle = ISL_SWIZZLE_IDENTITY;
   VALGRIND_CHECK_MEM_IS_DEFINED(&swizzle, sizeof(swizzle));

which is surprising, because struct isl_swizzle is simply:

   struct isl_swizzle {
      enum isl_channel_select r:4;
      enum isl_channel_select g:4;
      enum isl_channel_select b:4;
      enum isl_channel_select a:4;
   };

and the above code initializes all of them with a C99 initializer.
Iván Briano reminded me that C99 initializers don't necessarily zero
padding.  A quick inspection revealed that sizeof(struct isl_swizzle)
was 4 (rather than the expected 2).  Ian Romanick suggested changing
it to uint16_t, since this is essentially dicing up an unsigned, and
that worked.

This patch marks enum isl_channel_select packed, changing its size
from 4 bytes to 1 byte.  This then makes struct isl_swizzle 2 bytes,
with no bogus padding fields.  This eliminates valgrind undefined
memory warnings.

These isl_swizzle values become part of our BLORP blit program keys,
which are then hashed.  This undefined padding was being included in
the hashing, possibly leading to issues.  I originally saw this error
when running KHR-GL45.texture_size_promotion.functional in iris under
valgrind.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
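A minimal sketch of the layout change, assuming the GCC/Clang `__attribute__((packed))` enum attribute and channel values chosen for illustration. Each 4-bit field now lives in a 1-byte unit, so the struct is 2 bytes with no padding for an initializer to leave undefined. The uint16_t variant suggested above is shown alongside.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative channel-select values; packed makes the enum 1 byte. */
enum __attribute__((packed)) chan_select {
   CHAN_ZERO  = 0,
   CHAN_ONE   = 1,
   CHAN_RED   = 4,
   CHAN_GREEN = 5,
   CHAN_BLUE  = 6,
   CHAN_ALPHA = 7,
};

/* Four 4-bit fields in 1-byte units: 2 bytes total, no padding. */
struct swizzle {
   enum chan_select r:4;
   enum chan_select g:4;
   enum chan_select b:4;
   enum chan_select a:4;
};

/* Alternative: dice up a uint16_t directly. */
struct swizzle_u16 {
   uint16_t r:4, g:4, b:4, a:4;
};

_Static_assert(sizeof(enum chan_select) == 1, "packed enum is 1 byte");
_Static_assert(sizeof(struct swizzle) == 2, "no padding bytes");
_Static_assert(sizeof(struct swizzle_u16) == 2, "no padding bytes");
```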
Alyssa Rosenzweig [Thu, 6 Jun 2019 19:04:16 +0000 (12:04 -0700)]
panfrost/ci: Texture wrap tests are legitimately fixed

These depended on the wallpaper reload.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 6 Jun 2019 18:58:57 +0000 (11:58 -0700)]
panfrost/midgard: Lower inot to inor with 0

We were previously lowering to inand, but the second arg was not
duplicated, so inot would always return ~0. Oops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
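A minimal sketch of the identity behind the fix, modeling the Midgard ops as plain C functions (hypothetical names, not the compiler's code). OR with zero is the identity, so inot(a) == inor(a, 0). The old inand lowering was only correct with the source duplicated. The zero second source in `broken_inot` is an assumption used to reproduce the "always ~0" symptom.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t inor(uint32_t a, uint32_t b)  { return ~(a | b); }
static uint32_t inand(uint32_t a, uint32_t b) { return ~(a & b); }

/* New lowering: inot(a) == inor(a, 0), since a | 0 == a. */
static uint32_t lower_inot(uint32_t a) { return inor(a, 0); }

/* Old lowering needed inand(a, a) == ~a. Assuming the unduplicated
 * second source reads as 0, it yields ~(a & 0) == ~0 for every a. */
static uint32_t broken_inot(uint32_t a) { return inand(a, 0); }
```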
Alyssa Rosenzweig [Thu, 6 Jun 2019 18:20:21 +0000 (11:20 -0700)]
panfrost/midgard: Cleanup tag fetch in disassembler

Trivial.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 6 Jun 2019 18:19:44 +0000 (11:19 -0700)]
panfrost/midgard: Use fancy iterator

Trivial cleanup.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 6 Jun 2019 18:19:13 +0000 (11:19 -0700)]
panfrost/midgard: Cull dead branches

This fixes bugs with complex control flow.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 6 Jun 2019 18:18:30 +0000 (11:18 -0700)]
panfrost/midgard: Add mir_print_bundle helper

This helps with debugging scheduling/emission.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 6 Jun 2019 17:21:57 +0000 (10:21 -0700)]
panfrost/midgard/disasm: Pretty-print branch tags

Just makes it a little more obvious what's going on.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 6 Jun 2019 16:42:23 +0000 (09:42 -0700)]
panfrost/ci: Note some since-fixed tests

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 6 Jun 2019 16:15:26 +0000 (09:15 -0700)]
panfrost/midgard: Vectorize I/O

This uses the new mesa/st functionality for NIR I/O vectorization, which
eliminates a number of corner cases (which caused assorted dEQP failures
and regressions) and should improve performance substantially due to
lessened pressure on the load/store pipe.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 6 Jun 2019 15:21:27 +0000 (08:21 -0700)]
panfrost/midgard: Remove varyings delay pass

This pass interfered with the more delicate path required for
non-vectorized I/O. It is also ugly and duplicates the job of an actual
honest-to-goodness scheduler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 6 Jun 2019 15:16:04 +0000 (08:16 -0700)]
panfrost/midgard: Apply component to load_input

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Eric Engestrom [Fri, 7 Jun 2019 15:04:25 +0000 (16:04 +0100)]
nir: fix s/&&/||/ typo

Fixes: cd73b6174b093b75f581 "nir/lower_to_source_mods: Stop turning add, sat, and neg into mov"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kristian H. Kristensen [Fri, 7 Jun 2019 03:29:38 +0000 (20:29 -0700)]
freedreno/a6xx: Drop struct stage array

This now boils down to just picking between binning or vertex shader
and dummy_fs or real fs, which we can do in a couple of lines of code
instead.  The constlen logic isn't doing what it thinks it's doing:
both constlens at this point

  MAX2(s[VS].constlen, align(state->bs->constlen, 4));

are binning shader constlens.  We'll have to revisit the constlen
logic, but this commit doesn't change how it works.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kristian H. Kristensen [Tue, 4 Jun 2019 20:44:48 +0000 (13:44 -0700)]
freedreno/a6xx: Drop support for SS6_DIRECT shader upload

a6xx only supports indirect shaders.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kristian H. Kristensen [Thu, 6 Jun 2019 04:35:35 +0000 (21:35 -0700)]
freedreno/a6xx: Share shader_t_to_opcode

We have a similar function in fd6_program.c. Move to fd6_emit.h and
share.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kristian H. Kristensen [Mon, 3 Jun 2019 21:12:59 +0000 (14:12 -0700)]
freedreno/a6xx: Consolidate more of dword 0 building in fd6_draw_vbo

There's already a bit of duplicated logic here and tessellation will
add more. Build up dword 0 in fd6_draw_vbo() and drop the a4xx in the
process.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Kristian H. Kristensen [Mon, 3 Jun 2019 21:01:14 +0000 (14:01 -0700)]
freedreno: Move fd4_size2indextype() helper to freedreno_util.h

In preparation for refactoring fd6_draw.c a bit.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Samuel Pitoiset [Thu, 30 May 2019 07:58:01 +0000 (09:58 +0200)]
radv: enable VK_EXT_sample_locations

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 30 May 2019 08:26:43 +0000 (10:26 +0200)]
radv: enable HTILE for images that might need variable sample locations

This is now supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 30 May 2019 10:27:29 +0000 (12:27 +0200)]
radv: handle sample locations during automatic layout transitions

From the Vulkan spec 1.1.109:

   "Some implementations may need to evaluate depth image values
    while performing image layout transitions. To accommodate this,
    instances of the VkSampleLocationsInfoEXT structure can be
    specified for each situation where an explicit or automatic
    layout transition has to take place. [...] and
    VkRenderPassSampleLocationsBeginInfoEXT can be chained from
    VkRenderPassBeginInfo to provide sample locations for layout
    transitions performed implicitly by a render pass instance."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 30 May 2019 12:10:42 +0000 (14:10 +0200)]
radv: determine the first subpass id for every attachment

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
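A minimal sketch of the bookkeeping in the subject line. It assumes hypothetical flattened per-subpass lists of attachment indices instead of radv's render pass structures. The idea is to walk the subpasses in order and record the first one that references each attachment.

```c
#include <assert.h>
#include <stdint.h>

#define ATTACHMENT_UNUSED UINT32_MAX /* like VK_ATTACHMENT_UNUSED (~0U) */
#define NO_SUBPASS        UINT32_MAX /* attachment never referenced */

/* Hypothetical subpass: the attachment indices it references. */
struct subpass {
   const uint32_t *attachments;
   uint32_t count;
};

static void compute_first_subpass(const struct subpass *subpasses,
                                  uint32_t subpass_count,
                                  uint32_t *first_subpass,
                                  uint32_t attachment_count)
{
   for (uint32_t a = 0; a < attachment_count; a++)
      first_subpass[a] = NO_SUBPASS;

   for (uint32_t s = 0; s < subpass_count; s++) {
      for (uint32_t i = 0; i < subpasses[s].count; i++) {
         uint32_t a = subpasses[s].attachments[i];
         if (a == ATTACHMENT_UNUSED || a >= attachment_count)
            continue;
         if (first_subpass[a] == NO_SUBPASS)
            first_subpass[a] = s; /* earliest subpass wins */
      }
   }
}
```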
Samuel Pitoiset [Thu, 30 May 2019 10:23:21 +0000 (12:23 +0200)]
radv: handle sample locations during explicit depth/stencil transitions

From the Vulkan spec 1.1.109:

   "Some implementations may need to evaluate depth image values
    while performing image layout transitions. To accommodate this,
    instances of the VkSampleLocationsInfoEXT structure can be
    specified for each situation where an explicit or automatic
    layout transition has to take place. VkSampleLocationsInfoEXT
    can be chained from VkImageMemoryBarrier structures to provide
    sample locations for layout transitions performed by
    vkCmdWaitEvents and vkCmdPipelineBarrier calls."

This handles explicit depth/stencil layout transitions performed
with CmdWaitEvents() or CmdPipelineBarrier().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 30 May 2019 10:20:12 +0000 (12:20 +0200)]
radv: allow the depth decompress pass to emit dynamic sample locations

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 30 May 2019 09:52:56 +0000 (11:52 +0200)]
radv: allow setting dynamic sample locations for the depth decompress pass

If VK_EXT_sample_locations is used, the driver might need to emit
the sample locations specified during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 30 May 2019 09:50:22 +0000 (11:50 +0200)]
radv: allow saving/restoring sample locations during meta operations

This will be used for the depth decompress pass that might need
to emit variable sample locations during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Kenneth Graunke [Fri, 7 Jun 2019 07:57:25 +0000 (00:57 -0700)]
iris: Sweep the NIR in iris_create_uncompiled_shader().

We run a ton of backend specific passes here (mostly brw_preprocess_nir)
and ought to sweep up any unused memory at this point, since we're going
to hang on to this NIR for as long as the linked program lives.

Eduardo Lima Mitev [Sun, 12 May 2019 22:33:57 +0000 (00:33 +0200)]
ir3: Use the new NIR lowering pass for integer multiplication

Shader-db stats courtesy of Eric Anholt:

total instructions in shared programs: 6480215 -> 6475457 (-0.07%)
instructions in affected programs: 662105 -> 657347 (-0.72%)
helped: 1209
HURT: 13
total constlen in shared programs: 1432704 -> 1427769 (-0.34%)
constlen in affected programs: 100063 -> 95128 (-4.93%)
helped: 512
HURT: 0
total max_sun in shared programs: 875561 -> 873387 (-0.25%)
max_sun in affected programs: 46179 -> 44005 (-4.71%)
helped: 1087
HURT: 0

Reviewed-by: Eric Anholt <eric@anholt.net>
Eduardo Lima Mitev [Sun, 12 May 2019 22:23:58 +0000 (00:23 +0200)]
ir3/nir: Add new NIR AlgebraicPass for lowering imul

Currently, the ir3 backend compiler lowers integer multiplication from:

dst = a * b

to:

dst = (al * bl) + (ah * bl << 16) + (al * bh << 16)

by emitting this code:

mull.u tmp0, a, b           ; mul low, i.e. al * bl
madsh.m16 tmp1, a, b, tmp0  ; mul-add shift high mix, i.e. ah * bl << 16
madsh.m16 dst, b, a, tmp1   ; i.e. al * bh << 16

which at that point has very low chances of being optimized.

This patch adds a new nir_algebraic.AlgebraicPass to perform this
lowering during the NIR algebraic optimization passes, giving the
resulting code a better chance of being optimized.
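The decomposition can be checked in plain C. This is a minimal sketch, assuming the operand semantics described in this series (umul_low computes al * bl, imadsh_mix16 computes ((ah * bl) << 16) + c); the C function names only mirror the NIR opcode names and are not Mesa's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Product of the low 16-bit halves (al * bl), as described for
 * umul_low / mull.u. */
static uint32_t umul_low(uint32_t a, uint32_t b)
{
   return (a & 0xffffu) * (b & 0xffffu);
}

/* ((ah * bl) << 16) + c, as described for imadsh_mix16 / madsh.m16.
 * All arithmetic wraps modulo 2^32. */
static uint32_t imadsh_mix16(uint32_t a, uint32_t b, uint32_t c)
{
   return (((a >> 16) * (b & 0xffffu)) << 16) + c;
}

/* dst = (al * bl) + (ah * bl << 16) + (al * bh << 16).
 * The ah * bh term would be shifted left by 32, so it vanishes
 * modulo 2^32 and needs no instruction. */
static uint32_t lowered_imul(uint32_t a, uint32_t b)
{
   uint32_t tmp0 = umul_low(a, b);           /* mull.u    tmp0, a, b       */
   uint32_t tmp1 = imadsh_mix16(a, b, tmp0); /* madsh.m16 tmp1, a, b, tmp0 */
   return imadsh_mix16(b, a, tmp1);          /* madsh.m16 dst, b, a, tmp1  */
}

/* Compares the lowered sequence against a plain 32-bit multiply. */
static void check_lowering(uint32_t a, uint32_t b)
{
   assert(lowered_imul(a, b) == (uint32_t)(a * b));
}
```

check_lowering() holds for any operand pair, since only the partial products that land entirely above bit 31 are dropped.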

Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agonir_algebraic: Add basic optimizations for umul_low and imadsh_mix16
Eduardo Lima Mitev [Sun, 12 May 2019 22:09:38 +0000 (00:09 +0200)]
nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16

For umul_low (al * bl), zero is returned if the low 16-bit word of either
source is zero.

For imadsh_mix16 ((ah * bl << 16) + c), c is returned if either 'ah' or
'bl' is zero.

A couple of nir_search_helpers are added:

is_upper_half_zero() returns true if the upper half of every component
of an integer NIR ALU source is zero.

is_lower_half_zero() returns true if the lower half of every component
of an integer NIR ALU source is zero.
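The two folding rules can be sketched the same way. The helpers below are hypothetical stand-ins that inspect a plain array of 32-bit constant components instead of a nir_alu_src, so this only illustrates the conditions under which the rules fire, not the nir_search_helpers code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: true if the upper 16 bits of every component are zero. */
static bool is_upper_half_zero(const uint32_t *comps, size_t n)
{
   for (size_t i = 0; i < n; i++)
      if ((comps[i] & 0xffff0000u) != 0)
         return false;
   return true;
}

/* Hypothetical: true if the lower 16 bits of every component are zero. */
static bool is_lower_half_zero(const uint32_t *comps, size_t n)
{
   for (size_t i = 0; i < n; i++)
      if ((comps[i] & 0x0000ffffu) != 0)
         return false;
   return true;
}

/* Opcode semantics as described in this series. */
static uint32_t umul_low(uint32_t a, uint32_t b)
{
   return (a & 0xffffu) * (b & 0xffffu);
}

static uint32_t imadsh_mix16(uint32_t a, uint32_t b, uint32_t c)
{
   return (((a >> 16) * (b & 0xffffu)) << 16) + c;
}

/* The simplifications: umul_low -> 0 when either low half is zero,
 * imadsh_mix16 -> c when 'ah' or 'bl' is zero. */
static void check_rules(uint32_t a, uint32_t b, uint32_t c)
{
   if (is_lower_half_zero(&a, 1) || is_lower_half_zero(&b, 1))
      assert(umul_low(a, b) == 0);
   if (is_upper_half_zero(&a, 1) || is_lower_half_zero(&b, 1))
      assert(imadsh_mix16(a, b, c) == c);
}
```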

Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agoir3/compiler: Handle new alu opcodes 'umul_low' and 'imadsh_mix16'
Eduardo Lima Mitev [Sun, 12 May 2019 19:12:59 +0000 (21:12 +0200)]
ir3/compiler: Handle new alu opcodes 'umul_low' and 'imadsh_mix16'

They directly emit ir3_MULL_U and ir3_MADSH_M16 respectively.

Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agonir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes
Eduardo Lima Mitev [Fri, 29 Mar 2019 09:49:12 +0000 (10:49 +0100)]
nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes

'umul_low' multiplies the low 16 bits of its two unsigned 32-bit sources
and returns the 32-bit product. It maps directly to ir3's MULL_U.

'imadsh_mix16' is multiply-add with shift and mix, computing
((ah * bl) << 16) + c. It is an ir3-specific instruction that maps
directly to ir3's IMADSH_M16.

Both are necessary for the lowering of integer multiplication on
Freedreno, which will be introduced later in this series.

Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agov3d: don't emit point coordinates varyings if the FS doesn't read them
Iago Toral Quiroga [Thu, 6 Jun 2019 08:04:27 +0000 (10:04 +0200)]
v3d: don't emit point coordinates varyings if the FS doesn't read them

We still need to emit them in V3D 3.x since there is no mechanism to
disable them.

Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agov3d: add a helper to track variables that need point coordinates
Iago Toral Quiroga [Thu, 6 Jun 2019 07:41:33 +0000 (09:41 +0200)]
v3d: add a helper to track variables that need point coordinates

Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agoegl/x11: calloc dri2_surf so it's properly zeroed
Kenneth Graunke [Fri, 7 Jun 2019 05:17:06 +0000 (22:17 -0700)]
egl/x11: calloc dri2_surf so it's properly zeroed

Commit 2282ec0a refactored drawable creation across various platforms
into a new dri2_create_drawable helper function.

The GBM code in platform_drm.c passed in dri2_surf->gbm_surf as the
loaderPrivate, while most other backends passed in dri2_surf directly.

To try and handle this, the patch checked if dri2_surf->gbm_surf was
non-NULL, and if so, presumed that the caller is the DRM platform and
we should use the dri2_surf->gbm_surf pointer.

This worked for most platforms, which calloc their dri2_surf structure,
zeroing the data.  Unfortunately, platform_x11.c used malloc, leaving
most of the dri2_surf as garbage.  In particular, dri2_surf->gbm_surf
was often non-NULL, causing dri2_create_drawable to try and use it,
passing a garbage pointer to the createNewDrawable hook, usually leading
to a SIGBUS or SIGSEGV when trying to dereference that bad pointer.

Since most callers calloc the data, make platform_x11.c follow suit.

Fixes crashes with i915_dri.so when running dEQP-GLES2.
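A reduced model of the failure, for illustration only: struct fake_dri2_surf, choose_loader_private() and create_x11_surface() are hypothetical names, not EGL code. It mirrors the "non-NULL gbm_surf means DRM platform" heuristic and shows why the structure has to start out zeroed (calloc() yields NULL pointers here because null pointers are all-bits-zero on mainstream platforms).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for the EGL surface struct. */
struct fake_dri2_surf {
   void *gbm_surf;   /* only ever set by the GBM/DRM platform */
   int width, height;
};

/* The heuristic described above: a non-NULL gbm_surf is taken to mean
 * that the DRM platform is the caller. */
static void *choose_loader_private(struct fake_dri2_surf *surf)
{
   return surf->gbm_surf ? surf->gbm_surf : (void *)surf;
}

static struct fake_dri2_surf *create_x11_surface(void)
{
   /* calloc() zeroes every byte, so gbm_surf starts out as NULL.  With
    * malloc() the field would be indeterminate, and the heuristic above
    * could hand that garbage pointer to the driver. */
   struct fake_dri2_surf *surf = calloc(1, sizeof(*surf));
   assert(surf == NULL || surf->gbm_surf == NULL);
   return surf;
}
```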

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2 years agotests/graw: use C99 print conversion specifier for 32 bit builds
Mark Janes [Thu, 6 Jun 2019 05:48:41 +0000 (22:48 -0700)]
tests/graw: use C99 print conversion specifier for 32 bit builds

Fixes formatting errors for 32 bit compilations, e.g.:

  error: format specifies type 'unsigned long' but the argument has
  type 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat]
  printf("result1 = %lu result2 = %lu\n", res1.u64, res2.u64);
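The portable form uses the <inttypes.h> format macros. A minimal sketch, assuming PRIu64 is the C99 specifier meant here (the commit does not name it); format_results() is a hypothetical helper, not the graw test code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* PRIu64 expands to the correct length modifier for uint64_t on the
 * target (typically "lu" on 64-bit Linux, "llu" on 32-bit builds), so
 * one format string is correct for both. */
static int format_results(char *buf, size_t len, uint64_t res1, uint64_t res2)
{
   int n = snprintf(buf, len, "result1 = %" PRIu64 " result2 = %" PRIu64 "\n",
                    res1, res2);
   assert(n >= 0 && (size_t)n < len); /* output was not truncated */
   return n;
}
```

The original line would become printf("result1 = %" PRIu64 " result2 = %" PRIu64 "\n", res1.u64, res2.u64);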

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agopanfrost/midgard: Fix crash with unused SSA values
Alyssa Rosenzweig [Thu, 6 Jun 2019 15:15:23 +0000 (08:15 -0700)]
panfrost/midgard: Fix crash with unused SSA values

Crash introduced in "b38dab101ca7e0896255dccbd85fd510c47d84d1" but not
adding a Fixes tag since it's our bug anyway.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2 years agopanfrost: Report sRGB colorspace as not supported
Boris Brezillon [Thu, 6 Jun 2019 16:44:09 +0000 (18:44 +0200)]
panfrost: Report sRGB colorspace as not supported

The driver does not support sRGB yet, so let's report it as unsupported.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2 years agodocs: do not use div for line-breaking
Erik Faye-Lund [Thu, 6 Jun 2019 09:19:08 +0000 (11:19 +0200)]
docs: do not use div for line-breaking

HTML has the <p>-tag for this purpose. It adds some margins, but that
just makes this read better, IMO.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2 years agodocs: fixup code-tag positioning
Erik Faye-Lund [Thu, 6 Jun 2019 08:11:31 +0000 (10:11 +0200)]
docs: fixup code-tag positioning

This reads better if we include the asterisk in the code-block, as it's
part of the function-reference, even though it's not technically
speaking code. But as the <code>-tag isn't purely for code, this should
be fine.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2 years agodocs: add missing code-tags
Erik Faye-Lund [Thu, 6 Jun 2019 08:16:36 +0000 (10:16 +0200)]
docs: add missing code-tags

Looks like I missed a few cases when I recently added more code-tags
here. So let's add these cases as well.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2 years agodocs: add accidentally dropped "at"
Erik Faye-Lund [Thu, 6 Jun 2019 09:01:54 +0000 (11:01 +0200)]
docs: add accidentally dropped "at"

When rewriting 20c56e18c21 after review, I accidentally dropped the "at"
here. Sorry for that, and let's fix it up!

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 20c56e18c21 ("docs: use proper links instead of code-tags")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2 years agoanv: allow NV12 <--> AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 inter-op
Gurchetan Singh [Wed, 5 Jun 2019 16:51:07 +0000 (09:51 -0700)]
anv: allow NV12 <--> AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 inter-op

AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 is an implementation-defined
flexible YUV format.  Most of the time, it's NV12 or YV12.
On Intel, NV12 is preferred since it can be used by the display
engine.

This API adds a dependency between gralloc and buffer consumers,
unfortunately.  Right now, the code seems to work for i915 gralloc,
but not cros_gralloc.  Add a preprocessor flag to fix this.

TEST=android.graphics.cts.MediaVulkanGpuTest#testMediaImportAndRendering

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2 years agoac/nir: Remove stale TODO
Connor Abbott [Wed, 5 Jun 2019 14:54:24 +0000 (16:54 +0200)]
ac/nir: Remove stale TODO

While we're here, copy the comment explaining this from radeonsi.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2 years agoradeonsi: Don't force dcc disable for loads
Connor Abbott [Wed, 5 Jun 2019 10:37:46 +0000 (12:37 +0200)]
radeonsi: Don't force dcc disable for loads

When e9d935ed0e2 added force_dcc_off(), we forced it off for any
preloaded image descriptor that had stores associated with it, since
the same preloaded descriptors were used for loads and stores. However,
when the preloading was removed in 16be87c9042, the existing logic was
kept despite it not being necessary anymore. The comment above
force_dcc_off() only mentions stores, so only force DCC off for stores.

Cc: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2 years agomesa/main: Expose EXT_clip_control and related enums and the function
Gert Wollny [Sat, 11 May 2019 15:48:18 +0000 (17:48 +0200)]
mesa/main: Expose EXT_clip_control and related enums and the function

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2 years agomapi/glapi/registry: Update gl.xml to latest upstream version
Gert Wollny [Sat, 11 May 2019 15:44:17 +0000 (17:44 +0200)]
mapi/glapi/registry: Update gl.xml to latest upstream version

The old copy didn't include EXT_clip_control, so update it.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2 years agovirgl: Enable CAP_CLIP_HALFZ if host supports it
Gert Wollny [Tue, 7 May 2019 17:50:46 +0000 (19:50 +0200)]
virgl: Enable CAP_CLIP_HALFZ if host supports it

On hosts that support it, this makes the following piglit tests pass:
  arb_clip_control-*

v2: sync flag with host

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2 years agosvga: Remove unnecessary check for the pre flush bit for setting vertex buffers
Charmaine Lee [Tue, 7 May 2019 21:07:50 +0000 (14:07 -0700)]
svga: Remove unnecessary check for the pre flush bit for setting vertex buffers

This fixes the missing rebind when the can_pre_flush bit
is not set and the vertex buffers are the same as those already sent.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Signed-off-by: Charmaine Lee <charmainel@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2 years agowinsys/svga/drm: Fix 32-bit RPCI send message
Deepak Rawat [Wed, 9 May 2018 22:50:39 +0000 (15:50 -0700)]
winsys/svga/drm: Fix 32-bit RPCI send message

Depending on whether the code is compiled with a frame pointer or not, the
temporary memory location used for the bp parameter in these macros is
referenced relative to the stack pointer or the frame pointer.
Hence we can never reference that parameter when we've modified either
the stack pointer or the frame pointer, because then the compiler would
generate an incorrect stack reference.

Fix this by pushing the temporary memory parameter on a known location on
the stack before modifying the stack- and frame pointers.

Also, in case of failure, the RPCI channel was not closed, which led to
the vmx running out of channels.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Deepak Rawat <drawat@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
2 years agoradv: set the subpass before any initial subpass transitions
Samuel Pitoiset [Thu, 30 May 2019 13:13:59 +0000 (15:13 +0200)]
radv: set the subpass before any initial subpass transitions

This might fix initial subpass transitions when multiview is used.
Noticed while implementing sample locations during layout transitions.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2 years agoanv: Fix check for isl_fmt in assert
Nataraj Deshpande [Wed, 5 Jun 2019 19:32:01 +0000 (12:32 -0700)]
anv: Fix check for isl_fmt in assert

It is more appropriate to check the returned isl_fmt value in the assert
than the format variable.

Fixes: f1654fa7e31 "anv/android: support creating images from external format"
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2 years agov3d: fix scheduling dependency tracking for ALU with small immediates
Iago Toral Quiroga [Wed, 5 Jun 2019 06:53:10 +0000 (08:53 +0200)]
v3d: fix scheduling dependency tracking for ALU with small immediates

We were not accounting for small immediates in the B mux, so the scheduler
was interpreting them as regular register file accesses, which could
lead to additional (incorrect) write-read dependencies.

Shader-db changes:

total instructions in shared programs: 9163664 -> 9137263 (-0.29%)
instructions in affected programs: 3931035 -> 3904634 (-0.67%)
helped: 12457
HURT: 2563

total max-temps in shared programs: 1325787 -> 1325597 (-0.01%)
max-temps in affected programs: 5746 -> 5556 (-3.31%)
helped: 186
HURT: 16
helped stats (abs) min: 1 max: 4 x̄: 1.12 x̃: 1
helped stats (rel) min: 1.45% max: 22.22% x̄: 4.42% x̃: 3.28%
HURT stats (abs)   min: 1 max: 3 x̄: 1.12 x̃: 1
HURT stats (rel)   min: 2.86% max: 10.00% x̄: 5.76% x̃: 5.88%
95% mean confidence interval for max-temps value: -1.04 -0.84
95% mean confidence interval for max-temps %-change: -4.16% -3.07%
Max-temps are helped.

Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agolima/ppir: add missing handling of min/max ops for vec4 add slot
Vasily Khoruzhick [Tue, 4 Jun 2019 15:56:38 +0000 (08:56 -0700)]
lima/ppir: add missing handling of min/max ops for vec4 add slot

Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2 years agolima/ppir: fix crash when program uses no registers at all
Vasily Khoruzhick [Sat, 1 Jun 2019 05:30:54 +0000 (22:30 -0700)]
lima/ppir: fix crash when program uses no registers at all

A program may need no register allocation at all, e.g. when it consists
of a single discard op.

Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
2 years agoutil/hash_table: Assert that keys are not reserved pointers
Jason Ekstrand [Wed, 5 Jun 2019 22:30:47 +0000 (17:30 -0500)]
util/hash_table: Assert that keys are not reserved pointers

If we insert a NULL key, it will appear to succeed but will mess up
entry counting.  Similar errors can occur if someone accidentally
inserts the deleted key.  The latter is highly unlikely but technically
possible so we should guard against it too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agoutil/set: Assert that keys are not reserved pointers
Jason Ekstrand [Wed, 5 Jun 2019 21:56:20 +0000 (16:56 -0500)]
util/set: Assert that keys are not reserved pointers

If we insert a NULL key, it will appear to succeed but will mess up
entry counting.  Similar errors can occur if someone accidentally
inserts the deleted key.  The latter is highly unlikely but technically
possible so we should guard against it too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agoglsl/loop_analysis: Don't search for NULL variables in the hash table
Jason Ekstrand [Wed, 5 Jun 2019 23:35:14 +0000 (18:35 -0500)]
glsl/loop_analysis: Don't search for NULL variables in the hash table

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2 years agonir/propagate_invariant: Don't add NULL vars to the hash table
Jason Ekstrand [Wed, 5 Jun 2019 21:54:40 +0000 (16:54 -0500)]
nir/propagate_invariant: Don't add NULL vars to the hash table

Fixes: 8410cf66d "nir/propagate_invariant: Skip unknown vars"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2 years agointel/compiler: Treat b32csel as potentially producing a Boolean result for resolve...
Ian Romanick [Tue, 4 Jun 2019 19:16:55 +0000 (12:16 -0700)]
intel/compiler: Treat b32csel as potentially producing a Boolean result for resolve analysis

If the 2nd and 3rd sources are both Boolean values, we can potentially
avoid a resolve by only resolving the result of the b32csel.

No changes on any Gen6+ Intel platform.

v2: Use ?: instead of cast from bool to unsigned.  Suggested by Caio.

Iron Lake
total instructions in shared programs: 8142729 -> 8142677 (<.01%)
instructions in affected programs: 12890 -> 12838 (-0.40%)
helped: 26
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.25% max: 0.74% x̄: 0.45% x̃: 0.38%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.52% -0.39%
Instructions are helped.

total cycles in shared programs: 188549632 -> 188549394 (<.01%)
cycles in affected programs: 60754 -> 60516 (-0.39%)
helped: 25
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 9.92 x̃: 8
helped stats (rel) min: 0.07% max: 2.23% x̄: 0.59% x̃: 0.27%
HURT stats (abs)   min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel)   min: 0.70% max: 0.70% x̄: 0.70% x̃: 0.70%
95% mean confidence interval for cycles value: -12.91 -5.40
95% mean confidence interval for cycles %-change: -0.84% -0.23%
Cycles are helped.

GM45
total instructions in shared programs: 5013119 -> 5013093 (<.01%)
instructions in affected programs: 6764 -> 6738 (-0.38%)
helped: 13
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.68% x̄: 0.43% x̃: 0.36%
95% mean confidence interval for instructions value: -2.00 -2.00
95% mean confidence interval for instructions %-change: -0.52% -0.34%
Instructions are helped.

total cycles in shared programs: 128977804 -> 128977700 (<.01%)
cycles in affected programs: 37738 -> 37634 (-0.28%)
helped: 13
HURT: 0
helped stats (abs) min: 8 max: 8 x̄: 8.00 x̃: 8
helped stats (rel) min: 0.18% max: 0.46% x̄: 0.30% x̃: 0.26%
95% mean confidence interval for cycles value: -8.00 -8.00
95% mean confidence interval for cycles %-change: -0.36% -0.24%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2 years agointel/fs: Improve discard_if code generation
Ian Romanick [Tue, 21 May 2019 00:25:01 +0000 (17:25 -0700)]
intel/fs: Improve discard_if code generation

Previously we would blindly emit a sequence like:

        mov(1)          f0.1<1>UW       g1.14<0,1,0>UW
        ...
        cmp.l.f0(16)    g7<1>F          g5<8,8,1>F      0x41700000F  /* 15F */
(+f0.1) cmp.z.f0.1(16)  null<1>D        g7<8,8,1>D      0D

The first move sets the flags based on the initial execution mask.
Later discard sequences contain a predicated compare that can only
remove more SIMD channels.  Often the only user of the result from
the first compare is the second compare.  Instead, generate a sequence
like:

        mov(1)          f0.1<1>UW       g1.14<0,1,0>UW
        ...
        cmp.l.f0(16)    g7<1>F          g5<8,8,1>F      0x41700000F  /* 15F */
(+f0.1) cmp.ge.f0.1(8)  null<1>F        g5<8,8,1>F      0x41700000F  /* 15F */

If the results stored in g7 and f0.0 are not used, the comparison will
be eliminated.  This removes an instruction and potentially reduces
register pressure.

v2: Major re-write of the commit message (including fixing the assembly
code).  Suggested by Matt.

All Gen8+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224434 -> 17198659 (-0.15%)
instructions in affected programs: 2908125 -> 2882350 (-0.89%)
helped: 18891
HURT: 5
helped stats (abs) min: 1 max: 12 x̄: 1.38 x̃: 1
helped stats (rel) min: 0.03% max: 25.00% x̄: 1.76% x̃: 1.02%
HURT stats (abs)   min: 9 max: 105 x̄: 51.40 x̃: 35
HURT stats (rel)   min: 0.43% max: 4.92% x̄: 2.34% x̃: 1.56%
95% mean confidence interval for instructions value: -1.39 -1.34
95% mean confidence interval for instructions %-change: -1.79% -1.73%
Instructions are helped.

total cycles in shared programs: 361468458 -> 361170679 (-0.08%)
cycles in affected programs: 38470116 -> 38172337 (-0.77%)
helped: 16202
HURT: 1456
helped stats (abs) min: 1 max: 4473 x̄: 26.24 x̃: 18
helped stats (rel) min: <.01% max: 28.44% x̄: 2.90% x̃: 2.18%
HURT stats (abs)   min: 1 max: 5982 x̄: 87.51 x̃: 28
HURT stats (rel)   min: <.01% max: 51.29% x̄: 5.48% x̃: 1.64%
95% mean confidence interval for cycles value: -18.24 -15.49
95% mean confidence interval for cycles %-change: -2.26% -2.14%
Cycles are helped.

total spills in shared programs: 12147 -> 12176 (0.24%)
spills in affected programs: 175 -> 204 (16.57%)
helped: 8
HURT: 5

total fills in shared programs: 25262 -> 25292 (0.12%)
fills in affected programs: 269 -> 299 (11.15%)
helped: 8
HURT: 5

Haswell
total instructions in shared programs: 13530316 -> 13502647 (-0.20%)
instructions in affected programs: 2507824 -> 2480155 (-1.10%)
helped: 18859
HURT: 10
helped stats (abs) min: 1 max: 12 x̄: 1.48 x̃: 1
helped stats (rel) min: 0.03% max: 27.78% x̄: 2.38% x̃: 1.41%
HURT stats (abs)   min: 5 max: 39 x̄: 25.70 x̃: 31
HURT stats (rel)   min: 0.22% max: 1.66% x̄: 1.09% x̃: 1.31%
95% mean confidence interval for instructions value: -1.49 -1.44
95% mean confidence interval for instructions %-change: -2.42% -2.34%
Instructions are helped.

total cycles in shared programs: 377865412 -> 377639034 (-0.06%)
cycles in affected programs: 40169572 -> 39943194 (-0.56%)
helped: 15550
HURT: 1938
helped stats (abs) min: 1 max: 2482 x̄: 25.67 x̃: 18
helped stats (rel) min: <.01% max: 37.77% x̄: 3.00% x̃: 2.25%
HURT stats (abs)   min: 1 max: 4862 x̄: 89.17 x̃: 35
HURT stats (rel)   min: <.01% max: 67.67% x̄: 6.16% x̃: 2.75%
95% mean confidence interval for cycles value: -14.42 -11.47
95% mean confidence interval for cycles %-change: -2.05% -1.91%
Cycles are helped.

total spills in shared programs: 26769 -> 26814 (0.17%)
spills in affected programs: 826 -> 871 (5.45%)
helped: 9
HURT: 10

total fills in shared programs: 38383 -> 38425 (0.11%)
fills in affected programs: 834 -> 876 (5.04%)
helped: 9
HURT: 10

LOST:   5
GAINED: 10

Ivy Bridge
total instructions in shared programs: 12079250 -> 12044139 (-0.29%)
instructions in affected programs: 2409680 -> 2374569 (-1.46%)
helped: 16135
HURT: 0
helped stats (abs) min: 1 max: 23 x̄: 2.18 x̃: 2
helped stats (rel) min: 0.07% max: 37.50% x̄: 2.72% x̃: 1.68%
95% mean confidence interval for instructions value: -2.21 -2.14
95% mean confidence interval for instructions %-change: -2.76% -2.67%
Instructions are helped.

total cycles in shared programs: 180116747 -> 179900405 (-0.12%)
cycles in affected programs: 25439823 -> 25223481 (-0.85%)
helped: 13817
HURT: 1499
helped stats (abs) min: 1 max: 1886 x̄: 26.40 x̃: 18
helped stats (rel) min: <.01% max: 38.84% x̄: 2.57% x̃: 1.97%
HURT stats (abs)   min: 1 max: 3684 x̄: 98.99 x̃: 52
HURT stats (rel)   min: <.01% max: 97.01% x̄: 6.37% x̃: 3.42%
95% mean confidence interval for cycles value: -15.68 -12.57
95% mean confidence interval for cycles %-change: -1.77% -1.63%
Cycles are helped.

LOST:   8
GAINED: 10

Sandy Bridge
total instructions in shared programs: 10878990 -> 10863659 (-0.14%)
instructions in affected programs: 1806702 -> 1791371 (-0.85%)
helped: 13023
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.18 x̃: 1
helped stats (rel) min: 0.07% max: 13.79% x̄: 1.65% x̃: 1.10%
95% mean confidence interval for instructions value: -1.18 -1.17
95% mean confidence interval for instructions %-change: -1.68% -1.62%
Instructions are helped.

total cycles in shared programs: 154082878 -> 153862810 (-0.14%)
cycles in affected programs: 20199374 -> 19979306 (-1.09%)
helped: 12048
HURT: 510
helped stats (abs) min: 1 max: 323 x̄: 20.57 x̃: 18
helped stats (rel) min: 0.03% max: 17.78% x̄: 2.05% x̃: 1.52%
HURT stats (abs)   min: 1 max: 448 x̄: 54.39 x̃: 16
HURT stats (rel)   min: 0.02% max: 37.98% x̄: 4.13% x̃: 1.17%
95% mean confidence interval for cycles value: -17.97 -17.08
95% mean confidence interval for cycles %-change: -1.84% -1.75%
Cycles are helped.

LOST:   1
GAINED: 0

Iron Lake
total instructions in shared programs: 8155075 -> 8142729 (-0.15%)
instructions in affected programs: 949495 -> 937149 (-1.30%)
helped: 5810
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.12 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.53% x̃: 1.85%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.59% -2.48%
Instructions are helped.

total cycles in shared programs: 188584610 -> 188549632 (-0.02%)
cycles in affected programs: 17274446 -> 17239468 (-0.20%)
helped: 3881
HURT: 90
helped stats (abs) min: 2 max: 168 x̄: 9.08 x̃: 6
helped stats (rel) min: <.01% max: 23.53% x̄: 0.83% x̃: 0.30%
HURT stats (abs)   min: 2 max: 10 x̄: 2.80 x̃: 2
HURT stats (rel)   min: <.01% max: 0.60% x̄: 0.10% x̃: 0.07%
95% mean confidence interval for cycles value: -9.35 -8.27
95% mean confidence interval for cycles %-change: -0.85% -0.77%
Cycles are helped.

GM45
total instructions in shared programs: 5019308 -> 5013119 (-0.12%)
instructions in affected programs: 489028 -> 482839 (-1.27%)
helped: 2912
HURT: 0
helped stats (abs) min: 1 max: 8 x̄: 2.13 x̃: 2
helped stats (rel) min: 0.10% max: 16.67% x̄: 2.46% x̃: 1.81%
95% mean confidence interval for instructions value: -2.14 -2.11
95% mean confidence interval for instructions %-change: -2.54% -2.39%
Instructions are helped.

total cycles in shared programs: 129002592 -> 128977804 (-0.02%)
cycles in affected programs: 12669152 -> 12644364 (-0.20%)
helped: 2759
HURT: 37
helped stats (abs) min: 2 max: 168 x̄: 9.03 x̃: 4
helped stats (rel) min: <.01% max: 21.43% x̄: 0.75% x̃: 0.31%
HURT stats (abs)   min: 2 max: 10 x̄: 3.62 x̃: 4
HURT stats (rel)   min: <.01% max: 0.41% x̄: 0.10% x̃: 0.04%
95% mean confidence interval for cycles value: -9.53 -8.20
95% mean confidence interval for cycles %-change: -0.79% -0.70%
Cycles are helped.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2 years agointel/fs: Add need_dest parameter to fs_visitor::nir_emit_alu
Ian Romanick [Tue, 21 May 2019 19:09:42 +0000 (12:09 -0700)]
intel/fs: Add need_dest parameter to fs_visitor::nir_emit_alu

This is the same as the need_dest parameter to
prepare_alu_destination_and_sources.  This allows us to not change the
register that is expected to hold a result if an instruction is
re-emitted.  This is particularly a problem if the re-emitted
instruction is a partial write.  A later patch will use this feature.

No shader-db changes on any Intel platform.

v2: Don't do the Boolean resolve when there is no destination.  If the
ALU instruction didn't write a register, there's nothing to resolve.
This replaces an earlier patch "intel/fs: Allocate dummy destination
register when need_dest is false".

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2 years agointel/fs: Allow cmod propagation across reads and writes of different flags
Ian Romanick [Wed, 22 May 2019 19:32:03 +0000 (12:32 -0700)]
intel/fs: Allow cmod propagation across reads and writes of different flags

This also helps a later patch (intel/fs: Improve discard_if code
generation) on about 200 shaders.

v2: Document that other instruction sequences are also valid in
subtract_merge_with_compare_intervening_mismatch_flag_write.  Suggested
by Caio.

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 17224438 -> 17224434 (<.01%)
instructions in affected programs: 296 -> 292 (-1.35%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.99% max: 1.92% x̄: 1.43% x̃: 1.40%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.04% -0.81%
Instructions are helped.

total cycles in shared programs: 361468455 -> 361468458 (<.01%)
cycles in affected programs: 2862 -> 2865 (0.10%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.24% max: 0.39% x̄: 0.31% x̃: 0.31%
HURT stats (abs)   min: 3 max: 4 x̄: 3.50 x̃: 3
HURT stats (rel)   min: 0.32% max: 0.70% x̄: 0.51% x̃: 0.51%
95% mean confidence interval for cycles value: -4.34 5.84
95% mean confidence interval for cycles %-change: -0.70% 0.90%
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2 years agointel/fs: Fix flag_subreg handling in cmod propagation
Ian Romanick [Wed, 22 May 2019 17:18:06 +0000 (10:18 -0700)]
intel/fs: Fix flag_subreg handling in cmod propagation

There were two errors.  First, the pass could propagate conditional
modifiers from an instruction that writes one flag register to an
instruction that writes a different flag register.  For example,

    cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
    cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F

could become

    cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F

Second, if an instruction that writes f0.1 has its condition propagated,
the modified instruction will incorrectly write flag f0.0.  For example,

    linterp(16) vgrf6:F, g2:F, attr0:F
    cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F
    (-f0.1) discard_jump(16) (null):UD

could become

    linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F
    (-f0.1) discard_jump(16) (null):UD

None of these cases will occur currently.  The only time we use f0.1 is
for generating discard intrinsics.  In all those cases, we generate a
sequence like:

    cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F
    (+f0.1) cmp.z(16) null:D, vgrf7:D, 0d
    (-f0.1) discard_jump(16) (null):UD

Due to the mixed types and incompatible conditions, this sequence would
never see any cmod propagation.  The next patch will change this.
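The two rules can be stated as a small C sketch. It is a hypothetical model, not the actual cmod propagation pass: struct inst, enum cond and both functions are made up, and the real pass checks much more (types, conditions, intervening flag reads and writes). It only shows the flag sub-register bookkeeping that the two bugs above got wrong.

```c
#include <assert.h>
#include <stdbool.h>

enum cond { COND_NONE, COND_Z, COND_NZ };

/* Hypothetical instruction model: only what the flag bookkeeping needs. */
struct inst {
   enum cond cond_mod;     /* conditional modifier, COND_NONE if absent */
   unsigned flag_subreg;   /* 0 -> f0.0, 1 -> f0.1 */
};

/* First bug: an earlier instruction that already writes one flag
 * sub-register cannot stand in for a compare that writes another. */
static bool can_propagate(const struct inst *scan, const struct inst *cmp)
{
   if (scan->cond_mod != COND_NONE && scan->flag_subreg != cmp->flag_subreg)
      return false;
   return true;
}

/* Second bug: when the condition moves onto the earlier instruction, the
 * flag sub-register has to move with it, or f0.0 gets written instead. */
static void propagate(struct inst *scan, const struct inst *cmp)
{
   assert(can_propagate(scan, cmp));
   scan->cond_mod = cmp->cond_mod;
   scan->flag_subreg = cmp->flag_subreg;
}
```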

No shader-db changes on any Intel platform.

v2: Fix typo in comment in test case subtract_delete_compare_other_flag.
Noticed by Caio.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>