mesa.git
Alok Hota [Tue, 19 Jun 2018 22:22:32 +0000 (17:22 -0500)]
swr/rast: AVX512 support compiled in by default

- Emulation of AVX512 built into SIMDLIB
  - Remove associated macros
- Remove knobs controlling AVX512 and let emulation handle it
- Refactor variable names for SIMD16

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Alok Hota [Thu, 14 Jun 2018 17:30:56 +0000 (12:30 -0500)]
swr/rast: Remove deprecated 4x2 backend code

- Use 8x2 tiling by default
  - Remove associated macros
- Use SIMDLIB emulation for SIMD16 on SIMD8 hardware
- Remove code rot in Load/StoreTile

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tomasz Figa [Thu, 25 Apr 2019 17:42:04 +0000 (18:42 +0100)]
llvmpipe: Always return some fence in flush (v2)

If there is no last fence, due to no rendering happening yet, just
create a new signaled fence and return it, to match the expectations of
the EGL sync fence API.

Fixes random "Could not create sync fence 0x3003" assertion failures from
Skia on Android, coming from the following code:

https://android.googlesource.com/platform/frameworks/base/+/master/libs/hwui/pipeline/skia/SkiaOpenGLPipeline.cpp#427

Reproducible especially with thread count >= 4.

One could make the driver always keep the reference to the last fence,
but:

 - the driver seems to explicitly destroy the fence whenever a rendering
   pass completes and changing that would require a significant functional
   change to the code. (Specifically, in lp_scene_end_rasterization().)

 - it still wouldn't solve the problem of an EGL sync fence being created
   and waited on without any rendering happening at all, which is
   also likely to happen with Android code pointed to in the commit.

Therefore, the simple approach of always creating a fence is taken,
similarly to other drivers, such as radeonsi.
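
The approach described above can be sketched roughly as follows. This is a
minimal illustration, not the llvmpipe code: every sketch_* name is
hypothetical, and reference counting and locking are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a driver fence. */
struct sketch_fence {
   unsigned rank;   /* number of signals required */
   unsigned count;  /* number of signals received so far */
};

/* Hypothetical per-context state that remembers the last fence. */
struct sketch_context {
   struct sketch_fence *last_fence; /* NULL if nothing was rendered yet */
};

static bool sketch_fence_signalled(const struct sketch_fence *f)
{
   return f->count >= f->rank;
}

/* Create a fence needing `rank` signals; rank 0 is signaled from birth. */
static struct sketch_fence *sketch_fence_create(unsigned rank)
{
   struct sketch_fence *f = calloc(1, sizeof(*f));
   if (f)
      f->rank = rank;
   return f;
}

/* Flush: return the last fence if there is one, otherwise a fresh fence
 * that is already signaled, so the caller always gets something to wait on. */
static struct sketch_fence *sketch_flush(struct sketch_context *ctx)
{
   assert(ctx != NULL);
   if (ctx->last_fence)
      return ctx->last_fence;
   return sketch_fence_create(0);
}
```

A caller that flushes before any rendering gets a fence that reports
signaled right away, which is what a sync object created without prior work
would need.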

Tested with piglit llvmpipe suite with no regressions and the following
tests fixed:

egl_khr_fence_sync
 conformance
  eglclientwaitsynckhr_flag_sync_flush
  eglclientwaitsynckhr_nonzero_timeout
  eglclientwaitsynckhr_zero_timeout
  eglcreatesynckhr_default_attributes
  eglgetsyncattribkhr_invalid_attrib
  eglgetsyncattribkhr_sync_status

v2:
 - remove the useless lp_fence_reference() dance (Nicolai),
 - explain why creating the dummy fence is the right approach.

Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Emil Velikov [Thu, 25 Apr 2019 17:42:03 +0000 (18:42 +0100)]
llvmpipe: correctly handle waiting in llvmpipe_fence_finish

Currently if the timeout differs from 0, we'll end up with an infinite
wait... even if the user is perfectly clear they don't want that.

Use the new lp_fence_timedwait() helper guarding both waits in an
!lp_fence_signalled block like the rest of llvmpipe.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Emil Velikov [Thu, 25 Apr 2019 17:42:02 +0000 (18:42 +0100)]
llvmpipe: add lp_fence_timedwait() helper

The function is analogous to lp_fence_wait() but takes a timeout
(ns) parameter, as needed for EGL fence/sync.

v2:
 - use absolute UTC time, as per spec (Gustaw)
 - bail out on cnd_timedwait() failure (Gustaw)

v3:
 - check count/rank under mutex (Gustaw)
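
As a small illustration of the "absolute UTC time" note: C11 cnd_timedwait()
expects an absolute TIME_UTC-based deadline rather than a relative duration,
so a relative timeout in nanoseconds has to be added to the current time
first. The helper below is only a sketch of that arithmetic; its name and
the SKETCH_NSEC_PER_SEC constant are hypothetical, and it does not call into
any threading API itself.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Turn "now" plus a relative timeout (ns) into an absolute deadline,
 * keeping tv_nsec normalized to the range [0, 1e9). */
static struct timespec sketch_abs_deadline(struct timespec now,
                                           uint64_t timeout_ns)
{
   struct timespec abs = now;

   assert(now.tv_nsec >= 0 && now.tv_nsec < SKETCH_NSEC_PER_SEC);

   abs.tv_sec += (time_t)(timeout_ns / SKETCH_NSEC_PER_SEC);
   abs.tv_nsec += (long)(timeout_ns % SKETCH_NSEC_PER_SEC);

   /* At most one carry is possible because both parts were below 1e9. */
   if (abs.tv_nsec >= SKETCH_NSEC_PER_SEC) {
      abs.tv_sec++;
      abs.tv_nsec -= SKETCH_NSEC_PER_SEC;
   }
   return abs;
}
```

In real code "now" would typically come from timespec_get(&now, TIME_UTC),
and the wait loop would re-check the fence state under the mutex after each
wakeup, bailing out if the timed wait reports a failure or timeout.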

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Emil Velikov [Fri, 19 Apr 2019 11:11:00 +0000 (12:11 +0100)]
vulkan/wsi: don't use DUMB_CLOSE for normal GEM handles

Currently we get normal GEM handles from PrimeFDToHandle, yet we close
them with DUMB_CLOSE. Use GEM_CLOSE instead.

Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]")
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Keith Packard <keithp@keithp.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Wed, 17 Apr 2019 16:49:09 +0000 (17:49 +0100)]
vulkan/wsi: check if the display_fd given is master

As effectively required by the extension, we need to ensure we're master.

Currently drivers employ vendor specific solutions, which check if the
device behind the fd is capable*, yet none of them do the master check.

*In the radv case, if acceleration is available.

Instead of duplicating the check in each driver, keep it where it's
needed and used.

Note this copies libdrm's drmIsMaster() to avoid depending on a
bleeding-edge version of the library.

v2: set the fd to -1 if not master (Bas)

Fixes: da997ebec92 ("vulkan: Add KHR_display extension using DRM [v10]")
Cc: Andres Rodriguez <andresx7@gmail.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: Keith Packard <keithp@keithp.com>
Reported-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Fri, 19 Apr 2019 10:47:34 +0000 (11:47 +0100)]
turnip: drop dead close(master_fd)

The fd is -1, thus the block of if (fd != -1) close(fd) is dead code.

Cc: Chad Versace <chadversary@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Jason Ekstrand [Mon, 22 Apr 2019 23:03:04 +0000 (18:03 -0500)]
nir/algebraic: Optimize integer cast-of-cast

These have been popping up more and more with the OpenCL work and other
bits causing extra conversions to/from 64-bit.
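
To illustrate one case such an optimization can rely on (a sketch in plain
C, not the NIR rule itself): widening an integer to 64 bits and then
narrowing it below the original width gives the same result as a single
narrowing cast, because only the low bits survive. The sketch_* function
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Two casts: widen 32 -> 64 bits, then narrow 64 -> 16 bits. */
static uint16_t sketch_cast_of_cast(int32_t a)
{
   return (uint16_t)(int64_t)a;
}

/* One cast: narrow 32 -> 16 bits directly. */
static uint16_t sketch_single_cast(int32_t a)
{
   return (uint16_t)a;
}

/* Check that both forms agree for one input value. */
static void sketch_check(int32_t a)
{
   assert(sketch_cast_of_cast(a) == sketch_single_cast(a));
}
```

Other width combinations need their own reasoning (for example, when the
intermediate type is the narrowest one, the inner cast cannot be dropped).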

Reviewed-by: Karol Herbst <kherbst@redhat.com>
Jason Ekstrand [Thu, 25 Apr 2019 19:15:26 +0000 (14:15 -0500)]
anv/descriptor_set: Don't fully destroy sets in pool destroy/reset

In 105002bd2d617, we fixed a memory leak bug where we weren't properly
destroying descriptor sets when destroying/resetting a descriptor pool.
However, the only real leak that happened was that we take a
reference to the descriptor set layout in the descriptor set and we
weren't dropping our reference.  Everything else in the descriptor set
is tied to the pool itself and doesn't need to be freed on a per-set
basis.  This commit changes the destroy/reset functions to only bother
walking the list of sets to unref the layouts and otherwise we just
assume that the whole-pool destroy/reset takes care of the rest.
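
The reset strategy described above could look roughly like this. It is a
simplified sketch: the sketch_* types are hypothetical, the pool is modeled
as a bump allocator with a linked list of live sets, and nothing here is
the actual anv data layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted descriptor set layout. */
struct sketch_layout {
   unsigned refcount;
};

/* Hypothetical descriptor set: holds one layout reference. */
struct sketch_set {
   struct sketch_layout *layout;
   struct sketch_set *next;
};

/* Hypothetical pool: list of live sets plus a bump-allocator cursor. */
struct sketch_pool {
   struct sketch_set *sets;
   size_t next_offset;
};

static void sketch_layout_unref(struct sketch_layout *layout)
{
   assert(layout->refcount > 0);
   layout->refcount--;
}

/* Reset: only drop the per-set layout references, then rewind the pool
 * as a whole instead of destroying every set individually. */
static void sketch_pool_reset(struct sketch_pool *pool)
{
   for (struct sketch_set *s = pool->sets; s != NULL; s = s->next)
      sketch_layout_unref(s->layout);

   pool->sets = NULL;
   pool->next_offset = 0;
}
```

The per-set work is reduced to one decrement, so the cost of a reset stays
close to that of rewinding the pool.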

Now that we're doing more non-trivial things with descriptor sets such
as allocating things with util_vma_heap, per-set destruction is starting
to show up on perf traces.  This takes reset back to where it's supposed
to be as a cheap whole-pool operation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Fri, 26 Apr 2019 03:23:23 +0000 (22:23 -0500)]
anv: Better handle 32-byte alignment of descriptor set buffers

In c520f4dec9c, we chose to align the sizes of descriptor set buffers to
32 bytes.  We have to align the descriptor set buffer to 32B so that
it's valid for using with push constants.  We align the size as well so
we don't leave lots of holes with util_vma_heap_alloc.  Unfortunately,
we were only aligning it for alloc and not for free so we were still
creating piles of holes when we delete descriptor sets.  This causes
terrible perf for the allocator once we've deleted piles of descriptor
sets.

This commit reworks the code so that we align the descriptor set buffer
size to 32B for both alloc and free.  The result is that it takes the
new crucible vkResetDescriptorPool from 104.567719 to 2.898354 seconds.
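
A toy model of the alloc/free symmetry, assuming a power-of-two alignment:
if the size is rounded up on allocation, the same rounded size has to be
used when freeing, otherwise a remainder is left behind each time. The
sketch_* names are hypothetical and the "heap" is just a byte counter, not
util_vma_heap.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_DESC_ALIGN 32u

/* Round v up to a multiple of a; a must be a power of two. */
static uint32_t sketch_align_u32(uint32_t v, uint32_t a)
{
   assert(a != 0 && (a & (a - 1)) == 0);
   return (v + a - 1) & ~(a - 1);
}

/* Hypothetical heap that only tracks how many bytes are in use. */
struct sketch_heap {
   uint32_t used;
};

static void sketch_heap_alloc(struct sketch_heap *h, uint32_t size)
{
   h->used += sketch_align_u32(size, SKETCH_DESC_ALIGN);
}

/* Free with the same aligned size that was used for the allocation. */
static void sketch_heap_free(struct sketch_heap *h, uint32_t size)
{
   h->used -= sketch_align_u32(size, SKETCH_DESC_ALIGN);
}
```

Freeing the unaligned 40 bytes of a 64-byte allocation would leave 24 bytes
stranded, which is the kind of hole the message describes.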

Fixes: c520f4dec9c "anv: Add a concept of a descriptor buffer"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110497
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Dave Airlie [Fri, 26 Apr 2019 02:50:11 +0000 (12:50 +1000)]
nir: fix bit_size in lower indirect derefs.

This fixes a case where we are expecting 64-bit but generate
32-bit consts and validate gets angry.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Kenneth Graunke [Fri, 26 Apr 2019 00:33:36 +0000 (17:33 -0700)]
iris: Silence unused function warning

Marek Olšák [Mon, 8 Apr 2019 21:20:13 +0000 (17:20 -0400)]
glsl: fix shader_storage_blocks_write_access for SSBO block arrays (v2)

This fixes KHR-GL45.compute_shader.resources-max on radeonsi.

Fixes: 4e1e8f684bf "glsl: remember which SSBOs are not read-only and pass it to gallium"
v2: use is_interface_array, protect against assertion failures in u_bit_consecutive

Reviewed-by: Dave Airlie <airlied@redhat.com>
Rob Clark [Thu, 25 Apr 2019 22:48:19 +0000 (15:48 -0700)]
docs/features: update GL too

Forgot to update corresponding entries for desktop GL.. kinda wish we
didn't have to update both GLES and GL tables.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 25 Apr 2019 19:28:35 +0000 (12:28 -0700)]
freedreno/a6xx: sample-shading support

Enables:

  OES_sample_shading
  OES_sample_variables
  OES_shader_multisample_interpolation

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 25 Apr 2019 19:25:02 +0000 (12:25 -0700)]
freedreno/ir3: sample-shading support

The compiler support for:

  OES_sample_shading
  OES_sample_variables
  OES_shader_multisample_interpolation

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 25 Apr 2019 19:22:30 +0000 (12:22 -0700)]
freedreno: wire up core sample-shading support

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 25 Apr 2019 19:20:07 +0000 (12:20 -0700)]
freedreno/ir3: fix load_interpolated_input slot

The so->inputs[] table is in units of vec4

Fixes: 7ff6705b8d8 freedreno/ir3: convert to "new style" frag inputs
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 25 Apr 2019 19:13:35 +0000 (12:13 -0700)]
freedreno/a6xx: add VALIDREG/CONDREG helper macros

There are a few places that we check if a shader stage input reg is
used/valid (ie. not r63.x).. and there are about to be a bunch more.
So add some helper macros for less open-coding.
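
A sketch of what such helpers might look like, under the assumption that a
register id packs the register number and component into one value and that
r63.x marks "unused". The SKETCH_* names, the packing and the example bit
are illustrative, not copied from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed packing: register number in the upper bits, component in the
 * low two bits. */
static uint32_t sketch_regid(unsigned num, unsigned comp)
{
   assert(num <= 63 && comp <= 3);
   return (num << 2) | comp;
}

/* r63.x is treated as the "not used" marker. */
#define SKETCH_INVALID_REG     ((63u << 2) | 0u)

/* Evaluate to val if the condition holds, otherwise to 0. */
#define SKETCH_COND(b, val)    ((b) ? (val) : 0)

/* True if the register id refers to a real register. */
#define SKETCH_VALIDREG(r)     ((r) != SKETCH_INVALID_REG)

/* Emit val only when the register is valid. */
#define SKETCH_CONDREG(r, val) SKETCH_COND(SKETCH_VALIDREG(r), (val))
```

A state emit could then OR together terms such as
SKETCH_CONDREG(face_regid, SOME_ENABLE_BIT) without an explicit if for each
input.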

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 25 Apr 2019 19:00:49 +0000 (12:00 -0700)]
freedreno/ir3: rename frag_vcoord -> ij_pixel

Since this is what the value actually is.  Cleanup the name before
adding more different i,j related values for sample-shading.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 25 Apr 2019 18:54:34 +0000 (11:54 -0700)]
freedreno/ir3: remove bogus assert

tex instruction can actually return 16b values.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Fri, 19 Apr 2019 18:15:40 +0000 (11:15 -0700)]
freedreno/ir3: lower load_barycentric_at_offset

Calculates i,j at specified offset within a pixel.  A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Fri, 19 Apr 2019 18:12:34 +0000 (11:12 -0700)]
freedreno/ir3: lower load_barycentric_at_sample

This lowers load_barycentric_at_sample to load_sample_pos_from_id plus
load_barycentric_at_offset.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Fri, 19 Apr 2019 18:10:49 +0000 (11:10 -0700)]
freedreno: update generated headers

Pull in updates for sample shading.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Fri, 5 Apr 2019 22:21:05 +0000 (18:21 -0400)]
freedreno/ir3: cleanup instruction builder macros

De-duplicate the "normal" and "flags" versions of the macros, and while
at it go ahead and add "flags" versions for all the remaining macros,
since we'll at least need INSTR1F in a following commit.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Wed, 17 Apr 2019 23:24:15 +0000 (16:24 -0700)]
freedreno/ir3: more emit-cat5 fixes

Couple more opcodes which don't take a sampler id as first arg.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Fri, 5 Apr 2019 17:09:33 +0000 (13:09 -0400)]
freedreno/ir3: fix rgetpos decoding

It takes an argument.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Wed, 3 Apr 2019 23:55:34 +0000 (19:55 -0400)]
compiler: rename SYSTEM_VALUE_VARYING_COORD

And add corresponding enums for different sorts of varying
interpolation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Tue, 16 Apr 2019 18:10:01 +0000 (11:10 -0700)]
freedreno: add robustness support

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Tue, 16 Apr 2019 17:10:05 +0000 (10:10 -0700)]
freedreno/drm: update for robustness

Update UABI header and add FD_PP_PGTABLE and FD_NR_FAULTS params.

Robustness can be supported by a kernel which provides the new ABI if it
also indicates that per-process pagetables are in use.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Alyssa Rosenzweig [Thu, 25 Apr 2019 04:38:32 +0000 (04:38 +0000)]
panfrost/midgard: Add new bitwise ops

These fused NOT-ops could maybe help somehow...?

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Thu, 25 Apr 2019 04:25:33 +0000 (04:25 +0000)]
panfrost/midgard: Identify inand

This was previously thought to be inot, but it's actually a bit more
general than that! :)
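
Assuming inand computes a bitwise NAND of two sources (an assumption based
on the name; the message does not spell out the semantics), the relation to
inot can be sketched as below: NOT is simply NAND with both sources equal.
The sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed semantics: bitwise NAND of two sources. */
static uint32_t sketch_inand(uint32_t a, uint32_t b)
{
   return ~(a & b);
}

/* NOT expressed as the special case where both sources are the same. */
static uint32_t sketch_inot(uint32_t a)
{
   return sketch_inand(a, a);
}

/* Check that the special case really is a plain bitwise NOT. */
static void sketch_check_inot(uint32_t a)
{
   assert(sketch_inot(a) == (uint32_t)~a);
}
```

That would explain how an op seen only with identical sources could first
be mistaken for a plain NOT.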

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Thu, 25 Apr 2019 04:08:46 +0000 (04:08 +0000)]
panfrost/midgard: Copy prop for texture registers

We'll want to unify this with main copy prop (and extend to varyings),
but that'll take more care to handle some special cases, so leave it as
a stub pass for now.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Thu, 25 Apr 2019 03:48:08 +0000 (03:48 +0000)]
panfrost/midgard: Optimize csel involving 0

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Thu, 25 Apr 2019 03:18:34 +0000 (03:18 +0000)]
panfrost/midgard: Extend copy propagation pass

This extends copy propagation to respect output modifiers for ALU
instructions, as well as potentially fixing some bugs related to looping
(all dEQP loop tests pass).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Wed, 24 Apr 2019 23:42:30 +0000 (23:42 +0000)]
panfrost/midgard: Reduce fmax(a, 0.0) to fmov.pos

This will allow us to copyprop away the move and eliminate the
instruction entirely.
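
The equivalence behind this reduction can be sketched in scalar C, assuming
the .pos output modifier clamps negative results to zero (an assumption
about the modifier's semantics). The sketch_* names are hypothetical and
NaN inputs are deliberately left out of the comparison.

```c
#include <assert.h>

/* Reference maximum written out by hand, for non-NaN inputs. */
static float sketch_fmax(float a, float b)
{
   return a > b ? a : b;
}

/* A move whose output modifier clamps negative values to zero. */
static float sketch_fmov_pos(float a)
{
   return a < 0.0f ? 0.0f : a;
}

/* Check that both forms agree for one non-NaN input. */
static void sketch_check_pos(float a)
{
   assert(sketch_fmax(a, 0.0f) == sketch_fmov_pos(a));
}
```

Once the operation is a plain move with a modifier, a copy propagation pass
that understands output modifiers can fold it into the producing
instruction.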

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Bas Nieuwenhuizen [Sun, 7 Apr 2019 20:54:40 +0000 (22:54 +0200)]
radv: Expose Vulkan 1.1 for Android.

We have the YCBCR feature now.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 7 Apr 2019 21:01:36 +0000 (23:01 +0200)]
radv: Expose VK_EXT_ycbcr_image_arrays.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 1 Apr 2019 15:43:29 +0000 (17:43 +0200)]
radv: Enable YCBCR conversion feature.

This enables the basic YCBCR features.

We support basic multiplane formats using 8-bit and 16-bit unorms, as
well as YUV2 formats.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 15 Apr 2019 23:05:29 +0000 (01:05 +0200)]
radv: Add ycbcr subsampled & multiplane formats to csv.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 30 Jul 2018 13:45:03 +0000 (15:45 +0200)]
radv: Add ycbcr format features.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 7 Apr 2019 15:16:50 +0000 (17:16 +0200)]
radv: Add hashing for the ycbcr samplers.

Otherwise caching gets very confused.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sat, 30 Mar 2019 13:28:06 +0000 (14:28 +0100)]
radv: Run the new ycbcr lowering pass.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Thu, 21 Mar 2019 00:29:52 +0000 (01:29 +0100)]
radv: Add ycbcr lowering pass.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sat, 30 Mar 2019 02:16:04 +0000 (03:16 +0100)]
radv: Update descriptor sets for multiple planes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Thu, 21 Mar 2019 00:29:33 +0000 (01:29 +0100)]
radv: Add ycbcr samplers in descriptor set layouts.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sat, 30 Mar 2019 02:15:32 +0000 (03:15 +0100)]
ac/nir: Add support for planes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 7 Apr 2019 20:40:30 +0000 (22:40 +0200)]
radv: Allow mixed src/dst aspects in copies.

e.g. COLOR + PLANE_2, as well as COLOR + COLOR for multiplane images.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 2 Dec 2018 22:58:54 +0000 (23:58 +0100)]
radv: Add support for image views with multiple planes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 2 Dec 2018 22:58:58 +0000 (23:58 +0100)]
radv: Add ycbcr conversion structs.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 17 Feb 2019 21:17:53 +0000 (22:17 +0100)]
radv: Support different source & dest aspects for planar images in blit2d.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Tue, 17 Jul 2018 22:53:52 +0000 (00:53 +0200)]
radv: Add single plane image views & meta operations.

Copies & clears of multiplane images are not allowed so we do not
have to handle that case.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 16 Jul 2018 18:51:26 +0000 (20:51 +0200)]
radv: Add multiple planes to images.

No functional changes. This temporarily uses plane 0 for
everything.

Long term plan is that only single plane images get to use
metadata like htile/dcc/cmask/fmask.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 15 Jul 2018 23:31:09 +0000 (01:31 +0200)]
radv: Add logic for multisample format descriptions.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 15 Jul 2018 18:09:28 +0000 (20:09 +0200)]
radv: Add logic for subsampled format descriptions.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Caio Marcelo de Oliveira Filho [Fri, 19 Apr 2019 04:04:57 +0000 (21:04 -0700)]
intel/fs: Don't handle texop_tex for shaders without implicit LOD

These will be lowered by nir_lower_tex() with the
lower_tex_when_implicit_lod_not_supported, so don't need the extra
handling here.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Caio Marcelo de Oliveira Filho [Fri, 19 Apr 2019 04:01:15 +0000 (21:01 -0700)]
nir: Add option to lower tex to txl when shader don't support implicit LOD

We already add the LOD src, so go ahead and update the texop as well
when this option is set.

v2: Make it an option. (Rob Clark)

v3: Use a more concise name suggested by Jason.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Topi Pohjolainen [Sun, 7 Apr 2019 14:23:33 +0000 (07:23 -0700)]
intel/compiler/fs/icl: Use dummy masked urb write for tess eval

One cannot write the URB arbitrarily and therefore the message
has to be carefully constructed. The clever tricks originate
from Kenneth and Jason, I'm just writing the patch.

Fixes GPU hangs on ICL with Vulkan CTS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Andrii Simiklit [Thu, 25 Apr 2019 08:19:46 +0000 (11:19 +0300)]
iris: make the TFB result visible to others

OpenGL 4.6 Spec:
   "5.3.3 Rules
    .......
    Note: “Updates” via rendering or transform feedback
    are treated consistently with updates via GL commands.
    Once EndTransformFeedback has been issued, any subsequent
    command in the same context that uses the results of the
    transform feedback operation will see the results."

v2: removed a wrong comment
    ( Kenneth Graunke <kenneth@whitecape.org> )

v3: - flush+dirty depends on buffers usage history
    - removed an old hack
    ( Kenneth Graunke <kenneth@whitecape.org> )

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110404
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Wed, 24 Apr 2019 23:43:36 +0000 (16:43 -0700)]
iris: Some tidying for preemption support

Just enable it during init_render_context on Gen10+, and move the
Gen9 state tracking into iris_genx_state so it only exists on Gen9.

Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Marek Olšák [Thu, 18 Apr 2019 19:43:46 +0000 (15:43 -0400)]
radeonsi: remove dirty slot masks from scissor and viewport states

All registers in the array need to be updated if any of them is changed.

Only apps writing gl_ViewportIndex were affected by this bug.

Marek Olšák [Thu, 18 Apr 2019 19:19:19 +0000 (15:19 -0400)]
radeonsi/gfx9: rework the gfx9 scissor bug workaround (v2)

Needed to track context rolls caused by streamout and ACQUIRE_MEM.
ACQUIRE_MEM can occur outside of draw calls.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110355

v2: squashed patches and done more rework

Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Marek Olšák [Wed, 17 Apr 2019 15:43:14 +0000 (11:43 -0400)]
radeonsi/gfx9: set that window_rectangles always roll the context

Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Jon Turney [Sun, 14 Apr 2019 19:46:39 +0000 (20:46 +0100)]
meson: Force '.so' extension for DRI drivers

DRI driver loadable modules are always installed with
install_megadriver.py with names ending with '.so', irrespective of
platform.

Force the name the loadable module is built with to match, so
install_megadriver.py doesn't spin trying to remove non-existent
symlinks.

Fixes: c77acc3c "meson: remove meson-created megadrivers symlinks"
Nicolai Hähnle [Mon, 25 Mar 2019 14:44:45 +0000 (15:44 +0100)]
radeonsi: add radeonsi_sync_compile option

Force the driver thread to sync immediately with a compiler thread (but
compilation still happens in a separate thread).

This can be useful to simplify debugging compiler issues.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 14 Mar 2019 08:51:43 +0000 (09:51 +0100)]
radeonsi: add radeonsi_aux_debug option for aux context debug dumps

Enabling this option will create ddebug-style dumps for the aux context,
except that instead of intercepting the pipe_context layer
we just dump the IB contents on flush.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 14 Mar 2019 08:48:47 +0000 (09:48 +0100)]
ddebug: expose some helper functions as non-inline

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 26 Feb 2019 15:34:34 +0000 (16:34 +0100)]
ddebug: dump driver state into a separate file

Due to asynchronous execution, it's not clear which of the draws the state
may refer to.

This also works around an issue encountered with radeonsi where dumping
the driver state itself caused a hang.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 26 Feb 2019 15:22:53 +0000 (16:22 +0100)]
ddebug: log calls to pipe->flush

This can be useful when internal draws lead to a hang.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 26 Feb 2019 12:07:30 +0000 (13:07 +0100)]
ddebug: set thread name

For better debuggability.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 26 Feb 2019 15:22:02 +0000 (16:22 +0100)]
util/u_log: flush auto loggers before starting a new page

Without this, command stream dumps of radeonsi may misleadingly end up
in a later page.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 15 Mar 2019 13:56:36 +0000 (14:56 +0100)]
radeonsi: add si_debug_options for convenient adding/removing of options

Move the definition of radeonsi_clear_db_cache_before_clear there,
as well as radeonsi_enable_nir.

This removes the AMD_DEBUG=nir option.

We currently still have two places for options: the driconf machinery
and AMD_DEBUG/R600_DEBUG. If we are to have a single place for options,
then the driconf machinery should be preferred since it's more flexible.

The only downside of the driconf machinery was that adding new options
was quite inconvenient. With this change, a simple boolean option can
be added with a single line of code, same as for AMD_DEBUG.
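
The "single line per option" idea can be sketched with an X-macro, as
below. This is an illustration only: the SKETCH_* and sketch_* names and
the description strings are hypothetical, and the list lives in a macro
here rather than in a separately included header.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical option list: adding an option means adding one line. */
#define SKETCH_DEBUG_OPTIONS(OPT_BOOL) \
   OPT_BOOL(clear_db_cache_before_clear, false, "Clear DB cache before clear") \
   OPT_BOOL(enable_nir, false, "Enable NIR")

/* Expand the list into one bool field per option. */
struct sketch_options {
#define SKETCH_FIELD(name, dflt, desc) bool name;
   SKETCH_DEBUG_OPTIONS(SKETCH_FIELD)
#undef SKETCH_FIELD
};

/* Expand the list into default assignments. */
static void sketch_options_init(struct sketch_options *o)
{
   assert(o != NULL);
#define SKETCH_INIT(name, dflt, desc) o->name = (dflt);
   SKETCH_DEBUG_OPTIONS(SKETCH_INIT)
#undef SKETCH_INIT
}

/* Expand the list into a lookup by "radeonsi_<name>" key.
 * Returns false if the key is not a known option. */
static bool sketch_options_set(struct sketch_options *o, const char *key,
                               bool value)
{
   assert(o != NULL && key != NULL);
#define SKETCH_MATCH(name, dflt, desc)        \
   if (strcmp(key, "radeonsi_" #name) == 0) { \
      o->name = value;                        \
      return true;                            \
   }
   SKETCH_DEBUG_OPTIONS(SKETCH_MATCH)
#undef SKETCH_MATCH
   return false;
}
```

The struct field, the default and the lookup all come from the same list
entry, so they cannot drift apart when an option is added or removed.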

One technical limitation of this particular implementation is that while
almost all driconf features are available, the translation machinery doesn't
pick up the description strings for options added in si_debug_options. In
practice, translations haven't been provided anyway, and this is intended
for developer options, so I'm not too worried. It could always be added
later if anybody really cares.

v2:
- use bool instead of uint8_t for options
- si_debug_options.inc -> si_debug_options.h

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Michel Dänzer [Wed, 17 Apr 2019 10:53:23 +0000 (12:53 +0200)]
gitlab-ci: Use meson buildtype debug instead of default debugoptimized

This can save a lot of time for some of the meson CI jobs.

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Juan A. Suarez Romero [Wed, 24 Apr 2019 10:38:28 +0000 (12:38 +0200)]
Revert "intel/compiler: split is_partial_write() into two variants"

This reverts commit 40b3abb4d16af4cef0307e1b4904c2ec0924299e.

It is not clear that this commit was entirely correct, and unfortunately
it was pushed by mistake.

CC: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Timothy Arceri [Thu, 25 Apr 2019 01:17:42 +0000 (11:17 +1000)]
nir: fix nir_remove_unused_varyings()

We were only setting the used mask for the first component of a
varying. Since the linking opts split vectors into scalars this
has mostly worked ok.

However this causes an issue where for example if we split a
struct on one side of the interface but not the other, then we
can possibly end up removing the first components on the side
that was split and then incorrectly remove the whole struct
on the other side of the varying.

With this change we simply mark all 4 components for each slot
used by a struct. We could possibly make this more fine-grained
but that would require a more complex change.
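
A rough sketch of "mark all 4 components for each slot", assuming the used
mask is kept as one 64-bit slot bitmask per component. The layout and the
sketch_* name are hypothetical and only meant to show the difference from
marking component 0 alone.

```c
#include <assert.h>
#include <stdint.h>

/* used[c] has bit N set when component c of slot N is in use.
 * Mark every component of num_slots consecutive slots as used. */
static void sketch_mark_struct_used(uint64_t used[4], unsigned location,
                                    unsigned num_slots)
{
   assert(num_slots >= 1 && location + num_slots <= 64);

   /* Build the slot range mask, avoiding an undefined 64-bit shift. */
   uint64_t slots = (num_slots == 64) ? ~0ull : ((1ull << num_slots) - 1);
   slots <<= location;

   for (unsigned c = 0; c < 4; c++)
      used[c] |= slots;
}
```

Marking only used[0] would let a later pass conclude that components 1 to 3
of those slots are dead, even though the struct on the other side of the
interface still covers them.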

This fixes a bug in Strange Brigade on RADV when tessellation
is enabled, all credit goes to Samuel Pitoiset for tracking down
the cause of the bug.

Fixes: f1eb5e639997 ("nir: add component level support to remove_unused_io_vars()")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Lionel Landwerlin [Wed, 24 Apr 2019 21:27:34 +0000 (05:27 +0800)]
i965: fix icelake performance query enabling

This was a rebase issue which lost a change to a file moved from i965
to src/intel/perf.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 134e750e16bfc5 ("i965: extract performance query metrics")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek Olšák [Wed, 24 Apr 2019 21:33:53 +0000 (17:33 -0400)]
radeonsi: add BOs after need_cs_space

need_cs_space may clear the buffer list.

Fixes: 951d60f8cdc88 "radeonsi: delay adding BOs at the beginning of IBs until the first draw"
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 24 Apr 2019 17:02:43 +0000 (13:02 -0400)]
glsl: handle interactions between EXT_gpu_shader4 and texture extensions

Also, EXT_texture_buffer_object has to be enabled separately.

Reviewed-by: Eric Anholt <eric@anholt.net>
Marek Olšák [Tue, 7 Aug 2018 22:32:31 +0000 (18:32 -0400)]
st/mesa: expose EXT_gpu_shader4 if GLSL 1.40 is supported

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago mesa: only allow EXT_gpu_shader4 in the compatibility profile
Marek Olšák [Tue, 21 Aug 2018 03:42:22 +0000 (23:42 -0400)]
mesa: only allow EXT_gpu_shader4 in the compatibility profile

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago mesa: expose EXT_texture_buffer_object
Marek Olšák [Wed, 8 Aug 2018 04:06:51 +0000 (00:06 -0400)]
mesa: expose EXT_texture_buffer_object

This is needed for exposing the samplerBuffer functions under
EXT_gpu_shader4.

v2: - expose it in the compat profile only
    - make it an alias of EXT_gpu_shader4

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago glsl: allow "varying out" for fragment shader outputs with EXT_gpu_shader4
Marek Olšák [Fri, 5 Apr 2019 21:01:55 +0000 (17:01 -0400)]
glsl: allow "varying out" for fragment shader outputs with EXT_gpu_shader4

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago glsl: add texture builtin functions for EXT_gpu_shader4
Marek Olšák [Wed, 8 Aug 2018 01:31:09 +0000 (21:31 -0400)]
glsl: add texture builtin functions for EXT_gpu_shader4

v2: some fixes to texture functions thanks to piglit tests

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago glsl: add arithmetic builtin functions for EXT_gpu_shader4
Marek Olšák [Wed, 8 Aug 2018 01:31:09 +0000 (21:31 -0400)]
glsl: add arithmetic builtin functions for EXT_gpu_shader4

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago glsl: add builtin variables for EXT_gpu_shader4
Marek Olšák [Tue, 7 Aug 2018 22:30:19 +0000 (18:30 -0400)]
glsl: add builtin variables for EXT_gpu_shader4

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago glsl: apply some 1.30 and other rules to EXT_gpu_shader4 as well
Marek Olšák [Tue, 7 Aug 2018 23:56:44 +0000 (19:56 -0400)]
glsl: apply some 1.30 and other rules to EXT_gpu_shader4 as well

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago glsl: enable types for EXT_gpu_shader4
Chris Forbes [Thu, 18 Jul 2013 10:41:21 +0000 (22:41 +1200)]
glsl: enable types for EXT_gpu_shader4

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago glsl: add `unsigned int` type for EXT_GPU_shader4
Marek Olšák [Tue, 7 Aug 2018 21:18:40 +0000 (17:18 -0400)]
glsl: add `unsigned int` type for EXT_GPU_shader4

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago glsl: enable noperspective|flat|centroid for EXT_gpu_shader4
Chris Forbes [Thu, 18 Jul 2013 10:43:26 +0000 (22:43 +1200)]
glsl: enable noperspective|flat|centroid for EXT_gpu_shader4

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago glsl: add scaffolding for EXT_gpu_shader4
Chris Forbes [Thu, 18 Jul 2013 09:44:58 +0000 (21:44 +1200)]
glsl: add scaffolding for EXT_gpu_shader4

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago mesa: enable glGet for EXT_gpu_shader4
Marek Olšák [Tue, 7 Aug 2018 21:49:42 +0000 (17:49 -0400)]
mesa: enable glGet for EXT_gpu_shader4

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years ago v3d: Disable SSBOs and atomic counters on vertex shaders.
Eric Anholt [Mon, 22 Apr 2019 18:24:55 +0000 (11:24 -0700)]
v3d: Disable SSBOs and atomic counters on vertex shaders.

The CTS fails on
dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.*vertex
when they are enabled, due to the VS being run for both bin and render.  I
think this behavior is expected to be valid, but I can't find text in
atomic counters or SSBO specs saying so (the closest I found was in
shader_image_load_store).  Just disable it for now, since the closed
source driver doesn't expose vertex atomic counters/SSBOs either.

5 years ago st/mesa: Don't set atomic counter size != 0 if MAX_SHADER_BUFFERS == 0.
Eric Anholt [Wed, 1 Aug 2018 23:07:45 +0000 (16:07 -0700)]
st/mesa: Don't set atomic counter size != 0 if MAX_SHADER_BUFFERS == 0.

This is just asking for tests to get confused about the HW supporting
atomics in this shader stage or not, such as
dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_expression_vertex.

v2: Rebase on the other atomic cleanups that have happened since posting.
v3: Commit message tweak by Marek.

Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years ago iris: Advertise EXT_texture_sRGB_R8 support
Kenneth Graunke [Wed, 24 Apr 2019 20:24:18 +0000 (13:24 -0700)]
iris: Advertise EXT_texture_sRGB_R8 support

Using the luminance format, like both brw and anv do.

5 years ago iris: Enable GL_AMD_depth_clamp_separate
Kenneth Graunke [Wed, 24 Apr 2019 20:04:53 +0000 (13:04 -0700)]
iris: Enable GL_AMD_depth_clamp_separate

We support this, we just forgot to turn it on.

5 years ago util: fix a compile failure in u_compute.c on windows
Marek Olšák [Wed, 24 Apr 2019 22:29:26 +0000 (18:29 -0400)]
util: fix a compile failure in u_compute.c on windows

5 years ago iris: enable preemption support for gen10
Mike Blumenkrantz [Fri, 19 Apr 2019 13:04:59 +0000 (09:04 -0400)]
iris: enable preemption support for gen10

This automatically enables preemption on gen10, where it is disabled by
default but still available.

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
5 years ago iris: add preemption support on gen9
Mike Blumenkrantz [Thu, 11 Apr 2019 14:07:15 +0000 (10:07 -0400)]
iris: add preemption support on gen9

this is basically just porting the following two commits to gallium:
d8b50e152a0d5df0971c05b8db132fa688794001
5c454661c66fa2624cf4bba1071175070724869a

resolves kwg/mesa#49

Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
5 years ago iris: Split iris_flush_and_dirty_for_history into two helpers.
Kenneth Graunke [Wed, 24 Apr 2019 19:11:39 +0000 (12:11 -0700)]
iris: Split iris_flush_and_dirty_for_history into two helpers.

We create two new helpers, iris_flush_bits_for_history, and
iris_dirty_for_history, then use them in the existing function.

The first accumulates flush bits based on res->bind_history, but doesn't
actually perform a flush.  This allows us to accumulate flush bits by
looping over multiple resources, but ultimately emit a single flush for
all of them.

The latter flags dirty bits without flushing, which again allows us to
handle multiple resources, but also is more convenient when writing from
the CPU where we don't need a flush (as in commit 4d12236072).
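
The accumulate-then-flush pattern can be sketched roughly as below. This
is an illustrative model only: the flag names, the resource struct,
flush_bits_for_history, emit_flush and flush_for_resources are
hypothetical and do not reproduce the iris helpers or their signatures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bind-history and flush flags. */
enum {
   BIND_CONSTANT_BUFFER = 1u << 0,
   BIND_SAMPLER_VIEW    = 1u << 1,
   BIND_VERTEX_BUFFER   = 1u << 2,
};

enum {
   FLUSH_CONST_CACHE   = 1u << 0,
   FLUSH_TEXTURE_CACHE = 1u << 1,
   FLUSH_VF_CACHE      = 1u << 2,
};

struct resource {
   uint32_t bind_history; /* every binding point this resource has used */
};

/* Compute the flush bits one resource needs, without flushing. */
static uint32_t
flush_bits_for_history(const struct resource *res)
{
   uint32_t flush = 0;
   if (res->bind_history & BIND_CONSTANT_BUFFER)
      flush |= FLUSH_CONST_CACHE;
   if (res->bind_history & BIND_SAMPLER_VIEW)
      flush |= FLUSH_TEXTURE_CACHE;
   if (res->bind_history & BIND_VERTEX_BUFFER)
      flush |= FLUSH_VF_CACHE;
   return flush;
}

/* Stand-in for emitting a flush; records what was emitted for checking. */
static unsigned flushes_emitted;
static uint32_t last_flush_bits;

static void
emit_flush(uint32_t bits)
{
   if (bits) {
      flushes_emitted++;
      last_flush_bits = bits;
   }
}

/* OR the bits of all resources together, then emit a single flush. */
static void
flush_for_resources(const struct resource *res, unsigned count)
{
   uint32_t bits = 0;
   for (unsigned i = 0; i < count; i++)
      bits |= flush_bits_for_history(&res[i]);
   emit_flush(bits);
}
```

Separating the bit computation from the emission is what lets a caller
loop over several resources and still pay for only one flush.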