mesa.git
Eric Anholt [Fri, 22 Nov 2019 00:20:11 +0000 (16:20 -0800)]
freedreno: Refactor the UBWC flags registers emission.

It's the same logic for each of these being emitted, and I was about to
change the rsc->layout.* for UBWC.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Eric Anholt [Thu, 21 Nov 2019 22:53:58 +0000 (14:53 -0800)]
freedreno: Drop the extra offset field for mipmap slices.

We can just bake the UBWC-goes-first delta into the slices at setup time.
I did have to fix up the resource shadowing swap path to swap the slice
fields, as it was missing and regressed the format reinterprets otherwise.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Kenneth Graunke [Wed, 2 Oct 2019 19:09:33 +0000 (15:09 -0400)]
intel/decoder: Make get_state_size take a full 64-bit address and a base

i965 wants to use an offset from a base because everything is in a
single buffer whose address may be relocated, and all base addresses
are set to the start of that buffer.

iris wants to use a full 64-bit address, because state lives in separate
buffers which may be in the shader, surface, and dynamic memory zones,
where addresses grow downward from the top of a 4GB zone.  So it's very
possible for a 32-bit offset to exist relative to multiple bases,
leading to the wrong state size.
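
The ambiguity described above can be illustrated with a small sketch. The
table layout and the names toy_state, size_by_offset and size_by_address are
hypothetical and are not the decoder's actual interface; the addresses are
made up so that two buffers share the same low 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one piece of state: where it lives and its size. */
struct toy_state {
   uint64_t address;
   uint32_t size;
};

/* Lookup keyed on a 32-bit offset only. It compares the low 32 bits, so the
 * first entry whose low bits match wins, even if it lives in another zone. */
static uint32_t
size_by_offset(const struct toy_state *table, size_t count, uint32_t offset)
{
   for (size_t i = 0; i < count; i++) {
      if ((uint32_t)table[i].address == offset)
         return table[i].size;
   }
   return 0;
}

/* Lookup keyed on a full 64-bit address plus a base. In this sketch a caller
 * that works with offsets passes its base, and a caller that already has
 * full addresses passes a base of 0. */
static uint32_t
size_by_address(const struct toy_state *table, size_t count,
                uint64_t address, uint64_t base)
{
   for (size_t i = 0; i < count; i++) {
      if (table[i].address == base + address)
         return table[i].size;
   }
   return 0;
}
```

With two entries at 0x1fffff000 and 0x2fffff000, the offset-only lookup cannot
tell them apart, while the address-plus-base lookup can.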

Dongwon Kim [Tue, 15 Oct 2019 19:43:02 +0000 (12:43 -0700)]
iris: INTEL performance query implementation

Low-level implementation of the INTEL_performance_query APIs in the
Intel iris driver. Most of the functions and procedures defined here
are adapted from the i965 driver (brw_performance_query.c).

v2: - replace genX_init_performance_query with
      iris_init_perfquery_functions, which is gen-version agnostic
    - general code clean-up

v3: include gen_perf_gens.h as some of the defines were moved to this new
    header file

v4: - checking for kernel 4.13+ won't be needed here as Iris won't be
      loaded anyway without DRM_SYNCOBJ that is enabled after Kernel
      4.13.

    - checking whether gen < 8 or is_cherryview won't be required as
      well because those cases are screened in iris_screen_create.

v5: remove genX(init_performance_query)

v6: - remove oa_metrics_kernel_support as iris works only with kernel
      4.18 and newer.

    - use perf functions defined in separate file, iris_perf.h/c

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mark Janes [Fri, 22 Nov 2019 21:46:22 +0000 (13:46 -0800)]
iris: separating out common perf code

The configuration of the gen_perf vtable will be the same for
INTEL_performance_query and AMD_performance_monitor.
Initialize the table in a single routine that can be called from both
implementations.

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dongwon Kim [Tue, 15 Oct 2019 19:43:04 +0000 (12:43 -0700)]
gallium: enable INTEL_PERFORMANCE_QUERY

New state tracker APIs are added for INTEL_performance_query.
This extension is enabled if all vendor-specific functions for it
exist.

v2: add st_cb_perfquery.* to the list of sources in Makefile
v3: minor code clean-up
v4: - add driver hooks for intel-performance-query apis
    - add PIPE level performance counter and type enums that
      match to OpenGL enums
    - do conversion of pipe_perf_counter_type and
      pipe_perf_counter_data_type enums to GL defines in state_tracker

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Dylan Baker [Tue, 10 Dec 2019 19:15:37 +0000 (11:15 -0800)]
meson/broadcom: libbroadcom_cle also needs zlib

Fixes: 1ae8018a6af81eec4832a57d9d0346aa3dd98d28
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Kenneth Graunke [Tue, 3 Dec 2019 01:30:06 +0000 (17:30 -0800)]
anv: Enable Gen11 Color/Z write merging optimization

TCCNTLREG contains additional L3 cache write merging optimizations.

The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)

Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging".  This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.

It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.
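
As a minimal sketch of the bit arithmetic only: the macro and function names
below are hypothetical, the bit positions are the ones listed above, and the
way the driver actually programs the register is not shown here.

```c
#include <stdint.h>

/* Bit positions as listed in the commit message; the names are made up. */
#define TOY_URB_PARTIAL_WRITE_MERGING      (1u << 0)
#define TOY_COLOR_Z_PARTIAL_WRITE_MERGING  (1u << 1)
#define TOY_L3_DATA_PARTIAL_WRITE_MERGING  (1u << 2)
#define TOY_TC_DISABLE                     (1u << 3)

/* Set only bit 1 and leave bits 0, 2 and 3 at whatever value they had. */
static uint32_t
toy_enable_color_z_merging(uint32_t tccntlreg)
{
   return tccntlreg | TOY_COLOR_Z_PARTIAL_WRITE_MERGING;
}
```

Starting from the default described above (bits 0, 2 and 3 set, i.e. 0xd),
the result is 0xf.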

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Sat, 31 Aug 2019 00:19:46 +0000 (17:19 -0700)]
iris: Enable Gen11 Color/Z write merging optimization

TCCNTLREG contains additional L3 cache write merging optimizations.

The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)

Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging".  This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.

It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.

Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed
frequency, according to Felix Degrood.  I didn't see any improvements
at out-of-the-box power management settings, however.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 2 Dec 2019 07:01:19 +0000 (23:01 -0800)]
intel/genxml: Add a partial TCCNTLREG definition

TCCNTLREG contains additional cache programming settings.  In
particular, there are several write combining controls we'd like to use.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 21 Oct 2019 21:51:13 +0000 (14:51 -0700)]
util: Detect use-after-destroy in simple_mtx

This makes simple_mtx_destroy set the counter to an invalid canary
value and then makes lock/unlock assert that the value is legal.

That way, calling lock/unlock after destroy will assert fail,
rather than deadlocking or potentially even working.
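
The canary idea can be sketched with a toy, single-threaded lock. This is
not the simple_mtx implementation: the struct, the function names and the
canary value are all hypothetical, and there is no atomic or futex handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical canary; the value Mesa actually uses is not given above. */
#define TOY_MTX_DESTROYED 0xdeadbeefu

struct toy_mtx {
   uint32_t val;   /* 0 = unlocked, 1 = locked, canary = destroyed */
};

/* A legal counter value is one of the lock states, never the canary. */
static bool
toy_mtx_is_valid(const struct toy_mtx *m)
{
   return m->val <= 1;
}

static void
toy_mtx_init(struct toy_mtx *m)
{
   m->val = 0;
}

static void
toy_mtx_lock(struct toy_mtx *m)
{
   assert(toy_mtx_is_valid(m));   /* trips on use-after-destroy */
   m->val = 1;                    /* toy: no waiting, single-threaded only */
}

static void
toy_mtx_unlock(struct toy_mtx *m)
{
   assert(toy_mtx_is_valid(m));   /* trips on use-after-destroy */
   m->val = 0;
}

static void
toy_mtx_destroy(struct toy_mtx *m)
{
   assert(toy_mtx_is_valid(m));
   m->val = TOY_MTX_DESTROYED;    /* later lock/unlock calls now assert */
}
```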

This has caught real deadlocks in dEQP multithreaded tests (in st/mesa
shader variant zombie list handling), which have since been fixed.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Rob Clark [Tue, 10 Dec 2019 22:41:46 +0000 (14:41 -0800)]
freedreno/a6xx: enable LRZ by default

Now that dEQP should be happy, let's flip the switch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Fri, 6 Dec 2019 19:34:39 +0000 (11:34 -0800)]
freedreno/a6xx: fix LRZ logic

In particular, we need to invalidate the LRZ state when we cannot be
confident in what the Z state would be during rendering:

1) depth test modes not supported by LRZ
2) stencil test, which would require full rasterization and stencil
   test in the binning pass (whereas LRZ normally just needs to
   determine the min and max z value in an 8x8 quad)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Tue, 10 Dec 2019 22:27:20 +0000 (14:27 -0800)]
freedreno/a6xx: fix LRZ layout

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Tue, 10 Dec 2019 22:24:59 +0000 (14:24 -0800)]
freedreno/a5xx+a6xx: split LRZ layout to per-gen

Seems to be a bit different for a6xx, so let's split this out.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 5 Dec 2019 19:54:33 +0000 (11:54 -0800)]
freedreno/a6xx: disable LRZ when blending

Signed-off-by: Rob Clark <robdclark@chromium.org>
Marek Olšák [Tue, 10 Dec 2019 00:27:26 +0000 (19:27 -0500)]
radeonsi: don't rely on CLEAR_STATE to set PA_SC_GENERIC_SCISSOR_*

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 12 Nov 2019 22:10:05 +0000 (17:10 -0500)]
radeonsi/gfx10: simplify the tess_turns_off_ngg condition

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Mon, 28 Oct 2019 20:37:53 +0000 (16:37 -0400)]
radeonsi/gfx10: disable vertex grouping

based on PAL.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Sat, 26 Oct 2019 03:32:18 +0000 (23:32 -0400)]
radeonsi: enable NIR by default and document GL 4.6 support

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Thu, 17 Oct 2019 20:46:06 +0000 (16:46 -0400)]
st/dri: assume external consumers of back buffers can write to the buffers

This was reverted needlessly because it was part of another series.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Jason Ekstrand [Fri, 6 Dec 2019 21:20:35 +0000 (15:20 -0600)]
ANV: Stop advertising smoothLines support on gen10+

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Dylan Baker [Tue, 10 Dec 2019 18:19:04 +0000 (10:19 -0800)]
meson/broadcom: libbroadcom_cle needs expat headers

Fixes: 1ae8018a6af81eec4832a57d9d0346aa3dd98d28
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Lionel Landwerlin [Tue, 10 Dec 2019 11:49:49 +0000 (03:49 -0800)]
anv: fix incorrect VMA alignment for CCS main surfaces

Maybe a finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.

Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4a4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Tue, 10 Dec 2019 11:49:16 +0000 (03:49 -0800)]
anv: fix missing gen12 handling

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 181be14d4303 ("anv: Build for gen12")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Engestrom [Fri, 22 Nov 2019 14:36:02 +0000 (14:36 +0000)]
docs: reword a bit and list HTTPS before FTP

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Eric Engestrom [Thu, 21 Nov 2019 23:13:01 +0000 (23:13 +0000)]
meson: drop `intel_` prefix on imgui_core

Again, no real effect, just the name of a temporary build file.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Eric Engestrom [Thu, 21 Nov 2019 23:11:07 +0000 (23:11 +0000)]
meson: drop duplicate `lib` prefix on libiris_gen*

This has no real effect other than the names of the temporary files in
the build folder.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Samuel Pitoiset [Mon, 9 Dec 2019 12:56:24 +0000 (13:56 +0100)]
radv: implement VK_KHR_separate_depth_stencil_layouts

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 6 Nov 2019 14:49:10 +0000 (15:49 +0100)]
radv: initialize HTILE for separate depth/stencil aspects

It either clears the whole HTILE buffer or part of it depending
on the HTILE mask parameter.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 6 Nov 2019 15:31:56 +0000 (16:31 +0100)]
radv: do not init HTILE as compressed state when dst layout allows it

I don't think this makes much difference and a potential clear
following the initialization will overwrite HTILE anyways.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 26 Nov 2019 15:55:02 +0000 (16:55 +0100)]
radv: synchronize after performing separate depth/stencil fast clears

For depth+stencil images, the driver might use an optimized path
if only one aspect is cleared. It either clears the depth or the
stencil part of HTILE. Because the two separate aspects might use
the same HTILE memory we have to synchronize.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Michel Dänzer [Fri, 6 Dec 2019 11:02:13 +0000 (12:02 +0100)]
gitlab-ci: Don't exclude any piglit quick_shader tests

Now that we're running these with process isolation enabled, their
results will hopefully be stable.

Reviewed-by: Eric Anholt <eric@anholt.net>
Krzysztof Raszkowski [Thu, 5 Dec 2019 17:01:08 +0000 (18:01 +0100)]
gallivm: add TGSI bit arithmetic opcodes support

Add TGSI_OPCODE_BFI, TGSI_OPCODE_POPC, TGSI_OPCODE_LSB,
TGSI_OPCODE_IMSB, TGSI_OPCODE_UMSB, TGSI_OPCODE_IBFE,
TGSI_OPCODE_UBFE, TGSI_OPCODE_BREV support.
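
For orientation, here is a plain C reference for four of these operations.
It assumes the opcodes follow the GLSL built-ins bitCount, findLSB, findMSB
(unsigned) and bitfieldReverse; the ref_* names are hypothetical and this is
not the gallivm code, which emits LLVM IR instead.

```c
#include <stdint.h>

/* POPC: number of set bits. */
static uint32_t
ref_popc(uint32_t v)
{
   uint32_t n = 0;
   for (; v; v &= v - 1)   /* clear the lowest set bit each round */
      n++;
   return n;
}

/* LSB: index of the least significant set bit, -1 if no bit is set. */
static int32_t
ref_lsb(uint32_t v)
{
   int32_t i = 0;
   if (!v)
      return -1;
   while (!(v & 1u)) {
      v >>= 1;
      i++;
   }
   return i;
}

/* UMSB: index of the most significant set bit, -1 if no bit is set. */
static int32_t
ref_umsb(uint32_t v)
{
   int32_t i = -1;
   while (v) {
      v >>= 1;
      i++;
   }
   return i;
}

/* BREV: reverse the order of all 32 bits. */
static uint32_t
ref_brev(uint32_t v)
{
   uint32_t r = 0;
   for (int i = 0; i < 32; i++) {
      r = (r << 1) | (v & 1u);
      v >>= 1;
   }
   return r;
}
```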

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
Samuel Pitoiset [Fri, 6 Dec 2019 11:19:11 +0000 (12:19 +0100)]
radv: fix possibly wrong PA_SC_AA_CONFIG value for conservative rast

PA_SC_AA_CONFIG might be updated when conservative rasterization is
enabled. Because the driver only re-emits the multisample state if
the number of samples is different, that register value might not
be updated correctly.

Found by inspection, doesn't fix anything known.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 6 Dec 2019 11:12:38 +0000 (12:12 +0100)]
radv: move emission of two PA_SC_* registers to the pipeline CS

They don't have to be updated dynamically.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Pierre-Eric Pelloux-Prayer [Wed, 27 Nov 2019 10:25:40 +0000 (11:25 +0100)]
st/dri: use st->flush callback to flush the backbuffer

Previously the flush was done before the call to st->flush, but this
could lead to problems as FLUSH_VERTICES could push some work
that would change the backbuffer (or modify it).

With this commit, all the backbuffer flushing code is executed
right before the call to st_flush.

Closes: https://gitlab.freedesktop.org/drm/amd/issues/842
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205049

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pierre-Eric Pelloux-Prayer [Wed, 27 Nov 2019 10:22:11 +0000 (11:22 +0100)]
st/mesa: add a notify_before_flush callback param to flush

The new callback is called right before the flush is done to allow
users of st->flush to do some work after all the previous work has
been flushed.
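
The ordering can be sketched as follows. Everything here is a toy: the
toy_* names and the event log are hypothetical and only show that the
callback runs after queued work has been pushed and before the flush itself.

```c
#include <stddef.h>

typedef void (*toy_notify_fn)(void *data);

/* Records the order of events: 'v' = queued vertex work pushed,
 * 'n' = notify callback ran, 'f' = flush performed. */
struct toy_ctx {
   char log[8];
   size_t len;
};

static void
toy_record(struct toy_ctx *ctx, char event)
{
   if (ctx->len < sizeof(ctx->log))
      ctx->log[ctx->len++] = event;
}

static void
toy_flush(struct toy_ctx *ctx, toy_notify_fn notify_before_flush, void *data)
{
   toy_record(ctx, 'v');            /* push any queued work first */
   if (notify_before_flush)
      notify_before_flush(data);    /* caller's hook, e.g. backbuffer work */
   toy_record(ctx, 'f');            /* then do the actual flush */
}

/* Example hook that a caller might pass in. */
static void
toy_flush_backbuffer(void *data)
{
   toy_record(data, 'n');
}
```

Calling toy_flush(&ctx, toy_flush_backbuffer, &ctx) on a zeroed context
records the events in the order v, n, f.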

This will be used by dri_flush in the next commit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pierre-Eric Pelloux-Prayer [Fri, 22 Nov 2019 14:42:46 +0000 (15:42 +0100)]
radeonsi: dcc dirty flag

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pierre-Eric Pelloux-Prayer [Mon, 9 Dec 2019 08:48:37 +0000 (09:48 +0100)]
radeonsi: fix multi plane buffers creation

When using 3 planes, the sequence produces this chain:
  plane0 -> plane2
This commit fixes this to produce:
  plane0 -> plane1 -> plane2
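
A minimal sketch of chaining planes so that each new plane is appended to the
tail. The struct and function names are hypothetical and the exact form of
the original bug is an assumption; the point is only that linking every new
plane from plane0 would leave plane0 -> plane2.

```c
#include <stddef.h>

/* Hypothetical plane object that points at the next plane in the chain. */
struct toy_plane {
   struct toy_plane *next;
};

/* Link planes[0..count-1] into one chain and return its head (NULL if
 * count is 0). */
static struct toy_plane *
toy_chain_planes(struct toy_plane *planes[], size_t count)
{
   struct toy_plane *head = NULL;
   struct toy_plane *last = NULL;

   for (size_t i = 0; i < count; i++) {
      planes[i]->next = NULL;
      if (!head)
         head = planes[i];
      else
         last->next = planes[i];   /* append to the tail, not to the head */
      last = planes[i];
   }
   return head;
}
```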

Fixes: 86e60bc2659 ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pierre-Eric Pelloux-Prayer [Fri, 6 Dec 2019 20:35:38 +0000 (21:35 +0100)]
radeonsi: use gfx9.surf_offset to compute texture offset

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2177
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Sonny Jiang [Fri, 29 Nov 2019 23:04:54 +0000 (18:04 -0500)]
radeonsi: use compute shader for clear 12-byte buffer

Signed-off-by: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Tue, 10 Dec 2019 03:35:57 +0000 (22:35 -0500)]
st/mesa: release the draw shader properly to fix driver crashes (iris)

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 4 Dec 2019 22:27:13 +0000 (17:27 -0500)]
draw, st/mesa: generate TGSI for ffvp/ARB_vp if draw lacks LLVM

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Marek Olšák [Thu, 28 Nov 2019 03:47:56 +0000 (22:47 -0500)]
st/mesa: don't generate VS TGSI if NIR is enabled

it's no longer needed

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 28 Nov 2019 03:37:35 +0000 (22:37 -0500)]
st/mesa: remove struct st_vp_variant in favor of st_common_variant

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 28 Nov 2019 03:30:22 +0000 (22:30 -0500)]
st/mesa: remove st_vp_variant::num_inputs

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 28 Nov 2019 03:25:00 +0000 (22:25 -0500)]
st/mesa: use a separate VS variant for the draw module

instead of keeping the IR indefinitely in st_vp_variant.

This trivially fixes Selection/Feedback/RasterPos for NIR.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 27 Nov 2019 23:01:15 +0000 (18:01 -0500)]
st/mesa: support shader images for Selection/Feedback/RasterPos

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 27 Nov 2019 22:26:51 +0000 (17:26 -0500)]
st/mesa: support SSBOs for Selection/Feedback/RasterPos

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 27 Nov 2019 20:10:33 +0000 (15:10 -0500)]
st/mesa: support samplers for Selection/Feedback/RasterPos

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 27 Nov 2019 20:10:27 +0000 (15:10 -0500)]
st/mesa: save currently bound vertex samplers and sampler views in st_context

for st_draw_feedback.c

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 27 Nov 2019 01:55:13 +0000 (20:55 -0500)]
st/mesa: support UBOs for Selection/Feedback/RasterPos

Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 27 Nov 2019 22:10:45 +0000 (17:10 -0500)]
gallivm: implement LOAD with CONSTBUF but don't enable it for llvmpipe

This is already used in st_draw_feedback.c, because it uses shaders
generated for drivers.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Marek Olšák [Wed, 27 Nov 2019 20:04:14 +0000 (15:04 -0500)]
llvmpipe: implement TEX_LZ and TXF_LZ opcodes

gallivm receives these opcodes anyway because st_draw_feedback.c uses
shaders that were assembled for drivers, not llvmpipe.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Gurchetan Singh [Thu, 5 Dec 2019 02:03:19 +0000 (18:03 -0800)]
drirc: set allow_higher_compat_version for Faster Than Light

With 781a78 ("mesa: enable ARB_direct_state_access in compat for
GL3.1+"), it's possible to have DSA with GL3.1+.

FTL creates a GL3.1 compat context, but fails the
_mesa_has_geometry_shaders(..) check in frame_buffer_texture.

Bump the compat version to pass the check.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Roland Scheidegger [Mon, 9 Dec 2019 17:49:13 +0000 (18:49 +0100)]
util/atomic: Fix p_atomic_add for unlocked and msvc paths

Braces mismatch (flagged by CI, untested).

Fixes: 385d13f26d2 "util/atomic: Add a _return variant of p_atomic_add"
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Mon, 9 Dec 2019 19:55:21 +0000 (11:55 -0800)]
freedreno: Track the set of UBOs to be uploaded in UBO analysis.

We were iterating over the entire 32-entry array each time, when we
can just use a bitset to know that we're only uploading from the first
entry normally.
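
A sketch of the bitset walk, assuming a 32-bit mask with one bit per UBO
slot. The function name is hypothetical; a real implementation would
normally replace the inner loop with a count-trailing-zeros helper.

```c
#include <stdint.h>

/* Visit only the slots whose bit is set in enabled_mask, lowest first.
 * Writes the visited slot indices to visited[] and returns how many slots
 * were visited. */
static unsigned
toy_collect_enabled(uint32_t enabled_mask, unsigned visited[32])
{
   unsigned count = 0;

   while (enabled_mask) {
      unsigned index = 0;
      while (!(enabled_mask & (1u << index)))   /* find the lowest set bit */
         index++;
      visited[count++] = index;
      enabled_mask &= enabled_mask - 1;         /* clear that bit */
   }
   return count;
}
```

With only the first slot enabled (mask 0x1) the loop body runs once instead
of 32 times.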

Knocks ir3_emit_user_consts down from ~.5% of CPU to .1% on WebGL
fishtank.

Reviewed-by: Rob Clark <robdclark@chromium.org>
Eric Anholt [Mon, 9 Dec 2019 19:19:14 +0000 (11:19 -0800)]
freedreno: Stop forcing ALLOW_MAPPED_BUFFERS_DURING_EXEC off.

The default is to not throw GL errors when drawing with mapped
buffers, but we were forcing it on for unclear reasons.  Internally we
keep all our buffers mapped anyway, so it should be a no-op other than
reducing CPU overhead (.23% in a perf report for WebGL fishtank)

Reviewed-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Mon, 9 Dec 2019 21:08:33 +0000 (13:08 -0800)]
freedreno/fdperf: use drmOpen()

Signed-off-by: Rob Clark <robdclark@chromium.org>
Alyssa Rosenzweig [Fri, 6 Dec 2019 21:45:57 +0000 (16:45 -0500)]
gallium/util: Support POLYGON in u_stream_outputs_for_vertices

u_decomposed_prims_for_vertices cannot support POLYGON, but POLYGON is
trivial to support as a special case directly (since we have the number
of vertices directly).
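
The special case can be sketched like this. The enum and both functions are
hypothetical stand-ins, not the gallium helpers themselves, and the sketch
only covers triangle-based primitives plus the polygon case.

```c
/* Hypothetical primitive types for this sketch. */
enum toy_prim {
   TOY_PRIM_TRIANGLES,
   TOY_PRIM_TRIANGLE_STRIP,
   TOY_PRIM_POLYGON
};

/* Number of triangles a vertex count decomposes into, or -1 when this
 * helper cannot handle the primitive type (here: polygons). */
static int
toy_decomposed_prims(enum toy_prim prim, unsigned nr)
{
   switch (prim) {
   case TOY_PRIM_TRIANGLES:
      return (int)(nr / 3);
   case TOY_PRIM_TRIANGLE_STRIP:
      return nr >= 3 ? (int)(nr - 2) : 0;
   default:
      return -1;
   }
}

/* Number of vertices written to stream output for nr input vertices. */
static unsigned
toy_stream_outputs(enum toy_prim prim, unsigned nr)
{
   int prims;

   if (prim == TOY_PRIM_POLYGON)
      return nr;   /* special case: the vertex count is known directly */

   prims = toy_decomposed_prims(prim, nr);
   if (prims < 0)
      return 0;    /* unsupported primitive in this sketch */
   return (unsigned)prims * 3;
}
```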

Fixes aborts in Panfrost in apps using GL_POLYGON.

Fixes: e881aa8c12c ("gallium/util: Add u_stream_outputs_for_vertices helper")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Anuj Phogat [Wed, 4 Dec 2019 23:21:20 +0000 (15:21 -0800)]
intel: Add pci-ids for Jasper Lake

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Anuj Phogat [Wed, 4 Dec 2019 23:19:18 +0000 (15:19 -0800)]
intel: Add device info for 1x4x6 Jasper Lake

Also removing the FIXME comments after matching the numbers with
updated documentation.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Vasily Khoruzhick [Sun, 8 Dec 2019 20:06:43 +0000 (12:06 -0800)]
lima: expose tiled format modifier in query_dmabuf_modifiers()

Fixes: 8c12f4e5f24f ("lima: enable tiling")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Vasily Khoruzhick [Sun, 8 Dec 2019 20:03:42 +0000 (12:03 -0800)]
lima: handle DRM_FORMAT_MOD_INVALID in resource_from_handle()

Assume that resource is tiled if we get DRM_FORMAT_MOD_INVALID
in resource_from_handle() and we don't have RO.

Fixes: 8c12f4e5f24f ("lima: enable tiling")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Jonathan Marek [Wed, 20 Nov 2019 03:19:46 +0000 (22:19 -0500)]
turnip: add hw binning

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Samuel Pitoiset [Thu, 5 Dec 2019 17:04:32 +0000 (18:04 +0100)]
radv: do not use VK_TRUE/VK_FALSE

For consistency.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dave Airlie [Tue, 3 Dec 2019 05:54:56 +0000 (15:54 +1000)]
gallivm: add bitfield reverse and ufind_msb

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Krzysztof Raszkowski <krzysztof.raszkowski@intel.com>
Roland Scheidegger [Sat, 7 Dec 2019 03:37:17 +0000 (04:37 +0100)]
gallium/scons: fix graw_gdi build

Fixes: 44a6b0107b37 (gallivm: add nir->llvm translation (v2))
Reviewed-by: Dave Airlie <Airlied@redhat.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Daniel Schürmann [Thu, 5 Dec 2019 18:27:16 +0000 (19:27 +0100)]
aco: propagate temporaries into expanded vectors

Gives a very slight decrease in code size:
Totals from affected shaders:
Code Size: 1708488 -> 1702768 (-0.33 %) bytes
Max Waves: 2858 -> 2855 (-0.10 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Thu, 5 Dec 2019 18:17:52 +0000 (19:17 +0100)]
aco: improve readfirstlane after uniform ssbo loads on GFX7

pipeline-db changes for GFX7:

80310 shaders in 40472 tests
Totals:
SGPRS: 3655900 -> 3643916 (-0.33 %)
VGPRS: 2678324 -> 2686324 (0.30 %)
Spilled SGPRs: 1730 -> 1634 (-5.55 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 15540 -> 15536 (-0.03 %) dwords per thread
Code Size: 136106120 -> 135457616 (-0.48 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601014 -> 600206 (-0.13 %)

Totals from affected shaders:
SGPRS: 307832 -> 295848 (-3.89 %)
VGPRS: 267864 -> 275864 (2.99 %)
Spilled SGPRs: 770 -> 674 (-12.47 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 16 -> 12 (-25.00 %) dwords per thread
Code Size: 22007488 -> 21358984 (-2.95 %) bytes
LDS: 65 -> 65 (0.00 %) blocks
Max Waves: 28668 -> 27860 (-2.82 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Thu, 5 Dec 2019 17:32:52 +0000 (18:32 +0100)]
aco: use soffset for MUBUF instructions on SI/CI

pipeline-db changes for GFX7:

80310 shaders in 40472 tests
Totals:
SGPRS: 3655300 -> 3655900 (0.02 %)
VGPRS: 2677732 -> 2678324 (0.02 %)
Spilled SGPRs: 1730 -> 1730 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 15540 -> 15540 (0.00 %) dwords per thread
Code Size: 136488364 -> 136106120 (-0.28 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601039 -> 601014 (-0.00 %)

Totals from affected shaders:
SGPRS: 316312 -> 316912 (0.19 %)
VGPRS: 273844 -> 274436 (0.22 %)
Spilled SGPRs: 770 -> 770 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 16 -> 16 (0.00 %) dwords per thread
Code Size: 22724904 -> 22342660 (-1.68 %) bytes
LDS: 114 -> 114 (0.00 %) blocks
Max Waves: 30861 -> 30836 (-0.08 %)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Fri, 15 Nov 2019 14:37:13 +0000 (15:37 +0100)]
radv: Enable ACO on GFX7 (Sea Islands)

This patch also disables AMD_shader_ballot on GFX7 by default if ACO is used.
Note that shader_ballot works correctly, but performance seems inferior.
To enable shader_ballot use RADV_PERFTEST=shader_ballot.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 4 Dec 2019 12:41:37 +0000 (13:41 +0100)]
aco: return to loop_active mask at continue_or_break blocks

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Mon, 2 Dec 2019 16:58:12 +0000 (17:58 +0100)]
radv: disable Youngblood app profile if ACO is used

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Thu, 21 Nov 2019 09:23:13 +0000 (10:23 +0100)]
aco: implement exclusive scan for SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 20 Nov 2019 17:51:39 +0000 (18:51 +0100)]
aco: implement inclusive_scan for SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 20 Nov 2019 15:53:42 +0000 (16:53 +0100)]
aco: implement (clustered) reductions for SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 20 Nov 2019 17:57:23 +0000 (18:57 +0100)]
aco: don't use a scalar temporary for reductions on GFX10

This patch also adds the scalar temporary for scans on SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Mon, 18 Nov 2019 17:44:51 +0000 (18:44 +0100)]
aco: flush denorms after fmin/fmax on pre-GFX9

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Mon, 18 Nov 2019 10:15:06 +0000 (11:15 +0100)]
radv: only flush scalar cache for SSBO writes with ACO on GFX8+

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Fri, 15 Nov 2019 15:29:32 +0000 (16:29 +0100)]
aco: disable disassembly for SI/CI due to lack of support by LLVM

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: implement 64bit ine/ieq for SI/CI
Daniel Schürmann [Fri, 8 Nov 2019 12:37:15 +0000 (13:37 +0100)]
aco: implement 64bit ine/ieq for SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: implement 64bit i2b for SI/CI
Daniel Schürmann [Fri, 15 Nov 2019 07:20:06 +0000 (08:20 +0100)]
aco: implement 64bit i2b for SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: make 1/2*PI a literal constant on SI/CI
Daniel Schürmann [Thu, 14 Nov 2019 07:09:32 +0000 (08:09 +0100)]
aco: make 1/2*PI a literal constant on SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: implement 64bit VGPR shifts for SI/CI
Daniel Schürmann [Fri, 8 Nov 2019 10:45:13 +0000 (11:45 +0100)]
aco: implement 64bit VGPR shifts for SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: split read/writelane opcode into VOP2/VOP3 version for SI/CI
Daniel Schürmann [Thu, 7 Nov 2019 17:02:33 +0000 (18:02 +0100)]
aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: fix disassembly of writelane instructions.
Daniel Schürmann [Wed, 4 Dec 2019 09:43:14 +0000 (10:43 +0100)]
aco: fix disassembly of writelane instructions.

ACO writes an unused 3rd operand for internal use,
which makes LLVM recognize it as an illegal instruction.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: recognize SI/CI SMRD hazards
Daniel Schürmann [Wed, 6 Nov 2019 17:09:33 +0000 (18:09 +0100)]
aco: recognize SI/CI SMRD hazards

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: implement quad swizzles for SI/CI
Daniel Schürmann [Wed, 6 Nov 2019 13:01:26 +0000 (14:01 +0100)]
aco: implement quad swizzles for SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: move buffer_store data to VGPR if needed
Daniel Schürmann [Wed, 6 Nov 2019 11:40:14 +0000 (12:40 +0100)]
aco: move buffer_store data to VGPR if needed

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: implement nir_op_isign on SI/CI
Daniel Schürmann [Wed, 6 Nov 2019 09:35:57 +0000 (10:35 +0100)]
aco: implement nir_op_isign on SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: only use scalar loads for readonly buffers on SI/CI
Daniel Schürmann [Wed, 6 Nov 2019 09:13:50 +0000 (10:13 +0100)]
aco: only use scalar loads for readonly buffers on SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: implement nir_op_fquantize2f16 for SI/CI
Daniel Schürmann [Wed, 6 Nov 2019 09:12:26 +0000 (10:12 +0100)]
aco: implement nir_op_fquantize2f16 for SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: fix SMEM offsets for SI/CI
Daniel Schürmann [Tue, 5 Nov 2019 14:27:59 +0000 (15:27 +0100)]
aco: fix SMEM offsets for SI/CI

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: SI/CI - fix sampler aniso
Daniel Schürmann [Tue, 5 Nov 2019 14:24:12 +0000 (15:24 +0100)]
aco: SI/CI - fix sampler aniso

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: handle gfx7 int8/10 clamping on exports
Dave Airlie [Wed, 10 Jul 2019 04:59:46 +0000 (14:59 +1000)]
aco: handle gfx7 int8/10 clamping on exports

Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: Initial GFX7 Support
Daniel Schürmann [Mon, 4 Nov 2019 17:02:47 +0000 (18:02 +0100)]
aco: Initial GFX7 Support

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: refactor visit_store_fs_output() to use the Builder
Daniel Schürmann [Mon, 18 Nov 2019 09:33:40 +0000 (10:33 +0100)]
aco: refactor visit_store_fs_output() to use the Builder

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoanv: Re-emit all compute state on pipeline switch
Jason Ekstrand [Sat, 7 Dec 2019 00:26:59 +0000 (18:26 -0600)]
anv: Re-emit all compute state on pipeline switch

It's a very odd case to hit in the real world.  However, there are some
CTS tests which switch back and forth between dispatch and clear without
changing the pipeline.

Fixes: bc612536eb2f "anv: Emit a dummy MEDIA_VFE_STATE before switching..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>