mesa.git
4 years agodocs: update source code repository documentation
Timothy Arceri [Thu, 28 Nov 2019 04:26:34 +0000 (15:26 +1100)]
docs: update source code repository documentation

This drops all the old documentation around applying for push access.

Also this removes the documentation stating that you can push
directly to mesa rather than using merge requests.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1969
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
4 years agoradv: Fix timeline semaphore refcounting.
Bas Nieuwenhuizen [Fri, 22 Nov 2019 00:51:36 +0000 (01:51 +0100)]
radv: Fix timeline semaphore refcounting.

Was totally broken ...

Removed two if(point) {} because point is always non-NULL and we
were counting on that already for counting, since we NULL our
references to semaphores without active point earlier.

Fixes: 4aa75bb3bdd "radv: Add wait-before-submit support for timelines."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2137
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agowinsys/amdgpu: avoid double simple_mtx_unlock()
Jonathan Gray [Thu, 28 Nov 2019 05:56:30 +0000 (16:56 +1100)]
winsys/amdgpu: avoid double simple_mtx_unlock()

Calling pthread_mutex_unlock() on an unlocked mutex is documented by
POSIX as undefined behaviour.  On OpenBSD pthread_mutex_unlock() will call
abort(3) if this happens.

This occurs in amdgpu_winsys_create() after
cb446dc0fa5c68f681108f4613560543aa4cf553
winsys/amdgpu: Add amdgpu_screen_winsys

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
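
The unlock discipline described in this commit can be sketched in C as follows. This is a minimal illustration, not the amdgpu_winsys_create() code: the mutex name, the counter, and the function names are all hypothetical, and the counter exists only so the "exactly once" property can be observed.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical global lock guarding a device table (name is illustrative). */
static pthread_mutex_t dev_tab_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Counts unlock calls so the "exactly once per lock" property can be checked. */
static int unlock_count;

static void dev_tab_unlock(void)
{
   unlock_count++;
   pthread_mutex_unlock(&dev_tab_mutex);
}

/* Returns true on success.  Each path releases the lock exactly once:
 * the success path unlocks and returns, and the fail label is the only
 * other place that unlocks, so no path can reach a second unlock. */
bool winsys_create_sketch(bool init_fails)
{
   pthread_mutex_lock(&dev_tab_mutex);

   if (init_fails)
      goto fail;

   dev_tab_unlock();
   return true;

fail:
   dev_tab_unlock();
   return false;
}
```

Keeping a single unlock per exit path avoids relying on what the platform does when an already-unlocked mutex is unlocked again.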
4 years agoutil/driconfig: print ATTENTION if MESA_DEBUG=silent is not set
Marek Olšák [Tue, 26 Nov 2019 01:05:47 +0000 (20:05 -0500)]
util/driconfig: print ATTENTION if MESA_DEBUG=silent is not set

unix-bytebenchmark refuses to run if the driver prints ATTENTION to stderr.

Acked-by: Eric Engestrom <eric@engestrom.ch>
4 years agoglsl: handle max uniform limits with lower_const_arrays_to_uniforms
Tapani Pälli [Fri, 8 Nov 2019 06:17:17 +0000 (08:17 +0200)]
glsl: handle max uniform limits with lower_const_arrays_to_uniforms

Fixes arb_tessellation_shader-large-uniforms Piglit test.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoradv: Unify max_descriptor_set_size.
Bas Nieuwenhuizen [Wed, 27 Nov 2019 23:36:24 +0000 (00:36 +0100)]
radv: Unify max_descriptor_set_size.

They were out of sync. Besides syncing, let's ensure they never diverge
again.

Fixes: 8d2654a4197 "radv: Support VK_EXT_inline_uniform_block."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoamd/llvm: Refactor ac_build_scan.
Bas Nieuwenhuizen [Wed, 27 Nov 2019 22:33:59 +0000 (23:33 +0100)]
amd/llvm: Refactor ac_build_scan.

Split out the logic for exclusive scans into a separate function
that makes clear what it does instead of having this opaque 60
line if.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoradv: add more constants to avoid using magic numbers
Samuel Pitoiset [Tue, 26 Nov 2019 07:32:02 +0000 (08:32 +0100)]
radv: add more constants to avoid using magic numbers

Trivial.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoac/llvm: convert src operands to pointers if necessary
Samuel Pitoiset [Wed, 27 Nov 2019 14:32:45 +0000 (15:32 +0100)]
ac/llvm: convert src operands to pointers if necessary

To avoid generating invalid LLVM IR when both operands don't have
the same type. This might happen when performing pointer comparisons
with SPIRV 1.4.

Fixes invalid LLVM IR for:
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrequal.variable_pointers_ssbo_equal
dEQP-VK.spirv_assembly.instruction.spirv1p4.opptrnotequal.variable_pointers_ssbo_not_equal

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agollvmpipe: add initial nir support
Dave Airlie [Thu, 5 Sep 2019 05:49:25 +0000 (15:49 +1000)]
llvmpipe: add initial nir support

This adds the hooks between llvmpipe and the gallivm NIR
code, for compute and fragment shaders.

NIR support is hidden behind LP_DEBUG=nir for now until
all the integration issues are solved.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add swizzle support where one channel isn't defined.
Dave Airlie [Thu, 5 Sep 2019 05:41:05 +0000 (15:41 +1000)]
gallivm: add swizzle support where one channel isn't defined.

NIR doesn't always define all output channels, so this
relies on outputs being memset to 0.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallium: add nir lowering passes for the draw pipe stages. (v2)
Dave Airlie [Thu, 5 Sep 2019 05:47:39 +0000 (15:47 +1000)]
gallium: add nir lowering passes for the draw pipe stages. (v2)

This transforms the NIR shaders like the TGSI transforms worked.

v2: fix some nir info requirements, use 32-bit bools

Acked-by: Roland Scheidegger <sroland@vmware.com>
4 years agodraw: add nir info gathering and building support
Dave Airlie [Thu, 5 Sep 2019 05:47:19 +0000 (15:47 +1000)]
draw: add nir info gathering and building support

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add nir->llvm translation (v2)
Dave Airlie [Thu, 5 Sep 2019 05:46:31 +0000 (15:46 +1000)]
gallivm: add nir->llvm translation (v2)

This adds the initial implementation of the NIR->LLVM conversion
for llvmpipe NIR support.

v2: lower bool to int32 in nir not llvm

Acked-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add selection for non-32 bit types
Dave Airlie [Thu, 5 Sep 2019 05:43:38 +0000 (15:43 +1000)]
gallivm: add selection for non-32 bit types

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: add cttz wrapper
Dave Airlie [Wed, 20 Nov 2019 01:44:22 +0000 (11:44 +1000)]
gallivm: add cttz wrapper

This will be used to write find_lsb support.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
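
As a rough C analogue of building find_lsb on top of a count-trailing-zeros primitive, the sketch below uses the GCC/Clang builtin __builtin_ctz rather than the LLVM intrinsic wrapper added by this commit. The function name is hypothetical, and the -1 result for a zero input is assumed from the GLSL findLSB convention.

```c
#include <stdint.h>

/* GLSL-style findLSB: index of the lowest set bit, or -1 when no bit is set.
 * __builtin_ctz is a GCC/Clang builtin whose result is undefined for 0,
 * hence the explicit guard before calling it. */
int find_lsb_sketch(uint32_t value)
{
   if (value == 0)
      return -1;
   return __builtin_ctz(value);
}
```

The zero check is the only extra work needed on top of the trailing-zero count.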
4 years agogallivm: add popcount intrinsic wrapper
Dave Airlie [Mon, 28 Oct 2019 04:21:43 +0000 (14:21 +1000)]
gallivm: add popcount intrinsic wrapper

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: nir->tgsi info convertor (v2)
Dave Airlie [Thu, 5 Sep 2019 05:32:21 +0000 (15:32 +1000)]
gallivm: nir->tgsi info convertor (v2)

This is a port of the old radeonsi code to be used for llvmpipe NIR support.

Once we remove TGSI support from llvmpipe (I can dream? :-), then
we should be able to refine most of this down and remove it.

v2: port to later radeonsi code for vertex inputs and sampler/io parsing.

Acked-by: Roland Scheidegger <sroland@vmware.com>
4 years agogallivm: split out the flow control ir to a common file.
Dave Airlie [Thu, 5 Sep 2019 05:34:46 +0000 (15:34 +1000)]
gallivm: split out the flow control ir to a common file.

We can share a bunch of flow control handling between NIR and TGSI.

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
4 years agoradeonsi: enable SPIR-V and GL 4.6 for NIR
Marek Olšák [Wed, 6 Nov 2019 23:03:30 +0000 (18:03 -0500)]
radeonsi: enable SPIR-V and GL 4.6 for NIR

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi/nir: support interface output types to fix SPIR-V xfb piglits
Marek Olšák [Thu, 7 Nov 2019 01:50:26 +0000 (20:50 -0500)]
radeonsi/nir: support interface output types to fix SPIR-V xfb piglits

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi/nir: fix location_frac handling for TCS outputs
Marek Olšák [Thu, 7 Nov 2019 01:19:17 +0000 (20:19 -0500)]
radeonsi/nir: fix location_frac handling for TCS outputs

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi/nir: don't rely on data.patch for tess factors
Marek Olšák [Thu, 7 Nov 2019 01:18:23 +0000 (20:18 -0500)]
radeonsi/nir: don't rely on data.patch for tess factors

GLCTS SPIR-V tests have this issue.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi/nir: validate is_patch because SPIR-V doesn't set it for tess factors
Marek Olšák [Thu, 7 Nov 2019 01:12:40 +0000 (20:12 -0500)]
radeonsi/nir: validate is_patch because SPIR-V doesn't set it for tess factors

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi: simplify get_tcs_tes_buffer_address_from_generic_indices
Marek Olšák [Thu, 7 Nov 2019 00:48:34 +0000 (19:48 -0500)]
radeonsi: simplify get_tcs_tes_buffer_address_from_generic_indices

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi: simplify the interface of get_dw_address_from_generic_indices
Marek Olšák [Thu, 7 Nov 2019 00:40:23 +0000 (19:40 -0500)]
radeonsi: simplify the interface of get_dw_address_from_generic_indices

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi/nir: implement subgroup system values for SPIR-V
Marek Olšák [Thu, 7 Nov 2019 00:06:09 +0000 (19:06 -0500)]
radeonsi/nir: implement subgroup system values for SPIR-V

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoac/nir: don't rely on data.patch for tess factors
Marek Olšák [Thu, 7 Nov 2019 01:19:10 +0000 (20:19 -0500)]
ac/nir: don't rely on data.patch for tess factors

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agodrirc: Set vs_position_always_invariant for Shadow of Mordor on Intel
Kenneth Graunke [Fri, 22 Nov 2019 09:37:02 +0000 (01:37 -0800)]
drirc: Set vs_position_always_invariant for Shadow of Mordor on Intel

When drawing the main character in Shadow of Mordor, the game appears
to draw Talion with one vertex shader, and the Wraith with another.
If the compiler optimizes those in different ways which lead to slight
imprecisions, then the resulting positions may not line up, leading to
Z-fighting occurring as the game decides which of the two are in front.

brw_nir_opt_peephole_ffma looks at usages of multiply adds across the
entire shader, and may make different decisions between the two, leading
to such imprecisions and Z-fighting.  This started happening recently
after a NIR change to eliminate unnecessary MOVs (7025dbe7), but that
change simply exposed the existing problem.

Improves performance on Skylake GT4e by 1.22945% +/- 0.398672% (n=3),
likely due to the fixed rendering.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1985
Fixes: 7025dbe794b ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
4 years agodriconf, glsl: Add a vs_position_always_invariant option
Kenneth Graunke [Fri, 22 Nov 2019 00:11:15 +0000 (16:11 -0800)]
driconf, glsl: Add a vs_position_always_invariant option

Many applications use multi-pass rendering and require their vertex
shader position to be computed the same way each time.  Optimizations
may consider, say, fusing a multiply-add based on global usage of an
expression in a shader.  But a second shader with the same expression
may have different code, causing that optimization to make the other
choice the second time around.

The correct solution is for applications to mark their VS outputs
'invariant', indicating they need multiple shaders to compute that
output in the same manner.  However, most applications fail to do so.

So, we add a new driconf option - vs_position_always_invariant - which
forces the gl_Position output in vertex shaders to be marked invariant.

Fixes: 7025dbe794b ("nir: Skip emitting no-op movs from the builder.")
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
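
To illustrate why fusing a multiply-add in one shader but not another can change the computed position, here is a small C sketch. It is not compiler code: both function names are hypothetical, and the fused variant is only emulated by doing the arithmetic in double, which rounds once for the small values used here.

```c
/* Unfused multiply-add: the product is rounded to float before the add.
 * The volatile store keeps the compiler from contracting this into an FMA. */
float mad_unfused(float a, float b, float c)
{
   volatile float product = a * b;
   return product + c;
}

/* Emulated fused multiply-add: the product of two floats is exact in
 * double, so for the example values only the final conversion rounds. */
float mad_fused_emulated(float a, float b, float c)
{
   return (float)((double)a * (double)b + (double)c);
}
```

With a = b = 1 + 2^-12 and c = -1, the unfused version returns 2^-11 while the emulated fused version keeps the extra 2^-24 term. A difference of that size in gl_Position between two passes is the kind of mismatch that marking the output invariant is meant to rule out.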
4 years agoturnip: Disable timestamp queries for now.
Eric Anholt [Wed, 27 Nov 2019 00:16:05 +0000 (16:16 -0800)]
turnip: Disable timestamp queries for now.

They're not implemented, and not critical to bring up immediately.  Avoids
failures in the CTS when nothing gets written to the query.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agofreedreno/perfcntrs/fdperf: add missing a2xx case in select_counter
Jonathan Marek [Wed, 27 Nov 2019 15:46:22 +0000 (10:46 -0500)]
freedreno/perfcntrs/fdperf: add missing a2xx case in select_counter

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/perfcntrs/fdperf: add missing a20x compatible
Jonathan Marek [Wed, 27 Nov 2019 15:45:41 +0000 (10:45 -0500)]
freedreno/perfcntrs/fdperf: add missing a20x compatible

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/perfcntrs/fdperf: fix u64 print on 32-bit builds
Jonathan Marek [Wed, 27 Nov 2019 15:44:57 +0000 (10:44 -0500)]
freedreno/perfcntrs/fdperf: fix u64 print on 32-bit builds

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
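
A common way to print a 64-bit unsigned value portably across 32-bit and 64-bit builds is the PRIu64 macro from <inttypes.h>. The sketch below shows that approach; whether this commit uses exactly this form is an assumption, and the function name is hypothetical.

```c
#include <inttypes.h>
#include <stdio.h>

/* Formats a 64-bit counter value into buf.  PRIu64 expands to the right
 * conversion specifier for uint64_t on the target, so the same format
 * string is correct whether long is 32 or 64 bits wide.
 * Returns the number of characters snprintf would have written. */
int format_counter_sketch(char *buf, size_t len, uint64_t value)
{
   return snprintf(buf, len, "%" PRIu64, value);
}
```

Using "%lu" or "%llu" directly ties the format string to one platform's type widths, which is what tends to break on 32-bit builds.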
4 years agofreedreno/perfcntrs: add a2xx MH counters
Jonathan Marek [Wed, 27 Nov 2019 15:40:59 +0000 (10:40 -0500)]
freedreno/perfcntrs: add a2xx MH counters

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/registers: add missing MH perfcounter enum for a2xx
Jonathan Marek [Wed, 27 Nov 2019 15:38:14 +0000 (10:38 -0500)]
freedreno/registers: add missing MH perfcounter enum for a2xx

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@chromium.org>
4 years agogitlab-ci: Put HTML summary in artifacts for failed piglit jobs
Michel Dänzer [Mon, 25 Nov 2019 17:42:10 +0000 (18:42 +0100)]
gitlab-ci: Put HTML summary in artifacts for failed piglit jobs

This will make it easier to look at details of failed / skipped tests.

Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agogitlab-ci: Stop storing piglit test results as JUnit
Michel Dänzer [Tue, 26 Nov 2019 15:27:07 +0000 (16:27 +0100)]
gitlab-ci: Stop storing piglit test results as JUnit

Since we're not reporting test results as JUnit anymore, we can use the
default JSON format.

This affects how test results are summarized, so update the reference
files accordingly.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agogitlab-ci: Stop reporting piglit test results via JUnit
Michel Dänzer [Tue, 26 Nov 2019 16:44:49 +0000 (17:44 +0100)]
gitlab-ci: Stop reporting piglit test results via JUnit

It was basically useless in this form, and processing the JUnit data in
the GitLab backend was pretty expensive.

Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agov3d: fix indirect BO allocation for uniforms
Iago Toral Quiroga [Tue, 26 Nov 2019 14:28:52 +0000 (15:28 +0100)]
v3d: fix indirect BO allocation for uniforms

We were always ensuring a minimum size of 4 bytes for uniforms
for the case where we don't have any, to account for hardware pre-fetching
of the uniform stream.  However, pre-fetching could also lead to out of
bounds reads when we have read the last uniform in the stream, so we
probably want to have the extra 4 bytes to prevent the kernel from
observing invalid memory accesses when the uniform stream sits right at
the end of a page.

This seems to fix MMU exceptions reported with a Linux 5.4 kernel.

Credit goes to Phil Elwell for identifying the problem and narrowing
it down to memory accesses in the uniform stream.

Reported-by: Phil Elwell <phil@raspberrypi.org>
Tested-by: Phil Elwell <phil@raspberrypi.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
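
The sizing rule described in this commit can be sketched as a small helper. This is an illustration under stated assumptions, not the v3d code: the macro and function names are hypothetical, and the 4-byte pad is taken from the commit text.

```c
#include <stdint.h>

/* Extra bytes reserved after the last uniform so that a hardware
 * pre-fetch past the end of the stream still lands inside the buffer
 * (4 bytes, per the commit message; the macro name is illustrative). */
#define UNIFORM_PREFETCH_PAD 4u

/* Always pads, instead of only enforcing a 4-byte minimum when there
 * are no uniforms at all. */
uint32_t uniform_stream_alloc_size(uint32_t uniform_bytes)
{
   return uniform_bytes + UNIFORM_PREFETCH_PAD;
}
```

With no uniforms the result is still 4 bytes, matching the old minimum, while a non-empty stream now also gets the trailing pad.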
4 years agoradv: enable VK_KHR_shader_subgroup_extended_types on GFX10
Samuel Pitoiset [Mon, 25 Nov 2019 15:53:54 +0000 (16:53 +0100)]
radv: enable VK_KHR_shader_subgroup_extended_types on GFX10

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoac: add 8-bit and 16-bit supports to ac_build_permlane16()
Samuel Pitoiset [Mon, 25 Nov 2019 16:02:44 +0000 (17:02 +0100)]
ac: add 8-bit and 16-bit supports to ac_build_permlane16()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv/gfx10: fix implementation of exclusive scans
Samuel Pitoiset [Fri, 23 Aug 2019 15:53:05 +0000 (17:53 +0200)]
radv/gfx10: fix implementation of exclusive scans

This implementation is loosely based on ROCm.
https://github.com/RadeonOpenCompute/ROCm-Device-Libs/blob/master/ockl/src/wfredscan.cl

This fixes dEQP-VK.subgroups.arithmetic.*.subgroupexclusive* on GFX10.

Fixes: 227c29a80de ("amd/common/gfx10: implement scan & reduce operations")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: fix enabling sample shading with SampleID/SamplePosition
Samuel Pitoiset [Tue, 26 Nov 2019 17:29:00 +0000 (18:29 +0100)]
radv: fix enabling sample shading with SampleID/SamplePosition

When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.

Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
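
The rule stated in this commit can be expressed as a small predicate. The struct, field, and function names below are hypothetical and do not reflect radv's actual data structures; the sketch only encodes the condition from the commit message.

```c
#include <stdbool.h>

/* Hypothetical summary of which sample-related inputs a fragment
 * shader reads. */
struct fs_inputs_sketch {
   bool reads_sample_id;
   bool reads_sample_position;
};

/* Sample shading must be on if the application asked for it, or if the
 * shader reads SampleId or SamplePosition (which implies a minimum
 * sample shading factor of 1.0). */
bool sample_shading_needed(const struct fs_inputs_sketch *fs,
                           bool app_enabled)
{
   return app_enabled || fs->reads_sample_id || fs->reads_sample_position;
}
```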
4 years agoturnip: fix integer render targets
Jonathan Marek [Tue, 19 Nov 2019 04:01:18 +0000 (23:01 -0500)]
turnip: fix integer render targets

Add missing required bits.  Fixes at least:

dEQP-VK.pipeline.render_to_image.dedicated_allocation.1d.small.r16g16_sint_d24_unorm_s8_uint
dEQP-VK.pipeline.render_to_image.dedicated_allocation.2d.mipmap.r16g16_sint_d24_unorm_s8_uint
dEQP-VK.renderpass.dedicated_allocation.attachment.4.401
dEQP-VK.renderpass2.suballocation.formats.r16_uint.load.draw
dEQP-VK.synchronization.op.single_queue.barrier.write_draw_read_copy_image_to_buffer.image_128x128_r16_uint

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoanv: Push constants are relative to dynamic state on IVB
Jason Ekstrand [Mon, 25 Nov 2019 17:05:42 +0000 (11:05 -0600)]
anv: Push constants are relative to dynamic state on IVB

Fixes: aecde2351 "anv: Pre-compute push ranges for graphics pipelines"
Closes: #2136
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agomeson: Add -Werror=gnu-empty-initializer to MSVC compat args
Dylan Baker [Thu, 21 Nov 2019 17:11:45 +0000 (09:11 -0800)]
meson: Add -Werror=gnu-empty-initializer to MSVC compat args

Only clang has this argument (at least as of clang 8 and gcc 9), which
errors when using the gcc empty initializer syntax in C:

```C
struct foo f = {};
```

GCC has a warning for this, but only when using -Wpedantic, which is a
lot of noise to lose useful warnings in.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agogallium/auxiliary: Fix uses of gnu struct = {} extension
Dylan Baker [Thu, 21 Nov 2019 17:50:27 +0000 (09:50 -0800)]
gallium/auxiliary: Fix uses of gnu struct = {} extension

Most of these will never actually be compiled on Windows, but in the
interest of being able to make using struct foo = {}; an error and
avoiding breaking Windows, removing a handful of safe uses seems like a
good trade off.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
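
A portable replacement for the GNU empty initializer is to initialize the first member explicitly, which zero-initializes the rest of the struct. The struct and function names in this sketch are hypothetical.

```c
struct foo_sketch {
   int a;
   float b;
   void *ptr;
};

struct foo_sketch make_zeroed_foo(void)
{
   /* "= {}" is a GNU extension in C (it only became standard in C23);
    * "= {0}" is valid in C89 and later, so MSVC accepts it too.  Members
    * without an explicit initializer are zero-initialized. */
   struct foo_sketch f = {0};
   return f;
}
```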
4 years agost/mesa: add st_variant base class to simplify code for shader variants
Marek Olšák [Mon, 25 Nov 2019 22:58:45 +0000 (17:58 -0500)]
st/mesa: add st_variant base class to simplify code for shader variants

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agost/mesa: don't use ** in the st_nir_link_shaders signature
Marek Olšák [Sat, 23 Nov 2019 00:31:46 +0000 (19:31 -0500)]
st/mesa: don't use ** in the st_nir_link_shaders signature

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agost/mesa: simplify looping over linked shaders when linking NIR
Marek Olšák [Tue, 19 Nov 2019 22:30:03 +0000 (17:30 -0500)]
st/mesa: simplify looping over linked shaders when linking NIR

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agost/mesa: propagate gl_PatchVerticesIn from TCS to TES before linking for NIR
Marek Olšák [Wed, 13 Nov 2019 04:48:02 +0000 (23:48 -0500)]
st/mesa: propagate gl_PatchVerticesIn from TCS to TES before linking for NIR

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agost/mesa: don't call ProgramStringNotify in glsl_to_nir
Marek Olšák [Wed, 13 Nov 2019 04:46:37 +0000 (23:46 -0500)]
st/mesa: don't call ProgramStringNotify in glsl_to_nir

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agost/mesa: don't use redundant stp->state.ir.nir
Marek Olšák [Thu, 21 Nov 2019 00:18:21 +0000 (19:18 -0500)]
st/mesa: don't use redundant stp->state.ir.nir

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agost/mesa: don't serialize all streamout state if there are no SO outputs
Marek Olšák [Mon, 25 Nov 2019 22:01:42 +0000 (17:01 -0500)]
st/mesa: don't serialize all streamout state if there are no SO outputs

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agoiris: Disable VF cache partial address workaround on Gen11+
Kenneth Graunke [Mon, 25 Nov 2019 18:04:38 +0000 (10:04 -0800)]
iris: Disable VF cache partial address workaround on Gen11+

The vertex cache uses the full 48-bit address on Gen11+.  See the
documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
workaround and lists it as pre-Icelake.

Interestingly, the docs don't mention index buffers as needing a
workaround at all.  So either we've been overzealous, or the docs
never got updated to record that.  Which begs the question of whether
the issue there was fixed, if there was one...

Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; appears
that it improves performance by about 1-2% on Icelake 8x8 (not frequency
locked).

4 years agofreedreno: switch to layout helper
Rob Clark [Sun, 5 May 2019 17:59:37 +0000 (10:59 -0700)]
freedreno: switch to layout helper

The slices table and most of the other layout fields in the
freedreno_resource move into fdl_layout.

v2: Changes by anholt to not have duplicate fields, which was introducing
    a surprising behavior change in resource layout (using the
    level_linear helper before the setup of the shadowed fields)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/a6xx: Log the tiling mode in resource layout debug.
Eric Anholt [Thu, 21 Nov 2019 04:54:27 +0000 (20:54 -0800)]
freedreno/a6xx: Log the tiling mode in resource layout debug.

This was important for figuring out what went wrong with the layout
refactor.

Acked-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno: Convert the slice struct to the new resource header.
Eric Anholt [Wed, 20 Nov 2019 20:40:25 +0000 (12:40 -0800)]
freedreno: Convert the slice struct to the new resource header.

This gets the worst of the sed required for shared resource layout out of
the way.  The texture layout comment is dropped now that we're referencing
the shared header, which has a more complete description.

Acked-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno: Introduce a resource layout header.
Eric Anholt [Wed, 20 Nov 2019 20:28:43 +0000 (12:28 -0800)]
freedreno: Introduce a resource layout header.

This will be used for sharing resource layout code between freedreno and
tu.  Mostly copied from a commit by Rob, with a new location and the slice
struct renamed for consistency.

Acked-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno: Introduce a fd_resource_tile_mode() helper.
Eric Anholt [Wed, 20 Nov 2019 21:17:27 +0000 (13:17 -0800)]
freedreno: Introduce a fd_resource_tile_mode() helper.

Multiple places were doing the same thing to get the tile mode of a level,
so refactor it out.  This will make the shared resource helper transition
cleaner.

Acked-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno: Introduce a fd_resource_layer_stride() helper.
Eric Anholt [Wed, 20 Nov 2019 20:55:56 +0000 (12:55 -0800)]
freedreno: Introduce a fd_resource_layer_stride() helper.

This factors out a bit of duplicated code, but will also make the shared
resource layout transition process clearer.

Acked-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno: use rsc->slice accessor everywhere
Rob Clark [Sun, 5 May 2019 15:10:24 +0000 (08:10 -0700)]
freedreno: use rsc->slice accessor everywhere

This will make it easier to extract the slice table out into a layout
helper.

Acked-by: Rob Clark <robdclark@chromium.org>
4 years agonir: Make algebraic backtrack and reprocess after a replacement.
Eric Anholt [Wed, 2 Oct 2019 17:59:13 +0000 (10:59 -0700)]
nir: Make algebraic backtrack and reprocess after a replacement.

The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)) to
transform:

result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);

->

temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);

nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.

Back in 2016 in 7be8d0773229 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first.  This helped his cases, but I
believe introduced this failure mode.  Instead of reverting that, now
that we've got the automaton, we can update the automaton's state
recursively and just re-process any instructions whose state has
changed (indicating that they might match new things).  There's a
small chance that the state will hash to the same value and miss out
on this round of algebraic, but this seems to be good enough to fix
dEQP.

Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):

Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling
  outliers removed)
dEQP-GLES2.functional.uniform_api.random.3 runtime
  -65.3512% +/- 4.22369% (n=21, was 1.4s)
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime
  -68.8066% +/- 6.49523% (was 4.8s)

v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of
    tricky code.  Runtime of uniform_api.random.3 down -0.790299% +/-
    0.244213% compared to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2,
    fixing infinite loops.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
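
The rewrite discussed in this commit rests on the identity that multiplying two bool-to-float conversions equals converting the logical AND of the bools. The C sketch below checks that identity on a chain of conditions. It models only the arithmetic, not the NIR pass or its worklists, and all function names are illustrative.

```c
#include <stdbool.h>

/* bool-to-float, as b2f does: true -> 1.0, false -> 0.0. */
float b2f(bool x)
{
   return x ? 1.0f : 0.0f;
}

/* Original form: a chain of float multiplies of b2f() results. */
float chain_as_fmuls(const bool *conds, int count)
{
   float result = 1.0f;
   for (int i = 0; i < count; i++)
      result *= b2f(conds[i]);
   return result;
}

/* Rewritten form: AND the conditions together, convert once at the end. */
float chain_as_iands(const bool *conds, int count)
{
   bool temp = true;
   for (int i = 0; i < count; i++)
      temp = temp && conds[i];
   return b2f(temp);
}
```

Both forms return 1.0 only when every condition holds. Collapsing a chain of n multiplies takes n - 1 applications of the rule, which is why re-processing affected instructions within one pass matters.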
4 years agonir: Refactor algebraic's block walk
Eric Anholt [Wed, 13 Nov 2019 20:15:08 +0000 (12:15 -0800)]
nir: Refactor algebraic's block walk

My motivation was to clarify the changes in the following commit, but
incidentally, it reduces runtime of
dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy
testcase) by -5.39524% +/- 2.21179% (n=15)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agonir: Maintain the algebraic automaton's state as we work.
Connor Abbott [Fri, 10 May 2019 14:57:45 +0000 (16:57 +0200)]
nir: Maintain the algebraic automaton's state as we work.

In order to have nir_opt_algebraic be able to do further algebraic
work on the output of a replacement, we need to maintain the
automaton's state.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoetnaviv: support 3d/array/integer formats in texture descriptors
Jonathan Marek [Sun, 20 Oct 2019 06:11:45 +0000 (02:11 -0400)]
etnaviv: support 3d/array/integer formats in texture descriptors

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: blt: fix partial ZS clears with TS
Jonathan Marek [Fri, 9 Aug 2019 14:55:46 +0000 (10:55 -0400)]
etnaviv: blt: fix partial ZS clears with TS

If not all bits are cleared, then BLT needs to be given the current clear
value and not the new one.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoaco: don't value-number instructions from within a loop with ones after the loop.
Daniel Schürmann [Tue, 26 Nov 2019 14:28:54 +0000 (15:28 +0100)]
aco: don't value-number instructions from within a loop with ones after the loop.

Fixes:
Wolfenstein:Youngblood (w/o shader_ballot)
dEQP-VK.descriptor_indexing.combined_image_sampler_in_loop_with_lod

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: set dlc/glc correctly for image loads
Rhys Perry [Fri, 20 Sep 2019 11:08:19 +0000 (12:08 +0100)]
aco: set dlc/glc correctly for image loads

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
4 years agoaco: allow constant offsets for global/scratch instructions on GFX10
Rhys Perry [Wed, 20 Nov 2019 14:52:15 +0000 (14:52 +0000)]
aco: allow constant offsets for global/scratch instructions on GFX10

I don't think the bug applies for global/scratch instructions and
load_barycentric_at_sample selection expects this feature to work.

Fixes various dEQP-VK.pipeline.multisample_interpolation.* tests on GFX10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
4 years agoradv: Enable VK_KHR_buffer_device_address.
Bas Nieuwenhuizen [Tue, 26 Nov 2019 00:00:20 +0000 (01:00 +0100)]
radv: Enable VK_KHR_buffer_device_address.

Still no capture/replay or multi device support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoradv: fix reporting subgroup size with VK_KHR_pipeline_executable_properties
Samuel Pitoiset [Mon, 25 Nov 2019 13:24:52 +0000 (14:24 +0100)]
radv: fix reporting subgroup size with VK_KHR_pipeline_executable_properties

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: Allocate cmdbuffer space for buffer marker write.
Bas Nieuwenhuizen [Mon, 25 Nov 2019 22:58:04 +0000 (23:58 +0100)]
radv: Allocate cmdbuffer space for buffer marker write.

Fixes: 946193ae008 "radv: add support for VK_AMD_buffer_marker"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agor600: Disable eight bit three channel formats
Gert Wollny [Mon, 18 Nov 2019 10:57:00 +0000 (11:57 +0100)]
r600: Disable eight bit three channel formats

Commit 0899bf55 made some deqp-gles3 tests related to RGB8 PBOs fail
on R600 because it exposed PIPE_FORMAT_R8G8B8_UNORM and R600 doesn't
properly handle this. Disabling this format also for buffers fixes the
issue.

In addition, disabling also the related RGB8 integer formats for buffers
fixes some deqp-gles3 tests:

  dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb8ui_cube
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_2d
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_cube
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_2d
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_cube
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_2d_array
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_3d
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_2d_array
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_3d
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_2d_array
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_3d
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_2d_array
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_3d

Fixes: 0899bf55
  st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM

Closes #2118

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoac/llvm: fix warning in ac_build_canonicalize()
Samuel Pitoiset [Mon, 25 Nov 2019 07:51:16 +0000 (08:51 +0100)]
ac/llvm: fix warning in ac_build_canonicalize()

../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’:
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
 4567 |  return ac_build_intrinsic(ctx, intr, type, params, 1,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 4568 |       AC_FUNC_ATTR_READNONE);
      |       ~~~~~~~~~~~~~~~~~~~~~~
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized]

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agomapi: add GetInteger64vEXT with EXT_disjoint_timer_query
Tapani Pälli [Tue, 19 Nov 2019 10:44:29 +0000 (12:44 +0200)]
mapi: add GetInteger64vEXT with EXT_disjoint_timer_query

From EXT_disjoint_timer_query spec:

   "Interaction: This extension adds GetInteger64vEXT if
    OpenGL ES 3.0 is not supported"

See https://github.com/KhronosGroup/OpenGL-Registry/issues/326.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2090
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agovulkan: Update the XML and headers to 1.1.129
Jason Ekstrand [Mon, 25 Nov 2019 17:20:42 +0000 (11:20 -0600)]
vulkan: Update the XML and headers to 1.1.129

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agoanv/entrypoints: Better handle promoted extensions
Jason Ekstrand [Wed, 26 Jun 2019 18:57:34 +0000 (13:57 -0500)]
anv/entrypoints: Better handle promoted extensions

In the case of promoted extensions we can end up with an entrypoint that
we support being an alias of an entrypoint we do not support.  For
instance, if an extension gets promoted from EXT to KHR, the EXT entry-
points may be aliases of the KHR ones.  We want to leave everything as
EXT until we get around to advertising the KHR so that we don't break
things when we update the XML and headers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agovulkan/enum_to_str: Handle out-of-order aliases
Jason Ekstrand [Wed, 26 Jun 2019 18:35:54 +0000 (13:35 -0500)]
vulkan/enum_to_str: Handle out-of-order aliases

The current code can only handle enum aliases if the original enum is
declared first followed by the alias as we walk the XML in a linear
fashion.  This commit allows us to handle aliases where the alias
declaration comes before the thing it's aliasing.
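
A minimal sketch of the idea, not the actual generator code: collect the
real enum values and the aliases separately, then resolve aliases only
after the whole list has been read, so declaration order no longer
matters. The function name, the (name, value, alias) tuple layout and
the VK_FOO_* names are assumptions made for illustration.

```python
def resolve_enum_aliases(entries):
    """Resolve enum aliases regardless of declaration order.

    entries is a list of (name, value, alias) tuples in XML order
    (hypothetical layout); alias is None for a real enum value.
    """
    # First pass: split real values from aliases without resolving anything.
    values = {name: value for name, value, alias in entries if alias is None}
    aliases = {name: alias for name, value, alias in entries if alias is not None}

    # Second pass: every real value is known now, so an alias that was
    # declared before its target can still be resolved.
    resolved = dict(values)
    for name in aliases:
        target = name
        seen = set()
        while target in aliases:  # follow chains of aliases
            if target in seen:
                raise ValueError(f"alias cycle at {target}")
            seen.add(target)
            target = aliases[target]
        if target not in values:
            raise KeyError(f"{name} aliases unknown enum {target}")
        resolved[name] = values[target]
    return resolved


# The alias is declared before the enum it refers to.
table = resolve_enum_aliases([
    ("VK_FOO_EXT", None, "VK_FOO_KHR"),
    ("VK_FOO_KHR", 7, None),
])
```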

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoiris: Update SURFACE_STATE addresses when setting sampler views
Kenneth Graunke [Fri, 15 Nov 2019 23:18:06 +0000 (15:18 -0800)]
iris: Update SURFACE_STATE addresses when setting sampler views

We may have replaced the backing storage for a texture buffer while it
was unbound, at which point iris_rebind_buffer would not have caught it
and updated it.  We need to ensure that the current resource's address
matches the one our SURFACE_STATE points at.  If not, update addresses
and re-upload the SURFACE_STATE.
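
The check described above can be sketched roughly as follows. This is an
illustrative model only, not the driver code: Resource, SurfaceState and
update_surface_state_address are hypothetical names, and the uploads
counter merely stands in for re-uploading the state to the GPU.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    address: int  # current address of the backing storage


@dataclass
class SurfaceState:
    address: int      # address recorded in the saved SURFACE_STATE
    uploads: int = 0  # stand-in for GPU re-uploads


def update_surface_state_address(state, resource):
    """Bring the saved state back in sync with the resource if needed."""
    if state.address == resource.address:
        return False  # still points at the right storage, nothing to do
    state.address = resource.address
    state.uploads += 1  # a real driver would re-upload the state here
    return True


res = Resource(address=0x1000)
view = SurfaceState(address=0x1000)
unchanged = update_surface_state_address(view, res)

# The backing storage is replaced while the view is unbound ...
res.address = 0x2000
# ... so the mismatch is caught when the view is set again.
refreshed = update_surface_state_address(view, res)
```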

Shader images and buffers do not suffer from this problem because we
re-stream the surface state on every set call, since there isn't a
created CSO object for those with a saved SURFACE_STATE.  Constant
buffers are also currently re-streamed (we pitch the SURFACE_STATE
on every set_constant_buffer call).  Surfaces would need this
treatment (as they're created CSOs) except that we never swap out
their backing storage today (we only do it for buffers), so it's OK
for now.

Fixes misrendering in Unreal 4 demos (Elemental, Matinee Fight Scene).
Huge thanks to Andrii Simiklit for tracking down the problem - it was
quite difficult to find!  Also fixes Andrii's new Piglit test for the
bug, 'arb_texture_buffer_object-re-init'.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1365
5 years agoiris: Maintain CPU-side SURFACE_STATE copies for views and surfaces.
Kenneth Graunke [Fri, 15 Nov 2019 01:17:43 +0000 (17:17 -0800)]
iris: Maintain CPU-side SURFACE_STATE copies for views and surfaces.

When replacing the backing storage for texture buffers, image buffers,
and so on, we may need to update the "Surface Base Address" field in
any corresponding SURFACE_STATE.  This is easier to accomplish if we
have a copy on the CPU - we can just compare the current field, update
it, and re-upload.

This patch adds a CPU-side copy to the new iris_surface_state wrapper
struct, and reworks allocation and upload to fill things out on the
CPU copy first, then upload that to the GPU when finished.

This will be necessary to fix iris_invalidate_resource bugs shortly.

Technically, we never replace the backing storage for pipe_surfaces
(render targets), so we don't need to make this change there.  However,
it's nice to have surfaces, sampler views, and image views handled
similarly.  Plus, if we ever wanted to swap out backing storage for
busy textures, we'd need this infrastructure.

v2: Properly free memory (caught by Andrii Simiklit)

5 years agoiris: Create an "iris_surface_state" wrapper struct
Kenneth Graunke [Fri, 15 Nov 2019 00:06:10 +0000 (16:06 -0800)]
iris: Create an "iris_surface_state" wrapper struct

Today, we only have a state reference to the GPU buffer containing our
uploaded SURFACE_STATEs.  However, we're going to want a CPU-side copy
soon.  Making a wrapper struct means we can talk about both together,
and also put both in the field called "surface_state".

5 years agoiris: Drop 'old_address' parameter from iris_rebind_buffer
Kenneth Graunke [Thu, 31 Oct 2019 16:41:49 +0000 (09:41 -0700)]
iris: Drop 'old_address' parameter from iris_rebind_buffer

We can just compare the VERTEX_BUFFER_STATE address field to the
current BO's address.  When calling rebind, we've already updated
the resource to the new buffer, but the state will have the old
address.

5 years agoiris: Stop mutating the resource in get_rt_read_isl_surf().
Kenneth Graunke [Fri, 15 Nov 2019 07:26:07 +0000 (23:26 -0800)]
iris: Stop mutating the resource in get_rt_read_isl_surf().

Mutating fields of global resources is generally not safe, and the only
reason we were doing it was to avoid passing an extra parameter to
the fill_surface_state helper.

5 years agoradeonsi/nir: don't run si_nir_opts again if there is no change
Marek Olšák [Sat, 23 Nov 2019 03:47:02 +0000 (22:47 -0500)]
radeonsi/nir: don't run si_nir_opts again if there is no change

0.3% less overhead
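
One plausible reading of the change, as a sketch rather than the actual
radeonsi code: have the preceding pass report whether it changed
anything, and skip the second optimization round when it did not. The
lower_fdiv pass, the finalize helper and the list-of-opcodes shader
model are all assumptions made for illustration.

```python
def lower_fdiv(shader):
    """Hypothetical lowering pass: rewrite fdiv as frcp + fmul.

    Returns True if the shader was modified.
    """
    changed = False
    out = []
    for op in shader:
        if op == "fdiv":
            out += ["frcp", "fmul"]
            changed = True
        else:
            out.append(op)
    shader[:] = out
    return changed


def finalize(shader, lowering_pass, optimize):
    """Run the optimizer again only if the lowering pass made progress."""
    if lowering_pass(shader):
        optimize(shader)
        return True
    return False


opt_runs = []
with_div = ["fadd", "fdiv"]
without_div = ["fadd", "fmul"]
ran_first = finalize(with_div, lower_fdiv, opt_runs.append)
ran_second = finalize(without_div, lower_fdiv, opt_runs.append)
```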

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agoradeonsi: initialize the per-context compiler on demand
Marek Olšák [Wed, 20 Nov 2019 23:40:46 +0000 (18:40 -0500)]
radeonsi: initialize the per-context compiler on demand

This takes a noticeable amount of time in piglit and some tests don't
need it.
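
The on-demand pattern can be sketched like this. It is a generic
illustration, not the radeonsi implementation: the Context class, the
get_compiler accessor and the placeholder compiler object are
hypothetical.

```python
class Context:
    """Creates its compiler on first use instead of at context creation."""

    def __init__(self):
        self._compiler = None
        self.compiler_inits = 0  # counts how often the expensive path ran

    def _create_compiler(self):
        # Stand-in for the expensive compiler setup.
        self.compiler_inits += 1
        return object()

    def get_compiler(self):
        if self._compiler is None:
            self._compiler = self._create_compiler()
        return self._compiler


ctx = Context()
inits_before_use = ctx.compiler_inits  # nothing paid at creation time
first = ctx.get_compiler()
second = ctx.get_compiler()
```

A context that never compiles a shader never pays the setup cost, and
repeated calls reuse the same compiler instance.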

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
5 years agoac: set swizzled bit in cache policy as a hint not to merge loads/stores
Marek Olšák [Fri, 22 Nov 2019 22:41:22 +0000 (17:41 -0500)]
ac: set swizzled bit in cache policy as a hint not to merge loads/stores

LLVM now merges loads and stores for all opcodes, so this must be set.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agonir: Add a scheduler pass to reduce maximum register pressure.
Eric Anholt [Tue, 19 Feb 2019 17:30:52 +0000 (09:30 -0800)]
nir: Add a scheduler pass to reduce maximum register pressure.

This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable.  A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.

Results for v3d:

total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)
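
For reference, the percentages above are relative changes,
(after - before) / before * 100, rounded to two decimals. A small
sketch that reproduces the lines (the helper names are made up for
this example):

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def format_stat(name, before, after):
    return f"{name}: {before} -> {after} ({percent_change(before, after):.2f}%)"


spills = format_stat("total spills in shared programs", 4984, 472)
instructions = format_stat("total instructions in shared programs",
                           6497588, 6518242)
```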

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)
v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
    patch, include SSA defs in live values so we can switch to bottom-up
    if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
    successfully register allocate on GLES3.1 dEQP.  Make sure that
    discards don't move after store_output.  Comment spelling fix.

5 years agoetnaviv: implement 64bpp clear
Jonathan Marek [Mon, 12 Aug 2019 15:43:26 +0000 (11:43 -0400)]
etnaviv: implement 64bpp clear

At the same time, update etna_clear_blit_pack_rgba to work with integer
formats.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
5 years agoetnaviv: avoid using RS for 64bpp formats
Jonathan Marek [Mon, 12 Aug 2019 15:34:57 +0000 (11:34 -0400)]
etnaviv: avoid using RS for 64bpp formats

At the same time, this change allows using BLT for 8bpp formats.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
5 years agoetnaviv: add support for extended pe formats
Christian Gmeiner [Fri, 14 Jun 2019 06:22:07 +0000 (08:22 +0200)]
etnaviv: add support for extended pe formats

Use the extended format if such a format was passed.

v1 -> v2:
 - set FORMAT_MASK bit when using ext PE format as suggested
   by Wladimir J. van der Laan

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
5 years agoetnaviv: handle 8 byte block in tiling
Christian Gmeiner [Tue, 1 May 2018 14:48:41 +0000 (16:48 +0200)]
etnaviv: handle 8 byte block in tiling

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
5 years agoradv: select the depth decompress path based on the aspect mask
Samuel Pitoiset [Thu, 17 Oct 2019 13:05:59 +0000 (15:05 +0200)]
radv: select the depth decompress path based on the aspect mask

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: create decompress pipelines for separate depth/stencil layouts
Samuel Pitoiset [Thu, 17 Oct 2019 12:57:04 +0000 (14:57 +0200)]
radv: create decompress pipelines for separate depth/stencil layouts

No functional changes as the driver still uses the depth+stencil
pipeline.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: rework creation of decompress/resummarize meta pipelines
Samuel Pitoiset [Thu, 17 Oct 2019 12:48:23 +0000 (14:48 +0200)]
radv: rework creation of decompress/resummarize meta pipelines

This refactoring will help for creating more decompress pipelines.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: set the image view aspect mask before resolves
Samuel Pitoiset [Thu, 17 Oct 2019 13:26:07 +0000 (15:26 +0200)]
radv: set the image view aspect mask before resolves

No functional changes, but it will be used to decompress
separate depth/stencil aspects.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: set the image view aspect mask during subpass transitions
Samuel Pitoiset [Wed, 16 Oct 2019 12:13:52 +0000 (14:13 +0200)]
radv: set the image view aspect mask during subpass transitions

No functional changes because the aspect mask is still not used
during image transitions but it will be needed for the separate
depth/stencil aspects logic.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoaco: enable load/store vectorizer
Rhys Perry [Wed, 18 Sep 2019 19:31:33 +0000 (20:31 +0100)]
aco: enable load/store vectorizer

Totals from affected shaders:
SGPRS: 1890373 -> 1900772 (0.55 %)
VGPRS: 1210024 -> 1215244 (0.43 %)
Spilled SGPRs: 828 -> 828 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 252 -> 252 (0.00 %) dwords per thread
Code Size: 81937504 -> 74608304 (-8.94 %) bytes
LDS: 746 -> 746 (0.00 %) blocks
Max Waves: 230491 -> 230158 (-0.14 %)

In NieR:Automata and GTA V, the code decrease is especially large: -13.79%
and -15.32%, respectively.

v9: rework the callback function
v10: handle load_shared/store_shared in the callback

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
5 years agonir: add load/store vectorizer tests
Rhys Perry [Mon, 2 Sep 2019 15:09:24 +0000 (16:09 +0100)]
nir: add load/store vectorizer tests

v7: run nir_opt_algebraic
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v10: add tests for 64-bit offsets
v10: add tests for signed offsets

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)