Eric Anholt [Wed, 20 Nov 2019 21:17:27 +0000 (13:17 -0800)]
freedreno: Introduce a fd_resource_tile_mode() helper.
Multiple places were doing the same thing to get the tile mode of a level,
so refactor it out. This will make the shared resource helper transition
cleaner.
Acked-by: Rob Clark <robdclark@chromium.org>
Eric Anholt [Wed, 20 Nov 2019 20:55:56 +0000 (12:55 -0800)]
freedreno: Introduce a fd_resource_layer_stride() helper.
This factors out a bit of duplicated code, but will also make the shared
resource layout transition process clearer.
Acked-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Sun, 5 May 2019 15:10:24 +0000 (08:10 -0700)]
freedreno: use rsc->slice accessor everywhere
This will make it easier to extract the slice table out into a layout
helper.
Acked-by: Rob Clark <robdclark@chromium.org>
Eric Anholt [Wed, 2 Oct 2019 17:59:13 +0000 (10:59 -0700)]
nir: Make algebraic backtrack and reprocess after a replacement.
The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)) to
transform:
result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);
->
temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);
nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.
Back in 2016 in 7be8d0773229 ("nir: Do opt_algebraic in reverse order."),
Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first. This helped his cases, but I
believe introduced this failure mode. Instead of reverting that, now
that we've got the automaton, we can update the automaton's state
recursively and just re-process any instructions whose state has
changed (indicating that they might match new things). There's a
small chance that the state will hash to the same value and miss out
on this round of algebraic, but this seems to be good enough to fix
dEQP.
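To illustrate the failure mode and the fix, here is a toy model in Python.
It is only a sketch: it does not use NIR's automaton or the two worklists,
and every name in it (build_chain, try_rewrite, run_pass,
passes_to_fixpoint) is hypothetical. It counts how many bottom-to-top
passes make a replacement, with and without re-queueing the users of a
replaced instruction.

```python
def build_chain(n):
    """Build toy SSA for b2f(c0) * b2f(c1) * ... * b2f(c{n-1})."""
    instrs = {}   # dest name -> (op, srcs)
    order = []    # dest names in program order

    def emit(dest, op, *srcs):
        instrs[dest] = (op, srcs)
        order.append(dest)

    emit("f0", "b2f", "c0")
    acc = "f0"
    for i in range(1, n):
        emit(f"f{i}", "b2f", f"c{i}")
        emit(f"m{i}", "fmul", acc, f"f{i}")
        acc = f"m{i}"
    return instrs, order


def try_rewrite(instrs, dest):
    """Apply fmul(b2f(x), b2f(y)) -> b2f(iand(x, y)); True on a match."""
    op, srcs = instrs[dest]
    if op != "fmul":
        return False
    a, b = (instrs.get(s) for s in srcs)
    if not (a and b and a[0] == "b2f" and b[0] == "b2f"):
        return False
    tmp = f"t{len(instrs)}"          # unique, since instrs only grows
    instrs[tmp] = ("iand", (a[1][0], b[1][0]))
    instrs[dest] = ("b2f", (tmp,))   # rewrite in place for simplicity
    return True


def run_pass(instrs, order, reprocess):
    """One bottom-to-top walk; returns the number of replacements."""
    worklist = list(order)           # pop() takes the last instruction first
    replaced = 0
    while worklist:
        dest = worklist.pop()
        if try_rewrite(instrs, dest):
            replaced += 1
            if reprocess:
                # Users of the replaced value may match new patterns now.
                worklist.extend(d for d, (_, srcs) in instrs.items()
                                if dest in srcs)
    return replaced


def passes_to_fixpoint(n, reprocess):
    """Count the passes that made at least one replacement."""
    instrs, order = build_chain(n)
    passes = 0
    while run_pass(instrs, order, reprocess):
        passes += 1
    return passes
```

In this model a chain of 8 terms needs 7 productive passes without
reprocessing and a single pass with it.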
Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):
Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling
outliers removed)
dEQP-GLES2.functional.uniform_api.random.3 runtime
-65.3512% +/- 4.22369% (n=21, was 1.4s)
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime
-68.8066% +/- 6.49523% (was 4.8s)
v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of
tricky code. Runtime of uniform_api.random.3 changed by -0.790299% +/-
0.244213% compared to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2,
fixing infinite loops.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Eric Anholt [Wed, 13 Nov 2019 20:15:08 +0000 (12:15 -0800)]
nir: Refactor algebraic's block walk
My motivation was to clarify the changes in the following commit, but
incidentally, it reduces runtime of
dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy
testcase) by 5.39524% +/- 2.21179% (n=15)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Connor Abbott [Fri, 10 May 2019 14:57:45 +0000 (16:57 +0200)]
nir: Maintain the algebraic automaton's state as we work.
In order to have nir_opt_algebraic be able to do further algebraic
work on the output of a replacement, we need to maintain the
automaton's state.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jonathan Marek [Sun, 20 Oct 2019 06:11:45 +0000 (02:11 -0400)]
etnaviv: support 3d/array/integer formats in texture descriptors
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Fri, 9 Aug 2019 14:55:46 +0000 (10:55 -0400)]
etnaviv: blt: fix partial ZS clears with TS
If not all bits are cleared, then BLT needs to be given the current clear
value and not the new one.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Daniel Schürmann [Tue, 26 Nov 2019 14:28:54 +0000 (15:28 +0100)]
aco: don't value-number instructions from within a loop with ones after the loop.
Fixes:
Wolfenstein:Youngblood (w/o shader_ballot)
dEQP-VK.descriptor_indexing.combined_image_sampler_in_loop_with_lod
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Rhys Perry [Fri, 20 Sep 2019 11:08:19 +0000 (12:08 +0100)]
aco: set dlc/glc correctly for image loads
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Rhys Perry [Wed, 20 Nov 2019 14:52:15 +0000 (14:52 +0000)]
aco: allow constant offsets for global/scratch instructions on GFX10
I don't think the bug applies for global/scratch instructions and
load_barycentric_at_sample selection expects this feature to work.
Fixes various dEQP-VK.pipeline.multisample_interpolation.* tests on GFX10.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Bas Nieuwenhuizen [Tue, 26 Nov 2019 00:00:20 +0000 (01:00 +0100)]
radv: Enable VK_KHR_buffer_device_address.
Still no capture/replay or multi device support.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Mon, 25 Nov 2019 13:24:52 +0000 (14:24 +0100)]
radv: fix reporting subgroup size with VK_KHR_pipeline_executable_properties
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Mon, 25 Nov 2019 22:58:04 +0000 (23:58 +0100)]
radv: Allocate cmdbuffer space for buffer marker write.
Fixes: 946193ae008 "radv: add support for VK_AMD_buffer_marker"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Gert Wollny [Mon, 18 Nov 2019 10:57:00 +0000 (11:57 +0100)]
r600: Disable eight bit three channel formats
Commit 0899bf55 made some deqp-gles3 tests related to RGB8 PBOs fail
on R600 because it exposed PIPE_FORMAT_R8G8B8_UNORM and R600 doesn't
properly handle this. Disabling this format also for buffers fixes the
issue.
In addition, disabling also the related RGB8 integer formats for buffers
fixes some deqp-gles3 tests:
dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb8ui_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8i_cube
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_2d
dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb8ui_cube
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8i_3d
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_2d_array
dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb8ui_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8i_3d
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_2d_array
dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb8ui_3d
Fixes: 0899bf55 st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM
Closes #2118
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Samuel Pitoiset [Mon, 25 Nov 2019 07:51:16 +0000 (08:51 +0100)]
ac/llvm: fix warning in ac_build_canonicalize()
../src/amd/llvm/ac_llvm_build.c: In function ‘ac_build_canonicalize’:
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘intr’ may be used uninitialized in this function [-Wmaybe-uninitialized]
4567 | return ac_build_intrinsic(ctx, intr, type, params, 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4568 | AC_FUNC_ATTR_READNONE);
| ~~~~~~~~~~~~~~~~~~~~~~
../src/amd/llvm/ac_llvm_build.c:4567:9: warning: ‘type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tapani Pälli [Tue, 19 Nov 2019 10:44:29 +0000 (12:44 +0200)]
mapi: add GetInteger64vEXT with EXT_disjoint_timer_query
From EXT_disjoint_timer_query spec:
"Interaction: This extension adds GetInteger64vEXT if
OpenGL ES 3.0 is not supported"
See https://github.com/KhronosGroup/OpenGL-Registry/issues/326.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2090
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Jason Ekstrand [Mon, 25 Nov 2019 17:20:42 +0000 (11:20 -0600)]
vulkan: Update the XML and headers to 1.1.129
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Wed, 26 Jun 2019 18:57:34 +0000 (13:57 -0500)]
anv/entrypoints: Better handle promoted extensions
In the case of promoted extensions we can end up with an entrypoint that
we support being an alias of an entrypoint we do not support. For
instance, if an extension gets promoted from EXT to KHR, the EXT entry-
points may be aliases of the KHR ones. We want to leave everything as
EXT until we get around to advertising the KHR so that we don't break
things when we update the XML and headers.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Wed, 26 Jun 2019 18:35:54 +0000 (13:35 -0500)]
vulkan/enum_to_str: Handle out-of-order aliases
The current code can only handle enum aliases if the original enum is
declared first followed by the alias as we walk the XML in a linear
fashion. This commit allows us to handle aliases where the alias
declaration comes before the thing it's aliasing.
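A minimal sketch of order-independent alias resolution, in Python. This is
not the generator's actual code: the function name, the tuple layout, and
the enum names in the usage note are assumptions made for illustration.
The idea is to collect all entries first and resolve aliases only once
every real value is known.

```python
def resolve_enum_values(entries):
    """entries: (name, value, alias) tuples in XML order.

    Exactly one of value/alias is set. Returns {name: value} for every
    entry, regardless of whether an alias precedes its target.
    """
    values = {}
    pending = {}
    for name, value, alias in entries:
        if alias is None:
            values[name] = value
        else:
            pending[name] = alias

    for name, target in pending.items():
        seen = {name}
        # Follow alias-of-alias chains until a real enum is reached.
        while target in pending:
            if target in seen:
                raise ValueError(f"alias cycle at {target}")
            seen.add(target)
            target = pending[target]
        values[name] = values[target]
    return values
```

With a hypothetical VK_EXAMPLE_EXT declared as an alias before
VK_EXAMPLE_KHR, both names end up with the same value.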
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Kenneth Graunke [Fri, 15 Nov 2019 23:18:06 +0000 (15:18 -0800)]
iris: Update SURFACE_STATE addresses when setting sampler views
We may have replaced the backing storage for a texture buffer while it
was unbound, at which point iris_rebind_buffer would not have caught it
and updated it. We need to ensure that the current resource's address
matches the one our SURFACE_STATE points at. If not, update addresses
and re-upload the SURFACE_STATE.
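The check described above can be sketched as follows. This is a Python
model only, not the driver code: Resource, SamplerView, and
bind_sampler_view are hypothetical names, and "upload" just counts calls
instead of streaming state to the GPU.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    address: int                 # current backing storage address


@dataclass
class SamplerView:
    resource: Resource
    cpu_state_address: int       # CPU copy of the surface base address
    uploads: int = 0

    def upload(self):
        # Stand-in for re-uploading the SURFACE_STATE.
        self.uploads += 1


def bind_sampler_view(view):
    """Patch and re-upload if the saved address is stale; True if it was."""
    if view.cpu_state_address == view.resource.address:
        return False
    view.cpu_state_address = view.resource.address
    view.upload()
    return True
```

If the resource's storage is replaced while the view is unbound, the next
bind notices the mismatch and refreshes the state exactly once.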
Shader images and buffers do not suffer from this problem because we
re-stream the surface state on every set call, since there isn't a
created CSO object for those with a saved SURFACE_STATE. Constant
buffers are also currently re-streamed (we pitch the SURFACE_STATE
on every set_constant_buffer call). Surfaces would need this
treatment (as they're created CSOs) except that we never swap out
their backing storage today (we only do it for buffers), so it's OK
for now.
Fixes misrendering in Unreal 4 demos (Elemental, Matinee Fight Scene).
Huge thanks to Andrii Simiklit for tracking down the problem - it was
quite difficult to find! Also fixes Andrii's new Piglit test for the
bug, 'arb_texture_buffer_object-re-init'.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1365
Kenneth Graunke [Fri, 15 Nov 2019 01:17:43 +0000 (17:17 -0800)]
iris: Maintain CPU-side SURFACE_STATE copies for views and surfaces.
When replacing the backing storage for texture buffers, image buffers,
and so on, we may need to update the "Surface Base Address" field in
any corresponding SURFACE_STATE. This is easier to accomplish if we
have a copy on the CPU - we can just compare the current field, update
it, and re-upload.
This patch adds a CPU-side copy to the new iris_surface_state wrapper
struct, and reworks allocation and upload to fill things out on the
CPU copy first, then upload that to the GPU when finished.
This will be necessary to fix iris_invalidate_resource bugs shortly.
Technically, we never replace the backing storage for pipe_surfaces
(render targets), so we don't need to make this change there. However,
it's nice to have surfaces, sampler views, and image views handled
similarly. Plus, if we ever wanted to swap out backing storage for
busy textures, we'd need this infrastructure.
v2: Properly free memory (caught by Andrii Simiklit)
Kenneth Graunke [Fri, 15 Nov 2019 00:06:10 +0000 (16:06 -0800)]
iris: Create an "iris_surface_state" wrapper struct
Today, we only have a state reference to the GPU buffer containing our
uploaded SURFACE_STATEs. However, we're going to want a CPU-side copy
soon. Making a wrapper struct means we can talk about both together,
and also put both in the field called "surface_state".
Kenneth Graunke [Thu, 31 Oct 2019 16:41:49 +0000 (09:41 -0700)]
iris: Drop 'old_address' parameter from iris_rebind_buffer
We can just compare the VERTEX_BUFFER_STATE address field to the
current BO's address. When calling rebind, we've already updated
the resource to the new buffer, but the state will have the old
address.
Kenneth Graunke [Fri, 15 Nov 2019 07:26:07 +0000 (23:26 -0800)]
iris: Stop mutating the resource in get_rt_read_isl_surf().
Mutating fields of global resources is generally not safe, and the only
reason we were doing it was to avoid passing an extra parameter to
the fill_surface_state helper.
Marek Olšák [Sat, 23 Nov 2019 03:47:02 +0000 (22:47 -0500)]
radeonsi/nir: don't run si_nir_opts again if there is no change
0.3% less overhead
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Wed, 20 Nov 2019 23:40:46 +0000 (18:40 -0500)]
radeonsi: initialize the per-context compiler on demand
This takes a noticeable amount of time in piglit and some tests don't
need it.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Fri, 22 Nov 2019 22:41:22 +0000 (17:41 -0500)]
ac: set swizzled bit in cache policy as a hint not to merge loads/stores
LLVM now merges loads and stores for all opcodes, so this must be set.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Eric Anholt [Tue, 19 Feb 2019 17:30:52 +0000 (09:30 -0800)]
nir: Add a scheduler pass to reduce maximum register pressure.
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable. A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of its
uses", which should help keep the heuristic from making systematically
bad choices around getting texture results consumed.
Results for v3d:
total instructions in shared programs: 6497588 -> 6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs: 2119629 -> 2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)
v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
patch, include SSA defs in live values so we can switch to bottom-up
if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
successfully register allocate on GLES3.1 dEQP. Make sure that
discards don't move after store_output. Comment spelling fix.
Jonathan Marek [Mon, 12 Aug 2019 15:43:26 +0000 (11:43 -0400)]
etnaviv: implement 64bpp clear
At the same time, update etna_clear_blit_pack_rgba to work with integer
formats.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Mon, 12 Aug 2019 15:34:57 +0000 (11:34 -0400)]
etnaviv: avoid using RS for 64bpp formats
At the same time, this change allows using BLT for 8bpp formats
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Christian Gmeiner [Fri, 14 Jun 2019 06:22:07 +0000 (08:22 +0200)]
etnaviv: add support for extended pe formats
Use the extended format if such a format was passed.
v1 -> v2:
- set FORMAT_MASK bit when using ext PE format as suggested
by Wladimir J. van der Laan
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Christian Gmeiner [Tue, 1 May 2018 14:48:41 +0000 (16:48 +0200)]
etnaviv: handle 8 byte block in tiling
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Samuel Pitoiset [Thu, 17 Oct 2019 13:05:59 +0000 (15:05 +0200)]
radv: select the depth decompress path based on the aspect mask
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 17 Oct 2019 12:57:04 +0000 (14:57 +0200)]
radv: create decompress pipelines for separate depth/stencil layouts
No functional changes as the driver still uses the depth+stencil
pipeline.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 17 Oct 2019 12:48:23 +0000 (14:48 +0200)]
radv: rework creation of decompress/resummarize meta pipelines
This refactoring will help for creating more decompress pipelines.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 17 Oct 2019 13:26:07 +0000 (15:26 +0200)]
radv: set the image view aspect mask before resolves
No functional changes, but it will be used to decompress
separate depth/stencil aspects.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 16 Oct 2019 12:13:52 +0000 (14:13 +0200)]
radv: set the image view aspect mask during subpass transitions
No functional changes because the aspect mask is still not used
during image transitions but it will be needed for the separate
depth/stencil aspects logic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Rhys Perry [Wed, 18 Sep 2019 19:31:33 +0000 (20:31 +0100)]
aco: enable load/store vectorizer
Totals from affected shaders:
SGPRS: 1890373 -> 1900772 (0.55 %)
VGPRS: 1210024 -> 1215244 (0.43 %)
Spilled SGPRs: 828 -> 828 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 252 -> 252 (0.00 %) dwords per thread
Code Size: 81937504 -> 74608304 (-8.94 %) bytes
LDS: 746 -> 746 (0.00 %) blocks
Max Waves: 230491 -> 230158 (-0.14 %)
In NieR:Automata and GTA V, the code size decrease is especially large: -13.79%
and -15.32%, respectively.
v9: rework the callback function
v10: handle load_shared/store_shared in the callback
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
Rhys Perry [Mon, 2 Sep 2019 15:09:24 +0000 (16:09 +0100)]
nir: add load/store vectorizer tests
v7: run nir_opt_algebraic
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v10: add tests for 64-bit offsets
v10: add tests for signed offsets
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
Rhys Perry [Tue, 19 Mar 2019 20:55:30 +0000 (20:55 +0000)]
nir: add a load/store vectorization pass
This pass combines intersecting, adjacent and identical loads/stores into
potentially larger ones and will be used by ACO to greatly reduce the
number of memory operations.
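As a rough illustration of what "combining intersecting, adjacent and
identical loads" means, here is a Python sketch over byte ranges. It is
not the pass itself: vectorize_loads and max_size are hypothetical, and
the sketch ignores alignment, aliasing, bit sizes, and any hardware limit
on legal vector sizes.

```python
def vectorize_loads(loads, max_size=16):
    """Merge (offset, size) byte ranges read from the same buffer.

    Ranges that are identical, overlap, or touch are combined as long as
    the merged range does not exceed max_size bytes.
    """
    merged = []
    for off, size in sorted(loads):
        if merged:
            last_off, last_size = merged[-1]
            last_end = last_off + last_size
            new_end = max(last_end, off + size)
            # Adjacent (off == last_end) or intersecting (off < last_end).
            if off <= last_end and new_end - last_off <= max_size:
                merged[-1] = (last_off, new_end - last_off)
                continue
        merged.append((off, size))
    return merged
```

Four adjacent 4-byte loads collapse into one 16-byte load, while loads
separated by a gap stay separate.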
v2: handle nir_deref_type_ptr_as_array
v3: assume explicitly laid out types for derefs
v4: create fewer deref casts
v4: fix shared boolean vectorization
v4: fix copy+paste error in resources_different
v4: fix extract_subvector() to pass
nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64
v4: rebase
v5: subtract from deref/offset instead of scheduling offset calculations
v5: various non-functional changes/cleanups
v5: require less metadata and preserve more
v5: rebase
v6: cleanup and improve dependency handling
v6: emit fewer deref casts
v6: pass undef to components not set in the write_mask for new stores
v7: fix 8-bit extract_vector() with 64-bit input
v7: cleanup creation of store write data
v7: update align correctly for when the bit size of load/store increases
v7: rename extract_vector to extract_component and update comment
v8: prevent combining of row-major matrix column accesses
v9: rework process_block() to be able to vectorize more
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v9: remove entry::store_value, since it will not be updated if it was
from a vectorized load
v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2
v9: handle nir_intrinsic_scoped_memory_barrier
v10: use nir_ssa_scalar
v10: handle non-32-bit offsets
v10: use signed offsets for comparison
v10: improve create_entry_key_from_offset()
v10: support load_shared/store_shared
v10: remove strip_deref_casts()
v10: don't ever pass NULL to memcmp
v10: remove recursion in gcd()
v10: fix outdated comment
v11: use the new nir_extract_bits()
v12: remove use of nir_src_as_const_value in resources_different
v13: make entry key hash function deterministic
v13: simplify mask_sign_extend()
v14: add comment in hash_entry_key() about hashing pointers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
Rhys Perry [Mon, 4 Nov 2019 17:45:59 +0000 (17:45 +0000)]
radv: set alignment for load_ssbo/store_ssbo in meta shaders
Otherwise, nir_intrinsic_align() will assert when called on the intrinsics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Rhys Perry [Tue, 19 Mar 2019 20:24:35 +0000 (20:24 +0000)]
nir: add nir_num_variable_modes and nir_var_mem_push_const
These will be useful in the upcoming load/store vectorizer.
v11: rebase
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Connor Abbott [Mon, 18 Nov 2019 14:36:20 +0000 (15:36 +0100)]
aco: Make unused workgroup id's 0
It shouldn't matter, but the 1 was leftover from when it was handled
together with workgroup_size and num_work_groups.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Wed, 13 Nov 2019 12:30:52 +0000 (13:30 +0100)]
aco: Use common argument handling
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Tue, 12 Nov 2019 14:38:46 +0000 (15:38 +0100)]
radv: Replace supports_spill with explicit_scratch_args
The former was always true and hence dead code. We will want to
explicitly declare the ring offset register with ACO, but we also want
to declare the scratch offset too, and we can't try to disable it since
ACO also supports spilling and the determination of whether spilling has
to happen occurs well after setting up registers. So replace
supports_spill with something that will actually be used for ACO.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Connor Abbott [Tue, 12 Nov 2019 10:06:39 +0000 (11:06 +0100)]
aco: Make num_workgroups and local_invocation_ids one argument each
To match the LLVM argument setup code.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Fri, 15 Nov 2019 12:51:27 +0000 (13:51 +0100)]
aco: Split vector arguments at the beginning
Due to how LLVM works we have to make some of the FS inputs become
vectors, and therefore have to split them early so that they don't take
up extra register pressure due to how RA currently works.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Mon, 11 Nov 2019 17:27:25 +0000 (18:27 +0100)]
aco: Use radv_shader_args in aco_compile_shader()
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Wed, 30 Oct 2019 10:54:43 +0000 (11:54 +0100)]
aco: Constify radv_nir_compiler_options in isel
It's already const for everything else.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Mon, 11 Nov 2019 17:05:03 +0000 (18:05 +0100)]
radv: Move argument declaration out of nir_to_llvm
Now it's executed for ACO too.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Connor Abbott [Mon, 11 Nov 2019 11:50:12 +0000 (12:50 +0100)]
ac/nir, radv, radeonsi: Switch to using ac_shader_args
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Tue, 29 Oct 2019 16:40:30 +0000 (17:40 +0100)]
ac: Add a shared interface between radv, radeonsi, LLVM and ACO
ac_shader_args will be similar to ac_shader_abi, except for being free
from LLVM-specific concepts and therefore capable of being shared
between LLVM and ACO. This will help us accomplish a few different
things:
- Decouple setting up SGPR and VGPR arguments from translating to LLVM,
so that we can reference these arguments in NIR lowering passes, which
will let us lower e.g. descriptor sets in NIR.
- Stop using radv-specific structures for things like determining the
chip generation in ACO.
In the end, we should replace ac_shader_abi with this structure +
driver-specific lowering passes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Connor Abbott [Thu, 31 Oct 2019 14:23:35 +0000 (15:23 +0100)]
radv: Rename ac_arg_regfile
We'll duplicate this in a header file in the next commit, and then
remove the original enum. Just rename it temporarily so that things
keep building.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Danylo Piliaiev [Fri, 22 Nov 2019 16:05:14 +0000 (18:05 +0200)]
drirc: Add glsl_zero_init workaround for GpuTest
GiMark benchmark from GpuTest has such code in VS:
out vec4 lightDir0;
out vec4 lightDir1;
...
lightDir0.xyz = lp0 - vVertex.xyz;
lightDir1.xyz = lp1 - vVertex.xyz;
In FS:
float distSqr = dot(lightDir0, lightDir0);
So due to the usage of uninitialized .w channel in the dot product,
distSqr may become undefined which results in many black dots
in the test on Iris.
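The arithmetic can be checked with a small Python sketch. The helper names
are hypothetical, and NaN is used only as a stand-in for an undefined .w;
a real undefined value could be anything.

```python
def dot4(a, b):
    """Four-component dot product, like GLSL dot() on vec4."""
    return sum(x * y for x, y in zip(a, b))


def dist_sqr(light_xyz, w):
    """dot(lightDir, lightDir) where only .xyz was written by the VS."""
    light_dir = (*light_xyz, w)
    return dot4(light_dir, light_dir)


xyz = (1.0, 2.0, 2.0)
zero_init = dist_sqr(xyz, 0.0)           # .w zero-initialized
garbage = dist_sqr(xyz, float("nan"))    # stand-in for an undefined .w
```

With .w forced to zero the result is the squared length of .xyz (9.0
here); with a garbage .w the whole dot product is contaminated.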
In https://www.geeks3d.com/forums/index.php/topic,6242.0.html
developer stated that this benchmark most likely won't be updated.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1919
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 22 Nov 2019 11:16:50 +0000 (12:16 +0100)]
meson: only build imgui when needed
Only required for Intel tools or the Vulkan overlay layer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Samuel Pitoiset [Thu, 31 Oct 2019 13:00:52 +0000 (14:00 +0100)]
ac/llvm: fix the local invocation index for wave32
Fixes dEQP-VK.compute.builtin_var.local_invocation_index with
RADV_PERFTEST=cswave32.
My initial fix was to lower it but Rhys suggested the shift-right
and it's much better like this.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 21 Nov 2019 10:27:55 +0000 (11:27 +0100)]
radv: disable subgroup shuffle operations on GFX10
They are broken like on GFX6-GFX7. It seems better to disable them
instead of enabling a broken feature.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dave Airlie [Fri, 22 Nov 2019 04:55:40 +0000 (14:55 +1000)]
docs: add llvmpipe to ARB_query_buffer_object.
Dave Airlie [Fri, 22 Nov 2019 04:47:59 +0000 (14:47 +1000)]
llvmpipe: initial query buffer object support. (v2)
This fails a couple of piglits due to other bugs in llvmpipe,
but it adds support for the feature properly.
v2: don't reset pipestats, just recalc, fix CI expectation
Timothy Arceri [Sun, 24 Nov 2019 23:08:26 +0000 (10:08 +1100)]
radv: create a fresh fork for each pipeline compile
In order to prevent a potential malicious pipeline tainting our
secure compile process and interfering with successive pipelines
we want to create a fresh fork for each pipeline compile.
Benchmarking has shown that simply forking on each pipeline
creation doubles the total time it takes to compile a fossilize db
collection. So instead here we fork the process at device creation
so that we have a slim copy of the device and then fork this
otherwise idle and untainted process each time we compile a
pipeline. Forking this slim copy of the device results in only a
20% increase in compile time vs a 100% increase.
Fixes: cff53da3 ("radv: enable secure compile support")
Timothy Arceri [Wed, 13 Nov 2019 03:51:48 +0000 (14:51 +1100)]
radv: add a secure_compile_open_fifo_fds() helper
This will be used to create a communication pipe between the user
facing device and a freshly forked (per pipeline compile) slim copy
of that device.
We can't use pipe() here because the fork will not be a direct fork
of the user facing process. Instead we use a previously forked
copy of the process that was forked at device creation in order to
reduce the resources required for the fork and avoid performance
issues.
Fixes: cff53da3748d ("radv: enable secure compile support")
Timothy Arceri [Sun, 24 Nov 2019 23:00:20 +0000 (10:00 +1100)]
radv: add some infrastructure for fresh forks for each secure compile
In the following commits we want to be able to fork an existing lightweight
fork created at device creation time. In order for the user facing process
to communicate with this new fresh fork we create some members here to hold
FIFO file descriptors and a unique id.
Here we also add a new fork enum that we use to tell the lightweight
process to create a fresh fork.
For more information on why we create a fresh fork see the following
commits.
Brian Paul [Sat, 23 Nov 2019 02:42:34 +0000 (19:42 -0700)]
nir: no-op C99 _Pragma() with MSVC
This fixes a build failure on MSVC.
BTW, it looks like clang supports _Pragma() but I don't know if it
understands the "gcc unroll N" directive.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Michel Zou [Sun, 17 Nov 2019 15:40:29 +0000 (16:40 +0100)]
Meson: Add llvm>=9 modules
Fixes build with MinGW, with shared LLVM and lto
/tmp/opengl32.dll.BxiIYm.ltrans59.ltrans.o:<artificial>:(.text+0x1674): undefined reference to `LLVMAddInstructionCombiningPass'
See also scons/llvm.py
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Michel Zou [Mon, 11 Nov 2019 21:15:41 +0000 (22:15 +0100)]
disk_cache_get_function_timestamp: check for dladdr
instead of dlopen
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Michel Zou [Mon, 11 Nov 2019 21:14:55 +0000 (22:14 +0100)]
Meson: Check for dladdr with MinGW
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Marek Olšák [Fri, 22 Nov 2019 01:24:08 +0000 (20:24 -0500)]
nir/serialize: support any num_components for remaining instructions
Only NPOT vectors greater than vec4 use the extra uint32.
This is for instructions that share the dest code.
load_const and undef already support 1-16 in the header.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Fri, 22 Nov 2019 01:23:27 +0000 (20:23 -0500)]
nir/serialize: use 3 unused bits in intrinsic for packed_const_indices
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Fri, 22 Nov 2019 00:45:46 +0000 (19:45 -0500)]
nir/serialize: don't serialize redundant nir_intrinsic_instr::num_components
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 12 Nov 2019 03:33:49 +0000 (22:33 -0500)]
nir/serialize: serialize writemask for vec8 and vec16
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 12 Nov 2019 03:28:17 +0000 (22:28 -0500)]
nir/serialize: serialize swizzles for vec8 and vec16
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Thu, 7 Nov 2019 05:28:01 +0000 (00:28 -0500)]
nir/serialize: reuse the writemask field for 2 src X swizzles of SSA ALU
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Wed, 6 Nov 2019 03:14:28 +0000 (22:14 -0500)]
nir/serialize: remove up to 3 consecutive equal ALU instruction headers
vec4 scalarized ALUs typically have 4 equal instruction headers, so remove
the last 3.
There are no bits left in the ALU header for more flags, so future
extensions of NIR will have to use something like instr_type == 15
to describe more complex ALU instructions.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
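The header deduplication described in the commit above can be illustrated with a minimal sketch. This is not the Mesa C implementation: the function names, the `(header, extra_count)` representation, and the use of plain integers as headers are all assumptions made for illustration. The only detail taken from the commit message is that up to 3 repeated headers after the first are dropped.

```python
# Illustrative sketch only; names and data layout are hypothetical,
# not the nir_serialize.c format.
MAX_EXTRA = 3  # at most 3 repeated headers are dropped after the one kept


def compress_headers(headers):
    """Collapse runs of equal headers into (header, extra_count) pairs."""
    out = []
    for h in headers:
        if out and out[-1][0] == h and out[-1][1] < MAX_EXTRA:
            # Same header as the previous one: bump the repeat count
            # instead of storing the header again.
            out[-1] = (h, out[-1][1] + 1)
        else:
            out.append((h, 0))
    return out


def expand_headers(pairs):
    """Inverse of compress_headers: repeat each header extra_count + 1 times."""
    result = []
    for h, extra in pairs:
        result.extend([h] * (extra + 1))
    return result
```

For a scalarized vec4 operation with four identical headers, this sketch stores one header with a repeat count of 3. A fifth identical header starts a new entry, since the count is capped.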
Marek Olšák [Tue, 5 Nov 2019 23:10:40 +0000 (18:10 -0500)]
nir/serialize: try to pack both deref array src into 32 bits
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 23:24:27 +0000 (18:24 -0500)]
nir/serialize: cleanup - fold nir_deref_type_var cases into switches
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 22:53:32 +0000 (17:53 -0500)]
nir/serialize: try to put deref->var index into the unused bits of the header
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 22:39:38 +0000 (17:39 -0500)]
nir/serialize: don't serialize mode for deref non-cast instructions
It can be derived from src and var. This frees 10 bits in the header
that will be used later.
"mode" is moved in the structure, because those bits will be used for
something else later.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 01:11:11 +0000 (20:11 -0500)]
nir/serialize: don't store deref types if not needed
- type_cast: deduplicate types if the last one is the same
- derive the type from the parent for other derefs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 05:09:29 +0000 (00:09 -0500)]
nir/serialize: try to pack two alu srcs into 1 uint32
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 04:29:33 +0000 (23:29 -0500)]
nir/serialize: pack nir_intrinsic_instr::const_index[] better
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 03:25:15 +0000 (22:25 -0500)]
nir/serialize: pack 1-component constants into 20 bits if possible
The majority of constants can be packed like this.
v2: - use enum for the packing encoding,
- trim packed_value to 20 bits and add 1 bit to last_component,
which simplifies a later commit
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
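As a rough sketch of how a 32-bit scalar constant might be squeezed into 20 bits, the example below tries two packings and falls back to the full value. The enum members, the choice of "high 20 bits" and "sign-extended low 20 bits" encodings, and the function names are assumptions for illustration. The commit message only states that an enum selects the packing and that the packed value is 20 bits wide.

```python
# Illustrative sketch only; the encodings and names are hypothetical.
from enum import IntEnum

MASK32 = 0xFFFFFFFF


class ConstEncoding(IntEnum):
    FULL = 0          # does not fit: the full 32-bit value is stored
    LOW_20_SEXT = 1   # low 20 bits stored, sign-extended on decode
    HIGH_20 = 2       # high 20 bits stored, low 12 bits are zero


def pack_const(value):
    """Return (encoding, payload) for a 32-bit constant."""
    value &= MASK32
    if value & 0xFFF == 0:
        # Typical for simple floats such as 1.0 (0x3F800000).
        return ConstEncoding.HIGH_20, value >> 12
    signed = value - (1 << 32) if value & 0x80000000 else value
    if -(1 << 19) <= signed < (1 << 19):
        # Small signed integers fit in 20 bits.
        return ConstEncoding.LOW_20_SEXT, value & 0xFFFFF
    return ConstEncoding.FULL, value


def unpack_const(encoding, payload):
    """Inverse of pack_const; returns the 32-bit value."""
    if encoding == ConstEncoding.HIGH_20:
        return (payload << 12) & MASK32
    if encoding == ConstEncoding.LOW_20_SEXT:
        if payload & 0x80000:
            payload -= 1 << 20  # sign-extend from bit 19
        return payload & MASK32
    return payload
```

Under these assumptions, small integers and floats with a short mantissa take the packed path, and only values with many significant bits need the full 32-bit payload.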
Marek Olšák [Tue, 5 Nov 2019 03:15:17 +0000 (22:15 -0500)]
nir/serialize: pack load_const with non-64-bit constants better
v2: use blob_write_uint8/16
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 02:31:40 +0000 (21:31 -0500)]
nir/serialize: try to store a diff in var data locations instead of var data
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 01:11:11 +0000 (20:11 -0500)]
nir/serialize: deduplicate serialized var types by reusing the last unique one
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 00:42:42 +0000 (19:42 -0500)]
nir/serialize: don't serialize var->data for temporaries
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Wed, 30 Oct 2019 22:14:37 +0000 (18:14 -0400)]
nir/serialize: pack src better and limit the object count to 1M from 1G
We need to limit the object count to 1M to free 10 bits for the src
modifiers.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
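The arithmetic behind the limit is that 1G objects need 30 index bits while 1M objects need 20, so the difference is 10 bits. The sketch below packs an index and a 10-bit modifier field into one word. The field order and the names are assumptions, and the real src word presumably carries other bits that are not modeled here.

```python
# Illustrative sketch only; field layout and names are hypothetical.
INDEX_BITS = 20                    # 2**20 = 1M objects
MODIFIER_BITS = 10                 # bits freed by dropping from 30 index bits
MAX_OBJECTS = 1 << INDEX_BITS


def pack_src(index, modifiers):
    """Pack an object index and a modifier field into one integer."""
    if not 0 <= index < MAX_OBJECTS:
        raise ValueError("object index exceeds the 1M limit")
    if not 0 <= modifiers < (1 << MODIFIER_BITS):
        raise ValueError("modifiers do not fit in 10 bits")
    return index | (modifiers << INDEX_BITS)


def unpack_src(word):
    """Inverse of pack_src; returns (index, modifiers)."""
    index = word & (MAX_OBJECTS - 1)
    modifiers = (word >> INDEX_BITS) & ((1 << MODIFIER_BITS) - 1)
    return index, modifiers
```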
Marek Olšák [Fri, 25 Oct 2019 06:39:54 +0000 (02:39 -0400)]
nir/serialize: pack instructions better
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Wed, 20 Nov 2019 00:36:36 +0000 (19:36 -0500)]
util/blob: add 8-bit and 16-bit reads and writes
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Eric Anholt [Fri, 22 Nov 2019 23:16:27 +0000 (15:16 -0800)]
ci: Use a tag from the parallel-deqp-runner repo.
If the repo continues development, we don't want to accidentally pick
up potentially breaking changes on our next container rebuild.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Thu, 21 Nov 2019 18:54:13 +0000 (10:54 -0800)]
gitlab-ci/freedreno/a6xx: remove most of the flakes
xfb + lines/points still flakes too frequently (and the problem isn't
even related to xfb), but we can add the rest back into this mix now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Sun, 17 Nov 2019 20:04:50 +0000 (12:04 -0800)]
gitlab-ci/deqp: generate junit results
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Sun, 17 Nov 2019 19:57:26 +0000 (11:57 -0800)]
gitlab-ci/deqp: generate xml results for fails/flakes
Extract .qpa for the individual unexpected results and flakes, and
translate to xml, preserved with the artifacts. This allows easy
browsing of the test logs for fails/flakes, for easier debugging.
The # of logs to preserve is capped at 50 to avoid saving 100s of
megabytes of logs in case someone pushes a change that breaks
everything.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Fri, 22 Nov 2019 21:30:18 +0000 (13:30 -0800)]
gitlab-ci: bump arm test container
To pick up updated cts_runner and netcat for the flake reporting.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Sun, 17 Nov 2019 19:33:01 +0000 (11:33 -0800)]
gitlab-ci/deqp: detect and report flakes
If there are a small number of fails, re-run to determine if they are
flakes, and optionally (if `$FLAKES_CHANNEL` configured) report the
flakes.
This way flakes don't interfere with developers working on other
drivers, but get logged so that the developers working on the flaking
driver can monitor the situation.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
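The re-run logic described above can be sketched as follows. The real CI script is not shown in this log, so the function name, the `rerun` callback, and the threshold value of 10 are assumptions. The commit message says only that a "small number" of fails triggers a re-run.

```python
# Illustrative sketch only; names and the threshold are hypothetical.
def classify_failures(failed, rerun, max_rerun=10):
    """Split failed tests into (real_failures, flakes).

    failed:    list of test names that failed in the first run
    rerun:     callable taking a test name, returning True if it passes
    max_rerun: above this many fails, skip the re-run and treat all as real
    """
    if len(failed) > max_rerun:
        # Too many fails to be flakes; likely a genuine regression.
        return list(failed), []
    real, flakes = [], []
    for name in failed:
        # A test that passes on the second attempt is counted as a flake.
        (flakes if rerun(name) else real).append(name)
    return real, flakes
```

In this sketch only the real failures would fail the job, while the flakes list is what would be reported when a reporting channel is configured.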
Rob Clark [Sun, 17 Nov 2019 19:28:16 +0000 (11:28 -0800)]
gitlab-ci/deqp: preserve caselists for blocks with fails
Bump cts_runner to pick up the change to preserve .qpa and caselist .txt
files for blocks of tests that contain fails, and preserve the caselist
files. These are useful for reproducing fails that depend on the order
in which tests are run.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Sun, 17 Nov 2019 19:16:09 +0000 (11:16 -0800)]
gitlab-ci/deqp: preserve full list of unexpected results
The log only shows the first 50, but preserve the full list for easier
browsing.
(Also move return of exit code to end which makes later patches in the
series easier)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Fri, 15 Nov 2019 18:15:32 +0000 (10:15 -0800)]
gitlab-ci: update deqp build so we can generate xml
Update the deqp build to preserve testlog-to-xml and stylesheets, so
deqp runner can extract .qpa for failed/flaked tests, and convert to
xml. With this, we will be able to browse output from failed tests
directly from the artifacts.
The main motivation is to give better visibility into what happens with
flaked tests, when it is difficult/impossible to reproduce the flake
locally (i.e. when it happens once out of N million tests). But this
should also make it easier to debug regressions that an MR triggers,
especially when it is on hw that you don't have.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Markus Wick [Tue, 5 Nov 2019 08:16:37 +0000 (09:16 +0100)]
drirc: Enable glthread for dolphin/citra/yuzu.
Dolphin: 75 fps -> 88 fps - Super Mario Galaxy
Citra: 81 fps -> 91 fps - A Link Between Worlds
Yuzu: 21 fps -> 27 fps - Super Mario Odyssey
Dolphin still has many syncs because of glFenceSync and glClientWaitSync.
Moving them to the dispatcher thread might yield another speedup.
Yuzu uses a compatibility profile by default. This benchmark used the variable
MESA_GL_VERSION_OVERRIDE=4.5FC to override this behavior.
This profiling was done on a mobile i7-8550U CPU with i965.
Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Markus Wick [Sun, 3 Nov 2019 08:49:59 +0000 (09:49 +0100)]
mesa/glthread: Implement ARB_multi_bind.
Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>