Marek Olšák [Sat, 23 Nov 2019 03:47:02 +0000 (22:47 -0500)]
radeonsi/nir: don't run si_nir_opts again if there is no change
0.3% less overhead
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Wed, 20 Nov 2019 23:40:46 +0000 (18:40 -0500)]
radeonsi: initialize the per-context compiler on demand
This takes a noticable amount of time in piglit and some tests don't
need it.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Fri, 22 Nov 2019 22:41:22 +0000 (17:41 -0500)]
ac: set swizzled bit in cache policy as a hint not to merge loads/stores
LLVM now merges loads and stores for all opcodes, so this must be set.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Eric Anholt [Tue, 19 Feb 2019 17:30:52 +0000 (09:30 -0800)]
nir: Add a scheduler pass to reduce maximum register pressure.
This is similar to a scheduler I've written for vc4 and i965, but this
time written at the NIR level so that hopefully it's reusable. A notable
new feature it has is Goodman/Hsu's heuristic of "once we've started
processing the uses of a value, prioritize processing the rest of their
uses", which should help avoid the heuristic otherwise making such
systematically bad choices around getting texture results consumed.
Results for v3d:
total instructions in shared programs:
6497588 ->
6518242 (0.32%)
total threads in shared programs: 154000 -> 152828 (-0.76%)
total uniforms in shared programs:
2119629 ->
2068681 (-2.40%)
total spills in shared programs: 4984 -> 472 (-90.53%)
total fills in shared programs: 6418 -> 1546 (-75.91%)
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v2)
v2: Use the DAG datastructure, fold in the scheduling-for-parallelism
patch, include SSA defs in live values so we can switch to bottom-up
if we want.
v3: Squash in improvements from Alejandro Piñeiro for getting V3D to
successfully register allocate on GLES3.1 dEQP. Make sure that
discards don't move after store_output. Comment spelling fix.
Jonathan Marek [Mon, 12 Aug 2019 15:43:26 +0000 (11:43 -0400)]
etnaviv: implement 64bpp clear
At the same time, update etna_clear_blit_pack_rgba to work with integer
formats.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Mon, 12 Aug 2019 15:34:57 +0000 (11:34 -0400)]
etnaviv: avoid using RS for 64bpp formats
At the same time, this change allows using BLT for 8bpp formats
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Christian Gmeiner [Fri, 14 Jun 2019 06:22:07 +0000 (08:22 +0200)]
etnaviv: add support for extended pe formats
Use the extended format if an such a format was passed.
v1 -> v2:
- set FORMAT_MASK bit when using ext PE format as suggested
by Wladimir J. van der Laan
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Christian Gmeiner [Tue, 1 May 2018 14:48:41 +0000 (16:48 +0200)]
etnaviv: handle 8 byte block in tiling
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Samuel Pitoiset [Thu, 17 Oct 2019 13:05:59 +0000 (15:05 +0200)]
radv: select the depth decompress path based on the aspect mask
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 17 Oct 2019 12:57:04 +0000 (14:57 +0200)]
radv: create decompress pipelines for separate depth/stencil layouts
No functional changes as the driver still uses the depth+stencil
pipeline.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 17 Oct 2019 12:48:23 +0000 (14:48 +0200)]
radv: rework creation of decompress/resummarize meta pipelines
This refactoring will help for creating more decompress pipelines.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 17 Oct 2019 13:26:07 +0000 (15:26 +0200)]
radv: set the image view aspect mask before resolves
No functional changes, but it will be used to decompress
separate depth/stencil aspects.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 16 Oct 2019 12:13:52 +0000 (14:13 +0200)]
radv: set the image view aspect mask during subpass transitions
No functional changes because the aspect mask is still not used
during image transitions but it will be needed for the separate
depth/stencil aspects logic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Rhys Perry [Wed, 18 Sep 2019 19:31:33 +0000 (20:31 +0100)]
aco: enable load/store vectorizer
Totals from affected shaders:
SGPRS:
1890373 ->
1900772 (0.55 %)
VGPRS:
1210024 ->
1215244 (0.43 %)
Spilled SGPRs: 828 -> 828 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 252 -> 252 (0.00 %) dwords per thread
Code Size:
81937504 ->
74608304 (-8.94 %) bytes
LDS: 746 -> 746 (0.00 %) blocks
Max Waves: 230491 -> 230158 (-0.14 %)
In NeiR:Automata and GTA V, the code decrease is especially large: -13.79%
and -15.32%, respectively.
v9: rework the callback function
v10: handle load_shared/store_shared in the callback
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
Rhys Perry [Mon, 2 Sep 2019 15:09:24 +0000 (16:09 +0100)]
nir: add load/store vectorizer tests
v7: run nir_opt_algebraic
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v10: add tests for 64-bit offsets
v10: add tests for signed offsets
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
Rhys Perry [Tue, 19 Mar 2019 20:55:30 +0000 (20:55 +0000)]
nir: add a load/store vectorization pass
This pass combines intersecting, adjacent and identical loads/stores into
potentially larger ones and will be used by ACO to greatly reduce the
number of memory operations.
v2: handle nir_deref_type_ptr_as_array
v3: assume explicitly laid out types for derefs
v4: create less deref casts
v4: fix shared boolean vectorization
v4: fix copy+paste error in resources_different
v4: fix extract_subvector() to pass
nir_load_store_vectorize_test.ssbo_load_intersecting_32_32_64
v4: rebase
v5: subtract from deref/offset instead of scheduling offset calculations
v5: various non-functional changes/cleanups
v5: require less metadata and preserve more
v5: rebase
v6: cleanup and improve dependency handling
v6: emit less deref casts
v6: pass undef to components not set in the write_mask for new stores
v7: fix 8-bit extract_vector() with 64-bit input
v7: cleanup creation of store write data
v7: update align correctly for when the bit size of load/store increases
v7: rename extract_vector to extract_component and update comment
v8: prevent combining of row-major matrix column acceses
v9: rework process_block() to be able to vectorize more
v9: rework the callback function
v9: update alignment on all loads/stores, even if they're not vectorized
v9: remove entry::store_value, since it will not be updated if it's was
from a vectorized load
v9: fix bug in subtract_deref(), causing artifacts in Dishonored 2
v9: handle nir_intrinsic_scoped_memory_barrier
v10: use nir_ssa_scalar
v10: handle non-32-bit offsets
v10: use signed offsets for comparison
v10: improve create_entry_key_from_offset()
v10: support load_shared/store_shared
v10: remove strip_deref_casts()
v10: don't ever pass NULL to memcmp
v10: remove recursion in gcd()
v10: fix outdated comment
v11: use the new nir_extract_bits()
v12: remove use of nir_src_as_const_value in resources_different
v13: make entry key hash function deterministic
v13: simplify mask_sign_extend()
v14: add comment in hash_entry_key() about hashing pointers
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v9)
Rhys Perry [Mon, 4 Nov 2019 17:45:59 +0000 (17:45 +0000)]
radv: set alignment for load_ssbo/store_ssbo in meta shaders
Otherwise, nir_intrinsic_align() will assert when called on the intrinsics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Rhys Perry [Tue, 19 Mar 2019 20:24:35 +0000 (20:24 +0000)]
nir: add nir_num_variable_modes and nir_var_mem_push_const
These will be useful in the upcoming load/store vectorizer.
v11: rebase
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Connor Abbott [Mon, 18 Nov 2019 14:36:20 +0000 (15:36 +0100)]
aco: Make unused workgroup id's 0
It shouldn't matter, but the 1 was leftover from when it was handled
together with workgroup_size and num_work_groups.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Wed, 13 Nov 2019 12:30:52 +0000 (13:30 +0100)]
aco: Use common argument handling
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Tue, 12 Nov 2019 14:38:46 +0000 (15:38 +0100)]
radv: Replace supports_spill with explict_scratch_args
The former was always true and hence dead code. We will want to
explicitly declare the ring offset register with ACO, but we also want
to declare the scratch offset too, and we can't try to disable it since
ACO also supports spilling and the determination of whether spilling has
to happen occurs well after setting up registers. So replace
supports_spill with something that will actually be used for ACO.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Connor Abbott [Tue, 12 Nov 2019 10:06:39 +0000 (11:06 +0100)]
aco: Make num_workgroups and local_invocation_ids one argument each
To match the LLVM argument setup code.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Fri, 15 Nov 2019 12:51:27 +0000 (13:51 +0100)]
aco: Split vector arguments at the beginning
Due to how LLVM works we have to make some of the FS inputs become
vectors, and therefore have to split them early so that they don't take
up extra register pressure due to how RA currently works.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Mon, 11 Nov 2019 17:27:25 +0000 (18:27 +0100)]
aco: Use radv_shader_args in aco_compile_shader()
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Wed, 30 Oct 2019 10:54:43 +0000 (11:54 +0100)]
aco: Constify radv_nir_compiler_options in isel
It's already const for everything else.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Connor Abbott [Mon, 11 Nov 2019 17:05:03 +0000 (18:05 +0100)]
radv: Move argument declaration out of nir_to_llvm
Now it's executed for ACO too.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Connor Abbott [Mon, 11 Nov 2019 11:50:12 +0000 (12:50 +0100)]
ac/nir, radv, radeonsi: Switch to using ac_shader_args
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Tue, 29 Oct 2019 16:40:30 +0000 (17:40 +0100)]
ac: Add a shared interface between radv, radeonsi, LLVM and ACO
ac_shader_args will be similar to ac_shader_abi, except for being free
from LLVM-specific concepts and therefore capable of being shared
between LLVM and ACO. This will help us accomplish a few different
things:
- Decouple setting up SGPR and VGPR arguments from translating to LLVM,
so that we can reference these arguments in NIR lowering passes, which
will let us lower e.g. descriptor sets in NIR.
- Stop using radv-specific structures for things like determining the
chip generation in ACO.
In the end, we should replace ac_shader_abi with this structure +
driver-specific lowering passes.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Connor Abbott [Thu, 31 Oct 2019 14:23:35 +0000 (15:23 +0100)]
radv: Rename ac_arg_regfile
We'll duplicate this in a header file in the next commit, and then
remove the original enum. Just rename it temporarily so that things
keep building.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Danylo Piliaiev [Fri, 22 Nov 2019 16:05:14 +0000 (18:05 +0200)]
drirc: Add glsl_zero_init workaround for GpuTest
GiMark benchmark from GpuTest has such code in VS:
out vec4 lightDir0;
out vec4 lightDir1;
...
lightDir0.xyz = lp0 - vVertex.xyz;
lightDir1.xyz = lp1 - vVertex.xyz;
In FS:
float distSqr = dot(lightDir0, lightDir0);
So due to the usage of uninitialized .w channel in the dot product,
distSqr may become undefined which results in many black dots
in the test on Iris.
In https://www.geeks3d.com/forums/index.php/topic,6242.0.html
developer stated that this benchmark most likely won't be updated.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1919
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 22 Nov 2019 11:16:50 +0000 (12:16 +0100)]
meson: only build imgui when needed
Only required for Intel tools or the Vulkan overlay layer.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Samuel Pitoiset [Thu, 31 Oct 2019 13:00:52 +0000 (14:00 +0100)]
ac/llvm: fix the local invocation index for wave32
Fixes dEQP-VK.compute.builtin_var.local_invocation_index with
RADV_PERFTEST=cswave32.
My initial fix was to lower it but Rhys suggested the shift-right
and it's much better like this.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 21 Nov 2019 10:27:55 +0000 (11:27 +0100)]
radv: disable subgroup shuffle operations on GFX10
They are broken like on GFX6-GFX7. It seems better to disable them
instead of enabling a broken feature.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dave Airlie [Fri, 22 Nov 2019 04:55:40 +0000 (14:55 +1000)]
docs: add llvmpipe to ARB_query_buffer_object.
Dave Airlie [Fri, 22 Nov 2019 04:47:59 +0000 (14:47 +1000)]
llvmpipe: initial query buffer object support. (v2)
This fails a couple of piglits due to other bugs in llvmpipe,
but it adds support for the feature properly.
v2: don't reset pipestats, just recalc, fix CI expectation
Timothy Arceri [Sun, 24 Nov 2019 23:08:26 +0000 (10:08 +1100)]
radv: create a fresh fork for each pipeline compile
In order to prevent a potential malicious pipeline tainting our
secure compile process and interfering with successive pipelines
we want to create a fresh fork for each pipeline compile.
Benchmarking has shown that simply forking on each pipeline
creation doubles the total time it takes to compile a fossilize db
collection. So instead here we fork the process at device creation
so that we have a slim copy of the device and then fork this
otherwise idle and untainted process each time we compile a
pipeline. Forking this slim copy of the device results in only a
20% increase in compile time vs a 100% increase.
Fixes: cff53da3 ("radv: enable secure compile support")
Timothy Arceri [Wed, 13 Nov 2019 03:51:48 +0000 (14:51 +1100)]
radv: add a secure_compile_open_fifo_fds() helper
This will be used to create a communication pipe between the user
facing device and a freshly forked (per pipeline compile) slim copy
of that device.
We can't use pipe() here because the fork will not be a direct fork
of the user facing process. Instead we use a previously forked
copy of the process that was forked at device creation in order to
reduce the resources required for the fork and avoid performance
issues.
Fixes: cff53da3748d ("radv: enable secure compile support")
Timothy Arceri [Sun, 24 Nov 2019 23:00:20 +0000 (10:00 +1100)]
radv: add some infrastructure for fresh forks for each secure compile
In the following commits we want to be able to fork an existing lightweight
fork created at device creation time. In order for the user facing process
to communicate with this new fresh fork we create some members here to hold
FIFO file descriptors and a unique id.
Here we also add a new fork enum that we use to tell the lightweight
process to create a fresh fork.
For more information on why we create a fresh fork see the following
commits.
Brian Paul [Sat, 23 Nov 2019 02:42:34 +0000 (19:42 -0700)]
nir: no-op C99 _Pragma() with MSVC
This fixes a build failure on MSVC.
BTW, it looks like clang supports _Pragma() but I don't know if it
understands the "gcc unroll N" directive.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Michel Zou [Sun, 17 Nov 2019 15:40:29 +0000 (16:40 +0100)]
Meson: Add llvm>=9 modules
Fixes build with MinGW, with shared LLVM and lto
/tmp/opengl32.dll.BxiIYm.ltrans59.ltrans.o:<artificial>:(.text+0x1674): undefined reference to `LLVMAddInstructionCombiningPass'
See also scons/llvm.py
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Michel Zou [Mon, 11 Nov 2019 21:15:41 +0000 (22:15 +0100)]
disk_cache_get_function_timestamp: check for dladdr
instead of dlopen
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Michel Zou [Mon, 11 Nov 2019 21:14:55 +0000 (22:14 +0100)]
Meson: Check for dladdr with MinGW
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Marek Olšák [Fri, 22 Nov 2019 01:24:08 +0000 (20:24 -0500)]
nir/serialize: support any num_components for remaining instructions
Only NPOT vectors greater than vec4 use the extra uint32.
This is for instructions that share the dest code.
load_const and undef already support 1-16 in the header.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Fri, 22 Nov 2019 01:23:27 +0000 (20:23 -0500)]
nir/serialize: use 3 unused bits in intrinsic for packed_const_indices
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Fri, 22 Nov 2019 00:45:46 +0000 (19:45 -0500)]
nir/serialize: don't serialize redundant nir_intrinsic_instr::num_components
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 12 Nov 2019 03:33:49 +0000 (22:33 -0500)]
nir/serialize: serialize writemask for vec8 and vec16
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 12 Nov 2019 03:28:17 +0000 (22:28 -0500)]
nir/serialize: serialize swizzles for vec8 and vec16
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Thu, 7 Nov 2019 05:28:01 +0000 (00:28 -0500)]
nir/serialize: reuse the writemask field for 2 src X swizzles of SSA ALU
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Wed, 6 Nov 2019 03:14:28 +0000 (22:14 -0500)]
nir/serialize: remove up to 3 consecutive equal ALU instruction headers
vec4 scalarized ALUs typically have 4 equal instruction headers, so remove
the last 3.
There are no bits left in the ALU header for more flags, so future
extensions of NIR will have to use something like instr_type == 15
to describe more complex ALU instructions.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 23:10:40 +0000 (18:10 -0500)]
nir/serialize: try to pack both deref array src into 32 bits
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 23:24:27 +0000 (18:24 -0500)]
nir/serialize: cleanup - fold nir_deref_type_var cases into switches
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 22:53:32 +0000 (17:53 -0500)]
nir/serialize: try to put deref->var index into the unused bits of the header
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 22:39:38 +0000 (17:39 -0500)]
nir/serialize: don't serialize mode for deref non-cast instructions
It can be derived from src and var. This frees 10 bits in the header
that will be used later.
"mode" is moved in the structure, because those bits will be used for
something else later.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 01:11:11 +0000 (20:11 -0500)]
nir/serialize: don't store deref types if not needed
- type_cast: deduplicate types if the last one is the same
- derive the type from the parent for other derefs
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 05:09:29 +0000 (00:09 -0500)]
nir/serialize: try to pack two alu srcs into 1 uint32
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 04:29:33 +0000 (23:29 -0500)]
nir/serialize: pack nir_intrinsic_instr::const_index[] better
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 03:25:15 +0000 (22:25 -0500)]
nir/serialize: pack 1-component constants into 20 bits if possible
The majority of constants can be packed like this.
v2: - use enum for the packing encoding,
- trim packed_value to 20 bits add 1 bit to last_component,
which simplifies a later commit
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 03:15:17 +0000 (22:15 -0500)]
nir/serialize: pack load_const with non-64-bit constants better
v2: use blob_write_uint8/16
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 02:31:40 +0000 (21:31 -0500)]
nir/serialize: try to store a diff in var data locations instead of var data
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 01:11:11 +0000 (20:11 -0500)]
nir/serialize: deduplicate serialized var types by reusing the last unique one
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Tue, 5 Nov 2019 00:42:42 +0000 (19:42 -0500)]
nir/serialize: don't serialize var->data for temporaries
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Wed, 30 Oct 2019 22:14:37 +0000 (18:14 -0400)]
nir/serialize: pack src better and limit the object count to 1M from 1G
We need to limit the object count to 1M to free 10 bits for the src
modifiers.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Fri, 25 Oct 2019 06:39:54 +0000 (02:39 -0400)]
nir/serialize: pack instructions better
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Wed, 20 Nov 2019 00:36:36 +0000 (19:36 -0500)]
util/blob: add 8-bit and 16-bit reads and writes
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Eric Anholt [Fri, 22 Nov 2019 23:16:27 +0000 (15:16 -0800)]
ci: Use a tag from the parallel-deqp-runner repo.
If the repo continues development, we don't want to accidentally pick
up potentially breaking changes on our next container rebuild.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Thu, 21 Nov 2019 18:54:13 +0000 (10:54 -0800)]
gitlab-ci/freedreno/a6xx: remove most of the flakes
xfb + lines/points still flakes too frequently (and the problem isn't
even related to xfb), but we can add the rest back into this mix now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Sun, 17 Nov 2019 20:04:50 +0000 (12:04 -0800)]
gitlab-ci/deqp: generate junit results
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Sun, 17 Nov 2019 19:57:26 +0000 (11:57 -0800)]
gitlab-ci/deqp: generate xml results for fails/flakes
Extract .qpa for the individual unexpected results and flakes, and
translate to xml, preserved with the artifacts. This allows easy
browsing of the test logs for fails/flakes, for easier debugging.
The # of logs to preserve is capped at 50 to avoid saving 100s of
megabytes of logs in case someone pushes a change that breaks
everything.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Fri, 22 Nov 2019 21:30:18 +0000 (13:30 -0800)]
gitlab-ci: bump arm test container
To pick up updated cts_runner and netcat for the flake reporting.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Sun, 17 Nov 2019 19:33:01 +0000 (11:33 -0800)]
gitlab-ci/deqp: detect and report flakes
If there are a small number of fails, re-run to determine if they are
flakes, and optionally (if `$FLAKES_CHANNEL` configured) report the
flakes.
This way flakes don't interfere with developers working on other
drivers, but get logged so that the developers working on the flaking
driver can monitor the situation.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Sun, 17 Nov 2019 19:28:16 +0000 (11:28 -0800)]
gitlab-ci/deqp: preserve caselists for blocks with fails
Bump cts_runner to pick up the change to preserve .qpa and caselist .txt
files for blocks of tests that contain fails, and preserve the caselist
files. To reproduce fails that depend on order of running tests, these
are useful.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Sun, 17 Nov 2019 19:16:09 +0000 (11:16 -0800)]
gitlab-ci/deqp: preserve full list of unexpected results
The log only shows the first 50, but preserve the full list for easier
browsing.
(Also move return of exit code to end which makes later patches in the
series easier)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Rob Clark [Fri, 15 Nov 2019 18:15:32 +0000 (10:15 -0800)]
gitlab-ci: update deqp build so we can generate xml
Update the deqp build to preserve testlog-to-xml and stylesheets, so
deqp runner can extract .qpa for failed/flaked tests, and convert to
xml. With this, will be able to browse output from failed tests
directly from the artifacts.
The main motiviation is to give better visibility into what happens with
flaked tests, when it is difficult/impossible to reproduce the flake
locally (ie. when it happens once out of N million tests). But this
should also make it easier to debug regressions that a MR triggers,
especially when it is on hw that you don't have.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Markus Wick [Tue, 5 Nov 2019 08:16:37 +0000 (09:16 +0100)]
drirc: Enable glthread for dolphin/citra/yuzu.
Dolphin: 75 fps -> 88 fps - Super Mario Galaxy
Citra: 81 fps -> 91 fps - A Link Between Worlds
Yuzu: 21 fps -> 27 fps - Super Mario Odyssey
Dolphin still has many syncs because of glFenceSync and glClientWaitSync.
Moving them to the dispatcher thread might yield another speedup.
Yuzu uses a compatible profile by default. This benchmark used the variable
MESA_GL_VERSION_OVERRIDE=4.5FC to overwrite this behavior.
This profilation was done on a mobile i7-8550U CPU with i965.
Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Markus Wick [Sun, 3 Nov 2019 08:49:59 +0000 (09:49 +0100)]
mesa/glthread: Implement ARB_multi_bind.
Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Rhys Perry [Fri, 22 Nov 2019 19:38:51 +0000 (19:38 +0000)]
aco: fix waitcnts for barriers at block ends
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: d1b9deee ('aco: improve waitcnt insertion around loops')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Zebediah Figura [Tue, 5 Nov 2019 16:21:21 +0000 (10:21 -0600)]
Revert "draw: revert using correct order for prim decomposition."
This reverts commit
f97b731c82afb06cfd6ffebc90a3e098a9a1b308.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/250
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Kenneth Graunke [Wed, 5 Jun 2019 20:15:35 +0000 (13:15 -0700)]
iris: Change keybox parenting
For temporary lookups, just allocate out of the NULL ralloc context,
so we don't have to edit the linked list of ralloc children to add it
and then immediately remove it again.
When uploading a new shader, allocate the keybox off the shader, so
if we delete the shader the keybox also goes away. Less manual cleanup.
Ian Romanick [Sat, 16 Nov 2019 21:19:47 +0000 (13:19 -0800)]
nir/range_analysis: Make sure the table validation only occurs once
All of the tables are static const, so they only need to be validated
once. As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass. Even with
the help of the previous commit, this does not always occur.
-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5
Reviewed-by: Eric Anholt <eric@anholt.net>
Ian Romanick [Sat, 16 Nov 2019 21:23:31 +0000 (13:23 -0800)]
nir/range-analysis: Add pragmas to help loop unrolling
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the look ups
of static const arrays with now constant indices, and then elmininate
all the actuall assertions. It seems none of this happens even at -O3.
Adding the pragmas helps encourage loop unrolling at some optimization
levels. I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.
-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5
v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.
Reviewed-by: Eric Anholt <eric@anholt.net>
Danylo Piliaiev [Thu, 21 Nov 2019 13:04:37 +0000 (15:04 +0200)]
glsl: Add varyings to "zero-init of uninitialized vars" workaround
Varyings are similar to already handled cases. And "glsl_zero_init"
name of the workaround already looks like it should include varyings.
The issue was observed in GiMark subtest from GpuTest.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Alyssa Rosenzweig [Thu, 21 Nov 2019 18:40:00 +0000 (13:40 -0500)]
pan/midgard: Use lower_tex_without_implicit_lod
Just a bit of cleanup. lower_tex can do this lowering for us, which
should also eliminate some special cases (one less thing to fix if we
ever need texturing in tess/geom/etc, perhaps?)
Closes #2133
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Christian Gmeiner [Fri, 15 Nov 2019 16:35:50 +0000 (17:35 +0100)]
etnaviv: use a more self-explanatory param name
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Christian Gmeiner [Fri, 15 Nov 2019 16:34:11 +0000 (17:34 +0100)]
etnaviv: drop not used config_out function param
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Samuel Pitoiset [Thu, 21 Nov 2019 07:29:25 +0000 (08:29 +0100)]
gitlab-ci: reduce the number of scons build
It seems overkill to me to build scons 7x for every pipeline.
Scons is now build with the oldest llvm version in scons-old-llvm
and with the newest llvm version in scons.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Alyssa Rosenzweig [Thu, 21 Nov 2019 13:43:21 +0000 (08:43 -0500)]
panfrost: Add lcra.c to Android.mk
This was forgotten.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Thu, 21 Nov 2019 13:45:27 +0000 (08:45 -0500)]
pan/midgard: Enable LOD lowering only on buggy chips
T720 and earlier need this workaround, so check the quirk before
lowering.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Wed, 20 Nov 2019 02:21:19 +0000 (21:21 -0500)]
pan/midgard: Describe quirk MIDGARD_BROKEN_LOD
Corresponds to errata #10471, applies to T6xx and T720. Fixed in T760.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Thu, 21 Nov 2019 13:43:53 +0000 (08:43 -0500)]
pan/midgard: Add LOD bias/clamp lowering
We fetch the info with the new intrinsic and lower with ALU ops for txl
instructions, which seemingly correspond to "TEXGRD" instructions (what
we call textureLod).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Thu, 21 Nov 2019 13:42:28 +0000 (08:42 -0500)]
pan/midgard: Implement load_sampler_lod_paramaters_pan
We can stuff this information in as parametrized system values, like we
currently do texture size and SSBO addresses.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Thu, 21 Nov 2019 13:41:22 +0000 (08:41 -0500)]
nir: Add load_sampler_lod_paramaters_pan intrinsic
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Markus Wick [Sun, 17 Nov 2019 18:12:04 +0000 (19:12 +0100)]
mapi/glapi: Generate sizeof() helpers instead of fixed sizes.
Generating a source code with a fixed size leads to issues with plattform dependent types.
We either hard code 4 or 8 bytes there, and both are wrong on the other plattform.
So this patch solves this issue by generating eg sizeof(GLsizeiptr), which is valid both
on 32 and on 64 bit plattforms.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Ian Romanick [Mon, 18 Nov 2019 19:52:47 +0000 (11:52 -0800)]
intel/fs: Disable conditional discard optimization on Gen4 and Gen5
The CMP instruction on Gen4 and Gen5 generates one bit (the LSB) of
valid data and 31 bits of junk. Results of comparisons that are used as
Boolean values need to have a fixup applied to generate the proper 0/~0
values.
Calling fs_visitor::nir_emit_alu with need_dest=false prevents the fixup
code from being generated. This results in a sequence like:
cmp.l.f0.0(16) g8<1>F g14<8,8,1>F 0x0F /* 0F */
...
cmp.l.f0.0(16) g4<1>F g6<8,8,1>F 0x0F /* 0F */
(+f0.1) or.z.f0.1(16) null<1>UD g4<8,8,1>UD g8<8,8,1>UD
instead of
cmp.l.f0.0(16) g8<1>F g14<8,8,1>F 0x0F /* 0F */
...
cmp.l.f0.0(16) g4<1>F g6<8,8,1>F 0x0F /* 0F */
or(16) g4<1>UD g4<8,8,1>UD g8<8,8,1>UD
(+f0.1) and.z.f0.1(16) null<1>UD g4<8,8,1>UD 1UD
I examined a couple of the shaders hurt by this change, and ALL of them
would have been affected by this bug. :(
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1836
Fixes: 0ba9497e66a ("intel/fs: Improve discard_if code generation")
Iron Lake
total instructions in shared programs:
8122757 ->
8122957 (<.01%)
instructions in affected programs: 8307 -> 8507 (2.41%)
helped: 0
HURT: 100
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.84% max: 6.67% x̄: 2.81% x̃: 2.76%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.58% 3.03%
Instructions are HURT.
total cycles in shared programs:
188510100 ->
188510376 (<.01%)
cycles in affected programs: 76018 -> 76294 (0.36%)
helped: 0
HURT: 55
HURT stats (abs) min: 2 max: 12 x̄: 5.02 x̃: 4
HURT stats (rel) min: 0.07% max: 3.75% x̄: 0.86% x̃: 0.56%
95% mean confidence interval for cycles value: 4.33 5.71
95% mean confidence interval for cycles %-change: 0.60% 1.12%
Cycles are HURT.
GM45
total instructions in shared programs:
4994403 ->
4994503 (<.01%)
instructions in affected programs: 4212 -> 4312 (2.37%)
helped: 0
HURT: 50
HURT stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel) min: 0.84% max: 6.25% x̄: 2.76% x̃: 2.72%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.45% 3.07%
Instructions are HURT.
total cycles in shared programs:
128928750 ->
128928982 (<.01%)
cycles in affected programs: 67442 -> 67674 (0.34%)
helped: 0
HURT: 47
HURT stats (abs) min: 2 max: 12 x̄: 4.94 x̃: 4
HURT stats (rel) min: 0.09% max: 3.75% x̄: 0.75% x̃: 0.53%
95% mean confidence interval for cycles value: 4.19 5.68
95% mean confidence interval for cycles %-change: 0.50% 1.00%
Cycles are HURT.
Dylan Baker [Fri, 22 Nov 2019 00:33:19 +0000 (16:33 -0800)]
docs: update calendar, add news item and link release notes for 19.2.6
Dylan Baker [Fri, 22 Nov 2019 00:31:47 +0000 (16:31 -0800)]
docs: Add SHA256 sum for 19.2.6
Dylan Baker [Fri, 22 Nov 2019 00:04:11 +0000 (16:04 -0800)]
docs: Add release notes for 19.2.6
Marek Olšák [Tue, 5 Nov 2019 02:29:56 +0000 (21:29 -0500)]
nir/serialize: do ctx = {0} instead of manual initializations
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Marek Olšák [Mon, 4 Nov 2019 23:09:26 +0000 (18:09 -0500)]
nir: strip as we serialize to remove the nir_shader_clone call
Serializing stripped NIR is faster now.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Christian Gmeiner [Tue, 6 Aug 2019 21:49:03 +0000 (23:49 +0200)]
etnaviv: add drm-shim
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Engestrom [Thu, 21 Nov 2019 20:29:35 +0000 (20:29 +0000)]
vk_util: drop duplicate formats in vk_format_map[]
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Anholt <eric@anholt.net>