Samuel Pitoiset [Tue, 26 Mar 2019 10:34:42 +0000 (11:34 +0100)]
ac/nir: fix nir_op_b2i16
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Engestrom [Tue, 26 Mar 2019 11:21:09 +0000 (11:21 +0000)]
meson: strip rpath from megadrivers
More specifically, use the library file that has been post-processed by Meson
when creating the hardlinks.
Bugs: https://bugs.freedesktop.org/show_bug.cgi?id=108766
Fixes: 3218056e0eb375eeda47 "meson: Build i965 and dri stack"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Tapani Pälli [Thu, 28 Mar 2019 09:35:27 +0000 (11:35 +0200)]
spirv: fix a compiler warning
Fixes implicit conversion from enumeration type 'SpvOp' warning.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Lionel Landwerlin [Sun, 31 Mar 2019 09:41:17 +0000 (10:41 +0100)]
i965: perf: update render basic configs for big core gen9/gen10
This updates allows an MI_LRI to trigger a OA report write in the
global OA buffer. This isn't really useful for us, we just keep close
to the internal public configs.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Sun, 31 Mar 2019 09:38:19 +0000 (10:38 +0100)]
i965: perf: add ring busyness metric for cfl gt2
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Fri, 4 May 2018 11:28:57 +0000 (12:28 +0100)]
i965: perf: enable Icelake metrics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Fri, 4 May 2018 11:28:34 +0000 (12:28 +0100)]
i965: perf: add Icelake metrics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Fri, 4 May 2018 10:32:07 +0000 (11:32 +0100)]
i965: perf: sklgt2: drop programming of an unused NOA register
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 3 May 2018 16:57:01 +0000 (17:57 +0100)]
i965: perf: hsw: drop register programming not needed on HSW
This register is flagged as IVB only in the documentation.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 3 May 2018 16:56:30 +0000 (17:56 +0100)]
i965: perf: chv: fixup counters names
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 3 May 2018 16:51:42 +0000 (17:51 +0100)]
i965: perf: add PMA stall metrics
These are new metrics for Gen8/9 to measure the effect of the PMA
stall workaround fix.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 3 May 2018 17:54:20 +0000 (18:54 +0100)]
i965: perf: sklgt2: update memory write config
This rework the programming between older pre-production steppings &
new ones.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 3 May 2018 17:52:46 +0000 (18:52 +0100)]
i965: perf: sklgt2: update compute metrics config
This unifies some of the programming between pre-production stepping
and production ones.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Thu, 3 May 2018 17:51:23 +0000 (18:51 +0100)]
i965: perf: sklgt2: update a priority for register programming
This makes no difference in term of programming, it's just a cleanup.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Alyssa Rosenzweig [Sun, 31 Mar 2019 04:34:22 +0000 (04:34 +0000)]
panfrost: Implement FIXED formats
Fixes crash in dEQP-GLES2.functional.draw.random.9
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 31 Mar 2019 04:27:29 +0000 (04:27 +0000)]
panfrost: Fix index calculation types and asserts
Fixes crash in
dEQP-GLES2.functional.draw.draw_elements.points.single_attribute.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 31 Mar 2019 04:26:48 +0000 (04:26 +0000)]
panfrost: Clean index state between indexed draws
Fixes subsequent tests in
dEQP-GLES2.functional.draw.draw_elements.indices.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Sun, 31 Mar 2019 04:15:46 +0000 (04:15 +0000)]
panfrost/decode: Print negative_start
This property slipped through..
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Fri, 29 Mar 2019 01:46:17 +0000 (01:46 +0000)]
panfrost: Implement missing texture formats
- Implements RGB565/RGBA5551 formats
- Don't advertise support for flipped RGBA5551 and ETC
Fixes remaining tests in dEQP-GLES2.functional.texture.format.* which is
now at 36/36.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Thu, 28 Mar 2019 23:47:10 +0000 (23:47 +0000)]
panfrost: Extend tiling for cubemaps
transfer_unmap now tiles for any tiled resource, not just TEXTURE_2D,
which should more than just cubemaps!
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Thu, 28 Mar 2019 04:33:28 +0000 (04:33 +0000)]
panfrost: Implement command stream for linear cubemaps
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Thu, 28 Mar 2019 04:27:13 +0000 (04:27 +0000)]
panfrost/midgard: Emit cubemap coordinates
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Thu, 28 Mar 2019 02:12:03 +0000 (02:12 +0000)]
panfrost: Include all cubemap faces in bitmap list
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Thu, 28 Mar 2019 01:59:02 +0000 (01:59 +0000)]
panfrost/decode: Decode all cubemap faces
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Wed, 27 Mar 2019 04:37:26 +0000 (04:37 +0000)]
panfrost: Preliminary work for cubemaps
Again, not yet functional, but this sets up the memory management for
cube maps.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Wed, 27 Mar 2019 04:36:42 +0000 (04:36 +0000)]
panfrost/midgard: Add L/S op for writing cubemap coordinates
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Wed, 27 Mar 2019 04:36:10 +0000 (04:36 +0000)]
panfrost/midgard: Disassemble `cube` texture op
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Wed, 27 Mar 2019 02:22:33 +0000 (02:22 +0000)]
panfrost: Fix vertex buffer corruption
Fixes crash in dEQP-GLES2.functional.buffer.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Rob Clark [Thu, 14 Mar 2019 19:12:28 +0000 (15:12 -0400)]
iris: fix set_sampler_view
Update to match docs.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rob Clark [Thu, 14 Mar 2019 12:48:08 +0000 (08:48 -0400)]
gallium/docs: clarify set_sampler_views (v2)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rob Clark [Sat, 23 Mar 2019 15:38:37 +0000 (11:38 -0400)]
freedreno/ir3: convert to "new style" frag inputs
Add support for load_barycentric_pixel, load_interpolated_input, and
friends. For now, this retains support for old-style inputs, which can
probably be dropped with some ttn work.
Prep work for sample-shading support.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 26 Mar 2019 15:28:34 +0000 (11:28 -0400)]
freedreno/ir3: add pass to move varying loads
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sun, 24 Mar 2019 15:58:21 +0000 (11:58 -0400)]
freedreno/ir3: rework varying packing
Originally we kept track of a table of inputs. But with new-style frag
inputs this becomes awkward. Re-work it so that initially we assigned
un-packed varying locations, and then after the shader is compiled scan
to find actual used inputs, and re-pack.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 23 Mar 2019 15:37:49 +0000 (11:37 -0400)]
freedreno/ir3: re-indent comment
Make it more clear that it applies to the following 'case' statements,
rather than the previous one.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Thu, 28 Mar 2019 14:57:31 +0000 (10:57 -0400)]
nir: add lower_all_io_to_elements
I need this part of lower_all_io_to_temps but without the actual
lowering to temps part.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 27 Mar 2019 15:29:56 +0000 (11:29 -0400)]
nir: print var name for load_interpolated_input too
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Sergii Romantsov [Thu, 28 Feb 2019 11:35:54 +0000 (13:35 +0200)]
i965,iris/blorp: do not blit 0-sizes
Seems there is no sense in blitting 0-sized sources
or destinations.
Additionaly it may cause segfaults for i965.
v2: Function call replaced with inline check
v3: Added check to avoid devision by zero (L. Landwerlin)
v4: Added simillar check for Iris (L. Landwerlin)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110239
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Vinson Lee [Sat, 30 Mar 2019 06:24:05 +0000 (23:24 -0700)]
gallium: Fix autotools build with libxatracker.la.
CXXLD libxatracker.la
/usr/bin/ld: ../../../../src/gallium/auxiliary/.libs/libgallium.a(tgsi_to_nir.o): in function `ttn_finalize_nir':
src/gallium/auxiliary/nir/tgsi_to_nir.c:2111: undefined reference to `gl_nir_lower_samplers_as_deref'
/usr/bin/ld: src/gallium/auxiliary/nir/tgsi_to_nir.c:2113: undefined reference to `gl_nir_lower_samplers'
Fixes: 9a834447d652 ("tgsi_to_nir: Produce optimized NIR for a given pipe_screen.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109929
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Timur Kristóf [Thu, 14 Mar 2019 14:32:37 +0000 (15:32 +0100)]
gallium: fix autotools build of pipe_msm.la
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Fixes: 9a834447d652 ("tgsi_to_nir: Produce optimized NIR for a given pipe_screen.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109929
Jason Ekstrand [Wed, 27 Mar 2019 15:13:28 +0000 (10:13 -0500)]
nir: Lock around validation fail shader dumping
This prevents getting mixed-up results if a multi-threaded app has two
validation errors in different threads.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Brian Paul [Fri, 29 Mar 2019 16:45:44 +0000 (10:45 -0600)]
util: no-op __builtin_types_compatible_p() for non-GCC compilers
__builtin_types_compatible_p() is GCC-specific and breaks the
MSVC build.
This intrinsic has been in u_vector_foreach() for a long time, but
that macro has only recently been used in code
(nir/nir_opt_comparison_pre.c) that's built with MSVC.
Fixes: 2cf59861a ("nir: Add partial redundancy elimination for compares")
Reviewed-by: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Caio Marcelo de Oliveira Filho [Fri, 29 Mar 2019 18:55:08 +0000 (11:55 -0700)]
iris: Clean up compiler warnings about unused
Removed a few unused variables and iris_getparam_boolean().
Kept 'name' around since there's a commented debug that make use of it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Engestrom [Thu, 15 Nov 2018 18:07:33 +0000 (18:07 +0000)]
egl: hide entrypoints that shouldn't be exported when using glvnd
From GLVND author:
> From a functional standpoint, exporting additional symbols doesn't
> really matter, since libglvnd will load the vendor libraries with
> RTLD_LOCAL.
Suggested-by: Kyle Brenneman <kbrenneman@nvidia.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Kyle Brenneman <kbrenneman@nvidia.com>
Karol Herbst [Sun, 24 Mar 2019 04:36:36 +0000 (05:36 +0100)]
nir/validate: validate that tex deref sources are actually derefs
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Karol Herbst [Thu, 28 Mar 2019 22:50:54 +0000 (23:50 +0100)]
nir/print: fix printing the image_array intrinsic index
Fixes: 0de003be0363 ("nir: Add handle/index-based image intrinsics")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Timothy Arceri [Thu, 28 Mar 2019 00:43:06 +0000 (11:43 +1100)]
Revert "ac/nir: use new LLVM 8 intrinsics for SSBO atomic operations"
This reverts commit
29132af2347ede46a6d02422295a5fadbe5fe788.
It seems the new intrinsic causes a hang on radeonsi (VEGA) when running the
piglit test:
tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test
Samuel Pitoiset [Fri, 29 Mar 2019 07:39:43 +0000 (08:39 +0100)]
ac: fix return type for llvm.amdgcn.frexp.exp.i32.64
This fixes the following piglit with RadeonSI
tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-frexp-dvec4.shader_test
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Gert Wollny [Wed, 27 Mar 2019 08:07:36 +0000 (09:07 +0100)]
virgl: Add a caps feature check version
When we add new feature checks on the host side that is used to
enable a cap conditionally that was enabled unconditionally before
we might end up with a feature regression when a new mesa version
is used with an old virglrenderer version that doesn't check for
that cap.
To work around this problem add a version id to the caps that corresponds
to the features that are actually checked on the host and check that
version too when enabling the cap.
Fixes: 2ee197d6e84aa37638d423363aca183952816067
virgl: Enable mixed color FBO attachemnets only when the host supports it
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Pohsien Wang <pwang@chromium.org>
Samuel Pitoiset [Thu, 28 Mar 2019 15:03:03 +0000 (16:03 +0100)]
radv: do not always initialize HTILE in compressed state
Especially when performing a transtion from UNDEFINED->GENERAL,
the driver shouldn't initialize HTILE metadata in compressed
state because it doesn't decompress when the src layout is
GENERAL.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110259
Fixes: 3a2e93147f7 ("radv: always initialize HTILE when the src layout is UNDEFINED")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Kenneth Graunke [Sat, 23 Mar 2019 16:32:38 +0000 (09:32 -0700)]
iris: Print the memzone name when allocating BOs with INTEL_DEBUG=buf
This gives me an idea of what kinds of buffers are being allocated on
the fly which could help inform our cache decisions.
Brian Paul [Fri, 29 Mar 2019 02:33:32 +0000 (20:33 -0600)]
nir: use {0} initializer instead of {} to fix MSVC build
Trivial change.
Fixes: c6ee46a75 ("nir: Add nir_alu_srcs_negative_equal")
Ian Romanick [Wed, 23 May 2018 01:56:41 +0000 (18:56 -0700)]
intel/compiler: Use partial redundancy elimination for compares
Almost all of the hurt shaders are repeated instances of the same shader
in synmark's compilation speed tests.
shader-db results:
All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs:
15256840 ->
15256389 (<.01%)
instructions in affected programs: 54137 -> 53686 (-0.83%)
helped: 288
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.57 x̃: 1
helped stats (rel) min: 0.06% max: 26.67% x̄: 1.99% x̃: 0.74%
95% mean confidence interval for instructions value: -1.76 -1.38
95% mean confidence interval for instructions %-change: -2.47% -1.50%
Instructions are helped.
total cycles in shared programs:
372286583 ->
372283851 (<.01%)
cycles in affected programs: 833829 -> 831097 (-0.33%)
helped: 265
HURT: 16
helped stats (abs) min: 2 max: 74 x̄: 11.81 x̃: 4
helped stats (rel) min: 0.04% max: 9.07% x̄: 0.99% x̃: 0.35%
HURT stats (abs) min: 2 max: 130 x̄: 24.88 x̃: 8
HURT stats (rel) min: <.01% max: 12.31% x̄: 1.44% x̃: 0.27%
95% mean confidence interval for cycles value: -12.30 -7.15
95% mean confidence interval for cycles %-change: -1.06% -0.64%
Cycles are helped.
Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs:
5038653 ->
5038495 (<.01%)
instructions in affected programs: 13939 -> 13781 (-1.13%)
helped: 50
HURT: 1
helped stats (abs) min: 1 max: 15 x̄: 3.18 x̃: 4
helped stats (rel) min: 0.33% max: 13.33% x̄: 2.24% x̃: 1.09%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.83% max: 0.83% x̄: 0.83% x̃: 0.83%
95% mean confidence interval for instructions value: -3.73 -2.47
95% mean confidence interval for instructions %-change: -3.16% -1.21%
Instructions are helped.
total cycles in shared programs:
128118922 ->
128118228 (<.01%)
cycles in affected programs: 134906 -> 134212 (-0.51%)
helped: 50
HURT: 0
helped stats (abs) min: 2 max: 60 x̄: 13.88 x̃: 18
helped stats (rel) min: 0.06% max: 3.19% x̄: 0.74% x̃: 0.70%
95% mean confidence interval for cycles value: -16.54 -11.22
95% mean confidence interval for cycles %-change: -0.95% -0.53%
Cycles are helped.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Wed, 23 May 2018 01:19:16 +0000 (18:19 -0700)]
nir: Add partial redundancy elimination for compares
This pass attempts to dectect code sequences like
if (x < y) {
z = y - x;
...
}
and replace them with sequences like
t = x - y;
if (t < 0) {
z = -t;
...
}
On architectures where the subtract can generate the flags used by the
if-statement, this saves an instruction. It's also possible that moving
an instruction out of the if-statement will allow
nir_opt_peephole_select to convert the whole thing to a bcsel.
Currently only floating point compares and adds are supported. Adding
support for integer will be a challenge due to integer overflow. There
are a couple possible solutions, but they may not apply to all
architectures.
v2: Fix a typo in the commit message and a couple typos in comments.
Fix possible NULL pointer deref from result of push_block(). Add
missing (-A + B) case. Suggested by Caio.
v3: Fix is_not_const_zero to work correctly with types other than
nir_type_float32. Suggested by Ken.
v4: Add some comments explaining how this works. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Wed, 23 May 2018 01:18:07 +0000 (18:18 -0700)]
nir: Add nir_alu_srcs_negative_equal
v2: Move bug fix in get_neg_instr from the next patch to this patch
(where it was intended to be in the first place). Noticed by Caio.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Thu, 24 May 2018 18:37:51 +0000 (11:37 -0700)]
nir: Add nir_const_value_negative_equal
v2: Rebase on 1-bit Boolean changes.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Ian Romanick [Thu, 28 Feb 2019 04:15:32 +0000 (20:15 -0800)]
nir/algebraic: Add missing 16-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8. This broke ~180 tests. This
bug was introduced in v2.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Ian Romanick [Thu, 28 Feb 2019 04:12:46 +0000 (20:12 -0800)]
nir/algebraic: Add missing 64-bit extract_[iu]8 patterns
No shader-db changes on any Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8. This broke ~180 tests. This
bug was introduced in v2.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Ian Romanick [Thu, 28 Feb 2019 04:08:38 +0000 (20:08 -0800)]
nir/algebraic: Remove redundant extract_[iu]8 patterns
No shader-db changes on any Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Ian Romanick [Thu, 28 Feb 2019 03:52:12 +0000 (19:52 -0800)]
nir/algebraic: Fix up extract_[iu]8 after loop unrolling
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs:
15256840 ->
15256837 (<.01%)
instructions in affected programs: 4713 -> 4710 (-0.06%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.06% max: 0.08% x̄: 0.06% x̃: 0.06%
total cycles in shared programs:
372286583 ->
372286583 (0.00%)
cycles in affected programs: 198516 -> 198516 (0.00%)
helped: 1
HURT: 1
helped stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
helped stats (rel) min: <.01% max: <.01% x̄: <.01% x̃: <.01%
HURT stats (abs) min: 10 max: 10 x̄: 10.00 x̃: 10
HURT stats (rel) min: 0.01% max: 0.01% x̄: 0.01% x̃: 0.01%
No changes on any other Intel platform.
v2: Use a loop to generate patterns. Suggested by Jason.
v3: Fix a copy-and-paste bug in the extract_[ui] of ishl loop that would
replace an extract_i8 with and extract_u8. This broke ~180 tests. This
bug was introduced in v2.
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> [v2]
Acked-by: Jason Ekstrand <jason@jlekstrand.net> [v2]
Dave Airlie [Fri, 8 Mar 2019 03:09:05 +0000 (13:09 +1000)]
nir/deref: fix struct wrapper casts. (v3)
llvm/spir-v spits out some struct a { struct b {} }, but it
doesn't deref, it casts (struct a) to (struct b), reconstruct
struct derefs instead of casts for these.
v2: use ssa_def_rewrite uses, rework the type restrictions (Jason)
v3: squish more stuff into one function, drop unused temp (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rafael Antognolli [Fri, 15 Feb 2019 23:43:12 +0000 (15:43 -0800)]
i965/blorp: Remove unused parameter from blorp_surf_for_miptree.
It seems pretty useless nowadays.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Anuj Phogat [Tue, 26 Mar 2019 22:46:24 +0000 (15:46 -0700)]
iris/icl: Add WA_2204188704 to disable pixel shader panic dispatch
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Anuj Phogat [Tue, 26 Mar 2019 22:45:29 +0000 (15:45 -0700)]
iris/icl: Set Enabled Texel Offset Precision Fix bit
h/w specification requires this bit to be always set.
See Mesa commit
5eb173304bd.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Rob Clark [Thu, 28 Mar 2019 17:33:30 +0000 (13:33 -0400)]
freedreno/ir3: align const size to vec4
This is no longer true since PIPE_CAP_PACKED_UNIFORMS was enabled.
Fixes: 3c8779af325 freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 26 Mar 2019 19:21:12 +0000 (15:21 -0400)]
freedreno/ir3: reads/writes to unrelated arrays are not dependent
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sun, 24 Mar 2019 15:16:12 +0000 (11:16 -0400)]
freedreno/ir3: sched fix
Not sure why new-style frag inputs start triggering this. But we
probably shouldn't consider src's from other blocks.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 26 Mar 2019 14:00:12 +0000 (10:00 -0400)]
freedreno/a6xx: small cleanup
Signed-off-by: Rob Clark <robdclark@gmail.com>
Kenneth Graunke [Tue, 19 Mar 2019 21:00:50 +0000 (14:00 -0700)]
iris: Fix blits with S8_UINT destination
For depth and stencil blits, we always want the main mask to be Z, and
the secondary pass mask to be S. If asked to blit Z+S to S, we should
handle the blit in the second pass which properly gets the stencil
resources.
Before, we were trying to handle S as the main mask, and accidentally
blitting a Z source to a S destination, which doesn't work out well.
Fixes Piglit's "framebuffer-blit-levels {draw,read} stencil" tests.
Kenneth Graunke [Mon, 11 Mar 2019 22:03:13 +0000 (15:03 -0700)]
st/mesa: Fix blitting from GL_DEPTH_STENCIL to GL_STENCIL_INDEX
Fixes assertion failures in Piglit's "framebuffer-blit-levels
{draw,read} stencil" tests on iris. Also fixes assert failures in
frameretrace, which tries to ReadPixels the stencil values (only)
from a Z24S8 depth/stencil attachment.
Reviewed-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Kristian H. Kristensen [Wed, 27 Mar 2019 22:31:49 +0000 (15:31 -0700)]
freedreno/ir3: Add workaround for VS samgq
This instruction needs a workaround when used from vertex shaders.
Fixes:
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler2dshadow_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.texturegradoffset.sampler3d_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler2dshadow_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_fixed_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgradoffset.sampler3d_float_vertex
dEQP-GLES3.functional.shaders.texture_functions.textureprojgrad.sampler2dshadow_vertex
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Kristian H. Kristensen [Thu, 28 Mar 2019 06:05:01 +0000 (23:05 -0700)]
freedreno/ir3: Don't access beyond available regs
emit_cat5() needs to check if the last optional reg is there before it
accesses it.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Eric Engestrom [Tue, 19 Mar 2019 14:36:30 +0000 (14:36 +0000)]
util/disk_cache: close fd in the fallback path
There are multiple `goto path_fail` with an open fd, but none that go to
`fail:` without going through `path_fail:` first, so let's just move the
`close(fd)` there.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Samuel Pitoiset [Thu, 28 Mar 2019 11:23:24 +0000 (12:23 +0100)]
radv: skip updating depth/color metadata for conditional rendering
I don't think we should update metadata when conditional rendering
is enabled. For some reasons, some CTS breaks only on SI.
This fixes the following CTS on SI:
dEQP-VK.conditional_rendering.draw_clear.clear.depth.*
Cc: 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Kenneth Graunke [Thu, 28 Mar 2019 06:09:11 +0000 (23:09 -0700)]
st/nir: Free the GLSL IR after linking.
i965 does this, and st's tgsi path does this. st/nir did not.
Cuts 138MB of memory from a DiRT Rally trace, which is about 44%
of the total GLSL IR memory.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Samuel Pitoiset [Thu, 21 Mar 2019 09:24:11 +0000 (10:24 +0100)]
radv: enable VK_AMD_gpu_shader_int16
This extension allows 16-bit support to Frexp/FrexpStruct.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 22 Mar 2019 13:48:38 +0000 (14:48 +0100)]
radv: do not lower frexp_exp and frexp_sig
Hardware has two instructions.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 22 Mar 2019 10:59:32 +0000 (11:59 +0100)]
ac: add ac_build_frex_exp() helper ans 16-bit/32-bit support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 21 Mar 2019 14:47:04 +0000 (15:47 +0100)]
ac: add ac_build_frexp_mant() helper and 16-bit/32-bit support
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Kenneth Graunke [Tue, 26 Mar 2019 07:25:31 +0000 (00:25 -0700)]
iris: Actually advertise some modifiers
I neglected to fill out this driver function, causing us to advertise
0 modifiers. Now we advertise the various tilings and let the driver
pick them. I've verified that X tiling works with Weston (by hacking
the list to skip Y tiling).
Y+CCS doesn't work yet because it's multiplane and the Gallium dri
state tracker isn't really prepared for that. Leave it off for now.
Toni Lönnberg [Wed, 16 Jan 2019 11:55:25 +0000 (13:55 +0200)]
intel/genxml: Media instructions and structures for gen11
v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
- fix missing type
- fix *_FQM_*/*_QM_* commands
- shorten some media structs using groups
- factor out memory attributes
- switch MI_FLUSH_DW fields to bool
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Toni Lönnberg [Wed, 16 Jan 2019 11:55:08 +0000 (13:55 +0200)]
intel/genxml: Media instructions and structures for gen10
v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
- fix missing type
- fix *_FQM_*/*_QM_* commands
- shorten some media structs using groups
- factor out memory attributes
- switch MI_FLUSH_DW fields to bool
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Toni Lönnberg [Wed, 16 Jan 2019 11:54:46 +0000 (13:54 +0200)]
intel/genxml: Media instructions and structures for gen9
v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
- fix missing type
- fix *_FQM_*/*_QM_* commands
- shorten some media structs using groups
- factor out memory attributes
- switch MI_FLUSH_DW fields to bool
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Toni Lönnberg [Wed, 16 Jan 2019 11:54:25 +0000 (13:54 +0200)]
intel/genxml: Media instructions and structures for gen8
v2: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
- switch MI_FLUSH_DW fields to bool
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Toni Lönnberg [Wed, 16 Jan 2019 11:54:02 +0000 (13:54 +0200)]
intel/genxml: Media instructions and structures for gen7.5
v2: Fixed MI_WAIT_FOR_EVENT to be for video also
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Toni Lönnberg [Wed, 16 Jan 2019 11:53:13 +0000 (13:53 +0200)]
intel/genxml: Media instructions and structures for gen7
v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Toni Lönnberg [Wed, 16 Jan 2019 11:52:11 +0000 (13:52 +0200)]
intel/genxml: Media instructions and structures for gen6
v2: Fixed MI_WAIT_FOR_EVENT to be for blitter and video also
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Toni Lönnberg [Thu, 15 Nov 2018 14:04:34 +0000 (16:04 +0200)]
intel/genxml: Only handle instructions meant for render engine when generating
headers
v2: Fixed the check for engine
v3: Changed engine into an argument given to the scripts
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Dave Airlie [Wed, 27 Mar 2019 05:21:08 +0000 (15:21 +1000)]
softpipe: add indirect store buffer/image unit
The code to handle image unit indirect was missing
Fixes piglit tests/spec/arb_arrays_of_arrays/execution/image_store/basic-imageStore-mixed-const-non-const-uniform-index.shader_test
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Wed, 27 Mar 2019 04:06:50 +0000 (14:06 +1000)]
softpipe/draw: fix vertex id in soft paths.
This fixes the vertex id fetch in the non-llvm drawing paths.
This vertex id in elt mode comes from the elts not just a linear
value.
Note we don't bad basevertex in the elts case as it's already included
in the elts by the looks of it (at least tests fail if I add it)
Fixes piglit end-primitive tests and some others.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Kristian H. Kristensen [Tue, 26 Mar 2019 17:31:54 +0000 (10:31 -0700)]
freedreno/ir3: Push UBOs to constant file
We have a rather big constant file and it seems that the best way to
use it is to upload all UBOs and lower UBO access the load_uniform.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Kristian H. Kristensen [Tue, 26 Mar 2019 17:31:54 +0000 (10:31 -0700)]
freedreno/ir3: Enable PIPE_CAP_PACKED_UNIFORMS
This commit turns on the gallium cap and adds a pass to lower the
load_ubo intrinsics for block 0 back to load_uniform intrinsics and
adjust the backend where the cap switches units from vec4s to dwords.
As we stop using ir3_glsl_type_size() for uniform layout, this also
corrects an issue where we would allocate a vec4 slot for samplers in
uniforms, fixing:
dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_fragment
dEQP-GLES3.functional.shaders.struct.uniform.sampler_array_vertex
dEQP-GLES3.functional.shaders.struct.uniform.sampler_nested_fragment
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_vertex
dEQP-GLES2.functional.shaders.struct.uniform.sampler_nested_fragment
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Kristian H. Kristensen [Tue, 26 Mar 2019 16:53:38 +0000 (09:53 -0700)]
st/glsl_to_nir: Calculate num_uniforms from NumParameterValues
We don't need to determine the number of uniform slots here, it's
already available as prog->Parameters->NumParameterValues. The way we
previously determined the number of slots was also broken for
PackedDriverUniformStorage, where we would add loc (in dwords) and
type_size() (in vec4s).
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Anuj Phogat [Mon, 21 May 2018 20:54:13 +0000 (13:54 -0700)]
intel: Add Elkhart Lake PCI-IDs
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Anuj Phogat [Fri, 7 Sep 2018 21:40:12 +0000 (14:40 -0700)]
intel: Add Elkhart Lake device info
V2: Fix L3 bank count (Vivek)
Fix simulator_id and num_eu_per_subslice (Lionel)
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Leo Liu [Tue, 26 Mar 2019 18:36:09 +0000 (14:36 -0400)]
radeon/vcn: add H.264 constrained baseline support
VCN supports this profile as well as UVD, so add it
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
CC: <mesa-stable@lists.freedesktop.org>
Gurchetan Singh [Wed, 27 Mar 2019 02:20:13 +0000 (19:20 -0700)]
egl/android: chose node type based on swrast and preprocessor flags
kms_swrast can work with primary nodes out of the box, but also
with rendernodes if the build environment specifies the
EGL_FORCE_RENDERNODE flag.
Suggested-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gurchetan Singh [Wed, 13 Mar 2019 17:59:59 +0000 (10:59 -0700)]
egl/android: use software rendering when appropriate
Now the init logic fallbacks to or forces software rendering.
v2: simplify flow (@eric)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Gurchetan Singh [Wed, 13 Mar 2019 17:49:20 +0000 (10:49 -0700)]
egl/android: use swrast option in droid_load_driver
Load the kms_swrast driver when specified.
Doesn't work with drm_gralloc.
v2: remove unneeded line (@eric)
v3: Remove swrast_loader_extensions (@evelikov)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Gurchetan Singh [Wed, 13 Mar 2019 17:43:35 +0000 (10:43 -0700)]
egl/android: plumb swrast option
It's good to have options.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Gurchetan Singh [Wed, 27 Mar 2019 00:13:01 +0000 (17:13 -0700)]
egl/android: refactor droid_load_driver a bit
This way, we can use primary nodes with kms_swrast too.
Also fix up some whitespace issues.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>