Marek Olšák [Thu, 4 Jul 2019 01:12:46 +0000 (21:12 -0400)]
radeonsi: use BREAK_BATCH instead of FLUSH_DFSM when CB_TARGET_MASK changes
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 3 Jul 2019 04:22:29 +0000 (00:22 -0400)]
radeonsi/gfx10: don't expose unimplemented PIPE_CAP_QUERY_SO_OVERFLOW
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 4 Jul 2019 02:56:58 +0000 (22:56 -0400)]
radeonsi/gfx10: launch 2 compute waves per CU before going onto the next CU
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 4 Jul 2019 03:01:25 +0000 (23:01 -0400)]
radeonsi/gfx10: set more registers and fields
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 3 Jul 2019 04:09:21 +0000 (00:09 -0400)]
radeonsi/gfx10: enable LATE_ALLOC_GS
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 3 Jul 2019 03:35:05 +0000 (23:35 -0400)]
radeonsi/gfx10: set HS/GS/CS.WGP_MODE
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 3 Jul 2019 02:48:49 +0000 (22:48 -0400)]
radeonsi/gfx10: set GE_PC_ALLOC
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 3 Jul 2019 01:40:49 +0000 (21:40 -0400)]
radeonsi/gfx10: enable 1D textures
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Sat, 29 Jun 2019 03:48:14 +0000 (23:48 -0400)]
radeonsi/gfx10: enable image stores with DCC
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Sat, 29 Jun 2019 00:31:41 +0000 (20:31 -0400)]
radeonsi/gfx10: no need to invalidate L2 for framebuffer -> texture coherency
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 27 Jun 2019 02:57:10 +0000 (22:57 -0400)]
radeonsi/gfx10: support pixel shaders without exports
It only works if there are not color and no Z exports.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 27 Jun 2019 03:13:00 +0000 (23:13 -0400)]
radeonsi/gfx10: enable vertex shaders without param space allocation
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 4 Jul 2019 01:55:07 +0000 (21:55 -0400)]
radeonsi: update DCC settings from PAL
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 4 Jul 2019 00:43:28 +0000 (20:43 -0400)]
radeonsi: reorder shader IO indices for better IO space usage for tess and GS
The highest used index determines the stride for shader outputs in shaders
that use LDS or memory for outputs.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 3 Jul 2019 23:05:19 +0000 (19:05 -0400)]
radeonsi: decrease maximum supported GENERIC varying index from 42 to 31
This can decrease LDS and/or memory usage for shader outputs when geometry
shaders or tessellation is used.
Only PS inputs support higher indices and those aren't eliminated by
kill_outputs.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 3 Jul 2019 23:04:37 +0000 (19:04 -0400)]
radeonsi: cosmetic cleanup in si_shader_io_get_unique_index
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Tue, 2 Jul 2019 22:43:40 +0000 (18:43 -0400)]
radeonsi: fix and clean up shader_type passing
- don't pass it via a parameter if it can be derived from other parameters
- set shader_type for ac_rtld_open
- use enum pipe_shader_type instead of unsigned
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Thu, 27 Jun 2019 02:44:06 +0000 (22:44 -0400)]
radeonsi: enable RB+ for pixel shaders with no/non-contiguous color outputs
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie airlied@redhat.com
Marek Olšák [Tue, 25 Jun 2019 22:59:50 +0000 (18:59 -0400)]
radeonsi: don't set READ_ONLY for const_uploader to fix bindless texture hangs
Bindless textures can update descriptors with WRITE_DATA.
Cc: 19.1 <mesa-stable@lists.freedesktop.org>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie airlied@redhat.com
Alyssa Rosenzweig [Fri, 5 Jul 2019 15:40:22 +0000 (08:40 -0700)]
gallium: Add util_format_is_unorm8 check
Useful for formats that would work with the same driver code path as
RGBA8 UNORM but that don't meet the util_format_is_rgba8_variant
criteria due to a smaller channel count.
v2: Use simpler logic (suggested by Iago).
v3: Fix spelling erorr. boolean->bool (thank you airlied).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Alyssa Rosenzweig [Mon, 1 Jul 2019 22:01:19 +0000 (15:01 -0700)]
nir: Add Panfrost-specific blending intrinsic
This gives more flexibility than the normal store_deref/store_output
versions (particularly, it allows us to abuse the type system in awful
ways, which is necessary for efficient format conversion in blend
shaders.)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Pratik Vishwakarma [Tue, 9 Jul 2019 06:23:26 +0000 (11:53 +0530)]
radeonsi: Expose support for 10-bit VP9 decode
Fix si_vid_is_format_supported to expose support
for 10-bit VP9 decode using P016 format. Without
this change, 10-bit decode will be exposed only
for HEVC even though newer hardware support
10-bit decode for VP9.
Signed-off-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Alyssa Rosenzweig [Wed, 3 Jul 2019 20:00:14 +0000 (13:00 -0700)]
nir: Add nir_imm_vec4_16
We already have nir_imm_float16 and nir_imm_vec4; let's add the ability
to easily make immediate fp16 vectors as well, now that fp16 support is
maturing in NIR/GLSL.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Karol Herbst [Sun, 7 Jul 2019 19:27:47 +0000 (21:27 +0200)]
nvc0: remove nvc0_program.tp.input_patch_size
right now that's dead code
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Bas Nieuwenhuizen [Tue, 9 Jul 2019 09:03:56 +0000 (11:03 +0200)]
radv: Add a common member in the union to make things more clear.
This clarifies that the struct can be used when the shader can be
one of VS/TES.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Tue, 9 Jul 2019 09:00:33 +0000 (11:00 +0200)]
Revert "radv: keep track of whether NGG is used for GS on GFX10"
This reverts commit
63e0675d986744a9ed2d9a15b7cba84ff4a24fc2.
The GS is merged with the preceding shader and since the preceding
shader will have as_ngg set the final binary will have is_ngg set.
So we do not need the gs key here.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Juan A. Suarez Romero [Tue, 9 Jul 2019 09:22:13 +0000 (11:22 +0200)]
docs: update calendar, add news item and link release notes for 19.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Juan A. Suarez Romero [Tue, 9 Jul 2019 09:18:55 +0000 (09:18 +0000)]
docs: add sha256 checksums for 19.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit
e42399f4de80acb681b90ae4e35d8983b89d0329)
Juan A. Suarez Romero [Tue, 9 Jul 2019 09:09:53 +0000 (09:09 +0000)]
docs: add release notes for 19.1.2
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit
fe1f7b538b7e8e4bd221c5d52ae72a3721c6aa08)
Connor Abbott [Mon, 8 Jul 2019 16:17:30 +0000 (18:17 +0200)]
nir/lower_io_to_temporaries: Fix hash table leak
Fixes: c45f5db527252384395e55fb1149b673ec7b5fa8 ("nir/lower_io_to_temporaries: Handle interpolation intrinsics")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bas Nieuwenhuizen [Tue, 9 Jul 2019 07:41:14 +0000 (09:41 +0200)]
radv/gfx10: Use correct gs_out for tess point_mode.
Fixes: 204e4da9b47 "radv: Use correct gs_out with tessellation."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Tue, 9 Jul 2019 06:44:01 +0000 (08:44 +0200)]
radv: set correct number of VGPRs for GS on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Tue, 9 Jul 2019 06:44:00 +0000 (08:44 +0200)]
radv: fix VGT_ESGS_RING_ITEMSIZE for GS as NGG on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Tue, 9 Jul 2019 06:43:59 +0000 (08:43 +0200)]
radv: emit VGT_GS_MAX_VERT_OUT for legacy and NGG paths for GS
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Tue, 9 Jul 2019 06:43:58 +0000 (08:43 +0200)]
radv: emit the geometry shader as NGG if enabled on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Tue, 9 Jul 2019 06:43:57 +0000 (08:43 +0200)]
radv: keep track of whether NGG is used for GS on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Tue, 9 Jul 2019 06:43:56 +0000 (08:43 +0200)]
radv: add radv_pipeline_generate_hw_gs() helper
For legacy GS path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Tue, 9 Jul 2019 06:27:31 +0000 (08:27 +0200)]
radv: fix setting VGT_REUSE_OFF for TES on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Tue, 9 Jul 2019 06:27:30 +0000 (08:27 +0200)]
radv: fix computing the number of ES VGPRS for TES on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Tue, 9 Jul 2019 06:27:29 +0000 (08:27 +0200)]
radv: set max workgroup size to 128 for TES as NGG on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Tue, 9 Jul 2019 06:27:28 +0000 (08:27 +0200)]
radv: fix allocating USER SGPRs on GFX10
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Alejandro Piñeiro [Thu, 4 Jul 2019 12:11:27 +0000 (14:11 +0200)]
v3d: Early return with handle 0 when getting a bo on the simulator
Until now we were just asking entries on the bo hash table, and don't
worry if the handle was NULL, as we were just expecting to get a NULL
in return. It seems that now the hash table assert with some reserverd
pointers, included NULL. This commit just early returns with handle 0.
This change fixes several crashes on vk-gl-cts GLES tests when using
the v3d simulator, like:
KHR-GLES3.core.internalformat.copy_tex_image.*
Reviewed-by: Eric Anholt <eric@anholt.net>
Lionel Landwerlin [Mon, 8 Jul 2019 13:04:06 +0000 (16:04 +0300)]
vulkan/overlay: use a single macro to lookup objects
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Lionel Landwerlin [Mon, 8 Jul 2019 13:03:14 +0000 (16:03 +0300)]
vulkan/overlay: add queue present timing measurement
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:50:09 +0000 (23:50 +0200)]
radv/gfx10: Enable tess.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:44:32 +0000 (23:44 +0200)]
radv/gfx10: Add pipeline state support for tess.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:43:34 +0000 (23:43 +0200)]
radv/gfx10: Only set HW edge flags with gs & tess disabled.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:42:45 +0000 (23:42 +0200)]
radv/gfx10: Add tess eval ngg shader support.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:18:55 +0000 (23:18 +0200)]
radv: Use correct gs_out with tessellation.
We should use the primitives output by the TES in that case.
There is always a separate TES if there is no GS.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 8 Jul 2019 21:18:28 +0000 (23:18 +0200)]
radv/gfx10: Use correct count of max_offchip_buffers.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Tue, 9 Jul 2019 00:56:10 +0000 (02:56 +0200)]
radv/gfx10: Load global pointers in correct userdata registers for hs/gs.
Fixes: cfaad5e3cad "radv/gfx10: implement radv_emit_global_shader_pointers()"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Mon, 8 Jul 2019 00:59:46 +0000 (10:59 +1000)]
radeonsi: update function name in comment
This was missed in
2361558eb71d
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Timothy Arceri [Mon, 8 Jul 2019 00:52:45 +0000 (10:52 +1000)]
r600: remove query/apply_opaque_metadata callbacks
Theses seem to have been radeonsi specific callbacks that are no
longer needed now that these drivers no longer share this code
path.
These callbacks were removed from radeonsi in
c0d44fe0e91c.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Lionel Landwerlin [Mon, 8 Jul 2019 13:00:59 +0000 (16:00 +0300)]
vulkan/overlay: fix crash on freeing NULL command buffer
It is legal to call vkFreeCommandBuffers() on NULL command buffers.
This fix requires
eb41ce1b012f24 ("util/hash_table: Properly handle
the NULL key in hash_table_u64").
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4438188f492e1f ("vulkan/overlay: record stats in command buffers and accumulate on exec/submit")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Lionel Landwerlin [Mon, 8 Jul 2019 07:30:50 +0000 (10:30 +0300)]
vulkan: bump headers & registry to 1.1.114
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Dave Airlie [Mon, 8 Jul 2019 19:08:09 +0000 (05:08 +1000)]
radv: only use specialised 3D meta paths on GFX9.
GFX10 appears to act like GFX8 here.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Ian Romanick [Sat, 9 Mar 2019 07:50:29 +0000 (23:50 -0800)]
mesa: Set minimum possible GLSL version
Set the absolute minimum possible GLSL version. API_OPENGL_CORE can
mean an OpenGL 3.0 forward-compatible context, so that implies a minimum
possible version of 1.30. Otherwise, the minimum possible version 1.20.
Since Mesa unconditionally advertises GL_ARB_shading_language_100 and
GL_ARB_shader_objects, every driver has GLSL 1.20... even if they don't
advertise any extensions to enable any shader stages (e.g.,
GL_ARB_vertex_shader).
Converts about 2,500 piglit tests from crash to skip on NV18.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109524
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110955
Cc: mesa-stable@lists.freedesktop.org
Caio Marcelo de Oliveira Filho [Mon, 8 Jul 2019 17:36:59 +0000 (10:36 -0700)]
anv: Set maxComputeSharedMemorySize to 64k
This value is supported since gen7. See also
8514c75a26e "i965: Set
compute shader shared memory max to 64k".
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ian Romanick [Thu, 6 Jun 2019 18:00:40 +0000 (11:00 -0700)]
intel/vec4: Delete vec4_visitor::emit_lrp
Effectivley unused since
dd7135d55d5 ("intel/compiler: Use the flrp
lowering pass for all stages on Gen4 and Gen5"). I had intended to
remove this code as part of that series, but I forgot.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Fri, 7 Jun 2019 15:35:51 +0000 (08:35 -0700)]
nir: Allow nir_ssa_alu_instr_src_components to operate on non-SSA destinations
Existing users only operate on instructions with SSA destinations. Some
later patches add new direct calls and indirect calls (via existing NIR
functions) on instructions after going out of SSA. At the very least,
these calls are added by:
intel/vec4: Try to emit a VF source in try_immediate_source
intel/vec4: Try to emit a single load for multiple 3-src instruction operands
The first commit adds direct calls, and the second adds calls via
nir_alu_srcs_equal and nir_alu_srcs_negative_equal.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Mon, 10 Jun 2019 22:05:14 +0000 (15:05 -0700)]
nir: Handle swizzle in nir_alu_srcs_negative_equal
When I added this function, I was not sure if swizzles of immediate
values were a thing that occurred in NIR. The only existing user of
these functions is the partial redundancy elimination for compares.
Since comparison instructions are inherently scalar, this does not
occur.
However, a couple later patches, "nir/algebraic: Recognize
open-coded flrp(-1, 1, a) and flrp(1, -1, a)" combined with "intel/vec4:
Try to emit a single load for multiple 3-src instruction operands",
collaborate to create a few thousand instances.
No shader-db changes on any Intel platform.
v2: Handle the swizzle in nir_alu_srcs_negative_equal and leave
nir_const_value_negative_equal unchanged. Suggested by Jason.
v3: Correctly handle write masks. Add note (and assertion) that the
caller is responsible for various compatibility checks. The single
existing caller only calls this for combinations of scalar fadd and
float comparison instructions, so all of the requirements are met. A
later patch (intel/vec4: Try to emit a single load for multiple 3-src
instruction operands) will call this for sources of the same
instruction, so all of the requirements are met.
v4: Add unit test for nir_opt_comparison_pre that is fixed by this
commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Thu, 13 Jun 2019 21:06:55 +0000 (14:06 -0700)]
nir: nir_const_value_negative_equal compares one value at a time
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Thu, 13 Jun 2019 20:55:30 +0000 (13:55 -0700)]
nir: Port some const_value_negative_equal tests to alu_src_negative_equal
The next commit will make the existing tests irrelevant.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Ian Romanick [Thu, 13 Jun 2019 19:59:29 +0000 (12:59 -0700)]
nir: Pass fully qualified type to nir_const_value_negative_equal
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Thu, 13 Jun 2019 21:19:11 +0000 (14:19 -0700)]
nir: Use nir_src_bit_size instead of alu1->dest.dest.ssa.bit_size
This is important because, for example nir_op_fne has
dest.dest.ssa.bit_size == 1, but the source operands can be 16-, 32-, or
64-bits. Fixing this helps partial redundancy elimination for compares
in a few more shaders.
v2: Add unit tests for nir_opt_comparison_pre that are fixed by this
commit.
All Intel platforms had similar results.
total instructions in shared programs:
17179408 ->
17179081 (<.01%)
instructions in affected programs: 43958 -> 43631 (-0.74%)
helped: 118
HURT: 2
helped stats (abs) min: 1 max: 5 x̄: 2.87 x̃: 2
helped stats (rel) min: 0.06% max: 4.12% x̄: 1.19% x̃: 0.81%
HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel) min: 5.83% max: 6.06% x̄: 5.94% x̃: 5.94%
95% mean confidence interval for instructions value: -3.08 -2.37
95% mean confidence interval for instructions %-change: -1.30% -0.85%
Instructions are helped.
total cycles in shared programs:
360959066 ->
360942386 (<.01%)
cycles in affected programs: 774274 -> 757594 (-2.15%)
helped: 111
HURT: 4
helped stats (abs) min: 1 max: 1591 x̄: 169.49 x̃: 36
helped stats (rel) min: <.01% max: 24.43% x̄: 8.86% x̃: 2.24%
HURT stats (abs) min: 1 max: 2068 x̄: 533.25 x̃: 32
HURT stats (rel) min: 0.02% max: 5.10% x̄: 3.06% x̃: 3.56%
95% mean confidence interval for cycles value: -200.61 -89.47
95% mean confidence interval for cycles %-change: -10.32% -6.58%
Cycles are helped.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: be1cc3552bc ("nir: Add nir_const_value_negative_equal")
Ian Romanick [Wed, 12 Jun 2019 20:19:25 +0000 (13:19 -0700)]
intel/vec4: Reswizzle VF immediates too
Previously, an instruction like
mul(8) vgrf29.xy:F, vgrf25.yxxx:F, [-1F, 1F, 0F, 0F]
would get rewritten as
mul(8) vgrf0.yz:F, vgrf25.yyxx:F, [-1F, 1F, 0F, 0F]
The latter does not produce the correct result. The VF immediate in the
second should be either [-1F, -1F, 1F, 1F] or [0F, -1F, 1F, 0F]. This
commit produces the former.
Fixes: 1ee1d8ab468 ("i965/vec4: Reswizzle sources when necessary.")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Mon, 17 Jun 2019 23:27:37 +0000 (16:27 -0700)]
nir: Add unit tests for nir_opt_comparison_pre
Each tests has a comment with the expected before and after NIR. The
tests don't actually check this. The tests only check whether or not
the optimization pass reported progress. I couldn't think of a robust,
future-proof way to check the before and after code.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Dongwon Kim [Thu, 27 Jun 2019 16:54:37 +0000 (09:54 -0700)]
anv: disable repacking for compression for applicable gen
set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register
if the gen attribute, 'disable_ccs_repack' is set.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Dongwon Kim [Thu, 27 Jun 2019 16:54:36 +0000 (09:54 -0700)]
iris: disable repacking for compression for applicable gen
set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register
if the gen attribute, 'disable_ccs_repack' is set.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Dongwon Kim [Thu, 27 Jun 2019 16:54:35 +0000 (09:54 -0700)]
i965: disable repacking for compression for applicable gen
set bit15 (Disable Repacking for Compression) of CACHE_MODE_0 register
if the gen attribute, 'disable_ccs_repack' is set.
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Dongwon Kim [Thu, 27 Jun 2019 16:54:34 +0000 (09:54 -0700)]
intel: add disable_ccs_repack to gen_device_info
add a new attribute, 'disable_ccs_repack' to gen_device info, which
indicates whether repacking of components in certain pixel formats
before compression needs to be disabled to keep the compatibility
with decompression capability of display controller (gen11+)
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Dongwon Kim [Thu, 27 Jun 2019 16:54:33 +0000 (09:54 -0700)]
intel/genxml: correct bit fields in CACHE_MODE_0 reg for gen11
correct bit fields information of CACHE_MODE_0 reg in current gen11.xml
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Caio Marcelo de Oliveira Filho [Wed, 3 Jul 2019 19:10:43 +0000 (12:10 -0700)]
nir: print ptr_stride for deref_casts
Reviewed-by: Dave Airlie <airlied@redhat.com>
Caio Marcelo de Oliveira Filho [Sat, 8 Jun 2019 00:37:38 +0000 (17:37 -0700)]
anv: Advertise VK_EXT_shader_demote_to_helper_invocation
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Sat, 8 Jun 2019 06:08:04 +0000 (23:08 -0700)]
spirv: Implement SPV_EXT_demote_to_helper_invocation
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Fri, 7 Jun 2019 23:41:04 +0000 (16:41 -0700)]
spirv: Update the headers from latest Khronos master
This corresponds to
29c11140baaf9f7fdaa39a583672c556bf1795a1 in
https://github.com/KhronosGroup/SPIRV-Headers.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Sat, 8 Jun 2019 06:06:27 +0000 (23:06 -0700)]
intel/fs: Implement "demote to helper invocation"
The "demote" intrinsic works like "discard" but don't change the
control flow, allowing derivative operations to work. This is the
semantics of D3D discard.
The "is_helper_invocation" intrinsic will return true for helper
invocations -- both the ones that started as helpers and the ones that
where demoted. This is needed to avoid changing the behavior of
gl_HelperInvocation which is an input (so not expected to change
during shader execution).
v2: Emit the discard jump and comment why it is safe. (Jason)
Rework the is_helper_invocation() that was stomping f0.1. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Sat, 8 Jun 2019 00:29:05 +0000 (17:29 -0700)]
nir: Add demote and is_helper_invocation intrinsics
From SPV_EXT_demote_to_helper_invocation. Demote will be implemented
as a variant of discard, so mark uses_discard if it is used.
v2: Add CAN_ELIMINATE flag to the new intrinsic. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Samuel Pitoiset [Mon, 8 Jul 2019 11:45:08 +0000 (13:45 +0200)]
radv: do not emit VGT_FLUSH on GFX10
We don't need it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Fri, 17 May 2019 14:31:17 +0000 (16:31 +0200)]
ac/nir: Remove now-unused interp_deref handling
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Tue, 14 May 2019 11:29:52 +0000 (13:29 +0200)]
radeonsi/nir: Use NIR barycentric intrinsics
This is simpler than radv, since the driver_location is already assigned
for us.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Mon, 13 May 2019 14:28:58 +0000 (16:28 +0200)]
radeonsi/nir: Delete unreachable code
We always get gl_FragCoord as a system value, not a varying, so this is
never hit. We already set PIXEL_CENTER_INTEGER elsewhere.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Mon, 27 May 2019 15:48:42 +0000 (17:48 +0200)]
compiler: Add color system value
This is nice to have with radeonsi, where color varyings are handled
specially to avoid recompiles.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Fri, 10 May 2019 08:44:20 +0000 (10:44 +0200)]
radv: Use NIR barycentric intrinsics
We have to add a few lowering to deal with things that used to be dealt
with inline when creating inputs. We also move the code that fills out
the radv_shader_variant_info struct for linking purposes to
radv_shader.c, as it's no longer tied to the NIR->LLVM lowering.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Mon, 13 May 2019 08:55:07 +0000 (10:55 +0200)]
ac/nir: Implement barycentric intrinsics
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Tue, 14 May 2019 10:10:11 +0000 (12:10 +0200)]
intel/nir: Extract add_const_offset_to_base
Pretty much every driver using nir_lower_io_to_temporaries followed by
nir_lower_io is going to want this. In particular, radv and radeonsi in
the next commits.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Wed, 15 May 2019 16:48:25 +0000 (18:48 +0200)]
nir/lower_io_to_temporaries: Handle interpolation intrinsics
These weren't properly supported. This does pretty much the same thing
that the radv code did.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Wed, 15 May 2019 10:49:29 +0000 (12:49 +0200)]
nir: Avoid coalescing vars created by lower_io_to_temporaries
Right now nir_copy_prop_vars is effectively undoing
nir_lower_io_to_temporaries for inputs by propagating the original
variable through the copy created in lower_io_to_temporaries. A
theoretical variable coalescing pass would have the same issue with
output variables, although that doesn't exist yet. To fix this, add a
new bit to nir_variable, and disable copy propagation when it's set.
This doesn't seem to affect any drivers now, probably since since no one
uses lower_io_to_temporaries for inputs as well as copy_prop_vars, but
it will fix radv once we flip on lower_io_to_temporaries for fs inputs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Fri, 17 May 2019 12:56:45 +0000 (14:56 +0200)]
nir: Return correct size in nir_assign_io_var_locations()
It was double-counting cases where multiple variables were assigned to
the same slot, and not handling the case where the last variable is a
compact variable.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Tue, 14 May 2019 12:08:46 +0000 (14:08 +0200)]
nir: Handle compact variables when assigning i/o locations
These are used in Vulkan for clip/cull distances, instead of the GLSL
lowering when the clip/cull arrays are shared.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Fri, 10 May 2019 08:18:12 +0000 (10:18 +0200)]
nir: Move st_nir_assign_var_locations() to common code
It isn't really doing anything Gallium-specific, and it's needed for
handling component packing, overlapping, etc.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Connor Abbott [Mon, 13 May 2019 13:39:54 +0000 (15:39 +0200)]
radv: Make FragCoord a sysval
load_fragcoord is already handled in common code for radeonsi, so we
don't need to do anything to handle it. However, there were some passes
creating NIR with the varying, so we switch them over to the sysval. In
the case of nir_lower_input_attachments which is used by both radv and
anv, we add handling for both until intel switches to using a sysval.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Mon, 13 May 2019 13:32:26 +0000 (15:32 +0200)]
spirv: Add an option for making FragCoord a sysval
On AMD, FragCoord should be a sysval because it is handled separately
from all the other inputs. We were already doing this in radeonsi, but
we weren't doing it with radv. It'll be much more annoying to handle
VARYING_SLOT_POS in fragment shaders when we let NIR lower FS inputs for
us, so here we add an option so that radv can get it as a system value.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Daniel Schürmann [Fri, 5 Apr 2019 09:01:39 +0000 (11:01 +0200)]
radv: Lower input attachments in NIR.
v2 (Connor)
- Fix warning in release mode using MAYBE_UNUSED
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Daniel Schürmann [Fri, 5 Apr 2019 08:52:31 +0000 (10:52 +0200)]
radv: Implement nir_intrinsic_load_layer_id().
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Daniel Schürmann [Wed, 3 Apr 2019 15:29:20 +0000 (17:29 +0200)]
anv,nir: Move lower_input_attachments pass from ANV to NIR.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dave Airlie [Mon, 8 Jul 2019 06:53:27 +0000 (16:53 +1000)]
radv/gfx10: don't emit PFP packets on ME.
This was done for all previous GPUs.
This fixes Talos Principle launch hangs.
Fixes: 7e43022e8c8 (radv/gfx10: add gfx10_cs_emit_cache_flush)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Sun, 7 Jul 2019 17:32:29 +0000 (19:32 +0200)]
ac: select the GFX ring when halting waves with UMR on GFX10
GFX10 has two rings, so UMR want to know which one to halt.
Select the first one by default.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Sun, 7 Jul 2019 23:19:55 +0000 (01:19 +0200)]
radv/gfx10: Move NGG output handling outside of giant if-statement.
In merged shaders we put a big if around each shader, so both stages
can have a different number of threads. However, the NGG output code
still needs to run if the first shader is not executed.
This can happen when there are more gs threads than vs/es threads, or
when there are 0 es/vs threads (why? no clue).
Fixes: ee21bd7440c "radv/gfx10: implement NGG support (VS only)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 7 Jul 2019 21:03:33 +0000 (23:03 +0200)]
radv: Actually use VK formats for the format table.
No ETC2 or ASTC on navi so nothing to add.
Fixes: 3dc5ec5d167 "radv/gfx10: generate gfx10_format_table.h"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>