mesa.git
6 years agoradv: use a global BO list only for VK_EXT_descriptor_indexing
Samuel Pitoiset [Thu, 19 Apr 2018 11:48:33 +0000 (13:48 +0200)]
radv: use a global BO list only for VK_EXT_descriptor_indexing

Maintaining two different paths is annoying but this gets
rid of the performance regression introduced by the global
BO list.

We might find a better solution in the future, but for now
just keep two paths.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoRevert "radv: Don't store buffer references in the descriptor set."
Samuel Pitoiset [Thu, 19 Apr 2018 11:39:17 +0000 (13:39 +0200)]
Revert "radv: Don't store buffer references in the descriptor set."

In order to reduce a performance regression introduced by
4b13fe55a4 ("radv: Keep a global BO list for VkMemory."),
we are going to maintain two different paths.

One when VK_EXT_descriptor_indexing is enabled by the
application because we need to have a global BO list, and
one (the old one) when it's not enabled.
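
As a rough model of the two paths described above (plain Python, not
the radv code; the function and parameter names are hypothetical), the
choice of which buffer objects accompany a submission could look like:

```python
def bos_for_submit(bound_set_bos, global_bos, descriptor_indexing_enabled):
    """Pick the buffer objects (BOs) to pass along with a submission."""
    if descriptor_indexing_enabled:
        # Descriptors may change after binding, so tracking BOs per
        # descriptor set can miss some; include the global list as well.
        return set(bound_set_bos) | set(global_bos)
    # Old path: only the BOs referenced by the bound descriptor sets.
    return set(bound_set_bos)
```

The old path keeps the list short, which is presumably where the
performance difference comes from.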

With Talos on Polaris, the global BO list reduces performance
by 10% which is too much for me.

This reverts commit ab6cadd3ecc7fbdd9079808b407674e0b19c52f0.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoi965/fs: retype offset_reg to UD at load_ssbo
Jose Maria Casanova Crespo [Mon, 19 Mar 2018 14:03:17 +0000 (15:03 +0100)]
i965/fs: retype offset_reg to UD at load_ssbo

All operations with offset_reg at do_vector_read are done
with UD type. So copy propagation was not working through
the generated MOVs:

mov(8) vgrf9:UD, vgrf7:D

This change allows removing the MOV generated for reading the
first components for 16-bit and 64-bit ssbo reads with
non-constant offsets.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
6 years agoac/nir: use ac_build_image_opcode for image intrinsics
Nicolai Hähnle [Fri, 20 Apr 2018 07:30:07 +0000 (09:30 +0200)]
ac/nir: use ac_build_image_opcode for image intrinsics

So that we'll use the dimension-aware intrinsics in the future.

Acked-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi: generate image load/store/atomic ops using ac_build_image_opcode
Nicolai Hähnle [Fri, 20 Apr 2018 07:29:57 +0000 (09:29 +0200)]
radeonsi: generate image load/store/atomic ops using ac_build_image_opcode

In preparation of dimension-aware LLVM image intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
6 years agoamd/common: pass address components individually to ac_build_image_intrinsic
Nicolai Hähnle [Fri, 23 Mar 2018 10:20:24 +0000 (11:20 +0100)]
amd/common: pass address components individually to ac_build_image_intrinsic

This is in preparation for the new image intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
6 years agoamd/common: pass new enum ac_image_dim to ac_build_image_opcode
Nicolai Hähnle [Fri, 16 Feb 2018 13:21:56 +0000 (14:21 +0100)]
amd/common: pass new enum ac_image_dim to ac_build_image_opcode

This is in preparation for the new, dimension-aware LLVM image
intrinsics.

Acked-by: Marek Olšák <marek.olsak@amd.com>
6 years agoradeonsi/nir: fix crash in test involving the sample mask
Nicolai Hähnle [Wed, 4 Apr 2018 19:14:13 +0000 (21:14 +0200)]
radeonsi/nir: fix crash in test involving the sample mask

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoradeonsi/nir: set FS properties only when scanning a fragment shader
Nicolai Hähnle [Mon, 2 Apr 2018 11:20:02 +0000 (13:20 +0200)]
radeonsi/nir: set FS properties only when scanning a fragment shader

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoac/nir: fix atomic compare-and-swap
Nicolai Hähnle [Mon, 2 Apr 2018 12:12:50 +0000 (14:12 +0200)]
ac/nir: fix atomic compare-and-swap

The LLVM instruction returns { i32, i1 }, where the i1 indicates success.
We're only interested in the first part, which is the loaded value.
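
A small Python model of the idea (not the ac/nir code; cmpxchg here is
a hypothetical stand-in for the LLVM instruction): the operation yields
a pair, and only the first element is the loaded value.

```python
def cmpxchg(memory, index, expected, new):
    """Model of an LLVM-style cmpxchg: returns (loaded value, success flag)."""
    loaded = memory[index]
    success = loaded == expected
    if success:
        memory[index] = new
    return loaded, success


shared = [5]
# A compare-and-swap in the shader returns only the original value,
# so the success flag is dropped.
old_value, _ = cmpxchg(shared, 0, expected=5, new=9)
```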

Fixes dEQP-GLES31.functional.compute.shared_var.atomic.compswap.*

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoradeonsi: fix error paths of si_texture_transfer_map
Nicolai Hähnle [Tue, 16 Jan 2018 13:38:00 +0000 (14:38 +0100)]
radeonsi: fix error paths of si_texture_transfer_map

trans is zero-initialized, but trans->resource is set up immediately, so
it needs to be dereferenced.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoglsl: prevent spurious Valgrind errors when serializing NIR
Nicolai Hähnle [Fri, 23 Mar 2018 14:43:58 +0000 (15:43 +0100)]
glsl: prevent spurious Valgrind errors when serializing NIR

It looks as if the structure fields array is fully initialized below,
but in fact at least gcc in debug builds will not actually overwrite
the unused bits of bit fields.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoclover: Fix host access validation for sub-buffer creation
Aaron Watry [Sat, 7 Apr 2018 18:44:53 +0000 (13:44 -0500)]
clover: Fix host access validation for sub-buffer creation

  From CL 1.2 Section 5.2.1:
    CL_INVALID_VALUE if buffer was created with CL_MEM_HOST_WRITE_ONLY and
    flags specify CL_MEM_HOST_READ_ONLY , or if buffer was created with
    CL_MEM_HOST_READ_ONLY and flags specify CL_MEM_HOST_WRITE_ONLY , or if
    buffer was created with CL_MEM_HOST_NO_ACCESS and flags specify
    CL_MEM_HOST_READ_ONLY or CL_MEM_HOST_WRITE_ONLY .
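
The quoted rule can be sketched as a standalone check (Python, not the
clover implementation; the function name is hypothetical, and the flag
values are the ones defined in the OpenCL headers):

```python
# Flag values as defined in the OpenCL headers (cl.h).
CL_MEM_HOST_WRITE_ONLY = 1 << 7
CL_MEM_HOST_READ_ONLY = 1 << 8
CL_MEM_HOST_NO_ACCESS = 1 << 9


def sub_buffer_host_flags_valid(parent_flags, sub_flags):
    """Return False where the quoted rule requires CL_INVALID_VALUE."""
    if parent_flags & CL_MEM_HOST_WRITE_ONLY and sub_flags & CL_MEM_HOST_READ_ONLY:
        return False
    if parent_flags & CL_MEM_HOST_READ_ONLY and sub_flags & CL_MEM_HOST_WRITE_ONLY:
        return False
    if parent_flags & CL_MEM_HOST_NO_ACCESS and sub_flags & (
        CL_MEM_HOST_READ_ONLY | CL_MEM_HOST_WRITE_ONLY
    ):
        return False
    return True
```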

Fixes CL 1.2 CTS test/api get_buffer_info

v2: Correct host_access_flags check (Francisco)

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
6 years agonir: Offset vertex_id by first_vertex instead of base_vertex
Neil Roberts [Thu, 25 Jan 2018 18:15:43 +0000 (19:15 +0100)]
nir: Offset vertex_id by first_vertex instead of base_vertex

base_vertex will be zero for non-indexed calls and in that case we
need vertex_id to be offset by the ‘first’ parameter instead. That is
what we get with first_vertex. This is true for both GL and Vulkan.

The freedreno driver is also setting vertex_id_zero_based on
nir_options. In order to avoid breakage this patch switches the
relevant code to handle SYSTEM_VALUE_FIRST_VERTEX so that it can
retain the same behavior.

v2: change a3xx/fd3_emit.c and a4xx/fd4_emit.c from
SYSTEM_VALUE_BASE_VERTEX to SYSTEM_VALUE_FIRST_VERTEX (Kenneth).

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
6 years agospirv: Lower BaseVertex to FIRST_VERTEX instead of BASE_VERTEX
Neil Roberts [Thu, 25 Jan 2018 18:15:41 +0000 (19:15 +0100)]
spirv: Lower BaseVertex to FIRST_VERTEX instead of BASE_VERTEX

The base vertex in Vulkan is different from GL in that for non-indexed
primitives the value is taken from the firstVertex parameter instead
of being set to zero. This coincides with the new SYSTEM_VALUE_FIRST_VERTEX
instead of BASE_VERTEX.
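
To make the difference concrete, here is a minimal Python sketch (the
function and parameter names are illustrative, not taken from the
code): the two values only differ for non-indexed draws.

```python
def gl_base_vertex(indexed, first, basevertex):
    """OpenGL gl_BaseVertex: zero for non-indexed draws."""
    return basevertex if indexed else 0


def first_vertex(indexed, first, basevertex):
    """FIRST_VERTEX, matching Vulkan's BaseVertex: first vertex for non-indexed draws."""
    return basevertex if indexed else first
```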

v2 (idr): Add comment describing why SYSTEM_VALUE_FIRST_VERTEX is used
for SpvBuiltInBaseVertex.  Suggested by Jason.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
6 years agointel: Handle firstvertex in an identical way to BaseVertex
Antia Puentes [Thu, 25 Jan 2018 18:15:40 +0000 (19:15 +0100)]
intel: Handle firstvertex in an identical way to BaseVertex

Until we set gl_BaseVertex to zero for non-indexed draw calls
both have an identical value.

The Vertex Elements are laid out as follows:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <Draw ID, 0, 0, 0>

v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in
emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.

6 years agointel/compiler: Add a uses_firstvertex flag
Neil Roberts [Thu, 25 Jan 2018 18:15:39 +0000 (19:15 +0100)]
intel/compiler: Add a uses_firstvertex flag

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agocompiler: Add SYSTEM_VALUE_FIRST_VERTEX and intrinsics
Antia Puentes [Thu, 25 Jan 2018 18:15:38 +0000 (19:15 +0100)]
compiler: Add SYSTEM_VALUE_FIRST_VERTEX and intrinsics

This VS system value will contain the value passed as <basevertex> for
indexed draw calls or the value passed as <first> for non-indexed draw
calls. It can be used to calculate the gl_VertexID as
SYSTEM_VALUE_VERTEX_ID_ZERO_BASE plus SYSTEM_VALUE_FIRST_VERTEX.
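
The calculation can be sketched in Python as follows (a model only; the
helper names are hypothetical and no driver code is implied):

```python
def gl_vertex_id(vertex_id_zero_base, first_vertex):
    """gl_VertexID = VERTEX_ID_ZERO_BASE + FIRST_VERTEX."""
    return vertex_id_zero_base + first_vertex


def draw_arrays_ids(first, count):
    # Non-indexed draw: FIRST_VERTEX is <first>, the zero-based ID is i.
    return [gl_vertex_id(i, first) for i in range(count)]


def draw_elements_base_vertex_ids(indices, basevertex):
    # Indexed draw: FIRST_VERTEX is <basevertex>, the zero-based ID is
    # the value fetched from the element array.
    return [gl_vertex_id(index, basevertex) for index in indices]
```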

From the OpenGL 4.6 spec, 10.4 "Drawing Commands Using Vertex Arrays":

-  Page 352:
"The index of any element transferred to the GL by DrawArraysOneInstance
is referred to as its vertex ID, and may be read by a vertex shader as
gl_VertexID.  The vertex ID of the ith element transferred is first +
i."

- Page 355:
"The index of any element transferred to the GL by
DrawElementsOneInstance is referred to as its vertex ID, and may be read
by a vertex shader as gl_VertexID.  The vertex ID of the ith element
transferred is the sum of basevertex and the value stored in the
currently bound element array buffer at offset indices + i."

Currently the gl_VertexID calculation uses SYSTEM_VALUE_BASE_VERTEX but
this will have to change when the value of gl_BaseVertex is
fixed. Currently its value is broken for non-indexed draw calls because
it must be zero but we are setting it to <first>.

v2: use SYSTEM_VALUE_FIRST_VERTEX as name for the value, instead of
SYSTEM_VALUE_BASE_VERTEX_ID (Kenneth).

v3 (idr): Rebase on Rob Clark converting nir_intrinsics.h to be
generated.  Reformat commit message to 72 columns.

Reviewed-by: Neil Roberts <nroberts@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agomeson: Build st_tests_common with gtest
Mike Lothian [Thu, 19 Apr 2018 09:02:39 +0000 (10:02 +0100)]
meson: Build st_tests_common with gtest

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106131
Fixes: 34cb4d0ebc1 ("meson: build tests for gallium mesa state tracker")
Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
6 years agoradv: Add Vega M support.
Bas Nieuwenhuizen [Thu, 19 Apr 2018 04:35:08 +0000 (06:35 +0200)]
radv: Add Vega M support.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Add bound checking workaround for dynamic buffers.
Bas Nieuwenhuizen [Thu, 19 Apr 2018 05:29:03 +0000 (07:29 +0200)]
radv: Add bound checking workaround for dynamic buffers.

I have seen a few applications and games handle the dynamic buffer
bounds incorrectly; this makes it easier to work around, e.g. for
debugging.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agosvga: Fix incorrect advertising of EGL_KHR_gl_colorspace
Thomas Hellstrom [Thu, 12 Apr 2018 12:41:47 +0000 (14:41 +0200)]
svga: Fix incorrect advertising of EGL_KHR_gl_colorspace

When advertising this extension, egl_dri2 uses the DRI2_RENDERER_QUERY
extension to query whether an sRGB format is supported. That extension will
query our driver with the BIND flag PIPE_BIND_RENDER_TARGET rather than
PIPE_BIND_DISPLAY_TARGET which is used when building the configs.
We only return the correct value for PIPE_BIND_DISPLAY_TARGET.

The inconsistency causes EGL to crash at surface initialization if sRGB is
not supported. Fix this by supporting both bind flags.

Testing done:
piglit egl_gl_colorspace srgb

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
6 years agoswr: Fix include for createPromoteMemoryToRegisterPass
Mike Lothian [Wed, 4 Apr 2018 08:22:54 +0000 (09:22 +0100)]
swr: Fix include for createPromoteMemoryToRegisterPass

Include llvm/Transforms/Utils.h with the newest LLVM 7

v2: Include with " " rather than < > (Vinson Lee)

v3: Use LLVM_VERSION_MAJOR rather than HAVE_LLVM (George Kyriazis)

Signed-off-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: George Kyriazis <george.kyriazis@intel.com>
6 years agoradv: enable DCC for MSAA 2x textures on VI under an option
Samuel Pitoiset [Tue, 17 Apr 2018 14:05:18 +0000 (16:05 +0200)]
radv: enable DCC for MSAA 2x textures on VI under an option

This can be enabled with RADV_PERFTEST=dccmsaa.

DCC for MSAA textures is actually not that easy to implement. It
looks like there are some corner cases. I will improve support
incrementally.

Vega support, as well as Polaris improvements, will be added later.

No CTS changes on Polaris using RADV_DEBUG=zerovram and
RADV_PERFTEST=dccmsaa.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: decompress DCC for multisampled source images before resolving
Samuel Pitoiset [Tue, 17 Apr 2018 14:05:17 +0000 (16:05 +0200)]
radv: decompress DCC for multisampled source images before resolving

Multisampled source images (i.e. color attachments) can now be
DCC compressed, so the driver needs to perform a DCC decompression
pass before resolving.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: add a workaround for fast clears with DCC and MSAA textures
Samuel Pitoiset [Tue, 17 Apr 2018 14:05:16 +0000 (16:05 +0200)]
radv: add a workaround for fast clears with DCC and MSAA textures

This should be fixed at some point in order to improve
performance.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: allocate CMASK for DCC fast clear with MSAA
Samuel Pitoiset [Tue, 17 Apr 2018 14:05:15 +0000 (16:05 +0200)]
radv: allocate CMASK for DCC fast clear with MSAA

CMASK is required because it should be cleared to
0xCCCCCCCC for MSAA textures.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: implement fast color clear for DCC with MSAA
Samuel Pitoiset [Tue, 17 Apr 2018 14:05:14 +0000 (16:05 +0200)]
radv: implement fast color clear for DCC with MSAA

When DCC is enabled with MSAA textures, CMASK should be
cleared to 0xCCCCCCCC.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: make sure to sync after resolving using the compute path
Samuel Pitoiset [Tue, 17 Apr 2018 13:08:11 +0000 (15:08 +0200)]
radv: make sure to sync after resolving using the compute path

This fixes some random CTS failures:

dEQP-VK.renderpass.multisample.*.

Performing a fast-clear eliminate is still useless, but it
seems that we need to sync.

Found while running CTS with RADV_DEBUG=zerovram.

Fixes: 56a171a499c ("radv: don't fast-clear eliminate after resolving a subpass with compute")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: dump the SHA1 of SPIRV in the hang report
Samuel Pitoiset [Wed, 18 Apr 2018 16:53:44 +0000 (18:53 +0200)]
radv: dump the SHA1 of SPIRV in the hang report

Might be useful for debugging purposes, especially when we
want to replace a shader on the fly.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: Enable VK_EXT_descriptor_indexing.
Bas Nieuwenhuizen [Wed, 11 Apr 2018 17:08:30 +0000 (19:08 +0200)]
radv: Enable VK_EXT_descriptor_indexing.

This adds everything except non-uniform indexing, which needs a bit
more work and testing.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agospirv: Add support for runtime descriptor array cap.
Bas Nieuwenhuizen [Wed, 11 Apr 2018 23:36:22 +0000 (01:36 +0200)]
spirv: Add support for runtime descriptor array cap.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agospirv: Add support for VK_EXT_descriptor_indexing uniform indexing caps.
Bas Nieuwenhuizen [Wed, 11 Apr 2018 23:34:29 +0000 (01:34 +0200)]
spirv: Add support for VK_EXT_descriptor_indexing uniform indexing caps.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Support allocating variable size descriptor sets.
Bas Nieuwenhuizen [Mon, 9 Apr 2018 23:06:47 +0000 (01:06 +0200)]
radv: Support allocating variable size descriptor sets.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Add support for variable descriptor set layouts.
Bas Nieuwenhuizen [Mon, 9 Apr 2018 22:00:22 +0000 (00:00 +0200)]
radv: Add support for variable descriptor set layouts.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Fix GetDescriptorSetLayoutSupport.
Bas Nieuwenhuizen [Mon, 9 Apr 2018 21:36:19 +0000 (23:36 +0200)]
radv: Fix GetDescriptorSetLayoutSupport.

The continue means we do alignment differently than during creation,
making the buffer smaller than expected.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Use sorted bindings for set layout creation.
Bas Nieuwenhuizen [Mon, 9 Apr 2018 21:16:55 +0000 (23:16 +0200)]
radv: Use sorted bindings for set layout creation.

Previously we did not care about having the set storage in order,
but for variable descriptor count we want the highest binding
at the end of the storage.
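
A minimal Python sketch of the idea (not the radv layout code; the
function name and the (binding, size) tuples are assumptions made for
illustration): sorting by binding number before assigning offsets puts
the highest binding last, where it can grow.

```python
def layout_offsets(bindings):
    """bindings: list of (binding_number, size_in_bytes) in arbitrary order.

    Returns a dict of binding_number -> offset and the total size.
    """
    offsets = {}
    offset = 0
    for number, size in sorted(bindings):
        offsets[number] = offset
        offset += size
    return offsets, offset
```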

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Don't store buffer references in the descriptor set.
Bas Nieuwenhuizen [Mon, 9 Apr 2018 11:02:14 +0000 (13:02 +0200)]
radv: Don't store buffer references in the descriptor set.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoradv: Keep a global BO list for VkMemory.
Bas Nieuwenhuizen [Mon, 9 Apr 2018 10:46:49 +0000 (12:46 +0200)]
radv: Keep a global BO list for VkMemory.

With update after bind we can't attach BOs to the command buffer
from the descriptor set anymore, so we have to have a global BO
list.

I am somewhat surprised this works really well even though we have
implicit synchronization in the WSI based on the bo list associations
and with the new behavior every command buffer is associated with
every swapchain image. But I could not find slowdowns in games because
of it.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agospirv: Update spirv.h to 12f8de9f04327336b699b1b80aa390ae7f9ddbf4
Bas Nieuwenhuizen [Sun, 8 Apr 2018 11:03:06 +0000 (13:03 +0200)]
spirv: Update spirv.h to 12f8de9f04327336b699b1b80aa390ae7f9ddbf4

Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
6 years agoi965: Fix shadow batches to be the same size as the real BO.
Kenneth Graunke [Fri, 13 Apr 2018 18:48:06 +0000 (11:48 -0700)]
i965: Fix shadow batches to be the same size as the real BO.

brw_bo_alloc may round up our allocation size to the next bucket size.
In this case, we would malloc a shadow buffer that was the original
intended size, but use bo->size (the larger size) for all of our checks.

This could cause us to run off the end of the shadow buffer.
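
The mismatch can be modeled in a few lines of Python (a sketch only;
the bucket sizes and function names are hypothetical and do not
reflect what brw_bo_alloc actually uses):

```python
def round_up_to_bucket(size, buckets=(4096, 8192, 16384, 32768)):
    """Hypothetical stand-in for the allocator's bucket rounding."""
    for bucket in buckets:
        if size <= bucket:
            return bucket
    return size


def alloc_batch(requested_size):
    bo_size = round_up_to_bucket(requested_size)
    # Size the shadow copy from the BO's real size, not the requested
    # size, so checks against bo_size stay within the shadow buffer.
    shadow = bytearray(bo_size)
    return bo_size, shadow
```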

v2: Actually use the new BO size (caught by Lionel)

Reported-by: James Xiong <james.xiong@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c7dcee58b5fe183e1653c13bff6a212f0d157b29 (i965: Avoid problems from referencing orphaned BOs after growing.)
6 years agoglsl_to_tgsi: try harder to lower unsupported ir_binop_vector_extract
Marek Olšák [Fri, 13 Apr 2018 19:18:26 +0000 (15:18 -0400)]
glsl_to_tgsi: try harder to lower unsupported ir_binop_vector_extract

This fixes some piglits.

Cc: 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoradeon/vce: disable vce dual pipe on VegaM
Leo Liu [Wed, 22 Nov 2017 18:31:53 +0000 (13:31 -0500)]
radeon/vce: disable vce dual pipe on VegaM

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoradeonsi: add support for VegaM
Marek Olšák [Mon, 27 Feb 2017 22:28:07 +0000 (23:28 +0100)]
radeonsi: add support for VegaM

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoamd/addrlib: add support for VegaM
Marek Olšák [Tue, 7 Nov 2017 01:00:03 +0000 (02:00 +0100)]
amd/addrlib: add support for VegaM

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agoradeonsi/gfx9: fix a hang with an empty first IB
Marek Olšák [Tue, 17 Apr 2018 19:28:04 +0000 (15:28 -0400)]
radeonsi/gfx9: fix a hang with an empty first IB

This packet causes the no-op IB detection to fail, so the IB is always
submitted. Also fix the no-op IB detection by moving the begin call.

Cc: 18.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
6 years agomeson: build graw tests
Dylan Baker [Thu, 5 Apr 2018 21:39:13 +0000 (14:39 -0700)]
meson: build graw tests

This only enables the null and xlib target, so no windows support yet.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomeson: build tests for gallium mesa state tracker
Dylan Baker [Tue, 6 Feb 2018 23:46:25 +0000 (15:46 -0800)]
meson: build tests for gallium mesa state tracker

v2: - Fix typo

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomeson: build gallium unit tests
Dylan Baker [Thu, 11 Jan 2018 00:13:52 +0000 (16:13 -0800)]
meson: build gallium unit tests

v2: - gate unit tests on swrast being enabled (Eric A)
v3: - rebase on libtrace being merged with gallium auxiliary

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v2)
6 years agomeson: Build gallium trivial tests
Dylan Baker [Thu, 11 Jan 2018 00:07:11 +0000 (16:07 -0800)]
meson: Build gallium trivial tests

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomeson: Remove TODO about mesa/main tests
Dylan Baker [Wed, 10 Jan 2018 23:18:54 +0000 (15:18 -0800)]
meson: Remove TODO about mesa/main tests

They're already done.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agomeson: enable glcpp test
Dylan Baker [Thu, 11 Jan 2018 22:41:42 +0000 (14:41 -0800)]
meson: enable glcpp test

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agoglcpp/tests: Convert shell scripts to a python script
Dylan Baker [Tue, 9 Jan 2018 23:26:39 +0000 (15:26 -0800)]
glcpp/tests: Convert shell scripts to a python script

This ports glcpp-test.sh and glcpp-test-cr-lf.sh to a python script that
accepts arguments for each line ending type. This should allow for
better reporting to users.
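
As an illustration of what running a test once per line ending type
involves (a sketch, not the actual script; the function name is
hypothetical), the input can be rewritten before it is fed to glcpp:

```python
def with_line_endings(text, ending):
    """Rewrite Unix newlines in text with the requested line ending."""
    return text.replace("\n", ending)
```

For example, with_line_endings(source, "\r\n") would produce the CR-LF
variant of a test input.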

v2: - Use $PYTHON2 to be consistent with other tests in mesa

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agoglsl/tests: Remove unused compare_ir.py script
Dylan Baker [Thu, 11 Jan 2018 22:32:53 +0000 (14:32 -0800)]
glsl/tests: Remove unused compare_ir.py script

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
6 years agomeson: enable optimization-test
Dylan Baker [Thu, 11 Jan 2018 22:32:40 +0000 (14:32 -0800)]
meson: enable optimization-test

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
6 years agoglsl/tests: Convert optimization-test.sh to pure python
Dylan Baker [Sat, 9 Dec 2017 01:45:03 +0000 (17:45 -0800)]
glsl/tests: Convert optimization-test.sh to pure python

This patch converts optimization-test.sh to python; in the process it
removes external shell dependencies including diff. It replaces the
python script that generates shell scripts with a python library that
generates test cases and runs them using subprocess.
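
One way to drop the dependency on an external diff, shown here only as
a sketch (the helper name is hypothetical and this is not necessarily
how the test does it), is the standard library's difflib:

```python
import difflib


def diff_text(expected, actual):
    """Return a unified diff of two strings, or '' if they match."""
    return "".join(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile="expected",
            tofile="actual",
        )
    )
```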

v2: - use $PYTHON2 to be consistent with other tests in mesa

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
6 years agomeson: run glsl compiler warnings test
Dylan Baker [Sat, 9 Dec 2017 01:45:03 +0000 (17:45 -0800)]
meson: run glsl compiler warnings test

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agoglsl/tests: reimplement warnings-test in python
Dylan Baker [Sat, 9 Dec 2017 01:25:50 +0000 (17:25 -0800)]
glsl/tests: reimplement warnings-test in python

This reimplements the test in python with a shell script wrapper that
allows autotools to continue to run the test without realizing that
anything has changed.

Using python has two advantages. First, it's portable, so this test can
be run on windows as well as Linux since it just requires python, no
more diff, pwd or sh. Second, it's no longer tied to autotools
implementation details, like the environment variables $srcdir and
$abs_builddir (though the autotools shell wrapper still uses those),
which makes it possible to run the test in meson.

v2: - Use $PYTHON2 in script to be consistent with other scripts in mesa

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agoswr/rast: Fix VGATHERPD lowering
George Kyriazis [Tue, 10 Apr 2018 22:49:19 +0000 (17:49 -0500)]
swr/rast: Fix VGATHERPD lowering

Also implement VHSUBPS in x86 lowering pass.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Replace x86 VMOVMSK with llvm-only implementation
George Kyriazis [Tue, 10 Apr 2018 17:03:41 +0000 (12:03 -0500)]
swr/rast: Replace x86 VMOVMSK with llvm-only implementation

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Optimize late/bindless JIT of samplers
George Kyriazis [Tue, 10 Apr 2018 06:05:19 +0000 (01:05 -0500)]
swr/rast: Optimize late/bindless JIT of samplers

Add per-worker thread private data to all shader calls
Add per-worker sampler cache and jit context
Add late LoadTexel JIT support
Add per-worker-thread Sampler / LoadTexel JIT

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Implement VROUND intrinsic in x86 lowering pass
George Kyriazis [Mon, 9 Apr 2018 22:21:46 +0000 (17:21 -0500)]
swr/rast: Implement VROUND intrinsic in x86 lowering pass

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Refactor to improve code sharing.
George Kyriazis [Mon, 9 Apr 2018 18:35:43 +0000 (13:35 -0500)]
swr/rast: Refactor to improve code sharing.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: minimize codegen redundant work
George Kyriazis [Mon, 9 Apr 2018 17:51:14 +0000 (12:51 -0500)]
swr/rast: minimize codegen redundant work

Move filtering of redundant codegen operations into gen scripts themselves

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: double-pump in x86 lowering pass
George Kyriazis [Mon, 9 Apr 2018 16:47:37 +0000 (11:47 -0500)]
swr/rast: double-pump in x86 lowering pass

Add support for double-pumping a smaller SIMD width intrinsic.
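
Double-pumping can be pictured with a short Python model (a sketch;
simd8_add and double_pump are hypothetical names, and lists stand in
for SIMD registers): a 16-wide operation is built from two 8-wide ones.

```python
def simd8_add(a, b):
    """Hypothetical 8-wide operation standing in for a native intrinsic."""
    assert len(a) == 8 and len(b) == 8
    return [x + y for x, y in zip(a, b)]


def double_pump(op8, a, b):
    """Run an 8-wide op on each half of 16-wide inputs and join the results."""
    assert len(a) == 16 and len(b) == 16
    return op8(a[:8], b[:8]) + op8(a[8:], b[8:])
```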

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Fix 64bit float loads in x86 lowering pass
George Kyriazis [Fri, 6 Apr 2018 21:39:09 +0000 (16:39 -0500)]
swr/rast: Fix 64bit float loads in x86 lowering pass

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Add shader stats infrastructure (WIP)
George Kyriazis [Fri, 6 Apr 2018 20:48:00 +0000 (15:48 -0500)]
swr/rast: Add shader stats infrastructure (WIP)

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Type-check TemplateArgUnroller
George Kyriazis [Fri, 6 Apr 2018 20:03:09 +0000 (15:03 -0500)]
swr/rast: Type-check TemplateArgUnroller

Allows direct use of enum values in conversion to template args.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Add vgather to x86 lowering pass.
George Kyriazis [Fri, 6 Apr 2018 18:19:01 +0000 (13:19 -0500)]
swr/rast: Add vgather to x86 lowering pass.

Add support for generic VGATHERPD intrinsic in x86 lowering pass.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: fix comment
George Kyriazis [Fri, 6 Apr 2018 19:16:12 +0000 (14:16 -0500)]
swr/rast: fix comment

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: add cvt instructions in x86 lowering pass
George Kyriazis [Thu, 5 Apr 2018 22:51:02 +0000 (17:51 -0500)]
swr/rast: add cvt instructions in x86 lowering pass

Support generic VCVTPD2PS and VCVTPH2PS in x86 lowering pass.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Fix alloca usage in jitter
George Kyriazis [Thu, 5 Apr 2018 20:59:54 +0000 (15:59 -0500)]
swr/rast: Fix alloca usage in jitter

Fix issue where temporary allocas were getting hoisted to function entry
unnecessarily. We now explicitly mark temporary allocas and skip hoisting
during the hoist pass. Should reduce stack usage.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Change gfx pointers to gfxptr_t
George Kyriazis [Thu, 5 Apr 2018 17:08:15 +0000 (12:08 -0500)]
swr/rast: Change gfx pointers to gfxptr_t

Change the type of indices to gfxptr_t and make the related changes to
the fetch and mem builder code.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Fix byte offset for non-indexed draws
George Kyriazis [Tue, 10 Apr 2018 00:47:51 +0000 (19:47 -0500)]
swr/rast: Fix byte offset for non-indexed draws

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Add support for setting optimization level
George Kyriazis [Wed, 4 Apr 2018 22:34:54 +0000 (17:34 -0500)]
swr/rast: Add support for setting optimization level

for JIT compilation

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Adding translate call to builder_gfx_mem.
George Kyriazis [Thu, 29 Mar 2018 19:43:06 +0000 (14:43 -0500)]
swr/rast: Adding translate call to builder_gfx_mem.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Fix codegen for typedef types
George Kyriazis [Wed, 28 Mar 2018 19:43:09 +0000 (14:43 -0500)]
swr/rast: Fix codegen for typedef types

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr: add x86 lowering pass to fragment shader
George Kyriazis [Wed, 28 Mar 2018 19:31:20 +0000 (14:31 -0500)]
swr: add x86 lowering pass to fragment shader

Needed because some FP paths (namely stipple) use gather intrinsics
that now need to be lowered to x86.

v2: fix typo in commit message
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Enable generalized fetch jit
George Kyriazis [Fri, 23 Mar 2018 20:14:58 +0000 (15:14 -0500)]
swr/rast: Enable generalized fetch jit

Enable generalized fetch jit with 8 or 16 wide SIMD target. Still some
work needed to remove some simd8 double pumping for 16-wide target.

Also removed unused non-gather load vertices path.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Add builder_gfx_mem.{h|cpp}
George Kyriazis [Mon, 26 Mar 2018 18:29:04 +0000 (13:29 -0500)]
swr/rast: Add builder_gfx_mem.{h|cpp}

Abstract usage scenarios for memory accesses into builder_gfx_mem.
Builder_gfx_mem will convert gfxptr_t from 64-bit int to regular pointer
types for use by builder_mem.

v2: reworded commit message; renamed enum more appropriately
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Lower VGATHERPS and VGATHERPS_16 to x86.
George Kyriazis [Thu, 22 Mar 2018 20:25:36 +0000 (15:25 -0500)]
swr/rast: Lower VGATHERPS and VGATHERPS_16 to x86.

Some more work to do before we can support simultaneous 8-wide and
16-wide and remove the VGATHERPS_16 version.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Cleanup of JitManager convenience types
George Kyriazis [Wed, 21 Mar 2018 18:23:23 +0000 (13:23 -0500)]
swr/rast: Cleanup of JitManager convenience types

Small cleanup. Remove convenience types from JitManager and standardize
on the Builder's convenience types.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Lower PERMD and PERMPS to x86.
George Kyriazis [Tue, 20 Mar 2018 23:13:35 +0000 (18:13 -0500)]
swr/rast: Lower PERMD and PERMPS to x86.

Add support for providing an emulation callback function for arch/width
combinations that don't map cleanly to an x86 intrinsic.
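
The dispatch this describes might be modeled as follows (Python sketch
with hypothetical names; the table contents are an assumption for
illustration, not the pass's actual mapping):

```python
def permute_emulated(values, indices):
    """Generic fallback: gather lanes one at a time."""
    return [values[i] for i in indices]


# Hypothetical table: (arch, width) -> native lowering. Combinations
# that are missing fall back to the emulation callback.
NATIVE_PERMUTE = {
    ("AVX2", 8): lambda values, indices: [values[i] for i in indices],
}


def lower_permute(arch, width, values, indices):
    native = NATIVE_PERMUTE.get((arch, width))
    if native is not None:
        return "native", native(values, indices)
    return "emulated", permute_emulated(values, indices)
```

Either way the result is the same; only the path taken differs.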

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Start refactoring of builder/packetizer.
George Kyriazis [Tue, 20 Mar 2018 00:05:38 +0000 (19:05 -0500)]
swr/rast: Start refactoring of builder/packetizer.

Move x86 intrinsic lowering to a separate pass. Builder now instantiates
generic intrinsics for features not supported by llvm. The separate x86
lowering pass is responsible for lowering to valid x86 for the target
SIMD architecture. Currently it's a port of existing code to get it
up and running quickly. Will eventually support optimized x86 for AVX,
AVX2 and AVX512.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Simplify #define usage in gen source file
George Kyriazis [Mon, 19 Mar 2018 22:46:13 +0000 (17:46 -0500)]
swr/rast: Simplify #define usage in gen source file

Removed preprocessor defines from structures passed to LLVM jitted code.

The python scripts do not understand preprocessor defines and ignore
them. So fields that are compiled out by a preprocessor define are
still accounted for by the LLVM script, because it doesn't know what
the defines are set to. The sanitize defines for open source are fine
in that they're safely used.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
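
The layout mismatch described in the commit above can be illustrated with a small sketch. The struct and field names here are hypothetical and are not taken from the SWR sources; the point is only that a script which ignores a define sees a different field offset than the C++ compiler does.

```cpp
#include <cstddef>
#include <cstdint>

// Layout the C++ compiler produces when a (hypothetical) define is off
// and the guarded field is compiled out.
struct StateAsCompiled
{
    uint32_t a;
    uint32_t b;
};

// Layout a define-unaware generator script would assume, because it
// counts the guarded field regardless of how the define is set.
struct StateAsSeenByScript
{
    uint32_t a;
    uint32_t guardedField;
    uint32_t b;
};

// Jitted code built from the script's view would read `b` at the wrong offset.
constexpr std::size_t kCompiledOffsetB = offsetof(StateAsCompiled, b);
constexpr std::size_t kScriptOffsetB   = offsetof(StateAsSeenByScript, b);
```

Removing the defines from structures shared with jitted code, as the commit does, makes both views agree.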
6 years agoswr/rast: Move CallPrint() to a separate file
George Kyriazis [Fri, 16 Mar 2018 15:26:25 +0000 (10:26 -0500)]
swr/rast: Move CallPrint() to a separate file

Needed for jit code debugging.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Fix name mangling for LLVM pow intrinsic
George Kyriazis [Thu, 15 Mar 2018 22:49:54 +0000 (17:49 -0500)]
swr/rast: Fix name mangling for LLVM pow intrinsic

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Add some archrast counters
George Kyriazis [Thu, 15 Mar 2018 20:58:10 +0000 (15:58 -0500)]
swr/rast: Add some archrast counters

Hook up archrast counters for shader stats: instructions executed.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Code cleanup
George Kyriazis [Thu, 15 Mar 2018 18:43:08 +0000 (13:43 -0500)]
swr/rast: Code cleanup

Removing some code that doesn't seem to do anything meaningful.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Add "Num Instructions Executed" stats intrinsic.
George Kyriazis [Thu, 15 Mar 2018 17:49:51 +0000 (12:49 -0500)]
swr/rast: Add "Num Instructions Executed" stats intrinsic.

Added a SWR_SHADER_STATS structure which is passed to each shader. The
stats pass will instrument the shader to populate this.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Add MEM_ADD helper function to Builder.
George Kyriazis [Thu, 15 Mar 2018 17:08:00 +0000 (12:08 -0500)]
swr/rast: Add MEM_ADD helper function to Builder.

mem[offset] += value

This function will be heavily used by all stats intrinsics.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
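
The `mem[offset] += value` semantics of the MEM_ADD commit above can be sketched in scalar form. `MemAdd` is a hypothetical stand-in: the Builder helper emits IR rather than running on the host, and the commit message does not say whether the update is atomic, so this sketch assumes a plain load, add, store sequence.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of a MEM_ADD-style helper.
inline void MemAdd(uint64_t* mem, std::size_t offset, uint64_t value)
{
    uint64_t old = mem[offset]; // load the current counter
    mem[offset]  = old + value; // add and store back (not atomic in this sketch)
}
```

A stats intrinsic could then bump a counter slot with a call such as `MemAdd(stats, slot, count)`, where `stats`, `slot` and `count` are again illustrative names.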
6 years agoswr/rast: Permute work for simd16
George Kyriazis [Wed, 14 Mar 2018 18:38:18 +0000 (13:38 -0500)]
swr/rast: Permute work for simd16

Fix slow permutes in PA tri lists under SIMD16 emulation on AVX

Added missing permute (interlane, immediate) to SIMDLIB

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: WIP builder rewrite (2)
George Kyriazis [Wed, 14 Mar 2018 17:29:04 +0000 (12:29 -0500)]
swr/rast: WIP builder rewrite (2)

Finish up the remaining explicit intrinsic uses. At this point all
explicit Intrinsic::getDeclaration() usage has been replaced with macros
auto-generated by gen_llvm_ir_macros.py. Going forward, make sure to
only use the intrinsics here, adding new ones as needed.

Next step is to remove all references to x86 intrinsics to keep the
builder target-independent. Any x86 lowering will be handled by a
separate pass.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Add autogen of helper llvm intrinsics.
George Kyriazis [Tue, 13 Mar 2018 18:46:41 +0000 (13:46 -0500)]
swr/rast: Add autogen of helper llvm intrinsics.

Replace sqrt, maskload, fp min/max, cttz, ctlz with llvm equivalent.
Replace AVX maskedstore intrinsic with LLVM intrinsic. Add helper llvm
macros for stacksave, stackrestore, popcnt.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: WIP builder rewrite.
George Kyriazis [Mon, 12 Mar 2018 18:18:56 +0000 (13:18 -0500)]
swr/rast: WIP builder rewrite.

Start removing avx2 macros for functionality that exists in llvm.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: LLVM 6 fix
George Kyriazis [Tue, 13 Mar 2018 01:34:19 +0000 (20:34 -0500)]
swr/rast: LLVM 6 fix

Fix for getting the masked gather intrinsic (also compatible with LLVM 4).

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Changes to allow jitter to compile with LLVM5
George Kyriazis [Sat, 10 Mar 2018 06:04:11 +0000 (00:04 -0600)]
swr/rast: Changes to allow jitter to compile with LLVM5

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Add some archrast stats
George Kyriazis [Wed, 7 Mar 2018 01:32:53 +0000 (19:32 -0600)]
swr/rast: Add some archrast stats

Add stats for degenerate and backfacing primitive counts

Wire archrast stats for alpha blend and alpha test:
pass a value to the jitter; upon return, have an archrast event increment a value.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Silence some unused variable warnings
George Kyriazis [Fri, 9 Mar 2018 17:37:57 +0000 (11:37 -0600)]
swr/rast: Silence some unused variable warnings

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
6 years agoswr/rast: Add debug type info for i128
George Kyriazis [Thu, 8 Mar 2018 22:19:36 +0000 (16:19 -0600)]
swr/rast: Add debug type info for i128

Help support debug info in 16 wide shaders.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>