mesa.git
4 years agoanv: implement VK_KHR_timeline_semaphore
Lionel Landwerlin [Tue, 16 Oct 2018 22:44:31 +0000 (17:44 -0500)]
anv: implement VK_KHR_timeline_semaphore

v2: Fix inverted condition in vkGetPhysicalDeviceExternalSemaphoreProperties()

v3: Add anv_timeline_* helpers (Jason)

v4: Avoid variable shadowing (Jason)
    Split timeline wait/signal device operations (Jason/Lionel)

v5: s/point/signal_value/ (Jason)
    Drop piece of drm-syncobj timeline code (Jason)

v6: Add missing sync_fd semaphore signaling (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv: Plumb timeline semaphore signal/wait values through from the API
Jason Ekstrand [Tue, 16 Oct 2018 20:58:14 +0000 (15:58 -0500)]
anv: Plumb timeline semaphore signal/wait values through from the API

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agoanv/wsi: signal the semaphore in the acquireNextImage
Lionel Landwerlin [Thu, 11 Jul 2019 12:21:04 +0000 (15:21 +0300)]
anv/wsi: signal the semaphore in the acquireNextImage

We seem to have forgotten about the semaphore in the
acquireNextImageInfo.

v2: Signal semaphore/fence regardless of presentation status (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv: Lock around fetching sync file FDs from semaphores
Jason Ekstrand [Mon, 11 Nov 2019 16:58:44 +0000 (10:58 -0600)]
anv: Lock around fetching sync file FDs from semaphores

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agoanv: prepare the driver for delayed submissions
Lionel Landwerlin [Mon, 30 Sep 2019 09:30:20 +0000 (12:30 +0300)]
anv: prepare the driver for delayed submissions

Timeline semaphore introduce support for wait before signal behavior,
which means that it is now allowed to call vkQueueSubmit() with wait
semaphores not yet submitted for execution. Our kernel driver requires
all of the wait primitives to be created before calling the execbuf
ioctl. As a result, we must delay submissions in the userspace driver.
This change store the necessary information to be able to delay a
VkSubmitInfo submission to the kernel driver.

v2: Fold count++ into array access (Jason)
    Move queue list to another patch (Jason)

v3: Document cleanup of temporary semaphores (Jason)

v4: Track semaphores of SYNC_FD type that needs updating after delayed
    submission

v5: Don't forget to update sync_fd in signaled semaphores after
    submission (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv: refcount semaphores
Lionel Landwerlin [Sat, 26 Oct 2019 15:59:59 +0000 (18:59 +0300)]
anv: refcount semaphores

Delayed submissions required by timeline semaphores mean we need to be
able to update the sync fd backed semaphores in a delayed fashion.
This could mean a race between the application destroying the
semaphore and the submission code trying to update it with the new
sync fd.

This change prepares semaphores to be refcounted, we'll most likely
only take a reference for cases where we signal a sync fd semaphore.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv: prepare driver to report submission error through queues
Lionel Landwerlin [Fri, 23 Aug 2019 11:48:28 +0000 (13:48 +0200)]
anv: prepare driver to report submission error through queues

When we will submit to i915 from a submission thread, we won't be able
to directly report the error to the user (in particular through the
debug report callbacks). So prepare 2 paths to report errors device ->
notifying the user immediately, queue -> notifying the user the next
time an entry point is called.

In this change we still report directly for both paths, this will
change in the next commit.

v2: Split NULL batch parameter handling in
    anv_queue_submit_simple_batch() in a different commit

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv: allow NULL batch parameter to anv_queue_submit_simple_batch
Lionel Landwerlin [Fri, 23 Aug 2019 17:14:34 +0000 (20:14 +0300)]
anv: allow NULL batch parameter to anv_queue_submit_simple_batch

We can reuse device->trivial_batch_bo

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv: move queue init/finish to anv_queue.c
Lionel Landwerlin [Fri, 23 Aug 2019 10:30:42 +0000 (12:30 +0200)]
anv: move queue init/finish to anv_queue.c

Prepare the queue initialization to take on more responsabilities and
possibly fail.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv: expose timeout helpers outside of anv_queue.c
Lionel Landwerlin [Wed, 7 Aug 2019 13:46:45 +0000 (16:46 +0300)]
anv: expose timeout helpers outside of anv_queue.c

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv: detach batch emission allocation from device
Lionel Landwerlin [Thu, 1 Aug 2019 10:21:41 +0000 (13:21 +0300)]
anv: detach batch emission allocation from device

In the future we'll have 2 different allocations depending on whether
we're using threaded submission or not.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoanv: remove list items on batch fini
Lionel Landwerlin [Thu, 19 Sep 2019 22:24:53 +0000 (01:24 +0300)]
anv: remove list items on batch fini

This doesn't seem to fix anything because those destroy() calls happen
right before the command buffer object & its list of batch_bo is also
destroyed. Still looks a bit cleaner.

v2: Found a second occurence

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Fixes: 26ba0ad54d ("vk: Re-name command buffer implementation files")
Cc: <mesa-stable@lists.freedesktop.org>
4 years agoanv: invalidate file descriptor of semaphore sync fd at vkQueueSubmit
Lionel Landwerlin [Thu, 29 Aug 2019 11:54:12 +0000 (14:54 +0300)]
anv: invalidate file descriptor of semaphore sync fd at vkQueueSubmit

We always close the in_fence at the end the anv_cmd_buffer_execbuf()
so when we take it from the semaphore, let's not forget to invalidate
it.

Note that the code leaks the fence_in if we get any error before
reaching the close(). Let's fix that in another patch or better,
rewrite the whole thing!

v2: drop redundant fd = -1 (Jason)

v3: Update commit message (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoradv: fix radv_nir_get_max_workgroup_size when nir=NULL
Rhys Perry [Mon, 11 Nov 2019 11:16:31 +0000 (11:16 +0000)]
radv: fix radv_nir_get_max_workgroup_size when nir=NULL

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 84a1a2578 ('compiler: pack shader_info from 160 bytes to 96 bytes')
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agomesa: check framebuffer completeness only after state update
Lionel Landwerlin [Mon, 11 Nov 2019 10:32:50 +0000 (12:32 +0200)]
mesa: check framebuffer completeness only after state update

The change made in 88d665830f27 ("mesa: check draw buffer completeness
on glClearBufferfi/glClearBufferiv") correctly updated the state prior
to checking the framebuffer completeness on glClearBufferiv but not in
glClearBufferfi.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: 88d665830f27 ("mesa: check draw buffer completeness on glClearBufferfi/glClearBufferiv")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/2072

4 years agoglsl: Check earlier for MaxTextureImageUnits and MaxImageUniforms
Caio Marcelo de Oliveira Filho [Sat, 9 Nov 2019 06:21:10 +0000 (22:21 -0800)]
glsl: Check earlier for MaxTextureImageUnits and MaxImageUniforms

Currently the linker do all the work then check for the limits, which
means num_textures and num_images in shader_info may have to store more
than the limit.  This breaks down now since shader_info was packed and
doesn't expect to store larger invalid values.

To fix this, pull the check before we set the counts in shader_info.
Add necessary plumbing to make sure we bail once those errors are
found.

Fixes: 84a1a2578da ("compiler: pack shader_info from 160 bytes to 96 bytes")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agoglsl: Check earlier for MaxShaderStorageBlocks and MaxUniformBlocks
Caio Marcelo de Oliveira Filho [Sat, 9 Nov 2019 06:00:10 +0000 (22:00 -0800)]
glsl: Check earlier for MaxShaderStorageBlocks and MaxUniformBlocks

Currently the linker do all the work then check for the limits, which
means num_ssbos and num_ubos in shader_info may have to store more
than the limit.  This breaks down now since shader_info was packed and
doesn't expect to store larger invalid values.

To fix this, pull the check before we set the counts in shader_info.
One drawback of this approach is that for some cases we might not see
the collected errors from various stages, but bail as soon as a stage
breaks the limits.

Fixes: 84a1a2578da ("compiler: pack shader_info from 160 bytes to 96 bytes")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agoutil: Use ZSTD for shader cache if possible
Dylan Baker [Thu, 31 Oct 2019 20:26:00 +0000 (13:26 -0700)]
util: Use ZSTD for shader cache if possible

This allows ZSTD instead of ZLIB to be used for compressing the shader
cache.

On a 72 core system emulating skl with a full shader-db (with i965):
ZSTD:
    1915.10s user 229.27s system 5150% cpu 41.632 total (cold cache)
    225.40s user 10.87s system 3810% cpu 6.201 total (warm cache)
    154M (235M on disk)
ZLIB:
    2231.33s user 194.24s system 1899% cpu 2:07.72 total (cold cache)
    229.15s user 10.63s system 3906% cpu 6.139 total (warm cache)
    163M (244M on disk)

Tim Arceri sees (8 core ryzen and a full shader-db):
ZSTD:
    2505.22 user 40.50 system 3:18.73 elapsed 1280% CPU (cold cache)
    418.71 user 14.93 system 0:46.53 elapsed 931% CPU (warm cache)
    454.3 MB (681.7 MB on disk)
ZLIB:
    3069.83 user 40.02 system 4:20.13 elapsed 1195% CPU (cold cache)
    425.50 user 15.17 system 0:46.80 elapsed 941% CPU (warm cache)
    470.3 MB (701.4 MB on disk)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoegl: avoid local modifications for eglext.h Khronos standard header file
Laurent Carlier [Wed, 6 Nov 2019 15:04:50 +0000 (16:04 +0100)]
egl: avoid local modifications for eglext.h Khronos standard header file

Move differences in eglextchromium.h header file, then provide the same header than libglvnd-1.2
So program that omit to include eglextchromium.h will fail to build with both mesa and libglvnd headers.

Fixes: a0a8109f "include: add the definition of EGL_EXT_image_flush_external"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoegl: move #include of local headers out of Khronos headers
Eric Engestrom [Wed, 6 Nov 2019 19:53:28 +0000 (19:53 +0000)]
egl: move #include of local headers out of Khronos headers

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agointel/fs: Lower large local arrays to scratch
Jason Ekstrand [Sun, 4 Dec 2016 01:15:42 +0000 (17:15 -0800)]
intel/fs: Lower large local arrays to scratch

Shader-db results on Kaby Lake:

    total instructions in shared programs: 14929212 -> 14880028 (-0.33%)
    instructions in affected programs: 72428 -> 23244 (-67.91%)
    helped: 6
    HURT: 2
    helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624
    helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08%
    HURT stats (abs)   min: 1178 max: 1178 x̄: 1178.00 x̃: 1178
    HURT stats (rel)   min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97%
    95% mean confidence interval for instructions value: -11947.03 -348.97
    95% mean confidence interval for instructions %-change: -125.72% 202.37%
    Inconclusive result (%-change mean confidence interval includes 0).

    total cycles in shared programs: 368585300 -> 342557344 (-7.06%)
    cycles in affected programs: 28144921 -> 2116965 (-92.48%)
    helped: 6
    HURT: 2
    helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682
    helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28%
    HURT stats (abs)   min: 47778 max: 47798 x̄: 47788.00 x̃: 47788
    HURT stats (rel)   min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59%
    95% mean confidence interval for cycles value: -5900438.73 -606550.27
    95% mean confidence interval for cycles %-change: -140.79% 146.16%
    Inconclusive result (%-change mean confidence interval includes 0).

    total spills in shared programs: 9243 -> 8901 (-3.70%)
    spills in affected programs: 2718 -> 2376 (-12.58%)
    helped: 4
    HURT: 4

    total fills in shared programs: 21831 -> 10141 (-53.55%)
    fills in affected programs: 11804 -> 114 (-99.03%)
    helped: 6
    HURT: 2

    total sends in shared programs: 815912 -> 815912 (0.00%)
    sends in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    LOST:   1
    GAINED: 3

The helped shaders are all compute shaders in Aztec Ruins.  There is
also a compute shader in synmark2 OglCSDof that's helped but it doesn't
show up in above shader-db results because it went from SIMD8 to SIMD16.
That shader improves enough to yield an 15-20% performance boost to the
benchmark as a whole on my KBL laptop.  The hurt shaders are a couple
shaders in Kerbal Space Program and a couple in Aztec Ruins.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/fs: Implement the new load/store_scratch intrinsics
Jason Ekstrand [Thu, 28 Feb 2019 14:15:30 +0000 (08:15 -0600)]
intel/fs: Implement the new load/store_scratch intrinsics

This commit fills in a number of different pieces:

 1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the
    new intrinsics.  This involves simple plumbing work as well as a
    tiny bit of extra logic to always scalarize scratch intrinsics

 2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics
    into byte/dword scattered read/write messages which use the A32
    stateless model.

 3. Add code to lower_surface_logical_send to handle dword scattered
    messages and the A32 stateless model.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/nir: Plumb devinfo through lower_mem_access_bit_sizes
Jason Ekstrand [Thu, 28 Feb 2019 16:02:03 +0000 (10:02 -0600)]
intel/nir: Plumb devinfo through lower_mem_access_bit_sizes

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/fs: refactor surface header setup
Jason Ekstrand [Thu, 28 Feb 2019 16:26:33 +0000 (10:26 -0600)]
intel/fs: refactor surface header setup

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/fs: Add DWord scattered read/write opcodes
Jason Ekstrand [Wed, 8 Apr 2015 09:41:33 +0000 (02:41 -0700)]
intel/fs: Add DWord scattered read/write opcodes

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/nir: Use nir_extract_bits in lower_mem_access_bit_sizes
Jason Ekstrand [Wed, 6 Nov 2019 18:36:28 +0000 (12:36 -0600)]
intel/nir: Use nir_extract_bits in lower_mem_access_bit_sizes

The new helper solves most of the annoying problems with data wrangling
in brw_nir_lower_mem_access_bit_sizes.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agonir: Add tests for nir_extract_bits
Jason Ekstrand [Sat, 9 Nov 2019 01:24:05 +0000 (19:24 -0600)]
nir: Add tests for nir_extract_bits

4 years agonir/builder: Add a nir_extract_bits helper
Jason Ekstrand [Wed, 6 Nov 2019 18:09:56 +0000 (12:09 -0600)]
nir/builder: Add a nir_extract_bits helper

This new helper is better than nir_bitcast_vector because it's able to
take a (mostly) arbitrary range from the source vector.  The only
requirement is that first_bit has to be aligned to the smaller of the
two bit sizes.  It wouldn't be hard to lift that requirement but it's
reasonable for now.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agoegl: fix _EGL_NATIVE_PLATFORM fallback
Eric Engestrom [Tue, 10 Sep 2019 16:06:09 +0000 (17:06 +0100)]
egl: fix _EGL_NATIVE_PLATFORM fallback

When the X11 or Haiku platforms were compiled in, they would bypass the
`_EGL_NATIVE_PLATFORM` fallback by always returning themselves instead.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoanv: Unify GetDeviceQueue and GetDeviceQueue2
Ricardo Garcia [Thu, 7 Nov 2019 14:38:45 +0000 (15:38 +0100)]
anv: Unify GetDeviceQueue and GetDeviceQueue2

Avoid duplicating some checks and code by making anv_GetDeviceQueue a
subcase of anv_GetDeviceQueue2, like radv does.

Signed-off-by: Ricardo Garcia <rgarcia@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agopanfrost: Select format-specific blending intrinsics
Alyssa Rosenzweig [Thu, 7 Nov 2019 02:48:33 +0000 (21:48 -0500)]
panfrost: Select format-specific blending intrinsics

If we have an accelerated path for a particular framebuffer format,
let's use it to save a bunch of instructions in a blend shader.

[Tomeu: Only use the faster intrinsic on >T760]

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agopan/midgard: Pack load/store masks
Alyssa Rosenzweig [Thu, 7 Nov 2019 13:25:27 +0000 (08:25 -0500)]
pan/midgard: Pack load/store masks

While most load/store operations on 32-bit/vec4 intriniscally, some are
not and have special type-size-dependent semantics for the mask. We need
to convert into this native format.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agopan/midgard: Implement nir_intrinsic_load_output_u8_as_fp16_pan
Alyssa Rosenzweig [Thu, 7 Nov 2019 02:50:32 +0000 (21:50 -0500)]
pan/midgard: Implement nir_intrinsic_load_output_u8_as_fp16_pan

We can use the native Midgard ops for this, depending what chip we're
on.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agopan/midgard: Identify ld_color_buffer_u8_as_fp16*
Alyssa Rosenzweig [Thu, 7 Nov 2019 02:49:35 +0000 (21:49 -0500)]
pan/midgard: Identify ld_color_buffer_u8_as_fp16*

There are two versions of this opcode, depending what version of the ISA
you're using. I'm not sure if there's a semantic difference; I think
there might be some slight subtleties but it's too early to know at this
stage.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agonir: Add load_output_u8_as_fp16_pan intrinsic
Alyssa Rosenzweig [Thu, 7 Nov 2019 02:47:23 +0000 (21:47 -0500)]
nir: Add load_output_u8_as_fp16_pan intrinsic

This is a single opcode, at least on newer Midgard chips. It's easier to
have this represented in NIR rather than trying to optimize out the
conversions, so let's add the intrinsic.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agopanfrost: Set depth and stencil for SFBD based on the format
Tomeu Vizoso [Wed, 6 Nov 2019 09:04:36 +0000 (10:04 +0100)]
panfrost: Set depth and stencil for SFBD based on the format

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agozink: correct depth-stencil format
Erik Faye-Lund [Fri, 8 Nov 2019 11:22:00 +0000 (12:22 +0100)]
zink: correct depth-stencil format

When using packed vulkan-formats on little-endian systems, we need to
swap the components for the gallium formats. And since Zink isn't
big-endian safe yet, little-endian is the only endianess we care about
right now.

This fixes a bunch of piglit tests, amongs others:
- spec@arb_depth_texture@depth-level-clamp
- spec@arb_depth_texture@depthstencil-render-miplevels * d=z24
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-blit
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-copypixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-drawpixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-readpixels

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16e ("zink: introduce opengl over vulkan")
4 years agozink/spirv: add support for nir_op_flrp
Erik Faye-Lund [Wed, 6 Nov 2019 14:13:58 +0000 (15:13 +0100)]
zink/spirv: add support for nir_op_flrp

This fixes the following piglit:

spec@ati_fragment_shader@ati_fragment_shader-render-fog

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
4 years agoegl: Mention if swrast is being forced
Chris Wilson [Thu, 31 Oct 2019 07:29:55 +0000 (07:29 +0000)]
egl: Mention if swrast is being forced

The system can be disabling HW acceleration unbeknown to the user,
leading to a long debug session trying to work out which component is
failing. A quick mention that it is the environment override would be
very useful.

v2: Use more generic "CPU renderer" and so try to avoid jargon.

Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Martin Peres <martin.peres@linux.intel.com>
4 years agospirv: Sort out the mess that is sampled image
Jason Ekstrand [Thu, 26 Sep 2019 16:56:48 +0000 (11:56 -0500)]
spirv: Sort out the mess that is sampled image

This commit makes two major changes.  First, we add a second case to
OpLoad for sampled images which constructs a vtn_sampled_image and
stashes that rather than stashing a pointer to the combined image
sampler like we do for bare samplers and images.  This should be more in
line with how SPIR-V is intended to work and hopefully doesn't cause any
weird problems.  The second is a rework of vtn_handle_texture to assume
that everything has an image but not everything has a sampler.  We also
add a vtn_fail_if for the case where a texture instructions require a
sampler but none is provided.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agospirv: Add a vtn_decorate_pointer helper
Jason Ekstrand [Mon, 4 Nov 2019 22:44:30 +0000 (16:44 -0600)]
spirv: Add a vtn_decorate_pointer helper

This helper makes a duplicate copy of the pointer if any new access
flags are set at this stage.  This way we don't end up propagating
access flags further than they actual SPIR-V decorations.  In several
instances where we create new pointers, we still call the decoration
helper directly because no copy is needed.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agospirv: Remove the type from sampled_image
Jason Ekstrand [Thu, 26 Sep 2019 16:48:44 +0000 (11:48 -0500)]
spirv: Remove the type from sampled_image

We have types on all vtn_values at this point so there's no reason to
carry the redundant type information.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agofreedreno/ir3: also track # of nops for shader-db
Rob Clark [Mon, 4 Nov 2019 19:41:55 +0000 (11:41 -0800)]
freedreno/ir3: also track # of nops for shader-db

The instruction count is (mostly) a measure of what optimization passes
can do, while # of nops is more an indication of how effectively the
scheduler is balancing register pressure vs instruction count.  So track
these independently.

(There could be opportunities to rematerialize values to reduce register
pressure, swapping some nop's with other alu instructions, so nothing is
truely independent.. but it is still useful to break these stats out.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/ir3: sync disasm changes from envytools
Rob Clark [Mon, 4 Nov 2019 19:33:54 +0000 (11:33 -0800)]
freedreno/ir3: sync disasm changes from envytools

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/a4xx: fix SP_FS_MRT_REG.HALF_PRECISION
Rob Clark [Fri, 25 Oct 2019 20:57:49 +0000 (13:57 -0700)]
freedreno/a4xx: fix SP_FS_MRT_REG.HALF_PRECISION

Set flag based on actual output reg type.

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/a3xx: fix SP_FS_MRT_REG.HALF_PRECISION
Rob Clark [Fri, 25 Oct 2019 20:56:30 +0000 (13:56 -0700)]
freedreno/a3xx: fix SP_FS_MRT_REG.HALF_PRECISION

We should really be setting this based on the actual output register
type.

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/ir3: remove obsolete comment
Rob Clark [Thu, 24 Oct 2019 18:24:15 +0000 (11:24 -0700)]
freedreno/ir3: remove obsolete comment

The meta PHI instruction was removed long ago.  And fanin/fanout
themselves to not contribute actual instructions (at least not by the
time you get to sched, they may prevent copy-propagating away a mov)

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/ir3/ra: remove ir print after livein/out
Rob Clark [Wed, 23 Oct 2019 22:04:38 +0000 (15:04 -0700)]
freedreno/ir3/ra: remove ir print after livein/out

The IR hasn't changed at this point, so it isn't really adding any
value.

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/ir3/ra: move regs_count==0 check
Rob Clark [Wed, 23 Oct 2019 16:50:22 +0000 (09:50 -0700)]
freedreno/ir3/ra: move regs_count==0 check

Fold it in to writes_gpr() (since a register that does not reference any
registers by definition does not write a register).  This lets us avoid
having to handle this case in a few other places.

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/ir3: ir3_print tweaks
Rob Clark [Tue, 22 Oct 2019 18:01:11 +0000 (11:01 -0700)]
freedreno/ir3: ir3_print tweaks

Handle HALF/HIGH flags in all cases, and colorize SSA src notation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/ir3: use SSA flag on dest register too
Rob Clark [Tue, 22 Oct 2019 17:28:04 +0000 (10:28 -0700)]
freedreno/ir3: use SSA flag on dest register too

We did this in some places before, but not consistantly.  But it will be
useful for two-pass RA, to identify which registers have already been
assigned.

While we are cleaning this up, use __ssa_src() and new __ssa_dst()
helper more consistently.  (If nothing else, this reduces the # of
callers of ir3_reg_create() to audit that we didn't miss something)

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agofreedreno/ir3: split pre-coloring to it's own function
Rob Clark [Tue, 22 Oct 2019 16:22:58 +0000 (09:22 -0700)]
freedreno/ir3: split pre-coloring to it's own function

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agospirv: Don't leak GS initialization to other stages
Caio Marcelo de Oliveira Filho [Fri, 8 Nov 2019 23:58:15 +0000 (15:58 -0800)]
spirv: Don't leak GS initialization to other stages

The stage specific fields of shader_info are in an union.  We've
likely been lucky that this value was either overwritten or ignored by
other stages.  The recent change in shader_info layout in commit
84a1a2578da ("compiler: pack shader_info from 160 bytes to 96 bytes")
made this issue visible.

Fixes: cf2257069cb ("nir/spirv: Set a default number of invocations for geometry shaders")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agocompiler: pack shader_info from 160 bytes to 96 bytes
Marek Olšák [Wed, 6 Nov 2019 00:10:09 +0000 (19:10 -0500)]
compiler: pack shader_info from 160 bytes to 96 bytes

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agoglsl/linker: pass shader_info to analyze_clip_cull_usage directly
Marek Olšák [Wed, 6 Nov 2019 00:20:35 +0000 (19:20 -0500)]
glsl/linker: pass shader_info to analyze_clip_cull_usage directly

This will be needed by the next commit.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agoradeonsi/nir: fix compute shader crash due to nir_binary == NULL
Marek Olšák [Fri, 8 Nov 2019 21:31:35 +0000 (16:31 -0500)]
radeonsi/nir: fix compute shader crash due to nir_binary == NULL

This partially reverts 8b30114dda8.

Fixes: 8b30114dda8 "radeonsi/nir: call nir_serialize only once per shader"
4 years agoradeonsi/nir: call nir_serialize only once per shader
Marek Olšák [Thu, 7 Nov 2019 23:43:07 +0000 (18:43 -0500)]
radeonsi/nir: call nir_serialize only once per shader

We were calling it twice.

First serialize it, then use it to compute the cache key.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agoutil: add blob_finish_get_buffer
Marek Olšák [Fri, 8 Nov 2019 00:10:55 +0000 (19:10 -0500)]
util: add blob_finish_get_buffer

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agou_format: Fix swizzle of A1R5G5B5.
Eric Anholt [Thu, 7 Nov 2019 23:24:05 +0000 (15:24 -0800)]
u_format: Fix swizzle of A1R5G5B5.

Found once I started using the generated unpack code from the Mesa side.

Fixes: 4bbaac3782ad ("gallium: Add some more channel orderings of packed formats.")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
4 years agovirgl: support emulating planar image sampling
David Stevens [Wed, 23 Oct 2019 02:06:17 +0000 (11:06 +0900)]
virgl: support emulating planar image sampling

Mesa emulates planar format sampling with per-plane samplers. Virgl now
supports this by allowing the plane index to be passed when creating a
sampler view from a planar image. With this change, mesa now passes that
information to virgl.

Signed-off-by: David Stevens <stevensd@chromium.org>
Reviewed-by: Lepton Wu <lepton@chromium.org>
4 years agogallium/swr: Enable some ARB_gpu_shader5 extensions
Krzysztof Raszkowski [Fri, 8 Nov 2019 16:04:47 +0000 (16:04 +0000)]
gallium/swr: Enable some ARB_gpu_shader5 extensions
Enable / add to features.txt:
- Enhanced textureGather.
- Geometry shader instancing.
- Geometry shader multiple streams.

Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
4 years agogallium/swr: Fix GS invocation issues
Krzysztof Raszkowski [Fri, 8 Nov 2019 14:52:16 +0000 (14:52 +0000)]
gallium/swr: Fix GS invocation issues
- Fixed proper setting gl_InvocationID.
- Fixed GS vertices output memory overflow.

Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
4 years agoac: Handle invalid GFX10 format correctly in ac_get_tbuffer_format.
Timur Kristóf [Wed, 6 Nov 2019 12:29:26 +0000 (13:29 +0100)]
ac: Handle invalid GFX10 format correctly in ac_get_tbuffer_format.

It happens that some games try to access a vertex buffer without
a valid format. This case was incorrectly handled by
ac_get_tbuffer_format which made ACO emit an invalid instruction.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agopanfrost: Try to evict unused BOs from the cache
Boris Brezillon [Thu, 7 Nov 2019 08:32:31 +0000 (09:32 +0100)]
panfrost: Try to evict unused BOs from the cache

The panfrost BO cache can only grow since all newly allocated BOs are
returned to the cache (unless they've been exported).

With the MADVISE ioctl that's not a big issue because the kernel can
come and reclaim this memory, but MADVISE will only be available on 5.4
kernels. This means an app can currently allocate a lot memory without
ever releasing it, leading to some situations where the OOM-killer kicks
in and kills the app (or even worse, kills another process consuming
more memory than the GL app) to get some of this memory back.

Let's try to limit the amount of BOs we keep in the cache by evicting
entries that have not been used for more than one second (if the app
stopped allocating BOs of this size, it's likely to not allocate
similar BOs in a near future).

This solution is based on the VC4/V3D implementation.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: Move BO cache related fields to a sub-struct
Boris Brezillon [Thu, 7 Nov 2019 07:42:09 +0000 (08:42 +0100)]
panfrost: Move BO cache related fields to a sub-struct

We will soon introduce an LRU list to evict BOs that have been unused
for more than 1 second. Let's first move all BO cache fields to a
sub-struct to clarify which fields are used by the BO caching logic.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Switch base for vertex texturing on T720
Alyssa Rosenzweig [Thu, 7 Nov 2019 14:31:29 +0000 (09:31 -0500)]
pan/midgard: Switch base for vertex texturing on T720

There aren't texture pipeline registers anymore; instead, space is
shared with work and ldst registers for output and input respectively.
We need to shift the base registers to represent this correctly.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Pass shader stage to disassembler
Alyssa Rosenzweig [Thu, 7 Nov 2019 14:31:02 +0000 (09:31 -0500)]
pan/midgard: Pass shader stage to disassembler

Vertex texturing behaves differently from fragment texturing on some
GPUs.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Disassemble half-steps correctly
Alyssa Rosenzweig [Thu, 7 Nov 2019 14:20:56 +0000 (09:20 -0500)]
pan/midgard: Disassemble half-steps correctly

The meaning of some bits shifts; we need to account for this to print
swizzles sanely.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Fix printing of half-registers in texture ops
Alyssa Rosenzweig [Thu, 7 Nov 2019 13:56:41 +0000 (08:56 -0500)]
pan/midgard: Fix printing of half-registers in texture ops

We were using old style half-registers; let's update that to be
consistent, preparing us for more disassmbler changes in this area.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agofreedreno/ir3: Use regid() helper when setting up precolor regs
Kristian H. Kristensen [Fri, 8 Nov 2019 00:44:33 +0000 (16:44 -0800)]
freedreno/ir3: Use regid() helper when setting up precolor regs

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Turn on tessellation shaders
Kristian H. Kristensen [Wed, 23 Oct 2019 02:51:38 +0000 (19:51 -0700)]
freedreno/a6xx: Turn on tessellation shaders

Wow. Very triangle. So shader.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Only use merged regs and four quads for VS+FS
Kristian H. Kristensen [Tue, 29 Oct 2019 23:26:34 +0000 (16:26 -0700)]
freedreno/a6xx: Only use merged regs and four quads for VS+FS

When other geometry stages are present, we chose two quads and no
merged regs.

Acked-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/blitter: Save tessellation state
Kristian H. Kristensen [Tue, 22 Oct 2019 23:26:11 +0000 (16:26 -0700)]
freedreno/blitter: Save tessellation state

We have tessellation state now.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Only set emit.hs/ds when we're drawing patches
Kristian H. Kristensen [Fri, 8 Nov 2019 00:32:24 +0000 (16:32 -0800)]
freedreno/a6xx: Only set emit.hs/ds when we're drawing patches

At least the gallium blitter helper will call us to draw with
tessellation shaders set but a non-patch primitive.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno: Use bypass rendering for tessellation
Kristian H. Kristensen [Wed, 23 Oct 2019 02:49:47 +0000 (19:49 -0700)]
freedreno: Use bypass rendering for tessellation

It seems like tiling could work in the Adreno architecture, but we've
only ever seen bypass rendering with tessellation.  For now, let's do
that too.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Program state for tessellation stages
Kristian H. Kristensen [Wed, 23 Oct 2019 03:03:07 +0000 (20:03 -0700)]
freedreno/a6xx: Program state for tessellation stages

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Emit constant parameters for tessellation stages
Kristian H. Kristensen [Wed, 23 Oct 2019 02:58:59 +0000 (19:58 -0700)]
freedreno/a6xx: Emit constant parameters for tessellation stages

Assemble the information the stages need and emit the constants.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Allocate and program tessellation buffer
Kristian H. Kristensen [Wed, 23 Oct 2019 03:05:47 +0000 (20:05 -0700)]
freedreno/a6xx: Allocate and program tessellation buffer

Tessellation needs a couple of buffers that should hold the entire
output from a full VS+TCS draw call.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Build the right draw command for tessellation
Kristian H. Kristensen [Wed, 23 Oct 2019 02:47:50 +0000 (19:47 -0700)]
freedreno/a6xx: Build the right draw command for tessellation

We need to select the right primitive type, set a bit to turn on
tessellation and or in the TES output primitive type.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Allocate const space for tessellation parameters
Kristian H. Kristensen [Wed, 23 Oct 2019 02:44:42 +0000 (19:44 -0700)]
freedreno/ir3: Allocate const space for tessellation parameters

The tessellation stages need size and stride or the patch layout as
well as locations of attributes in the patch.  The tesselation stages
also use two system memory BOs and need the iovas of those.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Pre-color TCS header and primitive ID inputs
Kristian H. Kristensen [Wed, 23 Oct 2019 02:39:47 +0000 (19:39 -0700)]
freedreno/ir3: Pre-color TCS header and primitive ID inputs

Similar to GS, the registers are shared and not reinitialized betewen
VS and TCS, so we need to make sure to allocate the same registers for
the system values between stages.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Don't assume binning shader is always VS
Kristian H. Kristensen [Wed, 23 Oct 2019 03:21:13 +0000 (20:21 -0700)]
freedreno/ir3: Don't assume binning shader is always VS

In tessellation mode, the TES is (probably) the binning shader.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Setup inputs and outputs for tessellation stages
Kristian H. Kristensen [Wed, 23 Oct 2019 02:37:53 +0000 (19:37 -0700)]
freedreno/ir3: Setup inputs and outputs for tessellation stages

Similar to GS, some inputs are reused when the chsh from VS to TCS or
TES to GS, so we need to make sure we setup the right inputs and make
the shared system values outputs so they don't get clobbered.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Implement TCS synchronization intrinsics
Kristian H. Kristensen [Wed, 23 Oct 2019 02:30:51 +0000 (19:30 -0700)]
freedreno/ir3: Implement TCS synchronization intrinsics

We add two new IR3 specific nir intrinsics that map to the new condend
and endpatch instructions.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Implement tess coord intrinsic
Kristian H. Kristensen [Wed, 23 Oct 2019 02:29:07 +0000 (19:29 -0700)]
freedreno/ir3: Implement tess coord intrinsic

Our lowering pass made the z component unused by replacing its uses
by 1 - x - y.  The intrinsic implementation then just need to return
the x and y components.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: End TES with chsh when using GS
Kristian H. Kristensen [Wed, 23 Oct 2019 02:26:30 +0000 (19:26 -0700)]
freedreno/ir3: End TES with chsh when using GS

When we have both TES and GS, the TES needs to chain to the VS with
chmask and chsh GS just like the VS does to either TCS or GS.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Add new synchronization opcodes
Kristian H. Kristensen [Wed, 23 Oct 2019 01:19:50 +0000 (18:19 -0700)]
freedreno/ir3: Add new synchronization opcodes

There are two new opcodes in use in tesselation control shaders:
category 0, opcodes 13 and 15.  unk13 is a kill type of instruction
that terminates threads where !p0.x and it used to narrow down a patch
wavefront to just thread 0.  Then, once thread 0 has written the tess
levels, it issues unk15, which might signal the TE that another patch
has been fully written.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Extend geometry lowering pass to handle tessellation
Kristian H. Kristensen [Wed, 23 Oct 2019 00:30:48 +0000 (17:30 -0700)]
freedreno/ir3: Extend geometry lowering pass to handle tessellation

VS and TCS pass varyings the same way as VS and GS does. TCS then
writes entire patch to a system memory BO and TES eventually reads
back from the BO once the TE starts generating vertices.  TES outputs
vertices the same way as VS and GS, except when there's a GS as well,
in which case TES passes varyings to GS same way the VS would.

In addition, the TCS needs a little bit of control flow massaging so
that it only runs for valid invocations needs a couple of unknown
instructions to synchronize with the TE.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Add tessellation field to shader key
Kristian H. Kristensen [Wed, 23 Oct 2019 00:16:09 +0000 (17:16 -0700)]
freedreno/ir3: Add tessellation field to shader key

Whether we're tessellating and which primitives the TES outputs
affects the entire pipeline so let's add a field to the key to track
that.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Use imul24 in offset calculations
Kristian H. Kristensen [Tue, 22 Oct 2019 23:37:35 +0000 (16:37 -0700)]
freedreno/ir3: Use imul24 in offset calculations

With the imul24 opcode in place, we can now use it for computing local
offsets (ie for ldlw/stlw).

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Add ir3 intrinsics for tessellation
Kristian H. Kristensen [Tue, 22 Oct 2019 23:33:18 +0000 (16:33 -0700)]
freedreno/ir3: Add ir3 intrinsics for tessellation

These provide the iovas for system memory buffers used for
tessellation as well as a new HW specific system value.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno: Don't count primitives for patches
Kristian H. Kristensen [Tue, 22 Oct 2019 23:24:26 +0000 (16:24 -0700)]
freedreno: Don't count primitives for patches

The gallium helper doesn't like patches and we can't determine how
many primitives it gets tessellated into anyway.  On gens where we
have tessellation, we get the prim count from a HW counter so just
skip counting on the CPU.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Add load and store intrinsics for global io
Kristian H. Kristensen [Tue, 22 Oct 2019 23:16:35 +0000 (16:16 -0700)]
freedreno/ir3: Add load and store intrinsics for global io

These intrinsics take a ivec2 for the 64 bit base address and a
integer offset.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/ir3: Emit link map as byte or dwords offsets as needed
Kristian H. Kristensen [Tue, 22 Oct 2019 23:07:45 +0000 (16:07 -0700)]
freedreno/ir3: Emit link map as byte or dwords offsets as needed

Stages that load inputs with ldlw (TCS, GS) need byte offsets, stages
that load with ldg (TES) need dwords offsets.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Add register offset for STG/LDG
Kristian H. Kristensen [Tue, 22 Oct 2019 23:03:36 +0000 (16:03 -0700)]
freedreno/a6xx: Add register offset for STG/LDG

These instructions take a 64 bit iova as two conescutive registers and
a immediate offset.  This patch adds support for the offset to be a
single register, which is added to the 64 bit iova.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6x: Rename z/s formats
Kristian H. Kristensen [Thu, 31 Oct 2019 21:43:58 +0000 (14:43 -0700)]
freedreno/a6x: Rename z/s formats

What we call eRB6_Z24_UNORM_S8_UINT now is actually
RB6_Z24_UNORM_S8_UINT_AS_R8G8B8A8 and RB6_X8Z24_UNORM is actually
RB6_Z24_UNORM_S8_UINT.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Fix layered texture type enum
Kristian H. Kristensen [Fri, 1 Nov 2019 22:16:44 +0000 (15:16 -0700)]
freedreno/a6xx: Fix layered texture type enum

2D array textures and 3D textures are different enum values after all.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno: Add nogmem debug option to force bypass rendering
Kristian H. Kristensen [Thu, 31 Oct 2019 21:21:32 +0000 (14:21 -0700)]
freedreno: Add nogmem debug option to force bypass rendering

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Clear sysmem with CP_BLIT
Kristian H. Kristensen [Fri, 20 Sep 2019 00:13:34 +0000 (17:13 -0700)]
freedreno/a6xx: Clear sysmem with CP_BLIT

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years agofreedreno/a6xx: Fix primitive counters again
Kristian H. Kristensen [Thu, 31 Oct 2019 00:18:48 +0000 (17:18 -0700)]
freedreno/a6xx: Fix primitive counters again

We use one mechanism for (REG_A6XX_RBBM_PRIMCTR_8_LO)
PIPE_QUERY_PRIMITIVES_GENERATED, which counts all primitives that exit
the geometry pipeline, whether or not xfb is on.  Then for
PIPE_QUERY_PRIMITIVES_EMITTED, we use the CP_EVENT_WRITE subfunction
that writes out per-stream counts for generated and emitted, but only
when xfb is enabled.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>