mesa.git
4 years ago mesa: add ARB_clear_buffer_object named functions
Pierre-Eric Pelloux-Prayer [Tue, 5 Nov 2019 14:37:12 +0000 (15:37 +0100)]
mesa: add ARB_clear_buffer_object named functions

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years ago mesa: add ARB_vertex_attrib_64bit VertexArrayVertexAttribLOffsetEXT
Pierre-Eric Pelloux-Prayer [Tue, 5 Nov 2019 14:04:52 +0000 (15:04 +0100)]
mesa: add ARB_vertex_attrib_64bit VertexArrayVertexAttribLOffsetEXT

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years ago mesa: add ARB_framebuffer_no_attachments named functions
Pierre-Eric Pelloux-Prayer [Tue, 5 Nov 2019 13:47:53 +0000 (14:47 +0100)]
mesa: add ARB_framebuffer_no_attachments named functions

The wording in ARB_framebuffer_no_attachments and EXT_direct_state_access
is different.
In the former, framebuffer names must have been generated using glGenFramebuffers
before using the named functions.
In the latter, framebuffer names have no such constraint, so we can't use
the _mesa_lookup_framebuffer_dsa function.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years ago mesa: update features.txt to reflect EXT_dsa status
Pierre-Eric Pelloux-Prayer [Wed, 6 Nov 2019 09:30:13 +0000 (10:30 +0100)]
mesa: update features.txt to reflect EXT_dsa status

All features from the EXT_dsa spec are implemented.

Interactions with other specs:
- GL_AMD_gpu_shader_int64: not needed, since it's not enabled in
  compatibility profile.
- GL_ARB_bindless_texture is DONE
    "INVALID_OPERATION is generated when calling various functions
    to modify the state of a texture object from which handles have
    been extracted"
- GL_ARB_buffer_storage/GL_EXT_buffer_storage is DONE (NamedBufferStorageEXT function)
- GL_ARB_texture_storage is DONE (3 TextureStorage*DEXT functions)
- GL_ARB_vertex_attrib_binding is DONE (6 VertexArray* functions)
- GL_EXT_external_buffer is not supported by Mesa

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years ago panfrost: Set PIPE_COMPUTE_CAP_ADDRESS_BITS to 64
Alyssa Rosenzweig [Wed, 6 Nov 2019 19:55:41 +0000 (14:55 -0500)]
panfrost: Set PIPE_COMPUTE_CAP_ADDRESS_BITS to 64

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago panfrost: Disable tiling for GLOBAL resources
Alyssa Rosenzweig [Tue, 5 Nov 2019 16:18:42 +0000 (11:18 -0500)]
panfrost: Disable tiling for GLOBAL resources

It doesn't make sense to have nonlinear layouts for a buffer that can be
accessed as direct memory for a compute kernel. Turn that off so things
work as expected.
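
The layout decision described above might be sketched as follows. This is an illustration only: the `BIND_*` flags, the `layout` enum and `choose_layout()` are hypothetical stand-ins, not the panfrost code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bind flags, standing in for Gallium's PIPE_BIND_* bits. */
#define BIND_SAMPLER_VIEW (1u << 0)
#define BIND_GLOBAL       (1u << 1)

enum layout { LAYOUT_LINEAR, LAYOUT_TILED };

/* Pick a memory layout for a resource. A buffer that a compute kernel
 * addresses as raw memory has to stay linear, whatever else it is bound as. */
static enum layout
choose_layout(unsigned bind, bool tiling_supported)
{
   assert((bind & ~(BIND_SAMPLER_VIEW | BIND_GLOBAL)) == 0);

   if (bind & BIND_GLOBAL)
      return LAYOUT_LINEAR;

   return tiling_supported ? LAYOUT_TILED : LAYOUT_LINEAR;
}
```

With these assumptions, a resource bound as both a sampler view and a GLOBAL buffer still comes out linear.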

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago panfrost: Pass kernel inputs as uniforms
Alyssa Rosenzweig [Tue, 5 Nov 2019 16:19:20 +0000 (11:19 -0500)]
panfrost: Pass kernel inputs as uniforms

We can take the OpenCL kernel inputs and interpret them as uniforms by
simply reusing the Gallium callback.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago panfrost: Stub out clover callbacks
Alyssa Rosenzweig [Tue, 5 Nov 2019 14:37:51 +0000 (09:37 -0500)]
panfrost: Stub out clover callbacks

We don't implement these yet but let's not crash.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago i965: Ensure that all 2101010 image imports can pass framebuffer completeness.
Miguel Casas-Sanchez [Tue, 19 Nov 2019 02:21:12 +0000 (02:21 +0000)]
i965: Ensure that all 2101010 image imports can pass framebuffer completeness.

Chrome OS would like to import and render to any supported format that has
a corresponding display plane format, and this prevents throwing
framebuffer incomplete for FBOs using these textures.

See: crbug.com/949260

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago nir/serialize: fix serializing functions with no implementations.
Dave Airlie [Mon, 18 Nov 2019 22:19:34 +0000 (08:19 +1000)]
nir/serialize: fix serializing functions with no implementations.

Store a flag stating whether there was an implementation, and use
fxn->impl as a temporary flag between deserialization stages.
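
As a rough sketch of the idea, the per-function booleans can travel in one flags word. The bit assignments, `struct fxn_header` and both helpers below are hypothetical; the real serializer may lay things out differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for the serialized flags word. */
#define FXN_HAS_NAME      (1u << 0)
#define FXN_IS_ENTRYPOINT (1u << 1)
#define FXN_HAS_IMPL      (1u << 2)

struct fxn_header {
   bool has_name;
   bool is_entrypoint;
   bool has_impl;   /* false for a declaration without a body */
};

/* Pack the three booleans into a single word for the blob. */
static uint32_t
pack_fxn_flags(const struct fxn_header *h)
{
   uint32_t flags = 0;
   if (h->has_name)
      flags |= FXN_HAS_NAME;
   if (h->is_entrypoint)
      flags |= FXN_IS_ENTRYPOINT;
   if (h->has_impl)
      flags |= FXN_HAS_IMPL;
   return flags;
}

/* Recover the booleans when reading the blob back. */
static struct fxn_header
unpack_fxn_flags(uint32_t flags)
{
   assert((flags & ~(FXN_HAS_NAME | FXN_IS_ENTRYPOINT | FXN_HAS_IMPL)) == 0);

   struct fxn_header h = {
      .has_name = (flags & FXN_HAS_NAME) != 0,
      .is_entrypoint = (flags & FXN_IS_ENTRYPOINT) != 0,
      .has_impl = (flags & FXN_HAS_IMPL) != 0,
   };
   return h;
}
```

A reader of the blob can then skip implementation data entirely when `has_impl` is false.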

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years ago nir/serialize: pack function has name and entry point into flags.
Dave Airlie [Mon, 18 Nov 2019 22:16:22 +0000 (08:16 +1000)]
nir/serialize: pack function has name and entry point into flags.

Suggested by Jason.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years ago iris: Re-enable param compaction
Jason Ekstrand [Mon, 18 Nov 2019 22:52:02 +0000 (16:52 -0600)]
iris: Re-enable param compaction

In d1c4e64a69e, we added a parameter to tell the back-end compiler to
ignore the param array and just push however many constants you ask it
to push.  I enabled it for iris because this is really what iris wants
but it seems to have caused a number of regressions.  Revert to the old
behavior for now.

Fixes: d1c4e64a69e "intel/compiler: Add a flag to avoid compacting..."
4 years ago mesa: enable glthread for 7 Days To Die
Marek Olšák [Mon, 18 Nov 2019 20:50:31 +0000 (15:50 -0500)]
mesa: enable glthread for 7 Days To Die

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago intel/compiler: Don't change hstride if not needed
Iván Briano [Wed, 23 Oct 2019 16:18:03 +0000 (09:18 -0700)]
intel/compiler: Don't change hstride if not needed

Alignment requirements may have changed the horizontal stride already,
so don't set it if not required to avoid breaking said requirements.

Fixes several tests such as
dEQP-VK.subgroups.vote.graphics.subgroupallequal_int8_t

Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years ago turnip: add x11 wsi
Jonathan Marek [Wed, 13 Nov 2019 22:02:43 +0000 (17:02 -0500)]
turnip: add x11 wsi

Copied from radv

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago turnip: add display wsi
Jonathan Marek [Wed, 13 Nov 2019 21:50:36 +0000 (16:50 -0500)]
turnip: add display wsi

Copied from radv (minus the fence change)

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago nir: Validate that variables are in the right lists
Jason Ekstrand [Thu, 14 Nov 2019 18:12:50 +0000 (12:12 -0600)]
nir: Validate that variables are in the right lists

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago etnaviv: blt: set TS dirty after clear
Jonathan Marek [Tue, 2 Jul 2019 21:05:27 +0000 (17:05 -0400)]
etnaviv: blt: set TS dirty after clear

RS engine does this already, it is missing for BLT engine. This fixes
cases where a clear isn't immediately at the start of the frame.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years ago etnaviv: separate PE and RS formats, use only RS only for tiling
Jonathan Marek [Fri, 9 Aug 2019 20:27:47 +0000 (16:27 -0400)]
etnaviv: separate PE and RS formats, use only RS only for tiling

There are PE formats not supported by RS, so we can't have a single
function to translate both.

Use RS only for same formats until we have a translate_rs_format and test
the possible different format blits.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years ago etnaviv: blt: use only for tiling, and add missing formats
Jonathan Marek [Fri, 9 Aug 2019 14:41:22 +0000 (10:41 -0400)]
etnaviv: blt: use only for tiling, and add missing formats

* Removes the incorrect usage of translate_rs_format
* Disables use of BLT engine for different src/dst format

We only really need the BLT engine for tiling/detiling right now, but it
would be nice to support as many blit cases as possible to avoid using PE
for that.

To deal with different formats we need to:
 * Have a translate_blt_format which has all supported formats
 * Fix the swizzle translation from gallium (current version was wrong)
 * Set the src/dst sRGB bits as needed
 * Find which type conversions the BLT engine can actually do

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years ago Call shmget() with permission 0600 instead of 0777
Brian Paul [Wed, 9 Oct 2019 18:05:16 +0000 (12:05 -0600)]
Call shmget() with permission 0600 instead of 0777

A security advisory (TALOS-2019-0857/CVE-2019-5068) found that
creating shared memory regions with permission mode 0777 could allow
any user to access that memory.  Several Mesa drivers use shared-
memory XImages to implement back buffers for improved performance.

This patch changes the shmget() calls to use 0600 (user r/w).

Tested with legacy Xlib driver and llvmpipe.
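
For illustration, a minimal sketch of creating a SysV shared memory segment with owner-only permissions. The helper names `create_private_shm()` and `shm_mode()` are hypothetical and this is not the Mesa code; only `shmget()` and `shmctl()` are real POSIX calls.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Create a segment that only the owning user can read and write.
 * Returns the segment id, or -1 on failure (as shmget() does). */
static int
create_private_shm(size_t size)
{
   assert(size > 0);
   return shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
}

/* Return the permission bits of a segment, or -1 if it cannot be queried. */
static int
shm_mode(int shmid)
{
   struct shmid_ds ds;
   if (shmctl(shmid, IPC_STAT, &ds) < 0)
      return -1;
   return ds.shm_perm.mode & 0777;
}
```

With mode 0777 any local user who learns the segment id could attach to the memory; with 0600 the kernel rejects the attach for everyone except the owner (and privileged users).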

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years ago anv: Emit a NULL vertex for zero base_vertex/instance
Jason Ekstrand [Fri, 8 Nov 2019 04:05:21 +0000 (22:05 -0600)]
anv: Emit a NULL vertex for zero base_vertex/instance

If both are zero (the common case), we can emit a null vertex buffer
rather than emitting a vertex buffer with zeros in it.  The packing of
the VERTEX_BUFFER_STATE is faster because no relocation is emitted and
we can avoid creating the vertex buffer which means one less
anv_state_stream_alloc.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: Use an anv_state for the next binding table
Jason Ekstrand [Thu, 7 Nov 2019 20:02:09 +0000 (14:02 -0600)]
anv: Use an anv_state for the next binding table

This is a bit more natural because we're already getting an anv_state
most places in the pipeline.  The important part here, however, is that
we're no longer calling anv_block_pool_map on every alloc_binding_table
call.  While it's probably pretty cheap, it is potentially a linear walk
over the list of BOs and it was showing up in profiles.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: More carefully dirty state in BindPipeline
Jason Ekstrand [Thu, 7 Nov 2019 17:28:47 +0000 (11:28 -0600)]
anv: More carefully dirty state in BindPipeline

Instead of blindly dirtying descriptors and push constants the moment we
see a pipeline change, check to see if it actually changes the bind
layout or push constant layout.  This doubles the runtime performance of
one CPU-limited example running with the Dawn WebGPU implementation when
running on my laptop.

NOTE: This effectively reverts beca63c6c07.  While it was a nice
optimization, it was based on prog_data and we can't do that anymore
once we start allowing the same binding table to be used with multiple
different pipelines.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: More carefully dirty state in BindDescriptorSets
Jason Ekstrand [Thu, 7 Nov 2019 17:44:08 +0000 (11:44 -0600)]
anv: More carefully dirty state in BindDescriptorSets

Instead of dirtying all graphics or all compute based on binding point,
we're now much more careful.  We first check to see if the actual
descriptor set changed and then only dirty the stages used by that
descriptor set.  For dynamic offsets, we keep a bitfield per-stage of
which offsets are actually used in that stage and we only dirty push
constants and descriptors if that stage has dynamic offsets AND those
offsets actually change.
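
The dynamic-offset rule above (dirty a stage only if it uses an offset AND that offset changed) might look roughly like this. The function name, `NUM_STAGES` and the mask layout are hypothetical, not the anv data structures.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_STAGES 6   /* hypothetical number of shader stages */

/* Store the incoming dynamic offsets and return a bitmask of the stages
 * that need their push constants and descriptors re-emitted.
 * used_by_stage[s] has bit i set when stage s reads dynamic offset i. */
static uint32_t
bind_dynamic_offsets(uint32_t *bound, const uint32_t *incoming, unsigned count,
                     const uint32_t used_by_stage[NUM_STAGES])
{
   assert(count <= 32);

   /* First pass: record which offsets actually changed. */
   uint32_t changed = 0;
   for (unsigned i = 0; i < count; i++) {
      if (bound[i] != incoming[i]) {
         bound[i] = incoming[i];
         changed |= UINT32_C(1) << i;
      }
   }

   /* Second pass: a stage is dirty only if it uses a changed offset. */
   uint32_t dirty_stages = 0;
   for (unsigned s = 0; s < NUM_STAGES; s++) {
      if (used_by_stage[s] & changed)
         dirty_stages |= UINT32_C(1) << s;
   }
   return dirty_stages;
}
```

Rebinding the same offsets a second time returns 0 in this sketch, so nothing is re-emitted.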

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: Use a switch statement for binding table setup
Jason Ekstrand [Thu, 7 Nov 2019 20:39:28 +0000 (14:39 -0600)]
anv: Use a switch statement for binding table setup

It theoretically could be more efficient but the real point here is that
it's no longer really a matter of dealing with special cases and then
the "real" thing.  The way we're handling binding tables, it's more of a
multi-step process and a switch is more natural.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: Rework push constant handling
Jason Ekstrand [Thu, 7 Nov 2019 23:16:14 +0000 (17:16 -0600)]
anv: Rework push constant handling

This substantially reworks both the state setup side of push constant
handling and the pipeline compile side.  The fundamental change here is
that we're no longer respecting the prog_data::param array and instead
are just instructing the back-end compiler to leave the array alone.
This makes the state setup side substantially simpler because we can now
just memcpy the whole block of push constants and don't have to
upload one DWORD at a time.
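
A hedged sketch of the difference between the two upload styles. Both helper names and the shape of the `param` table are assumptions made for illustration; they do not mirror the driver's actual structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Old style: the compiler's param table says which source DWORD lands in
 * each push slot, so the upload gathers one DWORD at a time. */
static void
upload_via_param_array(uint32_t *dst, const uint32_t *src, unsigned src_count,
                       const unsigned *param, unsigned param_count)
{
   for (unsigned i = 0; i < param_count; i++) {
      assert(param[i] < src_count);
      dst[i] = src[param[i]];
   }
}

/* New style: the layout is fixed up-front and the compiler leaves it
 * alone, so the whole block can be copied in one go. */
static void
upload_whole_block(uint32_t *dst, const uint32_t *src, unsigned count)
{
   memcpy(dst, src, count * sizeof(uint32_t));
}
```

When the param table is the identity mapping the two produce the same bytes; the second simply avoids the per-DWORD indirection.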

This also means that we can compute the full push constant layout
up-front and just trust the back-end compiler to not mess with it.
Maybe one day we'll decide that the back-end compiler can do useful
things there again but for now, this is functionally no different from
what we had before this commit and makes the NIR handling cleaner.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: Re-arrange push constant data a bit
Jason Ekstrand [Wed, 6 Nov 2019 16:59:15 +0000 (10:59 -0600)]
anv: Re-arrange push constant data a bit

This moves the compute stuff into an anv_push_constants::cs sub-struct.
It also moves dynamic offsets into the push constants.  This means we
have to duplicate the data per-stage but that doesn't seem like the end
of the world and one day we may wish to make dynamic offsets per-stage
anyway.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago intel/compiler: Add a flag to avoid compacting push constants
Jason Ekstrand [Thu, 31 Oct 2019 20:57:52 +0000 (15:57 -0500)]
intel/compiler: Add a flag to avoid compacting push constants

In vec4, we can just not run the pass.  In fs, things are a bit more
deeply intertwined.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: Pre-compute push ranges for graphics pipelines
Jason Ekstrand [Fri, 8 Nov 2019 15:42:30 +0000 (09:42 -0600)]
anv: Pre-compute push ranges for graphics pipelines

It turns out that emitting push constants is one of the hottest paths in
the driver and ANY work we do there costs us.  By pre-computing things a
bit ahead of time, we shave 5% off the runtime of a CPU-limited example
running with the Dawn WebGPU implementation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: Stop bounds-checking pushed UBOs
Jason Ekstrand [Fri, 8 Nov 2019 15:33:07 +0000 (09:33 -0600)]
anv: Stop bounds-checking pushed UBOs

The bounds checking is actually less safe than just pushing the data.
If the bounds checking actually ever kicks in and it's not on the last
UBO push range, then the shrinking will cause all subsequent ranges to
be pushed to the wrong place in the GRF.  One of the behaviors we
definitely don't want is for OOB UBO access to result in completely
unrelated UBOs returning garbage values.  It's safer to just push the
UBOs as-requested.  If we're really concerned about robustness, we can
emit shader code to do bounds checking which should be stupid cheap (a
CMP followed by SEL).
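
As a scalar C analogue of the "CMP followed by SEL" idea, a bounds-checked load can be written as a compare plus a select. The function below is purely illustrative and hypothetical; it is not shader code emitted by the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load one DWORD from a UBO, returning 0 for out-of-bounds indices. */
static uint32_t
load_ubo_dword_checked(const uint32_t *ubo, uint32_t size_dwords,
                       uint32_t index)
{
   assert(ubo != NULL || size_dwords == 0);

   const int in_bounds = index < size_dwords;           /* the CMP */
   const uint32_t safe_index = in_bounds ? index : 0;   /* keep the load legal */
   const uint32_t loaded = size_dwords ? ubo[safe_index] : 0;

   return in_bounds ? loaded : 0;                       /* the SEL */
}
```

The check only affects the value returned for this one access, so an out-of-bounds read cannot shift where other ranges land.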

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: Delete dead shader constant pushing code
Jason Ekstrand [Wed, 6 Nov 2019 20:13:44 +0000 (14:13 -0600)]
anv: Delete dead shader constant pushing code

As of 2d78e55a8c5481, nir_intrinsic_load_constant with a constant offset
is constant-folded so we should never end up with any that trigger
brw_nir_analyze_ubo_ranges.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: Flatten descriptor bindings in anv_nir_apply_pipeline_layout
Jason Ekstrand [Thu, 31 Oct 2019 19:09:39 +0000 (14:09 -0500)]
anv: Flatten descriptor bindings in anv_nir_apply_pipeline_layout

This lets us stop tracking the pipeline layout.  It also means less
indirection on a very hot path.  As an extra bonus, we can make some of
our data structures smaller.  No measurable CPU overhead improvement.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv: Input attachments are always single-plane
Jason Ekstrand [Thu, 31 Oct 2019 21:57:29 +0000 (16:57 -0500)]
anv: Input attachments are always single-plane

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago genxml: Mark everything in genX_pack.h always_inline
Jason Ekstrand [Thu, 31 Oct 2019 15:25:48 +0000 (10:25 -0500)]
genxml: Mark everything in genX_pack.h always_inline

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago anv/pipeline: Assume layout != NULL
Jason Ekstrand [Wed, 6 Nov 2019 17:19:00 +0000 (11:19 -0600)]
anv/pipeline: Assume layout != NULL

In the early days of the driver we allowed layout to be VK_NULL_HANDLE
and used that for some internal pipelines when we wanted to be lazy.
Vulkan doesn't actually allow NULL layouts, however, so there's no
reason to have this check.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years ago intel/compiler: remove old comment
Italo Nicola [Fri, 8 Nov 2019 14:29:59 +0000 (11:29 -0300)]
intel/compiler: remove old comment

This comment was correct some time ago, but since commit
d3c10ad42729c1fe74a7f7c67465bd2, it isn't true anymore.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
4 years ago pan/midgard: Use shader stage in mir_op_computes_derivative
Alyssa Rosenzweig [Mon, 18 Nov 2019 13:02:58 +0000 (08:02 -0500)]
pan/midgard: Use shader stage in mir_op_computes_derivative

A 'normal' texture op may be emitted in a vertex shader on T720 but it
still doesn't take any derivatives.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago i965: Unify CC_STATE and BLEND_STATE atoms on Haswell as a workaround
Danylo Piliaiev [Thu, 14 Nov 2019 13:36:27 +0000 (15:36 +0200)]
i965: Unify CC_STATE and BLEND_STATE atoms on Haswell as a workaround

Re-emitting 3DSTATE_CC_STATE_POINTERS after emitting
3DSTATE_BLEND_STATE_POINTERS fixes the shadow flickering in
SuperTuxKart and Tropico 6 which was seen only on Haswell.
The reason for this is unknown and the fix was found empirically.

The closest mention in PRM is that it should improve performance.
From the HSW PRM, volume 2b, page 823 (3DSTATE_BLEND_STATE_POINTERS):
 "When the BLEND_STATE pointer changes but not the CC_STATE pointer,
  driver needs to force a CC_STATE pointer change to improve
  blend performance in pixel backend."

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1834
Fixes: eca4a654 ("i965: Disable dual source blending when shader doesn't support it on gen8+")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago radv: implement VK_AMD_device_coherent_memory
Samuel Pitoiset [Wed, 13 Nov 2019 07:58:37 +0000 (08:58 +0100)]
radv: implement VK_AMD_device_coherent_memory

This extension adds the device coherent and device uncached memory
types. It's known to be slower than non-device coherent memory but
it might be useful for debugging.

This is only exposed for chips that support L2 uncached.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago ac: add radeon_info::has_l2_uncached
Samuel Pitoiset [Tue, 12 Nov 2019 16:17:21 +0000 (17:17 +0100)]
ac: add radeon_info::has_l2_uncached

For chips that have uncached device memory (i.e. MTYPE_UC).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years ago radeonsi: enable mesa_glthread for GfxBench
Pierre-Eric Pelloux-Prayer [Tue, 29 Oct 2019 14:58:04 +0000 (15:58 +0100)]
radeonsi: enable mesa_glthread for GfxBench

It improves offscreen tests performance.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years ago pan/midgard: Represent ld/st offset unpacked
Alyssa Rosenzweig [Fri, 15 Nov 2019 19:19:34 +0000 (14:19 -0500)]
pan/midgard: Represent ld/st offset unpacked

This simplifies manipulation of the offsets dramatically, fixing some
UBO access related bugs.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago pan/midgard: Fix masks/alignment for 64-bit loads
Alyssa Rosenzweig [Fri, 15 Nov 2019 20:16:53 +0000 (15:16 -0500)]
pan/midgard: Fix masks/alignment for 64-bit loads

These need to be handled with special care.

Oh, Midgard, you're *extra* special.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago pan/midgard: Expose more typesize helpers
Alyssa Rosenzweig [Fri, 15 Nov 2019 20:16:28 +0000 (15:16 -0500)]
pan/midgard: Expose more typesize helpers

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago pan/midgard: Implement non-aligned UBOs
Alyssa Rosenzweig [Fri, 15 Nov 2019 19:13:18 +0000 (14:13 -0500)]
pan/midgard: Implement non-aligned UBOs

The field is more fine-grained than we had assumed.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago etnaviv: rs: upsampling is not supported
Christian Gmeiner [Sat, 9 Nov 2019 18:27:54 +0000 (19:27 +0100)]
etnaviv: rs: upsampling is not supported

This change makes it possible to support different downsample cases
like 4 -> 2 or 4 -> 1.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
4 years ago freedreno/registers: fix a6xx_2d_blit_cntl ROTATE
Jonathan Marek [Fri, 15 Nov 2019 20:34:09 +0000 (15:34 -0500)]
freedreno/registers: fix a6xx_2d_blit_cntl ROTATE

A change from b7093882 got overwritten by 610c8c93

Fixes: 610c8c93 ("freedreno/registers: Update with GS, HS and DS registers")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years ago freedreno/ir3: disable texture prefetch for 1d array textures
Jonathan Marek [Fri, 15 Nov 2019 17:38:28 +0000 (12:38 -0500)]
freedreno/ir3: disable texture prefetch for 1d array textures

Prefetch only supports the basic 2D texture case. Checking is_array is
needed because 1d array textures pass the coord num_components==2 test.
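
A small sketch of why the component-count test alone is not enough. `struct tex_info` and `can_prefetch()` are hypothetical simplifications, not the ir3 pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down view of a texture instruction. */
struct tex_info {
   unsigned coord_components;  /* number of coordinate components */
   bool is_array;              /* last component is an array layer */
};

/* A 1D array texture also has two coordinate components (x, layer),
 * so it has to be rejected explicitly. */
static bool
can_prefetch(const struct tex_info *tex)
{
   assert(tex != NULL);
   return tex->coord_components == 2 && !tex->is_array;
}
```

Under these assumptions a plain 2D texture is eligible, while a 1D array texture with the same component count is not.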

Fixes: 2a0d45ae ("freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
4 years ago lima: Parse VS and PLBU command stream while making a dump
Andreas Baierl [Fri, 25 Oct 2019 14:38:52 +0000 (16:38 +0200)]
lima: Parse VS and PLBU command stream while making a dump

This makes the streams more readable and comparable with the blob's parser
as it parses the VS and PLBU stream and shows the currently known values.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
4 years ago lima: Beautify stream dumps
Andreas Baierl [Thu, 17 Oct 2019 10:21:13 +0000 (12:21 +0200)]
lima: Beautify stream dumps

Change the dump so that the output looks more like the output of
mali-syscall-tracker [1].
This is a preparation for a more detailed stream analysis.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
[1]: https://gitlab.freedesktop.org/lima/mali-syscall-tracker

4 years ago clover/llvm: fix build after llvm 10 commit 1dfede3122ee
Aaron Watry [Fri, 15 Nov 2019 04:44:02 +0000 (22:44 -0600)]
clover/llvm: fix build after llvm 10 commit 1dfede3122ee

CodeGenFileType moved from ::llvm::TargetMachine in
llvm/Target/TargetMachine.h to ::llvm:: in llvm/Support/CodeGen.h

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
4 years ago android: util/format: fix include path list
Mauro Rossi [Fri, 15 Nov 2019 22:54:52 +0000 (23:54 +0100)]
android: util/format: fix include path list

To avoid the following build error:

out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_util_intermediates/format/u_format_table.c:30:10:
fatal error: 'u_format.h' file not found
         ^~~~~~~~~~~~
1 error generated.

Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
4 years ago android: radeonsi: fix build error due to wrong u_format.csv file path
Mauro Rossi [Fri, 15 Nov 2019 22:13:49 +0000 (23:13 +0100)]
android: radeonsi: fix build error due to wrong u_format.csv file path

GEN10_FORMAT_TABLE_INPUTS requires correction of the u_format.csv file path
in order to avoid the following build error:

ninja: error: 'external/mesa/util/format/u_format.csv',
needed by 'out/target/product/x86_64/gen/STATIC_LIBRARIES/libmesa_pipe_radeonsi_intermediates/radeonsi/gfx10_format_table.h',
missing and no known rule to make it

Fixes: 882ca6d ("util: Move gallium's PIPE_FORMAT utils to /util/format/")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
4 years ago mesa/st: Reuse st_choose_matching_format from st_choose_format().
Eric Anholt [Tue, 17 Sep 2019 19:39:23 +0000 (12:39 -0700)]
mesa/st: Reuse st_choose_matching_format from st_choose_format().

We had this ad-hoc exact size matching for unsized internalformats,
but st_choose_matching_format() can do exactly what we want.  This
means that, for example, we'll now prefer the matching ordering for
565/565_REV if the driver supports both orders.  We also pass
Unpack.SwapBytes through from ChooseTextureFormat so that we can hit
the memcpy path for 8888 formats when that flag is set.

Some interesting format choice changes from this (on softpipe):
intf/form/type        before            after
----------------------------------------------------
RGBA/RGBA/USHORT:     R8G8B8A8_UNORM -> RGBA_UNORM16
RGB/RGBA/8888:        X8B8G8R8_UNORM -> R8G8B8X8_UNORM
RGB/ABGR/8888_REV:    X8B8G8R8_UNORM -> R8G8B8X8_UNORM
RGBA/RGBA/5551:       B5G5R5A1_UNORM -> A1B5G5R5_UNORM
RGBA/RGBA/4444:       R8G8B8A8_UNORM -> A4B4G4R4_UNORM
RGBA/GL_RGBA/1010102: R8G8B8A8_UNORM -> A2B10G10R10_UNORM
DEPTH/DEPTH/UINT:     Z24X8          -> Z_UNORM32
DEPTH/DEPTH/USHORT:   Z24X8          -> Z_UNORM16

v2: Make sure that the baseformat still matches.  v1 would pick
    MESA_FORMAT_L16_UNORM for RED/LUMINANCE/SHORT, when we clearly
    want a red format.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago mesa: Don't put sRGB formats in the array format table.
Eric Anholt [Fri, 15 Nov 2019 01:41:27 +0000 (17:41 -0800)]
mesa: Don't put sRGB formats in the array format table.

sRGB vs unorm was the only conflict case being guarded against in this
function.  Before the PIPE_FORMAT conversion, we always listed the
unorm before the sRGB in the enums, but PIPE_FORMAT_A8B8G8R8_SRGB
happens to be before _UNORM.  We always want the unorm result here.
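
To illustrate, here is a toy table builder that skips sRGB entries so the unorm format is chosen regardless of enum order. The enums, `format_infos` and `build_array_format_table()` are a hypothetical miniature, not Mesa's format tables.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature format list. The sRGB entry deliberately comes
 * before the unorm one, mirroring the enum-ordering problem above. */
enum format {
   FORMAT_NONE,
   FORMAT_A8B8G8R8_SRGB,
   FORMAT_A8B8G8R8_UNORM,
   FORMAT_R8_UNORM,
   FORMAT_COUNT
};

enum array_layout { ARRAY_ABGR8, ARRAY_R8, ARRAY_LAYOUT_COUNT };

struct format_info {
   enum array_layout layout;
   bool is_srgb;
};

static const struct format_info format_infos[FORMAT_COUNT] = {
   [FORMAT_A8B8G8R8_SRGB]  = { ARRAY_ABGR8, true },
   [FORMAT_A8B8G8R8_UNORM] = { ARRAY_ABGR8, false },
   [FORMAT_R8_UNORM]       = { ARRAY_R8, false },
};

/* Map each array layout to one format, never an sRGB one. */
static void
build_array_format_table(enum format table[ARRAY_LAYOUT_COUNT])
{
   for (int i = 0; i < ARRAY_LAYOUT_COUNT; i++)
      table[i] = FORMAT_NONE;

   for (int f = FORMAT_NONE + 1; f < FORMAT_COUNT; f++) {
      const struct format_info *info = &format_infos[f];

      if (info->is_srgb)
         continue;

      /* With sRGB filtered out, no two formats share a layout here. */
      assert(table[info->layout] == FORMAT_NONE);
      table[info->layout] = (enum format)f;
   }
}
```

Filtering by colorspace rather than relying on "first entry wins" keeps the result independent of how the enum happens to be ordered.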

Fixes: 807a800d8c3e ("mesa: Redefine MESA_FORMAT_* in terms of PIPE_FORMAT_*.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago mesa/st: Simplify st_choose_matching_format().
Eric Anholt [Tue, 17 Sep 2019 22:13:30 +0000 (15:13 -0700)]
mesa/st: Simplify st_choose_matching_format().

We now have a nice helper function for finding those memcpy formats,
without needing to go through each entry of the mesa format table to
see if it happens to match.

While looking at sysprof of a softpipe GLES2 CTS run, we were spending
~8% of the CPU on ChooseTextureFormat.  With this, roughly the same
region of the testsuite was .4%.

v2: Add Ken's fix for canonicalizing array formats.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago mesa: Handle GL_COLOR_INDEX in _mesa_format_from_format_and_type().
Kenneth Graunke [Wed, 13 Nov 2019 07:12:54 +0000 (23:12 -0800)]
mesa: Handle GL_COLOR_INDEX in _mesa_format_from_format_and_type().

Just return MESA_FORMAT_NONE to avoid triggering unreachable; there's
really no sensible thing to return for this case anyway.

This prevents regressions in the next commit, which makes st/mesa
start using this function to find a reasonable format from GL format
and type enums.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years ago pan/midgard: Use generic constant packing for 8/64-bit
Alyssa Rosenzweig [Tue, 5 Nov 2019 14:06:41 +0000 (09:06 -0500)]
pan/midgard: Use generic constant packing for 8/64-bit

Eventually, we will want to combine constants across types, but for now
let's not break the world.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago pan/midgard: Pack 64-bit swizzles
Alyssa Rosenzweig [Tue, 5 Nov 2019 13:50:29 +0000 (08:50 -0500)]
pan/midgard: Pack 64-bit swizzles

64-bit ops have their own funky swizzles. Let's pack them, both for
native 64-bit sources as well as extended 32-bit sources.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago pan/midgard: Fix mir_round_bytemask_down for !32b
Alyssa Rosenzweig [Tue, 5 Nov 2019 03:21:47 +0000 (22:21 -0500)]
pan/midgard: Fix mir_round_bytemask_down for !32b

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago pan/midgard: Implement i2i64 and u2u64
Alyssa Rosenzweig [Tue, 5 Nov 2019 03:21:20 +0000 (22:21 -0500)]
pan/midgard: Implement i2i64 and u2u64

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago pan/midgard: Expand 64-bit writemasks
Alyssa Rosenzweig [Tue, 5 Nov 2019 03:20:59 +0000 (22:20 -0500)]
pan/midgard: Expand 64-bit writemasks

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years ago radeonsi/nir: don't lower fma, instead, fuse fma
Marek Olšák [Wed, 13 Nov 2019 05:21:54 +0000 (00:21 -0500)]
radeonsi/nir: don't lower fma, instead, fuse fma

We want fma. This decreases compile times by 4% for Borderlands 2.

48505 shaders in 30515 tests
Totals:
SGPRS: 2206584 -> 2204784 (-0.08 %)
VGPRS: 1647892 -> 1648964 (0.07 %)
Spilled SGPRs: 6256 -> 6078 (-2.85 %)
Spilled VGPRs: 72 -> 72 (0.00 %)
Private memory VGPRs: 2176 -> 2176 (0.00 %)
Scratch size: 2240 -> 2240 (0.00 %) dwords per thread
Code Size: 49680804 -> 49837988 (0.32 %) bytes
LDS: 74 -> 74 (0.00 %) blocks
Max Waves: 371387 -> 371352 (-0.01 %)

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago radeonsi/nir: call nir_lower_flrp only once per shader
Marek Olšák [Wed, 13 Nov 2019 04:41:23 +0000 (23:41 -0500)]
radeonsi/nir: call nir_lower_flrp only once per shader

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago radeonsi/nir: remove dead function temps
Marek Olšák [Sat, 9 Nov 2019 01:16:20 +0000 (20:16 -0500)]
radeonsi/nir: remove dead function temps

glxgears has dead temps after lowering color inputs to load intrinsics.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago gallium/noop: call finalize_nir
Marek Olšák [Thu, 14 Nov 2019 02:20:55 +0000 (21:20 -0500)]
gallium/noop: call finalize_nir

For measuring st/mesa compile time.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years ago panfrost: Make sure the shader descriptor is in sync with the GL state
Tomeu Vizoso [Tue, 12 Nov 2019 12:48:54 +0000 (13:48 +0100)]
panfrost: Make sure the shader descriptor is in sync with the GL state

State was leaking from previous frames as we weren't updating the
descriptor in all cases.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
4 years ago pan/midgard: Prioritize texture registers
Alyssa Rosenzweig [Wed, 13 Nov 2019 14:00:37 +0000 (09:00 -0500)]
pan/midgard: Prioritize texture registers

On newer GPUs, this is a no-op. On older GPUs, this prevents needless
spilling since texture registers are shared with a subset of work
registers.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
4 years ago pan/midgard: Disassemble with old pipeline always on T720
Alyssa Rosenzweig [Wed, 13 Nov 2019 13:48:12 +0000 (08:48 -0500)]
pan/midgard: Disassemble with old pipeline always on T720

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
4 years agopan/midgard: Use texture, not textureLod, on early Midgard
Alyssa Rosenzweig [Mon, 11 Nov 2019 13:19:18 +0000 (08:19 -0500)]
pan/midgard: Use texture, not textureLod, on early Midgard

We have to disable the fixup.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
4 years agopan/midgard: Fix vertex texturing on early Midgard
Alyssa Rosenzweig [Mon, 11 Nov 2019 13:15:46 +0000 (08:15 -0500)]
pan/midgard: Fix vertex texturing on early Midgard

We use a different set of texture registers, probably to save hardware.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
4 years agopan/midgard: Generalize texture registers across GPUs
Alyssa Rosenzweig [Mon, 11 Nov 2019 13:13:46 +0000 (08:13 -0500)]
pan/midgard: Generalize texture registers across GPUs

Early Midgard uses a different set of texture registers; let's not
hardcode.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tested-by: Andre Heider <a.heider@gmail.com>
4 years agoaco: implement VK_KHR_shader_float_controls
Rhys Perry [Sat, 9 Nov 2019 20:51:45 +0000 (20:51 +0000)]
aco: implement VK_KHR_shader_float_controls

This actually supports more of the extension than the LLVM backend but we
can't enable it because ACO doesn't work with all stages yet.

With more of it enabled, some CTS tests fail because our 64-bit sqrt
is very imprecise. I can't find any precision requirements for it
anywhere, so I'm thinking it might be a CTS issue.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: fix 64-bit fsign with 0
Rhys Perry [Mon, 11 Nov 2019 14:19:51 +0000 (14:19 +0000)]
aco: fix 64-bit fsign with 0

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: don't combine literals into v_cndmask_b32/v_subb/v_addc
Rhys Perry [Mon, 11 Nov 2019 14:15:04 +0000 (14:15 +0000)]
aco: don't combine literals into v_cndmask_b32/v_subb/v_addc

No pipeline-db changes

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoradv: enable FP16/FP64 denormals earlier and only for LLVM
Rhys Perry [Mon, 11 Nov 2019 13:41:32 +0000 (13:41 +0000)]
radv: enable FP16/FP64 denormals earlier and only for LLVM

ACO sets this itself and will have to set it differently in the future to
support shaderDenormFlushToZeroFloat64.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agogitlab-ci: Organize images using new REPO_SUFFIX templates feature
Michel Dänzer [Mon, 11 Nov 2019 17:13:28 +0000 (18:13 +0100)]
gitlab-ci: Organize images using new REPO_SUFFIX templates feature

Two benefits:

Most docker image related environment variables can now be defined in
the jobs where they're used instead of globally. The DEBIAN_TAG values
are propagated to other jobs via YAML anchors.

Images on https://gitlab.freedesktop.org/mesa/mesa/container_registry
are now organized in separate repositories with a suffix matching the
name of the job which makes sure the image is there.

Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
4 years agogitlab-ci: Rename container install scripts to match job names (better)
Michel Dänzer [Thu, 7 Nov 2019 19:08:03 +0000 (20:08 +0100)]
gitlab-ci: Rename container install scripts to match job names (better)

Cleans up .gitlab-ci/ a little, and allows using a single DEBIAN_EXEC
line for all container jobs.

v2:
* Use lava_arm.sh instead of arm_lava.sh for consistency with v2 of the
  previous change

Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
4 years agogitlab-ci: Use functional container job names
Michel Dänzer [Wed, 13 Nov 2019 16:43:41 +0000 (17:43 +0100)]
gitlab-ci: Use functional container job names

This makes it easier to tell which job is which in a pipeline.

v2:
* Use lava_arm{64,hf} instead of arm{64,hf}_lava to keep these jobs
  together in pipeline overviews

Reviewed-by: Eric Anholt <eric@anholt.net> # v1
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
4 years agogitlab-ci: Document that ci-templates refs must be in sync
Michel Dänzer [Thu, 7 Nov 2019 19:25:10 +0000 (20:25 +0100)]
gitlab-ci: Document that ci-templates refs must be in sync

Otherwise there can be weird breakage.

(Removing the include from .gitlab-ci/lava-gitlab-ci.yml doesn't seem
possible unfortunately:
https://gitlab.freedesktop.org/daenzer/mesa/pipelines/79458)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
4 years agopanfrost: Multiply offset_units by 2
Tomeu Vizoso [Wed, 13 Nov 2019 07:42:34 +0000 (08:42 +0100)]
panfrost: Multiply offset_units by 2

Per the spec, the units passed to glPolygonOffset are to be multiplied
by an implementation-defined constant.

On Midgard, this constant seems to be 2.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
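The scaling described in this commit can be sketched as below. This is an illustration only, not the panfrost code: the macro and function names are hypothetical, and the factor of 2 is taken from the commit message, which itself says the constant only "seems to be 2" on Midgard.

```c
/* Assumed implementation-defined constant for Midgard, per the commit message. */
#define MIDGARD_OFFSET_UNITS_SCALE 2.0f

/* Convert the `units` argument of glPolygonOffset into the value that
 * would be programmed into the hardware (hypothetical helper). */
static float midgard_offset_units(float gl_units)
{
   return gl_units * MIDGARD_OFFSET_UNITS_SCALE;
}
```

For example, an application calling glPolygonOffset(factor, 1.0f) would end up with a hardware units value of 2.0f under this assumption.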
4 years agointel/perf: add EHL performance query support
Lionel Landwerlin [Wed, 30 Oct 2019 09:18:42 +0000 (11:18 +0200)]
intel/perf: add EHL performance query support

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
4 years agointel/dev: flag the Elkhart Lake platform
Lionel Landwerlin [Wed, 30 Oct 2019 08:59:35 +0000 (10:59 +0200)]
intel/dev: flag the Elkhart Lake platform

We'll use this for performance metrics which are different from ICL.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
4 years agogitlab-ci: update Piglit commit, update skips
Tapani Pälli [Fri, 15 Nov 2019 05:12:24 +0000 (07:12 +0200)]
gitlab-ci: update Piglit commit, update skips

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
4 years agomesa: allow bit queries for EXT_disjoint_timer_query
Tapani Pälli [Tue, 12 Nov 2019 11:43:21 +0000 (13:43 +0200)]
mesa: allow bit queries for EXT_disjoint_timer_query

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2090
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoradv: make sure to not clear the ds attachment after resolves
Samuel Pitoiset [Wed, 6 Nov 2019 12:55:08 +0000 (13:55 +0100)]
radv: make sure to not clear the ds attachment after resolves

To avoid overwriting the resolve if there are pending clear aspects,
same as for color resolves.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: remove useless RADV_DEBUG=unsafemath debug option
Samuel Pitoiset [Fri, 8 Nov 2019 07:22:15 +0000 (08:22 +0100)]
radv: remove useless RADV_DEBUG=unsafemath debug option

This option is useless and shouldn't be used at all.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agollvmpipe: Check thread creation errors
Nathan Kidd [Fri, 15 Nov 2019 01:35:11 +0000 (02:35 +0100)]
llvmpipe: Check thread creation errors

In the case of glibc, pthread_t is internally a pointer.  If
lp_rast_destroy() passes a 0-value pthread_t to pthread_join(), the
latter will SEGV dereferencing it.

pthread_create() can fail if either the user's ulimit -u or Linux
kernel's /proc/sys/kernel/threads-max is reached.

Choosing to continue, rather than fail, on the theory that it is better
to run with the one main thread than not run at all.

Keeping as many threads as we got, since lack of threads severely
degrades llvmpipe performance.

Signed-off-by: Nathan Kidd <nkidd@opentext.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
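A minimal sketch of the pattern this commit describes, assuming a fixed-size pool: stop at the first pthread_create() failure, remember how many threads were actually created, and join only those. It is not the llvmpipe implementation; worker_pool, pool_start, pool_stop and MAX_WORKERS are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORKERS 16   /* hypothetical upper bound for this sketch */

struct worker_pool {
   pthread_t threads[MAX_WORKERS];
   unsigned num_threads;          /* number of threads actually created */
};

static void *worker_main(void *arg)
{
   (void)arg;                     /* a real worker would do rasterization here */
   return NULL;
}

/* Try to start `wanted` workers and keep however many succeed. */
static unsigned pool_start(struct worker_pool *pool, unsigned wanted)
{
   assert(pool != NULL);
   if (wanted > MAX_WORKERS)
      wanted = MAX_WORKERS;

   pool->num_threads = 0;
   for (unsigned i = 0; i < wanted; i++) {
      int err = pthread_create(&pool->threads[i], NULL, worker_main, NULL);
      if (err != 0) {
         /* e.g. EAGAIN when ulimit -u or threads-max is reached */
         fprintf(stderr, "pthread_create failed: %s\n", strerror(err));
         break;
      }
      pool->num_threads++;
   }
   return pool->num_threads;
}

/* Join only the threads that were created, so no unset pthread_t
 * value is ever passed to pthread_join(). */
static void pool_stop(struct worker_pool *pool)
{
   assert(pool != NULL);
   for (unsigned i = 0; i < pool->num_threads; i++)
      pthread_join(pool->threads[i], NULL);
   pool->num_threads = 0;
}
```

A caller would treat a return value of 0 from pool_start() as "run everything on the main thread" rather than as a fatal error.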
4 years agollvmpipe: use ppc64le/ppc64 Large code model for JIT-compiled shaders
Ben Crocker [Wed, 13 Nov 2019 20:27:24 +0000 (20:27 +0000)]
llvmpipe: use ppc64le/ppc64 Large code model for JIT-compiled shaders

Large programs, e.g. gnome-shell and firefox, may tax the
addressability of the Medium code model once a (potentially unbounded)
number of dynamically generated JIT-compiled shader programs are
linked in and relocated.  Yet the default code model as of LLVM 8 is
Medium or even Small.

The cost of changing from Medium to Large is negligible:
- an additional 8-byte pointer stored immediately before the shader entrypoint;
- change an add-immediate (addis) instruction to a load (ld).

Testing with WebGL Conformance
(https://www.khronos.org/registry/webgl/sdk/tests/webgl-conformance-tests.html)
yields clean runs with this change (and crashes without it).

Testing with glxgears shows no detectable performance difference.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1753327, 1753789, 1543572, 1747110, and 1582226

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/223
Co-authored by: Nemanja Ivanovic <nemanjai@ca.ibm.com>, Tom Stellard <tstellar@redhat.com>

CC: mesa-stable@lists.freedesktop.org
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
4 years agoiris: Wrap iris_fix_edge_flags in NIR_PASS
Kenneth Graunke [Thu, 14 Nov 2019 18:22:17 +0000 (10:22 -0800)]
iris: Wrap iris_fix_edge_flags in NIR_PASS

So nir_validate happens properly.  Unfortunately this means we have
to play the metadata song and dance, so walk over all impls and say
that we didn't hurt anything.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoiris: Properly move edgeflag_out from output list to global list
Kenneth Graunke [Thu, 14 Nov 2019 18:10:27 +0000 (10:10 -0800)]
iris: Properly move edgeflag_out from output list to global list

When demoting it from an output to a global, we need to actually move
it to the correct list.  While here, we also refactor so it's clear
we aren't mutating the list while iterating.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2106
Fixes: f9fd04aca15 ("nir: Fix non-determinism in lower_global_vars_to_local")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
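The "find first, then move" shape mentioned in this commit can be illustrated with a plain singly linked list. This is only a sketch under that assumption: NIR uses its own list types, and struct var and move_var are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

struct var {
   const char *name;
   struct var *next;
};

/* Unlink the first variable called `name` from *from and push it onto *to.
 * The search loop only reads the list; the unlink happens after it ends,
 * so the list is never modified while it is being walked. */
static struct var *move_var(struct var **from, struct var **to, const char *name)
{
   struct var **link = from;
   while (*link && strcmp((*link)->name, name) != 0)
      link = &(*link)->next;

   struct var *found = *link;
   if (!found)
      return NULL;

   *link = found->next;   /* remove from the source list */
   found->next = *to;     /* insert at the head of the destination list */
   *to = found;
   return found;
}
```

Separating the lookup from the mutation makes it obvious that the iteration cannot skip or revisit an element because of the move.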
4 years agomesa: Move compile of common Mesa core files to a static lib.
Eric Anholt [Tue, 12 Nov 2019 19:25:09 +0000 (11:25 -0800)]
mesa: Move compile of common Mesa core files to a static lib.

We were compiling them twice, costing extra build time.  Reduces my
ccache-hot clean build time by a second (24.3s to 23.3s, 3 runs each).

The windows args are a little strange -- it's not clear to me that
they're actually used for building these files, but keep them in place
just in case, since we don't have a good windows CI story yet.  We
should want them on both gallium and classic regardless: Only osmesa
could be built for windows in classic, and classic OSMesa's scons
build defines these flags too.

Closes: #2052
Acked-by: Dylan Baker <dylan@pnwbakers.com>
4 years agoAppveyor: Quickly fix meson build.
Prodea Alexandru-Liviu [Thu, 14 Nov 2019 21:45:23 +0000 (21:45 +0000)]
Appveyor: Quickly fix meson build.
As this required use of Python 3.8, mako module also had to be updated.

v2 - Unbind mako module version when using Meson.
Signed-off-by: Prodea Alexandru-Liviu <liviuprodea@yahoo.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
4 years agointel/fs: Do not lower large local arrays to scratch on gen7
Danylo Piliaiev [Tue, 12 Nov 2019 16:32:25 +0000 (18:32 +0200)]
intel/fs: Do not lower large local arrays to scratch on gen7

On gen7 and earlier the scratch space size is limited to 12kB.
By enabling this optimization we may easily exceed this limit
without having any fallback.

arb_compute_shader/linker/bug-93840.shader_test crashes with
this lowering on IVB due to exceeding scratch size limit.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2092
Fixes: 69244fc7
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agoutil: Move gallium's PIPE_FORMAT utils to /util/format/
Eric Anholt [Thu, 27 Jun 2019 22:05:31 +0000 (15:05 -0700)]
util: Move gallium's PIPE_FORMAT utils to /util/format/

To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium.  Since u_format used
util_copy_rect(), I moved that in there, too.

I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.

Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agogitlab-ci: auto-cancel CI runs when a newer commit is pushed to the same branch
Eric Engestrom [Tue, 12 Nov 2019 23:42:21 +0000 (23:42 +0000)]
gitlab-ci: auto-cancel CI runs when a newer commit is pushed to the same branch

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
4 years agoaco: Optimize out trivial code from uniform bools.
Timur Kristóf [Tue, 5 Nov 2019 10:41:00 +0000 (11:41 +0100)]
aco: Optimize out trivial code from uniform bools.

This should remove most of the excess code size that was
introduced by making all booleans per-lane.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: Treat all booleans as per-lane.
Timur Kristóf [Mon, 4 Nov 2019 18:28:08 +0000 (19:28 +0100)]
aco: Treat all booleans as per-lane.

Previously, instruction selection had two kinds of booleans:
1. divergent which was per-lane and stored in s2 (VCC size)
2. uniform which was stored in s1
Additionally, uniform booleans were made per-lane when they resulted
from operations which were supported only by the VALU.

To decide which type was used, we relied on the destination size,
which was not reliable due to the per-lane uniform bools, but it
mostly works on wave64.
However, in wave32 mode (where VCC is also s1) this approach
makes it impossible to keep track of which boolean is uniform and
which is divergent.

This commit makes all booleans per-lane.
The resulting excess code size will be taken care of by the optimizer.

v2 (by Daniel Schürmann):
- Better names for some functions
- Use s_andn2_b64 with exec for nir_op_inot
- Simplify code due to using s_and_b64 in bool_to_scalar_condition

v3 (by Timur Kristóf):
- Fix several subgroups regressions

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
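As a rough model of the per-lane representation described above, a boolean can be pictured as a 64-bit mask with one bit per lane, combined with the exec mask of active lanes. The sketch below is plain C for illustration, not ACO code; lane_mask, lane_bool_not and lane_bool_to_scalar are hypothetical names, and the mapping to s_andn2_b64 and s_and_b64 follows the v2 notes in the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lane_mask;   /* one bit per lane of a 64-wide wave */

/* Logical NOT of a per-lane boolean, restricted to active lanes
 * (modelled on s_andn2_b64: exec & ~b). */
static lane_mask lane_bool_not(lane_mask b, lane_mask exec)
{
   return exec & ~b;
}

/* Reduce a uniform per-lane boolean to a single scalar condition
 * (modelled on s_and_b64 with exec, where the scalar condition is
 * "result is non-zero"). Bits in inactive lanes are ignored. */
static bool lane_bool_to_scalar(lane_mask b, lane_mask exec)
{
   return (b & exec) != 0;
}
```

Masking with exec in both helpers means stale bits in inactive lanes never influence the result, which is what allows a uniform value to be stored in the same per-lane form as a divergent one.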
4 years agoaco: use s_and_b64 exec to reduce uniform booleans to one bit
Daniel Schürmann [Tue, 12 Nov 2019 10:40:28 +0000 (11:40 +0100)]
aco: use s_and_b64 exec to reduce uniform booleans to one bit

Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>