mesa.git
4 years agoandroid: radeonsi: fix build after vl refactoring (v2)
Mauro Rossi [Sat, 16 Nov 2019 17:39:31 +0000 (18:39 +0100)]
android: radeonsi: fix build after vl refactoring (v2)

vl functions moved from radeonsi to gallium/auxiliary/vl have left
android build of radeonsi in broken state.

libmesa_galliumvl static is need to build readeonsi,
gallium_dri building rules are reworked to avoid multiple symbols
and libmesa_galliumvl static dependency is needed in radeonsi.

Here is the changelog:
- android: gallium/auxiliary: add libmesa_galliumvl static
- android: gallium_dri: move libmesa_gallium to static to prevent multiple symbols
- android: radeonsi: fix build after vl refactoring

Fixes the following building error:

external/mesa/src/gallium/drivers/radeonsi/si_uvd.c:47:
error: undefined reference to 'vl_video_buffer_create_as_resource'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)

Fixes: 86e60bc ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agointel/compiler: force simd8 when dual src blending on gen8
Tapani Pälli [Mon, 2 Dec 2019 14:54:30 +0000 (16:54 +0200)]
intel/compiler: force simd8 when dual src blending on gen8

Patch introduces option to force simd8 and uses it as a workaround for
dual source blending issues seen with skqp (skia testsuite) on gen8.

Fixes following Piglit test on gen8 platforms:
   arb_blend_func_extended-dual-src-blending-issue-1917

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1917
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
c: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agointel/compiler: add newline to limit_dispatch_width message
Tapani Pälli [Wed, 4 Dec 2019 06:04:21 +0000 (08:04 +0200)]
intel/compiler: add newline to limit_dispatch_width message

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agoturnip: Add support for compute shaders.
Eric Anholt [Wed, 27 Nov 2019 04:37:19 +0000 (20:37 -0800)]
turnip: Add support for compute shaders.

Since compute shares the FS state with graphics, we have to re-upload the
pipeline state when switching between compute dispatch and graphics draws.
We could potentially expose graphics and compute as separate queues and
then we wouldn't need pipeline state management, but the closed driver
exposes a single queue and consistency with them is probably good.

So far I'm emitting texture/ibo state as IBs that we jump to.  This is
kind of silly when we could just emit it directly in our CS, but that's a
refactor we can do later.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoturnip: Move pipeline BO list adding to BindPipeline.
Eric Anholt [Wed, 4 Dec 2019 20:22:55 +0000 (12:22 -0800)]
turnip: Move pipeline BO list adding to BindPipeline.

We only need to do it once when we bind, rather than having to check at
every draw call.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoturnip: Sanity check that we're adding valid BOs to the list.
Eric Anholt [Wed, 4 Dec 2019 20:21:50 +0000 (12:21 -0800)]
turnip: Sanity check that we're adding valid BOs to the list.

I tripped over this during CS enabling when my program BO wasn't set up.
Easier to debug this way than the kernel telling us a 0 handle is invalid.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoturnip: Add a helper function for getting tu_buffer iovas.
Eric Anholt [Wed, 4 Dec 2019 21:13:16 +0000 (13:13 -0800)]
turnip: Add a helper function for getting tu_buffer iovas.

Easier than remembering to add all 3 offsets.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoturnip: Refactor the graphics pipeline create implementation.
Eric Anholt [Wed, 27 Nov 2019 00:42:24 +0000 (16:42 -0800)]
turnip: Refactor the graphics pipeline create implementation.

The loop over the pipelines to create (and the failure handling) was
noisy, and the stub for compute setup looked nicer to me.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoturnip: Add basic SSBO support.
Eric Anholt [Mon, 2 Dec 2019 22:32:53 +0000 (14:32 -0800)]
turnip: Add basic SSBO support.

This is enough to pass
dEQP-VK.binding_model.shader_access.primary_cmd_buf.storage_buffer.fragment.single_descriptor.*
with fragmentStoresAndAtomics set, and thus to be able to start working on
compute.  I haven't enabled that flag yet, because it also implies image
load/store support, which I haven't filled in.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoturnip: Reuse tu6_stage2opcode() more.
Eric Anholt [Tue, 3 Dec 2019 00:44:52 +0000 (16:44 -0800)]
turnip: Reuse tu6_stage2opcode() more.

A bit of cleanup for adding more stages later.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoturnip: Drop redefinition of VALIDREG now that it's in ir3.h.
Eric Anholt [Wed, 4 Dec 2019 22:15:42 +0000 (14:15 -0800)]
turnip: Drop redefinition of VALIDREG now that it's in ir3.h.

Fixes: 937b9055698b ("freedreno/ir3: fix neverball assert in case of unused VS inputs")
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoturnip: Fix unused variable warnings.
Eric Anholt [Tue, 26 Nov 2019 19:15:12 +0000 (11:15 -0800)]
turnip: Fix unused variable warnings.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoglsl: make use of active_shader_mask when building resource list
Timothy Arceri [Tue, 3 Dec 2019 13:24:35 +0000 (00:24 +1100)]
glsl: make use of active_shader_mask when building resource list

This allows us to avoid walking the entire IR looking for used
uniforms.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
4 years agoglsl: don't set uniform block as used when its not
Timothy Arceri [Tue, 3 Dec 2019 13:14:03 +0000 (00:14 +1100)]
glsl: don't set uniform block as used when its not

The spec requires unused uniform block to be set as active in the
program resource list. To support this we tell opt dead code not to
remove them. However we can mark them as unused internally and
avoid unnecessarily state changes.

This change is also required for the folowing clean-up patch.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
4 years agoglsl: move calculate_array_size_and_stride() to link_uniforms.cpp
Timothy Arceri [Tue, 3 Dec 2019 04:04:14 +0000 (15:04 +1100)]
glsl: move calculate_array_size_and_stride() to link_uniforms.cpp

This is where all the other uniform values are populated so it
makes much more sense here. Moving it will also allow us to better
share code between the NIR and GLSL IR resource list builders.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
4 years agoanv: Fix error message format string
Ian Romanick [Tue, 19 Nov 2019 03:53:57 +0000 (19:53 -0800)]
anv: Fix error message format string

See also 246261f0addf

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455892
Fixes: 246261f0add ("anv: prepare the driver for delayed submissions")
4 years agomesa: Silence unused parameter warning
Ian Romanick [Tue, 19 Nov 2019 03:42:22 +0000 (19:42 -0800)]
mesa: Silence unused parameter warning

Unused since e4da8b9c331 ("mesa/compiler: rework tear down of
builtin/types").

src/mesa/main/context.c: In function ‘_mesa_free_context_data’:
src/mesa/main/context.c:1321:54: warning: unused parameter ‘destroy_compiler_types’ [-Wunused-parameter]
 1321 | _mesa_free_context_data(struct gl_context *ctx, bool destroy_compiler_types)
      |                                                      ^

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agomesa: Silence 'left shift of negative value' warning in BPTC compression code
Ian Romanick [Tue, 19 Nov 2019 03:33:06 +0000 (19:33 -0800)]
mesa: Silence 'left shift of negative value' warning in BPTC compression code

src/util/format/../../mesa/main/texcompress_bptc_tmp.h:830:31: warning: left shift of negative value [-Wshift-negative-value]
  830 |       value |= (~(int32_t) 0) << n_bits;
      |                               ^~

v2: Rewrite to just shift left then shift right.  Based on conversation
with Neil in
https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2792#note_320272,
this should be fine.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> [v1]
Reviewed-by: Neil Roberts <nroberts@igalia.com>
4 years agointel/compiler: Fix 'comparison is always true' warning
Ian Romanick [Tue, 19 Nov 2019 03:16:23 +0000 (19:16 -0800)]
intel/compiler: Fix 'comparison is always true' warning

Without looking at the assembly or something, I'm not sure what the
compiler does here.  The brw_reg_type enum is marked packed, so I'm
guess that it gets represented as a uint8_t.  That's the only reason I
could think that comparing with -1 would be always true.

This patch adds the same cast that exists in brw_hw_type_to_reg_type.
It might be better to add a #define outside the enum for
BRW_REGISTER_TYPE_INVALID as (enum brw_reg_type)-1.

src/intel/compiler/brw_eu_compact.c: In function ‘has_immediate’:
src/intel/compiler/brw_eu_compact.c:1515:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
 1515 |       return *type != -1;
      |                    ^~
src/intel/compiler/brw_eu_compact.c:1518:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
 1518 |       return *type != -1;
      |                    ^~

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455194
Fixes: 12d3b11908e ("intel/compiler: Add instruction compaction support on Gen12")
Cc: @mattst88
4 years agodocs: Update mesa 19.3 release calendar
Dylan Baker [Wed, 4 Dec 2019 22:40:10 +0000 (14:40 -0800)]
docs: Update mesa 19.3 release calendar

4 years agodocs: update calendar, add news item and link release notes for 19.2.7
Dylan Baker [Wed, 4 Dec 2019 22:38:48 +0000 (14:38 -0800)]
docs: update calendar, add news item and link release notes for 19.2.7

4 years agodocs: Add SHA256 sums for 19.2.7
Dylan Baker [Wed, 4 Dec 2019 22:36:13 +0000 (14:36 -0800)]
docs: Add SHA256 sums for 19.2.7

4 years agodocs: Add release notes for 19.2.7
Dylan Baker [Wed, 4 Dec 2019 21:47:44 +0000 (13:47 -0800)]
docs: Add release notes for 19.2.7

4 years agoturnip: allow writes to draw_cs outside of render pass
Jonathan Marek [Wed, 4 Dec 2019 19:29:58 +0000 (14:29 -0500)]
turnip: allow writes to draw_cs outside of render pass

This is for state commands like CmdSetViewport that can be used outside of
a renderpass. Accumulating those into draw_cs outside of the renderpass
should have the desired effect.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agonir/lower_clip: Fix incorrect driver loc for clipdist outputs
Rob Clark [Wed, 4 Dec 2019 00:28:26 +0000 (16:28 -0800)]
nir/lower_clip: Fix incorrect driver loc for clipdist outputs

Somehow adjusting maxloc based on existing outputs got lost, resulting
in the clipdist varying clobbering the position varying.  Causing a
shader that had no position output in freedreno/ir3, which triggers GPU
hangs in neverball.

Fixes: d0f746b6458 ("nir: Save nir_variable pointers in nir_lower_clip_vs rather than locs.")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/ir3: fix neverball assert in case of unused VS inputs
Rob Clark [Tue, 3 Dec 2019 21:44:35 +0000 (13:44 -0800)]
freedreno/ir3: fix neverball assert in case of unused VS inputs

The logic to ensure VS and BS inputs are aligned wasn't accounting for
unused inputs in VS.  This *usually* doesn't happen, but it seems it
can in the case of ARB programs?

Fixes assert:
```
fd6_program_create: Assertion `bs->inputs[i].regid == vs->inputs[i].regid' failed.
```

Fixes: 882d53d8e36 ("freedreno/ir3+a6xx: same VBO state for draw/binning")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agofreedreno/ir3: remove store_output lowered to store_shared_ir3
Rob Clark [Wed, 4 Dec 2019 18:15:39 +0000 (10:15 -0800)]
freedreno/ir3: remove store_output lowered to store_shared_ir3

Fixes crashes that were unnoticed in CI because debug_assert() was not
enabled (but become real crashes after the next patch):

dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.ivec2_highp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.ivec2_lowp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.ivec2_mediump_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.uvec2_highp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.uvec2_lowp_geometry
dEQP-GLES31.functional.shaders.builtin_functions.integer.bitfieldextract.uvec2_mediump_geometry

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agoiris: Add restriction to 3DSTATE_CONSTANT_ packets.
Rafael Antognolli [Tue, 3 Dec 2019 19:15:38 +0000 (11:15 -0800)]
iris: Add restriction to 3DSTATE_CONSTANT_ packets.

The following programming note shows up in all 3DSTATE_CONSTANT_*
packets:

   "The sum of all four read length fields must be less than or equal to
   the size of 64."

The backend compiler should guarantee this for us, so let's just add a
check here.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agoanv: Use 3DSTATE_CONSTANT_ALL when possible.
Rafael Antognolli [Tue, 26 Nov 2019 17:42:06 +0000 (09:42 -0800)]
anv: Use 3DSTATE_CONSTANT_ALL when possible.

Use this new instruction introduced in Gen12. The instruction itself is
smaller, and it also allows us to emit a single instruction to all
stages that have the same push constant buffers (e.g. when they don't
have constant buffers).

There's one restriction to use this instruction, though: the length
field is only 5 bits long, so we need to check whether we can use it,
and fallback to the old 3DSTATE_CONSTANT_XS if that field is >= 32.

v2:
 - Rebased on top of the lasted changes from Jason.
 - Added review suggestions by Caio.
 - Removed struct push_bos and merged some code into
 anv_nir_compute_push_layout().

v3:
 - Remove code churn due to gen8+ workaround in
 anv_nir_compute_push_layout(). This code has been removed in an earlier
 commit, and implemented in cmd_buffer_emit_push_constant().

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agoanv: Move code for emitting push constants into its own function.
Rafael Antognolli [Tue, 26 Nov 2019 21:07:41 +0000 (13:07 -0800)]
anv: Move code for emitting push constants into its own function.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agoanv: Add get_push_range_address() helper.
Rafael Antognolli [Tue, 26 Nov 2019 21:05:06 +0000 (13:05 -0800)]
anv: Add get_push_range_address() helper.

Add a helper function to get the push range address. Once we have a
separate function for emitting gen12 push constants, we can use this
helper and avoid duplicating code.

v3: Do not add range->start to the address in gen7 (Caio).
v4: Do not drop range->start from gen7 (Caio, Jason).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agoanv: Move gen8+ push constant packet workaround.
Rafael Antognolli [Mon, 2 Dec 2019 21:41:32 +0000 (13:41 -0800)]
anv: Move gen8+ push constant packet workaround.

Store push_ranges in ascending order, and only "shift" them to the end
of the array during state packet emission.

We don't need this workaround with the new 3DSTATE_CONSTANT_ALL packet.
So instead of applying the workaround here just for GEN < 12 (which
requires and extra loop through all the ranges to figure out if we
should shift them or not), we simply move the whole logic to the state
emission code. At that point, in a later commit, we are already looping
through all of the ranges anyway to check which packet we will be using,
so we might as well implement the workaround there, where it is going to
be used.

v3: Move gen8+ workaround to the state emission code (Caio).
v4: Add explanation of why we moved the workaroudn (Caio).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agoiris: Use 3DSTATE_CONSTANT_ALL when possible.
Rafael Antognolli [Mon, 23 Sep 2019 20:25:01 +0000 (13:25 -0700)]
iris: Use 3DSTATE_CONSTANT_ALL when possible.

Use this new instruction introduced in Gen12. The instruction itself is
smaller, and it also allows us to emit a single instruction to all
stages that have the same push constant buffers (e.g. when they don't
have constant buffers).

There's one restriction to use this instruction, though: the length
field is only 5 bits long, so we need to check whether we can use it,
and fallback to the old 3DSTATE_CONSTANT_XS if that field is >= 32.

v2 (Suggestions from Caio):
 - use max_length instead of large_buffers.
 - remove UNUSED and use #if GEN_GEN >= 12 instead.
 - inline "buffers" and drop BITSET_RANGE() usage.
 - add assert(n <= max_pointers)
 - move emit to outside of the loop.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agoiris: Rework push constants emitting code.
Rafael Antognolli [Mon, 23 Sep 2019 17:15:52 +0000 (10:15 -0700)]
iris: Rework push constants emitting code.

Split into a function the logic to gather the push constant buffers,
which now stores them in struct push_bos. Another function is added to
emit the packet, using data from the push_bos struct.

This will be useful when adding a new function for emitting push
constants for newer platforms.

v2 (Suggestions from Caio):
   - rename 'n' -> 'buffer_count'
   - remove large_buffers (for now)
   - initialize push_bos
   - remove assert
   - change for() condition (i <= 3 -> i < 4)
v3:
   - Add comment about size limit.
   - Rework "shift" logic and 'for' loop.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/blorp: Use 3DSTATE_CONSTANT_ALL to setup push constants.
Rafael Antognolli [Mon, 11 Jun 2018 18:29:14 +0000 (11:29 -0700)]
intel/blorp: Use 3DSTATE_CONSTANT_ALL to setup push constants.

In blorp, all the push constants are disabled, so we only need to emit a
single 3DSTATE_CONSTANT_ALL with the bitmask for stage update
appropriately set.

v2: Update comment (Caio).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agointel/aubinator: Decode 3DSTATE_CONSTANT_ALL.
Rafael Antognolli [Wed, 13 Jun 2018 16:49:07 +0000 (09:49 -0700)]
intel/aubinator: Decode 3DSTATE_CONSTANT_ALL.

Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agointel/genxml: Add 3DSTATE_CONSTANT_ALL packet.
Rafael Antognolli [Thu, 7 Jun 2018 22:25:24 +0000 (15:25 -0700)]
intel/genxml: Add 3DSTATE_CONSTANT_ALL packet.

Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
4 years agoturnip: MSAA resolve directly from GMEM
Jonathan Marek [Fri, 22 Nov 2019 23:25:43 +0000 (18:25 -0500)]
turnip: MSAA resolve directly from GMEM

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoturnip: don't set unused BLIT_DST_INFO bits for GMEM clear
Jonathan Marek [Fri, 22 Nov 2019 23:12:11 +0000 (18:12 -0500)]
turnip: don't set unused BLIT_DST_INFO bits for GMEM clear

These bits are ignored when clearing so don't bother setting them.

Note: MSAA samples when clearing comes from other registers (tu6_emit_msaa)

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoturnip: implement CmdClearAttachments
Jonathan Marek [Fri, 22 Nov 2019 23:09:32 +0000 (18:09 -0500)]
turnip: implement CmdClearAttachments

Passes these deqp tests: dEQP-VK.api.image_clearing.core.*attach*single*

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoturnip: don't skip unused attachments when setting up tiling config
Jonathan Marek [Fri, 22 Nov 2019 23:06:44 +0000 (18:06 -0500)]
turnip: don't skip unused attachments when setting up tiling config

This makes it easier to find the gmem_offset associated with an attachment.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agolima: enable tiling
Vasily Khoruzhick [Sun, 1 Dec 2019 00:19:25 +0000 (16:19 -0800)]
lima: enable tiling

Now that we have tiled format modifier merged into linux we can enable tiling.

That should improve overall performance and also workaround broken mipmapping
for linear textures since now we prefer tiled textures.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
4 years agoglsl: additional interface redeclaration check for SSO programs
Tapani Pälli [Tue, 5 Nov 2019 13:00:25 +0000 (15:00 +0200)]
glsl: additional interface redeclaration check for SSO programs

Patch adds additional linker check for SSO programs to make sure they
are redeclaring built-in blocks as required by the desktop spec.

This fixes following Piglit tests:
   arb_separate_shader_objects/linker/pervertex-*

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agogitlab-ci: bump piglit checkout commit
Tapani Pälli [Mon, 2 Dec 2019 08:10:37 +0000 (10:10 +0200)]
gitlab-ci: bump piglit checkout commit

Commit also updates the Piglit quick_gl.txt, list modifications happened
due to following Piglit commits: c248bf201,c acff58ca5603e2e60.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
4 years agonir/load_store_vectorize: fix combining stores with aliasing loads between
Rhys Perry [Tue, 3 Dec 2019 10:48:18 +0000 (10:48 +0000)]
nir/load_store_vectorize: fix combining stores with aliasing loads between

v2: add test

Fixes: ce9205c03bd ('nir: add a load/store vectorization pass')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v1)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v2)
4 years agoaco/wave32: Fix reductions.
Timur Kristóf [Wed, 27 Nov 2019 15:59:11 +0000 (16:59 +0100)]
aco/wave32: Fix reductions.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Allow setting the subgroup ballot size to 64-bit.
Timur Kristóf [Thu, 28 Nov 2019 09:41:19 +0000 (10:41 +0100)]
aco/wave32: Allow setting the subgroup ballot size to 64-bit.

Previously, it would only work when the ballot size was set to the
lane mask. This patch makes is possible to set the ballot size
to either 32-bit or 64-bit for wave32 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Use wave_size for barrier intrinsic.
Timur Kristóf [Fri, 22 Nov 2019 16:07:34 +0000 (17:07 +0100)]
aco/wave32: Use wave_size for barrier intrinsic.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Fix load_local_invocation_index to support wave32.
Timur Kristóf [Wed, 27 Nov 2019 10:09:20 +0000 (11:09 +0100)]
aco/wave32: Fix load_local_invocation_index to support wave32.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Use lane mask regclass for exec/vcc.
Timur Kristóf [Wed, 27 Nov 2019 10:04:47 +0000 (11:04 +0100)]
aco/wave32: Use lane mask regclass for exec/vcc.

Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Add wave size specific opcodes to aco_builder.
Timur Kristóf [Thu, 31 Oct 2019 12:28:54 +0000 (13:28 +0100)]
aco/wave32: Add wave size specific opcodes to aco_builder.

Several places in ACO we use SOP1 or SOP2 instructions to operate over the
exec mask or VCC, and these need to be adapted to the new size in wave32
mode.

This commit adds a way to deal with this problem in aco_builder: the caller
can specify a wave size specific opcode and the builder will translate that
to the correct opcode based on the current wave size.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Introduce emit_mbcnt which takes wave size into account.
Timur Kristóf [Thu, 31 Oct 2019 10:26:14 +0000 (11:26 +0100)]
aco/wave32: Introduce emit_mbcnt which takes wave size into account.

This is relevant because in wave32 mode the v_mbcnt_hi_u32_b32
instruction is superfluous.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Replace hardcoded numbers in spiller with wave size.
Timur Kristóf [Mon, 28 Oct 2019 16:15:17 +0000 (17:15 +0100)]
aco/wave32: Replace hardcoded numbers in spiller with wave size.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Change uniform bool optimization to work with wave32.
Timur Kristóf [Fri, 22 Nov 2019 10:57:45 +0000 (11:57 +0100)]
aco/wave32: Change uniform bool optimization to work with wave32.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: Optimize load_subgroup_id to one bit field extract instruction.
Timur Kristóf [Fri, 22 Nov 2019 14:13:54 +0000 (15:13 +0100)]
aco: Optimize load_subgroup_id to one bit field extract instruction.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: Remove lower_linear_bool_phi, it is not needed anymore.
Timur Kristóf [Thu, 21 Nov 2019 11:31:14 +0000 (12:31 +0100)]
aco: Remove lower_linear_bool_phi, it is not needed anymore.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: Remove superfluous argument from emit_boolean_logic.
Timur Kristóf [Thu, 21 Nov 2019 11:28:31 +0000 (12:28 +0100)]
aco: Remove superfluous argument from emit_boolean_logic.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: Fix operand of s_bcnt1_i32_b64 in emit_boolean_reduce.
Timur Kristóf [Thu, 21 Nov 2019 11:26:36 +0000 (12:26 +0100)]
aco: Fix operand of s_bcnt1_i32_b64 in emit_boolean_reduce.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agogitlab-ci: Run piglit glslparser & quick_shader tests separately
Michel Dänzer [Tue, 3 Dec 2019 09:45:28 +0000 (10:45 +0100)]
gitlab-ci: Run piglit glslparser & quick_shader tests separately

And only use --process-isolation false for the quick_gl tests.

This will hopefully avoid variance in the test results that we've been
seeing lately. But even if it doesn't, it should at least help narrow
down the cause of the variance.

Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agointel/perf: fix improper pointer access
Lionel Landwerlin [Tue, 3 Dec 2019 14:35:45 +0000 (16:35 +0200)]
intel/perf: fix improper pointer access

This expression was unused by the macro, probably why it didn't
register in the compilation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/perf: simplify the processing of OA reports
Lionel Landwerlin [Tue, 3 Dec 2019 14:33:25 +0000 (16:33 +0200)]
intel/perf: simplify the processing of OA reports

This is a more accurate description of what happens in processing the
OA reports.

Previously we only had a somewhat difficult to parse state machine
tracking the context ID.

What we really only need to do to decide if the delta between 2
reports (r0 & r1) should be accumulated in the query result is :

   * whether the r0 is tagged with the context ID relevant to us

   * if r0 is not tagged with our context ID and r1 is: does r0 have a
     invalid context id? If not then we're in a case where i915 has
     resubmitted the same context for execution through the execlist
     submission port

v2: Update comment (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/perf: take into account that reports read can be fairly old
Lionel Landwerlin [Tue, 3 Dec 2019 14:19:24 +0000 (16:19 +0200)]
intel/perf: take into account that reports read can be fairly old

If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minute). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/perf: set read buffer len to 0 to identify empty buffer
Lionel Landwerlin [Tue, 3 Dec 2019 14:12:03 +0000 (16:12 +0200)]
intel/perf: set read buffer len to 0 to identify empty buffer

We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.

We were using an 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/perf: fix invalid hw_id in query results
Lionel Landwerlin [Tue, 3 Dec 2019 14:08:12 +0000 (16:08 +0200)]
intel/perf: fix invalid hw_id in query results

Accumulation happens between 2 reports, it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and that we have a valid value
to put in there.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoradeonsi: display cs blit count for AMD_DEBUG=testdma
Pierre-Eric Pelloux-Prayer [Fri, 4 Oct 2019 13:24:34 +0000 (15:24 +0200)]
radeonsi: display cs blit count for AMD_DEBUG=testdma

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoradeonsi: implement sdma for GFX9
Pierre-Eric Pelloux-Prayer [Fri, 29 Nov 2019 13:01:10 +0000 (14:01 +0100)]
radeonsi: implement sdma for GFX9

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoradv/gfx10: fix the vertex order for triangle strips emitted by a GS
Samuel Pitoiset [Sat, 30 Nov 2019 08:03:31 +0000 (09:03 +0100)]
radv/gfx10: fix the vertex order for triangle strips emitted by a GS

My fix wasn't totally correct as pointed out by Marek.
Ported from RadeonSI.

Fixes: deafe4cc587 ("radv/gfx10: fix primitive indices orientation for NGG GS")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: simplify a check in radv_fixup_vertex_input_fetches()
Samuel Pitoiset [Mon, 2 Dec 2019 15:46:30 +0000 (16:46 +0100)]
radv: simplify a check in radv_fixup_vertex_input_fetches()

The number of loaded channels should always be > 0 now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: remove dead shader input/output variables
Samuel Pitoiset [Mon, 2 Dec 2019 15:33:06 +0000 (16:33 +0100)]
radv: remove dead shader input/output variables

No pipeline-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoiris: Stop setting up fake params
Jason Ekstrand [Mon, 18 Nov 2019 21:40:09 +0000 (15:40 -0600)]
iris: Stop setting up fake params

In d1c4e64a69e, we added a parameter to tell the back-end compiler to
ignore the param array and just push however many constants you ask it
to push.  Iris doesn't want to push anything so it gives a bogus number
of parameters and trusts the back-end compiler to dead-code all of them.
Now that we can tell the back-end compiler to stop re-arranging things,
delete the hack and enable the new simpler code path.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agogallium/scons: fix graw-xlib build on OSX.
Dave Airlie [Sun, 1 Dec 2019 22:51:25 +0000 (08:51 +1000)]
gallium/scons: fix graw-xlib build on OSX.

Fixes: 44a6b0107b37 (gallivm: add nir->llvm translation (v2))
Tested-by: Vinson Lee <vlee@freedesktop.org>
4 years agollvmpipe: enable texcoord semantics
Dave Airlie [Tue, 3 Dec 2019 23:26:46 +0000 (09:26 +1000)]
llvmpipe: enable texcoord semantics

To make NIR transitioning easier, move the driver to using
texcoord semantics.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoanv: Respect the always_flush_cache driconf option
Jason Ekstrand [Mon, 11 Nov 2019 18:46:33 +0000 (12:46 -0600)]
anv: Respect the always_flush_cache driconf option

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agogallium/swr: Fix crash when use GL_TDFX_texture_compression_FXT1 format.
Krzysztof Raszkowski [Tue, 3 Dec 2019 13:43:57 +0000 (14:43 +0100)]
gallium/swr: Fix crash when use GL_TDFX_texture_compression_FXT1 format.

Reject the new formats in swr to prevent crashes because it doesn't
know how to handle the new formats.

Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
4 years agogitlab-ci: disable junit results for deqp
Rob Clark [Tue, 3 Dec 2019 16:45:49 +0000 (08:45 -0800)]
gitlab-ci: disable junit results for deqp

They don't seem to be hugely useful, and seem to be bogging down gitlab.

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agoanv: Set up SBE_SWIZ properly for gl_Viewport
Jason Ekstrand [Tue, 26 Nov 2019 21:08:43 +0000 (15:08 -0600)]
anv: Set up SBE_SWIZ properly for gl_Viewport

gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agogitlab-ci: Update to current ci-templates master
Michel Dänzer [Tue, 3 Dec 2019 11:38:59 +0000 (12:38 +0100)]
gitlab-ci: Update to current ci-templates master

Fixes skopeo copy failures.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoac/llvm: fix atomic var operations if source isn't a deref
Samuel Pitoiset [Mon, 2 Dec 2019 13:58:00 +0000 (14:58 +0100)]
ac/llvm: fix atomic var operations if source isn't a deref

Fixes some CTS regressions.

Fixes: e61a826f396 ("ac/llvm: fix pointer type for global atomics")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoAdd support for T820 CI Jobs
Neil Armstrong [Fri, 20 Sep 2019 13:43:19 +0000 (15:43 +0200)]
Add support for T820 CI Jobs

Tomeu: - Small rebase fixups

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agogallivm/llvmpipe: add support for front facing in sysval.
Dave Airlie [Mon, 2 Dec 2019 00:05:12 +0000 (10:05 +1000)]
gallivm/llvmpipe: add support for front facing in sysval.

This wires up the front facing value as a sysval, I'd like to
remove the other facing code but I'd need to confirm VMware
don't use it first.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agollvmpipe/images: handle undefined atomic without crashing
Dave Airlie [Thu, 24 Oct 2019 01:42:23 +0000 (11:42 +1000)]
llvmpipe/images: handle undefined atomic without crashing

just return 0 for unbound atomic operations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agopanfrost: Remove blend shader hack
Alyssa Rosenzweig [Sat, 30 Nov 2019 18:24:46 +0000 (13:24 -0500)]
panfrost: Remove blend shader hack

This is no longer used.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agogitlab-ci: Test Panfrost on T720 GPUs
Tomeu Vizoso [Fri, 25 Oct 2019 11:04:34 +0000 (13:04 +0200)]
gitlab-ci: Test Panfrost on T720 GPUs

Now that the Mali T720 GPU is supoprted at the same level as the T760,
test it on PINE64 H64 boards.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agogitlab-ci: Remove non-default skips from Panfrost
Alyssa Rosenzweig [Sat, 23 Nov 2019 16:28:43 +0000 (11:28 -0500)]
gitlab-ci: Remove non-default skips from Panfrost

During the past months, Panfrost has matured considerably and several
tests stopped being flaky or failing at all.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: White list the Mali T720
Tomeu Vizoso [Mon, 28 Oct 2019 08:59:30 +0000 (09:59 +0100)]
panfrost: White list the Mali T720

Support for this GPU is equal now to that of T760, so whitelist it.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Splatter on fragment out
Alyssa Rosenzweig [Tue, 26 Nov 2019 13:48:33 +0000 (08:48 -0500)]
pan/midgard: Splatter on fragment out

Make sure that the fragment is complete when writing it out.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agopanfrost: Simplify shader patching
Tomeu Vizoso [Wed, 20 Nov 2019 15:00:23 +0000 (16:00 +0100)]
panfrost: Simplify shader patching

We need to always upload anyway.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: Simplify draw_flags
Alyssa Rosenzweig [Fri, 22 Nov 2019 16:45:13 +0000 (11:45 -0500)]
panfrost: Simplify draw_flags

Fixes dEQP-GLES3.functional.primitive_restart.*. Note the 0x18000 value
is accidentally somehow enabling primitive restart for some reason.
I'm not sure where this value came from but let's not.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agopanfrost: Implement pan_tiler for non-hierarchy GPUs
Alyssa Rosenzweig [Wed, 27 Nov 2019 13:31:16 +0000 (08:31 -0500)]
panfrost: Implement pan_tiler for non-hierarchy GPUs

The algorithm is as described. Nothing fancy here, just need to add some
new code paths depending on which model we're running on.

Tomeu:
- Also disable tiling when !hierarchy and !vertex_count
- Avoid creating polygon lists smaller than the minimum when
  vertex_count > 0 but tile size smaller than 16 byte
- Take into account tile size when calculating polygon list size for
  !hierarchy
- Allow 0-sized tiles in a single dimension

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agopanfrost: Add information about T720 tiling
Alyssa Rosenzweig [Wed, 27 Nov 2019 13:04:22 +0000 (08:04 -0500)]
panfrost: Add information about T720 tiling

We've figured out most of the big pieces, and though it looks faintly
like other Midgards, it's much simpler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: Add quirks system to cmdstream
Tomeu Vizoso [Thu, 28 Nov 2019 09:21:06 +0000 (10:21 +0100)]
panfrost: Add quirks system to cmdstream

Similarly to how it's already done in the compiler, add a way to express
differences between GPU models that need to be taken into account when
assembling the cmdstream.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agonir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select
Ian Romanick [Fri, 1 Nov 2019 23:23:09 +0000 (16:23 -0700)]
nir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select

Reviewed-by: Matt Turner <mattst88@gmail.com>
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14660366 -> 14653437 (-0.05%)
instructions in affected programs: 316166 -> 309237 (-2.19%)
helped: 905
HURT: 10
helped stats (abs) min: 1 max: 36 x̄: 7.67 x̃: 6
helped stats (rel) min: 0.13% max: 18.75% x̄: 4.28% x̃: 3.60%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.10% max: 1.33% x̄: 0.70% x̃: 0.97%
95% mean confidence interval for instructions value: -7.91 -7.23
95% mean confidence interval for instructions %-change: -4.46% -3.99%
Instructions are helped.

total cycles in shared programs: 228571646 -> 228549759 (<.01%)
cycles in affected programs: 56239919 -> 56218032 (-0.04%)
helped: 681
HURT: 216
helped stats (abs) min: 1 max: 5156 x̄: 45.49 x̃: 10
helped stats (rel) min: <.01% max: 10.45% x̄: 1.29% x̃: 0.65%
HURT stats (abs)   min: 1 max: 320 x̄: 42.09 x̃: 14
HURT stats (rel)   min: <.01% max: 37.04% x̄: 1.38% x̃: 0.49%
95% mean confidence interval for cycles value: -41.51 -7.29
95% mean confidence interval for cycles %-change: -0.80% -0.49%
Cycles are helped.

LOST:   1
GAINED: 0

4 years agonir/algebraic: Simplify some Inf and NaN avoidance code
Ian Romanick [Sat, 2 Nov 2019 02:53:06 +0000 (19:53 -0700)]
nir/algebraic: Simplify some Inf and NaN avoidance code

Since a is non-negative, neither fsqrt nor frsq should return NaN.  frsq
should only return Inf when fsqrt returns 0.

The changes are pretty small, but this turns a few hundred hurt shaders
in the next patch into helped shaders.

An alternative to the intBitsToFloat is to import numpy and do
np.finfo(np.float32).max.  That's more explicit, but we may also want to
have specific bit encodings of float values later.  I could be convinced
either way, but intBitsToFloat(0x7f7fffff) was what I implemented first.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14661140 -> 14661104 (<.01%)
instructions in affected programs: 7520 -> 7484 (-0.48%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.52% -0.47%
Instructions are helped.

total cycles in shared programs: 228585416 -> 228584806 (<.01%)
cycles in affected programs: 56321 -> 55711 (-1.08%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
95% mean confidence interval for cycles value: -28.32 -9.80
95% mean confidence interval for cycles %-change: -1.63% -0.54%
Cycles are helped.

Sandy Bridge
total cycles in shared programs: 152991077 -> 152991075 (<.01%)
cycles in affected programs: 11525 -> 11523 (-0.02%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -5.27 4.27
95% mean confidence interval for cycles %-change: -0.16% 0.15%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.

4 years agointel/compiler: Increase nir_opt_peephole_select threshold
Ian Romanick [Fri, 1 Nov 2019 22:40:12 +0000 (15:40 -0700)]
intel/compiler: Increase nir_opt_peephole_select threshold

I tried 2, 4, 6, 8, and 10.  8 seemed to be the sweet spot across all
Intel platforms.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14736141 -> 14661140 (-0.51%)
instructions in affected programs: 2272413 -> 2197412 (-3.30%)
helped: 8416
HURT: 140
helped stats (abs) min: 1 max: 1152 x̄: 8.99 x̃: 6
helped stats (rel) min: 0.13% max: 42.55% x̄: 4.15% x̃: 3.20%
HURT stats (abs)   min: 1 max: 140 x̄: 4.73 x̃: 1
HURT stats (rel)   min: 0.03% max: 3.44% x̄: 0.87% x̃: 0.60%
95% mean confidence interval for instructions value: -9.36 -8.17
95% mean confidence interval for instructions %-change: -4.14% -3.99%
Instructions are helped.

total cycles in shared programs: 231560416 -> 228585416 (-1.28%)
cycles in affected programs: 126536021 -> 123561021 (-2.35%)
helped: 7092
HURT: 1898
helped stats (abs) min: 1 max: 419320 x̄: 519.02 x̃: 159
helped stats (rel) min: <.01% max: 77.25% x̄: 13.52% x̃: 11.77%
HURT stats (abs)   min: 1 max: 14518 x̄: 371.91 x̃: 36
HURT stats (rel)   min: <.01% max: 103.23% x̄: 5.92% x̃: 2.55%
95% mean confidence interval for cycles value: -514.34 -147.50
95% mean confidence interval for cycles %-change: -9.69% -9.14%
Cycles are helped.

total spills in shared programs: 5763 -> 5848 (1.47%)
spills in affected programs: 1797 -> 1882 (4.73%)
helped: 13
HURT: 13

total fills in shared programs: 17163 -> 16931 (-1.35%)
fills in affected programs: 7214 -> 6982 (-3.22%)
helped: 22
HURT: 19

total sends in shared programs: 730410 -> 730246 (-0.02%)
sends in affected programs: 2705 -> 2541 (-6.06%)
helped: 114
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.60% max: 20.00% x̄: 7.26% x̃: 5.88%
95% mean confidence interval for sends value: -1.55 -1.33
95% mean confidence interval for sends %-change: -7.90% -6.62%
Sends are helped.

LOST:   4
GAINED: 0

Sandy Bridge
total instructions in shared programs: 10760511 -> 10724637 (-0.33%)
instructions in affected programs: 961305 -> 925431 (-3.73%)
helped: 3734
HURT: 110
helped stats (abs) min: 1 max: 151 x̄: 9.66 x̃: 8
helped stats (rel) min: 0.14% max: 41.21% x̄: 4.93% x̃: 3.95%
HURT stats (abs)   min: 1 max: 20 x̄: 1.68 x̃: 1
HURT stats (rel)   min: 0.12% max: 5.41% x̄: 0.88% x̃: 0.52%
95% mean confidence interval for instructions value: -9.76 -8.91
95% mean confidence interval for instructions %-change: -4.90% -4.63%
Instructions are helped.

total cycles in shared programs: 153359411 -> 152991077 (-0.24%)
cycles in affected programs: 11615401 -> 11247067 (-3.17%)
helped: 2725
HURT: 1138
helped stats (abs) min: 1 max: 2844 x̄: 164.27 x̃: 80
helped stats (rel) min: 0.02% max: 48.60% x̄: 7.47% x̃: 3.91%
HURT stats (abs)   min: 1 max: 4351 x̄: 69.69 x̃: 25
HURT stats (rel)   min: 0.02% max: 40.00% x̄: 3.39% x̃: 1.47%
95% mean confidence interval for cycles value: -103.18 -87.52
95% mean confidence interval for cycles %-change: -4.57% -3.97%
Cycles are helped.

total sends in shared programs: 584038 -> 583855 (-0.03%)
sends in affected programs: 3512 -> 3329 (-5.21%)
helped: 157
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.17 x̃: 1
helped stats (rel) min: 2.38% max: 25.00% x̄: 6.52% x̃: 6.06%
95% mean confidence interval for sends value: -1.26 -1.07
95% mean confidence interval for sends %-change: -7.17% -5.87%
Sends are helped.

LOST:   23
GAINED: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8122617 -> 8111592 (-0.14%)
instructions in affected programs: 380503 -> 369478 (-2.90%)
helped: 912
HURT: 86
helped stats (abs) min: 1 max: 129 x̄: 12.19 x̃: 9
helped stats (rel) min: 0.30% max: 39.21% x̄: 3.69% x̃: 2.57%
HURT stats (abs)   min: 1 max: 2 x̄: 1.05 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.64% x̄: 0.54% x̃: 0.36%
95% mean confidence interval for instructions value: -12.00 -10.10
95% mean confidence interval for instructions %-change: -3.56% -3.10%
Instructions are helped.

total cycles in shared programs: 188509780 -> 188534398 (0.01%)
cycles in affected programs: 7211542 -> 7236160 (0.34%)
helped: 859
HURT: 132
helped stats (abs) min: 2 max: 690 x̄: 46.59 x̃: 16
helped stats (rel) min: 0.01% max: 26.76% x̄: 1.53% x̃: 0.33%
HURT stats (abs)   min: 2 max: 1592 x̄: 489.67 x̃: 618
HURT stats (rel)   min: 0.03% max: 185.92% x̄: 23.35% x̃: 6.26%
95% mean confidence interval for cycles value: 9.58 40.10
95% mean confidence interval for cycles %-change: 0.65% 2.93%
Cycles are HURT.

4 years agonir/opt_peephole_select: Don't count some unary operations
Ian Romanick [Fri, 1 Nov 2019 21:52:38 +0000 (14:52 -0700)]
nir/opt_peephole_select: Don't count some unary operations

In many cases, fsat, fneg, fabs, ineg, and iabs will get folded into
another instruction as either source or destination modifiers.
Counting them as instructions means that some if-statements won't get
converted to selects.  For example,

        vec1 32 ssa_25 = flt32 ssa_0, ssa_23.x
        /* succs: block_1 block_2 */
        if ssa_25 {
                block block_1:
                /* preds: block_0 */
                vec1 32 ssa_26 = fabs ssa_24
                vec1 32 ssa_27 = fneg ssa_26
                vec1 32 ssa_28 = fabs ssa_20
                vec1 32 ssa_29 = fneg ssa_28
                vec1 32 ssa_30 = fmul ssa_27, ssa_29
                vec1 32 ssa_31 = fsat ssa_30
                /* succs: block_3 */
        } else {
                block block_2:
                /* preds: block_0 */
                /* succs: block_3 */
        }
        block block_3:
        /* preds: block_1 block_2 */

block_1 isn't really 6 instructions, but it will be counted that way.

Most callers of the peephole_select pass use either 1 or 8.  It's very
easy to blow way past either of these limits with things that are really
only one or two actual instructions.

I also tried some fancier things like making sure the fsat was of
another SSA def from the same block, but the simple test was actually
better.

The i965 back-end SEL peephole pass still helps ~700 shaders in
shader-db with this change.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14743694 -> 14738910 (-0.03%)
instructions in affected programs: 156575 -> 151791 (-3.06%)
helped: 1204
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 3.97 x̃: 3
helped stats (rel) min: 0.15% max: 19.57% x̄: 5.15% x̃: 4.55%
95% mean confidence interval for instructions value: -4.12 -3.82
95% mean confidence interval for instructions %-change: -5.35% -4.95%
Instructions are helped.

total cycles in shared programs: 231749141 -> 231602916 (-0.06%)
cycles in affected programs: 2818975 -> 2672750 (-5.19%)
helped: 876
HURT: 322
helped stats (abs) min: 2 max: 788 x̄: 180.99 x̃: 220
helped stats (rel) min: <.01% max: 43.82% x̄: 20.75% x̃: 19.44%
HURT stats (abs)   min: 1 max: 1188 x̄: 38.27 x̃: 20
HURT stats (rel)   min: 0.09% max: 102.67% x̄: 5.17% x̃: 1.70%
95% mean confidence interval for cycles value: -130.47 -113.64
95% mean confidence interval for cycles %-change: -14.85% -12.72%
Cycles are helped.

total sends in shared programs: 730495 -> 730491 (<.01%)
sends in affected programs: 46 -> 42 (-8.70%)
helped: 2
HURT: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8122757 -> 8122617 (<.01%)
instructions in affected programs: 14716 -> 14576 (-0.95%)
helped: 46
HURT: 1
helped stats (abs) min: 1 max: 8 x̄: 3.07 x̃: 3
helped stats (rel) min: 0.36% max: 10.00% x̄: 2.54% x̃: 1.06%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.59% max: 1.59% x̄: 1.59% x̃: 1.59%
95% mean confidence interval for instructions value: -3.42 -2.54
95% mean confidence interval for instructions %-change: -3.28% -1.62%
Instructions are helped.

total cycles in shared programs: 188510100 -> 188509780 (<.01%)
cycles in affected programs: 58994 -> 58674 (-0.54%)
helped: 32
HURT: 1
helped stats (abs) min: 2 max: 96 x̄: 10.06 x̃: 6
helped stats (rel) min: 0.05% max: 15.29% x̄: 1.37% x̃: 0.31%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.68% max: 0.68% x̄: 0.68% x̃: 0.68%
95% mean confidence interval for cycles value: -16.34 -3.06
95% mean confidence interval for cycles %-change: -2.46% -0.15%
Cycles are helped.

4 years agoiris: Allow max dynamic pool size of 2GB for gen12
Jordan Justen [Sat, 25 May 2019 08:33:17 +0000 (01:33 -0700)]
iris: Allow max dynamic pool size of 2GB for gen12

Reworks:
 * Adjust comment to list the state packets that curro found to be
   affected.

Fixes: 8125d7960b6 ("intel/dev: Add preliminary device info for Tigerlake")
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
4 years agoradeonsi/gfx10: fix the vertex order for triangle strips emitted by a GS
Marek Olšák [Fri, 29 Nov 2019 00:46:11 +0000 (19:46 -0500)]
radeonsi/gfx10: fix the vertex order for triangle strips emitted by a GS

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi/gfx10: simplify some duplicated NGG GS code
Marek Olšák [Thu, 28 Nov 2019 23:15:48 +0000 (18:15 -0500)]
radeonsi/gfx10: simplify some duplicated NGG GS code

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoutil/u_thread: don't restrict u_thread_get_time_nano() to __linux__
Jonathan Gray [Sat, 30 Nov 2019 15:17:36 +0000 (02:17 +1100)]
util/u_thread: don't restrict u_thread_get_time_nano() to __linux__

pthread_getcpuclockid() and clock_gettime() are also available on at least
OpenBSD, FreeBSD, NetBSD, DragonFly, Cygwin.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
4 years agoutil/futex: use futex syscall on OpenBSD
Jonathan Gray [Sat, 30 Nov 2019 15:19:38 +0000 (02:19 +1100)]
util/futex: use futex syscall on OpenBSD

Make use of the futex syscall added in OpenBSD 6.2.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>