mesa.git
4 years agoegl: Fix implicit declaration of ffs
Kevin Strasser [Thu, 12 Sep 2019 16:38:24 +0000 (09:38 -0700)]
egl: Fix implicit declaration of ffs

Found when building for Android in C99 mode. Include bitscan.h to ensure ffs is
available.

Fixes: 7b4ed2b5 ("egl: Convert configs to use shifts and sizes instead of masks")
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
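
Note for readers: ffs() is not part of ISO C (POSIX declares it in <strings.h>), which is one common reason a strict C99 build sees it as an implicit declaration. A minimal sketch of a portable fallback follows; `sketch_ffs` is a hypothetical name and this is not the code in Mesa's bitscan.h.

```c
/* Hypothetical fallback, not Mesa's bitscan.h implementation.
 * Returns the 1-based position of the least significant set bit,
 * or 0 when no bit is set (the same convention as POSIX ffs()). */
static int sketch_ffs(unsigned int v)
{
    int pos = 1;

    if (v == 0)
        return 0;

    while ((v & 1u) == 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}
```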
4 years agointel/tools: Fix aubinator usage of rb_tree.
Rafael Antognolli [Mon, 30 Sep 2019 19:34:12 +0000 (12:34 -0700)]
intel/tools: Fix aubinator usage of rb_tree.

The order of comparison has changed, so we need to invert the logic of
"insert_left" when using rb_tree_insert_at().

Fixes: dae33052dbf (util/rb_tree: Reverse the order of comparison
                    functions).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
4 years agodocs/relnotes: Add EXT_demote_to_helper_invocation support on iris, i965
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 18:08:44 +0000 (11:08 -0700)]
docs/relnotes: Add EXT_demote_to_helper_invocation support on iris, i965

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoi965: Enable EXT_demote_to_helper_invocation
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 17:56:36 +0000 (10:56 -0700)]
i965: Enable EXT_demote_to_helper_invocation

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoiris: Enable EXT_demote_to_helper_invocation
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 16:20:17 +0000 (09:20 -0700)]
iris: Enable EXT_demote_to_helper_invocation

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agogallium: Add PIPE_CAP_DEMOTE_TO_HELPER_INVOCATION
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 16:21:02 +0000 (09:21 -0700)]
gallium: Add PIPE_CAP_DEMOTE_TO_HELPER_INVOCATION

To enable EXT_demote_to_helper_invocation:

    This extension adds a "demote" keyword that is similar to "discard" but
    only suppresses subsequent writes and outputs to the framebuffer, and
    does not terminate the execution of the invocation. For the remainder
    of the execution, the invocation is "demoted" to act like a helper
    invocation.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
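
To make the quoted difference between "discard" and "demote" concrete, here is a toy C model of a single invocation. Every name in it (`sketch_invocation`, `sketch_demote`, and so on) is hypothetical; it only mirrors the wording of the extension text above and is not how any driver implements it.

```c
#include <stdbool.h>

/* Hypothetical model of one fragment shader invocation. */
struct sketch_invocation {
    bool is_helper;   /* set once the invocation has been demoted */
    bool terminated;  /* set once the invocation has been discarded */
    int  color;       /* stands in for a framebuffer output */
    int  steps_run;   /* number of write statements actually executed */
};

/* "discard": execution of the invocation stops. */
static void sketch_discard(struct sketch_invocation *inv)
{
    inv->terminated = true;
}

/* "demote": execution continues, but as a helper invocation. */
static void sketch_demote(struct sketch_invocation *inv)
{
    inv->is_helper = true;
}

/* A framebuffer write: never reached after discard, executed but
 * suppressed after demote. */
static void sketch_write_color(struct sketch_invocation *inv, int value)
{
    if (inv->terminated)
        return;

    inv->steps_run++;
    if (!inv->is_helper)
        inv->color = value;
}
```

In this model a demoted invocation still counts its later statements in `steps_run`, while a discarded one does not.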
4 years agoglsl: Add helperInvocationEXT() builtin
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 17:50:37 +0000 (10:50 -0700)]
glsl: Add helperInvocationEXT() builtin

From EXT_demote_to_helper_invocation, implemented with the existing
nir_intrinsic_is_helper_invocation.

Such a builtin is necessary when using `demote` because we can't
redefine the value of gl_HelperInvocation (since it is an input
variable).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoglsl: Parse `demote` statement
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 16:34:19 +0000 (09:34 -0700)]
glsl: Parse `demote` statement

When the EXT_demote_to_helper_invocation extension is enabled,
`demote` is treated as a keyword, and produces an ir_demote.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoglsl: Add ir_demote
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 16:27:00 +0000 (09:27 -0700)]
glsl: Add ir_demote

To represent the new `demote` keyword when using
EXT_demote_to_helper_invocation extension.  Most of the changes are to
include it in the visitors.

Demote is not considered control flow, so also include an empty
visit member function in ir_control_flow_visitor.

Only NIR actually supports `demote`, so assert in the translations for
TGSI and Mesa's gl_program, since demote is not expected to appear
for those.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agomesa: Extension boilerplate for EXT_demote_to_helper_invocation
Caio Marcelo de Oliveira Filho [Thu, 19 Sep 2019 20:54:18 +0000 (13:54 -0700)]
mesa: Extension boilerplate for EXT_demote_to_helper_invocation

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoiris: Fix iris_rebind_buffer() for VBOs with non-zero offsets.
Kenneth Graunke [Tue, 24 Sep 2019 03:37:39 +0000 (20:37 -0700)]
iris: Fix iris_rebind_buffer() for VBOs with non-zero offsets.

We can't just check for the BO base address, we need to check for the
full address including any offset we may have applied.  When updating
the address, we need to include the offset again.

Fixes: 5ad0c88dbe3 ("iris: Replace buffer backing storage and rebind to update addresses.")
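
The check described above can be sketched as follows. The struct and function names are hypothetical stand-ins, not the iris data structures; the point is only that both the comparison and the update use base address plus offset.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a bound vertex buffer's programmed state. */
struct sketch_vb_state {
    uint64_t address;  /* full address currently programmed */
    uint32_t offset;   /* offset of the binding into the buffer */
};

/* Rebinds the vertex buffer if it points into the old backing storage.
 * Returns true when the address was updated. */
static bool sketch_rebind(struct sketch_vb_state *vb,
                          uint64_t old_base, uint64_t new_base)
{
    /* Compare against base + offset, not just the base address. */
    if (vb->address != old_base + vb->offset)
        return false;

    /* Re-apply the offset on top of the new base address. */
    vb->address = new_base + vb->offset;
    return true;
}
```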
4 years agodocs/install: drop autotools references
Eric Engestrom [Mon, 30 Sep 2019 18:11:22 +0000 (19:11 +0100)]
docs/install: drop autotools references

19.3 will be the 3rd release without autotools; people know it's gone by now.

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
4 years agomeson: Test for -Wl,--build-id=sha1
Maya Rashish [Tue, 3 Sep 2019 08:55:34 +0000 (11:55 +0300)]
meson: Test for -Wl,--build-id=sha1

Do this instead of hard-coding an OS list. Helps Solaris ld builds.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Maya Rashish <coypu@sdf.org>
4 years agodocs: remove stray newline
Dylan Baker [Mon, 30 Sep 2019 18:02:41 +0000 (11:02 -0700)]
docs: remove stray newline

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: use https for mesonbuild.com
Dylan Baker [Mon, 30 Sep 2019 18:02:31 +0000 (11:02 -0700)]
docs: use https for mesonbuild.com

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agodocs: update install docs for meson
Dylan Baker [Mon, 30 Sep 2019 18:02:09 +0000 (11:02 -0700)]
docs: update install docs for meson

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agoac/nir: fix GLSL imageSamples()
Marek Olšák [Mon, 16 Sep 2019 23:39:40 +0000 (19:39 -0400)]
ac/nir: fix GLSL imageSamples()

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoac: add ac_build_image_get_sample_count from radeonsi
Marek Olšák [Mon, 16 Sep 2019 23:37:04 +0000 (19:37 -0400)]
ac: add ac_build_image_get_sample_count from radeonsi

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoac/surface: don't allocate FMASK if there is no graphics
Marek Olšák [Fri, 13 Sep 2019 22:27:46 +0000 (18:27 -0400)]
ac/surface: don't allocate FMASK if there is no graphics

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agotgsi_to_nir: handle PIPE_FORMAT_NONE in image opcodes
Marek Olšák [Tue, 17 Sep 2019 01:19:44 +0000 (21:19 -0400)]
tgsi_to_nir: handle PIPE_FORMAT_NONE in image opcodes

radeonsi doesn't use the format and internal shaders don't set it.

Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
4 years agomeson: gallium media state trackers require libdrm with x11
Dylan Baker [Thu, 9 May 2019 17:32:31 +0000 (10:32 -0700)]
meson: gallium media state trackers require libdrm with x11

v2: - update copyright year in all changed files
    - rebase on master

Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agoiris: Disable CCS_E for 32-bit floating point textures.
Kenneth Graunke [Thu, 29 Aug 2019 07:38:15 +0000 (00:38 -0700)]
iris: Disable CCS_E for 32-bit floating point textures.

A while back, Michael Larabel noticed that Paraview's Wavelet Volume
case runs significantly slower on iris than i965.  It turns out this
is because we enable CCS_E for 32-bit floating point formats, while
i965 disables it, with an oblique comment saying that we benchmarked
it (on what exactly?) and determined that it was a loss.

Paraview uses both R32_FLOAT and R32G32B32A32_FLOAT, and I observed
large framerate drops when enabling CCS_E for either format.  However,
several other benchmarks (Aztec Ruins, many Synmark cases) use 16-bit
floating point formats, with no apparent ill effects.

So, disable compression for 32-bit float formats for now, but leave it
enabled for 16-bit float formats as they seem to be working fine.

Improves performance in Paraview's Wavelet Volume test by 62% on a
Skylake GT4e.

Fixes: 3cfc6a207bd ("iris: Fill out res->aux.possible_usages")
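
The resulting policy can be summarized with a small predicate. This is a sketch under assumed names (`sketch_format`, `sketch_allow_ccs_e`); the real driver works on its own format enums and checks more conditions than shown here.

```c
#include <stdbool.h>

/* Hypothetical, simplified description of a texture format. */
struct sketch_format {
    bool     is_float;          /* floating point channels */
    unsigned bits_per_channel;  /* e.g. 16 or 32 */
};

/* CCS_E stays enabled for everything except 32-bit float formats. */
static bool sketch_allow_ccs_e(struct sketch_format fmt)
{
    return !(fmt.is_float && fmt.bits_per_channel == 32);
}
```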
4 years agoac: reorder and print all radeon_info fields
Marek Olšák [Fri, 20 Sep 2019 04:54:22 +0000 (00:54 -0400)]
ac: reorder and print all radeon_info fields

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoac: set the number of SDPs same as the number of TCCs
Marek Olšák [Fri, 20 Sep 2019 02:17:30 +0000 (22:17 -0400)]
ac: set the number of SDPs same as the number of TCCs

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoac: fix num_good_cu_per_sh for harvested chips
Marek Olšák [Fri, 20 Sep 2019 02:16:51 +0000 (22:16 -0400)]
ac: fix num_good_cu_per_sh for harvested chips

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradeonsi/gfx10: fix corruption for chips with harvested TCCs
Marek Olšák [Tue, 24 Sep 2019 20:56:57 +0000 (16:56 -0400)]
radeonsi/gfx10: fix corruption for chips with harvested TCCs

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoac: add radeon_info::tcc_harvested
Marek Olšák [Tue, 24 Sep 2019 20:56:21 +0000 (16:56 -0400)]
ac: add radeon_info::tcc_harvested

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoac: fix incorrect vram_size reported by the kernel
Marek Olšák [Tue, 24 Sep 2019 20:47:05 +0000 (16:47 -0400)]
ac: fix incorrect vram_size reported by the kernel

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradeonsi/gfx10: fix L2 cache rinse programming
Marek Olšák [Tue, 24 Sep 2019 19:15:00 +0000 (15:15 -0400)]
radeonsi/gfx10: fix L2 cache rinse programming

Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoetnaviv: fix bitmask typo
Eric Engestrom [Sun, 29 Sep 2019 21:27:24 +0000 (22:27 +0100)]
etnaviv: fix bitmask typo

Fixes: d92689c46f0d2da05ae6 ("etnaviv: nir: add native integers (HALTI2+)")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoglx: Log the filename of the drm device if we fail to open it
Adam Jackson [Fri, 27 Sep 2019 16:16:22 +0000 (12:16 -0400)]
glx: Log the filename of the drm device if we fail to open it

Helps point the user to the specific device that's having issues, since
you're increasingly likely to have more than one.

Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/107
Reviewed-by: Eric Anholt <eric@anholt.net>
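
A minimal sketch of the idea, assuming a POSIX system: report the path together with the error when open() fails. `sketch_open_device` and the message format are hypothetical and do not reproduce the actual GLX loader code.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Opens a device node and, on failure, logs which file could not be
 * opened so the user can tell the devices apart. Returns the file
 * descriptor, or a negative value on error. */
static int sketch_open_device(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        fprintf(stderr, "failed to open %s: %s\n", path, strerror(errno));

    return fd;
}
```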
4 years agoscons/windows: Enable compute shaders when possible.
pal1000 [Sun, 29 Sep 2019 15:35:29 +0000 (18:35 +0300)]
scons/windows: Enable compute shaders when possible.

Tests done with llvm-config indicate that only 2 libraries are in
irreader but not in engine: LLVMAsmParser and LLVMIRReader. Both of them
are part of coroutines, so I replaced irreader with coroutines and added
the libraries unique to coroutines.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
4 years agopan/midgard: Allow scheduling conditions with constants
Alyssa Rosenzweig [Sat, 28 Sep 2019 17:05:12 +0000 (13:05 -0400)]
pan/midgard: Allow scheduling conditions with constants

Now that we have constant adjustment logic abstracted, we can do this
safely. Along with the csel inversion patch, this allows many more
common csel ops to inline their condition in the bundle.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add csel invert optimization
Alyssa Rosenzweig [Sat, 28 Sep 2019 16:39:15 +0000 (12:39 -0400)]
pan/midgard: Add csel invert optimization

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add mir_flip helper
Alyssa Rosenzweig [Sat, 28 Sep 2019 16:38:51 +0000 (12:38 -0400)]
pan/midgard: Add mir_flip helper

Useful for various operations on both commutative and anticommutative
ops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Tightly pack 32-bit constants
Alyssa Rosenzweig [Sat, 28 Sep 2019 16:13:52 +0000 (12:13 -0400)]
pan/midgard: Tightly pack 32-bit constants

If we can reuse constant slots from other instructions, we would like to
do so to include more instructions per bundle.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
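
One way to picture constant slot reuse is the sketch below. The slot count of 4 and all names are assumptions made for illustration; the actual scheduler also has to deal with swizzles and other constant sizes.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SLOTS 4  /* assumed number of 32-bit constant slots per bundle */

struct sketch_bundle_consts {
    uint32_t value[SKETCH_SLOTS];
    unsigned count;  /* slots in use */
};

/* Tries to add an instruction's n constants to the bundle, reusing any
 * slot that already holds the same value. On success, slot_out[i] holds
 * the slot chosen for in[i]. On failure the bundle is left unchanged. */
static bool sketch_pack_constants(struct sketch_bundle_consts *b,
                                  const uint32_t *in, unsigned n,
                                  unsigned *slot_out)
{
    struct sketch_bundle_consts tmp = *b;  /* work on a copy */

    for (unsigned i = 0; i < n; ++i) {
        unsigned s;

        /* Look for an existing slot with the same value. */
        for (s = 0; s < tmp.count; ++s) {
            if (tmp.value[s] == in[i])
                break;
        }

        if (s == tmp.count) {
            if (tmp.count == SKETCH_SLOTS)
                return false;  /* no room, instruction does not fit */
            tmp.value[tmp.count++] = in[i];
        }

        slot_out[i] = s;
    }

    *b = tmp;  /* commit only when everything fits */
    return true;
}
```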
4 years agopan/midgard: Allow writeout to see into the future
Alyssa Rosenzweig [Sat, 28 Sep 2019 14:43:51 +0000 (10:43 -0400)]
pan/midgard: Allow writeout to see into the future

If an instruction could be scheduled to vmul to satisfy the writeout
conditions, let's do that and save an instruction+cycle per fragment
shader.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Allow 6 instructions per bundle
Alyssa Rosenzweig [Sat, 28 Sep 2019 14:28:48 +0000 (10:28 -0400)]
pan/midgard: Allow 6 instructions per bundle

We never had a scheduler good enough to hit this case before! :)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Only one conditional per bundle allowed
Alyssa Rosenzweig [Sat, 28 Sep 2019 14:22:35 +0000 (10:22 -0400)]
pan/midgard: Only one conditional per bundle allowed

There's no r32 to save ya after you use up r31 :)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Schedule to smul/sadd
Alyssa Rosenzweig [Sat, 28 Sep 2019 13:48:53 +0000 (09:48 -0400)]
pan/midgard: Schedule to smul/sadd

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Extend choose_instruction for scalar units
Alyssa Rosenzweig [Sat, 28 Sep 2019 13:48:43 +0000 (09:48 -0400)]
pan/midgard: Extend choose_instruction for scalar units

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Don't double check SCALAR units
Alyssa Rosenzweig [Sat, 28 Sep 2019 13:48:27 +0000 (09:48 -0400)]
pan/midgard: Don't double check SCALAR units

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Use new scheduler
Alyssa Rosenzweig [Mon, 23 Sep 2019 12:00:51 +0000 (08:00 -0400)]
pan/midgard: Use new scheduler

We still emit in-order but we switch to using the bundles created from
the new scheduler, which will allow greater flexibility and room for
out-of-order optimization.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add distance metric to choose_instruction
Alyssa Rosenzweig [Sat, 28 Sep 2019 00:18:16 +0000 (20:18 -0400)]
pan/midgard: Add distance metric to choose_instruction

We require chosen instructions to be "close", to avoid ballooning
register pressure. This is a kludge that will go away once we have
proper liveness tracking in the scheduler, but for now it prevents a lot
of needless spilling.

v2: Lower threshold to 6 (from 8). Schedule is hurt, but a few shaders
that spilled excessively are fixed.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Derp
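
A sketch of one plausible reading of the distance rule: a candidate is only accepted if it lies within a fixed distance of the latest ready instruction. The threshold of 6 comes from the v2 note above; the function, its arguments and the exact comparison are assumptions, not the code in the scheduler.

```c
#include <stdbool.h>

#define SKETCH_MAX_DISTANCE 6  /* v2 threshold from the commit message */

/* ready[i]    : instruction i is on the worklist
 * eligible[i] : instruction i satisfies the caller's other constraints
 * Returns the latest ready, eligible instruction that is "close" to the
 * latest ready instruction, or -1 if there is none. */
static int sketch_choose_close(const bool *ready, const bool *eligible,
                               int count)
{
    int latest = -1;

    for (int i = count - 1; i >= 0; --i) {
        if (ready[i]) {
            latest = i;
            break;
        }
    }

    if (latest < 0)
        return -1;

    /* Walk backwards, but stop once candidates are too far away. */
    for (int i = latest; i >= 0 && (latest - i) < SKETCH_MAX_DISTANCE; --i) {
        if (ready[i] && eligible[i])
            return i;
    }

    return -1;
}
```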

4 years agopan/midgard: Add mir_choose_alu helper
Alyssa Rosenzweig [Fri, 27 Sep 2019 12:18:54 +0000 (08:18 -0400)]
pan/midgard: Add mir_choose_alu helper

Based on a given unit.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Implement load/store pairing
Alyssa Rosenzweig [Mon, 23 Sep 2019 19:57:58 +0000 (15:57 -0400)]
pan/midgard: Implement load/store pairing

We can bundle two load/store instructions together. This also eliminates
the need for explicit load/store pairing in a prepass.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Extend csel_swizzle to branches
Alyssa Rosenzweig [Tue, 24 Sep 2019 13:06:37 +0000 (09:06 -0400)]
pan/midgard: Extend csel_swizzle to branches

Conditions for branches don't have a swizzle explicitly in the emitted
binary, but they do implicitly get swizzled in whatever instruction
wrote r31, so we need to handle that.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add helpers for scheduling conditionals
Alyssa Rosenzweig [Mon, 23 Sep 2019 19:37:53 +0000 (15:37 -0400)]
pan/midgard: Add helpers for scheduling conditionals

Conditional instructions (csel and conditional branches) require their
condition to be written to a special condition pipeline register (r31.w
for scalar, r31.xyzw for vector). However, pipeline registers are live
only for the duration of a single bundle. As such, the logic to schedule
conditionals correctly is surprisingly complex. Essentially, we see if we
could stuff the conditional within the same bundle as the csel/branch
without breaking anything; if we can, we do that. If we can't, we add a
dummy move to make room.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Implement predicate->unit
Alyssa Rosenzweig [Fri, 27 Sep 2019 12:19:51 +0000 (08:19 -0400)]
pan/midgard: Implement predicate->unit

This allows ALUs to select for each unit of the bundle separately.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add predicate->exclude
Alyssa Rosenzweig [Fri, 27 Sep 2019 19:43:18 +0000 (15:43 -0400)]
pan/midgard: Add predicate->exclude

A bit of a kludge but allows setting an implicit dependency of synthetic
conditional moves on the actual condition, fixing code generated like:

   vmul.feq r0, ..
   sadd.imov r31, .., r0
   vadd.fcsel [...]

The imov runs simultaneously with feq so it gets garbage results, but it's
too late to add an actual dependency practically speaking, since the new
synthetic imov doesn't have a node associated.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add constant intersection filters
Alyssa Rosenzweig [Mon, 23 Sep 2019 20:07:53 +0000 (16:07 -0400)]
pan/midgard: Add constant intersection filters

In the future, we will want to keep track of which components of
constants of various sizes correspond to which parts of the bundle
constants, like in the old scheduler. For now, let's just stub it out
for a simple rule of one instruction with embedded constants per bundle.
We can eventually do better, of course.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Remove csel constant unit force
Alyssa Rosenzweig [Mon, 23 Sep 2019 11:51:08 +0000 (07:51 -0400)]
pan/midgard: Remove csel constant unit force

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add mir_schedule_texture/ldst/alu helpers
Alyssa Rosenzweig [Sun, 22 Sep 2019 13:08:33 +0000 (09:08 -0400)]
pan/midgard: Add mir_schedule_texture/ldst/alu helpers

We don't actually do any scheduling here yet, but add per-tag helpers to
consume an instruction, print it, pop it off the worklist.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add mir_choose_bundle helper
Alyssa Rosenzweig [Sun, 22 Sep 2019 13:01:07 +0000 (09:01 -0400)]
pan/midgard: Add mir_choose_bundle helper

It's not always obvious what the optimal bundle type should be. Let's
break out the logic to decide.

Currently set for purely in-order operation.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add mir_update_worklist helper
Alyssa Rosenzweig [Sun, 22 Sep 2019 12:50:22 +0000 (08:50 -0400)]
pan/midgard: Add mir_update_worklist helper

After we've chosen an instruction, popped it off, and processed it, it's
time to update the worklist, removing that instruction from the
dependency graph to allow its dependents to be put onto the worklist.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
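
The worklist update described above can be sketched with a pending-dependency counter per instruction. The data layout and names here are hypothetical; they only illustrate how removing a scheduled instruction from the graph releases its dependents.

```c
#include <stdbool.h>

#define SKETCH_MAX_DEPS 4  /* arbitrary limit for this sketch */

struct sketch_node {
    int pending;                      /* unscheduled instructions this one waits on */
    int dependents[SKETCH_MAX_DEPS];  /* indices of instructions waiting on this one */
    int nr_dependents;
};

/* Called after instruction `done` has been chosen and processed:
 * take it off the worklist and release dependents that became ready. */
static void sketch_update_worklist(struct sketch_node *nodes, bool *worklist,
                                   int done)
{
    worklist[done] = false;

    for (int i = 0; i < nodes[done].nr_dependents; ++i) {
        int dep = nodes[done].dependents[i];

        if (--nodes[dep].pending == 0)
            worklist[dep] = true;
    }
}
```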
4 years agopan/midgard: Add mir_choose_instruction stub
Alyssa Rosenzweig [Wed, 18 Sep 2019 12:26:30 +0000 (08:26 -0400)]
pan/midgard: Add mir_choose_instruction stub

In the future, this routine will implement the core scheduling logic to
decide which instruction out of the worklist will be scheduled next, in
a way that minimizes cycle count and register pressure.

In the present, we are more interested in replicating in-order
scheduling with the much-more-powerful out-of-order model. So rather
than discriminating by a register pressure estimate, we simply choose
the latest possible instruction in the worklist.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
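
The stub's selection rule, choosing the latest ready instruction, amounts to a backwards scan. The boolean-array worklist and the name `sketch_choose_latest` are assumptions made for this sketch.

```c
#include <stdbool.h>

/* Returns the index of the latest instruction on the worklist,
 * or -1 when the worklist is empty. */
static int sketch_choose_latest(const bool *worklist, int count)
{
    for (int i = count - 1; i >= 0; --i) {
        if (worklist[i])
            return i;
    }
    return -1;
}
```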
4 years agopan/midgard: Initialize worklist
Alyssa Rosenzweig [Sat, 31 Aug 2019 18:56:58 +0000 (11:56 -0700)]
pan/midgard: Initialize worklist

This flows naturally from the dependency graph

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Calculate dependency graph
Alyssa Rosenzweig [Sat, 31 Aug 2019 18:55:31 +0000 (11:55 -0700)]
pan/midgard: Calculate dependency graph

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add flatten_mir helper
Alyssa Rosenzweig [Sat, 31 Aug 2019 18:08:39 +0000 (11:08 -0700)]
pan/midgard: Add flatten_mir helper

We would like to flatten a linked list of midgard_instructions into an
array of midgard_instruction pointers on the heap.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
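
Flattening a linked list into a heap array of pointers can be sketched like this. `sketch_ins` is a hypothetical stand-in for midgard_instruction, and the real helper uses Mesa's own list macros rather than a bare `next` pointer.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an instruction in a singly linked list. */
struct sketch_ins {
    int id;
    struct sketch_ins *next;
};

/* Returns a heap-allocated array of pointers to the list's nodes, in
 * list order, and stores the element count in *len. The caller frees
 * the array. Returns NULL for an empty list or on allocation failure. */
static struct sketch_ins **sketch_flatten(struct sketch_ins *head, size_t *len)
{
    size_t n = 0;

    for (struct sketch_ins *i = head; i != NULL; i = i->next)
        n++;

    *len = n;
    if (n == 0)
        return NULL;

    struct sketch_ins **out = calloc(n, sizeof(*out));
    if (out == NULL) {
        *len = 0;
        return NULL;
    }

    size_t k = 0;
    for (struct sketch_ins *i = head; i != NULL; i = i->next)
        out[k++] = i;

    return out;
}
```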
4 years agopan/midgard: Squeeze indices before scheduling
Alyssa Rosenzweig [Fri, 27 Sep 2019 12:20:17 +0000 (08:20 -0400)]
pan/midgard: Squeeze indices before scheduling

This allows node_count to be correct while scheduling.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Fix component count handling for ldst
Alyssa Rosenzweig [Fri, 27 Sep 2019 21:07:30 +0000 (17:07 -0400)]
pan/midgard: Fix component count handling for ldst

It's not based on the writemask and it can't be inferred; it's just
intrinsic to the op itself.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Add missing parans in SWIZZLE definition
Alyssa Rosenzweig [Fri, 27 Sep 2019 21:07:10 +0000 (17:07 -0400)]
pan/midgard: Add missing parans in SWIZZLE definition

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agonouveau: set lower_sub = true
Daniel Schürmann [Fri, 27 Sep 2019 10:12:25 +0000 (12:12 +0200)]
nouveau: set lower_sub = true

Subtractions are already implemented as additions anyway.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agov3d: Enable the late algebraic optimizations to get real subs.
Eric Anholt [Wed, 25 Sep 2019 18:56:06 +0000 (11:56 -0700)]
v3d: Enable the late algebraic optimizations to get real subs.

This worked better than my original v3d-local pass for just subs, and is a
huge win over not producing subs.

total instructions in shared programs: 6408469 -> 6167932 (-3.75%)
total threads in shared programs: 153784 -> 154104 (0.21%)
total uniforms in shared programs: 2157078 -> 1905823 (-11.65%)
total max-temps in shared programs: 904546 -> 895796 (-0.97%)
total spills in shared programs: 4959 -> 4993 (0.69%)
total fills in shared programs: 6558 -> 6670 (1.71%)
total sfu-stalls in shared programs: 25845 -> 25175 (-2.59%)
total inst-and-stalls in shared programs: 6434314 -> 6193107 (-3.75%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agoaco: call nir_opt_algebraic_late() exhaustively
Daniel Schürmann [Thu, 26 Sep 2019 10:08:13 +0000 (12:08 +0200)]
aco: call nir_opt_algebraic_late() exhaustively

57559 shaders in 28980 tests
Totals:
SGPRS: 2963407 -> 2959935 (-0.12 %)
VGPRS: 2014812 -> 2016328 (0.08 %)
Spilled SGPRs: 1077 -> 1077 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
Code Size: 114545436 -> 114498084 (-0.04 %) bytes
LDS: 933 -> 933 (0.00 %) blocks
Max Waves: 375997 -> 375866 (-0.03 %)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agoradv/aco: Don't lower subtractions
Daniel Schürmann [Wed, 25 Sep 2019 14:34:29 +0000 (16:34 +0200)]
radv/aco: Don't lower subtractions

40228 shaders in 20236 tests
Totals:
SGPRS: 2045512 -> 2046496 (0.05 %)
VGPRS: 1430856 -> 1430464 (-0.03 %)
Spilled SGPRs: 1077 -> 1077 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
Code Size: 77202840 -> 77151832 (-0.07 %) bytes
LDS: 863 -> 863 (0.00 %) blocks
Max Waves: 260729 -> 260754 (0.01 %)

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agonir: Remove unnecessary subtraction optimizations
Daniel Schürmann [Wed, 25 Sep 2019 14:33:10 +0000 (16:33 +0200)]
nir: Remove unnecessary subtraction optimizations

These optimizations are already covered after lowering.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agonir: recombine nir_op_*sub when lower_sub = false
Daniel Schürmann [Wed, 25 Sep 2019 14:20:09 +0000 (16:20 +0200)]
nir: recombine nir_op_*sub when lower_sub = false

There are some optimizations which are only implemented for additions
and some optimizations which assume that subtractions have been lowered.
By lowering all subtractions first and later recombining them for backends
which prefer this option, we don't have to implement them twice.

This patch also moves lower_negate to nir_opt_algebraic_late() to enable
these optimizations for backends which make use of it.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
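
The lower-then-recombine idea can be illustrated on a toy expression tree. NIR expresses such rewrites as algebraic pattern rules; the hand-rolled structures and function names below are hypothetical and exist only to show the two directions of the transformation.

```c
#include <stddef.h>

enum sketch_op { SK_VAR, SK_ADD, SK_SUB, SK_NEG };

struct sketch_expr {
    enum sketch_op op;
    struct sketch_expr *src0;  /* unused for SK_VAR */
    struct sketch_expr *src1;  /* unused for SK_VAR and SK_NEG */
};

/* Lowering: a - b  ->  a + (-b). The caller supplies storage for the
 * new negate node so the sketch needs no allocator. */
static void sketch_lower_sub(struct sketch_expr *e,
                             struct sketch_expr *neg_storage)
{
    if (e->op != SK_SUB)
        return;

    neg_storage->op = SK_NEG;
    neg_storage->src0 = e->src1;
    neg_storage->src1 = NULL;

    e->op = SK_ADD;
    e->src1 = neg_storage;
}

/* Late recombination: a + (-b)  ->  a - b, for backends that prefer a
 * native subtract. */
static void sketch_recombine_sub(struct sketch_expr *e)
{
    if (e->op == SK_ADD && e->src1 != NULL && e->src1->op == SK_NEG) {
        e->op = SK_SUB;
        e->src1 = e->src1->src0;
    }
}
```

Between the two steps only additions exist, so optimizations written for additions apply to former subtractions as well.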
4 years agofreedreno: Enable the nir_opt_algebraic_late() pass.
Daniel Schürmann [Fri, 27 Sep 2019 10:49:06 +0000 (12:49 +0200)]
freedreno: Enable the nir_opt_algebraic_late() pass.

Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agovc4: Enable the nir_opt_algebraic_late() pass.
Eric Anholt [Thu, 26 Sep 2019 18:03:46 +0000 (11:03 -0700)]
vc4: Enable the nir_opt_algebraic_late() pass.

Upcoming changes to sub optimization will make this pass required.  Over
the course of that series, we see uniforms +.46%, instructions -.24%
(seems like a fine tradeoff -- uniforms are 1/2 the size of instructions
as far as cache occupancy)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agogitlab-ci: Add test-container:arm64 to needs: for arm64 test jobs
Michel Dänzer [Wed, 18 Sep 2019 14:28:41 +0000 (16:28 +0200)]
gitlab-ci: Add test-container:arm64 to needs: for arm64 test jobs

Without this, it was theoretically possible for the jobs to run before
the docker image was ready.

v2:
* Use - list syntax instead of [] (Eric Engestrom)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agogitlab-ci: Add needs: for x86 buster docker image
Michel Dänzer [Wed, 18 Sep 2019 14:21:32 +0000 (16:21 +0200)]
gitlab-ci: Add needs: for x86 buster docker image

This allows most build jobs to run before the stretch or arm64 docker
images are ready.

v2:
* Use - list syntax instead of [] (Eric Engestrom)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agogitlab-ci: Declare needs: for stretch docker image
Michel Dänzer [Wed, 18 Sep 2019 14:17:01 +0000 (16:17 +0200)]
gitlab-ci: Declare needs: for stretch docker image

This allows the *-old-llvm jobs to run before the buster docker images
are ready.

v2:
* Use - list syntax instead of [] (Eric Engestrom)

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoscons: Fix MSYS2 Mingw-w64 build.
pal1000 [Fri, 1 Mar 2019 10:30:15 +0000 (12:30 +0200)]
scons: Fix MSYS2 Mingw-w64 build.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This patch is based on https://github.com/msys2/MINGW-packages/blob/28e3f85e09b6947ea80036c49f6c38f1394f93ca/mingw-w64-mesa/link-ole32.patch but with tweaks to avoid MSVC build break when applied.

v2: Create Mingw platform alias pointing to windows host platform define to avoid spurious crosscompilation;

v3: Fix obviously wrong compiler flags for swr driver;

v4: Update original patch URL because it has been relocated;

v5: Don't bother patching autotools stuff as it's not used by MSYS2 Mingw-w64 build and its days are numbered anyway;

v6: After Mingw posix flag fix in 295851eb things are far simpler as we don't need more linking of uuid, ole32, version and shell32 than what is already in place.

4 years agoscons/windows: Support build with LLVM 9.
pal1000 [Fri, 6 Sep 2019 14:34:30 +0000 (17:34 +0300)]
scons/windows: Support build with LLVM 9.

As X86AsmPrinter component is gone, LLVMX86AsmPrinter got replaced
with LLVMRemarks, LLVMBitstreamReader and LLVMDebugInfoDWARF.

Tests done with llvm-config on both LLVM 8 and 9 indicate that
mcjit, bitwriter and x86asmprinter fully fit inside engine component.

On other platforms and with meson build mcdisassembler was used to replace
X86AsmPrinter but mcdisassembler also fully fits inside engine component
for LLVM>=8 according to same tests.

v2: Avoid duplicating code related to Mingw pthreads.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
On 19.1 this patch does not apply cleanly without 88eb2a1f

4 years agolima: set uniforms_address lower bits properly
Vasily Khoruzhick [Sat, 28 Sep 2019 16:57:55 +0000 (09:57 -0700)]
lima: set uniforms_address lower bits properly

Looks like the blob uses the following values for the uniforms buffer:

0 for 8 bytes
1 for 16 bytes
2 for 24 bytes
2 for 32 bytes
3 for 40 bytes
3 for 48 bytes
3 for 56 bytes
3 for 64 bytes
4 for 72 bytes

It all looks like log2(size / 8) rounded up, so let's do the same.

Fixes: 931fc2a7b3f9 ("lima: do not set the PP uniforms address lowest bits")
Reviewed-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
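
The table above matches ceil(log2(size / 8)). A minimal sketch of that computation, assuming the size is a nonzero multiple of 8 bytes; `sketch_uniform_size_bits` is a hypothetical name and not the function used in the lima driver.

```c
#include <stdint.h>

/* Returns ceil(log2(size_bytes / 8)) for a size that is a nonzero
 * multiple of 8, i.e. the smallest `bits` with (1 << bits) >= size / 8. */
static unsigned sketch_uniform_size_bits(uint32_t size_bytes)
{
    uint32_t units = size_bytes / 8;
    unsigned bits = 0;

    /* bits < 31 keeps the shift well defined for very large sizes. */
    while (bits < 31 && (1u << bits) < units)
        bits++;

    return bits;
}
```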
4 years agoscons: add py3 support
Michel Zou [Sat, 28 Sep 2019 06:53:38 +0000 (08:53 +0200)]
scons: add py3 support

SCons 3.1 has moved to python 3, requiring this fix
to continue supporting scons builds.

Closes: #944
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Eric Engestrom <eric@engestrom.ch>
4 years agoandroid: aco: add support for libmesa_aco
Mauro Rossi [Sat, 21 Sep 2019 15:58:52 +0000 (17:58 +0200)]
android: aco: add support for libmesa_aco

Android build rules are added in src/amd/Android.compiler.mk.
The libmesa_aco static library is built only when radeonsi is enabled,
as is done for the vulkan.radv module.

This prevents Android build errors on non-x86 systems.

Filter out the compiler/aco_instruction_selection_setup.cpp source, as it is
already included by compiler/aco_instruction_selection.cpp and would
otherwise cause multiple-definition linker errors.

NOTE: libLLVM requires AMDGPU Disassembler to build radv with aco

Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Fixes: a70a998 ("radv/aco: Setup alternate path in RADV to support the experimental ACO compiler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
4 years agoandroid: compiler/nir: build nir_divergence_analysis.c
Mauro Rossi [Sat, 21 Sep 2019 15:48:52 +0000 (17:48 +0200)]
android: compiler/nir: build nir_divergence_analysis.c

Prerequisite to avoid the following radv linking error that happens with aco:

FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/vulkan.radv_intermediates/LINKED/vulkan.radv.so
...
external/mesa/src/amd/compiler/aco_instruction_selection_setup.cpp:178:
error: undefined reference to 'nir_divergence_analysis'
clang.real: error: linker command failed with exit code 1 (use -v to see invocation)

Fixes: df86c5f ("nir: add divergence analysis pass.")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
4 years agoandroid: aco: fix undefined template 'std::__1::array' build errors
Mauro Rossi [Sat, 21 Sep 2019 15:38:52 +0000 (17:38 +0200)]
android: aco: fix undefined template 'std::__1::array' build errors

Fixes a few building errors similar to the following:

In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:26:
In file included from external/libcxx/include/algorithm:639:
external/libcxx/include/utility:321:9:
error: implicit instantiation of undefined template 'std::__1::array<aco::Temp, 4>'
    _T2 second;
        ^

Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
4 years agoetnaviv: nir: fix gl_FragDepth
Jonathan Marek [Thu, 12 Sep 2019 20:12:02 +0000 (16:12 -0400)]
etnaviv: nir: fix gl_FragDepth

Fixes the following piglit test: fragdepth_gles2 (for ETNA_MESA_DEBUG=nir)

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: disable earlyZ when shader writes fragment depth
Jonathan Marek [Tue, 17 Sep 2019 11:49:46 +0000 (07:49 -0400)]
etnaviv: disable earlyZ when shader writes fragment depth

Fixes the following piglit test: fragdepth_gles2

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: nir: make lower_alu easier to follow
Jonathan Marek [Thu, 12 Sep 2019 17:51:32 +0000 (13:51 -0400)]
etnaviv: nir: make lower_alu easier to follow

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: remove extra allocation for shader code
Jonathan Marek [Thu, 12 Sep 2019 17:17:21 +0000 (13:17 -0400)]
etnaviv: remove extra allocation for shader code

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: nir: remove "options" struct
Jonathan Marek [Thu, 12 Sep 2019 17:14:20 +0000 (13:14 -0400)]
etnaviv: nir: remove "options" struct

It just makes things more complicated for no reason.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: nir: use store_deref instead of store_output
Jonathan Marek [Thu, 12 Sep 2019 16:28:28 +0000 (12:28 -0400)]
etnaviv: nir: use store_deref instead of store_output

Allows some simplification.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: nir: add native integers (HALTI2+)
Jonathan Marek [Wed, 11 Sep 2019 20:59:31 +0000 (16:59 -0400)]
etnaviv: nir: add native integers (HALTI2+)

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: nir: use new immediates when possible
Jonathan Marek [Thu, 12 Sep 2019 20:07:14 +0000 (16:07 -0400)]
etnaviv: nir: use new immediates when possible

Note it can still be improved a bit:
* Use alu swizzle to determine if src is scalar
* Take into account new immediates in the multiple uniform src lowering

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: nir: set num_components for inputs/outputs
Jonathan Marek [Wed, 11 Sep 2019 20:45:05 +0000 (16:45 -0400)]
etnaviv: nir: set num_components for inputs/outputs

This can improve performance by allowing the LAST_VARYING_2X bit to be
set when possible (and possibly brings more benefits on HALTI5, where the
number of components is set for each varying).

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: nir: allocate contiguous components for LOAD destination
Jonathan Marek [Wed, 11 Sep 2019 18:29:10 +0000 (14:29 -0400)]
etnaviv: nir: allocate contiguous components for LOAD destination

LOAD starts reading into the first enabled destination component, and
doesn't skip disabled components, so we need to allocate a destination with
contiguous components.
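
The constraint above can be sketched as a check on a destination write
mask. This is only an illustration, not the driver's register allocator:
the 4-bit mask layout (bit 0 = x ... bit 3 = w) and the helper name
mask_is_contiguous() are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when the enabled components of a
 * 4-bit write mask (bit 0 = x, bit 3 = w) form one unbroken run, which
 * is what a LOAD destination needs according to the text above. */
static bool
mask_is_contiguous(unsigned mask)
{
   assert(mask <= 0xf);

   if (mask == 0)
      return false;

   /* Drop the disabled low components so the run starts at bit 0. */
   while (!(mask & 1))
      mask >>= 1;

   /* What is left must be 0b1, 0b11, 0b111 or 0b1111, i.e. mask + 1
    * is a power of two. */
   return (mask & (mask + 1)) == 0;
}
```

With this check, a .yz destination (0x6) is usable for LOAD, while .xz
(0x5) is not, because the disabled y component would not be skipped.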

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: nir: fix gl_FrontFacing
Jonathan Marek [Wed, 11 Sep 2019 17:42:43 +0000 (13:42 -0400)]
etnaviv: nir: fix gl_FrontFacing

Only invert front facing when glFrontFace is GL_CW.

Fixes the following dEQP test:
dEQP-GLES2.functional.shaders.builtin_variable.frontfacing

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agolima: do not set the PP uniforms address lowest bits
Icenowy Zheng [Thu, 26 Sep 2019 15:25:09 +0000 (23:25 +0800)]
lima: do not set the PP uniforms address lowest bits

The PP uniforms address register in render state is not a direct pointer
to the uniforms storage -- instead, it points to a one-item array, and
the array item is the real pointer to the uniforms storage.

This register reuses some of its LSBs as a size field. Currently the
size is set according to the length of the real uniforms storage.
However, as the register itself contains only a pointer to the one-item
array, the size field should be set to the length of the one-item array
minus 1, which is a fixed value of 0. That means we can just omit it
now.

Testing suggests this is the correct way to set this register.
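
A minimal sketch of the arithmetic described above, not the driver code:
the width of the size field (SIZE_FIELD_MASK) and the function name
pp_uniforms_reg() are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed width of the size field in the low bits; the real width is
 * not stated in this commit message. */
#define SIZE_FIELD_MASK 0xfu

/* Hypothetical packing: address of the pointer array in the high bits,
 * (array length - 1) in the low bits. */
static uint32_t
pp_uniforms_reg(uint32_t array_addr, uint32_t array_len)
{
   assert(array_len >= 1);
   assert((array_addr & SIZE_FIELD_MASK) == 0);

   return array_addr | ((array_len - 1) & SIZE_FIELD_MASK);
}
```

For the one-item array, array_len is 1, so the size field is 0 and the
register value is just the array address.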

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
4 years agoglsl: disallow incompatible matrices multiplication
Andrii Simiklit [Tue, 10 Sep 2019 14:00:32 +0000 (17:00 +0300)]
glsl: disallow incompatible matrices multiplication

GLSL 4.40 spec, section 5.9 "Expressions":
"The operator is multiply (*), where both operands are matrices or one operand is a vector and the
 other a matrix. A right vector operand is treated as a column vector and a left vector operand as a
 row vector. In all these cases, it is required that the number of columns of the left operand is equal
 to the number of rows of the right operand. Then, the multiply (*) operation does a linear
 algebraic multiply, yielding an object that has the same number of rows as the left operand and the
 same number of columns as the right operand. Section 5.10 “Vector and Matrix Operations”
 explains in more detail how vectors and matrices are operated on."

This fix disallows multiplication of incompatible matrices such as:
mat4x3(..) * mat4x3(..)
mat4x2(..) * mat4x2(..)
mat3x2(..) * mat3x2(..)
....
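
The rule quoted from the spec can be sketched as a small type check.
This is not the GLSL compiler's code; struct mat_type and mat_mul_type()
are hypothetical names. Recall that in GLSL matCxR has C columns and R
rows, so mat4x3 has 4 columns and 3 rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GLSL matrix type: matCxR. */
struct mat_type {
   unsigned columns;
   unsigned rows;
};

/* Returns false when a * b is not allowed. On success, *result has the
 * rows of the left operand and the columns of the right operand. */
static bool
mat_mul_type(struct mat_type a, struct mat_type b, struct mat_type *result)
{
   assert(result != NULL);

   /* Columns of the left operand must equal rows of the right one. */
   if (a.columns != b.rows)
      return false;

   result->columns = b.columns;
   result->rows = a.rows;
   return true;
}
```

Under this check mat4x3 * mat4x3 is rejected (4 columns against 3 rows),
while mat4x3 * mat3x4 is accepted and yields a mat3x3.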

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111664
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
4 years agoturnip: Fix failure behavior of vkCreateGraphicsPipelines.
Eric Anholt [Thu, 19 Sep 2019 18:09:46 +0000 (11:09 -0700)]
turnip: Fix failure behavior of vkCreateGraphicsPipelines.

According to the 1.1.123 spec:

    "The implementation will attempt to create all pipelines, and only
     return VK_NULL_HANDLE values for those that actually failed."
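
The behavior the spec asks for can be sketched as follows. This is a
self-contained illustration, not turnip's implementation: result_t,
pipeline_t, create_one() and create_all() are hypothetical stand-ins
for VkResult, VkPipeline and the driver's entry points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch builds without Vulkan headers. */
typedef int result_t;
#define RESULT_SUCCESS 0
#define RESULT_ERROR   (-1)
typedef void *pipeline_t;
#define NULL_HANDLE NULL

static int dummy_pipeline;

/* Hypothetical single-pipeline creation; a negative create info
 * simulates a failure. */
static result_t
create_one(int create_info, pipeline_t *out)
{
   if (create_info < 0)
      return RESULT_ERROR;
   *out = &dummy_pipeline;
   return RESULT_SUCCESS;
}

/* Attempt every pipeline; only the failed slots get NULL_HANDLE. */
static result_t
create_all(const int *infos, size_t count, pipeline_t *pipelines)
{
   result_t final_result = RESULT_SUCCESS;

   assert(count == 0 || (infos != NULL && pipelines != NULL));

   for (size_t i = 0; i < count; i++) {
      result_t r = create_one(infos[i], &pipelines[i]);
      if (r != RESULT_SUCCESS) {
         pipelines[i] = NULL_HANDLE;
         final_result = r; /* remember the error, but keep going */
      }
   }
   return final_result;
}
```

The point of the loop is that a failure in one slot does not stop the
remaining pipelines from being created.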

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agoturnip: Silence compiler warning about uninit pipeline.
Eric Anholt [Thu, 19 Sep 2019 18:08:25 +0000 (11:08 -0700)]
turnip: Silence compiler warning about uninit pipeline.

The code was fine as far as I can see, but the warning was irritating.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agoturnip: Add a .editorconfig and .dir-locals.el
Eric Anholt [Thu, 19 Sep 2019 18:03:35 +0000 (11:03 -0700)]
turnip: Add a .editorconfig and .dir-locals.el

I was inheriting the one from src/freedreno with funny tabs, while
this driver is written with normal Mesa 3-space indents.
Unfortunately I have to add both files, because I use emacs and emacs
prefers .dir-locals to .editorconfig :(

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agoshader_enums: Move MAX_DRAW_BUFFERS to this file.
Eric Anholt [Thu, 19 Sep 2019 17:54:08 +0000 (10:54 -0700)]
shader_enums: Move MAX_DRAW_BUFFERS to this file.

We include shader_enums.h from freedreno's compiler for both GL and
Vulkan, and the main/config.h include resulted in polluting the
namespace with things like MAX_VIEWPORTS that other Vulkan drivers use
as their driver-specific maximums.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
4 years agointel/fs: Fix fs_inst::flags_read for ANY/ALL predicates
Jason Ekstrand [Tue, 24 Sep 2019 22:06:12 +0000 (17:06 -0500)]
intel/fs: Fix fs_inst::flags_read for ANY/ALL predicates

Without this, we were DCEing flag writes: we didn't think their results
were used, because we didn't understand that an ANY32 predicate
actually reads all the flags.

Fixes: df1aec763eb "i965/fs: Define methods to calculate the flag..."
Reviewed-by: Matt Turner <mattst88@gmail.com>
4 years agoetnaviv: support ARB_framebuffer_object
Christian Gmeiner [Fri, 27 Sep 2019 08:39:30 +0000 (10:39 +0200)]
etnaviv: support ARB_framebuffer_object

Passes most of piglit's tests regarding arb_framebuffer_object
and unlocks some more piglit tests.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
4 years agoetnaviv: etna_resource_copy_region(..): drop assert
Christian Gmeiner [Fri, 27 Sep 2019 11:08:24 +0000 (13:08 +0200)]
etnaviv: etna_resource_copy_region(..): drop assert

We are using util_resource_copy_region(..) as fallback which supports
different formats for src and dst. Improves the experience when running
deqp or piglit with a debug build.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>