Bas Nieuwenhuizen [Mon, 30 Sep 2019 21:20:05 +0000 (23:20 +0200)]
radv: Fix warning in 32-bit build.
uintptr_t is 32 bits in a 32-bits build, resulting in shifting out
of bounds.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Wed, 2 Oct 2019 19:26:01 +0000 (21:26 +0200)]
radv: Fix condition for skipping the continue CS.
We need the continue CS for referencing the tess/GDS/sample position BOs.
Fixes: 46e52df34d3 "radv: add tessellation ring allocation support. (v2)"
Fixes: e1dc3ab7534 "radv/gfx10: allocate GDS/OA buffer objects for NGG streamout"
Fixes: 1171b304f30 "radv: overhaul fragment shader sample positions."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Michel Dänzer [Tue, 1 Oct 2019 14:00:16 +0000 (16:00 +0200)]
gitlab-ci: Use per-job ccache
Instead of a single cache shared between all jobs, but reduce the
maximum cache size to 1.5G (from 5G).
Rationale for smaller cache:
Pulling & pushing a 5G cache could take a long time. Consider
https://gitlab.freedesktop.org/mesa/mesa/-/jobs/684010 (click the "Show
complete raw" button to see timestamps): Pulling the cache took
1569927241-
1569927194 = 47 seconds, pushing it
1569927671-
1569927519
= 152, for a total of 199 seconds. The actual build took comparable
1569927518-
1569927243 = 275 seconds, despite no cache hits from ccache.
In other words, the cache transfers almost doubled the job duration,
and they would have negated any build time benefits from ccache even
with a high cache hit rate.
Also, the smaller caches avoid blowing up storage requirements for them
too much.
Rationale for per-job caches:
Making a single cache significantly smaller might result in cached
build products from one job getting evicted by another job, reducing
the likelihood of cache hits from previous pipelines.
v2:
* Move up "ccache --max-size=1500M" call (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Gurchetan Singh [Wed, 25 Sep 2019 17:44:44 +0000 (10:44 -0700)]
virgl: honor winsys supplied metadata
To truly to do this correctly, we'll have to fix the discrepancy between
drm_virtgpu_3d_transfer_to_host and virtio_gpu_transfer_host_3d. However,
this is a good starting point.
Since virtio-gpu only supports self-import and export, this should be fine.
Let's only do WINSYS_HANDLE_TYPE_FD for this currently.
Reviewed by: Robert Tarasov <tutankhamen@chromium.org>
Gurchetan Singh [Wed, 25 Sep 2019 17:33:16 +0000 (10:33 -0700)]
virgl: modify internal structures to track winsys-supplied data
The winsys might supply dimensions that are different than
those we calculate. In additional, it may supply virtualized
modifiers.
In practice, a stride != bpp * width and virtualized modifiers don't
happen yet, but the plan is to move in that direction.
Also make virgl_resource_layout static.
Reviewed by: Robert Tarasov <tutankhamen@chromium.org>
Gurchetan Singh [Wed, 25 Sep 2019 17:06:23 +0000 (10:06 -0700)]
virgl: modify resource_create_from_handle(..) callback
This commit makes no functional changes, just adds the revelant
plumbing.
Reviewed by: Robert Tarasov <tutankhamen@chromium.org>
Gurchetan Singh [Wed, 25 Sep 2019 16:25:46 +0000 (09:25 -0700)]
virgl: remove stride from virgl_hw_res
It's not used anywhere, and stride isn't really an intrinsic
property of a GEM buffer.
Reviewed by: Robert Tarasov <tutankhamen@chromium.org>
Lionel Landwerlin [Wed, 2 Oct 2019 14:13:06 +0000 (17:13 +0300)]
intel: fix topology query
i915 will report ENODEV on generations prior to Haswell because there
is no point in reporting values on those. This is prior any fusing
could happen on parts with identical PCI ids.
This query call was previously only triggered on generations that
support performance queries, which happens to match generation for
which i915 reports topology, but the commit pointed below started
using it on all generations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1860
Cc: <mesa-stable@lists.freedesktop.org>
Fixes: 96e1c945f2 ("i965: Move device info initialization to common code")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Caio Marcelo de Oliveira Filho [Tue, 1 Oct 2019 03:47:58 +0000 (20:47 -0700)]
docs: Fix GL_EXT_demote_to_helper_invocation name
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Samuel Pitoiset [Wed, 18 Sep 2019 07:58:54 +0000 (09:58 +0200)]
radv/gfx10: fix the ESGS ring size symbol
Random hangs no longer happen, I'm actually not sure if they were
related to this.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 2 Oct 2019 18:37:43 +0000 (20:37 +0200)]
radv: fix build
Forgot to amend the commit before updating the MR.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Wed, 2 Oct 2019 17:34:52 +0000 (19:34 +0200)]
Revert "radv: disable viewport clamping even if FS doesn't write Z"
This was actually the wrong fix.
This reverts commit
0a313cc285c2939de9cac07f045b0b699bc208ca.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 18 Sep 2019 13:43:13 +0000 (15:43 +0200)]
radv: rework the slow depthstencil clear to write depth from PS
Make sure to export the expected clear values to the depth
stencil attachment.
This fixes dEQP-VK.pipeline.depth_range_unrestricted.* on GFX10.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 17 Sep 2019 16:52:02 +0000 (18:52 +0200)]
radv/gfx10: fix NGG streamout with triangle strips for VS
The number of vertices has to be adjusted with the output primitive
type.
This fixes dEQP-VK.transform_feedback.simple.triangle_strip_*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 17 Sep 2019 09:01:01 +0000 (11:01 +0200)]
radv/gfx10: fix storing/loading NGG stream outputs for GS
The GS outputs are stored differently in the LDS storage, they
are indexed by out_idx which is incremented for each stored DWORD.
Thus, we need a different path for exporting the stream outputs.
This fixes a bunch of CTS failures when NGG GS is force enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 17 Sep 2019 08:51:46 +0000 (10:51 +0200)]
radv/gfx10: use the component mask when storing/loading NGG stream outputs
It's unnecessary to store/load more components that needed.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Tue, 17 Sep 2019 08:43:15 +0000 (10:43 +0200)]
radv/gfx10: fix storing/loading NGG stream outputs for VS and TES
The LDS storage allocated for stream outputs is 4 * N, where N
is the number of outputs. So, we have to store/load with N as index
and not with the output location as index.
This doesn't fix anything known but it should fix out-of-bounds
access and it also reduces the number of outputs written to the
LDS storage.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 16 Sep 2019 14:16:05 +0000 (16:16 +0200)]
radv/gfx10: add missing counter buffer to the BO list
The buffer isn't necessarily used before.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 18 Sep 2019 07:01:38 +0000 (09:01 +0200)]
radv/gfx10: add radv_device::use_ngg
Trivial.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Engestrom [Wed, 2 Oct 2019 12:20:13 +0000 (13:20 +0100)]
git: delete .gitattributes
The last of these was deleted in
44a8e5135470fa51ae36 ("d3d1x: Remove.")
over 6 years ago.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Gert Wollny [Wed, 2 Oct 2019 07:28:55 +0000 (09:28 +0200)]
etnaviv: enable triangle strips only when the hardware supports it
Some hardware has a bug with triangle strips and it is signalled by the
flag BUG_FIXED8 whether this bug has been fixed. So only enable triangle
strips when this flag is set.
Thanks: Jonathan Marek and Christian Gmeiner for the pointers
v2: Add TODO to indicate that the handling should be refined
(Jonathan & Christian)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Dylan Baker [Mon, 30 Sep 2019 18:09:44 +0000 (11:09 -0700)]
meson: remove -DGALLIUM_SOFTPIPE from st/osmesa
It's unused here, and undefined in scons. It is used in targets/osmesa,
but it's properly defined there already.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Lionel Landwerlin [Tue, 1 Oct 2019 08:55:46 +0000 (11:55 +0300)]
mesa: don't forget to clear _Layer field on texture unit
On the Android Antutu benchmark we ran into an assert in ISL where the
(base layer + num layers) > total layers. It turns out the core of
mesa forgot to clear the _Layer variable, potentially leaving an
inconsistent value.
v2: Pull setting u->_Layer out of the conditional blocks (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Robin Murphy [Sat, 21 Sep 2019 17:07:28 +0000 (18:07 +0100)]
egl/gbm: Fix config validation
In converting to shift/size-based validation, we lost a condition from
the ARGB/XRGB equivalence check, which left it working one way round
but not the other, and broke applications like glmark2-es2-drm on some
platforms. Restore the equivalent check that *both* configs actually
have an alpha channel before considering a mismatch.
Fixes: 7b4ed2b513ef ("egl: Convert configs to use shifts and sizes instead of masks")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Ken Mays [Thu, 26 Sep 2019 09:47:06 +0000 (09:47 +0000)]
haiku: fix Mesa build
1. The hgl.c file is a read-only file versus read-write.
Ref: src/gallium/state_trackers/hgl/hgl.c
2. I've included the Haiku-specific patches I used to get a successful
build of Mesa 19.1.7 on Haiku using the meson/ninja build procedure.
Shows "[764/764] linking target ... libswpipe.so" at build completion.
v2:
Remove autotools files (Eric)
v3:
Update the patch
Reported-by: Ken Mays <kmays2000@gmail.com>
Tested-by: Ken Mays <kmays2000@gmail.com>
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Alexander von Gluck IV <kallisti5@unixzen.com>
Michel Dänzer [Mon, 30 Sep 2019 08:36:04 +0000 (10:36 +0200)]
gitlab-ci: Set ccache path for cross compilers in meson cross file
Without this, meson didn't pick up ccache for cross builds.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Andres Gomez [Wed, 25 Sep 2019 23:56:29 +0000 (02:56 +0300)]
docs/relnotes: add support for GL_ARB_gl_spirv, GL_ARB_spirv_extensions and OpenGL 4.6 on i965 and iris
After
41549a18e6c ("i965: Enable OpenGL 4.6 for Gen8+"), i965
implements GL_ARB_gl_spirv, GL_ARB_spirv_extensions and OpenGL 4.6.
After
15e439071d8 ("iris: Enable ARB_gl_spirv and ARB_spirv_extensions"),
iris implements GL_ARB_gl_spirv, GL_ARB_spirv_extensions and OpenGL
4.6.
v2:
- Explicit the support is for i965 and iris.
v3:
- Add also GL_ARB_spirv_extensions to the release notes (Alejandro).
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Kevin Strasser [Thu, 12 Sep 2019 16:38:24 +0000 (09:38 -0700)]
egl: Fix implicit declaration of ffs
Found when building for Android in C99 mode. Include bitscan.h to ensure ffs is
available.
Fixes: 7b4ed2b5 ("egl: Convert configs to use shifts and sizes instead of masks")
Signed-off-by: Kevin Strasser <kevin.strasser@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Rafael Antognolli [Mon, 30 Sep 2019 19:34:12 +0000 (12:34 -0700)]
intel/tools: Fix aubinator usage of rb_tree.
The order of comparison has changed, so we need to invert the logic of
"insert_left" when using rb_tree_insert_at().
Fixes: dae33052dbf (util/rb_tree: Reverse the order of comparison
functions).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 18:08:44 +0000 (11:08 -0700)]
docs/relnotes: Add EXT_demote_to_helper_invocation support on iris, i965
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 17:56:36 +0000 (10:56 -0700)]
i965: Enable EXT_demote_to_helper_invocation
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 16:20:17 +0000 (09:20 -0700)]
iris: Enable EXT_demote_to_helper_invocation
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 16:21:02 +0000 (09:21 -0700)]
gallium: Add PIPE_CAP_DEMOTE_TO_HELPER_INVOCATION
To enable EXT_demote_to_helper_invocation:
This extension adds a "demote" keyword that is similar to "discard" but
only suppresses subsequent writes and outputs to the framebuffer, and
does not terminate the execution of the invocation. For the remainder
of the execution, the invocation is "demoted" to act like a helper
invocation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 17:50:37 +0000 (10:50 -0700)]
glsl: Add helperInvocationEXT() builtin
From EXT_demote_to_helper_invocation, implemented with the existing
nir_intrinsic_is_helper_invocation.
Such builtin is necessary when using `demote` because we can't
redefine the value of gl_HelperInvocation (since it is an input
variable).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 16:34:19 +0000 (09:34 -0700)]
glsl: Parse `demote` statement
When the EXT_demote_to_helper_invocation extension is enabled,
`demote` is treated as a keyword, and produces an ir_demote.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Fri, 20 Sep 2019 16:27:00 +0000 (09:27 -0700)]
glsl: Add ir_demote
To represent the new `demote` keyword when using
EXT_demote_to_helper_invocation extension. Most of the changes are to
include it in the visitors.
Demote is not considered a control flow, so also include an empty
visit member function in ir_control_flow_visitor.
Only NIR actually supports `demote`, so assert the translations for
TGSI and Mesa's gl_program -- since the demote is not expected to
appear for those.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Caio Marcelo de Oliveira Filho [Thu, 19 Sep 2019 20:54:18 +0000 (13:54 -0700)]
mesa: Extension boilerplate for EXT_demote_to_helper_invocation
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Tue, 24 Sep 2019 03:37:39 +0000 (20:37 -0700)]
iris: Fix iris_rebind_buffer() for VBOs with non-zero offsets.
We can't just check for the BO base address, we need to check for the
full address including any offset we may have applied. When updating
the address, we need to include the offset again.
Fixes: 5ad0c88dbe3 ("iris: Replace buffer backing storage and rebind to update addresses.")
Eric Engestrom [Mon, 30 Sep 2019 18:11:22 +0000 (19:11 +0100)]
docs/install: drop autotools references
19.3 will be the 3rd release without autotools, people know it's gone by now.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Maya Rashish [Tue, 3 Sep 2019 08:55:34 +0000 (11:55 +0300)]
meson: Test for -Wl,--build-id=sha1
instead of hard-coding OS list. Helps Solaris ld builds.
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Signed-off-by: Maya Rashish <coypu@sdf.org>
Dylan Baker [Mon, 30 Sep 2019 18:02:41 +0000 (11:02 -0700)]
docs: remove stray newline
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Dylan Baker [Mon, 30 Sep 2019 18:02:31 +0000 (11:02 -0700)]
docs: use https for mesonbuild.com
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Dylan Baker [Mon, 30 Sep 2019 18:02:09 +0000 (11:02 -0700)]
docs: update install docs for meson
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Marek Olšák [Mon, 16 Sep 2019 23:39:40 +0000 (19:39 -0400)]
ac/nir: fix GLSL imageSamples()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Mon, 16 Sep 2019 23:37:04 +0000 (19:37 -0400)]
ac: add ac_build_image_get_sample_count from radeonsi
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Fri, 13 Sep 2019 22:27:46 +0000 (18:27 -0400)]
ac/surface: don't allocate FMASK if there is no graphics
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 17 Sep 2019 01:19:44 +0000 (21:19 -0400)]
tgsi_to_nir: handle PIPE_FORMAT_NONE in image opcodes
radeonsi doesn't use the format and internal shaders don't set it.
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Dylan Baker [Thu, 9 May 2019 17:32:31 +0000 (10:32 -0700)]
meson: gallium media state trackers require libdrm with x11
v2: - update copyright year in all changed files
- rebase on master
Cc: 19.1 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Kenneth Graunke [Thu, 29 Aug 2019 07:38:15 +0000 (00:38 -0700)]
iris: Disable CCS_E for 32-bit floating point textures.
A while back, Michael Larabel noticed that Paraview's Wavelet Volume
case runs significantly slower on iris than i965. It turns out this
is because we enable CCS_E for 32-bit floating point formats, while
i965 disables it, with an oblique comment saying that we benchmarked
it (on what exactly?) and determined that it was a loss.
Paraview uses both R32_FLOAT and R32G32B32A32_FLOAT, and I observed
large framerate drops when enabling CCS_E for either format. However,
several other benchmarks (Aztec Ruins, many Synmark cases) use 16-bit
floating point formats, with no apparent ill effects.
So, disable compression for 32-bit float formats for now, but leave it
enabled for 16-bit float formats as they seem to be working fine.
Improves performance in Paraview's Wavelet Volume test by 62% on a
Skylake GT4e.
Fixes: 3cfc6a207bd ("iris: Fill out res->aux.possible_usages")
Marek Olšák [Fri, 20 Sep 2019 04:54:22 +0000 (00:54 -0400)]
ac: reorder and print all radeon_info fields
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Fri, 20 Sep 2019 02:17:30 +0000 (22:17 -0400)]
ac: set the number of SDPs same as the number of TCCs
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Fri, 20 Sep 2019 02:16:51 +0000 (22:16 -0400)]
ac: fix num_good_cu_per_sh for harvested chips
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 24 Sep 2019 20:56:57 +0000 (16:56 -0400)]
radeonsi/gfx10: fix corruption for chips with harvested TCCs
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 24 Sep 2019 20:56:21 +0000 (16:56 -0400)]
ac: add radeon_info::tcc_harvested
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 24 Sep 2019 20:47:05 +0000 (16:47 -0400)]
ac: fix incorrect vram_size reported by the kernel
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 24 Sep 2019 19:15:00 +0000 (15:15 -0400)]
radeonsi/gfx10: fix L2 cache rinse programming
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Engestrom [Sun, 29 Sep 2019 21:27:24 +0000 (22:27 +0100)]
etnaviv: fix bitmask typo
Fixes: d92689c46f0d2da05ae6 ("etnaviv: nir: add native integers (HALTI2+)")
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Adam Jackson [Fri, 27 Sep 2019 16:16:22 +0000 (12:16 -0400)]
glx: Log the filename of the drm device if we fail to open it
Helps point the user to the specific device that's having issues, since
you're increasingly likely to have more than one.
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/107
Reviewed-by: Eric Anholt <eric@anholt.net>
pal1000 [Sun, 29 Sep 2019 15:35:29 +0000 (18:35 +0300)]
scons/windows: Enable compute shaders when possible.
Tests done with llvm-config indicate that there are only 2 libraries in
irreader and not in engine, LLVMAsmParser and LLVMIRReader and both of them
are part of coroutines so I replaced irreader with coroutines and added
libraries unique to coroutines.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 17:05:12 +0000 (13:05 -0400)]
pan/midgard: Allow scheduling conditions with constants
Now that we have constant adjustment logic abstracted, we can do this
safely. Along with the csel inversion patch, this allows many more
common csel ops to inline their condition in the bundle.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 16:39:15 +0000 (12:39 -0400)]
pan/midgard: Add csel invert optimization
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 16:38:51 +0000 (12:38 -0400)]
pan/midgard: Add mir_flip helper
Useful for various operations on both commutative and anticommutative
ops.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 16:13:52 +0000 (12:13 -0400)]
pan/midgard: Tightly pack 32-bit constants
If we can reuse constant slots from other instructions, we would like to
do so to include more instructions per bundle.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 14:43:51 +0000 (10:43 -0400)]
pan/midgard: Allow writeout to see into the future
If an instruction could be scheduled to vmul to satisfy the writeout
conditions, let's do that and save an instruction+cycle per fragment
shader.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 14:28:48 +0000 (10:28 -0400)]
pan/midgard: Allow 6 instructions per bundle
We never had a scheduler good enough to hit this case before! :)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 14:22:35 +0000 (10:22 -0400)]
pan/midgard: Only one conditional per bundle allowed
There's no r32 to save ya after you use up r31 :)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 13:48:53 +0000 (09:48 -0400)]
pan/midgard: Schedule to smul/sadd
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 13:48:43 +0000 (09:48 -0400)]
pan/midgard: Extend choose_instruction for scalar units
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 13:48:27 +0000 (09:48 -0400)]
pan/midgard: Don't double check SCALAR units
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 23 Sep 2019 12:00:51 +0000 (08:00 -0400)]
pan/midgard: Use new scheduler
We still emit in-order but we switch to using the bundles created from
the new scheduler, which will allow greater flexibility and room for
out-of-order optimization.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 28 Sep 2019 00:18:16 +0000 (20:18 -0400)]
pan/midgard: Add distance metric to choose_instruction
We require chosen instructions to be "close", to avoid ballooning
register pressure. This is a kludge that will go away once we have
proper liveness tracking in the scheduler, but for now it prevents a lot
of needless spilling.
v2: Lower threshold to 6 (from 8). Schedule is hurt, but a few shaders
that spilled excessively are fixed.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Derp
Alyssa Rosenzweig [Fri, 27 Sep 2019 12:18:54 +0000 (08:18 -0400)]
pan/midgard: Add mir_choose_alu helper
Based on a given unit.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 23 Sep 2019 19:57:58 +0000 (15:57 -0400)]
pan/midgard: Implement load/store pairing
We can bundle two load/store together. This eliminates the need for
explicit load/store pairing in a prepass, as well.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Tue, 24 Sep 2019 13:06:37 +0000 (09:06 -0400)]
pan/midgard: Extend csel_swizzle to branches
Conditions for branches don't have a swizzle explicitly in the emitted
binary, but they do implicitly get swizzled in whatever instruction
wrote r31, so we need to handle that.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 23 Sep 2019 19:37:53 +0000 (15:37 -0400)]
pan/midgard: Add helpers for scheduling conditionals
Conditional instructions (csel and conditional branches) require their
condition to be written to a special condition pipeline register (r31.w
for scalar, r31.xyzw for vector). However, pipeline registers are live
only for the duration of a single bundle. As such, the logic to schedule
conditionals correct is surprisingly complex. Essentially, we see if we
could stuff the conditional within the same bundle as the csel/branch
without breaking anything; if we can, we do that. If we can't, we add a
dummy move to make room.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 27 Sep 2019 12:19:51 +0000 (08:19 -0400)]
pan/midgard: Implement predicate->unit
This allows ALUs to select for each unit of the bundle separately.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 27 Sep 2019 19:43:18 +0000 (15:43 -0400)]
pan/midgard: Add predicate->exclude
A bit of a kludge but allows setting an implicit dependency of synthetic
conditional moves on the actual condition, fixing code generated like:
vmul.feq r0, ..
sadd.imov r31, .., r0
vadd.fcsel [...]
The imov runs simultaneous with feq so it gets garbage results, but it's
too late to add an actual dependency practically speaking, since the new
synthetic imov doesn't have a node associated.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 23 Sep 2019 20:07:53 +0000 (16:07 -0400)]
pan/midgard: Add constant intersection filters
In the future, we will want to keep track of which components of
constants of various sizes correspond to which parts of the bundle
constants, like in the old scheduler. For now, let's just stub it out
for a simple rule of one instruction with embedded constants per bundle.
We can eventually do better, of course.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 23 Sep 2019 11:51:08 +0000 (07:51 -0400)]
pan/midgard: Remove csel constant unit force
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sun, 22 Sep 2019 13:08:33 +0000 (09:08 -0400)]
pan/midgard: Add mir_schedule_texture/ldst/alu helpers
We don't actually do any scheduling here yet, but add per-tag helpers to
consume an instruction, print it, pop it off the worklist.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sun, 22 Sep 2019 13:01:07 +0000 (09:01 -0400)]
pan/midgard: Add mir_choose_bundle helper
It's not always obvious what the optimal bundle type should be. Let's
break out the logic to decide.
Currently set for purely in-order operation.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sun, 22 Sep 2019 12:50:22 +0000 (08:50 -0400)]
pan/midgard: Add mir_update_worklist helper
After we've chosen an instruction, popped it off, and processed it, it's
time to update the worklist, removing that instruction from the
dependency graph to allow its dependents to be put onto the worklist.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Wed, 18 Sep 2019 12:26:30 +0000 (08:26 -0400)]
pan/midgard: Add mir_choose_instruction stub
In the future, this routine will implement the core scheduling logic to
decide which instruction out of the worklist will be scheduled next, in
a way that minimizes cycle count and register pressure.
In the present, we are more interested in replicating in-order
scheduling with the much-more-powerful out-of-order model. So rather
than discriminating by a register pressure estimate, we simply choose
the latest possible instruction in the worklist.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 18:56:58 +0000 (11:56 -0700)]
pan/midgard: Initialize worklist
This flows naturally from the dependency graph
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 18:55:31 +0000 (11:55 -0700)]
pan/midgard: Calculate dependency graph
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Sat, 31 Aug 2019 18:08:39 +0000 (11:08 -0700)]
pan/midgard: Add flatten_mir helper
We would like to flatten a linked list of midgard_instructions into an
array of midgard_instruction pointers on the heap.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 27 Sep 2019 12:20:17 +0000 (08:20 -0400)]
pan/midgard: Squeeze indices before scheduling
This allows node_count to be correct while scheduling.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 27 Sep 2019 21:07:30 +0000 (17:07 -0400)]
pan/midgard: Fix component count handling for ldst
It's not based on the writemask and it can't be inferred; it's just
intrinsic to the op itself.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 27 Sep 2019 21:07:10 +0000 (17:07 -0400)]
pan/midgard: Add missing parans in SWIZZLE definition
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Daniel Schürmann [Fri, 27 Sep 2019 10:12:25 +0000 (12:12 +0200)]
nouveau: set lower_sub = true
Subtractions are already implemented as additions anyway.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Eric Anholt [Wed, 25 Sep 2019 18:56:06 +0000 (11:56 -0700)]
v3d: Enable the late algebraic optimizations to get real subs.
This worked better than my original v3d-local pass for just subs, and is a
huge win over not producing subs.
total instructions in shared programs:
6408469 ->
6167932 (-3.75%)
total threads in shared programs: 153784 -> 154104 (0.21%)
total uniforms in shared programs:
2157078 ->
1905823 (-11.65%)
total max-temps in shared programs: 904546 -> 895796 (-0.97%)
total spills in shared programs: 4959 -> 4993 (0.69%)
total fills in shared programs: 6558 -> 6670 (1.71%)
total sfu-stalls in shared programs: 25845 -> 25175 (-2.59%)
total inst-and-stalls in shared programs:
6434314 ->
6193107 (-3.75%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Daniel Schürmann [Thu, 26 Sep 2019 10:08:13 +0000 (12:08 +0200)]
aco: call nir_opt_algebraic_late() exhaustively
57559 shaders in 28980 tests
Totals:
SGPRS:
2963407 ->
2959935 (-0.12 %)
VGPRS:
2014812 ->
2016328 (0.08 %)
Spilled SGPRs: 1077 -> 1077 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
Code Size:
114545436 ->
114498084 (-0.04 %) bytes
LDS: 933 -> 933 (0.00 %) blocks
Max Waves: 375997 -> 375866 (-0.03 %)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Daniel Schürmann [Wed, 25 Sep 2019 14:34:29 +0000 (16:34 +0200)]
radv/aco: Don't lower subtractions
40228 shaders in 20236 tests
Totals:
SGPRS:
2045512 ->
2046496 (0.05 %)
VGPRS:
1430856 ->
1430464 (-0.03 %)
Spilled SGPRs: 1077 -> 1077 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 10348 -> 10348 (0.00 %) dwords per thread
Code Size:
77202840 ->
77151832 (-0.07 %) bytes
LDS: 863 -> 863 (0.00 %) blocks
Max Waves: 260729 -> 260754 (0.01 %)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Daniel Schürmann [Wed, 25 Sep 2019 14:33:10 +0000 (16:33 +0200)]
nir: Remove unnecessary subtraction optimizations
These optimizations are already covered after lowering.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Daniel Schürmann [Wed, 25 Sep 2019 14:20:09 +0000 (16:20 +0200)]
nir: recombine nir_op_*sub when lower_sub = false
There are some optimizations which are only implemented for additions
and some optimizations which assume that subtractions have been lowered.
By lowering all subtractions first and later recombine for backends
which prefer this option, we don't have to implement them twice.
This patch also moves lower_negate to nir_opt_algebraic_late() to enable
these optimizations for backends which make use of it.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Daniel Schürmann [Fri, 27 Sep 2019 10:49:06 +0000 (12:49 +0200)]
freedreno: Enable the nir_opt_algebraic_late() pass.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Eric Anholt [Thu, 26 Sep 2019 18:03:46 +0000 (11:03 -0700)]
vc4: Enable the nir_opt_algebraic_late() pass.
Upcoming changes to sub optimization will make this pass required. Over
the course of that series, we see uniforms +.46%, instructions -.24%
(seems like a fine tradeoff -- uniforms are 1/2 the size of instructions
as far as cache occupancy)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Michel Dänzer [Wed, 18 Sep 2019 14:28:41 +0000 (16:28 +0200)]
gitlab-ci: Add test-container:arm64 to needs: for arm64 test jobs
Without this, it was theoretically possible for the jobs to run before
the docker image was ready.
v2:
* Use - list syntax instead of [] (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Michel Dänzer [Wed, 18 Sep 2019 14:21:32 +0000 (16:21 +0200)]
gitlab-ci: Add needs: for x86 buster docker image
This allows most build jobs to run before the stretch or arm64 docker
images are ready.
v2:
* Use - list syntax instead of [] (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Michel Dänzer [Wed, 18 Sep 2019 14:17:01 +0000 (16:17 +0200)]
gitlab-ci: Declare needs: for stretch docker image
This allows the *-old-llvm jobs to run before the buster docker images
are ready.
v2:
* Use - list syntax instead of [] (Eric Engestrom)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>