Timothy Arceri [Mon, 16 Oct 2017 07:06:49 +0000 (18:06 +1100)]
mesa: Add new fast mtx_t mutex type for basic use cases
While modern pthread mutexes are very fast, they still incur a call to an
external DSO and overhead of the generality and features of pthread mutexes.
Most mutexes in mesa only need lock/unlock, and the idea here is that we can
inline the atomic operation and make the fast case just two instructions.
Mutexes are subtle and finicky to implement, so we carefully copy the
implementation from Ulrich Drepper's well-written and well-reviewed paper:
"Futexes Are Tricky"
http://www.akkadia.org/drepper/futex.pdf
We implement "mutex3", which gives us a mutex that has no syscalls on
uncontended lock or unlock. Further, the uncontended lock boils down to a
cmpxchg and an untaken branch and the uncontended unlock is just a locked decr
and an untaken branch. We use __builtin_expect() to indicate that contention
is unlikely so that gcc will put the contention code out of the main code
flow.
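As an illustration of the scheme described above, here is a minimal sketch of a three-state futex mutex in the style of "mutex3". It is not the code added by this commit: the names demo_mtx, demo_mtx_lock, demo_mtx_unlock, demo_futex_wait and demo_futex_wake are hypothetical, C11 atomics are used instead of compiler builtins, and the non-Linux branch merely yields so that the sketch still compiles elsewhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for syscall() on glibc */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__)
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sleep for as long as *addr still holds `expected`. */
static void demo_futex_wait(atomic_uint *addr, uint32_t expected)
{
   syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
}

/* Wake at most `count` threads sleeping on addr. */
static void demo_futex_wake(atomic_uint *addr, int count)
{
   syscall(SYS_futex, addr, FUTEX_WAKE, count, NULL, NULL, 0);
}
#else
#include <sched.h>

/* Assumption for this sketch only: no futex, so just yield the CPU. */
static void demo_futex_wait(atomic_uint *addr, uint32_t expected)
{
   (void)addr; (void)expected;
   sched_yield();
}

static void demo_futex_wake(atomic_uint *addr, int count)
{
   (void)addr; (void)count;
}
#endif

/* val: 0 = unlocked, 1 = locked, 2 = locked and possibly contended. */
typedef struct { atomic_uint val; } demo_mtx;

static void demo_mtx_lock(demo_mtx *m)
{
   unsigned c = 0;

   /* Fast path: a single compare-and-swap 0 -> 1 plus an untaken branch. */
   if (__builtin_expect(atomic_compare_exchange_strong(&m->val, &c, 1), 1))
      return;

   /* Slow path: mark the mutex contended, then sleep until we own it. */
   if (c != 2)
      c = atomic_exchange(&m->val, 2);
   while (c != 0) {
      demo_futex_wait(&m->val, 2);
      c = atomic_exchange(&m->val, 2);
   }
}

static void demo_mtx_unlock(demo_mtx *m)
{
   /* Fast path: an atomic decrement 1 -> 0 plus an untaken branch. */
   unsigned c = atomic_fetch_sub(&m->val, 1);

   assert(c != 0);   /* unlocking an unlocked mutex is a caller bug */
   if (__builtin_expect(c != 1, 0)) {
      /* Someone may be sleeping: release fully and wake one waiter. */
      atomic_store(&m->val, 0);
      demo_futex_wake(&m->val, 1);
   }
}
```

A caller initializes val to 0 and brackets the critical section with demo_mtx_lock() and demo_mtx_unlock(). In the uncontended case neither call reaches the futex helpers.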
A fast mutex only supports lock/unlock, can't be recursive or used with
condition variables. We keep the pthread mutex implementation around
for the few places where we use condition variables or recursive locking.
For platforms or compilers where futex and atomics aren't available,
simple_mtx_t falls back to the pthread mutex.
The pthread mutex lock/unlock overhead shows up on benchmarks for CPU bound
applications. Most CPU bound cases are helped and some of our internal
bind_buffer_object heavy benchmarks gain up to 10%.
Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Timothy Arceri [Tue, 7 Nov 2017 23:57:21 +0000 (10:57 +1100)]
mesa: rework how we free gl_shader_program_data
When I introduced gl_shader_program_data one of the intentions was to
fix a bug where a failed linking attempt freed data required by a
currently active program. However I seem to have failed to finish
hooking up the final steps required to have the data hang around.
Here we create a fresh instance of gl_shader_program_data every
time we link. gl_program has a reference to gl_shader_program_data
so it will be freed once the program is no longer active.
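The lifetime rule described above can be sketched with plain reference counting. This is only an illustration under assumptions: the demo_* types and functions below are hypothetical stand-ins for gl_shader_program, gl_program and gl_shader_program_data, and the real code uses Mesa's own reference helpers and allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for gl_shader_program_data. */
struct demo_prog_data { int refcount; int link_ok; };
/* Hypothetical stand-ins for gl_program and gl_shader_program. */
struct demo_program { struct demo_prog_data *data; };
struct demo_shader_program { struct demo_prog_data *data; };

/* Point *ptr at data, adjusting refcounts and freeing on the last unref. */
static void demo_data_reference(struct demo_prog_data **ptr,
                                struct demo_prog_data *data)
{
   if (*ptr == data)
      return;
   if (*ptr && --(*ptr)->refcount == 0)
      free(*ptr);
   *ptr = data;
   if (data)
      data->refcount++;
}

/* Every link attempt gets a fresh data instance, so a failed relink
 * cannot clobber the data still referenced by an active program. */
static int demo_link(struct demo_shader_program *sp, int link_ok)
{
   struct demo_prog_data *fresh = calloc(1, sizeof(*fresh));

   assert(fresh != NULL);
   fresh->link_ok = link_ok;
   demo_data_reference(&sp->data, fresh);
   return link_ok;
}
```

With this shape, the old instance is released only when the last program referencing it goes away, not when a later link fails.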
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102177
Timothy Arceri [Wed, 8 Nov 2017 00:34:10 +0000 (11:34 +1100)]
glsl: use the correct parent when allocating program data members
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Timothy Arceri [Tue, 7 Nov 2017 22:54:22 +0000 (09:54 +1100)]
glsl: drop cache_fallback
This turned out to be a dead end. It is much easier and less error
prone to just cache the IR used by the driver's backend, e.g. TGSI or
NIR.
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Tue, 31 Oct 2017 07:56:24 +0000 (00:56 -0700)]
i965: properly initialize brw->cs.base.stage to MESA_SHADER_COMPUTE
This has a bit of a surprising effect:
For the render pipeline, the upload_sampler_state_table atom emits
3DSTATE_BINDING_TABLE_POINTERS_XS. It tries to avoid this for compute:
   if (GEN_GEN >= 7 && stage_state->stage != MESA_SHADER_COMPUTE) {
      /* Emit a 3DSTATE_SAMPLER_STATE_POINTERS_XS packet. */
      genX(emit_sampler_state_pointers_xs)(brw, stage_state);
   } ...
However, we were failing to initialize brw->cs.base.stage, so it was
left as 0 (MESA_SHADER_VERTEX), causing this condition to break. We
then emitted 3DSTATE_SAMPLER_STATE_POINTERS_VS in GPGPU mode, when
trying to upload CS samplers. Nothing good can come of this.
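The failure mode is easy to reproduce in isolation. In the sketch below, the enum and struct are simplified, hypothetical stand-ins; the only fact taken from the message above is that the vertex stage has value 0, so a zero-filled state looks like a vertex stage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the shader stage enum; vertex is 0. */
enum demo_stage { DEMO_STAGE_VERTEX = 0, DEMO_STAGE_FRAGMENT, DEMO_STAGE_COMPUTE };

struct demo_stage_state { enum demo_stage stage; };

/* Mirrors the quoted condition: render-pipeline packets are skipped
 * only when the stage is explicitly marked as compute. */
static bool demo_emits_render_sampler_pointers(int gen,
                                               const struct demo_stage_state *s)
{
   assert(s != NULL);
   return gen >= 7 && s->stage != DEMO_STAGE_COMPUTE;
}
```

A zero-initialized demo_stage_state passes the check and would take the render path, while one whose stage is set to DEMO_STAGE_COMPUTE does not.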
Found by inspection while debugging a GPU hang. Jordan believes this
helps the Deus Ex: Mankind Divided benchmark mode's stability when
running with shader cache.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Sat, 28 Oct 2017 15:57:23 +0000 (08:57 -0700)]
intel/nir: Break the linking code into a helper in brw_nir.c
Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Sat, 28 Oct 2017 15:50:54 +0000 (08:50 -0700)]
intel/nir: Add a helper for getting the NoIndirect mask
Reviewed-by: Timothy Arceri <tarceri at itsqueeze.com>
Cc: mesa-stable@lists.freedesktop.org
Matt Turner [Sat, 28 Oct 2017 01:15:46 +0000 (18:15 -0700)]
nir: Don't print swizzles when there are more than 4 components
... as can happen with various types like mat4, or else we'll smash the
stack writing past the end of components_local[].
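A minimal sketch of the kind of guard involved, assuming a six-byte buffer (a dot, up to four component letters, and a terminator). demo_format_swizzle is a hypothetical helper, not the function in nir_print.c.

```c
#include <assert.h>
#include <stddef.h>

/* Writes a swizzle such as ".yz" into out, or an empty string when the
 * type has more components than a single vec4 slot can name. */
static void demo_format_swizzle(char *out, size_t out_size,
                                unsigned location_frac,
                                unsigned num_components)
{
   static const char xyzw[] = "xyzw";

   assert(out != NULL && out_size >= 6);
   out[0] = '\0';

   /* The guard: anything wider than 4 components (mat4 has 16) would
    * index past both xyzw[] and the caller's buffer. */
   if (num_components == 0 || num_components > 4 ||
       location_frac + num_components > 4)
      return;

   out[0] = '.';
   for (unsigned i = 0; i < num_components; i++)
      out[1 + i] = xyzw[location_frac + i];
   out[1 + num_components] = '\0';
}
```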
Fixes: 5a0d3e1129b7 ("nir: Print the components referenced for split or
packed shader in/outs.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Dylan Baker [Fri, 27 Oct 2017 18:19:46 +0000 (11:19 -0700)]
meson: Add threads dependencies to glsl_compiler executable
Fixes compiling the optional standalone glsl compiler.
Reported-by: DrNick (on irc)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-and-Tested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Andreas Boll [Wed, 8 Nov 2017 14:15:08 +0000 (15:15 +0100)]
glsl: Fix typo fragement -> fragment
Fixes: 94d669b0d2f ("glsl: enforce fragment shader input restrictions in
GLSL ES 3.10")
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Andreas Boll [Wed, 8 Nov 2017 14:15:07 +0000 (15:15 +0100)]
broadcom/vc5: Remove unused v3d_compiler.c
Unused since original import of VC5.
Fixes: ade416d0236 ("broadcom: Add VC5 NIR compiler.")
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Andreas Boll [Wed, 8 Nov 2017 14:15:06 +0000 (15:15 +0100)]
broadcom/vc5: Add vc5_drm.h to the release tarball
Fixes: 45bb8f29571 ("broadcom: Add V3D 3.3 gallium driver called "vc5",
for BCM7268.")
Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Wed, 18 Oct 2017 15:05:27 +0000 (17:05 +0200)]
clover: use the unified check for c++11 instead of the gcc version number
So far clover based its test for compiler support on the version of gcc,
while in reality support for c++11 is required. This patch replaces the
version check by the check unified for all modules that require c++11.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Wed, 18 Oct 2017 15:05:26 +0000 (17:05 +0200)]
swr: Replace the check for c++11 by the unified version
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Wed, 18 Oct 2017 15:05:25 +0000 (17:05 +0200)]
configure: check for -std=c++11 support and enable st/mesa test accordingly
Add a check that tests whether the c++ compiler supports c++11, either
by default, by adding the compiler flag -std=c++11, or by adding a
compiler flag that the user has specified via the environment variable
CXX11_CXXFLAGS.
The test only does a very shallow check of c++11 support, i.e. it tests
whether the define __cplusplus >= 201103L to confirm language support
by the compiler, and it checks whether the header <tuple> is available
to test the availability of the c++11 standard library.
A make file conditional HAVE_STD_CXX11 is provided that is used in this
patch to enable the test in st/mesa if C++11 support is available.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102665
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Tue, 31 Oct 2017 19:26:33 +0000 (19:26 +0000)]
configure.ac: append to existing initializer override flags
Currently we were overwriting the existing warning flags, instead of
adding new [as applicable].
Fixes: c5d2e2d43f6 ("configure: Test for -Wno-initializer-overrides")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Emil Velikov [Tue, 31 Oct 2017 19:26:32 +0000 (19:26 +0000)]
configure.ac: append to existing MSVC compat flags
Currently we were overwriting the existing warning flags, instead of
adding new [as applicable].
v2: Add missing space before -Werror (Eric)
Fixes: e4b2b69e828 ("configure: Add and use AX_CHECK_COMPILE_FLAG")
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Wed, 1 Nov 2017 17:24:10 +0000 (10:24 -0700)]
meson: Allow building glvnd with EGL and non-dri based GLX
Because meson mirrors the autotools logic, it needs the same changes to
allow building glvnd based egl.
v2: - change if to elif (Eric)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Tue, 31 Oct 2017 18:58:10 +0000 (18:58 +0000)]
configure.ac: require xcb* for the omx/va/... when using x11 platform
Targets such as omx and va can work w/o anything X related. Mandate the
xcb* dependencies only when the X11 platform is selected.
Reported-by: Lukas Rusak <lorusak@gmail.com>
Fixes: 63e11ac2b5c ("configure: error out if building VA w/o supported
platform")
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Lukas Rusak <lorusak@gmail.com> (v1)
Emil Velikov [Tue, 31 Oct 2017 18:58:09 +0000 (18:58 +0000)]
configure.ac: loosen --enable-glvnd check to honour egl
Currently we error out when building GLVND w/o GLX.
That was the original premise before we had EGL. As the commit says,
that error should be reworked to honour both - do so.
v2: Drop noop *);; (Eric)
Reported-by: Lukas Rusak <lorusak@gmail.com>
Fixes: ce562f9e3fa ("EGL: Implement the libglvnd interface for EGL (v3)")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Lukas Rusak <lorusak@gmail.com> (v1)
Emil Velikov [Mon, 16 Oct 2017 16:10:42 +0000 (17:10 +0100)]
egl/android: add a note about .swap_buffers_with_damage
Android implements the API and does the native damage handling itself.
At the same time it
a) does call the vendor's eglSwapBuffersWithDamageKHR
b) does not implement eglSetDamageRegionKHR
There's something strange happening here. For now simply note about the
'lack' of eglSwapBuffersWithDamageKHR support.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Emil Velikov [Tue, 24 Oct 2017 16:14:20 +0000 (17:14 +0100)]
wayland-drm: static inline wayland_drm_buffer_get
The function is effectively a direct function call into
libwayland-server.so.
Thus GBM no longer depends on the wayland-drm static library, making the
build more straightforward. And the resulting binary is a bit smaller.
Note: we need to move struct wayland_drm_callbacks further up,
otherwise we'll get an error since the type is incomplete.
v2: Rebase, beef-up commit message, update meson, move struct
wayland_drm_callbacks.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> # meson bit only
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com> # for the rest
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> # meson
Emil Velikov [Mon, 23 Oct 2017 12:29:30 +0000 (13:29 +0100)]
automake: intel: correctly append to the LIBADD variable
Commit 05fc62d89f5 sets the variable, yet it forgot to update the
existing reference to append (instead of assign).
Thus as-is the expat library was discarded from the link chain when
building with Android.
Fixes: 05fc62d89f5 ("automake: intel: move expat handling where it's
used")
Cc: Hongxu Jia <hongxu.jia@windriver.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Emil Velikov [Mon, 16 Oct 2017 15:40:08 +0000 (16:40 +0100)]
configure: enable the OpenCL ICD by default
Nearly all the distributions* that build Mesa OpenCL enable the ICD,
since building a non-ICD driver risks conflicting with an existing
OpenCL binary (libOpenCL.so).
Furthermore, some applications expect the library to provide
annotated/versioned symbols.
https://lists.freedesktop.org/archives/mesa-dev/2017-September/171093.html
*Fedora, Suse, Arch, Debian, Ubuntu, FreeBSD use the ICD
Gentoo manages the conflicting files via eselect.
Cc: Matt Turner <mattst88@gmail.com>
Cc: Jan Vesely <jan.vesely@rutgers.edu>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-By: Aaron Watry <awatry@gmail.com>
Emil Velikov [Mon, 16 Oct 2017 15:40:07 +0000 (16:40 +0100)]
targets/opencl: don't hardcode the icd file install to /etc/...
Use $(sysconfdir) instead of hardcoding /etc.
While the OpenCL spec expects the file in /etc, people building their
stack can override that, esp. !Linux users.
Furthermore this removes a fundamental violation, which results in the
system file being overwritten even as one explicitly sets --prefix
and/or DESTDIR.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-By: Aaron Watry <awatry@gmail.com>
Emil Velikov [Wed, 8 Nov 2017 14:07:27 +0000 (14:07 +0000)]
amd: add amdgpu_asic_addr.h to the sources list
Otherwise it will be missing from the release tarball
Fixes: 7f33e94e43a ("amd/addrlib: update to latest version")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Tobias Droste [Wed, 8 Nov 2017 09:44:19 +0000 (10:44 +0100)]
gallivm: Use new LLVM fast-math-flags API
LLVM 6 changed the API on the fast-math-flags:
https://reviews.llvm.org/rL317488
NOTE: This also enables the new flag 'ApproxFunc' to allow for
approximations for library functions (sin, cos, ...). I'm not completely
convinced that this is something mesa should do.
Signed-off-by: Tobias Droste <tdroste@gmx.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-and-Tested-by: Michel Dänzer <michel.daenzer@amd.com>
Juan A. Suarez Romero [Tue, 31 Oct 2017 17:39:17 +0000 (17:39 +0000)]
glsl: add varying resources for arrays of complex types
This patch is mostly a patch done by Ilia Mirkin.
It fixes KHR-GL45.enhanced_layouts.varying_structure_locations.
v2: fix locations for TCS/TES/GS inputs and outputs (Ilia)
CC: Ilia Mirkin <imirkin@alum.mit.edu>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103098
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Timothy Arceri [Wed, 1 Nov 2017 09:32:12 +0000 (20:32 +1100)]
st/glsl_to_nir: use nir_shader_gather_info()
Use the NIR helper rather than the GLSL IR helper to get in/out
masks. This allows us to ignore varyings removed by NIR
optimisations.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Timothy Arceri [Wed, 1 Nov 2017 06:28:09 +0000 (17:28 +1100)]
st/glsl_to_nir: generate NIR earlier
We want to use nir_shader_gather_info() because the GLSL IR version might
be including varyings that NIR later eliminates. To do this we
need to generate NIR before we start using the in/out bitmasks.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Timothy Arceri [Wed, 1 Nov 2017 03:15:22 +0000 (14:15 +1100)]
st/glsl_to_nir: delay adding built-in uniforms to Parameters list
Delaying adding built-in uniforms until after we convert to NIR
gives us a better chance to optimise them away. Also NIR allows
us to iterate over the uniforms directly so should be faster.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 6 Nov 2017 23:56:13 +0000 (00:56 +0100)]
amd/addrlib: update to latest version
This uses C++11 initializer lists.
I just overwrote all Mesa files with internal addrlib and discarded
hunks that we should probably keep, but I might have missed something.
The code depending on ADDR_AM_BUILD is removed. We can add it back next
time if needed.
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Eric Anholt [Tue, 7 Nov 2017 19:05:16 +0000 (11:05 -0800)]
broadcom/vc5: Flush the job when it grows over 1GB.
Fixes GL_OUT_OF_MEMORY from streaming-texture-leak (and will hopefully
keep piglit from ooming on my no-swap platform, as well).
Eric Anholt [Tue, 7 Nov 2017 18:34:42 +0000 (10:34 -0800)]
broadcom/vc5: Do 16-bit unpacking of integer texture returns properly.
We were doing f16 unpacks, which trashed "1" values. Fixes many piglit
texwrap GL_EXT_texture_integer cases.
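To see why a half-float unpack is wrong for integer data, consider a 32-bit return word holding two 16-bit channels. The sketch below uses hypothetical helper names and a finite-only f16 decoder. It shows that the bit pattern 0x0001 is the integer 1 under an integer unpack, but the smallest f16 denormal (2^-24) under a float unpack, which presumably is how "1" values were lost.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extending unpack of the low (half = 0) or high (half = 1) channel. */
static uint32_t demo_unpack_u16(uint32_t word, unsigned half)
{
   assert(half < 2);
   return (word >> (16 * half)) & 0xffffu;
}

/* Sign-extending unpack, written without implementation-defined casts. */
static int32_t demo_unpack_i16(uint32_t word, unsigned half)
{
   uint32_t v = demo_unpack_u16(word, half);
   return (int32_t)(v ^ 0x8000u) - 0x8000;
}

/* Decode a finite IEEE half float; infinities and NaNs are out of scope. */
static float demo_f16_to_float(uint16_t h)
{
   unsigned exp = (h >> 10) & 0x1fu;
   unsigned mant = h & 0x3ffu;
   float v;
   int e;

   assert(exp != 0x1f);
   v = exp ? 1.0f + (float)mant / 1024.0f : (float)mant / 1024.0f;
   e = exp ? (int)exp - 15 : -14;
   for (; e > 0; e--)
      v *= 2.0f;
   for (; e < 0; e++)
      v /= 2.0f;
   return (h & 0x8000u) ? -v : v;
}
```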
Eric Anholt [Tue, 7 Nov 2017 18:13:04 +0000 (10:13 -0800)]
broadcom/vc5: Fix pausing of transform feedback.
Gallium disables it by removing the streamout buffers, not by binding a
program that doesn't have TF outputs. Fixes piglit
"ext_transform_feedback2/counting with pause"
Eric Anholt [Tue, 7 Nov 2017 18:08:59 +0000 (10:08 -0800)]
broadcom/vc5: Add support for GL_RASTERIZER_DISCARD
Fixes piglit discard-drawarrays.
Eric Anholt [Tue, 7 Nov 2017 17:51:56 +0000 (09:51 -0800)]
broadcom/vc5: Fix scheduling for a non-SFU R4 write after a dead R4 write.
The v3d_qpu_writes_r*() were only checking for fixed-function accumulator
writes, not normal ALU writes to those regs.
Fixes fs-discard-exit-2 on simulation (but not HW).
Eric Anholt [Tue, 7 Nov 2017 00:59:05 +0000 (16:59 -0800)]
broadcom/vc5: Add partial transform feedback query support.
We have to compute the queries in software, so we're counting the
primitives by hand. We still need to make sure to not increment the
PRIMITIVES_EMITTED if we overflowed, but leave that for later.
Eric Anholt [Mon, 6 Nov 2017 23:41:40 +0000 (15:41 -0800)]
broadcom/vc5: Add occlusion query support.
Fixes all of piglit's OQ tests.
Jason Ekstrand [Thu, 2 Nov 2017 22:59:58 +0000 (15:59 -0700)]
intel/fs/nir: Return Q types from brw_reg_type_for_bit_size
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 3 Nov 2017 01:32:39 +0000 (18:32 -0700)]
intel/fs/nir: Use Q immediates for load_const on gen8+
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 3 Nov 2017 01:30:04 +0000 (18:30 -0700)]
intel/fs/nir: Setup immediates based on type in i2b and f2b
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Fri, 3 Nov 2017 01:29:03 +0000 (18:29 -0700)]
intel/reg: Add helpers for 64-bit integer immediates
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Jason Ekstrand [Thu, 24 Aug 2017 00:43:36 +0000 (17:43 -0700)]
compiler/nir_types: Handle vectors in glsl_get_array_element
Most of NIR doesn't allow doing array indexing on a vector (though it
does on a matrix). However, nir_lower_io handles it just fine and this
behavior is needed for shared variables in Vulkan. This commit makes
glsl_get_array_element do something sensible for vector types and makes
nir_validate happy with them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Fri, 1 Sep 2017 23:40:28 +0000 (16:40 -0700)]
nir: Validate base types on array dereferences
We were already validating that the parent type goes along with the
child type but we weren't actually validating that the parent type is
reasonable. This fixes that.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 23 Aug 2017 01:57:56 +0000 (18:57 -0700)]
nir,intel/compiler: Use a fixed subgroup size
The GL_ARB_shader_ballot spec says that gl_SubGroupSizeARB is declared
as a uniform. This means that it cannot change across an invocation
such as a draw call or a compute dispatch. For compute shaders, we're
ok because we only ever use one dispatch size. For fragment, however,
the hardware dynamically chooses between SIMD8 and SIMD16 which violates
the spec. Instead, let's just pick a subgroup size based on the shader
stage. The fixed size we choose for compute shaders is a bit higher
than strictly needed but there's no real harm in that. The advantage is
that, if they do anything interesting with the value, NIR will see it as
an immediate and can optimize better.
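A per-stage choice of this kind could look like the sketch below. The specific sizes (32 for compute, 16 for fragment, 8 otherwise) are assumptions made for illustration and are not quoted from the commit; the enum is a simplified, hypothetical stand-in.

```c
#include <assert.h>

/* Simplified, hypothetical stage enum for this sketch. */
enum demo_sg_stage {
   DEMO_SG_VERTEX,
   DEMO_SG_FRAGMENT,
   DEMO_SG_COMPUTE,
   DEMO_SG_STAGE_COUNT
};

/* Return one fixed subgroup size per stage so that the value is a
 * compile-time constant the optimizer can fold. Sizes are assumed. */
static unsigned demo_subgroup_size(enum demo_sg_stage stage)
{
   assert(stage < DEMO_SG_STAGE_COUNT);
   switch (stage) {
   case DEMO_SG_COMPUTE:
      return 32;   /* assumed: larger than any dispatch width in use */
   case DEMO_SG_FRAGMENT:
      return 16;   /* assumed: covers both SIMD8 and SIMD16 dispatch */
   default:
      return 8;    /* assumed: remaining stages */
   }
}
```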
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 23 Aug 2017 01:44:51 +0000 (18:44 -0700)]
nir/lower_subgroups: Lower ballot intrinsics to the specified bit size
Ballot intrinsics return a bitfield of subgroups. In GLSL and some
SPIR-V extensions, they return a uint64_t. In SPV_KHR_shader_ballot,
they return a uvec4. Also, some back-ends would rather pass around
32-bit values because it's easier than messing with 64-bit all the time.
To solve this mess, we make nir_lower_subgroups take a new parameter
called ballot_bit_size and it lowers whichever thing it gets in from the
source language (uint64_t or uvec4) to a scalar with the specified
number of bits. This replaces a chunk of the old lowering code.
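The conversion between the two source-language shapes and a scalar of ballot_bit_size bits can be pictured as follows. This is a sketch on plain integers, not the NIR pass itself; demo_uvec4 and both helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

struct demo_uvec4 { uint32_t v[4]; };

/* Narrow a uvec4 ballot to a scalar of the given bit size. */
static uint64_t demo_uvec4_to_ballot(struct demo_uvec4 b, unsigned ballot_bit_size)
{
   assert(ballot_bit_size == 32 || ballot_bit_size == 64);
   if (ballot_bit_size == 32)
      return b.v[0];
   return (uint64_t)b.v[0] | ((uint64_t)b.v[1] << 32);
}

/* Widen a scalar ballot back to a uvec4; unused components are zero. */
static struct demo_uvec4 demo_ballot_to_uvec4(uint64_t ballot, unsigned ballot_bit_size)
{
   struct demo_uvec4 r = { { 0, 0, 0, 0 } };

   assert(ballot_bit_size == 32 || ballot_bit_size == 64);
   r.v[0] = (uint32_t)ballot;
   if (ballot_bit_size == 64)
      r.v[1] = (uint32_t)(ballot >> 32);
   return r;
}
```

A back-end that prefers 32-bit values would pass 32 and only ever see the low word.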
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 31 Oct 2017 21:42:33 +0000 (14:42 -0700)]
nir/builder: Add a nir_imm_intN_t helper
This lets you easily build integer immediates of arbitrary bit size.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Tue, 22 Aug 2017 21:09:37 +0000 (14:09 -0700)]
nir/lower_system_values: Lower SUBGROUP_*_MASK based on type
The SUBGROUP_*_MASK system values are uint64_t when coming in from GLSL
but uvec4 when coming in from SPIR-V. Lowering based on type allows us
to nicely handle both.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 23 Aug 2017 02:58:59 +0000 (19:58 -0700)]
nir: Make ballot intrinsics variable-size
This way they can return either a uvec4 or a uint64_t. At the moment,
this is a no-op since we still always return a uint64_t.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 22 Aug 2017 21:08:32 +0000 (14:08 -0700)]
nir: Add a ssa_dest_init_for_type helper
This would be useful in a number of places.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 22 Aug 2017 20:23:59 +0000 (13:23 -0700)]
nir: Add a new subgroups lowering pass
This commit pulls nir_lower_read_invocations_to_scalar along with most
of the guts of nir_opt_intrinsics (which mostly does subgroup lowering)
into a new nir_lower_subgroups pass. There are various other bits of
subgroup lowering that we're going to want to do so it makes a bit more
sense to keep it all together in one pass. We also move it in i965 to
happen after nir_lower_system_values because we want to handle the
subgroup mask system value intrinsics here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Thu, 31 Aug 2017 16:53:02 +0000 (09:53 -0700)]
intel/fs: Don't use automatic exec size inference
The automatic exec size inference can accidentally mess things up if
we're not careful. For instance, if we have
add(4) g38.2<4>D g38.1<8,2,4>D g38.2<8,2,4>D
then the destination register will end up having a width of 2 with a
horizontal stride of 4 and a vertical stride of 8. The EU emit code
sees the width of 2 and decides that we really wanted an exec size of 2
which doesn't do what we wanted.
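The hazard can be modelled with a toy version of the inference rule. Everything here is a hypothetical simplification (the real EU emit code works on encoded register fields); it only illustrates how a destination region width of 2 can override an intended exec size of 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy register region <vstride; width, hstride>. */
struct demo_region { unsigned vstride, width, hstride; };

struct demo_inst {
   unsigned exec_size;        /* what the IR asked for */
   struct demo_region dst;    /* destination region */
};

/* With automatic inference on, a narrow destination shrinks the exec
 * size; with it off, the explicitly requested size always wins. */
static unsigned demo_effective_exec_size(const struct demo_inst *inst,
                                         bool automatic)
{
   assert(inst != NULL);
   if (automatic && inst->dst.width < inst->exec_size)
      return inst->dst.width;
   return inst->exec_size;
}
```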
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 30 Aug 2017 19:07:00 +0000 (12:07 -0700)]
intel/fs: Explicitly set EXECUTE_1 where needed
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 30 Aug 2017 20:36:58 +0000 (13:36 -0700)]
intel/eu: Explicitly set EXECUTE_1 where needed
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Thu, 31 Aug 2017 16:41:22 +0000 (09:41 -0700)]
intel/eu: Make automatic exec sizes a configurable option
We have had a feature in codegen for some time that tries to
automatically infer the execution size of an instruction from the width
of its destination. For things such as fixed function GS, clipper, and
SF programs, this is very useful because they tend to have lots of
hand-rolled register setup and trying to specify the exec size all the
time would be prohibitive. For things that come from a higher-level IR,
however, it's easier to just set the right size all the time and the
automatic exec sizes can, in fact, cause problems. This commit makes it
optional while enabling it by default.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Fri, 1 Sep 2017 16:59:34 +0000 (09:59 -0700)]
intel/fs: Rework zero-length URB write handling
Originally we tried to handle this case based on slots_valid. However,
there are a number of ways that this can go wrong. For one, we throw
away any trailing slots which either aren't written or are set to
VARYING_SLOT_PAD. Second, even if PSIZ is a valid slot, we may not
actually write anything there. Between the lot of these, it was
possible to end up in a case where we tried to do a regular URB write
but ended up with a length of 1 which is invalid. This commit moves it
to the end and makes it based on a new boolean flag urb_written.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Fri, 1 Sep 2017 04:56:43 +0000 (21:56 -0700)]
intel/compiler/fs: Set up subgroup invocation as a system value
Subgroup invocation is computed using a vector immediate and some
dispatch-aware arithmetic. Unfortunately, due to the vector arithmetic,
and the fact that it's frequently read 16-wide, it's not something that
can easily be CSEd by the back-end compiler. There are a few different
possible approaches to this problem:
1) Emit the code to calculate the subgroup invocation on-the-fly and
trust NIR to do the CSE. This is what we were doing.
2) Add a back-end instruction for the subgroup ID. This has the
advantage of helping the back-end compiler with CSE but has the
downside of very poor scheduling for the calculation because it has
to be emitted in the back-end.
3) Emit the calculation at the top of the program and re-use the
result. This gets rid of the CSE problem but comes at the cost of
an extra live register.
This commit switches us from 1) to 3). We choose to store the subgroup
invocation values as a W type to reduce the impact of the extra live
register. Trusting NIR and using 1) was fine but we're soon going to
want to use the subgroup invocation value for other things in the
back-end compiler and this makes it much easier to do without having to
worry about CSE problems.
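The "vector immediate plus dispatch-aware arithmetic" computation can be sketched on the CPU as below. The packed-nibble constant 0x76543210 and the add-8/add-16 steps are assumptions about how such a sequence is typically built, and demo_subgroup_invocation is a hypothetical name. Storing 16-bit lanes corresponds to the W type mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Fill lanes[0..dispatch_width-1] with each channel's invocation index. */
static void demo_subgroup_invocation(uint16_t *lanes, unsigned dispatch_width)
{
   /* Assumed packed immediate: one 4-bit value per channel, 0 through 7. */
   const uint32_t imm_v = 0x76543210u;

   assert(dispatch_width == 8 || dispatch_width == 16 || dispatch_width == 32);

   for (unsigned i = 0; i < 8; i++)
      lanes[i] = (uint16_t)((imm_v >> (4 * i)) & 0xfu);

   /* Each further step copies the lanes computed so far and adds their
    * count: +8 for channels 8-15, then +16 for channels 16-31. */
   for (unsigned base = 8; base < dispatch_width; base *= 2)
      for (unsigned i = 0; i < base; i++)
         lanes[base + i] = (uint16_t)(lanes[i] + base);
}
```

Computing this once at the top of the program and reusing the register corresponds to option 3) in the list above.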
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Thu, 24 Aug 2017 18:40:31 +0000 (11:40 -0700)]
intel/cs: Push subgroup ID instead of base thread ID
We're going to want subgroup ID for SPIR-V subgroups eventually anyway.
We really only want to push one and calculate the other from it. It
makes a bit more sense to push the subgroup ID because it's simpler to
calculate and because it's a real API thing. The only advantage to
pushing the base thread ID is to avoid a single SHL in the shader.
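The relationship between the two values is a single shift, as the sketch below shows. It assumes the base thread ID is the subgroup ID multiplied by a power-of-two dispatch width; demo_base_thread_id is a hypothetical helper.

```c
#include <assert.h>

/* Derive the base thread ID from the pushed subgroup ID with one shift. */
static unsigned demo_base_thread_id(unsigned subgroup_id, unsigned dispatch_width)
{
   unsigned shift = 0;

   /* Dispatch widths are assumed to be powers of two (8, 16, 32). */
   assert(dispatch_width != 0 && (dispatch_width & (dispatch_width - 1)) == 0);
   while ((1u << shift) < dispatch_width)
      shift++;

   return subgroup_id << shift;
}
```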
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 22 Aug 2017 04:27:19 +0000 (21:27 -0700)]
intel/cs: Re-run final NIR optimizations for each SIMD size
With the advent of SPIR-V subgroup operations, compute shaders will have
to be slightly different depending on the SIMD size at which they
execute. In order to allow us to do dispatch-width specific things in
NIR, we re-run the final NIR stages for each SIMD width.
One side-effect of this change is that we start rallocing fs_visitors
which means we need DECLARE_RALLOC_CXX_OPERATORS.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 7 Nov 2017 01:01:56 +0000 (17:01 -0800)]
intel/compiler: Move the destructor from vec4_visitor to backend_shader
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 7 Nov 2017 00:29:42 +0000 (16:29 -0800)]
i965/fs: Get rid of the early return in brw_compile_cs
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Sat, 30 Sep 2017 00:57:32 +0000 (17:57 -0700)]
intel/cs: Rework the way thread local ID is handled
Previously, brw_nir_lower_intrinsics added the param and then emitted a
load_uniform intrinsic to load it directly. This commit switches things
over to use a specific NIR intrinsic for the thread id. The one thing I
don't like about this approach is that we have to copy thread_local_id
over to the new visitor in import_uniforms.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 3 Oct 2017 03:25:11 +0000 (20:25 -0700)]
intel/fs: Mark 64-bit values as being contiguous
This isn't often a problem. However, when we're in a compute shader, we
must push the thread local ID, so we decrement the amount of available
push space by 1. The space is then no longer even and 64-bit data can, in
theory, span the push/pull boundary. By marking those uniforms
contiguous, we ensure that they never get
split in half between push and pull constants.
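A toy allocator makes the effect of the contiguity rule visible. The slot model below (one slot per 32 bits, uniforms placed in order) is an assumption for illustration, and the names are hypothetical. The point is only that a value is placed whole, never with one half pushed and the other pulled.

```c
#include <assert.h>
#include <stddef.h>

enum demo_const_loc { DEMO_LOC_PUSH, DEMO_LOC_PULL };

/* sizes[i] is the uniform's size in 32-bit slots (1, or 2 for 64-bit).
 * A uniform goes to push space only if all of its slots fit there. */
static void demo_assign_locations(const unsigned *sizes, size_t count,
                                  unsigned push_slots,
                                  enum demo_const_loc *out)
{
   unsigned used = 0;

   assert(sizes != NULL && out != NULL);
   for (size_t i = 0; i < count; i++) {
      if (used + sizes[i] <= push_slots) {
         out[i] = DEMO_LOC_PUSH;
         used += sizes[i];
      } else {
         out[i] = DEMO_LOC_PULL;   /* whole value, never half of it */
      }
   }
}
```

With six slots minus one reserved for the thread local ID, three 64-bit uniforms end up as push, push, pull rather than leaving the third straddling the boundary.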
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Tue, 22 Aug 2017 04:27:42 +0000 (21:27 -0700)]
intel/cs: Ignore runtime_check_aads_emit for CS
It's only set on gen4-5 which clearly don't support compute shaders.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 22 Aug 2017 03:00:30 +0000 (20:00 -0700)]
intel/cs: Stop setting dispatch_grf_start_reg
Nothing ever reads it for compute shaders because it's always 1.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 22 Aug 2017 02:30:24 +0000 (19:30 -0700)]
intel/cs: Drop max_dispatch_width checks from compile_cs
The only things that adjust fs_visitor::max_dispatch_width are render
target writes which don't happen in compute shaders so they're
pointless.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 22 Aug 2017 02:16:45 +0000 (19:16 -0700)]
intel/fs: Remove min_dispatch_width from fs_visitor
It's 8 for everything except compute shaders. For compute shaders,
there's no need to duplicate the computation and it's just a possible
source of error.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 22 Aug 2017 01:42:41 +0000 (18:42 -0700)]
intel/fs: use pull constant locations to check for first compile of a shader
Before, we were bailing in assign_constant_locations based on the minimum
dispatch size. The more direct thing to do is simply to check for
whether or not we have constant locations and bail if we do. For
nir_setup_uniforms, it's completely safe to do it multiple times because
we just copy a value from the NIR shader.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Sat, 2 Sep 2017 05:37:42 +0000 (22:37 -0700)]
intel/fs: Retype dest to match value in read[First]Invocation
This is what we really wanted all along. Always retyping to D works
because that's what get_nir_src() always gives us, at least for 32-bit
types. The SPIR-V variants of these operations accept arbitrary types
and we need this if we're going to handle 64 or 16-bit values.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Sat, 2 Sep 2017 05:35:43 +0000 (22:35 -0700)]
intel/fs: Uniformize the index in readInvocation
The index is any value provided by the shader and this can be called in
non-uniform control flow so we can't just take component 0. Found by
inspection.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
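A minimal sketch of what "uniformizing" means for the readInvocation index above, written as plain C rather than driver code. It assumes that in non-uniform control flow channel 0 may be disabled, so the value is taken from the first enabled channel instead. The function name and the execution-mask parameter are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: return the value held by the first enabled channel.
 * exec_mask has bit c set when channel c is currently executing. */
static uint32_t uniformize(const uint32_t *values, unsigned num_channels,
                           uint32_t exec_mask)
{
   assert(num_channels <= 32);

   for (unsigned c = 0; c < num_channels; c++) {
      if (exec_mask & (1u << c))
         return values[c];
   }

   /* No channel enabled; returning 0 is an arbitrary choice for this sketch. */
   return 0;
}
```

With channels 0 and 1 disabled, simply reading component 0 would pick up a value from a channel that is not running, while the sketch returns the value of channel 2.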
Jason Ekstrand [Sat, 2 Sep 2017 05:30:53 +0000 (22:30 -0700)]
intel/fs: Protect opt_algebraic from OOB BROADCAST indices
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Thu, 24 Aug 2017 00:10:33 +0000 (17:10 -0700)]
i965/fs/nir: Don't stomp 64-bit values to D in get_nir_src
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Mon, 28 Aug 2017 04:48:03 +0000 (21:48 -0700)]
i965/fs/nir: Minor refactor of store_output
Stop retyping the output of shuffle_64bit_data_for_32bit_write. It's
always BRW_REGISTER_TYPE_D which is perfectly fine for writing out.
Also, when we change get_nir_src to return something with a 64-bit type
for 64-bit values, the retyping will not be at all what we want. Also,
retyping the output based on src.type before we whack it back to 32 bits
is a problem because the output is always 32 bits.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Sat, 26 Aug 2017 17:00:14 +0000 (10:00 -0700)]
i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_write
All callers of this function allocate a fs_reg expressly to pass into
it. It's much easier if we just let the helper allocate the register.
While we're here, we switch it to doing the MOVs with an integer type so
that we don't accidentally canonicalize floats on half of a double.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Sat, 26 Aug 2017 18:26:40 +0000 (11:26 -0700)]
i965/fs/nir: Simplify 64-bit store_output
The swizzles weren't doing any good because swiz is just XYZW. Also, we
were emitting an extra set of MOVs because shuffle_64bit_data_for_32bit
already does a MOV for us. Finally, the temporary was only ever used
inside the inner loop so there's no need for it to actually be an array.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 18 Oct 2017 01:59:26 +0000 (18:59 -0700)]
intel/fs: Use the original destination region for int MUL lowering
Some hardware (CHV, BXT) has special restrictions on register regions
when doing integer multiplication. We want to respect those when we
lower to DxW multiplication.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Wed, 18 Oct 2017 01:56:29 +0000 (18:56 -0700)]
intel/fs: Fix integer multiplication lowering for src/dst hazards
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Tue, 17 Oct 2017 21:45:43 +0000 (14:45 -0700)]
intel/fs: Fix MOV_INDIRECT for 64-bit values on little-core
The same workaround we need for 64-bit values on little core also takes
care of the Ivy Bridge problem and does so a bit more efficiently so we
can drop that code while we're here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Tue, 17 Oct 2017 21:45:12 +0000 (14:45 -0700)]
intel/eu: Fix broadcast instruction for 64-bit values on little-core
We're not using broadcast for any 64-bit types right now since we mostly
use it for emit_uniformize on 32-bit buffer indices. However, SPIR-V
subgroups are going to need it for 64-bit so let's make it work.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Wed, 18 Oct 2017 02:50:36 +0000 (19:50 -0700)]
intel/eu/reg: Add a subscript() helper
This is similar to the identically named fs_reg helper.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Tue, 17 Oct 2017 21:16:31 +0000 (14:16 -0700)]
intel/eu: Just modify the offset in brw_broadcast
This means we have to drop const from a variable but it also means that
100% of the code which deals with the offset limit is in one place.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Tue, 17 Oct 2017 18:57:48 +0000 (11:57 -0700)]
intel/compiler: Add some restrictions to MOV_INDIRECT and BROADCAST
These restrictions effectively already existed due to the way we use
indirect sources but weren't being directly enforced.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Thu, 12 Oct 2017 23:17:03 +0000 (16:17 -0700)]
intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all
For some reason, the any/all predicates don't work properly with SIMD32.
In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read
the correct subset of the flag register and you end up getting garbage
in the second half. Work around this by using a pair of 1-wide MOVs and
scattering the result. This fixes the any/all instructions for SIMD32.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Thu, 7 Sep 2017 03:32:30 +0000 (20:32 -0700)]
intel/fs: Use an explicit D type for vote any/all/eq intrinsics
The any/all intrinsics return a boolean value so D or UD is the correct
type. Unfortunately, get_nir_dest has the annoying behavior of
returning a float type by default. This causes format conversion which
gives us -1.0f or 0.0f in the register. If the consumer of the result
does an integer comparison to zero, it will give you the right boolean
value but if we do something more clever based on the 0/~0 assumption
for booleans, this will give the wrong value.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
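A rough illustration of the hazard described in the commit above, in plain C rather than driver code: converting the integer boolean ~0 to float yields -1.0f, whose bit pattern is 0xbf800000 and not 0xffffffff. All helper names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the bits of a float as a uint32_t. */
static uint32_t float_bits(float f)
{
   uint32_t u;
   memcpy(&u, &f, sizeof(u));
   return u;
}

/* Models a 0/~0 boolean written through a float-typed destination:
 * the integer is converted to float before it lands in the register. */
static uint32_t bool_through_float_dest(int32_t b)
{
   assert(b == 0 || b == -1);
   return float_bits((float)b);
}

/* A "clever" consumer that relies on true being all ones. */
static uint32_t select_by_mask(uint32_t b, uint32_t x)
{
   return b & x;
}
```

A comparison against zero still distinguishes the two results, but select_by_mask() only returns x unchanged when true is really 0xffffffff.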
Jason Ekstrand [Thu, 7 Sep 2017 01:37:34 +0000 (18:37 -0700)]
intel/fs: Don't stomp f0.1 in SIMD16 ballot
In fragment shaders f0.1 is used for discards so doing ballot after a
discard can potentially cause the discard to not happen. However, we
don't support SIMD32 fragment shaders yet so this isn't a problem.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Sat, 2 Sep 2017 06:24:15 +0000 (23:24 -0700)]
intel/fs: Use ANY/ALL32 predicates in SIMD32
We have ANY/ALL32 predicates and, for the most part, they work just
fine. (See the next commit for more details.) Also, due to the way
that flag registers are handled in hardware, instruction splitting is
able to split the CMP correctly. Specifically, the hardware looks at
the execution group and knows to shift its flag usage up correctly so a
2H instruction will write to f0.1 instead of f0.0.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Thu, 7 Sep 2017 01:31:11 +0000 (18:31 -0700)]
intel/fs: Be more explicit about our placement of [un]zip
Before, we were careful to place the zip after the last of the split
instructions but did unzip on-demand. This changes things so that the
unzips go before all of the split instructions and the zip comes
explicitly after all the split instructions. As a side-effect of this
change, we now emit the split instructions from highest SIMD group to
lowest instead of low to high. We could have kept the old behavior, but
it shouldn't matter and this made the code easier.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Thu, 7 Sep 2017 01:24:17 +0000 (18:24 -0700)]
intel/fs: Pass builders instead of blocks into emit_[un]zip
This makes it far more explicit where we're inserting the instructions
rather than the magic "before and after" stuff that the emit_[un]zip
helpers did based on block and inst.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Thu, 2 Nov 2017 21:52:49 +0000 (14:52 -0700)]
intel/fs: Use a pure vertical stride for large register strides
Register strides higher than 4 are uncommon but they can happen. For
instance, if you have a 64-bit extract_u8 operation, we turn that into
a UB -> UQ MOV with a source stride of 8. Our previous calculation would
try to generate a stride of <32;8,8>:ub which is invalid because the
maximum horizontal stride is 4. To solve this problem, we instead use a
stride of <8;1,0>. As noted in the comment, this does not work as a
destination but that's ok as very few things actually generate that
stride.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
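The region arithmetic in the commit above can be sketched in plain C, assuming the usual reading of a source region <V;W,H>: channel c reads element (c / W) * V + (c % W) * H, counted in elements of the register type. The function name is hypothetical and this is not the driver's code.

```c
#include <assert.h>

/* Element offset read by a channel for a source region
 * <vstride;width,hstride>, in units of the register type. */
static unsigned region_element_offset(unsigned vstride, unsigned width,
                                      unsigned hstride, unsigned channel)
{
   assert(width > 0);
   return (channel / width) * vstride + (channel % width) * hstride;
}
```

Under that assumption <8;1,0> places channel c at element 8 * c, the same spacing a horizontal stride of 8 would give, without needing a horizontal stride above the limit of 4. Each row holds a single element and the vertical stride does all the stepping.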
Eric Anholt [Fri, 3 Nov 2017 02:04:12 +0000 (19:04 -0700)]
broadcom/vc5: Skip emitting textures that aren't used.
Fixes crashes when ARB_fp uses texture[1] but not 0, as in piglit's
fp-fragment-position.
Eric Anholt [Fri, 3 Nov 2017 01:49:58 +0000 (18:49 -0700)]
broadcom/vc5: Add missing SRGBA8 ETC2 support.
Fixes piglit oes_compressed_etc2_texture-miptree srgb8-alpha8.
Eric Anholt [Fri, 3 Nov 2017 01:45:07 +0000 (18:45 -0700)]
broadcom/vc5: Disable early Z test when the FS writes Z.
Fixes piglit early-z.
Eric Anholt [Thu, 2 Nov 2017 19:49:46 +0000 (12:49 -0700)]
broadcom/vc5: Shift the min/max lod fields by the BASE_LEVEL.
The lod clamping is what limits you between base and last level, and the
base level field is just there to help decide where the min/mag change
happens.
Fixes tex-miplevel-selection GL2:texture()
Eric Anholt [Thu, 2 Nov 2017 19:24:17 +0000 (12:24 -0700)]
broadcom/vc5: Add support for anisotropic filtering.
Eric Anholt [Thu, 2 Nov 2017 19:19:10 +0000 (12:19 -0700)]
broadcom/vc5: Fix mipmap filtering enums.
The ordering of the values was even less obvious than I thought, with both
the mip filter and the min filter being in different bits depending on
whether the mip filter is none.
Fixes piglit fs-textureLod-miplevels.shader_test
Eric Anholt [Thu, 2 Nov 2017 18:47:30 +0000 (11:47 -0700)]
broadcom/vc5: Fix height padding of small UIF slices.
The HW doesn't pad the slice's height to make a full 4x4 group of UIF
blocks. We just need to pad to columns, and the start of the next column
appears in the bottom of the previous column's last block.
Fixes piglit fs-textureOffset-2D.
Eric Anholt [Thu, 2 Nov 2017 00:55:52 +0000 (17:55 -0700)]
broadcom/vc5: Print the actual offsets in HW for our resource layout debug.
The alignment of level 0 is non-obvious, so it's hard to turn a faulting
address into a slice without this.
Eric Anholt [Thu, 2 Nov 2017 00:22:17 +0000 (17:22 -0700)]
broadcom/vc5: Set the available VS outputs to match the FS inputs.
Fixes piglit glsl-es-3.00/minimum-maximums.txt.
Eric Anholt [Wed, 1 Nov 2017 22:29:58 +0000 (15:29 -0700)]
broadcom/vc5: Set the max texture LOD bias.
The field is signed 8.8, so the usual 16.0f fits. Fixes piglit
gl-2.1-minmax.
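A quick check of the "16.0f fits" claim above, assuming signed 8.8 means a 16-bit two's complement value with 8 fractional bits, so the representable range is -128.0 up to just under 128.0. The helper name is hypothetical, and truncation is used here only for simplicity. The rounding the driver applies is not stated in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a float to signed 8.8 fixed point
 * by scaling with 2^8 and truncating toward zero. */
static int16_t float_to_s8_8(float f)
{
   assert(f >= -128.0f && f < 128.0f);
   return (int16_t)(f * 256.0f);
}
```

A bias of 16.0f becomes 4096, well inside the int16_t range.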