Alyssa Rosenzweig [Mon, 15 Jul 2019 14:08:15 +0000 (07:08 -0700)]
panfrost: Fix blend_cso if nr_cbufs == 0
Fixes: 46396af1ec4b69ca4a ("panfrost: Refactor blend infrastructure")
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 23:53:52 +0000 (16:53 -0700)]
panfrost: Cleanup shader upload code
The old algorithm is still used (and the same issue -- namely, leaking
all shaders -- applies) but we're way more concise about it since we're
*only* using the routine for shaders nowadays; everything else is a
BO-proper or transient.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 23:45:44 +0000 (16:45 -0700)]
panfrost: Remove all old allocators
With the new refactor, this all becomes dead code.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 23:38:11 +0000 (16:38 -0700)]
panfrost: Use transient memory for occlusion queries
These only last a frame anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 23:37:45 +0000 (16:37 -0700)]
panfrost: Remove bizarre hack
I don't think this is still necessary, and if it is, we'll have to
figure out how to fix it the right way.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 23:35:47 +0000 (16:35 -0700)]
panfrost: Upload vertex descriptors to *transient* memory
It's not legal to reuse the vertex shader descriptor across frames now
that we patch it at draw-time, so upload to transient memory.
Ideally, we could be smarter about this such that subsequent draws with
the same vertex shader and same patched state would reuse the
descriptor, but for now, let's simply achieve correctness.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 22:50:58 +0000 (15:50 -0700)]
panfrost: Delay resource mmaps
We use the new PAN_ALLOCATE_DELAY_MMAP flag to only map resources
on-demand, which should avoid mapping FBOs.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 22:45:28 +0000 (15:45 -0700)]
panfrost: Cleanup PAN_ALLOCATE_*
While we're at it, prompted by a semantics issue around INVISIBLE, also
add a separate DELAY_MMAP flag.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 22:39:47 +0000 (15:39 -0700)]
panfrost/drm: Don't mmap INVISIBLE buffers
On the new kernel, mmaping doesn't *hurt* per se, but it's still
wasteful for buffers explicitly marked as not needing an mmap.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Lionel Landwerlin [Mon, 15 Jul 2019 12:35:11 +0000 (15:35 +0300)]
anv: fix crash in vkCmdClearAttachments with unused attachment
anv_render_pass_compile() turns an unused attachment into a NULL
depth_stencil_attachment pointer so check that pointer before
accessing it.
Found with updates to existing CTS tests.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 208be8eafa30be ("anv: Make subpass::depth_stencil_attachment a pointer")
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Samuel Pitoiset [Wed, 10 Jul 2019 22:34:18 +0000 (00:34 +0200)]
radv/gfx10: export the PrimitiveID for ES stages (VS or TES)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 10 Jul 2019 22:29:50 +0000 (00:29 +0200)]
radv/gfx10: declare an external symbol for the ESGS ring
It will be used for stream output but for now only declares it
if VS and if the PrimitiveID needs to be exported.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 10 Jul 2019 22:25:28 +0000 (00:25 +0200)]
radv/gfx10: allocate ESGS ring space for exporting PrimitiveID
Only VS needs that. We shouldn't hardcode these values but
that's complicated to not do that for now.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Sun, 14 Jul 2019 10:55:48 +0000 (12:55 +0200)]
radv/gfx10: fix crash when emitting NGG GS prologue
ac_nir_context is initialized after the driver emits the NGG GS
prologue so it's likely to crash.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Vasily Khoruzhick [Mon, 15 Jul 2019 01:22:36 +0000 (18:22 -0700)]
lima/ppir: Fix branch codegen
"unknown_2" field is actually a size of instruction that branch
points to. If it's set to a smaller size than actual instruction
branch behavior is not defined (and it usually wedges the GPU).
Fix it by setting this field correctly.
Fixes: af0de6b91c0b ("lima/ppir: implement discard and discard_if")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Vasily Khoruzhick [Mon, 15 Jul 2019 01:21:57 +0000 (18:21 -0700)]
lima/ppir: Fix assert condition in ppir_codegen_encode_discard
Fixes: af0de6b91c0b ("lima/ppir: implement discard and discard_if")
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Jonathan Marek [Wed, 19 Jun 2019 21:49:47 +0000 (17:49 -0400)]
etnaviv: fix incorrect varying interpolation
This corresponds to what the GC3000 blob does. The USED / UNUSED enums are
wrong, at least for GC2000/GC3000.
Without this the 3rd texture component is not interpolated correctly (flat?)
in the following test (and others):
dEQP-GLES2.functional.texture.mipmap.cube.generate.rgba8888_nicest
Strangely, when the texture is sampled from OpenGL it works correctly,
the problem only shows up for sampling by gallium/blitter. This fixes other
cube map tests which use util_blitter_blit.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Wed, 19 Jun 2019 21:16:27 +0000 (17:16 -0400)]
etnaviv: reduce rs alignment requirement for two pixel pipes GPU
The rs alignment doesn't have to be multiplied by # of pixel pipes.
This works on GC2000 which doesn't have the SINGLE_BUFFER feature.
This fixes some cubemaps (NPOT / small mipmap levels) because aligning by 8
breaks the expected alignment of 4 for tiled format. We don't want to mess
with the alignment of tiled formats.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Wed, 19 Jun 2019 15:43:54 +0000 (11:43 -0400)]
etnaviv: fix nearest_linear / linear_nearest filtering on GC3000
The MIN filter is never used when not using mipmaps. This fixes that.
Interestingly, only GC3000 needs this (GC2000 works without this fix).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Wed, 19 Jun 2019 15:42:13 +0000 (11:42 -0400)]
etnaviv: fix nearest filtering
ROUND_UV rounding breaks nearest filtering.
Enable it only when nearest filtering isn't used.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Bas Nieuwenhuizen [Sat, 13 Jul 2019 14:05:40 +0000 (16:05 +0200)]
radv/gfx10: Fix DCC clears.
Looks like if the reg clear bit is set, the hwardware does not use the 0/1
clears for textures.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Vinson Lee [Thu, 13 Jun 2019 22:08:27 +0000 (15:08 -0700)]
meson: Add dep_thread dependency.
Fix this build error on Ubuntu 18.04.
/usr/bin/ld: src/util/libmesa_util.a(u_cpu_detect.c.o): undefined reference to symbol 'pthread_once@@GLIBC_2.2.5'
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110663
Suggested-by: Eric Engestrom <eric@@engestrom.ch>
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Eric Anholt [Thu, 11 Jul 2019 19:58:28 +0000 (12:58 -0700)]
gitlab-ci: Build i386 and ARM drivers in surfaceless mode.
I don't particularly care about getting x86/ARM cross-build coverage
of all the window systems, but we do want to be building src/mesa/
(for x86 asm) and gallium drivers (for vc4 NEON asm). I'm also hoping
to use these build products for testing freedreno on actual HW (which
we do using surfaceless).
This increases the docker image from 1.4G to 1.5G.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Andreas Baierl [Thu, 11 Jul 2019 13:26:24 +0000 (15:26 +0200)]
lima: Fix compiler warnings for unused functions.
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Caio Marcelo de Oliveira Filho [Fri, 12 Jul 2019 21:37:38 +0000 (14:37 -0700)]
anv: Fix pool allocator when first alloc needs to grow
When using softpin, the first allocation was not calculating the
padding and offset correctly for the case the first allocation needed
to grow. We were missing initialize the state.end right after
expanding the pool for the first time.
This is not a problem for non-softpin since there we don't use
leftover padding so the ends would re-arrange incrementally.
This fixes running dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 in
SKL -- the test uses a shader larger than the initial size for the
instruction pool.
Fixes: dfc9ab2ccd9 "anv/allocator: Add padding information."
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Sun, 31 May 2015 23:02:36 +0000 (16:02 -0700)]
mesa: Port errors.c to util/list.h instead of simple_list.
There is widespread consensus that simple_list should go away.
This patch converts one more use to the modern kernel-style list.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Jason Ekstrand [Fri, 12 Jul 2019 23:47:15 +0000 (18:47 -0500)]
intel: Run the optimization loop before and after lowering int64
For bindless SSBO access, we have to do 64-bit address calculations. On
ICL and above, we don't have 64-bit integer support so we have to lower
the address calculations to 32-bit arithmetic. If we don't run the
optimization loop before lowering, we won't fold any of the address
chain calculations before lowering 64-bit arithmetic and they aren't
really foldable afterwards. This cuts the size of the generated code in
the compute shader in dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 by
around 30%.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 15:57:10 +0000 (08:57 -0700)]
panfrost/decode: Drop _replay prefix
We don't even support replay anymore; this is just wasting characters
and adding clutter.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 15:54:49 +0000 (08:54 -0700)]
panfrost/decode: Drop _name suffixes
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 15:47:35 +0000 (08:47 -0700)]
panfrost/decode: Add MEMORY_PROP_DIR variant
This allows dumping memory properties directly without dereferencing an
address, allowing us to fix more -Waddress-of-packed-member warnings.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 15:45:51 +0000 (08:45 -0700)]
panfrost/decode: Copy embedded structs before using
Fixes some, but not all, warnings from -Waddress-of-packed-member
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 15:45:36 +0000 (08:45 -0700)]
panfrost/decode: Remove pandecode_decode_fbd_type
It is unused.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 15:41:13 +0000 (08:41 -0700)]
panfrost/midgard: Use generic outmod type
It could be midgard_outmod_float or midgard_outmod_int; don't assume
it's one or the other. Fixes -Wenum-conversion warnings.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 21:48:34 +0000 (14:48 -0700)]
panfrost: Precompute scoreboard dependents
Mali job dependency graphs, at least for GLES3.0, have the special
property that a given node will only have at most a single dependent.
This allows us to efficiently precompute the dependent array and
replace an inner loop's O(N) search with an O(1) lookup, bringing the
algorithmic complexity of scoreboarding from O(N^2) to O(N).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 21:09:57 +0000 (14:09 -0700)]
panfrost: Remove transient pool abstraction
Now that it has been totally replaced by the borrow mechanism, it is now
unused code.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 20:59:35 +0000 (13:59 -0700)]
panfrost: Subdivide fixed-size transient slabs
The whole purpose of the transient memory model is to make subdivision
stupidly easy, so let's handle that.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 20:05:14 +0000 (13:05 -0700)]
panfrost: Recycle fixed-size transient BOs
The usual case. We use the bitset to mark freedom and seize it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 19:53:36 +0000 (12:53 -0700)]
panfrost: Bookkeep transient indices
The batch now temporarily possesses the transient buffer, so it'll need
to remember that to free it later.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 19:49:23 +0000 (12:49 -0700)]
panfrost: Rewrite allocate_transient with new abstraction
We use a fixed size slab if we can, otherwise we create a dedicated
("oversized") BO and add that to the job. In the latter case we'll get
reference counting for free so we can forget about this corner case for
the rest of the series.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 12 Jul 2019 20:55:45 +0000 (13:55 -0700)]
panfrost: Add pan_bo_for_screen helper
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 11 Jul 2019 17:34:40 +0000 (10:34 -0700)]
panfrost: Add panfrost_transient_bo array
We would like transient allocations to occur on the screen (borrowed by
the batch) rather than on the context. Add fields to track this.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 11 Jul 2019 18:39:33 +0000 (11:39 -0700)]
panfrost: Don't upload vertex/tiler twice
The latter upload is correct, but the former upload is unassociated with
any particular FBO and therefore becomes orphaned. We do have to upload
at draw-time at the latest, if we haven't by then.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 11 Jul 2019 17:53:37 +0000 (10:53 -0700)]
panfrost/drm: Check allocation size is positive
Zero-sized allocations will fail with an unhelpful errno from the
kernel; check size explicitly in userspace before it gets that far.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Neil Roberts [Tue, 12 Jun 2018 20:24:00 +0000 (22:24 +0200)]
mesa/glspirv: Validate that compute shaders are not linked with other stages
The test is based on link_shaders().
For example, it allows the following test (when run on SPIR-V mode) to
pass:
spec/arb_compute_shader/linker/mix_compute_and_non_compute.shader_test
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Neil Roberts [Fri, 25 May 2018 13:34:26 +0000 (15:34 +0200)]
mesa/glspirv: Validate that there is a VS when there is a TCS, TES or GS
The shader combination tests are copied from link_shaders().
For example, it allows the following tests (when run on SPIR-V mode) to
pass:
spec/arb_tessellation_shader/linker/no-vs
spec/arb_tessellation_shader/linker/tcs-no-vs
spec/arb_tessellation_shader/linker/tes-no-vs
spec/glsl-1.50/linker/gs-without-vs
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Alejandro Piñeiro [Wed, 27 Feb 2019 14:28:47 +0000 (15:28 +0100)]
i965: don't use disk cache with SPIR-V shaders
Right now we don't support disk cache for SPIR-V shaders (from
ARB_gl_spirv), so let's avoid writing the program data to or reading
it from the disk if any in-use shaders use SPIR-V.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Alejandro Piñeiro [Wed, 27 Feb 2019 14:29:15 +0000 (15:29 +0100)]
glsl/shader_cache: handle SPIR-V shaders
Right now we don't have cache support for SPIR-V shaders (from
ARB_gl_spirv). Right now they are properly skipped because they fall
on the ff shader code path (no key, no name), but it would be better
to update current comments, and add some guards.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Arcady Goldmints-Orlov [Mon, 28 Jan 2019 16:19:28 +0000 (10:19 -0600)]
nir/linker: Initialize UniformDataDefaults when using SPIR-V
Allocate UniformDataDefaults and fill in the data defaults when
linking a SPIR-V program. Among other things, this allows program
serialization to work.
It allows the following piglit test (when run on SPIR-V mode) to pass:
spec/arb_get_program_binary/execution/uniform-after-restore.shader_test
v2: use memcpy to initialize UniformDataDefaults
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Arcady Goldmints-Orlov [Thu, 20 Dec 2018 01:12:25 +0000 (02:12 +0100)]
glsl/serialize: Update write_program_resource_data() to handle NULL input and output variable names
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Arcady Goldmints-Orlov [Thu, 29 Nov 2018 14:16:34 +0000 (15:16 +0100)]
glsl/serialize: Handle NULL uniform name in write_uniforms()
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Tue, 18 Dec 2018 10:55:04 +0000 (11:55 +0100)]
mesa/main: Fix UBO/SSBO ACTIVE_VARIABLES query (ARB_gl_spirv)
When querying MAX_NUM_ACTIVE_VARIABLES, NUM_ACTIVE_VARIABLES and
ACTIVE_VARIABLES over SSBO and UBO interfaces, we filter the variables
which are active using the variable's name and looking for it in the
program resource list. If it is in the program resource list, the
variable will be considered active.
However due to ARB_gl_spirv where name reflection information is not
mandatory, we can use the UBO/SSBO binding and variable offset to
filter which variables which are active.
v2: use RESOURCE_UBO/UNI macros instead of direct castings, update
comment (Alejandro)
v3: Change signature of _mesa_program_resource_find_active_variable
to simplify calling it. Also, squash the fix for find_binding_offset
for arrays of blocks (Arcady)
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Fri, 14 Sep 2018 06:55:24 +0000 (08:55 +0200)]
mesa/shader_query: Fix LOCATION_INDEX query (ARB_gl_spirv)
When querying GL_LOCATION_INDEX using glGetProgramResourceiv
we already know the index of the resource, we do not need to find
it using the name, which is convenient for shaders coming from
SPIR-V binaries where names are optional.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Mon, 13 Aug 2018 12:13:38 +0000 (14:13 +0200)]
mesa/shaderapi: Fix TRANSFORM_FEEDBACK_VARYING program query
Fixes the program queries API (glGetProgramiv):
TRANSFORM_FEEDBACK_VARYINGS and TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH
in two cases:
1. ARB_enhaced_layouts:
The queries were not working for GLSL shaders which specify the
varyings using enhanced layouts. We were returning the info as if the
varyings could only be specified using the API.
2. ARB_gl_spirv:
TRANSFORM_FEEDBACK_VARYING_MAX_LENGTH should return 1 if there is no
name reflection information available.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Wed, 8 Aug 2018 15:52:04 +0000 (17:52 +0200)]
mesa/uniforms: Fix GetUniformLocation (ARB_gl_spirv)
From the ARB_gl_spirv specification, glGetUniformLocation should
return -1 when no name reflection is available.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Mon, 13 Aug 2018 16:48:37 +0000 (18:48 +0200)]
mesa/shader_query: Fix NAME_LENGTH queries (ARB_gl_spirv)
For shaders constructed from SPIR-V binaries, it is possible that
no name reflection information is available. In that case,
- glGetProgramInterfaceiv(.., pname=MAX_NAME_LENGTH, ..)
- gletProgramResourceiv(.., props=NAME_LENGTH, ..)
should return 1.
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Alejandro Piñeiro [Sat, 18 Nov 2017 09:04:42 +0000 (10:04 +0100)]
mesa: Fix ACTIVE_*_MAX_LENGTH program queries (ARB_gl_spirv)
Since ARB_gl_spirv it is possible to miss a lot of name reflection
information, so it is needed to add NULL name checks for several
queries, and return a specific value on those cases. This commit add
them for ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH,
ACTIVE_ATTRIBUTE_MAX_LENGTH and ACTIVE_UNIFORM_MAX_LENGTH.
From ARB_gl_spirv spec:
"If pname is ACTIVE_UNIFORM_BLOCK_MAX_NAME_LENGTH, the length of
the longest active uniform block name, including the null
terminator, is returned. If no active uniform blocks exist, zero
is returned. If no name reflection information is available, one
is returned.
If pname is ACTIVE_ATTRIBUTE_MAX_LENGTH, the length of the longest
active attribute name, including a null terminator, is returned.
If no active attributes exist, zero is returned. If no name
reflection information is available, one is returned.
If pname is ACTIVE_UNIFORM_MAX_LENGTH, the length of the longest
active uniform name, including a null terminator, is returned. If
no active uniforms exist, zero is returned. If no name reflection
information is available, one is returned."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Thu, 15 Nov 2018 08:13:08 +0000 (09:13 +0100)]
nir/types: Add glsl_type_is_unsized_array helper
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Sun, 30 Jun 2019 23:27:59 +0000 (18:27 -0500)]
nir/linker: Fill TOP_LEVEL_ARRAY_SIZE and STRIDE
From the ARB_program_interface_query specification:
"For the property TOP_LEVEL_ARRAY_SIZE, a single integer
identifying the number of active array elements of the top-level
shader storage block member containing to the active variable is
written to <params>. If the top-level block member is not
declared as an array, the value one is written to <params>. If
the top-level block member is an array with no declared size, the
value zero is written to <params>."
"For the property TOP_LEVEL_ARRAY_STRIDE, a single integer
identifying the stride between array elements of the top-level
shader storage block member containing the active variable is
written to <params>. For top-level block members declared as
arrays, the value written is the difference, in basic machine
units, between the offsets of the active variable for consecutive
elements in the top-level array. For top-level block members not
declared as an array, zero is written to <params>."
v2: move top_level_array_size and stride into nir_link_uniforms_state
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Wed, 12 Sep 2018 11:51:57 +0000 (13:51 +0200)]
nir/linker: Compute the offset for non-trivial uniform types.
ARB_gl_spirv points that the offset must be explicit, however this is
true for 'root' types. For complex types, like struct members or
arrays of arraya, it needs to be computed.
We are not using the offset stored in the gl_buffer_variables during
the uniform blocks linking because currently we do not have a way to
relate a gl_buffer_variable with its corresponding gl_uniform_storage.
The GLSL path uses the name for that, but we can not rely on that
because names are optional in SPIR-V.
Notice that uniforms non-backed by a buffer object will have an offset
equal to -1, like in the GLSL path.
v2: add offset and var_is_in_block as per-variable state in
nir_link_uniforms_state (Arcady)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Sat, 15 Dec 2018 17:34:11 +0000 (18:34 +0100)]
nir/linker: Add atomic counters to the program resource list
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Antia Puentes [Sat, 15 Dec 2018 17:33:18 +0000 (18:33 +0100)]
nir/linker: Add XFB resources to the program resource list
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Sat, 15 Dec 2018 17:25:41 +0000 (18:25 +0100)]
nir/linker: Add BUFFER_VARIABLEs to the prog resource list
v2: use link_util_should_add_buffer_variable() (Arcady)
Signed-off-by: Arcady Goldmints-Orlov <agoldmints@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Antia Puentes [Wed, 8 Aug 2018 12:29:38 +0000 (14:29 +0200)]
nir/linker: Add inputs/outputs to the program resource list
v2: added TODO comment hinting possible future refactoring of
nir_build_program_resource_list and build_program_resource_list,
to avoid code duplication (Alejandro, to explicitly reflect a
valid concern from Timothy during the review).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alejandro Piñeiro [Fri, 23 Mar 2018 11:35:48 +0000 (12:35 +0100)]
nir/linker: add ubo/ssbo to the program resource list
v2: "nir/linker: Use the stageref when adding UBO/SSBO resources"
squashed on this one (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Antia Puentes [Sat, 25 Aug 2018 13:15:30 +0000 (15:15 +0200)]
nir/linker: Fill the uniform's BLOCK_INDEX
Binding comparison is used to determine the block the uniform is part
of. Note that to do the binding comparison we need the information in
UniformBlocks[] and ShaderStorageBlocks[] to be available, so we have
to call gl_nir_link_uniform_blocks() before linking the uniforms.
v2: add missing break (Timothy)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Samuel Pitoiset [Fri, 12 Jul 2019 06:17:09 +0000 (08:17 +0200)]
radv/gfx10: enable 1D textures
Mirror RadeonSI. This also fixes crashes in addrlib.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Andres Gomez [Fri, 12 Jul 2019 15:17:01 +0000 (18:17 +0300)]
intel/compiler: remove abandoned comments
c8665005: ("intel/compiler: Don't always require precise lowering of flrp")
forgot to remove some comments that didn't apply any more after the
change.
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
Andres Gomez [Mon, 8 Jul 2019 13:26:52 +0000 (16:26 +0300)]
nir/compiler: keep same bit size when lowering with flrp
This was probably not caught before because no supported test was
exercising the flrp lowering with other bit size different than 32.
With the arrival of VK_KHR_shader_float_controls we will have some of
those and, unless we keep the bit size, we will end with something
like:
../src/compiler/nir/nir_builder.h:420: nir_builder_alu_instr_finish_and_insert: Assertion `src_bit_size == bit_size' failed.
Fixes: 158370ed2a0 ("nir/flrp: Add new lowering pass for flrp instructions")
Fixes: ae02622d8fd ("nir/flrp: Lower flrp(a, b, c) differently if another flrp(_, b, c) exists")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrnd.net>
Jason Ekstrand [Fri, 12 Jul 2019 13:20:25 +0000 (08:20 -0500)]
anv: Properly compute image usage in CreateImageView
With separate stencil usage, we can't just grab the usage from the image
directly and have to consider the per-aspect usage instead.
Fixes: 1be38f9178 "anv:Use VK_EXT_separate_stencil_usage to avoid..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Samuel Pitoiset [Fri, 12 Jul 2019 10:17:18 +0000 (12:17 +0200)]
radv/gfx10: emit DISABLE_CONSERVATIVE_ZPASS_COUNTS
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 12 Jul 2019 10:17:17 +0000 (12:17 +0200)]
radv/gfx10: init more registers in the graphics preamble
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 12 Jul 2019 10:17:16 +0000 (12:17 +0200)]
radv/gfx10: set HS/GS/CS.WGP_MODE
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 12 Jul 2019 10:17:15 +0000 (12:17 +0200)]
radv/gfx10: emit GE_PC_ALLOC
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 12 Jul 2019 10:17:14 +0000 (12:17 +0200)]
radv/gfx10: enable vertex shaders without export parameters
GFX10 allows this.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 12 Jul 2019 10:17:13 +0000 (12:17 +0200)]
radv/gfx10: launch 2 compute waves per CU before going onto the next CU
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 12 Jul 2019 10:17:12 +0000 (12:17 +0200)]
radv: use ac_get_compute_resource_limits()
No behaviour change.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 12 Jul 2019 10:17:11 +0000 (12:17 +0200)]
ac: import ac_get_compute_resource_limits() from RadeonSI
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Alyssa Rosenzweig [Thu, 11 Jul 2019 22:46:22 +0000 (15:46 -0700)]
panfrost: Initialize shift/extra_flags
Don't rely on them being preinitialized to zero; this can cause junk to
appear on the wire.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Thu, 11 Jul 2019 22:34:56 +0000 (15:34 -0700)]
panfrost: Fix build warnings
A bunch of these are from asserts not being compiled in 32-bit mode
(once Erik's ASSERTABLE stuff is merged, we'll want to switch).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Samuel Pitoiset [Fri, 12 Jul 2019 11:59:08 +0000 (13:59 +0200)]
radv/gfx10: invalidate everything in L2 when shaders read data
This includes metadata as well. On GFX10, we have to invalidate
the L2 metadata cache when shaders read DCC.
Note that we still have to implement GFX10 coherency by
introducing INV_L2_METATADA but for now just flush L2.
This fixes a corruption with DCC and Talos.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 12 Jul 2019 09:12:58 +0000 (11:12 +0200)]
radv/gfx10: fix wrong emission of GE_CNTL
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 12 Jul 2019 09:12:57 +0000 (11:12 +0200)]
radv: add more assertions to make sure packets are correctly emitted
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Alejandro Piñeiro [Tue, 25 Jun 2019 13:02:56 +0000 (15:02 +0200)]
v3d: use inc/dec tmu operation with image atomic sub/add of 1
This allows to remove a mov of 1/-1, as it is implicit with the
operation.
As with atomic inc/dec/add, usual shader-db set doesn't include any
GLES shader using it. So using as workaround vk-gl-cts shaders, we get
this:
total instructions in shared programs:
1217013 ->
1217006 (<.01%)
instructions in affected programs: 53 -> 46 (-13.21%)
helped: 2
HURT: 0
One of the helped shader went from 40 to 34 instructions.
Reviewed-by: Eric Anholt <eric@anholt.net>
Alejandro Piñeiro [Thu, 27 Jun 2019 12:16:15 +0000 (14:16 +0200)]
v3d: refactor some code from v3d40_vir_emit_image_load_store
And moved to new auxiliar method v3d40_image_load_store_tmu_op,
equivalent to the nir_to_nir v3d_general_tmu_op, to clean-up a little.
Reviewed-by: Eric Anholt <eric@anholt.net>
Alejandro Piñeiro [Wed, 19 Jun 2019 11:17:41 +0000 (13:17 +0200)]
v3d: use inc/dec tmu operation with atomic sub/add of 1
Among other things, this avoid the need of loading 1/-1 constants (so
one less operation).
The removed comment suggest the option of adding support on NIR for
inc/dec. Intel just uses an auxiliar method to get which hw operation
is needed, so no lowering is needed. And at the same time, being so
small, seems unreasonable to try to add a general one on NIR
itself. It is more easy to just adapt the method here (that is what
the patch does right now).
It is worth to note that we are not getting any change on shader-db
stats because all those methods are used on the usual shader-db set
with shaders needing GLSL > 4.2. In general there aren't too many GLSL
ES 3.1 tests.
As an alternative, we captured the GLES3/GLSL31/GLS32 used on
vk-gl-cts, even if that is not a real life usage of shaders. With
those we get the following:
total instructions in shared programs:
1217022 ->
1217013 (<.01%)
instructions in affected programs: 117 -> 108 (-7.69%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1
helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09%
95% mean confidence interval for instructions value: -2.07 -0.93
95% mean confidence interval for instructions %-change: -10.54% -5.64%
Instructions are helped.
Note that the shaders helped are really low because most of the
vk-gl-cts tests using AtomicInc/Dec/Add are mostly used on compute
shaders. Although right now there is a branch around with CS support,
the usual is doing the stats against master.
Reviewed-by: Eric Anholt <eric@anholt.net>
Alejandro Piñeiro [Tue, 2 Jul 2019 10:45:49 +0000 (12:45 +0200)]
v3d: remove redefinition of tmu operations on nir_to_vir
They are already defined, although is a slightly different format on
the generated packet headers, so it was needed to change how it is
used on nir_to_vir.
In addition to allow to remove some duplicated headers, it will allow
to define just one get_op_for_atomic_add aux method later to support
using inc/dec instead of add of 1/-1.
Reviewed-by: Eric Anholt <eric@anholt.net>
Alejandro Piñeiro [Tue, 2 Jul 2019 10:02:04 +0000 (12:02 +0200)]
v3d: tweak initial comment on pack generator script
As the files it mentions to use as reference has slightly different
names.
Reviewed-by: Eric Anholt <eric@anholt.net>
Yevhenii Kolesnikov [Wed, 10 Jul 2019 10:44:44 +0000 (13:44 +0300)]
glsl/link_varyings: Fix hash table leak
Hash tables were not destroyed at return.
v2: Use ralloc_context (Eric Anholt)
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Kenneth Graunke [Mon, 1 Apr 2019 22:27:01 +0000 (15:27 -0700)]
iris: Simplify devinfo access in calculate_result_on_gpu()
We have devinfo, no need for screen->devinfo.
Iago Toral Quiroga [Tue, 9 Jul 2019 10:24:43 +0000 (12:24 +0200)]
v3d: remove unused definitions
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Wed, 3 Jul 2019 07:56:49 +0000 (09:56 +0200)]
v3d: move implementation of some intrinsics to separate helpers
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Thu, 11 Jul 2019 08:56:17 +0000 (10:56 +0200)]
v3d: emit correct lowering for logic ops with RGB10A2 render targets
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Mon, 8 Jul 2019 10:31:38 +0000 (12:31 +0200)]
v3d: emit correct lowering for logic ops with integer render targets
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Wed, 3 Jul 2019 07:38:39 +0000 (09:38 +0200)]
v3d: add lowering for OpenGL logic operations
This implements support for OpenGL logic operations by emitting code to read
from the TLB if needed and blending the fragment output accordingly. It is
similar to VC4's blend lowering pass, but exclusive to logic operations, since
blending is otherwise supported in hardware.
The pass doesn't handle MSAA targets yet.
Fixes the following piglit tests:
spec/!opengl 1.0/gl-1.0-logicop/*
spec/!opengl 1.1/gl-1.1-xor
spec/!opengl 1.1/gl-1.1-xor-copypixels
It also fixes text cursor rendering in Libreoffice with the GTK+2 theme, which
is rendered via glamor using the XOR logic operation.
v2: fix checks for allowed variable location and maximum render target (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Thu, 4 Jul 2019 10:22:40 +0000 (12:22 +0200)]
v3d: acquire scoreboard lock before first tlb read
Until now we have always been emitting our scoreboard locks on the last thread
switch to improve parallelism. We did this by emitting our last thread switch
right before our tlb writes at the very end of the program, where we know that
we are outside control flow.
Unfortunately, this strategy is not valid when we have tlb color reads too, as
these will happen before this point in the program and can happen inside
control flow.
To fix this we always emit a thread switch before the first tlb load and if we
see additional thread switches after that point, we change the strategy to lock
on the first thread switch.
v2: change the solution so it is expected to work in more scenarios (Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Wed, 3 Jul 2019 07:30:43 +0000 (09:30 +0200)]
v3d: implement tile buffer color read intrinsic
We will be emitting this intrinsic to signal TLB color loads when we implement
OpenGL logic operations, where we need to blend the fragment shader color
output with the existing color in the render target.
Per-sample TLB reads are not supported yet.
v2: fix the offset into the color_reads array (Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Tue, 9 Jul 2019 07:15:02 +0000 (09:15 +0200)]
nir: add a new v3d-specific intrinsic for tile buffer color reads
This is intended to be used, for example, with OpenGL logic operations. It
takes a render target as source and a sample index in the base index for
MSAA color reads.
v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Wed, 3 Jul 2019 07:19:52 +0000 (09:19 +0200)]
v3d: fix size of color_reads and sample_colors arrays
We need to scale the size of these arrays to consider up to
V3D_MAX_DRAW_BUFFERS render targets and 4 components per color.
v2: we want to store each color component separately, so scale by 4 too.
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Wed, 3 Jul 2019 07:14:25 +0000 (09:14 +0200)]
v3d: add color formats and swizzles to the fragment shader key
We are going to need these very soon to emit correct reads from the tlb
to implement logic operations.
Reviewed-by: Eric Anholt <eric@anholt.net>
Iago Toral Quiroga [Wed, 3 Jul 2019 07:09:11 +0000 (09:09 +0200)]
v3d: add helpers to emit ldtlb and ldtlbu signals
The ldtlbu version will read an implicit uniform with the TLB read
specifier and should be used for the first read in a sequence
of TLB reads (unless the default configuration is valid, in which
case we can use ldtlb). The ldtlb version is used for any subsequent
TLB read in the sequence.
Reviewed-by: Eric Anholt <eric@anholt.net>