Rafael Antognolli [Mon, 29 Apr 2019 18:05:07 +0000 (11:05 -0700)]
iris: Add Tile Cache Flush for Unified Cache.
Jordan Justen [Sat, 9 Sep 2017 02:08:21 +0000 (19:08 -0700)]
intel/genxml: Add gen12 tile cache flush bit
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Daniel Schürmann [Thu, 24 Oct 2019 16:27:25 +0000 (18:27 +0200)]
aco: implement VGPR spilling
VGPR spilling is implemented via MUBUF instructions and scratch memory.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 30 Oct 2019 17:24:39 +0000 (18:24 +0100)]
aco: always set scratch_offset in startpgm
This patch also moves private_segment_buffer and
scratch_offset to Program to easily access it.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 30 Oct 2019 13:54:44 +0000 (14:54 +0100)]
aco: omit linear VGPRs as spill variables
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 30 Oct 2019 13:42:00 +0000 (14:42 +0100)]
aco: ensure that spilled VGPR reloads are done after p_logical_start
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Thu, 24 Oct 2019 09:38:37 +0000 (11:38 +0200)]
aco: simplify calculation of target register pressure when spilling
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Rhys Perry [Wed, 30 Oct 2019 18:00:36 +0000 (18:00 +0000)]
aco: fix new_demand calculation for first instructions
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Daniel Schürmann [Wed, 30 Oct 2019 11:32:32 +0000 (12:32 +0100)]
aco: don't add interferences between spilled phi operands
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 30 Oct 2019 11:04:22 +0000 (12:04 +0100)]
aco: consider loop_exit blocks like merge blocks, even if they have only one predecessor
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 30 Oct 2019 11:00:23 +0000 (12:00 +0100)]
aco: don't insert the exec mask into set of live-out variables when spilling
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 16 Oct 2019 14:39:06 +0000 (16:39 +0200)]
aco: fix transitive affinities of spilled variables
Variables spilled on both branch legs need to be assigned to the same spilling slot.
These affinities can be transitive through multiple merge blocks.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Tue, 29 Oct 2019 10:58:21 +0000 (11:58 +0100)]
aco: fix live-range splits of phis
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Tue, 29 Oct 2019 10:57:11 +0000 (11:57 +0100)]
aco: remove potential critical edge on loops.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Tue, 29 Oct 2019 10:56:09 +0000 (11:56 +0100)]
aco: improve live variable analysis
This patch makes the live variable analysis more precise
w.r.t. killed phi operands and the block's register pressure.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Tue, 15 Oct 2019 16:23:52 +0000 (18:23 +0200)]
aco: Lower to CSSA
Converting to 'Conventional SSA Form' ensures correctness w.r.t. spilling of phi nodes.
Previously, it was possible that phi operands have intersecting live-ranges, and thus,
couldn't get spilled to the same spilling slot. For this reason, ACO tried to avoid to
spill phis, even if it was beneficial.
This patch implements a conversion pass which is currently only called if spilling is necessary.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Jonathan Marek [Wed, 3 Jul 2019 18:08:37 +0000 (14:08 -0400)]
etnaviv: fix non-pointsprite points on GC7000L
Fixes these deqp tests (and more):
dEQP-GLES2.functional.draw.draw_arrays.points.single_attribute
dEQP-GLES2.functional.draw.draw_arrays.points.multiple_attributes
dEQP-GLES2.functional.draw.draw_arrays.points.default_attribute
dEQP-GLES2.functional.draw.draw_elements.points.single_attribute
dEQP-GLES2.functional.draw.draw_elements.points.multiple_attributes
dEQP-GLES2.functional.draw.draw_elements.points.default_attribute
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Sun, 20 Oct 2019 18:37:25 +0000 (14:37 -0400)]
etnaviv: stencil fix
The final version of previous stencil fix patch ended up breaking one-sided
stencil.
Fixes remaining failures in these deqp tests (tested on GC3000/GC7000L):
dEQP-GLES2.functional.fragment_ops.depth_stencil.*
Note: deqp tests require --deqp-gl-config-name=rgba8888d24s8ms0
Fixes: 05da025f ("etnaviv: fix two-sided stencil")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Mon, 2 Sep 2019 18:46:15 +0000 (14:46 -0400)]
etnaviv: fix depth bias
Fixes remaining failures in these deqp tests (tested on GC3000/GC7000L):
dEQP-GLES2.functional.polygon_offset.*
Fixes: 6c3c05dc ("etnaviv: fix polygon offset")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jordan Justen [Fri, 10 May 2019 18:50:54 +0000 (11:50 -0700)]
iris: Set MOCS for external surfaces to uncached
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Rafael Antognolli [Tue, 13 Aug 2019 21:47:27 +0000 (14:47 -0700)]
iris: Align fast clear color state buffer to a page.
On gen11 and older, compressed images are tiled and aligned to 4K. On
gen12 this 4K alignment restriction was removed. However, only aligning
the fast clear color buffer to 64B (a cacheline, as it's on the
documentation) is causing some bugs where the fast clear color is not
converted during the fast clear operation. Aligning things to 4K seems
to fix it.
v2: Fix typo case in the comment (Nanley)
v3: Rebase and fix conflicts.
v4: Fix rebase mistake (Nanley).
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Rafael Antognolli [Tue, 13 Aug 2019 21:47:27 +0000 (14:47 -0700)]
anv: Align fast clear color state buffer to a page.
On gen11 and older, compressed images are tiled and aligned to 4K. On
gen12 this 4K alignment restriction was removed. However, only aligning
the fast clear color buffer to 64B (a cacheline, as it's on the
documentation) is causing some bugs where the fast clear color is not
converted during the fast clear operation. Aligning things to 4K seems
to fix it.
v2: Assert that image->planes[plane].offset is 4K aligned (Nanley)
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Erik Faye-Lund [Wed, 23 Oct 2019 10:16:22 +0000 (12:16 +0200)]
zink: only enable KHR_external_memory_fd if supported
While we're at it, make sure we error out if it's not supported when
required.
This brings us a bit closer to being able to test on SwiftShader, which
doesn't currently support KHR_external_memory_fd.
Bas Nieuwenhuizen [Wed, 30 Oct 2019 13:51:17 +0000 (14:51 +0100)]
radv: Start signalling semaphores in WSI acquire.
Winsys semaphores without signal operation get silently ignored.
Not so for syncobjs, so actually signal them.
Fixes: 84d9551b232 "radv: Always enable syncobj when supported for all fences/semaphores."
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2030
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Rhys Perry [Mon, 21 Oct 2019 14:08:07 +0000 (15:08 +0100)]
aco: rename README to README.md
Closes: #1974
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Rhys Perry [Tue, 29 Oct 2019 11:19:39 +0000 (11:19 +0000)]
aco: a couple loop handling fixes for GFX10 hazard pass
It was joining from the wrong blocks and block.kind is a bitmask instead
of an enum.
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Matt Turner [Mon, 9 Sep 2019 20:01:06 +0000 (13:01 -0700)]
intel/compiler: Add instruction compaction support on Gen12
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Matt Turner [Thu, 15 Feb 2018 18:33:18 +0000 (10:33 -0800)]
intel/compiler: Make separate src0/src1 index tables
TGL uses different data (and even a different format!) for each source.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Matt Turner [Tue, 13 Feb 2018 00:35:49 +0000 (16:35 -0800)]
intel/compiler: Inline get_src_index()
TGL will have separate tables for src0 and src1, so the shared function
will no longer make sense.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Matt Turner [Tue, 13 Feb 2018 00:26:20 +0000 (16:26 -0800)]
intel/compiler: Restructure instruction compaction in preparation for Gen12
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Matt Turner [Wed, 16 Oct 2019 19:45:55 +0000 (12:45 -0700)]
intel/compiler: Remove unreachable() from brw_reg_type.c
The EU compaction unit test fuzzes the compaction code by flipping bits.
We use a simple skip_bits() function with a list of reserved bits to
ignore, but for more complex cases like invalid combinations of register
file:type, we need either machinery to check validity or for these
functions to simply inform us whether a combination was valid.
enum brw_reg_type a 4-bit field in brw_reg, so rather than expanding it
with an "INVALID" value, just return -1 and let the caller check for
that.
Scott suggested redefining unreachable() within the unit test to
longjmp() which would allow driver code like this to still use it and
allow the test to handle expected failures like this. If that plan works
out, I plan to revert this.
Jonathan Marek [Fri, 6 Sep 2019 16:59:15 +0000 (12:59 -0400)]
freedreno/a2xx: add missing vertex formats (SSCALE/USCALE/FIXED)
Mostly for vertex formats, but they are supported as texture formats too
(untested however).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Pierre-Eric Pelloux-Prayer [Tue, 22 Oct 2019 08:12:49 +0000 (10:12 +0200)]
radeonsi: disable sdma for gfx10
Disable sdma on gfx10 until all timeouts bugs are fixed.
See:
https://gitlab.freedesktop.org/mesa/mesa/issues/1907
https://bugs.freedesktop.org/show_bug.cgi?id=111481
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pierre-Eric Pelloux-Prayer [Thu, 17 Oct 2019 14:15:54 +0000 (16:15 +0200)]
radeonsi: sdma misc fixes
SDMA IB doesn't need to be padded for SDMA.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pierre-Eric Pelloux-Prayer [Tue, 15 Oct 2019 13:19:22 +0000 (15:19 +0200)]
radeonsi: align sdma byte count to dw
If src/dst addresses are dw aligned and size is > 4 then we align
byte count to dw as well.
PAL implementation works like this.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timur Kristóf [Tue, 17 Sep 2019 17:59:52 +0000 (19:59 +0200)]
radv: Enable ACO on Navi.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Leo Liu [Mon, 28 Oct 2019 17:17:04 +0000 (13:17 -0400)]
radeonsi: enable 8K video decode support for HEVC and VP9
HW 8K decode support starts at Renoir
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Leo Liu [Mon, 28 Oct 2019 17:08:25 +0000 (13:08 -0400)]
radeon/vcn: Add VP9 8K decode support
Require increase of context buffer size
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Rhys Perry [Fri, 18 Oct 2019 12:05:00 +0000 (13:05 +0100)]
aco: try to group together VMEM loads of the same resource
v2: remove accidental shaderInt16 change
v2: simplify can_move_down initialization
v2: simplify VMEM_CLAUSE_MAX_GRAB_DIST
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Daniel Schürmann [Thu, 10 Oct 2019 14:31:40 +0000 (16:31 +0200)]
aco: don't schedule instructions through depending VMEM instructions
Previously, the scheduler tried to move up instructions from below depending
VMEM instructions only to move them down again when scheduling the VMEM
instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Thu, 10 Oct 2019 12:55:13 +0000 (14:55 +0200)]
aco: add can_reorder flags to load_ubo and load_constant
These got lost due to some refactoring.
Due to the way our scheduler works currently, for now
we add back the reorder flag for divergent loads only.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Wed, 28 Aug 2019 10:08:12 +0000 (12:08 +0200)]
aco: only skip RAR dependencies if the variable is killed somewhere
This patch changes VMEM scheduling in a way that they can only
be moved upwards by previous VMEM instructions but not downwards.
This way, it improves the order of VMEM instructions in relation
to their users.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Thu, 29 Aug 2019 15:17:32 +0000 (17:17 +0200)]
aco: restrict scheduling depending on max_waves
Previously, we allowed all shaders to reduce the number of max_waves to as low as 5.
Restricting this on shaders with low register demand, increases the total number of waves
while the VMEM def-use distances hardly change.
This patch also changes the max number of move operations per MEM instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Jason Ekstrand [Tue, 29 Oct 2019 21:10:49 +0000 (16:10 -0500)]
anv: Avoid emitting UBO surface states that won't be used
This shaves around 4-5% off of a CPU-limited example running with the
Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 29 Oct 2019 22:28:18 +0000 (17:28 -0500)]
intel/vec4: Set brw_stage_prog_data::has_ubo_pull
In
0e4a75f917, Ken added a flag brw_stage_prog_data which indicates
whether any UBO pulls ever occur. Unfortunately, he neglected to set
the bit in the vec4 back-end. This was fine at the time because the
optimization was intended for iris which does not support gen7 and using
the vec4 back-end on Gen8+ requires an environment variable. We want to
use this in Vulkan which does support Gen7 so we want the information
from the vec4 back-end as well as scalar.
Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Samuel Pitoiset [Mon, 28 Oct 2019 14:12:27 +0000 (15:12 +0100)]
radv: fix perftest options
RADV_PERFTEST=outooforder has been removed a while ago. This fixes
dumping the options into hang reports.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 28 Oct 2019 14:12:03 +0000 (15:12 +0100)]
radv: move nomemorycache debug option at the right palce
Fixes: 6571000071d ("radv: add debug option to turn off in memory cache")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Mon, 28 Oct 2019 15:56:15 +0000 (16:56 +0100)]
radv: fix dumping SPIR-V into hang reports
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tapani Pälli [Fri, 25 Oct 2019 08:06:05 +0000 (11:06 +0300)]
mesa: enable ARB_gpu_shader_int64 in compat profile
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tapani Pälli [Fri, 25 Oct 2019 08:00:04 +0000 (11:00 +0300)]
mesa: add [Program]Uniform*64ARB display list support
This is required for int64 to be enabled in compat profile.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bas Nieuwenhuizen [Fri, 25 Oct 2019 08:26:50 +0000 (10:26 +0200)]
radv: Enable VK_KHR_timeline_semaphore.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 28 Oct 2019 01:44:54 +0000 (02:44 +0100)]
radv: Add wait-before-submit support for timelines.
This is actually a non-threaded implementation. I'd summarize this
as event-based submission.
When submit happens we walk a tree of submissions that depend on
the syncobj signal operations to be submitted and if those submission
we no other dependencies we start to execute them immediately.
Or, well I still use a list to avoid issues with long chains and
the stacksize when using recursion.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Tue, 22 Oct 2019 08:18:06 +0000 (10:18 +0200)]
radv: Add timelines with a VK_KHR_timeline_semaphore impl.
This does not fully do wait-before-submit, to be done in a follow
up patch.
For kernels without support for timeline syncobjs, this adds an
implementation of non-shareable timelines using legacy syncobjs.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Wed, 23 Oct 2019 13:31:43 +0000 (15:31 +0200)]
radv: Add temporary datastructure for submissions.
So we can defer them.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 20 Oct 2019 20:50:58 +0000 (22:50 +0200)]
radv: Split semaphore into two parts as enum+union.
This is in preparation to adding more types.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 20 Oct 2019 17:15:24 +0000 (19:15 +0200)]
radv: Always enable syncobj when supported for all fences/semaphores.
This simplifies code for timeline semaphores by needing to support
less configurations.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 20 Oct 2019 17:12:24 +0000 (19:12 +0200)]
radv: Improve fence signalling in QueueSubmit.
Only signalling it once.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sat, 19 Oct 2019 15:05:22 +0000 (17:05 +0200)]
radv: Do sparse binding in queue submission.
So we have one place to do queue things if we end up deferring
submissions.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Thu, 3 Oct 2019 19:08:29 +0000 (21:08 +0200)]
radv: Split out commandbuffer submission.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Tue, 1 Oct 2019 16:14:34 +0000 (18:14 +0200)]
radv: Clean up unused variable.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Wed, 30 Oct 2019 02:29:21 +0000 (03:29 +0100)]
radv: Add an early exit in the secure compile if we already have the cache entries.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bas Nieuwenhuizen [Wed, 30 Oct 2019 01:54:37 +0000 (02:54 +0100)]
radv: Compute hashes in secure process for secure compilation.
To prevent poisoning arbitrary cache entries.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Erik Faye-Lund [Tue, 29 Oct 2019 13:12:02 +0000 (14:12 +0100)]
zink: drop nop descriptor-updates
If there's nothing to be done, let's actually do nothing. Seems like a
good idea.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Tue, 29 Oct 2019 12:27:58 +0000 (13:27 +0100)]
zink: use bitfield for dirty flagging
Bitfields are a bit more ideomatic than explicit flags, and harder to
get wrong.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Erik Faye-Lund [Tue, 29 Oct 2019 11:43:56 +0000 (12:43 +0100)]
zink: use dynamic state for line-width
This will lead to fewer pipelines in the cache, which is assumed to
become our most unavoidable performance bottle-neck down the line.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Duncan Hopkins [Wed, 14 Aug 2019 10:07:47 +0000 (11:07 +0100)]
zink: Use optimal layout instead of general. Reduces valid layer warnings. Fixes RADV image noise.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Michel Dänzer [Wed, 30 Oct 2019 08:38:20 +0000 (09:38 +0100)]
gitlab-ci: Disable meson-windows job for the time being
It needs a CI runner carrying the mesa-windows tag, but there's none
available currently.
Timothy Arceri [Tue, 29 Oct 2019 06:46:57 +0000 (17:46 +1100)]
radv: make use of radv_sc_read()
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Timothy Arceri [Tue, 29 Oct 2019 06:43:40 +0000 (17:43 +1100)]
radv: add radv_sc_read() helper
This is a function with timeout support for reading from the pipe
between processes used for secure compile.
Initially we hardcode the timeout to 5 seconds. We can adjust the
timeout limit in future if needed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Timothy Arceri [Tue, 29 Oct 2019 06:41:41 +0000 (17:41 +1100)]
radv: allow select() calls in secure compile
This will be used in the following patch to support timeouts for
reading the pipe between processes.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Lepton Wu [Wed, 30 Oct 2019 00:52:21 +0000 (17:52 -0700)]
mapi: Improve the x86 tsd stubs performance.
This skips touching %ebx most times and it shows that glGetString performance
increased from 114M/s to 120M/s on my desktop.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
Lepton Wu [Tue, 22 Oct 2019 03:22:18 +0000 (20:22 -0700)]
mapi: Inline call x86_current_tls.
This saves one return and a simple benchmark which calls glGetString
repeatedly on my desktop shows it improves calls per second from 123M
to 141M.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1997
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
Lepton Wu [Sat, 26 Oct 2019 00:27:04 +0000 (17:27 -0700)]
mapi: Clean up entry_patch_public for x86 tls
Remove hard coded 16 and use entry_generate_or_patch to patch
public stubs. The generated code actually is sightly tighter
than before since the "nop" instructions before the final "jmp"
get removed.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
Lepton Wu [Fri, 25 Oct 2019 23:54:35 +0000 (16:54 -0700)]
mapi: split entry_generate_or_patch for x86 tls
The code works exactly the same with before. Just split this function
out so we can reuse it.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
Jonathan Gray [Fri, 13 Sep 2019 17:09:15 +0000 (10:09 -0700)]
mapi: Adapted libglvnd x86 tsd changes
The x86 assembly language stub in src/mapi/entry_x86_tsd.h does not
generate PIC (position-independent code). This causes text relocations
which bring troubles on recent versions of FreeBSD, OpenBSD, Android.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108541
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
Caio Marcelo de Oliveira Filho [Tue, 29 Oct 2019 19:09:38 +0000 (12:09 -0700)]
spirv: Don't fail if multiple ordering semantics bits are set
Vulkan requires that only one bit for the ordering is set, but old
versions of GLSLang just set all the bits. This was fixed as part of
https://github.com/KhronosGroup/glslang/commit/
c51287d744fb6e7e9ccc09f6f8451e6c64b1dad6
but we can still find older versions (or shaders compiled with it)
around.
So instead of failing, emit a warning and fallback to the effective
result of any combination of multiple bits: AcquireRelease.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2018
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Sagar Ghuge [Tue, 15 Oct 2019 21:13:29 +0000 (14:13 -0700)]
intel/isl: Allow stencil buffer to support compression on Gen12+
v2: (Nanley Chery)
- Fix commit title
- Fix comment
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Sagar Ghuge [Tue, 17 Sep 2019 20:20:16 +0000 (13:20 -0700)]
iris: Resolve stencil resource prior to copy or used by CPU
v2: Decide aux usage in get_copy_region_aux_settings (Nanley Chery)
v3: Use isl_surf_usage_is_stencil function (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Sagar Ghuge [Tue, 3 Sep 2019 23:30:14 +0000 (16:30 -0700)]
iris: Prepare resources before stencil blit operation
We have to resolve destination surfaces if we are bliting to and from
the same surface.
v2: Revert unrelated change (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Sagar Ghuge [Wed, 28 Aug 2019 07:21:20 +0000 (00:21 -0700)]
iris: Prepare depth resource if clear_depth enable
Avoid preparing depth resource, if we did fast depth clear before.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Sagar Ghuge [Wed, 14 Aug 2019 20:58:57 +0000 (13:58 -0700)]
iris: Prepare stencil resource before clear depth stencil
Let aux surface state tracker track the stencil buffer's aux state while
clearing depth stencil buffer.
v2: Fix condition check (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Sagar Ghuge [Wed, 7 Aug 2019 20:42:39 +0000 (13:42 -0700)]
iris: Resolve stencil buffer lossless compression with WM_HZ_OP packet
Even though stencil buffer compression looks like regular lossless color
compression w/o fast clear support, we have to resolve stencil buffer
with WM_HZ_OP packet.
v2: Check if resource is stencil with helper function (Nanley Chery)
v3: Remove unnecessary included file (Nanley Chery)
v4: (Nanley Chery)
- Avoid stencil buffer aux state transition by improving condition check
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Sagar Ghuge [Tue, 17 Sep 2019 18:04:15 +0000 (11:04 -0700)]
intel/blorp: Set stencil resolve enable bit
When set, the stencil buffer is filled with the true stencil values and
we have to disable stencil buffer clear enable bit.
v2: 1) Refactor code little bit (Nanley Chery)
2) Fix assertion (Nanley Chery)
v3: 1) Remove unncessary assignment (Nanley Chery)
2) Fix GEN_GEN check (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Sagar Ghuge [Wed, 23 Oct 2019 23:24:46 +0000 (16:24 -0700)]
intel: Track stencil aux usage on Gen12+
Enable stencil compression enable and control surface enable bit if
stencil buffer lossless compression is enabled.
v2: Remove unnecessary GEN_GEN check (Nanley Chery)
v3: (Nanley Chery)
- Change commit subject tag from intel/isl to intel
- Keep assignment order correct
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Sagar Ghuge [Tue, 15 Oct 2019 18:15:22 +0000 (11:15 -0700)]
intel/blorp: Add helper function for stencil buffer resolve
On Gen12+, Stencil buffer's lossless compression should be resolved
with WM_HZ_OP packet.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Sagar Ghuge [Wed, 14 Aug 2019 20:58:33 +0000 (13:58 -0700)]
intel/blorp: Assign correct view while clearing depth stencil
We never saw any failures regarding this typo but it's good to assign
correct stencil view while constructing blorp_params.
Fixes: 0cabf93b80d0 "intel/blorp: Add an entrypoint for clearing depth and stencil"
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Sagar Ghuge [Wed, 23 Oct 2019 23:17:48 +0000 (16:17 -0700)]
genxml/gen12: Add Stencil Buffer Resolve Enable bit
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Nanley Chery [Thu, 24 Oct 2019 16:14:07 +0000 (09:14 -0700)]
iris: Allocate main and aux surfaces together
On Gen12, the CCS buffer address doesn't have to be referenced in state
packets. In the case of a stencil buffer with CCS, the kernel won't know
the location of the CCS unless an extra call is made to pin its address.
To avoid this extra call, make the CCS part of the main surface.
v2. Update comment above bo_size. (Jordan)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Nanley Chery [Fri, 25 Oct 2019 22:07:42 +0000 (15:07 -0700)]
iris: Determine aux offsets within configure_aux
If a resource has a modifier, the main and aux surfaces will share a BO.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Nanley Chery [Fri, 25 Oct 2019 22:38:18 +0000 (15:38 -0700)]
iris: Bail resource creation upon aux creation error
The functions used during aux buffer configuration and creation only
return false for exceptional errors. Don't proceed with surface creation
in those cases.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Nanley Chery [Fri, 25 Oct 2019 19:05:58 +0000 (12:05 -0700)]
iris: Drop iris_resource::aux::extra_aux::bo
The primary and secondary aux buffers are always allocated in the same
BO.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Duncan Hopkins [Tue, 24 Sep 2019 15:03:04 +0000 (16:03 +0100)]
zink: pass line width from rast_state to gfx_pipeline_state.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Jason Ekstrand [Mon, 28 Oct 2019 16:17:06 +0000 (11:17 -0500)]
anv: Reduce the minimum number of relocations
The original value of 256 was under the assumption that you're a batch
buffer which is likely going to have a large number of relocations.
However, pipeline objects on Gen7 will have at most 6 relocations (one
per shader stage and one for the workaround BO) so this is a lot of
per-pipeline wasted space.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Jason Ekstrand [Mon, 28 Oct 2019 15:22:47 +0000 (10:22 -0500)]
anv: Delay allocation of relocation lists
The old relocation list code always allocated 256 relocations and a hash
set up-front without knowing whether or not we really need them. In
particular, in the softpin case, this is two fairly large allocations
that we don't need to be making. Also, for pipeline objects on haswell
where we don't have softpin, we don't need relocations unless scratch is
used so this is extra data per-pipeline. Instead, we should do it
on-demand. This shaves 3.5% off of a cpu-limited example running with
the Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Plamena Manolova [Wed, 23 Oct 2019 22:47:03 +0000 (23:47 +0100)]
anv: Implement new way for setting streamout buffers.
For gen12 we set the streamout buffers using 4 separate
commands instead of 3DSTATE_SO_BUFFER.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Plamena Manolova [Wed, 23 Oct 2019 22:45:58 +0000 (23:45 +0100)]
iris: Implement new way for setting streamout buffers.
For gen12 we set the streamout buffers using 4 separate
commands instead of 3DSTATE_SO_BUFFER.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Plamena Manolova [Thu, 17 Oct 2019 20:05:55 +0000 (21:05 +0100)]
genxml: Add 3DSTATE_SO_BUFFER_INDEX_* instructions
For gen12 we set the streamout buffers using 4 separate
commands instead of 3DSTATE_SO_BUFFER.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Rob Clark [Thu, 24 Oct 2019 21:29:39 +0000 (14:29 -0700)]
freedreno/a6xx: add a618 support
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 24 Oct 2019 21:03:32 +0000 (14:03 -0700)]
freedreno/a6xx: cleanup magic registers
Extract out values for the handful of unknown registers which have
different values across different a6xx models, to simplify adding
support for new a6xx's.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Thu, 24 Oct 2019 21:22:09 +0000 (14:22 -0700)]
freedreno/a6xx: remove some left over dead code
These registers don't exist, just remnants of initial port from a5xx.
Signed-off-by: Rob Clark <robdclark@chromium.org>