Danylo Piliaiev [Tue, 12 Mar 2019 15:13:47 +0000 (17:13 +0200)]
anv: Fix destroying descriptor sets when pool gets reset
pool->next and pool->free_list were reset before their usage in
anv_descriptor_pool_free_set
Fixes: 775aabdd "anv: destroy descriptor sets when pool gets reset"
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Eric Anholt [Mon, 11 Mar 2019 22:59:24 +0000 (15:59 -0700)]
v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER.
This reduces the runtime of dEQP-GLES3.functional.shaders.precision.* from
11.5s to 3.3s. This brings CTS runs down to 4 hours on one of my target
devices.
Jason Ekstrand [Thu, 7 Mar 2019 21:01:37 +0000 (15:01 -0600)]
intel/nir: Vectorize all IO
The IO scalarization pass that we run to help with linking end up
turning some shader I/O such as that for tessellation and geometry
shaders into many scalar URB operations rather than one vector one. To
alleviate this, we now vectorize the I/O once again. This fixes a 10%
performance regression in the GfxBench tessellation test that was caused
by scalarizing.
Shader-db results on Kaby Lake:
total instructions in shared programs:
15224023 ->
15220871 (-0.02%)
instructions in affected programs: 342009 -> 338857 (-0.92%)
helped: 1236
HURT: 443
total spills in shared programs: 23471 -> 23465 (-0.03%)
spills in affected programs: 6 -> 0
helped: 1
HURT: 0
total fills in shared programs: 31770 -> 31766 (-0.01%)
fills in affected programs: 4 -> 0
helped: 1
HURT: 0
Cycles was just a lot of churn do to moves being different places. Most
of the pure churn in instructions was +/- one or two instructions in
fragment shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf56a "intel/nir: Call nir_lower_io_to_scalar_early"
Fixes: 8d8222461f9d "intel/nir: Enable nir_opt_find_array_copies"
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Jason Ekstrand [Wed, 6 Mar 2019 21:21:51 +0000 (15:21 -0600)]
nir: Add a pass for lowering IO back to vector when possible
This pass tries to turn scalar and array-of-scalar IO variables into
vector IO variables whenever possible.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Rhys Perry [Thu, 6 Dec 2018 14:58:50 +0000 (14:58 +0000)]
ac/nir: fix 16-bit ssbo stores
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
pal1000 [Thu, 7 Mar 2019 08:38:10 +0000 (10:38 +0200)]
scons: Compatibility with Scons development version string
This ensures Mesa3D build doesn't fail in this case as encountered when
bisecting Scons source code while regression testing
https://bugs.freedesktop.org/show_bug.cgi?id=109443
and when testing 3.0.5.a.2
Technical details:
Scons version string has consistently been in this format:
MajorVersion.MinorVersion.Patch[.alpha/beta.yyyymmdd]
so these formulas should strip alpha/beta flags and return Scons version:
- as string - `'.'.join(SCons.__version__.split('.')[:3])`
- as tuple of integers - `tuple(map(int, SCons.__version__.split('.')[:3]))`
- v2: Fixed Scons version retrieval formulas as string and tuple of integers.
- v3: Fixed Scons version string format description.
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Tapani Pälli [Tue, 12 Mar 2019 12:01:26 +0000 (14:01 +0200)]
anv: revert "anv: release memory allocated by glsl types during spirv_to_nir"
This reverts commit
47fc359822494935852de1e70e4d840b2fe6a25c.
Reason is that patch did not take in to account situation where we might
have both OpenGL and Vulkan using glsl_types at the same time.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Connor Abbott [Mon, 4 Mar 2019 17:00:30 +0000 (18:00 +0100)]
radeonsi/nir: Use nir stripping pass
This reduces compilation time for my shader-db collection from around 40
seconds to 30, vs. 19 seconds for TGSI. There are still some shaders
that TGSI caches but NIR doesn't, partly because of more aggressive
cross-stage optimizations with NIR.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Connor Abbott [Mon, 4 Mar 2019 16:51:12 +0000 (17:51 +0100)]
nir: Add a stripping pass for improved cacheability
Oftentimes various nir shaders after lowering will be the same, or
almost the same. For example, this can happen when the same shader is
linked with different shaders to form different pipelines and
cross-stage optimizations don't kick in to change it. We want to avoid
running the backend twice on these shaders. We were already doing this
with radeonsi, but we were storing a few extra pieces of information
that made this much less effective compared to TGSI. The worse offender
by far was the program name, which caused most of the cache misses. This
pass strips out these pieces of information, controlled by the NIR_STRIP
debug env variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Samuel Pitoiset [Mon, 11 Mar 2019 09:25:53 +0000 (10:25 +0100)]
radv: fix pointSizeRange limits
The values should match the ones that are emitted.
This fixes new CTS dEQP-VK.rasterization.primitive_size.points.*.
Fixes: f4e499ec791 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Sagar Ghuge [Thu, 14 Feb 2019 06:22:16 +0000 (22:22 -0800)]
iris: Flag fewer dirty bits in BLORP
v2: 1) Skip flagging IRIS_DIRTY_DEPTH_BUFFER if
BLORP_BATCH_NO_EMIT_DEPTH_STENCIL is set (Kenneth Graunke)
2) Add missing flags (Kenneth Graunke)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Timothy Arceri [Fri, 1 Mar 2019 10:35:41 +0000 (21:35 +1100)]
st/glsl_to_nir: fix incorrect arrary access
This fixes a segfault when we try to access the array using a
-1 when the array wasn't allocated in the first place.
Before
7536af670b75 we would just access a pre-allocated array
that was also load/stored to/from the shader cache. But now the
cache will no longer allocate these arrays if they are empty.
The change resulted in tests such as the following segfaulting
when run with a warm shader cache.
tests/spec/arb_arrays_of_arrays/execution/sampler/fs-struct-const-index.shader_test
Brian Paul [Tue, 12 Mar 2019 02:12:15 +0000 (20:12 -0600)]
nir: silence a couple new compiler warnings
[33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'.
../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’:
../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
if (*ind == NULL || *ind && (*ind)->type != basic_induction ||
^
[85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'.
../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’:
../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable]
nir_cf_node *unroll_loc =
^
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Alyssa Rosenzweig [Sat, 9 Mar 2019 00:45:23 +0000 (00:45 +0000)]
panfrost: Identify fragment_extra flags
The fragment_extra structure contains additional fields extending the
MRT framebuffer descriptor, snuck in between the main framebuffer
descriptor and the render targets. Its fields include those related to
transaction elimination and depth/stencil buffers. This patch identifies
the flags field (previously just "unk" with some magic values) as well
as identifying some (but not all) flags set by the driver.
The process of identifying flags brought a bug to light where
transaction elimination (checksumming) could not be enabled unless AFBC
was in-use. This issue is now resolved.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Sat, 9 Mar 2019 00:12:07 +0000 (00:12 +0000)]
panfrost: Document "depth-buffer writeback" bit
This bit, if set, causes the depth buffer to be copied from GPU tile
memory to the provided depth buffer in main memory. If not set, the GPU
will not access the main memory (saving considerable memory bandwidth if
depth results are not actually used).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Alyssa Rosenzweig [Fri, 8 Mar 2019 23:41:12 +0000 (23:41 +0000)]
panfrost: Support linear depth textures
This combination has not yet been seen "in the wild" in traces, but to
support linear depth FBOs, ~bruteforce reveals this bit pattern is
necessary. It's not yet clear why the meanings of 0x1 and 0x2 are
essentially flipped (tiled vs linear for colour, linear vs some sort of
tiled for depth).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Fri, 8 Mar 2019 23:36:02 +0000 (23:36 +0000)]
panfrost: Allocate dedicated slab for linear BOs
Previously, linear BOs shared memory with each other to minimize kernel
round-trips / latency, as well as to work around a bug in the free_slab
function. These concerns are invalid now, but continuing to use the slab
allocator for BOs resulted in memory allocation errors. This issue was
aggravated, though not introduced (so not a real regression) in the
previous commit.
v2 (unreviewed): Fix bug in v1 preventing munmaps from working
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Thu, 7 Mar 2019 04:42:49 +0000 (04:42 +0000)]
panfrost: Determine framebuffer format bits late
Again, these formats are only properly known at the time of fragment job
emit. Rather than hardcoding the format, at least for MFBD we begin to
construct the format bits on-demand. This cleans up the code,
futureproofs for ES3 framebuffer formats, and should fix bugs regarding
FBO colour swizzles.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
Alyssa Rosenzweig [Thu, 7 Mar 2019 04:19:21 +0000 (04:19 +0000)]
panfrost: Delay color buffer setup
In an effort to cleanup framebuffer management code, we delay
colour buffer setup until the FRAGMENT job is actually emitted, allowing
the AFBC and linear codepaths to be unified.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
Alyssa Rosenzweig [Thu, 7 Mar 2019 03:52:20 +0000 (03:52 +0000)]
panfrost: Combine has_afbc/tiled in layout enum
AFBC, tiled, and linear BO layouts are mutually exclusive; they should
be coupled via a single enum rather than ad hoc checks of booleans.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
Alyssa Rosenzweig [Thu, 7 Mar 2019 03:24:45 +0000 (03:24 +0000)]
panfrost: Cleanup needless if in create_bo
I'm not sure why we were checking for these additional criteria (likely
inherited from some other driver); remove the needless checks to cleanup
the code and perhaps fix some bugs down the line.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
Kenneth Graunke [Fri, 17 Nov 2017 07:47:43 +0000 (23:47 -0800)]
i965: Reimplement all the PIPE_CONTROL rules.
This implements virtually all documented PIPE_CONTROL restrictions
in a centralized helper. You now simply ask for the operations you
want, and the pipe control "brain" will figure out exactly what pipe
controls to emit to make that happen without tanking your system.
The hope is that this will fix some intermittent flushing issues as
well as GPU hangs. However, it also has a high risk of causing GPU
hangs and other regressions, as this is a particularly sensitive
area and poking the bear isn't always advisable.
Mark Janes noted that this patch helps with some GPU hangs on Icelake.
This does re-enable the VF Invalidate => Write Immediate workaround
on Gen8, which had been disabled (bug 103787) due to GPU hangs. The
old code did this workaround after another which would have added CS
stall bits, so it missed a workaround. The new code orders them
properly and appears to work.
v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by
Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for
production hardware.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Thu, 1 Nov 2018 22:55:51 +0000 (15:55 -0700)]
i965: Use genxml for emitting PIPE_CONTROL.
While this does add a bunch of boilerplate, it also protects us against
the hardware moving bits, or changing their meaning. For something as
finnicky as PIPE_CONTROL, the extra safety seems worth it.
We turn PIPE_CONTROL_* into an bitfield of arbitrary flags, and then
pack them appropriately.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Thu, 1 Nov 2018 22:55:21 +0000 (15:55 -0700)]
i965: Rename ISP_DIS to INDIRECT_STATE_POINTERS_DISABLE.
Clearer name.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Fri, 17 Nov 2017 06:37:02 +0000 (22:37 -0800)]
i965: Move some genX infrastructure to genX_boilerplate.h.
This will let us make multiple genX_*.c files, without copy and pasting
all this boilerplate.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Brian Paul [Fri, 8 Mar 2019 22:50:58 +0000 (15:50 -0700)]
gallium/winsys/kms: fix incomplete type compilation failure
Fixes:
../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c: In function ‘kms_sw_displaytarget_from_handle’:
../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c:402:60: error: dereferencing pointer to incomplete type ‘const struct pipe_resource’
templ->format,
^
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Brian Paul [Fri, 8 Mar 2019 22:49:49 +0000 (15:49 -0700)]
drisw: fix incomplete type compilation failure
Fixes:
../src/gallium/winsys/sw/dri/dri_sw_winsys.c: In function ‘dri_sw_displaytarget_display’:
../src/gallium/winsys/sw/dri/dri_sw_winsys.c:255:39: error: dereferencing pointer to incomplete type ‘struct pipe_box’
offset = dri_sw_dt->stride * box->y;
^
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Brian Paul [Fri, 8 Mar 2019 03:39:49 +0000 (20:39 -0700)]
docs: try to improve the Meson documentation (v2)
Add new Introduction and Advanced Usage sections.
Spell out a few more details, like "ninja install".
Improve the layout around example commands.
Fix grammatical errors and tighten up the text.
Explain the --prefix option.
v2: Remove language about 'ninja clean' and move link to Meson
information about separate build directories earlier in the page.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Brian Paul [Wed, 6 Mar 2019 23:20:55 +0000 (16:20 -0700)]
st/mesa: minor refactoring of texture/sampler delete code
Rename st_texture_free_sampler_views() to
st_delete_texture_sampler_views() to align with
st_DeleteTextureObject(), its only caller.
Move the call to st_texture_release_all_sampler_views() from
st_DeleteTextureObject() to st_delete_texture_sampler_views()
so all the sampler view clean-up code is in one place.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Wed, 6 Mar 2019 23:15:19 +0000 (16:15 -0700)]
st/mesa: rename st_texture_release_sampler_view()
To st_texture_release_context_sampler_view() to be more clear
that it's context-specific.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Wed, 6 Mar 2019 23:09:09 +0000 (16:09 -0700)]
st/mesa: add/improve sampler view comments
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Thu, 7 Mar 2019 16:55:09 +0000 (09:55 -0700)]
st/mesa: move around some code in st_context.c
st_init_driver_functions() is only called in st_context.c so there's
no need for the prototype in st_context.h
To avoid a forward declaration of st_init_driver_functions() in
st_context.c, we need to move around several other functions.
No functional change.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Thu, 7 Mar 2019 16:21:53 +0000 (09:21 -0700)]
st/mesa: move utility functions, macros into new st_util.h file
To de-clutter st_context.h.
Clean up remaining function prototypes in st_context.h.
The st_vp_uses_current_values() helper is only used in st_context.c
so move it there.
The st_get_active_states() function is only used in st_context.c so
remove its prototype in st_context.h
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Juan A. Suarez Romero [Mon, 11 Mar 2019 17:33:54 +0000 (18:33 +0100)]
anv: destroy descriptor sets when pool gets reset
As stated in Vulkan spec:
"Resetting a descriptor pool recycles all of the resources from all
of the descriptor sets allocated from the descriptor pool back to
the descriptor pool, and the descriptor sets are implicitly freed."
This fixes dEQP-VK.api.descriptor_pool.*
Fixes: 14f6275c92f1 "anv/descriptor_set: add reference counting for..."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Timothy Arceri [Thu, 6 Dec 2018 05:00:40 +0000 (16:00 +1100)]
nir: find induction/limit vars in iand instructions
This will be used to help find the trip count of loops that look
like the following:
while (a < x && i < 8) {
...
i++;
}
Where the NIR will end up looking something like this:
vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
loop {
...
vec1 1 ssa_12 = ilt ssa_225, ssa_11
vec1 1 ssa_17 = ilt ssa_226, ssa_1
vec1 1 ssa_18 = iand ssa_12, ssa_17
vec1 1 ssa_19 = inot ssa_18
if ssa_19 {
...
break
} else {
...
}
}
On RADV this unrolls a bunch of loops in F1-2017 shaders.
Totals from affected shaders:
SGPRS: 4112 -> 4136 (0.58 %)
VGPRS: 4132 -> 4052 (-1.94 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 515444 -> 587720 (14.02 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 194 -> 196 (1.03 %)
Wait states: 0 -> 0 (0.00 %)
It also unrolls a couple of loops in shader-db on radeonsi.
Totals from affected shaders:
SGPRS: 128 -> 128 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 6880 -> 9504 (38.14 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 16 -> 16 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Thu, 6 Dec 2018 04:56:55 +0000 (15:56 +1100)]
nir: pass nir_op to calculate_iterations()
Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Thu, 6 Dec 2018 02:29:05 +0000 (13:29 +1100)]
nir: add get_induction_and_limit_vars() helper to loop analysis
This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Thu, 6 Dec 2018 00:17:45 +0000 (11:17 +1100)]
nir: add helper to return inversion op of a comparison
This will be used to help find the trip count of loops that look
like the following:
while (a < x && i < 8) {
...
i++;
}
Where the NIR will end up looking something like this:
vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
loop {
...
vec1 1 ssa_12 = ilt ssa_225, ssa_11
vec1 1 ssa_17 = ilt ssa_226, ssa_1
vec1 1 ssa_18 = iand ssa_12, ssa_17
vec1 1 ssa_19 = inot ssa_18
if ssa_19 {
...
break
} else {
...
}
}
So in order to find the trip count we need to find the inverse of
ilt.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Thu, 6 Dec 2018 00:12:12 +0000 (11:12 +1100)]
nir: simplify the loop analysis trip count code a little
Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.
The new helper will also be used in a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Tue, 20 Nov 2018 04:23:45 +0000 (15:23 +1100)]
nir: unroll some loops with a variable limit
For some loops can have a single terminator but the exact trip
count is still unknown. For example:
for (int i = 0; i < imin(x, 4); i++)
...
Shader-db results radeonsi (all affected are from Tropico 5):
Totals from affected shaders:
SGPRS: 144 -> 152 (5.56 %)
VGPRS: 124 -> 108 (-12.90 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5180 -> 6640 (28.19 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 17 -> 21 (23.53 %)
Wait states: 0 -> 0 (0.00 %)
Shader-db results i965 (SKL):
total loops in shared programs: 3808 -> 3802 (-0.16%)
loops in affected programs: 6 -> 0
helped: 6
HURT: 0
vkpipeline-db results RADV (Unrolls some Skyrim VR shaders):
Totals from affected shaders:
SGPRS: 304 -> 304 (0.00 %)
VGPRS: 296 -> 292 (-1.35 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15756 -> 25884 (64.28 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 29 -> 29 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
v2: fix bug where last iteration would get optimised away by
mistake.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Tue, 20 Nov 2018 02:45:58 +0000 (13:45 +1100)]
nir: calculate trip count for more loops
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).
For example:
for (int i = 0; i < imin(x, 4); i++)
...
We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Tue, 20 Nov 2018 03:05:09 +0000 (14:05 +1100)]
nir: add partial loop unrolling support
This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.
The code is written so that we could use partial unrolling
more generally, but for now it's only use when we have guessed
the trip count.
We use partial unrolling for this guessed trip count because its
possible any out of bounds array access doesn't otherwise affect
the shader e.g the stores/loads to/from the array are unused. So
we insert a copy of the loop in the innermost continue branch of
the unrolled loop. Later on its possible for nir_opt_dead_cf()
to then remove the loop in some cases.
A Renderdoc capture from the Rise of the Tomb Raider benchmark,
reports the following change in an affected compute shader:
GPU duration: 350 -> 325 microseconds
shader-db results radeonsi VEGA (NIR backend):
SGPRS: 1008 -> 816 (-19.05 %)
VGPRS: 684 -> 432 (-36.84 %)
Spilled SGPRs: 539 -> 0 (-100.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 39708 -> 45812 (15.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 105 -> 144 (37.14 %)
Wait states: 0 -> 0 (0.00 %)
shader-db results i965 SKL:
total instructions in shared programs:
13098265 ->
13103359 (0.04%)
instructions in affected programs: 5126 -> 10220 (99.38%)
helped: 0
HURT: 21
total cycles in shared programs:
332039949 ->
331985622 (-0.02%)
cycles in affected programs: 289252 -> 234925 (-18.78%)
helped: 12
HURT: 9
vkpipeline-db results VEGA:
Totals from affected shaders:
SGPRS: 184 -> 184 (0.00 %)
VGPRS: 448 -> 448 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 26076 -> 24428 (-6.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 5 -> 5 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Mon, 19 Nov 2018 06:01:52 +0000 (17:01 +1100)]
nir: add new partially_unrolled bool to nir_loop
In order to stop continuously partially unrolling the same loop
we add the bool partially_unrolled to nir_loop, we add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also nir_loop_info is never cloned.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Thu, 15 Nov 2018 12:23:09 +0000 (23:23 +1100)]
nir: add guess trip count support to loop analysis
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.
v2: check if the induction var is used to index more than a single
array and if so get the size of the smallest array.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tomeu Vizoso [Fri, 8 Mar 2019 14:24:57 +0000 (15:24 +0100)]
panfrost: Add support for PAN_MESA_DEBUG
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tomeu Vizoso [Fri, 8 Mar 2019 14:04:50 +0000 (15:04 +0100)]
panfrost/midgard: Add support for MIDGARD_MESA_DEBUG
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Xavier Bouchoux [Mon, 15 Oct 2018 14:24:29 +0000 (16:24 +0200)]
nir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth'
'dxc' hlsl-to-spirv compiler appears to emit 2 (Unknown) in the depth field,
when the image is not sampled and the value is not needed.
Previously, shaders failed with:
SPIR-V parsing FAILED:
In file ../src/compiler/spirv/spirv_to_nir.c:1412
!is_shadow
632 bytes into the SPIR-V binary
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Sat, 9 Mar 2019 09:02:06 +0000 (01:02 -0800)]
iris: Fix write enable in pinning of depth/stencil resources
We may bind new Z/S buffers (which come via the framebuffer CSO,
triggering IRIS_DIRTY_DEPTH_BUFFER), but with writes disabled.
The next draw may enable Z or S writes (which come via the ZSA CSO,
triggering IRIS_DIRTY_WM_DEPTH_STENCIL), which requires us to update
our pin to have the write flag.
So, update pinning if either dirty flag changes. To clarify, pass
cso_zsa to the pinning function rather than pulling the random values
out of ice->state, which unfortunately have to exist for the resolve
code since iris_depth_stencil_alpha_state only exists in iris_state.c.
Kenneth Graunke [Sat, 9 Mar 2019 08:50:24 +0000 (00:50 -0800)]
iris: Refactor depth/stencil buffer pinning into a helper.
This avoids the code duplication that caused me to put things in the
wrong place in the previous commit. One used to have extra flushes,
but we moved those out so now these are identical and can be easily
shared.
Kenneth Graunke [Sat, 9 Mar 2019 08:42:54 +0000 (00:42 -0800)]
iris: Move depth/stencil flushes so they actually do something
Commit
d6dd57d43cd (iris: Add missing depth cache flushes) added the
depth/stencil flushes to the wrong place. I meant to add them to the
iris_upload_dirty_render_state code that emits the packets, but I
accidentally added them to the nearly identical looking code in
iris_restore_render_saved_bos. This meant we missed the actual flushing
at draw time, but instead did pointless flushing on the first draw in a
batch where things are already flushed anyway.
This commit moves them to iris_resolve.c, next to the depth prepares,
similar to what we do for color buffers. i965 does them elsewhere, but
I'm not sure why - this seems like the most consistent place.
Christian Gmeiner [Tue, 26 Feb 2019 17:41:07 +0000 (18:41 +0100)]
st/dri: allow direct UYVY import
Push this format to the pipe driver unchanged.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Fri, 8 Mar 2019 04:14:59 +0000 (20:14 -0800)]
iris: Fix TES gl_PatchVerticesIn handling.
1. If we switch the TCS for one with a different number of output
vertices, then the TES's gl_PatchVerticesIn value will change.
We need to re-upload in this case. For now, re-emit constants
whenever the TCS/TES are swapped out.
2. If there is no TCS, then we can't grab gl_PatchVerticesIn from
the TCS info. Since it's a passthrough, we can just use the
primitive's patch count (like the TCS gl_PatchVerticesIn does).
Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Kenneth Graunke [Thu, 7 Mar 2019 04:56:37 +0000 (20:56 -0800)]
iris: Rework default tessellation level uploads
Now that we've added a system value uploading mechanism, we may as well
reuse the same system for default tessellation levels. This simplifies
the state upload code a bit.
Also fixes:
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Timur Kristóf [Wed, 13 Feb 2019 22:28:20 +0000 (00:28 +0200)]
iris: Face should be a system value.
This patch adds PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL which
despite its name is not a TGSI-specific capability, just lets
the state tracker know that it should generate a system value
for FACE.
This is needed if we want to run tgsi_to_nir on iris.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Thu, 28 Feb 2019 20:02:58 +0000 (12:02 -0800)]
vc4: Switch the post-RA scheduler over to the DAG datastructure.
Just a small code reduction from shared infrastructure.
Eric Anholt [Thu, 28 Feb 2019 18:42:05 +0000 (10:42 -0800)]
v3d: Use the DAG datastructure for QPU instruction scheduling.
Just a small code reduction from shared infrastructure.
Eric Anholt [Thu, 28 Feb 2019 19:02:25 +0000 (11:02 -0800)]
vc4: Reuse list_for_each_entry_rev().
Eric Anholt [Thu, 28 Feb 2019 19:01:57 +0000 (11:01 -0800)]
v3d: Reuse list_for_each_entry_rev().
Eric Anholt [Thu, 28 Feb 2019 18:06:27 +0000 (10:06 -0800)]
vc4: Switch over to using the DAG datastructure for QIR scheduling.
Just a small code reduction from shared infrastructure.
Eric Anholt [Wed, 27 Feb 2019 19:12:59 +0000 (11:12 -0800)]
util: Add a DAG datastructure.
I keep writing this for various schedulers.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Kristian H. Kristensen [Fri, 1 Mar 2019 22:33:36 +0000 (14:33 -0800)]
freedreno/a6xx: Remove extra parens
There's a warning about this now.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Kristian H. Kristensen [Fri, 1 Mar 2019 22:25:57 +0000 (14:25 -0800)]
freedreno: Use c_vis_args and no_override_init_args
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Chia-I Wu [Fri, 8 Feb 2019 21:45:53 +0000 (13:45 -0800)]
turnip: preliminary support for Wayland WSI
Chia-I Wu [Mon, 11 Feb 2019 19:12:32 +0000 (11:12 -0800)]
turnip: preliminary support for tu_GetImageSubresourceLayout
Chad Versace [Sat, 2 Feb 2019 01:08:51 +0000 (17:08 -0800)]
turnip: Use Vulkan 1.1 names instead of KHR
That is, drop KHR from all tokens that were promoted to Vulkan 1.1.
The consistency makes ctags more useful (it now jumps directly to the
real definitions in vulkan_core.h instead of the typedefs); and it makes
the code slightly less verbose.
Chia-I Wu [Fri, 8 Mar 2019 19:27:50 +0000 (11:27 -0800)]
turnip: guard -Dvulkan-driver=freedreno
Require -DI-love-half-baked-turnips=true as well to enable freedreno
vulkan driver.
Chia-I Wu [Fri, 22 Feb 2019 16:50:58 +0000 (08:50 -0800)]
turnip: preliminary support for tu_CmdDraw
Chia-I Wu [Fri, 22 Feb 2019 06:37:34 +0000 (22:37 -0800)]
turnip: preliminary support for draw state binding
This adds support for tu_CmdBindPipeline, tu_CmdBindVertexBuffers,
etc.
Chia-I Wu [Wed, 20 Feb 2019 22:26:06 +0000 (14:26 -0800)]
turnip: add draw_cs to tu_cmd_buffer
It will hold draw commands.
Chia-I Wu [Fri, 22 Feb 2019 06:31:36 +0000 (22:31 -0800)]
turnip: parse VkPipelineVertexInputStateCreateInfo
Chia-I Wu [Wed, 27 Feb 2019 06:10:34 +0000 (22:10 -0800)]
turnip: parse VkPipelineShaderStageCreateInfo
Chia-I Wu [Wed, 27 Feb 2019 06:09:37 +0000 (22:09 -0800)]
turnip: compile VkPipelineShaderStageCreateInfo
Compile all shaders and upload the binaries to a BO.
Chia-I Wu [Wed, 20 Feb 2019 17:53:47 +0000 (09:53 -0800)]
turnip: preliminary support for shader modules
Save SPIR-V in tu_shader_module. Tranlation to NIR happens in
tu_shader_create, and compilation to binary code happens in
tu_shader_compile. Both will be called during pipeline creation.
Chia-I Wu [Thu, 21 Feb 2019 22:58:52 +0000 (14:58 -0800)]
turnip: parse VkPipeline{Multisample,ColorBlend}StateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 19:46:59 +0000 (11:46 -0800)]
turnip: parse VkPipelineDepthStencilStateCreateInfo
Chia-I Wu [Wed, 27 Feb 2019 07:29:51 +0000 (23:29 -0800)]
turnip: parse VkPipelineRasterizationStateCreateInfo
Chia-I Wu [Tue, 19 Feb 2019 21:49:01 +0000 (13:49 -0800)]
turnip: parse VkPipelineViewportStateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 19:07:38 +0000 (11:07 -0800)]
turnip: parse VkPipelineInputAssemblyStateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 17:41:49 +0000 (09:41 -0800)]
turnip: parse VkPipelineDynamicStateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 17:22:17 +0000 (09:22 -0800)]
turnip: create a less dummy pipeline
Still dummy, but at least it is created from tu_pipeline_builder.
Chia-I Wu [Mon, 25 Feb 2019 22:38:34 +0000 (14:38 -0800)]
turnip: simplify tu_cs sub-streams usage
Let tu_cs_begin_sub_stream imply tu_cs_reserve_space, and
tu_cs_end_sub_stream imply tu_cs_sanity_check. Callers are no
longer required to call them (but can still do if they choose to).
Chia-I Wu [Mon, 25 Feb 2019 22:37:55 +0000 (14:37 -0800)]
turnip: fix tu_cs sub-streams
Update cs->start in tu_cs_end_sub_stream. Otherwise, the entry
would include commands from all prior sub-streams.
Chia-I Wu [Mon, 25 Feb 2019 22:57:03 +0000 (14:57 -0800)]
turnip: tu_cs_emit_array
Array version of tu_cs_emit. Useful for updating multiple
consecutive array-like registers, or loading a shader binary with
SS6_DIRECT.
Chia-I Wu [Mon, 25 Feb 2019 22:49:34 +0000 (14:49 -0800)]
turnip: add tu_cs_discard_entries
We will start a draw IB at the beginning of a subpass and consume it
at the end of the subpass. With tu_cs_discard_entries, we can reuse
the same tu_cs for all subpasses.
Chia-I Wu [Mon, 25 Feb 2019 22:55:06 +0000 (14:55 -0800)]
turnip: more/better asserts for tu_cs
Asserting (cur < end) in tu_cs_emit catches much less programming
errors comparing to asserting (cur < reserved_end). We should never
write more commands than what we have reserved.
Assert IB is non-empty and sane in tu_cs_emit_ib.
Chia-I Wu [Mon, 25 Feb 2019 22:44:52 +0000 (14:44 -0800)]
turnip: use 32-bit offset in tu_cs_entry
We don't support nor expect BOs to be that big in tu_cs.
Chia-I Wu [Mon, 25 Feb 2019 22:32:36 +0000 (14:32 -0800)]
turnip: mark IBs for dumping
Includes IBs in kernel cmdbuf dumps.
Eric Engestrom [Wed, 27 Feb 2019 12:31:06 +0000 (12:31 +0000)]
turnip: use the platform defines in vk.xml instead of hard-coding them
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Bas Nieuwenhuizen [Thu, 21 Feb 2019 21:39:22 +0000 (22:39 +0100)]
turnip: Add todo for copies.
Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:43:24 +0000 (16:43 +0100)]
turnip: Add buffer->image DMA copies.
Passes dEQP-VK.api.copy_and_blit.core.buffer_to_image.*
Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:13:23 +0000 (16:13 +0100)]
turnip: Add image->buffer DMA copies.
Passes dEQP-VK.api.copy_and_blit.core.image_to_buffer.*
Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:09:27 +0000 (16:09 +0100)]
turnip: Implement buffer->buffer DMA copies.
Passes dEQP-VK.api.copy_and_blit.core.buffer_to_buffer.*
Bas Nieuwenhuizen [Mon, 4 Feb 2019 13:52:34 +0000 (14:52 +0100)]
turnip: Add tu6_rb_fmt_to_ifmt.
Bas Nieuwenhuizen [Mon, 18 Feb 2019 13:49:52 +0000 (14:49 +0100)]
turnip: Make tu6_emit_event_write shared.
Bas Nieuwenhuizen [Tue, 15 Jan 2019 21:54:15 +0000 (22:54 +0100)]
turnip: Add buffer memory binding.
Chia-I Wu [Thu, 14 Feb 2019 18:53:20 +0000 (10:53 -0800)]
turnip: respect color attachment formats
Make tu6_get_native_format available to tu_cmd_buffer and start
using of it.
Chia-I Wu [Thu, 14 Feb 2019 22:36:52 +0000 (14:36 -0800)]
turnip: preliminary support for fences
This should be quite complete feature-wise. External fences are
still missing. We probably also want to add a simpler path to
tu_WaitForFences for when fenceCount == 1.
Chia-I Wu [Wed, 13 Feb 2019 18:23:32 +0000 (10:23 -0800)]
turnip: fix VkClearValue packing
Add tu_pack_clear_value to correctly pack VkClearValue according to
VkFormat. It ignores the component order defined by VkFormat, and
always packs to WZYX order.
Chia-I Wu [Fri, 1 Feb 2019 18:36:19 +0000 (10:36 -0800)]
turnip: add support for VK_KHR_external_memory_{fd,dma_buf}
Chia-I Wu [Fri, 1 Feb 2019 18:27:28 +0000 (10:27 -0800)]
turnip: advertise VK_KHR_external_memory
AFAICT, it is supported. We don't need to handle any of the new
structs because our BOs can always be exported.