Alyssa Rosenzweig [Thu, 7 Mar 2019 04:19:21 +0000 (04:19 +0000)]
panfrost: Delay color buffer setup
In an effort to clean up framebuffer management code, we delay
colour buffer setup until the FRAGMENT job is actually emitted, allowing
the AFBC and linear codepaths to be unified.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
Alyssa Rosenzweig [Thu, 7 Mar 2019 03:52:20 +0000 (03:52 +0000)]
panfrost: Combine has_afbc/tiled in layout enum
AFBC, tiled, and linear BO layouts are mutually exclusive; they should
be coupled via a single enum rather than ad hoc checks of booleans.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
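A minimal sketch of the idea behind the layout enum commit above. The names bo_layout, BO_LAYOUT_* and bo_layout_from_flags are hypothetical and are not taken from the panfrost source; the point is only that one enum cannot express an invalid combination such as "AFBC and tiled", whereas two booleans can.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; not copied from the panfrost source. */
enum bo_layout {
    BO_LAYOUT_LINEAR,
    BO_LAYOUT_TILED,
    BO_LAYOUT_AFBC,
};

/* Translate the old pair of booleans into the single enum. The
 * combination "AFBC and tiled" was representable before but never valid. */
static enum bo_layout
bo_layout_from_flags(bool has_afbc, bool tiled)
{
    assert(!(has_afbc && tiled));

    if (has_afbc)
        return BO_LAYOUT_AFBC;
    return tiled ? BO_LAYOUT_TILED : BO_LAYOUT_LINEAR;
}
```

Callers can then switch on the enum, and the compiler can warn about unhandled layouts.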
Alyssa Rosenzweig [Thu, 7 Mar 2019 03:24:45 +0000 (03:24 +0000)]
panfrost: Cleanup needless if in create_bo
I'm not sure why we were checking for these additional criteria (likely
inherited from some other driver); remove the needless checks to clean up
the code and perhaps fix some bugs down the line.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
Kenneth Graunke [Fri, 17 Nov 2017 07:47:43 +0000 (23:47 -0800)]
i965: Reimplement all the PIPE_CONTROL rules.
This implements virtually all documented PIPE_CONTROL restrictions
in a centralized helper. You now simply ask for the operations you
want, and the pipe control "brain" will figure out exactly what pipe
controls to emit to make that happen without tanking your system.
The hope is that this will fix some intermittent flushing issues as
well as GPU hangs. However, it also has a high risk of causing GPU
hangs and other regressions, as this is a particularly sensitive
area and poking the bear isn't always advisable.
Mark Janes noted that this patch helps with some GPU hangs on Icelake.
This does re-enable the VF Invalidate => Write Immediate workaround
on Gen8, which had been disabled (bug 103787) due to GPU hangs. The
old code did this workaround after another which would have added CS
stall bits, so it missed a workaround. The new code orders them
properly and appears to work.
v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by
Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for
production hardware.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Thu, 1 Nov 2018 22:55:51 +0000 (15:55 -0700)]
i965: Use genxml for emitting PIPE_CONTROL.
While this does add a bunch of boilerplate, it also protects us against
the hardware moving bits, or changing their meaning. For something as
finicky as PIPE_CONTROL, the extra safety seems worth it.
We turn PIPE_CONTROL_* into a bitfield of arbitrary flags, and then
pack them appropriately.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
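A rough sketch of the "arbitrary flags, then pack" approach described in the genxml commit above. Everything here is an assumption made for illustration: pc_flags, pc_packet and pc_pack are hypothetical names, and the struct only stands in for a generated per-generation command struct. The flag bit positions are deliberately unrelated to any hardware layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, generation-independent flags with arbitrary bit values. */
enum pc_flags {
    PC_CS_STALL          = 1u << 0,
    PC_DEPTH_CACHE_FLUSH = 1u << 1,
    PC_WRITE_IMMEDIATE   = 1u << 2,
    PC_ALL_FLAGS         = (1u << 3) - 1,
};

/* Stand-in for a generated, unpacked command struct. */
struct pc_packet {
    bool cs_stall;
    bool depth_cache_flush;
    bool write_immediate;
    uint64_t immediate_data;
};

/* Map each abstract flag onto the named field it controls. */
static struct pc_packet
pc_pack(uint32_t flags, uint64_t imm)
{
    assert((flags & ~(uint32_t)PC_ALL_FLAGS) == 0);

    struct pc_packet p = {
        .cs_stall          = (flags & PC_CS_STALL) != 0,
        .depth_cache_flush = (flags & PC_DEPTH_CACHE_FLUSH) != 0,
        .write_immediate   = (flags & PC_WRITE_IMMEDIATE) != 0,
    };
    /* The immediate value is only meaningful when a write is requested. */
    p.immediate_data = p.write_immediate ? imm : 0;
    return p;
}
```

Because callers only ever see the abstract flags, a generation that moves or redefines a hardware bit would only need a different packing step.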
Kenneth Graunke [Thu, 1 Nov 2018 22:55:21 +0000 (15:55 -0700)]
i965: Rename ISP_DIS to INDIRECT_STATE_POINTERS_DISABLE.
Clearer name.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Fri, 17 Nov 2017 06:37:02 +0000 (22:37 -0800)]
i965: Move some genX infrastructure to genX_boilerplate.h.
This will let us make multiple genX_*.c files, without copy and pasting
all this boilerplate.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Brian Paul [Fri, 8 Mar 2019 22:50:58 +0000 (15:50 -0700)]
gallium/winsys/kms: fix incomplete type compilation failure
Fixes:
../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c: In function ‘kms_sw_displaytarget_from_handle’:
../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c:402:60: error: dereferencing pointer to incomplete type ‘const struct pipe_resource’
templ->format,
^
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Brian Paul [Fri, 8 Mar 2019 22:49:49 +0000 (15:49 -0700)]
drisw: fix incomplete type compilation failure
Fixes:
../src/gallium/winsys/sw/dri/dri_sw_winsys.c: In function ‘dri_sw_displaytarget_display’:
../src/gallium/winsys/sw/dri/dri_sw_winsys.c:255:39: error: dereferencing pointer to incomplete type ‘struct pipe_box’
offset = dri_sw_dt->stride * box->y;
^
Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Brian Paul [Fri, 8 Mar 2019 03:39:49 +0000 (20:39 -0700)]
docs: try to improve the Meson documentation (v2)
Add new Introduction and Advanced Usage sections.
Spell out a few more details, like "ninja install".
Improve the layout around example commands.
Fix grammatical errors and tighten up the text.
Explain the --prefix option.
v2: Remove language about 'ninja clean' and move link to Meson
information about separate build directories earlier in the page.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Brian Paul [Wed, 6 Mar 2019 23:20:55 +0000 (16:20 -0700)]
st/mesa: minor refactoring of texture/sampler delete code
Rename st_texture_free_sampler_views() to
st_delete_texture_sampler_views() to align with
st_DeleteTextureObject(), its only caller.
Move the call to st_texture_release_all_sampler_views() from
st_DeleteTextureObject() to st_delete_texture_sampler_views()
so all the sampler view clean-up code is in one place.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Wed, 6 Mar 2019 23:15:19 +0000 (16:15 -0700)]
st/mesa: rename st_texture_release_sampler_view()
To st_texture_release_context_sampler_view() to be more clear
that it's context-specific.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Wed, 6 Mar 2019 23:09:09 +0000 (16:09 -0700)]
st/mesa: add/improve sampler view comments
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Thu, 7 Mar 2019 16:55:09 +0000 (09:55 -0700)]
st/mesa: move around some code in st_context.c
st_init_driver_functions() is only called in st_context.c so there's
no need for the prototype in st_context.h
To avoid a forward declaration of st_init_driver_functions() in
st_context.c, we need to move around several other functions.
No functional change.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Thu, 7 Mar 2019 16:21:53 +0000 (09:21 -0700)]
st/mesa: move utility functions, macros into new st_util.h file
To de-clutter st_context.h.
Clean up remaining function prototypes in st_context.h.
The st_vp_uses_current_values() helper is only used in st_context.c
so move it there.
The st_get_active_states() function is only used in st_context.c so
remove its prototype in st_context.h
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Juan A. Suarez Romero [Mon, 11 Mar 2019 17:33:54 +0000 (18:33 +0100)]
anv: destroy descriptor sets when pool gets reset
As stated in Vulkan spec:
"Resetting a descriptor pool recycles all of the resources from all
of the descriptor sets allocated from the descriptor pool back to
the descriptor pool, and the descriptor sets are implicitly freed."
This fixes dEQP-VK.api.descriptor_pool.*
Fixes: 14f6275c92f1 "anv/descriptor_set: add reference counting for..."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
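The spec behavior quoted in the commit above can be sketched as follows. This is a simplified model, not the anv implementation: desc_pool, desc_set, pool_alloc_set and pool_reset are hypothetical names, and plain heap allocation stands in for whatever the driver actually tracks per set.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types; the pool keeps a list of every set it handed out. */
struct desc_set {
    struct desc_set *next;
};

struct desc_pool {
    struct desc_set *sets;
    unsigned live_sets;
};

static struct desc_set *
pool_alloc_set(struct desc_pool *pool)
{
    assert(pool != NULL);

    struct desc_set *set = calloc(1, sizeof(*set));
    if (!set)
        return NULL;

    set->next = pool->sets;
    pool->sets = set;
    pool->live_sets++;
    return set;
}

/* Resetting the pool implicitly frees every set allocated from it. */
static void
pool_reset(struct desc_pool *pool)
{
    assert(pool != NULL);

    struct desc_set *set = pool->sets;
    while (set) {
        struct desc_set *next = set->next;
        free(set);
        set = next;
    }
    pool->sets = NULL;
    pool->live_sets = 0;
}
```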
Timothy Arceri [Thu, 6 Dec 2018 05:00:40 +0000 (16:00 +1100)]
nir: find induction/limit vars in iand instructions
This will be used to help find the trip count of loops that look
like the following:
while (a < x && i < 8) {
...
i++;
}
Where the NIR will end up looking something like this:
vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
loop {
...
vec1 1 ssa_12 = ilt ssa_225, ssa_11
vec1 1 ssa_17 = ilt ssa_226, ssa_1
vec1 1 ssa_18 = iand ssa_12, ssa_17
vec1 1 ssa_19 = inot ssa_18
if ssa_19 {
...
break
} else {
...
}
}
On RADV this unrolls a bunch of loops in F1-2017 shaders.
Totals from affected shaders:
SGPRS: 4112 -> 4136 (0.58 %)
VGPRS: 4132 -> 4052 (-1.94 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 515444 -> 587720 (14.02 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 194 -> 196 (1.03 %)
Wait states: 0 -> 0 (0.00 %)
It also unrolls a couple of loops in shader-db on radeonsi.
Totals from affected shaders:
SGPRS: 128 -> 128 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 6880 -> 9504 (38.14 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 16 -> 16 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Thu, 6 Dec 2018 04:56:55 +0000 (15:56 +1100)]
nir: pass nir_op to calculate_iterations()
Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Thu, 6 Dec 2018 02:29:05 +0000 (13:29 +1100)]
nir: add get_induction_and_limit_vars() helper to loop analysis
This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Thu, 6 Dec 2018 00:17:45 +0000 (11:17 +1100)]
nir: add helper to return inversion op of a comparison
This will be used to help find the trip count of loops that look
like the following:
while (a < x && i < 8) {
...
i++;
}
Where the NIR will end up looking something like this:
vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
loop {
...
vec1 1 ssa_12 = ilt ssa_225, ssa_11
vec1 1 ssa_17 = ilt ssa_226, ssa_1
vec1 1 ssa_18 = iand ssa_12, ssa_17
vec1 1 ssa_19 = inot ssa_18
if ssa_19 {
...
break
} else {
...
}
}
So in order to find the trip count we need to find the inverse of
ilt.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
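To illustrate why the inverse is needed: the loop above breaks on inot(ilt && ilt), which by De Morgan's law equals (ige || ige), so the comparison that bounds the trip count is the inverse of the one written in the source. A small integer-only sketch, where cmp_op, invert_cmp and eval_cmp are hypothetical names rather than the NIR helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of integer comparison ops. */
enum cmp_op { CMP_ILT, CMP_IGE, CMP_IEQ, CMP_INE };

/* Return the op that is true exactly when 'op' is false. */
static enum cmp_op
invert_cmp(enum cmp_op op)
{
    switch (op) {
    case CMP_ILT: return CMP_IGE;
    case CMP_IGE: return CMP_ILT;
    case CMP_IEQ: return CMP_INE;
    case CMP_INE: return CMP_IEQ;
    }
    assert(!"unknown comparison op");
    return op;
}

static bool
eval_cmp(enum cmp_op op, int a, int b)
{
    switch (op) {
    case CMP_ILT: return a < b;
    case CMP_IGE: return a >= b;
    case CMP_IEQ: return a == b;
    case CMP_INE: return a != b;
    }
    assert(!"unknown comparison op");
    return false;
}
```

This sketch covers integers only; floating-point comparisons do not invert this simply once NaN is involved.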
Timothy Arceri [Thu, 6 Dec 2018 00:12:12 +0000 (11:12 +1100)]
nir: simplify the loop analysis trip count code a little
Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.
The new helper will also be used in a following patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Tue, 20 Nov 2018 04:23:45 +0000 (15:23 +1100)]
nir: unroll some loops with a variable limit
Some loops can have a single terminator but the exact trip
count is still unknown. For example:
for (int i = 0; i < imin(x, 4); i++)
...
Shader-db results radeonsi (all affected are from Tropico 5):
Totals from affected shaders:
SGPRS: 144 -> 152 (5.56 %)
VGPRS: 124 -> 108 (-12.90 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5180 -> 6640 (28.19 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 17 -> 21 (23.53 %)
Wait states: 0 -> 0 (0.00 %)
Shader-db results i965 (SKL):
total loops in shared programs: 3808 -> 3802 (-0.16%)
loops in affected programs: 6 -> 0
helped: 6
HURT: 0
vkpipeline-db results RADV (Unrolls some Skyrim VR shaders):
Totals from affected shaders:
SGPRS: 304 -> 304 (0.00 %)
VGPRS: 296 -> 292 (-1.35 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15756 -> 25884 (64.28 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 29 -> 29 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
v2: fix bug where last iteration would get optimised away by
mistake.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
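A hand-written illustration of what unrolling with a variable limit means for the imin(x, 4) example above. This is an assumption about the general shape of the result, not output from the NIR pass: the loop body is copied up to the constant bound, and each copy keeps its exit test against x. The function names are made up for this sketch.

```c
#include <assert.h>

static int
imin(int a, int b)
{
    return a < b ? a : b;
}

/* Rolled form: the trip count depends on x but can never exceed 4. */
static int
sum_rolled(const int v[4], int x)
{
    int sum = 0;
    for (int i = 0; i < imin(x, 4); i++)
        sum += v[i];
    return sum;
}

/* Unrolled four times; every copy keeps the test on the variable limit,
 * including the one guarding the last iteration. */
static int
sum_unrolled(const int v[4], int x)
{
    int sum = 0;
    if (0 >= x) return sum;
    sum += v[0];
    if (1 >= x) return sum;
    sum += v[1];
    if (2 >= x) return sum;
    sum += v[2];
    if (3 >= x) return sum;
    sum += v[3];
    return sum;
}

/* Both forms must agree for limits below, inside and above the bound. */
static void
check_equivalent(const int v[4])
{
    for (int x = -1; x <= 6; x++)
        assert(sum_rolled(v, x) == sum_unrolled(v, x));
}
```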
Timothy Arceri [Tue, 20 Nov 2018 02:45:58 +0000 (13:45 +1100)]
nir: calculate trip count for more loops
This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).
For example:
for (int i = 0; i < imin(x, 4); i++)
...
We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Tue, 20 Nov 2018 03:05:09 +0000 (14:05 +1100)]
nir: add partial loop unrolling support
This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.
The code is written so that we could use partial unrolling
more generally, but for now it's only used when we have guessed
the trip count.
We use partial unrolling for this guessed trip count because it's
possible any out of bounds array access doesn't otherwise affect
the shader, e.g. the stores/loads to/from the array are unused. So
we insert a copy of the loop in the innermost continue branch of
the unrolled loop. Later on it's possible for nir_opt_dead_cf()
to then remove the loop in some cases.
A Renderdoc capture from the Rise of the Tomb Raider benchmark,
reports the following change in an affected compute shader:
GPU duration: 350 -> 325 microseconds
shader-db results radeonsi VEGA (NIR backend):
SGPRS: 1008 -> 816 (-19.05 %)
VGPRS: 684 -> 432 (-36.84 %)
Spilled SGPRs: 539 -> 0 (-100.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 39708 -> 45812 (15.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 105 -> 144 (37.14 %)
Wait states: 0 -> 0 (0.00 %)
shader-db results i965 SKL:
total instructions in shared programs: 13098265 -> 13103359 (0.04%)
instructions in affected programs: 5126 -> 10220 (99.38%)
helped: 0
HURT: 21
total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
cycles in affected programs: 289252 -> 234925 (-18.78%)
helped: 12
HURT: 9
vkpipeline-db results VEGA:
Totals from affected shaders:
SGPRS: 184 -> 184 (0.00 %)
VGPRS: 448 -> 448 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 26076 -> 24428 (-6.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 5 -> 5 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
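The structure described in the partial unrolling commit above can be pictured with a hand-written C analogue. This is only a sketch under the assumption of a guessed trip count of 2; the function names are hypothetical. The guessed iterations are peeled into nested branches, and a copy of the original loop sits in the innermost branch to handle a guess that was too small.

```c
#include <assert.h>

static int
sum_rolled(const int *v, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Partially unrolled with a guessed trip count of 2. */
static int
sum_partially_unrolled(const int *v, int n)
{
    int sum = 0;
    int i = 0;

    if (i < n) {
        sum += v[i++];
        if (i < n) {
            sum += v[i++];
            /* Innermost continue branch: a copy of the original loop
             * runs any iterations the guess did not cover. */
            for (; i < n; i++)
                sum += v[i];
        }
    }
    return sum;
}

/* The transformation must not change the result for any trip count. */
static void
check_equivalent(const int *v, int max_n)
{
    for (int n = 0; n <= max_n; n++)
        assert(sum_rolled(v, n) == sum_partially_unrolled(v, n));
}
```

If the guess turns out to be exact, the trailing loop never executes, which is the situation where dead control flow removal could drop it.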
Timothy Arceri [Mon, 19 Nov 2018 06:01:52 +0000 (17:01 +1100)]
nir: add new partially_unrolled bool to nir_loop
In order to stop continuously partially unrolling the same loop
we add the bool partially_unrolled to nir_loop. We add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also nir_loop_info is never cloned.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Timothy Arceri [Thu, 15 Nov 2018 12:23:09 +0000 (23:23 +1100)]
nir: add guess trip count support to loop analysis
This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.
v2: check if the induction var is used to index more than a single
array and if so get the size of the smallest array.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
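The v2 note above (use the smallest array when the induction variable indexes several) amounts to taking a minimum. A minimal sketch, with guess_trip_count as a hypothetical helper name and plain array lengths standing in for whatever the analysis actually inspects:

```c
#include <assert.h>
#include <stddef.h>

/* Given the lengths of every array indexed by the induction variable,
 * guess the trip count as the smallest length. Returns 0 when there is
 * nothing to base a guess on. */
static unsigned
guess_trip_count(const unsigned *array_lengths, size_t count)
{
    assert(array_lengths != NULL || count == 0);

    if (count == 0)
        return 0;

    unsigned guess = array_lengths[0];
    for (size_t i = 1; i < count; i++) {
        if (array_lengths[i] < guess)
            guess = array_lengths[i];
    }
    return guess;
}
```

The smallest array is the safe choice because any iteration beyond it would already index out of bounds.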
Tomeu Vizoso [Fri, 8 Mar 2019 14:24:57 +0000 (15:24 +0100)]
panfrost: Add support for PAN_MESA_DEBUG
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Tomeu Vizoso [Fri, 8 Mar 2019 14:04:50 +0000 (15:04 +0100)]
panfrost/midgard: Add support for MIDGARD_MESA_DEBUG
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Xavier Bouchoux [Mon, 15 Oct 2018 14:24:29 +0000 (16:24 +0200)]
nir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth'
'dxc' hlsl-to-spirv compiler appears to emit 2 (Unknown) in the depth field,
when the image is not sampled and the value is not needed.
Previously, shaders failed with:
SPIR-V parsing FAILED:
In file ../src/compiler/spirv/spirv_to_nir.c:1412
!is_shadow
632 bytes into the SPIR-V binary
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Sat, 9 Mar 2019 09:02:06 +0000 (01:02 -0800)]
iris: Fix write enable in pinning of depth/stencil resources
We may bind new Z/S buffers (which come via the framebuffer CSO,
triggering IRIS_DIRTY_DEPTH_BUFFER), but with writes disabled.
The next draw may enable Z or S writes (which come via the ZSA CSO,
triggering IRIS_DIRTY_WM_DEPTH_STENCIL), which requires us to update
our pin to have the write flag.
So, update pinning if either dirty flag changes. To clarify, pass
cso_zsa to the pinning function rather than pulling the random values
out of ice->state, which unfortunately have to exist for the resolve
code since iris_depth_stencil_alpha_state only exists in iris_state.c.
Kenneth Graunke [Sat, 9 Mar 2019 08:50:24 +0000 (00:50 -0800)]
iris: Refactor depth/stencil buffer pinning into a helper.
This avoids the code duplication that caused me to put things in the
wrong place in the previous commit. One used to have extra flushes,
but we moved those out so now these are identical and can be easily
shared.
Kenneth Graunke [Sat, 9 Mar 2019 08:42:54 +0000 (00:42 -0800)]
iris: Move depth/stencil flushes so they actually do something
Commit d6dd57d43cd (iris: Add missing depth cache flushes) added the
depth/stencil flushes to the wrong place. I meant to add them to the
iris_upload_dirty_render_state code that emits the packets, but I
accidentally added them to the nearly identical looking code in
iris_restore_render_saved_bos. This meant we missed the actual flushing
at draw time, but instead did pointless flushing on the first draw in a
batch where things are already flushed anyway.
This commit moves them to iris_resolve.c, next to the depth prepares,
similar to what we do for color buffers. i965 does them elsewhere, but
I'm not sure why - this seems like the most consistent place.
Christian Gmeiner [Tue, 26 Feb 2019 17:41:07 +0000 (18:41 +0100)]
st/dri: allow direct UYVY import
Push this format to the pipe driver unchanged.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Fri, 8 Mar 2019 04:14:59 +0000 (20:14 -0800)]
iris: Fix TES gl_PatchVerticesIn handling.
1. If we switch the TCS for one with a different number of output
vertices, then the TES's gl_PatchVerticesIn value will change.
We need to re-upload in this case. For now, re-emit constants
whenever the TCS/TES are swapped out.
2. If there is no TCS, then we can't grab gl_PatchVerticesIn from
the TCS info. Since it's a passthrough, we can just use the
primitive's patch count (like the TCS gl_PatchVerticesIn does).
Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Kenneth Graunke [Thu, 7 Mar 2019 04:56:37 +0000 (20:56 -0800)]
iris: Rework default tessellation level uploads
Now that we've added a system value uploading mechanism, we may as well
reuse the same system for default tessellation levels. This simplifies
the state upload code a bit.
Also fixes:
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Timur Kristóf [Wed, 13 Feb 2019 22:28:20 +0000 (00:28 +0200)]
iris: Face should be a system value.
This patch adds PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL which
despite its name is not a TGSI-specific capability, just lets
the state tracker know that it should generate a system value
for FACE.
This is needed if we want to run tgsi_to_nir on iris.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Thu, 28 Feb 2019 20:02:58 +0000 (12:02 -0800)]
vc4: Switch the post-RA scheduler over to the DAG datastructure.
Just a small code reduction from shared infrastructure.
Eric Anholt [Thu, 28 Feb 2019 18:42:05 +0000 (10:42 -0800)]
v3d: Use the DAG datastructure for QPU instruction scheduling.
Just a small code reduction from shared infrastructure.
Eric Anholt [Thu, 28 Feb 2019 19:02:25 +0000 (11:02 -0800)]
vc4: Reuse list_for_each_entry_rev().
Eric Anholt [Thu, 28 Feb 2019 19:01:57 +0000 (11:01 -0800)]
v3d: Reuse list_for_each_entry_rev().
Eric Anholt [Thu, 28 Feb 2019 18:06:27 +0000 (10:06 -0800)]
vc4: Switch over to using the DAG datastructure for QIR scheduling.
Just a small code reduction from shared infrastructure.
Eric Anholt [Wed, 27 Feb 2019 19:12:59 +0000 (11:12 -0800)]
util: Add a DAG datastructure.
I keep writing this for various schedulers.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
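For readers unfamiliar with why schedulers keep rewriting this, here is a toy dependency DAG with a trivial in-order scheduler. It is not the API added by the commit above: struct dag, dag_add_edge and dag_schedule are hypothetical, the storage is a fixed-size adjacency matrix, and the edge direction (parent must be scheduled before child) is a choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 8

/* Hypothetical toy DAG: edge[a][b] means b must come after a. */
struct dag {
    unsigned node_count;
    bool edge[MAX_NODES][MAX_NODES];
    unsigned parent_count[MAX_NODES];
};

static void
dag_add_edge(struct dag *dag, unsigned parent, unsigned child)
{
    assert(parent < dag->node_count && child < dag->node_count);

    if (dag->edge[parent][child])
        return;
    dag->edge[parent][child] = true;
    dag->parent_count[child]++;
}

/* Repeatedly pick a node with no unscheduled parents. Writes the order
 * into 'order' and returns how many nodes were scheduled; a result
 * smaller than node_count indicates a cycle. Consumes parent_count. */
static unsigned
dag_schedule(struct dag *dag, unsigned *order)
{
    bool done[MAX_NODES] = { false };
    unsigned scheduled = 0;
    bool progress = true;

    while (progress) {
        progress = false;
        for (unsigned n = 0; n < dag->node_count; n++) {
            if (done[n] || dag->parent_count[n] != 0)
                continue;
            done[n] = true;
            order[scheduled++] = n;
            for (unsigned c = 0; c < dag->node_count; c++) {
                if (dag->edge[n][c])
                    dag->parent_count[c]--;
            }
            progress = true;
        }
    }
    return scheduled;
}
```

A real instruction scheduler would pick among the ready nodes using a cost heuristic instead of taking the lowest index.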
Kristian H. Kristensen [Fri, 1 Mar 2019 22:33:36 +0000 (14:33 -0800)]
freedreno/a6xx: Remove extra parens
There's a warning about this now.
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Kristian H. Kristensen [Fri, 1 Mar 2019 22:25:57 +0000 (14:25 -0800)]
freedreno: Use c_vis_args and no_override_init_args
Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
Chia-I Wu [Fri, 8 Feb 2019 21:45:53 +0000 (13:45 -0800)]
turnip: preliminary support for Wayland WSI
Chia-I Wu [Mon, 11 Feb 2019 19:12:32 +0000 (11:12 -0800)]
turnip: preliminary support for tu_GetImageSubresourceLayout
Chad Versace [Sat, 2 Feb 2019 01:08:51 +0000 (17:08 -0800)]
turnip: Use Vulkan 1.1 names instead of KHR
That is, drop KHR from all tokens that were promoted to Vulkan 1.1.
The consistency makes ctags more useful (it now jumps directly to the
real definitions in vulkan_core.h instead of the typedefs); and it makes
the code slightly less verbose.
Chia-I Wu [Fri, 8 Mar 2019 19:27:50 +0000 (11:27 -0800)]
turnip: guard -Dvulkan-driver=freedreno
Require -DI-love-half-baked-turnips=true as well to enable freedreno
vulkan driver.
Chia-I Wu [Fri, 22 Feb 2019 16:50:58 +0000 (08:50 -0800)]
turnip: preliminary support for tu_CmdDraw
Chia-I Wu [Fri, 22 Feb 2019 06:37:34 +0000 (22:37 -0800)]
turnip: preliminary support for draw state binding
This adds support for tu_CmdBindPipeline, tu_CmdBindVertexBuffers,
etc.
Chia-I Wu [Wed, 20 Feb 2019 22:26:06 +0000 (14:26 -0800)]
turnip: add draw_cs to tu_cmd_buffer
It will hold draw commands.
Chia-I Wu [Fri, 22 Feb 2019 06:31:36 +0000 (22:31 -0800)]
turnip: parse VkPipelineVertexInputStateCreateInfo
Chia-I Wu [Wed, 27 Feb 2019 06:10:34 +0000 (22:10 -0800)]
turnip: parse VkPipelineShaderStageCreateInfo
Chia-I Wu [Wed, 27 Feb 2019 06:09:37 +0000 (22:09 -0800)]
turnip: compile VkPipelineShaderStageCreateInfo
Compile all shaders and upload the binaries to a BO.
Chia-I Wu [Wed, 20 Feb 2019 17:53:47 +0000 (09:53 -0800)]
turnip: preliminary support for shader modules
Save SPIR-V in tu_shader_module. Translation to NIR happens in
tu_shader_create, and compilation to binary code happens in
tu_shader_compile. Both will be called during pipeline creation.
Chia-I Wu [Thu, 21 Feb 2019 22:58:52 +0000 (14:58 -0800)]
turnip: parse VkPipeline{Multisample,ColorBlend}StateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 19:46:59 +0000 (11:46 -0800)]
turnip: parse VkPipelineDepthStencilStateCreateInfo
Chia-I Wu [Wed, 27 Feb 2019 07:29:51 +0000 (23:29 -0800)]
turnip: parse VkPipelineRasterizationStateCreateInfo
Chia-I Wu [Tue, 19 Feb 2019 21:49:01 +0000 (13:49 -0800)]
turnip: parse VkPipelineViewportStateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 19:07:38 +0000 (11:07 -0800)]
turnip: parse VkPipelineInputAssemblyStateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 17:41:49 +0000 (09:41 -0800)]
turnip: parse VkPipelineDynamicStateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 17:22:17 +0000 (09:22 -0800)]
turnip: create a less dummy pipeline
Still dummy, but at least it is created from tu_pipeline_builder.
Chia-I Wu [Mon, 25 Feb 2019 22:38:34 +0000 (14:38 -0800)]
turnip: simplify tu_cs sub-streams usage
Let tu_cs_begin_sub_stream imply tu_cs_reserve_space, and
tu_cs_end_sub_stream imply tu_cs_sanity_check. Callers are no
longer required to call them (but can still do so if they choose to).
Chia-I Wu [Mon, 25 Feb 2019 22:37:55 +0000 (14:37 -0800)]
turnip: fix tu_cs sub-streams
Update cs->start in tu_cs_end_sub_stream. Otherwise, the entry
would include commands from all prior sub-streams.
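The bug fixed above can be modelled with a tiny command stream. This is a sketch with hypothetical types (struct cs, struct cs_entry) and a fixed-size buffer, not the turnip code: each sub-stream entry covers the range from start to cur, so start has to advance when a sub-stream ends.

```c
#include <assert.h>
#include <stdint.h>

#define CS_CAPACITY 64

/* Hypothetical entry: a range of dwords inside the stream. */
struct cs_entry {
    uint32_t offset;
    uint32_t size;
};

struct cs {
    uint32_t buf[CS_CAPACITY];
    uint32_t start; /* first dword of the current sub-stream */
    uint32_t cur;   /* next dword to be written */
};

static void
cs_emit(struct cs *cs, uint32_t value)
{
    assert(cs->cur < CS_CAPACITY);
    cs->buf[cs->cur++] = value;
}

static struct cs_entry
cs_end_sub_stream(struct cs *cs)
{
    struct cs_entry entry = {
        .offset = cs->start,
        .size = cs->cur - cs->start,
    };
    /* Without this update the next entry would also cover the
     * commands of all prior sub-streams. */
    cs->start = cs->cur;
    return entry;
}
```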
Chia-I Wu [Mon, 25 Feb 2019 22:57:03 +0000 (14:57 -0800)]
turnip: tu_cs_emit_array
Array version of tu_cs_emit. Useful for updating multiple
consecutive array-like registers, or loading a shader binary with
SS6_DIRECT.
Chia-I Wu [Mon, 25 Feb 2019 22:49:34 +0000 (14:49 -0800)]
turnip: add tu_cs_discard_entries
We will start a draw IB at the beginning of a subpass and consume it
at the end of the subpass. With tu_cs_discard_entries, we can reuse
the same tu_cs for all subpasses.
Chia-I Wu [Mon, 25 Feb 2019 22:55:06 +0000 (14:55 -0800)]
turnip: more/better asserts for tu_cs
Asserting (cur < end) in tu_cs_emit catches far fewer programming
errors than asserting (cur < reserved_end). We should never
write more commands than what we have reserved.
Assert IB is non-empty and sane in tu_cs_emit_ib.
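A small model of why the tighter assert is more useful. The struct and function names are hypothetical and the buffer is fixed-size; the only point is that checking against reserved_end flags an under-sized reservation immediately, while checking against the end of the buffer would only fire once the whole buffer is full.

```c
#include <assert.h>
#include <stdint.h>

#define CS_CAPACITY 16

struct cs {
    uint32_t buf[CS_CAPACITY];
    uint32_t cur;          /* next dword to write */
    uint32_t reserved_end; /* end of the space the caller reserved */
};

static void
cs_reserve_space(struct cs *cs, uint32_t dwords)
{
    assert(cs->cur + dwords <= CS_CAPACITY);
    cs->reserved_end = cs->cur + dwords;
}

static void
cs_emit(struct cs *cs, uint32_t value)
{
    /* Stricter than (cur < CS_CAPACITY): writing past the reservation
     * is a caller bug even when the buffer still has room. */
    assert(cs->cur < cs->reserved_end);
    cs->buf[cs->cur++] = value;
}

static uint32_t
cs_reserved_left(const struct cs *cs)
{
    return cs->reserved_end - cs->cur;
}
```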
Chia-I Wu [Mon, 25 Feb 2019 22:44:52 +0000 (14:44 -0800)]
turnip: use 32-bit offset in tu_cs_entry
We neither support nor expect BOs to be that big in tu_cs.
Chia-I Wu [Mon, 25 Feb 2019 22:32:36 +0000 (14:32 -0800)]
turnip: mark IBs for dumping
Includes IBs in kernel cmdbuf dumps.
Eric Engestrom [Wed, 27 Feb 2019 12:31:06 +0000 (12:31 +0000)]
turnip: use the platform defines in vk.xml instead of hard-coding them
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Bas Nieuwenhuizen [Thu, 21 Feb 2019 21:39:22 +0000 (22:39 +0100)]
turnip: Add todo for copies.
Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:43:24 +0000 (16:43 +0100)]
turnip: Add buffer->image DMA copies.
Passes dEQP-VK.api.copy_and_blit.core.buffer_to_image.*
Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:13:23 +0000 (16:13 +0100)]
turnip: Add image->buffer DMA copies.
Passes dEQP-VK.api.copy_and_blit.core.image_to_buffer.*
Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:09:27 +0000 (16:09 +0100)]
turnip: Implement buffer->buffer DMA copies.
Passes dEQP-VK.api.copy_and_blit.core.buffer_to_buffer.*
Bas Nieuwenhuizen [Mon, 4 Feb 2019 13:52:34 +0000 (14:52 +0100)]
turnip: Add tu6_rb_fmt_to_ifmt.
Bas Nieuwenhuizen [Mon, 18 Feb 2019 13:49:52 +0000 (14:49 +0100)]
turnip: Make tu6_emit_event_write shared.
Bas Nieuwenhuizen [Tue, 15 Jan 2019 21:54:15 +0000 (22:54 +0100)]
turnip: Add buffer memory binding.
Chia-I Wu [Thu, 14 Feb 2019 18:53:20 +0000 (10:53 -0800)]
turnip: respect color attachment formats
Make tu6_get_native_format available to tu_cmd_buffer and start
using it.
Chia-I Wu [Thu, 14 Feb 2019 22:36:52 +0000 (14:36 -0800)]
turnip: preliminary support for fences
This should be quite complete feature-wise. External fences are
still missing. We probably also want to add a simpler path to
tu_WaitForFences for when fenceCount == 1.
Chia-I Wu [Wed, 13 Feb 2019 18:23:32 +0000 (10:23 -0800)]
turnip: fix VkClearValue packing
Add tu_pack_clear_value to correctly pack VkClearValue according to
VkFormat. It ignores the component order defined by VkFormat, and
always packs to WZYX order.
Chia-I Wu [Fri, 1 Feb 2019 18:36:19 +0000 (10:36 -0800)]
turnip: add support for VK_KHR_external_memory_{fd,dma_buf}
Chia-I Wu [Fri, 1 Feb 2019 18:27:28 +0000 (10:27 -0800)]
turnip: advertise VK_KHR_external_memory
AFAICT, it is supported. We don't need to handle any of the new
structs because our BOs can always be exported.
Chia-I Wu [Fri, 1 Feb 2019 18:12:38 +0000 (10:12 -0800)]
turnip: advertise VK_KHR_external_memory_capabilities
AFAICT, it is supported.
Chia-I Wu [Thu, 31 Jan 2019 23:03:03 +0000 (15:03 -0800)]
turnip: add functions to import/export prime fd
Add tu_bo_init_dmabuf, tu_bo_export_dmabuf, tu_gem_import_dmabuf,
and tu_gem_export_dmabuf.
Chad Versace [Sat, 2 Feb 2019 00:48:44 +0000 (16:48 -0800)]
turnip: Fix error behavior for VkPhysicalDeviceExternalImageFormatInfo
If the handle type is unsupported, then the spec requires us to return
VK_ERROR_FORMAT_NOT_SUPPORTED.
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Closes: https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/merge_requests/17
Chia-I Wu [Fri, 25 Jan 2019 19:13:54 +0000 (11:13 -0800)]
turnip: add a more complete format table
A format table is an array of tu_native_format. Table lookup is
done through array indexing.
This commit defines a single format table for core VkFormat. It is
derived from the table in the gallium driver. There might be errors
introduced in the process of the conversion.
When an extension that defines new VkFormat is supported, we need to
add a new table for the extension.
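Table lookup by array indexing, as described above, might look roughly like this. The enum, the struct fields and the hardware values are placeholders invented for the sketch; they are not real VkFormat or a6xx values, and get_native_format here is not the driver function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder format enum standing in for core VkFormat values. */
enum fmt {
    FMT_UNDEFINED,
    FMT_R8_UNORM,
    FMT_R8G8B8A8_UNORM,
    FMT_B8G8R8A8_UNORM,
    FMT_COUNT,
};

struct native_format {
    bool supported;
    int hw_format; /* placeholder value, not a real hardware enum */
};

/* One table for the core formats; entries left out stay zeroed and
 * therefore read as unsupported. */
static const struct native_format core_format_table[FMT_COUNT] = {
    [FMT_R8_UNORM]       = { true, 1 },
    [FMT_B8G8R8A8_UNORM] = { true, 2 },
};

static const struct native_format *
get_native_format(unsigned format)
{
    if (format >= FMT_COUNT)
        return NULL;

    const struct native_format *entry = &core_format_table[format];
    assert(!entry->supported || entry->hw_format > 0);
    return entry->supported ? entry : NULL;
}
```

An extension that adds formats outside the core range would get its own table, with the lookup choosing the table before indexing into it.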
Chia-I Wu [Fri, 11 Jan 2019 23:01:26 +0000 (15:01 -0800)]
turnip: preliminary support for loadOp and storeOp
- create tile_load_ib and tile_store_ib at the beginning of each
subpass
- execute the IBs at the end of each subpass
- no DONT_CARE support
- no subpass dependency analysis and subpass merging
- no zs support
- no true VkImageView support
- assume VK_FORMAT_B8G8R8A8_UNORM
- no tiling
- no MSAA
This also removes cur_cs from tu_cmd_buffer.
Chia-I Wu [Tue, 29 Jan 2019 23:00:34 +0000 (15:00 -0800)]
turnip: add TU_CS_MODE_SUB_STREAM
When in TU_CS_MODE_SUB_STREAM, tu_cs_begin_sub_stream (or
tu_cs_end_sub_stream) should be called instead of tu_cs_begin (or
tu_cs_end). It gives the caller a TU_CS_MODE_EXTERNAL cs to emit
commands to.
Chia-I Wu [Mon, 28 Jan 2019 22:33:20 +0000 (14:33 -0800)]
turnip: add tu_cs_mode
Add tu_cs_mode and TU_CS_MODE_EXTERNAL. When in
TU_CS_MODE_EXTERNAL, tu_cs wraps an external buffer and cannot
grow.
This also moves tu_cs* up in tu_private.h, such that other structs
can embed tu_cs_entry.
Chia-I Wu [Tue, 29 Jan 2019 18:43:48 +0000 (10:43 -0800)]
turnip: provide both emit_ib and emit_call
tu_cs_emit_ib emits a CP_INDIRECT_BUFFER for a BO. tu_cs_emit_call
emits a CP_INDIRECT_BUFFER for each entry of a target cs.
Chia-I Wu [Tue, 29 Jan 2019 00:31:54 +0000 (16:31 -0800)]
turnip: add tu_cs_sanity_check
It replaces tu_cs_reserve_space_assert and can be called at any
time to sanity check tu_cs.
Chia-I Wu [Mon, 28 Jan 2019 23:55:40 +0000 (15:55 -0800)]
turnip: never fail tu_cs_begin/tu_cs_end
Error checking tu_cs_begin/tu_cs_end is too tedious for the callers.
Move tu_cs_add_bo and tu_cs_reserve_entry to tu_cs_reserve_space
such that tu_cs_begin/tu_cs_end never fails.
Chia-I Wu [Tue, 29 Jan 2019 00:24:48 +0000 (16:24 -0800)]
turnip: specify initial size in tu_cs_init
We will drop size parameter from tu_cs_begin shortly, such that
tu_cs_begin never fails.
Chia-I Wu [Mon, 28 Jan 2019 23:52:36 +0000 (15:52 -0800)]
turnip: add tu_cs_{reserve,add}_entry
We will stop calling tu_cs_reserve_entry in tu_cs_end shortly, such
that tu_cs_end never fails.
Chia-I Wu [Tue, 29 Jan 2019 22:09:17 +0000 (14:09 -0800)]
turnip: add internal helpers for tu_cs
Add tu_cs_get_offset, tu_cs_get_size, tu_cs_get_space, and
tu_cs_is_empty.
Chia-I Wu [Tue, 22 Jan 2019 18:27:22 +0000 (10:27 -0800)]
turnip: add tu_tiling_config
We need the current color/depth/stencil attachments and the current
render area to compute the tiling config.
We compute the tiling config at the beginning of each subpass for
the moment. We should change that when the driver can reorder/merge
subpasses.
It is very common that the render area is the entire framebuffer.
We might want to optimize for the case and compute the tiling config
in tu_framebuffer ctor.
Chia-I Wu [Tue, 22 Jan 2019 18:27:18 +0000 (10:27 -0800)]
turnip: preliminary support for tu_GetRenderAreaGranularity
Set it to tile alignments, 32x32 on 6xx.
Chia-I Wu [Fri, 18 Jan 2019 16:54:04 +0000 (08:54 -0800)]
turnip: emit HW init in tu_BeginCommandBuffer
Being the first commit that emits meaningful command packets, there
are many things included in this commit:
- tu6_emit_xxx are low-level helpers that emit command packets
without boundary checks
- tu6_xxx are high-level helpers that emit command packets with
boundary checks
- cmdbuf->cs is a pointer to the current CS, so that we can use the
helpers above to emit to other CS
- use cmd as the variable name of tu_cmd_buffer
- there is a per-cmdbuf scratch bo for CP_EVENT_WRITE writeback
- there is a per-cmdbuf debug marker, using scratch reg 7 or 6
depending on whether the cmdbuf is primary or secondary
(olv, after rebase) REG_A6XX_SP_UNKNOWN_AB20 is renamed
Chia-I Wu [Fri, 18 Jan 2019 22:24:45 +0000 (14:24 -0800)]
turnip: add tu_cs_reserve_space(_assert)
They are used like
tu_cs_reserve_space(...);
tu_cs_emit(...);
...;
tu_cs_reserve_space_assert();
to make sure we reserved enough space at the beginning.
Chad Versace [Wed, 16 Jan 2019 23:01:35 +0000 (15:01 -0800)]
turnip: Annotate vkGetImageSubresourceLayout with tu_stub
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>