mesa.git
5 years agopanfrost: Document "depth-buffer writeback" bit
Alyssa Rosenzweig [Sat, 9 Mar 2019 00:12:07 +0000 (00:12 +0000)]
panfrost: Document "depth-buffer writeback" bit

This bit, if set, causes the depth buffer to be copied from GPU tile
memory to the provided depth buffer in main memory. If not set, the GPU
will not access the main memory (saving considerable memory bandwidth if
depth results are not actually used).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: Support linear depth textures
Alyssa Rosenzweig [Fri, 8 Mar 2019 23:41:12 +0000 (23:41 +0000)]
panfrost: Support linear depth textures

This combination has not yet been seen "in the wild" in traces, but to
support linear depth FBOs, ~bruteforce reveals this bit pattern is
necessary. It's not yet clear why the meanings of 0x1 and 0x2 are
essentially flipped (tiled vs linear for colour, linear vs some sort of
tiled for depth).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agopanfrost: Allocate dedicated slab for linear BOs
Alyssa Rosenzweig [Fri, 8 Mar 2019 23:36:02 +0000 (23:36 +0000)]
panfrost: Allocate dedicated slab for linear BOs

Previously, linear BOs shared memory with each other to minimize kernel
round-trips / latency, as well as to work around a bug in the free_slab
function. These concerns are invalid now, but continuing to use the slab
allocator for BOs resulted in memory allocation errors. This issue was
aggravated, though not introduced (so not a real regression) in the
previous commit.

v2 (unreviewed): Fix bug in v1 preventing munmaps from working

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agopanfrost: Determine framebuffer format bits late
Alyssa Rosenzweig [Thu, 7 Mar 2019 04:42:49 +0000 (04:42 +0000)]
panfrost: Determine framebuffer format bits late

Again, these formats are only properly known at the time of fragment job
emit. Rather than hardcoding the format, at least for MFBD we begin to
construct the format bits on-demand. This cleans up the code,
futureproofs for ES3 framebuffer formats, and should fix bugs regarding
FBO colour swizzles.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
5 years agopanfrost: Delay color buffer setup
Alyssa Rosenzweig [Thu, 7 Mar 2019 04:19:21 +0000 (04:19 +0000)]
panfrost: Delay color buffer setup

In an effort to cleanup framebuffer management code, we delay
colour buffer setup until the FRAGMENT job is actually emitted, allowing
the AFBC and linear codepaths to be unified.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
5 years agopanfrost: Combine has_afbc/tiled in layout enum
Alyssa Rosenzweig [Thu, 7 Mar 2019 03:52:20 +0000 (03:52 +0000)]
panfrost: Combine has_afbc/tiled in layout enum

AFBC, tiled, and linear BO layouts are mutually exclusive; they should
be coupled via a single enum rather than ad hoc checks of booleans.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
5 years agopanfrost: Cleanup needless if in create_bo
Alyssa Rosenzweig [Thu, 7 Mar 2019 03:24:45 +0000 (03:24 +0000)]
panfrost: Cleanup needless if in create_bo

I'm not sure why we were checking for these additional criteria (likely
inherited from some other driver); remove the needless checks to cleanup
the code and perhaps fix some bugs down the line.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.visozo@collabora.com>
5 years agoi965: Reimplement all the PIPE_CONTROL rules.
Kenneth Graunke [Fri, 17 Nov 2017 07:47:43 +0000 (23:47 -0800)]
i965: Reimplement all the PIPE_CONTROL rules.

This implements virtually all documented PIPE_CONTROL restrictions
in a centralized helper.  You now simply ask for the operations you
want, and the pipe control "brain" will figure out exactly what pipe
controls to emit to make that happen without tanking your system.

The hope is that this will fix some intermittent flushing issues as
well as GPU hangs.  However, it also has a high risk of causing GPU
hangs and other regressions, as this is a particularly sensitive
area and poking the bear isn't always advisable.

Mark Janes noted that this patch helps with some GPU hangs on Icelake.

This does re-enable the VF Invalidate => Write Immediate workaround
on Gen8, which had been disabled (bug 103787) due to GPU hangs.  The
old code did this workaround after another which would have added CS
stall bits, so it missed a workaround.  The new code orders them
properly and appears to work.

v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by
    Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for
    production hardware.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
5 years agoi965: Use genxml for emitting PIPE_CONTROL.
Kenneth Graunke [Thu, 1 Nov 2018 22:55:51 +0000 (15:55 -0700)]
i965: Use genxml for emitting PIPE_CONTROL.

While this does add a bunch of boilerplate, it also protects us against
the hardware moving bits, or changing their meaning.  For something as
finnicky as PIPE_CONTROL, the extra safety seems worth it.

We turn PIPE_CONTROL_* into an bitfield of arbitrary flags, and then
pack them appropriately.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
5 years agoi965: Rename ISP_DIS to INDIRECT_STATE_POINTERS_DISABLE.
Kenneth Graunke [Thu, 1 Nov 2018 22:55:21 +0000 (15:55 -0700)]
i965: Rename ISP_DIS to INDIRECT_STATE_POINTERS_DISABLE.

Clearer name.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
5 years agoi965: Move some genX infrastructure to genX_boilerplate.h.
Kenneth Graunke [Fri, 17 Nov 2017 06:37:02 +0000 (22:37 -0800)]
i965: Move some genX infrastructure to genX_boilerplate.h.

This will let us make multiple genX_*.c files, without copy and pasting
all this boilerplate.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
5 years agogallium/winsys/kms: fix incomplete type compilation failure
Brian Paul [Fri, 8 Mar 2019 22:50:58 +0000 (15:50 -0700)]
gallium/winsys/kms: fix incomplete type compilation failure

Fixes:
../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c: In function ‘kms_sw_displaytarget_from_handle’:
../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c:402:60: error: dereferencing pointer to incomplete type ‘const struct pipe_resource’
                                                       templ->format,
                                                            ^

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
5 years agodrisw: fix incomplete type compilation failure
Brian Paul [Fri, 8 Mar 2019 22:49:49 +0000 (15:49 -0700)]
drisw: fix incomplete type compilation failure

Fixes:
../src/gallium/winsys/sw/dri/dri_sw_winsys.c: In function ‘dri_sw_displaytarget_display’:
../src/gallium/winsys/sw/dri/dri_sw_winsys.c:255:39: error: dereferencing pointer to incomplete type ‘struct pipe_box’
       offset = dri_sw_dt->stride * box->y;
                                       ^

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
5 years agodocs: try to improve the Meson documentation (v2)
Brian Paul [Fri, 8 Mar 2019 03:39:49 +0000 (20:39 -0700)]
docs: try to improve the Meson documentation (v2)

Add new Introduction and Advanced Usage sections.
Spell out a few more details, like "ninja install".
Improve the layout around example commands.
Fix grammatical errors and tighten up the text.
Explain the --prefix option.

v2: Remove language about 'ninja clean' and move link to Meson
information about separate build directories earlier in the page.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agost/mesa: minor refactoring of texture/sampler delete code
Brian Paul [Wed, 6 Mar 2019 23:20:55 +0000 (16:20 -0700)]
st/mesa: minor refactoring of texture/sampler delete code

Rename st_texture_free_sampler_views() to
st_delete_texture_sampler_views() to align with
st_DeleteTextureObject(), its only caller.

Move the call to st_texture_release_all_sampler_views() from
st_DeleteTextureObject() to st_delete_texture_sampler_views()
so all the sampler view clean-up code is in one place.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
5 years agost/mesa: rename st_texture_release_sampler_view()
Brian Paul [Wed, 6 Mar 2019 23:15:19 +0000 (16:15 -0700)]
st/mesa: rename st_texture_release_sampler_view()

To st_texture_release_context_sampler_view() to be more clear
that it's context-specific.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
5 years agost/mesa: add/improve sampler view comments
Brian Paul [Wed, 6 Mar 2019 23:09:09 +0000 (16:09 -0700)]
st/mesa: add/improve sampler view comments

Reviewed-by: Neha Bhende <bhenden@vmware.com>
5 years agost/mesa: move around some code in st_context.c
Brian Paul [Thu, 7 Mar 2019 16:55:09 +0000 (09:55 -0700)]
st/mesa: move around some code in st_context.c

st_init_driver_functions() is only called in st_context.c so there's
no need for the prototype in st_context.h

To avoid a forward declaration of st_init_driver_functions() in
st_context.c, we need to move around several other functions.

No functional change.

Reviewed-by: Neha Bhende <bhenden@vmware.com>
5 years agost/mesa: move utility functions, macros into new st_util.h file
Brian Paul [Thu, 7 Mar 2019 16:21:53 +0000 (09:21 -0700)]
st/mesa: move utility functions, macros into new st_util.h file

To de-clutter st_context.h.

Clean up remaining function prototypes in st_context.h.

The st_vp_uses_current_values() helper is only used in st_context.c
so move it there.

The st_get_active_states() function is only used in st_context.c so
remove its prototype in st_context.h

Reviewed-by: Neha Bhende <bhenden@vmware.com>
5 years agoanv: destroy descriptor sets when pool gets reset
Juan A. Suarez Romero [Mon, 11 Mar 2019 17:33:54 +0000 (18:33 +0100)]
anv: destroy descriptor sets when pool gets reset

As stated in Vulkan spec:
   "Resetting a descriptor pool recycles all of the resources from all
    of the descriptor sets allocated from the descriptor pool back to
    the descriptor pool, and the descriptor sets are implicitly freed."

This fixes dEQP-VK.api.descriptor_pool.*

Fixes: 14f6275c92f1 "anv/descriptor_set: add reference counting for..."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
5 years agonir: find induction/limit vars in iand instructions
Timothy Arceri [Thu, 6 Dec 2018 05:00:40 +0000 (16:00 +1100)]
nir: find induction/limit vars in iand instructions

This will be used to help find the trip count of loops that look
like the following:

   while (a < x && i < 8) {
      ...
      i++;
   }

Where the NIR will end up looking something like this:

   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18

      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }

On RADV this unrolls a bunch of loops in F1-2017 shaders.

Totals from affected shaders:
SGPRS: 4112 -> 4136 (0.58 %)
VGPRS: 4132 -> 4052 (-1.94 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 515444 -> 587720 (14.02 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 194 -> 196 (1.03 %)
Wait states: 0 -> 0 (0.00 %)

It also unrolls a couple of loops in shader-db on radeonsi.

Totals from affected shaders:
SGPRS: 128 -> 128 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 6880 -> 9504 (38.14 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 16 -> 16 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agonir: pass nir_op to calculate_iterations()
Timothy Arceri [Thu, 6 Dec 2018 04:56:55 +0000 (15:56 +1100)]
nir: pass nir_op to calculate_iterations()

Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agonir: add get_induction_and_limit_vars() helper to loop analysis
Timothy Arceri [Thu, 6 Dec 2018 02:29:05 +0000 (13:29 +1100)]
nir: add get_induction_and_limit_vars() helper to loop analysis

This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agonir: add helper to return inversion op of a comparison
Timothy Arceri [Thu, 6 Dec 2018 00:17:45 +0000 (11:17 +1100)]
nir: add helper to return inversion op of a comparison

This will be used to help find the trip count of loops that look
like the following:

   while (a < x && i < 8) {
      ...
      i++;
   }

Where the NIR will end up looking something like this:

   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18

      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }

So in order to find the trip count we need to find the inverse of
ilt.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agonir: simplify the loop analysis trip count code a little
Timothy Arceri [Thu, 6 Dec 2018 00:12:12 +0000 (11:12 +1100)]
nir: simplify the loop analysis trip count code a little

Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.

The new helper will also be used in a following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agonir: unroll some loops with a variable limit
Timothy Arceri [Tue, 20 Nov 2018 04:23:45 +0000 (15:23 +1100)]
nir: unroll some loops with a variable limit

For some loops can have a single terminator but the exact trip
count is still unknown. For example:

   for (int i = 0; i < imin(x, 4); i++)
      ...

Shader-db results radeonsi (all affected are from Tropico 5):

Totals from affected shaders:
SGPRS: 144 -> 152 (5.56 %)
VGPRS: 124 -> 108 (-12.90 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5180 -> 6640 (28.19 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 17 -> 21 (23.53 %)
Wait states: 0 -> 0 (0.00 %)

Shader-db results i965 (SKL):

total loops in shared programs: 3808 -> 3802 (-0.16%)
loops in affected programs: 6 -> 0
helped: 6
HURT: 0

vkpipeline-db results RADV (Unrolls some Skyrim VR shaders):

Totals from affected shaders:
SGPRS: 304 -> 304 (0.00 %)
VGPRS: 296 -> 292 (-1.35 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15756 -> 25884 (64.28 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 29 -> 29 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: fix bug where last iteration would get optimised away by
    mistake.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agonir: calculate trip count for more loops
Timothy Arceri [Tue, 20 Nov 2018 02:45:58 +0000 (13:45 +1100)]
nir: calculate trip count for more loops

This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).

For example:

   for (int i = 0; i < imin(x, 4); i++)
      ...

We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agonir: add partial loop unrolling support
Timothy Arceri [Tue, 20 Nov 2018 03:05:09 +0000 (14:05 +1100)]
nir: add partial loop unrolling support

This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.

The code is written so that we could use partial unrolling
more generally, but for now it's only use when we have guessed
the trip count.

We use partial unrolling for this guessed trip count because its
possible any out of bounds array access doesn't otherwise affect
the shader e.g the stores/loads to/from the array are unused. So
we insert a copy of the loop in the innermost continue branch of
the unrolled loop. Later on its possible for nir_opt_dead_cf()
to then remove the loop in some cases.

A Renderdoc capture from the Rise of the Tomb Raider benchmark,
reports the following change in an affected compute shader:

GPU duration: 350 -> 325 microseconds

shader-db results radeonsi VEGA (NIR backend):

SGPRS: 1008 -> 816 (-19.05 %)
VGPRS: 684 -> 432 (-36.84 %)
Spilled SGPRs: 539 -> 0 (-100.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 39708 -> 45812 (15.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 105 -> 144 (37.14 %)
Wait states: 0 -> 0 (0.00 %)

shader-db results i965 SKL:

total instructions in shared programs: 13098265 -> 13103359 (0.04%)
instructions in affected programs: 5126 -> 10220 (99.38%)
helped: 0
HURT: 21

total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
cycles in affected programs: 289252 -> 234925 (-18.78%)
helped: 12
HURT: 9

vkpipeline-db results VEGA:

Totals from affected shaders:
SGPRS: 184 -> 184 (0.00 %)
VGPRS: 448 -> 448 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 26076 -> 24428 (-6.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 5 -> 5 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agonir: add new partially_unrolled bool to nir_loop
Timothy Arceri [Mon, 19 Nov 2018 06:01:52 +0000 (17:01 +1100)]
nir: add new partially_unrolled bool to nir_loop

In order to stop continuously partially unrolling the same loop
we add the bool partially_unrolled to nir_loop, we add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also nir_loop_info is never cloned.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agonir: add guess trip count support to loop analysis
Timothy Arceri [Thu, 15 Nov 2018 12:23:09 +0000 (23:23 +1100)]
nir: add guess trip count support to loop analysis

This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.

v2: check if the induction var is used to index more than a single
    array and if so get the size of the smallest array.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agopanfrost: Add support for PAN_MESA_DEBUG
Tomeu Vizoso [Fri, 8 Mar 2019 14:24:57 +0000 (15:24 +0100)]
panfrost: Add support for PAN_MESA_DEBUG

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard: Add support for MIDGARD_MESA_DEBUG
Tomeu Vizoso [Fri, 8 Mar 2019 14:04:50 +0000 (15:04 +0100)]
panfrost/midgard: Add support for MIDGARD_MESA_DEBUG

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agonir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth'
Xavier Bouchoux [Mon, 15 Oct 2018 14:24:29 +0000 (16:24 +0200)]
nir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth'

'dxc' hlsl-to-spirv compiler appears to emit 2 (Unknown) in the depth field,
when the image is not sampled and the value is not needed.

Previously, shaders failed with:

SPIR-V parsing FAILED:
    In file ../src/compiler/spirv/spirv_to_nir.c:1412
    !is_shadow
    632 bytes into the SPIR-V binary

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoiris: Fix write enable in pinning of depth/stencil resources
Kenneth Graunke [Sat, 9 Mar 2019 09:02:06 +0000 (01:02 -0800)]
iris: Fix write enable in pinning of depth/stencil resources

We may bind new Z/S buffers (which come via the framebuffer CSO,
triggering IRIS_DIRTY_DEPTH_BUFFER), but with writes disabled.

The next draw may enable Z or S writes (which come via the ZSA CSO,
triggering IRIS_DIRTY_WM_DEPTH_STENCIL), which requires us to update
our pin to have the write flag.

So, update pinning if either dirty flag changes.  To clarify, pass
cso_zsa to the pinning function rather than pulling the random values
out of ice->state, which unfortunately have to exist for the resolve
code since iris_depth_stencil_alpha_state only exists in iris_state.c.

5 years agoiris: Refactor depth/stencil buffer pinning into a helper.
Kenneth Graunke [Sat, 9 Mar 2019 08:50:24 +0000 (00:50 -0800)]
iris: Refactor depth/stencil buffer pinning into a helper.

This avoids the code duplication that caused me to put things in the
wrong place in the previous commit.  One used to have extra flushes,
but we moved those out so now these are identical and can be easily
shared.

5 years agoiris: Move depth/stencil flushes so they actually do something
Kenneth Graunke [Sat, 9 Mar 2019 08:42:54 +0000 (00:42 -0800)]
iris: Move depth/stencil flushes so they actually do something

Commit d6dd57d43cd (iris: Add missing depth cache flushes) added the
depth/stencil flushes to the wrong place.  I meant to add them to the
iris_upload_dirty_render_state code that emits the packets, but I
accidentally added them to the nearly identical looking code in
iris_restore_render_saved_bos.  This meant we missed the actual flushing
at draw time, but instead did pointless flushing on the first draw in a
batch where things are already flushed anyway.

This commit moves them to iris_resolve.c, next to the depth prepares,
similar to what we do for color buffers.  i965 does them elsewhere, but
I'm not sure why - this seems like the most consistent place.

5 years agost/dri: allow direct UYVY import
Christian Gmeiner [Tue, 26 Feb 2019 17:41:07 +0000 (18:41 +0100)]
st/dri: allow direct UYVY import

Push this format to the pipe driver unchanged.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agoiris: Fix TES gl_PatchVerticesIn handling.
Kenneth Graunke [Fri, 8 Mar 2019 04:14:59 +0000 (20:14 -0800)]
iris: Fix TES gl_PatchVerticesIn handling.

1. If we switch the TCS for one with a different number of output
   vertices, then the TES's gl_PatchVerticesIn value will change.
   We need to re-upload in this case.  For now, re-emit constants
   whenever the TCS/TES are swapped out.

2. If there is no TCS, then we can't grab gl_PatchVerticesIn from
   the TCS info.  Since it's a passthrough, we can just use the
   primitive's patch count (like the TCS gl_PatchVerticesIn does).

Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Rework default tessellation level uploads
Kenneth Graunke [Thu, 7 Mar 2019 04:56:37 +0000 (20:56 -0800)]
iris: Rework default tessellation level uploads

Now that we've added a system value uploading mechanism, we may as well
reuse the same system for default tessellation levels.  This simplifies
the state upload code a bit.

Also fixes:
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoiris: Face should be a system value.
Timur Kristóf [Wed, 13 Feb 2019 22:28:20 +0000 (00:28 +0200)]
iris: Face should be a system value.

This patch adds PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL which
despite its name is not a TGSI-specific capability, just lets
the state tracker know that it should generate a system value
for FACE.

This is needed if we want to run tgsi_to_nir on iris.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agovc4: Switch the post-RA scheduler over to the DAG datastructure.
Eric Anholt [Thu, 28 Feb 2019 20:02:58 +0000 (12:02 -0800)]
vc4: Switch the post-RA scheduler over to the DAG datastructure.

Just a small code reduction from shared infrastructure.

5 years agov3d: Use the DAG datastructure for QPU instruction scheduling.
Eric Anholt [Thu, 28 Feb 2019 18:42:05 +0000 (10:42 -0800)]
v3d: Use the DAG datastructure for QPU instruction scheduling.

Just a small code reduction from shared infrastructure.

5 years agovc4: Reuse list_for_each_entry_rev().
Eric Anholt [Thu, 28 Feb 2019 19:02:25 +0000 (11:02 -0800)]
vc4: Reuse list_for_each_entry_rev().

5 years agov3d: Reuse list_for_each_entry_rev().
Eric Anholt [Thu, 28 Feb 2019 19:01:57 +0000 (11:01 -0800)]
v3d: Reuse list_for_each_entry_rev().

5 years agovc4: Switch over to using the DAG datastructure for QIR scheduling.
Eric Anholt [Thu, 28 Feb 2019 18:06:27 +0000 (10:06 -0800)]
vc4: Switch over to using the DAG datastructure for QIR scheduling.

Just a small code reduction from shared infrastructure.

5 years agoutil: Add a DAG datastructure.
Eric Anholt [Wed, 27 Feb 2019 19:12:59 +0000 (11:12 -0800)]
util: Add a DAG datastructure.

I keep writing this for various schedulers.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
5 years agofreedreno/a6xx: Remove extra parens
Kristian H. Kristensen [Fri, 1 Mar 2019 22:33:36 +0000 (14:33 -0800)]
freedreno/a6xx: Remove extra parens

There's a warning about this now.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
5 years agofreedreno: Use c_vis_args and no_override_init_args
Kristian H. Kristensen [Fri, 1 Mar 2019 22:25:57 +0000 (14:25 -0800)]
freedreno: Use c_vis_args and no_override_init_args

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>
5 years agoturnip: preliminary support for Wayland WSI
Chia-I Wu [Fri, 8 Feb 2019 21:45:53 +0000 (13:45 -0800)]
turnip: preliminary support for Wayland WSI

5 years agoturnip: preliminary support for tu_GetImageSubresourceLayout
Chia-I Wu [Mon, 11 Feb 2019 19:12:32 +0000 (11:12 -0800)]
turnip: preliminary support for tu_GetImageSubresourceLayout

5 years agoturnip: Use Vulkan 1.1 names instead of KHR
Chad Versace [Sat, 2 Feb 2019 01:08:51 +0000 (17:08 -0800)]
turnip: Use Vulkan 1.1 names instead of KHR

That is, drop KHR from all tokens that were promoted to Vulkan 1.1.
The consistency makes ctags more useful (it now jumps directly to the
real definitions in vulkan_core.h instead of the typedefs); and it makes
the code slightly less verbose.

5 years agoturnip: guard -Dvulkan-driver=freedreno
Chia-I Wu [Fri, 8 Mar 2019 19:27:50 +0000 (11:27 -0800)]
turnip: guard -Dvulkan-driver=freedreno

Require -DI-love-half-baked-turnips=true as well to enable freedreno
vulkan driver.

5 years agoturnip: preliminary support for tu_CmdDraw
Chia-I Wu [Fri, 22 Feb 2019 16:50:58 +0000 (08:50 -0800)]
turnip: preliminary support for tu_CmdDraw

5 years agoturnip: preliminary support for draw state binding
Chia-I Wu [Fri, 22 Feb 2019 06:37:34 +0000 (22:37 -0800)]
turnip: preliminary support for draw state binding

This adds support for tu_CmdBindPipeline, tu_CmdBindVertexBuffers,
etc.

5 years agoturnip: add draw_cs to tu_cmd_buffer
Chia-I Wu [Wed, 20 Feb 2019 22:26:06 +0000 (14:26 -0800)]
turnip: add draw_cs to tu_cmd_buffer

It will hold draw commands.

5 years agoturnip: parse VkPipelineVertexInputStateCreateInfo
Chia-I Wu [Fri, 22 Feb 2019 06:31:36 +0000 (22:31 -0800)]
turnip: parse VkPipelineVertexInputStateCreateInfo

5 years agoturnip: parse VkPipelineShaderStageCreateInfo
Chia-I Wu [Wed, 27 Feb 2019 06:10:34 +0000 (22:10 -0800)]
turnip: parse VkPipelineShaderStageCreateInfo

5 years agoturnip: compile VkPipelineShaderStageCreateInfo
Chia-I Wu [Wed, 27 Feb 2019 06:09:37 +0000 (22:09 -0800)]
turnip: compile VkPipelineShaderStageCreateInfo

Compile all shaders and upload the binaries to a BO.

5 years agoturnip: preliminary support for shader modules
Chia-I Wu [Wed, 20 Feb 2019 17:53:47 +0000 (09:53 -0800)]
turnip: preliminary support for shader modules

Save SPIR-V in tu_shader_module.  Tranlation to NIR happens in
tu_shader_create, and compilation to binary code happens in
tu_shader_compile.  Both will be called during pipeline creation.

5 years agoturnip: parse VkPipeline{Multisample,ColorBlend}StateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 22:58:52 +0000 (14:58 -0800)]
turnip: parse VkPipeline{Multisample,ColorBlend}StateCreateInfo

5 years agoturnip: parse VkPipelineDepthStencilStateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 19:46:59 +0000 (11:46 -0800)]
turnip: parse VkPipelineDepthStencilStateCreateInfo

5 years agoturnip: parse VkPipelineRasterizationStateCreateInfo
Chia-I Wu [Wed, 27 Feb 2019 07:29:51 +0000 (23:29 -0800)]
turnip: parse VkPipelineRasterizationStateCreateInfo

5 years agoturnip: parse VkPipelineViewportStateCreateInfo
Chia-I Wu [Tue, 19 Feb 2019 21:49:01 +0000 (13:49 -0800)]
turnip: parse VkPipelineViewportStateCreateInfo

5 years agoturnip: parse VkPipelineInputAssemblyStateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 19:07:38 +0000 (11:07 -0800)]
turnip: parse VkPipelineInputAssemblyStateCreateInfo

5 years agoturnip: parse VkPipelineDynamicStateCreateInfo
Chia-I Wu [Thu, 21 Feb 2019 17:41:49 +0000 (09:41 -0800)]
turnip: parse VkPipelineDynamicStateCreateInfo

5 years agoturnip: create a less dummy pipeline
Chia-I Wu [Thu, 21 Feb 2019 17:22:17 +0000 (09:22 -0800)]
turnip: create a less dummy pipeline

Still dummy, but at least it is created from tu_pipeline_builder.

5 years agoturnip: simplify tu_cs sub-streams usage
Chia-I Wu [Mon, 25 Feb 2019 22:38:34 +0000 (14:38 -0800)]
turnip: simplify tu_cs sub-streams usage

Let tu_cs_begin_sub_stream imply tu_cs_reserve_space, and
tu_cs_end_sub_stream imply tu_cs_sanity_check.  Callers are no
longer required to call them (but can still do if they choose to).

5 years agoturnip: fix tu_cs sub-streams
Chia-I Wu [Mon, 25 Feb 2019 22:37:55 +0000 (14:37 -0800)]
turnip: fix tu_cs sub-streams

Update cs->start in tu_cs_end_sub_stream.  Otherwise, the entry
would include commands from all prior sub-streams.

5 years agoturnip: tu_cs_emit_array
Chia-I Wu [Mon, 25 Feb 2019 22:57:03 +0000 (14:57 -0800)]
turnip: tu_cs_emit_array

Array version of tu_cs_emit.  Useful for updating multiple
consecutive array-like registers, or loading a shader binary with
SS6_DIRECT.

5 years agoturnip: add tu_cs_discard_entries
Chia-I Wu [Mon, 25 Feb 2019 22:49:34 +0000 (14:49 -0800)]
turnip: add tu_cs_discard_entries

We will start a draw IB at the beginning of a subpass and consume it
at the end of the subpass.  With tu_cs_discard_entries, we can reuse
the same tu_cs for all subpasses.

5 years agoturnip: more/better asserts for tu_cs
Chia-I Wu [Mon, 25 Feb 2019 22:55:06 +0000 (14:55 -0800)]
turnip: more/better asserts for tu_cs

Asserting (cur < end) in tu_cs_emit catches much less programming
errors comparing to asserting (cur < reserved_end).  We should never
write more commands than what we have reserved.

Assert IB is non-empty and sane in tu_cs_emit_ib.

5 years agoturnip: use 32-bit offset in tu_cs_entry
Chia-I Wu [Mon, 25 Feb 2019 22:44:52 +0000 (14:44 -0800)]
turnip: use 32-bit offset in tu_cs_entry

We don't support nor expect BOs to be that big in tu_cs.

5 years agoturnip: mark IBs for dumping
Chia-I Wu [Mon, 25 Feb 2019 22:32:36 +0000 (14:32 -0800)]
turnip: mark IBs for dumping

Includes IBs in kernel cmdbuf dumps.

5 years agoturnip: use the platform defines in vk.xml instead of hard-coding them
Eric Engestrom [Wed, 27 Feb 2019 12:31:06 +0000 (12:31 +0000)]
turnip: use the platform defines in vk.xml instead of hard-coding them

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoturnip: Add todo for copies.
Bas Nieuwenhuizen [Thu, 21 Feb 2019 21:39:22 +0000 (22:39 +0100)]
turnip: Add todo for copies.

5 years agoturnip: Add buffer->image DMA copies.
Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:43:24 +0000 (16:43 +0100)]
turnip: Add buffer->image DMA copies.

Passes dEQP-VK.api.copy_and_blit.core.buffer_to_image.*

5 years agoturnip: Add image->buffer DMA copies.
Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:13:23 +0000 (16:13 +0100)]
turnip: Add image->buffer DMA copies.

Passes dEQP-VK.api.copy_and_blit.core.image_to_buffer.*

5 years agoturnip: Implement buffer->buffer DMA copies.
Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:09:27 +0000 (16:09 +0100)]
turnip: Implement buffer->buffer DMA copies.

Passes dEQP-VK.api.copy_and_blit.core.buffer_to_buffer.*

5 years agoturnip: Add tu6_rb_fmt_to_ifmt.
Bas Nieuwenhuizen [Mon, 4 Feb 2019 13:52:34 +0000 (14:52 +0100)]
turnip: Add tu6_rb_fmt_to_ifmt.

5 years agoturnip: Make tu6_emit_event_write shared.
Bas Nieuwenhuizen [Mon, 18 Feb 2019 13:49:52 +0000 (14:49 +0100)]
turnip: Make tu6_emit_event_write shared.

5 years agoturnip: Add buffer memory binding.
Bas Nieuwenhuizen [Tue, 15 Jan 2019 21:54:15 +0000 (22:54 +0100)]
turnip: Add buffer memory binding.

5 years agoturnip: respect color attachment formats
Chia-I Wu [Thu, 14 Feb 2019 18:53:20 +0000 (10:53 -0800)]
turnip: respect color attachment formats

Make tu6_get_native_format available to tu_cmd_buffer and start
using of it.

5 years agoturnip: preliminary support for fences
Chia-I Wu [Thu, 14 Feb 2019 22:36:52 +0000 (14:36 -0800)]
turnip: preliminary support for fences

This should be quite complete feature-wise.  External fences are
still missing.  We probably also want to add a simpler path to
tu_WaitForFences for when fenceCount == 1.

5 years agoturnip: fix VkClearValue packing
Chia-I Wu [Wed, 13 Feb 2019 18:23:32 +0000 (10:23 -0800)]
turnip: fix VkClearValue packing

Add tu_pack_clear_value to correctly pack VkClearValue according to
VkFormat.  It ignores the component order defined by VkFormat, and
always packs to WZYX order.

5 years agoturnip: add support for VK_KHR_external_memory_{fd,dma_buf}
Chia-I Wu [Fri, 1 Feb 2019 18:36:19 +0000 (10:36 -0800)]
turnip: add support for VK_KHR_external_memory_{fd,dma_buf}

5 years agoturnip: advertise VK_KHR_external_memory
Chia-I Wu [Fri, 1 Feb 2019 18:27:28 +0000 (10:27 -0800)]
turnip: advertise VK_KHR_external_memory

AFAICT, it is supported.  We don't need to handle any of the new
structs because our BOs can always be exported.

5 years agoturnip: advertise VK_KHR_external_memory_capabilities
Chia-I Wu [Fri, 1 Feb 2019 18:12:38 +0000 (10:12 -0800)]
turnip: advertise VK_KHR_external_memory_capabilities

AFAICT, it is supported.

5 years agoturnip: add functions to import/export prime fd
Chia-I Wu [Thu, 31 Jan 2019 23:03:03 +0000 (15:03 -0800)]
turnip: add functions to import/export prime fd

Add tu_bo_init_dmabuf, tu_bo_export_dmabuf, tu_gem_import_dmabuf,
and tu_gem_export_dmabuf.

5 years agoturnip: Fix error behavior for VkPhysicalDeviceExternalImageFormatInfo
Chad Versace [Sat, 2 Feb 2019 00:48:44 +0000 (16:48 -0800)]
turnip: Fix error behavior for VkPhysicalDeviceExternalImageFormatInfo

If the handle type is unsupported, then the spec requires us to return
VK_ERROR_FORMAT_NOT_SUPPORTED.

Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Closes: https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/merge_requests/17
5 years agoturnip: add a more complete format table
Chia-I Wu [Fri, 25 Jan 2019 19:13:54 +0000 (11:13 -0800)]
turnip: add a more complete format table

A format table is an array of tu_native_format.  Table lookup is
done through array indexing.

This commit defines a single format table for core VkFormat.  It is
derived from the table in the gallium driver.  There might be errors
introduced in the process of the conversion.

When an extension that defines new VkFormat is supported, we need to
add a new table for the extension.

5 years agoturnip: preliminary support for loadOp and storeOp
Chia-I Wu [Fri, 11 Jan 2019 23:01:26 +0000 (15:01 -0800)]
turnip: preliminary support for loadOp and storeOp

 - create tile_load_ib and tile_store_ib at the beginning of each
   subpass
 - execute the IBs at the end of each subpass
 - no DONT_CARE support
 - no subpass dependency analysis and subpass merging
 - no zs support
 - no true VkImageView support
   - assume VK_FORMAT_B8G8R8A8_UNORM
   - no tiling
   - no MSAA

This also removes cur_cs from tu_cmd_buffer.

5 years agoturnip: add TU_CS_MODE_SUB_STREAM
Chia-I Wu [Tue, 29 Jan 2019 23:00:34 +0000 (15:00 -0800)]
turnip: add TU_CS_MODE_SUB_STREAM

When in TU_CS_MODE_SUB_STREAM, tu_cs_begin_sub_stream (or
tu_cs_end_sub_stream) should be called instead of tu_cs_begin (or
tu_cs_end).  It gives the caller a TU_CS_MODE_EXTERNAL cs to emit
commands to.

5 years agoturnip: add tu_cs_mode
Chia-I Wu [Mon, 28 Jan 2019 22:33:20 +0000 (14:33 -0800)]
turnip: add tu_cs_mode

Add tu_cs_mode and TU_CS_MODE_EXTERNAL.  When in
TU_CS_MODE_EXTERNAL, tu_cs wraps an external buffer and can not
grow.

This also moves tu_cs* up in tu_private.h, such that other structs
can embed tu_cs_entry.

5 years agoturnip: provide both emit_ib and emit_call
Chia-I Wu [Tue, 29 Jan 2019 18:43:48 +0000 (10:43 -0800)]
turnip: provide both emit_ib and emit_call

tu_cs_emit_ib emits a CP_INDIRECT_BUFFER for a BO.  tu_cs_emit_call
emits a CP_INDIRECT_BUFFER for each entry of a target cs.

5 years agoturnip: add tu_cs_sanity_check
Chia-I Wu [Tue, 29 Jan 2019 00:31:54 +0000 (16:31 -0800)]
turnip: add tu_cs_sanity_check

It replaces tu_cs_reserve_space_assert and can be called at any
time to sanity check tu_cs.

5 years agoturnip: never fail tu_cs_begin/tu_cs_end
Chia-I Wu [Mon, 28 Jan 2019 23:55:40 +0000 (15:55 -0800)]
turnip: never fail tu_cs_begin/tu_cs_end

Error checking tu_cs_begin/tu_cs_end is too tedious for the callers.
Move tu_cs_add_bo and tu_cs_reserve_entry to tu_cs_reserve_space
such that tu_cs_begin/tu_cs_end never fails.

5 years agoturnip: specify initial size in tu_cs_init
Chia-I Wu [Tue, 29 Jan 2019 00:24:48 +0000 (16:24 -0800)]
turnip: specify initial size in tu_cs_init

We will drop size parameter from tu_cs_begin shortly, such that
tu_cs_begin never fails.

5 years agoturnip: add tu_cs_{reserve,add}_entry
Chia-I Wu [Mon, 28 Jan 2019 23:52:36 +0000 (15:52 -0800)]
turnip: add tu_cs_{reserve,add}_entry

We will stop calling tu_cs_reserve_entry in tu_cs_end shortly, such
that tu_cs_end never fails.

5 years agoturnip: add internal helpers for tu_cs
Chia-I Wu [Tue, 29 Jan 2019 22:09:17 +0000 (14:09 -0800)]
turnip: add internal helpers for tu_cs

Add tu_cs_get_offset, tu_cs_get_size, tu_cs_get_space, and
tu_cs_is_empty.

5 years agoturnip: add tu_tiling_config
Chia-I Wu [Tue, 22 Jan 2019 18:27:22 +0000 (10:27 -0800)]
turnip: add tu_tiling_config

We need the current color/depth/stencil attachments and the current
render area to compute the tiling config.

We compute the tiling config at the beginning of each subpass for
the moment.  We should change that when the driver can reorder/merge
subpasses.

It is very common that the render area is the entire framebuffer.
We might want to optimize for the case and compute the tiling config
in tu_framebuffer ctor.