git.libre-soc.org Git - mesa.git/log

commit | commitdiff | tree

Danylo Piliaiev [Tue, 12 Mar 2019 15:13:47 +0000 (17:13 +0200)]

anv: Fix destroying descriptor sets when pool gets reset

pool->next and pool->free_list were reset before their usage in
anv_descriptor_pool_free_set

Fixes: 775aabdd "anv: destroy descriptor sets when pool gets reset"
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
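
The ordering problem described in this fix can be illustrated with a toy pool. This is only a sketch: `toy_pool`, `toy_set` and the three functions are hypothetical stand-ins, not the anv data structures. The point it shows is that the per-set free routine reads the pool bookkeeping, so the bookkeeping has to be cleared after the sets are freed, not before.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SETS 8

/* Hypothetical stand-ins for a descriptor set and its pool. */
struct toy_set { size_t offset, size; };

struct toy_pool {
    size_t next;                       /* end of the linearly allocated region */
    size_t free_count;                 /* entries currently on the free list */
    struct toy_set sets[TOY_MAX_SETS];
    size_t set_count;
};

static void toy_pool_alloc(struct toy_pool *pool, size_t size)
{
    assert(pool->set_count < TOY_MAX_SETS);
    pool->sets[pool->set_count].offset = pool->next;
    pool->sets[pool->set_count].size = size;
    pool->set_count++;
    pool->next += size;
}

/* Reads pool->next: the tail set rewinds it, any other set goes on the
 * free list. It therefore needs the bookkeeping to still be valid. */
static void toy_pool_free_set(struct toy_pool *pool, const struct toy_set *set)
{
    if (set->offset + set->size == pool->next)
        pool->next = set->offset;
    else
        pool->free_count++;
}

static void toy_pool_reset(struct toy_pool *pool)
{
    /* Free the sets (newest first) while next/free_count are still valid... */
    while (pool->set_count > 0)
        toy_pool_free_set(pool, &pool->sets[--pool->set_count]);

    /* ...and only then clear the bookkeeping. */
    pool->next = 0;
    pool->free_count = 0;
}
```

Swapping the two halves of `toy_pool_reset()` would make every free compare against a `next` that is already zero, which is the kind of use-after-reset the message describes.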

commit | commitdiff | tree

Eric Anholt [Mon, 11 Mar 2019 22:59:24 +0000 (15:59 -0700)]

v3d: Disable PIPE_CAP_BLIT_BASED_TEXTURE_TRANSFER.

This reduces the runtime of dEQP-GLES3.functional.shaders.precision.* from
11.5s to 3.3s. This brings CTS runs down to 4 hours on one of my target
devices.

commit | commitdiff | tree

Jason Ekstrand [Thu, 7 Mar 2019 21:01:37 +0000 (15:01 -0600)]

intel/nir: Vectorize all IO

The IO scalarization pass that we run to help with linking ends up
turning some shader I/O such as that for tessellation and geometry
shaders into many scalar URB operations rather than one vector one.  To
alleviate this, we now vectorize the I/O once again.  This fixes a 10%
performance regression in the GfxBench tessellation test that was caused
by scalarizing.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15224023 -> 15220871 (-0.02%)
    instructions in affected programs: 342009 -> 338857 (-0.92%)
    helped: 1236
    HURT: 443

    total spills in shared programs: 23471 -> 23465 (-0.03%)
    spills in affected programs: 6 -> 0
    helped: 1
    HURT: 0

    total fills in shared programs: 31770 -> 31766 (-0.01%)
    fills in affected programs: 4 -> 0
    helped: 1
    HURT: 0

Cycles was just a lot of churn due to moves being in different places.  Most
of the pure churn in instructions was +/- one or two instructions in
fragment shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf56a "intel/nir: Call nir_lower_io_to_scalar_early"
Fixes: 8d8222461f9d "intel/nir: Enable nir_opt_find_array_copies"
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

commit | commitdiff | tree

Jason Ekstrand [Wed, 6 Mar 2019 21:21:51 +0000 (15:21 -0600)]

nir: Add a pass for lowering IO back to vector when possible

This pass tries to turn scalar and array-of-scalar IO variables into
vector IO variables whenever possible.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Cc: "19.0" <mesa-stable@lists.freedesktop.org>

commit | commitdiff | tree

Rhys Perry [Thu, 6 Dec 2018 14:58:50 +0000 (14:58 +0000)]

ac/nir: fix 16-bit ssbo stores

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

commit | commitdiff | tree

pal1000 [Thu, 7 Mar 2019 08:38:10 +0000 (10:38 +0200)]

scons: Compatibility with Scons development version string

This ensures the Mesa3D build doesn't fail in this case, as encountered when
bisecting Scons source code while regression testing
https://bugs.freedesktop.org/show_bug.cgi?id=109443
and when testing 3.0.5.a.2

Technical details:
Scons version string has consistently been in this format:
MajorVersion.MinorVersion.Patch[.alpha/beta.yyyymmdd]
so these formulas should strip alpha/beta flags and return Scons version:

- as string - `'.'.join(SCons.__version__.split('.')[:3])`
- as tuple of integers - `tuple(map(int, SCons.__version__.split('.')[:3]))`

- v2: Fixed Scons version retrieval formulas as string and tuple of integers.
- v3: Fixed Scons version string format description.

Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>

commit | commitdiff | tree

Tapani Pälli [Tue, 12 Mar 2019 12:01:26 +0000 (14:01 +0200)]

anv: revert "anv: release memory allocated by glsl types during spirv_to_nir"

This reverts commit 47fc359822494935852de1e70e4d840b2fe6a25c.

The reason is that the patch did not take into account the situation where
we might have both OpenGL and Vulkan using glsl_types at the same time.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

commit | commitdiff | tree

Connor Abbott [Mon, 4 Mar 2019 17:00:30 +0000 (18:00 +0100)]

radeonsi/nir: Use nir stripping pass

This reduces compilation time for my shader-db collection from around 40
seconds to 30, vs. 19 seconds for TGSI. There are still some shaders
that TGSI caches but NIR doesn't, partly because of more aggressive
cross-stage optimizations with NIR.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

commit | commitdiff | tree

Connor Abbott [Mon, 4 Mar 2019 16:51:12 +0000 (17:51 +0100)]

nir: Add a stripping pass for improved cacheability

Oftentimes various nir shaders after lowering will be the same, or
almost the same. For example, this can happen when the same shader is
linked with different shaders to form different pipelines and
cross-stage optimizations don't kick in to change it. We want to avoid
running the backend twice on these shaders. We were already doing this
with radeonsi, but we were storing a few extra pieces of information
that made this much less effective compared to TGSI. The worst offender
by far was the program name, which caused most of the cache misses. This
pass strips out these pieces of information, controlled by the NIR_STRIP
debug env variable.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
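
Why a debug-only field such as the program name hurts cacheability can be sketched as follows. The struct, the FNV-1a hash and both key functions are assumptions made for illustration; they are not how NIR or the radeonsi cache actually serialize or hash shaders.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shader record: `code` stands in for the serialized IR. */
struct toy_shader {
    const char *name;   /* debug-only label, e.g. a program name */
    const char *code;
};

/* 32-bit FNV-1a over a NUL-terminated string, continuing from `h`. */
static uint32_t toy_fnv1a(uint32_t h, const char *s)
{
    for (; *s; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

/* Cache key over everything, including the debug name. */
static uint32_t toy_key_unstripped(const struct toy_shader *sh)
{
    return toy_fnv1a(toy_fnv1a(2166136261u, sh->name), sh->code);
}

/* Cache key after "stripping": the name no longer contributes. */
static uint32_t toy_key_stripped(const struct toy_shader *sh)
{
    return toy_fnv1a(2166136261u, sh->code);
}
```

Two shaders with identical `code` but different `name` values get the same stripped key, so the second one can reuse the cached backend result.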

commit | commitdiff | tree

Samuel Pitoiset [Mon, 11 Mar 2019 09:25:53 +0000 (10:25 +0100)]

radv: fix pointSizeRange limits

The values should match the ones that are emitted.

This fixes new CTS dEQP-VK.rasterization.primitive_size.points.*.

Fixes: f4e499ec791 ("radv: add initial non-conformant radv vulkan driver")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

commit | commitdiff | tree

Sagar Ghuge [Thu, 14 Feb 2019 06:22:16 +0000 (22:22 -0800)]

iris: Flag fewer dirty bits in BLORP

v2: 1) Skip flagging IRIS_DIRTY_DEPTH_BUFFER if
BLORP_BATCH_NO_EMIT_DEPTH_STENCIL is set (Kenneth Graunke)
2) Add missing flags (Kenneth Graunke)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Timothy Arceri [Fri, 1 Mar 2019 10:35:41 +0000 (21:35 +1100)]

st/glsl_to_nir: fix incorrect array access

This fixes a segfault when we try to access the array using a
-1 when the array wasn't allocated in the first place.

Before 7536af670b75 we would just access a pre-allocated array
that was also load/stored to/from the shader cache. But now the
cache will no longer allocate these arrays if they are empty.
The change resulted in tests such as the following segfaulting
when run with a warm shader cache.

tests/spec/arb_arrays_of_arrays/execution/sampler/fs-struct-const-index.shader_test

commit | commitdiff | tree

Brian Paul [Tue, 12 Mar 2019 02:12:15 +0000 (20:12 -0600)]

nir: silence a couple new compiler warnings

[33/630] Compiling C object 'src/compiler/nir/nir@sta/nir_loop_analyze.c.o'.
../src/compiler/nir/nir_loop_analyze.c: In function ‘try_find_trip_count_vars_in_iand’:
../src/compiler/nir/nir_loop_analyze.c:846:29: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
    if (*ind == NULL || *ind && (*ind)->type != basic_induction ||
                             ^
[85/630] Compiling C object 'src/compiler/nir/nir@sta/nir_opt_loop_unroll.c.o'.
../src/compiler/nir/nir_opt_loop_unroll.c: In function ‘complex_unroll_single_terminator’:
../src/compiler/nir/nir_opt_loop_unroll.c:494:17: warning: unused variable ‘unroll_loc’ [-Wunused-variable]
    nir_cf_node *unroll_loc =
                 ^
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
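
For readers unfamiliar with the first warning: in C, `&&` binds tighter than `||`, so `a || b && c` already parses as `a || (b && c)`, and writing the parentheses explicitly only documents the grouping. The sketch below uses hypothetical function names and does not claim to reproduce how this commit changed nir_loop_analyze.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Same meaning as the unparenthesized `a || b && c`, with the grouping
 * spelled out so that -Wparentheses has nothing to suggest. */
static bool toy_explicit_grouping(bool a, bool b, bool c)
{
    return a || (b && c);
}

/* The other possible grouping, which is a different expression. */
static bool toy_other_grouping(bool a, bool b, bool c)
{
    return (a || b) && c;
}
```

The two groupings differ for `a = true, b = false, c = false`, which is why the compiler asks for the intent to be made explicit.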

commit | commitdiff | tree

Alyssa Rosenzweig [Sat, 9 Mar 2019 00:45:23 +0000 (00:45 +0000)]

panfrost: Identify fragment_extra flags

The fragment_extra structure contains additional fields extending the
MRT framebuffer descriptor, snuck in between the main framebuffer
descriptor and the render targets. Its fields include those related to
transaction elimination and depth/stencil buffers. This patch identifies
the flags field (previously just "unk" with some magic values) as well
as identifying some (but not all) flags set by the driver.

The process of identifying flags brought a bug to light where
transaction elimination (checksumming) could not be enabled unless AFBC
was in-use. This issue is now resolved.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

commit | commitdiff | tree

Alyssa Rosenzweig [Sat, 9 Mar 2019 00:12:07 +0000 (00:12 +0000)]

panfrost: Document "depth-buffer writeback" bit

This bit, if set, causes the depth buffer to be copied from GPU tile
memory to the provided depth buffer in main memory. If not set, the GPU
will not access the main memory (saving considerable memory bandwidth if
depth results are not actually used).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

commit | commitdiff | tree

Alyssa Rosenzweig [Fri, 8 Mar 2019 23:41:12 +0000 (23:41 +0000)]

panfrost: Support linear depth textures

This combination has not yet been seen "in the wild" in traces, but to
support linear depth FBOs, ~bruteforce reveals this bit pattern is
necessary. It's not yet clear why the meanings of 0x1 and 0x2 are
essentially flipped (tiled vs linear for colour, linear vs some sort of
tiled for depth).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

commit | commitdiff | tree

Alyssa Rosenzweig [Fri, 8 Mar 2019 23:36:02 +0000 (23:36 +0000)]

panfrost: Allocate dedicated slab for linear BOs

Previously, linear BOs shared memory with each other to minimize kernel
round-trips / latency, as well as to work around a bug in the free_slab
function. These concerns are invalid now, but continuing to use the slab
allocator for BOs resulted in memory allocation errors. This issue was
aggravated, though not introduced (so not a real regression) in the
previous commit.

v2 (unreviewed): Fix bug in v1 preventing munmaps from working

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

commit | commitdiff | tree

Alyssa Rosenzweig [Thu, 7 Mar 2019 04:42:49 +0000 (04:42 +0000)]

panfrost: Determine framebuffer format bits late

Again, these formats are only properly known at the time of fragment job
emit. Rather than hardcoding the format, at least for MFBD we begin to
construct the format bits on-demand. This cleans up the code,
futureproofs for ES3 framebuffer formats, and should fix bugs regarding
FBO colour swizzles.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

commit | commitdiff | tree

Alyssa Rosenzweig [Thu, 7 Mar 2019 04:19:21 +0000 (04:19 +0000)]

panfrost: Delay color buffer setup

In an effort to clean up framebuffer management code, we delay
colour buffer setup until the FRAGMENT job is actually emitted, allowing
the AFBC and linear codepaths to be unified.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

commit | commitdiff | tree

Alyssa Rosenzweig [Thu, 7 Mar 2019 03:52:20 +0000 (03:52 +0000)]

panfrost: Combine has_afbc/tiled in layout enum

AFBC, tiled, and linear BO layouts are mutually exclusive; they should
be coupled via a single enum rather than ad hoc checks of booleans.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
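
The design argument here (mutually exclusive states belong in one enum rather than in several booleans) can be sketched as below. The enum and helper names are hypothetical and are not claimed to match the identifiers used in the panfrost driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout enum: exactly one value can hold at a time. */
enum toy_bo_layout {
    TOY_LAYOUT_LINEAR,
    TOY_LAYOUT_TILED,
    TOY_LAYOUT_AFBC,
};

/* Converts the old pair of booleans into the enum. Two booleans can
 * express four states, one of which (both set) is meaningless; the
 * conversion rejects it, and the enum cannot represent it at all. */
static bool toy_layout_from_flags(bool has_afbc, bool tiled,
                                  enum toy_bo_layout *out)
{
    if (has_afbc && tiled)
        return false;

    *out = has_afbc ? TOY_LAYOUT_AFBC
         : tiled    ? TOY_LAYOUT_TILED
                    : TOY_LAYOUT_LINEAR;
    return true;
}
```

Code that previously tested the booleans ad hoc can then switch on the single enum value.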

commit | commitdiff | tree

Alyssa Rosenzweig [Thu, 7 Mar 2019 03:24:45 +0000 (03:24 +0000)]

panfrost: Cleanup needless if in create_bo

I'm not sure why we were checking for these additional criteria (likely
inherited from some other driver); remove the needless checks to clean up
the code and perhaps fix some bugs down the line.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>

commit | commitdiff | tree

Kenneth Graunke [Fri, 17 Nov 2017 07:47:43 +0000 (23:47 -0800)]

i965: Reimplement all the PIPE_CONTROL rules.

This implements virtually all documented PIPE_CONTROL restrictions
in a centralized helper.  You now simply ask for the operations you
want, and the pipe control "brain" will figure out exactly what pipe
controls to emit to make that happen without tanking your system.

The hope is that this will fix some intermittent flushing issues as
well as GPU hangs.  However, it also has a high risk of causing GPU
hangs and other regressions, as this is a particularly sensitive
area and poking the bear isn't always advisable.

Mark Janes noted that this patch helps with some GPU hangs on Icelake.

This does re-enable the VF Invalidate => Write Immediate workaround
on Gen8, which had been disabled (bug 103787) due to GPU hangs.  The
old code did this workaround after another which would have added CS
stall bits, so it missed a workaround.  The new code orders them
properly and appears to work.

v4: Don't pass "bo, offset, imm" to a recursive CS stall (caught by
    Topi Pohjolainen), drop Gen10 workarounds that are unnecessary for
    production hardware.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

commit | commitdiff | tree

Kenneth Graunke [Thu, 1 Nov 2018 22:55:51 +0000 (15:55 -0700)]

i965: Use genxml for emitting PIPE_CONTROL.

While this does add a bunch of boilerplate, it also protects us against
the hardware moving bits, or changing their meaning. For something as
finicky as PIPE_CONTROL, the extra safety seems worth it.

We turn PIPE_CONTROL_* into a bitfield of arbitrary flags, and then
pack them appropriately.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
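
The idea of keeping abstract flags and packing them per hardware generation can be sketched as follows. Everything here is made up for illustration: the flag names, the two "generations" and every bit position are assumptions, and none of them reflect the real PIPE_CONTROL layout or the genxml-generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Abstract, hardware-independent flags (hypothetical names and values). */
enum toy_flag {
    TOY_FLAG_CS_STALL    = 1u << 0,
    TOY_FLAG_DEPTH_FLUSH = 1u << 1,
};

/* Made-up per-generation layouts: the same fields at different bits. */
struct toy_bit_layout {
    unsigned cs_stall_bit;
    unsigned depth_flush_bit;
};

static const struct toy_bit_layout toy_gen_a = { 20, 0 };
static const struct toy_bit_layout toy_gen_b = { 7, 12 };

/* Callers only ever see the abstract flags; the packer is the single
 * place that knows where each field lives for a given generation. */
static uint32_t toy_pack(const struct toy_bit_layout *layout, uint32_t flags)
{
    uint32_t dw = 0;

    if (flags & TOY_FLAG_CS_STALL)
        dw |= 1u << layout->cs_stall_bit;
    if (flags & TOY_FLAG_DEPTH_FLUSH)
        dw |= 1u << layout->depth_flush_bit;
    return dw;
}
```

If a later generation moved a field, only its layout table would change, which is the protection the message refers to.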

commit | commitdiff | tree

Kenneth Graunke [Thu, 1 Nov 2018 22:55:21 +0000 (15:55 -0700)]

i965: Rename ISP_DIS to INDIRECT_STATE_POINTERS_DISABLE.

Clearer name.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

commit | commitdiff | tree

Kenneth Graunke [Fri, 17 Nov 2017 06:37:02 +0000 (22:37 -0800)]

i965: Move some genX infrastructure to genX_boilerplate.h.

This will let us make multiple genX_*.c files, without copy and pasting
all this boilerplate.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>

commit | commitdiff | tree

Brian Paul [Fri, 8 Mar 2019 22:50:58 +0000 (15:50 -0700)]

gallium/winsys/kms: fix incomplete type compilation failure

Fixes:
../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c: In function ‘kms_sw_displaytarget_from_handle’:
../src/gallium/winsys/sw/kms-dri/kms_dri_sw_winsys.c:402:60: error: dereferencing pointer to incomplete type ‘const struct pipe_resource’
templ->format,
^

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

commit | commitdiff | tree

Brian Paul [Fri, 8 Mar 2019 22:49:49 +0000 (15:49 -0700)]

drisw: fix incomplete type compilation failure

Fixes:
../src/gallium/winsys/sw/dri/dri_sw_winsys.c: In function ‘dri_sw_displaytarget_display’:
../src/gallium/winsys/sw/dri/dri_sw_winsys.c:255:39: error: dereferencing pointer to incomplete type ‘struct pipe_box’
offset = dri_sw_dt->stride * box->y;
^

Reviewed-by: Mathias Fröhlich <Mathias.Froehlich@web.de>

commit | commitdiff | tree

Brian Paul [Fri, 8 Mar 2019 03:39:49 +0000 (20:39 -0700)]

docs: try to improve the Meson documentation (v2)

Add new Introduction and Advanced Usage sections.
Spell out a few more details, like "ninja install".
Improve the layout around example commands.
Fix grammatical errors and tighten up the text.
Explain the --prefix option.

v2: Remove language about 'ninja clean' and move link to Meson
information about separate build directories earlier in the page.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

commit | commitdiff | tree

Brian Paul [Wed, 6 Mar 2019 23:20:55 +0000 (16:20 -0700)]

st/mesa: minor refactoring of texture/sampler delete code

Rename st_texture_free_sampler_views() to
st_delete_texture_sampler_views() to align with
st_DeleteTextureObject(), its only caller.

Move the call to st_texture_release_all_sampler_views() from
st_DeleteTextureObject() to st_delete_texture_sampler_views()
so all the sampler view clean-up code is in one place.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

commit | commitdiff | tree

Brian Paul [Wed, 6 Mar 2019 23:15:19 +0000 (16:15 -0700)]

st/mesa: rename st_texture_release_sampler_view()

To st_texture_release_context_sampler_view() to be more clear
that it's context-specific.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

commit | commitdiff | tree

Brian Paul [Wed, 6 Mar 2019 23:09:09 +0000 (16:09 -0700)]

st/mesa: add/improve sampler view comments

Reviewed-by: Neha Bhende <bhenden@vmware.com>

commit | commitdiff | tree

Brian Paul [Thu, 7 Mar 2019 16:55:09 +0000 (09:55 -0700)]

st/mesa: move around some code in st_context.c

st_init_driver_functions() is only called in st_context.c so there's
no need for the prototype in st_context.h

To avoid a forward declaration of st_init_driver_functions() in
st_context.c, we need to move around several other functions.

No functional change.

Reviewed-by: Neha Bhende <bhenden@vmware.com>

commit | commitdiff | tree

Brian Paul [Thu, 7 Mar 2019 16:21:53 +0000 (09:21 -0700)]

st/mesa: move utility functions, macros into new st_util.h file

To de-clutter st_context.h.

Clean up remaining function prototypes in st_context.h.

The st_vp_uses_current_values() helper is only used in st_context.c
so move it there.

The st_get_active_states() function is only used in st_context.c so
remove its prototype in st_context.h

Reviewed-by: Neha Bhende <bhenden@vmware.com>

commit | commitdiff | tree

Juan A. Suarez Romero [Mon, 11 Mar 2019 17:33:54 +0000 (18:33 +0100)]

anv: destroy descriptor sets when pool gets reset

As stated in Vulkan spec:
   "Resetting a descriptor pool recycles all of the resources from all
    of the descriptor sets allocated from the descriptor pool back to
    the descriptor pool, and the descriptor sets are implicitly freed."

This fixes dEQP-VK.api.descriptor_pool.*

Fixes: 14f6275c92f1 "anv/descriptor_set: add reference counting for..."
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>

commit | commitdiff | tree

Timothy Arceri [Thu, 6 Dec 2018 05:00:40 +0000 (16:00 +1100)]

nir: find induction/limit vars in iand instructions

This will be used to help find the trip count of loops that look
like the following:

   while (a < x && i < 8) {
      ...
      i++;
   }

Where the NIR will end up looking something like this:

   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18

      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }

On RADV this unrolls a bunch of loops in F1-2017 shaders.

Totals from affected shaders:
SGPRS: 4112 -> 4136 (0.58 %)
VGPRS: 4132 -> 4052 (-1.94 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 515444 -> 587720 (14.02 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 194 -> 196 (1.03 %)
Wait states: 0 -> 0 (0.00 %)

It also unrolls a couple of loops in shader-db on radeonsi.

Totals from affected shaders:
SGPRS: 128 -> 128 (0.00 %)
VGPRS: 64 -> 64 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 6880 -> 9504 (38.14 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 16 -> 16 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Timothy Arceri [Thu, 6 Dec 2018 04:56:55 +0000 (15:56 +1100)]

nir: pass nir_op to calculate_iterations()

Rather than getting this from the alu instruction this allows us
some flexibility. In the following pass we instead pass the
inverse op.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Timothy Arceri [Thu, 6 Dec 2018 02:29:05 +0000 (13:29 +1100)]

nir: add get_induction_and_limit_vars() helper to loop analysis

This helps make find_trip_count() a little easier to follow but
will also be used by a following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Timothy Arceri [Thu, 6 Dec 2018 00:17:45 +0000 (11:17 +1100)]

nir: add helper to return inversion op of a comparison

This will be used to help find the trip count of loops that look
like the following:

   while (a < x && i < 8) {
      ...
      i++;
   }

Where the NIR will end up looking something like this:

   vec1 32 ssa_1 = load_const (0x00000004 /* 0.000000 */)
   loop {
      ...
      vec1 1 ssa_12 = ilt ssa_225, ssa_11
      vec1 1 ssa_17 = ilt ssa_226, ssa_1
      vec1 1 ssa_18 = iand ssa_12, ssa_17
      vec1 1 ssa_19 = inot ssa_18

      if ssa_19 {
         ...
         break
      } else {
         ...
      }
   }

So in order to find the trip count we need to find the inverse of
ilt.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
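
The reason the inverse is needed follows from De Morgan's law: the loop above breaks when `!(a < x && i < 8)`, which is `(a >= x) || (i >= 8)`, so the exit condition for each operand of the iand is the inverse of its comparison. A minimal sketch of such an inversion helper, using a hypothetical `toy_cmp` enum rather than the real nir_op values:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical comparison ops; not the nir_op enumeration. */
enum toy_cmp { TOY_LT, TOY_GE, TOY_EQ, TOY_NE };

/* Returns the op that is true exactly when `op` is false. */
static enum toy_cmp toy_invert(enum toy_cmp op)
{
    switch (op) {
    case TOY_LT: return TOY_GE;
    case TOY_GE: return TOY_LT;
    case TOY_EQ: return TOY_NE;
    case TOY_NE: return TOY_EQ;
    }
    return op;
}

static bool toy_eval(enum toy_cmp op, int a, int b)
{
    switch (op) {
    case TOY_LT: return a < b;
    case TOY_GE: return a >= b;
    case TOY_EQ: return a == b;
    case TOY_NE: return a != b;
    }
    return false;
}
```

With this, the `i < 8` operand of the iand corresponds to the exit test `i >= 8`, which has the shape of a plain single-comparison terminator.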

commit | commitdiff | tree

Timothy Arceri [Thu, 6 Dec 2018 00:12:12 +0000 (11:12 +1100)]

nir: simplify the loop analysis trip count code a little

Here we create a helper is_supported_terminator_condition()
and use that rather than embedding all the trip count code
inside a switch.

The new helper will also be used in a following patch.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Timothy Arceri [Tue, 20 Nov 2018 04:23:45 +0000 (15:23 +1100)]

nir: unroll some loops with a variable limit

Some loops can have a single terminator but the exact trip
count is still unknown. For example:

   for (int i = 0; i < imin(x, 4); i++)
      ...

Shader-db results radeonsi (all affected are from Tropico 5):

Totals from affected shaders:
SGPRS: 144 -> 152 (5.56 %)
VGPRS: 124 -> 108 (-12.90 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5180 -> 6640 (28.19 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 17 -> 21 (23.53 %)
Wait states: 0 -> 0 (0.00 %)

Shader-db results i965 (SKL):

total loops in shared programs: 3808 -> 3802 (-0.16%)
loops in affected programs: 6 -> 0
helped: 6
HURT: 0

vkpipeline-db results RADV (Unrolls some Skyrim VR shaders):

Totals from affected shaders:
SGPRS: 304 -> 304 (0.00 %)
VGPRS: 296 -> 292 (-1.35 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15756 -> 25884 (64.28 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 29 -> 29 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

v2: fix bug where last iteration would get optimised away by
    mistake.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
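
What unrolling a loop with a limit of `imin(x, 4)` amounts to can be sketched in plain C. The trip count is unknown but bounded by 4, so four guarded copies of the body are enough. The function names and the summing body are assumptions for illustration, and the exact shape the NIR pass emits may differ.

```c
#include <assert.h>

/* Original form: the limit is min(x, 4), so at most 4 iterations. */
static int toy_sum_loop(const int v[4], int x)
{
    int sum = 0;
    int limit = x < 4 ? x : 4;   /* imin(x, 4) */

    for (int i = 0; i < limit; i++)
        sum += v[i];
    return sum;
}

/* Unrolled form: one copy of the body per possible iteration, each
 * preceded by the variable part of the exit test. The constant part
 * (i < 4) is what bounds the number of copies. */
static int toy_sum_unrolled(const int v[4], int x)
{
    int sum = 0;

    if (0 >= x) return sum;
    sum += v[0];
    if (1 >= x) return sum;
    sum += v[1];
    if (2 >= x) return sum;
    sum += v[2];
    if (3 >= x) return sum;
    sum += v[3];
    return sum;
}
```

Note that the last copy must stay guarded as well, which is the kind of case the v2 note about the last iteration concerns.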

commit | commitdiff | tree

Timothy Arceri [Tue, 20 Nov 2018 02:45:58 +0000 (13:45 +1100)]

nir: calculate trip count for more loops

This adds support to loop analysis for loops where the induction
variable is compared to the result of min(variable, constant).

For example:

   for (int i = 0; i < imin(x, 4); i++)
      ...

We add a new bool to the loop terminator struct in order to
differentiate terminators with this exit condition.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Timothy Arceri [Tue, 20 Nov 2018 03:05:09 +0000 (14:05 +1100)]

nir: add partial loop unrolling support

This adds partial loop unrolling support and makes use of a
guessed trip count based on array access.

The code is written so that we could use partial unrolling
more generally, but for now it's only used when we have guessed
the trip count.

We use partial unrolling for this guessed trip count because it's
possible any out of bounds array access doesn't otherwise affect
the shader, e.g. the stores/loads to/from the array are unused. So
we insert a copy of the loop in the innermost continue branch of
the unrolled loop. Later on it's possible for nir_opt_dead_cf()
to then remove the loop in some cases.

A Renderdoc capture from the Rise of the Tomb Raider benchmark,
reports the following change in an affected compute shader:

GPU duration: 350 -> 325 microseconds

shader-db results radeonsi VEGA (NIR backend):

SGPRS: 1008 -> 816 (-19.05 %)
VGPRS: 684 -> 432 (-36.84 %)
Spilled SGPRs: 539 -> 0 (-100.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 39708 -> 45812 (15.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 105 -> 144 (37.14 %)
Wait states: 0 -> 0 (0.00 %)

shader-db results i965 SKL:

total instructions in shared programs: 13098265 -> 13103359 (0.04%)
instructions in affected programs: 5126 -> 10220 (99.38%)
helped: 0
HURT: 21

total cycles in shared programs: 332039949 -> 331985622 (-0.02%)
cycles in affected programs: 289252 -> 234925 (-18.78%)
helped: 12
HURT: 9

vkpipeline-db results VEGA:

Totals from affected shaders:
SGPRS: 184 -> 184 (0.00 %)
VGPRS: 448 -> 448 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 26076 -> 24428 (-6.32 %) bytes
LDS: 6 -> 6 (0.00 %) blocks
Max Waves: 5 -> 5 (0.00 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
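
The structure of a partial unroll with a guessed trip count can be sketched in plain C. Here the guess is 3 (for instance, the length of an array indexed by the induction variable); because the guess may be wrong, a copy of the original loop remains after the unrolled copies. The function names and the loop body are hypothetical, and this is a sketch of the idea rather than of the pass's output.

```c
#include <assert.h>

/* Hypothetical loop body. */
static int toy_body(int i)
{
    return 2 * i + 1;
}

static int toy_run_original(int n)
{
    int acc = 0;

    for (int i = 0; i < n; i++)
        acc += toy_body(i);
    return acc;
}

static int toy_run_partially_unrolled(int n)
{
    int acc = 0;
    int i = 0;

    /* Three copies of the body (the guessed trip count), each guarded
     * by the real exit test. */
    if (i >= n) return acc;
    acc += toy_body(i++);
    if (i >= n) return acc;
    acc += toy_body(i++);
    if (i >= n) return acc;
    acc += toy_body(i++);

    /* The guess is not a proof, so the original loop is kept for any
     * remaining iterations. If later passes can show it never runs,
     * it can be removed as dead control flow. */
    for (; i < n; i++)
        acc += toy_body(i);
    return acc;
}
```

Both functions return the same value for every `n`; the unrolled one simply runs the first three iterations without loop overhead.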

commit | commitdiff | tree

Timothy Arceri [Mon, 19 Nov 2018 06:01:52 +0000 (17:01 +1100)]

nir: add new partially_unrolled bool to nir_loop

In order to stop continuously partially unrolling the same loop
we add the bool partially_unrolled to nir_loop. We add it here
rather than in nir_loop_info because nir_loop_info is only set
via loop analysis and is intended to be cleared before each
analysis. Also nir_loop_info is never cloned.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

commit | commitdiff | tree

Timothy Arceri [Thu, 15 Nov 2018 12:23:09 +0000 (23:23 +1100)]

nir: add guess trip count support to loop analysis

This detects an induction variable used as an array index to guess
the trip count of the loop. This enables us to do a partial
unroll of the loop, which can eventually result in the loop being
eliminated.

v2: check if the induction var is used to index more than a single
array and if so get the size of the smallest array.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
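
The v2 rule (when the induction variable indexes several arrays, take the smallest one) reduces to a minimum over the array lengths. A minimal sketch, with a hypothetical function name and with 0 used to mean "no guess available":

```c
#include <assert.h>
#include <stddef.h>

/* Returns the smallest length among the arrays indexed by the induction
 * variable, or 0 if there are none. An in-bounds loop cannot run more
 * iterations than the smallest array it indexes has elements. */
static size_t toy_guess_trip_count(const size_t *array_lengths, size_t count)
{
    size_t guess = 0;

    for (size_t k = 0; k < count; k++) {
        if (k == 0 || array_lengths[k] < guess)
            guess = array_lengths[k];
    }
    return guess;
}
```

For arrays of length 8, 4 and 16 the guess is 4.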

commit | commitdiff | tree

Tomeu Vizoso [Fri, 8 Mar 2019 14:24:57 +0000 (15:24 +0100)]

panfrost: Add support for PAN_MESA_DEBUG

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

commit | commitdiff | tree

Tomeu Vizoso [Fri, 8 Mar 2019 14:04:50 +0000 (15:04 +0100)]

panfrost/midgard: Add support for MIDGARD_MESA_DEBUG

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>

commit | commitdiff | tree

Xavier Bouchoux [Mon, 15 Oct 2018 14:24:29 +0000 (16:24 +0200)]

nir/spirv: Fix assert when unsampled OpTypeImage has unknown 'Depth'

'dxc' hlsl-to-spirv compiler appears to emit 2 (Unknown) in the depth field,
when the image is not sampled and the value is not needed.

Previously, shaders failed with:

SPIR-V parsing FAILED:
    In file ../src/compiler/spirv/spirv_to_nir.c:1412
    !is_shadow
    632 bytes into the SPIR-V binary

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

commit | commitdiff | tree

Kenneth Graunke [Sat, 9 Mar 2019 09:02:06 +0000 (01:02 -0800)]

iris: Fix write enable in pinning of depth/stencil resources

We may bind new Z/S buffers (which come via the framebuffer CSO,
triggering IRIS_DIRTY_DEPTH_BUFFER), but with writes disabled.

The next draw may enable Z or S writes (which come via the ZSA CSO,
triggering IRIS_DIRTY_WM_DEPTH_STENCIL), which requires us to update
our pin to have the write flag.

So, update pinning if either dirty flag changes. To clarify, pass
cso_zsa to the pinning function rather than pulling the random values
out of ice->state, which unfortunately have to exist for the resolve
code since iris_depth_stencil_alpha_state only exists in iris_state.c.

commit | commitdiff | tree

Kenneth Graunke [Sat, 9 Mar 2019 08:50:24 +0000 (00:50 -0800)]

iris: Refactor depth/stencil buffer pinning into a helper.

This avoids the code duplication that caused me to put things in the
wrong place in the previous commit. One used to have extra flushes,
but we moved those out so now these are identical and can be easily
shared.

commit | commitdiff | tree

Kenneth Graunke [Sat, 9 Mar 2019 08:42:54 +0000 (00:42 -0800)]

iris: Move depth/stencil flushes so they actually do something

Commit d6dd57d43cd (iris: Add missing depth cache flushes) added the
depth/stencil flushes to the wrong place.  I meant to add them to the
iris_upload_dirty_render_state code that emits the packets, but I
accidentally added them to the nearly identical looking code in
iris_restore_render_saved_bos.  This meant we missed the actual flushing
at draw time, but instead did pointless flushing on the first draw in a
batch where things are already flushed anyway.

This commit moves them to iris_resolve.c, next to the depth prepares,
similar to what we do for color buffers.  i965 does them elsewhere, but
I'm not sure why - this seems like the most consistent place.

commit | commitdiff | tree

Christian Gmeiner [Tue, 26 Feb 2019 17:41:07 +0000 (18:41 +0100)]

st/dri: allow direct UYVY import

Push this format to the pipe driver unchanged.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Kenneth Graunke [Fri, 8 Mar 2019 04:14:59 +0000 (20:14 -0800)]

iris: Fix TES gl_PatchVerticesIn handling.

1. If we switch the TCS for one with a different number of output
   vertices, then the TES's gl_PatchVerticesIn value will change.
   We need to re-upload in this case.  For now, re-emit constants
   whenever the TCS/TES are swapped out.

2. If there is no TCS, then we can't grab gl_PatchVerticesIn from
   the TCS info.  Since it's a passthrough, we can just use the
   primitive's patch count (like the TCS gl_PatchVerticesIn does).

Fixes KHR-GL45.tessellation_shader.single.max_patch_vertices and
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
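
The second point amounts to a simple selection, sketched below. The struct and function names are hypothetical and are not the iris data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the compiled TCS information. */
struct toy_tcs_info {
    int output_vertices;
};

/* With a TCS bound, the TES sees the TCS output patch size. Without
 * one, the patch passes through unchanged, so the primitive's own
 * patch vertex count is used instead. */
static int toy_tes_patch_vertices_in(const struct toy_tcs_info *tcs,
                                     int primitive_patch_vertices)
{
    return tcs ? tcs->output_vertices : primitive_patch_vertices;
}
```

Because the result depends on which TCS is bound, it has to be recomputed and re-uploaded whenever the TCS or TES changes, which is the first point above.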

commit | commitdiff | tree

Kenneth Graunke [Thu, 7 Mar 2019 04:56:37 +0000 (20:56 -0800)]

iris: Rework default tessellation level uploads

Now that we've added a system value uploading mechanism, we may as well
reuse the same system for default tessellation levels. This simplifies
the state upload code a bit.

Also fixes:
KHR-GL45.tessellation_shader.tessellation_control_to_tessellation_evaluation.gl_tessLevel

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

commit | commitdiff | tree

Timur Kristóf [Wed, 13 Feb 2019 22:28:20 +0000 (00:28 +0200)]

iris: Face should be a system value.

This patch adds PIPE_CAP_TGSI_FS_FACE_IS_INTEGER_SYSVAL which
despite its name is not a TGSI-specific capability, just lets
the state tracker know that it should generate a system value
for FACE.

This is needed if we want to run tgsi_to_nir on iris.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

commit | commitdiff | tree

Eric Anholt [Thu, 28 Feb 2019 20:02:58 +0000 (12:02 -0800)]

vc4: Switch the post-RA scheduler over to the DAG datastructure.

Just a small code reduction from shared infrastructure.

commit | commitdiff | tree

Eric Anholt [Thu, 28 Feb 2019 18:42:05 +0000 (10:42 -0800)]

v3d: Use the DAG datastructure for QPU instruction scheduling.

Just a small code reduction from shared infrastructure.

commit | commitdiff | tree

Eric Anholt [Thu, 28 Feb 2019 19:02:25 +0000 (11:02 -0800)]

vc4: Reuse list_for_each_entry_rev().

commit | commitdiff | tree

Eric Anholt [Thu, 28 Feb 2019 19:01:57 +0000 (11:01 -0800)]

v3d: Reuse list_for_each_entry_rev().

commit | commitdiff | tree

Eric Anholt [Thu, 28 Feb 2019 18:06:27 +0000 (10:06 -0800)]

vc4: Switch over to using the DAG datastructure for QIR scheduling.

Just a small code reduction from shared infrastructure.

commit | commitdiff | tree

Eric Anholt [Wed, 27 Feb 2019 19:12:59 +0000 (11:12 -0800)]

util: Add a DAG datastructure.

I keep writing this for various schedulers.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>

commit | commitdiff | tree

Kristian H. Kristensen [Fri, 1 Mar 2019 22:33:36 +0000 (14:33 -0800)]

freedreno/a6xx: Remove extra parens

There's a warning about this now.

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>

commit | commitdiff | tree

Kristian H. Kristensen [Fri, 1 Mar 2019 22:25:57 +0000 (14:25 -0800)]

freedreno: Use c_vis_args and no_override_init_args

Signed-off-by: Kristian H. Kristensen <hoegsberg@chromium.org>

commit | commitdiff | tree

Chia-I Wu [Fri, 8 Feb 2019 21:45:53 +0000 (13:45 -0800)]

turnip: preliminary support for Wayland WSI

commit | commitdiff | tree

Chia-I Wu [Mon, 11 Feb 2019 19:12:32 +0000 (11:12 -0800)]

turnip: preliminary support for tu_GetImageSubresourceLayout

commit | commitdiff | tree

Chad Versace [Sat, 2 Feb 2019 01:08:51 +0000 (17:08 -0800)]

turnip: Use Vulkan 1.1 names instead of KHR

That is, drop KHR from all tokens that were promoted to Vulkan 1.1.
The consistency makes ctags more useful (it now jumps directly to the
real definitions in vulkan_core.h instead of the typedefs); and it makes
the code slightly less verbose.

commit | commitdiff | tree

Chia-I Wu [Fri, 8 Mar 2019 19:27:50 +0000 (11:27 -0800)]

turnip: guard -Dvulkan-driver=freedreno

Require -DI-love-half-baked-turnips=true as well to enable the freedreno
Vulkan driver.

commit | commitdiff | tree

Chia-I Wu [Fri, 22 Feb 2019 16:50:58 +0000 (08:50 -0800)]

turnip: preliminary support for tu_CmdDraw

commit | commitdiff | tree

Chia-I Wu [Fri, 22 Feb 2019 06:37:34 +0000 (22:37 -0800)]

turnip: preliminary support for draw state binding

This adds support for tu_CmdBindPipeline, tu_CmdBindVertexBuffers,
etc.

commit | commitdiff | tree

Chia-I Wu [Wed, 20 Feb 2019 22:26:06 +0000 (14:26 -0800)]

turnip: add draw_cs to tu_cmd_buffer

It will hold draw commands.

commit | commitdiff | tree

Chia-I Wu [Fri, 22 Feb 2019 06:31:36 +0000 (22:31 -0800)]

turnip: parse VkPipelineVertexInputStateCreateInfo

commit | commitdiff | tree

Chia-I Wu [Wed, 27 Feb 2019 06:10:34 +0000 (22:10 -0800)]

turnip: parse VkPipelineShaderStageCreateInfo

commit | commitdiff | tree

Chia-I Wu [Wed, 27 Feb 2019 06:09:37 +0000 (22:09 -0800)]

turnip: compile VkPipelineShaderStageCreateInfo

Compile all shaders and upload the binaries to a BO.

commit | commitdiff | tree

Chia-I Wu [Wed, 20 Feb 2019 17:53:47 +0000 (09:53 -0800)]

turnip: preliminary support for shader modules

Save SPIR-V in tu_shader_module. Translation to NIR happens in
tu_shader_create, and compilation to binary code happens in
tu_shader_compile. Both will be called during pipeline creation.

commit | commitdiff | tree

Chia-I Wu [Thu, 21 Feb 2019 22:58:52 +0000 (14:58 -0800)]

turnip: parse VkPipeline{Multisample,ColorBlend}StateCreateInfo

commit | commitdiff | tree

Chia-I Wu [Thu, 21 Feb 2019 19:46:59 +0000 (11:46 -0800)]

turnip: parse VkPipelineDepthStencilStateCreateInfo

commit | commitdiff | tree

Chia-I Wu [Wed, 27 Feb 2019 07:29:51 +0000 (23:29 -0800)]

turnip: parse VkPipelineRasterizationStateCreateInfo

commit | commitdiff | tree

Chia-I Wu [Tue, 19 Feb 2019 21:49:01 +0000 (13:49 -0800)]

turnip: parse VkPipelineViewportStateCreateInfo

commit | commitdiff | tree

Chia-I Wu [Thu, 21 Feb 2019 19:07:38 +0000 (11:07 -0800)]

turnip: parse VkPipelineInputAssemblyStateCreateInfo

commit | commitdiff | tree

Chia-I Wu [Thu, 21 Feb 2019 17:41:49 +0000 (09:41 -0800)]

turnip: parse VkPipelineDynamicStateCreateInfo

commit | commitdiff | tree

Chia-I Wu [Thu, 21 Feb 2019 17:22:17 +0000 (09:22 -0800)]

turnip: create a less dummy pipeline

Still dummy, but at least it is created from tu_pipeline_builder.

commit | commitdiff | tree

Chia-I Wu [Mon, 25 Feb 2019 22:38:34 +0000 (14:38 -0800)]

turnip: simplify tu_cs sub-streams usage

Let tu_cs_begin_sub_stream imply tu_cs_reserve_space, and
tu_cs_end_sub_stream imply tu_cs_sanity_check. Callers are no
longer required to call them (but can still do so if they choose to).

commit | commitdiff | tree

Chia-I Wu [Mon, 25 Feb 2019 22:37:55 +0000 (14:37 -0800)]

turnip: fix tu_cs sub-streams

Update cs->start in tu_cs_end_sub_stream. Otherwise, the entry
would include commands from all prior sub-streams.
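
A rough sketch of why the start pointer has to advance, assuming a much-simplified command stream. `sketch_cs`, `sketch_entry`, and the helper names are hypothetical and do not mirror the real tu_cs layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry describing one sub-stream, in dwords. */
struct sketch_entry {
   uint32_t offset;
   uint32_t size;
};

/* Hypothetical, simplified command stream. */
struct sketch_cs {
   uint32_t *base;  /* start of the backing buffer */
   uint32_t *start; /* first dword of the current sub-stream */
   uint32_t *cur;   /* next dword to write */
};

static inline void
sketch_emit(struct sketch_cs *cs, uint32_t value)
{
   *cs->cur++ = value;
}

static inline struct sketch_entry
sketch_end_sub_stream(struct sketch_cs *cs)
{
   assert(cs->cur >= cs->start);

   struct sketch_entry entry = {
      .offset = (uint32_t)(cs->start - cs->base),
      .size = (uint32_t)(cs->cur - cs->start),
   };

   /* Advance start so the next sub-stream begins here. Without this,
    * the next entry would also cover every earlier sub-stream. */
   cs->start = cs->cur;
   return entry;
}
```

In this sketch, ending two sub-streams back to back yields two disjoint entries rather than a second entry that overlaps the first.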

commit | commitdiff | tree

Chia-I Wu [Mon, 25 Feb 2019 22:57:03 +0000 (14:57 -0800)]

turnip: tu_cs_emit_array

Array version of tu_cs_emit. Useful for updating multiple
consecutive array-like registers, or loading a shader binary with
SS6_DIRECT.
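
An array emit of this kind might look like the sketch below. It is only an illustration under assumed names (`sketch_cs`, `sketch_cs_emit_array`), not the actual tu_cs_emit_array implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified command stream. */
struct sketch_cs {
   uint32_t *cur;          /* next dword to write */
   uint32_t *reserved_end; /* end of the space reserved by the caller */
};

/* Copy count dwords into the stream in one step instead of
 * emitting them one at a time. */
static inline void
sketch_cs_emit_array(struct sketch_cs *cs, const uint32_t *values,
                     uint32_t count)
{
   assert(cs->cur + count <= cs->reserved_end);
   memcpy(cs->cur, values, count * sizeof(uint32_t));
   cs->cur += count;
}
```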

commit | commitdiff | tree

Chia-I Wu [Mon, 25 Feb 2019 22:49:34 +0000 (14:49 -0800)]

turnip: add tu_cs_discard_entries

We will start a draw IB at the beginning of a subpass and consume it
at the end of the subpass. With tu_cs_discard_entries, we can reuse
the same tu_cs for all subpasses.

commit | commitdiff | tree

Chia-I Wu [Mon, 25 Feb 2019 22:55:06 +0000 (14:55 -0800)]

turnip: more/better asserts for tu_cs

Asserting (cur < end) in tu_cs_emit catches far fewer programming
errors than asserting (cur < reserved_end). We should never
write more commands than we have reserved.

Assert IB is non-empty and sane in tu_cs_emit_ib.
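
The difference between the two asserts can be sketched as follows, assuming a simplified stream in which the caller reserves space before emitting. All names here (`sketch_cs`, `sketch_cs_reserve`, `sketch_cs_emit`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified command stream. */
struct sketch_cs {
   uint32_t *cur;          /* next dword to write */
   uint32_t *reserved_end; /* end of the space the caller reserved */
   uint32_t *end;          /* end of the backing buffer */
};

static inline void
sketch_cs_init(struct sketch_cs *cs, uint32_t *buf, size_t size)
{
   cs->cur = cs->reserved_end = buf;
   cs->end = buf + size;
}

/* Reserve room for count dwords; returns 0 on success and -1 if the
 * backing buffer is too small (a real stream might grow instead). */
static inline int
sketch_cs_reserve(struct sketch_cs *cs, size_t count)
{
   if ((size_t)(cs->end - cs->cur) < count)
      return -1;
   cs->reserved_end = cs->cur + count;
   return 0;
}

static inline void
sketch_cs_emit(struct sketch_cs *cs, uint32_t value)
{
   /* Stricter than (cur < end): this fires as soon as a caller writes
    * past its reservation, even when the buffer still has room. */
   assert(cs->cur < cs->reserved_end);
   *cs->cur++ = value;
}
```

With only the (cur < end) check, a caller that reserved two dwords but emitted three would go unnoticed until the whole buffer overflowed.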

commit | commitdiff | tree

Chia-I Wu [Mon, 25 Feb 2019 22:44:52 +0000 (14:44 -0800)]

turnip: use 32-bit offset in tu_cs_entry

We neither support nor expect BOs to be that big in tu_cs.

commit | commitdiff | tree

Chia-I Wu [Mon, 25 Feb 2019 22:32:36 +0000 (14:32 -0800)]

turnip: mark IBs for dumping

Includes IBs in kernel cmdbuf dumps.

commit | commitdiff | tree

Eric Engestrom [Wed, 27 Feb 2019 12:31:06 +0000 (12:31 +0000)]

turnip: use the platform defines in vk.xml instead of hard-coding them

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>

commit | commitdiff | tree

Bas Nieuwenhuizen [Thu, 21 Feb 2019 21:39:22 +0000 (22:39 +0100)]

turnip: Add todo for copies.

commit | commitdiff | tree

Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:43:24 +0000 (16:43 +0100)]

turnip: Add buffer->image DMA copies.

Passes dEQP-VK.api.copy_and_blit.core.buffer_to_image.*

commit | commitdiff | tree

Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:13:23 +0000 (16:13 +0100)]

turnip: Add image->buffer DMA copies.

Passes dEQP-VK.api.copy_and_blit.core.image_to_buffer.*

commit | commitdiff | tree

Bas Nieuwenhuizen [Mon, 18 Feb 2019 15:09:27 +0000 (16:09 +0100)]

turnip: Implement buffer->buffer DMA copies.

Passes dEQP-VK.api.copy_and_blit.core.buffer_to_buffer.*

commit | commitdiff | tree

Bas Nieuwenhuizen [Mon, 4 Feb 2019 13:52:34 +0000 (14:52 +0100)]

turnip: Add tu6_rb_fmt_to_ifmt.

commit | commitdiff | tree

Bas Nieuwenhuizen [Mon, 18 Feb 2019 13:49:52 +0000 (14:49 +0100)]

turnip: Make tu6_emit_event_write shared.

commit | commitdiff | tree

Bas Nieuwenhuizen [Tue, 15 Jan 2019 21:54:15 +0000 (22:54 +0100)]

turnip: Add buffer memory binding.

commit | commitdiff | tree

Chia-I Wu [Thu, 14 Feb 2019 18:53:20 +0000 (10:53 -0800)]

turnip: respect color attachment formats

Make tu6_get_native_format available to tu_cmd_buffer and start
using it.

commit | commitdiff | tree

Chia-I Wu [Thu, 14 Feb 2019 22:36:52 +0000 (14:36 -0800)]

turnip: preliminary support for fences

This should be quite complete feature-wise. External fences are
still missing. We probably also want to add a simpler path to
tu_WaitForFences for when fenceCount == 1.

commit | commitdiff | tree

Chia-I Wu [Wed, 13 Feb 2019 18:23:32 +0000 (10:23 -0800)]

turnip: fix VkClearValue packing

Add tu_pack_clear_value to correctly pack VkClearValue according to
VkFormat. It ignores the component order defined by VkFormat, and
always packs to WZYX order.

commit | commitdiff | tree

Chia-I Wu [Fri, 1 Feb 2019 18:36:19 +0000 (10:36 -0800)]

turnip: add support for VK_KHR_external_memory_{fd,dma_buf}

commit | commitdiff | tree

Chia-I Wu [Fri, 1 Feb 2019 18:27:28 +0000 (10:27 -0800)]

turnip: advertise VK_KHR_external_memory

AFAICT, it is supported. We don't need to handle any of the new
structs because our BOs can always be exported.