mesa.git
4 years ago zink: store image-type per texture
Erik Faye-Lund [Fri, 3 Jan 2020 12:55:35 +0000 (13:55 +0100)]
zink: store image-type per texture

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>

4 years ago zink: avoid incorrect vector-construction
Erik Faye-Lund [Fri, 3 Jan 2020 11:22:38 +0000 (12:22 +0100)]
zink: avoid incorrect vector-construction

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>

4 years ago zink: support offset-variants of texturing
Erik Faye-Lund [Thu, 2 Jan 2020 10:26:38 +0000 (11:26 +0100)]
zink: support offset-variants of texturing

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>

4 years ago zink: implement nir_texop_txs
Erik Faye-Lund [Wed, 30 Oct 2019 09:50:20 +0000 (10:50 +0100)]
zink: implement nir_texop_txs

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>

4 years ago docs: fixup indentation
Erik Faye-Lund [Thu, 16 Jan 2020 20:11:29 +0000 (21:11 +0100)]
docs: fixup indentation

The most canonical indentation-style here is two spaces, which is what
the standard boilerplate in all documents uses. So let's normalize to
that.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>

4 years ago docs: remove pointless, stray newline
Erik Faye-Lund [Thu, 16 Jan 2020 20:03:08 +0000 (21:03 +0100)]
docs: remove pointless, stray newline

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>

4 years ago docs: use [1] instead of asterisk for footnote
Erik Faye-Lund [Thu, 16 Jan 2020 19:45:22 +0000 (20:45 +0100)]
docs: use [1] instead of asterisk for footnote

While we're at it, make it a link as well.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>

4 years ago docs: remove trailing newlines
Erik Faye-Lund [Thu, 16 Jan 2020 19:28:53 +0000 (20:28 +0100)]
docs: remove trailing newlines

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>

4 years ago docs: remove leading spaces
Erik Faye-Lund [Thu, 16 Jan 2020 19:18:07 +0000 (20:18 +0100)]
docs: remove leading spaces

There's no good reason to have leading space in these pre-formatted
blocks. It looks strange, so let's get rid of it.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>

4 years ago docs: remove trailing header
Erik Faye-Lund [Thu, 16 Jan 2020 19:02:17 +0000 (20:02 +0100)]
docs: remove trailing header

This header has been there since the document was added, but contains
nothing. So let's get rid of it.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>

4 years ago docs: use figure/figcaption instead of tables
Erik Faye-Lund [Thu, 16 Jan 2020 18:57:13 +0000 (19:57 +0100)]
docs: use figure/figcaption instead of tables

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>

4 years ago docs: do not use definition-list for sub-topics
Erik Faye-Lund [Thu, 16 Jan 2020 18:34:02 +0000 (19:34 +0100)]
docs: do not use definition-list for sub-topics

The dl-tag isn't a neat tool for defining sub-headings, it's a semantic
tool for marking up terms and their definitions. Let's use normal
sub-headings instead.

To make the last few paragraphs stand out from the above, let's add a
sub-heading for those as well.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>

4 years ago freedreno/a6xx: add PROG_FB_RAST stateobj
Rob Clark [Thu, 16 Jan 2020 23:14:19 +0000 (15:14 -0800)]
freedreno/a6xx: add PROG_FB_RAST stateobj

For the handful of registers that depend on the union of program/
framebuffer/rasterizer state.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>

4 years ago freedreno/a6xx: move dynamic program state to streaming stateobj
Rob Clark [Thu, 16 Jan 2020 22:38:41 +0000 (14:38 -0800)]
freedreno/a6xx: move dynamic program state to streaming stateobj

Move the program state which we can't pre-bake to a streaming state
object, rather than emitting directly in the draw cmdstream.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>

4 years ago freedreno/a6xx: drop a few more per-draw registers
Rob Clark [Thu, 16 Jan 2020 20:42:45 +0000 (12:42 -0800)]
freedreno/a6xx: drop a few more per-draw registers

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>

4 years ago freedreno/a6xx: separate rast stateobj for prim restart
Rob Clark [Thu, 16 Jan 2020 20:15:37 +0000 (12:15 -0800)]
freedreno/a6xx: separate rast stateobj for prim restart

This lets us move PC_PRIMITIVE_CNTL into the rasterizer stateobj, rather
than unconditionally emitting it directly in the cmdstream on every
draw.

This also starts adding some tracking about previous draw state, so that
following patches can limit some of the register writes we currently
emit on every draw.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>

4 years ago freedreno/a6xx: cleanup rasterizer state
Rob Clark [Thu, 16 Jan 2020 19:25:24 +0000 (11:25 -0800)]
freedreno/a6xx: cleanup rasterizer state

All but one of the reg values is only used in the stateobj, so we can
inline the register value setup and stateobj construction.  While we
are at it, switch over to the new register builders.

Prep work for next patch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>

4 years ago freedreno/a6xx: limit scratch/debug markers to debug builds
Rob Clark [Thu, 16 Jan 2020 18:42:39 +0000 (10:42 -0800)]
freedreno/a6xx: limit scratch/debug markers to debug builds

The overhead does seem to matter when you have a high enough # of draw
calls that affect few bins/pixels, because these writes would happen
unconditionally (ie. not part of a state-group).

Possibly we could keep these if we moved them into a state-group so the
register writes would be no-ops on bins with no geometry.  OTOH I
usually end up adding in a WFI when using the scratch reg values to
track down a crash.  (So add a WFI to mitigate the annoyance of needing
to use a debug build to get scratch regs to locate the position of a
crash/hang in the cmdstream.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>

4 years ago iris: Fix some indentation in iris_init_render_context
Jordan Justen [Tue, 14 Jan 2020 02:07:34 +0000 (18:07 -0800)]
iris: Fix some indentation in iris_init_render_context

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
4 years ago util/vector: Fix u_vector_foreach when head rolls over
C Stout [Thu, 16 Jan 2020 23:05:06 +0000 (15:05 -0800)]
util/vector: Fix u_vector_foreach when head rolls over

Also add unit tests for u_vector.

Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3453>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3453>
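
The rollover case can be illustrated with a small ring buffer. This is
only a sketch under assumptions, not the u_vector code: struct ring,
ring_push() and ring_sum() are hypothetical names, and head/tail are
modeled as free-running 32-bit counters. It shows why an iteration over
the live elements should compare with != rather than <: once head has
wrapped, it can be numerically smaller than tail.

```c
#include <stdint.h>

#define RING_CAPACITY 8u   /* hypothetical size; must be a power of two */

struct ring {
   uint32_t head;   /* elements ever pushed, wraps modulo 2^32 */
   uint32_t tail;   /* elements ever popped, wraps modulo 2^32 */
   int data[RING_CAPACITY];
};

/* Append one element; the caller must not exceed RING_CAPACITY live
 * elements (no overflow check in this sketch). */
static void ring_push(struct ring *r, int v)
{
   r->data[r->head & (RING_CAPACITY - 1)] = v;
   r->head++;
}

/* Sum all live elements.  The loop condition is "i != head", so it
 * still terminates correctly after head has rolled over past zero. */
static int ring_sum(const struct ring *r)
{
   int sum = 0;
   for (uint32_t i = r->tail; i != r->head; i++)
      sum += r->data[i & (RING_CAPACITY - 1)];
   return sum;
}
```

With a "<" comparison the loop body would never run in the wrapped
case, because tail would already be greater than head.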

4 years ago intel/fs: Switch to standard vector layout for barycentrics at optimization time.
Francisco Jerez [Sat, 4 Jan 2020 01:08:51 +0000 (17:08 -0800)]
intel/fs: Switch to standard vector layout for barycentrics at optimization time.

This involves permuting the registers of barycentric vectors to have
the standard X[0-n] Y[0-n] layout at NIR translation time.
Barycentrics are converted to the format expected by the PLN
instruction in the lower_barycentrics() pass run after the
optimization loop.

Main reason is correctness of SIMD32 fragment shaders.  The
shuffle_from_pln_layout() and shuffle_to_pln_layout() helpers used
during NIR translation are busted for SIMD32.  This leads to serious
corruption at present with INTEL_DEBUG=do32, especially on Gen11+
where these helpers are hit more frequently due to the lack of a
hardware PLN instruction.

Of course one could have chosen to fix those helpers instead, but
there is another far more subtle issue that was reported during review
of the SIMD32 fragment shader codegen changes: The SIMD splitting pass
currently handles SIMD32 barycentric vectors as if they had the
standard X[0-n] Y[0-n] layout, even though they are interleaved for
the PLN instruction, which causes incorrect execution masks to be
applied to the MOVs unzipping barycentric vectors in cases where a
LINTERP instruction occurs under non-uniform control flow.

I'm not aware of any conformance regressions due to the latter issue
at present, but for our peace of mind let's move the conversion to the
PLN layout into the lower_barycentrics() pass run after
lower_simd_width().

This leads to the following shader-db improvements (including SIMD32
shaders) in combination with the previous back-end preparation changes
-- Without them (especially the copy propagation changes) this would
lead to a massive number of regressions.  On ICL:

   total instructions in shared programs: 20662316 -> 20466903 (-0.95%)
   instructions in affected programs: 10538474 -> 10343061 (-1.85%)
   helped: 68775
   HURT: 6

   total spills in shared programs: 8938 -> 8748 (-2.13%)
   spills in affected programs: 376 -> 186 (-50.53%)
   helped: 9
   HURT: 5

   total fills in shared programs: 8965 -> 8663 (-3.37%)
   fills in affected programs: 965 -> 663 (-31.30%)
   helped: 9
   HURT: 6

   LOST:   146
   GAINED: 43

On SKL:

   total instructions in shared programs: 18725867 -> 18614912 (-0.59%)
   instructions in affected programs: 3876590 -> 3765635 (-2.86%)
   helped: 27492
   HURT: 2

   LOST:   191
   GAINED: 417

On SNB:

   total instructions in shared programs: 14573613 -> 13980646 (-4.07%)
   instructions in affected programs: 5199074 -> 4606107 (-11.41%)
   helped: 29998
   HURT: 0

   LOST:   21
   GAINED: 30

Results are somewhat less impressive but still significant without
SIMD32 fragment shaders enabled.  On ICL:

   total instructions in shared programs: 16148728 -> 16061659 (-0.54%)
   instructions in affected programs: 6114788 -> 6027719 (-1.42%)
   helped: 42046
   HURT: 6

   total spills in shared programs: 8218 -> 8028 (-2.31%)
   spills in affected programs: 376 -> 186 (-50.53%)
   helped: 9
   HURT: 5

   total fills in shared programs: 8953 -> 8651 (-3.37%)
   fills in affected programs: 965 -> 663 (-31.30%)
   helped: 9
   HURT: 6

   LOST:   0
   GAINED: 3

On SKL:

   total instructions in shared programs: 14927994 -> 14926738 (-0.01%)
   instructions in affected programs: 168850 -> 167594 (-0.74%)
   helped: 711
   HURT: 2

On SNB:

   total instructions in shared programs: 10770538 -> 10734403 (-0.34%)
   instructions in affected programs: 2702172 -> 2666037 (-1.34%)
   helped: 17818
   HURT: 0

All of the hurt shaders are either spilling slightly more or emitting
additional NOP instructions due to the SIMD16 POW workaround for
Gen8-9 combined with differences in scheduling.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
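
The difference between the two layouts can be sketched as a plain
permutation of register indices. This is an illustration under
assumptions, not the lower_barycentrics() implementation:
std_to_pln_layout() is a hypothetical helper, each array slot stands
for one 8-channel register of a single component, and the interleaved
order is assumed to be one X/Y pair per 8-channel group.

```c
#include <stddef.h>

/* Permute a barycentric vector from the standard layout
 *    X[0] .. X[groups-1]  Y[0] .. Y[groups-1]
 * into the assumed interleaved layout
 *    X[0] Y[0]  X[1] Y[1]  ...
 * where "groups" is the dispatch width divided by 8. */
static void std_to_pln_layout(const int *std_regs, int *pln_regs,
                              size_t groups)
{
   for (size_t g = 0; g < groups; g++) {
      for (size_t c = 0; c < 2; c++)   /* c == 0 is X, c == 1 is Y */
         pln_regs[2 * g + c] = std_regs[c * groups + g];
   }
}
```

For SIMD16 (two groups) the input order X0 X1 Y0 Y1 becomes
X0 Y0 X1 Y1.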
4 years ago intel/fs: Introduce barycentric layout lowering pass.
Francisco Jerez [Sat, 4 Jan 2020 00:12:23 +0000 (16:12 -0800)]
intel/fs: Introduce barycentric layout lowering pass.

The goal is to represent barycentrics with the standard vector layout
during optimization and particularly SIMD lowering.  Instead of
emitting the barycentric layout conversions at NIR translation time,
do it later as a lowering pass.  For the moment this is only applied
to PI messages, but we'll give the same treatment to LINTERP
instructions too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs: Split fetch_payload_reg() into separate helper for barycentrics.
Francisco Jerez [Fri, 3 Jan 2020 22:41:15 +0000 (14:41 -0800)]
intel/fs: Split fetch_payload_reg() into separate helper for barycentrics.

We're about to change the layout of barycentric vectors, which will
involve permuting the GRFs of barycentrics fetched from the thread
payload.  Make room for this in a function separate from the generic
fetch_payload_reg(), since the permutation will only be applicable to
barycentric vectors.  This allows simplifying fetch_payload_reg(),
since there was no need for handling multiple-component payload
registers except for barycentrics.

This causes some minor shader-db noise due to the new helper emitting
a LOAD_PAYLOAD instruction unconditionally, but it will be cleaned up
shortly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs/gen6: Use SEL instead of bashing thread payload for unlit centroid workaround.
Francisco Jerez [Fri, 3 Jan 2020 23:58:05 +0000 (15:58 -0800)]
intel/fs/gen6: Use SEL instead of bashing thread payload for unlit centroid workaround.

This prevents regressions on SNB due to the redundant MOVs lying
around in cases where fetch_payload_reg() returns a VGRF (currently
only in SIMD32 but soon in pretty much all cases).  The MOVs can't be
register-coalesced due to their source being a FIXED_GRF, and they
can't be copy-propagated either due to the unlit centroid workaround
partial writes.  They can be copy-propagated just fine into a SEL
instruction though.

On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:

   total instructions in shared programs: 13996898 -> 14001982 (0.04%)
   instructions in affected programs: 197461 -> 202545 (2.57%)
   helped: 0
   HURT: 1251

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs/gen6: Generalize aligned_pairs_class to SIMD16 aligned barycentrics.
Francisco Jerez [Fri, 3 Jan 2020 23:06:52 +0000 (15:06 -0800)]
intel/fs/gen6: Generalize aligned_pairs_class to SIMD16 aligned barycentrics.

This is mainly meant to avoid shader-db regressions on SNB as we start
using VGRFs for barycentrics more frequently.  Currently the
aligned_pairs_class is only useful in SIMD8 mode, because in SIMD16
mode barycentric vectors are typically 4 GRFs.  This is not a problem
on Gen4-5, because on those platforms all VGRF allocations are
pair-aligned in SIMD16 mode.  However on Gen6 we end up using either
the fast or the slow path of LINTERP rather non-deterministically
based on the behavior of the register allocator.

Fix it by repurposing aligned_pairs_class to hold PLN-aligned
registers of whatever the natural size of a barycentric vector is in
the current dispatch width.

On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:

   total instructions in shared programs: 13983257 -> 14527274 (3.89%)
   instructions in affected programs: 1766255 -> 2310272 (30.80%)
   helped: 0
   HURT: 11608

   LOST:   26
   GAINED: 13

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs/gen6: Constrain barycentric source of LINTERP during bank conflict mitigation.
Francisco Jerez [Tue, 31 Dec 2019 00:34:22 +0000 (16:34 -0800)]
intel/fs/gen6: Constrain barycentric source of LINTERP during bank conflict mitigation.

This avoids regressions on SNB due to the bank conflict mitigation
pass moving a VGRF-allocated barycentric vector to a misaligned
location, which would prevent the PLN instruction from being used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs/gen4-6: Allocate registers from aligned_pairs_class based on LINTERP use.
Francisco Jerez [Fri, 3 Jan 2020 22:53:11 +0000 (14:53 -0800)]
intel/fs/gen4-6: Allocate registers from aligned_pairs_class based on LINTERP use.

Previously we would hardcode fs_visitor::delta_xy barycentrics to be
allocated from aligned_pairs_class on hardware with PLN source
alignment restrictions (pre-Gen7).  Instead allocate any registers
consumed by LINTERP from aligned_pairs_class, even if some barycentric
vector had ended up in a temporary.

On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:

   total instructions in shared programs: 13983257 -> 14527274 (3.89%)
   instructions in affected programs: 1766255 -> 2310272 (30.80%)
   helped: 0
   HURT: 11608

   LOST:   26
   GAINED: 13

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs: Allow limited copy propagation of a LOAD_PAYLOAD into another.
Francisco Jerez [Mon, 30 Dec 2019 08:37:35 +0000 (00:37 -0800)]
intel/fs: Allow limited copy propagation of a LOAD_PAYLOAD into another.

This is particularly useful in cases where register coalesce is
unlikely to succeed because the LOAD_PAYLOAD isn't a plain copy --
E.g. when a LOAD_PAYLOAD is shuffling the contents of a barycentric
vector in order to transform it into the PLN layout.

This prevents the following shader-db regressions (including SIMD32
programs) in combination with the interpolation rework part of this
series.  On SKL:

   total instructions in shared programs: 18596672 -> 18976097 (2.04%)
   instructions in affected programs: 7937041 -> 8316466 (4.78%)
   helped: 39
   HURT: 67427

   LOST:   466
   GAINED: 220

On SNB:

   total instructions in shared programs: 13993866 -> 14202963 (1.49%)
   instructions in affected programs: 7611309 -> 7820406 (2.75%)
   helped: 624
   HURT: 52943

   LOST:   6
   GAINED: 18

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs: Add support for copy-propagating a block of multiple FIXED_GRFs.
Francisco Jerez [Mon, 30 Dec 2019 08:36:48 +0000 (00:36 -0800)]
intel/fs: Add support for copy-propagating a block of multiple FIXED_GRFs.

In cases where a LOAD_PAYLOAD instruction copies a single block of
sequential GRF registers into the destination (see
is_identity_payload()), splitting the block copy into a number of ACP
entries (one for each LOAD_PAYLOAD source) is undesirable, because
that prevents copy propagation into any instructions which read
multiple components at once with the same source (the barycentric
source of the LINTERP instruction is going to be the overwhelmingly
most common example).

Technically it would also be possible to do this for VGRF sources, but
there is little benefit from that since register coalesce already
covers many of those cases -- There is no way for a block of
FIXED_GRFs to be coalesced into a VGRF though.

This prevents the following shader-db regressions (including SIMD32
programs) in combination with the interpolation rework part of this
series.  On SKL:

   total instructions in shared programs: 18595160 -> 18828562 (1.26%)
   instructions in affected programs: 13374946 -> 13608348 (1.75%)
   helped: 7
   HURT: 108977

   total spills in shared programs: 9116 -> 9106 (-0.11%)
   spills in affected programs: 404 -> 394 (-2.48%)
   helped: 7
   HURT: 9

   total fills in shared programs: 8994 -> 9176 (2.02%)
   fills in affected programs: 898 -> 1080 (20.27%)
   helped: 7
   HURT: 9

   LOST:   469
   GAINED: 220

On SNB:

   total instructions in shared programs: 13996898 -> 14096222 (0.71%)
   instructions in affected programs: 8088546 -> 8187870 (1.23%)
   helped: 2
   HURT: 66520

   total spills in shared programs: 2985 -> 2961 (-0.80%)
   spills in affected programs: 632 -> 608 (-3.80%)
   helped: 2
   HURT: 0

   total fills in shared programs: 3144 -> 3128 (-0.51%)
   fills in affected programs: 1515 -> 1499 (-1.06%)
   helped: 2
   HURT: 0

   LOST:   0
   GAINED: 4

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
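
The "single block of sequential GRF registers" condition can be
sketched as a simple check over the payload sources. This is a minimal
sketch under assumptions and does not reproduce is_identity_payload():
struct src_reg and is_sequential_block() are hypothetical, and each
source is reduced to a first register number plus a size in registers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one payload source. */
struct src_reg {
   unsigned nr;     /* first register number */
   unsigned size;   /* number of registers occupied */
};

/* Return true if the sources cover one gap-free run of registers in
 * ascending order, i.e. each source starts where the previous ended. */
static bool is_sequential_block(const struct src_reg *srcs, size_t count)
{
   if (count == 0)
      return false;

   unsigned next = srcs[0].nr;
   for (size_t i = 0; i < count; i++) {
      if (srcs[i].nr != next)
         return false;
      next += srcs[i].size;
   }
   return true;
}
```

When the check holds, the whole copy can be tracked as one entry
instead of one entry per source, which is what lets a multi-register
read match it.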
4 years ago intel/fs: Add partial support for copy-propagating FIXED_GRFs.
Francisco Jerez [Mon, 30 Dec 2019 08:38:08 +0000 (00:38 -0800)]
intel/fs: Add partial support for copy-propagating FIXED_GRFs.

This will be useful for eliminating redundant copies from the FS
thread payload, particularly in SIMD32 programs.  For the moment we
only allow FIXED_GRFs with identity strides in order to avoid dealing
with composing the arbitrary bidimensional strides that FIXED_GRF
regions potentially have, which are rarely used at the IR level
anyway.

This enables the following commit allowing block-propagation of
FIXED_GRF LOAD_PAYLOAD copies, and prevents the following shader-db
regressions (including SIMD32 programs) in combination with the
interpolation rework part of this series.  On ICL:

   total instructions in shared programs: 20484665 -> 20529650 (0.22%)
   instructions in affected programs: 6031235 -> 6076220 (0.75%)
   helped: 5
   HURT: 42073

   total spills in shared programs: 8748 -> 8925 (2.02%)
   spills in affected programs: 186 -> 363 (95.16%)
   helped: 5
   HURT: 9

   total fills in shared programs: 8663 -> 8960 (3.43%)
   fills in affected programs: 647 -> 944 (45.90%)
   helped: 5
   HURT: 9

On SKL:

   total instructions in shared programs: 18937442 -> 19128162 (1.01%)
   instructions in affected programs: 8378187 -> 8568907 (2.28%)
   helped: 39
   HURT: 68176

   LOST:   1
   GAINED: 4

On SNB:

   total instructions in shared programs: 14094685 -> 14243499 (1.06%)
   instructions in affected programs: 7751062 -> 7899876 (1.92%)
   helped: 623
   HURT: 53586

   LOST:   7
   GAINED: 25

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs: Extend copy propagation dataflow analysis to copies with FIXED_GRF source.
Francisco Jerez [Fri, 3 Jan 2020 02:54:13 +0000 (18:54 -0800)]
intel/fs: Extend copy propagation dataflow analysis to copies with FIXED_GRF source.

This involves indexing the ACP tables used internally by
fs_copy_prop_dataflow::setup_initial_values() by reg_space() instead
of register number.  Both are nearly equivalent for virtual GRFs
(barring the single bit of entropy lost in the hash), and this makes
handling FIXED_GRFs straightforward.

Because we're only going to support FIXED_GRFs for the source of a
copy, this change is only strictly necessary during the second pass
that checks for source interference, but we also apply the same change
to the first pass for consistency.

Note that this shouldn't change the behavior of the copy propagation
pass until we start inserting FIXED_GRF entries into the ACP.  Even
then FIXED_GRF writes are extremely rare so this change will hardly
ever have an effect, but they aren't completely non-existent so we
need to handle them for correctness.

No functional nor shader-db changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs: Rework fs_inst::is_copy_payload() into multiple classification helpers.
Francisco Jerez [Tue, 31 Dec 2019 08:10:28 +0000 (00:10 -0800)]
intel/fs: Rework fs_inst::is_copy_payload() into multiple classification helpers.

This reworks the current fs_inst::is_copy_payload() method into a
number of classification helpers with well-defined semantics.  This
will be useful later on in order to optimize LOAD_PAYLOAD instructions
more aggressively in cases where we can determine it's safe to do so.

The closest equivalent of the present fs_inst::is_copy_payload()
method is the is_coalescing_payload() helper introduced here.

No functional nor shader-db changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs: Generalize fs_reg::is_contiguous() to register files other than VGRF.
Francisco Jerez [Thu, 2 Jan 2020 23:32:56 +0000 (15:32 -0800)]
intel/fs: Generalize fs_reg::is_contiguous() to register files other than VGRF.

No functional nor shader-db changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years ago intel/fs: Try to vectorize header setup in lower_load_payload().
Francisco Jerez [Mon, 30 Dec 2019 02:17:10 +0000 (18:17 -0800)]
intel/fs: Try to vectorize header setup in lower_load_payload().

In cases where LOAD_PAYLOAD is provided a pair of contiguous registers
as header sources, try to use a single SIMD16 instruction in order to
initialize them.  This is unlikely to affect the overall cycle count
of the shader, since the compressed instruction has twice the issue
time, except due to the reduced pressure on the instruction cache.

Main motivation is avoiding instruction-count regressions in
combination with the following copy propagation improvements, which
will allow the SIMD16 g0-1 header setup emitted for framebuffer writes
to be copy-propagated into its LOAD_PAYLOAD, leading to the emission
of two SIMD8 MOV instructions instead of a single SIMD16 MOV.

Reverting this commit on top of the copy propagation changes would
lead to the following shader-db regressions on SKL and other
platforms:

 total instructions in shared programs: 14926738 -> 14935415 (0.06%)
 instructions in affected programs: 1892445 -> 1901122 (0.46%)
 helped: 0
 HURT: 8676

Without the following copy propagation changes this doesn't have any
effect on shader-db on Gen7+, because we would typically set up the FB
write header with a separate SIMD16 MOV that isn't currently
copy-propagated into the LOAD_PAYLOAD, so the individual SIMD8 MOVs
resulting from LOAD_PAYLOAD lowering would get register-coalesced away
under normal circumstances.  However that wasn't the case for MRF
LOAD_PAYLOAD destinations on Gen6 and earlier, because register
coalesce only kicks in for GRFs, leaving a number of redundant SIMD8
MOVs lying around.  On SNB this leads to the following shader-db
improvements:

 total instructions in shared programs: 10770538 -> 10734681 (-0.33%)
 instructions in affected programs: 2700655 -> 2664798 (-1.33%)
 helped: 17791
 HURT: 0

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
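
The percentages in these shader-db summaries are relative changes
against the "before" total. As a worked check of the SNB figures above,
here is a small sketch; percent_change() is a hypothetical helper, not
part of shader-db.

```c
/* Relative change between two totals, in percent.  A negative result
 * means the "after" total is smaller, i.e. an improvement. */
static double percent_change(unsigned long before, unsigned long after)
{
   return ((double)after - (double)before) / (double)before * 100.0;
}
```

For 10770538 -> 10734681 this gives roughly -0.333, reported as -0.33%;
for the affected programs, 2700655 -> 2664798 gives roughly -1.328,
reported as -1.33%.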
4 years ago st/dri: do FLUSH_VERTICES before calling flush_resource
Marek Olšák [Tue, 10 Dec 2019 20:45:14 +0000 (15:45 -0500)]
st/dri: do FLUSH_VERTICES before calling flush_resource

4 years ago gallium: add st_context_iface::flush_resource to call FLUSH_VERTICES
Marek Olšák [Tue, 10 Dec 2019 20:35:10 +0000 (15:35 -0500)]
gallium: add st_context_iface::flush_resource to call FLUSH_VERTICES

4 years ago anv: enable VK_KHR_swapchain_mutable_format
Lionel Landwerlin [Thu, 24 Jan 2019 12:03:56 +0000 (12:03 +0000)]
anv: enable VK_KHR_swapchain_mutable_format

Enable new tests in dEQP-VK.image.swapchain_mutable.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>

4 years ago vulkan/wsi: Implement VK_KHR_swapchain_mutable_format
Jason Ekstrand [Thu, 16 Jan 2020 20:39:58 +0000 (14:39 -0600)]
vulkan/wsi: Implement VK_KHR_swapchain_mutable_format

This is only the core WSI code for the extension.  It adds the image
format list and the flags to vkCreateImage as well as handling things
properly in the modifier queries.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>

4 years ago vulkan/wsi: Filter modifiers with ImageFormatProperties
Jason Ekstrand [Mon, 1 Oct 2018 21:00:32 +0000 (16:00 -0500)]
vulkan/wsi: Filter modifiers with ImageFormatProperties

Just because a modifier is returned for the given format, that doesn't
mean it works with all usages and flags.  We need to filter the list by
calling vkGetPhysicalDeviceImageFormatProperties2.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>
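
The filtering step can be sketched as an in-place compaction of the
modifier list. This is a sketch under assumptions, not the WSI code: to
stay self-contained it does not call Vulkan, so the per-modifier
vkGetPhysicalDeviceImageFormatProperties2 query is represented by a
hypothetical callback (modifier_supported_fn), and reject_one() is a
stand-in predicate used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for "does this modifier work with the
 * requested usage and flags?". */
typedef bool (*modifier_supported_fn)(uint64_t modifier, void *ctx);

/* Keep only the supported modifiers, preserving their order.
 * Returns the number of modifiers left at the front of the array. */
static size_t filter_modifiers(uint64_t *mods, size_t count,
                               modifier_supported_fn supported, void *ctx)
{
   size_t kept = 0;
   for (size_t i = 0; i < count; i++) {
      if (supported(mods[i], ctx))
         mods[kept++] = mods[i];
   }
   return kept;
}

/* Illustration-only predicate: rejects the single value that ctx
 * points to and accepts everything else. */
static bool reject_one(uint64_t modifier, void *ctx)
{
   return modifier != *(const uint64_t *)ctx;
}
```

In a real driver the callback would wrap the format-properties query
and return false whenever the query reports the combination as
unsupported.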

4 years ago vulkan/wsi: Use the interface from the real modifiers extension
Jason Ekstrand [Mon, 1 Oct 2018 21:14:24 +0000 (16:14 -0500)]
vulkan/wsi: Use the interface from the real modifiers extension

The anv implementation still isn't quite complete, but we can at least
start using the structs from the real extension.

v2: Fix circular pNext list (Lionel)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>

4 years ago vulkan/wsi: Move the ImageCreateInfo higher up
Jason Ekstrand [Mon, 1 Oct 2018 21:04:20 +0000 (16:04 -0500)]
vulkan/wsi: Move the ImageCreateInfo higher up

Future changes will be easier if we can modify it based on whether or
not we're using modifiers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>

4 years ago anv: Support modifiers in GetImageFormatProperties2
Jason Ekstrand [Thu, 16 Jan 2020 19:38:55 +0000 (13:38 -0600)]
anv: Support modifiers in GetImageFormatProperties2

Images with modifiers come with restrictions:

 1. They have to be simple 2D images right now

 2. They need to have a sensible format (not compressed, multi-plane, or
    non-power-of-two)

 3. If a CCS modifier is being requested, they have to actually support
    CCS_E and be CCS-compatible with any other formats the client may
    wish to use for image views.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>
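
The three restrictions above can be read as one predicate. The sketch
below is an illustration under assumptions and does not mirror anv's
data structures: struct image_query and all of its fields are
hypothetical, and "non-power-of-two" is interpreted here as the
per-pixel size in bytes.

```c
#include <stdbool.h>

/* Hypothetical summary of an image-format query. */
struct image_query {
   bool is_simple_2d;           /* rule 1 */
   bool is_compressed;          /* rule 2 */
   bool is_multi_planar;        /* rule 2 */
   unsigned bytes_per_pixel;    /* rule 2, assumed interpretation */
   bool wants_ccs_modifier;     /* rule 3 */
   bool supports_ccs_e;         /* rule 3 */
   bool views_ccs_compatible;   /* rule 3 */
};

static bool is_power_of_two(unsigned v)
{
   return v != 0 && (v & (v - 1)) == 0;
}

/* Return true if an image matching the query may use a modifier. */
static bool modifier_allowed(const struct image_query *q)
{
   if (!q->is_simple_2d)
      return false;

   if (q->is_compressed || q->is_multi_planar ||
       !is_power_of_two(q->bytes_per_pixel))
      return false;

   if (q->wants_ccs_modifier &&
       !(q->supports_ccs_e && q->views_ccs_compatible))
      return false;

   return true;
}
```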

4 years ago anv: Drop some VK_IMAGE_TILING_OPTIMAL checks
Jason Ekstrand [Mon, 1 Oct 2018 21:17:47 +0000 (16:17 -0500)]
anv: Drop some VK_IMAGE_TILING_OPTIMAL checks

The DRM format modifiers extension adds a TILING_DRM_FORMAT_MODIFIER
which will be used for modifiers so we can no longer use OPTIMAL to
indicate tiled inside the driver.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>

4 years ago aco: print assembly with CLRXdisasm for GFX6-GFX7 if found on the system
Samuel Pitoiset [Fri, 17 Jan 2020 08:49:44 +0000 (09:49 +0100)]
aco: print assembly with CLRXdisasm for GFX6-GFX7 if found on the system

LLVM only supports GFX8+. Using CLRXdisasm works most of the time,
so it's useful to add support for it.

Original patch by Daniel Schürmann.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3439>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3439>

4 years ago vulkan/wsi: disable the hardware cursor
Andres Rodriguez [Tue, 10 Sep 2019 18:30:35 +0000 (14:30 -0400)]
vulkan/wsi: disable the hardware cursor

Ensure the hardware cursor is disabled when we set the mode for a
VkDisplayKHR object. The extension doesn't expose any mechanisms to
program the hardware cursor, so we need to ensure it is hidden.

Currently, it seems like X is responsible for disabling the cursor
before handing over the lease. But that seems a little frail, and we
should be disabling the cursor ourselves so it works correctly
independently of how the lease was prepared for us.

Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1922>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1922>

4 years agogallium/swr: Disable showing detected arch message.
Krzysztof Raszkowski [Fri, 17 Jan 2020 15:43:33 +0000 (16:43 +0100)]
gallium/swr: Disable showing detected arch message.

When the swr driver is in use it prints a detected-architecture
message to standard error. This can be harmful when swr is used
in multi-node environments.
The message can be re-enabled by setting the env var SWR_PRINT_INFO to 1.
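
The opt-in check described above could be sketched roughly as follows.
This is only an illustration in C, not the swr code: the function
names and the message text are hypothetical, and only the variable
name SWR_PRINT_INFO and the value 1 come from this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the message is wanted only when the variable
 * is set to exactly "1"; unset or any other value keeps it quiet. */
bool swr_should_print_info(const char *env_value)
{
   return env_value != NULL && strcmp(env_value, "1") == 0;
}

/* Hypothetical reporting function; the wording of the message is an
 * assumption. Returns the number of characters written, 0 if silent. */
int swr_report_arch(FILE *out, const char *arch)
{
   assert(out != NULL && arch != NULL);
   if (!swr_should_print_info(getenv("SWR_PRINT_INFO")))
      return 0;
   return fprintf(out, "SWR detected %s\n", arch);
}
```

Keeping the decision in a function that takes the string as a
parameter makes the default-off behaviour easy to check without
touching the process environment.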

Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>

4 years agoaco: fix emitting slc for MUBUF instructions on GFX6-GFX7
Samuel Pitoiset [Fri, 17 Jan 2020 07:22:48 +0000 (08:22 +0100)]
aco: fix emitting slc for MUBUF instructions on GFX6-GFX7

Same as GFX10, only GFX8/GFX9 moved that bit near the opcode.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3437>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3437>

4 years agopanfrost/midgard: Fix swizzle for store instructions
Boris Brezillon [Thu, 16 Jan 2020 10:20:06 +0000 (11:20 +0100)]
panfrost/midgard: Fix swizzle for store instructions

The current logic considers that the nir_intrinsic_component(store_intr)
encodes the source components start, but it actually encodes the
destination one. Source component offset adjustment is taken care of in
install_registers_instr(), when offset_swizzle() is called.

This fixes dEQP-GLES2.functional.shaders.random.all_features.fragment.45
when PAN_MESA_DEBUG=deqp (looks like exposing GLES3 features has an
impact on the varyings layout).

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3429>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3429>

4 years agodocs: do not double-close link tag
Erik Faye-Lund [Thu, 16 Jan 2020 16:56:13 +0000 (17:56 +0100)]
docs: do not double-close link tag

Fixes: f8148d0cc17 "docs: remove mailing list as way of submitting patches"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>

4 years agodocs: remove double-closed definition-list
Erik Faye-Lund [Thu, 16 Jan 2020 16:45:55 +0000 (17:45 +0100)]
docs: remove double-closed definition-list

Fixes: bc17ac58661 "docs: add documentation for building with meson"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>

4 years agodocs: move paragraph closing tag
Erik Faye-Lund [Thu, 16 Jan 2020 16:39:39 +0000 (17:39 +0100)]
docs: move paragraph closing tag

The pre-tag right before is a block-level tag, which means it implicitly
terminates the paragraph. So there's no paragraph to close after this.
Instead, move the paragraph-closing before the pre-tag, to explicitly
close the paragraph.

Fixes: 41b3eb08d9f "docs: update meson docs for windows"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>

4 years agodocs: use code-tags instead of pre-tags
Erik Faye-Lund [Thu, 16 Jan 2020 17:01:41 +0000 (18:01 +0100)]
docs: use code-tags instead of pre-tags

Similar to the previous two commits, it seems more appropriate to use
code-tags here than pre-tag.

Fixes: 9af6c38deff "docs: Add use of Closes: tag for closing gitlab issues"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>

4 years agodocs: use code-tags instead of pre-tags
Erik Faye-Lund [Thu, 16 Jan 2020 16:49:34 +0000 (17:49 +0100)]
docs: use code-tags instead of pre-tags

Similar to the previous commit, code-tags seem more appropriate than
pre-tags here. So let's change it.

Fixes: ca0c1e69cab "docs: update releasing process to use new scripts and gitlab"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>

4 years agodocs: use code-tag instead of pre-tag
Erik Faye-Lund [Thu, 16 Jan 2020 16:32:19 +0000 (17:32 +0100)]
docs: use code-tag instead of pre-tag

It's unlikely the author meant to use <pre> here, as that starts a whole
new block. Instead, the inline code-tag seems more appropriate here.

Fixes: 41b3eb08d9f "docs: update meson docs for windows"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>

4 years agodocs: open paragraph before closing it
Erik Faye-Lund [Thu, 16 Jan 2020 16:27:16 +0000 (17:27 +0100)]
docs: open paragraph before closing it

Fixes: 44c5e634a5c "docs: update meson docs for windows"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>

4 years agodocs: fix paragraphs
Erik Faye-Lund [Thu, 16 Jan 2020 16:21:50 +0000 (17:21 +0100)]
docs: fix paragraphs

Paragraphs are terminated by pre-tags, so the latter one closes a new,
empty one. Let's split the paragraph in two around the pre-tag instead.

Fixes: c0dfe8c6dfd "docs: do not use div for line-breaking"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>

4 years agodocs: fix typo in html tag name
Erik Faye-Lund [Thu, 16 Jan 2020 16:13:12 +0000 (17:13 +0100)]
docs: fix typo in html tag name

Fixes: 5d11a828e10 "docs: update install docs for meson"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>

4 years agoutil: call bind_sampler_states before setting sampler_views
Pierre-Eric Pelloux-Prayer [Sat, 15 Feb 2020 22:05:53 +0000 (23:05 +0100)]
util: call bind_sampler_states before setting sampler_views

Fixes the following valgrind error:

    Invalid read of size 16
       at 0x28F458A1: si_set_sampler_view_desc (in radeonsi_drv_video.so)
       by 0x28F4657E: si_set_sampler_views (in radeonsi_drv_video.so)
       by 0x28D62BF5: util_compute_blit (in radeonsi_drv_video.so)
       by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
       by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
       by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
     Address 0x18142a10 is 0 bytes inside a block of size 48 free'd
       at 0x48369AB: free (vg_replace_malloc.c:540)
       by 0x28D62D51: util_compute_blit (in radeonsi_drv_video.so)
       by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
       by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
       by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
     Block was alloc'd at
       at 0x4837B65: calloc (vg_replace_malloc.c:762)
       by 0x28EFB2EC: si_create_sampler_state (in radeonsi_drv_video.so)
       by 0x28D62C30: util_compute_blit (in radeonsi_drv_video.so)
       by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
       by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
       by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)

Fixes: 69430d7e59e ("va: use a compute shader for the blit")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2321
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>

4 years agonir: Fix printing of ~0 .locations.
Eric Anholt [Thu, 19 Dec 2019 00:46:41 +0000 (16:46 -0800)]
nir: Fix printing of ~0 .locations.

I kept wondering what "429" meant in variable declarations, when it was
just a truncated ~0 snprintf.
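
The truncation is easy to reproduce in isolation. The sketch below is
not the nir_print code; the function name and the 4-byte buffer are
assumptions chosen to show how a 32-bit ~0 (4294967295) ends up
printed as "429".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format `location` into a caller-supplied buffer. With buf_size == 4
 * only three digits plus the terminating NUL fit, so longer values are
 * cut short. The return value is the length the full string needs. */
int format_location(char *buf, size_t buf_size, unsigned location)
{
   assert(buf != NULL && buf_size > 0);
   return snprintf(buf, buf_size, "%u", location);
}
```

Comparing the return value against the buffer size is the usual way
to notice that the output was cut short.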

Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3423>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3423>

4 years agomeson: use github URL for wraps instead of completely unreliable wrapdb
Eric Engestrom [Tue, 14 Jan 2020 15:53:21 +0000 (15:53 +0000)]
meson: use github URL for wraps instead of completely unreliable wrapdb

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3391>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3391>

4 years agodocs: Update release calendar for 20.0
Dylan Baker [Wed, 15 Jan 2020 20:37:52 +0000 (12:37 -0800)]
docs: Update release calendar for 20.0

Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3417>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3417>

4 years agolima: Fix alpha blending
Andreas Baierl [Wed, 15 Jan 2020 14:31:39 +0000 (15:31 +0100)]
lima: Fix alpha blending

Introduce separate helper functions to set the blendfactor bits.

Lima uses bits 0-2 for the type, bit 3 sets the inverted function
and bit 4 is set if alpha is used.
alpha_src_factor and alpha_dst_factor don't need the alpha bit, so
they are masked with 0xf. There is only room for 4 bits anyway.
If alpha_src_factor is PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE, we need
to change it to PIPE_BLENDFACTOR_ONE first.
This is exactly what the blob does and we pass all
dEQP-GLES2.functional.fragment_ops.blend.* tests now.
Better than the blob btw...
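
The bit layout described above can be sketched as follows. This is a
simplified illustration, not the lima helpers: the enum, the macro
names and the function names are hypothetical, and the mapping from
PIPE_BLENDFACTOR_* values to the 3-bit hardware type is left out
because the commit message does not give it.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_FACTOR_TYPE_MASK 0x07u /* bits 0-2: factor type */
#define SKETCH_FACTOR_INVERT    0x08u /* bit 3: inverted function */
#define SKETCH_FACTOR_ALPHA     0x10u /* bit 4: alpha is used */

/* Hypothetical stand-ins for the gallium blend factors mentioned. */
enum sketch_factor {
   SKETCH_ONE,
   SKETCH_SRC_ALPHA,
   SKETCH_SRC_ALPHA_SATURATE,
};

/* Pack a colour blend factor into its 5-bit field. */
unsigned pack_color_factor(unsigned type, bool invert, bool alpha)
{
   assert(type <= SKETCH_FACTOR_TYPE_MASK);
   return type | (invert ? SKETCH_FACTOR_INVERT : 0u) |
          (alpha ? SKETCH_FACTOR_ALPHA : 0u);
}

/* Alpha factors only have 4 bits, so the alpha bit is masked off. */
unsigned pack_alpha_factor(unsigned type, bool invert, bool alpha)
{
   return pack_color_factor(type, invert, alpha) & 0xfu;
}

/* SRC_ALPHA_SATURATE as an alpha source factor is replaced by ONE
 * before it is packed. */
enum sketch_factor fixup_alpha_src(enum sketch_factor f)
{
   return f == SKETCH_SRC_ALPHA_SATURATE ? SKETCH_ONE : f;
}
```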

Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3411>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3411>

4 years agoaco: ignore parallelcopies to the same register on jump threading
Daniel Schürmann [Tue, 14 Jan 2020 12:14:38 +0000 (13:14 +0100)]
aco: ignore parallelcopies to the same register on jump threading

The more conservative lowering to CSSA inserts unnecessary parallelcopies
which might get coalesced and can be ignored on jump threading.
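
The check this implies can be sketched as below. aco itself is
written in C++ and uses its own instruction types; the struct and
function here are hypothetical C stand-ins that only show the idea of
treating a parallelcopy as empty when every copy targets its own
source register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of a parallelcopy. */
struct sketch_copy {
   unsigned dst_reg;
   unsigned src_reg;
};

/* True when every copy writes a register to itself, so the whole
 * parallelcopy has no effect and can be ignored. */
bool parallelcopy_is_noop(const struct sketch_copy *copies, size_t count)
{
   assert(copies != NULL || count == 0);
   for (size_t i = 0; i < count; i++) {
      if (copies[i].dst_reg != copies[i].src_reg)
         return false;
   }
   return true;
}
```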

v2: outline is_empty_block() check.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3385>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3385>

4 years agoaco: handle phi affinities transitively through parallelcopies
Daniel Schürmann [Mon, 13 Jan 2020 14:13:19 +0000 (15:13 +0100)]
aco: handle phi affinities transitively through parallelcopies

This can coalesce most unnecessarily inserted parallelcopies
from lowering to CSSA.

v2: refactor loop a bit to make it more efficient and readable.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3385>

4 years agoaco: rework lower_to_cssa()
Daniel Schürmann [Mon, 13 Jan 2020 16:35:11 +0000 (17:35 +0100)]
aco: rework lower_to_cssa()

This patch changes lower_to_cssa to be much more conservative
about assumptions regarding which phi operands might interfere.
Previously, this pass wasn't exhaustive and could miss some corner cases.

v2: remove optimizations to find better insertion points as it's hard
to guarantee that they are always correct and have overall no benefit.

Fixes: 0b8216b2cdbcaccfd2bd1a65be6b8ac5654e3067 ('aco: Lower to CSSA')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3385>

4 years agoaco: implement stream output with vec3 on GFX6
Samuel Pitoiset [Wed, 15 Jan 2020 13:44:26 +0000 (14:44 +0100)]
aco: implement stream output with vec3 on GFX6

GFX6 doesn't support vec3.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412>

4 years agoaco: do not combine additions of DS instructions on GFX6
Samuel Pitoiset [Wed, 15 Jan 2020 09:47:17 +0000 (10:47 +0100)]
aco: do not combine additions of DS instructions on GFX6

The offset field doesn't work as expected on GFX6.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412>

4 years agoaco: do not select 96-bit/128-bit variants for ds_read/ds_write on GFX6
Samuel Pitoiset [Tue, 14 Jan 2020 17:46:36 +0000 (18:46 +0100)]
aco: do not select 96-bit/128-bit variants for ds_read/ds_write on GFX6

Only GFX7 and later support large ds_read/ds_write.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3412>

4 years agointel/perf: report query split for mdapi
Lionel Landwerlin [Mon, 16 Dec 2019 13:42:55 +0000 (15:42 +0200)]
intel/perf: report query split for mdapi

Also forgotten in the initial implementation.

v2: Report begin timestamp scaled by the timestamp frequency (Windows
    behavior)

v3: Rename split to disjoint to match GL terminology (Tapani)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3112>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3112>

4 years agointel/perf: expose timestamp begin for mdapi
Lionel Landwerlin [Mon, 16 Dec 2019 13:36:24 +0000 (15:36 +0200)]
intel/perf: expose timestamp begin for mdapi

This was forgotten in the initial implementation.

v2: ensure the value is written for both GL & Vulkan queries

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3112>

4 years agoanv: set depth stall enabled when depth flush enabled on gen12
Tapani Pälli [Tue, 14 Jan 2020 08:03:21 +0000 (10:03 +0200)]
anv: set depth stall enabled when depth flush enabled on gen12

This implements HW workaround #1409600907 for anv driver.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3378>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3378>

4 years agoiris: set depth stall enabled when depth flush enabled on gen12
Tapani Pälli [Tue, 14 Jan 2020 08:02:05 +0000 (10:02 +0200)]
iris: set depth stall enabled when depth flush enabled on gen12

This implements HW workaround #1409600907 for iris driver.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3378>

4 years agoanv: implement another workaround for non pipelined states
Lionel Landwerlin [Wed, 15 Jan 2020 13:14:23 +0000 (15:14 +0200)]
anv: implement another workaround for non pipelined states

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3408>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3408>

4 years agoiris: implement another workaround for non pipelined states
Lionel Landwerlin [Wed, 15 Jan 2020 13:14:10 +0000 (15:14 +0200)]
iris: implement another workaround for non pipelined states

v2: add comment (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3408>

4 years agoiris: handle new PIPE_CONTROL field
Lionel Landwerlin [Wed, 15 Jan 2020 13:13:43 +0000 (15:13 +0200)]
iris: handle new PIPE_CONTROL field

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3408>

4 years agogenxml: add new Gen11+ PIPE_CONTROL field
Lionel Landwerlin [Wed, 15 Jan 2020 13:11:08 +0000 (15:11 +0200)]
genxml: add new Gen11+ PIPE_CONTROL field

PIPE_CONTROL gained a new field in its first DWORD on Gen11. We had no
use for it so far, but we start using it on Gen12.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3408>

4 years agost/mesa: Allocate full miplevels if MaxLevel is explicitly set
Kenneth Graunke [Wed, 15 Jan 2020 00:25:11 +0000 (16:25 -0800)]
st/mesa: Allocate full miplevels if MaxLevel is explicitly set

Some applications explicitly call glTex[ture]Parameteri[v] to set
GL_TEXTURE_MAX_LEVEL and GL_TEXTURE_BASE_LEVEL before uploading any
texture data.  Core Mesa initializes MaxLevel to 1000, so if it isn't
that, we know they've set it.  (We check for < TEXTURE_MAX_LEVELS to
avoid hardcoding that value, however.)

If MaxLevel - BaseLevel > 0, then the app is trying to tell us that
this texture is going to have multiple miplevels.  In that case, go
ahead and allocate the space for it.
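
The heuristic amounts to a two-part test, sketched below. This is not
the st/mesa code: the function name is hypothetical and the value
used for the levels limit is a placeholder, since the commit message
only says that the limit is smaller than the default MaxLevel of 1000.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder for the driver-side limit on miplevels (assumption). */
#define SKETCH_TEXTURE_MAX_LEVELS 15

/* True when the app has explicitly set MaxLevel (it is below the
 * limit rather than at the default of 1000) and the base..max range
 * spans more than one level. */
bool app_requested_miplevels(int base_level, int max_level)
{
   assert(base_level >= 0 && max_level >= 0);
   return max_level < SKETCH_TEXTURE_MAX_LEVELS &&
          max_level - base_level > 0;
}
```

With the default MaxLevel of 1000 the first condition fails, so
textures that never touch these parameters keep the old behaviour.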

Avoids many resource_copy_region calls at texture finalization time
in the Civilization VI benchmark.

Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3401>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3401>

4 years agoaco: fix emitting SMEM instructions with no operands on GFX6-GFX7
Samuel Pitoiset [Wed, 15 Jan 2020 12:08:17 +0000 (13:08 +0100)]
aco: fix emitting SMEM instructions with no operands on GFX6-GFX7

Like s_memtime.

Fixes dEQP-VK.glsl.shader_clock.* on GFX6.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3407>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3407>

4 years agolima: fix handling of reverse depth range
Vasily Khoruzhick [Wed, 15 Jan 2020 03:53:29 +0000 (19:53 -0800)]
lima: fix handling of reverse depth range

Looks like we need to handle the cases where near > far and near == far.
In the first case we just need to swap near and far, and in the second we
need to subtract epsilon from near if it's not zero.
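
A minimal sketch of the two cases, assuming the caller supplies the
epsilon. The struct and function names are hypothetical. The case
near == far == 0 is left unchanged here because the message does not
say how the driver handles it.

```c
#include <assert.h>

struct sketch_depth_range {
   float near_z;
   float far_z;
};

/* Return a depth range with near_z <= far_z, following the two
 * cases described above. */
struct sketch_depth_range
fixup_depth_range(float near_z, float far_z, float epsilon)
{
   struct sketch_depth_range r = { near_z, far_z };

   assert(epsilon > 0.0f);
   if (r.near_z > r.far_z) {
      /* Reversed range: swap the two ends. */
      float tmp = r.near_z;
      r.near_z = r.far_z;
      r.far_z = tmp;
   } else if (r.near_z == r.far_z && r.near_z != 0.0f) {
      /* Empty range: move near slightly below far. */
      r.near_z -= epsilon;
   }
   return r;
}
```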

Fixes 10 tests in dEQP-GLES2.functional.depth_range.*

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3400>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3400>

4 years agonvc0: disable xfb's which don't have a stride
Ilia Mirkin [Thu, 16 Jan 2020 00:46:59 +0000 (19:46 -0500)]
nvc0: disable xfb's which don't have a stride

No stride / no attributes means that nothing is being written to the
buffer. However it might still prevent primitives from being written out
to the other buffers. Disabling it entirely seems to fix it.

Fixes GTF-GL45.gtf30.GL3Tests.transform_feedback.transform_feedback_overflow

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>

4 years agolima/ppir: implement full liveness analysis for regalloc
Erico Nunes [Sun, 12 Jan 2020 14:11:55 +0000 (15:11 +0100)]
lima/ppir: implement full liveness analysis for regalloc

The existing liveness analysis in ppir still ultimately relies on a
single continuous live_in and live_out range per register and was
observed to be the bottleneck for register allocation on complicated
examples with several control flow blocks.
The use of live_in and live_out ranges was fine before ppir got control
flow, but now it ends up creating unnecessary interferences as live_in
and live_out ranges may span across entire blocks after blocks get
placed sequentially.

This new liveness analysis implementation generates a set of live
variables at each program point: before and after each instruction, and
at the beginning and end of each block.
This is a global analysis and propagates the sets of live registers
across blocks independently of their sequence.
The resulting sets optimally represent all variables that cannot share a
register at each program point, so can be directly translated as
interferences to the register allocator.

Special care has to be taken with non-ssa registers. In order to
properly define their live range, their alive components also need to be
tracked. Therefore ppir can't use simple bitsets to keep track of live
registers.

The algorithm uses an auxiliary set data structure to keep track of the
live registers. The initial implementation used only trivial arrays,
however regalloc execution time was then prohibitive (>1minute on
Cortex-A53) on extreme benchmarks with hundreds of instructions,
hundreds of registers and several spilling iterations, mostly due to the
n^2 complexity to generate the interferences from the live sets. Since
the set of live registers is only a very sparse subset of all registers at
each instruction, iterating only over this subset allows it to run very
fast again (a couple of seconds for the same benchmark).
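
The general shape of such an analysis can be sketched as a backward
dataflow pass iterated to a fixed point. The code below is a
simplified illustration and not the ppir implementation: all type and
function names are hypothetical, each instruction has at most one
definition and one use, and the live sets are small dense arrays of
per-register component masks rather than the sparse set structure the
commit describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS   4
#define MAX_INSTRS 4
#define MAX_SUCCS  2

struct sketch_instr {
   int def_reg;        /* register written, or -1 for none */
   uint8_t def_mask;   /* components written */
   int use_reg;        /* register read, or -1 for none */
   uint8_t use_mask;   /* components read */
};

struct sketch_block {
   int num_instrs;
   struct sketch_instr instrs[MAX_INSTRS];
   int succ[MAX_SUCCS];          /* successor block indices, or -1 */
   uint8_t live_in[NUM_REGS];    /* must start zeroed */
   uint8_t live_out[NUM_REGS];   /* must start zeroed */
};

/* Propagate live component masks backwards until nothing changes.
 * Block order does not matter for correctness, only for speed. */
void compute_liveness(struct sketch_block *blocks, int num_blocks)
{
   bool changed;

   assert(blocks != NULL && num_blocks > 0);
   do {
      changed = false;
      for (int b = num_blocks - 1; b >= 0; b--) {
         struct sketch_block *blk = &blocks[b];
         uint8_t live[NUM_REGS] = {0};

         /* live_out is the union of the successors' live_in sets. */
         for (int s = 0; s < MAX_SUCCS; s++) {
            if (blk->succ[s] < 0)
               continue;
            for (int r = 0; r < NUM_REGS; r++)
               live[r] |= blocks[blk->succ[s]].live_in[r];
         }
         memcpy(blk->live_out, live, sizeof(live));

         /* Walk instructions in reverse: a write kills only the
          * components it writes, a read makes components live. */
         for (int i = blk->num_instrs - 1; i >= 0; i--) {
            const struct sketch_instr *ins = &blk->instrs[i];
            if (ins->def_reg >= 0)
               live[ins->def_reg] &= (uint8_t)~ins->def_mask;
            if (ins->use_reg >= 0)
               live[ins->use_reg] |= ins->use_mask;
         }

         if (memcmp(blk->live_in, live, sizeof(live)) != 0) {
            memcpy(blk->live_in, live, sizeof(live));
            changed = true;
         }
      }
   } while (changed);
}
```

Any two registers whose masks are non-zero in the same set would then
be reported to the register allocator as interfering.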

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>

4 years agolima/ppir: remove orphan load node after cloning
Erico Nunes [Sun, 12 Jan 2020 13:30:26 +0000 (14:30 +0100)]
lima/ppir: remove orphan load node after cloning

There are some cases in shaders using control flow where the varying load
is cloned to every block, and then the original node is left orphaned.
This is not harmful for program execution, but it complicates analysis
for register allocation as there is now a case of writing to a register
that is never read.
Since ppir doesn't have a dead code elimination pass for its own
optimizations, and it is not hard to detect when we cloned the last load,
let's remove it early.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>

4 years agoiris: Print warning and return *out = NULL when fd to syncobj fails
Kristian H. Kristensen [Wed, 15 Jan 2020 00:56:41 +0000 (16:56 -0800)]
iris: Print warning and return *out = NULL when fd to syncobj fails

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

4 years agoiris: Advertise PIPE_CAP_NATIVE_FENCE_FD
Kristian H. Kristensen [Thu, 19 Dec 2019 18:56:03 +0000 (10:56 -0800)]
iris: Advertise PIPE_CAP_NATIVE_FENCE_FD

Enables EGL_ANDROID_native_fence_sync.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

4 years agoiris: Fix export of fences that have already completed.
Kenneth Graunke [Thu, 19 Dec 2019 21:51:07 +0000 (13:51 -0800)]
iris: Fix export of fences that have already completed.

After flushing batches, iris_fence_flush() asks the kernel whether
each batch's last_syncpt has already signalled or not.  (The idea is
that either the compute or render batch may not have actually had any
work queued up, so last_syncpt there might have been signalled a long
time ago.)  If it's already completed, we don't bother to record it.

A strange corner is the case of repeated flushes.  For example, we
might flush for some reason, and hit a glFlush(), and hit SwapBuffers.
It's possible for all the batches to have been flushed previously, -and-
for them to have actually completed.  In this case, we'll see that there
are no syncobj's to wait on, and record fence->count == 0.

This works fine internally - fence_finish can see count == 0 and realize
that it doesn't need to wait, for example.  But when working with native
FDs, we may be asked to export a fence with count == 0.  So we need an
actual synchronization primitive we can hand off.  Because all of the
relevant batches had been signalled when creating the fence, we want the
new dummy fence to be signalled as well.

So we just make a signalled syncobj and export it.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>

4 years agoandroid: Fix whitespace issue
Robert Foss [Wed, 15 Jan 2020 00:11:23 +0000 (01:11 +0100)]
android: Fix whitespace issue

Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

4 years agopanfrost: Prefix schedule_program to prevent collision
Robert Foss [Wed, 15 Jan 2020 00:14:16 +0000 (01:14 +0100)]
panfrost: Prefix schedule_program to prevent collision

Currently the schedule_program implementation being used is picked
at compile time, which on the Android platform means that the
bifrost compiler & scheduler is used for all targets, including
midgard based hardware.

This commit disambiguates between the two schedule_program functions.

Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>

4 years agoradeonsi: merge si_compile_llvm and si_llvm_compile functions
Marek Olšák [Wed, 15 Jan 2020 01:40:51 +0000 (20:40 -0500)]
radeonsi: merge si_compile_llvm and si_llvm_compile functions

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>

4 years agoradeonsi: remove useless #includes
Marek Olšák [Wed, 15 Jan 2020 01:27:24 +0000 (20:27 -0500)]
radeonsi: remove useless #includes

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>

4 years agoradeonsi: move code for shader resources into si_shader_llvm_resources.c
Marek Olšák [Wed, 15 Jan 2020 01:17:08 +0000 (20:17 -0500)]
radeonsi: move code for shader resources into si_shader_llvm_resources.c

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>

4 years agoradeonsi: move geometry shader code into si_shader_llvm_gs.c
Marek Olšák [Wed, 15 Jan 2020 01:03:48 +0000 (20:03 -0500)]
radeonsi: move geometry shader code into si_shader_llvm_gs.c

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>

4 years agoradeonsi: remove llvm_type_is_64bit
Marek Olšák [Wed, 15 Jan 2020 00:29:34 +0000 (19:29 -0500)]
radeonsi: remove llvm_type_is_64bit

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>

4 years agoradeonsi: move tessellation shader code into si_shader_llvm_tess.c
Marek Olšák [Wed, 15 Jan 2020 00:13:42 +0000 (19:13 -0500)]
radeonsi: move tessellation shader code into si_shader_llvm_tess.c

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>

4 years agoradeonsi: move si_insert_input_* functions
Marek Olšák [Tue, 24 Dec 2019 00:53:46 +0000 (19:53 -0500)]
radeonsi: move si_insert_input_* functions

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3399>

4 years agoradeonsi: work around an LLVM crash when using llvm.amdgcn.icmp.i64.i1
Marek Olšák [Thu, 9 Jan 2020 02:52:26 +0000 (21:52 -0500)]
radeonsi: work around an LLVM crash when using llvm.amdgcn.icmp.i64.i1

Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3338>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3338>

4 years agoradeonsi: fix si_build_wrapper_function for compute-based primitive culling
Marek Olšák [Thu, 9 Jan 2020 02:51:23 +0000 (21:51 -0500)]
radeonsi: fix si_build_wrapper_function for compute-based primitive culling

Fixes: 3b143369a55 "ac/nir, radv, radeonsi: Switch to using ac_shader_args"
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3338>

4 years agoradeonsi/gfx10: separate code for determining the number of vertices for NGG
Marek Olšák [Fri, 3 Jan 2020 23:02:30 +0000 (18:02 -0500)]
radeonsi/gfx10: separate code for determining the number of vertices for NGG

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

4 years agoradeonsi/gfx10: separate code for getting edgeflags from the gs_invocation_id VGPR
Marek Olšák [Fri, 3 Jan 2020 21:25:48 +0000 (16:25 -0500)]
radeonsi/gfx10: separate code for getting edgeflags from the gs_invocation_id VGPR

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

4 years agoradeonsi: move VS_STATE.LS_OUT_PATCH_SIZE a few bits higher to make space there
Marek Olšák [Tue, 24 Dec 2019 01:17:46 +0000 (20:17 -0500)]
radeonsi: move VS_STATE.LS_OUT_PATCH_SIZE a few bits higher to make space there

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>

4 years agoradeonsi: make si_insert_input_* functions non-static
Marek Olšák [Tue, 24 Dec 2019 00:53:46 +0000 (19:53 -0500)]
radeonsi: make si_insert_input_* functions non-static

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>