mesa.git
4 years agoaco/wave32: Fix reductions.
Timur Kristóf [Wed, 27 Nov 2019 15:59:11 +0000 (16:59 +0100)]
aco/wave32: Fix reductions.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Allow setting the subgroup ballot size to 64-bit.
Timur Kristóf [Thu, 28 Nov 2019 09:41:19 +0000 (10:41 +0100)]
aco/wave32: Allow setting the subgroup ballot size to 64-bit.

Previously, it would only work when the ballot size was set to the
lane mask. This patch makes is possible to set the ballot size
to either 32-bit or 64-bit for wave32 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Use wave_size for barrier intrinsic.
Timur Kristóf [Fri, 22 Nov 2019 16:07:34 +0000 (17:07 +0100)]
aco/wave32: Use wave_size for barrier intrinsic.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Fix load_local_invocation_index to support wave32.
Timur Kristóf [Wed, 27 Nov 2019 10:09:20 +0000 (11:09 +0100)]
aco/wave32: Fix load_local_invocation_index to support wave32.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Use lane mask regclass for exec/vcc.
Timur Kristóf [Wed, 27 Nov 2019 10:04:47 +0000 (11:04 +0100)]
aco/wave32: Use lane mask regclass for exec/vcc.

Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Add wave size specific opcodes to aco_builder.
Timur Kristóf [Thu, 31 Oct 2019 12:28:54 +0000 (13:28 +0100)]
aco/wave32: Add wave size specific opcodes to aco_builder.

Several places in ACO we use SOP1 or SOP2 instructions to operate over the
exec mask or VCC, and these need to be adapted to the new size in wave32
mode.

This commit adds a way to deal with this problem in aco_builder: the caller
can specify a wave size specific opcode and the builder will translate that
to the correct opcode based on the current wave size.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Introduce emit_mbcnt which takes wave size into account.
Timur Kristóf [Thu, 31 Oct 2019 10:26:14 +0000 (11:26 +0100)]
aco/wave32: Introduce emit_mbcnt which takes wave size into account.

This is relevant because in wave32 mode the v_mbcnt_hi_u32_b32
instruction is superfluous.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Replace hardcoded numbers in spiller with wave size.
Timur Kristóf [Mon, 28 Oct 2019 16:15:17 +0000 (17:15 +0100)]
aco/wave32: Replace hardcoded numbers in spiller with wave size.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco/wave32: Change uniform bool optimization to work with wave32.
Timur Kristóf [Fri, 22 Nov 2019 10:57:45 +0000 (11:57 +0100)]
aco/wave32: Change uniform bool optimization to work with wave32.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: Optimize load_subgroup_id to one bit field extract instruction.
Timur Kristóf [Fri, 22 Nov 2019 14:13:54 +0000 (15:13 +0100)]
aco: Optimize load_subgroup_id to one bit field extract instruction.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: Remove lower_linear_bool_phi, it is not needed anymore.
Timur Kristóf [Thu, 21 Nov 2019 11:31:14 +0000 (12:31 +0100)]
aco: Remove lower_linear_bool_phi, it is not needed anymore.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: Remove superfluous argument from emit_boolean_logic.
Timur Kristóf [Thu, 21 Nov 2019 11:28:31 +0000 (12:28 +0100)]
aco: Remove superfluous argument from emit_boolean_logic.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: Fix operand of s_bcnt1_i32_b64 in emit_boolean_reduce.
Timur Kristóf [Thu, 21 Nov 2019 11:26:36 +0000 (12:26 +0100)]
aco: Fix operand of s_bcnt1_i32_b64 in emit_boolean_reduce.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agogitlab-ci: Run piglit glslparser & quick_shader tests separately
Michel Dänzer [Tue, 3 Dec 2019 09:45:28 +0000 (10:45 +0100)]
gitlab-ci: Run piglit glslparser & quick_shader tests separately

And only use --process-isolation false for the quick_gl tests.

This will hopefully avoid variance in the test results that we've been
seeing lately. But even if it doesn't, it should at least help narrow
down the cause of the variance.

Tested-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agointel/perf: fix improper pointer access
Lionel Landwerlin [Tue, 3 Dec 2019 14:35:45 +0000 (16:35 +0200)]
intel/perf: fix improper pointer access

This expression was unused by the macro, probably why it didn't
register in the compilation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/perf: simplify the processing of OA reports
Lionel Landwerlin [Tue, 3 Dec 2019 14:33:25 +0000 (16:33 +0200)]
intel/perf: simplify the processing of OA reports

This is a more accurate description of what happens in processing the
OA reports.

Previously we only had a somewhat difficult to parse state machine
tracking the context ID.

What we really only need to do to decide if the delta between 2
reports (r0 & r1) should be accumulated in the query result is :

   * whether the r0 is tagged with the context ID relevant to us

   * if r0 is not tagged with our context ID and r1 is: does r0 have a
     invalid context id? If not then we're in a case where i915 has
     resubmitted the same context for execution through the execlist
     submission port

v2: Update comment (Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/perf: take into account that reports read can be fairly old
Lionel Landwerlin [Tue, 3 Dec 2019 14:19:24 +0000 (16:19 +0200)]
intel/perf: take into account that reports read can be fairly old

If we read the OA reports late enough after the query happens, we can
get a timestamp in the report that is significantly in the past
compared to the start timestamp of the query. The current code must
deal with the wraparound of the timestamp value (every ~6 minute). So
consider that if the difference is greater than half that wraparound
period, we're probably dealing with an old report and make the caller
aware it should read more reports when they're available.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/perf: set read buffer len to 0 to identify empty buffer
Lionel Landwerlin [Tue, 3 Dec 2019 14:12:03 +0000 (16:12 +0200)]
intel/perf: set read buffer len to 0 to identify empty buffer

We always add an empty buffer in the list when creating the query.
Let's set the len appropriately so that we can recognize it when we
read OA reports up to the end of a query.

We were using an 0 timestamp value associated with the empty buffer
and incorrectly assuming this was a valid value. In turn that led to
not reading enough reports and resulted in deltas added to our counter
values which should have been discarded because those would be flagged
for a different context.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agointel/perf: fix invalid hw_id in query results
Lionel Landwerlin [Tue, 3 Dec 2019 14:08:12 +0000 (16:08 +0200)]
intel/perf: fix invalid hw_id in query results

Accumulation happens between 2 reports, it can be between a start/end
report from another context. So only consider updating the hw_id of
the results when it's not already valid and that we have a valid value
to put in there.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 41b54b5faf ("i965: move OA accumulation code to intel/perf")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agoradeonsi: display cs blit count for AMD_DEBUG=testdma
Pierre-Eric Pelloux-Prayer [Fri, 4 Oct 2019 13:24:34 +0000 (15:24 +0200)]
radeonsi: display cs blit count for AMD_DEBUG=testdma

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoradeonsi: implement sdma for GFX9
Pierre-Eric Pelloux-Prayer [Fri, 29 Nov 2019 13:01:10 +0000 (14:01 +0100)]
radeonsi: implement sdma for GFX9

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoradv/gfx10: fix the vertex order for triangle strips emitted by a GS
Samuel Pitoiset [Sat, 30 Nov 2019 08:03:31 +0000 (09:03 +0100)]
radv/gfx10: fix the vertex order for triangle strips emitted by a GS

My fix wasn't totally correct as pointed out by Marek.
Ported from RadeonSI.

Fixes: deafe4cc587 ("radv/gfx10: fix primitive indices orientation for NGG GS")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: simplify a check in radv_fixup_vertex_input_fetches()
Samuel Pitoiset [Mon, 2 Dec 2019 15:46:30 +0000 (16:46 +0100)]
radv: simplify a check in radv_fixup_vertex_input_fetches()

The number of loaded channels should always be > 0 now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoradv: remove dead shader input/output variables
Samuel Pitoiset [Mon, 2 Dec 2019 15:33:06 +0000 (16:33 +0100)]
radv: remove dead shader input/output variables

No pipeline-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoiris: Stop setting up fake params
Jason Ekstrand [Mon, 18 Nov 2019 21:40:09 +0000 (15:40 -0600)]
iris: Stop setting up fake params

In d1c4e64a69e, we added a parameter to tell the back-end compiler to
ignore the param array and just push however many constants you ask it
to push.  Iris doesn't want to push anything so it gives a bogus number
of parameters and trusts the back-end compiler to dead-code all of them.
Now that we can tell the back-end compiler to stop re-arranging things,
delete the hack and enable the new simpler code path.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
4 years agogallium/scons: fix graw-xlib build on OSX.
Dave Airlie [Sun, 1 Dec 2019 22:51:25 +0000 (08:51 +1000)]
gallium/scons: fix graw-xlib build on OSX.

Fixes: 44a6b0107b37 (gallivm: add nir->llvm translation (v2))
Tested-by: Vinson Lee <vlee@freedesktop.org>
4 years agollvmpipe: enable texcoord semantics
Dave Airlie [Tue, 3 Dec 2019 23:26:46 +0000 (09:26 +1000)]
llvmpipe: enable texcoord semantics

To make NIR transitioning easier, move the driver to using
texcoord semantics.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoanv: Respect the always_flush_cache driconf option
Jason Ekstrand [Mon, 11 Nov 2019 18:46:33 +0000 (12:46 -0600)]
anv: Respect the always_flush_cache driconf option

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agogallium/swr: Fix crash when use GL_TDFX_texture_compression_FXT1 format.
Krzysztof Raszkowski [Tue, 3 Dec 2019 13:43:57 +0000 (14:43 +0100)]
gallium/swr: Fix crash when use GL_TDFX_texture_compression_FXT1 format.

Reject the new formats in swr to prevent crashes because it doesn't
know how to handle the new formats.

Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
4 years agogitlab-ci: disable junit results for deqp
Rob Clark [Tue, 3 Dec 2019 16:45:49 +0000 (08:45 -0800)]
gitlab-ci: disable junit results for deqp

They don't seem to be hugely useful, and seem to be bogging down gitlab.

Signed-off-by: Rob Clark <robdclark@chromium.org>
4 years agoanv: Set up SBE_SWIZ properly for gl_Viewport
Jason Ekstrand [Tue, 26 Nov 2019 21:08:43 +0000 (15:08 -0600)]
anv: Set up SBE_SWIZ properly for gl_Viewport

gl_Viewport is also in the VUE header so we need to whack the read
offset to 0 and emit a default (no overrides) SBE_SWIZ entry in that
case as well.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
4 years agogitlab-ci: Update to current ci-templates master
Michel Dänzer [Tue, 3 Dec 2019 11:38:59 +0000 (12:38 +0100)]
gitlab-ci: Update to current ci-templates master

Fixes skopeo copy failures.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoac/llvm: fix atomic var operations if source isn't a deref
Samuel Pitoiset [Mon, 2 Dec 2019 13:58:00 +0000 (14:58 +0100)]
ac/llvm: fix atomic var operations if source isn't a deref

Fixes some CTS regressions.

Fixes: e61a826f396 ("ac/llvm: fix pointer type for global atomics")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoAdd support for T820 CI Jobs
Neil Armstrong [Fri, 20 Sep 2019 13:43:19 +0000 (15:43 +0200)]
Add support for T820 CI Jobs

Tomeu: - Small rebase fixups

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agogallivm/llvmpipe: add support for front facing in sysval.
Dave Airlie [Mon, 2 Dec 2019 00:05:12 +0000 (10:05 +1000)]
gallivm/llvmpipe: add support for front facing in sysval.

This wires up the front facing value as a sysval, I'd like to
remove the other facing code but I'd need to confirm VMware
don't use it first.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agollvmpipe/images: handle undefined atomic without crashing
Dave Airlie [Thu, 24 Oct 2019 01:42:23 +0000 (11:42 +1000)]
llvmpipe/images: handle undefined atomic without crashing

just return 0 for unbound atomic operations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agopanfrost: Remove blend shader hack
Alyssa Rosenzweig [Sat, 30 Nov 2019 18:24:46 +0000 (13:24 -0500)]
panfrost: Remove blend shader hack

This is no longer used.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agogitlab-ci: Test Panfrost on T720 GPUs
Tomeu Vizoso [Fri, 25 Oct 2019 11:04:34 +0000 (13:04 +0200)]
gitlab-ci: Test Panfrost on T720 GPUs

Now that the Mali T720 GPU is supoprted at the same level as the T760,
test it on PINE64 H64 boards.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agogitlab-ci: Remove non-default skips from Panfrost
Alyssa Rosenzweig [Sat, 23 Nov 2019 16:28:43 +0000 (11:28 -0500)]
gitlab-ci: Remove non-default skips from Panfrost

During the past months, Panfrost has matured considerably and several
tests stopped being flaky or failing at all.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: White list the Mali T720
Tomeu Vizoso [Mon, 28 Oct 2019 08:59:30 +0000 (09:59 +0100)]
panfrost: White list the Mali T720

Support for this GPU is equal now to that of T760, so whitelist it.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopan/midgard: Splatter on fragment out
Alyssa Rosenzweig [Tue, 26 Nov 2019 13:48:33 +0000 (08:48 -0500)]
pan/midgard: Splatter on fragment out

Make sure that the fragment is complete when writing it out.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agopanfrost: Simplify shader patching
Tomeu Vizoso [Wed, 20 Nov 2019 15:00:23 +0000 (16:00 +0100)]
panfrost: Simplify shader patching

We need to always upload anyway.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: Simplify draw_flags
Alyssa Rosenzweig [Fri, 22 Nov 2019 16:45:13 +0000 (11:45 -0500)]
panfrost: Simplify draw_flags

Fixes dEQP-GLES3.functional.primitive_restart.*. Note the 0x18000 value
is accidentally somehow enabling primitive restart for some reason.
I'm not sure where this value came from but let's not.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agopanfrost: Implement pan_tiler for non-hierarchy GPUs
Alyssa Rosenzweig [Wed, 27 Nov 2019 13:31:16 +0000 (08:31 -0500)]
panfrost: Implement pan_tiler for non-hierarchy GPUs

The algorithm is as described. Nothing fancy here, just need to add some
new code paths depending on which model we're running on.

Tomeu:
- Also disable tiling when !hierarchy and !vertex_count
- Avoid creating polygon lists smaller than the minimum when
  vertex_count > 0 but tile size smaller than 16 byte
- Take into account tile size when calculating polygon list size for
  !hierarchy
- Allow 0-sized tiles in a single dimension

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agopanfrost: Add information about T720 tiling
Alyssa Rosenzweig [Wed, 27 Nov 2019 13:04:22 +0000 (08:04 -0500)]
panfrost: Add information about T720 tiling

We've figured out most of the big pieces, and though it looks faintly
like other Midgards, it's much simpler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agopanfrost: Add quirks system to cmdstream
Tomeu Vizoso [Thu, 28 Nov 2019 09:21:06 +0000 (10:21 +0100)]
panfrost: Add quirks system to cmdstream

Similarly to how it's already done in the compiler, add a way to express
differences between GPU models that need to be taken into account when
assembling the cmdstream.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
4 years agonir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select
Ian Romanick [Fri, 1 Nov 2019 23:23:09 +0000 (16:23 -0700)]
nir/algebraic: Rearrange bcsel sequences generated by nir_opt_peephole_select

Reviewed-by: Matt Turner <mattst88@gmail.com>
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14660366 -> 14653437 (-0.05%)
instructions in affected programs: 316166 -> 309237 (-2.19%)
helped: 905
HURT: 10
helped stats (abs) min: 1 max: 36 x̄: 7.67 x̃: 6
helped stats (rel) min: 0.13% max: 18.75% x̄: 4.28% x̃: 3.60%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.10% max: 1.33% x̄: 0.70% x̃: 0.97%
95% mean confidence interval for instructions value: -7.91 -7.23
95% mean confidence interval for instructions %-change: -4.46% -3.99%
Instructions are helped.

total cycles in shared programs: 228571646 -> 228549759 (<.01%)
cycles in affected programs: 56239919 -> 56218032 (-0.04%)
helped: 681
HURT: 216
helped stats (abs) min: 1 max: 5156 x̄: 45.49 x̃: 10
helped stats (rel) min: <.01% max: 10.45% x̄: 1.29% x̃: 0.65%
HURT stats (abs)   min: 1 max: 320 x̄: 42.09 x̃: 14
HURT stats (rel)   min: <.01% max: 37.04% x̄: 1.38% x̃: 0.49%
95% mean confidence interval for cycles value: -41.51 -7.29
95% mean confidence interval for cycles %-change: -0.80% -0.49%
Cycles are helped.

LOST:   1
GAINED: 0

4 years agonir/algebraic: Simplify some Inf and NaN avoidance code
Ian Romanick [Sat, 2 Nov 2019 02:53:06 +0000 (19:53 -0700)]
nir/algebraic: Simplify some Inf and NaN avoidance code

Since a is non-negative, neither fsqrt nor frsq should return NaN.  frsq
should only return Inf when fsqrt returns 0.

The changes are pretty small, but this turns a few hundred hurt shaders
in the next patch into helped shaders.

An alternative to the intBitsToFloat is to import numpy and do
np.finfo(np.float32).max.  That's more explicit, but we may also want to
have specific bit encodings of float values later.  I could be convinced
either way, but intBitsToFloat(0x7f7fffff) was what I implemented first.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14661140 -> 14661104 (<.01%)
instructions in affected programs: 7520 -> 7484 (-0.48%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.32% max: 0.61% x̄: 0.49% x̃: 0.52%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.52% -0.47%
Instructions are helped.

total cycles in shared programs: 228585416 -> 228584806 (<.01%)
cycles in affected programs: 56321 -> 55711 (-1.08%)
helped: 32
HURT: 0
helped stats (abs) min: 2 max: 98 x̄: 19.06 x̃: 10
helped stats (rel) min: 0.08% max: 6.41% x̄: 1.09% x̃: 0.65%
95% mean confidence interval for cycles value: -28.32 -9.80
95% mean confidence interval for cycles %-change: -1.63% -0.54%
Cycles are helped.

Sandy Bridge
total cycles in shared programs: 152991077 -> 152991075 (<.01%)
cycles in affected programs: 11525 -> 11523 (-0.02%)
helped: 2
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.07% max: 0.11% x̄: 0.09% x̃: 0.09%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
95% mean confidence interval for cycles value: -5.27 4.27
95% mean confidence interval for cycles %-change: -0.16% 0.15%
Inconclusive result (value mean confidence interval includes 0).

No changes on Iron Lake or GM45.

4 years agointel/compiler: Increase nir_opt_peephole_select threshold
Ian Romanick [Fri, 1 Nov 2019 22:40:12 +0000 (15:40 -0700)]
intel/compiler: Increase nir_opt_peephole_select threshold

I tried 2, 4, 6, 8, and 10.  8 seemed to be the sweet spot across all
Intel platforms.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14736141 -> 14661140 (-0.51%)
instructions in affected programs: 2272413 -> 2197412 (-3.30%)
helped: 8416
HURT: 140
helped stats (abs) min: 1 max: 1152 x̄: 8.99 x̃: 6
helped stats (rel) min: 0.13% max: 42.55% x̄: 4.15% x̃: 3.20%
HURT stats (abs)   min: 1 max: 140 x̄: 4.73 x̃: 1
HURT stats (rel)   min: 0.03% max: 3.44% x̄: 0.87% x̃: 0.60%
95% mean confidence interval for instructions value: -9.36 -8.17
95% mean confidence interval for instructions %-change: -4.14% -3.99%
Instructions are helped.

total cycles in shared programs: 231560416 -> 228585416 (-1.28%)
cycles in affected programs: 126536021 -> 123561021 (-2.35%)
helped: 7092
HURT: 1898
helped stats (abs) min: 1 max: 419320 x̄: 519.02 x̃: 159
helped stats (rel) min: <.01% max: 77.25% x̄: 13.52% x̃: 11.77%
HURT stats (abs)   min: 1 max: 14518 x̄: 371.91 x̃: 36
HURT stats (rel)   min: <.01% max: 103.23% x̄: 5.92% x̃: 2.55%
95% mean confidence interval for cycles value: -514.34 -147.50
95% mean confidence interval for cycles %-change: -9.69% -9.14%
Cycles are helped.

total spills in shared programs: 5763 -> 5848 (1.47%)
spills in affected programs: 1797 -> 1882 (4.73%)
helped: 13
HURT: 13

total fills in shared programs: 17163 -> 16931 (-1.35%)
fills in affected programs: 7214 -> 6982 (-3.22%)
helped: 22
HURT: 19

total sends in shared programs: 730410 -> 730246 (-0.02%)
sends in affected programs: 2705 -> 2541 (-6.06%)
helped: 114
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.60% max: 20.00% x̄: 7.26% x̃: 5.88%
95% mean confidence interval for sends value: -1.55 -1.33
95% mean confidence interval for sends %-change: -7.90% -6.62%
Sends are helped.

LOST:   4
GAINED: 0

Sandy Bridge
total instructions in shared programs: 10760511 -> 10724637 (-0.33%)
instructions in affected programs: 961305 -> 925431 (-3.73%)
helped: 3734
HURT: 110
helped stats (abs) min: 1 max: 151 x̄: 9.66 x̃: 8
helped stats (rel) min: 0.14% max: 41.21% x̄: 4.93% x̃: 3.95%
HURT stats (abs)   min: 1 max: 20 x̄: 1.68 x̃: 1
HURT stats (rel)   min: 0.12% max: 5.41% x̄: 0.88% x̃: 0.52%
95% mean confidence interval for instructions value: -9.76 -8.91
95% mean confidence interval for instructions %-change: -4.90% -4.63%
Instructions are helped.

total cycles in shared programs: 153359411 -> 152991077 (-0.24%)
cycles in affected programs: 11615401 -> 11247067 (-3.17%)
helped: 2725
HURT: 1138
helped stats (abs) min: 1 max: 2844 x̄: 164.27 x̃: 80
helped stats (rel) min: 0.02% max: 48.60% x̄: 7.47% x̃: 3.91%
HURT stats (abs)   min: 1 max: 4351 x̄: 69.69 x̃: 25
HURT stats (rel)   min: 0.02% max: 40.00% x̄: 3.39% x̃: 1.47%
95% mean confidence interval for cycles value: -103.18 -87.52
95% mean confidence interval for cycles %-change: -4.57% -3.97%
Cycles are helped.

total sends in shared programs: 584038 -> 583855 (-0.03%)
sends in affected programs: 3512 -> 3329 (-5.21%)
helped: 157
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.17 x̃: 1
helped stats (rel) min: 2.38% max: 25.00% x̄: 6.52% x̃: 6.06%
95% mean confidence interval for sends value: -1.26 -1.07
95% mean confidence interval for sends %-change: -7.17% -5.87%
Sends are helped.

LOST:   23
GAINED: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8122617 -> 8111592 (-0.14%)
instructions in affected programs: 380503 -> 369478 (-2.90%)
helped: 912
HURT: 86
helped stats (abs) min: 1 max: 129 x̄: 12.19 x̃: 9
helped stats (rel) min: 0.30% max: 39.21% x̄: 3.69% x̃: 2.57%
HURT stats (abs)   min: 1 max: 2 x̄: 1.05 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.64% x̄: 0.54% x̃: 0.36%
95% mean confidence interval for instructions value: -12.00 -10.10
95% mean confidence interval for instructions %-change: -3.56% -3.10%
Instructions are helped.

total cycles in shared programs: 188509780 -> 188534398 (0.01%)
cycles in affected programs: 7211542 -> 7236160 (0.34%)
helped: 859
HURT: 132
helped stats (abs) min: 2 max: 690 x̄: 46.59 x̃: 16
helped stats (rel) min: 0.01% max: 26.76% x̄: 1.53% x̃: 0.33%
HURT stats (abs)   min: 2 max: 1592 x̄: 489.67 x̃: 618
HURT stats (rel)   min: 0.03% max: 185.92% x̄: 23.35% x̃: 6.26%
95% mean confidence interval for cycles value: 9.58 40.10
95% mean confidence interval for cycles %-change: 0.65% 2.93%
Cycles are HURT.

4 years agonir/opt_peephole_select: Don't count some unary operations
Ian Romanick [Fri, 1 Nov 2019 21:52:38 +0000 (14:52 -0700)]
nir/opt_peephole_select: Don't count some unary operations

In many cases, fsat, fneg, fabs, ineg, and iabs will get folded into
another instruction as either source or destination modifiers.
Counting them as instructions means that some if-statements won't get
converted to selects.  For example,

        vec1 32 ssa_25 = flt32 ssa_0, ssa_23.x
        /* succs: block_1 block_2 */
        if ssa_25 {
                block block_1:
                /* preds: block_0 */
                vec1 32 ssa_26 = fabs ssa_24
                vec1 32 ssa_27 = fneg ssa_26
                vec1 32 ssa_28 = fabs ssa_20
                vec1 32 ssa_29 = fneg ssa_28
                vec1 32 ssa_30 = fmul ssa_27, ssa_29
                vec1 32 ssa_31 = fsat ssa_30
                /* succs: block_3 */
        } else {
                block block_2:
                /* preds: block_0 */
                /* succs: block_3 */
        }
        block block_3:
        /* preds: block_1 block_2 */

block_1 isn't really 6 instructions, but it will be counted that way.

Most callers of the peephole_select pass use either 1 or 8.  It's very
easy to blow way past either of these limits with things that are really
only one or two actual instructions.

I also tried some fancier things like making sure the fsat was of
another SSA def from the same block, but the simple test was actually
better.

The i965 back-end SEL peephole pass still helps ~700 shaders in
shader-db with this change.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
All Gen6+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14743694 -> 14738910 (-0.03%)
instructions in affected programs: 156575 -> 151791 (-3.06%)
helped: 1204
HURT: 0
helped stats (abs) min: 1 max: 27 x̄: 3.97 x̃: 3
helped stats (rel) min: 0.15% max: 19.57% x̄: 5.15% x̃: 4.55%
95% mean confidence interval for instructions value: -4.12 -3.82
95% mean confidence interval for instructions %-change: -5.35% -4.95%
Instructions are helped.

total cycles in shared programs: 231749141 -> 231602916 (-0.06%)
cycles in affected programs: 2818975 -> 2672750 (-5.19%)
helped: 876
HURT: 322
helped stats (abs) min: 2 max: 788 x̄: 180.99 x̃: 220
helped stats (rel) min: <.01% max: 43.82% x̄: 20.75% x̃: 19.44%
HURT stats (abs)   min: 1 max: 1188 x̄: 38.27 x̃: 20
HURT stats (rel)   min: 0.09% max: 102.67% x̄: 5.17% x̃: 1.70%
95% mean confidence interval for cycles value: -130.47 -113.64
95% mean confidence interval for cycles %-change: -14.85% -12.72%
Cycles are helped.

total sends in shared programs: 730495 -> 730491 (<.01%)
sends in affected programs: 46 -> 42 (-8.70%)
helped: 2
HURT: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8122757 -> 8122617 (<.01%)
instructions in affected programs: 14716 -> 14576 (-0.95%)
helped: 46
HURT: 1
helped stats (abs) min: 1 max: 8 x̄: 3.07 x̃: 3
helped stats (rel) min: 0.36% max: 10.00% x̄: 2.54% x̃: 1.06%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.59% max: 1.59% x̄: 1.59% x̃: 1.59%
95% mean confidence interval for instructions value: -3.42 -2.54
95% mean confidence interval for instructions %-change: -3.28% -1.62%
Instructions are helped.

total cycles in shared programs: 188510100 -> 188509780 (<.01%)
cycles in affected programs: 58994 -> 58674 (-0.54%)
helped: 32
HURT: 1
helped stats (abs) min: 2 max: 96 x̄: 10.06 x̃: 6
helped stats (rel) min: 0.05% max: 15.29% x̄: 1.37% x̃: 0.31%
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.68% max: 0.68% x̄: 0.68% x̃: 0.68%
95% mean confidence interval for cycles value: -16.34 -3.06
95% mean confidence interval for cycles %-change: -2.46% -0.15%
Cycles are helped.

4 years agoiris: Allow max dynamic pool size of 2GB for gen12
Jordan Justen [Sat, 25 May 2019 08:33:17 +0000 (01:33 -0700)]
iris: Allow max dynamic pool size of 2GB for gen12

Reworks:
 * Adjust comment to list the state packets that curro found to be
   affected.

Fixes: 8125d7960b6 ("intel/dev: Add preliminary device info for Tigerlake")
Cc: 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
4 years agoradeonsi/gfx10: fix the vertex order for triangle strips emitted by a GS
Marek Olšák [Fri, 29 Nov 2019 00:46:11 +0000 (19:46 -0500)]
radeonsi/gfx10: fix the vertex order for triangle strips emitted by a GS

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoradeonsi/gfx10: simplify some duplicated NGG GS code
Marek Olšák [Thu, 28 Nov 2019 23:15:48 +0000 (18:15 -0500)]
radeonsi/gfx10: simplify some duplicated NGG GS code

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
4 years agoutil/u_thread: don't restrict u_thread_get_time_nano() to __linux__
Jonathan Gray [Sat, 30 Nov 2019 15:17:36 +0000 (02:17 +1100)]
util/u_thread: don't restrict u_thread_get_time_nano() to __linux__

pthread_getcpuclockid() and clock_gettime() are also available on at least
OpenBSD, FreeBSD, NetBSD, DragonFly, Cygwin.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
4 years agoutil/futex: use futex syscall on OpenBSD
Jonathan Gray [Sat, 30 Nov 2019 15:19:38 +0000 (02:19 +1100)]
util/futex: use futex syscall on OpenBSD

Make use of the futex syscall added in OpenBSD 6.2.

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
4 years agomeson: Add a "prefer_iris" build option
Kenneth Graunke [Sat, 23 Nov 2019 06:18:58 +0000 (22:18 -0800)]
meson: Add a "prefer_iris" build option

Enabling this option makes Intel Gen8-11 hardware load the 'iris'
driver by default instead of the older 'i965' driver.

Regardless of how this option is set, users can still override which
driver the loader selects via two methods.  The first is to create a
~/.drirc or /etc/drirc file with the following snippet:

   <driconf>
     <device driver="loader" kernel_driver="i915">
       <option name="dri_driver" value="i965" />
     </device>
   </driconf>

The other option is to set an environment variable:

   export MESA_LOADER_DRIVER_OVERRIDE=i965

For now, "prefer_iris" defaults to i965 (the historical choice).
A separate future patch will change the default driver to iris.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1893
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agoturnip: fix display wsi fence timing out
Jonathan Marek [Sun, 24 Nov 2019 14:42:43 +0000 (09:42 -0500)]
turnip: fix display wsi fence timing out

Fixes: df9f2adf ("turnip: add display wsi")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agonir/lower_io_to_vector: don't create arrays when not needed
Rhys Perry [Thu, 14 Nov 2019 15:31:52 +0000 (15:31 +0000)]
nir/lower_io_to_vector: don't create arrays when not needed

Some backends require that there are no array varyings.

If there were no arrays in the input shader, the pass shouldn't have to
create new ones.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2103
Fixes: bcd14756eec ('nir/lower_io_to_vector: add flat mode')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
4 years agoaco: fix block_kind_discard s_andn2 definition to exec
Rhys Perry [Mon, 18 Nov 2019 21:00:17 +0000 (21:00 +0000)]
aco: fix block_kind_discard s_andn2 definition to exec

Improves generated code of dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
because a loop exit phi can then be fixed to exec, removing copies and
improving jump threading.

No pipeline-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: handle loop exit and IF merge phis with break/discard
Rhys Perry [Mon, 18 Nov 2019 17:26:38 +0000 (17:26 +0000)]
aco: handle loop exit and IF merge phis with break/discard

ACO considers discards jumps and creates edges in the CFG for them but NIR
does neither of these.

This can be fixed instead by keeping track of whether a side of an IF had
a break/discard, but this doesn't solve the issue with discards affecting
loop exit phis. So this reworks phi handling a bit.

Fixes these tests:
dEQP-VK.graphicsfuzz.disc-and-add-in-func-in-loop
dEQP-VK.graphicsfuzz.loop-call-discard
dEQP-VK.graphicsfuzz.complex-nested-loops-and-call

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: validate the CFG
Rhys Perry [Tue, 19 Nov 2019 14:19:49 +0000 (14:19 +0000)]
aco: validate the CFG

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agomesa/main/util: moving gallium u_mm to util, remove main/mm
Alejandro Piñeiro [Thu, 28 Nov 2019 20:49:02 +0000 (21:49 +0100)]
mesa/main/util: moving gallium u_mm to util, remove main/mm

Right now there are two copies of mm:
   * mesa/main/mm.[ch]
   * gallium/auxiliary/util/u_mm.[ch]

At some point they splitted, and from the commit message it was not
clear why it was not possible to have only one copy at a common place.

Taking into account that was several years ago, Im assuming that it
was not possible then.

This change would allow to have one copy of the same code, and also
being able to use that code out of mesa/main or gallium, if needed.

This commit moves u_mm and removes mm, as u_mm has slightly more
changes.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
4 years agoradv: set writes_memory for global memory stores/atomics
Rhys Perry [Thu, 28 Nov 2019 11:30:55 +0000 (11:30 +0000)]
radv: set writes_memory for global memory stores/atomics

Fixes: 13ab63bb62b ('radv: Implement VK_EXT_buffer_device_address.')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoac/llvm: improve sync scope for global atomics
Rhys Perry [Wed, 27 Nov 2019 16:49:53 +0000 (16:49 +0000)]
ac/llvm: improve sync scope for global atomics

Stronger ordering is implemented in SPIRV->NIR with barriers.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoac/llvm: fix pointer type for global atomics
Rhys Perry [Wed, 27 Nov 2019 16:49:33 +0000 (16:49 +0000)]
ac/llvm: fix pointer type for global atomics

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
4 years agoiris: Map FXT1 texture formats
Kenneth Graunke [Wed, 27 Nov 2019 10:44:37 +0000 (02:44 -0800)]
iris: Map FXT1 texture formats

This exposes GL_TDFX_texture_compression_FXT1 support.  It's ancient,
only Intel GPUs appear to support it, and I seriously doubt anybody
uses it.  But i965 supports it, and it's trivial to do, so we may as
well support it in the new iris driver as well.

Reviewed-by: Eric Anholt <eric@anholt.net>
4 years agost/mesa: Add GL_TDFX_texture_compression_FXT1 support
Kenneth Graunke [Wed, 27 Nov 2019 10:41:47 +0000 (02:41 -0800)]
st/mesa: Add GL_TDFX_texture_compression_FXT1 support

Eric recently added PIPE_FORMAT_FXT1_RGB[A] as part of his format
unification work.  This was really most of the work of implementing
the extension.  We just need to handle it in a couple of places and
expose the extension.

v2: Reject the new formats in llvmpipe_is_format_supported to prevent
    crashes because it doesn't know how to handle the new formats.

Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
4 years agonir/samplers: don't zero samplers_used/txf.
Dave Airlie [Fri, 29 Nov 2019 00:56:05 +0000 (10:56 +1000)]
nir/samplers: don't zero samplers_used/txf.

This allows this pass to be run multiple times and the results are
just or'ed together.

It fixes on test on llvmpipe nir, and regresses none.

Suggested by Kenneth

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agoaco: drop useless lowering of deref operations for shared memory
Samuel Pitoiset [Thu, 7 Nov 2019 21:34:20 +0000 (22:34 +0100)]
aco: drop useless lowering of deref operations for shared memory

Moved to RADV. No pipeline-db changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoradv,ac/nir: lower deref operations for shared memory
Samuel Pitoiset [Thu, 7 Nov 2019 14:56:35 +0000 (15:56 +0100)]
radv,ac/nir: lower deref operations for shared memory

This shouldn't introduce any functional changes for RadeonSI
when NIR is enabled because these operations are already lowered.

pipeline-db (NAVI10/LLVM):
SGPRS: 9043 -> 9051 (0.09 %)
VGPRS: 7272 -> 7292 (0.28 %)
Code Size: 638892 -> 621628 (-2.70 %) bytes
LDS: 1333 -> 1331 (-0.15 %) blocks
Max Waves: 1614 -> 1608 (-0.37 %)

Found this while glancing at some F12019 shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
4 years agoaco: fix a couple of value numbering issues
Daniel Schürmann [Fri, 29 Nov 2019 15:47:13 +0000 (16:47 +0100)]
aco: fix a couple of value numbering issues

Fixes: 3a20ef4a3299fddc886f9d5908d8b3952dd63a54 'aco: refactor value numbering'
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: don't split live-ranges of linear VGPRs
Daniel Schürmann [Fri, 29 Nov 2019 15:43:24 +0000 (16:43 +0100)]
aco: don't split live-ranges of linear VGPRs

Fixes: 93c8ebfa780ebd1495095e794731881aef29e7d3 'aco: Initial commit of independent AMD compiler'
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
4 years agoaco: implement global atomics
Rhys Perry [Wed, 27 Nov 2019 16:51:10 +0000 (16:51 +0000)]
aco: implement global atomics

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: improve FLAT/GLOBAL scheduling
Rhys Perry [Wed, 27 Nov 2019 17:27:36 +0000 (17:27 +0000)]
aco: improve FLAT/GLOBAL scheduling

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: don't enable store_global for helper invocations
Rhys Perry [Wed, 27 Nov 2019 17:15:54 +0000 (17:15 +0000)]
aco: don't enable store_global for helper invocations

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: fix SADDR with FLAT on GFX10
Rhys Perry [Wed, 27 Nov 2019 17:08:27 +0000 (17:08 +0000)]
aco: fix SADDR with FLAT on GFX10

The reference guide is incorrect and SADDR is actually used with FLAT on
GFX10.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: fix assembly of FLAT/GLOBAL atomics
Rhys Perry [Wed, 27 Nov 2019 17:06:10 +0000 (17:06 +0000)]
aco: fix assembly of FLAT/GLOBAL atomics

They can take both a definition and data operand

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: fix GFX10 opcodes for some global/flat atomics
Rhys Perry [Tue, 26 Nov 2019 21:06:35 +0000 (21:06 +0000)]
aco: fix GFX10 opcodes for some global/flat atomics

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: improve WAR hazard workaround with >64bit stores
Rhys Perry [Wed, 27 Nov 2019 17:23:02 +0000 (17:23 +0000)]
aco: improve WAR hazard workaround with >64bit stores

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: add v_nop inbetween exec write and VMEM/DS/FLAT
Rhys Perry [Wed, 27 Nov 2019 17:20:15 +0000 (17:20 +0000)]
aco: add v_nop inbetween exec write and VMEM/DS/FLAT

LLVM and the proprietary compiler seem to do this

Fixes: b01847bd9 ("aco/gfx10: Fix mitigation of VMEMtoScalarWriteHazard.")
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: fix incorrect cast in parse_wait_instr()
Rhys Perry [Wed, 27 Nov 2019 17:11:58 +0000 (17:11 +0000)]
aco: fix incorrect cast in parse_wait_instr()

s_waitcnt is SOPP, not SOPK

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: fix i2i64
Rhys Perry [Wed, 27 Nov 2019 17:24:23 +0000 (17:24 +0000)]
aco: fix i2i64

Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoaco: propagate p_wqm on an image_sample's coordinate p_create_vector
Rhys Perry [Thu, 28 Nov 2019 15:29:40 +0000 (15:29 +0000)]
aco: propagate p_wqm on an image_sample's coordinate p_create_vector

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2156
Fixes: 93c8ebfa780 ('aco: Initial commit of independent AMD compiler')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
4 years agoetnaviv: remove dead code
Christian Gmeiner [Fri, 29 Nov 2019 14:15:27 +0000 (15:15 +0100)]
etnaviv: remove dead code

ptiled is always NULL so the if statement is useless.

CoverityID: 1415572
Fixes: b9627765303 ("etnaviv: rework compatible render base")
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: handle integer case for GENERIC_ATTRIB_SCALE
Christian Gmeiner [Sat, 19 Oct 2019 17:12:53 +0000 (19:12 +0200)]
etnaviv: handle integer case for GENERIC_ATTRIB_SCALE

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: fix R10G10B10A2 vertex format entries
Christian Gmeiner [Sat, 19 Oct 2019 16:48:35 +0000 (18:48 +0200)]
etnaviv: fix R10G10B10A2 vertex format entries

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: use NORMALIZE_SIGN_EXTEND
Christian Gmeiner [Wed, 16 Oct 2019 04:31:17 +0000 (06:31 +0200)]
etnaviv: use NORMALIZE_SIGN_EXTEND

The blob driver does something like this for all vertex formats:

if (normalize) {
   if (OPENGL_ES30)
      val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_SIGN_EXTEND;
   else
      val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_ON;
} else {
   val = VIVS_FE_VERTEX_ELEMENT_CONFIG_NORMALIZE_OFF;
}

As there is no way to get to that information in gallium we always
assume OPENGL_ES30.

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoetnaviv: fix integer vertex formats
Christian Gmeiner [Sun, 13 Oct 2019 08:54:48 +0000 (10:54 +0200)]
etnaviv: fix integer vertex formats

Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
4 years agoi965: update Makefile.sources for perf changes
Jonathan Gray [Thu, 28 Nov 2019 05:57:23 +0000 (16:57 +1100)]
i965: update Makefile.sources for perf changes

brw_performance_query_metrics.h was removed in
134e750e16bfc53480e0bba6f0ae3e1d2a7fb87c and
brw_performance_query.h was removed in
8ae6667992ccca41d08884d863b8aeb22a4c4e65

remove reference to these files from Makefile.sources

Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Fixes: 134e750e16bfc53480e0 ("i965: extract performance query metrics")
Fixes: 8ae6667992ccca41d088 ("intel/perf: move query_object into perf")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
4 years agoscons: Bump C standard to gnu11 on macOS 10.15.
Vinson Lee [Thu, 28 Nov 2019 08:05:13 +0000 (00:05 -0800)]
scons: Bump C standard to gnu11 on macOS 10.15.

Fix build error on macOS 10.15 Catalina.

src/util/u_queue.c:179:7: error: implicit declaration of function 'timespec_get' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
      timespec_get(&ts, TIME_UTC);
      ^

timespec_get needs C11 starting with macOS 10.15.

/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/time.h
   193 #if (__DARWIN_C_LEVEL >= __DARWIN_C_FULL) && \
   194         ((defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L) || \
   195         (defined(__cplusplus) && __cplusplus >= 201703L))
   196 /* ISO/IEC 9899:201x 7.27.2.5 The timespec_get function */
   197 #define TIME_UTC 1 /* time elapsed since epoch */
   198 __API_AVAILABLE(macosx(10.15), ios(13.0), tvos(13.0), watchos(6.0))
   199 int timespec_get(struct timespec *ts, int base);
   200 #endif

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
4 years agopanfrost: Make sure we reset the damage region of RTs at flush time
Boris Brezillon [Thu, 14 Nov 2019 08:35:27 +0000 (09:35 +0100)]
panfrost: Make sure we reset the damage region of RTs at flush time

We must reset the damage info of our render targets here even though a
damage reset normally happens when the DRI layer swaps buffers. That's
because there can be implicit flushes the GL app is not aware of, and
those might impact the damage region: if part of the damaged portion
is drawn during those implicit flushes, you have to reload those areas
before next draws are pushed, and since the driver can't easily know
what's been modified by the draws it flushed, the easiest solution is
to reload everything.

Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 65ae86b85422 ("panfrost: Add support for KHR_partial_update()")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
4 years agogallium: Fix the ->set_damage_region() implementation
Boris Brezillon [Fri, 8 Nov 2019 23:02:54 +0000 (00:02 +0100)]
gallium: Fix the ->set_damage_region() implementation

BACK_LEFT attachment can be outdated when the user calls
KHR_partial_update() (->lastStamp != ->texture_stamp), leading to a
damage region update on the wrong pipe_resource object.
Let's delay the ->set_damage_region() call until the attachments are
updated when we're in that case.

Reported-by: Carsten Haitzler <raster@rasterman.com>
Fixes: 492ffbed63a2 ("st/dri2: Implement DRI2bufferDamageExtension")
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
4 years agozink: silence coverity error
Erik Faye-Lund [Wed, 27 Nov 2019 16:46:29 +0000 (17:46 +0100)]
zink: silence coverity error

Coverity doesn't know that we always have coordinates if we have lod. To
avoid annoying errors, let's just zero-initialize this.

CoverityID: 1455202
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agozink: error-check right variable
Erik Faye-Lund [Wed, 27 Nov 2019 16:44:05 +0000 (17:44 +0100)]
zink: error-check right variable

That's not the value we just allocated...

CoverityID: 1455177
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agozink: avoid NULL-deref
Erik Faye-Lund [Wed, 27 Nov 2019 16:38:53 +0000 (17:38 +0100)]
zink: avoid NULL-deref

Same story as the previous two commits; these functions dereference the
memory they are pointed at. We can't do that.

CoverityID: 1455180
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agozink: avoid NULL-deref
Erik Faye-Lund [Wed, 27 Nov 2019 16:34:08 +0000 (17:34 +0100)]
zink: avoid NULL-deref

Similar to the previous commit, pipe_resource_reference also dereference
the memory pointed at. Let's avoid it.

CoverityID: 1455198
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agozink: avoid NULL-deref
Erik Faye-Lund [Wed, 27 Nov 2019 16:17:08 +0000 (17:17 +0100)]
zink: avoid NULL-deref

zink_render_pass_reference will dereference the memory 'dst' points at,
which can't really go well. All we want to do here is to increase the
reference-count, so let's use a different helper for that instead.

CoverityID: 1455200
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agozink: handle calloc-failure
Erik Faye-Lund [Wed, 27 Nov 2019 16:31:28 +0000 (17:31 +0100)]
zink: handle calloc-failure

In case we fail to allocate the context, we should notice and fail
gracefully.

CoverityID: 1455193
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agozink: do not try to destroy NULL-fence
Erik Faye-Lund [Wed, 27 Nov 2019 16:22:24 +0000 (17:22 +0100)]
zink: do not try to destroy NULL-fence

destroy_fence doesn't handle NULL-pointers gracefully. So let's avoid
hitting that code-path, by simply returning NULL early here instead.

CoverityID: 1455179
Reviewed-by: Dave Airlie <airlied@redhat.com>
4 years agozink: delete query rather than allocating a new one
Erik Faye-Lund [Wed, 27 Nov 2019 15:30:29 +0000 (16:30 +0100)]
zink: delete query rather than allocating a new one

It seems I had some fat fingers when writing this function, and I
accidentally ended up allocating a new query and immediately trying to
delete an uninitialized pool instead of just deleting the pool of the
query that was passed.

CoverityID: 1455196
Reviewed-by: Dave Airlie <airlied@redhat.com>