mesa.git
5 years agoac/nir, radv, radeonsi: Switch to using ac_shader_args
Connor Abbott [Mon, 11 Nov 2019 11:50:12 +0000 (12:50 +0100)]
ac/nir, radv, radeonsi: Switch to using ac_shader_args

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
5 years agoac: Add a shared interface between radv, radeonsi, LLVM and ACO
Connor Abbott [Tue, 29 Oct 2019 16:40:30 +0000 (17:40 +0100)]
ac: Add a shared interface between radv, radeonsi, LLVM and ACO

ac_shader_args will be similar to ac_shader_abi, except for being free
from LLVM-specific concepts and therefore capable of being shared
between LLVM and ACO. This will help us accomplish a few different
things:

- Decouple setting up SGPR and VGPR arguments from translating to LLVM,
so that we can reference these arguments in NIR lowering passes, which
will let us lower e.g. descriptor sets in NIR.

- Stop using radv-specific structures for things like determining the
chip generation in ACO.

In the end, we should replace ac_shader_abi with this structure +
driver-specific lowering passes.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: Rename ac_arg_regfile
Connor Abbott [Thu, 31 Oct 2019 14:23:35 +0000 (15:23 +0100)]
radv: Rename ac_arg_regfile

We'll duplicate this in a header file in the next commit, and then
remove the original enum. Just rename it temporarily so that things
keep building.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agodrirc: Add glsl_zero_init workaround for GpuTest
Danylo Piliaiev [Fri, 22 Nov 2019 16:05:14 +0000 (18:05 +0200)]
drirc: Add glsl_zero_init workaround for GpuTest

GiMark benchmark from GpuTest has such code in VS:

 out vec4 lightDir0;
 out vec4 lightDir1;

 ...

 lightDir0.xyz = lp0 - vVertex.xyz;
 lightDir1.xyz = lp1 - vVertex.xyz;

In FS:

 float distSqr = dot(lightDir0, lightDir0);

So due to the usage of uninitialized .w channel in the dot product,
distSqr may become undefined which results in many black dots
in the test on Iris.

In https://www.geeks3d.com/forums/index.php/topic,6242.0.html
developer stated that this benchmark most likely won't be updated.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1919
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomeson: only build imgui when needed
Samuel Pitoiset [Fri, 22 Nov 2019 11:16:50 +0000 (12:16 +0100)]
meson: only build imgui when needed

Only required for Intel tools or the Vulkan overlay layer.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoac/llvm: fix the local invocation index for wave32
Samuel Pitoiset [Thu, 31 Oct 2019 13:00:52 +0000 (14:00 +0100)]
ac/llvm: fix the local invocation index for wave32

Fixes dEQP-VK.compute.builtin_var.local_invocation_index with
RADV_PERFTEST=cswave32.

My initial fix was to lower it but Rhys suggested the shift-right
and it's much better like this.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: disable subgroup shuffle operations on GFX10
Samuel Pitoiset [Thu, 21 Nov 2019 10:27:55 +0000 (11:27 +0100)]
radv: disable subgroup shuffle operations on GFX10

They are broken like on GFX6-GFX7. It seems better to disable them
instead of enabling a broken feature.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agodocs: add llvmpipe to ARB_query_buffer_object.
Dave Airlie [Fri, 22 Nov 2019 04:55:40 +0000 (14:55 +1000)]
docs: add llvmpipe to ARB_query_buffer_object.

5 years agollvmpipe: initial query buffer object support. (v2)
Dave Airlie [Fri, 22 Nov 2019 04:47:59 +0000 (14:47 +1000)]
llvmpipe: initial query buffer object support. (v2)

This fails a couple of piglits due to other bugs in llvmpipe,
but it adds support for the feature properly.

v2: don't reset pipestats, just recalc, fix CI expectation

5 years agoradv: create a fresh fork for each pipeline compile
Timothy Arceri [Sun, 24 Nov 2019 23:08:26 +0000 (10:08 +1100)]
radv: create a fresh fork for each pipeline compile

In order to prevent a potential malicious pipeline tainting our
secure compile process and interfering with successive pipelines
we want to create a fresh fork for each pipeline compile.

Benchmarking has shown that simply forking on each pipeline
creation doubles the total time it takes to compile a fossilize db
collection. So instead here we fork the process at device creation
so that we have a slim copy of the device and then fork this
otherwise idle and untainted process each time we compile a
pipeline. Forking this slim copy of the device results in only a
20% increase in compile time vs a 100% increase.

Fixes: cff53da3 ("radv: enable secure compile support")
5 years agoradv: add a secure_compile_open_fifo_fds() helper
Timothy Arceri [Wed, 13 Nov 2019 03:51:48 +0000 (14:51 +1100)]
radv: add a secure_compile_open_fifo_fds() helper

This will be used to create a communication pipe between the user
facing device and a freshly forked (per pipeline compile) slim copy
of that device.

We can't use pipe() here because the fork will not be a direct fork
of the user facing process. Instead we use a previously forked
copy of the process that was forked at device creation in order to
reduce the resources required for the fork and avoid performance
issues.

Fixes: cff53da3748d ("radv: enable secure compile support")
5 years agoradv: add some infrastructure for fresh forks for each secure compile
Timothy Arceri [Sun, 24 Nov 2019 23:00:20 +0000 (10:00 +1100)]
radv: add some infrastructure for fresh forks for each secure compile

In the following commits we want to be able to fork an existing lightweight
fork created at device creation time. In order for the user facing process
to communicate with this new fresh fork we create some members here to hold
FIFO file descriptors and a unique id.

Here we also add a new fork enum that we use to tell the lightweight
process to create a fresh fork.

For more information on why we create a fresh fork see the following
commits.

5 years agonir: no-op C99 _Pragma() with MSVC
Brian Paul [Sat, 23 Nov 2019 02:42:34 +0000 (19:42 -0700)]
nir: no-op C99 _Pragma() with MSVC

This fixes a build failure on MSVC.

BTW, it looks like clang supports _Pragma() but I don't know if it
understands the "gcc unroll N" directive.

Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
5 years agoMeson: Add llvm>=9 modules
Michel Zou [Sun, 17 Nov 2019 15:40:29 +0000 (16:40 +0100)]
Meson: Add llvm>=9 modules

Fixes build with MinGW, with shared LLVM and lto
/tmp/opengl32.dll.BxiIYm.ltrans59.ltrans.o:<artificial>:(.text+0x1674): undefined reference to `LLVMAddInstructionCombiningPass'

See also scons/llvm.py

Acked-by: Dylan Baker <dylan@pnwbakers.com>
5 years agodisk_cache_get_function_timestamp: check for dladdr
Michel Zou [Mon, 11 Nov 2019 21:15:41 +0000 (22:15 +0100)]
disk_cache_get_function_timestamp: check for dladdr

instead of dlopen

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
5 years agoMeson: Check for dladdr with MinGW
Michel Zou [Mon, 11 Nov 2019 21:14:55 +0000 (22:14 +0100)]
Meson: Check for dladdr with MinGW

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
5 years agonir/serialize: support any num_components for remaining instructions
Marek Olšák [Fri, 22 Nov 2019 01:24:08 +0000 (20:24 -0500)]
nir/serialize: support any num_components for remaining instructions

Only NPOT vectors greater than vec4 use the extra uint32.

This is for instructions that share the dest code.
load_const and undef already support 1-16 in the header.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: use 3 unused bits in intrinsic for packed_const_indices
Marek Olšák [Fri, 22 Nov 2019 01:23:27 +0000 (20:23 -0500)]
nir/serialize: use 3 unused bits in intrinsic for packed_const_indices

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: don't serialize redundant nir_intrinsic_instr::num_components
Marek Olšák [Fri, 22 Nov 2019 00:45:46 +0000 (19:45 -0500)]
nir/serialize: don't serialize redundant nir_intrinsic_instr::num_components

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: serialize writemask for vec8 and vec16
Marek Olšák [Tue, 12 Nov 2019 03:33:49 +0000 (22:33 -0500)]
nir/serialize: serialize writemask for vec8 and vec16

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: serialize swizzles for vec8 and vec16
Marek Olšák [Tue, 12 Nov 2019 03:28:17 +0000 (22:28 -0500)]
nir/serialize: serialize swizzles for vec8 and vec16

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: reuse the writemask field for 2 src X swizzles of SSA ALU
Marek Olšák [Thu, 7 Nov 2019 05:28:01 +0000 (00:28 -0500)]
nir/serialize: reuse the writemask field for 2 src X swizzles of SSA ALU

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: remove up to 3 consecutive equal ALU instruction headers
Marek Olšák [Wed, 6 Nov 2019 03:14:28 +0000 (22:14 -0500)]
nir/serialize: remove up to 3 consecutive equal ALU instruction headers

vec4 scalarized ALUs typically have 4 equal instruction headers, so remove
the last 3.

There are no bits left in the ALU header for more flags, so future
extensions of NIR will have to use something like instr_type == 15
to describe more complex ALU instructions.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: try to pack both deref array src into 32 bits
Marek Olšák [Tue, 5 Nov 2019 23:10:40 +0000 (18:10 -0500)]
nir/serialize: try to pack both deref array src into 32 bits

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: cleanup - fold nir_deref_type_var cases into switches
Marek Olšák [Tue, 5 Nov 2019 23:24:27 +0000 (18:24 -0500)]
nir/serialize: cleanup - fold nir_deref_type_var cases into switches

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: try to put deref->var index into the unused bits of the header
Marek Olšák [Tue, 5 Nov 2019 22:53:32 +0000 (17:53 -0500)]
nir/serialize: try to put deref->var index into the unused bits of the header

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: don't serialize mode for deref non-cast instructions
Marek Olšák [Tue, 5 Nov 2019 22:39:38 +0000 (17:39 -0500)]
nir/serialize: don't serialize mode for deref non-cast instructions

It can be derived from src and var. This frees 10 bits in the header
that will be used later.

"mode" is moved in the structure, because those bits will be used for
something else later.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: don't store deref types if not needed
Marek Olšák [Tue, 5 Nov 2019 01:11:11 +0000 (20:11 -0500)]
nir/serialize: don't store deref types if not needed

- type_cast: deduplicate types if the last one is the same
- derive the type from the parent for other derefs

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: try to pack two alu srcs into 1 uint32
Marek Olšák [Tue, 5 Nov 2019 05:09:29 +0000 (00:09 -0500)]
nir/serialize: try to pack two alu srcs into 1 uint32

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: pack nir_intrinsic_instr::const_index[] better
Marek Olšák [Tue, 5 Nov 2019 04:29:33 +0000 (23:29 -0500)]
nir/serialize: pack nir_intrinsic_instr::const_index[] better

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: pack 1-component constants into 20 bits if possible
Marek Olšák [Tue, 5 Nov 2019 03:25:15 +0000 (22:25 -0500)]
nir/serialize: pack 1-component constants into 20 bits if possible

The majority of constants can be packed like this.

v2: - use enum for the packing encoding,
    - trim packed_value to 20 bits add 1 bit to last_component,
      which simplifies a later commit

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: pack load_const with non-64-bit constants better
Marek Olšák [Tue, 5 Nov 2019 03:15:17 +0000 (22:15 -0500)]
nir/serialize: pack load_const with non-64-bit constants better

v2: use blob_write_uint8/16

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: try to store a diff in var data locations instead of var data
Marek Olšák [Tue, 5 Nov 2019 02:31:40 +0000 (21:31 -0500)]
nir/serialize: try to store a diff in var data locations instead of var data

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: deduplicate serialized var types by reusing the last unique one
Marek Olšák [Tue, 5 Nov 2019 01:11:11 +0000 (20:11 -0500)]
nir/serialize: deduplicate serialized var types by reusing the last unique one

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: don't serialize var->data for temporaries
Marek Olšák [Tue, 5 Nov 2019 00:42:42 +0000 (19:42 -0500)]
nir/serialize: don't serialize var->data for temporaries

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: pack src better and limit the object count to 1M from 1G
Marek Olšák [Wed, 30 Oct 2019 22:14:37 +0000 (18:14 -0400)]
nir/serialize: pack src better and limit the object count to 1M from 1G

We need to limit the object count to 1M to free 10 bits for the src
modifiers.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir/serialize: pack instructions better
Marek Olšák [Fri, 25 Oct 2019 06:39:54 +0000 (02:39 -0400)]
nir/serialize: pack instructions better

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agoutil/blob: add 8-bit and 16-bit reads and writes
Marek Olšák [Wed, 20 Nov 2019 00:36:36 +0000 (19:36 -0500)]
util/blob: add 8-bit and 16-bit reads and writes

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agoci: Use a tag from the parallel-deqp-runner repo.
Eric Anholt [Fri, 22 Nov 2019 23:16:27 +0000 (15:16 -0800)]
ci: Use a tag from the parallel-deqp-runner repo.

If the repo continues development, we don't want to accidentally pick
up potentially breaking changes on our next container rebuild.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci/freedreno/a6xx: remove most of the flakes
Rob Clark [Thu, 21 Nov 2019 18:54:13 +0000 (10:54 -0800)]
gitlab-ci/freedreno/a6xx: remove most of the flakes

xfb + lines/points still flakes too frequently (and the problem isn't
even related to xfb), but we can add the rest back into this mix now.

Signed-off-by: Rob Clark <robdclark@chromium.org>
5 years agogitlab-ci/deqp: generate junit results
Rob Clark [Sun, 17 Nov 2019 20:04:50 +0000 (12:04 -0800)]
gitlab-ci/deqp: generate junit results

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci/deqp: generate xml results for fails/flakes
Rob Clark [Sun, 17 Nov 2019 19:57:26 +0000 (11:57 -0800)]
gitlab-ci/deqp: generate xml results for fails/flakes

Extract .qpa for the individual unexpected results and flakes, and
translate to xml, preserved with the artifacts.  This allows easy
browsing of the test logs for fails/flakes, for easier debugging.

The # of logs to preserve is capped at 50 to avoid saving 100s of
megabytes of logs in case someone pushes a change that breaks
everything.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: bump arm test container
Rob Clark [Fri, 22 Nov 2019 21:30:18 +0000 (13:30 -0800)]
gitlab-ci: bump arm test container

To pick up updated cts_runner and netcat for the flake reporting.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci/deqp: detect and report flakes
Rob Clark [Sun, 17 Nov 2019 19:33:01 +0000 (11:33 -0800)]
gitlab-ci/deqp: detect and report flakes

If there are a small number of fails, re-run to determine if they are
flakes, and optionally (if `$FLAKES_CHANNEL` configured) report the
flakes.

This way flakes don't interfere with developers working on other
drivers, but get logged so that the developers working on the flaking
driver can monitor the situation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci/deqp: preserve caselists for blocks with fails
Rob Clark [Sun, 17 Nov 2019 19:28:16 +0000 (11:28 -0800)]
gitlab-ci/deqp: preserve caselists for blocks with fails

Bump cts_runner to pick up the change to preserve .qpa and caselist .txt
files for blocks of tests that contain fails, and preserve the caselist
files.  To reproduce fails that depend on order of running tests, these
are useful.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci/deqp: preserve full list of unexpected results
Rob Clark [Sun, 17 Nov 2019 19:16:09 +0000 (11:16 -0800)]
gitlab-ci/deqp: preserve full list of unexpected results

The log only shows the first 50, but preserve the full list for easier
browsing.

(Also move return of exit code to end which makes later patches in the
series easier)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: update deqp build so we can generate xml
Rob Clark [Fri, 15 Nov 2019 18:15:32 +0000 (10:15 -0800)]
gitlab-ci: update deqp build so we can generate xml

Update the deqp build to preserve testlog-to-xml and stylesheets, so
deqp runner can extract .qpa for failed/flaked tests, and convert to
xml.  With this, will be able to browse output from failed tests
directly from the artifacts.

The main motiviation is to give better visibility into what happens with
flaked tests, when it is difficult/impossible to reproduce the flake
locally (ie. when it happens once out of N million tests).  But this
should also make it easier to debug regressions that a MR triggers,
especially when it is on hw that you don't have.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agodrirc: Enable glthread for dolphin/citra/yuzu.
Markus Wick [Tue, 5 Nov 2019 08:16:37 +0000 (09:16 +0100)]
drirc: Enable glthread for dolphin/citra/yuzu.

Dolphin: 75 fps -> 88 fps - Super Mario Galaxy
Citra:   81 fps -> 91 fps - A Link Between Worlds
Yuzu:    21 fps -> 27 fps - Super Mario Odyssey

Dolphin still has many syncs because of glFenceSync and glClientWaitSync.
Moving them to the dispatcher thread might yield another speedup.

Yuzu uses a compatible profile by default. This benchmark used the variable
MESA_GL_VERSION_OVERRIDE=4.5FC to overwrite this behavior.

This profilation was done on a mobile i7-8550U CPU with i965.

Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa/glthread: Implement ARB_multi_bind.
Markus Wick [Sun, 3 Nov 2019 08:49:59 +0000 (09:49 +0100)]
mesa/glthread: Implement ARB_multi_bind.

Signed-off-by: Markus Wick <markus@selfnet.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoaco: fix waitcnts for barriers at block ends
Rhys Perry [Fri, 22 Nov 2019 19:38:51 +0000 (19:38 +0000)]
aco: fix waitcnts for barriers at block ends

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: d1b9deee ('aco: improve waitcnt insertion around loops')
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
5 years agoRevert "draw: revert using correct order for prim decomposition."
Zebediah Figura [Tue, 5 Nov 2019 16:21:21 +0000 (10:21 -0600)]
Revert "draw: revert using correct order for prim decomposition."

This reverts commit f97b731c82afb06cfd6ffebc90a3e098a9a1b308.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/250
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years agoiris: Change keybox parenting
Kenneth Graunke [Wed, 5 Jun 2019 20:15:35 +0000 (13:15 -0700)]
iris: Change keybox parenting

For temporary lookups, just allocate out of the NULL ralloc context,
so we don't have to edit the linked list of ralloc children to add it
and then immediately remove it again.

When uploading a new shader, allocate the keybox off the shader, so
if we delete the shader the keybox also goes away.  Less manual cleanup.

5 years agonir/range_analysis: Make sure the table validation only occurs once
Ian Romanick [Sat, 16 Nov 2019 21:19:47 +0000 (13:19 -0800)]
nir/range_analysis: Make sure the table validation only occurs once

All of the tables are static const, so they only need to be validated
once.  As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass.  Even with
the help of the previous commit, this does not always occur.

-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agonir/range-analysis: Add pragmas to help loop unrolling
Ian Romanick [Sat, 16 Nov 2019 21:23:31 +0000 (13:23 -0800)]
nir/range-analysis: Add pragmas to help loop unrolling

I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the look ups
of static const arrays with now constant indices, and then elmininate
all the actuall assertions.  It seems none of this happens even at -O3.

Adding the pragmas helps encourage loop unrolling at some optimization
levels.  I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.

-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5

v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.

Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoglsl: Add varyings to "zero-init of uninitialized vars" workaround
Danylo Piliaiev [Thu, 21 Nov 2019 13:04:37 +0000 (15:04 +0200)]
glsl: Add varyings to "zero-init of uninitialized vars" workaround

Varyings are similar to already handled cases. And "glsl_zero_init"
name of the workaround already looks like it should include varyings.

The issue was observed in GiMark subtest from GpuTest.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agopan/midgard: Use lower_tex_without_implicit_lod
Alyssa Rosenzweig [Thu, 21 Nov 2019 18:40:00 +0000 (13:40 -0500)]
pan/midgard: Use lower_tex_without_implicit_lod

Just a bit of cleanup. lower_tex can do this lowering for us, which
should also eliminate some special cases (one less thing to fix if we
ever need texturing in tess/geom/etc, perhaps?)

Closes #2133

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agoetnaviv: use a more self-explanatory param name
Christian Gmeiner [Fri, 15 Nov 2019 16:35:50 +0000 (17:35 +0100)]
etnaviv: use a more self-explanatory param name

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
5 years agoetnaviv: drop not used config_out function param
Christian Gmeiner [Fri, 15 Nov 2019 16:34:11 +0000 (17:34 +0100)]
etnaviv: drop not used config_out function param

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
5 years agogitlab-ci: reduce the number of scons build
Samuel Pitoiset [Thu, 21 Nov 2019 07:29:25 +0000 (08:29 +0100)]
gitlab-ci: reduce the number of scons build

It seems overkill to me to build scons 7x for every pipeline.
Scons is now build with the oldest llvm version in scons-old-llvm
and with the newest llvm version in scons.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agopanfrost: Add lcra.c to Android.mk
Alyssa Rosenzweig [Thu, 21 Nov 2019 13:43:21 +0000 (08:43 -0500)]
panfrost: Add lcra.c to Android.mk

This was forgotten.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agopan/midgard: Enable LOD lowering only on buggy chips
Alyssa Rosenzweig [Thu, 21 Nov 2019 13:45:27 +0000 (08:45 -0500)]
pan/midgard: Enable LOD lowering only on buggy chips

T720 and earlier need this workaround, so check the quirk before
lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agopan/midgard: Describe quirk MIDGARD_BROKEN_LOD
Alyssa Rosenzweig [Wed, 20 Nov 2019 02:21:19 +0000 (21:21 -0500)]
pan/midgard: Describe quirk MIDGARD_BROKEN_LOD

Corresponds to errata #10471, applies to T6xx and T720. Fixed in T760.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agopan/midgard: Add LOD bias/clamp lowering
Alyssa Rosenzweig [Thu, 21 Nov 2019 13:43:53 +0000 (08:43 -0500)]
pan/midgard: Add LOD bias/clamp lowering

We fetch the info with the new intrinsic and lower with ALU ops for txl
instructions, which seemingly correspond to "TEXGRD" instructions (what
we call textureLod).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agopan/midgard: Implement load_sampler_lod_paramaters_pan
Alyssa Rosenzweig [Thu, 21 Nov 2019 13:42:28 +0000 (08:42 -0500)]
pan/midgard: Implement load_sampler_lod_paramaters_pan

We can stuff this information in as parametrized system values, like we
currently do texture size and SSBO addresses.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agonir: Add load_sampler_lod_paramaters_pan intrinsic
Alyssa Rosenzweig [Thu, 21 Nov 2019 13:41:22 +0000 (08:41 -0500)]
nir: Add load_sampler_lod_paramaters_pan intrinsic

This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
5 years agomapi/glapi: Generate sizeof() helpers instead of fixed sizes.
Markus Wick [Sun, 17 Nov 2019 18:12:04 +0000 (19:12 +0100)]
mapi/glapi: Generate sizeof() helpers instead of fixed sizes.

Generating a source code with a fixed size leads to issues with plattform dependent types.
We either hard code 4 or 8 bytes there, and both are wrong on the other plattform.
So this patch solves this issue by generating eg sizeof(GLsizeiptr), which is valid both
on 32 and on 64 bit plattforms.

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
5 years agointel/fs: Disable conditional discard optimization on Gen4 and Gen5
Ian Romanick [Mon, 18 Nov 2019 19:52:47 +0000 (11:52 -0800)]
intel/fs: Disable conditional discard optimization on Gen4 and Gen5

The CMP instruction on Gen4 and Gen5 generates one bit (the LSB) of
valid data and 31 bits of junk.  Results of comparisons that are used as
Boolean values need to have a fixup applied to generate the proper 0/~0
values.

Calling fs_visitor::nir_emit_alu with need_dest=false prevents the fixup
code from being generated.  This results in a sequence like:

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
(+f0.1) or.z.f0.1(16) null<1>UD g4<8,8,1>UD     g8<8,8,1>UD

instead of

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
        or(16) g4<1>UD g4<8,8,1>UD     g8<8,8,1>UD
(+f0.1) and.z.f0.1(16) null<1>UD g4<8,8,1>UD     1UD

I examined a couple of the shaders hurt by this change, and ALL of them
would have been affected by this bug. :(

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1836
Fixes: 0ba9497e66a ("intel/fs: Improve discard_if code generation")
Iron Lake
total instructions in shared programs: 8122757 -> 8122957 (<.01%)
instructions in affected programs: 8307 -> 8507 (2.41%)
helped: 0
HURT: 100
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.67% x̄: 2.81% x̃: 2.76%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.58% 3.03%
Instructions are HURT.

total cycles in shared programs: 188510100 -> 188510376 (<.01%)
cycles in affected programs: 76018 -> 76294 (0.36%)
helped: 0
HURT: 55
HURT stats (abs)   min: 2 max: 12 x̄: 5.02 x̃: 4
HURT stats (rel)   min: 0.07% max: 3.75% x̄: 0.86% x̃: 0.56%
95% mean confidence interval for cycles value: 4.33 5.71
95% mean confidence interval for cycles %-change: 0.60% 1.12%
Cycles are HURT.

GM45
total instructions in shared programs: 4994403 -> 4994503 (<.01%)
instructions in affected programs: 4212 -> 4312 (2.37%)
helped: 0
HURT: 50
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.25% x̄: 2.76% x̃: 2.72%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.45% 3.07%
Instructions are HURT.

total cycles in shared programs: 128928750 -> 128928982 (<.01%)
cycles in affected programs: 67442 -> 67674 (0.34%)
helped: 0
HURT: 47
HURT stats (abs)   min: 2 max: 12 x̄: 4.94 x̃: 4
HURT stats (rel)   min: 0.09% max: 3.75% x̄: 0.75% x̃: 0.53%
95% mean confidence interval for cycles value: 4.19 5.68
95% mean confidence interval for cycles %-change: 0.50% 1.00%
Cycles are HURT.

5 years agodocs: update calendar, add news item and link release notes for 19.2.6
Dylan Baker [Fri, 22 Nov 2019 00:33:19 +0000 (16:33 -0800)]
docs: update calendar, add news item and link release notes for 19.2.6

5 years agodocs: Add SHA256 sum for 19.2.6
Dylan Baker [Fri, 22 Nov 2019 00:31:47 +0000 (16:31 -0800)]
docs: Add SHA256 sum for 19.2.6

5 years agodocs: Add release notes for 19.2.6
Dylan Baker [Fri, 22 Nov 2019 00:04:11 +0000 (16:04 -0800)]
docs: Add release notes for 19.2.6

5 years agonir/serialize: do ctx = {0} instead of manual initializations
Marek Olšák [Tue, 5 Nov 2019 02:29:56 +0000 (21:29 -0500)]
nir/serialize: do ctx = {0} instead of manual initializations

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agonir: strip as we serialize to remove the nir_shader_clone call
Marek Olšák [Mon, 4 Nov 2019 23:09:26 +0000 (18:09 -0500)]
nir: strip as we serialize to remove the nir_shader_clone call

Serializing stripped NIR is faster now.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
5 years agoetnaviv: add drm-shim
Christian Gmeiner [Tue, 6 Aug 2019 21:49:03 +0000 (23:49 +0200)]
etnaviv: add drm-shim

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agovk_util: drop duplicate formats in vk_format_map[]
Eric Engestrom [Thu, 21 Nov 2019 20:29:35 +0000 (20:29 +0000)]
vk_util: drop duplicate formats in vk_format_map[]

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoturnip: implement UBWC
Jonathan Marek [Mon, 18 Nov 2019 21:46:39 +0000 (16:46 -0500)]
turnip: implement UBWC

This enables UBWC for everything except 3D textures.

It breaks many image_to_image copies but those aren't important and it can
be worked around later (image_to_image copy needs to be done in two steps,
decode from the source format and then encode to the destination format).

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agofreedreno/regs: update UBWC related bits
Jonathan Marek [Mon, 18 Nov 2019 21:17:55 +0000 (16:17 -0500)]
freedreno/regs: update UBWC related bits

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agoswr: Fix build with llvm-10.0.
Vinson Lee [Wed, 20 Nov 2019 06:51:22 +0000 (06:51 +0000)]
swr: Fix build with llvm-10.0.

Fix build error after llvm-10.0 commit 1dfede3122ee ("Move
CodeGenFileType enum to Support/CodeGen.h").

../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp: In member function ‘void JitManager::DumpAsm(llvm::Function*, const char*)’:
../src/gallium/drivers/swr/rasterizer/jitter/JitManager.cpp:428:45: error: ‘CGFT_AssemblyFile’ is not a member of ‘llvm::TargetMachine’
             *pMPasses, filestream, nullptr, TargetMachine::CGFT_AssemblyFile);
                                             ^

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
5 years agoaco: fix copy+paste error
Rhys Perry [Tue, 22 Oct 2019 14:16:37 +0000 (15:16 +0100)]
aco: fix copy+paste error

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
5 years agoaco: improve waitcnt insertion around loops
Rhys Perry [Mon, 21 Oct 2019 20:36:41 +0000 (21:36 +0100)]
aco: improve waitcnt insertion around loops

Do this by repeating processing of loops until no progress is made.

Totals from affected shaders:
SGPRS: 162576 -> 162576 (0.00 %)
VGPRS: 145228 -> 145228 (0.00 %)
Spilled SGPRs: 668 -> 668 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 15778640 -> 15771336 (-0.05 %) bytes
LDS: 146 -> 146 (0.00 %) blocks
Max Waves: 6087 -> 6087 (0.00 %)

v2: use block_kind_loop_header/block_kind_loop_exit to repeat at the end
    of loops instead of at each continue

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
5 years agofreedreno/perfctrs/fdperf: periodically restore counters
Rob Clark [Wed, 20 Nov 2019 19:56:57 +0000 (11:56 -0800)]
freedreno/perfctrs/fdperf: periodically restore counters

When GPU is idle and suspends, the currently selected countables
will all reset to the first one.  So periodically restore the selected
countables.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/perfcntrs: add fdperf
Rob Clark [Tue, 19 Nov 2019 22:53:49 +0000 (14:53 -0800)]
freedreno/perfcntrs: add fdperf

Port from the envytools tree, but converted to use the .c tables for
describing the perfcounter groups/countables, rather than using rnndec
to get this at runtime from the register xml.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/perfcntrs/a6xx: remove RBBM counters
Rob Clark [Wed, 20 Nov 2019 00:37:18 +0000 (16:37 -0800)]
freedreno/perfcntrs/a6xx: remove RBBM counters

Currently this are getting blocked by the kernel.. these counters don't
seem to be the most useful ones, and to use them we'd have to somehow
probe the kernel by submitting cmdstream to write the selector regs and
see if that triggers a GPU fault.  So let's just skip them.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/perfctrs/a2xx: move CP to be first group
Rob Clark [Wed, 20 Nov 2019 00:35:58 +0000 (16:35 -0800)]
freedreno/perfctrs/a2xx: move CP to be first group

fdperf expects this, to find the ALWAYS_COUNT counter

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/perfcntrs: add accessor to get per-gen tables
Rob Clark [Tue, 19 Nov 2019 19:43:40 +0000 (11:43 -0800)]
freedreno/perfcntrs: add accessor to get per-gen tables

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/perfcntrs: move to shared location
Rob Clark [Tue, 19 Nov 2019 19:05:59 +0000 (11:05 -0800)]
freedreno/perfcntrs: move to shared location

This should eventually be useful for VK_KHR_performance_query as well.
And in the more near term, for fdperf.

Attempt to not break android build is best-effort and untested.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/perfcntrs: remove gallium dependencies
Rob Clark [Tue, 19 Nov 2019 18:54:04 +0000 (10:54 -0800)]
freedreno/perfcntrs: remove gallium dependencies

Prep work to move to a shared location.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/perfcntrs: small cleanup
Rob Clark [Tue, 19 Nov 2019 18:22:44 +0000 (10:22 -0800)]
freedreno/perfcntrs: small cleanup

When we had one gen supporting performance counters, it made sense to
have these builder macros in the .c file with the table.  But time has
come to de-duplicate.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agonir: fix deref offset builder
Dave Airlie [Mon, 18 Nov 2019 22:26:54 +0000 (08:26 +1000)]
nir: fix deref offset builder

Use the correct bit size

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agovtn/opencl: add clz support
Dave Airlie [Mon, 18 Nov 2019 07:04:35 +0000 (17:04 +1000)]
vtn/opencl: add clz support

This is needed for OpenCL

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonouveau: request ufind_msb64 lowering in the frontend.
Dave Airlie [Tue, 19 Nov 2019 23:23:46 +0000 (09:23 +1000)]
nouveau: request ufind_msb64 lowering in the frontend.

This passes the piglit CL builtin-ulong-clz-1.0.generated.cl
test.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
5 years agonir: add 64-bit ufind_msb lowering support. (v2)
Dave Airlie [Tue, 19 Nov 2019 23:23:14 +0000 (09:23 +1000)]
nir: add 64-bit ufind_msb lowering support. (v2)

This adds the option to lower 64-bit ufind_msb opcodes.

v2: use split_x/y removes component loops (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agospirv/nir/opencl: handle some multiply instructions.
Dave Airlie [Mon, 29 Apr 2019 20:57:11 +0000 (06:57 +1000)]
spirv/nir/opencl: handle some multiply instructions.

This adds support for some missing 24-bit and hi multiply
variants.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agospirv: get the correct type for function returns.
Dave Airlie [Tue, 19 Nov 2019 22:33:10 +0000 (08:33 +1000)]
spirv: get the correct type for function returns.

This needs to be derived from the address format, not always 1/32.

Suggested by Jason

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agospirv: don't store 0 to cs.ptr_size for non kernel stages.
Dave Airlie [Tue, 19 Nov 2019 22:29:30 +0000 (08:29 +1000)]
spirv: don't store 0 to cs.ptr_size for non kernel stages.

cs is a union so storing this there is wrong.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoutil: add missing R8G8B8A8_SRGB format to vk_format_map
Jonathan Marek [Thu, 21 Nov 2019 13:24:19 +0000 (08:24 -0500)]
util: add missing R8G8B8A8_SRGB format to vk_format_map

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Eric Anholt <eric@anholt.net>
5 years agodocs: fix ascii html representation
Elie Tournier [Fri, 15 Nov 2019 13:41:25 +0000 (13:41 +0000)]
docs: fix ascii html representation

v2 (Eric): Use more readable ascii version

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agoDocs: remove duplicate meson docs for windows
Elie Tournier [Fri, 15 Nov 2019 13:29:58 +0000 (13:29 +0000)]
Docs: remove duplicate meson docs for windows

This block is duplicated, we already have the windows instruction above.

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agoci: Move freedreno's parallelism to the runner instead of gitlab-ci jobs.
Eric Anholt [Thu, 21 Nov 2019 13:12:58 +0000 (05:12 -0800)]
ci: Move freedreno's parallelism to the runner instead of gitlab-ci jobs.

I set the runners to concurrency=1, so they serve only one gitlab-ci job
at at time.  Swap over to using the parallel runner now to keep the
runners busy, more efficiently than spawning many docker containers and
downloading artifacts multiple times, and producing easier-to-understand
results for browsing on the web.

This bumps the a306 runners to 4x parallel instead of 2x like before, but
cheza gles3 drops from 6 to 4.  Current rough timings of the jobs (if no
container download):

db410c-gles2: 5:00
a630-gles2: 1:30
a630-gles3: 6:00
a630-gles31: 5:30

a630-gles3 is a bit longer than I like, but it should come back down once
I can sort out the NIR algebraic rewinding.

5 years agoglsl: add missing initialization of the location path field
Iago Toral Quiroga [Thu, 21 Nov 2019 09:05:49 +0000 (10:05 +0100)]
glsl: add missing initialization of the location path field

This was apparently missed in 67b32190f3c95, which added support
for ARB_shading_language_include to #line, including the 'path'
field for the location.

Fixes crashes in CTS with all drivers as they attempt to access
an uninitialized path string during parsing.

Fixes: 67b32190f3c95 ("glsl: add ARB_shading_language_include support to #line")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2132
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>
5 years agodocs: update features.txt for RADV
Rhys Perry [Wed, 20 Nov 2019 15:08:30 +0000 (15:08 +0000)]
docs: update features.txt for RADV

[skip ci]

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>