Rob Clark [Fri, 25 Oct 2019 17:48:22 +0000 (10:48 -0700)]
freedreno/ir3: simplify creating sysval inputs
In almost all places, the add_sysval_input() is paired directly with a
create_input(). (The one exception is frag shader ij bary coord, and
this exception will go away in a later patch.)
So go ahead and clean this up before reworking input/output handling.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Fri, 25 Oct 2019 17:36:36 +0000 (10:36 -0700)]
freedreno/ir3: remove first-vertex sysval
This is a driver-param (loaded from uniform), not a sysval (populated by
hw into a register). So it has no value to having a sysval slot.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Fri, 25 Oct 2019 16:28:54 +0000 (09:28 -0700)]
freedreno/ir3: helper to print ir if debug enabled
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Fri, 25 Oct 2019 23:15:10 +0000 (16:15 -0700)]
freedreno/ir3: show input/output wrmask's in disasm
Currently it is always 0x1 (scalar), but that will change in a later
patch.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Thu, 24 Oct 2019 19:05:56 +0000 (12:05 -0700)]
freedreno/ir3: add input/output iterators
We can at least get rid of the if-not-NULL check in a bunch of places.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Sat, 26 Oct 2019 17:47:21 +0000 (10:47 -0700)]
freedreno/ir3: remove impossible condition
We keep kill's alive w/ keeps these days, rather than a fake output.
This condition was left over from prior to that change.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Thu, 24 Oct 2019 17:22:33 +0000 (10:22 -0700)]
freedreno/ir3: rename fanin/fanout to collect/split
If I'm going to refactor a bit to use these meta instructions to also
handle input/output, then might as well cleanup the names first.
Nouveau also uses collect/split for names of these meta instructions,
and I like those names better.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Thu, 24 Oct 2019 18:26:34 +0000 (11:26 -0700)]
freedreno/ir3: remove half-precision output
This doesn't really work, we can't necessarily just change the outputs
to half-precision like this in anything but simple cases.
Keep the shader key entry around though, eventually with proper mediump
support we could use this with a nir pass to use lower precision frag
shader outputs when the render target format has <= 16b/component.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Rob Clark [Sat, 9 Nov 2019 19:07:33 +0000 (11:07 -0800)]
freedreno/ir3: fix valgrind complaint with STLW
The instruction has 3 src regs, so `instr->regs[0..3]` are valid, but
`instr->regs[4]` is not.
```
Test case 'dEQP-GLES31.functional.shaders.linkage.es31.tessellation.varying.rules.output_superfluous_declaration'..
==29239== Invalid read of size 8
==29239== at 0x5BE9CDC: emit_cat6 (ir3.c:841)
==29239== by 0x5BEA1BF: ir3_assemble (ir3.c:921)
==29239== by 0x5BDF0A7: ir3_shader_assemble (ir3_shader.c:133)
==29239== by 0x5BDF193: assemble_variant (ir3_shader.c:162)
==29239== by 0x5BDF407: create_variant (ir3_shader.c:215)
==29239== by 0x5BDF4DB: shader_variant (ir3_shader.c:241)
==29239== by 0x5BDF553: ir3_shader_get_variant (ir3_shader.c:257)
==29239== by 0x5BA85F7: ir3_shader_variant (ir3_gallium.c:80)
==29239== by 0x5BA7703: ir3_cache_lookup (ir3_cache.c:96)
==29239== by 0x5B8B8B3: fd6_emit_get_prog (fd6_emit.h:119)
==29239== by 0x5B8C137: fd6_draw_vbo (fd6_draw.c:186)
==29239== by 0x5BB1FBB: fd_draw_vbo (freedreno_draw.c:290)
==29239== Address 0xb97f2d0 is 0 bytes after a block of size 240 alloc'd
==29239== at 0x4848D54: malloc (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==29239== by 0x61BD35B: ralloc_size (ralloc.c:119)
==29239== by 0x61BD41B: rzalloc_size (ralloc.c:151)
==29239== by 0x5BE599B: ir3_alloc (ir3.c:45)
==29239== by 0x5BEA583: instr_create (ir3.c:984)
==29239== by 0x5BEA5DF: ir3_instr_create2 (ir3.c:1000)
==29239== by 0x5BEE317: ir3_STLW (ir3.h:1431)
==29239== by 0x5BF12D3: emit_intrinsic_store_shared_ir3 (ir3_compiler_nir.c:903)
==29239== by 0x5BF418B: emit_intrinsic (ir3_compiler_nir.c:1802)
==29239== by 0x5BF5D07: emit_instr (ir3_compiler_nir.c:2339)
==29239== by 0x5BF603F: emit_block (ir3_compiler_nir.c:2426)
==29239== by 0x5BF624B: emit_cf_list (ir3_compiler_nir.c:2474)
==29239==
```
Probably this only triggers in non-optimized builds?
Fixes: 1f3b52ce503 ("freedreno/a6xx: Add register offset for STG/LDG")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Anholt [Tue, 12 Nov 2019 19:50:43 +0000 (11:50 -0800)]
ci: Remove old commented copy of freedreno artifacts.
This path was from an older version of freedreno CI.
Eric Anholt [Tue, 5 Nov 2019 18:31:29 +0000 (10:31 -0800)]
ci: Enable all of GLES3/3.1 testing for softpipe.
Now that we're not using so many job slots, it's easy to get these
jobs run in a reasonable amount of time (gles3 took 10 minutes for 4
cores, and gles31 was 15 minutes for 4 cores).
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Eric Anholt [Mon, 4 Nov 2019 18:54:41 +0000 (10:54 -0800)]
ci: Use cts_runner for our dEQP runs.
This runner is a little project by Bas, written in C++, that spawns
threads that then loop grabbing chunks of the (randomly shuffled but
consistently so) test list and hand it to a dEQP instance. As the
remaining list gets shorter, so do the chunks, so hopefully the
threads all complete effectively at once. It also handles restarting
after crashes automatically. I've extended the runner a bit to do
what I was doing in the bash scripts before, like the skip list and
expected failures handling. This project should also be a good
baseline for extending to handle retesting of intermittent failures.
By switching to it, we can have the swrast tests just take up one job
slot on the shared runners and keep their allotment of CPUs busy,
instead of taking up job slots with single-threaded dEQP jobs. It
will also let us (eventually, once I reprovision) switch the freedreno
runners over to threading within the job instead of running concurrent
jobs, so that memory scribbles in one pipeline don't affect unrelated
pipelines, and I can experiment with their parallelism (particularly
on a306 where we are frequently backed up) without trashing other
people's jobs.
What we lose in this process is per-test output in the log (not a big
loss, I think, since we summarize fails at the end and reducing log
length keeps chrome from choking on our logs so badly). We also drop
the renderer sanity checking, since it's not saving qpa files for us
to go poke through. Given that all the drivers involved have fail
lists, if we got the wrong renderer somehow, we'd get a job failure
anyway.
v2: Rebase on droppong of the autoscale cluster and the arm64
build/test split. Use a script to deduplicate the cts-runner
build.
v3: Rebase on the amd64 build/test container split.
Acked-by: Daniel Stone <daniels@collabora.com> (v1)
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com> (v2)
Eric Anholt [Tue, 5 Nov 2019 17:50:40 +0000 (09:50 -0800)]
ci: Make the skip list regexes match the full test name.
The bash scripts were using grep in the manner that matches any subset
of the line, but the new CTS runner matches the whole line and I think
that's a pretty good behavior. Given that some of the skip lists
already were written to match the full test name, just make them
consistently do so.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Eric Anholt [Mon, 4 Nov 2019 19:05:25 +0000 (11:05 -0800)]
ci: Use several debian buster packages instead of hand-building.
This helps cut down our container build time. I've left a few that
we're likely to rev more frequently or I was less confident in
dropping.
v2: Rebase on the build/test container split, now bumps the build
container tag in this commit.
Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Daniel Stone <daniels@collabora.com> (v1)
Rafael Antognolli [Tue, 5 Nov 2019 23:08:01 +0000 (15:08 -0800)]
iris: Use mocs from isl_dev.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rafael Antognolli [Tue, 5 Nov 2019 19:12:36 +0000 (11:12 -0800)]
anv: Use mocs settings from isl_dev.
v2: Remove device->default_mocs and external_mocs (Jason).
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rafael Antognolli [Tue, 5 Nov 2019 19:11:53 +0000 (11:11 -0800)]
intel/isl: Add MOCS settings to isl_device.
Centralize mocs settings into isl.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rob Clark [Tue, 12 Nov 2019 17:01:34 +0000 (09:01 -0800)]
freedreno: fix eglDupNativeFenceFD error
We can end up with scenarios where last_fence is associated with a batch
that is flushed through some other path before needs_out_fence_fd gets
set. Resulting in returning a fence that has no backing fd.
The simplest thing is to just skip the optimization to try and avoid
no-op batches when a fence-fd is requested. This should normally be
just once a frame anyways.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Brian Paul [Mon, 11 Nov 2019 23:43:45 +0000 (16:43 -0700)]
nir: fix a couple signed/unsigned comparison warnings in nir_builder.h
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Brian Paul [Mon, 11 Nov 2019 23:22:49 +0000 (16:22 -0700)]
s/APIENTRY/GLAPIENTRY/ in teximage.c
The later is the right symbol for entrypoint functions.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Lepton Wu [Wed, 30 Oct 2019 00:41:14 +0000 (17:41 -0700)]
android: mesa: Revert "android: mesa: revert "Enable asm unconditionally""
Commit
45206d7673adb1484cbdb3eadaf82e0849c9cdcf fixed PIC issue of x86 asm stub.
We can enable asm for Android x86 now. This should sightly improve performance.
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
Rhys Perry [Tue, 12 Nov 2019 15:55:05 +0000 (15:55 +0000)]
aco: combine read_invocation and shuffle implementations
They do mostly the same thing now.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Tue, 12 Nov 2019 15:53:15 +0000 (15:53 +0000)]
aco: don't propagate vgprs into v_readlane/v_writelane
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Rhys Perry [Tue, 12 Nov 2019 15:44:17 +0000 (15:44 +0000)]
aco: fix read_invocation with VGPR lane index
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Rhys Perry [Tue, 12 Nov 2019 15:29:45 +0000 (15:29 +0000)]
nir/divergence: improve DA of shuffle
If the data is uniform, then it's really a uniform copy. If the index is
uniform, then it's really a read_invocation.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Rhys Perry [Tue, 12 Nov 2019 15:28:52 +0000 (15:28 +0000)]
aco: fix shuffle with uniform operands
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 93c8ebfa ('aco: Initial commit of independent AMD compiler')
Rhys Perry [Tue, 12 Nov 2019 15:00:48 +0000 (15:00 +0000)]
aco: use DPP instead of exec modification when lowering GFX10 shuffles
Seems we can use DPP's row_mask field to get an effect similar to
modifying exec.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Eric Engestrom [Tue, 12 Nov 2019 14:29:44 +0000 (14:29 +0000)]
gitlab-ci: build libdrm using meson instead of autotools
Autotools was deprecated for a while and has now been removed, so let's
start using meson here so that we won't have any issues next time we
update libdrm.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Daniel Schürmann [Wed, 6 Nov 2019 16:47:06 +0000 (17:47 +0100)]
aco: rematerialize s_movk instructions
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Thu, 7 Nov 2019 15:22:55 +0000 (16:22 +0100)]
aco: preserve kill flag on moved operands during RA
Fixes: 93c8ebfa780ebd1495095e794731881aef29e7d3 aco: Initial commit of independent AMD compiler
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Daniel Schürmann [Fri, 8 Nov 2019 15:36:11 +0000 (16:36 +0100)]
aco: fix invalid access on Pseudo_instructions
Fixes: 93c8ebfa780ebd1495095e794731881aef29e7d3 aco: Initial commit of independent AMD compiler
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Erik Faye-Lund [Thu, 7 Nov 2019 16:48:32 +0000 (17:48 +0100)]
zink: remove no-longer-needed hack
It seems whatever was causing this is no longer an issue. So let's get
rid of the hack here.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Erik Faye-Lund [Fri, 8 Nov 2019 11:53:43 +0000 (12:53 +0100)]
zink: implement buffer-to-buffer copies
Erik Faye-Lund [Fri, 8 Nov 2019 11:54:09 +0000 (12:54 +0100)]
zink: always allow transfer to/from buffers
Danylo Piliaiev [Wed, 30 Oct 2019 14:14:06 +0000 (16:14 +0200)]
intel/blorp: Fix usage of uninitialized memory in key hashing
The automatically generated padding in structs contains
undefined values, force pack the structs to eliminate the
padding. Otherwise structs with the same values may generate
different hashes.
Valgrind output:
Conditional jump or move depends on uninitialised value(s)
util_fast_urem32 (fast_urem_by_const.h:71)
hash_table_search (hash_table.c:262)
_mesa_hash_table_search (hash_table.c:296)
anv_pipeline_cache_search_locked (anv_pipeline_cache.c:318)
anv_pipeline_cache_search (anv_pipeline_cache.c:335)
lookup_blorp_shader (anv_blorp.c:38)
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1112)
blorp_mcs_partial_resolve (blorp_clear.c:1205)
anv_image_mcs_op (anv_blorp.c:1742)
anv_cmd_predicated_mcs_resolve (genX_cmd_buffer.c:774)
transition_color_buffer (genX_cmd_buffer.c:1159)
cmd_buffer_end_subpass (genX_cmd_buffer.c:4840)
Uninitialised value was created by a stack allocation
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1103)
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Danylo Piliaiev [Fri, 8 Nov 2019 15:47:57 +0000 (17:47 +0200)]
i965/program_cache: Lift restriction on shader key size
This will allow usage of packed structs which may have size
not divisible by 4.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Michel Dänzer [Wed, 6 Nov 2019 18:58:19 +0000 (19:58 +0100)]
gitlab-ci: Delete install/bin from artifacts as well
This cuts the x86 artifacts zip file size in less than half.
Reviewed-by: Eric Anholt <eric@anholt.net>
Michel Dänzer [Tue, 5 Nov 2019 17:52:24 +0000 (18:52 +0100)]
gitlab-ci: Use separate docker images for x86 build/test jobs
Same as was done for the ARM images before.
This should make it less painful to update to newer dEQP / piglit as
well as to make changes to the build/test environment.
Reviewed-by: Eric Anholt <eric@anholt.net>
Michel Dänzer [Tue, 22 Oct 2019 15:16:52 +0000 (17:16 +0200)]
gitlab-ci: Run piglit tests with llvmpipe
One job for the quick_gl profile, one for the glslparser & quick_shader
profiles (doing these together takes hardly any more time than
quick_shader alone).
v2:
* Don't break lava tests
v3:
* Remove piglit test artifacts paths:
* Exclude some quick_shader tests again:
- Test whose result flips between pass/fail/skip
- *@vs_in tests, as not the same one of these gets picked every time
v4:
* Do not list passing tests in .gitlab-ci/piglit/*.txt (Eric Anholt)
* Include the test number summary in .gitlab-ci/piglit/*.txt
* Completely disable generating any vs_in tests in the piglit build.
* Remove some more unneded files from the piglit build tree.
* Exclude quick_gl arb_gpu_shader5 tests; they were all skipped anyway,
as llvmpipe doesn't support this extension yet, but occasionally they
would spuriously fail instead.
v5:
* Set LD_LIBRARY_PATH, so we actually test the Mesa build from the
pipeline...
* Verify that wflinfo reports the expected Mesa version
* Pass -noreset to Xvfb
v6:
* Don't use autoscale runners, run piglit with -j4 (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
Michel Dänzer [Wed, 6 Nov 2019 16:05:56 +0000 (17:05 +0100)]
gitlab-ci: Sort packages in debian-install.sh
And remove duplicates.
Reviewed-by: Eric Anholt <eric@anholt.net>
Michel Dänzer [Tue, 5 Nov 2019 18:02:17 +0000 (19:02 +0100)]
gitlab-ci: Share dEQP build process between x86 & ARM test image scripts
See https://gitlab.freedesktop.org/mesa/mesa/issues/2056
v2:
* Rename .gitlab-ci/deqp-build.sh => .gitlab-ci/build-deqp.sh
(Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
Michel Dänzer [Wed, 23 Oct 2019 16:42:53 +0000 (18:42 +0200)]
gitlab-ci: Move artifact preparation to separate script
It's currently only needed for the meson-main and meson-arm64 jobs, not
the other meson build jobs.
Also remove MESON_SHADERDB, just run .gitlab-ci/run-shader-db.sh
directly from the meson-main job.
v2:
* Also run prepare-artifacts.sh in meson-arm64 script
v3:
* Move tarball creation into the new script as well, as it prevented
ccache --show-stats from running in after_script
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> # v1
Reviewed-by: Eric Anholt <eric@anholt.net>
Michel Dänzer [Tue, 22 Oct 2019 16:27:53 +0000 (18:27 +0200)]
gitlab-ci: Use ninja -j4 for building dEQP
By default, ninja tries to saturate all cores of the runner host
machine, which could overload it due to other jobs running in parallel.
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Jason Ekstrand [Mon, 11 Nov 2019 15:37:50 +0000 (09:37 -0600)]
spirv: Fix the MSVC build
Fixes: 9cc4c2c91649b "spirv: Add a vtn_decorate_pointer helper"
Tested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Erik Faye-Lund [Wed, 30 Oct 2019 13:53:56 +0000 (14:53 +0100)]
nir: patch up deref-vars when lowering clip-planes
Otherwise, we fail validation and potentially generate invalid code.
Let's fix up the mode of the accesses to the variable.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Samuel Pitoiset [Mon, 11 Nov 2019 10:34:05 +0000 (11:34 +0100)]
ac: handle pointer types to LDS in ac_get_elem_bits()
This fixes crashes with some
dEQP-VK.spirv_assembly.instruction.spirv1p4.* tests.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Jonathan Marek [Thu, 7 Nov 2019 12:28:37 +0000 (07:28 -0500)]
freedreno: add Adreno 640 ID
A640 seems to work without any other changes (glmark and vkcube).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Luis Mendes [Sat, 9 Nov 2019 23:21:05 +0000 (23:21 +0000)]
radv: fix radv secure compile feature breaks compilation on armhf EABI and aarch64
__NR_select is not defined the same way across architectures, sometimes is
not even defined, like in armhf EABI and aarch64.
Signed-off-by: Luis Mendes <luis.p.mendes@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2042
Marek Olšák [Sat, 9 Nov 2019 00:43:10 +0000 (19:43 -0500)]
st/mesa: remove unused TGSI-only debug printing functions
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Sat, 9 Nov 2019 00:40:44 +0000 (19:40 -0500)]
st/mesa: add ST_DEBUG=nir to print NIR shaders
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Sat, 9 Nov 2019 00:35:02 +0000 (19:35 -0500)]
st/mesa: print TCS/TES/GS/CS TGSI in the right place & keep disk cache enabled
The old place only printed on a disk cache miss, which is why the disk
cache was disabled.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Sat, 9 Nov 2019 00:32:25 +0000 (19:32 -0500)]
st/mesa: remove \n being only printed in debug builds after printed TGSI
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Sat, 9 Nov 2019 00:24:34 +0000 (19:24 -0500)]
st/mesa: rename DEBUG_TGSI -> DEBUG_PRINT_IR
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Mon, 11 Nov 2019 22:04:15 +0000 (17:04 -0500)]
st/mesa: fix Sanctuary and Tropics by disabling ARB_gpu_shader5 for them
They use the "sample" keyword as a variable name.
Cc: 19.2 19.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Lionel Landwerlin [Tue, 16 Oct 2018 22:44:31 +0000 (17:44 -0500)]
anv: implement VK_KHR_timeline_semaphore
v2: Fix inverted condition in vkGetPhysicalDeviceExternalSemaphoreProperties()
v3: Add anv_timeline_* helpers (Jason)
v4: Avoid variable shadowing (Jason)
Split timeline wait/signal device operations (Jason/Lionel)
v5: s/point/signal_value/ (Jason)
Drop piece of drm-syncobj timeline code (Jason)
v6: Add missing sync_fd semaphore signaling (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Tue, 16 Oct 2018 20:58:14 +0000 (15:58 -0500)]
anv: Plumb timeline semaphore signal/wait values through from the API
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Lionel Landwerlin [Thu, 11 Jul 2019 12:21:04 +0000 (15:21 +0300)]
anv/wsi: signal the semaphore in the acquireNextImage
We seem to have forgotten about the semaphore in the
acquireNextImageInfo.
v2: Signal semaphore/fence regardless of presentation status (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Mon, 11 Nov 2019 16:58:44 +0000 (10:58 -0600)]
anv: Lock around fetching sync file FDs from semaphores
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Lionel Landwerlin [Mon, 30 Sep 2019 09:30:20 +0000 (12:30 +0300)]
anv: prepare the driver for delayed submissions
Timeline semaphore introduce support for wait before signal behavior,
which means that it is now allowed to call vkQueueSubmit() with wait
semaphores not yet submitted for execution. Our kernel driver requires
all of the wait primitives to be created before calling the execbuf
ioctl. As a result, we must delay submissions in the userspace driver.
This change store the necessary information to be able to delay a
VkSubmitInfo submission to the kernel driver.
v2: Fold count++ into array access (Jason)
Move queue list to another patch (Jason)
v3: Document cleanup of temporary semaphores (Jason)
v4: Track semaphores of SYNC_FD type that needs updating after delayed
submission
v5: Don't forget to update sync_fd in signaled semaphores after
submission (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Sat, 26 Oct 2019 15:59:59 +0000 (18:59 +0300)]
anv: refcount semaphores
Delayed submissions required by timeline semaphores mean we need to be
able to update the sync fd backed semaphores in a delayed fashion.
This could mean a race between the application destroying the
semaphore and the submission code trying to update it with the new
sync fd.
This change prepares semaphores to be refcounted, we'll most likely
only take a reference for cases where we signal a sync fd semaphore.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Fri, 23 Aug 2019 11:48:28 +0000 (13:48 +0200)]
anv: prepare driver to report submission error through queues
When we will submit to i915 from a submission thread, we won't be able
to directly report the error to the user (in particular through the
debug report callbacks). So prepare 2 paths to report errors device ->
notifying the user immediately, queue -> notifying the user the next
time an entry point is called.
In this change we still report directly for both paths, this will
change in the next commit.
v2: Split NULL batch parameter handling in
anv_queue_submit_simple_batch() in a different commit
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Fri, 23 Aug 2019 17:14:34 +0000 (20:14 +0300)]
anv: allow NULL batch parameter to anv_queue_submit_simple_batch
We can reuse device->trivial_batch_bo
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Fri, 23 Aug 2019 10:30:42 +0000 (12:30 +0200)]
anv: move queue init/finish to anv_queue.c
Prepare the queue initialization to take on more responsabilities and
possibly fail.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Wed, 7 Aug 2019 13:46:45 +0000 (16:46 +0300)]
anv: expose timeout helpers outside of anv_queue.c
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Thu, 1 Aug 2019 10:21:41 +0000 (13:21 +0300)]
anv: detach batch emission allocation from device
In the future we'll have 2 different allocations depending on whether
we're using threaded submission or not.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Lionel Landwerlin [Thu, 19 Sep 2019 22:24:53 +0000 (01:24 +0300)]
anv: remove list items on batch fini
This doesn't seem to fix anything because those destroy() calls happen
right before the command buffer object & its list of batch_bo is also
destroyed. Still looks a bit cleaner.
v2: Found a second occurence
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
Fixes: 26ba0ad54d ("vk: Re-name command buffer implementation files")
Cc: <mesa-stable@lists.freedesktop.org>
Lionel Landwerlin [Thu, 29 Aug 2019 11:54:12 +0000 (14:54 +0300)]
anv: invalidate file descriptor of semaphore sync fd at vkQueueSubmit
We always close the in_fence at the end the anv_cmd_buffer_execbuf()
so when we take it from the semaphore, let's not forget to invalidate
it.
Note that the code leaks the fence_in if we get any error before
reaching the close(). Let's fix that in another patch or better,
rewrite the whole thing!
v2: drop redundant fd = -1 (Jason)
v3: Update commit message (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rhys Perry [Mon, 11 Nov 2019 11:16:31 +0000 (11:16 +0000)]
radv: fix radv_nir_get_max_workgroup_size when nir=NULL
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Fixes: 84a1a2578 ('compiler: pack shader_info from 160 bytes to 96 bytes')
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Lionel Landwerlin [Mon, 11 Nov 2019 10:32:50 +0000 (12:32 +0200)]
mesa: check framebuffer completeness only after state update
The change made in
88d665830f27 ("mesa: check draw buffer completeness
on glClearBufferfi/glClearBufferiv") correctly updated the state prior
to checking the framebuffer completeness on glClearBufferiv but not in
glClearBufferfi.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: 88d665830f27 ("mesa: check draw buffer completeness on glClearBufferfi/glClearBufferiv")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/2072
Caio Marcelo de Oliveira Filho [Sat, 9 Nov 2019 06:21:10 +0000 (22:21 -0800)]
glsl: Check earlier for MaxTextureImageUnits and MaxImageUniforms
Currently the linker do all the work then check for the limits, which
means num_textures and num_images in shader_info may have to store more
than the limit. This breaks down now since shader_info was packed and
doesn't expect to store larger invalid values.
To fix this, pull the check before we set the counts in shader_info.
Add necessary plumbing to make sure we bail once those errors are
found.
Fixes: 84a1a2578da ("compiler: pack shader_info from 160 bytes to 96 bytes")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Caio Marcelo de Oliveira Filho [Sat, 9 Nov 2019 06:00:10 +0000 (22:00 -0800)]
glsl: Check earlier for MaxShaderStorageBlocks and MaxUniformBlocks
Currently the linker do all the work then check for the limits, which
means num_ssbos and num_ubos in shader_info may have to store more
than the limit. This breaks down now since shader_info was packed and
doesn't expect to store larger invalid values.
To fix this, pull the check before we set the counts in shader_info.
One drawback of this approach is that for some cases we might not see
the collected errors from various stages, but bail as soon as a stage
breaks the limits.
Fixes: 84a1a2578da ("compiler: pack shader_info from 160 bytes to 96 bytes")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Dylan Baker [Thu, 31 Oct 2019 20:26:00 +0000 (13:26 -0700)]
util: Use ZSTD for shader cache if possible
This allows ZSTD instead of ZLIB to be used for compressing the shader
cache.
On a 72 core system emulating skl with a full shader-db (with i965):
ZSTD:
1915.10s user 229.27s system 5150% cpu 41.632 total (cold cache)
225.40s user 10.87s system 3810% cpu 6.201 total (warm cache)
154M (235M on disk)
ZLIB:
2231.33s user 194.24s system 1899% cpu 2:07.72 total (cold cache)
229.15s user 10.63s system 3906% cpu 6.139 total (warm cache)
163M (244M on disk)
Tim Arceri sees (8 core ryzen and a full shader-db):
ZSTD:
2505.22 user 40.50 system 3:18.73 elapsed 1280% CPU (cold cache)
418.71 user 14.93 system 0:46.53 elapsed 931% CPU (warm cache)
454.3 MB (681.7 MB on disk)
ZLIB:
3069.83 user 40.02 system 4:20.13 elapsed 1195% CPU (cold cache)
425.50 user 15.17 system 0:46.80 elapsed 941% CPU (warm cache)
470.3 MB (701.4 MB on disk)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Laurent Carlier [Wed, 6 Nov 2019 15:04:50 +0000 (16:04 +0100)]
egl: avoid local modifications for eglext.h Khronos standard header file
Move differences in eglextchromium.h header file, then provide the same header than libglvnd-1.2
So program that omit to include eglextchromium.h will fail to build with both mesa and libglvnd headers.
Fixes: a0a8109f "include: add the definition of EGL_EXT_image_flush_external"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Engestrom [Wed, 6 Nov 2019 19:53:28 +0000 (19:53 +0000)]
egl: move #include of local headers out of Khronos headers
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Jason Ekstrand [Sun, 4 Dec 2016 01:15:42 +0000 (17:15 -0800)]
intel/fs: Lower large local arrays to scratch
Shader-db results on Kaby Lake:
total instructions in shared programs:
14929212 ->
14880028 (-0.33%)
instructions in affected programs: 72428 -> 23244 (-67.91%)
helped: 6
HURT: 2
helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624
helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08%
HURT stats (abs) min: 1178 max: 1178 x̄: 1178.00 x̃: 1178
HURT stats (rel) min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97%
95% mean confidence interval for instructions value: -11947.03 -348.97
95% mean confidence interval for instructions %-change: -125.72% 202.37%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs:
368585300 ->
342557344 (-7.06%)
cycles in affected programs:
28144921 ->
2116965 (-92.48%)
helped: 6
HURT: 2
helped stats (abs) min:
1404978 max:
7766106 x̄:
4353922.00 x̃:
3890682
helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28%
HURT stats (abs) min: 47778 max: 47798 x̄: 47788.00 x̃: 47788
HURT stats (rel) min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59%
95% mean confidence interval for cycles value: -
5900438.73 -606550.27
95% mean confidence interval for cycles %-change: -140.79% 146.16%
Inconclusive result (%-change mean confidence interval includes 0).
total spills in shared programs: 9243 -> 8901 (-3.70%)
spills in affected programs: 2718 -> 2376 (-12.58%)
helped: 4
HURT: 4
total fills in shared programs: 21831 -> 10141 (-53.55%)
fills in affected programs: 11804 -> 114 (-99.03%)
helped: 6
HURT: 2
total sends in shared programs: 815912 -> 815912 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 1
GAINED: 3
The helped shaders are all compute shaders in Aztec Ruins. There is
also a compute shader in synmark2 OglCSDof that's helped but it doesn't
show up in above shader-db results because it went from SIMD8 to SIMD16.
That shader improves enough to yield an 15-20% performance boost to the
benchmark as a whole on my KBL laptop. The hurt shaders are a couple
shaders in Kerbal Space Program and a couple in Aztec Ruins.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Thu, 28 Feb 2019 14:15:30 +0000 (08:15 -0600)]
intel/fs: Implement the new load/store_scratch intrinsics
This commit fills in a number of different pieces:
1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the
new intrinsics. This involves simple plumbing work as well as a
tiny bit of extra logic to always scalarize scratch intrinsics
2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics
into byte/dword scattered read/write messages which use the A32
stateless model.
3. Add code to lower_surface_logical_send to handle dword scattered
messages and the A32 stateless model.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Thu, 28 Feb 2019 16:02:03 +0000 (10:02 -0600)]
intel/nir: Plumb devinfo through lower_mem_access_bit_sizes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Thu, 28 Feb 2019 16:26:33 +0000 (10:26 -0600)]
intel/fs: refactor surface header setup
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Wed, 8 Apr 2015 09:41:33 +0000 (02:41 -0700)]
intel/fs: Add DWord scattered read/write opcodes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Wed, 6 Nov 2019 18:36:28 +0000 (12:36 -0600)]
intel/nir: Use nir_extract_bits in lower_mem_access_bit_sizes
The new helper solves most of the annoying problems with data wrangling
in brw_nir_lower_mem_access_bit_sizes.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Sat, 9 Nov 2019 01:24:05 +0000 (19:24 -0600)]
nir: Add tests for nir_extract_bits
Jason Ekstrand [Wed, 6 Nov 2019 18:09:56 +0000 (12:09 -0600)]
nir/builder: Add a nir_extract_bits helper
This new helper is better than nir_bitcast_vector because it's able to
take a (mostly) arbitrary range from the source vector. The only
requirement is that first_bit has to be aligned to the smaller of the
two bit sizes. It wouldn't be hard to lift that requirement but it's
reasonable for now.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Eric Engestrom [Tue, 10 Sep 2019 16:06:09 +0000 (17:06 +0100)]
egl: fix _EGL_NATIVE_PLATFORM fallback
When the X11 or Haiku platforms were compiled in, they would bypass the
`_EGL_NATIVE_PLATFORM` fallback by always returning themselves instead.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Ricardo Garcia [Thu, 7 Nov 2019 14:38:45 +0000 (15:38 +0100)]
anv: Unify GetDeviceQueue and GetDeviceQueue2
Avoid duplicating some checks and code by making anv_GetDeviceQueue a
subcase of anv_GetDeviceQueue2, like radv does.
Signed-off-by: Ricardo Garcia <rgarcia@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Alyssa Rosenzweig [Thu, 7 Nov 2019 02:48:33 +0000 (21:48 -0500)]
panfrost: Select format-specific blending intrinsics
If we have an accelerated path for a particular framebuffer format,
let's use it to save a bunch of instructions in a blend shader.
[Tomeu: Only use the faster intrinsic on >T760]
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Thu, 7 Nov 2019 13:25:27 +0000 (08:25 -0500)]
pan/midgard: Pack load/store masks
While most load/store operations on 32-bit/vec4 intriniscally, some are
not and have special type-size-dependent semantics for the mask. We need
to convert into this native format.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Thu, 7 Nov 2019 02:50:32 +0000 (21:50 -0500)]
pan/midgard: Implement nir_intrinsic_load_output_u8_as_fp16_pan
We can use the native Midgard ops for this, depending what chip we're
on.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Thu, 7 Nov 2019 02:49:35 +0000 (21:49 -0500)]
pan/midgard: Identify ld_color_buffer_u8_as_fp16*
There are two versions of this opcode, depending what version of the ISA
you're using. I'm not sure if there's a semantic difference; I think
there might be some slight subtleties but it's too early to know at this
stage.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Alyssa Rosenzweig [Thu, 7 Nov 2019 02:47:23 +0000 (21:47 -0500)]
nir: Add load_output_u8_as_fp16_pan intrinsic
This is a single opcode, at least on newer Midgard chips. It's easier to
have this represented in NIR rather than trying to optimize out the
conversions, so let's add the intrinsic.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Tomeu Vizoso [Wed, 6 Nov 2019 09:04:36 +0000 (10:04 +0100)]
panfrost: Set depth and stencil for SFBD based on the format
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Erik Faye-Lund [Fri, 8 Nov 2019 11:22:00 +0000 (12:22 +0100)]
zink: correct depth-stencil format
When using packed vulkan-formats on little-endian systems, we need to
swap the components for the gallium formats. And since Zink isn't
big-endian safe yet, little-endian is the only endianess we care about
right now.
This fixes a bunch of piglit tests, amongs others:
- spec@arb_depth_texture@depth-level-clamp
- spec@arb_depth_texture@depthstencil-render-miplevels * d=z24
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-blit
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-copypixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-drawpixels
- spec@arb_depth_texture@fbo-depth-gl_depth_component24-readpixels
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 8d46e35d16e ("zink: introduce opengl over vulkan")
Erik Faye-Lund [Wed, 6 Nov 2019 14:13:58 +0000 (15:13 +0100)]
zink/spirv: add support for nir_op_flrp
This fixes the following piglit:
spec@ati_fragment_shader@ati_fragment_shader-render-fog
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Chris Wilson [Thu, 31 Oct 2019 07:29:55 +0000 (07:29 +0000)]
egl: Mention if swrast is being forced
The system can be disabling HW acceleration unbeknown to the user,
leading to a long debug session trying to work out which component is
failing. A quick mention that it is the environment override would be
very useful.
v2: Use more generic "CPU renderer" and so try to avoid jargon.
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Martin Peres <martin.peres@linux.intel.com>
Jason Ekstrand [Thu, 26 Sep 2019 16:56:48 +0000 (11:56 -0500)]
spirv: Sort out the mess that is sampled image
This commit makes two major changes. First, we add a second case to
OpLoad for sampled images which constructs a vtn_sampled_image and
stashes that rather than stashing a pointer to the combined image
sampler like we do for bare samplers and images. This should be more in
line with how SPIR-V is intended to work and hopefully doesn't cause any
weird problems. The second is a rework of vtn_handle_texture to assume
that everything has an image but not everything has a sampler. We also
add a vtn_fail_if for the case where a texture instructions require a
sampler but none is provided.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Mon, 4 Nov 2019 22:44:30 +0000 (16:44 -0600)]
spirv: Add a vtn_decorate_pointer helper
This helper makes a duplicate copy of the pointer if any new access
flags are set at this stage. This way we don't end up propagating
access flags further than they actual SPIR-V decorations. In several
instances where we create new pointers, we still call the decoration
helper directly because no copy is needed.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Jason Ekstrand [Thu, 26 Sep 2019 16:48:44 +0000 (11:48 -0500)]
spirv: Remove the type from sampled_image
We have types on all vtn_values at this point so there's no reason to
carry the redundant type information.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Rob Clark [Mon, 4 Nov 2019 19:41:55 +0000 (11:41 -0800)]
freedreno/ir3: also track # of nops for shader-db
The instruction count is (mostly) a measure of what optimization passes
can do, while # of nops is more an indication of how effectively the
scheduler is balancing register pressure vs instruction count. So track
these independently.
(There could be opportunities to rematerialize values to reduce register
pressure, swapping some nop's with other alu instructions, so nothing is
truely independent.. but it is still useful to break these stats out.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Mon, 4 Nov 2019 19:33:54 +0000 (11:33 -0800)]
freedreno/ir3: sync disasm changes from envytools
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Fri, 25 Oct 2019 20:57:49 +0000 (13:57 -0700)]
freedreno/a4xx: fix SP_FS_MRT_REG.HALF_PRECISION
Set flag based on actual output reg type.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Rob Clark [Fri, 25 Oct 2019 20:56:30 +0000 (13:56 -0700)]
freedreno/a3xx: fix SP_FS_MRT_REG.HALF_PRECISION
We should really be setting this based on the actual output register
type.
Signed-off-by: Rob Clark <robdclark@chromium.org>