mesa.git
5 years agofreedreno/ir3: fixup register footprint fixup
Rob Clark [Mon, 21 Oct 2019 23:33:50 +0000 (16:33 -0700)]
freedreno/ir3: fixup register footprint fixup

A small typo meant the footprint was not converted to vec4 units, so we
could potentially ask for quite a few more registers than required.
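
As a rough illustration of the conversion (not the actual ir3 code; the
helper name is made up for this sketch), a scalar footprint is turned into
a vec4 count by rounding up to the next multiple of four:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: number of vec4 registers needed to hold `scalars`
 * scalar components, rounding up. */
static unsigned
scalars_to_vec4(unsigned scalars)
{
   assert(scalars <= UINT_MAX - 3);   /* guard the rounding add */
   return (scalars + 3) / 4;
}
```

Skipping this step would request roughly four times as many registers as
needed, which matches the symptom described above.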

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: handle scalarized varying inputs
Rob Clark [Mon, 21 Oct 2019 18:15:53 +0000 (11:15 -0700)]
freedreno/ir3: handle scalarized varying inputs

If the load_interpolated_input is scalarized, we would be too
conservative and decide that the tex instruction wasn't a candidate for
pre-fetch:

vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (0) /* interp_mode=0 */
vec1 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 0) /* base=0 */ /* component=0 */ /* packed:v_uv,v_uv1 */
vec1 32 ssa_3 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 1) /* base=0 */ /* component=1 */ /* packed:v_uv,v_uv1 */
vec2 32 ssa_8 = vec2 ssa_2, ssa_3
vec4 32 ssa_9 = tex ssa_8 (coord), 0 (texture), 0 (sampler)

Really we don't care that the texcoord components come from different
load_interpolated_input instructions, just that they have consecutive
varying offsets.
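
A minimal sketch of that check, assuming four scalar components per
varying slot. The struct and function names are hypothetical and are not
taken from ir3:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of where one texcoord component comes from. */
struct varying_src {
   unsigned base;        /* varying slot */
   unsigned component;   /* component within the slot */
};

/* Flattened scalar offset, assuming four components per slot. */
static unsigned
varying_offset(struct varying_src s)
{
   return s.base * 4 + s.component;
}

/* True if the n sources read consecutive scalar varying offsets,
 * regardless of how many load instructions produced them. */
static bool
coords_are_consecutive(const struct varying_src *src, unsigned n)
{
   assert(src && n > 0);
   for (unsigned i = 1; i < n; i++) {
      if (varying_offset(src[i]) != varying_offset(src[0]) + i)
         return false;
   }
   return true;
}
```

For the shader above, ssa_2 and ssa_3 would map to (base=0, component=0)
and (base=0, component=1), which pass the check.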

Reported-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agoaco: refactor value numbering
Daniel Schürmann [Sat, 19 Oct 2019 14:11:13 +0000 (16:11 +0200)]
aco: refactor value numbering

Previously, we used one hashset per BB, so that we could
always initialize the current hashset from the immediate
dominator. This patch changes the behavior to a single
hashmap using the block index per instruction to resolve
dominance.
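
One way to resolve dominance from block indices alone is to walk the
immediate-dominator chain. The sketch below is only an illustration under
the assumption that blocks are numbered so that every non-entry block has
a smaller-numbered immediate dominator; it is not the ACO implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* idom[i] is the immediate dominator of block i. Assumes idom[i] < i for
 * every block except the entry block 0. */
static bool
block_dominates(const unsigned *idom, unsigned a, unsigned b)
{
   assert(idom);
   /* Walk up from b; indices only decrease, so stop once b <= a. */
   while (b > a) {
      assert(idom[b] < b);
      b = idom[b];
   }
   return a == b;
}
```

An expression recorded with block index a can then be reused in block b
only if block_dominates(idom, a, b) holds.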

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
5 years agomesa/st: assert that lowering is supported
Erik Faye-Lund [Fri, 18 Oct 2019 12:29:26 +0000 (14:29 +0200)]
mesa/st: assert that lowering is supported

Some of these lowerings aren't supported for drivers that support
tessellation and geometry shaders. Let's add a couple of asserts to make
it obvious if these have been enabled when it's not possible.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agogitlab-ci: Enable llvmpipe in ARM build jobs
Michel Dänzer [Tue, 8 Oct 2019 17:48:41 +0000 (19:48 +0200)]
gitlab-ci: Enable llvmpipe in ARM build jobs

v2:
* Use LLVM 8 from buster-backports
v3:
* Use LLVM 7 again for armhf, llvmpipe is still broken there with LLVM 8

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: Update the meson cross file for LLVM_VERSION as well
Michel Dänzer [Fri, 11 Oct 2019 13:43:34 +0000 (15:43 +0200)]
gitlab-ci: Update the meson cross file for LLVM_VERSION as well

Cross builds don't use the llvm-config path from the native file.

5 years agogitlab-ci: Use native aarch64 runner for ARM build jobs
Michel Dänzer [Tue, 8 Oct 2019 17:46:11 +0000 (19:46 +0200)]
gitlab-ci: Use native aarch64 runner for ARM build jobs

This allows running the regression tests.

One downside is that we can't easily build the Vulkan overlay layer,
because only x86 binaries of the glslang validator are available. If
that's important, we could either use those binaries via qemu, or build
it from source.

v2:
* Add :amd64 suffix to existing debian-9/10 job names (Eric Engestrom)

Acked-by: Eric Engestrom <eric.engestrom@intel.com> # v1
5 years agogitlab-ci: Explicitly list debian-10 in needs: for .deqp-test template
Michel Dänzer [Tue, 22 Oct 2019 09:19:17 +0000 (11:19 +0200)]
gitlab-ci: Explicitly list debian-10 in needs: for .deqp-test template

Apparently needs: in a definition overwrites inherited ones. So
.deqp-test effectively didn't declare needs: for debian-10, which means
any jobs based on .deqp-test could spuriously run after the debian-10
job failed or was cancelled.

5 years agogitlab-ci: Bring ARM docker image install script in line with x86_64
Michel Dänzer [Thu, 10 Oct 2019 08:56:08 +0000 (10:56 +0200)]
gitlab-ci: Bring ARM docker image install script in line with x86_64

Use https:// URLs in the APT configuration.

Drop --no-install-recommends, the image generation template disables
installation of recommended packages in /etc/apt/apt.conf.

Run apt-get autoremove at the end, cleaning up packages which were
installed to satisfy dependencies but are no longer needed.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agogitlab-ci: Sort ARM docker image packages in alphabetical order
Michel Dänzer [Wed, 9 Oct 2019 16:48:17 +0000 (18:48 +0200)]
gitlab-ci: Sort ARM docker image packages in alphabetical order

No functional change.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoradv: fix updating bound fast ds clear values with different aspects
Samuel Pitoiset [Mon, 21 Oct 2019 20:17:43 +0000 (22:17 +0200)]
radv: fix updating bound fast ds clear values with different aspects

On GFX9, the driver is able to do an optimized fast depth/stencil
clear with only one aspect (i.e. clear the stencil part of a
depth/stencil image). When this happens, the driver should only
update the clear values of the given aspect.
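
A small sketch of the intended behaviour, with hypothetical aspect bits
and a hypothetical clear-value struct standing in for the real Vulkan and
radv types:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical aspect bits for this sketch. */
enum {
   ASPECT_DEPTH   = 1u << 0,
   ASPECT_STENCIL = 1u << 1,
};

struct ds_clear_value {
   float depth;
   uint32_t stencil;
};

/* Update only the parts of the bound clear value named in `aspects`. */
static void
update_bound_clear_value(struct ds_clear_value *bound,
                         struct ds_clear_value new_value, unsigned aspects)
{
   assert(bound && (aspects & (ASPECT_DEPTH | ASPECT_STENCIL)));
   if (aspects & ASPECT_DEPTH)
      bound->depth = new_value.depth;
   if (aspects & ASPECT_STENCIL)
      bound->stencil = new_value.stencil;
}
```

A stencil-only clear then leaves the previously recorded depth clear value
untouched.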

Note that it's currently only supported on GFX9 but I have some
local patches that extend this optimized path for other gens.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agointel/compiler: Refactor disassembly of sources in 3src instruction
Sagar Ghuge [Wed, 22 May 2019 18:11:49 +0000 (11:11 -0700)]
intel/compiler: Refactor disassembly of sources in 3src instruction

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/compiler: Don't move immediate in register
Sagar Ghuge [Tue, 21 May 2019 23:15:16 +0000 (16:15 -0700)]
intel/compiler: Don't move immediate in register

On Gen12, we support mixed-mode HF/F operands, and 3-source
instructions also support immediate values, so keep the immediate as it
is if it fits in a 16-bit field.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/compiler: Set bits according to source file
Sagar Ghuge [Fri, 19 Apr 2019 20:37:17 +0000 (13:37 -0700)]
intel/compiler: Set bits according to source file

On Gen >= 12, if src0 or src2 holds an immediate value, we need to set
the src[0/2]_is_imm bits instead of the register file.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agointel/compiler: Add Immediate support for 3 source instruction
Sagar Ghuge [Fri, 26 Jul 2019 01:28:06 +0000 (18:28 -0700)]
intel/compiler: Add Immediate support for 3 source instruction

On Gen >= 10, either src0 or src2 can use a 16-bit immediate value, but
not both.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
5 years agoci: Disable lima until its farm can get fixed.
Eric Anholt [Tue, 22 Oct 2019 02:56:01 +0000 (19:56 -0700)]
ci: Disable lima until its farm can get fixed.

It's been throwing the following error today:

"<Fault -32603: 'Internal Server Error (contact server administrator
for details): could not extend file "base/17952/18226": No space left
on device\nHINT: Check free disk space.\n'>"

Reviewed-by: Daniel Stone <daniels@collabora.com>
5 years agointel: Add missing entry for brw_nir_lower_alpha_to_coverage in Makefile
Sagar Ghuge [Mon, 21 Oct 2019 20:58:04 +0000 (13:58 -0700)]
intel: Add missing entry for brw_nir_lower_alpha_to_coverage in Makefile

Fixes: 7ecfbd4f6d4 ("nir: Add alpha_to_coverage lowering pass")
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
5 years agollvmpipe: handle compute shader launch with 0 threads
Dave Airlie [Wed, 16 Oct 2019 03:33:36 +0000 (13:33 +1000)]
llvmpipe: handle compute shader launch with 0 threads

If you set LP_NUM_THREADS=0, compute shaders would hang. Just execute
the workloads in sequence if we have no threads
in the pool.
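
The fallback can be sketched as follows. The pool struct, callback type
and function names are hypothetical and only model the "no worker threads"
case; they are not llvmpipe's threadpool API:

```c
#include <assert.h>
#include <stdbool.h>

typedef void (*work_fn)(void *data, unsigned index);

/* Hypothetical pool; only the thread count matters for this sketch. */
struct thread_pool {
   unsigned num_threads;
};

/* Example work item: adds its index to a counter. */
static void
add_index(void *data, unsigned index)
{
   *(unsigned *)data += index;
}

/* Runs `count` work items inline when the pool has no threads.
 * Returns true if the work was executed here, false if the caller still
 * has to hand it to the worker threads. */
static bool
run_inline_if_no_threads(const struct thread_pool *pool, work_fn fn,
                         void *data, unsigned count)
{
   assert(pool && fn);
   if (pool->num_threads != 0)
      return false;
   for (unsigned i = 0; i < count; i++)
      fn(data, i);
   return true;
}
```

With an empty pool nothing ever waits on a worker that does not exist,
which is the hang this sketch avoids.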

Fixes: 1b24e3ba75 ("llvmpipe: add compute threadpool + mutex")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
5 years agofreedreno/ir3: Add missing ir3_nir_lower_tex_prefetch.c to Android.mk
Marijn Suijten [Sat, 19 Oct 2019 23:54:29 +0000 (01:54 +0200)]
freedreno/ir3: Add missing ir3_nir_lower_tex_prefetch.c to Android.mk

This file was created in 2a0d45ae6cf09d60c048d7854e3d082bf15e374f, but
its addition to the Android makefiles was omitted. That breaks the build
with missing references to symbols defined in this file.
List the file in ir3_SOURCES to make the build succeed.

Signed-off-by: Marijn Suijten <marijns95@gmail.com>
5 years agoac/llvm: fix ac_to_integer_type() for 32-bit const addr space pointers
Samuel Pitoiset [Mon, 21 Oct 2019 12:11:47 +0000 (14:11 +0200)]
ac/llvm: fix ac_to_integer_type() for 32-bit const addr space pointers

This fixes some crashes with dEQP-VK.descriptor_indexing.* when
read_first_invocation has its source from a descriptor.

Most of these tests still fail because of an LLVM bug (they work
with ACO).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoaco: run opt_algebraic in a loop
Rhys Perry [Thu, 3 Oct 2019 16:15:34 +0000 (17:15 +0100)]
aco: run opt_algebraic in a loop

Totals from affected shaders:
SGPRS: 13920 -> 13656 (-1.90 %)
VGPRS: 12972 -> 12960 (-0.09 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1005680 -> 1000648 (-0.50 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Max Waves: 688 -> 688 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
5 years agoaco: use nir_lower_idiv_precise
Rhys Perry [Wed, 18 Sep 2019 19:45:05 +0000 (20:45 +0100)]
aco: use nir_lower_idiv_precise

v7: rename _nv50/_llvm to _fast/_precise

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
5 years agonir/lower_idiv: add new llvm-based path
Rhys Perry [Tue, 5 Feb 2019 15:56:24 +0000 (15:56 +0000)]
nir/lower_idiv: add new llvm-based path

v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
5 years agointel/compiler: Remove emit_alpha_to_coverage workaround from backend
Sagar Ghuge [Fri, 27 Sep 2019 23:28:11 +0000 (16:28 -0700)]
intel/compiler: Remove emit_alpha_to_coverage workaround from backend

Remove emit_alpha_to_coverage workaround from backend compiler and start
using ported workaround from NIR.

v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)

Fixes piglit test on HSW:
- arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agonir: Add alpha_to_coverage lowering pass
Sagar Ghuge [Fri, 27 Sep 2019 23:23:46 +0000 (16:23 -0700)]
nir: Add alpha_to_coverage lowering pass

Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround()
in intel/compiler.

v2 (Caio Marcelo de Oliveira Filho):
- Track store output and sample mask instruction
- Nest math instruction for more readability
- Bail out early if no gl_SampleMask

v3: (Caio Marcelo de Oliveira Filho):
- Do math instructions after instruction block
- Restructure code
- Move pass under src/intel/compiler

v4: (Caio Marcelo de Oliveira Filho):
- Organize dither mask calculation

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
5 years agoaco: ensure that uniform booleans are computed in WQM if their uses happen in WQM
Daniel Schürmann [Wed, 16 Oct 2019 10:56:05 +0000 (12:56 +0200)]
aco: ensure that uniform booleans are computed in WQM if their uses happen in WQM

This fixes graphical corruption in SC2.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
5 years agomeson: Require meson >= 0.49.1 when using icc or icl
Dylan Baker [Fri, 18 Oct 2019 20:49:42 +0000 (13:49 -0700)]
meson: Require meson >= 0.49.1 when using icc or icl

0.49.0 can compile most of mesa with ICC or ICL, but not SWR without
additional workarounds in our meson.build files. Bumping patch version
is easier and shouldn't be a big burden anyway, especially to cover a
niche compiler. The check originally only covered ICC, but now covers
ICL as well.

Fixes: 3740ffb59c89d8d879b1e0c1aed32c389dd82a35
       ("meson: add switches for SWR with MSVC")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1937
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agodocs: update calendar, add news item and link release notes for 19.1.8
Juan A. Suarez Romero [Mon, 21 Oct 2019 17:13:55 +0000 (19:13 +0200)]
docs: update calendar, add news item and link release notes for 19.1.8

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
5 years agodocs: add release notes for 19.1.8
Juan A. Suarez Romero [Mon, 21 Oct 2019 17:10:28 +0000 (19:10 +0200)]
docs: add release notes for 19.1.8

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit cc88eeb6ffc4e86d76dfdbfc601d519bc35b6c41)

5 years agodocs: add release notes for 19.1.8
Juan A. Suarez Romero [Mon, 21 Oct 2019 11:55:11 +0000 (13:55 +0200)]
docs: add release notes for 19.1.8

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 5c6d266c591208b1c27e06f61b814210fc6e095f)

5 years agoaco/gfx10: Update constant addresses in fix_branches_gfx10.
Timur Kristóf [Wed, 16 Oct 2019 13:05:56 +0000 (15:05 +0200)]
aco/gfx10: Update constant addresses in fix_branches_gfx10.

Due to a bug in GFX10 hardware, s_nop instructions must be added
if a branch is at 0x3f. We already do this, but forgot to also update
the constant addresses that come after this instruction.
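
The bookkeeping side of such a fix can be sketched like this. The
function is hypothetical and assumes addresses are stored as dword offsets
and that new dwords are inserted in front of the dword currently at `pos`;
it does not model how ACO decides where a NOP is needed:

```c
#include <assert.h>
#include <stddef.h>

/* After `inserted` dwords are placed in front of dword position `pos`,
 * bump every recorded offset that sat at or after `pos`. */
static void
shift_offsets(unsigned *offsets, size_t count, unsigned pos,
              unsigned inserted)
{
   assert(offsets || count == 0);
   for (size_t i = 0; i < count; i++) {
      if (offsets[i] >= pos)
         offsets[i] += inserted;
   }
}
```

Both branch positions and constant addresses need this adjustment,
otherwise anything after the inserted NOP points one dword too early.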

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
5 years agoaco/gfx10: Fix PS exports for SPI_SHADER_32_AR.
Timur Kristóf [Tue, 15 Oct 2019 07:55:17 +0000 (09:55 +0200)]
aco/gfx10: Fix PS exports for SPI_SHADER_32_AR.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
5 years agoaco/gfx10: Wait for pending SMEM stores before loads
Timur Kristóf [Mon, 14 Oct 2019 13:18:31 +0000 (15:18 +0200)]
aco/gfx10: Wait for pending SMEM stores before loads

Currently if you have an SMEM store followed by an SMEM load that
loads the same location as was written, it won't work because the
store isn't finished before the load is executed. This is NOT
mitigated by an s_nop instruction on GFX10.

Since we currently don't have proper alias analysis, this commit adds
a workaround which will insert an s_waitcnt lgkmcnt(0) before each
SSBO load if they follow a store. We should further refine this in
the future when we can make sure to only add the wait when we load the
same thing as has been stored.
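
A conservative pass of this kind can be sketched as below. The opcode enum
is a stand-in for real instructions, and the sketch assumes that a single
wait drains every pending store:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in opcodes; OP_WAIT models "s_waitcnt lgkmcnt(0)". */
enum op { OP_ALU, OP_SMEM_STORE, OP_SMEM_LOAD, OP_WAIT };

/* Copies in[0..n) to out, inserting OP_WAIT before a load whenever a
 * store may still be pending. `out` needs room for 2 * n entries.
 * Returns the number of entries written. */
static size_t
insert_waits(const enum op *in, size_t n, enum op *out)
{
   assert(n == 0 || (in && out));
   bool pending_store = false;
   size_t m = 0;

   for (size_t i = 0; i < n; i++) {
      if (in[i] == OP_SMEM_STORE) {
         pending_store = true;
      } else if (in[i] == OP_SMEM_LOAD && pending_store) {
         out[m++] = OP_WAIT;
         pending_store = false;
      } else if (in[i] == OP_WAIT) {
         pending_store = false;   /* an existing wait already covers it */
      }
      out[m++] = in[i];
   }
   return m;
}
```

Without alias analysis the pass cannot tell whether the load reads the
stored location, so it waits for every load that follows a store.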

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
5 years agopanfrost: Fix the DISCARD_WHOLE_RES case in transfer_map()
Boris Brezillon [Thu, 10 Oct 2019 13:12:30 +0000 (15:12 +0200)]
panfrost: Fix the DISCARD_WHOLE_RES case in transfer_map()

The current implementation does not synchronize on BO readiness when
DISCARD_WHOLE_RES flag is set, which can lead to misbehaviours when the
resource being updated is being used by one of the pending or already
flushed batches.

Adding unconditional BO synchronization would do the trick, but we can
sometimes optimize this path by re-allocating a new BO instead of
waiting for the existing one to be ready.

Reported-by: Daniel Stone <daniels@collabora.com>
Reported-by: Heinrich Fink <heinrich.fink@daqri.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agost/mesa: only require ESSL 3.1 for geometry shaders
Iago Toral Quiroga [Fri, 18 Oct 2019 07:37:54 +0000 (09:37 +0200)]
st/mesa: only require ESSL 3.1 for geometry shaders

According to the OES_geometry_shader spec, section Dependencies:

   "OpenGL ES 3.1 and OpenGL ES Shading Language 3.10
    are required."

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agoegl/android: Remove our own reference to buffers.
Lepton Wu [Tue, 17 Sep 2019 20:49:17 +0000 (13:49 -0700)]
egl/android: Remove our own reference to buffers.

We currently don't maintain it correctly, and the buffer gets leaked if
the surface is destroyed before swapping buffers.

From Android frameworks/native/libs/nativewindow/include/system/window.h:

  The window holds a reference to the buffer between dequeueBuffer and
  either queueBuffer or cancelBuffer, so clients only need their own
  reference if they might use the buffer after queueing or canceling it.

v2: Remove our own reference.

Fixes: 0212db35040 ("egl/android: Cancel any outstanding ANativeBuffer in surface destructor")
Reviewed-by: Chia-I Wu <olvaffe@gmail.com> (v1)
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
5 years agoradv: advertise VK_KHR_spirv_1_4
Samuel Pitoiset [Wed, 16 Oct 2019 09:41:53 +0000 (11:41 +0200)]
radv: advertise VK_KHR_spirv_1_4

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: do not dump descriptors twice in hang reports
Samuel Pitoiset [Tue, 15 Oct 2019 13:32:13 +0000 (15:32 +0200)]
radv: do not dump descriptors twice in hang reports

If a pipeline has both graphics and compute, the descriptors are the same.
While we are at it, use queue->device for simplicity.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: dump trace files earlier if a GPU hang is detected
Samuel Pitoiset [Tue, 15 Oct 2019 12:52:02 +0000 (14:52 +0200)]
radv: dump trace files earlier if a GPU hang is detected

To make sure a trace file is generated in case the driver crashes
during the hang report generation (which happens sometimes).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: print which ring is dumped in hang reports
Samuel Pitoiset [Tue, 15 Oct 2019 13:10:27 +0000 (15:10 +0200)]
radv: print which ring is dumped in hang reports

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: do not print useless descriptors info in hang reports
Samuel Pitoiset [Tue, 15 Oct 2019 12:49:38 +0000 (14:49 +0200)]
radv: do not print useless descriptors info in hang reports

This information has never been useful. All descriptors are
already dumped with colors etc, and it's more useful.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agoradv: enable VK_KHR_shader_float_controls on GFX6-GFX7
Samuel Pitoiset [Fri, 18 Oct 2019 16:04:52 +0000 (18:04 +0200)]
radv: enable VK_KHR_shader_float_controls on GFX6-GFX7

Disable 16-bit features because fp16 isn't exposed on these chips.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
5 years agopanfrost/ci: Update expectations list
Alyssa Rosenzweig [Sat, 19 Oct 2019 19:07:27 +0000 (15:07 -0400)]
panfrost/ci: Update expectations list

A bunch of blend tests fixed on T760. A single blend test regressed on
both T760/T860 but I am unable to reproduce locally so am just
documenting the regression and moving on.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Implement SIMD-aware dead code elimination
Alyssa Rosenzweig [Wed, 16 Oct 2019 17:19:49 +0000 (13:19 -0400)]
pan/midgard: Implement SIMD-aware dead code elimination

We would like to eliminate not just entire dead instructions, but also
dead components, which increases scheduler flexibility (since some
vector instructions can become scalar after eliminating dead
components). This also will allow better RA in the future.
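
The core idea can be sketched as follows, assuming a hypothetical
instruction with a 4-bit write mask and no side effects. The names are
made up for illustration and are not the Midgard IR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vector instruction writing up to four components. */
struct vec_ins {
   uint8_t write_mask;   /* bit c set: component c is written */
   bool removed;
};

/* Drop components nobody reads; drop the instruction if none remain.
 * Assumes the instruction has no side effects. */
static void
dce_components(struct vec_ins *ins, uint8_t live_mask)
{
   assert(ins && (ins->write_mask & ~0xfu) == 0);
   ins->write_mask &= live_mask;
   if (ins->write_mask == 0)
      ins->removed = true;
}
```

An instruction whose mask shrinks to a single bit is effectively scalar,
which is where the extra scheduler flexibility comes from.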

Results are meh.

total instructions in shared programs: 3453 -> 3451 (-0.06%)
instructions in affected programs: 60 -> 58 (-3.33%)
helped: 2
HURT: 0

total bundles in shared programs: 1826 -> 1824 (-0.11%)
bundles in affected programs: 33 -> 31 (-6.06%)
helped: 2
HURT: 0

total quadwords in shared programs: 3144 -> 3144 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0

total registers in shared programs: 321 -> 321 (0.00%)
registers in affected programs: 45 -> 45 (0.00%)
helped: 11
HURT: 11
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 16.67% max: 50.00% x̄: 39.70% x̃: 50.00%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for registers value: -0.45 0.45
95% mean confidence interval for registers %-change: -1.87% 62.18%
Inconclusive result (value mean confidence interval includes 0).

total threads in shared programs: 445 -> 447 (0.45%)
threads in affected programs: 2 -> 4 (100.00%)
helped: 1
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Create dependency graph bytewise
Alyssa Rosenzweig [Thu, 17 Oct 2019 20:37:11 +0000 (16:37 -0400)]
pan/midgard: Create dependency graph bytewise

This allows for vec16 dependencies in the scheduler, not that we have
any yet (thankfully).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Handle nontrivial masks in texture RA
Alyssa Rosenzweig [Wed, 16 Oct 2019 17:01:41 +0000 (13:01 -0400)]
pan/midgard: Handle nontrivial masks in texture RA

The texture instruction has a mask we need to take into account.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Implement per-byte liveness tracking
Alyssa Rosenzweig [Wed, 16 Oct 2019 16:30:13 +0000 (12:30 -0400)]
pan/midgard: Implement per-byte liveness tracking

Now that we have a notion of byte masks, liveness tracking can be updated
to reflect this extra granularity without loss of correctness.
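
For a single 16-byte register, the backward liveness step with byte masks
can be sketched as below. The function is hypothetical; both masks refer
to the same register, one bit per byte:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes live before an instruction: what was live after it, minus the
 * bytes it overwrites, plus the bytes it reads. */
static uint16_t
liveness_before(uint16_t live_after, uint16_t written_bytes,
                uint16_t read_bytes)
{
   return (uint16_t)((live_after & ~written_bytes) | read_bytes);
}
```

With per-component masks, a partial write of a narrower type would have to
be treated as killing either the whole component or nothing.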

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Simplify mir_bytemask_of_read_components
Alyssa Rosenzweig [Fri, 18 Oct 2019 02:18:36 +0000 (22:18 -0400)]
pan/midgard: Simplify mir_bytemask_of_read_components

There are easy ways to iterate sources!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Report byte masks for read components
Alyssa Rosenzweig [Wed, 16 Oct 2019 16:25:32 +0000 (12:25 -0400)]
pan/midgard: Report byte masks for read components

Read component masks don't have a particular type associated, since the
type of the ALU operation may not match the type of the operands in
question. So let's generate byte masks instead, and update the rest of
the compiler to use byte masks when analyzing reads.

Preparation for mixed types.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Add helpers for manipulating byte masks
Alyssa Rosenzweig [Wed, 16 Oct 2019 16:24:28 +0000 (12:24 -0400)]
pan/midgard: Add helpers for manipulating byte masks

There are essentially two formats of masks in play beginning with this
commit: masks per-channel and masks per-byte. The former make sense
within a given fixed-size instruction; the latter are
typesize-independent. It turns out you need the latter to meaningfully
manipulate instructions containing multiple sizes (which is quite
possible with ALU operations).

Similarly, we have mir_srcsize. We calculate the size of the source by
analyzing the size of the instruction itself and stepping down if there
is a half-modifier.

Finally, we have mir_round_bytemask_down, for when we want to take a
byte mask and "round it down" to a given component size, so that we can
use it as a component mask.
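
The two mask formats and the rounding step can be sketched like this,
assuming a 16-byte register and channel sizes of 1, 2, 4 or 8 bytes. The
function names are illustrative, not the helpers added by this commit:

```c
#include <assert.h>
#include <stdint.h>

/* Expand a per-channel mask into a 16-bit per-byte mask. */
static uint16_t
bytemask_of_channels(unsigned channel_mask, unsigned bytes)
{
   assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);
   unsigned channels = 16 / bytes;
   unsigned per_channel = (1u << bytes) - 1;
   unsigned out = 0;

   for (unsigned c = 0; c < channels; c++) {
      if (channel_mask & (1u << c))
         out |= per_channel << (c * bytes);
   }
   return (uint16_t)out;
}

/* Keep only the channels whose bytes are all set, so the result can be
 * read back as a component mask for that channel size. */
static uint16_t
round_bytemask_down(uint16_t bytemask, unsigned bytes)
{
   assert(bytes == 1 || bytes == 2 || bytes == 4 || bytes == 8);
   unsigned channels = 16 / bytes;
   unsigned per_channel = (1u << bytes) - 1;
   unsigned out = 0;

   for (unsigned c = 0; c < channels; c++) {
      unsigned sub = (bytemask >> (c * bytes)) & per_channel;
      if (sub == per_channel)
         out |= per_channel << (c * bytes);
   }
   return (uint16_t)out;
}
```

Rounding 0x0f3f down to 32-bit channels drops the half-covered second
channel, while rounding it to 16-bit channels keeps every byte.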

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Implement OP_IS_STORE with table
Alyssa Rosenzweig [Sat, 19 Oct 2019 18:04:39 +0000 (14:04 -0400)]
pan/midgard: Implement OP_IS_STORE with table

...rather than open-coding.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Tableize load/store ops
Alyssa Rosenzweig [Fri, 18 Oct 2019 12:22:40 +0000 (08:22 -0400)]
pan/midgard: Tableize load/store ops

This will allow us to encode properties about the load/store ops like we
do for ALU ops. We now include properties about whether we have a store,
and if there are special cases on the load/store op. We also tag each
instruction by its natural size... this is probably not totally right,
but it's a start.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Factor out mir_get_alu_src
Alyssa Rosenzweig [Thu, 17 Oct 2019 20:37:48 +0000 (16:37 -0400)]
pan/midgard: Factor out mir_get_alu_src

This helper is used in a bunch of places ... might as well make that
common.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard/disasm: Fix printing 8-bit/16-bit masks
Alyssa Rosenzweig [Wed, 16 Oct 2019 21:34:28 +0000 (17:34 -0400)]
pan/midgard/disasm: Fix printing 8-bit/16-bit masks

The trick is realizing that even with a destination override, the masks
are encoded in the same mode as the instruction itself, rather than
stepping down. The override means that the smaller type is used, but the
mask is parsed as if it were the higher type. Overriding down is printed
by blindly doing this. Overriding up can be thought of as printing in the
upper size, but shifting the alphabet to use the upper half, i.e.
shifting xyzw to become abcd.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Identify 64-bit atomic opcodes
Alyssa Rosenzweig [Fri, 18 Oct 2019 12:18:52 +0000 (08:18 -0400)]
pan/midgard: Identify 64-bit atomic opcodes

They are symmetric to their 32-bit counterparts, just shifted.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
5 years agopan/midgard: Debug mir_insert_instruction_after_scheduled
Alyssa Rosenzweig [Wed, 16 Oct 2019 16:18:51 +0000 (12:18 -0400)]
pan/midgard: Debug mir_insert_instruction_after_scheduled

Add some comments explaining what's going on in a more natural flow in
order to solve the actual bug.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 2d914ebe818 ("pan/midgard: Fix memory corruption in register spilling")
5 years agoetnaviv: keep track of buffer valid ranges for PIPE_BUFFER
Christian Gmeiner [Fri, 6 Sep 2019 13:13:51 +0000 (15:13 +0200)]
etnaviv: keep track of buffer valid ranges for PIPE_BUFFER

This allows a write to proceed to an uninitialized part of a buffer
even when the GPU is using the previously-initialized portions.

Such a situation can be triggered with the following API usage example:

  glBufferSubData(..., offset, size, data1);
  glDrawArrays(...);
  // append new vertex data
  glBufferSubData(..., offset+size, size, data2);
  glDrawArrays(...);

Same is done for freedreno, nouveau and radeon.
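
The tracking can be sketched with a single half-open range per buffer.
The struct and function names below are hypothetical and do not reproduce
the etnaviv code or Mesa's range helper:

```c
#include <assert.h>
#include <stdbool.h>

/* Bytes [start, end) that have ever been written; empty if start >= end. */
struct valid_range {
   unsigned start;
   unsigned end;
};

/* A write has to synchronize with the GPU only if it overlaps bytes that
 * were initialized before. */
static bool
write_needs_sync(const struct valid_range *r, unsigned start, unsigned end)
{
   assert(r && start < end);
   return r->start < r->end && start < r->end && end > r->start;
}

/* Grow the valid range to cover a completed write. */
static void
range_add(struct valid_range *r, unsigned start, unsigned end)
{
   assert(r && start < end);
   if (r->start >= r->end) {
      r->start = start;
      r->end = end;
      return;
   }
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}
```

In the API example above, the second glBufferSubData call targets
[offset+size, offset+2*size), which does not overlap the valid range, so
it can proceed while the first draw is still in flight.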

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
5 years agoetnaviv: store updated usage in pipe_transfer object
Christian Gmeiner [Fri, 6 Sep 2019 19:21:26 +0000 (21:21 +0200)]
etnaviv: store updated usage in pipe_transfer object

Store the changed usage in the newly created transfer object.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
5 years agoetnaviv: fix code style
Christian Gmeiner [Sun, 20 Oct 2019 06:02:11 +0000 (08:02 +0200)]
etnaviv: fix code style

Fixes: 1194afdfe35 ("etnaviv: rework the stream flush to always go through the context flush")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
5 years agoanv: fix memory leak on device destroy
Lionel Landwerlin [Fri, 18 Oct 2019 12:28:30 +0000 (15:28 +0300)]
anv: fix memory leak on device destroy

v2: handle vma destruction if vkCreateDevice fails (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1959
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
5 years agoetnaviv: fix compile warnings
Christian Gmeiner [Sun, 20 Oct 2019 05:38:03 +0000 (07:38 +0200)]
etnaviv: fix compile warnings

Fixes: e5cc66dfad0 ("etnaviv: Rework locking")
Fixes: 1456aa61cc5 ("etnaviv: Rework resource status tracking")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
5 years agomesa: Redefine the RG formats as array formats.
Eric Anholt [Thu, 29 Aug 2019 23:05:20 +0000 (16:05 -0700)]
mesa: Redefine the RG formats as array formats.

This is the layout used in the GL API, and maps directly to PIPE
formats with no endianness trickery.  As with the LA change, this
fixes big-endian fetching from texbos.  Also cleans up some endian
shenanigans in shader images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agogallium: Drop the unused PIPE_FORMAT_A*L* formats.
Eric Anholt [Thu, 29 Aug 2019 22:56:19 +0000 (15:56 -0700)]
gallium: Drop the unused PIPE_FORMAT_A*L* formats.

Now that Mesa is also using an array format for LA, nothing was using
these.  (And, clearly, no HW driver had exposed them).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agomesa: Replace MESA_FORMAT_L8A8/A8L8 UNORM/SNORM/SRGB with an array format.
Eric Anholt [Thu, 5 Sep 2019 21:42:15 +0000 (14:42 -0700)]
mesa: Replace MESA_FORMAT_L8A8/A8L8 UNORM/SNORM/SRGB with an array format.

The array format is what the GL API wants (fixing texbos on
big-endian), and matches directly to gallium's corresponding array
format.  The only driver exposing A8L8 was radeon/r200 in big-endian,
where the HW's underlying format was trying to read as array and we
needed to flip things around to make our packed format come out right
(note that while the radeon format tables had both AL and LA,
ChooseTextureFormat would only pick one of them based on endianness).

v2: Don't make r200/radeon use endian swaps.
v3: Rebase on dropping the r200 _be/_le format table removal patch
v4: reword commit message to explain why we can drop both formats
    from radeon.

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
5 years agomesa: Replace the LA16_UNORM packed formats with one array format.
Eric Anholt [Thu, 29 Aug 2019 22:45:18 +0000 (15:45 -0700)]
mesa: Replace the LA16_UNORM packed formats with one array format.

The array format is what the GL API wants (and we made a mistake in
the format returned for texbos on big-endian!), and it's exactly what
the gallium-side PIPE_FORMAT_L16A16 is.  The only downside is that
dri_util tries to fall back to sampling RG16 using LA16, which doesn't
have a match for big-endian any more.  No HW drivers supported A16L16
anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradeon: Drop the unused first arg of OUT_BATCH_RELOC.
Eric Anholt [Thu, 5 Sep 2019 23:06:34 +0000 (16:06 -0700)]
radeon: Drop the unused first arg of OUT_BATCH_RELOC.

This was a trap when trying to figure out how to fit data bits into
the reloc.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
5 years agoradeon: Fill in the TXOFFSET field containing the tile bits in our relocs.
Eric Anholt [Thu, 5 Sep 2019 23:04:01 +0000 (16:04 -0700)]
radeon: Fill in the TXOFFSET field containing the tile bits in our relocs.

The first arg to OUT_BATCH_RELOC is ignored; we actually wanted these
in the third arg.  They're always 0 so far, so it didn't matter.

v2: Reword commit message that I don't end up using the tile bits, but
    keep the commit as a cleanup anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
5 years agor100/r200: factor out txformat/txfilter setup from the TFP path.
Eric Anholt [Thu, 5 Sep 2019 22:48:58 +0000 (15:48 -0700)]
r100/r200: factor out txformat/txfilter setup from the TFP path.

No matter what, we deref the texFormat from the table, except for a
mistake in cpp=4 where we pulled a 0 out of the table either way.

v2: Rebase on dropping r200 table deduplication patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
5 years agolima: fix PP stack size
Vasily Khoruzhick [Sat, 19 Oct 2019 01:06:56 +0000 (18:06 -0700)]
lima: fix PP stack size

PP stack size should be set to the maximum PP stack size, not to the
stack size of the last shader.

Fixes: 27e7603c344a ("lima: fix ppir spill stack allocation")
Tested-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
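
A minimal sketch of the difference described in the commit above. The function names and the list of per-shader stack sizes are hypothetical and only model the idea, not the lima driver code:

```python
def pp_stack_size_last(shader_stack_sizes):
    """Behaviour being fixed: take whatever the last shader needed."""
    return shader_stack_sizes[-1] if shader_stack_sizes else 0

def pp_stack_size_max(shader_stack_sizes):
    """Fixed behaviour: take the largest stack size any shader needed."""
    return max(shader_stack_sizes, default=0)

sizes = [0, 64, 16]  # illustrative per-shader spill stack sizes
```

With these sizes the first helper would under-allocate for the shader that needs 64.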
5 years agofreedreno/a5xx: enable a510
Marijn Suijten [Sat, 19 Oct 2019 14:43:49 +0000 (16:43 +0200)]
freedreno/a5xx: enable a510

Kernel support for this GPU is added by the following series:
https://patchwork.kernel.org/project/linux-arm-msm/list/?series=187609
In particular https://patchwork.kernel.org/patch/11189953/

Tested on Sony Xperia X and X Compact.

Signed-off-by: Marijn Suijten <marijns95@gmail.com>
Tested-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
5 years agoAppveyor/Meson: Add build test of osmesa gallium
Prodea Alexandru-Liviu [Sat, 19 Oct 2019 14:44:44 +0000 (14:44 +0000)]
Appveyor/Meson: Add build test of osmesa gallium
Signed-off-by: Prodea Alexandru-Liviu <liviuprodea@yahoo.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
5 years agoanv: fix vkUpdateDescriptorSets with inline uniform blocks
Lionel Landwerlin [Fri, 18 Oct 2019 11:50:02 +0000 (14:50 +0300)]
anv: fix vkUpdateDescriptorSets with inline uniform blocks

With an inline uniform block descriptor, descriptorCount is the number
of bytes to copy into the descriptor. Don't try to use that size as an
index into the descriptor table.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 43f40dc7cb ("anv: Implement VK_EXT_inline_uniform_block")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1195
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
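
A small model of the distinction in the commit above, assuming a hypothetical dictionary-based binding (this is not the anv data structure). For inline uniform blocks the Vulkan write fields are a byte offset and a byte count; for other descriptor types they are an array index and an element count:

```python
INLINE_UNIFORM_BLOCK = "inline_uniform_block"  # hypothetical type tag

def apply_write(binding, dtype, dst_array_element, descriptor_count, payload):
    """Apply one descriptor write to a toy binding."""
    if dtype == INLINE_UNIFORM_BLOCK:
        # dst_array_element is a byte offset, descriptor_count a byte count.
        if len(payload) < descriptor_count:
            raise ValueError("payload shorter than descriptor_count bytes")
        start = dst_array_element
        binding["data"][start:start + descriptor_count] = payload[:descriptor_count]
    else:
        # Ordinary descriptors: one table slot per array element.
        for i in range(descriptor_count):
            binding["slots"][dst_array_element + i] = payload[i]

binding = {"data": bytearray(16), "slots": [None] * 4}
apply_write(binding, INLINE_UNIFORM_BLOCK, 4, 8, bytes(range(1, 9)))
```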
5 years agofreedreno/ir3: handle imad24_ir3 case in UBO lowering
Rob Clark [Tue, 8 Oct 2019 20:37:10 +0000 (13:37 -0700)]
freedreno/ir3: handle imad24_ir3 case in UBO lowering

Similar to iadd, we can fold an added constant value from an imad24_ir3
into the load_uniform's constant offset.  This avoids some cases where
the addition of imad24_ir3 could otherwise be a regression in instr
count.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
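
A rough sketch of the folding described in the commit above, using tuples as a stand-in for NIR expressions. The tuple encoding and the function name are assumptions for illustration; the idea is that a constant addend moves into the load's constant offset, leaving only the multiply behind:

```python
def fold_const_offset(src, base):
    """Return (new_src, new_base) with a constant addend moved into base.

    src is either an opaque value or a tuple such as
    ("iadd", x, const) or ("imad24_ir3", a, b, const).
    """
    if isinstance(src, tuple) and src[0] == "iadd" and isinstance(src[2], int):
        return src[1], base + src[2]
    if isinstance(src, tuple) and src[0] == "imad24_ir3" and isinstance(src[3], int):
        # a * b + const  ->  (a * b) with const folded into the base offset
        return ("imul24", src[1], src[2]), base + src[3]
    return src, base

folded = fold_const_offset(("imad24_ir3", "idx", 16, 32), 4)
```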
5 years agofreedreno/ir3: add imul24 opcode
Rob Clark [Tue, 8 Oct 2019 20:36:14 +0000 (13:36 -0700)]
freedreno/ir3: add imul24 opcode

This maps to mul.s24

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agofreedreno/ir3: optimize immed 2nd src to mad
Rob Clark [Mon, 30 Sep 2019 18:44:16 +0000 (11:44 -0700)]
freedreno/ir3: optimize immed 2nd src to mad

We can't encode immed sources for cat3 (mad) instructions, but we can
use const in first or third src.  We handled this case already, but we
weren't considering that we could lower immed to const.

For manhattan:

  total instructions in shared programs: 35202 -> 34718 (-1.37%)
  instructions in affected programs: 14931 -> 14447 (-3.24%)
  helped: 90
  HURT: 0
  total full in shared programs: 2451 -> 2359 (-3.75%)
  full in affected programs: 653 -> 561 (-14.09%)
  helped: 69
  HURT: 2

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: add rule to generate imad24
Rob Clark [Fri, 27 Sep 2019 18:36:43 +0000 (11:36 -0700)]
freedreno/ir3: add rule to generate imad24

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agonir: add nir_lower_amul pass
Rob Clark [Fri, 27 Sep 2019 17:15:02 +0000 (10:15 -0700)]
nir: add nir_lower_amul pass

Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.

Signed-off-by: Rob Clark <robdclark@chromium.org>
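
The decision described in the commit above can be sketched as follows. The function name, the `max_offset` parameter, and the signed 24-bit threshold are assumptions chosen for illustration, not the pass's actual interface:

```python
def lower_amul(max_offset):
    """Pick the multiply opcode for an address calculation.

    max_offset is the largest offset the dereferenced object can
    require, or None when the size is not known at compile time.
    """
    # Assumed conservative bound: the offset must fit in a signed 24-bit value.
    if max_offset is not None and max_offset < (1 << 23):
        return "imul24"
    return "imul"
```

Falling back to the full imul whenever the size is unknown keeps the lowering safe at the cost of missing some cheaper multiplies.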
5 years agonir: add address calc related opt rules
Rob Clark [Thu, 26 Sep 2019 17:34:51 +0000 (10:34 -0700)]
nir: add address calc related opt rules

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agonir: add amul instruction
Rob Clark [Thu, 26 Sep 2019 17:32:00 +0000 (10:32 -0700)]
nir: add amul instruction

Used for address/offset calculation (i.e. array derefs), where we can
potentially use less than 32b for the multiply of array idx by element
size.  For backends that support `imul24`, this gives a lowering pass
an easy way to find multiplies that potentially can be converted to
`imul24`.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agonir: Add a new ALU nir_op_imul24
Rob Clark [Wed, 25 Sep 2019 17:10:39 +0000 (10:10 -0700)]
nir: Add a new ALU nir_op_imul24

Some hardware can do 24b multiply in a single instruction, but not 32b.
However in most cases 24b is sufficient for address/offset calculation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
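
A small model of why 24 bits is usually sufficient for offsets, assuming imul24 sign-extends the low 24 bits of each source before multiplying and wraps the result to 32 bits (an assumption about the semantics, stated here for illustration):

```python
def sext24(x):
    """Sign-extend the low 24 bits of x."""
    x &= 0xFFFFFF
    return x - (1 << 24) if x & 0x800000 else x

def to_i32(x):
    """Wrap x to a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def imul24(a, b):
    """24-bit multiply under the assumed semantics."""
    return to_i32(sext24(a) * sext24(b))

def imul(a, b):
    """Full 32-bit multiply for comparison."""
    return to_i32(a * b)
```

The two agree while both operands fit in 24 bits and diverge once an operand uses higher bits.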
5 years agofreedreno/ir3: Handle newly added opcode nir_op_imad24_ir3
Eduardo Lima Mitev [Fri, 28 Jun 2019 07:43:03 +0000 (09:43 +0200)]
freedreno/ir3: Handle newly added opcode nir_op_imad24_ir3

Simply emit an ir3_MAD_S24 instruction in the backend.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agonir: Add a new ALU nir_op_imad24_ir3
Eduardo Lima Mitev [Fri, 28 Jun 2019 07:39:38 +0000 (09:39 +0200)]
nir: Add a new ALU nir_op_imad24_ir3

ir3 compiler has a signed integer multiply-add instruction (MAD_S24)
that is used for different offset calculations in the backend.
Since we intend to move some of these calculations to NIR, we need
a new ALU op that can directly represent it.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agofreedreno/ir3: rename mul.s/mul.u
Rob Clark [Wed, 25 Sep 2019 17:21:24 +0000 (10:21 -0700)]
freedreno/ir3: rename mul.s/mul.u

to mul.s24/mul.u24, to better reflect that these are 24b multiply.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
5 years agonir/search: fix the PoT helpers
Rob Clark [Wed, 25 Sep 2019 18:59:49 +0000 (11:59 -0700)]
nir/search: fix the PoT helpers

Otherwise, if the base type is (for example) uint32, we would
incorrectly think that PoT optimizations could not apply.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
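
One plausible reading of the bug in the commit above, sketched with hypothetical type constants (the values and helper names are illustrative, not NIR's). A sized type such as uint32 carries its bit size alongside the base type, so comparing the raw type against the base type fails unless the size bits are masked off first:

```python
TYPE_INT, TYPE_UINT = 2, 4          # illustrative base-type tags
SIZE_MASK = 8 | 16 | 32 | 64        # illustrative bit-size flags

def _is_pot(value):
    return value > 0 and (value & (value - 1)) == 0

def is_pos_power_of_two_buggy(value, alu_type):
    # Compares the sized type directly, so TYPE_UINT | 32 never matches.
    if alu_type in (TYPE_INT, TYPE_UINT):
        return _is_pot(value)
    return False

def is_pos_power_of_two(value, alu_type):
    # Strip the bit size first, then dispatch on the base type.
    base = alu_type & ~SIZE_MASK
    if base in (TYPE_INT, TYPE_UINT):
        return _is_pot(value)
    return False
```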
5 years agofreedreno/ir3: enable pre-fs texture fetch for a6xx
Rob Clark [Fri, 18 Oct 2019 18:30:48 +0000 (11:30 -0700)]
freedreno/ir3: enable pre-fs texture fetch for a6xx

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agoturnip: add support for pre-fs texture fetch
Rob Clark [Fri, 18 Oct 2019 18:52:35 +0000 (11:52 -0700)]
turnip: add support for pre-fs texture fetch

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/a6xx: add support for pre-fs texture fetch
Rob Clark [Fri, 11 Oct 2019 23:43:03 +0000 (16:43 -0700)]
freedreno/a6xx: add support for pre-fs texture fetch

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: Add support for texture sampling pre-dispatch
Hyunjun Ko [Mon, 5 Aug 2019 06:38:57 +0000 (08:38 +0200)]
freedreno/ir3: Add support for texture sampling pre-dispatch

Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch
Eduardo Lima Mitev [Mon, 5 Aug 2019 06:09:23 +0000 (08:09 +0200)]
freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch

The pass should run once at the end of shader compilation, for a4xx
onwards. It iterates over texture sampling instructions and marks those
eligible for pre-dispatch by changing the tex op from 'tex' to
'tex_prefetch'. An instruction is eligible if:

* The coordinate is a vector where all its components come from a
  shader input.
* The order of the components matches exactly that of the input (no
  swizzles).
* The instruction is in the 'main' function, and in the outermost
  block.

The first two restrictions were arrived at empirically, so more
testing could tighten or loosen them.

The 3rd restriction is there to allow moving the instructions
eligible for pre-dispatch to the beginning of the shader, so
that we don't block the registers holding the result for too
long.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
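
The three eligibility rules in the commit above can be sketched as a predicate over a toy instruction model. The dataclasses and field names are hypothetical, and "no swizzles" is interpreted here as consecutive components of a single input, which is an assumption:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CoordComponent:
    from_input: bool                      # component comes from a shader input
    input_base: Optional[int] = None      # which input it was loaded from
    input_component: Optional[int] = None # which component of that input

@dataclass
class TexInstr:
    op: str
    coord: List[CoordComponent]
    in_main: bool
    block_depth: int                      # 0 means the outermost block

def eligible(tex):
    if tex.op != "tex" or not tex.in_main or tex.block_depth != 0:
        return False
    if not tex.coord or not all(c.from_input for c in tex.coord):
        return False
    first = tex.coord[0]
    # Components must follow the input's own order, with no swizzle.
    return all(c.input_base == first.input_base and
               c.input_component == first.input_component + i
               for i, c in enumerate(tex.coord))

def mark_prefetch(instrs):
    """Rewrite eligible 'tex' ops to 'tex_prefetch'; return how many changed."""
    changed = 0
    for tex in instrs:
        if eligible(tex):
            tex.op = "tex_prefetch"
            changed += 1
    return changed
```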
5 years agofreedreno/ir3: force i/j pixel to r0.x
Rob Clark [Fri, 11 Oct 2019 02:36:30 +0000 (19:36 -0700)]
freedreno/ir3: force i/j pixel to r0.x

It seems that pre-fs texture fetch only works if ij_pix ends up in r0.x.
I've tried unknown zero bits, to no avail, and blob also seems to force
r0.x when this feature is used.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: add pre-dispatch tex fetch to disasm
Rob Clark [Thu, 10 Oct 2019 19:09:15 +0000 (12:09 -0700)]
freedreno/ir3: add pre-dispatch tex fetch to disasm

Useful to see in disassembly listing texture fetches that were moved to
pre-dispatch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: add dummy bary.f(ei) for pre-fs-fetch
Rob Clark [Wed, 9 Oct 2019 22:51:01 +0000 (15:51 -0700)]
freedreno/ir3: add dummy bary.f(ei) for pre-fs-fetch

If the only use of varyings is a pre-shader texture-fetch, we still need
to issue a bary.f with the end-input flag, otherwise we'll block further
VS invocations, as the hw will think varying storage is still busy.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: fixup register footprint to account for prefetch
Rob Clark [Fri, 11 Oct 2019 23:15:44 +0000 (16:15 -0700)]
freedreno/ir3: fixup register footprint to account for prefetch

It is possible that the result of a pre-fs texture fetch is an output
(or partially an output) of the FS.  Since the meta:tex_prefetch
instructions are dropped before the assembler, we need to account for
this when we fixup the register footprint.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: add meta instruction for pre-fs texture fetch
Rob Clark [Fri, 11 Oct 2019 22:57:22 +0000 (15:57 -0700)]
freedreno/ir3: add meta instruction for pre-fs texture fetch

Add a placeholder instruction to track texture fetches made prior to FS
shader dispatch.  These, like meta:input instructions, are scheduled
before any real instructions, so that RA realizes their result values
are live before the first real instruction.  They also give legalize a
way to track uses of fetched samples that require (sy) sync flags.

There is some related special handling for varying texcoord inputs used
for pre-fs-fetch, so that they are not DCE'd and remain in linkage
between FS and previous stage.  Note that we could almost avoid this
special handling by giving meta:tex_prefetch real src arguments, except
that in the FS stage, inputs are actual bary.f/ldlv instructions.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: don't DCE ij_pix if used for pre-fs-texture-fetch
Rob Clark [Fri, 11 Oct 2019 18:50:22 +0000 (11:50 -0700)]
freedreno/ir3: don't DCE ij_pix if used for pre-fs-texture-fetch

When we enable pre-dispatch texture fetch, we could have a scenario
where the barycentric i/j coord sysval is not used in the shader, but
only used for the varying fetch for the pre-dispatch texture fetch.
In this case we need to take care not to DCE this sysval.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: track sysval slot for inputs
Rob Clark [Fri, 11 Oct 2019 18:35:53 +0000 (11:35 -0700)]
freedreno/ir3: track sysval slot for inputs

Will be needed for special handling of SYSTEM_VALUE_BARYCENTRIC_PIXEL
(ij_pix) when pre-fs texture fetch is enabled.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: remove unused ir3_instruction::inout
Rob Clark [Fri, 11 Oct 2019 18:26:08 +0000 (11:26 -0700)]
freedreno/ir3: remove unused ir3_instruction::inout

Not sure I remember how long this has been unused for.  But it's unused
now.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno/ir3: Add data structures to support texture pre-fetch
Hyunjun Ko [Fri, 2 Aug 2019 19:12:22 +0000 (21:12 +0200)]
freedreno/ir3: Add data structures to support texture pre-fetch

Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agofreedreno: update registers
Rob Clark [Wed, 9 Oct 2019 19:16:03 +0000 (12:16 -0700)]
freedreno: update registers

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
5 years agonir: Add new texop nir_texop_tex_prefetch
Eduardo Lima Mitev [Wed, 10 Jul 2019 07:48:21 +0000 (09:48 +0200)]
nir: Add new texop nir_texop_tex_prefetch

This is like nir_texop_tex, but signals that the sampling coordinates
are immutable during the shader stage, in a way that allows the HW
that supports pre-dispatching sampling operations to pre-fetch
the result prior to scheduling the shader stage.

This is introduced to support the feature in Freedreno. Adreno HW
from a4xx supports it.

A NIR pass introduced later in this series will detect sampling
operations that are eligible for pre-dispatch, and replace
nir_texop_tex by this new op, to tell the backend to enable
pre-fetch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>