mesa.git
Alyssa Rosenzweig [Sat, 28 Mar 2020 02:26:09 +0000 (22:26 -0400)]
pan/bi: Structify fadd/min/max16

There is some quirky encoding here.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Sat, 28 Mar 2020 01:29:56 +0000 (21:29 -0400)]
pan/bi: Add v2f16 versions of rounding ops

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Sat, 28 Mar 2020 00:28:09 +0000 (20:28 -0400)]
pan/bi: Handle round opcodes in frontend

These correspond to various ops routed through BI_ROUND

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Sat, 28 Mar 2020 00:11:07 +0000 (20:11 -0400)]
pan/bi: Assert out i16 related converts for now

Needs more investigation, and GLSL doesn't use it quite yet sadly.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Sat, 28 Mar 2020 00:08:03 +0000 (20:08 -0400)]
pan/bi: Add one-source f32->f16 op

This really has a second op for vectorization but we don't handle this
quite yet...

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Sat, 28 Mar 2020 00:07:43 +0000 (20:07 -0400)]
pan/bi: Add bifrost_fma_2src generic

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 23:06:28 +0000 (19:06 -0400)]
pan/bi: Handle standard FMA conversions

These are plain old 1-sources so they're easy to start with.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 22:57:58 +0000 (18:57 -0400)]
pan/bi: Enumerate conversions

There are lots of Bifrost conversion opcodes that can all be emitted
from BI_CONVERT, let's pattern match.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 22:34:21 +0000 (18:34 -0400)]
pan/bi: Expand out FMA conversion opcodes

There are a *lot* of them, with lots of symmetry we can exploit to
simplify the packing logic (but not entirely). Let's add the
corresponding header structs/defines, although we don't actually poke
the disassembler at this stage.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 19:55:03 +0000 (15:55 -0400)]
pan/bi: Pack outmod and roundmode with FMA

The fields got missed.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 19:53:57 +0000 (15:53 -0400)]
pan/bi: Add FMA16 packing

It's like the original FMA packing but with swizzles introduced.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 19:53:12 +0000 (15:53 -0400)]
pan/bi: Fix missing type for fmul

We add a zero argument, and we want its size to match that of the other
arguments for optimization.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 19:51:20 +0000 (15:51 -0400)]
pan/bi: Finish FMA structures

There were some missing fields for the 32-bit case, and the 16-bit case
has separate packing.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 18:40:30 +0000 (14:40 -0400)]
pan/bi: Ignore swizzle in unwritten component

Otherwise we can trip the assert for no good reason.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 18:40:04 +0000 (14:40 -0400)]
pan/bi: Handle f2f* opcodes

Just more converts that got missed.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 18:39:39 +0000 (14:39 -0400)]
panfrost: Enable PIPE_SHADER_CAP_FP16 on Bifrost

We don't have fp16 implemented on Midgard yet but on Bifrost we can flip
it on now.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Fri, 27 Mar 2020 18:39:18 +0000 (14:39 -0400)]
pan/bi: Enable precision lowering in standalone compiler

..since there's no CAP to guide here.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Thu, 26 Mar 2020 14:10:33 +0000 (10:10 -0400)]
pan/bi: Fix off-by-one in scoreboarding packing

Clauses actually encode the *next* clause's dependencies.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
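
The off-by-one described above can be illustrated with a minimal sketch. It is
not the Bifrost packing code: `sketch_clause`, `sketch_pack_deps` and the
8-bit dependency mask are hypothetical, and writing 0 for the final clause
(which has no successor) is an assumption made only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clause record; only the dependency mask matters here. */
struct sketch_clause {
    uint8_t dependencies; /* scoreboard slots this clause must wait on */
};

/* Fills out[i] with the dependency field packed into clause i's header.
 * Per the commit message, that field describes clause i + 1, not clause i.
 * The last clause has no successor; 0 is an assumption of this sketch. */
static void sketch_pack_deps(const struct sketch_clause *clauses,
                             size_t count, uint8_t *out)
{
    for (size_t i = 0; i < count; i++)
        out[i] = (i + 1 < count) ? clauses[i + 1].dependencies : 0;
}
```

Packing clause i with its own mask instead of its successor's is the kind of
off-by-one the commit title refers to.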

Alyssa Rosenzweig [Thu, 26 Mar 2020 14:06:49 +0000 (10:06 -0400)]
pan/bi: Fix overzealous write barriers

It's possible this triggers an INSTR_INVALID_ENC.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Wed, 25 Mar 2020 18:56:06 +0000 (14:56 -0400)]
pan/bit: Begin generating a vertex job

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Tue, 24 Mar 2020 18:08:16 +0000 (14:08 -0400)]
pan/bit: Submit a WRITE_VALUE job as a sanity check

If this fails, everything else probably will too.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Tue, 24 Mar 2020 17:53:18 +0000 (13:53 -0400)]
panfrost: Stub out G31/G52 quirks

There are none so far, but we'll need quirks accessible for
Bifrost-specific details in the future, and in the meantime we need to handle
the cases somehow to avoid the unreachable(..)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Tue, 24 Mar 2020 17:48:06 +0000 (13:48 -0400)]
pan/bit: Open up the device

As a start and a sanity check.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Tue, 24 Mar 2020 17:40:12 +0000 (13:40 -0400)]
panfrost: Move device open/close to root panfrost

We need it for standalone testing too.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Tue, 24 Mar 2020 17:24:03 +0000 (13:24 -0400)]
pan/bit: Link standalone compiler with en/decoder

We would like to submit jobs from the standalone compiler for testing
purposes, so let's get things wired up.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Mon, 23 Mar 2020 23:36:46 +0000 (19:36 -0400)]
panfrost: Move pan_bo to root panfrost

Now that its Gallium dependencies have been resolved, we can move this
all out to root. The only nontrivial change here is keeping the
pandecode calls in Gallium-panfrost to avoid creating a circular
dependency between encoder/decoder. This could be solved with a third
drm folder but this seems less intrusive for now and Roman would
probably appreciate if I went longer than 8 hours without breaking the
Android build.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Mon, 23 Mar 2020 23:17:49 +0000 (19:17 -0400)]
panfrost: Inline reference counting routines

We use only a very small subset of the capabilities of
pipe_reference (just wrappers for atomic ints..). Let's inline it and
drop the dependency on Gallium from pan_bo.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>
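
As a rough illustration of what "just wrappers for atomic ints" can look like,
here is a minimal sketch using C11 atomics. The names `sketch_bo`,
`sketch_bo_create`, `sketch_bo_reference` and `sketch_bo_unreference` are
hypothetical and are not the functions this commit adds to pan_bo.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer object; only the counter matters for this sketch. */
struct sketch_bo {
    atomic_int refcnt;
};

/* Allocates an object that starts with a single reference. */
static struct sketch_bo *sketch_bo_create(void)
{
    struct sketch_bo *bo = malloc(sizeof(*bo));
    if (bo)
        atomic_init(&bo->refcnt, 1);
    return bo;
}

/* Takes one additional reference. */
static void sketch_bo_reference(struct sketch_bo *bo)
{
    atomic_fetch_add(&bo->refcnt, 1);
}

/* Drops one reference. Returns true if it was the last one, in which
 * case the object has been freed and must not be touched again. */
static bool sketch_bo_unreference(struct sketch_bo *bo)
{
    /* atomic_fetch_sub returns the value held before the decrement. */
    if (atomic_fetch_sub(&bo->refcnt, 1) != 1)
        return false;
    free(bo);
    return true;
}
```

Nothing in this sketch depends on Gallium headers, which is the point the
commit message makes about dropping the pipe_reference dependency.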

Alyssa Rosenzweig [Mon, 23 Mar 2020 23:10:06 +0000 (19:10 -0400)]
panfrost: Isolate panfrost_bo_access_for_stage to pan_cmdstream.c

We don't use it outside this file (and really shouldn't) and it has a
strict Gallium dependency in pan_bo.h.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Alyssa Rosenzweig [Mon, 23 Mar 2020 22:44:21 +0000 (18:44 -0400)]
panfrost: Split panfrost_device from panfrost_screen

We would like to access properties of the device in a
Gallium-independent way (for out-of-Gallium testing in the short term,
and to help a theoretical Vulkan implementation in the long run).
Let's split up the struct.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>

Icecream95 [Tue, 24 Mar 2020 04:33:22 +0000 (17:33 +1300)]
panfrost: Correctly identify format 0x4c

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4292>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4292>

Icecream95 [Tue, 24 Mar 2020 00:25:19 +0000 (13:25 +1300)]
panfrost: Add support for R3G3B2

Tested with texenv from mesa-demos.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4292>

Icecream95 [Tue, 24 Mar 2020 01:45:45 +0000 (14:45 +1300)]
st/mesa: Fall back on R3G3B2 for R3_G3_B2

It's simpler for Panfrost to use R3G3B2 instead of B2G3R3, but
format_map only listed the BGR variation.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4292>

Icecream95 [Tue, 24 Mar 2020 00:09:30 +0000 (13:09 +1300)]
panfrost: Add support for B5G5R5X1

Tested with texenv from mesa-demos.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4292>

Icecream95 [Mon, 23 Mar 2020 23:42:42 +0000 (12:42 +1300)]
panfrost: Mark 64-bit formats as unsupported

There is no hardware support for these formats, but some games use
them for vertex data.

This fixes a crash in Aleph One.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4292>

Jason Ekstrand [Mon, 30 Mar 2020 17:14:48 +0000 (12:14 -0500)]
nir: Handle vec8/16 in nir_shrink_array_vars

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Mon, 30 Mar 2020 17:09:03 +0000 (12:09 -0500)]
nir: Handle vec8/16 in opt_undef_vecN

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Mon, 30 Mar 2020 17:08:47 +0000 (12:08 -0500)]
nir: Treat vec8/16 as select in opt_peephole_select

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Mon, 30 Mar 2020 17:08:20 +0000 (12:08 -0500)]
nir: Handle vec8/16 in opt_split_alu_of_phi

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Mon, 30 Mar 2020 17:07:09 +0000 (12:07 -0500)]
nir: Handle vec8/16 in lower_regs_to_ssa

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Mon, 30 Mar 2020 17:06:52 +0000 (12:06 -0500)]
nir: Handle vec8/16 in lower_phis_to_scalar

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Mon, 30 Mar 2020 17:06:38 +0000 (12:06 -0500)]
nir: Handle vec8/16 in gather_ssa_types

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Mon, 30 Mar 2020 16:59:25 +0000 (11:59 -0500)]
nir: Handle vec8/16 in bool_to_bitsize

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Sat, 28 Mar 2020 16:23:52 +0000 (11:23 -0500)]
nir: Copy propagate through vec8s and vec16s

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Mon, 30 Mar 2020 17:06:22 +0000 (12:06 -0500)]
nir: Add a nir_op_is_vec helper

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Sat, 28 Mar 2020 16:24:08 +0000 (11:24 -0500)]
nir/algebraic: Add downcast-of-pack opts

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>

Jason Ekstrand [Sat, 28 Mar 2020 16:22:43 +0000 (11:22 -0500)]
nir/lower_int64: Lower 8 and 16-bit downcasts with nir_lower_mov64

We have the code to do the lowering, we were just missing the
boilerplate bits to make should_lower_int64_alu_instr return true.

Fixes: 62d55f12818e "nir: Wire up int64 lowering functions"
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
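
For intuition only, the sketch below models why an 8-bit or 16-bit downcast of
a lowered 64-bit value is cheap: once the value is held as two 32-bit halves,
only the low half can contribute to the result. The type `sketch_u64` and the
helpers are hypothetical; this is not the NIR lowering code itself.

```c
#include <stdint.h>

/* Hypothetical lowered form of a 64-bit integer: two 32-bit halves. */
struct sketch_u64 {
    uint32_t lo;
    uint32_t hi;
};

/* Splits a native 64-bit value into its lowered representation. */
static struct sketch_u64 sketch_split(uint64_t v)
{
    struct sketch_u64 r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}

/* A 16-bit downcast only ever looks at the low half. */
static uint16_t sketch_u2u16(struct sketch_u64 v)
{
    return (uint16_t)v.lo;
}

/* Likewise for the 8-bit downcast. */
static uint8_t sketch_u2u8(struct sketch_u64 v)
{
    return (uint8_t)v.lo;
}
```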

Rob Clark [Mon, 30 Mar 2020 16:30:54 +0000 (09:30 -0700)]
freedreno/log: avoid duplicate ts's

In cases where `fd_log()`/`fd_log_stream()` are called multiple times
back-to-back, just use the timestamp of the first trace.

This seems to avoid some occasional GPU hangs I was seeing with logging
enabled, although I am not exactly sure of the reason for the hangs.  (It looks
like the GPU hangs *after* all the cmdstream is processed, according to
crashdec.)

Signed-off-by: Rob Clark <robdclark@chromium.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>

Rob Clark [Sat, 28 Mar 2020 18:28:14 +0000 (11:28 -0700)]
freedreno/a6xx: add some more tracepoints

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>

Rob Clark [Sat, 28 Mar 2020 17:54:10 +0000 (10:54 -0700)]
freedreno: add some initial fd_log tracepoints

Mostly convert over existing DBG traces.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>

Rob Clark [Sat, 28 Mar 2020 17:43:42 +0000 (10:43 -0700)]
freedreno/a6xx: timestamp logging support

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>

Rob Clark [Sat, 28 Mar 2020 16:57:35 +0000 (09:57 -0700)]
freedreno: add logging infrastructure

Provides a way to log msgs timestamped at the corresponding position in
the GPU cmdstream, mostly for the purposes of profiling.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>

Rob Clark [Sat, 28 Mar 2020 18:14:05 +0000 (11:14 -0700)]
util: fix u_fifo_pop()

Seems like no one ever depended on it to actually return false when fifo
is empty.

Fixes: 6e61d062093 ("util: Add super simple fifo")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>
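
The contract being fixed (pop reports false on an empty fifo) can be sketched
with a small ring buffer. This is not the u_fifo implementation: the
`sketch_fifo` type, its fixed size and both helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FIFO_SIZE 4

/* Hypothetical fixed-size fifo of pointers. */
struct sketch_fifo {
    void *slots[SKETCH_FIFO_SIZE];
    size_t head; /* next slot to write */
    size_t tail; /* next slot to read */
    size_t num;  /* number of queued entries */
};

/* Queues ptr. Returns false if the fifo is full. */
static bool sketch_fifo_add(struct sketch_fifo *f, void *ptr)
{
    if (f->num == SKETCH_FIFO_SIZE)
        return false;
    f->slots[f->head] = ptr;
    f->head = (f->head + 1) % SKETCH_FIFO_SIZE;
    f->num++;
    return true;
}

/* Dequeues into *ptr. Returns false, leaving *ptr untouched, when the
 * fifo is empty. */
static bool sketch_fifo_pop(struct sketch_fifo *f, void **ptr)
{
    if (f->num == 0)
        return false;
    *ptr = f->slots[f->tail];
    f->tail = (f->tail + 1) % SKETCH_FIFO_SIZE;
    f->num--;
    return true;
}
```

A caller can then drain the queue with `while (sketch_fifo_pop(&f, &p))`,
which only terminates if the empty case really returns false.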

Rob Clark [Sat, 28 Mar 2020 16:07:43 +0000 (09:07 -0700)]
freedreno: remove some obsolete debug options

'fraghalf' is unused (superseded by actually lowering output based on
the precision information in nir).  And glsl140 support in ir3 is long
past the experimental stage, so the glsl120 option is no longer needed.
So remove them and free up some bits for new things.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4366>

Jason Ekstrand [Mon, 30 Mar 2020 21:39:01 +0000 (16:39 -0500)]
nir/opt_loop_unroll: Fix has_nested_loop handling

In 87839680c0a48, a very subtle mistake was made with the CFG walking
recursion.  Instead of setting the local has_nested_loop variable when
processing child loops, has_nested_loop_out was passed directly into the
process_loop_in_block call.  This broke nested loop detection heuristics
and caused loop unrolling to run massively out of control.  In
particular, it makes the following CTS test compile virtually forever:

dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom

Fixes: 87839680c0 "nir: Fix breakage of foreach_list_typed_safe..."
Closes: #2710
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380>
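
The recursion mistake described above can be shown on a simplified model. The
sketch below is not the NIR pass: `sketch_node`, `sketch_process` and the
`treated_as_innermost` flag are hypothetical stand-ins for the CFG walk and
its unrolling heuristic.

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_kind { SKETCH_BLOCK, SKETCH_IF, SKETCH_LOOP };

/* Hypothetical control-flow node with a few children. */
struct sketch_node {
    enum sketch_kind kind;
    struct sketch_node *children[4];
    size_t num_children;
    bool treated_as_innermost; /* result of the walk, for loop nodes */
};

/* Walks the tree. *has_nested_loop_out is set to true when the subtree
 * rooted at node contains a loop, so an enclosing loop can see it. */
static void sketch_process(struct sketch_node *node,
                           bool *has_nested_loop_out)
{
    if (node->kind != SKETCH_LOOP) {
        /* Non-loop nodes forward the caller's flag to their children. */
        for (size_t i = 0; i < node->num_children; i++)
            sketch_process(node->children[i], has_nested_loop_out);
        return;
    }

    /* A loop collects its children's results in its OWN local flag.
     * Passing has_nested_loop_out here instead would leave the local
     * false, so every loop would look innermost. */
    bool has_nested_loop = false;
    for (size_t i = 0; i < node->num_children; i++)
        sketch_process(node->children[i], &has_nested_loop);

    node->treated_as_innermost = !has_nested_loop;
    *has_nested_loop_out = true;
}
```

With an outer loop that contains an inner loop behind an if, the walk marks
only the inner loop as innermost; with the flag passed through incorrectly,
the outer loop would be marked as well.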

Eric Anholt [Mon, 23 Mar 2020 21:55:50 +0000 (14:55 -0700)]
freedreno: Work around UBWC flakiness.

In trying to track down the new failure in #2670, I found that I could get
the flaky test set down to 4 tests, and dropping any remaining test
wouldn't trigger the failure (a bad 8x4 block in the middle of
dEQP-GLES3.functional.fbo.msaa.4_samples.r16f's render target).  Disabling
gmem or bypass didn't help, and adding lots of CCU flushing didn't help.
What did help was disabling blitting, or this memset to initialize the
UBWC area after we (presumably) pull a BO out of the BO cache.  My guess
is that the 2D blitter can't handle some rare set of state in the flags
buffer and emits some garbage.

I've run 8 gles3 and 7 gles31 runs with this branch now, so hopefully I've
got the right set of flakes marked for removal.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2670
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4290>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4290>

Eric Anholt [Sat, 28 Mar 2020 00:23:19 +0000 (17:23 -0700)]
freedreno: Fix detection of being in a blit for acc queries.

The batch might not have stage == FD_STAGE_BLIT set because
fd_blitter_pipe_begin was sticking the stage on some random batch (or none
at all) rather than the one that would be used in the meta operation.

What we actually wanted to be looking at was set_active_query_state(),
which is already called by util_blitter and whose state we just needed to
track.

Fixes piglit occlusion_query_meta_no_fragments.  I haven't changed
query_hw.c's stage handling to clean the rest up because I don't have a
db410c/db820c at home to iterate over the piglit tests.

Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>

Eric Anholt [Sat, 28 Mar 2020 00:13:25 +0000 (17:13 -0700)]
freedreno: Rename "is_blit" to "is_discard_blit"

It's about the special case of an overwrite of a level meaning we can
discard old batch contents.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>

Eric Anholt [Mon, 30 Mar 2020 17:23:45 +0000 (10:23 -0700)]
freedreno/a6xx: Fix timestamp queries.

We were returning the same kind of result as time_elapsed (an end - start
time in ns), which on a timestamp query is approximately zero since
begin/end are at the same point in time.  What we're supposed to return is
a converted-to-ns timestamp based on the GPU clock.  Remove the _pause()
function for time_elapsed to reduce the command stream overhead, and just
capture start (which is, unfortunately, going to happen on each tile and
thus the final start value we read will be the last tile of the frame,
not the first).

Fixes piglit spec/arb_timer_query/query gl_timestamp

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
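
The difference between the two query results can be sketched as follows. The
helper names are hypothetical, and the 19.2 MHz counter frequency is an
assumption chosen only to make the arithmetic concrete; it is not taken from
the commit.

```c
#include <stdint.h>

/* Assumed counter frequency, for illustration only. */
#define SKETCH_CLOCK_HZ 19200000ull

/* Converts raw counter ticks to nanoseconds. Whole seconds and the
 * remainder are converted separately so that ticks * 1e9 cannot
 * overflow 64 bits for large counter values. */
static uint64_t sketch_ticks_to_ns(uint64_t ticks)
{
    uint64_t secs = ticks / SKETCH_CLOCK_HZ;
    uint64_t rem = ticks % SKETCH_CLOCK_HZ;
    return secs * 1000000000ull + rem * 1000000000ull / SKETCH_CLOCK_HZ;
}

/* time_elapsed: the duration between two samples. */
static uint64_t sketch_time_elapsed_ns(uint64_t start, uint64_t end)
{
    return sketch_ticks_to_ns(end - start);
}

/* timestamp: the sampled counter itself, converted to ns. */
static uint64_t sketch_timestamp_ns(uint64_t start)
{
    return sketch_ticks_to_ns(start);
}
```

When begin and end sample the same point in time, the first helper yields
roughly zero while the second still yields a meaningful clock value, which is
the bug the commit message describes.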

Eric Anholt [Fri, 27 Mar 2020 17:54:19 +0000 (10:54 -0700)]
freedreno: Count blits in GL_TIME_ELAPSED and perf counter queries.

Fixes 0 gpu time reported for glBlitFramebuffer in apitrace replay --pgpu.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>

Eric Anholt [Fri, 27 Mar 2020 18:31:24 +0000 (11:31 -0700)]
freedreno: Associate the acc query bo with the batch.

Otherwise, a result query with wait won't trigger flushing the batch, and
we can end up with zeroed results.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>

Eric Anholt [Fri, 27 Mar 2020 22:55:14 +0000 (15:55 -0700)]
freedreno: Fix acc query handling in the presence of batch reordering.

When we switch batches and start a new draw, we need to cap the queries in
the previous batch and start queries again in the new one.

FD_STAGE_NULL got renamed to 0 so that it would naturally return
!is_active and end the queries at the end of the batch.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>
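
Why a null stage of 0 "naturally" reads as inactive can be sketched with a
bitmask test. The enum values and `sketch_is_active` below are hypothetical
and only mirror the idea, not freedreno's actual stage enum or query code.

```c
#include <stdbool.h>

/* Hypothetical stage bits; the null stage is deliberately 0. */
enum sketch_stage {
    SKETCH_STAGE_NULL = 0,
    SKETCH_STAGE_DRAW = 1 << 0,
    SKETCH_STAGE_CLEAR = 1 << 1,
    SKETCH_STAGE_BLIT = 1 << 2,
};

/* A query is active in a stage if the stage's bit is in its mask.
 * Because the null stage has no bits set, it can never match, so no
 * special case is needed to end queries at the end of a batch. */
static bool sketch_is_active(unsigned active_stages, enum sketch_stage stage)
{
    return (active_stages & stage) != 0;
}
```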

Eric Anholt [Fri, 27 Mar 2020 23:48:05 +0000 (16:48 -0700)]
freedreno: Remove the "active" member of queries.

The state tracker only gets to begin/query/destroy when !active and end
when active, so we have no need to try to track this ourselves.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>

Eric Anholt [Fri, 27 Mar 2020 23:46:22 +0000 (16:46 -0700)]
freedreno: Remove always-true return from per-gen begin_query.

You should do failure-prone allocation in create_query, not begin, anyway.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4356>

Rhys Perry [Thu, 26 Mar 2020 15:50:31 +0000 (15:50 +0000)]
util/u_queue: fix race in total_jobs_size access

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
CC: <mesa-stable@lists.freedesktop.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4335>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4335>

Rhys Perry [Thu, 26 Mar 2020 15:49:05 +0000 (15:49 +0000)]
glsl: fix race in instance getters

Insertions can modify entry->data. Seems to fix random Fossilize crashes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
CC: <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4335>

Jason Ekstrand [Fri, 27 Mar 2020 04:56:57 +0000 (23:56 -0500)]
nir: Set UBO alignments in lower_uniforms_to_ubo

Fixes: fb64954d9dd "nir: Validate that memory load/store ops work on..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4378>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4378>

Rhys Perry [Thu, 26 Mar 2020 13:22:13 +0000 (13:22 +0000)]
aco: look at p_{extract,split}_vector's definitions in pred_by_exec_mask()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4333>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4333>

Daniel Stone [Mon, 30 Mar 2020 14:58:51 +0000 (15:58 +0100)]
CI: Re-enable Windows VS2019 builds

The failures are fixed, but I didn't notice this had been silently
disabled in !4272.

Re-enable the VS2019 build.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4374>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4374>

Jason Ekstrand [Fri, 27 Mar 2020 18:08:21 +0000 (13:08 -0500)]
nir: Validate that memory load/store ops work on whole bytes

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>

Jason Ekstrand [Thu, 26 Mar 2020 20:46:56 +0000 (15:46 -0500)]
anv: Set alignments on descriptor and constant loads

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>

Jason Ekstrand [Fri, 27 Mar 2020 05:30:25 +0000 (00:30 -0500)]
nir: Insert b2b1s around booleans in nir_lower_io

By inserting a b2b1 around the load_ubo, load_input, etc. intrinsics
generated by nir_lower_io, we can ensure that the intrinsic has the
correct destination bit size.  Not having the right size can mess up
passes which try to optimize access.  In particular, it was causing
brw_nir_analyze_ubo_ranges to ignore load_ubo of booleans, which meant
that boolean uniforms weren't getting pushed as push constants.  I
don't think this is an actual functional bug anywhere, hence no CC to
stable, but it may improve perf somewhere.

Shader-db results on ICL with iris:

    total instructions in shared programs: 16076707 -> 16075246 (<.01%)
    instructions in affected programs: 129034 -> 127573 (-1.13%)
    helped: 487
    HURT: 0
    helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
    helped stats (rel) min: 0.45% max: 3.00% x̄: 1.33% x̃: 1.36%
    95% mean confidence interval for instructions value: -3.00 -3.00
    95% mean confidence interval for instructions %-change: -1.37% -1.29%
    Instructions are helped.

    total cycles in shared programs: 338015639 -> 337983311 (<.01%)
    cycles in affected programs: 971986 -> 939658 (-3.33%)
    helped: 362
    HURT: 110
    helped stats (abs) min: 1 max: 1664 x̄: 97.37 x̃: 43
    helped stats (rel) min: 0.03% max: 36.22% x̄: 5.58% x̃: 2.60%
    HURT stats (abs)   min: 1 max: 554 x̄: 26.55 x̃: 18
    HURT stats (rel)   min: 0.03% max: 10.99% x̄: 1.04% x̃: 0.96%
    95% mean confidence interval for cycles value: -79.97 -57.01
    95% mean confidence interval for cycles %-change: -4.60% -3.47%
    Cycles are helped.

    total sends in shared programs: 815037 -> 814550 (-0.06%)
    sends in affected programs: 5701 -> 5214 (-8.54%)
    helped: 487
    HURT: 0

    LOST:   2
    GAINED: 0

The two lost programs were SIMD16 shaders in CS:GO.  However, CS:GO was
also one of the most helped programs where it shaves sends off of 134
programs.  This seems to reduce GPU core clocks by about 4% on the first
1000 frames of the PTS benchmark.
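
As a sanity check, the percentages in the report above can be recomputed from the raw before/after counts. This is only a sketch of the arithmetic and is not part of shader-db; the helper name `relative_change` is made up for illustration.

```python
# Figures copied from the "affected programs" lines of the report above,
# stored as (before, after) pairs per metric.
affected = {
    "instructions": (129034, 127573),
    "cycles": (971986, 939658),
    "sends": (5701, 5214),
}


def relative_change(before, after):
    """Percentage change from before to after, rounded to two decimals."""
    return round((after - before) / before * 100, 2)


changes = {name: relative_change(b, a) for name, (b, a) in affected.items()}

# 487 programs were helped and each lost exactly 3 instructions
# (min = max = 3 in the "helped stats (abs)" line).
instructions_saved = 487 * 3

# The send count drops by one per helped program.
sends_saved = affected["sends"][0] - affected["sends"][1]
```

The instruction delta (129034 - 127573) matches 487 helped programs times 3 instructions each, and the sends delta matches one send removed from each of the 487 helped programs.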

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>

4 years agonir: Use b2b opcodes for shared and constant memory
Jason Ekstrand [Fri, 27 Mar 2020 05:29:14 +0000 (00:29 -0500)]
nir: Use b2b opcodes for shared and constant memory

No shader-db changes on ICL with iris

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>

4 years agoaco: Implement b2b32 and b2b1
Jason Ekstrand [Fri, 27 Mar 2020 16:49:14 +0000 (11:49 -0500)]
aco: Implement b2b32 and b2b1

The implementations here just clone i2b32 and i2b1.  This means that
b2b32 doesn't technically generate true NIR 0/-1 booleans but it should
be fine as it's only ever generated for shared variable writes which
will always be consumed by something which will then run it through an
i2b again.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>

4 years agonir: Add b2b opcodes
Jason Ekstrand [Fri, 27 Mar 2020 05:18:43 +0000 (00:18 -0500)]
nir: Add b2b opcodes

These exist to convert between different types of boolean values.  In
particular, we want to use these for uniform and shared memory
operations where we need to convert to a reasonably sized boolean but we
don't care what its format is so we don't want to make the back-end
insert an actual i2b/b2i.  In the case of uniforms, Mesa can tweak the
format of the uniform boolean to whatever the driver wants.  In the case
of shared, every value in a shared variable comes from the shader so
it's already in the right boolean format.

The new boolean conversion opcodes get replaced with mov in
lower_bool_to_int/float32 so the back-end will hopefully never see them.
However, while we're in the middle of optimizing our NIR, they let us
have sensible load_uniform/ubo intrinsics and also have the bit size
conversion.
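
The lowering described above can be pictured with a toy model. This is a hedged sketch, not the real pass: instructions are plain (opcode, dest, src) tuples rather than NIR data structures, and `lower_b2b_to_mov` is a hypothetical name. The assumption is that once every boolean has been lowered to one size, a bool-to-bool conversion has nothing left to do and becomes a copy.

```python
# Toy model only: instructions are (opcode, dest, src) tuples, not real NIR.
BOOL_RESIZE_OPS = {"b2b1", "b2b32"}


def lower_b2b_to_mov(instrs):
    """Replace boolean-to-boolean conversions with plain moves."""
    lowered = []
    for opcode, dest, src in instrs:
        if opcode in BOOL_RESIZE_OPS:
            # Source and destination now share one boolean format,
            # so the conversion degenerates to a copy.
            opcode = "mov"
        lowered.append((opcode, dest, src))
    return lowered


shader = [
    ("load_ubo", "ssa_1", "ubo0"),  # boolean as stored in memory
    ("b2b1", "ssa_2", "ssa_1"),     # resize to a 1-bit boolean
    ("inot", "ssa_3", "ssa_2"),     # ordinary boolean consumer
]
lowered = lower_b2b_to_mov(shader)
```

Only the `b2b1` entry changes; the load and its consumer are left untouched, which is why a later copy-propagation pass can remove the move entirely.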

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>

4 years agointel/nir: Run copy-prop and DCE after lower_bool_to_int32
Jason Ekstrand [Fri, 27 Mar 2020 16:05:27 +0000 (11:05 -0500)]
intel/nir: Run copy-prop and DCE after lower_bool_to_int32

No shader-db impact on ICL with iris.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>

4 years agoetnaviv: compiled_framebuffer_state: get rid of SE_SCISSOR_*
Christian Gmeiner [Sun, 22 Mar 2020 21:42:35 +0000 (22:42 +0100)]
etnaviv: compiled_framebuffer_state: get rid of SE_SCISSOR_*

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>

4 years agoetnaviv: s/scissor_s/scissor
Christian Gmeiner [Sun, 22 Mar 2020 10:16:58 +0000 (11:16 +0100)]
etnaviv: s/scissor_s/scissor

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>

4 years agoetnaviv: get rid of struct compiled_scissor_state
Christian Gmeiner [Sun, 22 Mar 2020 10:13:12 +0000 (11:13 +0100)]
etnaviv: get rid of struct compiled_scissor_state

We can reuse pipe_scissor_state.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>

4 years agoetnaviv: do the left shift by 16 at emit time
Christian Gmeiner [Sun, 22 Mar 2020 10:07:16 +0000 (11:07 +0100)]
etnaviv: do the left shift by 16 at emit time

Also round up the max bounds.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>

4 years agoetnaviv: rework clipping calculation to be a derived state
Christian Gmeiner [Sat, 21 Mar 2020 10:24:48 +0000 (11:24 +0100)]
etnaviv: rework clipping calculation to be a derived state

This moves the whole clipping calculation out of the emit function.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>

4 years agoetnaviv: get rid of SE_CLIP_*
Christian Gmeiner [Sat, 21 Mar 2020 10:16:37 +0000 (11:16 +0100)]
etnaviv: get rid of SE_CLIP_*

The only difference between e.g. SE_SCISSOR_RIGHT and SE_CLIP_RIGHT
is the used margin value. With that information we can remove
SE_CLIP_* and apply the different margins during emit time.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4278>

4 years agogitlab-ci: Prune all SCons jobs except scons-win64, and allow failures.
Jose Fonseca [Sat, 28 Mar 2020 10:36:28 +0000 (10:36 +0000)]
gitlab-ci: Prune all SCons jobs except scons-win64, and allow failures.

Based on the discussion in https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4352

Reviewed-by: Daniel Stone <daniels@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4363>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4363>

4 years agonir/algebraic: add fexp2(fmul(flog2(a), 0.5)) -> fsqrt(a) optimization
Samuel Pitoiset [Fri, 27 Mar 2020 15:40:38 +0000 (16:40 +0100)]
nir/algebraic: add fexp2(fmul(flog2(a), 0.5)) -> fsqrt(a) optimization

Helps some Wolfenstein II and Wolfenstein Youngblood shaders.
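
The rewrite rests on the identity 2^(0.5 * log2(a)) = a^0.5 = sqrt(a) for positive a. The following is a small numeric check of that identity in plain Python floats; it is a sketch only, does not model GPU precision or the handling of zero, negative, or infinite inputs, and the function name `via_exp_log` is made up for illustration.

```python
import math


def via_exp_log(a):
    """Evaluate 2 ** (0.5 * log2(a)), the expression shape being replaced."""
    return 2.0 ** (0.5 * math.log2(a))


# Positive sample inputs only; log2 is undefined for a <= 0 in this model.
samples = [0.25, 1.0, 2.0, 9.0, 1e6]

# Largest relative difference between the two forms over the samples.
max_rel_error = max(
    abs(via_exp_log(a) - math.sqrt(a)) / math.sqrt(a) for a in samples
)
```

On these samples the two forms agree to within floating-point rounding, so replacing three operations with a single square root does not change the result beyond rounding differences.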

pipeline-db (VEGA10/ACO):
Totals from affected shaders:
SGPRS: 17904 -> 17904 (0.00 %)
VGPRS: 14492 -> 14492 (0.00 %)
Spilled SGPRs: 20 -> 20 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1753152 -> 1749708 (-0.20 %) bytes
Max Waves: 2581 -> 2581 (0.00 %)

pipeline-db (VEGA10/LLVM):
Totals from affected shaders:
SGPRS: 26656 -> 26656 (0.00 %)
VGPRS: 23780 -> 23780 (0.00 %)
Spilled SGPRs: 2112 -> 2112 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 2552712 -> 2549236 (-0.14 %) bytes
Max Waves: 3359 -> 3359 (0.00 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353>

4 years agoscons: Prune out unnecessary targets.
Jose Fonseca [Fri, 27 Mar 2020 15:07:32 +0000 (15:07 +0000)]
scons: Prune out unnecessary targets.

This prunes out all targets except libgl-gdi, libgl-xlib, and svga, as
suggested by Marek Olšák.

libgl-xlib will be removed once I have had time to confirm that none of
our automated tests rely upon it.

There are also a bunch of Makefile.sources which become orphaned as a
result; these are not taken care of in this change.

v2: Prune remainders of swr support.

Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4348>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4348>

4 years agoaco: Don't store LS VS outputs to LDS when TCS doesn't need them.
Timur Kristóf [Thu, 26 Mar 2020 18:36:05 +0000 (19:36 +0100)]
aco: Don't store LS VS outputs to LDS when TCS doesn't need them.

Totals:
Code Size: 254764624 -> 254745104 (-0.01 %) bytes

Totals from affected shaders:
VGPRS: 12132 -> 12112 (-0.16 %)
Code Size: 573364 -> 553844 (-3.40 %) bytes

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: When LS and HS invocations are the same, pass LS outputs in temps.
Timur Kristóf [Thu, 26 Mar 2020 16:45:55 +0000 (17:45 +0100)]
aco: When LS and HS invocations are the same, pass LS outputs in temps.

We know that in this case, the LS and HS invocations are working
on the exact same vertex, so it's safe to skip the LDS.

Totals:
VGPRS: 3960744 -> 3961844 (0.03 %)
Code Size: 254824300 -> 254764624 (-0.02 %) bytes
Max Waves: 1053748 -> 1053574 (-0.02 %)

Totals from affected shaders:
VGPRS: 26152 -> 27252 (4.21 %)
Code Size: 1496600 -> 1436924 (-3.99 %) bytes
Max Waves: 4860 -> 4686 (-3.58 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Extract store_output_to_temps into a separate function.
Timur Kristóf [Thu, 26 Mar 2020 16:30:16 +0000 (17:30 +0100)]
aco: Extract store_output_to_temps into a separate function.

Will be used by LS output stores.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Fix workgroup size calculation.
Timur Kristóf [Thu, 12 Mar 2020 15:28:48 +0000 (16:28 +0100)]
aco: Fix workgroup size calculation.

Clear the workgroup size for all supported shader stages.
Also, unify the workgroup size calculation across various places.

As a result, insert_waitcnt can use the proper workgroup size
which means that some waits can be dropped from tessellation
shaders. Also, in cases where the previous calculation was wrong,
we now insert s_barrier instructions.

Totals from affected shaders (GFX10):
Code Size: 340116 -> 338484 (-0.48 %) bytes

Fixes: a8d15ab6daf0a07476e9dfabe513c0f1e0f3bf82
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Extract setup_tcs_info to a separate function.
Timur Kristóf [Thu, 26 Mar 2020 16:17:38 +0000 (17:17 +0100)]
aco: Extract setup_tcs_info to a separate function.

Will be required by the workgroup size calculation.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Zero-fill undefined elements in create_vec_from_array.
Timur Kristóf [Thu, 26 Mar 2020 11:19:32 +0000 (12:19 +0100)]
aco: Zero-fill undefined elements in create_vec_from_array.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Change isel inputs/outputs to a flat array.
Timur Kristóf [Tue, 24 Mar 2020 14:46:55 +0000 (15:46 +0100)]
aco: Change isel inputs/outputs to a flat array.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Treat outputs of the previous stage as inputs of the next stage.
Timur Kristóf [Tue, 17 Mar 2020 12:43:08 +0000 (13:43 +0100)]
aco: Treat outputs of the previous stage as inputs of the next stage.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agonir: Collect if shader uses cross-invocation or indirect I/O.
Timur Kristóf [Fri, 13 Mar 2020 09:14:37 +0000 (10:14 +0100)]
nir: Collect if shader uses cross-invocation or indirect I/O.

The following new fields are added to tess shader info:

* `tcs_cross_invocation_inputs_read`
* `tcs_cross_invocation_outputs_read`

These are I/O masks that are a subset of inputs_read and outputs_read
and they contain which per-vertex inputs and outputs are read
cross-invocation.

Additionally, the following new fields are added to shader_info:

* `inputs_read_indirectly`
* `outputs_accessed_indirectly`
* `patch_inputs_read_indirectly`
* `patch_outputs_accessed_indirectly`

These new fields can be used for optimizing TCS in a back-end compiler.
If you can be sure that the TCS doesn't use cross-invocation inputs
or outputs, you can choose a different strategy for storing VS and TCS
outputs. However, such optimizations might need to be disabled when
the inputs/outputs are accessed indirectly due to backend limitations,
so this information is also collected.

Example: RADV currently has to store all VS and TCS outputs in LDS, but
for shaders where only inputs and/or outputs belonging to the current
invocation ID are used, it could skip storing these in LDS entirely.
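
A back-end could consume these masks roughly as follows. This is a hedged sketch, not RADV or ACO code: `TcsInfo` is a hypothetical stand-in that only borrows the field names listed above, and the all-or-nothing policy in `can_keep_inputs_out_of_lds` is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class TcsInfo:
    """Hypothetical stand-in holding some of the I/O masks listed above."""
    inputs_read: int = 0
    tcs_cross_invocation_inputs_read: int = 0
    inputs_read_indirectly: int = 0


def can_keep_inputs_out_of_lds(info):
    """True when no input is read cross-invocation or indirectly."""
    return (
        info.tcs_cross_invocation_inputs_read == 0
        and info.inputs_read_indirectly == 0
    )


# Each invocation only reads its own vertex: LDS could be skipped.
own_vertex_only = TcsInfo(inputs_read=0b0111)

# One input slot is read from another invocation: keep using LDS.
reads_neighbour = TcsInfo(
    inputs_read=0b0111,
    tcs_cross_invocation_inputs_read=0b0010,
)

skip_lds = can_keep_inputs_out_of_lds(own_vertex_only)
keep_lds = not can_keep_inputs_out_of_lds(reads_neighbour)
```

A finer-grained policy that decides per input slot is also conceivable, since the fields are masks, but the sketch keeps the decision to a single yes/no for clarity.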

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Use more optimal sequence at the beginning of merged shaders.
Timur Kristóf [Fri, 13 Mar 2020 11:39:23 +0000 (12:39 +0100)]
aco: Use more optimal sequence at the beginning of merged shaders.

It can be further optimized in the future, but
the new sequence already has a few advantages:

* Uses fewer instructions
* Uses even fewer instructions in wave32 mode
* Doesn't use the VALU at all

Totals from affected shaders (GFX10):
VGPRS: 43504 -> 43496 (-0.02 %)
Code Size: 2436000 -> 2423688 (-0.51 %) bytes
Max Waves: 8704 -> 8705 (0.01 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Skip 2nd read of merged wave info when TCS in/out vertices are equal.
Timur Kristóf [Thu, 12 Mar 2020 18:54:16 +0000 (19:54 +0100)]
aco: Skip 2nd read of merged wave info when TCS in/out vertices are equal.

When TCS has an equal number of input and output vertices, it means that
the number of VS and TCS invocations (LS and HS) is the same, and that
the HS invocations operate on the same vertices as the LS.

When this is the case, this commit removes the else-if between
the merged VS and TCS halves, making it possible to schedule
and optimize the code across the two halves.

Totals:
SGPRS: 5577367 -> 5581735 (0.08 %)
VGPRS: 3958592 -> 3960752 (0.05 %)
Code Size: 254867144 -> 254838244 (-0.01 %) bytes
Max Waves: 1053887 -> 1053747 (-0.01 %)

Totals from affected shaders:
SGPRS: 29032 -> 33400 (15.05 %)
VGPRS: 35664 -> 37824 (6.06 %)
Code Size: 1979028 -> 1950128 (-1.46 %) bytes
Max Waves: 7310 -> 7170 (-1.92 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Allow combining LDS loads when loading tess factors.
Timur Kristóf [Thu, 12 Mar 2020 15:55:19 +0000 (16:55 +0100)]
aco: Allow combining LDS loads when loading tess factors.

Previously the tess factors were loaded individually, but now they can
be loaded using a single LDS load instruction.

Note that the inner and outer tess factors are not yet combined.

Totals (GFX10):
Code Size: 254896008 -> 254879212 (-0.01 %) bytes

Totals from affected shaders (GFX10):
Code Size: 2028352 -> 2011556 (-0.83 %) bytes

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Allow combining TCS output VMEM stores.
Timur Kristóf [Thu, 12 Mar 2020 15:30:58 +0000 (16:30 +0100)]
aco: Allow combining TCS output VMEM stores.

Some copypasta may have stuck in the code.
This was left on false by mistake.

Totals (GFX10):
Code Size: 254939248 -> 254896008 (-0.02 %) bytes

Totals from affected shaders (GFX10):
VGPRS: 16196 -> 16212 (0.10 %)
Code Size: 1126332 -> 1083092 (-3.84 %) bytes
Max Waves: 2336 -> 2334 (-0.09 %)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Fix handling of tess factors.
Timur Kristóf [Thu, 26 Mar 2020 17:36:07 +0000 (18:36 +0100)]
aco: Fix handling of tess factors.

There is no need to check whether they are written using indirect
indices, because all tess factors should be written to VMEM only
at the end of the shader.

No pipeline db changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Extract tcs_driver_location_matches_api_mask to separate function.
Timur Kristóf [Thu, 26 Mar 2020 17:14:43 +0000 (18:14 +0100)]
aco: Extract tcs_driver_location_matches_api_mask to separate function.

Also clear up should_write_tcs_output_to_lds a little bit.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>

4 years agoaco: Create null exports in instruction selection instead of assembler.
Timur Kristóf [Thu, 12 Mar 2020 16:20:16 +0000 (17:20 +0100)]
aco: Create null exports in instruction selection instead of assembler.

This allows the passes after isel to assume that the exports are
always correct, and also allows these null exports to be scheduled later.
Additionally, it ensures that the correct exec mask is used for
these exports.

Totals from affected shaders (GFX10):
SGPRS: 84224 -> 84344 (0.14 %)
VGPRS: 23088 -> 23076 (-0.05 %)
Code Size: 882892 -> 894368 (1.30 %) bytes

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>