radeonsi/gfx10: implement streamout-related queries
authorNicolai Hähnle <nicolai.haehnle@amd.com>
Wed, 19 Sep 2018 12:53:35 +0000 (14:53 +0200)
committerMarek Olšák <marek.olsak@amd.com>
Wed, 3 Jul 2019 19:51:13 +0000 (15:51 -0400)
commit792a638b032d16fbe6404f9d90c34b3e0f1fb0b5
tree2fc020d12cded02beb595e8e979a347bd537b96f
parentbcd2d2e1942ab7158dd46a5223130498cb0a8f44
radeonsi/gfx10: implement streamout-related queries

The NGG hardware pipeline doesn't track these statistics automatically,
and in fact *cannot* track them automatically when API geometry shaders
are involved, so we accumulate statistics in the shader using atomic
adds.

This implementation accumulates statistics via the memory system and
the RW buffer descriptor setup. We could use GDS, but since these
atomics aren't latency-sensitive, that basically just trades off
L2$ bandwidth vs. export bus bandwidth. One single memory transaction
per shader workgroup doesn't seem too bad. The result ring buffer in
memory is needed either way to avoid pipeline stalls.

The shader code contains the atomic unconditionally, though the
GFX10_GS_QUERY_BUF is a null buffer when no queries are active. The
atomic is simply discarded by the shader hardware in that case.

Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
12 files changed:
src/gallium/drivers/radeonsi/Makefile.sources
src/gallium/drivers/radeonsi/gfx10_query.c [new file with mode: 0644]
src/gallium/drivers/radeonsi/gfx10_shader_ngg.c
src/gallium/drivers/radeonsi/meson.build
src/gallium/drivers/radeonsi/si_pipe.c
src/gallium/drivers/radeonsi/si_pipe.h
src/gallium/drivers/radeonsi/si_query.c
src/gallium/drivers/radeonsi/si_query.h
src/gallium/drivers/radeonsi/si_shader.c
src/gallium/drivers/radeonsi/si_shader_internal.h
src/gallium/drivers/radeonsi/si_shaderlib_tgsi.c
src/gallium/drivers/radeonsi/si_state.h