Dave Airlie [Mon, 1 Jul 2019 21:10:53 +0000 (07:10 +1000)]
gallivm: add compare exchange wrapper
This just pulls the wrapper from LLVM for older versions
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Mon, 24 Jun 2019 04:45:36 +0000 (14:45 +1000)]
vertex shader: add exec masking (v2)
As suggested by Roland this is just a compare of fetch_max
vs the counter, much simpler than my original spaghetti code.
We require the vertex shader to have an exec mask to get proper
ssbo/image load/atore/atomics semantics
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Alexandros Frantzis [Fri, 5 Jul 2019 13:08:43 +0000 (16:08 +0300)]
virgl: Hide internal virgl_resource functions
Since the transition to virgl_resource_transfer_map(), several
previously public virgl_resource functions are not required to be public
anymore.
We also move the functions earlier in the file so they can be used
without functions declarations.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Alexandros Frantzis [Fri, 5 Jul 2019 11:27:11 +0000 (14:27 +0300)]
virgl: Use virgl_resource_transfer_map for textures
Replace custom texture map code (for maps which don't require resolve)
with virgl_resource_transfer_map.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Alexandros Frantzis [Fri, 5 Jul 2019 11:23:19 +0000 (14:23 +0300)]
virgl: Use virgl_resource_transfer_map for buffers
Replace custom buffer map code with virgl_resource_transfer_map.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Alexandros Frantzis [Fri, 5 Jul 2019 11:22:16 +0000 (14:22 +0300)]
virgl: Introduce virgl_resource_transfer_map
Normal mapping of buffers and textures uses almost identical logic.
This commit extracts the this logic in the form of the
virgl_resource_transfer_map() helper function.
Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Jason Ekstrand [Thu, 4 Jul 2019 23:26:20 +0000 (18:26 -0500)]
iris: Use a uint16_t for key sizes
sizeof(struct brw_vs_prog_key) == 324.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Marek Olšák [Sat, 29 Jun 2019 06:25:23 +0000 (02:25 -0400)]
ac: destroy passes in ac_destroy_llvm_compiler
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Sat, 29 Jun 2019 05:03:29 +0000 (01:03 -0400)]
ac: use an LLVM fence instead of s.waitcnt when possible
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Sat, 29 Jun 2019 04:59:55 +0000 (00:59 -0400)]
ac: remove unused AC_WAIT_EXP
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Sat, 29 Jun 2019 01:29:34 +0000 (21:29 -0400)]
ac: only set ac_dlc in ac_llvm_build.c
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Sat, 29 Jun 2019 00:53:15 +0000 (20:53 -0400)]
ac: replace glc,slc with cache_policy for loads
cosmetic change
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Sat, 29 Jun 2019 00:53:15 +0000 (20:53 -0400)]
ac: replace glc,slc with cache_policy for stores
cosmetic change
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Jonathan Marek [Mon, 1 Jul 2019 22:41:20 +0000 (18:41 -0400)]
etnaviv: implement buffer compression
Vivante GPUs have lossless buffer compression using the tile-status bits,
which can reduce memory access and thus improve performance.
This patch only enables compression for "V4" compression GPUs, but the
implementation is tested on GC2000(V1) and GC3000(V2). V1/V2 compresssion
looks absolutely useless, so it is not enabled.
I couldn't test if this patch breaks MSAA, because it looks like MSAA is
already broken.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Mon, 1 Jul 2019 23:31:46 +0000 (19:31 -0400)]
etnaviv: detect v4 compression
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Mon, 1 Jul 2019 20:48:51 +0000 (16:48 -0400)]
etnaviv: rs: don't use etna_compatible_rs_format when possible
This mirrors the change in blt. RS cares about this for msaa/compression.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Mon, 1 Jul 2019 20:42:50 +0000 (16:42 -0400)]
etnaviv: combine translate_ts_sampler_format/translate_msaa_format
Both translate the same thing, so just add the missing cases into one.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Mon, 1 Jul 2019 20:29:40 +0000 (16:29 -0400)]
etnaviv: fix compression format not set correctly in TS_MEM_CONFIG
VIVS_TS_MEM_CONFIG_COLOR_COMPRESSION_FORMAT() needs to be used.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Thu, 4 Jul 2019 11:55:45 +0000 (07:55 -0400)]
etnaviv: set correct ts_clear_value for BLT engine
BLT engine uses all ones to clear TS, set ts_clear_value to match that.
Note: ts_clear_value is never used with BLT engine.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Mon, 1 Jul 2019 20:20:58 +0000 (16:20 -0400)]
etnaviv: remove initial CPU ts clear
Since we have "ts_valid" to avoid using uncleared ts, this memset serves
no purpose. Also it is broken because it doesn't use cpu_prep/cpu_fini.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Mon, 1 Jul 2019 20:16:54 +0000 (16:16 -0400)]
etnaviv: implement TS_MODE for GC7000L
GC7000L has a TS mode with larger tiles, which improves performance.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Thu, 4 Jul 2019 17:30:55 +0000 (13:30 -0400)]
etnaviv: fix ts size calculation
The size of the TS is screen->specs.bits_per_tile bits per tile, with each
tile being 64 bytes of the resource.
This gives the same result for 32bpp formats, but reduces the size of TS
for 16bpp formats by 2.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Jonathan Marek [Mon, 1 Jul 2019 20:07:56 +0000 (16:07 -0400)]
etnaviv: update headers from rnndb
Update to etna_viv commit
8a8b13a and use new names in the code.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Eric Engestrom [Sat, 29 Jun 2019 23:17:50 +0000 (00:17 +0100)]
scons: s/HAVE_NO_AUTOCONF/HAVE_SCONS/
Back when autotools and scons were the two build systems, it kinda made
sense to call scons "not autoconf", but autoconf's been gone for a while
now and other build systems have been added (android.mk and meson), so
the name really doesn't make any sense anymore.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Bas Nieuwenhuizen [Thu, 4 Jul 2019 00:11:27 +0000 (02:11 +0200)]
radeonsi: Fix some warnings.
../mesa/src/gallium/drivers/radeonsi/si_compute_blit.c: In function ‘si_clear_buffer’:
../mesa/src/gallium/drivers/radeonsi/si_compute_blit.c:195:11: warning: unused variable ‘clear_alignment’ [-Wunused-variable]
unsigned clear_alignment = MIN2(clear_value_size, 4);
^~~~~~~~~~~~~~~
[23/60] Compiling C object 'src/gallium/drivers/radeonsi/
3cdc30e@@radeonsi@sta/si_compute_prim_discard.c.o'.
../mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c: In function ‘si_prepare_prim_discard_or_split_draw’:
../mesa/src/gallium/drivers/radeonsi/si_compute_prim_discard.c:1106:7: warning: unused variable ‘compute_has_space’ [-Wunused-variable]
bool compute_has_space = sctx->ws->cs_check_space(cs, need_compute_dw, false);
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 1 Jul 2019 15:26:09 +0000 (17:26 +0200)]
amd/common: move ac_shader_{binary,reloc} into r600 and rename
They are no longer used by radeonsi or radv.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 1 Jul 2019 14:57:48 +0000 (16:57 +0200)]
amd/common: removed unused ac_shader_binary functions
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 1 Jul 2019 14:55:28 +0000 (16:55 +0200)]
amd/common: remove unused ac_compile_module_to_binary
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bas Nieuwenhuizen [Mon, 1 Jul 2019 01:21:58 +0000 (03:21 +0200)]
radv: Switch to using rtld.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 1 Jul 2019 00:19:13 +0000 (02:19 +0200)]
radv: Move more stuff to variant create time.
Due to them depending on the linker result.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sun, 30 Jun 2019 23:29:24 +0000 (01:29 +0200)]
radv: Add the concept of radv shader binaries.
This simplifies a bunch of stuff by
(1) Keeping all the things in a single allocation, making things easier
for the cache.
(2) creating a shader_variant creation helper.
This is immediately put to use by creating rtld shader binaries. This
is the main reason for the binaries, as we need to do the linking at
upload time, i.e. post caching. We do not enable rtld yet.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Mon, 1 Jul 2019 22:41:47 +0000 (00:41 +0200)]
radv: Add export_prim_id to the shader variant info.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Tue, 2 Jul 2019 10:16:36 +0000 (12:16 +0200)]
radv: use last nir shader to determine stage in postprocessing
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Sat, 29 Jun 2019 23:47:30 +0000 (01:47 +0200)]
radv: Merge rsrc1/rsrc2 fields with the config fields.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Andres Gomez [Wed, 3 Jul 2019 14:02:42 +0000 (17:02 +0300)]
vulkan: Update headers to 1.1.113
Some headers were not dragged in the last update(s).
Fixes: 465ec0b145c ("vulkan: Update the XML and headers to 1.1.113")
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Samuel Pitoiset [Thu, 4 Jul 2019 06:54:49 +0000 (08:54 +0200)]
radv: do not crash when generating binning state for unknown chips
These values are only useful if binning is disabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 4 Jul 2019 06:54:48 +0000 (08:54 +0200)]
radv: fix potential crash in the compute resolve path
If the destination attachment is UNUSED.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tomeu Vizoso [Thu, 4 Jul 2019 07:59:30 +0000 (09:59 +0200)]
panfrost: Take into account off-screen FBOs
In that case, ctx->pipe_framebuffer.cbufs[0] can be NULL.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Cc: Boris Brezillon <boris.brezillon@collabora.com>
Fixes: 5375d009be18 ("panfrost: Pass referenced BOs to the SUBMIT ioctls")
Christian Gmeiner [Wed, 3 Jul 2019 19:06:54 +0000 (21:06 +0200)]
util/macros: rework DIV_ROUND_UP macro
Simplify used math.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Christian Gmeiner [Wed, 3 Jul 2019 21:12:33 +0000 (23:12 +0200)]
gitlab-ci: bump required libdrm version
Fixes following build problem:
Message: libdrm 2.4.99 needed because amdgpu has the highest requirement
Dependency libdrm_intel found: NO found '2.4.97' but need: '>=2.4.99'
Dependency libdrm_intel found: NO
meson.build:1178:4: ERROR: Invalid version of dependency, need 'libdrm_intel' ['>=2.4.99'] found '2.4.97'.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Kenneth Graunke [Wed, 3 Jul 2019 22:14:49 +0000 (15:14 -0700)]
iris: Fix MOCS for grid surface
Hardcoding 4 is bad; we have a function for this now.
Kenneth Graunke [Wed, 3 Jul 2019 22:12:17 +0000 (15:12 -0700)]
iris: Minor tidying
Marek Olšák [Thu, 4 Jul 2019 05:08:02 +0000 (01:08 -0400)]
Revert "mesa/st: Passthrough scissor when clearing by quad"
This reverts commit
0a88aa3025db0cc5a68222c7939d7da4d218f1be.
It breaks a lot of piglit tests.
Marek Olšák [Thu, 27 Jun 2019 17:20:31 +0000 (13:20 -0400)]
gallium/u_blitter: add return to fix the build
Alyssa Rosenzweig [Tue, 18 Jun 2019 20:37:16 +0000 (13:37 -0700)]
mesa/st: Passthrough scissor when clearing by quad
The scissor state -is- setup, but the scissor test is not enabled. This
can prevent certain optimizations from occurring on tilers where
unaffected tiles are thrown out entirely.
v2: Only enable scissor test if the scissor test is actually set by the
app, to avoid regressing quad-based clears used for other reasons (like
a color mask).
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 24 Oct 2017 11:40:53 +0000 (11:40 +0000)]
amd: add NAVI10 PCI IDs
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 25 Jun 2019 21:57:48 +0000 (17:57 -0400)]
radeonsi/gfx10: fix legacy GS
LLVM doesn't insert s_waitcnt_vscnt before GS_DONE.
There was also the crash in legacy GS copy shader.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Wed, 8 May 2019 01:06:15 +0000 (03:06 +0200)]
radeonsi/gfx10: disable clear state
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Mon, 13 Nov 2017 16:24:13 +0000 (17:24 +0100)]
radeonsi/gfx10: disable DPBB
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Mon, 13 Nov 2017 16:23:15 +0000 (17:23 +0100)]
radeonsi/gfx10: disable SDMA
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 25 Jun 2019 00:54:52 +0000 (20:54 -0400)]
radeonsi: determine the rasterization primitive type accurately (v2)
v2: reworked version to fix bugs and make it more efficient
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 25 Jun 2019 00:53:41 +0000 (20:53 -0400)]
radeonsi/gfx10: consolidate & improve input_prim determination for NGG
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Mon, 24 Jun 2019 20:13:24 +0000 (16:13 -0400)]
ac: rework ac_build_waitcnt for gfx10
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Mon, 24 Jun 2019 21:39:39 +0000 (17:39 -0400)]
radeonsi/gfx10: implement si_shader_vs
Only used with tessellation + GS instancing.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Sat, 22 Jun 2019 01:06:16 +0000 (21:06 -0400)]
radeonsi/gfx10: unpack GS invocation ID
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Fri, 21 Jun 2019 22:38:58 +0000 (18:38 -0400)]
radeonsi/gfx10: jump over the shader query atomic if the queries are disabled
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Fri, 14 Jun 2019 01:29:47 +0000 (21:29 -0400)]
radeonsi/gfx10: cosmetic changes
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Thu, 6 Jun 2019 04:25:40 +0000 (00:25 -0400)]
radeonsi/gfx10: set cache control registers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Thu, 6 Jun 2019 00:20:47 +0000 (20:20 -0400)]
radeonsi/gfx10: export correct PrimitiveID from NGG vertex shaders
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 5 Jun 2019 19:04:45 +0000 (15:04 -0400)]
radeonsi/gfx10: set PA_SC_TILE_STEERING_OVERRIDE
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 5 Jun 2019 05:54:46 +0000 (01:54 -0400)]
radeonsi/gfx10: add a workaround for stencil HTILE with mipmapping
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 5 Jun 2019 05:37:01 +0000 (01:37 -0400)]
radeonsi/gfx10: disable DCC with MSAA
It was only enabled for 2x MSAA anyway.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Thu, 30 May 2019 00:06:16 +0000 (20:06 -0400)]
radeonsi/gfx10: fix GL_LINE polygon mode for decomposed primitives
We need to tell PA to accept edge flags generated by the input assembler,
because decomposed primitives shouldn't draw inner edges.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 29 May 2019 20:32:17 +0000 (16:32 -0400)]
radeonsi/gfx10: fix NGG GS color clamping
Just need to pass the input from ES to GS. Everything else is done.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 28 May 2019 23:55:09 +0000 (19:55 -0400)]
radeonsi/gfx10: fix vertex color clamping for TES
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 29 May 2019 02:29:08 +0000 (22:29 -0400)]
radeonsi/gfx10: unbind NGG shaders when destroyed
This fixes glsl-max-varyings, which creates shaders, draws, and then
destroys them.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 29 May 2019 02:06:52 +0000 (22:06 -0400)]
radeonsi/gfx10: don't use the GS workaround for triangle strips w/ adjancency
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 29 May 2019 02:01:09 +0000 (22:01 -0400)]
radeonsi/gfx10: don't do the query buffer atomic for blit shaders
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 28 May 2019 23:56:08 +0000 (19:56 -0400)]
radeonsi/gfx10: update spi_map if API VS (as NGG) changes and PS doesn't
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 28 May 2019 23:52:53 +0000 (19:52 -0400)]
radeonsi/gfx10: fix a possible hang with exp pos0 with done=0 and exec=0
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 28 May 2019 22:55:30 +0000 (18:55 -0400)]
radeonsi/gfx10: prefetch HW GS when NGG is used
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Mon, 27 May 2019 14:16:39 +0000 (16:16 +0200)]
amd/common/gfx10: set DLC for llvm.amdgcn.s.buffer.load
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Sat, 25 May 2019 02:49:27 +0000 (22:49 -0400)]
radeonsi/gfx10: fix PS exports for SPI_SHADER_32_AR
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Fri, 24 May 2019 22:48:39 +0000 (18:48 -0400)]
radeonsi/gfx10: set DLC for loads when GLC is set
This fixes L1 shader array cache coherency.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Fri, 24 May 2019 21:25:04 +0000 (17:25 -0400)]
radeonsi/gfx10: fix shader images
Don't promote 2D image instructions to 3D, and don't set z=BASE_ARRAY.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Thu, 23 May 2019 18:20:27 +0000 (14:20 -0400)]
radeonsi/gfx10: set the DCC constant encoding flag
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 22 May 2019 22:21:55 +0000 (18:21 -0400)]
radeonsi/gfx10: fix intensity formats
move the ALPHA_IS_ON_MSB fixup into vi_alpha_is_on_msb
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 5 Jun 2019 02:08:41 +0000 (22:08 -0400)]
radeonsi/gfx10: allocate GDS BOs for streamout
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Wed, 5 Jun 2019 02:02:25 +0000 (22:02 -0400)]
radeonsi/gfx10: make sure GDS is idle between IBs
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Fri, 21 Sep 2018 20:07:01 +0000 (22:07 +0200)]
radeonsi/gfx10: implement streamout
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Wed, 19 Sep 2018 12:53:35 +0000 (14:53 +0200)]
radeonsi/gfx10: implement streamout-related queries
The NGG hardware pipeline doesn't track these statistics automatically,
and in fact *cannot* track them automatically when API geometry shaders
are involved, so we accumulate statistics in the shader using atomic
adds.
This implementation accumulates statistics via the memory system and
the RW buffer descriptor setup. We could use GDS, but since these
atomics aren't latency-sensitive, that basically just trades off
L2$ bandwidth vs. export bus bandwidth. One single memory transaction
per shader workgroup doesn't seem too bad. The result ring buffer in
memory is needed either way to avoid pipeline stalls.
The shader code contains the atomic unconditionally, though the
GFX10_GS_QUERY_BUF is a null buffer when no queries are active. The
atomic is simply discarded by the shader hardware in that case.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Mon, 1 Apr 2019 13:58:31 +0000 (15:58 +0200)]
radeonsi/gfx10: enable the workaround for unaligned vertex fetch
Yes, really. Note that non-format buffer loads are unaffected and work
just fine with unaligned pointers (as long as SH_MEM_CONFIG is setup
correctly, which amdgpu ensures).
Fixes e.g. KHR-GL45.vertex_attrib_64bit.vao
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Thu, 20 Sep 2018 08:18:07 +0000 (10:18 +0200)]
radeonsi/gfx10: re-order the initialization order in si_compile_tgsi_main
It's useful to be able to access gs_ngg_scratch before creating the
main wrapping branch.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Tue, 18 Sep 2018 12:16:43 +0000 (14:16 +0200)]
radeonsi/gfx10: apply DCC MSAA blend workaround
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Thu, 7 Feb 2019 17:53:27 +0000 (18:53 +0100)]
radeonsi/gfx10: implement si_emit_global_shader_pointers
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Fri, 31 Aug 2018 17:59:48 +0000 (19:59 +0200)]
radeonsi/gfx10: implement si_init_tess_factor_ring
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Fri, 31 Aug 2018 17:59:36 +0000 (19:59 +0200)]
radeonsi/gfx10: initialize EXEC for TES-as-NGG (without geometry shader)
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Fri, 31 Aug 2018 17:58:35 +0000 (19:58 +0200)]
radeonsi/gfx10: use correct VGPR for instance ID in LS shader
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Fri, 31 Aug 2018 17:54:59 +0000 (19:54 +0200)]
radeonsi/gfx10: implement si_shader_hs
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Fri, 31 Aug 2018 17:53:52 +0000 (19:53 +0200)]
radeonsi/gfx10: implement si_create_sampler_state
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Thu, 30 Aug 2018 15:06:52 +0000 (17:06 +0200)]
radeonsi/gfx10: double the number of tessellation offchip buffers per SE
Each gfx10 shader engine corresponds to two gfx9 shader engines, so scale
the number of offchip buffers accordingly.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Tue, 28 Aug 2018 14:00:28 +0000 (16:00 +0200)]
radeonsi/gfx10: implement get_tess_ring_descriptor
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Mon, 23 Jul 2018 07:47:19 +0000 (09:47 +0200)]
radeonsi/gfx10: mask DCC tile swizzle by alignment
DCC alignment can be less than the alignment of the main surface. In that
case, the DCC tile swizzle needs to be masked accordingly. Should have no
impact on pre-gfx10.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Mon, 2 Jul 2018 16:50:48 +0000 (18:50 +0200)]
radeonsi/gfx10: implement hardware MSAA resolve
MSAA is only supported for 64KB_{R,Z}_X modes, so the micro tile
optimization that we use on gfx9 and earlier does not work.
Be very explicit about how the swizzle mode of the temporary surface is
selected.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Thu, 21 Jun 2018 12:56:54 +0000 (14:56 +0200)]
radeonsi/gfx10: fix binding on si_update_scratch_relocs
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Tue, 19 Jun 2018 15:44:24 +0000 (17:44 +0200)]
radeonsi/gfx10: set llvm_has_working_vgpr_indexing
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Fri, 1 Jun 2018 14:04:02 +0000 (16:04 +0200)]
radeonsi/gfx10: implement load_const_buffer_desc_fast_path
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Fri, 1 Jun 2018 14:03:31 +0000 (16:03 +0200)]
radeonsi/gfx10: take PRIMID from the correct output when exported by GS
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Wed, 30 May 2018 20:47:10 +0000 (22:47 +0200)]
radeonsi/gfx10: change location of instance ID shader input
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Nicolai Hähnle [Wed, 30 May 2018 20:45:06 +0000 (22:45 +0200)]
radeonsi/gfx10: set USER_DATA_ADDR offset for geometry shaders
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>