Kenneth Graunke [Fri, 23 Aug 2019 18:08:48 +0000 (11:08 -0700)]
util: Add a _mesa_i64roundevenf() helper.
This always returns a int64_t, translating to _mesa_lroundevenf on
systems where long is 64-bit, and llrintf where "long long" is needed.
Fixes: 594fc0f8595 ("mesa: Replace F_TO_I() with _mesa_lroundevenf().")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Adam Jackson [Thu, 22 Aug 2019 19:15:28 +0000 (15:15 -0400)]
glx: Unset the direct_support bit for GLX_EXT_import_context
GLX_EXT_import_context operates only on indirect contexts, a direct
context cannot possibly support it. Without this change the extension
will appear in the combined GLX extension string even if it is missing
from the server string, indicating a lack of required server support.
Daniel Kolesa [Tue, 27 Aug 2019 19:47:48 +0000 (21:47 +0200)]
util: add auxv based PowerPC AltiVec/VSX detection
At least on Linux, we can use the ELF auxiliary vector to
detect the presence of AltiVec, VSX and other CPU features
without having to go through handling SIGILL, which has
various problems of its own.
A similar thing is already being done for ARM to detect NEON.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Daniel Kolesa <daniel@octaforge.org>
Kenneth Graunke [Sat, 24 Aug 2019 01:23:32 +0000 (18:23 -0700)]
intel/compiler: Use new Gen11 headerless RT writes for MRT cases
Gen11 adds support for specifying the render target index and src0
alpha present bits in the extended message descriptor. Previously,
we had to use a message header for this, requiring extra instructions
to write the fields, and two registers of extra payload.
Improves performance on my ICL 8x8 frequency locked to 700Mhz, on iris:
GfxBench5 Manhattan 3.0: 2.13635% +/- 0.159859% (n=5)
GfxBench5 Aztec Ruins: 1.57173% +/- 0.128749% (n=5)
Synmark2 OglDeferred: 2.86914% +/- 0.191211% (n=10)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 26 Aug 2019 07:05:21 +0000 (00:05 -0700)]
intel/compiler: Use generic SEND for Gen7+ FB writes
This takes care of generate_fb_write/fire_fb_write/brw_fb_WRITE's stuff
earlier in the visitor. It will also make it easier to generate SENDSC
messages with indirect extended descriptors in a few patches.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 26 Aug 2019 06:59:25 +0000 (23:59 -0700)]
intel/compiler: Refactor FB write message control setup into a helper.
This will be used by visitor code to convert directly to SEND in a bit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 26 Aug 2019 06:43:29 +0000 (23:43 -0700)]
intel/compiler: Handle bits 15:12 in brw_send_indirect_split_message()
Annoyingly, these bits exist in some extended message descriptors
(in particular render target writes), but they don't have any
corresponding bits in the ISA encoding. So we can't use an immediate
and have to fall back to an indirect extended descriptor.
Thanks to Jason Ekstrand for reminding me that you can still set these
bits via an indirect descriptor, even if they don't exist in the ISA.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 26 Aug 2019 22:21:40 +0000 (15:21 -0700)]
intel/compiler: Fix src0/desc setter ordering
src0 vstride and type overlap with bits of the extended descriptor.
brw_set_desc() also sets the extended descriptor to 0. So by setting
the descriptor, then setting src0, we were accidentally setting a bunch
of extended descriptor bits unintentionally.
When using this infrastructure for framebuffer writes (in a future
patch), this ended up setting the extended descriptor bit 20, which is
"Null Render Target" on Icelake, causing nothing to be written to the
framebuffer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Marek Olšák [Mon, 19 Aug 2019 17:15:54 +0000 (13:15 -0400)]
radeonsi: fix scratch buffer WAVESIZE setting leading to corruption
Cc: 19.2 19.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Mon, 19 Aug 2019 18:37:33 +0000 (14:37 -0400)]
radeonsi: unbind blend/DSA/rasterizer state correctly in delete functions
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111414
Fixes: b758eed9c37 ("radeonsi: make sure that blend state != NULL and remove all NULL checking")
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Mon, 19 Aug 2019 17:06:47 +0000 (13:06 -0400)]
radeonsi: align scratch and ring buffer allocations for faster memory access
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 22:03:50 +0000 (18:03 -0400)]
radeonsi: consolidate determining VGPR_COMP_CNT for API VS
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 04:18:17 +0000 (00:18 -0400)]
radeonsi/gfx10: set PA_CL_VS_OUT_CNTL with CONTEXT_REG_RMW to fix edge flags
We need two different values of the register, one for NGG and one for
legacy, in order to fix edge flags for the legacy pipeline.
Passing the ngg flag to emit_clip_regs would be too complicated,
so CONTEXT_REG_RMW is used for partial register updates.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 04:28:23 +0000 (00:28 -0400)]
radeonsi/gfx10: remove incorrect ngg/pos_writes_edgeflag variables
It varies depending on si_shader_key::as_ngg.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 04:13:17 +0000 (00:13 -0400)]
radeonsi: add PKT3_CONTEXT_REG_RMW
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 02:44:16 +0000 (22:44 -0400)]
winsys/amdgpu+radeon: process AMD_DEBUG in addition to R600_DEBUG
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 02:37:52 +0000 (22:37 -0400)]
radeonsi/gfx10: add AMD_DEBUG=nongg
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 20 Aug 2019 22:57:36 +0000 (18:57 -0400)]
radeonsi/gfx10: finish up Navi14, add PCI ID
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 20 Aug 2019 22:57:01 +0000 (18:57 -0400)]
radeonsi/gfx10: always use the legacy pipeline for streamout
The best way to prevent GDS hangs is not to use GDS.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 00:58:18 +0000 (20:58 -0400)]
radeonsi/gfx10: don't initialize VGT_INSTANCE_STEP_RATE_0
Only gfx9 and older use it to get InstanceID in VGPR1.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 00:56:22 +0000 (20:56 -0400)]
radeonsi/gfx10: fix InstanceID for legacy VS+GS
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 00:39:08 +0000 (20:39 -0400)]
radeonsi/gfx10: add as_ngg variant for VS as ES to select Wave32/64
Legacy GS only works with Wave64.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 00:08:38 +0000 (20:08 -0400)]
radeonsi/gfx10: create the GS copy shader if using legacy streamout
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Wed, 21 Aug 2019 00:07:26 +0000 (20:07 -0400)]
radeonsi/gfx10: fix the PRIMITIVES_GENERATED query if using legacy streamout
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 20 Aug 2019 22:43:14 +0000 (18:43 -0400)]
radeonsi/gfx10: fix tessellation for the legacy pipeline
ported from PAL
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 20 Aug 2019 17:36:45 +0000 (13:36 -0400)]
radeonsi: move some global shader cache flags to per-binary flags
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Marek Olšák [Tue, 20 Aug 2019 17:20:07 +0000 (13:20 -0400)]
radeonsi/gfx10: fix the legacy pipeline by storing as_ngg in the shader cache
It could load an NGG shader when we want a legacy shader and vice versa.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Kenneth Graunke [Tue, 27 Aug 2019 20:14:53 +0000 (13:14 -0700)]
iris: Delete dead prototype
Boris Brezillon [Tue, 27 Aug 2019 18:07:03 +0000 (20:07 +0200)]
Revert "panfrost: Free all block/instruction objects before leaving midgard_compile_shader_nir()"
This reverts commit
5882e0def97a47aff050f5a3f412b97a7f440e27.
This commit causes a segfault.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Boris Brezillon [Tue, 27 Aug 2019 09:12:02 +0000 (11:12 +0200)]
panfrost: Make sure bundle.instructions[] contains valid instructions
Add an assert() in schedule_bundle() to make sure all instruction
pointers in bundle.instructions[] are valid.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Boris Brezillon [Tue, 13 Aug 2019 23:56:30 +0000 (01:56 +0200)]
panfrost: Free all block/instruction objects before leaving midgard_compile_shader_nir()
Right now we're leaking all block and instruction objects allocated by
the compiler. Let's clean things up before leaving
midgard_compile_shader_nir().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Boris Brezillon [Tue, 13 Aug 2019 23:54:24 +0000 (01:54 +0200)]
panfrost: Free the instruction object in mir_remove_instruction()
To avoid memory leaks.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Eric Engestrom [Thu, 22 Aug 2019 11:12:20 +0000 (12:12 +0100)]
scons: add support for MAJOR_IN_{MKDEV,SYSMACROS}
src/gallium/winsys/svga/drm/vmw_screen.c: In function ‘vmw_dev_compare’:
src/gallium/winsys/svga/drm/vmw_screen.c:48:12: warning: implicit declaration of function ‘major’ [-Wimplicit-function-declaration]
48 | return (major(*(dev_t *)key1) == major(*(dev_t *)key2) &&
| ^~~~~
src/gallium/winsys/svga/drm/vmw_screen.c:49:12: warning: implicit declaration of function ‘minor’ [-Wimplicit-function-declaration]
49 | minor(*(dev_t *)key1) == minor(*(dev_t *)key2)) ? 0 : 1;
| ^~~~~
That file (and many others) already has the proper #include with their
respective guards, but scons wasn't defining them, resulting in implicit
functions being used instead (and an always-true check that's probably
breaking something down the line).
Note that I'm cheating a bit here because Scons doesn't seem to have
a clean way to detect the existence of major() et al. as functions or
macros, so I'm taking the shortcut of just detecting the presence of the
header and assuming its contents is what we expect.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-By: Jose Fonseca <jfonseca@vmware.com>
Samuel Pitoiset [Fri, 23 Aug 2019 06:55:53 +0000 (08:55 +0200)]
radv: make use of has_ls_vgpr_init_bug
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 23 Aug 2019 06:52:07 +0000 (08:52 +0200)]
ac: add has_ls_vgpr_init_bug to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 23 Aug 2019 06:50:06 +0000 (08:50 +0200)]
ac: add has_msaa_sample_loc_bug to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 21 Aug 2019 09:32:25 +0000 (11:32 +0200)]
ac: add rbplus_allowed to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 21 Aug 2019 09:21:05 +0000 (11:21 +0200)]
ac: add has_tc_compat_zrange_bug to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Tue, 20 Aug 2019 15:38:43 +0000 (17:38 +0200)]
ac: add has_gfx9_scissor_bug to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Tue, 20 Aug 2019 15:20:42 +0000 (17:20 +0200)]
ac: add cpdma_prefetch_writes_memory to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Tue, 20 Aug 2019 15:16:41 +0000 (17:16 +0200)]
ac: add has_out_of_order_rast to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Tue, 20 Aug 2019 15:15:46 +0000 (17:15 +0200)]
ac: add has_load_ctx_reg_pkt to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 2 Aug 2019 10:21:04 +0000 (12:21 +0200)]
ac: add has_rbplus to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 2 Aug 2019 10:16:54 +0000 (12:16 +0200)]
ac: add has_dcc_constant_encode to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 2 Aug 2019 10:13:20 +0000 (12:13 +0200)]
ac: add has_distributed_tess to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 2 Aug 2019 10:10:43 +0000 (12:10 +0200)]
ac: add has_clear_state to ac_gpu_info
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 23 Aug 2019 06:22:51 +0000 (08:22 +0200)]
ac: drop llvm8 from some load/store helpers
Cleanup.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Dave Airlie [Tue, 27 Aug 2019 03:36:03 +0000 (13:36 +1000)]
gallivm: fix appveyor build after images changes
Dave Airlie [Sat, 20 Jul 2019 04:30:14 +0000 (14:30 +1000)]
docs: add shader image extensions for llvmpipe
v1.1: fix typo in llvmpipe name (ajax)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Sat, 20 Jul 2019 04:29:00 +0000 (14:29 +1000)]
llvmpipe: enable ARB_shader_image_load_store
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Thu, 22 Aug 2019 01:28:40 +0000 (11:28 +1000)]
llvmpipe: flush on api memorybarrier.
Until we have somewhere we can do better, just hit it with a hammer.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Wed, 21 Aug 2019 00:19:16 +0000 (10:19 +1000)]
gallivm: add memory barrier support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Wed, 21 Aug 2019 06:43:55 +0000 (16:43 +1000)]
gallivm: add support for fences api on older llvm
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Sat, 20 Jul 2019 04:28:45 +0000 (14:28 +1000)]
llvmpipe: bind vertex/geometry shader images
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Sat, 20 Jul 2019 04:28:23 +0000 (14:28 +1000)]
llvmpipe: add fragment shader image support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Sat, 20 Jul 2019 04:26:48 +0000 (14:26 +1000)]
draw: add vs/gs images support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Fri, 19 Jul 2019 09:06:48 +0000 (19:06 +1000)]
gallivm: add image load/store/atomic support
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Fri, 19 Jul 2019 06:33:03 +0000 (16:33 +1000)]
gallivm/tgsi: add image interface to tgsi builder
This adds the callbacks for the driver/gallium binding for
image operations.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Fri, 19 Jul 2019 06:29:10 +0000 (16:29 +1000)]
llvmpipe: introduce image jit type to fragment shader jit.
This adds the image type to the fragment shader jit context
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Fri, 19 Jul 2019 06:28:12 +0000 (16:28 +1000)]
draw: add jit image type for vs/gs images.
This introduces the jit image type into the jit interface
for vertex/geom shaders
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Fri, 19 Jul 2019 06:07:47 +0000 (16:07 +1000)]
llvmpipe: move the fragment shader variant key to dynamic length.
This mirrors the vs/gs keys, and will be needed when adding images
support.
The const changes also mirror how the draw code work (as is needed
when we add images)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Fri, 19 Jul 2019 05:52:14 +0000 (15:52 +1000)]
gallivm: add a basic image limit
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Fri, 19 Jul 2019 05:45:22 +0000 (15:45 +1000)]
llvmpipe: handle early test property.
Also handle setting late for shaders that use stores
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Fri, 19 Jul 2019 05:04:55 +0000 (15:04 +1000)]
gallivm: move first/last level jit texture members.
This lets us create an image structure with the same basic
types as the texture one.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Thu, 4 Jul 2019 01:33:22 +0000 (11:33 +1000)]
gallivm: handle helper invocation (v2)
Just invert the exec_mask to get if this is a helper or not.
v2: get the bld mask (Roland)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Sun, 30 Jun 2019 20:49:59 +0000 (06:49 +1000)]
gallivm: make lp_build_float_to_r11g11b10 take a const src
This allows using it with a const src later.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Fri, 28 Jun 2019 21:34:18 +0000 (07:34 +1000)]
llvmpipe: refactor jit type creation
This just cleans the code up so the texture/sampler type
creation can be reused.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Dave Airlie [Tue, 20 Aug 2019 05:44:50 +0000 (15:44 +1000)]
gallivm: fix atomic compare-and-swap
Not sure how I missed this before, but compswap was hitting an
assert here as it is it's own special case.
Fixes: b5ac381d8f ("gallivm: add buffer operations to the tgsi->llvm conversion.")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Paulo Zanoni [Wed, 14 Aug 2019 00:02:13 +0000 (17:02 -0700)]
intel/fs: grab fail_msg from v32 instead of v16 when v32->run_cs fails
Looks like a copy/paste error. This patch prevents a segfault when
running the following on BDW:
INTEL_DEBUG=no8,no16,do32 ./deqp-vk -n \
dEQP-VK.subgroups.arithmetic.compute.subgroupmin_dvec4
For the curious, the message we're getting is:
CS compile failed: Failure to register allocate. Reduce number
of live scalar values to avoid this.
Fixes: 864737ce6cd5 ("i965/fs: Build 32-wide compute shader when needed.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Alyssa Rosenzweig [Mon, 26 Aug 2019 19:48:14 +0000 (12:48 -0700)]
pan/midgard: Fix invert fusing with r26
The invert wasn't applying (correctly) due to the issues addressed here.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Mon, 26 Aug 2019 18:58:27 +0000 (11:58 -0700)]
pan/midgard: Fold ssa_args into midgard_instruction
This is just a bit of refactoring to simplify MIR.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Eric Anholt [Wed, 14 Aug 2019 19:23:46 +0000 (12:23 -0700)]
gallium: Add the ASTC 3D formats.
No driver implements them yet, but this is a long way toward gallium
having matching format enums for Mesa formats.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Wed, 14 Aug 2019 19:16:46 +0000 (12:16 -0700)]
gallium: Add block depth to the format utils.
I decided not to update nblocks() with a depth arg as the callers
wouldn't be doing ASTC 3D.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Wed, 14 Aug 2019 19:09:04 +0000 (12:09 -0700)]
gallium: Add a block depth field to the u_formats table.
To add ASTC 3D compression formats, we need to be able to express the
block depth. While I'm touching every line, line up the columns of
the CSV again as they've drifted over time.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Alyssa Rosenzweig [Fri, 23 Aug 2019 23:14:13 +0000 (16:14 -0700)]
pan/midgard: Add imov->fmov optimization
When moving constants, if switching to a floating-point representation
doesn't break anything, we'd rather have an fmov than an imov,
permitting inlining the constant in many circumstances.
total quadwords in shared programs: 3408 -> 3366 (-1.23%)
quadwords in affected programs: 1188 -> 1146 (-3.54%)
helped: 41
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.02 x̃: 1
helped stats (rel) min: 0.19% max: 25.00% x̄: 9.65% x̃: 11.11%
95% mean confidence interval for quadwords value: -1.07 -0.98
95% mean confidence interval for quadwords %-change: -11.38% -7.93%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Alyssa Rosenzweig [Fri, 23 Aug 2019 23:02:49 +0000 (16:02 -0700)]
pan/midgard: Switch constants to uint32
Storing constants as float doesn't make sense when we have integer
instructions; better to switch to be integer natively and coerce to/from
float rather than the opposite.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Kenneth Graunke [Sat, 24 Aug 2019 00:32:06 +0000 (17:32 -0700)]
isl: Don't set UnormPathInColorPipe for integer surfaces.
This fixes dEQP-GLES3.functional.texture.specification subtests on iris:
- texsubimage3d_depth.depth24_stencil8_2d_array
- texsubimage3d_depth.depth32f_stencil8_2d_array
- texsubimage3d_depth.depth_component32f_2d_array
- texsubimage3d_depth.depth_component24_2d_array
- texstorage2d.format.depth24_stencil8_2d
- texstorage2d.format.depth32f_stencil8_2d
- texstorage2d.format.depth_component24_2d
- texstorage2d.format.depth_component32f_2d
- texstorage3d.format.depth24_stencil8_2d_array
- texstorage3d.format.depth32f_stencil8_2d_array
- texstorage3d.format.depth_component24_2d_array
- texstorage3d.format.depth_component32f_2d_array
Here, something appears to be going wrong with having this bit set
during blorp_copy operations for texture upload, which override the
format to R8G8B8A8_UINT.
AFAICT this bit should have no effect for integer surfaces, as it has
to do with blending, and integer blending is not a thing. So it should
be harmless to disable it.
The Windows driver appears to be setting this bit universally, so
I am unclear why we would need to. Perhaps they simply haven't run
into this issue.
Fixes: f741de236b5 ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Sat, 24 Aug 2019 00:32:24 +0000 (17:32 -0700)]
isl: Drop UnormPathInColorPipe for buffer surfaces.
Jason suggested I remove this in review, and he's right. AFAICT this
affects blending, and that just isn't going to happen on buffers.
Fixes: f741de236b5 ("isl: Enable Unorm Path in Color Pipe")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Alyssa Rosenzweig [Mon, 26 Aug 2019 14:46:43 +0000 (07:46 -0700)]
pan/midgard, bifrost: Set lower_fdph = true
fdph instructions show up in some desktop GL shaders.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Samuel Pitoiset [Thu, 6 Jun 2019 15:30:17 +0000 (17:30 +0200)]
radv: add mipmap support for the clear depth/stencil values
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 6 Jun 2019 15:23:17 +0000 (17:23 +0200)]
radv: add mipmap support for the TC-compat zrange bug
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 23 Aug 2019 08:48:20 +0000 (10:48 +0200)]
radv: allocate metadata space for mipmapped depth/stencil images
For each mipmaps, the driver will store the clear values (8-bytes)
and the TC-compat zrange value (4-bytes).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 6 Jun 2019 10:03:10 +0000 (12:03 +0200)]
radv: decompress mipmapped depth/stencil images during transitions
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 6 Jun 2019 09:57:19 +0000 (11:57 +0200)]
radv: add mipmaps support for decompress/resummarize
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Thu, 6 Jun 2019 09:46:21 +0000 (11:46 +0200)]
radv: add radv_process_depth_image_layer() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Connor Abbott [Fri, 23 Aug 2019 08:46:53 +0000 (10:46 +0200)]
ac/nir: Remove gfx9_stride_size_workaround_for_atomic
The workaround was entirely in common code, and it's needed in radeonsi
too so just always do it when necessary. Fixes
KHR-GL45.shader_image_load_store.advanced-allStages-oneImage on gfx9
with LLVM 8.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Connor Abbott [Thu, 22 Aug 2019 15:09:21 +0000 (17:09 +0200)]
ac/nir: add a workaround for viewing a slice of 3D as a 2D image
GL and Vulkan allow you to bind a single layer of a 3D texture to a 2D
image, and we weren't implementing a workaround for that on gfx9 that
TGSI was. Copy it over.
Fixes KHR-GL45.shader_image_load_store.non-layered_binding with radeonsi
NIR.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Fri, 23 Aug 2019 07:23:21 +0000 (09:23 +0200)]
radv: fix getting the index type size for uint8_t
16-bit and 32-bit values match hardware values but 8-bit doesn't.
This fixes dEQP-VK.pipeline.input_assembly.* with 8-bit index.
Fixes: 372c3dcfdb8 ("radv: implement VK_EXT_index_type_uint8")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
Dave Airlie [Thu, 22 Aug 2019 06:30:11 +0000 (16:30 +1000)]
virgl: fix format conversion for recent gallium changes.
The virgl formats are fixed in time snapshots of the gallium ones,
we just need to provide a translation table between them when
we enter the hardware.
This fixes a regression since Eric renumbered the gallium table.
Fixes: c45c33a5a2 (gallium: Remove manual defining of PIPE_FORMAT enum values.)
Bugzilla: https://bugs.freedesktop.org/111454
v1 by Dave Airlie <airlied@redhat.com>
v2: virgl: Add a number of formats to the table that are used, e.g. for vertex
attributes
v3: cover some more missing formats from a piglit run
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Dave Airlie [Thu, 22 Aug 2019 06:20:39 +0000 (16:20 +1000)]
virgl: drop unused format field
Erico Nunes [Sat, 10 Aug 2019 20:46:02 +0000 (22:46 +0200)]
lima/ppir: enable vectorize optimization
pp has vector units and some operations can be optimized when bundled
together.
Benchmarking this with piglit shaders shows that the instruction count
can be greatly reduced on many examples with vectorize.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Erico Nunes [Sat, 10 Aug 2019 20:44:22 +0000 (22:44 +0200)]
lima/ppir: lower selects to scalars
nir vec4 fcsel assumes that each component of the condition will be used
to select the same component from the options, but pp can't implement
that since it only has 1 component for the condition.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Erico Nunes [Fri, 23 Aug 2019 04:34:36 +0000 (06:34 +0200)]
lima: fix ppir spill stack allocation
The previous spill stack was fixed and too small, and caused instability
in programs requiring spilling for roughly more than one value.
This patch adds a dynamic calculation of the buffer size based on stack
utilization and switches it to a separate allocation at flush time that
will fit the shader that requires the largest buffer.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Jason Ekstrand [Fri, 23 Aug 2019 20:33:24 +0000 (15:33 -0500)]
intel/fs: Drop the gl_program from fs_visitor
It's not used by anything anymore now that so much lowering has been
moved into NIR. Sadly, we still need on in brw_compile_gs() for
geometry shaders on Sandy Bridge. Short of a lot of pointless work,
that one's probably not going away.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Qiang Yu [Sat, 17 Aug 2019 08:40:49 +0000 (16:40 +0800)]
lima: move format handling to unified place
Create a unified table to handle pipe format to texture
and render target format lookup.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Alex Smith [Sun, 2 Jun 2019 10:32:06 +0000 (11:32 +0100)]
radv: Change memory type order for GPUs without dedicated VRAM
Put the uncached GTT type at a higher index than the visible VRAM type,
rather than having GTT first.
When we don't have dedicated VRAM, we don't have a non-visible VRAM
type, and the property flags for GTT and visible VRAM are identical.
According to the spec, for types with identical flags, we should give
the one with better performance a lower index.
Previously, apps which follow the spec guidance for choosing a memory
type would have picked the GTT type in preference to visible VRAM (all
Feral games will do this), and end up with lower performance.
On a Ryzen 5 2500U laptop (Raven Ridge), this improves average FPS in
the Rise of the Tomb Raider benchmark by up to ~30%. Tested a couple of
other (Feral) games and saw similar improvement on those as well.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
(Bas: CCing this to 19.2-rc due to high impact and limited complexity)
Vasily Khoruzhick [Tue, 20 Aug 2019 14:30:46 +0000 (07:30 -0700)]
lima/ppir: print register index and components number for spilled register
It can be useful for debugging purposes
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Vasily Khoruzhick [Mon, 19 Aug 2019 06:37:23 +0000 (23:37 -0700)]
lima/ppir: add control flow support
This commit adds support for nir_jump_instr, if and loop
nir_cf_nodes.
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Vasily Khoruzhick [Mon, 19 Aug 2019 06:34:22 +0000 (23:34 -0700)]
lima/ppir: add better liveness analysis
Add better liveness analysis that was modelled after one in vc4.
It uses live ranges and is aware of multiple blocks which is prerequisite
for adding CF support
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Vasily Khoruzhick [Fri, 23 Aug 2019 04:17:23 +0000 (21:17 -0700)]
lima/ppir: validate shader outputs
Mali4x0 supports only gl_FragColor. gl_FragDepth is not supported.
Check that we don't get anything but gl_FragColor in shader outputs.
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>