Marek Olšák [Thu, 9 Nov 2017 22:25:34 +0000 (23:25 +0100)]
radeonsi: do 64-bit LDS loads recursively
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Jon Turney [Sat, 11 Nov 2017 14:48:10 +0000 (14:48 +0000)]
mapi: Teach es{1,2}api/ABI-check shared library names on Cygwin
Ideally we'd be able to get the library filename from libtool, but that
doesn't seem to be a feature...
Use of ${uname} is presumably ok here as we won't be running 'make check' if
we are cross-compiling
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Samuel Pitoiset [Wed, 22 Nov 2017 15:13:28 +0000 (16:13 +0100)]
Revert "radv: remove unnecessary memset() in radv_AllocateCommandBuffers()"
This fixes two CTS regressions:
- dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_primary
- dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_secondary
These two tests are part of the mustpass lists, so presumably they
are correct and my change was wrong.
This reverts commit
0f68208f1d1d3b7b2963dab40e84c60212518692.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 22 Nov 2017 19:13:26 +0000 (20:13 +0100)]
radv/winsys: improve error messages when the buffer list creation failed
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 22 Nov 2017 19:13:25 +0000 (20:13 +0100)]
radv/winsys: do not try to create a BO list with 0 buffers
This happens when all BOs have the RADEON_FLAG_NO_INTERPROCESS_SHARING
(DRM version >= 3.23) flag set. This flag is mainly used for reducing
overhead on the userspace side because we don't have to put those BOs
inside the list.
Though, if the driver tries to create a list with 0 buffers inside it,
libdrm returns -EINVAL and the app just crashes.
This fixes a bunch of CTS dEQP-VK.sparse_resources.* failures (~100).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
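Editor's sketch of the guard described in the commit above, assuming a library call that rejects an empty list with -EINVAL. Both function names and the integer handle are hypothetical stand-ins for illustration; they are not the radv or libdrm API.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the library call: as described in the commit
 * message, creating a list with 0 buffers fails with -EINVAL. */
static int fake_bo_list_create(size_t num_buffers, int *list_handle)
{
    if (num_buffers == 0)
        return -EINVAL;
    *list_handle = 1; /* any non-zero handle means "list created" */
    return 0;
}

/* Hypothetical driver-side wrapper: when there is nothing to put in the
 * list, skip the creation call and report "no list" with handle 0. */
static int create_bo_list_if_needed(size_t num_buffers, int *list_handle)
{
    *list_handle = 0;
    if (num_buffers == 0)
        return 0; /* nothing to create, and not an error */
    return fake_bo_list_create(num_buffers, list_handle);
}
```

A caller would then treat a zero handle as "submit without a BO list" rather than as a failure.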
Iago Toral Quiroga [Tue, 21 Nov 2017 10:33:53 +0000 (11:33 +0100)]
i965/vec4: fix splitting of interleaved attributes
When we split an instruction that reads a uniform value
(vstride 0) we need to respect the vstride on the second
half of the instruction (that is, the second half should
read the same region as the first).
We were doing this already, but we didn't account for
stages that have interleaved input attributes which also
have a vstride of 0 and need the same treatment.
Fixes the following on Haswell:
KHR-GL45.enhanced_layouts.varying_locations
KHR-GL45.enhanced_layouts.varying_array_locations
KHR-GL45.enhanced_layouts.varying_structure_locations
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Andres Gomez <agomez@igalia.com>
Wladimir J. van der Laan [Thu, 23 Nov 2017 09:08:34 +0000 (10:08 +0100)]
etnaviv: Emit vertex buffers consecutively
Vertex buffer legacy state is no longer picked up with new drawing
commands. Change to use different cases depending on the number of
vertex streams in the GPU specs.
This results in slightly more compact state emission as well, on all
vivantes.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Eric Engestrom [Thu, 9 Nov 2017 17:38:25 +0000 (17:38 +0000)]
REVIEWERS: add Alexander von Gluck IV as a reviewer for Haiku
There's been some Haiku-related activity lately, so let's document who
to cc on these patches.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Alexander von Gluck IV <kallisti5@unixzen.com>
Eric Engestrom [Wed, 22 Nov 2017 10:11:13 +0000 (10:11 +0000)]
genxml: fix assert guards
This removes a few hundred warnings on debug builds with asserts off.
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Engestrom [Tue, 21 Nov 2017 15:07:50 +0000 (15:07 +0000)]
meson: add variable for mapi_abi.py instead of going back up the tree
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Eric Engestrom [Tue, 21 Nov 2017 15:07:11 +0000 (15:07 +0000)]
meson: reorder subdirs to avoid directly including more than one level
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Eric Engestrom [Tue, 21 Nov 2017 14:24:01 +0000 (14:24 +0000)]
meson: fix strtof locale support check
Fixes: d1992255bb29054fa5176 "meson: Add build Intel "anv" vulkan driver"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Roland Scheidegger [Wed, 22 Nov 2017 02:11:33 +0000 (03:11 +0100)]
r600: set DX10_CLAMP for compute shader too
I really intended to set this for all shader stages by
3835009796166968750ff46cf209f6d4208cda86 but missed it for compute shaders
(because it's in a different source file...).
Reviewed-by: Dave Airlie <airlied@redhat.com>
Lionel Landwerlin [Fri, 17 Nov 2017 17:29:26 +0000 (17:29 +0000)]
anv: flag batch & instruction BOs for capture
When the kernel supports flagging our BOs, let's mark batch &
instruction BOs for capture so they can be included in the error
state.
v2: Only add EXEC_CAPTURE if supported (Kristian)
v3: Fix operator precedence issue (Lionel)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel Landwerlin [Fri, 17 Nov 2017 17:26:59 +0000 (17:26 +0000)]
anv: setup BO flags at state_pool/block_pool creation
This will allow setting the flags on any anv_bo created/filled from a
state pool or block pool later.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gert Wollny [Wed, 15 Nov 2017 09:29:11 +0000 (10:29 +0100)]
r600/shader: Fix all warnings issued with "-Wall -Wextra"
- fix a number of -Wsign-compare warnings
- fix two warnings for -Woverride-init because TGSI_OPCODE_CEIL == 83, and
the corresponding field was defined twice.
[airlied: don't use -1 with unsigned type,
fix whitespace]
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Gert Wollny [Fri, 17 Nov 2017 11:13:40 +0000 (12:13 +0100)]
r600: Emit EOP for more CF instruction types
So far on pre-cayman chipsets, for the CF instructions CF_OP_LOOP_END,
CF_OP_CALL_FS, CF_OP_POP, and CF_OP_GDS, an extra CF_NOP instruction
was added to carry the EOP flag, even though this is not actually
needed, because all these instructions support the EOP flag.
This patch removes the fixup code, adds setting the EOP flag for the
corresponding instructions as well as others like CF_OP_TEX and CF_OP_VTX,
and adds writing out EOP for this type of instruction in the disassembler.
This also fixes a bug where shaders were created that didn't actually have
the EOP flag set in the last CF instruction, which might have resulted
in GPU lockups.
[airlied: cleaned up a little]
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dylan Baker [Tue, 21 Nov 2017 00:34:28 +0000 (16:34 -0800)]
meson: replace with_*dri with with_dri_platform
This fixes the windows and macos stubs to be consistent with the *nix
path.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Sat, 28 Oct 2017 00:20:52 +0000 (17:20 -0700)]
meson: add logic to select apple and windows dri
This is still not fully correct (Haiku and BSD are notably probably not
correct), but Linux is not regressed and this should be correct for
macOS and Windows.
v2: - set the dri_platform to windows on Cygwin as well (Jon)
v3: - Add a better todo for Hurd (Eric)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Tue, 21 Nov 2017 00:26:06 +0000 (16:26 -0800)]
meson: Fix LLVM requires for radeonsi
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Sat, 18 Nov 2017 00:37:50 +0000 (16:37 -0800)]
meson: convert llvm option to tristate
This option has been acting as a strange sort of half-tri state anyway.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Thu, 16 Nov 2017 01:31:32 +0000 (17:31 -0800)]
meson: Convert platform to auto
This is necessary to support operating systems other than the *nix
family (excluding macOS). For Linux nothing has changed, the defaults
are still the same.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Thu, 16 Nov 2017 01:30:52 +0000 (17:30 -0800)]
meson: Remove duplicate _GNU_SOURCE
There is one provided unconditionally, and one guarded by platform ==
linux. Remove the unconditional one.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Thu, 16 Nov 2017 01:09:33 +0000 (17:09 -0800)]
meson: Remove completed or irrelevant TODO comments
These are all either done already, or are autotools specific. The
misspelled gallium G3DVL is the autotools specific bit, meson is
handling that via build_by_default.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Thu, 16 Nov 2017 01:07:37 +0000 (17:07 -0800)]
meson: Fix TODO for missing dl_iterate_phdr function
This function is required for both the Intel "Anvil" vulkan driver and
the i965 GL driver. Error out if either of those is enabled but this
function isn't found.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Thu, 16 Nov 2017 00:53:40 +0000 (16:53 -0800)]
meson: disable x86 asm in fewer cases.
This patch allows building asm for x86 on x86_64 platforms, when the
operating system is the same. Previously cross compile always turned off
assembly. This allows using a cross file to cross compile x86 binaries
on x86_64 with asm.
This could probably be relaxed further thanks to meson's "exe_wrapper",
which is a way to specify an emulator or compatibility layer (wine) that
can run the foreign binaries on the build system. Since the meson build
at this point only supports building on Linux I can't test this and I
don't want to write/enable code that cannot even be build tested.
v4: - set condition to build == x86_64 and host == x86 and
build.system == host.system
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Thu, 16 Nov 2017 00:09:22 +0000 (16:09 -0800)]
meson: Enable SSE4.1 optimizations
This patch checks for and then enables SSE4.1 optimizations if the
host machine will be x86/x86_64.
v2: - Don't compile code, it's unnecessary since we require a compiler
which always has SSE4.1 (Matt)
v3: - x64 -> x86_64 (Matt)
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Eric Anholt [Wed, 22 Nov 2017 00:32:33 +0000 (16:32 -0800)]
broadcom/vc5: Fix BASE_LEVEL handling with txl.
The HW doesn't add the base level anywhere (the min/max lod clamping is
what does base level), so we need to add it manually in this case.
Fixes piglit tex-miplevel-selection *Lod 2D.
Eric Anholt [Wed, 22 Nov 2017 00:05:49 +0000 (16:05 -0800)]
broadcom/vc5: Fix array texture layer count setup.
Fixes piglit array-texture.
Eric Anholt [Tue, 21 Nov 2017 23:27:20 +0000 (15:27 -0800)]
broadcom/vc5: Don't increment primitive queries while they're paused.
Fixes ext_transform_feedback-generatemipmap prims_generated
Eric Anholt [Tue, 21 Nov 2017 23:20:31 +0000 (15:20 -0800)]
broadcom/vc5: Fix incorrect padding of TF outputs.
After the first output, we were padding by an extra size of the previous
output. Fixes piglit ext_transform_feedback-output-type mat4x3[2] and
friends.
Eric Anholt [Tue, 21 Nov 2017 23:00:36 +0000 (15:00 -0800)]
broadcom/vc5: Fix UIF surface size setup for ARB_fbo's mismatched sizes.
The HW was computing an implicit height for the surface based on the image
size, but that may be smaller than the surface with ARB_fbo mismatched
sizes. In that case, we need to tell it about the pad, either with the
little 4-bit field in the RT config, or the extended field in
CLEAR_COLORS_PART3.
Fixes piglit arb_framebuffer_object-mixed-buffer-sizes.
Wladimir J. van der Laan [Sat, 18 Nov 2017 09:44:25 +0000 (10:44 +0100)]
etnaviv: Put HALTI level in specs
The HALTI level is an indication of the gross architecture of the GPU.
It determines to a significant degree what feature level the GPU has, what
state (especially frontend state) is there, and where it is located.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Wladimir J. van der Laan [Sat, 18 Nov 2017 09:44:24 +0000 (10:44 +0100)]
etnaviv: Const-correctness etnaviv_emit.h
The relocation structure is never changed by submitting it.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Juan A. Suarez Romero [Tue, 21 Nov 2017 11:38:27 +0000 (12:38 +0100)]
meson: add si_driinfo.h in libgallium_dri
v2: generate target conditionally (Dylan)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Iago Toral Quiroga [Thu, 16 Nov 2017 07:53:07 +0000 (08:53 +0100)]
nir/gather_info: recognize load_patch_vertices_in as a system value
This intrinsic is produced to load SYSTEM_VALUE_VERTICES_IN, which is
generated to load gl_PatchVerticesIn in the SPIR-V path for both
Vulkan and OpenGL.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Jordan Justen [Wed, 15 Nov 2017 00:27:34 +0000 (16:27 -0800)]
i965: Support decoding INTERFACE_DESCRIPTOR_DATA with INTEL_DEBUG=bat
This will dump the INTERFACE_DESCRIPTOR_DATA along with the associated
samplers & surfaces.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>
Kristian H. Kristensen [Wed, 30 Nov 2016 05:07:57 +0000 (21:07 -0800)]
intel/genxml: Add helpers for determining field type
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Matt Turner [Mon, 20 Nov 2017 22:21:43 +0000 (14:21 -0800)]
i965/fs: Check ADD/MAD with immediates in satprop unit test
The gen had to be changed from 4 to 6 so that we could test MAD, which
is new on Gen6.
mad_imm_float_neg_mov_sat tests the case fixed by the previous commit.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Matt Turner [Mon, 20 Nov 2017 22:24:57 +0000 (14:24 -0800)]
i965/fs: Handle negating immediates on MADs when propagating saturates
MADs don't take immediate sources, but we allow them in the IR since it
simplifies a lot of things. I neglected to consider that case.
Fixes: 4009a9ead490 ("i965/fs: Allow saturate propagation to propagate
negations into MADs.")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103616
Reported-and-Tested-by: Ruslan Kabatsayev <b7.10110111@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Juan A. Suarez Romero [Wed, 15 Nov 2017 16:49:21 +0000 (16:49 +0000)]
mesa/teximage: add TEXTURE_CUBE_MAP_ARRAY target for CompressedTexImage3D
From section 8.7, page 179 of OpenGL ES 3.2 spec:
An INVALID_OPERATION error is generated by CompressedTexImage3D
if internalformat is one of the formats in table 8.17 and target
is not TEXTURE_2D_ARRAY, TEXTURE_CUBE_MAP_ARRAY or TEXTURE_3D.
An INVALID_OPERATION error is generated by CompressedTexImage3D if
internalformat is TEXTURE_CUBE_MAP_ARRAY and the “Cube Map Array”
column of table 8.17 is not checked, or if internalformat is
TEXTURE_3D and the “3D Tex.” column of table 8.17 is not checked.
So far it was only considering TEXTURE_2D_ARRAY as a valid target. But as
the "Cube Map Array" column is checked for all the cases, in practice we can
also consider TEXTURE_CUBE_MAP_ARRAY.
This fixes KHR-GLES32.core.texture_cube_map_array.etc2_texture
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
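Editor's sketch of the target check described in the commit above. The enum and function name are hypothetical and do not reproduce Mesa's teximage code or real GLenum values. It assumes, as the message implies, that the 3D target stays rejected for these formats and that only the cube map array target is newly accepted.

```c
#include <stdbool.h>

/* Hypothetical target enum for illustration only. */
enum tex_target {
    TARGET_2D_ARRAY,
    TARGET_CUBE_MAP_ARRAY,
    TARGET_3D,
    TARGET_OTHER
};

/* Returns true when the target is acceptable for a compressed format whose
 * "Cube Map Array" column is checked. Before the change only
 * TARGET_2D_ARRAY would have been accepted. */
static bool compressed_3d_target_ok(enum tex_target target)
{
    switch (target) {
    case TARGET_2D_ARRAY:
    case TARGET_CUBE_MAP_ARRAY:
        return true;
    default:
        return false; /* caller would raise INVALID_OPERATION */
    }
}
```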
Tapani Pälli [Mon, 20 Nov 2017 08:57:17 +0000 (10:57 +0200)]
intel: fix disasm_info memory leaks
Fixes: 4f82b1728719 ("i965: Rewrite disassembly annotation code")
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Timothy Arceri [Thu, 16 Nov 2017 00:16:10 +0000 (11:16 +1100)]
st/glsl_to_nir: don't generate nir twice for gs
This was left out of
c980a3aa3133
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Roland Scheidegger [Sat, 18 Nov 2017 05:23:35 +0000 (06:23 +0100)]
llvmpipe: fix snorm blending
The blend math gets a bit funky due to inverse blend factors being
in range [0,2] rather than [-1,1], our normalized math can't really
cover this.
src_alpha_saturate blend factor has a similar problem too.
(Note that piglit fbo-blending-formats test is mostly useless for
anything but unorm formats, since not just all src/dst values are
between [0,1], but the tests are crafted in a way that the results
are between [0,1] too.)
v2: some formatting fixes, and fix a fairly obscure (to debug)
issue with alpha-only formats (not related to snorm at all), where
blend optimization would think it could simplify the blend equation
if the blend factors were complementary, however was using the
completely unrelated rgb blend factors instead of the alpha ones...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
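Editor's sketch of the range problem described in the commit above, in plain floating point rather than llvmpipe's normalized integer math. The helper names are hypothetical. It only illustrates that 1 - f for a signed-normalized f lands in [0,2], which a [-1,1] representation cannot hold without clamping.

```c
/* Inverse blend factor for a signed-normalized input f in [-1, 1]. */
static float inverse_factor(float f)
{
    return 1.0f - f;
}

/* Clamp into the snorm range, as a [-1, 1] representation would. */
static float clamp_snorm(float x)
{
    if (x < -1.0f)
        return -1.0f;
    if (x > 1.0f)
        return 1.0f;
    return x;
}
```

For f = -1 the inverse factor is 2, and clamping it to the snorm range yields 1, which is the kind of information loss the message refers to.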
Dave Airlie [Fri, 13 May 2016 04:35:33 +0000 (14:35 +1000)]
r600: add cull distance support
This passes all the tests in piglit.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Aravindan Muthukumar [Thu, 9 Nov 2017 05:45:28 +0000 (11:15 +0530)]
i965: Optimize bucket index calculation
Reducing Bucket index calculation to O(1).
This algorithm calculates the index using matrix method. Assuming
PAGE_SIZE is 4096, matrix arrangement is as below:
1*4096 2*4096 3*4096 4*4096
5*4096 6*4096 7*4096 8*4096
10*4096 12*4096 14*4096 16*4096
20*4096 24*4096 28*4096 32*4096
... ... ... ...
... ... ... ...
... ... ... max_cache_size
From this matrix it is clear that every row follows the pattern below:
... ... ... n
n+(1/4)n n+(1/2)n n+(3/4)n 2n
Row is calculated as log2(size/PAGE_SIZE). Column is calculated by
converting the difference between the elements to fit into a power-of-two
size and indexing it.
Final Index is (row*4)+(col-1)
Tested with Intel Mesa CI.
Improves performance of 3DMark on BXT by 0.705966% +/- 0.229767% (n=20)
v4: Review comments on style and code comments implemented (Ian).
v3: Review comments implemented (Ian).
v2: Review comments implemented (Jason).
Signed-off-by: Aravindan Muthukumar <aravindan.muthukumar@intel.com>
Signed-off-by: Kedar Karanje <kedar.j.karanje@intel.com>
Reviewed-by: Yogesh Marathe <yogesh.marathe@intel.com>
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
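Editor's sketch of the row/column indexing described in the commit above. It is not the i965 implementation: the function name and the BUCKET_PAGE_SIZE macro are hypothetical, and the row is found with a short loop for portability where a count-leading-zeros operation would make the lookup constant time. It assumes sizes small enough that rounding up to a page does not overflow.

```c
#include <stdint.h>

#define BUCKET_PAGE_SIZE 4096u /* assumed page size, as in the message */

/* Index of the smallest bucket that can hold `size` bytes, using the
 * matrix from the message: 4 buckets per row, each row ending at twice
 * the previous row's maximum. */
static unsigned bucket_index(uint64_t size)
{
    /* Round up to whole pages; treat 0 like the smallest bucket. */
    uint64_t pages = (size + BUCKET_PAGE_SIZE - 1) / BUCKET_PAGE_SIZE;
    if (pages == 0)
        pages = 1;

    /* Row r ends at 4 << r pages: 4, 8, 16, 32, ... */
    unsigned row = 0;
    while (pages > ((uint64_t)4 << row))
        row++;

    /* Previous row's maximum (0 for the first row) and this row's step:
     * 1, 1, 2, 4, ... pages between neighbouring buckets. */
    uint64_t prev_max = row ? ((uint64_t)4 << (row - 1)) : 0;
    uint64_t step = row > 1 ? ((uint64_t)1 << (row - 1)) : 1;

    /* Column 1..4: distance past the previous row, rounded up to a step. */
    unsigned col = (unsigned)((pages - prev_max + step - 1) / step);

    return row * 4 + (col - 1);
}
```

For example, a 9-page request falls in the third row (10, 12, 14, 16 pages) and maps to its first column, giving index 8.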
Dylan Baker [Wed, 15 Nov 2017 01:04:27 +0000 (17:04 -0800)]
meson: Guard the gallium dri component
Currently the target has a redundant guard, and the state tracker isn't
properly guarded.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Dylan Baker [Wed, 15 Nov 2017 01:03:39 +0000 (17:03 -0800)]
meson: don't build gallium subdir unless we're building gallium
This will allow us to simplify some guards within the gallium directory.
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Eric Anholt [Mon, 20 Nov 2017 18:14:38 +0000 (10:14 -0800)]
broadcom/vc5: Align 1D texture miplevels to 64b.
Fixes tex-miplevel-selection GL2:texture() 1D
Eric Anholt [Mon, 20 Nov 2017 18:07:24 +0000 (10:07 -0800)]
broadcom/vc5: Clamp min lod to the last level.
Otherwise, the simulator would complain in tex-miplevel-selection that the
min/max clamp was out of order. The actual HW seems to have clamped to
the max anyway.
Eric Anholt [Mon, 20 Nov 2017 20:26:49 +0000 (12:26 -0800)]
broadcom/vc5: Increase simulator memory for tex-miplevel-selection.
We were overflowing, because of all the little 4k allocations for CLs that
were getting expanded to 128kb in the simulator due to the GMP alignment.
Tim Rowley [Fri, 10 Nov 2017 22:45:38 +0000 (16:45 -0600)]
swr/rast: Repair simd8 frontend code rot
Keep non-default simd8 frontend code running for comparison purposes.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 9 Nov 2017 01:17:24 +0000 (19:17 -0600)]
swr/rast: Implement AVX-512 GATHERPS in SIMD16 fetch shader
Disabled for now.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Wed, 8 Nov 2017 20:07:33 +0000 (14:07 -0600)]
swr/rast: Simplify GATHER* jit builder api
General cleanup, and prep work for possibly moving to llvm masked
gather intrinsic.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Tue, 7 Nov 2017 21:24:25 +0000 (15:24 -0600)]
swr/rast: Add alignment to transpose targets
Needed to ensure alignment for avx512.
Fixes address sanitizer crash.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Tue, 7 Nov 2017 19:50:11 +0000 (13:50 -0600)]
swr/rast: Cache eventmanager
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Tue, 31 Oct 2017 21:46:59 +0000 (16:46 -0500)]
swr/rast: Enable AVX-512 targets in the jitter
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Tue, 31 Oct 2017 14:41:02 +0000 (09:41 -0500)]
swr/rast: Points with clipdistance can't go through simplepoints path
Fixes piglit glsl-1.20:vs-clip-vertex-primitives and
glsl-1.30:vs-clip-distance-primitives.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Mon, 23 Oct 2017 20:10:35 +0000 (15:10 -0500)]
swr/rast: Code style change (NFC)
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Thu, 19 Oct 2017 22:33:37 +0000 (17:33 -0500)]
swr/rast: Widen fetch shader to SIMD16
Widen fetch shader to SIMD16, enable SIMD16 types in the jitter,
and provide EXTRACT/INSERT SIMD8 <-> SIMD16 utility functions.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Tim Rowley [Wed, 18 Oct 2017 21:51:07 +0000 (16:51 -0500)]
swr/rast: Support flexible vertex layout for DS output
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Nicolai Hähnle [Fri, 10 Nov 2017 10:15:44 +0000 (11:15 +0100)]
gallium/u_threaded: avoid syncing in threaded_context_flush
We could always do the flush asynchronously, but if we're going to wait
for a fence anyway and the driver thread is currently idle, the additional
communication overhead isn't worth it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 10 Nov 2017 09:58:10 +0000 (10:58 +0100)]
radeonsi: avoid syncing the driver thread in si_fence_finish
It is really only required when we need to flush for deferred fences.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 13 Nov 2017 13:50:17 +0000 (14:50 +0100)]
radeonsi: recompute the relative timeout after waiting for ready fence
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
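Editor's sketch of the general idea named in the commit title above: once part of a relative timeout has been spent in a first wait, the remainder is recomputed from an absolute deadline before waiting again. The helper is hypothetical and is not the radeonsi code. It assumes both timestamps come from the same monotonic clock, in nanoseconds.

```c
#include <stdint.h>

/* Relative timeout left until `deadline_ns`, or 0 if it already passed. */
static uint64_t remaining_timeout_ns(int64_t deadline_ns, int64_t now_ns)
{
    if (now_ns >= deadline_ns)
        return 0;
    return (uint64_t)(deadline_ns - now_ns);
}
```

A caller would compute the deadline once, wait on the first fence, then pass the recomputed remainder to the second wait instead of reusing the original timeout.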
Nicolai Hähnle [Fri, 10 Nov 2017 16:13:27 +0000 (17:13 +0100)]
ddebug: fix the hang detection timeout calculation
Fixes: c9fefa062b36 ("ddebug: rewrite to always use a threaded approach")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 10 Nov 2017 12:11:53 +0000 (13:11 +0100)]
ddebug: fix use-after-free of streamout targets
Fixes: b47727a83ad6 ("ddebug: implement pipelined hang detection mode")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 10 Nov 2017 10:28:28 +0000 (11:28 +0100)]
gallium/u_threaded: properly initialize fence unflushed tokens
This got lost in a rebase but never hurt anything because we happened
to always sync in fence_finish anyway...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 10 Nov 2017 11:32:44 +0000 (12:32 +0100)]
util/u_queue: really use futex-based fences
The relevant define changed in the final revision of the simple mutex
patch.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 13 Nov 2017 13:35:50 +0000 (14:35 +0100)]
util/u_queue: fix timeout handling in util_queue_fence_wait_timeout
Fixes: e3a8013de8ca ("util/u_queue: add util_queue_fence_wait_timeout")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 9 Nov 2017 13:34:20 +0000 (14:34 +0100)]
st/mesa: use asynchronous flushes in st_finish
With threaded gallium, the driver may currently be running in another
thread. In that case, we will execute all remaining commands in that
thread instead of syncing, which should be better for cache locality.
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 9 Nov 2017 13:34:19 +0000 (14:34 +0100)]
st/mesa: implement st_server_wait_sync properly
Asynchronous flushes require a proper implementation of
st_server_wait_sync, because we could have the following with
threaded Gallium:
Context 1 app Context 1 driver Context 2
------------- ---------------- ---------
f = glFenceSync
glFlush
<-- app sync --> <-- app sync -->
glWaitSync(f)
.. draw calls ..
pipe_context::flush
for glFenceSync
pipe_context::flush
for glFlush
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 6 Nov 2017 10:56:54 +0000 (11:56 +0100)]
u_threaded_gallium: remove synchronization in fence_server_sync
The whole point of fence_server_sync is that it can be used to
avoid waiting in the application thread.
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 15 Nov 2017 11:51:23 +0000 (12:51 +0100)]
amd: build addrlib with C++11
It is required for LLVM anyway.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103658
Fixes: 7f33e94e43a6 ("amd/addrlib: update to latest version")
Tested-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 15 Nov 2017 10:22:26 +0000 (11:22 +0100)]
radeonsi/gfx9: fix VM fault with fetched instance divisors
We need to account for SGPR locations in merged shaders.
This case is exercised by KHR-GL45.enhanced_layouts.vertex_attrib_locations
Fixes: 79c2e7388c7f ("radeonsi/gfx9: use SPI_SHADER_USER_DATA_COMMON")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Wed, 15 Nov 2017 11:08:29 +0000 (12:08 +0100)]
radv: use a 16 bytes array for the sampled/storage image descriptors
This allows updating them with only one memcpy().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 15 Nov 2017 09:55:05 +0000 (10:55 +0100)]
radv: do not add the query pool BO to the list in vkCmdEndQuery()
As per the spec, the query identified by queryPool and query
must currently be active. Applications have to call vkCmdBeginQuery()
before, and thus the query pool BO will already be in the list.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 15 Nov 2017 14:44:01 +0000 (15:44 +0100)]
radv: only load needed depth clear regs for fast depth clears
Similar to how the driver sets the depth clear regs after a
fast depth clear. Most of the time, this will copy a 32-bit reg
instead of a 64-bit reg.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 15 Nov 2017 14:44:00 +0000 (15:44 +0100)]
radv: do not add the image BO in radv_set_depth_clear_regs()
For the fast path, radv_fill_buffer() ensures that the BO is
already in the list. For the slow path, the depth surface is
part of the framebuffer which means the BO is added to the list
when the framebuffer is emitted.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 15 Nov 2017 14:43:59 +0000 (15:43 +0100)]
radv: remove useless assertion in emit_depthstencil_clear()
Already checked in emit_clear().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Samuel Pitoiset [Wed, 15 Nov 2017 14:43:58 +0000 (15:43 +0100)]
radv: remove useless check in radv_set_depth_clear_regs()
aspects can't be zero and there is an assertion that ensures
it's not in emit_clear().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dave Airlie [Sun, 19 Nov 2017 23:19:31 +0000 (09:19 +1000)]
docs/features: mark some r600 extensions supported
These just looked to be missed when this file was updated.
Signed-off-by: Dave Airlie <airlied@redhat.com>
George Barrett [Sun, 19 Nov 2017 10:55:10 +0000 (21:55 +1100)]
glsl: Catch subscripted calls to undeclared subroutines
generate_array_index fails to check whether the target of a subroutine
call exists in the AST, potentially passing around null ir_rvalue
pointers eventuating in abort/segfault.
Fixes: fd01840c0bd3 ("glsl: add AoA support to subroutines")
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100438
Eric Anholt [Fri, 13 Oct 2017 20:11:15 +0000 (13:11 -0700)]
broadcom/vc5: Fix up integer texture handling.
The original spec I had didn't expose integer textures and suggested that
you use unfiltered floats. Now there are proper formats for them.
Fixes 16- and 32-bit texwrap integer tests in piglit, and
dEQP-GLES3.functional.fbo.completeness.renderable.renderbuffer.color0.rgb10_a2ui.
Eric Anholt [Fri, 17 Nov 2017 01:50:55 +0000 (17:50 -0800)]
broadcom/vc5: Fix simulator assertion failures about color RT clears.
When we tried to clear color while storing depth, the simulator hit an
assertion failure because it did not have enough information to decide
which color RT to clear. It turns out that STORE_GENERAL picks the buffer
according to the color buffer being stored, or all of them if NONE. If
you're doing depth, it doesn't know which to pick.
Rob Clark [Sat, 18 Nov 2017 15:40:49 +0000 (10:40 -0500)]
freedreno/ir3: add texture gather support
Signed-off-by: Rob Clark <robdclark@gmail.com>
Lucas Stach [Wed, 15 Nov 2017 16:33:17 +0000 (17:33 +0100)]
etnaviv: enable full overwrite when no color buffer is present
The OVERWRITE bit disables destination fetches, which is exactly what
we want when there is no valid color buffer bound.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
Jason Ekstrand [Sat, 18 Nov 2017 01:27:55 +0000 (17:27 -0800)]
i965: Stop including brw_cfg.h in brw_disasm_info.h
The brw_disasm_info header is included by certain tools in order to get
shader assembly from binaries so it's a semi-external header. Including
brw_cfg.h also pulls in brw_shader.h so you end up getting quite a bit
of our back-end compiler internals. Instead, make the couple of forward
declarations we need and make the header more stand-alone. This fixes
the meson build.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 4f82b17287194ca7d10816f6cfe4712a3e0a03fc
Jason Ekstrand [Sat, 18 Nov 2017 00:52:09 +0000 (16:52 -0800)]
i965: Mark BOs as external when we export their handle
Almost all of our BO export paths already properly marked the BO as
external and added it to the handle table. Most export use-cases go
through a prime fd or flink where we have a brw_bo export helper that
does the right thing. The one missing one happens when you call
queryImage and ask for __DRI_IMAGE_ATTRIB_HANDLE. We just grabbed the
gem handle out of the BO (because it's really easy to do that) and
handed it off to the client; what could go wrong? As it turns out, this
path is used by basically every compositor that wants to turn around and
call drmModeAddFB2 on it so it can hand it off to display. The result,
as of
4b1e70cc57d7ff5f465544644b2180dee1490cee, is that we no longer set
MOCS_PTE on those surfaces and the kernel's attempts to disable caching
fail and scanout gets corrupted.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103759
Fixes: 4b1e70cc57d7ff5f465544644b2180dee1490cee
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Jason Ekstrand [Sat, 18 Nov 2017 00:49:03 +0000 (16:49 -0800)]
i965/bufmgr: Add a helper to mark a BO as external
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Andres Gomez [Sat, 18 Nov 2017 00:48:45 +0000 (02:48 +0200)]
i965: Correct disasm_info usage in eu_validate test
Fixes: 4f82b1728719 ("i965: Rewrite disassembly annotation code")
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Eric Anholt [Tue, 14 Nov 2017 23:52:53 +0000 (15:52 -0800)]
broadcom/vc5: Set up the padded height at surface creation time.
This centralizes the calculation in the surface, instead of in each
load/store.
Eric Anholt [Wed, 15 Nov 2017 23:05:37 +0000 (15:05 -0800)]
broadcom/vc5: Ensure that there is always a TLB write.
This should fix some GPU hangs in our (currently always single-threaded)
fragment shaders, and definitely fixes assertion failures in simulation.
Eric Anholt [Tue, 7 Nov 2017 23:42:04 +0000 (15:42 -0800)]
broadcom/vc5: Fix clear color for swap_color_rb render targets.
Fixes dEQP-GLES3.functional.depth_stencil_clear.depth.*
Eric Anholt [Tue, 7 Nov 2017 23:37:46 +0000 (15:37 -0800)]
broadcom/vc5: Fix pasteo in front stencil ref value setup.
Fixes piglit masked-clear.
Eric Anholt [Tue, 7 Nov 2017 23:35:33 +0000 (15:35 -0800)]
broadcom/vc5: Fix colormasking when we need to swap r/b colors.
Fixes part of piglit masked-clear.
Eric Anholt [Tue, 7 Nov 2017 23:21:06 +0000 (15:21 -0800)]
broadcom/vc5: Enable the Z min/max clipping planes.
Eric Anholt [Wed, 15 Nov 2017 00:01:32 +0000 (16:01 -0800)]
broadcom/vc5: Fix driver for new PIPE_SHADER_CAP_MAX_HW_ATOMIC_*.
Brian Paul [Fri, 17 Nov 2017 16:38:39 +0000 (09:38 -0700)]
r300: add PIPE_SHADER_CAP_MAX_HW_ATOMIC_COUNTER* switch cases
To silence compiler warnings.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Brian Paul [Fri, 17 Nov 2017 22:03:21 +0000 (15:03 -0700)]
tgsi: s/uint/enum pipe_shader_type/
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Fri, 17 Nov 2017 16:51:10 +0000 (09:51 -0700)]
tgsi: bump tgsi_opcode_info::output_mode size to 4 bits
To avoid problems with MSVC. And verify size with ASSERT_BITFIELD_SIZE().
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Charmaine Lee <charmainel@vmware.com>