Dave Airlie [Mon, 5 Feb 2018 03:46:23 +0000 (13:46 +1000)]
r600: work out shader export mask at shader build time (v1.1)
Since enhanced layouts allows setting specific MRT outputs, we
can get sparse outputs, so we have to calculate the shader
mask earlier.
v1.1: update checks for state update (Roland)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 5 Feb 2018 03:09:57 +0000 (13:09 +1000)]
r600: fix xfb stream check.
This fixes:
KHR-GL45.enhanced_layouts.xfb_vertex_streams
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sun, 4 Feb 2018 23:21:27 +0000 (09:21 +1000)]
r600/compute: add render cond support.
Set render cond and emit atom.
Fixes:
KHR-GL45.compute_shader.conditional-dispatching
Reviewed-by: Roland Scheidegger <sorland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sun, 4 Feb 2018 23:07:54 +0000 (09:07 +1000)]
r600: fix not-very indirect compute
We need to get the grid sizes earlier to fill in to the const
buffer.
Fixes:
KHR-GL45.compute_shader.built-in-variables
and
KHR-GL45.compute_shader.dispatch-indirect
Reviewed-by: Roland Scheidegger <sorland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sun, 4 Feb 2018 22:42:52 +0000 (08:42 +1000)]
r600: overhaul buffer resource query.
This cleans up and fixes the previous fix even more.
Buffers from textures start at max const,
buffers from buffers/images come in from the 168 offset.
This fixes a bunch of:
KHR-GL45.shader_storage_buffer_object*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sun, 4 Feb 2018 20:31:48 +0000 (06:31 +1000)]
r600/eg: fix buffer sizing.
For buffers we want the size in bytes,
For images we want it in elements.
This fixes:
KHR-GL45.shader_storage_buffer_object.advanced-unsizedArrayLength-cs-std430-vec-pad
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 2 Feb 2018 07:07:20 +0000 (17:07 +1000)]
r600/images: set offset for compute shaders with number of declared samplers
for frag shaders we get a value in the key, I expect I need
to make compute work better
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 5 Feb 2018 00:14:19 +0000 (10:14 +1000)]
r600/compute: only mark buffer/image state dirty for fragment shaders
The compute emission path always emits this currently, and emitting
it on the fragment path breaks the blitter.
This fixes gpu hangs in KHR-GL45.compute_shader.resource-texture
Reviewed-by: Roland Scheidegger <sorland@vmware.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 5 Feb 2018 06:46:06 +0000 (16:46 +1000)]
r600/atomic: fix ATOMCAS instruction.
This has 4 srcs.
This fixes:
KHR-GL45.shader_atomic_counter_ops_tests.ShaderAtomicCounterOpsExchangeTestCase
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 5 Feb 2018 06:04:18 +0000 (16:04 +1000)]
r600/sb/cayman: fix indirect ubo access on cayman
With sb enabled on cayman, this was overwriting the proper
cf index value with random ones if the dst gpr was 2 or 3,
only save the value for a MOVA instruction.
Fixes:
KHR-GL45.gpu_shader5.uniform_blocks_array_indexing
(on cayman with sb)
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 2 Feb 2018 05:17:57 +0000 (15:17 +1000)]
r600/eg: use texture target to pick array size not view target (v2)
This fixes a few CTS cases in :
KHR-GL45.texture_view.view_sampling
some multisample cases are still broken, but not sure this is
the same problem.
v2: fix more cases
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 6 Feb 2018 19:37:48 +0000 (19:37 +0000)]
radv: don't support tc-compat on multisample d32s8 at all.
RX550 fails
dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_2
So increase the range of the workaround.
Fixes: f4c534ef6 (radv: don't enable tc compat for d32s8 + 4/8 samples (v1.1))
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Michal Navratil [Sun, 4 Feb 2018 19:24:02 +0000 (20:24 +0100)]
winsys/amdgpu: allow non page-aligned size bo creation from pointer
Fix INVALID_OPERATION caused by BufferData with target
EXTERNAL_VIRTUAL_MEMORY_BUFFER_AMD when the buffer size is
not page aligned.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Cc: 17.3 18.0 <mesa-stable@lists.freedesktop.org>
Jon Turney [Mon, 5 Feb 2018 21:11:12 +0000 (21:11 +0000)]
meson: ensure xmlpool/options.h is generated for libgallium
In file included from ../src/gallium/targets/dri/target.c:1:
In file included from ../src/gallium/auxiliary/target-helpers/drm_helper.h:8:
../src/util/xmlpool.h:103:10: fatal error: 'xmlpool/options.h' file not found
See also
26bde1e3.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Andres Gomez [Sun, 28 Jan 2018 23:35:17 +0000 (01:35 +0200)]
vbo: provide 64bits support to print_draw_arrays
Cc: Mathias Fröhlich <mathias.froehlich@web.de>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Andres Gomez [Sun, 28 Jan 2018 23:35:16 +0000 (01:35 +0200)]
vbo: take into account the size when printing VAO elements
When using print_draw_arrays for debugging, we were printing an "n"
amount of vertex but that meant not to print all the size in the "n"
vertex, depending on the stride used.
Now we print the whole size in the "n" vertex.
Cc: Mathias Fröhlich <mathias.froehlich@web.de>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Andres Gomez [Sun, 28 Jan 2018 23:35:15 +0000 (01:35 +0200)]
vbo: print first element of the VAO when the binding stride is 0
Cc: Mathias Fröhlich <mathias.froehlich@web.de>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Iago Toral Quiroga [Mon, 5 Feb 2018 08:49:54 +0000 (09:49 +0100)]
anv/device: initialize the list of enabled extensions properly
The loop goes through the list of enabled extensions marking them as
enabled in the list, but this relies on every other extension being
initialized to false by default.
This bug would make us, for example, advertise certain device extension
entry points as available even when the corresponding extensions had
not been enabled.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: abc62282b5c "anv: Add a per-device table of enabled extensions"
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
Iago Toral Quiroga [Tue, 23 Jan 2018 11:20:16 +0000 (12:20 +0100)]
spirv: split constant initializers on in/out structs
The SPIR-V parser splits in/out struct variables and creates
a separate variable for each first-level member of the struct.
When the struct variable has an initializer this means that we also
need to split the initializer.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Iago Toral Quiroga [Fri, 1 Dec 2017 12:46:23 +0000 (13:46 +0100)]
i965/nir: do int64 lowering before optimization
Otherwise loop unrolling will fail to see the actual cost of
the unrolling operations when the loop body contains 64-bit integer
instructions, and very specially when the divmod64 lowering applies,
since its lowering is quite expensive.
Without this change, some in-development CTS tests for int64
get stuck forever trying to register allocate a shader with
over 50K SSA values. The large number of SSA values is the result
of NIR first unrolling multiple seemingly simple loops that involve
int64 instructions, only to then lower these instructions to produce
a massive pile of code (due to the divmod64 lowering in the unrolled
instructions).
With this change, loop unrolling will see the loops with the int64
code already lowered and will realize that it is too expensive to
unroll.
v2: Run nir_algebraic first so we can hopefully get rid of some of
the int64 instructions before we even attempt to lower them.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ilia Mirkin [Fri, 2 Feb 2018 05:47:54 +0000 (07:47 +0200)]
mesa: add OES_EGL_image_external_essl3 support
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Vinson Lee [Mon, 5 Feb 2018 23:24:45 +0000 (23:24 +0000)]
r600/fp64: Fix build.
CC r600_shader.lo
r600_shader.c: In function ‘egcm_int_to_double’:
r600_shader.c:4543:12: error: ‘ctx’ is a pointer; did you mean to use ‘->’?
if (ctx.bc->chip_class == CAYMAN)
^
->
Fixes: 35b430157776 ("r600/fp64: fix integer->double conversion")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Dave Airlie [Mon, 29 Jan 2018 00:55:15 +0000 (10:55 +1000)]
r600/fp64: fix integer->double conversion
Doing a straight uint/int->fp32->fp64 conversion causes
some precision issues, Roland suggested splitting the
integer into two portions and doing two separate
int->fp32->fp64 conversions then adding the results.
This passes the tests in CTS and piglit.
[airlied: fix cypress conversion opcodes]
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Mon, 5 Feb 2018 20:37:05 +0000 (21:37 +0100)]
ac/nir: remove emission of nir_op_fdiv
RadeonSI and RADV lower fdiv.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Jon Turney [Tue, 16 Jan 2018 17:51:53 +0000 (17:51 +0000)]
travis: add macOS meson build
v2: Simplify set of options now we have better defaults
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jon Turney [Sun, 3 Dec 2017 21:58:12 +0000 (21:58 +0000)]
meson: osx ld doesn't support --build-id
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Jon Turney [Sat, 2 Dec 2017 17:13:39 +0000 (17:13 +0000)]
meson: build src/glx/apple
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Dylan Baker [Sat, 28 Oct 2017 04:08:07 +0000 (21:08 -0700)]
meson: set apple glx defines
Reviewed-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jon Turney [Fri, 2 Feb 2018 22:25:48 +0000 (22:25 +0000)]
meson: better defaults for osx, windows and cygwin
set suitable defaults for 'dri-drivers', 'gallium-drivers', 'vulkan-drivers'
and 'platforms' options for osx, windows and cygwin, adding cygwin where
appropriate.
v2: error() for unknown OS
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Matt Turner [Wed, 31 Jan 2018 19:09:36 +0000 (11:09 -0800)]
i965: Move mistakenly placed line
Ken called this out in review, but it seems I forgot to make the change.
I noticed that the control flow annotations in the fragment shader
disassembly of tests/shaders/glsl-fs-loop-continue.shader_test were not
correct, and moving this line to the correct place fixes it.
Juan A. Suarez Romero [Mon, 5 Feb 2018 16:38:39 +0000 (17:38 +0100)]
glsl/linker: check same name is not used in block and outside
According with OpenGL GLSL 3.20 spec, section 4.3.9:
"It is a link-time error if any particular shader interface
contains:
- two different blocks, each having no instance name, and each
having a member of the same name, or
- a variable outside a block, and a block with no instance name,
where the variable has the same name as a member in the block."
This fixes a previous commit
9b894c8 ("glsl/linker: link-error using the
same name in unnamed block and outside") that covered this case, but
did not take in account that precision qualifiers are ignored when
comparing blocks with no instance name.
With this commit, the original tests
KHR-GL*.shaders.uniform_block.common.name_matching keep fixed, and also
dEQP-GLES31.functional.shaders.linkage.uniform.block.differing_precision
regression is fixed, which was broken by previous commit.
v2: use helper varibles (Matteo Bruni)
Fixes: 9b894c8 ("glsl/linker: link-error using the same name in unnamed block and outside")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104668
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104777
CC: Mark Janes <mark.a.janes@intel.com>
CC: "18.0" <mesa-stable@lists.freedesktop.org>
Tested-by: Matteo Bruni <matteo.mystral@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Juan A. Suarez Romero [Mon, 5 Feb 2018 16:00:11 +0000 (17:00 +0100)]
mesa: enable ASTC format for CompressedTexSubImage3D
If extensions GL_KHR_texture_compression_astc_hdr or
GL_KHR_texture_compression_astc_sliced_3d are implemented then ASTC
format are supported in CompressedTex*Îmage3D.
Fixes KHR-GLES2.texture_3d.* with this format.
CC: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Stephan Gerhold [Wed, 24 Jan 2018 14:13:24 +0000 (15:13 +0100)]
util/build-id: Fix address comparison for binaries with LOAD vaddr > 0
build_id_find_nhdr_for_addr() fails to find the build-id if the first LOAD
segment has a virtual address other than 0x0.
For most shared libraries, the first LOAD segment has vaddr=0x0:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x00000000 0x00000000 0x2d2e26 0x2d2e26 R E 0x1000
LOAD 0x2d2e54 0x002d3e54 0x002d3e54 0x2e248 0x2f148 RW 0x1000
However, compiling the Intel Vulkan driver as 32-bit binary on Android produces
the following ELF header with vaddr=0x8000 instead:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x00008034 0x00008034 0x00100 0x00100 R 0x4
LOAD 0x000000 0x00008000 0x00008000 0x224a04 0x224a04 R E 0x1000
LOAD 0x225710 0x0022e710 0x0022e710 0x25988 0x27364 RW 0x1000
build_id_find_nhdr_callback() compares the address of dli_fbase from dladdr()
and dlpi_addr from dl_iterate_phdr(). With vaddr > 0, these point to a
different memory address, e.g.:
dli_fbase=0xd8395000 (offset 0x8000)
dlpi_addr=0xd838d000
At least on glibc and bionic (Android) dli_fbase refers to the address where
the shared object is mapped into the process space, whereas dlpi_addr is just
the base address for the vaddrs declared in the ELF header.
To compare them correctly, we need to calculate the start of the mapping
by adding the vaddr of the first LOAD segment to the base address.
Note: musl users will need the following patch.
https://git.musl-libc.org/cgit/musl/commit/?id=
b3ae7beabb9f0c219bb8a8b63567a01c6530c1ac
Cc: Chad Versace <chadversary@chromium.org>
Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104642
Fixes: 5c98d38 "util: Query build-id by symbol address, not library name"
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Boyuan Zhang [Thu, 25 Jan 2018 20:06:35 +0000 (15:06 -0500)]
radeonsi: enable vcn encode for HEVC main
Enable vcn encode for HEVC main profile on Raven.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 1 Feb 2018 21:25:44 +0000 (16:25 -0500)]
st/va: implement HEVC encode functions
Implement HEVC encode functions based on VAAPI HEVC encode interface.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 1 Feb 2018 21:20:16 +0000 (16:20 -0500)]
st/va: add HEVC encode functions
Add a separate file for HEVC encode functions.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 25 Jan 2018 19:32:04 +0000 (14:32 -0500)]
st/va: enable dual instances encode only for H264
Logics that related to dual instances encode should only be done for
H264, not other codecs.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 25 Jan 2018 19:21:13 +0000 (14:21 -0500)]
st/va: add entrypoint check for HEVC
Add entrypoint check for HEVC to differentiate decode and encode jobs.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 25 Jan 2018 19:18:09 +0000 (14:18 -0500)]
st/va: add HEVC picture desc
Add HEVC picture desc, and add codec check when creating and destroying
context.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 1 Feb 2018 20:47:10 +0000 (15:47 -0500)]
st/va: move H264 enc functions into separate file
Move all H264 encode related functions into separate file. Similar to
VAAPI decode side, there will be separate file for each codec on encode
side as well.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 1 Feb 2018 20:29:30 +0000 (15:29 -0500)]
radeon/vcn: add header implementations for HEVC
Implement encoding of sps, pps, vps, aud, and slice headers for HEVC
based on HEVC specs.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 25 Jan 2018 16:25:20 +0000 (11:25 -0500)]
radeon/vcn: add ib implementations for HEVC
Implement required ibs for vcn HEVC encode.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 1 Feb 2018 20:23:49 +0000 (15:23 -0500)]
radeon/vcn: support picture parameters for HEVC
Pass pipe_picture_desc instead of pipe_h264_enc_picture_desc so that
it can be used for different codecs. Add functions to handle picture
parameters that will be used for HEVC encode.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 1 Feb 2018 20:05:17 +0000 (15:05 -0500)]
radeon/vcn: add vcn encode interface for HEVC
Add vcn encode interface for HEVC, and rename radeon_enc_h264_enc_pic
to radeon_enc_pic since radeon_enc_pic is used by both H264 and HEVC.
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Boyuan Zhang [Thu, 1 Feb 2018 19:57:37 +0000 (14:57 -0500)]
vl: add parameters for HEVC encode
Add HEVC encode interface
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Eric Anholt [Mon, 5 Feb 2018 12:58:06 +0000 (12:58 +0000)]
broadcom/vc5: Ignore samplers for finding uniform offsets.
Fixes:
KHR-GLES3.shaders.struct.uniform.sampler_array_fragment
KHR-GLES3.shaders.struct.uniform.sampler_array_vertex
KHR-GLES3.shaders.struct.uniform.sampler_nested_fragment
KHR-GLES3.shaders.struct.uniform.sampler_nested_vertex
Eric Anholt [Mon, 5 Feb 2018 11:06:59 +0000 (11:06 +0000)]
broadcom/vc5: Fix non-mipfiltered sampling.
We need to clamp the LOD to 0 if mip filtering is disabled. This is part
of fixing KHR-GLES3.shaders.struct.uniform.sampler_array_fragment.
Eric Anholt [Mon, 5 Feb 2018 10:11:30 +0000 (10:11 +0000)]
broadcom/vc5: Fix "hardwrae" typo in a field name in XML.
Samuel Pitoiset [Fri, 2 Feb 2018 17:56:39 +0000 (18:56 +0100)]
ac/nir: fix a crash in load_gs_input() on pre-GFX9 chips
Fixes: df1d5174fcc ("ac/nir: replace SI.buffer.load.dword with amdgcn.buffer.load")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Eric Anholt [Sun, 4 Feb 2018 21:37:22 +0000 (21:37 +0000)]
broadcom/vc5: Try to merge more than 2 QPU instructions together.
Obviously it would be good to have an ADD and a MUL and a signal together,
but we can even potentially have multiple signals merged, as well.
total instructions in shared programs: 100423 -> 97874 (-2.54%)
instructions in affected programs: 78812 -> 76263 (-3.23%)
Eric Anholt [Sun, 4 Feb 2018 21:05:03 +0000 (21:05 +0000)]
broadcom/vc5: Remove no-op MOVs after register allocation.
We emit some MOVs to track lifetimes of payload registers, but we don't
need there to be actual MOV instructions for them.
total instructions in shared programs: 101045 -> 100423 (-0.62%)
instructions in affected programs: 37083 -> 36461 (-1.68%)
Eric Anholt [Sun, 4 Feb 2018 21:27:32 +0000 (21:27 +0000)]
broadcom/vc5: Add missing shader-db instruction counting.
I must have misplaced it in the instruction packing rework.
Dave Airlie [Fri, 2 Feb 2018 07:28:15 +0000 (17:28 +1000)]
r600: fix resq for buffer images.
If this is an image buffer, we need to calculate the correct resource
id.
Fixes:
KHR-GL45.shader_image_size.*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 2 Feb 2018 06:56:27 +0000 (16:56 +1000)]
r600/eg: fix cube map array buffer images.
This fixes a crash in:
KHR-GL45.texture_cube_map_array.texture_size_compute_sh.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Wed, 31 Jan 2018 02:03:25 +0000 (03:03 +0100)]
mesa: change ctx->Color.ColorMask into a 32-bit bitmask
4 bits per draw buffer, 8 draw buffers in total --> 32 bits.
This is easier to work with.
Reviewed-by: Eric Anholt <eric@anholt.net>
Jordan Justen [Fri, 2 Feb 2018 21:03:10 +0000 (13:03 -0800)]
i965: Create new program cache bo when clearing the program cache
When the disk shader cache CI testing was enabled, we started noticing
occasional failures on deqp test runs. (Mainly SNB, rarely HSW)
Before this change, when we cleared the (in memory) program cache we
reused the same bo. Since the disk shader cache quickly restores
programs, it appears that this would lead to overwrites of the older
program binaries in the in memory program cache that apparently were
still executing in some cases. If these programs were still executing,
this could cause a GPU hang.
This issue is probably not disk shader cache specific, but may have
been hidden due to the compiler taking time to recompile programs
after the cache was cleared.
v2:
* Don't add `copy` param to brw_cache_new_bo (Ken)
* Call from brw_program_cache_check_size (Ken)
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Jason Ekstrand [Tue, 30 Jan 2018 04:48:57 +0000 (20:48 -0800)]
aubinator: Multiply count by 4 to compute buffer sizes
The count field is in terms of dwords and not bytes. In
7d4007d58ab7c0c1796e116b55814f8be4e699a9, I fixed one instance
of this but missed another.
Eric Anholt [Mon, 22 Jan 2018 01:14:25 +0000 (09:14 +0800)]
broadcom/vc5: Enable UIF XOR on textures.
This should increase performance by reducing SDRAM bank conflicts when
crossing between UIF columns (particularly on power-of-two height
textures).
The uif_xor_disable setup is dropped, since we need to allow XOR on lower
miplevels even when level 0 is XOR. The level 0 force UIF and level 0 XOR
flags should handle setting XOR properly on imported buffers.
Eric Anholt [Fri, 2 Feb 2018 16:50:51 +0000 (08:50 -0800)]
broadcom/vc5: Fix alignment of miplevel 1 with UIF.
The alignment here means that we can't get back the padded height from the
size/stride any more, so it's now a field in the slice as well.
Fixes piglit fbo-generatemipmap-formats RGBA16 NPOT.
Eric Anholt [Sat, 20 Jan 2018 06:35:20 +0000 (22:35 -0800)]
broadcom/vc5: Switch our RGBA4 support to the new gallium format.
Fixes fbo-generatemipmap-formats, fbo-alphatest-formats, etc. tests for
GL_RGBA4, GL_RGB4, GL_RGBA2, etc.
Eric Anholt [Sat, 20 Jan 2018 18:02:07 +0000 (10:02 -0800)]
gallium: Add a new A4B4G4R4 pipe format for Broadcom.
The VC5 HW puts A in the low bits and R in the high bits. We can't just
swizzle in the shaders because the blending HW can't pick what channel A
is in, so make a new format to match it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Thu, 1 Feb 2018 19:12:47 +0000 (11:12 -0800)]
mesa: Drop incorrect A4B4G4R4 _mesa_format_matches_format_and_type() cases.
swapBytes operates on bytes, not 4-bit channels, so you can't just take
non-swapBytes cases and flip the REV flag.
Avoids piglit texture-packed-formats regressions when enabling the
ABGR4444 format.
Fixes: c5a5c9a7db89 ("mesa/formats: add new mesa formats and their pack/unpack functions.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
George Kyriazis [Thu, 1 Feb 2018 16:46:27 +0000 (10:46 -0600)]
meson/swr: Updated copyright dates
cc: mesa-stable@lists.freedesktop.org
cc: dylan@pnwbakers.com
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
George Kyriazis [Thu, 1 Feb 2018 03:44:54 +0000 (21:44 -0600)]
meson/swr: re-shuffle generated files
Move generated files from codegen/meson.build to other directories, in order
to satisfy generated include file dependencies
Add correct file lists for architecture-specific libraries.
cc: mesa-stable@lists.freedesktop.org
cc: dylan@pnwbakers.com
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Marek Olšák [Fri, 2 Feb 2018 18:26:49 +0000 (19:26 +0100)]
amd: remove support for LLVM 3.9
Only these are supported:
- LLVM 4.0
- LLVM 5.0
- LLVM 6.0
- master (7.0)
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dylan Baker [Fri, 2 Feb 2018 18:45:12 +0000 (10:45 -0800)]
meson: Check for actual LLVM required versions
Currently we always check for 3.9.0, which is pretty safe since
everything except radv work with >= 3.9 and 3.9 is pretty old at this
point. However, radv actually requires 4.0, and there is a patch for
radeonsi to do the same.
Fixes: 673dda833076 ("meson: build "radv" vulkan driver for radeon hardware")
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dylan Baker [Tue, 16 Jan 2018 18:36:28 +0000 (10:36 -0800)]
meson: Don't confuse the install and search paths for dri drivers
Currently there is not a separate option for setting the search path of
DRI drivers in meson, like there is in scons and autotools. This is an
oversight and needs to be fixed. This adds an extra option
`dri-search-path`, which will default to the value of
`dri-drivers-path`, like autotools does.
v2: - Split input list before joining.
v3: - use : instead of ; as the delimiter. The autotools help string
incorrectly says ; but the code uses :
v4: - Take list in pre : delimited form (Ilia)
- Ensure that the dri-search-path is absolute when using
dri_drivers_path
Fixes: db9788420d4bc7b4 ("meson: Add support for configuring dri drivers directory.")
Reported-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v2)
Reviewed-by: Eric Engestrom <eric@engestrom.ch> (v3)
Marek Olšák [Tue, 2 Jan 2018 03:34:53 +0000 (04:34 +0100)]
radeonsi: use pknorm_i16/u16 and pk_i16/u16 LLVM intrinsics
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Jon Turney [Thu, 18 Jan 2018 13:05:06 +0000 (13:05 +0000)]
travis: add osx autotools build
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jon Turney [Thu, 1 Feb 2018 15:23:49 +0000 (15:23 +0000)]
travis: pip -> pip2
On travis, for OSX, python2 from homebrew is pre-installed. per [1]:
python points to the macOS system Python (with no manual PATH modification)
python2 points to Homebrew’s Python 2.7.x (if installed)
python3 points to Homebrew’s Python 3.x (if installed)
pip doesn't exist
pip2 points to Homebrew’s Python 2.7.x’s pip (if installed)
pip3 points to Homebrew’s Python 3.x’s pip (if installed)
We will end up using 'python2' for building mesa.
Just use 'pip2' instead of 'pip', as that seems to work for all platforms on
travis.
[1] https://docs.brew.sh/Homebrew-and-Python.html
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jon Turney [Thu, 1 Feb 2018 15:19:08 +0000 (15:19 +0000)]
travis: conditionalize building of prerequisites on if OS=linux
Use a '|' YAML literal block to avoid the convoluted syntax needed to put
the entire conditional on a single line.
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jon Turney [Thu, 25 Jan 2018 17:34:54 +0000 (17:34 +0000)]
glx/test: fix building for osx
An additional stub for applegl_create_context() is needed
Cannot test indirect API as it's not built on osx, currently
Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Andres Gomez [Thu, 1 Feb 2018 15:15:14 +0000 (17:15 +0200)]
i965: check if upload is 0 explicitely, when downsizing a format
downsize_format_if_needed takes an integer as number of uploads
parameter. Hence, let's do an integer comparation instead of a boolean
check, since that is confusing.
Since we are at it, fix a couple of wrongly tabbed indents.
Cc: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Marek Olšák [Sun, 7 Jan 2018 17:27:40 +0000 (18:27 +0100)]
mesa: don't flag _NEW_COLOR for KHR adv.blend if prog constant doesn't change
This only affects drivers that set DriverFlags.NewBlend.
v2: - fix typo advanded -> advanced
- return "enum gl_advanced_blend_mode" from
_mesa_get_advanced_blend_sh_constant
- don't call FLUSH_VERTICES twice
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Samuel Pitoiset [Thu, 1 Feb 2018 15:37:15 +0000 (16:37 +0100)]
ac/nir: replace SI.buffer.load.dword with amdgcn.buffer.load
The old one generates useless instructions in there, found while
comparing geometry shaders between RadeonSI and RADV.
This improves all Vulkan demos that use geometry shaders, +4%
for deferredshadows, +9% for viewportarray, +7% for
geometryshader on Polaris10.
This seems to also improve DOW3 a little bit (+1%).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Dave Airlie [Tue, 30 Jan 2018 02:21:59 +0000 (12:21 +1000)]
r600/eg: add crap indirect compute support.
I think the cp packets can be made work, but I think it might
need a kernel change, so for now just do the worst thing.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Thu, 1 Feb 2018 01:31:39 +0000 (17:31 -0800)]
i965: Call prepare_external after implicit window-system MSAA resolves
This fixes some rendering corruption in a couple of Android apps that
use window-system MSAA.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104741
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Roland Scheidegger [Tue, 30 Jan 2018 04:48:27 +0000 (05:48 +0100)]
r600: don't do stack workarounds for hemlock
By the looks of it it seems hemlock is treated separately to cypress, but
certainly it won't need the stack workarounds cedar/redwood (and
seemingly every other eg chip except cypress/juniper) need.
(Discovered by accident.)
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Dave Airlie [Wed, 31 Jan 2018 04:28:26 +0000 (14:28 +1000)]
r600: initial attempt at gl_HelperInvocation (v3)
This passes the CTS and piglit tests.
This also disable sb for helper invocations until it doesn't
mess up the VPM flags.
Thanks to Ilia and Glenn for advice, and Roland for working
out the working evergreen path.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Wed, 31 Jan 2018 11:31:30 +0000 (12:31 +0100)]
radv: Don't expose VK_KHX_multiview on android.
deqp does not allow any KHX extensions, and since deqp is included
in android-cts, android does not allow any khx extensions.
So disable VK_KHX_multiview on android.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: 18.0 <mesa-stable@lists.freedesktop.org>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
vbo: Simplify input array distribution for dlist type draws.
Using the newly introduced VAO array maps, we can
simplify vbo_bind_vertex_list.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
vbo: Simplify input array distribution for imm type draws.
Using the newly introduced VAO array maps, we can
simplify vbo_exec_bind_arrays.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
vbo: Simplify input array distribution for array type draws.
Using the newly introduced VAO state variable, we can
simplify recalculate_input_bindings.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
vbo: Use static const VERT_ATTRIB->VBO_ATTRIB maps.
Instead of each context having its own map instance for
this purpose, use a global static const map.
v2: s,unsigned char,GLubyte,g
s,_VP_MODE_MAX,VP_MODE_MAX,g
Change comment style.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
mesa: Track position/generic0 aliasing in the VAO.
Since the first material attribute no longer aliases with
the generic0 attribute, only aliasing between generic0 and
position is left and entirely dependent on the enabled
state of the VAO. So introduce a gl_attribute_map_mode
in the VAO that is used to track how the position
and the generic 0 attribute alias.
Provide a static const array that can be used to
map from vertex program input indices to VERT_ATTRIB_*
indices. The outer dimension of the array is meant to
be indexed directly by the new VAO member variable.
Also provide methods on the VAO to convert bitmasks of
VERT_BIT's from the VAO numbering to the vertex processing
inputs numbering.
v2: s,unsigned char,GLubyte,g
s,_ATTRIBUTE_MAP_MODE_MAX,ATTRIBUTE_MAP_MODE_MAX,g
Change comment style, add comments.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
mesa: Put materials at the end of the generic block.
The materials are now moved to the end of the
generic attributes block to the range 4-15.
Before, the way the position and generic 0 attribute
is handled was dependent on the presence and kind of
the currently attached vertex program. With this
change the way the position attribute and the generic 0
attribute is treated only depends on the enabled
flag of those two arrays.
This will later help to untangle the update dependencies
between enabled arrays and shader inputs.
v2: s,VERT_ATTRIB_MAT_OFFSET,VERT_ATTRIB_MAT0,g
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
mesa: Use defines for the aliased material array attributes.
Instead of just assuming that the material attributes
just overlap with the generic attributes 0-12, give
them symbolic defines so that we can easier move them
to an other range.
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Mathias Fröhlich [Sat, 27 Jan 2018 15:07:22 +0000 (16:07 +0100)]
vbo: Correctly handle attribute offsets in dlist draw.
When executing a display list draw, for the offset
list to be correct, the offset computation needs to
accumulate all attribute size values in order.
Specifically, if we are shuffling around the position
and generic0 attributes, we may violate the order or
if we do not walk the generic vbo attributes we may
skip some of the attributes.
Even if this is an unlikely usecase we can fix this use
case by precomputing the offsets on the full attribute list
and store the full offset list in the display list node.
v2: Formatting fix
v3: Rebase
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Reviewed-by: Brian Paul <brianp@vmware.com>
Brian Paul [Thu, 1 Feb 2018 20:17:08 +0000 (13:17 -0700)]
gallivm/llvmpipe: add const qualifiers on sampler variables
Once a lp_build_sampler_soa or lp_build_sampler_aos object is created,
it should never be modified. Found by inspection.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Brian Paul [Wed, 31 Jan 2018 23:19:07 +0000 (16:19 -0700)]
vbo: change an argument in vbo_draw_indirect_prims()
In vbo_draw_indirect_prims() pass the 'indirect_data' argument to
vbo->draw_prims(). All the callers are passing ctx->DrawIndirectBuffer
so this should be no functional change. Add a (temporary) assertion to
be sure.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Brian Paul [Wed, 31 Jan 2018 23:15:53 +0000 (16:15 -0700)]
vbo: add comments on the VBO draw function typedefs
And rename indirect_params -> indirect_draw_count_buffer and
indirect_params_offset -> indirect_draw_count_offset to be more
specific.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Brian Paul [Wed, 31 Jan 2018 23:11:12 +0000 (16:11 -0700)]
vbo: s/drawcount/drawcount_offset
This parameter (from the glMultiDrawArraysIndirectCountARB function)
is poorly named. It's an offset into the buffer which contains the
number of primitives to draw.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Brian Paul [Wed, 31 Jan 2018 21:05:46 +0000 (14:05 -0700)]
vbo: use vbo local var for draw call in vbo_save_playback_vertex_list()
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Mathias Fröhlich <mathias.froehlich@web.de>
Brian Paul [Thu, 1 Feb 2018 03:36:35 +0000 (20:36 -0700)]
svga: remove unneeded #includes in svga_pipe_draw.c
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Thu, 1 Feb 2018 03:33:29 +0000 (20:33 -0700)]
svga: whitespace/formatting fixes in svga_pipe_draw.c
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Thu, 1 Feb 2018 03:18:52 +0000 (20:18 -0700)]
svga: clean up retry_draw_range_elements(), retry_draw_arrays()
Get rid of a bunch of goto spaghetti. Remove unneeded do_retry parameter.
No Piglit changes. Also tested w/ Google Earth and other apps.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Brian Paul [Wed, 31 Jan 2018 23:10:39 +0000 (16:10 -0700)]
svga: remove unused min/max_index params to draw_vgpu10()
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Eric Anholt [Wed, 24 Jan 2018 01:13:53 +0000 (12:13 +1100)]
broadcom/vc5: Fix image_h setup for both loads and stores.
The image_h for the tiling algorithm needs to be the padded-to-a-uifblock
height of the level, not the unpadded height or the height of level 0.
Fixes some cases of KHR-GLES3.texture_repeat_mode.* and
depthstencil-render-miplevels.
Eric Anholt [Sun, 21 Jan 2018 02:25:10 +0000 (10:25 +0800)]
broadcom/vc5: Add appropriate height padding for bank conflicts.
I thought I didn't need this because I was doing level-0-always-UIF and
that the pad there would propagate down, but it turns out that for level 1
the padding ends up being chosen by the HW. This brings us closer to
being able to turn on UIF XOR for increased performance, as well.
Eric Anholt [Sun, 21 Jan 2018 00:44:53 +0000 (16:44 -0800)]
broadcom/vc5: Simplify separate stencil surface setup.
If we just make another gallium surface for the separate stencil, it's a
lot easier to keep track of which set of fields we're using in RCL setup.
This also incidentally fixes a little bug in setting up the surface's
padded height for separate stencil when the UIF-ness changes at different
levels of Z versus stencil.