Dave Airlie [Tue, 15 Nov 2016 06:46:50 +0000 (06:46 +0000)]
radv: fix image view creation for depth and stencil only
This fixes the image view for sampling just the depth.
It removes some pointless swizzle code, and adds
a missing case for the x8_d24 format.
Fixes:
dEQP-VK.renderpass.formats.d32_sfloat_s8_uint.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 16 Nov 2016 23:41:29 +0000 (23:41 +0000)]
radv: make sure to flush input attachments correctly.
This fixes 9 of the
dEQP-VK.renderpass.attachment_allocation.input_output.*
tests.
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Brian Paul [Sat, 19 Nov 2016 01:02:44 +0000 (18:02 -0700)]
tnl: remove unneeded #include "util/simple_list.h"
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
Brian Paul [Sat, 19 Nov 2016 13:46:19 +0000 (06:46 -0700)]
radeon: remove unneeded #include "util/simple_list.h"
Compile tested only.
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
Brian Paul [Sat, 19 Nov 2016 13:41:40 +0000 (06:41 -0700)]
r200: remove unneeded #include "util/simple_list.h"
And include "util/simple_list.h" where it is needed in r200_state.c
Compile tested only.
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
Brian Paul [Sat, 19 Nov 2016 13:41:24 +0000 (06:41 -0700)]
i915: remove unneeded #include "util/simple_list.h"
Compile tested only.
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
Brian Paul [Fri, 18 Nov 2016 21:20:55 +0000 (14:20 -0700)]
mesa: remove unneeded #includes in errors.c
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
Brian Paul [Fri, 18 Nov 2016 21:19:13 +0000 (14:19 -0700)]
mesa: remove trailing whitespace in errors.c
Reviewed-by: Vinson Lee <vlee@freedesktop.org>
Kenneth Graunke [Wed, 12 Oct 2016 07:13:55 +0000 (00:13 -0700)]
nir: Add a C wrapper for glsl_type::is_array_of_arrays().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Oct 2016 06:41:09 +0000 (23:41 -0700)]
i965: Store a clip_distance_mask field similar to cull_distance_mask.
This isn't useful for legacy GL, but will be used in Vulkan.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Oct 2016 06:36:30 +0000 (23:36 -0700)]
i965: Use shader_info for brw_vue_prog_data::cull_distance_mask.
This also allows us to move it from a GL specific location to a
part of the compiler shared by both GL and Vulkan.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Oct 2016 05:18:09 +0000 (22:18 -0700)]
compiler: Store the clip/cull distance array sizes in shader_info.
We switched from a boolean to array lengths in gl_program a while back.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 14 Nov 2016 23:59:57 +0000 (15:59 -0800)]
i965: Fix GS push inputs with enhanced layouts.
We weren't taking first_component into account when handling GS push
inputs. We hardly ever push GS inputs, so this was not caught by
existing tests. When I started using component qualifiers for the
gl_ClipDistance arrays, glsl-1.50-transform-feedback-type-and-size
started catching this.
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Sat, 19 Nov 2016 20:29:01 +0000 (12:29 -0800)]
i965: Delete unused variable.
I forgot to delete this in
9ef2b9277d3bead6dbfa47e95794ca61e8be4e84.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Kenneth Graunke [Tue, 15 Nov 2016 19:43:07 +0000 (11:43 -0800)]
intel: Share URB configuration code between GL and Vulkan.
This code is far too complicated to cut and paste.
v2: Update the newly added genX_gpu_memcpy.c; const a few things.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Tue, 15 Nov 2016 09:03:13 +0000 (01:03 -0800)]
i965: Use arrays in Gen7+ URB code.
So much of this code was cut and pasted per stage. We can accomplish
much of it by looping over shader stages.
Improves performance of OglBatch7 (version 6) by 1.50783% +/- 0.287049%
(n = 71) at 1024x768 on Cherryview.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Tue, 15 Nov 2016 10:16:24 +0000 (02:16 -0800)]
i965: Drop brw->urb.{nr_*_entries,*_start} assignments from gen7_urb.c.
The context fields are for Gen4-5; setting them has always been useless.
There's no point in spending the cost in the hottest path in the driver.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Tue, 15 Nov 2016 10:00:59 +0000 (02:00 -0800)]
i965: Switch to roundf in HS/DS URB code.
Matt intentionally switched the VS calculation to be float-based in
commit
c1da15709a0c0c2775bd9e534f67c60f7dc95ce8. Tessellation support
was written before this and rebased forward, and missed the change.
Now it's consistent.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Tue, 15 Nov 2016 08:37:42 +0000 (00:37 -0800)]
i965: Make URB code use prog_data for GS/tessellation enable checks.
If geometry/tessellation shaders are disabled, prog_data will be NULL
(see brw_state_upload.c). This consolidates dirty bits a little.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Tue, 15 Nov 2016 08:07:35 +0000 (00:07 -0800)]
intel: Convert devinfo->urb.min_*_entries into an array.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Kenneth Graunke [Tue, 15 Nov 2016 07:45:16 +0000 (23:45 -0800)]
intel: Convert devinfo->urb.max_*_entries into an array.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Brian Paul [Thu, 17 Nov 2016 14:20:32 +0000 (07:20 -0700)]
docs: document MESA_DEBUG=context
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Ilia Mirkin [Fri, 18 Nov 2016 01:48:40 +0000 (20:48 -0500)]
swr: mark streamout buffers as written
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Timothy Arceri [Mon, 7 Nov 2016 03:47:18 +0000 (14:47 +1100)]
st/mesa/glsl/nir/i965: make use of new gl_shader_program_data in gl_shader_program
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Mon, 7 Nov 2016 03:43:48 +0000 (14:43 +1100)]
mesa: create new gl_shader_program_data struct
This will be used to share data between gl_program and gl_shader_program
allowing for greater code simplification as we can remove a number of
awkward uses of gl_shader_program.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Sat, 19 Nov 2016 00:11:04 +0000 (11:11 +1100)]
glsl: add new program driver function to standalone compiler
This fixes a regression with the standalone compiler caused by
9d96d3803ab5dc
Note that we change standalone_compiler_cleanup() to no longer
explicitly free the linked shaders as the will be freed when
we free the parent ctx whole_program.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98774
Kenneth Graunke [Thu, 17 Nov 2016 02:29:22 +0000 (18:29 -0800)]
i965: Disable depth writes when depth test is GL_EQUAL.
There's no point in performing depth writes when the depth test
comparison function is set to GL_EQUAL - it would just write out
the same value that's already there (if it is written at all). While
this is harmless from a functional perspective, it hurts performance.
Obviously, writing to memory is not free, but there's another more
subtle impact as well: it can prevent early depth optimizations.
Depth writes aren't supposed to happen for pixels that are killed
by fragment shader discard statements or the alpha test. So, with
depth writes enabled and either of those, the pixel shader must be
invoked to determine whether or not to perform the write. This is
fairly stupid in the EQUAL case - we're running a shader to decide
whether to replace the existing depth value with itself.
By disabling these pointless writes, we allow early depth even with
discards and alpha testing, allowing the hardware to skip the pixel
shader altogether if the depth test fails.
Improves performance of Unigine Valley:
- Skylake GT2: +17.8%
- Broadwell GT3e: +11.5%
- Cherrytrail: +19.4%
Huge thanks to Mark Janes for building frameretrace [1], the performance
analysis tool that helped us find this issue, and to Robert Bragg for
providing us performance metrics on Linux. Mark also spent the time to
analyze Valley performance on Windows vs. Linux and discovered a
discrepancy in early depth test metrics. Once he had isolated a draw
call and drawn attention to the problem, fixing it was pretty simple.
[1] https://github.com/janesma/apitrace/wiki/frameretrace-branch
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Timothy Arceri [Fri, 11 Nov 2016 00:45:55 +0000 (11:45 +1100)]
glsl: tidy up entries temporary
Here we just move initialisation of entries to where it is needed i.e.
outside the loop and after the continue checks.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Fri, 11 Nov 2016 00:45:54 +0000 (11:45 +1100)]
glsl/i965: move per stage AtomicBuffers list to gl_program
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Mon, 31 Oct 2016 12:54:03 +0000 (23:54 +1100)]
glsl: create gl_program at the start of linking rather than the end
This will allow us to directly store metadata we want to retain in
gl_program this metadata is currently stored in gl_linked_shader and
will be lost if relinking fails even though the program will remain
in use and is still valid according to the spec.
"If a program object that is active for any shader stage is re-linked
unsuccessfully, the link status will be set to FALSE, but any existing
executables and associated state will remain part of the current
rendering state until a subsequent call to UseProgram,
UseProgramStages, or BindProgramPipeline removes them from use."
This change will also help avoid the double handing that happens in
_mesa_copy_linked_program_data().
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Thu, 17 Nov 2016 01:26:08 +0000 (12:26 +1100)]
st/mesa/i965: simplify gl_program references and stop leaking
In i965 we were calling _mesa_reference_program() after creating
gl_program and then later calling it again with NULL as a param
to get the refcount back down to 1. This changes things to not
use _mesa_reference_program() at all and just have gl_linked_shader
take ownership of gl_program since refcount starts at 1.
The st and ir_to_mesa linkers were worse as they were both getting
in a state were the refcount would never get to 0 and we would leak
the program.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Nanley Chery [Wed, 28 Sep 2016 23:00:52 +0000 (16:00 -0700)]
anv/cmd_buffer: Enable stencil-only HZ clears
The HZ sequence modifies less state than the blorp path and requires
less CPU time to generate the necessary packets.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Wed, 28 Sep 2016 22:51:35 +0000 (15:51 -0700)]
anv/cmd_buffer: Manage Anv state around HZ op emission
Move the assignment to a less surprising location.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Fri, 30 Sep 2016 17:28:18 +0000 (10:28 -0700)]
anv/cmd_buffer: Clarify HZ rectangle behavior
This behavior differs from what's described in the PRMs and was
observed by analyzing CTS test results.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Nanley Chery [Wed, 16 Nov 2016 00:42:23 +0000 (16:42 -0800)]
mesa/fbobject: Update CubeMapFace when reusing textures
Framebuffer attachments can be specified through FramebufferTexture*
calls. Upon specifying a depth (or stencil) framebuffer attachment that
internally reuses a texture, the cube map face of the new attachment
would not be updated (defaulting to TEXTURE_CUBE_MAP_POSITIVE_X).
Fix this issue by actually updating the CubeMapFace field.
This bug manifested itself in BindFramebuffer calls performed on
framebuffers whose stencil attachments internally reused a depth
texture. When binding a framebuffer, we walk through the framebuffer's
attachments and update each one's corresponding gl_renderbuffer. Since
the framebuffer's depth and stencil attachments may share a
gl_renderbuffer and the walk visits the stencil attachment after
the depth attachment, the uninitialized CubeMapFace forced rendering
to TEXTURE_CUBE_MAP_POSITIVE_X.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77662
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Lionel Landwerlin [Thu, 3 Nov 2016 17:18:45 +0000 (17:18 +0000)]
mesa: add NV_image_formats extension support
This extension can be enabled automatically as it is a subset of
ARB_shader_image_load_store.
v2: Replace helper function by qualifier struct field (Ilia)
Enable NV_image_formats using ARB_shader_image_load_store (Ilia)
v3: Drop extension field from gl_extensions (Ilia)
Release notes (Ilia)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98480
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Timothy Arceri [Fri, 18 Nov 2016 00:51:59 +0000 (11:51 +1100)]
mesa: fix old classic drivers to use ralloc for ARB asm programs
These changes were missed in
0ad69e6b5.
Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98767
Nicolai Hähnle [Wed, 16 Nov 2016 09:39:04 +0000 (10:39 +0100)]
st/mesa: silence warnings in optimized builds
Mark variables and static functions that only occur in assert()s as
MAYBE_UNUSED.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Nicolai Hähnle [Tue, 15 Nov 2016 14:42:18 +0000 (15:42 +0100)]
radeonsi: emit sample locations also when nr_samples == 1
Since the state tracker now enables MSAA in the hardware for the case
nr_samples == 1 as well, we need to set sample locations correctly for
this case.
The Polaris override is still needed for the non-MSAA case (when
nr_samples == 0).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Nicolai Hähnle [Tue, 15 Nov 2016 13:37:47 +0000 (14:37 +0100)]
radeonsi: allow sample mask export for single-sample framebuffers
This fixes GL45-CTS.sample_variables.mask.*.samples_1.*.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Nicolai Hähnle [Tue, 15 Nov 2016 13:47:25 +0000 (14:47 +0100)]
st/mesa: remove a redundant call to _mesa_is_multisample_enabled
We called it immediately prior, so re-use the previously returned value.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Nicolai Hähnle [Tue, 15 Nov 2016 13:37:05 +0000 (14:37 +0100)]
mesa/main: consider multisampling enabled when number of samples == 1
There are some differences between how non-multisampled framebuffers (i.e.
samples == 0) and multisampled framebuffers with a single sample should be
treated. For example, alpha to coverage and writing to gl_SampleMask has an
effect with single-sample multisample framebuffers, but not on
non-multisample framebuffers.
This fixes GL45-CTS.sample_variables.mask.*.samples_1.* at least for
Gallium drivers (and possibly others, though at least radeonsi needs an
additional fix).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Kenneth Graunke [Fri, 18 Nov 2016 08:28:44 +0000 (00:28 -0800)]
i965: Delete fs_visitor::nir_setup_single_output_varying prototype.
I deleted this function in
59864e8e02057cc6fa0448a8af067a3cf53389da.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tapani Pälli [Wed, 17 Aug 2016 07:37:45 +0000 (10:37 +0300)]
mesa: fix empty program log length
In case we have empty log (""), we should return 0. This fixes
Khronos WebGL conformance test 'program-infolog'.
From OpenGL ES 3.1 (and OpenGL 4.5 Core) spec:
"If pname is INFO_LOG_LENGTH , the length of the info log, including
a null terminator, is returned. If there is no info log, zero is
returned."
v2: apply same fix for get_shaderiv and _mesa_GetProgramPipelineiv (Ian)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97321
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Roland Scheidegger [Sat, 12 Nov 2016 21:46:58 +0000 (22:46 +0100)]
draw: finally optimize bool clip mask generation
lp_build_any_true_range is just what we need, though it will only produce
optimal code with sse41 (ptest + set) - but even without it on 64bit x86
the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's
going to be roughly the same as before.
While here also make it a "real" 8bit boolean - cuts one instruction but
more importantly similar to ordinary booleans.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Sat, 12 Nov 2016 21:46:32 +0000 (22:46 +0100)]
draw: use vectorized calculations for fetch (v2)
Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be optimized
away too), where things are still scalar.
To eliminate control flow in the main shader loop fetch, provide fake
buffers (so index 0 is always valid to fetch).
Still uses aos fetch though in the end - mostly because some more code
would be needed to handle unaligned fetches in that path, and because for
most formats it won't make a difference anyway (we generate some truly
horrendous code for things like R16G16_something for instance).
Instanced fetch however stays roughly the same as before, except that
no longer the same element is fetched multiple times (I've seen a reduction
of ~3 times in main shader loop size due to llvm not recognizing it's all
the same fetch, since it would have been possible some of the fetches
getting replaced with zeros in case vector size exceeds remaining fetch
count - the values of such fetches don't matter at all though).
Also, for elts gathering, use vectorized code as well.
The generated shaders are smaller and faster to compile (not entirely sure
about execution speed, but generally unless there's just single vertices
to handle I would expect it to be faster - there's more opportunities
for future improvements by using soa fetch).
v3: skip the fake index buffer, not needed due to the jit code never seeing
the real index buffer in the first place.
Fix a bug with mask expansion (needs SExt, not ZExt).
Also, be really really careful to keep the behavior the same, even in cases
where it looks wrong, and add comments why the code is doing the seemingly
wrong stuff... Fortunately it's not actually more complex in the end...
Also change function order slightly just to make the diff more readable.
No piglit change. Passes some internal testing with another api too...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Jordan Justen [Wed, 16 Nov 2016 01:55:41 +0000 (17:55 -0800)]
i965/gen7: Minify blit size for stencil tree copy
Found by the piglit 'fbo-depth-array stencil-clear' test when
implementing blorp blit splitting for gen7.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 15 Nov 2016 19:53:33 +0000 (11:53 -0800)]
mesa: Drop PATH_MAX usage.
GNU/Hurd does not define PATH_MAX since it doesn't have such arbitrary
limitation, so this failed to compile. Apparently glibc does not
enforce PATH_MAX restrictions anyway, so it's kind of a hoax:
https://www.gnu.org/software/libc/manual/html_node/Limits-for-Files.html
MSVC uses a different name (_MAX_PATH) as well, which is annoying.
We don't really need it. We can simply asprintf() the filenames.
If the filename exceeds an OS path limit, presumably fopen() will
fail, and we already check that. (We actually use ralloc_asprintf
because Mesa provides that everywhere, and it doesn't look like we've
provided an implementation of GNU's asprintf() for all platforms.)
Fixes the build on GNU/Hurd.
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98632
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Kenneth Graunke [Thu, 17 Nov 2016 04:24:25 +0000 (20:24 -0800)]
i965: Fix compute shader crash.
Fixes crashes when starting Deus Ex: Mankind Divided.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Jason Ekstrand [Fri, 28 Oct 2016 09:03:36 +0000 (02:03 -0700)]
anv/TODO: Check off render buffer compression
There's still a tiny bit of work to do for storage images but it's
otherwise pretty much done at this point.
Jason Ekstrand [Tue, 25 Oct 2016 17:59:03 +0000 (10:59 -0700)]
anv: Enable "permanent" compression for immutable format images
This commit extends our support of color compression to surfaces without
the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT set. These images will never have
an image view created with a different format then the one set at image
creation time so it's safe to always use compression. We still bail if the
image is used as a storage image because that sometimes ends up using a
different format.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 26 Oct 2016 09:27:01 +0000 (02:27 -0700)]
intel/blorp: Properly handle color compression in blorp_copy
Previously, blorp copy operations were CCS-unaware so you had to perform
resolves on the source and destination before performing the copy. This
commit makes blorp_copy capable of handling CCS-compressed images without
any resolves.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 26 Oct 2016 08:58:16 +0000 (01:58 -0700)]
intel/blorp: Always use UINT formats on SKL+
Many of these UINT formats aren't available prior to Sky Lake so we used
UNORM formats. Using UINT formats is a bit nicer because it guarantees we
don't run into rounding issues. Also, we will need it in the next commit
for handling copies with CCS enabled.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 16 Nov 2016 21:47:13 +0000 (13:47 -0800)]
i965/blorp: Rework resolve handling
This commit moves the handling of resolves into blorp_surf_for_miptree().
Instead of each helper doing resolves and checks itself, it simply tells
blorp_surf_for_miptree which aux modes are supported by the given blorp
operation and blorp_surf_for_miptree will resolve as-needed.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Tue, 25 Oct 2016 17:32:18 +0000 (10:32 -0700)]
anv/image: Add an aux_usage field for "default" aux
Initially, the field is set to ISL_AUX_USAGE_NONE so this commit shouldn't
bring any functional changes. Setting this field to something else will
cause all sampled and storage image views to be created with AUX and blorp
will start trying to respect it so set with care.
Jason Ekstrand [Tue, 25 Oct 2016 05:03:45 +0000 (22:03 -0700)]
anv: Add initial support for Sky Lake color compression
This commit adds basic support for color compression. For the moment,
color compression is only enabled within a render pass and a full resolve
is done before the render pass finishes. All texturing operations still
happen with CCS disabled.
Jason Ekstrand [Thu, 27 Oct 2016 09:11:56 +0000 (02:11 -0700)]
anv/pass: Precompute some subpass usage information
Jason Ekstrand [Thu, 27 Oct 2016 08:33:39 +0000 (01:33 -0700)]
util/vk_alloc: Add a vk_zalloc2 helper
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Tue, 25 Oct 2016 02:35:34 +0000 (19:35 -0700)]
anv/image: Memset all aux surfaces (not just HiZ) to 0
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Tue, 25 Oct 2016 02:31:36 +0000 (19:31 -0700)]
anv/image: Rename hiz_surface to aux_surface
Jason Ekstrand [Fri, 28 Oct 2016 05:42:02 +0000 (22:42 -0700)]
anv/blorp: Ignore clears for attachments first used as resolve destinations
Otherwise, we'll try to clear it the first time it's used as a draw so if
you do some multisampled rendering, resolve to an attachment, and then draw
on top of the single-sampled attachment, we might accidentally clear it.
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Tue, 25 Oct 2016 17:48:12 +0000 (10:48 -0700)]
intel/blorp: Take a fast_clear_op in ccs_resolve
Eventually, we may want to just have a single blorp_ccs_op function that
does both clears and resolves. For now we'll stick to just making the
ccs_resolve function we have now a bit more configurable.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Pohjolainen, Topi [Tue, 11 Oct 2016 19:26:35 +0000 (22:26 +0300)]
intel/blorp: Add plumbing for color resolve slice details
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Thu, 27 Oct 2016 05:56:53 +0000 (22:56 -0700)]
intel/isl: Allow non-2D CCS surfaces
The CCS calculations in ISL are already correct for 1-D and 3-D CCS
surfaces since they have exactly the same layout as 2-D array surfaces (at
least on Sky Lake). The only problem was that we weren't passing in the
right dimensionality and we weren't passing in the depth.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 21 Sep 2016 11:03:27 +0000 (04:03 -0700)]
intel/isl: Rework the asserts and fails in isl_surf_get_ccs
There are some invariants such as number of samples on which we should
assert. However, most other things should silently return false since
they're much easier for isl_surf_get_ccs to check than the caller. We also
update the checking to be a bit more complete.
Jason Ekstrand [Tue, 25 Oct 2016 02:25:20 +0000 (19:25 -0700)]
anv/cmd_buffer: Refactor surface state relocation handling
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Tue, 25 Oct 2016 02:50:20 +0000 (19:50 -0700)]
anv/cmd_buffer: Pull add_surface_state_reloc into genX_cmd_buffer.c
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Wed, 21 Sep 2016 07:05:58 +0000 (00:05 -0700)]
anv/image: Stop force-disabling AUX
Auxiliary surfaces have to be created manually anyway so force-disabling it
does nothing whatsoever at the moment.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Tom Stellard [Wed, 16 Nov 2016 21:21:15 +0000 (21:21 +0000)]
mesa: Add missing call to _mesa_unlock_debug_state(ctx); v2
cd724208d3e1e3307f84a794f2c1fc83b69ccf8a added a call to
_mesa_lock_debug_state(ctx) but wasn't unlocking the debug state.
This fixes a hang in glsl-fs-loop piglit test with MESA_DEBUG=context.
v2:
- Remove unrelated changes.
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Engestrom [Wed, 16 Nov 2016 22:29:53 +0000 (22:29 +0000)]
egl: fix helper function name
I introduced this code last month, but didn't follow the naming
convention. Fix this.
Fixes: 0a606a400fe382a9bc72 ("egl: add eglSwapBuffersWithDamageKHR")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Eric Engestrom [Tue, 15 Nov 2016 23:48:52 +0000 (23:48 +0000)]
egl/x11: misc style fixes
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Eric Engestrom [Tue, 15 Nov 2016 23:41:38 +0000 (23:41 +0000)]
egl: fix function name in debug string
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Jason Ekstrand [Fri, 11 Nov 2016 06:31:32 +0000 (22:31 -0800)]
nir/spirv: Fix handling of gl_PrimitiveId
Before, we were always treating it as an output which bogus. The only
stage in which this it can be an output is the geometry stage. In all
other stages, it's an input which, in the back-end, we actually want to be
a system value.
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Fri, 11 Nov 2016 05:46:13 +0000 (21:46 -0800)]
anv/fence: Handle ANV_FENCE_CREATE_SIGNALED_BIT
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Fri, 11 Nov 2016 05:32:32 +0000 (21:32 -0800)]
anv: Handle null in all destructors
This fixes a bunch of new CTS tests which look for exactly this. Even in
the cases where we just call vk_free to free a CPU data structure, we still
handle NULL explicitly. This way we're less likely to forget to handle
NULL later should we actually do something less trivial.
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Fri, 11 Nov 2016 05:12:16 +0000 (21:12 -0800)]
util/vk_alloc: Ensure NULL is handled correctly in vk_free
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Thu, 10 Nov 2016 02:45:21 +0000 (18:45 -0800)]
anv/device: Silence a 32-bit warning
Eric Anholt [Mon, 7 Nov 2016 18:34:01 +0000 (10:34 -0800)]
nir: Avoid an extra NIR op in integer divide lowering.
NIR bools are ~0 for true, so ((unsigned)a >> 31) != 0 -> ((int)a >> 31).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Eric Anholt [Fri, 11 Nov 2016 01:47:34 +0000 (17:47 -0800)]
vc4: Try compiling our FSes in multithreaded mode on new kernels.
Multithreaded fragment shaders let us hide texturing latency by a
hyperthreading-style switch to another fragment shader. This gets us up
to 20% framerate improvements on glmark2 tests.
Eric Anholt [Thu, 17 Nov 2016 00:57:45 +0000 (16:57 -0800)]
vc4: Add support for ETC1 textures if the kernel is new enough.
The kernel changes for exposing the param have now been merged, so we can
expose it here.
Eric Anholt [Thu, 17 Nov 2016 01:22:35 +0000 (17:22 -0800)]
vc4: Fix simulator mode missing-GETPARAM debug info.
The value is 0 since we didn't set it, we wanted to see the param.
Mun Gwan-gyeong [Wed, 16 Nov 2016 19:17:39 +0000 (04:17 +0900)]
vc4: Fix resource leak in register allocation failure path.
CID
1394322
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Timothy Arceri [Mon, 3 Oct 2016 06:04:26 +0000 (17:04 +1100)]
glsl: stub out _mesa_reference_program() in standalone compiler
The follow patch will call this directly from the linker, the
shader cache will also start calling these from the compiler.
Timothy Arceri [Wed, 16 Nov 2016 23:52:28 +0000 (10:52 +1100)]
st/mesa/r200/i915/i965: move ARB program fields into a union
It's common for games to compile 2000 programs or more so at
32bits x 2000 programs x 22 fields x 2 (at least) stages
This should give us something like 352 kilobytes in savings
once we add some more glsl only fields.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Wed, 16 Nov 2016 23:51:19 +0000 (10:51 +1100)]
st/mesa: stop initialing Instructions and NumInstructions
Since gl_program is now created with rzalloc() they should
already be initialised.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Sat, 5 Nov 2016 11:35:41 +0000 (22:35 +1100)]
mesa: make use of ralloc when creating ARB asm gl_program fields
This will allow us to move the ARB asm fields in gl_program into
a union as we will be able call ralloc_free() on the entire struct
when destroying the context.
In this change we switch over to using ralloc for the Instructions,
String and LocalParams fields of gl_program.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Sat, 5 Nov 2016 11:27:22 +0000 (22:27 +1100)]
mesa: remove unused Comment field in prog_instruction
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Thu, 27 Oct 2016 08:15:19 +0000 (19:15 +1100)]
i965: get num_abos from shader_info rather than gl_linked_shader
This is a step towards freeing gl_linked_shader after linking.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Thu, 27 Oct 2016 05:17:19 +0000 (16:17 +1100)]
mesa/glsl: copy num_abos to gl_program
We should be able to free gl_linked_shader after linking in order to
do so we need to switch to getting values from gl_program instead.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Thu, 27 Oct 2016 04:59:46 +0000 (15:59 +1100)]
i965: get num_images from shader_info rather than gl_linked_shader
This is a step towards freeing gl_linked_shader after linking.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Thu, 27 Oct 2016 03:47:09 +0000 (14:47 +1100)]
mesa/glsl: copy num_images to gl_program
We should be able to free gl_linked_shader after linking in order to
do so we need to switch to getting values from gl_program instead.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Thu, 27 Oct 2016 08:13:05 +0000 (19:13 +1100)]
nir: add support for counting AoA uniforms in nir_shader_gather_info()
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Timothy Arceri [Thu, 3 Nov 2016 10:47:04 +0000 (21:47 +1100)]
i965: only try print GLSL IR once when using INTEL_DEBUG to dump ir
Since we started releasing GLSL IR after linking the only time we can
print GLSL IR is during linking. When regenerating variants only NIR
will be available.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jason Ekstrand [Fri, 11 Nov 2016 00:27:47 +0000 (16:27 -0800)]
anv/descriptor_set: Put the whole state in the state free list
We're not really saving much by just putting the offset in there.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Jason Ekstrand [Fri, 11 Nov 2016 00:43:35 +0000 (16:43 -0800)]
anv/descriptor_set: Write the state offset in the surface state free list.
When Kristian reworked descriptor set allocation, somehow he forgot to
actually store the offset in the free list. Somehow, this completely
missed CTS testing until now... This fixes all 2744 of the new
'dEQP-VK.texture.filtering.* tests in the latest CTS.
Cc: "12.0 13.0" <mesa-dev@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Timothy Arceri [Wed, 16 Nov 2016 03:02:12 +0000 (14:02 +1100)]
compiler: remove now unused copy_shader_info() declaration
Left over from
4ac66861
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Timothy Arceri [Wed, 16 Nov 2016 03:02:11 +0000 (14:02 +1100)]
compiler: include shader_enums.h in shader_info.h
We make use of some enums here.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tim Rowley [Wed, 16 Nov 2016 01:44:45 +0000 (19:44 -0600)]
swr: [rasterizer core] fix clear with multiple color attachments
Fixes fbo-mrt-alphatest
v2: styling fixes
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Ben Widawsky [Wed, 16 Nov 2016 19:39:29 +0000 (11:39 -0800)]
Partial revert "i965: "Fix" aux offsets"
This partially reverts commit
0d241085f723402120b4b47e939fe77020a16d80.
HiZ buffer cannot do this properly now, and it's not required, so remove
it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ben Widawsky [Wed, 16 Nov 2016 01:35:37 +0000 (17:35 -0800)]
i965: "Fix" aux offsets
When 1 BO is used for aux data, it needs to point to the correct offset,
which will not be the BOs offset but instead an offset from the BOs
offset. Since today there are always multiple BOs for aux, this doesn't
actually change anything.
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>