Ilia Mirkin [Wed, 9 Nov 2016 20:13:26 +0000 (15:13 -0500)]
swr: remove unnecessary -1 entries in format mapping table
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Ilia Mirkin [Wed, 9 Nov 2016 22:16:36 +0000 (17:16 -0500)]
swr: rework resource layout and surface setup
This is a bit of a mega-commit, but unfortunately there's no great way
to break this up since a lot of different pieces have to match up. Here
we do the following:
- change surface layout to match swr's Load/StoreTile expectations
- fix sampler settings to respect all sampler view parameters
- fix stencil sampling to read from secondary resource
- respect pipe surface format, level, and layer settings
- fix resource map/unmap based on the new layout logic
- fix resource map/unmap to copy proper parts of stencil values in and
out of the matching depth texture
These fix a massive quantity of piglits, including all the
tex-miplevel-selection ones.
Note that the swr native miptree layout isn't extremely space-efficient,
and we end up using it for all textures, not just the renderable ones. A
back-of-the-envelope calculation suggests about 10%-25% increased memory
usage for miptrees, depending on the number of LODs. Single-LOD textures
should be unaffected.
There are a handful of regressions as a result of this change:
- Some textureGrad tests, these failures match llvmpipe. (There are
debug settings allowing improved gallivm sampling accurancy.)
- Some layered clearing tests as swr doesn't currently support that. It
was getting lucky before because enough other things were broken.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Charmaine Lee [Tue, 22 Nov 2016 21:33:37 +0000 (13:33 -0800)]
util: fix missing swizzle components in the SINT <-> UINT conversion string
Fixes tgsi error introduced in commit
3817a7a. The error complains missing
swizzle component in the conversion string "UMIN TEMP[0], TEMP[0], IMM[0].x".
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Eric Anholt [Tue, 22 Nov 2016 07:52:37 +0000 (23:52 -0800)]
vc4: Don't conditionalize the src1 mov of qir_SEL().
My thought in having both arguments conditionally moved was that it should
theoretically save some power by not doing work in those channels.
However, it ends up costing us instructions because we can't
register-coalesce the first of the MOVs, and it also introduces extra
scheduling dependencies. The instruction cost would swamp whatever power
benefit I was hoping for.
shader-db results:
total instructions in shared programs: 100548 -> 99741 (-0.80%)
instructions in affected programs: 42450 -> 41643 (-1.90%)
With obvious outliers removed (I had an X11 emacs running over the network
in the "after" case), 3DMMES Taiji showed 1.07231% +/- 0.488241% fps
improvement (n=18, 30).
Eric Anholt [Tue, 22 Nov 2016 07:29:04 +0000 (23:29 -0800)]
vc4: Re-add R4 to the "any" register class.
I screwed this up in
fdad4d24024ab7bc9b6b9cb6288f8b76ccac0d89 which was
supposed to be making this code more maintainable. What's amazing is
multithreaded FS showed the wins it did despite this bug.
shader-db results:
total instructions in shared programs: 103535 -> 100548 (-2.89%)
instructions in affected programs: 83794 -> 80807 (-3.56%)
Eric Anholt [Tue, 22 Nov 2016 21:51:03 +0000 (13:51 -0800)]
vc4: Disable MSAA rasterization when the job binning is single-sampled.
Gallium core just changed to start setting MSAA enabled in the rasterizer
state even with samples==1 buffers. This caused disagreements in our
driver between binning and rasterization state, which the simulator threw
assertion failures about. Keep the single-sampled samples==1 behavior for
now.
Eric Anholt [Tue, 22 Nov 2016 21:31:46 +0000 (13:31 -0800)]
vc4: Make sure we don't overflow texture input/output FIFOs when threaded.
I dropped the first hunk of this change last minute when I decided it
wasn't actually needed, and apparently failed to piglit it in simulation.
The simulator threw an an assertion in gl-1.0-drawpixels-color-index,
which queued up 5 coordinates (3 before a switch, two after) before
loading the result.
Dave Airlie [Tue, 25 Oct 2016 06:23:48 +0000 (07:23 +0100)]
radv: move pipeline barrier image transitions after src flushing
This seems like it would conform better with the spec.
noticed while digging into fast clears.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jason Ekstrand [Sat, 19 Nov 2016 16:47:25 +0000 (08:47 -0800)]
anv: Enable fast clears on gen7-8
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 18 Nov 2016 06:55:30 +0000 (22:55 -0800)]
anv: Add support for fast clears on gen9
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Sat, 19 Nov 2016 16:25:28 +0000 (08:25 -0800)]
anv/blorp: Rework flushing around resolves
It turns out that the flushing required around resolves is a bit more
extensive than I first thought. You actually need render cache flush
and a CS stall both before *and* after the resolve.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Sat, 19 Nov 2016 01:39:26 +0000 (17:39 -0800)]
anv/cmd_buffer: Apply remaining flushes in EndCommandBuffer
Otherwise, some pipe flushes may just never happen. This is unlikely to
cause problems depending on how the kernel schedules batches, but we
shouldn't count on it.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 18 Nov 2016 21:35:42 +0000 (13:35 -0800)]
anv/blorp: Use regular blorp clears for subpass clears
At vkCmdNextSubpass time, we have the actual framebuffer so we can use
regular blorp_clear for subpass clears. For fast clears, there is no
attachment version, so this will make fast clears a bit easier.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 18 Nov 2016 21:35:16 +0000 (13:35 -0800)]
anv: Add a vk_to_isl_color helper
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 18 Nov 2016 06:26:52 +0000 (22:26 -0800)]
anv/cmd_buffer: Make setup_attachments take a RenderPassBeginInfo
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Tue, 15 Nov 2016 23:25:55 +0000 (15:25 -0800)]
anv: Set up binding tables and surface states for input attachments
This commit adds the last remaining bits to support input attachments in
the Intel Vulkan driver. For color and depth attachments, we allocate an
input attachment surface state during vkCmdBeginRenderPass like we do for
the render target surface states. This is so that we can incorporate the
clear color and aux information as used in rendering. For stencil, we just
treat it like a regular texture because we don't there is no aux. Also,
only having to worry about at most one input attachment surface for each
attachment makes some of the vkCmdBeginRenderPass code simpler.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Wed, 16 Nov 2016 18:39:15 +0000 (10:39 -0800)]
anv/pipeline: Handle depth/stencil self-dependencies
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Wed, 16 Nov 2016 07:11:55 +0000 (23:11 -0800)]
anv: Use pass attachment information to insert flushes
Input and resolve attachments can cause an implicit dependency in the
pipeline. It's our job to insert the needed flushes. Fortunately, we can
easily reuse the usage tracking that we use for CCS resolves.
This fixes 159 Vulkan CTS tests on Haswell because we're now flushing in
between drawing and MSAA resolves. I have no idea how they were passing
before on newer hardware.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Wed, 16 Nov 2016 19:20:50 +0000 (11:20 -0800)]
anv/cmd_buffer: Fix pipeline barriers for input attachments
We were using VK_IMAGE_ACCESS_COLOR_ATTACHMENT_READ_BIT to detect an input
attachment read. We should use VK_IMAGE_ACCESS_INPUT_ATTACHMENT_READ_BIT
instead.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Tue, 15 Nov 2016 23:21:08 +0000 (15:21 -0800)]
anv/pipeline: Add a input_attachment_index to the bindings
This allows us to go from the binding to either the descriptor or the input
attachment at will.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Tue, 15 Nov 2016 01:18:47 +0000 (17:18 -0800)]
anv/pass: Calculate the combined image usage of attachments
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Mon, 14 Nov 2016 22:23:36 +0000 (14:23 -0800)]
anv: Add an input attachment lowering pass
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Tue, 15 Nov 2016 23:18:32 +0000 (15:18 -0800)]
i965/fs: Implement load_layer_id for fragment shaders
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Tue, 15 Nov 2016 23:15:48 +0000 (15:15 -0800)]
nir: Add a layer_id system value intrinsic
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Wed, 16 Nov 2016 00:55:54 +0000 (16:55 -0800)]
spirv: Stop warning about input attachments
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Mon, 14 Nov 2016 22:22:56 +0000 (14:22 -0800)]
spirv: Handle the InputAttachmentIndex decoration
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Tue, 15 Nov 2016 02:11:07 +0000 (18:11 -0800)]
compiler: Add the rest of the subpassInput types
There are actually 6 of them according to the GL_KHR_vulkan_glsl spec.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Tue, 22 Nov 2016 17:35:01 +0000 (09:35 -0800)]
anv/cmd_buffer: Emit CS push constants after binding tables
Emitting binding tables can cause push constants to be dirtied if the
shader uses images so we need to handle push constants later.
Marek Olšák [Tue, 22 Nov 2016 17:28:18 +0000 (18:28 +0100)]
gallium: fix more occurences of u_hash.h
this fixes compile failures since
86514d84e0beec47c82da4888db12bf07f33cb83
Marek Olšák [Fri, 18 Nov 2016 20:00:27 +0000 (21:00 +0100)]
mesa: use special checksums for unset checksums and fixed-func shaders
for debugging
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Marek Olšák [Fri, 18 Nov 2016 18:49:55 +0000 (19:49 +0100)]
glsl: add gl_linked_shader::SourceChecksum
for debugging
v2: wrap all checksums in #ifdef DEBUG
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Marek Olšák [Fri, 18 Nov 2016 18:23:55 +0000 (19:23 +0100)]
mesa: use util_hash_crc32 instead of _mesa_str_checksum
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Marek Olšák [Fri, 18 Nov 2016 18:20:54 +0000 (19:20 +0100)]
util: import CRC32 implementation from gallium
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Jason Ekstrand [Tue, 22 Nov 2016 16:11:45 +0000 (08:11 -0800)]
anv/cmd_buffer: Add an assert on emit_binding_table failure
The != VK_SUCCESS case is really only capable of handling the one error.
This assert makes things a bit safer if something else goes wrong.
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Lucas Stach [Tue, 22 Nov 2016 10:12:35 +0000 (11:12 +0100)]
gbm: request correct version of the DRI2_FENCE extension
There is no version 2 of the DRI2_FENCE extension. So only a request
for version 1 has a chance to succeed.
Fixes: 74b1969d717f (gbm: wire up fence extension)
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Jason Ekstrand [Tue, 22 Nov 2016 04:22:53 +0000 (20:22 -0800)]
anv/cmd_buffer: Emit a CS stall before setting a CS pipeline
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Tue, 22 Nov 2016 04:21:24 +0000 (20:21 -0800)]
anv/cmd_buffer: Re-emit MEDIA_CURBE_LOAD when CS push constants are dirty
This can happen even if the binding table isn't changed. For instance, you
could have dynamic offsets with your descriptor set. This fixes the new
stress.lots-of-surface-state.cs.dynamic cricible test.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Tue, 22 Nov 2016 04:17:24 +0000 (20:17 -0800)]
anv/cmd_buffer: Handle running out of binding tables in compute shaders
If we try to allocate a binding table and fail, we have to get a new
binding table block, re-emit STATE_BASE_ADDRESS, and then try again. We
already handle this correctly for 3D and blorp but it never got handled for
CS. This fixes the new stress.lots-of-surface-state.cs.static crucible test.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Jason Ekstrand [Tue, 8 Nov 2016 21:21:39 +0000 (13:21 -0800)]
i965/compiler: Disable trig workarounds on KBL+
The precision of our trig instructions appears to have been fixed on Kaby
Lake. Neither Ben nor I can find any documentation for this. However, the
dEQP precision tests now pass with INTEL_PRECISE_TRIG=0 where they fail on
Sky Lake.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Tue, 8 Nov 2016 21:21:38 +0000 (13:21 -0800)]
intel/common: Add an is_kabylake field to gen_device_info
Most of the 3-D engine Kaby Lake is identical to Sky Lake. However, there
are a few small differences that we need to be able to detect.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gwan-gyeong Mun [Sun, 20 Nov 2016 11:44:22 +0000 (20:44 +0900)]
anv: Fix unintentional integer overflow in anv_CreateDmaBufImageINTEL
Since both pCreateInfo->strideInBytes and pCreateInfo->extent.height
are of uint32_t type 32-bit arithmetic will be used.
Fix unintentional integer overflow by casting to uint64_t before
multifying.
CID
1394321
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
[Emil Velikov: cast only of the arguments]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gwan-gyeong Mun [Mon, 21 Nov 2016 15:21:23 +0000 (00:21 +0900)]
util/disk_cache: close a previously opened handle in disk_cache_put (v2)
We're missing the close() to the matching open().
CID
1373407
v2: Fixes from Emil Velikov's review
Update the teardown in reverse order of the setup/init.
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
Gwan-gyeong Mun [Tue, 22 Nov 2016 11:51:52 +0000 (20:51 +0900)]
docs: get rid of duplicated description from sourcetree.html
Fixes: 438086efb17 (docs: sourcetree.html misc updates)
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Mon, 21 Nov 2016 17:53:33 +0000 (17:53 +0000)]
docs/submitting patches: mention get_reviewers.pl
Mention the script - why/how to use alongside a useful trick to make it
work interactively (thanks Rob!).
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Timothy Arceri [Mon, 21 Nov 2016 16:30:12 +0000 (16:30 +0000)]
docs/submitting patches: add git tips
v2: [Emil Velikov]
- Add the shorthand git send-email -vX
- Move to submittingpatches.html
- Add to the TOC.
v3: [Emil Velikov]
- Use @~8 instead of HEAD~8 (Nicolai)
Cc: Timothy Arceri <t_arceri@yahoo.com.au>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
Emil Velikov [Mon, 21 Nov 2016 13:46:52 +0000 (13:46 +0000)]
auxiliary/vl/dri: call get_xcb_screen() only once
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Emil Velikov [Mon, 21 Nov 2016 13:46:51 +0000 (13:46 +0000)]
egl/x11: store xcb_screen_t *screen instead of int screen
Just fetch and store it once, rather than doing the
xcb_setup_roots_iterator + get_xcb_screen dance five times.
v2: Call xcb_disconnect() on error (Eric)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Emil Velikov [Mon, 21 Nov 2016 13:46:50 +0000 (13:46 +0000)]
egl/x11: factor out dri2_get_xcb_connection()
Identical throughout dri2, dri3 and drisw. Next patch will add more
common code, so rather than duplicating it factor out the function.
Note: this also sets eglError on failure. Something that's quite
inconsistent throughout the codebase.
v2: Call xcb_disconnect() on error (Eric)
Note: use xcb_disconnect() even in the xcb_connection_has_error() case
as per the manual:
... memory will not be freed until xcb_disconnect...
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Timothy Arceri [Tue, 22 Nov 2016 06:59:41 +0000 (17:59 +1100)]
mesa/glsl: remove unused uses_builtin_functions field
This has been unused since
943b69cddd
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Kenneth Graunke [Mon, 17 Oct 2016 21:23:10 +0000 (14:23 -0700)]
i965: Use NIR-based clip/cull lowering for OpenGL as well.
The old approach works fine, and this approach isn't necessarily better.
But it at least has the advantage that Vulkan and GL use the same
approach. I originally wrote it to gain additional testing for the
new paths.
shader-db statistics show 0 instruction count changes.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Oct 2016 03:44:38 +0000 (20:44 -0700)]
anv: Enable clip and cull distance support.
Everything is now in place, and we appear to pass the tests on Gen7+.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Mon, 17 Oct 2016 18:14:10 +0000 (11:14 -0700)]
i965/vec4: Handle component qualifiers on non-generic varyings.
ARB_enhanced_layouts only requires component qualifier support for
generic varyings, so this is all the vec4 backend knew how to handle.
This patch extends the backend to handle it for all varyings, so we
can use store_output intrinsics with a component set for things like
clip/cull distances. We may want to use that for other VUE header
fields in the future as well.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Kenneth Graunke [Tue, 4 Oct 2016 08:59:33 +0000 (01:59 -0700)]
i965/fs: Handle compact outputs.
We need to calculate the number of vec4 slots correctly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Oct 2016 06:46:37 +0000 (23:46 -0700)]
spirv: Silence unsupported capability warnings for Clip/CullDistance.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Oct 2016 06:44:07 +0000 (23:44 -0700)]
anv: Set clip/cull distances fields in packets.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Oct 2016 03:42:42 +0000 (20:42 -0700)]
anv: Combine ClipDistance and CullDistance arrays.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Oct 2016 05:37:40 +0000 (22:37 -0700)]
nir: add a pass to compact clip/cull distances.
v2: Use nir_is_per_vertex_io() rather than is_arrays_of_arrays().
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Kenneth Graunke [Tue, 4 Oct 2016 03:32:22 +0000 (20:32 -0700)]
nir: Add a "compact array" flag and IO lowering code.
Certain built-in arrays, such as gl_ClipDistance[], gl_CullDistance[],
gl_TessLevelInner[], and gl_TessLevelOuter[] are specified as scalar
arrays. Normal scalar arrays are sparse - each array element usually
occupies a whole vec4 slot. However, most hardware assumes these
built-in arrays are tightly packed.
The new var->data.compact flag indicates that a scalar array should
be tightly packed, so a float[4] array would take up a single vec4
slot, and a float[8] array would take up two slots.
They are still arrays, not vec4s, however. nir_lower_io will generate
intrinsics using ARB_enhanced_layouts style component qualifiers.
v2: Add nir_validate code to enforce type restrictions.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Dave Airlie [Tue, 22 Nov 2016 04:17:49 +0000 (04:17 +0000)]
radv: add support for shader stats dump
I've started working on a shader-db alike for Vulkan,
it's based on vktrace and it records pipelines, this
adds support to dump the shader stats exactly like
radeonsi does, so I can reuse the shader-db scripts it
uses.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 21 Nov 2016 01:12:39 +0000 (01:12 +0000)]
radv: fix sample id loading
The sample id is packed into bits 8-12, so adjust
things properly.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 16 Nov 2016 03:54:22 +0000 (03:54 +0000)]
radv/ac: add implementation of load_sample_pos intrinsic.
This fixes a bunch of crashes in CTS tests looking for this.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Sun, 20 Nov 2016 23:56:45 +0000 (23:56 +0000)]
radv/ac: cleanup ddxy emission
This cleans up the ddxy emission along the same lines as
radeonsi. It also means we don't use LDS on VI chips we
use the dspermute interface, it also removes some duplicated
code.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 21 Nov 2016 03:31:09 +0000 (03:31 +0000)]
radv/meta: cleanup resolve vertex state emission
For the hw resolve there is no need to emit any sort
of texture coordinates, so drop them all in the meta path.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Mon, 21 Nov 2016 23:39:50 +0000 (00:39 +0100)]
radv: Incorporate GPU family into cache UUID.
Invalidates the cache when someone switches cards.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Bas Nieuwenhuizen [Mon, 21 Nov 2016 23:19:30 +0000 (00:19 +0100)]
radv: Use library mtime for cache UUID.
We want to also invalidate the cache when LLVM gets changed. As the
specific LLVM revision is not fixed at build time, we will need to
check at runtime. Computing a checksum for LLVM is going to be very
expensive, so just use the mtime.
Tested on my computer that the returned DSO for the LLVM symbol is
actually the LLVM DSO.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Bas Nieuwenhuizen [Mon, 21 Nov 2016 23:31:44 +0000 (00:31 +0100)]
radv: Store UUID in physical device.
No sense in repeatedly determining it. Also, it might be dependent
on the device as shaders get compiled differently for SI/CIK/VI etc.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Timothy Arceri [Tue, 22 Nov 2016 02:19:33 +0000 (13:19 +1100)]
glsl: fix NULL check
Fixes copy and paste error in
9d96d3803ab
Ilia Mirkin [Sat, 19 Nov 2016 01:19:24 +0000 (20:19 -0500)]
swr: calculate viewport width/height based on the scale
The former calculations were for min/max y. The width/height don't take
translate into account.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Ilia Mirkin [Sun, 20 Nov 2016 18:13:12 +0000 (13:13 -0500)]
swr: don't claim to allow setting layer/viewport from VS
This may ultimately be possible to support, but for now it's not hooked
up and the swr core only supports this output from GS.
This normally wouldn't matter, but we lie about supporting GL 3.2, and
also the blitter and st/mesa will make use of this functionality if
claimed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Ilia Mirkin [Sun, 20 Nov 2016 21:34:59 +0000 (16:34 -0500)]
swr: allocate all scratch space in one go for vertex buffers
Multiple buffers may reference client arrays. When this happens, we
might reach for scratch space multiple times, which could cause later
arrays to invalidate the pointers allocated for the earlier ones.
This fixes copyteximage 2D_ARRAY.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Ilia Mirkin [Sun, 20 Nov 2016 05:04:42 +0000 (00:04 -0500)]
swr: call swr_update_derived unconditionally when drawing/clearing
Currently a sequence like draw/map/draw/map will cause the second map to
not wait for the second draw. This is because the first map will clear
the resource business bit, and the second draw won't reset it since no
state has changed.
swr_update_derived does a tiny bit of extra work, including updating the
SWR_BACKEND_STATE as well as waiting for prending fences. If that's a
problem, we could call swr_update_resource_status directly from
draw/clear handlers.
Fixes clearbuffer-stencil, clearbuffer-depth, clearbuffer-depth-stencil,
and clearbuffer-display-lists.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Ilia Mirkin [Fri, 18 Nov 2016 03:40:29 +0000 (22:40 -0500)]
swr: [rasterizer memory] minify texture width before alignment
The minification should happen before alignment, not after. See similar
logic on ComputeLODOffsetY. The current logic requires unnecessarily
large textures when there's an initial NPOT size.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Ilia Mirkin [Sat, 12 Nov 2016 08:01:15 +0000 (03:01 -0500)]
swr: [rasterizer memory] minify original sizes for block formats
There's no guarantee that mip width/height will be a multiple of the
compressed block size. Doing a divide by the block size first yields
different results than GL expects, so we do the divide at the end.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
Marek Olšák [Tue, 15 Nov 2016 20:15:55 +0000 (21:15 +0100)]
radeonsi: remove all varyings for depth-only rendering or rasterization off
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 14 Nov 2016 08:09:51 +0000 (09:09 +0100)]
radeonsi: eliminate VS outputs that aren't used by PS at runtime
A past commit added the ability to compile "optimized" shader variants
asynchronously (not stalling the app).
This commit builds upon that and adds what is basically a runtime shader
linker. If a VS output isn't used by the currently-bound PS, a new VS
compilation is started without that output. The new shader variant
is used when it's ready.
All apps using separate shader objects I've seen had unused VS outputs.
Eliminating unused/useless VS outputs also eliminates the corresponding
vertex attribute loads.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 14 Nov 2016 06:56:57 +0000 (07:56 +0100)]
radeonsi: record information about all written and read varyings
It's just tgsi_shader_info with DEFAULT_VAL varyings removed.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 13 Nov 2016 18:54:13 +0000 (19:54 +0100)]
radeonsi: make si_shader_io_get_unique_index stricter
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 13 Nov 2016 17:41:43 +0000 (18:41 +0100)]
radeonsi: don't export ClipVertex and ClipDistance[] if clipping is disabled
This is the first user of optimized monolithic shader variants.
Cull distances can't be disabled by states.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 13 Oct 2016 10:18:53 +0000 (12:18 +0200)]
radeonsi: add infrastr. for compiling optimized shader variants asynchronously
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 13 Nov 2016 16:30:54 +0000 (17:30 +0100)]
radeonsi: don't set vs.epilog.export_prim_id if TES is bound
there is no VS epilog in this case
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 13 Nov 2016 18:21:46 +0000 (19:21 +0100)]
radeonsi: simplify checking for monolithic compilation
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 14 Nov 2016 00:53:24 +0000 (01:53 +0100)]
radeonsi: print all flags in si_dump_shader_key
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 13 Nov 2016 02:17:46 +0000 (03:17 +0100)]
radeonsi: split the shader key into 3 logical parts
key->part.*: prolog and epilog flags only
key->as_{ls,es}: special flags
key->mono.*: flags for monolithic compilation only
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 13 Nov 2016 17:12:36 +0000 (18:12 +0100)]
radeonsi: fix culling if clip & cull distances are used at the same time
Fixed piglits:
- arb_cull_distance/clip-cull-3
- arb_cull_distance/clip-cull-4
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 13 Nov 2016 16:51:41 +0000 (17:51 +0100)]
radeonsi: clean up si_emit_clip_regs
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 14 Nov 2016 01:03:28 +0000 (02:03 +0100)]
radeonsi: assume that a VS without POSITION is LS
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 14 Nov 2016 01:01:34 +0000 (02:01 +0100)]
tgsi/scan: record if a shader writes the position output
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 14 Nov 2016 00:59:42 +0000 (01:59 +0100)]
tgsi/scan: use a big switch for scanning outputs
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 6 Nov 2016 20:49:29 +0000 (21:49 +0100)]
radeonsi: decrease the number of texture slots to 24
Company Of Heroes 2 needs only 24.
This saves 512 bytes of CE RAM per shader stage.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 11 Nov 2016 21:36:17 +0000 (22:36 +0100)]
radeonsi: fast exit si_emit_derived_tess_state early
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 11 Nov 2016 20:19:34 +0000 (21:19 +0100)]
winsys/amdgpu: set addrlib flag opt4Space
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 11 Nov 2016 20:15:54 +0000 (21:15 +0100)]
radeonsi: check for !is_linear in do_hardware_msaa_resolve
We don't want opt4Space here.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 11 Nov 2016 20:14:03 +0000 (21:14 +0100)]
gallium/radeon: add RADEON_SURF_OPTIMIZE_FOR_SPACE
FORCE_TILING should disable it. It has no effect now, but that may change
soon.
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Mun Gwan-gyeong [Mon, 21 Nov 2016 14:20:43 +0000 (23:20 +0900)]
radeonsi: Add missing error-checking to si_create_compute_state (v2)
When the uploading of shader fails on si_shader_binary_upload(),
it returns -ENOMEM. We should handle si_shader_binary_upload() failure path
on si_create_compute_state().
CID
1394027
v2: Fixes from Edward O'Callaghan's review
a) Update explicitly return value check with "si_shader_binary_upload() < 0"
b) Update commit message.
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Roland Scheidegger [Sun, 13 Nov 2016 15:33:37 +0000 (16:33 +0100)]
draw: drop some overflow computations
It turns out that noone actually cares if the address computations overflow,
be it the stride mul or the offset adds.
Wrap around seems to be explicitly permitted even by some other API (which
is a _very_ surprising result, as these overflow computations were added just
for that and made some tests pass at that time - I suspect some later fixes
fixed the actual root cause...). So the requirements in that other api were
actually sane there all along after all...
Still need to make sure the computed buffer size needed is valid, of course.
This ditches the shiny new widening mul from these codepaths, ah well...
And now that I really understand this, change the fishy min limiting
indices to what it really should have done. Which is simply to prevent
fetching more values than valid for the last loop iteration. (This makes
the code path in the loop minimally more complex for the non-indexed case
as we have to skip the optimization combining two adds. I think it should
be safe to skip this actually there, but I don't care much about this
especially since skipping that optimization actually makes the code easier
to read elsewhere.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Sun, 13 Nov 2016 15:33:20 +0000 (16:33 +0100)]
draw: simplify fetch some more
Don't keep the ofbit. This is just a minor simplification, just adjust
the buffer size so that there will always be an overflow if buffers aren't
valid to fetch from.
Also, get rid of control flow from the instanced path too. Not worried about
performance, but it's simpler and keeps the code more similar to ordinary
fetch.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Sun, 13 Nov 2016 15:32:24 +0000 (16:32 +0100)]
draw: unify linear and elts draw jit functions
The code for elts and linear paths was nearly 100% identical by now - with
the elts path simply having some additional gather for the elements in the
main loop (with some additional small differences before the main loop).
Hence nuke the separate functions and decide this at jit shader execution
time (simply based on the presence of the elts pointer).
Some analysis shows that the generated vs jit functions seem to be just very
minimally more complex than the former elts functions, and almost none of the
additional complexity is in the main loop (basically just the branch logic
for the branch fetching the actual indices).
Compared to linear, the codesize of the function is of course a bit larger,
however the actual executed code in the main loop appears to be near 100%
identical (the additional code looking up indices is skipped as expected).
So, I would not expect a (meaningful) performance difference with the
generated code, neither with elts nor linear, this does however roughly
half the compilation time (the compiled shaders should also use only half
the memory of course).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Sun, 13 Nov 2016 15:31:57 +0000 (16:31 +0100)]
draw: use same argument order for jit draw linear / elts functions
This is a bit simpler. Mostly to make it easier to unify the paths later...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Sun, 13 Nov 2016 16:17:25 +0000 (17:17 +0100)]
draw: drop unnecessary index overflow handling from vsplit code
This was kind of strange, since it replaced indices which were only
overflowing due to bias with MAX_UINT. This would cause an overflow later
in the shader, except if stride was 0, however the vertex id would be
essentially random then (-1 + eltBias). No test cared about it, though.
So, drop this and just use ordinary int arithmetic wraparound as usual.
This is much simpler to understand and the results are "more correct" or
at least more consistent (vertex id as well as actual fetch results just
correspond to wrapped around arithmetic).
There's only one catch, it is now possible to hit the cache initialization
value also with ushort and ubyte elts path (this wouldn't be an issue if
we'd simply handle the eltBias itself later in the shader). Hence, we need
to make sure the cache logic doesn't think this element has already been
emitted when it has not (I believe some seriously bad things could happen
otherwise). So, borrow the logic which handled this from the uint case, but
not before fixing it up...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Roland Scheidegger [Sat, 12 Nov 2016 21:47:22 +0000 (22:47 +0100)]
draw: simplify vsplit elts code a bit
vsplit_get_base_idx explicitly returned idx 0 and set the ofbit
in case of overflow. We'd then check the ofbit and use idx 0 instead of
looking it up. This was necessary because DRAW_GET_IDX used to return
DRAW_MAX_FETCH_IDX and not 0 in case of overflows.
However, this is all unnecessary, we can just let DRAW_GET_IDX return 0
in case of overflow. In fact before
bbd1e60198548a12be3405fc32dd39a87e8968ab
the code already did that, not sure why this particular bit was changed
(might have been one half of an attempt to get these indices to actual draw
shader execution - in fact I think this would make things less awkward, it
would require moving the eltBias handling to the shader as well).
Note there's other callers of DRAW_GET_IDX - those code paths however
explicitly do not handle index buffer overflows, therefore the overflow
value doesn't matter for them.
Also do some trivial simplification - for (unsigned) a + b, checking res < a
is sufficient for overflow detection, we don't need to check for res < b too
(similar for signed).
And an index buffer overflow check looked bogus - eltMax is the number of
elements in the index buffer, not the maximum element which can be fetched.
(Drop the start check against the idx buffer though, this is already covered
by end check and end < start).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>