mesa.git
7 years agoswr: [rasterizer core] actually perform clear before store in GetHotTile
Ilia Mirkin [Fri, 18 Nov 2016 00:39:20 +0000 (19:39 -0500)]
swr: [rasterizer core] actually perform clear before store in GetHotTile

When switching render target array indexes (as might happen in a GS, or
in a future change, with layered clears), if the previous state is
HOTTILE_CLEAR, we should actually clear the tile before saving it off.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agoi965: Fix a mistake from porting the URB allocation code to arrays.
Kenneth Graunke [Wed, 23 Nov 2016 20:24:22 +0000 (12:24 -0800)]
i965: Fix a mistake from porting the URB allocation code to arrays.

Commit 6d416bcd846a49414f210cd761789156c37a7b3e (i965: Use arrays in
Gen7+ URB code.) introduced a regression which caused us to fail to
allocate all of our URB space.

   -         total_wants -= ds_wants;
   +         total_wants -= additional;

The new line should have been total_wants -= wants[i].

Fixes a large performance regression in TessMark.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98815
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoi965: Use 3DSTATE_CLIP's User Clip Distance Enable bitmask on Gen8+.
Kenneth Graunke [Wed, 16 Nov 2016 06:59:45 +0000 (22:59 -0800)]
i965: Use 3DSTATE_CLIP's User Clip Distance Enable bitmask on Gen8+.

Gen6-7.5 specify the user clip distance enable bitmask in 3DSTATE_CLIP.
Gen8+ normally uses the new internal signalling mechanism to select the
one specified in the last enabled shader stage (3DSTATE_VS, DS, or GS).

This is a pretty good fit for Vulkan, or even newer GL, where the
bitmask comes entirely from the shader.  But with glClipPlane(),
this is dynamic state, and we have to listen to _NEW_TRASNFORM.

Clip plane enables are the only reason the VS/DS/GS atoms need to
listen to _NEW_TRANSFORM.  3DSTATE_CLIP already has to listen to it
in order to support ARB_clip_control settings.

Setting the "Use the 3DSTATE_CLIP bitmask" force enable bit allows
us to drop _NEW_TRANSFORM from all the shader stage atoms, so we can
re-emit them less often.

Improves performance of OglBatch7 (version 6) by 2.70773% +/- 0.491257%
(n = 38) at 1024x768 on Cherryview.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoradv: fix flipped blits
Dave Airlie [Tue, 15 Nov 2016 04:54:28 +0000 (04:54 +0000)]
radv: fix flipped blits

This fixes:
dEQP-VK.api.copy_and_blit.blit_image.simple_tests.mirror*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/meta: just local vars for src/dst subresources.
Dave Airlie [Wed, 23 Nov 2016 23:35:41 +0000 (23:35 +0000)]
radv/meta: just local vars for src/dst subresources.

This is just a cleanup before I rework this code to fix mirrored
blits.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: add support for VK_AMD_draw_indirect_count
Fredrik Höglund [Wed, 23 Nov 2016 22:05:01 +0000 (23:05 +0100)]
radv: add support for VK_AMD_draw_indirect_count

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: add support for VK_AMD_negative_viewport_height
Fredrik Höglund [Wed, 23 Nov 2016 22:05:00 +0000 (23:05 +0100)]
radv: add support for VK_AMD_negative_viewport_height

The driver already supports this extension in practice.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: add support for VK_KHR_sampler_mirror_clamp_to_edge
Fredrik Höglund [Wed, 23 Nov 2016 22:04:59 +0000 (23:04 +0100)]
radv: add support for VK_KHR_sampler_mirror_clamp_to_edge

radv_tex_wrap() already supports VK_SAMPLER_ADDRESS_MODE_MIRROR_CLAMP_TO_EDGE,
so all that's needed is to advertise support for the extension.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: add support for anisotropic filtering on SI-CI
Fredrik Höglund [Wed, 23 Nov 2016 22:04:58 +0000 (23:04 +0100)]
radv: add support for anisotropic filtering on SI-CI

Ported from radeonsi.

Note that si_make_texture_descriptor() already sets img7 to the mask
value referred to in the comment.

Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoi965/gen7: Only advertise 4 samples for RGBA32F on GLES
Jordan Justen [Fri, 22 Jul 2016 22:23:55 +0000 (15:23 -0700)]
i965/gen7: Only advertise 4 samples for RGBA32F on GLES

We can't render to 8x MSAA if the width is greater than 64 bits. (see
brw_render_target_supported)

Fixes ES31-CTS.sample_variables.mask.rgba32f.samples_8.mask_*

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
7 years agoradeonsi: print new opt flags in si_dump_shader_key
Marek Olšák [Mon, 21 Nov 2016 19:57:05 +0000 (20:57 +0100)]
radeonsi: print new opt flags in si_dump_shader_key

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: add a debug flag that disables optimized shader variants
Marek Olšák [Mon, 21 Nov 2016 19:39:27 +0000 (20:39 +0100)]
radeonsi: add a debug flag that disables optimized shader variants

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agocompiler/glsl/tests: Fix print format when building 32-bit binaries on 64-bit host
Aaron Watry [Mon, 3 Oct 2016 14:47:45 +0000 (09:47 -0500)]
compiler/glsl/tests: Fix print format when building 32-bit binaries on 64-bit host

Avoids two warnings.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agocompiler/glsl/tests: Fix print format when building 32-bit binaries on 64-bit host
Aaron Watry [Tue, 22 Nov 2016 17:18:11 +0000 (11:18 -0600)]
compiler/glsl/tests: Fix print format when building 32-bit binaries on 64-bit host

Avoids three warnings.

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoanv: fix enumeration of properties
Emil Velikov [Thu, 6 Oct 2016 13:12:27 +0000 (14:12 +0100)]
anv: fix enumeration of properties

Driver should enumerate only up-to min2(num_available, num_requested)
properties and return VK_INCOMPLETE if the # of requested props is
smaller than the ones available.

Presently we assert out in such cases.

Inspired by a similar fix for RADV.

v2: Use MIN2 + typed_memcpy (Jason).

Should fix: dEQP-VK.api.info.device.extensions

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965: Restructure fast clear eligibility decision
Ben Widawsky [Wed, 20 Apr 2016 14:44:17 +0000 (07:44 -0700)]
i965: Restructure fast clear eligibility decision

v2 (Jason):
   - Use PRM citation for SKL now that it is available
   - Also return false for gen < 8 mipmapped/arrayed

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965: Set initial msaa fast clear status explicitly
Topi Pohjolainen [Fri, 8 Jul 2016 07:26:30 +0000 (10:26 +0300)]
i965: Set initial msaa fast clear status explicitly

instead of in intel_miptree_init_mcs(). For lossless compression
the status is immediately overwritten in
intel_miptree_alloc_non_msrt_mcs() while the status for
non-compressed non-msaa miptrees is explicitly set in
do_blorp_clear().

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965: Declare read-only input to level/layer check const
Topi Pohjolainen [Fri, 10 Jun 2016 15:16:22 +0000 (18:16 +0300)]
i965: Declare read-only input to level/layer check const

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/fbo: Prepare layer multiplier for render buffer compression
Topi Pohjolainen [Tue, 7 Jun 2016 05:22:18 +0000 (08:22 +0300)]
i965/fbo: Prepare layer multiplier for render buffer compression

This path is not yet taken for fast cleared or compressed buffers
but later patches will enable it.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965: Add multi-slice getter for resolve maps
Topi Pohjolainen [Fri, 10 Jun 2016 15:18:57 +0000 (18:18 +0300)]
i965: Add multi-slice getter for resolve maps

This is useful when checking if any slice is in unresolved state.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/meta: Split conversion of color and setting it
Topi Pohjolainen [Sun, 12 Jun 2016 17:49:54 +0000 (20:49 +0300)]
i965/meta: Split conversion of color and setting it

And fix a mangled comment while at it.

v2 (Ben): Return the converted color.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agointel/blorp: Fix rectangle size for level-not-zero resolves
Topi Pohjolainen [Tue, 22 Nov 2016 10:15:07 +0000 (12:15 +0200)]
intel/blorp: Fix rectangle size for level-not-zero resolves

Needed to prevent gpu hangs when mip-mapped compression gets
enabled.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/miptree: Don't shrink textures when augmenting for more levels
Topi Pohjolainen [Tue, 15 Nov 2016 20:27:12 +0000 (22:27 +0200)]
i965/miptree: Don't shrink textures when augmenting for more levels

This was detected when examining CCS_E failures with piglit test:
"fbo-generatemipmap-formats". Test creates a 2D texture with
dimensions 293x277. It manually loops over all levels and calls
glTexImage2D(). Level one triggers creation of full miptree:
intel_alloc_texture_image_buffer() realizes that there is only one
level in the miptree and calls intel_miptree_create_for_teximage()
to re-allocate the miptree with all 9 levels. However, the end result
is a miptree with level zero dimensions of 292x276.

Related, and possibly calling for treatment of its own is mip-map
generation:
After calling glTexImage2D() against every level test continues by
replacing content for levels one to eight with data derived from level
zero by calling glGenerateMipmapEXT(). This results into the miptree
being allocated anew for every level:
Mip-map generation goes thru meta which ends up validating the texture
(brw_validate_textures()->intel_finalize_mipmap_tree()->
intel_miptree_match_image()) where one finds texture with base level
size 292:276. This results into new miptree being created for the npot
size 293:277. Only here intel_finalize_mipmap_tree() is asked for only
one level, and therefore such is created. Generation for level one in
turn finds right base level size but only one level when two is needed.
And the same goes on for all eight levels.

This patch prevents the shrink maintaining the NPOT size of 293x277.

Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agomain/getteximage: Use the height argument to calculate memcpy copy size
Eduardo Lima Mitev [Tue, 22 Nov 2016 11:12:48 +0000 (12:12 +0100)]
main/getteximage: Use the height argument to calculate memcpy copy size

In get_tex_memcpy, when copying texture data directly from source
to destination (when row strides match for both src and dst), the
copy size is currently calculated using the full texture height
instead of the sub-region height parameter that was passed.

This can cause a read past the end of the mapped buffer when y-offset
is greater than zero, leading to a segfault.

Fixes CTS test (from crash to pass):
* GL45-CTS/get_texture_sub_image/functional_test

v2: (Jason) Use the passed 'height' instead of copying til the
end of the buffer (tex-height - yoffset).

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agonir/spirv: implement ordered / unordered floating point comparisons properly
Iago Toral Quiroga [Thu, 17 Nov 2016 08:36:36 +0000 (09:36 +0100)]
nir/spirv: implement ordered / unordered floating point comparisons properly

Besides the logical operation involved, these also require that we test if the
operands are ordered / unordered.

For ordered operations, both operands must be ordered (and they must pass the
conditional test) while for unordered operations it is sufficient if only one
of the operands is unordered (or they pass the logical test).

Fixes the following Vulkan CTS tests:

dEQP-VK.spirv_assembly.instruction.compute.opfunord.equal
dEQP-VK.spirv_assembly.instruction.compute.opfunord.greater
dEQP-VK.spirv_assembly.instruction.compute.opfunord.greaterequal
dEQP-VK.spirv_assembly.instruction.compute.opfunord.less
dEQP-VK.spirv_assembly.instruction.compute.opfunord.lessequal

v2: Fixed typo: s/nir_eq/nir_feq

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: fix segfault in anv_BindImageMemory
Dave Airlie [Wed, 23 Nov 2016 06:05:34 +0000 (16:05 +1000)]
anv: fix segfault in anv_BindImageMemory

Since bind image memory started memsetting surfaces, the
device node can't be NULL, since we lookup device->info.has_llc.

Not sure why it ever was NULL before.

Fixes some things on my Ivybridge.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoswr: [rasterizer core] fix cast for stencil clear value
Tim Rowley [Wed, 23 Nov 2016 01:50:55 +0000 (19:50 -0600)]
swr: [rasterizer core] fix cast for stencil clear value

Bad type cast for stencil clear value was picking up structure
padding bytes.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agoswr: color interpolation is also supposed to get perspective division
Ilia Mirkin [Mon, 21 Nov 2016 02:20:08 +0000 (21:20 -0500)]
swr: color interpolation is also supposed to get perspective division

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agoswr: add sprite coord enable mask to fs key
Ilia Mirkin [Mon, 21 Nov 2016 17:45:08 +0000 (12:45 -0500)]
swr: add sprite coord enable mask to fs key

This fixes gl-coord-replace-doesnt-eliminate-frag-tex-coords

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agoswr: rework vert <-> frag shader linkage logic
Ilia Mirkin [Thu, 10 Nov 2016 02:04:01 +0000 (21:04 -0500)]
swr: rework vert <-> frag shader linkage logic

Fixes a few things:
 - sprite coords only apply to generic varyings, and are a bitmask
 - back color only applies in 2-sided lighting mode
 - handle some odd situations between only some front/back colors being
   there. This is only semi-legal in GL, but we shouldn't start
   crashing.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agoswr: flatshading makes color outputs flat, it doesn't affect others
Ilia Mirkin [Mon, 21 Nov 2016 00:48:38 +0000 (19:48 -0500)]
swr: flatshading makes color outputs flat, it doesn't affect others

We were previously not marking the "regular" flat outputs as flat when
flatshading was enabled.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agoswr: only broadcast color0 value, not all color values
Ilia Mirkin [Mon, 21 Nov 2016 00:08:12 +0000 (19:08 -0500)]
swr: only broadcast color0 value, not all color values

The way that dual-source blending is described for GLES2 is very odd,
and we end up with a shader that both has this property set *and* has a
color1 value to be used as the second source. While changing the state
tracker is an option, it seems more reliable to verify that the
broadcast is only done on color0.

Fixes arb_blend_func_extended-fbo-extended-blend-pattern_gles2

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agoswr: report a reasonable max lod bias
Ilia Mirkin [Sat, 19 Nov 2016 15:10:47 +0000 (10:10 -0500)]
swr: report a reasonable max lod bias

This is the same value that llvmpipe uses. Since swr uses the same
sampler logic, makes sense for this value to also be the same. Most
applications don't care.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
7 years agoswr: avoid using exceptions for expected condition handling
Ilia Mirkin [Sun, 13 Nov 2016 03:18:48 +0000 (22:18 -0500)]
swr: avoid using exceptions for expected condition handling

I was getting a weird segfault from GCC 4.9.3:

 0x00007ffff54f27aa in strlen () from /lib64/libc.so.6
 (gdb) bt
 #0  0x00007ffff54f27aa in strlen () from /lib64/libc.so.6
 #1  0x00007ffff4f128e5 in get_cie_encoding (cie=cie@entry=0x7ffff6e09813)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:272
 #2  0x00007ffff4f1318e in classify_object_over_fdes (ob=ob@entry=0xd7bb90, this_fde=0x7ffff7f11010)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:628
 #3  0x00007ffff4f135ba in init_object (ob=0xd7bb90)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:749
 #4  search_object (ob=ob@entry=0xd7bb90, pc=pc@entry=0x7ffff4f11f4d <_Unwind_RaiseException+61>)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:961
 #5  0x00007ffff4f13e62 in _Unwind_Find_registered_FDE (bases=0x7fffffffd358, pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde.c:1025
 #6  _Unwind_Find_FDE (pc=0x7ffff4f11f4d <_Unwind_RaiseException+61>, bases=bases@entry=0x7fffffffd358)
     at /gcc-4.9.3/libgcc/unwind-dw2-fde-dip.c:450
 #7  0x00007ffff4f11197 in uw_frame_state_for (context=context@entry=0x7fffffffd2b0, fs=fs@entry=0x7fffffffd100)
     at /gcc-4.9.3/libgcc/unwind-dw2.c:1245
 #8  0x00007ffff4f11b15 in uw_init_context_1 (context=context@entry=0x7fffffffd2b0, outer_cfa=outer_cfa@entry=0x7fffffffd660, outer_ra=0x7ffff518d23b <__cxa_throw+91>)
     at /gcc-4.9.3/libgcc/unwind-dw2.c:1566
 #9  0x00007ffff4f11f4e in _Unwind_RaiseException (exc=0xd7c250)
     at /gcc-4.9.3/libgcc/unwind.inc:88
 #10 0x00007ffff518d23b in __cxa_throw () from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6
 #11 0x00007ffff51ed556 in std::__throw_out_of_range(char const*) ()
    from /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/libstdc++.so.6
 #12 0x00007fffea778be0 in std::map<pipe_format, SWR_FORMAT, std::less<pipe_format>, std::allocator<std::pair<pipe_format const, SWR_FORMAT> > >::at (
     this=0x7fffebeb4c40 <mesa_to_swr_format(pipe_format)::mesa2swr>,
     __k=@0x7fffffffd73c: PIPE_FORMAT_RGTC1_UNORM)
     at /usr/lib/gcc/x86_64-pc-linux-gnu/4.9.3/include/g++-v4/bits/stl_map.h:549
 #13 0x00007fffea776aee in mesa_to_swr_format (format=PIPE_FORMAT_RGTC1_UNORM) at swr_screen.cpp:597

We can just void this whole issue by not using exceptions in the
first place.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: remove formats from mapping table that don't have StoreTile impls
Ilia Mirkin [Sat, 12 Nov 2016 18:09:21 +0000 (13:09 -0500)]
swr: remove formats from mapping table that don't have StoreTile impls

This table exists for the purpose of determining renderable formats.
Without a StoreTile implementation, that can't happen.

This basically removes rendering support to all L/LA/I formats. They can
be re-added when/if StoreTile implementations are added.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: remove unnecessary -1 entries in format mapping table
Ilia Mirkin [Wed, 9 Nov 2016 20:13:26 +0000 (15:13 -0500)]
swr: remove unnecessary -1 entries in format mapping table

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: rework resource layout and surface setup
Ilia Mirkin [Wed, 9 Nov 2016 22:16:36 +0000 (17:16 -0500)]
swr: rework resource layout and surface setup

This is a bit of a mega-commit, but unfortunately there's no great way
to break this up since a lot of different pieces have to match up. Here
we do the following:
 - change surface layout to match swr's Load/StoreTile expectations
 - fix sampler settings to respect all sampler view parameters
 - fix stencil sampling to read from secondary resource
 - respect pipe surface format, level, and layer settings
 - fix resource map/unmap based on the new layout logic
 - fix resource map/unmap to copy proper parts of stencil values in and
   out of the matching depth texture

These fix a massive quantity of piglits, including all the
tex-miplevel-selection ones.

Note that the swr native miptree layout isn't extremely space-efficient,
and we end up using it for all textures, not just the renderable ones. A
back-of-the-envelope calculation suggests about 10%-25% increased memory
usage for miptrees, depending on the number of LODs. Single-LOD textures
should be unaffected.

There are a handful of regressions as a result of this change:
 - Some textureGrad tests, these failures match llvmpipe. (There are
   debug settings allowing improved gallivm sampling accurancy.)
 - Some layered clearing tests as swr doesn't currently support that. It
   was getting lucky before because enough other things were broken.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoutil: fix missing swizzle components in the SINT <-> UINT conversion string
Charmaine Lee [Tue, 22 Nov 2016 21:33:37 +0000 (13:33 -0800)]
util: fix missing swizzle components in the SINT <-> UINT conversion string

Fixes tgsi error introduced in commit 3817a7a. The error complains missing
swizzle component in the conversion string "UMIN TEMP[0], TEMP[0], IMM[0].x".

Reviewed-by: Roland Scheidegger <sroland@vmware.com>
7 years agovc4: Don't conditionalize the src1 mov of qir_SEL().
Eric Anholt [Tue, 22 Nov 2016 07:52:37 +0000 (23:52 -0800)]
vc4: Don't conditionalize the src1 mov of qir_SEL().

My thought in having both arguments conditionally moved was that it should
theoretically save some power by not doing work in those channels.
However, it ends up costing us instructions because we can't
register-coalesce the first of the MOVs, and it also introduces extra
scheduling dependencies.  The instruction cost would swamp whatever power
benefit I was hoping for.

shader-db results:
total instructions in shared programs: 100548 -> 99741 (-0.80%)
instructions in affected programs:     42450 -> 41643 (-1.90%)

With obvious outliers removed (I had an X11 emacs running over the network
in the "after" case), 3DMMES Taiji showed 1.07231% +/- 0.488241% fps
improvement (n=18, 30).

7 years agovc4: Re-add R4 to the "any" register class.
Eric Anholt [Tue, 22 Nov 2016 07:29:04 +0000 (23:29 -0800)]
vc4: Re-add R4 to the "any" register class.

I screwed this up in fdad4d24024ab7bc9b6b9cb6288f8b76ccac0d89 which was
supposed to be making this code more maintainable.  What's amazing is
multithreaded FS showed the wins it did despite this bug.

shader-db results:
total instructions in shared programs: 103535 -> 100548 (-2.89%)
instructions in affected programs:     83794 -> 80807 (-3.56%)

7 years agovc4: Disable MSAA rasterization when the job binning is single-sampled.
Eric Anholt [Tue, 22 Nov 2016 21:51:03 +0000 (13:51 -0800)]
vc4: Disable MSAA rasterization when the job binning is single-sampled.

Gallium core just changed to start setting MSAA enabled in the rasterizer
state even with samples==1 buffers.  This caused disagreements in our
driver between binning and rasterization state, which the simulator threw
assertion failures about.  Keep the single-sampled samples==1 behavior for
now.

7 years agovc4: Make sure we don't overflow texture input/output FIFOs when threaded.
Eric Anholt [Tue, 22 Nov 2016 21:31:46 +0000 (13:31 -0800)]
vc4: Make sure we don't overflow texture input/output FIFOs when threaded.

I dropped the first hunk of this change last minute when I decided it
wasn't actually needed, and apparently failed to piglit it in simulation.
The simulator threw an an assertion in gl-1.0-drawpixels-color-index,
which queued up 5 coordinates (3 before a switch, two after) before
loading the result.

7 years agoradv: move pipeline barrier image transitions after src flushing
Dave Airlie [Tue, 25 Oct 2016 06:23:48 +0000 (07:23 +0100)]
radv: move pipeline barrier image transitions after src flushing

This seems like it would conform better with the spec.

noticed while digging into fast clears.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoanv: Enable fast clears on gen7-8
Jason Ekstrand [Sat, 19 Nov 2016 16:47:25 +0000 (08:47 -0800)]
anv: Enable fast clears on gen7-8

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv: Add support for fast clears on gen9
Jason Ekstrand [Fri, 18 Nov 2016 06:55:30 +0000 (22:55 -0800)]
anv: Add support for fast clears on gen9

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv/blorp: Rework flushing around resolves
Jason Ekstrand [Sat, 19 Nov 2016 16:25:28 +0000 (08:25 -0800)]
anv/blorp: Rework flushing around resolves

It turns out that the flushing required around resolves is a bit more
extensive than I first thought.  You actually need render cache flush
and a CS stall both before *and* after the resolve.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv/cmd_buffer: Apply remaining flushes in EndCommandBuffer
Jason Ekstrand [Sat, 19 Nov 2016 01:39:26 +0000 (17:39 -0800)]
anv/cmd_buffer: Apply remaining flushes in EndCommandBuffer

Otherwise, some pipe flushes may just never happen.  This is unlikely to
cause problems depending on how the kernel schedules batches, but we
shouldn't count on it.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv/blorp: Use regular blorp clears for subpass clears
Jason Ekstrand [Fri, 18 Nov 2016 21:35:42 +0000 (13:35 -0800)]
anv/blorp: Use regular blorp clears for subpass clears

At vkCmdNextSubpass time, we have the actual framebuffer so we can use
regular blorp_clear for subpass clears.  For fast clears, there is no
attachment version, so this will make fast clears a bit easier.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv: Add a vk_to_isl_color helper
Jason Ekstrand [Fri, 18 Nov 2016 21:35:16 +0000 (13:35 -0800)]
anv: Add a vk_to_isl_color helper

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv/cmd_buffer: Make setup_attachments take a RenderPassBeginInfo
Jason Ekstrand [Fri, 18 Nov 2016 06:26:52 +0000 (22:26 -0800)]
anv/cmd_buffer: Make setup_attachments take a RenderPassBeginInfo

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv: Set up binding tables and surface states for input attachments
Jason Ekstrand [Tue, 15 Nov 2016 23:25:55 +0000 (15:25 -0800)]
anv: Set up binding tables and surface states for input attachments

This commit adds the last remaining bits to support input attachments in
the Intel Vulkan driver.  For color and depth attachments, we allocate an
input attachment surface state during vkCmdBeginRenderPass like we do for
the render target surface states.  This is so that we can incorporate the
clear color and aux information as used in rendering.  For stencil, we just
treat it like a regular texture because we don't there is no aux.  Also,
only having to worry about at most one input attachment surface for each
attachment makes some of the vkCmdBeginRenderPass code simpler.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv/pipeline: Handle depth/stencil self-dependencies
Jason Ekstrand [Wed, 16 Nov 2016 18:39:15 +0000 (10:39 -0800)]
anv/pipeline: Handle depth/stencil self-dependencies

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv: Use pass attachment information to insert flushes
Jason Ekstrand [Wed, 16 Nov 2016 07:11:55 +0000 (23:11 -0800)]
anv: Use pass attachment information to insert flushes

Input and resolve attachments can cause an implicit dependency in the
pipeline.  It's our job to insert the needed flushes.  Fortunately, we can
easily reuse the usage tracking that we use for CCS resolves.

This fixes 159 Vulkan CTS tests on Haswell because we're now flushing in
between drawing and MSAA resolves.  I have no idea how they were passing
before on newer hardware.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv/cmd_buffer: Fix pipeline barriers for input attachments
Jason Ekstrand [Wed, 16 Nov 2016 19:20:50 +0000 (11:20 -0800)]
anv/cmd_buffer: Fix pipeline barriers for input attachments

We were using VK_IMAGE_ACCESS_COLOR_ATTACHMENT_READ_BIT to detect an input
attachment read.  We should use VK_IMAGE_ACCESS_INPUT_ATTACHMENT_READ_BIT
instead.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv/pipeline: Add a input_attachment_index to the bindings
Jason Ekstrand [Tue, 15 Nov 2016 23:21:08 +0000 (15:21 -0800)]
anv/pipeline: Add a input_attachment_index to the bindings

This allows us to go from the binding to either the descriptor or the input
attachment at will.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv/pass: Calculate the combined image usage of attachments
Jason Ekstrand [Tue, 15 Nov 2016 01:18:47 +0000 (17:18 -0800)]
anv/pass: Calculate the combined image usage of attachments

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv: Add an input attachment lowering pass
Jason Ekstrand [Mon, 14 Nov 2016 22:23:36 +0000 (14:23 -0800)]
anv: Add an input attachment lowering pass

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoi965/fs: Implement load_layer_id for fragment shaders
Jason Ekstrand [Tue, 15 Nov 2016 23:18:32 +0000 (15:18 -0800)]
i965/fs: Implement load_layer_id for fragment shaders

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agonir: Add a layer_id system value intrinsic
Jason Ekstrand [Tue, 15 Nov 2016 23:15:48 +0000 (15:15 -0800)]
nir: Add a layer_id system value intrinsic

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agospirv: Stop warning about input attachments
Jason Ekstrand [Wed, 16 Nov 2016 00:55:54 +0000 (16:55 -0800)]
spirv: Stop warning about input attachments

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agospirv: Handle the InputAttachmentIndex decoration
Jason Ekstrand [Mon, 14 Nov 2016 22:22:56 +0000 (14:22 -0800)]
spirv: Handle the InputAttachmentIndex decoration

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agocompiler: Add the rest of the subpassInput types
Jason Ekstrand [Tue, 15 Nov 2016 02:11:07 +0000 (18:11 -0800)]
compiler: Add the rest of the subpassInput types

There are actually 6 of them according to the GL_KHR_vulkan_glsl spec.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv/cmd_buffer: Emit CS push constants after binding tables
Jason Ekstrand [Tue, 22 Nov 2016 17:35:01 +0000 (09:35 -0800)]
anv/cmd_buffer: Emit CS push constants after binding tables

Emitting binding tables can cause push constants to be dirtied if the
shader uses images so we need to handle push constants later.

7 years agogallium: fix more occurences of u_hash.h
Marek Olšák [Tue, 22 Nov 2016 17:28:18 +0000 (18:28 +0100)]
gallium: fix more occurences of u_hash.h

this fixes compile failures since 86514d84e0beec47c82da4888db12bf07f33cb83

7 years agomesa: use special checksums for unset checksums and fixed-func shaders
Marek Olšák [Fri, 18 Nov 2016 20:00:27 +0000 (21:00 +0100)]
mesa: use special checksums for unset checksums and fixed-func shaders

for debugging

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
7 years agoglsl: add gl_linked_shader::SourceChecksum
Marek Olšák [Fri, 18 Nov 2016 18:49:55 +0000 (19:49 +0100)]
glsl: add gl_linked_shader::SourceChecksum

for debugging

v2: wrap all checksums in #ifdef DEBUG

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
7 years agomesa: use util_hash_crc32 instead of _mesa_str_checksum
Marek Olšák [Fri, 18 Nov 2016 18:23:55 +0000 (19:23 +0100)]
mesa: use util_hash_crc32 instead of _mesa_str_checksum

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
7 years agoutil: import CRC32 implementation from gallium
Marek Olšák [Fri, 18 Nov 2016 18:20:54 +0000 (19:20 +0100)]
util: import CRC32 implementation from gallium

Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
7 years agoanv/cmd_buffer: Add an assert on emit_binding_table failure
Jason Ekstrand [Tue, 22 Nov 2016 16:11:45 +0000 (08:11 -0800)]
anv/cmd_buffer: Add an assert on emit_binding_table failure

The != VK_SUCCESS case is really only capable of handling the one error.
This assert makes things a bit safer if something else goes wrong.

Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agogbm: request correct version of the DRI2_FENCE extension
Lucas Stach [Tue, 22 Nov 2016 10:12:35 +0000 (11:12 +0100)]
gbm: request correct version of the DRI2_FENCE extension

There is no version 2 of the DRI2_FENCE extension. So only a request
for version 1 has a chance to succeed.

Fixes: 74b1969d717f (gbm: wire up fence extension)
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoanv/cmd_buffer: Emit a CS stall before setting a CS pipeline
Jason Ekstrand [Tue, 22 Nov 2016 04:22:53 +0000 (20:22 -0800)]
anv/cmd_buffer: Emit a CS stall before setting a CS pipeline

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
7 years agoanv/cmd_buffer: Re-emit MEDIA_CURBE_LOAD when CS push constants are dirty
Jason Ekstrand [Tue, 22 Nov 2016 04:21:24 +0000 (20:21 -0800)]
anv/cmd_buffer: Re-emit MEDIA_CURBE_LOAD when CS push constants are dirty

This can happen even if the binding table isn't changed.  For instance, you
could have dynamic offsets with your descriptor set.  This fixes the new
stress.lots-of-surface-state.cs.dynamic cricible test.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
7 years agoanv/cmd_buffer: Handle running out of binding tables in compute shaders
Jason Ekstrand [Tue, 22 Nov 2016 04:17:24 +0000 (20:17 -0800)]
anv/cmd_buffer: Handle running out of binding tables in compute shaders

If we try to allocate a binding table and fail, we have to get a new
binding table block, re-emit STATE_BASE_ADDRESS, and then try again.  We
already handle this correctly for 3D and blorp but it never got handled for
CS.  This fixes the new stress.lots-of-surface-state.cs.static crucible test.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
7 years agoi965/compiler: Disable trig workarounds on KBL+
Jason Ekstrand [Tue, 8 Nov 2016 21:21:39 +0000 (13:21 -0800)]
i965/compiler: Disable trig workarounds on KBL+

The precision of our trig instructions appears to have been fixed on Kaby
Lake.  Neither Ben nor I can find any documentation for this.  However, the
dEQP precision tests now pass with INTEL_PRECISE_TRIG=0 where they fail on
Sky Lake.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agointel/common: Add an is_kabylake field to gen_device_info
Jason Ekstrand [Tue, 8 Nov 2016 21:21:38 +0000 (13:21 -0800)]
intel/common: Add an is_kabylake field to gen_device_info

Most of the 3-D engine Kaby Lake is identical to Sky Lake.  However, there
are a few small differences that we need to be able to detect.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
7 years agoanv: Fix unintentional integer overflow in anv_CreateDmaBufImageINTEL
Gwan-gyeong Mun [Sun, 20 Nov 2016 11:44:22 +0000 (20:44 +0900)]
anv: Fix unintentional integer overflow in anv_CreateDmaBufImageINTEL

Since both pCreateInfo->strideInBytes and pCreateInfo->extent.height
are of uint32_t type 32-bit arithmetic will be used.

Fix unintentional integer overflow by casting to uint64_t before
multifying.

CID 1394321

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
[Emil Velikov: cast only of the arguments]
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoutil/disk_cache: close a previously opened handle in disk_cache_put (v2)
Gwan-gyeong Mun [Mon, 21 Nov 2016 15:21:23 +0000 (00:21 +0900)]
util/disk_cache: close a previously opened handle in disk_cache_put (v2)

We're missing the close() to the matching open().

CID 1373407

v2: Fixes from Emil Velikov's review
    Update the teardown in reverse order of the setup/init.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v1)
7 years agodocs: get rid of duplicated description from sourcetree.html
Gwan-gyeong Mun [Tue, 22 Nov 2016 11:51:52 +0000 (20:51 +0900)]
docs: get rid of duplicated description from sourcetree.html

Fixes: 438086efb17 (docs: sourcetree.html misc updates)
Signed-off-by: Mun Gwan-gyeong <elongbug@gmail.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agodocs/submitting patches: mention get_reviewers.pl
Emil Velikov [Mon, 21 Nov 2016 17:53:33 +0000 (17:53 +0000)]
docs/submitting patches: mention get_reviewers.pl

Mention the script - why/how to use alongside a useful trick to make it
work interactively (thanks Rob!).

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
7 years agodocs/submitting patches: add git tips
Timothy Arceri [Mon, 21 Nov 2016 16:30:12 +0000 (16:30 +0000)]
docs/submitting patches: add git tips

v2: [Emil Velikov]
 - Add the shorthand git send-email -vX
 - Move to submittingpatches.html
 - Add to the TOC.

v3: [Emil Velikov]
 - Use @~8 instead of HEAD~8 (Nicolai)

Cc: Timothy Arceri <t_arceri@yahoo.com.au>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v1)
7 years agoauxiliary/vl/dri: call get_xcb_screen() only once
Emil Velikov [Mon, 21 Nov 2016 13:46:52 +0000 (13:46 +0000)]
auxiliary/vl/dri: call get_xcb_screen() only once

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
7 years agoegl/x11: store xcb_screen_t *screen instead of int screen
Emil Velikov [Mon, 21 Nov 2016 13:46:51 +0000 (13:46 +0000)]
egl/x11: store xcb_screen_t *screen instead of int screen

Just fetch and store it once, rather than doing the
xcb_setup_roots_iterator + get_xcb_screen dance five times.

v2: Call xcb_disconnect() on error (Eric)

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
7 years agoegl/x11: factor out dri2_get_xcb_connection()
Emil Velikov [Mon, 21 Nov 2016 13:46:50 +0000 (13:46 +0000)]
egl/x11: factor out dri2_get_xcb_connection()

Identical throughout dri2, dri3 and drisw. Next patch will add more
common code, so rather than duplicating it factor out the function.

Note: this also sets eglError on failure. Something that's quite
inconsistent throughout the codebase.

v2: Call xcb_disconnect() on error (Eric)

Note: use xcb_disconnect() even in the xcb_connection_has_error() case
as per the manual:
... memory will not be freed until xcb_disconnect...

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com> (v1)
7 years agomesa/glsl: remove unused uses_builtin_functions field
Timothy Arceri [Tue, 22 Nov 2016 06:59:41 +0000 (17:59 +1100)]
mesa/glsl: remove unused uses_builtin_functions field

This has been unused since 943b69cddd

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
7 years agoi965: Use NIR-based clip/cull lowering for OpenGL as well.
Kenneth Graunke [Mon, 17 Oct 2016 21:23:10 +0000 (14:23 -0700)]
i965: Use NIR-based clip/cull lowering for OpenGL as well.

The old approach works fine, and this approach isn't necessarily better.
But it at least has the advantage that Vulkan and GL use the same
approach.  I originally wrote it to gain additional testing for the
new paths.

shader-db statistics show 0 instruction count changes.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Enable clip and cull distance support.
Kenneth Graunke [Tue, 4 Oct 2016 03:44:38 +0000 (20:44 -0700)]
anv: Enable clip and cull distance support.

Everything is now in place, and we appear to pass the tests on Gen7+.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965/vec4: Handle component qualifiers on non-generic varyings.
Kenneth Graunke [Mon, 17 Oct 2016 18:14:10 +0000 (11:14 -0700)]
i965/vec4: Handle component qualifiers on non-generic varyings.

ARB_enhanced_layouts only requires component qualifier support for
generic varyings, so this is all the vec4 backend knew how to handle.

This patch extends the backend to handle it for all varyings, so we
can use store_output intrinsics with a component set for things like
clip/cull distances.  We may want to use that for other VUE header
fields in the future as well.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
7 years agoi965/fs: Handle compact outputs.
Kenneth Graunke [Tue, 4 Oct 2016 08:59:33 +0000 (01:59 -0700)]
i965/fs: Handle compact outputs.

We need to calculate the number of vec4 slots correctly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agospirv: Silence unsupported capability warnings for Clip/CullDistance.
Kenneth Graunke [Tue, 4 Oct 2016 06:46:37 +0000 (23:46 -0700)]
spirv: Silence unsupported capability warnings for Clip/CullDistance.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Set clip/cull distances fields in packets.
Kenneth Graunke [Tue, 4 Oct 2016 06:44:07 +0000 (23:44 -0700)]
anv: Set clip/cull distances fields in packets.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: Combine ClipDistance and CullDistance arrays.
Kenneth Graunke [Tue, 4 Oct 2016 03:42:42 +0000 (20:42 -0700)]
anv: Combine ClipDistance and CullDistance arrays.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agonir: add a pass to compact clip/cull distances.
Kenneth Graunke [Tue, 4 Oct 2016 05:37:40 +0000 (22:37 -0700)]
nir: add a pass to compact clip/cull distances.

v2: Use nir_is_per_vertex_io() rather than is_arrays_of_arrays().

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agonir: Add a "compact array" flag and IO lowering code.
Kenneth Graunke [Tue, 4 Oct 2016 03:32:22 +0000 (20:32 -0700)]
nir: Add a "compact array" flag and IO lowering code.

Certain built-in arrays, such as gl_ClipDistance[], gl_CullDistance[],
gl_TessLevelInner[], and gl_TessLevelOuter[] are specified as scalar
arrays.  Normal scalar arrays are sparse - each array element usually
occupies a whole vec4 slot.  However, most hardware assumes these
built-in arrays are tightly packed.

The new var->data.compact flag indicates that a scalar array should
be tightly packed, so a float[4] array would take up a single vec4
slot, and a float[8] array would take up two slots.

They are still arrays, not vec4s, however.  nir_lower_io will generate
intrinsics using ARB_enhanced_layouts style component qualifiers.

v2: Add nir_validate code to enforce type restrictions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoradv: add support for shader stats dump
Dave Airlie [Tue, 22 Nov 2016 04:17:49 +0000 (04:17 +0000)]
radv: add support for shader stats dump

I've started working on a shader-db alike for Vulkan,
it's based on vktrace and it records pipelines, this
adds support to dump the shader stats exactly like
radeonsi does, so I can reuse the shader-db scripts it
uses.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: fix sample id loading
Dave Airlie [Mon, 21 Nov 2016 01:12:39 +0000 (01:12 +0000)]
radv: fix sample id loading

The sample id is packed into bits 8-12, so adjust
things properly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: add implementation of load_sample_pos intrinsic.
Dave Airlie [Wed, 16 Nov 2016 03:54:22 +0000 (03:54 +0000)]
radv/ac: add implementation of load_sample_pos intrinsic.

This fixes a bunch of crashes in CTS tests looking for this.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: cleanup ddxy emission
Dave Airlie [Sun, 20 Nov 2016 23:56:45 +0000 (23:56 +0000)]
radv/ac: cleanup ddxy emission

This cleans up the ddxy emission along the same lines as
radeonsi. It also means we don't use LDS on VI chips we
use the dspermute interface, it also removes some duplicated
code.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/meta: cleanup resolve vertex state emission
Dave Airlie [Mon, 21 Nov 2016 03:31:09 +0000 (03:31 +0000)]
radv/meta: cleanup resolve vertex state emission

For the hw resolve there is no need to emit any sort
of texture coordinates, so drop them all in the meta path.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Incorporate GPU family into cache UUID.
Bas Nieuwenhuizen [Mon, 21 Nov 2016 23:39:50 +0000 (00:39 +0100)]
radv: Incorporate GPU family into cache UUID.

Invalidates the cache when someone switches cards.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
7 years agoradv: Use library mtime for cache UUID.
Bas Nieuwenhuizen [Mon, 21 Nov 2016 23:19:30 +0000 (00:19 +0100)]
radv: Use library mtime for cache UUID.

We want to also invalidate the cache when LLVM gets changed. As the
specific LLVM revision is not fixed at build time, we will need to
check at runtime. Computing a checksum for LLVM is going to be very
expensive, so just use the mtime.

Tested on my computer that the returned DSO for the LLVM symbol is
actually the LLVM DSO.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>