mesa.git
7 years agoi965/gen6: Remove check for stencil format
Topi Pohjolainen [Thu, 29 Dec 2016 08:06:16 +0000 (10:06 +0200)]
i965/gen6: Remove check for stencil format

There are is no alternative.

Reviewed-by: Samuel Iglesias Gons\341lvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965: Remove check for hiz on earlier gens than SNB
Topi Pohjolainen [Wed, 28 Dec 2016 15:49:56 +0000 (17:49 +0200)]
i965: Remove check for hiz on earlier gens than SNB

Only caller, brw_workaround_depthstencil_alignment(), returns
early for gen6+.

While at it, reduce scope for brw_get_depthstencil_tile_masks() as
well.

Reviewed-by: Samuel Iglesias Gons\341lvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Remove redundant check for null texture
Topi Pohjolainen [Thu, 22 Dec 2016 08:36:03 +0000 (10:36 +0200)]
i965/miptree: Remove redundant check for null texture

There exact same check earlier in brw_miptree_layout() which
intel_miptree_create_layout() in turn calls unconditionally.

Reviewed-by: Samuel Iglesias Gons\341lvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/miptree: Tell when brw_miptree_layout() fails
Topi Pohjolainen [Tue, 17 Jan 2017 08:10:17 +0000 (10:10 +0200)]
i965/miptree: Tell when brw_miptree_layout() fails

In addition, let intel_miptree_create_layout() release the
miptree - it is the allocator.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoi965/meta: Remove unused brw_get_rb_for_slice()
Topi Pohjolainen [Thu, 22 Dec 2016 08:09:55 +0000 (10:09 +0200)]
i965/meta: Remove unused brw_get_rb_for_slice()

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gons<C3><A1>lvez <siglesias@igalia.com>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agoclover: Fix build against clang SVN >= r293097
Michel Dänzer [Thu, 26 Jan 2017 06:28:12 +0000 (15:28 +0900)]
clover: Fix build against clang SVN >= r293097

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
7 years agovc4: Use NEON to speed up utile stores on Pi2+.
Eric Anholt [Sun, 8 Jan 2017 22:54:57 +0000 (14:54 -0800)]
vc4: Use NEON to speed up utile stores on Pi2+.

Improves 1024x1024 TexSubImage2D by 41.2371% +/- 3.52799% (n=10).

7 years agovc4: Use NEON to speed up utile loads on Pi2.
Eric Anholt [Thu, 5 Jan 2017 23:11:30 +0000 (15:11 -0800)]
vc4: Use NEON to speed up utile loads on Pi2.

We had a lot of memcpy call overhead because gpu_stride wasn't being
inlined.  But if you split out the stride==8 and stride==16 cases like
this code does while still using memcpy, you'd no longer have glibc's
NEON memcpy applied at which point we'd be doing 16 uncached reads
instead of 64/(NEON memcpy granularity), for about a 30% performance
hit.  By hand writing the assembly, we can get a whole cacheline
loaded at a time.

Unfortunately, NEON intrinsics turned out to be unusable -- they
didn't have the vldm instruction available.

Note that, for now, the NEON code is only enabled when building for ARMv7
(Pi 2+).  We may want to do runtime detection for the Raspbian case, in
the future.

Improves 1024x1024 GetTexImage by 208.256% +/- 7.07029% (n=10).

7 years agovc4: Move LT tiling code to a separate file.
Eric Anholt [Fri, 6 Jan 2017 18:51:22 +0000 (10:51 -0800)]
vc4: Move LT tiling code to a separate file.

This paves the way for building it twice, with NEON assembly or not.

7 years agovc4: Use unreachable() in an unreachable codepath for tiling.
Eric Anholt [Fri, 6 Jan 2017 18:55:07 +0000 (10:55 -0800)]
vc4: Use unreachable() in an unreachable codepath for tiling.

7 years agogallium/radeon: add VRAM-vis-usage HUD query
Samuel Pitoiset [Wed, 25 Jan 2017 15:56:46 +0000 (16:56 +0100)]
gallium/radeon: add VRAM-vis-usage HUD query

This new query returns the current visible usage of VRAM accessed
by the CPU. It will return 0 on radeon because it's unimplemented.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/radeon: query the CPU accessible size of VRAM
Samuel Pitoiset [Wed, 25 Jan 2017 15:56:45 +0000 (16:56 +0100)]
gallium/radeon: query the CPU accessible size of VRAM

R600_DEBUG="info" can be used to display that size, as well as
the total amount of VRAM/GTT.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agomesa: Arrange validate_uniform_parameters parameters to match call sites
Ian Romanick [Thu, 6 Nov 2014 19:12:31 +0000 (11:12 -0800)]
mesa: Arrange validate_uniform_parameters parameters to match call sites

Saves a measly 20 bytes on IA32 and nothing on x64.  Depending on
exactly when this is applied, a lot of variation is possible due to
function alignment.

   text    data     bss     dec     hex filename
6670131  228340   22552 6921023  699b3f lib/i965_dri.so before
6670111  228340   22552 6921003  699b2b lib/i965_dri.so after
6342932  293872   29880 6666684  65b9bc lib64/i965_dri.so before
6342932  293872   29880 6666684  65b9bc lib64/i965_dri.so after

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agomesa: Arrange _mesa_uniform parameters to match the call sites
Ian Romanick [Thu, 6 Nov 2014 02:44:21 +0000 (18:44 -0800)]
mesa: Arrange _mesa_uniform parameters to match the call sites

By putting the parameters first that match the parameters to the call
site, 4 (of 14) instructions are saved at _mesa_Uniform4fv on x64.  On
IA32, the details of the instructions change, but it is the same count
and mix of instructions.

Before:

0000000000000830 <_mesa_Uniform4fv>:
     830:       48 83 ec 10             sub    $0x10,%rsp
     834:       49 89 d0                mov    %rdx,%r8
     837:       48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 83e <_mesa_Uniform4fv+0xe>
     83e:       89 f8                   mov    %edi,%eax
     840:       89 f1                   mov    %esi,%ecx
     842:       41 b9 02 00 00 00       mov    $0x2,%r9d
     848:       64 48 8b 3a             mov    %fs:(%rdx),%rdi
     84c:       48 8b 97 c8 01 02 00    mov    0x201c8(%rdi),%rdx
     853:       48 8b 72 70             mov    0x70(%rdx),%rsi
     857:       6a 04                   pushq  $0x4
     859:       89 c2                   mov    %eax,%edx
     85b:       e8 00 00 00 00          callq  860 <_mesa_Uniform4fv+0x30>
     860:       48 83 c4 18             add    $0x18,%rsp
     864:       c3                      retq

After:

00000000000007f0 <_mesa_Uniform4fv>:
     7f0:       48 83 ec 10             sub    $0x10,%rsp
     7f4:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 7fb <_mesa_Uniform4fv+0xb>
     7fb:       41 b9 02 00 00 00       mov    $0x2,%r9d
     801:       64 48 8b 08             mov    %fs:(%rax),%rcx
     805:       48 8b 81 c8 01 02 00    mov    0x201c8(%rcx),%rax
     80c:       6a 04                   pushq  $0x4
     80e:       4c 8b 40 70             mov    0x70(%rax),%r8
     812:       e8 00 00 00 00          callq  817 <_mesa_Uniform4fv+0x27>
     817:       48 83 c4 18             add    $0x18,%rsp
     81b:       c3                      retq

Saves a measly 416 bytes of text on x64.  Depending on exactly when this
is applied, a lot of variation is possible due to function alignment.

   text    data     bss     dec     hex filename
6670131  228340   22552 6921023  699b3f lib/i965_dri.so before
6670131  228340   22552 6921023  699b3f lib/i965_dri.so after
6343348  293872   29880 6667100  65bb5c lib64/i965_dri.so before
6342932  293872   29880 6666684  65b9bc lib64/i965_dri.so after

There is likely to be no performance change with just this patch.
_mesa_uniform immediately calls validate_uniform_parameters with
parameters in the "wrong" (different from the call site) order.

v2: Rebase on GL_ARB_gpu_shader_fp64.

v3: Rebase on GL_ARB_gpu_shader_int64.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agomesa: Arrange _mesa_uniform_matrix parameters to match the call sites
Ian Romanick [Thu, 6 Nov 2014 01:48:40 +0000 (17:48 -0800)]
mesa: Arrange _mesa_uniform_matrix parameters to match the call sites

By putting the parameters first that match the parameters to the call
site, 4 (of 16) instructions are saved at _mesa_UniformMatrix4fv on
x64.  On IA32, the details of the instructions change, but it is the
same count and mix of instructions.

Before:

0000000000001380 <_mesa_UniformMatrix4fv>:
    1380:       48 83 ec 10             sub    $0x10,%rsp
    1384:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 138b <_mesa_UniformMatrix4fv+0xb>
    138b:       41 89 f8                mov    %edi,%r8d
    138e:       41 89 f1                mov    %esi,%r9d
    1391:       0f b6 d2                movzbl %dl,%edx
    1394:       64 48 8b 38             mov    %fs:(%rax),%rdi
    1398:       48 8b b7 c8 01 02 00    mov    0x201c8(%rdi),%rsi
    139f:       48 8b 76 70             mov    0x70(%rsi),%rsi
    13a3:       68 06 14 00 00          pushq  $0x1406
    13a8:       51                      push   %rcx
    13a9:       52                      push   %rdx
    13aa:       b9 04 00 00 00          mov    $0x4,%ecx
    13af:       ba 04 00 00 00          mov    $0x4,%edx
    13b4:       e8 00 00 00 00          callq  13b9 <_mesa_UniformMatrix4fv+0x39>
    13b9:       48 83 c4 28             add    $0x28,%rsp
    13bd:       c3                      retq

After:

0000000000001360 <_mesa_UniformMatrix4fv>:
    1360:       48 83 ec 10             sub    $0x10,%rsp
    1364:       48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 136b <_mesa_UniformMatrix4fv+0xb>
    136b:       0f b6 d2                movzbl %dl,%edx
    136e:       64 4c 8b 00             mov    %fs:(%rax),%r8
    1372:       49 8b 80 c8 01 02 00    mov    0x201c8(%r8),%rax
    1379:       68 06 14 00 00          pushq  $0x1406
    137e:       6a 04                   pushq  $0x4
    1380:       6a 04                   pushq  $0x4
    1382:       4c 8b 48 70             mov    0x70(%rax),%r9
    1386:       e8 00 00 00 00          callq  138b <_mesa_UniformMatrix4fv+0x2b>
    138b:       48 83 c4 28             add    $0x28,%rsp
    138f:       c3                      retq

Saves a measly 576 bytes of text on x64.

   text    data     bss     dec     hex filename
6670131  228340   22552 6921023  699b3f lib/i965_dri.so before
6670131  228340   22552 6921023  699b3f lib/i965_dri.so after
6343924  293872   29880 6667676  65bd9c lib64/i965_dri.so before
6343348  293872   29880 6667100  65bb5c lib64/i965_dri.so after

v2: Rebase on GL_ARB_gpu_shader_fp64.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agomesa: Trivial clean-ups in uniform_query.cpp
Ian Romanick [Tue, 1 Sep 2015 17:29:04 +0000 (10:29 -0700)]
mesa: Trivial clean-ups in uniform_query.cpp

This is C++, so we can mix code and declarations.  Doing so allows
constification.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agospirv: handle undefined components for OpVectorShuffle
Lionel Landwerlin [Thu, 26 Jan 2017 16:57:40 +0000 (16:57 +0000)]
spirv: handle undefined components for OpVectorShuffle

Fixes:
   dEQP-VK.spirv_assembly.instruction.compute.opspecconstantop.vector_related
   dEQP-VK.spirv_assembly.instruction.graphics.opspecconstantop.vector_related*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
7 years agospirv: handle OpUndef as part of the variable parsing pass
Lionel Landwerlin [Thu, 26 Jan 2017 16:57:25 +0000 (16:57 +0000)]
spirv: handle OpUndef as part of the variable parsing pass

Looking at the following bit of SPIRV shader :

...
%zero        = OpConstant %i32 0
%ivec3_0     = OpConstantComposite %ivec3 %zero %zero %zero
%vec3_undef  = OpUndef %ivec3
%sc_0        = OpSpecConstant %i32 0
%sc_1        = OpSpecConstant %i32 0
%sc_2        = OpSpecConstant %i32 0
...

Our compiler currently stops parsing variables & types on the OpUndef
and switches to instructions, leaving the following sc_[0-2] variables
untreated.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
7 years agoanv: fix descriptor pool internal size allocation
Lionel Landwerlin [Thu, 26 Jan 2017 11:06:53 +0000 (11:06 +0000)]
anv: fix descriptor pool internal size allocation

The size of the pool is slightly smaller than the size of the
structure containing the whole pool. We need to take that into account
on when setting up the internals.

Fixes a crash due to out of bound memory access in:
   dEQP-VK.api.descriptor_pool.out_of_pool_memory

v2: Drop debug traces (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0 13.0" <mesa-stable@lists.freedesktop.org>
7 years agoi965: Make intelEmitCopyBlit not truncate large strides.
Kenneth Graunke [Sun, 22 Jan 2017 09:44:08 +0000 (01:44 -0800)]
i965: Make intelEmitCopyBlit not truncate large strides.

When trying to blit larger tiled surfaces, the pitch can be larger than
32768 bytes, which means it won't fit in a GLshort.  Passing it in will
truncate the stride to 0, which has...surprising results.

The pitch can be up to 32,768 DWords, or 128kB.  We measure it in bytes,
but divide by 4 when programming it.  So we need to handle values up to
131,072.  Switch from GLshort to int32_t to avoid the truncation.

Fixes GL45-CTS.gtf30.GL3Tests.depth_texture.depth_texture_copyteximage
at widths greater than 8192.

v2: Use int32_t as negative values can be used (Jason).

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoi965: Use a UW source type for CS_OPCODE_CS_TERMINATE.
Kenneth Graunke [Tue, 24 Jan 2017 08:45:53 +0000 (00:45 -0800)]
i965: Use a UW source type for CS_OPCODE_CS_TERMINATE.

SIMD16 compute shaders use a send(16) with mlen 1 for the EOT message,
using a source of g127 for the single register.  With a UD type, this
supposedly could read g128, which doesn't exist, causing the simulator
to get cranky.  Use a UW type to avoid this.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoanv/lower_input_attachments: honor sample index parameter to subpassLoad()
Iago Toral Quiroga [Wed, 25 Jan 2017 14:04:35 +0000 (15:04 +0100)]
anv/lower_input_attachments: honor sample index parameter to subpassLoad()

According to GL_KHR_vulkan_glsl, the signature of subpassLoad() is:

gvec4 subpassLoad(gsubpassInput   subpass);
gvec4 subpassLoad(gsubpassInputMS subpass, int sample);

So the multisampled case always receives an explicit sample index that we
should use. The current implementation was ignoring this parameter
and using gl_SampleID value instead.

Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_id.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
7 years agoi965: Fix fast depth clears for surfaces with a dimension of 16384.
Kenneth Graunke [Mon, 23 Jan 2017 19:57:21 +0000 (11:57 -0800)]
i965: Fix fast depth clears for surfaces with a dimension of 16384.

I hadn't bothered to set this bit because I figured it would just
paper over us getting the rectangle wrong.  But it turns out that
there is a legitimate reason to use it, so let's do so.

The alternative would be to chop up 16k clears to multiple 8k clears,
which is pointlessly painful.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
7 years agoanv: Implement VK_KHR_get_physical_device_properties2
Chad Versace [Wed, 25 Jan 2017 20:12:20 +0000 (12:12 -0800)]
anv: Implement VK_KHR_get_physical_device_properties2

Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Refactor anv_GetPhysicalDeviceQueueFamilyProperties()
Chad Versace [Wed, 25 Jan 2017 20:12:19 +0000 (12:12 -0800)]
anv: Refactor anv_GetPhysicalDeviceQueueFamilyProperties()

Add a helper function, anv_get_queue_family_properties(), which fills the
struct.  This patch reduces churn in the following patch that implements
vkGetPhysicalDeviceQueueFamilyProperties2KHR.

Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Refactor anv_GetPhysicalDeviceFormatProperties()
Chad Versace [Wed, 25 Jan 2017 20:12:18 +0000 (12:12 -0800)]
anv: Refactor anv_GetPhysicalDeviceFormatProperties()

Add a helper function, anv_get_image_format_properties(), which does all
the work and has a VkPhysicalDeviceImageFormatInfo2KHR parameter. This
patch reduces churn in the following patch that implements
vkGetPhysicalDeviceImageFormatProperties2KHR.

Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Revive struct anv_common
Chad Versace [Wed, 25 Jan 2017 21:53:00 +0000 (13:53 -0800)]
anv: Revive struct anv_common

The struct was deleted by:
  commit efe9d1cde3340d3a9d17e5560b609a4fb839d43d
  Author: Edward O'Callaghan <funfunctor@folklore1984.net>
  Subject: anv: Clean up some unused variables

Unlike the original anv_common, the new one has a non-const pNext
pointer because we will use it for the output structs of
VK_KHR_get_physical_device_properties2.

v2:
  - Retype pNext from void* to struct anv_common*.

Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Define macro anv_debug()
Chad Versace [Wed, 25 Jan 2017 20:12:16 +0000 (12:12 -0800)]
anv: Define macro anv_debug()

This is a printf-like macro that prints a debug message to stderr when
built with DEBUG.  If no DEBUG, then do nothing.

Reviewed-by: Jason Ekstranad <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agomesa: Fix copy-and-paste bug in _mesa_(Program|)Uniform[1234](i|ui)64vARB functions
Ian Romanick [Wed, 25 Jan 2017 00:13:01 +0000 (16:13 -0800)]
mesa: Fix copy-and-paste bug in _mesa_(Program|)Uniform[1234](i|ui)64vARB functions

All of the functions were passing 1 to _mesa_uniform instead of passing
count.

Fixes 16 unsed parameter warnings like:

main/uniforms.c: In function ‘_mesa_Uniform1i64vARB’:
main/uniforms.c:1692:47: warning: unused parameter ‘count’ [-Wunused-parameter]
 _mesa_Uniform1i64vARB(GLint location, GLsizei count, const GLint64 *value)
                                               ^~~~~

This is why I build with extra warnings enabled.  Unfortunately, there
are so many unused parameter warnings in Mesa that I didn't notice these
added warnings for over 6 months. :(

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agospirv: bump headers to SPIRV 1.1
Lionel Landwerlin [Wed, 25 Jan 2017 14:04:05 +0000 (14:04 +0000)]
spirv: bump headers to SPIRV 1.1

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agospirv: add default handler for new enums
Lionel Landwerlin [Wed, 25 Jan 2017 14:03:31 +0000 (14:03 +0000)]
spirv: add default handler for new enums

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agospirv: fix typos
Lionel Landwerlin [Wed, 25 Jan 2017 12:28:50 +0000 (12:28 +0000)]
spirv: fix typos

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoanv: set command buffer to NULL when allocations fail
Lionel Landwerlin [Wed, 25 Jan 2017 16:22:40 +0000 (16:22 +0000)]
anv: set command buffer to NULL when allocations fail

The spec section 5.2 says:

   "vkAllocateCommandBuffers can be used to create multiple command
   buffers. If the creation of any of those command buffers fails, the
   implementation must destroy all successfully created command buffer
   objects from this command, set all entries of the pCommandBuffers
   array to VK_NULL_HANDLE and return the error."

Fixes:
   dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_primary
   dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_secondary

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
7 years agovulkan/wsi: Lower the maximum image sizes
Jason Ekstrand [Wed, 25 Jan 2017 01:10:45 +0000 (17:10 -0800)]
vulkan/wsi: Lower the maximum image sizes

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.0" <mesa-dev@lists.freedesktop.org>
7 years agovulkan/wsi/wayland: Handle VK_INCOMPLETE for GetPresentModes
Jason Ekstrand [Wed, 25 Jan 2017 00:43:15 +0000 (16:43 -0800)]
vulkan/wsi/wayland: Handle VK_INCOMPLETE for GetPresentModes

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.0" <mesa-dev@lists.freedesktop.org>
7 years agovulkan/wsi/wayland: Handle VK_INCOMPLETE for GetFormats
Jason Ekstrand [Wed, 25 Jan 2017 00:43:01 +0000 (16:43 -0800)]
vulkan/wsi/wayland: Handle VK_INCOMPLETE for GetFormats

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: "17.0" <mesa-dev@lists.freedesktop.org>
7 years agoswr: Update fs texture & sampler state logic
George Kyriazis [Tue, 24 Jan 2017 23:19:55 +0000 (17:19 -0600)]
swr: Update fs texture & sampler state logic

In swr_update_derived() update texture and sampler state on a new fragment
shader.  GALLIUM_HUD can update fs using a previously bound texture and
sampler.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agogallium/radeon: add a new HUD query for the number of mapped buffers
Samuel Pitoiset [Mon, 23 Jan 2017 20:44:45 +0000 (21:44 +0100)]
gallium/radeon: add a new HUD query for the number of mapped buffers

Useful when debugging applications which map a ton of buffers
and also because we used to run into Linux's limit on the number
of simultaneous mmap() calls.

v2: - update the commit message

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agospirv: handle gl_SampleMask
Iago Toral Quiroga [Tue, 24 Jan 2017 10:49:40 +0000 (11:49 +0100)]
spirv: handle gl_SampleMask

SPIR-V maps both gl_SampleMask and gl_SampleMaskIn to the same
builtin (SampleMask). The only way to tell which one we are dealing with
is to check if it is an input or an output.

Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.write.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agospirv: acknowledge multisampled input attachments
Iago Toral Quiroga [Tue, 24 Jan 2017 09:48:01 +0000 (10:48 +0100)]
spirv: acknowledge multisampled input attachments

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoradv: program a default point size.
Dave Airlie [Wed, 18 Jan 2017 03:46:43 +0000 (13:46 +1000)]
radv: program a default point size.

Along the lines of what
3b804819 anv: Default PointSize to 1.0 if not written by the shader
does for anv, program a default point size in the hw of 1.0.

This preempt fixes a bunch of geom shader tests.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradeonsi: handle first_non_void correctly in si_create_vertex_elements
Marek Olšák [Fri, 20 Jan 2017 15:02:04 +0000 (16:02 +0100)]
radeonsi: handle first_non_void correctly in si_create_vertex_elements

This fixes R11G11B10_FLOAT, because it's in the category of "OTHER",
meaning that it doesn't have any channel description.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agost/mesa: destroy pipe_context before destroying st_context (v2)
Marek Olšák [Fri, 20 Jan 2017 01:26:42 +0000 (02:26 +0100)]
st/mesa: destroy pipe_context before destroying st_context (v2)

If radeonsi starts compiling an optimized shader variant asynchronously
with a GL debug callback set and the application destroys the GL context,
radeonsi crashes when trying to write shader stats into the debug output
of a non-existent context after compilation, because st/mesa was destroyed
before pipe_context.

Firefox with WebGL2 enabled hits this bug.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99456

v2: protect against a double destroy in st_create_context_priv and callers.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agonir: bump loop max unroll limit
Timothy Arceri [Wed, 18 Jan 2017 02:12:37 +0000 (13:12 +1100)]
nir: bump loop max unroll limit

The original number was chosen in an attempt to match the limits applied to
GLSL IR.

A look at the git history of the why these limits were chosen for GLSL IR
shows it was more to do with the slow speed of unrolling large loops in
GLSL IR than anything else. The speed of loop unrolling in NIR is not a
problem so we may wish to bump this even higher in future.

No shader-db change, however a furture change will disbale the GLSL IR
optimisation loop in the i965 backend results in 4 loops from The Talos
Principle failing to unroll. Bumping the limit allows them to unroll which
results in the instruction count matching the previous output from when the
GLSL IR opts were still enabled.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoglsl: lower constant arrays to uniform arrays before optimisation loop
Timothy Arceri [Tue, 24 Jan 2017 03:07:04 +0000 (14:07 +1100)]
glsl: lower constant arrays to uniform arrays before optimisation loop

Previously the constant array would not get copy propagated until the backend
did its GLSL IR opt loop. I plan on removing that from i965 shortly which
caused huge regressions in Deus-ex and Tomb Raider which have large
constant arrays. Moving lowering before the opt loop in the GLSL linker
fixes this and unexpectedly improves some compute shaders also.

shader-db results BDW:

instructions helped:   shaders/closed/steam/deus-ex-mankind-divided/374.shader_test CS SIMD16: 204 -> 194 (-4.90%)
instructions helped:   shaders/closed/steam/deus-ex-mankind-divided/318.shader_test CS SIMD8: 1010 -> 741 (-26.63%)
instructions helped:   shaders/closed/steam/deus-ex-mankind-divided/144.shader_test CS SIMD8: 542 -> 385 (-28.97%)

cycles helped:   shaders/closed/steam/deus-ex-mankind-divided/318.shader_test CS SIMD8: 1831382 -> 1818492 (-0.70%)
cycles helped:   shaders/closed/steam/deus-ex-mankind-divided/144.shader_test CS SIMD8: 216238 -> 206180 (-4.65%)
cycles helped:   shaders/closed/steam/deus-ex-mankind-divided/374.shader_test CS SIMD16: 18484 -> 16644 (-9.95%)

total instructions in shared programs: 13060313 -> 13059877 (-0.00%)
instructions in affected programs: 1756 -> 1320 (-24.83%)
helped: 3
HURT: 0

total cycles in shared programs: 256586698 -> 256561910 (-0.01%)
cycles in affected programs: 2066104 -> 2041316 (-1.20%)
helped: 3
HURT: 0

V3: only call the opt loop if lowering progressed (Suggested by Eric)

V2: call opts before and after lowering (Suggested by Ken)

Reviewed-by: Eric Anholt <eric@anholt.net>
7 years agomesa: Don't advertise GL_OES_read_format in core profile
Ian Romanick [Mon, 23 Jan 2017 17:57:15 +0000 (09:57 -0800)]
mesa: Don't advertise GL_OES_read_format in core profile

OpenGL ES implementations are not allowed to ship ARB extensions, and
OpenGL implementations are not allowed to ship OES extensions.

The functionality is also included in GL_ARB_ES2_compatibility.  Ever
OpenGL core-profile driver currently exposes both extensions.  I don't
know of any applications that explicitly check for GL_OES_read_format,
so removing it seems very unlikely to cause problems.  No functionality
is removed.

I have left this extension in place for compatibility profile.  There
are still OpenGL 1.x drivers in Mesa, and adding code to check for
compatibility profile and not GL_ARB_ES2_compatibility for
GL_IMPLEMENTATION_COLOR_READ_TYPE and GL_IMPLEMENTATION_COLOR_READ_FORMAT
just feels dumb.

Three other other alternatives considered:

 - Remove the string from compatibility profile drivers but leave the
   functionality in place.

 - Add a flag to expose the extension string, and set it in every OpenGL
   driver that does not expose GL_ARB_ES2_compatibility (and those
   drivers only).  I tried this.  You can't have two instances of an
   extension in the extension table (one dummy_true for ES1 and one with
   a flag for compatibility profile), so the implementation requires a
   bit of effort.

 - Only expose the extension in compatibility if the version is less
   than 2.0.  I didn't see an easy way to do this.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
7 years agodocs: fix incorrect link to 12.0.6 release notes
Brian Paul [Tue, 24 Jan 2017 21:30:07 +0000 (14:30 -0700)]
docs: fix incorrect link to 12.0.6 release notes

Trivial.

7 years agoanv: Expose VK_KHR_maintenance1
Jason Ekstrand [Sat, 21 Jan 2017 01:46:33 +0000 (17:46 -0800)]
anv: Expose VK_KHR_maintenance1

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Return better errors from AllocateDescriptorSets
Jason Ekstrand [Sat, 21 Jan 2017 15:40:22 +0000 (07:40 -0800)]
anv: Return better errors from AllocateDescriptorSets

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Allow selecting the slice of a 3D image
Jason Ekstrand [Sat, 21 Jan 2017 03:47:18 +0000 (19:47 -0800)]
anv: Allow selecting the slice of a 3D image

As per VK_KHR_maintenance1, clients can render to a slice of a 3D image
by creating a VK_IMAGE_VIEW_TYPE_2D view of it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Report FORMAT_FEATURE_TRANSFER_SRC/DST_BIT_KHR
Jason Ekstrand [Sat, 21 Jan 2017 03:39:03 +0000 (19:39 -0800)]
anv: Report FORMAT_FEATURE_TRANSFER_SRC/DST_BIT_KHR

As of VK_KHR_maintenance1, these are supposed to be reported for any
formats on which we support transfer operations.  For us, this is
anything that we can texture from.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Add trivial support for TrimCommandPoolKHR
Jason Ekstrand [Sat, 21 Jan 2017 01:46:21 +0000 (17:46 -0800)]
anv: Add trivial support for TrimCommandPoolKHR

Our command buffers already efficiently use a global pool so trimming
doesn't really need to do anything.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agoanv: Set viewport extents correctly when height is negative
Jason Ekstrand [Sat, 21 Jan 2017 01:30:51 +0000 (17:30 -0800)]
anv: Set viewport extents correctly when height is negative

As per VK_KHR_maintenance1, setting a negative height in the viewport
can be used to get flipped coordinates.  This is, aparently, very useful
when porting D3D apps to Vulkan.  All we need to do to support this is
to make sure we actually set the min and max correctly.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
7 years agovulkan: Don't install vk_platform.h or vulkan.h.
Matt Turner [Tue, 24 Jan 2017 00:48:01 +0000 (16:48 -0800)]
vulkan: Don't install vk_platform.h or vulkan.h.

These files belong to the vulkan loader.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
7 years agoglsl: fix compile errors with mingw due to missing PRIx64 definitions
Roland Scheidegger [Mon, 23 Jan 2017 19:21:00 +0000 (20:21 +0100)]
glsl: fix compile errors with mingw due to missing PRIx64 definitions

define __STDC_FORMAT_MACROS and include <inttypes.h> (same as
ir_builder_print_visitor.cpp already does).

Otherwise, some mingw build errors out (since
8e7e1ae0365ddc7edb0d4d98250ab46728e6c14a and
bbce1c538dc0cb8bf3769510283d11847dc07540 presumably) with:
src/compiler/glsl/ir_print_visitor.cpp:479:40: error: expected ‘)’ before ‘PRIu64’
   case GLSL_TYPE_UINT64:fprintf(f, "%" PRIu64, ir->value.u64[i]); break;

(Note even with that fix I get other format specifier warnings:
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: unknown conversion type character ‘a’ in format [-Wformat=]
                fprintf(f, "%a", ir->value.f[i]);
                                               ^
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: too many arguments for format [-Wformat-extra-args]
but it still compiles at least)

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
7 years agogallivm: don't try to use fast rcp for fdiv
Roland Scheidegger [Mon, 23 Jan 2017 17:06:03 +0000 (18:06 +0100)]
gallivm: don't try to use fast rcp for fdiv

The use of fast rcp instruction is disabled, and will always fall back
to use a division instead (1 / x). Hence, if we get a division opcode,
it doesn't make much sense trying to split that into rcp/mul.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
7 years agogallivm: (trivial) fix ddiv cpu implementation
Roland Scheidegger [Mon, 23 Jan 2017 17:04:12 +0000 (18:04 +0100)]
gallivm: (trivial) fix ddiv cpu implementation

we can't use the cpu implementation of fdiv, as this one uses different
lp_build_context, which causes assertion failure.
Just use default fdiv action (there is no fast rcp for doubles which we
could potentially use anyway).

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
7 years agotgsi: implement ddiv opcode
Roland Scheidegger [Mon, 23 Jan 2017 17:10:44 +0000 (18:10 +0100)]
tgsi: implement ddiv opcode

softpipe (along with llvmpipe) claims to support arb_gpu_shader_fp64,
so we really need to support that opcode.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
7 years agoi965/blorp: Use the correct ISL format for combined depth/stencil
Jason Ekstrand [Mon, 23 Jan 2017 18:53:13 +0000 (10:53 -0800)]
i965/blorp: Use the correct ISL format for combined depth/stencil

In brw_blorp_copyteximage, we use the format from the render buffer.
This could be a combined depth/stencil format.  In this case, we handle
stencil properly but we give blorp the wrong ISL format.  Specifically,
we would give blorp ISL_FORMAT_R32G32B32A32_FLOAT which is the wrong
size was causing GPU hangs.

Fixes: GL45-CTS.gtf30.GL3Tests.packed_depth_stencil.packed_depth_stencil_copyteximage
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
7 years agost/glsl_to_tgsi: fix compilation warnings since int64 types
Samuel Pitoiset [Tue, 24 Jan 2017 10:50:08 +0000 (11:50 +0100)]
st/glsl_to_tgsi: fix compilation warnings since int64 types

state_tracker/st_glsl_to_tgsi.cpp:302:28: warning: ‘glsl_to_tgsi_instruction::tex_type’
is too small to hold all values of ‘enum glsl_base_type’
    glsl_base_type tex_type:4;

Fixes: 8ce53d4a2f3 ("glsl: Add basic ARB_gpu_shader_int64 types")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/radeon: undef the very specific UPDATE_COUNTER macro
Samuel Pitoiset [Tue, 24 Jan 2017 10:11:59 +0000 (11:11 +0100)]
gallium/radeon: undef the very specific UPDATE_COUNTER macro

Also, wrap this into a do { ... } while (0). Suggested by Nicolai.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoi965/blorp: Add also depth and stencil buffers to render cache
Topi Pohjolainen [Thu, 19 Jan 2017 08:11:42 +0000 (10:11 +0200)]
i965/blorp: Add also depth and stencil buffers to render cache

v2 (Jason, Curro): Add stencil also even though it is not
                   enabled yet.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
7 years agogbm: Fix width height getters return type (trivial)
Ben Widawsky [Wed, 26 Oct 2016 19:23:32 +0000 (12:23 -0700)]
gbm: Fix width height getters return type (trivial)

v2: Other way round... to make consistent, make both return type have
the fixed width - uint32_t.

Cc: Daniel Stone <daniel@fooishbar.org>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
7 years agogbm: Move getters to match order in header file (trivial)
Ben Widawsky [Wed, 19 Oct 2016 21:52:12 +0000 (14:52 -0700)]
gbm: Move getters to match order in header file (trivial)

Other things are out of order, but I need to add a getter so I'm just
fixing those.

This helps people adding to GBM know where the right place to put things
is.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Acked-by: Daniel Stone <daniels@collabora.com>
7 years agodocs: add news item and link release notes for 12.0.6
Emil Velikov [Tue, 24 Jan 2017 02:13:42 +0000 (02:13 +0000)]
docs: add news item and link release notes for 12.0.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
7 years agodocs: use correct year for the 12.0.6 release notes
Emil Velikov [Tue, 24 Jan 2017 02:05:20 +0000 (02:05 +0000)]
docs: use correct year for the 12.0.6 release notes

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 13953f012dfc7f89dbb07f1eda856aa5353347cc)

7 years agodocs: add sha256 checksums for 12.0.6
Emil Velikov [Tue, 24 Jan 2017 02:02:48 +0000 (02:02 +0000)]
docs: add sha256 checksums for 12.0.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 36e3f2542d3cde1fe4f7ca0be83dc49d941cb988)

7 years agodocs: add release notes for 12.0.6
Emil Velikov [Tue, 24 Jan 2017 01:32:02 +0000 (01:32 +0000)]
docs: add release notes for 12.0.6

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit 555885a0bf64d49bc6c31c0aaeb636c24ef61102)

7 years agodocs/releasing: remove stray "cd"
Emil Velikov [Tue, 24 Jan 2017 00:58:54 +0000 (00:58 +0000)]
docs/releasing: remove stray "cd"

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
7 years agonv50: add support for MUL_ZERO_WINS property
Ilia Mirkin [Sat, 14 Jan 2017 23:55:25 +0000 (18:55 -0500)]
nv50: add support for MUL_ZERO_WINS property

This is simply keyed off the vertex shader, as that's guaranteed to be
present in any pipeline.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agonvc0: add support for MUL_ZERO_WINS property
Ilia Mirkin [Sat, 14 Jan 2017 23:55:25 +0000 (18:55 -0500)]
nvc0: add support for MUL_ZERO_WINS property

This sets the dnz flag on all the relevant multiplication operations. At
emission time, this will only be supported by nvc0+, so nv50 will need a
different solution.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
7 years agost/nine: set the MUL_ZERO_WINS flag when supported
Ilia Mirkin [Sun, 15 Jan 2017 17:03:55 +0000 (12:03 -0500)]
st/nine: set the MUL_ZERO_WINS flag when supported

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
7 years agogallium: add PIPE_CAP_TGSI_MUL_ZERO_WINS
Ilia Mirkin [Tue, 17 Jan 2017 03:14:38 +0000 (22:14 -0500)]
gallium: add PIPE_CAP_TGSI_MUL_ZERO_WINS

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
7 years agogallium: add TGSI_PROPERTY_MUL_ZERO_WINS
Ilia Mirkin [Sat, 14 Jan 2017 23:39:41 +0000 (18:39 -0500)]
gallium: add TGSI_PROPERTY_MUL_ZERO_WINS

This will be useful for proper D3D9 emulation, where this behavior is
expected by some shaders.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Axel Davy <axel.davy@ens.fr>
7 years agoradeonsi: always set the TCL1_ACTION_ENA when invalidating L2
Marek Olšák [Fri, 20 Jan 2017 00:13:39 +0000 (01:13 +0100)]
radeonsi: always set the TCL1_ACTION_ENA when invalidating L2

Some CIK-VI docs say this is the default behavior on SI. That doesn't
answer whether it's also the default behavior on CIK-VI.

Cc: 17.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: don't declare LDS in TES
Marek Olšák [Thu, 19 Jan 2017 23:08:35 +0000 (00:08 +0100)]
radeonsi: don't declare LDS in TES

not used since we started using the offchip tess ring

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: preload PS inputs only if KILL is used
Marek Olšák [Thu, 19 Jan 2017 12:58:50 +0000 (13:58 +0100)]
radeonsi: preload PS inputs only if KILL is used

so that most shaders can get lower VGPR usage thanks to lazy input loading.
I think this is a more accurate constraint that prevents the black transitions
in Witcher 2.

Affected shaders (7758):
Max Waves: 57437 -> 58231 (1.38 %)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/radeon: adjust the rule for using the LINEAR_ALIGNED layout
Marek Olšák [Fri, 20 Jan 2017 16:57:38 +0000 (17:57 +0100)]
gallium/radeon: adjust the rule for using the LINEAR_ALIGNED layout

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agowinsys/amdgpu: drop all IBs if at least one was rejected within the context
Marek Olšák [Thu, 19 Jan 2017 19:44:49 +0000 (20:44 +0100)]
winsys/amdgpu: drop all IBs if at least one was rejected within the context

The corruption is inevitable and hangs are possible too.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agowinsys/amdgpu: report a rejected IB as a lost context
Marek Olšák [Thu, 19 Jan 2017 19:32:28 +0000 (20:32 +0100)]
winsys/amdgpu: report a rejected IB as a lost context

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agovulkan: import latest registry for 1.0.39 extensions.
Dave Airlie [Mon, 23 Jan 2017 22:05:39 +0000 (08:05 +1000)]
vulkan: import latest registry for 1.0.39 extensions.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agovulkan: bump vulkan.h to 1.0.39 version
Dave Airlie [Mon, 23 Jan 2017 21:55:51 +0000 (07:55 +1000)]
vulkan: bump vulkan.h to 1.0.39 version

This introduces a bunch of new extension defines.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: don't resubmit the same cs over and over while tracing
Grazvydas Ignotas [Mon, 23 Jan 2017 21:16:42 +0000 (23:16 +0200)]
radv: don't resubmit the same cs over and over while tracing

Fixes: 97dfff54 ("radv: Dump command buffer on hang.")
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: <mesa-stable@lists.freedesktop.org>
7 years agogallium/radeon: add HUD queries for monitoring some hw blocks
Samuel Pitoiset [Fri, 20 Jan 2017 18:21:12 +0000 (19:21 +0100)]
gallium/radeon: add HUD queries for monitoring some hw blocks

It's also possible to monitor them via performance counters but
the hardware can only use two counters simultaneously. It seems
easier to re-use the existing code which reads from MMIO instead
of writing a multi-pass approach.

v2: - add new lines after ':'

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/radeon: refactor the GRBM counters path
Samuel Pitoiset [Fri, 20 Jan 2017 17:15:50 +0000 (18:15 +0100)]
gallium/radeon: refactor the GRBM counters path

This will allow to expose more queries in order to know which
blocks are busy/idle.

v2: - add new lines after ':'

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agoswr: Align query results allocation
George Kyriazis [Wed, 18 Jan 2017 23:09:08 +0000 (17:09 -0600)]
swr: Align query results allocation

Some query results struct contents are declared as cache line aligned.
Use aligned malloc, and align the whole struct, to be safe.

Fixes crash when compiling with clang.

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
7 years agoswr: Prune empty nodes in CalculateProcessorTopology.
Bruce Cherniak [Thu, 19 Jan 2017 21:44:52 +0000 (15:44 -0600)]
swr: Prune empty nodes in CalculateProcessorTopology.

CalculateProcessorTopology tries to figure out system topology by
parsing /proc/cpuinfo to determine the number of threads, cores, and
NUMA nodes.  There are some architectures where the "physical id" begins
with 1 rather than 0, which was creating and empty "0" node and causing a
crash in CreateThreadPool.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97102
Reviewed-By: George Kyriazis <george.kyriazis@intel.com>
CC: <mesa-stable@lists.freedesktop.org>
7 years agoi965: Use UNUSED to silence unused variable (used in assert).
Matt Turner [Mon, 23 Jan 2017 18:50:20 +0000 (10:50 -0800)]
i965: Use UNUSED to silence unused variable (used in assert).

7 years agodri: allow 16bit R/GR images to be exported via drm buffers
Rainer Hochecker [Thu, 5 Jan 2017 15:58:56 +0000 (16:58 +0100)]
dri: allow 16bit R/GR images to be exported via drm buffers

This allows eglCreateImageKHR to access P010 surfaces created by vaapi

Signed-off-by: Rainer Hochecker <fernetmenta@online.de>
Acked-by: Ben Widawky <ben@bwidawsk.net>
7 years agost/va: make sure that we call begin_frame() only once v2
Christian König [Thu, 19 Jan 2017 12:44:34 +0000 (13:44 +0100)]
st/va: make sure that we call begin_frame() only once v2

This fixes "st/va: delay calling begin_frame until we have all parameters".

v2: call begin frame after decoder (re)creation as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Nayan Deshmukh <nayan26deshmukh@gmail.com>
Tested-by: Andy Furniss <adf.lists@gmail.com>
7 years agodrirc: remove spurious tabs
Eric Engestrom [Thu, 5 Jan 2017 21:06:35 +0000 (21:06 +0000)]
drirc: remove spurious tabs

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agost/glsl_to_tgsi: use DDIV instead of DRCP + DMUL
Nicolai Hähnle [Mon, 16 Jan 2017 15:43:54 +0000 (16:43 +0100)]
st/glsl_to_tgsi: use DDIV instead of DRCP + DMUL

Fixes GL45-CTS.gpu_shader_fp64.built_in_functions.

v2: use DDIV unconditionally (Roland)

Reviewed-by: Roland Scheidegger <sroland@vmware.com> (v1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
7 years agoglsl: split DIV_TO_MUL_RCP into single- and double-precision flags
Nicolai Hähnle [Mon, 16 Jan 2017 15:39:06 +0000 (16:39 +0100)]
glsl: split DIV_TO_MUL_RCP into single- and double-precision flags

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
7 years agor600: implement DDIV
Nicolai Hähnle [Thu, 19 Jan 2017 13:44:57 +0000 (14:44 +0100)]
r600: implement DDIV

Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
7 years agor600: factor out cayman_emit_unary_double_raw
Nicolai Hähnle [Thu, 19 Jan 2017 13:44:24 +0000 (14:44 +0100)]
r600: factor out cayman_emit_unary_double_raw

We will use it for DDIV.

Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
7 years agor600: double multiply can handle only one multiply at a time
Nicolai Hähnle [Thu, 19 Jan 2017 13:38:54 +0000 (14:38 +0100)]
r600: double multiply can handle only one multiply at a time

It seems clear that trying to multiply two pairs of doubles would result
in the temporary register getting overwritten by the second pair. So
make the code more explicit.

Tested-by: Glenn Kennard <glenn.kennard@gmail.com>
Tested-by: James Harvey <lothmordor@gmail.com>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
7 years agoglsl: fix tes linking regression
Timothy Arceri [Mon, 23 Jan 2017 07:06:37 +0000 (18:06 +1100)]
glsl: fix tes linking regression

Fixes regression caused by cbeba6bd48da2c. I accidentally pushed the
wrong version of the patch.

7 years agomesa: remove unused gl_shader_info field from gl_linked_shader
Timothy Arceri [Tue, 22 Nov 2016 13:05:01 +0000 (00:05 +1100)]
mesa: remove unused gl_shader_info field from gl_linked_shader

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agomesa/glsl: set and get cs layouts to and from shader_info
Timothy Arceri [Tue, 22 Nov 2016 12:31:08 +0000 (23:31 +1100)]
mesa/glsl: set and get cs layouts to and from shader_info

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agomesa/glsl: set and get gs layouts directly to and from shader_info
Timothy Arceri [Tue, 22 Nov 2016 10:45:16 +0000 (21:45 +1100)]
mesa/glsl: set and get gs layouts directly to and from shader_info

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>