mesa.git
Timothy Arceri [Wed, 21 Mar 2018 00:27:19 +0000 (11:27 +1100)]
st/glsl_to_nir: fix driver location for dual-slot packed doubles

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Wed, 21 Feb 2018 05:53:54 +0000 (16:53 +1100)]
radeonsi/nir: fix scanning of multi-slot output varyings

This fixes tcs/tes varying arrays where we don't lower indirects and
therefore don't split arrays. Here we also fix usagemask for dual
slot doubles.

Fixes a number of arb_tessellation_shader piglit tests.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Eric Anholt [Tue, 27 Mar 2018 21:26:17 +0000 (14:26 -0700)]
broadcom/vc5: Fix RG16I/UI texture sampling.

How many times did I look at this table without noticing the missing 'G'
in the texture column?

Fixes KHR-GLES3.copy_tex_image_conversions.required.* on 7268.

Rob Clark [Tue, 27 Mar 2018 18:52:55 +0000 (14:52 -0400)]
nir: fix generated nir_intrinsics.c for MSVC

Apparently it is not happy about things like: .foo = {}

So skip over initializers for empty lists.
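
A minimal sketch of why skipping the initializer is safe, assuming a table of
designated initializers. The struct, field, and function names here are
hypothetical and are not taken from the generated nir_intrinsics.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry of a generated info table. */
struct intrinsic_info {
   const char *name;
   unsigned num_srcs;
   unsigned src_components[4];
};

/* An empty list emitted as ".src_components = {}" is a GNU extension
 * (standard C only since C23). Omitting the member entirely is portable:
 * members without an explicit initializer are zero-initialized. */
static const struct intrinsic_info infos[] = {
   { .name = "with_srcs", .num_srcs = 2, .src_components = { 1, 4 } },
   { .name = "no_srcs", .num_srcs = 0 },
};

/* Look up the component count of one source of one table entry. */
unsigned src_components(size_t intrinsic, unsigned src)
{
   assert(intrinsic < sizeof(infos) / sizeof(infos[0]) && src < 4);
   return infos[intrinsic].src_components[src];
}
```

The second entry behaves exactly as if an all-zero list had been written out.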

Fixes: 76dfed8ae2d5c6c509eb2661389be3c6a25077df
Reported-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Emil Velikov [Tue, 27 Mar 2018 18:11:45 +0000 (19:11 +0100)]
docs: update calendar 18.0.0 is out

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Tue, 27 Mar 2018 18:08:48 +0000 (19:08 +0100)]
docs: add news item and link release notes for 18.0.0

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Tue, 27 Mar 2018 18:02:59 +0000 (19:02 +0100)]
docs: add sha256 checksums for 18.0.0

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit fb64913d195112462786c0459d12f4bc8e7adee7)

Emil Velikov [Tue, 27 Mar 2018 16:19:58 +0000 (17:19 +0100)]
docs: Update 18.0.0 release notes

Note: the file was originally 17.4.0, yet git struggles to detect the
move :-\

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit dceb1ce807a8b0ab32dc16b38040969bdbcc0d1b)

Rob Clark [Thu, 15 Mar 2018 22:42:44 +0000 (18:42 -0400)]
nir: mako all the intrinsics

I threatened to do this a long time ago.. I probably *should* have done
it a long time ago when there were far fewer intrinsics.  But the
system of macro/#include magic for dealing with intrinsics is a bit
annoying, and python has the nice property of optional fxn params,
making it possible to define new intrinsics while ignoring parameters
that are not applicable (and naming optional params).  And not having to
specify various array lengths explicitly is nice too.

I think the end result makes it easier to add new intrinsics.

v2: couple small fixes found with a test program to compare the old and
    new tables
v3: misc comments, don't rely on capture=true for meson.build, get rid
    of system_values table to avoid return value of intrinsic() and
    *mostly* remove side-effects, add autotools build support
v4: scons build

Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Rob Clark [Fri, 16 Mar 2018 17:10:18 +0000 (13:10 -0400)]
nir: fix per_vertex_output intrinsic

This is supposed to have both BASE and COMPONENT but num_indices was
inadvertently set to 1.
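
One way to avoid this class of mismatch is to derive the count from the index
list instead of writing it by hand. The sketch below illustrates that idea
only; the enum, struct, and function names are hypothetical and do not mirror
the NIR tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical index kinds; INDEX_NONE terminates a short list. */
enum index_kind { INDEX_NONE = 0, INDEX_BASE, INDEX_COMPONENT };

#define MAX_INDICES 4

struct intrinsic_desc {
   const char *name;
   enum index_kind indices[MAX_INDICES];
};

/* Count the indices actually listed, so the number can never disagree
 * with the list the way a hand-written num_indices can. */
unsigned num_indices(const struct intrinsic_desc *desc)
{
   unsigned n = 0;

   assert(desc != NULL);
   while (n < MAX_INDICES && desc->indices[n] != INDEX_NONE)
      n++;
   return n;
}

/* Illustrative entry carrying both BASE and COMPONENT. */
const struct intrinsic_desc per_vertex_output_desc = {
   "per_vertex_output", { INDEX_BASE, INDEX_COMPONENT },
};
```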

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rob Clark [Mon, 26 Mar 2018 22:45:07 +0000 (18:45 -0400)]
glsl_types: fix build break with intel/msvc compiler

The VECN() macro was taking advantage of a GCC specific feature that is
not available on lesser compilers, mostly for the purposes of avoiding a
macro that encoded a return statement.

But as suggested by Ian, we could just have the macro produce the entire
method body and avoid the need for this.  So let's do that instead.
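
A rough illustration of the approach, written in C rather than the C++ of
glsl_types. The macro below expands to a complete function definition, so no
GCC statement expression is needed to yield a value. DEFINE_VECN, vec_name,
and ivec_name are hypothetical names and not the real VECN() macro.

```c
#include <assert.h>

/* Expands to a whole function, including its return statements. */
#define DEFINE_VECN(func, scalar, prefix)                          \
   const char *func(unsigned components)                           \
   {                                                               \
      static const char *const names[] = {                         \
         scalar, prefix "2", prefix "3", prefix "4",               \
      };                                                           \
      if (components < 1 || components > 4)                        \
         return "error";                                           \
      return names[components - 1];                                \
   }

DEFINE_VECN(vec_name, "float", "vec")
DEFINE_VECN(ivec_name, "int", "ivec")

/* Usage example: call from a test or from main(). */
void check_examples(void)
{
   assert(vec_name(3)[3] == '3');   /* "vec3" */
   assert(ivec_name(1)[0] == 'i');  /* "int" */
   assert(vec_name(9)[0] == 'e');   /* "error" */
}
```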

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105740
Fixes: f407edf3407396379e16b0be74b8d3b85d2ad7f0
Cc: Emil Velikov <emil.velikov@collabora.com>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Cc: Roland Scheidegger <sroland@vmware.com>
Cc: Ian Romanick <idr@freedesktop.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Lin Johnson [Mon, 26 Mar 2018 14:13:32 +0000 (22:13 +0800)]
mesa: add GL_HALF_FLOAT as supported type to readpixels

EXT_color_buffer_float spec states:

  "An INVALID_OPERATION error is generated ... if the color buffer is
   a floating-point format and type is not FLOAT, HALF_FLOAT, or
   UNSIGNED_INT_10F_11F_11F_REV."

This means that GL_HALF_FLOAT type should be supported when color
buffer has floating-point format.

Fixes Android CTS test android.view.cts.PixelCopyTest.

v2: remove comments of EXT_color_buffer_half_float as
    EXT_color_buffer_float can use type GL_HALF_FLOAT

Signed-off-by: Lin Johnson <johnson.lin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Eric Anholt [Mon, 26 Mar 2018 19:39:12 +0000 (12:39 -0700)]
broadcom/vc5: Fix swizzling of RGB10_A2UI render targets.

This is the actual hardware layout, and we were only swizzling R/B back
around in texturing.  Fixes part of
KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx in
simulation.

Eric Anholt [Mon, 26 Mar 2018 19:18:39 +0000 (12:18 -0700)]
broadcom/vc5: Fix extraneous register index in QIR dumping of TLBU writes.

Just like TLB without a config uniform, we don't have a register index.

Eric Anholt [Mon, 26 Mar 2018 17:38:28 +0000 (10:38 -0700)]
broadcom/vc5: Implement workaround for GFXH-1431.

This should fix some blending errors, but doesn't impact any testcases in
the CTS.

Eric Anholt [Fri, 23 Mar 2018 23:18:02 +0000 (16:18 -0700)]
broadcom/vc5: Fix EZ disabling and allow using GT/GE direction as well.

Once we've disabled EZ for some draws, we need to not use EZ on future
draws.  Implementing that made implementing the GT/GE direction trivial.
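
The sticky behaviour can be pictured as a small state machine. This is a
sketch under assumptions: the enum and function names are hypothetical, and
the driver's real state tracking may differ.

```c
#include <assert.h>

/* Hypothetical early-Z states for a job or for a single draw. */
enum ez_state { EZ_UNDECIDED, EZ_LT_LE, EZ_GT_GE, EZ_DISABLED };

/* Combine the job's current state with what the next draw needs. */
enum ez_state ez_update(enum ez_state job, enum ez_state draw)
{
   /* Once disabled, EZ stays disabled for all later draws. */
   if (job == EZ_DISABLED || draw == EZ_DISABLED)
      return EZ_DISABLED;
   /* A draw with no preference keeps whatever the job has. */
   if (draw == EZ_UNDECIDED)
      return job;
   /* The first draw with a direction picks it for the job. */
   if (job == EZ_UNDECIDED)
      return draw;
   /* Mixing LT/LE with GT/GE in one job turns EZ off. */
   return job == draw ? job : EZ_DISABLED;
}

/* Usage example: call from a test or from main(). */
void check_examples(void)
{
   assert(ez_update(EZ_UNDECIDED, EZ_GT_GE) == EZ_GT_GE);
   assert(ez_update(EZ_LT_LE, EZ_GT_GE) == EZ_DISABLED);
   assert(ez_update(EZ_DISABLED, EZ_LT_LE) == EZ_DISABLED);
}
```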

Fixes KHR-GLES3.shaders.fragdepth.compare.no_write on V3D 4.1 simulation.

Eric Anholt [Fri, 23 Mar 2018 22:43:50 +0000 (15:43 -0700)]
broadcom/vc5: Disable TF on V3D 4.x when drawing with queries disabled.

On 3.x, we just don't flag the primitive as needing TF, but those
primitive bits are now allocated to the new primitive types.  Now we need
to actually update the enable flag at draw time.

Eric Anholt [Fri, 23 Mar 2018 22:28:40 +0000 (15:28 -0700)]
broadcom/vc5: Disable transform feedback on V3D 4.x at the end of the job.

The next job from this client will turn it back on unless TF gets
disabled, but we don't want the state to leak from this client to another
(which causes GPU hangs).

Eric Anholt [Fri, 23 Mar 2018 22:19:05 +0000 (15:19 -0700)]
broadcom/vc5: Move the BCL epilogue code to a per-version compile.

I need to do some new packets for transform feedback on 4.1.

Eric Anholt [Wed, 21 Mar 2018 22:07:19 +0000 (15:07 -0700)]
broadcom/vc5: Fix transform feedback in the presence of point size.

I had this note to myself, and it turns out that a lot of CTS tests use
XFB with points to get data out without using a fragment shader.  Keep
track of two sets of precomputed TF specs (point size in VPM prologue or
not), and switch between them when we enable/disable point size.

Eric Anholt [Fri, 23 Mar 2018 22:40:36 +0000 (15:40 -0700)]
broadcom/vc5: Split transform feedback specs update from buffers.

The specs update will be changing based on additional state flags in the
next commit, and this unindents the buffer update code.

Eric Anholt [Wed, 21 Mar 2018 22:18:34 +0000 (15:18 -0700)]
broadcom/vc5: Limit each transform feedback data spec to 16 dwords.

The length-1 field only has 4 bits, so we need to generate separate specs
when there's too much TF output per buffer.
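
As a sketch of the splitting: a 4-bit length-1 field encodes lengths 1..16, so
a longer run of output dwords has to be chopped into several specs. The struct
layout and function name are hypothetical and are not the packet format.

```c
#include <assert.h>

#define MAX_DWORDS_PER_SPEC 16  /* 4-bit length-1 field: 0..15 */

struct tf_spec {
   unsigned first_dword;
   unsigned length_minus_1;
};

/* Split total_dwords of output for one buffer into specs of at most
 * 16 dwords each. Returns the number of specs written. */
unsigned split_tf_output(unsigned total_dwords, struct tf_spec *specs,
                         unsigned max_specs)
{
   unsigned count = 0;
   unsigned offset = 0;

   while (offset < total_dwords) {
      unsigned len = total_dwords - offset;
      if (len > MAX_DWORDS_PER_SPEC)
         len = MAX_DWORDS_PER_SPEC;

      assert(count < max_specs);
      specs[count].first_dword = offset;
      specs[count].length_minus_1 = len - 1;
      count++;
      offset += len;
   }
   return count;
}
```

For example, 40 dwords become three specs of 16, 16, and 8 dwords.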

Fixes
GTF-GLES3.gtf.GL3Tests.transform_feedback.transform_feedback_builtin_type
and transform_feedback_max_interleaved.

Eric Anholt [Tue, 20 Mar 2018 17:42:12 +0000 (10:42 -0700)]
gallium/u_vbuf: Protect against overflow with large instance divisors.

GTF-GLES3.gtf.GL3Tests.instanced_arrays.instanced_arrays_divisor uses -1
as a divisor, so we would overflow to count=0 and upload no data,
triggering the assert below.  We want to upload 1 element in this case,
fixing the test on VC5.

v2: Use some more obvious logic, and explain why we don't use the normal
    round_up().
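
To illustrate the overflow, here is a sketch comparing the usual round-up
idiom with a division-first variant. The function names are made up for this
example and the code is not the u_vbuf implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Typical round-up division: the addition wraps for huge divisors. */
uint32_t div_round_up_naive(uint32_t n, uint32_t divisor)
{
   assert(divisor != 0);
   return (n + divisor - 1) / divisor;
}

/* Divide first, then bump the result if there was a remainder.
 * No intermediate value can exceed n, so nothing wraps. */
uint32_t div_round_up_safe(uint32_t n, uint32_t divisor)
{
   assert(divisor != 0);
   return n / divisor + (n % divisor != 0);
}
```

With 4 instances and a divisor of UINT32_MAX (the -1 used by the test), the
naive form yields 0 elements while the safe form yields 1.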

Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Anholt [Wed, 21 Mar 2018 18:43:28 +0000 (11:43 -0700)]
st: Allow accelerated CopyTexImage from RGBA to RGB.

There's nothing to worry about here -- the A channel just gets dropped by
the blit.  This avoids a segfault in the fallback path when copying from an
RGBA16_SINT renderbuffer to an RGB16_SINT destination represented by an
RGBA16_SINT texture (the fallback path tries to get/fetch to float
buffers, but the float pack/unpack functions are NULL for SINT/UINT).

Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba16i on VC5.

v2: Extract the logic to a helper function and explain what's going on
    better.
v3: const-qualify args

Reviewed-by: Brian Paul <brianp@vmware.com>
Marek Olšák [Wed, 21 Mar 2018 20:10:29 +0000 (16:10 -0400)]
winsys/amdgpu: always allow GTT placements on APUs

Reviewed-by: Christian König <christian.koenig@amd.com>
Marek Olšák [Thu, 15 Mar 2018 19:58:57 +0000 (15:58 -0400)]
radeonsi: don't reallocate on DMABUF export if local BOs are disabled

Timothy Arceri [Sun, 25 Mar 2018 23:31:26 +0000 (10:31 +1100)]
glsl: fix infinite loop caused by bug in loop unrolling pass

Just checking for 2 jumps is not enough to be sure we can do a
complex loop unroll. We need to make sure we have also found
2 loop terminators.

Without this we were attempting to unroll a loop where the second
jump was nested inside multiple ifs which loop analysis is unable
to detect as a terminator. We ended up splicing out the first
terminator but failed to actually unroll the loop, which resulted
in the creation of a possible infinite loop.

Fixes: 646621c66da9 "glsl: make loop unrolling more like the nir unrolling path"
Tested-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105670

Vinson Lee [Wed, 21 Mar 2018 21:59:32 +0000 (14:59 -0700)]
gallium: Do not add -Wframe-address option for gcc <= 4.4.

This patch fixes these build errors with GCC 4.4.

  Compiling src/gallium/auxiliary/util/u_debug_stack.c ...
src/gallium/auxiliary/util/u_debug_stack.c: In function ‘debug_backtrace_capture’:
src/gallium/auxiliary/util/u_debug_stack.c:268: error: #pragma GCC diagnostic not allowed inside functions
src/gallium/auxiliary/util/u_debug_stack.c:269: error: #pragma GCC diagnostic not allowed inside functions
src/gallium/auxiliary/util/u_debug_stack.c:271: error: #pragma GCC diagnostic not allowed inside functions

Fixes: 370e356ebab4 ("gallium: silence __builtin_frame_address nonzero argument is unsafe warning")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105529
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Alyssa Rosenzweig [Mon, 26 Mar 2018 15:56:53 +0000 (15:56 +0000)]
gallium: Correct minor typo in header comments

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Rafael Antognolli [Wed, 21 Mar 2018 18:42:23 +0000 (11:42 -0700)]
intel/aubinator_error_decode: Decode more registers.

Decode SC_INSTDONE, ROW_INSTDONE and SAMPLER_INSTDONE.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rafael Antognolli [Wed, 21 Mar 2018 18:42:22 +0000 (11:42 -0700)]
intel/genxml: Add SAMPLER_INSTDONE register.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rafael Antognolli [Wed, 21 Mar 2018 18:42:21 +0000 (11:42 -0700)]
intel/genxml: Add ROW_INSTDONE register.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rafael Antognolli [Wed, 21 Mar 2018 18:42:20 +0000 (11:42 -0700)]
intel/genxml: Add SC_INSTDONE register.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ian Romanick [Fri, 23 Mar 2018 18:46:12 +0000 (11:46 -0700)]
i965/vec4: Fix null destination register in 3-source instructions

A recent commit (see below) triggered some cases where conditional
modifier propagation and dead code elimination would cause a MAD
instruction like the following to be generated:

    mad.l.f0  null, ...

Matt pointed out that fs_visitor::fixup_3src_null_dest() fixes cases
like this in the scalar backend.  This commit basically ports that code
to the vec4 backend.

NOTE: I have sent a couple tests to the piglit list that reproduce this
bug *without* the commit mentioned below.  This commit fixes those
tests.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: ee63933a7 ("nir: Distribute binary operations with constants into bcsel")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105704

Ian Romanick [Wed, 14 Mar 2018 23:25:07 +0000 (16:25 -0700)]
nir: Don't condition 'a-b < 0' -> 'a < b' on is_not_used_by_conditional

Now that i965 recognizes that a-b generates the same conditions as 'a <
b', there is no reason to condition this transformation on 'is not used
by conditional.'

Since this was the only user of the is_not_used_by_conditional function,
delete it.

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 14400775 -> 14400595 (<.01%)
instructions in affected programs: 36712 -> 36532 (-0.49%)
helped: 182
HURT: 26
helped stats (abs) min: 1 max: 2 x̄: 1.13 x̃: 1
helped stats (rel) min: 0.15% max: 1.82% x̄: 0.70% x̃: 0.62%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.24% max: 1.02% x̄: 0.82% x̃: 0.90%
95% mean confidence interval for instructions value: -0.97 -0.76
95% mean confidence interval for instructions %-change: -0.59% -0.43%
Instructions are helped.

total cycles in shared programs: 532929592 -> 532926345 (<.01%)
cycles in affected programs: 478660 -> 475413 (-0.68%)
helped: 187
HURT: 22
helped stats (abs) min: 2 max: 200 x̄: 20.99 x̃: 18
helped stats (rel) min: 0.23% max: 24.10% x̄: 1.48% x̃: 1.03%
HURT stats (abs)   min: 1 max: 214 x̄: 30.86 x̃: 11
HURT stats (rel)   min: 0.01% max: 23.06% x̄: 3.12% x̃: 0.86%
95% mean confidence interval for cycles value: -19.50 -11.57
95% mean confidence interval for cycles %-change: -1.42% -0.58%
Cycles are helped.

GM45 and Iron Lake had similar results. (Iron Lake shown)
total cycles in shared programs: 177851578 -> 177851810 (<.01%)
cycles in affected programs: 24408 -> 24640 (0.95%)
helped: 2
HURT: 4
helped stats (abs) min: 4 max: 4 x̄: 4.00 x̃: 4
helped stats (rel) min: 0.42% max: 0.47% x̄: 0.44% x̃: 0.44%
HURT stats (abs)   min: 24 max: 108 x̄: 60.00 x̃: 54
HURT stats (rel)   min: 0.52% max: 1.62% x̄: 1.04% x̃: 1.02%
95% mean confidence interval for cycles value: -7.75 85.08
95% mean confidence interval for cycles %-change: -0.39% 1.49%
Inconclusive result (value mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 21 Mar 2018 22:22:51 +0000 (15:22 -0700)]
i965/vec4: Propagate conditional modifiers from compares to adds

No changes on Broadwell or later as those platforms do not use the vec4
backend.

Ivy Bridge and Haswell had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11682119 -> 11681056 (<.01%)
instructions in affected programs: 150403 -> 149340 (-0.71%)
helped: 950
HURT: 0
helped stats (abs) min: 1 max: 16 x̄: 1.12 x̃: 1
helped stats (rel) min: 0.23% max: 2.78% x̄: 0.82% x̃: 0.71%
95% mean confidence interval for instructions value: -1.19 -1.04
95% mean confidence interval for instructions %-change: -0.84% -0.79%
Instructions are helped.

total cycles in shared programs: 257495842 -> 257495238 (<.01%)
cycles in affected programs: 270302 -> 269698 (-0.22%)
helped: 271
HURT: 13
helped stats (abs) min: 2 max: 14 x̄: 2.42 x̃: 2
helped stats (rel) min: 0.06% max: 1.13% x̄: 0.32% x̃: 0.28%
HURT stats (abs)   min: 2 max: 12 x̄: 4.00 x̃: 4
HURT stats (rel)   min: 0.15% max: 1.18% x̄: 0.30% x̃: 0.26%
95% mean confidence interval for cycles value: -2.41 -1.84
95% mean confidence interval for cycles %-change: -0.31% -0.26%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10430493 -> 10429727 (<.01%)
instructions in affected programs: 120860 -> 120094 (-0.63%)
helped: 766
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.30% max: 2.70% x̄: 0.78% x̃: 0.73%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.80% -0.75%
Instructions are helped.

total cycles in shared programs: 146138718 -> 146138446 (<.01%)
cycles in affected programs: 244114 -> 243842 (-0.11%)
helped: 132
HURT: 0
helped stats (abs) min: 2 max: 4 x̄: 2.06 x̃: 2
helped stats (rel) min: 0.03% max: 0.43% x̄: 0.16% x̃: 0.19%
95% mean confidence interval for cycles value: -2.12 -2.00
95% mean confidence interval for cycles %-change: -0.18% -0.15%
Cycles are helped.

GM45 and Iron Lake had identical results. (Iron Lake shown)
total instructions in shared programs: 7780251 -> 7780248 (<.01%)
instructions in affected programs: 175 -> 172 (-1.71%)
helped: 3
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 1.49% max: 2.44% x̄: 1.81% x̃: 1.49%

total cycles in shared programs: 177851584 -> 177851578 (<.01%)
cycles in affected programs: 9796 -> 9790 (-0.06%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.05% max: 0.08% x̄: 0.06% x̃: 0.05%

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 21 Mar 2018 22:22:15 +0000 (15:22 -0700)]
i965/vec4: Allow cmod propagation when src0 is a uniform or shader input

No shader-db changes.  This source must have been written by a previous
instruction, so it cannot be a uniform or a shader input.  However, this
change allows the next commit to help more shaders.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Fri, 9 Mar 2018 21:45:01 +0000 (13:45 -0800)]
i965/fs: Propagate conditional modifiers from compares to adds

The math inside the add and the cmp in this instruction sequence is the
same.  We can utilize this to eliminate the compare.

add(8)          g5<1>F          g2<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
cmp.z.f0(8)     null<1>F        g2<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g8<1>F          (abs)g5<8,8,1>F 3e-37F          { align1 1Q };

This is reduced to:

add.z.f0(8)     g5<1>F          g2<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
(-f0) sel(8)    g8<1>F          (abs)g5<8,8,1>F 3e-37F          { align1 1Q };

This optimization pass could do even better.  The nature of converting
vectorized code from the GLSL front end to scalar code in NIR results in
sequences like:

add(8)          g7<1>F          g4<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
add(8)          g6<1>F          g3<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
add(8)          g5<1>F          g2<8,8,1>F      g64.5<0,1,0>F   { align1 1Q compacted };
cmp.z.f0(8)     null<1>F        g2<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g8<1>F          (abs)g5<8,8,1>F 3e-37F          { align1 1Q };
cmp.z.f0(8)     null<1>F        g3<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g10<1>F         (abs)g6<8,8,1>F 3e-37F          { align1 1Q };
cmp.z.f0(8)     null<1>F        g4<8,8,1>F      -g64.5<0,1,0>F  { align1 1Q switch };
(-f0) sel(8)    g12<1>F         (abs)g7<8,8,1>F 3e-37F          { align1 1Q };

In this sequence, only the first cmp.z is removed.  With different
scheduling, all 3 could get removed.
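
The identity behind the transformation can be checked on the CPU. This sketch
only covers finite inputs (in plain C, inf + -inf is NaN, which the sketch
does not try to model), and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Zero flag as the add itself would produce it: (a + b) == 0. */
bool add_sets_zero_flag(float a, float b)
{
   return (a + b) == 0.0f;
}

/* Zero flag as the separate "cmp.z a, -b" produces it: a == -b. */
bool cmp_sets_zero_flag(float a, float b)
{
   return a == -b;
}

/* Usage example: for finite values both flags agree, so the cmp is
 * redundant once the add writes the flag. */
void check_examples(void)
{
   assert(add_sets_zero_flag(1.5f, -1.5f) && cmp_sets_zero_flag(1.5f, -1.5f));
   assert(!add_sets_zero_flag(1.5f, 2.0f) && !cmp_sets_zero_flag(1.5f, 2.0f));
}
```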

Skylake
total instructions in shared programs: 14407009 -> 14400173 (-0.05%)
instructions in affected programs: 1307274 -> 1300438 (-0.52%)
helped: 4880
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.03% max: 8.70% x̄: 0.70% x̃: 0.52%
95% mean confidence interval for instructions value: -1.45 -1.35
95% mean confidence interval for instructions %-change: -0.72% -0.69%
Instructions are helped.

total cycles in shared programs: 532943169 -> 532923528 (<.01%)
cycles in affected programs: 14065798 -> 14046157 (-0.14%)
helped: 2703
HURT: 339
helped stats (abs) min: 1 max: 1062 x̄: 12.27 x̃: 2
helped stats (rel) min: <.01% max: 28.72% x̄: 0.38% x̃: 0.21%
HURT stats (abs)   min: 1 max: 739 x̄: 39.86 x̃: 12
HURT stats (rel)   min: 0.02% max: 27.69% x̄: 1.38% x̃: 0.41%
95% mean confidence interval for cycles value: -8.66 -4.26
95% mean confidence interval for cycles %-change: -0.24% -0.14%
Cycles are helped.

LOST:   0
GAINED: 1

Broadwell
total instructions in shared programs: 14719636 -> 14712949 (-0.05%)
instructions in affected programs: 1288188 -> 1281501 (-0.52%)
helped: 4845
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.38 x̃: 1
helped stats (rel) min: 0.03% max: 8.00% x̄: 0.70% x̃: 0.52%
95% mean confidence interval for instructions value: -1.43 -1.33
95% mean confidence interval for instructions %-change: -0.72% -0.68%
Instructions are helped.

total cycles in shared programs: 559599253 -> 559581699 (<.01%)
cycles in affected programs: 13315565 -> 13298011 (-0.13%)
helped: 2600
HURT: 269
helped stats (abs) min: 1 max: 2128 x̄: 12.24 x̃: 2
helped stats (rel) min: <.01% max: 23.95% x̄: 0.41% x̃: 0.20%
HURT stats (abs)   min: 1 max: 790 x̄: 53.07 x̃: 20
HURT stats (rel)   min: 0.02% max: 15.96% x̄: 1.55% x̃: 0.75%
95% mean confidence interval for cycles value: -8.47 -3.77
95% mean confidence interval for cycles %-change: -0.27% -0.18%
Cycles are helped.

LOST:   0
GAINED: 8

Haswell
total instructions in shared programs: 12978609 -> 12973483 (-0.04%)
instructions in affected programs: 932921 -> 927795 (-0.55%)
helped: 3480
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.47 x̃: 1
helped stats (rel) min: 0.03% max: 7.84% x̄: 0.78% x̃: 0.58%
95% mean confidence interval for instructions value: -1.53 -1.42
95% mean confidence interval for instructions %-change: -0.80% -0.75%
Instructions are helped.

total cycles in shared programs: 410270788 -> 410250531 (<.01%)
cycles in affected programs: 10986161 -> 10965904 (-0.18%)
helped: 2087
HURT: 254
helped stats (abs) min: 1 max: 2672 x̄: 14.63 x̃: 4
helped stats (rel) min: <.01% max: 39.61% x̄: 0.42% x̃: 0.21%
HURT stats (abs)   min: 1 max: 519 x̄: 40.49 x̃: 16
HURT stats (rel)   min: 0.01% max: 12.83% x̄: 1.20% x̃: 0.47%
95% mean confidence interval for cycles value: -12.82 -4.49
95% mean confidence interval for cycles %-change: -0.31% -0.18%
Cycles are helped.

LOST:   0
GAINED: 5

Ivy Bridge
total instructions in shared programs: 11686082 -> 11681548 (-0.04%)
instructions in affected programs: 937696 -> 933162 (-0.48%)
helped: 3150
HURT: 0
helped stats (abs) min: 1 max: 33 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.03% max: 7.84% x̄: 0.69% x̃: 0.49%
95% mean confidence interval for instructions value: -1.49 -1.38
95% mean confidence interval for instructions %-change: -0.71% -0.67%
Instructions are helped.

total cycles in shared programs: 257514962 -> 257492471 (<.01%)
cycles in affected programs: 11524149 -> 11501658 (-0.20%)
helped: 1970
HURT: 239
helped stats (abs) min: 1 max: 3525 x̄: 17.48 x̃: 3
helped stats (rel) min: <.01% max: 49.60% x̄: 0.46% x̃: 0.17%
HURT stats (abs)   min: 1 max: 1358 x̄: 50.00 x̃: 15
HURT stats (rel)   min: 0.02% max: 59.88% x̄: 1.84% x̃: 0.65%
95% mean confidence interval for cycles value: -17.01 -3.35
95% mean confidence interval for cycles %-change: -0.33% -0.08%
Cycles are helped.

LOST:   9
GAINED: 1

Sandy Bridge
total instructions in shared programs: 10432841 -> 10429893 (-0.03%)
instructions in affected programs: 685071 -> 682123 (-0.43%)
helped: 2453
HURT: 0
helped stats (abs) min: 1 max: 9 x̄: 1.20 x̃: 1
helped stats (rel) min: 0.02% max: 7.55% x̄: 0.64% x̃: 0.46%
95% mean confidence interval for instructions value: -1.23 -1.17
95% mean confidence interval for instructions %-change: -0.67% -0.62%
Instructions are helped.

total cycles in shared programs: 146133660 -> 146134195 (<.01%)
cycles in affected programs: 3991634 -> 3992169 (0.01%)
helped: 1237
HURT: 153
helped stats (abs) min: 1 max: 2853 x̄: 6.93 x̃: 2
helped stats (rel) min: <.01% max: 29.00% x̄: 0.24% x̃: 0.14%
HURT stats (abs)   min: 1 max: 1740 x̄: 59.56 x̃: 12
HURT stats (rel)   min: 0.03% max: 78.98% x̄: 1.96% x̃: 0.42%
95% mean confidence interval for cycles value: -5.13 5.90
95% mean confidence interval for cycles %-change: -0.17% 0.16%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 1

GM45 and Iron Lake had similar results (GM45 shown):
total instructions in shared programs: 4800332 -> 4798380 (-0.04%)
instructions in affected programs: 565995 -> 564043 (-0.34%)
helped: 1451
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.35 x̃: 1
helped stats (rel) min: 0.05% max: 5.26% x̄: 0.47% x̃: 0.31%
95% mean confidence interval for instructions value: -1.40 -1.29
95% mean confidence interval for instructions %-change: -0.50% -0.45%
Instructions are helped.

total cycles in shared programs: 122032318 -> 122027798 (<.01%)
cycles in affected programs: 8334868 -> 8330348 (-0.05%)
helped: 1029
HURT: 1
helped stats (abs) min: 2 max: 40 x̄: 4.43 x̃: 2
helped stats (rel) min: <.01% max: 1.83% x̄: 0.09% x̃: 0.04%
HURT stats (abs)   min: 38 max: 38 x̄: 38.00 x̃: 38
HURT stats (rel)   min: 0.25% max: 0.25% x̄: 0.25% x̃: 0.25%
95% mean confidence interval for cycles value: -4.70 -4.08
95% mean confidence interval for cycles %-change: -0.09% -0.08%
Cycles are helped.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Wed, 14 Mar 2018 17:19:19 +0000 (10:19 -0700)]
i965/fs: Allow cmod propagation when src0 is a uniform or shader input

No shader-db changes.  This source must have been written by a previous
instruction, so it cannot be a uniform or a shader input.  However, this
change allows the next commit to help about 900 more shaders.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Ian Romanick [Tue, 7 Apr 2015 23:11:37 +0000 (16:11 -0700)]
i965: Add negative_equals methods

This method is similar to the existing ::equals methods.  Instead of
testing that two src_regs are equal to each other, it tests that one is
the negation of the other.
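
A simplified sketch of the idea, using a hypothetical register struct with
only a file, a number, and source modifiers. The real src_reg and fs_reg
classes carry more state (types, immediates, swizzles) that this ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced register description. */
struct reg {
   unsigned file;
   unsigned nr;
   bool negate;
   bool abs;
};

/* True when both operands name the same value. */
bool reg_equals(const struct reg *a, const struct reg *b)
{
   return a->file == b->file && a->nr == b->nr &&
          a->negate == b->negate && a->abs == b->abs;
}

/* True when one operand is the negation of the other: everything
 * matches except the negate modifier, which must differ. */
bool reg_negative_equals(const struct reg *a, const struct reg *b)
{
   return a->file == b->file && a->nr == b->nr &&
          a->abs == b->abs && a->negate != b->negate;
}

/* Usage example: call from a test or from main(). */
void check_examples(void)
{
   const struct reg x = { 1, 64, false, false };
   const struct reg neg_x = { 1, 64, true, false };

   assert(reg_negative_equals(&x, &neg_x));
   assert(!reg_equals(&x, &neg_x));
   assert(!reg_negative_equals(&x, &x));
}
```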

v2: Simplify various checks based on suggestions from Matt.  Use
src_reg::type instead of fixed_hw_reg.type in a check.  Also suggested
by Matt.

v3: Rebase on 3 years.  Fix some problems with negative_equals with VF
constants.  Add fs_reg::negative_equals.

v4: Replace the existing default case with BRW_REGISTER_TYPE_UB,
BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF.  Suggested by Matt.
Expand the FINISHME comment to better explain why it isn't already
finished.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
Gert Wollny [Mon, 26 Mar 2018 08:17:00 +0000 (02:17 -0600)]
mesa/st/tests: Use tgsi opcode enum also in the test classes

Fixes: ec478cf9c31K ("st/mesa,tgsi: use enum tgsi_opcode")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105737
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Eric Engestrom [Fri, 23 Mar 2018 17:18:56 +0000 (17:18 +0000)]
meson: fix header check message

before: Checking if "endian.h works" compiles: YES
after:  Checking if "endian.h" compiles: YES

Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.co.uk>
Rob Clark [Mon, 12 Mar 2018 19:00:31 +0000 (15:00 -0400)]
glsl_types: vec8/vec16 support

Not used in GL but 8 and 16 component vectors exist in OpenCL.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Rob Clark [Mon, 12 Mar 2018 18:54:56 +0000 (14:54 -0400)]
glsl_types: refactor/prep for vec8/vec16

Refactor things so there isn't so much typing involved to add new
things.

Also drops a pointless conditional (out-of-bounds rows or columns
already return error_type in all paths, so we might as well drop it
rather than make the check more convoluted in the next patch by
adding the vec8/vec16 case).

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agoanv: Set genX_table for gen11
Jordan Justen [Fri, 23 Mar 2018 21:55:52 +0000 (14:55 -0700)]
anv: Set genX_table for gen11

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agoanv: Add gen11 to anv_genX_call
Jordan Justen [Thu, 22 Mar 2018 19:04:12 +0000 (12:04 -0700)]
anv: Add gen11 to anv_genX_call

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
6 years agovbo: Make sure the internal VAO's stay within limits.
Mathias Fröhlich [Thu, 22 Mar 2018 04:34:09 +0000 (05:34 +0100)]
vbo: Make sure the internal VAO's stay within limits.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agomesa: Flag early if we modify a SharedAndImmutable VAO.
Mathias Fröhlich [Thu, 22 Mar 2018 04:34:09 +0000 (05:34 +0100)]
mesa: Flag early if we modify a SharedAndImmutable VAO.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agomesa: When copying a VAO also copy the vertex attribute mode.
Mathias Fröhlich [Thu, 22 Mar 2018 04:34:09 +0000 (05:34 +0100)]
mesa: When copying a VAO also copy the vertex attribute mode.

Reviewed-by: Brian Paul <brianp@vmware.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
6 years agoconfigure: use AC_CHECK_HEADERS to check for endian.h
Emil Velikov [Fri, 23 Mar 2018 17:37:39 +0000 (17:37 +0000)]
configure: use AC_CHECK_HEADERS to check for endian.h

Currently we use the singular CHECK_HEADER combined with an explicit
append to the DEFINES variable. That is a legacy misnomer, since it
requires us to add $DEFINES to every piece that we build.

Using the plural version of the helper sets the HAVE_ macro for us and
ensures it's passed to the compiler: through config.h if one is in use
(not the case for mesa), otherwise on the command line.
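
As a rough illustration of what consuming such a HAVE_ macro can look like, here is a minimal C sketch. It assumes the build passes -DHAVE_ENDIAN_H when the header exists; host_is_little_endian() is a hypothetical helper, not a function from the mesa tree.

```c
#include <assert.h>
#include <stdint.h>

/* Only pull in endian.h when the build system says it exists. */
#ifdef HAVE_ENDIAN_H
#include <endian.h>
#endif

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
   /* Runtime probe that works without any header. */
   const uint16_t probe = 1;
   int runtime = *(const unsigned char *)&probe == 1;

#if defined(HAVE_ENDIAN_H) && defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN)
   /* When the header is available, cross-check against its macros. */
   assert(runtime == (__BYTE_ORDER == __LITTLE_ENDIAN));
#endif

   return runtime;
}
```

The point of the sketch is only that the source guards on the macro and no longer cares how the macro reached the compiler.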

In hindsight, we should replace all the AC_CHECK_{FUNC,HEADER} instances
with the plural version (or even the _ONCE suffixed version) and drop
the DEFINES hacks.

Fixes: cbee1bfb342 ("meson/configure: detect endian.h instead of trying
to guess when it's available")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105717
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
6 years agoandroid: Use local i915_drm.h rather than the system one.
Kenneth Graunke [Fri, 23 Mar 2018 16:37:43 +0000 (09:37 -0700)]
android: Use local i915_drm.h rather than the system one.

Fixes: 2d26c9993389a8eb8f712 (intel: devinfo: meson: include drm uapi)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
6 years agost/mesa: s/unsigned/enum pipe_shader_type/ for st_bind_ubos()
Brian Paul [Thu, 15 Mar 2018 14:25:43 +0000 (08:25 -0600)]
st/mesa: s/unsigned/enum pipe_shader_type/ for st_bind_ubos()

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
6 years agost/mesa: whitespace/formatting fixes in st_atom_constbuf.c
Brian Paul [Thu, 15 Mar 2018 14:23:13 +0000 (08:23 -0600)]
st/mesa: whitespace/formatting fixes in st_atom_constbuf.c

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
6 years agost/mesa: s/unsigned/enum pipe_shader_type/
Brian Paul [Thu, 15 Mar 2018 14:22:55 +0000 (08:22 -0600)]
st/mesa: s/unsigned/enum pipe_shader_type/

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Neha Bhende <bhenden@vmware.com>
6 years agosvga: simplify uses_flat_interp expression in emit_input_declarations()
Brian Paul [Tue, 6 Mar 2018 16:29:21 +0000 (09:29 -0700)]
svga: simplify uses_flat_interp expression in emit_input_declarations()

Reviewed-by: Neha Bhende <bhenden@vmware.com>
6 years agosvga: replace unsigned with proper enum names
Brian Paul [Mon, 5 Mar 2018 17:48:46 +0000 (10:48 -0700)]
svga: replace unsigned with proper enum names

Reviewed-by: Neha Bhende <bhenden@vmware.com>
6 years agotgsi,softpipe: use enum tgsi_opcode
Brian Paul [Mon, 5 Mar 2018 17:28:03 +0000 (10:28 -0700)]
tgsi,softpipe: use enum tgsi_opcode

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agost/mesa,tgsi: use enum tgsi_opcode
Brian Paul [Mon, 5 Mar 2018 17:20:32 +0000 (10:20 -0700)]
st/mesa,tgsi: use enum tgsi_opcode

Need to update the tgsi code and st_glsl_to_tgsi code at the same time
to prevent compile break since C++ is much pickier about implicit
enum/unsigned casting.

Bump size of glsl_to_tgsi_instruction::op to 10 bits to be sure to
avoid MSVC signed enum overflow issue.  No change in class size.
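
A minimal sketch of the bit-field concern, written in C rather than the C++ of st_glsl_to_tgsi. It models a compiler that treats an enum bit-field as signed by using a plain signed int field. The struct names, the 8-bit "narrow" width and OPCODE_EXAMPLE are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical opcode value above 127, i.e. beyond a signed 8-bit field. */
enum { OPCODE_EXAMPLE = 200 };

/* Models a compiler that treats enum bit-fields as signed. */
struct narrow_inst { signed int op : 8;  };  /* can hold -128..127 */
struct wide_inst   { signed int op : 10; };  /* can hold -512..511 */

/* Store an opcode in the 10-bit field and read it back unchanged. */
static int store_wide(int opcode)
{
   struct wide_inst inst;
   assert(opcode >= 0 && opcode <= 511);
   inst.op = opcode;
   return inst.op;
}

/* Store an opcode in the 8-bit field. Values above 127 do not fit; the
 * result of that conversion is implementation-defined. */
static int store_narrow(int opcode)
{
   struct narrow_inst inst;
   inst.op = opcode;
   return inst.op;
}
```

With a signed field, the usable positive range is half of what the bit count suggests, so a couple of spare bits keep every opcode representable.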

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agotgsi/nir: use enum tgsi_opcode
Brian Paul [Mon, 5 Mar 2018 17:20:01 +0000 (10:20 -0700)]
tgsi/nir: use enum tgsi_opcode

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agotgsi: use enum tgsi_opcode
Brian Paul [Mon, 5 Mar 2018 17:16:01 +0000 (10:16 -0700)]
tgsi: use enum tgsi_opcode

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agogallivm: use enum tgsi_opcode
Brian Paul [Mon, 5 Mar 2018 17:07:03 +0000 (10:07 -0700)]
gallivm: use enum tgsi_opcode

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agosvga: use enum tgsi_opcode
Brian Paul [Mon, 5 Mar 2018 17:05:52 +0000 (10:05 -0700)]
svga: use enum tgsi_opcode

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agotgsi: convert opcode macros to enums
Brian Paul [Mon, 5 Mar 2018 17:04:30 +0000 (10:04 -0700)]
tgsi: convert opcode macros to enums

Enums are nicer in gdb.

Reviewed-by: Eric Anholt <eric@anholt.net>
6 years agocompiler: glsl: silence valgrind warning on write cache
Lionel Landwerlin [Fri, 23 Mar 2018 10:40:02 +0000 (10:40 +0000)]
compiler: glsl: silence valgrind warning on write cache

I don't think it actually fixes anything, but it's nice not to have valgrind warnings.
The warning shows up when running the piglit test glsl-fs-raytrace-bug27060:

==2058== Uninitialised byte(s) found during client check request
==2058==    at 0xC5BB040: blob_write_bytes (blob.c:152)
==2058==    by 0xC595359: write_variable (nir_serialize.c:144)
==2058==    by 0xC59560C: write_var_list (nir_serialize.c:192)
==2058==    by 0xC5982E4: nir_serialize (nir_serialize.c:1124)
==2058==    by 0xC0B729D: brw_program_serialize_nir (brw_program.c:835)
==2058==    by 0xC0AB2D6: brw_link_shader (brw_link.cpp:358)
==2058==    by 0xC32FE3F: _mesa_glsl_link_shader (ir_to_mesa.cpp:3169)
==2058==    by 0xC36C7ED: create_new_program(gl_context*, state_key*) (ff_fragment_shader.cpp:1127)
==2058==    by 0xC36C8A6: _mesa_get_fixed_func_fragment_program (ff_fragment_shader.cpp:1157)
==2058==    by 0xC1B50AF: update_program (state.c:134)
==2058==    by 0xC1B56DF: _mesa_update_state_locked (state.c:352)
==2058==    by 0xC1B579A: _mesa_update_state (state.c:386)
==2058==  Address 0xf1eab8a is 58 bytes inside a block of size 96 alloc'd
==2058==    at 0x4C2CB8F: malloc (vg_replace_malloc.c:299)
==2058==    by 0xC0FD306: ralloc_size (ralloc.c:121)
==2058==    by 0xC0FD5B1: ralloc_array_size (ralloc.c:208)
==2058==    by 0xC452B3B: (anonymous namespace)::nir_visitor::visit(ir_variable*) (glsl_to_nir.cpp:448)
==2058==    by 0xC45CE8B: ir_variable::accept(ir_visitor*) (ir.h:428)
==2058==    by 0xC46D0B5: visit_exec_list(exec_list*, ir_visitor*) (ir.cpp:1898)
==2058==    by 0xC451D2F: glsl_to_nir (glsl_to_nir.cpp:162)
==2058==    by 0xC0B5223: brw_create_nir (brw_program.c:79)
==2058==    by 0xC0AAB67: brw_link_shader (brw_link.cpp:257)
==2058==    by 0xC32FE3F: _mesa_glsl_link_shader (ir_to_mesa.cpp:3169)
==2058==    by 0xC36C7ED: create_new_program(gl_context*, state_key*) (ff_fragment_shader.cpp:1127)
==2058==    by 0xC36C8A6: _mesa_get_fixed_func_fragment_program (ff_fragment_shader.cpp:1157)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
6 years agomeson/configure: detect endian.h instead of trying to guess when it's available
Eric Engestrom [Wed, 21 Mar 2018 17:04:06 +0000 (17:04 +0000)]
meson/configure: detect endian.h instead of trying to guess when it's available

Cc: Maxin B. John <maxin.john@gmail.com>
Cc: Khem Raj <raj.khem@gmail.com>
Cc: Rob Herring <robh@kernel.org>
Suggested-by: Jon Turney <jon.turney@dronecode.org.uk>
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Cc: <mesa-stable@lists.freedesktop.org>
6 years agowayland-drm: do not distribute generated sources
Juan A. Suarez Romero [Fri, 23 Mar 2018 10:24:42 +0000 (11:24 +0100)]
wayland-drm: do not distribute generated sources

Instead, regenerate them at build time.

v2: get rid of BUILT_SOURCES (Daniel, Emil)
v3: keep BUILT_SOURCES for egl/Makefile.am (Emil)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agoradv: enable TC-compat HTILE for 16-bit depth surfaces on GFX8
Samuel Pitoiset [Wed, 21 Mar 2018 20:30:42 +0000 (21:30 +0100)]
radv: enable TC-compat HTILE for 16-bit depth surfaces on GFX8

The hardware only supports 32-bit depth surfaces, but we can
enable TC-compat HTILE for 16-bit depth surfaces if no Z planes
are compressed.

The main benefit is to reduce the number of depth decompression
passes. Also, we don't need to implement DB->CB copies which is
fine.

This improves Serious Sam 2017 by +4%. Talos and F12017 are also
affected but I don't see a performance difference.

This also improves the shadowmapping Vulkan demo by 10-15%
(FPS is now similar to AMDVLK).

No CTS regressions on Polaris10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: add radv_calc_decompress_on_z_planes() helper
Samuel Pitoiset [Wed, 21 Mar 2018 20:30:41 +0000 (21:30 +0100)]
radv: add radv_calc_decompress_on_z_planes() helper

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agoradv: add radv_image_is_tc_compat_htile() helper
Samuel Pitoiset [Wed, 21 Mar 2018 20:30:40 +0000 (21:30 +0100)]
radv: add radv_image_is_tc_compat_htile() helper

This replaces a huge conditional that was about to get unwieldy.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
6 years agonir: Rename image intrinsics to image_var
Jason Ekstrand [Mon, 19 Mar 2018 18:48:11 +0000 (11:48 -0700)]
nir: Rename image intrinsics to image_var

Generated with

git grep -l nir_intrinsic_image | xargs \
sed -i 's/nir_intrinsic_image/nir_intrinsic_image_var/g'

and some manual fixing in nir_intrinsics.h

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
6 years agovirgl: add ARB_cull_distance support.
Dave Airlie [Tue, 13 Mar 2018 05:37:36 +0000 (15:37 +1000)]
virgl: add ARB_cull_distance support.

This just allows the properties through to the host if we have
cull dist support.

Signed-off-by: Dave Airlie <airlied@redhat.com>
6 years agobroadcom/vc5: Account for InstanceID/VertexID in VPM segment size.
Eric Anholt [Thu, 22 Mar 2018 20:52:11 +0000 (13:52 -0700)]
broadcom/vc5: Account for InstanceID/VertexID in VPM segment size.

Fixes failure in
GTF-GLES3.gtf.GL3Tests.draw_instanced.draw_instanced_attrib_size

6 years agobroadcom/vc5: Allow FBOs with mixed color formats.
Eric Anholt [Thu, 22 Mar 2018 20:45:17 +0000 (13:45 -0700)]
broadcom/vc5: Allow FBOs with mixed color formats.

This is required by GLES3, fixing
GTF-GLES3.gtf.GL3Tests.framebuffer_srgb.framebuffer_srgb_draw

6 years agobroadcom/vc5: Add missing support for 2101010_REV vertex attributes.
Eric Anholt [Wed, 21 Mar 2018 21:44:04 +0000 (14:44 -0700)]
broadcom/vc5: Add missing support for 2101010_REV vertex attributes.

Fixes
GTF-GLES3.gtf.GL3Tests.vertex_type_2_10_10_10_rev.vertex_type_2_10_10_10_rev_invalid2,
where we hadn't thrown a GL error as needed in the extension-disabled
case.  We want to be exposing the extension anyway.

6 years agobroadcom/vc5: Set up a vertex position if the shader doesn't.
Eric Anholt [Wed, 21 Mar 2018 21:18:08 +0000 (14:18 -0700)]
broadcom/vc5: Set up a vertex position if the shader doesn't.

Our backend needs some sort of vertex position value to emit the scaled
viewport values and such.  Fixes potential segfaults in
KHR-GLES3.copy_tex_image_conversions.required.cubemap_negx_cubemap_negx

6 years agoi965: add performance query support on CNL
Lionel Landwerlin [Thu, 22 Feb 2018 17:12:42 +0000 (17:12 +0000)]
i965: add performance query support on CNL

v2: Add brw_oa_cnl.xml to EXTRA_DIST (Emil)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965: perf: add support for new equation operators
Lionel Landwerlin [Thu, 22 Feb 2018 17:17:40 +0000 (17:17 +0000)]
i965: perf: add support for new equation operators

Some equations of the CNL metrics started to use operators we haven't
defined yet, just add those.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoi965: perf: query topology
Lionel Landwerlin [Wed, 21 Feb 2018 19:15:46 +0000 (19:15 +0000)]
i965: perf: query topology

With the introduction of asymmetric slices in CNL, we cannot rely on
the previous SUBSLICE_MASK getparam to tell userspace what subslices
are available.

We introduce a new uAPI in the kernel driver to report exactly what
parts of the GPU are fused and require this to be available on Gen10+.

Prior generations can continue to rely on GETPARAM on older kernels.

This patch is quite a lot of code because we have to support lots of
different kernel versions, ranging from not providing any information
(for Haswell on 4.13 through 4.17), to being able to query through
GETPARAM (for gen8/9 on 4.13 through 4.17), to finally requiring 4.17
for Gen10+.

This change stores topology information in a unified way on
brw_context.topology from the various kernel APIs. And then generates
the appropriate values for the equations from that unified topology.

v2: Move slice/subslice masks fields to gen_device_info (Rafael)

v3: Add a gen_device_info_subslice_available() helper (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel: devinfo: add helper functions to fill fusing masks values
Lionel Landwerlin [Wed, 14 Mar 2018 15:44:56 +0000 (15:44 +0000)]
intel: devinfo: add helper functions to fill fusing masks values

There are a couple of ways we can get the fusing information from the
kernel :

  - Through DRM_I915_GETPARAM with the SLICE_MASK/SUBSLICE_MASK
    parameters

  - Through the new DRM_IOCTL_I915_QUERY by requesting the
    DRM_I915_QUERY_TOPOLOGY_INFO

The second method is more accurate and also gives us the EUs fusing
masks. It's also a requirement for CNL as this platform has asymmetric
subslices and the first method SUBSLICE_MASK value is assumed uniform
across slices.
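
The difference between the two methods can be sketched in C as follows. This is an illustration under stated assumptions, not the gen_device_info layout: struct topology_sketch, MAX_SLICES, fill_uniform() and subslice_available() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SLICES 4  /* hypothetical upper bound for this sketch */

/* One subslice mask per slice, so slices may differ from each other. */
struct topology_sketch {
   unsigned num_slices;
   uint8_t subslice_masks[MAX_SLICES];
};

/* GETPARAM-style fill: a single mask is assumed to apply to every slice. */
static void fill_uniform(struct topology_sketch *t, unsigned num_slices,
                         uint8_t mask)
{
   assert(num_slices <= MAX_SLICES);
   t->num_slices = num_slices;
   for (unsigned s = 0; s < num_slices; s++)
      t->subslice_masks[s] = mask;
}

/* Query one subslice of one slice. */
static bool subslice_available(const struct topology_sketch *t,
                               unsigned slice, unsigned subslice)
{
   assert(slice < t->num_slices && subslice < 8);
   return (t->subslice_masks[slice] >> subslice) & 1;
}
```

A topology query can write a different mask into each slice entry, which a single uniform mask cannot express.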

v2: Change gen_device_info_update_from_masks() to generate topology
    and call into gen_device_info_update_from_topology (Lionel/Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel: devinfo: meson: include drm uapi
Lionel Landwerlin [Wed, 14 Mar 2018 15:43:57 +0000 (15:43 +0000)]
intel: devinfo: meson: include drm uapi

Already available with the autotools build.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agodrm-uapi: bump headers
Lionel Landwerlin [Wed, 21 Feb 2018 14:21:08 +0000 (14:21 +0000)]
drm-uapi: bump headers

Required updates from drm-next for changes in i965.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel: devinfo: store slice/subslice/eu masks
Lionel Landwerlin [Wed, 14 Mar 2018 13:16:01 +0000 (13:16 +0000)]
intel: devinfo: store slice/subslice/eu masks

We want to store values coming from the kernel but as a first step, we
can generate mask values out of the numbers already stored in the
gen_device_info masks.
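
Deriving a mask from a count can be sketched like this in C. mask_from_count() is a hypothetical helper for illustration, assuming the lowest bits are the populated ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a mask with the lowest `count` bits set. */
static uint32_t mask_from_count(unsigned count)
{
   assert(count <= 32);
   /* Shifting a 32-bit value by 32 is undefined, so handle it apart. */
   if (count == 32)
      return UINT32_MAX;
   return (UINT32_C(1) << count) - 1;
}
```

For example, a count of 3 yields 0x7, meaning units 0, 1 and 2 are marked present.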

v2: Add a helper to set EU masks (Lionel/Ken)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel: devinfo: store number of EUs per subslice
Lionel Landwerlin [Wed, 14 Mar 2018 13:15:12 +0000 (13:15 +0000)]
intel: devinfo: store number of EUs per subslice

This will be reused to store values reported by the kernel. The main
use case will be for use as the input values of the metric sets
equations for the INTEL_performance_queries extension. By storing this
information in the gen_device_info we make this non GL specific so
this can be reused by Vulkan if we ever have an equivalent extension.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agoRevert "meson: merge C and C++ compiler arguments check"
Dylan Baker [Thu, 22 Mar 2018 18:35:08 +0000 (11:35 -0700)]
Revert "meson: merge C and C++ compiler arguments check"

This reverts commit cb2ddcefa5196fdfeff76f405175c7a6c110eae4.

This causes clang to error out building C++ code. The plan is to fix the
build to work with clang, but in the meantime we'll just revert this.

Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
6 years agoi965/perf: fix config registration when uploading to kernel
Lionel Landwerlin [Thu, 22 Mar 2018 16:02:11 +0000 (16:02 +0000)]
i965/perf: fix config registration when uploading to kernel

When registering configurations to the kernel for the first time, we
run into an issue where the id number is not properly set (we're using
the wrong variable). As a result when trying to use that id later on,
we get an error.

This issue manifests itself the first time you use frameretrace after a
reboot; subsequent runs are fine.

Fixes: 27ee83eaf7e9 ("i965: perf: add support for userspace configurations")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agogallium/winsys/kms: Add support for multi-planes
Lepton Wu [Mon, 19 Mar 2018 22:01:31 +0000 (15:01 -0700)]
gallium/winsys/kms: Add support for multi-planes

Add a new struct kms_sw_plane which represents a plane and use it
in place of sw_displaytarget. Multiple planes share the same underlying
kms_sw_displaytarget.

v2:
 - add more check for plane size (Tomasz)
v3:
 - split from larger patch (Emil)
v4:
 - no change from v3
v5:
 - remove mapped field (Tomasz)
v6:
 - remove change-id in commit message (Tomasz)
v7:
 - add revision history in commit message (Emil)

Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
6 years agogallium/winsys/kms: Fix possible leak in map/unmap.
Lepton Wu [Mon, 19 Mar 2018 22:01:30 +0000 (15:01 -0700)]
gallium/winsys/kms: Fix possible leak in map/unmap.

If the user calls map twice on a kms_sw_displaytarget, the first mapped
buffer could get leaked. Instead of calling mmap every time, just
reuse the previous mapping. Since the user could map the same displaytarget
with different flags, we have to keep two different pointers, one for the rw
mapping and one for the ro mapping. Also introduce a reference count for
the mapped buffer so we can unmap it at the right time.
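
The scheme described above can be sketched in C roughly as follows. This is not the winsys code: struct dt_sketch, dt_map() and dt_unmap() are hypothetical names, and an anonymous mmap stands in for mapping the real buffer.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical stand-in for a display target with cached mappings. */
struct dt_sketch {
   size_t size;
   void *mapped;     /* cached read-write mapping, or NULL */
   void *ro_mapped;  /* cached read-only mapping, or NULL */
   int map_count;    /* number of outstanding dt_map() calls */
};

/* Return the cached mapping for the requested mode, creating it once. */
static void *dt_map(struct dt_sketch *dt, bool read_only)
{
   void **slot = read_only ? &dt->ro_mapped : &dt->mapped;

   if (*slot == NULL) {
      int prot = read_only ? PROT_READ : (PROT_READ | PROT_WRITE);
      void *p = mmap(NULL, dt->size, prot,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED)
         return NULL;
      *slot = p;
   }
   dt->map_count++;
   return *slot;
}

/* Drop one reference; release both mappings when the last one goes. */
static void dt_unmap(struct dt_sketch *dt)
{
   assert(dt->map_count > 0);
   if (--dt->map_count > 0)
      return;
   if (dt->mapped) {
      munmap(dt->mapped, dt->size);
      dt->mapped = NULL;
   }
   if (dt->ro_mapped) {
      munmap(dt->ro_mapped, dt->size);
      dt->ro_mapped = NULL;
   }
}
```

Calling dt_map() twice with the same mode returns the same pointer, so no mapping is lost, and the memory is released only after a matching number of dt_unmap() calls.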

v2:
 - avoid duplicated mapping and leaked mapping (Tomasz)
v3:
 - split from larger patch (Emil)
v4:
 - remove munmap from dt_destroy (Emil)
v5:
 - introduce reference count for mapping (Tomasz)
 - add back munmap in dt_destroy
v6:
 - remove change-id in commit message (Tomasz)
v7:
 - remove munmap from dt_destroy again (Emil)
 - add revision history in commit message (Emil)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Tomasz Figa <tfiga@chromium.org>
Signed-off-by: Lepton Wu <lepton@chromium.org>
6 years agobroadcom/vc4: add path to nir_builder.h
Juan A. Suarez Romero [Tue, 20 Mar 2018 10:21:37 +0000 (11:21 +0100)]
broadcom/vc4: add path to nir_builder.h

As the other VC4 files do. Otherwise, it won't find nir_builder.h

v2: add path in source code rather than changing autotools (Emil)

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agoautotools: add tegra header files
Juan A. Suarez Romero [Mon, 19 Mar 2018 13:17:22 +0000 (14:17 +0100)]
autotools: add tegra header files

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agoswr/rast: autotools: add events_private.proto in dist tarball.
Juan A. Suarez Romero [Mon, 19 Mar 2018 13:06:57 +0000 (14:06 +0100)]
swr/rast: autotools: add events_private.proto in dist tarball.

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agoradv: autotools: add radv_extensions.h in the generated VULKAN list
Juan A. Suarez Romero [Mon, 19 Mar 2018 12:28:09 +0000 (13:28 +0100)]
radv: autotools: add radv_extensions.h in the generated VULKAN list

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agoanv/radv: autotools: include vulkan_*.h headers
Juan A. Suarez Romero [Mon, 19 Mar 2018 12:17:41 +0000 (13:17 +0100)]
anv/radv: autotools: include vulkan_*.h headers

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agonir: autotools, meson: add GLSL.ext.AMD.h in the files list
Juan A. Suarez Romero [Mon, 19 Mar 2018 12:08:32 +0000 (13:08 +0100)]
nir: autotools, meson: add GLSL.ext.AMD.h in the files list

Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
6 years agointel/compiler: Readd ICL to test_eu_validate.cpp
Matt Turner [Fri, 16 Mar 2018 18:00:50 +0000 (11:00 -0700)]
intel/compiler: Readd ICL to test_eu_validate.cpp

Now that the PCI IDs are upstream, this can be readded.

6 years agointel/compiler: Skip 64-bit type tests when types not available
Matt Turner [Fri, 16 Mar 2018 18:15:26 +0000 (11:15 -0700)]
intel/compiler: Skip 64-bit type tests when types not available

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel: Add Ice Lake PCI IDs
Anuj Phogat [Tue, 14 Mar 2017 21:43:34 +0000 (14:43 -0700)]
intel: Add Ice Lake PCI IDs

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
6 years agointel: Disable fast color clear on icl
Anuj Phogat [Tue, 21 Nov 2017 21:46:25 +0000 (13:46 -0800)]
intel: Disable fast color clear on icl

Disabling fast color clear makes the fbo-clearmipmap test render the
correct texture in the base miplevel. Fast color clear is already disabled
for non-base miplevels.

Acked-by: Matt Turner <mattst88@gmail.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler/icl: Clear "null render target" bit in extended message descriptor
Jason Ekstrand [Mon, 18 Dec 2017 19:29:14 +0000 (11:29 -0800)]
intel/compiler/icl: Clear "null render target" bit in extended message descriptor

Otherwise all our render target writes go nowhere.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/compiler/icl: Update the assert in brw_stage_has_packed_dispatch()
Anuj Phogat [Thu, 20 Jul 2017 23:20:33 +0000 (16:20 -0700)]
intel/compiler/icl: Update the assert in brw_stage_has_packed_dispatch()

Rafael ran piglit with the test code enabled and saw no additional GPU
hangs.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
6 years agointel/common/icl: Disable hiz surface sampling
Anuj Phogat [Fri, 16 Feb 2018 21:44:10 +0000 (13:44 -0800)]
intel/common/icl: Disable hiz surface sampling

On gen11+ AUX_HIZ is not a supported value for surfaces being
sampled by the 3D sampler.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>