git.libre-soc.org Git - mesa.git/log

intel/aubinator_error_decode: Allow for more sections

Error states coming from actual Vulkan applications tend to have fairly
long command buffers and lots of chained batches. 30 total BOs isn't
nearly enough. This commit bumps it to 256, makes some things use the
actual number of sections instead of the #define, and adds asserts if we
ever go over 256 sections.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/batch_decoder: Recurse for all 2nd level batches

Our attempt to restart the loop with the second level batch worked at
one point but got broken at some point. It was too fragile anyway and
we're not likely to have enough secondaries to actually overflow the
stack so we may as well recurse in both cases.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

virgl/vtest: add support to vtest for new cap getting.

The vtest protocol is pretty simple but also pretty dumb, and
the v1 caps query was fixed size, with no nice way to expand it,
however the server also ignores any command it doesn't understand.

So we can query v2 caps by sending a v2 followed by a v1, if the
v2 is ignored we know it's an old vtest server, and the we get
a v2 answer then we can just read the v1 answer and discard it.

Acked-by: Jakob Bornecrantz <jakob@collabora.com> (sounds good)

i965/icl: Don't set float blend optimization bit in CACHE_MODE_SS

CACHE_MODE_SS is not listed in gfxspecs table for user mode
non-privileged registers. So, making any changes from Mesa
will do nothing. Kernel is already setting this bit in
CACHE_MODE_SS register which is saved/restored to/from
the HW context image.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv/icl: Don't set float blend optimization bit in CACHE_MODE_SS

CACHE_MODE_SS is not listed in gfxspecs table for user mode
non-privileged registers. So, making any changes from Mesa
will do nothing. Kernel is already setting this bit in
CACHE_MODE_SS register which is saved/restored to/from
the HW context image.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv: Implement VK_EXT_vertex_attribute_divisor

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

anv/pipeline: Add a per-VB instance divisor

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

anv/pipeline: Use a per-VB struct instead of separate arrays

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

anv: Enable SPV_KHR_8bit_storage and VK_KHR_8bit_storage

Enables SPV_KHR_8bit_storage and VK_KHR_8bit_storage on gen 8+
using the VK_KHR_get_physical_device_properties2 functionality
to expose if the extension is supported or not.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

spirv/nir: Add support for SPV_KHR_8bit_storage

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

spirv: Include headers and grammar for SPV_KHR_8bit_storage

Updates headers and grammar to ff684ffc6a35d2a58f0f63108877d0064ea33feb

Acked-by: Jason Ekstrand <jason@jlekstrand.net>

i965/fs: Enable store_ssbo for 8-bit types.

v2: Update comment according to this patch. (Jason Ekstrand)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/compiler: relax brw_eu_validate for byte raw movs

When the destination is a BYTE type allow raw movs
even if the stride is not exact multiple of destination
type and exec type, execution type is Word and its size is 2.

This restriction was only allowing stride==2 destinations
for 8-bit types.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965/fs: Enable conversions to 8-bit integers

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965: Support for 8-bit base types in helper functions

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

i965/fs: Register allocator shoudn't use grf127 for sends dest

Since Gen8+ Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."

This patch implements this restriction creating new grf127_send_hack_node
at the register allocator. This node has a fixed assignation to grf127.

For vgrf that are used as destination of send messages we create node
interfereces with the grf127_send_hack_node. So the register allocator
will never assign to these vgrf a register that involves grf127.

If dispatch_width > 8 we don't create these interferences to the because
all instructions have node interferences between sources and destination.
That is enough to avoid the r127 restriction.

This fixes CTS tests that raised this issue as they were executed as SIMD8:

dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom

Shader-db results on Skylake:
   total instructions in shared programs: 7686798 -> 7686797 (<.01%)
   instructions in affected programs: 301 -> 300 (-0.33%)
   helped: 1
   HURT: 0

   total cycles in shared programs: 337092322 -> 337091919 (<.01%)
   cycles in affected programs: 22420415 -> 22420012 (<.01%)
   helped: 712
   HURT: 588

Shader-db results on Broadwell:

   total instructions in shared programs: 7658574 -> 7658625 (<.01%)
   instructions in affected programs: 19610 -> 19661 (0.26%)
   helped: 3
   HURT: 4

   total cycles in shared programs: 340694553 -> 340676378 (<.01%)
   cycles in affected programs: 24724915 -> 24706740 (-0.07%)
   helped: 998
   HURT: 916

   total spills in shared programs: 4300 -> 4311 (0.26%)
   spills in affected programs: 333 -> 344 (3.30%)
   helped: 1
   HURT: 3

   total fills in shared programs: 5370 -> 5378 (0.15%)
   fills in affected programs: 274 -> 282 (2.92%)
   helped: 1
   HURT: 3

v2: Avoid duplicating register classes without grf127. Let's use a node
    with a fixed assignation to grf127 and create interferences to send
    message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
    (Jose Maria Casanova)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 18.1 <mesa-stable@lists.freedesktop.org>

intel/compiler: grf127 can not be dest when src and dest overlap in send

Implement at brw_eu_validate the restriction from Intel Broadwell PRM,
vol 07, section "Instruction Set Reference", subsection "EUISA
Instructions", Send Message (page 990):

"r127 must not be used for return address when there is a src and
dest overlap in send instruction."

v2: Style fixes (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: 18.1 <mesa-stable@lists.freedesktop.org>

radv: using tls to store llvm related info and speed up compiles (v10)

This uses the common compiler passes abstraction to help radv
avoid fixed cost compiler overheads. This uses a linked list per
thread stored in thread local storage, with an entry in the list
for each target machine.

This should remove all the fixed overheads setup costs of creating
the pass manager each time.

This takes a demo app time to compile the radv meta shaders on nocache
and exit from 1.7s to 1s. It also has been reported to take the startup
time of uncached shaders on RoTR from 12m24s to 11m35s (Alex)

v2: fix llvm6 build, inline emit function, handle multiple targets
in one thread
v3: rebase and port onto new structure
v4: rename some vars (Bas)
v5: drag all code into radv for now, we can refactor it out later
for radeonsi if we make it shareable
v6: use a bit more C++ in the wrapper
v7: logic bugs fixed so it actually runs again.
v8: rebase on top of radeonsi changes.
v9: drop some C++ headers, cleanup list entry
v10: use pop_back (didn't have enough caffeine)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

swrast: Fix eglMakeCurrent(dpy, NULL, NULL, ctx) (v2)

Fixes 14 piglits, mostly in egl_khr_create_context.

v2: Also short-circuit the same-context-no-drawables case (Eric Anholt)

Fixes: https://github.com/anholt/libepoxy/issues/177
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Adam Jackson <ajax@redhat.com>

intel: tools: dump_gpu: fix ppgtt mapping

We were not properly writing page tables when the virtual address
range spans multiple subtrees of the tables.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

v3d: Implement noperspective varyings on V3D 4.x.

Fixes a bunch of piglit interpolation tests, and reduces my concern about
some MSAA blit shaders with noperspective varyings.

v3d: Refactor flat shade/centroid flag emission.

The logic was duplicated in a pretty gross way, when what we really need
is just a helper function for stuffing the values in the packet. This
will make implementing noperspective easier.

v3d: Fix typo in dither mode offset.

We weren't using the field yet, so it didn't affect anything.

Fixes: c0476d964abb ("v3d: Express dithering mode in the same way that the CLIF parser does.")

glsl: Treat sampler2DRect and sampler2DRectShadow as reserved in ES2

"sampler2DRect" and "sampler2DRectShadow" are specified as
reserved from GLSL 1.1 and GLSL ES 1.0

Signed-off-by: zhaowei yuan <zhaowei.yuan@samsung.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106906
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes: 34f7e761bc61 ("glsl/parser: Track built-in types using the glsl_type directly")

st/wgl: check for NULL piAttribList in wglCreatePbufferARB()

Java2d opengl pipeline passes NULL piAttribList to
wglCreatePbufferARB(). So skip parsing the attribute list
if it is NULL.

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Neha Bhende <bhenden@vmware.com>

anv: Add support for VK_KHR_create_renderpass2

The implementation of CreateRenderPass2 uses the helpers we broke out in
previous commits. The implementations of the new vkCmd functions just
call the old versions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv: Make subpass::depth_stencil_attachment a pointer

This makes certain checks a bit easier and means that we don't have
the attachment information duplicated in the attachment list and in
depth_stencil_attachment.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv/pass: Move implicit dependency setup to anv_render_pass_compile

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv/pass: Move some dependency setup into a helper

This new helper takes a VkSubpassDependency2KHR for future-proofing.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv/pass: Move a bunch of analysis into a separate "compile" stage

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv/pass: Use a designated initailizer for attachments

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv: Bump the advertised patch version to 80

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

glx: Don't allow glXMakeContextCurrent() with only one valid drawable

Drawable and readable need to either both be None or both be non-None.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>

mesa: verify MaxVertexAttribStride for GLES 3.1

The OpenGL 3.1 specification, table Table 20.41 ("Implementation
Dependent Values"), defines the minimum-maximum value for
MAX_VERTEX_ATTRIB_STRIDE to be 2048.

So we shouldn't enable OpenGL ES 3.1 on implementations where this
isn't the case. Let's add a check for this

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

mesa: verify MaxVertexAttribStride for GL 4.4

The OpenGL 4.4 specification, table Table 23.55 ("Implementation
Dependent Values"), defines the minimum-maximum value for
MAX_VERTEX_ATTRIB_STRIDE to be 2048.

So we shouldn't enable OpenGL 4.4 on implementations where this isn't
the case. Let's add a check for this.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

r600: report incorrect max-vertex-attrib for GL 4.4

OpenGL 4.4 requires a max vertex attrib of 2048 or higher, but
r600 only supports 2047. Technically, this makes it an GL4.3 GPU,
but it's currently exposing GL4.4.

To avoid regressing the GL version supported in the following
patches, let's just lie and pretend like we support 2048. Any
applications using 2048 are already broken anyway.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

intel/fs: use uint type for per_slot_offset at GS

This helps us to compact original instruction:

mul(8) g3<1>D g6<8,8,1>UD 0x00000006UD { align1 1Q };

So now we emit:

mul(8) g3<1>UD g6<8,8,1>UD 0x00000006UD { align1 1Q compacted };

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>

radv: add the trace BO to the list when starting a new cmdbuf

That might reduce CPU overhead a little bit when using
RADV_TRACE_FILE.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: reduce CPU overhead in radv_flush_descriptors()

The number of enabled descriptors for a given pipeline stage
can be computed at compile time.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

intel/compiler: remove unused function

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

anv/pipeline: honor the pipeline_cache_enabled run-time flag

v2: merge both conditions to reduce the diff (Lionel)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

r600/sb: fix crash in fold_alu_op3

fold_assoc() called from fold_alu_op3() can lower the number of src to 2,
which then leads to an invalid access to n.src[2]->gvalue().
This didn't seem to have caused much harm in the past, but on Fedora 28
it will crash (presumably because -D_GLIBCXX_ASSERTIONS is used, although
with libstdc++ 4.8.5 this didn't do anything, -D_GLIBCXX_DEBUG was
needed to show the issue).

An alternative fix would be to instead call fold_alu_op2() from within
fold_assoc() when the number of src is reduced and return always TRUE
from fold_assoc() in this case, with the only actual difference being
the return value from fold_alu_op3() then. I'm not sure what the return
value actually should be in this case (or whether it even can make a
difference).

https://bugs.freedesktop.org/show_bug.cgi?id=106928
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Dave Airlie <airlied@redhat.com>

vulkan: Update the XML and headers to 1.1.80

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

i965: fix clear color bo address relocation

Fixes: 7987d041fda0c9 ("i965/surface_state: Emit the clear color address instead of value.")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

radv: winsys/amdgpu: include missing pthread.h header

pthread types are used in some files without explicitely including pthread.h.
This leads to compile errors on Android 7.x nougat-x86
e.g. in src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h

In file included from external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.c:31:
In file included from external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_bo.h:32:
external/mesa/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys.h:52:2: error: unknown type name 'pthread_mutex_t'
pthread_mutex_t global_bo_list_lock;
^
1 error generated.

Including pthread.h explicitely solves the building error

Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

nv50/ir: fix Instruction::isActionEqual for PHI instructions

phi instructions don't have the same results by simply having the same sources.
They need to be inside the same BasicBlock or share an equal condition
resulting into a path through the shader selecting equal sources as well.

short example:

cond = ...;
const0 = 0;
const1 = 1;

if (cond) {
  ssa_1 = const0;
} else {
  ssa_2 = const1;
}
ssa_3 = phi ssa_1 ssa_2;

if (!cond) {
  ssa_4 = const0;
} else {
  ssa_5 = const1;
}
ssa_6 = phi ssa_4 ssa_5;

allthough both phis actually have sources with equal results, merging them
would be wrong due to having a different condition selecting which source to
take.

For now we also stick an assert into GlobalCSE, because it should never end up
having to merge phi instructions.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>

nvc0/ir: use the combined tid special register

total instructions in shared programs : 5804448 -> 5804690 (0.00%)
total gprs used in shared programs    : 670065 -> 670065 (0.00%)
total shared used in shared programs  : 548832 -> 548832 (0.00%)
total local used in shared programs   : 21068 -> 21068 (0.00%)

                local     shared        gpr       inst      bytes
    helped           0           0           0           5           5
      hurt           0           0           0         191         191

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>

nir/print: Print texture and sampler indices

Commit 5fb69daa6076e56b deleted support from nir_print for printing the
texture and sampler indices on texture instructions. This commit just
brings it back as best as we can.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

intel/compiler: Relax mixed type restriction for saturating immediates

At the time of commit 7bc6e455e23 (i965: Add support for saturating
immediates.) we thought mixed type saturates would be impossible.  We
were only thinking about type converting moves from D to F, for
example.  However, type converting moves w/saturate from F to DF are
definitely possible.  This change minimally relaxes the restriction to
allow cases that I have been able trigger via piglit tests.

Fixes new piglit tests:
- arb_gpu_shader_fp64/execution/built-in-functions/fs-sign-sat-neg-abs.shader_test
- arb_gpu_shader_fp64/execution/built-in-functions/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

i965/vec4: Properly handle sign(-abs(x))

This is achived by copying the sign(abs(x)) optimization from the FS
backend.

On Gen7 an earlier platforms, this fixes new piglit tests:

- glsl-1.10/execution/vs-sign-neg-abs.shader_test
- glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

i965/fs: Properly handle sign(-abs(x))

Fixes new piglit tests:

- glsl-1.10/execution/fs-sign-neg-abs.shader_test
- glsl-1.10/execution/fs-sign-sat-neg-abs.shader_test
- glsl-1.10/execution/vs-sign-neg-abs.shader_test
- glsl-1.10/execution/vs-sign-sat-neg-abs.shader_test

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>

vulkan: utils: handle hexadecimal values in registry

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

st/dri: fix a crash in server_wait_sync

Ported from i965 including the comment.

This fixes:
dEQP-EGL.functional.reusable_sync.valid.wait_server

Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>

python: Stop using the Python 2 exception syntax

We could have made this compatible with Python 3 by using:

except Exception as e:

But since none of this code actually uses the exception objects, let's
just drop them entirely.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>

python: Use spaces, not tabs

Python 3 doesn't allow mixing spaces and tabs in a script, contrarily to
Python 2.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

python: Use the print function

In Python 2, `print` was a statement, but it became a function in
Python 3.

Using print functions everywhere makes the script compatible with Python
versions >= 2.6, including Python 3.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Dylan Baker <dylan@pnwbakers.com>

vma/tests: Fix compilation if limits.h defines PAGE_SIZE (v2)

per POSIX, limits.h may define PAGE_SIZE when the value is not indeterminate

v2: just change the variable name, since there's no intended correlation
here between this value and the machine's actual page size.

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Scott D Phillips <scott.d.phillips@intel.com>

radv: fix emitting the view index on GFX9

For merged shaders, VS as HS for example.

Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

i965/vec4: Make the vec4_visitor::nir_emit_instr default case unreachable

The bug fixed by the previous commit went undetected because extra
stderr messages are not flagged by the CI. Copy the solution from
fs_visitor::nir_emit_instr and mark the default case unreachable.

An alternate solution is to delete the default case so that the compiler
will issue a warning. That may require more work since there are other
(impossible) cases that exist.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

intel/compiler: More DCE after lowering

Some of the lowering passes, nir_lower_locals_to_regs for example, can
cause some previously live code to be dead.  This pass in particular
leaves a bunch of nir_instr_type_deref instructions floating around.
This causes shader-db runs on Gen5 through Haswell to spew tons of
messages like:

    VS instruction not yet implemented by NIR->vec4

UnrealEngine4/EffectsCaveDemo/239.shader_test is one shader that
generates these messages.  Cleaning up the dead code fixes that.

To verify, I did a shader-db before and after.  Even though all the
messages are gone, the results make my brain hurt. :(

Haswell
total cycles in shared programs: 411890163 -> 411891145 (<.01%)
cycles in affected programs: 57016 -> 57998 (1.72%)
helped: 3
HURT: 11
helped stats (abs) min: 2 max: 154 x̄: 96.67 x̃: 134
helped stats (rel) min: 0.08% max: 2.23% x̄: 1.42% x̃: 1.96%
HURT stats (abs)   min: 18 max: 686 x̄: 115.64 x̃: 20
HURT stats (rel)   min: 0.81% max: 7.12% x̄: 1.87% x̃: 0.93%
95% mean confidence interval for cycles value: -51.39 191.67
95% mean confidence interval for cycles %-change: -0.14% 2.46%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total cycles in shared programs: 259114802 -> 259115032 (<.01%)
cycles in affected programs: 24034 -> 24264 (0.96%)
helped: 1
HURT: 9
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
HURT stats (abs)   min: 18 max: 48 x̄: 25.78 x̃: 20
HURT stats (rel)   min: 0.80% max: 1.94% x̄: 1.08% x̃: 0.80%
95% mean confidence interval for cycles value: 12.42 33.58
95% mean confidence interval for cycles %-change: 0.54% 1.38%
Cycles are HURT.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 5a02ffb733e nir: Rework lower_locals_to_regs to use deref instructions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>

v3d: Fix leak of the default attributes BOs.

The GLES3 CTS makes a lot more progress on a run now.

v3d: Fix leak of the spill BO on context destruction.

nir: Apply fragment color clamping to gl_FragData[] as well.

From the ARB_color_buffer_float spec:

   35. Should the clamping of fragment shader output gl_FragData[n]
       be controlled by the fragment color clamp.

       RESOLVED: Since the destination of the FragData is a color
       buffer, the fragment color clamp control should apply.

Fixes arb_color_buffer_float-mrt mixed on v3d.

Reviewed-by: Rob Clark <robdclark@gmail.com>

v3d: Skip emitting per-RT blend state for RTs with blend disabled.

Cleans up the CL of fbo-drawbuffers2-blend a bit. We could do better on
more complicated cases by noticing if multiple RTs have the same blend
state and emitting them in a single packet.

v3d: Add proper support for GL_EXT_draw_buffers2's blending enables.

I had flagged it as enabled on V3D 4.x, but not actually implemented the
per-RT enables. Fixes piglit fbo_drawbuffers2-blend.

v3d: Add support for GL_SAMPLE_ALPHA_TO_ONE.

Fixes piglit ext_framebuffer_multisample-draw-buffers-alpha-to-one

v3d: Respect swap_color_rb for the f32_color_rb case.

We don't actually set the two flags together, but I want to use the
r/g/b/a reordered fields in the next commit.

st/nir: Disable varying packing when doing transform feedback.

The varying packing would result in st_nir_assign_var_locations() picking
new driver_locations, despite the pipe_stream_output already being set up
for the old driver location. This left the gallium driver with no way to
work back to what varying was referenced by pipe_stream_output.

Fixes these tests on V3D:
dEQP-GLES3.functional.transform_feedback.random.separate.points.3
dEQP-GLES3.functional.transform_feedback.random.separate.points.7
dEQP-GLES3.functional.transform_feedback.random.separate.points.9
dEQP-GLES3.functional.transform_feedback.random.separate.triangles.3
dEQP-GLES3.functional.transform_feedback.random.separate.triangles.8

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

meson: Set with_dri from with_gallium when DRI glx is explicitly configured

Set with_dri from with_gallium when DRI GLX is explicitly configured, as
well as when DRI GLX is chosen automatically.

Signed-off-by: Jon Turney <jon.turney@dronecode.org.uk>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>

radv/winsys: make use of radeon_emit()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: only flush CB meta in pipeline image barriers when needed

If the given image doesn't enable CMASK, FMASK or DCC that's
useless to flush CB metadata.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: only flush DB meta in pipeline image barriers when needed

If the given image doesn't have HTILE, that's useless to flush
DB metadata.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: fix "error: initializer element is not constant" build error

GCC 4.8 fails to compile with "static const", while GCC 8.1
fails to compile with only "static".

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

util: u_queue: fix android build error

mesa/src/util/u_queue.c:242:15: error: address of array 'queue->name'
will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]

Fixes: b238e33bc9d48b814370 "kutil/queue: add a process name into a thread name"
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

Util: fix msvc build

The MSVC preprocessor doesnt understand #warning

Fixes: 2e1e6511f76 ("util: extract get_process_name from xmlconfig.c")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>

python: Specify the JSON separators

On Python 2, the default JSON separators are ', ' for items and ': ' for
dicts.

On Python 3, the default is the same when no indent is specified, but if
one is (and we do specify one) then the default items separator becomes
',' (the dict separator remains unchanged).

This change explicitly specifies the Python 3 default, which helps
ensuring that the output is identical, whether it was generated by
Python 2 or 3.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

python: Stabilize some script outputs

In Python, dictionaries and sets are unordered, and as a result their
is no guarantee that running this script twice will produce the same
output.

Using ordered dicts and explicitly sorting items makes the build more
reproducible, and will make it possible to verify that we're not
breaking anything when we move the build scripts to Python 3.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

intel: tools: remove drm-uapi defines

We already embed the headers, no need to redefine defines/structs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: intel_dump_gpu: use simulator id in captures

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: devinfo: add simulator id

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: tools: dump-gpu: dump 48-bit addresses

For gen8+, write out PPGTT tables in aub files so that full 48-bit
addresses can be serialized.

v2: Fix handling of `end` index in map_ppgtt

v3: Correctly mark GGTT entry as present (Rafael)

Signed-off-by: Scott D Phillips <scott.d.phillips@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: tools: import intel_aubdump

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: tools: update intel_aub.h

Scott added new stuff in IGT.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: batch-decoder: add missing return line

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: batch-decoder: don't asks for constant BO until decoding

With PPGTT mappings, our aubinator implementation can be quite slow if
we request a buffer that doesn't exist. Instead of doing a PPGTT walk
for invalid addresses (0 lengths), wait until we're sure we want to
decode the data.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel/batch-decoder: handle non-contiguous binding table / surface state

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel/tools/aubinator: aubinate ppgtt aubs

v2: by Lionel
    Fix memfd_create compilation issue
    Fix pml4 address stored on 32 instead of 64bits
    Return no buffer if first ppgtt page is not mapped

v3: Drop additional memfd_create() (Rafael)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: aubinator: handle GGTT mappings

We use memfd to store physical pages as they get read/written to and
the GGTT entries translating virtual address to physical pages.

Based on a commit by Scott Phillips.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

util: rb-tree: A simple, invasive, red-black tree

This is a simple, invasive, liberally licensed red-black tree
implementation. It's an invasive data structure similar to the
Linux kernel linked-list where the intention is that you embed a
rb_node struct the data structure you intend to put into the
tree.

The implementation is mostly based on the one in "Introduction to
Algorithms", third edition, by Cormen, Leiserson, Rivest, and
Stein. There were a few other key design points:

* It's an invasive data structure similar to the [Linux kernel
   linked list].

* It uses NULL for leaves instead of a sentinel. This means a few
   algorithms differ a small bit from the ones in "Introduction to
   Algorithms".

* All search operations are inlined so that the compiler can
   optimize away the function pointer call.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

intel: aubinator: drop the 1Tb GTT mapping

Now that we're softpinning the address of our BOs in anv & i965, the
addresses selected start at the top of the addressing space. This is a
problem for the current implementation of aubinator which uses only a
40bit mmapped address space.

This change keeps track of all the memory writes from the aub file and
fetch them on request by the batch decoder. As a result we can get rid
of the 1<<40 mmapped address space and only rely on the mmap aub file
\o/

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: aubinator: rework register writes handling

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: aubinator: remove standard input processing option

On a follow up commit in this series, we stop copying the data from
the mmap'ed file into our big gtt mmap, and start referencing data in
it directly. So reallocating the read buffer and adding more data from
stdin wouldn't work. For that reason, let's stop supporting stdin
process.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

intel: aubinator: remove unused variables

These memory offsets are stored in the gen_batch_decode_ctx.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>

gallium/auxiliary: Fix string matching

Commit f69bc797e15fe6beb9e439009fab55f7fae0b7f9 did the following:

-        if format.layout in ('bptc', 'astc'):
+        if format.layout in ('astc'):

The intention was to go from matching either 'bptc' or 'astc' to
matching only 'astc'.

But the new code doesn't respect this intention any more, because in
Python `('astc')` is not a tuple containing a string, it is just the
string. (the parentheses are simply ignored)

That means we now match any substring of 'astc', for example 'a'.

This commit fixes the test to respect the original intention.

Fixes: f69bc797e15fe6beb9e4 "gallium/auxiliary: Add helper support for
                             bptc format compress/decompress"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>

radv: optimize vkCmd{Set,Reset}Event() a little bit

Always emitting a bottom-of-pipe event is quite dumb. Instead,
start to optimize these functions by syncing PFP for the
top-of-pipe and syncing ME for the post-index-fetch event.

This can still be improved by emitting EOS events for
syncing PS and CS stages.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

radv: optimize radv_CmdWaitEvents()

This introduces radv_barrier() (same as the draw/dispatch codepath).
This helper is used for merging the code from CmdWaitEvents() and
CmdPipelineBarrier because it's quite similar.

We do ignore the source stage mask for CmdWaitEvents because
it's irrelevant when event objects are used.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

nir/linker: fix msvc build

Empty initializer braces aren't valid c (it's a gnu extension, and
it's valid in c++).
Hopefully fixes appveyor / msvc build...

Fixes 6677e131b806b10754adcb7cf3f427a7fcc2aa09
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>

r600: compare structure elements instead of doing a memcmp

Structures might be padded by the compiler and these padding bytes remain
un-initialized which in turn makes memcmp return a difference where from
the logical point of view there is none.

Fixes valgrind:
     Conditional jump or move depends on uninitialised value(s)
       at 0x4C32CBA: __memcmp_sse4_1 (vg_replace_strmem.c:1099)
       by 0xB8D2537: r600_set_vertex_buffers (r600_state_common.c:573)
       by 0xB71D44A: u_vbuf_set_driver_vertex_buffers (u_vbuf.c:1129)
       by 0xB71F7BB: u_vbuf_draw_vbo (u_vbuf.c:1153)
       by 0xB3B92CB: st_draw_vbo (st_draw.c:235)
       by 0xB36B1AE: vbo_draw_arrays (vbo_exec_array.c:391)
       by 0xB36BB0D: vbo_exec_DrawArrays (vbo_exec_array.c:550)
       by 0x10A989: piglit_display (textureSize.c:157)
       by 0x4F8F174: run_test (piglit_fbo_framework.c:52)
       by 0x4F7BA12: piglit_gl_test_run (piglit-framework-gl.c:229)
       by 0x10A60A: main (textureSize.c:71)
     Uninitialised value was created by a stack allocation
       at 0xB3948FD: st_update_array (st_atom_array.c:388)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

r600: Add R4G4B4A4 and A1B5G5R5 to supported vertex formats

Below tests would fail with an error message
  "Vertex format (R4G4B4A4|R5G5B5A1) not supported."
Add the formate to the translation routine to enable these formats.

Fixes:
  dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgba4_2d
  dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgba4_cube
  dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb5_a1_2d
  dEQP-GLES3.functional.texture.specification.teximage2d_pbo.rgb5_a1_cube
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgba4_2d
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgba4_cube
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb5_a1_2d
  dEQP-GLES3.functional.texture.specification.texsubimage2d_pbo.rgb5_a1_cube
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgba4_2d_array
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgba4_3d
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb5_a1_2d_array
  dEQP-GLES3.functional.texture.specification.teximage3d_pbo.rgb5_a1_3d
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgba4_2d_array
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgba4_3d
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb5_a1_2d_array
  dEQP-GLES3.functional.texture.specification.texsubimage3d_pbo.rgb5_a1_3d
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>

r600: force LOD range to be only one value when mip.min filter is NONE

For a texture that has only one LOD defined, but for which
GL_TEXTURE_MAX_LEVEL is the default (1000) and
GL_TEXTURE_MIN_LOD != GL_TEXTURE_MAX_LOD the reading from the texture does
not properly resolve the LOD level and texture lookup might fail. Hence,
when no mipmap filter is given (indicating that no mip-mapping takes place),
force the LOD range to contain only value.

Fixes:
  dEQP-GLES3.functional.shaders.texture_functions.texture*.(i|u)sampler2d*
  dEQP-GLES3.functional.texture.format.sized.cube.rgb*
  out of VK_GL_CTS/android/cts/master/gles3-master.txt
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>