mesa.git
5 years agoswr/rast: Add initial SWTag proto definitions
Alok Hota [Tue, 4 Sep 2018 18:41:39 +0000 (13:41 -0500)]
swr/rast: Add initial SWTag proto definitions

Update gen_archrast.py to properly generate event IDs

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
5 years agoswr/rast: Cleanup and generalize gen_archrast
Alok Hota [Fri, 31 Aug 2018 17:13:56 +0000 (12:13 -0500)]
swr/rast: Cleanup and generalize gen_archrast

Update meson.build to accommodate

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
5 years agonir: Use SM5 properties to optimize shift(a@32, iand(31, b))
Daniel Schürmann [Fri, 22 Feb 2019 21:05:07 +0000 (22:05 +0100)]
nir: Use SM5 properties to optimize shift(a@32, iand(31, b))

This is a common pattern from HLSL->SPIRV translation
and supported in HW by all current NIR backends.

vkpipeline-db results anv (SKL):

    total instructions in shared programs: 6403130 -> 6402380 (-0.01%)
    instructions in affected programs: 204084 -> 203334 (-0.37%)
    helped: 208
    HURT: 0

    total cycles in shared programs: 1915629582 -> 1918198408 (0.13%)
    cycles in affected programs: 1158892682 -> 1161461508 (0.22%)
    helped: 107
    HURT: 86

shader-db results on i965 (KBL):

    total instructions in shared programs: 15284592 -> 15284568 (<.01%)
    instructions in affected programs: 81683 -> 81659 (-0.03%)
    helped: 24
    HURT: 0

    total cycles in shared programs: 375013622 -> 375013932 (<.01%)
    cycles in affected programs: 40169618 -> 40169928 (<.01%)
    helped: 13
    HURT: 9

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: Define shifts according to SM5 specification.
Daniel Schürmann [Thu, 14 Feb 2019 07:19:09 +0000 (08:19 +0100)]
nir: Define shifts according to SM5 specification.

SPIR-V shifts are undefined for values >= bitsize, but SM5 shifts
are defined to only use the least significant bits.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
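
As a rough illustration of the two shift commits above, the sketch below models a 32-bit left shift under SM5 semantics in Python. The names ishl_sm5 and ishl_masked are hypothetical and are not NIR identifiers; the only point is that an explicit "b & 31" becomes redundant once the shift itself is defined to use the low five bits of the count.

```python
MASK32 = 0xFFFFFFFF

def ishl_sm5(a: int, b: int) -> int:
    """32-bit left shift that, as in SM5, uses only the low 5 bits of b."""
    return (a << (b & 31)) & MASK32

def ishl_masked(a: int, b: int) -> int:
    """The pattern from HLSL->SPIR-V translation: shift(a@32, iand(31, b))."""
    return ishl_sm5(a, b & 31)

# Under SM5 semantics the explicit mask never changes the result,
# so the iand can be dropped.
for count in (0, 1, 31, 32, 33, 63):
    assert ishl_masked(0x12345678, count) == ishl_sm5(0x12345678, count)
```

With SPIR-V semantics alone, a count of 32 or more would be undefined, so the mask could not be removed in general.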
5 years agointel/eu: Add an EOT parameter to send_indirect_[split]_message
Jason Ekstrand [Thu, 7 Feb 2019 23:45:51 +0000 (17:45 -0600)]
intel/eu: Add an EOT parameter to send_indirect_[split]_message

For split indirect sends we have to put the EOT parameter in the
extended descriptor as well as the instruction itself so just calling
brw_inst_set_eot is insufficient.  Moving the EOT handling into
the send_indirect_[split]_message helper lets us handle it properly.

5 years agod3d: meson: do not prefix user provided d3d-drivers-path
Sergii Romantsov [Fri, 22 Feb 2019 09:23:08 +0000 (11:23 +0200)]
d3d: meson: do not prefix user provided d3d-drivers-path

The user can select the location where the d3d drivers
are installed by the d3d-drivers-path meson option.

By default path will be $prefix/$libdir/d3d.

Currently we add $prefix to the user-provided path,
resulting in an incorrect or even missing path.

Based on logic of
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109698
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
5 years agodri: meson: do not prefix user provided dri-drivers-path
Sergii Romantsov [Thu, 21 Feb 2019 08:28:11 +0000 (10:28 +0200)]
dri: meson: do not prefix user provided dri-drivers-path

The user can select the location where the dri drivers
are installed by the dri-drivers-path meson option.

By default path will be $prefix/$libdir/dri.

Currently we add $prefix to the user-provided path,
resulting in an incorrect or even missing path.

v2: fixed dri_search_path by default, rebased to master

v3: new commit-message (Emil Velikov), cc mesa-stable

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109698
CC: Rafael Antognolli <rafael.antognolli@intel.com>
CC: Dylan Baker <dylan@pnwbakers.com>
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Fixes: 306914db92e1 (meson: Add dridriverdir variable to dri.pc.)
Signed-off-by: Sergii Romantsov <sergii.romantsov@globallogic.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
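
The path handling described in the two meson commits above can be sketched as follows. This is a Python model rather than the actual meson.build logic; resolve_drivers_path and its parameters are hypothetical names, and the sketch assumes that an empty option value means "use the default".

```python
import posixpath

def resolve_drivers_path(prefix: str, libdir: str, user_path: str = "") -> str:
    """Return the driver install directory.

    Only the default is built from prefix and libdir; a path supplied by
    the user is returned unchanged instead of being prefixed again.
    """
    if user_path:
        return user_path
    return posixpath.join(prefix, libdir, "dri")
```

For example, resolve_drivers_path("/usr", "lib64") gives "/usr/lib64/dri", while a user-supplied "/opt/mesa/dri" comes back as-is.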
5 years agointel/aub_viewer: silence more compiler warnings
Lionel Landwerlin [Mon, 25 Feb 2019 10:50:59 +0000 (10:50 +0000)]
intel/aub_viewer: silence more compiler warnings

format not a string literal and no format arguments.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agointel/aub_viewer: silence compiler warning
Lionel Landwerlin [Mon, 25 Feb 2019 10:48:52 +0000 (10:48 +0000)]
intel/aub_viewer: silence compiler warning

buffer_addr may be used uninitialized.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agointel/aub_viewer: printout 48bits addresses
Lionel Landwerlin [Mon, 25 Feb 2019 10:47:55 +0000 (10:47 +0000)]
intel/aub_viewer: printout 48bits addresses

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agomesa/core: Enable EXT_depth_clamp for GLES >= 2.0
Gert Wollny [Wed, 2 Jan 2019 14:44:33 +0000 (15:44 +0100)]
mesa/core: Enable EXT_depth_clamp for GLES >= 2.0

The extension NV_depth_clamp is written against OpenGL 1.2.1, and
since GLES 2.0 is based on GL 2.0 there is no reason not to enable
this extension also for GLES >= 2.0.

v2: Use EXT_depth_clamp that has been proposed to Khronos

v3: - Fix check for extension availability (Erik Faye-Lund)
    - Also fix the test in is_enabled
v4: - Test both, ARB and EXT extension (Erik)
v5: - Fix white space errors (Erik)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
5 years agoiris: Properly allow rendering to RGBX formats.
Kenneth Graunke [Fri, 22 Feb 2019 07:37:58 +0000 (23:37 -0800)]
iris: Properly allow rendering to RGBX formats.

I was converting them at pipe_surface creation time, but not when
answering queries about whether formats support rendering.  This caused
a lot of FBO incomplete errors for formats that ought to be supported.

Fixes "Child of Light", which uses PIPE_FORMAT_R8G8B8X8_UNORM_SRGB.

Also fixes Witcher 1 using wined3d (GL) according to Timur Kristóf.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109738

5 years agoiris: Drop RGBX -> RGBA for storage image usages
Kenneth Graunke [Fri, 22 Feb 2019 05:36:05 +0000 (21:36 -0800)]
iris: Drop RGBX -> RGBA for storage image usages

GLSL doesn't expose RGB/RGBX image formats, so this isn't needed.

5 years agomesa: Fix RGBBuffers for renderbuffers with sized internal formats
Kenneth Graunke [Fri, 22 Feb 2019 09:16:41 +0000 (01:16 -0800)]
mesa: Fix RGBBuffers for renderbuffers with sized internal formats

For texture attachments, 'f' is texImg->_BaseFormat, but for
renderbuffer attachments, 'f' is att->Renderbuffer->InternalFormat.

InternalFormat may be something like GL_RGB8, which causes our
(f == GL_RGB) check to fail.  Switch to using a proper _BaseFormat,
which drops the size.

Fixes dEQP-GLES31.functional.draw_buffers_indexed.random.
max_required_draw_buffers.15 on iris when combined with a driver fix.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
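
To illustrate why the (f == GL_RGB) comparison failed for renderbuffers, here is a small Python sketch. The enum values are the standard OpenGL ones, but BASE_FORMAT and base_format are hypothetical stand-ins for Mesa's own base-format lookup and cover only a handful of formats.

```python
GL_RGB = 0x1907
GL_RGBA = 0x1908
GL_RGB8 = 0x8051
GL_RGBA8 = 0x8058

# Tiny stand-in table: sized internal format -> base format.
BASE_FORMAT = {GL_RGB8: GL_RGB, GL_RGBA8: GL_RGBA}

def base_format(internal_format: int) -> int:
    """Drop the size from a sized internal format; pass base formats through."""
    return BASE_FORMAT.get(internal_format, internal_format)

def is_rgb(internal_format: int) -> bool:
    """Compare against GL_RGB only after reducing to the base format."""
    return base_format(internal_format) == GL_RGB
```

A direct comparison of GL_RGB8 with GL_RGB is false, whereas is_rgb(GL_RGB8) is true.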
5 years agoglsl: Fix function return typechecking
Oscar Blumberg [Mon, 11 Feb 2019 16:46:20 +0000 (17:46 +0100)]
glsl: Fix function return typechecking

apply_implicit_conversion only converts and checks base types but we
need actual type equality for function returns, otherwise you can
return a vec2 from a function declared as returning a float.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
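
The difference between a base-type check and full type equality can be sketched like this. GlslType and both functions are hypothetical Python models, not the GLSL compiler's data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GlslType:
    base: str          # e.g. "float", "int"
    components: int    # 1 for scalars, 2 for vec2, ...

def base_types_compatible(declared: GlslType, returned: GlslType) -> bool:
    """Models a check that looks at base types only."""
    return declared.base == returned.base

def return_type_ok(declared: GlslType, returned: GlslType) -> bool:
    """Function returns need the full types to match."""
    return declared == returned
```

A vec2 returned from a function declared as float passes the first check but is rejected by the second.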
5 years agoiris: Always use in-tree i915_drm.h
Jordan Justen [Sun, 24 Feb 2019 22:21:39 +0000 (14:21 -0800)]
iris: Always use in-tree i915_drm.h

Ref: f1374805a86 "drm-uapi: use local files, not system libdrm"
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
5 years agopanfrost: Decode render target swizzle/channels
Alyssa Rosenzweig [Sun, 24 Feb 2019 06:22:23 +0000 (06:22 +0000)]
panfrost: Decode render target swizzle/channels

On MRT-capable systems, the framebuffer format is encoded as a 64-bit
word in the render target descriptor. Previously, the two 32-bit
words were exposed as opaque hex values. This commit identifies a 12-bit
Mali swizzle and a 2-bit channel counter, removing some of the magic. It
also adds decoding support for the AFBC and MSAA enable bits, which were
already known but otherwise ignored in pandecode.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/midgard: Add fround(_even), ftrunc, ffma
Alyssa Rosenzweig [Sat, 23 Feb 2019 01:12:10 +0000 (01:12 +0000)]
panfrost/midgard: Add fround(_even), ftrunc, ffma

These ops were discovered by invoking the correspondingly named GLSL
functions. The rounding ops here behave exactly as expected and are mapped
to their corresponding NIR ops where applicable. The ffma behaves as a
LUT instruction and requires some special argument packing (since
Midgard normally only allows for 2 arguments); this quirk will be
addressed in the future, but for now FMA is still lowered.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/nondrm: Split out dump_counters
Alyssa Rosenzweig [Mon, 18 Feb 2019 23:32:05 +0000 (23:32 +0000)]
panfrost/nondrm: Split out dump_counters

Previously, this function was implicitly part of job submission.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/nondrm: Make COHERENT_LOCAL explicit
Alyssa Rosenzweig [Mon, 25 Feb 2019 02:32:45 +0000 (02:32 +0000)]
panfrost/nondrm: Make COHERENT_LOCAL explicit

This flag corresponds to what was MEM_COHERENT_LOCAL in the vendor
driver, which seems to influence the cache policy, necessary for the
varying temporary storage but nothing else.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/nondrm: Flag CPU-invisible regions
Alyssa Rosenzweig [Mon, 25 Feb 2019 02:31:09 +0000 (02:31 +0000)]
panfrost/nondrm: Flag CPU-invisible regions

Potentially, the kernel could optimize these allocations, or perhaps we
can save on mapping costs.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost/meson: Remove subdir for nondrm
Alyssa Rosenzweig [Fri, 22 Feb 2019 23:08:59 +0000 (23:08 +0000)]
panfrost/meson: Remove subdir for nondrm

This change fixes cross builds with the (temporary) non-DRM overlay.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agopanfrost: Use tiler fast path (performance boost)
Alyssa Rosenzweig [Thu, 21 Feb 2019 05:57:29 +0000 (05:57 +0000)]
panfrost: Use tiler fast path (performance boost)

For reasons that are still unclear (speculation included in the comment
added in this patch), the tiler? metadata has a fast path that we were
not enabling; there looks to be a possible time/memory tradeoff, but the
details remain unclear.

Regardless, this patch improves performance dramatically. Particular
wins are for geometry-heavy scenes. For instance, glmark2-es2's
Phong-shaded bunny, rendering at fullscreen (2400x1600) via GBM, jumped
from ~20fps to hitting vsync cap at 60fps. Gains are even more obvious
when vsync is disabled, as in glmark2-es2-wayland.

With this patch, on GLES 2.0 samples not involving FBOs, it appears
performance is converging with (and sometimes surpassing) the blob.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
5 years agonir/builder: Don't emit no-op swizzles
Jason Ekstrand [Fri, 22 Feb 2019 23:06:39 +0000 (17:06 -0600)]
nir/builder: Don't emit no-op swizzles

The nir_swizzle helper is sometimes used on its own but it's also called by
nir_channel and nir_channels which are used everywhere.  It's pretty
quick to check while we're walking the swizzle anyway whether or not
it's an identity swizzle.  If it is, we now don't bother emitting the
instruction.  Sure, copy-prop will clean it up for us but there's no
sense making more work for the optimizer than we have to.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
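
A minimal sketch of the identity check, in Python rather than the builder's C. The function names are hypothetical, and the requirement that the swizzle length equal the source's component count is an assumption of this sketch.

```python
from typing import Sequence

def is_identity_swizzle(swiz: Sequence[int], src_components: int) -> bool:
    """True when the swizzle selects every source component in order."""
    return (len(swiz) == src_components
            and all(chan == i for i, chan in enumerate(swiz)))

def swizzle(value: tuple, swiz: Sequence[int]) -> tuple:
    """Return value itself for an identity swizzle, else a reordered copy."""
    if is_identity_swizzle(swiz, len(value)):
        return value  # nothing new is emitted
    return tuple(value[chan] for chan in swiz)
```

The check costs one pass over the swizzle, which the helper makes anyway.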
5 years agonir/split_vars: Don't compact vectors unnecessarily
Jason Ekstrand [Sat, 23 Feb 2019 04:10:55 +0000 (22:10 -0600)]
nir/split_vars: Don't compact vectors unnecessarily

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
5 years agost/mesa: remove unused header-file
Erik Faye-Lund [Wed, 20 Feb 2019 20:50:50 +0000 (21:50 +0100)]
st/mesa: remove unused header-file

This header has been unused since f8f2520e88c ("st/mesa: Remove
unnecessary headers"). And in the more than 8 years since, this
hasn't been useful. So let's just get rid of it.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agoconfigure: fix test portability
Maya Rashish [Thu, 10 Jan 2019 14:18:48 +0000 (16:18 +0200)]
configure: fix test portability

From the bash manual:

string1 == string2
string1 = string2
       True if the strings are equal.  = should be used with the test
       command for POSIX conformance.

5 years agomeson: ensure that xmlpool_options.h is generated for gallium targets that need it
David Shao [Sun, 24 Feb 2019 09:00:36 +0000 (09:00 +0000)]
meson: ensure that xmlpool_options.h is generated for gallium targets that need it

Fixes: 68076b87474e7959c161 "meson: build gallium vdpau state tracker"
Fixes: 22a817af8a89eb3c762f "meson: build gallium xvmc state tracker"
Fixes: 5a785d51a6d68ec676ce "meson: build gallium va state tracker"
Fixes: 0ba909f0f111824223bc "meson: build gallium xa state tracker"
Fixes: 1d36dc674d528b93bec3 "meson: build gallium omx state tracker"
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agovulkan/overlay: Add fps counter
Matthias Lorenz [Fri, 22 Feb 2019 23:08:28 +0000 (00:08 +0100)]
vulkan/overlay: Add fps counter

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109747

5 years agoRevert "anv: add support for INTEL_DEBUG=bat"
Lionel Landwerlin [Sun, 24 Feb 2019 01:06:39 +0000 (01:06 +0000)]
Revert "anv: add support for INTEL_DEBUG=bat"

This reverts commit e4d88396d259c4ec6032d2834d1c9073d55e9b45.

Apologies, I pushed the wrong commit.

5 years agoanv: add support for INTEL_DEBUG=bat
Lionel Landwerlin [Sat, 23 Feb 2019 23:27:17 +0000 (23:27 +0000)]
anv: add support for INTEL_DEBUG=bat

As requested by Ken ;)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoetnaviv: blt: mark used src resource as read from
Christian Gmeiner [Fri, 22 Feb 2019 10:10:29 +0000 (11:10 +0100)]
etnaviv: blt: mark used src resource as read from

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agoetnaviv: rs: mark used src resource as read from
Christian Gmeiner [Fri, 22 Feb 2019 10:02:34 +0000 (11:02 +0100)]
etnaviv: rs: mark used src resource as read from

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
5 years agogallium/auxiliary/vl: Fix duplicate symbol build errors.
Vinson Lee [Tue, 19 Feb 2019 03:27:27 +0000 (19:27 -0800)]
gallium/auxiliary/vl: Fix duplicate symbol build errors.

  CXXLD    gallium_dri.la
duplicate symbol _compute_shader_video_buffer in:
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)
duplicate symbol _compute_shader_weave in:
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)
duplicate symbol _compute_shader_rgba in:
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor.o)
    ../../../../src/gallium/auxiliary/.libs/libgalliumvl.a(libgalliumvl_la-vl_compositor_cs.o)

Fixes: 9364d66cb7f7 ("gallium/auxiliary/vl: Add video compositor compute shader render")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: James Zhu <James.Zhu@amd.com>
5 years agonir: fix MSVC build
Caio Marcelo de Oliveira Filho [Sat, 23 Feb 2019 06:38:05 +0000 (22:38 -0800)]
nir: fix MSVC build

Zero initialize struct with {0} instead of {}.

5 years agonir/copy_prop_vars: add tests for load/store elements of vectors
Caio Marcelo de Oliveira Filho [Mon, 14 Jan 2019 21:52:36 +0000 (13:52 -0800)]
nir/copy_prop_vars: add tests for load/store elements of vectors

Test using array deref on vectors in loads and stores.  These are
marked DISABLED_ as this optimization is currently not done.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: nir_build_deref_follower accept array derefs of vectors
Caio Marcelo de Oliveira Filho [Tue, 15 Jan 2019 00:10:44 +0000 (16:10 -0800)]
nir: nir_build_deref_follower accept array derefs of vectors

Code itself already supports it, just make sure we can use it for
those cases.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir/copy_prop_vars: change test helper to get intrinsics
Caio Marcelo de Oliveira Filho [Mon, 14 Jan 2019 21:33:00 +0000 (13:33 -0800)]
nir/copy_prop_vars: change test helper to get intrinsics

Replace find_next_intrinsic(intrinsic, after) with
get_intrinsic(intrinsic, index).  This makes it slightly more convenient
to check the resulting loads/stores/copies, since in most tests we
know which one we care about.  The cost is to perform more traversals,
but for such tests this is not a problem.

Added the ASSERT_EQ() on count to some tests missing it, so the
indices queried are always expected to find something.

Also, drop two nir_print_shader leftover calls in a test.

v2: Remove redundant assertions.  nir_src_comp_as_uint already
    assert what we need.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
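
The index-based lookup can be modelled as below. This is a Python sketch over a list of (name, payload) tuples, not the test helper itself, and it assumes a 0-based index.

```python
def get_intrinsic(instrs, name, index):
    """Return the index-th instruction (0-based) whose name matches, else None."""
    seen = 0
    for instr in instrs:
        if instr[0] == name:
            if seen == index:
                return instr
            seen += 1
    return None
```

Each call walks the list from the start, which is the extra traversal cost the commit message mentions.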
5 years agonir/copy_prop_vars: keep track of components in copy_entry
Caio Marcelo de Oliveira Filho [Mon, 14 Jan 2019 20:26:30 +0000 (12:26 -0800)]
nir/copy_prop_vars: keep track of components in copy_entry

When a copy_entry is SSA, store not only the nir_ssa_def* for each
component, but also the source component they come from.  At the
moment this is always a match (i.e. 'component[i] == i'), because all
the operations for a copy_entry happen using definitions with the same
size.  This prepares the code for array_derefs of vectors, in which
'component[i] != i'.

Also, extract setting all SSA components into a function of its own.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir/copy_prop_vars: add debug helpers
Caio Marcelo de Oliveira Filho [Sat, 12 Jan 2019 00:27:24 +0000 (16:27 -0800)]
nir/copy_prop_vars: add debug helpers

Disabled by default, to be used during development.  Adding those
so I don't rewrite some ad-hoc version of them every time I'm working
with this pass.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir/copy_prop_vars: don't get confused by array_deref of vectors
Caio Marcelo de Oliveira Filho [Tue, 8 Jan 2019 23:53:02 +0000 (15:53 -0800)]
nir/copy_prop_vars: don't get confused by array_deref of vectors

For now these derefs are not handled, so don't let these get into the
copies list -- which would cause wrong propagations.  For load_derefs,
do nothing.  For store_derefs, invalidate whatever the store is
writing to.  For copy_derefs, invalidate whatever the copy is writing
to.

These cases will happen once derefs to SSBOs/UBOs are kept around long
enough to get optimized by copy_prop_vars.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: allow nir_lower_phis_to_scalar() on more src types
Timothy Arceri [Fri, 22 Feb 2019 05:59:13 +0000 (16:59 +1100)]
nir: allow nir_lower_phis_to_scalar() on more src types

Rather than only lowering if all srcs are scalarizable we instead
check that at least one src is scalarizable.

We change undef type to return false otherwise it will cause
regressions when it is the only scalarizable src.

total instructions in shared programs: 13219105 -> 13024547 (-1.47%)
instructions in affected programs: 1153797 -> 959239 (-16.86%)
helped: 581
HURT: 74

total cycles in shared programs: 333968972 -> 324807922 (-2.74%)
cycles in affected programs: 129809402 -> 120648352 (-7.06%)
helped: 571
HURT: 131

total spills in shared programs: 57947 -> 29130 (-49.73%)
spills in affected programs: 53364 -> 24547 (-54.00%)
helped: 351
HURT: 0

total fills in shared programs: 51310 -> 25468 (-50.36%)
fills in affected programs: 44882 -> 19040 (-57.58%)
helped: 351
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
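
The change in the lowering condition amounts to going from "all" to "any". The sketch below is a hypothetical Python model in which each phi source is reduced to a boolean saying whether it is scalarizable.

```python
from typing import Sequence

def should_lower_old(srcs_scalarizable: Sequence[bool]) -> bool:
    """Old rule: lower only if every source is scalarizable."""
    return all(srcs_scalarizable)

def should_lower_new(srcs_scalarizable: Sequence[bool]) -> bool:
    """New rule: lower if at least one source is scalarizable."""
    return any(srcs_scalarizable)
```

In this model, treating undef as not scalarizable corresponds to passing False for that source, so a phi whose only candidate source is an undef is left alone.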
5 years agoswr/rast: bypass size limit for non-sampled textures
Alok Hota [Thu, 21 Feb 2019 20:41:15 +0000 (14:41 -0600)]
swr/rast: bypass size limit for non-sampled textures

This fixes a bug where SWR will fail to render in cases with large
buffer allocations, e.g. very large meshes whose vertex buffers exceed
2GB

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
5 years agotgsi: don't set tgsi_info::uses_bindless_images for constbufs and hw atomics
Marek Olšák [Wed, 20 Feb 2019 22:21:32 +0000 (17:21 -0500)]
tgsi: don't set tgsi_info::uses_bindless_images for constbufs and hw atomics

This might have decreased performance for radeonsi/tgsi, because most
shaders claimed they used bindless.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
5 years agoiris: Add gitlab-ci build testing
Jordan Justen [Sun, 17 Feb 2019 01:39:45 +0000 (17:39 -0800)]
iris: Add gitlab-ci build testing

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
5 years agofreedreno/a6xx: cube image fix
Rob Clark [Fri, 22 Feb 2019 18:09:25 +0000 (13:09 -0500)]
freedreno/a6xx: cube image fix

Note that emit_intrinsic_load_image() already swaps a .3d flag with an
.a flag.  I tried doing things the other way around (going back to .3d)
but that didn't work.  And treating cube images as 2d array is also what
blob does, so let's just go with that.

Fixes dEQP-GLES31.functional.image_load_store.cube.load_store.*

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/a6xx: fix border-color offset
Rob Clark [Thu, 21 Feb 2019 20:44:35 +0000 (15:44 -0500)]
freedreno/a6xx: fix border-color offset

Fixes nearly all of dEQP-GLES31.functional.texture.border_clamp.* when
run after a test that binds textures used in vertex shader.

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno/ir3: don't hardcode wrmask
Rob Clark [Thu, 21 Feb 2019 19:46:10 +0000 (14:46 -0500)]
freedreno/ir3: don't hardcode wrmask

Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.sampler.const_literal.vertex.samplercubeshadow
and a few other similar tests that do multiple texture fetches into
individual components of a packet output.  Mostly works around the
issue mentioned in ra_block_find_definers().

Signed-off-by: Rob Clark <robdclark@gmail.com>
5 years agofreedreno: fix race condition
Rob Clark [Tue, 19 Feb 2019 14:29:49 +0000 (09:29 -0500)]
freedreno: fix race condition

rsc->write_batch can be cleared behind our back, so we need to acquire
the lock *before* deref'ing.

Signed-off-by: Rob Clark <robdclark@gmail.com>
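
The locking order described above can be illustrated with a small Python model. Resource and flush_write_batch are hypothetical names and do not correspond to the driver's code; the point is only that the field is read after the lock is taken.

```python
import threading

class Resource:
    def __init__(self):
        self.write_batch = None

def flush_write_batch(rsc: Resource, lock: threading.Lock, flushed: list) -> None:
    """Read rsc.write_batch only while holding the lock."""
    with lock:
        batch = rsc.write_batch  # another thread may already have cleared it
        if batch is not None:
            flushed.append(batch)
            rsc.write_batch = None
```

Reading rsc.write_batch before taking the lock would leave a window in which another thread clears it between the read and the use.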
5 years agovulkan: Fix 32-bit build for the new overlay layer
Kenneth Graunke [Fri, 22 Feb 2019 03:07:29 +0000 (19:07 -0800)]
vulkan: Fix 32-bit build for the new overlay layer

vulkan_core.h defines non-dispatchable handles as (struct object *)
on 64-bit systems, but uint64_t on 32-bit systems.  The former can be
implicitly cast to void *, but the latter requires an explicit cast.

While here, %lu is the wrong format specifier for uint64_t on 32-bit
systems, so use PRIu64, fixing a warning.

Reported-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
5 years agoanv: advertise 8 subpixel precision bits
Juan A. Suarez Romero [Fri, 22 Feb 2019 15:47:53 +0000 (16:47 +0100)]
anv: advertise 8 subpixel precision bits

On the one hand, when emitting 3DSTATE_SF, VertexSubPixelPrecisionSelect
selects between 8-bit subpixel precision (value 0) and 4-bit subpixel
precision (value 1). As this field is never set, it takes the value 0,
so 8 bits are used.

On the other hand, the Vulkan CTS reference rasterizer, which computes
the expected values for the tests, uses 8-bit precision. If it is
changed to use 4 bits, as ANV has been advertising so far, some of the
tests fail.

So it seems ANV is actually using 8 bits.

v2: explicitly set 3DSTATE_SF::VertexSubPixelPrecisionSelect (Jason)

v3: use _8Bit definition as value (Jason)

v4: (by Jason)
anv: Explicitly set 3DSTATE_CLIP::VertexSubPixelPrecisionSelect

This field was added on gen8 even though there's an identically defined
one in 3DSTATE_SF.

CC: Jason Ekstrand <jason@jlekstrand.net>
CC: Kenneth Graunke <kenneth@whitecape.org>
CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
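
As a worked illustration of what the two precisions mean, the Python sketch below snaps a coordinate to a grid of 2**bits positions per pixel. The function name is hypothetical, and round-to-nearest is an assumption of this sketch rather than a statement about the hardware's rounding rule.

```python
import math

def snap_to_subpixel(coord: float, bits: int) -> float:
    """Snap a coordinate to a grid with 2**bits positions per pixel."""
    scale = 1 << bits
    return math.floor(coord * scale + 0.5) / scale
```

With 4 bits the grid step is 1/16 of a pixel and 0.3 snaps to 0.3125; with 8 bits the step is 1/256 and 0.3 snaps to 0.30078125, which is why a reference rasterizer using the other precision produces different expected values.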
5 years agogenxml: add missing field values for 3DSTATE_SF
Juan A. Suarez Romero [Fri, 22 Feb 2019 15:16:24 +0000 (16:16 +0100)]
genxml: add missing field values for 3DSTATE_SF

Fill out "Vertex Sub Pixel Precision Select" possible values.

CC: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoradv: Allow interpolation on non-float types.
Bas Nieuwenhuizen [Fri, 22 Feb 2019 13:24:28 +0000 (14:24 +0100)]
radv: Allow interpolation on non-float types.

In particular structs containing floats and 16-bit floating point
types.

Fixes: 62024fa7750 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
Fixes: da295946361 "spirv: Only split blocks"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109735
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agoradv: Fix float16 interpolation set up.
Bas Nieuwenhuizen [Fri, 22 Feb 2019 13:16:08 +0000 (14:16 +0100)]
radv: Fix float16 interpolation set up.

float16 types can have non-flat interpolation so set up the HW
correctly for that.

Fixes: 62024fa7750 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
5 years agonv50: disable compute
Ilia Mirkin [Fri, 22 Feb 2019 14:40:37 +0000 (09:40 -0500)]
nv50: disable compute

It causes more trouble than it's worth. Now vl tries to create compute
shaders without all the proper checking. Since there's really no
(current) way to use compute on nv50, just mark it disabled.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109742
Fixes: f6ac0b5d71 ("gallium/auxiliary/vl: Add compute shader to support video compositor render")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
5 years agointel: fix urb size for CFL GT1
Lionel Landwerlin [Wed, 20 Feb 2019 12:49:17 +0000 (12:49 +0000)]
intel: fix urb size for CFL GT1

The same 192KB amount as SKL/KBL GT1 applies.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Fixes: de7ed0ba5522 ("i965/CFL: Add PCI Ids for Coffee Lake.")
5 years agoisl: the display engine requires 64B alignment for linear surfaces
Samuel Iglesias Gonsálvez [Tue, 19 Feb 2019 12:06:25 +0000 (13:06 +0100)]
isl: the display engine requires 64B alignment for linear surfaces

v2: Add PRM quote (Lionel)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
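
Rounding up to a 64-byte boundary can be sketched as follows. The helper names are hypothetical, and applying the alignment to the row pitch of a linear surface is an assumption of this sketch, since the commit message does not say which quantity is aligned.

```python
def align(value: int, alignment: int) -> int:
    """Round value up to a multiple of alignment (which must be a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    return (value + alignment - 1) & ~(alignment - 1)

def linear_row_pitch(width_px: int, bytes_per_pixel: int) -> int:
    """Row pitch in bytes, padded to a 64-byte boundary."""
    return align(width_px * bytes_per_pixel, 64)
```

A 100-pixel row at 4 bytes per pixel needs 400 bytes and is padded to 448, while a 1920-pixel row (7680 bytes) is already aligned.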
5 years agovirgl: Enable mixed color FBO attachments only when the host supports
Gert Wollny [Thu, 14 Feb 2019 14:21:30 +0000 (15:21 +0100)]
virgl: Enable mixed color FBO attachments only when the host supports
it

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
5 years agoandroid: intel/isl: remove redundant building rules
Mauro Rossi [Thu, 21 Feb 2019 23:30:38 +0000 (00:30 +0100)]
android: intel/isl: remove redundant building rules

Fixes the following build error:

including ./external/mesa/Android.mk ...
build/core/base_rules.mk:183: *** external/mesa/src/intel:
MODULE.TARGET.STATIC_LIBRARIES.libmesa_isl_tiled_memcpy already defined by external/mesa/src/intel.
make: *** [build/core/ninja.mk:164: out/build-android_x86_64.ninja] Error 1

ISL_TILED_MEMCPY_FILES is isl/isl_tiled_memcpy_normal.c
and that source file includes isl_tiled_memcpy.c source

Fixes: 96bb328 ("iris: add Android build")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
5 years agoRevert "iris: Enable auxiliary buffer support"
Kenneth Graunke [Thu, 21 Feb 2019 23:50:14 +0000 (15:50 -0800)]
Revert "iris: Enable auxiliary buffer support"

This reverts commit cd0ced49e7957182d23e21657445b720184ea425.

It breaks glxgears rendering.

5 years agoiris: Enable -msse2 and -mstackrealign
Kenneth Graunke [Thu, 21 Feb 2019 22:29:00 +0000 (14:29 -0800)]
iris: Enable -msse2 and -mstackrealign

This is needed for gen_clflush.h intrinsics to work on 32-bit builds.
i965 and anv both set these, and iris needs to as well.

Tested-by: Mark Janes <mark.a.janes@intel.com>
5 years agointel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.
Francisco Jerez [Fri, 18 Jan 2019 19:38:17 +0000 (11:38 -0800)]
intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.

Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence -- Return false from has_dst_aligned_region_restriction()
for such instructions as a micro-optimization.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/fs: Implement extended strides greater than 4 for IR source regions.
Francisco Jerez [Fri, 18 Jan 2019 20:51:57 +0000 (12:51 -0800)]
intel/fs: Implement extended strides greater than 4 for IR source regions.

Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region.  The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one of the 32-bit sources, which can blow up if
the stride of the original source was already the maximum value
allowed by the hardware.

An alternative would be to use the regioning legalization pass in
order to lower such strides into the composition of multiple legal
strides, but that would be somewhat less efficient.

This showed up as a regression from my commit cbea91eb57a501bebb1ca2
in Vulkan 1.1 CTS tests on CHV/BXT platforms, however it was really a
pre-existing problem that had affected conformance on other platforms
without native support for integer multiplication.  CHV/BXT were
getting around it because the code I removed in that commit had the
"fortunate" side effect of emitting narrower regions that didn't hit
the hardware stride limit after lowering.  Beyond fixing the
regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
ICL (that's why this patch is marked for inclusion in mesa-stable even
though the original regressing patch was not).

According to Jason, a nearly equivalent change had been committed
previously as e8c9e65185de3e821e1 and then (mistakenly?) reverted as
a31d0382084c8aa8.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/fs: Cap dst-aligned region stride to maximum representable hstride value.
Francisco Jerez [Thu, 17 Jan 2019 02:49:47 +0000 (18:49 -0800)]
intel/fs: Cap dst-aligned region stride to maximum representable hstride value.

This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four are still illegal for the destination.
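
A minimal Python sketch of the capping described above, under the
assumption that operands are given as (stride in elements, type size in
bytes) pairs. The helper name and data layout are hypothetical and are not
taken from the patch itself.

```python
MAX_HSTRIDE = 4  # destination strides greater than four are illegal


def capped_dst_byte_stride(operands):
    """Pick a destination byte stride that stays representable.

    operands -- list of (stride_in_elements, type_size_in_bytes) pairs
                for the destination and the sources being lowered
    """
    # Largest byte stride among all operands of the instruction.
    max_stride = max(stride * size for stride, size in operands)
    # Narrowest type involved; it limits what hstride <= 4 can express.
    min_size = min(size for _, size in operands)
    # Never exceed the largest stride a destination hstride can encode.
    return min(max_stride, MAX_HSTRIDE * min_size)
```

With a 32-bit destination of stride 1 and a 32-bit source of stride 8, the
uncapped value would be 32 bytes, whereas the capped result is 16 bytes,
that is, a destination stride of four.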

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/fs: Lower integer multiply correctly when destination stride equals 4.
Francisco Jerez [Thu, 17 Jan 2019 03:01:04 +0000 (19:01 -0800)]
intel/fs: Lower integer multiply correctly when destination stride equals 4.

Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four.  Because the CHV/BXT alignment
restrictions are now being enforced by the regioning lowering pass run
after lower_integer_multiplication(), there is no real need to
preserve the original strides anymore.
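
The stride arithmetic behind this can be sketched in Python as follows.
This is an illustration under assumptions, not the lowering pass itself:
the helper names are hypothetical, and the multiply is shown only as the
general technique of building a 32-bit product from two 32x16-bit
products.

```python
MAX_DST_STRIDE = 4  # strides greater than four are illegal for the destination
MASK32 = 0xFFFFFFFF


def word_view_stride(dword_stride):
    """Stride, in 16-bit elements, of a dword region accessed with word type."""
    return 2 * dword_stride


def is_legal_dst_stride(stride):
    return stride <= MAX_DST_STRIDE


def mul32_via_16bit_halves(a, b):
    """Low 32 bits of a * b, built from two 32x16-bit multiplies."""
    low = (a * (b & 0xFFFF)) & MASK32           # a times the low word of b
    high = (a * ((b >> 16) & 0xFFFF)) & MASK32  # a times the high word of b
    return (low + (high << 16)) & MASK32
```

A destination with a dword stride of 4 would need a word stride of 8 for
the "low" temporary if the original alignment were preserved, and
is_legal_dst_stride(word_view_stride(4)) is false, which is the case this
commit addresses.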

Note that this bug can be reproduced on stable branches, but
back-porting would be non-trivial, because the fix relies on the
regioning lowering pass recently introduced.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agointel/fs: Exclude control sources from execution type and region alignment calculations.
Francisco Jerez [Thu, 17 Jan 2019 02:30:08 +0000 (18:30 -0800)]
intel/fs: Exclude control sources from execution type and region alignment calculations.

Currently the execution type calculation will return a bogus value in
cases like:

  mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u

which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.
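
A small Python sketch of the intended behavior, assuming the execution
type is taken as the widest type among the sources that take part in the
operation. The function name is hypothetical, and treating source indices
1 and 2 of the mov_indirect example as control sources is an assumption
made for illustration.

```python
def exec_type_size(src_type_sizes, control_sources=()):
    """Widest source type size in bytes, ignoring control sources.

    src_type_sizes  -- type size in bytes of each source, in order
    control_sources -- indices of sources that only steer the operation
    """
    sizes = [size for index, size in enumerate(src_type_sizes)
             if index not in control_sources]
    return max(sizes)


# Sources of the example above: vgrf1:w (2 bytes), vgrf2:ud (4 bytes)
# and the immediate 32u (4 bytes).
mov_indirect_sources = [2, 4, 4]
```

Calling exec_type_size(mov_indirect_sources) gives the bogus 4-byte
result, while exec_type_size(mov_indirect_sources, control_sources={1, 2})
gives 2 bytes, matching the 16-bit move that is actually carried out.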

Similarly there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions.  Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrary strided region.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass."
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agonir: clone instruction set rather than removing individual entries
Timothy Arceri [Wed, 20 Feb 2019 03:03:37 +0000 (14:03 +1100)]
nir: clone instruction set rather than removing individual entries

This reduces the time spent in nir_opt_cse() by almost half.

The massif tool from Valgrind reported no change in peak
memory use with the large Dolphin uber shaders I used for
testing.

Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agogenxml: Remove extra space in gen4/45/5 field name
Jordan Justen [Fri, 18 Aug 2017 00:28:23 +0000 (17:28 -0700)]
genxml: Remove extra space in gen4/45/5 field name

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agogenxml/gen_bits_header.py: Use regex to strip no alphanum chars
Jordan Justen [Thu, 17 Aug 2017 22:44:53 +0000 (15:44 -0700)]
genxml/gen_bits_header.py: Use regex to strip no alphanum chars

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
5 years agoiris: Enable auxiliary buffer support
Kenneth Graunke [Thu, 14 Feb 2019 01:31:52 +0000 (17:31 -0800)]
iris: Enable auxiliary buffer support

This currently regresses KHR-GL4x.compute_shader.resource-texture,
but that's a pre-existing bug (https://bugs.freedesktop.org/109113)
which should be fixed up once we have fast clear support.

5 years agoiris: Flag ALL_DIRTY_BINDINGS on aux state change.
Rafael Antognolli [Wed, 20 Feb 2019 01:12:19 +0000 (17:12 -0800)]
iris: Flag ALL_DIRTY_BINDINGS on aux state change.

If we change the aux state for a given resource, we need to re-emit the
binding table pointers for any stage that has such resource bound. Since
we don't track that, flag IRIS_ALL_DIRTY_BINDINGS and emit all of them.

5 years agoiris: Skip resolve if there's no context.
Rafael Antognolli [Wed, 20 Feb 2019 01:08:14 +0000 (17:08 -0800)]
iris: Skip resolve if there's no context.

If iris_resource_get_handle() gets called without a context, we can't
resolve the resource. Hopefully it shouldn't be compressed anyway, so
let's just add an assert to ensure it's correct.

5 years agoiris/clear: Pass on render_condition_enabled.
Rafael Antognolli [Fri, 15 Feb 2019 23:23:56 +0000 (15:23 -0800)]
iris/clear: Pass on render_condition_enabled.

5 years agoiris: Avoid leaking if we fail to allocate the aux buffer.
Rafael Antognolli [Fri, 15 Feb 2019 22:16:04 +0000 (14:16 -0800)]
iris: Avoid leaking if we fail to allocate the aux buffer.

Otherwise we could leak the aux state map or the aux BO.

5 years agoiris: Only resolve compute resources for compute shaders
Kenneth Graunke [Thu, 14 Feb 2019 07:10:39 +0000 (23:10 -0800)]
iris: Only resolve compute resources for compute shaders

5 years agoiris: Fix aux usage in render resolve code
Kenneth Graunke [Thu, 14 Feb 2019 06:31:07 +0000 (22:31 -0800)]
iris: Fix aux usage in render resolve code

5 years agoiris: Pin HiZ buffers when rendering.
Rafael Antognolli [Wed, 13 Feb 2019 18:20:41 +0000 (10:20 -0800)]
iris: Pin HiZ buffers when rendering.

5 years agoiris: Flush before hiz_exec.
Rafael Antognolli [Wed, 6 Feb 2019 00:40:14 +0000 (16:40 -0800)]
iris: Flush before hiz_exec.

5 years agoiris: Allow disabling aux via INTEL_DEBUG options
Kenneth Graunke [Tue, 11 Dec 2018 08:43:05 +0000 (00:43 -0800)]
iris: Allow disabling aux via INTEL_DEBUG options

5 years agoiris: do flush for buffers still
Kenneth Graunke [Tue, 11 Dec 2018 07:13:23 +0000 (23:13 -0800)]
iris: do flush for buffers still

5 years agoiris: make surface states for CCS_D too
Kenneth Graunke [Tue, 11 Dec 2018 06:41:34 +0000 (22:41 -0800)]
iris: make surface states for CCS_D too

CCS_E can fall back to CCS_D with incompatible format views.

CCS_D is pretty useless without fast clears and we may as well use NONE,
but we're surely going to hook those up at some point, so may as well
just go ahead and do it now...

5 years agoiris: Skip msaa16 on gen < 9.
Rafael Antognolli [Mon, 4 Feb 2019 23:16:18 +0000 (15:16 -0800)]
iris: Skip msaa16 on gen < 9.

Also needed to add gen information to KEY_INIT.

5 years agoiris: Set program key fields for MCS
Kenneth Graunke [Tue, 11 Dec 2018 06:03:14 +0000 (22:03 -0800)]
iris: Set program key fields for MCS

5 years agoiris: don't use hiz for MSAA buffers
Kenneth Graunke [Tue, 11 Dec 2018 05:54:44 +0000 (21:54 -0800)]
iris: don't use hiz for MSAA buffers

5 years agoiris: some initial HiZ bits
Kenneth Graunke [Mon, 10 Dec 2018 08:35:48 +0000 (00:35 -0800)]
iris: some initial HiZ bits

5 years agoiris: disable aux for external things
Kenneth Graunke [Mon, 10 Dec 2018 07:12:33 +0000 (23:12 -0800)]
iris: disable aux for external things

5 years agoiris: Resolves for compute
Kenneth Graunke [Mon, 10 Dec 2018 03:08:40 +0000 (19:08 -0800)]
iris: Resolves for compute

5 years agoiris: consider framebuffer parameter for aux usages
Kenneth Graunke [Mon, 10 Dec 2018 03:07:13 +0000 (19:07 -0800)]
iris: consider framebuffer parameter for aux usages

5 years agoiris: Make blit code use actual aux usages
Kenneth Graunke [Mon, 10 Dec 2018 00:09:55 +0000 (16:09 -0800)]
iris: Make blit code use actual aux usages

5 years agoiris: store modifier info in res
Kenneth Graunke [Sun, 9 Dec 2018 20:11:17 +0000 (12:11 -0800)]
iris: store modifier info in res

5 years agoiris: pin the buffers
Kenneth Graunke [Sat, 8 Dec 2018 19:52:55 +0000 (11:52 -0800)]
iris: pin the buffers

5 years agoiris: resolve before transfer maps
Kenneth Graunke [Sat, 8 Dec 2018 19:40:25 +0000 (11:40 -0800)]
iris: resolve before transfer maps

5 years agoiris: be sure to skip buffers in resolve code
Kenneth Graunke [Sat, 8 Dec 2018 10:01:19 +0000 (02:01 -0800)]
iris: be sure to skip buffers in resolve code

Buffers don't have ISL surfaces, and this can get us into trouble.

5 years agoiris: try to fix copyimage vs copybuffers
Kenneth Graunke [Sat, 8 Dec 2018 09:32:10 +0000 (01:32 -0800)]
iris: try to fix copyimage vs copybuffers

5 years agoiris: actually use the multiple surf states for aux modes
Kenneth Graunke [Sat, 8 Dec 2018 03:51:05 +0000 (19:51 -0800)]
iris: actually use the multiple surf states for aux modes

5 years agoiris: add some draw resolve hooks
Kenneth Graunke [Sat, 8 Dec 2018 02:13:07 +0000 (18:13 -0800)]
iris: add some draw resolve hooks

5 years agoiris: blorp using resolve hooks
Kenneth Graunke [Fri, 7 Dec 2018 21:33:25 +0000 (13:33 -0800)]
iris: blorp using resolve hooks

5 years agoiris: Initial import of resolve code
Kenneth Graunke [Fri, 7 Dec 2018 19:54:16 +0000 (11:54 -0800)]
iris: Initial import of resolve code

5 years agoiris: create aux surface if needed
Kenneth Graunke [Fri, 7 Dec 2018 19:54:02 +0000 (11:54 -0800)]
iris: create aux surface if needed

5 years agoiris: Fill out SURFACE_STATE entries for each possible aux usage
Kenneth Graunke [Fri, 7 Dec 2018 19:33:13 +0000 (11:33 -0800)]
iris: Fill out SURFACE_STATE entries for each possible aux usage