mesa.git
8 years agolinker: Accurately track gl_uniform_block::stageref
Ian Romanick [Mon, 7 Nov 2016 23:59:35 +0000 (15:59 -0800)]
linker: Accurately track gl_uniform_block::stageref

As the linked per-stage shaders are processed, mark any block that has a
field that is accessed as referenced.  When combining all the linked
shaders, combine the per-stage stageref masks.
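
As a rough illustration of the combining step (this is not the actual
linker code; the stage enum, struct, and function names are hypothetical),
each linked stage contributes one bit, and the program-wide mask is the
bitwise OR of the per-stage masks:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stage indices, for illustration only. */
enum sketch_stage {
    SKETCH_STAGE_VERTEX,
    SKETCH_STAGE_GEOMETRY,
    SKETCH_STAGE_FRAGMENT,
    SKETCH_STAGE_COUNT
};

/* One entry per linked stage: the same block as seen by that stage. */
struct sketch_block {
    uint8_t stageref; /* bit N set => some field is accessed in stage N */
};

/* Mark the block as referenced by `stage` when a field access is seen. */
static void
sketch_mark_referenced(struct sketch_block *blk, enum sketch_stage stage)
{
    blk->stageref |= (uint8_t)(1u << stage);
}

/* Combine the per-stage masks into the program-wide mask. */
static uint8_t
sketch_combine_stageref(const struct sketch_block *per_stage, size_t count)
{
    uint8_t combined = 0;

    for (size_t i = 0; i < count; i++)
        combined |= per_stage[i].stageref;

    return combined;
}
```

A block whose only access was optimized away in a stage never reaches
sketch_mark_referenced() for that stage, so its bit stays clear.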

This fixes a number of GLES CTS tests:

    ES31-CTS.core.geometry_shader.program_resource.program_resource
    ES32-CTS.core.geometry_shader.program_resource.program_resource
    ESEXT-CTS.geometry_shader.program_resource.program_resource
    piglit.gl45-cts.geometry_shader.program_resource.program_resource

However, it makes quite a few more fail:

    ES31-CTS.functional.program_interface_query.buffer_variable.random.6
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.compute.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.separable_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment_only_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment_only_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment_only_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment_only_fragment.unnamed_block.float
    ES31-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.random.6
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.compute.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.separable_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment_only_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment_only_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_geo_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment_only_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment_only_fragment.unnamed_block.float
    ES32-CTS.functional.program_interface_query.buffer_variable.referenced_by.vertex_tess_geo_fragment.unnamed_block.float

I have diagnosed the failures, but I'm not sure whether we or the
tests are wrong.  After optimizations are applied, all of the tests
are of the form:

    buffer X {
        float f;
    } x;

    void main()
    {
        x.f = x.f;
    }

The test then queries that x is referenced by that shader stage.  We
eliminate the assignment of x.f to itself, and that removes the last
reference to x.  We report that x is not referenced, and the test fails.
I do not know whether or not we are allowed to eliminate that assignment
of x.f to itself.

After discussions with the OpenGL ES group in Khronos, we believe that
Mesa's behavior is correct.  I will provide patches to the CTS tests
to Khronos.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
8 years agolinker: Slight code rearrange to prevent duplication in the next commit
Ian Romanick [Mon, 7 Nov 2016 23:54:46 +0000 (15:54 -0800)]
linker: Slight code rearrange to prevent duplication in the next commit

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agolinker: Trivial coding standards fixes
Ian Romanick [Fri, 4 Nov 2016 19:36:08 +0000 (12:36 -0700)]
linker: Trivial coding standards fixes

v2: Revert the unreachable to assert in
parcel_out_uniform_storage::visit_field.  Suggested by Ilia.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoglsl: Add some comments to methods of ir_variable_refcount_visitor
Ian Romanick [Mon, 7 Nov 2016 22:24:39 +0000 (14:24 -0800)]
glsl: Add some comments to methods of ir_variable_refcount_visitor

It was not obvious from just the .h file what the hash table
contained.  It was also not obvious that get_variable_entry would create
a new entry in the hash table.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agollvmpipe: Fix build after removal of deprecated attribute API v2
Aaron Watry [Tue, 8 Nov 2016 03:55:14 +0000 (21:55 -0600)]
llvmpipe: Fix build after removal of deprecated attribute API v2

Applies on top of v3 of Tom's gallivm change.

v2:
  - Tom Stellard: Use enums instead of strings.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Signed-off-by: Aaron Watry <awatry@gmail.com>
CC: Tom Stellard <thomas.stellard@amd.com>
CC: Jan Vesely <jan.vesely@rutgers.edu>
8 years agogallivm: Fix build after removal of deprecated attribute API v3
Tom Stellard [Mon, 7 Nov 2016 18:35:09 +0000 (18:35 +0000)]
gallivm: Fix build after removal of deprecated attribute API v3

v2:
  Fix adding parameter attributes with LLVM < 4.0.

v3:
  Fix typo.
  Fix parameter index.
  Add a gallivm enum for function attributes.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoradv: fix GetFenceStatus for signaled fences
Dave Airlie [Wed, 9 Nov 2016 01:21:30 +0000 (01:21 +0000)]
radv: fix GetFenceStatus for signaled fences

If a fence is created pre-signaled, we should return that
in GetFenceStatus even if it hasn't been submitted.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Gustaw Smolarczyk <wielkiegie@gmail.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoradv: enable conditional discard optimisation on radv.
Dave Airlie [Wed, 2 Nov 2016 01:23:11 +0000 (01:23 +0000)]
radv: enable conditional discard optimisation on radv.

This fixes a bunch of GPU hangs introduced in some CTS
tests like
dEQP-VK.memory.pipeline_barrier.host_write_uniform_buffer.65536

It works around an issue seen in the LLVM backend, but
also makes the radv code work more like the radeonsi stack.

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agonir: add conditional discard optimisation (v4)
Dave Airlie [Wed, 2 Nov 2016 01:22:07 +0000 (01:22 +0000)]
nir: add conditional discard optimisation (v4)

This is ported from GLSL and converts

if (cond)
discard;

into
discard_if(cond);

This removes a block, and is also needed by radv
to work around a bug in the LLVM backend.

v2: handle if (a) discard_if(b) (nha)
cleanup and drop pointless loop (Matt)
make sure there are no dependent phis (Eric)
v3: make sure only one instruction in the then block.
v4: remove sneaky tabs, add cursor init (Eric)
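
The basic transformation described above can be sketched on a toy IR.
This is only an illustration: the toy_* types and function are
hypothetical, not the NIR API, and the sketch leaves out the
"if (a) discard_if(b)" case and the dependent-phi check mentioned in v2.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy instruction set, standing in for a real IR. */
enum toy_op { TOY_DISCARD, TOY_DISCARD_IF, TOY_OTHER };

struct toy_instr {
    enum toy_op op;
    int cond; /* id of the condition value; only used by TOY_DISCARD_IF */
};

struct toy_if {
    int cond;                            /* id of the if condition */
    const struct toy_instr *then_instrs; /* instructions in the then block */
    size_t then_count;
    size_t else_count;
};

/*
 * If the statement is exactly "if (cond) discard;" with an empty else
 * block, write the equivalent discard_if(cond) to *out and return true.
 * Otherwise leave *out untouched and return false.
 */
static bool
toy_fold_discard(const struct toy_if *stmt, struct toy_instr *out)
{
    /* Only one instruction in the then block, nothing in the else block. */
    if (stmt->then_count != 1 || stmt->else_count != 0)
        return false;

    if (stmt->then_instrs[0].op != TOY_DISCARD)
        return false;

    out->op = TOY_DISCARD_IF;
    out->cond = stmt->cond;
    return true;
}
```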

Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoac/nir: add support for discard_if intrinsic (v2)
Dave Airlie [Wed, 2 Nov 2016 01:21:15 +0000 (01:21 +0000)]
ac/nir: add support for discard_if intrinsic (v2)

We are going to start lowering to this in NIR code,
so prepare radv for it.

v2: handle conversion to kilp properly (nha)

Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agoanv: Do relocations in userspace before execbuf ioctl
Kristian Høgsberg Kristensen [Tue, 8 Mar 2016 23:31:47 +0000 (15:31 -0800)]
anv: Do relocations in userspace before execbuf ioctl

Since our surface state buffer is shared by all batches, the kernel does a
full stall and sync with the CPU between batches every time we call
execbuf2 because it refuses to do relocations on an active buffer.  Doing
them in userspace and passing the NO_RELOC flag to the kernel allows us to
perform the relocations without stalling.
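
A minimal sketch of what applying one relocation in userspace amounts to,
assuming a CPU mapping of the buffer. The struct and function names are
hypothetical and do not mirror the kernel's relocation entry layout or the
driver's code; skipping the write when the target has not moved is a
simplification of the "skip relocations which aren't needed" idea.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation record, for illustration only. */
struct sketch_reloc {
    uint64_t offset;          /* byte offset of the pointer inside the buffer */
    uint64_t delta;           /* value added to the target's address */
    uint64_t presumed_offset; /* target address last written at `offset` */
};

/*
 * Patch the address of the target buffer into the mapped buffer `map`.
 * Returns true if the buffer was modified, false if the value already
 * written there is still correct.
 */
static bool
sketch_apply_reloc(uint8_t *map, struct sketch_reloc *reloc,
                   uint64_t target_address)
{
    if (reloc->presumed_offset == target_address)
        return false;

    uint64_t value = target_address + reloc->delta;

    /* memcpy avoids alignment and aliasing problems on the mapped bytes. */
    memcpy(map + reloc->offset, &value, sizeof(value));
    reloc->presumed_offset = target_address;
    return true;
}
```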

This improves the performance of Dota 2 by around 30% on a Sky Lake GT2.

v2 (Jason Ekstrand):
 - Better comments (Chris Wilson)
 - Fixed write_reloc for correct canonical form (Chris Wilson)

v3 (Jason Ekstrand):
 - Skip relocations which aren't needed
 - Provide an environment variable to always use the kernel
 - More comments about correctness (Chris Wilson)

v4 (Jason Ekstrand):
 - More comments (Chris Wilson)

v5 (Jason Ekstrand):
 - Rebase on top of moving execbuf2 setup to QueueSubmit

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv: Move relocation handling from EndCommandBuffer to QueueSubmit
Jason Ekstrand [Sun, 6 Nov 2016 02:47:33 +0000 (19:47 -0700)]
anv: Move relocation handling from EndCommandBuffer to QueueSubmit

Ever since the early days of the Vulkan driver, we've been setting up the
lists of relocations at EndCommandBuffer time.  The idea behind this was to
move some of the CPU load out of QueueSubmit which the client is required
to lock around and into command buffer building which could be done in
parallel.  Then QueueSubmit basically just becomes a bunch of execbuf2
calls.

Technically, this works.  However, when you start to do more in QueueSubmit
than just execbuf2, you start to run into problems.  In particular, if a
block pool is resized between EndCommandBuffer and QueueSubmit, the list of
anv_bo's and the execbuf2 object list can get out of sync.  This can cause
problems if, for instance, you wanted to do relocations in userspace.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv/batch: Move last_ss_pool_bo_offset to the command buffer
Jason Ekstrand [Mon, 7 Nov 2016 16:07:43 +0000 (08:07 -0800)]
anv/batch: Move last_ss_pool_bo_offset to the command buffer

The original reason for putting it in the batch_bo was to allow primaries
to share it across secondaries or something like that.  However, the
relocation lists in secondary command buffers are always left alone and
copied into the primary command buffer's relocation list.  This means that
the offset really applies at the command buffer level and putting it in the
batch_bo doesn't make sense.  This fixes a couple of potential bugs around
re-submission of command buffers that are not likely to be hit but are bugs
nonetheless.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv: Add an anv_execbuf helper struct
Jason Ekstrand [Sun, 6 Nov 2016 02:01:44 +0000 (19:01 -0700)]
anv: Add an anv_execbuf helper struct

This commit adds a little helper struct for storing everything we use to
build an execbuf2 call.  Since the add_bo function really has nothing to do
with a command buffer, it makes sense to break it out a bit.  This also
reduces some of the churn in the next commit.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv/batch_chain: Improve write_reloc
Jason Ekstrand [Wed, 2 Nov 2016 17:42:45 +0000 (10:42 -0700)]
anv/batch_chain: Improve write_reloc

The old version wasn't properly handling large addresses where we have to
sign-extend to get it into the "canonical form" expected by the hardware.
Also, the new version is capable of doing a clflush of the newly written
reloc if requested.
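
For reference, the sign extension can be sketched as below. This assumes
48-bit virtual addresses, so that bit 47 is copied into bits 48..63; that
width and the function name are assumptions of this sketch, not something
stated in this commit.

```c
#include <stdint.h>

/*
 * Return `address` in canonical form, assuming 48-bit addresses:
 * bits 48..63 become copies of bit 47.
 */
static uint64_t
sketch_canonical_address(uint64_t address)
{
    const uint64_t low_mask = (UINT64_C(1) << 48) - 1;
    const uint64_t sign_bit = UINT64_C(1) << 47;
    uint64_t low = address & low_mask;

    /* Done with masks rather than a signed right shift, whose result for
     * negative values is implementation-defined in C. */
    return (low & sign_bit) ? (low | ~low_mask) : low;
}
```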

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv: Initialize anv_bo::offset to -1
Jason Ekstrand [Tue, 1 Nov 2016 03:25:08 +0000 (20:25 -0700)]
anv: Initialize anv_bo::offset to -1

Since -1 is an invalid GPU address, this lets us know whether or not we
have a valid address for a buffer.  We don't get a valid address until the
first time that buffer is used in an execbuf2 ioctl.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv/allocator: Simplify anv_scratch_pool
Jason Ekstrand [Tue, 1 Nov 2016 20:10:11 +0000 (13:10 -0700)]
anv/allocator: Simplify anv_scratch_pool

The previous implementation was being overly clever and using the
anv_bo::size field as its mutex.  Scratch pool allocations don't happen
often, will happen at most a fixed number of times, and never happen in the
critical path (they only happen in shader compilation).  We can make this
much simpler by just using the device mutex.  This also means that we can
start using anv_bo_init_new directly on the bo and avoid setting fields
one-at-a-time.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv: Add a new bo_pool_init helper
Jason Ekstrand [Tue, 1 Nov 2016 20:09:36 +0000 (13:09 -0700)]
anv: Add a new bo_pool_init helper

This ensures that we're always setting all of the fields in anv_bo

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv: Don't presume to know what address is in a surface relocation
Jason Ekstrand [Tue, 1 Nov 2016 14:21:00 +0000 (07:21 -0700)]
anv: Don't presume to know what address is in a surface relocation

Because our relocation processing happens at EndCommandBuffer time and
because RENDER_SURFACE_STATE objects may be shared by batches, we really
have no clue whatsoever what address is actually written to the relocation
offset in the BO.  We need to stop making such claims to the kernel and
just let it relocate for us.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv: Add a cmd_buffer_execbuf helper
Jason Ekstrand [Wed, 2 Nov 2016 17:33:54 +0000 (10:33 -0700)]
anv: Add a cmd_buffer_execbuf helper

This puts the actual execbuf2 call in anv_batch_chain.c along with the
other relocation stuff.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv/device: Add an execbuf wrapper
Jason Ekstrand [Tue, 1 Nov 2016 03:36:26 +0000 (20:36 -0700)]
anv/device: Add an execbuf wrapper

This wrapper ensures that we always update all anv_bo::offset fields based
on the offsets returned by the kernel.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoanv: Make anv_finishme only warn once per call-site
Jason Ekstrand [Wed, 9 Nov 2016 04:43:09 +0000 (20:43 -0800)]
anv: Make anv_finishme only warn once per call-site

When you fire up Dota2 on Haswell you get spammed with thousands of
"Implement Gen7 HZ ops" finishme's.  The point of anv_finishme is to act as
a reminder that there is something left to implement.  Printing it once
should be sufficient.
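
One common way to get once-per-call-site behaviour is a static flag inside
the macro body, since every expansion gets its own flag. The sketch below
shows that pattern; the sketch_* names and the report counter are
hypothetical and this is not necessarily how anv_finishme is written.

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts how many messages were actually printed (for demonstration). */
static unsigned sketch_finishme_reports;

static void
sketch_report(const char *file, int line, const char *msg)
{
    sketch_finishme_reports++;
    fprintf(stderr, "%s:%d: FINISHME: %s\n", file, line, msg);
}

/*
 * Each expansion of the macro has its own `reported` flag, so every call
 * site prints at most once no matter how often it is executed.
 */
#define sketch_finishme(msg)                              \
    do {                                                  \
        static bool reported = false;                     \
        if (!reported) {                                  \
            reported = true;                              \
            sketch_report(__FILE__, __LINE__, (msg));     \
        }                                                 \
    } while (0)
```

The plain bool is not thread-safe: two threads racing on the same call
site may both print. For a reminder message that is usually acceptable.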

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agoi965/compute: Allow ARB_compute_shader in compat profile
Jordan Justen [Thu, 3 Nov 2016 22:22:11 +0000 (15:22 -0700)]
i965/compute: Allow ARB_compute_shader in compat profile

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97447
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Tested-by: Evan Odabashian <eodabash@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
8 years agoRevert "draw: use vectorized calculations for fetch"
Roland Scheidegger [Wed, 9 Nov 2016 04:46:12 +0000 (05:46 +0100)]
Revert "draw: use vectorized calculations for fetch"

Trivial. There are some regressions internally, related to overflow
behavior. I'll have to look at it at another time, some interactions
with vsplit/vcache are actually mind-blowing.

This reverts commit 3fa10ffb496cc4e6d1003891cf0381bb5bec2a74.

8 years agoswr: disable logic op when the rt format is float or srgb
Ilia Mirkin [Tue, 8 Nov 2016 22:30:03 +0000 (17:30 -0500)]
swr: disable logic op when the rt format is float or srgb

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
8 years agoswr: fix AND_INVERTED logic op conversion
Ilia Mirkin [Tue, 8 Nov 2016 00:18:49 +0000 (19:18 -0500)]
swr: fix AND_INVERTED logic op conversion

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
8 years agoswr: add support for EXT_depth_bounds_test
Ilia Mirkin [Tue, 1 Nov 2016 20:45:13 +0000 (16:45 -0400)]
swr: add support for EXT_depth_bounds_test

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
8 years agoswr: [rasterizer core] set depth hottile when depth bounds test enabled
Ilia Mirkin [Tue, 1 Nov 2016 20:45:12 +0000 (16:45 -0400)]
swr: [rasterizer core] set depth hottile when depth bounds test enabled

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Tim Rowley <timothy.o.rowley@intel.com>
8 years agoi965: Fix GPU hang related to multiple render targets and alpha testing
Anuj Phogat [Mon, 24 Oct 2016 23:03:00 +0000 (16:03 -0700)]
i965: Fix GPU hang related to multiple render targets and alpha testing

This patch should have been part of commit e592f7df.
When there are multiple render targets with alpha testing enabled and
the fragment shader doesn't write to draw buffer zero, the GPU hangs
on SKL. No GPU hang is seen on HSW. The simulator gives a
warning for all gen6+ h/w:
"Illegal render target write message length 0xa expected 0xc"

This patch fixes the GPU hang as well as the simulator warning with
new piglit test fbo-mrt-alphatest-no-buffer-zero-write:
https://patchwork.freedesktop.org/patch/118212

No regressions in Jenkins CI system.

Cc: "12.0 13.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
8 years agoswr: allow alphatest without blend or logicop
Tim Rowley [Fri, 4 Nov 2016 18:10:56 +0000 (13:10 -0500)]
swr: allow alphatest without blend or logicop

We need to compile a blend function when alphatest is enabled.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
8 years agoradv: emit correct last export when Z/stencil export is enabled
Dave Airlie [Tue, 8 Nov 2016 06:22:39 +0000 (16:22 +1000)]
radv: emit correct last export when Z/stencil export is enabled

I was getting a random GPU hang in the renderpass simple tests,
it turns out sometimes radv emitted the wrong thing "last".

This fixes the logic to emit Z/stencil last if they occur,
and not mark a color output as last. Also this relies on the
Z/STENCIL being the first two fragment outputs, which they are
so yay.

Fixes: dEQP-VK.renderpass.simple.color_depth (random hangs)
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
8 years agotgsi/scan: turn a huge if-else-if.. chain into a switch statement
Marek Olšák [Sat, 5 Nov 2016 17:21:57 +0000 (18:21 +0100)]
tgsi/scan: turn a huge if-else-if.. chain into a switch statement

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agotgsi/scan: fix images_buffers regression
Marek Olšák [Sat, 5 Nov 2016 17:16:16 +0000 (18:16 +0100)]
tgsi/scan: fix images_buffers regression

The first IF statement disabled the second one.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98599

Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoanv: Document cmd_buffer_alloc_binding_table
Jason Ekstrand [Mon, 7 Nov 2016 20:32:28 +0000 (12:32 -0800)]
anv: Document cmd_buffer_alloc_binding_table

Some of the details of this function are very confusing and have a long
history.  We should document that history and this seems like the best
place to do it.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
8 years agointel/blorp: Emit all the binding tables
Jason Ekstrand [Sun, 23 Oct 2016 05:27:23 +0000 (22:27 -0700)]
intel/blorp: Emit all the binding tables

At least on Sky Lake, after emitting 3DSTATE_CONSTANT_*, you are required
to re-emit the 3DSTATE_BINDING_TABLE_POINTERS packet for the corresponding
stage.  If you don't, double-buffering may fail and you may get the wrong
constants.  It turns out that you need to do this even if you have no push
constants to speak of or else the next 3DSTATE_CONSTANT packet you emit for
that stage may not work correctly.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
8 years agoi965/gen9: Allow sampling with hiz when supported
Jordan Justen [Fri, 21 Oct 2016 15:50:50 +0000 (16:50 +0100)]
i965/gen9: Allow sampling with hiz when supported

For gen9+ this will indicate when we should allow hiz based sampling
during rendering.

Improves performance in:
  - Synmark's OglDeferred by 2.2% (n=20)
  - Synmark's OglShMapPcf by 0.44% (n=20)

v2 by Ben: Add spec reference, and make it fit with some of the changes made in
the previous patches
Change the check from mt->aux_buf to mt->num_samples. The presence of an aux_buf
isn't enough to determine there isn't a HiZ buffer to use.

v3: It seems all depth surface end up with num_samples = 0 by default,
    so allow sampling from depth HiZ if num_samples <= 1. (Lionel)
    Allow sampling from HiZ only if all LOD are available from the HiZ
    buffer. (Lionel)
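
The conditions listed in v3 can be summarized as a predicate along these
lines. The struct and its fields are hypothetical stand-ins for the real
miptree state, and modelling "all LOD are available" as a level count is
an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical summary of a depth surface, for illustration only. */
struct sketch_depth_surface {
    int gen;                  /* hardware generation */
    unsigned num_samples;     /* 0 or 1 for single-sampled surfaces */
    unsigned level_count;     /* number of LODs in the depth surface */
    unsigned hiz_level_count; /* number of leading LODs covered by HiZ */
};

static bool
sketch_can_sample_with_hiz(const struct sketch_depth_surface *surf)
{
    /* Only gen9+ is allowed to sample with HiZ. */
    if (surf->gen < 9)
        return false;

    /* Depth surfaces default to num_samples = 0, so accept 0 and 1. */
    if (surf->num_samples > 1)
        return false;

    /* Every LOD must be available from the HiZ buffer. */
    return surf->hiz_level_count >= surf->level_count;
}
```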

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v2)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v3)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965/gen9: Add HiZ auxiliary buffer support
Ben Widawsky [Fri, 21 Oct 2016 15:47:23 +0000 (16:47 +0100)]
i965/gen9: Add HiZ auxiliary buffer support

The functionality this patch introduces originally came from a patch by
Ken (the commit subject was the same). Since I ended up changing so many patches
in the code before this one, I had some non-trivial decisions to make, and I
didn't feel it was appropriate to keep Ken's name as author (mostly because he
might not like what I've done). Ken's original patch was like 2 LOC :-)

In either case, some credit needs to go to Ken, and to Jordan for a few small
other changes in that original patch.

v2: Back to a smaller diff now that ISL handles most of the actual
    programming (Lionel)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965: Add function to indicate when sampling with hiz is supported
Jordan Justen [Fri, 21 Oct 2016 14:46:37 +0000 (15:46 +0100)]
i965: Add function to indicate when sampling with hiz is supported

Currently it indicates that this is never supported, but soon it will
be supported for gen8+^w gen9+

v2 by Ben:
- Explicitly disable aux_hiz for gen < 9 (with comment)
- squashed in next patch to avoid unused and useless functions

   i965: Support sampling with hiz during rendering

   For gen8, we can sample from depth while using the hiz buffer. This
   allows us to sample depth without resolving from hiz to the depth
   texture.

   To do this we must resolve to hiz before drawing so we can use the hiz
   buffer to sample while rendering. Hopefully the hiz buffer will
   already be resolved in most cases because it was previously rendered,
   meaning the hiz resolve is a no-op.

   Note that this is still controlled by the
   intel_miptree_sample_with_hiz function, and we will enable hiz
   sampling for gen8 in a separate patch.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v2)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965/miptree: Create a hiz mcs type
Ben Widawsky [Fri, 21 Oct 2016 14:10:56 +0000 (15:10 +0100)]
i965/miptree: Create a hiz mcs type

This seems counter to the goal of consolidating hiz, mcs, and later ccs buffers.
Unfortunately, hiz on gen6 is a thing the code supports, and this wart will be
helpful to achieve that. Overall, I believe it does help unify AUX buffers on
gen7+.

I updated the size field which I introduced in the previous patch, even though
we have no use for it.

XXX: As I mentioned in the last patch, the height given to the MCS buffer
allocation in intel_miptree_alloc_mcs() looks wrong, but I don't claim to fully
understand how the MCS buffer is laid out.

v2: rebase on master (Lionel)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965: Drop the aux mt when not used
Ben Widawsky [Fri, 21 Oct 2016 13:20:39 +0000 (14:20 +0100)]
i965: Drop the aux mt when not used

This patch will preserve the BO & offset, and not the miptree for the
aux_mcs buffer. Eventually it might make sense to pull out the sizing
function in miptree creation, but for now this should be sufficient
and not too hideous.

v2: Save BO's offset too (Lionel)

v3: Squash previous patch storing the size of the allocated aux buffer
    (Lionel)
    Fix memory leak with mcs_buf->bo (Lionel)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965/miptree: Directly gtt map the mcs buffer
Ben Widawsky [Fri, 21 Oct 2016 13:17:21 +0000 (14:17 +0100)]
i965/miptree: Directly gtt map the mcs buffer

The next patch will change the map type, and this will make sure there are no
regressions as a result of the other stuff. Since the miptree is newly created,
I believe it is always safe to just map.

It is possible to CPU map this buffer on LLC platforms (it additionally requires
rounding up to tile size). I did experiment with that patch, and found no
performance gains to be had.

I've added in error handling while here. Generally GTT mapping is an operation
which is highly unlikely to fail, but we may as well handle it when it does.

v2: rebase on master (Lionel)

v3: print out error if gtt mapping fails (Topi)

Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agoi965: Wrap MCS miptree in intel_miptree_aux_buffer
Jordan Justen [Fri, 21 Oct 2016 11:56:49 +0000 (12:56 +0100)]
i965: Wrap MCS miptree in intel_miptree_aux_buffer

This will allow us to treat HiZ and MCS the same when using them as an
auxiliary surface buffer.

v2: (Ben) Minor rebase conflict resolution.
   Rename mcs_buf to aux_buf to address upcoming change for hiz specific buffers.
   That second thing is essentially a squash of:
   i965/gen8: Use intel_miptree_aux_buffer for auxiliary buffer - which didn't need
   to be separate in my opinion.

v3: rebase on master (Lionel)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Signed-off-by: Ben Widawsky <benjamin.widawsky@intel.com> (v2)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v3)
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
8 years agogallivm: fix [IU]MUL_HI regression
Nicolai Hähnle [Tue, 8 Nov 2016 09:14:00 +0000 (10:14 +0100)]
gallivm: fix [IU]MUL_HI regression

This patch does two things:

1. It separates the host-CPU code generation from the generic code
   generation. This guards against accidentally breaking things for
   radeonsi in the future.

2. It makes sure we actually use both arguments and don't just compute
   a square :-p

Fixes a regression introduced by commit 29279f44b3172ef3b84d470e70fc7684695ced4b

Cc: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
8 years agodraw: use vectorized calculations for fetch
Roland Scheidegger [Fri, 4 Nov 2016 04:13:03 +0000 (05:13 +0100)]
draw: use vectorized calculations for fetch

Instead of doing all the math with scalars, use vectors. This means the
overflow math needs to be done manually, albeit that's only really
problematic for the stride/index mul, the rest has been pretty much
moved outside the shader loop (albeit the mul could actually be optimized
away too), where things are still scalar. Because llvm is complete fail
with the zero-extend widening mul, roll our own even...
To eliminate control flow in the main shader loop fetch, provide fake
buffers (so index 0 is always valid to fetch).
Still uses aos fetch though in the end - mostly because some more code
would be needed to handle unaligned fetches in that path, and because for
most formats it won't make a difference anyway (we generate some truly
horrendous code for things like R16G16_something for instance).

Instanced fetch however stays roughly the same as before, except that
the same element is no longer fetched multiple times (I've seen a reduction
of ~3 times in main shader loop size due to apparently llvm not being able
to deduce it's really all the same with a couple instanced elements).

Also, for elts gathering, use vectorized code as well - provide a fake
elt buffer if there's no valid one bound.

The generated shaders are smaller and faster to compile (not entirely sure
about execution speed, but generally unless there's just single vertices
to handle I would expect it to be faster - there's more opportunities
for future improvements by using soa fetch).

No piglit change.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
8 years agogallivm: introduce 32x32->64bit lp_build_mul_32_lohi function
Roland Scheidegger [Fri, 4 Nov 2016 03:55:09 +0000 (04:55 +0100)]
gallivm: introduce 32x32->64bit lp_build_mul_32_lohi function

This is used by shader umul_hi/imul_hi functions (and soon by draw).
It's actually useful separating this out on its own, however the real
reason for doing it is because we're using an optimized sse2 version,
since the code llvm generates is atrocious (since there's no widening
mul in llvm, and it does not recognize the widening mul pattern, so
it generates code for real 64x64->64bit mul, which the cpu can't do
natively, in contrast to 32x32->64bit mul which it could do).
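
In scalar C terms, the unsigned operation computes the following. This is
only a reference for the arithmetic (the function name is hypothetical);
the gallivm function builds vector IR rather than running code like this,
and the signed variant is analogous with a signed 64-bit product.

```c
#include <stdint.h>

/*
 * Widening 32x32->64 bit unsigned multiply, split into low and high
 * 32-bit halves. The high half is what umul_hi returns.
 */
static void
sketch_mul_32_lohi(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi)
{
    /* Widen one operand first so the multiply is done in 64 bits. */
    uint64_t product = (uint64_t)a * b;

    *lo = (uint32_t)product;
    *hi = (uint32_t)(product >> 32);
}
```

Note that both operands feed the product; passing the same value twice
would compute a square instead.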

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
8 years agoi965: Add space before paren
Anuj Phogat [Fri, 28 Oct 2016 18:01:42 +0000 (11:01 -0700)]
i965: Add space before paren

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
8 years agoi965: Remove unnecessary white space
Anuj Phogat [Fri, 28 Oct 2016 17:58:44 +0000 (10:58 -0700)]
i965: Remove unnecessary white space

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
8 years agoi965: Fix alpha-to-coverage and alpha test enabled checks
Anuj Phogat [Thu, 20 Oct 2016 18:40:40 +0000 (11:40 -0700)]
i965: Fix alpha-to-coverage and alpha test enabled checks

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
8 years agomesa: Add helper function _mesa_is_alpha_to_coverage_enabled()
Anuj Phogat [Tue, 25 Oct 2016 18:56:07 +0000 (11:56 -0700)]
mesa: Add helper function _mesa_is_alpha_to_coverage_enabled()

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
8 years agomesa: Add helper function _mesa_is_alpha_test_enabled()
Anuj Phogat [Tue, 25 Oct 2016 18:55:44 +0000 (11:55 -0700)]
mesa: Add helper function _mesa_is_alpha_test_enabled()

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
8 years agomesa: Use separate line for function return type
Anuj Phogat [Tue, 25 Oct 2016 18:54:36 +0000 (11:54 -0700)]
mesa: Use separate line for function return type

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
8 years agonvc0: simplify draw parameters upload for vertex shaders
Samuel Pitoiset [Tue, 25 Oct 2016 19:41:12 +0000 (21:41 +0200)]
nvc0: simplify draw parameters upload for vertex shaders

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agogallium/hud: protect against an initialization race
Steven Toth [Mon, 24 Oct 2016 14:10:51 +0000 (10:10 -0400)]
gallium/hud: protect against an initialization race

In the event that multiple threads attempt to install a graph
concurrently, protect the shared list.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agogallium/hud: close a previously opened handle
Steven Toth [Mon, 24 Oct 2016 14:10:50 +0000 (10:10 -0400)]
gallium/hud: close a previously opened handle

We're missing the closedir() to the matching opendir().
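
A minimal sketch of the opendir()/closedir() pairing, assuming a POSIX
system. The helper count_dir_entries is hypothetical and is not taken
from the HUD code; it only shows that every successful opendir() is
matched by a closedir() on the way out.

```c
#include <dirent.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of entries in a directory,
 * or -1 if the directory cannot be opened. */
static int
count_dir_entries(const char *path)
{
   DIR *dir = opendir(path);
   int count = 0;

   if (!dir)
      return -1;   /* nothing was opened, so there is nothing to close */

   while (readdir(dir) != NULL)
      count++;

   /* Release the handle obtained from opendir(). */
   closedir(dir);
   return count;
}
```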

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agogallium/hud: fix a problem where objects are free'd while in use.
Steven Toth [Mon, 24 Oct 2016 14:10:49 +0000 (10:10 -0400)]
gallium/hud: fix a problem where objects are free'd while in use.

Instead of trying to maintain a reference counted list of valid HUD
objects, and freeing them accordingly, creating race conditions
between unanticipated multiple threads, simply accept they're
allocated once and never released until the process terminates.

They're a shared resource between multiple threads, so accept
they're always available for use.

Signed-off-by: Steven Toth <stoth@kernellabs.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agomesa: drop current draw/read buffer when ctx is released
Rob Clark [Wed, 26 Oct 2016 20:52:52 +0000 (16:52 -0400)]
mesa: drop current draw/read buffer when ctx is released

This fixes a problem seen with gallium drivers vs android wallpaper.
Basically, what happens is:

   EGLSurface tmpSurface = mEgl.eglCreatePbufferSurface(mEglDisplay, mEglConfig, attribs);
   mEgl.eglMakeCurrent(mEglDisplay, tmpSurface, tmpSurface, mEglContext);

   int[] maxSize = new int[1];
   Rect frame = surfaceHolder.getSurfaceFrame();
   glGetIntegerv(GL_MAX_TEXTURE_SIZE, maxSize, 0);

   mEgl.eglMakeCurrent(mEglDisplay, EGL_NO_SURFACE, EGL_NO_SURFACE, EGL_NO_CONTEXT);
   mEgl.eglDestroySurface(mEglDisplay, tmpSurface);

   ... check maxSize vs frame size and bail if needed ...

   mEglSurface = mEgl.eglCreateWindowSurface(mEglDisplay, mEglConfig, surfaceHolder, null);
   ... error checking ...
   mEgl.eglMakeCurrent(mEglDisplay, mEglSurface, mEglSurface, mEglContext);

When the window-surface is created, it ends up with the same ptr address
as the recently freed tmpSurface pbuffer surface.  Which after many
levels of indirection, results in st_framebuffer_validate() ending up with
the same/old framebuffer object, and in the end never calling the
DRIimageLoaderExtension::getBuffers().  Then in droid_swap_buffers(), the
dri2_surf is still the old pbuffer surface (with dri2_surf->buffer being
NULL, obviously, so when wallpaper app calls eglSwapBuffers() nothing
gets enqueued to the compositor).  Resulting in a black/blank background
layer.

Note that at the EGL layer, when the context is unbound, EGL drops its
references to the draw and read buffer as well.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Tested-by: Robert Foss <robert.foss@collabora.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
8 years agoclover: Add CL_PROGRAM_BINARY_TYPE support (CL1.2).
Serge Martin [Mon, 31 Oct 2016 00:21:15 +0000 (17:21 -0700)]
clover: Add CL_PROGRAM_BINARY_TYPE support (CL1.2).

v3 [Francisco Jerez]: Loosely based on Serge's v1 of this patch in
   order to avoid CL-specific enums in the clover module binary
   format.  In addition to other changes made in v2: Represent the CL
   program binary type as the section type instead of adding a CL
   API-specific enum, check that the binary types of the input objects
   are valid during clLinkProgram(), pass section type as argument to
   build_module_library() instead of using separate function.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
8 years agoclover: add missing clGetDeviceInfo CL1.2 queries
Serge Martin [Sat, 1 Oct 2016 16:51:11 +0000 (18:51 +0200)]
clover: add missing clGetDeviceInfo CL1.2 queries

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Vedran Miletić <vedran@miletic.net>
8 years agonvc0: get rid of NVE4_COMPUTE_MP_PM_{A,B}_SIGSEL_XXX
Samuel Pitoiset [Sat, 5 Nov 2016 16:56:02 +0000 (17:56 +0100)]
nvc0: get rid of NVE4_COMPUTE_MP_PM_{A,B}_SIGSEL_XXX

Instead, hardcode group sigsel because there are a bunch of unknown
groups, especially on SM50/SM52.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
8 years agogm107/ir: emit RED instead of ATOM when no dst
Samuel Pitoiset [Fri, 4 Nov 2016 19:08:57 +0000 (20:08 +0100)]
gm107/ir: emit RED instead of ATOM when no dst

This is similar to NVC0 and GK110 emitters where we emit
reduction operations instead of atomic operations when the
destination is not used.

Found after writing some tests which check if performance counters
return the expected value. In that case, gred_count returned 0
on gm107 while at least gk106 returned the correct value.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
8 years agost/mesa: initialize members of glsl_to_tgsi_instruction in emit_asm()
Brian Paul [Sat, 5 Nov 2016 17:00:10 +0000 (11:00 -0600)]
st/mesa: initialize members of glsl_to_tgsi_instruction in emit_asm()

This fixes random crashes with MSVC release builds.  It seems the
members are implicitly initialized to zero with gcc, but not MSVC.
In particular, the tex_offset_num_offset field was non-zero causing
a loop over the NULL tex_offsets array to crash.

Zero-init those fields and a few others to be safe.

The regression began with acc23b04cfd64e "ralloc: remove memset from
ralloc_size".
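
To illustrate the hazard in plain C (a sketch only; the struct and
function names below are hypothetical and do not match the real
glsl_to_tgsi_instruction layout): memory from an allocator that does not
memset must have its fields initialized before anything loops over them.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an instruction record. */
struct example_inst {
   unsigned tex_offset_num_offset;
   const int *tex_offsets;
};

static struct example_inst *
example_inst_create(void)
{
   /* malloc(), like an allocator without a memset, returns
    * uninitialized memory. */
   struct example_inst *inst = malloc(sizeof(*inst));

   if (!inst)
      return NULL;

   /* Explicitly zero-init the fields instead of relying on the
    * allocator or the compiler to do it. */
   inst->tex_offset_num_offset = 0;
   inst->tex_offsets = NULL;
   return inst;
}

/* Safe only because the count is known to be 0 when the array is NULL. */
static int
example_inst_sum_offsets(const struct example_inst *inst)
{
   int sum = 0;

   for (unsigned i = 0; i < inst->tex_offset_num_offset; i++)
      sum += inst->tex_offsets[i];
   return sum;
}
```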

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoandroid: amd/common: add support for libmesa_amd_common
Mauro Rossi [Fri, 4 Nov 2016 23:00:29 +0000 (00:00 +0100)]
android: amd/common: add support for libmesa_amd_common

Fixes the following build error introduced with commit 7115e56
and related amd/common dependencies:

external/mesa/src/gallium/drivers/radeonsi/si_shader.c:6861: error: undefined reference to 'ac_is_sgpr_param'
external/mesa/src/gallium/drivers/radeonsi/si_shader.c:6951: error: undefined reference to 'ac_is_sgpr_param'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)

ninja: build stopped: subcommand failed.
build/core/ninja.mk:148: recipe for target 'ninja_wrapper' failed
make: *** [ninja_wrapper] Error 1

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
8 years agowinsys/radeon: don't call surface_best for FMASK
Marek Olšák [Fri, 4 Nov 2016 11:30:08 +0000 (12:30 +0100)]
winsys/radeon: don't call surface_best for FMASK

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98518

Acked-by: Edward O'Callaghan <funfunctor@folklore1984.net>
8 years agomesa: Add linear ETC2/EAC to the compressed format list with ES3 compat.
Kenneth Graunke [Thu, 3 Nov 2016 08:58:03 +0000 (01:58 -0700)]
mesa: Add linear ETC2/EAC to the compressed format list with ES3 compat.

GL_ARB_ES3_compatibility brings ETC2/EAC formats to desktop GL.

The meaning of the GL compressed format list is pretty vague - it's
supposed to return formats for "general-purpose usage".  (GL 4.2
deprecates the list because of this.)  Basically everyone interprets
this as "linear RGB/RGBA".

ETC2/EAC meets those criteria, so while we shouldn't be required to add
it to the list, there's also little harm in doing so, at least on
platforms with native support.  I doubt anyone is using this list for
much anyway, so even on platforms without native support, it's probably
not a big deal.

Makes the following GL45-CTS.gtf43 tests pass:

* GL3Tests.eac_compression_r11.gl_compressed_r11_eac
* GL3Tests.eac_compression_rg11.gl_compressed_rg11_eac
* GL3Tests.eac_compression_signed_r11.gl_compressed_signed_r11_eac
* GL3Tests.eac_compression_signed_rg11.gl_compressed_signed_rg11_eac
* GL3Tests.etc2_compression_rgb8.gl_compressed_rgb8_etc2
* GL3Tests.etc2_compression_rgb8_pt_alpha1.gl_compressed_rgb8_pt_alpha1_etc2
* GL3Tests.etc2_compression_rgba8.gl_compressed_rgba8_etc2

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
8 years agovc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain.
Eric Anholt [Fri, 4 Nov 2016 20:41:20 +0000 (13:41 -0700)]
vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain.

The 1/W was apparently not accurate enough, and we were getting sparklies
in the distance.  The closed driver also did a N-R step here.
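
For reference, one Newton-Raphson step for a reciprocal can be written
in scalar C as below. This is a sketch of the general technique, not the
vc4 code; refine_recip is a hypothetical name and the initial estimate is
assumed to come from a low-precision hardware reciprocal.

```c
/* One Newton-Raphson step for r ~= 1/w:
 *
 *    r' = r * (2 - w * r)
 *
 * The error after the step is w * (1/w - r)^2, so each step roughly
 * doubles the number of correct bits. */
static float
refine_recip(float w, float r)
{
   return r * (2.0f - w * r);
}
```

For example, with w = 4 and a rough estimate r = 0.24, a single step
gives 0.24 * (2 - 0.96) = 0.2496, which is much closer to 0.25.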

Cc: <mesa-stable@lists.freedesktop.org>
8 years agovc4: Make sure that vertex shader texture2D() calls use LOD 0.
Eric Anholt [Fri, 4 Nov 2016 19:04:15 +0000 (12:04 -0700)]
vc4: Make sure that vertex shader texture2D() calls use LOD 0.

I noticed this while trying to debug glmark2 terrain (which does vertex
shader texturing, but no mipmaps on its textures sampled from the VS).

8 years agoradeonsi: fix vertex fetches for 2_10_10_10 formats
Nicolai Hähnle [Wed, 2 Nov 2016 18:07:40 +0000 (19:07 +0100)]
radeonsi: fix vertex fetches for 2_10_10_10 formats

The hardware always treats the alpha channel as unsigned, so add a shader
workaround. This is rare enough that we'll just build a monolithic vertex
shader.

The SINT case cannot actually happen in OpenGL, but I've included it for
completeness since it's just a mix of the other cases.
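
As a scalar illustration of what such a workaround has to compute (a
sketch only; the function names are hypothetical and the real fix emits
shader code): a 2-bit alpha that was fetched as unsigned is reinterpreted
as a signed value.

```c
#include <stdint.h>

/* Reinterpret a 2-bit field fetched as unsigned (0..3) as a signed
 * two's-complement value (-2..1).  This is the SINT/SSCALED view. */
static int32_t
sign_extend_alpha2(uint32_t alpha_unsigned)
{
   int32_t a = (int32_t)(alpha_unsigned & 0x3u);

   /* Flip the sign bit, then subtract its weight. */
   return (a ^ 0x2) - 0x2;
}

/* SNORM view of the same field: with 2 bits the only representable
 * values are -1, 0 and 1, and -2 clamps to -1. */
static float
alpha2_to_snorm(uint32_t alpha_unsigned)
{
   int32_t a = sign_extend_alpha2(alpha_unsigned);

   return a < -1 ? -1.0f : (float)a;
}
```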

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agost/mesa: fix the layer of VDPAU surface samplers
Nicolai Hähnle [Thu, 3 Nov 2016 20:49:40 +0000 (21:49 +0100)]
st/mesa: fix the layer of VDPAU surface samplers

A (latent) bug in VDPAU interop was exposed by commit
e5cc84dd43be066c1dd418e32f5ad258e31a150a.

Before that commit, the st_vdpau code created samplers with
first_layer == last_layer == 1 that the general texture handling code
would immediately delete and re-create, because the layer does not match
the information in the GL texture object.

This was correct behavior at least in the DMABUF case, because the imported
resource is supposed to have the correct offset already applied.  In the
non-DMABUF case, this was just plain wrong but apparently nobody noticed.

After that commit, the state tracker assumes that an existing sampler is
correct at all times.  Existing samplers are supposed to be deleted when
they may become invalid, and they will be created on-demand.  This meant
that the sampler with first_layer == last_layer == 1 stuck around, leading
to rendering artefacts (on radeonsi), command stream failures (on r600), and
assertions (in debug builds everywhere).

This patch fixes the problem by simply not creating a sampler at all in
st_vdpau_map_surface.  We rely on the generic texture code to do the right
thing, adding the layer_override to make the non-DMABUF case work.

v2: add the layer_override

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98512
Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Cc: Christian König <deathsimple@vodafone.de>
Cc: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Christian König <christian.koenig@amd.com>
8 years agoRevert "st/vdpau: use linear layout for output surfaces"
Dave Airlie [Thu, 15 Sep 2016 03:58:33 +0000 (13:58 +1000)]
Revert "st/vdpau: use linear layout for output surfaces"

This reverts commit d180de35320eafa3df3d76f0e82b332656530126.

This is a radeon specific hack that causes problems on nouveau
when combined with the SHARED flag later. If radeonsi needs a fix
for this, please fix it in the driver.

[chk]
Using linear surfaces for this makes sense because tiling isn't
beneficial and the surfaces can potentially be shared with other GPUs
using the VDPAU OpenGL interop.

[airlied]
I think we need a flag that isn't SHARED/LINEAR that is more
SHARED_OTHER_GPU.

[mareko]
Does radeonsi need PIPE_BIND_VIDEO_DECODE_OUTPUT that it would translate
into linear ?

[mareko]
My only concern is decoding performance. If the decoder works in 64x1
blocks, tiling will hurt. That's the theory. I don't know how the
decoder works.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tested-by: Ilia Mirkin <imirkin@alum.mit.edu>
Tested-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> (I+A)
8 years agoradeonsi: fix an assertion failure in si_decompress_sampler_color_textures
Marek Olšák [Thu, 3 Nov 2016 18:16:51 +0000 (19:16 +0100)]
radeonsi: fix an assertion failure in si_decompress_sampler_color_textures

This fixes a crash in Deus Ex: Mankind Divided. Release builds were
unaffected, so it's not too serious.

Cc: 11.2 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agoglx: make interop ABI visible again
Marek Olšák [Wed, 2 Nov 2016 17:59:22 +0000 (18:59 +0100)]
glx: make interop ABI visible again

This was broken when the GLAPI use was removed from mesa_glinterop.h.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years agoegl: make interop ABI visible again
Marek Olšák [Wed, 2 Nov 2016 17:59:22 +0000 (18:59 +0100)]
egl: make interop ABI visible again

This was broken when the GLAPI use was removed from mesa_glinterop.h.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years agoegl: use util/macros.h
Marek Olšák [Wed, 2 Nov 2016 17:56:39 +0000 (18:56 +0100)]
egl: use util/macros.h

I need the definition of PUBLIC.

Cc: 12.0 13.0 <mesa-stable@lists.freedesktop.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
8 years agoradeonsi: enable GLSL 4.50
Nicolai Hähnle [Fri, 7 Oct 2016 16:21:51 +0000 (18:21 +0200)]
radeonsi: enable GLSL 4.50

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
8 years agost/glsl_to_tgsi: fix dvec[34] loads from SSBO
Nicolai Hähnle [Thu, 3 Nov 2016 10:00:36 +0000 (11:00 +0100)]
st/glsl_to_tgsi: fix dvec[34] loads from SSBO

When splitting up loads, we have to add 16 bytes to the offset for
the high components, just like already happens for stores.
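
A byte-level model of the split, for illustration. It assumes 8-byte
doubles and that a single load moves at most 16 bytes; load_16_bytes and
load_dvec4 are hypothetical helpers, not state tracker functions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one load: at most 16 bytes, i.e. two doubles. */
static void
load_16_bytes(const uint8_t *buf, uint32_t offset, double out[2])
{
   memcpy(out, buf + offset, 16);
}

/* A dvec4 is 32 bytes, so it is split into two loads.  The second
 * load must start 16 bytes after the first one. */
static void
load_dvec4(const uint8_t *buf, uint32_t offset, double out[4])
{
   load_16_bytes(buf, offset, &out[0]);        /* x, y */
   load_16_bytes(buf, offset + 16, &out[2]);   /* z, w (high components) */
}
```

Forgetting the "+ 16" would make z and w alias x and y, which is the
kind of result the missing offset produces.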

Fixes arb_gpu_shader_fp64@shader_storage@layout-std140-fp64-shader.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoglsl/cache: correct asprintf error handling
Nicolai Hähnle [Thu, 3 Nov 2016 09:23:17 +0000 (10:23 +0100)]
glsl/cache: correct asprintf error handling

From the manpage of asprintf:

   "If memory allocation wasn't possible, or some other error occurs,
    these functions will return -1, and the contents of strp are
    undefined."
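
In other words, the return value has to be checked rather than the
pointer. A minimal sketch, assuming glibc's asprintf(); build_path is a
hypothetical helper and not the cache code itself:

```c
#define _GNU_SOURCE   /* asprintf() is a GNU/BSD extension */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed "<dir>/<name>" string,
 * or NULL on failure.  The caller frees the result. */
static char *
build_path(const char *dir, const char *name)
{
   char *path;

   /* Check the return value, not the pointer: on failure the
    * contents of 'path' are undefined. */
   if (asprintf(&path, "%s/%s", dir, name) == -1)
      return NULL;

   return path;
}
```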

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
8 years agogallium/radeon: Multiply bpe by nsamples in surf_winsys_to_drm
Michel Dänzer [Wed, 2 Nov 2016 09:54:44 +0000 (18:54 +0900)]
gallium/radeon: Multiply bpe by nsamples in surf_winsys_to_drm

For symmetry with surf_drm_to_winsys.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agogallium/radeon: Use flags parameter in radeon_winsys_surface_init
Michel Dänzer [Wed, 2 Nov 2016 10:48:35 +0000 (19:48 +0900)]
gallium/radeon: Use flags parameter in radeon_winsys_surface_init

Fixes valgrind warnings about surf_ws->flags being uninitialized while
starting X.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agogallium/radeon: Only convert stencil info if RADEON_SURF_SBUFFER is set
Michel Dänzer [Wed, 2 Nov 2016 10:09:06 +0000 (19:09 +0900)]
gallium/radeon: Only convert stencil info if RADEON_SURF_SBUFFER is set

Fixes valgrind warnings about using uninitialized memory when starting X.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agogallium/radeon: Only loop up to last_level for drm<->winsys conversion
Michel Dänzer [Wed, 2 Nov 2016 09:43:37 +0000 (18:43 +0900)]
gallium/radeon: Only loop up to last_level for drm<->winsys conversion

Fixes spurious assertion failure in surf_level_drm_to_winsys when
starting X, due to processing a miplevel which was never initialized.

Fixes: e9c76eeeaa67 ("gallium/radeon: remove radeon_surf_level::pitch_bytes")
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
8 years agoanv: use limits.h instead of deprecated/obsolete values.h
Tapani Pälli [Thu, 3 Nov 2016 11:44:47 +0000 (13:44 +0200)]
anv: use limits.h instead of deprecated/obsolete values.h

Mesa uses limits.h elsewhere, and this makes it possible to
compile anv_allocator.c on Android.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
8 years agovc4: Add miptree/texture state support for ETC1 compressed textures.
Eric Anholt [Tue, 6 Jan 2015 21:35:21 +0000 (13:35 -0800)]
vc4: Add miptree/texture state support for ETC1 compressed textures.

The format isn't flagged as enabled at runtime yet, because we need kernel
validation support.

8 years agovc4: Fix use of undefined values since the ralloc zeroing changes.
Eric Anholt [Wed, 2 Nov 2016 20:51:10 +0000 (13:51 -0700)]
vc4: Fix use of undefined values since the ralloc zeroing changes.

reralloc() no longer zeroes the new contents, so switch to using
rzalloc_array() instead.
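
The same hazard exists with plain libc, which gives a rough analogy.
The sketch below uses realloc() and memset() rather than the ralloc API,
and grow_zeroed is a hypothetical helper: growing an array leaves the new
tail uninitialized unless it is zeroed explicitly.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grow an int array and zero the new elements.
 * realloc() preserves the old contents but leaves the added tail
 * uninitialized.  On failure it returns NULL and the original
 * array is left untouched. */
static int *
grow_zeroed(int *array, size_t old_count, size_t new_count)
{
   int *grown = realloc(array, new_count * sizeof(*grown));

   if (!grown)
      return NULL;

   if (new_count > old_count)
      memset(grown + old_count, 0,
             (new_count - old_count) * sizeof(*grown));
   return grown;
}
```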

8 years agonir: Make sure to set the texsrc type in nir drawpixels/bitmap lowering.
Eric Anholt [Thu, 3 Nov 2016 23:28:49 +0000 (16:28 -0700)]
nir: Make sure to set the texsrc type in nir drawpixels/bitmap lowering.

We were leaving an undefined value since the ralloc zeroing changes.
Fixes nir_validate() failures on vc4.

v2: Fix the color-index case of drawpixels as well.

Reviewed-by: Rob Clark <robdclark@gmail.com> (v1)
8 years agodraw: fix undefined input handling some more...
Roland Scheidegger [Fri, 4 Nov 2016 00:48:22 +0000 (01:48 +0100)]
draw: fix undefined input handling some more...

Previous fixes were incomplete - some code still iterated through the number
of elements provided by velem layout instead of the number stored in the key
(which is the same as the number defined by the vs). And also actually
accessed the elements from the layout directly instead of those in the key.
This mismatch could still cause crashes.
(Besides, it is a very good idea to only use data stored in the key anyway.)
v2: move null format check, remove now unnecessary function parameter,
some minor prettify

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
8 years agogallium/hud: call fflush() after printing error messages
Brian Paul [Tue, 1 Nov 2016 14:32:04 +0000 (08:32 -0600)]
gallium/hud: call fflush() after printing error messages

For Windows.  Otherwise, we don't see the message until the program exits.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: move svga_mark_surfaces_dirty() prototype to svga_surface.h
Brian Paul [Tue, 1 Nov 2016 14:19:42 +0000 (08:19 -0600)]
svga: move svga_mark_surfaces_dirty() prototype to svga_surface.h

Trivial.

8 years agosvga: whitespace / formatting clean-up in svga_context.c
Brian Paul [Tue, 1 Nov 2016 14:17:05 +0000 (08:17 -0600)]
svga: whitespace / formatting clean-up in svga_context.c

Trivial.

8 years agosvga: collect stats for time spent in svga_context_finish()
Brian Paul [Fri, 28 Oct 2016 20:04:32 +0000 (13:04 -0700)]
svga: collect stats for time spent in svga_context_finish()

This should have appeared with commit "svga: add guest statistic
gathering interface" from August 4, but was somehow lost.

8 years agosvga: invalidate new surface before it is bound to a render target view
Charmaine Lee [Wed, 26 Oct 2016 23:15:23 +0000 (16:15 -0700)]
svga: invalidate new surface before it is bound to a render target view

Invalidate a "new" surface before it is bound to a render target view or
depth stencil view in order to avoid the unnecessary host side copy
of the surface data before it is rendered to.
Note that a recycled surface is already invalidated before it is reused.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agoRevert "svga: use untyped surface formats in most cases"
Charmaine Lee [Wed, 26 Oct 2016 22:46:49 +0000 (15:46 -0700)]
Revert "svga: use untyped surface formats in most cases"

Using untyped surface formats causes huge performance degradation on Fusion.
This reverts commit eb0ced74f6decd1bf1e111b162e1389bede89af6 until
the backend has a better solution to address typeless surface formats.

8 years agosvga: allow quad blit for more formats
Charmaine Lee [Fri, 28 Oct 2016 18:48:34 +0000 (11:48 -0700)]
svga: allow quad blit for more formats

Currently the blitter will fail if the blit format is different from and
view-incompatible with the resource format. Instead of punting
to a software blit, which will stall the pipeline, we will
create a temporary resource to allow the blitter to work.

Fixes piglit test arb_copy_image-formats.
Also tested with MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: create BGRX render target view for BGRX_UNORM surface
Charmaine Lee [Tue, 25 Oct 2016 20:56:52 +0000 (13:56 -0700)]
svga: create BGRX render target view for BGRX_UNORM surface

Currently we adjust the view format when we are asked to create a
BGRA render target view for BGRX surface. But we only look for
SVGA3D_B8G8R8X8_TYPELESS surface format.
With this patch, we will also check for SVGA3D_B8G8R8X8_UNORM surface format,
and use SVGA3D_B8G8R8X8_UNORM as the view format for that case.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: add a helper function to check for typeless format
Charmaine Lee [Mon, 24 Oct 2016 17:50:29 +0000 (10:50 -0700)]
svga: add a helper function to check for typeless format

This patch adds a helper function svga_format_is_typeless() which
returns TRUE if the specified format is typeless.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: add SVGA_NEW_FRAME_BUFFER to svga_hw_tss_binding state atom
Brian Paul [Tue, 27 Sep 2016 16:06:46 +0000 (10:06 -0600)]
svga: add SVGA_NEW_FRAME_BUFFER to svga_hw_tss_binding state atom

We may need to re-emit texture bindings when the framebuffer state
changes.  In particular, emitting the texture binding can also involve
updating a texture from its backing copy during sampler view validation.
The backing copy is made during framebuffer validation.

This helps to fix an issue with Photoshop on VGPU9 (VMware bug 1723971).

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
8 years agosvga: allow copy_region if sample counts match
Charmaine Lee [Fri, 28 Oct 2016 18:39:47 +0000 (11:39 -0700)]
svga: allow copy_region if sample counts match

With this patch, we will allow blit with copy_region if the
source and destination textures have the same sample counts.

Fixes failures with piglit tests
 spec@arb_texture_float@multisample-formats 2 gl_arb_texture_float
 spec@arb_texture_rg@multisample-formats 2 gl_arb_texture_rg-float

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: set rendered-to flag after updating the texture using PredCopyRegion
Charmaine Lee [Mon, 3 Oct 2016 20:29:58 +0000 (13:29 -0700)]
svga: set rendered-to flag after updating the texture using PredCopyRegion

This patch sets the rendered-to flag for the subresource after it is
updated using the PredCopyRegion command. This is to ensure that the GB surface
is synced up properly before it is directly mapped.

Tested with MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: add can_use_upload flag
Charmaine Lee [Fri, 30 Sep 2016 23:41:12 +0000 (16:41 -0700)]
svga: add can_use_upload flag

This patch adds a flag "can_use_upload" to svga_texture structure
to avoid some checking of the upload availability at each transfer map time.

Tested with Lightsmark2008, Tropics, MTT glretrace, piglit.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: fix texture upload path condition
Charmaine Lee [Fri, 30 Sep 2016 22:52:14 +0000 (15:52 -0700)]
svga: fix texture upload path condition

As Thomas suggested, we'll first try to map directly to a GB surface.
If it is blocked, then we'll use texture upload buffer.
Also if a texture is already "rendered to", that is, the GB surface
is already out of sync, then we'll use the texture upload buffer
to avoid syncing the GB surface.
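
The decision described above can be modelled roughly as follows. This is
a sketch under stated assumptions: the enum, the function name and the
two boolean inputs are hypothetical, and the real driver tries the direct
map and falls back rather than knowing in advance whether it would block.

```c
#include <stdbool.h>

/* Hypothetical names for the two transfer-map paths. */
enum map_path {
   MAP_DIRECT,          /* map the GB surface directly */
   MAP_UPLOAD_BUFFER    /* go through the texture upload buffer */
};

static enum map_path
choose_map_path(bool rendered_to, bool direct_map_would_block)
{
   /* The GB surface is already out of sync: avoid syncing it. */
   if (rendered_to)
      return MAP_UPLOAD_BUFFER;

   /* A direct map is preferred, unless it would block. */
   if (direct_map_would_block)
      return MAP_UPLOAD_BUFFER;

   return MAP_DIRECT;
}
```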

Tested with Lightsmark2008, Tropics, MTT piglit, glretrace.

Reviewed-by: Brian Paul <brianp@vmware.com>
8 years agosvga: set rendered_to flag with texture uploaded using TransferFromBuffer command
Charmaine Lee [Thu, 29 Sep 2016 23:41:21 +0000 (16:41 -0700)]
svga: set rendered_to flag with texture uploaded using TransferFromBuffer command

This patch sets the rendered_to flag for the texture subresource that
is uploaded using the TransferFromBuffer command. This is to ensure that
the subresource will be read back or invalidated before it will be
directly mapped to. This makes sure that the content of the GB surface
will not be accidentally overwritten by the device at suspend/resume time.

Reviewed-by: Brian Paul <brianp@vmware.com>