mesa.git
7 years agoglsl: Fix constant evaluation of the rcp op.
Francisco Jerez [Tue, 24 Jan 2017 19:41:46 +0000 (11:41 -0800)]
glsl: Fix constant evaluation of the rcp op.

This avoids a regression in a future commit that introduces some
additional rcp operations.  According to the GLSL 4.10 specification:

"Dividing by 0 results in the appropriately signed IEEE Inf."

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
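
The rule quoted from the specification can be illustrated with a small C sketch. fold_rcp is a hypothetical helper, not the function this commit touches, and the sketch assumes the host compiler performs IEEE 754 float division.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical constant folder for rcp(x); not Mesa's actual code.
 * Assumes IEEE 754 float division on the host, where dividing a
 * nonzero value by zero yields an infinity carrying the zero's sign. */
float fold_rcp(float x)
{
    return 1.0f / x;
}

/* Self-check of the behavior the GLSL text asks for. */
void fold_rcp_check(void)
{
    assert(isinf(fold_rcp(0.0f)) && fold_rcp(0.0f) > 0.0f);   /* +Inf */
    assert(isinf(fold_rcp(-0.0f)) && fold_rcp(-0.0f) < 0.0f); /* -Inf */
    assert(fold_rcp(4.0f) == 0.25f);
}
```

A folder that special-cases x == 0 and returns 0 would fail both infinity checks, which is the kind of mismatch the commit title describes.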
7 years agomesa/program: Translate csel operation from GLSL IR.
Francisco Jerez [Tue, 24 Jan 2017 07:53:03 +0000 (23:53 -0800)]
mesa/program: Translate csel operation from GLSL IR.

This will be used internally by the GLSL front-end in order to
implement some built-in functions. Plumb it through Mesa IR for
back-ends that rely on this translation pass.

v2: Add comment.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
7 years agoetnaviv: Set SE.CLIP registers, add margins for scissor/clip registers
Wladimir J. van der Laan [Fri, 25 Nov 2016 06:42:43 +0000 (06:42 +0000)]
etnaviv: Set SE.CLIP registers, add margins for scissor/clip registers

This fixes rendering of full-screen quads (and other screen-filling
geometry, e.g. ioquake3 walls up-close) on gc3000. It should be a no-op
on other hardware.

- It looks like SE_CLIP registers were not set at all.
  I'm amazed that rendering worked without them. Emit them to
  avoid issues on gc3000.

- Define constants
  ETNA_SE_SCISSOR_MARGIN_RIGHT (0x1119)
  ETNA_SE_SCISSOR_MARGIN_BOTTOM (0x1111)
  ETNA_SE_CLIP_MARGIN_RIGHT (0xffff)
  ETNA_SE_CLIP_MARGIN_BOTTOM (0xffff)

  These demarcate the margin (fixp16) between the computed sizes and the
  value sent to the chip. I have set these to the numbers used by the
  Vivante driver for gc2000. I am not sure whether any old hardware was
  relying on the old numbers, or whether those were just a guess. But if
  so, these need to be moved to the _specs structure.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
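
As a rough illustration of what a fixp16 margin means, the sketch below converts an integer pixel coordinate to 16.16 fixed point and adds one of the margins listed above. The helper name and the way the margin is combined with the coordinate are assumptions of this sketch, not a copy of the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the commit message above. */
#define ETNA_SE_SCISSOR_MARGIN_RIGHT  0x1119
#define ETNA_SE_SCISSOR_MARGIN_BOTTOM 0x1111
#define ETNA_SE_CLIP_MARGIN_RIGHT     0xffff
#define ETNA_SE_CLIP_MARGIN_BOTTOM    0xffff

/* Hypothetical helper: pixel coordinate -> 16.16 fixed point, plus a
 * sub-pixel margin. Adding the margin to the shifted coordinate is an
 * assumption made for illustration. */
uint32_t fixp16_with_margin(uint32_t pixels, uint32_t margin)
{
    assert(pixels < 0x8000u);  /* keep the result inside 31 bits */
    assert(margin <= 0xffffu); /* margin is less than one pixel */
    return (pixels << 16) + margin;
}
```

For a 1920-pixel-wide scissor, fixp16_with_margin(1920, ETNA_SE_SCISSOR_MARGIN_RIGHT) gives 0x07801119: the integer part 0x0780 in the upper half and the margin in the fractional half.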
7 years agoetnaviv: Generate new sin/cos instructions on GC3000
Wladimir J. van der Laan [Tue, 31 Jan 2017 08:23:51 +0000 (09:23 +0100)]
etnaviv: Generate new sin/cos instructions on GC3000

Shaders using sin/cos instructions were not working on GC3000.

The reason for this turns out to be that these chips implement sin/cos
in a different way (but using the same opcodes):

- Need their input scaled by 1/pi instead of 2/pi.

- Output an x and y component, which need to be multiplied to
  get the result.

- tex_amode needs to be set to 1.

Add a new bit to the compiler specs and generate these instructions
as necessary.

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoanv/cmd_buffer: Use the proper depth input attachment surface state
Nanley Chery [Mon, 30 Jan 2017 20:27:15 +0000 (12:27 -0800)]
anv/cmd_buffer: Use the proper depth input attachment surface state

Commit 2852efcda40274acf3272611c6a3b7731523a72d moved the location of
the depth input attachment surface state from the render pass to the
image view, but failed to update the surface state location used when
emitting the binding table. Fix this by loading the surface state from
the correct location.

Fixes:
dEQP-VK.renderpass.formats.d16_unorm.input.*
dEQP-VK.renderpass.formats.d24_unorm_s8_uint.input.*
dEQP-VK.renderpass.formats.d32_sfloat.input.*
dEQP-VK.renderpass.formats.x8_d24_unorm_pack32.input.*
dEQP-VK.renderpass.attachment_allocation.input_output.93
dEQP-VK.renderpass.attachment_allocation.input_output.92
dEQP-VK.renderpass.attachment_allocation.input_output.82
dEQP-VK.renderpass.attachment_allocation.input_output.46

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
7 years agoglsl: fix heap-buffer-overflow
Bartosz Tomczyk [Tue, 31 Jan 2017 11:02:20 +0000 (12:02 +0100)]
glsl: fix heap-buffer-overflow

The `end+1` skips the ']', whereas the `strlen+1` includes the final
'\0' in the move to terminate the string.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
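
The pointer arithmetic described above can be sketched as follows. strip_first_subscript is a hypothetical helper written for illustration; it is not the function fixed by this commit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove the first "[...]" subscript from name,
 * in place. */
void strip_first_subscript(char *name)
{
    assert(name != NULL);

    char *open = strchr(name, '[');
    if (!open)
        return;

    char *end = strchr(open, ']');
    if (!end)
        return;

    /* The source starts at end + 1 to skip the ']'. The length is
     * strlen(end + 1) + 1 so that the terminating '\0' is moved too.
     * A larger count would read past the end of the buffer. */
    memmove(open, end + 1, strlen(end + 1) + 1);
}
```

With this sketch, "a[3].b" becomes "a.b" and "color[12]" becomes "color".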
7 years agoetnaviv: Cannot render to rb-swapped formats
Wladimir J. van der Laan [Wed, 7 Dec 2016 12:59:54 +0000 (12:59 +0000)]
etnaviv: Cannot render to rb-swapped formats

Exposing rb-swapped (or other swizzled) formats for rendering would
involve swizzling in the pixel shader. This is not the case at the
moment, so reject requests for creating such surfaces.

(GPUs that need an extra resolve step anyway due to multiple pixel
pipes, such as gc2000, might also do this swap in the resolve operation.
But this would be tricky to keep track of)

CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoetnaviv: Avoid infinite loop in find_frame()
Christian Gmeiner [Tue, 31 Jan 2017 08:10:27 +0000 (09:10 +0100)]
etnaviv: Avoid infinite loop in find_frame()

Using an unsigned loop control variable with '>= 0' leads to an
infinite loop.

Reported by clang:

etnaviv_compiler.c:1024:39: warning: comparison of unsigned expression
>= 0 is always true [-Wtautological-compare]
   for (unsigned sp = c->frame_sp; sp >= 0; sp--)
                                   ~~ ^  ~

v2: Simply use the same datatype as c->frame_sp is using.

CC: <mesa-stable@lists.freedesktop.org>
Reported-by: Rhys Kidd <rhyskidd@gmail.com>
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Rhys Kidd <rhyskidd@gmail.com>
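
A minimal sketch of the pattern, assuming the stack pointer is a signed int. The struct and its fields are hypothetical stand-ins, not the compiler's real data structures.

```c
#include <assert.h>

/* Hypothetical stand-in for the compiler's frame stack. */
struct frame_stack {
    int frame_sp;  /* signed index of the top frame, -1 when empty */
    int frames[8]; /* frame type tags */
};

/* Walk from the top of the stack down to index 0 and return the index
 * of the first frame with the given type, or -1 if there is none.
 * With `unsigned sp`, the condition `sp >= 0` is always true and
 * `sp--` at 0 wraps around, so the loop would never terminate. */
int find_frame(const struct frame_stack *s, int type)
{
    assert(s->frame_sp < 8);
    for (int sp = s->frame_sp; sp >= 0; sp--) {
        if (s->frames[sp] == type)
            return sp;
    }
    return -1;
}
```

Using the same signed type for the loop variable as for the stored stack pointer keeps the `>= 0` test meaningful and avoids a signed/unsigned conversion at the loop start.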
7 years agoradv/ac: apply slice rounding to 1d arrays as well.
Dave Airlie [Tue, 31 Jan 2017 00:09:11 +0000 (10:09 +1000)]
radv/ac: apply slice rounding to 1d arrays as well.

Fixes:
dEQP-VK.glsl.texture_functions.texture.*1darray*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/geom: check if esgs and gsvs ring exists before filling geom rings
Dave Airlie [Tue, 31 Jan 2017 00:37:25 +0000 (10:37 +1000)]
radv/geom: check if esgs and gsvs ring exists before filling geom rings

There are some corner cases where you end up with an esgs ring but no
gsvs ring. Test for both before dereferencing.

Fixes:
dEQP-VK.geometry.emit.points_emit_0_end_0

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: enable geometryShader and multiViewport capabilities.
Dave Airlie [Fri, 20 Jan 2017 02:42:26 +0000 (12:42 +1000)]
radv: enable geometryShader and multiViewport capabilities.

This enables geometry shader support on radv.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: handle layer export from vs->fs properly
Dave Airlie [Mon, 30 Jan 2017 19:56:49 +0000 (05:56 +1000)]
radv: handle layer export from vs->fs properly

Fixes:
dEQP-VK.geometry.layered.1d_array.fragment_layer

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: emit esgs itemsize register.
Dave Airlie [Fri, 20 Jan 2017 02:41:19 +0000 (12:41 +1000)]
radv: emit esgs itemsize register.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: handle prim id inputs to fragment shader.
Dave Airlie [Fri, 20 Jan 2017 02:40:13 +0000 (12:40 +1000)]
radv: handle prim id inputs to fragment shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: emit geometry shaders to hardware
Dave Airlie [Fri, 20 Jan 2017 02:33:45 +0000 (12:33 +1000)]
radv: emit geometry shaders to hardware

This emits the compiled geometry shader and other state registers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: emit geometry ring size and pointers via preamble (v2)
Dave Airlie [Fri, 20 Jan 2017 01:06:52 +0000 (11:06 +1000)]
radv: emit geometry ring size and pointers via preamble (v2)

This uses the scratch infrastructure to handle the esgs
and gsvs rings.

(this replaces the old code that did this with patching).

v2: fix correct ring sizes, reset sizes (Bas)

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: add gs ring size calculations to pipeline.
Dave Airlie [Fri, 20 Jan 2017 00:21:19 +0000 (10:21 +1000)]
radv: add gs ring size calculations to pipeline.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: add pipeline creation support for geometry shaders (v2.1)
Dave Airlie [Thu, 19 Jan 2017 23:55:37 +0000 (09:55 +1000)]
radv: add pipeline creation support for geometry shaders (v2.1)

This adds gs copy shader support to the pipeline cache, and a few
geometry-related changes.

v2: rebase for spill changes.
v2.1: fix incorrect pipeline destruction.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: handle primitive id
Dave Airlie [Thu, 19 Jan 2017 05:23:02 +0000 (15:23 +1000)]
radv/ac: handle primitive id

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: handle emitting vertex outputs to esgs ring.
Dave Airlie [Thu, 19 Jan 2017 05:14:31 +0000 (15:14 +1000)]
radv/ac: handle emitting vertex outputs to esgs ring.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: handle gs inputs
Dave Airlie [Thu, 19 Jan 2017 05:09:19 +0000 (15:09 +1000)]
radv/ac: handle gs inputs

This handles geometry shader inputs written by the vertex (es) shader
to the esgs ring.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: add geom input support to get deref offset.
Dave Airlie [Thu, 19 Jan 2017 05:05:37 +0000 (15:05 +1000)]
radv/ac: add geom input support to get deref offset.

This just adds the API and fixes up the callers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: handle invocation and primitive id intrinsics
Dave Airlie [Thu, 19 Jan 2017 04:54:18 +0000 (14:54 +1000)]
radv/ac: handle invocation and primitive id intrinsics

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: handle geometry emit vertex and end prim intrinsics.
Dave Airlie [Thu, 19 Jan 2017 04:52:07 +0000 (14:52 +1000)]
radv/ac: handle geometry emit vertex and end prim intrinsics.

This handles emitting things to the gsvs ring, and sending the
correct GS msgs.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: handle emitting gs epilogue
Dave Airlie [Thu, 19 Jan 2017 04:47:50 +0000 (14:47 +1000)]
radv/ac: handle emitting gs epilogue

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: add copy shader creation
Dave Airlie [Thu, 19 Jan 2017 03:55:19 +0000 (13:55 +1000)]
radv/ac: add copy shader creation

This creates the gs copy shader and compiles it.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: setup function parameters for vs as es and copy shader.
Dave Airlie [Thu, 19 Jan 2017 03:48:26 +0000 (13:48 +1000)]
radv/ac: setup function parameters for vs as es and copy shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: pass some necessary gs info back to state handling.
Dave Airlie [Thu, 19 Jan 2017 03:43:26 +0000 (13:43 +1000)]
radv: pass some necessary gs info back to state handling.

We need this info to program some registers.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: emit vertex shader to correct hw block.
Dave Airlie [Thu, 19 Jan 2017 03:26:01 +0000 (13:26 +1000)]
radv: emit vertex shader to correct hw block.

This emits the shader to the ES block in the correct case.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: propagate as_es flag into shader info from key.
Dave Airlie [Thu, 19 Jan 2017 03:23:55 +0000 (13:23 +1000)]
radv/ac: propagate as_es flag into shader info from key.

This just places the flag into the shader info so we can use it from
the driver after we create the shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: extend shader stage code to cover geometry shaders.
Dave Airlie [Thu, 19 Jan 2017 02:58:00 +0000 (12:58 +1000)]
radv: extend shader stage code to cover geometry shaders.

This enables the paths for setting up user ptrs to vs/es and gs.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: start setting up the geom shader rings (v2)
Dave Airlie [Wed, 18 Jan 2017 06:13:20 +0000 (16:13 +1000)]
radv/ac: start setting up the geom shader rings (v2)

This sets up the rings and adds the variables
needed to make them work.

v2: rework for sharing ring and scratch

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: handle geom shader sgpr/vgpr inputs
Dave Airlie [Wed, 18 Jan 2017 05:22:44 +0000 (15:22 +1000)]
radv/ac: handle geom shader sgpr/vgpr inputs

This just sets up the gpr inputs.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: add geom shader sendmsg defines.
Dave Airlie [Wed, 18 Jan 2017 05:17:35 +0000 (15:17 +1000)]
radv/ac: add geom shader sendmsg defines.

This just adds some defines needed for geom shaders.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: add some geom shader info from nir->ac shader.
Dave Airlie [Wed, 18 Jan 2017 05:11:52 +0000 (15:11 +1000)]
radv/ac: add some geom shader info from nir->ac shader.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: move hw vertex shader emit to separate function
Dave Airlie [Wed, 18 Jan 2017 04:48:09 +0000 (14:48 +1000)]
radv: move hw vertex shader emit to separate function

This is to later allow ES shaders to be emitted.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: fixup ia multi vgt param code to handle geom shaders.
Dave Airlie [Wed, 18 Jan 2017 03:55:05 +0000 (13:55 +1000)]
radv: fixup ia multi vgt param code to handle geom shaders.

This fixes up a few of the commented out blocks.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: add code to set gs_table_depth.
Dave Airlie [Wed, 18 Jan 2017 03:54:17 +0000 (13:54 +1000)]
radv: add code to set gs_table_depth.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: add small helper to denote when a geom shader is in the pipeline.
Dave Airlie [Wed, 18 Jan 2017 03:50:16 +0000 (13:50 +1000)]
radv: add small helper to denote when a geom shader is in the pipeline.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Prevent Coverity warning
Robert Foss [Mon, 30 Jan 2017 21:26:58 +0000 (16:26 -0500)]
radv: Prevent Coverity warning

Prevent Coverity from reporting a potential error when src is
not initialized in the switch statement.

Coverity-Id: 1396397
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agomesa: add new MESA_GLSL flag for printing shader cache debug info
Timothy Arceri [Fri, 8 Jul 2016 02:44:44 +0000 (12:44 +1000)]
mesa: add new MESA_GLSL flag for printing shader cache debug info

Reviewed-by: Eric Anholt <eric@anholt.net>
7 years agoglsl: add cache to ctx and add sha1 string fields
Carl Worth [Thu, 14 Apr 2016 01:04:23 +0000 (11:04 +1000)]
glsl: add cache to ctx and add sha1 string fields

We also add a flag for detecting shaders written to shader cache.

V2: don't leak cache

Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
7 years agoglsl: add new uniform fields to be used to restore state from cache
Carl Worth [Thu, 14 Apr 2016 00:48:19 +0000 (10:48 +1000)]
glsl: add new uniform fields to be used to restore state from cache

Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
7 years agoglsl: Switch to disable-by-default for the GLSL shader cache
Carl Worth [Mon, 16 Mar 2015 18:46:20 +0000 (11:46 -0700)]
glsl: Switch to disable-by-default for the GLSL shader cache

The shader cache is expected to be developed incrementally over a
fairly long series of commits. For that period of instability, we
require users to opt into the shader cache by setting:

MESA_GLSL_CACHE_ENABLE=1

In the future, when the shader cache is complete, we can revert this
commit so that the cache will be on by default.

The user can always disable the cache with
MESA_GLSL_CACHE_DISABLE=1. That functionality is not affected by this
commit, (nor will it be affected by the future revert).

Reviewed-by: Eric Anholt <eric@anholt.net>
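
The opt-in and opt-out rules described above could be sketched like this. Both function names are hypothetical, and the strict comparison against "1" is an assumption; Mesa's real environment parsing may accept other values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true only if the variable is set to exactly "1". */
static bool env_is_one(const char *name)
{
    assert(name != NULL);
    const char *value = getenv(name);
    return value != NULL && strcmp(value, "1") == 0;
}

/* Sketch of the policy in the commit message: the cache is off unless
 * the user opts in, and MESA_GLSL_CACHE_DISABLE=1 always wins. */
bool glsl_cache_enabled(void)
{
    if (env_is_one("MESA_GLSL_CACHE_DISABLE"))
        return false;
    return env_is_one("MESA_GLSL_CACHE_ENABLE");
}
```

Checking the disable variable first means a later revert only has to change the final return value to make the cache on by default, while the opt-out path stays untouched.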
7 years agoradv/ac: implement txs for buffer textures.
Dave Airlie [Mon, 30 Jan 2017 19:19:56 +0000 (05:19 +1000)]
radv/ac: implement txs for buffer textures.

This fixes a bunch of buffer-related
dEQP-VK.memory.pipeline_barrier.*
tests that were crashing in LLVM because this was missing.

Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: handle nir irem opcode.
Dave Airlie [Mon, 30 Jan 2017 18:50:30 +0000 (04:50 +1000)]
radv/ac: handle nir irem opcode.

This fixes:
dEQP-VK.spirv_assembly.instruction.compute.opsrem.*

Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: fix multisample subpass image.
Dave Airlie [Mon, 30 Jan 2017 06:13:30 +0000 (16:13 +1000)]
radv/ac: fix multisample subpass image.

We weren't adding the fragment position properly.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: handle transfer_write as a dst flag.
Dave Airlie [Mon, 30 Jan 2017 03:17:05 +0000 (13:17 +1000)]
radv: handle transfer_write as a dst flag.

It appears we can get image barriers like:
    srcStageMask:                   VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
    dstStageMask:                   VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
    dependencyFlags:                VkDependencyFlags = 0
    memoryBarrierCount:             uint32_t = 0
    pMemoryBarriers:                const VkMemoryBarrier* = NULL
    bufferMemoryBarrierCount:       uint32_t = 0
    pBufferMemoryBarriers:          const VkBufferMemoryBarrier* = NULL
    imageMemoryBarrierCount:        uint32_t = 1
    pImageMemoryBarriers:           const VkImageMemoryBarrier* = 0x7ffc882367b0
        pImageMemoryBarriers[0]:        const VkImageMemoryBarrier = 0x7ffc882367b0:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER (45)
            pNext:                          const void* = NULL
            srcAccessMask:                  VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
            dstAccessMask:                  VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
            oldLayout:                      VkImageLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (7)
            newLayout:                      VkImageLayout = VK_IMAGE_LAYOUT_GENERAL (1)
            srcQueueFamilyIndex:            uint32_t = 4294967295
            dstQueueFamilyIndex:            uint32_t = 4294967295
            image:                          VkImage = 0x2df55e0
            subresourceRange:               VkImageSubresourceRange = 0x7ffc882367e0:
                aspectMask:                     VkImageAspectFlags = 1 (VK_IMAGE_ASPECT_COLOR_BIT)
                baseMipLevel:                   uint32_t = 0
                levelCount:                     uint32_t = 1
                baseArrayLayer:                 uint32_t = 0
                layerCount:                     uint32_t = 1

This fixes all the CTS dEQP-VK.memory.pipeline_barrier.transfer_dst tests here,
though I am not sure whether this is too large a hammer.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
7 years agor600: fix a compilation warning in r600_screen_create()
Samuel Pitoiset [Mon, 30 Jan 2017 12:55:53 +0000 (13:55 +0100)]
r600: fix a compilation warning in r600_screen_create()

Should be r600_common_screen instead of r600_screen.

Fixes: 80157a2c20 ("gallium/radeon: clean up r600_query_init_backend_mask")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter
Marek Olšák [Tue, 24 Jan 2017 22:37:56 +0000 (23:37 +0100)]
gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter

to simplify things in draw_vbo a little

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agowinsys/radeon: clamp vram_vis_size to 256MB
Marek Olšák [Fri, 27 Jan 2017 11:11:33 +0000 (12:11 +0100)]
winsys/radeon: clamp vram_vis_size to 256MB

the value from the kernel is wrong

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: handle count_from_stream_output in a few IA_MULTI_VGT_PARAM cases
Marek Olšák [Sun, 29 Jan 2017 21:28:04 +0000 (22:28 +0100)]
radeonsi: handle count_from_stream_output in a few IA_MULTI_VGT_PARAM cases

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: don't invoke DCC decompression in update_all_texture_descriptors
Marek Olšák [Sun, 29 Jan 2017 22:59:59 +0000 (23:59 +0100)]
radeonsi: don't invoke DCC decompression in update_all_texture_descriptors

This fixes a bug uncovered by the 17-part patch series, specifically:
  "gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter"

If dirty_tex_counter has been updated and set_shader_image invokes DCC
decompression, the DCC decompression itself checks the counter and updates
descriptors, which in turn invokes the same DCC decompression. The blitter
can't handle the recursion and the driver eventually crashes.

Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: fold info->indirect conditionals into the last one in draw_vbo
Marek Olšák [Thu, 26 Jan 2017 02:02:23 +0000 (03:02 +0100)]
radeonsi: fold info->indirect conditionals into the last one in draw_vbo

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: atomize the scratch buffer state
Marek Olšák [Thu, 26 Jan 2017 01:56:15 +0000 (02:56 +0100)]
radeonsi: atomize the scratch buffer state

The update frequency is very low.

Difference: Only account for the size when allocating a new one and when
            starting a new IB, and check for NULL. (v3)

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agor600: Fix stack overflow
Bartosz Tomczyk [Mon, 30 Jan 2017 13:07:45 +0000 (14:07 +0100)]
r600: Fix stack overflow

Commit 7b5878ee0491e7a93914389a8369cd6752b9757d increased the number of
outputs to 64, but left the output array intact. This caused a stack
overflow when the number of outputs is greater than 32. Found by ASAN.

Cc: "12.0 13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/radeon: add new HUD queries for monitoring the CP
Samuel Pitoiset [Mon, 30 Jan 2017 11:52:56 +0000 (12:52 +0100)]
gallium/radeon: add new HUD queries for monitoring the CP

There are even more counters in the CP_STAT register but I think
these ones are enough for now.

v2: only read (and expose) CP_STAT on VI+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/radeon: add new GPU-sdma-busy HUD query
Samuel Pitoiset [Mon, 30 Jan 2017 11:52:24 +0000 (12:52 +0100)]
gallium/radeon: add new GPU-sdma-busy HUD query

For simplicity, GPU-sdma-busy will return 0 on previous gens.

v2: only read SRBM_STATUS2 on Evergreen+

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agogallium/radeon: rename grbm to mmio in the gpu load path
Samuel Pitoiset [Thu, 26 Jan 2017 19:54:45 +0000 (20:54 +0100)]
gallium/radeon: rename grbm to mmio in the gpu load path

We also want to monitor other MMIO counters like SRBM_STATUS2 in
order to know if SDMA is busy.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agowinsys/amdgpu: add a fast exit path into amdgpu_cs_add_buffer
Marek Olšák [Thu, 26 Jan 2017 16:29:32 +0000 (17:29 +0100)]
winsys/amdgpu: add a fast exit path into amdgpu_cs_add_buffer

The time spent in the function dropped by 37% for torcs.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agowinsys/amdgpu: do not iterate twice when adding fence dependencies
Samuel Pitoiset [Fri, 27 Jan 2017 13:35:23 +0000 (14:35 +0100)]
winsys/amdgpu: do not iterate twice when adding fence dependencies

The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush()
in the DXMD benchmark.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agowinsys/amdgpu: add one likely() call in amdgpu_cs_flush()
Samuel Pitoiset [Fri, 27 Jan 2017 13:35:22 +0000 (14:35 +0100)]
winsys/amdgpu: add one likely() call in amdgpu_cs_flush()

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agohud: fix compilation warnings in hud_nic_graph_install()
Samuel Pitoiset [Mon, 30 Jan 2017 10:19:14 +0000 (11:19 +0100)]
hud: fix compilation warnings in hud_nic_graph_install()

v2: use PRId64 instead of PRIx64

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
7 years agost/mesa: make st_texture_get_sampler_view() static
Samuel Pitoiset [Fri, 27 Jan 2017 13:34:52 +0000 (14:34 +0100)]
st/mesa: make st_texture_get_sampler_view() static

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/radeon: remove r600_common_context::max_db
Marek Olšák [Thu, 26 Jan 2017 01:40:34 +0000 (02:40 +0100)]
gallium/radeon: remove r600_common_context::max_db

this cleanup is based on the vulkan driver, which seems to do the same thing

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agowinsys/amdgpu: fix ADDR_REGISTER_VALUE::backendDisables
Marek Olšák [Thu, 26 Jan 2017 01:16:18 +0000 (02:16 +0100)]
winsys/amdgpu: fix ADDR_REGISTER_VALUE::backendDisables

This would be a fix if the value was used anywhere.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/radeon: clean up r600_query_init_backend_mask
Marek Olšák [Thu, 26 Jan 2017 00:33:23 +0000 (01:33 +0100)]
gallium/radeon: clean up r600_query_init_backend_mask

This just needs to be done for r600g in the screen.
We don't need an IB submission for every new context created for GCN.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: precompute IA_MULTI_VGT_PARAM values into a table
Marek Olšák [Wed, 25 Jan 2017 01:47:15 +0000 (02:47 +0100)]
radeonsi: precompute IA_MULTI_VGT_PARAM values into a table

The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris
Marek Olšák [Wed, 25 Jan 2017 02:27:34 +0000 (03:27 +0100)]
radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: state atom IDs don't have to be off by one
Marek Olšák [Tue, 24 Jan 2017 23:15:35 +0000 (00:15 +0100)]
radeonsi: state atom IDs don't have to be off by one

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: use a bitmask for looping over dirty PM4 states
Marek Olšák [Tue, 24 Jan 2017 23:09:24 +0000 (00:09 +0100)]
radeonsi: use a bitmask for looping over dirty PM4 states

also move it to draw_vbo, because it should be 0 in most cases

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: atomize L2 prefetches
Marek Olšák [Tue, 24 Jan 2017 22:28:32 +0000 (23:28 +0100)]
radeonsi: atomize L2 prefetches

to move the big conditional statement out of draw_vbo

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: unbind disabled shader stages to prevent useless L2 prefetches
Marek Olšák [Tue, 24 Jan 2017 21:54:06 +0000 (22:54 +0100)]
radeonsi: unbind disabled shader stages to prevent useless L2 prefetches

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: also prefetch compute shaders
Marek Olšák [Tue, 24 Jan 2017 02:41:05 +0000 (03:41 +0100)]
radeonsi: also prefetch compute shaders

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: update dirty_level_mask only after the first draw after FB change
Marek Olšák [Tue, 24 Jan 2017 02:25:40 +0000 (03:25 +0100)]
radeonsi: update dirty_level_mask only after the first draw after FB change

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agogallium/radeon: allow VRAM-only placements again on APUs & recent amdgpu
Marek Olšák [Mon, 23 Jan 2017 22:41:47 +0000 (23:41 +0100)]
gallium/radeon: allow VRAM-only placements again on APUs & recent amdgpu

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: don't set +fp64-denormals
Marek Olšák [Mon, 23 Jan 2017 22:32:31 +0000 (23:32 +0100)]
radeonsi: don't set +fp64-denormals

it's the default and the name will change to +fp64-fp16-denormals.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoradeonsi: remove si_shader_context::param_tess_offchip
Marek Olšák [Sun, 22 Jan 2017 12:58:05 +0000 (13:58 +0100)]
radeonsi: remove si_shader_context::param_tess_offchip

we don't use on-chip tess.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agoetnaviv: force vertex buffers through the MMU
Lucas Stach [Mon, 21 Nov 2016 10:54:25 +0000 (11:54 +0100)]
etnaviv: force vertex buffers through the MMU

This fixes a vertex data corruption issue if some of the vertex streams
go through the MMU and some don't.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
7 years agoradv: Expose VK_KHR_maintenance1
Andres Rodriguez [Fri, 27 Jan 2017 05:03:08 +0000 (00:03 -0500)]
radv: Expose VK_KHR_maintenance1

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: Fix vkCmdCopyImage for 2d slices into 3d Images
Andres Rodriguez [Fri, 27 Jan 2017 05:03:07 +0000 (00:03 -0500)]
radv: Fix vkCmdCopyImage for 2d slices into 3d Images

Previously the z offset of the destination image was being ignored. It
should be taken into account when copying into a 3d target.

Also, img_extent_el.depth was being incorrectly clamped to 1 due to the
source image being VK_IMAGE_TYPE_2D. This would result in the blit
failing to iterate over all the 3d slices. Instead we clamp to the
destination image type.
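
The two fixes described above can be sketched as follows. This is a minimal illustration, not the radv code: the enum, the struct and both helper names are hypothetical, and only the rule itself (clamp the depth by the destination image type, and add the destination z offset) comes from the commit message.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Vulkan image types involved. */
enum image_type { IMAGE_TYPE_2D, IMAGE_TYPE_3D };

/* Number of slices the blit loop should visit. Clamping by the
 * destination type keeps the full depth when the target is 3D, even
 * if the source image is 2D. */
static uint32_t slices_to_copy(enum image_type dst_type, uint32_t extent_depth)
{
    return dst_type == IMAGE_TYPE_3D ? extent_depth : 1;
}

/* Destination z coordinate for the i-th slice; the destination z offset
 * must be added rather than ignored. */
static uint32_t dst_slice_z(uint32_t dst_offset_z, uint32_t slice)
{
    return dst_offset_z + slice;
}
```

With a 2D source and a 3D destination, a depth of 4 at z offset 2 would then visit slices 2 through 5 instead of only the first one.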

Fixes failures in CTS tests:
dEQP-VK.api.copy_and_blit.image_to_image.3d_images.*

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: Expose transfer format features.
Bas Nieuwenhuizen [Fri, 27 Jan 2017 05:03:06 +0000 (00:03 -0500)]
radv: Expose transfer format features.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
7 years agoradv: Don't allow any operations on non-supported depth/stencil formats.
Bas Nieuwenhuizen [Fri, 27 Jan 2017 05:03:05 +0000 (00:03 -0500)]
radv: Don't allow any operations on non-supported depth/stencil formats.

We really use the depth block for the blits.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: use new error codes for AllocateDescriptorSets
Andres Rodriguez [Fri, 27 Jan 2017 05:03:04 +0000 (00:03 -0500)]
radv: use new error codes for AllocateDescriptorSets

There is a new error code in Maintenance1 that is more specific to the
situation: VK_ERROR_OUT_OF_POOL_MEMORY_KHR

Fixes CTS test case:
dEQP-VK.api.descriptor_pool.out_of_pool_memory

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: vkAllocateCommandBuffers should NULL all output handles
Andres Rodriguez [Fri, 27 Jan 2017 05:03:03 +0000 (00:03 -0500)]
radv: vkAllocateCommandBuffers should NULL all output handles

This is part of the spec and fixes CTS tests:
dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_*
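
A minimal sketch of the required behavior, assuming a hypothetical allocator (alloc_one) that can fail partway through a batch. None of these names come from radv; the point is only that on failure every output slot ends up NULL, including the ones that had already been filled.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator: succeeds until *budget reaches zero. */
static void *alloc_one(int *budget)
{
    if (*budget == 0)
        return NULL;
    (*budget)--;
    return malloc(16);
}

/* Fill out[0..count). Returns 0 on success and -1 on failure. On
 * failure, the objects created so far are freed and every entry of
 * the output array is set to NULL. */
static int alloc_all(void **out, size_t count, int *budget)
{
    for (size_t i = 0; i < count; i++) {
        out[i] = alloc_one(budget);
        if (out[i] == NULL) {
            for (size_t j = 0; j < i; j++)
                free(out[j]);
            for (size_t j = 0; j < count; j++)
                out[j] = NULL;
            return -1;
        }
    }
    return 0;
}
```

Clearing the whole array, not just the entries that were written, means the caller never sees stale handles in slots the loop did not reach.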

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoradv: add trim command pool stub
Andres Rodriguez [Fri, 27 Jan 2017 05:03:02 +0000 (00:03 -0500)]
radv: add trim command pool stub

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
7 years agoi965: Support the force_glsl_version driconf option.
Kenneth Graunke [Sat, 21 Jan 2017 04:33:57 +0000 (20:33 -0800)]
i965: Support the force_glsl_version driconf option.

Gallium drivers have had this for a while.  It makes sense to support
it consistently across drivers, so expose it in i965 as well.

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
7 years agoi965: Fix check for negative pitch in can_do_fast_copy_blit().
Kenneth Graunke [Thu, 26 Jan 2017 09:27:42 +0000 (01:27 -0800)]
i965: Fix check for negative pitch in can_do_fast_copy_blit().

At this point, the pitch is in bytes.  We haven't yet divided the pitch
by 4 for tiled surfaces, so abs(pitch) may be larger than 32K.  This
means the bit 15 trick won't work.

The caller now has signed integers anyway, so just pass those through
and do the obvious check.
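
To illustrate why a bit 15 test stops working once the pitch can exceed 32K, here is a small sketch. Both function names are hypothetical and are not the i965 helpers; the first models the old trick, the second the "obvious check" on a signed value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the old trick: treat bit 15 of the pitch as a sign bit.
 * This is only meaningful while abs(pitch) fits in 15 bits. */
static bool pitch_is_negative_bit15(uint32_t pitch)
{
    return (pitch & (1u << 15)) != 0;
}

/* The straightforward check on a signed pitch in bytes. */
static bool pitch_is_negative_signed(int32_t pitch)
{
    return pitch < 0;
}
```

A positive byte pitch of 40000 (0x9C40) has bit 15 set, so the trick reports it as negative, while a pitch of -65536 (0xFFFF0000 when reinterpreted as unsigned) has bit 15 clear and is reported as positive. The signed comparison gets both cases right.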

Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
7 years agoradv: Handle command buffers that need scratch memory.
Bas Nieuwenhuizen [Sun, 29 Jan 2017 12:53:05 +0000 (13:53 +0100)]
radv: Handle command buffers that need scratch memory.

v2: Create the descriptor BO with CPU access.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv: Track scratch usage across pipelines & command buffers.
Bas Nieuwenhuizen [Sun, 29 Jan 2017 14:20:03 +0000 (15:20 +0100)]
radv: Track scratch usage across pipelines & command buffers.

Based on code written by Dave Airlie.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/ac: Add compiler support for spilling.
Bas Nieuwenhuizen [Sat, 28 Jan 2017 22:51:19 +0000 (23:51 +0100)]
radv/ac: Add compiler support for spilling.

Based on code written by Dave Airlie.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoradv/amdgpu: Support a preamble CS.
Bas Nieuwenhuizen [Thu, 26 Jan 2017 23:19:52 +0000 (00:19 +0100)]
radv/amdgpu: Support a preamble CS.

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
7 years agoi965: add assert to while_jumps_before_offset()
Timothy Arceri [Thu, 26 Jan 2017 02:50:42 +0000 (13:50 +1100)]
i965: add assert to while_jumps_before_offset()

jip should always be negative here, as it's the result of
do instruction - while instruction.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agoi965: fix up asserts in brw_inst_set_jip()
Timothy Arceri [Thu, 26 Jan 2017 02:50:41 +0000 (13:50 +1100)]
i965: fix up asserts in brw_inst_set_jip()

We are casting from a signed 32-bit int to an unsigned 16-bit int,
so shift 15 bits rather than 16.
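
A sketch of the range check this implies, assuming the field stores the offset as a 16-bit two's complement value. The helper names are hypothetical and are not the actual brw_inst code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A signed value survives truncation to 16 bits only if it lies in
 * [-(1 << 15), (1 << 15) - 1]. Bounds built from (1 << 16) would also
 * accept values such as 40000, which come back negative when the
 * 16-bit field is read as signed. */
static bool fits_in_16_bits(int32_t value)
{
    return value >= -(1 << 15) && value <= (1 << 15) - 1;
}

/* Store a signed jump offset into a 16-bit field. */
static uint16_t encode_jump_offset(int32_t value)
{
    assert(fits_in_16_bits(value));
    return (uint16_t)value;
}
```

For example, an offset of -2 is encoded as 0xFFFE, whereas 32768 is rejected by the assert because it no longer fits once one bit is reserved for the sign.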

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
7 years agollvmpipe: Use LLVMDumpModule, not DumpModule.
Bas Nieuwenhuizen [Sun, 29 Jan 2017 16:03:25 +0000 (17:03 +0100)]
llvmpipe: Use LLVMDumpModule, not DumpModule.

Forgot the prefix ...

Fixes: 0fca80b3db64dc1d004f78e22b9de86a07e9de96
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
7 years agovarious: Fix missing DumpModule with recent LLVM.
Bas Nieuwenhuizen [Sat, 28 Jan 2017 16:32:05 +0000 (17:32 +0100)]
various: Fix missing DumpModule with recent LLVM.

Since LLVM revision 293359, DumpModule is only implemented in debug
builds or when LLVM_ENABLE_DUMP is set.

This patch adds a direct replacement for the function for radv and
radeonsi. However, as I don't know a good place to put common LLVM
code for all three, I inlined the implementation for llvmpipe.

v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation

Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
7 years agor600g: use ieee variants of multiplication instructions
Ilia Mirkin [Tue, 24 Jan 2017 01:53:50 +0000 (20:53 -0500)]
r600g: use ieee variants of multiplication instructions

This matches the behavior of most other drivers, including nouveau,
radeonsi, and i965.
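
For context, a small sketch of how the two flavors of multiply can differ. It assumes the non-IEEE variant returns zero whenever either operand is zero, which is how such legacy multiplies are commonly described; that rule and both function names are assumptions made for illustration and are not taken from the r600g code.

```c
#include <math.h>

/* IEEE multiply: 0 * Inf and 0 * NaN both produce NaN. */
static float mul_ieee(float a, float b)
{
    return a * b;
}

/* Assumed legacy behavior: zero times anything is zero, even when the
 * other operand is Inf or NaN. */
static float mul_legacy(float a, float b)
{
    if (a == 0.0f || b == 0.0f)
        return 0.0f;
    return a * b;
}
```

The two functions agree for ordinary finite inputs and differ only when a zero meets an Inf or a NaN.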

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agor600g: add support for optionally using non-IEEE mul ops
Ilia Mirkin [Tue, 24 Jan 2017 02:02:28 +0000 (21:02 -0500)]
r600g: add support for optionally using non-IEEE mul ops

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
7 years agovc4: Coalesce into TLB writes as well as VPM/tex.
Eric Anholt [Wed, 18 Jan 2017 22:31:45 +0000 (09:31 +1100)]
vc4: Coalesce into TLB writes as well as VPM/tex.

This generally cuts an instruction when blending is enabled and we thus
have a single instruction generating the color value.

total instructions in shared programs: 91759 -> 91634 (-0.14%)
instructions in affected programs:     5338 -> 5213 (-2.34%)

7 years agovc4: Avoid an extra temporary and mov in ffloor/ffract/fceil.
Eric Anholt [Wed, 18 Jan 2017 03:23:14 +0000 (14:23 +1100)]
vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil.

shader-db results:

total instructions in shared programs: 92611 -> 91764 (-0.91%)
instructions in affected programs:     27417 -> 26570 (-3.09%)

The star is one shader in glmark2's terrain (drops 16% of its
instructions), but there are also wins in mupen64plus and glb2.7.