Dave Airlie [Thu, 19 Jan 2017 03:55:19 +0000 (13:55 +1000)]
radv/ac: add copy shader creation
This create the gs copy shader and compiles it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 03:48:26 +0000 (13:48 +1000)]
radv/ac: setup function parameters for vs as es and copy shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 03:43:26 +0000 (13:43 +1000)]
radv: pass some necessary gs info back to state handling.
We need this info to program some registers.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 03:26:01 +0000 (13:26 +1000)]
radv: emit vertex shader to correct hw block.
This emits the shader to the ES block in the correct case.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 03:23:55 +0000 (13:23 +1000)]
radv/ac: propogate as_es flag into shader info from key.
This just places the flag into the shader info so we can use it from
the driver after we create the shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 19 Jan 2017 02:58:00 +0000 (12:58 +1000)]
radv: extend shader stage code to cover geometry shaders.
This enables the paths for setting up user ptrs to vs/es and gs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 06:13:20 +0000 (16:13 +1000)]
radv/ac: start setting up the geom shader rings (v2)
This sets up the rings and adds the variables
needed to make them work.
v2: rework for sharing ring and scratch
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 05:22:44 +0000 (15:22 +1000)]
radv/ac: handle geom shader sgpr/vgpr inputs
This just sets up the gpr inputs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 05:17:35 +0000 (15:17 +1000)]
radv/ac: add geom shader sendmsg defines.
This just adds some defines needed for geom shaders.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 05:11:52 +0000 (15:11 +1000)]
radv/ac: add some geom shader info from nir->ac shader.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 04:48:09 +0000 (14:48 +1000)]
radv: move hw vertex shader emit to separate function
This is to later allow ES shaders to be emitted.
Review-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 03:55:05 +0000 (13:55 +1000)]
radv: fixup ia multi vgt param code to handle geom shaders.
This fixes up a few of the commented out blocks.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 03:54:17 +0000 (13:54 +1000)]
radv: add code to set gs_table_depth.
Review-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Wed, 18 Jan 2017 03:50:16 +0000 (13:50 +1000)]
radv: add small helper to denote when a geom shader is in the pipeline.
Review-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Robert Foss [Mon, 30 Jan 2017 21:26:58 +0000 (16:26 -0500)]
radv: Prevent Coverity warning
Prevent Coverity seeing potential errors when src is
no initialized in the switch case.
Coverity-Id:
1396397
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Timothy Arceri [Fri, 8 Jul 2016 02:44:44 +0000 (12:44 +1000)]
mesa: add new MESA_GLSL flag for printing shader cache debug info
Reviewed-by: Eric Anholt <eric@anholt.net>
Carl Worth [Thu, 14 Apr 2016 01:04:23 +0000 (11:04 +1000)]
glsl: add cache to ctx and add sha1 string fields
We also add a flag for detecting shaders written to shader cache.
V2: dont leak cache
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Carl Worth [Thu, 14 Apr 2016 00:48:19 +0000 (10:48 +1000)]
glsl: add new uniform fields to be used to restore state from cache
Signed-off-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Carl Worth [Mon, 16 Mar 2015 18:46:20 +0000 (11:46 -0700)]
glsl: Switch to disable-by-default for the GLSL shader cache
The shader cache is expected to be developed incrementally over a
fairly long series of commits. For that period of instability, we
require users to opt into the shader cache by setting:
MESA_GLSL_CACHE_ENABLE=1
In the future, when the shader cache is complete, we can revert this
commit so that the cache will be on by default.
The user can always disable the cache with
MESA_GLSL_CACHE_DISABLE=1. That functionality is not affected by this
commit, (nor will it be affected by the future revert).
Reviewed-by: Eric Anholt <eric@anholt.net>
Dave Airlie [Mon, 30 Jan 2017 19:19:56 +0000 (05:19 +1000)]
radv/ac: implement txs for buffer textures.
This fixes a bunch of buffer related:
dEQP-VK.memory.pipeline_barrier.*
tests, that were crashing in LLVM due to this being missing.
Reviewed-by: Andres Rodriguez<andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 30 Jan 2017 18:50:30 +0000 (04:50 +1000)]
radv/ac: handle nir irem opcode.
This fixes:
dEQP-VK.spirv_assembly.instruction.compute.opsrem.*
Reviewed-by: Andres Rodriguez <andresx7@gmail.com>
Cc: "17.0" <mesa-stable@lists.freedesktop.org"
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 30 Jan 2017 06:13:30 +0000 (16:13 +1000)]
radv/ac: fix multisample subpass image.
We weren't adding the fragment position properly.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 30 Jan 2017 03:17:05 +0000 (13:17 +1000)]
radv: handle transfer_write as a dst flag.
It appears we can get image barriers like:
srcStageMask: VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
dstStageMask: VkPipelineStageFlags = 4096 (VK_PIPELINE_STAGE_TRANSFER_BIT)
dependencyFlags: VkDependencyFlags = 0
memoryBarrierCount: uint32_t = 0
pMemoryBarriers: const VkMemoryBarrier* = NULL
bufferMemoryBarrierCount: uint32_t = 0
pBufferMemoryBarriers: const VkBufferMemoryBarrier* = NULL
imageMemoryBarrierCount: uint32_t = 1
pImageMemoryBarriers: const VkImageMemoryBarrier* = 0x7ffc882367b0
pImageMemoryBarriers[0]: const VkImageMemoryBarrier = 0x7ffc882367b0:
sType: VkStructureType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER (45)
pNext: const void* = NULL
srcAccessMask: VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
dstAccessMask: VkAccessFlags = 4096 (VK_ACCESS_TRANSFER_WRITE_BIT)
oldLayout: VkImageLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL (7)
newLayout: VkImageLayout = VK_IMAGE_LAYOUT_GENERAL (1)
srcQueueFamilyIndex: uint32_t =
4294967295
dstQueueFamilyIndex: uint32_t =
4294967295
image: VkImage = 0x2df55e0
subresourceRange: VkImageSubresourceRange = 0x7ffc882367e0:
aspectMask: VkImageAspectFlags = 1 (VK_IMAGE_ASPECT_COLOR_BIT)
baseMipLevel: uint32_t = 0
levelCount: uint32_t = 1
baseArrayLayer: uint32_t = 0
layerCount: uint32_t = 1
This fixes all the CTS dEQP-VK.memory.pipeline_barrier.transfer_dst tests here,
not sure if this is a too large hammer.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Mon, 30 Jan 2017 12:55:53 +0000 (13:55 +0100)]
r600: fix a compilation warning in r600_screen_create()
Should be r600_common_screen instead of r600_screen.
Fixes: 80157a2c20 ("gallium/radeon: clean up r600_query_init_backend_mask")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 22:37:56 +0000 (23:37 +0100)]
gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter
to simplify things in draw_vbo a little
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Fri, 27 Jan 2017 11:11:33 +0000 (12:11 +0100)]
winsys/radeon: clamp vram_vis_size to 256MB
the value from the kernel is wrong
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 29 Jan 2017 21:28:04 +0000 (22:28 +0100)]
radeonsi: handle count_from_stream_output in a few IA_MULTI_VGT_PARAM cases
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 29 Jan 2017 22:59:59 +0000 (23:59 +0100)]
radeonsi: don't invoke DCC decompression in update_all_texture_descriptors
This fixes a bug uncovered by the 17-part patch series, specifically:
"gallium/radeon: merge dirty_fb_counter and dirty_tex_descriptor_counter"
If dirty_tex_counter has been updated and set_shader_image invokes DCC
decompression, the DCC decompression itself checks the counter and updates
descriptors, which in turn invokes the same DCC decompression. The blitter
can't handle the recursion and the driver eventually crashes.
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 26 Jan 2017 02:02:23 +0000 (03:02 +0100)]
radeonsi: fold info->indirect conditionals into the last one in draw_vbo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 26 Jan 2017 01:56:15 +0000 (02:56 +0100)]
radeonsi: atomize the scratch buffer state
The update frequency is very low.
Difference: Only account for the size when allocating a new one and when
starting a new IB, and check for NULL. (v3)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Bartosz Tomczyk [Mon, 30 Jan 2017 13:07:45 +0000 (14:07 +0100)]
r600: Fix stack overflow
Commit
7b5878ee0491e7a93914389a8369cd6752b9757d increased number of
outputs to 64, but left output array intact. This caused stack overflow
when number of outputs is bigger then 32. Found by ASAN.
Cc: "12.0 13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Samuel Pitoiset [Mon, 30 Jan 2017 11:52:56 +0000 (12:52 +0100)]
gallium/radeon: add new HUD queries for monitoring the CP
There are even more counters in the CP_STAT register but I think
these ones are enough for now.
v2: only read (and expose) CP_STAT on VI+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Mon, 30 Jan 2017 11:52:24 +0000 (12:52 +0100)]
gallium/radeon: add new GPU-sdma-busy HUD query
For simplicity, GPU-sdma-busy will return 0 on previous gens.
v2: only read SRBM_STATUS2 on Evergreen+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Thu, 26 Jan 2017 19:54:45 +0000 (20:54 +0100)]
gallium/radeon: rename grbm to mmio in the gpu load path
We also want to monitor other MMIO counters like SRBM_STATUS2 in
order to know if SDMA is busy.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Marek Olšák [Thu, 26 Jan 2017 16:29:32 +0000 (17:29 +0100)]
winsys/amdgpu: add a fast exit path into amdgpu_cs_add_buffer
The time spent in the function dropped by 37% for torcs.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Samuel Pitoiset [Fri, 27 Jan 2017 13:35:23 +0000 (14:35 +0100)]
winsys/amdgpu: do not iterate twice when adding fence dependencies
The perf difference is very small, 3.25->2.84% in amdgpu_cs_flush()
in the DXMD benchmark.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 27 Jan 2017 13:35:22 +0000 (14:35 +0100)]
winsys/amdgpu: add one likely() call in amdgpu_cs_flush()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Mon, 30 Jan 2017 10:19:14 +0000 (11:19 +0100)]
hud: fix compilation warnings in hud_nic_graph_install()
v2: use PRId64 instead of PRIx64
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Samuel Pitoiset [Fri, 27 Jan 2017 13:34:52 +0000 (14:34 +0100)]
st/mesa: make st_texture_get_sampler_view() static
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 26 Jan 2017 01:40:34 +0000 (02:40 +0100)]
gallium/radeon: remove r600_common_context::max_db
this cleanup is based on the vulkan driver, which seems to do the same thing
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 26 Jan 2017 01:16:18 +0000 (02:16 +0100)]
winsys/amdgpu: fix ADDR_REGISTER_VALUE::backendDisables
This would be a fix if the value was used anywhere.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 26 Jan 2017 00:33:23 +0000 (01:33 +0100)]
gallium/radeon: clean up r600_query_init_backend_mask
This just needs to be done for r600g in the screen.
We don't need an IB submission for every new context created for GCN.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 25 Jan 2017 01:47:15 +0000 (02:47 +0100)]
radeonsi: precompute IA_MULTI_VGT_PARAM values into a table
The perf difference is very small: 0.99% -> 0.40% for the time spent
in si_get_ia_multi_vgt_param when si_draw_vbo is 20%. Pretty much nothing.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 25 Jan 2017 02:27:34 +0000 (03:27 +0100)]
radeonsi: move VGT_VERTEX_REUSE_BLOCK_CNTL into shader states for Polaris
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 23:15:35 +0000 (00:15 +0100)]
radeonsi: state atom IDs don't have to be off by one
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 23:09:24 +0000 (00:09 +0100)]
radeonsi: use a bitmask for looping over dirty PM4 states
also move it to draw_vbo, because it should be 0 in most cases
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 22:28:32 +0000 (23:28 +0100)]
radeonsi: atomize L2 prefetches
to move the big conditional statement out of draw_vbo
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 21:54:06 +0000 (22:54 +0100)]
radeonsi: unbind disabled shader stages to prevent useless L2 prefetches
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 02:41:05 +0000 (03:41 +0100)]
radeonsi: also prefetch compute shaders
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 24 Jan 2017 02:25:40 +0000 (03:25 +0100)]
radeonsi: update dirty_level_mask only after the first draw after FB change
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 23 Jan 2017 22:41:47 +0000 (23:41 +0100)]
gallium/radeon: allow VRAM-only placements again on APUs & recent amdgpu
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 23 Jan 2017 22:32:31 +0000 (23:32 +0100)]
radeonsi: don't set +fp64-denormals
it's the default and the name will change to +fp64-fp16-denormals.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Sun, 22 Jan 2017 12:58:05 +0000 (13:58 +0100)]
radeonsi: remove si_shader_context::param_tess_offchip
we don't use on-chip tess.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Lucas Stach [Mon, 21 Nov 2016 10:54:25 +0000 (11:54 +0100)]
etnaviv: force vertex buffers through the MMU
This fixes a vertex data corruption issue if some of the vertex streams
go through the MMU and some don't.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Andres Rodriguez [Fri, 27 Jan 2017 05:03:08 +0000 (00:03 -0500)]
radv: Expose VK_KHR_maintenance1
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Andres Rodriguez [Fri, 27 Jan 2017 05:03:07 +0000 (00:03 -0500)]
radv: Fix vkCmdCopyImage for 2d slices into 3d Images
Previously the z offset of the destination image was being ignored. It
should be taken into account when copying into a 3d target.
Also, img_extent_el.depth was being incorrectly clamped to 1 due to the
source image being VK_IMAGE_TYPE_2D. This would result in the blit
failing to iterate over all the 3d slices. Instead we clamp to the
destination image type.
Fixes failures in CTS tests:
dEQP-VK.api.copy_and_blit.image_to_image.3d_images.*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Fri, 27 Jan 2017 05:03:06 +0000 (00:03 -0500)]
radv: Expose transfer format features.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Bas Nieuwenhuizen [Fri, 27 Jan 2017 05:03:05 +0000 (00:03 -0500)]
radv: Don't allow any operations on non-supported depth/stencil formats.
We really use the depth block for the blits.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Andres Rodriguez [Fri, 27 Jan 2017 05:03:04 +0000 (00:03 -0500)]
radv: use new error codes for AllocateDescriptorSets
There is a new error code in Maintenance1 that is more specific to the
situation: VK_ERROR_OUT_OF_POOL_MEMORY_KHR
Fixes CTS test case:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Andres Rodriguez [Fri, 27 Jan 2017 05:03:03 +0000 (00:03 -0500)]
radv: vkAllocateCommandBuffers should NULL all output handles
This is part of the spec and fixes CTS tests:
dEQP-VK.api.object_management.alloc_callback_fail_multiple.command_buffer_*
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Andres Rodriguez [Fri, 27 Jan 2017 05:03:02 +0000 (00:03 -0500)]
radv: add trim command pool stub
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Kenneth Graunke [Sat, 21 Jan 2017 04:33:57 +0000 (20:33 -0800)]
i965: Support the force_glsl_version driconf option.
Gallium drivers have had this for a while. It makes sense to support
it consistently across drivers, so expose it in i965 as well.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Kenneth Graunke [Thu, 26 Jan 2017 09:27:42 +0000 (01:27 -0800)]
i965: Fix check for negative pitch in can_do_fast_copy_blit().
At this point, the pitch is in bytes. We haven't yet divided the pitch
by 4 for tiled surfaces, so abs(pitch) may be larger than 32K. This
means the bit 15 trick won't work.
The caller now has signed integers anyway, so just pass those through
and do the obvious check.
Cc: "17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bas Nieuwenhuizen [Sun, 29 Jan 2017 12:53:05 +0000 (13:53 +0100)]
radv: Handle command buffers that need scratch memory.
v2: Create the descriptor BO with CPU access.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 29 Jan 2017 14:20:03 +0000 (15:20 +0100)]
radv: Track scratch usage across pipelines & command buffers.
Based on code written by Dave Airlie.
Signed-off-by: Bas Nieuwenhuizen <basni@oogle.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sat, 28 Jan 2017 22:51:19 +0000 (23:51 +0100)]
radv/ac: Add compiler support for spilling.
Based on code written by Dave Airlie.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Thu, 26 Jan 2017 23:19:52 +0000 (00:19 +0100)]
radv/amdgpu: Support a preamble CS.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Thu, 26 Jan 2017 02:50:42 +0000 (13:50 +1100)]
i965: add assert to while_jumps_before_offset()
jip should always be negative here as its the result of
do instruction - while instruction.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Timothy Arceri [Thu, 26 Jan 2017 02:50:41 +0000 (13:50 +1100)]
i965: fix up asserts in brw_inst_set_jip()
We are casting from a signed 32bit int to an unsigned 16bit int
so shift 15 bits rather than 16.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Bas Nieuwenhuizen [Sun, 29 Jan 2017 16:03:25 +0000 (17:03 +0100)]
llvmpipe: Use LLVMDumpModule, not DumpModule.
Forgot the prefix ...
Fixes: 0fca80b3db64dc1d004f78e22b9de86a07e9de96
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Bas Nieuwenhuizen [Sat, 28 Jan 2017 16:32:05 +0000 (17:32 +0100)]
various: Fix missing DumpModule with recent LLVM.
Since LLVM revision 293359 DumpModule gets only implemented when
either a debug build or LLVM_ENABLE_DUMP is set.
This patch adds a direct replacement for the function for radv and
radeonsi, However, as I don't know a good place to put common LLVM
code for all three I inlined the implementation for LLVMPipe.
v2: Use the new code for LLVM 3.4+ instead of LLVM 5+ & fixed indentation
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Ilia Mirkin [Tue, 24 Jan 2017 01:53:50 +0000 (20:53 -0500)]
r600g: use ieee variants of multiplication instructions
This matches the behavior of most other drivers, including nouveau,
radeonsi, and i965.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Ilia Mirkin [Tue, 24 Jan 2017 02:02:28 +0000 (21:02 -0500)]
r600g: add support for optionally using non-IEEE mul ops
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Eric Anholt [Wed, 18 Jan 2017 22:31:45 +0000 (09:31 +1100)]
vc4: Coalesce into TLB writes as well as VPM/tex.
This generally cuts an instruction when blending is enabled and we thus
have a single instruction generating the color value.
total instructions in shared programs: 91759 -> 91634 (-0.14%)
instructions in affected programs: 5338 -> 5213 (-2.34%)
Eric Anholt [Wed, 18 Jan 2017 03:23:14 +0000 (14:23 +1100)]
vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil.
shader-db results:
total instructions in shared programs: 92611 -> 91764 (-0.91%)
instructions in affected programs: 27417 -> 26570 (-3.09%)
The star is one shader in glmark2's terrain (drops 16% of its
instructions), but there are also wins in mupen64plus and glb2.7.
Eric Anholt [Tue, 17 Jan 2017 23:38:55 +0000 (10:38 +1100)]
vc4: Flip the switch to run the GLSL compiler optimization loop once.
This has almost no effect on shader-db:
total instructions in shared programs: 92572 -> 92611 (0.04%)
instructions in affected programs: 4486 -> 4525 (0.87%)
Looking at 2 of the 7 different shaders that were hurt (all of which were
in mupen64), they all appear to be just differences in order of
instructions at the NIR level.
The advantage is that this should significantly reduce time in the compiler.
Kenneth Graunke [Wed, 25 Jan 2017 08:59:42 +0000 (00:59 -0800)]
i965: Unbind deleted shaders from brw_context, fixing malloc heisenbug.
Applications may delete a shader program, create a new one, and bind it
before the next draw. With terrible luck, malloc may randomly return a
chunk of memory for the new gl_program that happened to be the exact
same pointer as our previously bound gl_program. In this case, our
logic to detect new programs in brw_upload_pipeline_state() would break:
if (brw->vertex_program != ctx->VertexProgram._Current) {
brw->vertex_program = ctx->VertexProgram._Current;
brw->ctx.NewDriverState |= BRW_NEW_VERTEX_PROGRAM;
}
Because the pointer is the same, we'd think it was the same program.
But it could be wildly different - a different stage altogether,
different sets of resources, and so on. This causes utter chaos.
As unlikely as this seems, I believe I hit this when running a subset
of the CTS in a loop, in a group of tests that churns through simple
programs, deleting and rebuilding them. Presumably malloc uses a
bucketing cache of sorts, and so freeing up a gl_program and allocating
a new one fairly quickly causes it to reuse that memory.
The result was that brw->vertex_program->info.num_ssbos claimed the
program had SSBOs, while brw->vs.base.prog_data.binding_table claimed
that there were none. This was crazy, because the binding table is
calculated from info.num_ssbos - the shader info appeared to change
between shader compile time and draw time. Careful use of watchpoints
revealed that it was being clobbered by rzalloc's memset when building
an entirely different program...
Fortunately, our 0xd0d0d0d0 canary for unused binding table entries
caused us to crash out of bounds when trying to upload SSBOs, or we
may have never discovered this heisenbug.
Fixes crashes in GL45-CTS.compute_shader.sso-case2 when using a hacked
cts-runner that only runs GL45-CTS.compute_shader.s* in EGL config ID 5
at 64x64 in a loop with 100 iterations.
Cc: "17.0 13.0 12.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Bas Nieuwenhuizen [Sat, 28 Jan 2017 00:32:20 +0000 (01:32 +0100)]
radv/ac: Use base in push constant loads.
Apparently the source is not an address but an offset, so we actually
need to use the base.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
CC: <mesa-stable@lists.freedesktop.org>
Andres Rodriguez [Fri, 27 Jan 2017 05:09:58 +0000 (00:09 -0500)]
radv: drop support for VK_AMD_NEGATIVE_VIEWPORT_HEIGHT
This extension was not correctly supported, and it conflicts with the
VK_KHR_MAINTENANCE1 spec.
Reviewed-by: Fredrik Höglund <fredrik@kde.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 11 Nov 2016 02:27:21 +0000 (02:27 +0000)]
radv: implement VK_KHR_GET_PHYSICAL_DEVICE_PROPERTIES_2
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 9 Jan 2017 07:02:17 +0000 (07:02 +0000)]
radv: use proper maximum slice for layered view
this fixes deferred shadows with geom shaders enabled.
but I think this fix is fine by itself.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Chad Versace [Fri, 13 Jan 2017 18:46:49 +0000 (10:46 -0800)]
i965/sync: Implement fences based on Linux sync_file
This patch implements a new type of struct brw_fence, one that is based
struct sync_file.
This completes support for EGL_ANDROID_native_fence_sync.
* Background
Linux 4.7 added a new file type, struct sync_file. See
commit
460bfc41fd52959311ed0328163f785e023857af
Author: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
Date: Thu Apr 28 10:46:57 2016 -0300
Subject: dma-buf/sync_file: de-stage sync_file headers
A sync file is a cross-driver explicit synchronization primitive. In a
sense, sync_file's relation to synchronization is similar to dma_buf's
relation to memory: both are primitives that can be imported and
exported across drivers (at least in theory).
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Chad Versace [Fri, 13 Jan 2017 18:46:49 +0000 (10:46 -0800)]
i965/sync: Rename brw_fence_insert()
Rename to brw_fence_insert_locked(). This is correct because the fence's
mutex is effectively locked, as all callers are also *creators* of the
fence, and have not yet returned the new fence.
This reduces noise in the next patch, which defines and uses
brw_fence_insert(), an unlocked variant.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Chad Versace [Fri, 13 Jan 2017 18:46:49 +0000 (10:46 -0800)]
i965/sync: Fail sync creation when batchbuffer flush fails
Pre-patch, brw_sync.c ignored the return value of
intel_batchbuffer_flush().
When intel_batchbuffer_flush() fails during eglCreateSync
(brw_dri_create_fence), we now give up, cleanup, and return NULL.
When it fails during glFenceSync, however, we blindly continue and hope
for the best because there does not exist yet a way to tell core GL that
sync creation failed.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Chad Versace [Fri, 13 Jan 2017 18:46:49 +0000 (10:46 -0800)]
i965/sync: Add brw_fence::type
This a refactor patch; no expected changed in behavior.
Add `enum brw_fence_type` and brw_fence::type. There is only one type
currently, BRW_FENCE_TYPE_BO_WAIT. This patch reduces a lot of noise in
the next, which adds new type BRW_FENCE_TYPE_SYNC_FD.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Chad Versace [Fri, 13 Jan 2017 18:46:49 +0000 (10:46 -0800)]
i965: Add intel_batchbuffer_flush_fence()
A variant of intel_batchbuffer_flush() with parameters for in and out
fence fds.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Chad Versace [Fri, 13 Jan 2017 18:46:48 +0000 (10:46 -0800)]
i965: Add intel_screen::has_fence_fd
This bool maps to I915_PARAM_HAS_EXEC_FENCE_FD.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Chad Versace [Fri, 13 Jan 2017 18:46:49 +0000 (10:46 -0800)]
configure: Require libdrm >= 2.4.75
Required to implement EGL_ANDROID_native_fence_sync on i965.
Specifically, i965 needs drm_intel_gem_bo_exec_fence(),
I915_PARAM_HAS_EXEC_FENCE, and libsync.h.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
Tested-by: Rafael Antognolli <rafael.antognolli@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Emil Velikov [Fri, 27 Jan 2017 18:29:38 +0000 (18:29 +0000)]
configure.ac: list radeon in --with-vulkan-drivers help string
Analogous to what we do for the dri and gallium drivers.
Cc: 17.0 13.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@colllabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Emil Velikov [Fri, 27 Jan 2017 18:05:13 +0000 (18:05 +0000)]
radv: automake: Don't install vk_platform.h or vulkan.h.
These files belong to the vulkan loader.
Identical to
045f38a5075 vulkan: Don't install vk_platform.h or vulkan.h.
Cc: Dave Airlie <airlied@redhat.com>
Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: 17.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Jason Ekstrand [Thu, 26 Jan 2017 04:57:47 +0000 (20:57 -0800)]
anv: Advertise API version 1.0.39
I'm pretty sure we've kept up with the bug fixes.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Eric Engestrom [Fri, 27 Jan 2017 17:29:05 +0000 (17:29 +0000)]
gbm/dri: fix memory leaks in error path
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
[Emil Velikov: make sure it builds]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Thu, 26 Jan 2017 19:26:13 +0000 (19:26 +0000)]
docs/releasing: add a note about the relnotes template
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Emil Velikov [Thu, 26 Jan 2017 13:24:10 +0000 (13:24 +0000)]
mesa: remove explicit __STDC_FORMAT_MACROS define
Analogous to previous commits.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Emil Velikov [Thu, 26 Jan 2017 13:24:09 +0000 (13:24 +0000)]
nouveau: remove explicit __STDC_FORMAT_MACROS define
Already handled by the build.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Emil Velikov [Thu, 26 Jan 2017 13:24:08 +0000 (13:24 +0000)]
scons: swr: remove explicit __STDC_.*_MACROS defines
Analogous to previous commits.
Cc: George Kyriazis <george.kyriazis@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Emil Velikov [Thu, 26 Jan 2017 13:24:07 +0000 (13:24 +0000)]
gallium: remove explicit __STDC_.*_MACROS defines
Analogous to previous commits.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Emil Velikov [Thu, 26 Jan 2017 13:24:06 +0000 (13:24 +0000)]
gallivm: remove explicit __STDC_.*_MACROS defines
Correctly handled by the build systems.
Cc: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Emil Velikov [Thu, 26 Jan 2017 13:24:05 +0000 (13:24 +0000)]
glsl: remove explicit __STDC_FORMAT_MACROS define
Correctly handled by all the build systems.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Emil Velikov [Thu, 26 Jan 2017 13:24:04 +0000 (13:24 +0000)]
autoconf: set all __STDC_*_MACROS
Analogous to previous commit(s), with a minor detail - here we set the
macros when building both C and C++ sources.
Resolving that is a more challenging task that we'll sort out another
day.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>