Jason Ekstrand [Wed, 6 Dec 2017 06:31:02 +0000 (22:31 -0800)]
spirv: Add a prepass to set types on vtn_values
This autogenerated pass will automatically find and set the type field
on all vtn_values. This way we always have the type and can use it for
validation and other checks.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jason Ekstrand [Wed, 6 Dec 2017 05:39:51 +0000 (21:39 -0800)]
spirv: Add a vtn_type field to all vtn_values
At the moment, this just lets us drop the const_type for constants and
unify things a bit. Eventually, we will use this to store the types of
all SPIR-V SSA values.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Samuel Iglesias Gonsálvez [Tue, 31 Oct 2017 10:47:57 +0000 (11:47 +0100)]
anv: fix bug when using component qualifier in FS outputs
We can write to the same output but in different components, like
in this example:
layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0;
layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1;
Therefore, they are not two different outputs but only one.
Fixes:
dEQP-VK.glsl.440.linkage.varying.component.frag_out.*
v3:
- Remove FRAG_RESULT_MAX.
- Add const and use sizeof (Ian).
- Do three-pass to set properly the locations of fragment
outputs when having arrays (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Ilia Mirkin [Sat, 2 Dec 2017 16:20:46 +0000 (11:20 -0500)]
st/mesa: swizzle argument when there's a vector size mismatch
GLSL IR operation arguments can sometimes have an implicit swizzle as a
result of a vector arg and a scalar arg, where the scalar argument is
implicitly expanded to the size of the vector argument.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103955
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Roland Scheidegger [Tue, 12 Dec 2017 03:22:28 +0000 (04:22 +0100)]
gallivm: fix texture wrapping for texture gather for mirror modes
Care must be taken that all coords end up correct, the tests are very
sensitive that everything is correctly rounded. This doesn't matter
for bilinear filter (since picking a wrong texel with weight zero is
ok), and we could also switch the per-sample coords mistakenly.
While here, also optimize the coord_mirror helper a bit (we can do the
mirroring directly by exploiting float rounding, no need for fixing up
odd/even manually).
I did not touch the mirror_clamp and mirror_clamp_to_border modes.
In contrast to mirror_clamp_to_edge and mirror_repeat these are legacy
modes. They are specified against old gl rules, which actually does
the mirroring not per sample (so you get swapped order if the coord
is in the mirrored section). I think the idea though is that they should
follow the respecified mirror_clamp_to_edge rules so the order would be
correct.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Jason Ekstrand [Mon, 11 Dec 2017 23:31:22 +0000 (15:31 -0800)]
spirv: Allow ignoring decorations for workgroup variables
Since we switched over to lowering SLM access directly in SPIR-V -> NIR,
we no longer have vtn_variables for SLM. It's all safe as with UBOs and
SSBOs but we need to let it through in the assert.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104213
Fixes: 8761a04d0d9332d9c0c99164faf855fc3c741f7c
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Wed, 6 Dec 2017 17:13:29 +0000 (09:13 -0800)]
spirv: Set lengths on scalar and vector types
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Bas Nieuwenhuizen [Sun, 10 Dec 2017 22:31:45 +0000 (23:31 +0100)]
ac/nir: Support vulkan_resource_reindex.
Fixes: 93b4cb61eb2 "spirv: Allow OpPtrAccessChain for block indices"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 10 Dec 2017 22:18:32 +0000 (23:18 +0100)]
ac/nir: Don't load the descriptor in vulkan_resource_index.
To support the reindex intrinsic, we need the result to be
something on which we can adjust the index/address.
Since it is all within a basic block, the compiler should be
able to merge any extra loads.
v2: Change visit_get_buffer_size too.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Marek Olšák [Mon, 11 Dec 2017 15:29:40 +0000 (16:29 +0100)]
winsys/amdgpu: disable local BOs again due to worse performance
Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Wed, 26 Jul 2017 13:21:45 +0000 (15:21 +0200)]
drirc: whitelist glthread for Mount and Blade Warband again
Bas Nieuwenhuizen [Sun, 10 Dec 2017 14:34:54 +0000 (15:34 +0100)]
radv: Don't use local BOs when allocating with export options.
If the app does not plan to put a buffer or image in it
(why? But it is allowed and CTS does it), they do not need to
allocate it with the deciate allocation struct.
Fixes: a639d40f133 "radv: add support for local bos. (v3)"
Reviewed-by: Dave Airlie <airlied@redhat.com>
Bas Nieuwenhuizen [Sun, 3 Dec 2017 14:35:39 +0000 (15:35 +0100)]
spirv: Fix loading an entire block at once.
There is no chain, so checking the length ends with a SEGFAULT.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103579
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jason Ekstrand [Sat, 2 Dec 2017 00:10:48 +0000 (16:10 -0800)]
anv: Enable UBO pushing
Push constants on Intel hardware are significantly more performant than
pull constants. Since most Vulkan applications don't actively use push
constants on Vulkan or at least don't use it heavily, we're pulling way
more than we should be. By enabling pushing chunks of UBOs we can get
rid of a lot of those pulls.
On my SKL GT4e, this improves the performance of Dota 2 and Talos by
around 2.5% and improves Aztec Ruins by around 2%.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Sun, 3 Dec 2017 06:34:47 +0000 (22:34 -0800)]
i965/fs: Handle !supports_pull_constants and push UBOs properly
In Vulkan, we don't support classic pull constants and everything the
client asks us to push, we push. However, for pushed UBOs, we still
want to fall back to conventional pulls if we run out of space.
Jason Ekstrand [Sat, 2 Dec 2017 00:07:23 +0000 (16:07 -0800)]
anv/device: Increase the UBO alignment requirement to 32
Push constants work in terms of 32-byte chunks so if we want to be able
to push UBOs, every thing needs to be 32-byte aligned. Currently, we
only require 16-byte which is too small.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 1 Dec 2017 22:28:46 +0000 (14:28 -0800)]
anv/cmd_buffer: Add support for pushing UBO ranges
In order to do this we have to modify push constant set up to handle
ranges. We also have to tweak the way we handle dirty bits a bit so
that we re-push whenever a descriptor set changes.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 1 Dec 2017 22:43:25 +0000 (14:43 -0800)]
anv/cmd_buffer: Add some stage asserts
There are several places where we look up opcodes in an array of stages.
Assert that the we don't end up going out-of-bounds.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 1 Dec 2017 12:25:05 +0000 (04:25 -0800)]
anv/cmd_buffer: Add some helpers for working with descriptor sets
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Fri, 1 Dec 2017 11:18:51 +0000 (03:18 -0800)]
anv/pipeline: Translate vulkan_resource_index to a constant when possible
We want to call brw_nir_analyze_ubo_ranges immedately after
anv_nir_apply_pipeline_layout and it badly wants constants. We could
run an optimization step and let constant folding do it but that's way
more expensive than needed. It's really easy to just handle constants
in apply_pipeline_layout.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jason Ekstrand [Sun, 3 Dec 2017 06:32:59 +0000 (22:32 -0800)]
i965/fs: Rewrite assign_constant_locations
This rewires the logic for assigning uniform locations to work in terms
of "complex alignments". The basic idea is that, as we walk the list of
instructions, we keep track of the alignment and continuity requirements
of each slot and assert that the alignments all match up. We then use
those alignments in the compaction stage to ensure that everything gets
placed at a properly aligned register. The old mechanism handled
alignments by special-casing each of the bit sizes and placing 64-bit
values first followed by 32-bit values.
The old scheme had the advantage of never leaving a hole since all the
64-bit values could be tightly packed and so could the 32-bit values.
However, the new scheme has no type size special cases so it handles not
only 32 and 64-bit types but should gracefully extend to 16 and 8-bit
types as the need arises.
Tested-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Jason Ekstrand [Fri, 8 Dec 2017 23:39:00 +0000 (15:39 -0800)]
anv: Disable VK_KHR_16bit_storage
The testing for this extension is currently very poor. The CTS tests
only test accessing UBOs and SSBOs at dynamic offsets so none of our
constant-offset paths get triggered at all. Also, there's an assertion
in our handling of nir_intrinsic_load_uniform that offset % 4 == 0 which
is never triggered indicating that nothing every gets loaded from an
offset which is not a dword. Both push constants and the constant
offset pull paths are complex enough, we really don't want to ship
without tests. We'll turn the extension back on once we have decent
tests.
Leo Liu [Thu, 7 Dec 2017 17:04:59 +0000 (12:04 -0500)]
radeon/vce: move destroy command before feedback command
VCE processing IBs starts from session and task info at first level,
other commands processed subsequently. The task info for destroy is
embedded to destroy command, resulting that feedback command is not
properly procoessed. This is causing kernel spin VM fault messages on
Polaris and Vega10 card when running ends at encode application.
The fix is also verified on VCE physical mode card.
Signed-off-by: Leo Liu <leo.liu@amd.com>
Cc: mesa-stable@lists.freedesktop.org
Acked-by: Christian König <christian.koenig@amd.com>
Ben Crocker [Mon, 27 Nov 2017 19:44:58 +0000 (14:44 -0500)]
docs/llvmpipe: document ppc64le as alternative architecture to x86.
Power8, Power8NV, and Power9 are supported on an equal footing
with X86.
Cc: "17.2" "17.3" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Ben Crocker <bcrocker@redhat.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
[Eric: changed formatting, reworded a bit (with Ben's ack)]
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Emil Velikov [Fri, 8 Dec 2017 13:59:27 +0000 (13:59 +0000)]
docs/release-calendar: drop 17.3.0 from the table
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Fri, 8 Dec 2017 13:56:01 +0000 (13:56 +0000)]
docs: add news item and link release notes for 17.3.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Fri, 8 Dec 2017 13:53:30 +0000 (13:53 +0000)]
docs: add sha256 checksums for 17.3.0
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit
49a612d1580b3316392273a069d20d93967126a8)
Emil Velikov [Fri, 8 Dec 2017 13:47:33 +0000 (13:47 +0000)]
docs: Update 17.3.0 release notes
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
(cherry picked from commit
8d55da9f579463038f4305ed7d505aa7fffa0f37)
Samuel Pitoiset [Fri, 1 Dec 2017 15:15:40 +0000 (16:15 +0100)]
radv: do not print ASM to stderr when dumping shaders
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 6 Dec 2017 11:06:43 +0000 (12:06 +0100)]
radv/winsys: implement query_value()
Might be useful to know the VRAM/GTT usage, the number of VRAM
CPU page faults, etc. Nothing is currently using that new
interface, but it's a first step.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 6 Dec 2017 16:49:37 +0000 (17:49 +0100)]
radv: remove useless check radv_set_dcc_need_cmask_elim_pred()
emit_fast_color_clear() already checks that.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 6 Dec 2017 16:49:36 +0000 (17:49 +0100)]
radv: remove useless checks in radv_set_{color,depth}_clear_regs()
Already checked by the respective callers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 6 Dec 2017 16:49:20 +0000 (17:49 +0100)]
radv: only re-mit the index type when it changes
dota2 binds a ton of index buffers but the type is always 16-bit.
Note that we have to invalidate the type when switching from
indexed draws to normal draws.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 6 Dec 2017 16:48:41 +0000 (17:48 +0100)]
radv: only reset command buffers that are not in the initial state
dota2 always calls vkResetCommandBuffer() before
vkBeginCommandBuffer() which is quite useless.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Wed, 6 Dec 2017 16:48:40 +0000 (17:48 +0100)]
radv: track different status of a command buffer
RADV_CMD_BUFFER_STATUS_INVALID is not used for now, but I think
it makes sense to declare it. Could be used later with better
command buffer error handling.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Samuel Pitoiset [Thu, 7 Dec 2017 10:39:46 +0000 (11:39 +0100)]
radv: fix TC-compat HTILE with VK_FORMAT_D32_SFLOAT_S8_UINT on Vega
Copied from RadeonSI.
This fixes all CTS
dEQP-VK.renderpass.dedicated_allocation.formats.d32_sfloat_s8_uint.clear.*
And some other ones which use the same format.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Jordan Justen [Mon, 20 Nov 2017 21:42:33 +0000 (13:42 -0800)]
docs: Update GL_ARB_get_program_binary docs to support 1 format
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Jordan Justen [Sat, 4 Nov 2017 23:53:15 +0000 (16:53 -0700)]
i965: Add ARB_get_program_binary support using nir_serialization
This resolves an apparent game bug described in 85564. The game
doesn't properly handle ARB_get_program_binary with 0 supported
formats.
V2 (Timothy Arceri):
- less driver code as more has been moved into the common helpers.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=85564
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Jordan Justen [Tue, 7 Nov 2017 10:11:28 +0000 (02:11 -0800)]
main: Clear shader program data whenever ProgramBinary is called
The GL_ARB_get_program_binary extension spec says:
"If ProgramBinary fails to load a binary, no error is generated, but
any information about a previous link or load of that program object
is lost."
v2:
* Re-initialize shProg->data after clear. (Jordan)
(Required after
6a72eba755fea15a0d97abb913a6315d9d32e274)
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Jordan Justen [Sat, 4 Nov 2017 23:47:54 +0000 (16:47 -0700)]
main: add binary support to ProgramBinary
V2: call generic mesa_program_binary() helper rather than driver
function directly to allow greater code sharing.
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Jordan Justen [Sat, 4 Nov 2017 23:47:25 +0000 (16:47 -0700)]
main: add binary support to GetProgramBinary
V2: call generic _mesa_get_program_binary() helper rather than driver
function directly to allow greater code sharing.
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Jordan Justen [Sat, 4 Nov 2017 23:43:21 +0000 (16:43 -0700)]
main: Support getting GL_PROGRAM_BINARY_LENGTH
V2: call generic _mesa_get_program_binary_length() helper
rather than driver function directly to allow greater
code sharing.
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>i (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Jordan Justen [Sat, 4 Nov 2017 23:52:14 +0000 (16:52 -0700)]
mesa: Add Mesa ARB_get_program_binary helper functions
V2 (Timothy Arceri):
- add extra code comment
- stop passing around void *binary and just pass
program_binary_header *hdr instead.
- move to src/mesa/main rather than src/util
V3 (Timothy Arceri):
- Move more code out of the backend and into the common
helpers.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Timothy Arceri [Tue, 28 Nov 2017 03:27:51 +0000 (14:27 +1100)]
mesa: add driver callbacks for serialising ProgramBinary blobs
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Jordan Justen [Tue, 7 Nov 2017 08:21:33 +0000 (00:21 -0800)]
main: Support 1 Mesa format with get for GL_PROGRAM_BINARY_FORMATS
Mesa supports either 0 or 1 formats. If 1 format is supported, it is
GL_PROGRAM_BINARY_FORMAT_MESA as defined in the
GL_MESA_program_binary_formats extension spec.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Jordan Justen [Sat, 4 Nov 2017 23:39:08 +0000 (16:39 -0700)]
main: Allow non-zero NUM_PROGRAM_BINARY_FORMATS
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Jordan Justen [Sat, 4 Nov 2017 00:18:32 +0000 (17:18 -0700)]
i965: Fix memory leak when serializing nir
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jordan Justen [Fri, 3 Nov 2017 23:57:42 +0000 (16:57 -0700)]
i965: Add brw_program_serialize_nir
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jordan Justen [Fri, 3 Nov 2017 23:45:46 +0000 (16:45 -0700)]
i965: Free serialized nir after deserializing
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jordan Justen [Fri, 3 Nov 2017 23:40:17 +0000 (16:40 -0700)]
i965: Add brw_program_deserialize_nir
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jordan Justen [Mon, 30 Oct 2017 18:16:48 +0000 (11:16 -0700)]
main, glsl: Add UniformDataDefaults which stores uniform defaults
The ARB_get_program_binary extension requires that uniform values in a
program be restored to their initial value just after linking.
This patch saves off the initial values just after linking. When the
program is restored by glProgramBinary, we can use this to copy the
initial value of uniforms into UniformDataSlots.
V2 (Timothy Arceri):
- Store UniformDataDefaults only when serializing GLSL as this
is what we want for both disk cache and ARB_get_program_binary.
This saves us having to come back later and reset the Uniforms
on program binary restores.
Signed-off-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> (v1)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Jordan Justen [Fri, 27 Oct 2017 08:04:53 +0000 (01:04 -0700)]
glsl: Split out shader program serialization
This will allow us to use the program serialization to implement
ARB_get_program_binary.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jordan Justen [Tue, 7 Nov 2017 08:16:47 +0000 (00:16 -0800)]
include: Add GL_MESA_program_binary_formats to GL/GLES2 ext.h files
Thus was merged into the OpenGL Registry in version
667c5a253781834b40a6ae9eb19d05af4542cfe1.
Ref: https://github.com/KhronosGroup/OpenGL-Registry/pull/127
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Jordan Justen [Tue, 28 Nov 2017 00:15:07 +0000 (11:15 +1100)]
mesa: add GL_PROGRAM_BINARY_FORMAT_MESA enum
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Francisco Jerez [Sat, 14 Oct 2017 00:52:00 +0000 (17:52 -0700)]
intel/cfg: Represent divergent control flow paths caused by non-uniform loop execution.
This addresses a long-standing back-end compiler bug that could lead
to cross-channel data corruption in loops executed non-uniformly. In
some cases live variables extending through a loop divergence point
(e.g. a non-uniform break) into a convergence point (e.g. the end of
the loop) wouldn't be considered live along all physical control flow
paths the SIMD thread could possibly have taken in between due to some
channels remaining in the loop for additional iterations.
This patch fixes the problem by extending the CFG with physical edges
that don't exist in the idealized non-vectorized program, but
represent valid control flow paths the SIMD EU may take due to the
divergence of logical threads. This makes sense because the i965 IR
is explicitly SIMD, and it's not uncommon for instructions to have an
influence on neighboring channels (e.g. a force_writemask_all header
setup), so the behavior of the SIMD thread as a whole needs to be
considered.
No changes in shader-db.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Francisco Jerez [Mon, 23 Oct 2017 20:47:10 +0000 (13:47 -0700)]
intel/fs: Don't let undefined values prevent copy propagation.
This makes the dataflow propagation logic of the copy propagation pass
more intelligent in cases where the destination of a copy is known to
be undefined for some incoming CFG edges, building upon the
definedness information provided by the last patch. Helps a few
programs, and avoids a handful shader-db regressions from the next
patch.
shader-db results on ILK:
total instructions in shared programs:
6541547 ->
6541523 (-0.00%)
instructions in affected programs: 360 -> 336 (-6.67%)
helped: 8
HURT: 0
LOST: 0
GAINED: 10
shader-db results on BDW:
total instructions in shared programs:
8174323 ->
8173882 (-0.01%)
instructions in affected programs: 7730 -> 7289 (-5.71%)
helped: 5
HURT: 2
LOST: 0
GAINED: 4
shader-db results on SKL:
total instructions in shared programs:
8185669 ->
8184598 (-0.01%)
instructions in affected programs: 10364 -> 9293 (-10.33%)
helped: 5
HURT: 2
LOST: 0
GAINED: 2
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Thu, 7 Sep 2017 07:26:03 +0000 (00:26 -0700)]
intel/fs: Restrict live intervals to the subset possibly reachable from any definition.
Currently the liveness analysis pass would extend a live interval up
to the top of the program when no unconditional and complete
definition of the variable is found that dominates all of its uses.
This can lead to a serious performance problem in shaders containing
many partial writes, like scalar arithmetic, FP64 and soon FP16
operations. The number of oversize live intervals in such workloads
can cause the compilation time of the shader to explode because of the
worse than quadratic behavior of the register allocator and scheduler
when running out of registers, and it can also cause the running time
of the shader to explode due to the amount of spilling it leads to,
which is orders of magnitude slower than GRF memory.
This patch fixes it by computing the intersection of our current live
intervals with the subset of the program that can possibly be reached
from any definition of the variable. Extending the storage allocation
of the variable beyond that is pretty useless because its value is
guaranteed to be undefined at a point that cannot be reached from any
definition.
According to Jason, this improves performance of the subgroup Vulkan
CTS tests significantly (e.g. the runtime of the dvec4 broadcast test
improves by nearly 50x).
No significant change in the running time of shader-db (with 5%
statistical significance).
shader-db results on IVB:
total cycles in shared programs:
61108780 ->
60932856 (-0.29%)
cycles in affected programs:
16335482 ->
16159558 (-1.08%)
helped: 5121
HURT: 4347
total spills in shared programs: 1309 -> 1288 (-1.60%)
spills in affected programs: 249 -> 228 (-8.43%)
helped: 3
HURT: 0
total fills in shared programs: 1652 -> 1597 (-3.33%)
fills in affected programs: 262 -> 207 (-20.99%)
helped: 4
HURT: 0
LOST: 2
GAINED: 209
shader-db results on BDW:
total cycles in shared programs:
67617262 ->
67361220 (-0.38%)
cycles in affected programs:
23397142 ->
23141100 (-1.09%)
helped: 8045
HURT: 6488
total spills in shared programs: 1456 -> 1252 (-14.01%)
spills in affected programs: 465 -> 261 (-43.87%)
helped: 3
HURT: 0
total fills in shared programs: 1720 -> 1465 (-14.83%)
fills in affected programs: 471 -> 216 (-54.14%)
helped: 4
HURT: 0
LOST: 2
GAINED: 162
shader-db results on SKL:
total cycles in shared programs:
65436248 ->
65245186 (-0.29%)
cycles in affected programs:
22560936 ->
22369874 (-0.85%)
helped: 8457
HURT: 6247
total spills in shared programs: 437 -> 437 (0.00%)
spills in affected programs: 0 -> 0
helped: 0
HURT: 0
total fills in shared programs: 870 -> 854 (-1.84%)
fills in affected programs: 16 -> 0
helped: 1
HURT: 0
LOST: 0
GAINED: 107
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Francisco Jerez [Wed, 6 Dec 2017 19:42:54 +0000 (11:42 -0800)]
intel/fs: Teach instruction scheduler about GRF bank conflict cycles.
This should allow the post-RA scheduler to do a slightly better job at
hiding latency in presence of instructions incurring bank conflicts.
The main purpuse of this patch is not to improve performance though,
but to get conflict cycles to show up in shader-db statistics in order
to make sure that regressions in the bank conflict mitigation pass
don't go unnoticed.
Acked-by: Matt Turner <mattst88@gmail.com>
Francisco Jerez [Thu, 15 Jun 2017 22:23:57 +0000 (15:23 -0700)]
intel/fs: Implement GRF bank conflict mitigation pass.
Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput. This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation. It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.
In a shader-db run on SKL this helps roughly 46k shaders:
total conflicts in shared programs:
1008981 -> 600461 (-40.49%)
conflicts in affected programs: 816222 -> 407702 (-50.05%)
helped: 46234
HURT: 72
The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.
On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies. E.g. for a shader-db run on SNB:
total conflicts in shared programs: 944636 -> 623185 (-34.03%)
conflicts in affected programs: 853258 -> 531807 (-37.67%)
helped: 31052
HURT: 19
And on BDW:
total conflicts in shared programs:
1418393 -> 987539 (-30.38%)
conflicts in affected programs:
1179787 -> 748933 (-36.52%)
helped: 47592
HURT: 70
On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.
NOTE: This patch intentionally disregards some i965 coding conventions
for the sake of reviewability. This is addressed by the next
squash patch which introduces an amount of (for the most part
boring) boilerplate that might distract reviewers from the
non-trivial algorithmic details of the pass.
The following patch is squashed in:
SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.
Acked-by: Matt Turner <mattst88@gmail.com>
Dylan Baker [Tue, 5 Dec 2017 17:40:03 +0000 (09:40 -0800)]
meson: Fix building gallium media targets with gallium-xlib glx
To demonstrate this bug run meson with the options:
-Ddri-drivers= -Dglx=gallium-xlib
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Dylan Baker [Mon, 4 Dec 2017 22:03:25 +0000 (14:03 -0800)]
meson: Add lmsensors to gallium libgl-xlib target.
Fixes: 5e71efef44b992b5d70b ("meson: Add lmsensors support")
Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Eric Engestrom [Thu, 7 Dec 2017 14:47:46 +0000 (14:47 +0000)]
meson: add dep_thread to every lib that includes threads.h
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104141
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Eric Engestrom [Mon, 4 Dec 2017 15:06:03 +0000 (15:06 +0000)]
meson: fix pl111 dependency on vc4
src/gallium/winsys/pl111/drm/libpl111winsys.a(pl111_drm_winsys.c.o): In function `pl111_drm_screen_create':
pl111_drm_winsys.c:(.text+0x33): undefined reference to `vc4_drm_screen_create_renderonly'
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Samuel Pitoiset [Tue, 5 Dec 2017 17:02:08 +0000 (18:02 +0100)]
radv: use a faster version for nir_op_pack_half_2x16
This patch is ported from RadeonSI and it has two effects.
It fixes a rendering issue which affects F1 2017 and Dawn
of War 3 (Vega only) because LLVM was ending up by generating
the new v_mad_mix_{hi,lo} instructions which appear to be
buggy in some way. Not sure if Mesa is generating something
wrong or if the issue is in LLVM only. Anyway, that explains why
the DOW3 issue can't be reproduced with GL on Vega.
It also improves performance because v_cvt_pkrtz_f16 is faster,
and because I guess the rounding mode behaviour is similar between
GL and VK, we can use it. About performance, it improves Talos
by +3/4% but I don't see any other impacts.
No CTS regressions on Polaris.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Alejandro Piñeiro [Thu, 7 Dec 2017 08:38:41 +0000 (09:38 +0100)]
mesa/spirv: move and rename nir_spirv_supported_capabilities
To avoid any vulkan driver to include the GL mtypes.h. Renamed as
eventually this could be used by drivers not using nir.
v2: remove compiler/spirv/spirv.h from mtypes (Alejandro)
v3: added the definition at compiler/shader_info.h (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Vadym Shovkoplias [Mon, 4 Dec 2017 09:47:33 +0000 (11:47 +0200)]
util/disk_cache: Remove unneeded free() on always null string
At this point dc_job->cache_item_metadata.keys always equals
NULL, so call to free() is useless
Fixes: b86ecea3446 ("util/disk_cache: write cache item metadata to disk")
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Samuel Iglesias Gonsálvez [Mon, 20 Nov 2017 12:12:12 +0000 (13:12 +0100)]
spirv: fix bug when OpSpecConstantOp calls a conversion
In that case, nir_eval_const_opcode() will evaluate the conversion
but as it was using destination's bit_size, the resulting
value was just a cast of the source constant value. By passing the
source's bit size, it does the conversion properly.
Fixes:
dEQP-VK.spirv_assembly.instruction.*.opspecconstantop.*convert*
v2:
- Remove invalid conversion op cases.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Samuel Iglesias Gonsálvez [Mon, 20 Nov 2017 11:05:31 +0000 (12:05 +0100)]
spirv: allow specialization constants with bitsize different than 32 bits
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
James Legg [Wed, 6 Dec 2017 11:55:14 +0000 (11:55 +0000)]
nir/opcodes: Fix constant-folding of bitfield_insert
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104119
CC: <mesa-stable@lists.freedesktop.org>
CC: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Alex Smith [Wed, 6 Dec 2017 10:28:14 +0000 (10:28 +0000)]
radv: Add LLVM version to the device name string
Allows apps to determine the LLVM version so that they can decide
whether or not to enable workarounds for LLVM issues.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Cc: "17.2 17.3" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Alejandro Piñeiro [Wed, 6 Dec 2017 10:38:59 +0000 (11:38 +0100)]
mesa: remove set_entry from forward type declarations
This type was used at gl_sync_object, but it is not used anymore.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Kenneth Graunke [Tue, 21 Mar 2017 07:30:06 +0000 (00:30 -0700)]
meta: Fix ClearTexture with GL_DEPTH_COMPONENT.
We only handled unpacking for GL_DEPTH_STENCIL formats.
Cemu was hitting _mesa_problem() for an unsupported format in
_mesa_unpack_float_32_uint_24_8_depth_stencil_row(), because the
format was depth-only, rather than depth-stencil.
Cc: "13.0 12.0" <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94739
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103966
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Kenneth Graunke [Tue, 5 Dec 2017 19:09:13 +0000 (11:09 -0800)]
meta: Initialize depth/clear values on declaration.
This helps avoid compiler warningss in the next commit - everything
was initialized, but it wasn't obvious to static analysis.
Suggested-by: Tapani Pälli <tapani.palli@intel.com>
Timothy Arceri [Wed, 6 Dec 2017 23:16:55 +0000 (10:16 +1100)]
glsl: get correct member type when processing xfb ifc arrays
This fixes a crash in:
KHR-GL45.enhanced_layouts.xfb_block_stride
Fixes: 0822517936d4 "glsl: add helper to process xfb qualifiers during linking"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gert Wollny [Wed, 6 Dec 2017 16:42:02 +0000 (17:42 +0100)]
r600/sb: do not convert if-blocks that contain indirect array access
If an array is accessed within an if block, then currently it is not known
whether the value in the address register is involved in the evaluation of the
if condition, and converting the if condition may actually result in
out-of-bounds array access. Consequently, if blocks that contain indirect array
access should not be converted.
Fixes piglits on r600/BARTS:
spec/glsl-1.10/execution/variable-indexing/
vs-output-array-float-index-wr
vs-output-array-vec3-index-wr
vs-output-array-vec4-index-wr
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104143
Signed-off-by: Gert Wollny <gw.fossdev@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Tue, 28 Nov 2017 02:53:02 +0000 (12:53 +1000)]
r600: add support for compute grid/block sizes. (v2)
We just pass these in from outside in a constant buffer.
The shader side stores them once they are accessed once.
v2: fix to not use a temp_reg.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Mon, 27 Nov 2017 06:12:18 +0000 (16:12 +1000)]
r600: handle image/buffer sizes correctly.
This adds support to compute for the resq workarounds (buffer/cube sizes)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 3 Nov 2017 01:47:55 +0000 (11:47 +1000)]
r600/compute: add support for emitting compute image/buffer atoms
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 3 Nov 2017 01:47:31 +0000 (11:47 +1000)]
r600/compute: handle atomic counters in compute state.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 3 Nov 2017 01:44:06 +0000 (11:44 +1000)]
r600/compute: add support for TGSI compute shaders. (v1.1)
This add paths to handle TGSI compute shaders and shader selection.
It also avoids emitting certain things on tgsi paths,
CBs, vertex buffers, config reg init (not required).
v1.1: fix rat mask calc
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 3 Nov 2017 01:15:26 +0000 (11:15 +1000)]
r600/shader: add compute support to shader assembler
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Fri, 3 Nov 2017 01:35:03 +0000 (11:35 +1000)]
r600/texture: drop lowering 1d/2d images to linear.
This appears to cause hangs with compute images. Unless
we can find more specifics, just don't do this for now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alejandro Piñeiro [Wed, 6 Dec 2017 08:57:18 +0000 (09:57 +0100)]
mesa: define nir_spirv_supported_capabilities
Until now it was part of spirv_to_nir_options. But it will be used on
the implementation of ARB_gl_spirv and ARB_spirv_extensions, and added
to the OpenGL context, as a way to save what SPIR-V capabilities the
current OpenGL implementation supports.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Fredrik Höglund [Tue, 5 Dec 2017 20:19:51 +0000 (21:19 +0100)]
anv: fix a case statement in GetMemoryFdPropertiesKHR
The handle type in the case statement is supposed to be VK_EXTERNAL_-
MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.
Fixes: ab18e8e59b6 ("anv: Implement VK_EXT_external_memory_dma_buf")
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Fredrik Höglund [Tue, 5 Dec 2017 20:15:25 +0000 (21:15 +0100)]
radv: fix a case statement in GetMemoryFdPropertiesKHR
The handle type in the case statement is supposed to be VK_EXTERNAL_-
MEMORY_HANDLE_TYPE_DMA_BUF_BIT_EXT.
Fixes: 546e747867c ("radv: Implement VK_EXT_external_memory_dma_buf")
Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Eric Engestrom [Wed, 6 Dec 2017 13:27:52 +0000 (13:27 +0000)]
meson: fix keyword argument in declare_dependency()
`declare_dependency()` takes `compile_args`, not `c_args`.
It was correct in all the other `declare_dependency()` from that commit.
Fixes: 0bbecc5a8548883f76a71 "meson: define driver dependencies"
Signed-off-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Emil Velikov [Wed, 6 Dec 2017 17:33:00 +0000 (17:33 +0000)]
i965: include brw_pipe_control.h in the tarball
Fixes: bfe0f3a7027 ("i965: Move PIPE_CONTROL defines and prototypes to
brw_pipe_control.h.")
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Emil Velikov [Thu, 16 Nov 2017 14:22:18 +0000 (14:22 +0000)]
mesa: document _mesa_extension_override_* variables
Currently there are no users of these outside of extensions.c.
Provide some information why they exist and how to use them.
Cc: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Andres Gomez <agomez@igalia.com>
Emil Velikov [Thu, 23 Nov 2017 16:56:44 +0000 (16:56 +0000)]
docs: annotate MESA_program_debug as obsolete
It has been obsolete for years - state it explicitly.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
George Kyriazis [Tue, 5 Dec 2017 16:47:12 +0000 (10:47 -0600)]
swr/scons: Fix another intermittent build failure
gen_BackendPixelRate*.cpp depends on gen_ar_eventhandler.hpp.
Fix missing dependency.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Marek Olšák [Fri, 1 Dec 2017 02:08:16 +0000 (03:08 +0100)]
radeonsi: make const and stream uploaders allocate read-only memory
and anything that clones these uploaders, like u_threaded_context.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 5 Dec 2017 19:04:11 +0000 (20:04 +0100)]
radeonsi: use a separate allocator for fine fences
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 5 Dec 2017 12:32:47 +0000 (13:32 +0100)]
radeonsi/gfx9: make shader binaries use read-only memory
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Tue, 5 Dec 2017 12:32:33 +0000 (13:32 +0100)]
winsys/amdgpu: make IBs use read-only memory
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Mon, 4 Dec 2017 22:02:54 +0000 (23:02 +0100)]
radeonsi: print the buffer list for CHECK_VM
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Marek Olšák [Thu, 30 Nov 2017 21:49:10 +0000 (22:49 +0100)]
radeonsi: allow DMABUF exports for local buffers
Cc: 17.3 <mesa-stable@lists.freedesktop.org>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Nicolai Hähnle [Thu, 23 Nov 2017 09:29:49 +0000 (10:29 +0100)]
radeonsi: always place sparse buffers in VRAM
Together with "radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check",
this ensures that sparse buffers are placed in VRAM.
Noticed by an assertion that started triggering with commit
d4fac1e1d7
("gallium/radeon: enable suballocations for VRAM with no CPU access")
Fixes KHR-GL45.sparse_buffer_tests.BufferStorageTest in debug builds.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Nicolai Hähnle [Thu, 23 Nov 2017 09:25:34 +0000 (10:25 +0100)]
radeonsi: fix the R600_RESOURCE_FLAG_UNMAPPABLE check
The flag is on the pipe_resource, not the r600_resource.
I don't see an obvious bug related to this, but it could potentially lead
to suboptimal placement of some resources.
Fixes: a41587433c4d ("gallium/radeon: add R600_RESOURCE_FLAG_UNMAPPABLE")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Jose Maria Casanova Crespo [Mon, 24 Jul 2017 20:42:59 +0000 (22:42 +0200)]
i965/fs: Use untyped_surface_read for 16-bit load_ssbo
SSBO loads were using byte_scattered read messages as they allow
reading 16-bit size components. byte_scattered messages can only
operate one component at a time so we needed to emit as many messages
as components.
But for vec2 and vec4 of 16-bit, being multiple of 32-bit we can use the
untyped_surface_read message to read pairs of 16-bit components using only
one message. Once each pair is read it is unshuffled to return the proper
16-bit components. vec3 case is assimilated to vec4 but the 4th component
is ignored.
16-bit scalars are read using one byte_scattered_read message.
v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Rework optimization using unshuffle 16 reads (Chema Casanova)
v3: Use W and D types insead of HF and F in shuffle to avoid rounding
erros (Jason Ekstrand)
Use untyped_surface_read for 16-bit vec3. (Jason Ekstrand)
v4: Use subscript insead of chaging type and stride (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Wed, 12 Jul 2017 12:49:41 +0000 (14:49 +0200)]
i965/fs: Optimize 16-bit SSBO stores by packing two into a 32-bit reg
Currently, we use byte-scattered write messages for storing 16-bit
into an SSBO. This is because untyped surface messages have a fixed
32-bit size.
This patch optimizes these 16-bit writes by combining 2 values (e.g,
two consecutive components aligned with 32-bits) into a 32-bit register,
packing the two 16-bit words.
16-bit single component values will continue to use byte-scattered
write messages. The same will happens when the first consecutive
component is not aligned 32-bits.
This optimization reduces the number of SEND messages used for storing
16-bit values potentially by 2 or 4, which cuts down execution time
significantly because byte-scattered writes are an expensive
operation as they only write a component for message.
v2: Removed use of stride = 2 on sources (Jason Ekstrand)
Rework optimization using shuffle 16 write and enable writes
of 16bit vec4 with only one message of 32-bits. (Chema Casanova)
v3: - Fix coding style (Eduardo Lima)
- Reorganize code to avoid duplication. (Jason Ekstrand)
- Include new comments to explain the length calculations to
fix alignment issues of components. (Jason Ekstrand)
- Fix issues with writemask yz with 16-bit writes. (Jason Ektrand)
v4: (Jason Ekstrand)
- Reorganize 64-bit ssbo-writes to avoid using slots_per_component.
- Comment about why suffle is needed when using byte_scattered_write.
Signed-off-by: Eduardo Lima <elima@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>