Eduardo Lima Mitev [Wed, 19 Dec 2018 08:18:04 +0000 (09:18 +0100)]
freedreno/ir3: Handle GL_NONE in get_num_components_for_glformat()
An earlier patch that introduced the function failed to handle the case
where an image format layout qualifier is not specified, which is allowed
on desktop GL profiles. In these cases, nir_variable's image format is
GL_NONE, and we don't need to print a debug message for those.
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Eric Anholt [Wed, 12 Dec 2018 19:11:07 +0000 (11:11 -0800)]
docs: Add an encouraging note about providing reviews and acks.
Across several projects I've seen new contributors say "I wasn't sure if I
should provide a review tag since I'm not really an expert in this area."
Everyone I know already applies some implicit weighting to reviews from
different people, so encourage participation.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Eric Anholt [Wed, 12 Dec 2018 19:08:00 +0000 (11:08 -0800)]
docs: Add a note that MRs should still include any r-b or a-b tags.
v2: Mention "Tested-by" too
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Eric Anholt [Mon, 17 Dec 2018 20:54:42 +0000 (12:54 -0800)]
v3d: Load and store aligned utiles all at once.
This calls the expensive uif offset function once per utile, but it still
gets us a 212.218% +/- 2.41216% (n=10) win on 1024x1024 glTexImage over
calling it on each pixel.
Eric Anholt [Mon, 17 Dec 2018 20:20:41 +0000 (12:20 -0800)]
v3d: Add a fallthrough path for utile load/store of 32 byte lines.
Now that V3D has 8 byte per pixel formats exposed, we've got stride==32
utiles to load and store. Just handle them through the non-NEON paths for
now.
Eric Anholt [Mon, 17 Dec 2018 19:10:11 +0000 (11:10 -0800)]
vc4: Move the utile load/store functions to a header for reuse by v3d.
These implementations of whole-utile load/stores would be the same for
v3d, though the layouts of blocks of utiles has changed.
Eric Anholt [Tue, 18 Dec 2018 22:50:57 +0000 (14:50 -0800)]
v3d: Implement texture_subdata to reduce teximage upload copies.
This lets us store the non-PBO glTexImage data directly into the tiled
image without making an extra untiled memcpy for the gallium transfer.
Improves 1024x1024 TexImage perf by ~19%, mostly from not thrashing around
in the kernel mapping and unmapping the transfer's temporary area.
Eric Anholt [Mon, 17 Dec 2018 20:50:08 +0000 (12:50 -0800)]
v3d: Remove dead prototypes for load/store utile functions.
Eric Anholt [Mon, 17 Dec 2018 20:57:38 +0000 (12:57 -0800)]
v3d: Don't try to create shadow tiled temporaries for 1D textures.
They're raster order anyway, so we'd assertion fail along with wasting
bandwidth.
Fixes: 6ad9e8690d14 ("v3d: Add support for texturing from linear.")
Eric Anholt [Wed, 19 Dec 2018 00:17:26 +0000 (16:17 -0800)]
v3d: Fix check for TFU job completion in the simulator.
We're waiting for the jobs-completed count to increment (with wrapping),
not to reach its starting state. This mostly ended up working out because
the next v3d_hw_tick() for a submit CL would end up doing the TFU
operation first, but it did fail when a blit was used for glReadPixels()
at the end of a test.
Fixes: ee0549ff9ab3 ("v3d: Add the V3D TFU submit interface to the simulator.")
Eric Anholt [Wed, 19 Dec 2018 17:29:26 +0000 (09:29 -0800)]
v3d: Put the dst bo first in the list of BOs for TFU calls.
In the UAPI, the first BO is the destination, and the one the kernel
should do an exclusive reservation on. Currently we only do exclusive
reservations, anyway. However, in the simulator path I was only copying
back the "destination" BO (actually src in this case), and this caused
regressions once I fixed the simulator to actually complete TFU before
returning (since otherwise, the TFU op would happen at the start of the
next CL submit and the draw would get the right contents).
Fixes: 976ea90bdca2 ("v3d: Add support for using the TFU to do some blits.")
Caio Marcelo de Oliveira Filho [Sat, 15 Dec 2018 00:10:32 +0000 (16:10 -0800)]
nir: properly find the entry to keep in copy_prop_vars
When copy propagation handles a store/copy, it iterates the current
copy entries to remove aliases, but keeps the "equal" entry (if
exists) to be updated.
The removal step may swap the entries around (to ensure there are no
holes), invalidating previous iteration pointers. The bug was saving
such pointer to use later. Change the code to first perform the
removals and then find the remaining right entry.
This was causing updates to be lost since they were being made to an
entry that was not part of the current copies.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108624
Fixes: b3c61469255 "nir: Copy propagation between blocks"
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Michel Dänzer [Wed, 19 Dec 2018 14:58:23 +0000 (15:58 +0100)]
winsys/amdgpu: Pull in LLVM CFLAGS
Fixes build failure if the LLVM headers aren't in a standard include
directory.
Fixes: ec22dd34c88f "radeonsi: move SI_FORCE_FAMILY functionality to
winsys"
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Caio Marcelo de Oliveira Filho [Sat, 15 Dec 2018 06:19:24 +0000 (22:19 -0800)]
nir: properly clear the entry sources in copy_prop_vars
When updating a copy entry source value from a "non-SSA" (the data
come from a copy instruction) to a "SSA" (the data or parts of it come
from SSA values), it was possible to hold invalid data in ssa[0]
depending on the writemask. Because the union, ssa[0] could contain a
pointer to a nir_deref_instr left-over from previous non-SSA usage.
Change code to clean up the array before use to avoid invalid data
around.
Fixes: 62332d139c8 "nir: Add a local variable-based copy propagation pass"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Eric Engestrom [Thu, 29 Nov 2018 13:15:48 +0000 (13:15 +0000)]
docs: format code blocks a bit nicely
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Eric Engestrom [Thu, 29 Nov 2018 13:16:42 +0000 (13:16 +0000)]
docs: add meson cross compilation instructions
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gurchetan Singh [Mon, 3 Dec 2018 23:16:43 +0000 (15:16 -0800)]
virgl: move resource creation / import / destruction to common code
We can remove some duplicated code.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Sat, 1 Dec 2018 04:45:44 +0000 (20:45 -0800)]
virgl: move resource metadata into base resource
A resource is just a buffer with some metadata.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Mon, 3 Dec 2018 16:50:48 +0000 (08:50 -0800)]
virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT
Previously, we ignored the the glUnmap(..) operation and
flushed before we flush the cbuf. Now, let's just flush
the data when we unmap.
Neither method is optimal, for example:
glMapBufferRange(.., 0, 100, GL_MAP_FLUSH_EXPLICIT_BIT)
glFlushMappedBufferRange(.., 25, 30)
glFlushMappedBufferRange(.., 65, 70)
We'll end up flushing 25 --> 70. Maybe we can fix this later.
v2: Add fixme comment in the code (Elie)
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Sat, 1 Dec 2018 01:29:16 +0000 (17:29 -0800)]
virgl: make virgl_buffers use resource helpers
We can reuse the helpers we created.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Sat, 1 Dec 2018 02:08:14 +0000 (18:08 -0800)]
virgl: make transfer code with PIPE_BUFFER targets
util_format_get_blocksize returns 1 for R8 formats (all
PIPE_BUFFERs are R8).
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Fri, 30 Nov 2018 22:54:33 +0000 (14:54 -0800)]
virgl: consolidate transfer code
We could allocate and destroy transfers in one place.
v2: Keep l_stride around.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Fri, 30 Nov 2018 22:31:36 +0000 (14:31 -0800)]
virgl: store layer_stride in metadata
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Sat, 10 Nov 2018 00:40:03 +0000 (16:40 -0800)]
virgl: move vrend_get_tex_image_offset to common code
Will be reused.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Sat, 10 Nov 2018 00:27:32 +0000 (16:27 -0800)]
virgl: move virgl_resource_layout to common code
Will be reused.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Sat, 10 Nov 2018 00:21:35 +0000 (16:21 -0800)]
virgl: move texture metadata to common code
Will be reused.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Mon, 3 Dec 2018 22:49:11 +0000 (14:49 -0800)]
virgl: remove unnessecary code
With commit 89b479, we moved to tracking buffer cleanliness
when binding.
TEST=dEQP-GLES31.functional.image_load_store.buffer.load_store.r32ui
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Gurchetan Singh [Fri, 30 Nov 2018 16:58:27 +0000 (08:58 -0800)]
virgl: texture_transfer_pool --> transfer_pool
It's used for all types of resources.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Nicolai Hähnle [Thu, 20 Sep 2018 08:21:26 +0000 (10:21 +0200)]
radeonsi: const-ify the si_query_ops
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 18 Sep 2018 20:29:41 +0000 (22:29 +0200)]
radeonsi: split perfcounter queries from si_query_hw
Remove a level of indirection to make the code more explicit -- should
make it easier to follow what's going on.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 18 Sep 2018 13:52:17 +0000 (15:52 +0200)]
radeonsi: factor si_query_buffer logic out of si_query_hw
This is a move towards using composition instead of inheritance for
different query types.
This change weakens out-of-memory error reporting somewhat, though this
should be acceptable since we didn't consistently report such errors in
the first place.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 18 Sep 2018 12:43:09 +0000 (14:43 +0200)]
radeonsi: move query suspend logic into the top-level si_query struct
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Tue, 18 Sep 2018 12:16:10 +0000 (14:16 +0200)]
radeonsi: move remaining perfcounter code into si_perfcounter.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 21 Sep 2018 15:19:34 +0000 (17:19 +0200)]
radeonsi: track constant buffer bind history in si_pipe_set_constant_buffer
Other callers of si_set_constant_buffer don't need it.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 20 Sep 2018 08:47:03 +0000 (10:47 +0200)]
radeonsi: use si_set_rw_shader_buffer for setting streamout buffers
Reduce the number of places that encode buffer descriptors.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 21 Sep 2018 15:35:56 +0000 (17:35 +0200)]
radeonsi: add an si_set_rw_shader_buffer convenience function
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 16 Sep 2018 13:56:13 +0000 (15:56 +0200)]
radeonsi: avoid using hard-coded SI_NUM_RW_BUFFERS
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 31 Aug 2018 17:51:50 +0000 (19:51 +0200)]
radeonsi: show the fixed function TCS in debug dumps
This is rather important for merged VS/TCS as LSHS shaders...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 30 Aug 2018 15:11:23 +0000 (17:11 +0200)]
radeonsi: const-ify si_set_tesseval_regs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 2 Jul 2018 16:41:06 +0000 (18:41 +0200)]
radeonsi: rename SI_RESOURCE_FLAG_FORCE_TILING to clarify its purpose
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Fri, 21 Sep 2018 16:05:19 +0000 (18:05 +0200)]
radeonsi: don't set RAW_WAIT for CP DMA clears
There is never a read-after-write hazard because the command doesn't read.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 28 Jun 2018 22:08:26 +0000 (00:08 +0200)]
radeonsi/gfx9: use SET_UCONFIG_REG_INDEX packets when available
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 16 Nov 2017 11:14:51 +0000 (12:14 +0100)]
radeonsi: add si_init_draw_functions and make some functions static
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 19 Nov 2017 16:29:31 +0000 (17:29 +0100)]
radeonsi: extract declare_vs_blit_inputs
Prepare for some later refactoring.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sat, 18 Nov 2017 22:23:04 +0000 (23:23 +0100)]
radeonsi: move SI_FORCE_FAMILY functionality to winsys
This helps some debugging cases by initializing addrlib with
slightly more appropriate settings.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 29 Nov 2018 17:34:01 +0000 (18:34 +0100)]
ac/surface: 3D and cube surfaces are never displayable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 20 Sep 2018 17:09:50 +0000 (19:09 +0200)]
amd/common: add i1 special case to ac_build_{inclusive,exclusive}_scan
Allow for a unified but efficient treatment of adding a bitmask over a
wave or an entire threadgroup.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 23 May 2018 20:09:27 +0000 (22:09 +0200)]
amd/common: scan/reduce across waves of a workgroup
Order-aware scan/reduce can trade-off LDS traffic for external atomics
memory traffic in producer/consumer compute shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Wed, 23 May 2018 20:04:20 +0000 (22:04 +0200)]
amd/common: add ac_build_ifcc
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 29 Nov 2018 18:00:15 +0000 (19:00 +0100)]
amd/common: whitespace fixes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Sun, 19 Nov 2017 11:59:45 +0000 (12:59 +0100)]
amd/sid_tables: add additional python3 compatibility imports
This happened to bite me while doing some experiments.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Thu, 29 Nov 2018 12:48:03 +0000 (13:48 +0100)]
r600: remove redundant semicolon
Reviewed-By: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 27 Aug 2018 13:24:07 +0000 (15:24 +0200)]
ddebug: always flush when requested, even when hang detection is disabled
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Nicolai Hähnle [Mon, 28 May 2018 15:30:25 +0000 (17:30 +0200)]
ddebug: simplify watchdog loop and fix crash in the no-timeout case
The following race condition could occur in the no-timeout case:
API thread Gallium thread Watchdog
---------- -------------- --------
dd_before_draw
u_threaded_context draw
dd_after_draw
add to dctx->records
signal watchdog
dump & destroy record
execute draw
dd_after_draw_async
use-after-free!
Alternatively, the same scenario would assert in a debug build when
destroying the record because record->driver_finished has not signaled.
Fix this and simplify the logic at the same time by
- handing the record pointers off to the watchdog thread *before* each
draw call and
- waiting on the driver_finished fence in the watchdog thread
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tapani Pälli [Tue, 25 Sep 2018 10:20:54 +0000 (13:20 +0300)]
anv/android: turn on VK_ANDROID_external_memory_android_hardware_buffer
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Thu, 27 Sep 2018 08:02:59 +0000 (11:02 +0300)]
anv: ignore VkSamplerYcbcrConversion on non-yuv formats
This fulfills a requirement for clients that want to utilize same
code path for images with external formats (VK_FORMAT_UNDEFINED) and
"regular" RGBA images where format is known. This is similar to how
OES_EGL_image_external works.
To support this, we allow color conversion samplers for non-YUV
formats but skip setting up conversion when format does not have
can_ycbcr flag set.
v2: add comment and bundle can_ycbcr to the existing break
condition (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Mon, 8 Oct 2018 11:42:53 +0000 (14:42 +0300)]
anv: support VkSamplerYcbcrConversionInfo in vkCreateImageView
If a conversion struct was passed, then initialize view using
format from the conversion structure.
v2: use vk_format directly from the anv_format struct
v3: added some assertions (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Tue, 13 Nov 2018 07:57:09 +0000 (09:57 +0200)]
anv: add VkFormat field as part of anv_format
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Mon, 8 Oct 2018 11:45:52 +0000 (14:45 +0300)]
anv: support VkExternalFormatANDROID in vkCreateSamplerYcbcrConversion
If external format is used, we store the external format identifier in
conversion to be used later when creating VkImageView.
v2: rebase to
b43f955037c changes
v3: added assert, ignore components when creating external
format conversion (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Thu, 8 Nov 2018 08:37:12 +0000 (10:37 +0200)]
anv/android: support creating images from external format
Since we don't know the exact format at creation time, some initialization
is done only when bound with memory in vkBindImageMemory.
v2: demand dedicated allocation in vkGetImageMemoryRequirements2 if
image has external format
v3: refactor prepare_ahw_image, support vkBindImageMemory2,
calculate stride correctly for rgb(x) surfaces, rename as
'resolve_ahw_image'
v4: rebase to
b43f955037c changes
v5: add some assertions to verify input correctness (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Fri, 17 Aug 2018 08:52:10 +0000 (11:52 +0300)]
anv/android: add ahardwarebuffer external memory properties
v2: have separate memory properties for android, set usage
flags for buffers correctly
v3: code cleanup (Jason)
+ limit maxArrayLayers to 1 for AHardwareBuffer based images
v4: rebase to
b43f955037c changes
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Thu, 8 Nov 2018 08:20:35 +0000 (10:20 +0200)]
anv/android: support import/export of AHardwareBuffer objects
v2: add support for non-image buffers (AHARDWAREBUFFER_FORMAT_BLOB)
v3: properly handle usage bits when creating from image
v4: refactor, code cleanup (Jason)
v5: rebase to
b43f955037c changes,
initialize bo flags as ANV_BO_EXTERNAL (Lionel)
v6: add assert that anv_bo_cache_import succeeds, add comment
about multi-bo support to clarify current implementation (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Tue, 9 Oct 2018 06:53:55 +0000 (09:53 +0300)]
anv: refactor, remove else block in AllocateMemory
This makes it cleaner to introduce more cases where we import memory
from different types of external memory buffers.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Thu, 8 Nov 2018 08:02:42 +0000 (10:02 +0200)]
anv: add anv_ahw_usage_from_vk_usage helper function
v2: rebase to
b43f955037c changes
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Tue, 29 May 2018 06:32:25 +0000 (09:32 +0300)]
anv/android: add GetAndroidHardwareBufferPropertiesANDROID
Use the anv_format address in formats table as implementation-defined
external format identifier for now. When adding YUV format support this
might need to change.
v2: code cleanup (Jason)
v3: set anv_format address as identifier
v4: setup suggestedYcbcrModel and suggested[X|Y]ChromaOffset
as expected for HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL
v5: set linear tiling for GPU_DATA_BUFFER usage, add comment
about multi-bo support to clarify current implementation (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Tue, 29 May 2018 06:29:03 +0000 (09:29 +0300)]
anv: add from/to helpers with android and vulkan formats
v2: handle R8G8B8X8 as R8G8B8_UNORM (Jason)
v3: add HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL, we make it define
for now to avoid direct dependency to minigbm headers
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Tue, 29 May 2018 06:26:42 +0000 (09:26 +0300)]
anv: make anv_get_image_format_features public
This will be utilized later by GetAndroidHardwareBufferPropertiesANDROID.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Thu, 27 Sep 2018 05:00:53 +0000 (08:00 +0300)]
anv: refactor make_surface to use data from anv_image
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tapani Pälli [Tue, 21 Aug 2018 08:12:37 +0000 (11:12 +0300)]
anv: add create_flags as part of anv_image
This will make it possible for next patch to rip
anv_image_create_info out from make_surface function.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ian Romanick [Tue, 18 Dec 2018 21:28:22 +0000 (13:28 -0800)]
nir/algebraic: Don't put quotes around floating point literals
The quotation marks around 1.0 cause it to be treated as a string
instead of a floating point value. The generator then treats it as an
arbitrary variable replacement, so any iand involving a ('ineg', ('b2i',
a)) matches.
v2: Remove misleading comment about sized literals (suggested by
Timothy). Add assertion that the name of a varible is entierly
alphabetic (suggested by Jason).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> [v1]
Fixes: 6bcd2af0863 ("nir/algebraic: Add some optimizations for D3D-style Booleans")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109075
Vinson Lee [Tue, 18 Dec 2018 17:42:04 +0000 (09:42 -0800)]
meson: Fix libsensors detection.
Fixes: 5e71efef44b9 ("meson: Add lmsensors support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Vinson Lee [Mon, 17 Dec 2018 00:35:00 +0000 (16:35 -0800)]
meson: Fix typo.
Fixes: 6b4c7047d571 ("meson: build gallium nine state_tracker")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Sagar Ghuge [Thu, 13 Dec 2018 19:40:58 +0000 (11:40 -0800)]
nir: Add a new lowering option to lower 3D surfaces from txd to txl.
Tested on gen9.
v2: Rename lower_txd_3d_surafaces flag to lower_txd_3d (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Christian Gmeiner [Thu, 13 Dec 2018 20:07:23 +0000 (21:07 +0100)]
meson: add etnaviv to the tools option
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Adam Jackson [Fri, 9 Nov 2018 16:46:48 +0000 (11:46 -0500)]
specs: Bump GLX_MESA_query_renderer to version 9
Note that we have an official GL extension number, pick the appropriate
section of the GLX spec to modify, and add changelog.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Adam Jackson [Fri, 9 Nov 2018 16:37:42 +0000 (11:37 -0500)]
specs: Remove GLX_RENDERER_ID_MESA from GLX_MESA_query_renderer
This has not even had an attempt at implementation. If you asked for
renderer 0 - which, the spec implies, should always work - then
dri2_convert_glx_attribs would fail, we'd silently fall back to creating
an indirect context, and xserver would also not recognize the attribute
and would throw BadValue at you.
The API would be difficult to use in any case, since there's no way to
enumerate how many renderers the screen has. I'd be tempted to add that
by defining:
glXQueryRendererIntegerMESA(dpy, screen,
/* renderer = */ -1,
0, &value);
to return the number of renderers, but a new entrypoint might be
cleaner. Still, better to not specify it at all than to lie about it.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Adam Jackson [Fri, 9 Nov 2018 16:27:27 +0000 (11:27 -0500)]
specs: Remove GLES profile interaction text from GLX_MESA_query_renderer
In one place we say, if GLES isn't supported then the profile version
will be 0.0. Then later we say, if the GLES profile extension isn't
supported then GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not
mentioned in the spec. A strict reading of the latter would mean that
GLX_RENDERER_OPENGL_ES_PROFILE_VERSION_MESA is not a recognized token,
and the query should instead return False.
The implementation does not check for the GLES profile extensions, and
the additional complexity doesn't seem worth it. Removing the
interaction text makes the spec match the implementation.
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Eduardo Lima Mitev [Mon, 17 Dec 2018 20:41:24 +0000 (21:41 +0100)]
freedreno/ir3: Make imageStore use num components from image format
emit_intrinsic_store_image() is always using 4 components when
collecting registers for the value. When image has less than
4 components (e.g, r32f, rg32i, etc) this results in extra mov
instructions.
This patch uses the actual number of components from the image format.
For example, in a shader like:
layout (r32f, binding=0) writeonly uniform imageBuffer u_image;
...
void main(void) {
...
imageStore (u_image, some_offset, vec4(1.0));
...
}
instruction count is reduced in at least 3 instructions (note image
format is r32f, 1 component only).
This obviously reduces register pressure as well.
v2: - Added support for image formats from NV_image_format extension
(Ilia Mirkin).
- Return 4 components by default instead of asserting. (Rob Clark).
v3: Added more missing formats (Ilia Mirkin).
v4: Added a debug message for unknown image formats (Rob Clark).
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Jason Ekstrand [Thu, 13 Dec 2018 22:30:24 +0000 (16:30 -0600)]
nir/dead_write_vars: Get modes directly from derefs
Instead of going all the way back to the variable, just look at the
deref. The modes are guaranteed to be the same by nir_validate whenever
the variable can be found. This fixes clear_unused_for_modes for
derefs that don't have an accessible variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jason Ekstrand [Thu, 13 Dec 2018 22:28:23 +0000 (16:28 -0600)]
nir/copy_prop_vars: Get modes directly from derefs
Instead of going all the way back to the variable, just look at the
deref. The modes are guaranteed to be the same by nir_validate whenever
the variable can be found. This fixes apply_barrier_for_modes for
derefs that don't have an accessible variable.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jason Ekstrand [Thu, 13 Dec 2018 21:03:28 +0000 (15:03 -0600)]
nir/lower_wpos_center: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jason Ekstrand [Thu, 13 Dec 2018 21:01:05 +0000 (15:01 -0600)]
nir/lower_io_to_scalar: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jason Ekstrand [Thu, 13 Dec 2018 20:33:41 +0000 (14:33 -0600)]
nir/lower_io_arrays_to_elements: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jason Ekstrand [Thu, 13 Dec 2018 20:31:54 +0000 (14:31 -0600)]
nir/linking_helpers: Look at derefs for modes
This is instead of looking all the way back to the variable which may
not exist for all derefs. This makes this code properly ignore casts
with modes other than the mode[s] we care about (where casts aren't
allowed).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Jason Ekstrand [Thu, 13 Dec 2018 20:06:48 +0000 (14:06 -0600)]
nir/propagate_invariant: Skip unknown vars
If we can't find the variable from the deref, just assume it isn't
invariant and continue on. This can happen if, for instance, we're
writing to a deref that points into an SSBO.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Ian Romanick [Tue, 18 Dec 2018 02:48:17 +0000 (18:48 -0800)]
Revert "nir/lower_indirect: Bail early if modes == 0"
"There's no point in walking the program if we're never going to
actually lower anything."
Except we might lower compacted local arrays. In that case, modes will
be 0, but there is still lowering to be done.
This reverts commit
7f75cf2a9408b9af562e033ef6c1d1fd15141421.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109081
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Clayton Craft <clayton.a.craft@intel.com>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Lucas Stach [Thu, 17 May 2018 09:03:57 +0000 (11:03 +0200)]
st/dri: replace format conversion functions with single mapping table
Each time I have to touch the buffer import/export functions in the dri
state tracker I get lost in the maze of functions converting between
DRI_IMAGE_FOURCC, DRI_IMAGE_FORMAT, DRI_IMAGE_COMPONENTS and pipe format.
Rip it out and replace by a single table, which defines the correspondence
between the different representations.
Also this now stores all the known representations in the __DRIimageRec,
to avoid the loss of information we currently have when importing a buffer
with a fourcc, which doesn't have a corresponding dri format.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Lucas Stach [Mon, 26 Mar 2018 14:39:28 +0000 (16:39 +0200)]
st/dri: allow both render and sampler compatible dma-buf formats
Currently all the EGL APIs are missing a way to specify how an imported
dma-buf is intended to be used. Demanding the format to be both usable
for sampling and rendering artificially restricts the list of formats a
driver is able to import.
Looking at how the Intel driver implements those DRI2 image APIs it
doesn't distinguish between render or sampler compatible formats. So
this patch aligns behavior between Intel and Gallium based drivers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Lucas Stach [Wed, 14 Nov 2018 14:11:07 +0000 (15:11 +0100)]
etnaviv: use surface format directly
There is no need to do the detour over the resource behind the
surface to get the format. Use the surface format directly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Dylan Baker [Tue, 4 Dec 2018 18:06:08 +0000 (10:06 -0800)]
meson: Add toggle for glx-direct
GNU Hurd needs to turn off glx-direct, rather than special case it,
we'll just add a toggle.
CC: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Dylan Baker [Tue, 4 Dec 2018 17:56:30 +0000 (09:56 -0800)]
meson: Add support for gnu hurd
CC: 18.3 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Dylan Baker [Tue, 4 Dec 2018 17:48:42 +0000 (09:48 -0800)]
meson: remove duplicate definition
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Dylan Baker [Tue, 4 Dec 2018 17:28:10 +0000 (09:28 -0800)]
meson: Fix ppc64 little endian detection
Old versions of meson returned ppc64le as the cpu_family for little
endian power8 cpus, versions >=0.48 don't do this, so the check wouldn't
work in that case. This generalizes the check to work for both old and
new versions of meson.
Fixes: 34bbb24ce7702658cdc4e9d34a650e169716c39e
("meson: Add support for ppc assembly/optimizations")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Jason Ekstrand [Mon, 17 Dec 2018 16:48:47 +0000 (10:48 -0600)]
anv: Bump the patch version to 96
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Kenneth Graunke [Fri, 14 Dec 2018 23:38:29 +0000 (15:38 -0800)]
i965: Don't override subslice count to 4 on Gen11.
Gen9-10 have fewer than 4 subslices per slice, so they need this to be
rounded up. Gen11 isn't documented as needing this hack, and it can
also have more than 4 subslices, so the hack actually can break things.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Ian Romanick [Tue, 19 Jun 2018 00:09:41 +0000 (17:09 -0700)]
intel/compiler: More peephole_select for pre-Gen6
No shader-db changes on any Gen6+ platform.
All of the shaders with cycles hurt by more than ~2% are from Master of
Orion. All of the shaders have instructions helped. It looks like the
pass enables some control flow to be converted to bcsels, then the
scheduler does dumb things. These are new shaders (just added before
doing this shader-db run), so there's probably some low-hanging fruit.
Iron Lake
total instructions in shared programs:
8214327 ->
8213684 (<.01%)
instructions in affected programs: 84469 -> 83826 (-0.76%)
helped: 114
HURT: 26
helped stats (abs) min: 2 max: 18 x̄: 7.75 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.52% x̃: 1.05%
HURT stats (abs) min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel) min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61%
95% mean confidence interval for instructions value: -5.87 -3.32
95% mean confidence interval for instructions %-change: -2.32% -1.17%
Instructions are helped.
total cycles in shared programs:
187736850 ->
187749314 (<.01%)
cycles in affected programs: 506750 -> 519214 (2.46%)
helped: 104
HURT: 36
helped stats (abs) min: 2 max: 72 x̄: 21.96 x̃: 16
helped stats (rel) min: 0.02% max: 6.16% x̄: 0.97% x̃: 0.63%
HURT stats (abs) min: 4 max: 1402 x̄: 409.67 x̃: 40
HURT stats (rel) min: 0.33% max: 23.12% x̄: 5.79% x̃: 1.39%
95% mean confidence interval for cycles value: 28.32 149.74
95% mean confidence interval for cycles %-change: -0.07% 1.61%
Inconclusive result (%-change mean confidence interval includes 0).
GM45
total instructions in shared programs:
5044014 ->
5043652 (<.01%)
instructions in affected programs: 46751 -> 46389 (-0.77%)
helped: 63
HURT: 13
helped stats (abs) min: 2 max: 29 x̄: 7.65 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.93% x̃: 1.04%
HURT stats (abs) min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel) min: 0.66% max: 2.35% x̄: 1.58% x̃: 1.52%
95% mean confidence interval for instructions value: -6.54 -2.99
95% mean confidence interval for instructions %-change: -3.04% -1.28%
Instructions are helped.
total cycles in shared programs:
128143042 ->
128150188 (<.01%)
cycles in affected programs: 324564 -> 331710 (2.20%)
helped: 57
HURT: 19
helped stats (abs) min: 6 max: 74 x̄: 30.70 x̃: 32
helped stats (rel) min: 0.08% max: 4.74% x̄: 1.22% x̃: 0.81%
HURT stats (abs) min: 10 max: 1400 x̄: 468.21 x̃: 60
HURT stats (rel) min: 0.56% max: 19.94% x̄: 5.80% x̃: 1.70%
95% mean confidence interval for cycles value: 6.90 181.15
95% mean confidence interval for cycles %-change: -0.52% 1.59%
Inconclusive result (%-change mean confidence interval includes 0).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ian Romanick [Mon, 18 Jun 2018 23:11:55 +0000 (16:11 -0700)]
nir/opt_peephole_select: Don't peephole_select expensive math instructions
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive. On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.
This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).
v2: Remove stray #if block. Noticed by Thomas.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ian Romanick [Wed, 23 May 2018 01:56:41 +0000 (18:56 -0700)]
intel/compiler: More peephole select
Shader-db results:
The one shader hurt for instructions is a compute shader that had both
spills and fills hurt.
v2: Fix typo in comment noticed by Caio.
v3: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs:
15072761 ->
15047884 (-0.17%)
instructions in affected programs: 895539 -> 870662 (-2.78%)
helped: 3623
HURT: 1
helped stats (abs) min: 1 max: 181 x̄: 6.89 x̃: 4
helped stats (rel) min: 0.10% max: 25.00% x̄: 3.93% x̃: 3.20%
HURT stats (abs) min: 92 max: 92 x̄: 92.00 x̃: 92
HURT stats (rel) min: 1.92% max: 1.92% x̄: 1.92% x̃: 1.92%
95% mean confidence interval for instructions value: -7.10 -6.63
95% mean confidence interval for instructions %-change: -4.03% -3.82%
Instructions are helped.
total cycles in shared programs:
369738930 ->
369535732 (-0.05%)
cycles in affected programs:
68027851 ->
67824653 (-0.30%)
helped: 2609
HURT: 1035
helped stats (abs) min: 1 max: 4508 x̄: 181.44 x̃: 77
helped stats (rel) min: <.01% max: 71.31% x̄: 9.14% x̃: 5.47%
HURT stats (abs) min: 1 max: 33336 x̄: 261.04 x̃: 20
HURT stats (rel) min: <.01% max: 47.61% x̄: 2.93% x̃: 1.47%
95% mean confidence interval for cycles value: -96.43 -15.09
95% mean confidence interval for cycles %-change: -6.07% -5.36%
Cycles are helped.
total spills in shared programs: 10158 -> 10159 (<.01%)
spills in affected programs: 166 -> 167 (0.60%)
helped: 1
HURT: 1
total fills in shared programs: 22105 -> 22116 (0.05%)
fills in affected programs: 837 -> 848 (1.31%)
helped: 4
HURT: 1
Ivy Bridge
total instructions in shared programs:
12021190 ->
11990256 (-0.26%)
instructions in affected programs: 910561 -> 879627 (-3.40%)
helped: 3344
HURT: 18
helped stats (abs) min: 1 max: 99 x̄: 9.29 x̃: 6
helped stats (rel) min: 0.11% max: 31.18% x̄: 5.19% x̃: 3.31%
HURT stats (abs) min: 2 max: 20 x̄: 7.89 x̃: 6
HURT stats (rel) min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70%
95% mean confidence interval for instructions value: -9.49 -8.91
95% mean confidence interval for instructions %-change: -5.32% -4.98%
Instructions are helped.
total cycles in shared programs:
179077826 ->
178570196 (-0.28%)
cycles in affected programs:
63205667 ->
62698037 (-0.80%)
helped: 2767
HURT: 620
helped stats (abs) min: 1 max: 7531 x̄: 217.58 x̃: 88
helped stats (rel) min: <.01% max: 75.86% x̄: 9.59% x̃: 6.09%
HURT stats (abs) min: 1 max: 31255 x̄: 152.27 x̃: 11
HURT stats (rel) min: <.01% max: 36.36% x̄: 2.77% x̃: 0.58%
95% mean confidence interval for cycles value: -173.94 -125.81
95% mean confidence interval for cycles %-change: -7.68% -6.97%
Cycles are helped.
Sandy Bridge
total instructions in shared programs:
10852569 ->
10843758 (-0.08%)
instructions in affected programs: 235803 -> 226992 (-3.74%)
helped: 800
HURT: 0
helped stats (abs) min: 1 max: 88 x̄: 11.01 x̃: 8
helped stats (rel) min: 0.11% max: 23.08% x̄: 4.69% x̃: 3.36%
95% mean confidence interval for instructions value: -11.93 -10.10
95% mean confidence interval for instructions %-change: -4.99% -4.39%
Instructions are helped.
total cycles in shared programs:
154732047 ->
154608941 (-0.08%)
cycles in affected programs:
4063110 ->
3940004 (-3.03%)
helped: 606
HURT: 253
helped stats (abs) min: 1 max: 2524 x̄: 227.93 x̃: 62
helped stats (rel) min: 0.02% max: 39.24% x̄: 4.36% x̃: 1.81%
HURT stats (abs) min: 1 max: 1966 x̄: 59.36 x̃: 11
HURT stats (rel) min: 0.02% max: 67.10% x̄: 3.22% x̃: 0.67%
95% mean confidence interval for cycles value: -170.49 -116.13
95% mean confidence interval for cycles %-change: -2.61% -1.65%
Cycles are helped.
No change on Iron Lake or GM45.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ian Romanick [Wed, 27 Jun 2018 18:41:19 +0000 (11:41 -0700)]
nir/opt_peephole_select: Don't try to remove flow control around indirect loads
That flow control may be trying to avoid invalid loads. On at least
some platforms, those loads can also be expensive.
No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").
v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested
by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Ian Romanick [Wed, 20 Jun 2018 01:09:05 +0000 (18:09 -0700)]
i965/vec4: Propagate conditional modifiers from more compares to other compares
If there is a CMP.NZ that compares a single component (via a .zzzz
swizzle, for example) with 0, it can propagate its conditional modifier
back to a previous CMP that writes only that component. The specific
case that I saw was:
cmp.l.f0(8) g42<1>.xF g61<4>.xF (abs)g18<4>.zF
...
cmp.nz.f0(8) null<1>D g42<4>.xD 0D
In this case we can just delete the second CMP.
No changes on Broadwell or Skylake because they do not use the vec4
backend. Also no changes on GM45 or Iron Lake.
Sandy Bridge, Ivy Bridge, and Haswell had similar results. (Sandy Bridge shown)
total instructions in shared programs:
10856676 ->
10852569 (-0.04%)
instructions in affected programs: 228322 -> 224215 (-1.80%)
helped: 1331
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 3.09 x̃: 4
helped stats (rel) min: 0.11% max: 6.67% x̄: 1.88% x̃: 1.83%
95% mean confidence interval for instructions value: -3.19 -2.99
95% mean confidence interval for instructions %-change: -1.93% -1.83%
Instructions are helped.
total cycles in shared programs:
154788865 ->
154732047 (-0.04%)
cycles in affected programs:
2485892 ->
2429074 (-2.29%)
helped: 1097
HURT: 59
helped stats (abs) min: 2 max: 168 x̄: 51.96 x̃: 64
helped stats (rel) min: 0.12% max: 12.70% x̄: 3.44% x̃: 2.22%
HURT stats (abs) min: 2 max: 16 x̄: 3.02 x̃: 2
HURT stats (rel) min: 0.18% max: 0.83% x̄: 0.64% x̃: 0.71%
95% mean confidence interval for cycles value: -51.04 -47.26
95% mean confidence interval for cycles %-change: -3.40% -3.07%
Cycles are helped.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>