Gert Wollny [Tue, 5 Jun 2018 11:58:53 +0000 (13:58 +0200)]
gallium/aux/tgsi_exec.c: remove superfluous parameter from fetch_source_d
Remove unused parameter src_datatype from fetch_source_d, fixes warning:
tgsi/tgsi_exec.c: In function 'fetch_source_d':
tgsi/tgsi_exec.c:1594:40: warning: unused parameter 'src_datatype' [-Wunused-parameter]
enum tgsi_exec_datatype src_datatype)
^~~~~~~~~~~~
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Tue, 5 Jun 2018 11:58:52 +0000 (13:58 +0200)]
gallium/aux/tgsi_exec.c: remove superfluous parameter from store_dest_dstret
remove unused parameter inst from store_dest_dstret (and consequently also from
store_dest_double), fixes warning:
tgsi/tgsi_exec.c: In function 'store_dest_dstret':
tgsi/tgsi_exec.c:1765:47: warning: unused parameter 'inst' [-Wunused-parameter]
const struct tgsi_full_instruction *inst)
^~~~
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Tue, 5 Jun 2018 11:58:51 +0000 (13:58 +0200)]
gallium/aux/tgsi_exec.c: Remove unused parameter from fetch_src_file_channel
remove unused parameter chan_index from fetch_src_file_channel, fixes warning:
tgsi/tgsi_exec.c: In function 'fetch_src_file_channel':
tgsi/tgsi_exec.c:1480:35: warning: unused parameter 'chan_index' [-Wunused-parameter]
const uint chan_index,
^~~~~~~~~~
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Tue, 5 Jun 2018 11:58:50 +0000 (13:58 +0200)]
gallium/aux/tgsi_exec.c: Remove parameter inst from exec_kill
Fixes warning:
tgsi/tgsi_exec.c: In function 'exec_kill':
tgsi/tgsi_exec.c:2049:47: warning: unused parameter 'inst' [-Wunused-parameter]
const struct tgsi_full_instruction *inst)
^~~~
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Tue, 5 Jun 2018 11:58:49 +0000 (13:58 +0200)]
gallium/aux/tgsi_aa_point.c: Fix -Wsign-compare warnings
tgsi/tgsi_aa_point.c:32:0:
tgsi/tgsi_aa_point.c: In function 'aa_decl':
./util/u_math.h:660:29: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
#define MAX2( A, B ) ( (A)>(B) ? (A) : (B) )
^
tgsi/tgsi_aa_point.c:76:21: note: in expansion of macro 'MAX2'
ts->num_tmp = MAX2(ts->num_tmp, decl->Range.Last + 1);
^~~~
./util/u_math.h:660:40: warning: signed and unsigned type in conditional
expression [-Wsign-compare]
#define MAX2( A, B ) ( (A)>(B) ? (A) : (B) )
^
tgsi/tgsi_aa_point.c:76:21: note: in expansion of macro 'MAX2'
ts->num_tmp = MAX2(ts->num_tmp, decl->Range.Last + 1);
^~~~
tgsi/tgsi_aa_point.c: In function 'aa_inst':
tgsi/tgsi_aa_point.c:220:31: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
dst->Register.Index == ts->color_out) {
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
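To illustrate why -Wsign-compare matters for a macro like MAX2, here is a minimal sketch. The helper names max_mixed and max_signed are made up for illustration and are not taken from tgsi_aa_point.c, and the sketch does not claim to show how the commit resolved the warning. The cast in max_mixed only spells out the conversion the compiler would otherwise apply implicitly.

```c
#include <limits.h>

#define MAX2( A, B )   ( (A)>(B) ? (A) : (B) )

/* Hypothetical helper: mimics what happens when a signed and an unsigned
 * operand meet in MAX2. The usual arithmetic conversions turn the signed
 * value into unsigned, so -1 becomes UINT_MAX and "wins" the comparison. */
static unsigned
max_mixed(int a, unsigned b)
{
   return MAX2((unsigned)a, b);
}

/* Hypothetical helper: both operands share one signed type, so the
 * comparison behaves as expected. Assumes b fits in an int. */
static int
max_signed(int a, unsigned b)
{
   return MAX2(a, (int)b);
}
```

With these definitions, max_mixed(-1, 1u) yields UINT_MAX while max_signed(-1, 1u) yields 1.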
Gert Wollny [Tue, 5 Jun 2018 11:58:48 +0000 (13:58 +0200)]
gallium/aux/tgsi_sanity.c: Fix -Wsign-compare warnings
tgsi_sanity.c: In function 'iter_instruction':
tgsi_sanity.c:316:29: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
if (ctx->index_of_END != ~0) {
^~
tgsi_sanity.c: In function 'epilog':
tgsi_sanity.c:488:26: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
if (ctx->index_of_END == ~0) {
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
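A minimal sketch of the kind of comparison that triggers this warning, and one possible way to avoid it. The struct sanity_ctx_example and the function end_seen are hypothetical stand-ins, and using an unsigned literal is an assumption about how such a warning can be silenced, not a statement of what this commit changed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sanity-checker context; only the field
 * named in the warning is modelled, and it is assumed to be unsigned. */
struct sanity_ctx_example {
   unsigned index_of_END;
};

/* `~0` has type int (value -1), so comparing it with an unsigned field
 * relies on an implicit signed-to-unsigned conversion, which is what
 * -Wsign-compare reports. The literal `~0u` keeps both sides unsigned. */
static bool
end_seen(const struct sanity_ctx_example *ctx)
{
   return ctx->index_of_END != ~0u;
}
```

Here ~0u serves as the "no END instruction seen yet" sentinel, so end_seen returns false until the field is set to a real index.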
Gert Wollny [Tue, 5 Jun 2018 11:58:47 +0000 (13:58 +0200)]
gallium/aux/tgsi/tgsi_parse.c: Fix two warnings
tgsi_parse.c: In function 'tgsi_parse_free':
tgsi_parse.c:54:31: warning: unused parameter 'ctx' [-Wunused-parameter]
struct tgsi_parse_context *ctx )
^~~
tgsi_parse.c: In function 'tgsi_parse_end_of_tokens':
tgsi_parse.c:62:25: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
return ctx->Position >=
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Tue, 5 Jun 2018 11:58:46 +0000 (13:58 +0200)]
gallium/aux/tgsi/tgsi_dump.c: Fix -Wsign-compare warnings
tgsi_dump.c: In function 'iter_property':
tgsi_dump.c:443:18: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
for (i = 0; i < prop->Property.NrTokens - 1; ++i) {
^
tgsi_dump.c:459:13: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
if (i < prop->Property.NrTokens - 2)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Tue, 5 Jun 2018 11:58:45 +0000 (13:58 +0200)]
gallium/aux/cso_cache: Fix various warnings
cso_cache.c: In function 'delete_blend_state':
cso_cache/cso_cache.c:90:51: warning: unused parameter 'data' [-Wunused-parameter]
static void delete_blend_state(void *state, void *data)
^~~~
cso_cache/cso_cache.c: In function 'delete_depth_stencil_state':
cso_cache/cso_cache.c:98:59: warning: unused parameter 'data' [-Wunused-parameter]
static void delete_depth_stencil_state(void *state, void *data)
^~~~
cso_cache/cso_cache.c: In function 'delete_sampler_state':
cso_cache/cso_cache.c:106:53: warning: unused parameter 'data' [-Wunused-parameter]
static void delete_sampler_state(void *state, void *data)
^~~~
cso_cache/cso_cache.c: In function 'delete_rasterizer_state':
cso_cache/cso_cache.c:114:56: warning: unused parameter 'data' [-Wunused-parameter]
static void delete_rasterizer_state(void *state, void *data)
^~~~
cso_cache/cso_cache.c: In function 'delete_velements':
cso_cache/cso_cache.c:122:49: warning: unused parameter 'data' [-Wunused-parameter]
static void delete_velements(void *state, void *data)
^~~~
cso_cache/cso_cache.c: In function 'sanitize_cb':
cso_cache/cso_cache.c:166:52: warning: unused parameter 'user_data' [-Wunused-parameter]
int max_size, void *user_data)
^~~~~~~~~
gallium/aux/cso_context.c: Fix a -Wunused-parameter warning
cso_cache/cso_context.c: In function 'delete_sampler_state':
cso_cache/cso_context.c:163:57: warning: unused parameter 'ctx' [-Wunused-parameter]
static boolean delete_sampler_state(struct cso_context *ctx, void *state)
^~~
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Gert Wollny [Mon, 11 Jun 2018 16:24:39 +0000 (18:24 +0200)]
configure.ac: Add CFLAG -Wno-missing-field-initializers (v5)
This warning is misleading: When a struct is partially initialized without
assigning to the structure members by name, then the remaining fields
will be zeroed out, and this warning will be issued (if enabled). If, on the
other hand, the partial initialization is done by assigning to named members,
the remaining structure elements may hold random data, but the warning is not
issued. Since in Mesa the first approach to initialize structure elements is
used very often, and it is usually assumed that the remaining elements are
zeroed out, heeding this warning would be counter-productive.
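The zero-filling behaviour described above can be sketched as follows. The struct example_state and both helper functions are hypothetical and not Mesa code. Note that a designated initializer also zero-fills the members it does not name; members are only left indeterminate when an automatic struct is declared with no initializer at all and then assigned field by field, which is presumably the case the message has in mind.

```c
/* Hypothetical struct, not a real Mesa type. */
struct example_state {
   int enable;
   int func;
   float color[4];
};

/* Positional partial initialization: only the first member is given.
 * The C standard requires the remaining members to be initialized as if
 * they had static storage duration, i.e. to zero. This is the pattern
 * that -Wmissing-field-initializers reports. */
static struct example_state
make_positional(void)
{
   struct example_state s = { 1 };
   return s;
}

/* Designated partial initialization: the members not named are zeroed
 * as well, and GCC documents that the warning is not issued here. */
static struct example_state
make_designated(void)
{
   struct example_state s = { .func = 2 };
   return s;
}
```

Both helpers return a struct whose unmentioned members are zero; the two forms differ only in whether the warning fires.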
v2: - add -Wno-missing-field-initializers to meson-build
- fix empty line error
(both Eric Engestrom)
v3: * check for -Wmissing-field-initializers warning and then disable it
because gcc and clang always accept -Wno-* (Dylan Baker)
* Also disable this warning for C++
v4: * meson.build add -Wno-missing-field-initializers to
c_args instead of no_override_init_args (Eric Engestrom)
v5: * configure.ac: Correct copy/paste error with CFLAGS/CXXFLAGS
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
Reviewed-by: Emil Velikov <emil.velikov@collabora.com> (v2)
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Samuel Pitoiset [Tue, 19 Jun 2018 13:24:39 +0000 (15:24 +0200)]
radv: remove unnecessary code around CACHE_FLUSH_AND_INV_TS_EVENT
AMDVLK also always uses CACHE_FLUSH_AND_INV_TS_EVENT. The other
workaround is to flush DB metadata after emitting the framebuffer,
but that seems slower.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Bas Nieuwenhuizen [Tue, 19 Jun 2018 21:48:46 +0000 (23:48 +0200)]
radv: Fix flush_bits being used uninitialized.
A case of making things worse while trying to fix something minor ...
Fixes: ef79457004e "radv: Merge the flush bits of CMASK & DCC clear."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Keith Packard [Fri, 9 Feb 2018 15:45:58 +0000 (07:45 -0800)]
radv: Add EXT_acquire_xlib_display to radv driver [v2]
This extension adds the ability to borrow an X RandR output for
temporary use directly by a Vulkan application to the radv driver.
v2:
Simplify addition of VK_USE_PLATFORM_XLIB_XRANDR_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Keith Packard [Fri, 9 Feb 2018 15:45:58 +0000 (07:45 -0800)]
anv: Add EXT_acquire_xlib_display to anv driver [v3]
This extension adds the ability to borrow an X RandR output for
temporary use directly by a Vulkan application to the anv driver.
v2:
Simplify addition of VK_USE_PLATFORM_XLIB_XRANDR_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v3:
Add extension to list in alphabetical order
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Keith Packard [Fri, 9 Feb 2018 15:45:58 +0000 (07:45 -0800)]
vulkan: Add EXT_acquire_xlib_display [v5]
This extension adds the ability to borrow an X RandR output for
temporary use directly by a Vulkan application. For DRM, we use the
Linux resource leasing mechanism.
v2:
Clean up xlib_lease detection
* Use separate temporary '_xlib_lease' variable to hold the
option value to avoid changing the type of a variable.
* Use boolean expressions instead of additional if statements
to compute resulting with_xlib_lease value.
* Simplify addition of VK_USE_PLATFORM_XLIB_XRANDR_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
Move mode list from wsi_display to wsi_display_connector
Fix scope for wsi_display_mode and wsi_display_connector allocs
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
v3:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace
between types and names. Wrap lines to 80 columns.
Explicitly forbid multiple DRM leases. Making the code support
this looks tricky and will require additional thought.
Use xcb_randr_output_t throughout the internals of the
implementation. Convert at the public API
(wsi_get_randr_output_display).
Clean up check for usable active_crtc (possible when only the
desired output is connected to the crtc).
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v4:
Move output resource fetching closer to use in
wsi_display_get_output. This simplifies the error returns in
earlier parts of the code a bit.
Return VK_ERROR_INITIALIZATION_FAILED from
wsi_acquire_xlib_display. Jason says this is the right error
message.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v5:
randr doesn't pass vscan over the wire, so we set vscan to 0
for randr-acquired modes, and test wsi modes for vscan <= 1
when comparing against randr modes.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Keith Packard [Fri, 9 Feb 2018 15:38:32 +0000 (07:38 -0800)]
radv: Add EXT_direct_mode_display to radv driver
Add support for the EXT_direct_mode_display extension. This just
provides the vkReleaseDisplayEXT function.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Keith Packard [Fri, 9 Feb 2018 15:38:32 +0000 (07:38 -0800)]
anv: Add EXT_direct_mode_display to anv driver [v2]
Add support for the EXT_direct_mode_display extension. This just
provides the vkReleaseDisplayEXT function.
v2: Add extension to list in alphabetical order
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Keith Packard [Fri, 9 Feb 2018 15:38:32 +0000 (07:38 -0800)]
vulkan: Add EXT_direct_mode_display [v2]
Add support for the EXT_direct_mode_display extension. This just
provides the vkReleaseDisplayEXT function.
v2:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace
between types and names. Wrap lines to 80 columns.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Keith Packard [Wed, 7 Feb 2018 18:31:44 +0000 (10:31 -0800)]
radv: Add KHR_display extension to radv [v5]
This adds support for the KHR_display extension to the radv Vulkan
driver. The driver now attempts to open the master DRM node when the
KHR_display extension is requested so that the common winsys code can
perform the necessary operations.
v2:
* Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v3:
Adapt to new wsi_device_init API (added display_fd)
v4:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace
between types and names. Wrap lines to 80 columns.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v5:
Add vkCreateDisplayModeKHR. This doesn't actually create
new modes; it only checks whether the requested parameters
match an existing mode and returns that.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Keith Packard [Wed, 7 Feb 2018 18:31:44 +0000 (10:31 -0800)]
anv: Add KHR_display extension to anv [v7]
This adds support for the KHR_display extension to the anv Vulkan
driver. The driver now attempts to open the master DRM node when the
KHR_display extension is requested so that the common winsys code can
perform the necessary operations.
v2: Make sure primary fd is usable
When KHR_display is selected, we try to open the primary node
instead of the render node in case the user wants to use
KHR_display for presentation. However, if we're actually going
to end up using RandR leases, then we don't care if the
resulting fd can't be used for display, but the kernel also
prevents us from using it for drawing when someone else has
master.
v3:
Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v4:
Adapt primary node usage to new wsi_device_init API
v5:
Adopt Jason Ekstrand's coding conventions
Declare variables at first use, eliminate extra whitespace between
types and names. Wrap lines to 80 columns.
Remove spurious MM_PER_PIXEL define
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v6:
Open DRM master before initializing WSI layer.
The DRM master FD is passed to the WSI layer during
initialization, so we need to open the device slightly earlier
in the function.
Close DRM master in device_finish.
Use anv_gem_get_param to detect working master_fd instead of
directly using the ioctl.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
v7:
Add vkCreateDisplayModeKHR. This doesn't actually create
new modes; it only checks whether the requested parameters
match an existing mode and returns that.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Keith Packard [Wed, 7 Feb 2018 18:31:44 +0000 (10:31 -0800)]
vulkan: Add KHR_display extension using DRM [v10]
This adds support for the KHR_display extension to the Vulkan
WSI layer. Driver support will be added separately.
v2:
* fix double ;; in wsi_common_display.c
* Move mode list from wsi_display to wsi_display_connector
* Fix scope for wsi_display_mode and wsi_display_connector
allocs
* Switch all allocations to vk_zalloc instead of vk_alloc.
* Fix DRM failure in
wsi_display_get_physical_device_display_properties
When DRM fails, or when we don't have a master fd
(presumably due to application errors), just return 0
properties from this function, which is at least a valid
response.
* Use vk_outarray for all property queries
This is a bit less error-prone than open-coding the same
stuff.
* Remove VK_COMPOSITE_ALPHA_INHERIT_BIT_KHR from surface caps
Until we have multi-plane support, we shouldn't pretend to
have any multi-plane semantics, even if undefined.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
* Simplify addition of VK_USE_PLATFORM_DISPLAY_KHR to
vulkan_wsi_args
Suggested-by: Eric Engestrom <eric.engestrom@imgtec.com>
v3:
Add separate 'display_fd' and 'render_fd' arguments to
wsi_device_init API. This allows drivers to use different FDs
for the different aspects of the device.
Use largest mode as display size when no preferred mode.
If the display doesn't provide a preferred mode, we'll assume
that the largest supported mode is the "physical size" of the
device and report that.
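The fallback described in this v3 note can be sketched as below. The type example_mode and the function pick_display_size_mode are hypothetical, and measuring "largest" by pixel area is an assumption; the message does not say how the WSI code ranks modes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode record; not the wsi_display_mode layout. */
struct example_mode {
   uint32_t width;
   uint32_t height;
   bool preferred;
};

/* Return the first preferred mode if there is one, otherwise the mode
 * with the largest pixel area (assumption), or NULL for an empty list. */
static const struct example_mode *
pick_display_size_mode(const struct example_mode *modes, size_t count)
{
   const struct example_mode *largest = NULL;

   for (size_t i = 0; i < count; i++) {
      if (modes[i].preferred)
         return &modes[i];
      if (!largest ||
          (uint64_t)modes[i].width * modes[i].height >
          (uint64_t)largest->width * largest->height)
         largest = &modes[i];
   }
   return largest;
}
```

The 64-bit multiplication only guards the area comparison against overflow for very large mode dimensions.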
v4:
Make wsi_image_state enumeration values uppercase.
Follow more common mesa conventions.
Remove 'render_fd' from wsi_device_init API. The
wsi_common_display code doesn't use this fd at all, so stop
passing it in. This avoids any potential confusion over which
fd to use when creating display-relative object handles.
Remove call to wsi_create_prime_image which would never have
been reached as the necessary condition (use_prime_blit) is
never set.
whitespace cleanups in wsi_common_display.c
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Add depth/bpp info to available surface formats. Instead of
hard-coding depth 24 bpp 32 in the drmModeAddFB call, use the
requested format to find suitable values.
Destroy kernel buffers and FBs when swapchain is destroyed. We
were leaking both of these kernel objects across swapchain
destruction.
Note that wsi_display_wait_for_event waits for anything to
happen. wsi_display_wait_for_event is simply a yield so that
the caller can then check to see if the desired state change
has occurred.
Record swapchain failures in chain for later return. If some
asynchronous swapchain activity fails, we need to tell the
application eventually. Record the failure in the swapchain
and report it at the next acquire_next_image or queue_present
call.
Fix error returns from wsi_display_setup_connector. If a
malloc failed, then the result should be
VK_ERROR_OUT_OF_HOST_MEMORY. Otherwise, the associated ioctl
failed and we're either VT switched away, or our lease has
been revoked, in which case we should return
VK_ERROR_OUT_OF_DATE_KHR.
Make sure both sides of if/else brace use matches
Note that we assume drmModeSetCrtc is synchronous. Add a
comment explaining why we can idle any previous displayed
image as soon as the mode set returns.
Note that EACCES from drmModePageFlip means VT inactive. When
VT-switched away, drmModePageFlip returns EACCES. Poll once a
second, waiting until we get some other return value back.
Clean up after alloc failure in
wsi_display_surface_create_swapchain. Destroy any created
images, free the swapchain.
Remove physical_device from wsi_display_init_wsi. We never
need this value, so remove it from the API and from the
internal wsi_display structure.
Use drmModeAddFB2 in wsi_display_image_init. This takes a drm
format instead of depth/bpp, which provides more control over
the format of the data.
v5:
Set the 'currentStackIndex' member of the
VkDisplayPlanePropertiesKHR record to zero, instead of
indexing across all displays. This value is the stack depth of
the plane within an individual display, and as the current
code supports only a single plane per display, should be set
to zero for all elements
Discovered-by: David Mao <David.Mao@amd.com>
v6:
Remove 'platform_display' bits from the build and use the
existing 'platform_drm' instead.
v7:
Ensure VK_ICD_WSI_PLATFORM_MAX is large enough by
setting to VK_ICD_WSI_PLATFORM_DISPLAY + 1
v8:
Simplify wsi_device_init failure from wsi_display_init_wsi
by using the same pattern as the other wsi layers.
Adopt Jason Ekstrand's white space and variable declaration
suggestions. Declare variables at first use, eliminate extra
whitespace between types and names, add list iterator helpers,
switch to lower-case list_ macros.
Respond to Jason's April 8 review:
* Create a function to convert relative to absolute timeouts
to catch overflow issues in one place
* use VK_NULL_HANDLE to clear prop->currentDisplay
* Get rid of available_present_modes array.
* return OUT_OF_DATE_KHR when display_queue_next called after
display has been released.
* Make errors from mode setting fatal in display_queue_next
* Remove duplicate pthread_mutex_init call
* Add wsi_init_pthread_cond_monotonic helper function to
isolate pthread error handling from wsi_display_init_wsi
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
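The first item of the April 8 review list, converting relative to absolute timeouts in one place so that overflow is caught, might look roughly like this. The function name rel_to_abs_timeout is hypothetical, and passing the current time in as a parameter (rather than reading a clock) is a simplification to keep the sketch self-contained; saturating at UINT64_MAX is an assumption about the desired overflow behaviour.

```c
#include <stdint.h>

/* Hypothetical helper: add a relative timeout to the current time, both
 * in nanoseconds, saturating at UINT64_MAX instead of wrapping around. */
static uint64_t
rel_to_abs_timeout(uint64_t now_ns, uint64_t rel_ns)
{
   if (rel_ns > UINT64_MAX - now_ns)
      return UINT64_MAX;   /* would overflow: treat as "wait forever" */
   return now_ns + rel_ns;
}
```

Keeping the check in a single helper means callers passing UINT64_MAX as "no timeout" do not each need their own overflow handling.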
v9:
Fix vscan handling by using MAX2(vscan, 1) everywhere. Vscan
can be zero anywhere, which is treated the same as 1.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
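The v9 rule can be sketched in a few lines. The macro matches the MAX2 definition quoted from util/u_math.h elsewhere in this log; the function name effective_vscan is hypothetical.

```c
#include <stdint.h>

#define MAX2( A, B )   ( (A)>(B) ? (A) : (B) )

/* Hypothetical helper: a vscan of 0 is treated the same as 1, so clamp
 * the value before using it, e.g. as a factor or divisor. */
static uint32_t
effective_vscan(uint32_t vscan)
{
   return MAX2(vscan, 1u);
}
```

Both 0 and 1 map to 1, and larger values pass through unchanged.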
v10:
Respond to Vulkan CTS failures.
1. Initialize planeReorderPossible in display_properties code
2. Only report connected displays in
get_display_plane_supported_displays
3. Return VK_ERROR_OUT_OF_HOST_MEMORY when pthread cond
initialization fails.
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
4. Add vkCreateDisplayModeKHR. This doesn't actually create
new modes; it only checks whether the requested parameters
match an existing mode and returns that.
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Keith Packard <keithp@keithp.com>
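The lookup behaviour described for vkCreateDisplayModeKHR in v10 can be sketched as a plain search over the known modes. Everything here is hypothetical: the struct example_display_mode, its fields, and find_existing_mode are illustration only, and the real WSI code may compare more parameters and report a Vulkan error code rather than NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mode record with the parameters a caller might request. */
struct example_display_mode {
   uint32_t width;
   uint32_t height;
   uint32_t refresh_mhz;   /* refresh rate in millihertz (assumption) */
};

/* Return the existing mode that matches the request exactly, or NULL if
 * there is none. No new mode is ever created. */
static const struct example_display_mode *
find_existing_mode(const struct example_display_mode *modes, size_t count,
                   uint32_t width, uint32_t height, uint32_t refresh_mhz)
{
   for (size_t i = 0; i < count; i++) {
      if (modes[i].width == width &&
          modes[i].height == height &&
          modes[i].refresh_mhz == refresh_mhz)
         return &modes[i];
   }
   return NULL;
}
```

A caller would map the NULL case to whatever error the API requires; which error that is is not stated in this message.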
Bas Nieuwenhuizen [Tue, 19 Jun 2018 08:05:20 +0000 (10:05 +0200)]
radv: Merge the flush bits of CMASK & DCC clear.
Probably won't be much different in practice, but still wrong.
Fixes Coverity issue
1435002.
Not CC'ing to stable since this is only hit if you enable MSAA
DCC via RADV_DEBUG.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Bas Nieuwenhuizen [Tue, 19 Jun 2018 08:03:20 +0000 (10:03 +0200)]
radv: Don't check for pipeline being set in draw.
Draws without pipeline are definitely not allowed.
Fixes Coverity issue
1434216.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Marek Olšák [Tue, 19 Jun 2018 01:34:57 +0000 (21:34 -0400)]
radeonsi: rename r600_texture -> si_texture, rxxx -> xxx or sxxx
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Marek Olšák [Tue, 19 Jun 2018 01:07:10 +0000 (21:07 -0400)]
amd,radeonsi: rename radeon_winsys_cs -> radeon_cmdbuf
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Rob Clark [Mon, 18 Jun 2018 22:22:29 +0000 (18:22 -0400)]
freedreno/a5xx: move emit_marker5() into a5xx backend
The scratch registers move again in a6xx.. so for post-a4xx let's just
move this into the backend, and move the one place it used to be needed
in core into fd5_emit_ib(). For a6xx we will do similar, calling
emit_marker6() from fd6_emit_ib().
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Thu, 14 Jun 2018 13:34:11 +0000 (09:34 -0400)]
freedreno/a5xx: fix crash in dEQP-GLES31.stress.vertex_attribute_binding.buffer_bounds.bind_vertex_buffer_offset_near_wrap_10
This is kind of a hack, but really the only problem is the
debug_assert() in OUT_RELOC(). But the debug_assert() is
useful to catch real issues. So just add some #ifdef DEBUG
code to filter things out before we hit the assert.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 13 Jun 2018 16:46:17 +0000 (12:46 -0400)]
freedreno/a5xx: don't crash if compute shader compile fails
It is impolite, and a bit annoying with dEQP (all tests running in
single process).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 13 Jun 2018 14:50:37 +0000 (10:50 -0400)]
freedreno/ir3: fix missing recursion into block condition
Fixes a problem seen with dEQP-GLES31.functional.ssbo.layout.single_basic_array.shared.row_major_mat4
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 13 Jun 2018 13:50:34 +0000 (09:50 -0400)]
freedreno/a5xx: better FOUR_QUAD/TWO_QUAD decision for compute
If we aren't going to get full occupancy, then use TWO_QUAD.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 12 Jun 2018 13:56:12 +0000 (09:56 -0400)]
freedreno/a5xx: bordercolor fixes
Need a bit of hand-holding for stencil bordercolor, and add border color
values for sRGB.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Mon, 11 Jun 2018 18:05:19 +0000 (14:05 -0400)]
freedreno: remove per-stateobj dirty_mask's
These never got updated in fd_context_all_dirty() so actually trying to
rely on them (in the case of fd5_emit_images()) ends up in some cases
where state is not emitted but should be. Best to just rip this out.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sun, 10 Jun 2018 15:35:56 +0000 (11:35 -0400)]
freedreno/a5xx: remove one image stateblock
I think this ends up just setting uniform/const memory. But we upload
x/y/z stride differently. At best this is unneeded, at worst it could
possibly clobber other uniform/const memory.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sun, 10 Jun 2018 15:34:37 +0000 (11:34 -0400)]
freedreno/a5xx: cubemap image fixes
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Thu, 7 Jun 2018 19:00:32 +0000 (15:00 -0400)]
freedreno/ir3: handle image buffer
Similar to txf case, we need to insert a 2nd coordinate (zero).
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sun, 3 Jun 2018 16:56:09 +0000 (12:56 -0400)]
freedreno/ir3: handle arrays of images
Unlike textures, this doesn't get lowered for us. (Would be nice
if they were.. at least until we are ready to deal w/ indirect
indexing..)
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Sat, 2 Jun 2018 00:20:43 +0000 (20:20 -0400)]
freedreno/ir3: images can be arrays too
Seems I previously totally forgot about 2d-arrays, etc..
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Tue, 5 Jun 2018 17:42:21 +0000 (13:42 -0400)]
freedreno/ir3: use move_load_const pass
Signed-off-by: Rob Clark <robdclark@gmail.com>
Rob Clark [Wed, 6 Jun 2018 14:11:45 +0000 (10:11 -0400)]
nir: add pass to move load_const
Run this pass late (after opt loop) to move load_const instructions back
into the basic blocks which use the result, in cases where a load_const
is only consumed in a single block.
This helps reduce register usage in cases where the backend driver
cannot lower the load_const to a uniform.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rob Clark [Mon, 11 Jun 2018 18:49:12 +0000 (14:49 -0400)]
mesa/st/nir: fix driver_location for arrays of image/sampler
We can have arrays of images or samplers. But I forgot to handle that
case long ago. Surprised no one complained yet.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Rob Clark [Fri, 15 Jun 2018 20:12:23 +0000 (16:12 -0400)]
nir: add comment for loop_unroll pass
Save the next person from digging through the code to figure out what
the indirect_mask parameter actually does.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Rob Clark [Fri, 15 Jun 2018 20:11:48 +0000 (16:11 -0400)]
glsl: fix random typo
Just something I stumbled across.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Marek Olšák [Sat, 9 Jun 2018 02:29:55 +0000 (22:29 -0400)]
radeonsi: ignore PIPE_RESOURCE_FLAG_MAP_COHERENT
We treat coherent and non-coherent buffers the same.
And move external_usage for better packing.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Marek Olšák [Sat, 9 Jun 2018 01:48:23 +0000 (21:48 -0400)]
radeonsi: always put persistent buffers into GTT on radeon
This improves performance for certain games.
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Marek Olšák [Sat, 9 Jun 2018 01:34:55 +0000 (21:34 -0400)]
radeonsi: fix si_get_num_queries for radeon
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Marek Olšák [Fri, 8 Jun 2018 23:21:09 +0000 (19:21 -0400)]
radeonsi: don't expose performance counters for non-existent blocks
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Marek Olšák [Fri, 8 Jun 2018 23:09:02 +0000 (19:09 -0400)]
ac/gpu_info: add radeon_info::num_tcc_blocks
The values for the radeon winsys were copied from the kernel driver.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Marek Olšák [Tue, 5 Jun 2018 06:38:06 +0000 (02:38 -0400)]
radeonsi: set a better NUM_PATCHES hard limit
AMDVLK uses 64 (distributed) and 16 (non-distributed).
radeonsi will use 63 and 16.
* This might improve tessellation performance on Hawaii, Bonaire, Tahiti,
Pitcairn. (they will use 16)
* I'm not sure if this matters for 1 SE configs.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Marek Olšák [Tue, 5 Jun 2018 06:09:52 +0000 (02:09 -0400)]
radeonsi: make sure LS-HS vector lanes are reasonably occupied
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Marek Olšák [Tue, 5 Jun 2018 05:20:23 +0000 (01:20 -0400)]
radeonsi: properly compute an LS-HS thread group size limit
"64 / max * 4" is less than "64 * 4 / max".
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
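The difference between the two expressions comes from integer division truncating before or after the multiplication. A small sketch, with hypothetical function names and with max standing for whatever per-patch quantity the driver divides by (the message does not say):

```c
#include <assert.h>

/* Divides first: the fractional part of 64 / max is dropped before the
 * multiplication, so the result can come out too small. */
static unsigned
limit_divide_first(unsigned max)
{
   assert(max > 0);
   return 64 / max * 4;
}

/* Multiplies first: truncation happens only once, at the very end. */
static unsigned
limit_multiply_first(unsigned max)
{
   assert(max > 0);
   return 64 * 4 / max;
}
```

For max = 3 the first form gives 84 and the second 85; for max = 64 both give 4, so strictly speaking the first form is less than or equal to the second.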
Eric Anholt [Mon, 18 Jun 2018 22:46:53 +0000 (15:46 -0700)]
v3d: Fix blitting from a linear winsys BO.
This is the case for the simulator environment, and broke many blitter
tests by trying to texture from linear while the HW can only actually do
UIF/UBLINEAR/LT. Just make a temporary and copy into it with the CPU,
then blit from that.
This is the kind of path that should use the TFU, but I haven't exposed
that hardware yet.
Fixes dEQP-GLES3.functional.fbo.blit.default_framebuffer.*
Eric Anholt [Mon, 18 Jun 2018 22:00:04 +0000 (15:00 -0700)]
v3d: Add missing always_flush debug flag.
The #define existed and was checked in the driver.
Tomeu Vizoso [Mon, 18 Jun 2018 12:50:51 +0000 (14:50 +0200)]
virgl: Remove debugging left-overs
Some fprintfs were probably left unintentionally a few years ago and are
a bit of a nuisance.
Fixes: 2d3301e4d513 ("virgl: fix reference counting of prime handles")
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Tue, 19 Jun 2018 07:52:00 +0000 (17:52 +1000)]
glsl: fix desktop glsl linking regression
The prog->Shaders[i]->IsES check was accidentally removed causing
ES linking rules to be applied to desktop GLSL.
Fixes: 725b1a406dbe ("mesa/util: add allow_glsl_relaxed_es driconfig override")
Timothy Arceri [Thu, 14 Jun 2018 01:00:25 +0000 (11:00 +1000)]
util: add allow_glsl_relaxed_es to drirc for Google Earth VR
Reviewed-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Thu, 14 Jun 2018 01:00:24 +0000 (11:00 +1000)]
mesa/util: add allow_glsl_relaxed_es driconfig override
This relaxes a number of ES shader restrictions allowing shaders
to follow more desktop GLSL like rules.
This initial implementation relaxes the following:
- allows linking ES shaders with desktop shaders
- allows mismatching precision qualifiers
- always enables standard derivative builtins
These relaxations allow Google Earth VR shaders to compile.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Thu, 14 Jun 2018 01:00:23 +0000 (11:00 +1000)]
util: add allow_glsl_builtin_const_expression to drirc for Google Earth VR
Reviewed-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Thu, 14 Jun 2018 01:00:22 +0000 (11:00 +1000)]
mesa/util: add allow_glsl_builtin_const_expression driconf override
Google Earth VR shaders use builtins in constant expressions with
GLSL 1.10. That feature wasn't allowed until GLSL 1.20.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Timothy Arceri [Thu, 14 Jun 2018 01:00:21 +0000 (11:00 +1000)]
util: manually extract the program name from program_invocation_name
Glibc has the same code to get program_invocation_short_name. However
for some reason the short name gets mangled for some wine apps.
For example with Google Earth VR I get:
program_invocation_name:
"/home/tarceri/.local/share/Steam/steamapps/common/EarthVR/Earth.exe"
program_invocation_short_name:
"e"
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
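A minimal sketch of the manual extraction described in the commit above, assuming the short name is simply whatever follows the last '/' in the invocation path. The helper name program_name_from_path is hypothetical and is not the function added by the commit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the part of `path` that follows the last
 * '/'. If the path contains no '/', the whole string is returned. */
static const char *
program_name_from_path(const char *path)
{
   assert(path != NULL);

   const char *slash = strrchr(path, '/');
   return slash ? slash + 1 : path;
}
```

Called on the program_invocation_name quoted above, this yields "Earth.exe" rather than the mangled "e".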
Bas Nieuwenhuizen [Mon, 18 Jun 2018 14:29:16 +0000 (16:29 +0200)]
ac/surface: Set compressZ for stencil-only surfaces.
We HTILE compress stencil-only surfaces too.
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Jason Ekstrand [Sun, 17 Jun 2018 23:28:02 +0000 (16:28 -0700)]
anv: Use a single global API patch version
The Vulkan API has only one patch version shared among all of the
major.minor versions. We should also advertise the same patch version
regardless of major.minor.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106941
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
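To illustrate the point of the commit above, here is a small sketch that packs versions with the same bit layout as Vulkan's VK_MAKE_VERSION (major in bits 31..22, minor in bits 21..12, patch in bits 11..0). The macro names and the patch value 76 are placeholders chosen for the example, not values taken from anv.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as VK_MAKE_VERSION; redefined here so the sketch
 * does not depend on the Vulkan headers. */
#define EXAMPLE_MAKE_VERSION(major, minor, patch) \
   ((((uint32_t)(major)) << 22) | (((uint32_t)(minor)) << 12) | \
    ((uint32_t)(patch)))

/* Hypothetical single patch number shared by every major.minor. */
#define EXAMPLE_API_PATCH 76

static uint32_t
example_api_version(unsigned major, unsigned minor)
{
   assert(major < 1024 && minor < 1024);
   return EXAMPLE_MAKE_VERSION(major, minor, EXAMPLE_API_PATCH);
}
```

With one shared constant, example_api_version(1, 0) and example_api_version(1, 1) differ only in the minor field.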
Timothy Arceri [Mon, 18 Jun 2018 05:23:20 +0000 (15:23 +1000)]
radeonsi: enable OpenGL 3.3 compat profile
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Timothy Arceri [Mon, 18 Jun 2018 02:36:54 +0000 (12:36 +1000)]
mesa: add ff fragment shader support for geom and tess shaders
This is required for compatibility profile support.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Eric Anholt [Sat, 16 Jun 2018 00:08:29 +0000 (17:08 -0700)]
v3d: Set the SO offsets correctly if we have to re-emit.
This should fix TF across a glFlush() or TF pause/restart. Fixes
dEQP-GLES3.functional.transform_feedback.array.interleaved.lines.highp_float
and many, many others.
Marek Olšák [Fri, 8 Jun 2018 23:49:22 +0000 (19:49 -0400)]
gallium/hud: = should rename the last added data source
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Rafael Antognolli [Fri, 15 Jun 2018 18:44:28 +0000 (11:44 -0700)]
anv: Disable constant buffer 0 being relative.
If we are on gen8+ and have context isolation support, just make that
constant buffer address be absolute, so we can use it for push UBOs too.
v2: Do not duplicate constant_buffer_0_is_relative flag (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Rafael Antognolli [Fri, 15 Jun 2018 16:31:25 +0000 (09:31 -0700)]
anv/device: Check for kernel support of context isolation.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rafael Antognolli [Fri, 15 Jun 2018 18:43:45 +0000 (11:43 -0700)]
intel/genxml: Add bitmasks for CS_DEBUG_MODE2/INSTPM.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Alok Hota [Tue, 5 Jun 2018 18:59:53 +0000 (13:59 -0500)]
swr/rast: Clang-Format most rasterizer source code
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Eric Engestrom [Fri, 15 Jun 2018 16:49:08 +0000 (17:49 +0100)]
radv: fix reported number of available VGPRs
It's a bit late to round up after an integer division.
Fixes: de889794134e6245e08a2 "radv: Implement VK_AMD_shader_info"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
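The remark about rounding up after an integer division can be shown with a generic sketch. The function names and operands below are illustrative only and are not the expressions used in radv.

```c
#include <assert.h>

/* Integer division truncates first, so rounding the quotient up
 * afterwards (for example with ceil()) has nothing left to round. */
static unsigned
divide_then_round(unsigned total, unsigned parts)
{
   assert(parts != 0);
   unsigned truncated = total / parts;
   return truncated; /* ceil(truncated) == truncated */
}

/* Round up as part of the division instead. Assumes that
 * total + parts - 1 does not overflow. */
static unsigned
div_round_up(unsigned total, unsigned parts)
{
   assert(parts != 0);
   return (total + parts - 1) / parts;
}
```

For 256 split into 10 parts the first form gives 25 and the second gives 26; when the division is exact both agree.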
Eric Engestrom [Mon, 18 Jun 2018 10:39:05 +0000 (11:39 +0100)]
mesa: add missing return in error path
Fixes: 67f40dadaa6666dacd90 "mesa: add support for ARB_sample_locations"
Cc: Rhys Perry <pendingchaos02@gmail.com>
Cc: Brian Paul <brianp@vmware.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Bas Nieuwenhuizen [Sun, 17 Jun 2018 01:37:49 +0000 (03:37 +0200)]
radv: Use less conservative approximation for context rolls.
Drops the number of times we set the scissor by 4x for F1 2017,
which results in a consistent performance improvement of about 4%.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Eric Engestrom [Fri, 15 Jun 2018 16:58:17 +0000 (17:58 +0100)]
radv: fix bitwise check
Fixes: 922cd38172b8a2bc286bd "radv: implement out-of-order rasterization when it's safe on VI+"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Eric Engestrom [Fri, 15 Jun 2018 10:31:52 +0000 (11:31 +0100)]
meson: fix i965/anv/isl genX static lib names
Shouldn't make any functional difference, just that `liblibanv_gen90.a`
will now be called `libanv_gen90.a`.
Fixes: 3218056e0eb375eeda470 "meson: Build i965 and dri stack"
Fixes: d1992255bb29054fa5176 "meson: Add build Intel "anv" vulkan driver"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Timothy Arceri [Sun, 17 Jun 2018 00:00:29 +0000 (10:00 +1000)]
mesa: Unconditionally enable floating-point textures
ARB_texture_float references US Patent #6,650,327 [1] which has a filing date
of June 16 1998.
According to [2], patents filed after 1995 expire 20 years from the filing
date, giving an expiration of June 17 2018.
[1] https://www.google.com/patents/US6650327
[2] https://en.wikipedia.org/wiki/Term_of_patent_in_the_United_States
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:46:34 +0000 (11:46 +0200)]
intel/fs: shuffle_64bit_data_for_32bit_write is not used anymore
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:46:17 +0000 (11:46 +0200)]
intel/fs: Use new shuffle_32bit_write for all 64-bit storage writes
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:46:14 +0000 (11:46 +0200)]
intel/fs: shuffle_32bit_load_result_to_64bit_data is not used anymore
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:46:11 +0000 (11:46 +0200)]
intel/fs: Use shuffle_from_32bit_read for 64-bit FS load_input
The previous use of shuffle_32bit_load_result_to_64bit_data had a
source/destination overlap for 64-bit. A temporary destination is now
used in the 64-bit case so that shuffle_from_32bit_read, which doesn't
handle src/dst overlaps, can be used.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:46:07 +0000 (11:46 +0200)]
intel/fs: shuffle_from_32bit_read at load_per_vertex_input at TCS/TES
Previously, the shuffle function had a source/destination overlap that
must be avoided in order to use shuffle_from_32bit_read. We can use the
destination of the removed MOVs as the shuffle destination.
This change also avoids the internal MOVs done by the previous shuffle
to deal with possible overlaps.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:46:04 +0000 (11:46 +0200)]
intel/fs: Use shuffle_from_32bit_read at VS load_input
shuffle_from_32bit_read handles 32-bit reads to a 32-bit destination
in the same way as the previous loop, so now we just call the new
function for all bit sizes, which also simplifies the 64-bit load_input.
v2: Add comment about future 16-bit support (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:46:01 +0000 (11:46 +0200)]
intel/fs: Use shuffle_from_32bit_read for 64-bit gs_input_load
This implementation avoids two unneeded MOVs for each 64-bit
component. One was done in the old shuffle to avoid cases of
src/dst overlap, but there is no overlap here. The other, removed
here, was already being done in the shuffle.
Copy propagation wasn't able to remove them because shuffle
destination values are defined with partial writes because they
have stride == 2.
v2: Reword commit log summary (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:45:57 +0000 (11:45 +0200)]
intel/fs: shuffle_from_32bit_read for 64-bit do_untyped_vector_read
do_untyped_vector_read is used at load_ssbo and load_shared.
The previous MOVs are removed because shuffle_from_32bit_read
can handle storing the shuffle results in the expected destination
just using the proper offset.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:45:54 +0000 (11:45 +0200)]
intel/fs: Remove old 16-bit shuffle/unshuffle functions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:45:50 +0000 (11:45 +0200)]
intel/fs: Use shuffle_for_32bit_write for 16-bits store_ssbo
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:45:47 +0000 (11:45 +0200)]
intel/fs: Use shuffle_from_32bit_read to read 16-bit SSBO
Using shuffle_from_32bit_read instead of the 16-bit shuffle functions
avoids the need to retype. At the same time, the new functions are
ready for 8-bit SSBO reads.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:45:42 +0000 (11:45 +0200)]
intel/fs: Use shuffle_from_32bit_read at VARYING_PULL_CONSTANT_LOAD
shuffle_from_32bit_read can manage the shuffle/unshuffle needed
for different 8/16/32/64 bit-sizes at VARYING PULL CONSTANT LOAD.
To get the specific component the first_component parameter is used.
In the case of the previous 16-bit shuffle, the shuffle operation was
generating unneeded MOVs whose results were never used. This
behaviour went unnoticed on SIMD16 because the dead_code_eliminate
pass removed the generated instructions, but for SIMD8 they couldn't be
removed because they were partial writes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:45:22 +0000 (11:45 +0200)]
intel/fs: New shuffle_for_32bit_write and shuffle_from_32bit_read
These new shuffle functions deal with the shuffle/unshuffle operations
needed for read/write operations using 32-bit components when the
read/written components have a different bit-size (8, 16, 64-bits).
Shuffle from 32-bit to 32-bit becomes a simple MOV.
shuffle_src_to_dst takes care of doing a shuffle when source type is
smaller than destination type and an unshuffle when source type is
bigger than destination. So these new read/write functions just need
to call shuffle_src_to_dst assuming that writes use a 32-bit
destination and reads use a 32-bit source.
As shuffle_for_32bit_write/from_32bit_read take components in units of
the source/destination types and shuffle_src_to_dst takes units of the
smallest type component, we adjust the components and first_component
parameters.
To use these new functions, there must be no source/destination
overlap in the case of shuffle_from_32bit_read. That never happens with
shuffle_for_32bit_write, as it allocates a new destination register,
as shuffle_64bit_data_for_32bit_write did.
v2: Reword commit log and add comments to explain why first_component
and components parameters are adjusted. (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Maria Casanova Crespo [Sat, 9 Jun 2018 09:45:01 +0000 (11:45 +0200)]
intel/fs: general 8/16/32/64-bit shuffle_src_to_dst function
This new function takes care of shuffling/unshuffling components of a
particular bit-size into components with a different bit-size.
If source type size is smaller than destination type size the operation
needed is a component shuffle. The opposite case would be an unshuffle.
Component units are measured in terms of the smaller type between
source and destination, as we are un/shuffling the smaller components
from/into a bigger one.
The operation allows skipping the first first_component components of
the source.
Shuffle MOVs are retyped using integer types, avoiding problems with
denorms and float types if the source and destination bit sizes differ.
This simplifies uses of the shuffle functions that were dealing with
these retypes individually.
There is now a new restriction: source and destination can no longer
overlap when calling this shuffle function. The following patches that
migrate to this new function will individually take care of avoiding
source and destination overlaps.
v2: (Jason Ekstrand)
- Rewrite overlap asserts.
- Manage the type_sz(src.type) == type_sz(dst.type) case using MOVs
from source to dest. This works for 64-bit to 64-bit
operations on Gen7, which doesn't support Q registers.
- Explain that component units are based on the smallest type.
v3: - Fix unshuffle overlap assert (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Jose Fonseca [Sat, 16 Jun 2018 09:02:15 +0000 (10:02 +0100)]
appveyor: Consume LLVM 5.0.1.
https://ci.appveyor.com/project/jrfonseca/mesa/build/47
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Bas Nieuwenhuizen [Thu, 14 Jun 2018 22:01:43 +0000 (00:01 +0200)]
ac: Clear meminfo to avoid valgrind warning.
Somehow valgrind misses that the value is initialized by the ioctl.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Samuel Pitoiset [Fri, 15 Jun 2018 15:50:35 +0000 (17:50 +0200)]
radv: fix emitting the TCS regs on GFX9
The primitive ID is NULL and this generates an invalid
select instruction which crashes because one operand is NULL.
This fixes crashes in The Long Journey Home, Quantum Break
and Just Cause 3 with DXVK.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106756
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Ian Romanick [Wed, 6 Jun 2018 02:00:42 +0000 (19:00 -0700)]
nir: Document a couple instances of parent_instr
nir_ssa_def::parent_instr and nir_src::parent_instr have the same name,
but they mean really different things. I choose to save the next person
the hour+ that I just spent figuring that out. Even now that I know, I
doubt I'd notice in code review that someone typed foo->parent_instr
when they actually meant foo->ssa->parent_instr.
v2: Minor wording tweak in nir_ssa_def::parent_instr. Suggested by
Jason.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
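The distinction documented by the commit above can be sketched with toy structures. These are simplified stand-ins, not the real NIR definitions: the assumption is that the parent_instr of an SSA definition is the instruction that produces the value, while the parent_instr of a source is the instruction that consumes it.

```c
#include <assert.h>
#include <stddef.h>

struct toy_instr {
   const char *name;
};

struct toy_ssa_def {
   struct toy_instr *parent_instr; /* instruction that produces the value */
};

struct toy_src {
   struct toy_instr *parent_instr; /* instruction that consumes the source */
   struct toy_ssa_def *ssa;
};

/* The instruction that uses `src`. */
static struct toy_instr *
toy_src_user(const struct toy_src *src)
{
   assert(src != NULL);
   return src->parent_instr;
}

/* The instruction that defined the value `src` reads. */
static struct toy_instr *
toy_src_producer(const struct toy_src *src)
{
   assert(src != NULL && src->ssa != NULL);
   return src->ssa->parent_instr;
}
```

For a multiply that reads the result of an add, toy_src_user returns the multiply and toy_src_producer returns the add, which is exactly the mix-up the commit warns about.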
Ian Romanick [Wed, 13 Jun 2018 17:36:42 +0000 (10:36 -0700)]
i965/fs: Propagate conditional modifiers from not instructions
Skylake
total instructions in shared programs: 14399081 -> 14399010 (<.01%)
instructions in affected programs: 26961 -> 26890 (-0.26%)
helped: 57
HURT: 0
helped stats (abs) min: 1 max: 6 x̄: 1.25 x̃: 1
helped stats (rel) min: 0.16% max: 0.80% x̄: 0.30% x̃: 0.18%
95% mean confidence interval for instructions value: -1.50 -0.99
95% mean confidence interval for instructions %-change: -0.35% -0.25%
Instructions are helped.
total cycles in shared programs: 532978307 -> 532976050 (<.01%)
cycles in affected programs: 468629 -> 466372 (-0.48%)
helped: 33
HURT: 20
helped stats (abs) min: 3 max: 360 x̄: 116.52 x̃: 98
helped stats (rel) min: 0.06% max: 3.63% x̄: 1.66% x̃: 1.27%
HURT stats (abs) min: 2 max: 172 x̄: 79.40 x̃: 43
HURT stats (rel) min: 0.04% max: 3.02% x̄: 1.48% x̃: 0.44%
95% mean confidence interval for cycles value: -81.29 -3.88
95% mean confidence interval for cycles %-change: -1.07% 0.12%
Inconclusive result (%-change mean confidence interval includes 0).
All Gen6+ platforms, except Ivy Bridge, had similar results. (Haswell shown)
total instructions in shared programs: 12973897 -> 12973838 (<.01%)
instructions in affected programs: 25970 -> 25911 (-0.23%)
helped: 55
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.07 x̃: 1
helped stats (rel) min: 0.16% max: 0.62% x̄: 0.28% x̃: 0.18%
95% mean confidence interval for instructions value: -1.14 -1.00
95% mean confidence interval for instructions %-change: -0.32% -0.24%
Instructions are helped.
total cycles in shared programs: 410355841 -> 410352067 (<.01%)
cycles in affected programs: 578454 -> 574680 (-0.65%)
helped: 47
HURT: 5
helped stats (abs) min: 3 max: 360 x̄: 85.74 x̃: 18
helped stats (rel) min: 0.05% max: 3.68% x̄: 1.18% x̃: 0.38%
HURT stats (abs) min: 2 max: 242 x̄: 51.20 x̃: 4
HURT stats (rel) min: <.01% max: 0.45% x̄: 0.15% x̃: 0.11%
95% mean confidence interval for cycles value: -104.89 -40.27
95% mean confidence interval for cycles %-change: -1.45% -0.66%
Cycles are helped.
Ivy Bridge
total instructions in shared programs: 11679351 -> 11679301 (<.01%)
instructions in affected programs: 28208 -> 28158 (-0.18%)
helped: 50
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.12% max: 0.54% x̄: 0.23% x̃: 0.16%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -0.27% -0.19%
Instructions are helped.
total cycles in shared programs: 257445362 -> 257444662 (<.01%)
cycles in affected programs: 419338 -> 418638 (-0.17%)
helped: 40
HURT: 3
helped stats (abs) min: 1 max: 170 x̄: 65.05 x̃: 24
helped stats (rel) min: 0.02% max: 3.51% x̄: 1.26% x̃: 0.41%
HURT stats (abs) min: 2 max: 1588 x̄: 634.00 x̃: 312
HURT stats (rel) min: 0.05% max: 2.97% x̄: 1.21% x̃: 0.62%
95% mean confidence interval for cycles value: -97.96 65.41
95% mean confidence interval for cycles %-change: -1.56% -0.62%
Inconclusive result (value mean confidence interval includes 0).
No changes on Iron Lake or GM45.
v2: Move 'if (cond != BRW_CONDITIONAL_Z && cond != BRW_CONDITIONAL_NZ)'
check outside the loop. Suggested by Iago.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Ian Romanick [Wed, 13 Jun 2018 17:11:31 +0000 (10:11 -0700)]
i965/fs: Rearrange code to remove most of the gotos
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Ian Romanick [Wed, 13 Jun 2018 17:04:55 +0000 (10:04 -0700)]
i965/fs: Refactor propagation of conditional modifiers from compares to adds
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Ian Romanick [Wed, 13 Jun 2018 22:07:41 +0000 (15:07 -0700)]
i965/vec4: Optimize OR with 0 into a MOV
All of the affected shaders are geometry shaders... the same ones from
the similar fs changes.
The "No changes on any other platforms" comment below is not quite
right. Without the previous change to register coalescing, this
optimization caused quite a few regressions in tests that either used
gl_ClipVertex or used different interpolation modes. I observed that
with both patches applied,
glsl-1.10/execution/interpolation/interpolation-none-gl_BackSecondaryColor-smooth-vertex.shader_test
was one instruction shorter. I suspect other shaders would be similarly
affected. Since this is all based on NOS, shader-db does not reflect
it.
Haswell
total instructions in shared programs: 12954955 -> 12954918 (<.01%)
instructions in affected programs: 3603 -> 3566 (-1.03%)
helped: 37
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 1.99% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.30% -1.69%
Instructions are helped.
total cycles in shared programs: 410012108 -> 410012098 (<.01%)
cycles in affected programs: 3540 -> 3530 (-0.28%)
helped: 5
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.28% max: 0.28% x̄: 0.28% x̃: 0.28%
95% mean confidence interval for cycles value: -2.00 -2.00
95% mean confidence interval for cycles %-change: -0.28% -0.28%
Cycles are helped.
Ivy Bridge
total instructions in shared programs: 11679387 -> 11679351 (<.01%)
instructions in affected programs: 3292 -> 3256 (-1.09%)
helped: 36
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.21% max: 2.50% x̄: 2.04% x̃: 2.50%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.34% -1.74%
Instructions are helped.
No changes on any other platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Ian Romanick [Thu, 14 Jun 2018 22:26:58 +0000 (15:26 -0700)]
i965/vec4: Don't register coalesce into source of VS_OPCODE_UNPACK_FLAGS_SIMD4X2
This prevents regressions in a bunch of clipping and interpolation tests
caused by the next patch (i965/vec4: Optimize OR with 0 into a MOV).
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Ian Romanick [Wed, 13 Jun 2018 19:32:27 +0000 (12:32 -0700)]
i965/fs: Optimize OR with 0 into a MOV
fs_visitor::set_gs_stream_control_data_bits generates some code like
"control_data_bits | stream_id << ((2 * (vertex_count - 1)) % 32)" as
part of EmitVertex. The first time this (dynamically) occurs in the
shader, control_data_bits is zero. Many times we can determine this
statically and various optimizations will collaborate to make one of the
OR operands literal zero.
Converting the OR to a MOV usually allows it to be copy-propagated away.
However, this does not happen in at least some shaders (in the assembly
output of shaders/closed/UnrealEngine4/EffectsCaveDemo/301.shader_test,
search for shl).
All of the affected shaders are geometry shaders.
Broadwell and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14375452 -> 14375413 (<.01%)
instructions in affected programs: 6422 -> 6383 (-0.61%)
helped: 39
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.14% max: 2.56% x̄: 1.91% x̃: 2.56%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.26% -1.57%
Instructions are helped.
total cycles in shared programs: 531981179 -> 531980555 (<.01%)
cycles in affected programs: 27493 -> 26869 (-2.27%)
helped: 39
HURT: 0
helped stats (abs) min: 16 max: 16 x̄: 16.00 x̃: 16
helped stats (rel) min: 0.60% max: 7.92% x̄: 5.94% x̃: 7.92%
95% mean confidence interval for cycles value: -16.00 -16.00
95% mean confidence interval for cycles %-change: -6.98% -4.90%
Cycles are helped.
No changes on earlier platforms.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Eric Anholt [Fri, 15 Jun 2018 01:05:50 +0000 (18:05 -0700)]
v3d: Handle a no-intersection scissor even if it's outside of the VP.
The min/maxes ended up producing a negative clip width/height for
dEQP-GLES3.functional.fragment_ops.scissor.outside_render_line. Just make
sure they stay at 0 (or v3d 3.x's workaround) if that happens.
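A minimal sketch of the clamping idea, assuming the clip window is the intersection of the viewport and scissor rectangles. The struct and function names are made up for illustration, and the v3d 3.x workaround mentioned above is not modeled.

```c
#include <assert.h>
#include <stddef.h>

struct toy_rect {
   int minx, miny, maxx, maxy;
};

static int toy_imax(int a, int b) { return a > b ? a : b; }
static int toy_imin(int a, int b) { return a < b ? a : b; }

/* Intersect viewport and scissor. When they do not intersect, the
 * naive max - min would be negative, so clamp the extent to 0. */
static void
toy_clip_extent(const struct toy_rect *vp, const struct toy_rect *scissor,
                unsigned *width, unsigned *height)
{
   assert(vp != NULL && scissor != NULL);
   assert(width != NULL && height != NULL);

   int minx = toy_imax(vp->minx, scissor->minx);
   int miny = toy_imax(vp->miny, scissor->miny);
   int maxx = toy_imin(vp->maxx, scissor->maxx);
   int maxy = toy_imin(vp->maxy, scissor->maxy);

   *width = maxx > minx ? (unsigned)(maxx - minx) : 0;
   *height = maxy > miny ? (unsigned)(maxy - miny) : 0;
}
```

A scissor lying entirely outside the viewport then yields a 0x0 clip window instead of a negative size.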