From: Jason Ekstrand
Date: Wed, 11 Jul 2018 23:31:02 +0000 (-0700)
Subject: anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=abd629eb3d4027b89c13158e90c6732b412e550e;p=mesa.git

anv: Stop setting 3DSTATE_PS_EXTRA::PixelShaderHasUAV

We've had several broadwell hangs that have come down to this bit just
not working correctly. Most recently, we've had a pile of hangs
reported with apps running under DXVK:

https://github.com/doitsujin/dxvk/issues/469

Instead, use the bit that doesn't try to imply weird D3D coherency
things and just force-enables the PS like we want.

cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke
---

diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c
index 6f6793e1221..ec47360764c 100644
--- a/src/intel/vulkan/genX_pipeline.c
+++ b/src/intel/vulkan/genX_pipeline.c
@@ -1389,6 +1389,28 @@ emit_3dstate_wm(struct anv_pipeline *pipeline, struct anv_subpass *subpass,
             wm.EarlyDepthStencilControl = EDSC_NORMAL;
          }
 
+#if GEN_GEN >= 8
+         /* Gen8 hardware tries to compute ThreadDispatchEnable for us but
+          * doesn't take into account KillPixels when no depth or stencil
+          * writes are enabled. In order for occlusion queries to work
+          * correctly with no attachments, we need to force-enable PS thread
+          * dispatch.
+          *
+          * The BDW docs are pretty clear that that this bit isn't validated
+          * and probably shouldn't be used in production:
+          *
+          *    "This must always be set to Normal. This field should not be
+          *    tested for functional validation."
+          *
+          * Unfortunately, however, the other mechanism we have for doing this
+          * is 3DSTATE_PS_EXTRA::PixelShaderHasUAV which causes hangs on BDW.
+          * Given two bad options, we choose the one which works.
+          */
+         if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
+             !has_color_buffer_write_enabled(pipeline, blend))
+            wm.ForceThreadDispatchEnable = ForceON;
+#endif
+
          wm.BarycentricInterpolationMode =
             wm_prog_data->barycentric_interp_modes;
 
@@ -1583,37 +1605,6 @@ emit_3dstate_ps_extra(struct anv_pipeline *pipeline,
       ps.PixelShaderKillsPixel = subpass->has_ds_self_dep ||
                                  wm_prog_data->uses_kill;
 
-      /* The stricter cross-primitive coherency guarantees that the hardware
-       * gives us with the "Accesses UAV" bit set for at least one shader stage
-       * and the "UAV coherency required" bit set on the 3DPRIMITIVE command are
-       * redundant within the current image, atomic counter and SSBO GL APIs,
-       * which all have very loose ordering and coherency requirements and
-       * generally rely on the application to insert explicit barriers when a
-       * shader invocation is expected to see the memory writes performed by the
-       * invocations of some previous primitive. Regardless of the value of
-       * "UAV coherency required", the "Accesses UAV" bits will implicitly cause
-       * an in most cases useless DC flush when the lowermost stage with the bit
-       * set finishes execution.
-       *
-       * It would be nice to disable it, but in some cases we can't because on
-       * Gen8+ it also has an influence on rasterization via the PS UAV-only
-       * signal (which could be set independently from the coherency mechanism
-       * in the 3DSTATE_WM command on Gen7), and because in some cases it will
-       * determine whether the hardware skips execution of the fragment shader
-       * or not via the ThreadDispatchEnable signal. However if we know that
-       * GEN8_PS_BLEND_HAS_WRITEABLE_RT is going to be set and
-       * GEN8_PSX_PIXEL_SHADER_NO_RT_WRITE is not set it shouldn't make any
-       * difference so we may just disable it here.
-       *
-       * Gen8 hardware tries to compute ThreadDispatchEnable for us but doesn't
-       * take into account KillPixels when no depth or stencil writes are
-       * enabled. In order for occlusion queries to work correctly with no
-       * attachments, we need to force-enable here.
-       */
-      if ((wm_prog_data->has_side_effects || wm_prog_data->uses_kill) &&
-          !has_color_buffer_write_enabled(pipeline, blend))
-         ps.PixelShaderHasUAV = true;
-
 #if GEN_GEN >= 9
       ps.PixelShaderComputesStencil = wm_prog_data->computed_stencil;
       ps.PixelShaderPullsBary = wm_prog_data->pulls_bary;