From: Jason Ekstrand
Date: Thu, 14 Mar 2019 17:58:16 +0000 (-0500)
Subject: intel/nir: Lower array-deref-of-vector UBO and SSBO loads
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=d3386e73c5976ecec84821d17f05c2fd4b823880;p=mesa.git

intel/nir: Lower array-deref-of-vector UBO and SSBO loads

This fixes a serious performance issue with DXVK:

    https://github.com/doitsujin/dxvk/issues/937

This was caused by a recent change to improve performance on RADV which
back-fired on ANV and killed performance for some apps:

    https://github.com/doitsujin/dxvk/commit/e5a06d3f4a103a54cd4eb51970fedee405d1d698

Throwing in this bit of lowering lets us come along and CSE those UBO
loads (or copy-prop for SSBO loads) and get one load where we previously
would have gotten several.

VkPipeline-db results on Kaby Lake:

    total instructions in shared programs: 5115361 -> 5073185 (-0.82%)
    instructions in affected programs: 1754333 -> 1712157 (-2.40%)
    helped: 5331
    HURT: 63

    total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%)
    cycles in affected programs: 2531058653 -> 2467702029 (-2.50%)
    helped: 9202
    HURT: 4323

    total loops in shared programs: 3340 -> 3331 (-0.27%)
    loops in affected programs: 9 -> 0
    helped: 9
    HURT: 0

    total spills in shared programs: 3246 -> 3053 (-5.95%)
    spills in affected programs: 384 -> 191 (-50.26%)
    helped: 10
    HURT: 5

    total fills in shared programs: 4626 -> 4452 (-3.76%)
    fills in affected programs: 439 -> 265 (-39.64%)
    helped: 10
    HURT: 5

All of the shaders with hurt spilling were in Rise of the Tomb Raider,
which also had shaders solidly helped in the spilling department.  Not
shown in those results (because I've not had success dumping the
shaders) is Witcher 3, where this reduces spilling and improves overall
perf by around 20-25%.

There were no shader-db changes.  Apparently, this just isn't a pattern
that happens in OpenGL.

Reviewed-by: Caio Marcelo de Oliveira Filho
Cc: "19.0" mesa-stable@lists.freedesktop.org
---

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 5734987a964..7719ad40251 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -742,6 +742,17 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir,
       brw_nir_no_indirect_mask(compiler, nir->info.stage);
    OPT(nir_lower_indirect_derefs, indirect_mask);
 
+   /* Lower array derefs of vectors for SSBO and UBO loads.  For both UBOs and
+    * SSBOs, our back-end is capable of loading an entire vec4 at a time and
+    * we would like to take advantage of that whenever possible regardless of
+    * whether or not the app gives us full loads.  This should allow the
+    * optimizer to combine UBO and SSBO load operations and save us some send
+    * messages.
+    */
+   OPT(nir_lower_array_deref_of_vec,
+       nir_var_mem_ubo | nir_var_mem_ssbo,
+       nir_lower_direct_array_deref_of_vec_load);
+
    /* Get rid of split copies */
    nir = brw_nir_optimize(nir, compiler, is_scalar, false);
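
Illustrative note (not part of the patch): the effect described in the commit message can be sketched with a toy model in plain C. This is not NIR or Mesa code. The type `vec4`, the counter `load_count`, and every function below are hypothetical names invented for the illustration, and the "loads" are ordinary array reads that merely count how many memory messages a back-end might issue under these simplified assumptions. The first function reads each component of a vector through its own access, as a shader using array derefs of a vector would. The second reads the whole vector once and then picks components out of the local copy, which is roughly the shape the optimizer can reach once the derefs are lowered and the identical whole-vector loads are combined.

```c
#include <assert.h>

/* Toy stand-in for one vec4 slot in a UBO or SSBO (hypothetical type). */
typedef struct { float c[4]; } vec4;

/* Counts simulated memory loads; illustration only. */
static int load_count;

/* Simulated per-component load: every call costs one load. */
static float load_component(const vec4 *buf, int vec_index, int comp)
{
    assert(comp >= 0 && comp < 4);
    load_count++;
    return buf[vec_index].c[comp];
}

/* Simulated whole-vector load: one load returns all four components. */
static vec4 load_vec4(const vec4 *buf, int vec_index)
{
    load_count++;
    return buf[vec_index];
}

/* Before lowering (assumed shape): four component accesses, four loads. */
static float sum_per_component(const vec4 *buf, int vec_index)
{
    float x = load_component(buf, vec_index, 0);
    float y = load_component(buf, vec_index, 1);
    float z = load_component(buf, vec_index, 2);
    float w = load_component(buf, vec_index, 3);
    return x + y + z + w;
}

/* After lowering and combining (assumed shape): one load, then selects. */
static float sum_whole_vector(const vec4 *buf, int vec_index)
{
    vec4 v = load_vec4(buf, vec_index);
    return v.c[0] + v.c[1] + v.c[2] + v.c[3];
}
```

Under this model, summing the four components of one vector costs four loads in the first form and one in the second, while both forms return the same value. The real pass operates on NIR derefs and leaves the merging to CSE and copy propagation, as the commit message describes.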