From: Jason Ekstrand
Date: Sun, 4 Dec 2016 01:15:42 +0000 (-0800)
Subject: intel/fs: Lower large local arrays to scratch
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=69244fc72a89b04915e3b81a877f3eaf2e5ec078;p=mesa.git

intel/fs: Lower large local arrays to scratch

Shader-db results on Kaby Lake:

    total instructions in shared programs: 14929212 -> 14880028 (-0.33%)
    instructions in affected programs: 72428 -> 23244 (-67.91%)
    helped: 6
    HURT: 2
    helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624
    helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08%
    HURT stats (abs) min: 1178 max: 1178 x̄: 1178.00 x̃: 1178
    HURT stats (rel) min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97%
    95% mean confidence interval for instructions value: -11947.03 -348.97
    95% mean confidence interval for instructions %-change: -125.72% 202.37%
    Inconclusive result (%-change mean confidence interval includes 0).

    total cycles in shared programs: 368585300 -> 342557344 (-7.06%)
    cycles in affected programs: 28144921 -> 2116965 (-92.48%)
    helped: 6
    HURT: 2
    helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682
    helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28%
    HURT stats (abs) min: 47778 max: 47798 x̄: 47788.00 x̃: 47788
    HURT stats (rel) min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59%
    95% mean confidence interval for cycles value: -5900438.73 -606550.27
    95% mean confidence interval for cycles %-change: -140.79% 146.16%
    Inconclusive result (%-change mean confidence interval includes 0).

    total spills in shared programs: 9243 -> 8901 (-3.70%)
    spills in affected programs: 2718 -> 2376 (-12.58%)
    helped: 4
    HURT: 4

    total fills in shared programs: 21831 -> 10141 (-53.55%)
    fills in affected programs: 11804 -> 114 (-99.03%)
    helped: 6
    HURT: 2

    total sends in shared programs: 815912 -> 815912 (0.00%)
    sends in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    LOST: 1
    GAINED: 3

The helped shaders are all compute shaders in Aztec Ruins.
There is also a compute shader in synmark2 OglCSDof that's helped but it
doesn't show up in the shader-db results above because it went from SIMD8 to
SIMD16. That shader improves enough to yield a 15-20% performance boost to
the benchmark as a whole on my KBL laptop. The hurt shaders are a couple of
shaders in Kerbal Space Program and a couple in Aztec Ruins.

Reviewed-by: Caio Marcelo de Oliveira Filho
---

diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
index 547b60ddc0f..31bf25bb88a 100644
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -718,6 +718,25 @@ brw_preprocess_nir(const struct brw_compiler *compiler, nir_shader *nir,
 
    OPT(nir_lower_clip_cull_distance_arrays);
 
+   if (devinfo->gen >= 7 && is_scalar) {
+      /* TODO: Yes, we could in theory do this on gen6 and earlier. However,
+       * that would require plumbing through support for these indirect
+       * scratch read/write messages with message registers and that's just a
+       * pain. Also, the primary benefit of this is for compute shaders which
+       * won't run on gen6 and earlier anyway.
+       *
+       * The threshold of 128B was chosen semi-arbitrarily. The idea is that
+       * 128B per channel on a SIMD8 program is 32 registers or 25% of the
+       * register file. Any array that large is likely to cause pressure
+       * issues. Also, this value is sufficiently high that the benchmarks
+       * known to suffer from large temporary array issues are helped but
+       * nothing else in shader-db is hurt except for maybe that one kerbal
+       * space program shader.
+       */
+      OPT(nir_lower_vars_to_scratch, nir_var_function_temp, 128,
+          glsl_get_natural_size_align_bytes);
+   }
+
    nir_variable_mode indirect_mask =
       brw_nir_no_indirect_mask(compiler, nir->info.stage);
    OPT(nir_lower_indirect_derefs, indirect_mask);
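
The arithmetic behind the 128B threshold in the comment above can be checked with a small standalone sketch. It is not part of the patch. The two constants (32-byte registers, 128 registers per thread) are assumptions inferred from the comment's own figures ("32 registers or 25% of the register file"), and the helper names `array_grf_footprint` and `array_grf_percent` are hypothetical, not Mesa functions.

```c
#include <assert.h>

/* Assumed hardware constants, inferred from the figures quoted in the
 * patch comment; they are not taken from any Mesa header. */
#define GRF_SIZE_BYTES 32u   /* one register: 8 channels x 4 bytes */
#define GRF_COUNT      128u  /* registers available to a single thread */

/* Hypothetical helper: number of registers needed to keep an array of
 * bytes_per_channel bytes resident for every channel of the dispatch. */
static inline unsigned
array_grf_footprint(unsigned bytes_per_channel, unsigned simd_width)
{
   assert(simd_width > 0);
   unsigned total_bytes = bytes_per_channel * simd_width;
   /* Round up to a whole number of registers. */
   return (total_bytes + GRF_SIZE_BYTES - 1) / GRF_SIZE_BYTES;
}

/* Hypothetical helper: the same footprint as an integer percentage of
 * the register file (truncating division). */
static inline unsigned
array_grf_percent(unsigned bytes_per_channel, unsigned simd_width)
{
   return array_grf_footprint(bytes_per_channel, simd_width) * 100u / GRF_COUNT;
}
```

Under these assumptions, `array_grf_footprint(128, 8)` evaluates to 32 and `array_grf_percent(128, 8)` to 25, matching the comment. The same array in a SIMD16 program would take 64 registers, or half the file, which is consistent with the comment's claim that arrays of this size are likely to cause register pressure.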