From be2990d8fbcd5b4b450e7bfd2053b62d66153b8d Mon Sep 17 00:00:00 2001
From: Jason Ekstrand <jason.ekstrand@intel.com>
Date: Sat, 9 Mar 2019 09:08:44 -0600
Subject: [PATCH] i965: Stop setting LowerBuferInterfaceBlocks

Instead, we do UBO and SSBO deref lowering in NIR after we've given it a
chance to optimize SSBO access:

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15235775 -> 15235484 (<.01%)
    instructions in affected programs: 14992 -> 14701 (-1.94%)
    helped: 19
    HURT: 20

    total cycles in shared programs: 339220331 -> 339027307 (-0.06%)
    cycles in affected programs: 79831981 -> 79638957 (-0.24%)
    helped: 540
    HURT: 602

    total loops in shared programs: 4402 -> 4348 (-1.23%)
    loops in affected programs: 186 -> 132 (-29.03%)
    helped: 27
    HURT: 0

    total spills in shared programs: 23261 -> 23234 (-0.12%)
    spills in affected programs: 38 -> 11 (-71.05%)
    helped: 1
    HURT: 0

    total fills in shared programs: 31442 -> 31371 (-0.23%)
    fills in affected programs: 98 -> 27 (-72.45%)
    helped: 1
    HURT: 0

    LOST:   12
    GAINED: 12

Most of the help and hurt in instruction counts was just churn caused by
re-ordering of optimizations and the fact that the NIR deref lowering
code is emitting slightly different instructions.  Nothing was hurt by
more than three instructions and most things weren't helped by more than
four.  The primary exception to this is one Car Chase shader:

    shaders/non-free/gfxbench4/carchase/341.shader_test CS SIMD32: 1144 -> 821 (-28.23%)

There is also one compute shader in Manhattan 3.1 and a fragment shader
in the UE4 Shooter Game demo that now get a loop partially unrolled.
Those showed up in the results as hurt instructions but were manually
removed to get the results above.

The lost/gained was a dozen Car Chase shaders that went from SIMD8 to
SIMD16 thanks to improved register pressure:

    shaders/non-free/gfxbench4/carchase/366.shader_test CS
    shaders/non-free/gfxbench4/carchase/368.shader_test CS
    shaders/non-free/gfxbench4/carchase/370.shader_test CS
    shaders/non-free/gfxbench4/carchase/372.shader_test CS
    shaders/non-free/gfxbench4/carchase/376.shader_test CS
    shaders/non-free/gfxbench4/carchase/378.shader_test CS
    shaders/non-free/gfxbench4/carchase/380.shader_test CS
    shaders/non-free/gfxbench4/carchase/382.shader_test CS
    shaders/non-free/gfxbench4/carchase/384.shader_test CS
    shaders/non-free/gfxbench4/carchase/388.shader_test CS
    shaders/non-free/gfxbench4/carchase/4.shader_test CS
    shaders/non-free/gfxbench4/carchase/6.shader_test CS

Given how much it appeared to be improved, I ran Car Chase on my laptop.
Unfortunately, I wasn't able to see any measurable improvement.  It
might be helped by 1-2% but it's in the noise.  It does render correctly
as far as I can tell so the improvement is legitimate.

All of the loops that got delete were in dolphin uber shaders.  I've had
no opportunity to test them for correctness or performance.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
---
 src/intel/compiler/brw_compiler.c       | 1 -
 src/mesa/drivers/dri/i965/brw_program.c | 4 ++++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_compiler.c b/src/intel/compiler/brw_compiler.c
index 4101f99d992..4fa02ca4d10 100644
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -206,7 +206,6 @@ brw_compiler_create(void *mem_ctx, const struct gen_device_info *devinfo)
       nir_options->lower_doubles_options = fp64_options;
       compiler->glsl_compiler_options[i].NirOptions = nir_options;
 
-      compiler->glsl_compiler_options[i].LowerBufferInterfaceBlocks = true;
       compiler->glsl_compiler_options[i].ClampBlockIndicesToArrayBounds = true;
    }
 
diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
index 4f9b7aa71d7..2ef11508c79 100644
--- a/src/mesa/drivers/dri/i965/brw_program.c
+++ b/src/mesa/drivers/dri/i965/brw_program.c
@@ -121,6 +121,10 @@ brw_create_nir(struct brw_context *brw,
 
    NIR_PASS_V(nir, brw_nir_lower_image_load_store, devinfo);
 
+   NIR_PASS_V(nir, gl_nir_lower_buffers, shader_prog);
+   /* Do a round of constant folding to clean up address calculations */
+   NIR_PASS_V(nir, nir_opt_constant_folding);
+
    if (stage == MESA_SHADER_TESS_CTRL) {
       /* Lower gl_PatchVerticesIn from a sys. value to a uniform on Gen8+. */
       static const gl_state_index16 tokens[STATE_LENGTH] =
-- 
2.30.2