ac: only load used channels when sampling buffer views
author Samuel Pitoiset <samuel.pitoiset@gmail.com>
Wed, 10 Jan 2018 19:12:11 +0000 (20:12 +0100)
committer Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fri, 26 Jan 2018 11:14:27 +0000 (12:14 +0100)
This reduces the number of dwords loaded when sampling buffer
views, instead of always using buffer_load_format_xyzw. For
example, when only the first channel (x) is read, the driver
emits buffer_load_format_x instead.
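
A minimal sketch of how the channel count follows from the mask of
components read, as far as the diff shows. The patch passes
util_last_bit(mask) as the number of channels to load. last_bit()
below is a stand-in written for this note (assumed to match
util_last_bit: 1-based position of the highest set bit, 0 for an
empty mask), and channels_to_load() is a hypothetical helper that
does not exist in the driver.

```c
#include <assert.h>

/* Stand-in for Mesa's util_last_bit(): 1-based position of the
 * highest set bit, or 0 when no bit is set. Written as a loop
 * here so the sketch has no compiler-specific dependencies. */
static unsigned last_bit(unsigned mask)
{
        unsigned n = 0;

        while (mask) {
                n++;
                mask >>= 1;
        }
        return n;
}

/* Hypothetical helper: number of channels a buffer format load has
 * to fetch for a 4-bit "components read" mask (bit 0 = x,
 * bit 3 = w). */
static unsigned channels_to_load(unsigned components_read)
{
        assert(components_read <= 0xf);
        return last_bit(components_read);
}
```

With this, channels_to_load(0x1) is 1 (x only) and channels_to_load(0x3)
is 2 (xy). Note that channels_to_load(0x4) is 3, not 1: the count is
driven by the highest channel read, so reading only z still loads xyz.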

Shader stats for DOW3 (with some local hacky scripts for SPIRV):

143 shaders in 143 tests
Totals:
SGPRS: 5344 -> 5352 (0.15 %)
VGPRS: 3476 -> 3452 (-0.69 %)
Spilled SGPRs: 30 -> 29 (-3.33 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 269860 -> 269808 (-0.02 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1267 -> 1272 (0.39 %)
Wait states: 0 -> 0 (0.00 %)

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
src/amd/common/ac_nir_to_llvm.c

index b40769fe5a0d1d3f0b9988c8fd20435fa0eade4b..5b8c346d0053dfe8cfc685557480f19fd2ad508e 100644
@@ -2315,11 +2315,14 @@ static LLVMValueRef build_tex_intrinsic(struct ac_nir_context *ctx,
                                        struct ac_image_args *args)
 {
        if (instr->sampler_dim == GLSL_SAMPLER_DIM_BUF) {
+               unsigned mask = nir_ssa_def_components_read(&instr->dest.ssa);
+
                return ac_build_buffer_load_format(&ctx->ac,
                                                   args->resource,
                                                   args->addr,
                                                   ctx->ac.i32_0,
-                                                  4, true);
+                                                  util_last_bit(mask),
+                                                  true);
        }
 
        args->opcode = ac_image_sample;