From: Kenneth Graunke <kenneth@whitecape.org>
Date: Fri, 8 Jun 2018 21:24:16 +0000 (-0700)
Subject: intel/compiler: Properly consider UBO loads that cross 32B boundaries.
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=b8fa847c2ed9c7c743f31e57560a09fae3992f46;p=mesa.git

The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.

For example, if a UBO contained the following, tightly packed:

   vec4 a;  // [0, 16)
   float b; // [16, 20)
   vec4 c;  // [20, 36)

then c would start in 32B chunk 20 / 32 = 0 and end in chunk 36 / 32 = 1
(integer division), which means that we ought to record two 32B chunks in
the bitfield rather than just the one containing the starting offset.

Similarly, dvec4s would suffer from the same problem.
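
As a rough illustration of the arithmetic (a standalone sketch, not the
code in this patch), the chunk bitfield for a value occupying the bytes
[start, start + size) could be computed as follows. The names chunk_mask
and CHUNK_SIZE are hypothetical and exist only for this example; the sketch
assumes size > 0 and that the value touches fewer than 64 chunks.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE 32u /* hypothetical name for the 32B chunk granularity */

/* Hypothetical helper: returns a bitmask with one bit set for every
 * 32-byte chunk touched by the byte range [start, start + size).
 */
static uint64_t
chunk_mask(uint32_t start, uint32_t size)
{
   /* Index of the chunk containing the first byte. */
   const uint32_t first = start / CHUNK_SIZE;

   /* One past the index of the chunk containing the last byte
    * (round the exclusive end offset up to a chunk boundary).
    */
   const uint32_t end = (start + size + CHUNK_SIZE - 1) / CHUNK_SIZE;

   /* Half-open range, so the chunk count is simply end - first. */
   const uint32_t count = end - first;

   /* Assumptions of this sketch: non-empty value, shifts stay in range. */
   assert(size > 0);
   assert(end <= 64 && count < 64);

   return ((1ull << count) - 1) << first;
}
```

With the layout above, chunk_mask(20, 16) yields 0x3 for c (chunks 0 and
1), while chunk_mask(0, 16) for a and chunk_mask(16, 4) for b both yield
0x1. A 32-byte dvec4 placed at byte 48 would yield 0x6 (chunks 1 and 2).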

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
---

diff --git a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
index d58fe3dd2e3..6d6ccf73ade 100644
--- a/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
+++ b/src/intel/compiler/brw_nir_analyze_ubo_ranges.c
@@ -141,10 +141,16 @@ analyze_ubos_block(struct ubo_analysis_state *state, nir_block *block)
          if (offset >= 64)
             continue;
 
+         /* The value might span multiple 32-byte chunks. */
+         const int bytes = nir_intrinsic_dest_components(intrin) *
+                           (nir_dest_bit_size(intrin->dest) / 8);
+         const int end = DIV_ROUND_UP(offset_const->u32[0] + bytes, 32);
+         const int regs = end - offset;
+
          /* TODO: should we count uses in loops as higher benefit? */
 
          struct ubo_block_info *info = get_block_info(state, block);
-         info->offsets |= 1ull << offset;
+         info->offsets |= ((1ull << regs) - 1) << offset;
          info->uses[offset]++;
       }
    }