From a7d029d3dfac1da2701be75ff4d1589ac562e916 Mon Sep 17 00:00:00 2001
From: Kenneth Graunke
Date: Thu, 9 Jun 2016 16:11:46 -0700
Subject: [PATCH] i965: Account for poor address calculations in Haswell CS scratch size.

Curro figured this out by investigating the simulator. Apparently
there's also a workaround in the Windows driver. I'm not sure it's
actually documented anywhere. We were underallocating the scratch
buffer by a factor of 128/70.

v2: Rename threads_per_subslice to scratch_ids_per_subslice (suggested
    by Jordan Justen).

Cc: "12.0"
Signed-off-by: Kenneth Graunke
Reviewed-by: Francisco Jerez
Reviewed-by: Jordan Justen
---
 src/mesa/drivers/dri/i965/brw_cs.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_cs.c b/src/mesa/drivers/dri/i965/brw_cs.c
index c8598d61891..22856b64179 100644
--- a/src/mesa/drivers/dri/i965/brw_cs.c
+++ b/src/mesa/drivers/dri/i965/brw_cs.c
@@ -150,9 +150,28 @@ brw_codegen_cs_prog(struct brw_context *brw,
 
    if (prog_data.base.total_scratch) {
       const unsigned subslices = MAX2(brw->intelScreen->subslice_total, 1);
+
+      /* WaCSScratchSize:hsw
+       *
+       * Haswell's scratch space address calculation appears to be sparse
+       * rather than tightly packed. The Thread ID has bits indicating
+       * which subslice, EU within a subslice, and thread within an EU
+       * it is. There's a maximum of two slices and two subslices, so these
+       * can be stored with a single bit. Even though there are only 10 EUs
+       * per subslice, this is stored in 4 bits, so there's an effective
+       * maximum value of 16 EUs. Similarly, although there are only 7
+       * threads per EU, this is stored in a 3 bit number, giving an effective
+       * maximum value of 8 threads per EU.
+       *
+       * This means that we need to use 16 * 8 instead of 10 * 7 for the
+       * number of threads per subslice.
+       */
+      const unsigned scratch_ids_per_subslice =
+         brw->is_haswell ? 16 * 8 : brw->max_cs_threads;
+
       brw_get_scratch_bo(brw, &brw->cs.base.scratch_bo,
                          prog_data.base.total_scratch *
-                         brw->max_cs_threads * subslices);
+                         scratch_ids_per_subslice * subslices);
    }
 
    if (unlikely(INTEL_DEBUG & DEBUG_CS))
-- 
2.30.2
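
The sizing arithmetic described in the WaCSScratchSize:hsw comment can be sketched as a standalone C fragment. This is only an illustration, not code from the driver: the helper names scratch_ids_per_subslice() and cs_scratch_size() and the two bit-width macros are hypothetical, and the figure of 70 for max_cs_threads on Haswell is inferred from the "10 * 7" and "128/70" figures in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field widths as described in the WaCSScratchSize:hsw comment.
 * The macro names are made up for this sketch. */
#define HSW_EU_ID_BITS     4   /* 10 real EUs per subslice, 16 encodable */
#define HSW_THREAD_ID_BITS 3   /* 7 real threads per EU, 8 encodable */

/* Hypothetical helper: number of scratch slots to reserve per subslice. */
static unsigned
scratch_ids_per_subslice(bool is_haswell, unsigned max_cs_threads)
{
   /* On Haswell the slot index is sparse, so size for the encodable
    * range (16 * 8 = 128) rather than the real thread count. */
   if (is_haswell)
      return (1u << HSW_EU_ID_BITS) * (1u << HSW_THREAD_ID_BITS);

   return max_cs_threads;
}

/* Hypothetical helper: total scratch buffer size in bytes. */
static uint64_t
cs_scratch_size(bool is_haswell, unsigned max_cs_threads,
                unsigned subslice_total, unsigned per_thread_scratch)
{
   /* The patch only allocates when total_scratch is nonzero. */
   assert(per_thread_scratch > 0);

   /* Mirrors MAX2(subslice_total, 1) in the patch. */
   const unsigned subslices = subslice_total > 1 ? subslice_total : 1;

   return (uint64_t)per_thread_scratch *
          scratch_ids_per_subslice(is_haswell, max_cs_threads) * subslices;
}
```

Under these assumptions, with 1024 bytes of scratch per thread and 2 subslices, the old formula gives 1024 * 70 * 2 = 143360 bytes and the new one gives 1024 * 128 * 2 = 262144 bytes, which matches the 128/70 factor quoted in the commit message.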