From 456dbcc3377ee23dbeffa4da02a4d80a8519bb62 Mon Sep 17 00:00:00 2001 From: Eric Anholt Date: Mon, 3 Dec 2012 19:59:55 -0800 Subject: [PATCH] i965/fs: Before reg alloc, schedule instructions to reduce live ranges. This came from an idea by Ben Segovia. 16-wide pixel shaders are very important for latency hiding on i965, so we want to try really hard to get them. If scheduling an instruction makes some set of instructions available, those are probably the ones that make the instruction's result dead. By choosing those first, we'll have a tendency to reduce the amount of live data as opposed to creating more. Previously, we were sometimes getting this behavior out of the scheduler, which was what produced the scheduler's original performance wins on lightsmark. Unfortunately, that was mostly an accident of the lame instruction latency information that I had, which made it impossible to fix the actual scheduling for performance. Now that we've fixed the scheduling for setup for register allocation, we can safely update the latency parameters for the final schedule. In shader-db, we lose 37 16-wide shaders, but gain 90 new ones. 4 shaders that were spilling change how many registers spill, for a reduction of 70/3899 instructions. v2: Simplify the new loop. Acked-by: Kenneth Graunke --- .../dri/i965/brw_fs_schedule_instructions.cpp | 47 ++++++++++++++++--- 1 file changed, 41 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp index 14de5f804a3..76bd5b2cc36 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_schedule_instructions.cpp @@ -495,13 +495,48 @@ instruction_scheduler::schedule_instructions(fs_inst *next_block_header) schedule_node *chosen = NULL; int chosen_time = 0; - foreach_list(node, &instructions) { - schedule_node *n = (schedule_node *)node; + if (post_reg_alloc) { + /* Of the instructions closest ready to execute or the closest to + * being ready, choose the oldest one. + */ + foreach_list(node, &instructions) { + schedule_node *n = (schedule_node *)node; + + if (!chosen || n->unblocked_time < chosen_time) { + chosen = n; + chosen_time = n->unblocked_time; + } + } + } else { + /* Before register allocation, we don't care about the latencies of + * instructions. All we care about is reducing live intervals of + * variables so that we can avoid register spilling, or get 16-wide + * shaders which naturally do a better job of hiding instruction + * latency. + * + * To do so, schedule our instructions in a roughly LIFO/depth-first + * order: when new instructions become available as a result of + * scheduling something, choose those first so that our result + * hopefully is consumed quickly. + * + * The exception is messages that generate more than one result + * register (AKA texturing). In those cases, the LIFO search would + * normally tend to choose them quickly (because scheduling the + * previous message not only unblocked the children using its result, + * but also the MRF setup for the next sampler message, which in turn + * unblocks the next sampler message). + */ + for (schedule_node *node = (schedule_node *)instructions.get_tail(); + node != instructions.get_head()->prev; + node = (schedule_node *)node->prev) { + schedule_node *n = (schedule_node *)node; + + chosen = n; + if (chosen->inst->regs_written() <= 1) + break; + } - if (!chosen || n->unblocked_time < chosen_time) { - chosen = n; - chosen_time = n->unblocked_time; - } + chosen_time = chosen->unblocked_time; } /* Schedule this instruction. */ -- 2.30.2