i965: Ask the register allocator to round-robin through registers.
The way we were allocating registers before, packing into low register
numbers for Ironlake, resulted in an overly-constrained dependency graph
for instruction scheduling. Improves GLBenchmark 2.1 performance by
4.5% +/- 0.7% (n=26). No difference on my old GLSL demo (n=20). No
difference on nexuiz (n=15).
v2: Fix off-by-one bug that made the change only work for 16-wide on i965.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>