It turns out there's nothing in the hardware preventing this. It
appears that it ought to work on pre-gen6 as well, but just produces
GPU hangs.
Improves glbenchmark Egypt framerate 4.4% +/- 0.3% (n=3), and Pro by
2.6% +/- 0.6% (n=3).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
insn = next_insn(p, BRW_OPCODE_IF);
brw_set_dest(p, insn, brw_imm_w(0));
- insn->header.execution_size = BRW_EXECUTE_8;
+ if (p->compressed) {
+ insn->header.execution_size = BRW_EXECUTE_16;
+ } else {
+ insn->header.execution_size = BRW_EXECUTE_8;
+ }
insn->bits1.branch_gen6.jump_count = 0;
brw_set_src0(p, insn, src0);
brw_set_src1(p, insn, src1);
{
fs_inst *inst;
- if (c->dispatch_width == 16) {
+ if (intel->gen != 6 && c->dispatch_width == 16) {
fail("Can't support (non-uniform) control flow on 16-wide\n");
}
assert(intel->gen == 6);
gen6_IF(p, inst->conditional_mod, src[0], src[1]);
} else {
- brw_IF(p, BRW_EXECUTE_8);
+ brw_IF(p, c->dispatch_width == 16 ? BRW_EXECUTE_16 : BRW_EXECUTE_8);
}
if_depth_in_loop[loop_stack_depth]++;
break;