It appears that the thing that was killing VS threads was the
gratuitous NOP that replaced the gratuitous jump from OPCODE_END to
the nearby OPCODE_END implementation. With that gone, we can move on
to the rest of the pipeline.
(brw->vs.prog_data->urb_read_length << GEN6_VS_URB_READ_LENGTH_SHIFT) |
(0 << GEN6_VS_URB_ENTRY_READ_OFFSET_SHIFT));
OUT_BATCH((0 << GEN6_VS_MAX_THREADS_SHIFT) |
- GEN6_VS_STATISTICS_ENABLE);
+ GEN6_VS_STATISTICS_ENABLE |
+ GEN6_VS_ENABLE);
ADVANCE_BATCH();
intel_batchbuffer_emit_mi_flush(intel->batch);