From: Kenneth Graunke
Date: Wed, 26 Aug 2015 00:09:40 +0000 (-0700)
Subject: i965/vs: Fix a subtlety in the nr_attributes == 0 workaround.
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=6842ad79125371e7e61baac8e6b8a77583f79065;p=mesa.git

i965/vs: Fix a subtlety in the nr_attributes == 0 workaround.

nr_attributes is used to compute first_non_payload_grf, which is the first
register we're allowed to use for ordinary register allocation.

The hardware requires us to read at least one pair of values, but we're
completely free to overwrite that garbage register with whatever we
like. Instead of altering nr_attributes, we should alter
urb_read_length, which only affects the amount we ask the VF to read.

This should save us a register in trivial cases (which admittedly isn't
very useful).

While we're at it, improve the explanation in the comments.

v2: Actually do what I said (caught by Ilia).

Signed-off-by: Kenneth Graunke
Reviewed-by: Iago Toral Quiroga
---

diff --git a/src/mesa/drivers/dri/i965/brw_vs.c b/src/mesa/drivers/dri/i965/brw_vs.c
index 17d3bc49580..0dc2bdccae8 100644
--- a/src/mesa/drivers/dri/i965/brw_vs.c
+++ b/src/mesa/drivers/dri/i965/brw_vs.c
@@ -170,14 +170,16 @@ brw_codegen_vs_prog(struct brw_context *brw,
       nr_attributes++;
    }
 
-   /* The BSpec says we always have to read at least one thing from the VF,
-    * and it appears that the hardware wedges otherwise.
+   /* The 3DSTATE_VS documentation lists the lower bound on "Vertex URB Entry
+    * Read Length" as 1 in vec4 mode, and 0 in SIMD8 mode.  Empirically, in
+    * vec4 mode, the hardware appears to wedge unless we read something.
     */
-   if (nr_attributes == 0 && !brw->intelScreen->compiler->scalar_vs)
-      nr_attributes = 1;
+   if (brw->intelScreen->compiler->scalar_vs)
+      prog_data.base.urb_read_length = DIV_ROUND_UP(nr_attributes, 2);
+   else
+      prog_data.base.urb_read_length = DIV_ROUND_UP(MAX2(nr_attributes, 1), 2);
 
    prog_data.nr_attributes = nr_attributes;
-   prog_data.base.urb_read_length = DIV_ROUND_UP(nr_attributes, 2);
 
    /* Since vertex shaders reuse the same VUE entry for inputs and outputs
     * (overwriting the original contents), we need to make sure the size is