For 32-bit instructions we want to use <4,4,1> regions for VGRF
sources, so we should really set a width of 4 (we were setting 8).
For 64-bit instructions we want to use a width of 2 because the
hardware uses 32-bit swizzles, meaning that we can only address 2
consecutive 64-bit components in a row. Also, Curro suggested that
the hardware is probably fixing the width to 2 for 64-bit instructions
anyway, so just go with that and use <2,2,1>.
v2:
- No need to explicitly set the vertical stride of 64-bit regions to 2,
brw_vecn_grf with a width of 2 will do that for us.
- No need to adjust the width of dst registers.
v3 (Ian):
- Make type_size and width const.
Signed-off-by: Connor Abbott <connor.w.abbott@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
struct src_reg &src = inst->src[i];
struct brw_reg reg;
switch (src.file) {
- case VGRF:
- reg = byte_offset(brw_vec8_grf(src.nr, 0), src.offset);
+ case VGRF: {
+ const unsigned type_size = type_sz(src.type);
+ const unsigned width = REG_SIZE / 2 / MAX2(4, type_size);
+ reg = byte_offset(brw_vecn_grf(width, src.nr, 0), src.offset);
reg.type = src.type;
reg.swizzle = src.swizzle;
reg.abs = src.abs;
reg.negate = src.negate;
break;
+ }
- case UNIFORM:
+ case UNIFORM: {
+ const unsigned width = REG_SIZE / 2 / MAX2(4, type_sz(src.type));
reg = stride(byte_offset(brw_vec4_grf(
prog_data->base.dispatch_grf_start_reg +
src.nr / 2, src.nr % 2 * 4),
src.offset),
- 0, 4, 1);
+ 0, width, 1);
reg.type = src.type;
reg.swizzle = src.swizzle;
reg.abs = src.abs;
/* This should have been moved to pull constants. */
assert(!src.reladdr);
break;
+ }
case ARF:
case FIXED_GRF: