When the original instruction had a stride > 1, the combined registers
written by the split instructions won't amount to the same register space
written by the original instruction because the split instructions will
use a stride of 1. The current code assumed otherwise and computed the
number of registers written by split instructions as an equal share based
on the relation between the lowered width and the original execution size
of the instruction.
It is only after the split, when we interleave the components of the result
from the lowered instructions back into the original dst register, that the
original stride takes effect and we write all the registers specified by
the original instruction.
Just make the number of register written the same as the vgrf space we
allocate for the dst of the split instruction.
Fixes crashes in fp64 tests produced as a result of assigning incorrectly the
number of registers written by split instructions, which led to incorrect
validation of the size of the writes against the allocated vgrf space.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
split_inst.dst = dsts[i] =
lbld.vgrf(inst->dst.type, dst_size);
split_inst.regs_written =
- DIV_ROUND_UP(inst->regs_written * lower_width,
- inst->exec_size);
+ DIV_ROUND_UP(type_sz(inst->dst.type) * dst_size * lower_width,
+ REG_SIZE);
}
lbld.emit(split_inst);