opt_register_coalesce() was optimizing sequences such as:
mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D
mov(8) m4.zw:F, vgrf5.xxxy:F
into:
mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D
This doesn't work - if we're going to reswizzle MACH, we'd need to
reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw
components with attr18.yy * attr19.yy. But the MACH instruction expects
.z to contain attr18.x * attr19.x. Bogus results ensue.
No change in shader-db on Haswell. Prevents regressions in Timothy's
patches to use enhanced layouts for varying packing (which rearrange
code just enough to trigger this pre-existing bug, but were fine
themselves).
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
if (devinfo->gen == 6 && is_math() && swizzle != BRW_SWIZZLE_XYZW)
return false;
+ /* Don't touch MACH - it uses the accumulator results from an earlier
+ * MUL - so we'd need to reswizzle both. We don't do that, so just
+ * avoid it entirely.
+ */
+ if (opcode == BRW_OPCODE_MACH)
+ return false;
+
if (!can_do_writemask(devinfo) && dst_writemask != WRITEMASK_XYZW)
return false;