Gen8+'s MUL instruction doesn't ignore the high 16 bits of one source
as it did on earlier platforms, so we can constant propagate into it
without worry. Integer multiplies that don't write the accumulator
(accumulator writes are used for imul_high) are lowered in
lower_integer_multiplication(), so it's safe there as well.
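
To illustrate why ignoring the high 16 bits of one source makes the
multiply asymmetric, here is a toy model rather than driver code. It
assumes the truncated operand is the second one (which source is
narrowed on real hardware depends on the generation), and the names
mul_32x16 and mul_32x32 are hypothetical.

```cpp
#include <cstdint>

// Toy model of a multiply that ignores the high 16 bits of its second
// source (assumption: the narrowed operand is src1).
constexpr uint32_t mul_32x16(uint32_t a, uint32_t b)
{
   return a * (b & 0xffffu);
}

// Toy model of a full 32x32 multiply, truncated to the low 32 bits.
constexpr uint32_t mul_32x32(uint32_t a, uint32_t b)
{
   return a * b;
}
```

In this model, mul_32x16(0x10000, 3) and mul_32x16(3, 0x10000) give
different results, so swapping operands to fit an immediate is not
safe. mul_32x32 gives the same result for either order.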

On Broadwell, fragment shaders only:

total instructions in shared programs: 4377769 -> 4377451 (-0.01%)
instructions in affected programs:     48064 -> 47746 (-0.66%)
helped:                                156

On Broadwell, vertex shaders only:

total instructions in shared programs: 2858885 -> 2856313 (-0.09%)
instructions in affected programs:     26380 -> 23808 (-9.75%)
helped:                                134

On Broadwell, vertex shaders only (with INTEL_USE_NIR=1):

total instructions in shared programs: 2911688 -> 2865984 (-1.57%)
instructions in affected programs:     1421715 -> 1376011 (-3.21%)
helped:                                6186

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
/* Fit this constant in by commuting the operands.
* Exception: we can't do this for 32-bit integer MUL/MACH
* because it's asymmetric.
+ *
+ * The BSpec says for Broadwell that
+ *
+ * "When multiplying DW x DW, the dst cannot be accumulator."
+ *
+ * Integer MUL with a non-accumulator destination will be lowered
+ * by lower_integer_multiplication(), so don't restrict it.
*/
- if ((inst->opcode == BRW_OPCODE_MUL ||
+ if (((inst->opcode == BRW_OPCODE_MUL &&
+ inst->dst.is_accumulator()) ||
inst->opcode == BRW_OPCODE_MACH) &&
(inst->src[1].type == BRW_REGISTER_TYPE_D ||
inst->src[1].type == BRW_REGISTER_TYPE_UD))
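
For readers who want the new condition in isolation, here is a minimal
standalone sketch. The Opcode, RegType and Inst types and the name
must_keep_operand_order are hypothetical stand-ins, not the real
backend classes. Only the boolean structure mirrors the hunk above.

```cpp
// Hypothetical, simplified stand-ins for the backend's instruction types.
enum class Opcode { MUL, MACH, ADD };
enum class RegType { D, UD, F };

struct Inst {
   Opcode opcode;
   bool dst_is_accumulator;   // stands in for inst->dst.is_accumulator()
   RegType src1_type;         // stands in for inst->src[1].type
};

// True when the operands must not be commuted: a MUL that writes the
// accumulator, or any MACH, with a 32-bit integer second source.
constexpr bool must_keep_operand_order(const Inst &inst)
{
   return ((inst.opcode == Opcode::MUL && inst.dst_is_accumulator) ||
           inst.opcode == Opcode::MACH) &&
          (inst.src1_type == RegType::D || inst.src1_type == RegType::UD);
}
```

Under this sketch, an integer MUL with a non-accumulator destination is
no longer restricted, while accumulator MULs and MACH still are.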