The pass should work for all bit sizes but it's less clear that the
extra instructions are worth it on small integers. Also, the hardware
doesn't do mul_high on anything other than 32-bit integers and, absent
any decent mechanism for testing the pass on 8- and 16-bit types, it's
probably best to just leave it disabled for now.
Shader-db results on Skylake:
total instructions in shared programs: 15105795 -> 15111403 (0.04%)
instructions in affected programs: 72774 -> 78382 (7.71%)
helped: 0
HURT: 265
Note that hurt here actually means helped because we're getting rid of
integer quotient operations (which are a send on some platforms!) and
replacing them with fairly cheap ALU ops.
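
As a rough illustration of the kind of replacement described above, the
sketch below divides a 32-bit unsigned value by a constant using a
multiply-high and a shift instead of a quotient operation. The helper names
(umul_high32, udiv3, udiv10, check_udiv_examples) are hypothetical and
chosen for this example only; the exact instruction sequence the pass
emits may differ, and some divisors need an extra add fixup that is not
shown here.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of a 32x32 -> 64-bit unsigned multiply.
 * Hypothetical stand-in for a hardware mul_high operation. */
static uint32_t umul_high32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* x / 3 without a divide: magic = ceil(2^33 / 3) = 0xAAAAAAAB,
 * so x / 3 == (x * magic) >> 33 for every 32-bit unsigned x. */
static uint32_t udiv3(uint32_t x)
{
    return umul_high32(x, 0xAAAAAAABu) >> 1;
}

/* x / 10 without a divide: magic = ceil(2^35 / 10) = 0xCCCCCCCD,
 * so x / 10 == (x * magic) >> 35 for every 32-bit unsigned x. */
static uint32_t udiv10(uint32_t x)
{
    return umul_high32(x, 0xCCCCCCCDu) >> 3;
}

/* Spot-check the sketches against the native divide. */
static void check_udiv_examples(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 2u, 3u, 9u, 10u, 99u, 100u, 0x7FFFFFFFu, 0xFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        assert(udiv3(samples[i]) == samples[i] / 3u);
        assert(udiv10(samples[i]) == samples[i] / 10u);
    }
}
```

Each divide becomes one multiply-high and one shift, which is why the
instruction count goes up even though the generated code should be
cheaper on platforms where the quotient is a send.
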
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
OPT(nir_opt_cse);
OPT(nir_opt_peephole_select, 0);
OPT(nir_opt_intrinsics);
+ OPT(nir_opt_idiv_const, 32);
OPT(nir_opt_algebraic);
OPT(nir_opt_constant_folding);
OPT(nir_opt_dead_cf);
*/
OPT(nir_lower_int64, nir_lower_imul64 |
nir_lower_isign64 |
- nir_lower_divmod64);
+ nir_lower_divmod64 |
+ nir_lower_imul_high64);
nir = brw_nir_optimize(nir, compiler, is_scalar, true);