intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.
authorFrancisco Jerez <currojerez@riseup.net>
Fri, 18 Jan 2019 19:38:17 +0000 (11:38 -0800)
committerFrancisco Jerez <currojerez@riseup.net>
Thu, 21 Feb 2019 22:07:25 +0000 (14:07 -0800)
Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence -- Return false from has_dst_aligned_region_restriction()
for such instructions as a micro-optimization.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
src/intel/compiler/brw_ir_fs.h

index c4427a658b0167cbe2945e9790666f7bf2c8cfd7..56a4bdc6e526302409fde11f31d25b4365f5a4bf 100644 (file)
@@ -542,11 +542,19 @@ has_dst_aligned_region_restriction(const gen_device_info *devinfo,
                                    const fs_inst *inst)
 {
    const brw_reg_type exec_type = get_exec_type(inst);
-   const bool is_int_multiply = !brw_reg_type_is_floating_point(exec_type) &&
-         (inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MAD);
+   /* Even though the hardware spec claims that "integer DWord multiply"
+    * operations are restricted, empirical evidence and the behavior of the
+    * simulator suggest that only 32x32-bit integer multiplication is
+    * restricted.
+    */
+   const bool is_dword_multiply = !brw_reg_type_is_floating_point(exec_type) &&
+      ((inst->opcode == BRW_OPCODE_MUL &&
+        MIN2(type_sz(inst->src[0].type), type_sz(inst->src[1].type)) >= 4) ||
+       (inst->opcode == BRW_OPCODE_MAD &&
+        MIN2(type_sz(inst->src[1].type), type_sz(inst->src[2].type)) >= 4));
 
    if (type_sz(inst->dst.type) > 4 || type_sz(exec_type) > 4 ||
-       (type_sz(exec_type) == 4 && is_int_multiply))
+       (type_sz(exec_type) == 4 && is_dword_multiply))
       return devinfo->is_cherryview || gen_device_info_is_9lp(devinfo);
    else
       return false;