i965/fs: Lower DDY instructions to SIMD8 during SIMD lowering time

author Francisco Jerez <currojerez@riseup.net>

Tue, 17 May 2016 23:27:09 +0000 (16:27 -0700)

committer Francisco Jerez <currojerez@riseup.net>

Sat, 28 May 2016 06:19:22 +0000 (23:19 -0700)
diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp

index 000658104afb56d7d1d38a25772d18d341919113..c2dd9da5a4923e497c50b091a67effe3a315447e 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -4825,6 +4825,35 @@ get_lowered_simd_width(const struct brw_device_info *devinfo,
         */
        return (devinfo->gen == 4 ? 16 : MIN2(16, inst->exec_size));
  
+   case FS_OPCODE_DDY_FINE:
+      /* The implementation of this virtual opcode may require emitting
+       * compressed Align16 instructions, which are severely limited on some
+       * generations.
+       *
+       * From the Ivy Bridge PRM, volume 4 part 3, section 3.3.9 (Register
+       * Region Restrictions):
+       *
+       *  "In Align16 access mode, SIMD16 is not allowed for DW operations
+       *   and SIMD8 is not allowed for DF operations."
+       *
+       * In this context, "DW operations" means "operations acting on 32-bit
+       * values", so it includes operations on floats.
+       *
+       * Gen4 has a similar restriction.  From the i965 PRM, section 11.5.3
+       * (Instruction Compression -> Rules and Restrictions):
+       *
+       *  "A compressed instruction must be in Align1 access mode. Align16
+       *   mode instructions cannot be compressed."
+       *
+       * Similar text exists in the g45 PRM.
+       *
+       * Empirically, compressed align16 instructions using odd register
+       * numbers don't appear to work on Sandybridge either.
+       */
+      return (devinfo->gen == 4 || devinfo->gen == 6 ||
+              (devinfo->gen == 7 && !devinfo->is_haswell) ?
+              MIN2(8, inst->exec_size) : MIN2(16, inst->exec_size));
+
     case SHADER_OPCODE_MULH:
        /* MULH is lowered to the MUL/MACH sequence using the accumulator, which
         * is 8-wide on Gen7+.
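
The width selection added for FS_OPCODE_DDY_FINE can be read in isolation as the sketch below. It is only an illustration, not the driver code: `DeviceInfo`, `lowered_ddy_fine_width` and `check_lowered_widths` are hypothetical names, and the struct stands in for the two `brw_device_info` fields the patch consults (`gen` and `is_haswell`).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the brw_device_info fields used by the patch.
struct DeviceInfo {
    int gen;
    bool is_haswell;
};

// Sketch of the FS_OPCODE_DDY_FINE case: on Gen4, Gen6 and non-Haswell Gen7
// the instruction is limited to SIMD8, because compressed Align16
// instructions are restricted (or empirically unreliable) there; all other
// generations may keep up to SIMD16.
unsigned lowered_ddy_fine_width(const DeviceInfo &devinfo, unsigned exec_size)
{
    const bool needs_simd8 = devinfo.gen == 4 || devinfo.gen == 6 ||
                             (devinfo.gen == 7 && !devinfo.is_haswell);
    return std::min(needs_simd8 ? 8u : 16u, exec_size);
}

// Usage example: a SIMD16 DDY_FINE is split on Ivy Bridge but not on Haswell.
void check_lowered_widths()
{
    assert(lowered_ddy_fine_width({7, false}, 16) == 8);
    assert(lowered_ddy_fine_width({7, true}, 16) == 16);
}
```

In both branches the result is clamped to the instruction's own execution size, so an instruction that is already SIMD8 is never widened.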