intel/fs/gen6: Use SEL instead of bashing thread payload for unlit centroid workaround.
author: Francisco Jerez <currojerez@riseup.net>
Fri, 3 Jan 2020 23:58:05 +0000 (15:58 -0800)
committer: Francisco Jerez <currojerez@riseup.net>
Fri, 17 Jan 2020 21:22:39 +0000 (13:22 -0800)

This prevents regressions on SNB caused by redundant MOVs that are left
behind whenever fetch_payload_reg() returns a VGRF (currently only in
SIMD32, but soon in pretty much all cases).  The MOVs can't be
register-coalesced because their source is a FIXED_GRF, and they can't
be copy-propagated either because the unlit centroid workaround
overwrites their destination with partial (predicated) writes.  They
can be copy-propagated just fine into a SEL instruction though.
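
As a rough illustration of why the two forms are interchangeable in
terms of results, here is a minimal per-channel model in plain C++.  It
is only a sketch and not compiler code: Reg, Flag, predicated_mov_inv()
and sel() are hypothetical names invented for this example, and it
assumes the flag is set in channels that should keep the centroid value.
The inverted-predicate MOV updates only some channels of a destination
that already holds the centroid deltas (a partial write), while SEL
writes every channel from one of two sources (a full write).

```cpp
#include <array>
#include <cstddef>

// Hypothetical model of one 8-channel register and its flag bits.
constexpr std::size_t kChannels = 8;
using Reg = std::array<float, kChannels>;
using Flag = std::array<bool, kChannels>;

// Old form: MOV with an inverted predicate.  Only channels whose flag
// is clear are written, so dst must already hold the other value.
void predicated_mov_inv(Reg &dst, const Reg &src, const Flag &flag)
{
   for (std::size_t ch = 0; ch < kChannels; ch++) {
      if (!flag[ch])
         dst[ch] = src[ch];
   }
}

// New form: SEL with a normal predicate.  Every channel of the result
// is written: src0 where the flag is set, src1 where it is clear.
Reg sel(const Reg &src0, const Reg &src1, const Flag &flag)
{
   Reg dst{};
   for (std::size_t ch = 0; ch < kChannels; ch++)
      dst[ch] = flag[ch] ? src0[ch] : src1[ch];
   return dst;
}
```

In this model, copying the centroid values into a register and then
applying predicated_mov_inv() with the pixel values gives the same
contents as sel(centroid, pixel, flag), but only the SEL form has no
dependency on a previously written destination.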

On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:

   total instructions in shared programs: 13996898 -> 14001982 (0.04%)
   instructions in affected programs: 197461 -> 202545 (2.57%)
   helped: 0
   HURT: 1251

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>

src/intel/compiler/brw_fs_visitor.cpp
index ce51268ec8d7b6a758c61c000c7abfa60d68cc3c..951b9f49e48dbaefa96da4767acc66e4fc36c02a 100644
@@ -351,14 +351,17 @@ fs_visitor::emit_interpolation_setup_gen6()
          if (!(centroid_modes & (1 << i)))
             continue;
 
+         const fs_reg centroid_delta_xy = delta_xy[i];
          const fs_reg &pixel_delta_xy = delta_xy[i - 1];
 
-         for (unsigned q = 0; q < dispatch_width / 8; q++) {
-            for (unsigned c = 0; c < 2; c++) {
+         delta_xy[i] = bld.vgrf(BRW_REGISTER_TYPE_F, 2);
+
+         for (unsigned c = 0; c < 2; c++) {
+            for (unsigned q = 0; q < dispatch_width / 8; q++) {
                const unsigned idx = c + (q & 2) + (q & 1) * dispatch_width / 8;
-               set_predicate_inv(
-                  BRW_PREDICATE_NORMAL, true,
-                  bld.half(q).MOV(horiz_offset(delta_xy[i], idx * 8),
+               set_predicate(BRW_PREDICATE_NORMAL,
+                  bld.half(q).SEL(horiz_offset(delta_xy[i], idx * 8),
+                                  horiz_offset(centroid_delta_xy, idx * 8),
                                   horiz_offset(pixel_delta_xy, idx * 8)));
             }
          }
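
The idx expression in the hunk above is easy to misread because of
operator precedence, so here is a small standalone sketch that
evaluates it for given c, q and dispatch_width.  The function name
delta_xy_block_index() is hypothetical and exists only for this
illustration; the arithmetic is copied from the diff and no claim is
made here about what each resulting 8-channel block holds.

```cpp
// Evaluate the block index used in the loop above.  The result is in
// units of 8 channels, since the diff passes idx * 8 to horiz_offset().
// Note that (q & 1) * dispatch_width / 8 parses as
// ((q & 1) * dispatch_width) / 8.
constexpr unsigned delta_xy_block_index(unsigned c, unsigned q,
                                        unsigned dispatch_width)
{
   return c + (q & 2) + (q & 1) * dispatch_width / 8;
}

// Spot checks evaluated at compile time.
static_assert(delta_xy_block_index(0, 0, 8) == 0, "SIMD8, c=0, q=0");
static_assert(delta_xy_block_index(1, 0, 8) == 1, "SIMD8, c=1, q=0");
static_assert(delta_xy_block_index(0, 1, 16) == 2, "SIMD16, c=0, q=1");
static_assert(delta_xy_block_index(0, 1, 32) == 4, "SIMD32, c=0, q=1");
static_assert(delta_xy_block_index(1, 3, 32) == 7, "SIMD32, c=1, q=3");
```

For SIMD32 the indices for q = 0, 1, 2, 3 come out as c, c + 4, c + 2
and c + 6, so consecutive values of q do not map to consecutive blocks.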