i965: Set "Subslice Hashing Mode" to 16x16 on Apollolake.
author Kenneth Graunke <kenneth@whitecape.org>
Tue, 30 May 2017 21:29:08 +0000 (14:29 -0700)
committer Kenneth Graunke <kenneth@whitecape.org>
Wed, 2 Aug 2017 20:31:56 +0000 (13:31 -0700)
As of Linux 4.11, the kernel does not set the subslice hashing mode on
Apollolake, leaving it at the default of 8x8.  (The kernel initializes
it to 16x4 on most other platforms.)

Performance data for GPUTest Triangle on Apollolake at 1024x640:

   X-tiled RT:
   -----------
   8x8 -> 16x4:   2.4325%  +/- 0.383683% (n=107)
   8x8 -> 8x4:   -3.75105% +/- 0.592491% (n=40)
   8x8 -> 16x16:  6.17238% +/- 0.67157%  (n=30)

   Y-tiled RT:
   -----------
   8x8 -> 16x4:   1.30307%  +/- 0.297292% (n=205)
   8x8 -> 8x4:   -0.769282% +/- 0.729557% (n=35)
   8x8 -> 16x16:  3.00254%  +/- 0.715503% (n=40)

   8x MSAA RT (INTEL_FORCE_MSAA=8):
   --------------------------------
   8x8 -> 16x4:   1.38889% +/- 0.93729%  (n=7)
   8x8 -> 8x4:   -2.10643% +/- 1.15153%  (n=3)
   8x8 -> 16x16:  3.87183% +/- 1.08851%  (n=5)

Based on this, we choose 16x16 for Apollolake.
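
The selection above can be sketched in code. This is only an
illustration, not part of the patch: it treats the "+/-" figure as an
error margin (the message does not define it) and picks the mode whose
pessimistic gain over 8x8 is largest. The struct, array, and function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One table row: percentage change relative to the 8x8 baseline. */
struct hash_result {
   const char *mode;
   double change_pct;   /* measured change, in percent */
   double margin_pct;   /* the "+/-" figure, assumed to be an error margin */
};

/* Figures copied from the tables in the commit message. */
static const struct hash_result x_tiled[] = {
   { "16x4",   2.4325,   0.383683 },
   { "8x4",   -3.75105,  0.592491 },
   { "16x16",  6.17238,  0.67157  },
};

static const struct hash_result y_tiled[] = {
   { "16x4",   1.30307,  0.297292 },
   { "8x4",   -0.769282, 0.729557 },
   { "16x16",  3.00254,  0.715503 },
};

static const struct hash_result msaa_8x[] = {
   { "16x4",   1.38889,  0.93729  },
   { "8x4",   -2.10643,  1.15153  },
   { "16x16",  3.87183,  1.08851  },
};

/* Hypothetical helper: return the index of the row with the largest
 * lower bound (change - margin), or -1 if no row is a clear gain. */
static inline int
pick_hashing_mode(const struct hash_result *r, size_t n)
{
   int best = -1;
   double best_low = 0.0;

   assert(r != NULL || n == 0);

   for (size_t i = 0; i < n; i++) {
      double low = r[i].change_pct - r[i].margin_pct;
      if (low > best_low) {
         best = (int)i;
         best_low = low;
      }
   }
   return best;
}
```

Calling pick_hashing_mode() on each of the three arrays selects the
16x16 row every time, which matches the choice made here.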

Skylake GT2 with X-tiled buffers appears to be a toss-up between 16x4
and 16x16, and with Y-tiled buffers it doesn't seem to really matter.
So we'll leave Skylake alone for now.

The hashing mode doesn't seem to make a measurable impact on more
complex benchmarks.

Acked-by: Matt Turner <mattst88@gmail.com>
src/mesa/drivers/dri/i965/brw_defines.h
src/mesa/drivers/dri/i965/brw_state_upload.c

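diff --git a/src/mesa/drivers/dri/i965/brw_defines.h b/src/mesa/drivers/dri/i965/brw_defines.h
index 2a8dbf8cb9a3041f03e788c4f5a1d10dc2863021..4abb790612d9d1c3a1b739b051b11c7496effc13 100644
--- a/src/mesa/drivers/dri/i965/brw_defines.h
+++ b/src/mesa/drivers/dri/i965/brw_defines.h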
@@ -1617,6 +1617,13 @@ enum brw_pixel_shader_coverage_mask_mode {
 # define GEN8_HIZ_PMA_MASK_BITS \
    REG_MASK(GEN8_HIZ_NP_PMA_FIX_ENABLE | GEN8_HIZ_NP_EARLY_Z_FAILS_DISABLE)
 
+#define GEN7_GT_MODE                    0x7008
+# define GEN9_SUBSLICE_HASHING_8x8      (0 << 8)
+# define GEN9_SUBSLICE_HASHING_16x4     (1 << 8)
+# define GEN9_SUBSLICE_HASHING_8x4      (2 << 8)
+# define GEN9_SUBSLICE_HASHING_16x16    (3 << 8)
+# define GEN9_SUBSLICE_HASHING_MASK_BITS REG_MASK(3 << 8)
+
 /* Predicate registers */
 #define MI_PREDICATE_SRC0               0x2400
 #define MI_PREDICATE_SRC1               0x2408
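diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
index acaa97ee7d456169c2e6ffc18868badde1206075..f38c1946df62382c5ee5567bf99435c4f0f4434b 100644
--- a/src/mesa/drivers/dri/i965/brw_state_upload.c
+++ b/src/mesa/drivers/dri/i965/brw_state_upload.c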
@@ -72,6 +72,15 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
                 GEN9_FLOAT_BLEND_OPTIMIZATION_ENABLE |
                 GEN9_PARTIAL_RESOLVE_DISABLE_IN_VC);
       ADVANCE_BATCH();
+
+      if (brw->is_broxton) {
+         BEGIN_BATCH(3);
+         OUT_BATCH(MI_LOAD_REGISTER_IMM | (3 - 2));
+         OUT_BATCH(GEN7_GT_MODE);
+         OUT_BATCH(GEN9_SUBSLICE_HASHING_MASK_BITS |
+                   GEN9_SUBSLICE_HASHING_16x16);
+         ADVANCE_BATCH();
+      }
    }
 
    if (brw->gen >= 8) {
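
The value loaded into GEN7_GT_MODE above combines write-enable bits with
the new field value. The following is a minimal sketch of that
arithmetic, assuming REG_MASK() shifts its argument into the upper 16
bits (as the existing helper in brw_defines.h is understood to do) and
that GT_MODE is a masked register whose upper half selects which lower
bits a write may change. The helpers masked_field_dword() and
masked_write() are hypothetical models for illustration, not driver or
hardware code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the driver's REG_MASK(): write-enable bits live in
 * the upper 16 bits of the dword. */
#define REG_MASK(value) ((uint32_t)(value) << 16)

#define GEN9_SUBSLICE_HASHING_16x16      (3u << 8)
#define GEN9_SUBSLICE_HASHING_MASK_BITS  REG_MASK(3u << 8)

/* Hypothetical helper: build the dword for a masked write of one field. */
static inline uint32_t
masked_field_dword(uint32_t field_mask, uint32_t value)
{
   assert(field_mask <= 0xffff);          /* field must sit in the low half */
   assert((value & ~field_mask) == 0);    /* value must fit inside the field */
   return REG_MASK(field_mask) | value;
}

/* Hypothetical model of how a masked register applies a write: only the
 * low bits whose mask bit is set are replaced, the rest keep their old
 * contents. */
static inline uint32_t
masked_write(uint32_t old_reg, uint32_t dword)
{
   uint32_t mask = dword >> 16;
   uint32_t data = dword & 0xffff;
   return ((old_reg & ~mask) | (data & mask)) & 0xffff;
}
```

Under these assumptions, masked_field_dword(3u << 8,
GEN9_SUBSLICE_HASHING_16x16) yields 0x03000300, the same value as
GEN9_SUBSLICE_HASHING_MASK_BITS | GEN9_SUBSLICE_HASHING_16x16, and
applying it with masked_write() changes only bits 9:8 of the register,
so other GT_MODE settings are left untouched.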