i965: Disable Z16 on contexts that don't require it.
author Eric Anholt <eric@anholt.net>
Thu, 25 Apr 2013 19:34:07 +0000 (12:34 -0700)
committer Eric Anholt <eric@anholt.net>
Mon, 29 Apr 2013 18:41:34 +0000 (11:41 -0700)
It appears that Z16 on Intel hardware is in fact slower than Z24, so
people are getting surprisingly hurt when trying to use Z16 as a
performance-versus-precision tradeoff, or when they're targeting GLES2 and
that's all you get.

Desktop GL 3.0+ has Z16 on the list of required exact format sizes, but
GLES doesn't, so choose the better-performing layout on GLES.  Improves
GLB 2.7 trex performance at 1920x1080 by 10.7% +/- 1.1% (n=3) on my IVB
system.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
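
The decision described in the message can be sketched as a small standalone
C model. This is an illustration only, not the driver code: the toy_* names,
the two enums, and the fallback from a 16-bit depth request to X8_Z24 are
assumptions made for the sketch. The actual patch only changes whether
TextureFormatSupported[MESA_FORMAT_Z16] is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Mesa's API and format enums. */
enum toy_api { TOY_API_DESKTOP_GL, TOY_API_GLES2 };
enum toy_depth_format { TOY_FMT_Z16, TOY_FMT_X8_Z24, TOY_FMT_COUNT };

struct toy_context {
   enum toy_api api;
   bool format_supported[TOY_FMT_COUNT];
};

/* Mirrors the shape of the patch: Z24 is always advertised, Z16 only
 * where the API requires an exact 16-bit depth format. */
static void
toy_init_depth_formats(struct toy_context *ctx)
{
   assert(ctx != NULL);
   ctx->format_supported[TOY_FMT_X8_Z24] = true;
   ctx->format_supported[TOY_FMT_Z16] = (ctx->api == TOY_API_DESKTOP_GL);
}

/* Assumed fallback: a 16-bit depth request gets Z16 when it is
 * advertised, and the wider X8_Z24 layout otherwise. */
static enum toy_depth_format
toy_choose_depth16(const struct toy_context *ctx)
{
   assert(ctx != NULL);
   return ctx->format_supported[TOY_FMT_Z16] ? TOY_FMT_Z16 : TOY_FMT_X8_Z24;
}
```

In this model a desktop GL context keeps the exact 16-bit format, while a
GLES2 context asking for 16-bit depth ends up on the 24-bit layout.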
src/mesa/drivers/dri/i965/brw_wm_surface_state.c

index a74b2c7cc1e9c22bb45bec2613d22c9e8e85f979..f1976391b1a388e0c868ae859c0b19bfa34f17b8 100644
@@ -566,7 +566,20 @@ brw_init_surface_formats(struct brw_context *brw)
    ctx->TextureFormatSupported[MESA_FORMAT_X8_Z24] = true;
    ctx->TextureFormatSupported[MESA_FORMAT_Z32_FLOAT] = true;
    ctx->TextureFormatSupported[MESA_FORMAT_Z32_FLOAT_X24S8] = true;
-   ctx->TextureFormatSupported[MESA_FORMAT_Z16] = true;
+
+   /* It appears that Z16 is slower than Z24 (on Intel Ivybridge and newer
+    * hardware at least), so there's no real reason to prefer it unless you're
+    * under memory (not memory bandwidth) pressure.  Our speculation is that
+    * this is due to either increased fragment shader execution from
+    * GL_LEQUAL/GL_EQUAL depth tests at the reduced precision, or due to
+    * increased depth stalls from a cacheline-based heuristic for detecting
+    * depth stalls.
+    *
+    * However, desktop GL 3.0+ require that you get exactly 16 bits when
+    * asking for DEPTH_COMPONENT16, so we have to respect that.
+    */
+   if (_mesa_is_desktop_gl(ctx))
+      ctx->TextureFormatSupported[MESA_FORMAT_Z16] = true;
 
    /* On hardware that lacks support for ETC1, we map ETC1 to RGBX
     * during glCompressedTexImage2D(). See intel_mipmap_tree::wraps_etc1.