i965: Only emit 1 viewport when possible.
author Kenneth Graunke <kenneth@whitecape.org>
Mon, 26 Sep 2016 17:30:30 +0000 (10:30 -0700)
committer Kenneth Graunke <kenneth@whitecape.org>
Tue, 4 Oct 2016 01:41:10 +0000 (18:41 -0700)
commit 9d6ca7c3d091e1ab71ce2f75bf4f13dc8844d801
tree 1d23319263fc859fd759a2ccf9514b4f2bd634fe
parent 7eb7684818ead4ec7444ee309e22a9db731dd234

In core profile, we support up to 16 viewports.  However, in the
majority of cases, only 1 of them is actually used - we only need
the others if the last shader stage prior to the rasterizer writes
gl_ViewportIndex.

Processing all 16 viewports adds CPU overhead, which hurts
CPU-intensive workloads such as Glamor.  As a result, switching to
core profile actually penalized Glamor to an extent, which is
unfortunate.

This patch tracks the number of relevant viewports: 1 by default, or
ctx->Const.MaxViewports if gl_ViewportIndex is written.  A new
BRW_NEW_VIEWPORT_COUNT flag tracks this count.  This could mean re-emitting
viewport state when switching, but hopefully this is offset by doing
1/16th of the work in the common case.  The new flag is also lighter
weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.
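
The tracking described above can be sketched roughly as follows.  This is
a minimal illustration, not the code from this patch: the struct, the
SKETCH_NEW_VIEWPORT_COUNT flag and both function names are hypothetical
stand-ins, and the "emit" loop only counts iterations where the driver
would pack real viewport state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these names do not match the real i965 code. */
#define SKETCH_NEW_VIEWPORT_COUNT (1u << 0)

struct sketch_context {
   unsigned max_viewports;   /* plays the role of ctx->Const.MaxViewports */
   unsigned viewport_count;  /* viewports that state emission must process */
   uint32_t dirty;           /* pending state flags */
};

/* Recompute the relevant viewport count; raise the flag only on a change. */
static void
sketch_update_viewport_count(struct sketch_context *ctx,
                             bool last_stage_writes_viewport_index)
{
   assert(ctx->max_viewports >= 1);

   const unsigned count =
      last_stage_writes_viewport_index ? ctx->max_viewports : 1;

   if (ctx->viewport_count != count) {
      ctx->viewport_count = count;
      ctx->dirty |= SKETCH_NEW_VIEWPORT_COUNT;
   }
}

/* Emission loops over viewport_count rather than max_viewports.
 * Returns the number of viewports processed and clears the flag. */
static unsigned
sketch_emit_viewports(struct sketch_context *ctx)
{
   unsigned emitted = 0;

   for (unsigned i = 0; i < ctx->viewport_count; i++)
      emitted++;   /* real code would pack state for viewport i here */

   ctx->dirty &= ~SKETCH_NEW_VIEWPORT_COUNT;
   return emitted;
}
```

In this sketch, a context with max_viewports = 16 processes a single
viewport until the last pre-rasterizer stage writes gl_ViewportIndex, at
which point the flag is raised once and all 16 are processed.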

According to Eric Anholt, x11perf -copypixwin10 performance improves by
11.5094% +/- 3.10841% (n=10) on his Skylake.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
src/mesa/drivers/dri/i965/brw_cc.c
src/mesa/drivers/dri/i965/brw_context.c
src/mesa/drivers/dri/i965/brw_context.h
src/mesa/drivers/dri/i965/brw_gs_state.c
src/mesa/drivers/dri/i965/brw_state_upload.c
src/mesa/drivers/dri/i965/gen6_clip_state.c
src/mesa/drivers/dri/i965/gen6_scissor_state.c
src/mesa/drivers/dri/i965/gen6_viewport_state.c
src/mesa/drivers/dri/i965/gen7_viewport_state.c
src/mesa/drivers/dri/i965/gen8_viewport_state.c