i965: perf: minimize the chances to spread queries across batchbuffers

author Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Thu, 22 Jun 2017 01:15:50 +0000 (02:15 +0100)
committer Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tue, 27 Jun 2017 11:10:25 +0000 (14:10 +0300)
diff --git a/src/mesa/drivers/dri/i965/brw_performance_query.c b/src/mesa/drivers/dri/i965/brw_performance_query.c
index 06576a54d039978be46d72e002d02898dae2bd63..6b874d0bbeef3df7552d6ffb631c7e2c78f1e622 100644
--- a/src/mesa/drivers/dri/i965/brw_performance_query.c
+++ b/src/mesa/drivers/dri/i965/brw_performance_query.c
@@ -1063,6 +1063,14 @@ brw_end_perf_query(struct gl_context *ctx,
                                               obj->oa.begin_report_id + 1);
        }
  
+      /* We flush the batchbuffer here to minimize the chances that MI_RPC
+       * delimiting commands end up in different batchbuffers. If that's the
+       * case, the measurement will include the time it takes for the kernel
+       * scheduler to load a new request into the hardware. This is manifested
+       * in tools like frameretrace by spikes in the "GPU Core Clocks"
+       * counter.
+       */
+      intel_batchbuffer_flush(brw);
        --brw->perfquery.n_active_oa_queries;
  
        /* NB: even though the query has now ended, it can't be accumulated