i965: perf: minimize the chances to spread queries across batchbuffers
author Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Thu, 22 Jun 2017 01:15:50 +0000 (02:15 +0100)
committer Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tue, 27 Jun 2017 11:10:25 +0000 (14:10 +0300)
Counters related to timing are sensitive to any delay introduced
by the software. In particular, if the begin and end of a performance
query end up in different batches, time-related counters will
exhibit bigger values caused by the time it takes for the kernel
driver to load new requests into the hardware.
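
As a rough illustration of the effect described above, the sketch below
models the delta seen between a begin and an end report. It is a toy model
under stated assumptions, not i965 code: the struct, its fields, and both
helper functions are hypothetical names, and the latency is treated as a
fixed cost per batch boundary, which real hardware and schedulers will not
match exactly.

```c
#include <stdint.h>

/* Hypothetical model; none of these names exist in the i965 driver. */
struct query_model {
   uint64_t work_ns;           /* GPU time spent on the measured work */
   uint64_t submit_latency_ns; /* assumed time for the kernel to load the
                                * next request into the hardware */
};

/* Delta between begin and end reports when both land in the same batch:
 * only the measured work is counted.
 */
static uint64_t
delta_same_batch(const struct query_model *m)
{
   return m->work_ns;
}

/* Delta when `boundaries` batch boundaries fall between the begin and end
 * reports: each boundary adds one submit latency to the measurement.
 */
static uint64_t
delta_split(const struct query_model *m, unsigned boundaries)
{
   return m->work_ns + (uint64_t)boundaries * m->submit_latency_ns;
}
```

With 1000 ns of work and an assumed 250 ns submit latency, a query split
across one batch boundary would report 1250 ns in this model instead of
1000 ns, which is the kind of inflated value the flush is meant to make
less likely.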

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
src/mesa/drivers/dri/i965/brw_performance_query.c

index 06576a54d039978be46d72e002d02898dae2bd63..6b874d0bbeef3df7552d6ffb631c7e2c78f1e622 100644
@@ -1063,6 +1063,14 @@ brw_end_perf_query(struct gl_context *ctx,
                                              obj->oa.begin_report_id + 1);
       }
 
+      /* We flush the batchbuffer here to minimize the chances that MI_RPC
+       * delimiting commands end up in different batchbuffers. If that's the
+       * case, the measurement will include the time it takes for the kernel
+       * scheduler to load a new request into the hardware. This is manifested
+       * in tools like frameretrace by spikes in the "GPU Core Clocks"
+       * counter.
+       */
+      intel_batchbuffer_flush(brw);
       --brw->perfquery.n_active_oa_queries;
 
       /* NB: even though the query has now ended, it can't be accumulated