Raising TC_CALLS_PER_BATCH from 192 to 768 reduces the queuing and mutex
overhead per call.
radeonsi: +4.4% performance with piglit/drawoverhead, DrawElements, Ryzen 1700X
iris_dri.so: +14% with piglit/drawoverhead, DrawArrays, i7 7700HQ.
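
A rough sketch of why larger batches help, assuming each batch costs one
queue/mutex handoff regardless of how many calls it holds. batches_needed()
is a hypothetical helper for illustration only; it is not part of
u_threaded_context, and real batches may be flushed early for other reasons.

```c
/* Hypothetical helper (not in Mesa): number of batch handoffs needed to
 * submit num_calls calls when each batch holds at most calls_per_batch.
 * Assumes one queue/mutex round trip per batch. */
static unsigned
batches_needed(unsigned num_calls, unsigned calls_per_batch)
{
   /* Round up so a partially filled last batch is still counted. */
   return (num_calls + calls_per_batch - 1) / calls_per_batch;
}
```

Under that assumption, 768000 calls need 4000 handoffs at 192 calls per batch
and 1000 handoffs at 768 calls per batch, i.e. 4x fewer.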
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
* The idea is to have batches as small as possible but large enough so that
* the queuing and mutex overhead is negligible.
*/
-#define TC_CALLS_PER_BATCH 192
+#define TC_CALLS_PER_BATCH 768
/* Threshold for when to use the queue or sync. */
#define TC_MAX_STRING_MARKER_BYTES 512