Windows opengl32.dll calls glFinish prior to every swapbuffers, which
makes it pretty hard to get decent performance...
Work around by mapping finish to flush on PIPE_OS_WINDOWS. This is
conformant, though it might confuse poorly-written benchmarks which
attempt to measure a single event rather than figuring out the rate of
continuous processing.
{
functions->Flush = st_glFlush;
functions->Finish = st_glFinish;
+
+ /* Windows opengl32.dll calls glFinish prior to every swapbuffers.
+ * This is unnecessary and degrades performance. Luckily we have some
+ * scope to work around this, as the externally-visible behaviour of
+ * Finish() is identical to Flush() in all cases - no differences in
+ * rendering or ReadPixels are visible if we opt not to wait here.
+ *
+ * Only set this up on windows to avoid suprise elsewhere.
+ */
+#ifdef PIPE_OS_WINDOWS
+ functions->Finish = st_glFlush;
+#endif
}