nvc0: add compute invocation counter
The strategy is to keep a CPU-side counter of the direct invocations,
and a GPU-side counter of the indirect invocations, and then add them
together for queries.
The specific technique is a macro which multiplies a list of integers
together and accumulates the product into SCRATCH registers held inside
of the context. Another macro will read those values out and add them to
the passed-in cpu-side counter to be stored in a query buffer the same
way that all the other statistics are stored.
Original implementation by Rhys Perry, redone by Ilia Mirkin to use the
SCRATCH temporaries.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>