nv50: add compute-related MP perf counters on G84+
These compute-related MP performance counters have been reverse
engineered using CUPTI which is part of NVIDIA CUDA.
As for nvc0, we use a compute kernel to read out those performance
counters, and the command stream to configure them. Note that Tesla
only exposes 4 MP performance counters, while Fermi has 8.
Only G84+ is supported because G80 is an old and weird card.
Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64,
xonotic-glx, heaven and valley.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Pierre Moreau <pierre.morrow@free.fr>
Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>