radeonsi: add threadgroups_per_cu param to si_get_compute_resource_limits