radeonsi: set threadgroup size to 0 for threadgroups with only 1 wave