aco: improve waitcnt insertion around loops