aco: use s_waitcnt_depctr to mitigate VMEMtoScalarWriteHazard