[nvptx] Add support for a per-worker broadcast buffer and barrier