gpu-compute: Fixing HSA's barrier bit implementation
This changeset fixes several bugs in the HSA barrier bit implementation.
1. Forces AQL packet launch to wait for completion of all previous packets
2. Enforces barrier bit blocking only if there are packets pending completion
3. Barrier bit unblocking is correclty done by the last pending packet
4. Implementing barrier bit for all packets to conform to HSA spec
Change-Id: I62ce589dff57dcde4d64054a1b6ffd962acd5eb8
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/30354
Reviewed-by: Sooraj Puthoor <puthoorsooraj@gmail.com>
Maintainer: Anthony Gutierrez <anthony.gutierrez@amd.com>
Tested-by: kokoro <noreply+kokoro@google.com>