vk/allocator: Fix a data race in the state pool
The previous algorithm had a race because of the way we were using
__sync_fetch_and_add for everything. In particular, the concept of
"returning" over-allocated states in the "next > end" case was completely
bogus. If too many threads were hitting the state pool at the same time,
it was possible to have the following sequence:
A: Get an offset (next == end)
B: Get an offset (next > end)
A: Resize the pool (now next < end by a lot)
C: Get an offset (next < end)
B: Return the over-allocated offset
D: Get an offset
in which case D will get the same offset as C. The solution to this race
is to get rid of the concept of "returning" over-allocated states.
Instead, the thread that gets a new block simply sets the next and end
offsets directly and threads that over-allocate don't return anything and
just futex-wait. Since you can only ever hit the over-allocate case if
someone else hit the "next == end" case and hasn't resized yet, you're
guaranteed that the end value will get updated and the futex won't block
forever.