On 64-bit platforms, some atomic operations like __sync_fetch_and_add()
run in constant time, but on 32-bit platforms the 64-bit variants are
implemented with a loop and might take much longer.
Additionally, it seems that if their operands are not aligned to 64
bits, they also require extra memory accesses. From the Intel 64 and
IA-32 Architectures Software Developer's Manual, Vol. 1, Section 4.1.1:
"A word or doubleword operand that crosses a 4-byte boundary or a
quadword operand that crosses an 8-byte boundary is considered
unaligned and requires two separate memory bus cycles for access."
Forcing the u64 field to be aligned to 64 bits seems to make the unit
tests that stress these atomic operations finish much faster.
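A minimal sketch of the effect, for illustration only. The names
example_state, example_holder and example_bump are hypothetical and are
not part of this patch. It assumes GCC or Clang in C11 mode. Without
the attribute, the 32-bit x86 System V ABI is generally understood to
give uint64_t members only 4-byte alignment inside aggregates, so the
union could land on a 4-byte boundary.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the unions changed in this patch. */
union example_state {
   struct {
      uint32_t next;
      uint32_t end;
   };
   /* The attribute raises the alignment of the whole union to 8 bytes,
    * regardless of the platform's default alignment for uint64_t.
    */
   uint64_t u64 __attribute__ ((aligned (8)));
};

/* Hypothetical container: the leading 32-bit field would let the union
 * start at offset 4 if the union only required 4-byte alignment.
 */
struct example_holder {
   uint32_t pad;
   union example_state state;
};

_Static_assert(alignof(union example_state) == 8,
               "union must be 8-byte aligned");
_Static_assert(offsetof(struct example_holder, state) % 8 == 0,
               "union must start on an 8-byte boundary inside the holder");

/* Atomically add 1 to the 64-bit view and return the previous value. */
static inline uint64_t
example_bump(union example_state *s)
{
   /* The operand should never straddle an 8-byte boundary. */
   assert(((uintptr_t)&s->u64 % 8) == 0);
   return __sync_fetch_and_add(&s->u64, 1);
}
```

The static assertions hold on both 32-bit and 64-bit builds, so they
only document the guarantee; any speedup itself would have to be
measured on a 32-bit target.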
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
/* A simple count that is incremented every time the head changes. */
uint32_t count;
};
- uint64_t u64;
+ /* Make sure it's aligned to 64 bits. This will make atomic operations
+ * faster on 32 bit platforms.
+ */
+ uint64_t u64 __attribute__ ((aligned (8)));
};
#define ANV_FREE_LIST_EMPTY ((union anv_free_list) { { UINT32_MAX, 0 } })
uint32_t next;
uint32_t end;
};
- uint64_t u64;
+ /* Make sure it's aligned to 64 bits. This will make atomic operations
+ * faster on 32 bit platforms.
+ */
+ uint64_t u64 __attribute__ ((aligned (8)));
};
};