There is also the issue that PowerISA's memory fences are unnecessarily strong, particularly `isync`, which is used in the code sequences for many `acquire` and stronger operations. `isync` forces the CPU to perform a full pipeline flush, which is unnecessary when all that is needed is a memory barrier.
`atomic_fetch_add_seq_cst` is 6 instructions, including a loop:
+
```
# address in r4, addend in r5
sync
```
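For reference, the C11 operation behind this listing can be written as below. This is a minimal sketch: the wrapper name `fetch_add_sc` is hypothetical, and it assumes a 64-bit object (a 32-bit object would use `lwarx`/`stwcx.` rather than `ldarx`/`stdcx.`). Compilers commonly lower it to a leading `sync`, a `ldarx`/`add`/`stdcx.`/`bne` retry loop, and a trailing fence, though the exact registers and fences vary by compiler.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical wrapper: atomically performs *p += v with seq_cst ordering
   and returns the value *p held immediately before the addition. */
int64_t fetch_add_sc(_Atomic int64_t *p, int64_t v)
{
    return atomic_fetch_add_explicit(p, v, memory_order_seq_cst);
}
```

For example, if `*p` holds 40, `fetch_add_sc(p, 2)` returns 40 and leaves 42 in `*p`.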
`atomic_load_seq_cst` is 5 instructions, including a branch and an unnecessarily strong memory fence:
+
```
# address in r3
sync
```
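In C11 terms, this operation and the `atomic_load_acquire` case below differ only in the memory-order argument. A minimal sketch, with hypothetical wrapper names `load_sc` and `load_acq`, follows. On PowerISA the seq_cst form typically prepends a `sync` to the acquire sequence, which accounts for the one-instruction difference between the two listings.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical wrapper: seq_cst load. Besides acquire semantics, it takes
   part in the single total order of all seq_cst operations. */
int64_t load_sc(_Atomic int64_t *p)
{
    return atomic_load_explicit(p, memory_order_seq_cst);
}

/* Hypothetical wrapper: acquire load. Later loads and stores in this thread
   cannot be reordered before it. */
int64_t load_acq(_Atomic int64_t *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}
```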
`atomic_compare_exchange_strong_seq_cst` is 7 instructions, including a loop with 2 branches, and an unnecessarily strong memory fence:
+
```
# address in r4, compared-to value in r5, replacement value in r6
sync
```
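The C11 operation behind this listing can be sketched as follows; the wrapper name `cas_strong_sc` is hypothetical. In the usual lowering, one branch exits the loop without storing when the loaded value differs from the expected one, and the other jumps back to retry when `stdcx.` loses its reservation. That retry branch is what separates the strong form from `atomic_compare_exchange_weak`, which is allowed to fail spuriously instead.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wrapper: if *p equals *expected, stores desired into *p and
   returns true; otherwise copies the current value of *p into *expected and
   returns false. Both the success and failure paths use seq_cst ordering. */
bool cas_strong_sc(_Atomic int64_t *p, int64_t *expected, int64_t desired)
{
    return atomic_compare_exchange_strong_explicit(
        p, expected, desired, memory_order_seq_cst, memory_order_seq_cst);
}
```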
`atomic_load_acquire` is 4 instructions, including a branch and an unnecessarily strong memory fence:
+
```
ld 3, 0(3)
cmpw 0, 3, 3