`atomic_load_acquire` takes four instructions, including a branch and an unnecessarily strong memory fence:
```
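# address in r3
ld 3, 0(3)      # load the doubleword at the address in r3
cmpw 0, 3, 3    # compare the loaded value with itself (always equal)
bne- skip       # never-taken branch that depends on the loaded value
isync           # hold later instructions until the branch, and so the load, completes
skip:
# output in r3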
```
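For context, here is a minimal C11 sketch of the kind of source-level operation that compiles to a sequence like the one above. It assumes that `atomic_load_acquire` corresponds to C11's `atomic_load_explicit` with `memory_order_acquire`; the names `load_acquire`, `publish`, `try_consume`, `payload`, and `ready` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: a 64-bit load with acquire ordering. */
static inline uint64_t load_acquire(_Atomic uint64_t *addr)
{
    return atomic_load_explicit(addr, memory_order_acquire);
}

/* Hypothetical shared state: plain data guarded by an atomic flag. */
static uint64_t payload;
static _Atomic uint64_t ready;

/* Producer: write the data, then set the flag with release ordering. */
void publish(uint64_t value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: the acquire load keeps the read of payload from being
 * reordered before the read of the flag. Returns 1 if data was read. */
int try_consume(uint64_t *out)
{
    assert(out != NULL);
    if (load_acquire(&ready) == 0)
        return 0;
    *out = payload;
    return 1;
}
```

The acquire load is what lets the consumer read `payload` safely after it observes `ready` set; the branch and `isync` in the listing above are the price of that ordering.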
Single atomic operations are useful for implementations that want to send atomic operations to a shared cache, since executing them there can be more efficient than moving a whole cache block. Relying exclusively on