-the operations it has that I was going to propose:
-fetch_add
-fetch_xor
-fetch_or
-fetch_and
-fetch_umax
-fetch_smax
-fetch_umin
-fetch_smin
-exchange
-
-as well as a few I wasn't going to propose (they seem less useful to me):
-compare-and-swap-not-equal
-fetch-and-increment-bounded
-fetch-and-increment-equal
-fetch-and-decrement-bounded
-store-twin
-
-The spec also basically says that the atomic memory operations are only intended for when you want to do atomic operations on memory, but don't want that memory to be loaded into your L1 cache.
-
-imho that restriction is specifically *not* wanted, because there are plenty of cases where atomic operations should happen in your L1 cache.
-
-I'd guess that part of why those atomic operations weren't included in gcc or clang as the default implementation of atomic operations (when the appropriate ISA feature is enabled) is because of that restriction.
-
-imho the cpu should be able to (but not required to) predict whether to send an atomic operation to L2-cache/L3-cache/etc./memory or to execute it directly in the L1 cache. The prediction could be based on how often that cache block was accessed from different cpus, e.g. by having a small saturating counter and a last-accessing-cpu field, where it would count how many times the same cpu accessed it in a row, sending it to the L1 cache if that's more than some limit, otherwise doing the operation in the L2/L3/etc.-cache if the limit wasn't reached or a different cpu tried to access it.
-
-# TODO: add list of proposed instructions
\ No newline at end of file
+See [[discussion]] for proposed operations and thoughts.
+TODO: remove this sentence.
+
+
+# DRAFT atomic instructions
+
+These two instructions, `lat` and `stat`, are identical
+to `lwat/ldat` and `stwat/stdat` except that they add guaranteed
+acquire and release ordering semantics, as well as 8-bit and
+16-bit memory widths.
+
+AT-Form (TODO)
+
+* lat. RT,RA,FC,aq,rl,ew
+* stat. RS,RA,FC,aq,rl,ew
+
+**DRAFT** EXT031 and XO: these encodings are close to those of the
+existing atomic memory operations
+
+|0.5|6.10|11.15|16.20|21|22|23.24|25.30 |31|name| Form |
+|-- | -- | --- | --- |--|--|---- |------|--|----|------------|
+|31 | RT | RA | FC |aq|rl|ew |000101|Rc|lat | TODO-Form |
+|31 | RS | RA | FC |aq|rl|ew |100101|/ |stat| TODO-Form |
+
+* `ew` specifies the memory operation width: 0/1/2/3 selects 8/16/32/64 bits
+* If the `aq` bit is set,
+ then no later atomic memory operations can be observed
+ to take place before this atomic memory operation (AMO) in this or other cores.
+ (A global Write-after-Read Memory Hazard is created)
+* If the `rl` bit is set, then other cores will not observe the AMO before
+ memory accesses preceding the AMO.
+ (A global Read-after-Write Memory Hazard is created)
+* Setting both the `aq` and the `rl` bits makes the AMO
+ sequentially consistent, meaning that
+ it cannot be reordered with respect to earlier or later atomic
+ memory operations. (Both a RaW and a WaR hazard are created simultaneously)
+* `FC` uses the same Function Code tables as Power ISA v3 `lwat`
+ and `stwat`
+
+Read functions (Power ISA v3.1, Book II, section 4.5.1, p1071):
+
+|FC    | regs | memory | description |
+|------|----------------|------------------------|-----------------------------|
+|00000 | RT, RT+1 | mem(EA,s) | Fetch and Add |
+|00001 | RT, RT+1 | mem(EA,s) | Fetch and XOR |
+|00010 | RT, RT+1 | mem(EA,s) | Fetch and OR |
+|00011 | RT, RT+1 | mem(EA,s) | Fetch and AND |
+|00100 | RT, RT+1 | mem(EA,s) | Fetch and Maximum Unsigned |
+|00101 | RT, RT+1 | mem(EA,s) | Fetch and Maximum Signed |
+|00110 | RT, RT+1 | mem(EA,s) | Fetch and Minimum Unsigned |
+|00111 | RT, RT+1 | mem(EA,s) | Fetch and Minimum Signed |
+|01000 | RT, RT+1 | mem(EA,s) | Swap |
+|10000 | RT, RT+1, RT+2 | mem(EA,s) | Compare and Swap Not Equal |
+|11000 | RT | mem(EA,s) mem(EA+s, s) | Fetch and Increment Bounded |
+|11001 | RT | mem(EA,s) mem(EA+s, s) | Fetch and Increment Equal |
+|11100 | RT | mem(EA-s,s) mem(EA, s) | Fetch and Decrement Bounded |
+
+Store functions:
+
+|FC    | regs | memory | description |
+|------|------|-----------|-----------------------------|
+|00000 | RS | mem(EA,s) | Store Add |
+|00001 | RS | mem(EA,s) | Store XOR |
+|00010 | RS | mem(EA,s) | Store OR |
+|00011 | RS | mem(EA,s) | Store AND |
+|00100 | RS | mem(EA,s) | Store Maximum Unsigned |
+|00101 | RS | mem(EA,s) | Store Maximum Signed |
+|00110 | RS | mem(EA,s) | Store Minimum Unsigned |
+|00111 | RS | mem(EA,s) | Store Minimum Signed |
+|11000 | RS | mem(EA,s) | Store Twin |
+
+These functions are also recognised as being part of the
+OpenCAPI Specification.