X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=openpower%2Fatomics.mdwn;h=bf1a055e0152d5e791cd94da9dc7a88950592036;hb=e9b9d33d6f8737f398868d2ed08bbfdce4fb6102;hp=ad304c5d7b0d6c9a85ccf9c3d4434d264a7bbca8;hpb=e3730e6c160f9123e44035de6accf01baf585b17;p=libreriscv.git

diff --git a/openpower/atomics.mdwn b/openpower/atomics.mdwn
index ad304c5d7..bf1a055e0 100644
--- a/openpower/atomics.mdwn
+++ b/openpower/atomics.mdwn
@@ -1,10 +1,23 @@
 # Draft proposal for improved atomic operations for the Power ISA
 
+**NOTE THIS PROPOSAL IS NOT BEING SUBMITTED DUE TO
+DISCOVERY DURING INVESTIGATION THAT ATOMICS ARE DESIGNED
+FOR MASSIVE DISTRIBUTED CLUSTERS. SIGNIFICANT ADDITIONAL
+RESEARCH IS REQUIRED SO THIS PROPOSAL IS PUT ON HOLD
+UNTIL BUDGET IS AVAILABLE**
+
 Links:
 
 * 
 * [OpenCAPI spec](http://opencapi.org/wp-content/uploads/2017/02/OpenCAPI-TL.WGsec_.V3p1.2017Jan27.pdf) p47-49 for AMO section
 * [RISC-V A](https://github.com/riscv/riscv-isa-manual/blob/master/src/a.tex)
+* [[atomics/discussion]]
+* 
+
+TODO:
+
+* investigate Power ISA 3.1 p1077 eh hint
+
 
 # Motivation
 
@@ -110,75 +123,76 @@ unnecessary restrictions: it has only 32-bit and 64-bit atomic operations.
-read operations v3.1 book II section 4.5.1 p1071
-
- | 00000 | RT, RT+1 | mem(EA,s) | Fetch and Add |
- | 00001 | RT, RT+1 | mem(EA,s) | Fetch and XOR |
- | 00010 | RT, RT+1 | mem(EA,s) | Fetch and OR |
- | 00011 | RT, RT+1 | mem(EA,s) | Fetch and AND |
- | 00100 | RT, RT+1 | mem(EA,s) | Fetch and Maximum Unsigned |
- | 00101 | RT, RT+1 | mem(EA,s) | Fetch and Maximum Signed |
- | 00110 | RT, RT+1 | mem(EA,s) | Fetch and Minimum Unsigned |
- | 00111 | RT, RT+1 | mem(EA,s) | Fetch and Minimum Signed |
- | 01000 | RT, RT+1 | mem(EA,s) | Swap |
- | 10000 | RT, RT+1, RT+2 | mem(EA,s) | Compare and Swap Not Equal |
- | 11000 | RT | mem(EA,s) mem(EA+s, s) | Fetch and Increment Bounded |
- | 11001 | RT | mem(EA,s) mem(EA+s, s) | Fetch and Increment Equal |
- | 11100 | RT | mem(EA-s,s) mem(EA, s) | Fetch and Decrement Bounded |
-
-store operations
-
- | 00000 RS mem(EA,s) Store Add
- | 00001 RS mem(EA,s) Store XOR
- | 00010 RS mem(EA,s) Store OR
- | 00011 RS mem(EA,s) Store AND t
- | 00100 RS mem(EA,s) Store Maximum Unsigned
- | 00101 RS mem(EA,s) Store Maximum Signed t
- | 00110 RS mem(EA,s) Store Minimum Unsigned
- | 00111 RS mem(EA,s) Store Minimum Signed
- | 11000 RS mem(EA,s) Store Twin
-
-These operations are recognised as being part of the
+see [[discussion]] for proposed operations and thoughts TODO
+remove this sentence
+
+
+# DRAFT atomic instructions
+
+These two instructions, `lat` and `stat`, are identical
+to `lwat/ldat` and `stwat/stdat` except add acquire and
+release guaranteed ordering semantics as well as 8 and
+16 bit memory widths.
+
+AT-Form (TODO)
+
+* lat. RT,RA,FC,aq,rl,ew
+* stat. RS,RA,FC,aq,rl,ew
+
+**DRAFT** EXT031 and XO, these are near to the existing
+atomic memory operations
+
+|0.5|6.10|11.15|16.20|21|22|23.24|25.30 |31|name| Form |
+|-- | -- | --- | --- |--|--|---- |------|--|----|------------|
+|31 | RT | RA | FC |lr|sc|ew |000101|Rc|lat | TODO-Form |
+|31 | RS | RA | FC |lr|sc|ew |100101|/ |stat| TODO-Form |
+
+* `ew` specifies the memory operation width: 0/1/2/3 8/16/32/64
+* If the `aq` bit is set,
+  then no later atomic memory operations can be observed
+  to take place before the AMO in this or other cores.
+  (A global Write-after-Read Memory Hazard is created)
+* If the `rl` bit is set, then other cores will not observe the AMO before
+  memory accesses preceding the AMO.
+  (A global Read-after-Write Memory Hazard is created)
+* Setting both the `aq` and the `rl` bit makes the sequence
+  sequentially consistent, meaning that
+  it cannot be reordered with respect to earlier or later atomic
+  memory operations. (Both a RaW and WaR are simultaneously created)
+* `FC` is identical to the Function tables used in Power ISA v3 for `lwat`
+  and `stwat`
+
+read functions v3.1 book II section 4.5.1 p1071
+
+|opcode| regs | memory | description |
+|------|----------------|------------------------|-----------------------------|
+|00000 | RT, RT+1 | mem(EA,s) | Fetch and Add |
+|00001 | RT, RT+1 | mem(EA,s) | Fetch and XOR |
+|00010 | RT, RT+1 | mem(EA,s) | Fetch and OR |
+|00011 | RT, RT+1 | mem(EA,s) | Fetch and AND |
+|00100 | RT, RT+1 | mem(EA,s) | Fetch and Maximum Unsigned |
+|00101 | RT, RT+1 | mem(EA,s) | Fetch and Maximum Signed |
+|00110 | RT, RT+1 | mem(EA,s) | Fetch and Minimum Unsigned |
+|00111 | RT, RT+1 | mem(EA,s) | Fetch and Minimum Signed |
+|01000 | RT, RT+1 | mem(EA,s) | Swap |
+|10000 | RT, RT+1, RT+2 | mem(EA,s) | Compare and Swap Not Equal |
+|11000 | RT | mem(EA,s) mem(EA+s, s) | Fetch and Increment Bounded |
+|11001 | RT | mem(EA,s) mem(EA+s, s) | Fetch and Increment Equal |
+|11100 | RT | mem(EA-s,s) mem(EA, s) | Fetch and Decrement Bounded |
+
+store functions
+
+|opcode| regs | memory | description |
+|------|------|-----------|-----------------------------|
+|00000 | RS | mem(EA,s) | Store Add |
+|00001 | RS | mem(EA,s) | Store XOR |
+|00010 | RS | mem(EA,s) | Store OR |
+|00011 | RS | mem(EA,s) | Store AND |
+|00100 | RS | mem(EA,s) | Store Maximum Unsigned |
+|00101 | RS | mem(EA,s) | Store Maximum Signed |
+|00110 | RS | mem(EA,s) | Store Minimum Unsigned |
+|00111 | RS | mem(EA,s) | Store Minimum Signed |
+|11000 | RS | mem(EA,s) | Store Twin |
+
+These functions are also recognised as being part of the
 OpenCAPI Specification.
 
-the operations it has that I was going to propose:
-
-* fetch_add
-* fetch_xor
-* fetch_or
-* fetch_and
-* fetch_umax
-* fetch_smax
-* fetch_umin
-* fetch_smin
-* exchange
-
-as well as a few I wasn't going to propose (they seem less useful to me):
-
-* compare-and-swap-not-equal
-* fetch-and-increment-bounded
-* fetch-and-increment-equal
-* fetch-and-decrement-bounded
-* store-twin
-
-The spec also basically says that the atomic memory operations are only
-intended for when you want to do atomic operations on memory, but don't
-want that memory to be loaded into your L1 cache.
-
-imho that restriction is specifically *not* wanted, because there are
-plenty of cases where atomic operations should happen in your L1 cache.
-
-I'd guess that part of why those atomic operations weren't included in
-gcc or clang as the default implementation of atomic operations (when
-the appropriate ISA feature is enabled) is because of that restriction.
-
-imho the cpu should be able to (but not required to) predict whether to
-send an atomic operation to L2-cache/L3-cache/etc./memory or to execute
-it directly in the L1 cache. The prediction could be based on how often
-that cache block was accessed from different cpus, e.g. by having a
-small saturating counter and a last-accessing-cpu field, where it would
-count how many times the same cpu accessed it in a row, sending it to the
-L1 cache if that's more than some limit, otherwise doing the operation
-in the L2/L3/etc.-cache if the limit wasn't reached or a different cpu
-tried to access it.
-
-# TODO: add list of proposed instructions
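
Reviewer sketch (not part of the patch): the DRAFT field layout for `lat` and `stat` can be illustrated with a small encoder. This is a minimal sketch under stated assumptions, not a definitive implementation. The helper names `_field` and `encode_draft_amo` are hypothetical, the opcode values are the DRAFT ones from the table and may change, and it assumes that the bits the table labels `lr` and `sc` (21 and 22) are the `aq` and `rl` bits described in the bullet list. Bit numbering follows the Power ISA convention in which bit 0 is the most significant bit of the 32-bit instruction word.

```python
# Hypothetical illustration of the DRAFT lat/stat encoding; every value
# below is taken from the draft table and is subject to change.
PO_EXT031 = 31          # primary opcode, bits 0..5
XO_LAT = 0b000101       # DRAFT extended opcode for lat, bits 25..30
XO_STAT = 0b100101      # DRAFT extended opcode for stat, bits 25..30

# `ew` selects the memory operation width: 0/1/2/3 -> 8/16/32/64 bits.
EW_FOR_WIDTH = {8: 0, 16: 1, 32: 2, 64: 3}


def _field(value, first_bit, last_bit):
    """Place `value` in bits first_bit..last_bit (bit 0 is the MSB)."""
    width = last_bit - first_bit + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return value << (31 - last_bit)


def encode_draft_amo(xo, reg, ra, fc, aq, rl, width_bits, rc=0):
    """Build a 32-bit word for the draft lat (reg=RT) or stat (reg=RS).

    Assumption: bit 21 carries `aq` and bit 22 carries `rl`, although
    the draft table labels those columns `lr` and `sc`.
    """
    ew = EW_FOR_WIDTH[width_bits]
    return (
        _field(PO_EXT031, 0, 5)
        | _field(reg, 6, 10)
        | _field(ra, 11, 15)
        | _field(fc, 16, 20)
        | _field(aq, 21, 21)
        | _field(rl, 22, 22)
        | _field(ew, 23, 24)
        | _field(xo, 25, 30)
        | _field(rc, 31, 31)
    )


if __name__ == "__main__":
    # 32-bit Fetch and Add (FC=00000) with acquire ordering, RT=3, RA=4.
    word = encode_draft_amo(XO_LAT, reg=3, ra=4, fc=0b00000,
                            aq=1, rl=0, width_bits=32)
    print(f"{word:#010x}")
```

Keeping each field's bit range in one place makes it easy to update the sketch if the draft encoding moves, for example once the `lr`/`sc` versus `aq`/`rl` naming is settled.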