X-Git-Url: https://git.libre-soc.org/?a=blobdiff_plain;f=simple_v_extension.mdwn;h=0642ce926d702469769bae2210773b1035398a95;hb=fa430b3a7c6a80d54aa5d040bd3dd4add9fcf5d0;hp=3a16606a1459e37309ea5efdb01ada7b831db0a9;hpb=999df9c317daa04d7af6e13abc40a62aa1fe2aff;p=libreriscv.git diff --git a/simple_v_extension.mdwn b/simple_v_extension.mdwn index 3a16606a1..0642ce926 100644 --- a/simple_v_extension.mdwn +++ b/simple_v_extension.mdwn @@ -1,5 +1,7 @@ # Variable-width Variable-packed SIMD / Simple-V / Parallelism Extension Proposal +**Note: this document is out of date and involved early ideas and discussions** + Key insight: Simple-V is intended as an abstraction layer to provide a consistent "API" to parallelisation of existing *and future* operations. *Actual* internal hardware-level parallelism is *not* required, such @@ -1657,6 +1659,61 @@ in advance to avoid. TBD: floating-point compare and other exception handling +------ + +Multi-LR/SC + +Please don't try to use the L1 itself. + +Use the Load and Store buffers which capture instruction state prior +to being accessed in the L1 (and prior to data arriving in the case of +Store buffer). + +Also, use the L1 Miss buffers as these already HAVE to be snooped by +coherence traffic. These are used to monitor that all participating +cache lines remain interference free, and amalgamate same into a CPU +signal accessible ia branch or predicate. + +The Load buffers manage inbound traffic +The Store buffers manage outbound traffic. + +Done properly, the participating cache lines can exceed the associativity +of the L1 cache without architectural harm (may incur additional latency). + + + +> > > so, let's say instead of another LR *cancelling* the load +> > > reservation, the SMP core / hardware thread *blocks* for +> > > up to 63 further instructions, waiting for the reservation +> > > to clear. +> > +> > Can you explain what you mean by this paragraph? +> +> best put in sequential events, probably. +> +> LR <-- 64-instruction countdown starts here +> ... 63 +> ... 62 +> LR same address <--- notes that core1 is on 61, +> so pauses for **UP TO** 61 cycles +> ... 32 +> SC <- core1 didn't reach zero, therefore valid, therefore +> core2 is now **UNBLOCKED**, is granted the +> load-reservation (and begins its **own** 64-cycle +> LR instruction countdown) +> ... 63 +> ... 62 +> ... +> ... +> SC <- also valid + +Looks to me that you could effect the same functionality by simply +holding onto the cache line in core 1 preventing core 2 from + getting past the LR. + +On the other hand, the freeze is similar to how the MP CRAYs did +ATOMIC stuff. + # References * SIMD considered harmful