+ -- with no stalls. Stores also complete in 2 cycles in most
+ -- circumstances.
+ --
+ -- A request proceeds through the pipeline as follows.
+ --
+ -- Cycle 0: Request is received from loadstore or mmu if exactly
+ -- one of d_in.valid and m_in.valid is 1. In this cycle, portions
+ -- of the address are presented to the TLB tag RAM and data RAM
+ -- and the cache tag RAM and data RAM.
+ --
+ -- Clock edge between cycle 0 and cycle 1:
+ -- Request is stored in r0 (assuming r0_full was 0). TLB tag and
+ -- data RAMs are read, and the cache tag RAM is read. (Cache data
+ -- comes out a cycle later due to its output register, giving the
+ -- whole of cycle 1 to read the cache data RAM.)
+ --
+ -- Cycle 1: TLB and cache tag matching is done, the real address
+ -- (RA) for the access is calculated, and the type of operation is
+ -- determined (the OP_* values above). This gives the TLB way for
+ -- a TLB hit, and the cache way for a hit or the way to replace
+ -- for a load miss.
+ --
+ -- Clock edge between cycle 1 and cycle 2:
+ -- Request is stored in r1 (assuming r1.full was 0).
+ -- The state machine transitions out of IDLE state for a load miss,
+ -- a store, a dcbz, or a non-cacheable load. r1.full is set to 1
+ -- for a load miss, dcbz or non-cacheable load but not a store.
+ --
+ -- Cycle 2: Completion signals are asserted for a load hit,
+ -- a store (excluding dcbz), a TLB operation, a conditional
+ -- store which failed due to no matching reservation, or an error
+ -- (cache hit on non-cacheable operation, TLB miss, or protection
+ -- fault).