dcache: Split PLRU into storage and logic
Rather than having update and decode logic for each individual PLRU
as well as a register to store the current PLRU state, we now put the
PLRU state in a little RAM, which will typically use LUT RAM on FPGAs,
and have just a single copy of the logic to calculate the pseudo-LRU
way and to update the PLRU state.
The PLRU RAM that apples to the data storage (as opposed to the TLB)
is read asynchronously in the cycle after the cache tag matching is
done. At the end of that cycle the PLRU RAM entry is updated if the
access was a cache hit, or a victim way is calculated and stored if
the access was a cache miss. It is possible that a cache miss doesn't
start being handled until later, in which case the stored victim way
is used later when the miss gets handled.
Similarly for the TLB PLRU, the RAM is read asynchronously in the
cycle after a TLB lookup is done, and either updated at the end of
that cycle (for a hit), or a victim is chosen and stored for when the
TLB miss is satisfied.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>