dcache: Move way selection and forwarding earlier
This moves the way multiplexer for the data from the BRAM, and the
multiplexers for forwarding data from earlier stores or refills,
before a clock edge rather than after, so that now the data output
from the dcache comes from a clean latch. To do this we remove the
extra latch on the output of the data BRAM (i.e. ADD_BUF=false) and
rearrange the logic. The choice whether to forward or not now depends
not on way comparisons but rather on a tag comparisons, for the sake
of timing.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>