dcache: Improve timing of valid/done outputs

This makes d_out.valid and m_out.done come directly from registers in
order to improve timing. The inputs to the registers are set by the
same conditions that cause r1.hit_load_valid, r1.slow_valid,
r1.error_done and r1.stcx_fail to be set.

Note that the STORE_WAIT_ACK state doesn't test r1.mmu_req but assumes
that the request came from loadstore1. This is because we normally
have r1.full = 0 in this state, which means that r1.mmu_req can
change at any time.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>