loadstore1: Reduce busy cycles
This reduces the number of cycles where loadstore1 asserts its busy
output, leading to increased throughput of loads and stores. Loads
that hit in the cache can now be executed at the rate of one every two
cycles. Stores take 4 cycles assuming the wishbone slave responds
with an ack the cycle after we assert strobe.
To achieve this, the state machine code is split into two parts, one
for when we have an existing instruction in progress, and one for
starting a new instruction. We can now combinatorially clear busy and
start a new instruction in the same cycle that we get a done signal
from the dcache; in other words we are completing one instruction and
potentially writing back results in the same cycle that we start a new
instruction and send its address and data to the dcache.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>