dcache: Add support for unaligned loads and stores
For an unaligned load or store, we do the first doubleword (dword) of
the transfer as normal, but then go to a new NEXT_DWORD state of the
state machine to do the cache tag lookup for the second dword of the
transfer. From the NEXT_DWORD state we have much the same transitions
to other states as from the IDLE state (the transitions for OP_LOAD_HIT
are a bit different but almost identical for the other op values).
We now do the preparation of the data to be written in loadstore1,
that is, byte reversal if necessary and rotation by a number of
bytes based on the low 3 bits of the address. We do rotation not
shifting so we have the bytes that need to go into the second
doubleword in the right place in the low bytes of the data sent to
dcache. The rotation and byte reversal are done in a single step
with one multiplexer per byte by setting the select inputs for each
byte appropriately.
This also fixes writeback to not write the register value until it
has received both pieces of an unaligned load value.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>