FPU: Add stage-2 stall ability to FPU
This makes the FPU able to stall other units at execute stage 2 and be
stalled by other units (specifically the LSU).
This means that the completion and writeback for an instruction can
now end up being deferred until the second cycle of a following
instruction, i.e. the cycle when the state machine has gone through
IDLE state into one of the DO_* states, which means we need to latch
the destination FPR number, CR mask, etc. from the previous
instruction so that we present the correct information to writeback.
The advantage of this is that we can get rid of the in_progress signal
from the LSU.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>