decode1: Split instruction decoding into two steps
This reduces the block RAM requirements for instruction decoding by
splitting it into two steps. The first, in a new pipeline stage
called decode0 (implemented by code in decode1.vhdl) maps the
instruction to a 9-bit instruction code using major and row decode
ROMs. The second maps the 9-bit code to the final decode_rom_t (about
44 bits wide). Branch prediction done in decode is now done in
decode0 rather than decode1.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>