decode1: Use block RAMs in decode
This combines the various decode arrays in decode1 into two, one
indexed by the major opcode (bits 31--26 of the instruction) together
with bits 4--0 of the instruction, and the other indexed mostly by the
minor opcode (bits 10--1), with some swizzles to accommodate the
relevant parts of the minor opcode space for opcodes 19, 31, 59 and 63
within a 2k entry ROM (11 address bits). These are called the "major"
and the "row" decode ROMs respectively. (Bits 10--6 of the
instruction are called the "row index", and bits 5--1, or 5--0 for
some opcodes, are called the "column index", because of the way the
opcode maps in the ISA are laid out.)
Both ROMs are looked up each cycle and the result from one or other,
or from an override in ri.override_decode, are selected after a clock
edge.
This uses quite a lot of BRAM resources. In future a predecode step
will reduce the BRAM usage substantially.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>