execute1: Add a pipelined 33-bit signed multiplier
This adds a pipelined 33-bit by 33-bit signed multiplier with one
cycle latency to the execute pipeline, and uses it for the mullw,
mulhw and mulhwu instructions. Because it has one cycle of latency we
can assume that its result is available in the second execute stage
without needing to add busy logic to the second stage.
This adds both a generic version of the multiplier and a
Xilinx-specific version using four DSP slices of the Artix-7.
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>