i965/vec4: add a SIMD lowering pass

Generally, instructions in Align16 mode only ever write to a single
register and don't need any form of SIMD splitting, which is why we
have never had a SIMD splitting pass in the vec4 backend. However,
double-precision instructions typically write 2 registers and in
some cases they run into certain hardware bugs and limitations
that we need to work around by splitting the instructions so we only
write to 1 register at a time. This patch implements a SIMD splitting
pass similar to the one in the scalar backend.

Because we only use double-precision instructions in Align16 mode
on gen7 (gen8+ is fully scalar and gens < 7 do not implement fp64),
the pass should be a no-op on any other generation.

For now, the pass only handles the gen7 restriction where any
instruction that writes 2 registers also needs to read 2 registers.
This affects double-precision instructions reading uniforms, for
example. Later patches will extend the lowering pass to cover a few
more cases.
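
The gen7 rule above can be illustrated with a minimal, self-contained
sketch. This is not the code added by the patch: sketch_inst,
sketch_lowered_width() and sketch_lower() are hypothetical names, and
register counts stand in for the real region and offset handling. It
assumes that halving the execution width is enough to make each piece
write a single register.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for an instruction. It is not the
// real vec4_instruction from the i965 backend.
struct sketch_inst {
   unsigned exec_size;              // channels executed, e.g. 4 or 8
   unsigned regs_written;           // registers spanned by the destination
   std::vector<unsigned> regs_read; // registers spanned by each source
};

// Width that each piece should have after lowering (assumed helper).
static unsigned
sketch_lowered_width(int gen, const sketch_inst &inst)
{
   unsigned width = inst.exec_size;

   // Only gen7 is assumed to run fp64 in Align16 mode, so every other
   // generation keeps the original width and the pass is a no-op.
   if (gen == 7 && inst.regs_written == 2) {
      for (unsigned n : inst.regs_read) {
         // A source spanning fewer than 2 registers (a uniform, for
         // example) breaks the "write 2, read 2" rule: halve the width.
         if (n < 2)
            width = std::min(width, inst.exec_size / 2);
      }
   }
   return width;
}

// Split an instruction into pieces of the lowered width.
static std::vector<sketch_inst>
sketch_lower(int gen, const sketch_inst &inst)
{
   const unsigned width = sketch_lowered_width(gen, inst);
   assert(width > 0 && inst.exec_size % width == 0);

   const unsigned n = inst.exec_size / width;
   std::vector<sketch_inst> pieces;
   for (unsigned i = 0; i < n; i++) {
      // Copy-construct so every field starts from the original value.
      sketch_inst piece = inst;
      piece.exec_size = width;
      // Compute the registers written rather than fixing them to 1.
      piece.regs_written = (inst.regs_written + n - 1) / n;
      for (unsigned &r : piece.regs_read)
         r = std::max(1u, r / n);
      pieces.push_back(piece);
   }
   return pieces;
}
```

In this sketch, an 8-wide instruction that writes 2 registers but reads
a 1-register source is split into two 4-wide pieces, while the same
instruction on any generation other than 7 is returned unchanged.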

v2:
  - Move the simd lowering pass after the main optimization loop and
    run copy-propagation and dce if it reports progress (Curro)
  - Compute number of registers written instead of fixing it to 1 (Iago)
  - Use group from backend_instruction (Iago)
  - Drop assertion that checked that we only split 8-wide instructions
    into 4-wide. (Curro)
  - Don't assume that instructions can only be 8-wide, we might want
    to use 16-wide instructions in the future too (Curro)
  - Wrap gen7 workarounds in a conditional to ease adding workarounds
    for other gens in the future (Curro)
  - Handle dst/src overlap hazard (Curro)
  - Use the horiz_offset() helper to simplify the implementation (Curro)
  - Drop the assertion that checks that each split instruction writes
    exactly one register (Curro)
  - Use the copy constructor to generate split instructions with all
    the relevant fields initialized to the values in the original
    instruction instead of copying only a handful of them manually (Curro)

v3 (Iago):
  - When copying to a temporary, allocate the number of registers
    required for the copy based on the size written by the lowered
    instruction instead of assuming that all lowered instructions
    produce single-register writes
  - Adapt to changes in offset()

Reviewed-by: Matt Turner <mattst88@gmail.com>