intel/fs/gen12: Introduce software scoreboard lowering pass.
Gen12+ hardware lacks the register scoreboard logic that used to
guarantee data coherency between register reads and writes in previous
generations. This lowering pass runs after register allocation in
order to make up for it.
It works by performing global dataflow analysis in order to determine
the set of potential dependencies of every instruction in the shader,
and then inserts any required SWSB annotations and additional SYNC
instructions in order to guarantee data coherency.
v2: Drop unnecessary _safe list iteration (Caio).
v3: Temporarily workaround potential WaR hazard between FPU
instruction and subsequent out-of-order write, pending
clarification from the hardware team. Drop redundant tracking of
implicit access of acc0-1, since the hardware guarantees coherency
of these (but not the other accumulators...).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>