freedreno/ir3: more clever legalize algorithm
Previously we didn't handle flow control in legalize, and instead just
set (ss)(sy) on the first instruction in every block. Which isn't very
clever.
Instead, consider output state of all predecessor blocks, so we only
set a sync bit if needed for any possible path leading into a block.
Because of loops, we can't require that all successor blocks are
legalized before a given block, so instead run in a loop until results
converge.
Signed-off-by: Rob Clark <robdclark@gmail.com>