From c71c1f44b055c680f073a2608a3bf560b55f8974 Mon Sep 17 00:00:00 2001 From: Jason Ekstrand Date: Wed, 1 Apr 2020 15:05:18 -0500 Subject: [PATCH] nir/from_ssa: Only chain movs when a src is also a dest The algorithm we use for resolving parallel copy instructions plays this little shell game with the values. The reason for this is that it lets us handle cases where, for instance we have a -> b and b -> a and we need to use a temporary to do a swap. One result of this algorithm is that it tends to emit a lot of mov chains which are typcially really bad for GPUs where a mov is far from free. For instance, it's likely to turn this: r16 = ssa_0; r17 = ssa_0; r18 = ssa_0; r15 = ssa_0 into this: r15 = mov ssa_0 r18 = mov r15 r17 = mov r18 r16 = mov r17 which, if it's the only thing in a block (this is common for phis) is impossible for a scheduler to fix because of the dependencies and you end up with significant stalling. If, on the other hand, we only do the chaining in the actual case where we need to free up a so that it can be used as a destination, we can emit this: r15 = mov ssa_0 r18 = mov ssa_0 r17 = mov ssa_0 r16 = mov ssa_0 which is far nicer to the scheduler. On Intel, our copy propagation pass will undo the chain for us so this has no shader-db impact. However, for less intelligent back-ends, it's probably a lot better. Reviewed-by: Rob Clark Reviewed-by: Connor Abbott Tested-by: Marge Bot Part-of: --- src/compiler/nir/nir_from_ssa.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/compiler/nir/nir_from_ssa.c b/src/compiler/nir/nir_from_ssa.c index acb758584a6..41ee4e79581 100644 --- a/src/compiler/nir/nir_from_ssa.c +++ b/src/compiler/nir/nir_from_ssa.c @@ -680,15 +680,17 @@ resolve_parallel_copy(nir_parallel_copy_instr *pcopy, int a = pred[b]; emit_copy(&state->builder, values[loc[a]], values[b]); - /* If any other copies want a they can find it at b */ - loc[a] = b; - /* b has been filled, mark it as not needing to be copied */ pred[b] = -1; - /* If a needs to be filled, it's ready for copying now */ - if (pred[a] != -1) + /* If a needs to be filled... */ + if (pred[a] != -1) { + /* If any other copies want a they can find it at b */ + loc[a] = b; + + /* It's ready for copying now */ ready[++ready_idx] = a; + } } int b = to_do[to_do_idx--]; if (pred[b] == -1) -- 2.30.2