Make ira_get_dup_out_num handle more cases
SVE has a prefix instruction (MOVPRFX) that acts as a move but is
designed to be easily fusible with the following instruction. The SVE
port therefore has lots of patterns with constraints of the form:
A: operand 0: =w,?w
...
operand n: 0, w
where the first alternative is a single instruction and the second
alternative uses MOVPRFX.
Ideally we want operand n to be allocated to the same register as
operand 0 in this case.
add_insn_allocno_copies is the main IRA routine that deals with tied
operands. It is (rightly) very conservative, and only handles cases in
which we're confident about saving a full move. So for a pattern like:
B: operand 0: =w,w
...
operand n: 0,w
we don't (and shouldn't) assume that tying operands 0 and n would
save the cost of a move.
But in A, the second alternative has a ? marker, which makes it more
expensive than the first alternative by a full reload. So I think for
copy elision we should ignore the untied operand n in the second
alternative of A.
One approach would be to add '*' markers to each pattern and make
ira_get_dup_out_num honour them. But I think the rule applies on
first principles, so marking with '*' shouldn't be necessary.
This patch instead makes ira_get_dup_out_num ignore expensive
alternatives if there are other alternatives that match exactly.
The cheapest way of doing that seemed to be to take expensive
alternatives out of consideration in ira_setup_alts, which provides
a bitmask of alternatives and has all the information available.
add_insn_allocno_copies is the only current user of ira_setup_alts,
so no other code should be affected.
If all available alternatives are disparaged or need a reload,
there's not much we can do to cut them down at this stage,
since it's hard to predict which operands will be reloaded and
which registers will need to be spilled.
An interesting case is patterns like this msp430 one:
;; Alternatives 2 and 3 are to handle cases generated by reload.
(define_insn "subqi3"
[(set (match_operand:QI 0 "nonimmediate_operand" "=rYs, rm, &?r, ?&r")
(minus:QI (match_operand:QI 1 "general_operand" "0, 0, !r, !i")
(match_operand:QI 2 "general_operand" " riYs, rmi, rmi, r")))]
""
"@
SUB.B\t%2, %0
SUB%X0.B\t%2, %0
MOV%X0.B\t%1, %0 { SUB%X0.B\t%2, %0
MOV%X0.B\t%1, %0 { SUB%X0.B\t%2, %0"
)
Here alternative 3 is significantly more expensive then alternative 0
(reject costs 0 and 606 respectively). But if operand 1 is an integer
constant, we'll still use alternative 3 if operand 2 is an allocated
register. On the other hand, if operand 1 is an integer constant but
operand 2 is spilled to memory, we'll move the constant into a register
and use the first alternative.
So in this case, if operand 1 is a register, we should consider
only the first two alternatives and thus try to tie operand 1
to operand 0 (which we didn't do previously). If operand 1 is a
constant integer, we should consider at least alternatives 0, 1 and 3.
We could exclude alternative 2, but I don't have any evidence that
that's useful.
2019-07-01 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* ira.c (ira_setup_alts): If any valid alternatives have zero cost,
exclude any others that are disparaged or that are bound to need
a reload or spill.
(ira_get_dup_out_num): Expand comment.
From-SVN: r272849