aarch64: Remove aarch64_sve_pred_dominates_p
authorRichard Sandiford <richard.sandiford@arm.com>
Fri, 2 Oct 2020 10:53:06 +0000 (11:53 +0100)
committerRichard Sandiford <richard.sandiford@arm.com>
Fri, 2 Oct 2020 10:53:06 +0000 (11:53 +0100)
commit0eb5e901f6e25a7b8a9790a7a8c209147fb649ec
tree91a66cd4e3e1e35222ab0ee100a5c124b5100a6f
parentbb78e5876aa6a6b4a8158cdc0f6c8511eb2be75f
aarch64: Remove aarch64_sve_pred_dominates_p

In r11-2922, Przemek fixed a post-RA instruction match failure
caused by the SVE FP subtraction patterns..  This patch applies
the same fix to the other patterns.

To recap, the issue is around the handling of predication.
We want to do two things:

- Optimise cases in which a predicate is known to be all-true.

- Differentiate cases in which the predicate on an _x ACLE function has
  to be kept as-is from cases in which we can make more lanes active.
  The former is true by default, the latter is true for certain
  combinations of flags in the -ffast-math group.

This is handled by a boolean flag in the unspecs to say whether the
predicate is “strict” or “relaxed”.  When combining multiple strict
operations, the predicates used in the operations generally need to
match.  When combining multiple relaxed operations, we can ignore the
predicates on nested operations and just use the predicate on the
“outermost” operation.

Originally I'd tried to reduce combinatorial explosion by using
aarch64_sve_pred_dominates_p.  This required matching predicates
for strict operations but allowed more combinations for relaxed
operations.

The problem (as I should have remembered) is that C conditions on
insn patterns can't reliably enforce matching operands.  If the
same register is used in two different input operands, the RA is
allowed to use different hard registers for those input operands
(and sometimes it has to).  So operands that match before RA
might not match afterwards.  The only sure way to force a match
is via match_dup.

This patch splits the cases into two.  I cry bitter tears at having
to do this, but I think it's the only backportable fix.  There might
be some way of using define_subst to generate the cond_* patterns from
the pred_* patterns, with some alternatives strategically disabled in
each case, but that's future work and might not be an improvement.

Since so many patterns now do this, I moved the comments from the
subtraction pattern to a new banner comment at the head of the file.

gcc/
* config/aarch64/aarch64-protos.h (aarch64_sve_pred_dominates_p):
Delete.
* config/aarch64/aarch64.c (aarch64_sve_pred_dominates_p): Likewise.
* config/aarch64/aarch64-sve.md: Add banner comment describing
how merging predicated FP operations are represented.
(*cond_<SVE_COND_FP_UNARY:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_UNARY:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FP_UNARY:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_any_relaxed): ...this
and...
(*cond_<SVE_COND_FP_BINARY_INT:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_BINARY:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_2_const): Split into...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_2_const_relaxed): ...this
and...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_2_const_strict): ...this.
(*cond_<SVE_COND_FP_BINARY:optab><mode>_3): Split into...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_3_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_3_strict): ...this.
(*cond_<SVE_COND_FP_BINARY:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FP_BINARY:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_any_const): Split into...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_any_const_relaxed): ...this
and...
(*cond_<SVE_COND_FP_BINARY_I1:optab><mode>_any_const_strict): ...this.
(*cond_add<mode>_2_const): Split into...
(*cond_add<mode>_2_const_relaxed): ...this and...
(*cond_add<mode>_2_const_strict): ...this.
(*cond_add<mode>_any_const): Split into...
(*cond_add<mode>_any_const_relaxed): ...this and...
(*cond_add<mode>_any_const_strict): ...this.
(*cond_<SVE_COND_FCADD:optab><mode>_2): Split into...
(*cond_<SVE_COND_FCADD:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FCADD:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FCADD:optab><mode>_any): Split into...
(*cond_<SVE_COND_FCADD:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FCADD:optab><mode>_any_strict): ...this.
(*cond_sub<mode>_3_const): Split into...
(*cond_sub<mode>_3_const_relaxed): ...this and...
(*cond_sub<mode>_3_const_strict): ...this.
(*aarch64_pred_abd<mode>): Split into...
(*aarch64_pred_abd<mode>_relaxed): ...this and...
(*aarch64_pred_abd<mode>_strict): ...this.
(*aarch64_cond_abd<mode>_2): Split into...
(*aarch64_cond_abd<mode>_2_relaxed): ...this and...
(*aarch64_cond_abd<mode>_2_strict): ...this.
(*aarch64_cond_abd<mode>_3): Split into...
(*aarch64_cond_abd<mode>_3_relaxed): ...this and...
(*aarch64_cond_abd<mode>_3_strict): ...this.
(*aarch64_cond_abd<mode>_any): Split into...
(*aarch64_cond_abd<mode>_any_relaxed): ...this and...
(*aarch64_cond_abd<mode>_any_strict): ...this.
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_2): Split into...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_2_relaxed): ...this and...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_2_strict): ...this.
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_4): Split into...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_4_relaxed): ...this and...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_4_strict): ...this.
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_any): Split into...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FP_TERNARY:optab><mode>_any_strict): ...this.
(*cond_<SVE_COND_FCMLA:optab><mode>_4): Split into...
(*cond_<SVE_COND_FCMLA:optab><mode>_4_relaxed): ...this and...
(*cond_<SVE_COND_FCMLA:optab><mode>_4_strict): ...this.
(*cond_<SVE_COND_FCMLA:optab><mode>_any): Split into...
(*cond_<SVE_COND_FCMLA:optab><mode>_any_relaxed): ...this and...
(*cond_<SVE_COND_FCMLA:optab><mode>_any_strict): ...this.
(*aarch64_pred_fac<cmp_op><mode>): Split into...
(*aarch64_pred_fac<cmp_op><mode>_relaxed): ...this and...
(*aarch64_pred_fac<cmp_op><mode>_strict): ...this.
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>): Split
into...
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_relaxed):
...this and...
(*cond_<optab>_nontrunc<SVE_FULL_F:mode><SVE_FULL_HSDI:mode>_strict):
...this.
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>): Split
into...
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_relaxed):
...this and...
(*cond_<optab>_nonextend<SVE_FULL_HSDI:mode><SVE_FULL_F:mode>_strict):
...this.
* config/aarch64/aarch64-sve2.md
(*cond_<SVE2_COND_FP_UNARY_LONG:optab><mode>): Split into...
(*cond_<SVE2_COND_FP_UNARY_LONG:optab><mode>_relaxed): ...this and...
(*cond_<SVE2_COND_FP_UNARY_LONG:optab><mode>_strict): ...this.
(*cond_<SVE2_COND_FP_UNARY_NARROWB:optab><mode>_any): Split into...
(*cond_<SVE2_COND_FP_UNARY_NARROWB:optab><mode>_any_relaxed): ...this
and...
(*cond_<SVE2_COND_FP_UNARY_NARROWB:optab><mode>_any_strict): ...this.
(*cond_<SVE2_COND_INT_UNARY_FP:optab><mode>): Split into...
(*cond_<SVE2_COND_INT_UNARY_FP:optab><mode>_relaxed): ...this and...
(*cond_<SVE2_COND_INT_UNARY_FP:optab><mode>_strict): ...this.
gcc/config/aarch64/aarch64-protos.h
gcc/config/aarch64/aarch64-sve.md
gcc/config/aarch64/aarch64-sve2.md
gcc/config/aarch64/aarch64.c