From: Richard Sandiford
Date: Fri, 25 May 2018 08:18:42 +0000 (+0000)
Subject: Prefer open-coding vector integer division
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=8f76f377861b4195487416806c4a0eacabc433c9;p=gcc.git

Prefer open-coding vector integer division

vect_recog_divmod_pattern currently bails out if the target has
native support for integer division, but I think in practice
it's always going to be better to open-code it anyway, just as
we usually open-code scalar divisions by constants.

I think the only currently affected targets are MIPS MSA and
powerpcspe (which is currently marked obsolete).  For:

  void
  foo (int *x)
  {
    for (int i = 0; i < 100; ++i)
      x[i] /= 2;
  }

the MSA port previously preferred to use division for powers of 2:

        .set    noreorder
        bnz.w   $w1,1f
        div_s.w $w0,$w0,$w1
        break   7
        .set    reorder
1:

(or just the div_s.w for -mno-check-zero-division), but after the patch
it open-codes them using shifts:

        clt_s.w $w1,$w0,$w2
        subv.w  $w0,$w0,$w1
        srai.w  $w0,$w0,1

MSA doesn't define a high-part pattern, so it still uses a division
instruction for the non-power-of-2 case.

Richard B pointed out that this would disable SLP of division by
different amounts, but I think in practice that's a price worth paying,
since the current cost model can't really tell whether using a general
vector division is better than using open-coded scalar divisions.
The fix would be either to support SLP of mixed open-coded divisions
or to improve the cost model and try SLP again without the patterns.
The patch adds an XFAILed test for this.

2018-05-23  Richard Sandiford

gcc/
	* tree-vect-patterns.c: Include predict.h.
	(vect_recog_divmod_pattern): Restrict check for division support
	to when optimizing for size.

gcc/testsuite/
	* gcc.dg/vect/bb-slp-div-1.c: New XFAILed test.
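
For reference, the three-instruction shift sequence quoted above can be read as the scalar computation below. This is only a sketch of one lane, assuming that $w2 holds zero (the quoted assembly does not show how $w2 is set up) and that >> on a negative int is an arithmetic shift, as GCC documents. The names div2_open_coded and check_div2_open_coded are made up for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical scalar model of one vector lane of the open-coded x / 2.  */
static int
div2_open_coded (int x)
{
  /* clt_s.w: all-ones (-1) if x < 0, else 0.  Assumes the comparison
     operand ($w2 in the quoted assembly) is zero.  */
  int mask = -(x < 0);

  /* subv.w: x - mask adds 1 to negative inputs so that the shift
     rounds towards zero instead of towards minus infinity.  */
  int biased = x - mask;

  /* srai.w: arithmetic shift right by 1.  Right-shifting a negative int
     is implementation-defined in ISO C; GCC sign-extends.  */
  return biased >> 1;
}

/* Compare the model against C's truncating division.  */
static void
check_div2_open_coded (void)
{
  for (int x = -1000; x <= 1000; ++x)
    assert (div2_open_coded (x) == x / 2);
  assert (div2_open_coded (INT_MIN) == INT_MIN / 2);
  assert (div2_open_coded (INT_MAX) == INT_MAX / 2);
}
```

The bias step is what distinguishes signed division from a plain shift: without it, -7 >> 1 would give -4 rather than the -3 that C's truncating division requires.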
From-SVN: r260711
---

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index fd187b92d39..67938386d67 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2018-05-25  Richard Sandiford
+
+	* tree-vect-patterns.c: Include predict.h.
+	(vect_recog_divmod_pattern): Restrict check for division support
+	to when optimizing for size.
+
 2018-05-25  Richard Sandiford
 
 	* doc/sourcebuild.texi (vect_double_cond_arith: Document.
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index c5b2c631b5d..66296db447e 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,7 @@
+2018-05-25  Richard Sandiford
+
+	* gcc.dg/vect/bb-slp-div-1.c: New XFAILed test.
+
 2018-05-25  Richard Sandiford
 
 	* lib/target-supports.exp
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c b/gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c
new file mode 100644
index 00000000000..65d83a437b6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-div-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-msve-vector-bits=256" { target aarch64_sve } } */
+
+int x[8];
+
+void
+f (void)
+{
+  x[0] /= 2;
+  x[1] /= 3;
+  x[2] /= 4;
+  x[3] /= 5;
+  x[4] /= 6;
+  x[5] /= 7;
+  x[6] /= 8;
+  x[7] /= 9;
+}
+
+/* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { xfail *-*-* } } } */
diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index 75bf84b7645..6da784cdc3a 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "attribs.h"
 #include "cgraph.h"
 #include "omp-simd-clone.h"
+#include "predict.h"
 
 /* Pattern recognition functions  */
 static gimple *vect_recog_widen_sum_pattern (vec *, tree *,
@@ -2674,15 +2675,19 @@ vect_recog_divmod_pattern (vec *stmts,
   if (vectype == NULL_TREE)
     return NULL;
 
-  /* If the target can handle vectorized division or modulo natively,
-     don't attempt to optimize this.  */
-  optab = optab_for_tree_code (rhs_code, vectype, optab_default);
-  if (optab != unknown_optab)
+  if (optimize_bb_for_size_p (gimple_bb (last_stmt)))
     {
-      machine_mode vec_mode = TYPE_MODE (vectype);
-      int icode = (int) optab_handler (optab, vec_mode);
-      if (icode != CODE_FOR_nothing)
-        return NULL;
+      /* If the target can handle vectorized division or modulo natively,
+         don't attempt to optimize this, since native division is likely
+         to give smaller code.  */
+      optab = optab_for_tree_code (rhs_code, vectype, optab_default);
+      if (optab != unknown_optab)
+        {
+          machine_mode vec_mode = TYPE_MODE (vectype);
+          int icode = (int) optab_handler (optab, vec_mode);
+          if (icode != CODE_FOR_nothing)
+            return NULL;
+        }
     }
 
   prec = TYPE_PRECISION (itype);