i386: Add pass_remove_partial_avx_dependency
With -mavx, for
$ cat foo.i
extern float f;
extern double d;
extern int i;
void
foo (void)
{
d = f;
f = i;
}
we need to generate
vxorp[ds] %xmmN, %xmmN, %xmmN
...
vcvtss2sd f(%rip), %xmmN, %xmmX
...
vcvtsi2ss i(%rip), %xmmN, %xmmY
to avoid a partial XMM register stall. This patch adds a pass to generate
a single
vxorps %xmmN, %xmmN, %xmmN
at the entry of the nearest common dominator of the basic blocks with SF/DF
conversions, which is in the fake loop that contains the whole function,
instead of generating one
vxorp[ds] %xmmN, %xmmN, %xmmN
for each SF/DF conversion.
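The placement rule above can be sketched in C on a toy CFG. This is only
an illustration under stated assumptions, not the code added to i386.c:
the arrays idom, dom_depth and loop_depth and the function zeroing_block
are hypothetical names, and "lies in the fake loop" is modelled simply as
"has loop depth 0".

```c
/* Toy CFG, hypothetical and unrelated to GCC's internal data structures.
   Edges: 0->1, 1->2, 2->3, 2->4, 3->2, 4->2, 2->5.  Block 0 is the
   function entry, block 2 is a loop header, blocks 3 and 4 are in the
   loop body and block 5 is the loop exit.
   idom[b] is the immediate dominator of b (the entry is its own idom),
   dom_depth[b] is b's depth in the dominator tree and loop_depth[b] is
   0 for blocks outside every real loop.  */
enum { NBLOCKS = 6 };
static const int idom[NBLOCKS]       = { 0, 0, 1, 2, 2, 2 };
static const int dom_depth[NBLOCKS]  = { 0, 1, 2, 3, 3, 3 };
static const int loop_depth[NBLOCKS] = { 0, 0, 1, 1, 1, 0 };

/* Walk the deeper block up the dominator tree until both meet.  */
static int
nearest_common_dominator (int a, int b)
{
  while (a != b)
    {
      if (dom_depth[a] >= dom_depth[b])
        a = idom[a];
      else
        b = idom[b];
    }
  return a;
}

/* Return the block at whose entry a single zeroing instruction could be
   placed for the N >= 1 blocks in CONV_BLOCKS that contain SF/DF
   conversions.  */
static int
zeroing_block (const int *conv_blocks, int n)
{
  int bb = conv_blocks[0];

  /* Fold the nearest common dominator over all conversion blocks.  */
  for (int i = 1; i < n; i++)
    bb = nearest_common_dominator (bb, conv_blocks[i]);

  /* Assumption: keep moving up until the block is outside every real
     loop.  The entry block has loop depth 0, so this terminates.  */
  while (loop_depth[bb] != 0)
    bb = idom[bb];

  return bb;
}
```

With conversions in blocks 3 and 4, the nearest common dominator is the
loop header (block 2), and the final walk moves the placement to block 1,
outside the loop.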
NB: The LCM algorithm isn't appropriate here since it may place a vxorps
inside a loop. A simple testcase shows this:
$ cat badcase.c
extern float f;
extern double d;
void
foo (int n, int k)
{
for (int j = 0; j != n; j++)
if (j < k)
d = f;
}
It generates
...
loop:
if(j < k)
vxorps %xmm0, %xmm0, %xmm0
vcvtss2sd f(%rip), %xmm0, %xmm0
...
loopend
...
This is because LCM only moves code when the benefit is certain. Since the
conversion here is guarded by a conditional branch, LCM won't move
vxorps %xmm0, %xmm0, %xmm0
out of the loop. SPEC CPU 2017 on Intel Xeon with AVX512 shows:
1. The nearest dominator
|RATE |Improvement|
|500.perlbench_r | 0.55% |
|538.imagick_r | 8.43% |
|544.nab_r | 0.71% |
2. LCM
|RATE |Improvement|
|500.perlbench_r | -0.76% |
|538.imagick_r | 7.96% |
|544.nab_r | -0.13% |
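As a rough illustration of the badcase.c discussion above, the following
C sketch counts how many zeroing instructions execute under each
placement. The function names are hypothetical and the model is an
assumption: it counts executed vxorps instructions only and says nothing
about actual run time.

```c
/* Dynamic number of vxorps executed when the zeroing instruction stays
   next to the conversion inside the loop of badcase.c: one for every
   iteration in which j < k.  Assumes n >= 0.  */
static long
zeroings_in_loop (int n, int k)
{
  long count = 0;

  for (int j = 0; j != n; j++)
    if (j < k)
      count++;		/* vxorps followed by vcvtss2sd.  */

  return count;
}

/* Dynamic number of vxorps executed when a single zeroing instruction is
   placed at the entry of a dominating block outside the loop: one per
   call, independent of n and k.  */
static long
zeroings_hoisted (int n, int k)
{
  (void) n;
  (void) k;
  return 1;
}
```

For n = 1000 and k = 100 the in-loop placement executes 100 zeroing
instructions while the hoisted placement executes one. In this model the
hoisted placement is only worse when the conversion never executes, where
it costs one vxorps instead of none.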
Performance impacts of SPEC CPU 2017 rate on Intel Xeon with AVX512
using
-Ofast -flto -march=skylake-avx512 -funroll-loops
before
commit e739972ad6ad05e32a1dd5c29c0b950a4c4bd576
Author: uros <uros@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Thu Jan 31 20:06:42 2019 +0000
PR target/89071
* config/i386/i386.md (*extendsfdf2): Split out reg->reg
alternative to avoid partial SSE register stall for TARGET_AVX.
(truncdfsf2): Ditto.
(sse4_1_round<mode>2): Ditto.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@268427 138bc75d-0d04-0410-961f-82ee72b054a4
are:
|INT RATE |Improvement|
|500.perlbench_r | 0.55% |
|502.gcc_r | 0.14% |
|505.mcf_r | 0.08% |
|523.xalancbmk_r | 0.18% |
|525.x264_r |-0.49% |
|531.deepsjeng_r |-0.04% |
|541.leela_r |-0.26% |
|548.exchange2_r |-0.3% |
|557.xz_r |BuildSame|
|FP RATE |Improvement|
|503.bwaves_r |-0.29% |
|507.cactuBSSN_r | 0.04% |
|508.namd_r |-0.74% |
|510.parest_r |-0.01% |
|511.povray_r | 2.23% |
|519.lbm_r | 0.1% |
|521.wrf_r | 0.49% |
|526.blender_r | 0.13% |
|527.cam4_r | 0.65% |
|538.imagick_r | 8.43% |
|544.nab_r | 0.71% |
|549.fotonik3d_r | 0.15% |
|554.roms_r | 0.08% |
After commit e739972ad6ad05e32a1dd5c29c0b950a4c4bd576, on Skylake client,
the impacts on 538.imagick_r with
-fno-unsafe-math-optimizations -march=native -Ofast -funroll-loops -flto
are:
1. Size comparison:
before:
   text    data     bss     dec     hex filename
2436377    8352    4528 2449257  255f69 imagick_r
after:
   text    data     bss     dec     hex filename
2425249    8352    4528 2438129  2533f1 imagick_r
2. Number of vxorps:
before after difference
4948 4135 -19.66%
3. Performance improvement:
|RATE |Improvement|
|538.imagick_r | 5.5% |
gcc/
2019-02-22 H.J. Lu <hongjiu.lu@intel.com>
Hongtao Liu <hongtao.liu@intel.com>
Sunil K Pandey <sunil.k.pandey@intel.com>
PR target/87007
* config/i386/i386-passes.def: Add
pass_remove_partial_avx_dependency.
* config/i386/i386-protos.h
(make_pass_remove_partial_avx_dependency): New.
* config/i386/i386.c (make_pass_remove_partial_avx_dependency):
New function.
(pass_data_remove_partial_avx_dependency): New.
(pass_remove_partial_avx_dependency): Likewise.
(make_pass_remove_partial_avx_dependency): Likewise.
* config/i386/i386.md (avx_partial_xmm_update): New attribute.
(*extendsfdf2): Add avx_partial_xmm_update.
(truncdfsf2): Likewise.
(*float<SWI48:mode><MODEF:mode>2): Likewise.
(SF/DF conversion splitters): Disabled for TARGET_AVX.
gcc/testsuite/
2019-02-22 H.J. Lu <hongjiu.lu@intel.com>
Hongtao Liu <hongtao.liu@intel.com>
Sunil K Pandey <sunil.k.pandey@intel.com>
PR target/87007
* gcc.target/i386/pr87007-1.c: New test.
* gcc.target/i386/pr87007-2.c: Likewise.
Co-Authored-By: Hongtao Liu <hongtao.liu@intel.com>
Co-Authored-By: Sunil K Pandey <sunil.k.pandey@intel.com>
From-SVN: r269119