Fix false dependence of scalar operation vrcp/vsqrt/vrsqrt/vrndscale
For instructions with xmm operand:
op %xmmN,%xmmQ,%xmmQ ----> op %xmmN, %xmmN, %xmmQ
for instruction with mem operand or gpr operand:
op mem/gpr, %xmmQ, %xmmQ
---> using pass rpad ---->
xorps %xmmN, %xmmN, %xxN
op mem/gpr, %xmmN, %xmmQ
Performance influence of SPEC2017 fprate which is tested on SKX
----
503.bwaves_r -0.03%
507.cactuBSSN_r -0.22%
508.namd_r -0.02%
510.parest_r 0.37%
511.povray_r 0.74%
519.lbm_r 0.24%
521.wrf_r 2.35%
526.blender_r 0.71%
527.cam4_r 0.65%
538.imagick_r 0.95%
544.nab_r -0.37
549.fotonik3d_r 0.24%
554.roms_r 0.90%
fprate geomean 0.50%
-----
Changelog
gcc/
* config/i386/i386.md (*rcpsf2_sse): Add
avx_partial_xmm_update, prefer m constraint for TARGET_AVX.
(*rsqrtsf2_sse): Ditto.
(*sqrt<mode>2_sse): Ditto.
(sse4_1_round<mode>2): separate constraint vm, add
avx_partail_xmm_update, prefer m constraint for TARGET_AVX.
* config/i386/sse.md (*sse_vmrcpv4sf2"): New define_insn used
by pass rpad.
(*<sse>_vmsqrt<mode>2<mask_scalar_name><round_scalar_name>*):
Ditto.
(*sse_vmrsqrtv4sf2): Ditto.
(*avx512f_rndscale<mode><round_saeonly_name>): Ditto.
(*sse4_1_round<ssescalarmodesuffix>): Ditto.
(sse4_1_round<ssescalarmodesuffix>): Add m constraint and
<iptr> pointer size modifier since vround support memory operand.
gcc/testsuite
* gcc.target/i386/pr87007-4.c: New test.
* gcc.target/i386/pr87007-5.c: Ditto.
From-SVN: r277469