Optimise sqrt reciprocal multiplications
author     Kyrylo Tkachov <kyrylo.tkachov@arm.com>, Wed, 5 Sep 2018 13:39:38 +0000
committer  Kyrylo Tkachov <ktkachov@gcc.gnu.org>, Wed, 5 Sep 2018 13:39:38 +0000
commit     24c49431499bcb462aeee41e027a3dac25e934b3
tree       65e7f2c1e0a84d7ea2da8fca45dd3c7b86341ccb
parent     76a5eae5494440844a9b6d0171efdcf4ebb6f3c6

This patch optimises sequences that use 1.0 / sqrt (a) under -freciprocal-math and -funsafe-math-optimizations.
In particular, consider:

x = 1.0 / sqrt (a);
r1 = x * x;  // same as 1.0 / a
r2 = a * x; // same as sqrt (a)

If x, r1 and r2 are all used further on in the code, this can be transformed into:
tmp1 = 1.0 / a
tmp2 = sqrt (a)
tmp3 = tmp1 * tmp2
x = tmp3
r1 = tmp1
r2 = tmp2

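This is a bit convoluted, but it saves one multiplication and, more importantly, the sqrt and the division are now independent: the division no longer waits for the sqrt result, so the two can execute in parallel on hardware that allows it.
This also allows cases where only some of these expressions are used to be optimised.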
For example:
x = 1.0 / sqrt (a)
r1 = x * x

can be transformed to r1 = 1.0 / a, eliminating the sqrt if x is not used anywhere else.
And similarly:
x = 1.0 / sqrt (a)
r1 = a * x

can be transformed to r1 = sqrt (a), eliminating the division, again provided x is not used anywhere else.

For the testcase:
double res, res2, tmp;
void
foo (double a, double b)
{
  tmp = 1.0 / __builtin_sqrt (a);
  res = tmp * tmp;
  res2 = a * tmp;
}

We now generate for aarch64 with -Ofast:
foo:
        fmov    d2, 1.0e+0
        adrp    x2, res2
        fsqrt   d1, d0
        adrp    x1, res
        fdiv    d0, d2, d0
        adrp    x0, tmp
        str     d1, [x2, #:lo12:res2]
        fmul    d1, d1, d0
        str     d0, [x1, #:lo12:res]
        str     d1, [x0, #:lo12:tmp]
        ret

where before it generated:
foo:
        fsqrt   d2, d0
        fmov    d1, 1.0e+0
        adrp    x1, res2
        adrp    x2, tmp
        adrp    x0, res
        fdiv    d1, d1, d2
        fmul    d0, d1, d0
        fmul    d2, d1, d1
        str     d1, [x2, #:lo12:tmp]
        str     d0, [x1, #:lo12:res2]
        str     d2, [x0, #:lo12:res]
        ret

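The new sequence has one fewer multiply, and the fsqrt and fdiv are independent: the old fdiv consumed the fsqrt result (d2), whereas the new fdiv reads only the input a (d0), so the two instructions can overlap.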
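As a rough worked estimate, assume hypothetical latencies L_sqrt, L_div and L_mul, and a core that can overlap the fsqrt and the fdiv. Not every core can do this, because some share a single non-pipelined divide/sqrt unit. Under that assumption, the longest dependency chain in each sequence is:

```latex
\begin{aligned}
T_{\text{before}} &= L_{\text{sqrt}} + L_{\text{div}} + L_{\text{mul}}, \\
T_{\text{after}}  &= \max(L_{\text{sqrt}}, L_{\text{div}}) + L_{\text{mul}}, \\
T_{\text{before}} - T_{\text{after}} &= \min(L_{\text{sqrt}}, L_{\text{div}}).
\end{aligned}
```

In the new sequence, res and res2 are also available earlier, after L_div and L_sqrt respectively, because neither depends on the final fmul.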

* tree-ssa-math-opts.c (is_mult_by): New function.
(is_square_of): Use the above.
(optimize_recip_sqrt): New function.
(pass_cse_reciprocals::execute): Use the above.

* gcc.dg/recip_sqrt_mult_1.c: New test.
* gcc.dg/recip_sqrt_mult_2.c: Likewise.
* gcc.dg/recip_sqrt_mult_3.c: Likewise.
* gcc.dg/recip_sqrt_mult_4.c: Likewise.
* gcc.dg/recip_sqrt_mult_5.c: Likewise.
* g++.dg/recip_sqrt_mult_1.C: Likewise.
* g++.dg/recip_sqrt_mult_2.C: Likewise.

From-SVN: r264126
gcc/ChangeLog
gcc/testsuite/ChangeLog
gcc/testsuite/g++.dg/recip_sqrt_mult_1.C [new file with mode: 0644]
gcc/testsuite/g++.dg/recip_sqrt_mult_2.C [new file with mode: 0644]
gcc/testsuite/gcc.dg/recip_sqrt_mult_1.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/recip_sqrt_mult_2.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/recip_sqrt_mult_3.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/recip_sqrt_mult_4.c [new file with mode: 0644]
gcc/testsuite/gcc.dg/recip_sqrt_mult_5.c [new file with mode: 0644]
gcc/tree-ssa-math-opts.c