Optimise sqrt reciprocal multiplications
This patch aims to optimise sequences involving uses of 1.0 / sqrt (a) under -freciprocal-math and -funsafe-math-optimizations.
In particular consider:
x = 1.0 / sqrt (a);
r1 = x * x; // same as 1.0 / a
r2 = a * x; // same as sqrt (a)
If x, r1 and r2 are all used further on in the code, this can be transformed into:
tmp1 = 1.0 / a
tmp2 = sqrt (a)
tmp3 = tmp1 * tmp2
x = tmp3
r1 = tmp1
r2 = tmp2
A bit convoluted, but this saves us one multiplication and, more importantly, the sqrt and division are now independent.
This also allows optimisation of a subset of these expressions.
For example:
x = 1.0 / sqrt (a)
r1 = x * x
can be transformed to r1 = 1.0 / a, eliminating the sqrt if x is not used anywhere else.
And similarly:
x = 1.0 / sqrt (a)
r1 = a * x
can be transformed to sqrt (a) eliminating the division.
For the testcase:
double res, res2, tmp;
void
foo (double a, double b)
{
tmp = 1.0 / __builtin_sqrt (a);
res = tmp * tmp;
res2 = a * tmp;
}
We now generate for aarch64 with -Ofast:
foo:
fmov d2, 1.0e+0
adrp x2, res2
fsqrt d1, d0
adrp x1, res
fdiv d0, d2, d0
adrp x0, tmp
str d1, [x2, #:lo12:res2]
fmul d1, d1, d0
str d0, [x1, #:lo12:res]
str d1, [x0, #:lo12:tmp]
ret
where before it generated:
foo:
fsqrt d2, d0
fmov d1, 1.0e+0
adrp x1, res2
adrp x2, tmp
adrp x0, res
fdiv d1, d1, d2
fmul d0, d1, d0
fmul d2, d1, d1
str d1, [x2, #:lo12:tmp]
str d0, [x1, #:lo12:res2]
str d2, [x0, #:lo12:res]
ret
As you can see, the new sequence has one fewer multiply and the fsqrt and fdiv are independent.
* tree-ssa-math-opts.c (is_mult_by): New function.
(is_square_of): Use the above.
(optimize_recip_sqrt): New function.
(pass_cse_reciprocals::execute): Use the above.
* gcc.dg/recip_sqrt_mult_1.c: New test.
* gcc.dg/recip_sqrt_mult_2.c: Likewise.
* gcc.dg/recip_sqrt_mult_3.c: Likewise.
* gcc.dg/recip_sqrt_mult_4.c: Likewise.
* gcc.dg/recip_sqrt_mult_5.c: Likewise.
* g++.dg/recip_sqrt_mult_1.C: Likewise.
* g++.dg/recip_sqrt_mult_2.C: Likewise.
From-SVN: r264126