expansion: Improve double-word modulo by certain constant divisors [PR97459]
As discussed in the PR, e.g. on x86_64 (both -m32 and -m64) there is no
double-word modulo and so we expand it to a __{,u}mod[dt]i3 call.
For certain constant divisors we can do better. E.g. consider
32-bit word-size, 0x100000000ULL % 3 == 1, so we can use partly the Hacker's
delight modulo by summing digits approach and optimize
unsigned long long foo (unsigned long long x) { return x % 3; }
as
unsigned long long foo (unsigned long long x) {
unsigned int sum, carry;
carry = __builtin_add_overflow ((unsigned int) x, (unsigned int) (x >> 32), &sum);
sum += carry;
return sum % 3;
}
Similarly, 0x10000000ULL % 5 == 1 (note, 1 << 28), so
unsigned long long bar (unsigned long long x) { return x % 5; }
as
unsigned long long bar (unsigned long long x) {
unsigned int sum = x & ((1 << 28) - 1);
sum += (x >> 28) & ((1 << 28) - 1);
sum += (x >> 56);
return sum % 5;
}
etc.
And we can do also signed modulo,
long long baz (long long x) { return x % 5; }
as
long long baz (long long x) {
unsigned int sum = x & ((1 << 28) - 1);
sum += ((unsigned long long) x >> 28) & ((1 << 28) - 1);
sum += ((unsigned long long) x >> 56);
/* Sum adjustment for negative x. */
sum += (x >> 63) & 3;
unsigned int rem = sum % 5;
/* And finally adjust it to the right interval for negative values. */
return (int) (rem + ((x >> 63) & -4));
}
2020-11-30 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/97459
* internal-fn.h (expand_addsub_overflow): Declare.
* internal-fn.c (expand_addsub_overflow): No longer static.
* optabs.c (expand_doubleword_mod): New function.
(expand_binop): Optimize double-word mod with constant divisor.
* gcc.dg/pr97459-1.c: New test.
* gcc.dg/pr97459-2.c: New test.