This patch adds support for vectorizing sum of absolute differences
(SAD_EXPR) using SVE.
Given this input code:
int
sum_abs (uint8_t *restrict x, uint8_t *restrict y, int n)
{
  int sum = 0;

  for (int i = 0; i < n; i++)
    {
      sum += __builtin_abs (x[i] - y[i]);
    }

  return sum;
}
The resulting SVE code is:
0000000000000000 <sum_abs>:
   0:   7100005f    cmp     w2, #0x0
   4:   5400026d    b.le    50 <sum_abs+0x50>
   8:   d2800003    mov     x3, #0x0                    // #0
   c:   93407c42    sxtw    x2, w2
  10:   2538c002    mov     z2.b, #0
  14:   25221fe0    whilelo p0.b, xzr, x2
  18:   2538c023    mov     z3.b, #1
  1c:   2518e3e1    ptrue   p1.b
  20:   a4034000    ld1b    {z0.b}, p0/z, [x0, x3]
  24:   a4034021    ld1b    {z1.b}, p0/z, [x1, x3]
  28:   0430e3e3    incb    x3
  2c:   0520c021    sel     z1.b, p0, z1.b, z0.b
  30:   25221c60    whilelo p0.b, x3, x2
  34:   040d0420    uabd    z0.b, p1/m, z0.b, z1.b
  38:   44830402    udot    z2.s, z0.b, z3.b
  3c:   54ffff21    b.ne    20 <sum_abs+0x20>  // b.any
  40:   2598e3e0    ptrue   p0.s
  44:   04812042    uaddv   d2, p0, z2.s
  48:   1e260040    fmov    w0, s2
  4c:   d65f03c0    ret
  50:   1e2703e2    fmov    s2, wzr
  54:   1e260040    fmov    w0, s2
  58:   d65f03c0    ret
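Notice how udot is used inside a fully masked loop: the sel makes both
uabd operands equal in the inactive lanes, so those lanes contribute zero
to the udot accumulation.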
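To make the masking easier to follow, here is a rough scalar model of the
loop above, written lane by lane in plain C. It is only a sketch for
illustration and not part of the patch: VL_BYTES is an assumed vector
length (real SVE hardware picks its own), and the names sum_abs_ref and
sum_abs_model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed vector length in bytes, for illustration only.  */
#define VL_BYTES 32

/* Scalar reference: same computation as the input loop.  */
int
sum_abs_ref (const uint8_t *x, const uint8_t *y, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    sum += abs (x[i] - y[i]);
  return sum;
}

/* Lane-by-lane model of the fully masked SVE loop.  */
int
sum_abs_model (const uint8_t *x, const uint8_t *y, int n)
{
  assert (n <= 0 || (x != NULL && y != NULL));

  /* mov z2.b, #0: one 32-bit accumulator per group of four bytes.  */
  uint32_t acc[VL_BYTES / 4] = { 0 };

  /* incb + whilelo: advance one vector at a time until no lane is active.  */
  for (int base = 0; base < n; base += VL_BYTES)
    for (int lane = 0; lane < VL_BYTES; lane++)
      {
        int active = base + lane < n;             /* predicate p0 */
        uint8_t a = active ? x[base + lane] : 0;  /* ld1b with p0/z */
        uint8_t b = active ? y[base + lane] : 0;  /* ld1b with p0/z */
        if (!active)
          b = a;                                  /* sel: inactive lanes get equal operands */
        uint8_t d = a > b ? a - b : b - a;        /* uabd */
        acc[lane / 4] += (uint32_t) d * 1;        /* udot against the all-ones z3 */
      }

  /* uaddv: reduce the 32-bit accumulators to a single sum.  */
  uint32_t sum = 0;
  for (int i = 0; i < VL_BYTES / 4; i++)
    sum += acc[i];
  return (int) sum;
}
```

For any n, sum_abs_model should return the same value as sum_abs_ref,
including when n is not a multiple of VL_BYTES, because the inactive tail
lanes add zero to the accumulators.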
gcc/ChangeLog:
2019-05-07 Alejandro Martinez <alejandro.martinezvicente@arm.com>
* config/aarch64/aarch64-sve.md (<su>abd<mode>_3): New define_expand.
(aarch64_<su>abd<mode>_3): Likewise.
(*aarch64_<su>abd<mode>_3): New define_insn.
(<sur>sad<vsi2qi>): New define_expand.
* config/aarch64/iterators.md: Added MAX_OPP attribute.
* tree-vect-loop.c (use_mask_by_cond_expr_p): Add SAD_EXPR.
(build_vect_cond_expr): Likewise.
gcc/testsuite/ChangeLog:
2019-05-07 Alejandro Martinez <alejandro.martinezvicente@arm.com>
* gcc.target/aarch64/sve/sad_1.c: New test for sum of absolute
differences.
From-SVN: r270975