[AArch64] Improve popcount expansion
The popcount expansion uses umov to extend the result and move it back
to the integer register file. If we model ADDV as a zero-extending
operation, fmov can be used to move back to the integer side. This
results in a ~0.5% speedup on deepsjeng on Cortex-A57.
A typical __builtin_popcount expansion is now:
fmov s0, w0
cnt v0.8b, v0.8b
addv b0, v0.8b
fmov w0, s0
gcc/
* config/aarch64/aarch64-simd.md
(aarch64_zero_extend<GPI:mode>_reduc_plus_<VDQV_E:mode>): New pattern.
* config/aarch64/aarch64.md (popcount<mode>2): Use it instead of
generating separate ADDV and zero_extend patterns.
* config/aarch64/iterators.md (VDQV_E): New iterator.
testsuite/
* gcc.target/aarch64/popcnt2.c: New test.