[AArch64] Emit TARGET_DOTPROD-specific sequence for <us>sadv16qi
author Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Mon, 3 Jun 2019 11:20:58 +0000 (11:20 +0000)
committer Kyrylo Tkachov <ktkachov@gcc.gnu.org>
Mon, 3 Jun 2019 11:20:58 +0000 (11:20 +0000)
commit 72215009a9f9827397a4eb74e9341b2b7dc658df
tree 85c9597bd0985e8be2de5f8dfbbcce8493abad31
parent c89503d957f13f7f0a5eeeab1326048c455d9533

Wilco pointed out that when the Dot Product instructions are available we can use them
to generate an even more efficient expansion for the [us]sadv16qi optab.
Instead of the current:
        uabdl2  v0.8h, v1.16b, v2.16b
        uabal   v0.8h, v1.8b, v2.8b
        uadalp  v3.4s, v0.8h

we can generate:
      (1)  mov     v4.16b, 1
      (2)  uabd    v0.16b, v1.16b, v2.16b
      (3)  udot    v3.4s, v0.16b, v4.16b

Instruction (1) can be CSEd across multiple such expansions and even hoisted out of loops,
so when this sequence appears frequently back-to-back (as in x264_r) each sum effectively
costs only 2 instructions. Also, the UDOT instruction does the byte-to-word accumulation
in one step, which allows us to use the simpler, non-widening UABD instruction before it.

This makes it a shorter and lower-latency sequence overall for targets that support it.
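
To illustrate why the two sequences are interchangeable for this optab, here is a
minimal scalar C model of both, one 16-byte vector at a time. It is only a sketch:
the function names (sad_widening, sad_dotprod, lane_sum) are made up for this
example and do not appear in the patch. In this model the individual 32-bit lanes
can differ between the two sequences, but the sum over the four lanes is the same,
which is what a sum-of-absolute-differences reduction needs.

```c
#include <stdint.h>
#include <stdlib.h>

/* Old sequence, modelled lane by lane: UABDL2 + UABAL + UADALP.
   (Hypothetical helper, not part of the patch.)  */
static void sad_widening(uint32_t acc[4], const uint8_t a[16], const uint8_t b[16])
{
    uint16_t h[8];
    for (int i = 0; i < 8; i++) {
        /* uabdl2: absolute differences of the high 8 bytes, widened to 16 bits.  */
        h[i] = (uint16_t)abs(a[i + 8] - b[i + 8]);
        /* uabal: accumulate absolute differences of the low 8 bytes.  */
        h[i] = (uint16_t)(h[i] + abs(a[i] - b[i]));
    }
    /* uadalp: add adjacent halfword pairs into the 32-bit accumulators.  */
    for (int j = 0; j < 4; j++)
        acc[j] += (uint32_t)h[2 * j] + h[2 * j + 1];
}

/* New sequence: UABD + UDOT against a vector of ones.
   (Hypothetical helper, not part of the patch.)  */
static void sad_dotprod(uint32_t acc[4], const uint8_t a[16], const uint8_t b[16])
{
    uint8_t d[16], ones[16];
    for (int i = 0; i < 16; i++) {
        ones[i] = 1;                        /* (1) vector of ones */
        d[i] = (uint8_t)abs(a[i] - b[i]);   /* (2) uabd: byte-wise |a - b| */
    }
    /* (3) udot: each 32-bit lane accumulates the dot product of 4 bytes.  */
    for (int j = 0; j < 4; j++)
        for (int k = 0; k < 4; k++)
            acc[j] += (uint32_t)d[4 * j + k] * ones[4 * j + k];
}

/* Horizontal sum of the four 32-bit accumulator lanes.  */
static uint32_t lane_sum(const uint32_t acc[4])
{
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Calling both helpers on the same inputs with zero-initialised accumulators and
comparing lane_sum of each result shows that the totals agree.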

* config/aarch64/iterators.md (MAX_OPP): New code attr.
* config/aarch64/aarch64-simd.md (*aarch64_<su>abd<mode>_3): Rename to...
(aarch64_<su>abd<mode>_3): ... This.
(<sur>sadv16qi): Add TARGET_DOTPROD expansion.

* gcc.target/aarch64/ssadv16qi.c: Add +nodotprod to pragma.
* gcc.target/aarch64/usadv16qi.c: Likewise.
* gcc.target/aarch64/ssadv16qi-dotprod.c: New test.
* gcc.target/aarch64/usadv16qi-dotprod.c: Likewise.
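
For orientation, the kind of source loop that is expected to reach the usadv16qi
expansion looks roughly like the sketch below. This is an assumption about the
general shape of such code, not a copy of the test files listed above; the names
N, pix1, pix2 and sad_unsigned are hypothetical. Built with optimisation and an
-march value that enables the Dot Product extension (for example
-O3 -march=armv8.2-a+dotprod), a loop like this is a candidate for the UABD/UDOT
sequence, while "+nodotprod" in a target pragma keeps the older expansion.

```c
#include <stdlib.h>

#define N 1024

/* Hypothetical buffers; the real test sources may use different names.  */
unsigned char pix1[N], pix2[N];

/* Sum of absolute differences over two byte arrays.  The bytes are
   promoted to int before the subtraction, so abs() sees the true
   signed difference.  */
int sad_unsigned(void)
{
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += abs(pix1[i] - pix2[i]);
    return sum;
}
```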

From-SVN: r271863
gcc/ChangeLog
gcc/config/aarch64/aarch64-simd.md
gcc/testsuite/ChangeLog
gcc/testsuite/gcc.target/aarch64/ssadv16qi-dotprod.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/ssadv16qi.c
gcc/testsuite/gcc.target/aarch64/usadv16qi-dotprod.c [new file with mode: 0644]
gcc/testsuite/gcc.target/aarch64/usadv16qi.c