[AArch64] Prefer LD1RQ for big-endian SVE
authorRichard Sandiford <richard.sandiford@linaro.org>
Thu, 1 Feb 2018 11:04:00 +0000 (11:04 +0000)
committerRichard Sandiford <rsandifo@gcc.gnu.org>
Thu, 1 Feb 2018 11:04:00 +0000 (11:04 +0000)
commit8179efe00e04285184112de7dbb977a75852197c
tree5a7162b0e53a7fb9c7dbbc86cf05d9a3619cbda2
parent947b137212d16d432eec201fe7f800dfdb481203
[AArch64] Prefer LD1RQ for big-endian SVE

This patch deals with cases in which a CONST_VECTOR contains a
repeating bit pattern that is wider than one element but narrower
than 128 bits.  The current code:

* treats the repeating pattern as a single element
* uses the associated LD1R to load and replicate it (such as LD1RD
  for 64-bit patterns)
* uses a subreg to cast the result back to the original vector type

The problem is that for big-endian targets, the final cast is
effectively a form of element reverse.  E.g. say we're using LD1RD to load
16-bit elements, with h being the high parts and l being the low parts:

                               +-----+-----+-----+-----+-----+----
                         lanes |  0  |  1  |  2  |  3  |  4  | ...
                               +-----+-----+-----+-----+-----+----
     memory              bytes |h0 l0 h1 l1 h2 l2 h3 l3 h0 l0 ....
                               +----------------------------------
                                 V  V  V  V  V  V  V  V
                     ----------+-----------------------+
    register         ....      |           0           |
     after           ----------+-----------------------+  lsb
     LD1RD           .... h3 l3 h0 l0 h1 l1 h2 l2 h3 l3|
                     ----------------------------------+

                     ----+-----+-----+-----+-----+-----+
    expected         ... |  4  |  3  |  2  |  1  |  0  |
    register         ----+-----+-----+-----+-----+-----+  lsb
    contents         .... h0 l0 h3 l3 h2 l2 h1 l1 h0 l0|
                     ----------------------------------+

A later patch fixes the handling of general subregs to account
for this, but it means that we need to do a REV instruction
after the load.  It seems better to use LD1RQ[BHW] on a 128-bit
pattern instead, since that gets the endianness right without
a separate fixup instruction.

2018-02-01  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
* config/aarch64/aarch64.c (aarch64_expand_sve_const_vector): Prefer
the TImode handling for big-endian targets.

gcc/testsuite/
* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQ to be used instead
of LD1R[HWD] for multi-element constants on big-endian targets.
* gcc.target/aarch64/sve/slp_3.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.

Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
From-SVN: r257288
gcc/ChangeLog
gcc/config/aarch64/aarch64.c
gcc/testsuite/ChangeLog
gcc/testsuite/gcc.target/aarch64/sve/slp_2.c
gcc/testsuite/gcc.target/aarch64/sve/slp_3.c
gcc/testsuite/gcc.target/aarch64/sve/slp_4.c