Simplify movmem code by always doing overlapping copies when larger than 8 bytes on AArch64.
This changes the movmem code in AArch64 that copies between 4 and 7 bytes of
data so that it uses the smallest mode capable of copying the remaining bytes
in one go, overlapping the reads if needed.
This means that if we're copying 5 bytes we would issue an SImode and QImode
load instead of two SImode loads.
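
The access pattern described above can be sketched in plain C. This is only an
illustration, not the code in the AArch64 back end: the helper names
smallest_access and copy_4_to_7 are hypothetical, and access sizes of 1, 2 and
4 bytes stand in for QImode, HImode and SImode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: smallest power-of-two access size that covers
   n bytes (1, 2 or 4 here, standing in for QImode, HImode, SImode).  */
static size_t smallest_access(size_t n)
{
    size_t size = 1;
    while (size < n)
        size *= 2;
    return size;
}

/* Hypothetical helper: copy n bytes, 4 <= n <= 7, with one 4-byte access
   followed by one access of the smallest size that covers the remainder.
   The second access is moved back so that it ends exactly at byte n,
   which makes it overlap the first one when the remainder is 3 bytes.  */
static void copy_4_to_7(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n >= 4 && n <= 7);

    uint32_t head;
    memcpy(&head, src, 4);              /* 4-byte load  */
    memcpy(dst, &head, 4);              /* 4-byte store */

    size_t rem = n - 4;
    if (rem == 0)
        return;

    size_t size = smallest_access(rem); /* 1, 2 or 4 bytes */
    size_t off = n - size;              /* 4, 4 or 3 */
    unsigned char tail[4];
    memcpy(tail, src + off, size);      /* tail load, may overlap the head */
    memcpy(dst + off, tail, size);      /* tail store */
}
```

For n = 5 the sketch does a 4-byte access at offset 0 and a 1-byte access at
offset 4. For n = 7 the remaining 3 bytes need a 4-byte access, so it is placed
at offset 3 and overlaps the first access by one byte.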
This results in smaller memory accesses and also gives the mid-end a chance to
realise that it can CSE the loads in certain circumstances, e.g. when you have
something like
return foo;
where foo is a struct. This would be transformed by the mid-end into SSA form as
D.XXXX = foo;
return D.XXXX;
This movmem routine will handle the first copy, which is usually not needed.
The mid-end would do SImode and QImode stores into X0 for the 5-byte example,
but unless the first copy is done in the same modes, it cannot tell that the
stores are not needed at all.
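
A minimal C source that has the shape discussed above might look like the
following. The 5-byte layout of struct foo and the function name get_foo are
assumptions made for illustration; they are not taken from the patch or its
testsuite.

```c
/* Assumed layout: a 5-byte struct, small enough to be returned in a
   register on AArch64.  */
struct foo {
    char bytes[5];
};

struct foo foo;

/* Returning the global by value makes the mid-end create a temporary
   copy (D.XXXX = foo) before the return.  */
struct foo get_foo(void)
{
    return foo;
}
```

Here the copy into the temporary is the one expanded through movmem, and the
return value is then assembled from that temporary.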
From-SVN: r262434