git.libre-soc.org Git - mesa.git/commit

author	Eric Anholt <eric@anholt.net>
	Thu, 5 Jan 2017 23:11:30 +0000 (15:11 -0800)
committer	Eric Anholt <eric@anholt.net>
	Thu, 26 Jan 2017 20:48:10 +0000 (12:48 -0800)
commit	4d30024238efa829cabc72c1601beeee18c3dbf2
tree	09f04f006eb015b3cc5940eddde461519114f77a
parent	347b69e7d74f61f3b08853ccdfad72bdae683e12

vc4: Use NEON to speed up utile loads on Pi2.

We had a lot of memcpy call overhead because gpu_stride wasn't being
inlined.  But if you split out the stride==8 and stride==16 cases like
this code does while still using memcpy, you no longer get glibc's
NEON memcpy, at which point we'd be doing 16 uncached reads instead of
64/(NEON memcpy granularity), for about a 30% performance hit.  By
hand-writing the assembly, we can get a whole cacheline loaded at a
time.
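
As a rough illustration of the structure described above, here is a
portable C sketch.  It is not the code from this commit: the function
name load_utile_sketch, the UTILE_SIZE constant and the 64-byte utile
size are assumptions made for the example, and the single fixed-size
memcpy into a temporary only stands in for the hand-written NEON load
of a whole utile.  The point is the shape: one read of the full utile,
then a store loop specialized per stride so each copy has a constant
size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed for this sketch: a utile is 64 bytes. */
#define UTILE_SIZE 64

/* Hypothetical helper, not the driver's function.  Copies one utile
 * from GPU memory into a linear CPU buffer.  gpu_stride is the number
 * of bytes per utile row (8 or 16); cpu_stride is the destination row
 * pitch in bytes.
 */
static void
load_utile_sketch(void *cpu, const void *gpu,
                  uint32_t cpu_stride, uint32_t gpu_stride)
{
        uint8_t tmp[UTILE_SIZE];
        uint8_t *dst = cpu;

        assert(gpu_stride == 8 || gpu_stride == 16);

        /* One fixed-size read of the whole utile.  In the real code
         * this is where a single NEON load would go.
         */
        memcpy(tmp, gpu, UTILE_SIZE);

        if (gpu_stride == 8) {
                /* 8 rows of 8 bytes; the size is a compile-time constant. */
                for (uint32_t row = 0; row < UTILE_SIZE / 8; row++)
                        memcpy(dst + row * cpu_stride, tmp + row * 8, 8);
        } else {
                /* 4 rows of 16 bytes. */
                for (uint32_t row = 0; row < UTILE_SIZE / 16; row++)
                        memcpy(dst + row * cpu_stride, tmp + row * 16, 16);
        }
}
```

Whether a compiler turns the constant-size copies into good code is
exactly what the message above is worried about, so treat this as a
description of the data movement rather than a fast path.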

Unfortunately, NEON intrinsics turned out to be unusable -- they
didn't have the vldm instruction available.

Note that, for now, the NEON code is only enabled when building for ARMv7
(Pi 2+).  We may want to do runtime detection for the Raspbian case in
the future.

Improves 1024x1024 GetTexImage by 208.256% +/- 7.07029% (n=10).

src/gallium/drivers/vc4/Makefile.am
src/gallium/drivers/vc4/vc4_tiling.h
src/gallium/drivers/vc4/vc4_tiling_lt.c