translate_generic: use memcpy if possible (v3)
Changes in v3:
- If we can do a copy, don't try to get an emit func, as that can assert(0)
Changes in v2:
- Add comment regarding copy_size
When used in GPU drivers, translate can be used to simultaneously
perform a gather operation, and convert away from unsupported formats.
In this use case, the input and output formats are often identical, so a
plain memcpy would suffice.
Instead, translate insists on converting to and from 32-bit floating point
numbers.
This is not only extremely expensive, but it also loses precision for
32/64-bit integers and 64-bit floating point numbers.
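
To illustrate the precision loss, here is a minimal sketch of a round trip
through a 32-bit float. The helper names are hypothetical and are not part of
translate; they only mimic what a convert-via-float path effectively does to
each value.

```c
#include <stdint.h>

/* Hypothetical helper: push a 32-bit unsigned integer through a 32-bit
 * float and back. A float has a 24-bit significand, so integers above
 * 2^24 are not all representable. Only call this with values whose
 * rounded float still fits in uint32_t. */
static uint32_t roundtrip_u32_via_float(uint32_t v)
{
   volatile float f = (float)v;   /* volatile: force rounding to float */
   return (uint32_t)f;
}

/* Hypothetical helper: same round trip for a 64-bit double. */
static double roundtrip_f64_via_float(double v)
{
   volatile float f = (float)v;
   return (double)f;
}
```

Under these assumptions, 16777217 (2^24 + 1) comes back as 16777216, and 0.1
as a double does not survive the round trip unchanged.
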
This patch changes translate_generic to just use memcpy if the formats are
identical, non-blocked, and have an integral number of bytes per pixel (note
that all sensible vertex formats are like this).
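
The check described above can be sketched roughly as follows. This is not the
code from the patch: struct fmt_desc, copy_size_for and translate_element are
hypothetical stand-ins for the real format description and translate
internals, kept self-contained for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified format description. */
struct fmt_desc {
   int id;                 /* format identifier */
   unsigned block_width;   /* pixels per block, horizontally */
   unsigned block_height;  /* pixels per block, vertically */
   unsigned block_bits;    /* bits per block */
};

/* Return the number of bytes to memcpy per element, or -1 if the
 * element has to go through the conversion path. */
static int copy_size_for(const struct fmt_desc *in,
                         const struct fmt_desc *out)
{
   if (in->id == out->id &&            /* identical formats */
       in->block_width == 1 &&         /* non-blocked */
       in->block_height == 1 &&
       (in->block_bits & 7) == 0)      /* whole number of bytes */
      return (int)(in->block_bits >> 3);
   return -1;
}

/* Copy one element when a copy size was determined; the conversion
 * path is omitted from this sketch. */
static void translate_element(void *dst, const void *src, int copy_size)
{
   if (copy_size >= 0)
      memcpy(dst, src, (size_t)copy_size);
}
```

Computing the copy size once per attribute, rather than per vertex, keeps the
per-element path down to a single branch and a memcpy.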