gallivm: enhance special sse2 4x4f and 2x8f -> 1x16ub conversion
There's no good reason why it can't handle 2x4f->1x8ub, 1x4f->1x4ub and
1x8f->1x8ub cases, there might be legitimate reasons why we don't have
enough input vectors for a full destination vector, and using pack
intrinsics should still be much better than using generic conversion
(it looks like convert_alpha from the blend code might hit this though
I suspect it could be avoided).
v2: add another test vector format to lp_test_conv so this gets tested.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>