i965/fs: Enables 16-bit load_ubo with sampler
load_ubo is using 32-bit loads as uniforms surfaces have a 32-bit
surface format defined. So when reading 16-bit components with the
sampler we need to unshuffle two 16-bit components from each 32-bit
component.
Using the sampler avoids the use of the byte_scattered_read message
that needs one message for each component and is supposed to be
slower.
v2: (Jason Ekstrand)
- Simplify component selection and unshuffling for different bitsizes
- Remove SKL optimization of reading only two 32-bit components when
reading 16-bits types.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>