intel/fs: don't forget the stride at generate_shuffle
During generate_shuffle(), when we use byte sized registers we end up
with a destination stride of 2. We don't take the stride into
consideration when selecting the group offset for the last MOV
operation, which means we end up moving things to the wrong place,
leaving the last few channels untouched. Take the destination stride
in consideration so we don't miss the last channels.
v2: Assert this is not necessary for the IVB special case (Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>