intel/fs: make scan/reduce work with SIMD32 when it fits in 2 registers