x86: also fold remaining multi-vector-size shift insns

By slightly relaxing the checking in operand_type_register_match() we
can fold the vector shift insns with an XMM source as well. While
strictly speaking an overlap in just one size (see the code comment) is
not enough (both operands could have multiple sizes with just a single
common one), this is good enough for all templates we have, or which
could sensibly / usefully appear (within the scope of the present
operand matching model).

Tightening this a little would be possible, but would require
broadcast-related information to be passed into the function.
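
For illustration only, the relaxed check can be modelled roughly as
below. This is a simplified sketch, not the actual gas code: the SZ_*
flags, count_bits() and sizes_compatible() are hypothetical names, and
the real function works on i386_operand_type bitfields rather than plain
masks. It assumes the relaxation amounts to skipping the size comparison
when the two template operands have at most one size in common.

```c
#include <stdbool.h>

/* Hypothetical vector size flags (not the real operand type layout).  */
enum { SZ_XMM = 1u << 0, SZ_YMM = 1u << 1, SZ_ZMM = 1u << 2 };

/* Count the set bits in V.  */
static unsigned int
count_bits (unsigned int v)
{
  unsigned int n = 0;

  for (; v != 0; v &= v - 1)
    ++n;
  return n;
}

/* T0 / T1 are the sets of sizes the template permits for the two
   operands; G0 / G1 are the sizes of the operands actually given
   (one bit each).  */
static bool
sizes_compatible (unsigned int t0, unsigned int t1,
                  unsigned int g0, unsigned int g1)
{
  /* Each given operand must be permitted by its template slot.  */
  if (!(g0 & t0) || !(g1 & t1))
    return false;

  /* Relaxed rule (assumption): if the template operands share no more
     than a single size, do not compare the given sizes any further.  */
  if (count_bits (t0 & t1) <= 1)
    return true;

  /* Otherwise the given operands have to agree in size.  */
  return g0 == g1;
}
```

With a shift-like template (T0 = SZ_XMM, T1 = SZ_XMM | SZ_YMM | SZ_ZMM)
an XMM count combined with a ZMM destination is accepted, while a
template permitting all three sizes for both operands still rejects an
XMM / YMM mix. A made-up template with T0 = SZ_XMM | SZ_YMM and
T1 = SZ_YMM | SZ_ZMM would accept an XMM / ZMM pair, which is the
imprecision described above.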