The reason for the dramatic increase is that the reordering of each element in vec4 requires 2 bits per element, plus a predicate mask. This means a minimum of 3 bits per element: 12 bits for a vec4, and if there are 2 src operands this is a whopping 24 bits of immediate data, per instruction.
The reason for the dramatic increase is that the reordering of each element in vec4 requires 2 bits per element, plus a predicate mask. This means a minimum of 3 bits per element: 12 bits for a vec4, and if there are 2 src operands this is a whopping 24 bits of immediate data, per instruction.