3D GPU operations on batches of vec2, vec3 and vec4 often require the elements to be re-ordered in a fashion that is "out of lane" from the point of view of standard high-performance, non-GPU-centric Vector Processors. Examples include:
* Normalisation of XYZ vectors with respect to one dimension
* Alteration of ARGB pixel vectors with respect to opacity (A)
* Adjustment of YUV vectors with respect to luminosity
and many more. Lane-based Vector Processors that lack 2/3/4-element inter-lane crossing struggle with such data: it must be stored to memory and reloaded in the required order, which is prohibitively costly in instruction count, time, and power consumption.
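As an illustration, the following Python sketch models a swizzle as a per-subvector index selection over a flat, packed batch of vec4 elements. The `swizzle` and `premultiply_argb` names, the (A, R, G, B) element order, and the encoding of the selector as a tuple of source indices are all assumptions made for this example, not the definition of any actual instruction. It shows why ARGB premultiplication is awkward for a purely lane-based machine: the alpha value held in lane 4k must be copied into lanes 4k+1 to 4k+3 before the element-wise multiply can run.

```python
# Hypothetical reference model (not an actual instruction definition) of a
# swizzle applied independently to every vecN in a flat, packed register.
from typing import List, Sequence


def swizzle(flat: Sequence[float], width: int, selector: Sequence[int]) -> List[float]:
    """Re-order the elements of each `width`-element subvector of `flat`.

    `selector[j]` names which element of the source subvector lands in
    position j of the destination subvector, so selector=(0, 0, 0, 0)
    broadcasts element 0 across the whole group.
    """
    if len(flat) % width != 0:
        raise ValueError("flat length must be a multiple of the subvector width")
    out: List[float] = []
    for base in range(0, len(flat), width):
        group = list(flat[base:base + width])
        out.extend(group[i] for i in selector)
    return out


def premultiply_argb(pixels: Sequence[float]) -> List[float]:
    """Scale R, G and B by A for each pixel, assuming A is element 0."""
    # Broadcast A to all four slots of each pixel: an "out of lane" move,
    # because the value in lane 4k must reach lanes 4k+1..4k+3.
    alpha = swizzle(pixels, 4, (0, 0, 0, 0))
    # Leave A itself unchanged by scaling it by 1.0 instead of by A.
    scale = [1.0 if k % 4 == 0 else a for k, a in enumerate(alpha)]
    # After the swizzle, the multiply is purely lane-wise.
    return [p * s for p, s in zip(pixels, scale)]


if __name__ == "__main__":
    batch = [0.5, 1.0, 0.4, 0.2,   # pixel 0: A=0.5
             1.0, 0.3, 0.6, 0.9]   # pixel 1: A=1.0
    print(swizzle(batch, 4, (3, 2, 1, 0)))  # reverse element order per pixel
    print(premultiply_argb(batch))
```

Once the swizzle has placed A in every lane of its pixel, the remaining multiply is an ordinary lane-wise operation; it is only the re-ordering step that a lane-based Vector Processor without inter-lane crossing would have to route through memory.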