that can be structure-packed (LD/ST typically), REMAP may be applied to
literally any instruction: CRs, Arithmetic, Logical, LD/ST, anything.
-Note that REMAP does not *directly* apply to sub-vector elements: that
+Note that REMAP does not *directly* apply to sub-vector elements, but
+only to the group as a whole: that
is what swizzle is for. Swizzle *can* however be applied to the same
-instruction as REMAP. As explained in [[sv/mv.swizzle]], [[sv/mv.vec]] and the [[svp64/appendix]], Pack and Unpack EXTRA Mode bits
+instruction as REMAP. As explained in [[sv/mv.swizzle]]
+and the [[svp64/appendix]], Pack and Unpack EXTRA Mode bits
can extend down into Sub-vector elements to perform vec2/vec3/vec4
-sequential reordering, but even here, REMAP is not extended down to
-the actual sub-vector elements themselves.
+sequential reordering, but even here, REMAP is not *individually*
+extended down to the actual sub-vector elements themselves.
In its general form, REMAP is quite expensive to set up, and on some
implementations may introduce
latency, so should realistically be used only where it is worthwhile.
+Even allowing for such latency, a single instruction can request the
+issue of up to 127 operations, so REMAP should not be dismissed on the
+grounds of *potential* latency alone.
Commonly-used patterns such as Matrix Multiply, DCT and FFT have
helper instruction options which make REMAP easier to use.