being the most common), REMAP may be applied to
literally any instruction: CRs, Arithmetic, Logical, LD/ST, anything.
-When SUBVL is greater than 1 the group of Subvector
-elements are kept together, effectively the group becomes the
+When SUBVL is greater than 1, a given group of Subvector
+elements is kept together: effectively the group becomes the
element, and the group is REMAPed together.
Swizzle *can* however be applied to the same
instruction as REMAP, providing re-sequencing of
-Subvector elements that REMAP cannot. Also as explained in [[sv/mv.swizzle]], [[sv/mv.vec]] and the [[svp64/appendix]], Pack and Unpack EXTRA Mode bits
-can extend down into Sub-vector elements to perform vec2/vec3/vec4
+Subvector elements which REMAP cannot. Also, as explained in [[sv/mv.swizzle]], [[sv/mv.vec]] and the [[svp64/appendix]], Pack and Unpack Mode bits
+can extend down into Sub-vector elements to influence vec2/vec3/vec4
sequential reordering, but even here, REMAP is not extended down to
the actual sub-vector elements themselves.
In its general form, REMAP is quite expensive to set up, and on some
implementations may introduce
latency, so should realistically be used only where it is worthwhile.
+Given that most other ISAs require full loop-unrolling for Matrix,
+DCT and FFT, savings are still anticipated.
Commonly-used patterns such as Matrix Multiply, DCT and FFT have
helper instruction options which make REMAP easier to use.
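
The SUBVL behaviour described earlier (the group becomes the element, and the group is REMAPed together) can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the specification's pseudocode: `remapped_order` and its `remap` callback are hypothetical names, and the flat offset `group * subvl + sub` is an assumed layout chosen for illustration.

```python
def remapped_order(vl, subvl, remap):
    """Return flat element offsets in the order they would be visited.

    vl    -- number of vector elements (groups)
    subvl -- number of sub-elements in each group (1 to 4)
    remap -- hypothetical callback: maps a step 0..vl-1 to a group index
    """
    order = []
    for step in range(vl):
        # The remapping is applied once per group, never per sub-element.
        group = remap(step)
        for sub in range(subvl):
            # Sub-elements keep their original order within the group.
            # Assumed layout: groups are stored contiguously.
            order.append(group * subvl + sub)
    return order


# Example: reverse the order of four vec2 groups.
print(remapped_order(4, 2, lambda step: 3 - step))
# prints [6, 7, 4, 5, 2, 3, 0, 1]
```

In this model the pairs (6, 7), (4, 5) and so on always stay adjacent and in order. Only the sequence of groups changes, which is why re-sequencing inside a group is left to Swizzle rather than REMAP.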