REMAP allows the usual vector loop `0..VL-1` to be "reshaped" (re-mapped)
from a linear form to a 2D or 3D transposed form, or "offset" to permit
arbitrary access to elements (when elwidth overrides are used),
independently on each Vector src or dest register.
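To illustrate the "reshaping" idea, the sketch below maps each step of a linear loop `0..VL-1` to the element index of a 2D array walked in transposed (column-major) order. The names `remap_2d_transpose` and `remapped_order` and the `xdim`/`ydim` parameters are hypothetical and chosen only for this sketch. This is not the SVP64 REMAP hardware algorithm. It only shows the index arithmetic the concept rests on, and in real use each src or dest operand could be given its own mapping.

```python
def remap_2d_transpose(step: int, xdim: int, ydim: int) -> int:
    """Map a linear loop step to a row-major element index, walking column by column.

    Hypothetical helper: the array is ydim rows by xdim columns, stored row-major.
    """
    # Decompose the linear step as a column-major (transposed) walk.
    y = step % ydim   # row advances fastest
    x = step // ydim  # column advances once per full column
    # Convert (row y, column x) back to the row-major storage offset.
    return y * xdim + x


def remapped_order(vl: int, xdim: int, ydim: int) -> list[int]:
    """Element indices visited by a loop of length vl under the transpose remap."""
    # This sketch assumes VL exactly fills the 2D shape.
    if vl != xdim * ydim:
        raise ValueError("vl must equal xdim * ydim in this sketch")
    return [remap_2d_transpose(i, xdim, ydim) for i in range(vl)]


if __name__ == "__main__":
    # A 2-row by 3-column array stored as 0 1 2 / 3 4 5 is visited as 0 3 1 4 2 5.
    print(remapped_order(6, xdim=3, ydim=2))
```

The same transposed walk also performs the de-interleaving mentioned for RGB data: treating interleaved pixels `R0 G0 B0 R1 G1 B1` as 2 rows of 3 columns and visiting them in this order yields `R0 R1 G0 G1 B0 B1`.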
The primary initial motivation for REMAP was Matrix Multiplication, which
requires reordering sequential data in-place. Four SPRs are provided so that a
single FMAC may be used in a single loop to perform, for example, 4x4 times
4x4 Matrix multiplication, generating 64 FMACs. Additional uses include
regular "Structure Packing" such as RGB pixel data extraction and reforming.
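The 64-FMAC figure follows from 4 x 4 x 4 multiply-accumulates. The sketch below shows how one linear loop can drive a single multiply-accumulate when each operand (A, B and the accumulator C) gets its own remapping of the same loop counter. The names `matmul_schedule` and `remapped_matmul` are hypothetical and do not reflect the SVP64 SPR layout. In SVP64 the remapping is set up in hardware rather than computed in software as it is here.

```python
from collections.abc import Iterator


def matmul_schedule(n: int, m: int, p: int) -> Iterator[tuple[int, int, int]]:
    """Yield (a_idx, b_idx, c_idx) for every step of one linear loop.

    Hypothetical sketch: A is n-by-m, B is m-by-p, C is n-by-p, all row-major.
    """
    for step in range(n * m * p):
        # Decompose the linear step into (i, j, k), with k varying fastest.
        k = step % m
        j = (step // m) % p
        i = step // (m * p)
        # Each operand sees a different remap of the same step counter.
        yield i * m + k, k * p + j, i * p + j


def remapped_matmul(a: list[int], b: list[int], n: int, m: int, p: int) -> tuple[list[int], int]:
    """Compute C = A x B with one multiply-accumulate per loop step; return (C, step count)."""
    c = [0] * (n * p)
    fmac_count = 0
    for a_idx, b_idx, c_idx in matmul_schedule(n, m, p):
        # One fused multiply-accumulate: C[c_idx] += A[a_idx] * B[b_idx]
        c[c_idx] += a[a_idx] * b[b_idx]
        fmac_count += 1
    return c, fmac_count


if __name__ == "__main__":
    identity = [1 if r == col else 0 for r in range(4) for col in range(4)]
    result, count = remapped_matmul(identity, list(range(16)), 4, 4, 4)
    print(result, count)  # identity times B gives B back, using 64 steps
```

Keeping `k` as the fastest-varying index means consecutive steps accumulate into the same C element, which mirrors how a single FMAC loop can stay within one output before moving on.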