independently on each Vector src or dest
register. Aside from Indexed REMAP this is entirely Hardware-accelerated
reordering and consequently not costly in terms of register access. It
-will however place a burden on Multi-Issue systems but no more than if -
-exactly as if - the equivalent Scalar instructions were explicitly
-loop-unrolled without SVP64.
+will however place a burden on Multi-Issue systems, but no more than if
+the equivalent Scalar instructions were explicitly
+loop-unrolled without SVP64, and some advanced implementations may even find
+the Deterministic nature of the Scheduling to be easier on resources.
-The initial primary motivation of REMAP was for Matrix Multiplication, reordering of sequential
-data in-place: in-place DCT and FFT were easily justified given the
+The initial primary motivation for REMAP was Matrix Multiplication, reordering
+of sequential data in-place: in-place DCT and FFT were easily justified given their
high usage in Computer Science.
Four SPRs are provided which may be applied to any GPR, FPR or CR Field
so that for example a single FMAC may be
-used in a single loop to perform 5x3 times 3x4 Matrix multiplication,
+used in a single hardware-controlled 100% Deterministic loop to
+perform 5x3 times 3x4 Matrix multiplication,
generating 60 FMACs *without needing explicit assembler unrolling*.
Additional uses include regular "Structure Packing"
such as RGB pixel data extraction and reforming.
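
As an illustrative sketch only (plain Python, not SVP64 assembler and not
the REMAP hardware algorithm), the 5x3 times 3x4 case can be modelled as
one flat loop that issues a single multiply-accumulate per step, with a
generator supplying the three element offsets. The names `matmul_schedule`
and `matmul_single_fmac`, the row-major layout, and the iteration order are
assumptions made for this example; the order actually produced by REMAP
may differ.

```python
def matmul_schedule(rows_a, inner, cols_b):
    """Yield (c_idx, a_idx, b_idx) element offsets, one triple per FMAC.

    Hypothetical helper: assumes row-major storage and an i, j, k
    iteration order, neither of which is taken from the specification.
    """
    for i in range(rows_a):
        for j in range(cols_b):
            for k in range(inner):
                yield (i * cols_b + j,   # offset into result C
                       i * inner + k,    # offset into operand A
                       k * cols_b + j)   # offset into operand B


def matmul_single_fmac(a, b, rows_a, inner, cols_b):
    """Multiply flat matrices a and b using one multiply-accumulate per step."""
    c = [0.0] * (rows_a * cols_b)
    count = 0
    # Single flat loop: the schedule, not the loop body, decides which
    # elements each multiply-accumulate touches.
    for c_idx, a_idx, b_idx in matmul_schedule(rows_a, inner, cols_b):
        c[c_idx] += a[a_idx] * b[b_idx]
        count += 1
    return c, count
```

With `rows_a=5`, `inner=3` and `cols_b=4` the loop body runs 5 x 4 x 3 = 60
times, which matches the 60 FMACs quoted above.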