register.
The initial primary motivation of REMAP was for Matrix Multiplication, reordering of sequential
-data in-place. Four SPRs are provided so that a single FMAC may be
+data in-place. In-place DCT and FFT were also easily justified, given
+how heavily both are used in Computer Science.
+Four SPRs are provided so that a single FMAC may be
used in a single loop to perform for example 4x4 times 4x4 Matrix multiplication,
generating 64 FMACs. Additional uses include regular "Structure Packing"
such as RGB pixel data extraction and reforming.
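
As a rough illustration of the "single FMAC in a single loop" idea, here is a minimal Python sketch of a 4x4 times 4x4 multiply expressed as one flat loop, with three index schedules choosing the operands for each step. The function names (`matmul_schedules`, `remapped_matmul`) and the row-major i/j/k iteration order are assumptions made for this example only. They are not the actual schedule produced by the REMAP hardware or its SPRs.

```python
def matmul_schedules(n=4):
    """Yield (a_idx, b_idx, c_idx) flat offsets, one triple per FMAC.

    Hypothetical row-major schedule, chosen for illustration only.
    """
    for i in range(n):          # row of A and row of C
        for j in range(n):      # column of B and column of C
            for k in range(n):  # inner (reduction) dimension
                yield (i * n + k, k * n + j, i * n + j)


def remapped_matmul(a, b, n=4):
    """Multiply two flat n*n matrices using one multiply-accumulate per step."""
    c = [0.0] * (n * n)
    count = 0
    for a_idx, b_idx, c_idx in matmul_schedules(n):
        c[c_idx] += a[a_idx] * b[b_idx]  # the single FMAC
        count += 1
    return c, count
```

With `n=4` the loop body runs 4 * 4 * 4 = 64 times, which matches the 64 FMACs mentioned above. The matrices themselves stay in sequential storage, and only the offsets used to reach them change.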
the actual sub-vector elements themselves.
In its general form, REMAP is quite expensive to set up, and on some
-implementations introduce
+implementations may introduce
latency, so should realistically be used only where it is worthwhile.
Commonly-used patterns such as Matrix Multiply, DCT and FFT have
helper instruction options which make REMAP easier to use.
There are three types of REMAP:
-* **Matrix**, also known as 2D and 3D reshaping
+* **Matrix**, also known as 2D and 3D reshaping, which can perform
+ in-place Matrix transpose and rotation.
* **FFT/DCT**, with full triple-loop in-place support: limited to
Power-2 RADIX
* **Indexing**, for any general-purpose reordering, also includes