latency, so should realistically be used only where it is worthwhile.
Given that even with latency the fact that up to 127 operations
can be requested to be issued (from a single instruction) it should
-be clear that REMAP should not be dismissed for *potential* latency alone.
+be clear that REMAP should not be dismissed for *possible* latency alone.
Commonly-used patterns such as Matrix Multiply, DCT and FFT have
helper instruction options which make REMAP easier to use.