Additional savings come in the form of `SVREMAP`. This is a hardware
index transformation system where the normally sequentially-linear
-element access may be "Re-Mapped" to a limited but algorithmic-tailored
-and commonly-used deterministic schedule, for example Matrix Multiply,
+element access may be "Re-Mapped" to limited but algorithmically-tailored
+commonly-used deterministic schedules, for example Matrix Multiply,
DCT, or FFT. A full in-register-file 5x7 Matrix Multiply or a 3x4 or
2x6 may be performed in as little as 4 instructions, one of which
is to zero-initialise the accumulator Vector used to store the result.
unprecedented. RADIX2 in-place DCT Triple-loop Schedules may be created in
around 11 instructions. The only other processors well-known to have
this type of compact capability are both VLIW DSPs: TI's TMS320 Series
-and Qualcom's Hexagon.
+and Qualcomm's Hexagon, and both are targeted at FFTs only.
+
+There is no reason at all why future algorithmic schedules should not
+be proposed as extensions to SVP64 (sorting algorithms,
+compression algorithms, Sparse Data Sets, Graph Node walking
+for example). Bear in mind that
+the submission process will be
+entirely at the discretion of the OpenPOWER Foundation ISA WG;
+this is nonetheless encouraged and welcomed.
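
The "Re-Mapping" idea for Matrix Multiply can be illustrated in software. The following is a minimal Python sketch, not the actual SVP64 hardware schedule: the names `matrix_schedule` and `remapped_fma` are hypothetical, the row-major layout is an assumption, and the real element ordering produced by `SVREMAP` may well differ. It only shows how a single flat loop of multiply-accumulate operations can compute a Matrix Multiply once the element offsets come from a deterministic schedule rather than from sequential access.

```python
def matrix_schedule(rows, inner, cols):
    """Yield (result, a, b) flat element offsets for
    C[rows x cols] += A[rows x inner] * B[inner x cols].

    Illustrative ordering only (row-major, assumed); it stands in for
    the index transformation that the hardware would apply.
    """
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                yield (i * cols + j, i * inner + k, k * cols + j)


def remapped_fma(a, b, rows, inner, cols):
    """Apply one multiply-accumulate per schedule entry over flat vectors."""
    acc = [0] * (rows * cols)  # zero-initialised accumulator vector
    for c_idx, a_idx, b_idx in matrix_schedule(rows, inner, cols):
        # The operation itself is unchanged; only the offsets are re-mapped.
        acc[c_idx] += a[a_idx] * b[b_idx]
    return acc


# 2x3 matrix times 3x2 matrix, both stored as flat row-major vectors.
a = [1, 2, 3, 4, 5, 6]
b = [7, 8, 9, 10, 11, 12]
result = remapped_fma(a, b, rows=2, inner=3, cols=2)
```

Because the schedule is kept separate from the arithmetic, the `+=` and `*` in `remapped_fma` could be swapped for a different accumulate operation without touching `matrix_schedule`.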