From: lkcl <lkcl@web>
Date: Sat, 15 Apr 2023 22:30:26 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls009_v1~60
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=7fc74252d07be595d141ed62c0432e5fcaf8246d;p=libreriscv.git

---

diff --git a/openpower/sv/remap.mdwn b/openpower/sv/remap.mdwn
index 1fb8b3edf..2bc8a598f 100644
--- a/openpower/sv/remap.mdwn
+++ b/openpower/sv/remap.mdwn
@@ -38,19 +38,21 @@ that can be structure-packed (LD/ST and Move operations
 being the most common), REMAP may be applied to
 literally any instruction: CRs, Arithmetic, Logical, LD/ST, anything.
 
-When SUBVL is greater than 1 the group of Subvector
-elements are kept together, effectively the group becomes the
+When SUBVL is greater than 1 a given group of Subvector
+elements is kept together: effectively the group becomes the
 element, and the group is REMAPed together.
 Swizzle *can* however be applied to the same
 instruction as REMAP, providing re-sequencing of
-Subvector elements that REMAP cannot. Also as explained in [[sv/mv.swizzle]], [[sv/mv.vec]] and the [[svp64/appendix]], Pack and Unpack EXTRA Mode bits
-can extend down into Sub-vector elements to perform vec2/vec3/vec4
+Subvector elements which REMAP cannot. Also, as explained in [[sv/mv.swizzle]], [[sv/mv.vec]] and the [[svp64/appendix]], Pack and Unpack Mode bits
+can extend down into Sub-vector elements to influence vec2/vec3/vec4
 sequential reordering, but even here, REMAP is not extended down to
 the actual sub-vector elements themselves.
 
 In its general form, REMAP is quite expensive to set up, and on some
 implementations may introduce
 latency, so should realistically be used only where it is worthwhile.
+Given that most other ISAs require full loop-unrolling for Matrix,
+DCT and FFT, savings are still anticipated.
 Commonly-used patterns such as Matrix Multiply, DCT and FFT have
 helper instruction options which make REMAP easier to use.