From: lkcl <lkcl@web>
Date: Mon, 27 Mar 2023 23:50:40 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls001_v3~22
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=a6220790a87d0154b87c3830f8a491bd9046c19e;p=libreriscv.git

---

diff --git a/openpower/sv/rfc/ls009.mdwn b/openpower/sv/rfc/ls009.mdwn
index 755a80801..022193bf0 100644
--- a/openpower/sv/rfc/ls009.mdwn
+++ b/openpower/sv/rfc/ls009.mdwn
@@ -119,16 +119,21 @@ Vector ISAs which would typically only have a limited set of instructions
 that can be structure-packed (LD/ST typically), REMAP may be applied to
 literally any instruction: CRs, Arithmetic, Logical, LD/ST, anything.
 
-Note that REMAP does not *directly* apply to sub-vector elements: that 
+Note that REMAP applies only to the group, not *directly* to
+sub-vector elements: that
 is what swizzle is for.  Swizzle *can* however be applied to the same
-instruction as REMAP.  As explained in [[sv/mv.swizzle]], [[sv/mv.vec]] and the [[svp64/appendix]], Pack and Unpack EXTRA Mode bits
+instruction as REMAP.  As explained in [[sv/mv.swizzle]]
+and the [[svp64/appendix]], Pack and Unpack EXTRA Mode bits
 can extend down into Sub-vector elements to perform vec2/vec3/vec4
-sequential reordering, but even here, REMAP is not extended down to
-the actual sub-vector elements themselves.
+sequential reordering, but even here, REMAP is not *individually*
+extended down to the actual sub-vector elements themselves.
 
 In its general form, REMAP is quite expensive to set up, and on some
 implementations may introduce
 latency, so should realistically be used only where it is worthwhile.
+However, given that up to 127 operations can be requested to be
+issued from a single instruction, it should be clear that REMAP
+should not be dismissed for *potential* latency alone.
 Commonly-used patterns such as Matrix Multiply, DCT and FFT have
 helper instruction options which make REMAP easier to use.