From: lkcl
Date: Mon, 27 Mar 2023 23:50:40 +0000 (+0100)
Subject: (no commit message)
X-Git-Tag: opf_rfc_ls001_v3~22
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=a6220790a87d0154b87c3830f8a491bd9046c19e;p=libreriscv.git

---

diff --git a/openpower/sv/rfc/ls009.mdwn b/openpower/sv/rfc/ls009.mdwn
index 755a80801..022193bf0 100644
--- a/openpower/sv/rfc/ls009.mdwn
+++ b/openpower/sv/rfc/ls009.mdwn
@@ -119,16 +119,21 @@ Vector ISAs which would typically only have a limited set of instructions
 that can be structure-packed (LD/ST typically), REMAP may be applied to
 literally any instruction: CRs, Arithmetic, Logical, LD/ST, anything.
 
-Note that REMAP does not *directly* apply to sub-vector elements: that
+Note that REMAP does not *directly* apply to sub-vector elements but
+only to the group: that
 is what swizzle is for. Swizzle *can* however be applied to the same
-instruction as REMAP. As explained in [[sv/mv.swizzle]], [[sv/mv.vec]] and the [[svp64/appendix]], Pack and Unpack EXTRA Mode bits
+instruction as REMAP. As explained in [[sv/mv.swizzle]]
+and the [[svp64/appendix]], Pack and Unpack EXTRA Mode bits
 can extend down into Sub-vector elements to perform vec2/vec3/vec4
-sequential reordering, but even here, REMAP is not extended down to
-the actual sub-vector elements themselves.
+sequential reordering, but even here, REMAP is not *individually*
+extended down to the actual sub-vector elements themselves.
 
 In its general form, REMAP is quite expensive to set up, and on some
 implementations may introduce latency, so should realistically be used
 only where it is worthwhile.
+Given that even with latency the fact that up to 127 operations
+can be requested to be issued (from a single instruction) it should
+be clear that REMAP should not be dismissed for *potential* latency alone.
 
 Commonly-used patterns such as Matrix Multiply, DCT and FFT have
 helper instruction options which make REMAP easier to use.