From a6220790a87d0154b87c3830f8a491bd9046c19e Mon Sep 17 00:00:00 2001
From: lkcl
Date: Tue, 28 Mar 2023 00:50:40 +0100
Subject: [PATCH]

---
 openpower/sv/rfc/ls009.mdwn | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/openpower/sv/rfc/ls009.mdwn b/openpower/sv/rfc/ls009.mdwn
index 755a80801..022193bf0 100644
--- a/openpower/sv/rfc/ls009.mdwn
+++ b/openpower/sv/rfc/ls009.mdwn
@@ -119,16 +119,21 @@ Vector ISAs which would typically only have a limited set of instructions
 that can be structure-packed (LD/ST typically), REMAP may be applied to
 literally any instruction: CRs, Arithmetic, Logical, LD/ST, anything.
 
-Note that REMAP does not *directly* apply to sub-vector elements: that
+Note that REMAP does not *directly* apply to sub-vector elements but
+only to the group: that
 is what swizzle is for. Swizzle *can* however be applied to the same
-instruction as REMAP. As explained in [[sv/mv.swizzle]], [[sv/mv.vec]] and the [[svp64/appendix]], Pack and Unpack EXTRA Mode bits
+instruction as REMAP. As explained in [[sv/mv.swizzle]]
+and the [[svp64/appendix]], Pack and Unpack EXTRA Mode bits
 can extend down into Sub-vector elements to perform vec2/vec3/vec4
-sequential reordering, but even here, REMAP is not extended down to
-the actual sub-vector elements themselves.
+sequential reordering, but even here, REMAP is not *individually*
+extended down to the actual sub-vector elements themselves.
 
 In its general form, REMAP is quite expensive to set up, and on some
 implementations may introduce latency, so should realistically be used
 only where it is worthwhile.
+Given that even with latency the fact that up to 127 operations
+can be requested to be issued (from a single instruction) it should
+be clear that REMAP should not be dismissed for *potential* latency alone.
 Commonly-used patterns such as Matrix Multiply, DCT and FFT have
 helper instruction options which make REMAP easier to use.
 
-- 
2.30.2