From 9a2ca22b5583712ada3c20891a16ee830fa89e3d Mon Sep 17 00:00:00 2001
From: lkcl
Date: Thu, 14 Jan 2021 05:11:48 +0000
Subject: [PATCH]

---
 openpower/sv/remap.mdwn | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/openpower/sv/remap.mdwn b/openpower/sv/remap.mdwn
index 211afecb2..cc47e8159 100644
--- a/openpower/sv/remap.mdwn
+++ b/openpower/sv/remap.mdwn
@@ -8,6 +8,10 @@ access to elements, independently on each Vector src or dest register.
 Their primary use is for Matrix Multiplication, reordering of sequential data in-place. Four SPRs are provided so that a single FMAC may be used in a single loop to perform 4x4 times 4x4 Matrix multiplication, generating 64 FMACs.
 Additional uses include regular "Structure Packing" such as RGB pixel data extraction and reforming.
 
+REMAP, like all of SV, is abstracted out. Unlike traditional Vector ISAs, which typically allow only a limited set of instructions (usually LD/ST) to be structure-packed, REMAP may be applied to literally any instruction: CRs, Arithmetic, Logical, LD/ST, anything.
+
+Note that REMAP does not apply to sub-vector elements: that is what swizzle is for. Swizzle *can*, however, be applied to the same instruction as REMAP.
+
 # SHAPE 1D/2D/3D vector-matrix remapping SPRs
 
 There are four "shape" SPRs, SHAPE0-3, 32-bits in each,
@@ -18,7 +22,7 @@ which have the same format.
 The algorithm below shows how REMAP works more clearly, and may be
 executed as a python program:
 
-    xdim = 3
+    xdim = 3 # changeme
     ydim = 4
     zdim = 1
 
@@ -59,10 +63,8 @@ executed as a python program:
                 break
             idxs[order[i]] = 0
 
-Here, it is assumed that this algorithm be run within all pseudo-code
-throughout this document where a (parallelism) for-loop would normally
-run from 0 to VL-1 to refer to contiguous register
-elements; instead, where REMAP indicates to do so, the element index
+
+Instead, each element index from the for-loop `0..VL-1` is run through the above algorithm to work out the **actual** element index.
 
 Given that there are four possible SHAPE entries, up to four separate
 registers in any given operation may be simultaneously
@@ -74,8 +76,8 @@ remapped:
     for (i = 0; i < VL; i++)
         xSTATE.srcoffs = i # save context
         if (predval & 1<
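The hunks above show only fragments of the index algorithm (`xdim`, `ydim`, `zdim`, a `break` and the reset `idxs[order[i]] = 0`). What follows is a minimal, self-contained Python sketch of the same idea. It assumes that the full listing advances the x/y/z counters like an odometer, in the dimension order given by `order`, and wraps each counter at its limit. The names `remap_indices`, `lims` and `apply_remap` are hypothetical and are not taken from remap.mdwn.

```python
def remap_indices(xdim, ydim, zdim, order=(0, 1, 2)):
    """Yield the remapped element index for each step 0..xdim*ydim*zdim-1.

    Sketch only: `order` lists which dimension's counter advances
    fastest (0=x, 1=y, 2=z); (0, 1, 2) gives plain sequential indices.
    """
    lims = (xdim, ydim, zdim)  # assumed: wrap limit for each dimension
    idxs = [0, 0, 0]           # current x, y, z counters
    for _ in range(xdim * ydim * zdim):
        # linear element offset of the current (x, y, z) position
        yield idxs[0] + idxs[1] * xdim + idxs[2] * xdim * ydim
        # odometer-style increment, fastest-moving dimension first
        for i in range(3):
            idxs[order[i]] += 1
            if idxs[order[i]] != lims[order[i]]:
                break
            idxs[order[i]] = 0  # this dimension wrapped: carry into the next


def apply_remap(src, xdim, ydim, zdim, order=(0, 1, 2)):
    """Hypothetical helper: element i of the result is src[remap(i)],
    mimicking a 0..VL-1 loop whose element index has been remapped."""
    remap = list(remap_indices(xdim, ydim, zdim, order))
    return [src[remap[i]] for i in range(len(remap))]


if __name__ == "__main__":
    # y advances fastest: a 3x4 shape is walked column by column
    print(list(remap_indices(3, 4, 1, order=(1, 0, 2))))
    # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
    print(apply_remap(list("abcdef"), 3, 2, 1, order=(1, 0, 2)))
    # ['a', 'd', 'b', 'e', 'c', 'f']
```

With `order=(1, 0, 2)` the sketch produces a transpose: the 2-row by 3-column matrix `abc`/`def` is read out as `ad`, `be`, `cf`. Each SHAPE SPR would supply its own dimensions and order, which is why up to four operands can be remapped independently.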