# NOTE
-This section is under revision (and is optional)
-
-# REMAP CSR <a name="remap" />
-
-There is one 32-bit CSR which may be used to indicate which registers,
-if used in any operation, must be "reshaped" (re-mapped) from a linear
-form to a 2D or 3D transposed form, or "offset" to permit arbitrary
-access to elements within a register.
-
-Its primary use is for Matrix Multiplication: reordering sequential data in-place. Three SHAPE CSRs (described below) are provided so that a single FMAC may be used in a single loop to perform 4x4 times 4x4 Matrix multiplication, generating 64 FMACs.
-
-The 32-bit REMAP CSR may reshape up to 3 registers:
-
-| 29..28 | 27..26 | 25..24 | 23 | 22..16 | 15 | 14..8 | 7 | 6..0 |
-| ------ | ------ | ------ | -- | ------- | -- | ------- | -- | ------- |
-| shape2 | shape1 | shape0 | 0 | regidx2 | 0 | regidx1 | 0 | regidx0 |
-
-regidx0-2 refer not to the Register CSR CAM entry but to the underlying
-*real* register (see regidx, the value) and consequently are 7 bits wide.
-Reshaping x0 is clearly pointless, so a regidx of zero (referring to x0)
-is used to indicate "disabled".
-shape0-2 each select one of the three SHAPE CSRs. A value of 0x3 is reserved.
-Bits 7, 15, 23, 30 and 31 are also reserved, and must be set to zero.
-
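As a rough illustration of the field layout above, the following Python sketch splits a 32-bit REMAP value into its three (regidx, shape) pairs. The function name `decode_remap` and the choice to raise `ValueError` on reserved encodings are assumptions made for this example only; the text does not say how hardware reports such values.

```python
def decode_remap(csr):
    """Split a 32-bit REMAP value into three entries (hypothetical helper)."""
    # Bits 31, 30, 23, 15 and 7 are reserved and must be zero.
    if csr & 0xC0808080:
        raise ValueError("reserved bits must be zero")
    entries = []
    for n in range(3):
        regidx = (csr >> (8 * n)) & 0x7F      # regidx0 at 6..0, regidx1 at 14..8, ...
        shape = (csr >> (24 + 2 * n)) & 0x3   # shape0 at 25..24, shape1 at 27..26, ...
        if shape == 3:
            raise ValueError("shape value 0x3 is reserved")
        # regidx == 0 refers to x0, which is used to mean "disabled"
        entries.append({"regidx": regidx, "shape": shape, "enabled": regidx != 0})
    return entries

# regidx0=10 using SHAPE0, regidx1=20 using SHAPE1, third entry disabled
for entry in decode_remap(10 | (20 << 8) | (1 << 26) | (2 << 28)):
    print(entry)
```

The third entry in the usage example still carries a shape value, but with regidx set to zero it is treated as disabled.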
-It is anticipated that these specialist CSRs will not be used very often.
-Unlike the CSR Register and Predication tables, the REMAP CSRs use
-the full 7-bit regidx so that they can be set once and left alone,
-whilst the CSR Register entries pointing to them are disabled, instead.
-
-# SHAPE 1D/2D/3D vector-matrix remapping CSRs
-
-There are three "shape" CSRs, SHAPE0, SHAPE1 and SHAPE2, each 32 bits wide,
-which all have the same format. When a SHAPE CSR is set entirely to zeros,
-remapping is disabled: the register's elements are a linear (1D) vector.
-
-| 27..25 | 24..22  | 21..18 | 17..12 | 11..6  | 5..0   |
-| ------ | ------- | ------ | ------ | ------ | ------ |
-| invxyz | permute | offs   | zdimsz | ydimsz | xdimsz |
-
-invxyz will invert the start index of each of x, y or z. If invxyz[0] is zero then x counting begins from 0, otherwise it begins from the last element in the x dimension and iterates down to zero. Likewise for y and z.
-
-offs is a 4-bit field (bits 21..18 in the table above) which
-is added to the element index during the loop calculation. It is added prior to the dimensional remapping.
-
-xdimsz, ydimsz and zdimsz are offset by 1, such that a value of 0 indicates
-that the array dimensionality for that dimension is 1. A value of xdimsz=2
-would indicate that in the first dimension there are 3 elements in the
-array. The format of the array is therefore as follows:
-
-    array[xdim+1][ydim+1][zdim+1]
-
-However whilst illustrative of the dimensionality, that does not take the
-"permute" setting into account. "permute" may be any one of six values
-(0-5, with values of 6 and 7 being reserved, and not legal). The table
-below shows how the permutation dimensionality order works:
-
-| permute | order | array format |
-| ------- | ----- | ------------------------ |
-| 000 | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
-| 001 | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
-| 010 | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
-| 011 | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
-| 100 | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
-| 101 | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
-
-In other words, the "permute" option changes the order in which
-nested for-loops over the array would be done. The algorithm below
-shows this more clearly, and may be executed as a python program:
-
-    # mapidx = REMAP.shape2
-    xdim = 3 # SHAPE[mapidx].xdim_sz+1
-    ydim = 4 # SHAPE[mapidx].ydim_sz+1
-    zdim = 5 # SHAPE[mapidx].zdim_sz+1
-
-    lims = [xdim, ydim, zdim]
-    idxs = [0,0,0] # starting indices
-    order = [1,0,2] # experiment with different permutations, here
-    offs = 0 # experiment with different offsets, here
-    invxyz = [0,0,0] # set an entry to 1 to count that dimension downwards
-
-    for idx in range(xdim * ydim * zdim):
-        ix = [0,0,0] # per-dimension indices after optional inversion
-        for i in range(3):
-            ix[i] = idxs[i]
-            if invxyz[i]:
-                ix[i] = lims[i] - 1 - ix[i] # count down from the last element
-        new_idx = offs + ix[0] + ix[1] * xdim + ix[2] * xdim * ydim
-        print(new_idx)
-        for i in range(3): # step the indices in the permuted order
-            idxs[order[i]] = idxs[order[i]] + 1
-            if (idxs[order[i]] != lims[order[i]]):
-                break
-            print() # blank line marks the end of a dimension
-            idxs[order[i]] = 0
-
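To make the effect of "permute" concrete, the sketch below wraps the same index calculation in a function and applies it to a small 3x2 array. The function name `remap_indices` and its signature are invented for this illustration, and inversion is assumed to mean "last index of the dimension minus the current index".

```python
def remap_indices(lims, order, offs=0, invxyz=(0, 0, 0)):
    """Return the remapped element indices for one full pass (illustrative only)."""
    xdim, ydim, zdim = lims
    idxs = [0, 0, 0]
    out = []
    for _ in range(xdim * ydim * zdim):
        # optionally count a dimension downwards instead of upwards
        ix = [lims[i] - 1 - idxs[i] if invxyz[i] else idxs[i] for i in range(3)]
        out.append(offs + ix[0] + ix[1] * xdim + ix[2] * xdim * ydim)
        # step the per-dimension counters in the permuted order
        for d in order:
            idxs[d] += 1
            if idxs[d] != lims[d]:
                break
            idxs[d] = 0
    return out

print(remap_indices((3, 2, 1), (0, 1, 2)))  # [0, 1, 2, 3, 4, 5]
print(remap_indices((3, 2, 1), (1, 0, 2)))  # [0, 3, 1, 4, 2, 5]
```

With order 0,1,2 the elements are visited linearly; with order 1,0,2 the y counter steps first, so the same six elements are visited column by column, which is the transposed view of the array.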
-Here, it is assumed that this algorithm is run within all pseudo-code
-throughout this document wherever a (parallelism) for-loop would normally
-run from 0 to VL-1 to refer to contiguous register
-elements; where REMAP indicates to do so, the element index
-is instead run through the above algorithm to work out the **actual** element
-index. Given that there are three possible SHAPE entries, up to
-three separate registers in any given operation may be simultaneously
-remapped:
-
-    function op_add(rd, rs1, rs2) # add not VADD!
-      ...
-      ...
-      for (i = 0; i < VL; i++)
-        xSTATE.srcoffs = i # save context
-        if (predval & 1<<i) # predication uses intregs
-           ireg[rd+remap(id)] <= ireg[rs1+remap(irs1)] +
-                                 ireg[rs2+remap(irs2)];
-           if (!int_vec[rd ].isvector) break;
-        if (int_vec[rd ].isvector)  { id += 1; }
-        if (int_vec[rs1].isvector)  { irs1 += 1; }
-        if (int_vec[rs2].isvector)  { irs2 += 1; }
-
-By changing remappings, 2D matrices may be transposed "in-place" for one
-operation, followed by setting a different permutation order without
-having to move the values in the registers to or from memory. Also,
-the reason for having REMAP separate from the three SHAPE CSRs is so
-that in a chain of matrix multiplications and additions, for example,
-the SHAPE CSRs need only be set up once; only the REMAP CSR need be
-changed to target different registers.
-
-Note that:
-
-* Over-running the register file clearly has to be detected and
- an illegal instruction exception thrown
-* When non-default elwidths are set, the exact same algorithm still
- applies (i.e. it offsets elements *within* registers rather than
- entire registers).
-* If permute option 000 is utilised, the actual order of the
- reindexing does not change!
-* If two or more dimensions are set to zero, the actual order does not change!
-* The above algorithm is pseudo-code **only**. Actual implementations
- will need to take into account the fact that the element for-looping
- must be **re-entrant**, due to the possibility of exceptions occurring.
- See MSTATE CSR, which records the current element index.
-* Twin-predicated operations require **two** separate and distinct
- element offsets. The above pseudo-code algorithm will be applied
- separately and independently to each, should each of the two
- operands be remapped. *This even includes C.LDSP* and other operations
- in that category, where in that case it will be the **offset** that is
- remapped (see Compressed Stack LOAD/STORE section).
-* Offset is especially useful, on its own, for accessing elements
- within the middle of a register. Without offsets, it is necessary
- either to use a predicated MV, skipping the first elements, or
- to perform a LOAD/STORE cycle to memory.
- With offsets, the data does not have to be moved.
-* Setting the total elements (xdim+1) times (ydim+1) times (zdim+1) to
- less than MVL is **perfectly legal**, albeit very obscure. It permits
- entries to be regularly presented to operands **more than once**, thus
- allowing the same underlying registers to act as an accumulator of
- multiple vector or matrix operations, for example.
-
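The last point above, where the total number of elements is smaller than the vector length, can be sketched as follows. This assumes that the index algorithm is simply stepped once per element for VL iterations, which is an interpretation of the text rather than something it states outright; `remap_for_vl` is a hypothetical name.

```python
def remap_for_vl(lims, order, vl):
    """Step the remap counters vl times, letting them wrap (illustrative only)."""
    xdim, ydim, zdim = lims
    idxs = [0, 0, 0]
    out = []
    for _ in range(vl):
        out.append(idxs[0] + idxs[1] * xdim + idxs[2] * xdim * ydim)
        for d in order:
            idxs[d] += 1
            if idxs[d] != lims[d]:
                break
            idxs[d] = 0  # once every dimension has wrapped, counting restarts at 0
    return out

# a 2x2 shape (4 elements) driven by a vector length of 8
print(remap_for_vl((2, 2, 1), (0, 1, 2), 8))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Under that assumption each of the four underlying elements is presented twice, which is how the same registers could serve as an accumulator across repeated operations.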
-Clearly here some considerable care needs to be taken as the remapping
-could hypothetically create arithmetic operations that target the
-exact same underlying registers, resulting in data corruption due to
-pipeline overlaps. Out-of-order / Superscalar micro-architectures with
-register-renaming will have an easier time dealing with this than
-DSP-style SIMD micro-architectures.
-
+This section is **OBSOLETE** and superseded by [[openpower/sv/remap]]