# REMAP <a name="remap" />
+* <https://bugs.libre-soc.org/show_bug.cgi?id=143>
+
see [[sv/propagation]] because it is currently the only way to apply
REMAP.
REMAP is quite expensive to set up, and on some implementations introduce
latency, so should realistically be used only where it is worthwhile
+# Principle
+
+* normal vector element read/write as operands would be sequential
+ (0 1 2 3 ....)
+* this is not appropriate for (e.g.) Matrix multiply which requires
+ accessing elements in alternative sequences (0 3 6 1 4 7 ...)
+* normal Vector ISAs use either Indexed-MV or Indexed-LD/ST to "cope"
+ with this. both are expensive (copy large vectors, spill through memory)
+* REMAP **redefines** the order of access according to set "Schedules"
+
+Only the most commonly-used algorithms in computer science have REMAP
+support, due to the high cost in both the ISA and in hardware.
+
+# REMAP SPR
+
+| 0 | 2 | 4 | 6 | 8 | 10.14 | 15..23 |
+| -- | -- | -- | -- | -- | ----- | ------ |
+|mi0 |mi1 |mi2 |mo0 |mo1 | SVme | rsv |
+
+mi0-2 and mo0-1 each select SVSHAPE0-3 to apply to a given register.
+mi0-2 apply to RA, RB, RC respectively, as input registers, and
+likewise mo0-1 apply to output registers (FRT, FRS respectively).
+SVme is 5 bits, and indicates indicate whether the
+SVSHAPE is actively applied or not.
+
+* bit 0 of SVme indicates if mi0 is applied to RA / FRA
+* bit 1 of SVme indicates if mi1 is applied to RB / FRB
+* bit 2 of SVme indicates if mi2 is applied to RC / FRC
+* bit 3 of SVme indicates if mo0 is applied to RT / FRT
+* bit 4 of SVme indicates if mo1 is applied to Effective Address / FRS
+ (LD/ST-with-update has an implicit 2nd write register, RA)
+
+There is also a corresponding SVRM-Form for the svremap
+instruction which matches the above SPR:
+
+ |0 |6 |11 |13 |15 |17 |19 |21 | 22 |26 |31 |
+ | PO | SVme |mi0 | mi1 | mi2 | mo0 | mo1 | pst | rsvd | XO | / |
+
# SHAPE 1D/2D/3D vector-matrix remapping SPRs
There are four "shape" SPRs, SHAPE0-3, 32-bits in each,
The algorithm below shows how REMAP works more clearly, and may be
executed as a python program:
+
```
[[!inline quick="yes" raw="yes" pages="openpower/sv/remap.py" ]]
```
+An easier-to-read version (using python iterators) shows the loop nesting:
+
+```
+[[!inline quick="yes" raw="yes" pages="openpower/sv/remapyield.py" ]]
+```
+
Each element index from the for-loop `0..VL-1`
is run through the above algorithm to work out the **actual** element
index, instead. Given that there are four possible SHAPE entries, up to
register-renaming will have an easier time dealing with this than
DSP-style SIMD micro-architectures.
+## svstate instruction
+
+Please note: this is **not** intended for production. It sets up
+(overwrites) all required SVSHAPE SPRs and indicates that the
+*next instruction* shall have those REMAP shapes applied to it,
+assuming that instruction is of the form FRT,FRA,FRC,FRB.
+
+Form: SVM-Form SV "Matrix" Form (see [[isatables/fields.text]])
+
+| 0.5|6.10 |11.15 |16..20 | 21..25 | 25 | 26..30 |31| name |
+| -- | -- | --- | ----- | ------ | -- | ------ |--| -------- |
+|OPCD| SVxd | SVyd | SVzd | SVRM | vf | XO |/ | svstate |
+
+
+Fields:
+
+* **SVxd** - SV REMAP "xdim"
+* **SVyd** - SV REMAP "ydim"
+* **SVMM** - SV REMAP Mode (0b00000 for Matrix, 0b00001 for FFT)
+* **vf** - sets "Vertical-First" mode
+* **XO** - standard 5-bit XO field
+
# 4x4 Matrix to vec4 Multiply Example
The following settings will allow a 4x4 matrix (starting at f8), expressed
and applying a rotating 1D SHAPE SPR of xdim=16 to f8 in order to get
it to apply four times to compute the four columns worth of vectors.
+# Warshall transitive closure algorithm
+
+TODO move to [[sv/remap/discussion]] page, copied from here
+http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-July/003286.html
+
+with thanks to Hendrik.
+
+> Just a note: interpreting + as 'or', and * as 'and',
+> operating on Boolean matrices,
+> and having result, X, and Y be the exact same matrix,
+> updated while being used,
+> gives the traditional Warshall transitive-closure
+> algorithm, if the loops are nested exactly in thie order.
+
+this can be done with the ternary instruction which has
+an in-place triple boolean input:
+
+ RT = RT | (RA & RB)
+
+and also has a CR Field variant of the same
+
+notes from conversations:
+
+> > for y in y_r:
+> > for x in x_r:
+> > for z in z_r:
+> > result[y][x] +=
+> > a[y][z] *
+> > b[z][x]
+
+> This nesting of loops works for matrix multiply, but not for transitive
+> closure.
+
+> > it can be done:
+> >
+> > for z in z_r:
+> > for y in y_r:
+> > for x in x_r:
+> > result[y][x] +=
+> > a[y][z] *
+> > b[z][x]
+>
+> And this ordering of loops *does* work for transitive closure, when a,
+> b, and result are the very same matrix, updated while being used.
+>
+> By the way, I believe there is a graph algorithm that does the
+> transitive closure thing, but instead of using boolean, "and", and "or",
+> they use real numbers, addition, and minimum. I think that one computes
+> shortest paths between vertices.
+>
+> By the time the z'th iteration of the z loop begins, the algorithm has
+> already peocessed paths that go through vertices numbered < z, and it
+> adds paths that go through vertices numbered z.
+>
+> For this to work, the outer loop has to be the one on teh subscript that
+> bridges a and b (which in this case are teh same matrix, of course).
+
# SUBVL Remap
Remapping even of SUBVL (vec2/3/4) elements is permitted, as if the