REMAP is quite expensive to set up, and on some implementations introduce
latency, so should realistically be used only where it is worthwhile
+# Principle
+
+* normal vector element read/write as operands would be sequential
+ (0 1 2 3 ....)
+* this is not appropriate for (e.g.) Matrix multiply which requires
+ accessing elements in alternative sequences (0 3 6 1 4 7 ...)
+* normal Vector ISAs use either Indexed-MV or Indexed-LD/ST to "cope"
+ with this. both are expensive (copy large vectors, spill through memory)
+* REMAP **redefines** the order of access according to set "Schedules"
+
+Only the most commonly-used algorithms in computer science have REMAP
+support, due to the high cost in both the ISA and in hardware.
+
+# REMAP SPR
+
+| 0 | 2 | 4 | 6 | 8 | 10.14 | 15..23 |
+| -- | -- | -- | -- | -- | ----- | ------ |
+|mi0 |mi1 |mi2 |mo0 |mo1 | SVme | rsv |
+
+mi0-2 and mo0-1 each select SVSHAPE0-3 to apply to a given register.
+mi0-2 apply to RA, RB, RC respectively, as input registers, and
+likewise mo0-1 apply to output registers (FRT, FRS respectively).
+SVme is 5 bits, and indicates indicate whether the
+SVSHAPE is actively applied or not.
+
+* bit 0 of SVme indicates if mi0 is applied to RA / FRA
+* bit 1 of SVme indicates if mi1 is applied to RB / FRB
+* bit 2 of SVme indicates if mi2 is applied to RC / FRC
+* bit 3 of SVme indicates if mo0 is applied to RT / FRT
+* bit 4 of SVme indicates if mo1 is applied to Effective Address / FRS
+ (LD/ST-with-update has an implicit 2nd write register, RA)
+
+There is also a corresponding SVRM-Form for the svremap
+instruction which matches the above SPR:
+
+ |0 |6 |11 |13 |15 |17 |19 |21 | 22 |26 |31 |
+ | PO | SVme |mi0 | mi1 | mi2 | mo0 | mo1 | pst | rsvd | XO | / |
+
# SHAPE 1D/2D/3D vector-matrix remapping SPRs
There are four "shape" SPRs, SHAPE0-3, 32-bits in each,
register-renaming will have an easier time dealing with this than
DSP-style SIMD micro-architectures.
-## TEMPORARY HELPER
+## svstate instruction
Please note: this is **not** intended for production. It sets up
(overwrites) all required SVSHAPE SPRs and indicates that the
Form: SVM-Form SV "Matrix" Form (see [[isatables/fields.text]])
-| 0.5|6.10 |11.15 |16..20 | 21..30 |31| name |
-| -- | -- | --- | ------ | ------ |--| -------- |
-|OPCD| SVxd | SVyd | SVRM | XO |/ | svremap |
+| 0.5|6.10 |11.15 |16..20 | 21..25 | 25 | 26..30 |31| name |
+| -- | -- | --- | ----- | ------ | -- | ------ |--| -------- |
+|OPCD| SVxd | SVyd | SVzd | SVRM | vf | XO |/ | svstate |
+
Fields:
* **SVxd** - SV REMAP "xdim"
* **SVyd** - SV REMAP "ydim"
-* **SVMM** - SV REMAP Mode (currently 0b00000 only)
-* **XO** - standard 10-bit XO field
+* **SVMM** - SV REMAP Mode (0b00000 for Matrix, 0b00001 for FFT)
+* **vf** - sets "Vertical-First" mode
+* **XO** - standard 5-bit XO field
# 4x4 Matrix to vec4 Multiply Example
and applying a rotating 1D SHAPE SPR of xdim=16 to f8 in order to get
it to apply four times to compute the four columns worth of vectors.
+# Warshall transitive closure algorithm
+
+TODO move to [[sv/remap/discussion]] page, copied from here
+http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-July/003286.html
+
+with thanks to Hendrik.
+
+> Just a note: interpreting + as 'or', and * as 'and',
+> operating on Boolean matrices,
+> and having result, X, and Y be the exact same matrix,
+> updated while being used,
+> gives the traditional Warshall transitive-closure
+> algorithm, if the loops are nested exactly in thie order.
+
+this can be done with the ternary instruction which has
+an in-place triple boolean input:
+
+ RT = RT | (RA & RB)
+
+and also has a CR Field variant of the same
+
+notes from conversations:
+
+> > for y in y_r:
+> > for x in x_r:
+> > for z in z_r:
+> > result[y][x] +=
+> > a[y][z] *
+> > b[z][x]
+
+> This nesting of loops works for matrix multiply, but not for transitive
+> closure.
+
+> > it can be done:
+> >
+> > for z in z_r:
+> > for y in y_r:
+> > for x in x_r:
+> > result[y][x] +=
+> > a[y][z] *
+> > b[z][x]
+>
+> And this ordering of loops *does* work for transitive closure, when a,
+> b, and result are the very same matrix, updated while being used.
+>
+> By the way, I believe there is a graph algorithm that does the
+> transitive closure thing, but instead of using boolean, "and", and "or",
+> they use real numbers, addition, and minimum. I think that one computes
+> shortest paths between vertices.
+>
+> By the time the z'th iteration of the z loop begins, the algorithm has
+> already peocessed paths that go through vertices numbered < z, and it
+> adds paths that go through vertices numbered z.
+>
+> For this to work, the outer loop has to be the one on teh subscript that
+> bridges a and b (which in this case are teh same matrix, of course).
+
# SUBVL Remap
Remapping even of SUBVL (vec2/3/4) elements is permitted, as if the