(no commit message)

[libreriscv.git] / openpower / sv / remap.mdwn
diff --git a/openpower/sv/remap.mdwn b/openpower/sv/remap.mdwn

index 0086f4520baee3645728eeff8dc6902c5cda423c..bba5464169c863bcbbbf8868fd9536a4675c08a0 100644 (file)
--- a/openpower/sv/remap.mdwn
+++ b/openpower/sv/remap.mdwn
@@ -2,6 +2,8 @@
  
  # REMAP <a name="remap" />
  
+* <https://bugs.libre-soc.org/show_bug.cgi?id=143>
+
  see [[sv/propagation]] because it is currently the only way to apply
  REMAP.  
  
@@ -28,6 +30,44 @@ as REMAP.
  REMAP is quite expensive to set up, and on some implementations introduce
  latency, so should realistically be used only where it is worthwhile
  
+# Principle
+
+* normal vector element read/write as operands would be sequential
+  (0 1 2 3 ....)
+* this is not appropriate for (e.g.) Matrix multiply which requires
+  accessing elements in alternative sequences (0 3 6 1 4 7 ...)
+* normal Vector ISAs use either Indexed-MV or Indexed-LD/ST to "cope"
+  with this.  both are expensive (copy large vectors, spill through memory)
+* REMAP **redefines** the order of access according to set "Schedules"
+
+Only the most commonly-used algorithms in computer science have REMAP
+support, due to the high cost in both the ISA and in hardware.
+
+# REMAP SPR
+
+| 0  | 2  | 4  | 6  | 8  | 10.14 | 15..23 |
+| -- | -- | -- | -- | -- | ----- | ------ |
+|mi0 |mi1 |mi2 |mo0 |mo1 | SVme  | rsv    |
+
+mi0-2 and mo0-1 each select SVSHAPE0-3 to apply to a given register.
+mi0-2 apply to RA, RB, RC respectively, as input registers, and
+likewise mo0-1 apply to output registers (FRT, FRS respectively).
+SVme is 5 bits, and indicates indicate whether the
+SVSHAPE is actively applied or not.
+
+* bit 0 of SVme indicates if mi0 is applied to RA / FRA
+* bit 1 of SVme indicates if mi1 is applied to RB / FRB
+* bit 2 of SVme indicates if mi2 is applied to RC / FRC
+* bit 3 of SVme indicates if mo0 is applied to RT / FRT
+* bit 4 of SVme indicates if mo1 is applied to Effective Address / FRS
+  (LD/ST-with-update has an implicit 2nd write register, RA)
+
+There is also a corresponding SVRM-Form for the svremap
+instruction which matches the above SPR:
+
+    |0     |6     |11  |13   |15   |17   |19   |21    | 22    |26     |31 |
+    | PO   | SVme |mi0 | mi1 | mi2 | mo0 | mo1 | pst  | rsvd | XO    | / |
+
  # SHAPE 1D/2D/3D vector-matrix remapping SPRs
  
  There are four "shape" SPRs, SHAPE0-3, 32-bits in each,
@@ -37,10 +77,17 @@ which have the same format.
  
  The algorithm below shows how REMAP works more clearly, and may be
  executed as a python program:
+
  ```
  [[!inline quick="yes" raw="yes" pages="openpower/sv/remap.py" ]]
  ```
  
+An easier-to-read version (using python iterators) shows the loop nesting:
+
+```
+[[!inline quick="yes" raw="yes" pages="openpower/sv/remapyield.py" ]]
+```
+
  Each element index from the for-loop `0..VL-1`
  is run through the above algorithm to work out the **actual** element
  index, instead.  Given that there are four possible SHAPE entries, up to
@@ -109,6 +156,28 @@ pipeline overlaps.  Out-of-order / Superscalar micro-architectures with
  register-renaming will have an easier time dealing with this than
  DSP-style SIMD micro-architectures.
  
+## svstate instruction
+
+Please note: this is **not** intended for production.  It sets up
+(overwrites) all required SVSHAPE SPRs and indicates that the
+*next instruction* shall have those REMAP shapes applied to it,
+assuming that instruction is of the form FRT,FRA,FRC,FRB.
+
+Form: SVM-Form SV "Matrix" Form (see [[isatables/fields.text]])
+
+| 0.5|6.10  |11.15  |16..20 | 21..25 | 25 | 26..30 |31|  name    |
+| -- | --   | ---   | ----- | ------ | -- | ------ |--| -------- |
+|OPCD| SVxd | SVyd  | SVzd  | SVRM   | vf | XO     |/ | svstate  |
+
+
+Fields:
+
+* **SVxd** - SV REMAP "xdim"
+* **SVyd** - SV REMAP "ydim"
+* **SVMM** - SV REMAP Mode (0b00000 for Matrix, 0b00001 for FFT)
+* **vf** - sets "Vertical-First" mode
+* **XO** - standard 5-bit XO field
+
  # 4x4 Matrix to vec4 Multiply Example
  
  The following settings will allow a 4x4 matrix (starting at f8), expressed
@@ -158,6 +227,63 @@ setting VL=64, using an extra dimension on the SHAPE0 and SHAPE1 SPRs,
  and applying a rotating 1D SHAPE SPR of xdim=16 to f8 in order to get
  it to apply four times to compute the four columns worth of vectors.
  
+# Warshall transitive closure algorithm
+
+TODO move to [[sv/remap/discussion]] page, copied from here
+http://lists.libre-soc.org/pipermail/libre-soc-dev/2021-July/003286.html
+
+with thanks to Hendrik.
+
+> Just a note:  interpreting + as 'or', and * as 'and',
+> operating on Boolean matrices, 
+> and having result, X, and Y be the exact same matrix,
+> updated while being used,
+> gives the traditional Warshall transitive-closure
+> algorithm, if the loops are nested exactly in thie order.
+
+this can be done with the ternary instruction which has
+an in-place triple boolean input:
+
+    RT = RT | (RA & RB)
+
+and also has a CR Field variant of the same
+
+notes from conversations:
+
+> > for y in y_r:
+> >  for x in x_r:
+> >    for z in z_r:
+> >      result[y][x] +=
+> >         a[y][z] *
+> >         b[z][x]
+
+> This nesting of loops works for matrix multiply, but not for transitive
+> closure. 
+
+> > it can be done:
+> >
+> >   for z in z_r:
+> >    for y in y_r:
+> >     for x in x_r:
+> >       result[y][x] +=
+> >          a[y][z] *
+> >          b[z][x]
+>
+> And this ordering of loops *does* work for transitive closure, when a,
+> b, and result are the very same matrix, updated while being used.
+>
+> By the way, I believe there is a graph algorithm that does the
+> transitive closure thing, but instead of using boolean, "and", and "or",
+> they use real numbers, addition, and minimum.  I think that one computes
+> shortest paths between vertices.
+>
+> By the time the z'th iteration of the z loop begins, the algorithm has
+> already peocessed paths that go through vertices numbered < z, and it
+> adds paths that go through vertices numbered z.
+>
+> For this to work, the outer loop has to be the one on teh subscript that
+> bridges a and b (which in this case are teh same matrix, of course).
+
  # SUBVL Remap
  
  Remapping even of SUBVL (vec2/3/4) elements is permitted, as if the