From: Andrey Miroshnikov <andrey@technepisteme.xyz>
Date: Thu, 9 Nov 2023 19:17:52 +0000 (+0000)
Subject: Clean up remap matrix instruction sections
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=353c9900c7128084f26a8e23c20a7340ed0f255f;p=libreriscv.git

Clean up remap matrix instruction sections
---

diff --git a/openpower/sv/cookbook/remap_matrix.mdwn b/openpower/sv/cookbook/remap_matrix.mdwn
index 701c9c01c..0136034d2 100644
--- a/openpower/sv/cookbook/remap_matrix.mdwn
+++ b/openpower/sv/cookbook/remap_matrix.mdwn
@@ -214,71 +214,54 @@ ISACaller)*
 
 The `svshape` instruction is a convenient way to access the `SVSHAPE` Special
 Purpose Registers (SPRs), which were added alongside the SVP64 looping
-system for complex element indexing. Without having "Re-shaping" SPRs, only the most
-basic, consecuting indexing of register elements (0,1,2,3...) would
-be possible.
+system for complex element indexing. Without the "Re-shaping" SPRs,
+only the most basic, consecutive indexing of register elements (0,1,2,3...)
+would be possible.
+
+The REMAP system has 16 modes, all of which are accessible through the
+`svshape` instruction. However, for the purposes of this guide, only `SVrm=0`
+(Matrix Multiply) will be covered.
 
 ### SVSHAPE Remapping SPRs
 
 * See [[sv/remap]] for the full break down of SPRs `SVSHAPE0-3`.
 * Pseudo-code for the
 [svshape instruction](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/simplev.mdwn;h=33a02e#l120)
+* Pseudo-code is also available on the wiki: [[openpower/sv/appendix]].
 
 Matrix Multiply utilises SVSHAPE0-2 SPRs.
 
-(NOTE: This section is duplicated from the remap spec, and thus will be re-worked.)
-
-SVSHAPE0 SPR:
-
-|0:5   |6:11  | 12:17   | 18:20   | 21:23   |24:27 |28:29  |30:31|
-|----- |----- | ------- | ------- | ------  |------|------ |---- |
-|xdimsz|ydimsz| zdimsz  | permute | invxyz  |offset|skip   |mode |
-
-
-skip:
-
-- 0b00 indicates no dimensions to be skipped
-- 0b01 - skip '1st dim'
-- 0b10 - skip '2nd dim'
-- 0b11 - skip '3rd dim'
-
-invxyz (3-bit; 1 for x, 1 for y, 1 for z):
-
-- If corresponding dim bit is zero, start index from zero and increment
-- If bit set, start from xdimsz-1 (x dimension size, or whichever dimension
-bit is being looked at) and decrement down to zero.
-
-offset is used to offset the result by `offset` elements (important for when
-using element width overrides are used).
-
-xdimsz, ydimsz, zdimsz are offset by 1, such that 0-0b111111 correspond to
-1-64. A value of xdimsz=2 would indicate that in the first dimension there are
-3 elements in the array.
-
-With the example Matrix X (2 rows, 3 columns, or 2x3 matrix), xdimsz=1,
-ydimsz=2, zdimsz=0.
+The index table shown for the inner method above shows indices for a 'flattened'
+matrix (how it would be arranged in sequential GPR registers), whereas the
+SVSHAPE0, 1 and 2 registers set up the indices in relation to the rows and
+columns of the matrix.
 
-permute setting:
+This is how the indices compare:
 
-| permute | order | array format             |
-| ------- | ----- | ------------------------ |
-| 000     | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
-| 001     | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
-| 010     | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
-| 011     | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
-| 100     | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
-| 101     | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
-| 110     | 0,1   | Indexed (xdim+1)(ydim+1) |
-| 111     | 1,0   | Indexed (ydim+1)(xdim+1) |
+```
+                             Row/Column Indices
+    Flattened Indices     | Mat X | Mat Y | Mat Z |
+| Mat X | Mat Y | Mat Z | | r   c | r   c | r   c |
+|   0   |   0   |   0   | | 0   0 | 0   0 | 0   0 |
+|   0   |   1   |   1   | | 0   0 | 0   1 | 0   1 |
+|   3   |   0   |   2   | | 1   0 | 0   0 | 1   0 |
+|   3   |   1   |   3   | | 1   0 | 0   1 | 1   1 |
+|   1   |   2   |   0   | | 0   1 | 1   0 | 0   0 |
+|   1   |   3   |   1   | | 0   1 | 1   1 | 0   1 |
+|   4   |   2   |   2   | | 1   1 | 1   0 | 1   0 |
+|   4   |   3   |   3   | | 1   1 | 1   1 | 1   1 |
+|   2   |   4   |   0   | | 0   2 | 2   0 | 0   0 |
+|   2   |   5   |   1   | | 0   2 | 2   1 | 0   1 |
+|   5   |   4   |   2   | | 1   2 | 2   0 | 1   0 |
+|   5   |   5   |   3   | | 1   2 | 2   1 | 1   1 |
+```
 
-Permute re-arranges the order of the nested for-loops used to iterate over the
-three dimensions. This allows for in-place transpose, in-place rotate, matrix
-multiply, convolutions, without the limitation of Power-of-Two matrices.
+These row/column indices are converted to flattened indices when they are
+actually used by the SVP64 loop (during the `maddld` hot loop).
 
-For normal matrix multiply, the permute setting is 0b010 (order 1,0,2,
-or swap x and y loops).
+See [[openpower/sv/remap]] Section 3.3 Matrix Mode for more information on
+the index sequences which can be produced with SVSHAPE SPRs.
 
-(*NOTE:* This is done automatically by the Matrix-Multiply REMAP mode, `SVRM=0`.)
 
 ### Limitations of Matrix REMAP
 
@@ -303,9 +286,9 @@ multiply:
 
 breakdown:
 
-- SVxd=2, SVyd=2, SVzd=3
-- SVRM=0 (Matrix mode, uses `SVSHAPE0` SPR)
-- vf=0 (not using Vertical-First mode)
+- `SVxd=2`, `SVyd=2`, `SVzd=3`
+- `SVrm=0` (Matrix mode, uses `SVSHAPE0` SPR)
+- `vf=0` (not using Vertical-First mode)
 
 To determine the `SVxd`/`SVyd`/`SVzd` settings:
 
@@ -325,13 +308,23 @@ Table form
     SVzd | mat_X_num_cols OR mat_Y_num_rows
 ```
 
+The `svshape` instruction will do the following (for Matrix Multiply REMAP):
+
+- The vector length `VL` of the SVP64 loop is determined from the three
+dimensions: `VL <- xd * yd * zd`. For this example `VL` is 12, since 12
+multiply-add accumulates are needed to fully compute the result matrix.
+- `SVSHAPE0`, `SVSHAPE1`, and `SVSHAPE2` SPRs are configured with the x/y/z
+dimensions. As each SVSHAPE register supports three sets of indices (three
+loops), the third index z is skipped (because we're dealing with 2d matrices).
+- Other modifications done by `svshape` (such as `SVSTATE` SPR) are
+out-of-scope for this document.
+
 ## SVREMAP
 
-SVRM-Form:
+* See [[sv/remap]] for the `svremap` instruction.
 
-|0     |6     |11  |13   |15   |17   |19   |21    | 22:25 |26:31  |
-| --   | --   | -- | --  | --  | --  | --  | --   | ----  | ----- |
-| PO   | SVme |mi0 | mi1 | mi2 | mo0 | mo1 | pst  | rsvd  | XO    |
+Assigns the configured SVSHAPEs to the relevant operand/result registers
+of the following instruction/s (depending on whether REMAP is set to persistent).
 
 * svremap SVme,mi0,mi1,mi2,mo0,mo1,pst
 
@@ -348,59 +341,33 @@ will have REMAP applied.
 - `mix/mox` fields determine which shape is applied to the activated register
 - `mi0=1`, instruction operand RA has SVSHAPE1 applied to it.
 - `mi1=2`, instruction operand RB has SVSHAPE2 applied to it.
-- `mi2=3`, instruction operand RA has SVSHAPE3 applied to it.
+- `mi2=3`, instruction operand RC has SVSHAPE3 applied to it.
 - `mo0=0`, instruction result RT has SVSHAPE0 applied to it.
-- `mo1=0`, instruction result EA/FRS has SVSHAPE0 applied to it. *(not applicable
-for this example)*
+- `mo1=0`, instruction result EA/FRS has SVSHAPE0 applied to it.
+*(not applicable for this example)*
 - `pst=0`, if set, REMAP remains enabled until explicitly disabled, or another
 REMAP, or setvl is setup.
 
-Assigns the configured SVSHAPEs to the relevant operand/result registers
-of the consecutive instruction/s (depending on if REMAP is set to persistent).
-
-The index table shown for the inner method above shows indices for a 'flattened'
-matrix (how it would be arranged in sequential GPR registers), whereas
-SVSHAPE0, 1, 2 registers setup the indices in relation to rows and columns
-of the matrix.
-
-This is how the indices compare:
 
-```
-                             Row/Column Indices
-    Flattened Indices     | Mat X | Mat Y | Mat Z |
-| Mat X | Mat Y | Mat Z | | r   c | r   c | r   c |
-|   0   |   0   |   0   | | 0   0 | 0   0 | 0   0 |
-|   0   |   1   |   1   | | 0   0 | 0   1 | 0   1 |
-|   3   |   0   |   2   | | 1   0 | 0   0 | 1   0 |
-|   3   |   1   |   3   | | 1   0 | 0   1 | 1   1 |
-|   1   |   2   |   0   | | 0   1 | 1   0 | 0   0 |
-|   1   |   3   |   1   | | 0   1 | 1   1 | 0   1 |
-|   4   |   2   |   2   | | 1   1 | 1   0 | 1   0 |
-|   4   |   3   |   3   | | 1   1 | 1   1 | 1   1 |
-|   2   |   4   |   0   | | 0   2 | 2   0 | 0   0 |
-|   2   |   5   |   1   | | 0   2 | 2   1 | 0   1 |
-|   5   |   4   |   2   | | 1   2 | 2   0 | 1   0 |
-|   5   |   5   |   3   | | 1   2 | 2   1 | 1   1 |
-```
+## maddld - Multiply-Add Low Doubleword VA-form
 
-See [[openpower/sv/remap]] Section 3.3 Matrix Mode for more information on
-the index sequences which can be produced with SVSHAPE SPRs.
+A standard instruction available since version 3.0 of PowerISA.
 
-## maddld - Multiply-Add Low Doubleword VA-form
+This instruction can be used as a multiply-add accumulate by setting the
+third operand to be the same as the result register, which functions as
+an accumulator.
 
 ```
     sv.maddld *0, *16, *32, *0
 ```
 
-A standard instruction available since version 3.0 of PowerISA.
-
-*Temporary note:* maddld (Multiply-Add Low Doubleword) in the 3.1b version
-of the PowerISA spec is in the Linux Compliancy Subset, not SFS or SFFS.
-See page 1477 of the document, or page 1503 of the pdf.
+breakdown:
 
-This instruction can be used as a multiply-add accumulate by setting the
-third operand to be the same as the result register, which functions as
-an accumulator.
+- Store result (RT) of the operation starting at register 0.
+- Operands RA and RB correspond to the two operand matrices, starting at
+register 16 and register 32 respectively.
+- The third operand RC is the same as the result register, which gives the
+multiply-add accumulate behaviour.
 
 ## Appendix
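
Illustration for the index table and the `VL <- xd * yd * zd` rule added in the hunks above (a sketch, not part of the patch). It is a plain Python model, not the ISACaller simulator: the function names, the 64-entry register list and the matrix values are all hypothetical. It assumes row-major matrices, a 2x3 Mat X read through RA, a 3x2 Mat Y read through RB, and the 2x2 result Mat Z accumulated through RT/RC, as in `sv.maddld *0, *16, *32, *0`.

```python
MASK64 = (1 << 64) - 1  # maddld keeps only the low doubleword of the result


def matmul_remap_indices(x_rows, x_cols, y_cols):
    """Return one (x_idx, y_idx, z_idx) tuple of flattened offsets per loop step."""
    steps = []
    for k in range(x_cols):          # shared dimension (Mat X columns / Mat Y rows)
        for i in range(x_rows):      # row of Mat X and Mat Z
            for j in range(y_cols):  # column of Mat Y and Mat Z
                steps.append((i * x_cols + k, k * y_cols + j, i * y_cols + j))
    return steps


def model_maddld_loop(regs, steps, rt=0, ra=16, rb=32):
    """Hypothetical model: RT[z] <- low64(RA[x] * RB[y] + RT[z]) for each step."""
    for x_idx, y_idx, z_idx in steps:
        product = regs[ra + x_idx] * regs[rb + y_idx]
        regs[rt + z_idx] = (product + regs[rt + z_idx]) & MASK64
    return regs


steps = matmul_remap_indices(x_rows=2, x_cols=3, y_cols=2)
regs = [0] * 64                       # hypothetical flat GPR file
regs[16:22] = [1, 2, 3, 4, 5, 6]      # made-up Mat X (2x3), row-major
regs[32:38] = [7, 8, 9, 10, 11, 12]   # made-up Mat Y (3x2), row-major
model_maddld_loop(regs, steps)

print(len(steps))                 # 12, the VL for this example
print([s[0] for s in steps])      # [0, 0, 3, 3, 1, 1, 4, 4, 2, 2, 5, 5]
print(regs[0:4])                  # [58, 64, 139, 154]
```

The three columns of `steps` match the "Flattened Indices" columns of the table, and the accumulated values in registers 0 to 3 equal the ordinary matrix product of the two made-up operands.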