The `svshape` instruction is a convenient way to access the `SVSHAPE` Special
Purpose Registers (SPRs), which were added alongside the SVP64 looping
-system for complex element indexing. Without having "Re-shaping" SPRs, only the most
-basic, consecuting indexing of register elements (0,1,2,3...) would
-be possible.
+system for complex element indexing. Without the "Re-shaping" SPRs,
+only the most basic, consecutive indexing of register elements (0,1,2,3...)
+would be possible.
+
+The REMAP system has 16 modes, all of which are accessible through the
+`svshape` instruction. However, for the purposes of this guide, only `SVrm=0`
+(Matrix Multiply) will be covered.
### SVSHAPE Remapping SPRs
* See [[sv/remap]] for the full break down of SPRs `SVSHAPE0-3`.
* Pseudo-code for the
[svshape instruction](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/simplev.mdwn;h=33a02e#l120)
+* Pseudo-code is also available on the wiki: [[openpower/sv/appendix]].
Matrix Multiply utilises SVSHAPE0-2 SPRs.
-(NOTE: This section is duplicated from the remap spec, and thus will be re-worked.)
-
-SVSHAPE0 SPR:
-
-|0:5 |6:11 | 12:17 | 18:20 | 21:23 |24:27 |28:29 |30:31|
-|----- |----- | ------- | ------- | ------ |------|------ |---- |
-|xdimsz|ydimsz| zdimsz | permute | invxyz |offset|skip |mode |
-
-
-skip:
-
-- 0b00 indicates no dimensions to be skipped
-- 0b01 - skip '1st dim'
-- 0b10 - skip '2nd dim'
-- 0b11 - skip '3rd dim'
-
-invxyz (3-bit; 1 for x, 1 for y, 1 for z):
-
-- If corresponding dim bit is zero, start index from zero and increment
-- If bit set, start from xdimsz-1 (x dimension size, or whichever dimension
-bit is being looked at) and decrement down to zero.
-
-offset is used to offset the result by `offset` elements (important for when
-using element width overrides are used).
-
-xdimsz, ydimsz, zdimsz are offset by 1, such that 0-0b111111 correspond to
-1-64. A value of xdimsz=2 would indicate that in the first dimension there are
-3 elements in the array.
-
-With the example Matrix X (2 rows, 3 columns, or 2x3 matrix), xdimsz=1,
-ydimsz=2, zdimsz=0.
+The index table shown for the inner method above lists indices for a 'flattened'
+matrix (how it would be arranged in sequential GPR registers), whereas the
+`SVSHAPE0`, `SVSHAPE1` and `SVSHAPE2` registers set up the indices in terms of
+the rows and columns of the matrix.
-permute setting:
+This is how the indices compare:
-| permute | order | array format |
-| ------- | ----- | ------------------------ |
-| 000 | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) |
-| 001 | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) |
-| 010 | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) |
-| 011 | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) |
-| 100 | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) |
-| 101 | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) |
-| 110 | 0,1 | Indexed (xdim+1)(ydim+1) |
-| 111 | 1,0 | Indexed (ydim+1)(xdim+1) |
+```
+ Row/Column Indices
+ Flattened Indices | Mat X | Mat Y | Mat Z |
+| Mat X | Mat Y | Mat Z | | r c | r c | r c |
+| 0 | 0 | 0 | | 0 0 | 0 0 | 0 0 |
+| 0 | 1 | 1 | | 0 0 | 0 1 | 0 1 |
+| 3 | 0 | 2 | | 1 0 | 0 0 | 1 0 |
+| 3 | 1 | 3 | | 1 0 | 0 1 | 1 1 |
+| 1 | 2 | 0 | | 0 1 | 1 0 | 0 0 |
+| 1 | 3 | 1 | | 0 1 | 1 1 | 0 1 |
+| 4 | 2 | 2 | | 1 1 | 1 0 | 1 0 |
+| 4 | 3 | 3 | | 1 1 | 1 1 | 1 1 |
+| 2 | 4 | 0 | | 0 2 | 2 0 | 0 0 |
+| 2 | 5 | 1 | | 0 2 | 2 1 | 0 1 |
+| 5 | 4 | 2 | | 1 2 | 2 0 | 1 0 |
+| 5 | 5 | 3 | | 1 2 | 2 1 | 1 1 |
+```
-Permute re-arranges the order of the nested for-loops used to iterate over the
-three dimensions. This allows for in-place transpose, in-place rotate, matrix
-multiply, convolutions, without the limitation of Power-of-Two matrices.
+These row/column indices are converted to the flattened indices when they
+are actually used during SVP64 looping (the `maddld` hot loop).
-For normal matrix multiply, the permute setting is 0b010 (order 1,0,2,
-or swap x and y loops).
+See [[openpower/sv/remap]] Section 3.3 Matrix Mode for more information on
+the index sequences which can be produced with SVSHAPE SPRs.
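
The relationship between the two halves of the table can be sketched in plain Python. This is only a model that reproduces the table for the 2x3 by 3x2 example; it is not the REMAP pseudo-code from the spec, and the helper names (`matrix_remap_indices`, `flatten`) are hypothetical. The loop order is read off the table itself: the shared dimension is outermost, then the row of Mat X, then the column of Mat Y.

```python
from itertools import product


def matrix_remap_indices(x_rows, x_cols, y_cols):
    """Return one ((r, c) of X, (r, c) of Y, (r, c) of Z) tuple per element step."""
    steps = []
    # k: shared dimension (column of X / row of Y), outermost
    # i: row of X and of Z
    # j: column of Y and of Z, innermost
    for k, i, j in product(range(x_cols), range(x_rows), range(y_cols)):
        steps.append(((i, k), (k, j), (i, j)))
    return steps


def flatten(rc, num_cols):
    """Convert a (row, column) pair to a row-major flattened index."""
    r, c = rc
    return r * num_cols + c


steps = matrix_remap_indices(x_rows=2, x_cols=3, y_cols=2)
flat_x = [flatten(x, 3) for x, _, _ in steps]  # Mat X has 3 columns
flat_y = [flatten(y, 2) for _, y, _ in steps]  # Mat Y has 2 columns
flat_z = [flatten(z, 2) for _, _, z in steps]  # Mat Z has 2 columns

for fx, fy, fz in zip(flat_x, flat_y, flat_z):
    print(fx, fy, fz)
```

Each printed line corresponds to one row of the 'Flattened Indices' columns above.
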
-(*NOTE:* This is done automatically by the Matrix-Multiply REMAP mode, `SVRM=0`.)
### Limitations of Matrix REMAP
breakdown:
-- SVxd=2, SVyd=2, SVzd=3
-- SVRM=0 (Matrix mode, uses `SVSHAPE0` SPR)
-- vf=0 (not using Vertical-First mode)
+- `SVxd=2`, `SVyd=2`, `SVzd=3`
+- `SVrm=0` (Matrix mode, uses `SVSHAPE0` SPR)
+- `vf=0` (not using Vertical-First mode)
To determine the `SVxd`/`SVyd`/`SVzd` settings:
SVzd | mat_X_num_cols OR mat_Y_num_rows
```
+The `svshape` instruction will do the following (for Matrix Multiply REMAP):
+
+- The vector length `VL` of the SVP64 loop is determined based on the three
+dimensions: `VL <- xd * yd * zd`. For this example, `VL` is 12 (2 * 2 * 3), since
+12 multiply-add accumulates are needed to fully compute the result matrix.
+- `SVSHAPE0`, `SVSHAPE1`, and `SVSHAPE2` SPRs are configured with the x/y/z
+dimensions. As each SVSHAPE register supports three sets of indices (three
+loops), the third index z is skipped (because we're dealing with 2d matrices).
+- Other modifications done by `svshape` (such as to the `SVSTATE` SPR) are
+out of scope for this document.
+
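
As a sanity check on the `VL` value, the number of multiply-add accumulates in a matrix multiply can be counted directly from the matrix shapes. The sketch below is ordinary Python, not part of the ISA or the simulator, and `matmul_vl` is a hypothetical helper name.

```python
def matmul_vl(x_shape, y_shape):
    """Count the multiply-add accumulates needed for X (rows, cols) times Y (rows, cols)."""
    x_rows, x_cols = x_shape
    y_rows, y_cols = y_shape
    if x_cols != y_rows:
        raise ValueError("mat_X_num_cols must equal mat_Y_num_rows")
    # Each of the x_rows * y_cols result elements needs x_cols multiply-adds.
    return x_rows * y_cols * x_cols


vl = matmul_vl((2, 3), (3, 2))
print(vl)  # 12, matching xd * yd * zd = 2 * 2 * 3
```
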
## SVREMAP
-SVRM-Form:
+* See [[sv/remap]] for the `svremap` instruction.
-|0 |6 |11 |13 |15 |17 |19 |21 | 22:25 |26:31 |
-| -- | -- | -- | -- | -- | -- | -- | -- | ---- | ----- |
-| PO | SVme |mi0 | mi1 | mi2 | mo0 | mo1 | pst | rsvd | XO |
+Assigns the configured SVSHAPEs to the relevant operand/result registers
+of the following instruction(s) (depending on whether REMAP is set to persistent).
* svremap SVme,mi0,mi1,mi2,mo0,mo1,pst
- `mix/mox` fields determine which shape is applied to the activated register
- `mi0=1`, instruction operand RA has SVSHAPE1 applied to it.
- `mi1=2`, instruction operand RB has SVSHAPE2 applied to it.
-- `mi2=3`, instruction operand RA has SVSHAPE3 applied to it.
+- `mi2=3`, instruction operand RC has SVSHAPE3 applied to it.
- `mo0=0`, instruction result RT has SVSHAPE0 applied to it.
-- `mo1=0`, instruction result EA/FRS has SVSHAPE0 applied to it. *(not applicable
-for this example)*
+- `mo1=0`, instruction result EA/FRS has SVSHAPE0 applied to it.
+*(not applicable for this example)*
- `pst=0`, if set, REMAP remains enabled until explicitly disabled, or another
REMAP, or setvl is setup.
-Assigns the configured SVSHAPEs to the relevant operand/result registers
-of the consecutive instruction/s (depending on if REMAP is set to persistent).
-
-The index table shown for the inner method above shows indices for a 'flattened'
-matrix (how it would be arranged in sequential GPR registers), whereas
-SVSHAPE0, 1, 2 registers setup the indices in relation to rows and columns
-of the matrix.
-
-This is how the indices compare:
-```
- Row/Column Indices
- Flattened Indices | Mat X | Mat Y | Mat Z |
-| Mat X | Mat Y | Mat Z | | r c | r c | r c |
-| 0 | 0 | 0 | | 0 0 | 0 0 | 0 0 |
-| 0 | 1 | 1 | | 0 0 | 0 1 | 0 1 |
-| 3 | 0 | 2 | | 1 0 | 0 0 | 1 0 |
-| 3 | 1 | 3 | | 1 0 | 0 1 | 1 1 |
-| 1 | 2 | 0 | | 0 1 | 1 0 | 0 0 |
-| 1 | 3 | 1 | | 0 1 | 1 1 | 0 1 |
-| 4 | 2 | 2 | | 1 1 | 1 0 | 1 0 |
-| 4 | 3 | 3 | | 1 1 | 1 1 | 1 1 |
-| 2 | 4 | 0 | | 0 2 | 2 0 | 0 0 |
-| 2 | 5 | 1 | | 0 2 | 2 1 | 0 1 |
-| 5 | 4 | 2 | | 1 2 | 2 0 | 1 0 |
-| 5 | 5 | 3 | | 1 2 | 2 1 | 1 1 |
-```
+## maddld - Multiply-Add Low Doubleword VA-form
-See [[openpower/sv/remap]] Section 3.3 Matrix Mode for more information on
-the index sequences which can be produced with SVSHAPE SPRs.
+A standard instruction available since version 3.0 of PowerISA.
-## maddld - Multiply-Add Low Doubleword VA-form
+This instruction can be used as a multiply-add accumulate by setting the
+third operand to be the same as the result register, which functions as
+an accumulator.
```
sv.maddld *0, *16, *32, *0
```
-A standard instruction available since version 3.0 of PowerISA.
-
-*Temporary note:* maddld (Multiply-Add Low Doubleword) in the 3.1b version
-of the PowerISA spec is in the Linux Compliancy Subset, not SFS or SFFS.
-See page 1477 of the document, or page 1503 of the pdf.
+breakdown:
-This instruction can be used as a multiply-add accumulate by setting the
-third operand to be the same as the result register, which functions as
-an accumulator.
+- Store result (RT) of the operation starting at register 0.
+- Operands RA and RB correspond to the two operand matrices, starting at
+register 16 and register 32 respectively.
+- The third operand RC is the same as the result register, which gives the
+multiply-add accumulate behaviour.
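
The combined effect of the REMAP setup and `sv.maddld *0, *16, *32, *0` can be modelled in Python as follows. This is a rough sketch under several assumptions rather than the simulator's implementation: the register file is a plain list, Mat X is assumed to be the RA operand (starting at register 16) and Mat Y the RB operand (starting at register 32), the index order is the one shown in the index table, and the function names are hypothetical. The result registers must be zero before the loop starts, since they act as accumulators.

```python
MASK64 = (1 << 64) - 1


def maddld(ra, rb, rc):
    """Multiply-Add Low Doubleword: low 64 bits of ra * rb + rc."""
    return (ra * rb + rc) & MASK64


def sv_maddld_matrix(gpr, rt, ra, rb, x_rows, x_cols, y_cols):
    """Model the VL = x_rows * y_cols * x_cols element steps of the hot loop."""
    for k in range(x_cols):          # shared dimension, outermost
        for i in range(x_rows):      # row of X and Z
            for j in range(y_cols):  # column of Y and Z
                x_idx = i * x_cols + k  # flattened index into Mat X
                y_idx = k * y_cols + j  # flattened index into Mat Y
                z_idx = i * y_cols + j  # flattened index into Mat Z
                # RC is the same register as RT, giving the accumulate.
                gpr[rt + z_idx] = maddld(gpr[ra + x_idx],
                                         gpr[rb + y_idx],
                                         gpr[rt + z_idx])


gpr = [0] * 64                      # result registers 0-3 start at zero
gpr[16:22] = [1, 2, 3, 4, 5, 6]     # Mat X (2x3), flattened row-major
gpr[32:38] = [7, 8, 9, 10, 11, 12]  # Mat Y (3x2), flattened row-major
sv_maddld_matrix(gpr, rt=0, ra=16, rb=32, x_rows=2, x_cols=3, y_cols=2)
result = gpr[0:4]
print(result)  # [58, 64, 139, 154], i.e. Mat Z = [[58, 64], [139, 154]]
```
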
## Appendix