From: Andrey Miroshnikov Date: Thu, 9 Nov 2023 19:17:52 +0000 (+0000) Subject: Clean up remap matrix instruction sections X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=353c9900c7128084f26a8e23c20a7340ed0f255f;p=libreriscv.git Clean up remap matrix instruction sections --- diff --git a/openpower/sv/cookbook/remap_matrix.mdwn b/openpower/sv/cookbook/remap_matrix.mdwn index 701c9c01c..0136034d2 100644 --- a/openpower/sv/cookbook/remap_matrix.mdwn +++ b/openpower/sv/cookbook/remap_matrix.mdwn @@ -214,71 +214,54 @@ ISACaller)* The `svshape` instruction is a convenient way to access the `SVSHAPE` Special Purpose Registers (SPRs), which were added alongside the SVP64 looping -system for complex element indexing. Without having "Re-shaping" SPRs, only the most -basic, consecuting indexing of register elements (0,1,2,3...) would -be possible. +system for complex element indexing. Without having "Re-shaping" SPRs, +only the most basic, consecutive indexing of register elements (0,1,2,3...) +would be possible. + +The REMAP system has 16 modes, all of which are accessible through the +`svshape` instruction. However for the purpose of this guide, only SVrm=0 +(Matrix Multiply) will be covered. ### SVSHAPE Remapping SPRs * See [[sv/remap]] for the full break down of SPRs `SVSHAPE0-3`. * Pseudo-code for the [svshape instruction](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=openpower/isa/simplev.mdwn;h=33a02e#l120) +* Pseudo-code also available on the wiki: [[openpower/sv/appendix]]. Matrix Multiply utilises SVSHAPE0-2 SPRs. -(NOTE: This section is duplicated from the remap spec, and thus will be re-worked.) 
- -SVSHAPE0 SPR: - -|0:5 |6:11 | 12:17 | 18:20 | 21:23 |24:27 |28:29 |30:31| -|----- |----- | ------- | ------- | ------ |------|------ |---- | -|xdimsz|ydimsz| zdimsz | permute | invxyz |offset|skip |mode | - - -skip: - -- 0b00 indicates no dimensions to be skipped -- 0b01 - skip '1st dim' -- 0b10 - skip '2nd dim' -- 0b11 - skip '3rd dim' - -invxyz (3-bit; 1 for x, 1 for y, 1 for z): - -- If corresponding dim bit is zero, start index from zero and increment -- If bit set, start from xdimsz-1 (x dimension size, or whichever dimension -bit is being looked at) and decrement down to zero. - -offset is used to offset the result by `offset` elements (important for when -using element width overrides are used). - -xdimsz, ydimsz, zdimsz are offset by 1, such that 0-0b111111 correspond to -1-64. A value of xdimsz=2 would indicate that in the first dimension there are -3 elements in the array. - -With the example Matrix X (2 rows, 3 columns, or 2x3 matrix), xdimsz=1, -ydimsz=2, zdimsz=0. +The index table shown for the inner method above shows indices for a 'flattened' +matrix (how it would be arranged in sequential GPR registers), whereas +SVSHAPE0, 1, 2 registers setup the indices in relation to rows and columns +of the matrix. 
-permute setting: +This is how the indices compare: -| permute | order | array format | -| ------- | ----- | ------------------------ | -| 000 | 0,1,2 | (xdim+1)(ydim+1)(zdim+1) | -| 001 | 0,2,1 | (xdim+1)(zdim+1)(ydim+1) | -| 010 | 1,0,2 | (ydim+1)(xdim+1)(zdim+1) | -| 011 | 1,2,0 | (ydim+1)(zdim+1)(xdim+1) | -| 100 | 2,0,1 | (zdim+1)(xdim+1)(ydim+1) | -| 101 | 2,1,0 | (zdim+1)(ydim+1)(xdim+1) | -| 110 | 0,1 | Indexed (xdim+1)(ydim+1) | -| 111 | 1,0 | Indexed (ydim+1)(xdim+1) | +``` + Row/Column Indices + Flattened Indices | Mat X | Mat Y | Mat Z | +| Mat X | Mat Y | Mat Z | | r c | r c | r c | +| 0 | 0 | 0 | | 0 0 | 0 0 | 0 0 | +| 0 | 1 | 1 | | 0 0 | 0 1 | 0 1 | +| 3 | 0 | 2 | | 1 0 | 0 0 | 1 0 | +| 3 | 1 | 3 | | 1 0 | 0 1 | 1 1 | +| 1 | 2 | 0 | | 0 1 | 1 0 | 0 0 | +| 1 | 3 | 1 | | 0 1 | 1 1 | 0 1 | +| 4 | 2 | 2 | | 1 1 | 1 0 | 1 0 | +| 4 | 3 | 3 | | 1 1 | 1 1 | 1 1 | +| 2 | 4 | 0 | | 0 2 | 2 0 | 0 0 | +| 2 | 5 | 1 | | 0 2 | 2 1 | 0 1 | +| 5 | 4 | 2 | | 1 2 | 2 0 | 1 0 | +| 5 | 5 | 3 | | 1 2 | 2 1 | 1 1 | +``` -Permute re-arranges the order of the nested for-loops used to iterate over the -three dimensions. This allows for in-place transpose, in-place rotate, matrix -multiply, convolutions, without the limitation of Power-of-Two matrices. +These row/column indices are converted to the flattened indices when actually +used when SVP64 looping is going on (during the `maddld` hot loop). -For normal matrix multiply, the permute setting is 0b010 (order 1,0,2, -or swap x and y loops). +See [[openpower/sv/remap]] Section 3.3 Matrix Mode for more information on +the index sequences which can be produced with SVSHAPE SPRs. -(*NOTE:* This is done automatically by the Matrix-Multiply REMAP mode, `SVRM=0`.) 
### Limitations of Matrix REMAP @@ -303,9 +286,9 @@ multiply: breakdown: -- SVxd=2, SVyd=2, SVzd=3 -- SVRM=0 (Matrix mode, uses `SVSHAPE0` SPR) -- vf=0 (not using Vertical-First mode) +- `SVxd=2`, `SVyd=2`, `SVzd=3` +- `SVrm=0` (Matrix mode, uses `SVSHAPE0` SPR) +- `vf=0` (not using Vertical-First mode) To determine the `SVxd`/`SVyd`/`SVzd` settings: @@ -325,13 +308,23 @@ Table form SVzd | mat_X_num_cols OR mat_Y_num_rows ``` +The `svshape` instruction will do the following (for Matrix Multiply REMAP): + +- The vector length `VL` of the SVP64 loop is determined based on the three +dimensions: `VL <- xd * yd * zd`. For this example `VL` is 12, since 12 +multiply-add accumulates are needed to fully compute the result matrix. +- `SVSHAPE0`, `SVSHAPE1`, and `SVSHAPE2` SPRs are configured with the x/y/z +dimensions. As each SVSHAPE register supports three sets of indices (three +loops), the third index z is skipped (because we're dealing with 2d matrices). +- Other modifications done by `svshape` (such as `SVSTATE` SPR) are +out-of-scope for this document. + ## SVREMAP -SVRM-Form: +* See [[sv/remap]] for the `svremap` instruction. -|0 |6 |11 |13 |15 |17 |19 |21 | 22:25 |26:31 | -| -- | -- | -- | -- | -- | -- | -- | -- | ---- | ----- | -| PO | SVme |mi0 | mi1 | mi2 | mo0 | mo1 | pst | rsvd | XO | +Assigns the configured SVSHAPEs to the relevant operand/result registers +of the consecutive instruction/s (depending on if REMAP is set to persistent). * svremap SVme,mi0,mi1,mi2,mo0,mo1,pst @@ -348,59 +341,33 @@ will have REMAP applied. - `mix/mox` fields determine which shape is applied to the activated register - `mi0=1`, instruction operand RA has SVSHAPE1 applied to it. - `mi1=2`, instruction operand RB has SVSHAPE2 applied to it. -- `mi2=3`, instruction operand RA has SVSHAPE3 applied to it. +- `mi2=3`, instruction operand RC has SVSHAPE3 applied to it. - `mo0=0`, instruction result RT has SVSHAPE0 applied to it. 
-- `mo1=0`, instruction result EA/FRS has SVSHAPE0 applied to it. *(not applicable -for this example)* +- `mo1=0`, instruction result EA/FRS has SVSHAPE0 applied to it. +*(not applicable for this example)* - `pst=0`, if set, REMAP remains enabled until explicitly disabled, or another REMAP, or setvl is setup. -Assigns the configured SVSHAPEs to the relevant operand/result registers -of the consecutive instruction/s (depending on if REMAP is set to persistent). - -The index table shown for the inner method above shows indices for a 'flattened' -matrix (how it would be arranged in sequential GPR registers), whereas -SVSHAPE0, 1, 2 registers setup the indices in relation to rows and columns -of the matrix. - -This is how the indices compare: -``` - Row/Column Indices - Flattened Indices | Mat X | Mat Y | Mat Z | -| Mat X | Mat Y | Mat Z | | r c | r c | r c | -| 0 | 0 | 0 | | 0 0 | 0 0 | 0 0 | -| 0 | 1 | 1 | | 0 0 | 0 1 | 0 1 | -| 3 | 0 | 2 | | 1 0 | 0 0 | 1 0 | -| 3 | 1 | 3 | | 1 0 | 0 1 | 1 1 | -| 1 | 2 | 0 | | 0 1 | 1 0 | 0 0 | -| 1 | 3 | 1 | | 0 1 | 1 1 | 0 1 | -| 4 | 2 | 2 | | 1 1 | 1 0 | 1 0 | -| 4 | 3 | 3 | | 1 1 | 1 1 | 1 1 | -| 2 | 4 | 0 | | 0 2 | 2 0 | 0 0 | -| 2 | 5 | 1 | | 0 2 | 2 1 | 0 1 | -| 5 | 4 | 2 | | 1 2 | 2 0 | 1 0 | -| 5 | 5 | 3 | | 1 2 | 2 1 | 1 1 | -``` +## maddld - Multiply-Add Low Doubleword VA-form -See [[openpower/sv/remap]] Section 3.3 Matrix Mode for more information on -the index sequences which can be produced with SVSHAPE SPRs. +A standard instruction available since version 3.0 of PowerISA. -## maddld - Multiply-Add Low Doubleword VA-form +This instruction can be used as a multiply-add accumulate by setting the +third operand to be the same as the result register, which functions as +an accumulator. ``` sv.maddld *0, *16, *32, *0 ``` -A standard instruction available since version 3.0 of PowerISA. 
- -*Temporary note:* maddld (Multiply-Add Low Doubleword) in the 3.1b version -of the PowerISA spec is in the Linux Compliancy Subset, not SFS or SFFS. -See page 1477 of the document, or page 1503 of the pdf. +breakdown: -This instruction can be used as a multiply-add accumulate by setting the -third operand to be the same as the result register, which functions as -an accumulator. +- Store result (RT) of the operation starting at register 0. +- Operands RA and RB correspond to the two operand matrices, starting at +register 16 and register 32 respectively. +- The third operand RC is the same as the result register, which gives the +multiply-add accumulate behaviour. ## Appendix
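
As a reader aid for the flattened-index table and the `sv.maddld *0, *16, *32, *0` breakdown in this patch, the following is a minimal Python sketch of how the 12 REMAP steps could be modelled. It is not the ISACaller implementation: the function names (`matmul_remap_indices`, `emulate_sv_maddld`, `run_example`), the loop nesting inferred from the table, the 64-entry register list and the matrix values are all assumptions made for illustration.

```python
# Illustrative sketch only: a plain-Python model of the index sequence in the
# table above. All names and values here are hypothetical.

MASK64 = (1 << 64) - 1  # maddld keeps only the low doubleword


def matmul_remap_indices(x_rows, x_cols, y_cols):
    """Yield (x_idx, y_idx, z_idx) flattened indices for each of the VL steps."""
    for k in range(x_cols):          # shared dimension: X columns == Y rows
        for i in range(x_rows):      # row of X and of the result Z
            for j in range(y_cols):  # column of Y and of the result Z
                yield (i * x_cols + k, k * y_cols + j, i * y_cols + j)


def emulate_sv_maddld(regs, rt, ra, rb, indices):
    """Model `sv.maddld *rt, *ra, *rb, *rt`: RT[z] <- RA[x] * RB[y] + RT[z]."""
    for x_idx, y_idx, z_idx in indices:
        product = regs[ra + x_idx] * regs[rb + y_idx]
        regs[rt + z_idx] = (product + regs[rt + z_idx]) & MASK64
    return regs


def run_example():
    regs = [0] * 64                       # hypothetical GPR file, accumulators zeroed
    regs[16:22] = [1, 2, 3, 4, 5, 6]      # Matrix X (2x3), made-up values
    regs[32:38] = [7, 8, 9, 10, 11, 12]   # Matrix Y (3x2), made-up values
    steps = list(matmul_remap_indices(2, 3, 2))  # VL = 2 * 3 * 2 = 12 steps
    emulate_sv_maddld(regs, rt=0, ra=16, rb=32, indices=steps)
    return steps, regs[0:4]
```

Calling `run_example()` returns the 12 index triples, which match the Mat X / Mat Y / Mat Z flattened columns of the table, together with registers 0-3, which hold the flattened 2x2 result `[58, 64, 139, 154]` for these made-up inputs. Because the third operand and the result are the same registers, each step accumulates into the partial sum left by earlier steps.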