From: lkcl
Date: Fri, 10 Nov 2023 07:40:49 +0000 (+0000)
Subject: (no commit message)
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=3ce6b92358d2387b82f92ecfe84b1c2df45b5847;p=libreriscv.git

---

diff --git a/openpower/sv/cookbook/remap_matrix.mdwn b/openpower/sv/cookbook/remap_matrix.mdwn
index c18fad881..4658d29d3 100644
--- a/openpower/sv/cookbook/remap_matrix.mdwn
+++ b/openpower/sv/cookbook/remap_matrix.mdwn
@@ -41,9 +41,11 @@ columns of the second matrix.
 For this example, the following values will be used for the operand matrices
 X and Y, result Z shown for completeness.
 
+```
     X =| 1 2 3 |  Y = | 6  7  |  Z = |  52  58 |
        | 3 4 5 |      | 8  9  |      | 100 112 |
                       | 10 11 |
+```
 
 Matrix X has 2 rows, 3 columns (2x3),
 and matrix Y has 3 rows, 2 columns.
@@ -90,7 +92,7 @@ Calculations:
                       | 10 11 |
 ```
 
-For the algorithm, assign indeces to matrices as follows:
+For the algorithm, assign indices to matrices as follows:
 
 ```
     Index | 0 1 2 3 4 5 |
@@ -201,12 +203,9 @@ Outer and inner product indices side-by-side:
 * Multiple-Add Low Doubleword instruction pseudo-code (Power ISA 3.0C
 Book I, section 3.3.9): [[openpower/isa/fixedarith]]
 
-*(Need to check if first arg of svremap correct, then one shown works with
-ISACaller)*
-
 ```
     svshape 2, 2, 3, 0, 0
-    svremap 31, 1, 2, 3, 0, 0, 0
+    svremap 15, 1, 2, 3, 0, 0, 0
     sv.maddld *0, *16, *32, *0
 ```
 
@@ -216,8 +215,9 @@ matrix multiplication, with non-power-of-2 matrices!
 The reason why the main part of matrix multiplication is so simple is down
 to three reasons:
 
-- RISC ISA with powerful and simple instructions (Power ISA SFS/SFFS)
-- REMAP indexing system - generates index schedules for a range of
+- a RISC ISA is used as the fundamental basis, with powerful and simple
+  instructions (Power ISA)
+- Si ple-V REMAP indexing system - generates index schedules for a range of
 problems: matrix multiply, FFT, DCT, programmer-defined, etc.
 - Simple-V SVP64 looping system based on instruction prefixing which turns
 any scalar instruction into a vector one. Can follow consecutive element
@@ -226,11 +226,12 @@ to three reasons:
 Additionally, if instead of matrix multiplication, a different operation is
 required (say, to perform logical operations on rows/cols), the third
 instruction - `maddld` - can be substituted for a different operation. This is
-beyond the scope of this guide however.
+beyond the scope of this guide, however it should be clear that using
+`fmadds` instead would perform a FP32 matrix multiply, just by replacing `maddld`.
 
 ## svshape
 
-The `svshape` instruction is a convenient way to access the `SVSHAPE` Special
+The `svshape` instruction is a convenient way to set up the `SVSHAPE` Special
 Purpose Registers (SPRs), which were added alongside the SVP64 looping
 system for complex element indexing. Without having "Re-shaping" SPRs,
 only the most basic, consecuting indexing of register elements (0,1,2,3...)
@@ -238,7 +239,7 @@ would be possible.
 
 The REMAP system has 16 modes, all of which are accessible through the
 `svshape` instruction. However for the purpose of this guide, only SVrm=0
-(Matrix Multiply) will be covered. The Matrix Multiply mode can be used to
+(Matrix Multiply, Inner Product) will be covered. The Matrix Multiply mode can be used to
 produce indices in the form of the inner product table shown above.
 
 ### SVSHAPE Remapping SPRs
@@ -259,31 +260,29 @@ This is how the indices compare:
 
 ```
  Row/Column Indices
- Flattened Indices | Mat X | Mat Y | Mat Z |
-| Mat X | Mat Y | Mat Z | | r c | r c | r c |
-| 0 | 0 | 0 | | 0 0 | 0 0 | 0 0 |
-| 0 | 1 | 1 | | 0 0 | 0 1 | 0 1 |
-| 3 | 0 | 2 | | 1 0 | 0 0 | 1 0 |
-| 3 | 1 | 3 | | 1 0 | 0 1 | 1 1 |
-| 1 | 2 | 0 | | 0 1 | 1 0 | 0 0 |
-| 1 | 3 | 1 | | 0 1 | 1 1 | 0 1 |
-| 4 | 2 | 2 | | 1 1 | 1 0 | 1 0 |
-| 4 | 3 | 3 | | 1 1 | 1 1 | 1 1 |
-| 2 | 4 | 0 | | 0 2 | 2 0 | 0 0 |
-| 2 | 5 | 1 | | 0 2 | 2 1 | 0 1 |
-| 5 | 4 | 2 | | 1 2 | 2 0 | 1 0 |
-| 5 | 5 | 3 | | 1 2 | 2 1 | 1 1 |
-```
-
-See the appendix section below for some more info on how to generate these
-sequences.
+ Flattened Indices | Mat X | Mat Y | Mat Z |
+| Mat X | Mat Y | Mat Z | | r c | r c | r c |
+| 0 | 0 | 0 | | 0 0 | 0 0 | 0 0 |
+| 0 | 1 | 1 | | 0 0 | 0 1 | 0 1 |
+| 3 | 0 | 2 | | 1 0 | 0 0 | 1 0 |
+| 3 | 1 | 3 | | 1 0 | 0 1 | 1 1 |
+| 1 | 2 | 0 | | 0 1 | 1 0 | 0 0 |
+| 1 | 3 | 1 | | 0 1 | 1 1 | 0 1 |
+| 4 | 2 | 2 | | 1 1 | 1 0 | 1 0 |
+| 4 | 3 | 3 | | 1 1 | 1 1 | 1 1 |
+| 2 | 4 | 0 | | 0 2 | 2 0 | 0 0 |
+| 2 | 5 | 1 | | 0 2 | 2 1 | 0 1 |
+| 5 | 4 | 2 | | 1 2 | 2 0 | 1 0 |
+| 5 | 5 | 3 | | 1 2 | 2 1 | 1 1 |
+```
 
 These row/column indices are converted to the flattened indices when
 actually used when SVP64 looping is going on (during the `maddld` hot loop).
 
-See [[openpower/sv/remap]] Section 3.3 Matrix Mode for more information on
-the index sequences which can be produced with SVSHAPE SPRs.
-
+* See the appendix section below for some more info on how to generate these
+sequences.
+* See [[openpower/sv/remap]] Section 3.3 Matrix Mode for more information on
+  the index sequences which can be produced with SVSHAPE SPRs.
 
 ### Limitations of Matrix REMAP
 
@@ -291,8 +290,8 @@ the index sequences which can be produced with SVSHAPE SPRs.
For matrix multiply, it means both operand matrices and result matrix can have no more than 127 elements in total. -(Larger matrices can be split into tiles to circumvent this issue, out -of scope of this document). +(Larger matrices can be split into tiles - a standard Computer Science technique - +to circumvent this issue, out of scope of this document). - `svshape` instruction only provides part of the Matrix REMAP capability. For rotation and mirroring, `SVSHAPE` SPRs must be programmed directly (thus requiring more assembler instructions). Future revisions of SVP64 will @@ -310,7 +309,7 @@ breakdown: - `SVxd=2`, `SVyd=2`, `SVzd=3` - `SVrm=0` (Matrix mode) -- `vf=0` (not using Vertical-First mode) +- `vf=0` (use Simple-V Horizontal-First Mode, not Vertical-First) To determine the `SVxd`/`SVyd`/`SVzd` settings: @@ -327,7 +326,7 @@ Table form ``` SVxd | mat_Y_num_cols SVyd | mat_X_num_rows - SVzd | mat_X_num_cols OR mat_Y_num_rows + SVzd | both mat_X_num_cols AND mat_Y_num_rows ``` The `svshape` instruction will do the following (for Matrix Multiply REMAP): @@ -351,7 +350,7 @@ of the consecutive instruction/s (depending on if REMAP is set to persistent). * svremap SVme,mi0,mi1,mi2,mo0,mo1,pst ``` - svremap 15, 1, 2, 3, 0, 0, 0 +svremap 15, 1, 2, 3, 0, 0, 0 ``` breakdown: