From: Andrey Miroshnikov
Date: Tue, 17 Oct 2023 15:56:11 +0000 (+0000)
Subject: Add outer/inner loop info
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=9006971fe6c4e95560706918af19a4e42707a9d9;p=libreriscv.git

Add outer/inner loop info
---

diff --git a/openpower/sv/cookbook/remap_matrix.mdwn b/openpower/sv/cookbook/remap_matrix.mdwn
index 406fda611..d6fb053b5 100644
--- a/openpower/sv/cookbook/remap_matrix.mdwn
+++ b/openpower/sv/cookbook/remap_matrix.mdwn
@@ -39,21 +39,8 @@ Matrix X has 2 rows, 3 columns (2x3), and matrix Y has 3 rows, 2 columns.
 To determine the final dimensions of the resultant matrix Z, take the number
 of rows from matrix X (2) and number of columns from matrix Y (2).
 
-For the algorithm, assign indeces to matrices as follows:
-
-    Index | 0 1 2 3 4 5 |
-    Mat X | 1 2 3 3 4 5 |
-
-    Index | 0 1 2 3 4  5  |
-    Mat Y | 6 7 8 9 10 11 |
-
-    Index | 0  1  2   3   |
-    Mat Z | 52 58 100 112 |
-
-(Start with the first row, then assign index left-to-right, top-to-bottom.)
-
 The method usually taught in linear algebra course to students is the
-following:
+following (outer product):
 
 1. Start with the first row of the first matrix, and first column of the second
 matrix.
@@ -89,7 +76,73 @@ Calculations:
     | 3 4 5 | * | 8  9  |   | 100 112 |
                 | 10 11 |
 
+For the algorithm, assign indices to matrices as follows:
+
+    Index | 0 1 2 3 4 5 |
+    Mat X | 1 2 3 3 4 5 |
+
+    Index | 0 1 2 3 4  5  |
+    Mat Y | 6 7 8 9 10 11 |
+
+    Index | 0  1  2   3   |
+    Mat Z | 52 58 100 112 |
+
+(Start with the first row, then assign index left-to-right, top-to-bottom.)
+
+Index list:
+
+    Mat X | Mat Y | Mat Z
+      0   |   0   |   0
+      1   |   2   |   0
+      2   |   4   |   0
+      0   |   1   |   1
+      1   |   3   |   1
+      2   |   5   |   1
+      3   |   0   |   2
+      4   |   2   |   2
+      5   |   4   |   2
+      3   |   1   |   3
+      4   |   3   |   3
+      5   |   5   |   3
+
+
+The issue with this algorithm is that the result matrix element is the same
+for three consecutive operations. When each element is stored in a CPU
+register, the same register is written three times in a row, which causes
+consistent stalling.
+
+## Inner Product
+
+A slight modification to the order of the loops in the algorithm massively
+reduces the chance of read-after-write hazards, as the result matrix
+element (and thus register) changes with every multiply-add operation.
+
+The code:
+
+    for i in range(mat_X_num_rows):
+        for j in range(0, mat_X_num_cols): # or mat_Y_num_rows
+            for k in range(0, mat_Y_num_cols):
+                mat_Z[i][k] += mat_X[i][j] * mat_Y[j][k]
 
+Index list:
+
+    Mat X | Mat Y | Mat Z
+      0   |   0   |   0
+      0   |   1   |   1
+      3   |   0   |   2
+      3   |   1   |   3
+      1   |   2   |   0
+      1   |   3   |   1
+      4   |   2   |   2
+      4   |   3   |   3
+      2   |   4   |   0
+      2   |   5   |   1
+      5   |   4   |   2
+      5   |   5   |   3
+
+The index for the result matrix changes with every operation, so each
+multiply-add instruction does not depend on the register written by the
+previous one.
 
 ## Appendix
 
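
Both index lists in the patch can be reproduced with a short script, which makes it easy to check a loop order against its table. The following is a minimal sketch, assuming row-major flat indices (left-to-right, top-to-bottom) as the patch describes; `index_list`, `multiply_flat`, and the loop-order strings are illustrative names, not part of the cookbook. Under that assumption the first table corresponds to loop order i, k, j (shared dimension innermost) and the second to j, i, k (shared dimension outermost). The i, j, k order shown in the added code also changes the Z index on every step, but it alternates between two elements per row of Z rather than cycling through all four as the second table does.

```python
from itertools import product


def index_list(x_rows, x_cols, y_cols, order):
    """Return (X, Y, Z) flat row-major indices for every multiply-add.

    `order` names the loops from outermost to innermost, using
    i = row of X and Z, j = shared dimension, k = column of Y and Z.
    (Hypothetical helper, written only for this illustration.)
    """
    sizes = {"i": x_rows, "j": x_cols, "k": y_cols}
    steps = []
    for values in product(*(range(sizes[name]) for name in order)):
        idx = dict(zip(order, values))
        i, j, k = idx["i"], idx["j"], idx["k"]
        steps.append((i * x_cols + j, j * y_cols + k, i * y_cols + k))
    return steps


def multiply_flat(mat_x, mat_y, z_len, steps):
    """Accumulate Z from flat X and Y by following an index list."""
    mat_z = [0] * z_len
    for x_idx, y_idx, z_idx in steps:
        mat_z[z_idx] += mat_x[x_idx] * mat_y[y_idx]
    return mat_z


mat_x = [1, 2, 3, 3, 4, 5]       # 2x3, values from the patch
mat_y = [6, 7, 8, 9, 10, 11]     # 3x2, values from the patch

first = index_list(2, 3, 2, "ikj")     # Z index repeats three times in a row
second = index_list(2, 3, 2, "jik")    # Z index cycles 0, 1, 2, 3
as_written = index_list(2, 3, 2, "ijk")  # loop order of the added code

for name, steps in (("ikj", first), ("jik", second), ("ijk", as_written)):
    z_sequence = [z for _, _, z in steps]
    print(name, z_sequence, multiply_flat(mat_x, mat_y, 4, steps))
```

Every loop order produces the same result matrix; only the sequence of Z indices, and therefore the spacing between writes to the same register, differs.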