| 5 | 5 | 3 |
```
+Worked example broken down into individual multiply-add accumulates:
+
+[[!img outer_product_worked_example.jpg size="600x"]]
The issue with this algorithm is that the result matrix element is the same
for three consecutive operations, and where each element is stored in CPU
| 5 | 5 | 3 |
```
+Worked example for inner product:
+
+[[!img inner_product_worked_example.jpg size="600x"]]
+
The index for the result matrix changes with every operation, so each
multiply-add instruction no longer depends on the register written by the
one immediately before it.
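
The difference between the two loop orderings can be sketched in a few lines
of Python. This is only an illustrative model, not code from this page: the
function names (`schedule`, `matmul`, `back_to_back_hazards`) and the 2x3 by
3x2 operand sizes are assumptions chosen for brevity. It counts how often two
consecutive multiply-adds target the same result element, which is the
situation that forces one instruction to wait for the previous write.

```python
def schedule(rows, inner, cols, k_outermost):
    """List the (i, j, k) index triples in the order the multiply-adds issue."""
    if k_outermost:
        # Result index (i, j) changes on every operation.
        return [(i, j, k)
                for k in range(inner)
                for i in range(rows)
                for j in range(cols)]
    # Result index (i, j) stays fixed for `inner` consecutive operations.
    return [(i, j, k)
            for i in range(rows)
            for j in range(cols)
            for k in range(inner)]


def matmul(a, b, sched):
    """Accumulate a @ b one multiply-add at a time, following `sched`."""
    rows, cols = len(a), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    for i, j, k in sched:
        result[i][j] += a[i][k] * b[k][j]
    return result


def back_to_back_hazards(sched):
    """Count consecutive operations that write the same result element."""
    targets = [(i, j) for i, j, _ in sched]
    return sum(1 for prev, cur in zip(targets, targets[1:]) if prev == cur)


# Hypothetical operands, chosen only to keep the example small.
a = [[1, 2, 3], [4, 5, 6]]
b = [[1, 2], [3, 4], [5, 6]]

same_target = schedule(2, 3, 2, k_outermost=False)
moving_target = schedule(2, 3, 2, k_outermost=True)

print(matmul(a, b, same_target) == matmul(a, b, moving_target))
print(back_to_back_hazards(same_target), back_to_back_hazards(moving_target))
```

Both schedules perform the same twelve multiply-adds and produce the same
product; only the order differs. The model ignores real pipeline depth, so it
only shows where back-to-back writes to one element occur, not how many cycles
they would cost.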