matrix to create
a 5x4 result:
- svshape 5, 4, 3, 0, 0
+```
+ svshape 5, 4, 3, 0, 0 # Outer Product
svremap 15, 1, 2, 3, 0, 0, 0, 0
- sv.fmadds *0, *8, *16, *0
+ sv.fmadds *0, *16, *32, *0
+```
* svshape sets up the four SVSHAPE SPRS for a Matrix Schedule
* svremap activates four out of five registers RA RB RC RT RS (15)
- RC to use SVSHAPE3
- RT to use SVSHAPE0
- RS Remapping to not be activated
-* sv.fmadds has RT=0.v, RA=8.v, RB=16.v, RC=0.v
+* sv.fmadds has Vectors at RT=0, RA=16, RB=32, RC=0
* With REMAP being active each register's element index is
*independently* transformed using the specified SHAPEs.
Thus the Vector Loop is arranged such that the use of
the multiply-and-accumulate instruction executes precisely the required
-Schedule to perform an in-place in-registers Matrix Multiply with no
+Schedule to perform an in-place in-registers Outer Product
+Matrix Multiply with no
need to perform additional Transpose or register copy instructions.
The example above may be executed as a unit test and demo,
[here](https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_matrix.py;h=c15479db9a36055166b6b023c7495f9ca3637333;hb=a17a252e474d5d5bf34026c25a19682e3f2015c3#l94)
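
As a rough illustration of what "each register's element index is *independently* transformed" means, the following Python sketch models a single multiply-and-accumulate operation driven by three separately computed element indices. It is not the SVSHAPE hardware algorithm: the function names, the row-major layout, the reading of `5, 4, 3` as 5 rows, 4 columns and an inner dimension of 3, and the loop order (inner dimension outermost, as one possible outer-product ordering) are all assumptions made for illustration only.

```python
def matmul_schedule(rows, cols, inner):
    """Yield (acc_idx, a_idx, b_idx) for every multiply-accumulate step.

    Hypothetical model: each of the three indices is derived from the same
    loop counters but with its own transformation, loosely mirroring how
    REMAP applies a different SHAPE to each register operand.
    """
    for k in range(inner):              # inner dimension outermost (assumed)
        for i in range(rows):
            for j in range(cols):
                yield (i * cols + j,    # element of the result/accumulator
                       i * inner + k,   # element of the first source matrix
                       k * cols + j)    # element of the second source matrix


def remapped_fmadd(a, b, rows, cols, inner):
    """Apply one scalar multiply-add per schedule step on flat 'registers'."""
    acc = [0.0] * (rows * cols)
    for acc_idx, a_idx, b_idx in matmul_schedule(rows, cols, inner):
        acc[acc_idx] = a[a_idx] * b[b_idx] + acc[acc_idx]
    return acc


ROWS, COLS, INNER = 5, 4, 3
a = [float(x) for x in range(ROWS * INNER)]        # 5x3 matrix, flattened
b = [float(x + 1) for x in range(INNER * COLS)]    # 3x4 matrix, flattened
c = remapped_fmadd(a, b, ROWS, COLS, INNER)        # 5x4 result, flattened
```

In this model the accumulator is updated in place and neither source matrix is transposed or copied: only the indices change from step to step, which is the property the text above attributes to the REMAP Schedule.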