<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_predication.py;hb=HEAD>
-## 3D GPU style "Branch Conditional"
+## Matrix Multiply
-(*Note: Specification is ready, Simulator still under development of
-full specification capabilities*)
-This example demonstrates a 2-long Vector Branch-Conditional only
-succeeding if *all* elements in the Vector are successful. This
-avoids the need for additional instructions that would need to
-perform a Parallel Reduction of a Vector of Condition Register
-tests down to a single value, on which a Scalar Branch-Conditional
-could then be performed. Full Rationale at
-<https://libre-soc.org/openpower/sv/branches/>
+Matrix Multiply of any size (non-power-2) up to a total of 127 operations
+is achievable with only three instructions. Normally in any other SIMD
+ISA at least one source requires Transposition and often massive rolling
+repetition of data is required. These 3 instructions may be used as the
+"inner triple-loop kernel" of the usual 6-loop Massive Matrix Multiply.
```
- 80 # test_sv_branch_cond_all
- 81 for i in [7, 8, 9]:
- 83 addi 1, 0, i+1 # set r1 to i
- 84 addi 2, 0, i # set r2 to i
- 85 cmpi cr0, 1, 1, 8 # compare r1 with 8 and store to cr0
- 86 cmpi cr1, 1, 2, 8 # compare r2 with 8 and store to cr1
- 87 sv.bc/all 12, *1, 0xc # bgt 0xc - branch if BOTH
- 88 # r1 AND r2 greater 8 to the nop below
- 89 addi 3, 0, 0x1234, # if tests fail this shouldn't execute
- 90 or 0, 0, 0 # branch target
+ 28 # test_sv_remap1 5x4 by 4x3 matrix multiply
+ 29 svshape 5, 4, 3, 0, 0
+ 30 svremap 31, 1, 2, 3, 0, 0, 0
+ 31 sv.fmadds *0, *8, *16, *0
```
-<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_bc.py;hb=HEAD>
+<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_matrix.py;hb=HEAD>
+
+## Parallel Reduction
+
+Parallel (Horizontal) Reduction is often deeply problematic in SIMD and
+Vector ISAs. Parallel Reduction is Fully Deterministic in Simple-V and
+thus may even usefully be deployed on non-associative and non-commutative
+operations.
+
+```
+ 75 # test_sv_remap2
+ 76 svshape 7, 0, 0, 7, 0
+ 77 svremap 31, 1, 0, 0, 0, 0, 0 # different order
+ 78 sv.subf *0, *8, *16
+```
+
+<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_parallel_reduce.py;hb=HEAD>
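
Why a deterministic schedule matters for non-associative operations can be illustrated with a small Python model. This is only a sketch under stated assumptions: `tree_reduce` uses a generic pairwise tree order chosen for illustration, not the actual schedule produced by `svshape`/`svremap`, and both function names are hypothetical. The point it makes is that once the order is fixed by element index alone, even subtraction gives one repeatable answer, although that answer differs from a left-to-right reduction.

```python
from typing import Callable, List


def tree_reduce(values: List[int], op: Callable[[int, int], int]) -> int:
    """Reduce pairwise in a fixed order determined only by element index.

    Illustrative schedule only (an assumption), not the Simple-V one.
    """
    if not values:
        raise ValueError("cannot reduce an empty vector")
    work = list(values)
    step = 1
    while step < len(work):
        # Combine element i with element i+step on a stride of 2*step.
        for i in range(0, len(work) - step, 2 * step):
            work[i] = op(work[i], work[i + step])
        step *= 2
    return work[0]


def linear_reduce(values: List[int], op: Callable[[int, int], int]) -> int:
    """Plain left-to-right reduction, for comparison."""
    acc = values[0]
    for v in values[1:]:
        acc = op(acc, v)
    return acc


def sub(a: int, b: int) -> int:
    return a - b


vec = [10, 1, 2, 3, 4, 5, 6]  # 7 elements: deliberately not a power of two

tree_result = tree_reduce(vec, sub)      # ((10-1)-(2-3)) - ((4-5)-6)
linear_result = linear_reduce(vec, sub)  # 10-1-2-3-4-5-6
print(tree_result, linear_result)
```

With subtraction the two orders disagree, which is exactly why a reduction whose order is left to the implementation cannot safely be used on such operations, while a fixed schedule can.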
\newpage{}
## DCT
<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
-## Matrix Multiply
-
-Matrix Multiply of any size (non-power-2) up to a total of 127 operations
-is achievable with only three instructions. Normally in any other SIMD
-ISA at least one source requires Transposition and often massive rolling
-repetition of data is required. These 3 instructions may be used as the
-"inner triple-loop kernel" of the usual 6-loop Massive Matrix Multiply.
-
-```
- 28 # test_sv_remap1 5x4 by 4x3 matrix multiply
- 29 svshape 5, 4, 3, 0, 0
- 30 svremap 31, 1, 2, 3, 0, 0, 0
- 31 sv.fmadds *0, *8, *16, *0
-```
-
-<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_matrix.py;hb=HEAD>
-
-## Parallel Reduction
+## 3D GPU style "Branch Conditional"
-Parallel (Horizontal) Reduction is often deeply problematic in SIMD and
-Vector ISAs. Parallel Reduction is Fully Deterministic in Simple-V and
-thus may even usefully be deployed on non-associative and non-commutative
-operations.
+(*Note: Specification is ready, Simulator still under development of
+full specification capabilities*)
+This example demonstrates a 2-long Vector Branch-Conditional only
+succeeding if *all* elements in the Vector are successful. This
+avoids the need for additional instructions that would need to
+perform a Parallel Reduction of a Vector of Condition Register
+tests down to a single value, on which a Scalar Branch-Conditional
+could then be performed. Full Rationale at
+<https://libre-soc.org/openpower/sv/branches/>
```
- 75 # test_sv_remap2
- 76 svshape 7, 0, 0, 7, 0
- 77 svremap 31, 1, 0, 0, 0, 0, 0 # different order
- 78 sv.subf *0, *8, *16
+ 80 # test_sv_branch_cond_all
+ 81 for i in [7, 8, 9]:
+ 83 addi 1, 0, i+1 # set r1 to i+1
+ 84 addi 2, 0, i # set r2 to i
+ 85 cmpi cr0, 1, 1, 8 # compare r1 with 8 and store to cr0
+ 86 cmpi cr1, 1, 2, 8 # compare r2 with 8 and store to cr1
+ 87 sv.bc/all 12, *1, 0xc # bgt 0xc - branch if BOTH
+ 88 # r1 AND r2 greater 8 to the nop below
+ 89 addi 3, 0, 0x1234 # if tests fail this shouldn't execute
+ 90 or 0, 0, 0 # branch target
```
-<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_parallel_reduce.py;hb=HEAD>
+<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_bc.py;hb=HEAD>
## Big-Integer Math
```
Additional 128/64 Mul and Div/Mod instructions may similarly be exploited
-to perform roll-over in arbitrary-length arithmetic.
+to perform roll-over in arbitrary-length arithmetic: effectively they use
+one of the two 64-bit output registers as a form of "64-bit Carry In-Out".
+
+All of these big-integer instructions are Scalar instructions that stand on
+their own merit and may be used even in a Scalar environment to improve
+performance. When used with Simple-V they likewise improve performance and
+greatly simplify unlimited-length big-integer algorithms.
<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_bigint.py;hb=HEAD>
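
The "64-bit Carry In-Out" idea can be sketched in Python as a big-integer by 64-bit-scalar multiply. This is a minimal model under stated assumptions, not the behaviour of any specific instruction: `mul_add_64` is a hypothetical helper that stands in for a 64x64+64 to 128-bit multiply-add, returning the low 64 bits as the result limb and the high 64 bits as the carry that feeds the next limb. Limbs are assumed to be stored least-significant first.

```python
from typing import List, Tuple

MASK64 = (1 << 64) - 1


def mul_add_64(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Hypothetical model: (a * b + carry_in) split into (low 64, high 64).

    The sum always fits in 128 bits when all three inputs fit in 64 bits.
    """
    product = a * b + carry_in
    return product & MASK64, product >> 64


def bigint_mul_scalar(limbs: List[int], scalar: int) -> List[int]:
    """Multiply a little-endian vector of 64-bit limbs by a 64-bit scalar."""
    carry = 0
    out = []
    for limb in limbs:
        # The high half of each step is the 64-bit carry into the next step.
        low, carry = mul_add_64(limb, scalar, carry)
        out.append(low)
    out.append(carry)  # final carry becomes the most significant limb
    return out


def to_int(limbs: List[int]) -> int:
    """Convert little-endian 64-bit limbs back to a Python integer."""
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))


limbs = [MASK64, MASK64, 1]
result = bigint_mul_scalar(limbs, MASK64)
print(to_int(result) == to_int(limbs) * MASK64)
```

The loop body is a single operation per limb with one value carried between iterations, which is the shape that makes it a natural fit for a Vector loop.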