as of 08Sep2022
\newpage{}
-
-# Use case: DCT
-
-DCT has dozens of uses in Audio-Visual processing and CODECs.
-A full 8-wide in-place triple-loop Inverse DCT may be achieved
-in 8 instructions. Expanding this to 16-wide is a matter of setting
-`svshape 16` **and the same instructions used**.
-Lee Composition may be deployed to construct non-power-two DCTs.
-The cosine table may be computed (once) with 18 Vector instructions
-(one of them `fcos`)
-
-```
-1014 def test_sv_remap_fpmadds_ldbrev_idct_8_mode_4(self):
-1015 """>>> lst = [# LOAD bit-reversed with half-swap
-1016 "svshape 8, 1, 1, 14, 0",
-1017 "svremap 1, 0, 0, 0, 0, 0, 0",
-1018 "sv.lfs/els *0, 4(1)",
-1019 # Outer butterfly, iterative sum
-1020 "svremap 31, 0, 1, 2, 1, 0, 1",
-1021 "svshape 8, 1, 1, 11, 0",
-1022 "sv.fadds *0, *0, *0",
-1023 # Inner butterfly, twin +/- MUL-ADD-SUB
-1024 "svshape 8, 1, 1, 10, 0",
-1025 "sv.ffmadds *0, *0, *0, *8"
-```
-
-<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
-
-# Use case: Matrix Multiply
-
-Matrix Multiply of any size (non-power-2) up to a total of 127 operations
-is achievable with only three instructions. Normally in any other SIMD
-ISA at least one source requires Transposition and often massive rolling
-repetition of data is required. These 3 instructions may be used as the
-"inner triple-loop kernel" of the usual 6-loop Massive Matrix Multiply.
-
-```
- 28 def test_sv_remap1(self):
- 29 """>>> lst = ["svshape 2, 2, 3, 0, 0",
- 30 "svremap 31, 1, 2, 3, 0, 0, 0",
- 31 "sv.fmadds *0, *8, *16, *0"
- 32 ]
-```
-
-<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_matrix.py;hb=HEAD>
-
-# Use case: Parallel Reduction
-
-Parallel (Horizontal) Reduction is often deeply problematic in SIMD and
-Vector ISAs. Parallel Reduction is Fully Deterministic in Simple-V and
-thus may even usefully be deployed on non-associative and non-commutative
-operations.
-
-```
- 75 def test_sv_remap2(self):
- 76 """>>> lst = ["svshape 7, 0, 0, 7, 0",
- 77 "svremap 31, 1, 0, 0, 0, 0, 0", # different order
- 78 "sv.subf *0, *8, *16"
- 79 ]
- 80 REMAP sv.subf RT,RA,RB - inverted application of RA/RB
- 81 left/right due to subf
-```
-
-<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_parallel_reduce.py;hb=HEAD>
-
# Use case: LD/ST-Multi
Context-switching saving and restoring of registers on the stack often
setvli 64
sv.ld/sm=EQ *rt,0(ra)
```
-\newpage{}
# Use case: Twin-Predication, re-entrant
for full re-entrancy on a Context Switch or function call *even if
in the middle of executing a loop*. Also demonstrates that it is
permissible for a programmer to write **directly** to the SVSTATE
-SPR, and still expect Deterministic Behaviour.
+SPR, and still expect Deterministic Behaviour. Direct SVSTATE access is
+not recommended (it may impact performance), but it is not prohibited
+either.
```
292 # checks that we are able to resume in the middle of a VL loop,
<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_bc.py;hb=HEAD>
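The re-entrancy property can be illustrated with a small Python model. This is a sketch under simplifying assumptions, not the Libre-SOC simulator. SVSTATE is reduced to a hypothetical `LoopState` holding only `srcstep` and `dststep`, and `twin_pred_mv` is a hypothetical twin-predicated element copy. Because all loop progress lives in that state, stopping part-way and resuming gives the same result as an uninterrupted run. The same holds when resuming from a state the programmer wrote directly.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    """Hypothetical stand-in for the srcstep/dststep fields of SVSTATE."""
    srcstep: int = 0
    dststep: int = 0


def twin_pred_mv(src, dst, srcmask, dstmask, vl, state, max_moves=None):
    """Twin-predicated element copy, dst[dststep] = src[srcstep].

    Elements whose srcmask (or dstmask) bit is 0 are skipped. All loop
    progress lives in `state`, so the loop may stop after `max_moves`
    element copies (modelling an interrupt or context switch) and later
    resume exactly where it left off. Returns True once the loop completes.
    """
    moves = 0
    while True:
        # skip masked-out source and destination elements
        while state.srcstep < vl and not (srcmask >> state.srcstep) & 1:
            state.srcstep += 1
        while state.dststep < vl and not (dstmask >> state.dststep) & 1:
            state.dststep += 1
        if state.srcstep >= vl or state.dststep >= vl:
            return True
        if max_moves is not None and moves == max_moves:
            return False  # interrupted: `state` records where to resume
        dst[state.dststep] = src[state.srcstep]
        state.srcstep += 1
        state.dststep += 1
        moves += 1


# Usage: interrupt after two copies, then resume from the saved state.
SRCMASK, DSTMASK = 0b10110101, 0b00011111
src = [10, 11, 12, 13, 14, 15, 16, 17]
dst = [0] * 8
state = LoopState()
finished_early = twin_pred_mv(src, dst, SRCMASK, DSTMASK, 8, state, max_moves=2)
saved = (state.srcstep, state.dststep)  # (4, 2) at the interruption point
twin_pred_mv(src, dst, SRCMASK, DSTMASK, 8, state)
```

Constructing the state by hand, for example `LoopState(srcstep=4, dststep=2)`, and calling `twin_pred_mv` again resumes from exactly that point. This mirrors a direct write to SVSTATE.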
+\newpage{}
+# Use case: DCT
+
+DCT has dozens of uses in Audio-Visual processing and CODECs.
+A full 8-wide in-place triple-loop Inverse DCT may be achieved
+in 8 instructions. Expanding this to 16-wide only requires setting
+`svshape 16`: **the same instructions are used**.
+Lee Composition may be deployed to construct non-power-of-two DCTs.
+The cosine table may be computed (once) with 18 Vector instructions
+(one of them `fcos`).
+
+```
+def test_sv_remap_fpmadds_ldbrev_idct_8_mode_4(self):
+    """>>> lst = [# LOAD bit-reversed with half-swap
+                  "svshape 8, 1, 1, 14, 0",
+                  "svremap 1, 0, 0, 0, 0, 0, 0",
+                  "sv.lfs/els *0, 4(1)",
+                  # Outer butterfly, iterative sum
+                  "svremap 31, 0, 1, 2, 1, 0, 1",
+                  "svshape 8, 1, 1, 11, 0",
+                  "sv.fadds *0, *0, *0",
+                  # Inner butterfly, twin +/- MUL-ADD-SUB
+                  "svshape 8, 1, 1, 10, 0",
+                  "sv.ffmadds *0, *0, *0, *8"
+                 ]
+```
+
+<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
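The following plain-Python sketch is for checking results. It models the transform itself, not the SVP64 code. It gives a direct O(N²) DCT-III (the usual unnormalised "inverse DCT") and a textbook recursive Lee factorisation of it. The test's scaling and bit-reversed element order are not reproduced, and the function names are illustrative only. The recursive version shows the kind of butterfly structure the REMAP schedules are assumed to walk. The same code serves any power-of-two size, which parallels changing only `svshape 8` to `svshape 16`.

```python
import math


def idct_naive(X):
    """Direct O(N^2) DCT-III: x[n] = X[0]/2 + sum_{k>=1} X[k]*cos(pi/N*(n+0.5)*k)."""
    n_len = len(X)
    return [X[0] / 2 + sum(X[k] * math.cos(math.pi / n_len * (n + 0.5) * k)
                           for k in range(1, n_len))
            for n in range(n_len)]


def idct_lee(X, root=True):
    """Recursive Lee-style DCT-III, O(N log N), for power-of-two N."""
    X = list(X)
    n = len(X)
    if root:
        X[0] /= 2.0  # the k = 0 term carries half weight
    if n == 1:
        return X
    if n % 2 != 0:
        raise ValueError("length must be a power of two")
    half = n // 2
    # even-indexed inputs form one half-size problem ...
    alpha = idct_lee([X[i] for i in range(0, n, 2)], root=False)
    # ... sums of neighbouring odd-indexed inputs form the other
    beta = idct_lee([X[1]] + [X[i - 1] + X[i + 1] for i in range(2, n, 2)],
                    root=False)
    out = [0.0] * n
    for i in range(half):
        # butterfly: the divisor 2*cos((i+0.5)*pi/n) depends only on n,
        # so a cosine table can be precomputed once per transform size
        y = beta[i] / (2.0 * math.cos((i + 0.5) * math.pi / n))
        out[i] = alpha[i] + y
        out[n - 1 - i] = alpha[i] - y
    return out
```

Comparing `idct_lee(X)` against `idct_naive(X)` for 8 and 16 elements is a quick sanity check of the butterfly recursion.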
+
+# Use case: Matrix Multiply
+
+Matrix Multiply of any size (including non-power-of-two), up to a total
+of 127 operations, is achievable with only three instructions. In most
+other SIMD ISAs at least one source matrix must be transposed, and
+massive rolling repetition of data is often required. These three
+instructions may be used as the "inner triple-loop kernel" of the usual
+6-loop Massive Matrix Multiply.
+
+```
+def test_sv_remap1(self):
+    """>>> lst = ["svshape 2, 2, 3, 0, 0",
+                  "svremap 31, 1, 2, 3, 0, 0, 0",
+                  "sv.fmadds *0, *8, *16, *0"
+                 ]
+```
+
+<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_matrix.py;hb=HEAD>
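The following plain-Python model shows why three instructions can suffice. It assumes that `svshape`/`svremap` turn one flat element loop into a triple loop over (row, column, summed index), and that `sv.fmadds` then performs each multiply-add at the remapped offsets. `remap_matrix` and `fmadds_remapped` are hypothetical names. The index order is an illustrative assumption, not the exact REMAP Matrix schedule. The point is that A and B are read in place, with no transposition.

```python
MAX_OPS = 127  # the text's upper limit on total operations


def remap_matrix(m, n, p):
    """Hypothetical schedule: map one flat loop of m*n*p steps onto
    (rt, ra, rb) element offsets for C[m][p] += A[m][n] * B[n][p],
    with all three matrices stored row-major and nothing transposed."""
    if m * n * p > MAX_OPS:
        raise ValueError("too many operations for a single Vector loop")
    for step in range(m * n * p):
        k = step % n               # innermost: shared (summed) dimension
        j = (step // n) % p        # column of C and B
        i = step // (n * p)        # row of C and A
        yield i * p + j, i * n + k, k * p + j


def fmadds_remapped(a, b, m, n, p):
    """Emulate a remapped multiply-add, c[rt] = a[ra] * b[rb] + c[rt]."""
    c = [0.0] * (m * p)
    for rt, ra, rb in remap_matrix(m, n, p):
        c[rt] = a[ra] * b[rb] + c[rt]
    return c


# 2x3 times 3x2: 12 operations in one loop
print(fmadds_remapped([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], 2, 3, 2))
# prints [58.0, 64.0, 139.0, 154.0]
```

Non-power-of-two shapes such as 3×5 by 5×7 (105 operations) use exactly the same loop.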
+
+# Use case: Parallel Reduction
+
+Parallel (Horizontal) Reduction is often deeply problematic in SIMD and
+Vector ISAs. In Simple-V, Parallel Reduction is Fully Deterministic, and
+thus may usefully be deployed even on non-associative and non-commutative
+operations.
+
+```
+def test_sv_remap2(self):
+    """>>> lst = ["svshape 7, 0, 0, 7, 0",
+                  "svremap 31, 1, 0, 0, 0, 0, 0", # different order
+                  "sv.subf *0, *8, *16"
+                 ]
+    REMAP sv.subf RT,RA,RB - inverted application of RA/RB
+    left/right due to subf
+```
+
+<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_parallel_reduce.py;hb=HEAD>
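The following simplified, unpredicated Python sketch shows why a fixed reduction schedule matters. The pairing order depends only on the vector length, never on timing, so even a non-associative operator such as subtraction always yields the same answer. `tree_reduce` is a hypothetical name. Its pairing is an illustrative assumption and is not claimed to be the exact REMAP reduction schedule.

```python
import functools
import operator


def tree_reduce(vec, op):
    """Hypothetical deterministic in-place tree reduction (unpredicated).

    Stage by stage, element i (a multiple of 2*step) is combined with
    element i + step. The pairing depends only on len(vec), so the
    result is reproducible even for non-associative operators.
    """
    vec = list(vec)
    step = 1
    while step < len(vec):
        for i in range(0, len(vec), 2 * step):
            if i + step < len(vec):
                vec[i] = op(vec[i], vec[i + step])
        step *= 2
    return vec[0]


data = [1, 2, 3, 4, 5, 6, 7]
print(tree_reduce(data, operator.add))        # prints 28
print(tree_reduce(data, operator.sub))        # prints 8
print(functools.reduce(operator.sub, data))   # prints -26 (sequential fold)
```

For subtraction the tree order gives 8, whereas a left-to-right fold gives -26. The two differ, but the tree result is fixed by the schedule, and that is what makes reduction over non-associative operations usable.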
+
[[!tag opf_rfc]]
[^extend]: Prefix opcode space **must** be reserved in advance to do so, in order to avoid the catastrophic binary-incompatibility mistake made by RISC-V RVV and ARM SVE/2