# Simple-V Architectural Resources
* No new Interrupt types are required.
* **No modifications to existing Power ISA are required either**.
* GPR, FPR and CR Field Register numbers are extended to 128.
A future version may extend to 256 or beyond[^extend].
* (A future version or other Stakeholder *may* wish to drop Simple-V
**EXT059 and EXT063**
Additionally, for High-Performance Compute and Competitive 3D GPU, IEEE754 FP
Transcendentals are required, as are some DCT/FFT "Twin-Butterfly" operations:
* QTY 33 of X-Form "1-argument" (fsin, fsins, fcos, fcoss)
* QTY 15 of X-Form "2-argument" (pow, atan2, fhypot)
* QTY 5 of A-Form "3-in 2-out" FP Butterfly operations for DCT/FFT
* QTY 8 of X-Form "2-in 2-out" FP Butterfly operations (again for DCT/FFT)
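
To make the "Twin-Butterfly" term concrete, the sketch below models the two generic radix-2 butterfly shapes that the list above refers to, as plain Python functions. The function names are hypothetical, and the operand order, rounding and any scaling of the actual A-Form and X-Form instructions are defined in the specification, not here. In particular the 3-in 2-out model rounds the product before the add and the subtract, whereas a fused hardware operation would normally round only once.

```python
def butterfly_2in2out(a: float, b: float) -> tuple[float, float]:
    """Hypothetical model of a generic 2-in 2-out butterfly: sum and difference."""
    return a + b, a - b


def butterfly_3in2out(a: float, b: float, c: float) -> tuple[float, float]:
    """Hypothetical model of a generic 3-in 2-out butterfly.

    Computes (a + b*c, a - b*c): one shared product feeding a twin add/subtract,
    as found in radix-2 FFT and DCT kernels where c is a twiddle/cosine factor.
    Unlike a fused instruction, the product here is rounded before use.
    """
    product = b * c
    return a + product, a - product


if __name__ == "__main__":
    print(butterfly_2in2out(5.0, 3.0))       # (8.0, 2.0)
    print(butterfly_3in2out(1.0, 2.0, 3.0))  # (7.0, -5.0)
```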

\newpage{}

# Use case: DCT

DCT has dozens of uses in Audio-Visual processing and CODECs.
A full 8-wide in-place triple-loop Inverse DCT may be achieved
in 7 instructions. Expanding this to 16-wide is a matter of setting
`svshape 16`. Lee Composition may be deployed to construct non-power-of-two
DCTs. The cosine table may be computed (once) with 18 Vector instructions.

<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>

```
# test_sv_remap_fpmadds_ldbrev_idct_8_mode_4()
lst = [# LOAD bit-reversed with half-swap
       "svshape 8, 1, 1, 14, 0",
       "svremap 1, 0, 0, 0, 0, 0, 0",
       "sv.lfs/els *0, 4(1)",
       # Outer butterfly, iterative sum
       "svremap 31, 0, 1, 2, 1, 0, 1",
       "svshape 8, 1, 1, 11, 0",
       "sv.fadds *0, *0, *0",
       # Inner butterfly, twin +/- MUL-ADD-SUB
       "svshape 8, 1, 1, 10, 0",
       "sv.ffmadds *0, *0, *0, *8"
       ]
```
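
As a plain-Python reference for what the transform itself computes (not a model of the Simple-V REMAP schedule), the sketch below evaluates an unnormalised DCT-II directly in O(N²), again via a recursive power-of-two form of Lee's butterfly decomposition, and inverts it with the matching DCT-III. It assumes the unscaled convention X[k] = Σ x[n]·cos(π/N·(n+½)·k); the scaling used in the linked test file may differ, and all function names are hypothetical.

```python
import math


def dct2_naive(x: list[float]) -> list[float]:
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi/N * (n + 0.5) * k)."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi / n * (i + 0.5) * k) for i in range(n))
            for k in range(n)]


def idct3_naive(X: list[float]) -> list[float]:
    """Unnormalised DCT-III: the inverse of dct2_naive up to a factor of N/2."""
    n = len(X)
    return [X[0] / 2 + sum(X[k] * math.cos(math.pi / n * (i + 0.5) * k)
                           for k in range(1, n))
            for i in range(n)]


def dct2_lee(x: list[float]) -> list[float]:
    """Recursive Lee-style DCT-II for power-of-two lengths.

    Each level applies N/2 butterflies: the sum of x[i] and its mirror feeds
    the even outputs, and their difference, scaled by 1 / (2*cos(...)),
    feeds the odd outputs.
    """
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    # Butterfly stage: pair x[i] with its mirror x[n-1-i].
    evens = [x[i] + x[n - 1 - i] for i in range(half)]
    odds = [(x[i] - x[n - 1 - i]) / (2 * math.cos(math.pi / n * (i + 0.5)))
            for i in range(half)]
    even_out = dct2_lee(evens)
    odd_out = dct2_lee(odds)
    # Interleave: X[2k] = E[k]; X[2k+1] = O[k] + O[k+1], with O[half] = 0.
    result = []
    for k in range(half):
        result.append(even_out[k])
        nxt = odd_out[k + 1] if k + 1 < half else 0.0
        result.append(odd_out[k] + nxt)
    return result


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]
    fast = dct2_lee(data)
    slow = dct2_naive(data)
    # Only floating-point rounding differences are expected here.
    print(max(abs(a - b) for a, b in zip(fast, slow)))
    # Round trip: DCT-III(DCT-II(x)) * 2/N recovers x.
    back = [v * 2 / len(data) for v in idct3_naive(slow)]
    print(max(abs(a - b) for a, b in zip(back, data)))
```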

# Use case: Matrix Multiply

Matrix Multiply of any size, including non-power-of-two, up to a total of
127 operations is achievable with only three instructions. In other SIMD
ISAs, at least one source normally requires Transposition, and massive
rolling repetition of data is often required as well. These 3 instructions
may be used as the "inner triple-loop kernel" of the usual 6-loop Massive
Matrix Multiply.

<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_matrix.py;hb=HEAD>
```
# test_sv_remap1()
lst = ["svshape 2, 2, 3, 0, 0",
       "svremap 31, 1, 2, 3, 0, 0, 0",
       "sv.fmadds *0, *8, *16, *0"
       ]
```
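
The idea behind the three instructions above is that REMAP turns one Vector multiply-add into the whole triple loop, so neither operand needs transposing. The sketch below models that idea in plain Python: it generates the (result, left, right) register-index triples of a row-major matrix multiply, then runs one flat multiply-accumulate loop over them, as a single remapped Vector instruction would. The flat register list, the base offsets (chosen to echo `*0, *8, *16`), the loop order and the function names are illustrative assumptions, not the actual `svshape`/`svremap` encoding; the model also uses Python doubles with separate rounding rather than single-precision fused arithmetic.

```python
MAX_OPS = 127  # limit quoted in the text: "up to a total of 127 operations"


def matmul_schedule(rows: int, inner: int, cols: int,
                    base_c: int, base_a: int, base_b: int):
    """Yield (c, a, b) flat register indices for C += A @ B.

    Hypothetical layout: A is rows x inner, B is inner x cols and C is
    rows x cols, each stored row-major in one flat register file.
    """
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                yield (base_c + i * cols + j,
                       base_a + i * inner + k,
                       base_b + k * cols + j)


def remapped_fmadd(regs: list[float], schedule) -> list[float]:
    """Model one remapped Vector fmadd: regs[c] = regs[a] * regs[b] + regs[c]."""
    steps = list(schedule)
    if len(steps) > MAX_OPS:
        raise ValueError(f"{len(steps)} operations exceeds {MAX_OPS}")
    for c, a, b in steps:
        regs[c] = regs[a] * regs[b] + regs[c]
    return regs


if __name__ == "__main__":
    regs = [0.0] * 32
    regs[8:14] = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # A: 2x3, row-major
    regs[16:22] = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]  # B: 3x2, row-major
    remapped_fmadd(regs, matmul_schedule(2, 3, 2, base_c=0, base_a=8, base_b=16))
    print(regs[0:4])  # C: [58.0, 64.0, 139.0, 154.0]
```

Keeping the index generation separate from the arithmetic is meant to mirror the split between the shape/remap setup and the single multiply-add: no data is moved or transposed, only the register indices change.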

# Use case: Parallel Reduction

Parallel (Horizontal) Reduction is often deeply problematic in SIMD and
Vector ISAs. In Simple-V, Parallel Reduction is Fully Deterministic, so it
may usefully be deployed even on non-associative and non-commutative
operations.

<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_parallel_reduce.py;hb=HEAD>

```
# test_sv_remap2()
# REMAP sv.subf RT,RA,RB - inverted application of RA/RB
# left/right due to subf
lst = ["svshape 7, 0, 0, 7, 0",
       "svremap 31, 1, 0, 0, 0, 0, 0", # different order
       "sv.subf *0, *8, *16"
       ]
```
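
Determinism comes from the reduction tree being a fixed function of the vector length rather than of the hardware. The sketch below illustrates this under simplifying assumptions: no predication, and a plain power-of-two stride pattern with hypothetical function names (the normative REMAP schedule, including predicated elements, is defined in the specification). Because the pair order never changes, even a non-associative operation such as subtraction always yields the same answer. Seven elements are used purely to echo the test above.

```python
from typing import Callable, Iterator


def reduction_pairs(vl: int) -> Iterator[tuple[int, int]]:
    """Yield (dest, src) pairs of a fixed binary reduction tree over vl elements.

    Pairs produced within one stride level touch disjoint elements, so each
    level could run in parallel; the order depends only on vl.
    """
    step = 1
    while step < vl:
        for i in range(0, vl, 2 * step):
            if i + step < vl:
                yield i, i + step
        step *= 2


def parallel_reduce(vec: list[float], op: Callable[[float, float], float]) -> float:
    """Reduce a copy of vec along reduction_pairs; the result lands in element 0."""
    if not vec:
        raise ValueError("cannot reduce an empty vector")
    vals = list(vec)
    for dest, src in reduction_pairs(len(vals)):
        vals[dest] = op(vals[dest], vals[src])
    return vals[0]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    print(list(reduction_pairs(7)))
    # [(0, 1), (2, 3), (4, 5), (0, 2), (4, 6), (0, 4)]
    print(parallel_reduce(data, lambda a, b: a + b))  # 28.0
    print(parallel_reduce(data, lambda a, b: a - b))  # 8.0, on every run
```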

# Use case: LD/ST-Multi

Saving and restoring registers on the stack during a context-switch often
requires explicit loop-unrolling to be efficient. In SVP64 a Predicate
Mask may instead be used to dynamically "compact" or "expand" a swathe of
desired registers. With these operations, known as "VCOMPRESS" and
"VEXPAND", runtime-configurable LD/ST-Multi is achievable in 2
instructions.

```
# load 64 registers off the stack, in-order, skipping unneeded ones
# by using CR0-CR63's "EQ" bits to select only those needed.
setvli 64
sv.ld/sm=EQ *rt,0(ra)
```
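
The "VCOMPRESS" and "VEXPAND" behaviour described above can be modelled in a few lines of Python. This is a behavioural sketch only: the function names are hypothetical, a plain list of booleans stands in for the CR "EQ" bits, and how `sv.ld` actually applies source versus destination masks is defined by the SVP64 specification. Compressing corresponds to saving only the live registers contiguously; expanding corresponds to restoring them to their original positions.

```python
def vcompress(src: list[int], mask: list[bool]) -> list[int]:
    """Pack the elements whose mask bit is set into a contiguous list."""
    return [value for value, keep in zip(src, mask) if keep]


def vexpand(packed: list[int], mask: list[bool], fill: int = 0) -> list[int]:
    """Scatter contiguous elements back to the positions whose mask bit is set.

    Positions with a clear mask bit receive `fill` (a stand-in for
    "left untouched" in real hardware).
    """
    if sum(mask) > len(packed):
        raise ValueError("not enough packed elements for the mask")
    it = iter(packed)
    return [next(it) if keep else fill for keep in mask]


if __name__ == "__main__":
    regs = [10, 11, 12, 13, 14, 15]
    live = [True, False, True, True, False, True]
    stack = vcompress(regs, live)  # save only the live registers
    print(stack)                   # [10, 12, 13, 15]
    print(vexpand(stack, live))    # [10, 0, 12, 13, 0, 15]
```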
[[!tag opf_rfc]]
[^extend]: Prefix opcode space **must** be reserved in advance to do so, in order to avoid the catastrophic binary-incompatibility mistake made by RISC-V RVV and ARM SVE/2.
[^likeext001]: SVP64-Single is remarkably similar to the "bit 1" of EXT001 being set to indicate that all 64 bits are to be allocated in full to a new encoding, but in fact SVP64-Single still embeds v3.0 Scalar operations.
[^pseudorewrite]: elwidth overrides do, however, mean that all SFS / SFFS pseudocode will need rewriting in terms of XLEN. This has the indirect side-effect of automatically making a 32-bit Scalar Power ISA Specification possible, as well as a future 128-bit one (Cross-reference: RISC-V RV32 and RV128).