From: Luke Kenneth Casson Leighton <lkcl@lkcl.net>
Date: Fri, 9 Sep 2022 01:00:21 +0000 (+0100)
Subject: shuffle pages
X-Git-Tag: opf_rfc_ls005_v1~573
X-Git-Url: https://git.libre-soc.org/?a=commitdiff_plain;h=dcb76e17280cec4121968d2b5c69b169e47f3f3b;p=libreriscv.git

shuffle pages
---
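The DCT use case in this patch loads its input "bit-reversed with half-swap" (`svshape 8, 1, 1, 14, 0`). The Python sketch below illustrates only the bit-reversed ordering. The half-swap and the real REMAP encoding are not modelled, and `bit_reverse`, `bitrev_order` and `load_bitreversed` are hypothetical names, not part of openpower-isa.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # shift the lowest bit of i into r
        i >>= 1
    return r


def bitrev_order(n):
    """Bit-reversed element order for a power-of-two length n."""
    bits = n.bit_length() - 1
    if n != 1 << bits:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, bits) for i in range(n)]


def load_bitreversed(memory, n):
    """Hypothetical model of a bit-reversed vector load:
    element k is taken from memory[bit_reverse(k)]."""
    return [memory[j] for j in bitrev_order(n)]


print(bitrev_order(8))
```

Calling `bitrev_order(16)` gives the 16-element order from the same code, which loosely mirrors the claim that going 16-wide only needs `svshape 16`.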

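The Matrix Multiply use case describes three instructions acting as the "inner triple-loop kernel". The minimal Python model below applies one multiply-add (in the spirit of `sv.fmadds` with RT equal to RC) at every point of a three-dimensional index schedule. It assumes flat row-major lists and a k-innermost loop order. The actual REMAP ordering and the meaning of the `svshape 2, 2, 3, 0, 0` operands are not modelled, and all names are hypothetical.

```python
def matmul_schedule(rows, inner, cols):
    """Yield (result_idx, a_idx, b_idx) for a row-major rows x cols result."""
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                yield i * cols + j, i * inner + k, k * cols + j


def remapped_fmadds(a, b, rows, inner, cols):
    """Multiply a (rows x inner) by b (inner x cols), both flat row-major.

    Each schedule step performs one multiply-add into the result vector,
    standing in for one element operation of a REMAP'd fmadds.
    """
    # The RFC text quotes a limit of 127 element operations in total.
    if rows * inner * cols > 127:
        raise ValueError("more than 127 element operations")
    result = [0.0] * (rows * cols)
    for r, x, y in matmul_schedule(rows, inner, cols):
        result[r] = result[r] + a[x] * b[y]
    return result


# 2x3 times 3x2: a non-power-of-two shape
print(remapped_fmadds([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12], 2, 3, 2))
```

For the 2x3 by 3x2 example this prints `[58.0, 64.0, 139.0, 154.0]`. Nothing in the schedule requires power-of-two dimensions.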
diff --git a/openpower/sv/rfc/ls001.mdwn b/openpower/sv/rfc/ls001.mdwn
index 44f05e271..99aa77680 100644
--- a/openpower/sv/rfc/ls001.mdwn
+++ b/openpower/sv/rfc/ls001.mdwn
@@ -272,71 +272,6 @@ For each of EXT059 and EXT063:
   as of 08Sep2022
 
 \newpage{}
-
-# Use case: DCT
-
-DCT has dozens of uses in Audio-Visual processing and CODECs.
-A full 8-wide in-place triple-loop Inverse DCT may be achieved
-in 8 instructions.  Expanding this to 16-wide is a matter of setting
-`svshape 16` **and the same instructions used**.
-Lee Composition may be deployed to construct non-power-two DCTs.
-The cosine table may be computed (once) with 18 Vector instructions
-(one of them `fcos`)
-
-```
-1014     def test_sv_remap_fpmadds_ldbrev_idct_8_mode_4(self):
-1015         """>>> lst = [# LOAD bit-reversed with half-swap
-1016                       "svshape 8, 1, 1, 14, 0",
-1017                       "svremap 1, 0, 0, 0, 0, 0, 0",
-1018                       "sv.lfs/els *0, 4(1)",
-1019                       # Outer butterfly, iterative sum
-1020                       "svremap 31, 0, 1, 2, 1, 0, 1",
-1021                       "svshape 8, 1, 1, 11, 0",
-1022                       "sv.fadds *0, *0, *0",
-1023                       # Inner butterfly, twin +/- MUL-ADD-SUB
-1024                       "svshape 8, 1, 1, 10, 0",
-1025                       "sv.ffmadds *0, *0, *0, *8"
-```
-
-<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
-
-# Use case: Matrix Multiply
-
-Matrix Multiply of any size (non-power-2) up to a total of 127 operations
-is achievable with only three instructions.  Normally in any other SIMD
-ISA at least one source requires Transposition and often massive rolling
-repetition of data is required.  These 3 instructions may be used as the
-"inner triple-loop kernel" of the usual 6-loop Massive Matrix Multiply.
-
-```
-  28     def test_sv_remap1(self):
-  29         """>>> lst = ["svshape 2, 2, 3, 0, 0",
-  30                       "svremap 31, 1, 2, 3, 0, 0, 0",
-  31                       "sv.fmadds *0, *8, *16, *0"
-  32                      ]
-```
-
-<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_matrix.py;hb=HEAD>
-
-# Use case: Parallel Reduction
-
-Parallel (Horizontal) Reduction is often deeply problematic in SIMD and
-Vector ISAs.  Parallel Reduction is Fully Deterministic in Simple-V and
-thus may even usefully be deployed on non-associative and non-commutative
-operations.
-
-```
-  75     def test_sv_remap2(self):
-  76         """>>> lst = ["svshape 7, 0, 0, 7, 0",
-  77                       "svremap 31, 1, 0, 0, 0, 0, 0", # different order
-  78                       "sv.subf *0, *8, *16"
-  79                         ]
-  80                 REMAP sv.subf RT,RA,RB - inverted application of RA/RB
-  81                                          left/right due to subf
-```
-
-<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_parallel_reduce.py;hb=HEAD>
-
 # Use case: LD/ST-Multi
 
 Context-switching saving and restoring of registers on the stack often
@@ -351,7 +286,6 @@ runtime-configurable LD/ST-Multi is achievable with 2 instructions.
     setvli 64
     sv.ld/sm=EQ *rt,0(ra)
 ```
-\newpage{}
 
 # Use case: Twin-Predication, re-entrant
 
@@ -361,7 +295,9 @@ that sufficient state is stored within the Vector Context SPR, SVSTATE,
 for full re-entrancy on a Context Switch or function call *even if
 in the middle of executing a loop*.  Also demonstrates that it is
 permissible for a programmer to write **directly** to the SVSTATE
-SPR, and still expect Deterministic Behaviour.
+SPR, and still expect Deterministic Behaviour. Doing so is not recommended
+(direct SVSTATE access may impact performance), but neither is it
+prohibited.
 
 ```
  292     # checks that we are able to resume in the middle of a VL loop,
@@ -414,6 +350,71 @@ could then be performed.  Full Rationale at
 
 <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_bc.py;hb=HEAD>
 
+\newpage{}
+# Use case: DCT
+
+DCT has dozens of uses in Audio-Visual processing and CODECs.
+A full 8-wide in-place triple-loop Inverse DCT may be achieved
+in 8 instructions.  Expanding this to 16-wide is a matter of setting
+`svshape 16` **and re-using the same instructions**.
+Lee Composition may be deployed to construct non-power-two DCTs.
+The cosine table may be computed (once) with 18 Vector instructions
+(one of them `fcos`).
+
+```
+    def test_sv_remap_fpmadds_ldbrev_idct_8_mode_4(self):
+        """>>> lst = [# LOAD bit-reversed with half-swap
+                      "svshape 8, 1, 1, 14, 0",
+                      "svremap 1, 0, 0, 0, 0, 0, 0",
+                      "sv.lfs/els *0, 4(1)",
+                      # Outer butterfly, iterative sum
+                      "svremap 31, 0, 1, 2, 1, 0, 1",
+                      "svshape 8, 1, 1, 11, 0",
+                      "sv.fadds *0, *0, *0",
+                      # Inner butterfly, twin +/- MUL-ADD-SUB
+                      "svshape 8, 1, 1, 10, 0",
+                      "sv.ffmadds *0, *0, *0, *8"
+```
+
+<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
+
+# Use case: Matrix Multiply
+
+Matrix Multiply of any size (including non-power-of-two) up to a total of
+127 operations is achievable with only three instructions.  In other SIMD
+ISAs at least one source normally requires Transposition, and massive
+rolling repetition of data is often required.  These 3 instructions may
+be used as the "inner triple-loop kernel" of the usual 6-loop Massive Matrix Multiply.
+
+```
+    def test_sv_remap1(self):
+        """>>> lst = ["svshape 2, 2, 3, 0, 0",
+                      "svremap 31, 1, 2, 3, 0, 0, 0",
+                      "sv.fmadds *0, *8, *16, *0"
+                     ]
+```
+
+<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_matrix.py;hb=HEAD>
+
+# Use case: Parallel Reduction
+
+Parallel (Horizontal) Reduction is often deeply problematic in SIMD and
+Vector ISAs.  Parallel Reduction is Fully Deterministic in Simple-V and
+thus may even usefully be deployed on non-associative and non-commutative
+operations.
+
+```
+    def test_sv_remap2(self):
+        """>>> lst = ["svshape 7, 0, 0, 7, 0",
+                      "svremap 31, 1, 0, 0, 0, 0, 0", # different order
+                      "sv.subf *0, *8, *16"
+                        ]
+                REMAP sv.subf RT,RA,RB - inverted application of RA/RB
+                                         left/right due to subf
+```
+
+<https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_parallel_reduce.py;hb=HEAD>
+
 [[!tag opf_rfc]]
 
 [^extend]: Prefix opcode space **must** be reserved in advance to do so, in order to avoid the catastrophic binary-incompatibility mistake made by RISC-V RVV and ARM SVE/2
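The Parallel Reduction use case relies on the reduction order being fully deterministic. The simplified Python model below uses a plain pairwise binary-tree schedule that skips out-of-range partners for non-power-of-two lengths. The real SVP64 REMAP schedule may differ, and the names are hypothetical. It shows why a fixed order makes even a non-commutative operation such as subtraction repeatable. For reference, Power ISA `subf RT,RA,RB` computes RB - RA, which is the operand inversion the test comment refers to.

```python
import operator


def reduction_schedule(vl):
    """Yield (dest, src) pairs: at each stage element dest absorbs element src."""
    step = 1
    while step < vl:
        for i in range(0, vl, 2 * step):
            if i + step < vl:  # skip partners beyond the vector length
                yield i, i + step
        step *= 2


def parallel_reduce(vec, op):
    """Apply op along the fixed tree schedule; the result lands in element 0."""
    v = list(vec)
    for dest, src in reduction_schedule(len(v)):
        v[dest] = op(v[dest], v[src])
    return v[0]


print(parallel_reduce(range(1, 8), operator.add))      # VL=7, not a power of two
print(parallel_reduce([10, 3, 2, 1], operator.sub))
```

The subtraction example always yields `6` ((10-3) - (2-1)), which differs from a left-to-right fold (`4`) but is the same on every run, so the result is Deterministic rather than implementation-dependent.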