whitespace

author Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Sat, 29 Apr 2023 17:12:58 +0000 (18:12 +0100)

committer Luke Kenneth Casson Leighton <lkcl@lkcl.net>

Sat, 29 Apr 2023 17:12:58 +0000 (18:12 +0100)
diff --git a/openpower/sv/rfc/ls016.mdwn b/openpower/sv/rfc/ls016.mdwn

index c50a2aaff74e06d253586f1ee182114b5d6dc603..aa44445057706b9534e9df155ab417b52c20a9a7 100644
--- a/openpower/sv/rfc/ls016.mdwn
+++ b/openpower/sv/rfc/ls016.mdwn
@@ -63,10 +63,10 @@
  
  The list of uses for DCT is enormous - well over a hundred.
  <https://en.wikipedia.org/wiki/Discrete_cosine_transform#General_applications>
-The number of uses for FFT is also equally known to be extremely high
+The number of uses for FFT, DFT, NTT is also equally known to be extremely high
  <https://en.wikipedia.org/wiki/Fast_Fourier_transform#Applications>
  ARM has already added `vqrdmulhq_s16/32` instructions as their inclusion
-in any ISA replaces **eight** non-Twin-Butterfly instructions, which
+in any ISA replaces **eight** equivalent non-Twin-Butterfly instructions, which
  are often loop-unrolled, resulting in L1 I-Cache stripmining as well
  as requiring far greater resources (double the number of intermediate
  Vector registers) or much more complex hardware to
@@ -75,26 +75,27 @@ get efficient execution.
  **Notes and Observations**:
  
  1. Whilst it is easy to justify these high-value instructions they are
-   sufficiently complex as to warrant placement as optional SFFS in
-   the new EXT2xx area (marked as Vectoriseable).
+   sufficiently complex as to warrant placement as optional SFFS in the
+   new EXT2xx area (marked as Vectoriseable).
  2. Although they are 3-in 2-out the actual encoding is as a double-overwrite
     reducing the actual number of operands down to three (RT RA and RB)
-   where RT is a Read-Modify-Write and an additional RS (normally RT+1) is implicit.
+   where RT is a Read-Modify-Write and an additional RS (normally RT+1)
+   is implicit.
  3. As with the biginteger set of 3-in 2-out instructions if Power ISA did not
-   already have LD/ST-with-Update, Load/Store-Quad, and other RTp and RAp instructions,
-   these instructions would not be proposed.
+   already have LD/ST-with-Update, Load/Store-Quad, and other RTp and
+   RAp instructions, these instructions would not be proposed.
  4. The read and write of two overlapping registers normally requires
-   an intermediate register (similar to the justifcation for CAS - Compare-and-Swap).
-   When Vectorised the situation becomes even worse: an entire *Vector*
-   of intermediate temporaries is required.
-   Thus *even if implemented inefficiently* requiring more cycles to complete
-   (taking an extra cycle to write the second result) these instructions still
-   save on resources.
+   an intermediate register (similar to the justifcation for CAS -
+   Compare-and-Swap).  When Vectorised the situation becomes even
+   worse: an entire *Vector* of intermediate temporaries is required.
+   Thus *even if implemented inefficiently* requiring more cycles to
+   complete (taking an extra cycle to write the second result) these
+   instructions still save on resources.
  5. Macro-op fusion equivalents of these instructions is *not possible* for
-   exactly the same reason that the equivalent CAS sequence may not be macro-op
-   fused.  Full in-place Vectorised FFT and DCT algorithms *only* become
-   possible due to these instructions atomically reading **both** operands
-   into internal Reservation Stations (exactly like CAS).
+   exactly the same reason that the equivalent CAS sequence may not be
+   macro-op fused.  Full in-place Vectorised FFT and DCT algorithms *only*
+   become possible due to these instructions atomically reading **both**
+   operands into internal Reservation Stations (exactly like CAS).
  5. Although desirable (particularly to detect overflow) Rc=1 is hard to
     conceptualise.  It is likely that instead, Simple-V "saturation" if
     enabled will create an Rc=1 CR.SO flag (including SVP64Single).
@@ -151,15 +152,15 @@ Add the following new fields to Book I 1.6.2 Word Instruction Fields:
  
  | Form | Book | Page | Version | Mnemonic | Description |
  |------|------|------|---------|----------|-------------|
-| A    | I    | #    | 3.2B    | maddsubrs | Integer DCT/FFT Twin-Butterfly |
-| X    | I    | #    | 3.2B    | fdmadds   | FP DCT Twin-Butterfly Single |
-| X    | I    | #    | 3.2B    | ffmadds   | FP FFT Twin-Butterfly Single |
-| X    | I    | #    | 3.2B    | fdmadds   | FP DCT Twin-Butterfly Double |
-| X    | I    | #    | 3.2B    | ffmadds   | FP FFT Twin-Butterfly Double |
-| X    | I    | #    | 3.2B    | ffadds    | FP FFT Twin-Butterfly Single |
-| X    | I    | #    | 3.2B    | ffadd     | FP FFT Twin-Butterfly Double |
-| X    | I    | #    | 3.2B    | ffsubs    | FP FFT Twin-Butterfly Single |
-| X    | I    | #    | 3.2B    | ffsub     | FP FFT Twin-Butterfly Double |
+| A    | I    | #    | 3.2B    |maddsubrs | Integer DCT/FFT Twin-Butterfly |
+| X    | I    | #    | 3.2B    |fdmadds   | FP DCT Twin-Butterfly Single |
+| X    | I    | #    | 3.2B    |ffmadds   | FP FFT Twin-Butterfly Single |
+| X    | I    | #    | 3.2B    |fdmadds   | FP DCT Twin-Butterfly Double |
+| X    | I    | #    | 3.2B    |ffmadds   | FP FFT Twin-Butterfly Double |
+| X    | I    | #    | 3.2B    |ffadds    | FP FFT Twin-Butterfly Single |
+| X    | I    | #    | 3.2B    |ffadd     | FP FFT Twin-Butterfly Double |
+| X    | I    | #    | 3.2B    |ffsubs    | FP FFT Twin-Butterfly Single |
+| X    | I    | #    | 3.2B    |ffsub     | FP FFT Twin-Butterfly Double |
  
  [[!tag opf_rfc]]
  
diff --git a/openpower/sv/twin_butterfly.mdwn b/openpower/sv/twin_butterfly.mdwn

index 1a17e9cdb086ef68c60ff62c46c16af996f3ba93..72a2552ec65541ebe96849f015e83884fb7f4019 100644
--- a/openpower/sv/twin_butterfly.mdwn
+++ b/openpower/sv/twin_butterfly.mdwn
@@ -77,6 +77,10 @@ Example taken from libvpx
      srawi 5,5,14
  ```
  
+-------
+
+\newpage{}
+
  ## Integer Butterfly Multiply Add/Sub FFT/DCT
  
  **Add the following to Book I Section 3.3.9.1**
@@ -125,6 +129,10 @@ Special Registers Altered:
      None
  ```
  
+-------
+
+\newpage{}
+
  # Twin Butterfly Floating-Point DCT Instruction(s)
  
  ## Floating-Point Twin Multiply-Add DCT [Single]
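
Note 4 in the ls016.mdwn hunk states that reading and writing two overlapping registers normally requires an intermediate register. A minimal Python sketch of that point follows. It assumes a simplified butterfly (sum and difference, each scaled by a coefficient) and does not reproduce the exact `maddsubrs` semantics, which this diff does not show. The function names and the list-as-register-file model are hypothetical.

```python
def butterfly_two_step(regs, rt, ra, rb, rs=None):
    """Two separate operations (assumed model, not the real ISA).

    The first result overwrites regs[rt], so the original RT value
    has to be saved in a temporary before the second result is computed.
    """
    rs = rt + 1 if rs is None else rs    # RS is "normally RT+1"
    tmp = regs[rt]                       # the extra intermediate register
    regs[rt] = (tmp + regs[ra]) * regs[rb]
    regs[rs] = (tmp - regs[ra]) * regs[rb]
    return regs


def butterfly_fused(regs, rt, ra, rb, rs=None):
    """One 3-in 2-out operation (assumed model, not the real ISA).

    All three operands are read before either result is written,
    so no separate temporary register is needed.
    """
    rs = rt + 1 if rs is None else rs
    a, b, c = regs[rt], regs[ra], regs[rb]  # read RT, RA, RB up front
    regs[rt] = (a + b) * c                  # RT is read-modify-write
    regs[rs] = (a - b) * c                  # second result goes to RS
    return regs
```

Both functions produce the same register contents. The difference is that the two-step version carries `tmp`, which under Vectorisation would become an entire Vector of temporaries, as the note observes.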