From b85bcbb2f39c755d98b71afb98ff990face5979d Mon Sep 17 00:00:00 2001
From: Luke Kenneth Casson Leighton
Date: Sat, 29 Apr 2023 18:12:58 +0100
Subject: [PATCH] whitespace

---
 openpower/sv/rfc/ls016.mdwn      | 53 ++++++++++++++++----------------
 openpower/sv/twin_butterfly.mdwn |  8 +++++
 2 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/openpower/sv/rfc/ls016.mdwn b/openpower/sv/rfc/ls016.mdwn
index c50a2aaff..aa4444505 100644
--- a/openpower/sv/rfc/ls016.mdwn
+++ b/openpower/sv/rfc/ls016.mdwn
@@ -63,10 +63,10 @@
 
 The list of uses for DCT is enormous - well over a hundred.
 
-The number of uses for FFT is also equally known to be extremely high
+The number of uses for FFT, DFT, NTT is also equally known to be extremely high
 
 ARM has already added `vqrdmulhq_s16/32` instructions as their inclusion
-in any ISA replaces **eight** non-Twin-Butterfly instructions, which
+in any ISA replaces **eight** equivalent non-Twin-Butterfly instructions, which
 are often loop-unrolled, resulting in L1 I-Cache stripmining as well
 as requiring far greater resources (double the number of intermediate
 Vector registers) or much more complex hardware to
@@ -75,26 +75,27 @@ get efficient execution.
 **Notes and Observations**:
 
 1. Whilst it is easy to justify these high-value instructions they are
- sufficiently complex as to warrant placement as optional SFFS in
- the new EXT2xx area (marked as Vectoriseable).
+ sufficiently complex as to warrant placement as optional SFFS in the
+ new EXT2xx area (marked as Vectoriseable).
 2. Although they are 3-in 2-out the actual encoding is as a double-overwrite
  reducing the actual number of operands down to three (RT RA and RB)
- where RT is a Read-Modify-Write and an additional RS (normally RT+1) is implicit.
+ where RT is a Read-Modify-Write and an additional RS (normally RT+1)
+ is implicit.
 3. As with the biginteger set of 3-in 2-out instructions if Power ISA did not
- already have LD/ST-with-Update, Load/Store-Quad, and other RTp and RAp instructions,
- these instructions would not be proposed.
+ already have LD/ST-with-Update, Load/Store-Quad, and other RTp and
+ RAp instructions, these instructions would not be proposed.
 4. The read and write of two overlapping registers normally requires
- an intermediate register (similar to the justifcation for CAS - Compare-and-Swap).
- When Vectorised the situation becomes even worse: an entire *Vector*
- of intermediate temporaries is required.
- Thus *even if implemented inefficiently* requiring more cycles to complete
- (taking an extra cycle to write the second result) these instructions still
- save on resources.
+ an intermediate register (similar to the justifcation for CAS -
+ Compare-and-Swap). When Vectorised the situation becomes even
+ worse: an entire *Vector* of intermediate temporaries is required.
+ Thus *even if implemented inefficiently* requiring more cycles to
+ complete (taking an extra cycle to write the second result) these
+ instructions still save on resources.
 5. Macro-op fusion equivalents of these instructions is *not possible* for
- exactly the same reason that the equivalent CAS sequence may not be macro-op
- fused. Full in-place Vectorised FFT and DCT algorithms *only* become
- possible due to these instructions atomically reading **both** operands
- into internal Reservation Stations (exactly like CAS).
+ exactly the same reason that the equivalent CAS sequence may not be
+ macro-op fused. Full in-place Vectorised FFT and DCT algorithms *only*
+ become possible due to these instructions atomically reading **both**
+ operands into internal Reservation Stations (exactly like CAS).
 5. Although desirable (particularly to detect overflow) Rc=1 is hard to
  conceptualise. It is likely that instead, Simple-V "saturation" if enabled
  will create an Rc=1 CR.SO flag (including SVP64Single).
@@ -151,15 +152,15 @@ Add the following new fields to Book I 1.6.2 Word Instruction Fields:
 
 | Form | Book | Page | Version | Mnemonic | Description |
 |------|------|------|---------|----------|-------------|
-| A | I | # | 3.2B | maddsubrs | Integer DCT/FFT Twin-Butterfly |
-| X | I | # | 3.2B | fdmadds | FP DCT Twin-Butterfly Single |
-| X | I | # | 3.2B | ffmadds | FP FFT Twin-Butterfly Single |
-| X | I | # | 3.2B | fdmadds | FP DCT Twin-Butterfly Double |
-| X | I | # | 3.2B | ffmadds | FP FFT Twin-Butterfly Double |
-| X | I | # | 3.2B | ffadds | FP FFT Twin-Butterfly Single |
-| X | I | # | 3.2B | ffadd | FP FFT Twin-Butterfly Double |
-| X | I | # | 3.2B | ffsubs | FP FFT Twin-Butterfly Single |
-| X | I | # | 3.2B | ffsub | FP FFT Twin-Butterfly Double |
+| A | I | # | 3.2B |maddsubrs | Integer DCT/FFT Twin-Butterfly |
+| X | I | # | 3.2B |fdmadds | FP DCT Twin-Butterfly Single |
+| X | I | # | 3.2B |ffmadds | FP FFT Twin-Butterfly Single |
+| X | I | # | 3.2B |fdmadds | FP DCT Twin-Butterfly Double |
+| X | I | # | 3.2B |ffmadds | FP FFT Twin-Butterfly Double |
+| X | I | # | 3.2B |ffadds | FP FFT Twin-Butterfly Single |
+| X | I | # | 3.2B |ffadd | FP FFT Twin-Butterfly Double |
+| X | I | # | 3.2B |ffsubs | FP FFT Twin-Butterfly Single |
+| X | I | # | 3.2B |ffsub | FP FFT Twin-Butterfly Double |
 
 [[!tag opf_rfc]]
 
diff --git a/openpower/sv/twin_butterfly.mdwn b/openpower/sv/twin_butterfly.mdwn
index 1a17e9cdb..72a2552ec 100644
--- a/openpower/sv/twin_butterfly.mdwn
+++ b/openpower/sv/twin_butterfly.mdwn
@@ -77,6 +77,10 @@ Example taken from libvpx
 srawi 5,5,14
 ```
 
+-------
+
+\newpage{}
+
 ## Integer Butterfly Multiply Add/Sub FFT/DCT
 
 **Add the following to Book I Section 3.3.9.1**
@@ -125,6 +129,10 @@ Special Registers Altered:
 None
 ```
 
+-------
+
+\newpage{}
+
 # Twin Butterfly Floating-Point DCT Instruction(s)
 
 ## Floating-Point Twin Multiply-Add DCT [Single]
-- 
2.30.2