* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]
* [[openpower/sv/rfc/ls016]]
-
<!-- show -->
+Although best used with SVP64 REMAP, these instructions may be used in a Scalar-only
+context to save considerably on DCT, DFT and FFT processing. Whilst some hardware
+implementations may not implement them efficiently (using slower Micro-coding),
+savings still come from the reduction in temporary registers as well as instruction
+count.
+
# Rationale for Twin Butterfly Integer DCT Instruction(s)
The number of general-purpose uses for DCT is huge. The number of
For the double-coefficient butterfly instruction.
-`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`
+In a 32-bit context `fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`
```
#define ROUND_POWER_OF_TWO(value, n) \
These instructions are at the core of **ALL** FDCT calculations in many
major video codecs, including -but not limited to- VP8/VP9, AV1, etc.
-Arm includes special instructions to optimize these operations, although
+ARM includes special instructions to optimize these operations, although
they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.
The suggestion is to have a single instruction to calculate both values
```
|0 |6 |11 |16 |21 |26 |31 |
| PO | RT | RA | RB | SH | XO |Rc |
-
```
* maddsubrs RT,RA,SH,RB
n <- SH
sum <- (RT) + (RA)
diff <- (RT) - (RA)
- prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1] + 1
- prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1] + 1
- res1 <- ROTL64(prod1, XLEN-n)
- res2 <- ROTL64(prod2, XLEN-n)
- m <- MASK(n, (XLEN-1))
- signbit1 <- res1[0]
- signbit2 <- res2[0]
- smask1 <- ([signbit1]*XLEN) & ¬m
- smask2 <- ([signbit2]*XLEN) & ¬m
- s64_1 <- [0]*(XLEN-1) || signbit1
- s64_2 <- [0]*(XLEN-1) || signbit2
- RT <- (res1 & m | smask1) + s64_1
- RS <- (res2 & m | smask2) + s64_2
+ prod1 <- MULS(RB, sum)
+ prod1_lo <- prod1[XLEN:(XLEN*2)-1]
+ prod2 <- MULS(RB, diff)
+ prod2_lo <- prod2[XLEN:(XLEN*2)-1]
+ if n = 0 then
+ RT <- prod1_lo
+ RS <- prod2_lo
+ else
+ round <- [0]*XLEN
+ round[XLEN-n] <- 1
+ prod1_lo <- prod1_lo + round
+ prod2_lo <- prod2_lo + round
+ m <- MASK(n, (XLEN-1))
+ res1 <- ROTL64(prod1_lo, XLEN-n) & m
+ res2 <- ROTL64(prod2_lo, XLEN-n) & m
+ signbit1 <- prod1_lo[0]
+ signbit2 <- prod2_lo[0]
+ smask1 <- ([signbit1]*XLEN) & ¬m
+ smask2 <- ([signbit2]*XLEN) & ¬m
+ RT <- (res1 | smask1)
+ RS <- (res2 | smask2)
```
Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`
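
The revised pseudo-code above can be read as "multiply, add half an LSB, then arithmetic-shift right by `SH`" applied to the low XLEN bits of each product. A minimal Python sketch of that reading follows. It is an illustrative model only, not the reference implementation: the function name `maddsubrs`, the helper `to_signed`, the choice `XLEN = 64`, and the wrapping of `sum` and `diff` to XLEN bits are all assumptions made for this example.

```python
XLEN = 64                      # assumed register width for this sketch
MASK = (1 << XLEN) - 1


def to_signed(x):
    """Interpret the low XLEN bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def maddsubrs(rt, ra, rb, sh):
    """Hypothetical model: returns (new RT, RS) for the twin butterfly."""
    total = to_signed(rt + ra)             # sum  <- (RT) + (RA)
    diff = to_signed(rt - ra)              # diff <- (RT) - (RA)
    # Low XLEN bits of each signed product (prod[XLEN:(XLEN*2)-1]).
    lo1 = (to_signed(rb) * total) & MASK
    lo2 = (to_signed(rb) * diff) & MASK
    if sh == 0:
        return to_signed(lo1), to_signed(lo2)
    rnd = 1 << (sh - 1)                    # round[XLEN-n] <- 1 in MSB0 numbering
    # Rotate + mask + sign-fill is an arithmetic shift right by sh;
    # Python's >> on a negative int already sign-extends.
    res1 = to_signed(lo1 + rnd) >> sh
    res2 = to_signed(lo2 + rnd) >> sh
    return res1, res2
```

As a usage example, `maddsubrs(3, 1, 11585, 14)` models a butterfly with a 14-bit coefficient of roughly `2^14 / sqrt(2)`: the sum 4 and difference 2 are scaled to 3 and 1 respectively after rounding.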
\newpage{}
-# Twin Butterfly Floating-Point DCT Instruction(s)
-
-## Floating-Point Twin Multiply-Add DCT [Single]
+# Twin Butterfly Floating-Point DCT and FFT Instruction(s)
**Add the following to Book I Section 4.6.6.3**
+## Floating-Point Twin Multiply-Add DCT [Single]
+
X-Form
```
Using the exact same operand input register values from FRT and FRB
that were used to create FRS, the Floating-Point operand in register
FRB is subtracted from the floating-point operand in register FRT and
-the result then multiplied by FRA to create an intermediate result that
-is stored in FRT.
+the result then rounded before being multiplied by FRA to create an
+intermediate result that is stored in FRT.
The add into FRS is treated exactly as `fadds`. The creation of the
-result FRT is **not** the same as that of `fmsubs`.
-The creation of FRS and FRT are treated as parallel independent operations
-which occur at the same time.
+result FRT is **not** the same as that of `fmsubs`, but is instead as if
+`fsubs` were performed first followed by `fmuls`. The creation of FRS
+and FRT are treated as parallel independent operations which occur at
+the same time.
Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`
## Floating-Point Multiply-Add FFT [Single]
-**Add the following to Book I Section 4.6.6.3**
-
X-Form
```
Using the exact same values of FRT, FRT and FRB as used to create
FRS, the floating-point operand in register FRT is multiplied by the
-floating-point operand in register FRA. The float- ing-point operand
+floating-point operand in register FRA. The floating-point operand
in register FRB is subtracted from this intermediate result, and the
intermediate stored in FRT.
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).
-
Special Registers Altered:
```
FX OX UX XX
VXSNAN VXISI VXIMZ
```
-## Floating-Point Twin Multiply-Add DCT
-**Add the following to Book I Section 4.6.6.3**
+## Floating-Point Twin Multiply-Add DCT
X-Form
Using the exact same operand input register values from FRT and FRB
that were used to create FRS, the Floating-Point operand in register
FRB is subtracted from the floating-point operand in register FRT and
-the result then multiplied by FRA to create an intermediate result that
-is stored in FRT.
+the result then rounded before being multiplied by FRA to create an
+intermediate result that is stored in FRT.
The add into FRS is treated exactly as `fadd`. The creation of the
-result FRT is **not** the same as that of `fmsub`.
-The creation of FRS and FRT are treated as parallel independent operations
-which occur at the same time.
+result FRT is **not** the same as that of `fmsub`, but is instead as if
+`fsub` were performed first followed by `fmul`. The creation of FRS
+and FRT are treated as parallel independent operations which occur at
+the same time.
Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`
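
The distinction drawn above between "subtract, round, then multiply" and a fused operation can be sketched in Python, whose `float` is an IEEE 754 double. This is a hedged illustration only: the function names `fdmadd_model` and `single_rounding_reference` are hypothetical, and the operand roles (FRS as FRT + FRB, FRT as (FRT - FRB) * FRA) are taken from the description in this section rather than from the full pseudo-code.

```python
from fractions import Fraction


def fdmadd_model(frt, fra, frb):
    """Hypothetical double-precision model: returns (new FRT, FRS)."""
    frs = frt + frb          # one rounding, as for fadd
    diff = frt - frb         # rounded here, as for fsub
    new_frt = diff * fra     # rounded a second time, as for fmul
    return new_frt, frs


def single_rounding_reference(frt, fra, frb):
    """What a single-rounding (fused-style) evaluation would give instead.

    The exact value is computed with rationals and rounded once to double.
    """
    exact = (Fraction(frt) - Fraction(frb)) * Fraction(fra)
    return float(exact)


# Inputs chosen so that the intermediate difference is not representable.
frt, fra, frb = float(2**53 + 2), 1.5, -1.0
two_step, frs = fdmadd_model(frt, fra, frb)
fused = single_rounding_reference(frt, fra, frb)
print(two_step - fused)      # the two evaluations differ for these inputs
```

For these inputs the exact difference `2^53 + 3` rounds to `2^53 + 4` before the multiply, so the two-step result lands one representable value above the singly-rounded one. Software that needs bit-exact agreement with this instruction would therefore have to reproduce the two-step rounding rather than use a fused multiply-subtract.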
## Floating-Point Twin Multiply-Add FFT
-**Add the following to Book I Section 4.6.6.3**
-
X-Form
```
```
-## [DRAFT] Floating-Point Add FFT/DCT [Single]
+## Floating-Point Add FFT/DCT [Single]
A-Form
+```
+ |0 |6 |11 |16 |21 |26 |31 |
+ | PO | FRT | FRA | FRB | / | XO |Rc |
+```
+
* ffadds FRT,FRA,FRB (Rc=0)
-* ffadds. FRT,FRA,FRB (Rc=1)
Pseudo-code:
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
- CR1 (if Rc=1)
```
-## [DRAFT] Floating-Point Add FFT/DCT [Double]
+## Floating-Point Add FFT/DCT [Double]
A-Form
+```
+ |0 |6 |11 |16 |21 |26 |31 |
+ | PO | FRT | FRA | FRB | / | XO |Rc |
+```
+
* ffadd FRT,FRA,FRB (Rc=0)
-* ffadd. FRT,FRA,FRB (Rc=1)
Pseudo-code:
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
- CR1 (if Rc=1)
```
-## [DRAFT] Floating-Point Subtract FFT/DCT [Single]
+## Floating-Point Subtract FFT/DCT [Single]
A-Form
+```
+ |0 |6 |11 |16 |21 |26 |31 |
+ | PO | FRT | FRA | FRB | / | XO |Rc |
+```
+
* ffsubs FRT,FRA,FRB (Rc=0)
-* ffsubs. FRT,FRA,FRB (Rc=1)
Pseudo-code:
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
- CR1 (if Rc=1)
```
-## [DRAFT] Floating-Point Subtract FFT/DCT [Double]
+## Floating-Point Subtract FFT/DCT [Double]
A-Form
+```
+ |0 |6 |11 |16 |21 |26 |31 |
+ | PO | FRT | FRA | FRB | / | XO |Rc |
+```
+
* ffsub FRT,FRA,FRB (Rc=0)
-* ffsub. FRT,FRA,FRB (Rc=1)
Pseudo-code:
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
- CR1 (if Rc=1)
```