# Introduction

* for format and information about implicit RS/FRS
* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]

# Rationale for Twin Butterfly Integer DCT Instruction(s)

The number of general-purpose uses for DCT is huge. Without these
Twin-Butterfly instructions, each butterfly takes a large number of
instructions (**eight**), and because it is extremely common to explicitly
loop-unroll them, instruction counts in the hundreds to thousands are
dismayingly common (for all ISAs).

The goal is to implement instructions that calculate the expression:

```
fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
fdct_round_shift(a * c1 +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`:

```
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))
```

These instructions are at the core of **ALL** FDCT calculations in many
major video codecs, including, but not limited to, VP8/VP9 and AV1.
Arm provides special instructions to optimize these operations (exposed as
the intrinsics `vqrdmulhq_s16`/`vqrdmulhq_s32`), although they are limited
in precision.

The suggestion is to have a single instruction that calculates both values,
`((a + b) * c) >> N` and `((a - b) * c) >> N`. The instruction will run in
accumulate mode, so to calculate the two-coefficient version one would
just call the same instruction again with `a` and `b` in a different order
and a different constant `c`.

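As a rough illustration of the arithmetic described in this rationale, the following C sketch computes both butterfly forms in software. It is not a model of any instruction encoding: the names `butterfly_t`, `butterfly_single` and `butterfly_double` are hypothetical, and the sketch assumes 32-bit inputs widened to 64 bits and an arithmetic right shift for negative values (implementation-defined in C, though usual in practice).

```c
#include <stdint.h>

/* Rounding right shift, exactly as given in the text. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))

/* fdct_round_shift(x) is ROUND_POWER_OF_TWO(x, 14), per the text. */
static inline int64_t fdct_round_shift(int64_t x)
{
    return ROUND_POWER_OF_TWO(x, 14);
}

/* Hypothetical container for the "+" and "-" results of one butterfly. */
typedef struct {
    int32_t add;
    int32_t sub;
} butterfly_t;

/* Single-coefficient form: fdct_round_shift((a +/- b) * c).
 * Operands are widened to 64 bits before adding and multiplying so the
 * intermediate values cannot overflow. */
static butterfly_t butterfly_single(int32_t a, int32_t b, int32_t c)
{
    butterfly_t r;
    r.add = (int32_t)fdct_round_shift(((int64_t)a + b) * c);
    r.sub = (int32_t)fdct_round_shift(((int64_t)a - b) * c);
    return r;
}

/* Double-coefficient form: fdct_round_shift(a * c1 +/- b * c2). */
static butterfly_t butterfly_double(int32_t a, int32_t b,
                                    int32_t c1, int32_t c2)
{
    int64_t p1 = (int64_t)a * c1;
    int64_t p2 = (int64_t)b * c2;
    butterfly_t r;
    r.add = (int32_t)fdct_round_shift(p1 + p2);
    r.sub = (int32_t)fdct_round_shift(p1 - p2);
    return r;
}
```

With the illustrative inputs `a = 100`, `b = 50`, `c = 11585`, `butterfly_single` yields 106 and 35. Counting the add, subtract, two multiplies, two rounding adds and two shifts in `butterfly_single` gives a feel for why a scalar implementation needs on the order of eight operations per butterfly.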
## Integer Butterfly Multiply Add/Sub FFT/DCT

**Add the following to Book I Section 3.3.9.1**

A-Form

```
|0     |6     |11    |16    |21    |26    |31 |
| PO   | RT   | RA   | RB   | SH   | XO   |/  |
```

* maddsubrs RT,RA,SH,RB

Pseudo-code:

```
n <- SH
sum <- (RT) + (RA)
diff <- (RT) - (RA)
prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1]
prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1]
res1 <- ROTL64(prod1, XLEN-n)
res2 <- ROTL64(prod2, XLEN-n)
m <- MASK(n, (XLEN-1))
signbit1 <- res1[0]
signbit2 <- res2[0]
smask1 <- ([signbit1]*XLEN) & ¬m
smask2 <- ([signbit2]*XLEN) & ¬m
s64_1 <- [0]*(XLEN-1) || signbit1
s64_2 <- [0]*(XLEN-1) || signbit2
RT <- (res1 & m | smask1) + s64_1
RS <- (res2 & m | smask2) + s64_2
```

Special Registers Altered:

```
None
```

# Twin Butterfly Integer DCT Instruction(s)

## Floating Twin Multiply-Add DCT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21    |31 |
| PO   | FRT  | FRA  | FRB  | XO   |/  |
```

* fdmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPADD32(FRT, FRB)
sub <- FPSUB32(FRT, FRB)
FRT <- FPMUL32(FRA, sub)
```

Special Registers Altered:

```
FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ
```

## Floating Multiply-Add FFT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21    |31 |
| PO   | FRT  | FRA  | FRB  | XO   |/  |
```

* ffmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```

Special Registers Altered:

```
FPRF FR FI FX OX UX XX VXSNAN VXISI VXIMZ
```
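
The data flow of the two floating-point listings above can be sketched in plain C as follows. This is only an illustrative model: the names `fp_pair_t`, `fdmadds_model` and `ffmadds_model` are hypothetical, rounding behaviour and the status flags listed under "Special Registers Altered" are not modelled, and it assumes that `FPMULADD32(x, y, z, s1, s2)` computes `s1*(x*y) + s2*z`, which the text above does not spell out.

```c
/* Hypothetical pair holding the two destination values of one instruction. */
typedef struct {
    float frt; /* value written back to FRT */
    float frs; /* value written to the implicit FRS */
} fp_pair_t;

/* Sketch of fdmadds: both results are computed from the ORIGINAL FRT. */
static fp_pair_t fdmadds_model(float frt, float fra, float frb)
{
    fp_pair_t r;
    r.frs = frt + frb;          /* FRS <- FPADD32(FRT, FRB) */
    r.frt = fra * (frt - frb);  /* sub <- FPSUB32(FRT, FRB); FRT <- FPMUL32(FRA, sub) */
    return r;
}

/* Sketch of ffmadds.
 * ASSUMPTION: FPMULADD32(x, y, z, s1, s2) = s1*(x*y) + s2*z. */
static fp_pair_t ffmadds_model(float frt, float fra, float frb)
{
    fp_pair_t r;
    r.frs = -(frt * fra) + frb; /* FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1) */
    r.frt = (frt * fra) + frb;  /* FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1) */
    return r;
}
```

In both sketches the input `frt` is read before either output is produced, mirroring the pseudo-code, where FRS is assigned first and FRT is overwritten last.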