* for format and information about implicit RS/FRS
* [[openpower/isa/svfparith]]

# [DRAFT] Twin Butterfly DCT Instruction(s)

The goal is to implement instructions that calculate the expression:

```
fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
fdct_round_shift(a * c1 +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`:

```
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))
```

These instructions are at the core of **ALL** FDCT calculations in many
major video codecs, including, but not limited to, VP8/VP9 and AV1.
Arm provides dedicated instructions to optimize these operations (exposed
as the intrinsics `vqrdmulhq_s16`/`vqrdmulhq_s32`), although they are
limited in precision.

The proposal is a single instruction that calculates both values,
`((a + b) * c) >> N` and `((a - b) * c) >> N`. The instruction will run
in accumulate mode, so to calculate the two-coefficient version one only
has to call the same instruction again with `a` and `b` in a different
order and with a different constant `c`.

# [DRAFT] Integer Butterfly Multiply Add/Sub FFT/DCT

BF-Form

* maddsubrs RT,RA,RB,RC,SH

Pseudo-code:

```
sum <- (RA) + (RB)
diff <- (RA) - (RB)
prod1 <- MUL(RC, sum)      # TODO: pick hi-half
prod2 <- MUL(RC, diff)     # TODO: pick hi-half
res1 <- ROTL64(prod1, SH)  # TODO shift the other way (63-SH?)
res2 <- ROTL64(prod2, SH)
RT <- res1
RS <- res2
```

Special Registers Altered:

```
None
```

Where BF-Form is defined in fields.txt:

```
# 1.6.39 BF-FORM
    |0     |6     |11    |16    |21    |26    |27  31|
    | PO   |  RT  |  RA  |  RB  |  RC  |  SH  |  XO  |
```

The instruction has been added to `minor_22.csv`:

```
------01000,ALU,OP_MADDSUBRS,RA,RB,RC,RT,NONE,CR0,0,0,ZERO,0,NONE,0,0,0,0,1,0,RC_ONLY,0,0,maddsubrs,BF,,1,unofficial until submitted and approved/renumbered by the opf isa wg
```
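
For reference, the single-coefficient arithmetic `fdct_round_shift((a +/- b) * c)` from the Twin Butterfly section can be modelled in plain C. This is only a sketch of the scalar calculation the instruction is meant to accelerate, not a model of `maddsubrs` itself: the hi-half selection and shift direction are still marked TODO in the pseudo-code. The names `round_power_of_two`, `butterfly_pair` and `twin_butterfly` are hypothetical, and the input widths (32-bit `a`, `b`, 16-bit `c`) are an assumption chosen so that the 64-bit intermediate cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding right shift, mirroring ROUND_POWER_OF_TWO from the text.
 * Assumes 1 <= n <= 62 and that >> on a negative int64_t is an
 * arithmetic shift (implementation-defined in C, but the behaviour
 * of mainstream compilers such as GCC and Clang). */
static int64_t round_power_of_two(int64_t value, unsigned n)
{
    assert(n >= 1 && n <= 62);
    return (value + ((int64_t)1 << (n - 1))) >> n;
}

/* fdct_round_shift(x) is ROUND_POWER_OF_TWO(x, 14). */
static int64_t fdct_round_shift(int64_t x)
{
    return round_power_of_two(x, 14);
}

/* Hypothetical container for the two results, which the instruction
 * would place in RT and the implicit RS. */
typedef struct {
    int64_t sum_out;   /* fdct_round_shift((a + b) * c) */
    int64_t diff_out;  /* fdct_round_shift((a - b) * c) */
} butterfly_pair;

/* Compute both butterfly outputs from one (a, b, c) triple.
 * Widening to 64 bits before the add/sub and the multiply keeps
 * the intermediate below 2^48 in magnitude for these input widths. */
static butterfly_pair twin_butterfly(int32_t a, int32_t b, int16_t c)
{
    butterfly_pair r;
    int64_t sum  = (int64_t)a + b;
    int64_t diff = (int64_t)a - b;
    r.sum_out  = fdct_round_shift(sum * c);
    r.diff_out = fdct_round_shift(diff * c);
    return r;
}
```

As a usage example, `twin_butterfly(100, 28, 11585)` gives `sum_out == 91` and `diff_out == 51`; the constant 11585 is simply an illustrative 14-bit fixed-point coefficient.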