openpower/sv/twin_butterfly.mdwn

# Introduction

<!-- hide -->
* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]
<!-- show -->

# Rationale for Twin Butterfly Integer DCT Instruction(s)

The number of general-purpose uses for DCT is huge. The
number of instructions needed in place of each Twin-Butterfly
instruction is also large (**eight**), and because these
sequences are very commonly explicitly loop-unrolled, runs of
hundreds to thousands of instructions are dismayingly common
(for all ISAs).

The goal is to implement instructions that calculate the expression:

```
    fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
    fdct_round_shift(a * c1  +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`, where:

```
    #define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))
```
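
As an illustration only, the two expressions and the rounding macro can be sketched in Python as follows. The function names and the sample coefficient 11585 (roughly `2**14 / sqrt(2)`) are assumptions made for this sketch and are not part of the proposal; Python integers are unbounded, so no register-width effects are modelled here.

```python
DCT_CONST_BITS = 14  # shift amount quoted in the text above


def round_power_of_two(value, n):
    """Python rendering of ROUND_POWER_OF_TWO: add half an LSB, then shift."""
    return (value + (1 << (n - 1))) >> n


def fdct_round_shift(x):
    """ROUND_POWER_OF_TWO(x, 14), as defined in the text."""
    return round_power_of_two(x, DCT_CONST_BITS)


def butterfly_1coeff(a, b, c):
    """Both signs of fdct_round_shift((a +/- b) * c) (hypothetical helper)."""
    return fdct_round_shift((a + b) * c), fdct_round_shift((a - b) * c)


def butterfly_2coeff(a, b, c1, c2):
    """Both signs of fdct_round_shift(a * c1 +/- b * c2) (hypothetical helper)."""
    return (fdct_round_shift(a * c1 + b * c2),
            fdct_round_shift(a * c1 - b * c2))


# Sample coefficient, assumed for illustration: about 2**14 / sqrt(2).
C = 11585
plus, minus = butterfly_1coeff(100, 50, C)
```

With equal coefficients the double-coefficient form reduces to the single-coefficient one, which gives a quick sanity check on both helpers.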

These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including, but not limited to, VP8/VP9 and AV1.
Arm includes special instructions to optimize these operations, although they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.

The suggestion is to have a single instruction that calculates both values, `((a + b) * c) >> N` and `((a - b) * c) >> N`.
The instruction will run in accumulate mode, so to calculate the 2-coeff version one would just call the same instruction again with a and b in a different order and with a different constant c.
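
A minimal sketch of the "both values from one operation" idea, assuming 64-bit two's-complement registers and a plain arithmetic right shift with no rounding, exactly as the two expressions above are written. `twin_butterfly` and `to_signed` are hypothetical names introduced for this sketch; it does not attempt to reproduce the instruction pseudo-code.

```python
XLEN = 64                 # assumed register width for this sketch
MASK = (1 << XLEN) - 1


def to_signed(x):
    """Interpret the low XLEN bits of x as a two's-complement integer."""
    x &= MASK
    return x - (1 << XLEN) if x >> (XLEN - 1) else x


def twin_butterfly(a, b, c, n):
    """Return (((a + b) * c) >> n, ((a - b) * c) >> n) on XLEN-bit values."""
    total = to_signed(a + b)
    diff = to_signed(a - b)
    # Keep only the low XLEN bits of each product, then shift arithmetically.
    plus = to_signed(total * c) >> n
    minus = to_signed(diff * c) >> n
    return plus, minus
```

Swapping `a` and `b` leaves the sum output unchanged and negates the difference before the shift, which is the kind of operand reordering the paragraph above refers to.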

## Integer Butterfly Multiply Add/Sub FFT/DCT

**Add the following to Book I Section 3.3.9.1**

A-Form

```
    |0     |6     |11      |16     |21      |26    |31 |
    | PO   |  RT  |   RA   |   RB  |   SH   |   XO |/  |
```

* maddsubrs  RT,RA,SH,RB

Pseudo-code:

```
    n <- SH
    sum <- (RT) + (RA)
    diff <- (RT) - (RA)
    prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1]
    prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1]
    res1 <- ROTL64(prod1, XLEN-n)
    res2 <- ROTL64(prod2, XLEN-n)
    m <- MASK(n, (XLEN-1))
    signbit1 <- res1[0]
    signbit2 <- res2[0]
    smask1 <- ([signbit1]*XLEN) & ¬m
    smask2 <- ([signbit2]*XLEN) & ¬m
    s64_1 <- [0]*(XLEN-1) || signbit1
    s64_2 <- [0]*(XLEN-1) || signbit2
    RT <- (res1 & m | smask1) + s64_1
    RS <- (res2 & m | smask2) + s64_2
```

Special Registers Altered:

```
    None
```

# Twin Butterfly Integer DCT Instruction(s)

## Floating Twin Multiply-Add DCT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   |/  |
```

* fdmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPADD32(FRT, FRB)
    sub <- FPSUB32(FRT, FRB)
    FRT <- FPMUL32(FRA, sub)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```
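
The data flow of the `fdmadds` pseudo-code can be sketched in Python as below. This is an illustration under stated assumptions: each of `FPADD32`, `FPSUB32` and `FPMUL32` is modelled as a double-precision operation rounded once to binary32 through the `struct` module, inputs are assumed to already be binary32 values, and none of the status flags listed above are modelled. The helper names are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fdmadds(frt, fra, frb):
    """Return (new FRT, FRS), following the pseudo-code line by line."""
    frs = f32(frt + frb)        # FRS <- FPADD32(FRT, FRB)
    sub = f32(frt - frb)        # sub <- FPSUB32(FRT, FRB)
    new_frt = f32(fra * sub)    # FRT <- FPMUL32(FRA, sub)
    return new_frt, frs
```

Note that both the sum and the difference are taken from the original value of FRT, before it is overwritten with the product.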

## Floating Multiply-Add FFT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   |/  |
```

* ffmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
    FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```
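
A rough Python sketch of the `ffmadds` data flow follows. `FPMULADD32` is not defined in this section, so the sketch assumes that `FPMULADD32(a, c, b, mulsign, addsign)` computes `mulsign * (a * c) + addsign * b`; that reading is an assumption and should be checked against the helper's definition. The result is rounded to binary32 after a double-precision evaluation, which is not a bit-exact fused multiply-add, and no status flags are modelled.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fpmuladd32(a, c, b, mulsign, addsign):
    """ASSUMED semantics: mulsign * (a * c) + addsign * b, as binary32."""
    return f32(mulsign * (a * c) + addsign * b)


def ffmadds(frt, fra, frb):
    """Return (new FRT, FRS); both use the original FRT, FRA and FRB."""
    frs = fpmuladd32(frt, fra, frb, -1, 1)      # FRB - FRT * FRA (assumed)
    new_frt = fpmuladd32(frt, fra, frb, 1, 1)   # FRB + FRT * FRA (assumed)
    return new_frt, frs
```

Under that assumption the pair of results has the shape of a radix-2 FFT butterfly, `FRB +/- FRT * FRA`.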