# Introduction

<!-- hide -->
* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]
<!-- show -->

# Twin Butterfly Integer DCT Instruction(s)

The goal is to implement instructions that calculate the expression:

```
fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
fdct_round_shift(a * c1 +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift(x)` is defined as `ROUND_POWER_OF_TWO(x, 14)`, where:

```
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))
```
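
A minimal C sketch of the macro in use. It assumes 64-bit intermediates (`int64_t`) so that the product feeding the shift does not overflow, and an arithmetic right shift for negative values, which C leaves implementation-defined but GCC and Clang provide. `fdct_round_shift` here is written as a plain function for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))

/* fdct_round_shift: divide by 2^14 and round to nearest, with exact halves
   rounding towards +infinity. Assumes >> on a negative int64_t is an
   arithmetic shift (true on GCC and Clang). */
static int64_t fdct_round_shift(int64_t x) {
    return ROUND_POWER_OF_TWO(x, 14);
}
```

For example, `fdct_round_shift(8191)` is 0, `fdct_round_shift(8192)` is 1, `fdct_round_shift(-8192)` is 0 and `fdct_round_shift(-8193)` is -1, so exact halves are not rounded symmetrically around zero.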

These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including, but not limited to, VP8/VP9 and AV1.
Arm includes special instructions to optimize these operations, `vqrdmulhq_s16`/`vqrdmulhq_s32`, although they are limited in precision.

The suggestion is to have a single instruction that calculates both values, `((a + b) * c) >> N` and `((a - b) * c) >> N`, each rounded to nearest in the same way as `ROUND_POWER_OF_TWO`.
The instruction will run in accumulate mode, so to calculate the double-coefficient version one would just call the same instruction again with a and b in a different order and a different constant c.
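
The two butterfly forms from the goal statement can be written out as a plain C reference model. This is a hypothetical sketch, not code taken from any codec: `butterfly_single` returns both the sum and difference results from one call, which is what the proposed twin-output instruction produces, and `butterfly_double` evaluates the double-coefficient expression directly. The function names, the 64-bit intermediates and the example coefficient 11585 (about 2^14 * cos(pi/4)) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))

/* Hypothetical single-coefficient butterfly:
   *sum_out  = fdct_round_shift((a + b) * c)
   *diff_out = fdct_round_shift((a - b) * c)                               */
static void butterfly_single(int64_t a, int64_t b, int64_t c,
                             int64_t *sum_out, int64_t *diff_out) {
    *sum_out = ROUND_POWER_OF_TWO((a + b) * c, 14);
    *diff_out = ROUND_POWER_OF_TWO((a - b) * c, 14);
}

/* Hypothetical double-coefficient butterfly:
   *add_out = fdct_round_shift(a * c1 + b * c2)
   *sub_out = fdct_round_shift(a * c1 - b * c2)                            */
static void butterfly_double(int64_t a, int64_t b, int64_t c1, int64_t c2,
                             int64_t *add_out, int64_t *sub_out) {
    *add_out = ROUND_POWER_OF_TWO(a * c1 + b * c2, 14);
    *sub_out = ROUND_POWER_OF_TWO(a * c1 - b * c2, 14);
}
```

With a = 100, b = 50 and c = 11585, `butterfly_single` gives 106 and 35. With equal coefficients the double form reduces to the single one, since `a*c + b*c == (a + b)*c`.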

## [DRAFT] Integer Butterfly Multiply Add/Sub FFT/DCT

A-Form

* maddsubrs RT,RA,SH,RB

Pseudo-code:

```
    n <- SH
    sum <- (RT) + (RA)
    diff <- (RT) - (RA)
    prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1]
    prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1]
    if n = 0 then
        RT <- prod1
        RS <- prod2
    else
        res1 <- ROTL64(prod1, XLEN-n)
        res2 <- ROTL64(prod2, XLEN-n)
        m <- MASK(n, (XLEN-1))
        signbit1 <- prod1[0]
        signbit2 <- prod2[0]
        rbit1 <- res1[0]
        rbit2 <- res2[0]
        smask1 <- ([signbit1]*XLEN) & ¬m
        smask2 <- ([signbit2]*XLEN) & ¬m
        rnd1 <- [0]*(XLEN-1) || rbit1
        rnd2 <- [0]*(XLEN-1) || rbit2
        RT <- (res1 & m | smask1) + rnd1
        RS <- (res2 & m | smask2) + rnd2
```
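
To check the intended semantics, here is a hypothetical C model for XLEN=64 (`rotl64`, `round_shift` and `maddsubrs_model` are illustrative names, not part of any existing library). It uses the same building blocks as the pseudo-code: keep the low 64 bits of each product, rotate right by `n`, mask, sign-extend from the product's sign bit, and add the last bit shifted out. Adding that bit after the shift gives the same result as adding `1 << (n-1)` before it, so the output matches `ROUND_POWER_OF_TWO`. Converting an out-of-range `uint64_t` back to `int64_t` is implementation-defined in C; GCC and Clang wrap modulo 2^64, which this sketch relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 64-bit value left by r bits (0 <= r < 64). */
static uint64_t rotl64(uint64_t x, unsigned r) {
    return r ? (x << r) | (x >> (64 - r)) : x;
}

/* Round-shift the low 64 bits of a product right by n (0 <= n < 64). */
static int64_t round_shift(uint64_t prod, unsigned n) {
    if (n == 0)
        return (int64_t)prod;              /* nothing to shift or round */
    uint64_t res = rotl64(prod, 64 - n);   /* ROTL64(prod, XLEN-n): rotate right by n */
    uint64_t m = UINT64_MAX >> n;          /* MASK(n, XLEN-1): the low 64-n bits */
    uint64_t sign = prod >> 63;            /* prod[0] in MSB0: sign of the product */
    uint64_t smask = sign ? ~m : 0;        /* fill the vacated top n bits with the sign */
    uint64_t rbit = res >> 63;             /* res[0]: bit n-1 of prod, the rounding bit */
    return (int64_t)(((res & m) | smask) + rbit);
}

/* Hypothetical model of maddsubrs RT,RA,SH,RB: *rt_out receives the rounded
   (RT+RA)*RB result and *rs_out the rounded (RT-RA)*RB result. */
static void maddsubrs_model(int64_t rt, int64_t ra, int64_t rb, unsigned sh,
                            int64_t *rt_out, int64_t *rs_out) {
    /* Unsigned arithmetic yields the wrapping low half of MULS without UB. */
    uint64_t sum = (uint64_t)rt + (uint64_t)ra;
    uint64_t diff = (uint64_t)rt - (uint64_t)ra;
    *rt_out = round_shift((uint64_t)rb * sum, sh);
    *rs_out = round_shift((uint64_t)rb * diff, sh);
}
```

For example, `maddsubrs_model(50, 100, 11585, 14, ...)` gives 106 for RT and -35 for RS. Results wrap if `(RT +/- RA) * RB` does not fit in 64 bits, matching the `[XLEN:(XLEN*2)-1]` slice of the product.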

Special Registers Altered:

```
    None
```

This variant of A-Form, with the SH field in bits 21-25, has been added to `fields.txt`:

```
# # 1.6.17 A-FORM
    |0     |6     |11      |16     |21      |26    |31 |
    | PO   |  RT  |   RA   |   RB  |   SH   |   XO |Rc |
```
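
Power ISA numbers bits from the most-significant end (bit 0 is the MSB), so a field spanning bits `s` to `e` of the 32-bit word sits at shift `31 - e`. The hypothetical helpers below (`field` and `encode_a_form_sh` are illustrative names) pack and unpack this A-Form variant; the PO and XO values passed in are placeholders, not assigned encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Extract MSB0 bits start..end (inclusive) from a 32-bit instruction word.
   The field width must be below 32. */
static uint32_t field(uint32_t insn, unsigned start, unsigned end) {
    unsigned width = end - start + 1;
    return (insn >> (31 - end)) & ((1u << width) - 1);
}

/* Hypothetical packer for the SH variant of A-Form:
   |0 PO |6 RT |11 RA |16 RB |21 SH |26 XO |31 Rc| */
static uint32_t encode_a_form_sh(uint32_t po, uint32_t rt, uint32_t ra,
                                 uint32_t rb, uint32_t sh, uint32_t xo,
                                 uint32_t rc) {
    return (po & 0x3f) << 26 |   /* bits 0-5   */
           (rt & 0x1f) << 21 |   /* bits 6-10  */
           (ra & 0x1f) << 16 |   /* bits 11-15 */
           (rb & 0x1f) << 11 |   /* bits 16-20 */
           (sh & 0x1f) << 6  |   /* bits 21-25 */
           (xo & 0x1f) << 1  |   /* bits 26-30 */
           (rc & 0x1);           /* bit 31     */
}
```

For example, packing PO=22, RT=3, RA=4, RB=5, SH=14, XO=8, Rc=0 yields `0x58642B90`, and `field()` recovers each value; these numbers are arbitrary test inputs.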

The instruction has been added to `minor_22.csv`:

```
------01000,ALU,OP_MADDSUBRS,RT,CONST_SH,RB,RT,NONE,CR0,0,0,ZERO,0,NONE,0,0,0,0,1,0,RC_ONLY,0,0,maddsubrs,A,,1,unofficial until submitted and approved/renumbered by the opf isa wg
```

# Twin Butterfly Floating DCT and FFT Instruction(s)

## [DRAFT] Floating Twin Multiply-Add DCT [Single]

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   | Rc|
```

* fdmadds FRT,FRA,FRB (Rc=0)
* fdmadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRS <- FPADD32(FRT, FRB)
    sub <- FPSUB32(FRT, FRB)
    FRT <- FPMUL32(FRA, sub)
```
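
A minimal C sketch of the `fdmadds` data flow, assuming `FPADD32`, `FPSUB32` and `FPMUL32` behave like ordinary IEEE-754 single-precision add, subtract and multiply. `fdmadds_model` is a hypothetical name; it models values only, and FPSCR flags and CR1 are not modelled. Both outputs are computed from the original FRT, so the old value has to be read before FRT is overwritten.

```c
#include <assert.h>

/* Hypothetical model of fdmadds FRT,FRA,FRB.
   Inputs are the old register values; outputs are the new FRT and FRS. */
static void fdmadds_model(float frt, float fra, float frb,
                          float *frt_out, float *frs_out) {
    float sum = frt + frb;    /* FRS <- FPADD32(FRT, FRB) */
    float sub = frt - frb;    /* sub <- FPSUB32(FRT, FRB) */
    *frs_out = sum;
    *frt_out = fra * sub;     /* FRT <- FPMUL32(FRA, sub), using the old FRT */
}
```

With FRT=3.0, FRA=0.5 and FRB=1.0 the model gives FRS=4.0 and FRT=1.0.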

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1          (if Rc=1)
```

## [DRAFT] Floating Multiply-Add FFT [Single]

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   | Rc|
```

* ffmadds FRT,FRA,FRB (Rc=0)
* ffmadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
    FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```
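
A sketch of the `ffmadds` results under an assumption: that `FPMULADD32(a, c, b, mulsign, addsign)` returns `mulsign*(a*c) + addsign*b` with a single rounding, as in the fused multiply-add family. Under that reading FRS receives `FRB - FRT*FRA` and FRT receives `FRT*FRA + FRB`, the two halves of a radix-2 FFT butterfly. The hypothetical `fpmuladd32_approx` forms the product exactly in `double` (a 24-bit by 24-bit significand product fits in 53 bits) and then rounds the sum to `float`; that double rounding can, in rare cases, differ in the last bit from a true fused result, so a bit-exact model would use `fmaf` from `<math.h>` instead.

```c
#include <assert.h>

/* Assumed semantics of FPMULADD32(a, c, b, mulsign, addsign):
   mulsign*(a*c) + addsign*b, rounded to single precision. */
static float fpmuladd32_approx(float a, float c, float b,
                               int mulsign, int addsign) {
    double prod = (double)a * (double)c;   /* exact: the product fits in a double */
    return (float)(mulsign * prod + addsign * (double)b);
}

/* Hypothetical model of ffmadds FRT,FRA,FRB: both results use the old FRT. */
static void ffmadds_model(float frt, float fra, float frb,
                          float *frt_out, float *frs_out) {
    *frs_out = fpmuladd32_approx(frt, fra, frb, -1, 1);  /* FRB - FRT*FRA */
    *frt_out = fpmuladd32_approx(frt, fra, frb, 1, 1);   /* FRT*FRA + FRB */
}
```

With FRT=2.0, FRA=3.0 and FRB=1.0 this gives FRS=-5.0 and FRT=7.0.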

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1          (if Rc=1)
```