openpower/sv/twin_butterfly.mdwn

* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]

# Twin Butterfly Integer DCT Instruction(s)

The goal is to implement instructions that calculate the expression

```
fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and

```
fdct_round_shift(a * c1 +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift(x)` is defined as `ROUND_POWER_OF_TWO(x, 14)`, where:

```
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))
```
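
The rounding behaviour of the macro can be checked with a small Python model. This is only an illustrative sketch: the function names simply mirror the C names, and it relies on Python's `>>` being an arithmetic (floor) shift for negative integers, which is an assumption about how the C code is expected to behave on signed values.

```python
def round_power_of_two(value: int, n: int) -> int:
    """Model of ROUND_POWER_OF_TWO: add half of 2**n, then shift right by n."""
    return (value + (1 << (n - 1))) >> n


def fdct_round_shift(x: int) -> int:
    """Model of fdct_round_shift: x / 2**14, rounded to the nearest integer."""
    return round_power_of_two(x, 14)


if __name__ == "__main__":
    # 8192 is exactly half of 2**14, so it rounds up to 1; 8191 rounds down to 0.
    for x in (8191, 8192, 16384, -8192, -8193):
        print(x, fdct_round_shift(x))
```

Exact halves round upwards (towards positive infinity) in this model, so `-8192` maps to `0` while `-8193` maps to `-1`.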

These instructions are at the core of all FDCT calculations in many major video codecs, including (but not limited to) VP8/VP9 and AV1.
Arm provides special instructions to optimize these operations, exposed through the `vqrdmulhq_s16`/`vqrdmulhq_s32` intrinsics, although they are limited in precision.

The suggestion is to have a single instruction that calculates both values, `((a + b) * c) >> N` and `((a - b) * c) >> N`.
The instruction will run in accumulate mode, so the two-coefficient version can be calculated by issuing the same instruction with `a` and `b` in a different order and a different constant `c`.
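
The intended arithmetic can be sketched as a small Python reference model. The name `twin_butterfly` and the default shift of 14 are assumptions made for illustration, rounding is applied as in `fdct_round_shift`, and the model uses unbounded Python integers, so it says nothing about register width or overflow.

```python
from typing import Tuple


def round_shift(value: int, n: int) -> int:
    """Rounding right shift: (value + 2**(n-1)) >> n."""
    return (value + (1 << (n - 1))) >> n


def twin_butterfly(a: int, b: int, c: int, n: int = 14) -> Tuple[int, int]:
    """Return both butterfly outputs for a single coefficient c.

    Hypothetical reference model, not the instruction itself: the first
    result is ((a + b) * c) >> n and the second is ((a - b) * c) >> n,
    each with rounding.
    """
    return round_shift((a + b) * c, n), round_shift((a - b) * c, n)


if __name__ == "__main__":
    # 11585 is approximately cos(pi/4) * 2**14.
    print(twin_butterfly(100, 50, 11585))  # (106, 35)
```

Returning the sum and difference results together mirrors the idea of one instruction producing two outputs.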

## [DRAFT] Integer Butterfly Multiply Add/Sub FFT/DCT

A-Form

* maddsubrs  RT,RA,SH,RB

Pseudo-code:

```
    n <- SH
    sum <- (RT) + (RA)
    diff <- (RT) - (RA)
    prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1]
    prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1]
    res1 <- ROTL64(prod1, XLEN-n)
    res2 <- ROTL64(prod2, XLEN-n)
    m <- MASK(n, (XLEN-1))
    signbit1 <- res1[0]
    signbit2 <- res2[0]
    smask1 <- ([signbit1]*XLEN) & ¬m
    smask2 <- ([signbit2]*XLEN) & ¬m
    s64_1 <- [0]*(XLEN-1) || signbit1
    s64_2 <- [0]*(XLEN-1) || signbit2
    RT <- (res1 & m | smask1) + s64_1
    RS <- (res2 & m | smask2) + s64_2
```

Special Registers Altered:

```
    None
```

This variant has been added to A-Form (defined in `fields.txt`):

```
# 1.6.17 A-FORM
    |0     |6     |11      |16     |21      |26    |31 |
    | PO   |  RT  |   RA   |   RB  |   SH   |   XO |Rc |
```

The instruction has been added to `minor_22.csv`:

```
------01000,ALU,OP_MADDSUBRS,RT,CONST_SH,RB,RT,NONE,CR0,0,0,ZERO,0,NONE,0,0,0,0,1,0,RC_ONLY,0,0,maddsubrs,A,,1,unofficial until submitted and approved/renumbered by the opf isa wg
```

# Twin Butterfly Floating-Point DCT/FFT Instruction(s)

## [DRAFT] Floating Twin Multiply-Add DCT [Single]

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   | Rc|
```

* fdmadds FRT,FRA,FRB (Rc=0)
* fdmadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRS <- FPADD32(FRT, FRB)
    sub <- FPSUB32(FRT, FRB)
    FRT <- FPMUL32(FRA, sub)
```
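
The data flow of the pseudo-code can be modelled in Python as a rough sketch. The helper names `to_f32` and `fdmadds_model` are hypothetical, the inputs are assumed to be values already representable in single precision, and the model ignores status flags, CR1 and overflow handling (`struct.pack` raises `OverflowError` for finite values that are too large for single precision).

```python
import struct
from typing import Tuple


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fdmadds_model(frt: float, fra: float, frb: float) -> Tuple[float, float]:
    """Hypothetical model of the pseudo-code above.

    Returns (new_FRT, FRS), rounding to single precision after every step.
    """
    frs = to_f32(frt + frb)      # FRS <- FPADD32(FRT, FRB)
    sub = to_f32(frt - frb)      # sub <- FPSUB32(FRT, FRB)
    new_frt = to_f32(fra * sub)  # FRT <- FPMUL32(FRA, sub)
    return new_frt, frs


if __name__ == "__main__":
    print(fdmadds_model(3.0, 0.5, 1.0))  # (1.0, 4.0)
```

Note that the sum is written to the implicit FRS register, while FRT is overwritten with the scaled difference.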

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1          (if Rc=1)
```

## [DRAFT] Floating Multiply-Add FFT [Single]

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   | Rc|
```

* ffmadds FRT,FRA,FRB (Rc=0)
* ffmadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
    FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
    CR1          (if Rc=1)
```