# Introduction

<!-- hide -->
* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]
* [[openpower/sv/rfc/ls016]]

<!-- show -->

# Rationale for Twin Butterfly Integer DCT Instruction(s)

The number of general-purpose uses for DCT is huge. The
number of instructions needed in place of one of these Twin-Butterfly
instructions is also high (**eight**), and because it is
extremely common to explicitly loop-unroll them, sequences of
hundreds to thousands of instructions are dismayingly common
(for all ISAs).

The goal is to implement instructions that calculate the expression:

```
    fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
    fdct_round_shift(a * c1  +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`, where:

```
    #define ROUND_POWER_OF_TWO(value, n) \
            (((value) + (1 << ((n)-1))) >> (n))
```
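
The two target expressions can be sketched in Python as a quick reference model. This is only an illustration under stated assumptions: it uses unbounded Python integers rather than fixed-width registers, and the names `DCT_CONST_BITS`, `butterfly_1coeff` and `butterfly_2coeff` are hypothetical, not taken from any codec or from this specification.

```python
DCT_CONST_BITS = 14  # shift amount quoted in the text; the name is hypothetical


def round_power_of_two(value, n):
    """Add half of 2**n, then shift right arithmetically by n (requires n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return (value + (1 << (n - 1))) >> n


def fdct_round_shift(x):
    """ROUND_POWER_OF_TWO(x, 14), as defined in the text."""
    return round_power_of_two(x, DCT_CONST_BITS)


def butterfly_1coeff(a, b, c):
    """Return the '+' and '-' variants of fdct_round_shift((a +/- b) * c)."""
    return fdct_round_shift((a + b) * c), fdct_round_shift((a - b) * c)


def butterfly_2coeff(a, b, c1, c2):
    """Return the '+' and '-' variants of fdct_round_shift(a*c1 +/- b*c2)."""
    return (fdct_round_shift(a * c1 + b * c2),
            fdct_round_shift(a * c1 - b * c2))
```

For example, with the arbitrary 14-bit fixed-point coefficient `c = 11585`, `butterfly_1coeff(100, 28, 11585)` returns `(91, 51)`.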

These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including, but not limited to, VP8/VP9 and AV1.
Arm includes special instructions to optimize these operations, although they are limited in precision; they are exposed through the NEON intrinsics `vqrdmulhq_s16`/`vqrdmulhq_s32`.

The suggestion is to have a single instruction that calculates both values, `((a + b) * c) >> N` and `((a - b) * c) >> N`.
The instruction will run in accumulate mode, so to calculate the 2-coefficient version one only has to call the same instruction again with `a` and `b` in a different order and a different constant `c`.

## Integer Butterfly Multiply Add/Sub FFT/DCT

**Add the following to Book I Section 3.3.9.1**

A-Form

```
    |0     |6     |11      |16     |21      |26    |31 |
    | PO   |  RT  |   RA   |   RB  |   SH   |   XO |Rc |
```

* maddsubrs  RT,RA,SH,RB

Pseudo-code:

```
    n <- SH
    sum <- (RT) + (RA)
    diff <- (RT) - (RA)
    prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1]
    prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1]
    res1 <- ROTL64(prod1, XLEN-n)
    res2 <- ROTL64(prod2, XLEN-n)
    m <- MASK(n, (XLEN-1))
    signbit1 <- res1[0]
    signbit2 <- res2[0]
    smask1 <- ([signbit1]*XLEN) & ¬m
    smask2 <- ([signbit2]*XLEN) & ¬m
    s64_1 <- [0]*(XLEN-1) || signbit1
    s64_2 <- [0]*(XLEN-1) || signbit2
    RT <- (res1 & m | smask1) + s64_1
    RS <- (res2 & m | smask2) + s64_2
```

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `RTp`, this instruction produces an implicit result,
`RS`, which under Scalar circumstances is defined as `RT+1`.
For SVP64 if `RT` is a Vector, `RS` begins immediately after the
Vector `RT` where the length of `RT` is set by `SVSTATE.MAXVL`
(Max Vector Length).
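
The placement rule for the implicit `RS` result can be illustrated with a small helper. This is a sketch only: the function name and parameters are hypothetical, register numbers are treated as plain integers, and SVP64 features such as element-width overrides or register-number range checks are deliberately ignored.

```python
def implicit_rs(rt, maxvl, rt_is_vector):
    """Return the register number at which the implicit RS result starts.

    rt           -- register number of RT (plain integer, an assumption)
    maxvl        -- value of SVSTATE.MAXVL, supplied by the caller
    rt_is_vector -- True when RT is marked as a Vector under SVP64
    """
    if rt_is_vector:
        # RS begins immediately after the RT vector, whose length is MAXVL.
        return rt + maxvl
    # Scalar case: RS is defined as RT+1.
    return rt + 1
```

With `RT=4` and `MAXVL=8`, the scalar case places `RS` in register 5, while the vector case places the first `RS` element in register 12.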

Special Registers Altered:

```
    None
```

# Twin Butterfly Integer DCT Instruction(s)

## Floating Twin Multiply-Add DCT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   |Rc |
```

* fdmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPADD32(FRT, FRB)
    FRT <- FPMULADD32(FRT, FRA, FRB, 1, -1)
```

The floating-point operand in register FRT is added to the floating-point
operand in register FRB and the result stored in FRS.

Using the exact same operand input register values from FRT and FRB that
were used to create FRS, the floating-point operand in register FRB
is subtracted from the floating-point operand in register FRT and the
result then multiplied by FRA to create an intermediate result that is
stored in FRT.

The add into FRS is treated exactly as `fadd`.  The creation
of the result FRT is exactly that of `fmsub`.  FRS and FRT are created by
parallel independent operations which occur at the same time.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```

## Floating Multiply-Add FFT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   |Rc |
```

* ffmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
    FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```

The two operations

```
    FRS <- -([(FRT) * (FRA)] - (FRB))
    FRT <-   [(FRT) * (FRA)] + (FRB)
```

are performed.

The floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The
floating-point operand in register FRB is subtracted from
this intermediate result, and the negated result is stored in FRS.

Using the exact same values of FRT, FRA and FRB as used to create FRS,
the floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The
floating-point operand in register FRB is added to
this intermediate result, and the result is stored in FRT.

FRT is created as if
an `fmadds` operation had been performed. FRS is created as if
an `fnmsubs` operation had simultaneously been performed with
the exact same register operands, in parallel, independently,
at exactly the same time.

FRT is a Read-Modify-Write operand.
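
A minimal Python sketch of the value computation follows. It takes the pseudo-code and the "as if `fmadds`" / "as if `fnmsubs`" statement as its reference. Several things are assumptions rather than part of this specification: the helper `to_single` approximates single-precision rounding through Python's `struct` module, the multiply and add are done in binary64 before that rounding (so results may not be bit-exact with a true fused operation), and NaN handling, overflow and FPSCR updates are not modelled.

```python
import struct


def to_single(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ffmadds(frt, fra, frb):
    """Return (new_FRT, FRS), both computed from the same three input values."""
    prod = frt * fra                 # shared intermediate product
    new_frt = to_single(prod + frb)  # as if fmadds:   (FRT * FRA) + FRB
    frs = to_single(-(prod - frb))   # as if fnmsubs: -((FRT * FRA) - FRB)
    return new_frt, frs
```

Both results are derived from the original `FRT` value, which is why the function returns the new `FRT` instead of overwriting its input before `FRS` has been computed.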

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```

## [DRAFT] Floating Add FFT/DCT [Single]

A-Form

* ffadds FRT,FRA,FRB (Rc=0)
* ffadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPADD32(FRA, FRB)
    FRS <- FPSUB32(FRB, FRA)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1          (if Rc=1)
```

## [DRAFT] Floating Add FFT/DCT [Double]

A-Form

* ffadd FRT,FRA,FRB (Rc=0)
* ffadd. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPADD64(FRA, FRB)
    FRS <- FPSUB64(FRB, FRA)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1          (if Rc=1)
```

## [DRAFT] Floating Subtract FFT/DCT [Single]

A-Form

* ffsubs FRT,FRA,FRB (Rc=0)
* ffsubs. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPSUB32(FRB, FRA)
    FRS <- FPADD32(FRA, FRB)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1          (if Rc=1)
```

## [DRAFT] Floating Subtract FFT/DCT [Double]

A-Form

* ffsub FRT,FRA,FRB (Rc=0)
* ffsub. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPSUB64(FRB, FRA)
    FRS <- FPADD64(FRA, FRB)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1          (if Rc=1)
```
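
Going by the pseudo-code of the four DRAFT add/subtract forms, each one produces the same pair of values, `FRA + FRB` and `FRB - FRA`, and they differ only in precision and in which value lands in `FRT` versus `FRS`. The sketch below models just that value computation. It assumes Python floats stand in for double precision and that the hypothetical helper `to_single` is an adequate stand-in for single-precision rounding. FPSCR flags, CR1 (Rc=1) and NaN behaviour are not modelled.

```python
import struct


def to_single(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ffadd(fra, frb):
    """Double form: returns (FRT, FRS) = (FRA + FRB, FRB - FRA)."""
    return fra + frb, frb - fra


def ffsub(fra, frb):
    """Double form: returns (FRT, FRS) = (FRB - FRA, FRA + FRB)."""
    return frb - fra, fra + frb


def ffadds(fra, frb):
    """Single form of ffadd: each result rounded to single precision."""
    frt, frs = ffadd(fra, frb)
    return to_single(frt), to_single(frs)


def ffsubs(fra, frb):
    """Single form of ffsub: each result rounded to single precision."""
    frt, frs = ffsub(fra, frb)
    return to_single(frt), to_single(frs)
```

For instance, `ffadd(1.5, 0.25)` returns `(1.75, -1.25)` and `ffsub(1.5, 0.25)` returns the same two values with the destinations swapped.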