openpower/sv/twin_butterfly.mdwn

# Introduction

<!-- hide -->
* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]
* [[openpower/sv/rfc/ls016]]

<!-- show -->

# Rationale for Twin Butterfly Integer DCT Instruction(s)

DCT has a huge number of general-purpose uses. Replacing one
Twin-Butterfly instruction takes **eight** ordinary instructions,
and because it is extremely common to explicitly loop-unroll them,
sequences of hundreds to thousands of instructions are dismayingly
common (for all ISAs).

The goal is to implement instructions that calculate the expression:

```
    fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
    fdct_round_shift(a * c1  +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift(x)` is defined as `ROUND_POWER_OF_TWO(x, 14)`, where:

```
    #define ROUND_POWER_OF_TWO(value, n) \
            (((value) + (1 << ((n)-1))) >> (n))
```

These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including, but not limited to, VP8/VP9 and AV1.
Arm includes special instructions to optimize these operations (exposed as the `vqrdmulhq_s16`/`vqrdmulhq_s32` intrinsics), although they are limited in precision.

The suggestion is to have a single instruction that calculates both values, `((a + b) * c) >> N` and `((a - b) * c) >> N`.
The instruction will run in accumulate mode, so to calculate the 2-coeff version one only has to call the same instruction again with `a` and `b` in a different order and with a different constant `c`.

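As a rough illustration of the two target expressions, the following Python sketch models `fdct_round_shift` and both butterfly forms on plain integers. The helper names `butterfly_single` and `butterfly_double` and the example coefficient 11585 (approximately cos(pi/4) scaled by 2^14) are assumptions made for illustration and are not part of the proposal. Python's `>>` is an arithmetic shift on negative integers, which is assumed here to match the intended behaviour of the C macro.

```python
DCT_CONST_BITS = 14  # shift amount used by fdct_round_shift in the text


def round_power_of_two(value, n):
    """Python equivalent of the ROUND_POWER_OF_TWO(value, n) macro."""
    return (value + (1 << (n - 1))) >> n


def fdct_round_shift(x):
    return round_power_of_two(x, DCT_CONST_BITS)


def butterfly_single(a, b, c):
    """Single-coefficient form: both (a + b) * c and (a - b) * c, rounded."""
    return fdct_round_shift((a + b) * c), fdct_round_shift((a - b) * c)


def butterfly_double(a, b, c1, c2):
    """Double-coefficient form: a * c1 + b * c2 and a * c1 - b * c2, rounded."""
    return (fdct_round_shift(a * c1 + b * c2),
            fdct_round_shift(a * c1 - b * c2))


# Hypothetical example inputs; 11585 is roughly cos(pi/4) * 2**14.
print(butterfly_single(100, 50, 11585))
```

When `c1` and `c2` are equal, the double-coefficient form reduces to the single-coefficient one, which gives a quick sanity check for either helper.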
## Integer Butterfly Multiply Add/Sub FFT/DCT

**Add the following to Book I Section 3.3.9.1**

A-Form

```
    |0     |6     |11      |16     |21      |26    |31 |
    | PO   |  RT  |   RA   |   RB  |   SH   |   XO |Rc |
```

* maddsubrs  RT,RA,SH,RB

Pseudo-code:

```
    n <- SH
    sum <- (RT) + (RA)
    diff <- (RT) - (RA)
    prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1]
    prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1]
    res1 <- ROTL64(prod1, XLEN-n)
    res2 <- ROTL64(prod2, XLEN-n)
    m <- MASK(n, (XLEN-1))
    signbit1 <- res1[0]
    signbit2 <- res2[0]
    smask1 <- ([signbit1]*XLEN) & ¬m
    smask2 <- ([signbit2]*XLEN) & ¬m
    s64_1 <- [0]*(XLEN-1) || signbit1
    s64_2 <- [0]*(XLEN-1) || signbit2
    RT <- (res1 & m | smask1) + s64_1
    RS <- (res2 & m | smask2) + s64_2
```

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`

Similar to `RTp`, this instruction produces an implicit result,
`RS`, which under Scalar circumstances is defined as `RT+1`.
For SVP64 if `RT` is a Vector, `RS` begins immediately after the
Vector `RT` where the length of `RT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
    None
```

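The implicit `RS` rule described for `maddsubrs` can be summarised with a small Python sketch. The function name `implicit_rs` and its parameters are hypothetical, and the sketch deliberately ignores SVP64 register-number extension, predication and element-width overrides; it only restates the two cases given in the text.

```python
def implicit_rs(rt, rt_is_vector=False, maxvl=1):
    """Return the register number where the implicit RS result starts.

    Scalar:        RS is RT+1.
    SVP64 Vector:  RS begins immediately after the Vector RT, whose
                   length is taken from SVSTATE.MAXVL, i.e. RT+MAXVL.
    """
    if rt_is_vector:
        return rt + maxvl
    return rt + 1


# Hypothetical examples: scalar RT=4, then a Vector RT=8 with MAXVL=4.
print(implicit_rs(4))
print(implicit_rs(8, rt_is_vector=True, maxvl=4))
```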
# Twin Butterfly Integer DCT Instruction(s)

## Floating Twin Multiply-Add DCT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   |Rc |
```

* fdmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPADD32(FRT, FRB)
    FRT <- FPMULADD32(FRT, FRA, FRB, 1, -1)
```

The floating-point operand in register FRT is added to the floating-point
operand in register FRB and the result stored in FRS.

Using the exact same operand input register values from FRT and FRB that
were used to create FRS, the floating-point operand in register FRB
is subtracted from the floating-point operand in register FRT and the
result then multiplied by FRA to create an intermediate result that is
stored in FRT.

The add into FRS is treated exactly as `fadds`.  The creation
of the result FRT is exactly that of `fmsubs`.  FRS and FRT are created by
parallel, independent operations which occur at the same time.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```

## Floating Twin Multiply-Add FFT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   |Rc |
```

* ffmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
    FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```

The two operations

```
    FRS <- -([(FRT) * (FRA)] - (FRB))
    FRT <-   [(FRT) * (FRA)] + (FRB)
```

are performed.

The floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The
floating-point operand in register FRB is subtracted from
this intermediate result, and the negated result is stored in FRS.

Using the exact same values of FRT, FRA and FRB as used to create FRS,
the floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The
floating-point operand in register FRB is added to
this intermediate result, and the result is stored in FRT.

FRT is created as if
an `fmadds` operation had been performed. FRS is created as if
an `fnmsubs` operation had simultaneously been performed with
the exact same register operands, in parallel, independently,
at exactly the same time.

FRT is a Read-Modify-Write operation.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```

## Floating Twin Multiply-Add DCT

**Add the following to Book I Section 4.6.6.3**

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   |Rc |
```

* fdmadd FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPADD64(FRT, FRB)
    FRT <- FPMULADD64(FRT, FRA, FRB, 1, -1)
```

The floating-point operand in register FRT is added to the floating-point
operand in register FRB and the result stored in FRS.

Using the exact same operand input register values from FRT and FRB that
were used to create FRS, the floating-point operand in register FRB
is subtracted from the floating-point operand in register FRT and the
result then multiplied by FRA to create an intermediate result that is
stored in FRT.

The add into FRS is treated exactly as `fadd`.  The creation
of the result FRT is exactly that of `fmsub`.  FRS and FRT are created by
parallel, independent operations which occur at the same time.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```

## Floating Twin Multiply-Add FFT

**Add the following to Book I Section 4.6.6.3**

X-Form

```
    |0     |6     |11      |16     |21      |31 |
    | PO   |  FRT |  FRA   |  FRB  |   XO   |Rc |
```

* ffmadd FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPMULADD64(FRT, FRA, FRB, -1, 1)
    FRT <- FPMULADD64(FRT, FRA, FRB, 1, 1)
```

The two operations

```
    FRS <- -([(FRT) * (FRA)] - (FRB))
    FRT <-   [(FRT) * (FRA)] + (FRB)
```

are performed.

The floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The
floating-point operand in register FRB is subtracted from
this intermediate result, and the negated result is stored in FRS.

Using the exact same values of FRT, FRA and FRB as used to create FRS,
the floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The
floating-point operand in register FRB is added to
this intermediate result, and the result is stored in FRT.

FRT is created as if
an `fmadd` operation had been performed. FRS is created as if
an `fnmsub` operation had simultaneously been performed with
the exact same register operands, in parallel, independently,
at exactly the same time.

FRT is a Read-Modify-Write operation.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```

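The double-precision `ffmadd` behaviour described above can be approximated in Python as follows. This is only a sketch under stated assumptions: both results are computed from the same input values with a single rounding each (as for a fused `fmadd`/`fnmsub`), inputs are finite, and FPSCR status bits, rounding modes other than round-to-nearest-even, NaNs, infinities, overflow and signed zeros are not modelled. The function name `ffmadd` and the use of `fractions.Fraction` for the exact intermediate product are choices made for this illustration, not the reference implementation.

```python
from fractions import Fraction


def ffmadd(frt, fra, frb):
    """Return (new_FRT, FRS) for finite double-precision inputs.

    new_FRT models  (FRT * FRA) + FRB      (as if by fmadd)
    FRS     models -((FRT * FRA) - FRB)    (as if by fnmsub)
    Both use the same original FRT, FRA and FRB values.
    """
    product = Fraction(frt) * Fraction(fra)  # exact, unrounded product
    addend = Fraction(frb)
    new_frt = float(product + addend)        # one rounding to double
    frs = float(-(product - addend))         # one rounding to double
    return new_frt, frs


# Hypothetical example inputs.
print(ffmadd(2.0, 3.0, 1.0))
```

Because the product is kept exact until the final conversion, low-order product bits that a separate multiply would round away still contribute to each result, which is the property a fused multiply-add provides.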
## [DRAFT] Floating Add FFT/DCT [Single]

A-Form

* ffadds FRT,FRA,FRB (Rc=0)
* ffadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPADD32(FRA, FRB)
    FRS <- FPSUB32(FRB, FRA)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1          (if Rc=1)
```

## [DRAFT] Floating Add FFT/DCT [Double]

A-Form

* ffadd FRT,FRA,FRB (Rc=0)
* ffadd. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPADD64(FRA, FRB)
    FRS <- FPSUB64(FRB, FRA)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1          (if Rc=1)
```

## [DRAFT] Floating Subtract FFT/DCT [Single]

A-Form

* ffsubs FRT,FRA,FRB (Rc=0)
* ffsubs. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPSUB32(FRB, FRA)
    FRS <- FPADD32(FRA, FRB)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1          (if Rc=1)
```

## [DRAFT] Floating Subtract FFT/DCT [Double]

A-Form

* ffsub FRT,FRA,FRB (Rc=0)
* ffsub. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPSUB64(FRB, FRA)
    FRS <- FPADD64(FRA, FRB)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1          (if Rc=1)
```
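
The two double-precision DRAFT instructions can be modelled directly with Python floats, which are IEEE 754 doubles. This sketch assumes that `FPSUB64(x, y)` means `x - y`, uses the default round-to-nearest mode, and does not model FPSCR or CR1 updates; the function names simply mirror the mnemonics and each returns the pair `(FRT, FRS)`.

```python
def ffadd(fra, frb):
    """Model of ffadd: FRT <- FRA + FRB, FRS <- FRB - FRA."""
    return fra + frb, frb - fra


def ffsub(fra, frb):
    """Model of ffsub: FRT <- FRB - FRA, FRS <- FRA + FRB."""
    return frb - fra, fra + frb


# Hypothetical example inputs.
print(ffadd(1.5, 4.0))
print(ffsub(1.5, 4.0))
```

Under these assumptions `ffsub` yields the same two values as `ffadd`, with the roles of `FRT` and `FRS` exchanged.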