# Introduction

<!-- hide -->
* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]
<!-- show -->

# Rationale for Twin Butterfly Integer DCT Instruction(s)

DCT has a huge number of general-purpose uses. The number of
instructions needed in place of each Twin-Butterfly instruction
is also large (**eight**), and because explicitly loop-unrolling
them is extremely common, sequences of hundreds to thousands of
instructions are dismayingly common (for all ISAs).

The goal is to implement instructions that calculate the expression:

```
fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
fdct_round_shift(a * c1 +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`:

```
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))
```

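As a quick illustration, the two expressions can be modelled with plain Python integers. This is only a sketch: the names `DCT_CONST_BITS`, `butterfly_single` and `butterfly_double` are chosen here for illustration and are not part of the proposal, and Python's unbounded integers hide any overflow that a fixed-width register would see.

```python
DCT_CONST_BITS = 14  # shift used by fdct_round_shift (illustrative name)

def round_power_of_two(value, n):
    """Python transcription of the ROUND_POWER_OF_TWO macro (n >= 1)."""
    return (value + (1 << (n - 1))) >> n

def fdct_round_shift(x):
    return round_power_of_two(x, DCT_CONST_BITS)

def butterfly_single(a, b, c):
    """Single-coefficient form: fdct_round_shift((a +/- b) * c), both signs."""
    return fdct_round_shift((a + b) * c), fdct_round_shift((a - b) * c)

def butterfly_double(a, b, c1, c2):
    """Double-coefficient form: fdct_round_shift(a*c1 +/- b*c2), both signs."""
    return (fdct_round_shift(a * c1 + b * c2),
            fdct_round_shift(a * c1 - b * c2))

# 11585 is cos(pi/4) scaled by 2**14 and rounded to an integer
print(butterfly_single(100, 28, 11585))  # (91, 51)
```

With exact arithmetic the two results would be about 90.51 and 50.91; the rounding shift returns the nearest integers.
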
These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including, but not limited to, VP8/VP9 and AV1.
Arm includes special instructions to optimize these operations (exposed as the `vqrdmulhq_s16`/`vqrdmulhq_s32` intrinsics), although they are limited in precision.

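For comparison, the per-lane behaviour of `vqrdmulhq_s16` can be sketched as below. This is a scalar model written for illustration (the function name is made up here); it follows the usual description of the operation as a saturating, rounding, doubling multiply that returns the high half. The shift is fixed by the lane width rather than selectable, which is one way to read the precision limitation.

```python
def vqrdmulh_s16_lane(a, b):
    """Scalar model of one 16-bit lane: saturate((2*a*b + 2**15) >> 16)."""
    result = (2 * a * b + (1 << 15)) >> 16
    # Saturate to the signed 16-bit range
    return max(-32768, min(32767, result))

print(vqrdmulh_s16_lane(16384, 16384))    # 8192  (0.5 * 0.5 = 0.25 in Q15)
print(vqrdmulh_s16_lane(-32768, -32768))  # 32767 (saturated)
```
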
The suggestion is to have a single instruction that calculates both values, `((a + b) * c) >> N` and `((a - b) * c) >> N`.
The instruction will run in accumulate mode, so to calculate the 2-coefficient version one would just call the same instruction again with `a` and `b` in a different order and with a different constant `c`.

## Integer Butterfly Multiply Add/Sub FFT/DCT

**Add the following to Book I Section 3.3.9.1**

A-Form

```
|0     |6     |11    |16    |21    |26    |31 |
| PO   |  RT  |  RA  |  RB  |  SH  |  XO  |/  |
```

* maddsubrs RT,RA,SH,RB

Pseudo-code:

```
n <- SH
sum <- (RT) + (RA)
diff <- (RT) - (RA)
prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1]
prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1]
res1 <- ROTL64(prod1, XLEN-n)
res2 <- ROTL64(prod2, XLEN-n)
m <- MASK(n, (XLEN-1))
signbit1 <- res1[0]
signbit2 <- res2[0]
smask1 <- ([signbit1]*XLEN) & ¬m
smask2 <- ([signbit2]*XLEN) & ¬m
s64_1 <- [0]*(XLEN-1) || signbit1
s64_2 <- [0]*(XLEN-1) || signbit2
RT <- (res1 & m | smask1) + s64_1
RS <- (res2 & m | smask2) + s64_2
```

Special Registers Altered:

```
None
```

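The arithmetic that `maddsubrs` is meant to deliver, per the rationale (`((a + b) * c) >> N` and `((a - b) * c) >> N` with `fdct_round_shift`-style rounding), can be sketched in Python as follows. This is a model of the intended result under stated assumptions, not a bit-exact transcription of the pseudo-code: the helper names are hypothetical, operands are treated as signed XLEN-bit values, and only the low XLEN bits of each product are kept before the rounding shift.

```python
XLEN = 64

def to_signed(value, bits=XLEN):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def rounding_shift(value, n):
    """Arithmetic shift right by n, rounding half up (no rounding if n == 0)."""
    rounding = (1 << (n - 1)) if n > 0 else 0
    return (value + rounding) >> n

def maddsubrs_model(rt, ra, rb, sh):
    """Return the (RT, RS) result pair as signed XLEN-bit integers."""
    a, b, c = to_signed(rt), to_signed(ra), to_signed(rb)
    # Keep only the low XLEN bits of each product
    prod1 = to_signed(c * (a + b))
    prod2 = to_signed(c * (a - b))
    return (to_signed(rounding_shift(prod1, sh)),
            to_signed(rounding_shift(prod2, sh)))

print(maddsubrs_model(100, 28, 11585, 14))  # (91, 51)
```

The first element of the pair corresponds to the value written back to RT and the second to the implicit RS.
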
# Twin Butterfly Integer DCT Instruction(s)

## Floating Twin Multiply-Add DCT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21    |31 |
| PO   | FRT  | FRA  | FRB  | XO   |/  |
```

* fdmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPADD32(FRT, FRB)
sub <- FPSUB32(FRT, FRB)
FRT <- FPMUL32(FRA, sub)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```

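The three pseudo-code steps of `fdmadds` can be mimicked in Python by rounding every intermediate to single precision. This is a sketch only: `f32` and `fdmadds_model` are names invented for illustration, the `struct` round-trip stands in for the `FPADD32`/`FPSUB32`/`FPMUL32` helpers, and none of the status flags listed under "Special Registers Altered" are modelled.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fdmadds_model(frt, fra, frb):
    """Return (FRT, FRS) following the three pseudo-code steps."""
    frt, fra, frb = f32(frt), f32(fra), f32(frb)
    frs = f32(frt + frb)      # FRS <- FPADD32(FRT, FRB)
    sub = f32(frt - frb)      # sub <- FPSUB32(FRT, FRB)
    new_frt = f32(fra * sub)  # FRT <- FPMUL32(FRA, sub)
    return new_frt, frs

print(fdmadds_model(3.0, 0.5, 1.0))  # (1.0, 4.0)
```
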
## Floating Multiply-Add FFT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21    |31 |
| PO   | FRT  | FRA  | FRB  | XO   |/  |
```

* ffmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```
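
A rough Python model of `ffmadds` is sketched below. It assumes that the last two arguments of `FPMULADD32` are the signs applied to the product and to the addend respectively, as in the existing Power ISA fused multiply-add pseudo-code; that reading is an assumption, not something stated on this page. The names `f32` and `ffmadds_model` are hypothetical, status flags are not modelled, and the sum is rounded first to binary64 and then to binary32, so in rare cases the last bit may differ from a true single-rounding fused operation.

```python
import struct

def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def ffmadds_model(frt, fra, frb):
    """Return (FRT, FRS) for the assumed meaning of FPMULADD32."""
    frt, fra, frb = f32(frt), f32(fra), f32(frb)
    prod = frt * fra           # exact in binary64 for binary32 inputs
    frs = f32(-prod + frb)     # FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
    new_frt = f32(prod + frb)  # FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
    return new_frt, frs

print(ffmadds_model(3.0, 0.5, 1.0))  # (2.5, -0.5)
```
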