# Introduction

<!-- hide -->
* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]
* [[openpower/sv/rfc/ls016]]

<!-- show -->

# Rationale for Twin Butterfly Integer DCT Instruction(s)

DCT has a huge number of general-purpose uses. The number of
instructions needed in place of these Twin-Butterfly instructions
is also large (**eight**), and because it is extremely common to
explicitly loop-unroll them, sequences of hundreds to thousands of
instructions are dismayingly common (for all ISAs).

The goal is to implement instructions that calculate the expression:

```
fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
fdct_round_shift(a * c1 +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift(x)` is defined as `ROUND_POWER_OF_TWO(x, 14)`, where:

```
#define ROUND_POWER_OF_TWO(value, n) \
    (((value) + (1 << ((n)-1))) >> (n))
```

These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including but not limited to VP8/VP9 and AV1.
Arm provides special instructions to optimize these operations, although they are limited in precision; they are exposed through the NEON intrinsics `vqrdmulhq_s16`/`vqrdmulhq_s32`.

The proposal is a single instruction that calculates both values, `((a + b) * c) >> N` and `((a - b) * c) >> N`.
The instruction runs in accumulate mode, so the two-coefficient version can be calculated by calling the same instruction again with `a` and `b` in a different order and a different constant `c`.

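As a rough illustration of the arithmetic described above, the following Python sketch models `ROUND_POWER_OF_TWO`, `fdct_round_shift` and the pair of single-coefficient butterfly results. The names `round_power_of_two` and `butterfly`, and the sample coefficient 11585 (approximately cos(pi/4) scaled by 2^14), are assumptions chosen for illustration only. Python integers are unbounded, so register width and overflow behaviour are not modelled here.

```python
DCT_CONST_BITS = 14  # shift amount used by fdct_round_shift in the text


def round_power_of_two(value: int, n: int) -> int:
    """Add half of 2**n, then shift right by n (round to nearest)."""
    return (value + (1 << (n - 1))) >> n


def fdct_round_shift(value: int) -> int:
    """ROUND_POWER_OF_TWO(value, 14), as defined in the text."""
    return round_power_of_two(value, DCT_CONST_BITS)


def butterfly(a: int, b: int, c: int) -> tuple[int, int]:
    """Return both single-coefficient results for inputs a, b and constant c.

    Hypothetical helper: it models the two values the proposed
    instruction is intended to produce, not its register-level encoding.
    """
    return fdct_round_shift((a + b) * c), fdct_round_shift((a - b) * c)


if __name__ == "__main__":
    # 11585 / 2**14 is close to 0.7071 (illustrative coefficient)
    print(butterfly(100, 28, 11585))    # (91, 51)
    # with c == 2**14 the butterfly reduces to plain sum and difference
    print(butterfly(28, 100, 1 << 14))  # (128, -72)
```

Returning both values from one call mirrors the intent of producing the sum and difference products with a single instruction rather than two separate multiply-and-shift sequences.
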
## Integer Butterfly Multiply Add/Sub FFT/DCT

**Add the following to Book I Section 3.3.9.1**

A-Form

```
|0     |6     |11    |16    |21    |26    |31 |
| PO   |  RT  |  RA  |  RB  |  SH  |  XO  |Rc |
```

* maddsubrs RT,RA,SH,RB

Pseudo-code:

```
n <- SH
sum <- (RT) + (RA)
diff <- (RT) - (RA)
prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1]
prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1]
res1 <- ROTL64(prod1, XLEN-n)
res2 <- ROTL64(prod2, XLEN-n)
m <- MASK(n, (XLEN-1))
signbit1 <- res1[0]
signbit2 <- res2[0]
smask1 <- ([signbit1]*XLEN) & ¬m
smask2 <- ([signbit2]*XLEN) & ¬m
s64_1 <- [0]*(XLEN-1) || signbit1
s64_2 <- [0]*(XLEN-1) || signbit2
RT <- (res1 & m | smask1) + s64_1
RS <- (res2 & m | smask2) + s64_2
```

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `RTp`, this instruction produces an implicit result,
`RS`, which under Scalar circumstances is defined as `RT+1`.
For SVP64, if `RT` is a Vector, `RS` begins immediately after the
Vector `RT`, where the length of `RT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

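The placement rule for the implicit `RS` result can be sketched in a few lines of Python. The helper name `implicit_rs` and its parameters are hypothetical and exist only to restate the rule above; the sketch assumes register numbers are plain integers and does not check them against the size of the register file.

```python
def implicit_rs(rt: int, rt_is_vector: bool = False, maxvl: int = 1) -> int:
    """Return the register number where the implicit RS result starts.

    Scalar RT: RS is RT+1.
    Vector RT (SVP64): RS begins immediately after the Vector RT,
    whose length is given by MAXVL.
    """
    if rt_is_vector:
        return rt + maxvl
    return rt + 1


if __name__ == "__main__":
    print(implicit_rs(4))                              # scalar RT=4
    print(implicit_rs(8, rt_is_vector=True, maxvl=4))  # vector RT=8, MAXVL=4
```

The same rule applies to `FRS` relative to `FRT` in the floating-point instructions.
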
Special Registers Altered:

```
None
```

# Twin Butterfly Floating-Point DCT and FFT Instruction(s)

## Floating Twin Multiply-Add DCT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21    |31 |
| PO   |  FRT |  FRA |  FRB |  XO  |Rc |
```

* fdmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPADD32(FRT, FRB)
FRT <- FPMULADD32(FRT, FRA, FRB, 1, -1)
```

The floating-point operand in register FRT is added to the floating-point
operand in register FRB and the result stored in FRS.

Using the exact same operand input register values from FRT and FRB that
were used to create FRS, the floating-point operand in register FRB
is subtracted from the floating-point operand in register FRT and the
result then multiplied by FRA to create an intermediate result that is
stored in FRT.

The add into FRS is treated exactly as `fadd`. The creation
of the result FRT is exactly that of `fmsub`. The creation of FRS and of
FRT is treated as two parallel, independent operations which occur at the
same time.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64, if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT`, where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```

## Floating Multiply-Add FFT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21    |31 |
| PO   |  FRT |  FRA |  FRB |  XO  |Rc |
```

* ffmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```

The two operations

```
FRS <- -([(FRT) * (FRA)] - (FRB))
FRT <- [(FRT) * (FRA)] + (FRB)
```

are performed.

The floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The
floating-point operand in register FRB is subtracted from
this intermediate result, and the negated result is stored in FRS.

Using the exact same values of FRT, FRA and FRB as used to create FRS,
the floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The
floating-point operand in register FRB is added to
this intermediate result, and the result is stored in FRT.

FRT is created as if
an `fmadds` operation had been performed. FRS is created as if
an `fnmsubs` operation had simultaneously been performed with
the exact same register operands, in parallel, independently,
at exactly the same time.

FRT is a Read-Modify-Write operand: it is both a source and a destination.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64, if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT`, where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```

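The pair of results produced by `ffmadds` can be approximated in Python as follows. This is a sketch under stated assumptions: `fpmuladd32` is a hypothetical stand-in for the pseudo-code helper `FPMULADD32`, taken here to compute `mulsign * (a * c) + addsign * b`; rounding to single precision is emulated with `struct`; and the multiply-add is evaluated in double precision before that rounding, so in rare cases the last bit may differ from a true fused operation. FPSCR status flags are not modelled.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fpmuladd32(a: float, c: float, b: float, mulsign: int, addsign: int) -> float:
    """Assumed meaning of FPMULADD32: mulsign*(a*c) + addsign*b, in single."""
    return to_f32(mulsign * (a * c) + addsign * b)


def ffmadds(frt: float, fra: float, frb: float) -> tuple[float, float]:
    """Return (new FRT, FRS), both computed from the same input operands."""
    frs = fpmuladd32(frt, fra, frb, -1, 1)     # -(FRT*FRA - FRB), as fnmsubs
    new_frt = fpmuladd32(frt, fra, frb, 1, 1)  #   FRT*FRA + FRB,  as fmadds
    return new_frt, frs


if __name__ == "__main__":
    print(ffmadds(2.0, 3.0, 1.0))  # (7.0, -5.0)
```

Both results are derived from the original value of FRT before it is overwritten, which is why the sketch computes them from the same arguments and returns them together.
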
## [DRAFT] Floating Add FFT/DCT [Single]

A-Form

* ffadds FRT,FRA,FRB (Rc=0)
* ffadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPADD32(FRA, FRB)
FRS <- FPSUB32(FRB, FRA)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```

## [DRAFT] Floating Add FFT/DCT [Double]

A-Form

* ffadd FRT,FRA,FRB (Rc=0)
* ffadd. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPADD64(FRA, FRB)
FRS <- FPSUB64(FRB, FRA)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```

## [DRAFT] Floating Subtract FFT/DCT [Single]

A-Form

* ffsubs FRT,FRA,FRB (Rc=0)
* ffsubs. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPSUB32(FRB, FRA)
FRS <- FPADD32(FRA, FRB)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```

## [DRAFT] Floating Subtract FFT/DCT [Double]

A-Form

* ffsub FRT,FRA,FRB (Rc=0)
* ffsub. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPSUB64(FRB, FRA)
FRS <- FPADD64(FRA, FRB)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```
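
The four draft add/subtract instructions differ only in which result lands in FRT and which in the implicit FRS. A minimal Python sketch of that relationship follows. It assumes that `FPSUB32(x, y)` and `FPSUB64(x, y)` mean `x - y`; the function names and the `single` flag are hypothetical; and neither FPSCR flags nor the CR1 update for Rc=1 are modelled.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ffadd(fra: float, frb: float, single: bool = False) -> tuple[float, float]:
    """Model ffadd (single=False) or ffadds (single=True): return (FRT, FRS)."""
    rnd = to_f32 if single else float
    frt = rnd(fra + frb)  # FRT <- FPADD(FRA, FRB)
    frs = rnd(frb - fra)  # FRS <- FPSUB(FRB, FRA), assumed to be FRB - FRA
    return frt, frs


def ffsub(fra: float, frb: float, single: bool = False) -> tuple[float, float]:
    """Model ffsub (single=False) or ffsubs (single=True): return (FRT, FRS)."""
    rnd = to_f32 if single else float
    frt = rnd(frb - fra)  # FRT <- FPSUB(FRB, FRA), assumed to be FRB - FRA
    frs = rnd(fra + frb)  # FRS <- FPADD(FRA, FRB)
    return frt, frs


if __name__ == "__main__":
    print(ffadd(1.5, 4.0))               # (5.5, 2.5)
    print(ffsub(1.5, 4.0, single=True))  # (2.5, 5.5)
```

For the same inputs, `ffsub` returns the same two values as `ffadd` with the destinations swapped.
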