# Introduction

<!-- hide -->
* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]
* [[openpower/sv/rfc/ls016]]

<!-- show -->

# Rationale for Twin Butterfly Integer DCT Instruction(s)

The number of general-purpose uses for DCT is huge. The number of
instructions needed in place of a single Twin-Butterfly instruction is
also large (**eight**), and because it is extremely common to explicitly
loop-unroll them, sequences of hundreds to thousands of instructions are
dismayingly common (for all ISAs).

The goal is to implement instructions that calculate the expression:

```
fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
fdct_round_shift(a * c1 +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift(x)` is defined as `ROUND_POWER_OF_TWO(x, 14)`, where:

```
#define ROUND_POWER_OF_TWO(value, n) \
    (((value) + (1 << ((n)-1))) >> (n))
```
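
As a quick check of the rounding behaviour, the macro above can be wrapped in a small function and tried on a few values. This is only a minimal sketch: the helper name `round_shift_14` is an assumption made for illustration and is not taken from libvpx or from this specification. It is restricted to non-negative inputs because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding macro as quoted in the text. */
#define ROUND_POWER_OF_TWO(value, n) \
    (((value) + (1 << ((n)-1))) >> (n))

/* Hypothetical helper mirroring fdct_round_shift (n fixed at 14).
 * Only non-negative inputs that cannot overflow are accepted here. */
int32_t round_shift_14(int32_t x)
{
    assert(x >= 0 && x <= INT32_MAX - 8192);
    return ROUND_POWER_OF_TWO(x, 14);
}
```

With this helper, `round_shift_14(8191)` gives 0 and `round_shift_14(8192)` gives 1: adding `1 << 13` before the shift makes an exact half round upwards.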

These instructions are at the core of **ALL** FDCT calculations in many
major video codecs, including, but not limited to, VP8/VP9 and AV1.
Arm includes special instructions to optimize these operations (exposed
through the NEON intrinsics `vqrdmulhq_s16`/`vqrdmulhq_s32`), although
they are limited in precision.

The suggestion is to have a single instruction that calculates both
values, `((a + b) * c) >> N` and `((a - b) * c) >> N`. The instruction
will run in accumulate mode, so in order to calculate the 2-coefficient
version one would just have to call the same instruction with a and b
in a different order and with a different constant c.

Example taken from libvpx
<https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/fwd_txfm.c#132>:

```
#include <stdint.h>
#define ROUND_POWER_OF_TWO(value, n) \
    (((value) + (1 << ((n)-1))) >> (n))
/* t[0] and t[1] receive the rounded sum and difference products */
void twin_int(int16_t *t, int16_t x0, int16_t x1, int16_t cospi_16_64) {
    t[0] = ROUND_POWER_OF_TWO((x0 + x1) * cospi_16_64, 14);
    t[1] = ROUND_POWER_OF_TWO((x0 - x1) * cospi_16_64, 14);
}
```

Eight instructions are required, replaced by just one (`maddsubrs`):

```
add 9,5,4
subf 5,5,4
mullw 9,9,6
mullw 5,5,6
addi 9,9,8192
addi 5,5,8192
srawi 9,9,14
srawi 5,5,14
```

-------

\newpage{}

## Integer Butterfly Multiply Add/Sub FFT/DCT

**Add the following to Book I Section 3.3.9.1**

A-Form

```
|0     |6     |11    |16    |21    |26    |31 |
| PO   |  RT  |  RA  |  RB  |  SH  |  XO  |Rc |
```

* maddsubrs RT,RA,SH,RB

Pseudo-code:

```
n <- SH
sum <- (RT) + (RA)
diff <- (RT) - (RA)
prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1] + 1
prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1] + 1
res1 <- ROTL64(prod1, XLEN-n)
res2 <- ROTL64(prod2, XLEN-n)
m <- MASK(n, (XLEN-1))
signbit1 <- res1[0]
signbit2 <- res2[0]
smask1 <- ([signbit1]*XLEN) & ¬m
smask2 <- ([signbit2]*XLEN) & ¬m
s64_1 <- [0]*(XLEN-1) || signbit1
s64_2 <- [0]*(XLEN-1) || signbit2
RT <- (res1 & m | smask1) + s64_1
RS <- (res2 & m | smask2) + s64_2
```
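
To relate the register operands back to the Rationale, the following C sketch models the *intended* arithmetic, `ROUND_POWER_OF_TWO((RT +/- RA) * RB, SH)`, with `RT` and `RA` as the two data inputs, `RB` as the coefficient and `SH` as the shift amount. It is an illustrative model of the stated goal, not a bit-for-bit transcription of the pseudo-code above. The names `butterfly_pair`, `asr64` and `maddsubrs_model` are assumptions introduced here, and the operands are assumed small enough that the 64-bit products do not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result pair: rt is the explicit result, rs the implicit one. */
typedef struct {
    int64_t rt;
    int64_t rs;
} butterfly_pair;

/* Arithmetic (sign-preserving) right shift that avoids relying on the
 * implementation-defined behaviour of >> on negative values. */
int64_t asr64(int64_t x, unsigned n)
{
    return x >= 0 ? (x >> n) : ~((~x) >> n);
}

/* Model of the goal expression: round(((rt +/- ra) * rb) / 2^sh).
 * Assumes 1 <= sh <= 62 and no overflow in the intermediate products. */
butterfly_pair maddsubrs_model(int64_t rt, int64_t ra, int64_t rb, unsigned sh)
{
    assert(sh >= 1 && sh <= 62);
    int64_t round = (int64_t)1 << (sh - 1);   /* rounding constant */
    butterfly_pair out;
    out.rt = asr64((rt + ra) * rb + round, sh);   /* sum path        */
    out.rs = asr64((rt - ra) * rb + round, sh);   /* difference path */
    return out;
}
```

For example, `maddsubrs_model(100, 60, 11585, 14)` yields `rt = 113` and `rs = 28`; swapping the two data inputs gives `rs = -28`, showing that the difference path keeps its sign.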

Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.

Similar to `RTp`, this instruction produces an implicit result, `RS`,
which under Scalar circumstances is defined as `RT+1`. For SVP64 if
`RT` is a Vector, `RS` begins immediately after the Vector `RT` where
the length of `RT` is set by `SVSTATE.MAXVL` (Max Vector Length).
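
The rule for locating the implicit result can be written as a one-line calculation. The sketch below is only an illustration of the paragraph above: the function name and parameters are assumptions, and it deliberately ignores register-file bounds checks and any element-width considerations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: register number at which the implicit result RS starts.
 *   rt        - register number of RT
 *   rt_vector - true when RT is marked as a Vector under SVP64
 *   maxvl     - value of SVSTATE.MAXVL (only used in the Vector case)
 */
unsigned implicit_rs(unsigned rt, bool rt_vector, unsigned maxvl)
{
    if (!rt_vector)
        return rt + 1;      /* Scalar: RS is RT+1 */
    assert(maxvl >= 1);
    return rt + maxvl;      /* Vector: RS begins right after RT's MAXVL elements */
}
```

Under these assumptions a Scalar `RT` of 4 gives `RS` = 5, while a Vector `RT` starting at 8 with `MAXVL` = 4 gives an `RS` vector starting at 12.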

Special Registers Altered:

```
None
```

-------

\newpage{}

# Twin Butterfly Floating-Point DCT Instruction(s)

## Floating-Point Twin Multiply-Add DCT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21        |31 |
| PO   | FRT  | FRA  | FRB  | XO       |Rc |
```

* fdmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPADD32(FRT, FRB)
sub <- FPSUB32(FRT, FRB)
FRT <- FPMUL32(FRA, sub)
```

The two IEEE754-FP32 operations

```
FRS <- [(FRT) + (FRB)]
FRT <- [(FRT) - (FRB)] * (FRA)
```

are simultaneously performed.

The floating-point operand in register FRT is added to the floating-point
operand in register FRB and the result is stored in FRS.

Using the exact same operand input register values from FRT and FRB
that were used to create FRS, the floating-point operand in register
FRB is subtracted from the floating-point operand in register FRT and
the result is then multiplied by FRA to create an intermediate result
that is stored in FRT.

The add into FRS is treated exactly as `fadds`. The creation of the
result FRT is **not** the same as that of `fmsubs`.
The creation of FRS and FRT are treated as parallel independent operations
which occur at the same time.
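
The data flow described above can be sketched in C as follows. This is a minimal model under stated assumptions: `fp32_pair` and `fdmadds_model` are names invented for illustration, C `float` is assumed to be IEEE754 binary32, and FPSCR status flags and exception behaviour are not modelled. Both outputs are computed from the original input values, and the difference is formed and rounded before the multiply, following the `FPSUB32` then `FPMUL32` steps of the pseudo-code.

```c
/* Hypothetical result pair: frt is the explicit result, frs the implicit one. */
typedef struct {
    float frt;
    float frs;
} fp32_pair;

/* Sketch of fdmadds: both results are derived from the *input* values
 * of frt and frb, never from each other's output. */
fp32_pair fdmadds_model(float frt, float fra, float frb)
{
    fp32_pair out;
    float sub = frt - frb;   /* FPSUB32: difference, rounded to single */
    out.frs = frt + frb;     /* FPADD32: same as an fadds              */
    out.frt = fra * sub;     /* FPMUL32: multiply the rounded difference */
    return out;
}
```

For instance, `fdmadds_model(5.0f, 2.0f, 3.0f)` produces `frs = 8.0f` and `frt = 4.0f`.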

Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result, `FRS`,
which under Scalar circumstances is defined as `FRT+1`. For SVP64 if
`FRT` is a Vector, `FRS` begins immediately after the Vector `FRT`
where the length of `FRT` is set by `SVSTATE.MAXVL` (Max Vector Length).

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```

## Floating-Point Multiply-Add FFT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21        |31 |
| PO   | FRT  | FRA  | FRB  | XO       |Rc |
```

* ffmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```

The two operations

```
FRS <- -([(FRT) * (FRA)] - (FRB))
FRT <- [(FRT) * (FRA)] + (FRB)
```

are performed.

The floating-point operand in register FRT is multiplied by the
floating-point operand in register FRA. The floating-point operand in
register FRB is subtracted from this intermediate result, and the
negated result is stored in FRS.

Using the exact same values of FRT, FRA and FRB as used to create
FRS, the floating-point operand in register FRT is multiplied by the
floating-point operand in register FRA. The floating-point operand
in register FRB is added to this intermediate result, and the
result is stored in FRT.

FRT is created as if a `fmadds` operation had been performed. FRS is
created as if a `fnmsubs` operation had simultaneously been performed
with the exact same register operands, in parallel, independently,
at exactly the same time.

FRT is a Read-Modify-Write operation.
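
A small C sketch of the two results may help make the signs concrete. It is an approximation under stated assumptions, not a reference model: `fp32_pair` and `ffmadds_model` are names invented here, C `float`/`double` are assumed to be IEEE754 binary32/binary64, and FPSCR flags are not modelled. The product of two single-precision values is exact in double precision, but the sum is then rounded twice (once to double, once to single), so in rare halfway cases the result can differ from a truly fused operation; `fmaf` from `<math.h>` would be the closer match where available.

```c
/* Hypothetical result pair: frt is the explicit result, frs the implicit one. */
typedef struct {
    float frt;
    float frs;
} fp32_pair;

/* Sketch of ffmadds: both results use the same input frt, fra and frb. */
fp32_pair ffmadds_model(float frt, float fra, float frb)
{
    /* Product of two floats fits exactly in a double (24 + 24 <= 53 bits). */
    double prod = (double)frt * (double)fra;
    fp32_pair out;
    out.frs = (float)-(prod - (double)frb);   /* like fnmsubs: -((FRT*FRA) - FRB) */
    out.frt = (float)(prod + (double)frb);    /* like fmadds:   (FRT*FRA) + FRB   */
    return out;
}
```

For example, `ffmadds_model(3.0f, 2.0f, 1.0f)` gives `frt = 7.0f` and `frs = -5.0f`.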

Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result, `FRS`,
which under Scalar circumstances is defined as `FRT+1`. For SVP64 if
`FRT` is a Vector, `FRS` begins immediately after the Vector `FRT`
where the length of `FRT` is set by `SVSTATE.MAXVL` (Max Vector Length).

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```

## Floating-Point Twin Multiply-Add DCT

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21        |31 |
| PO   | FRT  | FRA  | FRB  | XO       |Rc |
```

* fdmadd FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPADD64(FRT, FRB)
sub <- FPSUB64(FRT, FRB)
FRT <- FPMUL64(FRA, sub)
```

The two IEEE754-FP64 operations

```
FRS <- [(FRT) + (FRB)]
FRT <- [(FRT) - (FRB)] * (FRA)
```

are simultaneously performed.

The floating-point operand in register FRT is added to the floating-point
operand in register FRB and the result is stored in FRS.

Using the exact same operand input register values from FRT and FRB
that were used to create FRS, the floating-point operand in register
FRB is subtracted from the floating-point operand in register FRT and
the result is then multiplied by FRA to create an intermediate result
that is stored in FRT.

The add into FRS is treated exactly as `fadd`. The creation of the
result FRT is **not** the same as that of `fmsub`.
The creation of FRS and FRT are treated as parallel independent operations
which occur at the same time.

Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result, `FRS`,
which under Scalar circumstances is defined as `FRT+1`. For SVP64 if
`FRT` is a Vector, `FRS` begins immediately after the Vector `FRT`
where the length of `FRT` is set by `SVSTATE.MAXVL` (Max Vector Length).

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```

## Floating-Point Twin Multiply-Add FFT

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21        |31 |
| PO   | FRT  | FRA  | FRB  | XO       |Rc |
```

* ffmadd FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPMULADD64(FRT, FRA, FRB, -1, 1)
FRT <- FPMULADD64(FRT, FRA, FRB, 1, 1)
```

The two operations

```
FRS <- -([(FRT) * (FRA)] - (FRB))
FRT <- [(FRT) * (FRA)] + (FRB)
```

are performed.

The floating-point operand in register FRT is multiplied by the
floating-point operand in register FRA. The floating-point operand in
register FRB is subtracted from this intermediate result, and the
negated result is stored in FRS.

Using the exact same values of FRT, FRA and FRB as used to create
FRS, the floating-point operand in register FRT is multiplied by the
floating-point operand in register FRA. The floating-point operand
in register FRB is added to this intermediate result, and the
result is stored in FRT.

FRT is created as if a `fmadd` operation had been performed. FRS is
created as if a `fnmsub` operation had simultaneously been performed
with the exact same register operands, in parallel, independently,
at exactly the same time.

FRT is a Read-Modify-Write operation.

Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result, `FRS`,
which under Scalar circumstances is defined as `FRT+1`. For SVP64 if
`FRT` is a Vector, `FRS` begins immediately after the Vector `FRT`
where the length of `FRT` is set by `SVSTATE.MAXVL` (Max Vector Length).

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```

## [DRAFT] Floating-Point Add FFT/DCT [Single]

A-Form

* ffadds FRT,FRA,FRB (Rc=0)
* ffadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPADD32(FRA, FRB)
FRS <- FPSUB32(FRB, FRA)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```

## [DRAFT] Floating-Point Add FFT/DCT [Double]

A-Form

* ffadd FRT,FRA,FRB (Rc=0)
* ffadd. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPADD64(FRA, FRB)
FRS <- FPSUB64(FRB, FRA)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```

## [DRAFT] Floating-Point Subtract FFT/DCT [Single]

A-Form

* ffsubs FRT,FRA,FRB (Rc=0)
* ffsubs. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPSUB32(FRB, FRA)
FRS <- FPADD32(FRA, FRB)
```
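
Comparing the draft `ffadds` and `ffsubs` pseudo-code side by side, both produce the same two values (the sum and the difference `FRB - FRA`) and differ only in which one lands in `FRT` and which in the implicit `FRS`. The C sketch below illustrates this reading. The names `fp32_pair`, `ffadds_model` and `ffsubs_model` are assumptions for illustration, C `float` is assumed to be IEEE754 binary32, and FPSCR and CR1 updates are not modelled.

```c
/* Hypothetical result pair: frt is the explicit result, frs the implicit one. */
typedef struct {
    float frt;
    float frs;
} fp32_pair;

/* Sketch of ffadds: sum into FRT, difference (FRB - FRA) into FRS. */
fp32_pair ffadds_model(float fra, float frb)
{
    fp32_pair out;
    out.frt = fra + frb;
    out.frs = frb - fra;
    return out;
}

/* Sketch of ffsubs: the same two values with the destinations swapped. */
fp32_pair ffsubs_model(float fra, float frb)
{
    fp32_pair out;
    out.frt = frb - fra;
    out.frs = fra + frb;
    return out;
}
```

With `fra = 1.5f` and `frb = 4.0f`, `ffadds_model` gives `frt = 5.5f`, `frs = 2.5f`, and `ffsubs_model` gives `frt = 2.5f`, `frs = 5.5f`.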

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```

## [DRAFT] Floating-Point Subtract FFT/DCT [Double]

A-Form

* ffsub FRT,FRA,FRB (Rc=0)
* ffsub. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPSUB64(FRB, FRA)
FRS <- FPADD64(FRA, FRB)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```