# Introduction

<!-- hide -->
* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]
* [[openpower/sv/rfc/ls016]]

<!-- show -->

# Rationale for Twin Butterfly Integer DCT Instruction(s)

The number of general-purpose uses for DCT is huge. The number of
instructions needed in place of one Twin-Butterfly instruction is also
large (**eight**), and because it is extremely common to explicitly
loop-unroll them, sequences of hundreds to thousands of instructions are
dismayingly common (for all ISAs).

The goal is to implement instructions that calculate the expression:

```
fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
fdct_round_shift(a * c1 +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`:

```
#define ROUND_POWER_OF_TWO(value, n) \
    (((value) + (1 << ((n)-1))) >> (n))
```

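As a minimal sketch of the two expressions above (not taken from any codec), the following C functions apply the rounding shift to the single- and double-coefficient forms. All function names are hypothetical, no overflow checking is done, and right-shifting a negative `int32_t` is assumed to be arithmetic, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* ROUND_POWER_OF_TWO as a function: add half of 2^n, then shift right by n. */
int32_t round_power_of_two(int32_t value, int n)
{
    assert(n >= 1 && n <= 30);               /* keeps 1 << (n-1) well defined */
    return (value + (1 << (n - 1))) >> n;    /* assumes arithmetic >> on negatives */
}

int32_t fdct_round_shift(int32_t x)
{
    return round_power_of_two(x, 14);
}

/* Single-coefficient butterfly: fdct_round_shift((a +/- b) * c). */
int32_t butterfly1_add(int16_t a, int16_t b, int16_t c)
{
    return fdct_round_shift((a + b) * c);
}

int32_t butterfly1_sub(int16_t a, int16_t b, int16_t c)
{
    return fdct_round_shift((a - b) * c);
}

/* Double-coefficient butterfly: fdct_round_shift(a * c1 +/- b * c2). */
int32_t butterfly2_add(int16_t a, int16_t b, int16_t c1, int16_t c2)
{
    return fdct_round_shift(a * c1 + b * c2);
}

int32_t butterfly2_sub(int16_t a, int16_t b, int16_t c1, int16_t c2)
{
    return fdct_round_shift(a * c1 - b * c2);
}
```

With the Q14 coefficient 11585 (approximately cos(π/4)·2^14), `butterfly1_add(3, 1, 11585)` gives 3 and `butterfly1_sub(3, 1, 11585)` gives 1.
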
These instructions are at the core of **ALL** FDCT calculations in many
major video codecs, including (but not limited to) VP8/VP9 and AV1.
Arm includes special instructions to optimize these operations, although
they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.

The suggestion is to have a single instruction that calculates both values,
`((a + b) * c) >> N` and `((a - b) * c) >> N`. The instruction will
run in accumulate mode, so to calculate the two-coefficient version
one would just call the same instruction with `a` and `b` in a different
order and with a different constant `c`.

Example taken from libvpx
<https://chromium.googlesource.com/webm/libvpx/+/refs/heads/main/vpx_dsp/fwd_txfm.c#132>:

```
#include <stdint.h>
#define ROUND_POWER_OF_TWO(value, n) \
    (((value) + (1 << ((n)-1))) >> (n))
void twin_int(int16_t *t, int16_t x0, int16_t x1, int16_t cospi_16_64) {
    t[0] = ROUND_POWER_OF_TWO((x0 + x1) * cospi_16_64, 14);
    t[1] = ROUND_POWER_OF_TWO((x0 - x1) * cospi_16_64, 14);
}
```

Eight instructions are required, which `maddsubrs` replaces with just one:

```
add 9,5,4
subf 5,5,4
mullw 9,9,6
mullw 5,5,6
addi 9,9,8192
addi 5,5,8192
srawi 9,9,14
srawi 5,5,14
```

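To make the operation count concrete, here is a hedged C model that mirrors the eight-instruction listing above, one statement per instruction. The register-to-variable mapping (`x0` in r4, `x1` in r5, the coefficient in r6) is an assumption inferred from the operand order, the type and function names are hypothetical, and `>>` on negative values is assumed to be arithmetic, like `srawi`.

```c
#include <stdint.h>

typedef struct {
    int32_t sum_out;   /* final r9: rounded (x0 + x1) * c */
    int32_t diff_out;  /* final r5: rounded (x0 - x1) * c */
} twin_result;

/* One C statement per instruction in the listing. */
twin_result twin_int_eight_steps(int32_t r4, int32_t r5, int32_t r6)
{
    int32_t r9;
    r9 = r5 + r4;     /* add   9,5,4                                 */
    r5 = r4 - r5;     /* subf  5,5,4     (subf computes r4 - r5)     */
    r9 = r9 * r6;     /* mullw 9,9,6                                 */
    r5 = r5 * r6;     /* mullw 5,5,6                                 */
    r9 = r9 + 8192;   /* addi  9,9,8192  (1 << 13, the rounding term) */
    r5 = r5 + 8192;   /* addi  5,5,8192                              */
    r9 = r9 >> 14;    /* srawi 9,9,14                                */
    r5 = r5 >> 14;    /* srawi 5,5,14                                */

    twin_result r = { r9, r5 };
    return r;
}
```

Every statement except the first two exists once for the sum and once for the difference; that duplicated multiply, round and shift work is what a twin instruction folds together.
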
-------

\newpage{}

## Integer Butterfly Multiply Add/Sub FFT/DCT

**Add the following to Book I Section 3.3.9.1**

A-Form

```
|0     |6     |11    |16    |21    |26    |31 |
| PO   |  RT  |  RA  |  RB  |  SH  |  XO  |Rc |
```

* maddsubrs RT,RA,SH,RB

Pseudo-code:

```
    n <- SH
    sum <- (RT) + (RA)
    diff <- (RT) - (RA)
    prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1] + 1
    prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1] + 1
    res1 <- ROTL64(prod1, XLEN-n)
    res2 <- ROTL64(prod2, XLEN-n)
    m <- MASK(n, (XLEN-1))
    signbit1 <- res1[0]
    signbit2 <- res2[0]
    smask1 <- ([signbit1]*XLEN) & ¬m
    smask2 <- ([signbit2]*XLEN) & ¬m
    s64_1 <- [0]*(XLEN-1) || signbit1
    s64_2 <- [0]*(XLEN-1) || signbit2
    RT <- (res1 & m | smask1) + s64_1
    RS <- (res2 & m | smask2) + s64_2
```

Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.

Similar to `RTp`, this instruction produces an implicit result, `RS`,
which under Scalar circumstances is defined as `RT+1`. For SVP64, if
`RT` is a Vector, `RS` begins immediately after the Vector `RT`, where
the length of `RT` is set by `SVSTATE.MAXVL` (Max Vector Length).

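As a behavioural sketch only, the scalar case can be modelled in C with a toy register file. It follows the rounding-shift expression from the rationale rather than reproducing the pseudo-code bit for bit. The `regfile` type, the function names, and the treatment of `SH = 0` as "no shift" are all assumptions made for illustration. `>>` on negative values is assumed to be arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Toy scalar register file (hypothetical type, 32 scalar GPRs). */
typedef struct {
    int64_t gpr[32];
} regfile;

/* Rounding right shift; sh == 0 is treated as "no shift" (an assumption). */
int64_t round_shift(int64_t value, unsigned sh)
{
    if (sh == 0)
        return value;
    return (value + ((int64_t)1 << (sh - 1))) >> sh;
}

/* Sketch of maddsubrs RT,RA,SH,RB in the scalar case. */
void maddsubrs_model(regfile *rf, unsigned rt, unsigned ra,
                     unsigned sh, unsigned rb)
{
    /* rt < 31 so that the implicit RS = RT+1 stays inside the toy file */
    assert(rt < 31 && ra < 32 && rb < 32 && sh < 32);

    /* Read every input before writing: RT is both a source and a result. */
    int64_t a = rf->gpr[rt];
    int64_t b = rf->gpr[ra];
    int64_t c = rf->gpr[rb];

    rf->gpr[rt]     = round_shift((a + b) * c, sh);  /* explicit result RT   */
    rf->gpr[rt + 1] = round_shift((a - b) * c, sh);  /* implicit result RS   */
}
```

Reading all three sources before either write matters in this model because `RA` or `RB` may name the same register as `RT` or `RT+1`.
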
Special Registers Altered:

```
    None
```

-------

\newpage{}

# Twin Butterfly Floating-Point DCT Instruction(s)

**Add the following to Book I Section 4.6.6.3**

## Floating-Point Twin Multiply-Add DCT [Single]

X-Form

```
|0     |6     |11    |16    |21    |31 |
| PO   |  FRT |  FRA |  FRB |  XO  |Rc |
```

* fdmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPADD32(FRT, FRB)
    sub <- FPSUB32(FRT, FRB)
    FRT <- FPMUL32(FRA, sub)
```

The two IEEE754-FP32 operations

```
    FRS <- [(FRT) + (FRB)]
    FRT <- [(FRT) - (FRB)] * (FRA)
```

are simultaneously performed.

The floating-point operand in register FRT is added to the floating-point
operand in register FRB and the result is stored in FRS.

Using the exact same operand input register values from FRT and FRB
that were used to create FRS, the floating-point operand in register
FRB is subtracted from the floating-point operand in register FRT and
the result is then multiplied by FRA to create the result that
is stored in FRT.

The add into FRS is treated exactly as `fadds`. The creation of the
result FRT is **not** the same as that of `fmsubs`.
FRS and FRT are created by parallel, independent operations
that occur at the same time.

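The description above can be sketched in C as follows. This is a minimal model that follows the pseudo-code literally (add, subtract, then multiply, each in single precision); the struct and function names are hypothetical, and FPSCR status flags are not modelled.

```c
/* Both results of the twin operation (hypothetical type). */
typedef struct {
    float frt;  /* new value of FRT: (FRT - FRB) * FRA */
    float frs;  /* implicit result FRS: FRT + FRB       */
} fdmadds_result;

fdmadds_result fdmadds_model(float frt, float fra, float frb)
{
    /* Both results are computed from the same, original input values. */
    float sum = frt + frb;   /* FPADD32(FRT, FRB)                      */
    float sub = frt - frb;   /* FPSUB32(FRT, FRB), rounded on its own  */

    fdmadds_result r;
    r.frs = sum;
    r.frt = fra * sub;       /* FPMUL32(FRA, sub)                      */
    return r;
}
```

For example, `fdmadds_model(3.0f, 0.5f, 1.0f)` yields `frs == 4.0f` and `frt == 1.0f`.
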
Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result, `FRS`,
which under Scalar circumstances is defined as `FRT+1`. For SVP64, if
`FRT` is a Vector, `FRS` begins immediately after the Vector `FRT`,
where the length of `FRT` is set by `SVSTATE.MAXVL` (Max Vector Length).

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```

## Floating-Point Twin Multiply-Add FFT [Single]

X-Form

```
|0     |6     |11    |16    |21    |31 |
| PO   |  FRT |  FRA |  FRB |  XO  |Rc |
```

* ffmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
    FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```

The two operations

```
    FRS <- -([(FRT) * (FRA)] - (FRB))
    FRT <- [(FRT) * (FRA)] + (FRB)
```

are performed.

The floating-point operand in register FRT is multiplied by the
floating-point operand in register FRA. The floating-point operand in
register FRB is subtracted from this intermediate result, and the
negated result is stored in FRS.

Using the exact same values of FRT, FRA and FRB as used to create
FRS, the floating-point operand in register FRT is multiplied by the
floating-point operand in register FRA. The floating-point operand
in register FRB is added to this intermediate result, and the
result is stored in FRT.

FRT is created as if a `fmadds` operation had been performed. FRS is
created as if a `fnmsubs` operation had simultaneously been performed
with the exact same register operands, in parallel, independently,
at exactly the same time.

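A hedged C sketch of the two results follows. The names are hypothetical and status flags are not modelled. To stay free of library dependencies it forms the product in `double`, where the product of two `float` values is exact; the add and the final narrowing can still round twice, so this approximates a fused operation and is not guaranteed to match one bit for bit.

```c
/* Both results of the twin operation (hypothetical type). */
typedef struct {
    float frt;  /* like fmadds:   (FRT * FRA) + FRB    */
    float frs;  /* like fnmsubs: -((FRT * FRA) - FRB)  */
} ffmadds_result;

ffmadds_result ffmadds_model(float frt, float fra, float frb)
{
    /* float * float fits exactly in a double (24 + 24 significand bits). */
    double prod = (double)frt * (double)fra;

    ffmadds_result r;
    r.frt = (float)(prod + (double)frb);
    r.frs = (float)(-(prod - (double)frb));
    return r;
}
```

Algebraically the two outputs are `FRB + FRT*FRA` and `FRB - FRT*FRA`, the two outputs of a butterfly with `FRA` as the multiplier. For instance, `ffmadds_model(2.0f, 3.0f, 1.0f)` gives `frt == 7.0f` and `frs == -5.0f`.
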
FRT is a Read-Modify-Write operation.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64, if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT`, where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```

## Floating-Point Twin Multiply-Add DCT

X-Form

```
|0     |6     |11    |16    |21    |31 |
| PO   |  FRT |  FRA |  FRB |  XO  |Rc |
```

* fdmadd FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPADD64(FRT, FRB)
    sub <- FPSUB64(FRT, FRB)
    FRT <- FPMUL64(FRA, sub)
```

The two IEEE754-FP64 operations

```
    FRS <- [(FRT) + (FRB)]
    FRT <- [(FRT) - (FRB)] * (FRA)
```

are simultaneously performed.

The floating-point operand in register FRT is added to the floating-point
operand in register FRB and the result is stored in FRS.

Using the exact same operand input register values from FRT and FRB
that were used to create FRS, the floating-point operand in register
FRB is subtracted from the floating-point operand in register FRT and
the result is then multiplied by FRA to create the result that
is stored in FRT.

The add into FRS is treated exactly as `fadd`. The creation of the
result FRT is **not** the same as that of `fmsub`.
FRS and FRT are created by parallel, independent operations
that occur at the same time.

Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result, `FRS`,
which under Scalar circumstances is defined as `FRT+1`. For SVP64, if
`FRT` is a Vector, `FRS` begins immediately after the Vector `FRT`,
where the length of `FRT` is set by `SVSTATE.MAXVL` (Max Vector Length).

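The Vector placement of `FRS` can be illustrated with a small element-by-element model in C. Everything beyond the `FRT + MAXVL` offset is an assumption made for the sketch: the 128-entry register array, the choice of `FRT` and `FRB` as Vectors with `FRA` as a Scalar, the loop over `vl` elements (taken to be at most `maxvl`), and the function name.

```c
#include <assert.h>

#define NUM_FPR 128  /* assumed size of the modelled FP register file */

/* Sketch of a vectorised fdmadd: FRT and FRB Vectors, FRA Scalar. */
void fdmadd_vector_model(double fpr[NUM_FPR], unsigned frt, unsigned fra,
                         unsigned frb, unsigned vl, unsigned maxvl)
{
    assert(vl <= maxvl);

    /* FRS begins immediately after the Vector FRT, whose length is MAXVL. */
    unsigned frs = frt + maxvl;
    assert(frs + maxvl <= NUM_FPR);
    assert(frb + vl <= NUM_FPR && fra < NUM_FPR);

    for (unsigned i = 0; i < vl; i++) {
        /* Read the element inputs before either result is written. */
        double t = fpr[frt + i];
        double b = fpr[frb + i];
        double a = fpr[fra];

        fpr[frs + i] = t + b;        /* FRS element: FPADD64(FRT, FRB)    */
        fpr[frt + i] = a * (t - b);  /* FRT element: FPMUL64(FRA, sub)    */
    }
}
```

With `frt = 8` and `maxvl = 4`, the sums land in registers 12 onwards even when only `vl = 2` elements are processed, because the offset depends on `MAXVL` and not on the current vector length.
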
Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```

## Floating-Point Twin Multiply-Add FFT

X-Form

```
|0     |6     |11    |16    |21    |31 |
| PO   |  FRT |  FRA |  FRB |  XO  |Rc |
```

* ffmadd FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
    FRS <- FPMULADD64(FRT, FRA, FRB, -1, 1)
    FRT <- FPMULADD64(FRT, FRA, FRB, 1, 1)
```

The two operations

```
    FRS <- -([(FRT) * (FRA)] - (FRB))
    FRT <- [(FRT) * (FRA)] + (FRB)
```

are performed.

The floating-point operand in register FRT is multiplied by the
floating-point operand in register FRA. The floating-point operand in
register FRB is subtracted from this intermediate result, and the
negated result is stored in FRS.

Using the exact same values of FRT, FRA and FRB as used to create
FRS, the floating-point operand in register FRT is multiplied by the
floating-point operand in register FRA. The floating-point operand
in register FRB is added to this intermediate result, and the
result is stored in FRT.

FRT is created as if a `fmadd` operation had been performed. FRS is
created as if a `fnmsub` operation had simultaneously been performed
with the exact same register operands, in parallel, independently,
at exactly the same time.

FRT is a Read-Modify-Write operation.

Note that if Rc=1 an Illegal Instruction is raised. Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result, `FRS`,
which under Scalar circumstances is defined as `FRT+1`. For SVP64, if
`FRT` is a Vector, `FRS` begins immediately after the Vector `FRT`,
where the length of `FRT` is set by `SVSTATE.MAXVL` (Max Vector Length).

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI VXIMZ
```

## [DRAFT] Floating-Point Add FFT/DCT [Single]

A-Form

* ffadds FRT,FRA,FRB (Rc=0)
* ffadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPADD32(FRA, FRB)
    FRS <- FPSUB32(FRB, FRA)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1 (if Rc=1)
```

## [DRAFT] Floating-Point Add FFT/DCT [Double]

A-Form

* ffadd FRT,FRA,FRB (Rc=0)
* ffadd. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPADD64(FRA, FRB)
    FRS <- FPSUB64(FRB, FRA)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1 (if Rc=1)
```

## [DRAFT] Floating-Point Subtract FFT/DCT [Single]

A-Form

* ffsubs FRT,FRA,FRB (Rc=0)
* ffsubs. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPSUB32(FRB, FRA)
    FRS <- FPADD32(FRA, FRB)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1 (if Rc=1)
```

## [DRAFT] Floating-Point Subtract FFT/DCT [Double]

A-Form

* ffsub FRT,FRA,FRB (Rc=0)
* ffsub. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
    FRT <- FPSUB64(FRB, FRA)
    FRS <- FPADD64(FRA, FRB)
```

Special Registers Altered:

```
    FPRF FR FI
    FX OX UX XX
    VXSNAN VXISI
    CR1 (if Rc=1)
```
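
Read side by side, the draft `ffadd` and `ffsub` pseudo-code produces the same two values and differs only in which one lands in `FRT` and which in `FRS`. The following C sketch shows this for the double-precision forms; the type and function names are hypothetical, and neither the status flags nor the Rc=1 update of CR1 are modelled.

```c
/* Explicit and implicit results of a twin add/subtract (hypothetical type). */
typedef struct {
    double frt;
    double frs;
} twin_fp_result;

/* ffadd: FRT <- FRA + FRB, FRS <- FRB - FRA */
twin_fp_result ffadd_model(double fra, double frb)
{
    twin_fp_result r = { fra + frb, frb - fra };
    return r;
}

/* ffsub: FRT <- FRB - FRA, FRS <- FRA + FRB */
twin_fp_result ffsub_model(double fra, double frb)
{
    twin_fp_result r = { frb - fra, fra + frb };
    return r;
}
```

For `fra = 1.0` and `frb = 3.0`, `ffadd_model` returns `frt == 4.0`, `frs == 2.0`, while `ffsub_model` returns `frt == 2.0`, `frs == 4.0`.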