# Introduction

<!-- hide -->
* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]
* [[openpower/isa/svfixedarith]]
* [[openpower/sv/rfc/ls016]]

<!-- show -->

# Rationale for Twin Butterfly Integer DCT Instruction(s)

DCT has a huge number of general-purpose uses. The number of
instructions needed in place of each of these Twin-Butterfly
instructions is also large (**eight**), and because it is
extremely common to explicitly loop-unroll them, sequences of
hundreds to thousands of instructions are dismayingly common
(for all ISAs).

The goal is to implement instructions that calculate the expression:

```
fdct_round_shift((a +/- b) * c)
```

For the single-coefficient butterfly instruction, and:

```
fdct_round_shift(a * c1 +/- b * c2)
```

For the double-coefficient butterfly instruction.

`fdct_round_shift` is defined as `ROUND_POWER_OF_TWO(x, 14)`:

```
#define ROUND_POWER_OF_TWO(value, n) \
    (((value) + (1 << ((n)-1))) >> (n))
```
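
As a rough illustration, the rounding macro and the two butterfly
expressions above can be modelled in a few lines of Python. This is a
sketch only: the names `round_power_of_two`, `butterfly_single` and
`butterfly_double` are invented for this example, Python integers are
unbounded (so no register-width wrap-around is modelled), and the
coefficient 11585 is simply cos(pi/4) scaled by 2^14 and rounded.

```python
DCT_CONST_BITS = 14  # shift amount used by fdct_round_shift in the text above


def round_power_of_two(value, n):
    """Python equivalent of the ROUND_POWER_OF_TWO macro (n must be >= 1)."""
    return (value + (1 << (n - 1))) >> n


def fdct_round_shift(x):
    return round_power_of_two(x, DCT_CONST_BITS)


def butterfly_single(a, b, c):
    """Single-coefficient form: returns the '+' result and the '-' result."""
    return fdct_round_shift((a + b) * c), fdct_round_shift((a - b) * c)


def butterfly_double(a, b, c1, c2):
    """Double-coefficient form: returns the '+' result and the '-' result."""
    return (fdct_round_shift(a * c1 + b * c2),
            fdct_round_shift(a * c1 - b * c2))


# 11585 == round(cos(pi / 4) * 2**14), chosen here purely as an example.
print(butterfly_single(100, 50, 11585))         # (106, 35)
print(butterfly_double(100, 50, 11585, 11585))  # (106, 35)
```

Python's `>>` on negative integers is an arithmetic (floor) shift, which
is the behaviour the C macro is normally relied upon to have for signed
operands.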

These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including, but not limited to, VP8/VP9 and AV1.
Arm includes special instructions to optimize these operations, although they are limited in precision: `vqrdmulhq_s16`/`vqrdmulhq_s32`.

The suggestion is to have a single instruction that calculates both values, `((a + b) * c) >> N` and `((a - b) * c) >> N`.
The instruction will run in accumulate mode, so to calculate the 2-coeff version one only has to call the same instruction again with `a` and `b` in a different order and a different constant `c`.

## Integer Butterfly Multiply Add/Sub FFT/DCT

**Add the following to Book I Section 3.3.9.1**

A-Form

```
|0     |6     |11    |16    |21    |26    |31 |
| PO   |  RT  |  RA  |  RB  |  SH  |  XO  |Rc |
```

* maddsubrs RT,RA,SH,RB

Pseudo-code:

```
n <- SH
sum <- (RT) + (RA)
diff <- (RT) - (RA)
prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1]
prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1]
res1 <- ROTL64(prod1, XLEN-n)
res2 <- ROTL64(prod2, XLEN-n)
m <- MASK(n, (XLEN-1))
signbit1 <- res1[0]
signbit2 <- res2[0]
smask1 <- ([signbit1]*XLEN) & ¬m
smask2 <- ([signbit2]*XLEN) & ¬m
s64_1 <- [0]*(XLEN-1) || signbit1
s64_2 <- [0]*(XLEN-1) || signbit2
RT <- (res1 & m | smask1) + s64_1
RS <- (res2 & m | smask2) + s64_2
```

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `RTp`, this instruction produces an implicit result,
`RS`, which under Scalar circumstances is defined as `RT+1`.
For SVP64 if `RT` is a Vector, `RS` begins immediately after the
Vector `RT` where the length of `RT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
None
```
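
The following Python sketch models the arithmetic that `maddsubrs` is
intended to perform according to the rationale (a rounded, shifted
`(RT +/- RA) * RB` pair), together with the implicit `RS` destination.
It is not a bit-for-bit transcription of the pseudo-code: the helper
names (`to_signed`, `implicit_rs`), the list used as a register file,
and the treatment of `SH=0` as "no rounding term" are all assumptions
made for this example.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def to_signed(value):
    """Interpret an XLEN-bit register value as a two's-complement integer."""
    value &= MASK
    return value - (1 << XLEN) if value >> (XLEN - 1) else value


def implicit_rs(rt, vector=False, maxvl=1):
    """Register number of the implicit RS result, per the text above."""
    return rt + maxvl if vector else rt + 1


def maddsubrs(gpr, rt, ra, sh, rb):
    """Scalar model: RT, RS <- rounded ((RT +/- RA) * RB) >> SH."""
    a, b, c = to_signed(gpr[rt]), to_signed(gpr[ra]), to_signed(gpr[rb])
    rnd = (1 << (sh - 1)) if sh > 0 else 0  # assumption: no rounding if SH=0
    # Keep only the low XLEN bits of each product, as the pseudo-code does.
    prod1 = to_signed((a + b) * c)
    prod2 = to_signed((a - b) * c)
    gpr[rt] = ((prod1 + rnd) >> sh) & MASK
    gpr[implicit_rs(rt)] = ((prod2 + rnd) >> sh) & MASK


gpr = [0] * 32                  # hypothetical register file
gpr[3], gpr[5], gpr[6] = 100, 50, 11585
maddsubrs(gpr, rt=3, ra=5, sh=14, rb=6)
print(gpr[3], gpr[4])           # 106 35
```

Because `RT` is both a source and a destination, the model reads all
three operands before writing either result.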

# Twin Butterfly Floating-Point DCT and FFT Instruction(s)

## Floating Twin Multiply-Add DCT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21      |31 |
| PO   |  FRT |  FRA |  FRB |   XO   |Rc |
```

* fdmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPADD32(FRT, FRB)
FRT <- FPMULADD32(FRT, FRA, FRB, 1, -1)
```

The Floating-Point operand in register FRT is added to the floating-point
operand in register FRB and the result stored in FRS.

Using the exact same operand input register values from FRT and FRB that
were used to create FRS, the Floating-Point operand in register FRB
is subtracted from the floating-point operand in register FRT and the
result then multiplied by FRA to create an intermediate result that is
stored in FRT.

The add into FRS is treated exactly as `fadds`. The creation
of the result FRT is exactly that of `fmsubs`. FRS and FRT are created
as parallel, independent operations that occur at the same time.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```
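
A minimal Python sketch of the data flow described in the prose for
`fdmadds` is given below. It follows the prose reading (add into FRS,
subtract then multiply into FRT); the pseudo-code's `FPMULADD32` call
may instead denote a fused operation, so treat the ordering and the
separate rounding of the intermediate difference as assumptions. The
helper `round_to_single` is invented for this example, only finite
in-range values are handled, and no FPSCR flags are modelled.

```python
import struct


def round_to_single(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fdmadds(frt, fra, frb):
    """Return (new_FRT, FRS); both results use the original FRT and FRB."""
    frs = round_to_single(frt + frb)        # FRS <- FRT + FRB
    diff = round_to_single(frt - frb)       # assumption: rounded separately
    new_frt = round_to_single(diff * fra)   # FRT <- (FRT - FRB) * FRA
    return new_frt, frs


print(fdmadds(3.0, 0.5, 1.0))  # (1.0, 4.0)
```

Returning both values from one call mirrors the requirement that FRS and
FRT are computed from the same input operands rather than sequentially.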

## Floating Twin Multiply-Add FFT [Single]

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21      |31 |
| PO   |  FRT |  FRA |  FRB |   XO   |Rc |
```

* ffmadds FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```

The two operations

```
FRS <- -([(FRT) * (FRA)] - (FRB))
FRT <- [(FRT) * (FRA)] + (FRB)
```

are performed.

The floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The floating-point
operand in register FRB is subtracted from this intermediate result,
the difference is negated, and the result stored in FRS.

Using the exact same values of FRT, FRA and FRB as used to create FRS,
the floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The floating-point
operand in register FRB is added to this intermediate result,
and the result stored in FRT.

FRT is created as if
a `fmadds` operation had been performed. FRS is created as if
a `fnmsubs` operation had simultaneously been performed with
the exact same register operands, in parallel, independently,
at exactly the same time.

FRT is a Read-Modify-Write operation.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```

## Floating Twin Multiply-Add DCT

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21      |31 |
| PO   |  FRT |  FRA |  FRB |   XO   |Rc |
```

* fdmadd FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPADD64(FRT, FRB)
FRT <- FPMULADD64(FRT, FRA, FRB, 1, -1)
```

The Floating-Point operand in register FRT is added to the floating-point
operand in register FRB and the result stored in FRS.

Using the exact same operand input register values from FRT and FRB that
were used to create FRS, the Floating-Point operand in register FRB
is subtracted from the floating-point operand in register FRT and the
result then multiplied by FRA to create an intermediate result that is
stored in FRT.

The add into FRS is treated exactly as `fadd`. The creation
of the result FRT is exactly that of `fmsub`. FRS and FRT are created
as parallel, independent operations that occur at the same time.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```

## Floating Twin Multiply-Add FFT

**Add the following to Book I Section 4.6.6.3**

X-Form

```
|0     |6     |11    |16    |21      |31 |
| PO   |  FRT |  FRA |  FRB |   XO   |Rc |
```

* ffmadd FRT,FRA,FRB (Rc=0)

Pseudo-code:

```
FRS <- FPMULADD64(FRT, FRA, FRB, -1, 1)
FRT <- FPMULADD64(FRT, FRA, FRB, 1, 1)
```

The two operations

```
FRS <- -([(FRT) * (FRA)] - (FRB))
FRT <- [(FRT) * (FRA)] + (FRB)
```

are performed.

The floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The floating-point
operand in register FRB is subtracted from this intermediate result,
the difference is negated, and the result stored in FRS.

Using the exact same values of FRT, FRA and FRB as used to create FRS,
the floating-point operand in register FRT is multiplied
by the floating-point operand in register FRA. The floating-point
operand in register FRB is added to this intermediate result,
and the result stored in FRT.

FRT is created as if
a `fmadd` operation had been performed. FRS is created as if
a `fnmsub` operation had simultaneously been performed with
the exact same register operands, in parallel, independently,
at exactly the same time.

FRT is a Read-Modify-Write operation.

Note that if Rc=1 an Illegal Instruction is raised.
Rc=1 is `RESERVED`.

Similar to `FRTp`, this instruction produces an implicit result,
`FRS`, which under Scalar circumstances is defined as `FRT+1`.
For SVP64 if `FRT` is a Vector, `FRS` begins immediately after the
Vector `FRT` where the length of `FRT` is set by `SVSTATE.MAXVL`
(Max Vector Length).

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
```
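
The pair of results produced by `ffmadd` can be sketched in Python as
below. The sketch assumes that the last two arguments of `FPMULADD64`
are signs applied to the product and to the addend respectively, which
is consistent with the expanded form shown above. `fused_muladd` is a
name invented for this example; it uses exact rational arithmetic so
that each result is rounded only once, handles finite values only, and
does not model FPSCR flags or signed zeros.

```python
from fractions import Fraction


def fused_muladd(a, b, c, mul_sign=1, add_sign=1):
    """mul_sign * (a * b) + add_sign * c, rounded once (finite inputs)."""
    exact = mul_sign * Fraction(a) * Fraction(b) + add_sign * Fraction(c)
    return float(exact)  # int/int true division rounds correctly


def ffmadd(frt, fra, frb):
    """Return (new_FRT, FRS); both results use the original FRT."""
    frs = fused_muladd(frt, fra, frb, -1, 1)     # -(FRT * FRA - FRB)
    new_frt = fused_muladd(frt, fra, frb, 1, 1)  # FRT * FRA + FRB
    return new_frt, frs


print(ffmadd(2.0, 3.0, 1.0))  # (7.0, -5.0)
```

Computing both values before returning reflects the Read-Modify-Write
note: FRS must see the value of FRT from before the instruction
overwrites it.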

## [DRAFT] Floating Add FFT/DCT [Single]

A-Form

* ffadds FRT,FRA,FRB (Rc=0)
* ffadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPADD32(FRA, FRB)
FRS <- FPSUB32(FRB, FRA)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```

## [DRAFT] Floating Add FFT/DCT [Double]

A-Form

* ffadd FRT,FRA,FRB (Rc=0)
* ffadd. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPADD64(FRA, FRB)
FRS <- FPSUB64(FRB, FRA)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```

## [DRAFT] Floating Subtract FFT/DCT [Single]

A-Form

* ffsubs FRT,FRA,FRB (Rc=0)
* ffsubs. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPSUB32(FRB, FRA)
FRS <- FPADD32(FRA, FRB)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```

## [DRAFT] Floating Subtract FFT/DCT [Double]

A-Form

* ffsub FRT,FRA,FRB (Rc=0)
* ffsub. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRT <- FPSUB64(FRB, FRA)
FRS <- FPADD64(FRA, FRB)
```

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI
CR1 (if Rc=1)
```
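
Read side by side, the draft `ffadd` and `ffsub` pseudo-code computes
the same sum and difference and differs only in which result lands in
FRT and which in FRS. The short Python sketch below illustrates this
for the double-precision forms. Python floats are binary64, so the
single-precision variants would need an extra rounding step; CR1 and
FPSCR updates are not modelled, and the tuple return convention is an
assumption of this example.

```python
def ffadd(fra, frb):
    """Return (FRT, FRS) for draft ffadd: sum in FRT, difference in FRS."""
    return fra + frb, frb - fra


def ffsub(fra, frb):
    """Return (FRT, FRS) for draft ffsub: difference in FRT, sum in FRS."""
    return frb - fra, fra + frb


print(ffadd(1.5, 4.0))  # (5.5, 2.5)
print(ffsub(1.5, 4.0))  # (2.5, 5.5)
```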