* <https://bugs.libre-soc.org/show_bug.cgi?id=1074>
* <https://libre-soc.org/openpower/sv/biginteger/> for format and
  information about implicit RS/FRS
* <https://git.libre-soc.org/?p=openpower-isa.git;a=blob;f=src/openpower/decoder/isa/test_caller_svp64_dct.py;hb=HEAD>
* [[openpower/isa/svfparith]]

# Twin Butterfly Integer DCT Instruction(s)

The goal is to implement instructions that calculate the expression:

```
fdct_round_shift((a +/- b) * c)
```

for the single-coefficient butterfly instruction, and:

```
fdct_round_shift(a * c1 +/- b * c2)
```

for the double-coefficient butterfly instruction.

`fdct_round_shift(x)` is defined as `ROUND_POWER_OF_TWO(x, 14)`. This is a right shift by 14 bits that rounds to nearest instead of truncating:

```
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))
```
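
A minimal C sketch of the two target expressions follows. It assumes 64-bit signed intermediates and an arithmetic `>>` on negative values. That shift behaviour is implementation-defined in C, but GCC and Clang shift arithmetically. The helper names `fdct_round_shift`, `butterfly_1coeff` and `butterfly_2coeff` are illustrative only. Suitable coefficients are 14-bit fixed-point cosines, for example round(2^14 cos(pi/4)) = 11585.

```c
#include <stdint.h>

/* Rounding shift as defined above; n must be >= 1. */
#define ROUND_POWER_OF_TWO(value, n) (((value) + (1 << ((n)-1))) >> (n))

/* Hypothetical C function mirroring fdct_round_shift. Assumes >> on a
   negative int64_t is an arithmetic shift (implementation-defined). */
static int64_t fdct_round_shift(int64_t x)
{
    return ROUND_POWER_OF_TWO(x, 14);
}

/* Single-coefficient butterfly: both outputs share coefficient c. */
static void butterfly_1coeff(int64_t a, int64_t b, int64_t c,
                             int64_t *sum_out, int64_t *diff_out)
{
    *sum_out  = fdct_round_shift((a + b) * c);
    *diff_out = fdct_round_shift((a - b) * c);
}

/* Double-coefficient butterfly: a and b get separate coefficients. */
static void butterfly_2coeff(int64_t a, int64_t b, int64_t c1, int64_t c2,
                             int64_t *sum_out, int64_t *diff_out)
{
    *sum_out  = fdct_round_shift(a * c1 + b * c2);
    *diff_out = fdct_round_shift(a * c1 - b * c2);
}
```

For example, `butterfly_1coeff(100, 60, 11585, &s, &d)` gives `s == 113` and `d == 28`. Ties round toward positive infinity: `fdct_round_shift(8192)` is 1, but `fdct_round_shift(-8192)` is 0.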

These instructions are at the core of **ALL** FDCT calculations in many major video codecs, including, but not limited to, VP8/VP9 and AV1.
Arm Neon provides instructions that help with these operations, exposed as the intrinsics `vqrdmulhq_s16`/`vqrdmulhq_s32`, although they are limited in precision.

The suggestion is to have a single instruction that calculates both values, `((a + b) * c) >> N` and `((a - b) * c) >> N`.
The instruction will run in accumulate mode. The 2-coefficient version can then be calculated by calling the same instruction again, with a and b in a different order and a different constant c.

## [DRAFT] Integer Butterfly Multiply Add/Sub FFT/DCT

A-Form

* maddsubrs RT,RA,SH,RB

Pseudo-code:

```
n <- SH
sum <- (RT) + (RA)
diff <- (RT) - (RA)
prod1 <- MULS(RB, sum)[XLEN:(XLEN*2)-1]
prod2 <- MULS(RB, diff)[XLEN:(XLEN*2)-1]
if n = 0 then
    RT <- prod1
    RS <- prod2
else
    round <- [0]*XLEN
    round[XLEN-n] <- 1
    prod1 <- prod1 + round
    prod2 <- prod2 + round
    m <- MASK(n, (XLEN-1))
    res1 <- ROTL64(prod1, XLEN-n) & m
    res2 <- ROTL64(prod2, XLEN-n) & m
    signbit1 <- prod1[0]
    signbit2 <- prod2[0]
    smask1 <- ([signbit1]*XLEN) & ¬m
    smask2 <- ([signbit2]*XLEN) & ¬m
    RT <- res1 | smask1
    RS <- res2 | smask2
```
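
The intended per-element result can be modelled in C for XLEN=64. That result is `ROUND_POWER_OF_TWO` applied to the low XLEN bits of each product, computed with the same rotate-and-mask approach as the pseudocode:

* Rotating left by XLEN-n and keeping MSB0 bits n..XLEN-1 is a logical shift right by n.
* Filling the top n bits with the sign bit makes that shift arithmetic.

This is a sketch only, not the openpower-isa simulator. `rotl64`, `mask_msb0` and `maddsubrs_model` are hypothetical names. SH is assumed to be 0..31 (a 5-bit field), and n = 0 is assumed to mean no shift and no rounding.

```c
#include <stdint.h>

#define XLEN 64

/* Rotate a 64-bit value left by r bits (r taken modulo 64). */
static uint64_t rotl64(uint64_t x, unsigned r)
{
    r &= 63u;
    return r ? (x << r) | (x >> (64u - r)) : x;
}

/* MSB0 mask with bits mstart..mstop set (bit 0 = most significant);
   only the non-wrapping case mstart <= mstop is handled. */
static uint64_t mask_msb0(unsigned mstart, unsigned mstop)
{
    uint64_t from_start = ~0ULL >> mstart;        /* MSB0 bits mstart..63 */
    uint64_t to_stop    = ~0ULL << (63u - mstop); /* MSB0 bits 0..mstop   */
    return from_start & to_stop;
}

/* One 64-bit element: returns the new RT and stores the RS result.
   Inputs are raw register bit patterns (two's complement). */
static uint64_t maddsubrs_model(uint64_t rt, uint64_t ra, uint64_t rb,
                                unsigned sh, uint64_t *rs)
{
    /* Unsigned arithmetic wraps modulo 2^64, and the low half of a
       two's-complement product is the same for signed and unsigned
       operands, so this equals MULS(...)[XLEN:(XLEN*2)-1]. */
    uint64_t prod[2] = { rb * (rt + ra), rb * (rt - ra) };
    uint64_t out[2];

    for (int i = 0; i < 2; i++) {
        if (sh == 0) {                 /* assumed: no shift, no rounding */
            out[i] = prod[i];
            continue;
        }
        uint64_t p     = prod[i] + (1ULL << (sh - 1u));   /* + 2^(n-1) */
        uint64_t m     = mask_msb0(sh, XLEN - 1);
        uint64_t res   = rotl64(p, XLEN - sh) & m;        /* p >> n, logical */
        uint64_t smask = ((p >> 63) ? ~0ULL : 0ULL) & ~m; /* sign fill */
        out[i] = res | smask;
    }
    *rs = out[1];
    return out[0];
}
```

With RT=100, RA=60, RB=11585 and SH=14, the model returns 113 for RT and 28 for RS. These match `ROUND_POWER_OF_TWO((100 +/- 60) * 11585, 14)`.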

Special Registers Altered:

```
None
```

This uses a new variant of A-Form, which has been added to `fields.txt`:

```
# # 1.6.17 A-FORM
|0     |6     |11    |16    |21    |26    |31 |
| PO   |  RT  |  RA  |  RB  |  SH  |  XO  |Rc |
```
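
The MSB0 bit positions in the table translate to LSB0 shift amounts as `32 - start - width`. Below is a minimal C sketch of packing the fields, using hypothetical helper names. The primary opcode and XO are left as parameters, because the allocation is unofficial until approved by the OPF ISA WG. The primary opcode is presumably 22, going by the `minor_22.csv` file name.

```c
#include <stdint.h>

/* Place a field occupying MSB0 bits [start, start+width) of a 32-bit
   instruction word (bit 0 is the most significant); width < 32. */
static uint32_t put_field(uint32_t value, unsigned start, unsigned width)
{
    uint32_t mask = (1u << width) - 1u;
    return (value & mask) << (32u - start - width);
}

/* Hypothetical encoder for the A-Form variant with an SH field. */
static uint32_t encode_aform_sh(unsigned po, unsigned rt, unsigned ra,
                                unsigned rb, unsigned sh, unsigned xo,
                                unsigned rc)
{
    return put_field(po, 0, 6)  | put_field(rt, 6, 5)  |
           put_field(ra, 11, 5) | put_field(rb, 16, 5) |
           put_field(sh, 21, 5) | put_field(xo, 26, 5) |
           put_field(rc, 31, 1);
}
```

For example, `encode_aform_sh(22, 1, 1, 1, 1, 1, 1)` yields `0x58210843`.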

The instruction has been added to `minor_22.csv`:

```
------01000,ALU,OP_MADDSUBRS,RT,CONST_SH,RB,RT,NONE,CR0,0,0,ZERO,0,NONE,0,0,0,0,1,0,RC_ONLY,0,0,maddsubrs,A,,1,unofficial until submitted and approved/renumbered by the opf isa wg
```

# Twin Butterfly Floating-Point DCT and FFT Instruction(s)

## [DRAFT] Floating Twin Multiply-Add DCT [Single]

X-Form

```
|0     |6     |11    |16    |21          |31 |
| PO   | FRT  | FRA  | FRB  |     XO     | Rc|
```

* fdmadds FRT,FRA,FRB (Rc=0)
* fdmadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRS <- FPADD32(FRT, FRB)
sub <- FPSUB32(FRT, FRB)
FRT <- FPMUL32(FRA, sub)
```
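
As the pseudocode reads, the sum goes to FRS unscaled, while the difference is scaled by FRA and written back to FRT. Below is a minimal C sketch of one element, using the hypothetical name `fdmadds_model`. C `float` arithmetic stands in for FPADD32/FPSUB32/FPMUL32, and status flags, exceptions and Rc are not modelled.

```c
/* One element of fdmadds, modelled with C float arithmetic.
   Assumes FLT_EVAL_METHOD == 0 (each float operation rounds to single
   precision); FPSCR flags, exceptions and Rc are not modelled. */
static void fdmadds_model(float *frt, float fra, float frb, float *frs)
{
    float t   = *frt;      /* original FRT, read before it is overwritten */
    float sub = t - frb;   /* sub <- FPSUB32(FRT, FRB) */
    *frs = t + frb;        /* FRS <- FPADD32(FRT, FRB) */
    *frt = fra * sub;      /* FRT <- FPMUL32(FRA, sub) */
}
```

With FRT=3.0, FRA=0.5 and FRB=1.0, it leaves FRS=4.0 and FRT=1.0.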

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
CR1 (if Rc=1)
```

## [DRAFT] Floating Multiply-Add FFT [Single]

X-Form

```
|0     |6     |11    |16    |21          |31 |
| PO   | FRT  | FRA  | FRB  |     XO     | Rc|
```

* ffmadds FRT,FRA,FRB (Rc=0)
* ffmadds. FRT,FRA,FRB (Rc=1)

Pseudo-code:

```
FRS <- FPMULADD32(FRT, FRA, FRB, -1, 1)
FRT <- FPMULADD32(FRT, FRA, FRB, 1, 1)
```
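
Below is a minimal C sketch of one ffmadds element. It assumes that `FPMULADD32(x, y, z, s1, s2)` computes `s1*(x*y) + s2*z` with a single rounding, with s1 negating the product and s2 the addend. That is an assumption based on how the fmadds family appears to use it. The helpers `fpmuladd32_model` and `ffmadds_model` are hypothetical. The fused operation is approximated in `double`: the float product is exact there, but the add and the final narrowing both round. In rare cases that double rounding can differ from a true fused single-precision result.

```c
/* Assumed reading of FPMULADD32(x, y, z, psign, asign):
   psign * (x * y) + asign * z. The float product is exact in double
   (24 + 24 bits < 53), so only the add and the narrowing round. */
static float fpmuladd32_model(float x, float y, float z, int psign, int asign)
{
    double prod = (double)x * (double)y;
    return (float)(psign * prod + asign * (double)z);
}

/* One element of ffmadds; flags and Rc are not modelled. */
static void ffmadds_model(float *frt, float fra, float frb, float *frs)
{
    float t = *frt;                               /* original FRT */
    *frs = fpmuladd32_model(t, fra, frb, -1, 1);  /* FRB - FRT*FRA */
    *frt = fpmuladd32_model(t, fra, frb, 1, 1);   /* FRB + FRT*FRA */
}
```

With FRT=2.0, FRA=3.0 and FRB=10.0, it leaves FRS=4.0 and FRT=16.0. These are FRB -/+ FRT*FRA, the shape of a radix-2 butterfly.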

Special Registers Altered:

```
FPRF FR FI
FX OX UX XX
VXSNAN VXISI VXIMZ
CR1 (if Rc=1)
```