(no commit message)
[libreriscv.git] / simple_v_extension / vector_ops.mdwn
1 # Vector Operations Extension to SV
2
3 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
4
5 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
6
7 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
8
9 Examples which can require SUBVL include cross product and may in future involve complex numbers.
10
11 ## CORDIC
12
13 CORDIC is an extremely general-purpose algorithm useful for a huge number
14 of diverse purposes. In its full form it does however require quite a
15 few parameters, one of which is a vector, making it awkward to include in
16 a standard "scalar" ISA. Additionally the coordinates can be set to circular,
17 linear or hyperbolic, producing three different modes, and the algorithm
18 may also be run in either "vector" mode or "rotation" mode. See [[discussion]]
19
20 CORDIC can also be used for performing DCT. See
21 <https://arxiv.org/abs/1606.02424>
22
23 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
24
25 int i = 0;
26 int iterations = 0; // Number of times to run the algorithm
27 float arctanTable[iterations]; // in Radians
28 float K = 0.6073; // K
29 float v_x,v_y; // Vector v; x and y components
30
31 for(i=0; i < iterations; i++) {
32 arctanTable[i] = atan(pow(2,-i));
33 }
34
35 float vnew_x; // To store the new value of x;
36 for(i = 0; i < iterations; i++) {
37 // If beta is negative, we need to do a counter-clockwise rotation:
38 if( beta < 0) {
39 vnew_x = v_x + (v_y*pow(2,-i));
40 v_y -= (v_x*pow(2,-i));
41 beta += arctanTable[i];
42 }
43 // If beta is positive, we need to do a clockwise rotation:
44 else {
45 vnew_x = v_x - (v_y*pow(2,-i));
46 v_y += (v_x*pow(2,-i));
47 beta -= arctanTable[i];
48 }
49 v_x = vnew_x;
50 }
51 v_x *= K;
52 v_y *= K;
53
54 Links:
55
56 * <http://www.myhdl.org/docs/examples/sinecomp/>
57
58 ## Vector cross product
59
60 Result is the cross product of x and y, i.e., the resulting components are, in order:
61
62 x[1] * y[2] - y[1] * x[2]
63 x[2] * y[0] - y[2] * x[0]
64 x[0] * y[1] - y[0] * x[1]
65
66 All the operands must be vectors of 3 components of a floating-point type.
67
68 Pseudocode:
69
70 vec3 a, b; // elements in order a.x, a.y, a.z
71 // compute a cross b:
72 vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
73 vec3 t2 = b.zxy;
74 vec3 t3 = a.zxy;
75 vec3 t4 = b.yzx;
76 vec3 p = t3 * t4;
77 vec3 cross = t1 * t2 - p;
78
79 ## Vector dot product
80
81 Computes the dot product of two vectors. Internal accuracy must be
82 greater than the input vectors and the result.
83
84 Pseudocode in python:
85
86 from operator import mul
87 sum(map(mul, A, B))
88
89 Pseudocode in c:
90
91 double dot_product(float v[], float u[], int n)
92 {
93 double result = 0.0;
94 for (int i = 0; i < n; i++)
95 result += v[i] * u[i];
96 return result;
97 }
98
99 ## Vector length
100
101 The scalar length of a vector:
102
103 sqrt(x[0]^2 + x[1]^2 + ...).
104
105 ## Vector distance
106
107 The scalar distance between two vectors. Subtracts one vector from the
108 other and returns length:
109
110 length(v0 - v1)
111
112 ## Vector LERP
113
114 Known as **fmix** in GLSL.
115
116 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
117
118 Pseudocode:
119
120 // Imprecise method, which does not guarantee v = v1 when t = 1,
121 // due to floating-point arithmetic error.
122 // This form may be used when the hardware has a native fused
123 // multiply-add instruction.
124 float lerp(float v0, float v1, float t) {
125 return v0 + t * (v1 - v0);
126 }
127
128 // Precise method, which guarantees v = v1 when t = 1.
129 float lerp(float v0, float v1, float t) {
130 return (1 - t) * v0 + t * v1;
131 }
132
133 ## Vector SLERP
134
135 Not recommended as it is not commonly used and has several trigonometric
136 functions.
137
138 <https://en.m.wikipedia.org/wiki/Slerp>
139
140 Pseudocode:
141
142 Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
143 // Only unit quaternions are valid rotations.
144 // Normalize to avoid undefined behavior.
145 v0.normalize();
146 v1.normalize();
147
148 // Compute the cosine of the angle between the two vectors.
149 double dot = dot_product(v0, v1);
150
151 // If the dot product is negative, slerp won't take
152 // the shorter path. Note that v1 and -v1 are equivalent when
153 // the negation is applied to all four components. Fix by
154 // reversing one quaternion.
155 if (dot < 0.0f) {
156 v1 = -v1;
157 dot = -dot;
158 }
159
160 const double DOT_THRESHOLD = 0.9995;
161 if (dot > DOT_THRESHOLD) {
162 // If the inputs are too close for comfort, linearly interpolate
163 // and normalize the result.
164
165 Quaternion result = v0 + t*(v1 - v0);
166 result.normalize();
167 return result;
168 }
169
170 // Since dot is in range [0, DOT_THRESHOLD], acos is safe
171 double theta_0 = acos(dot); // theta_0 = angle between input vectors
172 double theta = theta_0*t; // theta = angle between v0 and result
173 double sin_theta = sin(theta); // compute this value only once
174 double sin_theta_0 = sin(theta_0); // compute this value only once
175
176 double s0 = cos(theta) - dot * sin_theta / sin_theta_0; // == sin(theta_0 - theta) / sin(theta_0)
177 double s1 = sin_theta / sin_theta_0;
178
179 return (s0 * v0) + (s1 * v1);
180 }
181
182 However this algorithm does not involve transcendentals except in
183 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
184
185 function v = cordic(beta,n)
186 % This function computes v = [cos(beta), sin(beta)] (beta in radians)
187 % using n iterations. Increasing n will increase the precision.
188
189 if beta < -pi/2 || beta > pi/2
190 if beta < 0
191 v = cordic(beta + pi, n);
192 else
193 v = cordic(beta - pi, n);
194 end
195 v = -v; % flip the sign for second or third quadrant
196 return
197 end
198
199 % Initialization of tables of constants used by CORDIC
200 % need a table of arctangents of negative powers of two, in radians:
201 % angles = atan(2.^-(0:27));
202 angles = [ ...
203 0.78539816339745 0.46364760900081
204 0.24497866312686 0.12435499454676 ...
205 0.06241880999596 0.03123983343027
206 0.01562372862048 0.00781234106010 ...
207 0.00390623013197 0.00195312251648
208 0.00097656218956 0.00048828121119 ...
209 0.00024414062015 0.00012207031189
210 0.00006103515617 0.00003051757812 ...
211 0.00001525878906 0.00000762939453
212 0.00000381469727 0.00000190734863 ...
213 0.00000095367432 0.00000047683716
214 0.00000023841858 0.00000011920929 ...
215 0.00000005960464 0.00000002980232
216 0.00000001490116 0.00000000745058 ];
217 % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
218 % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
219 Kvalues = [ ...
220 0.70710678118655 0.63245553203368
221 0.61357199107790 0.60883391251775 ...
222 0.60764825625617 0.60735177014130
223 0.60727764409353 0.60725911229889 ...
224 0.60725447933256 0.60725332108988
225 0.60725303152913 0.60725295913894 ...
226 0.60725294104140 0.60725293651701
227 0.60725293538591 0.60725293510314 ...
228 0.60725293503245 0.60725293501477
229 0.60725293501035 0.60725293500925 ...
230 0.60725293500897 0.60725293500890
231 0.60725293500889 0.60725293500888 ];
232 Kn = Kvalues(min(n, length(Kvalues)));
233
234 % Initialize loop variables:
235 v = [1;0]; % start with 2-vector cosine and sine of zero
236 poweroftwo = 1;
237 angle = angles(1);
238
239 % Iterations
240 for j = 0:n-1;
241 if beta < 0
242 sigma = -1;
243 else
244 sigma = 1;
245 end
246 factor = sigma * poweroftwo;
247 % Note the matrix multiplication can be done using scaling by
248 % powers of two and addition subtraction
249 R = [1, -factor; factor, 1];
250 v = R * v; % 2-by-2 matrix multiply
251 beta = beta - sigma * angle; % update the remaining angle
252 poweroftwo = poweroftwo / 2;
253 % update the angle from table, or eventually by just dividing by two
254 if j+2 > length(angles)
255 angle = angle / 2;
256 else
257 angle = angles(j+2);
258 end
259 end
260
261 % Adjust length of output vector to be [cos(beta), sin(beta)]:
262 v = v * Kn;
263 return
264
265 endfunction
266
267 2x2 matrix multiply can be done with shifts and adds:
268
269 x = v[0] - sigma * (v[1] * 2^(-j));
270 y = sigma * (v[0] * 2^(-j)) + v[1];
271 v = [x; y];
272
273 The technique is outlined in a paper as being applicable to 3D:
274 <https://www.atlantis-press.com/proceedings/jcis2006/232>
275
276 # Expensive 3-operand OP32 operations
277
278 3-operand operations are extremely expensive in terms of OP32 encoding space. A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd