(no commit message)
[libreriscv.git] / simple_v_extension / vector_ops.mdwn
1 [[!tag standards]]
2
3 # Vector Operations Extension to SV
4
5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
6
7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
8
9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
10
11 Examples which can require SUBVL include cross product and may in future involve complex numbers.
12
13 ## CORDIC
14
15 SUBVL=2, vd, vs; SUBVL=1, beta.
16
17 * CORDIC.lin.rot vd, vs, beta
18 * CORDIC.cir.rot vd, vs, beta
19 * CORDIC.hyp.rot vd, vs, beta
20 * CORDIC.lin.vec vd, vs, beta
21 * CORDIC.cir.vec vd, vs, beta
22 * CORDIC.hyp.vec vd, vs, beta
23
24 CORDIC is an extremely general-purpose algorithm useful for a huge number
25 of diverse purposes. In its full form it does however require quite a
26 few parameters, one of which is a vector, making it awkward to include in
27 a standard "scalar" ISA. Additionally the coordinates can be set to circular,
28 linear or hyperbolic, producing three different modes, and the algorithm
29 may also be run in either "vector" mode or "rotation" mode. See [[discussion]]
30
31 CORDIC can also be used for performing DCT. See
32 <https://arxiv.org/abs/1606.02424>
33
34 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
35
36 int i = 0;
37 int iterations = 0; // Number of times to run the algorithm
38 float arctanTable[iterations]; // in Radians
39 float K = 0.6073; // K
40 float v_x,v_y; // Vector v; x and y components
41
42 for(i=0; i < iterations; i++) {
43 arctanTable[i] = atan(pow(2,-i));
44 }
45
46 float vnew_x; // To store the new value of x;
47 for(i = 0; i < iterations; i++) {
48 // If beta is negative, we need to do a counter-clockwise rotation:
49 if( beta < 0) {
50 vnew_x = v_x + (v_y*pow(2,-i));
51 v_y -= (v_x*pow(2,-i));
52 beta += arctanTable[i];
53 }
54 // If beta is positive, we need to do a clockwise rotation:
55 else {
56 vnew_x = v_x - (v_y*pow(2,-i));
57 v_y += (v_x*pow(2,-i));
58 beta -= arctanTable[i];
59 }
60 v_x = vnew_x;
61 }
62 v_x *= K;
63 v_y *= K;
64
65 Links:
66
67 * <http://www.myhdl.org/docs/examples/sinecomp/>
68
69 ## Vector cross product
70
71 SUBVL=2,3,4 all regs
72
73 * VCROSS vd, vs1, vs1
74
75 Result is the cross product of x and y, i.e., the resulting components are, in order:
76
77 x[1] * y[2] - y[1] * x[2]
78 x[2] * y[0] - y[2] * x[0]
79 x[0] * y[1] - y[0] * x[1]
80
81 All the operands must be vectors of 3 components of a floating-point type.
82
83 Pseudocode:
84
85 vec3 a, b; // elements in order a.x, a.y, a.z
86 // compute a cross b:
87 vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
88 vec3 t2 = b.zxy;
89 vec3 t3 = a.zxy;
90 vec3 t4 = b.yzx;
91 vec3 p = t3 * t4;
92 vec3 cross = t1 * t2 - p;
93
94 ## Vector dot product
95
96 * SUBVL=1, rd SUBVL=2,3,4 vs1,vs2
97 * rd=scalar, SUBVL=default, vs1, vs2=vec
98
99 * VDOT rd, vs1, vs2
100
101 Computes the dot product of two vectors. Internal accuracy must be
102 greater than the input vectors and the result.
103
104 Pseudocode in python:
105
106 from operator import mul
107 sum(map(mul, A, B))
108
109 Pseudocode in c:
110
111 double dot_product(float v[], float u[], int n)
112 {
113 double result = 0.0;
114 for (int i = 0; i < n; i++)
115 result += v[i] * u[i];
116 return result;
117 }
118
119 ## Vector length
120
121 * rd=scalar, vs1=vec (SUBVL=default)
122 * rd=scalar, vs1=vec (SUBVL=2,3,4) only 1
123 * rd=vec, SUBVL=1; vs1=vec, SUBVL=any
124
125 * VLEN rd, vs1
126
127 The scalar length of a vector:
128
129 sqrt(x[0]^2 + x[1]^2 + ...).
130
131 ## Vector distance
132
133 * VDIST rd, vs1, vs2
134
135 The scalar distance between two vectors. Subtracts one vector from the
136 other and returns length:
137
138 length(v0 - v1)
139
140 ## Vector LERP
141
142 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
143
144 Known as **fmix** in GLSL.
145
146 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
147
148 Pseudocode:
149
150 // Imprecise method, which does not guarantee v = v1 when t = 1,
151 // due to floating-point arithmetic error.
152 // This form may be used when the hardware has a native fused
153 // multiply-add instruction.
154 float lerp(float v0, float v1, float t) {
155 return v0 + t * (v1 - v0);
156 }
157
158 // Precise method, which guarantees v = v1 when t = 1.
159 float lerp(float v0, float v1, float t) {
160 return (1 - t) * v0 + t * v1;
161 }
162
163 ## Vector SLERP
164
165 * VSLERP vd, vs1, vs2, rs3
166
167 Not recommended as it is not commonly used and has several trigonometric
168 functions. Also a costly 4 arg operation.
169
170 <https://en.m.wikipedia.org/wiki/Slerp>
171
172 Pseudocode:
173
174 Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
175 // Only unit quaternions are valid rotations.
176 // Normalize to avoid undefined behavior.
177 v0.normalize();
178 v1.normalize();
179
180 // Compute the cosine of the angle between the two vectors.
181 double dot = dot_product(v0, v1);
182
183 // If the dot product is negative, slerp won't take
184 // the shorter path. Note that v1 and -v1 are equivalent when
185 // the negation is applied to all four components. Fix by
186 // reversing one quaternion.
187 if (dot < 0.0f) {
188 v1 = -v1;
189 dot = -dot;
190 }
191
192 const double DOT_THRESHOLD = 0.9995;
193 if (dot > DOT_THRESHOLD) {
194 // If the inputs are too close for comfort, linearly interpolate
195 // and normalize the result.
196
197 Quaternion result = v0 + t*(v1 - v0);
198 result.normalize();
199 return result;
200 }
201
202 // Since dot is in range [0, DOT_THRESHOLD], acos is safe
203 double theta_0 = acos(dot); // theta_0 = angle between input vectors
204 double theta = theta_0*t; // theta = angle between v0 and result
205 double sin_theta = sin(theta); // compute this value only once
206 double sin_theta_0 = sin(theta_0); // compute this value only once
207
208 double s0 = cos(theta) - dot * sin_theta / sin_theta_0; // == sin(theta_0 - theta) / sin(theta_0)
209 double s1 = sin_theta / sin_theta_0;
210
211 return (s0 * v0) + (s1 * v1);
212 }
213
214 However this algorithm does not involve transcendentals except in
215 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
216
217 function v = cordic(beta,n)
218 % This function computes v = [cos(beta), sin(beta)] (beta in radians)
219 % using n iterations. Increasing n will increase the precision.
220
221 if beta < -pi/2 || beta > pi/2
222 if beta < 0
223 v = cordic(beta + pi, n);
224 else
225 v = cordic(beta - pi, n);
226 end
227 v = -v; % flip the sign for second or third quadrant
228 return
229 end
230
231 % Initialization of tables of constants used by CORDIC
232 % need a table of arctangents of negative powers of two, in radians:
233 % angles = atan(2.^-(0:27));
234 angles = [ ...
235 0.78539816339745 0.46364760900081
236 0.24497866312686 0.12435499454676 ...
237 0.06241880999596 0.03123983343027
238 0.01562372862048 0.00781234106010 ...
239 0.00390623013197 0.00195312251648
240 0.00097656218956 0.00048828121119 ...
241 0.00024414062015 0.00012207031189
242 0.00006103515617 0.00003051757812 ...
243 0.00001525878906 0.00000762939453
244 0.00000381469727 0.00000190734863 ...
245 0.00000095367432 0.00000047683716
246 0.00000023841858 0.00000011920929 ...
247 0.00000005960464 0.00000002980232
248 0.00000001490116 0.00000000745058 ];
249 % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
250 % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
251 Kvalues = [ ...
252 0.70710678118655 0.63245553203368
253 0.61357199107790 0.60883391251775 ...
254 0.60764825625617 0.60735177014130
255 0.60727764409353 0.60725911229889 ...
256 0.60725447933256 0.60725332108988
257 0.60725303152913 0.60725295913894 ...
258 0.60725294104140 0.60725293651701
259 0.60725293538591 0.60725293510314 ...
260 0.60725293503245 0.60725293501477
261 0.60725293501035 0.60725293500925 ...
262 0.60725293500897 0.60725293500890
263 0.60725293500889 0.60725293500888 ];
264 Kn = Kvalues(min(n, length(Kvalues)));
265
266 % Initialize loop variables:
267 v = [1;0]; % start with 2-vector cosine and sine of zero
268 poweroftwo = 1;
269 angle = angles(1);
270
271 % Iterations
272 for j = 0:n-1;
273 if beta < 0
274 sigma = -1;
275 else
276 sigma = 1;
277 end
278 factor = sigma * poweroftwo;
279 % Note the matrix multiplication can be done using scaling by
280 % powers of two and addition subtraction
281 R = [1, -factor; factor, 1];
282 v = R * v; % 2-by-2 matrix multiply
283 beta = beta - sigma * angle; % update the remaining angle
284 poweroftwo = poweroftwo / 2;
285 % update the angle from table, or eventually by just dividing by two
286 if j+2 > length(angles)
287 angle = angle / 2;
288 else
289 angle = angles(j+2);
290 end
291 end
292
293 % Adjust length of output vector to be [cos(beta), sin(beta)]:
294 v = v * Kn;
295 return
296
297 endfunction
298
299 2x2 matrix multiply can be done with shifts and adds:
300
301 x = v[0] - sigma * (v[1] * 2^(-j));
302 y = sigma * (v[0] * 2^(-j)) + v[1];
303 v = [x; y];
304
305 The technique is outlined in a paper as being applicable to 3D:
306 <https://www.atlantis-press.com/proceedings/jcis2006/232>
307
308 # Expensive 3-operand OP32 operations
309
310 3-operand operations are extremely expensive in terms of OP32 encoding space. A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
311
312 Another is to overwrite one of the src registers.
313
314 # Opcode Table
315
316 TODO