3 # Vector Operations Extension to SV
5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
11 However given that some of the parameters are vectors (with and without SUBVL set), and some are scalars (where SUBVL will not apply), some clear rules need to be defined as to how the operations work.
13 Examples which can require SUBVL include cross product and may in future involve complex numbers.
17 * SUBVL=2, vd, vs; SUBVL ignored on beta.
19 * CORDIC.lin.rot vd, vs, beta
20 * CORDIC.cir.rot vd, vs, beta
21 * CORDIC.hyp.rot vd, vs, beta
22 * CORDIC.lin.vec vd, vs, beta
23 * CORDIC.cir.vec vd, vs, beta
24 * CORDIC.hyp.vec vd, vs, beta
26 CORDIC is an extremely general-purpose algorithm useful for a huge number
27 of diverse purposes. In its full form it does however require quite a
28 few parameters, one of which is a vector, making it awkward to include in
29 a standard "scalar" ISA. Additionally the coordinates can be set to circular,
30 linear or hyperbolic, producing three different modes, and the algorithm
31 may also be run in either "vector" mode or "rotation" mode. See [[discussion]]
33 CORDIC can also be used for performing DCT. See
34 <https://arxiv.org/abs/1606.02424>
36 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
39 int iterations = 0; // Number of times to run the algorithm
40 float arctanTable[iterations]; // in Radians
41 float K = 0.6073; // K
42 float v_x,v_y; // Vector v; x and y components
44 for(i=0; i < iterations; i++) {
45 arctanTable[i] = atan(pow(2,-i));
48 float vnew_x; // To store the new value of x;
49 for(i = 0; i < iterations; i++) {
50 // If beta is negative, we need to do a counter-clockwise rotation:
52 vnew_x = v_x + (v_y*pow(2,-i));
53 v_y -= (v_x*pow(2,-i));
54 beta += arctanTable[i];
56 // If beta is positive, we need to do a clockwise rotation:
58 vnew_x = v_x - (v_y*pow(2,-i));
59 v_y += (v_x*pow(2,-i));
60 beta -= arctanTable[i];
69 * <http://www.myhdl.org/docs/examples/sinecomp/>
71 ## Vector cross product
77 Result is the cross product of x and y, i.e., the resulting components are, in order:
79 x[1] * y[2] - y[1] * x[2]
80 x[2] * y[0] - y[2] * x[0]
81 x[0] * y[1] - y[0] * x[1]
83 All the operands must be vectors of 3 components of a floating-point type.
87 vec3 a, b; // elements in order a.x, a.y, a.z
89 vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
94 vec3 cross = t1 * t2 - p;
98 * SUBVL ignored on rd. SUBVL=2,3,4 vs1,vs2
99 * rd=scalar, SUBVL=1, vs1, vs2=vec
103 Computes the dot product of two vectors. Internal accuracy must be
104 greater than the input vectors and the result.
106 Pseudocode in python:
108 from operator import mul
113 double dot_product(float v[], float u[], int n)
116 for (int i = 0; i < n; i++)
117 result += v[i] * u[i];
123 * rd=scalar, vs1=vec (SUBVL=1)
124 * rd=scalar, vs1=vec (SUBVL=2,3,4) only 1 (predication rules apply)
125 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=2,3,4
126 * rd=vec, SUBVL ignored; vs1=vec, SUBVL=1: reserved encoding.
130 The scalar length of a vector:
132 sqrt(x[0]^2 + x[1]^2 + ...).
138 The scalar distance between two vectors. Subtracts one vector from the
139 other and returns length:
145 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
147 Known as **fmix** in GLSL.
149 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
153 // Imprecise method, which does not guarantee v = v1 when t = 1,
154 // due to floating-point arithmetic error.
155 // This form may be used when the hardware has a native fused
156 // multiply-add instruction.
157 float lerp(float v0, float v1, float t) {
158 return v0 + t * (v1 - v0);
161 // Precise method, which guarantees v = v1 when t = 1.
162 float lerp(float v0, float v1, float t) {
163 return (1 - t) * v0 + t * v1;
168 * VSLERP vd, vs1, vs2, rs3
170 Not recommended as it is not commonly used and has several trigonometric
171 functions, although CORDIC in vector rotate circular mode is designed for this purpose. Also a costly 4 arg operation.
173 <https://en.m.wikipedia.org/wiki/Slerp>
177 Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
178 // Only unit quaternions are valid rotations.
179 // Normalize to avoid undefined behavior.
183 // Compute the cosine of the angle between the two vectors.
184 double dot = dot_product(v0, v1);
186 // If the dot product is negative, slerp won't take
187 // the shorter path. Note that v1 and -v1 are equivalent when
188 // the negation is applied to all four components. Fix by
189 // reversing one quaternion.
195 const double DOT_THRESHOLD = 0.9995;
196 if (dot > DOT_THRESHOLD) {
197 // If the inputs are too close for comfort, linearly interpolate
198 // and normalize the result.
200 Quaternion result = v0 + t*(v1 - v0);
205 // Since dot is in range [0, DOT_THRESHOLD], acos is safe
206 double theta_0 = acos(dot); // theta_0 = angle between input vectors
207 double theta = theta_0*t; // theta = angle between v0 and result
208 double sin_theta = sin(theta); // compute this value only once
209 double sin_theta_0 = sin(theta_0); // compute this value only once
211 double s0 = cos(theta) - dot * sin_theta / sin_theta_0; // == sin(theta_0 - theta) / sin(theta_0)
212 double s1 = sin_theta / sin_theta_0;
214 return (s0 * v0) + (s1 * v1);
217 However this algorithm does not involve transcendentals except in
218 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
220 function v = cordic(beta,n)
221 % This function computes v = [cos(beta), sin(beta)] (beta in radians)
222 % using n iterations. Increasing n will increase the precision.
224 if beta < -pi/2 || beta > pi/2
226 v = cordic(beta + pi, n);
228 v = cordic(beta - pi, n);
230 v = -v; % flip the sign for second or third quadrant
234 % Initialization of tables of constants used by CORDIC
235 % need a table of arctangents of negative powers of two, in radians:
236 % angles = atan(2.^-(0:27));
238 0.78539816339745 0.46364760900081
239 0.24497866312686 0.12435499454676 ...
240 0.06241880999596 0.03123983343027
241 0.01562372862048 0.00781234106010 ...
242 0.00390623013197 0.00195312251648
243 0.00097656218956 0.00048828121119 ...
244 0.00024414062015 0.00012207031189
245 0.00006103515617 0.00003051757812 ...
246 0.00001525878906 0.00000762939453
247 0.00000381469727 0.00000190734863 ...
248 0.00000095367432 0.00000047683716
249 0.00000023841858 0.00000011920929 ...
250 0.00000005960464 0.00000002980232
251 0.00000001490116 0.00000000745058 ];
252 % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
253 % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
255 0.70710678118655 0.63245553203368
256 0.61357199107790 0.60883391251775 ...
257 0.60764825625617 0.60735177014130
258 0.60727764409353 0.60725911229889 ...
259 0.60725447933256 0.60725332108988
260 0.60725303152913 0.60725295913894 ...
261 0.60725294104140 0.60725293651701
262 0.60725293538591 0.60725293510314 ...
263 0.60725293503245 0.60725293501477
264 0.60725293501035 0.60725293500925 ...
265 0.60725293500897 0.60725293500890
266 0.60725293500889 0.60725293500888 ];
267 Kn = Kvalues(min(n, length(Kvalues)));
269 % Initialize loop variables:
270 v = [1;0]; % start with 2-vector cosine and sine of zero
281 factor = sigma * poweroftwo;
282 % Note the matrix multiplication can be done using scaling by
283 % powers of two and addition subtraction
284 R = [1, -factor; factor, 1];
285 v = R * v; % 2-by-2 matrix multiply
286 beta = beta - sigma * angle; % update the remaining angle
287 poweroftwo = poweroftwo / 2;
288 % update the angle from table, or eventually by just dividing by two
289 if j+2 > length(angles)
296 % Adjust length of output vector to be [cos(beta), sin(beta)]:
302 2x2 matrix multiply can be done with shifts and adds:
304 x = v[0] - sigma * (v[1] * 2^(-j));
305 y = sigma * (v[0] * 2^(-j)) + v[1];
308 The technique is outlined in a paper as being applicable to 3D:
309 <https://www.atlantis-press.com/proceedings/jcis2006/232>
311 # Expensive 3-operand OP32 operations
313 3-operand operations are extremely expensive in terms of OP32 encoding space. A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
315 Another is to overwrite one of the src registers.
323 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002736.html>
324 * <http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-September/002733.html>
325 * <http://bugs.libre-riscv.org/show_bug.cgi?id=142>