simple_v_extension/vector_ops.mdwn

   1 # Vector Operations Extension to SV
   2
   3 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
   4
   5 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
   6
   7 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
   8
   9 Examples which can require SUBVL include cross product and may in future involve complex numbers.
  10
  11 ## CORDIC
  12
  13 CORDIC is an extremely general-purpose algorithm useful for a huge number
  14 of diverse purposes.  In its full form it does however require quite a
  15 few parameters, one of which is a vector, making it awkward to include in
  16 a standard "scalar" ISA.  Additionally the coordinates can be set to circular,
  17 linear or hyperbolic, producing three different modes, and the algorithm
  18 may also be run in either "vector" mode or "rotation" mode.  See [[discussion]]
  19
  20 CORDIC can also be used for performing DCT.  See
  21 <https://arxiv.org/abs/1606.02424>
  22
  23 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
  24
  25      int i = 0;
  26      int iterations = 0; // Number of times to run the algorithm
  27      float arctanTable[iterations]; // in Radians
  28      float K = 0.6073; // K
  29      float v_x,v_y; // Vector v; x and y components
  30
  31      for(i=0; i < iterations; i++) {
  32         arctanTable[i] = atan(pow(2,-i));
  33      }
  34
  35      float vnew_x;   // To store the new value of x;
  36      for(i = 0; i < iterations; i++) {
  37          // If beta is negative, we need to do a counter-clockwise rotation:
  38          if( beta < 0) {
  39             vnew_x = v_x + (v_y*pow(2,-i));
  40             v_y -= (v_x*pow(2,-i));
  41             beta += arctanTable[i];
  42          }
  43          // If beta is positive, we need to do a clockwise rotation:
  44          else {
  45             vnew_x = v_x - (v_y*pow(2,-i));
  46             v_y += (v_x*pow(2,-i));
  47             beta -= arctanTable[i];
  48          }
  49          v_x = vnew_x;
  50      }
  51      v_x *= K;
  52      v_y *= K;
  53
  54 Links:
  55
  56 * <http://www.myhdl.org/docs/examples/sinecomp/>
  57
  58 ## Vector cross product
  59
  60 Result is the cross product of x and y, i.e., the resulting components are, in order:
  61
  62     x[1] * y[2] - y[1] * x[2]
  63     x[2] * y[0] - y[2] * x[0]
  64     x[0] * y[1] - y[0] * x[1]
  65
  66 All the operands must be vectors of 3 components of a floating-point type.
  67
  68 Pseudocode:
  69
  70     vec3 a, b; // elements in order a.x, a.y, a.z
  71     // compute a cross b:
  72     vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
  73     vec3 t2 = b.zxy;
  74     vec3 t3 = a.zxy;
  75     vec3 t4 = b.yzx;
  76     vec3 p = t3 * t4;
  77     vec3 cross = t1 * t2 - p;
  78
  79 ## Vector dot product
  80
  81 Computes the dot product of two vectors. Internal accuracy must be
  82 greater than the input vectors and the result.
  83
  84 Pseudocode in python:
  85
  86     from operator import mul
  87     sum(map(mul, A, B))
  88
  89 Pseudocode in c:
  90
  91     double dot_product(float v[], float u[], int n)
  92     {
  93         double result = 0.0;
  94         for (int i = 0; i < n; i++)
  95             result += v[i] * u[i];
  96         return result;
  97     }
  98
  99 ## Vector length
 100
 101 The scalar length of a vector:
 102
 103     sqrt(x[0]^2 + x[1]^2 + ...).
 104
 105 ## Vector distance
 106
 107 The scalar distance between two vectors. Subtracts one vector from the
 108 other and returns length:
 109
 110     length(v0 - v1)
 111
 112 ## Vector LERP
 113
 114 Known as **fmix** in GLSL.
 115
 116 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
 117
 118 Pseudocode:
 119
 120     // Imprecise method, which does not guarantee v = v1 when t = 1,
 121     // due to floating-point arithmetic error.
 122     // This form may be used when the hardware has a native fused
 123     // multiply-add instruction.
 124     float lerp(float v0, float v1, float t) {
 125       return v0 + t * (v1 - v0);
 126     }
 127
 128     // Precise method, which guarantees v = v1 when t = 1.
 129     float lerp(float v0, float v1, float t) {
 130       return (1 - t) * v0 + t * v1;
 131     }
 132
 133 ## Vector SLERP
 134
 135 Not recommended as it is not commonly used and has several trigonometric
 136 functions.
 137
 138 <https://en.m.wikipedia.org/wiki/Slerp>
 139
 140 Pseudocode:
 141
 142     Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
 143         // Only unit quaternions are valid rotations.
 144         // Normalize to avoid undefined behavior.
 145         v0.normalize();
 146         v1.normalize();
 147
 148         // Compute the cosine of the angle between the two vectors.
 149         double dot = dot_product(v0, v1);
 150
 151         // If the dot product is negative, slerp won't take
 152         // the shorter path. Note that v1 and -v1 are equivalent when
 153         // the negation is applied to all four components. Fix by
 154         // reversing one quaternion.
 155         if (dot < 0.0f) {
 156             v1 = -v1;
 157             dot = -dot;
 158         }
 159
 160         const double DOT_THRESHOLD = 0.9995;
 161         if (dot > DOT_THRESHOLD) {
 162             // If the inputs are too close for comfort, linearly interpolate
 163             // and normalize the result.
 164
 165             Quaternion result = v0 + t*(v1 - v0);
 166             result.normalize();
 167             return result;
 168         }
 169
 170         // Since dot is in range [0, DOT_THRESHOLD], acos is safe
 171         double theta_0 = acos(dot);        // theta_0 = angle between input vectors
 172         double theta = theta_0*t;          // theta = angle between v0 and result
 173         double sin_theta = sin(theta);     // compute this value only once
 174         double sin_theta_0 = sin(theta_0); // compute this value only once
 175
 176         double s0 = cos(theta) - dot * sin_theta / sin_theta_0;  // == sin(theta_0 - theta) / sin(theta_0)
 177         double s1 = sin_theta / sin_theta_0;
 178
 179         return (s0 * v0) + (s1 * v1);
 180     }
 181
 182 However this algorithm does not involve transcendentals except in
 183 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
 184
 185     function v = cordic(beta,n)
 186         % This function computes v = [cos(beta), sin(beta)] (beta in radians)
 187         % using n iterations. Increasing n will increase the precision.
 188
 189         if beta < -pi/2 || beta > pi/2
 190             if beta < 0
 191                 v = cordic(beta + pi, n);
 192             else
 193                 v = cordic(beta - pi, n);
 194             end
 195             v = -v; % flip the sign for second or third quadrant
 196             return
 197         end
 198
 199         % Initialization of tables of constants used by CORDIC
 200         % need a table of arctangents of negative powers of two, in radians:
 201         % angles = atan(2.^-(0:27));
 202         angles =  [  ...
 203             0.78539816339745   0.46364760900081
 204             0.24497866312686   0.12435499454676 ...
 205             0.06241880999596   0.03123983343027
 206             0.01562372862048   0.00781234106010 ...
 207             0.00390623013197   0.00195312251648
 208             0.00097656218956   0.00048828121119 ...
 209             0.00024414062015   0.00012207031189
 210             0.00006103515617   0.00003051757812 ...
 211             0.00001525878906   0.00000762939453
 212             0.00000381469727   0.00000190734863 ...
 213             0.00000095367432   0.00000047683716
 214             0.00000023841858   0.00000011920929 ...
 215             0.00000005960464   0.00000002980232
 216             0.00000001490116   0.00000000745058 ];
 217         % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
 218         % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
 219         Kvalues = [ ...
 220             0.70710678118655   0.63245553203368
 221             0.61357199107790   0.60883391251775 ...
 222             0.60764825625617   0.60735177014130
 223             0.60727764409353   0.60725911229889 ...
 224             0.60725447933256   0.60725332108988
 225             0.60725303152913   0.60725295913894 ...
 226             0.60725294104140   0.60725293651701
 227             0.60725293538591   0.60725293510314 ...
 228             0.60725293503245   0.60725293501477
 229             0.60725293501035   0.60725293500925 ...
 230             0.60725293500897   0.60725293500890
 231             0.60725293500889   0.60725293500888 ];
 232         Kn = Kvalues(min(n, length(Kvalues)));
 233
 234         % Initialize loop variables:
 235         v = [1;0]; % start with 2-vector cosine and sine of zero
 236         poweroftwo = 1;
 237         angle = angles(1);
 238
 239         % Iterations
 240         for j = 0:n-1;
 241             if beta < 0
 242                 sigma = -1;
 243             else
 244                 sigma = 1;
 245             end
 246             factor = sigma * poweroftwo;
 247             % Note the matrix multiplication can be done using scaling by
 248             % powers of two and addition subtraction
 249             R = [1, -factor; factor, 1];
 250             v = R * v; % 2-by-2 matrix multiply
 251             beta = beta - sigma * angle; % update the remaining angle
 252             poweroftwo = poweroftwo / 2;
 253             % update the angle from table, or eventually by just dividing by two
 254             if j+2 > length(angles)
 255                 angle = angle / 2;
 256             else
 257                 angle = angles(j+2);
 258             end
 259         end
 260
 261         % Adjust length of output vector to be [cos(beta), sin(beta)]:
 262         v = v * Kn;
 263         return
 264
 265     endfunction
 266
 267 2x2 matrix multiply can be done with shifts and adds:
 268
 269     x = v[0] - sigma * (v[1] * 2^(-j));
 270     y = sigma * (v[0] * 2^(-j)) + v[1];
 271     v = [x; y];
 272
 273 The technique is outlined in a paper as being applicable to 3D:
 274 <https://www.atlantis-press.com/proceedings/jcis2006/232>
 275
 276 # Expensive 3-operand OP32 operations
 277
 278 3-operand operations are extremely expensive in terms of OP32 encoding space.  A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd