simple_v_extension/vector_ops.mdwn

   1 [[!tag standards]]
   2
   3 # Vector Operations Extension to SV
   4
   5 This extension is usually dependent on SV SUBVL being implemented. When SUBVL is set to define the length of a subvector the operations in this extension interpret the elements as a single vector.
   6
   7 Normally in SV all operations are scalar and independent, and the operations on them may inherently be independently parallelised, with the result being a vector of length exactly equal to the input vectors.
   8
   9 In this extension, the subvector itself is typically the unit, although some operations will work on scalars or standard vectors as well, or the result is a scalar that is dependent on all elements within the vector arguments.
  10
  11 Examples which can require SUBVL include cross product and may in future involve complex numbers.
  12
  13 ## CORDIC
  14
  15 SUBVL=2, vd, vs; SUBVL=1, beta.
  16
  17 * CORDIC.lin.rot vd, vs, beta
  18 * CORDIC.cir.rot vd, vs, beta
  19 * CORDIC.hyp.rot vd, vs, beta
  20 * CORDIC.lin.vec vd, vs, beta
  21 * CORDIC.cir.vec vd, vs, beta
  22 * CORDIC.hyp.vec vd, vs, beta
  23
  24 CORDIC is an extremely general-purpose algorithm useful for a huge number
  25 of diverse purposes.  In its full form it does however require quite a
  26 few parameters, one of which is a vector, making it awkward to include in
  27 a standard "scalar" ISA.  Additionally the coordinates can be set to circular,
  28 linear or hyperbolic, producing three different modes, and the algorithm
  29 may also be run in either "vector" mode or "rotation" mode.  See [[discussion]]
  30
  31 CORDIC can also be used for performing DCT.  See
  32 <https://arxiv.org/abs/1606.02424>
  33
  34 vx, vy = CORDIC(vx, vy, coordinate\_mode, beta)
  35
  36      int i = 0;
  37      int iterations = 0; // Number of times to run the algorithm
  38      float arctanTable[iterations]; // in Radians
  39      float K = 0.6073; // K
  40      float v_x,v_y; // Vector v; x and y components
  41
  42      for(i=0; i < iterations; i++) {
  43         arctanTable[i] = atan(pow(2,-i));
  44      }
  45
  46      float vnew_x;   // To store the new value of x;
  47      for(i = 0; i < iterations; i++) {
  48          // If beta is negative, we need to do a counter-clockwise rotation:
  49          if( beta < 0) {
  50             vnew_x = v_x + (v_y*pow(2,-i));
  51             v_y -= (v_x*pow(2,-i));
  52             beta += arctanTable[i];
  53          }
  54          // If beta is positive, we need to do a clockwise rotation:
  55          else {
  56             vnew_x = v_x - (v_y*pow(2,-i));
  57             v_y += (v_x*pow(2,-i));
  58             beta -= arctanTable[i];
  59          }
  60          v_x = vnew_x;
  61      }
  62      v_x *= K;
  63      v_y *= K;
  64
  65 Links:
  66
  67 * <http://www.myhdl.org/docs/examples/sinecomp/>
  68
  69 ## Vector cross product
  70
  71 SUBVL=2,3,4 all regs
  72
  73 * VCROSS vd, vs1, vs1
  74
  75 Result is the cross product of x and y, i.e., the resulting components are, in order:
  76
  77     x[1] * y[2] - y[1] * x[2]
  78     x[2] * y[0] - y[2] * x[0]
  79     x[0] * y[1] - y[0] * x[1]
  80
  81 All the operands must be vectors of 3 components of a floating-point type.
  82
  83 Pseudocode:
  84
  85     vec3 a, b; // elements in order a.x, a.y, a.z
  86     // compute a cross b:
  87     vec3 t1 = a.yzx; // produce vector [a.y, a.z, a.x]
  88     vec3 t2 = b.zxy;
  89     vec3 t3 = a.zxy;
  90     vec3 t4 = b.yzx;
  91     vec3 p = t3 * t4;
  92     vec3 cross = t1 * t2 - p;
  93
  94 ## Vector dot product
  95
  96 * SUBVL=1, rd SUBVL=2,3,4 vs1,vs2
  97 * rd=scalar, SUBVL=default, vs1, vs2=vec
  98
  99 * VDOT rd, vs1, vs2
 100
 101 Computes the dot product of two vectors. Internal accuracy must be
 102 greater than the input vectors and the result.
 103
 104 Pseudocode in python:
 105
 106     from operator import mul
 107     sum(map(mul, A, B))
 108
 109 Pseudocode in c:
 110
 111     double dot_product(float v[], float u[], int n)
 112     {
 113         double result = 0.0;
 114         for (int i = 0; i < n; i++)
 115             result += v[i] * u[i];
 116         return result;
 117     }
 118
 119 ## Vector length
 120
 121 * rd=scalar, vs1=vec (SUBVL=default)
 122 * rd=scalar, vs1=vec (SUBVL=2,3,4) only 1
 123 * rd=vec, SUBVL=1; vs1=vec, SUBVL=any
 124
 125 * VLEN rd, vs1
 126
 127 The scalar length of a vector:
 128
 129     sqrt(x[0]^2 + x[1]^2 + ...).
 130
 131 ## Vector distance
 132
 133 * VDIST rd, vs1, vs2
 134
 135 The scalar distance between two vectors. Subtracts one vector from the
 136 other and returns length:
 137
 138     length(v0 - v1)
 139
 140 ## Vector LERP
 141
 142 * VLERP rd, vs1, rs2 # SUBVL=2: vs1.v0 vs1.v1
 143
 144 Known as **fmix** in GLSL.
 145
 146 <https://en.m.wikipedia.org/wiki/Linear_interpolation>
 147
 148 Pseudocode:
 149
 150     // Imprecise method, which does not guarantee v = v1 when t = 1,
 151     // due to floating-point arithmetic error.
 152     // This form may be used when the hardware has a native fused
 153     // multiply-add instruction.
 154     float lerp(float v0, float v1, float t) {
 155       return v0 + t * (v1 - v0);
 156     }
 157
 158     // Precise method, which guarantees v = v1 when t = 1.
 159     float lerp(float v0, float v1, float t) {
 160       return (1 - t) * v0 + t * v1;
 161     }
 162
 163 ## Vector SLERP
 164
 165 * VSLERP vd, vs1, vs2, rs3
 166
 167 Not recommended as it is not commonly used and has several trigonometric
 168 functions. Also a costly 4 arg operation.
 169
 170 <https://en.m.wikipedia.org/wiki/Slerp>
 171
 172 Pseudocode:
 173
 174     Quaternion slerp(Quaternion v0, Quaternion v1, double t) {
 175         // Only unit quaternions are valid rotations.
 176         // Normalize to avoid undefined behavior.
 177         v0.normalize();
 178         v1.normalize();
 179
 180         // Compute the cosine of the angle between the two vectors.
 181         double dot = dot_product(v0, v1);
 182
 183         // If the dot product is negative, slerp won't take
 184         // the shorter path. Note that v1 and -v1 are equivalent when
 185         // the negation is applied to all four components. Fix by
 186         // reversing one quaternion.
 187         if (dot < 0.0f) {
 188             v1 = -v1;
 189             dot = -dot;
 190         }
 191
 192         const double DOT_THRESHOLD = 0.9995;
 193         if (dot > DOT_THRESHOLD) {
 194             // If the inputs are too close for comfort, linearly interpolate
 195             // and normalize the result.
 196
 197             Quaternion result = v0 + t*(v1 - v0);
 198             result.normalize();
 199             return result;
 200         }
 201
 202         // Since dot is in range [0, DOT_THRESHOLD], acos is safe
 203         double theta_0 = acos(dot);        // theta_0 = angle between input vectors
 204         double theta = theta_0*t;          // theta = angle between v0 and result
 205         double sin_theta = sin(theta);     // compute this value only once
 206         double sin_theta_0 = sin(theta_0); // compute this value only once
 207
 208         double s0 = cos(theta) - dot * sin_theta / sin_theta_0;  // == sin(theta_0 - theta) / sin(theta_0)
 209         double s1 = sin_theta / sin_theta_0;
 210
 211         return (s0 * v0) + (s1 * v1);
 212     }
 213
 214 However this algorithm does not involve transcendentals except in
 215 the computation of the tables: <https://en.wikipedia.org/wiki/CORDIC#Rotation_mode>
 216
 217     function v = cordic(beta,n)
 218         % This function computes v = [cos(beta), sin(beta)] (beta in radians)
 219         % using n iterations. Increasing n will increase the precision.
 220
 221         if beta < -pi/2 || beta > pi/2
 222             if beta < 0
 223                 v = cordic(beta + pi, n);
 224             else
 225                 v = cordic(beta - pi, n);
 226             end
 227             v = -v; % flip the sign for second or third quadrant
 228             return
 229         end
 230
 231         % Initialization of tables of constants used by CORDIC
 232         % need a table of arctangents of negative powers of two, in radians:
 233         % angles = atan(2.^-(0:27));
 234         angles =  [  ...
 235             0.78539816339745   0.46364760900081
 236             0.24497866312686   0.12435499454676 ...
 237             0.06241880999596   0.03123983343027
 238             0.01562372862048   0.00781234106010 ...
 239             0.00390623013197   0.00195312251648
 240             0.00097656218956   0.00048828121119 ...
 241             0.00024414062015   0.00012207031189
 242             0.00006103515617   0.00003051757812 ...
 243             0.00001525878906   0.00000762939453
 244             0.00000381469727   0.00000190734863 ...
 245             0.00000095367432   0.00000047683716
 246             0.00000023841858   0.00000011920929 ...
 247             0.00000005960464   0.00000002980232
 248             0.00000001490116   0.00000000745058 ];
 249         % and a table of products of reciprocal lengths of vectors [1, 2^-2j]:
 250         % Kvalues = cumprod(1./abs(1 + 1j*2.^(-(0:23))))
 251         Kvalues = [ ...
 252             0.70710678118655   0.63245553203368
 253             0.61357199107790   0.60883391251775 ...
 254             0.60764825625617   0.60735177014130
 255             0.60727764409353   0.60725911229889 ...
 256             0.60725447933256   0.60725332108988
 257             0.60725303152913   0.60725295913894 ...
 258             0.60725294104140   0.60725293651701
 259             0.60725293538591   0.60725293510314 ...
 260             0.60725293503245   0.60725293501477
 261             0.60725293501035   0.60725293500925 ...
 262             0.60725293500897   0.60725293500890
 263             0.60725293500889   0.60725293500888 ];
 264         Kn = Kvalues(min(n, length(Kvalues)));
 265
 266         % Initialize loop variables:
 267         v = [1;0]; % start with 2-vector cosine and sine of zero
 268         poweroftwo = 1;
 269         angle = angles(1);
 270
 271         % Iterations
 272         for j = 0:n-1;
 273             if beta < 0
 274                 sigma = -1;
 275             else
 276                 sigma = 1;
 277             end
 278             factor = sigma * poweroftwo;
 279             % Note the matrix multiplication can be done using scaling by
 280             % powers of two and addition subtraction
 281             R = [1, -factor; factor, 1];
 282             v = R * v; % 2-by-2 matrix multiply
 283             beta = beta - sigma * angle; % update the remaining angle
 284             poweroftwo = poweroftwo / 2;
 285             % update the angle from table, or eventually by just dividing by two
 286             if j+2 > length(angles)
 287                 angle = angle / 2;
 288             else
 289                 angle = angles(j+2);
 290             end
 291         end
 292
 293         % Adjust length of output vector to be [cos(beta), sin(beta)]:
 294         v = v * Kn;
 295         return
 296
 297     endfunction
 298
 299 2x2 matrix multiply can be done with shifts and adds:
 300
 301     x = v[0] - sigma * (v[1] * 2^(-j));
 302     y = sigma * (v[0] * 2^(-j)) + v[1];
 303     v = [x; y];
 304
 305 The technique is outlined in a paper as being applicable to 3D:
 306 <https://www.atlantis-press.com/proceedings/jcis2006/232>
 307
 308 # Expensive 3-operand OP32 operations
 309
 310 3-operand operations are extremely expensive in terms of OP32 encoding space.  A potential idea is to embed 3 RVC register formats across two out of three 5-bit fields rs1/rs2/rd
 311
 312 Another is to overwrite one of the src registers.
 313
 314 # Opcode Table
 315
 316 TODO