# Vector mv operations

In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of v3.0B define a series of pack and unpack operations. This page covers those and more. [[svp64]] provides the Vector Context, which additionally allows saturation and predication to be applied.

See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>

# move to/from vec2/3/4

Basic idea: mv operations where either the src or the dest is specifically marked as having SUBVL applied to it but, crucially, the *other* argument is *not*. Note that this is highly unusual in SimpleV, which normally only allows SUBVL to be applied uniformly across all dest and all src.

    mv.srcvec  r3, r4.vec2
    mv.destvec r2.vec4, r5

TODO: evaluate whether this will fit with [[mv.swizzle]] involved as well
(yes, it probably will)

* M=0 is mv.srcvec
* M=1 is mv.destvec

mv.srcvec (leaving out elwidths and chop):

    for i in range(VL):
        regs[rd+i] = regs[rs+i*SUBVL]

mv.destvec (leaving out elwidths and chop):

    for i in range(VL):
        regs[rd+i*SUBVL] = regs[rs+i]
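
The two loops above can be tried out directly. The following is a minimal sketch only, assuming the register file is modelled as a flat Python list with one element per register (elwidths and chop are left out, as in the pseudocode). The function names `mv_srcvec` and `mv_destvec` and the test values are illustrative, not part of the specification.

```python
def mv_srcvec(regs, rd, rs, VL, SUBVL):
    """Strided read: the source steps by SUBVL, the destination by 1."""
    for i in range(VL):
        regs[rd + i] = regs[rs + i * SUBVL]


def mv_destvec(regs, rd, rs, VL, SUBVL):
    """Strided write: the destination steps by SUBVL, the source by 1."""
    for i in range(VL):
        regs[rd + i * SUBVL] = regs[rs + i]


# Hypothetical register file: regs[n] initially holds 100+n.
gather = list(range(100, 132))
mv_srcvec(gather, rd=0, rs=16, VL=3, SUBVL=2)
print(gather[0:3])   # [116, 118, 120]

scatter = list(range(100, 132))
mv_destvec(scatter, rd=0, rs=16, VL=3, SUBVL=2)
print(scatter[0:5])  # [116, 101, 117, 103, 118]
```

In the `mv_destvec` case the registers between the strided destinations (here `regs[1]` and `regs[3]`) are left untouched.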

Note that these mv operations only become significant when elwidth is set on the vector to a small value, for example SUBVL=4, src elwidth=8, dest elwidth=32.

They are intended to cover:

    rd   = (rs >> (0 * 8)) & 0xff
    rd+1 = (rs >> (1 * 8)) & 0xff
    rd+2 = (rs >> (2 * 8)) & 0xff
    rd+3 = (rs >> (3 * 8)) & 0xff

and variants involving vec3 into 32 bit (4th byte set to zero).

TODO: include this pseudocode, which shows how the vecN can do that.
In this example RA elwidth=32 and RB elwidth=8, and RB is a vec4.

    for i in range(VL):
        if predicate_bit_not_set(i): continue
        uint8_t *start_point = (uint8_t*)(&int_regfile[RA].i[i])
        for j in range(SUBVL): # vec4
            start_point[j] = some_op(int_regfile[RB].b[i*SUBVL + j])
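
As a rough illustration of the byte-level behaviour described above, here is a sketch in plain Python. It makes several assumptions that are not stated on this page: bytes are numbered little-endian within a 32-bit element, the destination elements start out as zero (so a vec3 leaves the 4th byte zero), and the predicate is an integer bitmask. The names `unpack_bytes` and `pack_vec` are hypothetical, and `some_op` defaults to the identity.

```python
def unpack_bytes(word, subvl=4):
    """Split one 32-bit value into subvl zero-extended 8-bit elements."""
    return [(word >> (j * 8)) & 0xFF for j in range(subvl)]


def pack_vec(src_bytes, VL, SUBVL, predicate=None, some_op=lambda b: b):
    """Pack SUBVL source bytes into each of VL 32-bit destination elements."""
    dest = [0] * VL  # assumption: destination starts as all zeros
    for i in range(VL):
        if predicate is not None and not (predicate >> i) & 1:
            continue  # predicate bit clear: element i is skipped
        word = 0
        for j in range(SUBVL):
            byte = some_op(src_bytes[i * SUBVL + j]) & 0xFF
            word |= byte << (j * 8)  # byte j lands in bits 8j..8j+7
        dest[i] = word
    return dest


print([hex(b) for b in unpack_bytes(0x44332211)])
# ['0x11', '0x22', '0x33', '0x44']
print([hex(w) for w in pack_vec([1, 2, 3, 4, 5, 6], VL=2, SUBVL=3)])
# ['0x30201', '0x60504']
```

Unlike the pointer-based pseudocode, this sketch builds each destination word from scratch rather than overwriting bytes in place, so skipped elements read as zero here instead of keeping their previous contents.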

## Twin Predication, saturation, swizzle, and elwidth overrides

Note that mv is a twin-predicated operation, and is swizzlable. This implies that from the vec2, vec3 or vec4, 1 to 8 bytes may be selected and re-ordered (XYZW), mixed with 0 and 1 constants, skipped by way of twin predicate pack and unpack, and a huge amount besides.

Saturation can also be applied to individual elements, including the elements within a vec2/3/4.
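
To make the combination of swizzle and saturation more concrete, here is a small sketch. It is an illustration under assumptions, not the actual encoding: the swizzle is written as a string of `X`, `Y`, `Z`, `W`, `0` and `1` characters purely for readability (see [[mv.swizzle]] for the real definition), and `saturate` simply clamps a value to the range of the destination elwidth. Both function names are hypothetical.

```python
def swizzle(vec, pattern):
    """Select and re-order elements of vec; '0' and '1' insert constants."""
    index = {"X": 0, "Y": 1, "Z": 2, "W": 3}
    return [int(c) if c in ("0", "1") else vec[index[c]] for c in pattern]


def saturate(value, elwidth, signed=False):
    """Clamp value to the representable range of an elwidth-bit element."""
    if signed:
        lo, hi = -(1 << (elwidth - 1)), (1 << (elwidth - 1)) - 1
    else:
        lo, hi = 0, (1 << elwidth) - 1
    return max(lo, min(hi, value))


vec4 = [10, 300, -200, 40]
picked = swizzle(vec4, "WZY1")                 # re-order, append constant 1
print(picked)                                  # [40, -200, 300, 1]
print([saturate(v, 8) for v in picked])        # [40, 0, 255, 1]
print([saturate(v, 8, signed=True) for v in picked])  # [40, -128, 127, 1]
```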

# mv.zip and unzip

| 0..5 | 6..10 | 11..15 | 16..20 | 21..25 | 26..30  | 31 | name     |
|------|-------|--------|--------|--------|---------|----|----------|
| 19   | RT    | RC     | RB/0   | RA/0   | XO[5:9] | Rc | mv.zip   |
| 19   | RT    | RC     | RS/0   | RA/0   | XO[5:9] | Rc | mv.unzip |

These are specialist operations that zip or unzip to/from multiple regs to/from one vector, including vec2/3/4. When SUBVL!=1 the vec2/3/4 is the contiguous unit that is copied (as if it were one register). Different elwidths result in zero-extension or truncation, except if saturation is enabled, in which case signed/unsigned may be applied as usual.

mv.zip, RA=0, RB=0:

    for i in range(VL):
        regs[rt+i] = regs[rc+i]

mv.zip, RA=0, RB!=0:

    for i in range(VL):
        regs[rt+i*2  ] = regs[rb+i]
        regs[rt+i*2+1] = regs[rc+i]

mv.zip, RA!=0, RB!=0:

    for i in range(VL):
        regs[rt+i*3  ] = regs[rb+i]
        regs[rt+i*3+1] = regs[rc+i]
        regs[rt+i*3+2] = regs[ra+i]
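
The three mv.zip cases above follow one pattern, which can be sketched as a single function. This is an illustrative model only, assuming a flat list as the register file, SUBVL=1, no elwidth overrides, and that a register number of 0 means "operand not used" (as in the `RB/0` and `RA/0` fields). The name `mv_zip` and its keyword arguments are hypothetical. The combination RA!=0 with RB=0 is not described above, so the sketch rejects it rather than guess.

```python
def mv_zip(regs, rt, rc, rb=0, ra=0, VL=1):
    """Interleave one, two or three source vectors into the vector at rt."""
    if rb == 0 and ra == 0:
        sources = [rc]
    elif ra == 0:
        sources = [rb, rc]
    elif rb != 0:
        sources = [rb, rc, ra]
    else:
        raise ValueError("RA!=0 with RB=0 is not covered by this sketch")
    n = len(sources)
    for i in range(VL):
        for k, src in enumerate(sources):
            regs[rt + i * n + k] = regs[src + i]


# Hypothetical register file: regs[n] initially holds 100+n.
regs = list(range(100, 164))
mv_zip(regs, rt=32, rc=8, rb=4, ra=12, VL=2)
print(regs[32:38])  # [104, 108, 112, 105, 109, 113]
```

The source order within each group (`rb`, then `rc`, then `ra`) is taken directly from the pseudocode above.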