openpower/sv/mv.vec.mdwn

[[!tag standards]]

# Vector mv operations

In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of v3.0B describe a series of pack and unpack operations. This page covers those and more. [[svp64]] provides the Vector Context, which adds saturation as well as predication.

* See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004911.html>

Note that some of these may be covered by [[remap]].

# move to/from vec2/3/4

Basic idea: mv operations where either the src or the dest is specifically marked as having SUBVL applied to it but, crucially, the *other* argument is *not*. Note that this is highly unusual in SimpleV, which normally only allows SUBVL to be applied uniformly across all dest and all src.

    mv.srcvec  r3, r4.vec2
    mv.destvec r2.vec4, r5

TODO: evaluate whether this will fit with [[mv.swizzle]] involved as well
(yes it probably will)

* M=0 is mv.srcvec
* M=1 is mv.destvec

mv.srcvec (leaving out elwidths and chop):

    for i in range(VL):
        regs[rd+i] = regs[rs+i*SUBVL]

mv.destvec (leaving out elwidths and chop):

    for i in range(VL):
        regs[rd+i*SUBVL] = regs[rs+i]
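As a rough illustration, the two loops above can be run directly in Python. This is only a sketch: it assumes a flat list `regs` standing in for the register file with one element per entry, and it leaves out elwidths, chop and predication, just as the pseudocode does. The function names and the test values are made up for the example.

```python
def mv_srcvec(regs, rd, rs, VL, SUBVL):
    """Dest is stepped by 1, src is stepped by SUBVL (a strided gather)."""
    for i in range(VL):
        regs[rd + i] = regs[rs + i * SUBVL]


def mv_destvec(regs, rd, rs, VL, SUBVL):
    """Src is stepped by 1, dest is stepped by SUBVL (a strided scatter)."""
    for i in range(VL):
        regs[rd + i * SUBVL] = regs[rs + i]


# Hypothetical register file: entry n initially holds the value 100+n.
regs = list(range(100, 132))
mv_srcvec(regs, rd=0, rs=8, VL=4, SUBVL=2)
srcvec_result = regs[0:4]      # taken from entries 8, 10, 12, 14

regs = list(range(100, 132))
mv_destvec(regs, rd=16, rs=0, VL=4, SUBVL=2)
destvec_result = regs[16:24]   # entries 16, 18, 20, 22 overwritten

print(srcvec_result)   # [108, 110, 112, 114]
print(destvec_result)  # [100, 117, 101, 119, 102, 121, 103, 123]
```

In the mv.destvec case the entries between the strided destinations are left untouched, which is why the old values 117, 119, 121 and 123 remain visible.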

Note that these mv operations only become significant when elwidth is set on the vector to a small value: SUBVL=4, src elwidth=8, dest elwidth=32, for example.

They are intended to cover:

    rd   = (rs >> 0 * 8) & (2**8 - 1)
    rd+1 = (rs >> 1 * 8) & (2**8 - 1)
    rd+2 = (rs >> 2 * 8) & (2**8 - 1)
    rd+3 = (rs >> 3 * 8) & (2**8 - 1)

and variants involving vec3 into 32 bit (4th byte set to zero).
TODO: include this pseudocode which shows how the vecN can do that.
In this example RA elwidth=32 and RB elwidth=8, and RB is a vec4.

    for i in range(VL):
        if predicate_bit_not_set(i): continue
        # byte-level view of the i-th 32-bit element of RA
        uint8_t *start_point = (uint8_t*)(&int_regfile[RA].i[i])
        for j in range(SUBVL): # vec4
            start_point[j] = some_op(int_regfile[RB].b[i*SUBVL + j])
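The mixed C/Python pseudocode above can be approximated in plain Python. The following is a sketch under several assumptions that are not in the original: the 8-bit source elements are given as a flat list, byte `j` of a 32-bit element sits at bit position `8*j` (matching the shift-and-mask lines earlier), `some_op` defaults to the identity, the predicate is a simple integer bitmask, and any byte not written by the vecN is zeroed (to model the "4th byte set to zero" vec3 variant). `pack_vecn` is a made-up name.

```python
def pack_vecn(dest, src_bytes, VL, SUBVL, pred_mask=~0, some_op=lambda b: b):
    """Pack SUBVL 8-bit source elements into each 32-bit dest element."""
    for i in range(VL):
        if not (pred_mask >> i) & 1:   # predicate bit clear: leave dest[i] alone
            continue
        word = bytearray(4)            # assumption: unwritten bytes are zero
        for j in range(SUBVL):
            word[j] = some_op(src_bytes[i * SUBVL + j]) & 0xFF
        dest[i] = int.from_bytes(word, "little")
    return dest


src = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66]   # two vec3 of 8-bit elements

packed = pack_vecn([0xFFFFFFFF] * 2, src, VL=2, SUBVL=3)
masked = pack_vecn([0xFFFFFFFF] * 2, src, VL=2, SUBVL=3, pred_mask=0b01)

print([hex(x) for x in packed])  # ['0x332211', '0x665544']
print([hex(x) for x in masked])  # ['0x332211', '0xffffffff']
```

With `pred_mask=0b01` only element 0 is written, so element 1 keeps its previous contents.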

## Twin Predication, saturation, swizzle, and elwidth overrides

Note that mv is a twin-predicated operation, and is swizzlable. This implies that from the vec2, vec3 or vec4, 1 to 8 bytes may be selected and re-ordered (XYZW), mixed with 0 and 1 constants, skipped by way of twin predicate pack and unpack, and a huge amount besides.

Saturation can also be applied to individual elements, including the elements within a vec2/3/4.
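To make "saturation applied to individual elements" concrete, here is a generic clamp applied per element of a vec4. It is only an illustration of ordinary signed and unsigned saturation to a given bit width; the helper name `saturate` is hypothetical and the exact SVP64 saturation rules are not reproduced here.

```python
def saturate(value, width, signed=False):
    """Clamp value to the range representable in `width` bits."""
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    return max(lo, min(hi, value))


vec4 = [300, -5, 17, 255]   # made-up wide source elements

unsigned8 = [saturate(x, 8) for x in vec4]
signed8 = [saturate(x, 8, signed=True) for x in vec4]

print(unsigned8)  # [255, 0, 17, 255]
print(signed8)    # [127, -5, 17, 127]
```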

# mv.zip and unzip

These are Scalar equivalents to VSX Pack and Unpack: v3.1
Book I Section 6.8 p278.  Saturated variants do not need
adding because SVP64 overrides add Saturation already.
More detailed merging may be achieved with [[sv/bitmanip]]
instructions.

| 0..5 | 6..10 | 11..15 | 16..20 | 21..25 | 26..30  | 31 | name     |
|------|-------|--------|--------|--------|---------|----|----------|
| 19   | RTp   | RC     | RB/0   | RA/0   | XO[5:9] | Rc | mv.zip   |
| 19   | RT    | RC     | RS/0   | RA/0   | XO[5:9] | Rc | mv.unzip |

These are specialist operations that zip or unzip to/from multiple regs to/from one vector, including vec2/3/4. When SUBVL!=1 the vec2/3/4 is the contiguous unit that is copied (as if one register). Different elwidths result in zero-extension or truncation, except if saturation is enabled, where signed/unsigned may be applied as usual.

mv.zip, RA=0, RB=0:

    for i in range(VL):
        regs[rt+i] = regs[rc+i]

mv.zip, RA=0, RB!=0:

    for i in range(VL):
        regs[rt+i*2  ] = regs[rb+i]
        regs[rt+i*2+1] = regs[rc+i]

mv.zip, RA!=0, RB!=0:

    for i in range(VL):
        regs[rt+i*3  ] = regs[rb+i]
        regs[rt+i*3+1] = regs[rc+i]
        regs[rt+i*3+2] = regs[ra+i]
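The three mv.zip cases above follow one pattern, which the sketch below folds into a single Python function. It assumes a flat list `regs` with one element per entry, ignores SUBVL, elwidth and predication, and treats a register number of 0 as "not present" as in the case labels. The combination RA!=0 with RB=0 is not described above, so the sketch rejects it rather than guessing. The register numbers and string values are invented for the example.

```python
def mv_zip(regs, rt, rc, rb=0, ra=0, VL=1):
    """Interleave 1, 2 or 3 source vectors into one destination vector."""
    if rb == 0 and ra != 0:
        raise ValueError("RA!=0 with RB=0 is not covered by the pseudocode")
    # Interleave order taken from the pseudocode: RB, then RC, then RA.
    srcs = ([rb] if rb else []) + [rc] + ([ra] if ra else [])
    n = len(srcs)
    for i in range(VL):
        for k, src in enumerate(srcs):
            regs[rt + i * n + k] = regs[src + i]


def fresh_regs():
    regs = [None] * 64
    regs[8:11] = ["b0", "b1", "b2"]     # vector at RB=8
    regs[16:19] = ["c0", "c1", "c2"]    # vector at RC=16
    regs[24:27] = ["a0", "a1", "a2"]    # vector at RA=24
    return regs


regs = fresh_regs()
mv_zip(regs, rt=32, rc=16, rb=8, VL=3)
zip2 = regs[32:38]

regs = fresh_regs()
mv_zip(regs, rt=32, rc=16, rb=8, ra=24, VL=3)
zip3 = regs[32:41]

print(zip2)  # ['b0', 'c0', 'b1', 'c1', 'b2', 'c2']
print(zip3)  # ['b0', 'c0', 'a0', 'b1', 'c1', 'a1', 'b2', 'c2', 'a2']
```

The example keeps source and destination ranges apart; like the pseudocode, the sketch writes in place and does not model what happens when they overlap.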

# REMAP concept for pack/unpack

It may be possible to use one standard mv instruction to perform packing
and unpacking: Matrix allows for both reordering and offsets. At the very
least, a predicate mask can potentially be used.

* If a single src-dest mv is used, then it potentially requires
  two separate REMAPs and two separate sv.mvs: remap-even, sv.mv,
  remap-odd, sv.mv
* If adding twin-src and twin-dest, that is a lot of instructions,
  particularly if triple is added as well. FPR mv, GPR mv
* Unless twin or triple is added, how is it possible to determine
  the extra register(s) to be merged (or split)?

How about instead relying on the implicit RS=MAXVL+RT trick and
extending that to RS=MAXVL+RA as a source?  One spare bit in the
EXTRA RM area says whether the sv.mv is a pack (RS-as-src=RA+MAXVL)
or unpack (RS-as-dest=RT+MAXVL).

Alternatively, given that Matrix is up to 3 Dimensions, do not even
be concerned about RS: simply use one of those dimensions to
span the packing:

Example 1:

* RA set to linear
* RT set to YX, ydim=2, xdim=4
* VL=MAXVL=8

The indices match up as follows:

    | RA | (0 1) (2 3) (4 5) (6 7) |
    | RT |   0 2 4 6     1 3 5 7   |

This results in a 2-element "unpack"

Example 2:

* RA set to linear
* RT set to YX, ydim=3, xdim=3
* VL=MAXVL=9

The indices match up as follows:

    | RA |  0 1 2   3 4 5   6 7 8  |
    | RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack"
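The RT index rows in the two examples are strided permutations of 0..VL-1, and can be generated with a two-level loop. The sketch below is not the actual REMAP Matrix schedule or its SVSHAPE encoding: `strided_indices` and `remapped_mv` are hypothetical helpers, and the way `ydim` and `xdim` are used in the formula is an assumption chosen so that the output lines up with the tables. RA is taken as linear (index `i`), as in both examples.

```python
def strided_indices(ydim, xdim):
    """Hypothetical helper: ydim groups of xdim indices, each with stride ydim."""
    return [x * ydim + y for y in range(ydim) for x in range(xdim)]


def remapped_mv(src, rt_idx):
    """Element i of the (linear) source goes to destination slot rt_idx[i]."""
    dest = [None] * len(src)
    for i, d in enumerate(rt_idx):
        dest[d] = src[i]
    return dest


idx1 = strided_indices(ydim=2, xdim=4)   # Example 1, VL=8
idx2 = strided_indices(ydim=3, xdim=3)   # Example 2, VL=9

moved1 = remapped_mv(list("abcdefgh"), idx1)
moved2 = remapped_mv(list("abcdefghi"), idx2)

print(idx1)    # [0, 2, 4, 6, 1, 3, 5, 7]
print(idx2)    # [0, 3, 6, 1, 4, 7, 2, 5, 8]
print(moved1)  # ['a', 'e', 'b', 'f', 'c', 'g', 'd', 'h']
print(moved2)  # ['a', 'd', 'g', 'b', 'e', 'h', 'c', 'f', 'i']
```

Printing the moved elements makes it easier to see which source elements end up adjacent in the destination for each choice of dimensions.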

Both examples become particularly fun when Twin Predication is thrown
into the mix.