In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of v3.0B define a series of pack and unpack operations. This page covers those and more. [[svp64]] provides the Vector Context, which adds saturation as well as predication.

* See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004911.html>

Note that some of these may be covered by [[remap]].

# move to/from vec2/3/4

Basic idea: mv operations where either the src or the dest is specifically marked as having SUBVL applied to it but, crucially, the *other* argument is *not*. This is highly unusual in SimpleV, which normally only allows SUBVL to be applied uniformly across all dest and all src.

    mv.destvec r2.vec4, r5

TODO: evaluate whether this will fit with [[mv.swizzle]] involved as well
(yes it probably will)

mv.srcvec (leaving out elwidths and chop):

    for i in range(VL):
        regs[rd+i] = regs[rs+i*SUBVL]

mv.destvec (leaving out elwidths and chop):

    for i in range(VL):
        regs[rd+i*SUBVL] = regs[rs+i]
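
As a rough illustration of the two loops above, here is a minimal Python sketch that models the register file as a plain list with one entry per element. The function names `mv_srcvec` and `mv_destvec`, the list-based `regs` and the example register numbers are assumptions made purely for illustration; elwidths, chop and predication are left out, as in the pseudocode.

```python
def mv_srcvec(regs, rd, rs, vl, subvl):
    """Gather: copy the first sub-element of each vecN group starting at rs."""
    for i in range(vl):
        regs[rd + i] = regs[rs + i * subvl]


def mv_destvec(regs, rd, rs, vl, subvl):
    """Scatter: copy each scalar from rs into the first slot of a vecN group at rd."""
    for i in range(vl):
        regs[rd + i * subvl] = regs[rs + i]


# Hypothetical register file: entry n initially holds 100 + n.
gathered = list(range(100, 132))
mv_srcvec(gathered, rd=0, rs=8, vl=3, subvl=2)

scattered = list(range(100, 132))
mv_destvec(scattered, rd=16, rs=0, vl=3, subvl=4)
```

With SUBVL=2 the gather reads every second entry starting at `rs`; with SUBVL=4 the scatter writes every fourth entry starting at `rd` and leaves the remaining slots of each vec4 untouched.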

Note that these mv operations only become significant when elwidth is set on the vector to a small value, for example SUBVL=4, src elwidth=8, dest elwidth=32:

    rd   = (rs >> (0 * 8)) & ((1 << 8) - 1)
    rd+1 = (rs >> (1 * 8)) & ((1 << 8) - 1)
    rd+2 = (rs >> (2 * 8)) & ((1 << 8) - 1)
    rd+3 = (rs >> (3 * 8)) & ((1 << 8) - 1)

and variants involving vec3 into 32 bit (4th byte set to zero).
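
The shift-and-mask pattern above can be sketched in Python as follows. The helper names `unpack_elements` and `pack_elements` are hypothetical, and little-endian element ordering within the register is an assumption taken from the shifts shown (element 0 in the lowest bits).

```python
def unpack_elements(rs, subvl=4, elwidth=8):
    """Split one register value into subvl elements of elwidth bits each."""
    mask = (1 << elwidth) - 1
    return [(rs >> (j * elwidth)) & mask for j in range(subvl)]


def pack_elements(elems, elwidth=8):
    """Inverse direction: place element j at bit offset j * elwidth."""
    mask = (1 << elwidth) - 1
    out = 0
    for j, elem in enumerate(elems):
        out |= (elem & mask) << (j * elwidth)
    return out


vec4 = unpack_elements(0x44332211)            # four 8-bit elements
vec3_word = pack_elements([0x11, 0x22, 0x33])  # vec3 into 32 bit
```

In the vec3 case only three bytes are written, so the fourth byte of the 32-bit result stays zero.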

TODO: include this pseudocode, which shows how the vecN can do that.
In this example RA elwidth=32 and RB elwidth=8, and RB is a vec4.

    for i in range(VL):
        if predicate_bit_not_set(i): continue
        # byte pointer to the i-th 32-bit element of RA
        uint8_t *start_point = (uint8_t*)&int_regfile[RA].i[i]
        for j in range(SUBVL): # vec4
            start_point[j] = some_op(int_regfile[RB].b[i*SUBVL + j])
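
A runnable approximation of that pseudocode, assuming a byte-addressable register file modelled as a `bytearray`, is sketched below. The name `vec4_bytes_to_words`, the use of byte offsets for `ra` and `rb`, the integer predicate mask and the default identity `some_op` are all assumptions for illustration, not part of the specification.

```python
def vec4_bytes_to_words(regfile, ra, rb, vl, pred, subvl=4, some_op=lambda b: b):
    """For each predicated element i, write subvl bytes from RB into the
    i-th 32-bit element of RA. ra and rb are byte offsets into regfile."""
    for i in range(vl):
        if not (pred >> i) & 1:      # predicate bit clear: skip this element
            continue
        start = ra + i * 4           # byte address of 32-bit element i of RA
        for j in range(subvl):
            regfile[start + j] = some_op(regfile[rb + i * subvl + j]) & 0xFF


regfile = bytearray(64)
regfile[32:40] = bytes([1, 2, 3, 4, 5, 6, 7, 8])   # two vec4s of 8-bit elements
vec4_bytes_to_words(regfile, ra=0, rb=32, vl=2, pred=0b10)
word0 = int.from_bytes(regfile[0:4], "little")
word1 = int.from_bytes(regfile[4:8], "little")
```

Here only predicate bit 1 is set, so the first 32-bit element is left untouched and the second receives the second vec4.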

## Twin Predication, saturation, swizzle, and elwidth overrides

Note that mv is a twin-predicated operation, and is swizzlable. This implies that, from the vec2, vec3 or vec4, 1 to 8 bytes may be selected and re-ordered (XYZW), mixed with 0 and 1 constants, skipped by way of twin predicate pack and unpack, and a huge amount besides.

Saturation can also be applied to individual elements, including the elements within a vec2/3/4.
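
To make the per-element saturation concrete, here is a generic clamping sketch in Python. It is not taken from the SVP64 specification: the function name `saturate` and the choice of plain signed/unsigned clamping to the destination elwidth are assumptions, shown only to illustrate that each element of a vecN is treated independently.

```python
def saturate(value, elwidth, signed=False):
    """Clamp value to the range representable in elwidth bits."""
    if signed:
        lo, hi = -(1 << (elwidth - 1)), (1 << (elwidth - 1)) - 1
    else:
        lo, hi = 0, (1 << elwidth) - 1
    return max(lo, min(hi, value))


# Each element of a vec4 is saturated on its own when narrowing to 8 bits.
vec4_in = [0, 255, 256, 1000]
vec4_out = [saturate(v, 8) for v in vec4_in]
signed_out = [saturate(v, 8, signed=True) for v in (-200, -1, 127, 128)]
```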

mv.zip and mv.unzip are Scalar equivalents to VSX Pack and Unpack: v3.1
Book I Section 6.8 p278. Saturated variants do not need
adding because SVP64 overrides add Saturation already.
More detailed merging may be achieved with [[sv/bitmanip]].

| 0..5 | 6..10 | 11..15 | 16..20 | 21..25 | 26..30  | 31 | name     |
|------|-------|--------|--------|--------|---------|----|----------|
| 19   | RTp   | RC     | RB/0   | RA/0   | XO[5:9] | Rc | mv.zip   |
| 19   | RT    | RC     | RS/0   | RA/0   | XO[5:9] | Rc | mv.unzip |

These are specialist operations that zip or unzip to/from multiple regs to/from one vector, including vec2/3/4. When SUBVL!=1 the vec2/3/4 is the contiguous unit that is copied (as if one register). Different elwidths result in zero-extension or truncation, except if saturation is enabled, where signed/unsigned may be applied as usual.

    # one source: RC only
    for i in range(VL):
        regs[rt+i] = regs[rc+i]
    # two sources: RB and RC interleaved
    for i in range(VL):
        regs[rt+i*2  ] = regs[rb+i]
        regs[rt+i*2+1] = regs[rc+i]
    # three sources: RB, RC and RA interleaved
    for i in range(VL):
        regs[rt+i*3  ] = regs[rb+i]
        regs[rt+i*3+1] = regs[rc+i]
        regs[rt+i*3+2] = regs[ra+i]
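
The interleaving pattern in the pseudocode generalises to any number of sources, as the Python sketch below shows. The names `mv_zip` and `mv_unzip`, the list-based register file and the register numbers are hypothetical. The unzip direction is an assumption (simply the inverse of zip), since no pseudocode for it is given here. SUBVL, elwidths and predication are ignored.

```python
def mv_zip(regs, rt, srcs, vl):
    """Interleave len(srcs) source vectors into one vector starting at rt."""
    n = len(srcs)
    for i in range(vl):
        for k, src in enumerate(srcs):
            regs[rt + i * n + k] = regs[src + i]


def mv_unzip(regs, dsts, rs, vl):
    """Assumed inverse of mv_zip: de-interleave rs into len(dsts) vectors."""
    n = len(dsts)
    for i in range(vl):
        for k, dst in enumerate(dsts):
            regs[dst + i] = regs[rs + i * n + k]


regs = list(range(64))                       # entry n initially holds n
mv_zip(regs, rt=32, srcs=[0, 8, 16], vl=3)   # three-source zip
zipped = regs[32:41]
mv_unzip(regs, dsts=[44, 48, 52], rs=32, vl=3)
```

Passing one, two or three entries in `srcs` corresponds to the three cases in the pseudocode.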

# REMAP concept for pack/unpack

It may be possible to use one standard mv instruction to perform packing
and unpacking: Matrix allows for both reordering and offsets. At the very
least a predicate mask potentially can be used.

* If a single src-dest mv is used, then it potentially requires
  two separate REMAPs and two separate sv.mvs: remap-even, sv.mv,
  remap-odd, sv.mv
* If adding twin-src and twin-dest that is a lot of instructions,
  particularly if triple is added as well. FPR mv, GPR mv
* Unless twin or triple is added, how is it possible to determine
  the extra register(s) to be merged (or split)?

How about instead relying on the implicit RS=MAXVL+RT trick and
extending that to RS=MAXVL+RA as a source? One spare bit in the
EXTRA RM area says whether the sv.mv is a pack (RS-as-src=RA+MAXVL)
or unpack (RS-as-dest=RT+MAXVL).

Alternatively, given that Matrix is up to 3 Dimensions, do not even
be concerned about RS: simply use one of those dimensions to
express the pack or unpack:

* RT set to YX, ydim=2, xdim=4

The indices match up as follows:

| RA | (0 1) (2 3) (4 5) (6 7) |
| RT | 0 2 4 6 1 3 5 7 |

This results in a 2-element "unpack"

* RT set to YX, ydim=3, xdim=3

The indices match up as follows:

| RA | 0 1 2 3 4 5 6 7 8 |
| RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack"
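
The index pairings in the two examples can be reproduced with a short Python sketch. This is not the REMAP Matrix algorithm itself: `yx_indices` is a hypothetical helper that simply generates a strided ordering, and `remapped_mv` assumes that the source is read linearly while the listed indices select the destination element.

```python
def yx_indices(ydim, xdim):
    """Strided destination indices: step by ydim, then advance the start offset."""
    return [x * ydim + y for y in range(ydim) for x in range(xdim)]


def remapped_mv(src, dst_indices):
    """Copy src[i] to dst[dst_indices[i]], reading src in linear order."""
    dst = [None] * len(src)
    for i, d in enumerate(dst_indices):
        dst[d] = src[i]
    return dst


idx2 = yx_indices(ydim=2, xdim=4)
idx3 = yx_indices(ydim=3, xdim=3)
out3 = remapped_mv(list("abcdefghi"), idx3)
```

For ydim=2, xdim=4 the sketch yields 0 2 4 6 1 3 5 7, a permutation of the eight indices 0 to 7. For ydim=3, xdim=3 it yields 0 3 6 1 4 7 2 5 8, so that three consecutive source groups `abc`, `def`, `ghi` end up interleaved as `adgbehcfi`.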

Both examples become particularly fun when Twin Predication is thrown
into the mix.