In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of v3.0B have a series of pack and unpack operations. This page covers those and more. [[svp64]] provides the Vector Context to also add saturation as well as predication.

* See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004911.html>

Pack and unpack may be covered by [[remap]] by using Matrix 2D layouts on either source or destination, but doing so is quite expensive. Additionally,
with pressure on the Scalar 32-bit opcode space it is more appropriate to
compromise by adding the required capability in SVP64 on top of a
pre-existing base Scalar mv instruction. [[sv/mv.swizzle]] is sufficiently
unusual to justify a base Scalar 32-bit instruction but pack/unpack is not.

# REMAP concept for pack/unpack

It may be possible to use one standard mv instruction to perform packing
and unpacking: Matrix allows for both reordering and offsets. At the very least a predicate mask potentially can

* If a single src-dest mv is used, then it potentially requires
  two separate REMAPs and two separate sv.mvs: remap-even, sv.mv,
* If twin-src and twin-dest are added, that is a lot of instructions,
  particularly if triple is added as well. FPR mv, GPR mv
* Unless twin or triple is added, how is it possible to determine
  the extra register(s) to be merged (or split)?

How about instead relying on the implicit RS=MAXVL+RT trick and
extending that to RS=MAXVL+RA as a source? One spare bit in the
EXTRA RM area would say whether the sv.mv is a pack (RS-as-src=RA+MAXVL)
or an unpack (RS-as-dest=RT+MAXVL).
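
The RS=MAXVL+RA / RS=MAXVL+RT idea could be modelled roughly as below. This is only a sketch under stated assumptions: the flat-list register file, the function names `pack` and `unpack`, the element interleaving order, and the notion that a pack writes 2×VL elements starting at RT are all illustrative guesses, not defined behaviour.

```python
# Hypothetical model: a flat list stands in for the register file,
# one element per register. Names and semantics are illustrative only.
MAXVL = 4


def pack(regs, rt, ra, vl, maxvl=MAXVL):
    """Interleave the vector at RA with the one at RS=RA+MAXVL into RT."""
    rs = ra + maxvl  # implicit second source (assumed semantics)
    out = []
    for i in range(vl):
        out.append(regs[ra + i])
        out.append(regs[rs + i])
    regs[rt:rt + 2 * vl] = out  # assumed: result occupies 2*VL elements


def unpack(regs, rt, ra, vl, maxvl=MAXVL):
    """Split the interleaved vector at RA into RT and RS=RT+MAXVL."""
    rs = rt + maxvl  # implicit second destination (assumed semantics)
    src = regs[ra:ra + 2 * vl]  # snapshot so overlapping writes are safe
    for i in range(vl):
        regs[rt + i] = src[2 * i]
        regs[rs + i] = src[2 * i + 1]


regs = list(range(100, 132))
pack(regs, rt=16, ra=0, vl=4)    # sources: regs[0:4] and regs[4:8]
packed = regs[16:24]
unpack(regs, rt=8, ra=16, vl=4)  # destinations: regs[8:12] and regs[12:16]
print(packed, regs[8:16])
```

In this model the single "pack or unpack" bit would simply select which of the two functions runs; no second register operand needs encoding because RS is derived from MAXVL.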

Alternatively, given that Matrix supports up to 3 Dimensions, do not even
be concerned about RS: simply use one of those dimensions to
* RT set to YX, ydim=2, xdim=4

The indices match up as follows:

| RA | (0 1) (2 3) (4 5) (6 7) |
| RT | 0 2 4 6 1 3 5 7 |

This results in a 2-element "unpack".
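
One way to see where such an index sequence comes from is to walk a ydim-by-xdim grid in the transposed order. The helper below is a hypothetical illustration (`yx_indices` is not a name from the REMAP specification, and the real Matrix schedule has more options than this); it only reproduces the ordering for the dimensions given.

```python
def yx_indices(ydim, xdim):
    """Element offsets x*ydim + y, visited with y as the outer loop.

    Illustrative only: a stand-in for a 2D Matrix schedule in YX order.
    """
    return [x * ydim + y for y in range(ydim) for x in range(xdim)]


idx = yx_indices(ydim=2, xdim=4)
print(idx)  # [0, 2, 4, 6, 1, 3, 5, 7]

# Assumed reading: position j of RT receives source element idx[j].
ra = ["a0", "b0", "a1", "b1", "a2", "b2", "a3", "b3"]  # four (a, b) pairs
rt = [ra[i] for i in idx]
print(rt)  # ['a0', 'a1', 'a2', 'a3', 'b0', 'b1', 'b2', 'b3']
```

Under that reading, the first halves of each pair end up contiguous, followed by the second halves, which is the 2-element unpack.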
* RT set to YX, ydim=3, xdim=3

The indices match up as follows:

| RA | 0 1 2 3 4 5 6 7 8 |
| RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack".
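
To make the data movement concrete, the index row for RT can be applied to a source holding three planar arrays. The labels `x`, `y`, `z` and the reading "position j of RT receives source element `rt_indices[j]`" are assumptions made for illustration.

```python
# Index row for RT taken from the ydim=3, xdim=3 example.
rt_indices = [0, 3, 6, 1, 4, 7, 2, 5, 8]

# Hypothetical source: three planar arrays laid out back to back in RA.
ra = ["x0", "x1", "x2", "y0", "y1", "y2", "z0", "z1", "z2"]

# Assumed reading: position j of RT receives source element rt_indices[j].
rt = [ra[i] for i in rt_indices]

# Group the result in threes to show the packed (x, y, z) triples.
triples = [tuple(rt[i:i + 3]) for i in range(0, len(rt), 3)]
print(triples)
# [('x0', 'y0', 'z0'), ('x1', 'y1', 'z1'), ('x2', 'y2', 'z2')]
```

Because a 3x3 transposition is its own inverse, the same result is obtained whether the indices are treated as a gather on the source or a scatter on the destination.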

Both examples become particularly fun when Twin Predication is thrown

| Field Name | Field bits | Description |
|------------|------------|----------------------------|
| Rdest_EXTRA2 | `10:11` | extends Rdest (R\*\_EXTRA2 Encoding) |
| Rsrc_EXTRA2 | `12:13` | extends Rsrc (R\*\_EXTRA2 Encoding) |
| src_SUBVL | `14:15` | SUBVL for Source |
| MASK_SRC | `16:18` | Execution Mask for Source |
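
As a rough aid for reading the bit ranges in the table, the sketch below extracts each field from an integer. It assumes the ranges are numbered within the 24-bit SVP64 RM field using MSB0 ordering (bit 0 is the most significant); `RM_BITS`, `FIELDS` and `get_field` are hypothetical names chosen for this example.

```python
RM_BITS = 24  # assumption: 24-bit RM field, MSB0 bit numbering

# (start, end) bit positions, inclusive, copied from the table.
FIELDS = {
    "Rdest_EXTRA2": (10, 11),
    "Rsrc_EXTRA2": (12, 13),
    "src_SUBVL": (14, 15),
    "MASK_SRC": (16, 18),
}


def get_field(rm, name):
    """Return the value of the named field from the integer rm."""
    start, end = FIELDS[name]
    width = end - start + 1
    shift = RM_BITS - 1 - end  # MSB0: higher bit numbers are less significant
    return (rm >> shift) & ((1 << width) - 1)


# Build an example value with src_SUBVL=0b10 and MASK_SRC=0b101.
rm = (0b10 << (RM_BITS - 1 - 15)) | (0b101 << (RM_BITS - 1 - 18))
print(get_field(rm, "src_SUBVL"), get_field(rm, "MASK_SRC"))  # 2 5
```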

The inclusion of a separate src SUBVL would allow
`sv.mv RT.vecN RA.vecN` to mean either a contiguous sequential copy
or a zip/unzip (pack/unpack).