# Vector Pack/Unpack operations

In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of v3.0B define a series of pack and unpack operations. This page covers those and more. [[svp64]] provides the Vector Context needed to add saturation as well as predication.

* See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004911.html>

Pack and unpack may be covered by [[sv/remap]] by using Matrix 2D layouts
on either source or destination, but doing so is quite expensive. Additionally,
with pressure on the Scalar 32-bit opcode space it is more appropriate to
compromise by adding the required capability in SVP64 on top of a
pre-existing base Scalar mv instruction. [[sv/mv.swizzle]] is sufficiently
unusual to justify a base Scalar 32-bit instruction but pack/unpack is not.
Both may benefit from use of the `RM.EXTRA` field to provide an
additional mode that may be applied to vec2/3/4.

# REMAP concept for pack/unpack

It may be possible to use one standard mv instruction to perform packing
and unpacking: Matrix allows for both reordering and offsets.
At the very least a predicate mask potentially can

* If a single src-dest mv is used, then it potentially requires
  two separate REMAPs and two separate sv.mvs: remap-even, sv.mv,
* Adding twin-src and twin-dest would mean a lot of instructions,
  particularly if triple is added as well. FPR mv, GPR mv
* Unless twin or triple is added, how is it possible to determine
  the extra register(s) to be merged (or split)?

How about instead relying on the implicit RS=MAXVL+RT trick and
extending that to RS=MAXVL+RA as a source? One spare bit in the
EXTRA RM area says whether the sv.mv is a pack (RS-as-src=RA+MAXVL)
or an unpack (RS-as-dest=RT+MAXVL).

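The register arithmetic behind that proposal can be sketched as follows. This is only an illustration of the two formulas quoted above: the function name `implicit_rs` and the `(role, register)` return convention are hypothetical, and nothing here says what the mv would then do with RS.

```python
def implicit_rs(pack, RA, RT, MAXVL):
    """Return (role, register number) of the implicit RS register.

    pack=True  -> RS acts as an extra source:      RS = RA + MAXVL
    pack=False -> RS acts as an extra destination: RS = RT + MAXVL
    The single 'pack' flag stands in for the spare EXTRA RM bit.
    """
    if pack:
        return ("src", RA + MAXVL)
    return ("dest", RT + MAXVL)


# Illustrative register numbers only
print(implicit_rs(True, RA=8, RT=16, MAXVL=4))   # pack: RS follows RA
print(implicit_rs(False, RA=8, RT=16, MAXVL=4))  # unpack: RS follows RT
```
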
Alternatively, given that Matrix is up to 3 Dimensions, not even
be concerned about RS, just simply use one of those dimensions to

* RT set to YX, ydim=2, xdim=4

The indices match up as follows:

| RA | (0 1) (2 3) (4 5) (6 7) |
| RT | 0 2 4 6 1 3 5 7 |

This results in a 2-element "unpack"

* RT set to YX, ydim=3, xdim=3

The indices match up as follows:

| RA | 0 1 2 3 4 5 6 7 8 |
| RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack"

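Both RT index sequences can be generated by one small routine. The sketch below is not the `svshape` definition: the name `yx_indices` is hypothetical, and the formula `x * ydim + y` with the y loop outermost is an assumption, chosen because it yields `0 2 4 6 1 3 5 7` for ydim=2, xdim=4 and `0 3 6 1 4 7 2 5 8` for ydim=3, xdim=3.

```python
def yx_indices(xdim, ydim):
    """Yield offsets for a 2D walk with the y loop outermost.

    Assumed reading of "YX": the offset is x*ydim + y, so consecutive
    steps stride through the data by ydim.
    """
    for y in range(ydim):
        for x in range(xdim):
            yield x * ydim + y


def pair_with_linear(xdim, ydim):
    """Pair a linear RA walk with the reordered RT walk, step by step."""
    return list(zip(range(xdim * ydim), yx_indices(xdim, ydim)))


example1 = list(yx_indices(xdim=4, ydim=2))
example2 = list(yx_indices(xdim=3, ydim=3))
print(example1)
print(example2)
print(pair_with_linear(xdim=4, ydim=2))
```

Keeping RA linear and reordering only RT means a single generator is enough; swapping the roles of the two walks gives the opposite direction.
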
Both examples become particularly fun when Twin Predication is thrown

There exists room within the `svshape` instruction of [[sv/remap]]
to request some alternative Matrix mappings, and there is also
room within the reserved bits of `svremap` as well.

Also used on [[sv/mv.swizzle]]

| Field Name   | Field bits | Description                          |
|--------------|------------|--------------------------------------|
| Rdest_EXTRA2 | `10:11`    | extends Rdest (R\*\_EXTRA2 Encoding) |
| Rsrc_EXTRA2  | `12:13`    | extends Rsrc (R\*\_EXTRA2 Encoding)  |
| PACK_en      | `14`       | Enable pack                          |
| UNPACK_en    | `15`       | Enable unpack                        |
| MASK_SRC     | `16:18`    | Execution Mask for Source            |

The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making
room for 2 extra bits that enable either "packing" or "unpacking"
on the subvectors vec2/3/4.

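As an illustration of the layout in the table, the fields can be extracted with plain shifts and masks. The sketch assumes a 24-bit RM field with MSB0 bit numbering (bit 0 is the most significant bit); the helper names `extract`, `decode` and `encode` are hypothetical and exist only for this example.

```python
RM_WIDTH = 24  # assumption: 24-bit RM field, MSB0 bit numbering

# (first bit, last bit), inclusive, taken from the table above
FIELDS = {
    "Rdest_EXTRA2": (10, 11),
    "Rsrc_EXTRA2": (12, 13),
    "PACK_en": (14, 14),
    "UNPACK_en": (15, 15),
    "MASK_SRC": (16, 18),
}


def extract(rm, start, end):
    """Return bits start..end (inclusive, MSB0 numbering) of rm."""
    width = end - start + 1
    shift = RM_WIDTH - 1 - end
    return (rm >> shift) & ((1 << width) - 1)


def decode(rm):
    """Decode every field in FIELDS from an RM value."""
    return {name: extract(rm, s, e) for name, (s, e) in FIELDS.items()}


def encode(**values):
    """Build an RM value from named fields; unnamed fields stay zero."""
    rm = 0
    for name, value in values.items():
        s, e = FIELDS[name]
        assert 0 <= value < (1 << (e - s + 1)), name
        rm |= value << (RM_WIDTH - 1 - e)
    return rm


rm = encode(Rdest_EXTRA2=2, PACK_en=1, MASK_SRC=5)
print(hex(rm), decode(rm))
```
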
"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides):

    for j in range(SUBVL):

For pack/unpack (again, no elwidth overrides):

    # yield an outer-SUBVL or inner VL loop with SUBVL
    for j in range(SUBVL):
    for j in range(SUBVL):
    # walk through both source and dest indices simultaneously
    for src_idx, dst_idx in zip(index_p(PACK), index_p(UNPACK)):
        move_operation(RT+dst_idx, RA+src_idx)

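A minimal runnable sketch of the two walks described above. It assumes the element offset is `i*SUBVL+j` in both loop orders, with only the nesting swapped when `outer` is set; that formula, the explicit `VL`/`SUBVL` parameters, and the helpers `moves` and `apply_moves` are assumptions made for illustration, and a Python list stands in for the registers.

```python
def index_p(outer, VL, SUBVL):
    """Yield element offsets for a VL-by-SUBVL walk.

    outer=True : SUBVL loop outermost.
    outer=False: VL loop outermost (the "normal" SVP64 order).
    """
    if outer:
        for j in range(SUBVL):
            for i in range(VL):
                yield i * SUBVL + j
    else:
        for i in range(VL):
            for j in range(SUBVL):
                yield i * SUBVL + j


def moves(PACK, UNPACK, VL, SUBVL):
    """Walk source and destination offsets in step, as (src, dst) pairs."""
    return list(zip(index_p(PACK, VL, SUBVL), index_p(UNPACK, VL, SUBVL)))


def apply_moves(src, PACK, UNPACK, VL, SUBVL):
    """Apply the moves to a list standing in for the registers at RA."""
    dst = [None] * (VL * SUBVL)
    for src_idx, dst_idx in moves(PACK, UNPACK, VL, SUBVL):
        dst[dst_idx] = src[src_idx]
    return dst


# two vec3 elements (VL=2, SUBVL=3)
src = ["x0", "y0", "z0", "x1", "y1", "z1"]
packed = apply_moves(src, PACK=True, UNPACK=False, VL=2, SUBVL=3)
restored = apply_moves(packed, PACK=False, UNPACK=True, VL=2, SUBVL=3)
print(packed)
print(restored)
```

Under these assumptions, enabling only the destination-side flag undoes what enabling only the source-side flag did, and enabling both performs the same set of moves as enabling neither, just in a different order.
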
"yield" from Python is used here for simplicity and clarity.
The two Finite State Machines for the generation of the source
and destination element offsets progress incrementally in

Setting both `PACK_en` and `UNPACK_en` is neither prohibited nor
`UNDEFINED`, because the reordering is fully deterministic and
additional REMAP reordering may be applied. For Matrix this would
give potentially up to 4 Dimensions of reordering.