[[!tag standards]]

# Vector Pack/Unpack operations

In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of v3.0B describe a series of pack and unpack operations. This page covers those and more. [[svp64]] provides the Vector Context to also add saturation as well as predication.

* See

Pack and unpack may be covered by [[sv/remap]] by using Matrix 2D layouts on either source or destination, but doing so is quite expensive. Additionally, with pressure on the Scalar 32-bit opcode space, it is more appropriate to compromise by adding the required capability in SVP64 on top of a pre-existing base Scalar mv instruction. [[sv/mv.swizzle]] is sufficiently unusual to justify a base Scalar 32-bit instruction but pack/unpack is not. Both may benefit from use of the `RM.EXTRA` field to provide an additional mode that may be applied to vec2/3/4.

# REMAP concept for pack/unpack

It may be possible to use one standard mv instruction to perform packing and unpacking: Matrix allows for both reordering and offsets. At the very least a predicate mask can potentially be used.

* If a single src-dest mv is used, then it potentially requires two separate REMAPs and two separate sv.mvs: remap-even, sv.mv, remap-odd, sv.mv
* If adding twin-src and twin-dest, that is a lot of instructions, particularly if triple is added as well. FPR mv, GPR mv
* Unless twin or triple is added, how is it possible to determine the extra register(s) to be merged (or split)?

How about instead relying on the implicit RS=MAXVL+RT trick and extending that to RS=MAXVL+RA as a source?
One spare bit in the EXTRA RM area says whether the sv.mv is a pack (RS-as-src=RA+MAXVL) or unpack (RS-as-dest=RT+MAXVL).

Alternatively, given that Matrix is up to 3 Dimensions, there is no need to be concerned about RS at all: simply use one of those dimensions to span the packing.

Example 1:

* RA set to linear
* RT set to YX, ydim=2, xdim=4
* VL=MAXVL=8

The indices match up as follows:

| RA | (0 1) (2 3) (4 5) (6 7) |
| RT | 0 2 4 6 1 3 5 7 |

This results in a 2-element "unpack".

Example 2:

* RA set to linear
* RT set to YX, ydim=3, xdim=3
* VL=MAXVL=9

The indices match up as follows:

| RA | 0 1 2 3 4 5 6 7 8 |
| RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack".

Both examples become particularly fun when Twin Predication is thrown into the mix.

There exists room within the `svshape` instruction of [[sv/remap]] to request some alternative Matrix mappings, and there is also room within the reserved bits of `svremap`.

# RM Pack/unpack

Also used on [[sv/mv.swizzle]]

`RM-2P-1S1D-PU` Mode is applicable to all mv operations (fmv etc.) and to Indexed LD/ST.

The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making room for 2 extra bits that enable either "packing" or "unpacking" on the subvectors vec2/3/4.

Illustrating a "normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides):

    # element offsets in the normal order: VL outer, SUBVL inner
    def index():
        for i in range(VL):
            for j in range(SUBVL):
                yield i*SUBVL+j

    for idx in index():
        operation_on(RA+idx)

For pack/unpack (again, no elwidth overrides):

    # yield an outer-SUBVL or inner VL loop with SUBVL
    def index_p(outer):
        if outer:
            # SUBVL is the outer loop: same offsets, visited subvector-element first
            for j in range(SUBVL):
                for i in range(VL):
                    yield i*SUBVL+j
        else:
            # normal order: VL is the outer loop
            for i in range(VL):
                for j in range(SUBVL):
                    yield i*SUBVL+j

    # walk through both source and dest indices simultaneously
    for src_idx, dst_idx in zip(index_p(PACK), index_p(UNPACK)):
        move_operation(RT+dst_idx, RA+src_idx)

"yield" from Python is used here for simplicity and clarity.
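
The lock-step index walk can be tried out on a toy register file. What follows is a minimal sketch, not a definitive implementation. It assumes that a flat Python list stands in for the register file, that elwidth overrides and predication are ignored, and that the outer-SUBVL order uses the same `i*SUBVL+j` offset as the normal order with only the loop nesting swapped. The helper `sv_mv` and its keyword arguments are hypothetical names introduced purely for illustration.

```python
def index_p(outer, VL, SUBVL):
    """Yield element offsets, SUBVL-outer if `outer`, else VL-outer.

    Assumption of this sketch: both orders use the offset i*SUBVL+j,
    only the loop nesting differs.
    """
    if outer:
        for j in range(SUBVL):
            for i in range(VL):
                yield i * SUBVL + j
    else:
        for i in range(VL):
            for j in range(SUBVL):
                yield i * SUBVL + j


def sv_mv(regs, RT, RA, VL, SUBVL, pack=False, unpack=False):
    """Hypothetical model of a pack/unpack mv on a list-based register file."""
    out = list(regs)  # write to a copy so source reads stay undisturbed
    src = index_p(pack, VL, SUBVL)    # source offsets
    dst = index_p(unpack, VL, SUBVL)  # destination offsets
    for src_idx, dst_idx in zip(src, dst):  # both advance in lock-step
        out[RT + dst_idx] = regs[RA + src_idx]
    return out


# four vec2 elements at RA=0; results are written starting at RT=8
regs = ["x0", "y0", "x1", "y1", "x2", "y2", "x3", "y3"] + [None] * 8
src_outer = sv_mv(regs, RT=8, RA=0, VL=4, SUBVL=2, pack=True)
print(src_outer[8:])  # ['x0', 'x1', 'x2', 'x3', 'y0', 'y1', 'y2', 'y3']

# feeding that result back with the destination walked SUBVL-outer
# restores the original interleaved layout
regs2 = src_outer[8:] + [None] * 8
dst_outer = sv_mv(regs2, RT=8, RA=0, VL=4, SUBVL=2, unpack=True)
print(dst_outer[8:])  # ['x0', 'y0', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3']
```

Under these assumptions the two modes are inverse permutations of each other, which is why a round trip returns the original layout.
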
The two Finite State Machines for the generation of the source and destination element offsets progress incrementally in lock-step.

Setting both `PACK_en` and `UNPACK_en` is neither prohibited nor `UNDEFINED` because the reordering is fully deterministic, and additional REMAP reordering may be applied. For Matrix this would give potentially up to 4 Dimensions of reordering.
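
To illustrate why every combination of the two enables is well-defined, the sketch below enumerates all four settings and checks that each one yields a permutation of the element offsets. It is illustrative only. It assumes that the SUBVL-outer order reuses the `i*SUBVL+j` offset with the loop nesting swapped, and that no REMAP, predication or elwidth override is active. The function `mapping` is a hypothetical helper, not part of the specification.

```python
from itertools import product


def index_p(outer, VL, SUBVL):
    # Assumption of this sketch: the offset is always i*SUBVL+j and only
    # the loop nesting changes when `outer` is set.
    if outer:
        return [i * SUBVL + j for j in range(SUBVL) for i in range(VL)]
    return [i * SUBVL + j for i in range(VL) for j in range(SUBVL)]


def mapping(pack_en, unpack_en, VL, SUBVL):
    """Return perm, where destination offset d receives source offset perm[d]."""
    src = index_p(pack_en, VL, SUBVL)
    dst = index_p(unpack_en, VL, SUBVL)
    perm = [None] * (VL * SUBVL)
    for s, d in zip(src, dst):  # the two sequences advance in lock-step
        perm[d] = s
    return perm


VL, SUBVL = 3, 2
results = {}
for pack_en, unpack_en in product((False, True), repeat=2):
    perm = mapping(pack_en, unpack_en, VL, SUBVL)
    # each source offset is used exactly once: the move is a permutation
    assert sorted(perm) == list(range(VL * SUBVL))
    results[(pack_en, unpack_en)] = perm
    print(pack_en, unpack_en, perm)
```

In this simplified model, enabling both changes only the order in which elements are visited, not where they end up. That order would still matter once further reordering such as REMAP is layered on top.
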