[[!tag standards]]

# Vector Pack/Unpack operations

The SIMD VSX set has a series of pack and unpack operations (Power ISA v3.0B, sections 6.8.1 and 6.8.2, p254). This page covers those and more.  [[svp64]] provides the Vector Context to add saturation as well as predication.

* See <https://bugs.libre-soc.org/show_bug.cgi?id=230#c30>
* <https://lists.libre-soc.org/pipermail/libre-soc-dev/2022-June/004911.html>

Pack and unpack may be covered by [[sv/remap]] by using Matrix 2D layouts on either source or destination, but doing so is quite expensive.  Additionally,
with pressure on the Scalar 32-bit opcode space, it is more appropriate to
compromise by adding the required capability in SVP64 on top of a
pre-existing base Scalar mv instruction.  [[sv/mv.swizzle]] is sufficiently
unusual to justify a base Scalar 32-bit instruction but pack/unpack is not.
Both may benefit from use of the `RM.EXTRA` field to provide an
additional mode that may be applied to vec2/3/4.

# REMAP concept for pack/unpack

It may be possible to use one standard mv instruction to perform packing
and unpacking: Matrix allows for both reordering and offsets. At the very
least, a predicate mask can potentially be used.

* If a single src-dest mv is used, then it potentially requires
  two separate REMAP and two separate sv.mvs: remap-even, sv.mv,
  remap-odd, sv.mv
* If twin-src and twin-dest are added, that is a lot of instructions,
  particularly if triple is added as well, and for both FPR mv and GPR mv
* Unless twin or triple is added, how is it possible to determine
  the extra register(s) to be merged (or split)?

How about instead relying on the implicit RS=MAXVL+RT trick and
extending that to RS=MAXVL+RA as a source?  One spare bit in the
EXTRA RM area says whether the sv.mv is a pack (RS-as-src=RA+MAXVL)
or unpack (RS-as-dest=RT+MAXVL)

Alternatively, given that Matrix supports up to 3 Dimensions, there is
no need to be concerned about RS at all. Simply use one of those
dimensions to span the packing:

Example 1:

* RA set to linear
* RT set to YX, ydim=2, xdim=4
* VL=MAXVL=8

The indices match up as follows:

    | RA | (0 1) (2 3) (4 5) (6 7) |
    | RT |   0 2 4 6     1 3 5 7   |

This results in a 2-element "unpack"

Example 2:

* RA set to linear
* RT set to YX, ydim=3, xdim=3
* VL=MAXVL=9

The indices match up as follows:

    | RA |  0 1 2   3 4 5   6 7 8  |
    | RT | (0 3 6) (1 4 7) (2 5 8) |

This results in a 3-element "pack"
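
The RT index rows of both examples can be reproduced with a short sketch.
The helper `matrix_yx` is hypothetical and is not the `svshape` algorithm
from [[sv/remap]]: it only assumes that "YX" means walking a grid of
`xdim` rows by `ydim` columns in transposed order, which is enough to
generate a permutation of 0..VL-1 for each example.

```python
def matrix_yx(ydim, xdim):
    """Hypothetical helper: transposed walk over an xdim-by-ydim grid.

    Assumption for illustration only; the real REMAP Matrix schedule
    is defined in sv/remap, not here.
    """
    for y in range(ydim):          # outer loop selects the column
        for x in range(xdim):      # inner loop steps down the rows
            yield x * ydim + y     # row-major offset of (row x, column y)

# Example 1: ydim=2, xdim=4, VL=MAXVL=8
example1_rt = list(matrix_yx(ydim=2, xdim=4))
print(example1_rt)  # [0, 2, 4, 6, 1, 3, 5, 7]

# Example 2: ydim=3, xdim=3, VL=MAXVL=9
example2_rt = list(matrix_yx(ydim=3, xdim=3))
print(example2_rt)  # [0, 3, 6, 1, 4, 7, 2, 5, 8]
```

RA is left linear in both cases, so element `i` of the mv pairs RA offset
`i` with the `i`th RT offset from the list.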

Both examples become particularly fun when Twin Predication is thrown
into the mix.

There exists room within the `svshape` instruction of [[sv/remap]]
to request some alternative Matrix mappings, and there is also
room within the reserved bits of `svremap`.

# RM Pack/unpack

Also used on [[sv/mv.swizzle]].

`RM-2P-1S1D-PU` Mode is applicable to all mv operations
(fmv etc) and to Indexed LD/ST.

The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making
room for 2 extra bits that enable either "packing" or "unpacking"
on the subvectors vec2/3/4.

Illustrating a "normal" SVP64 operation with `SUBVL!=1`
(assuming no elwidth overrides):

    def index():
        for i in range(VL):
            for j in range(SUBVL):
                yield i*SUBVL+j

    for idx in index():
        operation_on(RA+idx)

For pack/unpack (again, no elwidth overrides):

    # yield element offsets with either SUBVL or VL as the outer loop
    def index_p(outer):
        if outer:
            for j in range(SUBVL):      # SUBVL is the outer loop
                for i in range(VL):     # VL is the inner loop
                    yield i*SUBVL+j     # same offset, different order
        else:
            for i in range(VL):
                for j in range(SUBVL):
                    yield i*SUBVL+j

    # walk through both source and dest indices simultaneously
    for src_idx, dst_idx in zip(index_p(PACK), index_p(UNPACK)):
        move_operation(RT+dst_idx, RA+src_idx)

"yield" from python is used here for simplicity and clarity.
The two Finite State Machines for the generation of the source
and destination element offsets progress incrementally in
lock-step.
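
A runnable sketch of the lock-step walk may help. It models the register
file as a flat Python list and assumes non-overlapping source and
destination ranges. The names `sv_mv`, `rt`, `ra`, `pack` and `unpack`
are illustrative only. The SUBVL-outer branch is assumed to yield
`i*SUBVL+j` with the two loops swapped, so that its ordering actually
differs from the VL-outer one.

```python
def index_p(outer, vl, subvl):
    """Yield element offsets, SUBVL-outer if `outer` else VL-outer."""
    if outer:
        for j in range(subvl):
            for i in range(vl):
                yield i * subvl + j
    else:
        for i in range(vl):
            for j in range(subvl):
                yield i * subvl + j

def sv_mv(regs, rt, ra, vl, subvl, pack=False, unpack=False):
    """Model a vectorised mv on a list standing in for the regfile."""
    result = list(regs)
    # source and destination offset generators advance in lock-step
    pairs = zip(index_p(pack, vl, subvl), index_p(unpack, vl, subvl))
    for src_idx, dst_idx in pairs:
        result[rt + dst_idx] = result[ra + src_idx]
    return result

VL, SUBVL = 3, 2
# three vec2 elements at offsets 0..5, free space at offsets 6..17
regs = ["x0", "y0", "x1", "y1", "x2", "y2"] + [None] * 12

packed = sv_mv(regs, rt=6, ra=0, vl=VL, subvl=SUBVL, pack=True)
print(packed[6:12])     # ['x0', 'x1', 'x2', 'y0', 'y1', 'y2']

restored = sv_mv(packed, rt=12, ra=6, vl=VL, subvl=SUBVL, unpack=True)
print(restored[12:18])  # ['x0', 'y0', 'x1', 'y1', 'x2', 'y2']
```

Under these assumptions the second move undoes the first, which is the
behaviour expected of a pack/unpack pair.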

Setting of both `PACK_en` and `UNPACK_en` is neither prohibited nor
`UNDEFINED` because the reordering is fully deterministic, and
additional REMAP reordering may be applied. For Matrix this would
give potentially up to 4 Dimensions of reordering.
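
To illustrate why enabling both is still deterministic, the sketch below
assumes that the SUBVL-outer ordering visits offset `i*SUBVL+j` with `j`
in the outer loop (an assumption about the intended ordering, not a
statement of the final specification). With both enabled, source and
destination use the same ordering, so every offset is visited exactly
once and only the order of the moves changes.

```python
VL, SUBVL = 4, 3

# SUBVL-outer ordering, assumed for both source and destination
order = [i * SUBVL + j for j in range(SUBVL) for i in range(VL)]

# source and destination offsets progress in lock-step
moves = list(zip(order, order))
print(moves[:4])  # [(0, 0), (3, 3), (6, 6), (9, 9)]

# every offset appears exactly once: the walk is a permutation
assert sorted(order) == list(range(VL * SUBVL))
```

Any further REMAP reordering would then be applied on top of this
already well-defined sequence.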