[[!tag standards]]
-# Vector mv operations
+# Vector Pack/Unpack operations
In the SIMD VSX set, sections 6.8.1 and 6.8.2 (p254) of Power ISA v3.0B define a series of pack and unpack operations. This page covers those and more. [[svp64]] provides the Vector Context needed to add saturation as well as predication.
compromise by adding required capability in SVP64 on top of a
base pre-existing Scalar mv instruction. [[sv/mv.swizzle]] is sufficiently
unusual to justify a base Scalar 32-bit instruction but pack/unpack is not.
+Both may benefit from using the `RM.EXTRA` field to provide an
+additional mode that may be applied to vec2/3/4.
# REMAP concept for pack/unpack
Similar to [[sv/mv.swizzle]]
-MVRM-2P-2S1D:
+MVRM-2P-1S1D:
| Field Name | Field bits | Description |
|------------|------------|----------------------------|
| Rdest_EXTRA2 | `10:11` | extends Rdest (R\*\_EXTRA2 Encoding) |
| Rsrc_EXTRA2 | `12:13` | extends Rsrc (R\*\_EXTRA2 Encoding) |
-| src_SUBVL | `14:15` | SUBVL for Source |
+| PACK_en | `14` | Enable pack |
+| UNPACK_en | `15` | Enable unpack |
| MASK_SRC | `16:18` | Execution Mask for Source |
-The inclusion of a separate src SUBVL allows
-`sv.mv.swiz RT.vecN RA.vecN` to mean zip/unzip (pack/unpack).
-This is conceptually achieved by having both source and
-destination SUBVL be "outer" loops instead of inner loops.
+The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making
+room for 2 extra bits that enable either "packing" or "unpacking"
+on the subvectors vec2/3/4.
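
The field layout in the table above can be sketched as a small decoder. This is only an illustration: it assumes the RM value is 24 bits wide and that the table's bit numbers are MSB0 (bit 0 is the most significant bit), and the names `rm_field` and `decode_mvrm` are hypothetical helpers, not part of any specification.

```python
RM_WIDTH = 24  # assumption: width of the SVP64 RM field in bits


def rm_field(rm, start, end):
    """Extract bits start..end (inclusive) from rm, using MSB0 numbering."""
    width = end - start + 1
    shift = RM_WIDTH - 1 - end  # distance of the field's last bit from the LSB
    return (rm >> shift) & ((1 << width) - 1)


def decode_mvrm(rm):
    """Decode only the MVRM-2P-1S1D fields listed in the table."""
    return {
        "Rdest_EXTRA2": rm_field(rm, 10, 11),
        "Rsrc_EXTRA2": rm_field(rm, 12, 13),
        "PACK_en": rm_field(rm, 14, 14),
        "UNPACK_en": rm_field(rm, 15, 15),
        "MASK_SRC": rm_field(rm, 16, 18),
    }
```

Under these assumptions, an RM value with only MSB0 bit 14 set decodes to `PACK_en=1` with every other listed field zero.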
Illustrating a
"normal" SVP64 operation with `SUBVL!=1` (assuming no elwidth overrides):
     for idx in index():
         operation_on(RA+idx)
-For a separate source/dest SUBVL (again, no elwidth overrides):
+For pack/unpack (again, no elwidth overrides):
-    # only one of these will be >1 at any given time
-    subvl = MAX(SUBVL,SRC_SUBVL)
-    # yield an outer-SUBVL, inner VL loop with SRC SUBVL
-    def index_src(outer):
+    # yield an outer-SUBVL or inner VL loop with SUBVL
+    def index_p(outer):
         if outer:
-            for j in range(subvl):
+            for j in range(SUBVL):
                 for i in range(VL):
-                    yield i+VL*j
+                    # element j of each subvector in turn (stride SUBVL).
+                    # i+VL*j would count 0..VL*SUBVL-1 in order, exactly
+                    # like the inner loop below, giving no reordering.
+                    yield i*SUBVL+j
         else:
             for i in range(VL):
-                for j in range(subvl):
-                    yield i*subvl+j
+                for j in range(SUBVL):
+                    yield i*SUBVL+j
-    # yield an outer-SUBVL, inner VL loop with DEST SUBVL
-    def index_dest(outer):
-        if outer:
-            for j in range(subvl):
-                for i in range(VL):
-                    yield i+VL*j
-        else:
-            for i in range(VL):
-                for j in range(subvl):
-                    yield i*subvl+j
-
-    # inner looping when SUBVLs are equal
-    if SRC_SUBVL == SUBVL:
-        for idx in index():
-            move_operation(RT+idx, RA+idx)
-    else:
-        # walk through both source and dest indices simultaneously
-        so, do = SRC_SUBVL>SUBVL, SUBVL>SRC_SUBVL
-        for src_idx, dst_idx in zip(index_src(so), index_dst(do)):
-            move_operation(RT+dst_idx, RA+src_idx)
+    # walk through both source and dest indices simultaneously
+    for src_idx, dst_idx in zip(index_p(PACK), index_p(UNPACK)):
+        move_operation(RT+dst_idx, RA+src_idx)
Python's `yield` is used here for simplicity and clarity.
The two Finite State Machines for the generation of the source
and destination element offsets progress incrementally in
lock-step.
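
The lock-step walk can be tried out with a small self-contained sketch. It is not the normative pseudocode: `VL` and `SUBVL` are passed as arguments, the register file is modelled as a plain Python list, and `mv_pack_unpack` is a hypothetical helper name. It also assumes that the element offset is `i*SUBVL+j` in both loop orders, so that only the visiting order differs. An offset of `i+VL*j` in the outer walk would simply count upwards and give no reordering.

```python
def index_p(outer, vl, subvl):
    """Yield element offsets, SUBVL-outer (strided) or VL-outer (sequential)."""
    if outer:
        for j in range(subvl):
            for i in range(vl):
                yield i * subvl + j  # element j of every subvector in turn
    else:
        for i in range(vl):
            for j in range(subvl):
                yield i * subvl + j  # each subvector in full, in order


def mv_pack_unpack(src, vl, subvl, pack, unpack):
    """Copy src to a new list, walking both index generators in lock-step."""
    dst = [None] * (vl * subvl)
    pairs = zip(index_p(pack, vl, subvl), index_p(unpack, vl, subvl))
    for src_idx, dst_idx in pairs:
        dst[dst_idx] = src[src_idx]
    return dst


# three vec2 subvectors, stored one subvector after another
interleaved = ["x0", "y0", "x1", "y1", "x2", "y2"]

planar = mv_pack_unpack(interleaved, vl=3, subvl=2, pack=True, unpack=False)
# planar == ["x0", "x1", "x2", "y0", "y1", "y2"]

restored = mv_pack_unpack(planar, vl=3, subvl=2, pack=False, unpack=True)
# restored == ["x0", "y0", "x1", "y1", "x2", "y2"]
```

In this sketch, enabling the outer walk on one side only transposes the VL by SUBVL arrangement, and enabling it on the other side only undoes that. With both flags equal, every `src_idx` equals its `dst_idx`, so the result is a straight copy whose elements are merely visited in a different order.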
-* Normal usage, `SUBVL=SRC_SUBVL`, gives straight subvector copy.
-* `SRC_SUBVL=1, SUBVL=2/3/4` gives a "pack" effect
-* `SUBVL=1, SRC_SUBVL=2/3/4` gives an "unpack".
-* Setting both SUBVL and SRC_SUBVL to unequal values greater than
- 1 will, like [[sv/mv.swizzle]], produce `UNDEFINED` results.
+Setting both `PACK_en` and `UNPACK_en` is neither prohibited nor
+`UNDEFINED`, because the reordering is fully deterministic and
+additional REMAP reordering may still be applied. For Matrix this
+could give up to 4 Dimensions of reordering.