From f7f4aa04463bb19a1b98d15b8434148a34abbe7c Mon Sep 17 00:00:00 2001
From: lkcl <lkcl@web>
Date: Mon, 13 Jun 2022 16:46:10 +0100
Subject: [PATCH]

---
 openpower/sv/mv.vec.mdwn | 62 ++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 40 deletions(-)
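
A minimal runnable sketch of the pack/unpack index generation described in
this patch, for illustration only. `VL`, `SUBVL` and the two enables are
passed as explicit parameters, and a flat Python list stands in for the
register file. The names `index_p_literal`, `index_p_assumed` and
`mv_pack_unpack` are hypothetical. `index_p_literal` transcribes `index_p`
as written in the page: both of its branches enumerate the offsets
0..VL*SUBVL-1 in ascending order, so the sketch assumes the intent is one
shared loop order with two different offset formulas (`index_p_assumed`).
That reading is an assumption and is not stated by the patch.

```python
def index_p_literal(outer, vl, subvl):
    """Transcription of index_p from the page, with VL/SUBVL as parameters."""
    if outer:
        for j in range(subvl):
            for i in range(vl):
                yield i + vl * j
    else:
        for i in range(vl):
            for j in range(subvl):
                yield i * subvl + j


def index_p_assumed(outer, vl, subvl):
    """ASSUMPTION (not in the patch): same loop order, two offset formulas."""
    for i in range(vl):
        for j in range(subvl):
            # outer: element j of subvector i lives in plane j (offset i + VL*j)
            # inner: the normal SVP64 layout, subvector elements are adjacent
            yield (i + vl * j) if outer else (i * subvl + j)


def mv_pack_unpack(regs, rt, ra, vl, subvl, pack_en, unpack_en,
                   index_p=index_p_assumed):
    """Model the move loop; regs is a flat list standing in for the regfile."""
    result = list(regs)
    # source and destination offsets advance in lock-step
    pairs = zip(index_p(pack_en, vl, subvl), index_p(unpack_en, vl, subvl))
    for src_idx, dst_idx in pairs:
        result[rt + dst_idx] = regs[ra + src_idx]
    return result


# Usage: VL=4, SUBVL=2, source at offset 0, destination at offset 8.
planar = ["x0", "x1", "x2", "x3", "y0", "y1", "y2", "y3"] + [None] * 8
packed = mv_pack_unpack(planar, rt=8, ra=0, vl=4, subvl=2,
                        pack_en=True, unpack_en=False)
# packed[8:] == ["x0", "y0", "x1", "y1", "x2", "y2", "x3", "y3"]
```

Under this assumed reading, enabling only pack interleaves ("zips") the
planes, enabling only unpack reverses that, and enabling both or neither
gives a straight copy.
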

diff --git a/openpower/sv/mv.vec.mdwn b/openpower/sv/mv.vec.mdwn
index 40a31c954..79dc9d124 100644
--- a/openpower/sv/mv.vec.mdwn
+++ b/openpower/sv/mv.vec.mdwn
@@ -1,6 +1,6 @@
 [[!tag standards]]
 
-# Vector mv operations
+# Vector Pack/Unpack operations
 
 In the SIMD VSX set, section 6.8.1 and 6.8.2 p254 of v3.0B has a series of pack and unpack operations. This page covers those and more.  [[svp64]] provides the Vector Context to also add saturation as well as predication.
 
@@ -12,6 +12,8 @@ with pressure on the Scalar 32-bit opcode space it is more appropriate to
 compromise by adding required capability in SVP64 on top of a
 base pre-existing Scalar mv instruction.  [[sv/mv.swizzle]] is sufficiently
 unusual to justify a base Scalar 32-bit instruction but pack/unpack is not.
+Both may benefit from using the `RM.EXTRA` field to provide an
+additional mode that may be applied to vec2/3/4.
 
 # REMAP concept for pack/unpack
 
@@ -73,19 +75,19 @@ room within the reserved bits of `svremap` as well.
 
 Similar to [[sv/mv.swizzle]] 
 
-MVRM-2P-2S1D:
+MVRM-2P-1S1D:
 
 | Field Name | Field bits | Description                     |
 |------------|------------|----------------------------|
 | Rdest_EXTRA2 | `10:11`  | extends Rdest (R\*\_EXTRA2 Encoding)   |
 | Rsrc_EXTRA2  | `12:13`  | extends Rsrc  (R\*\_EXTRA2 Encoding)   |
-| src_SUBVL    | `14:15`  | SUBVL for Source              |
+| PACK_en      | `14`     | Enable pack              |
+| UNPACK_en    | `15`     | Enable unpack             |
 | MASK_SRC     | `16:18`  | Execution Mask for Source     |
 
-The inclusion of a separate src SUBVL allows
-`sv.mv.swiz RT.vecN RA.vecN` to mean zip/unzip (pack/unpack).
-This is conceptually achieved by having both source and
-destination SUBVL be "outer" loops instead of inner loops.
+The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making
+room for 2 extra bits that enable either "packing" or "unpacking"
+on the subvectors vec2/3/4.
 
 Illustrating a
 "normal" SVP64 operation with `SUBVL!=1:` (assuming no elwidth overrides):
@@ -98,49 +100,29 @@ Illustrating a
     for idx in index():
         operation_on(RA+idx)
 
-For a separate source/dest SUBVL (again, no elwidth overrides):
+For pack/unpack (again, no elwidth overrides):
 
-    # only one of these will be >1 at any given time
-    subvl = MAX(SUBVL,SRC_SUBVL)
-    # yield an outer-SUBVL, inner VL loop with SRC SUBVL
-    def index_src(outer):
+    # yield element offsets: SUBVL as outer loop if set, else as inner loop
+    def index_p(outer):
         if outer:
-            for j in range(subvl):
+            for j in range(SUBVL):
                 for i in range(VL):
                     yield i+VL*j
         else:
             for i in range(VL):
-                for j in range(subvl):
-                    yield i*subvl+j
+                for j in range(SUBVL):
+                    yield i*SUBVL+j
 
-    # yield an outer-SUBVL, inner VL loop with DEST SUBVL
-    def index_dest(outer):
-        if outer:
-            for j in range(subvl):
-                for i in range(VL):
-                    yield i+VL*j
-        else:
-            for i in range(VL):
-                for j in range(subvl):
-                    yield i*subvl+j
-
-    # inner looping when SUBVLs are equal
-    if SRC_SUBVL == SUBVL:
-        for idx in index():
-            move_operation(RT+idx, RA+idx)
-    else:
-        # walk through both source and dest indices simultaneously
-        so, do = SRC_SUBVL>SUBVL, SUBVL>SRC_SUBVL
-        for src_idx, dst_idx in zip(index_src(so), index_dst(do)):
-            move_operation(RT+dst_idx, RA+src_idx)
+    # walk through both source and dest indices simultaneously
+    for src_idx, dst_idx in zip(index_p(PACK_en), index_p(UNPACK_en)):
+        move_operation(RT+dst_idx, RA+src_idx)
 
 "yield" from python is used here for simplicity and clarity.
 The two Finite State Machines for the generation of the source
 and destination element offsets progress incrementally in
 lock-step.
 
-* Normal usage, `SUBVL=SRC_SUBVL`, gives straight subvector copy.
-* `SRC_SUBVL=1, SUBVL=2/3/4` gives a "pack" effect
-* `SUBVL=1, SRC_SUBVL=2/3/4` gives an "unpack".
-* Setting both SUBVL and SRC_SUBVL to unequal values greater than
-  1 will, like [[sv/mv.swizzle]], produce `UNDEFINED` results.
+Setting both `PACK_en` and `UNPACK_en` is neither prohibited nor
+`UNDEFINED`, because the reordering is fully deterministic, and
+additional REMAP reordering may be applied. For Matrix this would
+give up to 4 Dimensions of reordering.
-- 
2.30.2