From f7f4aa04463bb19a1b98d15b8434148a34abbe7c Mon Sep 17 00:00:00 2001
From: lkcl
Date: Mon, 13 Jun 2022 16:46:10 +0100
Subject: [PATCH]

---
 openpower/sv/mv.vec.mdwn | 62 ++++++++++++++--------------------------
 1 file changed, 22 insertions(+), 40 deletions(-)

diff --git a/openpower/sv/mv.vec.mdwn b/openpower/sv/mv.vec.mdwn
index 40a31c954..79dc9d124 100644
--- a/openpower/sv/mv.vec.mdwn
+++ b/openpower/sv/mv.vec.mdwn
@@ -1,6 +1,6 @@
 [[!tag standards]]
 
-# Vector mv operations
+# Vector Pack/Unpack operations
 
 In the SIMD VSX set, section 6.8.1 and 6.8.2 p254 of v3.0B has a series of pack and unpack operations. This page covers those and more.
 [[svp64]] provides the Vector Context to also add saturation as well as predication.
@@ -12,6 +12,8 @@ with pressure on the Scalar 32-bit opcode space it is more
 appropriate to compromise by adding required capability in SVP64 on top of a
 base pre-existing Scalar mv instruction. [[sv/mv.swizzle]] is sufficiently
 unusual to justify a base Scalar 32-bit instruction but pack/unpack is not.
+Both may benefit from a use of the `RM.EXTRA` field to provide an
+additional mode, that may be applied to vec2/3/4.
 
 # REMAP concept for pack/unpack
 
@@ -73,19 +75,19 @@ room within the reserved bits of `svremap` as well.
 
 Similar to [[sv/mv.swizzle]]
 
-MVRM-2P-2S1D:
+MVRM-2P-1S1D:
 
 | Field Name | Field bits | Description |
 |------------|------------|----------------------------|
 | Rdest_EXTRA2 | `10:11` | extends Rdest (R\*\_EXTRA2 Encoding) |
 | Rsrc_EXTRA2 | `12:13` | extends Rsrc (R\*\_EXTRA2 Encoding) |
-| src_SUBVL | `14:15` | SUBVL for Source |
+| PACK_en | `14` | Enable pack |
+| UNPACK_en | `15` | Enable unpack |
 | MASK_SRC | `16:18` | Execution Mask for Source |
 
-The inclusion of a separate src SUBVL allows
-`sv.mv.swiz RT.vecN RA.vecN` to mean zip/unzip (pack/unpack).
-This is conceptually achieved by having both source and
-destination SUBVL be "outer" loops instead of inner loops.
+The usual RM-2P-1S1D is reduced from EXTRA3 to EXTRA2, making
+room for 2 extra bits that enable either "packing" or "unpacking"
+on the subvectors vec2/3/4.
 
 Illustrating a "normal" SVP64 operation with `SUBVL!=1:` (assuming no elwidth overrides):
 
@@ -98,49 +100,29 @@ Illustrating a
     for idx in index():
         operation_on(RA+idx)
 
-For a separate source/dest SUBVL (again, no elwidth overrides):
+For pack/unpack (again, no elwidth overrides):
 
-    # only one of these will be >1 at any given time
-    subvl = MAX(SUBVL,SRC_SUBVL)
-    # yield an outer-SUBVL, inner VL loop with SRC SUBVL
-    def index_src(outer):
+    # yield an outer-SUBVL or inner VL loop with SUBVL
+    def index_p(outer):
         if outer:
-            for j in range(subvl):
+            for j in range(SUBVL):
                 for i in range(VL):
                     yield i+VL*j
         else:
             for i in range(VL):
-                for j in range(subvl):
-                    yield i*subvl+j
+                for j in range(SUBVL):
+                    yield i*SUBVL+j
 
-    # yield an outer-SUBVL, inner VL loop with DEST SUBVL
-    def index_dest(outer):
-        if outer:
-            for j in range(subvl):
-                for i in range(VL):
-                    yield i+VL*j
-        else:
-            for i in range(VL):
-                for j in range(subvl):
-                    yield i*subvl+j
-
-    # inner looping when SUBVLs are equal
-    if SRC_SUBVL == SUBVL:
-        for idx in index():
-            move_operation(RT+idx, RA+idx)
-    else:
-        # walk through both source and dest indices simultaneously
-        so, do = SRC_SUBVL>SUBVL, SUBVL>SRC_SUBVL
-        for src_idx, dst_idx in zip(index_src(so), index_dst(do)):
-            move_operation(RT+dst_idx, RA+src_idx)
+    # walk through both source and dest indices simultaneously
+    for src_idx, dst_idx in zip(index_p(PACK), index_p(UNPACK)):
+        move_operation(RT+dst_idx, RA+src_idx)
 
 "yield" from python is used here for simplicity and clarity.
 The two Finite State Machines for the generation of the source
 and destination element offsets progress incrementally in
 lock-step.
 
-* Normal usage, `SUBVL=SRC_SUBVL`, gives straight subvector copy.
-* `SRC_SUBVL=1, SUBVL=2/3/4` gives a "pack" effect
-* `SUBVL=1, SRC_SUBVL=2/3/4` gives an "unpack".
-* Setting both SUBVL and SRC_SUBVL to unequal values greater than
-  1 will, like [[sv/mv.swizzle]], produce `UNDEFINED` results.
+Setting of both `PACK_en` and `UNPACK_en` is neither prohibited nor
+`UNDEFINED` because the reordering is fully deterministic, and
+additional REMAP reordering may be applied. For Matrix this would
+give potentially up to 4 Dimensions of reordering.
-- 
2.30.2
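
Reviewer sketch (not part of the patch): the lock-step walk over source and destination offsets described in the new `index_p` pseudocode can be tried out as plain Python. Several things here are assumptions rather than anything the patch states. `VL` and `SUBVL` are passed as arguments instead of being read from SVP64 state, register files are modelled as Python lists, and `mv_pack_unpack` is a hypothetical helper name. Note also that, as literally written in the patch, both branches of `index_p` enumerate `0..VL*SUBVL-1` in ascending order (`i+VL*j` with `j` outermost is sequential, as is `i*SUBVL+j` with `i` outermost). The sketch therefore assumes the outer-SUBVL branch was intended to yield `i*SUBVL+j`, which is what makes the two orders differ.

```python
def index_p(outer, VL, SUBVL):
    """Yield element offsets for one operand.

    outer=True : SUBVL is the outer loop, VL the inner loop.
    outer=False: VL is the outer loop, SUBVL the inner loop (normal SVP64 order).

    ASSUMPTION: the outer branch yields i*SUBVL+j (a strided walk). The patch
    text yields i+VL*j there, which is the same ascending sequence as the
    inner branch and so would not reorder anything.
    """
    if outer:
        for j in range(SUBVL):
            for i in range(VL):
                yield i * SUBVL + j
    else:
        for i in range(VL):
            for j in range(SUBVL):
                yield i * SUBVL + j


def mv_pack_unpack(src, VL, SUBVL, pack_en=False, unpack_en=False):
    """Hypothetical model of the move: separate source and destination lists.

    Following the patch, PACK selects the source index order and UNPACK the
    destination index order; both generators advance in lock-step via zip().
    """
    dst = [None] * (VL * SUBVL)
    for src_idx, dst_idx in zip(index_p(pack_en, VL, SUBVL),
                                index_p(unpack_en, VL, SUBVL)):
        dst[dst_idx] = src[src_idx]
    return dst


if __name__ == "__main__":
    # three vec2 subvectors stored as x0 y0 x1 y1 x2 y2
    src = ["x0", "y0", "x1", "y1", "x2", "y2"]
    packed = mv_pack_unpack(src, VL=3, SUBVL=2, pack_en=True)
    print(packed)
    print(mv_pack_unpack(packed, VL=3, SUBVL=2, unpack_en=True))
```

Under these assumptions, enabling only the pack bit gathers all first subvector elements and then all second ones, and enabling only the unpack bit reverses that. With both bits set the source and destination orders match, so this model degenerates to a plain element-by-element copy.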